Topological and Ergodic Theory of Symbolic Dynamics [1 ed.] 2022034733, 9781470469849, 9781470472191, 9781470472184

Symbolic dynamics is essential in the study of dynamical systems of various types and is connected to many other fields


English Pages 460 [481] Year 2022


Table of contents :
Cover
Title page
Contents
Preface
Chapter 1. First Examples and General Properties of Subshifts
1.1. Symbol Sequences and Subshifts
1.2. Word-Complexity
1.3. Transitive and Synchronized Subshifts
1.4. Sliding Block Codes
1.5. Word-Frequencies and Shift-Invariant Measures
1.6. Symbolic Itineraries
Chapter 2. Topological Dynamics
2.1. Basic Notions from Dynamical Systems
2.2. Transitive and Minimal Systems
2.3. Equicontinuous and Distal Systems
2.4. Topological Entropy
2.5. Mathematical Chaos
2.6. Transitivity and Topological Mixing
2.7. Shadowing and Specification
Chapter 3. Subshifts of Positive Entropy
3.1. Subshifts of Finite Type
3.2. Sofic Shifts
3.3. Coded Subshifts
3.4. Hereditary and Density Shifts
3.5. 𝛽-Shifts and 𝛽-Expansions
3.6. Unimodal Subshifts
3.7. Gap Shifts
3.8. Spacing Shifts
3.9. Power-Free Shifts
3.10. Dyck Shifts
Chapter 4. Subshifts of Zero Entropy
4.1. Linear Recurrence
4.2. Substitution Shifts
4.3. Sturmian Subshifts
4.4. Interval Exchange Transformations
4.5. Toeplitz Shifts
4.6. ℬ-Free Shifts
4.7. Unimodal Restrictions to Critical Omega-Limit Sets
Chapter 5. Further Minimal Cantor Systems
5.1. Kakutani-Rokhlin Partitions
5.2. Cutting and Stacking
5.3. Enumeration Systems
5.4. Bratteli Diagrams and Vershik Maps
Chapter 6. Methods from Ergodic Theory
6.1. Ergodicity
6.2. Birkhoff’s Ergodic Theorem
6.3. Unique Ergodicity
6.4. Measure-Theoretic Entropy
6.5. Isomorphic Systems
6.6. Measures of Maximal Entropy
6.7. Mixing
6.8. Spectral Properties
6.9. Eigenvalues of Bratteli-Vershik Systems
Chapter 7. Automata and Linguistic Complexity
7.1. Automata
7.2. The Chomsky Hierarchy
7.3. Automatic Sequences and Cobham’s Theorems
Chapter 8. Miscellaneous Background Topics
8.1. Pisot and Salem Numbers
8.2. Continued Fractions
8.3. Uniformly Distributed Sequences
8.4. Diophantine Approximation
8.5. Density and Banach Density
8.6. The Perron-Frobenius Theorem
8.7. Countable Graphs and Matrices
Appendix. Solutions to Exercises
Bibliography
Index
Back Cover

GRADUATE STUDIES IN MATHEMATICS

228

Topological and Ergodic Theory of Symbolic Dynamics Henk Bruin


EDITORIAL COMMITTEE
Matthew Baker
Marco Gualtieri
Gigliola Staffilani (Chair)
Jeff A. Viaclovsky
Rachel Ward

2020 Mathematics Subject Classification. Primary 37B10; Secondary 37B05, 28D05, 11J70, 68R15.

For additional information and updates on this book, visit www.ams.org/bookpages/gsm-228

Library of Congress Cataloging-in-Publication Data Names: Bruin, Henk, 1966- author. Title: Topological and ergodic theory of symbolic dynamics / Henk Bruin. Description: Providence, Rhode Island : American Mathematical Society, [2022] | Series: Graduate studies in mathematics, 1065-7339 ; 228 | Includes bibliographical references and index. Identifiers: LCCN 2022034733 | ISBN 9781470469849 (hardcover) | ISBN 9781470472191 (paperback) | ISBN 9781470472184 (ebook) Subjects: LCSH: Symbolic dynamics. | Topological dynamics. | Ergodic theory. | AMS: Dynamical systems and ergodic theory – Topological dynamics – Symbolic dynamics. | Dynamical systems and ergodic theory – Topological dynamics – Transformations and group actions with special properties (minimality, distality, proximality, etc.). | Measure and integration – Measure-theoretic ergodic theory – Measure-preserving transformations. | Number theory – Diophantine approximation, transcendental number theory – Continued fractions and generalizations. | Computer science – Discrete mathematics in relation to computer science – Combinatorics on words. Classification: LCC QA614.85 .B78 2022 | DDC 515/.39–dc23/eng20221021 LC record available at https://lccn.loc.gov/2022034733

Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting for them, are permitted to make fair use of the material, such as to copy select pages for use in teaching or research. Permission is granted to quote brief passages from this publication in reviews, provided the customary acknowledgment of the source is given.

Republication, systematic copying, or multiple reproduction of any material in this publication is permitted only under license from the American Mathematical Society. Requests for permission to reuse portions of AMS publication content are handled by the Copyright Clearance Center. For more information, please visit www.ams.org/publications/pubpermissions.

Send requests for translation rights and licensed reprints to [email protected].

© 2022 by the author. All rights reserved. Printed in the United States of America.

∞ The paper used in this book is acid-free and falls within the guidelines established to ensure permanence and durability.

Visit the AMS home page at https://www.ams.org/


Preface

Symbolic dynamics and coding: For many students in mathematics, symbolic dynamics appears first in a general course on dynamical systems, as a symbolic description of Smale’s horseshoe, basic sets of Axiom A attractors, or toral automorphisms. For these systems there is a Markov partition, and on a symbolic level these are described by a subshift of finite type (SFT). That symbolic dynamics can lead to other subshifts (e.g. Sturmian shifts, β-shifts, kneading theory) often remains beyond view or is only covered in specialized topics courses. Also, a standard undergraduate program will not reveal that symbolic dynamics is a flourishing subject by itself, giving very flexible approaches to construct examples with particular topological and ergodic properties. This book is meant to give an overview of many of these aspects of symbolic sequences and coding and allows readers to look beyond their own initial interest. That some topics (in particular ergodic theory and also kneading theory, i.e. symbolic dynamics of unimodal interval maps) are studied at considerably greater depth than others has to do with my own research background and interests.

Because $\{0, 1\}^{\mathbb{N}}$ and $\{0, 1\}^{\mathbb{Z}}$ are standard representations of the Cantor set, symbolic dynamics is in essence the study of transformations of Cantor sets. When I once asked my second thesis supervisor, Jan Aarts, if there was a general topological classification of such Cantor systems, his answer was “no”. If I had been able to ask this question again, his answer might have been very different. Not only has there been an explosion of new types of subshifts that have been investigated in detail, several unified ways to study them and Cantor systems as a whole have been developed as well. That a complete classification is still lacking seems nowadays less important, or of rather abstract interest, than how modern techniques help us to understand concrete (symbolic) systems.


What you will find in this book:

Chapter 1: Chapter 1 presents the basic notions in symbolic dynamics, including itineraries, word-complexity, and word-frequency (and a first mention of shift-invariant measures), and it presents some of the simplest examples. In Section 1.4 we discuss the Curtis-Hedlund-Lyndon Theorem 1.23 on conjugacy between subshifts given by sliding block codes.

Chapter 2: Chapter 2 gives a brief introduction to topological dynamics. We start with dynamical systems and their basic terminology. Then we present transitive (i.e. dynamically indecomposable) systems followed by minimal systems, which can be characterized by bounded gaps in the sequence of visit times to open sets. We discuss the distinction between expansive and expanding, between expansive and distal (and equicontinuity, which is achieved under the appropriate metric). Mean equicontinuity is a more flexible and important variation of this. In addition to topological entropy and power entropy (which are the logarithmic and polynomial growth rates of the word-complexity), we discuss the fairly new notion of amorphic complexity, which is a finer tool than power entropy in distinguishing zero entropy systems, such as constant length substitution shifts and Toeplitz shifts. We discuss several forms of mathematical chaos (in the sense of Devaney, of Li-Yorke, and of Auslander-Yorke for minimal systems) and the Auslander-Yorke dichotomy, with some of its measure-theoretic versions. We present topological mixing in various strengths, and shadowing and specification properties, from Anosov’s and Bowen’s work to its use in intrinsic ergodicity, i.e. the existence of a unique measure of maximal entropy.

Chapter 3: The main types of subshifts are divided into positive entropy and zero entropy subshifts, although this distinction doesn’t apply strictly to every type; for instance zero and positive entropy Toeplitz shifts and ℬ-free shifts might be equally important. Chapter 3 covers the positive entropy subshifts. The main class of these is the subshifts of finite type, followed by sofic shifts. Both can be characterized by transition graphs (vertex-labeled versus edge-labeled) and their topological entropy can be simply computed as the logarithm of the leading (Perron-Frobenius) eigenvalue of their transition matrices, which therefore take specific values only (the logarithms of Perron numbers). More general types of subshifts, in which all values of topological entropy can be achieved, include coded shifts, density shifts, gap shifts, and spacing shifts, which probably appear here for the first time at textbook level. Directly related to
interval maps (β-transformations and unimodal maps) are β-shifts and kneading theory. Our treatment of the latter is broader than in the (by now 40-year-old) monographs of Milnor & Thurston [420] or Collet & Eckmann [164]. We cover at length Hofbauer’s approach of cutting times, which is closely related to the enumeration systems of Section 5.3. Infinitely renormalizable unimodal maps, ∗-products, and (strange) adding machines are deferred to Section 4.7.1 because they fit better in the zero entropy chapter. Finally, we present the square-free subshifts and Dyck shifts, which are of more interest in automata theory than in dynamical systems.

Chapter 4: Zero entropy subshifts are covered in Chapter 4, starting with linear recurrence, which is a property rather than a class of subshifts. From linear recurrence various properties can be derived, including unique ergodicity and linear complexity, and the absence of arbitrarily high powers of subwords. Substitution shifts are a main class of zero entropy subshifts that have been studied in regard to their intriguing ergodic and spectral properties, as demonstrated in Chapter 6. They are central in the theory of tiling spaces, (Rauzy) fractals, and applications such as paper-folding sequences. Their generalization to S-adic shifts is powerful enough to describe most minimal subshifts.

The second main class of zero entropy subshifts is the Sturmian shifts, which are almost one-to-one extensions of circle rotations (via Denjoy’s construction; see Section 4.3.1), but which also appear in many problems in combinatorics. Sturmian sequences have the threefold characterization as symbolic dynamics of irrational circle rotations, sequences of minimal non-trivial word-complexity $p(n) = n + 1$, and minimal 1-balanced sequences. We use them to introduce Rauzy graphs (see Section 4.3.4) and also give a detailed account of their representation as S-adic shifts (generalized to Arnoux-Rauzy shifts of torus rotations).

Circle rotations generalize naturally to interval exchange transformations (IETs), and the corresponding subshifts have word-complexity $p(n) = (d - 1)n + 1$, where $d$ is the number of intervals of the IET. We discuss Rauzy induction, which leads to their representation as S-adic shifts, and this will later (Section 6.3.5) be used to give a solution of the Keane conjecture on the typical unique ergodicity of IETs, bypassing the use of Teichmüller flows in the original proofs of Masur [410] and Veech [543].

Sections 4.5 and 4.6 combine adding machines (odometers), which are not actual subshifts as they are not expansive, with Toeplitz shifts and ℬ-free subshifts, because the latter two are, under mild conditions, one-to-one extensions of odometers. The final Section 4.7 comes largely from some of my own research papers
(with or without coauthors) in topological dynamics. It shows how some minimal subshifts (known and new) arise naturally in the setting of unimodal interval maps, but contrary to the section on kneading theory, they fit in the zero entropy chapter.

Chapter 5: Chapter 5 discusses minimal Cantor systems that are not necessarily subshifts and presents three methods of describing them, namely cutting and stacking, enumeration systems, and Bratteli-Vershik systems. In Section 5.1 we show a basic construction of Kakutani-Rokhlin partitions, probably first described in general form by Herman, Putnam & Skau [310], that can be used as a building block to translate (virtually) any minimal system into any of these three descriptions. Cutting and stacking was used by Kakutani, Chacon, and others to produce the earliest examples of uniquely ergodic systems with particular (weak) mixing or spectral properties. Enumeration systems are a generalization of the Ostrowski numeration (see Example 5.22), which is based on the standard continued fraction expansion. Liardet with various coauthors studied these in detail, obtaining among other things novel results on unique ergodicity. Under certain conditions (related to Pisot numbers), the method also gives a way of associating Rauzy fractals to particular minimal Cantor systems; see Section 5.3.1.

The discussion of Bratteli-Vershik (BV) systems is the most extensive section in this chapter, partly because we give explicit constructions of various subshifts and also of cutting and stacking and of enumeration systems in terms of BV-systems. In this form, these systems frequently return in Chapter 6, where ergodic and spectral properties of minimal Cantor systems are studied. It is worth mentioning that Gjerde & Johansen’s description [275] of IETs follows in fact from Rauzy induction described in Section 4.4. The description of Toeplitz shifts in terms of a BV-system is also given by Gjerde & Johansen [274], and we largely follow their exposition.

Chapter 6: Chapter 6 treats the properties of invariant measures of subshifts and other (minimal) Cantor systems. After briefly recalling the notions of ergodicity, the Choquet (and Poulsen) simplex of invariant probability measures, and Birkhoff’s Ergodic Theorem, we discuss unique ergodicity in Section 6.3. Special topics here are Boshernitzan’s results based on low word-complexity, the “classical” proof of unique ergodicity of primitive substitution shifts, unique ergodicity of BV-systems based on contraction in the Hilbert metric, and a proof of unique ergodicity of typical IETs (the Keane conjecture), based on Rauzy induction. In Section 6.4 we discuss measure-theoretic entropy, preparing our discussion of metrically
isomorphic systems and Ornstein’s Theorem in Section 6.5. We mention the Variational Principle and measures of maximal entropy in Section 6.6 and discuss the notion of entropy denseness. We also present the Shannon-Parry measure (measure of maximal entropy for SFTs).

In Section 6.7 we recall proofs that many minimal subshifts cannot be strongly mixing (i.e. the correlation coefficients cannot tend to zero). This applies to substitution shifts, or more generally linearly recurrent subshifts, and also to cutting and stacking systems with a bounded number of layers of spacer. In contrast, for staircase systems (i.e. cutting and stacking with an unbounded number of slices in between the stacks without bound on the number of layers of spacer) strong mixing can occur. This conjecture by Smorodinsky was proved by Adams, and we present the result in detail.

In Section 6.7.3 we discuss the notion of weak mixing, which in spectral terms means that constant functions are the only eigenfunctions of the Koopman operator. The search for (continuous or measurable) eigenfunctions for various subshifts, specifically substitution shifts, goes back to Dekking, Keane, Queffélec, Host, and coauthors. We follow the more recent approach of Durand, Maass, and coauthors and give detailed proofs of how the condition $|||\alpha h_v(n)||| := d(\alpha h_v(n), \mathbb{Z}) \to 0$ as $n \to \infty$ (where $h_v(n)$ are the height vectors of the Bratteli diagram) relates to $e^{2\pi i \alpha}$ being a continuous or measurable eigenvalue of the Koopman operator.

In Section 6.8 we study spectral properties of Cantor systems, that is, the properties that can be expressed in terms of the spectrum of the Koopman operator $U_T : L^2(\mu) \to L^2(\mu)$, $U_T f = f \circ T$. This includes the spectral measures (Fourier transforms) of observables $f \in L^2(\mu)$ that are important for the spectral decomposition of the Hilbert space $L^2(\mu)$ and the Koopman operator itself. We discuss the spectral measure and spectral type of $U_T$ itself, in particular conditions under which Cantor systems have pure point spectrum or mixed spectrum.

Chapter 7: Chapter 7 discusses ways in which symbol sequences play a role in computer science, information theory, and data transmission. Automata represent a theoretical model for computers and a concrete way of generating sequences, which are hence called automatic sequences. Fixed points of substitutions are automatic, by Cobham’s Little Theorem. We discuss automata and automatic sequences in Section 7.1. In Section 7.2 we present the Chomsky hierarchy of formal languages, consisting of four basic levels of complexity of
their underlying grammar (production rules) and of the type of automaton that can detect or produce them. The lowest level, regular languages, can be produced by finite automata and also corresponds to the sofic subshifts discussed in Section 3.2. Pumping lemmas are a major tool for distinguishing between these levels, and they show that most other types of subshifts discussed in earlier chapters are in the context-sensitive category or beyond.

Chapter 8: Here we give background material to which earlier chapters frequently refer, but some of its sections comprise short topics of independent interest as well. Section 8.1 on Pisot numbers, apart from giving definitions and some historic background, addresses the question of for which $\alpha \in \mathbb{R}$ we have $|||\alpha G_n||| := d(\alpha G_n, \mathbb{Z}) \to 0$ as $n \to \infty$, where $G_n$ is a sequence of integers satisfying a linear recursion formula (such as the height vectors of the Bratteli diagram do). In Section 8.2 we discuss the standard continued fraction algorithm, as well as Farey arithmetic, and associated ways (Kepler, Calkin-Wilf) of enumerating all the rationals. Section 8.3 covers uniformly distributed sequences, Weyl’s criterion, and Van der Corput’s Difference Theorem. Section 8.4 on Diophantine approximation, in addition to definitions and some historic background, discusses Dirichlet’s Theorem, the Lagrange spectrum, and Roth’s Theorem. Section 8.5 covers the (Banach) density and logarithmic density (important in ℬ-free subshifts) of sequences. In Section 8.6 we present the Perron-Frobenius Theorem 8.58, on the leading eigenvalue and eigenspace of non-negative matrices, also in connection with the Hilbert metric on projective space. In Section 8.7 we treat countable Markov graphs and matrices and present the Vere-Jones classification into transient, positive recurrent, and null recurrent systems, including Gurevich entropy and refined classifications by Salama and Ruette. We also give some results on the (non-)existence of measures of maximal entropy for countable state Markov chains. Section 8.7.3 presents the rome technique from the paper [88] by Block et al., which facilitates entropy computations for countable Markov graphs.

The scope and aim of the book: Although the text grew from two courses I gave on symbolic dynamics at the University of Vienna, there is no attempt to shape it as a textbook for a course on symbolic dynamics. The material is too diverse and hardly balanced in depth and detail, and I have made no attempt to indicate which sections together would constitute such a course. My
experience is that hardly any book, however well-conceived and written, completely fits the purpose and taste of the lecturers teaching the actual class. Instead, I hope this book can serve a purpose for topics courses or reading courses or as a reference book for anyone wishing to acquaint him/herself with a particular topic.

The exercises, too, are not devised as a tool for testing the students’ understanding of the material. Symbolic dynamics is at the intersection of dynamical systems, topological dynamics, combinatorics, and of course coding theory, and there are a lot of trivia to share. Some of these trivia are disguised as exercises. Most of the exercises have solutions in the back of the book. I have given, probably multiple times, proofs of simple results that are used as exercises in comparable books, but I wanted to avoid the annoying situation that you cannot refer to a well-known result because it is only presented as an exercise without solution.

The necessary background for the book varies: for most of it a solid knowledge of real analysis and linear algebra and first courses in probability and measure theory (a few times, conditional expectations are used, and martingales in the proof of Theorem 6.118), metric spaces, number theory, topology, and set theory suffice. Chapter 6 is not meant as an introduction to ergodic theory, so a course in ergodic theory, and in Hilbert spaces and Fourier analysis for Section 6.8, is probably necessary. Section 8.1 uses some Galois theory.

By adding an extensive index and cross-references within the main text, I tried to enable the reader to study the chapters independently. However, readers without any prior knowledge of symbolic dynamics should not skip (the first halves of) Chapters 1 and 2. To follow Chapter 5 one should have a good understanding of SFTs (Section 3.1), substitution shifts (Section 4.2), and Sturmian shifts (Section 4.3). To follow Chapter 6, an additional understanding of BV-systems (Section 5.4) is required. Chapter 8 can be read largely independently of the rest, except for several examples with direct references to earlier parts in the text.

The book is largely, but not entirely, self-contained. That would go too far, because various topics are covered much better in other textbooks. General books on ergodic theory are [165, 277, 305, 346, 408, 456, 479, 509, 551]; for proofs of Birkhoff’s Ergodic Theorem we suggest [341, 346, 349, 551], and for the proof of the Variational Principle we recommend [551]. In topological dynamics the book by Auslander is a classic (although difficult to navigate). Other texts are [12, 22, 198, 199, 284, 381]. In symbolic dynamics there are Kitchens [364] and Lind & Marcus [398], both specializing in SFTs, and the more general book by Kůrka [381], and topic collections by Blanchard et al. [85] and Bedford et al. [57]. Substitution shifts have
the expert monographs Queffélec [465], “Pytheas Fogg” [249], and other groups of authors [68]. I should not fail to mention Viana’s monograph [548] on interval exchange transformations. Continued fractions and Diophantine approximation are the subject of monographs [98, 175, 360, 377]. For general dynamical systems there are [17, 113, 346, 474], and [22, 87, 116, 164, 414, 462] for one-dimensional dynamics in particular.

Various milestones in the theory have been treated extensively by people far more expert and well-placed than I am. For example, this holds for Williams’s Theorem on conjugacy in subshifts of finite type; see [364, 398]. This also holds for Ornstein’s Theorem [438] that entropy is a complete invariant among invertible Bernoulli shifts, which we only briefly introduce in Section 6.5 because the full proof is beyond the scope of this book. We refer to [208, 209, 216, 352, 456] for further developments, new (more conceptual) proofs, and more detailed expositions.

How much detail is given for each topic relied on my own taste and judgment, and if I overstretched the reader and/or my own abilities and understanding, then so be it. My apologies to the true experts. Thus I decided to include Li-Yorke chaos but not distributional chaos, some versions of entropy, but no dynamical ζ-functions, no IP-sets, only a few variations of shadowing and specification (see [461] for a monograph on shadowing), no higher-dimensional shifts, and no automorphism groups of shift spaces. Although the first use of symbolic dynamics by Hadamard ([296] in 1898) was for geodesic flows on modular surfaces, this topic does not appear in this book; see e.g. [494]. We cover Kakutani-Rokhlin partitions, cutting and stacking, S-adic transformations, and Bratteli-Vershik systems but no graph covers, although they describe Cantor systems in even further generality. See [273, 500–502] for their constructions and some of the earliest uses.

Partly, this book is a compendium of bits of knowledge and curiosities that are scattered over the literature if not on Wikipedia pages. The material that I present is not equally up to date. For instance, notions such as ℬ-free shifts (or at least the current state) and amorphic complexity are from the past decade or even less, whereas in the section on SFTs, the material is all from before 1990. Topics that are to my knowledge new to textbook and monograph literature include gap shifts, spacing shifts, power-free shifts, ℬ-free shifts, amorphic complexity, and enumeration systems. The breadth of topics allows one to see similarities of methods of proof in different subfields of symbolic dynamics. I hope that my extensive treatment of Bratteli-Vershik systems and unique ergodicity, as well as my treatment of Sturmian shifts and Rauzy fractals (redoing and sometimes reproving results of Arnoux), have some added value. The sections on weak mixing for Bratteli-Vershik
systems and on Adams’s Theorem of mixing staircase systems grew out of topics I set for Master’s Theses for Silvia Radinger and Kathrin Peticzka, respectively.

Acknowledgments: In writing this book, in addition to countless articles, monographs, and survey papers, I had a lot of benefit from various lecture notes, conference presentations, and online lectures in the recent coronavirus-dominated situation. It is impossible to list exactly which. But most of all I would like to thank Lori Alvin, Ana Anušić, Max Auer, Jernej Činč, María Isabel Cortez, Michel Dekking, Fabien Durand, Robbert Fokkink, Gernot Greschronik, Maik Gröger, Jane Hawkins, Olena Karpel, Mike Keane, Tom Kempton, Henna Koivusalo, Cor Kraaikamp, Dominik Kwietniak, Mariusz Lemańczyk, Olga Lukina, Kathrin Peticzka, Silvia Radinger, Michel Rigo, Klaus Schmidt, Dalia Terhesiu, Jörg Thuswaldner, and Reem Yassawi, for proofreading and/or answering a few or many questions, although they didn’t always know it was for this book. The input of several anonymous referees and the work of the AMS publication team are also gratefully acknowledged.

Henk Bruin

Vienna, November 2, 2022

Chapter 1

First Examples and General Properties of Subshifts

Symbolic dynamics is concerned with spaces of (infinite) sequences of symbols. Such sequences can come from the symbolic description of a dynamical system, but they also have intrinsic interest. Symbol sequences are used to code messages, digitally process sound and images, and as the objects that computers process. The “dynamics”, usually, but not exclusively, refers to a transformation $\sigma$ of such sequences in the form of a shift by one unit to the left. For example,

$$
\begin{cases}
\sigma(10011\ldots) = 0011\ldots & \text{for one-sided sequences,} \\
\sigma(011.10011\ldots) = 0111.0011\ldots & \text{for two-sided sequences.}
\end{cases}
$$

That is, for a right-infinite sequence, the first symbol disappears, and all other symbols move a place to the left. For a bi-infinite sequence, the dot that indicates position zero moves one place to the right. A closed $\sigma$-invariant subset of sequences over some fixed set of symbols (the alphabet), combined with this left-shift operation $\sigma$, is called a subshift. In this chapter, we give the basic notions and examples of subshifts and discuss the number and frequency of their subwords.
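The two shift examples above can be reproduced on finite windows of a sequence. The following is only an illustrative sketch: the function names `shift_one_sided` and `shift_two_sided` are hypothetical, and representing a two-sided sequence as a pair of strings (the symbols before and after the dot) is an assumption made for this example, not notation from the text.

```python
def shift_one_sided(x):
    """Left shift on a finite prefix of a one-sided sequence:
    the first symbol disappears, the rest moves one place left."""
    return x[1:]


def shift_two_sided(left, right):
    """Left shift on a finite window of a two-sided sequence.

    `left` holds the symbols before the dot (positions ..., -2, -1) and
    `right` the symbols from position 0 onward (an assumed encoding).
    Shifting moves the dot one place to the right.
    """
    if not right:
        raise ValueError("window too short to shift")
    return left + right[0], right[1:]


print(shift_one_sided("10011"))                    # 0011
print(".".join(shift_two_sided("011", "10011")))   # 0111.0011
```

Only finitely many symbols are stored, so each application of the shift shortens the window on the right; an actual point of the shift space is an infinite sequence.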

1.1. Symbol Sequences and Subshifts

Let $A$ be a finite or countable alphabet of letters. Usually $A = \{0, \ldots, N-1\}$ or $\{0, 1, 2, \ldots\}$, but we can use other letters and symbols too. We are interested in the space of infinite or bi-infinite sequences of letters:

$$
\Sigma = A^{\mathbb{N} \text{ or } \mathbb{Z}} = \{x = (x_i)_{i \in \mathbb{N} \text{ or } \mathbb{Z}} : x_i \in A\}.
$$

Such symbol strings find applications in data-transmission and storage, linguistics, theoretic computer science, and also dynamical systems (symbolic dynamics). A finite string of letters, say x1 · · · xn ∈ An , is called a word or block. A k-word is a word of k letters and  is the empty word (of length 0). We use the notation Ak = {k-words in Σ} and A∗ = {words of any finite length in Σ including the empty word}. Given a subshift (X, σ), a finite word u appearing in some x ∈ X is sometimes called a factor1 of x. If u is concatenated as u = vw, then v is a prefix and w a suffix of u. A cylinder set2 is any set of the form [ek · · · el ] = {x ∈ Σ : xi = ei for k ≤ i ≤ l}. Intersections of cylinder sets are again cylinder sets. The cylinder sets form a basis of the product topology on Σ; i.e. a set is open in the product topology precisely if it can be written as a union of cylinder sets. Note that a cylinder set is both open and closed (because it is the complement of the union of complementary cylinders). Sets that are both open and closed are called clopen. Lemma 1.1. If 2 ≤ #A < ∞, then Σ = AN or Z is a Cantor set (that is, Σ is (i) compact, (ii) has no isolated points, and (iii) its connected components are points). If #A = ∞, then Σ is not compact, but (ii) and (iii) still hold. Proof. (i) Set A = {0, 1, . . . , N − 1} with discrete topology. Clearly A is compact, because it is finite. Compactness of Σ then follows from Tychonov’s Theorem. (ii) No point is isolated, because, for arbitrary x ∈ Σ, the sequence xn defined as xni = xi if i = n and xnn = xn + 1 (mod 1) converges to x. (iii) If x = y, set n = min{|i| : xi = y}; then Z := {x ∈ X : xi = xi for all |i| ≤ n} and X \ Z are two clopen disjoint non-empty sets whose union is X. Thus x and y cannot belong to the same connected component. If #A = ∞, then the collection {[a]}a∈A is an open cover without finite subcover, so Σ is not compact.  
¹ We will rather not use this word, because of possible confusion with the factor of a subshift (= image under a sliding block code; see Section 1.4).
² In greater generality, if $X$ is a topological space and $n \in \mathbb{N} \cup \{\infty\}$, then every set of the form $A \times X^{n-k}$ for $A \subset X^k$ is called a cylinder set. If $X = \mathbb{R}$, $n = 3$, and $A$ is a circle in $\mathbb{R}^2$, then $A \times \mathbb{R}$ is indeed a geometrical cylinder, stretching infinitely far in the $z$-direction.


Shift spaces with product topology are metrizable. One of the usual³ metrics that generates the product topology is

$$d(x, y) = \begin{cases} 0 & \text{if } x = y, \\ 2^{-m} & \text{otherwise, for } m = \sup\{n \ge 0 : x_i = y_i \text{ for all } |i| < n\}; \end{cases} \tag{1.1}$$

so in particular $d(x, y) = 1$ if $x_0 \ne y_0$, and $\operatorname{diam}(\Sigma) = 1$. If $(x^k)_{k \in \mathbb{N}}$ is a sequence of sequences such that $x^k \to x$, then for every $m \in \mathbb{N}$ there is $k_0 \in \mathbb{N}$ such that $d(x^k, x) < 2^{-m}$ for every $k \ge k_0$. The definition of the metric $d$ implies that $x^k_i = x_i$ for all $|i| \le m$. In other words, $x^k \to x$ means that $x^k_{[a,b]}$ is eventually equal to $x_{[a,b]}$ on every finite window $[a, b]$.

The shift map or left-shift $\sigma : \Sigma \to \Sigma$, defined as

$$\sigma(x)_i = x_{i+1} \quad \text{for } i \in \mathbb{N} \text{ or } \mathbb{Z},$$

is invertible on $A^{\mathbb{Z}}$ (with inverse $\sigma^{-1}(x)_i = x_{i-1}$) but non-invertible on $A^{\mathbb{N}}$. We can use the ε-δ definition of continuity with $\delta = \varepsilon/2$ to show that $\sigma$ is uniformly continuous. This is even true if $\#A = \infty$.

Definition 1.2. A pair $(X, \sigma)$ with $X \subset \Sigma$ and $\sigma$ the left-shift is a subshift (often called simply shift) if $X$ is closed (in product topology) and strongly shift-invariant; i.e. $\sigma(X) = X$. If $\sigma$ is invertible, then we also stipulate that $\sigma^{-1}(X) = X$.

For example, if $\Sigma = \{0, 1\}^{\mathbb{Z}}$ and $x = \dots 000.111111 \dots$, then $X = \{\sigma^n(x) : n \ge 0\}$ is not a subshift, because $x \in X$ but $\sigma^{-1}(x) \notin X$. In Examples 1.3–1.6, we use $A = \{0, 1\}$.

Example 1.3. The set $X = \{x \in \Sigma : x_i = 1 \Rightarrow x_{i+1} = 0\}$ is called the Fibonacci shift⁴. It disallows sequences with two consecutive 1's. This Fibonacci shift is an example of a subshift of finite type (SFT); see Section 3.1. The collection $X$ can be represented by a graph in multiple ways:
• $X$ is the collection of labels of infinite paths through the vertex-labeled graph in Figure 1.1 (left). Labels are given to the vertices of the graph, and no label is repeated.
• $X$ is the collection of labels of infinite paths through the edge-labeled graph in Figure 1.1 (right). Labels are given to the arrows of the graph, and labels can be repeated (different arrows with the same label can occur).

³ Other metrics are $d'(x, y) = \sum_i |x_i - y_i| 2^{-|i|}$ or $d''(x, y) = \frac{1}{m}$ with $m$ as in (1.1). They are equivalent to $d(x, y)$: the former in the sense that there is some $C$ such that $\frac{1}{C} d(x, y) \le d'(x, y) \le C d(x, y)$ for all $x, y \in \Sigma$, the latter in the weaker sense that the embedding $i : (\Sigma, d'') \to (\Sigma, d)$ as well as its inverse $i^{-1}$ are uniformly continuous. In either case, they generate the same topology.
⁴ Warning: There is also a Fibonacci substitution shift = Fibonacci Sturmian shift (see Example 4.3), which is different from this one.
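The metric (1.1) and the left-shift can be tried out numerically. The following is a minimal sketch, not part of the text: two-sided sequences are modelled as Python callables on the integers, and the names `dist`, `shift`, and the cut-off `horizon` are assumptions introduced only for this illustration. Sequences that agree on the whole inspected window are reported at distance 0.

```python
def dist(x, y, horizon=64):
    """Metric (1.1): 2**(-m) with m = sup{n >= 0 : x_i == y_i for all |i| < n}.

    x and y are callables Z -> alphabet. Only indices |i| < horizon are
    inspected (an assumption of this sketch), so sequences that agree on
    that window are treated as equal.
    """
    for n in range(horizon):
        # first n at which the symmetric window [-n, n] shows a difference
        if x(n) != y(n) or x(-n) != y(-n):
            return 2.0 ** (-n)
    return 0.0


def shift(x):
    """Left-shift: sigma(x)_i = x_{i+1}."""
    return lambda i: x(i + 1)


zeros = lambda i: 0
bump = lambda i: 1 if i == 3 else 0   # differs from `zeros` only at i = 3

d_before = dist(zeros, bump)                 # first difference at |i| = 3
d_after = dist(shift(zeros), shift(bump))    # the difference moved to i = 2
print(d_before, d_after)
```

In this example the distance at most doubles under the shift, which is the estimate behind the choice $\delta = \varepsilon/2$ in the uniform continuity argument.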


Figure 1.1. Transition graphs: vertex-labeled and edge-labeled.

Example 1.4. $X_{\text{even}} \subset \{0, 1\}^{\mathbb{N}}$ is the collection of infinite sequences in which the 1's appear only in blocks of even length and also $1111 \cdots \in X_{\text{even}}$. We call $X_{\text{even}}$ the even shift. Similarly, the odd shift $X_{\text{odd}}$ is the collection of infinite sequences in which the 0's appear only in blocks of odd length and also $0000 \cdots \in X_{\text{odd}}$; see Figure 1.2.


Figure 1.2. Edge-labeled graphs for Xodd , Xeven , and Xodd ∩ Xeven .

Example 1.5. Let $S$ be a non-empty subset of $\mathbb{N}$. Let $X \subset \{0, 1\}^{\mathbb{Z}}$ be the collection of sequences in which two consecutive 1's always occur $s$ positions apart for some $s \in S$. Hence, sequences in $X$ have the form

$$x = \dots 1\,0^{s_{-1}-1}\,1\,0^{s_0-1}\,1\,0^{s_1-1}\,1\,0^{s_2-1}\,1 \dots$$

where $s_i \in S$ for each $i \in \mathbb{Z}$. If $\#S = \infty$, then allowed sequences can also end and/or start with $0^\infty$. This space is called the $S$-gap shift; see Section 3.7. For $S = \{2, 3, 4, \dots\}$ we get the Fibonacci SFT, and for $S = \{1, 2\}$ we get the Fibonacci SFT with symbols 0 and 1 interchanged.


Example 1.6. The Thue-Morse substitution⁵ $\chi_{TM} : \{0, 1\} \to \{0, 1\}^*$ is a special substitution (see Section 4.2), defined by

$$\chi_{TM} : \begin{cases} 0 \to 01, \\ 1 \to 10 \end{cases}$$

and extended to longer words by concatenation. It has two fixed points:

$$\rho^0 = 01\ 10\ 1001\ 10010110\ 1001011001101001 \dots, \qquad \rho^1 = 10\ 01\ 0110\ 01101001\ 0110100110010110 \dots.$$

These sequences make their appearance in many settings in combinatorics and elsewhere; cf. [19, 561]. For instance, the $n$-th entry of $\rho^0$ (where we start counting at $n = 0$) is the parity of the number of 1's in the binary expansion of $n$. The Thue-Morse sequence $\rho^i$ can be defined by the relation $\rho^i_0 = i$, $\rho^i_{2n} = \rho^i_n$, and $\rho^i_{2n+1} = 1 - \rho^i_n$. Also, if we have a sequence $(P_k)_{k \ge 0}$ of decreasing quality (e.g. rugby players) which we want to divide over two teams $T_0$ and $T_1$, so that the teams are as close in strength as possible, then we assign $P_k$ to team $T_i$ if $i$ is the $k$-th digit of $\rho^0$ (or equivalently, of $\rho^1$). Also the series $\sum_{n \ge 1} \rho^0_n 2^{-n} = 1 - \sum_{n \ge 1} \rho^1_n 2^{-n}$ has been proved to sum to a transcendental number; see e.g. [20, Theorem 13.4.2].

Example 1.7. The alphabet $A$ consists of brackets (, ), [, ] and the allowed words are those (that can be extended to words) consisting of brackets that are properly paired and unlinked. So [ ( [ ] ) ] and ( ( ) [ ] ) are allowed, but [ ( ] and ( [ ) ] are not. The subshift $(X, \sigma)$ of which these are the allowed subwords is called the Dyck shift; see Section 3.10.
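The three descriptions of the Thue-Morse sequence in Example 1.6 (substitution, binary digit parity, and the recurrence) can be compared on a finite prefix. This is a small illustrative sketch; the function names `thue_morse_prefix` and `by_recurrence` are hypothetical and chosen only for this example.

```python
def thue_morse_prefix(start, iterations):
    """Iterate the substitution 0 -> 01, 1 -> 10 on the one-letter word `start`."""
    rules = {"0": "01", "1": "10"}
    word = start
    for _ in range(iterations):
        word = "".join(rules[c] for c in word)
    return word


def by_recurrence(i, length):
    """rho_0 = i, rho_{2n} = rho_n, rho_{2n+1} = 1 - rho_n."""
    seq = [i]
    for n in range(1, length):
        seq.append(seq[n // 2] if n % 2 == 0 else 1 - seq[n // 2])
    return "".join(map(str, seq))


rho0 = thue_morse_prefix("0", 6)   # first 2**6 = 64 letters of rho^0
rho1 = thue_morse_prefix("1", 6)   # first 64 letters of rho^1

# n-th entry of rho^0 as the parity of the number of 1's in the binary expansion of n
parity = "".join(str(bin(n).count("1") % 2) for n in range(len(rho0)))

print(rho0[:16])
print(rho0 == parity, rho0 == by_recurrence(0, 64), rho1 == by_recurrence(1, 64))
```

On this prefix all three constructions agree, and `rho1` is the letter-wise complement of `rho0`.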

⁵ After the Norwegian mathematician Axel Thue (1863–1922) and the American Marston Morse (1892–1977), but the corresponding sequence was used before by the French mathematician Eugène Prouhet (1817–1867), a student of Sturm.

1.2. Word-Complexity

Definition 1.8. Given a subshift $X$, the collection $\mathcal{L}(X) = \{\text{words of any finite length in } X\}$ is called the language of $X$. We use the notation $\mathcal{L}_n(X)$ for all the words in the language of length $n$.

Definition 1.9. The function $p := p_X : \mathbb{N} \to \mathbb{N}$ defined by $p(n) = \#\mathcal{L}_n(X)$ is called the word-complexity of $X$.

Example 1.10. For the Fibonacci SFT of Example 1.3, let $F_n = \#\{w \in \mathcal{L}_n(X) : w_n = 0\}$. Then $F_1 = 1$, $F_2 = 2$, and $F_{n+1} = F_n + F_{n-1}$ for $n \ge 3$ because $F_n$ is the cardinality of the set of $(n+1)$-words ending in $00$ and $F_{n-1}$ is the cardinality of the set of $(n+1)$-words ending in $010$. Therefore the $F_n$'s are the Fibonacci numbers. The same argument gives $p(1) = 2 = F_2$ and $p(n) = F_n + F_{n-1} = F_{n+1}$ for $n \ge 2$.

1.2.1. Sublinear and Polynomial Complexity. We start with some terminology and a useful proposition.

Definition 1.11. We call a word $u \in \mathcal{L}_n(X)$ over the alphabet $A = \{0, 1\}$
• left-special if both $0u$ and $1u$ belong to $\mathcal{L}(X)$;
• right-special if both $u0$ and $u1$ belong to $\mathcal{L}(X)$;
• bi-special if $u$ is both left-special and right-special.
Note, however, that there are different types of bi-special words $u$ depending on how many of the words $0u0$, $0u1$, $1u0$, and $1u1$ are allowed. If only one choice of $0u$ or $1u$ is right-special and only one choice of $u0$ and $u1$ is left-special, then $u$ is a regular bi-special word.

For larger alphabets, we can formulate similar definitions and, naturally, there are more types of left/right/bi-special words. Clearly

$$p(n + 1) - p(n) = \#\{\text{left-special words of length } n\} = \#\{\text{right-special words of length } n\}.$$

The following result goes back to Morse & Hedlund [425].

Proposition 1.12. If the word-complexity of a subshift $(X, \sigma)$ satisfies $p(n) \le n$ for some $n$, then $(X, \sigma)$ consists of finitely many periodic sequences.

Proof. If $p(1) = 1$, then $X = \{a^\infty\}$ is obviously periodic. So assume $p(1) \ge 2$. Since $p$ is non-decreasing, the assumption of this proposition implies that there is a minimal $n$ such that $p(n - 1) = p(n) = n$. Hence there are no right-special words of length $n - 1$. Start with a word $u \in \mathcal{L}_{n-1}(X)$; there is only one way to extend it to the right by one letter, say to $ua$. Then the $(n-1)$-suffix of $ua$ can also be extended to the right by one letter in only one way. Continue this way, until, after at most $p(n) = n$ steps, we obtain an $n$-suffix that we have seen before. All strings need to be extendible to the left in some allowed way (otherwise $\sigma(X) \ne X$). When extending $u$ to the left symbol by symbol, we need to arrive at the same periodic pattern, because otherwise $p(n + 1) > n + 1$.
Therefore $X$ consists of (at most $n$) periodic sequences. □

This proposition shows that the minimal complexity of interest is $p(n) = n + 1$, because if $p(n) \le n$ for some $n$, then $X$ consists of at most $n$ periodic sequences. We say that $(X, \sigma)$ is of sublinear complexity if there is a constant $C$ such that $p(n) \le Cn$. Sturmian sequences (see Section 4.3) have $p(n) = n + 1$; in fact all recurrent words with this word-complexity are Sturmian. There are further possibilities for non-recurrent subshifts. The sequences x = . . . 000.10000 . . .

and

y = 00001111.00000 . . .

both have $p(n) = n + 1$. They are not uniformly recurrent, but asymptotically fixed for $n \to \pm\infty$. Ormes & Pavlov [435, Theorems 1.2 & 1.3] showed that for non-recurrent shifts $(X, \sigma)$ that are not asymptotically periodic in both directions, $\liminf_n p(n)/n \ge \frac{3}{2}$ and that this bound is sharp, as is demonstrated by $z = 0000.1\,0^{n_0}\,1\,0^{n_1}\,1\,0^{n_2}\,1\,0^{n_3}\,1 \dots$ for a carefully chosen increasing sequence of gaps $(n_i)_{i \ge 1}$. In fact, given any non-decreasing function $g : \mathbb{N} \to \mathbb{N}$ that tends to infinity, there is $x \in X$ such that $p_{\{x\}}(n) := \#\{w \text{ is a subword of } x : |w| = n\} < \frac{3}{2} n + g(n)$. In further detail, if a transitive⁶ shift $(X, \sigma)$ with a recurrent point contains $m$ minimal subsystems, of which $m_\infty$ are infinite, then

$$\limsup_{n \to \infty}\; p_X(n) - (m + m_\infty + 1) n = \infty, \qquad \liminf_{n \to \infty}\; p_X(n) - (m + m_\infty) n = \infty,$$

and these bounds are sharp. The second estimate holds also without the existence of a recurrent point. See [230], specifically Theorems 1.2 and 1.3.

Symbolic spaces associated with interval exchange transformations on $d$ intervals have $p(n) = (d - 1)n + 1$; see Proposition 4.80. The Chacon substitution shift and primitive Chacon substitution shift (see Example 1.27) have word-complexity $p(n) = 2n - 1$ (for $n \ge 2$) and $p(n) = 2n + 1$, respectively; see [243]. For many subshifts, $p_X(n)/n$ is bounded in $n$ but hard to compute exactly; often $\lim_n p(n)/n$ doesn't exist. For instance, the word-complexity of the Thue-Morse shift (i.e. the orbit closure $\overline{\{\sigma^n(\rho_{TM}) : n \in \mathbb{N}_0\}}$ of the sequence of Example 1.6) is

$$p(n) = \begin{cases} 3 \cdot 2^m + 4r & \text{if } 0 \le r < 2^{m-1}, \\ 4 \cdot 2^m + 2r & \text{if } 2^{m-1} \le r < 2^m, \end{cases} \tag{1.2}$$

where $n = 2^m + r + 1$; see [115, 406]. In [129], the word-complexity of certain (Fibonacci-like) unimodal restrictions to the critical ω-limit set is computed. The following curious result is due to Heinis; see [150, 311].

Proposition 1.13. If $\lim_n p_X(n)/n$ exists and is finite, then it has to be an integer.

All substitution shifts, in fact all linearly recurrent shifts, have sublinear complexity; see Theorem 4.4.

⁶ See Definition 1.18 below.
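Formula (1.2) can be compared with a direct count of the words in a long prefix of the Thue-Morse sequence. The sketch below is only an illustration under two assumptions: the helper names are hypothetical, and counting words in a finite prefix gives in general only a lower bound for $p(n)$, so the prefix is taken much longer than the word lengths tested. The formula is applied for $n \ge 3$, where $n = 2^m + r + 1$ has a solution with $m \ge 1$.

```python
def thue_morse(length):
    """First `length` letters of rho^0, via the parity of binary digit sums."""
    return "".join(str(bin(n).count("1") % 2) for n in range(length))


def complexity(sequence, n):
    """Number of distinct n-words occurring in the finite string `sequence`."""
    return len({sequence[i:i + n] for i in range(len(sequence) - n + 1)})


def formula_1_2(n):
    """Right-hand side of (1.2) for n >= 3, writing n = 2**m + r + 1 with 0 <= r < 2**m."""
    m = (n - 1).bit_length() - 1      # largest m with 2**m <= n - 1
    r = n - 1 - 2 ** m
    if r < 2 ** (m - 1):
        return 3 * 2 ** m + 4 * r
    return 4 * 2 ** m + 2 * r


rho = thue_morse(4096)
counted = [complexity(rho, n) for n in range(3, 17)]
predicted = [formula_1_2(n) for n in range(3, 17)]
print(counted)
print(predicted)
```

The ratio $p(n)/n$ computed this way oscillates, in line with the remark that $\lim_n p(n)/n$ often does not exist.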


The polynomial growth rate is defined as $r = \lim_n \frac{\log p(n)}{\log n}$. Naturally, linear complexity implies $r = 1$, but every $r \in \{0\} \cup [1, \infty]$ is possible. Subshifts with polynomial growth rate $r > 1$ are less studied, but for example symbolic spaces for polygonal billiards on $d$-dimensional billiard tables can have polynomial growth rate $r = d$.

1.2.2. Exponential Complexity. Anticipating the definition for general dynamical systems in Section 2.4, for subshifts, the topological entropy is the exponential growth rate of the word-complexity:

$$h_{top}(\sigma) = \lim_{n \to \infty} \frac{1}{n} \log p_X(n). \tag{1.3}$$

To show that the limit in (1.3) exists, we need one more notion and one well-known lemma.

Definition 1.14. We call a real-valued sequence $(a_n)_{n \ge 1}$ subadditive if

$$a_{m+n} \le a_m + a_n \quad \text{for all } m, n \ge 1.$$

Analogously, $(a_n)_{n \ge 1}$ is superadditive if $a_{m+n} \ge a_m + a_n$ for all $m, n \in \mathbb{N}$.

Lemma 1.15 (Fekete's Subadditive Lemma). If $(a_n)_{n \ge 1}$ is subadditive, then $\lim_n \frac{a_n}{n} = \inf_{r \ge 1} \frac{a_r}{r}$ (possibly $-\infty$). Analogously, if $(a_n)_{n \ge 1}$ is superadditive, then $\lim_n \frac{a_n}{n} = \sup_{r \ge 1} \frac{a_r}{r}$ (possibly $+\infty$).

Proof. Every integer $n$ can be written as $n = i \cdot r + j$ for $0 \le j < r$. Therefore

$$\limsup_{n \to \infty} \frac{a_n}{n} = \limsup_{i \to \infty} \frac{a_{i \cdot r + j}}{i \cdot r + j} \le \limsup_{i \to \infty} \frac{i a_r + a_j}{i \cdot r + j} = \frac{a_r}{r}.$$

This holds for all $r \in \mathbb{N}$, so we obtain

$$\inf_{r \in \mathbb{N}} \frac{a_r}{r} \le \liminf_{n \to \infty} \frac{a_n}{n} \le \limsup_{n \to \infty} \frac{a_n}{n} \le \inf_{r \in \mathbb{N}} \frac{a_r}{r},$$

as required. The proof for superadditive sequences goes likewise. □

Remark 1.16. A positive sequence $(a_n)_{n \in \mathbb{N}}$ is submultiplicative if $a_{m+n} \le a_m a_n$ (and supermultiplicative if $a_{m+n} \ge a_m a_n$). By taking logarithms, we can turn a sub/supermultiplicative sequence into a sub/superadditive one, and this suffices for our purposes.

We devote separate chapters to subshifts of positive and subshifts of zero entropy, because most⁷ tend to have different topological properties such as topological mixing, existence and number of periodic orbits, shadowing; see Chapter 2. The maximal entropy of a subshift on $N$ letters is $\log N$, and this is achieved by the full shift $(\{0, \dots, N - 1\}^{\mathbb{N}}, \sigma)$. One can ask whether all intermediate values between $0$ and $\log N$ can be achieved as topological entropy for some subshift. As we shall see later, this is not true for the class of subshifts of finite type or the sofic shifts, because the entropy is then equal to the logarithm of the leading eigenvalue of some integer matrix, so logarithms of algebraic numbers and, in fact, Perron numbers; see [397] and (the text below) Definition 8.4. On the other hand, the topological entropy of β-shifts $(X_\beta, \sigma)$ can take any value $\ge 0$, because $h_{top}(X_\beta) = \log \beta$. Also within the class of gap shifts one can achieve every value of the entropy, as can be derived from Theorem 3.114. Some specific constructions of subshifts of a chosen entropy can be found among spacing shifts; see [380] and Section 3.8.

⁷ At least most shifts we encounter in this book, but it is not a strict rule. For example, Petersen's shift mentioned below Theorem 2.77 has zero entropy and is topologically mixing, while Grillenberger's [287] construction gives minimal shifts of positive entropy (and therefore lacking periodic orbits), with further examples among Toeplitz shifts (see Theorem 4.94) and B-free shifts (Section 4.6).

Remark 1.17. For many subshifts in $A^{\mathbb{N} \text{ or } \mathbb{Z}}$, the topological entropy can be computed exactly, but not so for subshifts in $A^{\mathbb{Z}^d}$, i.e. cellular automata. Even for the simplest direct generalization of the Fibonacci SFT, namely 0-1-patterns on $\mathbb{Z}^2$ where no two 1's occur directly next to each other (horizontally or vertically), the entropy $\lim_{m,n \to \infty} \frac{1}{mn} \log p_X(m, n)$ is unknown. There are however numerical approximations (e.g. for this example, the entropy equals $0.5878116\dots$ with these digits certainly correct; see [251]) and characterizations of which values can occur; see e.g. [259, 260, 289, 313, 314, 399].

1.3. Transitive and Synchronized Subshifts

The following definition expresses that all parts of a subshift connect to each other:

Definition 1.18. A subshift $X$ is transitive or irreducible if for every $u, w \in \mathcal{L}(X)$, there is $v \in \mathcal{L}(X)$ such that $uvw \in \mathcal{L}(X)$.

This definition does not automatically produce periodic sequences, because also if $u = w$, so $uvu \in \mathcal{L}(X)$, then it doesn't follow that $uvuvu \in \mathcal{L}(X)$.

Definition 1.19. A subshift $(X, \sigma)$ is called synchronized if it is transitive and there is a word $v \in \mathcal{L}(X)$ (called (intrinsically) synchronizing word⁸) such that $uv, vw \in \mathcal{L}(X)$ implies $uvw \in \mathcal{L}(X)$.

In other words, the appearance of $v$ cancels the influence of the past.

Theorem 1.20. A synchronized shift $(X, \sigma)$ has a dense set of periodic points. If $X$ is not periodic itself, then the entropy $h_{top}(X, \sigma) > 0$.

⁸ Kitchens in [364] calls it a magic Markov word.


Proof. Let $v$ be a synchronizing word and let $x \in \mathcal{L}(X)$ be arbitrary. Since a synchronized $X$ is, by definition, transitive, there are words $u, w \in \mathcal{L}(X)$ such that $xuv \in \mathcal{L}(X)$ and $vwx \in \mathcal{L}(X)$. Now the infinite periodic word $(xuvw)^\infty$ belongs to $X$. Since $x \in \mathcal{L}(X)$ was arbitrary, denseness of periodic words follows.

Next use transitivity again to find distinct words $u, u' \in \mathcal{L}(X)$ such that $vuv, vu'v \in \mathcal{L}(X)$. Let $X'$ be the subshift constructed by free concatenations of $vu$ and $vu'$; clearly $X'$ is a subshift of $X$, and for $N = \max\{|v| + |u|, |v| + |u'|\}$ we find $p_{X'}(nN) \ge 2^n$. Hence $h_{top}(X, \sigma) \ge h_{top}(X', \sigma) \ge \frac{1}{N} \log 2 > 0$. □

Example 1.21. The Fibonacci SFT (see Example 1.3) has synchronizing word 0. In this case, every 1 is preceded and succeeded by a 0. If we swap 0's and 1's, then we obtain the $S$-gap shift with gap sizes 1 and 2. Hence $h_{top}(X, \sigma) = \log \lambda$ where $\lambda^{-1} + \lambda^{-2} = 1$, so $\lambda = \frac{1 + \sqrt{5}}{2}$. This is in agreement with Example 1.10.
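The agreement between Example 1.21 and Example 1.10 can be checked by brute force: enumerate the words of the Fibonacci SFT, confirm the Fibonacci recursion for $p(n)$, and compare $\frac{1}{n} \log p(n)$ with $\log \lambda$. This is a minimal sketch; the function name `fibonacci_words` and the cut-off at word length 15 are choices made only for this illustration.

```python
import math


def fibonacci_words(n):
    """All words of length n over {0, 1} that do not contain the factor 11."""
    words = [""]
    for _ in range(n):
        words = [w + a for w in words for a in "01"
                 if not (a == "1" and w.endswith("1"))]
    return words


# word-complexity p(1), ..., p(15) of the Fibonacci SFT
p = [len(fibonacci_words(n)) for n in range(1, 16)]

golden = (1 + math.sqrt(5)) / 2                              # solves 1/l + 1/l**2 = 1
rates = [math.log(p[n - 1]) / n for n in range(1, 16)]       # (1/n) log p(n)

print(p[:8])
print(rates[-1], math.log(golden))
```

By Fekete's Lemma the quantities $\frac{1}{n} \log p(n)$ stay above their limit $\log \lambda$, and the printed values show how slowly they approach it.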

1.4. Sliding Block Codes

Definition 1.22 (Sliding Block Code). Let $A$ and $\tilde{A}$ be alphabets. A map $\pi : A^{\mathbb{Z}} \to \tilde{A}^{\mathbb{Z}}$ is called a sliding block code or local rule of window size $2N + 1$ if there is a function $f : A^{2N+1} \to \tilde{A}$ such that $\pi(x)_i = f(x_{i-N} \cdots x_{i+N})$.

In other words, we have a window⁹ of width $2N + 1$ put on the sequence $x$. If it is centered at position $i$, then the recoded word $y = \pi(x)$ will have at position $i$ the $f$-image of what is visible in the window. After that we slide the window to the next position and repeat.

Theorem 1.23 (Curtis–Hedlund–Lyndon¹⁰). Let $X$ and $Y$ be subshifts over finite alphabets $A$ and $\tilde{A}$, respectively. A continuous map $\pi : X \to Y$ commutes with the shift (i.e. $\sigma \circ \pi = \pi \circ \sigma$) if and only if $\pi$ is a sliding block code. If $\pi : X \to Y$ is a homeomorphism, then we call $(X, \sigma)$ and $(Y, \sigma)$ conjugate.

Proof. First assume that $\pi$ is continuous and commutes with the shift. For each $a \in \tilde{A}$, the cylinder $[a] = \{y \in Y : y_0 = a\}$ is clopen, so $V_a := \pi^{-1}([a])$ is clopen too. Since $V_a$ is open, it can be written as the union of cylinders, and since $V_a$ is closed (and hence compact) it can be written as the finite union of cylinders: $V_a = \bigcup_{i=1}^{r_a} U_{a,i}$. Let $N$ be so large that every $U_{a,i}$ is the union of $(2N+1)$-cylinders $U_{a,i,j}$; i.e. each $U_{a,i,j}$ is determined by a word $x_{-N} \cdots x_N$. This makes $2N + 1$ a sufficient window size and there is a function $f : A^{2N+1} \to \tilde{A}$ such that $\pi(x)_0 = f(x_{-N} \cdots x_N)$. By shift-invariance, $\pi(x)_i = f(x_{i-N} \cdots x_{i+N})$ for all $i \in \mathbb{Z}$.

Conversely, assume that $\pi$ is a sliding block code of window size¹¹ $2N + 1$. Take $\varepsilon = 2^{-M} > 0$ arbitrary and $\delta = \varepsilon 2^{-N}$. If $d(x, y) < \delta$, then $x_i = y_i$ for $|i| \le M + N$. By the construction of the sliding block code, $\pi(x)_i = \pi(y)_i$ for all $|i| \le M$. Therefore $d(\pi(x), \pi(y)) < \varepsilon$, so $\pi$ is continuous (in fact uniformly continuous). □

Exercise 1.24. Give a surjective sliding block code between the Fibonacci SFT and the even subshift (see Examples 1.3 and 1.4).

Corollary 1.25. If $(X, \sigma)$ and $(Y, \sigma)$ are conjugate shifts, then there is $N$ such that

$$p_X(k - N) \le p_Y(k) \le p_X(k + N) \quad \text{for } k > N.$$

Proof. Let $2N + 1$ be the maximal window size among the sliding block codes from $X$ to $Y$ and from $Y$ to $X$. Then every $k$-word in $Y$ is obtained from an $(N + k)$-word in $X$, so $p_Y(k) \le p_X(N + k)$. Interchanging the roles of $X$ and $Y$ gives the other inequality. □

Exercise 1.26. If $\psi : X \to Y$ is an onto sliding block code which is $k$-to-one for some fixed $k$, show that $h_{top}(X, \sigma) = h_{top}(Y, \sigma)$.

Example 1.27. The following substitutions (see Section 4.2) are called the Chacon substitution and primitive Chacon substitution:

$$\chi_{chac} : \begin{cases} 0 \to 0010, \\ 1 \to 1 \end{cases} \qquad \text{and} \qquad \chi_{Chac} : \begin{cases} 0 \to 0021, \\ 1 \to 021, \\ 2 \to 21, \end{cases} \tag{1.4}$$

with fixed points

$$\rho_{chac} = 0010\ 0010\ 1\ 0010\ 0010001010\ 1\ 0010 \dots, \qquad \rho_{Chac} = 0021\ 0021\ 21\ 021\ 0021002121021 \dots.$$

They can be transformed into each other using the sliding block code

$$f : \begin{cases} 00a \to 0, \\ 10a \to 1, \\ 1 \to 2, \end{cases} \quad a \in \{0, 1\} \qquad \text{and} \qquad f^{-1} : \begin{cases} 0 \to 0, \\ 1 \to 0, \\ 2 \to 1, \end{cases}$$

and this extends to the shift orbit closures

$$X_{chac} = \overline{\{\sigma^n(\rho_{chac}) : n \ge 0\}} \qquad \text{and} \qquad X_{Chac} = \overline{\{\sigma^n(\rho_{Chac}) : n \ge 0\}}.$$

⁹ Sometimes the window can have memory and anticipation of different lengths, so the window would be $[-m, n]$, but calling their maximum $N$ covers all cases.
¹⁰ Curtis and Lyndon were working for the military at the time, so their work was "classified", and the paper was published under Hedlund's name only, [308].
¹¹ If $(X, \sigma)$ is a one-sided subshift, with window size $[0, N]$ (so no memory, only anticipation), then this part of the proof still works. The first part of the proof fails: one must first extend $(X, \sigma)$ to a two-sided shift before the Curtis–Hedlund–Lyndon Theorem can be applied in full.


Therefore, these substitution shifts are topologically conjugate, although the word-complexities are different: $p_{X_{chac}}(1) = 2$, $p_{X_{chac}}(n) = 2n - 1$ for $n \ge 2$ and $p_{X_{Chac}}(n) = 2n + 1$ for $n \ge 1$; see [243].
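The recoding between the two Chacon fixed points of Example 1.27 can be verified on finite prefixes. The sketch below rests on an assumption about how the local rule $f$ is read: the window is taken to be $x_{i-1} x_i x_{i+1}$ centered at $x_i$, so a 1 is recoded to 2, a 0 directly after a 1 is recoded to 1, and any other 0 stays 0 (the first letter, which has no predecessor, is treated as preceded by 0). The names `iterate`, `f`, and `f_inverse` are hypothetical.

```python
def iterate(rules, start, iterations):
    """Apply a substitution (dict letter -> word) `iterations` times to `start`."""
    word = start
    for _ in range(iterations):
        word = "".join(rules[c] for c in word)
    return word


CHAC = {"0": "0010", "1": "1"}                        # chi_chac of (1.4)
PRIMITIVE_CHAC = {"0": "0021", "1": "021", "2": "21"}  # chi_Chac of (1.4)


def f(y):
    """Sliding block code {0,1}-words -> {0,1,2}-words (window reading assumed above)."""
    out = []
    for i, c in enumerate(y):
        if c == "1":
            out.append("2")                 # rule 1 -> 2
        elif i > 0 and y[i - 1] == "1":
            out.append("1")                 # rule 10a -> 1
        else:
            out.append("0")                 # rule 00a -> 0
    return "".join(out)


def f_inverse(x):
    """Letter-to-letter code 0 -> 0, 1 -> 0, 2 -> 1."""
    return x.translate(str.maketrans("012", "001"))


rho_chac = iterate(CHAC, "0", 5)
rho_Chac = iterate(PRIMITIVE_CHAC, "0", 5)
print(rho_chac[:13], rho_Chac[:13])
print(f(rho_chac) == rho_Chac, f_inverse(rho_Chac) == rho_chac)
```

The check works because in $\rho_{Chac}$ the letter 1 only occurs inside the block 21, so the letter-to-letter map can be undone by looking one position to the left.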


Figure 1.3. The edge-labeled transition graph of the 2-block even shift.

Each subshift $(X, \sigma)$ over an alphabet $A$ is conjugate to an $\ell$-block shift, where the alphabet $\tilde{A} \subset A^\ell$ consists of the words in $\mathcal{L}_\ell(X)$ and $a, b \in \tilde{A}$ can only follow each other if the $(\ell - 1)$-suffix of $a$ coincides with the $(\ell - 1)$-prefix of $b$. For instance, if $(X_{\text{even}}, \sigma)$ is the even shift and $\ell = 2$, then $\tilde{A} = \{00, 01, 10, 11\}$ and the edge-labeled transition graph is given in Figure 1.3. Note that to recover the coding of paths in the original shift, we use only the first letters of the codes at the edges. Taking a block shift generally doesn't change the nature of the shift (SFTs remain SFTs, substitution shifts remain substitution shifts, see Section 6.3.2, etc.). Block shifts can be used to shrink the window size of sliding block codes; see [398, Proposition 1.5.12].

Proposition 1.28. If $\pi$ is a sliding block code between $X$ and $Y$ of window size $2N + 1$, then there is a sliding block code $\tilde{\pi}$ (of window size 1) between the $(2N + 1)$-block shift $\tilde{X}$ of $X$ and $Y$.

Proof. We do the proof for invertible shifts; the one-sided shifts work as well, but then we cannot allow a memory in the sliding block code, only anticipation. Let $\varphi : X \to \tilde{X}$ be the sliding block code that recodes the $(2N + 1)$-blocks in $X$ into the letters of $\tilde{A}$; i.e. $\varphi(x)_i = f(x_{i-N} \cdots x_{i+N})$. Then $\tilde{\pi} = \pi \circ \varphi^{-1}$ is the required sliding block code. □

1.5. Word-Frequencies and Shift-Invariant Measures

In addition to the number of words, we can also study the frequency of words $w$ appearing inside infinite sequences:

$$f_w(x) = \lim_{n \to \infty} \frac{1}{n} \#\{0 \le i < n : x_i \cdots x_{i+|w|-1} = w\}. \tag{1.5}$$

The question of whether the limit exists and to what extent it depends on $x$ is answered by Birkhoff's Ergodic Theorem 6.13. For this we need a measure $\mu$ that assigns a number to every cylinder set, according to the following rules:
(i) $0 \le \mu([w]) \le 1$ for every cylinder $[w]$;
(ii) $\mu(\emptyset) = 0$, $\mu(X) = 1$;
(iii) $\mu(\bigcup_i [w_i]) = \sum_i \mu([w_i])$ for all disjoint cylinders $[w_1], [w_2], \dots$.

The Kolmogorov Extension Theorem (see e.g. [56, Section 21.10]) implies that $\mu$ can be extended uniquely to every set in the σ-algebra $\mathcal{B}$ generated by the cylinder sets. Thus, if $x \in X$ is such that $f_w(x)$ exists for every $w \in \mathcal{L}(X)$, then there is a shift-invariant probability measure $\mu$ such that $\mu([w]) = f_w(x)$ for all $w \in \mathcal{L}(X)$.

Remark 1.29. The Kolmogorov Extension Theorem is about extending probability measures $\mu_n$ on finite Cartesian products $X^n$ (equipped with an $n$-fold-product σ-algebra) to a measure on the infinite product $X^\infty$ (equipped with an infinite-product σ-algebra). That is, if $\mu_{n+1}(A \times X) = \mu_n(A)$ for every $n \in \mathbb{N}$ and measurable set $A \subset X^n$, then there is a unique probability measure $\mu$ on $X^\infty$ such that $\mu(A \times X^\infty) = \mu_n(A)$ for every $n \in \mathbb{N}$ and measurable set $A \subset X^n$. This carries over to indicator functions. Linear combinations of indicator functions $1_A$ with $A \subset X^n$, $n \in \mathbb{N}$, lie dense in $L^1(\mu)$; i.e. for every $\psi \in L^1(\mu)$ and $\varepsilon > 0$ there is $N$ and there are finitely many sets $A_k \subset X^N$ and $a_k \in \mathbb{R}$ such that $\int_{X^\infty} |\psi - \sum_k a_k 1_{A_k}| \, d\mu < \varepsilon$.

Definition 1.30. A measure $\mu$ on a subshift $(X, \sigma)$ is called invariant or shift-invariant if $\mu(B) = \mu(\sigma^{-1} B)$ for all $B \in \mathcal{B}$. A measure is called ergodic if $\sigma^{-1}(A) = A \bmod \mu$ for some $A \in \mathcal{B}$ implies that $\mu(A) = 0$ or $\mu(A^c) = 0$. That is, the only measurable shift-invariant sets are nullsets or the whole space up to a nullset.

Birkhoff's Ergodic Theorem 6.13 implies that if $\mu$ is an ergodic shift-invariant probability measure on $(X, \sigma)$, then for $\mu$-a.e. $x \in X$, $f_w(x) = \mu([w])$ for all $w \in \mathcal{L}(X)$. However, if $f_w(x)$ exists for every $w \in \mathcal{L}(X)$, the associated measure need not be ergodic. For example, the sequence

$$x = 1001110000111110000001111111 \cdots 0^n 1^{n+1} \cdots$$

is associated to a combination of Dirac measures $\frac{1}{2}(\delta_{0^\infty} + \delta_{1^\infty})$, and this measure is clearly not ergodic.

Remark 1.31. Regardless of whether μ is ergodic or not, we call it a generic measure if there is a point x ∈ X such that the frequency fw (x) = μ([w]) for all w ∈ L(X).


Definition 1.32. Let $A = \{1, 2, \dots, d\}$ and $X = A^{\mathbb{N} \text{ or } \mathbb{Z}}$ be the full shift space. Let $p = (p_1, \dots, p_d)$ be a probability vector; i.e. $p_i \ge 0$ and $p_1 + \cdots + p_d = 1$. The product measure that assigns to every cylinder set

$$\mu_p([x_k \cdots x_l]) = p_{x_k} p_{x_{k+1}} \cdots p_{x_l}$$

is called the $p$-Bernoulli measure. The measure can be extended to the Borel σ-algebra by means of the Kolmogorov Extension Theorem. Each $p$-Bernoulli measure is shift-invariant.

Bernoulli measures¹² are a basic tool in probability theory. For example, encode a sequence of coin-flips by, say, $x_i = 0$ if the $i$-th flip gives a "head", and $x_i = 1$ if the $i$-th flip gives a "tail". This gives a sequence $x \in \{0, 1\}^{\mathbb{N}}$. If the coin has a bias, say "head" comes up with probability $p > \frac{1}{2}$ and "tail" with probability $q = 1 - p < \frac{1}{2}$, then the probability of a word can be computed by multiplying probabilities; e.g. the probability $\mathbb{P}(x_1 x_2 x_3 x_4 = 0010) = p^3 q$.

Definition 1.33. A subshift $(X, \sigma)$ is uniquely ergodic if it admits only one invariant probability measure. If $(X, \sigma)$ is both uniquely ergodic and minimal, it is called strictly ergodic. (This should not be confused with intrinsically ergodic which means that there is a unique measure of maximal entropy; see Definition 6.70.)

The full shift is obviously not uniquely ergodic: it has for instance a Bernoulli measure for every probability vector $p$. Neither are SFTs, sofic shifts, or β-shifts (which are, in fact, intrinsically ergodic). The Thue-Morse shift on the other hand is uniquely ergodic. Clearly, unique ergodicity implies intrinsic ergodicity, but not the other way around. It follows from Oxtoby's Theorem 6.20 that a recurrent subshift $(X, \sigma)$ is uniquely ergodic if and only if $f_w(x)$ exists and is the same for every $x \in X$. In this case, the convergence in the limit (1.5) is uniform in $x$.
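Both notions of this section are easy to experiment with: the finite-$n$ quotient in (1.5) and the Bernoulli measure of a cylinder. The sketch below uses exact rational arithmetic; the function names, the choice of the Thue-Morse prefix as test sequence, and the bias $p = \frac{2}{3}$ are assumptions made only for this illustration.

```python
from fractions import Fraction


def frequency(x, w, n):
    """Finite-n version of (1.5): (1/n) #{0 <= i < n : x_i ... x_{i+|w|-1} = w}.

    `x` must be a string of length at least n + len(w) - 1.
    """
    hits = sum(x[i:i + len(w)] == w for i in range(n))
    return Fraction(hits, n)


def bernoulli(word, p):
    """Measure of the cylinder [word] under the p-Bernoulli measure.

    `p` maps each letter to its probability.
    """
    result = Fraction(1)
    for letter in word:
        result *= p[letter]
    return result


# prefix of the Thue-Morse sequence rho^0 (binary digit-sum parity)
thue_morse = "".join(str(bin(n).count("1") % 2) for n in range(2 ** 12 + 8))
freq_zero = frequency(thue_morse, "0", 2 ** 12)

# biased coin: "head" = 0 with probability 2/3, "tail" = 1 with probability 1/3
p = {"0": Fraction(2, 3), "1": Fraction(1, 3)}
prob_0010 = bernoulli("0010", p)          # equals p**3 * q

print(freq_zero, prob_0010)
```

Since the prefix of length $2^{12}$ is an iterate of the substitution applied to 0, it contains equally many 0's and 1's, so the computed frequency of the letter 0 is exactly $\frac{1}{2}$.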

¹² Named after Jacob Bernoulli, one of the mathematicians' family originating from Basel who wrote the book "Ars conjectandi", one of the first books on probability theory.

1.6. Symbolic Itineraries

An important application of symbol sequences is to use them to represent trajectories of dynamical systems (see Section 2.1 for an introduction to dynamical systems). It was probably Hadamard who first used this idea in his studies of geodesic flows [296]. Over 40 years later, Morse & Hedlund [425] wrote the first monograph on symbolic dynamics. If $T : X \to X$ is some map on a topological space, denote the $n$-fold compositions by $T^n = T \circ \cdots \circ T$ (and $T^{-n}$ is the $n$-fold composition of the inverse of $T$ if $T$ is invertible). Symbolic dynamics emerges from the dynamical system $(X, T)$ by coding the $T$-orbits of the points $x \in X$. To this end, for a finite or countable alphabet $A$, we let $\mathcal{J} = \{J_a\}_{a \in A}$ be a partition of $X$. Then to each $x \in X$ we assign an itinerary $i(x) \in A^{\mathbb{N}_0}$:

$$i_n(x) = a \quad \text{if } T^n(x) \in J_a.$$

Figure 1.4. A circle rotation $R_\alpha(x) = x + \alpha \pmod 1$, a β-transformation $T_\beta(x) = \beta x \pmod 1$, and the quadratic map $f_4(x) = 4x(1 - x)$.

If $T$ is invertible, then we can extend itineraries to sequences in $A^{\mathbb{Z}}$. It is clear that $i \circ T(x) = \sigma \circ i(x)$. Therefore, $i(X)$ is σ-invariant and if $T : X \to X$ is onto, then $\sigma(i(X)) = i(X)$. In general, however, $i(X)$ is not closed, so we need to take the closure before it can be called a subshift. Using this subshift, we can often show the abundance of different trajectories (periodic or with other properties) of the original system $(X, T)$.

Example 1.34. Let $X$ be the closure of the collection of symbolic itineraries of a circle rotation $R_\alpha : \mathbb{S}^1 \to \mathbb{S}^1$ over angle $\alpha \in [0, 1] \setminus \mathbb{Q}$; see Figure 1.4 (left). We use the partition $\mathcal{J} = \{J_0, J_1\}$ with $J_0 = [0, \alpha)$ and $J_1 = [\alpha, 1)$. Hence, if $y \in \mathbb{S}^1$ and $n \in \mathbb{Z}$, then

$$i(y)_n = \begin{cases} 0 & \text{if } R^n_\alpha(y) \in [0, \alpha), \\ 1 & \text{if } R^n_\alpha(y) \in [\alpha, 1). \end{cases}$$

Slightly different coding comes from the partition $\{(0, \alpha], (\alpha, 1]\}$, but the closure of $i(\mathbb{S}^1)$ is the same for both partitions. The resulting shift is called a Sturmian shift; see Definition 4.60.

Example 1.35. Consider the β-transformation $T_\beta : [0, 1] \to [0, 1]$, $T_\beta(x) = \beta x \bmod 1$ (see Figure 1.4 (middle)), and $i(x)_n = a$ if $T^n_\beta(x) \in J_a := [\frac{a}{\beta}, \frac{a+1}{\beta})$. The closure of $i([0, 1])$ is called a β-shift; see Section 3.5.

Example 1.36. Let $X = [0, 1]$ and $T(x) = f_4(x) = 4x(1 - x)$; see Figure 1.4 (right). Let $J_0 = [0, \frac{1}{2}]$ and $J_1 = (\frac{1}{2}, 1]$. Then $i(X)$ is not closed, because there is no $x \in [0, 1]$ such that $i(x) = 1100000 \dots$, while $1100000 \cdots = \lim_{x \searrow \frac{1}{2}} i(x)$. Naturally, redefining the partition to $J_0 = [0, \frac{1}{2})$ and $J_1 = [\frac{1}{2}, 1]$ doesn't help, because then there is no $x \in [0, 1]$ such that $i(x) = 0100000 \dots$, while $0100000 \cdots = \lim_{x \nearrow \frac{1}{2}} i(x)$. This shows that we have to take the closure to obtain a subshift.

There are other ways of coding in the literature to obtain a subshift:
• Assign a different symbol (often $*$ or $C$) to $\frac{1}{2}$. That is, using the partition $J_0 = [0, \frac{1}{2})$, $J_* = \{\frac{1}{2}\}$ and $J_1 = (\frac{1}{2}, 1]$. This resolves the "ambiguity" about which symbol to give to $\frac{1}{2}$, but it doesn't make the shift space closed.
• Assign the two symbols to $\frac{1}{2}$, so $J_0 = [0, \frac{1}{2}]$ and $J_1 = [\frac{1}{2}, 1]$ are no longer a partition but have $\frac{1}{2}$ in common. Therefore $\frac{1}{2}$ will have two itineraries and so will every point in the backward orbit of $\frac{1}{2}$. With all these extra itineraries, $i(X)$ becomes closed. But this doesn't work in all cases; see Exercise 1.37.
• Take a quotient space $i(X)/\sim$ where in this case $x \sim y$ if there is $n \in \mathbb{N}_0$ such that
$$x_0 \cdots x_{n-1} = y_0 \cdots y_{n-1} \quad \text{and} \quad \begin{cases} x_n x_{n+1} x_{n+2} x_{n+3} x_{n+4} \cdots = 11000 \dots, \\ y_n y_{n+1} y_{n+2} y_{n+3} y_{n+4} \cdots = 01000 \dots \end{cases}$$
or vice versa. This quotient space adopts the quotient topology (so $i(X)/\sim$ is not a Cantor set anymore), and it turns the coding map $i : [0, 1] \to \{0, 1\}^{\mathbb{N}_0}/\sim$ into a genuine homeomorphism.

Exercise 1.37. Let $a = 3.83187405528332\dots$ and $T(x) = f_a(x) = ax(1 - x)$. For this parameter, $T^3(\frac{1}{2}) = \frac{1}{2}$. Let $\mathcal{J}' = \{[0, \frac{1}{2}], (\frac{1}{2}, 1]\}$ and $\mathcal{J} = \{[0, \frac{1}{2}], [\frac{1}{2}, 1]\}$, so $\frac{1}{2}$ gets two symbols. Let $\Sigma' = i(X)$ w.r.t. $\mathcal{J}'$ and $\Sigma = i(X)$ w.r.t. $\mathcal{J}$. Show that $\Sigma' \ne \Sigma$.

From now on, assume that $X$ is a compact metric space without isolated points. We will now discuss the properties of the coding map $i$ itself. First of all, for $i$ to be continuous, it is crucial that $T|_{J_a}$ is continuous on each element $J_a \in \mathcal{J}$.
But this is not enough: if $x$ is a common boundary of two elements of $\mathcal{J}$, then (no matter how you assign the symbol to $x$ in Example 1.36), for each neighborhood $U \ni x$, $\operatorname{diam}(i(U)) = 1$, so continuity fails at $x$. It is only by using quotient spaces of $i(X)$ (so changing the topology of $i(X)$) that we can make $i$ continuous. Normally, we choose to live with the discontinuity, because it affects only a few points:

Lemma 1.38. Let $\partial \mathcal{J}$ denote the collection of common boundary points of different elements in a partition $\mathcal{J}$. If $\operatorname{orb}(x) \cap \partial \mathcal{J} = \emptyset$, then the coding map $i : X \to A^{\mathbb{N}_0}$ or $A^{\mathbb{Z}}$ is continuous at $x$.


Proof. We carry out the proof for invertible maps. Let $\varepsilon > 0$ be arbitrary and fix $N \in \mathbb{N}$ such that $2^{-N} < \varepsilon$. For each $n \in \mathbb{Z}$ with $|n| \le N$, let $U_n \ni T^n(x)$ be such a small neighborhood that it is contained in a single partition element $J_{i_n(x)}$. Since $\operatorname{orb}(x) \cap \partial \mathcal{J} = \emptyset$, this is possible. Then $U := \bigcap_{|n| \le N} T^{-n}(U_n)$ is an open neighborhood of $x$ and $i_n(y) = i_n(x)$ for all $|n| \le N$ and $y \in U$. Therefore $\operatorname{diam}(i(U)) \le 2^{-N} < \varepsilon$, and continuity at $x$ follows. □

Definition 1.39. A transformation $T : X \to X$ of a metric space $(X, d)$ is called expansive if there exists $\delta > 0$ such that for all distinct $x, y \in X$, there is $n \ge 0$ (or $n \in \mathbb{Z}$ if $T$ is invertible) such that $d(T^n(x), T^n(y)) > \delta$. We call $\delta$ the expansivity constant.

Every subshift $(X, \sigma)$ is expansive. Indeed, if $x \ne y$, then there is $n \in \mathbb{N}$ (or $n \in \mathbb{Z}$ if $(X, \sigma)$ is a two-sided shift) such that $x_n \ne y_n$, so $d(\sigma^n(x), \sigma^n(y)) = 1$. This makes every $\delta \in (0, 1)$ an expansivity constant.

Lemma 1.40. Suppose that $T$ is a continuous expansive map and injective on each $J_a$ of some partition $\mathcal{J}$. If the expansivity constant $\delta > \sup_{a \in A} \operatorname{diam}(J_a)$, then the coding map $i : X \to A^{\mathbb{N}_0 \text{ or } \mathbb{Z}}$ is injective.

Proof. Suppose that $i$ is not injective, so there are $x \ne y \in X$ such that $i(x) = i(y)$. Since $T|_{J_a}$ is injective for each $a \in A$, $T^n(x) \ne T^n(y)$ for all $n \in \mathbb{Z}$. By expansiveness, there is $n \in \mathbb{Z}$ such that $d(T^n(x), T^n(y)) > \delta$. By assumption, they cannot lie in the same element of $\mathcal{J}$. Hence $x$ and $y$ cannot have the same itinerary after all. □

To obtain injectivity of the coding map, it often suffices (but not always; see Example 1.43 below) that $T$ is expanding on each partition element $J_a$. Expanding (and expansion) should not be confused with expansive (and expansivity) of Definition 1.39.

Definition 1.41. Let $T : X \to Y$ be a map between metric spaces.
We call $T$ expanding if there is $\rho > 1$ such that $d_Y(T(x), T(y)) \ge \rho\, d_X(x, y)$ for all $x, y \in X$, and locally expanding if there are $\varepsilon > 0$ and $\rho > 1$ such that $d_Y(T(x), T(y)) \ge \rho\, d_X(x, y)$ for all $x, y \in X$ with $d_X(x, y) < \varepsilon$.

Proposition 1.42 (Gottschalk & Hedlund [284]). Let $T : X \to X$ be a homeomorphism on a compact metric space $(X, d)$. If $T$ is locally expanding, then $X$ is finite.

Compactness is important: without it, $T : \mathbb{R} \to \mathbb{R}$, $x \mapsto 2x$, would be a counterexample.

Proof. Let $\varepsilon > 0$ and $\rho > 1$ be as in Definition 1.41. Since $T^{-1}$ is continuous and $X$ is compact, there is a $\delta > 0$ such that $d(x, y) < \delta$ implies $d(T^{-1}(x), T^{-1}(y)) < \varepsilon$. Let $\{U_i\}_{i=1}^N$ be a finite open cover of $X$


such that $\operatorname{diam}(U_i) < \delta$. Then $\{T^{-1}(U_i)\}_{i=1}^N$ is an open cover of $X$, and any two points of $T^{-1}(U_i)$ are less than $\varepsilon$ apart, so by local expansion, $\operatorname{diam} T^{-1}(U_i) \le \operatorname{diam}(U_i)/\rho < \delta/\rho$. Repeating this argument, we find that $\{T^{-n}(U_i)\}_{i=1}^N$ is a finite open cover of $X$ with $\operatorname{diam}(T^{-n}(U_i)) < \delta\rho^{-n}$. Since $n$ is arbitrary, $\#X \le N$. □

Example 1.43. In this example, we show that even though $T$ has derivative $T' = 2 > 1$ on each partition element $J_a$, $a \in \mathcal{A}$, the coding map $i : X \to \mathcal{A}^{\mathbb{N}_0}$ may still fail to be injective if the diameter of some of the $J_a$'s is too big. Let $T : S^1 \to S^1$, $x \mapsto 2x \bmod 1$, be the doubling map, and let $J_0 = (\tfrac14, \tfrac34)$ and $J_1 = S^1 \setminus J_0$. Clearly $T'(x) = 2$ for all $x \in S^1$, but $T$ is not expanding on the whole of $S^1$, because for instance $d(T(\tfrac14), T(\tfrac34)) = 0 < \tfrac12 = d(\tfrac14, \tfrac34)$. More importantly, $T$ is not expanding on $J_0$ or $J_1$ either; for example
\[ d(T(\tfrac14 + \varepsilon), T(\tfrac34 - \varepsilon)) = 4\varepsilon < \tfrac12 - 2\varepsilon = d(\tfrac14 + \varepsilon, \tfrac34 - \varepsilon) \quad \text{for each } \varepsilon \in (0, \tfrac{1}{12}). \]
The corresponding coding map is not injective. The way to see this is by noting that the involution $S(x) = 1 - x$ commutes with $T$ and also preserves each $J_a$. It follows that $i(x) = i(S(x))$ for all $x \in S^1$, and only $x = 0$ and $x = \tfrac12$ have unique itineraries. For the more general partition $J_0^b = (b, b + \tfrac12)$ and $J_1^b = S^1 \setminus J_0^b$ for $b \in [0, \tfrac12)$, see Remark 3.102.
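The failure of injectivity in Example 1.43 can be checked by direct computation. The following is a minimal sketch, not taken from the text: it uses exact rational arithmetic, and the function names (`doubling`, `symbol`, `itinerary`, `reflect`) are illustrative choices. It compares the itinerary of $x$ with that of $S(x) = 1 - x$ for a few rational points.

```python
from fractions import Fraction


def doubling(x):
    """Doubling map T(x) = 2x mod 1 on the circle S^1 = [0, 1)."""
    return (2 * x) % 1


def symbol(x):
    """Symbol 0 if x lies in J0 = (1/4, 3/4), and 1 otherwise (x in J1)."""
    return 0 if Fraction(1, 4) < x < Fraction(3, 4) else 1


def itinerary(x, length):
    """First `length` symbols of the itinerary i(x) under the doubling map."""
    word = []
    for _ in range(length):
        word.append(symbol(x))
        x = doubling(x)
    return word


def reflect(x):
    """Involution S(x) = 1 - x on the circle."""
    return (1 - x) % 1


if __name__ == "__main__":
    for x in (Fraction(1, 3), Fraction(1, 5), Fraction(3, 7), Fraction(2, 9)):
        same = itinerary(x, 12) == itinerary(reflect(x), 12)
        print(f"x = {x}, S(x) = {reflect(x)}, same itinerary prefix: {same}")
```

Exact fractions are used so that membership in $J_0$ is never affected by rounding; with floating-point numbers the orbit of the doubling map would collapse to $0$ after roughly 53 iterations.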

Chapter 2

Topological Dynamics

In essence, symbolic dynamical systems are dynamical systems on a topological (in fact metric) space and therefore share many of the topological properties that general dynamical systems can have. In this chapter, we discuss several of these general topological properties, such as minimality, entropy, versions of equicontinuity and mathematical chaos, as well as topological mixing and shadowing properties.

2.1. Basic Notions from Dynamical Systems

A dynamical system is a mathematical description of how a physical system evolves in time. It consists of the following:

• A phase space $X$, usually a metric space, or at least a topological space, describing the state of the system. For example, $\mathbb{R}^{2n}$ can be used to describe the positions and velocities of $n$ point-particles moving along a line, or $\mathbb{R}^{6n}$ for the positions and velocities of $n$ point-particles moving in $\mathbb{R}^3$.

• A time space, which could be $\mathbb{R}$ (for continuous time) or $\mathbb{N}_0 := \mathbb{N} \cup \{0\}$ (or $\mathbb{Z}$ if the dynamical system is time-invertible) if the observations are only made at discrete time steps. More complicated (multi-dimensional or group-valued) time can be considered too, but in this text, time is always discrete: $\mathbb{N}_0$ or $\mathbb{Z}$.

• An evolution rule, which for discrete time takes the form of a transformation $T : X \to X$ satisfying:
  (1) $T^0(x) = x$ for all $x \in X$.
  (2) $T^{m+n}(x) = T^m(T^n(x))$ for all $m, n \in \mathbb{N}_0$ (or $\mathbb{Z}$) and all $x \in X$.


This is realized if we let $T^n$ be the $n$-fold composition
\[ T^n(x) = \underbrace{T \circ T \circ \cdots \circ T}_{n \text{ times}}(x), \]

and $T^{-n}$ is the $n$-fold composition of its inverse transformation if it exists. If $T$ is continuous, then $(X, T)$ is called a continuous dynamical system.

Definition 2.1. Let $(X, T)$ be a dynamical system on a topological space. The orbit of $x \in X$ is the set
\[ \operatorname{orb}(x) = \begin{cases} \{T^n(x) : n \in \mathbb{Z}\} & \text{if } T \text{ is invertible,} \\ \{T^n(x) : n \ge 0\} & \text{if } T \text{ is non-invertible.} \end{cases} \]
The set $\operatorname{orb}^+(x) = \{T^n(x) : n \ge 0\}$ is the forward orbit of $x$. This notation is useful if $T$ is invertible; if $T$ is non-invertible, then $\operatorname{orb}^+(x) = \operatorname{orb}(x)$.

Exercise 2.2. Let $\sigma : \Sigma \to \Sigma$ be invertible. Is there a difference between $x \in \overline{\operatorname{orb}(x) \setminus \{x\}}$ and $x \in \overline{\operatorname{orb}^+(x) \setminus \{x\}}$?

We distinguish several types of orbits. Namely, a point $x$ is:

• Periodic if $T^n(x) = x$ for some $n \ge 1$. The smallest such $n$ is called the period of $x$. If the period is 1, then $x$ is called a fixed point.

• Preperiodic if $T^{m+n}(x) = T^m(x)$ for some $m, n \in \mathbb{N}$. The minimal such $m, n$ are called the preperiod and period of $x$.

• Asymptotically periodic if there is a periodic point $y \notin \operatorname{orb}(x)$ such that $d(T^n(x), T^n(y)) \to 0$ as $n \to \infty$.

A periodic point $y$ is attracting if it has a neighborhood^1 $U$ such that $\bigcap_{n \ge 0} T^n(U) = \{y\}$. If $y$ has a neighborhood $U$ such that $\bigcap_{n \ge 0} T^{-n}(U) = \{y\}$, then $y$ is repelling. For example, for the quadratic family with $a = 3.83187405528332\ldots$ as in Exercise 1.37, the point $x = \tfrac12$ has period 3, and since $Q_a'(\tfrac12) = 0$, it is easy to show that $\tfrac12$ is attracting. The two fixed points are $0$ and $1 - \tfrac1a$; they are repelling. For the circle rotation $R_\alpha$, every point is periodic if and only if $\alpha \in \mathbb{Q}$; if $\alpha = m/n$ in lowest terms, then each point $x \in S^1$ has period $n$ and can be called neutral. If $\alpha \notin \mathbb{Q}$, then every orbit is dense in $S^1$.

^1 If the space $X$ is one-dimensional, then we can speak of one-sided attracting if there is a one-sided neighborhood $U$ of $y$ such that $\bigcap_{n \ge 0} T^n(U \cup \{y\}) = \{y\}$.

Definition 2.3. Let $(X, T)$ be a dynamical system on a topological space. The $\omega$-limit set of $x$ is the set of accumulation points of its forward orbit.

In formula,
\[ \omega(x) = \bigcap_{n \in \mathbb{N}} \overline{\bigcup_{m \ge n} \{T^m(x)\}} = \{y \in X : \exists\, n_i \to \infty,\ \lim_{i \to \infty} T^{n_i}(x) = y\}. \]
We call $x$ recurrent if $x \in \omega(x)$. Analogously for invertible dynamical systems, the $\alpha$-limit set of $x$ is the set of accumulation points of its backward orbit:
\[ \alpha(x) = \bigcap_{n \in \mathbb{N}} \overline{\bigcup_{m \le -n} \{T^m(x)\}} = \{y \in X : \exists\, n_i \to \infty,\ \lim_{i \to \infty} T^{-n_i}(x) = y\}. \]
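For a preperiodic point, the $\omega$-limit set is simply the periodic orbit in which the point eventually lands. The sketch below illustrates this for the doubling map $x \mapsto 2x \bmod 1$ on rational points, where every orbit is eventually periodic. It is an illustration under stated assumptions rather than part of the text: the helper names are hypothetical, and a returned preperiod of $0$ means that the point is itself periodic.

```python
from fractions import Fraction


def doubling(x):
    """Doubling map T(x) = 2x mod 1, evaluated exactly on rationals."""
    return (2 * x) % 1


def preperiod_and_period(x, T, max_steps=10_000):
    """Return (m, p) with T^(m+p)(x) = T^m(x), m and p minimal.

    m = 0 means that x is periodic. Raises ValueError if no repetition
    is found within max_steps iterates.
    """
    first_visit = {}
    n = 0
    while x not in first_visit:
        if n >= max_steps:
            raise ValueError("no repetition found; orbit may be infinite")
        first_visit[x] = n
        x = T(x)
        n += 1
    m = first_visit[x]  # first time the repeated point was seen
    return m, n - m


def omega_limit_set(x, T):
    """omega(x) for an eventually periodic orbit: the cycle it falls into."""
    m, p = preperiod_and_period(x, T)
    for _ in range(m):
        x = T(x)
    cycle = set()
    for _ in range(p):
        cycle.add(x)
        x = T(x)
    return cycle


if __name__ == "__main__":
    for x in (Fraction(1, 7), Fraction(1, 12), Fraction(1, 4)):
        print(x, preperiod_and_period(x, doubling), sorted(omega_limit_set(x, doubling)))
```

A point $x$ of this kind is recurrent, i.e. $x \in \omega(x)$, exactly when its preperiod is $0$.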

Definition 2.4. Given a dynamical system $(X, T)$, a point $x \in X$ is called non-wandering if for every neighborhood $U \ni x$ there is an $n \ge 1$ such that $T^{-n}(U) \cap U \ne \emptyset$. The non-wandering set $\Omega(T)$ is the set of all non-wandering points.

Recurrent points are always non-wandering, but $\Omega(T)$ can contain non-recurrent points. In the one-sided full shift, for instance, $x = 0111111\cdots$ is not recurrent but is non-wandering. If $(X, T)$ has a dense orbit, then $\Omega(T) = X$.

Definition 2.5. Two dynamical systems $(X, f)$ and $(Y, g)$ are (topologically) conjugate if there is a homeomorphism $\psi : X \to Y$ such that $\psi \circ f = g \circ \psi$. If $\psi \circ f = g \circ \psi$ and $\psi : X \to Y$ is a continuous, onto, but not necessarily one-to-one map, then $\psi$ is called a semi-conjugacy or factor map, $(Y, g)$ is called a factor of $(X, f)$, and $(X, f)$ is called an extension of $(Y, g)$. This extension is almost one-to-one if there is a dense set $Y' \subset Y$ such that $\#\psi^{-1}(y) = 1$ for all $y \in Y'$. A conjugacy $\psi : X \to Y$ is called pointed if it sends specified points $x \in X$ and $y \in Y$ to each other.

Lemma 2.6. Let $(X, f)$ and $(Y, g)$ be dynamical systems that are conjugate via $g \circ \psi = \psi \circ f$. Then:
(1) If $p$ is a (pre)periodic point for $f$, then $\psi(p)$ is a (pre)periodic point of $g$, and the (pre)periods are the same.
(2) If $f, g$ are continuous, then the conjugacy preserves $\omega$-limit sets: $\psi(\omega(x)) = \omega(\psi(x))$.
(3) If the periodic point $p$ is attracting/repelling, then $\psi(p)$ is also attracting/repelling.

Proof. First note that
\[ \psi \circ f^n = \psi \circ f \circ \psi^{-1} \circ \psi \circ f \circ \psi^{-1} \circ \cdots \circ \psi \circ f = g \circ \psi \circ \psi^{-1} \circ g \circ \psi \circ \psi^{-1} \circ \cdots \circ g \circ \psi = g^n \circ \psi. \]


1. Take $p$ such that $f^n(p) = p$ and $q = \psi(p)$. Then $g^n(q) = g^n \circ \psi(p) = \psi \circ f^n(p) = \psi(p) = q$, so $q$ is $n$-periodic for $g$. Next, suppose that $f^{m+n}(p) = f^m(p)$, and set $q = \psi(p)$. Then $g^{m+n}(q) = g^{m+n} \circ \psi(p) = \psi \circ f^{m+n}(p) = \psi \circ f^m(p) = g^m \circ \psi(p) = g^m(q)$.

2. Now assume that $x \in \omega_f(a)$, so there is a sequence $n_k \to \infty$ such that $f^{n_k}(a) \to x$. Set $y = \psi(x)$ and $b = \psi(a)$. Then, by continuity of $\psi$, $g^{n_k}(b) = g^{n_k} \circ \psi(a) = \psi \circ f^{n_k}(a) \to \psi(x) = y$, so $y \in \omega_g(b)$. The reverse inclusion follows by applying the same argument to $\psi^{-1}$.

3. If $p = f(p)$ is asymptotically attracting, then for every $a \in X$ sufficiently close to $p$, we have $\{p\} = \omega_f(a)$. By part 1, $q := \psi(p)$ is a fixed point of $g$, and by part 2, $\{q\} = \omega_g(b)$ for $b = \psi(a)$. □

Exercise 2.7. Is the following true? If $(X, f)$ is a factor of $(Y, g)$ and $(Y, g)$ is a factor of $(X, f)$, then $(X, f)$ and $(Y, g)$ are conjugate.

Example 2.8. The quadratic Chebyshev polynomial $Q_2(y) = 2y^2 - 1$ on $[-1, 1]$ is conjugate to the tent map $T(x) = \min\{2x, 2(\pi - x)\}$ on $[0, \pi]$. Indeed,
\[ Q_2 \circ \psi = \psi \circ T \quad \text{for} \quad \psi(x) = \cos x. \tag{2.1} \]
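The conjugacy relation (2.1) is the double-angle formula $\cos 2x = 2\cos^2 x - 1$ in disguise, and it can be sanity-checked numerically. The following sketch samples $[0, \pi]$ on a uniform grid; the function names and the grid size are assumptions made for illustration. It also evaluates the multiplier $|Q_2'(y)|$ at the image of the fixed point $x = 2\pi/3$ of $T$.

```python
import math


def chebyshev_q2(y):
    """Quadratic Chebyshev polynomial Q_2(y) = 2y^2 - 1 on [-1, 1]."""
    return 2.0 * y * y - 1.0


def chebyshev_q2_derivative(y):
    """Derivative Q_2'(y) = 4y."""
    return 4.0 * y


def tent(x):
    """Tent map T(x) = min{2x, 2(pi - x)} on [0, pi]."""
    return min(2.0 * x, 2.0 * (math.pi - x))


def psi(x):
    """Conjugating map psi(x) = cos x from [0, pi] onto [-1, 1]."""
    return math.cos(x)


def max_conjugacy_defect(samples=10_001):
    """Largest value of |Q_2(psi(x)) - psi(T(x))| over a uniform grid."""
    worst = 0.0
    for k in range(samples):
        x = math.pi * k / (samples - 1)
        worst = max(worst, abs(chebyshev_q2(psi(x)) - psi(tent(x))))
    return worst


if __name__ == "__main__":
    print("max defect of (2.1):", max_conjugacy_defect())
    x_fixed = 2.0 * math.pi / 3.0  # T(x_fixed) = x_fixed
    print("|Q_2'(psi(x_fixed))| =", abs(chebyshev_q2_derivative(psi(x_fixed))))
```

The defect is of the order of rounding error, and the multiplier at the fixed point $y = \cos(2\pi/3) = -\tfrac12$ has absolute value $2$, the slope of the tent map.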

It is very unusual to find smooth conjugacies between maps, and even here, $\psi$ is not a diffeomorphism at the endpoints $0, \pi$. But applying (2.1) $n$ times and then differentiating, we find
\[ (Q_2^n)' \circ \psi(x) \cdot \psi'(x) = \psi'(T^n(x)) \cdot (T^n)'(x). \]
If $x$ is a $p$-periodic point of $T$, and hence $y = \psi(x)$ a $p$-periodic point of $Q_2$, we see that $|(Q_2^p)'(y)| = 2^p$. The only periodic point where this fails is $y = \psi(0) = 1$, because $\psi'(0) = 0$. Note that the same conjugacy works for the degree $n$ Chebyshev polynomial $Q_n$ and the slope $n$ tent map with $n$ branches. The characterization of Chebyshev polynomials $Q_n(x) = \cos(n \arccos x)$ is the cause of this.

Example 2.9. We show that two circle rotations $R_\alpha$ and $R_\beta$ are not conjugate if $0 \le \alpha < \beta < 1$. Let $<$ denote the positive orientation on $S^1$. Choose $n \in \mathbb{N}$ such that $n\alpha \le k < n\beta$ and $(n - 1)\beta \le k$ for some integer $k$. Then, setting $y = \psi(0)$,
\[ R_\alpha^n(0) \le 0 \le R_\alpha(0) \quad \text{and} \quad y \le R_\beta^n(y) \le R_\beta(y). \tag{2.2} \]
The homeomorphism $\psi : S^1 \to S^1$ must either preserve or reverse the orientation of the circle, but neither way is compatible with (2.2). Therefore there cannot be any conjugacy. A more structural way to see this is using lifts and rotation numbers; see Theorem 4.54. Indeed, the rotation number $\rho(f)$ is preserved under conjugacy, and $\rho(R_\alpha) = \alpha \ne \beta = \rho(R_\beta)$.


Definition 2.10. Two dynamical systems $(X, f)$ and $(Y, g)$ are called orbit equivalent if there is a homeomorphism $\psi : X \to Y$ such that $\psi(\operatorname{orb}_f(x)) = \operatorname{orb}_g(\psi(x))$ for all $x \in X$; i.e. $\psi$ sends orbits to orbits (set-wise, not necessarily point-wise).

Clearly, a conjugacy is an orbit equivalence. If $f$ and $g$ are themselves homeomorphisms and $\psi \circ f = g^{-1} \circ \psi$, then $\psi$ is called a flip-conjugacy and this is also an orbit equivalence. More generally, if $\psi$ is a conjugacy or flip-conjugacy, then $\psi \circ f^k$ is an orbit equivalence for each $k \in \mathbb{Z}$.

Orbit equivalence implies the existence of two functions $m, n : X \to \mathbb{Z}$, called orbit cocycles, such that
\[ \psi \circ f(x) = g^{n(x)} \circ \psi(x) \quad \text{and} \quad \psi \circ f^{m(x)}(x) = g \circ \psi(x). \]
Thus the orbit cocycle of a conjugacy is constant $1$ and of a flip-conjugacy is constant $-1$. Another special case of orbit equivalence is a speed-up: $(Y, g)$ is a speed-up of $(X, f)$ if it is orbit equivalent and the orbit cocycle $m : X \to \mathbb{Z}$ is non-negative.

Definition 2.11. Two dynamical systems $(X, f)$ and $(Y, g)$ are strongly orbit equivalent if their orbit cocycles are continuous on $X$, except for at most one point each.

2.2. Transitive and Minimal Systems

The following definition expresses that all parts of a dynamical system connect to each other:

Definition 2.12. A dynamical system $(X, T)$ is (topologically) transitive if for every two non-empty open^2 sets $U, V \subset X$, there is an $n \ge 0$ such that $U \cap T^{-n}(V) \ne \emptyset$.^3 It is called totally transitive if $T^N$ is transitive for each $N \in \mathbb{N}$.

Clearly totally transitive implies transitive. The other implication fails; for example, $\sigma$ is transitive on the two-point subshift $\{(10)^\infty, (01)^\infty\}$ but $\sigma^2$ is not.

Proposition 2.13. Let $X$ be a compact regular Hausdorff space^4 without isolated points and which is second countable; i.e. it possesses a countable basis of its topology. A continuous map $T : X \to X$ is a topologically transitive map if and only if there is a dense orbit.

^2 Some authors use the abbreviation opene for open and non-empty.

^3 Many texts write $T^n(U) \cap V \ne \emptyset$, which may be more intuitive, but the fact that $T^n(U)$ need not be open (or not measurable even if $U$ is measurable) might in some cases lead to inadvertent problems.

^4 Regular Hausdorff means that singletons $\{x\}$ are closed and for all closed sets $A$ and $x \notin A$ there are neighborhoods $U \ni x$ and $V \supset A$ such that $U \cap V = \emptyset$.
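The two-point subshift mentioned after Definition 2.12 is small enough to test transitivity by brute force. In a finite discrete space the singletons form a basis, so the condition $U \cap T^{-n}(V) \ne \emptyset$ reduces to: every point reaches every point under forward iteration. The sketch below encodes each periodic sequence by its period word; this encoding and the function names are assumptions made for the illustration.

```python
def shift(word):
    """Left shift on a periodic sequence, represented by its period word."""
    return word[1:] + word[0]


def shift_squared(word):
    """Second iterate of the shift."""
    return shift(shift(word))


def is_transitive(points, T):
    """Topological transitivity of T on a finite discrete space.

    For singletons U = {u}, V = {v}, the condition U ∩ T^{-n}(V) ≠ ∅
    for some n ≥ 0 says that the forward orbit of u hits v.
    """
    space = set(points)
    for u in space:
        forward_orbit = set()
        x = u
        while x not in forward_orbit:
            forward_orbit.add(x)
            x = T(x)
        if forward_orbit != space:
            return False
    return True


if __name__ == "__main__":
    subshift = ["10", "01"]  # stands for {(10)^∞, (01)^∞}
    print("sigma transitive:  ", is_transitive(subshift, shift))
    print("sigma^2 transitive:", is_transitive(subshift, shift_squared))
```

Here $\sigma$ swaps the two points while $\sigma^2$ fixes both, which is why transitivity is lost when passing to the second iterate.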


Remark 2.14. The notion of dense orbit may need further explanation if the subshift is two-sided. Consider the sequence
\[ x = \cdots 000000000000000000.101000101000000000101000101 \cdots. \tag{2.3} \]
This sequence emerges from the Cantor substitution
\[ \chi_{\text{Cantor}} : \begin{cases} 0 \to 000 \\ 1 \to 101 \end{cases} \]
from the seed $0.1$. This sequence has a dense forward orbit $\operatorname{orb}^+(x)$ within its forward orbit closure $\overline{\operatorname{orb}^+(x)}$ as well as a dense backward orbit $\operatorname{orb}^-(x)$ within its backward orbit closure $\overline{\operatorname{orb}^-(x)}$. However, $\operatorname{orb}^-(x)$ is not dense in its two-sided orbit closure.

Proof. Let $\{U_j\}_{j \in \mathbb{N}}$ be a countable basis of the topology. Suppose first that $x$ has a dense orbit, and let $U, V \subset X$ be arbitrary non-empty open sets. Take $j, k \in \mathbb{N}$ such that $U_j \subset U$, $U_k \subset V$. Since $\operatorname{orb}(x)$ is dense and $X$ has no isolated points, $x$ visits each $U_j$ infinitely often. Hence there are $m, n \in \mathbb{N}$ such that $T^m(x) \in U_j$ and $T^{m+n}(x) \in U_k$. This shows that $U \cap T^{-n}(V) \ne \emptyset$.

Conversely, by topological transitivity applied to $U_1$ and $U_2$, we can find $n_1$ such that $U_1 \cap T^{-n_1}(U_2) \ne \emptyset$. By continuity of $T$, $U_1 \cap T^{-n_1}(U_2)$ is an open set. Choose $V_2$ open such that $\overline{V_2} \subset U_1 \cap T^{-n_1}(U_2)$. Here we use the regular Hausdorff property of $X$. Next, using topological transitivity applied to $V_2$ and $U_3$, choose $n_2 > n_1$ such that $V_2 \cap T^{-n_2}(U_3) \ne \emptyset$. Then choose an open set $V_3$ such that $\overline{V_3} \subset V_2 \cap T^{-n_2}(U_3)$. Continuing this way we find a nested sequence of open sets $V_k$, with $\overline{V_k} \subset V_{k-1}$, and a sequence of integers $(n_k)$ such that $V_{k+1} \subset T^{-n_k}(U_{k+1})$. Let $V_\infty = \bigcap_k V_k$. Since $\overline{V_k} \subset V_{k-1}$ and closed sets in $X$ are automatically compact, this intersection is non-empty, and every $x \in V_\infty$ has a dense orbit. This concludes the proof. □

A strong form of transitivity is minimality:

Definition 2.15. A dynamical system $(X, T)$ is minimal if every orbit is dense in $X$.

Remark 2.16. It is a straightforward application of Zorn's Lemma that every dynamical system on a compact space^5 contains at least one minimal subsystem. For compact metric spaces, this fact can also be shown without the use of Zorn's Lemma; see [304, Chapter 1, Theorem 2.2.1].

^5 Compactness is important; otherwise one could take a single non-recurrent orbit (without its closure) as the phase-space. An interesting example with only recurrent orbits but no minimal subset is due to Auslander [38, page 27].
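The right half of the sequence (2.3) in Remark 2.14 can be generated by iterating the Cantor substitution on the seed $1$, while the left half comes from the seed $0$. The sketch below is only an illustration with hypothetical helper names; it also measures the longest run of zeros, which triples at every step, so occurrences of the symbol $1$ in $x$ do not have bounded gaps.

```python
CANTOR_SUBSTITUTION = {"0": "000", "1": "101"}


def substitute(word):
    """Apply the Cantor substitution letter by letter."""
    return "".join(CANTOR_SUBSTITUTION[letter] for letter in word)


def iterate_substitution(seed, steps):
    """Apply the substitution `steps` times to `seed`."""
    word = seed
    for _ in range(steps):
        word = substitute(word)
    return word


def longest_zero_run(word):
    """Length of the longest block of consecutive zeros in `word`."""
    return max(len(run) for run in word.split("1"))


if __name__ == "__main__":
    for steps in range(1, 5):
        word = iterate_substitution("1", steps)
        print(steps, len(word), longest_zero_run(word))
    print(iterate_substitution("1", 3))
```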


Proposition 2.17. Let $X$ be a compact topological space. We have the following equivalent characterizations for a continuous dynamical system $(X, T)$ to be minimal:
(i) There is no non-empty closed $T$-invariant proper subset of $X$.
(ii) Every orbit is dense in $X$.
(iii) There is a dense orbit and $T$ is uniformly recurrent^6; i.e. for every open set $U \subset X$ there is an $N \in \mathbb{N}$ such that for every $x \in U$ there is $1 \le n \le N$ such that $T^n(x) \in U$.

Proof. We prove the three implications by the contrapositive.

(i) ⇒ (ii): Suppose that $x \in X$ has an orbit that is not dense. Then $\overline{\operatorname{orb}(x)}$ is a $T$-invariant closed proper subset, so (i) fails.

(ii) ⇒ (iii): By (ii) every orbit is dense, so there is at least one dense orbit. Now to prove uniform recurrence, let $U$ be any open set and $U_0$ an open subset such that $\overline{U_0} \subset U$. Suppose that for every $N \in \mathbb{N}$ there is $x_N \in U_0$ such that $T^n(x_N) \notin U_0$ for all $1 \le n \le N$. Let $x \in \overline{U_0} \subset U$ be an accumulation point of $(x_N)_{N \in \mathbb{N}}$. Since $x$ has a dense orbit, we can take $n \ge 1$ such that $T^n(x) \in U_0$. Take an open set $V \ni x$ such that $T^n(V) \subset U_0$. Next take $N \ge n$ so large that $x_N \in V$. But this means that $T^n(x_N) \in U_0$, which is against the definition of $x_N$. Hence no such $n$ exists, and therefore $\operatorname{orb}(x)$ is not dense, and (ii) fails. Now take $y \in U$ arbitrary (so not necessarily in $U_0$), and take $x \in U_0$ with a dense orbit. Find a sequence $k_i$ such that $T^{k_i}(x) \to y$. For each $i$ there is $1 \le n_i \le N$ such that $T^{k_i + n_i}(x) \in U_0$. Passing to a subsequence, we may as well assume that $n_i \equiv n$. Then $T^n(y) = T^n(\lim_i T^{k_i}(x)) = \lim_i T^{k_i + n}(x) \in \overline{U_0} \subset U$. This proves the uniform recurrence of $U$.

(iii) ⇒ (i): Let $x$ be a point with a dense orbit. Suppose that $Y$ is a non-empty closed $T$-invariant proper subset of $X$ and let $U \subset X$ be non-empty open such that $\overline{U} \cap Y = \emptyset$. Let $n \ge 0$ be minimal such that $u := T^n(x) \in U$. Let $N = N(U) \ge 1$ be as in the definition of uniform recurrence, and let $y \in Y$ be arbitrary. Since $\operatorname{orb}(y) \subset Y$, there is an open set $V \ni y$ such that $V \cap T^{-i}(U) = \emptyset$ for $0 \le i \le N$. Take $n' > n$ minimal such that $T^{n'}(u) \in V$, and let $n'' < n'$ be maximal such that $T^{n''}(u) =: u' \in U$. Then $T^i(u') \notin U$ for all $1 \le i \le n' - n'' + N$.

^6 The expression "almost periodic" is frequently used as well, e.g. in [284, 381, 398, 465], but it is not the same with all authors and sometimes refers to a different notion. For instance, in [482] it is used as "periodically recurrent" in our Definition 2.19.


Since $N$ was arbitrary, this contradicts the uniform recurrence and hence such $Y$ cannot exist. □

Definition 2.18. Uniform recurrence means that the set $N(x, U) := \{n \in \mathbb{Z} \text{ or } \mathbb{N} : x \in T^{-n}(U)\}$ is syndetic for every $x \in X$; i.e. it has bounded gaps (from the Greek συνδετικός = bound together). A set that is not syndetic has a complement that is thick: for every $N \in \mathbb{N}$ it contains blocks $\{n, n + 1, \ldots, n + N\}$.

Definition 2.19. A dynamical system is called periodically recurrent if for every non-empty open set $U$, there is $N$ such that $U \subset T^{-kN}(U)$ for all $k \in \mathbb{N}$ (or $k \in \mathbb{Z}$ if $T$ is invertible).

Since periodic recurrence is obviously stronger than uniform recurrence, we have the following corollary.

Corollary 2.20. Every periodically recurrent dynamical system is minimal.

Definition 2.21. Given a dynamical system $(X, T)$, a point $x \in X$ is uniformly recurrent (resp. periodically recurrent) if for every neighborhood $U \ni x$, the set $N(x, U)$ is syndetic (resp. $N(x, U) \supset \{bk : k \in \mathbb{N} \text{ or } \mathbb{Z}\}$ for some $b \in \mathbb{N}$).

Corollary 2.22. Let $(X, T)$ be a continuous dynamical system and let $x \in X$ have a dense orbit. Then $(X, T)$ is minimal (resp. periodically recurrent) if and only if $x$ is uniformly recurrent (resp. periodically recurrent).

Proof. If $(X, T)$ is minimal, then $x$ is uniformly recurrent by Proposition 2.17, part (iii). Conversely, assume that $x$ is uniformly recurrent. First observe that every $u \in \operatorname{orb}(x)$ is uniformly recurrent too. Indeed, suppose $u = T^n(x)$, and let $V$ be an open neighborhood of $x$. Then for every open neighborhood $U$ of $u$, also $U' = T^{-n}(U) \cap V$ is an open neighborhood of $x$, and $N(u, U) \supset N(x, U') + n$. Now minimality of $(X, T)$ follows precisely as in the step (iii) ⇒ (i) in the proof of Proposition 2.17. The proof for $x$ periodically recurrent is analogous. □
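The syndetic sets $N(x, U)$ of Definition 2.18 are easy to observe for a circle rotation, which is minimal when the angle is irrational. The sketch below is an illustration only: the angle (the golden mean), the interval $U = [0, 0.1)$, and the helper names are choices made here, and floating-point arithmetic is used, so it is a numerical experiment rather than a proof. It lists the gaps between consecutive visits of the orbit of $x = 0$ to $U$.

```python
import math

ALPHA = (math.sqrt(5.0) - 1.0) / 2.0  # golden mean rotation angle (an assumption)


def rotate(x):
    """Circle rotation R_alpha(x) = x + alpha mod 1."""
    return (x + ALPHA) % 1.0


def in_interval(x):
    """Indicator of the open set U = [0, 0.1) used in this experiment."""
    return 0.0 <= x < 0.1


def return_times(T, x, in_U, steps):
    """All n in {1, ..., steps} with T^n(x) in U, i.e. a finite part of N(x, U)."""
    times = []
    for n in range(1, steps + 1):
        x = T(x)
        if in_U(x):
            times.append(n)
    return times


def gaps(times):
    """Differences between consecutive elements of a sorted list of times."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    visit_times = return_times(rotate, 0.0, in_interval, 5000)
    print("number of visits:", len(visit_times))
    print("distinct gaps:   ", sorted(set(gaps(visit_times))))
```

Bounded gaps in the output are what syndeticity of $N(x, U)$ predicts; for a non-syndetic set the largest gap would keep growing as more iterates are taken.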



Definition 2.23. A dynamical system $(X, T)$ on a metric space $(X, d)$ is uniformly rigid if for every $\varepsilon > 0$ there is an iterate $n \ge 1$ such that $d(T^n(x), x) < \varepsilon$ for all $x \in X$.

Lemma 2.24. A continuous dynamical system $(X, T)$ on a Cantor set (or compact zero-dimensional set) is uniformly rigid if and only if it is periodically recurrent.


For this result, it is important that the space $X$ is zero-dimensional. For example, irrational rotations $R_\alpha$ on the circle are uniformly rigid but only uniformly, so not periodically, recurrent. The uniform rigidity follows immediately because a circle rotation is an isometry and every point is recurrent. But periodic recurrence fails because for every $n \in \mathbb{N}$ and $x \in S^1$, the set $\{R_\alpha^{kn}(x) : k \in \mathbb{N}\}$ is dense in $S^1$. The proof below, however, shows that a periodically recurrent dynamical system on a compact space is uniformly rigid.

Proof. ⇒: Take $\varepsilon > 0$ arbitrary with corresponding iterate $n \ge 1$, and let $k \in \mathbb{N}$ be the smallest integer such that $2^{-k} < \varepsilon$. Thus the distance between every two distinct $k$-cylinders $Z$ in $X$ is at least $\varepsilon$. By uniform rigidity $T^n(Z) = Z$, and therefore $T^{jn}(Z) = Z$ for all $j \ge 0$, proving periodic recurrence.

⇐: Let $\varepsilon > 0$ be arbitrary. For each $x \in X$, we can find a neighborhood $U_x$ with $\operatorname{diam}(U_x) < \varepsilon$ and an iterate $n_x$ such that $T^{n_x}(U_x) \subset U_x$. By compactness, there is a finite collection $x_1, \ldots, x_N$ such that $X = \bigcup_{i=1}^N U_{x_i}$. Take $n = \operatorname{lcm}\{n_{x_1}, \ldots, n_{x_N}\}$. Then $d(T^n(x), x) < \varepsilon$ for each $x \in X$, as required. □

The following weakening of minimality is of importance for e.g. Toeplitz shifts and $\mathcal{B}$-free shifts; see Sections 4.5 and 4.6.

Definition 2.25. A dynamical system $(X, T)$ is called essentially minimal if it contains a unique minimal set $Y$, i.e. a unique non-empty closed set $Y$ such that $T(Y) = Y$.

Clearly, essentially minimal maps can have at most one periodic orbit, but as the subshift $X := \{\sigma^k(\cdots 000001000000 \cdots)\}_{k \in \mathbb{Z}} \cup \{0^\infty\}$ shows, $X \setminus Y \ne \emptyset$ is possible. However, the two-sided orbit closure of (2.3) does not give an essentially minimal shift.

Proposition 2.26. Given a dynamical system $(X, T)$ and a point $y \in X$, the following are equivalent:
(i) $(X, T)$ is essentially minimal and $y$ is contained in its minimal set.
(ii) For every $x \in X$, $\omega(x) \ni y$.
If, in addition, $T$ is invertible, then two further equivalent statements are:
(iii) For every $x \in X$, $\alpha(x) \ni y$.
(iv) For every open set $U \ni y$, $\bigcup_{n \in \mathbb{Z}} T^n(U) = X$.

Proof. (i) ⇒ (ii): $\omega(x)$ is a closed non-empty $T$-invariant set, so by Zorn's Lemma, it contains a minimal set. But $Y$ is the unique minimal set, so $y \in Y \subset \omega(x)$.


(ii) ⇒ (i): Assume by contradiction that $Y \ni y$ and $Y'$ are minimal sets, and take $x \in Y$, $x' \in Y'$. By assumption $y \in \omega(x) \cap \omega(x')$, so $y \in Y \cap Y'$. Thus $Y \cap Y'$ is a non-empty, closed, and $T$-invariant subset of both $Y$ and $Y'$. Since $Y$ and $Y'$ are minimal, $Y = Y \cap Y' = Y'$.

(i) ⇔ (iii): Use the above with $T^{-1}$ instead of $T$.

(i) ⇒ (iv): Let $U$ be an arbitrary neighborhood of $y$. Since $\bigcup_{n \in \mathbb{Z}} T^n(U)$ is an open (two-sided!) $T$-invariant set, its complement $Y'$ is closed and $T$-invariant. If $Y' \ne \emptyset$, then it contains a minimal set that is disjoint from $y$, contradicting essential minimality. Hence $\bigcup_{n \in \mathbb{Z}} T^n(U) = X$.

(iv) ⇒ (iii): Let $x \in X$ be arbitrary; we can assume without loss of generality that $x \ne T^k(y)$ for all $k \ge 0$, because if $y$ is periodic, then $\alpha(x) = \operatorname{orb}(y) \ni y$, and otherwise we replace $x$ by $T^{-(k+1)}(x)$ to get it outside the forward orbit of $y$. Let $(U_r)_{r \in \mathbb{N}}$ be a nested sequence of neighborhoods of $y$ such that $\bigcap_r U_r = \{y\}$. Since $\bigcup_{n \in \mathbb{Z}} T^n(U_r) = X$ and $X$ is compact, there is a finite $N_r$ such that $\bigcup_{n=-N_r}^{N_r} T^n(U_r) = X$. Applying $T^{N_r}$ to both sides, we obtain $\bigcup_{n=0}^{2N_r} T^n(U_r) = X$. Thus there is $n_r \le 2N_r$ such that $T^{-n_r}(x) \in U_r$. As we can do this for every $r$, we have found a sequence $(n_r)$ (and $n_r \to \infty$ because $x \ne T^k(y)$ for any $k \ge 0$) such that $T^{-n_r}(x) \to y$. Thus $y \in \alpha(x)$, as required. □

2.3. Equicontinuous and Distal Systems

The opposite to expansive (recall Definition 1.39) is equicontinuous.

Definition 2.27. A dynamical system $(X, T)$ on a metric space $(X, d)$ is called equicontinuous if for all $\varepsilon > 0$ there exists $\delta > 0$ such that if $d(x, y) < \delta$, then $d(T^n(x), T^n(y)) < \varepsilon$ for all $n \ge 0$ (or $n \in \mathbb{Z}$ if $T$ is invertible). This is sometimes also called Lyapunov stability.

Naturally, if $T$ is not injective, then distality fails immediately. Every isometry, i.e. a dynamical system such that $d(T(x), T(y)) = d(x, y)$ for all $x, y \in X$, is equicontinuous.

Exercise 2.28. Let $(X, T)$ be an equicontinuous dynamical system. Show that it is topologically transitive if and only if it is minimal.

Lemma 2.29. Let $(X, T)$ be an equicontinuous surjection on a compact metric space $(X, d)$. Then the non-wandering set $\Omega(T) = X$.

Proof. Suppose by contradiction that $x \in X$ is wandering; i.e. there is an $\varepsilon > 0$ such that $T^k(B_\varepsilon(x)) \cap B_\varepsilon(x) = \emptyset$ for all $k \ge 1$. In particular, $x$ is not periodic. By equicontinuity, there is $\delta > 0$ such that $d(a, b) < \delta$ implies $d(T^n(a), T^n(b)) < \varepsilon/2$ for all $n \ge 0$. Construct a backward orbit $(x_{-n})_{n \ge 0}$, i.e. $T^n(x_{-n}) = x$ and $T^k(x_{-n}) \notin B_\varepsilon(x)$ for all $k \in \mathbb{N} \setminus \{n\}$.


By compactness of $X$, $(x_{-n})_{n \ge 0}$ has an accumulation point $y \in X$. Let $m < n$ be so that $d(y, x_{-m}) < \delta$ and $d(y, x_{-n}) < \delta$. Then $T^n(x_{-n}) = x \in B_\varepsilon(x)$ and $T^n(x_{-m}) = T^{n-m}(x) \notin B_\varepsilon(x)$, so $d(T^n(x_{-m}), T^n(y)) \ge \varepsilon/2$ or $d(T^n(x_{-n}), T^n(y)) \ge \varepsilon/2$. This contradicts equicontinuity of $T$ and hence there cannot be a wandering point. □

Lemma 2.30. If an equicontinuous dynamical system $(X, T)$ on a compact metric space $(X, d)$ is topologically transitive, then it is uniformly rigid.

See [324, Proposition 1.1] for more general results in this direction.

Proof. Suppose $z \in X$ has a dense orbit. Take $\varepsilon > 0$ arbitrary and choose $\delta \in (0, \varepsilon/3)$ such that $d(x, y) < \delta$ implies $d(T^n(x), T^n(y)) < \varepsilon/3$ for all $n \ge 0$. Choose $N \in \mathbb{N}$ so large that $\bigcup_{n=0}^{N-1} B_\delta(T^n(z)) = X$ and $d(T^N(z), z) < \delta$. Now let $x$ be arbitrary and take $0 \le n < N$ such that $d(T^n(z), x) < \delta$. Then
\[ d(T^N(x), x) \le d(T^N(x), T^{N+n}(z)) + d(T^{n+N}(z), T^n(z)) + d(T^n(z), x) < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \delta < \varepsilon. \]
Since $x$ was arbitrary, this proves uniform rigidity. □

For an equicontinuous dynamical system, consider the metric $d_\infty(x, y) := \sup_{n \ge 0} d(T^n(x), T^n(y))$. Clearly $d(x, y) \le d_\infty(x, y)$ and $d_\infty(T(x), T(y)) \le d_\infty(x, y)$. By equicontinuity, there is for every $\varepsilon > 0$ some $\delta > 0$ (and $\delta \to 0$ as $\varepsilon \to 0$) such that $d(x, y) < \delta$ implies $d_\infty(x, y) \le \varepsilon$, and therefore $x_n \to x$ in the metric $d$ if and only if $x_n \to x$ in the metric $d_\infty$. Hence both metrics generate the same topology. If $T$ is itself a strict contraction, then also $d_\infty(T(x), T(y)) < d_\infty(x, y)$, but if $X$ is compact and $T$ is surjective, then the dynamical system $(X, T)$ is an isometry in the metric $d_\infty$.

Proposition 2.31. If a dynamical system $(X, T)$ is equicontinuous and surjective on a compact metric space $(X, d)$, then $T$ preserves $d_\infty$. Since isometries are injective, $(X, T)$ is automatically a homeomorphism in this case.

Proof. We have already seen that $d_\infty(T(x), T(y)) \le d_\infty(x, y)$ for all $x, y \in X$. Assume by contradiction that we have strict inequality for some choice $a \ne b$, say $d_\infty(a, b) = d_\infty(T(a), T(b)) + 9\varepsilon$ for some $\varepsilon > 0$.


Consider the product system $T_2 : X^2 \to X^2$ with metric $d_2((x, x'), (y, y')) := \max\{d_\infty(x, y), d_\infty(x', y')\}$. Clearly $T_2$ is non-expanding on $(X^2, d_2)$. Let $B \subset X^2$ be the $\varepsilon$-ball w.r.t. $d_2$ around $(a, b)$. So, if $(x, y) \in B$, then $d_\infty(x, a) < \varepsilon$ and $d_\infty(y, b) < \varepsilon$. Assume by contradiction that there is $n \ge 1$ such that $B \cap T_2^n(B) \ne \emptyset$. This would mean that $d_\infty(T^n(x), a) < 3\varepsilon$ and $d_\infty(T^n(y), b) < 3\varepsilon$. But then
\begin{align*}
d_\infty(a, b) &\le d_\infty(a, T^n(x)) + d_\infty(T^n(x), T^n(y)) + d_\infty(T^n(y), b) \\
&\le 3\varepsilon + d_\infty(T(x), T(y)) + 3\varepsilon \\
&\le 6\varepsilon + d_\infty(T(x), T(a)) + d_\infty(T(a), T(b)) + d_\infty(T(b), T(y)) \\
&\le 6\varepsilon + \varepsilon + d_\infty(a, b) - 9\varepsilon + \varepsilon = d_\infty(a, b) - \varepsilon.
\end{align*}
This contradiction shows that $T_2^n(B) \cap B = \emptyset$ for all $n \ge 1$. But then $(a, b)$ is a wandering point for $T_2$, contradicting Lemma 2.29. □

Related notions to equicontinuity are distality and its opposite: proximality.

Definition 2.32. A dynamical system $(X, T)$ on a metric space $(X, d)$ is distal if $\liminf_n d(T^n(x), T^n(y)) > 0$ for every $x \ne y$. Conversely, a pair $(x, y) \in X^2$ is called proximal if $\liminf_n d(T^n(x), T^n(y)) = 0$. That is, a distal dynamical system has no proximal pairs (except $(x, x)$). A dynamical system $(X, T)$ is called proximal if every pair $(x, y) \in X^2$ is proximal.

Auslander & Ellis (see e.g. [13]) proved that for every $x \in X$, there exists a $y \in X$ such that $\overline{\operatorname{orb}(y)}$ is a minimal subset of $X$ and $(x, y)$ is a proximal pair. Note that proximality is not an equivalence relation: it is not transitive. For example, $(101)(000)^2(101)^3(000)^4 \cdots$ and $(000)(101)^2(000)^3(101)^4 \cdots$ are both proximal to $0^\infty$ under the shift, but not to each other. A stronger version of proximality that does give an equivalence relation is the following:

Definition 2.33. Let $(X, T)$ be a dynamical system on a metric space $(X, d)$. Then a pair of points $(x, y)$ is syndetically proximal if for every $\varepsilon > 0$, the set $\{n \in \mathbb{N} \text{ or } \mathbb{Z} : d(T^n(x), T^n(y)) < \varepsilon\}$ is syndetic.

The following result for subshifts goes back to [156, 562]; see also [434, Theorem 19] for the proof.

Theorem 2.34. Given a subshift $(X, \sigma)$, the following are equivalent:
(1) Proximality is an equivalence relation.
(2) Every proximal pair is syndetically proximal.
(3) The orbit closure $\overline{\{\sigma^n \times \sigma^n(x, y) : n \in \mathbb{N} \text{ or } \mathbb{Z}\}}$ of every $(x, y) \in X \times X$ contains exactly one minimal set in the product shift.
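The example showing that proximality is not transitive (stated before Definition 2.33) can be explored on finite prefixes. In the sketch below, the distance between two shifted sequences is measured by the number of leading symbols on which they agree, so that long agreement means small distance; the prefix length, the search range, and the helper names are assumptions chosen for the illustration.

```python
def alternating_blocks(first, second, n_blocks):
    """Prefix (first)^1 (second)^2 (first)^3 (second)^4 ... with n_blocks blocks."""
    return "".join(
        (first if k % 2 == 1 else second) * k for k in range(1, n_blocks + 1)
    )


def agreement_length(x, y, n):
    """Number of consecutive positions, starting at n, where x and y agree.

    The shift metric satisfies d(sigma^n x, sigma^n y) = 2^(-agreement),
    as long as the agreement ends inside the finite prefixes.
    """
    length = 0
    while n + length < min(len(x), len(y)) and x[n + length] == y[n + length]:
        length += 1
    return length


def best_agreement(x, y, search_range):
    """Largest agreement length over all shifts n in range(search_range)."""
    return max(agreement_length(x, y, n) for n in range(search_range))


if __name__ == "__main__":
    x = alternating_blocks("101", "000", 12)
    y = alternating_blocks("000", "101", 12)
    zeros = "0" * len(x)
    search_range = len(x) - 60  # stay away from the end of the prefix
    print("x vs 0^∞:", best_agreement(x, zeros, search_range))
    print("y vs 0^∞:", best_agreement(y, zeros, search_range))
    print("x vs y:  ", best_agreement(x, y, search_range))
```

Against $0^\infty$ the agreement grows with the prefix length, whereas $x$ and $y$ never agree on two consecutive symbols: within every aligned triple one sequence reads $101$ and the other $000$.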


Distality doesn't imply equicontinuity; see Exercise 2.37. Neither does equicontinuity imply distality; think of $T(x) = x/2$ on $X = [0, 1]$ or on $X = \mathbb{R}$. However:

Corollary 2.35. Every equicontinuous surjection $(X, T)$ on a compact metric space $(X, d)$ is distal.

Proof. Assume by contradiction that $T$ is not distal. Then there are $x \ne y$ and a sequence $(n_k)_{k \in \mathbb{N}}$ such that $d(T^{n_k}(x), T^{n_k}(y)) \to 0$. Since $X$ is compact, by taking a subsequence, we can assume $\lim_k T^{n_k}(x) = \lim_k T^{n_k}(y) = z$ in the metric $d$. But then also $\lim_k T^{n_k}(x) = \lim_k T^{n_k}(y) = z$ in the metric $d_\infty$, and this contradicts that $T$ is an isometry in $d_\infty$. □

In particular, equicontinuous surjections on a compact metric space are invertible, because distal dynamical systems are.

Corollary 2.36. An equicontinuous surjection $(X, T)$ on a compact metric space $(X, d)$ has an equicontinuous inverse.

Proof. Take $K_\varepsilon = \{(x, y) \in X^2 : d(x, y) \ge \varepsilon\}$ for $\varepsilon > 0$. We claim that
\[ \delta(\varepsilon) := \inf\{d(T^n x, T^n y) : (x, y) \in K_\varepsilon,\ n \in \mathbb{N}\} > 0. \]
Indeed, assume by contradiction that there are sequences $(x_k, y_k) \subset K_\varepsilon$ and $(n_k) \subset \mathbb{N}$ such that $d(T^{n_k} x_k, T^{n_k} y_k) \le 1/k$ for all $k \in \mathbb{N}$ and $(x_k, y_k) \to (x_\infty, y_\infty) \in K_\varepsilon$. By Corollary 2.35, $T$ is distal, so $\eta := \inf\{d(T^n(x_\infty), T^n(y_\infty)) : n \ge 0\} > 0$. By equicontinuity, there is $\gamma(\eta) > 0$ such that $d(x, y) < \gamma(\eta)$ implies that $d(T^n(x), T^n(y)) < \eta/3$ for all $n \ge 0$. Take $k > 3/\eta$ so large that $(x_k, y_k) \in B_{\gamma(\eta)}(x_\infty, y_\infty)$. Then by the triangle inequality
\[ d(T^{n_k}(x_\infty), T^{n_k}(y_\infty)) \le d(T^{n_k}(x_\infty), T^{n_k}(x_k)) + d(T^{n_k}(x_k), T^{n_k}(y_k)) + d(T^{n_k}(y_k), T^{n_k}(y_\infty)) < \eta/3 + \eta/3 + \eta/3 = \eta, \]
contradicting the choice of $\eta$. Hence two points $u, v \in X$ with $d(u, v) < \delta(\varepsilon)$ have $d(T^{-n}(u), T^{-n}(v)) < \varepsilon$ for all $n \in \mathbb{N}$. This is equicontinuity of $T^{-1}$. □

Exercise 2.37. (a) Show that the map $T(x, y) = (x, x + y)$ on the two-torus $\mathbb{T}^2$ is distal but not equicontinuous.
(b) Let $\alpha \in [0, 1]$ be irrational. Show that the map $T(x, y) = (x + \alpha, x + y)$ on the two-torus $\mathbb{T}^2$ is distal but not equicontinuous. (Here showing minimality is the hard part; see Proposition 6.26).

Proposition 2.38. Every subshift $(X, \sigma)$ with a non-periodic minimal set is proximal (so not equicontinuous by Corollary 2.35).


The non-periodicity is essential; otherwise $X = \{(01)^\infty, (10)^\infty\}$ is an equicontinuous counterexample. Non-periodicity implies in particular that $X$ is uncountable.

Proof. First assume that the shift is one-sided. If it is distal, then it has to be invertible, and therefore a homeomorphism. But a one-sided shift is locally expanding, and locally expanding homeomorphisms only exist on finite spaces; see Proposition 1.42. Hence, there are no distal one-sided shifts other than finite unions of periodic orbits.

Now if $(X, \sigma)$ is a two-sided shift, then its one-sided restriction $(X^+, \sigma)$ is a subshift too. Here we need to check that $\sigma : X^+ \to X^+$ is surjective, but this follows because if $x^+$ is the one-sided restriction of $x \in X$, then $y^+ := \sigma^{-1}(x)^+ \in X^+$ and $\sigma(y^+) = x^+$. Furthermore, since $X$ has a non-periodic minimal set, $X^+$ has a non-periodic minimal set too. Thus the above argument shows that $(X^+, \sigma)$ cannot be distal. □

Definition 2.39. Given a dynamical system $(X, T)$, we say that $(Y, S)$ is the maximal equicontinuous factor (MEF) if it is equicontinuous and semi-conjugate to $(X, T)$ and every other equicontinuous factor of $(X, T)$ is also a factor of $(Y, S)$.

Every dynamical system has an MEF, and it can be shown that the MEF is unique up to conjugacy. This goes back to a result of Ellis & Gottschalk [236]. The proof we give is for invertible dynamical systems^7 and relies on the notion of regional proximality:

Definition 2.40. Let $(X, T)$ be a dynamical system on a metric space $(X, d)$. Two points $x, y \in X$ are regionally proximal if there are sequences $x_i \to x$ and $y_i \to y$ and $(n_i) \subset \mathbb{N}$ such that $d(T^{n_i}(x_i), T^{n_i}(y_i)) \to 0$. In this case we write $x \sim_{rp} y$. It is not obvious that $\sim_{rp}$ is a transitive relation^8, and therefore we take the transitive hull: $x \sim_{trp} y$ if there is a sequence $x = z_0 \sim_{rp} z_1 \sim_{rp} \cdots \sim_{rp} z_N = y$.

Proposition 2.41. Every continuous invertible dynamical system $(X, T)$ on a compact metric space $(X, d)$ has a maximal equicontinuous factor.

Proof. First we note that if $(X, T)$ is equicontinuous and $x \sim_{rp} y$, then $x = y$. Indeed, otherwise for any $\varepsilon > 0$ and $\delta = \delta(\varepsilon)$ as in the definition of equicontinuity, there are $x_i \in B_\delta(x)$, $y_i \in B_\delta(y)$, and $n_i$ such that $d(T^{n_i}(x_i), T^{n_i}(y_i)) < \varepsilon$. But then also
\[ d(T^{n_i}(x), T^{n_i}(y)) \le d(T^{n_i}(x), T^{n_i}(x_i)) + d(T^{n_i}(x_i), T^{n_i}(y_i)) + d(T^{n_i}(y_i), T^{n_i}(y)) < 3\varepsilon. \]

^7 See [381, Theorem 2.44] for a proof of the non-invertible case, which is not constructive when it comes to the factor map.

^8 See e.g. [321, 434, 497] for further information.

2.3. Equicontinuous and Distal Systems


Therefore $(x, y)$ is not a distal pair, but equicontinuous maps are distal; see Corollary 2.35.

The (transitive hull) relation $\sim_{trp}$ is an equivalence relation that is $T$-invariant and also $T^{-1}$-invariant. The equivalence classes are closed, and if $x_k \to x$, $y_k \to y$ are such that $x_k \sim_{trp} y_k$, then also $x \sim_{trp} y$. Therefore the quotient space $X_{eq} = X/\sim_{trp}$ is a well-defined Hausdorff space (and in fact a metric space with quotient metric $d_{eq}$), and the maps $T$ and $T^{-1}$ are well-defined on it.

Now suppose by contradiction that $T$ and hence $T^{-1}$ is not equicontinuous on the quotient space $X_{eq}$. Then there is $\varepsilon > 0$ such that for all $i \in \mathbb{N}$, there are $x_i, y_i \in X_{eq}$ with $d_{eq}(x_i, y_i) < 1/i$, and $n_i \in \mathbb{N}$ such that $d_{eq}(x'_i, y'_i) > \varepsilon$ for $x'_i = T^{-n_i}(x_i)$ and $y'_i = T^{-n_i}(y_i)$. By passing to a subsequence, we can assume that $x'_i \to x$ and $y'_i \to y$ and $d_{eq}(x, y) \ge \varepsilon$. But $x \sim_{trp} y$ by construction, contradicting that $X_{eq}$ has only trivial regionally proximal pairs.

2.3.1. Mean Equicontinuity. Instead of assuming that nearby points always remain close under iteration, mean equicontinuity stipulates that iterates of nearby points remain close on average. This notion was first used by Fomin [250] under the name of mean Lyapunov stability⁹.

Definition 2.42. A dynamical system $(X, T, d)$ on a metric space is called mean equicontinuous if for every $\varepsilon > 0$ there is a $\delta > 0$ such that $d(x, y) < \delta$ implies
\[ \limsup_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} d(T^i x, T^i y) < \varepsilon. \]

Mean equicontinuity is more versatile than its strict version. Clearly, circle rotations $R_\alpha : S^1 \to S^1$ with angle $\alpha \notin \mathbb{Q}$ are isometries and therefore equicontinuous. Their symbolic versions, i.e. Sturmian shifts, see Section 4.3, are expansive and therefore not equicontinuous. Indeed, equip $S^1$ with an orientation and a partition $\{[0, \alpha), [\alpha, 1)\}$, with symbols 1 and 0, respectively, as is done in Example 1.34. If $x < y \in S^1$ are very close together, then there are still iterates $n \in \mathbb{N}$ such that $R_\alpha^n(x) < 0 < R_\alpha^n(y)$, so the symbolic distance $d_\sigma(\sigma^n \circ i(x), \sigma^n \circ i(y)) = 1$. However, since this happens less frequently as the distance $|x - y|$ becomes smaller, mean equicontinuity of a Sturmian shift is still achieved.

Another variation of equicontinuity, which is a priori stronger than mean equicontinuity, is Weyl mean equicontinuity: for every $\varepsilon > 0$ there is a $\delta > 0$ such that $d(x, y) < \delta$ implies
\[ \limsup_{n-m\to\infty} \frac{1}{n-m} \sum_{i=m}^{n-1} d(T^i x, T^i y) < \varepsilon. \]

9 This was defined as follows: for every $\varepsilon > 0$ there is $\delta > 0$ such that $d(x, y) < \delta$ implies $d(T^n(x), T^n(y)) < \varepsilon$ for all $n \in \mathbb{N}$ except for a set of density zero. This is equivalent to Definition 2.42 by Lemma 8.53.

However, it was proved in [211] for minimal dynamical systems (and [263, 464] in more generality) that $(X, T)$ is mean equicontinuous if and only if for every $\varepsilon > 0$ there is a $\delta > 0$ and $N \in \mathbb{N}$ such that $d(x, y) < \delta$ implies
\[ \frac{1}{n-m} \sum_{i=m}^{n-1} d(T^i x, T^i y) < \varepsilon \quad \text{for all } m \text{ and } n \ge m + N. \]
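The heuristic for Sturmian shifts described above (symbols of nearby points differ rarely, so the averaged symbolic distance is small) can be probed numerically. The following Python sketch is illustrative only. It assumes the golden-mean angle α = (√5 − 1)/2, the coding of [0, α) by 1 and of [α, 1) by 0, a one-sided symbolic metric equal to 2^(−k) where k is the first index of disagreement, and a finite horizon n in place of the lim sup. The names `symbol`, `itinerary` and `mean_symbolic_distance` are hypothetical helpers, not notation from the text.

```python
import math

# Assumed irrational rotation angle (golden mean); any irrational angle would do.
ALPHA = (math.sqrt(5) - 1) / 2


def symbol(x, alpha=ALPHA):
    """Symbol 1 on [0, alpha), symbol 0 on [alpha, 1)."""
    return 1 if (x % 1.0) < alpha else 0


def itinerary(x, length, alpha=ALPHA):
    """First `length` symbols of the itinerary of x under the rotation by alpha."""
    return [symbol(x + j * alpha, alpha) for j in range(length)]


def mean_symbolic_distance(x, y, n=2000, depth=30, alpha=ALPHA):
    """Finite-horizon Cesaro average of d_sigma(sigma^j i(x), sigma^j i(y)), 0 <= j < n.

    d_sigma(u, v) is taken to be 2**(-k), with k the first index where u and v
    differ; disagreements beyond `depth` are ignored (an error below 2**(-depth)).
    """
    u = itinerary(x, n + depth, alpha)
    v = itinerary(y, n + depth, alpha)
    total = 0.0
    for j in range(n):
        for k in range(depth):
            if u[j + k] != v[j + k]:
                total += 2.0 ** (-k)
                break
    return total / n


if __name__ == "__main__":
    for gap in (0.1, 0.01, 0.001):
        print(gap, mean_symbolic_distance(0.2, 0.2 + gap))
```

Under these assumptions the printed averages should shrink with the gap |x − y|, even though individual terms of the sum are still equal to 1 at the times when the two orbits straddle a partition boundary.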

Some of the stronger results on mean equicontinuity rely on invariant measures and therefore don’t quite fit in this section on topological dynamics. We present some of this nonetheless and refer to Chapter 6 for the relevant details. Given a $T$-invariant Borel probability measure $\mu$, we call $(X, T)$ $\mu$-mean equicontinuous if for every $\eta > 0$, there is a set $Y \subset X$ of measure $\mu(Y) > 1 - \eta$ such that $T$ is mean equicontinuous on $Y$.

As shown in [208, 449], if $(X, T)$ is an almost one-to-one extension of a minimal equicontinuous dynamical system $(Y, S)$, then $(Y, S)$ is the maximal equicontinuous factor of $(X, T)$. It follows from Theorem 6.22 (or more precisely the remarks that follow it) that transitive mean equicontinuous dynamical systems are uniquely ergodic. Thus the following characterization of mean equicontinuity, due to [211] for minimal dynamical systems and to [263] in general, makes sense:

Theorem 2.43. A continuous dynamical system $(X, T)$ is mean equicontinuous if and only if its semi-conjugacy to its maximal equicontinuous factor is at the same time a measure-theoretic isomorphism between the unique invariant probability measures of $(X, T)$ and its maximal equicontinuous factor¹⁰.

Let us show that the symbolic version of an equicontinuous homeomorphism with a partition that is not too complicated (see condition (2) below) is mean equicontinuous.

Theorem 2.44. Let $(X, T)$ be an equicontinuous homeomorphism on a compact metric space $(X, d)$ with $T$-invariant measure¹¹ $\mu$. Let $\mathcal{P} = \{P_0, \dots, P_{r-1}\}$ be a finite partition such that:
(1) $\mathcal{P}$ is generating (cf. Theorem 6.48); i.e. for every $x \ne x' \in X$ there is $n \in \mathbb{Z}$ such that $T^n(x)$ and $T^n(x')$ lie in different partition elements;
(2) $\lim_{\varepsilon\to 0} \mu(U_\varepsilon) = 0$, where $U_\varepsilon$ is the $\varepsilon$-neighborhood of $\partial\mathcal{P} = \{x \in X : x \in \overline{P_i} \cap \overline{P_j} \text{ for some } 0 \le i < j < r\}$.
Let $(Y, \sigma)$ be the symbolic system associated to $(X, T, \mathcal{P})$, i.e. the smallest subshift such that the itinerary $i(x) \in Y$ for every $x \in X$. Then $(Y, \sigma)$ is mean equicontinuous.

10 In this case, $(X, T)$ is called a topo-isomorphic extension of its MEF.
11 If $(X, T)$ is minimal, then $\mu$ is unique.

Proof. If $(X, T)$ is transitive, then $\mu$ is the only $T$-invariant probability measure, see Theorem 6.22, and we can use Oxtoby’s Ergodic Theorem 6.20 later on. Otherwise, we can separate $X$ into transitive parts and deal with each part separately.

Choose $N \in \mathbb{N}$ arbitrary and $0 < \varepsilon' < 2^{-N}/(2N + 1)$. Choose $\varepsilon > 0$ so small that $\mu(U_\varepsilon) < \varepsilon'$. By equicontinuity of $(X, T)$ there is $\delta > 0$ such that $d(T^n(x), T^n(x')) < \varepsilon$ for all $n \in \mathbb{Z}$ whenever $d(x, x') < \delta$. Next take $M \in \mathbb{N}$ so large that the diameter $\operatorname{diam}(i^{-1}([e_{-M} \cdots e_M])) < \delta$ for every two-sided $(2M + 1)$-cylinder $[e_{-M} \cdots e_M]$. Now take $y, y' \in Y$ such that $d_\sigma(y, y') \le 2^{-M}$, where $d_\sigma$ is the symbolic metric; i.e. $y, y'$ are in the same two-sided $(2M + 1)$-cylinder.

The sequences $y, y'$ may not be well-defined itineraries of points in $X$, but this is remedied by assuming that points $x \in X$ such that $T^n(x) \in \partial\mathcal{P}$ get multiple itineraries, according to which $\overline{P_i}$ contains $T^n(x)$. In this sense there are $x, x'$ such that at least one of their multiple itineraries equals $y$ and $y'$, respectively. In particular, $d(x, x') < \delta$ and therefore $d(T^n(x), T^n(x')) < \varepsilon$ for all $n \in \mathbb{Z}$. The points $T^n(x)$ and $T^n(x')$ can only lie in different partition elements if they both lie in $U_\varepsilon$. Unless $T^n(x), T^n(x') \in V_\varepsilon := \bigcup_{j=-N}^{N} T^j(U_\varepsilon)$, their itineraries satisfy $d_\sigma(i(T^n(x)), i(T^n(x'))) \le 2^{-N}$. But the measure $\mu(V_\varepsilon) \le (2N + 1)\varepsilon'$ and by Oxtoby’s Ergodic Theorem 6.20¹², $x$ and $x'$ visit $V_\varepsilon$ with frequency $\le (2N + 1)\varepsilon'$. Therefore
\[
\begin{aligned}
\limsup_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} d_\sigma(\sigma^j(y), \sigma^j(y'))
&= \limsup_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} d_\sigma(\sigma^j(i(x)), \sigma^j(i(x'))) \\
&\le \limsup_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} d_\sigma(i(T^j(x)), i(T^j(x'))) \\
&\le (2N + 1)\varepsilon' + (1 - (2N + 1)\varepsilon')\,2^{-N} \le 2^{-N+1}.
\end{aligned}
\]
This proves mean equicontinuity.



12 We will apply Oxtoby’s Ergodic Theorem for the indicator function $1_{V_\varepsilon}$, which is discontinuous. But by assumption (2), $\mu(\partial V_\varepsilon)$ can be made arbitrarily small by taking $\varepsilon$ small, so that $1_{V_\varepsilon}$ can be approximated by a continuous function with negligible error.


2.4. Topological Entropy

The notion of topological entropy was introduced by Adler, Konheim & McAndrew [9] in 1965. Nowadays, the definition due to the American mathematician Rufus Bowen [102] and, independently, his Russian colleague Efim Dinaburg [202] is most often¹³ used. Entropy is a measure of disorder of the dynamical system, and one popular definition of chaos is that the topological entropy is positive.

Let $(X, T)$ be a continuous dynamical system on a compact metric space $(X, d)$. If my eyesight is not so good, I cannot distinguish two points $x, y \in X$ if $d(x, y) \le \varepsilon$. I may still be able to distinguish their orbits, if $d(T^k x, T^k y) > \varepsilon$ for some $k \ge 0$. Hence, if I’m willing to wait up to $n - 1$ iterations, I can distinguish $x$ and $y$ if
\[ d_n(x, y) := \max\{d(T^k x, T^k y) : 0 \le k < n\} > \varepsilon. \]
If this holds, then $x$ and $y$ are said to be $(n, \varepsilon)$-separated. Among all the subsets of $X$ of which all elements are mutually $(n, \varepsilon)$-separated, choose one, say $E_n(\varepsilon)$, of maximal cardinality. Then $s_n(\varepsilon) := \#E_n(\varepsilon)$ is the maximal number of $n$-orbits I can distinguish with my $\varepsilon$-poor eyesight.

Remark 2.45. Compactness of $X$ together with continuity of $T$ ensures that $s_n(\varepsilon) < \infty$. However, also for discontinuous maps, such as β-transformations, it can be proven that $s_n(\varepsilon) < \infty$ for all $\varepsilon > 0$ and $n \in \mathbb{N}$. Consequently, this approach to topological entropy usually also works for discontinuous functions.

The topological entropy is defined as the limit (as $\varepsilon \to 0$) of the exponential growth rate of $s_n(\varepsilon)$:
\[ h_{top}(T) = \lim_{\varepsilon\to 0} \limsup_{n\to\infty} \frac{1}{n} \log s_n(\varepsilon). \tag{2.4} \]

Note that $s_n(\varepsilon_1) \ge s_n(\varepsilon_2)$ if $\varepsilon_1 \le \varepsilon_2$, so $\limsup_n \frac{1}{n} \log s_n(\varepsilon)$ is a decreasing function in $\varepsilon$, and the limit as $\varepsilon \to 0$ indeed exists (we allow the limit to be $\infty$).

Instead of $(n, \varepsilon)$-separated sets, we can also work with $(n, \varepsilon)$-spanning sets, that is, sets that contain, for every $x \in X$, a point $y$ such that $d_n(x, y) \le \varepsilon$. Let $r_n(\varepsilon)$ denote the minimal cardinality among all $(n, \varepsilon)$-spanning sets. Due to its maximality, $E_n(\varepsilon)$ is always $(n, \varepsilon)$-spanning, and no proper subset of $E_n(\varepsilon)$ is $(n, \varepsilon)$-spanning. Each $y \in E_n(\varepsilon)$ must have a point of an $(n, \varepsilon/2)$-spanning set within an $\varepsilon/2$-ball (in $d_n$-metric) around it, and by the triangle inequality, this $\varepsilon/2$-ball is disjoint from the $\varepsilon/2$-balls centered around all other points in $E_n(\varepsilon)$. Therefore,
\[ r_n(\varepsilon) \le s_n(\varepsilon) \le r_n(\varepsilon/2). \tag{2.5} \]
Thus we can equally well define
\[ h_{top}(T) = \lim_{\varepsilon\to 0} \limsup_{n\to\infty} \frac{1}{n} \log r_n(\varepsilon). \tag{2.6} \]

13 Note, however, that the Adler, Konheim & McAndrew definition requires only a topology, whereas the Bowen-Dinaburg definition is metric.

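Example 2.46. Let $(X, \sigma)$ be the full shift on $N$ symbols. Let $\varepsilon > 0$ be arbitrary, and take $m$ minimal such that $2^{-m} < \varepsilon$. If we select a point from each $(n + m)$-cylinder, this gives an $(n, \varepsilon)$-spanning set, whereas selecting one point from each $n$-cylinder gives an $(n, \varepsilon)$-separated set. Therefore
\[
\begin{aligned}
\log N = \limsup_{n\to\infty} \frac{1}{n} \log N^n
&\le \limsup_{n\to\infty} \frac{1}{n} \log s_n(\varepsilon) \le h_{top}(\sigma) \\
&\le \limsup_{n\to\infty} \frac{1}{n} \log r_n(\varepsilon) \le \limsup_{n\to\infty} \frac{1}{n} \log N^{n+m} = \log N.
\end{aligned}
\]

Exercise 2.47. Show that for subshifts the definition of (1.3) coincides with the $(n, \varepsilon)$-definition in this section.

Example 2.48. Consider the β-transformation $T_\beta : [0, 1) \to [0, 1)$, $x \mapsto \beta x \bmod 1$ for some $\beta > 1$. Take $\varepsilon < \frac{1}{2\beta^2}$ and $G_n = \{\frac{k}{\beta^n} : 0 \le k < \beta^n\}$. Then $G_n$ is $(n, \varepsilon)$-separated, so $s_n(\varepsilon) \ge \beta^n$. On the other hand, $G'_n = \{\frac{2k\varepsilon}{\beta^n} : 0 \le k < \frac{\beta^n}{2\varepsilon}\}$ is $(n, \varepsilon)$-spanning, so $r_n(\varepsilon) \le \frac{\beta^n}{2\varepsilon}$. Therefore
\[ \log \beta = \limsup_{n\to\infty} \frac{1}{n} \log \beta^n \le h_{top}(T_\beta) \le \limsup_{n\to\infty} \frac{1}{n} \log \frac{\beta^n}{2\varepsilon} = \log \beta. \]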
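The separated-set half of Example 2.48 can be checked by brute force in a small case. The sketch below is a hedged illustration, not part of the text: it fixes the integer slope β = 2 (the doubling map), uses exact rational arithmetic, takes ε = 1/10 < 1/(2β²), and verifies for a few values of n that the grid G_n is (n, ε)-separated in the Bowen metric d_n. The helper names `t_beta`, `d_n` and `is_separated` are assumptions introduced here.

```python
from fractions import Fraction
from itertools import combinations
import math


def t_beta(x, beta):
    """beta-transformation x -> beta*x mod 1 on [0, 1), in exact arithmetic."""
    y = beta * x
    return y - math.floor(y)


def d_n(x, y, n, beta):
    """Bowen metric d_n(x, y) = max over 0 <= k < n of |T^k x - T^k y|."""
    best = abs(x - y)
    for _ in range(n - 1):
        x, y = t_beta(x, beta), t_beta(y, beta)
        best = max(best, abs(x - y))
    return best


def is_separated(points, n, eps, beta):
    """True if all pairs of distinct points are (n, eps)-separated."""
    return all(d_n(x, y, n, beta) > eps for x, y in combinations(points, 2))


beta = 2                       # assumed integer slope, so the grid is exact
eps = Fraction(1, 10)          # any eps < 1/(2*beta**2) = 1/8
for n in range(1, 7):
    grid = [Fraction(k, beta ** n) for k in range(beta ** n)]
    assert is_separated(grid, n, eps, beta)
    # growth rate (1/n) log #G_n, which equals log(beta) up to rounding
    print(n, math.log(len(grid)) / n)
```

The check only covers the lower bound s_n(ε) ≥ βⁿ for this one slope. The spanning half of the example and non-integer β are not tested here.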

Circle rotations, or in general isometries, have zero topological entropy. Indeed, if $E(\varepsilon)$ is an $\varepsilon$-separated set (or $\varepsilon$-spanning set), it will also be $(n, \varepsilon)$-separated (or $(n, \varepsilon)$-spanning) for every $n \ge 1$. Hence $s_n(\varepsilon)$ and $r_n(\varepsilon)$ are independent of $n$, and their exponential growth rates are equal to zero. In more generality:

Proposition 2.49. Every equicontinuous transformation $(X, T)$ on a compact metric space $(X, d)$ has zero entropy.

Proof. Let $\varepsilon > 0$ be arbitrary and choose $\delta > 0$ as in the definition of equicontinuity. Then $\operatorname{diam}(T^n(B_\delta(x))) \le 2\varepsilon$ for all $x \in X$ and $n \ge 0$ (or $n \in \mathbb{Z}$ if $T$ is invertible). By compactness, $X$ can be covered by finitely many, say $M$, $\delta$-balls. Hence, this single cover of $X$ by $M$ $\delta$-balls constitutes a cover by sets of $d_n$-diameter at most $2\varepsilon$ for all $n$. Therefore $h_{top}(T) \le \lim_{\varepsilon\to 0} \lim_{n\to\infty} \frac{1}{n} \log M = 0$.

Corollary 2.50. Given a continuous map $T : X \to X$, $h_{top}(T^k) = k\,h_{top}(T)$ for all $k \ge 0$, and if $T$ is invertible, then $h_{top}(T^k) = |k|\,h_{top}(T)$ for all $k \in \mathbb{Z}$.


Proof. For any $k \in \mathbb{N}$, a $(kn, \varepsilon)$-separated set for $T$ is also an $(n, \varepsilon)$-separated set for $T^k$. Therefore
\[ h_{top}(T^k) = \lim_{n\to\infty} \frac{1}{n} \log s_n(\varepsilon, T^k) = k \lim_{n\to\infty} \frac{1}{kn} \log s_{kn}(\varepsilon, T) = k\,h_{top}(T). \]
Clearly the identity $T^0$ has zero entropy. If $T$ is invertible and $E_n(\varepsilon)$ is an $(n, \varepsilon)$-separated set, then $T^{n-1}(E_n(\varepsilon))$ is an $(n, \varepsilon)$-separated set for $T^{-1}$. Therefore $h_{top}(T^{-1}) = h_{top}(T)$. Combined with the first part, it follows that $h_{top}(T^k) = |k|\,h_{top}(T)$ for all $k \in \mathbb{Z}$.

Corollary 2.51. If $(Y, S)$ is a continuous factor of $(X, T)$ (where $(X, d)$ is a compact metric space), then $h_{top}(S) \le h_{top}(T)$. In particular, conjugate dynamical systems on compact metric spaces have the same topological entropy.

Proof. Let $\pi : X \to Y$ be a continuous factor map. Since $X$ is compact, $\pi$ is uniformly continuous, so for $\varepsilon > 0$, we can find $\delta > 0$ such that $d(x, y) < \delta$ implies $d(\pi(x), \pi(y)) < \varepsilon$. Therefore, if $E_n(\delta)$ is an $(n, \delta)$-spanning set for $T$, then $\pi(E_n(\delta))$ is an $(n, \varepsilon)$-spanning set for $S$ (but possibly not a minimal $(n, \varepsilon)$-spanning set, even if $E_n(\delta)$ is minimal). It follows that $r_n(\delta, T) \ge r_n(\varepsilon, S)$, and hence $h_{top}(T) \ge h_{top}(S)$.

Proposition 2.52. Let $(X, T)$ be a continuous dynamical system on a compact metric space $(X, d)$. The entropy of its restriction to the non-wandering set $\Omega(T)$ satisfies $h_{top}(T) = h_{top}(T|_{\Omega(T)})$.

Since $T$-invariant measures have to be supported on the non-wandering set, Proposition 2.52 follows from the Variational Principle (Theorem 6.63). A direct proof (not using invariant measures) can be found in [22, Lemma 4.1.5].

Example 2.53. The non-wandering set $\Omega(\sigma)$ of the subshift
\[ X = \{0^{n_1} 1^{n_2} 0^{n_3} 1^{n_4} \cdots : 0 \le n_1 \le \max\{n_1, 1\} \le n_2 \le n_3 \le n_4 \le \cdots\} \]
consists of periodic orbits $0^k 1^k 0^k 1^k \cdots$ or $1^k 0^k 1^k 0^k \cdots$, i.e. with period $2k$. Therefore the number of $2n$-periodic points (not necessarily of prime period $2n$) equals twice the number of divisors of $n$ and hence is $\le 2n$. In view of Proposition 2.52, we have $h_{top}(\sigma) = 0$.

2.4.1. Amorphic Complexity.
If the cardinalities of $(n, \varepsilon)$-separated and of $(n, \varepsilon)$-spanning sets increase subexponentially, then one could compute the polynomial growth rate instead. This is called power entropy:
\[ h_{pow}(T) = \lim_{\varepsilon\to 0} \limsup_{n\to\infty} \frac{\log s_n(\varepsilon)}{\log n}; \tag{2.7} \]
see [304]. However, in practice this isn’t a very powerful tool to distinguish between dynamical systems, because, for instance, all dynamical systems with linear word-complexity have $h_{pow}(T) = 1$. A recent approach [264], which turns out to distinguish between many zero-entropy systems (even of linear complexity and between some semi-conjugate dynamical systems), is amorphic complexity¹⁴. It is based on the average time $v$ that orbits are $\delta$ apart.

Given a dynamical system $(X, T)$ on a metric space $(X, d)$, two points $x, y \in X$ are $(\delta, v)$-separated for some $\delta > 0$ if
\[ \limsup_{n\to\infty} \frac{1}{n} \#\{0 \le j < n : d(T^j(x), T^j(y)) \ge \delta\} \ge v. \]
A set $S \subset X$ is $(\delta, v)$-separated if every $x \ne y \in S$ is $(\delta, v)$-separated. Let $\operatorname{Sep}(\delta, v)$ denote the maximal cardinality of the $(\delta, v)$-separated sets. We say that $(X, T)$ has finite separation numbers if $\operatorname{Sep}(\delta, v) < \infty$ for all $\delta, v > 0$. If $\operatorname{Sep}(\delta, v) = \infty$ for some $\delta, v > 0$, then $(X, T)$ has infinite separation numbers, and in this case the amorphic complexity defined below is infinite, hence not so useful. This occurs, for instance, in the following cases; see [264, Theorem 1.1]:

Theorem 2.54. Let $(X, T)$ be a continuous dynamical system on a compact metric space $(X, d)$. If $h_{top}(T) > 0$ or $T$ is weakly mixing w.r.t. some non-atomic invariant probability measure (see Definition 6.83), then $T$ has infinite separation numbers.

Hence we are only interested in dynamical systems with separation numbers that are finite, but potentially unbounded in $v$.

Definition 2.55. Assume that $(X, T)$ has finite separation numbers. The upper/lower amorphic complexity is the polynomial growth rate of the separation numbers as a function of $v$ tending to zero:
\[
\begin{cases}
\overline{ac}(T) = \sup_{\delta>0} \limsup_{v\to 0} \dfrac{\log \operatorname{Sep}(\delta, v)}{-\log v}, \\[2ex]
\underline{ac}(T) = \sup_{\delta>0} \liminf_{v\to 0} \dfrac{\log \operatorname{Sep}(\delta, v)}{-\log v}.
\end{cases}
\tag{2.8}
\]
If these quantities are the same, then $ac(T) = \sup_{\delta>0} \lim_{v\to 0} \frac{\log \operatorname{Sep}(\delta, v)}{-\log v}$ is the amorphic complexity of $T$.

Remark 2.56. Amorphic complexity can also be defined by spanning sets [264, Section 3.2]. A set $S \subset X$ is $(\delta, v)$-spanning if for every $y \in X$ there is an $x \in S$ such that
\[ \limsup_{n\to\infty} \frac{1}{n} \#\{0 \le j < n : d(T^j(x), T^j(y)) \ge \delta\} < v. \]
Letting $\operatorname{Span}(\delta, v)$ denote the minimal cardinality of the $(\delta, v)$-spanning sets, (2.8) holds with $\operatorname{Sep}(\delta, v)$ replaced by $\operatorname{Span}(\delta, v)$.

14 This notion was first used in the context of aperiodic tilings that approximate “amorphous” material. The name was coined for this reason.


If $T$ is an isometry, then the frequency of two points $x, y \in X$ being $\ge \delta$ apart is 0 or 1, depending on whether $d(x, y) < \delta$ or $\ge \delta$. Therefore $\operatorname{Sep}(\delta, v)$ is independent of $v$, so $ac(T) = 0$. More generally:

Proposition 2.57. If $(X, T)$ is equicontinuous, then the amorphic complexity $ac(T) = 0$.

Proof. Let $\varepsilon > 0$ be arbitrary. By equicontinuity and the compactness of $X$, we can take $\delta > 0$ such that $T^n(B_\delta(x)) \subset B_{\varepsilon/2}(T^n(x))$ for all $x \in X$ and $n \in \mathbb{N}$ or $\mathbb{Z}$. Thus two points in $B_\delta(x)$ are never $(\varepsilon, v)$-separated for any $v \in (0, 1]$. Let $N(\delta)$ be the number of such $\delta$-balls that can be packed in $X$, so that no such ball contains the center of another. Then
\[ \frac{\log \operatorname{Sep}(\varepsilon, v)}{-\log v} \le \frac{\log N(\delta)}{-\log v} \to 0 \quad \text{as } v \to 0. \]
Therefore $ac(T) = 0$.
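The observation that for an isometry the frequency of being at least δ apart is either 0 or 1 can be illustrated with a finite-horizon proxy for the lim sup in the definition of (δ, v)-separation. The sketch below is a hypothetical helper under stated assumptions: it uses floating-point arithmetic, a finite number n of iterates, and the circle rotation by the assumed angle √2 − 1 with the arc-length metric on the circle of circumference 1.

```python
import math


def circle_distance(a, b):
    """Distance on the circle R/Z between points given by numbers in [0, 1)."""
    diff = abs(a - b) % 1.0
    return min(diff, 1.0 - diff)


def separation_frequency(step, dist, x, y, delta, n):
    """Fraction of times 0 <= j < n at which dist(T^j x, T^j y) >= delta.

    This is a finite-n stand-in for the lim sup frequency used to define
    (delta, v)-separated pairs; `step` plays the role of the map T.
    """
    hits = 0
    for _ in range(n):
        if dist(x, y) >= delta:
            hits += 1
        x, y = step(x), step(y)
    return hits / n


alpha = math.sqrt(2) - 1            # assumed irrational rotation angle


def rotate(x):
    """One step of the circle rotation by alpha."""
    return (x + alpha) % 1.0


# Two points at circle distance 0.1: always >= 0.05 apart, never >= 0.2 apart.
print(separation_frequency(rotate, circle_distance, 0.1, 0.2, 0.05, 1000))
print(separation_frequency(rotate, circle_distance, 0.1, 0.2, 0.2, 1000))
```

Because the rotation preserves distances, the count does not depend on v, which is the reason Sep(δ, v) is independent of v in the isometric case.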



Further properties concern iterates and factors; see [264, Proposition 1.3].

Lemma 2.58. Let $(X, T)$ and $(Y, S)$ be two dynamical systems on compact metric spaces.
• If $(Y, S)$ is a topological factor of $(X, T)$, then $ac(S) \le ac(T)$. In particular, amorphic complexity is preserved under conjugacy.
• $ac(T^n) = ac(T)$ for every $n \in \mathbb{N}$.
• $ac(S \times T) = ac(S) + ac(T)$.

In later sections, we compute the amorphic complexity of some particular dynamical systems, such as Sturmian shifts, see Section 4.3.1, and Toeplitz shifts, see Section 4.5.

2.5. Mathematical Chaos

Mathematical chaos doesn’t have a single definition, but the basic idea it tries to capture is that forward orbits are unpredictable. The computation of orbits in any (physical) dynamical system inherently brings errors: measurement errors, round-off errors, errors in the mathematical model. Unpredictability means that initial errors blow up over time (sometimes exponentially fast, as is the case with subshifts). Therefore distal dynamical systems on compact spaces (in particular isometries) are not chaotic in any common definition. On the other hand, expansivity is in general too strong a property to require for chaos. For instance, a tent map $T_s : [0, 1] \to [0, 1]$, $T_s : x \mapsto \min\{sx, s(1 - x)\}$ is chaotic if the slope $s \in (\sqrt{2}, 2]$, but not expansive. Indeed, $x = \frac{1+\varepsilon}{2}$ and $y = \frac{1-\varepsilon}{2}$ are $\varepsilon$ apart, but $T_s^n(x) = T_s^n(y)$ for all $n \ge 1$. A weaker, more


appropriate, definition in this context is the following:

Definition 2.59. A dynamical system $(X, T)$ on a metric space $(X, d)$ has sensitive dependence on initial conditions if there is $\delta > 0$ such that for all $\varepsilon > 0$ and $x \in X$, there is $y \in B_\varepsilon(x)$ and $n \ge 0$ such that $d(T^n(x), T^n(y)) > \delta$.

This leads to one of the most common definitions of chaos [196]:

Definition 2.60. A dynamical system $(X, T)$ on a metric space $(X, d)$ is chaotic in the sense of Devaney if:
1. $(X, T)$ has sensitive dependence on initial conditions;
2. $(X, T)$ has a dense orbit;
3. $X$ has a dense set of periodic orbits.

As was soon realized by Banks et al. [46], unless $X$ is a single periodic orbit, 1 follows automatically from 2 and 3. See also Silverman’s study [510] on chaos and topological transitivity.

Proposition 2.61. Let $(X, T)$ be a continuous dynamical system on an infinite metric space $(X, d)$. If $T$ has a dense set of periodic orbits as well as a dense orbit, then $T$ has sensitive dependence on initial conditions.

Proof. Since $X$ is infinite and has a dense orbit, no periodic point is isolated, and there are at least two periodic orbits, say $\operatorname{orb}(p)$ and $\operatorname{orb}(q)$. Let $\delta := \min\{d(x, y) : x, y \in \operatorname{orb}(p) \cup \operatorname{orb}(q), x \ne y\}/6 > 0$. Take $x \in X$ and $\varepsilon > 0$ arbitrary. Then $B_\varepsilon(x)$ contains a periodic point $r \notin \operatorname{orb}(p) \cup \operatorname{orb}(q)$. If there is $n \ge 0$ such that $d(T^n(x), T^n(r)) > \delta$, then sensitive dependence is established at $x$. Therefore assume that $d(T^n(x), T^n(r)) \le \delta$ for all $n \ge 0$.

Since there is a dense orbit, we can find $y \in B_\varepsilon(x)$ such that $p, q \in \overline{\operatorname{orb}(y)} = X$. If there is $n \ge 0$ such that $d(T^n(x), T^n(y)) > \delta$, then sensitive dependence is again established at $x$, so we assume that $d(T^n(x), T^n(y)) \le \delta$ for all $n \ge 0$. Take $j, k \in \mathbb{N}$ such that
\[ d(T^{j+i}(y), T^i(p)) < \delta \quad \text{and} \quad d(T^{k+i'}(y), T^{i'}(q)) < \delta \]
for all $0 \le i, i' \le \operatorname{per}(r)$. We can choose $0 \le i, i' \le \operatorname{per}(r)$ such that $r = T^{j+i}(r) = T^{k+i'}(r)$. Therefore
\[ d(T^i(p), r) \le d(T^i(p), T^{j+i}(y)) + d(T^{j+i}(y), T^{j+i}(x)) + d(T^{j+i}(x), r) < 3\delta \]
and
\[ d(T^{i'}(q), r) \le d(T^{i'}(q), T^{k+i'}(y)) + d(T^{k+i'}(y), T^{k+i'}(x)) + d(T^{k+i'}(x), r) < 3\delta. \]
But then $d(T^i(p), T^{i'}(q)) < 6\delta$, contradicting the choice of $\delta$. This proves the result.
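Both the failure of expansivity for tent maps and Definition 2.59 can be checked by hand in small cases; the following Python sketch does so with exact rational arithmetic. It is illustrative only: the slopes s = 3/2 and s = 2, the starting points, the threshold δ = 1/5 and the helper names `tent` and `separation_time` are all assumptions made for this example.

```python
from fractions import Fraction


def tent(x, s):
    """Tent map T_s(x) = min(s*x, s*(1 - x)) on [0, 1]."""
    return min(s * x, s * (1 - x))


def separation_time(x, y, s, delta, max_iter=200):
    """First n >= 0 with |T_s^n(x) - T_s^n(y)| > delta, or None if not found."""
    for n in range(max_iter + 1):
        if abs(x - y) > delta:
            return n
        x, y = tent(x, s), tent(y, s)
    return None


# Non-expansivity: points symmetric about 1/2 are merged after one step.
s = Fraction(3, 2)
eps = Fraction(1, 10)
print(tent((1 + eps) / 2, s) == tent((1 - eps) / 2, s))

# Sensitivity near the point 1/3 of the full tent map (slope 2):
# a perturbation of size 2**-20 grows until it exceeds delta = 1/5.
x0 = Fraction(1, 3)
y0 = x0 + Fraction(1, 2 ** 20)
print(separation_time(x0, y0, 2, Fraction(1, 5)))
```

The two checks are independent: the first shows that distinct points may never separate again, the second that a suitable nearby point does separate, which is all that sensitive dependence asks for.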


The requirement of a dense set of periodic orbits in Devaney chaos is restrictive, because it precludes minimal systems from being chaotic. The following notion doesn’t have this drawback.

Definition 2.62. A dynamical system $(X, T)$ on a metric space $(X, d)$ is chaotic in the sense of Auslander-Yorke if:
1. $(X, T)$ has sensitive dependence on initial conditions;
2. $(X, T)$ has a dense orbit.

The following result is known as the Auslander-Yorke dichotomy [39]:

Theorem 2.63. Every minimal dynamical system $(X, T)$ is either equicontinuous or has sensitive dependence on initial conditions.

Proof. That sensitive dependence precludes equicontinuity is clear from the definition. For the converse, we will assume that equicontinuity fails at one point $x$, and we show that $T$ is sensitive at every $x' \in X$. For this it suffices to assume that $\operatorname{orb}(x)$ is dense in $X$. Given that $T$ is not equicontinuous at $x$, there are $\delta > 0$ and sequences $y_k \to x$, $n_k \to \infty$ such that $d(T^{n_k}(x), T^{n_k}(y_k)) > \delta$. Let $x' \in X$ and $U' \ni x'$ be an arbitrary open neighborhood. Using denseness of $\operatorname{orb}(x)$, we find $m$ such that $T^m(x) \in U'$. Since $T$ is continuous, we can take $k$ so large that $n_k > m$ and $T^m(y_k) \in U'$ as well. Now $T^{n_k}(x), T^{n_k}(y_k) \in T^{n_k - m}(U')$ and $d(T^{n_k}(x), T^{n_k}(y_k)) > \delta$, so
\[ d(T^{n_k - m}(T^m x), T^{n_k - m}(x')) > \delta/2 \quad \text{or} \quad d(T^{n_k - m}(T^m y_k), T^{n_k - m}(x')) > \delta/2. \]
This proves sensitive dependence with sensitivity constant $\delta/2$.



Remark 2.64. In fact, there is a version of the Auslander-Yorke dichotomy, see [14, 423, 553, 563], saying that a transitive dynamical system either has sensitive dependence on initial conditions (see Definition 2.59) or is uniformly rigid. This implies in particular that for minimal dynamical systems, equicontinuity is equivalent to uniform rigidity.

Remark 2.65. There is also an analogue for mean equicontinuity, see [394] and also [270], saying that every minimal dynamical system is either mean equicontinuous or mean sensitive, which means that there is a $\delta > 0$ such that for every $x \in X$ and neighborhood $U \ni x$, there is $y \in U$ such that $\limsup_n \frac{1}{n} \sum_{i=0}^{n-1} d(T^i x, T^i y) > \delta$. A measure-theoretic version of the dichotomy is due to [269], which states that given an ergodic $T$-invariant Borel measure $\mu$, $(X, T)$ is either $\mu$-mean equicontinuous or $\mu$-mean sensitive, i.e. mean sensitive with “neighborhood $U$” replaced by “Borel set $U \ni x$ with $\mu(U) > 0$”.


The paper of Li & Yorke [395] from 1975 might be called a popular (partial) rediscovery of Sharkovskiy’s theorem [498] from 1964¹⁵, but it also initiated the study of the following notions.

Definition 2.66. Let $(X, T)$ be a dynamical system on a metric space $(X, d)$. A pair of points $x, y \in X$ is called a Li-Yorke pair if
\[ \liminf_{n\to\infty} d(T^n(x), T^n(y)) = 0 \quad \text{and} \quad \limsup_{n\to\infty} d(T^n(x), T^n(y)) > 0. \]

A set $S \subset X$ is called scrambled if $(x, y)$ is a Li-Yorke pair for every two distinct $x, y \in S$. The dynamical system is chaotic in the sense of Li and Yorke if there is an uncountable scrambled set.

Huang & Ye [324, Theorem 4.1] proved that if a continuous dynamical system is transitive and properly contains a periodic orbit, then it is chaotic in the sense of Li-Yorke. In particular, Devaney chaos implies Li-Yorke chaos.

Remark 2.67. A quantitative version of Li-Yorke chaos is distributional chaos, introduced by Schweizer & Smítal [492] (see also [45, 517] for the versions DC1–DC3). It measures the proportion of time that points in a scrambled set spend close to each other and far away from each other.

Example 2.68. Let us construct an uncountable scrambled set in the full shift space $X = \{0, 1\}^{\mathbb{N}_0}$. First define an equivalence relation $\sim$ by setting $x \sim y$ if there is $n_0 \in \mathbb{N}$ such that either $x_n = y_n$ for all $n \ge n_0$ or $x_n \ne y_n$ for all $n \ge n_0$. That is, $x$ and $y$ have either the same or opposite tails. Each equivalence class is countable, because for each fixed $n_0$ there are finitely many equivalent points with the same $n_0$. Since $X$ is uncountable, there are uncountably many equivalence classes. Next, using the axiom of choice, construct a set $Y \subset X$ that contains exactly one point in each equivalence class. Now define an injection $\pi : X \to X$ by
\[ \pi(x)_j = x_n \quad \text{for each } 2^n - 1 \le j < 2^{n+1} - 1. \]
Then $S = \pi(Y)$ is uncountable and scrambled. Indeed, for every $x \ne y \in Y$, there are infinitely many $n$ such that $x_n = y_n$ and then $d(\sigma^{2^n - 1} \circ \pi(x), \sigma^{2^n - 1} \circ \pi(y)) \le 2^{-n}$. Also there are infinitely many $n$ such that $x_n \ne y_n$ and then $d(\sigma^{2^n - 1} \circ \pi(x), \sigma^{2^n - 1} \circ \pi(y)) \ge 1 - 2^{-n}$.
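The block-stretching map π of Example 2.68 is easy to experiment with on finite prefixes. The sketch below is a hedged illustration rather than the construction itself: it cannot invoke the axiom of choice, so it simply takes two hand-picked binary prefixes, and it assumes the symbolic metric that equals 2^(−k) at the first index k of disagreement (so the distance is 1 when the leading symbols differ). The names `stretch` and `sym_dist` are hypothetical.

```python
def stretch(prefix):
    """pi(x)_j = x_n for 2**n - 1 <= j < 2**(n+1) - 1, on a finite prefix of x."""
    out = []
    for n, symbol in enumerate(prefix):
        out.extend([symbol] * (2 ** n))   # the n-th block has length 2**n
    return out


def sym_dist(u, v):
    """2**(-k) at the first index k where u and v differ.

    If the finite words agree throughout, return the upper bound 2**(-length)
    on the distance between any infinite continuations.
    """
    length = min(len(u), len(v))
    for k in range(length):
        if u[k] != v[k]:
            return 2.0 ** (-k)
    return 2.0 ** (-length)


# Two assumed prefixes that agree at some indices and disagree at others.
x = [0, 1, 0, 0, 1, 1, 0, 1]
y = [0, 0, 1, 0, 1, 0, 0, 1]
px, py = stretch(x), stretch(y)
for n in range(len(x)):
    start = 2 ** n - 1                    # first index of the n-th block
    print(n, x[n] == y[n], sym_dist(px[start:], py[start:]))
```

At indices n where the prefixes agree, the shifted images agree on a block of length 2ⁿ and are therefore close; where they disagree, the shifted images differ in the leading symbol. Infinitely many indices of both kinds give the lim inf and lim sup conditions of a Li-Yorke pair.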


15 Sharkovskiy’s Theorem states that if a continuous map of the real line has a periodic point of period $n$, it also has a periodic point of period $m$ for every $m \prec n$ in the Sharkovskiy order $1 \prec 2 \prec 4 \prec 8 \prec \cdots \prec 4 \cdot 7 \prec 4 \cdot 5 \prec 4 \cdot 3 \cdots \prec 2 \cdot 7 \prec 2 \cdot 5 \prec 2 \cdot 3 \cdots \prec 7 \prec 5 \prec 3$. Sharkovskiy related during the 2018 IWCTA: International Workshop and Conference on Topology & Applications (Kochi, India) in honor of his 1,000th moon that the printer of his original publication didn’t have the sign $\prec$ at his disposal, and therefore he suggested to use the letter Y turned sideways. The publisher followed this suggestion but turned the Y in the opposite direction to the one Sharkovskiy had intended, and therefore the Sharkovskiy order was first printed in [498] as 3, 5, 7, . . . , 6, 10, 14, . . . , 12, 20, 28, . . . , 4, 2, 1 with a sideways Y between the terms. Štefan [524] in his 1977 proof and the English translation of Sharkovskiy’s proof [499] by Tolosa wrote 3, 5, 7, . . . with yet other symbols between the terms.


Similarly, all non-trivial subshifts of finite type (SFTs) are Li-Yorke chaotic, but Sturmian subshifts (or more generally distal maps) are not Li-Yorke chaotic ($\liminf_n d(T^n(x), T^n(y)) > 0$ for distinct $x \ne y \in X$).

Exercise 2.69. Let $X = A^{\mathbb{N}_0}$ be the full shift space for some alphabet $A$ containing $a$. Define $\pi : X \to X$ by
\[
\pi(x)_k =
\begin{cases}
x_{k - n^2}, & n^2 \le k \le n^2 + n, \\
a, & n^2 + n < k \le n^2 + 2n.
\end{cases}
\]
Show that $\pi(X)$ is a scrambled set.

An important, long conjectured, result ties Li-Yorke chaos to topological entropy.

Theorem 2.70. Every continuous dynamical system of positive entropy on a compact space is Li-Yorke chaotic.

This is the main result of [83]; see also [482, Chapter 5] and [210]. The converse is, however, not true. There exist examples of continuous (so-called $2^\infty$) interval maps which have periodic points of period $2^n$ for each $n \in \mathbb{N}$ and no periodic points with other periods, which have (therefore) zero topological entropy, but which still are Li-Yorke chaotic; see [516, 566]. Example 2.53 gives a subshift which has zero entropy but is Li-Yorke chaotic.

Theorem 2.71. Let $X = \{1, \dots, d\}^{\mathbb{N}}$. For every probability vector $p = (p_1, \dots, p_d)$, every scrambled set has zero $p$-Bernoulli measure.

Proof. Let $(X, \mathcal{B}, \mu_p, \sigma)$ be the $p$-Bernoulli shift and assume by contradiction that $S \subset X$ is a scrambled set with $\mu_p(S) > 0$. Take two distinct Lebesgue density points¹⁶ $a$ and $b$ of $S$, and for any $n$, let $Z_n(a)$ and $Z_n(b)$ be the corresponding $n$-cylinders of $a$ and $b$, respectively. Because $a$ and $b$ are density points, the measures $\mu_p(\sigma^n(S \cap Z_n(a)))$ and $\mu_p(\sigma^n(S \cap Z_n(b)))$ tend to 1 as $n \to \infty$. That means that there are distinct $x, y \in S$ and some $n \in \mathbb{N}$ such that $\sigma^n(x) = \sigma^n(y)$. But then $(x, y)$ is not a Li-Yorke pair. This contradiction shows that $\mu_p(S) = 0$.

16 For Lebesgue measure $\mu$ on Euclidean space, $x$ is called a density point of $A$ if $\lim_{\varepsilon\to 0} \mu(A \cap B_\varepsilon(x))/\mu(B_\varepsilon(x)) = 1$. The Lebesgue Density Theorem says that if $\mu(A) > 0$ then $\mu$-a.e. $x \in A$ is a density point of $A$. We now use the same result for Bernoulli measure $\mu_p$, but this is justified because there is a measure-preserving map $\psi : (X, \mu_p) \to ([0, 1], \mu)$ which is also continuous and sends cylinder sets to intervals; cf. Example 6.55.

2.6. Transitivity and Topological Mixing

Transitivity prevents the phase space from consisting of multiple pieces that don’t communicate with each other. Topological mixing prevents such pieces from communicating with each other only at a periodic sequence of iterates. There are several related concepts in addition to (totally) transitive from Definition 2.12:

Definition 2.72. A dynamical system $(X, T)$ on a topological space is called topologically mixing if for every two non-empty open sets $U, V$ there is $N \ge 0$ such that $U \cap T^{-n}(V) \ne \emptyset$ for all $n \ge N$.

Topologically mixing dynamical systems on metric spaces are sensitive to initial conditions (provided $X$ consists of at least two points), and therefore equicontinuous dynamical systems cannot be topologically mixing. In particular, since topological mixing is inherited by factors, the maximal equicontinuous factor of a topologically mixing dynamical system is trivial.

Definition 2.73. A dynamical system $(X, T)$ on a topological space is called topologically exact (also called locally eventually onto or leo for short) if for every non-empty open set $U$ there is $N \ge 0$ such that $T^N(U) = X$.

Invertible dynamical systems (other than the identity on a singleton) are never topologically exact, and neither are nontrivial dynamical systems with zero entropy.

Lemma 2.74. If a dynamical system $(X, T)$ on a non-trivial metric space $(X, d)$ is topologically exact, then $h_{top}(T) > 0$.

Proof. Take $x_0 \ne x_1 \in X$, and choose $0 < \varepsilon < d(x_0, x_1)/3$. Let $U_0$ and $U_1$ be the $\varepsilon$-neighborhoods of $x_0$ and $x_1$, respectively. By topological exactness, there is $N \in \mathbb{N}$ such that $T^N(U_0) = X = T^N(U_1)$. Hence, for an arbitrary $n \in \mathbb{N}$ and every $w = w_0 w_1 \cdots w_{n-1} \in \{0, 1\}^n$, there is $x_w \in X$ such that $T^{kN}(x_w) \in U_{w_k}$ for all $0 \le k < n$. If $w \ne w' \in \{0, 1\}^n$, then the $nN$-distance $d_{nN}(x_w, x_{w'}) > \varepsilon$. Hence, every $(nN, \varepsilon)$-spanning set must contain at least $2^n$ elements and $h_{top}(T) \ge \frac{1}{N} \log 2 > 0$.

Theorem 2.75. If $T : [0, 1] \to [0, 1]$ is a continuous transitive interval map, then $h_{top}(T) \ge \frac{1}{2} \log 2$. If in addition $T$ is topologically mixing, then $h_{top}(T) > \frac{1}{2} \log 2$.

This result is due to Blokh; see [90, 92]. A compact exposition of this and related results can be found in [482, Proposition 4.70].

Definition 2.76. A dynamical system $(X, T)$ on a topological space is called weakly topologically mixing if for every four non-empty open sets $U_1, U_2, V_1, V_2$, there is $n$ such that $U_1 \cap T^{-n}(V_1) \ne \emptyset$ and $U_2 \cap T^{-n}(V_2) \ne \emptyset$, or equivalently, the product system $T \times T$ on $X \times X$ is transitive.

When presenting these notions, we consistently write the adjective “topological” because there are also measure-theoretic versions of exact, mixing,


and weak mixing. These are discussed in Section 6.7. Some specific differences exist; for instance, there is no topological analog of Theorem 6.86.

From the definition it is clear that topological weak mixing implies that the product system $(X^2, T \times T)$ is transitive. In fact, Furstenberg [266] showed that this holds for every $N$-fold Cartesian product $(X^N, T \times \cdots \times T)$. An important result on topological weak mixing is the following multiple recurrence (a dynamical version of Van der Waerden’s Theorem) due to Furstenberg & Weiss [267]: if $(X, T)$ is minimal, then for every non-empty open set $U \subset X$ and $m \in \mathbb{N}$, there is $n \in \mathbb{N}$ such that $U \cap T^n(U) \cap T^{2n}(U) \cap \cdots \cap T^{mn}(U) \neq \emptyset$. Glasner [276] extended this to multiple transitivity: if $(X, T)$ is minimal and topologically weak mixing, then for $x$ in a residual subset of $X$, the $m$-tuple $(x, \dots, x)$ has a dense orbit under $T \times T^2 \times \cdots \times T^m$. Further results can be found in e.g. [139, 424].

The following hierarchy (which also holds for the measure-theoretic analog) will not come as a surprise:

Theorem 2.77. The following implications hold:

topologically exact ⇒ topologically mixing ⇒ topologically weak mixing ⇒ totally transitive ⇒ topologically transitive.

The reverse implications are in general false.

Counterexamples to the reverse implications can be found among subshifts:

[Diagram, one example per class in the order of Theorem 2.77: full shift, Petersen’s shift, Chacon substitution shift, Fibonacci substitution shift, Thue-Morse substitution shift]

where the Fibonacci, Chacon, and Thue-Morse substitution shifts are defined in Examples 1.3, 1.27, and 1.6, respectively. Petersen’s shift [454] is an example of a zero entropy subshift that is topologically mixing. Lemma 2.74 shows that it cannot be topologically exact.

Remark 2.78. Although none of the reverse implications in Theorem 2.77 hold in all generality, for many subshifts, some of these notions are equivalent. For instance, sofic shifts and density shifts that are totally transitive are topologically mixing; cf. [237] and Theorem 3.61. For coded and synchronized shifts, total transitivity is equivalent to topologically weak mixing; see [237, Theorem 1.1].

In terms of the set of visit times for sets $U, V \subset X$,

$$N(U, V) = \{n \in \mathbb{N}_0 \text{ or } \mathbb{Z} : U \cap T^{-n}(V) \neq \emptyset\}, \tag{2.9}$$

the notions in this section can be expressed as follows. For all $U, U', V, V' \subset X$ open and non-empty:
• Topologically exact: $\bigcap_{x \in X} N(U, \{x\})$ is cofinite.
• Topologically mixing: $N(U, V)$ is cofinite.
• Topologically weak mixing: $N(U, V) \cap N(U', V')$ is infinite.
• Topologically transitive: $N(U, V)$ is non-empty.
• Totally transitive: for all $k \in \mathbb{N}$, $N(U, V) \cap k\mathbb{N}$ is non-empty.
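The visit-time characterization can be explored numerically for a vertex-labeled SFT. The sketch below assumes a 0/1 transition matrix `A` in which every vertex has an outgoing edge, and it only looks at one-symbol cylinders $[i]$ and $[j]$; under these assumptions $n \in N([i], [j])$ exactly when the $(i,j)$-entry of $A^n$ is positive. The helper names `bool_mat_mul` and `visit_times` are illustrative and do not come from the text.

```python
def bool_mat_mul(P, Q):
    """Boolean product: entry (i, j) is True iff some k has P[i][k] and Q[k][j]."""
    n = len(P)
    return [[any(P[i][k] and Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def visit_times(A, i, j, n_max):
    """Return all n in 0..n_max for which the (i, j)-entry of A^n is positive."""
    n = len(A)
    step = [[bool(a) for a in row] for row in A]
    power = [[r == c for c in range(n)] for r in range(n)]  # A^0 = identity
    times = []
    for k in range(n_max + 1):
        if power[i][j]:
            times.append(k)
        power = bool_mat_mul(power, step)
    return times


golden_mean = [[1, 1], [1, 0]]  # forbidden word 11 (symbol 1 has index 1)
two_cycle = [[0, 1], [1, 0]]    # the orbit of the periodic point 0101...

print(visit_times(golden_mean, 1, 1, 8))  # all n >= 2 appear: cofinite pattern
print(visit_times(two_cycle, 0, 0, 8))    # only even n
print(visit_times(two_cycle, 0, 1, 8))    # only odd n, so no multiple of 2
```

For the two-cycle, $N([0],[1])$ contains no even number, which matches the fact that this transitive system is not totally transitive. A finite range of $n$ can of course only suggest, not prove, that a set of visit times is cofinite.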

2.7. Shadowing and Specification

Definition 2.79. Let $(X, T)$ be a dynamical system on a metric space $(X, d)$. A sequence $(x_n)_{n \in \mathbb{N}_0 \text{ or } \mathbb{Z}}$ is called a $\delta$-pseudo-orbit if $d(T(x_n), x_{n+1}) < \delta$ for all $n \in \mathbb{N}_0$ or $\mathbb{Z}$. A point $x \in X$ is chain-recurrent if for every $\delta > 0$ there is a $\delta$-pseudo-orbit of some length $k$ such that $x = x_0$ and $d(T(x_k), x) < \delta$.

Chain recurrence is the weakest version of recurrence in a sequence of implications

periodic ⇒ periodically recurrent ⇒ uniformly recurrent ⇒ recurrent ⇒ non-wandering ⇒ chain-recurrent

and none of the reverse implications hold in general.

Given that every floating-point calculation has round-off errors, orbits that a computer calculates numerically are always pseudo-orbits for some small $\delta$. Whether such a pseudo-orbit represents an approximation of an actual orbit is captured in the following definition.

Definition 2.80. A dynamical system $(X, T)$ on a metric space $(X, d)$ has the shadowing property if for every $\varepsilon > 0$ there is $\delta > 0$ such that for every $\delta$-pseudo-orbit $(x_n)_{n \in \mathbb{N}_0 \text{ or } \mathbb{Z}}$, there is $y \in X$ so that $\mathrm{orb}(y)$ $\varepsilon$-shadows $(x_n)$; i.e. $d(x_n, T^n(y)) < \varepsilon$ for all $n \in \mathbb{N}_0$ or $\mathbb{Z}$.

By now, many variations of shadowing have been studied, for example average shadowing (the average error needs to be smaller than $\varepsilon$), periodic shadowing (periodic pseudo-orbits are $\varepsilon$-shadowed by actual periodic orbits), limit shadowing (the $\varepsilon$ in the shadowing tends to zero as the iterates $|n| \to \infty$). We refer to the monograph by Pilyugin [461], although many variations of shadowing are from a later date; cf. [55, 280, 423].

The seminal result for shadowing is the Anosov Shadowing Lemma [27] for hyperbolic sets. Work by Bowen [99] showed that hyperbolic dynamical systems, and this includes SFTs, have the shadowing property.

Definition 2.81. Let $f : M \to M$ be a $C^1$ diffeomorphism of a $C^1$ Riemannian manifold $M$. An $f$-invariant set $\Lambda$ is called hyperbolic if there


is a uniformly transversal splitting $T_q M = E^s_q \oplus E^u_q$ of the tangent spaces that is continuous in $q \in \Lambda$, invariant under $f$, i.e. $Df_q(E^s_q) = E^s_{f(q)}$ and $Df_q(E^u_q) = E^u_{f(q)}$, and tangent vectors in $E^s_q$, resp. $E^u_q$, decrease exponentially fast under forward, resp. backward, iteration.

If $f : M \to M$ is not invertible, then we need to select inverse branches in order for $E^u_q$ to be well-defined. The manifold contains stable and unstable local manifolds $W^s_{\mathrm{loc}}(q)$ and $W^u_{\mathrm{loc}}(q)$ of $q$, tangent to $E^s_q$ and $E^u_q$, respectively, such that

$$d(f^n(q), f^n(x)) \to 0 \text{ exponentially, as } \begin{cases} n \to \infty & \text{if } x \in W^s_{\mathrm{loc}}(q), \\ n \to -\infty & \text{if } x \in W^u_{\mathrm{loc}}(q). \end{cases}$$

In the symbolic setting, i.e. a subshift $(X, \sigma)$ takes the place of $(M, f)$, we can define

$$W^s_{\mathrm{loc}}(q) = \{x \in X : x_n = q_n \text{ for all } n \geq 0\}, \qquad W^u_{\mathrm{loc}}(q) = \{x \in X : x_n = q_n \text{ for all } n \leq 0\}.$$

Theorem 2.82 (Anosov Shadowing Lemma). Let $\Lambda$ be a hyperbolic set of a $C^1$ diffeomorphism $f : M \to M$, and let $\Lambda_\varepsilon$ denote the $\varepsilon$-neighborhood of $\Lambda$. Then for every $\varepsilon > 0$ there is $\delta > 0$ such that for every $\delta$-pseudo-orbit $(x_k) \subset \Lambda_\delta$ (finite, one-sided or two-sided infinite), there is $x \in \Lambda_\varepsilon$ such that $d(f^k(x), x_k) < \varepsilon$ for all $k$.

The analogue of this theorem for periodic shadowing is called the Anosov Closing Lemma. See [346, Sections 6.4 and 18.1].

One may think that uniform expansion is enough to guarantee shadowing, but it is not as simple as that. For example (see [170] and [116, Theorem 6.3.5]), tent maps $T_s$ with slope $s \in (1, 2)$ have the shadowing property if and only if the critical point $c$ is recurrent or its kneading map is unbounded (in the terminology of Section 3.6.3).

An important variation of shadowing, also introduced by Bowen [101], is specification. In this case, no pseudo-orbits are involved, but particular pieces of orbits are to be $\varepsilon$-shadowed for particular intervals of time, allowing gaps in between that are inversely proportional to $\log \varepsilon$.

Definition 2.83. A dynamical system $(X, T)$ on a metric space $(X, d)$ has specification for $K$ points if for every $\varepsilon > 0$ there is a gap size $N$ with the following property: for all points $x_1, \dots, x_K \in X$ and iterates $m_1 \leq n_1 < m_2 \leq n_2 < \cdots < m_K \leq n_K$ with $m_{k+1} - n_k \geq N$, there is $x \in X$ such that

$$d(T^j(x), T^{j-m_k}(x_k)) < \varepsilon \quad \text{for all } k \in \{1, \dots, K\},\ m_k \leq j < n_k. \tag{2.10}$$

Sometimes specification includes the requirement that $x$ is periodic as well (periodic specification) and that specification holds for all $K \in \mathbb{N}$ (strong specification).


Remark 2.84. For subshifts $(X, \sigma)$, this definition can be simplified. We give the version for strong specification, because it is the one in most frequent use in this context. There is a gap size $N^*$ such that for all $K \in \mathbb{N}$ and every $K$-tuple $x^1, \dots, x^K \in X$ and iterates $m_1 \leq n_1 < m_2 \leq n_2 < \cdots < m_K \leq n_K$ with $m_{k+1} - n_k \geq N^*$, there is $x \in X$ such that

$$x_j = x^k_{j-m_k} \quad \text{for all } k \in \{1, \dots, K\},\ m_k \leq j < n_k. \tag{2.11}$$
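For a concrete feel for the symbolic form of specification, the following sketch glues allowed words of the Fibonacci SFT (forbidden word 11) with a prescribed number of filler symbols. It is a brute-force illustration under the assumption that a finite word belongs to the language exactly when it contains no 11; the function names `allowed`, `language`, `connect`, and `gap_works` are hypothetical, and only finitely many words and gaps are tested.

```python
from itertools import product


def allowed(word):
    """A binary word is in the language of the Fibonacci SFT iff it avoids 11."""
    return "11" not in word


def language(n):
    """All allowed words of length n (the empty word for n = 0)."""
    return ["".join(w) for w in product("01", repeat=n) if allowed("".join(w))]


def connect(u, v, gap):
    """Return a filler w of length `gap` with u + w + v allowed, or None."""
    for w in language(gap):
        if allowed(u + w + v):
            return w
    return None


def gap_works(gap, max_len):
    """Check that all pairs of allowed words up to length max_len can be glued
    with exactly g filler symbols, for g = gap, gap + 1, gap + 2."""
    words = [w for n in range(1, max_len + 1) for w in language(n)]
    return all(connect(u, v, g) is not None
               for u in words for v in words for g in range(gap, gap + 3))


print(connect("01", "10", 1))  # a filler symbol between 01 and 10
print(gap_works(1, 4))         # one filler symbol always suffices here
print(gap_works(0, 4))         # no filler fails, e.g. for u = v = "1"
```

Because this SFT has memory 1, gluing two words at a time is enough: fillers between consecutive pieces can be chosen independently, so the pairwise check extends to any number $K$ of pieces.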

Since $d(x, x^k) \leq \frac{1}{2}$ if and only if $x_0 = x^k_0$, condition (2.11) implies (2.10) with $N(\varepsilon) = N^*$ for $\varepsilon > \frac{1}{2}$. For $\varepsilon \in (0, \frac{1}{2}]$, condition (2.11) implies (2.10) with $N(\varepsilon) = N^* + n$ where $n$ is minimal such that $2^{-n} < \varepsilon$.

The strength of specification is that a single orbit can shadow many other orbits consecutively, in particular orbits that have different dynamical behaviors.

Lemma 2.85. A dynamical system with specification for some $K \geq 2$ is topologically mixing and if the specification is periodic, then the set of periodic orbits is dense.

Proof. Specification allows one to connect $\varepsilon$-neighborhoods of any two points $x_1, x_2$ by an orbit of length $N = N(\varepsilon)$. To show topological mixing, take $n \geq 1$ arbitrary and $m_1 = n_1 = 0$, $m_2 = n_2 = n_1 + N(\varepsilon)$, and $m_2' = n_2' = n_1 + N + n$ as in the definition of specification. Then there are $x, x' \in B_\varepsilon(x_1)$ such that $T^N(x) \in B_\varepsilon(x_2)$ and $T^{N+n}(x') \in B_\varepsilon(x_2)$ as required. Finally, for any $x_1 \in X$ and $\varepsilon > 0$, we can find a periodic point $x \in B_\varepsilon(x_1)$, so the set of periodic points is dense.

The next result is due to Bowen [101] and in more generality to Sigmund [508, Proposition 3].

Proposition 2.86. Every continuous dynamical system with specification for all $K \in \mathbb{N}$ on a compact metric space has positive topological entropy.

Proof. Take distinct points $a, b \in X$ and let $\varepsilon = d(a, b)/3$. Let $N$ be the gap size associated to $\varepsilon$. Now for every $K \in \mathbb{N}$ and chain $(x_1, \dots, x_K) \in \{a, b\}^K$ and the integers $m_k = n_k = m_{k+1} - N$, there is a point $x$ such that $d(T^{m_k}(x), x_k) < \varepsilon$ for $k = 1, \dots, K$. There are $2^K$ choices of $(x_1, \dots, x_K)$ and the corresponding points $x$ are $(n_K, \varepsilon)$-separated. Hence, according to Definition (2.4), $h_{\mathrm{top}}(T) \geq \frac{1}{1+N} \log 2 > 0$.

The following was first shown by Bowen [103].

Lemma 2.87. Every continuous factor of a dynamical system $(X, T)$ with specification on a compact metric space $(X, d)$ has specification.


Proof. Let $(Y, S)$ be a factor of $(X, T)$ such that $\pi : X \to Y$ is the semiconjugacy. Since $X$ is compact, $\pi$ is uniformly continuous. Choose $\varepsilon > 0$ arbitrary, and take $\delta > 0$ such that the $\pi$-image of every $\delta$-neighborhood in $X$ is contained in an $\varepsilon$-neighborhood in $Y$. Find $N = N(\delta)$ as in Definition 2.83 of specification for $(X, T)$. Choose $K \in \mathbb{N}$ and $m_1 \leq n_1 < m_2 \leq n_2 < \cdots < m_K \leq n_K$ with gaps $m_{k+1} - n_k \geq N$ and points $y_1, \dots, y_K \in Y$ arbitrary. Choose $x_k \in \pi^{-1}(y_k)$ for each $1 \leq k \leq K$. Since $(X, T)$ has specification, there is $x \in X$ that $\delta$-shadows the pieces of orbits of the $x_k$’s at the required time intervals. Thus $y := \pi(x)$ $\varepsilon$-shadows the pieces of orbits of the $y_k$’s at the required time intervals. This completes the proof.

Theorem 2.88. Let $(X, T)$ be an expansive continuous dynamical system on a compact metric space $(X, d)$. If $T$ has specification, then it is intrinsically ergodic; i.e. $T$ has a unique measure of maximal entropy.

This was proven in [103], and it applies of course to subshifts. Strong specification makes it possible, and even easy, to approximate invariant measures in the weak∗ topology by equidistributions on periodic orbits. Indeed, if $x$ is a typical¹⁷ point for an ergodic $T$-invariant measure $\mu$, then for arbitrarily large $n$, we can find an $n$-periodic point $p_n$ that $\varepsilon$-shadows the orbit of $x$ up to iterate $n - N$. The equidistribution $\mu_n := \frac{1}{n} \sum_{i=0}^{n-1} \delta_{T^i(p_n)}$ then tends to $\mu$ as $n \to \infty$. Similar ideas work for non-ergodic measures; see Definition 1.30. An extended version of this argument yields that the measure of maximal entropy is the weak∗ limit of

$$\frac{1}{\#\{p : \mathrm{Per}(p) \leq n\}} \sum_{\mathrm{Per}(p) \leq n} \delta_p,$$

where $\mathrm{Per}(p)$ denotes the period of $p$; see [103] and [159].

Further variations of specification were designed to extend this proof of intrinsic ergodicity to dynamical systems where specification fails; see Buzzi [135], Climenhaga & Thompson [159, 160], and Kwietniak and coauthors [383, 384]. This applies for instance to (factors of) β-shifts and gap shifts.

¹⁷ In the sense that the Ergodic Theorem 6.13 holds for $x$.

Chapter 3

Subshifts of Positive Entropy

Most of the subshifts of positive entropy are symbolic versions of positive entropy dynamical systems on manifolds, for example dynamical systems possessing a Markov partition, β-transformations, or unimodal interval maps. Symbolically these correspond to subshifts of finite type (SFT), β-shifts, and kneading theory, respectively, and the entropy is given by the exponential growth rate of periodic points. We discuss also some subshifts that are not in the first instance symbolic versions of other dynamical systems, such as density shifts, coded shifts, gap shifts, and spacing shifts, and in some cases (such as power-free shifts), the entropy is not related to periodic sequences at all.

3.1. Subshifts of Finite Type

Subshifts of finite type are the simplest and most frequently used subshifts in applications. They emerge naturally in hyperbolic dynamical systems such as toral automorphisms, Markov partitions of Anosov diffeomorphisms, Axiom A attractors (including Smale’s horseshoe), but also in topological Markov chains.

3.1.1. Definition of SFTs and Transition Matrices and Graphs.

Definition 3.1. A subshift of finite type (SFT) is a subshift consisting of all strings avoiding a finite list of forbidden words as subwords.

For example, the Fibonacci SFT has 11 as its forbidden word. Naturally, then also 110 and 111 are forbidden, but we take only the smallest collection of forbidden words. If $M + 1$ is the length of the longest forbidden word,

then this SFT is an $M$-step SFT, or an SFT with memory $M$. Indeed, an $M$-step SFT has the property that if $uv \in \mathcal{L}(X)$ and $vw \in \mathcal{L}(X)$ and if $|v| \geq M$, then $uvw \in \mathcal{L}(X)$ as well. The following property is therefore immediate:

Lemma 3.2. Every SFT $(X, \sigma)$ on a finite alphabet can be recoded such that the list of forbidden words consists of 2-words only.

Proof. Assume that $(X, \sigma)$ is a subshift over the alphabet $\mathcal{A}$ and the longest forbidden word has length $M + 1 \geq 2$. Take a new alphabet $\tilde{\mathcal{A}} = \mathcal{A}^M$, say $a_1, \dots, a_n$ are its letters. Recode every $x \in X$ using a sliding block code $\pi$, where for each index $i$, $\pi(x)_i = a_j$ if $a_j$ is the symbol used for $x_i x_{i+1} \cdots x_{i+M-1}$. Effectively, this is replacing $X$ by its $M$-block code. Then every $(M+1)$-word is uniquely coded by a 2-word in the new alphabet $\tilde{\mathcal{A}}$, and vice versa, every $a_1 a_2$ such that the $(M-1)$-suffix of $\pi^{-1}(a_1)$ equals the $(M-1)$-prefix of $\pi^{-1}(a_2)$ encodes a unique $(M+1)$-word in $\mathcal{A}^*$. Now we forbid a 2-word $a_1 a_2$ in $\tilde{\mathcal{A}}^2$ if $\pi^{-1}(a_1 a_2)$ contains a forbidden word of $X$. Since $\mathcal{A}$ is finite, and therefore $\tilde{\mathcal{A}}$ is finite, this leads to a finite list of forbidden 2-words in the recoded subshift.

Example 3.3. Let $X$ be the SFT with forbidden words 11 and 101, so $M = 2$. We recode using the alphabet $a = 00$, $b = 01$, $c = 10$, and $d = 11$. Draw the vertex-labeled transition graph (see Figure 3.1); labels at the arrows indicate which word in $\{0, 1\}^3$ they stand for. For example, the edge $a \to b$ labeled 001 has prefix $a = 00$ and suffix $b = 01$. Each arrow containing a forbidden word is dashed and then removed in the right panel of Figure 3.1.

Figure 3.1. The recoding of the SFT with forbidden words 11 and 101.
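The recoding of Lemma 3.2 and Example 3.3 can be carried out mechanically. The sketch below builds the $M$-block graph from a list of forbidden words: vertices are the allowed $M$-words and an edge is kept for every $(M+1)$-word that contains no forbidden word. The function name `higher_block_graph` and its return format are illustrative choices, not notation from the text.

```python
from itertools import product


def higher_block_graph(alphabet, forbidden, M):
    """Return (vertices, edges) of the M-block recoding.

    vertices: allowed M-words; edges: dict mapping (u, v) to the (M+1)-word
    whose M-prefix is u and whose M-suffix is v.
    """
    def ok(word):
        return not any(f in word for f in forbidden)

    vertices = ["".join(p) for p in product(alphabet, repeat=M)]
    vertices = [v for v in vertices if ok(v)]
    edges = {}
    for u in vertices:
        for a in alphabet:
            w = u + a            # candidate (M+1)-word
            if ok(w):
                edges[(u, w[1:])] = w
    return vertices, edges


# Example 3.3: forbidden words 11 and 101, so M = 2.
vertices, edges = higher_block_graph("01", ["11", "101"], 2)
print(vertices)
for (u, v), w in sorted(edges.items()):
    print(u, "->", v, "labeled", w)
```

In this run the vertex $d = 11$ disappears because it is itself a forbidden word, and the surviving edges are exactly the 3-words that avoid both 11 and 101.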

Corollary 3.4. Every SFT $(X, \sigma)$ on a finite alphabet $\mathcal{A}$ can be represented by a finite graph $G$ with vertices labeled by the letters in $\mathcal{A}$ and arrows $b_1 \to b_2$ only if $\pi^{-1}(b_1 b_2)$ contains no forbidden word of $X$.


Definition 3.5. The directed graph $G$ constructed in the previous corollary is called the transition graph of the SFT. The matrix $A = (a_{ij})_{i,j \in \mathcal{A}}$ with $a_{ij} = \#\{\text{arrows } i \to j \text{ in } G\}$ is its transition matrix. The graph is vertex-labeled, which means that to each vertex there is assigned a symbol in the alphabet. We will stipulate throughout this book that the vertex-labels are unique (i.e. no two distinct vertices have the same label), although this assumption is not entirely uniform in the literature.

Definition 3.6. A non-negative $N \times N$ matrix $A = (a_{ij})_{i,j \in \mathcal{A}}$ is called irreducible if for every $i, j \in \mathcal{A}$ there is $k$ such that $A^k$ has $(i, j)$-entry $a^{(k)}_{ij} > 0$. For index $i$, set $\mathrm{per}(i) = \gcd(k > 1 : a^{(k)}_{ii} > 0)$. If $A$ is irreducible, then $\mathrm{per}(i)$ is the same for every $i$, and we call it the period of $A$. We call $A$ aperiodic if its period is 1. The matrix is called primitive if there is $k \in \mathbb{N}$ such that $a^{(k)}_{ij} > 0$ for all $i, j \in \mathcal{A}$.

Exercise 3.7. Show that if $A$ is aperiodic and irreducible, then $A$ is primitive, but irreducibility or aperiodicity alone doesn’t imply primitivity. Conversely, if $A$ is primitive, then it is also aperiodic and irreducible. If $A$ is irreducible, show that $\mathrm{per}(i)$ is indeed independent of $i$.

Lemma 3.8. Every irreducible SFT is synchronized; in fact, every word of length $M$ (the memory of the SFT) is synchronizing.

Proof. Let $v$ be any word of length $M$. If $uv \in \mathcal{L}(X)$, then $u$ has no influence on what happens after $v$. Hence if $vw \in \mathcal{L}(X)$, then $uvw \in \mathcal{L}(X)$. Irreducibility of $A$ then gives a dense orbit.

Example 3.9. Let $T : [0, 1] \to [0, 1]$ be a piecewise monotone map; i.e. there is a finite partition $\{J_i\}_{i \in \mathcal{A}}$ of $[0, 1]$ into intervals such that $T|_{J_i}$ is continuous and monotone for each $i$. Assume also that for each $i$, $T(J_i)$ is the closure of a union of $J_k$’s. In this case we call $\{J_i\}_{i \in \mathcal{A}}$ a Markov partition. Write

$$a_{ij} = \begin{cases} 1 & \text{if } T(J_i) \supset J_j^\circ, \\ 0 & \text{if } T(J_i) \cap J_j^\circ = \emptyset. \end{cases}$$

Then the resulting matrix $A = (a_{ij})_{i,j \in \mathcal{A}}$ is the transition matrix for the subshift obtained by taking the closure of the collection of itineraries $\{i(x) : x \in [0, 1]\}$. This yields a one-sided shift.

The example in Figure 3.2 produces the transition matrix $A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$, so the corresponding subshift is the Fibonacci SFT; see Example 1.3. It should not come as a surprise that the leading eigenvalue of $A$ is exactly the slope of $T$: both equal $e^{h_{\mathrm{top}}(T)} = e^{h_{\mathrm{top}}(\sigma)} = \gamma$; see Section 3.1.2. For the bi-infinite Fibonacci SFT, we can look at a toral automorphism.


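$$T(x) = \begin{cases} \gamma\left(x + \frac{2-\gamma}{\gamma}\right) & \text{if } x \in J_1 := [0, \frac{\gamma-1}{\gamma}], \\ \gamma(1 - x) & \text{if } x \in J_2 := [\frac{\gamma-1}{\gamma}, 1], \end{cases} \qquad \gamma = \frac{\sqrt{5}+1}{2}.$$

Figure 3.2. The tent map with slope equal to the golden mean.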
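The transition matrix of Example 3.9 can be read off numerically for the map of Figure 3.2. The sketch below assumes the Markov property (each image $T(J_i)$ is a union of partition intervals), so that it suffices to test whether the midpoint of $J_j$ lies in $T(J_i)$; this midpoint test is a convenience of the sketch that avoids floating-point trouble at the endpoints, and the names `branches` and `transition_matrix` are hypothetical.

```python
import math

gamma = (math.sqrt(5) + 1) / 2      # golden mean, the slope of the tent map
c = (gamma - 1) / gamma             # common endpoint of J1 and J2

# Each branch: (left endpoint, right endpoint, formula of T on that interval).
branches = [
    (0.0, c, lambda x: gamma * (x + (2 - gamma) / gamma)),
    (c, 1.0, lambda x: gamma * (1 - x)),
]


def transition_matrix(branches):
    """a_ij = 1 if T(J_i) covers the midpoint of J_j, else 0 (Markov case)."""
    A = []
    for left, right, f in branches:
        lo, hi = sorted((f(left), f(right)))  # image of a monotone branch
        row = []
        for l2, r2, _ in branches:
            mid = (l2 + r2) / 2
            row.append(1 if lo < mid < hi else 0)
        A.append(row)
    return A


A = transition_matrix(branches)
print(A)

# Leading eigenvalue of a 2x2 matrix from its trace and determinant.
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
leading = (tr + math.sqrt(tr * tr - 4 * det)) / 2
print(leading, gamma)
```

The computed matrix agrees with the one stated for Figure 3.2, and its leading eigenvalue coincides with the slope $\gamma$ up to rounding.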

Definition 3.10. A toral automorphism $T : \mathbb{T}^d \to \mathbb{T}^d$ is an invertible linear map on the ($d$-dimensional) torus $\mathbb{T}^d$. Each such $T$ is of the form $T_M(x) = Mx \bmod 1$, where
• $M$ is an integer matrix with $\det(M) = \pm 1$ (i.e. $M$ is unimodular);
• the eigenvalues of $M$ are not on the unit circle; this property is called hyperbolicity; for toral automorphisms, this is equivalent to $\mathbb{T}^d$ being a hyperbolic set in terms of Definition 2.81.

The map $T_M$ has a Markov partition¹, that is, a partition $\{J_i\}_{i \in \mathcal{A}}$ into sets such that:
(1) The $J_i$ have disjoint interiors and $\bigcup_i J_i = \mathbb{T}^d$.
(2) If $T_M(J_i^\circ) \cap J_j^\circ \neq \emptyset$, then $T_M(J_i)$ stretches across $J_j^\circ$ in the unstable direction (i.e. the direction spanned by the unstable eigenspaces of $M$).
(3) If $T_M^{-1}(J_i^\circ) \cap J_j^\circ \neq \emptyset$, then $T_M^{-1}(J_i)$ stretches across $J_j^\circ$ in the stable direction (i.e. the direction spanned by the stable eigenspaces of $M$).

Every hyperbolic toral automorphism has a Markov partition (see [100]), but in general they are fiendishly difficult to find explicitly, especially in dimension $\geq 3$ where the boundaries of the $J_i$ might have to be fractal (see [104]). Therefore we confine ourselves to the simpler case of $M = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$, where a Markov partition of three rectangles $J_i$ for $i = 1, 2, 3$ can be constructed; see Figure 3.3. The corresponding transition matrix is

$$A = (a_{ij}) = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \quad \text{where} \quad a_{ij} = \begin{cases} 1 & \text{if } T_M(J_i^\circ) \cap J_j \neq \emptyset, \\ 0 & \text{if } T_M(J_i^\circ) \cap J_j = \emptyset. \end{cases}$$

¹ The construction of Markov partitions for toral automorphisms on $\mathbb{T}^2$ goes back to Berg [58] and Adler & Weiss [10], extended to more general settings in [99, 100, 258, 512] among others.


Figure 3.3. The Markov partition for $T_M : \mathbb{T}^2 \to \mathbb{T}^2$ (rectangles $J_1$, $J_2$, $J_3$ aligned with the stable and unstable directions); the catmap is $T_M^2$.
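The action of the catmap $T_M^2$ on points with rational coordinates can be followed exactly with rational arithmetic. The sketch below is only an illustration: it uses the matrix $\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^2$, and the helper names `catmap`, `orbit_period`, and `grid_period` are made up for this purpose. Both loops terminate because the map permutes the finite set of points with a given common denominator.

```python
from fractions import Fraction


def catmap(point):
    """One step of (x, y) -> (2x + y, x + y) mod 1, in exact arithmetic."""
    x, y = point
    return ((2 * x + y) % 1, (x + y) % 1)


def orbit_period(point):
    """Number of iterates after which a rational point first returns."""
    k, q = 1, catmap(point)
    while q != point:
        q = catmap(q)
        k += 1
    return k


def grid_period(N):
    """Smallest k >= 1 such that the k-th iterate fixes every point (i/N, j/N),
    i.e. the order of the matrix [[2, 1], [1, 1]] modulo N."""
    identity = (1 % N, 0, 0, 1 % N)
    a, b, c, d = 2 % N, 1 % N, 1 % N, 1 % N
    k = 1
    while (a, b, c, d) != identity:
        # multiply the current power on the right by [[2, 1], [1, 1]]
        a, b, c, d = (2 * a + b) % N, (a + b) % N, (2 * c + d) % N, (c + d) % N
        k += 1
    return k


print(orbit_period((Fraction(1, 3), Fraction(0))))
print([grid_period(N) for N in (2, 3, 5)])
```

For a picture stored on an $N \times N$ pixel grid, `grid_period(N)` is the number of iterates after which every pixel is back in place.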

The characteristic polynomial of $A$ is

$$\det(A - \lambda I) = -\lambda^3 + 2\lambda + 1 = -(\lambda + 1)(\lambda^2 - \lambda - 1) = -(\lambda + 1)\det(M - \lambda I),$$

so $A$ has the eigenvalues of $M$ (no coincidence!), together with $\lambda = -1$.

Example 3.11. The most “famous” toral automorphism is Arnol’d’s catmap, and it has the matrix $\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^2$; see Figure 3.3 (right). It is called catmap because Arnol’d used this example, including the drawing of a cat’s head, in his book(s) [30] to illustrate the nature of hyperbolic maps.

Exercise 3.12. Show that if $x \in \mathbb{T}^d$ has only rational coordinates, then $x$ is periodic under a toral automorphism. Conclude that, if the pixels in Figure 3.3 have rational coordinates (such as the dyadic coordinates that computers use), then the cat will return intact after a finite number of iterates.

The following characterization for shadowing subshifts is due to Walters [552] (see also [381, Theorem 3.33] and [279] for an entirely general characterization of dynamical systems with shadowing in terms of SFTs).

Theorem 3.13. A subshift $(X, \sigma)$ has the shadowing property if and only if it is a subshift of finite type.

Proof. We give the proof for $X \subset \mathcal{A}^{\mathbb{N}_0}$ only; the two-sided case follows in a similar way.

⇐: Let $(X, \sigma)$ be an SFT of memory $M$ (see below Definition 3.1) so $M + 1$ is the length of the longest forbidden word. Let $\varepsilon > 0$ be arbitrary and choose $m \geq M + 1$ so large that $2^{-m} < \varepsilon$. Take $\delta = 2^{2-m}$. We need to


show that for every $\delta$-pseudo-orbit $(x^n)_{n \geq 0} \subset X$ (in other words, $\sigma(x^n)_0 \cdots \sigma(x^n)_{m-3} = x^n_1 \cdots x^n_{m-2} = x^{n+1}_0 \cdots x^{n+1}_{m-3}$ for every $n$), there is $y \in X$ that $\varepsilon$-shadows $(x^n)_{n \geq 0}$. To this end, set $y_n = x^n_0$ for each $n \geq 0$. Then for $0 \leq i < m$, we have

$$y_{n+i} = x^{n+i}_0 = x^{n+i-1}_1 = x^{n+i-2}_2 = \cdots = x^n_i,$$

so $y_n \cdots y_{n+m-1} = x^n_0 \cdots x^n_{m-1} \in \mathcal{L}(X)$. Since $X$ is an SFT, $y \in X$ and $d(\sigma^n(y), x^n) < \varepsilon$ by construction.

⇒: Let $(X, \sigma)$ be a subshift with the shadowing property, so in particular, for $\varepsilon = 1$, there exists $\delta > 0$ such that every $\delta$-pseudo-orbit in $X$ is $\varepsilon$-shadowed in $X$. Take $N \in \mathbb{N}$ such that $2^{2-N} < \delta$, and let $y \in \mathcal{A}^{\mathbb{N}_0}$ be such that $y_n \cdots y_{n+N-1} \in \mathcal{L}(X)$ for each $n$. Then there exists a sequence $(x^n)_{n \geq 0}$ in $X$ such that $x^n_0 \cdots x^n_{N-1} = y_n \cdots y_{n+N-1}$ for each $n \geq 0$. Therefore

$$\sigma(x^n)_0 \cdots \sigma(x^n)_{N-2} = x^n_1 \cdots x^n_{N-1} = y_{n+1} \cdots y_{n+N-1} = x^{n+1}_0 \cdots x^{n+1}_{N-2}$$

and $d(\sigma(x^n), x^{n+1}) \leq 2^{-N+2} < \delta$. Hence $(x^n)_{n \geq 0}$ is a $\delta$-pseudo-orbit, which can be $\varepsilon$-shadowed by some $z \in X$. But then $z_n = x^n_0 = y_n$ for every $n \geq 0$, so $z = y \in X$. Since $y$ was arbitrary, up to the condition that each of its $N$-blocks belongs to $\mathcal{L}(X)$, it follows that the only restriction of $X$ involves forbidden blocks of length $\leq N$. Therefore $X$ is an SFT.

3.1.2. Topological Entropy of SFTs.

Theorem 3.14. The topological entropy of an SFT equals $\max\{0, \log \lambda\}$ where $\lambda$ is the leading eigenvalue of the transition matrix $A$. If $A$ is irreducible, then the Perron-Frobenius Theorem 8.58 gives that $\lambda \geq 1$ (and $\lambda > 1$ if $A$ is not a permutation matrix).

Proof. Let $A^n = (a^{(n)}_{ij})_{i,j \in \mathcal{A}}$ be the $n$-th power of $A$. Every word in $\mathcal{L}_n(X)$ corresponds to an $n$-path in the transition graph, and the number of $n$-paths from $i$ to $j$ is given by $a^{(n)}_{ij}$. Using the Jordan normal form $A = U J U^{-1}$, we can find $C \geq 1$ such that

$$C^{-1} \lambda^n \leq \max_{i,j} a^{(n)}_{ij} \leq C n^{\#\mathcal{A}} \lambda^n. \tag{3.1}$$

It follows that $C^{-1} \lambda^n \leq p(n) \leq (\#\mathcal{A})^2 C n^{\#\mathcal{A}} \lambda^n$ (where $p(n)$ stands for the word-complexity; see Definition 1.9) and $\lim_n \frac{1}{n} \log p(n) = \log \lambda$.

Proposition 3.15. If $(Y, \sigma)$ is a factor of $(X, \sigma)$, then $h_{\mathrm{top}}(Y, \sigma) \leq h_{\mathrm{top}}(X, \sigma)$. If $(X, \sigma)$ and $(Y, \sigma)$ are conjugate, then $h_{\mathrm{top}}(X, \sigma) = h_{\mathrm{top}}(Y, \sigma)$.

The result also holds in general, i.e. not just in the context of subshifts, see Corollary 2.51, but using the word-complexity and sliding block codes, the proof is particularly straightforward here.


Proof. Let $\psi : X \to Y$ be the factor map. Since it is continuous, it is a sliding block code by Theorem 1.23, say of window length $2N + 1$. Therefore the word-complexities relate as $p_Y(n) \leq p_X(n + 2N)$, and hence

$$\limsup_{n \to \infty} \frac{1}{n} \log p_Y(n) \leq \limsup_{n \to \infty} \frac{1}{n} \log p_X(n + 2N) = \limsup_{n \to \infty} \frac{n + 2N}{n} \cdot \frac{1}{n + 2N} \log p_X(n + 2N) = \limsup_{n \to \infty} \frac{1}{n + 2N} \log p_X(n + 2N).$$

This proves the first statement. Using this in both directions, we find $h_{\mathrm{top}}(X, \sigma) = h_{\mathrm{top}}(Y, \sigma)$.
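Theorem 3.14 can be checked numerically on the Fibonacci SFT. The sketch below counts the words of length $n$ of a vertex-labeled SFT as paths in its transition graph, under the assumption that every vertex has an outgoing edge (so that every finite path extends to a point of the subshift), and compares $\frac{1}{n} \log p(n)$ with $\log \lambda$. The function name `count_words` is an illustrative choice.

```python
import math


def count_words(A, n):
    """Number of n-words of the vertex SFT with transition matrix A (n >= 1),
    computed as the sum of all entries of A^(n-1)."""
    size = len(A)
    paths = [1] * size  # paths[i] = number of paths with 0 edges starting at i
    for _ in range(n - 1):
        paths = [sum(A[i][j] * paths[j] for j in range(size))
                 for i in range(size)]
    return sum(paths)


A = [[1, 1], [1, 0]]                 # Fibonacci SFT: the word 11 is forbidden
gamma = (1 + math.sqrt(5)) / 2       # leading eigenvalue of A

print([count_words(A, n) for n in range(1, 8)])
n = 200
estimate = math.log(count_words(A, n)) / n
print(estimate, math.log(gamma))
```

The word counts are Fibonacci numbers, and the estimate approaches $\log \gamma$ slowly, with an error of order $1/n$ as suggested by the bounds (3.1).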

As shown by Parry [446], see Theorem 6.67, irreducible SFTs are intrinsically ergodic. This follows also from Theorem 3.48 and Proposition 3.41. Weiss [556] showed that factors of irreducible SFTs are intrinsically ergodic as well.

3.1.3. Vertex-Splitting and Conjugacies between SFTs. It is natural to ask which SFTs are conjugate to each other. We have seen that having equal topological entropy is a necessary condition for this, but it is not sufficient. The conjugacy problem for SFTs was solved by Williams and in this section we discuss the ingredients required for this result. Complete details can be found in [364, 398].

We know that an SFT $(X, \sigma)$ has a graph representation (as vertex-labeled subshift or edge-labeled subshift, and certainly not unique). The following operation on the graph $G$, called vertex splitting, produces a related subshift:

Figure 3.4. Panels: insplit graph, original $G$, outsplit graph.

Let $G = (V, E)$ where $V$ is the vertex set and $E$ the edge set. For each $v \in V$, let $E_v \subset E$ be the set of edges starting in $v$ and let $E^v \subset E$ be the set of edges terminating in $v$.


Definition 3.16. Let $G = (V, E)$, and assume that $\#E^v \geq 2$. An elementary insplit graph $\hat{G} = (\hat{V}, \hat{E})$ is obtained by
• doubling one vertex $v \in V$ into two vertices $v_1, v_2 \in \hat{V}$;
• replacing each $e = (v \to w) \in E_v$ for $w \neq v$ by an edge $\hat{e}_1 = (v_1 \to w)$ and $\hat{e}_2 = (v_2 \to w)$;
• replacing each $e = (w \to v) \in E^v$ for $w \neq v$ by a single edge $\hat{e}_1 = (w \to v_1)$ or an edge $\hat{e}_2 = (w \to v_2)$ (but make sure that $v_1$ and $v_2$ both have incoming edges);
• replacing each loop $(v \to v)$ by two edges $(v_1 \to v_i), (v_2 \to v_i) \in \hat{E}$ (so one of them is a loop) where $i \in \{1, 2\}$.
An insplit graph is then obtained by successive elementary insplits.

(Elementary) outsplit graphs are defined similarly, interchanging the roles of $E_v$ and $E^v$.

Definition 3.17. Let $G = (V, E)$, and assume that $\#E_v \geq 2$. An elementary outsplit graph $\hat{G} = (\hat{V}, \hat{E})$ is obtained by
• doubling one vertex $v \in V$ into two vertices $v_1, v_2 \in \hat{V}$;
• replacing each $e = (v \to w) \in E_v$ for $w \neq v$ by a single edge $\hat{e} = (v_1 \to w)$ or $\hat{e} = (v_2 \to w)$ (but make sure that $v_1$ and $v_2$ both have outgoing edges);
• replacing each $e = (w \to v) \in E^v$ for $w \neq v$ by an edge $\hat{e} = (w \to v_1)$ and an edge $\hat{e}' = (w \to v_2)$;
• replacing each loop $(v \to v)$ by two edges $(v_i \to v_1), (v_i \to v_2) \in \hat{E}$ (so one of them is a loop) where $i \in \{1, 2\}$.
An outsplit graph is then obtained by successive elementary outsplits.

If every $e \in E$ had a unique label, then we will also give each $\hat{e} \in \hat{E}$ a unique label.

Proposition 3.18. Let $\hat{G}$ be an in- or outsplit graph obtained from $G$. Then the edge-labeled subshift $\hat{X}$ of $\hat{G}$ and the edge-labeled subshift $X$ of $G$ are mutually semi-conjugate to each other.

Proof. We give the proof for an elementary outsplit graph $\hat{G}$; the general outsplit and (elementary) insplit graphs follow similarly. By Theorem 1.23, it suffices to give sliding block code representations for $\pi : \hat{X} \to X$ and $\hat{\pi} : X \to \hat{X}$.
• The factor map $\pi : \hat{X} \to X$ is simple. If $\hat{e} \in \hat{E}$ replaces $e \in E$, then $f(\hat{e}) = e$ and $\pi(x)_i = f(x_i)$.
• Each 2-word $ee' \in \mathcal{L}(X)$ uniquely determines the first edge $\hat{e}$ of the 2-path in $\hat{G}$ that replaces the 2-path in $G$ coded by $ee'$. Set $\hat{f}(e, e') = \hat{e}$ and $\hat{\pi}(x)_i = \hat{f}(x_i, x_{i+1})$.
This concludes the proof. In general, mutual semi-conjugacy is not enough to conclude conjugacy (it is not given that $\hat{\pi} = \pi^{-1}$), but in this situation, conjugacy holds; see Theorem 3.24.

Now let $\hat{G} = (\hat{V}, \hat{E})$ be an outsplit graph of $G = (V, E)$ with transition matrices $\hat{A}$ and $A$, respectively. Assume that $\hat{N} = \#\hat{V}$ and $N = \#V$. Then there is an $N \times \hat{N}$-matrix $D = (d_{v,\hat{v}})_{v \in V, \hat{v} \in \hat{V}}$ where $d_{v,\hat{v}} = 1$ if $\hat{v}$ replaces $v$. (Thus $D$ is a sort of rectangular diagonal matrix.)

There also is an $\hat{N} \times N$-matrix $C = (c_{\hat{v},v})_{\hat{v} \in \hat{V}, v \in V}$ where $c_{\hat{v},v}$ is the number of edges $e \in E^v$ that are replaced by an edge $\hat{e} \in \hat{E}_{\hat{v}}$.

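Proposition 3.19. With the above notation, $DC = A$ and $CD = \hat{A}$.

Sketch of proof. Prove it first for an elementary outsplit, and then compose elementary outsplits to a general outsplit. For the first step, we compute the elementary outsplit for the example of Figure 3.4.

$$A = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 0 \end{pmatrix} \quad \text{and} \quad \hat{A} = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \end{pmatrix}.$$

Also

$$D = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad C = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 0 \end{pmatrix}.$$

Matrix multiplication confirms that $DC = A$ and $CD = \hat{A}$.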
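The two products in the sketch of proof are quick to confirm by machine. The following lines use the matrices of the outsplit example of Figure 3.4, namely $A = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 0 \end{pmatrix}$ and the corresponding $\hat{A}$, $D$, $C$, written as nested lists; `mat_mul` is a small helper introduced here for illustration.

```python
def mat_mul(P, Q):
    """Product of integer matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


A = [[1, 1, 1],
     [0, 1, 1],
     [1, 0, 0]]
A_hat = [[0, 0, 0, 1],
         [1, 1, 1, 0],
         [0, 0, 1, 1],
         [1, 1, 0, 0]]
D = [[1, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]
C = [[0, 0, 1],
     [1, 1, 0],
     [0, 1, 1],
     [1, 0, 0]]

print(mat_mul(D, C) == A)      # DC = A
print(mat_mul(C, D) == A_hat)  # CD = A-hat
```

The first two columns of $D$ record that the vertices $v_1, v_2$ of $\hat{G}$ both replace the first vertex of $G$, which is why the first row of $A$ is the sum of the first two rows of $C$.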



Exercise 3.20. Do the same for the elementary insplit graph in the example of Figure 3.4.

Definition 3.21. Two matrices $A$ and $\hat{A}$ are strongly shift equivalent (of lag $\ell$) (denoted as $A \approx \hat{A}$) if there are (rectangular) matrices $D_i, C_i$ and $A_i$, $1 \leq i \leq \ell$ over $\mathbb{N}_0$ such that

$$A = A_0, \qquad A_{i-1} = D_i C_i, \qquad C_i D_i = A_i, \quad i = 1, \dots, \ell, \qquad A_\ell = \hat{A}. \tag{3.2}$$

Remark 3.22. One important restriction of this definition is that the conjugating matrices must have non-negative integer entries. Even if a square


matrix has determinant $\pm 1$, its inverse may still have negative integers among its entries. For example

$$A = \begin{pmatrix} 4 & 1 \\ 1 & 0 \end{pmatrix} \quad \text{and} \quad \hat{A} = \begin{pmatrix} 3 & 2 \\ 2 & 1 \end{pmatrix}$$

are similar via $\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} A = \hat{A} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$. From this, we can easily compute that the traces $\mathrm{tr}(A^n) = \mathrm{tr}(\hat{A}^n)$ for all $n \in \mathbb{Z}$, so $A$ and $\hat{A}$ share ζ-functions $\zeta_A(t) := \exp\left(\sum_{n=1}^{\infty} \frac{t^n}{n} \mathrm{tr}(A^n)\right)$. However, $A$ and $\hat{A}$ are not (strongly) shift equivalent. This is Williams’s [559, Example 3] counterexample to Bowen’s question of whether sharing ζ-functions for SFTs suffices to conclude conjugacy.

Exercise 3.23. Show that strong shift equivalence $\approx$ is indeed an equivalence relation between non-negative square matrices. Show that $A \approx \hat{A}$ implies that $A$ and $\hat{A}$ have the same leading eigenvalue $\lambda = \hat{\lambda}$.

Strong shift equivalence between matrices $A$ and $\hat{A}$ means, in effect, that their associated graphs $G$ and $\hat{G}$ can be transformed into each other by a sequence of elementary vertex-splittings and their inverses (vertex-mergers). Conjugacy between SFTs can always be reduced to vertex-splittings and vertex-mergers, as shown in Williams’s Theorem [559] from 1973. The full proof is in [364, Chapter 2] and [398, Chapter 7, specifically Theorem 7.2.7].

Theorem 3.24. Two SFTs are conjugate if and only if their transition matrices are strongly shift equivalent.

Strong shift equivalence $A \approx \hat{A}$ may be a complete invariant for conjugacy between edge-labeled SFTs $X_A$ and $X_{\hat{A}}$, but in practice it is difficult to check whether $A \approx \hat{A}$. Even if $A$ and $\hat{A}$ have the same characteristic polynomial, they need not be strongly shift equivalent. The following weaker notion may help:

Definition 3.25. Two matrices $A$ and $\hat{A}$ are shift equivalent (of lag $\ell$) (denoted as $A \sim_\ell \hat{A}$) if there are matrices $C, D$ over $\mathbb{N}_0$ such that

$$A^\ell = CD, \quad \hat{A}^\ell = DC \quad \text{and} \quad AC = C\hat{A}, \quad \hat{A}D = DA. \tag{3.3}$$

Said differently, the following diagram commutes:

[Commutative diagram: $A$ acting on $\mathbb{Z}^n$ and $\hat{A}$ acting on $\mathbb{Z}^{\hat{n}}$, intertwined by the maps $C$ and $D$.]


Shift equivalence means that the $\ell$-th powers $A^\ell$ and $\hat{A}^\ell$ are strongly shift equivalent (with lag 1). Shift equivalence is easier to verify than strong shift equivalence, although verification can still be very complicated. But, and this is Williams’s Conjecture, it is still not fully² known if it is a complete invariant; see [398, Section 7.3] and [106, Problem 19.1]. If $A \not\sim \hat{A}$, then $X_A$ and $X_{\hat{A}}$ cannot be conjugate, but if $A \sim \hat{A}$, this is insufficient to conclude that $(X_A, \sigma)$ and $(X_{\hat{A}}, \sigma)$ are conjugate.

Exercise 3.26. Show that (i) $A \sim_\ell \hat{A}$ implies $A \sim_k \hat{A}$ for all $k \geq \ell$, (ii) shift equivalence $\sim$ is an equivalence relation between non-negative square matrices, and (iii) strong shift equivalence implies shift equivalence, with the same value of $\ell$.

Shift equivalent matrices have the same ζ-function, and many other properties coincide too.

Lemma 3.27. If $A$ and $\hat{A}$ are shift equivalent (of lag $\ell$), then they have the same non-zero eigenvalues (so also $h_{\mathrm{top}}(X_A, \sigma) = h_{\mathrm{top}}(X_{\hat{A}}, \sigma)$).

Proof. We have $A^n C = C \hat{A}^n$ and $D A^n = \hat{A}^n D$ for all $n \geq 0$. By linearity, $q(A) \cdot C = C \cdot q(\hat{A})$ and $D \cdot q(A) = q(\hat{A}) \cdot D$ for every polynomial $q$. If $q$ is the characteristic polynomial of $A$ (so $q(A) = 0$ by the Cayley-Hamilton Theorem), then $0 = D \cdot q(A) \cdot C = \hat{A}^\ell \cdot q(\hat{A})$. Thus $\hat{A}$ has no other eigenvalues than those of $A$, possibly plus 0. On the other hand, if $q$ is the characteristic polynomial of $\hat{A}$, then $0 = C \cdot q(\hat{A}) \cdot D = q(A) \cdot A^\ell$, so $A$ has the eigenvalues of $\hat{A}$, with the possible exception of 0. Since $h_{\mathrm{top}}(X_A, \sigma) = \log \lambda_A$ for the leading eigenvalue $\lambda_A$ of $A$, the entropies are the same too.

In order to say what can be proved with shift equivalence, we define SFTs $(X_A, \sigma)$ and $(X_{\hat{A}}, \sigma)$ to be eventually conjugate if the $n$-th power shifts $(X_A, \sigma^n)$ and $(X_{\hat{A}}, \sigma^n)$ are conjugate for all sufficiently large $n$. Then, see [398, Theorem 7.5.15]:

Theorem 3.28. Two SFTs $(X_A, \sigma)$ and $(X_{\hat{A}}, \sigma)$ are eventually conjugate if and only if $A$ and $\hat{A}$ are shift equivalent.
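The defining relations of shift equivalence, $A^\ell = CD$, $\hat{A}^\ell = DC$, $AC = C\hat{A}$ and $\hat{A}D = DA$, are finite matrix identities and can be verified directly once candidate matrices $C, D$ are given. The sketch below does this for the outsplit pair of Figure 3.4 with lag 1, where the matrices $D$ and $C$ of Proposition 3.19 take the roles of $C$ and $D$ of Definition 3.25, respectively. It also compares traces of powers for the two matrices of Remark 3.22; equal traces are only a necessary condition and do not decide shift equivalence. All function names are illustrative.

```python
def mat_mul(P, Q):
    """Product of integer matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def mat_pow(A, n):
    """n-th power of a square integer matrix, n >= 0."""
    size = len(A)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, A)
    return result


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def is_shift_equivalence(A, A_hat, C, D, lag):
    """Check the four identities of a shift equivalence of the given lag."""
    return (mat_pow(A, lag) == mat_mul(C, D)
            and mat_pow(A_hat, lag) == mat_mul(D, C)
            and mat_mul(A, C) == mat_mul(C, A_hat)
            and mat_mul(A_hat, D) == mat_mul(D, A))


# Outsplit example: A = D_split * C_split and A_hat = C_split * D_split.
A = [[1, 1, 1], [0, 1, 1], [1, 0, 0]]
A_hat = [[0, 0, 0, 1], [1, 1, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0]]
D_split = [[1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
C_split = [[0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0]]
print(is_shift_equivalence(A, A_hat, D_split, C_split, 1))

# The two matrices of Remark 3.22 have equal traces of powers.
W = [[4, 1], [1, 0]]
W_hat = [[3, 2], [2, 1]]
print([trace(mat_pow(W, n)) for n in range(1, 5)])
print([trace(mat_pow(W_hat, n)) for n in range(1, 5)])
```

A check of this kind only confirms a proposed pair $(C, D)$; finding such matrices, or ruling them out, is the hard part of the problem.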
There remain many open (classification) problems for SFTs, as well as for sofic and other subshifts. The survey of Boyle [106] contains a long list of open problems, many of which remain open today.

² Kim & Roush [361] gave a negative answer, but only for reducible matrices.

3.2. Sofic Shifts

Sofic shifts are shifts that can be described by finite edge-labeled (rather than vertex-labeled as needed for SFT) transition graphs. The word sofic was coined by Benjy Weiss; it comes from the Hebrew word for “finite”. Much of this section can be found in concise form in [364, Section 6.1].

Definition 3.29. A subshift $(X, \sigma)$ is called sofic if it is the space of paths in an edge-labeled graph. Unlike in the vertex-labeling, in this edge-labeling, more than one edge is allowed to have the same symbol.

Lemma 3.30. Every SFT is sofic.

Proof. Assume that the SFT has memory $M$. Let $G$ be the vertex-labeled $M$-block transition graph of the SFT; i.e. each $a_1 \cdots a_M \in \mathcal{L}_M(X)$ is the label of a unique vertex. We have an edge $a_1 \cdots a_M \to b_1 \cdots b_M$ if and only if $a_1 \cdots a_M b_M = a_1 b_1 \cdots b_M \in \mathcal{L}_{M+1}(X)$, and then this $(M+1)$-word is also the label of the edge. Since each infinite vertex-labeled path is in one-to-one correspondence with an infinite edge-labeled path and also in one-to-one correspondence with an infinite word in $X$, we have represented $X$ as a sofic shift.

Remark 3.31. Not every sofic shift is an SFT. For example the even shift (Example 1.4) has an infinite collection of forbidden words, and it cannot be described by a finite collection of forbidden words. Sofic shifts that are not of finite type are called strictly sofic.

The following theorem shows that we can equally define the sofic subshifts as those that are a factor of a subshift of finite type.

Theorem 3.32. A subshift $X$ is generated by an edge-labeled graph if and only if it is a factor of an SFT.

Proof. ⇒: Let $G$ be the edge-labeled graph of $X$, with edges labeled in alphabet $\mathcal{A}$. Relabel $G$ in a new alphabet $\mathcal{A}'$ such that every edge has a distinct label. Call the new edge-labeled graph $G'$. Due to the injective edge-labeling, the edge-subshift $X'$ generated by $G'$ is isomorphic to an SFT. For this, we can take the dual graph in which the edges of $G'$ are the vertices, and $a \to b$ if and only if $a$ labels the incoming edge and $b$ the outgoing edge of the same vertex in $G'$. Now $\pi : X' \to X$ is given by $\pi(x)_i = a$ if $a$ is the label in $G$ of the same edge that is labeled $x_i$ in $G'$. This $\pi$ is clearly a sliding block code, so by Theorem 1.23, $\pi$ is continuous and commutes with the shift.

⇐: If $X$ is a factor of an SFT, then the factor map is a sliding block code by Theorem 1.23, say of window size $2N + 1$: $\pi(x)_i = f(x_{i-N}, \dots, x_{i+N})$. Represent the SFT by an edge-labeled graph $G'$ where the labels are the $(2N+1)$-words $w$ of the SFT. These are all distinct. The factor map turns $G'$ into an edge-labeled graph $G$ with labels $f(w)$. Therefore $X$ is sofic.

3.2. Sofic Shifts


Corollary 3.33. Every factor of a sofic shift is again a sofic shift. Every shift conjugate to a sofic shift is again sofic. In fact, a sofic shift with an irreducible transition matrix is always transitive, has a dense set of periodic points, and is mixing if and only if it is totally transitive; see [47, Theorem 3.3]. 3.2.1. Follower sets. A further characterization of sofic shifts relies on the following notion. Definition 3.34. Given a subshift X and a word v ∈ L(X), the follower set F (v) is the collection of words w ∈ L(X) such that vw ∈ L(X). Example 3.35. Let Xeven be the even shift from Example 1.4. Then F (0) = L(Xeven ) because a 0 casts no restrictions on the follower set. Also F (011) = L(Xeven ), but F (01) = 1L(X) = {1w : w ∈ L(X)}. Although each follower set is infinite, there are only these two distinct follower sets. Indeed, F (v0) = F (0) for every v ∈ L(X), and F (v0111) = F (v01), F (v01111) = F (v011), etc. The follower set F (1) is not properly defined, but we can ignore this. The following theorem, appearing in [556], is in fact a consequence of the Myhill-Nerode Theorem [428, 430]. Theorem 3.36. A subshift (X, σ) is sofic if and only if the collection of its follower sets is finite. Proof. First assume that the collection V = {F (v) : v ∈ L(X)} is finite. We will build an edge-labeled graph representation G of X as follows: (1) Let V be the vertices of G. (2) If a ∈ A and w ∈ L(X), then F (wa) ∈ V ; draw an edge F (w) → F (wa), and label it with the symbol a. (Although there are infinitely many w ∈ L(X), there are only finitely many follower sets, and we need not repeat arrows between the same vertices with the same label.) The resulting edge-labeled graph G represents X. Conversely, assume that X is sofic, with edge-labeled graph representation G. For every w ∈ L(X), consider the collection of paths in G representing w, and let T (w) be the collection of terminal vertices of these paths. 
Then F (w) is the collection of infinite paths starting at a vertex in T (w). Since G is finite and there are only finitely many subsets of its vertex set, the collection of follower sets is finite.  Definition 3.37. An edge-labeled transition graph G is right-resolving if for each vertex v ∈ G, the outgoing arrows all have different labels. It is


called follower-separated if for each vertex $v \in G$, the follower set (i.e. the set of labeled words associated to paths starting in $v$) is different from the follower set of every other vertex.

Every sofic shift has a right-resolving follower-separated graph representation, and if we minimize the number of vertices in such a graph, there is only one such graph with these properties. In fact, the follower set representation $G$ constructed in the first half of the proof of Theorem 3.36 is right-resolving, follower-separated, and of smallest size. The latter two properties follow by the choice of $V$. To see the former, assume that $v \in V$ and that $v \to w$ and $v \to w'$ have the same label $a$. This implies that $F(w) = \{x : ax \in F(v)\} = F(w')$, so $w = w'$.

Corollary 3.38. Every transitive sofic shift $X$ is synchronized and (unless it is a single periodic orbit) has positive entropy. In fact, $h_{top}(X) = \log \lambda_A$, where $\lambda_A$ is the leading eigenvalue of the transition graph of the minimal right-resolving representation of $X$.

Proof. Let the edge-labeled graph $G$ be the right-resolving follower-separated representation of $X$. Pick any word $u \in \mathcal{L}(X)$ and let $T(u)$ be the collection of terminal vertices of paths in $G$ representing $u$. If $T(u)$ consists of one vertex $v \in V$, then every path containing $u$ goes through $v$, and there is a unique follower set $F(u)$, namely the collection of words representing paths starting in $v$. In particular, $u$ is a synchronizing word.

If $\#T(u) > 1$, then we show that we can extend $u$ to the right so that it becomes a synchronizing word. Suppose that $v \neq v' \in T(u)$. Since $G$ is follower-separated, there is $u_1 \in \mathcal{L}(X)$ such that $u_1 \in F(v)$ but $u_1 \notin F(v')$ (or vice versa; the argument is the same). Extend $u$ to $uu_1$. Because $G$ is right-resolving, $u_1$ can only represent a single path starting at any single vertex. Therefore $\#T(uu_1) \le \#T(u)$. But since $u_1 \notin F(v')$, we have in fact $\#T(uu_1) < \#T(u)$. Continue this way, extending $uu_1$ until eventually $\#T(uu_1 \cdots u_N) = 1$. Then $uu_1 \cdots u_N$ is synchronizing. (In fact, what we proved here is that every $u \in \mathcal{L}(X)$ can be extended on the right to a synchronizing word.)

The positive entropy follows from Theorem 1.20 or Corollary 3.47. In fact, since $G$ is right-resolving, the map from $n$-paths in $G$ to words in $\mathcal{L}_n(X)$ is onto and at most $\#V$-to-one. Therefore $p_X(n) \le \#\{n\text{-paths}\} \le \#V \cdot p_X(n)$, and we can use Theorem 3.14. □

Remark 3.39. Irreducible sofic shifts are intrinsically ergodic; see [556] and Theorem 3.48.
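The entropy statement of Corollary 3.38 can be checked numerically on the even shift. The sketch below is illustrative only: it assumes the usual two-vertex right-resolving graph of the even shift (loop labeled 0 at vertex 0, and a 2-cycle through vertex 1 labeled 1, 1), and the helper names are hypothetical, not taken from the text. It counts allowed words by brute force, counts paths in the graph, and compares both with the leading eigenvalue.

```python
import math
from itertools import product

# Assumed right-resolving graph of the even shift:
# vertex 0 --0--> 0, vertex 0 --1--> 1, vertex 1 --1--> 0.
A = [[1, 1],
     [1, 0]]

def is_even_shift_word(w):
    """Allowed iff every block of 1's enclosed by 0's on both sides has even length."""
    interior_blocks = w.split("0")[1:-1]  # blocks at the ends can still be extended
    return all(len(block) % 2 == 0 for block in interior_blocks)

def word_count(n):
    """Brute-force word-complexity p(n) of the even shift."""
    return sum(is_even_shift_word("".join(p)) for p in product("01", repeat=n))

def path_count(n):
    """Number of paths with n edges, i.e. the sum of the entries of A^n."""
    counts = [1] * len(A)  # counts[v] = number of paths of the current length from v
    for _ in range(n):
        counts = [sum(A[v][w] * counts[w] for w in range(len(A)))
                  for v in range(len(A))]
    return sum(counts)

def leading_eigenvalue_2x2(M):
    """Largest eigenvalue of a 2x2 matrix from its trace and determinant."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return (tr + math.sqrt(tr * tr - 4 * det)) / 2

if __name__ == "__main__":
    lam = leading_eigenvalue_2x2(A)
    for n in (4, 8, 12):
        print(n, word_count(n), path_count(n), math.log(word_count(n)) / n, math.log(lam))
```

Since the word count and the path count differ by at most the factor #V = 2, both grow at the exponential rate given by the leading eigenvalue, here the golden mean.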

3.3. Coded Subshifts


Rather than forbid words to appear, as one does in SFTs, we can prescribe which words need to be used, and then these words can be concatenated freely. This type of subshift was first described by Blanchard & Hansel [84].

Definition 3.40. A coded subshift $(X_{\mathcal{C}}, \sigma)$ is the closure of the collection of free concatenations of words from a finite or countable collection $\mathcal{C}$.

Of course, this doesn’t mean that concatenations of words in $\mathcal{C}$ are the only words in the language $\mathcal{L}(X)$. For example, if $\mathcal{C} = \{10, 01\}$, then $00 \in \mathcal{L}(X) \setminus \mathcal{C}^*$.

Proposition 3.41. Every transitive SFT is a coded shift.

For example, the Fibonacci SFT of Example 1.3 and the even shift of Example 1.4 are both coded subshifts, with sets of code words $\mathcal{C} = \{0, 01\}$ and $\mathcal{C} = \{0, 11\}$, respectively. On the other hand, the SFT $(X_A, \sigma)$ on the alphabet $\{0, 1\}$ with transition matrix $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ is not transitive, and it is also not a coded shift, because no code word containing 01 can ever be used twice in a concatenation.

Proof. Rewrite the SFT to an SFT with memory $M = 1$; i.e. all forbidden words have length $\le 2$. Let $G$ be the transition graph; since the SFT is transitive, $G$ is strongly connected. Fix vertices $a, b$ such that the arrow $a \to b$ occurs in $G$. Now let $\mathcal{C}$ contain the codes of all finite paths $b \to \cdots \to a$; these can be freely concatenated. □

Remark 3.42. Naturally, the set $\mathcal{C}$ of codes may not be the most economical, but the idea of the proof of Proposition 3.41 is quite general. It can also be used to show that sofic and synchronized subshifts are coded. Therefore we have the inclusion:

SFTs ⊂ sofic shifts ⊂ synchronized subshifts ⊂ coded subshifts.

All these inclusions are strict. For instance, Dyck shifts are coded but not synchronized; see Section 3.10. Coded shifts are always transitive, but not always totally transitive; indeed, if the lengths of all code words are multiples of $N \ge 2$, then $\sigma^N$ can easily be non-transitive (but not necessarily; see [180, Theorem 4.1]). Totally transitive coded subshifts are always weak mixing (since they have a dense set of periodic orbits, see [325, Corollary 3.6]), and also topologically mixing, see [180, Theorem 2.2]. Thus for coded systems, these three notions coincide.
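The remark after Definition 3.40, that the language of a coded shift is larger than the set of concatenations itself, can be seen by direct enumeration. The following is a minimal sketch; the function names and the cut-off at three factors are assumptions made for illustration.

```python
from itertools import product

def concatenations(code_words, max_factors):
    """All free concatenations of between 1 and max_factors code words."""
    result = set()
    for k in range(1, max_factors + 1):
        for choice in product(code_words, repeat=k):
            result.add("".join(choice))
    return result

def subwords(words, length):
    """All subwords of the given length occurring in any of the words."""
    found = set()
    for w in words:
        for i in range(len(w) - length + 1):
            found.add(w[i:i + length])
    return found

C = ["10", "01"]
star = concatenations(C, 3)      # a finite part of C^*
language_2 = subwords(star, 2)   # 2-words seen inside these concatenations

if __name__ == "__main__":
    print(sorted(language_2))
    print("00" in language_2, "00" in star)
```

The word 00 appears inside the concatenation 10·01, so it belongs to the language, although it is not itself a concatenation of code words.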


It is useful to make some distinction between sequences that are the concatenations of “short” words:

(3.4) $V_{\mathcal{C}} = \{x \in \mathcal{A}^{\mathbb{Z}} : \exists\, (s_k)_{k \in \mathbb{Z}} \subset \mathbb{Z} \text{ such that } x_{s_k} \cdots x_{s_{k+1}-1} \in \mathcal{C}\},$

and sequences for which every finite subword appears as a subword of “long” words in $\mathcal{C}$:

(3.5) $U_{\mathcal{C}} = \{x \in \mathcal{A}^{\mathbb{Z}} : \forall\, k \in \mathbb{N},\ x_{-k} \cdots x_k \text{ is a subword of some word in } \mathcal{C}\}.$

We have $X_{\mathcal{C}} = \overline{V_{\mathcal{C}}} \supset U_{\mathcal{C}}$.

Example 3.43. The odd shift $X_{odd}$ (recall from Example 1.4 that in this subshift, blocks of 0’s have odd lengths) is a coded shift with collection of code words $\mathcal{C} = \{1, 10, 1000, 100000, \dots, 10^{2n-1}, \dots\}$. The sequence $\cdots 010101010 \cdots$ belongs to $V_{\mathcal{C}}$ but not to $U_{\mathcal{C}}$. On the other hand, $\cdots 000000 \cdots$ belongs to $U_{\mathcal{C}}$ but not to $V_{\mathcal{C}}$. The sequence $\cdots 0001000 \cdots$ belongs to neither but lies in the closure $\overline{V_{\mathcal{C}}}$ (but not in $\overline{U_{\mathcal{C}}}$; in fact $\overline{U_{\mathcal{C}}} = U_{\mathcal{C}} = \{0^\infty\}$).

One can view coded shifts by means of (infinite) edge-labeled transition graphs $G_{\mathcal{C}}$, with a central vertex $v_0$ from which loops of length $\ell$ emerge. Here the number of such loops is $q_\ell = \#\mathcal{C}_\ell$ for $\mathcal{C}_\ell := \{C \in \mathcal{C} : |C| = \ell\}$. The theory of infinite Markov graphs, as summarized in Section 8.7, should then be applicable. In particular, $\limsup_\ell \frac{1}{\ell} \log q_\ell = h_G(G)$ is the Gurevich entropy; see Definition 8.63. According to Theorem 8.73, unless $G_{\mathcal{C}}$ is transient, the topological entropy ought to be the leading root of

(3.6) $F(h) := \sum_{\ell} q_\ell e^{-\ell h} = 1.$

This is indeed true in many cases, see e.g. Example 3.43 (see Exercise 3.115) and Example 3.49 below, but there are two problems. First, the space of paths on $G_{\mathcal{C}}$ can multiply code the points in $X_{\mathcal{C}}$, leading to an overestimate of the entropy. We call $x \in X_{\mathcal{C}}$ recognizable or uniquely decipherable if the sequence $(s_k)$ in (3.4) is unique. The collection of code words $\mathcal{C}$ has the unique decomposition property if every finite word $w \in \mathcal{L}(X_{\mathcal{C}})$ can be decomposed in at most one way into words of $\mathcal{C}$.

Example 3.44. Let $\mathcal{C} = \{0, 10, 100\}$; then $X_{\mathcal{C}}$ is the Fibonacci SFT, but clearly the word 100 is superfluous here, since it is the concatenation of the first two. Thus $X_{\mathcal{C}}$ is neither uniquely decipherable nor has the unique decomposition property. The entropy is not the logarithm of the leading root $1.8393\dots$ of $x^3 = x^2 + x + 1$ (the tribonacci constant), as (3.6) would suggest, but truly the logarithm of the golden mean.
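To see the overestimate in Example 3.44 concretely, one can solve equation (3.6), $F(h) = \sum_\ell q_\ell e^{-\ell h} = 1$, numerically. The sketch below uses plain bisection, which is a choice made here and not a method taken from the text; the dictionary `q` (code-word length mapped to the number of code words of that length) and the bracketing interval are assumptions of the example.

```python
import math

def F(h, q):
    """F(h) = sum over lengths l of q_l * exp(-l*h); q maps length -> count."""
    return sum(count * math.exp(-length * h) for length, count in q.items())

def solve_F_equals_one(q, lo=1e-9, hi=10.0, tol=1e-13):
    """Bisection for F(h) = 1; assumes F(lo) > 1 > F(hi) and F strictly decreasing."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid, q) > 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

golden_mean = (1 + math.sqrt(5)) / 2

# C = {0, 10, 100}: one code word each of lengths 1, 2 and 3.
h_redundant = solve_F_equals_one({1: 1, 2: 1, 3: 1})
# C = {0, 10}: the superfluous word 100 removed.
h_minimal = solve_F_equals_one({1: 1, 2: 1})

if __name__ == "__main__":
    print(math.exp(h_redundant), math.exp(h_minimal), golden_mean)
```

With the redundant word 100 included, the root of (3.6) exceeds the true entropy; with $\mathcal{C} = \{0, 10\}$, which has the unique decomposition property, the root is the logarithm of the golden mean.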


Let $\mathcal{C} = \{1010, 0100\}$. Then $X_{\mathcal{C}}$ doesn’t have the unique decomposition property because

$$\underbrace{0100}\,\underbrace{10} \;=\; \underbrace{010}\,\underbrace{010}.$$

However, if this word is extended by one symbol (either on the left or on the right), then the decomposition is unique. Therefore $X_{\mathcal{C}}$ is uniquely decipherable.

Let $\mathcal{C} = \{10, 00, 01\}$. In this case, every word containing 11 is uniquely decipherable, and all other words can be deciphered in exactly two ways; e.g.

$$\cdots\, 01\ 00\ 00\ 10\ 10\ 01\ 0 \,\cdots \;=\; \cdots\, 0\ 10\ 00\ 01\ 01\ 00\ 10 \,\cdots$$

Formula (3.6) suggests that the topological entropy $h_{top}(\sigma) = \frac12 \log 3$, and this is indeed true.

Figure 3.5. The edge-labeled transition graphs of $X_{\mathcal{C}}$ and $X_{\tilde{\mathcal{C}}}$.

We see this by considering $X_{\tilde{\mathcal{C}}}$ for $\tilde{\mathcal{C}} = \{01, 23, 45\}$. These have isomorphic transition graphs (with isomorphic path spaces), see Figure 3.5, but the latter is clearly uniquely decipherable with entropy $\frac12 \log 3$. Since $(X_{\mathcal{C}}, \sigma)$ is a factor of $(X_{\tilde{\mathcal{C}}}, \sigma)$ via the sliding block code $\pi : 0 \mapsto 0,\ 1 \mapsto 1,\ 2 \mapsto 1,\ 3 \mapsto 0,\ 4 \mapsto 0,\ 5 \mapsto 0$, the factor map is at most 2-to-1 and hence doesn’t decrease entropy.

The second problem is that there may not be a good correspondence between the number of loops of length $\ell$ and the number of subwords of length $\ell$. The solution of (3.6) can then underestimate the true value of the entropy, and indeed $h_G(G) \le h_{top}(X_{\mathcal{C}})$. A crude example of this is

$$\mathcal{C} = \{01,\ 00011011,\ 000001010011100101110111,\ \dots\};$$

i.e. the $n$-th code word is a concatenation of all words in $\{0, 1\}^*$ of length $n$. Then $q_\ell = 1$ if $\ell = n 2^n$ and $q_\ell = 0$ otherwise. Since every word appears in $X_{\mathcal{C}}$, the true entropy is $h_{top}(X_{\mathcal{C}}) = \log 2$, but (3.6) yields $e^{-2h} + e^{-8h} + e^{-24h} + e^{-64h} + \cdots = 1$, which gives $h = \log 1.1809\cdots < \log 2$.
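The underestimate in the crude example can be reproduced numerically. In this sketch the series $\sum_n e^{-n 2^n h}$ is truncated after 12 terms, which is an assumption made here; the later terms are far below floating-point precision near the root. The root is again located by bisection, and the names are hypothetical.

```python
import math

def F_crude(h, terms=12):
    """Truncation of F(h) = sum_{n>=1} exp(-n * 2**n * h), where q_l = 1 iff l = n*2^n."""
    return sum(math.exp(-(n * 2 ** n) * h) for n in range(1, terms + 1))

def root_of_F(lo=0.01, hi=1.0, tol=1e-13):
    """Bisection for F_crude(h) = 1; F_crude(lo) > 1 > F_crude(hi) on this bracket."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F_crude(mid) > 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

h_loops = root_of_F()

if __name__ == "__main__":
    print(math.exp(h_loops), "versus the true value", 2.0)
```

The printed base should agree with the value $1.1809\cdots$ quoted in the text, well below the true value 2.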


Hence, knowing the numbers $q_\ell$ of length $\ell$ code words is insufficient to decide on the entropy. Pavlov [450] suggests using the $n$-subwords $W_n$ inside code words instead. The exponential growth rate of their number is $\lim_n \frac1n \log \#W_n = h(U_{\mathcal{C}})$.

Theorem 3.45 ([450, Theorems 1.7 and 1.8]). Recall from (3.6) that $F(h) = \sum_\ell q_\ell e^{-\ell h}$.
(i) If $h > h(U_{\mathcal{C}})$ and $F(h) < 1$, then $h_{top}(X_{\mathcal{C}}) \le h$.
(ii) Conversely, if $F(h) > 1$ and $\mathcal{C}$ has the unique decomposition property, then $h_{top}(X_{\mathcal{C}}) > h$.

Proof. (i) Let $\mathrm{Pre}_n$ and $\mathrm{Suf}_n$ denote the length $n$ prefixes and suffixes of code words $C \in \mathcal{C}$. Note that $\mathrm{Pre}_n \cup \mathrm{Suf}_n \subset W_n$. Every word in $\mathcal{L}(X_{\mathcal{C}})$ can be written as the concatenation of one suffix, some code words, and one prefix, and therefore

$$\mathcal{L}_n(X_{\mathcal{C}}) = W_n \cup \bigcup_{k=2}^{n} \ \bigcup_{\substack{n_1 + \cdots + n_k = n \\ n_i \ge 1}} \mathrm{Suf}_{n_1} \mathcal{C}_{n_2} \cdots \mathcal{C}_{n_{k-1}} \mathrm{Pre}_{n_k},$$

where the inner union really runs over the concatenations of all words in the indicated sets. Note that if the concatenation starts with a full code word, then this counts as a suffix, and similarly if the concatenation ends with a full code word. Therefore it is justified to assume that $n_i \ge 1$ for each $i$. This gives

$$\#\mathcal{L}_n(X_{\mathcal{C}}) \le \#W_n + \sum_{k=2}^{n} \ \sum_{\substack{n_1 + \cdots + n_k = n \\ n_i \ge 1}} \#\mathrm{Suf}_{n_1} \cdot q_{n_2} \cdots q_{n_{k-1}} \cdot \#\mathrm{Pre}_{n_k}.$$

Since $\lim_n \frac1n \log \#W_n = h(U_{\mathcal{C}})$, our assumption $h > h(U_{\mathcal{C}})$ implies that there is a constant $K$ such that $\max\{\#\mathrm{Pre}_n, \#\mathrm{Suf}_n\} \le \#W_n \le K e^{nh}$ for all $n$. Therefore, setting $m = n_1 + n_k$,

$$\#\mathcal{L}_n(X_{\mathcal{C}}) \le K e^{nh} + \sum_{k=2}^{n} \ \sum_{\substack{n_1 + \cdots + n_k = n \\ n_i \ge 1}} K^2 e^{(n_1 + n_k)h} \prod_{j=2}^{k-1} q_{n_j}
= e^{nh} \left( K + K^2 \sum_{m=0}^{n} \sum_{k=2}^{n-m} \ \sum_{\substack{n_2 + \cdots + n_{k-1} = n-m \\ n_i \ge 1}} \ \prod_{j=2}^{k-1} q_{n_j} e^{-n_j h} \right),$$

where the empty product counts as 1. All the terms in the last sum are part of the expansion of $F(h)^{k-2} = \big( \sum_{j=1}^{\infty} q_j e^{-jh} \big)^{k-2}$. By the assumption that $F(h) < 1$, we obtain

$$\#\mathcal{L}_n(X_{\mathcal{C}}) \le e^{nh} \left( K + (n+1) K^2 \sum_{k=2}^{n} F(h)^{k-2} \right) \le e^{nh} \left( K + \frac{(n+1) K^2}{1 - F(h)} \right).$$

Taking logarithms, dividing by $n$, and taking the limit $n \to \infty$ gives $h_{top}(X_{\mathcal{C}}) \le h$.

(ii) If $F(h) > 1$, then for all $t \in \mathbb{N}$ sufficiently large, $S_t := \sum_{j=1}^{t} q_j e^{-jh} > 1$. For $k \in \mathbb{N}$, we have the expansion

$$S_t^k = \left( \sum_{j=1}^{t} q_j e^{-jh} \right)^{k} = \sum_{n=k}^{tk} e^{-nh} \sum_{\substack{n_1 + \cdots + n_k = n \\ 1 \le n_i \le t}} \ \prod_{j=1}^{k} q_{n_j}.$$

Choose $n = N_k$ such that the second sum is maximized. Obviously $k \le N_k \le tk$. Then

$$S_t^k \le tk\, e^{-N_k h} \sum_{\substack{n_1 + \cdots + n_k = N_k \\ 1 \le n_i \le t}} \ \prod_{j=1}^{k} q_{n_j}.$$

For every choice $n_1, \dots, n_k$ with $\sum_{i=1}^{k} n_i = N_k$, the concatenation of words from $\mathcal{C}_{n_i}$ belongs to $\mathcal{L}(X_{\mathcal{C}})$. Also, by the unique decomposition property, every different choice of such concatenation gives a different word in $\mathcal{L}(X_{\mathcal{C}})$. Therefore, using $k \ge N_k / t$ and $S_t > 1$,

$$\#\mathcal{L}_{N_k}(X_{\mathcal{C}}) \ge \sum_{\substack{n_1 + \cdots + n_k = N_k \\ 1 \le n_i \le t}} \ \prod_{j=1}^{k} q_{n_j} \ge \frac{e^{N_k h}}{tk}\, S_t^{N_k / t}.$$

Next take logarithms, divide by $N_k$, and let $N_k \to \infty$ to obtain $h_{top}(X_{\mathcal{C}}) \ge h + \frac1t \log S_t$. But since $F(h) \ge S_t > 1$ for all sufficiently large $t$, we get $h_{top}(X_{\mathcal{C}}) > h$ as required. □

We can now state the consequence for the entropy of coded shifts, paraphrasing results of Pavlov [450, Theorems 1.1–1.3].

Corollary 3.46. Let $h(U_{\mathcal{C}}) = \lim_n \frac1n \log p_{U_{\mathcal{C}}}(n)$ be the exponential growth rate of words in $U_{\mathcal{C}}$, and recall the function $F(h) = \sum_{\ell \ge 1} q_\ell e^{-\ell h}$ from (3.6).
(a) Assume that $X_{\mathcal{C}}$ has the unique decomposition property. If $F(h(U_{\mathcal{C}})) \ge 1$, then $F(h_{top}(X_{\mathcal{C}}, \sigma)) = 1$. In fact, $h = h_{top}(X_{\mathcal{C}})$ is the only solution of $F(h) = 1$.
(b) If $F(h(U_{\mathcal{C}})) < 1$, then $h_{top}(X_{\mathcal{C}}, \sigma) = h(U_{\mathcal{C}})$. Also $h_{top}(X_{\mathcal{C}}, \sigma) = h(U_{\mathcal{C}})$ if and only if $F(h(U_{\mathcal{C}})) \le 1$.


Proof. The map $h \mapsto F(h)$ has a critical value $h_c$ such that $F(h) = \infty$ for $h < h_c$, while $F(h) < \infty$ and $F$ is strictly decreasing for $h > h_c$. At $h = h_c$, $F(h)$ can be finite or infinite.
(a) If $1 < F(h(U_{\mathcal{C}}))$ is finite, then $h_c \le h(U_{\mathcal{C}})$, and there is a unique $h_1 > h(U_{\mathcal{C}})$ such that $F(h_1) = 1$. Theorem 3.45 gives that $h_{top}(X_{\mathcal{C}}) = h_1$.
(b) If $F(h(U_{\mathcal{C}})) < 1$, then by Theorem 3.45(i) we have $h_{top}(X_{\mathcal{C}}) < h(U_{\mathcal{C}}) + \varepsilon$ for every $\varepsilon > 0$. Since $X_{\mathcal{C}} \supset U_{\mathcal{C}}$, we have $h_{top}(X_{\mathcal{C}}) \ge h(U_{\mathcal{C}})$, so $h_{top}(X_{\mathcal{C}}) = h(U_{\mathcal{C}})$ follows.
Combining (a) and (b) shows that $h_{top}(X_{\mathcal{C}}) = h(U_{\mathcal{C}})$ if and only if $F(h(U_{\mathcal{C}})) \le 1$. □

Corollary 3.47. Every non-periodic coded shift $(X_{\mathcal{C}}, \sigma)$ has positive entropy.

Proof. If $\mathcal{C}$ is a single word, then $X_{\mathcal{C}}$ is periodic. Otherwise, let $C, C' \in \mathcal{C}$ be the two shortest words in $\mathcal{C}$. Then by Theorem 8.73, the entropy $h_{top}(X_{\mathcal{C}}, \sigma) \ge \log x$, where $x$ is the largest solution to the equation $x^{-|C|} + x^{-|C'|} = 1$. Clearly $x > 1$. □

The classification also has an analogue for the intrinsic ergodicity of coded shifts. This was studied in several papers by Climenhaga, Thompson, and Pavlov; see [159, Theorem B] and [450]. For countable directed graphs, intrinsic ergodicity is equivalent to positive recurrence, see Theorem 8.68. The results for coded shifts are parallel, except that $h(U_{\mathcal{C}})$ plays the role of $\limsup_\ell \frac{1}{\ell} \log q_\ell$ in the case that the graph $G$ is formed by a single vertex $v_0$ from which $q_\ell$ loops of length $\ell$ emerge. That is, if $F(h(U_{\mathcal{C}})) > 1$, then there is a unique measure of maximal entropy $\mu$ and $\mathrm{supp}(\mu) = X_{\mathcal{C}}$. If $F(h(U_{\mathcal{C}})) < 1$, then all measures of maximal entropy $\mu$ (if there are any) satisfy $\mu(U_{\mathcal{C}}) = 1$. The case $F(h(U_{\mathcal{C}})) = 1$ is a mixture of the two: there may be one or multiple measures of maximal entropy.

Theorem 3.48. Let $(X, \sigma)$ be a coded shift with $q_\ell = \#\{c \in \mathcal{C} : |c| = \ell\}$.
(1) If $\limsup_\ell \frac{1}{\ell} \log q_\ell < h_{top}(X_{\mathcal{C}}, \sigma)$, then $(X, \sigma)$ is intrinsically ergodic.
(2) If $\limsup_\ell \frac{1}{\ell} \log q_\ell = 0$, then every factor of $(X, \sigma)$ is intrinsically ergodic.

Example 3.49. The next example, based on [450, Example 5.3 and 5.4], shows that in certain cases the theory of countable directed graphs does apply to coded shifts. Important for this seems to be that $q_\ell \approx \#\{\text{subwords of the code words from } \mathcal{C}_\ell\}$.


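For the alphabet $\mathcal{A} = \{0, 1, \dots, d\}$ and some function $\tau : \mathbb{N} \to \mathbb{N}$, take the set of code words

$$\mathcal{C} = \{a_1 a_2 \cdots a_n 0^{\tau(n)} : a_i \in \{1, \dots, d\},\ n \ge 2\}.$$

Hence $U_{\mathcal{C}} = \{0, \dots, d\}^{\mathbb{Z}}$, whence $h(U_{\mathcal{C}}) = \log d$, and

$$F(h(U_{\mathcal{C}})) = \sum_{\ell = 1}^{\infty} q_\ell\, d^{-\ell} = \sum_{n=2}^{\infty} d^n \cdot e^{-(n + \tau(n)) \log d} = \sum_{n \ge 2} d^{-\tau(n)}.$$

If $d = 2$ and $\tau(n) = n$, then $F(h(U_{\mathcal{C}})) = 1$, so $h_{top}(X_{\mathcal{C}}) = h(U_{\mathcal{C}}) = \log 2$. In fact, this is a situation (see [450, Proposition 5.1]) where one can equally well work with the transition graph $G$ and the Gurevich entropy $h_G(G) = \log 2$. Since also

$$\frac{d}{dh} F(h)\Big|_{h = h_G(G)} = \sum_{n \ge 2} (n + \tau(n))\, d^{-\tau(n) - 1} = \sum_{n=2}^{\infty} n 2^{-n} < \infty,$$

the graph $G$ is positively recurrent. Thus there is a unique measure of maximal entropy, and it is supported on the whole of $X_{\mathcal{C}}$.

If $d = 4$ and $\tau(n) = \lfloor \log_2 n \rfloor$, then

$$F(h(U_{\mathcal{C}})) = \sum_{n \ge 2} d^{-\tau(n)} = \sum_{n \ge 2} 4^{-\lfloor \log_2 n \rfloor} = \sum_{k=1}^{\infty} \sum_{n = 2^k}^{2^{k+1} - 1} 4^{-k} = \sum_{k=1}^{\infty} 2^k 4^{-k} = 1.$$

Again, $h_{top}(X_{\mathcal{C}}) = h(U_{\mathcal{C}}) = h_G(G)$, and

$$\frac{d}{dh} F(h)\Big|_{h = h_G(G)} = \sum_{n \ge 2} (n + \tau(n))\, d^{-\tau(n) - 1} \ge \frac14 \sum_{n=2}^{\infty} n\, 4^{-\lfloor \log_2 n \rfloor} \ge \frac14 \sum_{k=1}^{\infty} \sum_{n = 2^k}^{2^{k+1} - 1} 2^k 4^{-k} = \frac14 \sum_{k=1}^{\infty} 1 = \infty.$$

Therefore $G$ is null recurrent, and the measure of maximal entropy is supported on $U_{\mathcal{C}}$. In fact, it is the $(\frac14, \frac14, \frac14, \frac14)$-Bernoulli measure on $\{1, 2, 3, 4\}^{\mathbb{Z}}$, giving no weight to the symbol 0.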
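The two series in the case $d = 4$, $\tau(n) = \lfloor \log_2 n \rfloor$ can be checked with exact rational arithmetic. This is only a sketch of the partial sums over the dyadic blocks $2 \le n < 2^{K+1}$; the function names are hypothetical. The first sum should equal $1 - 2^{-K}$, and the second should be at least $K/4$, which illustrates the divergence behind null recurrence.

```python
from fractions import Fraction

def tau(n):
    """floor(log2(n)) for n >= 1, computed exactly via the bit length."""
    return n.bit_length() - 1

def partial_F(K, d=4):
    """Sum of d^(-tau(n)) over 2 <= n < 2^(K+1)."""
    return sum(Fraction(1, d ** tau(n)) for n in range(2, 2 ** (K + 1)))

def partial_derivative_bound(K):
    """(1/4) * sum of n * 4^(-tau(n)) over 2 <= n < 2^(K+1)."""
    return Fraction(1, 4) * sum(Fraction(n, 4 ** tau(n))
                                for n in range(2, 2 ** (K + 1)))

if __name__ == "__main__":
    for K in (2, 5, 10):
        print(K, partial_F(K), float(partial_derivative_bound(K)))
```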

3.4. Hereditary and Density Shifts

The natural order on the alphabet $\mathcal{A} = \{0, \dots, N-1\}$ can be used to create shift-invariant rules.

Definition 3.50. A collection $X \subset \mathcal{A}^{\mathbb{N} \text{ or } \mathbb{Z}}$ is hereditary if whenever $x \in X$ and $y \le_{her} x$ (meaning that $y_n \le x_n$ for all $n$), then also $y \in X$.


Hereditary shifts first appeared in [356, page 882]. It is clear that this rule is shift-invariant, but it is not necessarily closed under taking limits. For example, the collection

(3.7) $X = \{x \in \{0, 1\}^{\mathbb{N}} : x_i = 0 \text{ infinitely often}\}$

is hereditary, but it contains the sequence $1^\infty$ in its closure. Therefore, some authors [382] make the distinction between hereditary shift and subordinate shift, the latter being hereditary and closed. We will write hereditary subshift, meaning it is indeed closed. SFTs are hereditary, if the collection of forbidden words of length $M$ is exactly the largest in the partial order $\le_{her}$ on $\mathcal{A}^M$. A similar fact holds for sofic shifts.

Lemma 3.51. The hereditary closure of (i.e. smallest hereditary subshift containing) the sofic shift $(X, \sigma)$ is sofic.

Proof. Extend the edge-labeled transition graph $G$ of $X$ to $G'$ so that for each edge $v \xrightarrow{a} w$, there is also an edge $v \xrightarrow{a'} w$ for each letter $a' < a$. □

We will see later that also β-shifts (Corollary 3.71) and spacing shifts are hereditary. Another way to create hereditary subshifts is by stipulating an upper bound of the frequency of non-zero digits.

Definition 3.52. Let $\mathcal{A} = \{0, 1, \dots, N-1\}$ be the alphabet. The (upper) density of the subshift $X \subset \mathcal{A}^{\mathbb{N} \text{ or } \mathbb{Z}}$ is

$$\bar{d}(X) = \sup\{\bar{d}(x) : x \in X\},$$

where $\bar{d}(x)$ is the upper density (see Definition 8.52) of the set of indices $j$ such that $x_j \neq 0$; i.e. $\bar{d}(x) = \limsup_k \frac1k \#\{0 \le j < k : x_j \neq 0\}$. Let

$$X_\delta := \{x \in \mathcal{A}^{\mathbb{N}} : \bar{d}(x) \le \delta\}.$$

It is clear that $\bar{d}(X_\delta) = \delta$, but the example of (3.7) shows that the property that $\bar{d}(x) \le \delta$ for every $x \in X_\delta$ is not closed under taking limits.

Remark 3.53. Assume that a collection $X \subset X_\delta$ is shift-invariant and closed. Then it makes no difference to use Banach density (see Section 8.5) instead of density. Indeed, if there was a sequence $x \in X$ with upper Banach density $\delta$, then there is a sequence $n_k$ such that $\frac1k \#\{1 \le j \le k : x_{n_k + j} \neq 0\} \to \delta$. But then $\frac1k \#\{1 \le j \le k : \sigma^{n_k}(x)_j \neq 0\} \to \delta$, and by compactness, we can find a subsequence of $(n_k)_{k \in \mathbb{N}}$ along which $\sigma^{n_k}(x)$ converges to $y$. This $y$ has upper density $\delta$. Secondly, if we define a measure $\mu$ as accumulation point of $\frac{1}{n_k} \sum_{j=0}^{n_k - 1} \delta_{\sigma^j(y)}$ where the sequence $(n_k)_{k \in \mathbb{N}}$ is such that $\lim_k \frac{1}{n_k} \#\{0 \le j < n_k : y_j = 1\} = \delta$, then each $\mu$-typical point satisfies $\lim_n \frac1n \#\{0 \le j < n : y_j = 1\} = \delta$.

The following entropy estimate is adapted from [382].


Theorem 3.54. A non-periodic hereditary subshift $(X, \sigma)$ on the alphabet $\mathcal{A} = \{0, 1, \dots, N-1\}$ has positive topological entropy. In fact³ $h_{top}(X, \sigma) \ge \bar{d}(X) \log 2$ and $h_{top}(X, \sigma) \neq 0$ if $\bar{d}(X) > 0$.

Proof. Let $X$ be a one-sided hereditary shift (the two-sided case goes similarly). Assume that $X$ is not a single periodic orbit, which for hereditary shifts means $X \neq \{0^\infty\}$. If $\bar{d}(X) > 0$, then for every $\varepsilon > 0$ there are $x \in X$ and infinitely many integers $n$ such that $\#\{1 \le i \le n : x_i \neq 0\} \ge (\bar{d}(X) - \varepsilon) n$. Since $X$ is hereditary,

$$\frac1n \log p(n) \ge \frac1n \log 2^{(\bar{d}(X) - \varepsilon) n} = (\bar{d}(X) - \varepsilon) \log 2.$$

But $\lim_n \frac1n \log p(n)$ exists according to Fekete’s Lemma 1.15, and $\varepsilon > 0$ is arbitrary, so $h_{top}(\sigma) \ge \bar{d}(X) \log 2$. Note that if $X$, for every $\varepsilon > 0$, contains sequences $x$ such that $\#\{1 \le i \le n : x_i = N-1\} \ge (\bar{d}(X) - \varepsilon) n$ for infinitely many $n$, then we find $h_{top}(\sigma) \ge \bar{d}(X) \log N$.

For the converse, assume that $\bar{d}(X) = 0$, so for every $\varepsilon > 0$ there is $n_0$ such that for all $n \ge n_0$,

$$p(n) \le \sum_{k=0}^{\lfloor n\varepsilon \rfloor} \binom{n}{k} \le n\varepsilon \binom{n}{\lfloor n\varepsilon \rfloor}.$$

Using Stirling’s formula⁴, we obtain

$$\frac1n \log p(n) \le \frac1n \log \left( \frac{\varepsilon \sqrt{n}\; n^n e^{-n}}{(n\varepsilon)^{n\varepsilon} e^{-n\varepsilon}\, (n(1-\varepsilon))^{n(1-\varepsilon)} e^{-n(1-\varepsilon)}} \right) \le \frac1n \log(\varepsilon \sqrt{n}) - \varepsilon \log \varepsilon - (1 - \varepsilon) \log(1 - \varepsilon).$$

Since $\varepsilon > 0$ is arbitrary, it follows that $h_{top}(\sigma) = \lim_n \frac1n \log p(n) = 0$. □

The drawback of Definition 3.52 above is that the collection

$$X_\delta := \{x \in \mathcal{A}^{\mathbb{N} \text{ or } \mathbb{Z}} : \bar{d}(x) \le \delta\}$$

is not closed. For instance $x^n := 1^n 0^\infty \in X_\delta$ for all $\delta \ge 0$, but $\lim_n x^n = 1^\infty$ belongs only to $X_1$. To obtain closedness, we need to impose further conditions, of the sort that every $n$-block (for $n$ sufficiently large) contains no more than $\delta n$ non-zero symbols. The general approach that we shall present is due to Stanley [523].

³ But this is not a sharp bound; see Example 3.59.
⁴ $n! \sim n^n e^{-n} \sqrt{2\pi n}$.


Definition 3.55. Let $\mathcal{A} = \{0, 1, \dots, N-1\}$ be the alphabet. Given a function $f : \mathbb{N} \to \mathbb{R}$, we define the density shift of $f$ as

$$X_f := \Big\{ x \in \mathcal{A}^{\mathbb{N} \text{ or } \mathbb{Z}} : \sum_{i=0}^{n-1} x_{k+i} \le f(n) \text{ for all } k \in \mathbb{N} \text{ or } \mathbb{Z} \text{ and } n \in \mathbb{N} \Big\}.$$

In particular, if $\mathcal{A} = \{0, 1\}$, then

$$X_f := \{x \in \mathcal{A}^{\mathbb{N} \text{ or } \mathbb{Z}} : |x_k \cdots x_{k+n-1}|_1 \le f(n) \text{ for all } k \in \mathbb{N} \text{ or } \mathbb{Z} \text{ and } n \in \mathbb{N}\}.$$

Since the condition in the definition is on finite blocks, $X_f$ is closed, and $\sigma$-invariance is clear too. Therefore $X_f$ is a subshift, and it is obviously hereditary. We could define density shifts on the infinite alphabet $\mathcal{A} = \{0, 1, 2, \dots\}$, but as $f(1) < \infty$, we can use only $f(1) + 1$ symbols anyway.

Example 3.56. The odd shift $X_{odd}$ from Example 1.4 is not a density shift, because it is not hereditary. For example, $1011 \in \mathcal{L}(X_{odd})$ but $1001 \notin \mathcal{L}(X_{odd})$.

Definition 3.57. The canonical function $f$ of a density shift $X$ is the smallest function such that $X = X_f$, in the sense that if $X = X_{f'}$, then $f(n) \le f'(n)$ for all $n \in \mathbb{N}$.

Theorem 3.58. The canonical function $f$ of a density shift satisfies
(1) $f(\mathbb{N}) \subset \mathbb{N}$;
(2) $f$ is non-decreasing;
(3) $f(m + n) \le f(m) + f(n)$ (subadditive).
Conversely, every function $f$ satisfying (1)–(3) is the canonical function of some density shift.

Example 3.59. If $f(n) = (n+1)/2$, then the word 11 is forbidden, but no other word is (apart from words that contain 11). Thus $X_f$ is the Fibonacci SFT, and its density $\bar{d}(X) = 1/2$, achieved by $x = 101010 \cdots$. If we set $f'(n) = \lfloor (n+1)/2 \rfloor$, then we get the same shift: $X_f = X_{f'}$. In fact, $f'$ is the smallest function with this property. This example also shows that the lower bound of the entropy in Theorem 3.54 is not sharp, because $h_{top}(X_f, \sigma) = \log(\frac12 (1 + \sqrt{5}))$, which is larger than the $\frac12 \log 2$ given by Theorem 3.54.

Proof of Theorem 3.58. For simplicity of exposition, we only consider one-sided shifts. Define the partial order on $X$ as

(3.8) $x \preceq_{sum} y$ if $\sum_{i=1}^{n} x_i \le \sum_{i=1}^{n} y_i$ for all $n \in \mathbb{N}$.

Let $z \in X$ be such that, inductively, for every $n \in \mathbb{N}$, $z_n \in \mathcal{A}$ is the largest symbol such that $z_1 \cdots z_n \in \mathcal{L}_n(X)$. We claim that $x \preceq_{sum} z$ for all $x \in X$.

We prove the claim by induction on the length $n$. Clearly $x_1 \le z_1$. Assume by induction that $x_1 \cdots x_n \preceq_{sum} z_1 \cdots z_n$ and let $\xi_{n+1} \in \mathcal{A}$ be maximal such that

(3.9) $x_1 \cdots x_n \xi_{n+1} \in \mathcal{L}(X).$

Set $p = \sum_{i=1}^{n} z_i - \sum_{i=1}^{n} x_i \ge 0$. If $\xi_{n+1} \le p$, then (3.9) clearly holds. If $\xi_{n+1} > p$, then take $a = \xi_{n+1} - p \le N - 1$. For each $1 \le r \le n$, we have

$$\begin{aligned}
\sum_{i=r}^{n} z_i + a &= \sum_{i=1}^{n} z_i + a - \sum_{i=1}^{r-1} z_i \\
&\le \sum_{i=1}^{n} x_i + p + a - \sum_{i=1}^{r-1} x_i && \text{(by the induction hypothesis)} \\
&= \sum_{i=r}^{n} x_i + \xi_{n+1} && \text{(by the choice of } a\text{)},
\end{aligned}$$

which is an allowed sum in $X$ by the choice of $\xi_{n+1}$. Therefore $z_1 \cdots z_n a \in \mathcal{L}(X_f)$ and because $\sum_{i=1}^{n} z_i + a = \sum_{i=1}^{n} x_i + \xi_{n+1}$, we have

$$x_1 \cdots x_{n+1} \preceq_{sum} x_1 \cdots x_n \xi_{n+1} \preceq_{sum} z_1 \cdots z_n a \preceq_{sum} z_1 \cdots z_{n+1}.$$

This finishes the induction step. It follows that $\sigma^m(z) \preceq_{sum} z$ for all $m \ge 0$; i.e. $z$ is shift-maximal with respect to $\preceq_{sum}$. Define $f(n) = \sum_{i=1}^{n} z_i$. Then clearly $f$ is integer-valued and non-decreasing. Also $f(m + n) = \sum_{i=1}^{n} z_i + \sum_{i=1}^{m} \sigma^n(z)_i \le f(m) + f(n)$. Hence (1)–(3) hold.

Conversely, suppose that $f$ satisfies (1)–(3) and set $X = X_f$. Let $z$ be the maximal sequence with respect to $\preceq_{sum}$ as before. We will prove by induction that

(3.10) $f(r) = \sum_{i=1}^{r} z_i$ for all $r \in \mathbb{N}$.

This is clear for $n = 1$, so assume that (3.10) holds for all $1 \le r \le n$. Set $a = f(n+1) - f(n)$. We must show that $z_1 \cdots z_n a \in \mathcal{L}(X_f)$ and for this it suffices to show that $\sum_{i=r}^{n} z_i + a \le f(n - r + 2)$ for each $1 \le r \le n$. For $r = 1$ this holds by the choice of $a$. Otherwise

$$\begin{aligned}
\sum_{i=r}^{n} z_i + a &= \sum_{i=1}^{n} z_i - \sum_{i=1}^{r-1} z_i + a \\
&= f(n) - f(r-1) + a && \text{(by the induction hypothesis)} \\
&= f(n+1) - f(r-1) && \text{(by the choice of } a\text{)} \\
&\le f(n - r + 2) && \text{(by property (3))}.
\end{aligned}$$

This concludes the induction step and the entire proof. □
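The greedy construction of the maximal sequence $z$ from the proof of Theorem 3.58 is easy to simulate. The sketch below assumes the non-decreasing function $f(n) = (n+1)/2$ of Example 3.59 on the alphabet $\{0, 1\}$; the function names are hypothetical. It builds $z$ symbol by symbol and reads off the canonical function as the partial sums of $z$.

```python
from fractions import Fraction

def allowed(word, f):
    """True iff every block of `word` satisfies sum(block) <= f(len(block)).

    For non-decreasing f such a word can be continued with 0's, so this
    tests membership in the language of the one-sided density shift X_f.
    """
    for start in range(len(word)):
        total = 0
        for n in range(1, len(word) - start + 1):
            total += word[start + n - 1]
            if total > f(n):
                return False
    return True

def maximal_sequence(f, alphabet_size, length):
    """Greedy prefix of z: at each step append the largest allowed symbol."""
    z = []
    for _ in range(length):
        for symbol in range(alphabet_size - 1, -1, -1):
            if allowed(z + [symbol], f):
                z.append(symbol)
                break
    return z

def f(n):
    """The (non-integer-valued) bound of Example 3.59."""
    return Fraction(n + 1, 2)

z = maximal_sequence(f, alphabet_size=2, length=10)
canonical = [sum(z[:n]) for n in range(1, len(z) + 1)]

if __name__ == "__main__":
    print(z)
    print(canonical)
```

For this choice of $f$ the partial sums come out as $\lfloor (n+1)/2 \rfloor$, an integer-valued, non-decreasing and subadditive function, in line with properties (1)–(3).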




By Fekete’s Lemma 1.15, $\lim_n f(n)/n = \inf_n f(n)/n$, so $\inf_n f(n)/n = 0$ if and only if the density shift $(X_f, \sigma)$ has zero topological entropy by Theorem 3.54. Without proof we state ([523, Theorem 2.10]):

Corollary 3.60. If $\sigma^m(z) \preceq_{sum} z$ for all $m \ge 0$, then $z$ is the maximal sequence of the density shift $X_f$ for $f(n) = \sum_{i=1}^{n} z_i$.

Theorem 3.61. Let $X_f$ be a non-trivial density shift with canonical function $f$. The following are equivalent:
(a) $(X_f, \sigma)$ is topologically transitive.
(b) $(X_f, \sigma)$ is topologically mixing.
(c) $f$ is unbounded.

Proof. (a) ⇒ (b): Let $v, w \in \mathcal{L}(X_f)$ be arbitrary non-empty words. By topological transitivity, there is $u \in \mathcal{L}(X_f)$ such that $vuw \in \mathcal{L}(X_f)$ as well. But then $v 0^k w \in \mathcal{L}(X_f)$ for every $k \ge |u|$, proving topological mixing.
(b) ⇒ (a): Trivial.
(a) ⇒ (c): Since $1 \in \mathcal{L}(X_f)$, topological transitivity gives a sequence $x \in X_f$ containing infinitely many 1’s. Thus $f(n) \ge \sum_{i=1}^{n} x_i \to \infty$ as $n \to \infty$.
(c) ⇒ (a): Let $u, v \in \mathcal{L}(X_f)$ be arbitrary non-empty words. Since $f$ is unbounded, there is $n \in \mathbb{N}$ such that $f(n) \ge \sum_{i=1}^{|u|} u_i + \sum_{i=1}^{|v|} v_i$. Then $u 0^n v \in \mathcal{L}(X_f)$. □

In particular, SFTs $(X, \sigma)$ that are also density shifts are transitive, because, unless $X = \{0^\infty\}$, there is a non-trivial word $v$ and $x \in X$ that contains $v$ infinitely often as subword. In fact, density SFTs are completely characterized as those for which the canonical function $f$ satisfies $\inf_n f(n)/n = f(p)/p$ for some $p \in \mathbb{N}$; see [523, Theorem 4.3].

On the other hand, if $f$ is bounded, then all $x \in X_f$ end with $0^\infty$. They can be represented by a finite edge-labeled transition graph [523, Theorem 2.16] and also have a finite collection of follower sets. Hence such density shifts are non-transitive sofic shifts. Sofic density shifts, in general, are characterized [523, Theorem 6.3] as those for which the maximal sequence $z$ is eventually periodic ($z_n = z_{n+p}$ for $n$ sufficiently large), or equivalently $f(n + p) = f(n) + k$ (where $k = \sum_{i=n}^{n+p-1} z_i$ and $k > 0$ if and only if $X_f$ is transitive).

Theorem 3.62. Let $X_f$ be a non-trivial density shift with canonical function $f$. The following are equivalent:
(a) $X_f$ contains a periodic point other than $0^\infty$.


(b) There is $\lambda > 0$ such that $f(n) \ge \lambda n$ for all $n \in \mathbb{N}$.
(c) $(X_f, \sigma)$ is a coded shift.

Proof. (a) ⇒ (b): If $0^\infty \neq x = \sigma^p(x) \in X_f$, then $\sum_{i=1}^{n} x_i \ge n/p$, so (b) holds for $\lambda = 1/p$.
(b) ⇒ (c): Define $s(u) = \big\lceil \frac{1}{\lambda} \sum_{i=1}^{|u|} u_i \big\rceil$, and let

$$\mathcal{C} := \{0^{s(u)}\, u\, 0^{s(u)} : u \in \mathcal{L}(X_f)\}$$

be the collection of code words. The “padding blocks” $0^{s(u)}$ ensure that the “core words” $u$ are sufficiently far apart that the code words can be concatenated freely; see [523, Theorem 3.1] for the details. Hence we have the coded shift $X_{\mathcal{C}} \subset X_f$. On the other hand, $\mathcal{L}(X_f) \subset \mathcal{L}(X_{\mathcal{C}})$ so the reverse inclusion $X_f \subset X_{\mathcal{C}}$ follows.
(c) ⇒ (a): If $u$ is a non-trivial code word, then $u^\infty \in X_f$. □

Since every infinite subshift is expansive (see below Definition 1.39), Theorems 3.61 and 3.62 allow the following characterizations of chaos for density shifts.

Corollary 3.63. Let $(X_f, \sigma)$ be a density shift with canonical function $f$. Then:
(1) $(X_f, \sigma)$ is Devaney chaotic if and only if $\inf_n f(n)/n > 0$.
(2) $(X_f, \sigma)$ is Auslander-Yorke chaotic if and only if $f$ is unbounded.
(3) $(X_f, \sigma)$ is Li-Yorke chaotic if and only if $f$ is unbounded.

3.5. β-Shifts and β-Expansions

Throughout this section, we fix $\beta > 1$. A number $x \in [0, 1]$ can be expressed as the (infinite) sum of powers of $\beta$:

$$x = \sum_{k=1}^{\infty} b_k \beta^{-k} \quad \text{where} \quad \begin{cases} b_k \in \{0, 1, \dots, \lfloor \beta \rfloor\} & \text{if } \beta \notin \mathbb{N}, \\ b_k \in \{0, 1, \dots, \beta - 1\} & \text{if } \beta \in \{2, 3, 4, \dots\}. \end{cases}$$

For the case $\beta \in \{2, 3, 4, \dots\}$, this is the usual $\beta$-ary expansion; it is unique except for the $\beta$-adic rationals $\{\frac{m}{\beta^n} : m \in \mathbb{Z}, n \in \mathbb{N}\}$. For example, if $\beta = 10$, then $0.3 = 0.29999\dots$. If $\beta \notin \mathbb{N}$, then $x$ need not have a unique $\beta$-expansion either. As summarized in Theorem 3.67, some points have uncountably many different expansions, but there is a canonical way to define an expansion, called the greedy expansion:
• Take $b_1 = \lfloor \beta x \rfloor$; that is, we take $b_1$ as large as we possibly can.
• Let $x_1 = \beta x - b_1$ and $b_2 = \lfloor \beta x_1 \rfloor$; again $b_2$ is as large as possible.
• Let $x_2 = \beta x_1 - b_2$ and $b_3 = \lfloor \beta x_2 \rfloor$, etc.
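The greedy algorithm above translates directly into a short loop. The sketch below is an illustration under the assumption that $\beta$ and $x$ are rational, so that Python's `Fraction` type keeps every step exact; for irrational $\beta$ one would need floating-point or algebraic arithmetic instead, and rounding could then flip digits. The function names are hypothetical.

```python
import math
from fractions import Fraction

def greedy_expansion(x, beta, digits):
    """Return the first `digits` greedy digits of x in base beta and the remainder.

    Each step takes b = floor(beta * x) and replaces x by beta * x - b.
    """
    b = []
    for _ in range(digits):
        y = beta * x
        d = math.floor(y)   # largest digit that does not overshoot
        b.append(d)
        x = y - d           # what is left to expand, rescaled by beta
    return b, x

def partial_value(b, beta):
    """Value of the finite sum b_1/beta + b_2/beta^2 + ... ."""
    return sum(Fraction(d) / beta ** (k + 1) for k, d in enumerate(b))

if __name__ == "__main__":
    beta = Fraction(3, 2)
    digits, remainder = greedy_expansion(Fraction(1), beta, 9)
    print(digits, remainder)
    print(greedy_expansion(Fraction(3, 10), Fraction(10), 4)[0])
```

After $n$ steps the error $x - \sum_{k \le n} b_k \beta^{-k}$ equals the remainder times $\beta^{-n}$, which is why the digits determine $x$. For $\beta = 10$ the greedy choice produces $0.3000\dots$ rather than $0.2999\dots$.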


In other words, $x_k = T_\beta^k(x)$ for the map $T_\beta : x \mapsto \beta x \bmod 1$, and $b_{k+1}$ is the integer part of $\beta x_k$.

Definition 3.64. The closure of the greedy β-expansions of all x ∈ [0, 1] is a subshift of $\{0, \dots, \lfloor \beta \rfloor\}^{\mathbb{N}}$; it is called the β-shift and we will denote it as $(X_\beta, \sigma)$.

If $b = (b_k)_{k=1}^{\infty}$ is the β-expansion of some x ∈ [0, 1], then σ(b) is the β-expansion of $T_\beta(x)$. The following lemma from [445] characterizes the β-shift in terms of the lexicographic order $\preceq_{lex}$:

Lemma 3.65. Let $c = c_1 c_2 c_3 \cdots$ be the β-expansion of 1, and suppose it is not finite; i.e. $c_i > 0$ infinitely often⁵. Then $b \in X_\beta$ if and only if $\sigma^n(b) \preceq_{lex} c$ for all n ≥ 0.

However, the greedy expansion $(b_i)_{i \geq 1}$ of x is the largest sequence in lexicographical order among all the expansions of x.

Example 3.66. Let β = 1.8393 . . . be the largest root of the equation $\beta^3 = \beta^2 + \beta + 1$. One can check that c = 111000000 · · · . Therefore $b \in X_\beta$ if and only if one of

\[ \sigma^n(b) = 0 \cdots, \qquad \sigma^n(b) = 10 \cdots, \qquad \sigma^n(b) = 110 \cdots, \qquad \text{or} \qquad \sigma^n(b) = c \]

holds for every n ≥ 0. The subshift $X_\beta$ is itself not of finite type, because there are infinitely many forbidden words $1110^k1$, k ≥ 0, but by some recoding it can be seen to be conjugate to an SFT (see the middle panel of Figure 3.6), and it has a simple edge-labeled transition graph.
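The lexicographic condition of Example 3.66 can be tested on finite words. The sketch below is an illustration under stated assumptions: a finite word is called admissible here if each of its suffixes is lexicographically at most the prefix of c = 1110000 · · · of the same length, and the names `c_digit`, `is_admissible` and `count_admissible` are hypothetical helpers, not notation from the text.

```python
from itertools import product

C_PREFIX = (1, 1, 1)  # c = 111000..., continued with zeros (Example 3.66)

def c_digit(i):
    """Digit c_{i+1} of c (0-based index i)."""
    return C_PREFIX[i] if i < len(C_PREFIX) else 0

def is_admissible(word):
    """True if every suffix of word is <= the prefix of c of equal length."""
    n = len(word)
    for start in range(n):
        suffix = tuple(word[start:])
        prefix = tuple(c_digit(i) for i in range(n - start))
        if suffix > prefix:   # Python compares tuples lexicographically
            return False
    return True

def count_admissible(n):
    """Number of admissible 0-1 words of length n."""
    return sum(is_admissible(w) for w in product((0, 1), repeat=n))

print([count_admissible(n) for n in range(1, 6)])
```

With this convention the rejected words are exactly those containing some $1110^k1$, in line with the list of forbidden words given in the example.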

Figure 3.6. Left: The map $T_\beta$ for $\beta^3 = \beta^2 + \beta + 1$; here $T_\beta^3(1) = 0$. Middle: A corresponding vertex-labeled graph. Right: A corresponding edge-labeled graph.

5 This condition is required for the “if” direction. For example, if $c = 1110^\infty$ as in Example 3.66, then $b = (110)^\infty$ in

(y), so again i(x) ≺pl i(y). This shows that (3.18)

i : ([0, 1], n : en+1 en+2 · · · ek−1 is prefix of ν}.

That is, the function ρ depends on e and ν, but we will suppress this dependence. When we apply this for e = ν itself, we obtain the sequence of cutting times, which were introduced in the late 1970s by Hofbauer [316]. They are given recursively by

\[ S_0 = 1, \qquad S_{k+1} = \rho(S_k), \]

or in other words $S_k = \rho^k(1)$ for e = ν and k ≥ 0.

Example 3.84. There is a unique transitive unimodal map, up to conjugacy and homtervals¹⁸, that has cutting times $S_0, S_1, S_2, S_3, S_4, \dots = 1, 2, 3, 5, 8, \dots$ equal to the Fibonacci numbers. We call this the Fibonacci (unimodal) map, and, as one would expect, it has connections with Fibonacci substitutions and golden mean rotations; see Proposition 5.26 and [125].

Lemma 3.85. Let ν be an admissible kneading sequence. The integer n ≥ 1 is a cutting time if and only if ν1 · · · νn is admissible w.r.t. ν in the sense that (3.17) holds for it. In this case also ν1 · · · νn contains an odd number of ones.

18 An interval J ⊂ [0, 1] is called a homterval if $f^n : J \to f^n(J)$ is a homeomorphism for every n ∈ N.

3.6. Unimodal Subshifts


Proof. We argue by induction. Since ν starts with 10, the statement holds for n = 1, 2. For the induction step, assume the assertion holds for all j < n. Let k be maximal such that Sk < n and assume ρ(Sk ) < ∞, because otherwise, ν is Sk -periodic and there is nothing to prove. We distinguish four cases: • n < ρ(Sk ) and n − Sk is not a cutting time. Then the word  is not admissible by induction, and hence νSk +1 · · · νn = ν1 · · · νn−S k S  σ k (ν1 · · · νn ) fails (3.17). • n < ρ(Sk ) and Sj := n − Sk is a cutting time. Then the word νSk +1 · · · νn = ν1 · · · νS j is admissible by induction. Since |ν1 · · · νSk |1 and |ν1 · · · νSj |1 are odd, |ν1 · · · νn |1 is even and ν1 · · · νn " ν1 · · · νn , so it fails (3.17). • n = ρ(Sk ) and n − Sk is not a cutting time. Then the words  both occur in ν, which ν1 · · · νn−Sk and νSk +1 · · · νn = ν1 · · · νn−S k is against the induction hypothesis. • The remaining case is n = ρ(Sk ) and Sj := n − Sk is a cutting time. Since ρ(Sk ) < ∞, this must be allowed. Furthermore, |ν1 · · · νn |1 = |ν1 · · · νSk |1 +|ν1 · · · νSj |1 ±1 is odd, since |ν1 · · · νSk |1 and |ν1 · · · νSj |1 are odd by the induction hypothesis. This finishes the induction step and the proof.



Hence we can define the kneading map Q : N → N0 ∪ {∞} by (3.20)

Sk = Sk−1 + SQ(k) .
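The interplay between ρ, the cutting times and the kneading map can be checked on the Fibonacci map of Example 3.84. The sketch below rests on two assumptions that are not spelled out in this passage: that ρ(n) = min{k > n : ν_k ≠ ν_{k−n}} when e = ν, and that the block ν_{S_{k−1}+1} · · · ν_{S_k} equals ν_1 · · · ν_{S_{Q(k)}} with its last symbol flipped. The function names are hypothetical.

```python
def kneading_prefix(Q, K):
    """Build nu_1 ... nu_{S_K} from a kneading map Q via S_k = S_{k-1} + S_{Q(k)}."""
    S = [1]     # S_0 = 1
    nu = [1]    # nu_1 = 1
    for k in range(1, K + 1):
        block = nu[:S[Q(k)]]           # copy of nu_1 ... nu_{S_{Q(k)}}
        block[-1] = 1 - block[-1]      # flip the last symbol (assumed rule)
        nu = nu + block
        S.append(len(nu))
    return nu, S

def rho(nu, n):
    """Smallest k > n with nu_k != nu_{k-n} (1-based); None if not in the prefix."""
    for k in range(n + 1, len(nu) + 1):
        if nu[k - 1] != nu[k - n - 1]:
            return k
    return None

def cutting_times(nu, count):
    """Iterate S_{k+1} = rho(S_k) starting from S_0 = 1."""
    S = [1]
    while len(S) < count:
        nxt = rho(nu, S[-1])
        if nxt is None:
            break
        S.append(nxt)
    return S

# Fibonacci map: Q(k) = max(k - 2, 0), so that S_k = S_{k-1} + S_{k-2}.
nu, S = kneading_prefix(lambda k: max(k - 2, 0), 7)
print("".join(map(str, nu[:13])))
print(cutting_times(nu, 7))
```

Here the list `S` produced from Q by (3.20) and the list obtained by iterating `rho` agree, which is the content of the definition of the kneading map.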

If $\rho(S_k) = \infty$, then Q is only defined on {1, . . . , k}. Based on the ρ-function and cutting times, several further admissibility conditions were formulated [118, 316, 534], of which we mention two in the next theorem.

Exercise 3.86. Let ν and $\tilde\nu$ be two admissible kneading sequences with kneading maps Q and $\tilde Q$. Show that $\nu \prec_{pl} \tilde\nu$ if and only if $(Q(j))_{j \geq 1} \succ_{lex} (\tilde Q(j))_{j \geq 1}$, where $\succ_{lex}$ is the lexicographical order.

Exercise 3.87. How is the ρ-function defined if kneading sequences are expressed as the itinerary of $c_1$ w.r.t. the alternative way of Exercise 3.80?

Exercise 3.88. Given an admissible kneading sequence ν, let $\kappa := \min\{n \geq 2 : \nu_n = 1\}$ be the position of the second 1. The numbers $\hat S_j = \rho^j(\kappa)$ are called the co-cutting times.
(a) Show that $\hat S_k - \hat S_{k-1}$ is a cutting time, so there is a co-kneading map
\[ \hat Q : \mathbb{N} \to \mathbb{N} \cup \{\infty\}, \qquad \hat S_k - \hat S_{k-1} = S_{\hat Q(k)}. \]
(b) Assume that c is not periodic. Show that $|f^n(c) - c|$ has a local maximum if n is a cutting time, and a local minimum if n is a co-cutting time.
(c) Show that if n is such that $|f^n(c) - c| < |f^k(c) - c|$ for all 1 ≤ k < n, then n is a cutting or a co-cutting time. That is, closest returns of c happen at cutting or co-cutting times.
(d) If $\hat Q(k)$ is bounded, show that c is not recurrent.
(e) Give an example of a unimodal map with bounded kneading map but c is non-periodic and recurrent.

A more graphical way of seeing the cutting times is by means of the following construction. Write $c_k := f^k(c)$. Inductively, define intervals $D_1 = [0, c_1]$ and

\[ D_{n+1} = \begin{cases} f(D_n) & \text{if } c \notin D_n, \\ [c_{n+1}, c_1] & \text{if } c \in D_n; \text{ i.e. } n = S_k \text{ is a cutting time.} \end{cases} \]

It follows by induction that $D_n = [c_n, c_{\beta(n)}]$ or $[c_{\beta(n)}, c_n]$ where $\beta(n) = n - \max\{S_k : S_k < n\}$. Moreover, $D_n \subset D_{\beta(n)}$ for all n ≥ 2 and these two intervals have $c_{\beta(n)}$ as common boundary point.

Lemma 3.89. A tent map $T_s$ with slope s > 1 is long-branched if and only if c is periodic or its kneading map Q is bounded.

Proof. Note that a tent map $T_s$ is long-branched if and only if $\liminf_n |D_n| > 0$, and since $|D_{n+1}| = s|D_n|$ unless n is a cutting time, this is equivalent to $\liminf_k |D_{1+S_k}| > 0$. If c is periodic, then $\{|D_n| : n \in \mathbb{N}\}$ is a finite collection and hence $T_s$ is long-branched. So let us assume that c is not periodic. If Q(k) ≤ B, then $S_k - S_{k-1} \leq S_B$. It follows that $\liminf_k |c - c_{S_k}| > 0$, because otherwise, the time between cutting times is unbounded. Therefore $\liminf_k |D_{1+S_k}| > 0$ and $T_s$ is long-branched. If, on the other hand, $\limsup_k Q(k) = \infty$, then $\limsup_k S_k - S_{k-1} = \infty$, and hence $\liminf_k |c - c_{S_k}| \leq \liminf_k s^{-(S_k - S_{k-1})} = 0$. This gives $\liminf_k |D_{1+S_k}| = 0$, and $T_s$ is not long-branched. □

The disjoint union $D = \bigsqcup_{n \geq 1} D_n$ supports a map

\[ \hat f(x \in D_n) = f(x) \in \begin{cases} D_{n+1} & \text{if } c \notin [c_n, x], \\ D_{S_{Q(k)}+1} & \text{if } c \in [c_n, x], \text{ so } n = S_k \text{ is a cutting time.} \end{cases} \]

The collection $\{D_n\}_{n \geq 1}$ forms a countable Markov partition for $(D, \hat f)$. It is easy to see that the inclusion map $\pi : x \in D_n \mapsto x \in [0, 1]$ satisfies $\pi \circ \hat f = f \circ \pi$. Hence $(D, \hat f)$ is an extension of ([0, 1], f), and Hofbauer [316] called it the canonical Markov extension of f, but the object became better known as the Hofbauer tower.

Figure 3.12. The Hofbauer tower and extended Hofbauer tower for the Fibonacci map.

Hofbauer saw $(D, \hat f)$ as an infinite Markov chain extending the interval dynamics (I, f) and explicitly added arrows $D_i \to D_j$ if $\hat f(D_i) \supset D_j$. We can edge-label this graph by setting

\[ D_i \xrightarrow{\,0\,} D_j \ \text{ if } \hat f^{-1}(D_j) \cap D_i \subset [0, c], \qquad D_i \xrightarrow{\,1\,} D_j \ \text{ if } \hat f^{-1}(D_j) \cap D_i \subset [c, 1]. \]

The infinite paths on this graph starting in $D_1$ are thus put in one-to-one correspondence with

\[ X = \{ i(x) : x \in [0, f(c)] \} = \{ x \in \{0, 1\}^{\mathbb{N}_0} : \sigma^k(x) \preceq_{pl} \nu \}. \]

Therefore the edge-labeled Hofbauer tower is immediately the countable state automaton accepting the language L(X); see Figure 3.13 for the Fibonacci map. Such automata are discussed at length in [565, Chapter 5 & 6]. If c has an infinite orbit, then the sets $D_n$ are all different. Therefore the corresponding unimodal shift has F(k) = k + 1 distinct follower sets associated to k-words; see (3.14) and Theorem 3.76. If c is preperiodic, then there are only finitely many different levels $D_n$, and L(X) is sofic, and in fact it is an SFT if c is periodic.

Figure 3.13. The edge-labeled Markov graph for the Fibonacci map.

One can extend the Hofbauer tower so as to account for the co-cutting times as well. Set $\hat D_1 = [0, 1]$ and inductively

\[ \hat D_{n+1} = \begin{cases} f(\hat D_n) & \text{if } c \notin \hat D_n, \\ f(E_n) & \text{if } c \in \hat D_n \text{ and } E_n \text{ is the component of } \hat D_n \setminus \{c\} \text{ containing } c_n. \end{cases} \]

See Figure 3.12 for the Hofbauer tower and extended Hofbauer tower of the unimodal Fibonacci map. Then $c_n \in \hat D_n$ for all n ≥ 1 and there is a neighborhood $Z_{n-1} \ni f(c)$ such that $f^{n-1} : Z_{n-1} \to \hat D_n$ is monotone onto. Also, if $c \in \hat D_n$, then n is a cutting or a co-cutting time. More precisely,

\[ \text{if } c \in \hat D_n, \text{ then } n \text{ is a } \begin{cases} \text{cutting time} & \text{if } c \in D_n, \\ \text{co-cutting time} & \text{if } c \in \hat D_n \setminus D_n. \end{cases} \tag{3.21} \]

It is clear from this that the cutting and co-cutting times are disjoint sequences.

Theorem 3.90. A sequence ν = 10 · · · is an admissible kneading sequence if one of the following equivalent conditions is satisfied:
(a) $\sigma(\nu) \preceq_{pl} \sigma^n \nu \preceq_{pl} \nu$ for all $n \in \mathbb{N}_0$.
(b) The kneading map is well-defined by (3.20) above, and (according to Hofbauer [316])
\[ \{Q(k + j)\}_{j \geq 1} \succeq_{lex} \{Q(Q^2(k) + j)\}_{j \geq 1} \tag{3.22} \]
for all k ≥ 1, where $\succeq_{lex}$ stands for the lexicographical order on sequences. Here we set Q(0) = 0 by convention.
(c) If ρ(m) < ∞, then ρ(m) − m is a cutting time.
(d) The sequences of cutting times $\{S_k\}_{k \geq 0}$ and co-cutting times $\{\hat S_\ell\}_{\ell \geq 0}$ (see Exercise 3.88) are disjoint.

Proof. We first show that admissibility implies the four conditions (a)–(d). The necessity of condition (a) is shown in Theorem 3.83. Condition (d) follows directly from (3.21). Define the closest precritical point ζ ∈ [0, 1] as any point such that $f^n(\zeta) = c$ for some n ≥ 1 and $f^k(x) \neq c$ for all k ≤ n and x ∈ (ζ, c). By symmetry, if ζ is a closest precritical point, $\hat\zeta = 1 - \zeta$ is also a closest

precritical point. If $\zeta' \in (\zeta, c)$ is the closest precritical point of lowest $n' > n$, then the itineraries of f(c) and f(x), $x \in (\hat\zeta', \hat\zeta)$, coincide for exactly $n' - 2$ entries and differ at entry $n' - 1$. Hence $n'$ is a cutting time, say $n' = S_{k'}$ for some $k' \geq 1$. We use the notation $\zeta = \zeta_k$ if $n = S_k$. That is,

\[ f^{S_k}(\zeta_k) = f^{S_k}(\hat\zeta_k) = c, \qquad \cdots < \zeta_k < \zeta_{k+1} < \cdots < c < \cdots < \hat\zeta_{k+1} < \hat\zeta_k < \cdots \tag{3.23} \]

and

\[ x \in \Upsilon_k := (\zeta_{k-1}, \zeta_k] \cup [\hat\zeta_k, \hat\zeta_{k-1}) \ \Rightarrow \ i(f(x)) = \nu_1 \cdots \nu_{S_k - 1} \nu'_{S_k}. \tag{3.24} \]

Applying this to $x = f^m(c)$, we obtain that ρ(m) − m is a cutting time, and this proves (c).

Figure 3.14. The points ζQ(k) < cSk−1 < ζQ(k)−1 and their images under f SQ(k) .

In particular, (3.25)

f Sk−1 (c) ∈ ΥQ(k) = (ζQ(k)−1 , ζQ(k) ] ∪ [ζˆQ(k) , ζˆQ(k)−1 ),

see Figure 3.14, and the larger Q(k), the closer f Sk−1 (c) is to c. Formula (3.22) can be interpreted geometrically as cSk ∈ [c, cSQ2 (k) ]; see Figure 3.14. To see this, apply f SQ(k) to the points ζQ(k) , cSk−1 , and ζQ(k)−1 . We find cSk ∈ (c, cSQ2 (k) ), so cSk is closer to c than cSQ2 (k) is. This implies that Q(k + 1) ≥ Q(Q2 (k) + 1). If the inequality is strict, then (3.22) holds. Otherwise, i.e. if Q(k + 1) = Q(Q2 (k) + 1), then both cSk and cSQ2 (k) ∈ (ζQ(k+1) , ζQ(k+1)−1 ) and we apply f SQ(k+1) , which maps (ζQ(k+1) , ζQ(k+1)−1 ) S into [c, f Q2 (k+1) (c)). Therefore cSk+1 ∈ (c, cSQ2 (k)+1 ). This shows that Q(k + 2) ≥ Q(Q2 (k) + 2). If the inequality is strict, then again (3.22) holds; otherwise both c1+Sk+1 and c1+SQ2 (k)+1 ∈ ΥQ(k+2) and we can apply f SQ(k+2) . Repeating the argument shows that (3.22) holds in any case, and (b) is proven. (c) ⇒ (a): Since ρ(m) − m is a cutting time, #{m < j ≤ ρ(m) : νj = 1} is even by Lemma 3.85. Hence σ m (ν) pl ν (cf. Exercise 3.86). Since ν1 = 1, the parity-lexicographical order implies that σ m+1 (ν) #pl σ(ν) for all m.


(a) ⇒ (b): We have ν1 · · · νSk = ν1 · · · νSk−1 ν1 · · · νS Q(k) = ν1 · · · νSk−1 ν1 · · · νSQ(k)−1 ν1 · · · νSQ2 (k) , so ν1 · · · νSQ2 (k) is a suffix of ν1 · · · νSk . Suppose by contradiction that (3.22) fails at entry k, say Q(k)+j = Q(Q2 (k)+j) for 1 ≤ j ≤ j0 and Q(k)+j0 +1 < Q(Q2 (k) + j0 + 1). Then σ

Sk −SQ2 (k)

(ν)

= ν1 · · · νSQ2 (k) ν1 · · · νS Q(k+1) · · · · · · ν1 · · · νS Q(k+j ) ν1 · · · νS Q(k+j +1) · · · 0  

 

   0  even ν1 · · · νSQ2 (k) ν1 · · · νS Q(Q2 (k)+1) odd

=

even · · · ν1 · · · νS Q(Q2 (k)+j

= ν1 · · · νn · · · " ν,

0

even ν1 · · · νS k+j +1) ) 0

···

where n is not a cutting time because SQ2 (k)+j0 < n < SQ2 (k)+j0 +1 . This contradicts (3.17). (b) ⇒ (d): First we claim that if Sk−1 < Sˆ < Sk < Sˆ+1 , then Sk − Sˆ = SQ2 (k) . This is true for Sˆ0 = κ and Sk = κ + 1, because then Sk − S = 1 = SQ2 (k) . Assume now by induction that Sk−1 < Sˆ = Sk − SQ2 (k) < Sk for some k, . Then (3.22) gives Sˆ+1 = Sk + Sj for some k  ≥ k and j < Q(k  + 1). But since νSk +1 · · · νSk +Sj νSk +1 = ν1 · · · νSj · · · νS  , the integers Sk + Q(k +1)

Sj , Sk + Sj+1 , . . . , Sk + Sj  are co-cutting times for all j ≤ j  < Q(k  + 1). The largest such integer is Sˆ := Sk + SQ(k +1)−1 , so

Sk +1 − Sˆ = Sk +1 − (Sk + SQ(k +1)−1 ) = SQ(k +1) − SQ(k +1)−1 = SQ2 (k +1) , and this completes the induction step. But repeating this step also shows that {Sk }k≥0 and {Sˆ }>0 are disjoint, so (d) holds. It remains to prove that (d) implies admissibility. For this we will use the quadratic family fa (x) = ax(1 − x), a ∈ [0, 4], with critical point c = 12 to which we assign the symbol ∗. Let Aν1 ···νn := {a ∈ [0, 4] : the kneading sequence ν(fa ) starts with ν1 · · · νn }. √ Then A0 = √[0, 2), A1 = (2, 4] while f2 (c) = c. Also A11 = [2, 1 + 5), 2 √ (c) = c. We are only interested in kneading A10 = (1 + 5, 4] while f1+ 5 sequences starting with 10, so we continue with A10 . Define ϕn (a) := fan (c). It is easy to check that ϕ2 : A10 → [0, c] = 2 2 √ (c)] = [f 2 (c), c] is monotone onto. We claim that this holds in [f4 (c), f1+ 4 5 general: for all prefixes ν1 · · · νn of some ν satisfying (b), (3.26)

ˆ

Smax max (c), fan− (c)] is monotone onto, ϕn : Aν1 ···νn → [fan−S 1 2


where Smax and Sˆmax are the largest cutting and co-cutting times in ν1 · · · νn (see Exercise 3.88) and a1 , a2 are the boundary points of Aν1 ···νn and the ˆ max (c), f n−Smax (c)] may be the other way order in boundary points in [fan−S a2 1 around. Also Sˆmax = 0 if ν1 · · · νn = 10 · · · 0 by convention. If n + 1 is neither a cutting nor co-cutting time, then νn−Smax +1 = ˆ max (c), f n−Smax (c))  c, Therefore A νn−Sˆmax +1 = νn+1 , so (fan−S ν1 ···νn+1 = a2 1 n Aν1 ···νn and Smax and Sˆmax remain the same. Also fa : f (Aν1 ···νn+1 ) → ϕn+1 (Aν1 ···νn+1 ) is monotone onto, so (3.26) holds for Aν1 ···νn+1 . If n + 1 is a cutting time, then νn−Smax +1 = νn−Sˆmax +1 = νn+1 , so ˆ

Smax +1 max +1 (fan−S (c), fan− (c))  c. 1 2

Now Aν1 ···νn+1 is a proper subset of Aν1 ···νn and Smax = 0 and Sˆmax remains the same. Again fa : f n (Aν1 ···νn+1 ) → ϕn+1 (Aν1 ···νn+1 ) is monotone onto, so (3.26) holds for Aν1 ···νn+1 . If n + 1 is a co-cutting time, then νn−Sˆmax +1 = νn−Smax +1 = νn+1 , so ˆ

Smax +1 max +1 (c), fan− (c))  c. (fan−S 1 2

Again, Aν1 ···νn+1 is a proper subset of Aν1 ···νn and Sˆmax = 0 and Smax remains the same. Also fa : f n (Aν1 ···νn+1 ) → ϕn+1 (Aν1 ···νn+1 ) is monotone onto, so (3.26) holds for Aν1 ···νn+1 . Since Aν1 ···νn+1 ⊂ Aν1 ···νn , n≥2 Aν1 ···νn = ∅. If ν is periodic and νn = ∗, then there is a ∈ ∂A ν1 ···νn with ν(fa ) = ν. Otherwise, Aν1 ···νn+1 ⊂ Aν1 ···νn infinitely often, so n≥2 Aν1 ···νn = ∅ and ν(fa ) = ν for each a ∈  n≥2 Aν1 ···νn . Exercise 3.91. Show that if Sk−1 < n < Sk ≤ ρ(n), then Sk − n is a cutting time. 3.6.4. Kneading Determinants and Topological Entropy. The theory of kneading determinants was developed by Milnor & Thurston [420] (see also the exposition in [414, Section II.8]) in order to address properties of topological entropy and counting periodic orbits for unimodal maps f : I → I. Recall the alternative way of symbolic dynamics for unimodal maps from Exercise 3.80, evaluated at the critical value  +1 if f k is increasing at f (c), (3.27) θk = −1 if f k is decreasing at f (c). The power series (3.28)

Df (t) := 1 + θ1 t + θ2 t2 + θ3 t3 + · · ·


is called the kneading determinant of the unimodal maps f . To deal with the case that f p (c) = c for some (minimal) period p ≥ 1, it is more accurate to say increasing/decreasing on a left neighborhood of f (c) instead of at f (c). Thus θp = 1 in this case, and Df (t) has a periodic sequence of P (t) coefficients. Therefore Df (t) = 1−t p for some polynomial of degree p − 1 with coefficients ±1 if c is p-periodic (see Table 3.1 19 and Corollary 3.96), but this rational function can sometimes be reduced to a simpler form. For example, if f 2 (c) = c, then the itinerary of f (c) is i(f (c)) = 1 ∗ 1 ∗ . . . and 1−t 1 Df (t) = 1 − t + t2 − t3 + t4 − · · · = 1−t 2 = 1+t . Example 3.92. Hofbauer [315] showed that tent maps Ts (x) are intrinsically ergodic. If s > 1, then the measure of maximal entropy is absolutely continuous w.r.t. Lebesgue measure and its density is given explicitly as dμ θ(n) = 1 n+1 s s (x), (3.29) dx sn [Ts ( 2 ), 2 ] n≥1

for θ(n) as in (3.27); see [195, Section 5.3]. In fact, (3.29) extends to skew tent maps ⎧ t ⎨sx if 0 ≤ x ≤ c := s+t , (3.30) Ts,t : [0, 1] → [0, 1], Ts,t (x) = ⎩t(1 − x) if c ≤ x ≤ 1 with slopes s, t > 1, st ≤ s + t, as 1 dμ

1 k (c),Ts,t (c)] ; = k ) (T (c)) [Ts,t dx (T s,t s,t k≥0 see [330]. Further results in this direction can be found in [282, 370]. The main result of this section relates the kneading determinant to the topological entropy of the map. The rest of this section leads up to its proof. Theorem 3.93. The topological entropy htop (f ) > 0 if and only if t0 := inf{t > 0 : Df (t) = 0} ∈ (0, 1) and in this case htop (f ) = − log t0 . By setting 0 < ∗ < 1 we can extend ≺pl to sequences in {0, ∗, 1}N with the property that if em = ∗, then σ m (e) = ν is the kneading sequence ν of f. Milnor & Thurston [420] used formal power series rather than symbolic dynamics to phrase their kneading theory. This is a bit more involved, but for many purposes a very powerful method. Let us interpret the intervals 19 Exercise 7.15 gives a precise recursive formula of the lap-number of the Feigenbaum map, and Exercise 3.98 gives the kneading determinant of the quadratic map with a period 3 critical point.

Table 3.1. Kneading determinants and lap-numbers for the quadratic family $f_a(x) = ax(1 - x)$.

Attractor     Kneading det. D_{f_a}(t)                          Lap-number ℓ(f_a^n|_{[0,1]})
period 1      1/(1−t)                                           2
period 2      1/(1+t)                                           2n
period 4      (1−t)/(1+t^2)                                     n^2 − n + 2
period 8      (1−t)(1−t^2)/(1+t^4)                              (1/12)(2n^3 − 3n^2 + 22n + (0 or 3))
period 2^r    (1−t)(1−t^2)···(1−t^{2^{r−1}})/(1+t^{2^r})        polynomial of degree r + 1
Feigenbaum    ∏_{r≥0} (1 − t^{2^r})                             superpolynomial
period 3      1/(1−t−t^2)                                       2F_n (Fibonacci number)

$E_0 := [f^2(c), c)$ and $E_1 := (c, f(c)]$ as formal unit vectors associated with symbols 1 and 0. For j ≥ 0 define

\[ \varepsilon_j(x) := \begin{cases} +1 & \text{if } f^j(x) \in E_0, \\ 0 & \text{if } f^j(x) = c, \\ -1 & \text{if } f^j(x) \in E_1, \end{cases} \]

expressing the fact that $f|_{E_0}$ preserves and $f|_{E_1}$ reverses orientation. Then the product

\[ \Theta_n(x) := \varepsilon_0(x) \cdots \varepsilon_{n-1}(x) \cdot \begin{cases} E_k & \text{if } f^n(x) \in E_k \text{ for } k = 0, 1, \\ \tfrac12 (E_0 + E_1) & \text{if } f^n(x) = c \end{cases} \]

is a formal vector expressing where $f^n(x)$ is situated and whether $f^n$ is locally increasing, decreasing or assumes an extremum at x. The vector-valued formal power series

\[ \Theta(x, t) = \sum_{n \geq 0} \Theta_n(x) t^n \]

is called the invariant coordinate of x.

Lemma 3.94. The sum of the coefficients of $E_j$, j = 0, 1, satisfies

\[ \sum_{j=0}^{1} (1 - \varepsilon(E_j) t) \cdot \delta_j(\Theta(x, t)) = 1 \tag{3.31} \]

for every point x. Here the Kronecker delta ($\delta_i(E_j) = 1$ if i = j and $\delta_i(E_j) = 0$ otherwise) is extended by linearity to vectors with Q[t]-valued coefficients.

Example 3.95. Before proving this lemma, let us see how this works out for the fixed points of f. The orientation-reversing fixed point $\alpha \in E_1$


has $\Theta(\alpha) = (1 - t + t^2 - t^3 + \cdots)E_1$, so formally $(1 - \varepsilon(E_1)t)\delta_1(\Theta(\alpha)) = (1 + t)(1 - t + t^2 - t^3 + \cdots) = 1$, whereas $\delta_0(\Theta(\alpha)) = 0$, because $E_0$ doesn't appear in $\Theta(\alpha)$. For the orientation-preserving fixed point $\beta \in E_0$ it works similarly with indices and signs reversed: $\Theta(\beta) = (1 + t + t^2 + t^3 + \cdots)E_0$, so $(1 - \varepsilon(E_0)t)\delta_0(\Theta(\beta)) = (1 - t)(1 + t + t^2 + t^3 + \cdots) = 1$. If f also has a period two orbit $\{\gamma_0, \gamma_1\}$ with $\gamma_i \in E_i$, then applying the definition of $\Theta_n$ term by term we can compute

\[ \Theta(\gamma_0) = \frac{E_0 + tE_1}{1 + t^2}, \qquad \Theta(\gamma_1) = \frac{-tE_0 + E_1}{1 + t^2}. \]

Therefore

\[ \frac{1 - t}{1 + t^2} + \frac{(1 + t)t}{1 + t^2} = 1 = -\frac{(1 - t)t}{1 + t^2} + \frac{1 + t}{1 + t^2}, \]

as Lemma 3.94 claimed.

Proof. We write $\sum_{j=0}^{1} (1 - \varepsilon(E_j)t)\delta_j(\Theta(x, t))$ as a double sum and assume for simplicity that $\mathrm{orb}_f(x) \not\ni c$:

\begin{align*}
\sum_{j=0}^{1} \sum_{n \geq 0} (1 - \varepsilon(E_j)t)\, \delta_j(\Theta_n(x))\, t^n
&= \sum_{j=0}^{1} (1 - \varepsilon(E_j)t) \sum_{\substack{n \geq 0 \\ f^n(x) \in E_j}} \prod_{k=0}^{n-1} \varepsilon_k(x)\, t^n \\
&= \sum_{\substack{n \geq 0 \\ f^n(x) \in E_0}} \prod_{k=0}^{n-1} \varepsilon_k(x)\, t^n - \sum_{\substack{n \geq 0 \\ f^n(x) \in E_0}} \prod_{k=0}^{n} \varepsilon_k(x)\, t^{n+1} \\
&\quad + \sum_{\substack{n \geq 0 \\ f^n(x) \in E_1}} \prod_{k=0}^{n-1} \varepsilon_k(x)\, t^n - \sum_{\substack{n \geq 0 \\ f^n(x) \in E_1}} \prod_{k=0}^{n} \varepsilon_k(x)\, t^{n+1}.
\end{align*}

Formally, all positive powers $t^n$ cancel, leaving only $1_{x \in E_0} t^0 + 1_{x \in E_1} t^0 = 1$. If $f^n(x) = c$ for some n, then the definition of $\Theta_n(x)$ allows a similar proof. □

The qualitative behavior of the entire interval map is given by the invariant coordinate of the critical value. In this terminology, the kneading increment

\[ \nu(t) := \lim_{x \nearrow c} \Theta(x, t) - \lim_{x \searrow c} \Theta(x, t) \]

is the object closest to our kneading sequence.²⁰ This formula obviously expresses the change of kneading coordinate Θ(x) as the point x moves across c, but it can also be used to express a change of kneading coordinate Θ(x) as the point x moves across z for any precritical point z, say $f^n(z) = c$ for some minimal n ≥ 0:

\[ \lim_{x \nearrow z} \Theta(x, t) - \lim_{x \searrow z} \Theta(x, t) = t^n \nu(t). \]

20 We changed the sign from the definition on page 483 of [420] because in our setting f assumes a maximum rather than a minimum at the critical point. The same construction, with d formal unit vectors $E_k$, k = 0, 1, . . . , d − 1, can be carried out for a (d − 1)-modal interval map (i.e. with d − 1 critical points) and also (although not covered in [420]) for piecewise continuous maps.

Milnor & Thurston [420] continue to define kneading matrices and kneading determinants $D_f(t)$, which in the unimodal case is equal to

\[ D_f(t) = \tfrac12 \big( \delta_0(\nu(t)) - \delta_1(\nu(t)) \big) = 1 + \varepsilon_1(c_1)t + \varepsilon_1(c_1)\varepsilon_2(c_1)t^2 + \cdots \tag{3.32} \]

which is the same as (3.28).

Corollary 3.96. If the critical point c is periodic of period p, then the kneading determinant D(t) is a polynomial of degree p − 1. If 0 is attracted to a periodic orbit orb(x), then the kneading determinant is rational:

\[ D_f(t) = \frac{P(t)}{1 \pm t^p}, \]

where p is the period of x, P is a polynomial of degree < p and the sign ± is according to whether $f^p$ reverses or preserves orientation at x.

Proof. If $f^p(c) = c$, then $\varepsilon_p = 0$, so $D_f(t)$ is truncated at the (p − 1)-st term. If c is attracted to a periodic attractor, rather than periodic itself, then 0 is in the immediate basin of some periodic point x. Writing $\varepsilon_0 = 1$, we obtain $\varepsilon_0 \cdot \varepsilon_1 \cdots \varepsilon_{p-1} = \varepsilon_{kp} \cdot \varepsilon_{kp+1} \cdots \varepsilon_{kp+p-1} = \pm 1$ for all k, where the sign only depends on whether $f^p$ preserves or reverses orientation. Hence $D_f(t) = P(t) \sum_{k \geq 0} (\pm t^p)^k = \frac{P(t)}{1 \mp t^p}$ for the polynomial $P(t) = 1 + \varepsilon_1(c_1)t + \varepsilon_1(c_1)\varepsilon_2(c_1)t^2 + \cdots + \varepsilon_1(c_1)\varepsilon_2(c_1) \cdots \varepsilon_{p-1}(c_1)t^{p-1}$. □

For an open interval J ⊂ I and n ≥ 0, let $\gamma_n(J)$ be the number of precritical points z such that n is the minimal integer such that $f^n(z) = c$, forming the formal power series $\gamma(J) = \sum_{n \geq 0} \gamma_n(J) t^n$. Define also the lap-number $\ell(f^n|_J)$ as the number of maximal subintervals of J on which $f^n$ is monotone²¹.

21 But not necessarily strictly monotone: even if $f^n|_J$ has flat pieces, this does not count towards the lap-number.

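Then, as formal power series,

\begin{align*}
(1 - t) \sum_{n=1}^{\infty} \ell(f^n|_J)\, t^{n-1}
&= 1 - \sum_{n=0}^{\infty} \ell(f^n|_J)\, t^n + \sum_{n=0}^{\infty} \ell(f^{n+1}|_J)\, t^n \\
&= 1 + \sum_{n=0}^{\infty} \big( \ell(f^{n+1}|_J) - \ell(f^n|_J) \big) t^n \\
&= 1 + \sum_{n=0}^{\infty} \gamma_n(J)\, t^n,
\end{align*}

so that

\[ \sum_{n=1}^{\infty} \ell(f^n|_J)\, t^{n-1} = \frac{1}{1 - t} \Big( 1 + \sum_{n=0}^{\infty} \gamma_n(J)\, t^n \Big). \tag{3.33} \]

Lemma 3.97. For the interval J = (a, b), the difference

\[ \lim_{x \nearrow b} \Theta(x, t) - \lim_{x \searrow a} \Theta(x, t) = \gamma(J)\nu(t). \]

Also $\gamma([0, 1]) = (1 - t)^{-1} D_f(t)^{-1}$ and

\[ \sum_{n=1}^{\infty} \ell(f^n)\, t^{n-1} = \frac{1}{(1 - t)^2} \big( 1 - t + D_f(t)^{-1} \big). \tag{3.34} \]

Proof. The difference $\lim_{x \nearrow b} \Theta(x, t) - \lim_{x \searrow a} \Theta(x, t)$ is the sum of the increments of all precritical points z. Each precritical point of order n gives a contribution of $t^n \nu(t)$, and γ(J) counts how many order n precritical points there are, giving them weight $t^n$. So the first formula follows. Since

\[ \Theta(\beta) = E_0 (1 + t + t^2 + \dots) = \frac{1}{1 - t} E_0 \quad \text{and} \quad \Theta(-\beta) = E_1 - E_0 (t + t^2 + t^3 + \dots) = E_1 - \frac{t}{1 - t} E_0, \]

we can use formula (3.32) to simplify for J = [0, 1]:

\begin{align*}
\gamma([0, 1]) D(t) &= \tfrac12 \big( \delta_0(\gamma(t)\nu(t)) - \delta_1(\gamma(t)\nu(t)) \big) \\
&= \tfrac12 \Big( \delta_0\big( \lim_{x \nearrow 1} \Theta(x, t) - \lim_{x \searrow 0} \Theta(x, t) \big) - \delta_1\big( \lim_{x \nearrow 1} \Theta(x, t) - \lim_{x \searrow 0} \Theta(x, t) \big) \Big) \\
&= \tfrac12 \Big( \frac{t}{1 - t} + \frac{1}{1 - t} + 1 \Big) = \frac{1}{1 - t}.
\end{align*}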

This gives $\gamma([0, 1]) = (1 - t)^{-1} D_f(t)^{-1}$. Combined with (3.33), we also get $\sum_{n=1}^{\infty} \ell(f^n)\, t^{n-1} = \frac{1}{(1 - t)^2} \big( 1 - t + D_f(t)^{-1} \big)$. □

Exercise 3.98. Let the parameter a = 3.83187405528332 . . . of the quadratic family $f_a(x) = ax(1 - x)$ be such that $f^3(c) = c$ for the critical point $c = \tfrac12$.
(1) Show that the kneading determinant is
\[ D_{f_a}(t) = \frac{1 - t - t^2}{1 - t^3}. \]
(2) Show that
\[ 1 + \gamma(t) = \frac{2}{1 - t - t^2} = \sum_{n=0}^{\infty} 2F_n t^n, \]
for the Fibonacci numbers $F_0 = F_1 = 1$, $F_2 = 2$, . . . .
(3) Hence show that $\ell(f_a^n|_{[0,1]}) = 2F_n$ and that $h_{top}(f_a) = \log \tfrac12 (1 + \sqrt{5})$.

The main result of this section follows directly from (3.34).

Proof of Theorem 3.93. The power series $\sum_{n=1}^{\infty} \ell(f^n)\, t^{n-1}$ converges for all t less than its radius of convergence R. But by [422], the lap-number $\ell(f^n) \sim e^{h_{top}(f) n}$, so $-\log R = h_{top}(f)$. By (3.34), R is also the first zero of the kneading determinant, as required. □

Remark 3.99. Theorem 3.93 amounts to a result from [195, Section 3] stating that the slope s of the tent map $T_s(x)$ satisfies $s = \sum_{j=0}^{\infty} \Theta_j(c_1) s^{-j}$, which can be rewritten to

\[ D_{T_s}\Big( \frac{1}{s} \Big) = \sum_{k=0}^{\infty} \frac{1}{DT_s^k(T_s(c))} = 0. \]

As discovered in [330] as a consequence of their so-called f-expansion formula, this fact extends to skew tent maps $T_{s,t}$ as $\sum_{k=0}^{\infty} \frac{1}{DT_{s,t}^k(T_{s,t}(c))} = 0$. One of the consequences is that the first zero of $D_f(t)$ on [0, 1) is $t_0 = \exp(-h_{top}(T_{s,t}))$ provided $h_{top}(T_{s,t}) > 0$.

The following theorem is one of the main results in [420], namely Theorem 9.2 and Corollary 10.7. We don't prove or need it for our purpose, but we mention it for completeness' sake.

Theorem 3.100. The reduced dynamical ζ-function²²

\[ \zeta(t) := \exp\Big( \sum_{k=1}^{\infty} \frac{1}{k}\, \#\{x : f^k(x) = x\}\, t^k \Big) \]

of f : R → R satisfies

\[ \zeta(t)^{-1} = \begin{cases} (1 - t) D(t) & \text{if } c \text{ is non-periodic,} \\ (1 - t)(1 - t^p) D(t) & \text{if } c \text{ is periodic of period } p. \end{cases} \tag{3.35} \]
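Theorem 3.93 can be illustrated numerically on the period 3 example of Exercise 3.98. The sketch below makes two assumptions: by the chain rule, θ_k in (3.27) is taken to be the product of the signs of the branches visited by f(c), . . . , f^k(c) (sign −1 on the decreasing branch, symbol 1); and for the periodic critical point the symbol ∗ is replaced by 1, so that θ_3 = 1 as required by the left-neighborhood convention. The function names are hypothetical, and the power series is truncated after 60 terms.

```python
import math

def kneading_coefficients(symbols, n_terms):
    """Coefficients 1, theta_1, theta_2, ... from a periodic block of 0/1 symbols."""
    theta = [1]
    for k in range(1, n_terms):
        s = symbols[(k - 1) % len(symbols)]
        theta.append(theta[-1] * (-1 if s == 1 else 1))  # symbol 1 reverses orientation
    return theta

def evaluate(theta, t):
    """Evaluate the truncated series sum_k theta_k t^k by Horner's scheme."""
    acc = 0.0
    for coeff in reversed(theta):
        acc = acc * t + coeff
    return acc

def first_zero(theta, grid=1000, tol=1e-12):
    """Smallest zero in (0, 1): scan a grid for a sign change, then bisect."""
    lo = 0.0
    for i in range(1, grid):
        hi = i / grid
        if evaluate(theta, lo) * evaluate(theta, hi) <= 0:
            while hi - lo > tol:
                mid = (lo + hi) / 2
                if evaluate(theta, lo) * evaluate(theta, mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            return (lo + hi) / 2
        lo = hi
    return None

theta = kneading_coefficients((1, 0, 1), 60)   # itinerary 10* with * read as 1
t0 = first_zero(theta)
print(t0, -math.log(t0))
```

With 60 terms the truncated series equals $(1 - t - t^2)(1 + t^3 + \cdots + t^{57})$, so its first zero is the root of $1 - t - t^2$, and $-\log t_0$ should agree with the value $\log \tfrac12(1 + \sqrt{5})$ from Exercise 3.98(3).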

3.6.5. Complex Kneading Theory. The standard and most direct extension of unimodal dynamics to the complex plane is via the quadratic family²³ $f_c(z) = z^2 + c$. This family is conjugate to $f_a(w) = aw(1 - w)$ via

\[ w = \psi(z) = \tfrac12 - \tfrac{z}{a} \quad \text{and} \quad c = \tfrac{a}{2} - \tfrac{a^2}{4}, \qquad a = 1 + \sqrt{1 - 4c}. \]

The family $f_c$ has its own kneading theory with features interesting enough to devote a separate section to. Instead of symbolic dynamics on a core interval in the real setting, we now address the symbolic dynamics on a tree H (called the Hubbard tree) or dendrite²⁴.

Definition 3.101. A quadratic Hubbard tree is a tree H equipped with a continuous tree map $f_H : H \to H$ which is at most two-to-one and with a single critical point $c_0$, where $f_H$ is not a local homeomorphism onto its image, and all end-points lie on the critical orbit; see [124, 354, 452]. We extend this definition to H being a dendrite; the properties of $f_H : H \to H$ remain the same, except that all post-critical points are end-points but there can be end-points that are in the closure of $\mathrm{orb}_{f_H}(c_0)$, but not in $\mathrm{orb}_{f_H}(c_0)$ itself.

The Hubbard tree models the closed connected hull of the critical orbit within the Julia set $J_c$ provided $J_c$ is a dendrite. This applies to most of the parameters c in the Mandelbrot set M that do not lie in the closure of any of its hyperbolic components. But also if $J_c$ is not a dendrite (because it is not locally connected, or there are bounded Fatou components), there is always a topological model for the Hubbard tree that satisfies Definition 3.101. The questions to answer are:
(1) Which sequences in $\{0, 1\}^{\mathbb{N}_0}$ are itineraries of points in the Hubbard tree?
(2) Which sequences in $\{0, 1\}^{\mathbb{N}_0}$ are kneading sequences, i.e. the itinerary of the critical value in the Julia set?
(3) Determine the combinatorial structure of the Hubbard tree (branch-points, their number of arms, and relative position) from the kneading sequence.

Answers to questions (2) and (3) can be found in [126], but we will give combinatorial proofs in terms of the 0-1-sequences, showing the following:
• $c_1 = f_H(c_0)$ is an end-point of the Hubbard tree.
• Every branch-point is precritical or preperiodic; i.e. there are “no wandering triangles” (see [535] and Proposition 3.106).

We first need to recall some basics from complex dynamics; see [419]. Let f : C → C be a polynomial of degree ≥ 2. It is often handy to extend f to the Riemann sphere $\hat{\mathbb{C}} = \mathbb{C} \cup \{\infty\}$, i.e. the one-point compactification of the complex plane. Then ∞ is always a superattracting fixed point, with no other preimage than itself. For $f_c(z) = z^2 + c$ one can see this easily by using a change of coordinates w = 1/z, sending ∞ to 0 and $f_c(w) = \frac{w^2}{1 + cw^2}$. The Riemann sphere falls apart into two fully invariant sets, called the Fatou set F and the Julia set J.

The Fatou set is the set of regular dynamics: every z ∈ F has a neighborhood $U \ni z$ such that
(1) $f^n(U)$ converges to an attracting or parabolic periodic orbit orb(p), i.e. $f^r(p) = p$ and $(f^r)'(p)$ lies in the open unit disk D or is a root of unity, or
(2) f maps U into an open topological disk D (called a Siegel disk) on which the dynamics is conjugate to an irrational rotation on the unit disk D.
By Sullivan's Theorem, see [527] and [419, Section 16], there are no other possibilities for rational maps²⁵, and entire maps can have Baker domains (wandering regions). The Fatou set is open, and for polynomials, F ≠ ∅ because the superattracting fixed point ∞ ∈ F and its basin of attraction $A_\infty \subset F$.

22 Here we count at most one k-periodic point in each lap of $f^k$; if there are two such orbits with the period of one twice the period of the other (as is the case shortly after a period doubling bifurcation), then only the orbit of the smaller period is counted. Note, however, that the period need not be minimal: a k-periodic point also counts for 2k, 3k, . . . .
23 We use a different font to distinguish from $f_a(w) = aw(1 - w)$. Also we will write $c_0 = 0$ for the critical point of $f_c$ (to distinguish it from the parameter), so $c_1 = f_c(c_0) = c$.
24 A dendrite is a compact, connected, locally connected (and therefore arc-connected) set without loops.
25 Although rational maps can also have Herman rings (as Siegel disks, but now on an annulus instead of a disk).

The Julia set J, the complement of the Fatou set, is the set of chaotic motion. It is closed, completely invariant ($f^{-1}(J) = J$), non-empty (since it

contains all repelling periodic points; in fact $J = \overline{\{\text{repelling periodic points}\}}$; see [419, Theorem 14.1]) and for polynomials, $J = \partial A_\infty$.

The Mandelbrot set is the locus in parameter space of the family $f_c$ where $J_c$ is connected, and equivalently, the critical orbit $\{f_c^n(0)\}_{n \geq 0}$ is bounded; see [144, Theorem VIII.1.1]. This means that for c ∈ M, the basin $A_\infty$ is a topological disk in $\hat{\mathbb{C}}$. Using the Riemann Mapping Theorem we can find a conformal homeomorphism $\psi_c : D \to A_\infty$ which in fact can be chosen such that $\psi_c(z^2) = f_c(\psi_c(z))$. These are the Böttcher coordinates; see [419, Section 9]. The images $R_c(\vartheta) = \psi_c(\{re^{2\pi i \vartheta} : 0 < r < 1\})$ are called the dynamic external rays (see [419, Section 18]); they form an $f_c$-invariant foliation of $A_\infty$, namely $R_c(2\vartheta) = f_c(R_c(\vartheta))$. If $J_c$ is locally connected, then each external ray $R_c(\vartheta)$ lands²⁶; i.e. $\psi_c(re^{2\pi i \vartheta}) \to z_\vartheta \in J_c$ as r → 1, and this $z_\vartheta$ is called the landing point of $R_c(\vartheta)$. In fact, also if $J_c$ is not locally connected, $R_c(\vartheta)$ lands for Lebesgue-a.e. ϑ including every rational $\vartheta \in S^1$.

A point z on the Julia set is called biaccessible if it is the landing point of at least two external rays $R_c(\vartheta)$ and $R_c(\vartheta')$, and these external angles are then also called biaccessible. Biaccessibility is an equivalence relation $\sim_c$ on the circle $S^1$ of external angles. Its equivalence classes are closed, forward invariant under the doubling map g(ϑ) = 2ϑ mod 1, and if $\vartheta_i \to \vartheta$, $\vartheta'_i \to \vartheta'$, and $\vartheta_i \sim_c \vartheta'_i$ for all i ∈ N, then also $\vartheta \sim_c \vartheta'$. The quotient space $S^1/\sim_c$ is well-defined and Hausdorff, with a well-defined doubling map on it. The collection of geodesic circles connecting equivalent angles in the disk is called a Thurston lamination and the quotient space is the pinched disk model of the Julia set. If $J_c$ is a dendrite, then $(J_c, f_c)$ is in fact conjugate to its pinched disk model $(S^1/\sim_c, g)$.

Remark 3.102.
The Thurston laminations present an interesting example of how simple partitions of $S^1$ can lead to interesting non-injective itinerary maps. Namely, set $J_0^b = (b, b + \frac12)$, $J_1^b = S^1 \setminus J_0^b$ for $b \in [0, \frac12)$ and let $S(x) = 1 - x$ be an involution. Then $i(x) = i(S(x))$ whenever $\mathrm{orb}(x)$ avoids the symmetric difference $J_0^b \,\triangle\, S(J_0^b) = (b, \frac12 - b) \cup (b + \frac12, 1 - b)$. If $b > \frac15 + \frac17$ then there is a Cantor set of points for which $i(x) = i(S(x))$. Also if $x = S(y)$, this still doesn't guarantee that $i(x) = i(y)$ for the itinerary map $i$ w.r.t. $\{J_0^b, J_1^b\}$. The reason for this is that the quotient space $S^1/\sim$ for $x \sim y$ if $i(x) = i(y)$ is a topological "pinched disk" model for the Julia set $J_c$ of $f_c : z \mapsto z^2 + c$ for some specific $c$, namely the landing point of the external parameter ray with angle $2b$; see [124, 354, 452] and also Section 3.6.5, in particular Figure 3.15 for an illustration of this pinched disk model. Injectivity of $i$ is equivalent to $S^1/\sim$ being a topological circle, which means that $J_c$ is the boundary of a Siegel disk. This happens if $c$ lies on the main cardioid of the Mandelbrot set.

²⁶ Due to the theorem of Carathéodory; see [419, Section 17].


We call $z$ an end-point if it is not biaccessible, i.e. its equivalence class under $\sim_c$ contains only $z$, and $z$ is a branch-point with $q \ge 3$ arms if its equivalence class under $\sim_c$ consists of $q$ points²⁷.

[Figure 3.15: disk model with the external angles 1/12, 1/6, 1/3, 7/12 and 2/3 and the points $c_1$, $c_2 = c_4$, $c_3$ marked, next to the Julia set of $f_i : z \mapsto z^2 + i$.]

Figure 3.15. The Hubbard tree inside the disk model and the Julia set of the external angle $\vartheta_c = 1/6$ and kneading sequence $\nu = 1\overline{10}$.

By the symmetry $z \mapsto -z$, the critical point $0$ is always biaccessible (at least in its pinched disk model), so it has (at least) two external angles $\vartheta_*$ and $\vartheta_*' = \vartheta_* + \frac12$. These divide the circle into two open semi-circles $S_0$ and $S_1$, where by convention²⁸ $\vartheta_c := g(\vartheta_*) = g(\vartheta_*') \in S_1$. Now we can define the itinerary map $i : S^1 \to \{0, *, 1\}^{\mathbb{N}_0}$:
\[
i(\vartheta)_k = \begin{cases} 0 & \text{if } g^k(\vartheta) \in S_0, \\ * & \text{if } g^k(\vartheta) \in \{\vartheta_*, \vartheta_*'\}, \\ 1 & \text{if } g^k(\vartheta) \in S_1. \end{cases}
\]
The kneading sequence $\nu = \nu_c = \nu_1\nu_2\nu_3\cdots$ is the itinerary of $\vartheta_c$ w.r.t. this partition; see Figure 3.15. Each itinerary $i(\vartheta)$ is a well-defined element of $\{0,1\}^{\mathbb{N}_0}$, except for the countably many precritical angles. On the other hand, for every $e \in \{0,1\}^{\mathbb{N}_0}$ except those for which $\nu$ is a suffix of $e$, there is $\vartheta \in S^1$ such that $i(\vartheta) = e$ (and recall that itineraries of angles in the same equivalence class of $\sim_c$ are the same). However, not every 0-1-sequence is achieved as the itinerary of the post-critical angle.

Theorem 3.103. The map $c \mapsto h_{top}(f_c)$ is non-increasing on $\mathbb{R}$.

²⁷ This also means that $J_c \setminus z$ has $q$ connected components.
²⁸ We exclude the case $\vartheta_c = \vartheta_* = 0$ achieved for $c = 1/4$.
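The itinerary map and the kneading sequence defined just before Theorem 3.103 can be computed exactly for rational angles. The following is a minimal sketch, not taken from the text: the function names are illustrative, and it assumes $0 < \vartheta_c < 1$, so that $\vartheta_* = \vartheta_c/2$ and the semi-circle containing $\vartheta_c$ is the open arc $(\vartheta_*, \vartheta_* + \frac12)$.

```python
from fractions import Fraction


def itinerary(theta, theta_c, length):
    """Itinerary of the angle theta under doubling, w.r.t. the partition
    defined by the two preimages of theta_c (illustrative helper)."""
    theta_star = theta_c / 2                 # one preimage of theta_c
    theta_star2 = theta_star + Fraction(1, 2)  # the other preimage
    symbols = []
    for _ in range(length):
        if theta in (theta_star, theta_star2):
            symbols.append("*")              # precritical angle
        elif theta_star < theta < theta_star2:
            symbols.append("1")              # semi-circle S_1, contains theta_c
        else:
            symbols.append("0")              # semi-circle S_0
        theta = (2 * theta) % 1              # doubling map g
    return "".join(symbols)


def kneading_sequence(theta_c, length):
    """First `length` symbols of the itinerary of theta_c itself."""
    return itinerary(theta_c, theta_c, length)


print(kneading_sequence(Fraction(1, 6), 7))
```

For $\vartheta_c = 1/6$ the orbit is $1/6 \to 1/3 \to 2/3 \to 1/3 \to \cdots$, so the output starts with 11 and then alternates 0 and 1, in line with the caption of Figure 3.15.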


This entropy function cannot be strictly monotone, because it is constant inside every hyperbolic component. Since the family $f_a(w) = aw(1 - w)$ is conjugate to $f_c(z) = z^2 + c$ via conjugacies that relate the parameters in an orientation-reversing way, $a \mapsto h_{top}(f_a)$ is non-decreasing. Although Theorem 3.103 is a result in real dynamics, all known proofs rely on complex dynamics. The elegant proof we present here was given by Douady [205].

Proof. Recall again the definition of $\theta(x)$ of Exercise 3.80 as an alternative way to code itineraries of unimodal maps $f_c$. Applied to the critical value $c_1$, this reads as
\[
\theta_n = \begin{cases} +1 & \text{if } f_c^n \text{ is increasing at } c_1, \\ -1 & \text{if } f_c^n \text{ is decreasing at } c_1. \end{cases}
\]
Set
\[
\gamma = \frac12 \sum_{n=1}^{\infty} \frac{1 - \theta_n}{2}\, 2^{-n} \in \left[0, \tfrac12\right].
\]

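Next, for the doubling map $g : S^1 \to S^1$, $\vartheta \mapsto 2\vartheta \bmod 1$, define the set
\[
K_\gamma = \{\vartheta \in S^1 : g^j(\vartheta) \notin (\gamma, 1 - \gamma) \text{ for all } j \ge 0\};
\]
see Figure 3.16.

[Figure 3.16: Julia set with the external angles 1/7, 2/7, 4/7, $\gamma = 25/56$, $1 - \gamma = 31/56$, 11/14, 25/28 and the points $c_1$, $\alpha$, $\beta$ marked, next to the Mandelbrot set with the parameter rays of angles 1/3, 25/56, 1/2, 31/56 and 2/3.]

Figure 3.16. A schematic Julia set for external angle $\gamma = 25/56$ (with $\nu = 100\overline{101}$) and the Mandelbrot set with some external rays.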
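The number $\gamma$ and the set $K_\gamma$ can be explored with exact rational arithmetic. The sketch below is illustrative only: the function names are hypothetical, the sign sequence $(\theta_n)$ is supplied as a preperiodic list of $\pm 1$, and the particular signs used in the example were chosen so that the sum reproduces the value $\gamma = 25/56$ quoted in the caption of Figure 3.16 (that they are the signs of the corresponding map is an assumption).

```python
from fractions import Fraction


def gamma_from_theta(preperiod, period):
    """gamma = 1/2 * sum_{n>=1} (1 - theta_n)/2 * 2^{-n} for an eventually
    periodic sign sequence theta = preperiod + period + period + ..."""
    bits_pre = [(1 - t) // 2 for t in preperiod]   # +1 -> 0, -1 -> 1
    bits_per = [(1 - t) // 2 for t in period]
    total = sum((Fraction(b, 2 ** n) for n, b in enumerate(bits_pre, 1)),
                Fraction(0))
    block = sum((Fraction(b, 2 ** j) for j, b in enumerate(bits_per, 1)),
                Fraction(0))
    # geometric series for the periodic tail
    total += block / 2 ** len(bits_pre) / (1 - Fraction(1, 2 ** len(bits_per)))
    return total / 2


def in_K(theta, gamma):
    """Exact membership test for rational theta: no doubling iterate may
    enter the open arc (gamma, 1 - gamma)."""
    seen = set()
    while theta not in seen:       # rational orbits are eventually periodic
        if gamma < theta < 1 - gamma:
            return False
        seen.add(theta)
        theta = (2 * theta) % 1
    return True


gamma = gamma_from_theta([-1, -1, -1], [1, 1, -1])
print(gamma, in_K(Fraction(1, 7), gamma), in_K(Fraction(1, 2), gamma))
```

The orbit $1/7 \to 2/7 \to 4/7$ never enters $(25/56, 31/56)$, whereas $1/2$ lies inside that arc, which is what the two membership tests report.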

The set $K_\gamma$ is $g$-invariant, and usually a Cantor set, except for $\gamma = \frac12$, when $K_\gamma = S^1$. Then $(K_\gamma, T)$ is semi-conjugate to $([c_1, c_2], f)$, in an entropy-preserving way. Let $\beta = \beta(c) = \frac12(1 + \sqrt{1 - 4c})$ be the orientation-preserving fixed point of $f_c$. Then there is a semi-conjugacy $L : K_\gamma \to [c_1, \beta]$ given by the landing point $L(\vartheta)$ of the dynamical external ray $R(\vartheta)$ with angle $\vartheta$. Also $\#L^{-1}(x) \le 4$ (namely $\#L^{-1}(x) = 4$ if $x$ is (pre)critical, $\#L^{-1}(x) = 1$ if $x = \pm\beta$ (but $-\beta \in [c_1, \beta)$ only if $c_1 = -2$), and $\#L^{-1}(x) = 2$ for all other $x \in [c_1, \beta)$). In particular, $h_{top}(T|_{K_\gamma}) = h_{top}(f_c|_{[c_1,\beta]})$. Moreover,
\[
\begin{cases}
c \mapsto (\theta_n)_{n \ge 1} \text{ (lexicographic order)} & \text{is order preserving;} \\
(\theta_n)_{n \ge 1} \mapsto \gamma = \frac12 \sum_{n \ge 1} \frac{1 - \theta_n}{2}\, 2^{-n} & \text{is order reversing;} \\
\gamma \mapsto K_\gamma \text{ (inclusion order)} & \text{is order preserving;} \\
K_\gamma \mapsto h_{top}(T|_{K_\gamma}) & \text{is order preserving.}
\end{cases}
\]

Since $T|_{K_\gamma}$ and $f_c|_{[c_1,\beta]}$ have the same entropy, $c \mapsto h_{top}(f_c|_{[c_1,\beta]})$ is an order-reversing map, proving the monotonicity of entropy for the quadratic family. □

We can extend the $\rho$-function from (3.19) to this complex case without changing the definition:
\[
\rho : \mathbb{N} \to \mathbb{N}, \qquad \rho(n) = \max\{k > n : e_{n+1} e_{n+2} \cdots e_{k-1} \text{ is prefix of } \nu\},
\]
and the sequence of cutting times
\[
S_0 = 1, \qquad S_{k+1} = \rho(S_k), \tag{3.36}
\]
is again uniquely determined by $\nu_c$. In the complex setting, the sequence of cutting times is called the internal address. This name comes from a procedure of locating the parameter $c$ in the Mandelbrot set. A hyperbolic component of period $r$ in $\mathcal{M}$ is a subset of parameters where $f_c$ has an attracting periodic orbit of period $r$. Since each such periodic orbit must attract a critical point (by Montel's Theorem; see [144, Theorem I.3.1]), there can be only one of them, and hence hyperbolic components are disjoint. Attracting periodic orbits persist under small perturbations, so hyperbolic components are open. The hyperbolic component of period 1 is called the main cardioid and contains 0. It is conjectured (and this follows from the MLC-conjecture: the Mandelbrot set is locally connected; see [206, 407]) that the hyperbolic components lie dense in $\mathcal{M}$.

For $c \in \mathcal{M}$, take an arc connecting 0 to $c$ without self-intersections. Starting from 0, list the periods of hyperbolic components the closures of which the arc passes through, and retain only the smallest entries. That is, $S_0 = 1$ and if $S_k$ is found, let $S_{k+1}$ be the smallest period²⁹ that you can find on the remainder of the arc. The sequence $\{S_k\}_{k \ge 0}$ turns out to be exactly the one obtained from (3.36).

²⁹ It was shown by Lavaurs [388] that between two hyperbolic components of the same period, there is always a hyperbolic component of a lower period, and therefore this list determines the hyperbolic components on this arc uniquely.
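The $\rho$-function and the cutting times (3.36) are easy to compute from a finite prefix of $\nu$. The sketch below is an illustration under stated assumptions: it reads $\rho(n)$ as the first index $k > n$ with $\nu_k \ne \nu_{k-n}$, which agrees with the max-prefix formulation above provided $e = \nu$ (an assumption about the notation of (3.19)); the function names are hypothetical, and `None` is returned when the prefix is too short to decide. The test word is the one used in Remark 3.105.

```python
def rho(nu, n):
    """First index k > n (1-based) with nu_k != nu_{k-n}; None if the
    finite prefix nu does not determine it."""
    for k in range(n + 1, len(nu) + 1):
        if nu[k - 1] != nu[k - n - 1]:
            return k
    return None


def rho_orbit(nu, start):
    """The rho-orbit start, rho(start), rho(rho(start)), ... as far as the
    prefix nu determines it."""
    orbit = [start]
    while True:
        nxt = rho(nu, orbit[-1])
        if nxt is None:
            return orbit
        orbit.append(nxt)


nu = "1011001101101"
print(rho_orbit(nu, 1))        # cutting times S_0, S_1, ... (internal address)
print(rho(nu, 6), rho(nu, 7))
```

With this prefix, $6$ lies on the orbit of $1$, $\rho(6) = 8 > 7$ and $\rho(7) - 7 = 6$, which is the situation described in Remark 3.105.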


We extend the use of closest precritical points to the Julia set: ζ ∈ Jc is a closest precritical point if fnc (ζ) = c0 and fkc (x) = c for all k ≤ n and x ∈ (ζ, c0 ) which now denotes the open arc between ζ and c0 in the Julia set. The closest precritical points ζS1k ∈ Jc1 and ζS0k ∈ Jc0 from the real case in (3.23) belong to the interval [c1 , c2 ], i.e. the Hubbard tree, but in the whole Julia set there is a closest precritical point ζn for every n ∈ N. Indeed, since Jc is completely invariant, {Jc0 , Jc1 } is a Markov partition such that fc (Jc0 ∪ {c}) = fc (Jc1 ∪ {c}) = Jc . There are therefore 2n cylinders of  −k generation n, and each z ∈ n−1 k=0 fc (c0 ) is a boundary point of one of these cylinders. Hence ζn1 and ζn0 can be found in the interior of the two n-cylinders with c in their boundary. The arc [ζn1 , ζn0 ] maps in a two-to-one way onto the arc [ζˆn , c1 ] for ζˆn := fc (ζn1 ) = fc (ζn0 ), with the property that fkc (x) = c1 for all x ∈ (ζˆn , c1 ) and k ≤ n. Also the ζˆn ∈ (ζˆn , c1 ] of lowest index n satisfies n = ρ(n). Indeed, maps [ζˆn , c1 ] homeomorphically onto [c0 , cn ]  ζn −n , and n − n is the fn−1 c  smallest positive iterate of [c0 , cn ] to contain c0 . But then fcn −n (c0 ) and  fcn −n (cn ) = cn lie in different components Jc1 and Jc0 , and hence ρ(n) = n , as claimed. Therefore, each ρ-orbit orbρ (k) represents a monotone sequence (ζˆρi (k) )i≥1 of closest precritical points approaching c1 (or ending in c1 if c is periodic). These ρ-orbits are disjoint if and only if the corresponding sequences approach c1 from different components of Jc \ {c1 }. The following proposition therefore implies that points ck , k ≥ 2 (with corresponding ρorbits orbρ (ρ(k) − k)), all lie in the same component of Jc \ {c1 }, and hence c1 is an end-point of the Hubbard tree. Proposition 3.104. For each kneading sequence ν ∈ {0, 1}N and 2 ≤ m ∈ N such that ρ(m) − m < ∞, there exists a k ≤ ρ(m) such that k ∈ orbρ (1) ∩ orbρ (ρ(m) − m). Proof. 
We argue by induction on $n$, using the induction hypothesis IH[$n$]:

IH[$n$]: For every $\nu \in \{0,1\}^{\mathbb{N}}$ with corresponding $\rho$-function and for every $m \in \mathbb{N}$ such that $\rho(m) - m = n$, the orbits $\mathrm{orb}_\rho(1)$ and $\mathrm{orb}_\rho(n)$ intersect at the latest at $\rho(m)$.

Remark 3.105. IH[$n$] does not imply that $\mathrm{orb}_\rho(1) \cap \mathrm{orb}_\rho(n)$ contains $\rho(m)$, not even if $m$ is minimal such that $\rho(m) - m = n$. For example, if $\nu = 1011001101101\cdots$ with $n = 6$ and $m = 7$, then $n \in \mathrm{orb}_\rho(1)$, but $\rho(n) > m > n$.

The induction hypothesis is trivially true for $n = 1$. So assume that IH[$n'$] holds for all $n' < n$. Take $\nu \in \{0,1\}^{\mathbb{N}}$ arbitrary and $m \in \mathbb{N}$ minimal such that $\rho(m) - m = n$. If no such $m$ exists, then IH[$n$] is true for this


ν by default. Let n0 ∈ orbρ (n) be maximal such that n0 ≤ ρ(m); thus ρ(n0 ) > ρ(m). We distinguish two cases: Case I: n0 < ρ(m). If n0 ≤ m, then ρ(n0 ) > ρ(m) implies n0 < m and ν1 · · · νm−n0 νm−n0 +1 · · · νρ(m)−n0 = νn0 +1 · · · νm νm+1 · · · νρ(m) ; hence ρ(m − n0 ) − (m − n0 ) = ρ(m) − m = n, contradicting minimality of m. Therefore m < n0 < ρ(k). Since ρ(n0 ) > ρ(m) and νm+1 · · · νn0 +1 · · · νρ(m) = ν1 · · · νn0 −m+1 · · · νn (where νn is the opposite symbol of νn ), we have ρ(n0 − m) = n. Consider ν˜ := ν1 · · · νn0 −1 νn 0 νn0 +1 · · · (with arbitrary continuation) with associated function ρ˜. Then ρ˜(m) = n0 . (i) If n0 = n, then the fact that ρ(n0 − m) = n implies ρ˜(n0 − m) > n0 , so ρ˜(m) ∈ / orbρ˜(n0 − m). / orbρ˜(n), and ρ˜(n0 − m) = ρ(n0 − m) = (ii) If n0 > n, then ρ˜(m) = n0 ∈ / orbρ˜(n0 − m). m < n0 again implies ρ˜(m) ∈ So in both cases ρ˜(m) ∈ / orbρ˜(n0 − m). Now ρ˜(m) − m = n0 − m < ρ(m) − m = n, so by the induction hypothesis IH[n0 − m], orbρ˜(1) and / orbρ˜(n0 − m), they meet orbρ˜(n0 − m) meet at or before ρ˜(m); since ρ˜(m) ∈ before ρ˜(m) = n0 . As a result, also orbρ (1) and orbρ (n0 − m) meet before n0 < ρ(m), and since ρ(n0 − m) = n, also orbρ (1) and orbρ (n) meet before ρ(m). Case II: n0 = ρ(m). In this case ρ(m) ∈ orbρ (n). Let p0 ∈ orbρ (1) be maximal such that p0 ≤ ρ(m); hence ρ(p0 ) > ρ(m). If p0 = ρ(m), then there is nothing to prove, so assume that p0 < ρ(m) < ρ(p0 ). As in Case I (by minimality of m), we only need to consider the case that m < p0 < ρ(m) < ρ(p0 ). Since νk+1 · · · νρ(m) = ν1 · · · νn , we have ρ(p0 − m) = n (similarly as above). Set ν˜ := ν1 · · · νp 0 · · · with associated function ρ˜. Then ρ˜(m) = p0 < ρ(m) and by IH[p0 − m], orbρ˜(1) and orbρ˜(p0 − m) meet at the latest at ρ˜(m) = p0 . (i) If n < p0 , then ρ˜(p0 − m) = ρ(p0 − m) = n, so orbρ˜(1) and orbρ˜(n) / orbρ˜(1), so in fact orbρ˜(1) and orbρ˜(n) meet at the latest at p0 . But p0 ∈ meet before p0 . 
But then orbρ (1) and orbρ (n) also meet before p0 < ρ(m). (ii) If n = p0 , then orbρ (1) and orbρ (n) obviously meet at p0 < ρ(m). (iii) The case n > p0 is impossible. Indeed, ρ(m) − m = n > p0 > m, so ρ(m) > 2m. Since ρ(p0 ) > ρ(m) = n + m > p0 + m, we find that νm+1 · · · νp0 +1 · · · νρ(m)−1 = ν1 · · · νp0 −m+1 · · · νρ(m)−m−1 ; hence ρ(p0 − m) ≥ ρ(m) − m > p0 . For the sequence ν˜ this means that ρ˜(p0 − m) = p0 , while / orbρ˜(1). Therefore orbρ˜(1) and orbρ˜(p0 − m) do not meet at or before p0 ∈ p0 ; since ρ˜(m) − m = p0 − m, this contradicts IH[p0 − m].


This completes Case II and proves that $\mathrm{orb}_\rho(1)$ and $\mathrm{orb}_\rho(n)$ intersect at the latest at $\rho(m)$, where $m$ is minimal with the property that $\rho(m) - m = n$. For an arbitrary (i.e. not necessarily minimal) $m$ with $\rho(m) - m = n$, let $m'$ be minimal with this property. Then the $\rho$-orbits $\mathrm{orb}_\rho(1)$ and $\mathrm{orb}_\rho(n)$ meet at the latest at $\rho(m') = n + m' \le n + m = \rho(m)$, so the statement holds for arbitrary $m$. This proves IH[$n$]. □

Thurston's Non-wandering Triangle Theorem states that every branch-point in the dendritic Julia set of a quadratic map is precritical or (pre)periodic. The next proposition proves this using only the properties of $\rho$-functions. However, the theorem only works for quadratic polynomials, because cubic or higher-order polynomials can have wandering (i.e. not precritical or preperiodic) triods or $n$-ods³⁰. This was discovered by Blokh & Oversteegen [93] and studied systematically by Childers [154]. Upper bounds for the number of external rays were already given in [365].

We extend the $\rho$-function to arbitrary sequences $x \in \{0,1\}^{\mathbb{N}}$, as
\[
\rho_x(n) = \min\{k > n : x_k \ne \nu_{k-n}\} \qquad \text{for kneading sequence } \nu.
\]

Hence $\rho_\nu = \rho$. The $\rho_x$-orbits correspond to sequences of closest-to-$x$ precritical points that monotonically approach $x$ (or end in $x$ if $x$ is precritical). The number of disjoint $\rho_x$-orbits is equal to the number of components of $J_c \setminus \{x\}$.

Proposition 3.106. Let $x, \nu \in \{0,1\}^{\mathbb{N}}$. If there are three disjoint $\rho_x$-orbits, then $x$ is preperiodic.

Proof. Assume that $(A_h)_{h \ge 0} = \mathrm{orb}_{\rho_x}(A_0)$, $(B_i)_{i \ge 0} = \mathrm{orb}_{\rho_x}(B_0)$, and $(C_j)_{j \ge 0} = \mathrm{orb}_{\rho_x}(C_0)$ are pairwise disjoint $\rho_x$-orbits. There are infinitely many triples $(A_h, B_i, C_j)$ such that $B_{i-1} < A_{h-1} < B_i < A_h$ and $C_{j-1} < A_{h-1} < C_j < A_h$ (possibly with the roles of $A_h$, $B_i$, and $C_j$ permuted). Assume that $(A_h, B_i, C_j)$ is one of such triples, with span
\[
d(A_h, B_i, C_j) := \max\{A_h - A_{h-1},\, B_i - B_{i-1},\, C_j - C_{j-1}\}
\]
taking the minimal value $d_{min}$ among the spans of all such triples; see Figure 3.17.

[Figure 3.17: positions of consecutive entries of the three orbits $(A_h)$, $(B_i)$, $(C_j)$ on a line.]

Figure 3.17. Parts of the three disjoint $\rho_x$-orbits.

³⁰ $n$ arcs glued together at a common branch-point.


We have  νBi−1 +1 · · · νAh−1 · · · νBi = ν1 · · · νAh−1 −Bi−1 · · · νB , i −Bi−1

and hence ρ(Ah−1 − Bi−1 ) = Bi − Bi−1 . Proposition 3.104 for m := Ah−1 − Bi−1 and therefore ρ(m) − m = Bi − Ah−1 gives a k ≤ ρ(m) = Bi − Bi−1 such that k ∈ orbρ (1) ∩ orbρ (ρ(m) − m). Similarly, for m = Ah−1 − Cj−1 , we have a k  ≤ ρ(m ) = Cj − Ai−1 such that k  ∈ orbρ (1) ∩ orbρ (ρ(m ) − m ). We have xAh +1 · · · xAh −1 = ν1 · · · νAh −Ah−1 −1 . If Ah − Ah−1 > max{Bi −Bi−1 , Cj −Cj−1 }, then Ah−1 +max{k, k  } ∈ orbρx (Bi )∩orbρx (Cj ), contradicting the disjointness of orbρx (B0 ) and orbρx (C0 ). Therefore Ah − Ah−1 ≤ max{Bi − Bi−1 , Cj − Cj−1 }. Now take i ≥ i and j  ≥ j maximal such that Bi ≤ Ah and Cj  ≤ Ah . Assume without loss of generality that Bi < Cj  , and take j  ≤ j  minimal such that Bi < Cj  . Then (Bi , Cj  , Ah ) forms the next new triple, with span d(Bi , Cj  , Ah ) ≤ d(Ah , Bi , Cj ). By the choice of (Ah , Bi , Cj ), we have in fact d(Bi , Cj  , Ah ) = d(Ah , Bi , Cj ), and that means that the span of all later triples is dmin . A fortiori, the first “over-arching” distance Ah − Ah−1 of these triples is equal to dmin . Therefore x is periodic from this point onwards,  with period dividing dmin . Remark 3.107. This proof is more general than Thurston’s proof, because it applies also to non-admissible kneading sequences, i.e. those that do not come with a Thurston lamination. For instance, if ν = 101100 · · · , then there is a periodic point (see Figure 3.18, right) with itinerary x = 101. The ρx -orbits of A0 = 3, B0 = 6, and C0 = 1 are disjoint, with span dmin = 6. The precritical branch-points are not covered by this proposition, because of the issue of assigning a proper symbol to the critical point. Each choice of 0 or 1 allows for one or two branches according to whether ν is an end-point in the Julia set or not (unless ν is eventually periodic). Therefore 0ν and 1ν together accounts for two or four arms. 
For quadratic dendritic Julia sets, each non-precritical branch-point has a so-called characteristic branch-point $z \in [c_1, c_0]$ on its orbit that is periodic and is closest to $c_1$ in the sense that the arm (i.e. the component of $J_c \setminus \{z\}$) containing $c_1$ is disjoint from $\mathrm{orb}_{f_c}(z)$; see [126, Section 3]. Again, the combinatorial properties of characteristic periodic points can be read off the $\rho$-function, as summarized below (see [126, Proposition 4.19] for the proof):

Proposition 3.108. Let $\nu \in \{0,1\}^{\mathbb{N}}$ be the kneading sequence of a dendritic quadratic Julia set $J_c$. Take $m \in \mathrm{orb}_\rho(1)$ and write $\rho(m) = qm + r$ for $r \in \{1, \dots, m\}$. Then there is a characteristic $m$-periodic point $z \in [c_1, c_0]$ with itinerary $i(z) = \overline{\nu_1 \cdots \nu_m}$. Its number of arms is
\[
\begin{cases} q + 1 & \text{if } m \in \mathrm{orb}_\rho(r), \\ q + 2 & \text{if } m \notin \mathrm{orb}_\rho(r), \end{cases}
\]
and locally, these arms are permuted cyclically by $f_c^m$; see Figure 3.18, left. There are no other characteristic branch-points in $J_c$.

[Figure 3.18: two Hubbard trees with the critical orbit points marked, $c_0 = c_{10}$ in the left tree and $c_0 = c_6$ in the right tree.]

Figure 3.18. The Hubbard tree of ν = 1 10 111 1100 has two periodic orbits of branch-points. The Hubbard tree of ν = 1 0 11 0 0 has an orbit of evil branch-points.
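The arm count of Proposition 3.108 only involves the $\rho$-function, so it can be evaluated mechanically from a prefix of $\nu$. The following is a hedged sketch with hypothetical function names; it assumes that $\rho(n)$ is the first index $k > n$ with $\nu_k \ne \nu_{k-n}$ and that a $\rho$-orbit contains its starting point. The two test prefixes are $1\overline{10}$ (the kneading sequence attached to Figure 3.15) and $1\overline{0}$, which the itinerary definition gives for $\vartheta_c = 1/2$.

```python
def rho(nu, n):
    """First index k > n (1-based) with nu_k != nu_{k-n}, or None."""
    for k in range(n + 1, len(nu) + 1):
        if nu[k - 1] != nu[k - n - 1]:
            return k
    return None


def rho_orbit(nu, start):
    """rho-orbit of start (start included), as far as the prefix allows."""
    orbit = [start]
    while (nxt := rho(nu, orbit[-1])) is not None:
        orbit.append(nxt)
    return orbit


def arms_of_characteristic_point(nu, m):
    """Arm count from Proposition 3.108 for m in orb_rho(1)."""
    if m not in rho_orbit(nu, 1):
        raise ValueError("m must lie on the rho-orbit of 1")
    rho_m = rho(nu, m)
    if rho_m is None:
        raise ValueError("prefix of nu too short to determine rho(m)")
    q, r = divmod(rho_m, m)
    if r == 0:                     # enforce r in {1, ..., m}
        q, r = q - 1, m
    return q + 1 if m in rho_orbit(nu, r) else q + 2


print(arms_of_characteristic_point("1101010101", 1))
print(arms_of_characteristic_point("1000000000", 1))
```

For $\nu = 1\overline{10}$ and $m = 1$ one gets $\rho(1) = 3 = 2 \cdot 1 + 1$, hence three arms at the fixed point with itinerary $\overline{1}$; for $\nu = 1\overline{0}$ the same point has two arms, as expected when the Hubbard tree is an arc.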

However, it is possible that the Hubbard tree corresponding to some $\nu \in \{0,1\}^{\mathbb{N}}$ has characteristic branch-points that are not described by Proposition 3.108. These are called evil points; the existence of evil points is the only obstruction to $\nu$ being the kneading sequence of some quadratic map; see [126, Proposition 4.13].

Proposition 3.109. Let $\nu \in \{0,1\}^{\mathbb{N}}$ and $m \in \mathbb{N}$ and write $\rho(m) = qm + r$ for $r \in \{1, \dots, m\}$. If
\[
\begin{cases} m \notin \mathrm{orb}_\rho(1), \\ \rho(k) < m \text{ if } k < m \text{ divides } m, \\ m \in \mathrm{orb}_\rho(r), \end{cases} \tag{3.37}
\]
then the Hubbard tree associated to $\nu$ has an evil characteristic branch-point $z \in [c_1, c_0]$. Its itinerary is $\overline{\nu_1 \cdots \nu_m}$ and its number of arms is $q + 2$. Locally, the $m$-th iterate of the tree map fixes the arm towards $c_0$ and permutes the other arms cyclically.

It follows from Proposition 3.104 that if $m \in \mathbb{N}$ satisfies (3.37), then $\rho(m) \in \mathrm{orb}_\rho(1)$. The fact that arms are not permuted cyclically prevents the existence of a polynomial $f_c$ with a periodic branch-point as described, but this is the only restriction; see [126, Section 4]. Hence we have the Complex Admissibility Condition: If $\nu \in \{0,1\}^{\mathbb{N}}$ is such that (3.37) fails for every $m \in \mathbb{N}$, then $\nu$ is the kneading sequence of some quadratic polynomial. If $m \ge \rho(m) - m \in \mathrm{orb}_\rho(1)$ for all $m$, then all characteristic periodic points have two arms, according to Propositions 3.108 and 3.109, and the Hubbard tree is an arc $[c_1, c_2]$. But this condition gives the existence of a kneading map $Q$, which is central to having a real kneading sequence.
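Condition (3.37) can be tested on a finite prefix of $\nu$. The sketch below is illustrative, with hypothetical function names, and makes two assumptions: the first line of (3.37) is read as $m \notin \mathrm{orb}_\rho(1)$, and a $\rho$-orbit contains its starting point. A value of $\rho$ that the prefix does not determine is treated as "condition not verified". The first test word is the prefix $101100$ from Remark 3.107.

```python
def rho(nu, n):
    """First index k > n (1-based) with nu_k != nu_{k-n}, or None."""
    for k in range(n + 1, len(nu) + 1):
        if nu[k - 1] != nu[k - n - 1]:
            return k
    return None


def rho_orbit(nu, start):
    """rho-orbit of start (start included), as far as the prefix allows."""
    orbit = [start]
    while (nxt := rho(nu, orbit[-1])) is not None:
        orbit.append(nxt)
    return orbit


def satisfies_evil_condition(nu, m):
    """True if all three lines of (3.37) are verified for m on the prefix."""
    if m in rho_orbit(nu, 1):                    # need m not in orb_rho(1)
        return False
    for k in range(1, m):                        # proper divisors of m
        if m % k == 0:
            rho_k = rho(nu, k)
            if rho_k is None or rho_k >= m:      # need rho(k) < m
                return False
    rho_m = rho(nu, m)
    if rho_m is None:
        return False
    r = rho_m % m or m                           # r in {1, ..., m}
    return m in rho_orbit(nu, r)                 # need m in orb_rho(r)


print(satisfies_evil_condition("101100", 3))
print([satisfies_evil_condition("1101010101", m) for m in range(1, 5)])
```

For the prefix $101100$ and $m = 3$ one finds $\mathrm{orb}_\rho(1) = \{1, 2, 4, 5, 6\}$, $\rho(1) = 2 < 3$ and $\rho(3) = 6 = 1 \cdot 3 + 3$, so all three lines hold, matching the period-3 point with itinerary $\overline{101}$ discussed in Remark 3.107.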

3.7. Gap Shifts

These were introduced by Lind & Marcus [398, page 7] as another example of the variety of the notion of subshift, but they were not further developed there.

Definition 3.110. Let $S$ be a collection of non-negative integers. The corresponding gap shift (or $S$-gap shift, for apparently no other reason than that $S$ denotes the collection of gap-sizes) is the subshift
\[
X_S = \{x \in \{0,1\}^{\mathbb{Z}} : \text{if } 10^s1 \text{ is a subword of } x, \text{ then } s \in S\}.
\]

Example 3.111. We obtain the Fibonacci SFT by taking $S = \{1, 2, 3, 4, \dots\}$, and if we interchange the roles of 0's and 1's, then $S = \{0, 1\}$. The even shift with the roles of 0 and 1 interchanged is obtained by taking $S = \{0, 2, 4, 6, \dots\}$. Also $\beta$-shifts are $S$-gap shifts, namely with $S = \{s \in \mathbb{N} : c_s = 1\}$ for the greedy $\beta$-representation of 1, i.e. $1 = \sum_{s \in S} \beta^{-s}$; see the proof of Theorem 3.77.

Example 3.112. The subshift in which every other symbol is a 1 (but no other restrictions) can be seen as a sofic shift (see the edge-labeled transition graph in Figure 3.19), as an $S$-gap shift with $S = \{1, 3, 5, 7, 9, \dots\}$, and it is isomorphic to an SFT on $\{0, 1, 2\}$ (see the vertex-labeled transition graph in Figure 3.19). It encodes the tent map $T_s(x) = \min\{sx, s(1 - x)\}$ such that $T_s^2(\frac12) < \frac12 < T_s^3(\frac12) = T_s^4(\frac12)$, so $s = \sqrt{2}$. These dynamical systems are topologically transitive, but not topologically mixing. The entropy is exactly $\frac12 \log 2$.

[Figure 3.19: an edge-labeled transition graph, a vertex-labeled transition graph on the symbols 0, 1, 2, and the graph of a tent map.]

Figure 3.19. Graphs of a sofic shift, SFT, and tent map with topological entropy $\frac12 \log 2$.


The following collection of results was shown in [182].

Theorem 3.113. Let $(X_S, \sigma)$ be an $S$-gap shift. Then $(X_S, \sigma)$
(a) is an SFT if and only if $S$ is finite or cofinite;
(b) is sofic if and only if $\{s_i\}_i$ is eventually periodic;
(c) is synchronized and coded;
(d) is topologically mixing if and only if $\gcd\{s + 1 : s \in S\} = 1$;
(e) has specification if and only if $S$ is syndetic and $\gcd\{s + 1 : s \in S\} = 1$.

Proof. (a) If $\#S < \infty$, then let $N = \max S$, and if $S$ is cofinite, then take $N$ such that $s \in S$ for all $s > N$. Now declare all $(N + 1)$-words forbidden if they don't occur in any concatenation of words $10^s$, $s \in S$. Conversely, if $N$ is the maximal length of a forbidden word of the SFT, then $S \supset \{N + 1, N + 2, N + 3, \dots\}$.

(b) Take $B = \{0^s1 : s \in S\}$. Clearly, every (left-infinite) word $w$ ending with 1 has the same follower set $F(w) = B^{\mathbb{N}}$. For every (left-infinite) word $w$ ending in 0, $w = \cdots 0000$ or there is a unique $n \in \mathbb{N}$ such that $10^n$ is a suffix of $w$. The word $\cdots 0000$ has its own follower set $F(\cdots)$ and if $n < \infty$, then the follower set $F(w)$ depends only on $n$ and is eventually periodic in $n$ by our assumption. Therefore there are only finitely many distinct follower sets and by Theorem 3.36 $X_S$ is sofic. Conversely, if $X_S$ is sofic, then again $F(w)$ depends only on $n$. The follower set $F(w) = \{0^\infty\} \cup \{0^a 1 B^{\mathbb{N}} : a + n \in S\}$. That is, there is an infinite collection of follower sets, unless $\{\mathbb{N} \cap (S - n)\}_{n \ge 1}$ is a finite collection of sets, and this only holds if $S$ is eventually periodic.

(c) The $S$-gap shift, as the free concatenation of words $10^s$, $s \in S$, is obviously a coded shift. Each such word is synchronizing.

(d) If $g := \gcd\{s + 1 : s \in S\} > 1$, then $\sigma^n([1]) \cap [1] = \emptyset$ if $n$ is not a multiple of $g$. In this case, topological mixing fails. Conversely, there is $N$ such that for every $n \ge N$, there is a word $v \in L_n(X_S)$ which is the concatenation of words $10^s$, $s \in S$. Now let $u, w \in L(X_S)$ be arbitrary. By extending $u$ by $u'$ on the right and $w$ by $w'$ on the left by words $u'$ and $w'$ of no more than $\min S$ symbols, we can turn them into the suffix and prefix of concatenations of words $10^s$, $s \in S$. But then $uu'vw'w \in L(X_S)$ as well. This proves the topological mixing; cf. [336].

Finally, for the specification, we refer to [182]. In fact, in [43] it is shown that for every $h \in (0, \log 2]$, there are (sometimes uncountably many) gap shifts with specification satisfying $h_{top}(X_S, \sigma) = h$. □
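Criterion (d) of Theorem 3.113 is a plain gcd computation. The sketch below is illustrative only (the function names are hypothetical) and works with a finite, non-empty set of gap sizes; for an infinite $S$ one can only feed in a truncation, and the gcd of a truncation may be larger than the gcd of the full set, so a result greater than 1 is then inconclusive.

```python
from functools import reduce
from math import gcd


def mixing_gcd(S):
    """gcd{s + 1 : s in S} for a finite non-empty collection S."""
    return reduce(gcd, (s + 1 for s in S))


def is_mixing(S):
    """Topological mixing test of Theorem 3.113(d) for a finite S."""
    return mixing_gcd(S) == 1


print(mixing_gcd({1, 3, 5, 7, 9}))   # truncation of the set in Example 3.112
print(is_mixing({0, 1}))             # Fibonacci SFT with 0 and 1 interchanged
print(is_mixing({0, 2, 4, 6}))       # truncation of the even-shift set
```

For the odd gap sizes of Example 3.112 every $s + 1$ is even, so the gcd is 2 for every truncation and for the full set, in agreement with the statement there that the system is transitive but not mixing.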


As mentioned before, a gap shift $(X, \sigma)$ is coded with $C = \{0^s1 : s \in S\}$ as the set of code words. Therefore Theorem 3.48 immediately gives that $(X, \sigma)$ (and any of its factors) is intrinsically ergodic; see also [159].

Theorem 3.114. The topological entropy of the $S$-gap shift $(X_S, \sigma)$ is $\log \lambda$ where $\lambda$ is the largest solution of the equation
\[
\sum_{s \in S} \lambda^{-(s+1)} = 1.
\]
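The equation of Theorem 3.114 can be solved numerically. The following is a minimal sketch rather than a method taken from the text: it uses bisection on $[1, 2]$, which is justified because the left-hand side is decreasing in $\lambda$, is at least 1 at $\lambda = 1$ and at most 1 at $\lambda = 2$. The function names are hypothetical, and an infinite $S$ has to be truncated, which only changes the sum by an exponentially small tail.

```python
import math


def gap_shift_lambda(S, tol=1e-13):
    """Solve sum_{s in S} lam^{-(s+1)} = 1 for lam in [1, 2] by bisection
    (S finite and non-empty)."""
    gaps = sorted(S)

    def excess(lam):
        return sum(lam ** -(s + 1) for s in gaps) - 1.0

    lo, hi = 1.0, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:      # sum still too large: root lies to the right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def gap_shift_entropy(S):
    return math.log(gap_shift_lambda(S))


golden = (1 + math.sqrt(5)) / 2
print(gap_shift_lambda({0, 1}), golden)
print(gap_shift_entropy(range(1, 200, 2)), 0.5 * math.log(2))
```

The two printed pairs should agree to many digits: $S = \{0, 1\}$ gives $\lambda^{-1} + \lambda^{-2} = 1$, i.e. the golden mean, and the odd gap sizes of Example 3.112 give $\lambda = \sqrt{2}$.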

Proof. Gap shifts are coded shifts, with $C = \{10^s : s \in S\}$ as code words. Therefore the results of Section 3.3 apply, but the situation here is simpler because $U_C$ from (3.5) reduces to $\{0^\infty\}$. In fact, we can pass directly to the representation of a gap shift by an infinite transition graph consisting of a single central vertex from which loops of length $s + 1$, $s \in S$, emerge. So Theorem 3.114 follows directly from Theorem 8.73. □

Exercise 3.115. Use Theorem 3.114 to compute the entropy of the Fibonacci SFT, the odd shift, and the even shift.

A generalization of $S$-gap shifts was initiated in [183, 409]. For the alphabet $A = \{0, 1, \dots, d - 1\}$ and for each $a \in A$, there is a set $S_a \subset \mathbb{N}_0$, such that the maximal blocks of each symbol $a$ must have length $s \in S_a$. If in addition, these blocks appear cyclically, i.e. as $0^{s_0} 1^{s_1} \cdots (d - 1)^{s_{d-1}}$, $s_i \in S_i$, before the next 0 is allowed to appear³¹, then we call this shift space the cyclic $S$-limited gap shift or simply cyclic $S$-gap shift. Clearly $S$-gap shifts are cyclic $S$-gap shifts on two symbols with $S_0 = S$ and $S_1 = \{1\}$. For cyclic $S$-gap shifts, a fairly straightforward generalization of Theorem 3.113 holds. For instance, a cyclic $S$-gap shift is topologically mixing if and only if $\gcd\{s_0 + s_1 + \cdots + s_{d-1} : s_i \in S_i\} = 1$; see [409, Proposition 3.6]. In [409, Theorem 4.3] (using the results of [159]), it is shown that cyclic $S$-gap shifts are intrinsically ergodic, and so are their factors. Also conditions are given [409, Theorems 5.1 and 5.2] about when two cyclic $S$-gap shifts are conjugate.

³¹ If $S_a \ni 0$, then the symbol $a$ can be "jumped"; if $S_a \supset \{0, 1\}$ for each $a$, then $X_S = A^{\mathbb{N} \text{ or } \mathbb{Z}}$.

Theorem 3.116. The topological entropy of the cyclic $S$-gap shift $(X_S, \sigma)$ is $\log \lambda$ where $\lambda$ is the largest solution of the equation
\[
\prod_{a \in A} \sum_{s \in S_a} \lambda^{-s} = 1.
\]

Proof. We give the proof first for the truncated $S_a' := S_a \cap \{0, \dots, N\}$. Since the entropy increases as $N$ increases, the theorem follows by taking $N \to \infty$.


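Also, we use the rome technique from Section 8.7.3 as opposed to the proofs in [183, 398, 409]. Let $B$ be the $n \times n$ transition matrix (for some $n \le d(N + 1)$) for the cyclic $S'$-gap shift. Then by Theorem 8.72,
\[
\det(B - \lambda I_n) = (-\lambda)^{n-d} \det(A_{rome}(\lambda) - \lambda I_d)
\]
for
\[
A_{rome}(\lambda) = \begin{pmatrix}
0 & \Sigma_0 & 0 & \cdots & 0 \\
0 & 0 & \Sigma_1 & \ddots & \vdots \\
\vdots & & \ddots & \ddots & 0 \\
0 & & & 0 & \Sigma_{d-2} \\
\Sigma_{d-1} & 0 & \cdots & & 0
\end{pmatrix}
\]
and $\Sigma_a := \sum_{s \in S_a'} \lambda^{1-s}$, $a \in A$. A straightforward computation gives that
\[
\det(B - \lambda I) = (-\lambda)^n - (-1)^d (-\lambda)^{n-d} \prod_{a \in A} \Sigma_a = (-\lambda)^n \Big(1 - \prod_{a \in A} \sum_{s \in S_a'} \lambda^{-s}\Big).
\]
Therefore the leading root satisfies
\[
\prod_{a \in A} \sum_{s \in S_a'} \lambda^{-s} = 1. \qquad \Box
\]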
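As a consistency check (a worked special case added for illustration, not part of the original proof), take $d = 2$, $S_0 = S$ and $S_1 = \{1\}$, which is the way ordinary $S$-gap shifts are viewed as cyclic $S$-gap shifts. The product formula of Theorem 3.116 then collapses to the sum formula of Theorem 3.114, and for the odd gap sizes of Example 3.112 it reproduces the entropy $\frac12 \log 2$:

```latex
\[
\prod_{a \in \{0,1\}} \sum_{s \in S_a} \lambda^{-s}
  = \Big( \sum_{s \in S} \lambda^{-s} \Big) \cdot \lambda^{-1}
  = \sum_{s \in S} \lambda^{-(s+1)} = 1 ,
\]
% Special case S = {1, 3, 5, ...}: the exponents s + 1 run over 2, 4, 6, ...
\[
\sum_{k \ge 1} \lambda^{-2k}
  = \frac{\lambda^{-2}}{1 - \lambda^{-2}}
  = \frac{1}{\lambda^{2} - 1} = 1
  \iff \lambda = \sqrt{2},
  \qquad h_{top}(X_S, \sigma) = \log \sqrt{2} = \tfrac12 \log 2 .
\]
```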



3.8. Spacing Shifts

Instead of determining the gaps between 1's in allowed sequences of a subshift, we can specify the distances of all (not just neighboring) 1's. This leads to the definition of a spacing shift. These were first described in [387] without giving them a specific name. The name, as well as a rigorous treatment of this type of subshift, stems from [47].

Definition 3.117. Given a subset $P \subset \mathbb{N}$, the spacing shift is the collection of all sequences $x \in \{0,1\}^{\mathbb{N} \text{ or } \mathbb{Z}}$ such that $x_i = x_j = 1$ implies $i = j$ or $|i - j| \in P$. We denote this subshift as $(X_P, \sigma)$.

Example 3.118. If $P = 2\mathbb{N}$, then $X_P$ is the odd shift from Example 1.4. If $P = 2\mathbb{N} + 1$, then $X_P = \{0^\infty\} \cup \{\sigma^n(\cdots 000.1000 \cdots)\}_{n \in \mathbb{Z}} \cup \{\sigma^n(\cdots 000.11000 \cdots)\}_{n \in \mathbb{Z}}$.

Clearly every spacing shift contains $0^\infty$ and $\cdots 0001000 \cdots$, even if $P = \emptyset$, and $P \subset P'$ implies that $X_P \subset X_{P'}$. In particular, no spacing shift is minimal. Also $(X_P, \sigma)$ is an SFT if and only if $P$ is cofinite, and $\max\{\mathbb{N} \setminus P\} - 1$ is the length of the longest forbidden word of the SFT; see [47, Theorem 2.4].
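Definition 3.117 translates directly into a membership test for finite words: a 0-1-word belongs to the language of $X_P$ exactly when all pairwise distances between its 1's lie in $P$ (such a word can be padded with 0's on both sides). The sketch below is illustrative, with a hypothetical function name, and takes $P$ as a finite Python set, so an infinite $P$ has to be truncated above the length of the words tested.

```python
from itertools import combinations


def in_spacing_language(word, P):
    """True if every two 1's in `word` are at a distance belonging to P."""
    ones = [i for i, ch in enumerate(word) if ch == "1"]
    return all((j - i) in P for i, j in combinations(ones, 2))


even = set(range(2, 50, 2))          # truncation of P = 2N, closed under addition
u, v, p = "101", "10001", 2
glued = u + "0" * (p - 1) + v        # the word u 0^{p-1} v
print(glued, in_spacing_language(glued, even))

print(in_spacing_language("1001", {3}))
print(in_spacing_language("1001001", {3}))   # distance 6 = 3 + 3 is not in P
```

The last two lines show why closure under addition matters: with $P = \{3\}$ the word $1001$ is allowed, but $1001001$ is not, because the outer 1's are at distance $6 \notin P$.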


The condition in the definition of the spacing shift is more restrictive than the restriction of gap shifts. Clearly if $p_1, p_2 \in P$, then $10^{p_1 - 1}10^{p_2 - 1}1 \in L(X)$ only if also $p_1 + p_2 \in P$. Therefore it is natural to require that $P$ is closed under addition; this requirement is a necessary (but not sufficient) condition for a spacing shift to be a gap shift; see [47, Theorems 2.19 and 2.21]. However, there are many interesting spacing shifts $(X_P, \sigma)$ for which $P$ is not closed under addition.

Proposition 3.119. If a spacing shift $(X_P, \sigma)$ is topologically transitive, then $P$ is infinite. If $P \ne \emptyset$ is closed under addition, then $(X_P, \sigma)$ is topologically transitive.

Proof. If $P$ is finite, then no point $x \in [1]$ can return to $[1]$ infinitely often, so topological transitivity fails. Conversely, assume that $u, v \in L(X_P)$ both start and end with a 1, and let $p \in P$ be arbitrary. We claim that $w = u0^{p-1}v$ also belongs to $L(X_P)$. Indeed, let $1 \le i < j \le |w|$ be such that $w_i = w_j = 1$. If $j \le |u|$, then $|j - i| \in P$ because $u \in L(X_P)$, and if $|u| + p \le i$, then $|j - i| \in P$ because $v \in L(X_P)$. The case $i \le |u| < |u| + p \le j$ follows because $P$ is closed under addition. Therefore $(X_P, \sigma)$ is topologically transitive. □

The same proof gives that $(X_P, \sigma)$ has a dense set of periodic orbits, so if $P$ is closed under addition and non-empty, then $(X_P, \sigma)$ is automatically Devaney chaotic. Contrary to gap shifts, spacing shifts are hereditary, which gives a certain freedom in constructing proofs.

Theorem 3.120. A spacing shift $(X_P, \sigma)$ is Li-Yorke chaotic if and only if the set $P$ is infinite.

Proof. If $P$ is finite, then every $x \in X_P$ has only finitely many non-zero entries, so $X_P$ is countable. This is too small to allow for an uncountable scrambled set. For the converse, let $P = (p_i)_{i=1}^{\infty}$. Construct a scrambled set $S \subset \{0,1\}^{\mathbb{N}}$ as in Example 2.68, and map it into $X_P$ via
\[
\pi(x)_n = \begin{cases} x_i & \text{if } n = p_i, \\ 0 & \text{otherwise.} \end{cases}
\]
Then $\pi(S) \subset X_P$ is still scrambled and still uncountable. □



A spacing shift $(X_P, \sigma)$ is topologically mixing if and only if $P$ is cofinite, which can easily be seen from the definition by taking open sets $U = V = [1] \subset X_P$. It is topologically weakly mixing if and only if $P$ is thick; see [387, Proposition 1]³² and [47, Theorem 2.1]. As such, for every thick set $P$ with $\mathbb{N} \setminus P$ infinite, $(X_P, \sigma)$ is topologically weakly mixing but not topologically mixing; see [387, Theorem 1.3]. It is known from [266] that a dynamical system $(X, T)$ is topologically weakly mixing if the product of $(X, T)$ with every transitive system is again transitive. However, in [387, Theorem 1.1] it is pointed out that the converse is false. Namely, if $P$ and $P'$ are disjoint thick sets, then $(X_P, \sigma)$ and $(X_{P'}, \sigma)$ are both topologically weakly mixing, but their product is not topologically transitive (since $[1] \times [1]$ is not infinitely recurrent). Topological weak mixing precludes the existence of non-constant Borel-measurable eigenfunctions [359]; i.e. no Borel function $f : X \to \mathbb{C}$ satisfies $U_T f := f \circ T = \lambda f$. However, [387, Theorem 1.2] presents a spacing shift that is not topologically weakly mixing, but which has a non-constant Borel eigenfunction. All this shows that there is only a partial topological analog of the characterization of measure-theoretic weak mixing in Theorem 6.86.

³² The term replete was used instead of thick.

Regarding topological entropy, $h_{top}(X_P, \sigma) \ge \frac1k \log 2$ if $P \supset k\mathbb{N}$; cf. Theorem 3.54 and [47, Lemma 3.1]. Other than this, there seems to be no easy way to compute the topological entropy $h_{top}(X_P, \sigma)$ from the properties of $P$. However [47, Theorem 3.6] gives a criterion for $h_{top}(X_P, \sigma) = 0$.

3.9. Power-Free Shifts

A square is a non-empty word of the form $ww$, e.g. the word bonbon; a cube is a non-empty word of the form $www$, e.g. the Dutch word kerkerker³³. Naturally, you can go to higher powers, or fractional powers, e.g. the word sense. An overlap is a non-empty word of the form $wvwvw$, e.g. the word alfalfa, where $w$ = a and $v$ = lf. Naturally, every square word or higher power of length ≥ 4 contains an overlap. In fact, an overlap is a fractional power or repetition where the exponent $p/q > 2$; see Definition 3.126 below.

Thue's pioneering articles [531, 533] on (what is now called) the Thue-Morse sequence, see Example 1.6, started this topic in the early 20th century. In computer science, finding languages that avoid powers and overlaps has been pursued at least since the 1970s.

Definition 3.121. A subshift $(X, \sigma)$ over some alphabet $A$ is called square-free, cube-free, power-free, and overlap-free, if its language $L(X)$ contains no squares, cubes, etc. Overlap-free, power-free, and repetition-free are here synonyms, but if the exponent is indicated, $k^+$-repetition-free = $k^+$-power-free = $k$-overlap-free. That is, $w^k$ is a $k$-th power or $k$-th repetition, but $w^k w_1$ where $w_1$ is the first letter of $w$ is a $k$-overlap.

³³ Meaning church-niche, just as kerkerkerkerker means dungeon in a church-niche.
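The notions of square and overlap defined above are easy to test by brute force on finite words. The following is a minimal sketch with hypothetical function names. It uses the fact that a word contains an overlap $wvwvw$ (with $w$ non-empty) exactly when it has a factor of length $2p + 1$ with period $p = |wv|$, and it assumes the standard Thue-Morse substitution $0 \mapsto 01$, $1 \mapsto 10$ (the text refers to Example 1.6 for it).

```python
from itertools import product


def has_square(word):
    """True if some non-empty factor of `word` has the form ww."""
    n = len(word)
    for length in range(1, n // 2 + 1):
        for start in range(n - 2 * length + 1):
            if word[start:start + length] == word[start + length:start + 2 * length]:
                return True
    return False


def has_overlap(word):
    """True if `word` has a factor of length 2p + 1 with period p >= 1."""
    n = len(word)
    for p in range(1, (n - 1) // 2 + 1):
        for start in range(n - 2 * p):
            if all(word[i] == word[i + p] for i in range(start, start + p + 1)):
                return True
    return False


def thue_morse_prefix(iterations):
    """Prefix of length 2**iterations, assuming 0 -> 01, 1 -> 10."""
    word = "0"
    for _ in range(iterations):
        word = "".join("01" if ch == "0" else "10" for ch in word)
    return word


tm = thue_morse_prefix(8)
print(has_square("bonbon"), has_overlap("alfalfa"), has_overlap("bonbon"))
print(has_overlap(tm), has_square(tm))
# Every binary word of length 4 contains a square:
print(all(has_square("".join(w)) for w in product("01", repeat=4)))
# Ternary word of differences of consecutive Thue-Morse letters (mod 3):
diff = "".join(str((int(a) - int(b)) % 3) for a, b in zip(tm, tm[1:]))
print(has_square(diff))
```

On these inputs the checks report that bonbon is a square without overlap, alfalfa is an overlap, the Thue-Morse prefix has squares but no overlap, no binary word of length 4 is square-free, and the ternary difference word has no square.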


Theorem 3.122. The smallest alphabet size for which square-free subshifts exist is 3. The Thue-Morse sequence is square+ε-free (i.e. overlap-free) in the sense that $w w w_1 \notin L(X)$ for every $w \in L(X)$, where $w_1$ is the first letter of $w$.

The crux of the proof relies on a property of the substitution $\chi_{TM}$ (see [107, Theorem 3]) which is an idea we can reuse elsewhere, so we formulate it in the following lemma.

Lemma 3.123. A word $w \in \{0,1\}^*$ has a $k$-overlap if and only if $\chi_{TM}(w)$ has a $k$-overlap. The same result holds for $k+1$-power instead of $k$-overlap.

This is different from Theorem 3.122 in the sense that $w$ can be any 0-1-word, not just words from the Thue-Morse language $L(X_{\rho_{TM}})$. On the other hand, it applies to every $k \ge 1$, not just to $k = 1$ (i.e. overlap-free/squares-free).

Proof. It is immediate that if $w$ has a $k$-overlap (or $k$-power), so has $\chi_{TM}(w)$. Hence we only have to prove the "if"-direction, and we start with a preliminary remark. By the shape of $\chi_{TM}$,
\[
|\chi_{TM}(x)|_0 = |\chi_{TM}(x)|_1 = |x| \qquad \text{for all } x \in \{0,1\}^*. \tag{3.38}
\]

Suppose that w ∈ L contains χT M (w) which contains a k-overlap; i.e. χTM (w) = av k v1 b where v1 is the first letter of w, but w itself does not contain a k-overlap. Assume also that w is the shortest word with this property, so |a|, |b| ≤ 1. Suppose by contradiction that there is x such that χTM (x) = avvc for |a|, |c| ≤ 1. Since |χTM (x)| = |avvb| is even, |a| = |b|. If |v| is odd, then |vv|0 − |vv|1 = ±2, so a, c = . But then a = c =  and when we now divide avvc in blocks of two, then v is chopped in two-block in two different way. Each such block is 01 or 10; therefore a = v2 = v4 = · · · = vn and c = v1 = v3 = · · · = vn−1 = a, where we wrote v = v1 · · · vn . But then |avvc|0 = 1 + 2|v|0 = 1 + 2|v|1 = |avvc|1 , contradicting (3.38). Therefore |v| is even. If |a| = 1, then if we divide avvc in blocks of two, we see that a = v1 and c = vn . The parity of (3.38) gives a = c, so a = vn = v1 = c. But then w shortened by its last letter has the same property as w, contradicting that w is shortest. Hence a = c =  and χTM (x) = vv. But then xk x1 is a prefix of w, so w contains a k-overlap after all.  Proof of Theorem 3.122. If you try to create a two-letter square-free word, then you soon get stuck: 0  01  010  stuck.


To create a three-letter square-free infinite word (a problem that was first solved by Thue; see [531, 533] and also [290, 404] for more modern results and approaches), start with a fixed point ρ_0 of the Thue-Morse substitution χ_TM and replace the symbol by a 2 if a square threatens to appear: 0120 1021 20210120 1021012021201021 ... . This turns out to work. Another way of creating a square-free word x from ρ_TM is by taking x_i = ρ_{TM,i} − ρ_{TM,i+1} (mod 3), because if x contains a square, then ρ_TM contains an overlap.

For the Thue-Morse sequence, we work by induction on n in χ_TM^n. We can see by inspection that the first 8 digits of ρ_TM are overlap-free. By applying χ_TM and Lemma 3.123, the first 16, 32, ..., 2^n, ... digits of ρ_TM are overlap-free as well. □

The next lemma (see [107, Theorem 2]) can be used to produce any number of square-free languages.

Lemma 3.124. Let χ : A → B* be a constant length substitution; i.e. |χ(a)| = |χ(b)| for all a, b ∈ A. If χ(w) is square-free for every square-free 3-letter word w, then χ(x) is square-free for square-free words x of any length.

Proof. Clearly χ(a) ≠ χ(b) for all a ≠ b because otherwise χ(aba) is not square-free. If |χ(a)| = 1 for all a ∈ A, then χ is a simple permutation of letters, and χ preserves the square-freeness. So let us assume that |χ(a)| =: d ≥ 2. Assume by contradiction that a square-free word x = x_1 ··· x_n maps to a non-square-free word χ(x) = rsst. Assume that x is the shortest such word, so |x| ≥ 4 and χ(x_1) = rr′ for some non-empty prefix r′ of ss and χ(x_n) = t′t for some non-empty suffix t′ of ss. However, |χ(x_1)| = |χ(x_n)|, so there is some 1 < k < n such that χ(x_k) = yy′ and

x = x_1 u x_k v x_n  ⟼  χ(x) = r · r′χ(u)y · y′χ(v)t′ · t,    where s = r′χ(u)y = y′χ(v)t′.

Therefore |r′| + d|u| + |y| = |s| = |y′| + d|v| + |t′|.

• If |r′| = |y′|, then r′ = y′ because they are both prefixes of s. Therefore d > | |t′| − |y| | = d | |u| − |v| |, so |y| = |t′| and y = t′ (both are suffixes of s). But then also χ(u) = χ(v), so u = v because χ is injective. If x_k = x_1 or x_k = x_n, then x = x_1 u x_1 u x_n or x = x_1 u x_n u x_n, contradicting that x is square-free. If x_1 ≠ x_k ≠ x_n, then x_1 x_k x_n is a square-free 3-letter word, but χ(x_1 x_k x_n) = rr′yy′t′t = ry′yy′yt, contrary to the hypothesis of the lemma.


• If |r′| > |y′|, then χ(x_1) = ry′r″ where r″ ≠ ε and χ(v_1) = r″r‴. Since |χ(x_1)| = |χ(v_1)| = d, also r‴ ≠ ε. Now χ(x_1 v_1 x_1) = ry′r″ r″r‴ ry′r″ is not square-free, so x_1 = v_1. Thus we can rewrite χ(x_1) = r″qr″ for some q ≠ ε, because otherwise not even χ(x_1) is square-free. But r″ is also a prefix of χ(u), so r″qr″r″ is a prefix of χ(x_1 u), contradicting the minimality of x.

• If |r′| < |y′|, then χ(x_k) = yr′y″ where y″ ≠ ε and χ(u_1) = y″y‴. Since |χ(x_k)| = |χ(u_1)| = d, also y‴ ≠ ε. Now χ(x_k u_1 x_k) = yr′y″ y″y‴ yr′y″ is not square-free, so x_k = u_1. Thus we can rewrite χ(x_k) = y″qy″ for some q ≠ ε, because otherwise not even χ(x_k) is square-free. But y″ is also a prefix of χ(v), so y″qy″y″ is a prefix of χ(x_k v), contradicting the minimality of x.

This proves the lemma. Note that χ : a → ab, b → cb, c → cd is square-free on all 2-letter words, but χ(abc) = abcbcd is not square-free. Therefore the minimal length |w| = 3 in the hypothesis of the lemma is optimal. □

Lemma 3.124 is a building block for the proof of the following result; see [107, Theorem 5].

Theorem 3.125. The square-free subshift (X, σ) on three or more letters has positive topological entropy.

As we mentioned in Theorem 3.122, there are no square-free sequences in two letters, but the Thue-Morse sequences are overlap-free. Their entropy, however, is zero: the word-complexity is known exactly; see (1.2).

Proof. The idea is to start with a square-free word x ∈ {0, 1, 2}^n, from which we can create 2^n different square-free words in a 6-letter alphabet A = {a, a′, b, b′, c, c′} by replacing occurrences of 0 by a or a′, occurrences of 1 by b or b′, and occurrences of 2 by c or c′, all independently. Let y be any of the resulting words, and apply the following length-22 substitution to y:

χ :  a  → 0102012021012102010212,
     a′ → 0102012021201210120212,
     b  → 0102012101202101210212,
     b′ → 0102012101202120121012,
     c  → 0102012102010210120212,
     c′ → 0102012102120210120212.

By Lemma 3.124 and because χ is injective, this produces 2^n square-free 22n-letter words in {0, 1, 2}*. Hence the topological entropy h_top(X, σ) ≥ (1/22) log 2 ≈ 0.0315 > 0.


In addition, the proof in [107] also produces an upper bound, by remarking that there are 1,172 square-free 24-letter words starting with 01. Combined with the six square-free 2-letter words, there are altogether 6 · 1,172 square-free 24-letter words, and they can be extended in at most 1,172 ways to a square-free 46-letter word, etc. This gives p(n) ≤ 6 · 1,172^{n/22}, so that h_top(X, σ) ≤ (1/22) log 1,172 ≈ 0.321. □

Other methods have been designed than the one in this proof; see e.g. [60, 114, 235, 366, 472, 518]. If p(n) indicates the number of square-free words in {0, 1, 2}^n, then h_top(X, σ) = lim_n (1/n) log p(n). For the square-free subshift of {0, 1, 2}*, the most accurate estimate to date is h_top(X, σ) = log α for 1.3017597 < α < 1.3017619, see [503, 504], which also contain numerical estimates of the topological entropy of k-power-free shifts for various values of k and alphabet sizes.

Definition 3.126. If w is a finite word, its repetition exponent is the largest rational p/q such that there is a prefix v of w such that w is a prefix of v^∞ and |w| = (p/q)|v|. If x is an infinite word, then the critical exponent of x is the supremum of the repetition exponents of all its subwords w.

As such, the Thue-Morse sequence has critical exponent 2, and this is the smallest critical exponent of any sequence in {0, 1}^ℕ. For general finite alphabets, we have Dejean's Theorem:

Theorem 3.127 (Dejean's Theorem). The least critical exponent of x ∈ {0, ..., N − 1}^ℕ is

    7/4         if N = 3,
    7/5         if N = 4,
    N/(N − 1)   if N = 2 or N ≥ 5.

The proof was completed in a list of articles [173, 189, 427, 470]. See [467, 468] for related results. This raises the question of the topological entropy of fractional repetition-free subshifts. For example, in [343] it is shown that the 7/3-rd repetition-free subshift over 3 letters has polynomial word-complexity, whereas k-repetition-free shifts have positive entropy if k > 7/3.
In fact [343, Theorems 7 and 11], the word-complexity of the k-repetition-free language satisfies

    p(n) = O(n^{log_2 25})                         if 2 < k ≤ 7/3,
    0 < lim sup_n (log p(n))/n ≤ (1/63) log 2      if k > 7/3.

Furthermore, a two-sided infinite word is k-overlap-free for some 2 < k ≤ 7/3 if and only if the set of all its subwords belongs to L(ρ_TM).
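The counting function p(n) for ternary square-free words, and the two entropy bounds from the proof of Theorem 3.125, can be reproduced numerically for small n. The sketch below is a plain breadth-first enumeration (not the method of [107] or [503, 504]); the function names are illustrative, and log denotes the natural logarithm, which is the convention that reproduces the value 0.321 quoted above.

```python
import math


def ends_with_square(w):
    """True if some suffix of w is a square uu with u non-empty."""
    n = len(w)
    return any(w[n - 2 * l:n - l] == w[n - l:] for l in range(1, n // 2 + 1))


def count_square_free(max_len, alphabet="012"):
    """Return [p(1), ..., p(max_len)], the numbers of square-free words.

    Every square-free word of length n+1 extends a square-free word of
    length n, so it suffices to test the newly created suffixes.
    """
    counts = []
    level = [""]
    for _ in range(max_len):
        level = [w + a for w in level for a in alphabet
                 if not ends_with_square(w + a)]
        counts.append(len(level))
    return counts


def entropy_bounds():
    """Lower and upper bounds (1/22) log 2 and (1/22) log 1172."""
    return math.log(2) / 22, math.log(1172) / 22


if __name__ == "__main__":
    counts = count_square_free(24)
    lower, upper = entropy_bounds()
    for n, p in enumerate(counts, start=1):
        print(n, p, round(math.log(p) / n, 4))
    print("bounds:", round(lower, 4), round(upper, 4))
```

The finite-n values (1/n) log p(n) decrease slowly towards log α, so short enumerations only give a rough upper estimate of the entropy.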


Cassaigne [145] discovered that for overlap-free shifts, there is no γ ≥ 1 such that p(n) ≈ n^γ, because the behavior is different along different subsequences. The following theorem comes from [337].

Theorem 3.128. Let (X_OF, σ) be the overlap-free shift, and p_OF(n) its word-complexity. Then:

• lim inf_{n→∞} log p_OF(n) / log n ∈ [1.2690, 1.2736].
• lim sup_{n→∞} log p_OF(n) / log n ∈ [1.3322, 1.3326].
• The ratio log p_OF(n) / log n has a limit as n → ∞ along some subsequence of density 1, and this limit belongs to the interval [1.3005, 1.3098].

In addition to the word-frequency, several authors also study the minimal frequency

    f(a) = lim inf_{n→∞} (1/n) |x_1 ··· x_n|_a

of a letter a ∈ A for k-repetition-free sequences x. This turns out to be a non-trivial number, at least for the minimal alphabet-size that allows a k-power-free shift. For example, [530] computes that within square-free sequences in {0, 1, 2}^ℕ, f(0) ≥ 0.2746.... For k-repetition-free sequences with 2 < k ≤ 7/3, it was shown in [367] that f(0) = 1/2, which is in agreement with the above result that all subwords belong to L(ρ_TM).

Proposition 3.129. Power-free and overlap-free subshifts are not sofic.

Proof. If (X, σ) were sofic, then there would be a finite edge-labeled transition graph representing X; see Theorem 3.32. But then we can pass a loop arbitrarily often, creating powers of any order. (This is basically the Pumping Lemma 7.9 from Section 7.2.2.) □

Since k-power-free shifts don't contain blocks 0^k, they are not S-gap or spacing shifts either. Specifically, because power-free shifts contain no periodic sequences, we can ask whether power-free shifts are minimal. The entire k-power-free shift is not minimal, because it contains non-recurrent sequences, but there exist minimal k-power-free subshifts for any value of k > 0. Naturally this holds for the Thue-Morse shift, and by Lemma 3.123 other overlap-free shifts are obtained by performing substitutions to (X_TM, σ). Theorem 4.4 in Section 4.1 shows that linearly recurrent shifts with constant L are (L+1)-power-free. Sturmian shifts are also k-power-free for k sufficiently large if and only if their frequencies are of bounded type; see Example 4.46 in Section 4.2.5.


3.10. Dyck Shifts

Dyck³⁴ shifts first appeared in a paper by Krieger [373], whose interest was partially to give examples of subshifts with multiple measures of maximal entropy.

Definition 3.130. A Dyck shift (X, σ) is a two-sided shift on an alphabet of k types of bracket pairs, such that every v ∈ L(X) can be extended to a word v′ in such a way that all opening/closing brackets in v are closed/opened in v′ and every two distinct pairs of opening and closing brackets are unlinked in v′.

For example, ( ) [ ] is a legal word, as is ( ( ( ) [ ] [, but ( [ [ ) is illegal because the ( bracket is not allowed to be closed before the [ brackets are. Every Dyck shift has positive entropy, because it contains the coded shift with code words () and (()). However, a Dyck shift with at least two pairs of brackets is not synchronized. Indeed, there is no way that any word v can synchronize so that both ([v)] and [(v]) become admissible. On the other hand, the Dyck shift is coded; see [450, Example 5.5]. Indeed, let C be the collection of all the well-formed expressions with brackets where each opening bracket is closed, without linking. In the terminology of groups generated by the brackets, these are the expressions that reduce to the identity if each pair of brackets ( ) = [ ] = ··· = Id.

Example 3.131. The language of the Dyck shift with one pair of brackets is isomorphic to the language of the full shift on two symbols (with entropy log 2), because every word v ∈ {(, )}* can be extended by brackets on the left to supply brackets ( for every unopened ) and on the right to supply ) for every unclosed (. The collection L_ext of such extended words in which every opening bracket is closed, and vice versa, without illegally linked pairs, has a representation with a countable automaton; see Figure 3.20. It can also be represented as a push-down automaton (see Section 7.2.2), where we put or remove a plate on/from the stack whenever we read an opening/closing bracket.

[Figure 3.20: diagram with edges labeled ( and ).]

Figure 3.20. A countable automaton for the two bracket Dyck shift.

³⁴ Named after Walther von Dyck (1856–1934) who, being a student of Klein, was more interested in group theory.
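The push-down description suggests a direct admissibility test for finite words. The sketch below rests on an assumption drawn from Definition 3.130, not on a statement in the text: a word belongs to the language exactly when a left-to-right stack scan never meets a closing bracket whose most recent unclosed opening bracket is of another type. Closing brackets met with an empty stack, and opening brackets left on the stack at the end, can be matched by extending the word to the left and right. The function name and the bracket table are illustrative.

```python
# Closing bracket -> matching opening bracket (two bracket types here).
PAIRS = {")": "(", "]": "["}


def is_dyck_admissible(word, pairs=PAIRS):
    """Stack scan for membership in the language of the Dyck shift."""
    stack = []
    for symbol in word:
        if symbol in pairs:
            # A closing bracket: with an empty stack it can be opened
            # further to the left; otherwise it must match the top.
            if stack and stack.pop() != pairs[symbol]:
                return False
        else:
            stack.append(symbol)
    return True


if __name__ == "__main__":
    for w in ("()[]", "((()[][", "([[)"):
        print(w, is_dyck_admissible(w))
```

The scan uses one stack operation per symbol, which mirrors the plate-on-a-stack picture of the push-down automaton.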


Exercise 3.132. Show that the number of well-formed expressions in L_ext of length 2n is C_n = \frac{1}{n+1}\binom{2n}{n}, i.e. the n-th Catalan number.

The generating function of the Catalan numbers (with C_0 = 1 by convention) is

    G_Cat(x) := \sum_{n=0}^{\infty} C_n x^n = \frac{1 - \sqrt{1 - 4x}}{2x} = \frac{2}{1 + \sqrt{1 - 4x}}.

More precise asymptotics are

    C_n = \frac{4^n}{\sqrt{\pi}} \left( n^{-3/2} - \frac{9}{8} n^{-5/2} + \frac{145}{128} n^{-7/2} + \cdots \right),

so the entropy of the Dyck shift with one pair of brackets is indeed log 2, just as in the full shift of two symbols: The allowed 2n-words are a small fraction of all 2n-words, but not an exponentially small fraction.

To compute the entropy of the Dyck shift with k types of bracket pairs, we obtain the well-formed expressions of length 2n by starting with the well-formed expressions with one pair of brackets and then, for each joined pair of open-and-closing brackets, choosing one of the k bracket types independently. Thus there are C_n k^n well-formed expressions with k types of bracket pairs, and the entropy is ≥ log 2 + (1/2) log k. This is only a lower bound, because not every 2n-word in this Dyck shift is a well-formed expression. The topological entropy is really log(k + 1), which follows from the next result by Krieger [373].

Theorem 3.133. The Dyck shift (X, σ) on k ≥ 2 types of bracket pairs has exactly two ergodic measures of maximal entropy log(k + 1), and each one is fully supported and isomorphic to a Bernoulli shift.

The notion of isomorphism between measures, as well as the techniques used in the proof, are discussed in detail in Sections 6.5 and 6.1.

Proof. Let B_− ⊂ X be the set of all sequences in which every left bracket has a corresponding right bracket, and let B_+ be the set of all sequences in which every right bracket has a corresponding left bracket. Note that B_+ and B_− are shift-invariant. One can show that every shift-invariant measure has μ(B_− ∪ B_+) = 1 by partitioning the complement into a countable collection of disjoint sets indexed by the location of the first/last left/right bracket with no partner. Define a map π_+ : B_+ → {0, 1, ..., k}^ℤ by sending the k left brackets to the symbols {1, ..., k} and sending every right bracket to the symbol 0. Then π_+ is an isomorphism between the two shift spaces because every right


bracket has a corresponding left bracket, and hence its identity is uniquely determined by the rules of the shift. Similarly, the analogous map π_− : B_− → {0, 1, ..., k}^ℤ is an isomorphism. Because every ergodic invariant measure on X is supported on either B_− or B_+, we conclude that h_top(X, σ) = log(k + 1) and that there are exactly two ergodic measures of maximal entropy μ_± = ν ∘ π_±, where ν is the Bernoulli measure on the full shift on k + 1 symbols that gives equal weight to all symbols. Each of these measures gives positive measure to every open set in X.

Finally, note that if k = 1, then B_+ and B_− largely overlap. If we let ν be the (1/2, 1/2)-Bernoulli measure, then by the Law of Large Numbers, the mass is concentrated on sequences with zeros and ones occurring with frequency 1/2, so that the number of opening brackets and closing brackets is asymptotically the same and μ_+ = μ_−. □

This and the next result from [373] have been shown in simplified form in the Math Blog of Climenhaga [157].

Proposition 3.134. The set of ergodic measures for the Dyck shift is arc-wise connected but is not dense in the Choquet simplex of invariant measures (see Section 6.1 for the definition).

Proof. Let M_± denote the set of ergodic measures supported on B_±. By the isomorphism in the previous proof, each of M_+ and M_− is arc-wise connected. Moreover, because B_+ ∩ B_− is a non-empty closed invariant subset of X, it supports at least one ergodic measure; hence M_+ ∩ M_− ≠ ∅. This shows arc-wise connectedness of the set of ergodic measures M_erg.

To see that M_erg is not dense in the Choquet simplex M (i.e. M is not a Poulsen simplex), let ν_1 be the δ-measure supported on the fixed point ...[[[..., and let ν_2 be the δ-measure supported on the fixed point ...)))... . Let ν = (1/2)(ν_1 + ν_2). Then any ergodic measure μ close to ν in the weak* topology must give mass close to 1/2 to each of the 1-cylinders corresponding to [ and ), and almost no mass to the 1-cylinders corresponding to ] and (. Thus if x is a typical³⁵ point for μ, most symbols in x are [ and ). However, the Dyck shift does not contain such x, because the symbol ) cannot appear until all the preceding symbols [ have been closed with the corresponding symbol ]. This contradiction shows that ν cannot be approximated by ergodic measures in the weak* topology, so M_erg is not dense in M. □

³⁵ In the sense of the Birkhoff Ergodic Theorem 6.13.
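The counting arguments of this section (Exercise 3.132, the lower bound C_n k^n, and the entropy log(k + 1) of Theorem 3.133) can be explored by brute force for short words. The following sketch encodes an opening bracket of type i as the integer i and the matching closing bracket as −i; this encoding, the function names, and the stack-scan characterization of admissible words are illustrative assumptions, not notation from the text.

```python
from itertools import product
from math import comb, log


def admissible(word):
    """Word of the Dyck shift: no mismatched closing bracket in a stack scan."""
    stack = []
    for s in word:
        if s > 0:
            stack.append(s)
        elif stack and stack.pop() != -s:
            return False
    return True


def balanced(word):
    """Well-formed expression: every bracket is matched inside the word."""
    stack = []
    for s in word:
        if s > 0:
            stack.append(s)
        elif not stack or stack.pop() != -s:
            return False
    return not stack


def count(n, k, predicate):
    """Number of length-n words over k bracket pairs satisfying predicate."""
    alphabet = [t for i in range(1, k + 1) for t in (i, -i)]
    return sum(1 for w in product(alphabet, repeat=n) if predicate(w))


def catalan(n):
    """n-th Catalan number C_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


if __name__ == "__main__":
    k = 2
    for n in range(1, 9):
        p = count(n, k, admissible)
        print(n, p, round(log(p) / n, 4), "target", round(log(k + 1), 4))
```

The enumeration visits (2k)^n words, so it is only practical for small n. The printed values (1/n) log p(n) approach log(k + 1) slowly, which is consistent with a sub-exponential correction factor.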

Figure 3.21. A three-dimensional 'heterogeneous' baker transformation.

Example 3.135. A piecewise affine and 'heterogeneous' (i.e. stable manifolds don't have the same dimension at every point) hyperbolic map F : [0, 1]^3 → [0, 1]^3 (see Figure 3.21) is defined as

    F(x, y, z) = (4x − 2, y/2, (1 + z)/2)          if (x, y, z) ∈ A,
                 (4x − 3, (1 + y)/2, (1 + z)/2)    if (x, y, z) ∈ B,
                 (2x, 2y, z/4)                     if (x, y, z) ∈ C,
                 (2x, 2y − 1, (1 + z)/4)           if (x, y, z) ∈ D,

and primes indicate the F-images of each of these four boxes; see [483]. That is, the two pizza-box shaped regions A and B are mapped into shoe-boxes A′ and B′, and the shoe-box shaped regions C and D are mapped into pizza-boxes C′ and D′. The partition into these four boxes is not a Markov partition because of the heterogeneity of the hyperbolicity. The symbolic shift (X, σ) associated with this partition (i.e. a subshift of {A, B, C, D}^ℤ) is not an SFT. For example, AC can be followed by D but AAC cannot be followed by D. However, (X, σ) is the Dyck shift with two types of brackets, by the sliding block code

    A → (,    B → [,    C → ),    D → ].

Consequently, it is a context-free subshift (see Section 7.2.2) and, as noted after Definition 3.130 for Dyck shifts with at least two pairs of brackets, coded but not synchronized.
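The recoding of Example 3.135 can be checked mechanically on short itineraries. This is a minimal sketch under the assumption that admissibility of a bracket word is decided by a stack scan (a closing bracket may not meet an unclosed opening bracket of the other type); the names CODE, OPENING, admissible and itinerary_allowed are illustrative.

```python
# Letter-to-letter code from Example 3.135: A -> (, B -> [, C -> ), D -> ].
CODE = str.maketrans({"A": "(", "B": "[", "C": ")", "D": "]"})
OPENING = {")": "(", "]": "["}


def admissible(brackets):
    """Stack scan: reject only a closing bracket that meets the wrong opener."""
    stack = []
    for s in brackets:
        if s in OPENING:
            if stack and stack.pop() != OPENING[s]:
                return False
        else:
            stack.append(s)
    return True


def itinerary_allowed(word):
    """Recode a word over {A, B, C, D} and test it in the Dyck shift."""
    return admissible(word.translate(CODE))


if __name__ == "__main__":
    for w in ("ACD", "AACD"):
        print(w, w.translate(CODE), itinerary_allowed(w))
```

The two sample words are the ones from the example: AC followed by D recodes to ()], while AAC followed by D recodes to (()], where ] meets an unclosed (.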

Chapter 4. Subshifts of Zero Entropy

Circle rotations and more generally interval exchange transformations are zero entropy dynamical systems, whose symbolic versions are well-studied subshifts. Substitution shifts are another major class of zero entropy subshifts that were studied also before their role as symbolic description of dynamical systems (e.g. translations on Rauzy fractals) became apparent. In this chapter we also discuss adding machines (odometers) although these are not subshifts, but they are at the core of Toeplitz shifts and B-free shifts.

4.1. Linear Recurrence

Definition 4.1. A subshift (X, σ) is linearly recurrent if there is L ∈ ℕ such that for every w ∈ L(X) and x ∈ X, there is 0 < k ≤ L|w| such that σ^k(x) ∈ [w].

That is, every word w ∈ L(X) reoccurs with gap ≤ L|w|. This notion is stronger than uniformly recurrent, in that it relates the N = N(U) in the definition of uniform recurrence (in the case that U is a cylinder set) in a uniform way to the length of U. An equivalent definition, in terms of shift-invariant measures, is given in Lemma 6.30. Examples of minimal shifts that are not linearly recurrent can be found among the Sturmian shifts, i.e. symbolic versions of circle rotations; see Section 4.3.3. To be precise, a Sturmian shift is linearly recurrent if and only if its associated rotation number is of bounded type; see Section 8.4.

Definition 4.2. Given u ∈ L(X), we call w a return word for u if
• u is a prefix and suffix of wu but u does not occur elsewhere in wu;
• wu ∈ L(X).
We denote the collection of return words of u by R_u.


In other words, we can write every recurrent point x ∈ [u] as

(4.1)    x = w_1 w_2 w_3 w_4 w_5 w_6 ··· = u w′_1 u w′_2 u w′_3 u w′_4 u w′_5 u w′_6 ···,

where u w′_j = w_j ∈ R_u for each j ∈ ℕ, and the only appearances of u are as prefix and suffix of w_j u, j ≥ 1. If (X, σ) is minimal (and hence u appears with bounded gaps), then R_u is finite.

Example 4.3. Construct ρ ∈ {0, 1}^ℕ by setting ρ_1 = 0, ρ_2 = 1, and recursively, for k ≥ 1,

    ρ_{S_k + 1} ··· ρ_{S_{k+1}} = ρ_1 ··· ρ_{S_{k−1}},

for the Fibonacci numbers S_0, S_1, S_2, S_3, ... = 1, 2, 3, 5, ... This gives ρ = 01 0 01 010 01001 01001010 0100101001001 ··· . (This sequence is in fact the fixed point of the Fibonacci substitution of Example 4.6.) If u = 010010, then w = 010 ∈ R_u because wu = 010|010010 starts and ends with u (even though these occurrences of u overlap). Note that it is therefore possible that w ∈ R_u is shorter than u. However, at least one of the return words has to be longer than u because otherwise u always returns in ρ with gap ≤ n = |u| and therefore p(n) ≤ n and ρ is periodic by Proposition 1.12.

The following result is due to Durand, Host & Skau [225].

Theorem 4.4. Let the subshift (X, σ) be non-periodic and linearly recurrent with constant L. Then:
(i) The word-complexity is sublinear: p(n) ≤ Ln for all n ∈ ℕ, so h_top(X, σ) = 0.
(ii) X is (L+1)-power-free; i.e. u^{L+1} ∉ L(X) for any word u ≠ ε.
(iii) For all w ∈ R_u, |u| < L|w|.
(iv) #R_u ≤ L(L + 1)^2.
(v) Every factor (Y, σ) of (X, σ) is linearly recurrent¹.

¹ As shown in [221, Theorem 1], a linearly recurrent subshift has, up to isomorphism, only finitely many different factors.

Proof. (i) Linear recurrence implies that for every n ∈ ℕ and every word u ∈ L_n(X) and x ∈ X, the occurrence frequency of u in x satisfies

    lim inf_{k→∞} (1/k) #{1 ≤ i ≤ k : x_i ··· x_{i+n−1} = u} ≥ 1/(Ln).

Therefore there is no space for more than Ln words of length n.

(ii) If v ∈ L_n(X), then the gap between two occurrences of v is ≤ Ln, so every word u of length (L + 1)n − 1 contains v at least once. If v^{L+1} ∈ L(X), then all words of length n are cyclic permutations of v because the


gap between any other words of length n becomes too large otherwise; cf. Proposition 1.12. But then X is periodic.

(iii) Take u ∈ L(X) and w ∈ R_u. If |u| ≥ L|w|, then the word wu (which starts and ends with u) must have w^{L+1} as prefix. This contradicts (ii).

(iv) Take u ∈ L(X) and v ∈ L(X) of length (L + 1)^2 |u|. By the proof of (ii), every word of length ≤ (L + 1)|u| occurs in v and, in particular, every return word w ∈ R_u occurs in v. Now return words in v don't overlap, see (4.1), so using the minimal length |w| ≥ |u|/L of return words (from item (iii)), we find #R_u ≤ |v|/(|u|/L) = L(L + 1)^2.

(v) Finally, suppose that the subshift (Y, σ) over alphabet B is a factor of (X, σ) and f : A^{2N+1} → B is the corresponding sliding block code, so 2N + 1 is its window size. Take u ∈ L(X) of length |u| ≥ 2N + 1 and v its image under f. Then |v| = |u| − 2N. If w ∈ R_v, then |w| ≤ max{|s| : s ∈ R_u} ≤ L|u| ≤ (|v| + 2N)L ≤ (2N + 1)|v|L. Therefore Y is linearly recurrent with constant (2N + 1)L.

The bound (2N + 1)L is not the sharpest. One can show that for every ε > 0, there is L_0 such that for all n ≥ L_0, x ∈ X, and v ∈ L_n(X), the gap between two occurrences of v in x is at most (L + ε)n. □
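Return words and word-complexity are easy to inspect on a long prefix of the sequence ρ from Example 4.3. The sketch below builds the prefix with the recursion of that example (each new prefix is the previous one followed by the one before it) and reads off return words from consecutive occurrences of u. The function names are illustrative, and a finite prefix can only reveal the return words and factors that actually occur in it.

```python
def fibonacci_word(min_len):
    """Prefix of rho of length >= min_len, built as in Example 4.3."""
    prev, cur = "0", "01"          # prefixes of lengths S_0 = 1, S_1 = 2
    while len(cur) < min_len:
        prev, cur = cur, cur + prev
    return cur


def occurrences(x, u):
    """Starting positions of all (possibly overlapping) occurrences of u."""
    return [i for i in range(len(x) - len(u) + 1) if x.startswith(u, i)]


def return_words(x, u):
    """Words between consecutive occurrences of u in the finite word x."""
    pos = occurrences(x, u)
    return {x[i:j] for i, j in zip(pos, pos[1:])}


def complexity(x, n):
    """Number of distinct factors of length n seen in the finite word x."""
    return len({x[i:i + n] for i in range(len(x) - n + 1)})


if __name__ == "__main__":
    rho = fibonacci_word(1000)
    print(sorted(return_words(rho, "010010")))
    print([complexity(rho, n) for n in range(1, 11)])
```

For u = 010010 the return word 010 is shorter than u while 01001 is not, in line with the remark in Example 4.3 that the return words cannot all be short.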

4.2. Substitution Shifts

Definition 4.5. Let A = {0, ..., N − 1} be a finite alphabet. A substitution² χ is a map that assigns to every a ∈ A a single word χ(a) ∈ A*:

χ :  0 → χ(0),
     1 → χ(1),
     ⋮
     N − 1 → χ(N − 1),

and extends to A* (and to A^ℕ) by concatenation: χ(a_1 a_2 ··· a_r) = χ(a_1)χ(a_2) ··· χ(a_r). The substitution is of constant length if |χ(a)| is the same for every a ∈ A.

² Some authors use the word morphism for substitution. Formally, a morphism ψ : X → Y is a map for which ψ(xy) = ψ(x)ψ(y) holds, provided concatenations (or products) are properly defined on X and Y. The word substitution agrees better with our intuition of this concept, so we will use substitution.

Example 4.6. The Fibonacci substitution χ_Fib : 0 → 01, 1 → 0 acts as 0 → 01 → 010 → 01001 → 01001010 → 0100101001001 → ··· .


The lengths of χ^n(0) are exactly the Fibonacci numbers. The limit word ρ_Fib is also a Sturmian sequence, namely the one associated to the golden mean as rotation number; see Section 4.3.

Lemma 4.7. Assume that χ(a) is non-empty for every a ∈ A. Then for every a ∈ A, χ^n(a) tends to a periodic orbit of χ as n → ∞.

Proof. As can be seen in Example 4.6, if a is the first symbol of χ(a), then χ(a) is a prefix of χ^2(a), which is a prefix of χ^3(a), etc. Therefore χ^n(a) tends to a fixed point of χ as n → ∞. Since #A = N, there must be p < r ≤ N such that χ^p(a) and χ^r(a) start with the same symbol b. Now we can apply the above argument to χ^{r−p} and b. □

Example 4.8. Take χ(0) = 10 and χ(1) = 1. Then
    0 → 10 → 110 → 1110 → 11110 → ··· → 1^∞, fixed by χ;
    1 → 1, fixed by χ.
The second line of this example is not interesting, so we will usually make the assumption

(4.2)    lim_{n→∞} |χ^n(b)| = ∞    for all b ∈ A.
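Definition 4.5 and Examples 4.6 and 4.8 translate directly into a few lines of code. In this sketch a substitution is stored as a dictionary from letters to words; the names apply_substitution, iterate, FIB and EX48 are illustrative.

```python
def apply_substitution(chi, word):
    """Extend chi from letters to words by concatenation."""
    return "".join(chi[a] for a in word)


def iterate(chi, word, n):
    """Return chi^n(word)."""
    for _ in range(n):
        word = apply_substitution(chi, word)
    return word


FIB = {"0": "01", "1": "0"}     # Fibonacci substitution, Example 4.6
EX48 = {"0": "10", "1": "1"}    # Example 4.8: |chi^n(1)| stays bounded


if __name__ == "__main__":
    for n in range(7):
        w = iterate(FIB, "0", n)
        print(n, len(w), w)
    print(iterate(EX48, "0", 4), iterate(EX48, "1", 4))
```

Because χ_Fib(0) starts with 0, each iterate is a prefix of the next, which is the mechanism used in the proof of Lemma 4.7. In Example 4.8 the letter 1 violates assumption (4.2), since its iterates never grow.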

Example 4.9. Recall the Thue-Morse sequences ρ_0 and ρ_1 from Example 1.6. Applying the sliding block code f([01]) = f([10]) = 1 and f([00]) = f([11]) = 0, the images of ρ_0 and ρ_1 are the same:

(4.3)    ρ_feig = 10 11 1010 10111011 1011101010111010 ···

which is the fixed point of the period doubling or Feigenbaum substitution

(4.4)    χ_feig :  0 → 11,
                   1 → 10.

This sequence appears as the kneading sequence (i.e. itinerary of the critical value) of the (infinitely renormalizable) Feigenbaum interval map; see Section 4.7.1. It is also a Toeplitz sequence; see Example 4.86.

Example 4.10. The paper folding sequence is obtained by taking a strip of paper, folding it (by a 180° right turn), folding it over once more (by a 180° right turn), and again, etc. Then, after unfolding the paper strip again, write down the traces of the folds on the paper: 1 for a right-turn fold, 0 for a left-turn fold; see Figure 4.1.

Figure 4.1. Folding a paper strip: 1 = right-turn, 0 = left-turn.

The folding patterns this creates are

    1
    110
    1101100
    1101001110100
    110100111010011101000110100
    1101001110100111010001101001110100111010001101000110100
    ⋮

Since the n-th stage sequence is always a prefix of the (n+1)-st stage sequence, there is a well-defined limit x. In folded form, the strip of paper looks much like the zero-composant of the so-called Knaster continuum³; see Figure 4.2 (left)⁴. Note that after the central right-turn, the second half of the strip follows the first half in the opposite direction. This explains the palindromic anti-symmetry x_{2^n − k} = 1 − x_{2^n + k} for all n ≥ 1 and 1 ≤ k < 2^n. If we untighten the folded paper half-way in such a way that all 180° angles become 90° angles, then a fractal called Heighway dragon⁵ appears; see Figure 4.2 (right).

³ Also called bucket handle.
⁴ This is also the standard planar embedding of the inverse limit space of the tent map with slope 2 or of the unimodal Chebyshev polynomial Q_2(x) = 4x(1 − x).
⁵ After the NASA physicist John Heighway; see e.g. [529] for a bit of the story and more mathematical background and references.

Figure 4.2. The Knaster continuum and the Heighway dragon.

The paper-folding sequence is also generated by the block substitution

χ_pf :  11 → 1101,                        χ :  3 → 31,
        10 → 1100,     equivalent to           2 → 30,
        01 → 1001,                             1 → 21,
        00 → 1000                              0 → 20

in a 4-letter alphabet. Each 2-letter block has enough information to determine what it looks like after one more fold; see Figure 4.1. The fixed point of χ_pf is

    ρ_pf = 11 01 1001 11001001 1101100011001001 11011001110010001101 ···,

which can be recoded to ρ_pf ≅ 31 21 3021 31203021 3121302031203021 ··· . Closer inspection shows that the 3's and 2's appear alternatingly, and the 1's and 0's appear in a pattern equal to ρ_pf itself.

Definition 4.11. A substitution subshift is any subshift (X, σ) that can be written as X = X_ρ = \overline{orb_σ(ρ)} where ρ is a fixed point (or periodic point) of a substitution satisfying (4.2).

Lemma 4.12. For every substitution with a non-shift-periodic fixed point ρ, the subshift (X_ρ, σ) has at least one asymptotic pair, i.e. a pair of distinct points x, y such that lim_n d(σ^n(x), σ^n(y)) = 0.

For example, the Feigenbaum substitution has an asymptotic pair (0ρ, 1ρ), and each other asymptotic pair is a backward shift of this pair. The Thue-Morse shift has two asymptotic pairs (0ρ_0, 1ρ_0) and (0ρ_1, 1ρ_1).

Lemma 4.12 holds for all infinite expansive subshifts, as can be derived from [362, Theorem 2.1], but the current setting allows a short proof.

Proof. Since X_ρ is not a collection of periodic orbits, there is some left-special word w and a ≠ b ∈ A such that χ^n(aw) is not a suffix of χ^n(bw) for every n ≥ 0 and vice versa. Therefore there are a′ ≠ b′ ∈ A, x ∈ X_ρ, and a sequence (k_n)_{n∈ℕ} such that

    σ^{k_n} ∘ χ^n(aw) → a′x    and    σ^{k_n} ∘ χ^n(bw) → b′x,

as n → ∞. Hence the limit words a′x and b′x are asymptotic. □
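Examples 4.9 and 4.10 each describe one sequence in two ways, and the descriptions can be compared on finite prefixes. The sketch below is illustrative only: fold implements the anti-symmetry x_{2^n − k} = 1 − x_{2^n + k} around a central 1, the 4-letter substitution is decoded through 3 → 11, 2 → 10, 1 → 01, 0 → 00, and the Thue-Morse letters are generated with the binary digit-sum parity rule (an assumption about indexing from 0). All function names are hypothetical.

```python
def substitute(chi, word, n):
    """Return chi^n(word) for a substitution given as a dict."""
    for _ in range(n):
        word = "".join(chi[a] for a in word)
    return word


def fold(stages):
    """Fold pattern after the given number of folds.

    The next stage is s, then a central 1, then s reversed with
    0 and 1 exchanged.
    """
    s = "1"
    for _ in range(stages - 1):
        flipped = "".join("1" if c == "0" else "0" for c in reversed(s))
        s = s + "1" + flipped
    return s


CHI_PF = {"3": "31", "2": "30", "1": "21", "0": "20"}
DECODE = {"3": "11", "2": "10", "1": "01", "0": "00"}
CHI_FEIG = {"0": "11", "1": "10"}


def paperfolding_prefix(n):
    """Binary recoding of chi^n(3); it has length 2^(n+1)."""
    return "".join(DECODE[c] for c in substitute(CHI_PF, "3", n))


def thue_morse(n):
    """First n Thue-Morse letters (parity of the binary digit sum)."""
    return [bin(i).count("1") % 2 for i in range(n)]


def feigenbaum_from_thue_morse(n):
    """Apply f([01]) = f([10]) = 1, f([00]) = f([11]) = 0 to rho_0."""
    t = thue_morse(n + 1)
    return "".join("1" if t[i] != t[i + 1] else "0" for i in range(n))


if __name__ == "__main__":
    print(fold(3), paperfolding_prefix(2))
    print(feigenbaum_from_thue_morse(15), substitute(CHI_FEIG, "1", 4))
```

Comparing fold(n + 1) with the first 2^(n+1) − 1 letters of paperfolding_prefix(n) gives a quick consistency check between the folding rule and the substitution.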




Lemma 4.13. Each one-sided substitution shift space (X_ρ, σ) admits a two-sided substitution shift extension.

Proof. By Lemma 4.7, we can assume that χ(a) starts with a. First define χ on two-sided sequences as

    χ(··· x_{−2} x_{−1} . x_0 x_1 x_2 x_3 ···) = ··· χ(x_{−2}) χ(x_{−1}) . χ(x_0) χ(x_1) χ(x_2) χ(x_3) ···,

where the central dot indicates where the zeroth coordinate is. To create a two-sided substitution shift, take some i > 1 such that ρ_i = a, and let a′ = ρ_{i−1}. Similar to the argument of Lemma 4.7, there is b ∈ A and p < q ∈ ℕ such that χ^p(a′) and χ^q(a′) both end in b. Set K = q − p, so χ^K(b) ends with b. Next iterate χ^K(b.a) repeatedly, so that lim_n χ^{nK}(b.a) =: ρ̂ is a two-sided fixed point of χ^K. Finally, set X̂_ρ = \overline{{σ^n(ρ̂) : n ∈ ℤ}}.

Even though ρ̂ need not be unique (because the choices of b and K are not unique), due to minimality (see below), the shift space X̂_ρ is unique. □

4.2.1. Primitive Substitutions.

Definition 4.14. The associated or transition matrix of a substitution χ is the matrix A = (a_{ij})_{i,j∈A} such that a_{ij} = |χ(j)|_i is the number of symbols i appearing in χ(j). We call χ aperiodic and/or irreducible if A is aperiodic and/or irreducible, in the sense of the Perron-Frobenius Theorem 8.58; see Definition 3.6. The substitution is primitive if it is both irreducible and aperiodic.

Equivalently, χ is irreducible if for every i, j ∈ A there exists n ≥ 1 such that i appears in χ^n(j). This way of writing the associated matrix (and not its transpose) ensures that composition of substitutions and composition of associated matrices work in the same way: A_{χ̃∘χ} = A_χ̃ · A_χ.

Lemma 4.15. Let a primitive substitution χ with χ(0) = 0 ··· have the fixed point ρ = 0 ··· and associated matrix A. Let v be the right eigenvector of the leading eigenvalue of A. If v is scaled so that Σ_i v_i = 1, then v_j = lim_n (1/n) #{1 ≤ i ≤ n : ρ_i = j} is the frequency of the j-th letter in ρ.

Proof.
Let u = (u_j)_{j∈A}, u_j = |w|_j/|w|, be the frequency vector of some word w = 0 ··· ∈ A* and let u′ be the frequency vector of χ(w). Then

    u′_i = (Σ_j a_{ij} u_j) / (Σ_{i,j} a_{ij} u_j),    so    u′ = f(u) := Au / ‖Au‖_1.

Since χ is primitive, the Perron-Frobenius Theorem 8.58 assures that f^n(u) converges to the leading eigenvector, which is therefore the frequency vector of the letters in the fixed point ρ = lim_n χ^n(w). □

Remark 4.16. By taking the associated matrix A instead of the substitution, we lose the order structure of the substitution words. For instance, the


Thue-Morse substitution χTM and the substitution χ : 0 → 01, 1 → 01 have the same associated matrix, but they behave entirely differently as subshifts. The associated matrix is called the abelianization of the substitution. A method to retain the order information is by taking matrices with power series as entries: Let x = (xa )a∈A be a formal vector and set |χ(j)|

aij (x) =

k=1

δχ(j)k ,i

k−1 

xχ(j) ,

=1

where $\delta_{a,b}$ is the Kronecker delta and an empty product is 1. Then $A(1, \dots, 1) = A$ and $A(x)$ satisfies the composition rule for substitutions: $A_{\chi\circ\psi}(x) = A_\chi(A^T_\psi(x)) \cdot A_\psi(x)$. See e.g. [132].

Theorem 4.17. Let $\chi$ be a substitution satisfying hypothesis (4.2). Assume that $\chi(a)$ starts with $a$, and let $\rho$ be the corresponding fixed point of $\chi$. Then the corresponding substitution subshift $(X_\rho, \sigma)$ is minimal if and only if for every $b \in A$ appearing in $\rho$, there is $k \geq 1$ such that $\chi^k(b)$ contains $a$.

Proof. If $X_\rho$ is minimal (i.e. uniformly recurrent according to Proposition 2.17), then every word, in particular $a$, appears with bounded gaps. Let $b$ be a letter appearing in $\rho$. Then $\chi^k(b)$ is a word in $\chi^k(\rho) = \rho$, and since $|\chi^k(b)| \to \infty$ by (4.2), $\chi^k(b)$ must contain $a$ for $k$ sufficiently large.

Conversely, let $k(b) = \min\{i \geq 1 : \chi^i(b) \text{ contains } a\}$ and $K = \max\{k(b) : b \text{ appears in } \rho\}$. Set $\Delta_b = \chi^{k(b)}(b)$ and decompose $\rho$ into blocks:
\[ \rho = \Delta_{\rho_1}\Delta_{\rho_2}\Delta_{\rho_3}\cdots = \rho_1\cdots\rho_{k(\rho_1)}\ \rho_{k(\rho_1)+1}\cdots\rho_{k(\rho_1)+k(\rho_2)}\ \rho_{k(\rho_1)+k(\rho_2)+1}\cdots. \]
By the choice of $k(\rho_j)$, each block $\Delta_{\rho_j}$ contains an $a$, so $a$ appears with gap $K$. Now take $w \in L(X_\rho)$ arbitrary. There exists $m \in \mathbb{N}$ such that $w$ appears in $\chi^m(a)$. By the above, $w$ appears in each $\chi^m(\Delta_{\rho_j})$ and hence $w$ appears with gap $\max_j |\chi^m(\Delta_{\rho_j})| = \max\{|\chi^{m+k(b)}(b)| : b \text{ appears in } \rho\}$. This proves the uniform recurrence of $\rho$. The minimality of the orbit closure $X_\rho$ follows from Corollary 2.22. □

Theorem 4.18 below shows that if $\chi$ is primitive, then $(X_\rho, \sigma)$ is linearly recurrent and hence of linear complexity ($p_\rho(n) \leq Ln$), but it doesn't exclude that $\rho$ is periodic. For instance,
\[ (4.5)\qquad \chi : \begin{cases} 0 \to 010, \\ 1 \to 101 \end{cases} \]
produces two fixed points $\rho^0 = (01)^\infty$ and $\rho^1 = (10)^\infty$. We call a substitution aperiodic if its fixed point $\rho$ is not periodic under the shift. Note


that this is different from 'the associated matrix of $\chi$ is aperiodic', so be aware of this unfortunate clash of terminology. A mild assumption dispenses with such periodic examples, and then $p_\rho(n) \geq n + 1$; see Proposition 1.12.

Theorem 4.18. Every primitive substitution shift is linearly recurrent.

We follow the exposition of Durand [221, 222] here; the paper [176] shows that for substitution shifts, linear recurrence is equivalent to minimality.

Proof. Let $\chi : A \to A^*$ be the substitution with fixed point $\rho$ and $(X_\rho, \sigma)$ the corresponding shift. Let
\[ S_k := \sup\{|\chi^k(a)| : a \in A\} \qquad\text{and}\qquad I_k := \inf\{|\chi^k(a)| : a \in A\}. \]
Note that $I_k \leq S_1 I_{k-1}$ and $I_1 S_{k-1} \leq S_k$ for all $k \in \mathbb{N}$. Since $\chi$ is primitive, for every $a, b \in A$ there exists $N_{a,b}$ such that $\chi^{N_{a,b}}(a)$ contains $b$. Therefore
\[ |\chi^k(b)| \leq |\chi^{k+N_{a,b}}(a)| \leq S_{N_{a,b}} |\chi^k(a)| \qquad\text{for all } k \in \mathbb{N}. \]
Hence, taking $N = \sup\{N_{a,b} : a, b \in A\}$, we find
\[ I_k \leq S_k \leq S_N I_k \qquad\text{for all } k \in \mathbb{N}. \]
Now let $u \in L(X_\rho)$ and $v \in R_u$ be arbitrary. Choose $k \geq 1$ minimal so that $|u| \leq I_k$. Therefore there exists a 2-word $ab \in L(X_\rho)$ such that $u$ appears in $\chi^k(ab)$. Let $R$ be the largest distance between two occurrences of any 2-word in $L(X_\rho)$. Then $R$ is finite by minimality of the shift. We have
\[ |v| \leq R S_k \leq R S_N I_k \leq R S_N S_1 I_{k-1} \leq R S_N S_1 |u|. \]
This proves linear recurrence with $L = R S_N S_1$. □



A more general result on complexity of substitutions (without the assumption of primitivity) is due to Pansiot [441–443].

Theorem 4.19. If $\chi : A \to A^*$ is a non-erasing (i.e. $\chi(a) \neq \epsilon$, the empty word, for every $a \in A$) substitution with $\chi(a) = au$ for some $a \in A$, $\epsilon \neq u \in A^*$, then the complexity of $\rho = \lim_n \chi^n(a)$ is one of the following:
(1) $p_\rho(n)$ is bounded (if $\rho$ is (pre)periodic).
(2) $p_\rho(n) \approx n$, including the primitive case.
(3) $p_\rho(n) \approx n \log\log n$.
(4) $p_\rho(n) \approx n \log n$.
(5) $p_\rho(n) \approx n^2$.
Here $p_\rho(n) \approx a(n)$ means that there is $C > 0$ such that $C^{-1} a(n) \leq p_\rho(n) \leq C a(n)$ for all $n$ sufficiently large.
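As a small numerical illustration of case (2), the following sketch counts the distinct subwords of a long prefix of the Fibonacci fixed point. The helper names `iterate` and `complexity` are hypothetical, and a finite prefix can only give a lower bound on $p_\rho(n)$; for the small $n$ used here the prefix is assumed long enough to contain every $n$-word.

```python
def iterate(sub, seed, steps):
    """Apply the substitution `sub` (a dict letter -> word) `steps` times to `seed`."""
    word = seed
    for _ in range(steps):
        word = "".join(sub[c] for c in word)
    return word


def complexity(word, n):
    """Number of distinct subwords of length n occurring in `word`."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


# Fibonacci substitution 0 -> 01, 1 -> 0 (primitive, aperiodic fixed point).
fib = {"0": "01", "1": "0"}
rho = iterate(fib, "0", 20)          # a prefix of the fixed point
counts = [complexity(rho, n) for n in range(1, 11)]
print(counts)
```

For this Sturmian example the counts grow like $n + 1$, which is the smallest growth compatible with an aperiodic sequence.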


Deviatov [197] extended these results to S-adic shifts; see Section 4.2.5.

Example 4.20. If we remove the non-erasing condition in the above theorem, then even more asymptotics for $p(n)$ become possible. Let $A = \{a, b_0, \dots, b_r\}$ for some $r \in \mathbb{N}$ and let $\chi : A \to A^*$ be given by
\[ \chi : \begin{cases} a \to a b_r, \\ b_k \to b_k b_{k-1}, & \text{for } k = 1, \dots, r, \\ b_0 \to b_0. \end{cases} \]
Then $\chi$ has a unique fixed point, which for e.g. $r = 3$ looks like
\[ \rho = a b_3 \,.\, b_3 b_2 \,.\, b_3 b_2 b_2 b_1 \,.\, b_3 b_2 b_2 b_1 b_2 b_1 b_1 b_0 \,.\, b_3 b_2 b_2 b_1 b_2 b_1 b_1 b_0 b_2 b_1 b_1 b_0 b_1 b_0 b_0 \cdots. \]
Set $v_i = \chi^i(a b_r)$ for $i \geq 0$. The dots separate the blocks $w_i$, where $w_0 = a b_r$ and $w_i$ is the suffix of $v_i$ of length $|v_i| - |v_{i-1}|$. Then the symbol $b_k$ appears exactly $\binom{i}{r-k}$ times in $w_i$. Next apply an erasing substitution $\tilde\chi : A \to \{0, 1\}^*$ given by
\[ \tilde\chi : \begin{cases} a \to \epsilon, \\ b_k \to 0, & \text{for } k = 0, \dots, r-1, \\ b_r \to 1 \end{cases} \]
to $\rho$. Then
\[ \tilde\rho := \tilde\chi(\rho) = 1.10^{n_1}.10^{n_2}.10^{n_3}.10^{n_4}.10^{n_5}\cdots \qquad\text{for}\qquad n_i = \sum_{k=1}^{r} \binom{i}{k} \approx i^r/r!. \]
It can be shown (see [68, Proposition 4.7.2]) that the complexity of $\tilde\rho$ is $p_{\tilde\rho}(n) \approx n \sqrt[r]{n}$.

Regarding the amorphic complexity of primitive constant length substitutions, Fuhrmann & Gröger [262] proved the following result:

Theorem 4.21. Let $\chi : \{0, 1\} \to \{0, 1\}^*$ be an aperiodic primitive substitution of constant length $\ell$, and let $(X, \sigma)$ be the associated subshift. Then the amorphic complexity
\[ \mathrm{ac}(\sigma) = \frac{\log \ell}{\log \ell - \log \ell^*} \qquad\text{for}\qquad \ell^* = \#\{1 \leq i \leq \ell : \chi(0)_i \neq \chi(1)_i\}. \]
In this theorem, $\mathrm{ac}(\sigma) = \infty$ is allowed if the denominator $\log \ell - \log \ell^* = 0$, as is the case with the Thue-Morse substitution.
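The formula of Theorem 4.21 is easy to evaluate mechanically. The sketch below does so; the function name `amorphic_complexity` is hypothetical, and the period-doubling substitution $0 \to 01$, $1 \to 00$ is an illustrative choice not taken from the text. The code only evaluates the formula and does not check the hypotheses (aperiodic, primitive) of the theorem.

```python
import math


def amorphic_complexity(chi0, chi1):
    """Evaluate log(l) / (log(l) - log(l*)) for a constant-length substitution
    on {0, 1} given by the two images chi0 = chi(0) and chi1 = chi(1)."""
    ell = len(chi0)
    if len(chi1) != ell:
        raise ValueError("substitution must have constant length")
    # l* counts the positions where the two images differ.
    ell_star = sum(1 for x, y in zip(chi0, chi1) if x != y)
    if ell_star == 0:
        raise ValueError("chi(0) = chi(1): formula not applicable")
    if ell_star == ell:
        return math.inf          # denominator vanishes, e.g. Thue-Morse
    return math.log(ell) / (math.log(ell) - math.log(ell_star))


thue_morse = amorphic_complexity("01", "10")
period_doubling = amorphic_complexity("01", "00")
print(thue_morse, period_doubling)
```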


4.2.2. Block and Return Word Substitutions. We can view substitutions also on the level of $\ell$-block shifts, as in Section 1.4. That is, we introduce a new alphabet $A_\ell$ having the $\ell$-words in $L_\ell(X)$ as letters, and we study $\chi$ as substitution $\chi_\ell : A_\ell \to A_\ell^*$. To this end, if $u = u_1 u_2 \cdots u_\ell \in A_\ell$ and $\chi(u) = w$ and $k = |\chi(u_1)|$, define
\[ \chi_\ell(u) = \bar w := \underbrace{w_1 \cdots w_\ell}_{\bar w_1}\ \underbrace{w_2 \cdots w_{\ell+1}}_{\bar w_2} \cdots \underbrace{w_k \cdots w_{\ell+k-1}}_{\bar w_k}. \]
That is, $\bar w_j$ is the $j$-th word of length $\ell$ inside $w = \chi(u)$. Note that $|\chi_\ell(u)|$ is equal to $|\chi(u_1)|$, which is not necessarily the same as the number of $\ell$-words that fit in $\chi(u)$. For example, if $\chi = \chi_{Fib} : 0 \to 01,\ 1 \to 0$ is the Fibonacci substitution on the alphabet $A = \{0, 1\}$, and $\ell = 3$, then the new alphabet is $A_3 = \{001, 010, 100, 101\} = \{a, b, c, d\}$ and
\[ \chi : \begin{cases} 001 \to 01010, \\ 010 \to 01001, \\ 100 \to 00101, \\ 101 \to 0010, \end{cases} \qquad \chi_3 : \begin{cases} a \to bd & \text{because } |\chi(u_1)| = 2, \\ b \to bc & \text{because } |\chi(u_1)| = 2, \\ c \to a & \text{because } |\chi(u_1)| = 1, \\ d \to a & \text{because } |\chi(u_1)| = 1, \end{cases} \]
with associated matrices
\[ A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \qquad\text{and}\qquad A_3 = \begin{pmatrix} 0 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix} \]
and eigenvalues $\frac12(1 \pm \sqrt5)$ and $\frac12(1 \pm \sqrt5), 0, 0$, respectively. For the second iterate:
\[ \chi^2 : \begin{cases} 001 \to 01001001, \\ 010 \to 01001010, \\ 100 \to 01010010, \\ 101 \to 0101001, \end{cases} \qquad \chi_3^2 : \begin{cases} a \to bca & \text{because } |\chi^2(u_1)| = 3, \\ b \to bca & \text{because } |\chi^2(u_1)| = 3, \\ c \to bd & \text{because } |\chi^2(u_1)| = 2, \\ d \to bd & \text{because } |\chi^2(u_1)| = 2. \end{cases} \]
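The construction of $\chi_\ell$ can be carried out mechanically. The following sketch recomputes the 3-block version of the Fibonacci substitution from the definition above; the function name `block_substitution` and the dictionary `names` are hypothetical helpers introduced only for this illustration.

```python
def block_substitution(sub, blocks, ell):
    """Map each ell-block u to the first |chi(u_1)| ell-words inside chi(u)."""
    result = {}
    for u in blocks:
        w = "".join(sub[c] for c in u)   # w = chi(u)
        k = len(sub[u[0]])               # k = |chi(u_1)|
        result[u] = [w[j:j + ell] for j in range(k)]
    return result


fib = {"0": "01", "1": "0"}
# Recoding of the 3-words of the Fibonacci language as letters a, b, c, d.
names = {"001": "a", "010": "b", "100": "c", "101": "d"}

chi3 = block_substitution(fib, names, 3)
coded = {names[u]: "".join(names[v] for v in image) for u, image in chi3.items()}
print(coded)
```

The recoded dictionary can be compared directly with the display of $\chi_3$ above.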



This example shows that powers of $\chi$ and powers of $\chi_\ell$ match: $f \circ \chi_\ell^n(x) = \chi^n \circ f(x)$ if $f$ is the transposition of words $x \in A_\ell^*$ into words in $A^*$.

Proposition 4.22. Let $\chi_\ell$ be the $\ell$-block version of the substitution $\chi$, with associated matrix $A_\ell$. If $\chi$ is a primitive substitution, then so is $\chi_\ell$, and the leading eigenvalue of $A_\ell$ is equal to the leading eigenvalue of $A_1$. The remaining eigenvalues of $A_\ell$ are the same as those of $A_2$, possibly with extra eigenvalues 0.

Proof. We follow the proof in [465, Section V.5]. Since $\chi : A \to A^*$ is primitive, every word $u$ appears in $\chi^n(a)$, $a \in A$, for $n$ sufficiently large. But then $\chi_\ell : A_\ell \to A_\ell^*$ is also a primitive substitution.


Let $\lambda$ and $\lambda_\ell$ be the leading eigenvalues of the associated matrices $A$ and $A_\ell$ of $\chi$ and $\chi_\ell$, respectively. Let $u = u_1 \cdots u_\ell$ be the "first" letter of the alphabet $A_\ell$, where $u_1 \cdots u_\ell$ is a prefix of $\chi(u_1 \cdots u_\ell)$. Then
\[ \lambda_\ell = \lim_{n\to\infty} \frac{|\chi_\ell^{n+1}(u)|}{|\chi_\ell^{n}(u)|} = \lim_{n\to\infty} \frac{|\chi^{n+1}(u_1 \cdots u_\ell)|}{|\chi^{n}(u_1 \cdots u_\ell)|} = \lambda. \]
Now for the remaining eigenvalues, fix $\ell \geq 3$ and take $p \in \mathbb{N}$ so that $|\chi^p(a)| \geq \ell - 1$ for each $a \in A$. This means that $\chi^p(w_1 w_2) = y_1 y_2 \cdots y_{|\chi^p(w_1 w_2)|}$ is long enough so we can properly define $\psi_\ell : A_2 \to A_\ell^*$,
\[ \psi_\ell(w_1 w_2) = \underbrace{y_1 y_2 \cdots y_\ell}_{\bar w_1 \in A_\ell}\ \underbrace{y_2 y_3 \cdots y_{\ell+1}}_{\bar w_2 \in A_\ell} \cdots \underbrace{y_{|\chi^p(w_1)|}\, y_{|\chi^p(w_1)|+1} \cdots y_{|\chi^p(w_1)|+\ell-1}}_{\bar w_{|\chi^p(w_1)|}}. \]
Conversely, let $\psi_2 : A_\ell \to A_2$, $\psi_2(w_1 w_2 \cdots w_\ell) = w_1 w_2$ be the reduction of $w \in A_\ell$ to its first two letters in $A$. Then the following diagram commutes:

[Commutative diagram relating $\chi_\ell^p$ and $\chi_\ell$ on $A_\ell^*$ to $\chi_2$ and $\chi_2^p$ on $A_2^*$ via the maps $\psi_2$ and $\psi_\ell$.]

That is, $\psi_\ell \circ \psi_2 = \chi_\ell^p$ and $\psi_2 \circ \psi_\ell = \chi_2^p$. Let $A$ and $\hat A$ be the abelianizations of $\chi_2$ and $\chi_\ell$, and let $D$ and $C$ be the abelianizations of $\psi_2$ and $\psi_\ell$. Then the commutative diagram translates to $A$ and $\hat A$ being shift equivalent with lag $p$; see Definition 3.25. By Lemma 3.27 they have the same eigenvalues up to possibly 0. □

Corollary 4.23. With the notation from the proof of Proposition 4.22, let $v$ be a right eigenvector of $A$. Then $v' = Cv$ is a right eigenvector of the associated matrix $\hat A$ for the same eigenvalue. In particular, if $v$ is the leading eigenvector of $\chi_2$, then the normalization of the vector $Cv$ is the frequency vector of the $\ell$-words in the fixed point of $\chi$.

Proof. Since $CA = \hat A C$, we have $\hat A v' = \hat A C v = C A v = C \lambda v = \lambda v'$. □
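As a numerical sanity check in the simplest case $\ell = 2$, the sketch below compares the normalized leading eigenvector of the matrix of $\chi_2$ for the Fibonacci substitution (as listed in Table 4.1) with the empirical frequencies of the 2-words 00, 01, 10 in a long prefix of the fixed point. The helper names are hypothetical, and plain power iteration is an assumed stand-in for any eigenvector routine.

```python
def iterate(sub, seed, steps):
    """Apply the substitution `steps` times to `seed`."""
    word = seed
    for _ in range(steps):
        word = "".join(sub[c] for c in word)
    return word


def power_iteration(matrix, steps=200):
    """Approximate the leading right eigenvector, normalized to sum 1."""
    n = len(matrix)
    v = [1.0 / n] * n
    for _ in range(steps):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        v = [x / total for x in w]
    return v


# Matrix of chi_2 : a -> bc, b -> bc, c -> a with a = 00, b = 01, c = 10.
A2 = [[0, 0, 1],
      [1, 1, 0],
      [1, 1, 0]]
theoretical = power_iteration(A2)

rho = iterate({"0": "01", "1": "0"}, "0", 20)
blocks = ["00", "01", "10"]
positions = len(rho) - 1
empirical = [sum(1 for i in range(positions) if rho[i:i + 2] == b) / positions
             for b in blocks]
print(theoretical, empirical)
```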



Example 4.24. In order to show that $A = A_1$ itself does not fully determine the eigenvalues of $A_\ell$, we consider
\[ \chi : \begin{cases} 1 \to 121, \\ 2 \to 212 \end{cases} \qquad\text{and}\qquad \psi : \begin{cases} 1 \to 112, \\ 2 \to 212; \end{cases} \]
see [3]. Clearly they both have the same associated matrix with eigenvalues 1 and 3. However, the fixed point of $\chi$ is a shift-periodic sequence $121212121\cdots$ and $\chi_\ell = \chi$ for each $\ell$ if we recode the two $\ell$-blocks by their first letters. For $\psi$ with $\{a, b, c, d, e, f\} = \{112, 212, 121, 211, 122, 221\}$ we have
\[ \psi_3 : \begin{cases} a \to abd, \\ b \to bcd, \\ c \to aef, \\ d \to bcd, \\ e \to aef, \\ f \to bef \end{cases} \qquad\text{with associated matrix}\qquad A_3 = \begin{pmatrix} 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 \end{pmatrix} \]
which has eigenvalues $3, 1, \frac12(1 \pm \sqrt3\, i), 0, 0$. In Table 4.1 we worked out some of the details for the Fibonacci substitution.

Rather than dividing words $x \in X_\rho$ into blocks of equal length, we can also divide $x$ into return words. Suppose that $\rho$ is the fixed point of a primitive substitution $\chi : A \to A^*$ and $u \neq \epsilon$ is a prefix of $\rho$. Then we can divide $\rho$ into blocks equal to the return words $v \in R_u$, simply by starting a new block at every next occurrence of $u$ in $\rho$. Let $\Theta_u : A_u := \{1, \dots, \#R_u\} \to R_u$ be a bijection such that $\Theta_u(1)$ is the first return word in this decomposition. Thus there is a sequence $\rho^u \in A_u^{\mathbb{N}}$ such that the concatenation $\Theta_u(\rho^u) = \rho$. The following results are due to Durand [219].

Lemma 4.25. Let $\chi : A \to A^*$ be a primitive substitution with fixed point $\rho$ and let $\Theta_u$, $\rho^u \in A_u^{\mathbb{N}}$ for some non-empty prefix $u$ of $\rho$ be as above. Then there is a primitive substitution $\chi_u : A_u \to A_u^*$ with fixed point $\rho^u$ such that
\[ \Theta_u \circ \chi_u = \chi \circ \Theta_u \qquad\text{and}\qquad \chi_u(\rho^u) = \rho^u. \]
Proof. If $u$ is a prefix of $w$ and $w$ is a prefix of $\rho$, then each return word in $R_w$ is a concatenation of return words in $R_u$. Let $v = \Theta_u(a) \in R_u$ be arbitrary. Then $u$ is a prefix of $vu$ and $\chi(u)$ is a prefix of $\chi(v)\chi(u)$ and of $\chi(u)$ (because $vu$ is a prefix of $\rho$). Now $\chi(v)$ is a subword of $\rho$ that starts with $u$ and is succeeded in $\rho$ by $u$. Therefore $\chi(v) = \Theta_u(a_1) \cdots \Theta_u(a_n)$ is some concatenation of return words in $R_u$. Hence, if we set $\chi_u(a) = a_1 \cdots a_n$ and do likewise for all $a \in A_u$, then $\Theta_u \circ \chi_u = \chi \circ \Theta_u$. By construction, $\chi_u(1)$ starts with 1, so $\lim_k \chi_u^k(1) = \rho^u$ and $\Theta_u(\rho^u) = \rho$.

Finally, if $\chi_u$ is not primitive, then there are $a, b \in A_u$ such that $\Theta_u(b)$ is not a subword of $\chi^k(\Theta_u(a)) = \Theta_u(\chi_u^k(a))$ for any $k \geq 1$. But this contradicts that $\chi$ is primitive. □

Table 4.1. Block substitutions of the Fibonacci substitution.

Case $\ell = 2$:
  $\chi$ on 2-words: $a := 00 \to 0101$, $b := 01 \to 010$, $c := 10 \to 001$.
  Leading left eigenvector: $(1, \gamma, \gamma)^T$.
  $\chi_2$: $a \to bc$, $b \to bc$, $c \to a$.
  Associated matrix of $\chi_2$: $\begin{pmatrix} 0 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \end{pmatrix}$.

Case $\ell = 3$, $p = 2$:
  $\chi$ on 3-words: $a := 001 \to 01010$, $b := 010 \to 01001$, $c := 100 \to 00101$, $d := 101 \to 0010$.
  Leading left eigenvector: $(1+\gamma, 1+2\gamma, 1+\gamma, \gamma)^T$.
  $\chi_3$: $a \to bd$, $b \to bc$, $c \to a$, $d \to a$.
  $\psi_3$: $a \to bca$, $b \to bca$, $c \to bd$.
  Associated matrices of $\chi_3$ and $\psi_3$: $\begin{pmatrix} 0 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}$ and $\begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$.

Case $\ell = 4$, $p = 3$:
  $\chi$ on 4-words: $a := 0010 \to 0101001$, $b := 0100 \to 0100101$, $c := 0101 \to 010010$, $d := 1001 \to 001010$, $e := 1010 \to 001001$.
  Leading left eigenvector: $(1+2\gamma, 1+2\gamma, 1+\gamma, 1+2\gamma, 1+\gamma)^T$.
  $\chi_4$: $a \to ce$, $b \to bd$, $c \to bd$, $d \to a$, $e \to a$.
  $\psi_4$: $a \to bdace$, $b \to bdace$, $c \to bda$.
  Associated matrices of $\chi_4$ and $\psi_4$: $\begin{pmatrix} 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \end{pmatrix}$ and $\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \\ 1 & 1 & 0 \end{pmatrix}$.


For each prefix $u \neq \epsilon$ of $\rho$, we call $\chi_u$ a derived substitution of $\chi$.

Corollary 4.26. Every primitive substitution has only finitely many different derived substitutions.

Proof. Recall from Theorem 4.18 that a primitive substitution shift is linearly recurrent; let $L$ be the corresponding constant. Then, independently of the prefix $u \neq \epsilon$ of $\rho$, we have by Theorem 4.4 that
\[ \#A_u = \#R_u \leq L(L+1)^2, \qquad \frac{|u|}{L} \leq |v| \leq L|u|, \qquad\text{and}\qquad |\chi(v)| \leq KL|u| \]
for $K = \sup_{a\in A} |\chi(a)|$. Therefore there is no space for more than finitely many different substitutions. □

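Proposition 4.27. All derived substitutions of a primitive substitution $\chi$ have the same eigenvalues, possibly with extra eigenvalues 0.

Proof. Let $u$ and $v$ be prefixes of the fixed point $\rho$ of $\chi$ so that $u$ is a prefix of $v$. Return words of $R_v$ are concatenations of return words of $R_u$, so there is a substitution $\psi : A_v \to A_u^*$ such that $\Theta_u \circ \psi = \Theta_v$. Hence
\[ (4.6)\qquad \Theta_u \circ \chi_u \circ \psi = \chi \circ \Theta_u \circ \psi = \chi \circ \Theta_v = \Theta_v \circ \chi_v = \Theta_u \circ \psi \circ \chi_v. \]
Next take $\ell$ so large that $v$ is a prefix of $\chi^\ell(u)$. Return words of $\chi^\ell(u)$ are concatenations of return words of $R_v$, so there is a substitution $\tilde\psi : A_u \to A_v^*$ such that $\Theta_v \circ \tilde\psi = \chi^\ell \circ \Theta_u$. Hence
\[ (4.7)\qquad \Theta_v \circ \chi_v \circ \tilde\psi = \chi \circ \Theta_v \circ \tilde\psi = \chi^{\ell+1} \circ \Theta_u = \chi^\ell \circ \Theta_u \circ \chi_u = \Theta_v \circ \tilde\psi \circ \chi_u. \]
Also
\[ (4.8)\qquad \begin{cases} \Theta_v \circ \tilde\psi \circ \psi = \chi^\ell \circ \Theta_u \circ \psi = \chi^\ell \circ \Theta_v = \Theta_v \circ \chi_v^\ell, \\ \Theta_u \circ \psi \circ \tilde\psi = \Theta_v \circ \tilde\psi = \chi^\ell \circ \Theta_u = \Theta_u \circ \chi_u^\ell. \end{cases} \]
Removing the left-factors $\Theta_u$ and $\Theta_v$ in (4.6), (4.7), and (4.8) gives $\chi_u \circ \psi = \psi \circ \chi_v$, $\chi_v \circ \tilde\psi = \tilde\psi \circ \chi_u$ and $\tilde\psi \circ \psi = \chi_v^\ell$, $\psi \circ \tilde\psi = \chi_u^\ell$. This means that the abelianizations $A$ and $\hat A$ of $\chi_u$ and $\chi_v$ are shift equivalent of lag $\ell$, with the abelianizations $D$ and $C$ of $\psi$ and $\tilde\psi$ as conjugating matrices. By Lemma 3.27, $A$ and $\hat A$ have the same eigenvalues, up to 0. □

By the same sort of argument, Durand [219, Proposition 9] also showed that these eigenvalues are the same as those of $\chi$, except for possible 0 and roots of unity.

Example 4.28. The primitive Pisot substitution
\[ \chi : 0 \to 0110, \qquad 1 \to 010 \]
has eigenvalues $\frac12(3 \pm \sqrt{17})$ and fixed point
\[ \rho = 0\ 110\ 0100100110\ 011001001100110010011001100100100110 \cdots. \]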


Table 4.2 shows the derived substitution for the first few prefixes of $\rho$.

Table 4.2. Derived substitutions of the substitution $\chi$.

u    | R_u = {a, b, (c)}           | ρ^u                | χ_u                           | eigenvalues
0    | {011, 0, 01}                | abcbcbababcbab ··· | a → abcbcb, b → ab, c → abcb  | 0, (3 ± √17)/2
01   | {0110, 010}                 | abbaabaabaabba ··· | a → abba, b → aba             | (3 ± √17)/2
011  | {0110010010, 0110, 0110010} | abcbcbababcbab ··· | a → abcbcb, b → ab, c → abcb  | 0, (3 ± √17)/2
0110 | {0110010010, 0110, 0110010} | abcbcbababcbab ··· | a → abcbcb, b → ab, c → abcb  | 0, (3 ± √17)/2
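The first rows of Table 4.2 can be reproduced by cutting a long prefix of $\rho$ at every occurrence of $u$. The sketch below does this; `return_words` is a hypothetical helper, the letters a, b, c are assigned in order of first appearance, and a finite prefix is assumed to be long enough to exhibit all return words.

```python
def iterate(sub, seed, steps):
    """Apply the substitution `steps` times to `seed`."""
    word = seed
    for _ in range(steps):
        word = "".join(sub[c] for c in word)
    return word


def return_words(rho, u):
    """Cut rho at each occurrence of u; return the labelling of the return
    words and the derived sequence rho^u (as far as the prefix allows)."""
    starts = [i for i in range(len(rho) - len(u) + 1) if rho.startswith(u, i)]
    labels, coded = {}, []
    for s, t in zip(starts, starts[1:]):
        w = rho[s:t]
        if w not in labels:
            labels[w] = "abcdefgh"[len(labels)]   # next unused letter
        coded.append(labels[w])
    return labels, "".join(coded)


chi = {"0": "0110", "1": "010"}
rho = iterate(chi, "0", 5)

labels_0, derived_0 = return_words(rho, "0")
labels_01, derived_01 = return_words(rho, "01")
print(labels_0, derived_0[:14])
print(labels_01, derived_01[:14])
```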

For $u = 0$, $\ell = 1$, and $v = 01$ from the proof of Proposition 4.27, we find
\[ (4.9)\qquad \psi = \begin{cases} a \to ab, \\ b \to cb \end{cases} \qquad\text{and}\qquad \tilde\psi = \begin{cases} a \to abb, \\ b \to a, \\ c \to ab. \end{cases} \]
Note that $\rho^u$ is also the fixed point of the substitution
\[ \begin{cases} a \to abcbc, \\ b \to bab, \\ c \to abc, \end{cases} \]
but this one doesn't match with (4.9).

4.2.3. Recognizability. We call a substitution injective if $\chi(a) \neq \chi(b)$ for all $a \neq b \in A$. Most of the examples above were indeed injective, but, in general, substitutions are not injective and hence not invertible, not even as map $\chi : X_\rho \to X_\rho$. But we can still ask: Is an injective substitution $\chi : X_\rho \to \chi(X_\rho)$ invertible, and what does the inverse look like?

To illustrate the difficulty here, assume that $\chi$ from (4.5) acts on a two-sided shift space. Then what is the inverse of $x = \cdots 010101010 \cdots$? Without


putting in the dot to indicate the zeroth position, there are three ways of dividing $x$ into three-blocks,
\[ (4.10)\qquad x = \cdots|010|101|010|\cdots = \cdots 0|101|010|10 \cdots = \cdots 01|010|101|0 \cdots, \]
each with its own inverse. The way to cut $x$ into blocks $\chi(a)$ is called a 1-cutting of $x$. The problem is thus: can a sequence $x \in \chi(X_\rho)$ have multiple 1-cuttings if we don't know a priori where the first block starts?

Remark 4.29. We give a brief history of this problem. In 1973, Martin claimed that any substitution on a two-letter alphabet which is aperiodic is one-sided recognizable (or 'rank one determined'). His proof was not convincing. In 1986, Host proved that a primitive substitution shift $X_\rho$ is one-sided recognizable if and only if $\chi(X_\rho)$ is open in $X_\rho$. This condition is not so easy to check, though. In 1987, Queffélec announced a short proof of the unilateral recognizability of constant length substitutions due to Rauzy. Nobody could check this proof. In his 1989 PhD Thesis, Mentzen claimed to prove this result, using a paper by Kamae of 1972. However, in 1999, Apparicio found a gap in Mentzen's proof (Kamae's results only work for a particular case of the theorem, namely if the length is a power of a prime number). She solved the problem using a 1978 result by Dekking. In the meantime, in 1992, Mossé proved a more general result (also non-constant length), but using a new notion of (two-sided) recognizable substitution. She refined this result in 1996. Her results are currently considered as the definitive reference, although since then proofs by other methods (using results from Downarowicz & Maass [215]) were also found [71, 78].

Fix $x \in X_\rho$ and define the sequences
\[ E = \{|\chi(x_1 x_2 \cdots x_i)|\}_{i\geq0} \qquad\text{and}\qquad E_k = \{|\chi^k(x_1 x_2 \cdots x_i)|\}_{i\geq0}. \]
By convention, the zeroth entry (for $i = 0$) is 0. In short, $E_k$ tells us how to divide $x$ into blocks of length $|\chi^k(x_i)|$ if we start at 0. Clearly if $\chi$ is of constant length $M$, then $E_k = \{iM^k\}_{i\geq0}$.

Definition 4.30. A substitution word $x \in X_\rho$ is
• one-sided recognizable if there is $N$ such that for every $i, j \in \mathbb{N}$ such that $x_i \cdots x_{i+N} = x_j \cdots x_{j+N}$ we have $i \in E$ if and only if $j \in E$,
• two-sided recognizable if there is $N$ such that for every $i, j \in \mathbb{N}$ such that $x_{i-N+1} \cdots x_{i+N} = x_{j-N+1} \cdots x_{j+N}$ we have $i \in E$ if and only if $j \in E$.
We call $N$ the recognizability index.


In this definition, the sequence $x$ from (4.10) is not recognizable, but for example the fixed point of the Fibonacci substitution $\chi_{Fib}$ is recognizable with recognizability index 2. The Thue-Morse sequence $\rho^0$ (or $\rho^1$) is recognizable with recognizability index 4. The following result is due to Mentzen (1989) and Apparicio [29]:

Theorem 4.31. Every primitive injective constant length substitution with aperiodic fixed point is one-sided recognizable.

For non-constant length substitutions, things are more involved.

Example 4.32. The substitutions
\[ \chi : \begin{cases} 0 \to 0001, \\ 1 \to 01 \end{cases} \qquad\text{and}\qquad \chi_{chac} : \begin{cases} 0 \to 0012, \\ 1 \to 12, \\ 2 \to 012 \end{cases} \]
are not one-sided recognizable. For example, the fixed point of the first one is
\[ \rho = 0001\ 0001\ 0001\ \underbrace{01\ 0001}_{u}\ 0001\ 0001\ 01\ 00\,\underbrace{01\ 0001}_{u}\ 0001\ 01\ 0001\ 01 \cdots \]
and just based on the word $u = 010001$, we cannot say if the cut is directly before its occurrence or not. This problem does not disappear if we take longer words. The latter substitution $\chi_{chac}$ is called the primitive Chacon substitution; see Example 6.124.

In 1992 [426], Mossé also gave conditions under which recognizability fails.

Theorem 4.33. Let $X_\rho$ be an aperiodic primitive substitution. Suppose that for every $n \in \mathbb{N}$ there exist $v \in L(X_\rho)$ with $|v| \geq n$ and $a, b \in A$ such that
(1) $\chi(a)$ is a proper suffix of $\chi(b)$ and
(2) $\chi(a)v$ and $\chi(b)v \in L(X)$ and have the same 1-cutting of $v$.
Then $\chi$ is not one-sided recognizable.

Theorem 4.34. Every aperiodic primitive injective substitution is two-sided recognizable.

The recognizability index was determined by Durand & Leroy [227]. Recognizability of aperiodic, but not necessarily primitive, substitution shifts was proved in [78, Theorem 5.17] and later in [71, Theorems 4.6 and 5.3], which in [71, Theorems 5.1 and 5.2] extended a part of the result to S-adic shifts; see Section 4.2.5.


4.2.4. Pisot Substitutions. Substitutions $\chi$ for which the leading eigenvalue $\lambda$ of the associated matrix $A$ is a Pisot number (i.e. their algebraic conjugates lie inside the open unit disc; see Definition 8.2) have particularly nice properties. They are called Pisot substitutions, and irreducible Pisot substitutions if the characteristic polynomial of $A$ is irreducible. Be aware that, confusingly, this is not the same "irreducible" as in Definition 4.14.

Remark 4.35. Let $\lambda$ be the leading eigenvalue of a Pisot matrix $A$. The minimal polynomial $p(x)$ of $\lambda$ always divides the characteristic polynomial of $A$. If these two polynomials are not equal, say $\det(A - xI) = p(x)q(x)$, then the roots of $q$ are zero or roots of unity; i.e. Pisot matrices with reducible characteristic polynomials can still have eigenvalues on the unit circle.

A Pisot substitution is called unimodular [6] if the associated matrix satisfies $\det(A) = \pm1$.

For Pisot numbers $\lambda$, the distance to the nearest integer $|||\lambda^n||| \to 0$ exponentially; see Proposition 8.5. This leads to the very useful property of Pisot substitutions that
\[ (4.11)\qquad \big|\, \lambda|\chi^n(a)| - |\chi^{n+1}(a)| \,\big| =: e_n(a) \to 0 \quad\text{exponentially.} \]
Indeed, assume that the second eigenvalue $\mu$ has multiplicity $m$, and let
\[ (4.12)\qquad f_a := \lim_{n\to\infty} \frac1n |\rho_1 \cdots \rho_n|_a \]
be the letter frequencies of the fixed point $\rho$ of $\chi$. As column vector, $f = (f_a)_{a\in A}$ is the leading right eigenvector of $A$, such that $\sum_a f_a = 1$. Using the diagonalization $A = UDU^{-1}$ where $(f_a)$ is the leftmost column of $U$ and writing $1_b$ for the column vector with a single 1 at position $b$, we find
\[ (4.13)\qquad |\chi^n(b)| = \sum_{a\in A} (A^n 1_b)_a = \sum_{a\in A} (U D^n U^{-1} 1_b)_a = \sum_{a\in A} f_a \lambda^n (U^{-1})_{1b} + O(n^{m-1}\mu^n) = C_b \lambda^n + O(n^{m-1}\mu^n), \]
for $C_b = (U^{-1})_{1b}$. Therefore $\lambda|\chi^n(b)| = \lambda^{n+1} C_b + O(n^{m-1}\mu^n) = |\chi^{n+1}(b)| + O(n^{m-1}\mu^n)$, implying (4.11).

Condition (4.11) suffices to conclude [7] that there is a continuous function $g_\lambda : X_\rho \to S^1$ such that $g_\lambda \circ \sigma = e^{2\pi i \lambda} g_\lambda$. That is, $g_\lambda$ is a continuous eigenfunction of the Koopman operator $U_\sigma f = f \circ \sigma$. Dynamically this means that the rotation $R_\lambda : S^1 \to S^1$ over angle $\lambda$ is semi-conjugate to $(X_\rho, \sigma)$ and $g_\lambda$ is the semi-conjugacy.

[6] Not to be confused with unimodal interval maps in Section 3.6.1.
[7] See Theorem 6.118 for a more general result.
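The exponential decay in (4.11) is easy to observe numerically. The sketch below uses the Fibonacci substitution, whose leading eigenvalue is the golden mean (a Pisot number), as an assumed test case; the names `lengths` and `errors` are hypothetical.

```python
import math


def lengths(sub, letter, steps):
    """Return the list |chi^n(letter)| for n = 0, ..., steps."""
    word, out = letter, [1]
    for _ in range(steps):
        word = "".join(sub[c] for c in word)
        out.append(len(word))
    return out


gamma = (1 + math.sqrt(5)) / 2            # leading eigenvalue for Fibonacci
fib = {"0": "01", "1": "0"}

L = lengths(fib, "0", 21)
# e_n(0) = | gamma * |chi^n(0)| - |chi^{n+1}(0)| |  for n = 0, ..., 20
errors = [abs(gamma * L[n] - L[n + 1]) for n in range(21)]
print(errors[:5], errors[-1])
```

In this example the errors shrink by a factor of roughly $1/\gamma$ at each step, in line with the second eigenvalue $-1/\gamma$ of the Fibonacci matrix.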


Using the above computation, we see that also $|||\lambda^k |\chi^n(b)|\,||| \to 0$ exponentially, and therefore $\lambda^k$ are eigenvalues as well; cf. Theorem 8.8. If the minimal polynomial of $\lambda$ has degree $d = \#A$, then $1, \lambda, \dots, \lambda^{d-1}$ are linearly independent, but $\lambda^d$ is a linear combination of $1, \lambda, \dots, \lambda^{d-1}$. Thus
\[ g : X_\rho \to \mathbb{T}^{d-1}, \qquad x \mapsto (g_\lambda, g_{\lambda^2}, \dots, g_{\lambda^{d-1}}) \]
is a semi-conjugacy between $(X_\rho, \sigma)$ and the toral rotation $R_{\vec\lambda} : \mathbb{T}^{d-1} \to \mathbb{T}^{d-1}$, $x \mapsto x + \vec\lambda \bmod 1$ for the translation vector $\vec\lambda = (\lambda, \dots, \lambda^{d-1})$. Again, since $1, \lambda, \dots, \lambda^{d-1}$ are linearly independent, $R_{\vec\lambda}$ is minimal and uniquely ergodic, with Lebesgue measure as its only $R_{\vec\lambda}$-invariant probability measure.

It is widely believed that, for every irreducible Pisot substitution, $(X_\rho, \sigma, \mu)$ is isomorphic to $(\mathbb{T}^{d-1}, R_{\vec\lambda}, \mathrm{Leb})$; i.e. the semi-conjugacy $\pi$ is one-to-one $\mu$-a.e. This is a corollary of Halmos & von Neumann's Structure Theorem 6.100, together with the Pisot substitution conjecture which states that every irreducible Pisot substitution has a pure point spectrum; see Section 6.8.3. In this section, we will give some more properties of Pisot substitutions, leading to a more geometrical understanding of $g$.

The letter frequencies $f_a = \lim_n \frac1n |x_1 \cdots x_n|_a$ of substitution shifts exist for all $a \in A$, independently of $x \in X$. Frequency is a limit notion, but there are ways to measure how often subwords and letters appear in finite words, without taking limits. Given a word $v = v_1 \cdots v_n \in A^*$, let $|v|_a = \#\{1 \leq i \leq n : v_i = a\}$ be the number of appearances of the letter $a$ in $v$. Similarly, $|v|_u$ stands for the number of occurrences of the word $u$ in $v$.

Definition 4.36. A language $L(X)$ is called $R$-balanced if there is an $R \in \mathbb{N}$ such that $\big||v|_a - |w|_a\big| \leq R$ for all $a \in A$, $n \in \mathbb{N}$, and words $v, w \in L_n(X)$. If $R$ is not specified, then we just say balanced. Similarly, we call $L(X)$ balanced on words if there is $R \in \mathbb{N}$ such that $\big||v|_u - |w|_u\big| \leq R$ for all $u \in L$, integers $n \geq |u|$, and words $v, w \in L_n(X)$.

Theorem 4.37. Every primitive Pisot substitution shift is balanced.

Proof. Let $f = (f_a)_{a\in A}$ be the frequency vector; it is the right eigenvector of the associated matrix $A$ of $\chi$; see Lemma 4.15. Let $\lambda, \mu$ be the largest two eigenvalues of $A$. Because $\lambda$ is a Pisot number, $\lambda > 1 > |\mu|$. Assume that $\mu$ has multiplicity $m$. Then, using the Jordan decomposition $A = UJU^{-1}$ where $f$ is the leftmost column of $U$ and writing $1_b$ for the unit column vector with a single 1 at position $b$, we find
\[ |\chi^n(b)|_a = (A^n 1_b)_a = (U J^n U^{-1} 1_b)_a = f_a \lambda^n (U^{-1})_{1b} + O(n^{m-1}\mu^n). \]
We sum over $a \in A$, noting that $\sum_{a\in A} f_a = 1$: $|\chi^n(b)| = \sum_{a\in A} (A^n 1_b)_a = \lambda^n (U^{-1})_{1b} + O(n^{m-1}\mu^n)$. Therefore
\[ (4.14)\qquad \big|\, |\chi^n(b)|_a - f_a |\chi^n(b)| \,\big| = O(n^{m-1}\mu^n), \]
proving that the discrepancy is bounded at the words $\chi^n(b)$; see (8.18) and Definition 8.40 in Section 8.3.1. We can split an arbitrary word $w \in L(\rho)$ as
\[ (4.15)\qquad w = v_0\, \chi(v_1) \cdots \chi^{n-1}(v_{n-1})\, \chi^n(v_n)\, \chi^{n-1}(v'_{n-1}) \cdots \chi(v'_1)\, v'_0 \]
for some maximal $n$ such that $v_n \neq \epsilon$ and each $v_k$ and $v'_k$ have length $\leq L := \max_{a\in A} |\chi(a)|$. Applying (4.14) to each of $\chi^j(v_j)$ and $\chi^j(v'_j)$ we get bounded discrepancy altogether. It follows by Proposition 8.43 that $\rho$ is balanced; see also Proposition 4.22. □

Remark 4.38. The above proof can be adapted to show that also whole words $v \in L_\ell(\rho)$ appear with bounded discrepancy, namely by considering the $\ell$-block shift, which is also Pisot, and in which $v$ is simply a single letter. Without proof (see [3, 4]), we remark that if $\lambda$ is not a Pisot number, then the discrepancy
\[ n D_n^*(\rho) \approx \begin{cases} (\log n)^m\, n^{\log|\mu|/\log|\lambda|} & \text{if } |\mu| > 1, \\ (\log n)^m \text{ or } (\log n)^{m-1} & \text{if } |\mu| = 1 \text{ is a root of unity}, \\ (\log n)^m & \text{if } |\mu| = 1 \text{ is not a root of unity}, \end{cases} \]
where again $\mu$ is the second largest eigenvalue, of geometric multiplicity $m$.

References for the following construction of Rauzy fractals include [33, 69, 328, 329]. Let us label the coordinate axes of $\mathbb{R}^d$ by the letters $a \in A$ (so $d = \#A$ is also the degree of $\lambda$, provided that $\chi$ is indeed irreducible). Let $1_a$, $a \in A$, denote the unit vectors. Let $E_\lambda^+$ be the positive half-line in the direction of the leading right eigenvector $f$ of $A$. To each $x \in X_\rho$ we will assign a broken line $x \mapsto L(x) = (\ell_i(x))_{i\geq1}$ as follows: Starting at the origin, we concatenate unit length arcs $u_i(x)$, $i \geq 1$, parallel to $1_a$ if $x_i = a$, so that $u_{i+1}(x)$ meets with $u_i(x)$ only at a single common endpoint $\ell_i(x) \in \mathbb{Z}^d$ with coordinates $\ell_i(x)_a = |x_1 \cdots x_i|_a$; see Figure 4.3. We also let $\ell_0(x) = 0$ be the origin.

Let $V$ be the $(d-1)$-dimensional hyperplane spanned by the (generalized) eigenvectors of $A$ other than $f$. Equivalently, $V$ is the orthogonal complement of the leading left eigenvector of $A$. Let $\pi : \mathbb{R}^d \to V$ be the projection parallel to $E_\lambda^+$. The set
\[ R := \overline{\pi(\{\ell_n(\rho) : n \in \mathbb{N}\})} \]

is called the Rauzy fractal of $\chi$, [32, 53, 471]. See Figure 4.4 for some examples in dimension two. Strictly speaking, for Rauzy fractals that are topological disks, it is only the boundary of $R$ that is fractal.

[Figure 4.3. Broken line construction of a Rauzy fractal, drawn for $\chi : 0 \to 02,\ 1 \to 0,\ 2 \to 1$ with fixed point $\rho = 02100202102\cdots$; the picture shows the half-line $E_\lambda^+$, the hyperplane $V$, the projection $\pi$ and the point $\ell_5(\rho)$.]
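The broken-line construction translates into a short computation. The sketch below uses the substitution $0 \to 02$, $1 \to 0$, $2 \to 1$ of Figure 4.3, approximates the leading right and left eigenvectors by power iteration (an assumed stand-in for any eigenvector routine), and projects the points $\ell_n(\rho)$ parallel to $f$ onto the orthogonal complement of the left eigenvector. All function names are hypothetical.

```python
def iterate(sub, seed, steps):
    """Apply the substitution `steps` times to `seed`."""
    word = seed
    for _ in range(steps):
        word = "".join(sub[c] for c in word)
    return word


def leading_eigenvector(M, steps=500):
    """Power iteration, normalized so that the entries sum to 1."""
    v = [1.0] * len(M)
    for _ in range(steps):
        w = [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]
        total = sum(w)
        v = [x / total for x in w]
    return v


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


chi = {"0": "02", "1": "0", "2": "1"}
A = [[1, 1, 0],            # a_ij = number of letters i in chi(j)
     [0, 0, 1],
     [1, 0, 0]]
At = [list(row) for row in zip(*A)]

f = leading_eigenvector(A)       # leading right eigenvector
left = leading_eigenvector(At)   # leading left eigenvector


def project(x):
    """Projection parallel to f onto V = orthogonal complement of `left`."""
    t = dot(left, x) / dot(left, f)
    return [xi - t * fi for xi, fi in zip(x, f)]


rho = iterate(chi, "0", 20)
counts, points = [0, 0, 0], []
for letter in rho:
    counts[int(letter)] += 1      # counts is the lattice point l_n(rho)
    points.append(project(counts))

max_norm = max(dot(p, p) ** 0.5 for p in points)
print(len(points), max_norm)
```

The projected points stay in a bounded region of $V$ even though the lattice points $\ell_n(\rho)$ themselves drift off along $E_\lambda^+$; plotting `points` in coordinates of $V$ gives an approximation of the Rauzy fractal.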

Figure 4.4. The Rauzy fractals for $x^3 = x^2 + x + 1$ (tribonacci) and $x^3 = x^2 + 1$.

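We can transfer the shift action $\sigma$ from $X_\rho$ to the space of broken lines via
\[ (4.16)\qquad \hat\sigma \circ L = L \circ \sigma \qquad\text{for}\qquad \hat\sigma(L)_k = \ell_{k+1} - \ell_1, \quad L = (\ell_i)_{i\geq0}. \]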


Also the substitution can be carried to the space of broken lines. Set
\[ \hat\chi(1_a) = u_1 \cdots u_{|\chi(a)|}, \qquad u_j \text{ is parallel to } 1_{\chi(a)_j}, \]
and extend this to a broken line $L$ by concatenating the broken arcs $\hat\chi(x_i)$ such that $\hat\chi(x_1)$ starts at the origin and $\hat\chi(x_i)$ and $\hat\chi(x_{i+1})$ have a boundary point in common, namely the vector $(|\chi(x_1 \cdots x_i)|_a)_{a\in A}$. It also follows that
\[ (4.17)\qquad h \circ \pi = \pi \circ \hat\chi, \qquad h = A|_V : V \to V. \]
Theorem 4.39. The map $\hat\pi : \mathrm{orb}_\sigma(\rho) \to R \subset V$ defined by $\hat\pi \circ \sigma^n(\rho) = \pi \circ \ell_n(\rho)$ extends continuously to $X_\rho$ and commutes with the piecewise translation
\[ (4.18)\qquad T : R \to R, \qquad y \mapsto y + \pi(1_a) \quad\text{if } y \in \hat\pi([a]). \]
In particular, $\hat\pi(X_\rho) = R$. In fact, $T$ is a group translation on $V/\Lambda$ for some lattice $\Lambda$. If $A$ is unimodular, then $R$ is a fundamental domain of $\Lambda$, and $\hat\pi : X_\rho \to R \simeq V/\Lambda$ is a measure-theoretic isomorphism.

Note, however, that $T$ is multivalued at points in $\hat\pi([a]) \cap \hat\pi([a'])$, $a \neq a' \in A$. Under the assumption that $A$ is unimodular, the sets $\hat\pi([a])$ only overlap at common boundary points, and $\hat\pi([a])$ and $\hat\pi([a'])$ have disjoint interiors for $a \neq a'$; see the different gray-tones in Figure 4.4. Arnoux & Ito [33, Theorem 2] (see also [242]) proved that for unimodular Pisot substitutions, $\hat\pi([a]) \cap \hat\pi([a'])$ have zero Lebesgue measure, and hence $(X_\rho, \sigma)$ with respect to its unique invariant measure is isomorphic to $(R, T, \mathrm{Leb})$. In this case, the map $T$ can be properly called a domain exchange transformation, since the $T$-images of the sets $\hat\pi([a])$ are disjoint up to sets of Lebesgue measure zero. In general, Rauzy fractals need not be connected or simply connected, and their boundary need not have zero Lebesgue measure; see [66, 335] for more information.

If $A$ is not unimodular, $\hat\pi$ fails to be one-to-one for a.e. $x \in X_\rho$. In this case, according to Halmos & von Neumann's Structure Theorem 6.100, the group translation that $(X_\rho, \sigma)$ is isomorphic to is a solenoid (skew-product of a Cantor set and a $(d-1)$-dimensional torus); see [44, 53, 70].

Proof of Theorem 4.39. The main step is showing that $\hat\pi$ is uniformly continuous on $\mathrm{orb}_\sigma(\rho)$, so it has a unique continuous extension to $X_\rho = \overline{\mathrm{orb}_\sigma(\rho)}$, and then (4.18) follows directly from (4.16). Since $X_\rho$ is recognizable, there is $N$ such that every word $w \in L(\rho)$ of length $|w| \geq N$ is a subword of the $\chi$-image of some unique $v$, shortest in the sense that $w$ is not a subword of $\chi(v')$ for every proper subword $v'$ of $v$. Now suppose $n_1 < n_2$ are such that $d(\sigma^{n_1}(\rho), \sigma^{n_2}(\rho)) = 2^{-n}$ for some $n \geq N$, so $\rho_{n_1+1} \cdots \rho_{n_1+n} = \rho_{n_2+1} \cdots \rho_{n_2+n}$ but $\rho_{n_1+n+1} \neq \rho_{n_2+n+1}$. We can take the inverse of $\chi$ on $\rho_{n_1+1} \cdots \rho_{n_2+n}$, i.e.
find m1 maximal and m2 +m minimal such that ρn1 +1 · · · ρn2 +n is a subword of χ(ρm1 +1 · · · ρm2 +m )

156

4. Subshifts of Zero Entropy

and ρm1 +1 · · · ρm1 +m = ρm2 +1 · · · ρm2 +m . Note that n2 − n1 ≈ λ(m2 − m1 ) and n ≈ λm. Continue this way until the common length of the two coinciding words drops below N . That is, we find k ∈ N, l1 maximal and l2 + l minimal such that ρn1 +1 · · · ρn2 +n is a subword of χk (ρl1 +1 · · · ρl2 +l ) and ρl1 +1 · · · ρl1 +l = ρl2 +1 · · · ρl2 +l . Also n2 − n1 ≈ λk (l2 − l1 ) and n ≈ λk l ≤ λk N , and there is an integer K % λk such that ρn1 +1 · · · ρn1 +n starts at the K-th letter of χk (ρl1 +1 · · · ρl1 +l ) and ρn2 +1 · · · ρn2 +n starts at the K-th letter of χk (ρl2 +1 · · · ρl2 +l ). Since A ◦ π ◦ L = π ◦ χ ˆ◦L = π ˆ ◦ χ by (4.17) and V is the contracting hyperplane of A, we have * n * 2 *

* * * ˆ (σ (ρ))& = * p(1ρi ) * &ˆ π (σ (ρ)) − π * * i=n1 +1 * * * * l2 * *

K *  p ◦ σ ◦ χ( ˆ 1ρi ) * = * * * * i=l1 +1 * * ⎞ ⎛ * * l2

* * K k⎝ K  * *  ⎠ π ˆ (1ρi ) − T (0)* = C *T ◦ A * * i=l1 +1 n2

n1

≤ Ck m |μ|k , where C is a uniform constant and m is the multiplicity of the second largest eigenvalue μ of A. If k is sufficiently large, we have Ck m |μ|k ≤ |μ|k/2 = (λk )α λ k k for α = 2 log log |μ| < 0. Because n ≈ λ l ≤ λ N , this gives &ˆ π (σ (ρ)) − π ˆ (σ (ρ))& ≤ λ n2

n1



≤N

−α α

n =



− log(d(σ n1 (ρ) , σ n2 (ρ))) N log 2

α .

This implies the required uniform continuity of $\hat\pi : \mathrm{orb}_\sigma(\rho) \to V$ and allows us to extend $\hat\pi$ continuously to $X_\rho$. However, on each domain $\hat\pi([a])$, the translation vector $\pi_a := \pi(1_a)$ is different. When we divide the hyperplane $V$ by a well-chosen lattice $\Lambda$, these translation vectors become the same. That is, we need $\pi_a - \pi_{a'}$ to be lattice points for all $a, a' \in A = \{0, \dots, d-1\}$. The simplest way of achieving this is by letting $\Lambda_j = \pi_j - \pi_{j-1}$, for $j \in \{1, \dots, d-1\}$, be the vectors spanning $\Lambda$.

Let us compute the $\pi_i$ more explicitly. Let $u_j$, $j \in \{0, \dots, d-1\}$, be the (generalized) right eigenvectors of $A$, where $u_0$ is associated to the leading eigenvalue $\lambda$. Since the $u_j$ are the columns of $U$ in the Jordan decomposition $A = UJU^{-1}$, we have $e_j = (U^{-1}U)_j = \sum_{i=0}^{d-1} u^{-1}_{ij} u_i$, where $U^{-1} = (u^{-1}_{ij})_{i,j=0}^{d-1}$. Hence
\[
\pi_j = e_j - u^{-1}_{0,j}\, u_0 = \sum_{i=1}^{d-1} u^{-1}_{i,j}\, u_i,
\]
and
\[
\Lambda_j = e_j - e_{j-1} - (u^{-1}_{0,j} - u^{-1}_{0,j-1})\, u_0 \qquad \text{for } 1 \le j \le d-1.
\]

Each $\pi_j$ has rationally independent coordinates, and therefore $T$ acts as a minimal map on the quotient space $V/\Lambda$, which is a $(d-1)$-dimensional torus.

The same proof shows that the substitution $\chi$ acts as a contraction on $R$, with contraction factor $\approx |\mu|$: $\hat\chi(0) = 0$ and $\|\hat\chi(y_1) - \hat\chi(y_2)\| \le |\mu|\, \|y_1 - y_2\|$, assuming that $\mu$ has geometric multiplicity 1.

Rauzy fractals of Pisot substitutions can be seen as attractors of iterated function systems (IFS) in the sense of Hutchinson [326], or rather graph-directed IFS defined by a kind of transition graph called the prefix-suffix graph, as introduced in [138]. The vertices of this graph are labeled by the letters of $A$ and there is an arrow $i \to j$ labeled $(p, i, s) \in A^* \times A \times A^*$ ($p$ = prefix, $s$ = suffix), for every occurrence of $i, j \in A$ such that $\chi(j) = pis$; see Figure 4.5. In particular, the transition matrix of the prefix-suffix graph is $A$, the number of incoming arrows to vertex $j$ is $|\chi(j)|$, and the label of each $i \to j$ reads $\chi(j)$ if we ignore the commas and the empty words $\epsilon$.

[Figure 4.5 shows two graphs on the vertices 0, 1, 2. First substitution: $\chi : 0 \to 012,\ 1 \to 0,\ 2 \to 1$, with arrow labels $(\epsilon, 0, 12)$, $(0, 1, 2)$, $(01, 2, \epsilon)$, $(\epsilon, 0, \epsilon)$, $(\epsilon, 1, \epsilon)$. Second substitution: $\chi : 0 \to 02,\ 1 \to 0,\ 2 \to 1$, with arrow labels $(\epsilon, 0, 2)$, $(0, 2, \epsilon)$, $(\epsilon, 0, \epsilon)$, $(\epsilon, 1, \epsilon)$.]

Figure 4.5. Prefix-suffix graphs for two substitutions.
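The prefix-suffix graph is easy to compute mechanically. The following Python sketch (the function name `prefix_suffix_edges` and the dictionary encoding of substitutions are illustrative choices, not notation from the text) lists one labeled arrow $i \to j$ for every occurrence of $i$ in $\chi(j)$, using the two substitutions of Figure 4.5, and lets one check that vertex $j$ receives $|\chi(j)|$ arrows.

```python
from collections import Counter

def prefix_suffix_edges(chi):
    """Return arrows (i, j, (p, i, s)) for every factorization chi[j] = p + i + s."""
    edges = []
    for j, image in chi.items():
        for pos, i in enumerate(image):
            # prefix = letters before this occurrence of i, suffix = letters after it
            edges.append((i, j, (image[:pos], i, image[pos + 1:])))
    return edges

# The two substitutions shown in Figure 4.5 (letters encoded as one-character strings).
chi_first = {"0": "012", "1": "0", "2": "1"}
chi_second = {"0": "02", "1": "0", "2": "1"}

edges = prefix_suffix_edges(chi_first)
incoming = Counter(j for _, j, _ in edges)  # number of arrows arriving at each vertex j

for i, j, label in edges:
    print(f"{i} -> {j} labeled {label}")
```

Counting the arrows from $i$ to $j$ gives the number of occurrences of $i$ in $\chi(j)$, which is the entry $a_{ij}$ of the associated matrix.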

Theorem 4.40. Let $A$ be the matrix associated to an irreducible unimodular Pisot substitution $\chi$ and let $h = A|_V$ be the map restricted to its contracting eigenspace. The subtiles $R(i) := \overline{\{\pi\circ\ell_{n-1} : \rho_n = i\}}$ of the Rauzy fractal $R$ satisfy
\[
R(i) = \bigcup_{i \xrightarrow{(p,i,s)} j} h(R(j)) + \hat\pi(p), \qquad \text{for } \hat\pi(p) = \pi\Big( \sum_{k=1}^{|p|} 1_{p_k} \Big).
\]

The result was first shown for general irreducible unit Pisot substitutions by Sirvent & Wang [515], although special cases were around, see e.g. [33, 125, 329]. In particular, Arnoux & Ito [33] gave a condition under which


the tiles $R(i)$ overlap at most on a null-set. We follow the proof presented in [69], which is Chapter 5 in [68].

Proof. Recall that $L = \{\ell_i\}_{i \ge 0}$ is the broken line associated to the fixed point $\rho$ of the Pisot substitution $\chi$. The subtile $R(i)$ is the closure of the points $\{\pi\circ\ell_{n-1} : \rho_n = i\}$. Since $\chi(\rho) = \rho$, for each such $n$, there is $m$ such that $\rho_1\cdots\rho_n = \chi(\rho_1\cdots\rho_m)p$, where $\rho_m = j$ and $\chi(j) = pis$. By (4.17), we get
\[
\pi(\ell_{n-1}) = \pi(\ell_{|\chi(\rho_1\cdots\rho_{m-1})|}) + \hat\pi(p) = h\circ\pi(\ell_{m-1}) + \hat\pi(p), \qquad \rho_m = j.
\]
Taking the union of such points for all $n$ with $\rho_n = i$ and then taking the closure, we arrive at
\[
R(i) \subset \bigcup_{i \xrightarrow{(p,i,s)} j} h(R(j)) + \hat\pi(p),
\]
where $i \xrightarrow{(p,i,s)} j$ are the labeled arrows of the prefix-suffix graph. Now $h$ contracts the $(d-1)$-dimensional Lebesgue measure Leb of $V$ by a factor $1/\lambda$ because $A$ is a unimodular Pisot matrix. Therefore, writing $w_i = \mathrm{Leb}(R(i))$, we obtain that
\[
\lambda w_i \le \sum_{i \xrightarrow{(p,i,s)} j} w_j = \sum_{j \in A} a_{ij} w_j \qquad \text{for every } i \in A. \tag{4.19}
\]
Here $A = (a_{ij})$ is both the associated matrix of the substitution $\chi$ and the transition matrix of the prefix-suffix graph. However, the Perron-Frobenius Theorem 8.58 (part (c)) tells us that if $A$ is a non-negative matrix with leading eigenvalue $\lambda$ and $w$ a non-negative vector, then $\lambda w \le Aw$ coordinatewise (that is (4.19)) can only hold if $w$ is a multiple of the leading eigenvector of $A$ and then we have equality. Therefore $\lambda\, \mathrm{Leb}(R(i)) = \sum_{i \xrightarrow{(p,i,s)} j} \mathrm{Leb}(R(j))$ for every $i \in A$, and $R(i) = \bigcup_{i \xrightarrow{(p,i,s)} j} h(R(j)) + \hat\pi(p)$ as claimed.

4.2.5. S-adic Transformations. Instead of using a single substitution to create an infinite word $\rho \in A^{\mathbb{N}}$, we can use a sequence of substitutions $\chi_n : A_n \to A_{n-1}^*$, potentially between different alphabets $A_n$. Thus
\[
\rho = \lim_{n\to\infty} \chi_1 \circ \chi_2 \circ \cdots \circ \chi_n(a_n), \qquad a_n \in A_n. \tag{4.20}
\]

A priori, the limit need not exist, or can depend on the choice of letters an ∈ An , but if ρ exists and is an infinite sequence, then we have the following definition.
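As a small illustration of the limit (4.20), the Python sketch below composes a finite directive sequence and checks that the resulting words are nested prefixes. The two substitutions `chi_a` and `chi_b` are hypothetical examples chosen only because both images of 0 start with 0; they are not taken from the text.

```python
def apply(sub, word):
    """Apply a substitution (dict letter -> word) letter by letter."""
    return "".join(sub[c] for c in word)

def sadic_prefix(subs, letter="0"):
    """Return chi_1 ∘ chi_2 ∘ ... ∘ chi_n (letter) for subs = [chi_1, ..., chi_n]."""
    word = letter
    for sub in reversed(subs):  # the innermost substitution chi_n acts first
        word = apply(sub, word)
    return word

# Hypothetical substitutions on {0, 1}; each image of 0 starts with 0.
chi_a = {"0": "01", "1": "0"}
chi_b = {"0": "001", "1": "01"}

directive = [chi_a, chi_b] * 4
approximations = [sadic_prefix(directive[:n]) for n in range(1, len(directive) + 1)]

for w in approximations[:4]:
    print(w)
```

Because every $\chi_n(0)$ starts with 0, each approximation is a prefix of the next, so the limit word exists in this example.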


Definition 4.41. Let $S$ be a collection⁸ of substitutions $\chi$ and choose $\chi_n \in S$ such that alphabets match: $\chi_n : A_n \to A_{n-1}^*$. Assume that the sequence $\rho$ defined in (4.20) exists and is infinite, and let $X_\rho = \overline{\mathrm{orb}_\sigma(\rho)}$. Then $(X_\rho, \sigma)$ is called an S-adic shift.

The word S-adic was first used by Ferenczi [244] and the S in S-adic stands for substitution. If the sequence $(\chi_n)_{n\in\mathbb{N}}$ itself is periodic, then the S-adic shift reduces to a substitution shift; the reverse question of when S-adic shifts are isomorphic to substitution shifts was addressed in [318]. The following simple set of conditions implies the existence of $\rho$: $A_n = A \ni 0$, $a_n \equiv 0$, and $\chi_n(0)$ starts with 0 for each $n \in \mathbb{N}$. However, this by itself doesn't imply that $(X_\rho, \sigma)$ is minimal. We use a straightforward generalization of Definition 4.14.

Definition 4.42. A sequence $(\chi_n)_{n\in\mathbb{N}}$ is called primitive⁹ if there is $N$ such that for all $0 \le m < n$, $a \in A_{n+N}$, every $b \in A_m$ appears in $\chi_{m+1} \circ \cdots \circ \chi_{n+N}(a)$.

If $(X_\rho, \sigma)$ is primitive, then it is minimal. Indeed, let $\rho^{(m)} := \lim_n \chi_{m+1} \circ \cdots \circ \chi_n(0)$, so $\chi_{m+1}(\rho^{(m+1)}) = \rho^{(m)}$. The primitivity implies that all letters $a \in A_n$ occur with bounded gaps in $\rho^{(n)}$, and hence words $w = \chi_{m+1} \circ \cdots \circ \chi_n(a)$ occur with bounded gaps in $\rho^{(m)}$. This proves minimality. So far this is [221, Lemma 7].

However, not every primitive S-adic subshift is linearly recurrent, because the recurrence of two-letter words can be problematic. For instance [221, Section 2 of the addendum], the substitutions on the alphabet $A = \{0, 1, 2\}$
\[
\chi : \begin{cases} 0 \to 012, \\ 1 \to 012, \\ 2 \to 002 \end{cases}
\qquad \text{and} \qquad
\tilde\chi : \begin{cases} 0 \to 021, \\ 1 \to 121, \\ 2 \to 012 \end{cases}
\]
always form primitive S-adic shifts because every letter occurs in every image of every composition of two substitutions. The problem is the word 20, which only occurs when straddling the concatenated images of two words $\chi_1 \circ \cdots \circ \chi_n(a)$, $a \in A$. As a result, two appearances of 20 in $\chi^n \circ \tilde\chi(w)$ are always $3^{n+1}$ places apart.

⁸ Some, but not all, authors require $S$ to be finite. We will not require finiteness, because in the few results where this requirement matters, it can easily be assumed separately.

⁹ In [70] a weaker notion of primitive is used, namely that for every $m$, there is $n$ such that $\chi_{m+1} \circ \cdots \circ \chi_n$ has a strictly positive associated matrix. This is strong enough to conclude minimality, but not for linear recurrence.

Hence, to achieve linear recurrence, we need a bound on the distance between occurrences of two-letter words, but this is

sufficient (see [221, Lemma 3.1 of the addendum]):

Lemma 4.43. Let $(X_\rho, \sigma)$ be an S-adic shift with a well-defined infinite $\rho$, and take $\rho^{(m)}$ as below Definition 4.42. Define the gap-size $g^{(m)}(j) = \min\{i \ge 1 : \rho^{(m)}_j \rho^{(m)}_{j+1} = \rho^{(m)}_{j+i} \rho^{(m)}_{j+i+1}\}$. If $D := \sup\{g^{(m)}(j) : j \ge 1, m \ge 0\} < \infty$, then $(X_\rho, \sigma)$ is linearly recurrent.

Proof. First, recalling $N$ from the definition of primitivity, we can define $K_1 := \max\{|\chi_n \circ \cdots \circ \chi_{n+N}(a)| : n \in \mathbb{N}, a \in A_{n+N}\}$ and $K_2 := \min\{|\chi_n \circ \cdots \circ \chi_{n+N}(a)| : n \in \mathbb{N}, a \in A_{n+N}\} > 0$. Hence, for all $m \le m+N \le n$ and $a, b \in A_n$,
\[
\frac{|\chi_{m+1} \circ \cdots \circ \chi_n(a)|}{|\chi_1 \circ \cdots \circ \chi_n(b)|}
\le \frac{|\chi_1 \circ \cdots \circ \chi_{n-N}(\chi_{n-N+1} \circ \cdots \circ \chi_n(a))|}{|\chi_1 \circ \cdots \circ \chi_{n-N}(\chi_{n-N+1} \circ \cdots \circ \chi_n(b))|}
\le \frac{K_1}{K_2} =: K.
\]
Let $u \in L(X_\rho)$ such that $|u| \ge \min\{|\chi_1 \circ \cdots \circ \chi_N(a)| : a \in A_N\}$ and let $v = wu$ be a return word to $u$; see Definition 4.2. Take $N' > N$ such that $u$ is a subword of $\chi_1 \circ \cdots \circ \chi_{N'}(ab)$ for some $a, b \in A_{N'}$. Then
\begin{align*}
|v| &\le D \max_{c \in A_{N'}} \{|\chi_1 \circ \cdots \circ \chi_{N'}(c)|\} \\
&\le D K \min_{c \in A_{N'}} \{|\chi_1 \circ \cdots \circ \chi_{N'}(c)|\} \\
&\le D K \min\{|\chi_{N'}(c)| : c \in A_{N'}\} \cdot \min_{c \in A_{N'-1}} \{|\chi_1 \circ \cdots \circ \chi_{N'-1}(c)|\} \\
&\le D K^2 \min\{|\chi_{N'}(c)| : c \in A_{N'}\} \cdot \max_{c \in A_{N'-1}} \{|\chi_1 \circ \cdots \circ \chi_{N'-1}(c)|\} \\
&\le D K^2 \min\{|\chi_n(c)| : c \in A_n, n \in \mathbb{N}\}\, |u|.
\end{align*}
This gives linear recurrence with constant $L = DK^2 \min\{|\chi_n(c)| : c \in A_n, n \in \mathbb{N}\}$.

Verifying that $D$ in Lemma 4.43 is finite can be easily done in many cases. Durand gave a general condition equivalent to linear recurrence.

Definition 4.44. A substitution $\chi$ is called proper if there exist two letters $b, e \in A$ such that for every $a \in A$, $\chi(a)$ starts with $b$ and ends with $e$.

Theorem 4.45. The sequence $\rho$ is produced by a proper primitive S-adic system if and only if $(X_\rho, \sigma)$ is linearly recurrent.

Proof. For the proof, see [221, Proposition 1.1 of the addendum].




It follows that primitive S-adic shifts have sublinear word-complexity. Extending Mossé's [426] results that for substitution shifts $p(n+1) - p(n)$ is bounded, Durand showed that $p(n+1) - p(n)$ is bounded for primitive S-adic shifts as well. Sturmian shifts have $p(n+1) - p(n) \equiv 1$; see Definition 4.60. They are indeed S-adic as explained in Section 4.3.5 and in our next Example 4.46. More generally, the symbolic itinerary space coming from a $d$-interval exchange transformation has $p(n+1) - p(n) \equiv d-1$; see Section 4.4. Host conjectured that $p(n+1) - p(n)$ is bounded for a subshift $(X, \sigma)$ if and only if $X$ is S-adic. Durand's result gives the "if" part, but the "only if" part was disproved by Ferenczi [244]. Cassaigne [146] showed that every sequence in $\{0, \dots, d-1\}^{\mathbb{N}}$ (no matter what its word-complexity is) can be written as an S-adic transformation on alphabet $\{0, \dots, d-1\}$. This is a somewhat stronger form of the S-adic complexity conjecture. See also [228] for the question of under what additional condition we can conclude that a minimal subshift $(X, \sigma)$ is S-adic if and only if its word-complexity $p(n)$ is sublinear.

Example 4.46. Sturmian shifts can be represented as S-adic shifts; see Section 4.3.5 for details. Consider the substitutions
\[
\chi_0 : \begin{cases} 0 \to 0, \\ 1 \to 10 \end{cases}
\qquad \text{and} \qquad
\chi_1 : \begin{cases} 0 \to 01, \\ 1 \to 1; \end{cases}
\]
see (4.31). By themselves they are not primitive, and neither are their iterates $\chi_0^a$ and $\chi_1^a$, but
\[
\chi_1 \circ \chi_0^a : \begin{cases} 0 \to \chi_1(0) = 01, \\ 1 \to \chi_1(1^a 0) = 1^a 01 \end{cases}
\qquad \text{and} \qquad
\chi_0 \circ \chi_1^a : \begin{cases} 0 \to \chi_0(0^a 1) = 0^a 10, \\ 1 \to \chi_0(1) = 10 \end{cases}
\]

are primitive. The limit sequence $\rho = \lim_n \chi_0^{a_1} \circ \chi_1^{a_2} \circ \cdots \circ \chi_1^{a_n}(0)$ is linearly recurrent if and only if $(a_n)_{n\in\mathbb{N}}$ is a bounded sequence. Since all Sturmian sequences can be found this way, where the corresponding frequency $\alpha$ has continued fraction expansion $\alpha = [0; a_1, a_2, \dots]$, Sturmian sequences are linearly recurrent if and only if $\alpha$ is of bounded type; see Durand [221, Proposition 10 and Proposition 5.1 of the addendum]. Note that $\{\chi_0, \chi_1\}$ is not a collection of proper substitutions; a proper S-adic representation of Sturmian sequences was given in [179].

Well before Durand's work, it was shown by Mignosi [418] and [20, Theorem 10.6.1] that Sturmian sequences are $k$-power-free for some $k \in \mathbb{N}$ if and only if the corresponding frequency $\alpha$ has a continued fraction of bounded type. Of course, one direction follows, because if $\alpha$ is of unbounded type, say $a_{r_k} > k$ for arbitrary $k$, then (4.34) shows the occurrence of a $k$-power $\chi_0^{a_1} \circ \chi_1^{a_2} \circ \cdots \circ \chi_0^{a_{r_k}}(b)$ for $b = 0, 1$ depending on whether $r_k$ is even or odd. Mignosi's proof shows that there are no unexpected $k$-powers for $k > \sup_n a_n$.

In [70], many parts of the structure of Pisot substitutions and Rauzy fractals are recovered for classes of S-adic shifts. A specific example of this, presented in [34], is when the associated matrices of all the substitutions in the collection are the same. Another example is provided by the so-called Arnoux-Rauzy substitutions:

Example 4.47. In [35], Arnoux & Rauzy proposed a generalization of Example 4.46 to describe translations on higher dimensional tori $\mathbb{T}^{d-1}$ and interval exchange transformations on $d$ intervals. These are the Arnoux-Rauzy substitutions on the alphabet $\{0, 1, \dots, d-1\}$:
\[
\alpha_i : \begin{cases} i \to i, \\ j \to ji \quad \text{for } j \ne i. \end{cases} \tag{4.21}
\]
Itineraries of IETs and torus rotations with respect to a natural partition can be written as
\[
\lim_{n\to\infty} \alpha_{i(1)} \circ \alpha_{i(2)} \circ \cdots \circ \alpha_{i(n)}(0), \tag{4.22}
\]

or with shifts σ interspersed as discussed in Section 4.3.5 for Sturmian shifts. Arnoux & Rauzy also showed that for three-letter alphabets, every minimal sequence with complexity p(n) = 2n+1 can be written as in (4.22). However, the conjecture that every sequence produced by (4.22) is the itinerary for a point in such a dynamical system was disproved in [149], by constructing unbalanced Arnoux-Rauzy sequences. Note that itineraries of torus rotations and IETs have to be balanced. There are even weakly mixing Arnoux-Rauzy sequences; see [148]. Positive results on large classes of Arnoux-Rauzy sequences were achieved in [65]. In [70, Theorems 3.7 and 3.8] it is shown that typical (in some sense) Arnoux-Rauzy sequences on three letters do correspond to itineraries of torus translations and also that linearly recurrent Arnoux-Rauzy shifts have pure point spectra; see Section 6.8.3. Further results on S-adic shifts pertain to recognizability, e.g. Theorems 5.1 and 5.1 of [71], where it was also proved that a recognizable S-adic shift has finite rank; cf. [71, Corollary 6.7].
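The construction (4.22) can be tried out numerically. The Python sketch below builds a finite approximation for the periodic directive sequence $i(n) = 0, 1, 2, 0, 1, 2, \dots$ (an assumed example, not one singled out in the text) and counts subwords of lengths 1, 2, 3, to compare with the complexity $p(n) = 2n+1$ mentioned above. The helper names are hypothetical.

```python
def arnoux_rauzy(i, d):
    """The substitution alpha_i of (4.21) on {0, ..., d-1}: i -> i, j -> ji for j != i."""
    return {str(j): (str(j) if j == i else str(j) + str(i)) for j in range(d)}

def compose_on_letter(indices, d, letter="0"):
    """Return alpha_{i(1)} ∘ ... ∘ alpha_{i(n)} (letter) for indices = [i(1), ..., i(n)]."""
    word = letter
    for i in reversed(indices):  # the innermost substitution acts first
        sub = arnoux_rauzy(i, d)
        word = "".join(sub[c] for c in word)
    return word

def complexity(word, n):
    """Number of distinct subwords of length n occurring in a finite word."""
    return len({word[k:k + n] for k in range(len(word) - n + 1)})

# Assumed directive sequence: 0, 1, 2 repeated three times.
prefix = compose_on_letter([0, 1, 2] * 3, d=3)
counts = [complexity(prefix, n) for n in (1, 2, 3)]
print(prefix[:20], counts)
```

Only a finite prefix is inspected, so such counts are lower bounds for $p(n)$ in general; for the short lengths used here the prefix is long enough to see all subwords.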

4.3. Sturmian Subshifts

Sturmian sequences emerge as symbolic dynamics of circle rotations or similar systems. There are several textbook sources on the properties of Sturmian sequences, e.g. [85, Chapter 1], [249, Section 6], and [20, Section 5.10]. There are at least three equivalent defining properties, to which we will devote separate sections. The name Sturmian was given by Morse & Hedlund [425], seemingly because these sequences appear in connection with the work of the French mathematician Jacques Sturm (1803–1855) on the number of zeroes that $\sin(\alpha x + \beta)\pi$ has in the interval $[n, n+1)$, but the sequences as such were certainly not studied by Sturm.

There are multiple other ways to obtain Sturmian sequences. For instance, take a piece of paper with a square grid, draw a line on it with slope $\alpha$, and write a 0 whenever it crosses a horizontal grid-line and a 1 whenever it crosses a vertical grid-line (see Figure 4.6, left). Then we obtain a Sturmian sequence. Also, the trajectory of a billiard ball moving frictionless on a rectangular billiard table can be coded symbolically by writing a 0 for each collision with a long edge and a 1 for each collision with a short edge (see Figure 4.6, right). If the motion is not periodic, then the resulting sequence is Sturmian.

Figure 4.6. Sturmian sequences produced as intersections with horizontal and vertical grid-lines (left) and billiards on a rectangular billiard table (right).

Equivalently, Sturmian sequences can be obtained as the difference sequence $b_{n+1} - b_n$ for a Beatty sequence $b_n = \lfloor \alpha n \rfloor$ for some irrational number $\alpha \in (0, 1)$. For irrational $\alpha > 1$, we would obtain Sturmian sequences on a larger alphabet $\{0, 1, \dots, \lceil \alpha \rceil\}$, but we will not address these in this text.

4.3.1. Rotational Sequences.

Definition 4.48. Let $R_\alpha : S^1 \to S^1$, $x \mapsto x + \alpha \bmod 1$, be the rotation over an irrational angle $\alpha$. Let $\beta \in S^1$ and build the itinerary $i(\beta) = u = (u_n)_{n \ge 0}$ by
\[
u_n = \begin{cases} 1 & \text{if } R_\alpha^n(\beta) \in [0, \alpha), \\ 0 & \text{if } R_\alpha^n(\beta) \notin [0, \alpha). \end{cases} \tag{4.23}
\]
Then $u$ is called a rotational sequence.
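To make the two descriptions concrete, here is a small Python sketch for the specific choice $\alpha = (\sqrt 5 - 1)/2$ (an assumption made so that $\lfloor n\alpha \rfloor$ can be computed exactly with integer arithmetic). It produces the Beatty difference sequence $b_{n+1} - b_n$ and compares it with the itinerary (4.23) of the point $\beta = \alpha$, computed in floating point. The function names are illustrative only.

```python
from math import isqrt, sqrt

ALPHA = (sqrt(5) - 1) / 2  # floating-point value, used only for the itinerary

def beatty_golden(n):
    """floor(n * alpha) for alpha = (sqrt(5) - 1) / 2, using integers only."""
    return (isqrt(5 * n * n) - n) // 2

def rotational_prefix(length):
    """First letters of the difference sequence b_{n+1} - b_n, n = 0, 1, ..."""
    return "".join(str(beatty_golden(n + 1) - beatty_golden(n)) for n in range(length))

def itinerary(beta, length):
    """Itinerary (4.23): letter 1 exactly when R_alpha^n(beta) lies in [0, alpha)."""
    return "".join("1" if (beta + n * ALPHA) % 1.0 < ALPHA else "0" for n in range(length))

print(rotational_prefix(13))  # 0101101011011
print(itinerary(ALPHA, 13))
```

The floating-point itinerary is only reliable for moderate lengths, since points of the orbit eventually come arbitrarily close to the endpoints of $[0, \alpha)$; the integer version has no such limitation.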


Remark 4.49. The additional sequences obtained by taking the closure can also be obtained by taking the half-open interval the other way around:
\[
u_n = \begin{cases} 1 & \text{if } R_\alpha^n(x) \in (0, \alpha], \\ 0 & \text{if } R_\alpha^n(x) \notin (0, \alpha]. \end{cases}
\]
In either way, the resulting two-sided subshift $(X_\alpha, \sigma)$ for $X_\alpha = \overline{\mathrm{orb}_\sigma(u)}$ is an extension of $(S^1, R_\alpha)$ where $i : S^1 \to X_\alpha$ is the inverse factor map $i = \psi^{-1}$. Therefore the points $x_n = R_\alpha^n(0)$, $n \in \mathbb{Z}$, have fibers $\psi^{-1}(x_n)$ consisting of two points, whereas $\#\psi^{-1}(x) = 1$ for all other $x$. Thus $(X_\alpha, \sigma)$ is an almost one-to-one extension of the circle rotation; see Section 2.3.1.

Lemma 4.50. Every rotational word $u$ is palindromic: it contains palindromes of arbitrary length.

Remark 4.51. A minimal palindromic shift $(X, \sigma)$ is also mirror invariant, which means that if $w_1 w_2 \cdots w_n \in L(X)$, then also $L(X) \ni w_n w_{n-1} \cdots w_1$. It is an open question (posed by Surer) for which shifts mirror invariance implies that the shift is palindromic. Especially for substitution shifts, this question looks very interesting.

Proof of Lemma 4.50. By symmetry, the two-sided itinerary of $\beta := \alpha/2$ is a palindrome entirely: $u_n = u_{-n}$ for all $n \in \mathbb{Z}$. Since $\{k\alpha + \beta \bmod 1\}_k$ is dense in the circle and uniformly recurrent, every subword $w_1 w_2 \cdots w_n$ in every itinerary will have its reversed copy $w_n w_{n-1} \cdots w_1$ in the same itinerary.

Lemma 4.52. If $w$ is a bi-special subword of a rotational sequence, then it coincides with a prefix of the one-sided itinerary $i(2\alpha \bmod 1)$ of length $q_n + a q_{n+1} - 2$ for some $n \in \mathbb{N}$ and $0 \le a < a_{n+1}$, where $p_n/q_n$ are the convergents of the continued fraction expansion $\alpha = [0; a_1, a_2, a_3, \dots]$ (see Section 8.2).

Proof. Each subword $w$ corresponds to a subinterval $J_w$ of the circle, namely the interval of points $x$ such that $i(x)$ starts with $w$. If $w$ is left-special, so $0w$ and $1w$ are both allowed, then $R_\alpha^{-1}(J_w)$ contains 0 or $\alpha$ in its interior. In the former case, $\alpha \in J_w^\circ$, so not all $x \in J_w$ have the same first letter in their itinerary. Therefore $\alpha \in R_\alpha^{-1}(J_w^\circ)$ and $R_\alpha^2(0) \in J_w^\circ$.

Let $\hat J_w := R_\alpha^{-2}(J_w^\circ) \ni 0$. Now if $w$ is also right-special, then $R_\alpha^{|w|+2}(\hat J_w^\circ) = R_\alpha^{|w|}(J_w^\circ) \ni 0$, and therefore $y := R_\alpha^{-(|w|+2)}(0) \in \hat J_w^\circ$. This means that $y$ is a preimage of 0 such that no preimage of 0 of lower order belongs to $(0, y)$. The points $y$ with this property are ordered as in Figure 4.7, where the numbers $j$ refer to the points $R_\alpha^{-j}(0)$. Therefore $|w| + 2 = q_n + a q_{n+1}$ and the lemma follows.

[Figure 4.7 marks the point 0 and the nearby points $R_\alpha^{-j}(0)$ with labels $j = q_{n-1}$, $q_n + q_{n-1}$, $2q_n + q_{n-1} = q_{n+1}$, $q_{n+2}$, $q_{n+1} + q_n$, $q_n$.]

Figure 4.7. Positions of the preimages of 0 under Rα that are closest to 0.
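Lemma 4.52 can be checked experimentally on a finite prefix. The Python sketch below assumes $\alpha = (\sqrt 5 - 1)/2 = [0; 1, 1, 1, \dots]$, for which all $a_n = 1$, so only $a = 0$ is allowed and the predicted lengths are $q_n - 2$ with $q_n$ the Fibonacci numbers. The prefix length 3000 and the helper names are arbitrary choices for this illustration.

```python
from math import isqrt

def golden_rotational(length):
    """u_1 ... u_length with u_n = floor(n*alpha) - floor((n-1)*alpha), alpha = (sqrt(5)-1)/2."""
    floor_mult = lambda n: (isqrt(5 * n * n) - n) // 2  # exact integer computation
    return "".join(str(floor_mult(n) - floor_mult(n - 1)) for n in range(1, length + 1))

def factors(word, n):
    """Set of subwords of length n of a finite word."""
    return {word[k:k + n] for k in range(len(word) - n + 1)}

def bispecial_lengths(word, max_len):
    """Lengths n <= max_len for which some subword is both left- and right-special."""
    lengths = []
    for n in range(max_len + 1):
        longer = factors(word, n + 1)
        for w in factors(word, n):
            left = ("0" + w in longer) and ("1" + w in longer)
            right = (w + "0" in longer) and (w + "1" in longer)
            if left and right:
                lengths.append(n)
    return lengths

word = golden_rotational(3000)
found = bispecial_lengths(word, 12)

# Denominators q_n of the convergents of [0; 1, 1, 1, ...] are Fibonacci numbers.
q = [1, 1]
while q[-1] < 20:
    q.append(q[-1] + q[-2])
predicted = sorted({qn - 2 for qn in q if 0 <= qn - 2 <= 12})
print(found, predicted)
```

The empty word is counted as bi-special of length 0, matching $q_n - 2 = 0$ for $q_n = 2$.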

Exercise 4.53. Show that every bi-special word of a rotational sequence (so Sturmian sequence by Lemma 4.63) is a palindrome.

We discuss the work of Denjoy [193] on circle homeomorphisms, specifically Denjoy circle maps with only one orbit of maximal wandering intervals. They have minimal sets that are exactly conjugate to Sturmian shifts.

Theorem 4.54 (Denjoy). The rotation number
\[
\rho(f) = \lim_{n\to\infty} \frac{F^n(x) - x}{n} \bmod 1
\]
of a circle homeomorphism $f : S^1 \to S^1$ exists independently of $x$ (and the convergence is uniform). Here $F : \mathbb{R} \to \mathbb{R}$ is a lift of $f$, i.e. a continuous map of the universal cover $\mathbb{R}$ of $S^1$ such that $F(x) \bmod 1 = f(x \bmod 1)$. Furthermore,
• $\rho(f) = \frac{p}{q} \in \mathbb{Q}$ (in lowest terms) if and only if $f$ has a $q$-periodic point;
• if $\rho = \rho(f) \notin \mathbb{Q}$, then $f$ is semi-conjugate to the rotation $R_\rho$: $h \circ f = R_\rho \circ h$. In fact, $h$ is a conjugacy if and only if $f$ is minimal.

For the proof we refer to [414, Chapter I, Theorem 2.1], but let us give some details on how non-minimal circle homeomorphisms $f$ with irrational rotation numbers can be constructed. Start with the rotation $R_\rho : S^1 \to S^1$ and select some $x_1 \in S^1$ (or any finite or countable set of points $x_j \in S^1$ having disjoint orbits under $R_\rho$). For each $k$ and $n \in \mathbb{Z}$, replace $R_\rho^n(x_k)$ by a closed interval $I_{k,n}$ of length $2^{-(k+|n|)}$; this creates a new circle $K$ with circumference $1 + \sum_k \sum_{n\in\mathbb{Z}} 2^{-(k+|n|)} = 1 + 3\sum_k 2^{-k}$. Define $f : I_{k,n} \to I_{k,n+1}$ as an affine (or any orientation-preserving) homeomorphism, and for all $x \in S^1 \setminus \bigcup_{k,n} R_\rho^n(x_k)$ set $f(x) = R_\rho(x)$. Then $f : K \to K$ is indeed a homeomorphism, and $h : K \to S^1$,
\[
h(x) = \begin{cases} R_\rho^n(x_k) & \text{if } x \in I_{k,n}, \\ x & \text{otherwise,} \end{cases} \tag{4.24}
\]
is a semi-conjugacy; see Figure 4.8. Such circle homeomorphisms $f$ are called Denjoy circle maps. There is some restriction on how smooth such homeomorphisms can be. Denjoy proved that if $f$ is a $C^1$ diffeomorphism

such that $\log f'$ has bounded variation¹⁰, then $f$ is minimal. On the other hand, for every $\gamma \in [0, 1)$, there are $C^{1+\gamma}$ Denjoy circle maps; see [309].

¹⁰ For the definition of variation, see before Theorem 8.42.

Figure 4.8. The semi-conjugacy h from a Denjoy circle map to a rotation.

Take $R_\rho$, split open the orbit of 0, replacing the points $R_\rho^n(0)$ by intervals $I_n$, and denote the corresponding Denjoy circle map by $f : K \to K$. Then $K \setminus \bigcup_n I_n^\circ$ is a minimal Cantor set. If we code $[\sup I_0, \inf I_1] \cap X$ by 1 and $[\sup I_1, \inf I_0] \cap X$ by 0, then the coding map $i : X \to \{0, 1\}^{\mathbb{Z}}$ is precisely a conjugacy between $(X, f)$ and a two-sided rotational shift $X_\rho$ with frequency $\rho = \rho(f)$. If we split open $S^1$ only along the backward orbit of 0, then the map $f$ is not invertible at $\alpha$, and we obtain a one-sided rotational shift.

Remark 4.55. In this construction, we have split open only a single orbit, and this leads to a rotational shift. It is of course possible to split open the circle at several orbits. This still leads to an almost one-to-one extension of the circle map, but no longer to a rotational shift of Definition 4.48. The following result on amorphic complexity holds for these more general Denjoy examples, and the proof given works in this generality. An easier proof for rotational shifts is given in [288].

Theorem 4.56. The amorphic complexity of any non-periodic two-sided rotational subshift $(X_\rho, \sigma)$ is 1. Equivalently, $\mathrm{ac}(f) = 1$ for any Denjoy circle map $f : K \to K$.

Proof. Since the two-sided shift $\sigma : X_\rho \to X_\rho$ is conjugate to $f : C \to C$ for $C = K \setminus \bigcup_{k,n} I_{k,n}^\circ$, it suffices to show that $\mathrm{ac}(f|_C) = 1$. Take three points $\xi_1, \xi_2, \xi_3 \in \bigcup_{k,n} I_{k,n}$ such that $d(h(\xi_i), h(\xi_j)) \ge \frac14$ for $i \ne j$. Let $\delta := \min\{|I_{k,n}| : I_{k,n} \ni \xi_j \text{ for some } j\}$ be the minimal length of the intervals corresponding to the $\xi_i$'s.

Since $h(\bigcup_{k,n} I_{k,n})$ is a countable set, we can take $N := \lfloor 1/v \rfloor$ points in $C$ such that $S := \{h(x_i) : i = 1, \dots, N\}$ is an equidistant lattice in $S^1$ with minimal mutual distance $1/N$. Set $J = [x_i, x_j]$ for some $i \ne j$, ordered in such a way that $|h(J)| < \frac12$. Whenever $R_\rho^n(h(J)) \ni \xi_1$, $|f^n(J)| \ge \delta$, but $S^1 \setminus R_\rho^n(h(J))$ has length $\ge 1/2$, so it must contain $\xi_2$ and/or $\xi_3$. Therefore also $|K \setminus f^n(J)| \ge \delta$, and thus $d(f^n(x_i), f^n(x_j)) \ge \delta$. Since
\[
\lim_{n\to\infty} \frac1n \#\{0 \le k < n : R_\rho^k(h(J)) \ni \xi_1\} = \mathrm{Leb}(h(J)) \ge \frac1N \ge v,
\]
we obtain $\limsup_n \frac1n \#\{0 \le k < n : d(f^k(x_i), f^k(x_j)) \ge \delta\} \ge v$, so $S$ is $(\delta, v)$-separated. We have $\#S \ge \frac1v - 1$ and therefore $\mathrm{ac}(f) \ge 1$.

Now for the other direction, we will use $(\delta, v)$-spanning sets; see Remark 2.56. For $v \in (0, 1]$, define a function $\psi_v : S^1 \to [0, |K|]$, where $|K|$ is the circumference of $K$, as
\[
\psi_v(x) = \mathrm{Leb}(h^{-1}([x, x+v])).
\]
Note that $d(x, y) \le \mathrm{Leb}(h^{-1}([h(x), h(y)]))$ (because $d(x, y)$ measures the shortest arc between $x$ and $y$) and $\psi_v(x) \ge \mathrm{diam}(h^{-1}([x, x+v]))$ for all $v$ sufficiently small and $x$ outside the countable set $h(\bigcup_{k,n} I_{k,n})$. Therefore $\psi_v$ is an $L^1$-function. The Birkhoff Ergodic Theorem 6.13 implies that for Leb-a.e. $y \in S^1$,
\[
\lim_{n\to\infty} \frac1n \#\{0 \le k < n : \psi_v(R_\rho^k(y)) \ge \delta|K|\} = \mathrm{Leb}(\{\psi_v \ge \delta|K|\}). \tag{4.25}
\]
We claim that $m_v := \mathrm{Leb}(\{\psi_v \ge \delta|K|\}) \le 2v(1/\delta + 1)$. Indeed, if $m_v > 2v(1/\delta + 1)$, then the set $\{\psi_v \ge \delta|K|\}$ cannot be contained in the union of at most $\lfloor 1/\delta \rfloor + 2$ intervals of length $v$. Therefore there are $\tilde N = \lfloor 1/\delta \rfloor + 2$ points $\xi_i \in S^1$ such that $\psi_v(\xi_i) \ge \delta|K|$ and of minimal mutual distance $d(\xi_i, \xi_j) \ge v$. It follows that
\[
\sum_{i=1}^{\tilde N} \mathrm{Leb}(h^{-1}([\xi_i, \xi_i + v])) = \sum_{i=1}^{\tilde N} \psi_v(\xi_i) \ge \tilde N \delta |K| \ge (1+\delta)|K|,
\]
contradicting that $\bigcup_i h^{-1}([\xi_i, \xi_i + v])$ consists of $\tilde N$ disjoint intervals inside a circle of circumference $|K|$. This proves the claim.

Hence we can find a set $S = \{y_1, y_2, \dots, y_N\}$ for $N = \lceil 1/v \rceil$ such that $h(S)$ is an equidistant lattice on $S^1$ (with minimal mutual distance $1/N$) and (4.25) holds for every $h(y_i)$. Without loss of generality, the $y_i$'s can be arranged in circular order on $K$.

Now take $y \in K$ arbitrary and $i$ such that $y \in [y_i, y_{i+1 \bmod N})$. Then $h(y) \in [h(y_i), h(y_i) + v)$ and $d(f^k(y_i), f^k(y)) \le \psi_v(R_\rho^k(h(y_i)))$. Therefore
\[
\limsup_{n\to\infty} \frac1n \#\{0 \le k < n : d(f^k(y_i), f^k(y)) \ge \delta\}
\le \limsup_{n\to\infty} \frac1n \#\{0 \le k < n : \psi_v(R_\rho^k(h(y_i))) \ge \delta\} = m_v,
\]
which means that $S$ is $(\delta, m_v)$-spanning. Using the spanning set equivalent of (2.8), we obtain
\[
\mathrm{ac}(f) \le \sup_{\delta|K| > 0} \limsup_{v \to 0} \frac{\log 2v(1/\delta + 1)}{-\log v} = 1,
\]
and the result follows.
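Returning to the rotation number of Theorem 4.54, the limit can be approximated by iterating a lift. The Python sketch below uses the family $F(x) = x + \omega + \frac{\varepsilon}{2\pi}\sin(2\pi x)$ with $|\varepsilon| < 1$ purely as an assumed example of a lift of a circle homeomorphism; it is not a Denjoy circle map, and the iteration count is an arbitrary choice.

```python
from math import sin, pi

def lift(x, omega, eps):
    """Assumed example lift F(x) = x + omega + (eps / 2π) sin(2πx), with |eps| < 1."""
    return x + omega + eps / (2 * pi) * sin(2 * pi * x)

def rotation_number(omega, eps, n=20000, x0=0.0):
    """Approximate (F^n(x0) - x0) / n mod 1 by direct iteration of the lift."""
    x = x0
    for _ in range(n):
        x = lift(x, omega, eps)
    return ((x - x0) / n) % 1.0

print(rotation_number(0.3, 0.0))  # rigid rotation: close to 0.3
print(rotation_number(0.3, 0.5))  # perturbed map: some value in [0, 1)
```

For $\varepsilon = 0$ the map is the rotation $R_\omega$ and the estimate returns $\omega$ up to rounding; for $\omega = 0$ the point 0 is fixed and the estimate is exactly 0, in line with the first bullet of the theorem.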

4.3.2. Balanced Words. Another characterization of Sturmian words is by means of their property of being balanced. Recall from Definition 4.36 that a language $L \subset A^*$ is $R$-balanced if for all $a \in A$, $n \in \mathbb{N}$, and $v, w \in L_n$, the numbers $|v|_a$ and $|w|_a$ of letters $a$ in $v$ and $w$ differ by at most $R$. Here, we are only interested in the case $R = 1$, so balanced will mean 1-balanced in this section.

Definition 4.57. Clearly, a balanced word $x$ contains precisely one of 00 and 11 as factors (unless $x = 10101010\cdots$ or $x = 01010101\cdots$). We say that a balanced word $x \in \{0, 1\}^{\mathbb{N} \text{ or } \mathbb{Z}}$ is of type $i$ if the word $ii$ appears in $x$.

Lemma 4.58. Every rotational sequence is balanced.

Proof. An equivalent way to define a rotational sequence $u$ is that there is a fixed $\beta \in S^1$ such that
\[
u_n = \lfloor n\alpha + \beta \rfloor - \lfloor (n-1)\alpha + \beta \rfloor \tag{4.26}
\]
for all $n \in \mathbb{Z}$. This is easy to check, except that in order to include the sequences mentioned in Remark 4.49, we need to add the alternative definition
\[
u_n = \lceil n\alpha + \beta \rceil - \lceil (n-1)\alpha + \beta \rceil \tag{4.27}
\]
for all $n \in \mathbb{Z}$. By telescoping series,
\begin{align*}
|u_{k+1} \cdots u_{k+n}|_1 &= \lfloor (k+1)\alpha + \beta \rfloor - \lfloor k\alpha + \beta \rfloor + \lfloor (k+2)\alpha + \beta \rfloor - \lfloor (k+1)\alpha + \beta \rfloor + \cdots + \lfloor (k+n)\alpha + \beta \rfloor - \lfloor (k+n-1)\alpha + \beta \rfloor \\
&= \lfloor (k+n)\alpha + \beta \rfloor - \lfloor k\alpha + \beta \rfloor = \lfloor n\alpha \rfloor \text{ or } \lfloor n\alpha \rfloor + 1
\end{align*}
regardless of what $k$ is. It follows that $u$ is balanced.

Lemma 4.59. If $X$ is an unbalanced subshift on alphabet $\{0, 1\}$, then there is a (possibly empty) word $w$ such that both $0w0, 1w1 \in L(X)$.

Proof. Let $K$ be minimal such that there are $K$-words $u = u_1 \cdots u_K$ and $v = v_1 \cdots v_K \in L_K(X)$ such that $|\, |u|_1 - |v|_1 | \ge 2$. Since $|u|_1 - |v|_1$ can change by at most 1 if $u, v$ are shortened or expanded by one letter, the minimality of $K$ implies that $u = 0u_2 \cdots u_{K-1}0$ and $v = 1v_2 \cdots v_{K-1}1$ (or vice versa) and $|u_2 \cdots u_{K-1}|_1 = |v_2 \cdots v_{K-1}|_1$. If $u_2 \cdots u_{K-1} = v_2 \cdots v_{K-1}$, then we have found our word $w$. Otherwise, take $k = \min\{j > 1 : u_j \ne v_j\}$ and $l = \max\{j < K : u_j \ne v_j\}$. We have four¹¹ possibilities, all leading to shorter possible words (the displayed letters sit at positions $1$, $k$, $l$ and $K$):

(i) $u = 0 \cdots 1 \cdots 1 \cdots 0$, $v = 1 \cdots 0 \cdots 0 \cdots 1$;
(ii) $u = 0 \cdots 1 \cdots 0 \cdots 0$, $v = 1 \cdots 0 \cdots 1 \cdots 1$;
(iii) $u = 0 \cdots 0 \cdots 1 \cdots 0$, $v = 1 \cdots 1 \cdots 0 \cdots 1$;
(iv) $u = 0 \cdots 0 \cdots 0 \cdots 0$, $v = 1 \cdots 1 \cdots 1 \cdots 1$.

In each case suitable subwords give a shorter pair $u, v$. This contradicts the minimality of $K$. The proof is complete, but note that we have proved that $|w| \le K - 2$ as well.

¹¹ In the first case, the shorter words $u, v$ are not necessarily of the form $0w0$ and $1w1$ yet, so the whole argument needs repeating.

4.3.3. Sturmian Sequences.

Definition 4.60. A sequence $u \in \{0, 1\}^{\mathbb{N}}$ or $\{0, 1\}^{\mathbb{Z}}$ is called Sturmian if it is recurrent under the shift $\sigma$, and the number of different words of length $n$ in $u$ equals $p_u(n) = n + 1$ for each $n \ge 0$. The corresponding subshift $(X, \sigma)$ for the shift-orbit closure $X = \overline{\mathrm{orb}_\sigma(u)}$ is called a Sturmian subshift.

Remark 4.61. The assumption that $u$ is recurrent is important for the two-sided case. Also $\cdots 00000100000 \cdots$ has $p(n) = n + 1$, but we don't want to consider such asymptotically periodic sequences. For one-sided infinite words, the recurrence follows from the assumption that $p_u(n) = n + 1$ for all $n \in \mathbb{N}$.

Remark 4.62. A Sturmian sequence contains exactly one left-special and one right-special word of length $n$ for each $n \in \mathbb{N}$. If they coincide, then this is a bi-special word; see Lemma 4.52.

Lemma 4.63. Every rotational sequence is Sturmian.

Proof. Let i(x) denote the itinerary of x ∈ S1 w.r.t. {[0, α), [α, 1)}. If ik (x) = ik (y) for 0 ≤ k < n, then Rαk (x) and Rαk (y) belong to the same set [0, α) or [α, 1) for each 0 ≤ k < n. In other words, the interval [x, y) contains no point in Qn := {Rα−k (α) : 0 ≤ k ≤ n}. But Qn consists of exactly n + 1 points, and it divides the circle into n + 1 intervals. Each such interval corresponds to a unique word of length n in the language, so p(n) = n + 1.  Example 4.64. This lemma depends crucially on the partition of S1 into intervals [0, α) and [α, 1). If we take the intervals [0, γ) and [γ, 1) for some γ rationally independent of α ∈ [0, 1] \ Q instead, then p(n) = 2n for all n ≥ 1. Exercise 4.65. Given N ∈ N, find a subshift with complexity p(n) = 2n for n ≤ N and p(n) = N + n for n ≥ N . Theorem 4.66. A non-periodic sequence is Sturmian if and only if it is balanced. Proof. Let x ∈ AN or AZ for A = {0, 1}. ⇐: We prove by contrapositive, so assume that there is a minimal N ∈ N such that p(N ) ≥ N + 2. (Recall from Proposition 1.12 that p(N ) ≤ N implies that x is periodic.) Since p(1) = #A = 2 and 00 and 11 cannot both be words of x (otherwise it wouldn’t be balanced at word-length 2), N ≥ 3. For every n < N − 1, there is one right-special word, but there are two distinct right-special words, say u and v, of length N − 1. In particular, u and v can only differ at their first symbol, because otherwise there are two distinct right-special words of length N − 2. Hence there is w such that 0w = u and 1w = v. But since u and v are right-special, 0w0 and 1w1 are both words in x, and x cannot be balanced. ⇒: Again, proof by contrapositive, so assume that p(n) = n + 1 for all n ∈ N, but x is not balanced. Let N be the minimal integer where this unbalance becomes apparent. We have p(2) = 3. Since both 01 and 10 occur in x (otherwise it would end in 0∞ or 1∞ ) at least one of 00 and 11 cannot occur in x, and hence N ≥ 3. 
By Lemma 4.59, there is a word w = w1 · · · wN −2 such that both 0w0 and 1w1 occur in x. Observe that w1 = wN −2 , because otherwise both 00 and 11 occur in x. To be definite, suppose that w1 = wN −2 = 0. If N = 3, then w1 = wN −2 , so w is a palindrome. If N ≥ 4, then w2 = wN −3 because otherwise 000 and 101 both occur in x, contradicting that N is the minimal length where the unbalance becomes apparent. Continuing this way, we conclude that w is a palindrome: wk = wN −k−1 for all 1 ≤ k ≤ N − 2.

Since $p(N-2) = N-1$ and $w$ is bi-special, exactly one of $0w$ and $1w$ is right-special. Say $0w0$, $0w1$, and $1w1$ occur, but not $1w0$.

Claim: If $1w1$ is a prefix of the $(2N-2)$-word $x_{j+1} \cdots x_{j+2N-2}$, then $0w$ does not occur in this word.

Suppose otherwise. Since $|1w1| = N$ and $|0w| = N-1$, the occurrence of $0w$ must overlap with $1w1$, say starting at entry $k$. Then $w_k \cdots w_{N-2}1 = 0w_1 \cdots w_{N-k-1}$, so $w_k = 0 \ne 1 = w_{N-k-1}$. This contradicts that $w$ is a palindrome and proves the claim.

Now $x_{j+1} \cdots x_{j+2N-2}$ contains $N$ words of length $N-1$, but not $0w$, according to the claim. This means that one of the remaining $(N-1)$-words must appear twice, and none of these words is right-special. It follows that $x_{j+1} \cdots x_{j+2N-2}$ can only be continued to the right periodically, and $p(n) \le N$ for all $n$. This contradiction concludes the proof.

Proposition 4.67. If the infinite non-periodic sequence $u$ is balanced, then
\[
\alpha := \lim_{n\to\infty} \frac1n |u_1 \cdots u_n|_1
\]
exists and is irrational. We call $\alpha$ the frequency of $u$.

Proof. Define
\[
M_n = \min\{|u_{k+1} \cdots u_{k+n}|_1 : k \ge 0\}. \tag{4.28}
\]
Since $u$ is balanced, $\max\{|u_{k+1} \cdots u_{k+n}|_1 : k \ge 0\} = M_n + 1$, so $|u_{k+1} \cdots u_{k+n}|_1 = M_n$ or $M_n + 1$ for every $k \in \mathbb{N}$. For $q, n \in \mathbb{N}$ such that $n > q^2$, we can write $n = kq + r$ for a unique $k \ge q$ and $0 \le r < q$. We have
\[
kM_q \le M_{kq+r} = M_n \le k(M_q + 1) + r. \tag{4.29}
\]
Dividing by $n$ gives
\[
\frac{M_q - 1}{q} \le \frac{M_n}{n} \le \frac{M_q}{q} + \frac{2}{q}.
\]
As this holds for all $q \le q^2 < n$, we conclude that $\{\frac{M_n}{n}\}_{n\in\mathbb{N}}$ is a Cauchy sequence, say with limit $\alpha$.

Now to prove that $\alpha$ is irrational, assume by contradiction that $\alpha = \frac{p}{q}$ and take $k = 2^m$, $r = 0$, and $n = 2^m q$ in (4.29) for increasing $m \in \mathbb{N}$. This gives
\[
\frac{M_q}{q} \le \frac{M_{2q}}{2q} \le \frac{M_{4q}}{4q} \le \cdots \le \frac{M_{4q} + 1}{4q} \le \frac{M_{2q} + 1}{2q} \le \frac{M_q + 1}{q},
\]
so $\{\frac{M_{2^m q}}{2^m q}\}_m$ is increasing and $\{\frac{M_{2^m q} + 1}{2^m q}\}_m$ is decreasing in $m$. They converge to $\frac{p}{q}$, so $p = M_q$ or $M_q + 1$.

If $p = M_q$, then in particular $M_{2^m q} = 2^m M_q$ for all $m \ge 0$. This implies that every $2^m q$-word $w$ with minimal $|w|_1$ is in fact a concatenation $w_1 w_2 \cdots w_{2^m}$ of $q$-words all with $|w_i|_1 = M_q$. Take a subword $W := w_m \cdots w_n$ containing all $q$-words such that $w_{m+1} \ne w_m = w_n$; also, since $u$ is non-periodic, $W$ can be taken non-periodic too. Therefore there exists a $v_1$ in $W$ with $|v_1|_1 = M_q + 1$; we take the leftmost. Since $w_m \ne w_{m+1}$, this word $v_1$ overlaps with $w_m$, and we can write $W = w'_1 v_1 v_2 \cdots v_{n-m} w'_n$, where all $v_i$ are $q$-words, and $w'_1$ is a prefix of $w_1$ and $w'_n$ a suffix of $w_n$ such that $w'_1 w'_n = w_1$. But this means that $|W|_1 \ge qM_q + 1$, a contradiction.

Finally, if $p = M_q + 1$, then we repeat this argument with a concatenation $W = w_m \cdots w_n$ of $q$-words $w_i$ with $|w_i|_1 = M_q + 1$. This completes the proof.

Lemma 4.68. If $u$ and $u' \in \{0, 1\}^{\mathbb{N} \text{ or } \mathbb{Z}}$ are balanced words with the same frequency $\alpha$, then $u$ and $u'$ generate the same language.

Proof. From the proof of Proposition 4.67 we know that $\alpha \in (\frac{M_n}{n}, \frac{M_n + 1}{n})$ and $\alpha \in (\frac{M'_n}{n}, \frac{M'_n + 1}{n})$, where $M_n$ and $M'_n$ are given by (4.28) for $u$ and $u'$, respectively. This implies that $M_n = M'_n$ for all $n \in \mathbb{N}$.

For each $n \in \mathbb{N}$, $u$ and $u'$ each have only one right-special word in $L_n(X)$; we first show that these right-special words, say $w$ and $w'$, are the same. Assume by contradiction that there is some minimal $n$ such that $w \ne w'$. Hence there is an $(n-1)$-word $v$ such that $w = 0v$ and $w' = 1v$ (or vice versa). But $v$ is right-special, so all four of $0v0$, $0v1$, $1v0$, and $1v1$ occur in the combined languages. But then $M_{n+1} = |v|_1 \le M'_{n+1} - 1$, a contradiction.

By uniform recurrence (of minimal subshifts), every word of length $n$ appears in any sufficiently long word, specifically in every sufficiently long right-special word. But as these right-special words of $u$ and $u'$ are the same, $u$ and $u'$ have the same subwords altogether.
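The quantities $M_n$ of (4.28) and the resulting bounds on the frequency can be computed for a concrete balanced word. The Python sketch below again assumes the rotational word with $\alpha = (\sqrt 5 - 1)/2$ and $\beta = 0$, restricted to a finite prefix, so the minimum is taken over finitely many windows only; the names are illustrative.

```python
from math import isqrt, sqrt

def floor_mult(n):
    """floor(n * alpha) for alpha = (sqrt(5) - 1) / 2, using integers only."""
    return (isqrt(5 * n * n) - n) // 2

def golden_rotational(length):
    """u_1 ... u_length with u_n = floor(n*alpha) - floor((n-1)*alpha)."""
    return [floor_mult(n) - floor_mult(n - 1) for n in range(1, length + 1)]

def window_extremes(u, n):
    """Minimum and maximum number of 1s over all windows of length n in the finite word u."""
    ones = sum(u[:n])
    lo = hi = ones
    for k in range(n, len(u)):
        ones += u[k] - u[k - n]  # slide the window by one letter
        lo, hi = min(lo, ones), max(hi, ones)
    return lo, hi

u = golden_rotational(3000)
alpha = (sqrt(5) - 1) / 2
for n in (1, 2, 5, 13, 30):
    m_n, top = window_extremes(u, n)
    print(n, m_n / n, alpha, (m_n + 1) / n)
```

In this example the first window $u_1 \cdots u_n$ already contains $\lfloor n\alpha \rfloor$ ones, so the computed minimum agrees with $\lfloor n\alpha \rfloor$ and the printed intervals shrink around $\alpha$.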
We finish this section by proving the last implication for the three equivalent characterizations of Sturmian sequences, due to Morse & Hedlund [425].

Theorem 4.69. Every Sturmian sequence is rotational.

Proof. Let $u$ be a Sturmian sequence; by Theorem 4.66 it is balanced as well. By Proposition 4.67, $u$ has an irrational frequency $\alpha = \lim_n \frac{1}{n} |u_1 \cdots u_n|_1$, and by Lemma 4.68, every Sturmian sequence with frequency $\alpha$ generates the same language as $u$. It is clear that the rotational sequence $v_n = \lfloor n\alpha \rfloor - \lfloor (n-1)\alpha \rfloor$, i.e. $v = i(0)$ (as in (4.26)), has frequency $\alpha$. Therefore there is a sequence $b_j$ such that $\sigma^{b_j}(v) \to u$. By passing to a subsequence if necessary, we can assume that $\lim_j R_\alpha^{b_j}(0) = \beta$. Then (assuming that $n\alpha + \beta \notin \mathbb{Z}$, so we can use continuity of $x \mapsto \lfloor x \rfloor$ at this point),
\[ u_n = \lim_{j\to\infty} (\sigma^{b_j} v)_n = \lim_{j\to\infty} \lfloor (n + b_j)\alpha \rfloor - \lfloor (n + b_j - 1)\alpha \rfloor = \lfloor n\alpha + \beta \rfloor - \lfloor (n-1)\alpha + \beta \rfloor = i(\beta)_n. \]
If $n\alpha + \beta \in \mathbb{Z}$, then we need to take the definition (4.27) into account. Note, however, that since $\alpha \notin \mathbb{Q}$, this occurs at most for one value of $n \in \mathbb{Z}$. This proves the theorem.

4.3.4. Rauzy Graphs. The Rauzy graph $\Gamma_n$ of a Sturmian subshift $X$ is the word-graph in which the vertices are the words $u \in L_n(X)$ and there is an arrow $u \to u'$ if $ua = bu'$ for some $a, b \in \{0, 1\}$. Hence $\Gamma_n$ has $p(n) = n + 1$ vertices and $p(n+1) = n + 2$ edges; it is the vertex-labeled transition graph of the $n$-block shift interpretation of the Sturmian shift.

Figure 4.9. The Rauzy graph Γ6 based on the Fibonacci Sturmian sequence. (Vertices: 110110, 010110, 011011, 101101, 101011, 011010, 110101.)

In the example of Figure 4.9, the word u = 101101 is bi-special, but only 0u0, 0u1, and 1u0 ∈ L(X) (i.e. u is a regular bi-special word). Since p(n + 1) − p(n) = 1 for a Sturmian sequence, every Rauzy graph contains exactly one left-special and one right-special word, and they may be merged into a single bi-special word. Hence, there are two types of Rauzy graphs; see Figure 4.10.

Figure 4.10. The two types of Rauzy graphs for a Sturmian sequence. (Left panel: distinct left-special and right-special vertices; right panel: a single bi-special vertex.)
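The graph Γ6 of Figure 4.9 can be recomputed from a long prefix of the sequence. The sketch below assumes that the Fibonacci Sturmian sequence of the figure is the fixed point of the substitution 0 → 1, 1 → 10 (this choice reproduces the vertex labels of the figure); the function names are illustrative only. Each (n+1)-word ua = bu' contributes the arrow u → u'.

```python
def fibonacci_prefix(iterations):
    """Prefix of the fixed point of the substitution 0 -> 1, 1 -> 10, starting from 1."""
    images = {"0": "1", "1": "10"}
    word = "1"
    for _ in range(iterations):
        word = "".join(images[c] for c in word)
    return word


def factors(word, n):
    """All factors of length n that occur in the finite word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def rauzy_graph(word, n):
    """Vertices are the n-words; every (n+1)-word w gives the arrow w[:-1] -> w[1:]."""
    vertices = factors(word, n)
    edges = {(w[:-1], w[1:]) for w in factors(word, n + 1)}
    return vertices, edges


def special_words(vertices, edges):
    """Return (left-special, right-special) vertices, i.e. in-degree >= 2 and out-degree >= 2."""
    out_deg = {v: 0 for v in vertices}
    in_deg = {v: 0 for v in vertices}
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    left = {v for v in vertices if in_deg[v] >= 2}
    right = {v for v in vertices if out_deg[v] >= 2}
    return left, right


word = fibonacci_prefix(12)
vertices, edges = rauzy_graph(word, 6)
left, right = special_words(vertices, edges)
print(sorted(vertices))
print(len(edges), left, right)
```

For n = 6 the left-special and right-special vertices coincide, so Γ6 is of the second type in Figure 4.10.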


The transformation from $\Gamma_n$ to $\Gamma_{n+1}$ is as follows:

(a) If $\Gamma_n$ is of the first type, then the middle path decreases by one vertex, and the upper and lower path increase by one vertex. The left-special vertex of $\Gamma_n$ splits into two vertices, with outgoing arrows leading to the previous successor vertex which now becomes left-special. Similarly, the right-special vertex of $\Gamma_n$ is split into two vertices with incoming arrows emerging from the previous predecessor vertex, which now becomes right-special.

(b) If $\Gamma_n$ is of the second type, then one of the two paths becomes the central path in $\Gamma_{n+1}$, the other path becomes the upper path of $\Gamma_{n+1}$, and there is an extra arrow in $\Gamma_{n+1}$ from the right-special word to the left-special word. Thus the bi-special vertex of $\Gamma_n$ is split into two vertices, one of which becomes left-special in $\Gamma_{n+1}$, and one of the predecessors of the bi-special vertex in $\Gamma_n$ becomes right-special in $\Gamma_{n+1}$.

We can combine all Rauzy graphs into a single inverse limit space
\[ \Gamma = \varprojlim (\Gamma_n, \pi_n) = \{(\gamma_n)_{n \ge 0} : \pi_{n+1}(\gamma_{n+1}) = \gamma_n \in \Gamma_n \text{ for all } n \ge 0\}, \]
where $\Gamma_0$ has only one vertex $\epsilon$ and one arrow $\epsilon \to \epsilon$, and $\pi_{n+1} : \Gamma_{n+1} \to \Gamma_n$ is the prefix map $\pi_{n+1}(ua) = u$ for every $u \in L_n(X)$, $a \in A$. It is the inverse of the map described in items (a) and (b) above. By definition, the arrows in $\Gamma_n$ have the following property called edge surjective: If $v \to v'$ is an arrow in $\Gamma_{n-1}$, then there is an arrow $u \to u'$ in $\Gamma_n$ such that $\pi_n(u) = v$ and $\pi_n(u') = v'$. It also has the property called positive directional: If $u \to u'$ and $u \to u''$ are two arrows in $\Gamma_n$ starting at the same vertex, then $\pi_n(u') = \pi_n(u'')$; that is, $\pi_n$ maps these arrows to the same arrow in $\Gamma_{n-1}$. Equipped with product topology, $\Gamma$ is a Cantor set. We can define a map $f : \Gamma \to \Gamma$ by "moving one step" along the arrows:
\[ f(\gamma)_n = u' \quad \text{if } u = \gamma_n \in \Gamma_n \text{ and there is an arrow } u \to u'. \tag{4.30} \]

It seems as if this definition is ambiguous, but by the positive directional property, all choices for $f(\gamma)_{n+1}$ give the same answer for $f(\gamma)_n$. Since this holds for all $n \ge 0$, the sequence $\gamma = (\gamma_n)_{n \ge 0}$ completely determines $f(\gamma)$. Moreover, $f : \Gamma \to \Gamma$ is continuous. The system $(\Gamma, f)$ is called the graph cover of the shift $(X, \sigma)$; it provides a model that is conjugate to the shift. (In order to agree with other graph cover constructions, we can speed up this process by “telescoping” between $n$’s where there is a bi-special word.) This point of view, which holds for shifts in general and in fact for all continuous Cantor systems, was introduced by Gambaudo & Martens [273] and studied by several authors, especially Shimomura; see e.g. [500–502].

Theorem 4.70. For each $k \in \mathbb{N}$, there are at most three values that the frequency
\[ \lim_{n\to\infty} \frac{1}{n} \#\{1 \le j \le n : x_{j+1} \cdots x_{j+k} = w\} \]
can take for a $k$-word $w$ in a Sturmian sequence $x$. These three values depend only on $k$ and the rotation angle $\alpha$.

Remark 4.71. This is the symbolic version of what is known as the Three Gap Theorem, which was conjectured by Hugo Steinhaus and eventually proven by Vera Sós [520, 521]: For every $\alpha \in \mathbb{R} \setminus \mathbb{Q}$ and $n \in \mathbb{N}$, the set $\{j\alpha \bmod 1\}_{j=0}^{n-1}$ divides the circle into intervals of at most three different sizes. Indeed, since Lebesgue measure is the only probability measure that is preserved by the rotation $R_\alpha : x \mapsto x + \alpha \bmod 1$, the frequencies in Theorem 4.70 correspond to the Lebesgue measures (i.e. lengths) of the intervals.

Proof. This is a special case of the more general statement that the frequency can take at most $3(p(n+1) - p(n))$ values, which we will prove here. For Sturmian sequences $3(p(n+1) - p(n)) = 3$. Let $n \in \mathbb{N}$ be arbitrary and let $\Gamma_n$ be the word-graph (Rauzy graph) of the language. For every vertex $a \in \Gamma_n$ let $a^-$ and $a^+$ be the number of incoming and outgoing arrows. That is, $a$ is left-special, resp. right-special, if $a^- \ge 2$, resp. $a^+ \ge 2$. Let $V_1 = \{a \in \Gamma_n : a^+ \ge 2\}$ be the collection of right-special words of length $n$. Then
\[ \#V_1 = \sum_{a^+ \ge 2} 1 \le \sum_{a^+ \ge 2} (a^+ - 1) \le p(n+1) - p(n). \]

Next set $V_2 = \{a \in \Gamma_n : a^+ = 1,\ a \to b,\ b^- \ge 2\}$. These are the words $a = a_0 c$ that can be extended to the right in a unique way, say $a_0 c a_{n+1}$, but $b = c a_{n+1}$ is left-special. We have
\[ \#V_2 \le \sum_{b^- \ge 2} b^- = \sum_{b^- \ge 2} (b^- - 1) + \sum_{b^- \ge 2} 1 \le 2(p(n+1) - p(n)). \]

Now every $a \in \Gamma_n \setminus (V_1 \cup V_2)$ is not right-special, and if $a \to b$, then $b$ is not left-special. This means that $a$ and $b$ appear with the same frequency in infinite words $x \in X$. Every maximal path $\ell \subset \Gamma_n \setminus (V_1 \cup V_2)$ is succeeded by a vertex $v \in V_1 \cup V_2$ (because otherwise the last vertex of $\ell$ belongs to $V_2$), and no other such maximal path is succeeded by $v$. The vertex in $V_1 \cup V_2$ succeeding $\ell$ has then the same frequency as the elements in $\ell$. Therefore, the number of different frequencies is bounded by $\#(V_1 \cup V_2) \le 3(p(n+1) - p(n))$ as claimed.

According to this proof applied to Figure 4.10, the upper and lower arrow indicate two maximal paths, both with an extra vertex in $V_2$, with distinct frequencies. The middle path (including the left-special and right-special word) has the sum of these frequencies. In the right panel, the bi-special word is the unique vertex that occurs with the sum of the frequencies.

4.3.5. Sturmian Sequences and Substitutions. In this section we will show that every Sturmian shift $(X_\alpha, \sigma)$ is generated as an S-adic shift based on two substitutions; the order in which these substitutions, $\chi_0$ and $\chi_1$, are applied is determined by the continued fraction expansion (see Section 8.2) of the rotation number $\alpha$. Every sequence $s \in X_\alpha$ can be generated by using these two substitutions interspersed with shifts. This idea goes back to Morse & Hedlund [425]. Many more details are given by Arnoux in [249, Section 6.3] and [31, 179]. In contrast to Arnoux’s exposition, we prefer to use one-sided Sturmian sequences only. Define the substitutions
\[ \chi_0 : \begin{cases} 0 \to 0, \\ 1 \to 10, \end{cases} \qquad \chi_1 : \begin{cases} 0 \to 01, \\ 1 \to 1. \end{cases} \tag{4.31} \]

Lemma 4.72. Suppose that $s = s_1 s_2 s_3 \cdots$ is not left-special, i.e. there is only one symbol $a_0 \in \{0, 1\}$ such that all finite prefixes of $a_0 s$ occur in $s$. Then there is a unique sequence
\[ t = t_1 t_2 t_3 \cdots =: \Phi(s) \quad \text{such that} \quad \begin{cases} s = \chi_i(t) & \text{if } a_0 s_1 \ne \chi_i(b_0), \\ s = \sigma \circ \chi_i(t) & \text{if } a_0 s_1 = \chi_i(b_0), \end{cases} \tag{4.32} \]
for some $b_0 \in \{0, 1\}$. Moreover, $s$ is Sturmian if and only if $t$ is Sturmian.

If $s$ is left-special, then the first symbol $t_1$ is not uniquely determined. The only left-special right-infinite Sturmian words (when viewed as rotation sequences) are the itineraries of $\alpha$ and $2\alpha$. Before proving this, note that if $s = \chi_i(t)$ or $s = \sigma \circ \chi_i(t)$ for $i \in \{0, 1\}$, then more than half of its symbols are equal to $i$. We call this symbol the type of $s$; see Definition 4.57. This gives an immediate way to determine if the inverse of $\chi_0$ or $\chi_1$ is applied in $\Phi$.

Proof. Assume that $s$ is of Type 0 (the proof for the other type goes likewise). Note that since 11 doesn’t appear in $s$, it has a unique 1-cutting into words 0 and 10. The choice of whether in this 1-cutting $s_1$ is a block by itself or the second letter of a block is determined by whether the symbol $a_0$ that can be put in front of $s$ is 0 or 1. With this caveat, the choice of $t$ is unique.

Now for the second statement, suppose by contradiction that $s$ is not Sturmian (and hence not balanced; see Theorem 4.66); then by Lemma 4.59, there is a word $w$ such that both $0w0$ and $1w1$ appear in $s$. Since $s$ doesn’t contain 11, $w = 0v0$ for some $v$, and $v0 = \chi_0(v')$. Now $10v01$ is the prefix of $\chi_0(1v'1)$ and $00v00$ is the prefix of $\chi_0(10v'0)$ or $\chi_0(00v'0)$. This means that both $0v'0$ and $1v'1$ are factors of $t$, so $t$ is not balanced.

For the converse, suppose by contradiction that $t$ is not Sturmian (hence not balanced) and that $w$ is such that $0w0$ and $1w1$ both appear in $t$. Then $a0w0$ appears as a factor too for some $a \in \{0, 1\}$, unless $0w0$ only appears in $t$ as initial word. Let $w' = \chi_0(w)$. Now $\chi_0(1w1) = 10w'10$ and $\chi_0(a0w0) = \chi_0(a)0w'0$. Because $\chi_0(a)$ ends in 0, both $10w'1$ and $00w'0$ appear in $s$, so $s$ is not Sturmian. Finally, if $0w0$ is the only prefix of $t$ and doesn’t appear elsewhere in $t$, then $a_0 = 0$ and $0\chi_0(0w0) = 00w'0$ appears in $s$. As above, also $10w'1$ appears in $s$, so $s$ is not Sturmian.

Recall that $X_\alpha$ is the space of one-sided Sturmian sequences with frequency of the symbol 1 equal to $\alpha$.

Proposition 4.73. If $s \in X_\alpha$ is a Sturmian sequence and $t = \Phi(s)$, then $t \in X_{g(\alpha)}$, where $g : [0, 1] \to [0, 1]$ is defined as
\[ g(\alpha) = \begin{cases} \frac{\alpha}{1-\alpha}, & \alpha \in [0, \frac12), \\ \frac{2\alpha - 1}{\alpha}, & \alpha \in (\frac12, 1]. \end{cases} \tag{4.33} \]

Proof. Let $\alpha'$ be the frequency of symbols 1 in $t$; i.e.
\[ \alpha' = \lim_{n\to\infty} \frac{1}{n} |t_{k+1} \cdots t_{k+n}|_1, \]
uniformly in $k \in \mathbb{N}$. If $w = t_{k+1} \cdots t_{k+n}$, then the limit frequency of 1’s in $\chi_i(w)$ as $n \to \infty$ is
\[ \alpha = \begin{cases} \frac{\alpha'}{1+\alpha'} \in [0, \frac12], & i = 0, \\ \frac{1}{2-\alpha'} \in [\frac12, 1], & i = 1. \end{cases} \]
Inverting this relation gives $\alpha' = g(\alpha)$. As we already know from Lemma 4.72 that a Sturmian $s \in X_\alpha$ can always be written as $\chi_i(t)$ or $\sigma \circ \chi_i(t)$ for some Sturmian sequence $t$, we have now also determined that $t \in X_{g(\alpha)}$.

Definition 4.74.
If we iterate this procedure, then we get a sequence of types $(\tau_j)_{j \ge 1}$ which is called the additive coding of the Sturmian shift $X_\alpha$. We can abbreviate this sequence as
\[ \tau_1 \tau_2 \tau_3 \cdots = 0^{a_1} 1^{a_2} 0^{a_3} 1^{a_4} \cdots, \]
where the $a_i$ denote the lengths of the consecutive blocks of $\tau_j = 0$ and $\tau_j = 1$. Here all exponents $a_i \ge 1$, except that $a_1 = 0$ if $\alpha \in (\frac12, 1)$. This is the multiplicative coding. In particular, the fixed points of the S-adic shifts
\[ \rho_b = \lim_{j\to\infty} \chi_{\tau_0} \circ \chi_{\tau_1} \circ \cdots \circ \chi_{\tau_j}(b) = \lim_{n\to\infty} \chi_0^{a_1} \circ \chi_1^{a_2} \circ \cdots \circ \chi_0^{a_{n-1}} \circ \chi_1^{a_n}(b), \tag{4.34} \]

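for $b \in \{0, 1\}$, are sequences in $X_\alpha$. To get all sequences $s \in X_\alpha$, we need to intersperse the $\chi_{\tau_j}$ with shifts as indicated in (4.32).

Proposition 4.75. The frequency $\alpha \in (0, 1)$ of a Sturmian sequence $s$ satisfies
\[ \alpha = [0; 1 + a_1, a_2, a_3, \dots] := \cfrac{1}{1 + a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cfrac{1}{\ddots}}}}, \tag{4.35} \]
i.e. $\alpha$ has the continued fraction expansion of the multiplicative coding, after adding 1 to $a_1$.

Proof. For simplicity, we take $\alpha$ irrational. Let $\theta = \frac{\alpha}{1-\alpha} = \lim_{n\to\infty} \frac{|s_1 \cdots s_n|_1}{|s_1 \cdots s_n|_0}$ be the limit proportion of 1’s vs. 0’s in $s$. Then for $s' = \Phi(s)$ we find the relation between its proportion $\theta'$ and $\theta$ as
\[ \theta = \begin{cases} \tilde g_0^{-1}(\theta') := \frac{\theta'}{1+\theta'} & \text{if } \theta \in (0, 1), \text{ i.e. } s \text{ is of Type 0}, \\ \tilde g_1^{-1}(\theta') := \theta' + 1 & \text{if } \theta \in (1, \infty), \text{ i.e. } s \text{ is of Type 1}. \end{cases} \]
The iterates of these maps are $\tilde g_0^{-a_1}(\theta') = \frac{\theta'}{1 + a_1\theta'}$ and $\tilde g_1^{-a_2}(\theta') = \theta' + a_2$. Therefore
\[ \tilde g_0^{-a_1} \circ \tilde g_1^{-a_2}(x) = \cfrac{1}{a_1 + \cfrac{1}{a_2 + x}} \]
is the first part of a continued fraction expansion. Therefore, for any $x \in (0, \infty)$,
\[ \lim_{n\to\infty} \tilde g_0^{-a_1} \circ \tilde g_1^{-a_2} \circ \cdots \circ \tilde g_0^{-a_{n-1}} \circ \tilde g_1^{-a_n}(x) = \lim_{n\to\infty} \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_n + x}}}} \]
if $\theta \in (0, 1)$ (that is, $s$ is of Type 0), and
\[ \lim_{n\to\infty} \tilde g_1^{-a_1} \circ \tilde g_0^{-a_2} \circ \tilde g_1^{-a_3} \circ \cdots \circ \tilde g_0^{-a_n} \circ \tilde g_1^{-a_{n+1}}(x) = \lim_{n\to\infty} a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cfrac{1}{\ddots + \cfrac{1}{a_{n+1} + x}}}} \]
if $\theta \in (1, \infty)$ (that is, $s$ is of Type 1). Transforming back to $\alpha = \frac{1}{1 + 1/\theta}$ and using that $a_1 = 0$ if $s$ is of Type 1, we find (4.35).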
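The relation between the additive coding and the continued fraction in Proposition 4.75 can be traced by iterating the map g of (4.33). The sketch below is an illustration under stated assumptions: it uses rational angles (exact arithmetic with `Fraction`), so the coding terminates once the iteration reaches 0, 1/2 or 1, and only the blocks before the last partial quotient are meaningful. The helper names `from_cf`, `additive_coding` and `block_lengths` are not from the text.

```python
from fractions import Fraction
from itertools import groupby

HALF = Fraction(1, 2)


def g(alpha):
    """The map (4.33): alpha/(1-alpha) on [0,1/2), (2*alpha-1)/alpha on (1/2,1]."""
    if alpha < HALF:
        return alpha / (1 - alpha)
    return (2 * alpha - 1) / alpha


def additive_coding(alpha, max_steps=200):
    """Types tau_1, tau_2, ... (0 if alpha < 1/2, 1 if alpha > 1/2) along the g-orbit."""
    types = []
    for _ in range(max_steps):
        if alpha in (0, HALF, 1):  # boundary reached: only possible for rational alpha
            break
        types.append(0 if alpha < HALF else 1)
        alpha = g(alpha)
    return types


def block_lengths(types):
    """Lengths a_1, a_2, ... of the blocks 0^{a_1} 1^{a_2} 0^{a_3} ...; a_1 = 0 for Type 1."""
    lengths = [len(list(group)) for _, group in groupby(types)]
    if types and types[0] == 1:
        lengths.insert(0, 0)
    return lengths


def from_cf(partial_quotients):
    """Value of the finite continued fraction [0; c_1, c_2, ..., c_n]."""
    value = Fraction(0)
    for c in reversed(partial_quotients):
        value = 1 / (c + value)
    return value


alpha = from_cf([3, 2, 4, 5])  # [0; 1 + a_1, a_2, a_3, ...] with a_1 = 2, a_2 = 2, a_3 = 4
print(alpha, block_lengths(additive_coding(alpha)))
```

For an irrational angle the same loop could be run in floating point for a moderate number of steps, but since g is expanding, rounding errors grow with each iteration.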


The substitutions $\chi_0$ and $\chi_1$ are the symbolic versions of first return maps of a circle rotation $R_\alpha(x) = x + \alpha \bmod 1$ with symbol 1 if and only if $x \in [0, \alpha)$.

• If $\alpha \in (0, \frac12)$ (so $s$ is of Type 0), then we take the first return map to $J_0 := [2\alpha, 1) \cup [0, \alpha)$ and rescale this interval to unit size. The resulting rotation is
\[ \tilde R_0(x) = x + \frac{\alpha}{1-\alpha} \bmod 1 \quad \text{with symbol 1 if and only if } x \in \Big[0, \frac{\alpha}{1-\alpha}\Big); \]
see Figure 4.11 (left). If $x \in J_0$ has $\tilde R_0$-itinerary $t$, then its $R_\alpha$-itinerary is $\chi_0(t)$. If $x \in S^1 \setminus J_0$, then $i_{-1}(x) i_0(x) = 10$ and $R_\alpha^{-1}(x) \in J_0$. In this case, if $t$ is the $\tilde R_0$-itinerary of $R_\alpha^{-1}(x)$, then the $R_\alpha$-itinerary of $x$ is $i(x) = \sigma \circ \chi_0(t)$.

• If $\alpha \in (\frac12, 1)$ (so $s$ is of Type 1), then we take the first return map to $J_1 := [\alpha, 1) \cup [0, 2\alpha - 1)$ and rescale this interval to unit size. The resulting rotation is
\[ \tilde R_1(x) = x + \frac{2\alpha - 1}{\alpha} \bmod 1 \quad \text{with symbol 1 if and only if } x \in \Big[0, \frac{2\alpha - 1}{\alpha}\Big); \]
see Figure 4.11 (right). If $x$ has $\tilde R_1$-itinerary $t$, then its $R_\alpha$-itinerary is $\chi_1(t)$. If $x \in S^1 \setminus J_1$, then $i_{-1}(x) i_0(x) = 01$ and $R_\alpha^{-1}(x) \in J_1$. In this case, if $t$ is the $\tilde R_1$-itinerary of $R_\alpha^{-1}(x)$, then the $R_\alpha$-itinerary of $x$ is $i(x) = \sigma \circ \chi_1(t)$.

In conclusion, this renormalization operation turns $R_\alpha$ into $R_{g(\alpha)}$. To obtain the itinerary of any other point $x \in S^1$, we need to apply shifts as in the second line of (4.32) every time the renormalization image of $x$ doesn’t belong to $J_0$ or $J_1$.

Figure 4.11. First returns of circle rotations for α < 1/2 (Type 0) and α > 1/2 (Type 1).


Since the return maps are always to neighborhoods of $0 \in S^1$, iterating the above procedure gives the itinerary of 0:
\[ i(0) = \lim_{j\to\infty} \chi_{\tau_0} \circ \chi_{\tau_1} \circ \cdots \circ \chi_{\tau_j}(1). \]

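Example 4.76. For the Fibonacci rotation over $\alpha = (3 - \sqrt{5})/2$ (of Type 0), the itinerary of 0 is
\[ t := i(0) = 10\ 01001\ 01001001\ 010010100100101001001 \cdots = 10\rho_{\mathrm{Fib}}, \]
where $\rho_{\mathrm{Fib}}$ is the fixed point of the Fibonacci substitution $\chi_{\mathrm{Fib}}$. At the same time,
\[ t = \lim_{j} \underbrace{\chi_0 \circ \chi_1 \circ \cdots \circ \chi_0 \circ \chi_1}_{j \text{ pairs}}(1). \]
By the same token, the itinerary of 0 for the Fibonacci rotation with $\alpha = (\sqrt{5} - 1)/2$ (of Type 1) is
\[ t' := i(0) = 10\ 10110\ 10110101\ 101101011011010110110 \cdots = 10\rho'_{\mathrm{Fib}}, \]
where $\rho'_{\mathrm{Fib}}$ is the fixed point of the Fibonacci substitution $\chi'_{\mathrm{Fib}}$ with interchanged symbols ($\chi'_{\mathrm{Fib}} : 0 \to 1,\ 1 \to 10$). However, there is no direct way of writing $\chi_{\mathrm{Fib}}$ or $\chi_{\mathrm{Fib}}^2$ as composition of $\chi_0$ and $\chi_1$, but
\[ \rho = \lim_{j} \chi_0 \circ \sigma \circ \chi_1 \circ \underbrace{\chi_0 \circ \chi_1 \circ \cdots \circ \chi_0 \circ \chi_1}_{j \text{ pairs}}(0) \]
and
\[ \rho' = \lim_{j} \chi_1 \circ \sigma \circ \chi_0 \circ \underbrace{\chi_1 \circ \chi_0 \circ \cdots \circ \chi_1 \circ \chi_0}_{j \text{ pairs}}(1). \]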

There is a further argument, once the frequency of the Sturmian sequence has been determined, to also determine for which point y ∈ S1 the Sturmian sequence x = i(y), but we will skip these details; see [249, Section 6.3].
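The Type 0 case of Example 4.76 can be compared with a direct computation of the rotation itinerary. The following sketch makes several assumptions beyond the text: it uses floating-point arithmetic for $\alpha = (3 - \sqrt{5})/2$ (adequate here because the orbit of 0 stays well away from the interval endpoints for the short prefix used), it applies j = 4 pairs $\chi_0 \circ \chi_1$ to the seed symbol 1, and the function names are illustrative.

```python
from math import sqrt

# The substitutions (4.31): chi_0: 0 -> 0, 1 -> 10 and chi_1: 0 -> 01, 1 -> 1.
CHI = {
    0: {"0": "0", "1": "10"},
    1: {"0": "01", "1": "1"},
}


def apply_substitution(i, word):
    """Apply chi_i letter by letter to a finite word over {0, 1}."""
    return "".join(CHI[i][c] for c in word)


def sadic_prefix(types, seed="1"):
    """chi_{t_1} o chi_{t_2} o ... o chi_{t_j}(seed); the innermost substitution acts first."""
    word = seed
    for i in reversed(types):
        word = apply_substitution(i, word)
    return word


def rotation_itinerary(alpha, length):
    """Itinerary of 0 under x -> x + alpha mod 1, with symbol 1 iff the point lies in [0, alpha)."""
    return "".join("1" if (n * alpha) % 1.0 < alpha else "0" for n in range(length))


alpha = (3 - sqrt(5)) / 2
prefix = sadic_prefix([0, 1] * 4)
print(prefix)
print(rotation_itinerary(alpha, len(prefix)))
```

Both printed words should agree, and they begin with 10 followed by the Fibonacci word, in line with $t = 10\rho_{\mathrm{Fib}}$.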

4.4. Interval Exchange Transformations

Interval exchange transformations (IETs) were studied by Katok [345, 568] and Sinaĭ, cf. [513], and hints towards such systems appear earlier than that (see [252]), in the context of billiard dynamics and flows on flat surfaces, including translation surfaces. The subject was taken up for its own sake in a series of papers by Mike Keane, in which minimality and unique ergodicity were discussed. Together with Rauzy [351, Theorem 7] he showed that the uniquely ergodic IETs form a residual set. The question of whether unique ergodicity for IETs is a Lebesgue typical phenomenon (the Keane Conjecture) inspired the creation of a wealth of new mathematics, in particular new applications of Teichmüller theory, which is central in the solutions of Veech [543] and Masur [410]. We won’t discuss this in this text (the monograph of Viana [548] is strongly recommended), but we say some more about unique ergodicity and counterexamples to unique ergodicity in Section 6.3.5.

Definition 4.77. A map $T : [0, 1) \to [0, 1)$ is called an interval exchange transformation (IET) if there is a finite partition into half-open intervals $\{\Delta_i\}_{i=1}^d$ such that $T|_{\Delta_i}$ is a translation and $T$ is invertible. That is, the intervals $\Delta_i$, $i = 1, \dots, d$, are mapped back into $[0, 1)$ after a permutation $\pi : \{1, \dots, d\} \to \{1, \dots, d\}$. As a formula, with $\lambda_i = |\Delta_i|$,
\[ T(x) = x - \sum_{j < i} \lambda_j + \sum_{\pi(j) < \pi(i)} \lambda_j \quad \text{if } x \in \Delta_i = [\gamma_{i-1}, \gamma_i). \]

[…]

Since we cut off at a discontinuity point (Type 0) or a discontinuity point of $T^{-1}$ (Type 1), the total number of intervals of continuity of the first return map is again $d$. Only if $\lambda_d = \lambda_e$, one has to make a choice, but under the Keane condition this cannot happen. If $\lambda_d = 0$ or $\lambda_e = 0$, then the Rauzy induction doesn’t do anything, but also this degenerate case is prevented by the Keane condition. In effect, if the Keane condition holds, then every iterate of the Rauzy induction is well-defined. As a formula, the Rauzy induction step looks as in Table 4.3.

The Rauzy induction map $\Theta : \Sigma_d \times S \to \Sigma_d \times S$ is 2-to-1. Indeed, if we are given the type of the Rauzy induction step that produces $(\lambda', \pi') = \Theta(\lambda, \pi)$, then we can reconstruct $\pi$ from $\pi'$ and then $\lambda$ from $\lambda'$ and $\pi$. See Table 4.3 for a summary. One can easily compute that the incidence matrices of the substitutions are all unimodular.

Table 4.3. Rauzy induction formulas (with σ(e) = d).

Type 0: $\lambda_e < \lambda_d$.
\[ \lambda' = \frac{1}{1-\lambda_e}(\lambda_1, \dots, \lambda_{d-1}, \lambda_d - \lambda_e), \qquad \begin{cases} \pi'(j) = \pi(j) & \text{if } \pi(j) \le \pi(d), \\ \pi'(e) = \pi(d) + 1, \\ \pi'(j) = \pi(j) + 1 & \text{if } \pi(j) > \pi(d), \end{cases} \qquad \chi : \begin{cases} j \to j & \text{if } j \ne e, \\ e \to ed. \end{cases} \]

Type 1: $\lambda_d < \lambda_e$.
\[ \lambda' = \frac{1}{1-\lambda_d}(\lambda_1, \dots, \lambda_e - \lambda_d, \lambda_d, \dots, \lambda_{d-1}), \qquad \begin{cases} \pi'(j) = \pi(j) & \text{if } j \le e, \\ \pi'(e+1) = \pi(d), \\ \pi'(j) = \pi(j-1) & \text{if } j > e + 1, \end{cases} \qquad \chi : \begin{cases} j \to j & \text{if } j \le e, \\ e + 1 \to ed, \\ j \to j - 1 & \text{if } j > e + 1. \end{cases} \]

Let $\chi_n$ denote the substitution emerging from the $n$-th Rauzy induction step. Under the Keane condition, $\chi_n$ is well-defined for every $n \in \mathbb{N}$. Since $\chi_n(1)$ starts with 1 for every $n$, there is a fixed point of the corresponding S-adic transformation:
\[ \rho_T := \lim_{n\to\infty} \chi_1 \circ \chi_2 \circ \cdots \circ \chi_n(1). \]

Since the iterates of the Rauzy induction represent first return maps to shorter and shorter one-sided neighborhoods of 0 (assuming the Keane condition holds), every letter will eventually play the role of $d$ and $e$, and therefore this S-adic substitution is primitive. This gives another proof that irreducible IETs satisfying the Keane condition are minimal. Since $\chi_1 \circ \chi_2 \circ \cdots \circ \chi_n(1)$ represents the successive intervals $\Delta_i$ that 0 visits before its first return time associated to the $n$-th Rauzy induction step, $\rho_T = i(0)$. Since $x = 0$ has a dense orbit, the one-sided subshift is $X_T = \overline{\mathrm{orb}_\sigma(\rho_T)}$.
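Going back to Definition 4.77, an interval exchange transformation is determined by its length vector and its permutation. The sketch below implements the translation rule "subtract the lengths to the left of $\Delta_i$, add the lengths of the intervals placed before $\Delta_i$ after permuting". It assumes the convention that $\pi(i)$ is the position of $\Delta_i$ after the exchange, and it uses exact rationals; `make_iet` is an illustrative name, not notation from the text.

```python
from fractions import Fraction


def make_iet(lengths, perm):
    """Return T for interval lengths lambda_1..lambda_d and permutation pi(1)..pi(d) (1-based values).

    Assumption: pi(i) is the position of the interval Delta_i after the exchange.
    """
    d = len(lengths)
    assert sum(lengths) == 1 and sorted(perm) == list(range(1, d + 1))
    # Left endpoint gamma_{i-1} of Delta_i before the exchange.
    left = [sum(lengths[:i], Fraction(0)) for i in range(d)]
    # Left endpoint of T(Delta_i): total length of the intervals placed before it.
    new_left = [
        sum((lengths[j] for j in range(d) if perm[j] < perm[i]), Fraction(0))
        for i in range(d)
    ]

    def T(x):
        for i in range(d):
            if left[i] <= x < left[i] + lengths[i]:
                return x - left[i] + new_left[i]
        raise ValueError("x must lie in [0, 1)")

    return T


# Two intervals that are swapped: this is the rotation x -> x + lambda_2 mod 1.
T = make_iet([Fraction(2, 5), Fraction(3, 5)], [2, 1])
print(T(Fraction(1, 10)), T(Fraction(1, 2)))
```

With d = 2 and the two intervals swapped, the map reduces to a circle rotation, which links this section to the Sturmian case of Section 4.3.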

4.5. Toeplitz Shifts

Definition 4.84. A sequence $x \in A^{\mathbb{N}}$ or $A^{\mathbb{Z}}$ is called a Toeplitz sequence if for every $i \in \mathbb{N}$, there exists $q_i \in \mathbb{N}$ such that $x_i = x_{i + kq_i}$ for all $k \in \mathbb{N}$ or $\mathbb{Z}$. The orbit closure $X_q = \overline{\{\sigma^n(x) : n \ge 0\}}$ is called a Toeplitz shift.

Although the first example of this appeared in a paper by Oxtoby [440], the name and notion were introduced by Jacobs & Keane [332], inspired by a construction by Otto Toeplitz (1881–1940) [537] to create an almost periodic function on the real line, but otherwise, Toeplitz was not involved.

Proposition 4.85. If $\chi : A \to A^*$ is a constant length substitution such that $\chi(a)$ starts with the same symbol for each $a \in A$, then the unique fixed point of $\chi$ is a Toeplitz sequence.


Proof. Fix the symbol $a \in A$ such that $\chi(a)$ starts with $a$, so $\rho = \rho_1 \rho_2 \rho_3 \cdots = \lim_n \chi^n(a)$ is the fixed point of $\chi$. Let $N = |\chi(b)|$ for each $b \in A$. Then clearly $\rho_{1+kN} = a$ for all $k \in \mathbb{N}$, so we can take $q_1 = N$. It follows that $\chi(\rho_1 \cdots \rho_{1+kN})$ (which has length $kN^2 + N$) starts and ends with $\chi(a)$. Therefore $q_i = N^2$ for $i = 2, \dots, N$. Continuing by induction, we find $q_i = N^r$ for $i = N^{r-1} + 1, \dots, N^r$.

Example 4.86. The simplest way to construct a Toeplitz sequence emerges from taking $q_i = 2^i$, the powers of 2, and $x_{\frac{q_i}{2} + kq_i} = \frac12 (1 - (-1)^i)$ for all $k \ge 0$ and $i = 1, 2, 3, \dots$. The resulting Toeplitz sequence is the Feigenbaum sequence,

ρfeig = 1011101010111011101110101011101010111010101110111 · · · ;

see Example 1.6 for more details on this sequence. Although $\rho_{\mathrm{feig}}$ is Toeplitz, not every sequence in $X_{\mathrm{feig}} = \overline{\mathrm{orb}_\sigma(\rho_{\mathrm{feig}})}$ has the Toeplitz property. For example, $\rho_{\mathrm{feig}}$ has two preimages in $X_{\mathrm{feig}}$, namely $0\rho_{\mathrm{feig}}$ and $1\rho_{\mathrm{feig}}$. Of these two, only $0\rho_{\mathrm{feig}}$ is a Toeplitz sequence. As will be shown in Section 4.7.1, $\rho_{\mathrm{feig}}$ is the kneading sequence of an infinitely renormalizable unimodal map. In fact, the kneading sequence of every infinitely renormalizable unimodal map is a Toeplitz sequence. More generally, Alvin [23, 24] classifies all the Toeplitz sequences which appear as a kneading sequence (and for which the unimodal maps act on $\omega(c)$ as (strange) adding machines).

Proposition 4.87. The Thue-Morse sequence

ρTM = 1001 0110 0110 1001 0110 1001 1001 0110 · · ·

obtained from the Thue-Morse substitution $\chi_{\mathrm{TM}} : 0 \to 01,\ 1 \to 10$, is not a Toeplitz sequence. However, the Thue-Morse shift factorizes to the Feigenbaum substitution shift via the sliding block code $01, 10 \to 1$, $00, 11 \to 0$ (see Example 1.6) and the Feigenbaum substitution shift is Toeplitz.

Sketch of Proof. We show that there is no period $p_1$ such that $\rho_1 = \rho_{1+kp_1}$ for all $k$. First assume by contradiction that $p_1$ is odd. Then $\{2^n \bmod p_1 : n \ge 1\} = \{1, \dots, p_1 - 1\}$; in fact, $2^n \bmod p_1$ traverses these residue classes periodically. Therefore $2^n + 3 = kp_1$ for infinitely many $k, n \in \mathbb{N}$. Since $\rho_1 \rho_2 \rho_3 \rho_4 = 1001$ is the opposite word to $\rho_{2^n+1} \rho_{2^n+2} \rho_{2^n+3} \rho_{2^n+4} = 0110$, $\rho_1 \ne \rho_{2^n+4}$, so the period cannot be an odd $p_1$. However, if $p_1$ is even, then $kp_1 + 1$ is odd for all $k \in \mathbb{N}$. If we divide $\rho$ into blocks of length 2, then $kp_1 + 1$ is always the first symbol of such a block. By taking the inverse of the substitution (which fixes $\rho$), it follows that $p_1/2$ is also a period of $\rho_1$. Continuing by induction, we find that $\rho_1$ has an odd period after all, contradicting the first half of this proof. There is no infinite arithmetic progression $k_j = jq + r$ such that $\rho_{k_j}$ is the same for all $j$. This follows from estimates of the possible lengths of arithmetic progressions in the Thue-Morse sequence by Pashina [448].

Lemma 4.88. A Toeplitz shift $(X_q, \sigma)$ is uniformly rigid and hence minimal.

Proof. We give the proof for one-sided Toeplitz sequences; the proof for two-sided sequences goes likewise. Let $[x_1 x_2 \cdots x_n]$ be any cylinder set. Then every digit $x_i$ reappears with gap $q_i$. Hence, if $L_n = \mathrm{lcm}(q_1, \dots, q_n)$ is the least common multiple of $q_1, \dots, q_n$, then $\sigma^{kL_n}([x_1 x_2 \cdots x_n]) \subset [x_1 x_2 \cdots x_n]$ for all $k \in \mathbb{N}$. This is uniform rigidity. The minimality of the corresponding subshift follows from Lemma 2.24 and Corollary 2.20.

The way to build up a Toeplitz sequence in $\{0, 1\}^{\mathbb{N}}$ or $\{0, 1\}^{\mathbb{Z}}$ is to start with $x_1 = 1$, choose $q_1$, and set $x_{1+kq_1} = 1$ for all $k \in \mathbb{N}$ (or $k \in \mathbb{Z}$ for a two-sided Toeplitz sequence, but we will focus on the one-sided Toeplitz sequences). The rest of the entries get a “temporary ∗”: $x_i = ∗$. Next set $x_2 = 0$, choose $q_2$ (not coprime with $q_1$), and set $x_{2+kq_2} = 0$. Continuing this way inductively, let $x_i$ be the first of the remaining temporary ∗’s and choose $q_i$ with $q_i - i$ a multiple of the period of the pattern of the remaining ∗’s. The periodic sequence $\mathrm{Sk}(q_j) \in \{0, 1, ∗\}^{\mathbb{N}}$ of the $j$-th step of this construction is called the $q_j$-skeleton of the Toeplitz sequence.

Example 4.89. As an example of building

(4.36)
q1 = 3 :   1∗∗1∗∗1∗∗1∗∗1∗∗1∗∗1∗∗1∗∗1∗∗ . . . ,
q2 = 6 :   10∗1∗∗10∗1∗∗10∗1∗∗10∗1∗∗10∗ . . . ,
q3 = 3 :   1011∗11011∗11011∗11011∗1101 . . . ,
q4 = 12 :  1011011011∗11011011011∗1101 . . . ,
   ⋮
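The skeletons in (4.36) can be regenerated mechanically from the list of (first position, symbol, period) choices. The sketch below is a minimal illustration with an assumed helper `skeleton`; it writes the temporary symbol ∗ as an ASCII `*`. The same helper reproduces the Feigenbaum sequence of Example 4.86 from the choices $q_i = 2^i$, symbol $\frac12(1 - (-1)^i)$, first position $q_i/2$.

```python
def skeleton(steps, length, hole="*"):
    """Fill a word of the given length; steps is a list of (position, symbol, period), positions 1-based."""
    word = [hole] * length
    for position, symbol, period in steps:
        for i in range(position, length + 1, period):
            # A later step may only fill holes (or repeat a symbol that is already there).
            assert word[i - 1] in (hole, symbol), "conflicting assignment"
            word[i - 1] = symbol
    return "".join(word)


# The four steps displayed in (4.36): q_1 = 3, q_2 = 6, q_3 = 3, q_4 = 12.
example_steps = [(1, "1", 3), (2, "0", 6), (3, "1", 3), (5, "0", 12)]
for j in range(1, len(example_steps) + 1):
    print(skeleton(example_steps[:j], 27))

# Example 4.86: symbol (1 - (-1)^i)/2 = i mod 2 at positions 2^(i-1) + k * 2^i.
feigenbaum_steps = [(2 ** (i - 1), str(i % 2), 2 ** i) for i in range(1, 8)]
print(skeleton(feigenbaum_steps, 49))
```

After seven steps the first 127 entries of the Feigenbaum skeleton are filled, so the printed 49-symbol prefix contains no holes.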

In most cases, $q_{j+1}$ is a multiple of $q_j$, but (4.36) shows that this is not necessary. However, if $q = (q_j)_{j \ge 1}$ is such that $q_j$ divides $q_{j+1}$ for all $j \in \mathbb{N}$, then we call $q$ the periodic structure of the Toeplitz sequence $x$. This construction of skeletons yields an extension of Proposition 4.85.

Theorem 4.90. The one-sided sequence $x \in A^{\mathbb{N}}$ is Toeplitz if and only if there is a sequence of constant length substitutions $\chi_i : A_i \to A_{i-1}^*$ on finite alphabets $A_i$ with $A = A_0$ such that $\chi_i(a)$ starts with the same symbol for each $a \in A_i$, and $x = \lim_{i\to\infty} \chi_1 \circ \chi_2 \circ \cdots \circ \chi_i(a)$, $a \in A_i$ arbitrary.


Proof. Let $N_i = |\chi_i(a)|$ be the length of the words from the $i$-th substitution. By the condition that $x_1 = \chi_1(a)_1$ for all $a \in A_1$, we find $x_{1+kN_1} = x_1$ for all $k \in \mathbb{N}$. By composing $\chi_1 \circ \chi_2$, we obtain $x_1 \cdots x_{N_1} = x_{1+kN_1N_2} \cdots x_{N_1+kN_1N_2}$ for all $k \in \mathbb{N}$. In general, the initial block $x_1 \cdots x_{N_1 N_2 \cdots N_r}$ repeats with period $N_1 N_2 \cdots N_r N_{r+1}$, so $x$ is Toeplitz.

Conversely, if $x = x_1 x_2 x_3 \cdots$ is Toeplitz on alphabet $A_0$, then there is $N_1$ such that $x_{1+kN_1} = x_1$ for all $k \in \mathbb{N}$, and there is a finite collection of $N_1$-words $b_k$, $k = 1, \dots, K_1$, all starting with $x_1$ such that $x = b_{k_1} b_{k_2} b_{k_3} \cdots$. Consider $\{b_k\}_{k=1}^{K_1}$ as the letters of alphabet $A_1$, and define the substitution word $\chi_1(b_k \text{ (as letter)}) = b_k$ (as $N_1$-word). Then $x = \chi_1(b_{k_1} b_{k_2} b_{k_3} \cdots)$. Since the $N_1$-words $b_{k_i}$ appear with their own gap, $b_{k_1} b_{k_2} b_{k_3} \cdots \in A_1^{\mathbb{N}}$ is a Toeplitz sequence in its own right, and we can repeat the construction.

4.5.1. Regular Toeplitz Sequences. When constructing a Toeplitz sequence this way, at step $n$, we have an $L_n$-periodic sequence, where $L_n = \mathrm{lcm}(q_1, \dots, q_n)$. We call the Toeplitz sequence regular if $\frac{1}{L_n} \#\{1 \le i \le L_n : x_i = ∗\} \to 0$ as $n \to \infty$. The official definition is slightly weaker:

Definition 4.91. A sequence $x \in A^{\mathbb{N}}$ or $A^{\mathbb{Z}}$ is a regular Toeplitz sequence if it is the limit of skeletons $\mathrm{Sk}(L_n) \in (A \cup \{∗\})^{\mathbb{N}}$ or $(A \cup \{∗\})^{\mathbb{Z}}$ of period $L_n$ such that
\[ \lim_{n\to\infty} \frac{\mathrm{Sk}^*(L_n)}{L_n} = 0 \quad \text{for } \mathrm{Sk}^*(L_n) := \#\{1 \le i \le L_n : \mathrm{Sk}(L_n)_i = ∗\}. \]

Theorem 4.92. A regular Toeplitz shift has zero entropy.

Proof. We follow [381, Theorem 4.76]. Let $V(i)$ be the $L_i$-word in $(A \cup \{∗\})^{L_i}$ obtained in the $i$-th step of the construction of Example 4.89; i.e. we have now an $L_i$-periodic skeleton $\mathrm{Sk}(i) = V(i)^\infty \in (A \cup \{∗\})^{\mathbb{N}}$. Let $r_i = |V(i)|_∗$ be the number of ∗’s in $V(i)$. Then there are at most $(\#A)^{r_i}$ ways to fill in the ∗’s later on, and there are at most $(\#A)^{r_i}$ $L_i$-words in the Toeplitz sequence $x$ starting at a position $1 + kL_i$. Therefore $p_x(L_i) \le L_i (\#A)^{r_i}$, and
\[ \lim_{i\to\infty} \frac{1}{L_i} \log p_x(L_i) \le \lim_{i\to\infty} \frac{\log L_i + r_i \log \#A}{L_i} \le \log \#A \lim_{i\to\infty} \frac{r_i}{L_i} = 0. \]
Since $\log p_x(n)$ is subadditive, $\lim_n \frac{1}{n} \log p_x(n) = 0$ by Fekete’s Lemma 1.15.
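The estimate $p_x(L_i) \le L_i (\#A)^{r_i}$ can be observed on the Feigenbaum sequence of Example 4.86. There the $i$-th skeleton has period $L_i = 2^i$ and a single hole per period ($r_i = 1$), so the bound reads $p(2^i) \le 2^{i+1}$. The sketch below assumes the closed form "$x_n = 1$ exactly when the 2-adic valuation of $n$ is even", which is a restatement of the rule in Example 4.86, and it counts factors of a finite prefix only, so it gives a lower estimate of the true complexity.

```python
from math import log


def feigenbaum(length):
    """x_n = 1 iff the 2-adic valuation of n is even, for n = 1, ..., length."""
    def valuation(n):
        v = 0
        while n % 2 == 0:
            n //= 2
            v += 1
        return v

    return "".join("1" if valuation(n) % 2 == 0 else "0" for n in range(1, length + 1))


def complexity(word, n):
    """Number of distinct factors of length n in the finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


word = feigenbaum(4096)
for i in range(1, 7):
    L = 2 ** i
    p = complexity(word, L)
    # Bound from the proof: p(L_i) <= L_i * (#A)^{r_i} = 2^i * 2.
    print(L, p, 2 * L, round(log(p) / L, 4))
```

The last column, $\frac{1}{L_i} \log p(L_i)$, decreases towards 0, in line with zero entropy.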

The following upper bound for the amorphic complexity of regular Toeplitz sequences was shown in [264].

Theorem 4.93. Let $(X_q, \sigma)$ be a Toeplitz shift with periodic structure $(q_j)_{j=1}^\infty$. Then the amorphic complexity satisfies
\[ \mathrm{ac}(\sigma) \le \limsup_{j\to\infty} \frac{\log q_{j+1}}{-\log \mathrm{Sk}^*(q_j)}. \]
In particular, if $q_{j+1} \le C_1 q_j^t$ and $\mathrm{Sk}^*(q_j) \le C_2 q_j^{-u}$, then $\mathrm{ac}(\sigma) \le \frac{t}{u}$.


With some more work, and for the two-letter alphabet, we could improve the upper bound to $\mathrm{ac}(\sigma) \le \limsup_j \frac{\log q_j}{-\log \mathrm{Sk}^*(q_j)}$. By stipulating further properties on the Toeplitz sequence, one can (see [264, Section 5]) give examples showing that this upper bound is sharp and also that for a dense set of values $a \in [1, \infty)$ (including $a = 1$), there is a regular Toeplitz shift with $\mathrm{ac}(\sigma) = a$.

Proof of Theorem 4.93. Note that the densities $\mathrm{Sk}^*(q_j)$ are decreasing in $j$, and by regularity of the Toeplitz shift, $\lim_j \mathrm{Sk}^*(q_j) = 0$. Choose $\delta > 0$ arbitrary and $m \in \mathbb{N}$ such that $2^{-m} < \delta$. Next choose $v$ arbitrary and $j \in \mathbb{N}$ such that $(2m+1)\mathrm{Sk}^*(q_{j+1}) < v \le (2m+1)\mathrm{Sk}^*(q_j)$. Then
\[ \mathrm{Sep}(\delta, v) \le \mathrm{Sep}(2^{-m}, (2m+1)\mathrm{Sk}^*(q_j)). \]
We claim that the right-hand side is bounded by $q_{j+1}$. Indeed, assume by contradiction that there is a $(2^{-m}, (2m+1)\mathrm{Sk}^*(q_j))$-separated set $S$ with more than $q_{j+1}$ elements. Then at least two of them, say $x, y \in S$, share the same $q_{j+1}$-skeleton. This means that $x$ and $y$ differ at most in $q_{j+1}\mathrm{Sk}^*(q_{j+1})$ positions in every $q_{j+1}$-block. Since $d(\sigma^k(x), \sigma^k(y)) \ge \delta$ only if $x_i \ne y_i$ for some $i$ with $|i - k| \le m$,
\[ \frac{1}{nq_{j+1}} \#\{0 \le k < nq_{j+1} : d(\sigma^k(x), \sigma^k(y)) \ge \delta\} \le \frac{2m+1}{nq_{j+1}} \#\{0 \le k < nq_{j+1} : x_k \ne y_k\} \le (2m+1)\mathrm{Sk}^*(q_{j+1}). \]
When taking the limit $n \to \infty$, we get a contradiction with the choice of $j$. This proves the claim.

Therefore $\mathrm{Sep}(\delta, v) \le q_{j+1}$. Take logarithms and divide left- and right-hand sides by $-\log v \ge -\log\big((2m+1)\mathrm{Sk}^*(q_j)\big)$, respectively. This gives
\[ \frac{\log \mathrm{Sep}(\delta, v)}{-\log v} \le \frac{\log q_{j+1}}{-\log(2m+1) - \log \mathrm{Sk}^*(q_j)}. \]
Note that $m$ depends only on $\delta$. Thus taking the superior limit $v \to 0$ (and hence $j \to \infty$), we obtain $\mathrm{ac}(\sigma) \le \limsup_j \frac{\log q_{j+1}}{-\log \mathrm{Sk}^*(q_j)}$ as claimed.

Theorem 4.94. For every real number $K \ge 0$, there is a Toeplitz shift $(X, \sigma)$ such that $h_{\mathrm{top}}(\sigma) = K$. For every real number $K \ge 1$, there is a Toeplitz shift $(X, \sigma)$ that has polynomial word-complexity with exponent $K$; i.e. $\lim_{n\to\infty} \frac{\log p(n)}{\log n} = K$.

Proof.
We start with the positive entropy Toeplitz sequence, following [381, Theorem 4.77], which in turn follows [560]. Let $A$ be an alphabet such that $\log \#A \ge 2K$ and take a sequence $(k_i)_{i \in \mathbb{N}}$ such that $\prod_{i=1}^\infty (1 - \frac{1}{k_i}) = \frac{2K}{\log \#A} \in (0, 1)$. Start with an $L_0$-word $V(0)$ containing $r_0 = L_0/2$ symbols ∗. We construct the $i$-th skeleton $V(i)^\infty$ with $|V(i)| = L_i$, recursively. Given $V(i)$, let $W(i)$ be the concatenation of the $(\#A)^{r_i}$ copies of $V(i)$ where the $r_i$ symbols ∗ are replaced by the $(\#A)^{r_i}$ $r_i$-words in $A$. Then set
\[ V(i+1) := W(i)\, V(i)^{(\#A)^{r_i}(k_i - 1)}, \]
so that $|V(i+1)| = k_i (\#A)^{r_i} L_i$, each non-∗ symbol in $V(i)$ returns with periodic gap $\le L_i$, and $V(i+1)$ contains $r_{i+1} = r_i (k_i - 1)(\#A)^{r_i}$ symbols ∗. It follows that
\[ \lim_i \frac{r_i}{L_i} = \frac{r_0}{L_0} \prod_{i=1}^\infty \Big(1 - \frac{1}{k_i}\Big) = \frac{r_0}{L_0} \cdot \frac{2K}{\log \#A} > 0 \]
(so regularity fails). The number of $L_i$-words $p(L_i)$ is bounded below by $(\#A)^{r_i}$ (namely the words that start at a position $1 + kL_i$) and bounded above by $L_i (\#A)^{r_i}$ (all starting positions). Therefore
\[ \frac{r_i \log \#A}{L_i} \le \frac{\log p(L_i)}{L_i} \le \frac{\log L_i + r_i \log \#A}{L_i}, \]
whence $\lim_i \frac{\log p(L_i)}{L_i} = \frac{r_0 \log \#A}{L_0} \prod_{i=1}^\infty (1 - \frac{1}{k_i}) = \frac{r_0}{L_0} 2K = K$. Therefore the topological entropy is $\lim_n \frac{\log p(n)}{n} = K$ by Fekete’s Lemma 1.15.

We will not give the examples with logarithmic complexity $\lim_n \frac{\log p(n)}{\log n} = K \ge 1$, but the technique is the same.

4.5.2. Adding Machines. Just like the more general enumeration systems in Section 5.3, adding machines are a class of symbolic systems that are not subshifts. They are also called odometers$^{13}$, after the device in a car that measures distance. Such an odometer consists of a number of disks, with the digits 0, . . . , 9 written on the edge. A single “tick” moves the rightmost disk by one unit, and if the 9 is passed (so the disk is back at position 0), it ticks over the second disk by one unit. A mathematical odometer has infinitely many disks, and the number of digits may vary from disk to disk.

The most common one is the dyadic adding machine or dyadic odometer $a : \Sigma \to \Sigma$ for $\Sigma = \{0,1\}^{\mathbb N}$. For $x \in \Sigma$, let $k = \inf\{i : x_i = 0\}$. Then $a$ is defined as
\[ a(x)_i = \begin{cases} 0, & i < k, \\ 1, & i = k, \\ x_i, & i > k. \end{cases} \tag{4.37} \]
Also, if $x = 111\cdots$, so $k = \infty$, then $a(x) = 000\cdots$. In more generality, we can choose a sequence $p := (p_k)_{k \ge 1}$ of integers $p_k \ge 2$ and define $a$ on $\Sigma_p := \{(x_k)_{k \ge 1} : x_k \in \{0, 1, \dots, p_k - 1\}\}$ analogously to (4.37). It is also instructive to view this procedure algorithmically, as “add one and carry”:

(4.38)
    c := 1 ; k := 1
    Repeat
        s := x_k + c
        If s ≥ p_k then c := 1 else c := 0
        x_k := s mod p_k ; k := k + 1
    Until c = 0

13 After the ancient Greek words ὁδός and μέτρον for road and measure.
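The add-and-carry rule (4.38) can be tried out on finite prefixes of points of $\Sigma_p$. The following Python sketch is only an illustration: the names `add_one` and `orbit` are hypothetical, and truncating to finitely many disks (dropping a carry that runs off the end) is an assumption made so that the computation terminates.

```python
from typing import List, Sequence


def add_one(x: Sequence[int], p: Sequence[int]) -> List[int]:
    """Apply the odometer map a to a finite prefix x of a point of Sigma_p.

    x[k] is the digit on disk k+1 and p[k] >= 2 is the number of digits on
    that disk. A carry that runs off the end of the prefix is dropped, which
    mirrors a(111...) = 000... on the truncated system (an assumption of
    this sketch).
    """
    digits = list(x)
    carry = 1
    for k in range(len(digits)):
        if carry == 0:          # "Until c = 0" in (4.38)
            break
        s = digits[k] + carry
        carry = 1 if s >= p[k] else 0
        digits[k] = s % p[k]
    return digits


def orbit(x: Sequence[int], p: Sequence[int], n: int) -> List[List[int]]:
    """Return [x, a(x), ..., a^n(x)] on the truncated odometer."""
    points = [list(x)]
    for _ in range(n):
        points.append(add_one(points[-1], p))
    return points
```

On the prefix of length two with $p = (2, 3)$, the orbit of $00$ runs through all six words before returning, in line with the periodicity used later in Propositions 4.99 and 4.100.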

In fact, $\Sigma_p$ is a group under the same rule of add and carry, and $a : \Sigma_p \to \Sigma_p$ is invertible.

Proposition 4.95. Every odometer $\Sigma_p$ is a topological group under addition.

Proof. The addition $z = x + y$ of two sequences $x, y \in \Sigma_p$ with add and carry goes according to the algorithm:

    c := 0 ; k := 1
    Repeat for all k ∈ N
        s := x_k + y_k + c
        If s ≥ p_k then c := 1 else c := 0
        z_k := s mod p_k ; k := k + 1

It is straightforward to check that this is commutative and continuous in $x$ and $y$. □

Exercise 4.96. Show that an odometer $(\Sigma_p, a)$ is conjugate to its own inverse $(\Sigma_p, a^{-1})$.

Remark 4.97. There is a common alternative way to write adding machines. Given $p = (p_j)_{j \in \mathbb N}$, define a sequence $q = (q_j)_{j \in \mathbb N_0}$ by $q_0 = 1$ and $q_j = \prod_{k=1}^{j} p_k$. Set
\[ \tilde\Sigma_q = \{ y = (y_j)_{j=1}^{\infty} : y_j \in \{0, \dots, q_j - 1\},\ q_{j-1} \text{ divides } (y_j - y_{j-1}) \text{ for all } j \in \mathbb N \}, \]
where $y_0 = 0$ by convention. Define $b : \Sigma_p \to \tilde\Sigma_q$ by
\[ b(x)_k = \sum_{j=1}^{k} x_j q_{j-1} \quad \text{with inverse} \quad b^{-1}(y)_k = \frac{y_k - y_{k-1}}{q_{k-1}}. \tag{4.39} \]
Then $b$ is a homeomorphism, and
\[ b \circ a = \tilde a \circ b \quad \text{for} \quad \tilde a(y)_k = y_k + 1 \bmod q_k \ \text{ for all } k \in \mathbb N. \]
If car odometers were constructed as $\tilde\Sigma_q$, then $q_j = 10^j$ and the entry $\sum_{j=1}^{n} x_j 10^{j-1}$ on the odometer would be the total number of kilometers driven mod $10^n$.
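The recoding of Remark 4.97 can be checked on finite prefixes. The sketch below is illustrative only: the function names are hypothetical, sequences are truncated to finitely many coordinates, and `add_one` is the truncated add-and-carry map (with the overflowing carry dropped), which is an assumption of the example rather than part of the text.

```python
from itertools import product
from math import prod
from typing import List, Sequence


def q_sequence(p: Sequence[int]) -> List[int]:
    """q[0] = 1 and q[j] = p_1 * ... * p_j."""
    return [prod(p[:j]) for j in range(len(p) + 1)]


def add_one(x: Sequence[int], p: Sequence[int]) -> List[int]:
    """Truncated add-and-carry map a on a prefix of Sigma_p."""
    digits, carry = list(x), 1
    for k in range(len(digits)):
        s = digits[k] + carry
        carry = 1 if s >= p[k] else 0
        digits[k] = s % p[k]
    return digits


def b(x: Sequence[int], p: Sequence[int]) -> List[int]:
    """b(x)_k = sum_{j <= k} x_j q_{j-1}, as in (4.39)."""
    q, total, y = q_sequence(p), 0, []
    for k, digit in enumerate(x):
        total += digit * q[k]
        y.append(total)
    return y


def b_inverse(y: Sequence[int], p: Sequence[int]) -> List[int]:
    """b^{-1}(y)_k = (y_k - y_{k-1}) / q_{k-1}, with y_0 = 0."""
    q, previous, x = q_sequence(p), 0, []
    for k, value in enumerate(y):
        x.append((value - previous) // q[k])
        previous = value
    return x


def a_tilde(y: Sequence[int], p: Sequence[int]) -> List[int]:
    """tilde a(y)_k = y_k + 1 mod q_k."""
    q = q_sequence(p)
    return [(value + 1) % q[k + 1] for k, value in enumerate(y)]


def check_conjugacy(p: Sequence[int]) -> bool:
    """Check b^{-1} b = id and b a = tilde a b on all prefixes of length len(p)."""
    for x in product(*(range(pk) for pk in p)):
        if b_inverse(b(x, p), p) != list(x):
            return False
        if b(add_one(x, p), p) != a_tilde(b(x, p), p):
            return False
    return True
```

With $p = (10, 10, 10)$ and digits $x = (7, 4, 1)$, the recoded point is $b(x) = (7, 47, 147)$, matching the reading of a car odometer described above.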

Remark 4.98. There is yet another, less common, way to write the adding machine, provided all the $p_i$’s are pairwise coprime. Let $\hat\Sigma_p = \Sigma_p$ and define $\hat a : \hat\Sigma_p \to \hat\Sigma_p$ as
\[ \hat a(y)_i = y_i + 1 \bmod p_i \quad \text{for all } i \ge 1. \]
Then $(\Sigma_p, a)$ and $(\hat\Sigma_p, \hat a)$ are conjugate via $\psi : \Sigma_p \to \hat\Sigma_p$ defined as
\[ \psi(x)_i = \sum_{j=1}^{i} x_j p_{j-1} \bmod p_i \quad \text{with } p_0 = 1. \]

The inverse of this map $\psi$ can be computed using the Chinese Remainder Theorem, which states that, whenever $p_1, \dots, p_k$ are pairwise coprime integers greater than 1, $N = \prod_{i=1}^{k} p_i$, and integers $0 \le a_i < p_i$ are given, the congruence equations
\[ x \bmod p_i = a_i, \qquad 1 \le i \le k, \tag{4.40} \]
have a unique solution $0 \le x < N$. A constructive solution can be found inductively. Since $\gcd(p_1, p_2) = 1$, Bézout’s identity (effectively the Euclidean algorithm) gives $n_1, n_2 \in \mathbb Z$ such that $n_1 p_1 + n_2 p_2 = 1$. Then $x = a_{1,2} := a_1 n_2 p_2 + a_2 n_1 p_1 \bmod p_1 p_2$ solves the first two congruence equations. Now replace these first two congruence equations by $x \equiv a_{1,2} \bmod p_1 p_2$ and continue by induction. This inductive procedure also shows that if we increase $k$ to $k + 1$, the new solution is in the same congruence class mod $N$ as the previous one. Hence $\psi^{-1}(y)$ can be computed term by term.

Proposition 4.99. Every odometer is uniformly rigid and hence periodically recurrent.

Proof. Let $\varepsilon > 0$ be arbitrary and take $k$ such that $2^{-k} < \varepsilon$. Let $q_k = p_1 p_2 \cdots p_k$. Then $a^{q_k}(x)_i = x_i$ for all $i \le k$; i.e. $d(a^{q_k}(x), x) < \varepsilon$ as required. Periodic recurrence follows by Lemma 2.24. □

Proposition 4.100. Every odometer is strictly ergodic; i.e. it is minimal and has a unique invariant probability measure; see Section 6.3.

Proof. Given any $n$-cylinder $Z$, every $x \in \Sigma_p$ will visit it exactly once in every $p_1 p_2 \cdots p_n$ iterates of $a$. Therefore $\mathrm{orb}_a(x)$ is dense in $\Sigma_p$ and the only $a$-invariant probability measure has $\mu(Z) = (p_1 p_2 \cdots p_n)^{-1}$. □

Proposition 4.101. Every odometer is an isometry, and hence of zero entropy.

Proof. Let $x, y \in \Sigma_p$ and $n = \min\{i \ge 1 : x_i \ne y_i\}$, so $d(x, y) = 2^{-n}$. The algorithmic definition of $a$ shows that $\min\{i \ge 1 : a(x)_i \ne a(y)_i\} = n$ as well. Therefore $a$ is an isometry, and in particular equicontinuous. Proposition 2.49 shows that $h_{top}(a) = 0$. □
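The inductive Bézout construction for the system (4.40) translates directly into code. The Python sketch below is a minimal illustration with hypothetical function names `bezout` and `crt`; it follows the two-moduli step described above and then merges one modulus at a time.

```python
from typing import Sequence, Tuple


def bezout(u: int, v: int) -> Tuple[int, int, int]:
    """Extended Euclidean algorithm: (g, s, t) with s*u + t*v == g == gcd(u, v)."""
    old_r, r = u, v
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_s, s = s, old_s - quotient * s
        old_t, t = t, old_t - quotient * t
    return old_r, old_s, old_t


def crt(residues: Sequence[int], moduli: Sequence[int]) -> int:
    """Solve x mod p_i = a_i for pairwise coprime moduli; return 0 <= x < N."""
    x, modulus = residues[0] % moduli[0], moduli[0]
    for a_next, p_next in zip(residues[1:], moduli[1:]):
        g, n1, n2 = bezout(modulus, p_next)
        if g != 1:
            raise ValueError("moduli must be pairwise coprime")
        # n1*modulus + n2*p_next == 1, so n2*p_next is 1 mod modulus
        # and n1*modulus is 1 mod p_next.
        x = (x * n2 * p_next + a_next * n1 * modulus) % (modulus * p_next)
        modulus *= p_next
    return x
```

For instance, `crt([2, 3, 2], [3, 5, 7])` returns 23, and reducing it mod 15 gives `crt([2, 3], [3, 5])`, which illustrates why the solution can be computed term by term.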

Odometers can be classified by the structure of the sequence $p = (p_i)_{i \in \mathbb N}$. There is no restriction in assuming that all $p_i$ are primes, because otherwise, i.e. if $p_i$ is the product of $k$ primes, we can replace the $i$-th “wheel” of the odometer by $k$ wheels, each with a prime number of digits. Define $k_p : \{\text{primes}\} \to \{0, 1, 2, \dots, \infty\}$ by setting $k_p(n) = \#\{i \in \mathbb N : p_i = n\}$. It is shown in e.g. [208] that $(\Sigma_p, a)$ and $(\Sigma_{p'}, a)$ are conjugate if and only if $k_p \equiv k_{p'}$. Also $(\Sigma_p, a)$ is a factor of $(\Sigma_{p'}, a)$ if and only if $k_p(n) \le k_{p'}(n)$ for every prime $n$. Therefore the only proper factors of simple odometers, i.e. those odometers for which all $p_i$’s are the same prime, are finite cyclic groups. All non-simple odometers have other odometers as factors.

Proposition 4.102. An odometer has no subshift other than periodic subshifts as continuous factors. However, an odometer can be a factor of a subshift.

Proof. Clearly the restriction of $a$ to the first $n$ digits gives a $p_1 p_2 \cdots p_n$-periodic orbit. However, since $a$ is an isometry, it cannot have an expansive continuous factor, and by Proposition 2.38, all non-periodic transitive subshifts are expansive.

Conversely, take the Feigenbaum substitution shift $(X_{feig}, \sigma)$ with $X_{feig} = \overline{\mathrm{orb}_\sigma(\rho_{feig})}$ for the fixed point
\[ \rho_{feig} = \rho_0 \rho_1 \rho_2 \cdots = 1011\ 1010\ 10111011\ 1011101010111010\ 1011 \cdots. \]
The shift is invertible on $X_{feig}$, except that $\rho_{feig}$ itself has two preimages $0\rho_{feig}$ and $1\rho_{feig}$. We define a factor map $\varphi$ onto the dyadic inverse odometer $(\Sigma, a^{-1})$, for $\Sigma = \{0, 1\}^{\mathbb N}$. Since odometers are conjugate to their own inverses (see Exercise 4.96), this gives a factor map onto $(\Sigma, a)$ too. Carry out the following algorithm:
\[ \begin{aligned} y_1 &:= \min\{n \ge 1 : x_n = 0\} \bmod 2, \\ y_2 &:= \min\{n \ge 1 : x_{y_1 + 2n} = 1\} \bmod 2, \\ y_3 &:= \min\{n \ge 1 : x_{y_1 + 2y_2 + 4n} = 0\} \bmod 2, \\ y_4 &:= \min\{n \ge 1 : x_{y_1 + 2y_2 + 4y_3 + 8n} = 1\} \bmod 2, \\ &\ \ \vdots \end{aligned} \]
and set $\varphi(x) = y$. For example, we get
\[ \varphi(\rho_{feig}) = 0000 \cdots \quad \text{and} \quad \varphi \circ \sigma(\rho_{feig}) = 1111 \cdots. \]
Note that this is not a sliding block code, since the windows to consider to determine $y_i$ increase with $i$. However, $\varphi$ is continuous, and one can check that $\varphi \circ \sigma = a^{-1} \circ \varphi$. The above minima are taken over $n \ge 1$. Therefore $\varphi(0\rho_{feig}) = \varphi(1\rho_{feig})$ and in fact $\varphi(\sigma^{-k}(0\rho_{feig})) = \varphi(\sigma^{-k}(1\rho_{feig}))$ for all $k \ge 0$. □
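The factor map in the proof of Proposition 4.102 can be evaluated on a finite prefix of the Feigenbaum sequence. The Python sketch below is illustrative and rests on assumptions that should be read as such: the Feigenbaum substitution is taken to be $0 \to 11$, $1 \to 10$ (which reproduces the prefix of $\rho_{feig}$ displayed above), $x_1$ is taken to be the first symbol of the sequence, and the symbol searched for in step $i$ is assumed to alternate between 0 (odd $i$) and 1 (even $i$). The function names are hypothetical.

```python
from typing import List, Sequence


def feigenbaum_prefix(length: int) -> List[int]:
    """Prefix of the fixed point of the substitution 0 -> 11, 1 -> 10."""
    word = [1]
    while len(word) < length:
        word = [s for symbol in word
                for s in ((1, 0) if symbol == 1 else (1, 1))]
    return word[:length]


def phi_prefix(x: Sequence[int], digits: int) -> List[int]:
    """First `digits` coordinates of phi(x), where x[0] plays the role of x_1.

    Raises IndexError if the prefix x is too short to decide a coordinate.
    """
    y = []
    offset = 0   # y_1 + 2 y_2 + ... + 2^{i-2} y_{i-1}
    step = 1     # 2^{i-1}
    for i in range(1, digits + 1):
        target = 0 if i % 2 == 1 else 1   # assumed alternation of targets
        n = 1
        while x[offset + step * n - 1] != target:   # x_{offset + step*n}
            n += 1
        y_i = n % 2
        y.append(y_i)
        offset += step * y_i
        step *= 2
    return y


rho = feigenbaum_prefix(256)
image_of_rho = phi_prefix(rho, 6)            # phi(rho)
image_of_shift = phi_prefix(rho[1:], 6)      # phi(sigma(rho))
```

Under these assumptions the two computed prefixes agree with the examples $\varphi(\rho_{feig}) = 0000\cdots$ and $\varphi \circ \sigma(\rho_{feig}) = 1111\cdots$, and one more shift gives a prefix of $a^{-1}(1111\cdots) = 0111\cdots$.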

Theorem 4.103. Let $(X_q, \sigma)$ be a Toeplitz shift with periodic structure $q$ and assume that $p = (p_i)_{i \ge 1}$ with $p_1 = q_1$, $p_i = q_i / q_{i-1}$ is an integer sequence. Then $(\Sigma_p, a)$ is the maximal equicontinuous factor of $(X_q, \sigma)$, and $(X_q, \sigma)$ is a non-trivial almost one-to-one extension of $(\Sigma_p, a)$.

By [208], a minimal system $(X, T)$ is an almost one-to-one extension of an odometer if and only if $X$ is the closure of a periodically recurrent point. Let us denote the factor map by $\pi$. Then the periodically recurrent points $x \in X$ are exactly the single fibers: $\pi^{-1} \circ \pi(x) = \{x\}$. If $T$ is a homeomorphism, then $(X, T)$ is an odometer. The Feigenbaum shift of Example 4.86 demonstrates non-singleton fibers:
\[ \sigma^{-1}(\rho_{feig}) = \{0\rho_{feig}, 1\rho_{feig}\} = \pi^{-1} \circ a^{-1} \circ \pi(\rho_{feig}) = \pi^{-1} \circ a^{-1}(0^\infty) = \pi^{-1}(1^\infty). \]

Proof. Let $X_q$ be the orbit closure of the Toeplitz sequence $x$ with periodic structure $q$. Let $Sk(q_j)$ be the $j$-th skeleton of $x$, so it is a $q_j$-periodic sequence in $(A \cup \{*\})^\infty$. For $y \in X_q$, define
\[ \pi_j(y) = r \in \{0, \dots, q_j - 1\} \quad \text{if } y_i = Sk(q_j)_{i+r} \text{ whenever } Sk(q_j)_{i+r} \ne *. \]
Therefore $\pi_j(\sigma^n y) = \pi_j(y) + n \bmod q_j$, so $\pi_j$ is surjective, and $\pi_j^{-1}(r)$, $r = 0, \dots, q_j - 1$, are $q_j$ disjoint clopen sets in $X_q$. For $y \in X_q$, it may not be clear from the first $q_j$ entries what $\pi_j(y)$ is. However, for every $j$, there is $m_j$ such that the first $m_j$ entries determine the value of $\pi_j(y)$. Therefore $\pi_j$ is continuous. Note that $\pi_j(y) - \pi_{j-1}(y)$ is always a multiple of $q_{j-1}$. Thus we can define $\pi : X_q \to \tilde\Sigma_q$ by
\[ \pi(y)_j = \pi_j(y). \]
Then $\pi^{-1}(z) = \bigcap_j \pi_j^{-1}(z_j)$, as the intersection of nested non-empty closed sets, is itself non-empty. Thus $\pi$ is surjective, continuous, and $\pi \circ \sigma = \tilde a \circ \pi$, where $\tilde a$ is defined in Remark 4.97. Via $b$ we can recode $(\tilde\Sigma_q, \tilde a)$ to the adding machine $(\Sigma_p, a)$ as Remark 4.97 explains. This adding machine is thus a factor of the Toeplitz shift and, as with all adding machines, it is equicontinuous.

If we set $\tilde\pi = b^{-1} \circ \pi$, we see further that $\tilde\pi(\sigma^n(x)) = a^n(00000 \cdots) =: (n)$ for each $n \in \mathbb N_0$ and that also $\tilde\pi^{-1}((n)) = \{\sigma^n(x)\}$. Therefore $(X_q, \sigma)$ is an almost one-to-one extension of $(\Sigma_p, a)$. However, there must be $z \in \Sigma_p$ such that $\#\tilde\pi^{-1}(z) \ge 2$, because otherwise $(\Sigma_p, a)$ would be conjugate to the (expansive) subshift $(X_q, \sigma)$, contradicting Proposition 4.102. It follows from Theorem 2.43 that $(\Sigma_p, a)$ is the maximal equicontinuous factor of $(X_q, \sigma)$. □

The following result can be found in [381, Theorem 4.4].

Theorem 4.104. Every transitive equicontinuous dynamical system $(X, T)$ on the Cantor set $X$ is conjugate to an adding machine.

Proof. Recall from Proposition 2.31 that $(X, T)$ preserves the metric
\[ d_\infty(x, y) := \sup_{n \ge 0} d(T^n(x), T^n(y)). \]
Define $\varepsilon$-chain-connectedness as the equivalence relation $x \approx_\varepsilon y$ if there are $x = x_0, x_1, \dots, x_n = y$ such that $d_\infty(x_i, x_{i+1}) < \varepsilon$. Since $T$ preserves $d_\infty$, we have $x \approx_\varepsilon y$ if and only if $T(x) \approx_\varepsilon T(y)$, so $T$ permutes the equivalence classes of $\approx_\varepsilon$. Take $\varepsilon_0$ maximal such that $X$ no longer consists of a single equivalence class of $\approx_{\varepsilon_0}$. By compactness there are finitely many, say $p_1$, equivalence classes. Since $T$ is transitive, it permutes these classes cyclically, so we can number them as $E_0, E_1, \dots, E_{p_1 - 1}$ where $T(E_i) = E_{i+1 \bmod p_1}$, and in particular, $T^{p_1}$ fixes each $E_i$.

Next take $\varepsilon_1 < \varepsilon_0$ maximal such that $E_0$ is not a single equivalence class for $\approx_{\varepsilon_1}$, and as before there are finitely many, say $p_2$, equivalence classes in $E_0$. Since $T$ is transitive, $T^{p_1}$ permutes these classes cyclically, and we can number them as $E_{00}, E_{01}, \dots, E_{0(p_2 - 1)}$ where $T^{p_1}(E_{0i}) = E_{0(i+1 \bmod p_2)}$. Furthermore, we can number the images of the $E_{0i}$ such that, for all $0 \le i < p_2$, we have $T(E_{ji}) = E_{(j+1)i}$ for $0 \le j \le p_1 - 2$ and $T(E_{(p_1 - 1)i}) = E_{0(i+1 \bmod p_2)}$. In particular, $T^{p_1 p_2}$ fixes each $E_{ji}$.

Continuing this way, we see that $T$ permutes the sets $E_{x_1 x_2 \cdots x_n}$ in accordance with the map $a$ on the adding machine for the sequence $(p_j)_{j \ge 1}$, and this carries over to the infinite intersections $E_{x_1 x_2 \cdots} = \bigcap_{n \ge 1} E_{x_1 x_2 \cdots x_n}$. These intersections are in fact points, because $X$ is totally disconnected. This completes the proof. □

4.6. B-Free Shifts

In order to study his famous conjecture, Sarnak introduced $\mathcal B$-free shifts in 2010, although $\mathcal B$-free sets have been studied since the 1930s. Our main source for this section is the monograph [231].

Definition 4.105. For a subset $\mathcal B \subset \mathbb N$ and $F_{\mathcal B} := \mathbb Z \setminus \{nb : b \in \mathcal B, n \in \mathbb Z\}$, let $\eta := 1_{F_{\mathcal B}}$ be the indicator function of $F_{\mathcal B}$. That is, $\eta_k = 0$ if $k = nb$ for some $b \in \mathcal B$ and $n \in \mathbb Z$; otherwise $\eta_k = 1$. Let $X_{\mathcal B} = \overline{\mathrm{orb}_\sigma(\eta)}$ be the shift-orbit closure of $\eta$. The two-sided subshift $(X_{\mathcal B}, \sigma)$ is called the $\mathcal B$-free shift.

We will assume that $1 \notin \mathcal B$, because otherwise $F_{\mathcal B} = \emptyset$. More generally, we will assume that no element $b \in \mathcal B$ is a multiple of some other $b' \in \mathcal B$. This property, called primitive$^{14}$, prevents $\mathcal B$ from having unnecessary elements that don’t change $F_{\mathcal B}$ but might interfere with conditions put on $\mathcal B$ later on.

Example 4.106. If $\mathcal B = \{\text{prime numbers}\}$, then $F_{\mathcal B} = \{-1, 1\}$ and $X_{\mathcal B} = \{\sigma^n(\cdots 000101000 \cdots) : n \in \mathbb Z\} \cup \{0^\infty\}$ is clearly a non-minimal shift. However, there is a minimal subshift $(\{0^\infty\}, \sigma)$ that every sequence in $X_{\mathcal B}$ is asymptotic to, both in forward and backward time.

If $\mathcal B = \{pq : p, q \text{ are prime numbers}\}$, then $F_{\mathcal B} = \{\pm \text{prime numbers}\} \cup \{-1, 1\}$; this is effectively the sieve of Eratosthenes$^{15}$. Since there are arbitrarily long gaps between primes (i.e. $(F_{\mathcal B})^c$ is thick; see Definition 2.18), $(\{0^\infty\}, \sigma)$ is again a minimal subshift of $X_{\mathcal B}$, and in this case, every $x \in X_{\mathcal B}$ is proximal to $0^\infty$.

The $\mathcal B$-free sets date back to the first half of the 20th century; research from that time includes the question of under which conditions the density $d(F_{\mathcal B}) = \lim_n \frac1n \#(F_{\mathcal B} \cap \{1, 2, \dots, n\})$ exists; see [76, 155, 184, 186, 187]. Davenport & Erdős [186] showed that the logarithmic density $\delta(F_{\mathcal B})$ (see Definition 8.55) always exists and is equal to the upper density $\bar d(F_{\mathcal B})$. Besicovitch [75] gave the following sufficient condition for $d(F_{\mathcal B})$ to exist:
\[ \mathcal B \text{ is pairwise coprime and thin; i.e. } \sum_{b \in \mathcal B} \frac1b < \infty. \tag{4.41} \]
Since $\bar d(\bigcup_{\mathcal B \ni b > K} b\mathbb Z) \le \sum_{\mathcal B \ni b > K} 1/b$, every thin sequence $\mathcal B$ has light tails, which means that the upper densities $\bar d(\bigcup_{\mathcal B \ni b > K} b\mathbb Z) \to 0$ as $K \to \infty$.

The set $\mathcal B$ might contain superfluous elements $b_0$ in the sense that $F_{\mathcal B \setminus \{b_0\}}$ and its related shifts have the same properties as $F_{\mathcal B}$ and its related shifts. A condition on $\mathcal B$ to avoid superfluous elements is the following:

Definition 4.107. The set $\mathcal B \subset \mathbb N$ is taut if the logarithmic densities satisfy
\[ \delta\Big( \bigcup_{b_0 \ne b \in \mathcal B} b\mathbb Z \Big) < \delta\Big( \bigcup_{b \in \mathcal B} b\mathbb Z \Big) \quad \text{for all } b_0 \in \mathcal B; \]
that is, every $b_0$ has a significant contribution to $F_{\mathcal B}$. Having light tails implies tautness, but not the other way around.

14 But this primitive has nothing to do with primitive for matrices.

15 The Greek geographer and mathematician Eratosthenes of Cyrene (276–195/194 BC) was the chief librarian of Alexandria in his time. His most famous achievement was an estimate of the circumference of the Earth.
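Definition 4.105 and Example 4.106 are easy to explore on finite windows. The Python sketch below uses hypothetical helper names, and it replaces an infinite set $\mathcal B$ by a finite truncation, which is an assumption that only gives the correct values of $\eta_k$ for $0 < |k|$ smaller than the first omitted element of $\mathcal B$.

```python
from typing import Iterable, List


def eta(B: Iterable[int], k: int) -> int:
    """Indicator of F_B at k: 0 if k is a multiple of some b in B, else 1."""
    return 0 if any(k % b == 0 for b in B) else 1


def eta_window(B: Iterable[int], start: int, stop: int) -> List[int]:
    """The word eta_start ... eta_{stop-1}."""
    elements = list(B)
    return [eta(elements, k) for k in range(start, stop)]


def is_primitive(B: Iterable[int]) -> bool:
    """True if no element of B is a multiple of another element of B."""
    elements = sorted(set(B))
    return not any(c % b == 0
                   for i, b in enumerate(elements)
                   for c in elements[i + 1:])


def primes_up_to(n: int) -> List[int]:
    """All primes <= n by trial division (adequate for small n)."""
    return [m for m in range(2, n + 1)
            if all(m % d != 0 for d in range(2, int(m ** 0.5) + 1))]


def free_count(B: Iterable[int], n: int) -> int:
    """#(F_B intersected with {1, ..., n})."""
    elements = list(B)
    return sum(eta(elements, k) for k in range(1, n + 1))
```

For the primes up to 30, the window from $-30$ to $30$ has ones only at $\pm 1$, as in Example 4.106, and for the squares of primes up to 7 the count `free_count([4, 9, 25, 49], 100)` is the number of square-free integers up to 100.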

4.6.1. Hereditary and Admissible B-Free Shifts. Apart from the $\mathcal B$-free shift, the following two shifts related to $\mathcal B$-free sets are of use.

Definition 4.108. The $\mathcal B$-admissible shift is
\[ X_{\mathcal B}^{adm} = \{x \in \{0, 1\}^{\mathbb Z} : \forall\, b \in \mathcal B\ \exists\, a \in \mathbb N\ \forall\, n \in \mathbb Z\ \ x_{a + nb} = 0\}. \]
A subshift $X \subset \{0, 1\}^{\mathbb N \text{ or } \mathbb Z}$ is hereditary if whenever $x \in X$ and $y_n \le x_n$ for all $n$, then also $y \in X$. We call $X_{\mathcal B}^{her}$ the hereditary subshift if it is the smallest hereditary subshift containing $X_{\mathcal B}$.

It is clear from the definitions that $X_{\mathcal B}^{adm}$ is hereditary and
\[ X_{\mathcal B} \subset X_{\mathcal B}^{her} \subset X_{\mathcal B}^{adm}. \tag{4.42} \]
We summarize from [231] some of the properties of $X_{\mathcal B}^{her}$ and $X_{\mathcal B}^{adm}$:

• There are examples where these inclusions are strict (in particular, $X_{\mathcal B}$ need not be hereditary). Indeed, for $\mathcal B$ the primes as in Example 4.106, $X_{\mathcal B}^{her} = X_{\mathcal B} \cup \{\sigma^n(\cdots 0001000 \cdots) : n \in \mathbb Z\}$, and $X_{\mathcal B}^{adm}$ is uncountable, because it is hereditary and contains a sequence with infinitely many ones, for instance $x = \cdots 000.010001 \cdots$ (with 1’s at positions $\prod_{j=1}^{n} p_j$ where $p_j$ is the $j$-th prime number).

• However, as shown in [2], condition (4.41) implies equality in (4.42). In fact, if $\mathcal B$ is taut, then $X_{\mathcal B} = X_{\mathcal B}^{her}$; see [353, Theorem 3].

• Every set $\mathcal B$ can be made taut in the sense that there exists $\mathcal B'$ such that $F_{\mathcal B'} \subset F_{\mathcal B}$, $X_{\mathcal B'}^{her} \subset X_{\mathcal B}^{her}$ and these spaces carry the same shift-invariant measures. For every $x \in X_{\mathcal B}^{her}$ and $\varepsilon > 0$, the set $\{n \in \mathbb Z : \sigma^n(x) \notin B_\varepsilon(X_{\mathcal B'}^{her})\}$ has zero density.

• If $\mathcal B$ and $\mathcal B'$ are both taut, then equality of $\mathcal B$ and $\mathcal B'$ is equivalent to $F_{\mathcal B} = F_{\mathcal B'}$, to $X_{\mathcal B} = X_{\mathcal B'}$, to $X_{\mathcal B}^{her} = X_{\mathcal B'}^{her}$, and also to $X_{\mathcal B}^{adm} = X_{\mathcal B'}^{adm}$.

• If $\mathcal B$ has light tails, then the density $d(F_{\mathcal B})$ exists. Additionally, if $\mathcal B$ is pairwise coprime and has light tails$^{16}$, then $X_{\mathcal B} = X_{\mathcal B}^{her}$; i.e. we have equality in (4.42).

16 Keller [353] derives this conclusion under the weaker assumption that $\mathcal B$ is taut and pairwise coprime.

Proposition 4.109. Regardless of whether equality holds in (4.42) or not, the entropies are equal:
\[ h_{top}(X_{\mathcal B}^{her}, \sigma) = h_{top}(X_{\mathcal B}^{adm}, \sigma) = \bar d(F_{\mathcal B}) = \delta(F_{\mathcal B}), \tag{4.43} \]
where $\delta(F_{\mathcal B})$ is the logarithmic density; see Definition 8.55.

Proof. We sketch the proof from [231, Proposition K and Theorem 2.28]. It is easy to see that $h_{top}(X_{\mathcal B}^{her}, \sigma) \ge d(F_{\mathcal B})$. Indeed, among the first $n$ entries, $\eta$ has at least $d(F_{\mathcal B}) n$ ones, and therefore $p_{X_{\mathcal B}^{her}}(n) \ge 2^{d(F_{\mathcal B}) n}$. Since $h_{top}(X_{\mathcal B}^{her}, \sigma) = \inf_n \frac1n \log p_{X_{\mathcal B}^{her}}(n)$, the inequality $h_{top}(X_{\mathcal B}^{her}, \sigma) \ge d(F_{\mathcal B})$ follows. One step in [231, Proposition K] is therefore to show that the other inequality $p_{X_{\mathcal B}^{her}}(n) \le 2^{\bar d(F_{\mathcal B}) n + \varepsilon}$ holds. □

Example 4.110. If $\mathcal B = \{p^2 : p \text{ is prime}\}$, then $F_{\mathcal B}$ is the set of square-free integers. In terms of the Möbius function$^{17}$ $\mu : \mathbb Z \to \{-1, 0, 1\}$ defined as
\[ \mu(n) = \begin{cases} (-1)^k & \text{if } |n| \text{ is the product of } k \text{ distinct primes,} \\ 0 & \text{if } |n| \text{ is a multiple of a square of a prime,} \\ 1 & \text{if } |n| = 1, \\ 0 & \text{if } n = 0, \end{cases} \tag{4.44} \]
we have $F_{\mathcal B} = \{n \in \mathbb Z : \mu(|n|) \ne 0\}$. The study of this example was stimulated by Sarnak’s conjecture$^{18}$. The density of $F_{\mathcal B}$ is $d(F_{\mathcal B}) = 6/\pi^2 = 1/\zeta(2)$, see [303], so by (4.43), $h_{top}(X_{\mathcal B}, \sigma) = 1/\zeta(2)$.

Exercise 4.111. Let $A = \{1, 2, \dots, a\}$ for some $a \in \mathbb N$. Show that the number of periodic sequences $x \in A^{\mathbb Z}$ of minimal period $n$ equals
\[ \mathrm{Per}(n) = \sum_{1 \le d \le n,\ d \mid n} \mu\Big(\frac{n}{d}\Big) a^d, \]
where $\mu$ denotes the Möbius function, roughly counting the parity of distinct prime factors; see (4.44). In particular, $\mathrm{Per}(n) = a^n - a$ if $n$ is a prime. Derive Fermat’s Little Theorem, $a^{n-1} \equiv 1 \pmod n$ if $n$ is a prime not dividing $a$.

The connection between $\mathcal B$-free shifts and Toeplitz shifts is that every $\mathcal B$-free shift has a unique minimal “core” that is a Toeplitz shift, although it is usually a very simple one, namely $(\{0^\infty\}, \sigma)$. This is summarized in the next result, which is [231, Theorem A].

Theorem 4.112. Every $\mathcal B$-free shift $(X_{\mathcal B}, \sigma)$ has a unique minimal subshift, which is a Toeplitz shift, and every $x \in X_{\mathcal B}$ is proximal to this subshift.

As the proof shows, every $x \in X_{\mathcal B}$ is syndetically proximal to this subshift, and the result holds for $(X_{\mathcal B}^{her}, \sigma)$ as well. Furthermore, $(X_{\mathcal B}, \sigma)$ is proximal (i.e. every pair $(x, y) \in X_{\mathcal B}^2$ is proximal) if and only if its maximal equicontinuous factor is trivial; see [231, Theorem 3.22].

17 The Möbius function is central in algebraic number theory. For instance, $\sum_n \frac{\mu(n)}{n^s} = \zeta(s)^{-1}$, i.e. the inverse of the Riemann $\zeta$-function. As a consequence, the statement $\sum_{n \le N} \mu(n) = o(N)$ is equivalent to the prime number theorem, and $\sum_{n \le N} \mu(n) = O(N^{\frac12 + \varepsilon})$ is equivalent to the Riemann hypothesis.

18 His conjecture states that every continuous dynamical system $(X, T)$ of zero entropy has the property that every continuous $f : X \to \mathbb R$ is orthogonal to the Möbius function, which means that averages $\frac1n \sum_{k=0}^{n-1} \mu(k) \cdot f \circ T^k(x)$ tend to zero for every $x \in X$. Many dynamical systems satisfy this conjecture, e.g. circle rotations [185]. It is known that the converse is false: There are continuous positive entropy systems such that every continuous function is orthogonal to the Möbius function; see Downarowicz & Serafin [217]. A recent account of the progress on this problem can be found in [245].

Proof. By construction, each 0 appearing in $\eta = 1_{F_{\mathcal B}}$ reappears with period $b$ for some $b \in \mathcal B$. Thus every block of 0’s also reappears periodically. We will show that some blocks of 1’s appear periodically as well. Throughout this proof, blocks of 0’s (or of 1’s) are always taken to be of maximal length; i.e. they cannot be extended to the left or right with another 0 (or 1).

If $\eta$ contains arbitrarily long blocks of 0’s, all reappearing periodically, then $0^\infty \in X_{\mathcal B}$, and the proof of Proposition 2.17 shows that $\{0^\infty\}$ is the unique minimal subset of $(X_{\mathcal B}, \sigma)$. Trivially $0^\infty$ is a Toeplitz sequence.

Otherwise, let $A_0$ be the longest block of 0’s that appears in $\eta$. Each appearance is followed by a block of 1’s. Let $A_1 := A_0 1^s$ where $s \in \mathbb N$ is the shortest length of all blocks of 1’s succeeding an appearance of $A_0$ in $\eta$. Next take $A_2 := A_1 0^t$ where $t \in \mathbb N$ is the longest length of all blocks of 0’s succeeding an appearance of $A_1$ in $\eta$. The blocks $A_0$ and $0^t$ both reappear periodically, and if they reappear simultaneously, the $s$-block in between has to be $1^s$ again. This follows because $A_0$ was a longest block of 0’s and $s$ was the length of the shortest block of 1’s succeeding $A_0$. Therefore $A_2$ as a whole reappears periodically.

Next extend $A_2$ to the left as $A_3 := 1^u A_2$ where $u \in \mathbb N$ is the shortest length of all blocks of 1’s preceding an appearance of $A_2$ in $\eta$. Next take $A_4 := 0^v A_3$ where $v \in \mathbb N$ is the longest length of all blocks of 0’s preceding an appearance of $A_3$ in $\eta$. By the argument above, $A_4$ reappears periodically.

Continue by induction; i.e. $A_{4n+1} = A_{4n} 1^{s_n}$, $A_{4n+2} = A_{4n+1} 0^{t_n}$, $A_{4n+3} = 1^{u_n} A_{4n+2}$, $A_{4n+4} = 0^{v_n} A_{4n+3}$ are extensions with the shortest possible blocks of 1’s and longest possible blocks of 0’s available. Then $y := \lim_n A_n$ is a two-sided sequence of which each subword reappears periodically, so it is Toeplitz.
Therefore $\overline{\mathrm{orb}_\sigma(y)}$ is a minimal subset of $(X_{\mathcal B}, \sigma)$. Because the blocks $A_n$ appear with the same periods in $\eta$, it is the only minimal subset. □

The case that $\{0^\infty\}$ is the minimal subset is easy to determine from $\mathcal B$:

Lemma 4.113. The set $\mathcal B$ contains an infinite set of pairwise coprime integers if and only if $0^\infty \in X_{\mathcal B}$. In this case $\{0^\infty\}$ is the unique minimal set in $X_{\mathcal B}$ and in $X_{\mathcal B}^{her}$.

Proof. ⇒: Let $b_1, \dots, b_k \in \mathcal B$ be pairwise coprime, and let $N = \prod_{i=1}^{k} b_i$. By the Chinese Remainder Theorem, there is $m \in \{0, \dots, N - 1\}$ such that $m \equiv -i \bmod b_i$ for $i = 1, \dots, k$. Therefore $\eta_j = 0$ for $j = m + 1, \dots, m + k$. Since $k$ is arbitrary, $0^\infty \in X_{\mathcal B}$.

⇐: Assume that $A = \{a + 1, \dots, a + n\}$ is a longest block of 0’s in $\eta$, and let $N$ be its period. Then $\eta_{a + kN - 1} = 1$ for all $k \in \mathbb Z$. If $b \in \mathcal B$ is coprime with $N$, then there are $k, \ell \in \mathbb Z$ such that $\ell b = kN + a - 1 \notin F_{\mathcal B}$, contradicting that $\eta_{a + kN - 1} = 1$. Therefore no $b \in \mathcal B$ is coprime with $N$, and $\mathcal B$ cannot contain infinitely many pairwise coprime integers. □

The following characterization is [231, Theorem B].

Theorem 4.114. The following statements about a $\mathcal B$-free shift $(X_{\mathcal B}, \sigma)$ are equivalent:
(a) The unique minimal subshift of $(X_{\mathcal B}, \sigma)$ is $(\{0^\infty\}, \sigma)$.
(b) $0^\infty \in X_{\mathcal B}$.
(c) $(X_{\mathcal B}, \sigma)$ is proximal.
(d) $\mathcal B$ contains an infinite pairwise coprime subset.

Part (a) says that $\mathcal B$ is so large that $\{nb : n \in \mathbb Z, b \in \mathcal B\}$ is a thick set; see Definition 2.18. For part (c), it may be worth pointing out that every $x \in X_{\mathcal B}$ is proximal to $0^\infty$, but usually not asymptotic to it. An instance is the set $\mathcal B = \{pq : p, q \text{ are primes}\}$ from Example 4.106 with $F_{\mathcal B} = \{\pm \text{primes}\} \cup \{-1, 1\}$; the sequence $x = 1_{F_{\mathcal B}}$ is proximal to $0^\infty$ because there are arbitrarily large gaps between the primes, but not asymptotic to it, because there are infinitely many primes.

Exercise 4.115. Let $X_{Cantor}$ be the shift space emerging from the Cantor substitution $\chi_{Cantor}$; see Remark 2.14. Is $X_{Cantor}$ a $\mathcal B$-free shift? Is it a Toeplitz shift?

4.6.2. The Canonical Odometer and the Mirsky Measure. Write $\mathcal B = \{b_1, b_2, b_3, \dots\}$. Let
\[ \hat\Sigma_{\mathcal B} = \prod_{k=1}^{\infty} \{0, 1, \dots, b_k - 1\} \quad \text{and} \quad \hat a : \hat\Sigma_{\mathcal B} \to \hat\Sigma_{\mathcal B}, \quad x_j \mapsto x_j + 1 \bmod b_j \]
be the adding machine as described in Remark 4.98. Then $(\hat\Sigma_{\mathcal B}, \hat a)$ is called the canonical odometer of $(X_{\mathcal B}, \sigma)$. We abbreviate $\mathbf 0 = 0^\infty$ and $\mathbf 1 = \hat a(\mathbf 0) = 1^\infty \in \hat\Sigma_{\mathcal B}$. Note that, contrary to Remark 4.98, we did not make the assumption that $\mathcal B = \{b_1, b_2, b_3, \dots\}$ consists of pairwise coprime integers. Therefore $\mathrm{orb}_{\hat a}(\mathbf 1)$ need not be dense in $\hat\Sigma_{\mathcal B}$; its points $x$ satisfy $x_i \equiv x_j \bmod \gcd(b_i, b_j)$ for all $i, j \in \mathbb N$. In more detail, $\hat\Sigma_{\mathcal B}$ is an abelian group under addition, and $\{\dots, -x, 0, x, 2x, 3x, \dots\}$ is dense if and only if $\mathcal B$ is pairwise coprime. As a consequence, $(\hat\Sigma_{\mathcal B}, \hat a)$ is minimal and uniquely ergodic if and only if $\mathcal B$ is pairwise coprime.
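The first half of the proof of Lemma 4.113 is constructive: given pairwise coprime $b_1, \dots, b_k$, the Chinese Remainder Theorem produces a block of $k$ consecutive zeros of $\eta$. The Python sketch below illustrates this with hypothetical function names; it uses the standard closed-form solution of the congruences (via the built-in modular inverse `pow(x, -1, m)`, available from Python 3.8) rather than any particular construction from the text.

```python
from math import gcd, prod
from typing import Sequence


def eta(B: Sequence[int], k: int) -> int:
    """Indicator of F_B at k for a finite set B."""
    return 0 if any(k % b == 0 for b in B) else 1


def zero_block_start(bs: Sequence[int]) -> int:
    """Return m in {0, ..., N-1} with m = -i mod b_i for i = 1, ..., k.

    Then b_i divides m + i, so eta_{m+1} = ... = eta_{m+k} = 0.
    """
    for idx, u in enumerate(bs):
        for v in bs[idx + 1:]:
            if gcd(u, v) != 1:
                raise ValueError("the b_i must be pairwise coprime")
    N = prod(bs)
    m = 0
    for i, b in enumerate(bs, start=1):
        partial = N // b                      # product of the other moduli
        m += (-i) * partial * pow(partial, -1, b)
    return m % N
```

For $b = (2, 3, 5)$ this gives $m = 7$, and indeed $8, 9, 10$ are multiples of $2, 3, 5$ respectively.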

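Define the window
\[ W := \{w \in \hat\Sigma_{\mathcal B} : w_k \ne 0 \text{ for all } k \ge 1\}. \]
Then $n_{\mathcal B} := \hat a^n(\mathbf 0) \in W$ if and only if $n \in F_{\mathcal B}$. Also define $\varphi_{\mathcal B} : \hat\Sigma_{\mathcal B} \to \{0, 1\}^{\mathbb Z}$ by
\[ \varphi_{\mathcal B}(w)_j = \begin{cases} 1 & \text{if } j \not\equiv -w_k \bmod b_k \text{ for all } k \ge 1, \\ 0 & \text{otherwise.} \end{cases} \tag{4.45} \]
Then $\eta = \varphi_{\mathcal B}(\mathbf 0)$ and $\varphi_{\mathcal B} \circ \hat a = \sigma \circ \varphi_{\mathcal B}$. Although $\varphi_{\mathcal B}$ sends Borel sets to Borel sets, $\varphi_{\mathcal B}$ is not continuous. For instance, $\hat a^{\prod_{j=1}^{k} b_j}(\mathbf 0) \to \mathbf 0$ as $k \to \infty$, but in general $\sigma^{\prod_{j=1}^{k} b_j}(\eta) \not\to \eta$.

Note that the interior $W^\circ$ consists of sequences $w$ for which a finite number of entries $w_k$ determine that $w \in W$. Indeed, for $w \in W^\circ$, since cylinder sets are open, there exists $m$ such that $[w]_m = \{y \in \hat\Sigma_{\mathcal B} : y_k = w_k \text{ for } k \le m\} \subset W^\circ$. If $n \in \mathbb Z$ is such that $n_{\mathcal B} \in [w]_m$, then $(n + jM)_{\mathcal B} \in [w]_m$ as well for $M = \prod_{k=1}^{m} b_k$ and all $j \in \mathbb Z$. But then also $\eta_n = 1$ and $\eta_{n + jM} = 1$ as well, for all $j \in \mathbb Z$. Therefore a non-empty interior of the window refers to 1’s that appear periodically in $\eta$, just as in the second part of the proof of Theorem 4.112.

On the other hand, if $\mathcal B$ contains an infinite pairwise coprime subset, then the window has empty interior and no 1 in $\eta$ appears periodically. This is exactly the situation of the first part of the proof of Theorem 4.112, if $\{0^\infty\}$ is the minimal subset (trivially Toeplitz) of $X_{\mathcal B}$. For a more extended argument, see [380, Theorem C], which also proves$^{19}$ that $W$ is the closure of $W^\circ$ if and only if $\eta$ itself is a Toeplitz sequence. In this case, $(X_{\mathcal B}, \sigma)$ is an almost one-to-one extension of $(\overline{\{n_{\mathcal B}\}}, \hat a)$.

Definition 4.116. Let $\mu_{\mathcal B}$ be the unique $\hat a$-invariant probability measure on $(\hat\Sigma_{\mathcal B}, \hat a)$. The Mirsky measure $\nu_{\mathcal B}$ is the pull-back measure
\[ \nu_{\mathcal B}(A) := \mu_{\mathcal B}(\varphi_{\mathcal B}^{-1}(A)) \quad \text{for every Borel set } A \subset \{0, 1\}^{\mathbb Z}. \]
In particular, $\nu_{\mathcal B}([1]) = \prod_{b \in \mathcal B} (1 - \frac1b) > 0$ if and only if $\mathcal B$ is thin. Therefore, if $\mathcal B$ is not thin, then $\nu_{\mathcal B} = \delta_{0^\infty}$. Peckner [451] showed that the $\mathcal B$-free shift for $\mathcal B = \{p^2 : p \text{ prime}\}$ is intrinsically ergodic.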
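The window and the map $\varphi_{\mathcal B}$ of (4.45) can be checked on a finite set $\mathcal B$. The following Python sketch is an illustration under assumptions: $\mathcal B$ is truncated to finitely many elements (so a point of $\hat\Sigma_{\mathcal B}$ is a finite tuple), the window is taken to be the set of tuples with no zero coordinate, the base point of the orbit is $\mathbf 0$, and all function names are hypothetical.

```python
from typing import Sequence, Tuple


def a_hat(w: Sequence[int], B: Sequence[int]) -> Tuple[int, ...]:
    """Canonical odometer: add 1 in every coordinate, mod b_k."""
    return tuple((wk + 1) % b for wk, b in zip(w, B))


def in_window(w: Sequence[int]) -> bool:
    """w lies in W if no coordinate vanishes."""
    return all(wk != 0 for wk in w)


def phi(w: Sequence[int], B: Sequence[int], j: int) -> int:
    """phi_B(w)_j = 1 iff j is not congruent to -w_k mod b_k for every k."""
    return 1 if all((j + wk) % b != 0 for wk, b in zip(w, B)) else 0


def eta(B: Sequence[int], k: int) -> int:
    """Indicator of F_B at k."""
    return 0 if any(k % b == 0 for b in B) else 1


def check_window(B: Sequence[int], n_max: int) -> bool:
    """Check: a_hat^n(0) in W iff n in F_B, and eta = phi_B(0) on a window."""
    w = tuple(0 for _ in B)
    zero = w
    for n in range(n_max + 1):
        if in_window(w) != (eta(B, n) == 1):
            return False
        w = a_hat(w, B)
    return all(phi(zero, B, j) == eta(B, j) for j in range(-n_max, n_max + 1))


def check_equivariance(w: Sequence[int], B: Sequence[int], radius: int) -> bool:
    """Check phi_B(a_hat(w))_j = phi_B(w)_{j+1}, i.e. phi_B a_hat = sigma phi_B."""
    shifted = a_hat(w, B)
    return all(phi(shifted, B, j) == phi(w, B, j + 1)
               for j in range(-radius, radius + 1))
```

The checks do not need $\mathcal B$ to be pairwise coprime; for example they can be run with $\mathcal B = (4, 6, 9)$.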
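Kułaga-Przymus, Lemańczyk & Weiss [379] showed that if $h_{top}(X_\eta) > 0$, then $(X_{\mathcal B}, \sigma)$ need not be intrinsically ergodic; the set of shift-invariant measures can be the Poulsen simplex; see Section 6.1.

19 In Theorem B; additionally in Proposition 1.2 it is proved that this Toeplitz sequence is regular if and only if the Haar measure of the boundary $\partial W$ is zero.

Theorem 4.117. The sequence $\eta = 1_{F_{\mathcal B}}$ is quasi-generic for the Mirsky measure $\nu_{\mathcal B}$; i.e. there is a subsequence $(n_k)_{k \ge 1}$ of $\mathbb N$ such that the Cesàro means of Dirac measures
\[ \frac{1}{n_k} \sum_{j=0}^{n_k - 1} \delta_{\sigma^j(\eta)} \xrightarrow{\ k \to \infty\ } \nu_{\mathcal B} \quad \text{in the weak}^* \text{ topology.} \]

Any sequence $(n_k)_{k \ge 1}$ such that $\frac{1}{n_k} \#(F_{\mathcal B} \cap \{0, \dots, n_k - 1\}) \to \bar d(F_{\mathcal B})$ suffices, so if $d(F_{\mathcal B})$ exists, then $\eta$ is typical$^{20}$ for $\nu_{\mathcal B}$. It follows that, although $\nu_{\mathcal B}$ is defined on $X_{\mathcal B}^{her}$ or even $X_{\mathcal B}^{adm}$, $\nu_{\mathcal B}(X_{\mathcal B}) = 1$. In fact, $(X_{\mathcal B}, \sigma)$ is uniquely ergodic and $h_{top}(X_{\mathcal B}, \sigma) = 0$, because $(\hat\Sigma_{\mathcal B}, \hat a)$ has these properties (still assuming that $\mathcal B = \{b_1, b_2, b_3, \dots\}$ are pairwise coprime). If $\mathcal B$ is taut, then $X_{\mathcal B}$ is the support of $\nu_{\mathcal B}$; see [353, Theorem 2]. The shift $(X_{\mathcal B}^{her}, \sigma)$ is intrinsically ergodic ([378] and [231, Theorem J]) even though $X_{\mathcal B}^{her}$ carries in general other measures, and $h_{top}(X_{\mathcal B}^{her}, \sigma)$ can be positive.

20 In the sense that the Ergodic Theorem 6.13 holds for $\eta$.

Proof. Since $\varphi_{\mathcal B} \circ \hat a = \sigma \circ \varphi_{\mathcal B}$, it suffices to prove that
\[ \lim_{k \to \infty} \frac{1}{N_k} \sum_{n=0}^{N_k - 1} 1_{\varphi_{\mathcal B}^{-1}(Z)} \circ \hat a^n(\mathbf 0) = \mu_{\mathcal B}(\varphi_{\mathcal B}^{-1}(Z)), \tag{4.46} \]
for cylinder sets $Z = \{x \in \{0, 1\}^{\mathbb Z} : x_{k_i} = 0 \text{ for } i = 1, \dots, r\}$ for arbitrary $r \in \mathbb N$ and $k_1, \dots, k_r \in \mathbb Z$. Define
\[ W_K = \{w \in \hat\Sigma_{\mathcal B} : w_k \not\equiv 0 \bmod b_k \text{ for all } 1 \le k \le K\}. \]
Then $W_K$ is clopen and $W_K \searrow W$. Note that
\[ \bigcap_{i=1}^{r} \hat a^{-k_i}(W_K^c) \subset \varphi_{\mathcal B}^{-1}(Z) = \bigcap_{i=1}^{r} \hat a^{-k_i}(W^c) \subset \bigcap_{i=1}^{r} \hat a^{-k_i}(W_K^c) \cup \bigcup_{i=1}^{r} \hat a^{-k_i}(W_K \setminus W). \tag{4.47} \]
Choose $\varepsilon > 0$ arbitrary and let $K \in \mathbb N$ be so large that
\[ \begin{aligned} \mu_{\mathcal B}\Big( \bigcap_{i=1}^{r} \hat a^{-k_i}(W_K^c) \Big) &\ge \mu_{\mathcal B}\Big( \bigcap_{i=1}^{r} \hat a^{-k_i}(W^c) \Big) - \varepsilon, \\ d(F_{\{b_1, \dots, b_K\}}) &\le \bar d(F_{\mathcal B}) + \varepsilon. \end{aligned} \tag{4.48} \]
Because $\bigcap_{i=1}^{r} \hat a^{-k_i}(W_K^c)$ is a clopen set, the indicator function $1_{\bigcap_{i=1}^{r} \hat a^{-k_i}(W_K^c)}$ is continuous. The unique ergodicity of $(\hat\Sigma_{\mathcal B}, \hat a)$ implies that
\[ \lim_{k \to \infty} \frac{1}{N_k} \sum_{n=0}^{N_k - 1} 1_{\bigcap_{i=1}^{r} \hat a^{-k_i}(W_K^c)} \circ \hat a^n(\mathbf 0) = \mu_{\mathcal B}\Big( \bigcap_{i=1}^{r} \hat a^{-k_i}(W_K^c) \Big). \tag{4.49} \]
Note that $\hat a^n(\mathbf 0) \in W_K \setminus W$ if and only if $n \in F_{\{b_1, \dots, b_K\}} \setminus F_{\mathcal B}$. By (4.48),
\[ \limsup_{k \to \infty} \frac{1}{N_k} \sum_{n=0}^{N_k - 1} 1_{\bigcup_{i=1}^{r} \hat a^{-k_i}(W_K \setminus W)} \circ \hat a^n(\mathbf 0) \le \varepsilon. \]
This combined with (4.47) gives (4.46), and the proof follows. □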
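For a finite pairwise coprime set the frequency statement behind Theorem 4.117 can be verified exactly, because $\eta$ is then periodic with period $\prod_b b$. The Python sketch below is illustrative only: truncating $\mathcal B$ to a finite set is an assumption, and the helper names are hypothetical. It compares the frequency of ones in $\eta$ over one full period with the product $\prod_{b}(1 - \frac1b)$ from Definition 4.116.

```python
from fractions import Fraction
from math import prod
from typing import Sequence


def eta(B: Sequence[int], k: int) -> int:
    """Indicator of F_B at k."""
    return 0 if any(k % b == 0 for b in B) else 1


def one_frequency(B: Sequence[int], n: int) -> Fraction:
    """Frequency of ones among eta_0, ..., eta_{n-1}."""
    return Fraction(sum(eta(B, k) for k in range(n)), n)


def mirsky_product(B: Sequence[int]) -> Fraction:
    """prod_{b in B} (1 - 1/b), the value of nu_B([1]) for coprime B."""
    result = Fraction(1)
    for b in B:
        result *= Fraction(b - 1, b)
    return result


coprime_B = (4, 9, 25)
period = prod(coprime_B)                       # eta is 900-periodic here
exact = one_frequency(coprime_B, period)       # equals the product formula

non_coprime_B = (4, 6)
mismatch = (one_frequency(non_coprime_B, 12), mirsky_product(non_coprime_B))
```

The second pair of values differs, which shows that the product formula for $\nu_{\mathcal B}([1])$ really does rely on pairwise coprimality.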



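For every $x \in X_{\mathcal B}^{her}$ and $k \in \mathbb N$, there is a $w_k \in \mathbb Z$ such that $x_{b_k n + w_k} = 0$ for all $n \in \mathbb Z$. Thus, for the set $Y_{\mathcal B}^{her} := \{x \in X_{\mathcal B}^{her} : w_k \text{ is unique for all } k \in \mathbb N\}$, we can define the map
\[ \theta_{\mathcal B} : Y_{\mathcal B}^{her} \to \hat\Sigma_{\mathcal B}, \qquad \theta_{\mathcal B}(x)_k = w_k. \]
Then $\theta_{\mathcal B} \circ \hat a = \theta_{\mathcal B} \circ \sigma$ and $\varphi_{\mathcal B} \circ \theta_{\mathcal B}(x) \le x$ coordinate-wise, and due to the unique ergodicity of $(\hat\Sigma_{\mathcal B}, \hat a)$, we have $\nu_{\mathcal B} \circ \theta_{\mathcal B}^{-1} = \mu_{\mathcal B}$.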

4.7. Unimodal Restrictions to Critical Omega-Limit Sets

There are many unimodal maps $f$ for which (or rather, many combinatorial conditions implying that) the critical $\omega$-limit set is a minimal Cantor set on which $f$ acts in an interesting way, e.g. (semi-)conjugate to a substitution shift, an adding machine, or a Sturmian shift. In this section, we present some results from the literature in this direction.

A precise condition for $\omega(c)$ to be a minimal Cantor set containing $c$ is due to Alvin [25, Theorem 5.2], and for this we need the following definition.

Definition 4.118. We call the sequence $\{\ell_n, A_n\}_{n \ge 1}$ a uniform scheme if:
(1) $(\ell_n)_{n \ge 1}$ is a strictly increasing sequence of integers with $\ell_1 \ge 2$.
(2) $A_n \subset \{0, 1\}^{\ell_n}$.
(3) For every $n \ge 1$ and $u \in A_{n+1}$, we can write $u = v_1 \cdots v_k$, $v_i \in A_n$ and each $w \in A_n$ is equal to $v_i$ for some $i$.
The sequence $x \in \{0, 1\}^{\mathbb N}$ is generated by $\{\ell_n, A_n\}_{n \ge 1}$ if $x_{i\ell_n + 1} \cdots x_{(i+1)\ell_n} \in A_n$ for all $n, i \in \mathbb N$.

Theorem 4.119. A sequence $\nu$ is the kneading sequence of a unimodal map such that $\omega(c) \ni c$ is a minimal Cantor set if and only if $\nu$ is generated by a uniform scheme $\{\ell_n, A_n\}_{n \ge 1}$ such that the first elements $a_n \in A_n$ satisfy the following:
(i) $a_n$ is a prefix of $a_{n+1}$.
(ii) $\nu = \lim_{n \to \infty} a_n$.
(iii) $\sigma^k(a_n) \preceq_{pl} a_n$ for each $n \in \mathbb N$ and $0 \le k < \ell_n$.
Here $\preceq_{pl}$ is the parity-lexicographical order (see Definition 3.81) and $a \preceq_{pl} b$ also holds if $a$ is a prefix of $b$.

In terms of kneading maps, we have the following straightforward sufficient, but certainly not necessary, condition. Theorem 4.120. Let f be a unimodal map with kneading map Q. If Q(k) → ∞, then ω(c) is a minimal Cantor set and htop (f |ω(c) ) = 0. Proof. First note that c is not (pre)periodic, because (Q(k))k≥1 is unbounded but finite for each k. Therefore c has an infinite orbit and is recurrent. This means that ω(c) = orb(c). Once we have shown minimality, it is clear that ω(c) cannot contain periodic points and thus has to be nowhere dense. In addition, ω(c) contains no isolated points, so it must be a Cantor set. In Example 5.23 we will see that every unimodal restriction to ω(c) with Q(k) → ∞ is conjugate to an enumeration system with a low enumeration scale. Thus minimality follows by Proposition 5.17. Theorem 5.25 shows then also that htop (f |ω(c) ) = 0, but we will give a slightly more direct proof: Let j ∈ N be arbitrary so that Q(k) > Q(j) for all k > j. Since the kneading sequence ν is a concatenation of blocks ν1 · · · νSQ(k) −1 νS Q(k) , k ≥ j, where νn = 1 − νn , the subshift derived from (ω(c), f ) is a subsubshift of the coded shift with code words {ν1 · · · νSQ(k) −1 νS Q(k) : k ≥ j}. By Theorem 8.73, λj := exp(htop (σ|Xj )) sat   −S −S −(S +k) isfies Q(k)≥Q(j) λj Q(k) = 1. But Q(k)≥Q(j) λj Q(k) ≤ k≥0 λj Q(j) 1−S

= λj Q(j) /(λj − 1). Thus λj → 1 as j → ∞ and 0 ≤ htop (f |ω(c) ) ≤ limj log λj = 0 as claimed.  4.7.1. Renormalizable Unimodal Maps and ∗-Products. In this section, we explain renormalization for unimodal maps f and more specifically the quadratic maps fa (x) = ax(1 − x), as this is the setting in which the concept was first made popular. Let 0 and p = 1 − a1 ∈ ( 12 , 1) be the fixed points of the quadratic map fa , a ∈ [2, a∗ ] where a∗ = 2.9196 . . . is the solution of a3 = 2(a2 + a + 1). If J1 = [1 − p, p], then fa (J1 ) and J1 have disjoint interiors, but fa2 (J1 ) ⊂ J1 ; see Figure 4.14. In this case, f 2 : J1 → J1 is the first return map to J1 and again is unimodal, although turned upside down. Definition 4.121. A unimodal map f is renormalizable if there exists an interval J1  c and q1 ≥ 2 such that f q1 (J1 ) ⊂ J1 and J1 , f (J1 ), . . . , f q1 −1 (J1 ) have disjoint interiors. The map f q1 : J1 → J1 is called the renormalization of f . As the renormalization of a unimodal map is again unimodal, f q1 : J1 → J1 may be again renormalizable; i.e. there is J2 ⊂ J1 such that J2 and f j (J2 ) have disjoint interiors for 0 < j < q2 but f q2 (J2 ) ⊂ J2 . Such a map is

4.7. Unimodal Restrictions to Critical Omega-Limit Sets

fa2

205

fa

f 2 (J2 ) J2 Y H H H H YH H H H J1 1−p c

H H Y H H

f 3 (J2 ) f (J2 )   *  *    f (J1 ) *   

p

Figure 4.14. A renormalizable quadratic map and its second iterate (left) and the permutation of intervals if f is twice renormalizable (right).

twice renormalizable. Figure 4.14 shows how these intervals are permuted if q1 = 2, q2 = 4. Similarly, there are maps that are 3, 4, . . . times renormalizable, or even infinitely renormalizable. In this case there is an infinite sequence of nested intervals · · · ⊂ J4 ⊂ J3 ⊂ J2 ⊂ J1 ⊂ [0, 1] and a sequence of periods (qk )k∈N (where qk divides qk+1 ), such that f qk (Jk ) ⊂ Jk and Jk , f (Jk ), . . . , f qk −1 (Jk ) have pairwise disjoint interiors. This is what happens, with qk = 2k , during the first period doubling cascade in the quadratic family. There is an increasing sequence of parameters (αk )k∈N such that Qα becomes k times renormalizable if α ≥ αk . At the limit parameter αfeig = lim ak ≈ 3.569945672 . . . k→∞

the map becomes infinitely renormalizable. This behavior was first observed in the 1970s by Tresser & Coullet [538] and Feigenbaum$^{21}$ [241], and an amazing observation was that the ratio of the distances between successive parameters converges:
\[ \frac{|\alpha_{k+1} - \alpha_k|}{|\alpha_{k+2} - \alpha_{k+1}|} \to \delta = 4.669201609102990\ldots. \tag{4.50} \]
This phenomenon has been a major source of inspiration since the 1970s; see e.g. [112, 140, 386] and the monograph [414, Section VI]. The next proposition gives the effect of having periodic intervals on the kneading map.

21 Mitchell Feigenbaum (1944–2019).
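The first renormalization step of Definition 4.121 can be checked by hand for the quadratic family. The following is a minimal sketch in exact rational arithmetic; the function name `renormalization_intervals`, the sample parameter $a = 5/2$, and the restriction $2 \le a \le 4$ are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction


def f(a, x):
    """Quadratic map f_a(x) = a x (1 - x)."""
    return a * x * (1 - x)


def renormalization_intervals(a):
    """Return J1, f_a(J1), f_a^2(J1) as (left, right) endpoint pairs.

    Assumes 2 <= a <= 4 (an illustrative restriction), so that the
    fixed point p = 1 - 1/a lies in [1/2, 1).
    """
    a = Fraction(a)
    c = Fraction(1, 2)
    p = 1 - 1 / a
    J1 = (1 - p, p)
    # J1 is symmetric around c and f_a(1 - p) = f_a(p) = p,
    # hence f_a(J1) = [p, f_a(c)].
    fJ1 = (p, f(a, c))
    # f_a(J1) lies to the right of c, where f_a is decreasing,
    # so the endpoints swap under the next iterate.
    f2J1 = (f(a, fJ1[1]), f(a, fJ1[0]))
    return J1, fJ1, f2J1


J1, fJ1, f2J1 = renormalization_intervals(Fraction(5, 2))
print("J1      =", J1)
print("f(J1)   =", fJ1)
print("f^2(J1) =", f2J1)
print("disjoint interiors:", fJ1[0] >= J1[1])
print("f^2(J1) inside J1: ", J1[0] <= f2J1[0] and f2J1[1] <= J1[1])
```

For this sample parameter the intervals $J_1$ and $f_a(J_1)$ share only the endpoint $p$, and $f_a^2(J_1)$ returns into $J_1$, as the definition requires with $q_1 = 2$.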


Proposition 4.122. Let $f$ be a unimodal map with kneading map $Q$ and cutting times $(S_k)_{k\ge0}$.
(1) Suppose that $f$ has no attracting periodic point. Then $Q(k+1) \le k$ for all $k$ and $f$ has an $n$-periodic interval $J \ni c$ for some $n \ge 2$ if and only if $n = S_k$ for some $k \ge 1$ and $Q(k+j) \ge k$ for all $j \ge 1$.
(2) If a quadratic map $f = Q_a$ has an attracting $n$-periodic point $p$, then $n$ is a cutting time if $p$ is orientation reversing and $n$ is a co-cutting time if $p$ is orientation preserving.

Proof. Recall the closest precritical points $\zeta_k, \hat\zeta_k$ from the proof of Theorem 3.90 and (3.25).

(1) If $Q(k+1) > k$, then $f^{S_k}(c) \in [\zeta_k, \hat\zeta_k]$, so $f^{S_k}$ maps one of the intervals $[\zeta_k, c]$ or $[c, \hat\zeta_k]$ monotonically into itself (in an orientation-reversing way). This interval contains an attracting $S_k$-periodic or $2S_k$-periodic point.

If $J \ni c$ is $n$-periodic, then $f^n(J^\circ) \ni c$ because otherwise $f^n$ maps $[p, c]$ into itself, producing an attracting $n$-periodic point. Therefore $J \ni \zeta_k, \hat\zeta_k$ for some minimal $k$, and $n = S_k$. Additionally, $f^j(J) \ni c$ only if $j$ is a multiple of $n$. In particular, $S_{k+j}$ are all multiples of $n$, and thus $Q(k+j) \ge k$ for all $j \ge 1$.

Conversely, if $Q(k+j) \ge Q(k+1) = k$ for all $j \ge 1$, then $f^{S_k}(c) \in (\zeta_{k-1}, \hat\zeta_{k-1})$, and $f^{S_k}$ maps one of the intervals $[\zeta_{k-1}, \zeta_k]$ or $[\hat\zeta_k, \hat\zeta_{k-1}]$ in an orientation-reversing way onto itself, producing an orientation-reversing $S_k$-periodic point $p$. The other interval contains a preperiodic point $\hat p$. If there are more such points, then we can take $p, \hat p$ furthest away from $c$. If $f^{S_k}(c) \notin [p, \hat p]$, then $f^{jS_k}(c) \notin (\zeta_{k-1}, \hat\zeta_{k-1})$ for some $j \ge 1$. Then $Q(k+j) < k$ for this $j$, contrary to our assumption. Therefore $f^{S_k}(c) \in [p, \hat p]$, making $J := [p, \hat p]$ periodic.

(2) Let $p$ be an attracting periodic point, and assume it is the one closest to $c$ on its orbit. We can assume without loss of generality that $p < c$. Since $f$ is quadratic, Singer’s Theorem$^{22}$ [414, Chapter II.6] implies that $c$ is in the immediate domain of $p$, so $f^{kn}(c) \to p$ as $k \to \infty$, and there is no $0 < j < n$ such that $f^j(c) \in [p, c]$. If $p$ reverses orientation, there is an interval $[\zeta, p]$ that is mapped monotonically onto $[p, c]$ by $f^n$. But this means that $\zeta = \zeta_k$ is a closest precritical point, so $n = S_k$. If $p$ preserves orientation, then $f^n([p, c]) \subset [p, c)$, so $n$ is not a cutting time. Take $y > f(c)$ maximal such that $f^{n-1}$ is monotone on $[f(p), y]$. Then $f^{n-1}([f(p), y]) \ni c$, so $n$ is a co-cutting time.

22 It suffices for this that $f$ has negative Schwarzian derivative: $\frac{f'''}{f'} - \frac32\left(\frac{f''}{f'}\right)^2 \le 0$, as quadratic maps do; see [414, Chapter II.6].


Exercise 4.123. Show that if $Q$ is an admissible kneading map such that $k-1 \le Q(j)$ for $j = k, k+1, \ldots, k+r-1$ and $Q(k+r) > k-1$, then the corresponding unimodal map is renormalizable with period $S_k$.

Exercise 4.124. For the Feigenbaum map, we can take the uniform scheme $\{\ell_n, A_n\}$ with $\ell_n = 2^n$ and $A_n = \{\nu_1 \cdots \nu_{2^n-1}\nu_{2^n},\ \nu_1 \cdots \nu_{2^n-1}\nu'_{2^n}\}$. Find a uniform scheme if $f$ is infinitely renormalizable with periods $(q_k)_{k\ge1}$.

The dynamics of $f$ on $\omega(c) = \bigcap_k J_k$ can be represented by an adding machine; see Section 4.5.2. Indeed, let $p_1 = q_1$ and $p_k = q_{k+1}/q_k$, and $X = \{(x_i)_{i\ge1} : 0 \le x_i < p_i\}$ with the “add and carry” operation $a : X \to X$. Then $(X, a)$ and $(\omega(c), f)$ are conjugate. The intervals $\{J_{k,j}\}_{j=0}^{q_k-1}$ in the $k$-th cycle of the renormalization then represent the cylinders $[x_1 \cdots x_k]$, where $j = \sum_{i=1}^k q_i x_i$. However, $f|_{\omega(c)}$ can be conjugate to an adding machine also if $f$ is not infinitely renormalizable. In this case, we speak of a strange adding machine [89].

Renormalization for unimodal maps can also be described symbolically by means of so-called ∗-products; see [195] and [164, page 72]. Suppose that $f$ is a renormalizable unimodal interval map, say it has periodic interval $J \ni 0$ of period $q$ and the itineraries of orbits in $f(J)$ start with the block $\beta = [\beta_1 \cdots \beta_{q-1}\, *]$,$^{23}$ and that the renormalization $f^l|_J$ is conjugate to a unimodal map $\tilde f$ with kneading sequence $\tilde\nu$. Then the kneading sequence $\nu$ of $f$ has the form $\nu = \beta *_{par} \tilde\nu$, where the parity ∗-product $*_{par}$ is defined as
\[ (\beta *_{par} \tilde\nu)_j = \begin{cases} \beta_{j \bmod q} & \text{if } q \text{ does not divide } j, \\ \tilde\nu_{j/q} & \text{if } q \text{ divides } j \text{ and } \#\{k < q : \beta_k = 1\} \text{ is even}, \\ \tilde\nu'_{j/q} & \text{if } q \text{ divides } j \text{ and } \#\{k < q : \beta_k = 1\} \text{ is odd}, \end{cases} \]
where the prime denotes the flipped symbol ($0' = 1$, $1' = 0$). The parity of 1’s in the block $\beta$ determines whether the orientation of the map $f^l|_J$ is reversed w.r.t. the orientation of the original map $f$ or not.

Example 4.125. The kneading sequence of the Feigenbaum map emerges as the infinite ∗-product with $\beta = 1*$. Indeed,
\[ \begin{aligned} \beta *_{par} \nu &= 1\nu'_1\, 1\nu'_2\, 1\nu'_3\, 1\nu'_4\, 1\nu'_5\, 1\nu'_6\, 1\nu'_7 \cdots, \\ \beta *_{par} (\beta *_{par} \nu) &= 101\nu_1\, 101\nu_2\, 101\nu_3\, 101\nu_4 \cdots, \\ \beta *_{par} (\beta *_{par} (\beta *_{par} \nu)) &= 1011101\nu'_1\, 1011101\nu'_2\, 1011101 \cdots, \\ &\ \ \vdots \end{aligned} \]
This follows the pattern of a Toeplitz sequence; see Section 4.5. The limit sequence is also obtained by the period doubling or Feigenbaum substitution $\chi_{feig} : 0 \to 11,\ 1 \to 10$; see Example 1.6. In fact, every ∗-product of a constant type renormalization is generated by a substitution. If the type of renormalization varies, then we need S-adic transformations instead.

23 We don’t specify the final symbol, since it is not the same for every point in $f(J)$.
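The parity ∗-product acts on finite prefixes in a simple way, which the following sketch illustrates. It assumes that sequences are stored as strings over {0, 1}, that the block β is passed without its final symbol ∗, and that a prime means interchanging 0 and 1; the helper names `star_par` and `feigenbaum_substitution` are hypothetical.

```python
def flip(symbol):
    """Interchange the symbols 0 and 1."""
    return "1" if symbol == "0" else "0"


def star_par(beta, nu):
    """Parity *-product of beta = beta_1...beta_{q-1} (final * omitted)
    with a finite prefix nu of a kneading sequence.

    Positions not divisible by q copy beta; position q*m carries nu_m,
    flipped when beta contains an odd number of 1's.
    """
    odd = beta.count("1") % 2 == 1
    return "".join(beta + (flip(s) if odd else s) for s in nu)


def feigenbaum_substitution(word):
    """Period doubling substitution 0 -> 11, 1 -> 10."""
    rule = {"0": "11", "1": "10"}
    return "".join(rule[s] for s in word)


prefix, word = "1", "1"
for _ in range(5):
    prefix = star_par("1", prefix)        # beta = 1*
    word = feigenbaum_substitution(word)
print(prefix)
print(prefix == word)
```

With β = 1∗ the product and the substitution act identically on finite words, so both loops produce the same 32-symbol prefix. Changing the seed from "1" to "0" only alters the last of these 32 symbols.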


4.7.2. Homeomorphic Restrictions. This section discusses whether and when a unimodal map $f : [0,1] \to [0,1]$, when restricted to $\omega(c)$, is a homeomorphism and which kneading sequences correspond to this. We exclude the trivial case of $\omega(c)$ being a single periodic orbit.

Lemma 4.126. If $X \subset [0,1]$ is an infinite closed and $f$-invariant set and $f|_X$ is a homeomorphism, then $c \in X$. In addition, if $(X, f)$ is transitive, then it is minimal.

Proof. By (semi)conjugating to a tent map, we can assume that $f$ has constant slope $\pm\lambda$; see [420] and [116, Section 9.5]. If $h_{top}(f) = \log\lambda > 0$ and $X \not\ni c$, then $f$ is locally expanding, and thus $X$ is finite by Proposition 1.42. If $h_{top}(f) = 0$, then every orbit of $f$ is asymptotic either to a periodic orbit or to a Cantor set $C$ of Feigenbaum type and hence $c$ is an accumulation point. Therefore $X \cap C = \emptyset$ and $X$ is finite also in this case.

If $(X, f)$ is not minimal, say $Y \subset X$ is a closed subset of $X$ not containing $c$, then $Y$ is finite by Proposition 1.42 and in particular contains a periodic orbit $P$. By transitivity, we can find $x \in X$ with a dense orbit. As before we can assume that $f|_Y$ is locally expanding. Let $(n_k)_{k\ge1}$ be so that the $f^{n_k}(x)$ are the successive closest approaches of $x$ to $P$. Let $p \in P$ be an accumulation point of $(f^{n_k}(x))_{k\ge1}$. Then $f^{n_k-1}(x) \to q$ where $f(q) = p$ but $q \in X \setminus P$. This contradicts that $f|_X$ is one-to-one.

Thus if $\omega(c)$ is a Cantor set on which $f$ is one-to-one, then $c \in \omega(c)$. When we view the problem symbolically, i.e. ask ourselves which kneading sequences $\nu$ correspond to such $\omega$-limit sets, we get the following additional property; see [26, Theorem 3.5].

Lemma 4.127. The sequence $\nu$ is the kneading sequence of a unimodal map for which $\omega(c)$ is a minimal set on which $f$ is homeomorphic if and only if $\nu$ is the only infinite sequence $u \in X_\nu := \overline{\{\sigma^k(\nu) : k \ge 0\}}$ such that both $0u$ and $1u \in X_\nu$.

This means that $c$ has to be accumulated by $\omega(c)$ from both the right and the left.

Proof. First assume that $f : \omega(c) \to \omega(c)$ is one-to-one; then $i(x) \in X_\nu$ cannot be preceded by both 0 and 1, because $x$ has only one preimage. The only exception is $f(c)$, but $i(f(c)) = \nu$. To show that both $0\nu$ and $1\nu \in X_\nu$, note that for every $n$, there must be a left-special word in $L_n(X_\nu)$, because otherwise $X_\nu$ is a single periodic orbit by Proposition 1.12. But then, by taking a convergent sequence of these left-special words, there must be an infinite left-special word. We have already seen that $\nu$ is the only candidate.


Conversely, if $\nu$ is the only infinite left-special word in $X_\nu$, then there is no $x \in \omega(c)$ with two preimages. This also holds if $f^n(x) = c$ and $u = \lim_{y \nearrow x} i(y)$ or $u = \lim_{y \searrow x} i(y)$.

This proof shows that the subshift $X_\nu$ has only one infinite left-special sequence, like Sturmian sequences. However, there may be left-special words of finite length in $L(X_\nu)$ that cannot be extended indefinitely to the right and remain left-special. This happens for example in the Feigenbaum case
\[ \nu = \rho_{feig} = 1011\ 1010\ 10111011\ 1011101010111010 \cdots \]
where 11 is left-special and also right-special, but both 110 and 111 are no longer left-special. This corresponds to property (iii) in Theorem 4.128 below.

Excluding the periodic and infinitely renormalizable cases, the first examples of kneading sequences for maps $f$ so that $f|_{\omega(c)}$ are homeomorphisms were described in [119], in terms of the kneading map. The construction is flexible enough to provide, say for the tent family $T_\lambda$, uncountably many parameters within every open subinterval of the interval $[\sqrt2, 2]$ of slopes. Further examples emerged with the discovery of strange adding machines in [23, 89]. The following characterization comes from [26, Theorem 4.3].

Theorem 4.128. A sequence $\nu \in \{0,1\}^{\mathbb N}$ is the kneading sequence of a unimodal map with an infinite minimal set $\omega(c)$ on which $f$ is a homeomorphism if and only if every uniform scheme $\{\ell_n, A_n\}_{n\ge1}$ (see Definition 4.118) that generates $\nu$ satisfies the following:
(i) For the first elements $a_n$ of $A_n$, we have $\nu = \lim_n a_n$ and $a_n$ is a prefix of $a_{n+1}$.
(ii) For sufficiently large $N$ there exists $1 \le m < N$ (with $m \to \infty$ as $N \to \infty$), such that every $u \in A_N$ occurring in the decomposition of $\nu$ into words of $A_N$ is preceded by $a_m$ or $a'_m$ (i.e. $a_m$ with the last letter changed).
(iii) If $a_1 \in A_N$ is different from $a_N$, then there exists an extension $a_1 a_2 \cdots a_k \in A_N^k$ for some $k \in \mathbb N$, such that every occurrence of $a_1 a_2 \cdots a_k$ in $\nu$ is preceded by $a_N$.

4.7.3. Sturmian Restrictions to the Critical Omega-Limit Set.
There are multiple ways of choosing the kneading map $Q$ so that $(\omega(c), f)$ is Sturmian. The simplest way is by means of the Ostrowski numeration; see Example 5.22. Indeed, let $\theta \in [0,1]$ be some irrational number and let $a_n$ be the entries and $p_n/q_n$ the convergents of its continued fraction expansion. Thus $q_{-1} = 0$, $q_0 = 1$, and $q_n = a_n q_{n-1} + q_{n-2}$ for $n \ge 1$; see Section 8.2. Take $k_n = \sum_{j=0}^n a_j$ and then cutting times as follows:
\[ \begin{cases} S_k = k + 1 & \text{for } 0 \le k \le a_1, \\ S_{k_n} = q_n & \text{for } n \ge 1, \\ S_{k_n + a} = a q_n + q_{n-1} & \text{for } 1 \le a \le a_n,\ n \ge 1. \end{cases} \]
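The recursion for the denominators $q_n$ of the convergents is easy to tabulate. The sketch below is only an illustration of $q_{-1} = 0$, $q_0 = 1$, $q_n = a_n q_{n-1} + q_{n-2}$; the function name `convergent_denominators` and the convention that the input list starts with $a_1$ are assumptions of this example.

```python
def convergent_denominators(partial_quotients):
    """Return [q_0, q_1, ..., q_n] for partial quotients [a_1, ..., a_n],
    using q_{-1} = 0, q_0 = 1 and q_n = a_n * q_{n-1} + q_{n-2}."""
    q_prev, q = 0, 1
    denominators = [q]
    for a in partial_quotients:
        q_prev, q = q, a * q + q_prev
        denominators.append(q)
    return denominators


# a_n = 1 for all n gives Fibonacci numbers, a_n = 2 gives Pell numbers.
print(convergent_denominators([1] * 6))
print(convergent_denominators([2] * 6))
```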

It is clear that $Q(k) \to \infty$ in this case, and the $S_k$’s interpolate between the $q_n$’s; cf. [125]. However, $f : \omega(c) \to \omega(c)$ is in general not invertible, since $c$ itself and/or other points in the backward orbit of $c$ have two preimages in $\omega(c)$; see [119].

Yet also if $Q(k)$ is bounded (and even if $Q(k) \le 1$), there are examples where $(\omega(c), f)$ is Sturmian; see [117, Chapter III, 3.6]. Let $\varphi : [0,1] \to [0,1]$ be a Lorenz-like map, i.e. an interval map that is continuous and increasing both on $[0, c)$ and $(c, 1]$ with $\lim_{x \nearrow c} \varphi(x) = 1$ and $\lim_{x \searrow c} \varphi(x) = 0$. Thus it has a discontinuity at the critical point $c$. In addition, we will assume that $\varphi$ is symmetric: $\varphi(1-x) = 1 - \varphi(x)$ (i.e. $\varphi(\hat x) = \widehat{\varphi(x)}$ with the notation $\hat x = 1 - x$) for all $x \in [0,1] \setminus \{c\}$, so the critical point $c = \frac12$. Every symmetric unimodal map $f : [0,1] \to [0,1]$ with $f(c) = 1$ can be made into a symmetric Lorenz-like map by flipping the right half of the graph vertically around $c = \frac12$ (see [28, 117]):
\[ \varphi(x) = \begin{cases} f(x) & \text{if } x \in [0, c), \\ 1 - f(x) & \text{if } x \in (c, 1]. \end{cases} \]
Then $\varphi$ is semi-conjugate to $f$: $f \circ \varphi = f \circ f$; see Figure 4.15. In fact
\[ \varphi^n(x) = \begin{cases} f^n(x) & \text{if } f^n \text{ is increasing at } x, \\ 1 - f^n(x) & \text{if } f^n \text{ is decreasing at } x. \end{cases} \]
We will use the itinerary map $i$ for $\varphi$ with codes $+1$ for $[0, c)$ and $-1$ for $(c, 1]$.
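The passage from a symmetric unimodal map to its Lorenz-like companion can be tested numerically. The sketch below uses the full tent map as a stand-in for $f$ (an assumption of this example; any symmetric unimodal map with $f(c) = 1$ would do) and checks, in exact arithmetic, both the relation $f \circ \varphi = f \circ f$ and the formula for $\varphi^n$ along orbits that avoid $c$. The name `check_orbit` is hypothetical.

```python
from fractions import Fraction

C = Fraction(1, 2)  # critical point


def f(x):
    """Full tent map: symmetric, unimodal, f(c) = 1."""
    return 1 - abs(2 * x - 1)


def phi(x):
    """Lorenz-like map obtained by flipping the right branch of f."""
    if x == C:
        raise ValueError("phi is not defined at the critical point")
    return f(x) if x < C else 1 - f(x)


def check_orbit(x, steps=20):
    """Check f(phi(y)) = f(f(y)) and the formula for phi^n along the orbit of x."""
    fx, px, theta = x, x, +1  # f^n(x), phi^n(x), orientation of f^n at x
    for _ in range(steps):
        if f(phi(px)) != f(f(px)):
            return False
        expected = fx if theta == +1 else 1 - fx
        if px != expected:
            return False
        if fx > C:
            theta = -theta  # the right branch of f reverses orientation
        fx, px = f(fx), phi(px)
    return True


# Points k/97 never hit c = 1/2 under f or phi, so all tests are well defined.
print(all(check_orbit(Fraction(k, 97)) for k in range(1, 97)))
```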


Figure 4.15. A symmetric Lorenz-like map obtained from a unimodal map.


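Recall the definition of $\theta(x)$ of Exercise 3.80 as an alternative way to code itineraries of unimodal maps $f : [0,1] \to [0,1]$. Namely, $\theta_0(x) = +1$ and
\[ \theta_n(x) = \prod_{j=0}^{n-1} (-1)^{i_j(x)} = \begin{cases} +1 & \text{if } f^n \text{ is increasing at } x, \\ -1 & \text{if } f^n \text{ is decreasing at } x, \end{cases} \tag{4.51} \]
for $n \ge 1$. It follows that $\theta(f(x)) = \sigma(\theta(x))$ if $i_0(x) = 0$ and $\theta(f(x)) = -\sigma(\theta(x))$ if $i_0(x) = 1$. For the itinerary $i^\varphi$ of $x \in I \setminus \bigcup_{j=0}^n \varphi^{-j}(c)$ under the function $\varphi$ this means that
\[ i^\varphi_n(x) = 0 \iff \begin{cases} i_n(x) = 0 \text{ and } \theta_n(x) = +1 \\ \quad\text{or} \\ i_n(x) = 1 \text{ and } \theta_n(x) = -1 \end{cases} \iff \theta_{n+1}(x) = +1 \]
and
\[ i^\varphi_n(x) = 1 \iff \begin{cases} i_n(x) = 1 \text{ and } \theta_n(x) = +1 \\ \quad\text{or} \\ i_n(x) = 0 \text{ and } \theta_n(x) = -1 \end{cases} \iff \theta_{n+1}(x) = -1. \]
In other words, $i^\varphi_n = (1 - \theta_{n+1}(x))/2$. This gives $i^\varphi \circ \varphi(x) = \sigma \circ i^\varphi(x)$. Also, $\nu^\varphi = \lim_{x \nearrow c} i^\varphi(x)$ with the first symbol neglected. Define $\rho^\varphi(n) = \min\{k > n : \nu^\varphi_k = \nu^\varphi_{k-n}\}$; then we recover the cutting times as $S_0 = 1$, $S_{k+1} = \rho^\varphi(S_k)$. (The co-cutting times can be recovered as $\hat S_0 = \kappa = \min\{k \ge 1 : \nu^\varphi_k = 0\}$ and $\hat S_{i+1} = \min\{k > \hat S_i : \nu^\varphi_k = \nu^\varphi_{k - \hat S_i}\}$.) See the example in the proof of Proposition 4.129 below.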
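The translation between the unimodal kneading sequence, the signs θ, and the Lorenz coding can be carried out mechanically on finite prefixes. The sketch below assumes the indexing convention $\nu^\varphi_k = (1 - \theta_k)/2$ with $\theta_k = \prod_{j \le k} (-1)^{\nu_j}$ for a sequence $\nu = \nu_1\nu_2\cdots$ (this is how the relation above reads for the critical value), and it stops as soon as the finite prefix no longer determines the next cutting time. All function names are hypothetical.

```python
def theta_from_kneading(nu):
    """Return [theta_0, theta_1, ...] with theta_0 = +1 and
    theta_k = theta_{k-1} * (-1)^{nu_k} for a 0/1 string nu = nu_1 nu_2 ..."""
    theta = [1]
    for symbol in nu:
        theta.append(-theta[-1] if symbol == "1" else theta[-1])
    return theta


def lorenz_kneading(nu):
    """Lorenz coding nu^phi_k = (1 - theta_k) / 2 for k >= 1."""
    return "".join("1" if t == -1 else "0" for t in theta_from_kneading(nu)[1:])


def cutting_times(nu_phi):
    """S_0 = 1 and S_{k+1} = min{j > S_k : nu^phi_j == nu^phi_{j - S_k}},
    computed as far as the finite prefix allows (indices are 1-based)."""
    length = len(nu_phi)
    times = [1]
    while True:
        n = times[-1]
        nxt = next((j for j in range(n + 1, length + 1)
                    if nu_phi[j - 1] == nu_phi[j - n - 1]), None)
        if nxt is None:
            return times
        times.append(nxt)


nu = "1001101101110"
print(theta_from_kneading(nu))
print(lorenz_kneading(nu))
print(cutting_times(lorenz_kneading(nu)))
```

Only a finite prefix is used, so the list of cutting times is necessarily truncated; a longer prefix of ν would extend it.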

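To each $x \in I$ we can assign a rotation number by first assigning a lift $\Phi : \mathbb R \to \mathbb R$ to the Lorenz map $\varphi$:
\[ \Phi(x) = \begin{cases} \varphi(x) & \text{if } x \in [0, c],\ \varphi(c) = 1, \\ \varphi(x) + 1 & \text{if } x \in (c, 1), \\ \Phi(x - n) + n & \text{if } x \in [n, n+1). \end{cases} \]
Then $\Phi(x) \bmod 1 = \varphi(x \bmod 1)$ and the rotation number is defined as
\[ \alpha(x) = \limsup_{n\to\infty} \frac{\Phi^n(x) - x}{n}. \tag{4.52} \]
Since $\lfloor\Phi(x)\rfloor = \lfloor x \rfloor$ if and only if $x \bmod 1 \in [0, c)$ and $\lfloor\Phi(x)\rfloor = \lfloor x \rfloor + 1$ otherwise, we obtain
\[ \alpha(x) = \limsup_{n\to\infty} \frac1n \#\{0 \le k < n : i^\varphi_k(x) = 1\} = \limsup_{n\to\infty} \frac1n \#\{1 \le k \le n : \theta_k(x) = -1\}. \tag{4.53} \]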

Next we turn $\varphi$ into a proper circle endomorphism (with unique rotation number independent of $x \in \mathbb S^1$) by setting
\[ \bar\varphi(x) = \begin{cases} \varphi(1) = f_+(1), & x \in [0, a], \text{ where } a < c \text{ is such that } \varphi(a) = \varphi(1), \\ \varphi(x), & \text{otherwise.} \end{cases} \]
Also let $b > c$ be such that $\varphi(b) = a$; see Figure 4.16.

Figure 4.16. A stunted symmetric Lorenz map $\bar\varphi$ as a circle endomorphism.

Proposition 4.129. Assume that $f$ is a unimodal map with cutting times $\{S_k\}_{k\ge0}$. Let $b > c$ be such that $\bar\varphi(b) = a$; see Figure 4.16. Then the rotation number of the corresponding $\bar\varphi$ equals
\[ \alpha = \begin{cases} \frac{k}{S_k} \in [\frac12, 1] \cap \mathbb Q & \text{if } k \text{ is minimal such that } f^{S_k}(c) \in (\hat b, b), \\ \lim_{k\to\infty} \frac{k}{S_k} \in [\frac12, 1] & \text{if no such } k \text{ exists.} \end{cases} \]
In the latter case, the kneading map $Q(k) \le 1$ for all $k \in \mathbb N$, and if $\alpha \notin \mathbb Q$, then $f : \omega(c) \to \omega(c)$ is a minimal homeomorphism.

Proof. Recall that $f(c) = 1$ and assume that there is a minimal integer $n \ge 1$ such that $\varphi^n(1) \in (c, b]$. Then $\bar\varphi^{n+1}(1) \in (0, a]$ and $\bar\varphi^{n+2}(1) = \bar\varphi(1)$ is periodic with period $n + 1$. Recall that $b > c$ is such that $\bar\varphi(b) = a$, so $f(b) = \hat a > c$, and $f^2(b) = f(a) = f^2(c) > c$. Therefore $b \in (\hat\zeta_2, \hat\zeta_1)$ for closest precritical points $\hat\zeta_1 > \hat\zeta_2 > c$, see (3.23), and $\hat b \in (\zeta_1, \zeta_2)$. There are two possibilities:
• $\varphi^n(1) = f^n(1)$. In this case $f^n$ is increasing at 1 and thus $n + 1 = S_k$ is a cutting time.
• $\varphi^n(1) = \widehat{f^n(1)}$. In this case $f^n$ is decreasing at 1 and again $n + 1 = S_k$ is a cutting time.


By minimality of $k$, $f^{S_j}(c) \notin [\hat b, b] \setminus \{c\}$ for all $j < k$, and hence the kneading sequence $\nu$ of $f$ consists of blocks 0 or 11. For example,

ν   = 1. 0. 0. 1 1. 0. 1 1. 0. 1 1. 1 0 ··· ,
θ   = +1 −1 −1 −1 +1 −1 −1 +1 −1 −1 +1 −1 +1 +1 ··· ,
ν^ϕ = 1. 1. 1. 0 1. 1. 0 1. 1. 0 1. 0 0 ···

where dots indicate cutting times and the bold symbol the position $S_k$. Since $n + 1$ is the period of $\bar\varphi(1)$, this shows that $\#\{1 \le j \le S_k : \theta_j = -1\} = k$, and in view of (4.53) we have $\alpha = k/S_k$.

If there is no such minimal $n$, i.e. $\varphi^n(1) \notin (\hat b, b)$ for all $n \ge 1$, then $f^n(1) \notin (\hat b, b)$ for all $n \ge 1$ (and in particular $Q(j) \le 1$ for all $j \ge 1$). A counting argument similar to the above shows that $\alpha = \limsup_k k/S_k = \lim_k k/S_k$.

It is possible that $\alpha$ is rational, e.g. for the logistic map $f_a(x) = 1 - a(x - \frac12)^2$ with $a = 3.5097$. In this case, $\nu = (101)^\infty$ and $\bar\varphi^i(1)$ converges to an attracting orbit of period 3. Also for the tent map $T_s(x) = 1 - s|x - \frac12|$ with $s = \frac12(1 + \sqrt5)$, the critical orbit $\{\frac12, 1, \frac34 - \frac14\sqrt5\}$ has period three and avoids $[0, a]$.

If $\alpha \notin \mathbb Q$, then $\omega_{\bar\varphi}(c)$ is a Cantor set, disjoint from $[0, a]$ and minimal w.r.t. the action of $\bar\varphi$. Under the semi-conjugacy $f$ between $f$ and $\varphi$ (indeed $f \circ f = f \circ \varphi$), this projects to a minimal map $f : \omega_f(c) \to \omega_f(c)$. We will show that $f : \omega_\varphi(c) \to \omega_f(c)$ is a homeomorphism, from which it follows that $f : \omega_f(c) \to \omega_f(c)$ is also a homeomorphism. Assume by contradiction that x) = y ∈ ωf (c). Then, x m : bn = bn−m } and find that $\rho_b^a(q_i) = q_i + a q_{i+1}$ for $1 \le a \le a_{i+1}$, and in particular, $\rho_b^{a_{i+1}}(q_i) = q_{i+2}$. For example, if $a_i \equiv 2$, so the $q_i$’s are the Pell numbers $2, 5, 12, 29, 70, 169, \ldots$, then we obtain

ν = 10.1 1.1 1.0.11.11.0.11.11.1 1.0.11.11.0.11.11.1 1.0 ···

where dots indicate cutting times and primes co-cutting times. The bold symbols indicate the positions $q_i$. In fact, for each $i$
\[ \nu_{q_{i+1}-q_i+1} \cdots \nu_{q_{i+1}-1}\nu_{q_{i+1}} = \nu_1 \cdots \nu_{q_i-1}\nu_{q_i} \quad\text{or}\quad \nu_1 \cdots \nu_{q_i-1}\nu'_{q_i}. \]
Therefore $c$ has two limit itineraries $\lim_{x \nearrow c} i(x) = 0\nu$ and $\lim_{x \searrow c} i(x) = 1\nu$, but $c$ has only one preimage in $\omega(c)$.

Outside maps: Boyland, de Carvalho & Hall in [105, Section 3] present a different way of creating a circle endomorphism from a unimodal map.
They call this the outside map B and use it to study the inverse limit space of the unimodal map as attractors of sphere homeomorphisms. Starting from a unimodal map f : I → I such that the second branch is surjective (i.e.


Figure 4.17. Constructing the outside map and the stunted Lorenz map for a tent map Ts .

$f([c, 1]) = I$), they
• double the interval to a circle $\mathbb R/2\mathbb Z = [0, 2]/_{0\sim2}$;
• let $B$ map the second branch onto $[1, 2]$ by flipping this branch;
• extend the definition of $f$ on $[1, 2-d]$ for the unique point $d \in (c, 1]$ for which $f(d) = f(0)$ to cover the interval $[0, f(0)]$;
• map the remaining interval $[2-d, 2]$ to the constant $f(0)$.
That is, as shown in Figure 4.17,
\[ B(x) = \begin{cases} f(x) & \text{if } x \in [0, c), \\ 2 - f(x) & \text{if } x \in [c, 1), \\ f(2 - x) & \text{if } x \in [1, 2-d), \\ f(0) & \text{if } x \in [2-d, 2). \end{cases} \]

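Let us carry this out for the family of cores of tent maps $T_s : I \to I$,
\[ T_s(x) = \begin{cases} sx + 2 - s, & x \in [0, \frac{s-1}{s}], \\ -sx + s, & x \in [\frac{s-1}{s}, 1], \end{cases} \]
for all $s \in (1, 2]$. Then the map
\[ \bar\varphi(x) = \begin{cases} \frac{s}{2} & \text{if } 0 \le x \le a = \frac{s-1}{s}, \\ s(x - \frac12) \bmod 1 & \text{if } a = \frac{s-1}{s} \le x < 1, \end{cases} \qquad \text{on } \mathbb R/\mathbb Z \tag{4.55} \]
and the outside map
\[ B(x) = \begin{cases} s(x - 1) + 2 \bmod 2 & \text{if } 0 \le x < \frac2s, \\ 2 - s & \text{if } \frac2s \le x < 2, \end{cases} \qquad \text{on } \mathbb R/2\mathbb Z \]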

are conjugate with conjugacy G : R/Z → R/2Z, G(x) = 2(1 − x) mod 2; i.e. G ◦ ϕ¯ = B ◦ G. But the conjugacy reverses orientation, so the rotation numbers are each other’s opposite, α for ϕ¯ versus 1 − α for B.
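The conjugacy relation $G \circ \bar\varphi = B \circ G$ can be spot-checked in exact arithmetic. The sketch below assumes the piecewise formulas for $\bar\varphi$ and $B$ of the tent family as read from (4.55) and the line after it, with the sample slope $s = 3/2$; the function names are hypothetical.

```python
from fractions import Fraction


def phi_bar(x, s):
    """Stunted Lorenz map on R/Z for the tent map of slope s, 1 < s < 2."""
    a = (s - 1) / s
    if x <= a:
        return s / 2
    return (s * (x - Fraction(1, 2))) % 1


def outside_map(x, s):
    """Outside map B on R/2Z for the tent map of slope s."""
    if x < 2 / s:
        return (s * (x - 1) + 2) % 2
    return 2 - s


def G(x):
    """Orientation-reversing conjugacy R/Z -> R/2Z."""
    return (2 * (1 - x)) % 2


s = Fraction(3, 2)
points = [Fraction(k, 64) for k in range(64)] + [(s - 1) / s]
print(all(G(phi_bar(x, s)) == outside_map(G(x), s) for x in points))
```

Rational sample points are used so that the comparison is exact; the point $a = (s-1)/s$, where the two branches of $\bar\varphi$ meet, is included explicitly.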

Chapter 5

Further Minimal Cantor Systems

In this chapter we present three main types of dynamical systems that, although not subshifts themselves, are popular tools to describe (minimal) continuous maps on Cantor sets. Cutting and stacking goes back to von Neumann and Kakutani in the early 1960s. These systems were originally used to create examples to test specific ergodic properties. Enumeration systems are a generalization of and a more number-theoretic approach to both odometers and Ostrowski numeration systems. Bratteli-Vershik systems came in the 1980s and seem to have become the most frequently used tool to describe Cantor systems. We explain how to represent some of the subshifts from earlier chapters in terms of these tools.

5.1. Kakutani-Rokhlin Partitions

First return maps (sometimes called induced maps) form an important tool for studying dynamical systems. They are defined by taking a subset $Y \subset X$ and setting $T_Y(y) = T^r(y)$ for the return time $r = r(y) := \min\{i > 0 : T^i(y) \in Y\}$. The exhaustion of the space (if $(X, T)$ is minimal) as
\[ X = \bigcup_{i\ge0} T^i(\{y \in Y : i < r(y)\}) \tag{5.1} \]
and the Rokhlin Lemma [479, Theorem 3.10] in the measure-preserving setting are classical techniques associated with first return maps.

For continuous minimal (or at least aperiodic) transformations of the Cantor set, this led to a generalization of (5.1), called the Kakutani-Rokhlin partition. The seminal paper is by Herman et al. [310], who coined the name.

Definition 5.1. Let $(X, T)$ be a continuous dynamical system on a Cantor set. A Kakutani-Rokhlin (KR) partition is a partition
\[ P = \{T^j(B_i)\}_{i=1,\,j=0}^{N,\,h_i-1} \]
of $X$ into clopen sets that are pairwise disjoint and together cover $X$. We call $B = \bigcup_{i=1}^N B_i$ the base of the KR-partition and the integers $h_i$ the heights. Also we assume that $T^{h_i}(B_i) \subset B$. (If $T$ is invertible, this is automatic.)

Usually we need a sequence $(P_n)_{n\ge0}$ of KR-partitions, with bases $B(n)$ and height vectors $(h_i(n))_{i=1}^{N_n}$, having the following properties:
(KR1) The sequence of bases is nested: $B(n+1) \subset B(n)$.
(KR2) $P_{n+1} \succ P_n$; that is, $P_{n+1}$ refines $P_n$.
(KR3) $\bigcap_n B(n)$ is a single point.
(KR4) $\{A \in P_n : n \in \mathbb N\}$ is a basis of the topology.
The following property (KR5) relies crucially on minimality. Property (KR6) is optional but ensures that there is a unique smallest path in the context of Bratteli-Vershik systems and cutting and stacking systems.
(KR5) For all $n \in \mathbb N$, $i \le N(n)$, and $i' \le N(n-1)$, there is $0 \le j < h_i(n)$ such that $T^j(B_i(n)) \subset B_{i'}(n-1)$.
(KR6) $B(n) \subset B_1(n-1)$ for all $n \in \mathbb N$.

Theorem 5.2. Every continuous minimal Cantor system $(X, T)$ has Kakutani-Rokhlin partitions $P_n$ satisfying (KR1)–(KR6).

This result comes from [310] and was extended from minimal to transitive aperiodic in [78].

Proof. Let $B(1)$ be any clopen subset of $X$. By minimality, the first entry time $r(x) = \min\{i \ge 1 : T^i(x) \in B(1)\}$ is well-defined and finite for every $x \in X$. Take $B_r(1) = \{x \in B(1) : r(x) = r\}$. Then
\[ B_r(1) = \left(T^{-r}(B(1)) \cap B(1)\right) \setminus \bigcup_{j=1}^{r-1} T^{-j}(B(1)). \]


From this it follows that the $B_r(1)$’s are clopen and pairwise disjoint. By compactness of $B(1)$ (or uniform recurrence), $B(1)$ is the union of finitely many such $B_r(1)$’s, say $B(1) = \bigcup_{i=1}^N B_{r_i}(1)$. Then
\[ X = \bigcup_{i=1}^N \bigcup_{j=0}^{r_i-1} T^j(B_{r_i}(1)), \]
and the sets in this union are pairwise disjoint. Hence, we have found our first KR-partition $P_1 = \{T^j(B_{r_i}(1)) : 1 \le i \le N,\ 0 \le j < r_i\}$.

To continue, take a clopen set $B(2)$ inside one of the $B_{r_i}(1)$’s in the previous partition, and repeat the above construction. In this way, we can construct inductively a sequence $(P_n)_{n\in\mathbb N}$, where $P_{n+1}$ refines $P_n$. The heights are $h_i(n) = r_i$ for step $n$ in this construction. Without loss of generality, we can assume that $\operatorname{diam}(B(n)) < 1/n$, so $\bigcap_n B(n)$ is a singleton, so that (KR1)–(KR3) hold. By renumbering the $B_{r_i}(n)$’s, (KR6) holds as well.

To show (KR4), we can view the sets $T^j(B_{r_i}(n))$, $1 \le i \le N_n$, $0 \le j < h_i(n)$, achieved at the previous step as the targets in which to choose the next clopen set $B'$. That is, for fixed $\eta > 0$ and for each pair $(i, j)$, choose as the next clopen set $B' \subset T^j(B_{r_i}(n))$ so that $\operatorname{diam}(B')$ and $\operatorname{diam}(T^j(B_{r_i}(n)) \setminus B') < (1-\eta) \operatorname{diam}(T^j(B_{r_i}(n)))$. After going through all these $1 \le i \le N_n$, $0 \le j < h_i(n)$, we return to taking the next clopen set $B(n+1) \ni x$. The corresponding $P_{n+1} = \{T^j(B_{r_i}(n+1)) : 1 \le i \le N,\ 0 \le j < r_i\}$ refines all the intermediate KR-partitions, and therefore $\{P_k\}_{k\ge1}$ generates the topology of $X$.

Finally, (KR5) can be achieved using minimality and taking a subsequence of $(P_n)_{n\in\mathbb N}$ if necessary.

Remark 5.3. If $T : X \to X$ is equicontinuous, then the above construction can be refined so as to obtain that $N_n \equiv 1$ and $T^{h_1(n)}(B(n)) = B(n)$.

In the next sections, we will discuss (minimal) Cantor systems represented as cutting and stacking systems, as enumeration systems, and as Bratteli-Vershik systems. In all of these representations, as well as for substitution shifts, graph covers (see Remark 5.4), and Toeplitz shifts, nested sequences of KR-partitions appear naturally. The terminology of base and height can be used for all of them; see Remarks 5.8, 5.34, and 5.56.
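The first step of the construction in the proof of Theorem 5.2 can be imitated on a finite toy system. The sketch below is not a Cantor system: it assumes a cyclic permutation of a finite set as a stand-in for a minimal map, and the name `kr_partition` is hypothetical. It splits the base by first return time and builds the tower over each piece.

```python
def kr_partition(T, base):
    """Split `base` by first return time r and return {r: [B_r, T(B_r), ..., T^{r-1}(B_r)]}.

    Assumes every point of `base` returns to `base` under T, which holds
    for a cyclic permutation of a finite set.
    """
    base = set(base)
    pieces = {}
    for x in base:
        y, r = T(x), 1
        while y not in base:
            y, r = T(y), r + 1
        pieces.setdefault(r, set()).add(x)
    towers = {}
    for r, B_r in pieces.items():
        levels = [set(B_r)]
        for _ in range(r - 1):
            levels.append({T(z) for z in levels[-1]})
        towers[r] = levels
    return towers


# Toy model: rotation by 5 on Z/12 with base {0, 1, 2}.
towers = kr_partition(lambda x: (x + 5) % 12, {0, 1, 2})
for r in sorted(towers):
    print(r, towers[r])
```

In this toy example the base splits into a piece of return time 2 and a piece of return time 5, and the levels of the two towers together cover all twelve points exactly once.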
For substitution shifts associated to a primitive (or aperiodic; see [78]) substitution $\chi : A \to A^*$, the base elements $B_a(n)$, $a \in A$, are the cylinder sets associated to the words $\chi^n(a)$, $a \in A$, and the remaining partition elements are the shifts of these. That this constitutes a partition relies on the recognizability of the substitution shift [78]. More generally, for linearly recurrent shifts (with constant $L$), one can also use the $n$-th step return words as bases of the KR-partitions. In this case, $N_n = \#B(n) \le L$ and $h_i(n) \le L h_{i'}(n-1)$ for all $1 \le i, i' \le N_n$.

Remark 5.4. The transition from a nested sequence of Kakutani-Rokhlin partitions to a graph cover (see Section 4.3.4) is fairly direct: The vertices of the $n$-th graph $\Gamma_n$ are the elements of $P_n$, and there are arrows $T^j(B_i(n)) \to T^{j+1}(B_i(n))$ if $0 \le j < h_i(n) - 1$ and also $T^{h_i(n)-1}(B_i(n)) \to B_{i'}(n)$ if $T^{h_i(n)}(B_i(n)) \cap B_{i'}(n) \ne \emptyset$. As bonding maps $\pi_n : \Gamma_n \to \Gamma_{n-1}$ we take the inclusion: $\pi_n(A) = A'$ for vertices $A$ of $\Gamma_n$ and $A'$ of $\Gamma_{n-1}$ if $A \subset A'$. Then the inverse limit space
\[ \Gamma = \varprojlim (\Gamma_n, \pi_n) = \{(\gamma_n)_{n\ge0} : \pi_{n+1}(\gamma_{n+1}) = \gamma_n \in \Gamma_n \text{ for all } n \ge 0\} \]
is the graph cover and (KR6) provides the positive directional property. The dynamics $f : \Gamma \to \Gamma$ is by following the arrows as in equation (4.30).

5.2. Cutting and Stacking

The purpose of cutting and stacking is to create invertible maps of the interval that preserve Lebesgue measure and have further desired properties such as “unique ergodicity”, “not weak mixing”, or to the contrary “weak mixing but not strong mixing”. The area was initiated by famous examples due to Kakutani and Chacon, and we first give Kakutani’s example.

Example 5.5 (Kakutani). Start with the half-open interval $[0, 1)$ as first stack. Cut it in half and put the right half on top of the left half. Repeat


Figure 5.1. The von Neumann-Kakutani map by cutting and stacking.


this procedure. The limit map $T : [0, 1) \to [0, 1)$ is called the von Neumann-Kakutani map and the resulting formula is
\[ T(x) = \begin{cases} x + \frac12 & \text{if } x \in [0, \frac12), \\ x - \frac14 & \text{if } x \in [\frac12, \frac34), \\ x - \frac34 + \frac18 & \text{if } x \in [\frac34, \frac78), \\ \quad\vdots & \quad\vdots \\ x - (1 - \frac{1}{2^n}) + \frac{1}{2^{n+1}} & \text{if } x \in [1 - \frac{1}{2^n}, 1 - \frac{1}{2^{n+1}}),\ n \ge 0; \end{cases} \]
see Figure 5.1. If $x \in [0, 1)$ is written in base 2, i.e.
\[ x = 0.b_1 b_2 b_3 \ldots, \qquad b_i \in \{0, 1\}, \qquad x = \sum_i b_i 2^{-i}, \]

then $T$ acts as the adding machine or odometer: add 0.1 with carry. That is, if $k = \min\{i \ge 1 : b_i = 0\}$, then $T(0.b_1 b_2 b_3 \ldots) = 0.0 \ldots 01 b_{k+1} b_{k+2} \ldots$. If $k = \infty$, so $x = 0.111111\ldots$, then $T(x) = 0.0000\ldots$. However, $x = 0.11111\cdots = 1$, so we need to extend the domain of $T$.

The general procedure is as follows:
• Cut the unit interval into several intervals, say $\Delta_1, \Delta_2, \ldots$ (these will be called the stacks, and in this initial step they all have height 1) and a remaining interval $S$ (the spacer).
• Cut each stack into slices (a fixed finite number for each $\Delta_i$), and potentially also cut off some intervals from $S$.
• Pile the slices of the stacks and the cut-off pieces of $S$ on top of each other, according to some fixed rule. By choosing the pieces in the previous steps of the correct length, we can ensure that all intervals in each separate stack have the same length, so they can be neatly aligned vertically. Denote the $j$-th level of the $i$-th stack by $\Delta_i^j$.
• Map every point on a level $\Delta_i^j$ of stack $i$ directly to the level $\Delta_i^{j+1}$ above it. Then every point has a well-defined image (except for points at the top levels in a stack and points in the remainder of $S$) and also a well-defined preimage (except for points at a bottom level in a stack and points in the remainder of $S$). Where defined, Lebesgue measure is preserved.
• Repeat the process, now slicing vertically through whole stacks and stacking whole slices on top of other slices, possibly putting some intervals of $S$ in between. Wherever the map was defined at a previous step, the definition remains the same.
• As we repeat this procedure, the measure of points where the map is not defined yet tends to zero. In the limit, assuming that the spacer $S$ will be entirely spent, there will only be a finite set of points $X^{max}$ (not more than the number of stacks) without image and a finite set of points $X^{min}$ (not more than the number of stacks) without preimage.
• The resulting transformation of the interval preserves Lebesgue measure and is invertible up to at most finitely many points.

Remark 5.6. If the stacking is “right stacks on top of left stacks”, as is the case in many examples, then $0 \in X^{min}$ and $1 \in X^{max}$. It seems appealing to map $X^{max}$ to $X^{min}$ (e.g. $T(1) = 0$ in Example 5.5), but it is not always possible to do this continuously or bijectively as Example 5.7 shows.

Example 5.7. Figure 5.2 shows the cutting and stacking procedure for the Fibonacci substitution shift. There are two points 1 and $\gamma^{-1} = \frac12(\sqrt5 - 1)$ that are always at the top of their stacks, but 0 is the only point that is always at the bottom of its stack.

Remark 5.8. The $n$-th step in a cutting and stacking construction relates to the $n$-th Kakutani-Rokhlin partition as follows: The stacks created at the $n$-th step in the cutting and stacking procedure, with heights $h_i(n)$, are $\{T^j(B_i(n))\}_{j=0}^{h_i(n)-1}$, so $B_i(n) = \Delta_i$ at step $n$. The non-used spacer is left out and $T^{h_i(n)}|_{B_i(n)}$ remains undefined until later steps.

Definition 5.9. The rank is $r = \liminf_n \#\{\text{stacks used in the step } n\}$, regardless of whether spacers are used or not. Note that this is the number of stacks after piling the slices of the previous stacks on top of each other, so not the number of slices.

[Figure 5.2 labels: $\gamma = \frac12(1 + \sqrt5)$; 0 is the minimal point; $1 = \bullet$ and $\gamma^{-1} = *$ are maximal points; axis marks at $0, \gamma^{-5}, \gamma^{-4}, \gamma^{-3}, \gamma^{-2}, \gamma^{-1}, 1$.]

Figure 5.2. A cutting and stacking representation of the Fibonacci substitution.


Minimal subshifts can usually be represented as cutting and stacking systems, and the following result$^1$ of Ferenczi [244, Proposition 4] ties the rank of the cutting and stacking system to the word-complexity of the subshift.

Theorem 5.10. Let $(X, \sigma)$ be a minimal subshift with word-complexity satisfying $p_X(n) \le an + b$ for all $n \in \mathbb N$. Then $(X, \sigma)$ can be represented as a cutting and stacking system of rank $r \le 2a$.

Cutting and stacking transformations, considered as single-valued maps on the interval $[0, 1)$, are discontinuous. To be definite, we take the intervals of continuity at each finite step in the construction as closed from the left, open from the right. That is, levels $\Delta_i^j$ in the stacks or pieces of spacer are of the form $[a, b)$, and $T$ is discontinuous at $a$ (unless $a = 0$). In particular, the whole map is defined on $[0, 1)$ but not at 1.

A way of resolving the discontinuities is to double all the discontinuity points $x$ into $x^-$ belonging to the interval at the left of $x$, and $x^+$ belonging to the interval at the right of $x$. That is, $T(x^-) = \lim_{y \nearrow x} T(y)$ and $T(x^+) = \lim_{y \searrow x} T(y)$. We start with domain $[0, 1]$, and in the limit, $T$ is defined on a totally disconnected space $I^*$. Namely, $I^*$ is a Cantor set and $T$ and $T^{-1}$ are defined and continuous everywhere, except (possibly) at the sets $X^{max}$ and $X^{min}$, respectively. We illustrate the issue with two examples.

Example 5.11. I: Take spacer $[r^+, 1] = [\frac12, 1]$ and at every step slice the stack in two halves, stack the right half on the left, and put a single layer of spacer on top; see Figure 5.3 (left). Doubling the discontinuity points will not produce a minimal map, irrespective of how we define $T$ at 1 (note that $T(r^-) = \frac34$). If we set $T(1) = 1$, then $T$ has a fixed point where it is discontinuous. If we set $T(1) = 0$, then $T$ is continuous, but not minimal because $T^n(r^-) \in S$ for all $n \ge 1$.

II: Take spacer $[r^+, 1] = [\frac23, 1]$ and at every step slice the stack in three equal thirds, stack the second slice on the first, then put in a single layer of spacer, and finally stack the third slice on top; see Figure 5.3 (right). This is the Chacon map (or one of the Chacon maps; see Example 1.27), related to the non-primitive Chacon substitution
\[ \chi_{chac} : \begin{cases} 0 \to 0010, \\ 1 \to 1. \end{cases} \]

1 Extending a particular case in [35], see also [226, 392, 393] for later results; also for representations as Bratteli-Vershik systems, see Section 5.4.


5. Further Minimal Cantor Systems


Figure 5.3. Two examples of cutting and stacking with spacer.

If after doubling the discontinuity points we set $T(1) = T(r^-) = 0$, then the result is a continuous, minimal, uniquely ergodic (but weakly mixing) transformation on the Cantor set, but 1 has no preimage.

Proposition 5.12. The $(p_i)$-odometer is conjugate to a cutting and stacking transformation on the space $I^*$, i.e. with discontinuity points doubled. Adding machines have rank 1.

Proof. The von Neumann-Kakutani map of Example 5.5 is a realization of the dyadic odometer. For the general $(p_i)$-odometer, proceed as follows: In step $i$, cut the stack in $p_i \ge 2$ slices, and, without spacer, put them on top of each other to make the stack for step $i + 1$. Since only one stack is used at each step, adding machines are rank 1 transformations. The resulting cutting and stacking map $T : [0, 1] \to [0, 1]$ is
$$T(x) = \begin{cases} x - \left(1 - \frac{1}{q_{i-1}}\right) + \frac{1}{q_i} & \text{if } x \in \left[1 - \frac{1}{q_{i-1}},\, 1 - \frac{1}{q_i}\right),\ i \ge 1, \\ 0 & \text{if } x = 1, \end{cases}$$
where $q_i = p_1 p_2 \cdots p_i$ and $q_0 = 1$.
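The formula in the proof of Proposition 5.12 can be tried out numerically. The following is a minimal sketch, assuming a finite truncation of the sequence $(p_i)$ and exact rational arithmetic; the function name `odometer_map` and the truncation are illustrative choices, not part of the text.

```python
from fractions import Fraction

def odometer_map(x, p):
    """Cutting and stacking map of the (p_i)-odometer, evaluated at x in [0, 1].

    `p` is a finite list [p_1, p_2, ...] with every p_i >= 2 (a truncation of
    the infinite sequence, which is an assumption of this sketch).
    """
    x = Fraction(x)
    if x == 1:
        return Fraction(0)
    q_prev = 1                          # q_0 = 1
    for p_i in p:
        q_i = q_prev * p_i              # q_i = p_1 * ... * p_i
        # Reaching this point means x >= 1 - 1/q_{i-1}.
        if x < 1 - Fraction(1, q_i):    # so x lies in [1 - 1/q_{i-1}, 1 - 1/q_i)
            return x - (1 - Fraction(1, q_prev)) + Fraction(1, q_i)
        q_prev = q_i
    raise ValueError("x is too close to 1 for the truncated sequence p")

# Orbit of 0 under the dyadic odometer (all p_i = 2).
orbit = [Fraction(0)]
for _ in range(4):
    orbit.append(odometer_map(orbit[-1], [2] * 10))
```

For the dyadic case the orbit of 0 starts 0, 1/2, 1/4, 3/4, 1/8, which is the familiar von Neumann-Kakutani order.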



We call a cutting and stacking map primitive if for every stack $\{\Delta_i^j\}_{j=0}^{h_i-1}$ at each cutting and stacking step $n$, there is $n' > n$ such that part of $\{\Delta_i^j\}_{j=0}^{h_i-1}$ is stacked inside the $i'$-th stack $\{\Delta'^{\,j}_{i'}\}_{j=0}^{h'_{i'}-1}$ at cutting and stacking step $n'$, for each $i'$. This definition is the equivalent of primitivity in substitution and S-adic subshifts.

Proposition 5.13. If a cutting and stacking transformation $(I^*, T)$ is primitive and the maximal number $s^*$ of consecutive layers of spacer is finite, then $(I^*, T)$ is minimal.

Proof. Let an open interval $U$ be compactly contained in $[0, 1]$, and take $n$ so large that the base $\Delta$ at the $n$-th cutting and stacking step has length $|\Delta| < \frac12 |U|$ and the part of $U$ inside the spacer is all included in the stacks. Then for at least one $i$

(5.2) there exists $0 \le j < h_i$ such that $\Delta_i^j \subset U$,

and $h_i < \infty$ by our assumption on the spacer. By primitivity, there is $n'$ such that for each $i'$, part of the stack $\{\Delta_i^j\}_{j=0}^{h_i-1}$ finds its way into the stack $\{\Delta'^{\,j}_{i'}\}_{j=0}^{h'_{i'}-1}$ of cutting and stacking step $n'$. Therefore at this step, (5.2) is true for all $i'$. This implies that for every $x \in [0, 1]$ with a properly defined orbit, there is $0 \le k \le \max_{i'} h'_{i'} + s^*$ such that $T^k(x) \in U$. Minimality follows by Proposition 2.17.

Example 5.14. Transitivity is a weaker property than primitivity, and there are transitive but non-minimal cutting and stacking transformations. Indeed, the following non-primitive but transitive rank 2 cutting and stacking transformation is not minimal. At every step, cut the left stack into three slices $a, b, c$ and the right stack into two slices $d, e$, and stack $ac$ and $dbe$. This implies that in all later cutting and stacking steps, the left stack is encoded $acac \cdots ac$, so the orbit of 0 has a non-dense closure. However, every well-defined orbit outside $\mathrm{orb}_T(0)$ is dense. See also Example 5.45.

5.3. Enumeration Systems

A generalization of adding machines is an enumeration system$^2$, in the sense that the sequence $(q_k)_k$ for $q_k = \prod_{j=1}^k p_j$ is replaced by an arbitrary strictly increasing integer sequence $G = (G_k)_k$. General references are [48, 49, 286]. The theory goes back to Coquet [166], while many number-theoretic properties are presented in [253].

Given an increasing sequence of integers $(G_k)_{k \ge 0}$ with $G_0 = 1$, called the enumeration scale (after the French échelles de numération), we can construct the greedy expansion $(N_0) = x_0 x_1 x_2 \cdots$ of any integer $N_0 \ge 0$ as follows. Start with the sequence $x = x_0 x_1 x_2 \cdots$ of all zeroes. Take the maximal $G_k \le N_0$, replace $x_k$ with $x_k = \lfloor N_0 / G_k \rfloor$, and continue with the remainder $N_1 := N_0 - x_k G_k$. That is, find $k'$ maximal such that $G_{k'} \le N_1$ and let $x_{k'} = \lfloor N_1 / G_{k'} \rfloor$, etc. After a finite number of steps, $N_i = 0$ and $N_0 = \sum_j x_j G_j$.

$^2$ Also called generalized odometers; they are related to the Ostrowski numerations [439]; see Example 5.22.

Remark 5.15. Sometimes the greedy expansion is the only possible expansion, for example the binary expansion, if $G_n = 2^{n-1}$ and $x_n \in \{0, 1\}$. Zeckendorf's Theorem [567] states that the expansion is infinite and unique if the $G_n$ are the Fibonacci numbers and the digits $x_j \in \{0, 1\}$ satisfy $x_j x_{j+1} = 0$.

Let $(\mathbb{N}_0) := \{(n) : n \ge 0\}$ be the set of greedy expansions of non-negative integers. Each $x \in (\mathbb{N}_0)$ is an infinite string of integers, but with only finitely many non-zero entries. Let $X_G$ be the closure of $(\mathbb{N}_0)$ in the product topology on $\mathbb{N}_0^{\mathbb{N}_0}$.
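The greedy expansion described above can be sketched in a few lines. This is only an illustration under the assumption that the scale is truncated to a finite list; the name `greedy_expansion` is hypothetical.

```python
def greedy_expansion(n, scale):
    """Greedy digits x_0, x_1, ... of the integer n >= 0.

    `scale` is a finite, strictly increasing list [G_0, G_1, ...] with
    G_0 = 1 (a truncation of the enumeration scale, assumed for this sketch).
    """
    digits = [0] * len(scale)
    for k in range(len(scale) - 1, -1, -1):   # try the largest G_k first
        digits[k], n = divmod(n, scale[k])    # x_k = floor(n / G_k), keep the remainder
    return digits

fibonacci_scale = [1, 2, 3, 5, 8, 13, 21]
eleven = greedy_expansion(11, fibonacci_scale)   # 11 = 3 + 8
```

Scanning from the top index downwards gives the same digits as repeatedly picking the maximal $G_k$ below the current remainder, because every $G_k$ exceeding the remainder simply receives the digit 0.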

Lemma 5.16.
$$X_G = \left\{ x = x_0 x_1 x_2 \cdots \in \mathbb{N}_0^{\mathbb{N}_0} : 0 \le x_n < G_{n+1}/G_n,\ x_0 G_0 + x_1 G_1 + \cdots + x_n G_n < G_{n+1} \text{ for all } n \ge 0 \right\}.$$

Proof. For every $n$, if $x_0 x_1 \dots x_n 000 \dots$ is the greedy expansion of $r_n := x_0 G_0 + \cdots + x_n G_n$, then $r_n < G_{n+1}$. Hence $x \in X_G$. If, on the other hand, $x \in X_G$, so $r_n := x_0 G_0 + \cdots + x_n G_n < G_{n+1}$ for each $n$, then no “carry” takes place, so $x_0 x_1 \cdots x_n 000 \dots$ is the greedy expansion of $r_n$.

Define the “addition of one” map $a : (\mathbb{N}_0) \to (\mathbb{N}_0)$ as $a((n)) = (n + 1)$. This leads to an “add one and carry” algorithm extending the one of (4.38).

    c := 1; k := 0
    Repeat
        s := x_k + c
        If there is n > k such that G_{n+1} = s + x_{k+1} G_{k+1} + ··· + x_n G_n
            then x_k := 0; x_{k+1} := 0; ...; x_n := 0; k := n + 1
            else c := 0; x_k := s mod p_k; k := k + 1
    Until c = 0
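As a complement to the pseudocode, here is a runnable sketch of the map $a$ on finitely supported digit strings. It is not a transcription of the algorithm above: it uses the admissibility condition of Lemma 5.16 directly, looking for the largest $n$ with $x_0 G_0 + \cdots + x_n G_n = G_{n+1} - 1$, resetting the digits up to $n$ and incrementing $x_{n+1}$. The function name and the finite truncation of the scale are assumptions.

```python
def add_one(x, scale):
    """Digits of n + 1, given the greedy digits x of n.

    x and scale are lists of equal length; scale = [G_0, G_1, ...] with
    G_0 = 1 is a finite truncation of the enumeration scale (an assumption).
    """
    x = list(x)
    partial, carry_to = 0, 0
    for n in range(len(x) - 1):
        partial += x[n] * scale[n]          # x_0 G_0 + ... + x_n G_n
        if partial == scale[n + 1] - 1:     # adding 1 would fill up G_{n+1}
            carry_to = n + 1                # keep the largest such n
    for j in range(carry_to):
        x[j] = 0                            # digits below the carry are reset
    x[carry_to] += 1                        # plain increment of x_0 if no carry
    return x

fib = [1, 2, 3, 5, 8, 13, 21, 34]
four_to_five = add_one([1, 0, 1, 0, 0, 0, 0, 0], fib)   # 4 = 1 + 3, then 5 = G_3
```

Choosing the largest such $n$ guarantees that all longer partial sums stay below the next scale element, so the output is again admissible in the sense of Lemma 5.16.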

Proposition 5.17. Let $Q(n) < n$ be the maximal integer such that

(5.3) $G_n = d_1 G_{n-1} + \cdots + d_{n-Q(n)} G_{Q(n)}$

for integers $0 \le d_j \le G_{j+1}/G_j$, so maximality implies that $d_{n-Q(n)} > 0$. Then $\lim_{n \to \infty} Q(n) = \infty$ if and only if $a$ can be extended to a continuous “add one and carry” operation $a : X_G \to X_G$ which is then also surjective and minimal.


Proof. We show that under the condition in the proposition, $a : (\mathbb{N}_0) \to (\mathbb{N}_0)$ is uniformly continuous. Then the extension is well-defined and uniformly continuous as well. Since $a : (\mathbb{N}_0) \to (\mathbb{N}_0) \setminus \{0^\infty\}$ is surjective and $a((G_n - 1)) = (G_n) \to 0^\infty$ as $n \to \infty$, surjectivity follows.

Now for minimality, let $Z := [x_0 \dots x_{N-1}]$ be an arbitrary cylinder set and let $S := \sum_{j=0}^{N-1} x_j G_j$. Because $Q(n) \to \infty$, there is $N'$ such that $Q(n) \ge N$ for all $n \ge N'$. Then $(S + x_{N'} G_{N'}) \in Z$ for each $0 \le x_{N'} < G_{N'+1}/G_{N'}$. Next $(S + G_{N'+1}) \in Z$ and $(S + G_{N'} + G_{N'+1}) \in Z$ or $(S + G_{N'+2}) \in Z$, whichever integer is smaller. Continuing this way we find for each $n \in \mathbb{N}$ some $m \le G_{N'}$ such that $(n + m) \in Z$. This is uniform recurrence, so by Proposition 2.17 minimality follows.

To transfer these properties to the whole of XG , let us prove that a :  → X  \ {0∞ } is uniformly continuous. Let N ∈ N be arbitrary, and XG G take N  so large that Q(n) > N for all n ≥ N  . For any x ∈ XG and y ∈ [x0 . . . xN  ] ∩ XG , a(x)i = a(y)i for 0 ≤ i ≤ N , because, by the choice of N  , the first N + 1 digits cannot give a carry to a digit n ≥ N  . This proves  uniform continuity with ε = 2−(N +1) for arbitrary N and δ = 2N . Conversely, suppose Q(n) → ∞, so there is a sequence (nk )k≥1 with N0 = Q(nk ) for all k. Let nk be such that dnk = 0; then in (Gnk ), the entries xnk = 0 and x0 G0 + x1 G1 + · · · + xN0 −1 GN0 −1 = GN0 − 1. Also nk → ∞ as k → ∞. Therefore, for any ε > 0, there is k such that d((Gnk − 1), (Gnk − Gnk − 1)) < ε. But a((Gnk − 1)) starts with N0 + 1 zeroes and a((Gnk −Gnk −1)) starts with N0 zeroes and some non-zero digit. .  Hence a is not continuous on XG Lemma 5.18. Let (5.4)

 XG = {x ∈ XG : x0 G0 + · · · + xn Gn = Gn+1 − 1 infinitely often}.

 , and a−1 is well-defined at every Then a(x) = 0∞ if and only if x ∈ XG ∞ x ∈ XG \ {0 }.  , there is a largest nx such that x0 G0 + · · · + xnx Gnx = Proof. If x = y ∈ / XG Gnx +1 − 1 and y0 G0 + · · · + yny Gny = Gny +1 − 1. Take n = max{nx , ny } + 1. If no such nx , ny exist, then take n = 1. Then a(x)j = xj and a(y)j = yj for all j ≥ n. Hence, if xj = yj for some j > n, then a(x) = a(y). If xj = yj for some j > n, then a(x)0 , . . . , a(x)n , 0, 0, . . . and a(y)0 , . . . , a(y)n , 0, 0, . . . are the greedy expansions of r(x) := 1 + x0 G0 + · · · + xn Gn and r(y) := 1 + y0 G0 + · · · + yn Gn , respectively, and hence they are not equal because  . Surjectivity follows from x = y. This proves that a is injective on XG \ XG  ∞ Proposition 5.17, so a : XG \ {0 } → XG \ XG is well-defined.


 On the other hand, if x ∈ XG , then a(x) = 0∞ , because the condition  x0 G0 + x1 G1 + · · · + xn Gn = Gn+1 − 1 holds for infinitely many n.  . Hence Corollary 5.19. If x ∈ XG , then a(x) = 0∞ if and only if x ∈ XG  = 1. a is invertible if and only if #XG  is called the arbre de retenues in [48]. The system (XG , a) The set XG is called an enumeration system.

Example 5.20. For an integer sequence $(p_j)_{j \ge 0}$ with $p_0 = 1$, $p_j \ge 2$ for $j \ge 1$, set $G_n = \prod_{j=1}^n p_j$. Then the corresponding numeration scale $X_G$ is exactly the odometer, and the set defined in (5.4) is $\{(p_1 - 1, p_2 - 1, p_3 - 1, p_4 - 1, \dots)\}$. Therefore $a : X_G \to X_G$ is invertible.

Example 5.21. If $(G_n)_{n \ge 0} = 1, 2, 3, 5, 8, 13, 21, \dots$ are the Fibonacci numbers, then $X_G$ is exactly the space for the Fibonacci SFT, although the addition map is of course entirely different from the shift. The two “maximal” sequences, i.e. the elements of the set defined in (5.4), are
$$(1, 0, 1, 0, 1, 0, 1, \dots) \quad \text{and} \quad (0, 1, 0, 1, 0, 1, \dots),$$
so $a$ is not invertible.

Example 5.22. The standard continued fraction expansion of a number $\theta \in (0, 1)$ is
$$\theta = [0; a_1, a_2, a_3, \dots] := \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cfrac{1}{\ddots}}}};$$
see Section 8.2. For every rational $\theta \in (0, 1)$ there are two finite expansions, and for every irrational $\theta$ the expansion is unique. The convergents $\frac{p_n}{q_n} := [0; a_1, a_2, \dots, a_n]$ are the partial fractions obtained by cutting the infinite expansion at step $n$. If we set $p_{-1} = 1$, $q_{-1} = 0$, $p_0 = 0$, $q_0 = 1$, then the sequences $(p_n)_{n \ge 1}$ and $(q_n)_{n \ge 1}$ satisfy the recursive relations (see Section 8.2)
$$p_n = a_n p_{n-1} + p_{n-2}, \qquad q_n = a_n q_{n-1} + q_{n-2}.$$
Furthermore, $(\frac{p_n}{q_n} - \theta)_{n \ge 1}$ is an alternating sequence converging to 0 (super)exponentially, depending on how fast the sequence $(a_n)_{n \ge 1}$ grows. If we let $(G_n)_{n \ge 0} = (q_n)_{n \ge 0}$, then we obtain the Ostrowski numeration $X_G$ (see [439] and [20, Section 3.9]) with $G_0 = 1$, $G_1 = a_1$, $G_n = a_n G_{n-1} + G_{n-2}$, so $a_i \equiv 1$ reduces this example to Example 5.21. In any case the set defined in (5.4) is
$$\{(a_1 - 1, 0, a_3, 0, a_5, 0, a_7, \dots),\ (0, a_2, 0, a_4, 0, a_6, \dots)\},$$
so $0^\infty$ has two preimages under $a$. A curious property of this numeration is that $\theta(n + 1) = \sum_i (n)_i G_i$; see [20, Corollary 9.1.14].
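The recursion for the convergents, and hence for the Ostrowski scale $G_n = q_n$, is easy to run. The sketch below assumes a finite list of partial quotients; the function name `convergents` is illustrative.

```python
def convergents(a):
    """Lists (p_0, ..., p_N) and (q_0, ..., q_N) for theta = [0; a_1, ..., a_N].

    Uses p_{-1} = 1, q_{-1} = 0, p_0 = 0, q_0 = 1 and the recursion
    p_n = a_n p_{n-1} + p_{n-2}, q_n = a_n q_{n-1} + q_{n-2}.
    """
    p_prev, q_prev = 1, 0        # p_{-1}, q_{-1}
    p, q = 0, 1                  # p_0, q_0
    ps, qs = [p], [q]
    for a_n in a:
        p, p_prev = a_n * p + p_prev, p
        q, q_prev = a_n * q + q_prev, q
        ps.append(p)
        qs.append(q)
    return ps, qs

# All partial quotients equal to 1: both sequences are Fibonacci numbers.
p_golden, q_golden = convergents([1] * 6)
```

The list `q_golden` is then the beginning of an Ostrowski scale with $G_0 = 1$ and $G_1 = a_1$.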


Example 5.23. Let $G_k = S_k$ be the cutting times of a unimodal map $f : [0, 1] \to [0, 1]$ with critical point $c$. Then $G_0 = 1$ and $G_k = G_{k-1} + G_{Q(k)} \le 2G_{k-1}$. Here the kneading map plays exactly the role of the $Q$ in Proposition 5.17, and we assume that $Q(k) \to \infty$. Then

(5.5) $X_G = \{x \in \{0, 1\}^{\mathbb{N}} \mid x_n = 1 \Rightarrow x_j = 0 \text{ for } Q(n + 1) \le j < n\}$.

In the language of [48], such a $G$ is a low enumeration scale (échelle basse). Define $\pi : (\mathbb{N}_0) \to \mathrm{orb}(c)$ by $\pi((n)) = f^n(c)$. This map is uniformly continuous and extends to a continuous map $\pi : X_G \to \omega(c)$. Although this extension need not be injective, we have $\pi \circ a = f \circ \pi$; see [125, Theorem 1].

Remark 5.24. Let $f : [0, 1] \to [0, 1]$ be a unimodal map with kneading map $Q(k) \to \infty$. One can show that $f|_{\omega(c)}$ is invertible if and only if the set defined in (5.4), for the scale $G = S$, consists of a single point. This applies to infinitely renormalizable unimodal maps, but also if there is a strange adding machine; see [24].

In [49] we have the following result concerning entropy.

Theorem 5.25. Assume that $Q(n) \to \infty$ and $G_n/G_{n-1}$ is bounded (so that $X_G$ is compact). Then the enumeration system $(X_G, a)$ has zero entropy.

Recall from Theorem 4.104 that an equicontinuous minimal Cantor system is conjugate to an adding machine, so, unless $G_{n-1}$ divides $G_n$ for all $n$ sufficiently large, enumeration systems are not equicontinuous.

Proof. Let $\varepsilon > 0$ be arbitrary and let $N$ be such that $2^{-N} < \varepsilon$. Set $Q := \inf_{n > N} Q(n)$. Let $x \in X_G$ be arbitrary. For $a(x)$ to have a carry beyond index $N$, we need $\sum_{i=1}^N x_i G_i \ge G_Q - 1$, and this happens at most once every $G_Q$ iterates of $a$. At such an iterate there can be a carry or not, so it can no more than double the number of points in an $(n, \varepsilon)$-separated set and still have an $(n + G_Q, \varepsilon)$-separated set. The maximal cardinality of a $(G_n, \varepsilon)$-separated set is bounded by $\varepsilon^{-1} 2^{G_n/G_Q}$, so
$$h_{top}(a) \le \lim_{\varepsilon \to 0} \limsup_{n \to \infty} \frac{\log(\varepsilon^{-1} 2^{G_n/G_Q})}{G_n} \le \lim_{N \to \infty} \frac{\log 2}{G_Q} = 0.$$
This ends the proof.

5.3.1. Factors of Enumeration Systems. Let
$$|||x||| = \min\{x - \lfloor x \rfloor,\ \lceil x \rceil - x\}$$
denote the distance of $x$ to the nearest integer, and let $\mathrm{sf}(x) = x - \lceil x - \frac12 \rceil$ be a signed fractional part of $x$ taken in $(-\frac12, \frac12]$. Hence $|||x|||$ is the absolute value of $\mathrm{sf}(x)$.


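Proposition 5.26. Let $(X_G, a)$ be an enumeration system associated to the integer sequence $(G_n)_{n \ge 0}$. If $\rho \in \mathbb{R}$ is such that

(5.6) $\sum_{j=0}^{\infty} |||\rho G_j||| < \infty$,

then
$$g : X_G \to \mathbb{T}^1, \qquad x \mapsto e^{2\pi i \left(\sum_{j=0}^{\infty} \rho x_j G_j\right)},$$
where $\mathbb{T}^1$ is the unit circle in $\mathbb{C}$, is a continuous eigenvector of $(X_G, a)$, with eigenvalue $e^{2\pi i \rho}$.

Proof. Take $\varepsilon > 0$ and $N \in \mathbb{N}$ such that $\sum_{j=N}^{\infty} |||\rho G_j||| < \varepsilon$. Then
$$\left| \sum_{j=0}^{\infty} \mathrm{sf}(\rho x_j G_j) - \sum_{j=0}^{N} \mathrm{sf}(\rho x_j G_j) \right| \le \sum_{j=N+1}^{\infty} |||\rho G_j||| < \varepsilon.$$
This shows that $g : X_G \to \mathbb{T}^1$ is uniformly continuous. For each $n \in \mathbb{N}_0$, we have $g((n)) = e^{2\pi i \rho n}$. Therefore
$$g \circ a((n)) = g((n + 1)) = e^{2\pi i \rho (n+1)} = e^{2\pi i \rho} g((n)).$$
By uniform continuity, we can extend this relation from $(\mathbb{N}_0)$ to $X_G$.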
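Condition (5.6) can be explored numerically. The sketch below takes the Fibonacci scale of Example 5.21 and, as an assumed choice not made in the text, $\rho$ equal to the golden mean; the terms $|||\rho G_j|||$ then decay geometrically. Floating-point arithmetic limits this to moderately small indices.

```python
import math

def dist_to_nearest_integer(x):
    """|||x|||: the distance from x to the nearest integer."""
    return abs(x - round(x))

rho = (1 + math.sqrt(5)) / 2            # golden mean (assumed choice of rho)
G = [1, 2]
while len(G) < 25:
    G.append(G[-1] + G[-2])             # Fibonacci scale 1, 2, 3, 5, 8, ...

terms = [dist_to_nearest_integer(rho * g) for g in G]
partial_sum = sum(terms)                # partial sum of the series in (5.6)
```

Successive terms shrink by a factor close to $1/\rho$, so the series in (5.6) converges for this pair of scale and $\rho$.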



The choice $\rho = 0$ trivially satisfies (5.6), with the eigenfunction $g(x) \equiv 1$. This just confirms ergodicity, and it is not what we want to study. There can be multiple (rationally independent) non-zero $\rho$'s, say $\rho_0, \dots, \rho_{d-1}$, satisfying (5.6) simultaneously. In this case, we can build a continuous factor map onto the $(d-1)$-dimensional torus:
$$g : X_G \to \mathbb{T}^{d-1}, \qquad x \mapsto (g_1(x), \dots, g_{d-1}(x)),$$
where $g_k(x) = \exp(2\pi i \sum_{j=0}^{\infty} \rho_k x_j G_j)$. This happens for instance if $(G_n)_{n \ge 0}$ is generated by a recursive relation $G_n = a_1 G_{n-1} + \cdots + a_d G_{n-d}$ such that the corresponding characteristic equation $1 = a_1 x^{-1} + \cdots + a_d x^{-d}$ has a Pisot number $\lambda$ as leading root; see Proposition 8.6. In this case, we can take $\rho_1 = \lambda, \rho_2 = \lambda^2, \dots, \rho_{d-1} = \lambda^{d-1}$ and $\lambda^d$ is an integer combination of $\{1, \lambda, \dots, \lambda^{d-1}\}$. The standard (and actually first [471]) example is the Rauzy fractal based on the tribonacci number, i.e. the leading root of $x^3 = x^2 + x + 1$; see Figure 4.4 (left).

Example 5.27. In [125]$^3$, this approach is used to describe the $\omega$-limit sets of unimodal maps with kneading maps $Q$ with $k - Q(k)$ bounded. The left panel in Figure 5.4 is constructed in this way from the sequence $(G_n)_{n \ge 0} = 1, 2, 3, 4, 6, 9, 13, 19, \dots$ (sometimes called the Narayana cow sequence$^4$). The picture suggests, and this is indeed true, that the boundary of such a Rauzy fractal is a fractal, non-rectifiable, curve. It has infinite length, but the interesting question is whether it has positive two-dimensional Lebesgue measure or not. Occasionally, this can be decided upon by a simple geometric argument. For the case $x^3 = x^2 + 1$ with solutions $\lambda_0 > 1$ and $\lambda_1 = \bar\lambda_2$, the space $X_G$ consists of sequences in which every two 1's are separated by at least two 0's. Define $\pi : X_G \to \mathbb{C}$ as
$$\pi(x) = \lim_{n \to \infty} \left( \sum_{j=0}^{n} \mathrm{sf}(\lambda_0 x_j G_j) + i \sum_{j=0}^{n} \mathrm{sf}(\lambda_0^2 x_j G_j) \right)$$
and set $P = \pi(X_G)$. Identify the two-dimensional torus $\mathbb{T}^2$ with the quotient space $\mathbb{C}/(\mathbb{Z} + i\mathbb{Z})$, and note that we have
$$T \circ \pi = \pi \circ a \quad \text{for the translation } T : \mathbb{T}^2 \to \mathbb{T}^2,\ z \mapsto z + \lambda_0 + i\lambda_0^2.$$

$^3$ However, [125] also gives examples where $(\omega(c), f)$ factors to a torus (of any dimension) and to solenoids, i.e. circle suspensions over Cantor sets.

In Figure 5.4 (left), the three shades refer to three cylinder sets $P_{00} = \pi(00X_G)$, $P_{100} = \pi(100X_G)$, and $P_{0100} = \pi(0100X_G)$. As shown in [125], $P = P_{00} \cup P_{100} \cup P_{0100}$ is the attractor (in the sense of Hutchinson [326]) of an iterated function system (IFS):

(5.7)
$$\begin{cases} \psi_{00} : P \to P_{00}, & z \mapsto \lambda_1^2 z, \\ \psi_{100} : P \to P_{100}, & z \mapsto \lambda_1^4 + \lambda_1^3 z, \\ \psi_{0100} : P \to P_{0100}, & z \mapsto \lambda_1^5 + \lambda_1^4 z. \end{cases}$$

Since $\lambda_0 \lambda_1 \lambda_2 = 1$, the squares of the absolute values of the contraction factors sum to
$$|\lambda_1^2|^2 + |\lambda_1^3|^2 + |\lambda_1^4|^2 = \lambda_1^2\lambda_2^2 + \lambda_1^3\lambda_2^3 + \lambda_1^4\lambda_2^4 = \lambda_0^{-2} + \lambda_0^{-3} + \lambda_0^{-4} = \lambda_0^{-4}(\lambda_0^2 + \lambda_0 + 1) = \lambda_0^{-4}\bigl(\lambda_0^4 - (\lambda_0 + 1)\underbrace{(\lambda_0^3 - \lambda_0^2 - 1)}_{=0}\bigr) = 1.$$
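The identity for the contraction factors can be confirmed numerically. The following sketch computes the real root $\lambda_0$ of $x^3 = x^2 + 1$ by bisection (a method chosen here for simplicity, not taken from the text) and also checks that the ratios of the sequence $1, 2, 3, 4, 6, 9, 13, 19, \dots$ approach $\lambda_0$.

```python
def leading_root(tol=1e-15):
    """Real root lambda_0 > 1 of x^3 = x^2 + 1, found by bisection on [1, 2]."""
    lo, hi = 1.0, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid**3 - mid**2 - 1 < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

lam0 = leading_root()
# Squares of the absolute values of the contraction factors in (5.7),
# rewritten in terms of lambda_0.
total = lam0**-2 + lam0**-3 + lam0**-4

# The scale of Example 5.27 obeys G_n = G_{n-1} + G_{n-3}.
G = [1, 2, 3]
while len(G) < 40:
    G.append(G[-1] + G[-3])
growth = G[-1] / G[-2]
```

Up to rounding, `total` equals 1 and `growth` agrees with `lam0`.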

Therefore $\mathrm{Leb}(P) = \mathrm{Leb}(P_{00}) + \mathrm{Leb}(P_{100}) + \mathrm{Leb}(P_{0100})$, so that the three cylinders overlap in a Lebesgue nullset. On the other hand, $P \bmod (\mathbb{Z} + i\mathbb{Z}) = \mathbb{T}^2$ because $\pi((\mathbb{N}_0)) \bmod (\mathbb{Z} + i\mathbb{Z}) = \{n(\lambda_0 + i\lambda_0^2) : n \ge 0\} \bmod (\mathbb{Z} + i\mathbb{Z})$ is dense in the torus.

$^4$ The Indian mathematician Narayana Pandita (1325–1400) studied cows in much the same way that Fibonacci studied rabbits, only cows take more time to mature than rabbits.

Figure 5.4. Rauzy fractals for $x^3 = x^2 + 1$ in two different constructions.

This discussion on Rauzy fractals raises the question of how the construction in this section is related to the construction in Section 4.2.4, i.e. the left and right panels of Figure 5.4. Continuing our example, consider the substitution
$$\chi : \begin{cases} 0 \to 02, \\ 1 \to 0, \\ 2 \to 1 \end{cases} \quad \text{with associated matrix} \quad A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}.$$
This matrix has left and right eigenvectors $(\lambda_i^2, \lambda_i, 1)$ and $(\lambda_i^2, 1, \lambda_i)^T$, where $\lambda_i$, $i = 0, 1, 2$, are the roots of the characteristic polynomial $p(x) = x^3 - x^2 - 1$. Hence $p(x) = 0$ is exactly the characteristic equation of our enumeration scale. This means that the attracting right eigenspace of $A$ is $V = (\lambda_0^2, \lambda_0, 1)^\perp$. Applying Theorem 4.40 to this substitution (its prefix-suffix graph is given in the right panel of Figure 4.5) we get that its Rauzy fractal $R \subset V$ is the attractor of the graph-directed IFS

(5.8)
$$\begin{cases} R(0) = h(R(0)) \cup h(R(1)), \\ R(1) = h(R(2)), \\ R(2) = h(R(0)) + \pi(1_0), \end{cases} \qquad \text{where } h = A|_V.$$

Using (5.8) twice on its first line, we obtain $R(0) = h^2(R(0)) \cup h^2(R(1)) \cup h^2(R(2)) = h^2(R)$, so the first tile $R(0)$ is an affine (and since $\lambda_1 = \bar\lambda_2$ actually conformal) copy of the entire Rauzy fractal $R$. Applying (5.8) twice on $R(1)$ and once more on $R(2)$, we get
$$R(0) = h^2(R(0)) \cup \bigl(h^3(R(0)) + h^2 \circ \pi(1_0)\bigr) \cup \bigl(h^4(R(0)) + h^3 \circ \pi(1_0)\bigr).$$
But $R(0) = h^2(R)$, so
$$R = h^2(R) \cup \bigl(h^3(R) + \pi(1_0)\bigr) \cup \bigl(h^4(R) + h \circ \pi(1_0)\bigr).$$


Next let $f : V \to \mathbb{C}$ be a linear isometry. Since $\lambda_1 = \bar\lambda_2$, the contraction $h$ turns into a multiplication by $\lambda_1$: $f \circ h = \lambda_1 \cdot f$. Therefore $Q := f(R)$ satisfies
$$Q = \lambda_1^2 Q \cup (\lambda_1^3 Q + z) \cup (\lambda_1^4 Q + \lambda_1 z), \qquad \text{for } z := f \circ \pi(1_0) \ne 0.$$
Substitute $Q = z\lambda_1^{-4} \tilde P$ and finally multiply the set-equation by $\lambda_1^4 z^{-1}$. This gives
$$\tilde P = \lambda_1^2 \tilde P \cup (\lambda_1^3 \tilde P + \lambda_1^4) \cup (\lambda_1^4 \tilde P + \lambda_1^5),$$
exactly the same as the set-equation we derived from (5.7). Uniqueness of Hutchinson attractors shows that $\tilde P = P$, and hence the Rauzy fractal of the above example is, up to a scaling, the same as the Rauzy fractal associated to $\chi$.

5.4. Bratteli Diagrams and Vershik Maps

Bratteli diagrams emerged in the area of operator algebras [108], $C^*$-algebras in the first place, but were given a dynamical interpretation when Vershik equipped them with an order and a successor map [546]. Vershik named this map the adic transformation of a Markov compactum (see [402, 403, 519, 546, 547]); the explicit connection with the Bratteli diagram seems to have been made later. Herman, Putnam & Skau [310] showed that every (essentially) minimal homeomorphism on the Cantor set can be represented as a Bratteli-Vershik system. Later, Medynets [413] extended this to all aperiodic homeomorphisms on the Cantor set; see also [212, Theorem 6.14].

An ordered Bratteli diagram is an infinite graph consisting of
• a sequence of finite non-empty vertex sets $V_i$, $i \ge 0$, such that $V_0$ consists of a single vertex $v_0$;
• a sequence of finite non-empty edge sets $E_i$, $i \ge 1$, such that each edge $e \in E_i$ connects a vertex $s(e) \in V_{i-1}$ to a vertex $t(e) \in V_i$. (Here $s$ and $t$ stand for source and target.) For every $v \in V_{i-1}$, there exists at least one outgoing edge $e \in E_i$ with $v = s(e)$, and for every $v \in V_i$ there exists at least one incoming edge $e \in E_i$ with $v = t(e)$;
• for each $v \in \bigcup_{i \ge 1} V_i$, a total order $<$ between its incoming edges.

The path space
$$X_{BV} := \{(x_i)_{i \ge 1} : x_i \in E_i,\ t(x_i) = s(x_{i+1}) \text{ for all } i \in \mathbb{N}\}$$
is the collection of all infinite edge-labeled paths starting from $v_0$, endowed with product topology. That is, the set of infinite paths with a common initial $n$-path is clopen, and all sets of this type form a basis of the topology.


To each $E_i$ we assign an incidence matrix $M(i) = (m_{v,w}(i))_{v \in V_{i-1}, w \in V_i}$ of size $\#V_{i-1} \times \#V_i$, where $m_{v,w}(i)$ is the number of edges from $v \in V_{i-1}$ to $w \in V_i$.

Definition 5.28. For $n \in \mathbb{N}$, define the height vectors $h(n) = (h_v(n))_{v \in V_n}$ (interpreted as row vectors) as
$$h_v(n) = \#\{x_1 \dots x_n : s(x_1) = v_0,\ t(x_n) = v\},$$
that is, the number of $n$-paths from $v_0$ to $v \in V_n$. Taking $h(0) = (1)$ by default, it follows by induction that $h(n) = h(0) \cdot M(1) \cdot M(2) \cdots M(n)$ for all $n \in \mathbb{N}$. The Bratteli diagram has a simple cap if $h(1) = M(1) = (1, \dots, 1)$.

Remark 5.29. Instead of taking the collections of edges $E_i$ separately, we can also consider all the paths from $V_{i-1}$ to $V_j$ for some $j \ge i$. We denote this collection of paths by $E_{i,j}$. The incidence matrix associated to $E_{i,j}$ can be shown to be the matrix product $M(i, j) = M(i) \cdots M(j)$. This process is called telescoping. The opposite procedure, i.e. inserting extra levels of vertex and edge sets, is called microscoping.

Example 5.30. In the Bratteli diagram in Figure 5.5, we telescope the first two levels away. The corresponding computation of the associated matrices is as follows:
$$M(1, 3) = \begin{pmatrix} 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 2 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 4 \end{pmatrix}.$$

Figure 5.5. Telescoping a Bratteli diagram. Left: the original diagram with $M(1) = \begin{pmatrix} 1 & 1 \end{pmatrix}$, $M(2) = \begin{pmatrix} 1 & 1 \\ 2 & 0 \end{pmatrix}$, $M(3) = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$. Middle: after telescoping the first two levels, with incidence matrices $\begin{pmatrix} 3 & 1 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$. Right: after telescoping all three levels, with incidence matrix $\begin{pmatrix} 4 \end{pmatrix}$.
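The matrix products behind telescoping and the height vectors can be reproduced with a few lines of code. This is a minimal sketch using plain lists of rows; the helper names `mat_mul` and `telescope` are illustrative.

```python
def mat_mul(A, B):
    """Product of two integer matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def telescope(matrices):
    """Incidence matrix M(i, j) = M(i) ... M(j) of the telescoped levels."""
    product = matrices[0]
    for M in matrices[1:]:
        product = mat_mul(product, M)
    return product

# Incidence matrices of Example 5.30.
M1 = [[1, 1]]
M2 = [[1, 1], [2, 0]]
M3 = [[1], [1]]

h2 = telescope([M1, M2])        # height vector h(2) = M(1) M(2)
M13 = telescope([M1, M2, M3])   # M(1, 3)
```

Since $h(0) = (1)$, the height vector $h(n)$ is simply the telescoped product of the first $n$ incidence matrices.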

Exercise 5.31. Show that also if $M(1) \ne (1, \dots, 1)$, then there is an equivalent Bratteli diagram with $M(1) = (1, \dots, 1)$.

It is often useful to telescope Bratteli diagrams in such a way that all incidence matrices become strictly positive. This is possible if and only if, for every $i$, there exists $j \ge i$ such that for every $v \in V_i$ and $w \in V_{j+1}$, there is a path from $v$ to $w$.


Definition 5.32. A Bratteli diagram is called simple if there is an increasing sequence $0 = m_0 < m_1 < \cdots$ such that after telescoping between levels $m_{i-1}$ and $m_i$, the new matrices $M'(i) := M(m_{i-1}, m_i)$ are strictly positive, for all $i \ge 1$. In this way, it is analogous to the notion of primitive used for incidence matrices of SFTs or S-adic transformations, except that primitive requires that $\sup_i m_i - m_{i-1} < \infty$; see Definition 4.42.

If the Bratteli diagram is stationary (i.e. $M(i) = M$ is the same matrix for all $i \ge 2$), then the path space $X_{BV}$ is identical to the path space of an edge-labeled transition graph associated to $M$. The Vershik map $\tau : X_{BV} \to X_{BV}$ that we will define below, however, is quite different$^5$ from the left-shift $\sigma$. The latter is hyperbolic and of positive entropy$^6$ whereas $\tau$ is not hyperbolic and, on stationary Bratteli diagrams, has zero entropy.

$^5$ Without going into details, the shift and Vershik map can be likened to the geodesic flow and horocycle flow on a manifold of curvature $-1$; see the interesting exposition in [487].
$^6$ See [487] for a comparison to geodesic and horocycle flow.

If $x = x_1 \dots x_N$ and $y = y_1 \dots y_N$, with $x_i, y_i \in E_i$ for $1 \le i \le N$, are finite paths, then we can compare $x$ and $y$ if they have the same endpoint in $V_N$. Let $m \le N$ be the largest index such that $x_m \ne y_m$. This means that $t(x_m) = t(y_m)$, and we say that $x < y$ if $x_m < y_m$. This gives a partial order on the set of $N$-paths and a total order on the set of $N$-paths ending in the same $v \in V_N$.

For every $v \in V_i$, there is a unique minimal path from $v$ up to $v_0$ and at least one $e \in E_{i+1}$ with $s(e) = v$. From this it follows that there are infinite paths $x \in X_{BV}$ such that the initial $N$-path $x^{\min}_{[1,N]}$ is minimal among all $N$-paths with the same terminal vertex. That is, the collection $X^{\min}_{BV}$ of minimal infinite paths is non-empty, and at the same time, every $v \in V_N$ can have only one minimal incoming path $x^{\min}_{[1,N]}$ terminating in $v$. If $X^{\min}_{BV}$ consists of a single element, we denote it as $x^{\min}$. The same is true for the collection $X^{\max}_{BV}$ of maximal infinite paths, and if $X^{\max}_{BV}$ consists of a single element, we denote it as $x^{\max}$. In other words, we have proved the following lemma:

Lemma 5.33. For every ordered Bratteli diagram $X_{BV}$, we have
$$1 \le \#X^{\min}_{BV},\ \#X^{\max}_{BV} \le \liminf_{n \to \infty} \#V_n.$$

The Vershik adic transformation (Vershik map) $\tau : X_{BV} \to X_{BV}$ is defined as follows [546]: For $x \in X_{BV}$, let $i$ be minimal such that $x_i \in E_i$ is not the maximal incoming edge. Then set
$$\begin{cases} \tau(x)_j = x_j \text{ for } j > i, \\ \tau(x)_i \text{ is the successor of } x_i \text{ among all incoming edges at this level}, \\ \tau(x)_1 \dots \tau(x)_{i-1} \text{ is the minimal path connecting } v_0 \text{ with } s(\tau(x)_i). \end{cases}$$
If no such $i$ exists, then $x \in X^{\max}_{BV}$, and we need to choose $y \in X^{\min}_{BV}$ to define $\tau(x) = y$. Whether $\tau$ extends continuously to $X^{\max}_{BV}$ depends on how well we can make this choice. Medynets [413] gave an example of a Bratteli diagram that doesn't allow any ordering by which $\tau$ is continuously extendable, even if $\#X^{\min}_{BV} = \#X^{\max}_{BV}$; see Figure 5.6 (right). For this diagram the only incoming edges to $u \in V_n$ come from $u \in V_{n-1}$, and therefore there is a minimal and a maximal path going through vertices $u$ only. By the same token, there is a minimal and a maximal path going through vertices $w$ only. No matter how $\tau$ is defined on these two maximal paths, there is no way of putting an order on the incoming edges to $v \in V_n$ such that this definition makes $\tau$ continuous at these maximal paths.
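The definition of the Vershik map can be made concrete on finite paths. The sketch below uses a hypothetical encoding: `incoming[i][v]` is the ordered list of sources of the edges in $E_{i+1}$ entering $v \in V_{i+1}$, and an edge is stored as a pair (target, rank), where rank is its position in that order. Neither the encoding nor the function names come from the text.

```python
def minimal_path(incoming, level, vertex):
    """Minimal path from v0 to `vertex` in V_level, as a list of (target, rank) edges."""
    path = []
    for i in range(level, 0, -1):
        path.append((vertex, 0))               # the minimal incoming edge has rank 0
        vertex = incoming[i - 1][vertex][0]    # move to its source in V_{i-1}
    return path[::-1]

def vershik(incoming, path):
    """Successor of a finite path, or None if every edge of the path is maximal."""
    for i, (v, rank) in enumerate(path):
        if rank + 1 < len(incoming[i][v]):     # first edge that is not maximal
            source = incoming[i][v][rank + 1]  # source of the successor edge
            return minimal_path(incoming, i, source) + [(v, rank + 1)] + path[i + 1:]
    return None

# One vertex per level and two ordered incoming edges: a dyadic odometer.
levels = [{"v": ["v0", "v0"]}, {"v": ["v", "v"]}, {"v": ["v", "v"]}]
x = minimal_path(levels, 3, "v")
ranks = []
while x is not None:
    ranks.append(tuple(rank for _, rank in x))
    x = vershik(levels, x)
```

On this diagram the successor map is binary counting with the lowest digit first, and it stops at the unique maximal path, where the definition requires a separate choice.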

Remark 5.34. For Bratteli-Vershik systems associated to a Kakutani-Rokhlin partition, the partition $P_n$ is formed by the $n$-cylinders, represented by $n$-paths connecting $v_0$ with some $v \in V_n$. There are then $h_v(n)$ such paths, and the smallest path corresponds to the base elements $B_v(n)$.

Example 5.35. There are examples where $X_{BV}$ is not a Cantor set. The two examples in Figure 5.6 (left) have opposite ordering. On the left, $X^{\min}_{BV}$ consists of the (vertex-labeled) paths $e := v_0 \to v \to v \to v \to \cdots$ and $e' := v_0 \to v' \to v' \to v' \to \cdots$, and $X^{\max}_{BV}$ consists of the path $v_0 \to v \to v \to v \to \cdots$. That is, $e$ is both minimal and maximal, and defining $\tau(e) = e$ is the only continuous possibility. The path $e'$ is an isolated point (as are in fact all paths different from $e$). The resulting system is conjugate to $f : Y \to Y$ for $Y := \{\frac1n : n \in \mathbb{N}\} \cup \{0\}$, defined by $f(0) = 0$, $f(\frac1n) = \frac{1}{n+1}$. Note that $f$ is not surjective.

On the right, $X^{\min}_{BV} = \{e\}$ and $X^{\max}_{BV} = \{e, e'\}$, and $\tau(e) = e$ is the only continuous possibility, but then $\tau(e') = e$ is forced too (since we want $\tau(X^{\max}_{BV}) \subset X^{\min}_{BV}$). Now $\tau$ is conjugate to $f^{-1} : Y \to Y$ defined as $f^{-1}(0) = f^{-1}(1) = 0$ (compensating for $f$ not being surjective) and $f^{-1}(\frac{1}{n+1}) = \frac1n$ for $n \ge 1$.

Figure 5.6. Non-Cantor set Bratteli diagrams and a non-simple Bratteli diagram.

Proposition 5.36. The Vershik map $\tau$ can be extended continuously on an ordered Bratteli diagram in the following situations:
• $\#X^{\min}_{BV} = \#X^{\max}_{BV} = 1$. In this case $\tau : X_{BV} \setminus X^{\max}_{BV} \to X_{BV} \setminus X^{\min}_{BV}$ extends to a homeomorphism of $X_{BV}$.
• $2 \le \#X^{\min}_{BV} \le \#X^{\max}_{BV}$ and $X^{\min}_{BV}$ and $X^{\max}_{BV}$ have non-empty interiors. If also $\tau : X_{BV} \setminus X^{\max}_{BV} \to X_{BV} \setminus X^{\min}_{BV}$ is uniformly continuous, then $\tau$ extends to an endomorphism of $X_{BV}$. If $\#X^{\min}_{BV} = \#X^{\max}_{BV}$, then this extension is a homeomorphism.

The first part was already addressed in [310, Section 2]; they called such Bratteli-Vershik systems “essentially simple” where currently the word “properly ordered” is used; see Definition 5.38. The question has been investigated in detail by Bezuglyi et al. [81, 82]$^7$, calling such Bratteli-Vershik systems perfect. A different account is due to Downarowicz & Karpel [212, 213]. They call a Bratteli-Vershik system decisive if $\tau$ can be extended to a homeomorphism in a unique way. According to [212, Lemma 6.11], a Bratteli-Vershik system is decisive if and only if $\tau$ is uniformly continuous on $X_{BV} \setminus X^{\max}_{BV}$ and the interior of $X^{\max}_{BV}$ is either empty or a single isolated point.

$^7$ Also for infinite rank Bratteli diagrams, see Definition 5.41 below.

Example 5.37. There are four ways to assign a stationary order to a stationary Bratteli diagram with incidence matrix $\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$; see Figure 5.7.
• The right two cases represent the Thue-Morse shift. They have two minimal and two maximal infinite paths, but it is impossible to define $\tau : X^{\max}_{BV} \to X^{\min}_{BV}$ so that $\tau : X_{BV} \to X_{BV}$ becomes continuous, let alone a homeomorphism. As shown in [81, Example 3.5] (see also [212, Example 6.12]), it is not possible to extend $\tau$ continuously, also for non-stationary orders, as soon as there are two minimal paths.
• The left two cases have only one minimal and maximal path. Now $\tau$ can be extended continuously to a homeomorphism of $X_{BV}$, and it is conjugate to the dyadic odometer; see Remark 5.55, [81, Proposition 2.20], or [255, Section 5] which are concerned with the characterization of odometers as Bratteli-Vershik systems. This also follows because $(X_{BV}, \tau)$ represents an invertible Toeplitz shift; see [208, below Theorem 5.1] and Theorem 5.54 and the text below it.

Figure 5.7. Different orders on the same stationary Bratteli diagram.

Definition 5.38. A Bratteli-Vershik system $(X_{BV}, \tau)$ is called properly ordered if it is simple (as in Definition 5.32) and $\#X^{\min}_{BV} = \#X^{\max}_{BV} = 1$.

Proposition 5.39. All well-defined forward orbits of a simple Bratteli-Vershik system are dense; in particular, a properly ordered Bratteli-Vershik system is minimal.

Proof. Take any cylinder set $[x_1 \dots x_n]$; it corresponds to an $n$-path from $v_0$ to some $v \in V_n$. The map $\tau$ goes cyclically through all paths from $v_0$ to $v$, except that there are other paths (not to $v$) between the maximal path from $v_0$ to $v$ and the reappearance of the minimal path between $v_0$ and $v$. Recall from Definition 5.28 that $h_v(n) = \#\{\text{paths between } v_0 \text{ and } v \in V_n\}$. By transitivity, there is $k \ge 1$ such that $M(n, n+k) = M(n) \cdots M(n+k)$ is strictly positive. Hence, in whichever way $x_1 \dots x_n$ continues to $x_1 \dots x_{n+k}$, it will take no more than the maximal column-sum $K$ of $M(n, n+k)$ of successor paths of $x_{n+1} \dots x_{n+k}$ before a path $x_{n+1} \dots x_{n+k}$ appears with $s(x_{n+1}) = v$. Therefore, if $x \in [x_1 \dots x_n]$ has a well-defined orbit, then there is $0 < j \le K \cdot \max_{v \in V_n} h_v(n)$ such that $\tau^j(x) \in [x_1 \dots x_n]$. This proves that $\mathrm{orb}_\tau(x)$ is dense and uniformly recurrent. Hence, if $X_{BV}$ is properly ordered and, in particular, $\tau$ is a homeomorphism, then Proposition 2.17 implies that $(X_{BV}, \tau)$ is minimal.


Figure 5.6 illustrates the effect of reversing the order of each collection of incoming edges:

Lemma 5.40. The system $(X_{BV}, \le, \tau)$ is conjugate to $(X_{BV}, \ge, \tau^{-1})$ wherever $\tau$ is well-defined and injective.

That is, if we reverse the order on the incoming edges everywhere, we obtain a system whose inverse Vershik map is conjugate to the original system. In particular, the sets of minimal and maximal paths exchange roles.

Definition 5.41. A Bratteli diagram $B = ((E_i)_{i\in\mathbb{N}}, (V_i)_{i\in\mathbb{N}})$ has rank $r_B := \liminf_i \#V_i$. A Bratteli diagram has rank $r$ if $r$ is the smallest integer such that $(X_{BV}, \tau)$ is isomorphic to a system on a Bratteli diagram with $r_B = r$. If no such finite $r$ exists, then the Bratteli diagram is said to have infinite rank.

As was shown in [204], every minimal subshift with sublinear word-complexity can be represented as a Bratteli-Vershik system of finite rank. There are, however, minimal subshifts with superlinear word-complexity that can be represented as a Bratteli-Vershik system of finite rank.

Exercise 5.42. Show that telescoping and microscoping produce conjugate Bratteli-Vershik systems. Show that the rank $r \le r_B$ and that the inequality can be strict. Show that a rank $r_B$ Bratteli diagram can be telescoped to a Bratteli diagram with $r_B = \#V_i$ for all $i$.

Example 5.43. The Pascal Bratteli diagram is characterized by $V_k = \{0, 1, \dots, k\}$ and $E_k = \{i \to i : 0 \le i \le k-1\} \cup \{i \to i+1 : 0 \le i \le k-1\}$. The corresponding Bratteli-Vershik system has uncountably many measures that are ergodic w.r.t. the equivalence relation of having the same tail; namely every $(p, 1-p)$-Bernoulli measure represents one such ergodic measure; see [415] and e.g. [254, 256] for more results on infinite rank Bratteli-Vershik systems.

In this text we will confine ourselves to finite rank systems. The following result, reminiscent of the Auslander-Yorke dichotomy, is due to Downarowicz & Maass [215] in the minimal case and Bezuglyi et al. [78, Theorem 4.8] for aperiodic systems.

Theorem 5.44. Every minimal Bratteli-Vershik system of finite rank is either expansive (namely if its rank $r \ge 2$) or conjugate to an odometer. If the Bratteli-Vershik system is not minimal but aperiodic and if no minimal component is conjugate to an odometer, then it is expansive.

Example 5.45. The non-primitive substitution of Figure 5.8 illustrates the second part of Theorem 5.44. The corresponding Bratteli-Vershik system $(X_{BV}, \tau)$ is invertible, because there is a unique $x^{\min} = 000\dots$ and a unique $x^{\max} = 1111\dots$. Since $\chi$ is not primitive, $(X_{BV}, \tau)$ cannot be simple, and


5. Further Minimal Cantor Systems

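[Figure 5.8: Bratteli diagram with root $v_0$ and vertices 0, 1, 2, 3 at each level, drawn next to the substitution]

$$\chi:\ \begin{cases} 0 \to 01,\\ 1 \to 01,\\ 2 \to 02031,\\ 3 \to 02131. \end{cases}$$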

Figure 5.8. A transitive, non-primitive Bratteli diagram.

hence not minimal. The subdiagram $(X'_{BV}, \tau)$ using only symbols 0 and 1 (dashed in the figure) is a minimal subsystem which is actually conjugate to a dyadic odometer as in Example 5.37. However, there are also points with dense orbits, such as $y = 2222\dots$. The point $z = 3333\dots$ is non-recurrent, because every 3 that disappears when iterating $\tau$ will never reoccur. Instead, $\mathrm{orb}_\tau(y)$ accumulates on $X'_{BV}$. For more information on Bratteli-Vershik systems with this kind of non-minimal structure, see [77].
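The non-primitivity claimed in Example 5.45 can be checked mechanically. The following is a minimal sketch, assuming the standard criterion that a substitution is primitive exactly when some power of its incidence matrix is strictly positive (Wielandt's bound limits the powers to test); the helper names `incidence_matrix`, `mat_mul` and `is_primitive` are illustrative and not taken from the text.

```python
def incidence_matrix(chi, alphabet):
    """M[k][l] = number of occurrences of letter k in the word chi(l)."""
    return [[chi[l].count(k) for l in alphabet] for k in alphabet]


def mat_mul(a, b):
    """Product of two square integer matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_primitive(m):
    """True if some power M^p with p <= (n-1)^2 + 1 is strictly positive."""
    n = len(m)
    power = m
    for _ in range((n - 1) ** 2 + 1):
        if all(x > 0 for row in power for x in row):
            return True
        power = mat_mul(power, m)
    return False


# Substitution of Figure 5.8 and its restriction to the letters 0 and 1.
chi_fig = {"0": "01", "1": "01", "2": "02031", "3": "02131"}
full = incidence_matrix(chi_fig, "0123")
restricted = incidence_matrix(chi_fig, "01")
print(is_primitive(full), is_primitive(restricted))
```

The full matrix fails the test because the letters 2 and 3 never occur in iterated images of 0 and 1, while the restriction to {0, 1} passes, in line with the minimal subsystem described in the example.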

Definition 5.46. A pointed dynamical system $(X, T, x)$ is essentially minimal if for every open set $U \ni x$, $\bigcup_n T^n(U) = X$.

Thus essentially minimal is a pointed version of topologically exact. Also, $(X \setminus \{x\}, T)$ is essentially minimal in the sense of Definition 2.25. Minimality implies essential minimality, but even if $x$ is a fixed point, $(X, T, x)$ can still be essentially minimal. For example, the Bratteli-Vershik system in the left panel of Figure 5.9 has two minimal sequences $x = SSS\dots$ and $y = 000\dots$ and two maximal sequences $x$ and $z = 333\dots$. If we set $\tau(x) = x$ and $\tau(z) = y$, then $x$ is fixed but the whole system $(X_{BV}, \tau, x)$ is essentially minimal. We can call $x$ the spacer path, because it plays the role of spacer in a cutting and stacking system; see Example 5.11, II. If we set $\tau(x) = y$ and $\tau(z) = x$, then the whole system $(X_{BV}, \tau, x)$ is minimal. For neither choice does $\tau$ extend continuously to $X_{BV}^{\max}$.

The following theorem from [310] says that Bratteli-Vershik systems model every essentially minimal Cantor system.


Figure 5.9. A Bratteli-Vershik system isomorphic to the Chacon substitution.

Theorem 5.47. For every essentially minimal homeomorphism $T$ on the Cantor set $X$ and $x \in X$, there exists a properly ordered Bratteli-Vershik system $(X_{BV}, \tau)$ such that $(X, T, x)$ is pointedly conjugate to $(X_{BV}, \tau, x^{\min})$.

Example 5.48. The Bratteli-Vershik systems in Figure 5.9 are both isomorphic to the Chacon substitution shift (see Example 1.27) generated by the fixed point

$$\rho = 0010\ 0010\ 1\ 0010\ 0010\ 0010\ 1\ 0010\ 1\ 0010\ 0010\ 1\ 0010 \dots$$

of the Chacon substitution

$$\chi_{\mathrm{chac}}:\ \begin{cases} 0 \to 0010,\\ 1 \to 1. \end{cases}$$

The one on the left is not properly ordered, because it has two minimal sequences x and y and two maximal sequences x and z as we saw below Definition 5.46. The one on the right, constructed by Park [444], is properly ordered. See [225] for general results on finding properly ordered Bratteli-Vershik systems.
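As a quick check of the fixed point in Example 5.48, one can iterate the Chacon substitution on the letter 0. This is only a sketch; the function name `iterate_substitution` is an illustrative choice.

```python
def iterate_substitution(chi, seed, n):
    """Apply the substitution chi (a dict letter -> word) n times to seed."""
    word = seed
    for _ in range(n):
        word = "".join(chi[a] for a in word)
    return word


chi_chac = {"0": "0010", "1": "1"}
rho3 = iterate_substitution(chi_chac, "0", 3)
rho4 = iterate_substitution(chi_chac, "0", 4)

# The prefix of rho displayed in Example 5.48, with the spaces removed.
displayed = "0010 0010 1 0010 0010 0010 1 0010 1 0010 0010 1 0010".replace(" ", "")
print(rho3 == displayed, rho4.startswith(rho3))
```

Since the image of 0 starts with 0, each iterate is a prefix of the next one, so the iterates converge to the fixed point $\rho$.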


5.4.1. Bratteli-Vershik Systems and S-adic Shifts. Substitutions are a common way to build minimal subshifts; in fact, if we are allowed to build the substitution shift on a countable collection of substitutions, i.e. apply S-adic shifts, then every minimal Cantor system can be expressed in this way.

For each $i \ge 1$, let $V_i$ be a finite alphabet, and let $V_i^*$ denote the collection of finite words in this alphabet. For $i \ge 1$, let $\chi_i : V_i \to V_{i-1}^*$ be a substitution; hence, to each $v \in V_i$, $\chi_i$ assigns a string of letters from $V_{i-1}$. The substitution acts on strings by concatenation:

$$\chi_i(v_1 v_2 \dots v_N) = \chi_i(v_1)\chi_i(v_2)\dots\chi_i(v_N).$$

To each substitution $\chi_i$ we associate the incidence matrix $M(i)$ of size $\#V_{i-1} \times \#V_i$, where the entry $m_{k,l}(i)$ denotes the number of appearances of the $k$-th letter of $V_{i-1}$ in the $\chi_i$-image of the $l$-th letter of $V_i$. Hence $M(i)$ is the associated matrix of the $i$-th substitution; see Definition 4.14.

By iterating the substitutions, we can construct an infinite string

$$s = \lim_{i\to\infty} \chi_2 \circ \chi_3 \circ \cdots \circ \chi_i(v),$$

where $v$ is taken from $V_i$. For completeness, we also set $\chi_1(v) = 0$ for every $v \in V_1$. Using the irreducibility condition

$$\forall i\ \exists w \in V_{i-1}\ \exists j > i\ \forall v \in V_j:\ \text{the word } \chi_i \circ \cdots \circ \chi_j(v) \text{ starts with } w,$$

the limit can be shown to exist, independently of the choice of $v \in V_j$. Moreover, $s$ is a uniformly recurrent string in $V_1^{\mathbb{N}}$. It generates a minimal subshift $(\Sigma, \sigma)$, where $\sigma$ is the left-shift and $\Sigma = \overline{\{\sigma^n(s) : n \ge 0\}}$, the closure taken with respect to product topology.

Example 5.49. The best-known examples are of course the stationary substitutions; i.e. $V_i \equiv V$ and $\chi_i \equiv \chi$. For example, the Fibonacci substitution acts on the alphabet $\{0, 1\}$ by

$$\chi_{\mathrm{Fib}}:\ \begin{cases} 0 \to 01,\\ 1 \to 0, \end{cases}$$

and $s = 0100101001001\dots$. This sequence is equal to the sequence of first labels of $\{\tau^n(x^{\min})\}_{n\ge 0}$ in the Fibonacci Bratteli diagram in Figure 5.13. As a result, the Fibonacci substitution is isomorphic to the Fibonacci Bratteli diagram, which in turn is isomorphic to the Fibonacci enumeration system.

The following result can be found in e.g. [225].

Lemma 5.50. Every S-adic shift such that each letter $v \in V_{i-1}$, $i \ge 2$, appears in some word $\chi_i(w)$, $w \in V_i$, is isomorphic to a Vershik transformation on an ordered Bratteli diagram and vice versa.

If the substitutions $(\chi_i)$ are such that for every $i \ge 2$, there is $j_0 \ge 1$ such that

$$\chi_i \circ \cdots \circ \chi_j(v) \text{ starts with the same symbol for all } j \ge j_0,\ v \in V_j, \tag{5.9}$$

then the corresponding Bratteli diagram has a unique minimal path $x^{\min}$.

Proof. The vertices of the Bratteli diagram coincide with the alphabets $V_i$ (for this reason we choose the same notation), except that the Bratteli diagram has a first level $V_0 = \{v_0\}$. Let$^{8}$ $E_1 = \{v_0 \to v \mid v \in V_1\}$. For each $v \in V_i$, $i \ge 1$, there is an incoming edge $w \to v$ for each appearance of $w \in V_{i-1}$ in $\chi_i(v)$, and the ordering of the incoming edges in $v$ is the same as the order of the letters in $\chi_i(v)$. It follows that the incidence matrices of the substitution $\chi_i$ coincide with incidence matrices associated to the edges $E_i$. Hence $M(i)$ is the transpose of the matrix associated to $\chi_i$.

Clearly, the Bratteli diagrams and substitutions $(\chi_i)_{i\ge 2}$ are in one-to-one correspondence, provided every $w \in V_{i-1}$ appears in at least one $\chi_i(v)$, $v \in V_i$. Let $v_i \in V_i$ be the symbol indicated by (5.9); then it easily follows that $v_{i-1}$ is the first symbol of $\chi_i(v_i)$ and that $x^{\min} := v_0 \to v_1 \to v_2 \to v_3 \to \cdots$ is the unique minimal element.

The sequence $s = \lim_j \chi_2 \circ \cdots \circ \chi_j(v)$ can be read off as $s_n = s(\tau^n(x^{\min})_2)$. In other words, $s_n$ records the vertex in $V_1$ that the $n$-th $\tau$-image of $x^{\min}$ goes through. The way to see this is the following: Since the incoming edges to $w \in V_2$ are ordered as in $\chi_2(w)$, a path starting with $\chi_2(w)_1 \to w$ is followed by a path starting with $\chi_2(w)_2 \to w$, etc. Because this is true for every vertex in every level $V_i$, the required sequence $s$ will emerge. □

$^{8}$ In fact, in the construction of Theorem 4.120, there is an extra edge $(1 \to 2) \in E_1$. This gives rise to an isomorphic Bratteli diagram.

Remark 5.51. A graph cover map $f : \Gamma \to \Gamma$ on the inverse limit

$$\Gamma = \varprojlim(\Gamma_n, \pi_n) = \{(\gamma_n)_{n\ge 0} : \pi_{n+1}(\gamma_{n+1}) = \gamma_n \in \Gamma_n \text{ for all } n \ge 0\}$$

(where $\Gamma_0$ is a single loop from a single vertex) can be turned into a Bratteli-Vershik system as follows. The edges $a \in \Gamma_n$ correspond bijectively to the vertices $v \in V_n$ of an ordered Bratteli diagram. We call the bijection $P_n$. If the bonding map $\pi_n$ maps $a$ to the concatenation $\pi_n(a) = a_1 \cdots a_k$ of edges in $\Gamma_{n-1}$, then we draw $k$ incoming edges $e_i \in E_n$ to $v = P_n(a)$ from the vertices $P_{n-1}(a_i) = s(e_i) \in V_{n-1}$, $i = 1, 2, \dots, k$, in this order. The map $f : \Gamma \to \Gamma$ will then lift to the Vershik map $\tau$ on the path space $X_{BV}$.

Conversely, when given an ordered Bratteli diagram $(E_n, V_n)_{n\ge 1}$, the vertices $v \in V_n$ correspond bijectively to edges $a = P_n^{-1}(v)$ in $\Gamma_n$. The


ordered set of incoming edges $e_i \in E_n$ to $v$ determines the bonding map $\pi_n$ by the concatenation $\pi_n(a) = a_1 \dots a_k$ if $P_{n-1}(a_i) = s(e_i)$. The Vershik map $\tau$ then translates to the graph cover map $f$. The difficult step of how to connect the edges of $\Gamma_n$, i.e. how to determine the vertices of $\Gamma_n$, is solved by Shimomura [502] using an equivalence relation of which the equivalence classes in $V_n$ are called clusters. Vertices in the cluster correspond to edges in $\Gamma_n$ with the same terminal vertex.

5.4.2. Bratteli-Vershik Systems and Interval Exchange Transformations. As in Section 4.4, let $T : [0, 1) \to [0, 1)$ be an interval exchange transformation of $d$ intervals $\Delta_i = [\gamma_{i-1}, \gamma_i)$, $i = 1, \dots, d$ and $\gamma_0 = 0$. Since the subshifts of interval exchange transformations can be seen as S-adic shifts (using Rauzy induction), ordered Bratteli diagrams can be constructed for them in the same way as in Section 5.4.1; see specifically Table 4.3. A more direct construction was given by Gjerde & Johansen [275] (see also [222]).

Theorem 5.52. The subshift associated to an IET on $d$ intervals has an ordered Bratteli diagram $((E_i)_{i\ge 1}, (V_i)_{i\ge 0})$ where

• $d \ge \#V_i \ge \#V_{i+1}$ for all $i \ge 1$. If $\#V_{i+1} < \#V_i$, then$^{9}$ $\#V_{i+1} = \#V_i - 1$ and $\lim_i \#V_i = 1 + \#\{\text{disjoint orbits of discontinuity points}\}$.

• If $\#V_{i-1} = \#V_i = m$, then the incidence matrix of edge set $E_i$ has the form

$$M(i) = \begin{pmatrix}
1 & 0 & \cdots & & & & \cdots & 0\\
0 & 1 & & & & & & \vdots\\
\vdots & & \ddots & & & & & \\
 & & & 1 & 1 & 0 & & \\
 & & & 0 & 0 & 1 & & \\
\vdots & & & & & & \ddots & 0\\
0 & \cdots & & 0 & 0 & 0 & \cdots & 1\\
s_1 & s_2 & \cdots & s_k & s_{k+1} & s_{k+2} & \cdots & s_m
\end{pmatrix} \tag{5.10}$$

for $s_j \in \{0, r, r+1\}$ for some $r \ge 0$. The matrices $M(i)$ are all unimodular. If $\#V_{i-1} = \#V_i + 1 = m$, then the $k+1$-st row in the matrix should be removed.

• The order on the incoming edges $\{e \in E_i : t(e) = v\}$ is left to right.

However, not every Bratteli-Vershik system of this form corresponds to an IET.

$^{9}$ This corresponds to the loss of an interval for some first return map due to a critical connection and is hence not generic.
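The unimodularity claim can be sanity-checked for the simplest building block, the Type 1 Rauzy step $j \to j$ ($1 \le j \le e$), $e+1 \to ed$, $j \to j-1$ ($e+1 < j \le d$) that appears in the proof. The sketch below assumes the convention of Section 5.4.1 that entry $(k, l)$ counts the letter $k$ in the image of the letter $l$; the function names `type1_incidence` and `det` are illustrative only.

```python
def type1_incidence(d, e):
    """Incidence matrix of chi: j -> j (j <= e), e+1 -> e d, j -> j-1 (j > e+1).

    Entry [k-1][l-1] is the number of occurrences of letter k in chi(l).
    """
    chi = {}
    for j in range(1, d + 1):
        if j <= e:
            chi[j] = [j]
        elif j == e + 1:
            chi[j] = [e, d]
        else:
            chi[j] = [j - 1]
    return [[chi[l].count(k) for l in range(1, d + 1)] for k in range(1, d + 1)]


def det(m):
    """Exact integer determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for c, entry in enumerate(m[0]):
        if entry:
            minor = [row[:c] + row[c + 1:] for row in m[1:]]
            total += (-1) ** c * entry * det(minor)
    return total


for d in range(2, 7):
    for e in range(1, d):
        print(d, e, det(type1_incidence(d, e)))
```

Laplace expansion is exponential in $d$, which is harmless for the small alphabets used here; for larger matrices a fraction-free elimination would be the more reasonable choice.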


Proof. The proof is based on an accelerated version of Rauzy induction, although [275] doesn't mention Rauzy and basically reinvents the procedure. First all the points $x$ in the backward orbits and left forward orbits of discontinuity points $\gamma_i$ are doubled to $x^-$ and $x^+$. This transforms $[0, 1)$ into a Cantor set with order topology, provided the Keane condition holds. Then consider the first return map $T_1$ to $[0, \gamma_{d-1}) = \bigcup_{i=1}^{d-1} \Delta_i$. Let $e < d$ be the index of the interval such that $T(\Delta_e) \ni 1$.

• If $|\Delta_d| < |\Delta_e|$, then the procedure coincides with a Type 1 Rauzy induction step; see Figure 5.10. The corresponding substitution (as in Table 4.3) and incidence matrix are

$$\chi:\ \begin{cases} j \to j, & 1 \le j \le e,\\ e+1 \to ed,\\ j \to j-1, & e+1 < j \le d, \end{cases}
\qquad
M(i) = \begin{pmatrix}
1 & 0 & \cdots & & \cdots & 0\\
0 & \ddots & & & & \vdots\\
\vdots & & 1 & 1 & 0 & \\
\vdots & & 0 & 0 & \ddots & 0\\
\vdots & & & & \ddots & 1\\
0 & \cdots & 0 & 1 & 0 & 0
\end{pmatrix}.$$

Clearly $M(1)$ is unimodular and has the form (5.10).

[Figure 5.10: the intervals $\Delta_e$ and $\Delta_d$ with endpoints $\gamma_e$, $\gamma_d$, and their images $T(\Delta_d)$, $T(\Delta_e)$]

Figure 5.10. The first return map to $[0, \gamma_d^-]$ for the case $|\Delta_d| < |\Delta_e|$.

• If $|\Delta_e| = |\Delta_d|$, then $T(\gamma_e) = \gamma_d$ and we don't have to split $\Delta_e$. Instead

$$\chi:\ \begin{cases} j \to j, & 1 \le j < d,\ j \ne e,\\ e \to ed, \end{cases}
\qquad
M(i) = \begin{pmatrix}
1 & 0 & \cdots & \cdots & 0\\
0 & \ddots & \ddots & & \vdots\\
\vdots & \ddots & 1 & 0 & \vdots\\
\vdots & & 0 & 1 & 0\\
\vdots & & & 0 & 1\\
0 & \cdots & 1 & 0 & 0
\end{pmatrix}.$$

This is a rectangular matrix of the form (5.10) with a column removed.


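• If $|\Delta_d| > |\Delta_e|$, then there is $k < d$ and $r \ge 0$ such that $T^{1+r}(\Delta_k) \ni \gamma_d$ and $T^j(\Delta_k) \subset \Delta_d$ for $1 \le j \le r$. Thus $r \ge 1$ is the minimal iterate such that $T^r(1^-) \notin \Delta_d$. The intervals $\Delta_j$ mapped into $\Delta_d$ remain in $\Delta_d$ for $r$ or $r+1$ steps, depending on whether $T(\Delta_j)$ is to the left or right of $T(\Delta_k)$; see Figure 5.11. The first return map to $[0, \gamma_{d-1})$ is comprised of $r'$ Type 1 Rauzy steps and a single Type 0 Rauzy step, where

$$r' = r \times \#\{j : T(\Delta_j) \cap \Delta_d \ne \emptyset\} + \#\{j : T(\Delta_j) \text{ lies to the right of } T(\Delta_k)\}.$$

[Figure 5.11: the intervals $\Delta_e$, $\Delta_k$, $\Delta_d$ with endpoints $\gamma_e$, $\gamma_{d-1}$, and the images $T(\Delta_d)$, $T(\Delta_k)$, $T(\Delta_e)$, $T^2(\Delta_k)$]

Figure 5.11. The first return map to $[0, \gamma_{d-1})$ for the case $|\Delta_d| > |\Delta_e|$.

The corresponding substitution is

$$\chi:\ \begin{cases}
j \to jd^r \text{ or } jd^{r+1} & \text{if } 1 \le j < k \text{ and } T(\Delta_j) \subset [\gamma_{d-1}, 1),\\
j \to j & \text{if } 1 \le j < k \text{ and } T(\Delta_j) \subset [0, \gamma_{d-1}),\\
k \to kd^r,\\
k+1 \to kd^{r+1},\\
j \to (j-1)d^r \text{ or } (j-1)d^{r+1} & \text{if } k+1 < j \le d \text{ and } T(\Delta_{j-1}) \subset [\gamma_{d-1}, 1),\\
j \to j-1 & \text{if } k+1 < j \le d \text{ and } T(\Delta_{j-1}) \subset [0, \gamma_{d-1}),
\end{cases}$$

and the incidence matrix is

$$M(i) = \begin{pmatrix}
1 & 0 & \cdots & & & & \cdots & 0\\
0 & 1 & & & & & & \vdots\\
\vdots & & \ddots & & & & & \\
 & & & 1 & 1 & 0 & & \\
 & & & 0 & 0 & 1 & & \\
\vdots & & & & & & \ddots & 0\\
0 & \cdots & & 0 & 0 & 0 & \cdots & 1\\
s_1 & s_2 & \cdots & s_k & s_{k+1} & s_{k+2} & \cdots & s_m
\end{pmatrix}.$$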


Since $s_{k+1} = s_k + 1$, a simple computation shows that $M(1)$ is unimodular and has the form (5.10). (Alternatively, multiplying the incidence matrices of the Type 1 and Type 0 Rauzy induction steps gives this result as well.) If $T^{r+1}(\Delta_k)$ has $\gamma_{d-1}^-$ as right boundary point, then we don't need to split $\Delta_k$ and the $k+1$-st row is removed from (5.10).

Now continue inductively creating substitutions and incidence matrices for the $n$-th first return map $T_n : [0, a_n^-) \to [0, a_n^-]$ for $a_1^- = \gamma_d^-$ and $a_n^+$ the left endpoint of the rightmost interval on which $T_{n-1}$ is continuous. For the final statement that not every Bratteli-Vershik system is of this form, we refer to [275, Section 4]. □

5.4.3. Bratteli-Vershik Systems and Toeplitz Shifts. Gjerde & Johansen [274] gave a characterization of Bratteli-Vershik systems coming from Toeplitz shifts. The following notion is central in this characterization.

Definition 5.53. A Bratteli diagram has the equal path number property$^{10}$ if for every $i \ge 1$, the number of incoming edges $\#t^{-1}(v)$ is the same for all $v \in V_i$.

Theorem 5.54. A minimal shift is a Toeplitz shift if and only if it has a representation as Bratteli-Vershik system that is expansive, has a unique minimal path $x^{\min}$, and satisfies the equal path number property.

In fact, Gjerde & Johansen [274] proved this for invertible Toeplitz shifts, provided the Bratteli diagram is properly ordered, so it has both a unique minimal and a unique maximal path.

Remark 5.55. If the equal path number property holds, then the sequence $p = (p_i)_{i\in\mathbb{N}} \subset \mathbb{N}$, $p_i := \#t^{-1}(v)$ for $v \in V_i$, is well-defined. We label the incoming edges $e \in E_i$ with $t(e) = v$ in increasing order with the labels $0, 1, \dots, p_i - 1$. Then the labeling map $\psi : X_{BV} \to \Sigma_p$ assigning the sequence of labels to each path is a continuous factor map onto the $p$-adic odometer $(\Sigma_p, a)$ and $\psi \circ \tau = a \circ \psi$. This gives another way of seeing that odometers are factors of Toeplitz shifts.

Remark 5.56. Relating Toeplitz shifts to Kakutani-Rokhlin partitions, for a Toeplitz sequence with periodic structure $(q_n)_{n=1}^{\infty}$, the elements of $P_n$ are the sequences that share the same $q_n$-skeleton, and the heights $h_i(n) = q_n$.

In order to prove Theorem 5.54, we start with a lemma that holds for general Bratteli-Vershik systems with a unique minimal path.

$^{10}$ In [77] this property is called equal row sum (ERS).
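The equal path number property is easy to test on a finite portion of a diagram. In the following sketch an ordered level is encoded, purely as an assumption of this illustration, by a dict that maps each vertex $v$ to the ordered list of sources of its incoming edges; the function name `path_numbers` is hypothetical.

```python
from math import prod


def path_numbers(levels):
    """Return [p_1, p_2, ...] if every level has equal path numbers, else None.

    levels[i][v] is the ordered list of sources of the incoming edges of v.
    """
    p = []
    for level in levels:
        counts = {len(sources) for sources in level.values()}
        if len(counts) != 1:
            return None
        p.append(counts.pop())
    return p


# Stationary diagrams read off from two substitutions used in this section.
feigenbaum_level = {0: [1, 1], 1: [1, 0]}   # 0 -> 11, 1 -> 10
fibonacci_level = {0: [0, 1], 1: [0]}       # 0 -> 01, 1 -> 0

p = path_numbers([feigenbaum_level] * 3)
q = [prod(p[:i + 1]) for i in range(len(p))]  # number of paths into each level
print(p, q, path_numbers([fibonacci_level] * 3))
```

For the Feigenbaum levels every vertex has two incoming edges, so the sequence $p$ is constant equal to 2 and the labeling map of Remark 5.55 lands in the dyadic odometer; the Fibonacci levels fail the property because the two vertices have different numbers of incoming edges.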


Lemma 5.57. Every Bratteli-Vershik system with a unique minimal path can be telescoped such that the minimal incoming edge at every $v \in \hat V_{k+1}$, $k \ge 0$, has the same source $\hat u_k \in \hat V_k$.

Proof. First remove the minimal path $x^{\min}$ from the Bratteli diagram. Since there is only one minimal path, for no $v \in V_i$, $i \in \mathbb{N}$, there remains an infinite minimal path starting at $v$. That is, there is an increasing sequence $(i_k)_{k\in\mathbb{N}}$ such that no minimal path connects $V_{i_{k-1}}$ to $V_{i_k}$. Therefore, after telescoping between $V_{i_{k-1}}$ and $V_{i_k}$ for all $k \in \mathbb{N}$, obtaining a Bratteli diagram $(\hat E_k, \hat V_k)_{k\in\mathbb{N}}$, there is no minimal edge connecting $\hat V_{k-1}$ to $\hat V_k$ for any $k \in \mathbb{N}$. Now reinsert the (telescoped version of the) minimal path in $(\hat E_k, \hat V_k)_{k\in\mathbb{N}}$. This achieves the required property. □

Proof of Theorem 5.54. First we note that telescoping a Bratteli diagram preserves the equal path number property. Indeed, set $p_i = \#\{e \in E_i : t(e) = v \in V_i\}$; this is independent of $v$ because of the equal path number property. Then there are $q_i := \prod_{j=1}^{i} p_j$ paths connecting $v_0$ to each $v \in V_i$. If we telescope between $V_k$ and $V_i$, $k < i$, then the new $\hat p_i = \prod_{j=k+1}^{i} p_j$ is still independent of $v$.

Suppose the Vershik map is expansive, with expansivity constant $\delta > 0$. We can find $i$ such that the distance of every two paths with $x_j = y_j$ for all $j \le i$ is less than $\delta$. Therefore, if we telescope between $v_0$ and $V_i$, the new Bratteli-Vershik system $(\hat E, \hat V, \hat\tau)$ has the property that for every two distinct paths $x, y$, there is an $n$ such that $\hat\tau^n(x)$ and $\hat\tau^n(y)$ differ at the first entry.

Now we come to the proof, treating the “if”-part first:

⇐: Because there is a unique minimal path, Lemma 5.57 gives a telescoping after which the minimal incoming edge at every $v \in \hat V_{k+1}$, $k \ge 0$, has the same source $\hat u_k \in \hat V_k$. The analogous argument holds for maximal edges, provided we have a unique maximal path. This is useful to obtain an invertible Toeplitz shift (which is in fact an odometer; see [208, below Theorem 5.1]), but it is not required for one-sided Toeplitz shifts.

Now let $\hat p_k = \#\{e \in \hat E_k : t(e) = v \in \hat V_k\}$ and let $\hat q_k = \prod_{j=1}^{k} \hat p_j$ be the number of edges connecting $v_0$ to $v \in \hat V_k$. Then every path will go through $\hat u_k$ exactly once every $\hat q_{k+1}$ iterates of the Vershik map $\hat\tau$. Let $\theta_0 \dots \theta_{\hat q_k - 1}$ be the first $\hat q_k$ symbols read off at the first edge set $\hat E_1$ from the iterates $\hat\tau^j(x^{\min})$, $0 \le j < \hat q_k$. This word is repeated with period $\hat q_{k+1}$. Since $k$ is arbitrary, the full sequence $(\theta_j)_{j\ge 0} = (\hat\tau^j(x^{\min}))_{j\ge 0}$ is Toeplitz, as required.

⇒: For the opposite direction, we will first distill a sequence of KR-partitions $(P_n)_{n\ge 1}$ from the Toeplitz sequence $\theta = (\theta_j)_{j\ge 0}$ in alphabet $\mathcal{A}$.


We write $X = \overline{\mathrm{orb}_\sigma(\theta)}$ for the (one-sided) Toeplitz shift space. Let $q_1$ be the period of $\theta_0$, and let $V_1$ be the collection of $q_1$-prefixes of $\{\sigma^{kq_1}(\theta) : k \ge 0\}$. Next set the first base

$$B_v(1) := \{x \in X : x \text{ and } \theta \text{ share } q_1\text{-skeleton and } x_0 \dots x_{q_1-1} = v\}$$

for $v \in V_1$, so that we obtain the first Kakutani-Rokhlin partition $P_1 := \{\sigma^j(B_v(1)) : 0 \le j < q_1,\ v \in V_1\}$.

To continue the induction, suppose we have found $q_n$, $V_n$, $(B_v(n))_{v\in V_n}$, and $P_n = \{\sigma^j(B_v(n)) : 0 \le j < q_n,\ v \in V_n\}$. Let $q_{n+1}$ be the minimal period with which the word $\theta_0 \dots \theta_{q_n}$ appears in $\theta$. Since $\theta_0 \dots \theta_{q_{n-1}-1}$ is a prefix of $\theta_0 \dots \theta_{q_n}$, $q_{n+1}$ is a multiple of $q_n$. Let $V_{n+1}$ be the collection of $q_{n+1}$-prefixes of $\{\sigma^{kq_{n+1}}(\theta) : k \ge 0\}$. Next set the $n+1$-st base

$$B_v(n+1) := \{x \in X : x \text{ and } \theta \text{ share } q_{n+1}\text{-skeleton and } x_0 \dots x_{q_{n+1}-1} = v\}$$

for $v \in V_{n+1}$, so that we obtain the $n+1$-st Kakutani-Rokhlin partition $P_{n+1} := \{\sigma^j(B_v(n+1)) : 0 \le j < q_{n+1},\ v \in V_{n+1}\}$. Clearly $P_{n+1}$ refines $P_n$ and the height of each base element $B_v(n+1)$ is the same, namely $q_{n+1}$, for each $v \in V_{n+1}$.

Since all $v \in V_{n+1}$ have $\theta_0 \dots \theta_{q_n-1}$ as prefix, $\bigcup_{v\in V_{n+1}} B_v(n+1) \subset B_{\theta_0\dots\theta_{q_n-1}}(n)$, verifying condition (KR6) of Section 5.1. To check (KR4), suppose that $x \ne x' \in X$ and let $k$ be the smallest entry for which $x_k \ne x'_k$. Take $n$ such that $q_n > k$. If $x, x'$ are in different $q_n$-skeletons, then they belong to different levels in $P_n$. If $x, x'$ are in the same $q_n$-skeleton, then there is $j < q_n$ and two different $w, w' \in V_n$ such that $x \in \sigma^j(B_w(n))$ and $x' \in \sigma^j(B_{w'}(n))$. This shows that $(P_n)_{n\ge 1}$ separates points. Since all $P_n$'s are partitions into clopen sets, the $P_n$'s generate the topology of $X$. Hence $(P_n)_{n\ge 1}$ satisfies (KR1)–(KR4) and (KR6). Condition (KR5) may fail but can be achieved by taking a subsequence of $(P_n)_{n\ge 1}$.

Finally, to construct the Bratteli-Vershik system, the $V_n$ are the vertex sets. For each $v \in V_1$, the edge set $E_1$ contains $q_1$ edges connecting $v_0$ to each $v$. To get a simple cap of the Bratteli diagram, we can microscope between $\{v_0\}$ and $V_1$ by inserting a level $V_{1/2} = \mathcal{A}$ such that there is a single edge (labeled $a$) between $v_0$ and $a \in V_{1/2}$. There will be $q_1$ edges between $a$ and $v \in V_1$, ordered in the same way as the letters appear in $v$. The general $E_n$, $n \ge 2$, will, for each $v \in V_n$, contain $q_n/q_{n-1}$ edges connecting $v$ with $u \in V_{n-1}$, ordered in the same way that $\sigma^{(r_v+k)q_{n-1}}(\theta)$ visits the $u$'s. Here $r_v = \min\{r \ge 0 : \sigma^{rq_{n-1}}(\theta) \in v\}$. Clearly the equal path number property is satisfied. □
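The skeleton construction used in this proof can be explored numerically on the Feigenbaum sequence, the fixed point of the substitution $0 \to 11$, $1 \to 10$. The sketch below works on a finite prefix only, so "constant along a residue class" is tested on that prefix and is a heuristic rather than a proof; the names `feigenbaum_prefix` and `skeleton_holes` are illustrative.

```python
def feigenbaum_prefix(n_iter):
    """Return chi^n_iter(1) for chi: 0 -> 11, 1 -> 10, as a list of 0/1."""
    rule = {0: [1, 1], 1: [1, 0]}
    word = [1]
    for _ in range(n_iter):
        word = [b for a in word for b in rule[a]]
    return word


def skeleton_holes(theta, q):
    """Residues j < q whose class j, j+q, j+2q, ... is not constant on theta."""
    return [j for j in range(q) if len(set(theta[j::q])) > 1]


rho = feigenbaum_prefix(10)  # prefix of length 2**10
for i in range(1, 6):
    q = 2 ** i
    # The first half of every block of length 2q repeats the prefix of length q.
    blocks_ok = all(rho[s:s + q] == rho[:q] for s in range(0, len(rho), 2 * q))
    print(q, skeleton_holes(rho, q), blocks_ok)
```

On this prefix the $2^i$-skeleton leaves a single undetermined position, namely $2^i - 1$, which gets filled in at a later level; this is the nested periodic structure that the partitions $P_n$ record.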

[Figure 5.12: two panels, each showing the levels $v_0$, $V_1$, $V_2$, $V_3$, $V_4$ with edge labels 0 and 1 and vertices labeled 10, 11; 1011, 1010; 10111011, 10111010]

Figure 5.12. Bratteli diagrams for the Feigenbaum Toeplitz shift.

Example 5.58. The Feigenbaum sequence,

$$\rho_{\mathrm{feig}} = 10\ 11\ 1010\ 10111011\ 1011101010111010 \dots,$$

i.e. the fixed point of the Feigenbaum substitution $\chi_{\mathrm{feig}} : 0 \to 11,\ 1 \to 10$ (see Examples 1.6 and 4.86), is a Toeplitz sequence. Its periodic structure is $q_i = 2^i$, so $\rho_0 \dots \rho_{2^i-1}$ reappears with period $2^{i+1}$ for $i \ge 0$. We find $V_1 = \{10, 11\}$, $V_2 = \{1011, 1010\}$, and in general

$$V_n = \{\rho_0 \dots \rho_{2^n-2}0,\ \rho_0 \dots \rho_{2^n-2}1\}.$$

The resulting Bratteli diagram and its microscoped form (right panel) are given in Figure 5.12. This microscoped Bratteli diagram coincides with the one obtained from the Feigenbaum substitution. It is not properly ordered, because it has two maximal paths, which agrees with $\rho_{\mathrm{feig}}$ having two preimages in $X_{\mathrm{feig}}$.

5.4.4. Bratteli-Vershik Systems and Cutting and Stacking. Bratteli-Vershik systems are in one-to-one correspondence with cutting and stacking systems. The translation algorithm from ordered Bratteli-Vershik system to cutting and stacking in its short version is as follows: Start with $\#V_1$ stacks $S_i$, coded by $i \in V_1 = \{0, \dots, \#V_1 - 1\}$. Then by induction, assuming $S_v$, $v \in V_{n-1}$, are the stacks for $n \in \mathbb{N}$, we repeat the following two steps:

(1) Cut each stack $S_v$, $v \in V_{n-1}$, into $\#\{e \in E_n : s(e) = v\}$ slices $S_{v,e}$.

(2) If $e' \in E_n$ (with $s(e') = v'$) is the direct successor of $e \in E_n$ (with $s(e) = v$) among all edges with the same terminal vertex $w = t(e) = t(e') \in V_n$, then put slice $S_{v',e'}$ on top of $S_{v,e}$.

At every finite stage $n$, the codes at the bottom of the stacks represent minimal $n$-paths, and the codes at the top of the stacks represent maximal $n$-paths. This shows in particular that the number of minimal and maximal paths in an ordered Bratteli diagram is bounded by the rank.

Example 5.59. The first three iterations of the above algorithm are worked out for the Fibonacci Bratteli diagram in Figure 5.13. In this case, one part of one bottom level always stays at the bottom; this corresponds to the one minimal path $x^{\min}$. The top level of one part of the split stack and the top level of the other stack always stay on top; this corresponds to the two maximal paths.

[Figure 5.13: Bratteli diagram with root $v_0$ and levels $V_1$, $V_2$, $V_3$ with edge labels 0 and 1, next to the corresponding stacks]

Figure 5.13. The Fibonacci Bratteli diagram and equivalent cutting and stacking
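The two steps of the translation algorithm can be mimicked combinatorially. The sketch below ignores the widths of the slices and records only the codes of the levels of each stack from bottom to top; it assumes each level of the ordered diagram is given as a dict mapping a vertex $w$ to the ordered list of sources of its incoming edges. The function name `build_stacks` and this encoding are choices made for the illustration.

```python
def build_stacks(base_codes, levels):
    """Return the list of stack families produced by cutting and stacking.

    base_codes: the vertices of V_1, each giving a stack with one coded level.
    levels[n][w]: ordered sources of the incoming edges of w in the next level.
    Every stack is a list of codes read from bottom to top.
    """
    stacks = {v: [v] for v in base_codes}
    history = [stacks]
    for level in levels:
        # One slice of S_v is used per edge out of v; the slices ending in w
        # are piled up in the order of the incoming edges of w.
        stacks = {w: [code for v in sources for code in stacks[v]]
                  for w, sources in level.items()}
        history.append(stacks)
    return history


fibonacci_level = {0: [0, 1], 1: [0]}  # read off from 0 -> 01, 1 -> 0
history = build_stacks([0, 1], [fibonacci_level] * 4)
heights = [(len(h[0]), len(h[1])) for h in history]
print(heights)
print("".join(map(str, history[4][0])))
```

For the Fibonacci diagram the heights are consecutive Fibonacci numbers, the bottom code of every stack is 0 (the minimal path), and the tall stack reads off longer and longer prefixes of $s = 0100101001001\dots$.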

If $(X_{BV}, \tau)$ is equipped with a non-atomic $\tau$-invariant measure $\mu$, then we can give the precise width of all stacks and slices in terms of the $\mu$-measure of cylinder sets, and ultimately the precise form of the cutting and stacking interval map $T : [0, 1] \to [0, 1]$. We use here a simplified version of the algorithm described in [401]$^{11}$, for which we need, for each $n \in \mathbb{N}$ and $v \in V_{n-1}$, an order $\le_s$ on the outgoing edges $\{e \in E_n : s(e) = v\}$. This produces a total lexicographical order $\prec_{\mathrm{lex}}$ on $X_{BV}$: given $x \ne x' \in X_{BV}$, find the smallest $n \in \mathbb{N}$ such that $x_n \ne x'_n$ (whence $s(x_n) = s(x'_n)$) and set $x \prec_{\mathrm{lex}} x'$ if and only if $x_n <_s x'_n$ [...]

[...] $> 0$ and $\mu(A^c) > 0$, then the indicator function $\psi = 1_A$ is $T$-invariant, but not constant $\mu$-a.e. □

Exercise 6.8. Show that ergodicity of $\mu$ is equivalent to the following: if $\mu = \alpha\mu_1 + (1-\alpha)\mu_2$ for two measures and some $\alpha \in (0, 1)$, then $\mu = \mu_1 = \mu_2$. Conclude that if there is only one invariant probability measure, it has to be ergodic.

Example 6.9. A Sturmian shift is ergodic. To prove this, let $R_\alpha : \mathbb{S}^1 \to \mathbb{S}^1$ be an irrational circle rotation; it preserves Lebesgue measure $m$. We show that every $T$-invariant function $\psi \in L^2(m)$ must be constant. Indeed, write $\psi(x) = \sum_{n\in\mathbb{Z}} a_n e^{2\pi i n x}$ as a Fourier series. The $T$-invariance implies that $a_n e^{2\pi i n\alpha} = a_n$ for all $n \in \mathbb{Z}$. Since $\alpha \notin \mathbb{Q}$, this means that $a_n = 0$ for all $n \ne 0$, so $\psi(x) \equiv a_0$ is indeed constant. Since the Sturmian shift with rotation number $\alpha$ (with its unique invariant probability measure) is isomorphic (see Definition 6.53) to $(\mathbb{S}^1, R_\alpha, m)$, the ergodicity of the rotation carries over to the Sturmian subshift.

The set $M(T)$ of all invariant measures is convex; it is a special case of a Choquet simplex. The actual definition of a Choquet simplex is that it is a compact, metrizable, convex set in which every element can be decomposed


6. Methods from Ergodic Theory

uniquely$^{3}$ as a convex combination of extremal points$^{4}$ (this is an instance of Choquet's Theorem; see [460]). The set of probability measures has indeed this property, since, as Exercise 6.8 showed, the ergodic measures $M_{\mathrm{erg}}(T)$ are precisely the extremal points of this simplex. Hence, as a consequence of Choquet's Theorem, for every $\mu \in M(T)$, there is a probability measure $\nu$ on $M_{\mathrm{erg}}(T)$ such that

$$\mu(A) = \int_{M_{\mathrm{erg}}(T)} \mu_{\mathrm{erg}}(A)\, d\nu \quad \text{for every } A \in \mathcal{B}.$$

This is called the ergodic decomposition of $\mu$. The Choquet simplex is called a Poulsen simplex if its set of extremal points (here the ergodic measures) is dense. Up to homeomorphisms (and in fact affine homeomorphisms) there is only one non-singleton simplex in which the extremal points are dense, see [400], so we can speak of the Poulsen simplex. The next theorem is due to Sigmund [507, 508] with precursors by Ruelle [480].

Theorem 6.10. If $(X, T)$ is a continuous dynamical system with specification on a compact metric space, then the set of equidistributions on periodic orbits is dense in $M_{\mathrm{erg}}(T)$, so $M_{\mathrm{erg}}(T)$ is a Poulsen simplex. A fortiori, for every convex subset $V \subset M_{\mathrm{erg}}(T)$, there is $x \in X$ such that $V$ is exactly the set of weak$^*$ accumulation points of $\left(\frac{1}{n}\sum_{i=0}^{n-1} \delta_{T^i(x)}\right)_{n\in\mathbb{N}}$.

However, also for dynamical systems lacking specification, and in particular zero entropy subshifts, the Choquet simplex can be Poulsen. Downarowicz demonstrated that the family of Toeplitz shifts is so rich that for every simplex $\Sigma$, there is a Toeplitz shift whose Choquet simplex equals $\Sigma$; see [207]. Cortez & Rivera-Letelier [168] showed that for enumeration systems every simplex $\Sigma$ with a compact, totally disconnected set of extremal points can emerge as Choquet simplex. Kułaga-Przymus et al. [379] showed that for $\mathcal{B}$-free shifts with positive entropy, the set of shift-invariant measures is a Poulsen simplex.

6.2. Birkhoff's Ergodic Theorem

A simple consequence of having an invariant probability measure is:

Theorem 6.11 (Poincaré Recurrence Theorem). If $(X, \mathcal{B}, T)$ has an invariant probability measure $\mu$, then for every set $A \in \mathcal{B}$ and $\mu$-a.e. $x \in A$, there is $n \ge 1$ such that $T^n(x) \in A$.

This property of $\mu$ is called recurrence, hence the name of the theorem.

$^{3}$ Therefore a filled triangle is a Choquet simplex, but a filled square is not, because its center is the convex combination of the corners in multiple ways.

$^{4}$ I.e. points that cannot be written as non-trivial convex combinations of other points.


Proof. Let $A \in \mathcal{B}$ be an arbitrary set of positive measure (if $\mu(A) = 0$, the result is trivially true). As $\mu$ is invariant, $\mu(T^{-i}(A)) = \mu(A) > 0$ for all $i \ge 0$. On the other hand, $1 = \mu(X) \ge \mu(\bigcup_i T^{-i}(A))$, so there must be overlap in the backward iterates of $A$; i.e. there are $0 \le i < j$ such that $\mu(T^{-i}(A) \cap T^{-j}(A)) > 0$. Take the $j$-th iterate and find $\mu(T^{j-i}(A) \cap A) \ge \mu(T^{-i}(A) \cap T^{-j}(A)) > 0$. This means that a positive measure part of the set $A$ returns to itself after $n := j - i$ iterates.

For the part $A'$ of $A$ that didn't return within $n$ steps, assuming $A'$ has positive measure, we repeat the argument. That is, there is $n'$ such that $\mu(T^{n'}(A') \cap A') > 0$ and then also $\mu(T^{n'}(A') \cap A) > 0$. Repeating this argument, we can exhaust the set $A$ up to a set of measure zero, and this proves the theorem. □

Remark 6.12. Kac's Lemma provides a quantitative version of Poincaré recurrence. If $\mu$ is ergodic and $\tau_A(x) = \min\{n \ge 1 : T^n(x) \in A\}$ is the first return time to some $A \in \mathcal{B}$ with $\mu(A) > 0$, then $\int_A \tau_A(x)\, d\mu = 1$.

The central result in ergodic theory is paraphrased as

Space Average = Time Average,

at least for typical points. This is called Birkhoff's$^{5}$ Ergodic Theorem:

Theorem 6.13. Let $\mu$ be a $T$-invariant probability measure and let $\psi \in L^1(\mu)$. Then the ergodic average

$$\psi^*(x) := \lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} \psi \circ T^i(x)$$

exists $\mu$-a.e., and $\psi^*$ is $T$-invariant. If in addition $\mu$ is ergodic, then

$$\psi^* = \int_X \psi\, d\mu \quad \mu\text{-a.e.} \tag{6.3}$$

Remark 6.14. A point $x \in X$ satisfying (6.3) is called $\mu$-typical. To be precise, the set of $\mu$-typical points also depends on $\psi$, but for different functions $\psi, \phi$, the $(\mu, \psi)$-typical points and $(\mu, \phi)$-typical points differ only on a nullset.

Exercise 6.15. Let $(X, T, \mathcal{B}, \mu)$ be an ergodic measure-preserving system on a compact metric space. Show that $T$ is topologically transitive on $\mathrm{supp}(\mu)$.

$^{5}$ Named after George Birkhoff (1884–1944). Details of the controversy on priority of the Ergodic Theorem (John von Neumann was earlier in proving his $L^1$-version $\frac{1}{n}\sum_{k=0}^{n-1} U_T^k \psi \to \int_X \psi\, d\mu$ in $L^1(\mu)$, but Birkhoff delayed its publication until after the appearance of his own paper) can be found in [569].
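Theorem 6.13 can be illustrated numerically for the irrational rotation of Example 6.9. The sketch below is only an experiment in floating-point arithmetic: the rotation number (the golden mean), the interval $[0, 0.3)$, the starting point and the number of iterates are arbitrary choices made for this illustration, and `birkhoff_average` is a hypothetical helper name.

```python
import math


def birkhoff_average(psi, alpha, x0, n):
    """Average of psi along the first n points of the orbit of x0 under
    the rotation R_alpha(x) = x + alpha mod 1."""
    return sum(psi((x0 + i * alpha) % 1.0) for i in range(n)) / n


def indicator(x):
    """Indicator function of the interval [0, 0.3)."""
    return 1.0 if x < 0.3 else 0.0


alpha = (math.sqrt(5) - 1) / 2  # an irrational rotation number
avg = birkhoff_average(indicator, alpha, 0.1, 100_000)
print(avg)  # time average; the space average is the Lebesgue measure 0.3
```

The time average of the indicator function approaches the Lebesgue measure of the interval, as (6.3) predicts for an ergodic measure.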


Definition 6.16. A point x ∈ X is called quasi-generic6 w.r.t. μ if there are sequences (an )n∈N and (bn )n∈N with bn → ∞ such that 0 a +bn −1 1 n

j lim ψ ◦ T (x) = ψ dμ n→∞ bn X j=an

for every continuous function ψ.

6.3. Unique Ergodicity

For continuous dynamical systems on compact spaces (such as subshifts), Theorem 6.2 provides at least one invariant measure. The question we raise in this section is whether there is a unique invariant measure.

Definition 6.17. A transformation $(X, T)$ is uniquely ergodic if it admits only one invariant probability measure. If $(X, T)$ is both uniquely ergodic and minimal, we call it strictly ergodic.

Example 6.18. An SFT is not uniquely ergodic, except when it consists of a single periodic orbit, because the equidistribution $\frac1n \sum_{i=0}^{n-1} \delta_{\sigma^i(x)}$ on each periodic sequence $x = (x_1 \dots x_n)^\infty$ is an invariant measure. The same holds for sofic shifts. On the other hand, Sturmian shifts $(X, \sigma)$ are strictly ergodic, and their unique measure is obtained by lifting Lebesgue measure from the circle, using the itinerary map $i : \mathbb{S}^1 \to X$; that is, $\mu(A) = \mathrm{Leb}(i^{-1}(A))$.

Example 6.19. The Cantor substitution from Remark 2.14

$$\chi_{\mathrm{Cantor}} : \begin{cases} 0 \to 000, \\ 1 \to 101 \end{cases}$$

with fixed point $\rho_{\mathrm{Cantor}} = 101000101000000000101000101\dots$ generates a non-minimal subshift $(\Sigma_{\mathrm{Cantor}}, \sigma)$ (non-minimal because it contains the fixed point $0^\infty$). It is uniquely ergodic, with the Dirac measure $\delta_{0^\infty}$ being the only shift-invariant probability measure, because for every $n \in \mathbb{N}$, among all words of length $n$, the word $0^n$ occurs with limit frequency 1. This example shows that the support of the unique invariant measure need not be the whole space.

As mentioned in Exercise 6.8, if $(X, T)$ is uniquely ergodic, its unique measure is ergodic. A very useful property of uniquely ergodic systems is that Birkhoff averages converge uniformly, rather than only $\mu$-a.e.

Theorem 6.20 (Oxtoby's Theorem). A continuous dynamical system $T : X \to X$ on a compact space is uniquely ergodic if and only if, for every continuous function $\psi : X \to \mathbb{R}$, the Birkhoff averages $\frac1n \sum_{i=0}^{n-1} \psi \circ T^i(x)$ converge uniformly to a constant function.

$^{6}$ Recall that sometimes $\mu$-typical points are called $\mu$-generic.
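The frequency claim in Example 6.19 can be checked on finite iterates. The following is a minimal sketch, not part of the original text: it iterates the Cantor substitution on the letter 1 and computes the exact frequency of the letter 0 in $\chi^n(1)$, which equals $1 - (2/3)^n$ because $\chi^n(1)$ has length $3^n$ and contains $2^n$ ones. The helper names `iterate` and `zero_frequency` are hypothetical.

```python
from fractions import Fraction

# Cantor substitution from Example 6.19: 0 -> 000, 1 -> 101.
CHI = {"0": "000", "1": "101"}


def iterate(word: str, n: int) -> str:
    """Apply the substitution n times to `word`."""
    for _ in range(n):
        word = "".join(CHI[c] for c in word)
    return word


def zero_frequency(n: int) -> Fraction:
    """Exact frequency of the letter 0 in chi^n(1)."""
    w = iterate("1", n)
    return Fraction(w.count("0"), len(w))


if __name__ == "__main__":
    for n in (1, 3, 6, 9):
        print(n, float(zero_frequency(n)))
```

The printed frequencies increase towards 1, in line with $\delta_{0^\infty}$ being the only invariant measure; this only inspects prefixes of the fixed point and is an illustration, not a proof.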


In fact [277, Theorem 4.10], if every point is typical for a generic measure$^{7}$, then $(X, T)$ is uniquely ergodic. A major consequence of unique ergodicity is the uniform existence of visit frequencies; i.e. for a uniquely ergodic subshift $(X, \sigma, \mu)$

$$\mu([a_1 \dots a_N]) = \lim_{n\to\infty} \frac1n \#\{0 \le j < n : x_{j+1} \dots x_{j+N} = a_1 \dots a_N\}, \tag{6.4}$$

for every word $a_1 \dots a_N$ and all $x \in X$.

Proof. If $\mu$ and $\nu$ were two different ergodic measures, then we can find a continuous function $\psi : X \to \mathbb{R}$ such that $\int \psi \, d\mu \ne \int \psi \, d\nu$. Using Birkhoff's Ergodic Theorem 6.13 for both measures (with their own typical points $x$ and $y$), we see that

$$\lim_{n\to\infty} \frac1n \sum_{k=0}^{n-1} \psi \circ T^k(x) = \int \psi \, d\mu \ne \int \psi \, d\nu = \lim_{n\to\infty} \frac1n \sum_{k=0}^{n-1} \psi \circ T^k(y),$$

so there is no uniform convergence to a constant function.

Conversely, we know by Birkhoff's Ergodic Theorem 6.13 that

$$\lim_{n} \frac1n \sum_{k=0}^{n-1} \psi \circ T^k(x) = \int \psi \, d\mu$$

is constant $\mu$-a.e. But if the convergence is not uniform, then there are sequences $(y_i)_{i\in\mathbb{N}} \subset X$ and $(n_i)_{i\in\mathbb{N}} \subset \mathbb{N}$ such that $\lim_i \frac{1}{n_i} \sum_{k=0}^{n_i-1} \psi \circ T^k(y_i) \ne \int_X \psi \, d\mu$. Define probability measures $\nu_i := \frac{1}{n_i} \sum_{k=0}^{n_i-1} \delta_{T^k(y_i)}$. This sequence $(\nu_i)_{i\in\mathbb{N}}$ has a weak$^*$ accumulation point $\nu$, which is shown to be a $T$-invariant measure in the same way as in the proof of Theorem 6.2. But $\nu \ne \mu$ because $\int \psi \, d\nu \ne \int \psi \, d\mu$. Hence $(X, T)$ cannot be uniquely ergodic. $\square$

We think of unique ergodicity as an indicator of low complexity. For instance, the uniquely ergodic subshifts presented so far (such as Sturmian and substitution shifts) have zero entropy, and the non-uniquely ergodic shifts (SFTs, sofic shifts) have positive entropy. However, there are minimal zero entropy shifts that are not uniquely ergodic. One of the first such examples is due to Keane [348] and comes from an interval exchange transformation on four intervals. It is known [542, Theorem 2.12] that any transitive interval exchange transformation$^{8}$ on $n$ intervals can have at most $n/2$ ergodic measures, so IETs on two or three intervals are uniquely ergodic.

On the other hand, there are positive entropy subshifts that are uniquely ergodic. Without proof, we state a general result by Krieger [372] in this direction.

$^{7}$ See Remark 1.31 for the definition.

$^{8}$ For Interval Translation Maps on $n$ intervals, the bound is $(n + 1)/2$; see [128, 137].


Theorem 6.21. Every subshift $(X, \sigma)$ has a uniquely ergodic subshift $(X', \sigma)$ of the same entropy.

On the positive side, there are several conditions implying unique ergodicity.

Theorem 6.22. Let $(X, T)$ be an equicontinuous surjection on a compact metric space $(X, d)$. Then the following are equivalent:
(a) $T$ is transitive.
(b) $T$ is uniquely ergodic.
(c) Every $T$-invariant probability measure is ergodic.

The main implication (a) $\Rightarrow$ (b) is due to Fomin [250] for minimal dynamical systems and was generalized to transitive systems by Oxtoby [440], as we shall do in the proof below (but see Exercise 2.28). In fact, Oxtoby's proof applies to transitive mean equicontinuous systems as well.

Proof. (a) $\Rightarrow$ (b): Assume that $T : X \to X$ is equicontinuous and transitive. Let $\psi : X \to \mathbb{R}$ be continuous and, due to the compactness of $X$, also uniformly continuous. Take $\varepsilon > 0$ arbitrary, and choose $\delta > 0$ so that $|\psi(x) - \psi(y)| < \varepsilon$ whenever $d(x, y) < \delta$. By equicontinuity, we can find $\gamma > 0$ such that $d(x, y) < \gamma$ implies $d(T^j(x), T^j(y)) < \delta$ for all $j \ge 0$. Take a point $x \in X$ with a dense orbit, and let $y \in X$ be arbitrary. Choose $k \ge 0$ such that $d(T^k(x), y) < \gamma$. Then

$$\left| \frac1n \sum_{j=0}^{n-1} \big( \psi \circ T^j(x) - \psi \circ T^j(y) \big) \right| \le \frac1n \left| \sum_{j=0}^{k-1} \big( \psi \circ T^j(x) - \psi \circ T^{n-k+j}(y) \big) \right| + \frac1n \left| \sum_{j=k}^{n-1} \big( \psi \circ T^j(x) - \psi \circ T^{j-k}(y) \big) \right| \le \frac1n \big( k(\sup \psi - \inf \psi) + (n - k)\varepsilon \big) \to \varepsilon$$

as $n \to \infty$. Since $\varepsilon$ and $y \in X$ are arbitrary, it follows that the ergodic average of every point converges to the same value. By Oxtoby's Theorem 6.20, this implies unique ergodicity.

(b) $\Rightarrow$ (c): This follows immediately from Exercise 6.8.

(c) $\Rightarrow$ (a): Suppose that $T$ is not transitive, so in particular not minimal. Let $(Y, T)$ be a $T$-invariant subsystem for some compact proper subset $Y \subset X$. It supports an invariant measure $\mu_0$, due to the Krylov-Bogul'yubov Theorem 6.2. Take $\varepsilon > 0$ so small that there is $x \in X$ such that $d(x, Y) > \varepsilon$. As $T$ is equicontinuous and surjective, Corollary 2.35 implies that there


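is $\delta > 0$ such that $d(T^n x, Y) > \delta$ for all $n \in \mathbb{N}$. Hence, for any weak$^*$ accumulation point

$$\mu_1 = \lim_{k\to\infty} \frac{1}{n_k} \sum_{i=0}^{n_k-1} \delta_{T^i x},$$

the support satisfies $\mathrm{supp}(\mu_1) \cap Y = \emptyset$. Therefore $\frac12(\mu_0 + \mu_1)$ is not ergodic. $\square$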
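The implication (a) $\Rightarrow$ (b) of Theorem 6.22 can be observed numerically on an irrational circle rotation, which is an isometry (hence equicontinuous) and transitive. The sketch below is an illustration under assumptions that are not in the original text: the rotation number `ALPHA` (inverse golden mean) and the observable $\psi(x) = \cos(2\pi x)$, whose integral is 0, are chosen for convenience. It measures the largest deviation of the Birkhoff average from 0 over a grid of starting points.

```python
import math

# Assumed for illustration: rotation by the inverse golden mean.
ALPHA = (math.sqrt(5) - 1) / 2


def birkhoff_average(x0: float, n: int) -> float:
    """(1/n) * sum_{j<n} psi(T^j x0) for T(x) = x + ALPHA mod 1, psi(x) = cos(2 pi x)."""
    total = 0.0
    for j in range(n):
        total += math.cos(2 * math.pi * ((x0 + j * ALPHA) % 1.0))
    return total / n


def worst_deviation(n: int, starts: int = 50) -> float:
    """Largest |average - integral| over a grid of starting points (the integral is 0)."""
    return max(abs(birkhoff_average(i / starts, n)) for i in range(starts))


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, worst_deviation(n))
```

The deviation shrinks at a rate that does not depend on the starting point, which is the uniform convergence asserted in Oxtoby's Theorem 6.20.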



Lemma 6.23. If a shift $(X, \sigma)$ is balanced on words (see Definition 4.36), then it is uniquely ergodic.

Proof. Suppose that $\mu$ and $\nu$ are two different ergodic invariant measures and that $u \in \mathcal{L}(X)$ is such that $\mu([u]) \ne \nu([u])$. Take typical points $x$ and $y$ (in the sense of the Birkhoff Ergodic Theorem 6.13) for $\mu$ and $\nu$, respectively. Then $\big|\, |x_1 \dots x_n|_u - |y_1 \dots y_n|_u \big| \sim n\, |\mu([u]) - \nu([u])| \to \infty$ as $n \to \infty$, so $\mathcal{L}(X)$ cannot be balanced. $\square$

Example 6.24. If $\mathcal{L}(X)$ is $R$-balanced on words, then it is $R$-balanced on letters, but the other direction fails of course. For example, the full shift $X := \{0, 1\}^{\mathbb{Z}}$ is not balanced, but $\chi_{TM}(X)$ for the Thue-Morse substitution $\chi_{TM}$ is balanced on letters but not even balanced on 2-words. This example shows that balancedness on letters is not sufficient for Lemma 6.23.

Adding machines are minimal isometries, and therefore uniquely ergodic by Theorem 6.22. For the more general class of Toeplitz shifts, the question of unique ergodicity is more interesting. The following result (requiring regularity; see Section 4.5.1) is by Jacobs & Keane [332].

Theorem 6.25. Every regular Toeplitz shift is uniquely ergodic.

Proof. The original result is [332, Theorem 5]; we follow [381, Theorem 4.78], using the notation from Theorem 4.92. In particular, $L_i = \mathrm{lcm}(q_1, \dots, q_i)$, where $(q_i)_{i\in\mathbb{N}}$ is the periodic structure of the Toeplitz sequence. Let $u \in \mathcal{L}(x)$ be an arbitrary word; then for $i$ so large that $L_i > |u|$, the frequency $\mu_i(u) := \frac{1}{L_i} |V(i)|_u$ of the word $u$ in $V(i)$ is a lower bound for

$$\inf_{j\ge0} \liminf_n \frac1n \#\{1 \le k \le n : x_{j+k+1} \dots x_{j+k+|u|} = u\},$$

whereas $\mu_i + \frac{|u| + r_i}{L_i}$ is an upper bound for

$$\sup_{j\ge0} \limsup_n \frac1n \#\{1 \le k \le n : x_{j+k+1} \dots x_{j+k+|u|} = u\}.$$

But $\frac{|u| + r_i}{L_i} \to 0$ as $i \to \infty$. Therefore $\lim_i \mu_i(u)$ exists, and the visit frequency is well-defined, with uniform convergence on $\mathrm{orb}_\sigma(x)$. By Oxtoby's Theorem 6.20, this implies unique ergodicity. $\square$
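The uniform visit frequencies in the proof of Theorem 6.25 can be observed on a concrete Toeplitz sequence. The sketch below uses the period-doubling sequence as an assumed example (it is not discussed at this point of the text): it arises by filling the holes of $(0\,*)^\infty$ with $(1\,*)^\infty$, then with $(0\,*)^\infty$, and so on, so that the symbol at position $n \ge 1$ is the parity of the 2-adic valuation of $n$. The function names are hypothetical.

```python
def period_doubling(n: int) -> int:
    """Symbol x_n (n >= 1): parity of the 2-adic valuation of n."""
    v = 0
    while n % 2 == 0:
        n //= 2
        v += 1
    return v % 2


def window_frequency(word: tuple, offset: int, length: int) -> float:
    """Frequency of `word` among the `length` positions starting right after `offset`."""
    hits = 0
    for k in range(1, length + 1):
        if all(period_doubling(offset + k + i) == s for i, s in enumerate(word)):
            hits += 1
    return hits / length


if __name__ == "__main__":
    print("".join(str(period_doubling(n)) for n in range(1, 17)))
    for offset in (0, 12345, 99999):
        print(offset, window_frequency((0,), offset, 4096))
```

For every offset the frequency of the letter 0 is close to $2/3 = \sum_{k \text{ even}} 2^{-(k+1)}$, independently of where the window starts.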

However, there are non-uniquely ergodic Toeplitz shifts. For an explicit counterexample to unique ergodicity if regularity fails, see [381, Theorem 4.78] and also the aforementioned result of Downarowicz [207]. The following result is due to Furstenberg [265] and is used to prove the unique ergodicity (and minimality) of many skew-product systems.


Proposition 6.26. Let $(X, T, \mu)$ be uniquely ergodic and let $G$ be a compact group with Haar measure$^{9}$ $m$. Let the group extension $S : Y \to Y$ be defined on $Y := X \times G$ as $S(x, g) = (T(x), f(x)g)$ for some $f : X \to G$. If $S$ is ergodic w.r.t. $\nu = \mu \times m$, then $S$ is uniquely ergodic.

Proof. Let $(x, g) \in Y$ be a $\nu$-typical point, so it satisfies Birkhoff's Ergodic Theorem 6.13 w.r.t. every continuous function $\varphi : Y \to \mathbb{R}$. For any $h \in G$, $\tilde\varphi$ defined by $\tilde\varphi(g) = \varphi(gh)$ is continuous too, so $(x, gh)$ is $\nu$-typical w.r.t. $\varphi$ because $(x, g)$ is $\nu$-typical w.r.t. $\tilde\varphi$. It follows that there is a subset $W \subset X$ with $\mu(W) = 1$ such that $W \times G$ consists entirely of $\nu$-typical points. If $\nu'$ was another ergodic $S$-invariant probability measure, then the argument above gives a set $W' \subset X \setminus W$ with $\nu'(W' \times G) = 1$ such that $W' \times G$ consists entirely of $\nu'$-typical points. Then the projected measure $\mu'$ on $X$ defined by $\mu'(A) = \nu'(A \times G)$ is $T$-invariant and satisfies $\mu'(W') = 1$. But $W$ and $W'$ are disjoint, so $\mu \ne \mu'$, contradicting that $T$ is uniquely ergodic. $\square$

6.3.1. Unique Ergodicity and Word-Complexity. Subshifts with a sufficiently low word-complexity $p(n)$ are uniquely ergodic. The following results in this direction are due to Boshernitzan [96, 97].

Theorem 6.27. Let $(X, \sigma)$ be a minimal shift and let $b := \liminf_n p(n)/n < \infty$. Then there are at most$^{10}$ $\max\{b, 1\}$ shift-invariant ergodic measures.

Let us first discuss some extensions and related results. A strengthening of the previous theorem is: if $b < 3$, then $(X, \sigma)$ is uniquely ergodic, see [96, Theorem 1.5], and without the minimality assumption [230, Theorem 1.4]. Damron & Fickenscher [177, Theorem 1] show, using bi-special words and Rauzy graph techniques, that if $p(n) = bn + C$ for some constants $3 \le b \in \mathbb{N}$, $C \in \mathbb{N}$, and all sufficiently large $n$, then there are at most $b - 2$ shift-invariant ergodic measures. In [178], they improved this estimate to $b/2$ shift-invariant ergodic measures for transitive subshifts, provided all bi-special words are regular; see Definition 1.11. There are also extensions to the non-minimal setting, see [435, Theorem 1.5], namely that if $X = \mathrm{orb}_\sigma(x)$ for a bi-infinite sequence $x$ that is not eventually periodic in both directions, then $\liminf_n p(n)/n < 2$ implies unique ergodicity. Cyr & Kra [174] noted that, without the minimality assumption, there are no more than $\max\{b, 1\}$ generic measures$^{11}$. They also give, for any $4 \le b \in \mathbb{N}$, an example of a subshift with $\liminf_n p(n)/n = b - 1 < b = \limsup_n p(n)/n$ having precisely $b$ ergodic measures.

$^{9}$ Haar measure, introduced by Alfréd Haar [295], is the unique measure on a group that is preserved under each group rotation: $x \mapsto gx$.

$^{10}$ Or $\max\{b^+ - 1, 1\}$ shift-invariant ergodic measures if $b^+ := \liminf_n p(n)/n < \infty$.

$^{11}$ See Remark 1.31 for the definition.
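To make the hypothesis $\liminf_n p(n)/n < 3$ concrete, the word-complexity of a subshift can be estimated from a long finite word of its language. The sketch below is an illustration under stated assumptions: it uses a prefix of the Fibonacci word (fixed point of $0 \to 01$, $1 \to 0$), a standard Sturmian example with $p(n) = n + 1$, and it also counts right-special words, which control the growth $p(n+1) - p(n)$. The function names are hypothetical, and a finite prefix only gives the true $p(n)$ when $n$ is small compared to the prefix length.

```python
def fibonacci_word(iterations: int) -> str:
    """Prefix of the fixed point of 0 -> 01, 1 -> 0 after the given number of steps."""
    w = "0"
    for _ in range(iterations):
        w = "".join("01" if c == "0" else "0" for c in w)
    return w


def complexity(word: str, n: int) -> int:
    """Number of distinct subwords of length n occurring in `word`."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


def right_special(word: str, n: int) -> int:
    """Number of length-n subwords u such that both u0 and u1 occur in `word`."""
    longer = {word[i:i + n + 1] for i in range(len(word) - n)}
    prefixes = {v[:-1] for v in longer}
    return sum(1 for u in prefixes if u + "0" in longer and u + "1" in longer)


if __name__ == "__main__":
    w = fibonacci_word(20)
    for n in (1, 5, 10, 20):
        print(n, complexity(w, n), complexity(w, n) / n, right_special(w, n))
```

Here $p(n)/n \to 1 < 3$, consistent with the unique ergodicity of Sturmian shifts noted in Example 6.18.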


Theorem 6.28. Let $(X, \sigma)$ be a minimal subshift and $\mu$ a shift-invariant ergodic measure. Define $\varepsilon_\mu(n) = \min\{\mu(Z) : Z \text{ is an } n\text{-cylinder in } X\}$. If $\limsup_n n\,\varepsilon_\mu(n) > 0$, then $(X, \sigma)$ is uniquely ergodic.

Corollary 6.29. Every transitive linearly recurrent subshift is uniquely ergodic.

Proof. Since every word $w \in \mathcal{L}_n$ reoccurs with gap $\le Ln$, Birkhoff's Ergodic Theorem 6.13 implies that $\mu([w]) \ge 1/(Ln) > 0$ for every ergodic invariant measure $\mu$. Therefore Theorem 6.28 applies. $\square$

Lemma 6.30. A subshift $(X, \sigma)$ is linearly recurrent (see Definition 4.1) if and only if $\hat\varepsilon := \liminf_n n\,\varepsilon_\mu(n) > 0$ for every $\sigma$-invariant measure $\mu$.

Proof. The implication $\Rightarrow$ uses the same argument as in the previous proof. The reverse implication $\Leftarrow$ is due to Boshernitzan (unpublished) and appears in [74]$^{12}$. Let $u \in \mathcal{L}_n$ be arbitrary, and let $N = N(n, u)$ be the length of the longest return word $x = x_1 x_2 \dots x_N$ associated to $u$. Note that $N > n$ because otherwise $x$ is periodic; see Example 4.3. For $n \le k \le N$, let $U_k$ be the collection of words ending in $x_1 \dots x_k$. Then $\mu(\bigcup_{v \in U_k} [v]) = \mu([x_1 \dots x_k]) \ge \hat\varepsilon/k$. Since the sets $U_k$ are all disjoint (after all, $u$ cannot reappear in the return word $x$), it follows that

$$1 \ge \sum_{k=n}^{N} \frac{\hat\varepsilon}{k} \ge \int_n^{N+1} \frac{\hat\varepsilon}{x} \, dx \ge \hat\varepsilon \log \frac{N}{n}.$$

Therefore $N/n \le e^{1/\hat\varepsilon}$, uniformly in $n$ and $u \in \mathcal{L}_n(X)$. Since every $x \in X$ is a concatenation of return words to $u$, linear recurrence follows for $L = e^{1/\hat\varepsilon}$. $\square$

Proof of Theorem 6.27. A sequence of words $(B_k)_{k\ge1}$ of lengths $|B_k| \to \infty$ as $k \to \infty$ is called generic for a measure $\mu$ if for every word $B \in \mathcal{L}$ (defining a cylinder $[B] = \{x \in X : B \text{ is prefix of } x\}$),

$$\lim_{k\to\infty} \frac{|B_k|_B}{|B_k|} = \mu(B),$$

where $|B_k|_B$ denotes the number of appearances of $B$ in $B_k$. An argument similar to Theorem 6.2 shows that there is a subsequence $(k_i)$ and a measure $\mu$ such that $(B_{k_i})_{i\ge1}$ is generic for $\mu$.

$^{12}$ I am grateful to Fabien Durand for pointing this out to me and the streamlined argument.


Let $W = \{n \in \mathbb{N} : \text{there are} \le b \text{ right-special words of length } n\}$. Since $p(n) \le (b + \frac12)n$ for large $n$ and $p(n+1) \ge p(n) + \#\{\text{right-special words in } \mathcal{L}_n\}$, the set $W$ is infinite. For each $n \in W$, let $\{R_{n,1}, \dots, R_{n,b}\} \subset \mathcal{L}_n$ be the collection of right-special words. (If there are fewer, then we just duplicate some.) By taking subsequences (at most $b$ times), we can construct a subset $\tilde W \subset W$ such that $(R_{n,k})_{n \in \tilde W}$ is generic for a single measure $\mu_k$, $k = 1, \dots, b$.

Take $m(n) = (b + 1)n$. The following claim is proved as in Proposition 1.12.

Claim 1: Every $m(n)$-word contains a right-special word.

Let $\mu$ be an ergodic shift-invariant measure. Choose a $\mu$-typical point $x \in X$. For every $n \in \tilde W$, we can decompose $x$ into $m(n)$-words $x = C_{n,1} C_{n,2} C_{n,3} C_{n,4} \dots$, and each word $C_{n,i}$ contains a right-special word $B_{i,k(i)}$, so that at least one of them occurs with upper density

$$\limsup_{j\to\infty} \frac1j \#\{0 \le i < j : B_{i,k(i)} = R_{n,k}\} \ge \frac1b.$$

Claim 2: Set $\eta = \frac{1}{4b(b+1)}$; then $\mu' = \frac{1}{1-\eta}(\mu - \eta \cdot \mu_k)$ is a non-negative measure.

From Claim 2 we conclude that $\mu = (1 - \eta)\mu' + \eta\mu_k$. But since $\mu$ was assumed to be ergodic (see Exercise 6.8), $\mu = \mu_k$. Hence there are at most $b$ ergodic measures.

It remains to prove Claim 2. Let $A \subset X$ be arbitrary. Since $(R_{n,k})_{n\in\mathbb{N}}$ is generic for $\mu_k$, we have $|R_{n,k}|_A / |R_{n,k}| \ge \frac12 \mu_k(A)$ for all sufficiently large $n$. By definition of upper density, $\#\{0 \le i < j : C_{n,i} \text{ contains } R_{n,k}\} > j/2b$ for all sufficiently large $j$. Therefore

$$|x_1 \dots x_{j \cdot m(n)}|_A > \frac{j}{2b} |R_{n,k}|_A \ge \frac{j}{2b} \cdot \frac{\mu_k(A)\, n}{2}.$$

Dividing by $j\,m(n)$ we find

$$\frac{|x_1 \dots x_{j \cdot m(n)}|_A}{j\,m(n)} \ge \frac{\mu_k(A)}{4b} \cdot \frac{n}{m(n)} \ge \frac{\mu_k(A)}{4b(b+1)} = \eta \cdot \mu_k(A).$$

But since $x$ is typical for $\mu$, also $\frac{1}{j\,m(n)} |x_1 \dots x_{j \cdot m(n)}|_A \to \mu(A)$, and therefore $\mu(A) \ge \eta \cdot \mu_k(A)$. This proves Claim 2, thus completing the entire proof. $\square$

Proof of Theorem 6.28. If $\liminf_n p(n)/n$ in Theorem 6.27 is infinite, then $\limsup_n n\,\varepsilon_\mu(n) = 0$ and there is nothing to prove. Hence we can assume that $b := \liminf_n p(n)/n < \infty$ and there are finitely many ergodic shift-invariant measures. Clearly $\mu$ is one of them. Assume there is another ergodic measure $\mu'$, and take an $N$-cylinder $[a_1 \dots a_N]$ such that


$\mu([a_1 \dots a_N]) \ne \mu'([a_1 \dots a_N])$. Without loss of generality, we can assume that $\mu([a_1 \dots a_N]) < u < v < \mu'([a_1 \dots a_N])$ and $\mu''([a_1 \dots a_N]) \notin [u, v]$ for every ergodic shift-invariant measure $\mu''$. By the Birkhoff Ergodic Theorem 6.13,

$$\lim_{n\to\infty} \mu\Big( \Big\{ x \in X : \frac1n \#\{0 \le j < n : \sigma^j(x) \in [a_1 \dots a_N]\} \in [u, v] \Big\} \Big) = 0.$$

However, by minimality there are arbitrarily large $n$ and two $n$-cylinders $Z_1$ and $Z_2$ such that

$$\frac1n |Z_1|_{a_1 \dots a_N} < u < v < \frac1n |Z_2|_{a_1 \dots a_N}.$$

Let $Z = z_1 \dots z_r$ be the smallest word having $Z_1$ as prefix and $Z_2$ as suffix. Since $\big|\, |z_{j+1} \dots z_{j+n}|_{a_1 \dots a_N} - |z_{j+2} \dots z_{j+n+1}|_{a_1 \dots a_N} \big| \le 1$, there must be at least $n(v - u)$ distinct integers $j$ such that $u$
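0 such that $\big|\, |\chi^n(b)|_a - \lambda^n p_a q_b \big| \le C\rho^n$, where $p$ and $q$ are the left and right leading eigenvectors of $A$, scaled so that $\sum_{a \in \mathcal{A}} p_a q_a = 1$. Also, by the triangle inequality,

$$\big|\, |\chi^n(b)| - \lambda^n q_b \big| = \Big| \sum_{a \in \mathcal{A}} \big( |\chi^n(b)|_a - p_a \lambda^n q_b \big) \Big| \le \sum_{a \in \mathcal{A}} \big|\, |\chi^n(b)|_a - \lambda^n p_a q_b \big| \le C\, \#\mathcal{A}\, \rho^n$$

and

$$\big|\, |\chi^n(b)|_a - p_a |\chi^n(b)| \,\big| \le \big|\, |\chi^n(b)|_a - \lambda^n p_a q_b \big| + p_a \big|\, |\chi^n(b)| - \lambda^n q_b \big| \le C' \rho^n$$

for $C' = C(1 + p_a \#\mathcal{A})$. Therefore we have

$$\left| \frac{|\chi^n(b)|_a}{|\chi^n(b)|} - p_a \right| \le C'' (\rho/\lambda)^n \to 0,$$

uniformly in $a, b \in \mathcal{A}$. For general words $w \in \mathcal{L}(X)$ (instead of $w = \chi^n(b)$), we can split

$$w = v_0\, \chi(v_1) \cdots \chi^{n-1}(v_{n-1})\, \chi^n(v_n)\, \chi^{n-1}(v'_{n-1}) \cdots \chi(v'_1)\, v'_0 \tag{6.6}$$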


for some maximal $n$ such that $v_n$ is non-empty and each $v_k$ and $v'_k$ has length $\le L := \max_{a \in \mathcal{A}} |\chi(a)|$. Therefore $\big|\, |\chi^k(v_k)|_a - p_a \lambda^k \big| \le L C' \rho^k$ and the same holds for each $\chi^k(v'_k)$. Additionally, there can be at most $2n|u|$ appearances of $u$ in between the words $\chi^k(v_k)$ and $\chi^k(v'_k)$ in (6.6). Altogether

$$\big|\, |w|_a - p_a |w| \,\big| \le 2n|u| + 2LC' \sum_{k=0}^{n} \rho^k \le C'' \rho^n.$$

Divide by $|w| \ge C\lambda^n$ to obtain (6.5) for $u = a \in \mathcal{A}$.

Now for a general word $u \in \mathcal{L}_\ell(X)$ we consider the $\ell$-block shift as in Section 4.2.2. By Proposition 4.22 this $\ell$-block substitution $\chi_\ell : \mathcal{A}_\ell \to \mathcal{A}_\ell^*$ is primitive. Hence we can apply the first half of the proof to $\chi_\ell$ and conclude that $u \in \mathcal{A}_\ell$ appears with a single frequency in all words of the block substitution shift space $X_\ell$. Since $u \in \mathcal{A}^*$ appears in $\mathcal{L}(X)$ if and only if $u \in \mathcal{A}_\ell$ appears in $\mathcal{L}(X_\ell)$, the unique ergodicity follows. $\square$

6.3.3. Unique Ergodicity and Bratteli-Vershik Systems. Let $(X_{BV}, \tau)$ be a BV-system with incidence matrices $M(n)$, $n \ge 1$, i.e., $m_{v,w}(n) = \#\{e \in E_n : s(e) = v, t(e) = w\}$. Throughout, we assume that the diagram is telescoped so that all $M(n)$ are strictly positive. Recall from Definition 5.28 that $h_w(n) = \#\{\text{paths between } v_0 \text{ and } w \in V_n\}$. The (non-)unique ergodicity for Bratteli-Vershik systems was investigated in [77, 80, 248] and in [121, 168, 169] for specific cases. It requires a probabilistic version of the incidence matrices $M(n) = (m_{v,w}(n))_{v \in V_{n-1}, w \in V_n}$ of the Bratteli diagram:

$$K(n) = (k_{v,w}(n))_{v \in V_{n-1}, w \in V_n}, \qquad k_{v,w}(n) = m_{v,w}(n)\, \frac{h_v(n-1)}{h_w(n)}. \tag{6.7}$$

Due to the appearance of $h_v(n-1)$ and $h_w(n)$, the matrix $K(n)$ is not just a normalized version of $M(n)$, but all the previous incidence matrices $M(1), \dots, M(n-1)$ play a role.

Lemma 6.32. The columns of every $K(n)$ all add up to 1.

Proof. Suppose $\mu$ is a $\tau$-invariant Borel measure. Then $\mu(Z) = \mu(Z')$ for every two cylinder sets $Z = [x_1, \dots, x_n]$, $Z' = [x'_1, \dots, x'_n] \subset X_{BV}$ with $t(x_n) = t(x'_n) = w \in V_n$. Denote this probability by $p_w(n)$. Note that $p(n) = (p_w(n))_{w \in V_n}$ is in general not a probability vector, but $\tilde p(n)$ with $\tilde p_w(n) := h_w(n) p_w(n)$ is. Indeed, $\tilde p_w(n)$ represents the total mass of all


cylinder sets $Z = [x_1, \dots, x_n]$ with $t(x_n) = w$. It follows that $p(n-1) = M(n)p(n)$ and

$$\tilde p_v(n-1) = h_v(n-1)\, p_v(n-1) = h_v(n-1) \sum_{w \in V_n} m_{v,w}(n)\, p_w(n) = \sum_{w \in V_n} h_v(n-1)\, m_{v,w}(n)\, \frac{\tilde p_w(n)}{h_w(n)} = \sum_{w \in V_n} k_{v,w}(n)\, \tilde p_w(n).$$

Therefore

$$\tilde p(n-1) = K(n)\, \tilde p(n), \tag{6.8}$$

and

$$\sum_{v \in V_{n-1}} k_{v,w}(n) = \sum_{v \in V_{n-1}} m_{v,w}(n)\, \frac{h_v(n-1)}{h_w(n)} = \frac{1}{h_w(n)} \sum_{v \in V_{n-1}} h_v(n-1)\, m_{v,w}(n) = \frac{h_w(n)}{h_w(n)} = 1,$$

as claimed. $\square$
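Lemma 6.32 is easy to check by machine. The sketch below computes the path counts $h(n)$ and the matrices $K(n)$ of (6.7) in exact rational arithmetic and verifies that every column sums to 1. The function names are hypothetical, and the sample diagram (stationary, with $M(1) = (1\ 1)$ and $M(n) = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$ for $n \ge 2$) is chosen only as an illustration.

```python
from fractions import Fraction


def heights(incidence):
    """h(n) for n = 0..len(incidence): h_w(n) = sum_v h_v(n-1) * m_{v,w}(n), h(0) = [1]."""
    hs = [[1]]
    for M in incidence:
        prev = hs[-1]
        hs.append([sum(prev[v] * M[v][w] for v in range(len(prev)))
                   for w in range(len(M[0]))])
    return hs


def probabilistic_matrices(incidence):
    """K(n) with k_{v,w}(n) = m_{v,w}(n) * h_v(n-1) / h_w(n), as in (6.7)."""
    hs = heights(incidence)
    Ks = []
    for n, M in enumerate(incidence, start=1):
        Ks.append([[Fraction(M[v][w] * hs[n - 1][v], hs[n][w])
                    for w in range(len(M[0]))] for v in range(len(M))])
    return Ks


# Sample diagram: M(1) = (1 1), M(n) = [[1, 1], [1, 0]] for n >= 2.
fib_diagram = [[[1, 1]]] + [[[1, 1], [1, 0]] for _ in range(6)]

if __name__ == "__main__":
    for n, K in enumerate(probabilistic_matrices(fib_diagram), start=1):
        column_sums = [sum(row[w] for row in K) for w in range(len(K[0]))]
        print(n, K, column_sums)
```

Rows are indexed by $V_{n-1}$ and columns by $V_n$, matching the convention $\tilde p(n-1) = K(n)\tilde p(n)$ of (6.8).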

Example 6.33. To illustrate this lemma, we repeat the telescoping of Example 5.30 with probabilistic incidence matrices:  2    12 1   3 K(1, 3) = 1 1 1 = 1 . 1 2 0 3

Figure 6.1. Telescoping a probabilistic Bratteli diagram.

The following result appears in, e.g., [80, Proposition 2.13].

Proposition 6.34. The number of ergodic invariant measures of a BV-system is bounded by its rank.

Proof. If $\mu$ is a probability measure on the path space of the BV-system, then the probabilities $\tilde p_v(n) = \mu\big( \bigcup \{ [x_1 \dots x_n] : t(x_n) = v \} \big)$ for $v \in V_n$, $n \in \mathbb{N}$,


determine the measure completely due to the Kolmogorov Extension Theorem. Furthermore, these $\tilde p(n)$'s satisfy (6.8), and we can view this as an infinite chain of linear maps, mapping cones $\mathbb{R}^{\#V_n}_{\ge0}$ to each other:

$$\mathbb{R}_{\ge0} \xleftarrow{K(1)} \mathbb{R}^{\#V_1}_{\ge0} \xleftarrow{K(2)} \mathbb{R}^{\#V_2}_{\ge0} \xleftarrow{K(3)} \mathbb{R}^{\#V_3}_{\ge0} \xleftarrow{K(4)} \cdots.$$

Let $\Sigma_n = \{x \in \mathbb{R}^{\#V_n}_{\ge0} : \sum_{j=1}^{\#V_n} x_j = 1\}$ be the unit simplex in $\mathbb{R}^{\#V_n}_{\ge0}$. The ergodic measures correspond to the extremal points of the sets

$$S_n := \bigcap_{j\ge1} K(n+1) \cdot K(n+2) \cdots K(n+j)(\Sigma_{n+j}).$$

If $r$ is the rank of the BV-system, then $\#V_n = r$ infinitely often, and $S_n$ can have no more than $r$ extremal points for every $n$. This proves that $(X, \tau)$ cannot preserve more than $r$ ergodic measures. $\square$

Lemma 6.35. Let $K(n) = (k_{v,w})_{v \in V_{n-1}, w \in V_n}$ be the probabilistic matrices of a Bratteli diagram as defined in (6.7). Define

$$\rho_n := \max_{w, w' \in V_n} \sqrt{ \frac{\max\{k_{v,w}/k_{v,w'} : v \in V_{n-1}\}}{\min\{k_{v,w}/k_{v,w'} : v \in V_{n-1}\}} }$$

for $n \ge 1$. If $\sum_{n=1}^\infty 1/\rho_n = \infty$, then the BV-system is uniquely ergodic.

Proof. This goes as the proof of Proposition 6.34, but now because of Lemma 8.61, there is a unique solution to (6.8) if $\sum_n 1/\rho_n = \infty$. Hence, under this assumption, unique ergodicity follows. See also [121] and [80, Section 4]. $\square$

This result gives another proof that minimal linearly recurrent shifts are uniquely ergodic, because (after telescoping to make the transition matrices $M(n) \ge 1$) the $M(n)$ are still bounded, and the entries of $K(n)$ are bounded and bounded away from zero. Therefore $\sup_n \rho_n < \infty$ and $\sum_{n=1}^\infty 1/\rho_n = \infty$. In fact, we have a type of exponential mixing:

Lemma 6.36. Assume that the Bratteli diagram of a minimal linearly recurrent (with constant $L$) shift is telescoped so that its transition matrices are strictly positive. Then there exist $C > 0$ and $\beta = \beta(L) \in (0, 1)$ such that for every $0 \le k \le n$ and $v \in V_{n-k}$, $w \in V_n$,

$$\left| \frac{\mu([v] \cap [w])}{\mu([w])} - \mu([v]) \right| \le C\beta^k.$$

Here $[v] = \{x \in X_{BV} : t(x_{n-k}) = v\}$ and $[w] = \{x \in X_{BV} : t(x_n) = w\}$ are cylinder sets.

Proof. Let $K = K(n-k+1) \cdots K(n)$ and let $\Sigma_n$ denote the unit simplex in $\mathbb{R}^{\#V_n}_{\ge0}$. Then there is $\beta \in (0, 1)$ (which can be derived from (8.29)), such


that $\mathrm{diam}(K(\Sigma_n)) \le C\beta^k\, \mathrm{diam}(\Sigma_n)$. This means that for each $v \in V_{n-k}$, the entries $K_{v,w}$ are no more than $C\beta^k$ apart from each other or from any of their convex combinations. That is,

$$|\mu([v] \cap [w]) - \mu([v])\mu([w])| = \Big| K_{v,w}\, \mu([w]) - \sum_{w' \in V_n} K_{v,w'}\, \mu([w'])\, \mu([w]) \Big| \le C\beta^k \mu([w]).$$

Now divide by $\mu([w])$ to get the lemma. $\square$

Example 6.37. Let $F_0, F_1, F_2, F_3, F_4, \dots = 1, 1, 2, 3, 5, \dots$ be the Fibonacci numbers. For the Fibonacci Bratteli-Vershik system of Figure 5.13 (i.e. the diagram is stationary with $M(1) = (1\ 1)$ and $M(n) = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$ for every $n \ge 2$), we find $h_{v_n}(n) = F_n$ and $h_{w_n}(n) = F_{n-1}$ for the vertex sets $V_n = \{v_n, w_n\}$. Therefore

$$K(n) = \begin{pmatrix} \frac{F_{n-1}}{F_n} & 1 \\ \frac{F_{n-2}}{F_n} & 0 \end{pmatrix}.$$

After telescoping

$$K(n-1)K(n) = \begin{pmatrix} \frac{2F_{n-2}}{F_n} & \frac{F_{n-2}}{F_{n-1}} \\ \frac{F_{n-3}}{F_n} & \frac{F_{n-3}}{F_{n-1}} \end{pmatrix}$$

we obtain strictly positive probabilistic incidence matrices. Now, for the golden mean $\gamma = \frac{1+\sqrt5}{2}$, we find

$$\left| \frac{2F_{n-2}}{F_n} - \frac{F_{n-2}}{F_{n-1}} \right| + \left| \frac{F_{n-3}}{F_n} - \frac{F_{n-3}}{F_{n-1}} \right| = \frac{2F_{n-2}F_{n-3}}{F_n F_{n-1}} \to 2\gamma^{-4} \ne 0$$

by Binet's formula (8.5). Lemma 6.35 shows that the Fibonacci Bratteli-Vershik system is uniquely ergodic. Indeed, we can compute

$$\tilde\rho_n := \rho(K(2n-1)K(2n)) = \sqrt{ \frac{2F_{2n-1}/F_{2n}}{F_{2n-1}/F_{2n}} } = \sqrt2,$$

so $\sum_n 1/\tilde\rho_n = \infty$.

Lemma 6.35 gives only a sufficient condition, but this is not a necessary condition; see Example 6.37 below. An "if and only if" condition for unique ergodicity is the following.

Theorem 6.38. A Bratteli-Vershik system $(X_{BV}, \tau)$ is uniquely ergodic if and only if, after sufficient telescoping,

$$\lim_{n\to\infty} \max_{w \ne w' \in V_n} \sum_{v \in V_{n-1}} |k_{vw}(n) - k_{vw'}(n)| = 0.$$
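The role of "sufficient telescoping" in Theorem 6.38 can be explored numerically on the Fibonacci diagram of Example 6.37. The sketch below is a finite computation under stated assumptions, not a proof: it multiplies $k$ consecutive matrices $K(n-k+1) \cdots K(n)$, using the closed form of $K(n)$ from Example 6.37, and evaluates the column gap $\sum_v |l_{v,w} - l_{v,w'}|$ of the product. The names `K`, `matmul` and `column_gap` are hypothetical.

```python
from fractions import Fraction


def fib(n: int) -> int:
    """Fibonacci numbers with F_0 = F_1 = 1, as in Example 6.37."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def K(n: int):
    """Probabilistic incidence matrix K(n), n >= 2, of the Fibonacci diagram."""
    return [[Fraction(fib(n - 1), fib(n)), Fraction(1)],
            [Fraction(fib(n - 2), fib(n)), Fraction(0)]]


def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def column_gap(n: int, k: int) -> Fraction:
    """sum_v |l_{v,w} - l_{v,w'}| for the telescoped matrix K(n-k+1) ... K(n)."""
    L = K(n - k + 1)
    for j in range(n - k + 2, n + 1):
        L = matmul(L, K(j))
    return abs(L[0][0] - L[0][1]) + abs(L[1][0] - L[1][1])


if __name__ == "__main__":
    for k in (2, 4, 8, 12):
        print(k, float(column_gap(30, k)))
```

For $k = 2$ the gap is the value $2F_{n-2}F_{n-3}/(F_nF_{n-1}) \approx 2\gamma^{-4}$ computed in Example 6.37, and it decreases as more levels are telescoped together.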


This is [77, Theorem 3.1]$^{13}$ and this paper also contains a description of BV-systems with any fixed number of ergodic measures.

Example 6.39. Assume that we have a rank 2 simple Bratteli diagram with $V_n = \{v_n, w_n\}$ and incidence matrices

$$M(1) = (1, 1) \quad \text{and} \quad M(n) = \begin{pmatrix} 1 & a_n - 1 \\ a_n - 1 & 1 \end{pmatrix} \quad \text{for } a_n \ge 2,\ n \ge 2.$$

Then $h_{v_n}(n) = h_{w_n}(n) = \prod_{j=1}^n a_j$. This gives

$$K(n) = \begin{pmatrix} \varepsilon_n & 1 - \varepsilon_n \\ 1 - \varepsilon_n & \varepsilon_n \end{pmatrix}, \qquad \varepsilon_n = \frac{1}{a_n}, \quad n \ge 2.$$

We compute that $\rho_n = \frac{1 - \varepsilon_n}{\varepsilon_n}$ for $\rho_n$ as in Lemma 6.35, so if $\sum_n \varepsilon_n = \infty$ (hence $\sum_n 1/\rho_n = \infty$), then the corresponding BV-system is uniquely ergodic.

Now assume that $\sum_n \varepsilon_n < \infty$. Telescope the matrices $K(n)$ to

$$L(n, r) = (l_{vw}(n, r))_{v \in V_{n-1}, w \in V_r} := K(r) \cdots K(n) = \begin{pmatrix} \eta_{n,r} & 1 - \eta_{n,r} \\ 1 - \eta_{n,r} & \eta_{n,r} \end{pmatrix}$$

for $r \ge n \ge 2$, with $\eta_{n,n} = \varepsilon_n$ and $\eta_{n,r+1} = \eta_{n,r} + \varepsilon_{r+1} - 2\eta_{n,r}\varepsilon_{r+1} < \eta_{n,r} + \varepsilon_{r+1}$. Since $\sum_{n\ge2} \varepsilon_n < \infty$, we have $\lim_r \eta_{n,r} \le \sum_{r \ge n} \varepsilon_r \to 0$ as $n \to \infty$. Therefore

$$\lim_{r\to\infty} \max_{w \ne w' \in V_r} \sum_{v \in V_{n-1}} |l_{vw}(n, r) - l_{vw'}(n, r)| \to 2 \quad \text{as } n \to \infty.$$

In other words, no telescoping will be enough to satisfy Theorem 6.38. Therefore unique ergodicity fails. Instead, there will be two ergodic invariant measures (not more, because the rank is 2), and what they look like depends on the ordering on the Bratteli diagram; cf. [79, Lemma 2.7]. Further examples of this sort can be found in [7, Section 3].

6.3.4. Unique Ergodicity and Enumeration Systems. Barat et al. proved the following results on enumeration systems (see [49, Corollary 2 and Theorem 7] for proofs):

Theorem 6.40. Let $(X_G, a)$ be the enumeration system based on the enumeration scale $(G_j)_{j\ge0}$.
(1) If $\sum_{k\ge1} 1/G_k < \infty$, then $X_G$ supports an $a$-invariant probability measure.
(2) Conversely, the existence of an $a$-invariant probability measure implies that $\liminf_k k/G_k = 0$.
(3) If $G_j - G_{j-1} \to \infty$ and the sequence $\big( m \sum_{k \ge m} 1/G_k \big)_{m\ge1}$ is bounded, then $(X_G, a)$ is uniquely ergodic.

$^{13}$ In [7, Proposition 3.1] the complete case of $2 \times 2$ matrices with the equal row sum property is given.


A special case holds for unimodal maps.

Corollary 6.41. Let $f$ be a unimodal map with kneading map $Q$. If $k - Q(k)$ is bounded, then $f : \omega(c) \to \omega(c)$ is uniquely ergodic.

Proof. The cutting times $(S_k)_{k\ge1}$ form an enumeration scale with $S_k = S_{k-1} + S_{Q(k)}$. The boundedness of $k - Q(k)$ implies that $S_k \nearrow \infty$ exponentially, and therefore condition (3) in Theorem 6.40 is satisfied. $\square$

Unique ergodicity is not given for minimal critical $\omega$-limit sets; see [121]. In fact, the collection of kneading maps is so rich that for every simplex $\Sigma$ with a compact totally disconnected set of extremal points, there is a kneading map $Q$ such that the corresponding system $(\omega(c), f)$ has its Choquet simplex homeomorphic to $\Sigma$; see [168]. This goes to the extent that the invariant measures can form a Poulsen simplex.

Note that $a : X_G \to X_G$ need not be continuous, so the existence of an $a$-invariant measure does not follow from the Krylov-Bogul'yubov Theorem 6.2. Theorem 6.40 shows, however, that unique ergodicity holds whenever $\liminf_j G_{j+1}/G_j > 1$, so also if e.g. $G_j = 2G_{j-1} + 1$ for all $j \ge 1$, even though in this case $a$ is not continuous. As shown in [121, 168], there are non-uniquely ergodic enumeration systems.

Example 6.42. Condition (1) in Theorem 6.40 is fairly strict. Having $k/G_k \to 0$ is not enough, as [49, Example 6] shows. Here we choose $J_n = \sum_{k=1}^n k!$ for $n \in \mathbb{N}$, $G_0 = 1$, and

$$G_j = \begin{cases} (n+1)G_{j-1} + 1 & \text{if } j = J_n, \\ G_{j-1} + 1 & \text{otherwise.} \end{cases}$$

In this case $a : X_G \to X_G$ is discontinuous at $(0)$ (because $G_j - G_{j-1} = 1$ infinitely often). Also $(m)_0 = 0$ for each $G_{J_n} \le m \le G_{J_n+1}$, but one can compute that $(m)_0 = 1$ for a definite proportion of the integers $G_{J_n-1} < m < G_{J_n}$. As a result, the cylinder $[1]$ has no well-defined visit frequency for the $a$-orbit of $(0)$, or in fact of any $x \in X_G$. Therefore $a$ admits no invariant measure.

6.3.5. Unique Ergodicity and Interval Exchange Transformations. Recall the definition of interval exchange transformations with permutation $\pi$ on $\{1, \dots, d\}$ and length vector $\lambda = (\lambda_1, \dots, \lambda_d)$. Under the Keane condition, interval exchange transformations are minimal. The next natural question of whether they are uniquely ergodic, i.e. whether Lebesgue measure is the only $T$-invariant probability measure, has a negative answer. The work of Veech [541] contains counterexamples, but this was not realized as such, partly because of the terminology he used (skew-product of rotations), except by Keynes & Newton [358] who created an IET $T$ with five pieces that is not


uniquely ergodic (in fact, it is an IET on three pieces with eigenvalue $-1$, and therefore $T^2$ on five pieces allows two independent $T$-invariant functions). Keane [348] found a class of examples with four pieces, and this is the minimal possible number, because an IET with $d$ pieces can have at most $d/2$ ergodic probability measures; see [345, 347, 542]. Chaika & Masur [151] gave an example of a six piece IET with two ergodic measures and one extra generic (but non-ergodic) measure.

We start with Keane's counterexample; the technique of proving non-unique ergodicity is similar to what is discussed in Section 6.3.3. This example $T$ has permutation $\pi : \{1, 2, 3, 4\} \to \{3, 2, 4, 1\}$ and lengths $\lambda_i = |\Delta_i|$ satisfying $\lambda_3 > \lambda_4 > \lambda_1$; see Figure 6.2.

Figure 6.2. An IET and the first return map to D. (Labels: intervals $\Delta_1, \dots, \Delta_4$; sub-pieces $a, b, c, d$; "$m - 1$ times"; "$n$ times".)

The first return map to $\Delta_4$ is again an IET with four pieces and permutation $\pi' = \pi$, provided we number the sub-pieces of $\Delta_4$ in reverse order. Since $\lambda_4 > \lambda_1$, $T$ maps the right part $\Delta'$ of $\Delta_4$ into $\Delta_2$, and then $T$ translates $T(\Delta')$ some $m - 1 \in \mathbb{N}$ times within $\Delta_2$ until it covers the right end-point $\gamma_2$ of $\Delta_2$. Thus there is a decomposition $\Delta' = b \cup a$ into intervals such that $\gamma_2$ is the common boundary point of $T^m(b)$ and $T^m(a)$. Since $T|_{\Delta_3}$ is a translation over $\lambda_4 - \lambda_1 = |\Delta'|$, $T^{m+1}(b)$ is adjacent to the other side of $T^m(a)$. Now the adjacent intervals $T^m(a)$, $T^{m+1}(b)$ are mapped $n$ times within $\Delta_3$ (for some $n \in \mathbb{N}$) before returning to $\Delta_4$.

The left part $\Delta''$ of $\Delta_4$ is first mapped onto $\Delta_1$, then into $\Delta_3$, and after $n - 1$ more iterates it covers $\gamma_3$. Thus there is a decomposition $\Delta'' = d \cup c$ into intervals such that $\gamma_3$ is the common boundary point of $T^{n+1}(d)$ and $T^{n+1}(c)$. The interval $c$ is now back in $\Delta_4$ and $d$ needs one more iterate to return. As a whole, the itineraries of the four sub-pieces of $\Delta_4$ before returning to $\Delta_4$ are described


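by the substitution and associated matrix:

$$\chi : \begin{cases} 1 \to 4\, 2^{m-1}\, 3^n, \\ 2 \to 4\, 2^m\, 3^n, \\ 3 \to 4\, 1\, 3^{n-1}, \\ 4 \to 4\, 1\, 3^n, \end{cases} \qquad A = \begin{pmatrix} 0 & 0 & 1 & 1 \\ m-1 & m & 0 & 0 \\ n & n & n-1 & n \\ 1 & 1 & 1 & 1 \end{pmatrix}.$$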

Proposition 6.43. There is an IET as above that is not uniquely ergodic.

From the proof it is clear that there are actually uncountably many such IETs. Whether Lebesgue measure is ergodic for any of them is still an open question.

Proof. For $k \in \mathbb{N}$, let $A_k$ be the associated matrix to the substitution $\chi_k$ of the $k$-th induction step. It was shown in [348, Theorem 1] that for any choice of $(m_k)_{k\in\mathbb{N}}$ and $(n_k)_{k\in\mathbb{N}}$ of integers appearing in the matrices $(A_k)_{k\in\mathbb{N}}$, there is an IET realizing these sequences. Let $f_k : \Sigma_3 \to \Sigma_3$, $v \mapsto vA_k / \|vA_k\|_1$ be the map $v \mapsto vA_k$ normalized to the 3-simplex $\Sigma_3$. Let $v_0 = \frac12(0, 1, 1, 0)$ and $w_0 = (0, 0, 1, 0)$ be two vectors in $\Sigma_3$. The goal is to show that for appropriate choices of the integers $m_k$ and $n_k$ appearing in $A_k$, the differences

$$\|f_1 \circ \cdots \circ f_k(v_0) - f_1 \circ \cdots \circ f_k(w_0)\|_1 \ge \frac12 \quad \text{for all } k \in \mathbb{N}, \tag{6.9}$$

because then the convergence to limit frequencies of symbols in $\sigma^j(\rho_T)$ is not uniform in $j$. Oxtoby's Theorem 6.20 then shows that unique ergodicity fails.

Take $\varepsilon \in (0, \frac18)$ arbitrary and $N \in \mathbb{N}$ such that $2/N < \varepsilon$. Set $n_k = N 2^k$, $m_k = 2n_k$, and $\varepsilon_k = \varepsilon 2^{-k}$. Note that $\|v_0 - w_0\|_1 = 1$. Therefore (6.9) follows immediately from the following claim:

$$\|f_k(v) - v_0\|_1,\ \|f_k(w) - w_0\|_1 \le \varepsilon_k \quad \text{for all } v, w \in \Sigma_3 \text{ with } \|v - v_0\|_1,\ \|w - w_0\|_1 < \varepsilon_{k+1}.$$

To prove this, assume that $v = v_0 + \eta$, where $\|\eta\|_1 < \varepsilon_{k+1}$. Then

$$f_k(v) = \frac{1}{2n_k + O(1)} \Big( \tfrac12 (1, 2n_k, 2n_k - 1, 2) + \big( \eta_3 + \eta_4,\ 2n_k(\eta_1 + \eta_2) - \eta_1,\ n_k(\eta_1 + \eta_2 + \eta_3 + \eta_4) - \eta_3,\ \eta_1 + \eta_2 + \eta_3 + \eta_4 \big) \Big) = v_0 + \Big( 0,\ \eta_1 + \eta_2,\ \tfrac12(\eta_1 + \eta_2 + \eta_3 + \eta_4),\ 0 \Big) + O\Big( \frac{1}{n_k} \Big).$$

Therefore, since $1/N < \varepsilon/2$, $\|f_k(v) - v_0\|_1 \le \frac32 \varepsilon_{k+1} + O(n_k^{-1}) < \varepsilon_k$. The computation for $w$ is analogous. This finishes the proof of the claim and of the whole proposition. $\square$
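The separation (6.9) can be inspected numerically for finitely many induction steps. The sketch below is a finite-depth check under stated assumptions, not a substitute for the proof: it uses the matrix $A$ displayed before Proposition 6.43 with $n_k = N2^k$ and $m_k = 2n_k$, takes $N = 64$ as an arbitrary choice, and applies $A_k$ to column vectors, since this convention reproduces the vector $\frac12(1, 2n_k, 2n_k - 1, 2)$ appearing in the proof. The function names are hypothetical.

```python
def keane_matrix(m: int, n: int):
    """Matrix A of the return substitution: column j counts the letters in chi(j)."""
    return [[0, 0, 1, 1],
            [m - 1, m, 0, 0],
            [n, n, n - 1, n],
            [1, 1, 1, 1]]


def f(v, m: int, n: int):
    """Apply A to the column vector v and normalise to l^1-norm 1 (assumed convention)."""
    A = keane_matrix(m, n)
    u = [sum(A[i][j] * v[j] for j in range(4)) for i in range(4)]
    s = sum(u)
    return [x / s for x in u]


def composed(v, depth: int, N: int):
    """f_1 o ... o f_depth applied to v, with n_k = N * 2**k and m_k = 2 * n_k."""
    for k in range(depth, 0, -1):
        n_k = N * 2 ** k
        v = f(v, 2 * n_k, n_k)
    return v


def l1_distance(depth: int, N: int = 64) -> float:
    """l^1-distance between the images of v_0 and w_0 under f_1 o ... o f_depth."""
    v = composed([0.0, 0.5, 0.5, 0.0], depth, N)
    w = composed([0.0, 0.0, 1.0, 0.0], depth, N)
    return sum(abs(a - b) for a, b in zip(v, w))


if __name__ == "__main__":
    for depth in range(1, 11):
        print(depth, l1_distance(depth))
```

For the depths tried here the two images stay far apart, so the symbol frequencies seen from the two starting vectors do not merge; the behaviour for all $k$ is the content of the proof and is not established by this computation.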


Despite these examples, the prevalent case is that IETs are uniquely ergodic. Keane & Rauzy showed in [351] that a residual set of the IETs in $\bigcup_{d\ge2} \Sigma_{d-1} \times S_d$ is uniquely ergodic. In [348], Keane stated the conjecture that Lebesgue-a.e. IETs are uniquely ergodic, and this was proven in separate papers (but in the same issue of the Annals of Mathematics) by Veech [543] and Masur [410].

Figure 6.3. Graphs for the Rauzy induction with d = 2 and d = 3. (Vertices: (12) for d = 2; (132), (13), (123) for d = 3; arrows labelled 0 and 1.)

Rauzy induction constitutes a dynamical system on the parameter space of the family of interval exchange transformations. Recall that for IETs on $d$ pieces, this parameter space is $\Sigma_{d-1} \times S_d$, where $\Sigma_{d-1} = \{\lambda \in [0, 1]^d : \sum_{i=1}^d \lambda_i = 1\}$ is the $(d-1)$-dimensional simplex and $S_d$ is the group of permutations on $\{1, \dots, d\}$. Each copy $\Sigma_{d-1} \times \{\pi\}$ is divided in halves $\Sigma^0_{d-1} \times \{\pi\}$ and $\Sigma^1_{d-1} \times \{\pi\}$ according to whether $\lambda_d > \lambda_{\pi^{-1}(d)}$ (Type 0) or $\lambda_d < \lambda_{\pi^{-1}(d)}$ (Type 1). The case $\lambda_d = \lambda_{\pi^{-1}(d)}$ fails the Keane condition. The Rauzy induction $\Theta$ maps $\Sigma^0_{d-1} \times \{\pi\}$ diffeomorphically onto $\Sigma_{d-1} \times \{\pi'\}$ and $\Sigma^1_{d-1} \times \{\pi\}$ diffeomorphically onto $\Sigma_{d-1} \times \{\pi''\}$ for some $\pi', \pi'' \in S_d$ (and $\pi' \ne \pi''$ if $d \ge 3$). As such, $\Theta$ is a 2-to-1 map and it is schematically represented by a graph with the permutations $\pi \in S_d$ as vertices and arrows $\pi \xrightarrow{0} \pi'$ and $\pi \xrightarrow{1} \pi''$; see Figure 6.3 for the cases $d = 2$ (circle rotations) and $d = 3$. Reducible permutations are left out, e.g. for $d = 3$ there are only three vertices instead of $\#S_3 = 3! = 6$, because the permutations (12), (23) and the identity are reducible. The case $d = 4$, see Figure 6.4, is the first case that shows that the graph need not be irreducible but can fall apart in so-called Rauzy classes. All permutations in the first Rauzy class in Figure 6.4 keep (at least two) adjacent intervals adjacent, and therefore it describes effectively IETs on three intervals. On each Rauzy class, $\Theta$ is topologically transitive and in fact topologically exact (i.e. locally eventually onto).


[Figure 6.4: Rauzy graph on the irreducible permutations (1234), (1324), (14), (13)(24), (1432), (1423), (1342), (142), (143), (14)(23), (1243), (124), (134); arrows are labelled 0 or 1.]

Figure 6.4. The graph for the Rauzy induction with d = 4.

The restrictions of $\Theta$ to the half-simplices $\Sigma^{0/1}_{d-1}$ are expanding diffeomorphisms. This expansion is achieved by the normalization (i.e. division by $1 - \lambda_d$ or $1 - \lambda_{\pi^{-1}(d)}$) and therefore it is not uniform. Indeed, $\Sigma^0_{d-1}$ has a hyperplane $\{\lambda_d = 0\}$ and $\Sigma^1_{d-1}$ has a hyperplane $\{\lambda_{\pi^{-1}(d)} = 0\}$ of neutral points, which can in fact contain fixed points. To overcome this lack of uniform expansion, we can accelerate $\Theta$ (i.e. take an induced map). The result is called Zorich acceleration $Z : \Sigma_{d-1} \times S_d \to \Sigma_{d-1} \times S_d$, defined as follows. Let

$$\tau(\lambda) = \min\{n \geq 1 : \Theta^n \text{ changes type at } \lambda\} \quad \text{and} \quad Z(\lambda, \pi) = \Theta^{\tau(\lambda)}(\lambda, \pi).$$

There are countably many connected components of level sets of τ , and it can be shown that Z is uniformly expanding on each of them. The following theorem, after Veech [543] and Masur [410], is the main ingredient for the proof of the Keane conjecture.


Theorem 6.44. Rauzy induction $\Theta$ preserves an infinite measure, equivalent to $(d-1)$-dimensional Lebesgue measure × counting measure, and it is ergodic on each Rauzy class. Zorich acceleration preserves a finite measure $\mu_Z$, equivalent to $(d-1)$-dimensional Lebesgue × counting measure, and it is ergodic on each Rauzy class. Moreover, its density $\frac{d\mu_Z(\lambda)}{d\lambda}$ is a rational function in $\lambda$, and $\frac{d\mu_Z(\lambda)}{d\lambda}$ is bounded and bounded away from 0.

Now the Keane conjecture can be stated as a corollary of the above.

Corollary 6.45. The Keane conjecture holds: Lebesgue-a.e. irreducible interval exchange transformation is uniquely ergodic, with Lebesgue measure as its only invariant probability measure.

Proof. Assume we are in some Rauzy class $R$; the proof for every Rauzy class is the same. Rauzy induction $\Theta$ "removes" the rightmost interval of length $\lambda_d$ or $\lambda_{\pi^{-1}(d)}$, whichever is shorter. So by applying $\Theta$ repeatedly, each interval $j$ will eventually become rightmost, so some $\chi$-image will have $j$ as second letter; see Table 4.3 in Section 4.4. Hence, we can find an open set $U$ on which $Z^N$ is continuous and $U$ visits all parts $\Sigma_{d-1} \times \{\pi\}$ in $R$ sufficiently often in these $N$ iterates that the telescoping of the corresponding substitutions $\chi_1 \circ \cdots \circ \chi_N$ has a strictly positive transition matrix $A$. Therefore $\rho(A)$ as computed in (8.29) is a fixed positive number, and consequently, $A$ is a strict contraction in Hilbert metric. By Birkhoff's Ergodic Theorem 6.13, $\mu_Z$-a.e. $(\lambda, \pi) \in R$ visits $U$ infinitely often, so we can apply Lemma 8.61 and conclude that such $(\lambda, \pi)$ corresponds to a uniquely ergodic IET. Since $\mu_Z$ is equivalent to Lebesgue measure × counting measure, the Keane conjecture follows. □

Remark 6.46. The collection of non-uniquely ergodic IETs of $d$ pieces has Lebesgue measure 0 in $(d-1)$-dimensional parameter space, but their Hausdorff dimension is equal to $d - \frac{3}{2}$ for $d \geq 4$; see [37, 152]. Dimension $d = 2, 3$ is too low to get any non-unique ergodicity other than via a rational relation between the lengths of the pieces (for $d = 2$, i.e. circle rotations, this means a rational rotation number) and therefore the Hausdorff dimension of non-uniquely ergodic IETs is $d - 2$.

Remark 6.47. Katok (quoted in [165]) showed that for every IET, uniquely ergodic or not, Lebesgue measure is not mixing; see Section 6.7. Avila & Forni [40] showed that typical IETs are weak mixing. This comes after the result of Nogueira & Rudolph [432] that generic IETs have no continuous eigenfunctions apart from constant functions, and results by Sinaĭ & Ulcigrai [514] showing that IETs for which the Rauzy induction has a certain type of periodicity are weak mixing. Conditions ensuring that IETs have a continuous spectrum and satisfy Sarnak's conjecture were given in [131] and [342], respectively.


6.4. Measure-Theoretic Entropy

Every $T$-invariant probability measure $\mu$ has its own entropy, called Kolmogorov entropy, measure-theoretic entropy, or (a misnomer) metric entropy. It is denoted as $h_\mu(T)$. Both topological and measure-theoretic entropy are measures for the complexity of a dynamical system $(X, T)$, but $h_\mu(T)$ also plays a role in information theory under the name of Shannon entropy. The topological entropy from Section 2.4 gives the exponential growth rate of the cardinality of$^{14}$ $\mathcal{P}_n = \bigvee_{k=0}^{n-1} T^{-k}\mathcal{P}$ for some natural partition $\mathcal{P}$ of the space $X$. In this section, instead of just counting $\mathcal{P}_n$, we take a particular weighted sum of the elements $Z_n \in \mathcal{P}_n$. If the mass of $\mu$ is equally distributed (as much as possible) over all $Z_n \in \mathcal{P}_n$, then the outcome of this sum is largest; $\mu$ would then be the measure of maximal entropy. The weighting function is

$$\varphi : [0, 1] \to \mathbb{R}, \qquad \varphi(x) = -x \log x,$$

with $\varphi(0) := \lim_{x \searrow 0} \varphi(x) = 0$. Clearly $\varphi'(x) = -(1 + \log x)$, so $\varphi(x)$ assumes its maximum at $1/e$ and $\varphi(1/e) = 1/e$. Also $\varphi''(x) = -1/x < 0$, so $\varphi$ is strictly concave. Given a finite partition $\mathcal{P}$ of a probability space $(X, \mu)$, let

$$H_\mu(\mathcal{P}) = \sum_{P \in \mathcal{P}} \varphi(\mu(P)) = -\sum_{P \in \mathcal{P}} \mu(P) \log(\mu(P)), \tag{6.10}$$

where we can ignore the partition elements with $\mu(P) = 0$ because $\varphi(0) = 0$. For a $T$-invariant probability measure $\mu$ on $(X, \mathcal{B}, T)$ and a partition $\mathcal{P}$, define the entropy of $\mu$ w.r.t. $\mathcal{P}$ as

$$h_\mu(T, \mathcal{P}) = \lim_{n \to \infty} \frac{1}{n} H_\mu\Big(\bigvee_{k=0}^{n-1} T^{-k}\mathcal{P}\Big). \tag{6.11}$$

This limit exists by Fekete's Lemma 1.15 and the fact that the sequence $H_\mu(\bigvee_{k=0}^{n-1} T^{-k}\mathcal{P})$ is subadditive in $n$; see [551, Corollary 4.9.1]. Finally, the measure-theoretic entropy of $\mu$ is

$$h_\mu(T) = \sup\{h_\mu(T, \mathcal{P}) : \mathcal{P} \text{ is a finite partition of } X\}. \tag{6.12}$$

The next theorem is the key to really computing entropy, as it shows that a single well-chosen partition $\mathcal{P}$ suffices to compute the entropy as $h_\mu(T) = h_\mu(T, \mathcal{P})$.

$^{14}$ The joint $\mathcal{P} \vee \mathcal{Q} := \{P \cap Q : P \in \mathcal{P}, Q \in \mathcal{Q}\}$. The expression here is an $n$-fold joint.


Theorem 6.48 (Kolmogorov-Sinaĭ). Let $(X, \mathcal{B}, T, \mu)$ be a measure-preserving dynamical system. If the partition $\mathcal{P}$ is such that

$$\begin{cases} \bigvee_{k=0}^{\infty} T^{-k}\mathcal{P} \text{ generates } \mathcal{B} & \text{if } T \text{ is non-invertible}, \\ \bigvee_{k=-\infty}^{\infty} T^{-k}\mathcal{P} \text{ generates } \mathcal{B} & \text{if } T \text{ is invertible}, \end{cases}$$

then $h_\mu(T) = h_\mu(T, \mathcal{P})$.

We haven't explained properly what "generates $\mathcal{B}$" means, but the idea to have in mind is that (up to measure 0), every two points in $X$ should be in different elements of $\bigvee_{k=0}^{n-1} T^{-k}\mathcal{P}$ (if $T$ is non-invertible) or of $\bigvee_{k=-n}^{n-1} T^{-k}\mathcal{P}$ (if $T$ is invertible) for some sufficiently large $n$.

Example 6.49. Recall from Definition 1.32 that Bernoulli shifts are the full shifts $\mathcal{A}^{\mathbb{N}_0}$ or $\mathcal{A}^{\mathbb{Z}}$ (where we take $\mathcal{A} = \{1, \dots, N\}$) equipped with a stationary product measure $\mu_p$, depending on a fixed probability vector $p = (p_1, \dots, p_N)$. The partition into cylinder sets $[a]$, $a \in \mathcal{A}$, is generating, and the entropy can be computed to be ([551, Theorem 4.26])

$$h_{\mu_p}(\sigma) = -\sum_i p_i \log p_i. \tag{6.13}$$
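Formulas (6.10), (6.11) and (6.13) can be checked numerically on a small example. The following is only a sketch under stated assumptions: the probability vector `p` and the helper names are illustrative choices, not taken from the text, and the product measure of an n-cylinder is computed directly as a product of letter probabilities. For a Bernoulli measure the quotient $\frac{1}{n} H_\mu(\bigvee_{k=0}^{n-1} \sigma^{-k}\mathcal{P})$ should not depend on $n$.

```python
import math
from itertools import product

def partition_entropy(masses):
    """H_mu(P) = -sum mu(P) log mu(P); null elements are skipped since phi(0) = 0."""
    return -sum(m * math.log(m) for m in masses if m > 0)

def cylinder_masses(p, n):
    """Masses of all n-cylinders under the product measure with letter weights p."""
    return [math.prod(word) for word in product(p, repeat=n)]

def entropy_rate_estimate(p, n):
    """(1/n) H_mu of the n-fold joint of the partition into 1-cylinders."""
    return partition_entropy(cylinder_masses(p, n)) / n

# Hypothetical probability vector, chosen only for illustration.
p = (0.5, 0.3, 0.2)
exact = partition_entropy(p)  # right-hand side of (6.13)
estimates = [entropy_rate_estimate(p, n) for n in (1, 2, 5)]

# Two different probability vectors with the same entropy 2 log 2.
uniform4 = partition_entropy((0.25,) * 4)
skewed5 = partition_entropy((0.5,) + (0.125,) * 4)

print(exact, estimates)
print(uniform4, skewed5)
```

The last two values agree, which is the numerical coincidence behind the pair of Bernoulli shifts on four and five symbols discussed after Ornstein's Theorem.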

The existence of a finite generating partition is guaranteed by a theorem due to Krieger [372].

Theorem 6.50. Let $(X, \mathcal{B}, \mu)$ be a Lebesgue space$^{15}$. If $T$ is an invertible measure-preserving transformation of finite entropy, then there is a finite generator $\mathcal{P} = \{P_1, \dots, P_N\}$ with $e^{h_\mu(T)} \leq N \leq e^{h_\mu(T)} + 1$.

$^{15}$ That is, $(X, \mathcal{B}, \mu)$ is isomorphic to $([0, 1], \mathrm{Leb}) \sqcup$ (countable set with counting measure).

Example 6.51. A system where a likely candidate partition fails to be generating is the doubling map $T_2 : \mathbb{S}^1 \to \mathbb{S}^1$, $T_2(x) = 2x \bmod 1$, preserving Lebesgue measure $\mu$. The partition $\mathcal{P} = \{[0, \frac{1}{2}), [\frac{1}{2}, 1)\}$ separates each pair of points, because if $x \neq y$, say $2^{-(n+1)} < |x - y| \leq 2^{-n}$, then there is $k \leq n$ such that $T_2^k x$ and $T_2^k y$ belong to different partition elements. However, $\mathcal{Q} = \{[\frac{1}{4}, \frac{3}{4}), [0, \frac{1}{4}) \cup [\frac{3}{4}, 1)\}$ does not separate points. Indeed, if $y = 1 - x$, then $T_2^k(y) = 1 - T_2^k(x)$ for all $k \geq 0$, so $x$ and $y$ belong to the same partition element, and $T_2^k(y)$ and $T_2^k(x)$ will also belong to the same partition element. The partitions $\mathcal{P}$ and $\mathcal{Q}$ are special cases of the family of partitions $\{J_0^b, J_1^b\}$ in Example 1.43. The partition $\mathcal{P}$ can be used to compute $h_\mu(T)$, while $\mathcal{Q}$ in principle cannot (although here, for every Bernoulli measure $\mu = \mu_{(p, 1-p)}$, we have $h_\mu(T_2) = h_\mu(T, \mathcal{P}) = h_\mu(T, \mathcal{Q})$).
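The failure of $\mathcal{Q}$ to separate $x$ from $1 - x$ can be observed with exact rational arithmetic. This is a minimal sketch: the starting point $x = 1/7$ is an arbitrary choice whose orbit avoids the boundary points $\frac{1}{4}$ and $\frac{3}{4}$ of $\mathcal{Q}$, and the function names are illustrative.

```python
from fractions import Fraction

def doubling(x):
    """Doubling map T_2(x) = 2x mod 1 on exact rationals."""
    return (2 * x) % 1

def code_P(x):
    """Label of x in P = {[0, 1/2), [1/2, 1)}."""
    return 0 if x < Fraction(1, 2) else 1

def code_Q(x):
    """Label of x in Q = {[1/4, 3/4), [0, 1/4) u [3/4, 1)}."""
    return 0 if Fraction(1, 4) <= x < Fraction(3, 4) else 1

def itinerary(x, code, n):
    """First n symbols of the itinerary of x with respect to the given coding."""
    word = []
    for _ in range(n):
        word.append(code(x))
        x = doubling(x)
    return word

x = Fraction(1, 7)
y = 1 - x
n = 12
same_under_Q = itinerary(x, code_Q, n) == itinerary(y, code_Q, n)
same_under_P = itinerary(x, code_P, n) == itinerary(y, code_P, n)
print(same_under_Q, same_under_P)
```

The two points share their $\mathcal{Q}$-itinerary but are told apart by $\mathcal{P}$ already at the first symbol.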


For piecewise differentiable maps $T : I \to I$ on the interval that preserve a measure $\mu$ equivalent to Lebesgue measure, there is the Rokhlin formula to find the entropy:

$$h_\mu(T) = \int_0^1 \log |T'| \, d\mu. \tag{6.14}$$

See [475], [549, Theorem 9.7.3], and [390] for proofs.$^{16}$ The entropy in this case is positive, provided $T$ is non-invertible in the sense that $\mu(\{x \in [0, 1] : \# T^{-1}(x) \geq 2\}) > 0$. This follows from the following general result.

Proposition 6.52. Let $(X, T)$ be a transformation that preserves an ergodic probability measure $\mu$. If there are two disjoint sets $A, B$ such that $T(A) = T(B)$ are measurable and $\mu(A), \mu(B) > 0$, then $\mu$ has positive entropy.

Proof. Construct the factor system $Y \subset \{A, B, C\}^{\mathbb{N}_0}$ where $C = X \setminus (A \cup B)$ with the itinerary map as factor map. This factor system is non-invertible, but such non-invertible symbolic systems have positive entropy; see [209, Fact 2.3.12]. It follows that $(X, T)$ has positive entropy as well. □
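As a sanity check of the Rokhlin formula (6.14), consider a map with $N$ full linear branches of slope $1/p_i$ (the kind of map that appears in Example 6.55), which preserves Lebesgue measure. The sketch below approximates $\int_0^1 \log|T'| \, d\mathrm{Leb}$ by a midpoint rule and compares it with $-\sum_i p_i \log p_i$. The vector `p`, the grid size and the helper names are assumptions made only for this illustration.

```python
import math

def make_branches(p):
    """List of (left, right, slope) for full linear branches of lengths p_i."""
    branches, left = [], 0.0
    for pi in p:
        branches.append((left, left + pi, 1.0 / pi))
        left += pi
    return branches

def rokhlin_integral(p, samples=10_000):
    """Midpoint-rule approximation of the integral of log|T'| w.r.t. Lebesgue."""
    branches = make_branches(p)
    total = 0.0
    for k in range(samples):
        x = (k + 0.5) / samples
        # Slope of the branch containing x; fall back to the last branch
        # in case rounding leaves x just beyond the final right endpoint.
        slope = next((s for (a, b, s) in branches if a <= x < b), branches[-1][2])
        total += math.log(slope)
    return total / samples

p = (0.5, 0.3, 0.2)  # hypothetical branch lengths
integral = rokhlin_integral(p)
bernoulli_entropy = -sum(pi * math.log(pi) for pi in p)
print(integral, bernoulli_entropy)
```

Because the integrand is piecewise constant, the two numbers agree up to rounding whenever the branch endpoints fall on the grid, as they do for this choice of `p`.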

6.5. Isomorphic Systems

An isomorphism is the measure-theoretic equivalent of conjugacy. It is both a stronger notion than conjugacy (since it requires the preservation of measures) and a weaker notion (isomorphisms need not be homeomorphisms or continuous or even defined everywhere). In particular, the phase spaces of isomorphic dynamical systems need not be homeomorphic; see e.g. Theorem 6.56.

Definition 6.53. Two measure-preserving dynamical systems $(X, \mathcal{B}, T, \mu)$ and $(Y, \mathcal{C}, S, \nu)$ are called isomorphic if there are $X' \in \mathcal{B}$, $Y' \in \mathcal{C}$, and $\phi : Y' \to X'$ such that
• $\mu(X') = 1$, $\nu(Y') = 1$;
• $\phi : Y' \to X'$ is a bi-measurable bijection;
• $\phi$ is measure preserving: $\nu(\phi^{-1}(B)) = \mu(B)$ for all $B \in \mathcal{B}$;
• $\phi \circ S = T \circ \phi$.

Example 6.54. The doubling map $T_2 : [0, 1] \to [0, 1]$ with Lebesgue measure is isomorphic to the one-sided $(\frac{1}{2}, \frac{1}{2})$-Bernoulli shift $(X, \mathcal{B}, \sigma, \mu)$, via the coding map $i : Y' \to X'$, where $Y' = [0, 1] \setminus \{\text{dyadic rationals}\}$ because these dyadic rationals map to $\frac{1}{2}$ under some iterate of $T_2$, and at $\frac{1}{2}$ the coding map is not well-defined. Note that $X' = \{0, 1\}^{\mathbb{N}} \setminus \{v10^\infty, v01^\infty : v \in \{0, 1\}^*\}$.

$^{16}$ Note that the Rokhlin formula can fail if $T$ has infinitely many branches; see Example 6.71 and [127]. The matrix $A$ in that example is the transition matrix of a countably piecewise Markov map $T : [0, 1] \to [0, 1]$ such that the slope $|T'| > 4$ wherever defined. Yet the entropy is $h_\mu(T) < h_{top}(T) = \log 4$. Similar examples can be found in [94, 95, 421].


Example 6.55. Let $p = (p_1, \dots, p_N)$ be some probability vector with all $p_i > 0$. The one-sided $p$-Bernoulli shift is isomorphic to $([0, 1], \mathcal{B}, T, \mathrm{Leb})$ where $T : [0, 1] \to [0, 1]$ has $N$ linear branches of slope $1/p_i$. The one-sided $p$-Bernoulli shift is also isomorphic to $([0, 1], \mathcal{B}, S, \nu)$ where $S(x) = Nx \bmod 1$. But here $\nu$ is another measure that gives $[\frac{i-1}{N}, \frac{i}{N}]$ the mass $p_i$ and $[\frac{i-1}{N} + \frac{j-1}{N^2}, \frac{i-1}{N} + \frac{j}{N^2}]$ the mass $p_i p_j$, etc.

Theorem 6.56. The Sturmian shift with rotation number $\alpha \notin \mathbb{Q}$ (with its unique invariant probability measure) is isomorphic to $(\mathbb{S}^1, R_\alpha, \mu)$.

Proof. Lebesgue measure $\mu$ is the only $R_\alpha$-invariant probability measure. The itinerary map $i : \mathbb{S}^1 \to X$ is a bijection from $\mathbb{S}^1 \setminus \mathrm{orb}(0)$ onto its image. Since $\mu(\mathrm{orb}(0)) = 0$, $i$ serves as isomorphism $\phi$ and $\nu(A) := \mu(\phi^{-1}(A))$ is automatically $\sigma$-invariant. By unique ergodicity of $(X, \sigma)$, $\nu$ is indeed the only $\sigma$-invariant measure on $X$. □

Clearly, invertible systems cannot be isomorphic to non-invertible systems, i.e. where $\mu(\{x : \# T^{-1}(x) \geq 2\}) > 0$. But there is a construction to make a non-invertible system invertible, namely the natural extension.

Definition 6.57. Let $(X, \mathcal{B}, \mu, T)$ be a measure-preserving dynamical system. A system $(Y, \mathcal{C}, S, \nu)$ is a natural extension of $(X, \mathcal{B}, \mu, T)$ if there are $X' \in \mathcal{B}$, $Y' \in \mathcal{C}$, and $\phi : Y' \to X'$ such that
• $\mu(X') = 1$, $\nu(Y') = 1$;
• $S : Y' \to Y'$ is invertible;
• $\phi : Y' \to X'$ is a measurable surjection;
• $\phi$ is measure preserving: $\nu(\phi^{-1}(B)) = \mu(B)$ for all $B \in \mathcal{B}$;
• $\phi \circ S = T \circ \phi$.

Any two natural extensions are isomorphic, see [456, page 13], so it makes sense to speak of the natural extension. Sometimes natural extensions have explicit formulas, e.g. the baker transformation

$$b : [0, 1]^2 \to [0, 1]^2, \qquad b(x, y) = \begin{cases} (2x, \frac{y}{2}) & \text{if } x \leq \frac{1}{2}, \\ (2x - 1, \frac{1+y}{2}) & \text{if } x > \frac{1}{2} \end{cases}$$

is the natural extension of the doubling map $T_2(x) = 2x \bmod 1$. There is also a general construction: Set $Y = \{(x_i)_{i \geq 0} : T(x_{i+1}) = x_i \in X \text{ for all } i \geq 0\}$ with $S(x_0, x_1, \dots) = (T(x_0), x_0, x_1, \dots)$. Then $S$ is invertible (with the left shift $\sigma = S^{-1}$) and

$$\nu(A_0, A_1, A_2, \dots) = \inf_i \mu(A_i) \qquad \text{for } (A_0, A_1, A_2, \dots) \subset Y$$


is $S$-invariant. The surjection $\phi(x_0, x_1, x_2, \dots) := x_0$ makes the diagram commute: $T \circ \phi = \phi \circ S$. Also $\phi$ is measure preserving because, for each $A \in \mathcal{B}$, $\phi^{-1}(A) = (A, T^{-1}(A), T^{-2}(A), T^{-3}(A), \dots)$ and clearly $\nu(A, T^{-1}(A), T^{-2}(A), T^{-3}(A), \dots) = \mu(A)$ because $\mu(T^{-i}(A)) = \mu(A)$ for every $i$ by $T$-invariance of $\mu$.

6.5.1. The Bernoulli Property and Ornstein's Theorem.

Definition 6.58. Let $(X, \mathcal{B}, \mu, T)$ be a measure-preserving dynamical system.
(1) If $T$ is invertible, then the system is called Bernoulli if it is isomorphic to a two-sided Bernoulli shift.
(2) If $T$ is non-invertible, then the system is called one-sided Bernoulli if it is isomorphic to a one-sided Bernoulli shift.
(3) If $T$ is non-invertible, then the system is called Bernoulli if its natural extension is isomorphic to a two-sided Bernoulli shift.

The third Bernoulli property is quite general (for example, one-sided SFTs and similar shifts with natural measures have this property; see e.g. [158]), even though the isomorphism $\phi$ may be very difficult to find explicitly. Expanding circle maps that are sufficiently smooth are also Bernoulli, i.e. have a Bernoulli natural extension; see [390]. Being one-sided Bernoulli, on the other hand, is quite special. If $T : [0, 1] \to [0, 1]$ has $N$ linear surjective branches $I_i$, $i = 1, \dots, N$, then Lebesgue measure $m$ is invariant, and $([0, 1], \mathcal{B}, m, T)$ is isomorphic to the one-sided Bernoulli system with probability vector $(|I_1|, \dots, |I_N|)$; see Example 6.55. If $T$ is piecewise $C^2$ but not piecewise linear, then it has to be $C^2$-conjugate to a piecewise linear expanding map to be one-sided Bernoulli; see [123].

Entropy is preserved under isomorphisms. This is a direct consequence of the definition of isomorphism. The opposite question, namely whether systems with the same entropy are isomorphic, was solved for two-sided Bernoulli shifts by Ornstein [438] in 1974 (cf. [551, Theorem 4.28] and [479, Chapter 7]).

Theorem 6.59 (Ornstein's Theorem). Two two-sided Bernoulli shifts are isomorphic if and only if they have the same entropy.

This is a remarkable result; e.g. it implies that the $(\frac{1}{4}, \frac{1}{4}, \frac{1}{4}, \frac{1}{4})$-Bernoulli shift and the $(\frac{1}{2}, \frac{1}{8}, \frac{1}{8}, \frac{1}{8}, \frac{1}{8})$-Bernoulli shift are isomorphic, although the first is on four and the second on five symbols.

Remark 6.60. Ornstein's Theorem is usually stated for Bernoulli shifts, but it holds for two-sided SFTs with stationary Markov measures as well,


see Definition 6.65 below, because these systems are isomorphic to Bernoulli shifts [261]. Ornstein's Theorem also holds for infinite alphabet shifts; see [209]. A short and elegant proof was given by Downarowicz & Serafin [216].

Ornstein's Theorem strengthened a result by Sinaĭ [511] from 1962:

Theorem 6.61 (Sinaĭ's Theorem). Every ergodic measure-preserving transformation $(X, \mathcal{B}, T, \mu)$ with entropy $h_\mu(T)$ has every $p$-Bernoulli shift with $h_{\mu_p}(\sigma) \leq h_\mu(T)$ as measure-theoretic factor.

Sinaĭ's Theorem says, for example, that if two Bernoulli shifts (with probability vectors $p$ and $p'$) have the same entropy, then there are measure-preserving factor maps $\psi$ and $\psi'$ from the one to the other and vice versa. But this leaves unanswered whether $\psi' = \psi^{-1}$. Ornstein's Theorem settles this in the positive.

We stress that (unlike Sinaĭ's Theorem) Ornstein's Theorem holds for two-sided shifts because in the one-sided shift setting the number of preimages is, almost surely, preserved under isomorphisms. Walters [550] showed that the one-sided $(p_1, \dots, p_m)$-Bernoulli shift is isomorphic to the $(p'_1, \dots, p'_n)$-Bernoulli shift if and only if $m = n$ and $(p_1, \dots, p_m)$ is a permutation of $(p'_1, \dots, p'_n)$.

The isomorphism for the two-sided setting is very complicated and has nothing to do with sliding block codes (no continuity is required). The proof of the existence of the isomorphism by Ornstein is not constructive, but in 1979, Keane & Smorodinsky [352] (sketched also in [456]) gave a constructive proof showing that the isomorphism can be made finitary.

Definition 6.62. A factor map $\psi : (X, \mu) \to (Y, \nu)$ is called finitary if one of the following equivalent properties holds:
• $\psi$ is continuous $\mu$-a.e.
• For $\mu$-a.e. $x \in X$, there is $N = N(x)$ such that the zeroth entry of $\psi(x)$ depends only on $[x_{-N}, \dots, x_N]$.
In this sense, a finitary factor map is a sliding block code with window size depending on $x$. If $\psi$ is invertible $\nu$-a.e. and $\psi^{-1}$ satisfies the above two properties, then $\psi$ is a finitary isomorphism.

6.6. Measures of Maximal Entropy

There is an important connection between topological and measure-theoretic entropy (see Section 6.4); namely the former is the supremum over the latter.

Theorem 6.63 (Variational Principle). If $(X, T)$ is a continuous dynamical system on a compact space, then

$$h_{top}(T) = \sup\{h_\mu(T) : \mu \text{ is a } T\text{-invariant probability measure}\}. \tag{6.15}$$


Therefore it makes sense to define:

Definition 6.64. A $T$-invariant probability measure $\mu$ is called a measure of maximal entropy if $h_\mu(T) = h_{top}(T)$.

For $\beta$-transformations and tent maps (or any interval map $T$ of constant slope $s > 1$, so $h_{top}(T) = \log s$), the absolutely continuous (w.r.t. Lebesgue measure) invariant probability measures are also the measures of maximal entropy. This follows from the Rokhlin formula (6.14). See Remark 3.70 and Example 3.92 for precise formulas for these measures. Full shifts on $\mathcal{A} = \{0, \dots, d-1\}$ have a unique measure of maximal entropy, namely the $(\frac{1}{d}, \dots, \frac{1}{d})$-Bernoulli measure.

A generalization of Bernoulli measures for SFTs is Markov measures. For such measures, the probability of $x_k$ depends on the value of $x_{k-1}$ but not on the further past $\dots x_{k-3}, x_{k-2}$.

Definition 6.65. Let $\mathcal{A} = \{0, \dots, d-1\}$ be our alphabet. Define a $d \times d$ probability transition matrix $P = (p_{ij})_{i,j=0}^{d-1}$ where all rows are probability vectors. Let $\pi \in \mathbb{R}^d$ be a probability row-vector. The measure defined on cylinders as $\mu_\pi([x_0 \dots x_n]) = \pi_{x_0} p_{x_0 x_1} p_{x_1 x_2} \cdots p_{x_{n-1} x_n}$ and extended to the Borel $\sigma$-algebra $\mathcal{B}$ of $\mathcal{A}^{\mathbb{N}_0}$ by the Kolmogorov Extension Theorem is called a Markov measure. It is shift-invariant (when $\pi$ is stationary, i.e. $\pi P = \pi$) and (provided $P$ is irreducible) ergodic.

The following result follows directly from the Perron-Frobenius Theorem 8.58.

Theorem 6.66. If $P$ is a primitive $d \times d$ probability matrix, then there is a unique stationary probability row-vector $p = (p_0, \dots, p_{d-1})$ with the following equivalent properties:
(i) $p$ is the probability left-eigenvector of $P$ w.r.t. eigenvalue 1.
(ii) $P^n \to \begin{pmatrix} p_0 & \dots & p_{d-1} \\ \vdots & & \vdots \\ p_0 & \dots & p_{d-1} \end{pmatrix}$ as $n \to \infty$.
(iii) $xP^n \to p$ as $n \to \infty$ for every probability vector $x \in \mathbb{R}^d$.
The convergence in items (ii) and (iii) is exponential. Moreover, the expected first return time $\mathbb{E}(\tau_i \mid x_0 = i) = 1/p_i$, where $\tau_i(x) = \min\{n > 0 : x_n = i\}$.
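Item (iii) of Theorem 6.66 suggests a simple way to approximate the stationary vector: iterate $x \mapsto xP$ from any probability vector. The following sketch does this in plain Python for a small primitive matrix chosen purely for illustration (the matrix `P`, the tolerance and the function names are assumptions, not part of the text), and reads off the expected return times $1/p_i$.

```python
def vec_mat(x, P):
    """Row vector times matrix: (xP)_j = sum_i x_i P_ij."""
    d = len(P)
    return [sum(x[i] * P[i][j] for i in range(d)) for j in range(d)]

def stationary_vector(P, x=None, tol=1e-13, max_iter=10_000):
    """Iterate x -> xP until successive iterates differ by less than tol."""
    d = len(P)
    if x is None:
        x = [1.0 / d] * d
    for _ in range(max_iter):
        y = vec_mat(x, P)
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    raise RuntimeError("no convergence; is P primitive?")

# Hypothetical primitive probability matrix (each row sums to 1).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

p = stationary_vector(P)
p_from_dirac = stationary_vector(P, x=[1.0, 0.0, 0.0])
return_times = [1.0 / pi for pi in p]
print(p, return_times)
```

For this matrix the limit is $(\frac{1}{4}, \frac{1}{2}, \frac{1}{4})$ regardless of the starting vector, so the expected first return times are 4, 2 and 4.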
For subshifts of finite type, Shannon [495] and Parry [446] (see also [364, Section 6.2] and [346, Section 4.4]) demonstrated how to construct the measure of maximal entropy. Let (ΣA , σ) be a subshift of finite type on


alphabet $\{0, \dots, d-1\}$ with transition matrix $A = (A_{i,j})_{i,j=0}^{d-1}$, so $x = (x_n) \in \Sigma_A$ if and only if $A_{x_n, x_{n+1}} = 1$ for all $n$.$^{17}$ Let us assume that $A$ is aperiodic and irreducible. Then by the Perron-Frobenius Theorem 8.58, there is a unique real eigenvalue $\lambda$, of multiplicity one, which is larger in absolute value than every other eigenvalue, and $h_{top}(\sigma) = \log \lambda$. Furthermore, by irreducibility of $A$, the left and right eigenvectors $u = (u_0, \dots, u_{d-1})$ and $v = (v_0, \dots, v_{d-1})^T$ associated to $\lambda$ are unique up to a multiplicative factor, and they can be chosen to be strictly positive. We will scale them such that

$$\sum_{i=0}^{d-1} u_i v_i = 1.$$

Now define the Shannon-Parry measure by

$$p_i := u_i v_i = \mu_{SP}([i]), \qquad p_{i,j} := \frac{A_{i,j} v_j}{\lambda v_i} = \mu_{SP}([ij] \mid [i]),$$

so $p_{i,j}$ indicates the conditional probability that $x_{n+1} = j$ knowing that $x_n = i$. Therefore $\mu_{SP}([ij]) = \mu_{SP}([i]) \mu_{SP}([ij] \mid [i]) = p_i p_{i,j}$. It is stationary (i.e. shift-invariant) but not quite a product measure: $\mu_{SP}([i_m \dots i_n]) = p_{i_m} \cdot p_{i_m, i_{m+1}} \cdots p_{i_{n-1}, i_n}$.

Theorem 6.67. The Shannon-Parry measure $\mu_{SP}$ is the unique measure of maximal entropy for an SFT with a primitive transition matrix.

Proof. This measure was introduced by Shannon [495], and Parry showed in [446] that it is indeed the unique measure of maximal entropy. In this proof, we will only show that $h_{\mu_{SP}}(\sigma) = h_{top}(\sigma) = \log \lambda$ and skip the (more complicated) uniqueness part; see [550, Theorem 8.10]. The definitions of the masses of 1-cylinders and 2-cylinders are compatible, because (since $v$ is a right eigenvector)

$$\sum_{j=0}^{d-1} \mu_{SP}([ij]) = \sum_{j=0}^{d-1} p_i p_{i,j} = p_i \sum_{j=0}^{d-1} \frac{A_{i,j} v_j}{\lambda v_i} = p_i \frac{\lambda v_i}{\lambda v_i} = p_i = \mu_{SP}([i]).$$

Summing over $i$, we get $\sum_{i=0}^{d-1} \mu_{SP}([i]) = \sum_{i=0}^{d-1} u_i v_i = 1$, due to our scaling.

$^{17}$ In fact, also if $k = A_{ij} \in \mathbb{N} \setminus \{1\}$ we can interpret this as $k$ paths from state $i$ to state $j$. The theory doesn't change.


To show shift-invariance, take any cylinder set $Z = [i_m \dots i_n]$ and compute

$$\begin{aligned} \mu_{SP}(\sigma^{-1} Z) &= \sum_{i=0}^{d-1} \mu_{SP}([i\, i_m \dots i_n]) = \sum_{i=0}^{d-1} \frac{p_i p_{i, i_m}}{p_{i_m}} \mu_{SP}([i_m \dots i_n]) \\ &= \mu_{SP}([i_m \dots i_n]) \sum_{i=0}^{d-1} \frac{u_i v_i A_{i, i_m} v_{i_m}}{\lambda v_i u_{i_m} v_{i_m}} \\ &= \mu_{SP}(Z) \sum_{i=0}^{d-1} \frac{u_i A_{i, i_m}}{\lambda u_{i_m}} = \mu_{SP}(Z) \frac{\lambda u_{i_m}}{\lambda u_{i_m}} = \mu_{SP}(Z). \end{aligned}$$

This invariance carries over to all sets in the $\sigma$-algebra $\mathcal{B}$ generated by the cylinder sets. Based on the interpretation of conditional probabilities, the identities

$$\begin{cases} \sum_{i_{m+1}, \dots, i_n = 0}^{d-1} p_{i_m} p_{i_m, i_{m+1}} \cdots p_{i_{n-1}, i_n} = p_{i_m}, \\ \sum_{i_m, \dots, i_{n-1} = 0}^{d-1} p_{i_m} p_{i_m, i_{m+1}} \cdots p_{i_{n-1}, i_n} = p_{i_n} \end{cases} \tag{6.16}$$

follow because the left-hand side indicates the total probability of starting in state $i_m$ and reaching some state after $n - m$ steps, respectively, starting at some state and reaching state $i_n$ after $n - m$ steps.

To compute $h_{\mu_{SP}}(\sigma)$, we will confine ourselves to the partition $\mathcal{P}$ of 1-cylinder sets; this partition is generating, so this restriction is justified by Theorem 6.48.

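$$\begin{aligned} H_{\mu_{SP}}\Big(\bigvee_{k=0}^{n-1} \sigma^{-k}\mathcal{P}\Big) &= -\sum_{\substack{i_0, \dots, i_{n-1} = 0 \\ A_{i_k, i_{k+1}} = 1}}^{d-1} \mu_{SP}([i_0 \dots i_{n-1}]) \log \mu_{SP}([i_0 \dots i_{n-1}]) \\ &= -\sum_{\substack{i_0, \dots, i_{n-1} = 0 \\ A_{i_k, i_{k+1}} = 1}}^{d-1} p_{i_0} p_{i_0, i_1} \cdots p_{i_{n-2}, i_{n-1}} \times \big(\log p_{i_0} + \log p_{i_0, i_1} + \cdots + \log p_{i_{n-2}, i_{n-1}}\big) \\ &= -\sum_{i_0 = 0}^{d-1} p_{i_0} \log p_{i_0} - (n-1) \sum_{i,j=0}^{d-1} p_i p_{i,j} \log p_{i,j}, \end{aligned}$$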


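by using (6.16) repeatedly. Hence

$$\begin{aligned} h_{\mu_{SP}}(\sigma) &= \lim_{n \to \infty} \frac{1}{n} H_{\mu_{SP}}\Big(\bigvee_{k=0}^{n-1} \sigma^{-k}\mathcal{P}\Big) = -\sum_{i,j=0}^{d-1} p_i p_{i,j} \log p_{i,j} \\ &= -\sum_{i,j=0}^{d-1} \frac{u_i A_{i,j} v_j}{\lambda} \big(\log A_{i,j} + \log v_j - \log v_i - \log \lambda\big). \end{aligned}$$

The first term in the brackets is zero because $A_{i,j} \in \{0, 1\}$. The second term (summing first over $i$) simplifies to

$$-\sum_{j=0}^{d-1} \frac{\lambda u_j v_j}{\lambda} \log v_j = -\sum_{j=0}^{d-1} u_j v_j \log v_j,$$

whereas the third term (summing first over $j$) simplifies to

$$\sum_{i=0}^{d-1} \frac{u_i \lambda v_i}{\lambda} \log v_i = \sum_{i=0}^{d-1} u_i v_i \log v_i.$$

Hence these two terms cancel each other. The remaining term is

$$\sum_{i,j=0}^{d-1} \frac{u_i A_{i,j} v_j}{\lambda} \log \lambda = \sum_{i=0}^{d-1} \frac{u_i \lambda v_i}{\lambda} \log \lambda = \sum_{i=0}^{d-1} u_i v_i \log \lambda = \log \lambda.$$

This shows that $\mu_{SP}$ maximizes entropy. □

[Figure 6.5: left panel, $A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$ with $\lambda = 2$; right panel, $A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$ with $\lambda = \frac{1}{2}(1 + \sqrt{5})$.]

Figure 6.5. Slope λ maps with prescribed transition matrices.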

We can interpret the Shannon-Parry measure geometrically by building an interval map $T$ with a Markov partition $\{P_i\}_{i=0}^{d-1}$ (for $d = \# \mathcal{A}$) of intervals. Whenever $A_{ij} = 1$, ensure that there is a subinterval of $P_i$ that is mapped monotonically with slope $\pm\lambda$ onto $P_j$. Therefore the topological entropy $h_{top}(T) = \log \lambda$; see [422]. This may result in a discontinuous map $T$, see


the left panel of Figure 6.5$^{18}$, but from the measure-theoretic viewpoint this doesn't matter. As a piecewise expanding map, $T$ preserves a probability measure $\mu \ll \mathrm{Leb}$ with density $\rho = \frac{d\mu}{d\mathrm{Leb}}$ constant on each $P_i$. If we denote the lengths of the partition elements by $v_i = |P_i|$ and set $u_i = \rho|_{P_i}$, then $\sum_i u_i v_i = \sum_i \mu(P_i) = 1$. By the Rokhlin formula (6.14) the entropy is $h_\mu(T) = \int \log |T'| \, d\mu = \log \lambda = h_{top}(T)$, so $\mu$ is a measure of maximal entropy. Also

$$\sum_{j=0}^{d-1} A_{ij} v_j = \mathrm{Leb}(T(P_i)) = \lambda |P_i| = \lambda v_i,$$

and, because $\rho$ is a fixed point of the transfer operator $L_T$,

$$u_j = \rho(x) = L_T(\rho)(x) := \sum_{T(y) = x} \frac{\rho(y)}{|T'(y)|} = \frac{1}{\lambda} \sum_i u_i A_{ij} \tag{6.17}$$

for all $x \in P_j^\circ$. An intuitive way to see this is that the expansion factor $\lambda$ of $T$ dilutes the density by a factor $1/\lambda$, but summing over all preimages of the interval $P_j$ gives (6.17). Thus $u$ and $v$ are left and right eigenvectors of $A$ for the leading eigenvalue $\lambda$. Finally $\frac{A_{ij} v_j}{\lambda v_i}$ is the relative measure of the subinterval of $P_i$ that is mapped to $P_j$ by $T$.

Example 6.68. We carry out the computation for the Fibonacci SFT with associated matrix

$$A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \quad \text{and leading eigenvalue } \lambda = \gamma = \tfrac{1}{2}(1 + \sqrt{5}).$$

In this case, we can let $T$ be the tent map $T_\gamma(x) = \min\{\gamma x, \gamma(1 - x)\}$; see the right panel of Figure 6.5. Then $P_0 = [\frac{1}{2}, \frac{\gamma}{2}]$ has length $v_0 = \frac{1}{2}(\gamma - 1)$ and $P_1 = [\frac{1}{2}(\gamma - 1), \frac{1}{2}]$ has length $v_1 = \frac{1}{2}(2 - \gamma)$. Solving

$$\mu_0 = u_0 v_0, \qquad \mu_1 = u_1 v_1, \qquad \mu_0 + \mu_1 = 1, \qquad u_1 = u_0 / \gamma,$$

we find $\mu_0 = \frac{1}{3 - \gamma} = \frac{\gamma^2}{\gamma^2 + 1}$ and $\mu_1 = \frac{1}{\gamma^2 + 1}$, which is in agreement with what Theorem 6.67 would provide. Although the Fibonacci substitution $\chi_{Fib}$ has the same associated matrix $A$, $\mu$ does not describe the frequency of symbols 0, 1 or of words in the Fibonacci substitution shift $(X_{Fib}, \sigma)$. This is because the fixed point $\rho_{Fib} = 0100101001001\dots$ of $\chi_{Fib}$ is not the itinerary of a $\mu$-typical point.
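The construction of the Shannon-Parry measure can be reproduced numerically for the Fibonacci matrix. The sketch below is an illustration under assumptions: it uses plain power iteration (a technique chosen here for self-containedness, not one prescribed by the text) to approximate $\lambda$, $u$ and $v$, and the function names are hypothetical. It then forms $p_i = u_i v_i$ and $p_{i,j} = A_{i,j} v_j / (\lambda v_i)$ and evaluates $-\sum_{i,j} p_i p_{i,j} \log p_{i,j}$.

```python
import math

def perron(A, left=False, iters=200):
    """Power iteration for the leading eigenvalue and a positive eigenvector
    (right eigenvector by default, left eigenvector if left=True)."""
    d = len(A)
    w = [1.0] * d
    lam = 1.0
    for _ in range(iters):
        if left:
            z = [sum(w[i] * A[i][j] for i in range(d)) for j in range(d)]
        else:
            z = [sum(A[i][j] * w[j] for j in range(d)) for i in range(d)]
        lam = sum(z) / sum(w)          # ratio of sums tends to lambda
        s = sum(z)
        w = [t / s for t in z]         # renormalize to avoid overflow
    return lam, w

def shannon_parry(A):
    """Return lambda, the masses p_i and the transition probabilities p_ij."""
    lam, v = perron(A)
    _, u = perron(A, left=True)
    scale = sum(ui * vi for ui, vi in zip(u, v))
    u = [ui / scale for ui in u]       # enforce sum_i u_i v_i = 1
    p = [ui * vi for ui, vi in zip(u, v)]
    d = len(A)
    trans = [[A[i][j] * v[j] / (lam * v[i]) for j in range(d)] for i in range(d)]
    return lam, p, trans

def markov_entropy(p, trans):
    """-sum_{i,j} p_i p_ij log p_ij, skipping forbidden transitions."""
    return -sum(p[i] * t * math.log(t)
                for i, row in enumerate(trans) for t in row if t > 0)

A = [[1, 1], [1, 0]]                   # Fibonacci SFT
lam, p, trans = shannon_parry(A)
gamma = (1 + math.sqrt(5)) / 2
print(lam, p, markov_entropy(p, trans), math.log(gamma))
```

The computed mass of the cylinder $[0]$ agrees with $\gamma^2/(\gamma^2 + 1)$ from Example 6.68, and the entropy agrees with $\log \gamma$.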

The next example applies this to S-gap shifts and finds their measure of maximal entropy.

$^{18}$ Since in the example of Figure 6.5 the matrix has zeros on the diagonal, there is no fixed point, which is impossible for a continuous map $T : I \to I$.


Example 6.69. Let the directed graph $G$ consist of a vertex $v_0$ from which $q_\ell$ loops of length $\ell$ emerge. Let $(X, \sigma)$ be the corresponding SFT. We want to find the Shannon-Parry measure. First we can replace the directed graph by one with $\sum_\ell \ell q_\ell$ vertices $v_{i,j}$, $1 \leq i \leq L := \sum_\ell q_\ell$ and $1 \leq j \leq \ell$ if $i$ is the index of a loop of length $\ell$ (let us write $\ell = \ell_i$ in this case). For such $i$, the edges of the graph are $v_{i,j} \to v_{i,j+1}$ if $j < \ell_i$ and $v_{i,\ell_i} \to v_{i',1}$ for each $i' \in \{1, \dots, L\}$. Thus the collection $R = \{v_{i,1} : i = 1, \dots, L\}$ is a rome; see Definition 8.71. Once in vertex $v_{i,1}$ it takes $\ell_i$ steps to return to $R$, and the first return map to $R$ has the full graph as transition graph. By Theorem 8.73, $h_{top}(\sigma) = \log \lambda$, where $\lambda$ is the unique positive solution to

$$\sum_{\ell \geq 1} q_\ell \lambda^{-\ell} = \sum_{i=1}^{L} \lambda^{-\ell_i} = 1. \tag{6.18}$$

The Shannon-Parry measure $\tilde{\mu}$ is completely determined by the probabilities $\tilde{p}_i$ that when you return to $R$, you are in vertex $v_{i,1}$. Thus the entropy $h_{\tilde{\mu}}$ of the first return shift $\sigma'$ to $R$ is $-\sum_i \tilde{p}_i \log \tilde{p}_i$. Abramov's formula gives

$$h_\mu(\sigma) = \frac{h_{\tilde{\mu}}(\sigma')}{\mathbb{E}_{\tilde{\mu}}(\ell)} = \frac{-\sum_i \tilde{p}_i \log \tilde{p}_i}{\sum_i \ell_i \tilde{p}_i}. \tag{6.19}$$

We try $\tilde{p}_i = \lambda^{-\ell_i}$. By (6.18), this gives a probability vector. Inserting it in (6.19) gives

$$h_\mu(\sigma) = \frac{-\sum_i \lambda^{-\ell_i} \log \lambda^{-\ell_i}}{\sum_i \ell_i \lambda^{-\ell_i}} = \log \lambda,$$

so we found the measure of maximal entropy. It remains to construct a $\sigma$-invariant probability measure by setting $p_{i,j} = \tilde{p}_i / \ell_i$ for each $1 \leq j \leq \ell_i$ as the probability of being in vertex $v_{i,j}$. The transition probabilities are uniquely determined by this: $\mathbb{P}(x_n \in v_{i',1} \mid x_{n-1} \in v_{i,\ell_i}) = \tilde{p}_{i'}$.
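Equation (6.18) rarely has a closed-form solution, but its left-hand side is decreasing in $\lambda$, so bisection is enough to find the root. The sketch below is a minimal illustration with a hypothetical list of loop lengths (one loop of length 1 and one of length 2); the function names and the tolerance are assumptions. It then inserts $\tilde{p}_i = \lambda^{-\ell_i}$ into Abramov's formula (6.19).

```python
import math

def loop_eigenvalue(loop_lengths, tol=1e-14):
    """Solve sum_i lam**(-l_i) = 1 for lam >= 1 by bisection.
    The left-hand side is decreasing in lam, at least 1 at lam = 1,
    and below 1 at lam = L + 1, where L is the number of loops."""
    def excess(lam):
        return sum(lam ** (-l) for l in loop_lengths) - 1.0
    lo, hi = 1.0, float(len(loop_lengths)) + 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def abramov_entropy(loop_lengths):
    """Entropy from (6.19) with the trial vector p_i = lam**(-l_i)."""
    lam = loop_eigenvalue(loop_lengths)
    p_tilde = [lam ** (-l) for l in loop_lengths]
    h_return = -sum(q * math.log(q) for q in p_tilde)
    mean_length = sum(l * q for l, q in zip(loop_lengths, p_tilde))
    return lam, h_return / mean_length

lengths = [1, 2]   # hypothetical: one loop of length 1, one of length 2
lam, h = abramov_entropy(lengths)
print(lam, h, math.log(lam))
```

For these loop lengths equation (6.18) reads $\lambda^{-1} + \lambda^{-2} = 1$, so $\lambda$ is the golden mean, and the value returned by Abramov's formula coincides with $\log \lambda$.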

6.6.1. Intrinsic Ergodicity. If $(X, T)$ has a unique measure of maximal entropy $\mu$, then $\mu$ can be shown to be ergodic; see [551, Theorem 8.7].

Definition 6.70. A dynamical system $(X, T)$ is intrinsically ergodic if it has a unique measure of maximal entropy.

This notion was coined by Benjy Weiss [555]. Clearly, uniquely ergodic systems are intrinsically ergodic, and for zero entropy systems, the two notions are equivalent. But for positive entropy, most intrinsically ergodic dynamical systems are not uniquely ergodic. As shown in Theorem 2.88, specification implies intrinsic ergodicity. For this reason, irreducible SFTs$^{19}$, irreducible sofic shifts (Theorem 3.48), and factors thereof [556] are all intrinsically but not uniquely ergodic. Intrinsic ergodicity need not hold for SFTs on infinite alphabets, as the next example shows.

$^{19}$ However, for SFTs in higher dimension, intrinsic ergodicity can fail [133, 134].

Example 6.71. Take the infinite alphabet $\mathbb{N}$ and the infinite transition matrix $A = (A_{i,j})_{i,j \in \mathbb{N}}$ given by

$$A_{i,j} = \begin{cases} 1 & \text{if } j \geq i - 1, \\ 0 & \text{if } j < i - 1, \end{cases} \qquad A = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & \dots \\ 1 & 1 & 1 & 1 & 1 & \dots \\ 0 & 1 & 1 & 1 & 1 & \\ 0 & 0 & 1 & 1 & 1 & \\ 0 & 0 & 0 & 1 & 1 & \\ \vdots & \vdots & & \ddots & \ddots & \end{pmatrix}.$$

Then $h_{top}(\sigma) = \log 4$, but there is no measure of maximal entropy. For the proof, see [127].

There are weaker versions of specification that still imply intrinsic ergodicity. This approach has been used to show that $\beta$-shifts (Corollary 3.69) and unimodal shifts [159, 315] (see Example 3.92) are intrinsically ergodic. Some further results, not relying on (any form of) specification, follow:
• S-gap shifts are intrinsically ergodic; see [159].
• Coded shifts and their factors are intrinsically ergodic under the conditions given by Theorem 3.48.
• The hereditary $\mathcal{B}$-free subshift $(X_{\mathcal{B}}^{her}, \sigma)$ is intrinsically ergodic [378] and [231, Theorem J] and also $\mathcal{B}$-free shifts themselves are often intrinsically ergodic (such as the $\mathcal{B}$-free shift for $\mathcal{B} = \{p^2 : p \text{ prime}\}$; see [451]), but $(X_{\mathcal{B}}, \sigma)$ need not be intrinsically ergodic if $h_{top}(X_\eta) > 0$ [379].

On the other hand, there exist transitive and even minimal shifts that are not intrinsically ergodic; see e.g. [306] and [194, Example 27.2] where there are infinitely many measures of maximal entropy.

Definition 6.72. A dynamical system $(X, T)$ is called entropy dense if for every invariant measure $\mu$, there is a sequence of ergodic measures $\mu_n$ such that $\mu_n \to \mu$ in the weak$^*$ topology and the entropies $h_{\mu_n}(T) \to h_\mu(T)$.

Obviously, uniquely ergodic systems are entropy dense, but there are many more systems which have this property for non-trivial reasons.
• Every dynamical system with specification is entropy dense; see [234] with an extended result in [458, Theorem 2.1] and [459, 536]. Thus topological mixing unimodal maps are entropy dense (they have specification; see [91, 92]). Weaker versions of specification


hold for $\beta$-shifts, and this can be used to show that also $\beta$-shifts are entropy dense [458, Theorem 2.1 and Proposition 5.1].
• Every transitive dynamical system with the shadowing property is entropy dense; see [383, Corollary 31].
• General conditions for entropy denseness in the context of hyperbolic measures were given in [283], and also for $\mathcal{B}$-free shifts [344].

Remark 6.73. Entropy denseness implies that the Choquet simplex is Poulsen, which in its turn implies that the Choquet simplex is arc-wise connected. The reverse implications are not true in general. For Dyck shifts, the Choquet simplex is arc-wise connected, but not Poulsen; see Proposition 3.134. In [369] it was shown that the set of ergodic measures of a hereditary shift is arc-wise connected, but according to [379] not necessarily Poulsen. In [271, Proposition 4.29] examples are given where the Poulsen simplex is not entropy dense; using [207] one can create minimal shifts with this property.

6.7. Mixing

Whereas a Bernoulli process consists of totally independent trials, mixing refers to an asymptotic independence:

Definition 6.74. A dynamical system $(X, \mathcal{B}, \mu, T)$ preserving the probability measure $\mu$ is called mixing (or strongly mixing) if

$$\mu(T^{-n}(A) \cap B) \to \mu(A)\mu(B) \quad \text{as } n \to \infty \tag{6.20}$$

for every $A, B \in \mathcal{B}$.

Lemma 6.75. Every Bernoulli system is mixing.

Proof. For any pair of cylinder sets $C, C'$ we have $\mu(\sigma^{-n}(C) \cap C') = \mu(C)\mu(C')$ for $n$ sufficiently large. This property carries over to all measurable sets by the Kolmogorov Extension Theorem. □

Similar to Bernoulli shifts, SFTs with Markov measures (so in particular the Shannon-Parry measure) are mixing.

Proposition 6.76. A probability-preserving dynamical system $(X, \mathcal{B}, \mu, T)$ is mixing if and only if the correlation coefficients satisfy

$$\int_X f \circ T^n(x) \cdot g(x) \, d\mu \to \int_X f(x) \, d\mu \cdot \int_X g(x) \, d\mu \quad \text{as } n \to \infty \tag{6.21}$$

for all $f, g \in L^2(\mu)$. Written in the notation of the Koopman operator $U_T f = f \circ T$ and inner product $(f, g) = \int_X f(x) \cdot g(x) \, d\mu$, we get

$$(U_T^n f, g) \to (f, 1)(1, g) \quad \text{as } n \to \infty. \tag{6.22}$$


Proof. The “if” direction follows by taking indicator functions $f = 1_A$ and $g = 1_B$. For the “only if” direction, all $f, g \in L^2(\mu)$ can be approximated by linear combinations of indicator functions. □

Corollary 6.77. If $(X, T)$ is uniformly rigid, then no T-invariant measure apart from a Dirac measure can be strongly mixing.

Proof. Take $A, B \subset X$ measurable such that $\mu(A), \mu(B) > 0$ and $\inf\{d(a, b) : a \in A, b \in B\} =: \varepsilon > 0$. Next take any n such that $d(T^n(x), x) < \varepsilon$ for all $x \in X$. Then $A \cap T^n(B) = \emptyset$, so $\mu(T^{-n}(A) \cap B) = 0 \neq \mu(A)\mu(B)$. Since n can be arbitrarily large, $\mu(T^{-n}(A) \cap B) \not\to \mu(A)\mu(B)$. □

6.7.1. Linearly Recurrent Shifts and Mixing. Dekking & Keane [191] gave the first general proof that constant length substitution shifts are never strongly mixing. A short and more general result [167] states that no linearly recurrent shift²⁰ can be strongly mixing, and this includes all primitive substitution shifts.

Theorem 6.78. A linearly recurrent shift $(X, \sigma)$ is not mixing w.r.t. its unique²¹ invariant probability measure.

Proof. Let L be the constant appearing in the definition of linear recurrence. Let $u_0 \in \mathcal{L}(X)$ be some word and recursively construct a sequence of return words $u_n \in \mathcal{R}(u_{n-1})$; see Definition 4.2. As in Example 4.3, we can choose $u_n$ at least as long as $u_{n-1}$, so any appearance of $u_n$ in any $x \in X$ has $u_{n-1}$ as prefix and is also followed by $u_{n-1}$. Set $h_n := |u_n|$. Since $u_n$ reappears with gap $\le L h_n$, the measure of the cylinder $[u_n]$ satisfies
\[ \mu([u_n]) \ge \frac{1}{L h_n}. \tag{6.23} \]
Take m so large that $\mu([u_m]) < L^{-2}$. Define
\[ D(n) = [u_m] \cap \sigma^{h_n}([u_m]) \quad \text{and} \quad E(n) = \{0 \le j < h_{n-1} : \sigma^j([u_{n-1}]) \subset [u_m]\}. \]
For $j \in E(n)$ we have $\sigma^j([u_n]) \subset \sigma^j([u_{n-1}]) \subset [u_m]$ and $\sigma^{j+h_n}([u_n]) \subset \sigma^j([u_{n-1}]) \subset [u_m]$, and therefore $\sigma^j([u_n]) \subset D(n)$. Because $\sigma^i([u_n]) \cap \sigma^j([u_n]) = \emptyset$ for $0 \le i < j < h_{n-1}$, it follows that $\mu(D(n)) \ge \#E(n)\,\mu([u_n])$. Unique ergodicity of $(X, \sigma)$ gives $\lim_{n\to\infty} \frac{\#E(n)}{h_n} = \mu([u_m])$. Combining all of the above, we get
\begin{align*}
\liminf_{n\to\infty} \mu(D(n)) &\ge \liminf_{n\to\infty} \#E(n)\,\mu([u_n]) \\
&\ge \liminf_{n\to\infty} h_{n-1}\,\mu([u_n])\,\mu([u_m]) && \text{(by Theorem 4.4(iii))} \\
&\ge \liminf_{n\to\infty} \frac{h_n}{L}\,\mu([u_n])\,\mu([u_m]) && \text{(bounded gaps)} \\
&\ge \frac{h_n}{L} \cdot \frac{1}{L h_n}\,\mu([u_m]) && \text{(by (6.23))} \\
&> \mu([u_m])^2 && \text{(by the choice of } m\text{)}.
\end{align*}
Also,
\[ \mu(D(n)) = \mu([u_m] \cap \sigma^{h_n}([u_m])) = \mu(\sigma^{-h_n}([u_m] \cap \sigma^{h_n}([u_m]))) = \mu(\sigma^{-h_n}([u_m]) \cap \sigma^{-h_n} \circ \sigma^{h_n}([u_m])). \]
But $h_{\mathrm{top}}(\sigma|_X) = h_\mu(\sigma) = 0$, so by Proposition 6.52, σ is invertible μ-a.s. Therefore $\mu([u_m] \,\triangle\, \sigma^{-h_n} \circ \sigma^{h_n}([u_m])) = 0$ and thus
\[ \liminf_{n\to\infty} \mu(\sigma^{-h_n}([u_m]) \cap [u_m]) > \mu([u_m])^2. \]
In other words, $(X, \sigma, \mu)$ cannot be mixing. This finishes the proof. □

²⁰ In fact, [167] applies to any linearly recurrent mapping of the Cantor set.
²¹ Recall from Corollary 6.29 that linearly recurrent shifts are uniquely ergodic.



6.7.2. Cutting and Stacking and Mixing. A proof similar to the one in the previous section works for cutting and stacking systems of finite rank with a bound on the number of layers of spacer.

Theorem 6.79. Let $(\Delta, T, \mu)$ be a finite rank cutting and stacking system such that at each step of the construction, at most s layers of spacer are inserted between stacked slices. Then $(\Delta, T, \mu)$ is not strongly mixing.

Proof. Let $w(n)$ be the number of stacks $\Delta_i(n)$ at step n in the construction, and let $h_i(n)$ be their heights, so
\[ \Delta_i(n) = \bigcup_{j=0}^{h_i(n)-1} \Delta_i^j(n), \quad 1 \le i \le w(n). \]
Also let $h_{\min}(n) = \min_i h_i(n)$ and $h_{\max}(n) = \max_i h_i(n)$. By speeding up the cutting and stacking construction, we can assume that $h_{\min}(n) \ge 2^n$. Let $S_n$ be the spacer left at step n; the assumption on $h_{\min}(n)$ also implies that $\mu(S_n) = O(2^{-n})$. Because the system is of finite rank r, $1 \le w(n) \le r$ for all n, but there need not be a fixed upper bound on the number of slices each stack is cut into.

Choose $\varepsilon \in (0, \frac{2^{-r}}{rs})$. Take m so large that $\varepsilon h_{\min}(m) \ge 100(1 + s)$, $\mu(S_m) < \varepsilon/100$, and $e^{-4s/2^m} \ge \frac{8}{9}$. Now let
\[ A = \bigcup_{i=1}^{w(m)} \bigcup_{j=0}^{\varepsilon h_i(m)} \Delta_i^j(m). \]
Clearly $\frac{4}{3}\varepsilon \ge \mu(A) \ge \frac{3}{4}\varepsilon$. We claim
\[ a_n := \min_{i=1,\dots,w(n)} \left\{ \frac{\mu(A \cap \Delta_i(n))}{\mu(\Delta_i(n))} \right\} \ge \frac{2\varepsilon}{3} \quad \text{for all } n \ge m. \]
By the choice of A, $a_m \ge \frac{3}{4}\varepsilon$. Each of the slices into which each stack $\Delta_i(m)$ is cut receives at most s layers of spacer. Therefore $a_{m+1} \ge a_m \frac{h_{\min}(m)}{h_{\min}(m)+s} \ge a_m \left(1 - \frac{s}{h_{\min}(m)}\right)$. By induction
\begin{align*}
a_n &\ge a_m \prod_{j=m}^{n-1} \left(1 - \frac{s}{h_{\min}(j)}\right) \ge \frac{3\varepsilon}{4} \exp\left( \sum_{j=m}^{\infty} \log\left(1 - \frac{s}{h_{\min}(j)}\right) \right) \\
&\ge \frac{3\varepsilon}{4} \exp\left( -2s \sum_{j=m}^{\infty} 2^{-j} \right) = \frac{3\varepsilon}{4}\, e^{-4s/2^m} \ge \frac{2\varepsilon}{3},
\end{align*}

proving the claim. Now let $n \ge m$ be arbitrary. Note that the cutting and stacking procedure ensures that $A \cap \Delta_i^j(n) \neq \emptyset$ implies that $\Delta_i^j(n) \subset A$. In fact, $A \cap \Delta_i(n)$ consists of (multiple) layers of at least $100(1 + s)$ consecutive levels $\Delta_i^j(n)$. Let $p > n$ be minimal such that for each $1 \le i \le w(n)$, at least two slices of the stack $\Delta_i(n)$ can be found in at least one of the stacks $\Delta_k(p)$, $1 \le k \le w(p)$. In between appearances of those slices, there can be layers of spacer and slices of other stacks $\Delta_{i'}(n)$. But the difference in level in $\Delta_k(p)$ of the appearance of two slices of the same stack $\Delta_i(n)$, without slices of $\Delta_i(n)$ or more than one slice of any stack $\Delta_{i'}(n)$, $i' \neq i$, in between, can take at most $s r 2^{r-1}$ values. Take the value t that occurs the most often. Then
\[ \mu(T^{-t}(A) \cap A) > \frac{2}{3}\varepsilon \cdot \frac{1}{2^{r-1} r s} > 2\varepsilon^2 > \mu(A)^2. \]
Since $t \to \infty$ as $n \to \infty$, this contradicts mixing. □

Remark 6.80. Ferenczi [244, Corollary 3], using cutting and stacking techniques, showed that every minimal uniquely ergodic subshift $(X, \sigma)$ with word-complexity satisfying $\limsup_n p_X(n)/n < \infty$ is not mixing.

Note that although in Theorem 6.79 every stack can receive no more than s layers of spacer throughout the procedure, we don’t require a bound on the number of slices a stack is cut into. Without the bound s, and still without a bound on the number of slices, strong mixing may be achieved. This was first hinted at by Ornstein in [436]. In detail, in the n-th step of the construction, we cut the stack into $z(n)$ slices and add $i - 1$ layers of spacer to the i-th slice before stacking them (slice i on top of slice $i - 1$) into a single stack again. Symbolically, this amounts to a (non-adic) substitution
\[ B_0 = 0, \qquad B_{n+1} = B_n \, B_n 1 \, B_n 11 \cdots B_n 1^{z(n)-1}. \tag{6.24} \]
Although this is a rank 1 cutting and stacking, we rather speak of a staircase system. Smorodinsky²² conjectured that such a system is strongly mixing if $z(n) \to \infty$. Extending unpublished results in [6], Adams [5] showed that this is indeed true if $z(n) \to \infty$ in such a way that $z(n)^2/h(n) \to 0$. We follow his exposition, but see [171, 244] for further results. Before going to this main result, we prove [5, Lemma 2.2]:

Lemma 6.81. Let E be a union of levels in $\Delta(n_0)$ for some arbitrary $n_0 \in \mathbb{N}$ and let $(\rho_n)_{n \ge n_0}$ be an integer sequence such that $h(n) \le \rho_n < 2h(n)$. Then there exists a sequence $(\ell_n)_{n \ge n_0}$ tending to infinity such that
\[ \int_X \left| \frac{1}{\ell_n} \sum_{i=0}^{\ell_n - 1} 1_E(T^{-i\rho_n}(x)) - \mu(E) \right| d\mu \to 0 \quad \text{as } n \to \infty. \]

Proof. Choose $i \ge 1$. For each $n \ge \max\{i^2, n_0\}$, take $j \ge 1$ and $t < h(n)$ such that $i\rho_n = j h(n) + t$. Take $C_n = D_1 \cup D_2$ where $D_1$ consists of the top t levels of $C_n$ and $D_2$ of the bottom $h(n) - t$ levels. Take
\[ D_3 := \{\text{bottom } (j+1)z(n) \text{ levels of } D_1\} \cup \{j+1 \text{ rightmost slices of } D_1\} \]
and $E_1 := (D_1 \cap E) \setminus D_3$. Then
\[ \mu((D_1 \cap E) \setminus E_1) \le \frac{(j+1)z(n)}{h(n)} + \frac{j+1}{z(n)} \to 0 \]
as $n \to \infty$. Hence, the part of $D_1 \cap E$ that is outside $E_1$ is so small for large n that it doesn’t affect the limit behavior. Let I be a level of $E_1$. Then $T^{i\rho_n}(I)$ goes through the roof $j + 1$ times and thus intersects $z(n) - (j+1)$ levels of $C_n$ in an arithmetic progression $\{\eta_0 + \eta(j+1)\}_{\eta=0}^{z(n)-j-2}$; see the dashed lines in Figure 6.6. The mass of each intersection is $\mu(I)/z(n)$. Let $I^*$ be the highest of the levels of $C_n$ that intersects $T^{i\rho_n}(I)$. Then for $E_1^* = \bigcup_{I \subset E_1} I^*$, we have
\[ \mu(T^{i\rho_n}(E_1) \cap E) = \frac{1}{z(n)} \sum_{\eta=0}^{z(n)-j-2} \mu(T^{-\eta(j+1)}(E_1^*) \cap E) \to \mu(E_1^*)\mu(E) \tag{6.25} \]
as $n \to \infty$, by ergodicity of T (see Lemma 6.84). For $E_2$ and $E_2^* = \bigcup_{I \subset E_2} I^*$ we get
\[ \mu(T^{i\rho_n}(E_2) \cap E) = \frac{1}{z(n)} \sum_{\eta=0}^{z(n)-j-2} \mu(T^{-\eta j}(E_2^*) \cap E) \to \mu(E_2^*)\mu(E). \tag{6.26} \]

²² See [5, 172, 244]; the conjecture was probably stated by Meir Smorodinsky in personal discussions with Nat Friedman. Ferenczi, building on the example from a symbolic point of view, states that the map was proposed by Smorodinsky but not published before [6].

Figure 6.6. The action of $T^{i\rho_n}$ on a level $I \subset E_1$.

Since $E_1$ and $E_2$ are disjoint, we can add (6.25) and (6.26) to obtain
\[ \mu(T^{i\rho_n}(E) \cap E) \to \mu(E)^2 \quad \text{as } n \to \infty. \]
By T-invariance of μ, we obtain for $i_1 - i_2 = i$,
\[ \mu(T^{i_1\rho_n}(E) \cap T^{i_2\rho_n}(E)) = \mu(T^{(i_1 - i_2)\rho_n}(E) \cap E) \to \mu(E)^2 \]
as $n \to \infty$. Choosing $\ell_n \in \mathbb{N}$ sufficiently small that the above convergence holds uniformly over $1 \le i_1, i_2 < \ell_n$, $i_1 \neq i_2$, but still so that $\ell_n \to \infty$, we arrive at
\begin{align*}
\int_X \left| \frac{1}{\ell_n} \sum_{i=0}^{\ell_n-1} 1_E(T^{-i\rho_n}(x)) - \mu(E) \right| d\mu
&\le \sqrt{ \int_X \left| \frac{1}{\ell_n} \sum_{i=0}^{\ell_n-1} 1_E(T^{-i\rho_n}(x)) - \mu(E) \right|^2 d\mu } \qquad \text{(by Cauchy-Schwarz)} \\
&= \sqrt{ \int_X \frac{1}{\ell_n^2} \sum_{i,j=0}^{\ell_n-1} \left(1_E(T^{-i\rho_n}(x)) - \mu(E)\right)\left(1_E(T^{-j\rho_n}(x)) - \mu(E)\right) d\mu } \\
&= \sqrt{ \int_X \left( \frac{1}{\ell_n^2} \sum_{i,j=0}^{\ell_n-1} 1_E(T^{-i\rho_n}(x))\, 1_E(T^{-j\rho_n}(x)) - \mu(E)^2 \right) d\mu }.
\end{align*}

Since the terms in this sum tend to 0 for $i \neq j$ and the terms with $i = j$ are only an $\ell_n$-th part of the whole, the average tends to 0 as $n \to \infty$. This is what we needed to prove. □

Theorem 6.82. Let $(X, T, \mu)$ be a staircase cutting and stacking system such that $z(n) \to \infty$ and $z(n)^2/h(n) \to 0$. Then $(X, T, \mu)$ is strongly mixing.

Proof. Let us go through the construction once more. We start with a single interval $\Delta(0)$ and a certain amount of spacer. For the induction step, assume that $\Delta(n)$ is a single stack of height $h(n)$. Cut it into $z(n)$ slices, place $i - 1$ layers of spacer on the i-th slice, and then stack them up, slice i on top of slice $i - 1$ for $2 \le i \le z(n)$, into a single stack $\Delta(n+1)$. Thus the heights satisfy
\[ h(n+1) = z(n)h(n) + \frac{1}{2} z(n)(z(n) - 1), \]
and the condition $z(n)^2/h(n) \to 0$ implies that $\frac{1}{2} z(n)(z(n) - 1) < h(n)/10$ for n sufficiently large. In the limit, we obtain a transformation $T : X \to X$ that preserves Lebesgue measure μ and is invertible up to a set of zero measure. Let A, B be measurable subsets of X. We need to show that
\[ \lim_{m\to\infty} |\mu(T^m(A) \cap B) - \mu(A)\mu(B)| = 0. \tag{6.27} \]
Note that for any $\varepsilon > 0$ we can take $n_0 \in \mathbb{N}$ so large and $A', B'$ consisting of full levels of $\Delta(n_0)$ such that the symmetric differences satisfy $\mu(A \triangle A'), \mu(B \triangle B') < \varepsilon$. Then $A', B'$ also consist of full levels of $\Delta(n)$ for all $n \ge n_0$. If (6.27) holds for every such $A', B'$ and $\varepsilon > 0$, then (6.27) holds for A and B as well. So there is no loss of generality in working with sets A, B that consist of full levels of $\Delta(n)$ for every $n \ge n_0$. Next take $m \in \mathbb{N}$ arbitrary, and let $n \in \mathbb{N}$ be such that
\[ h(n) \le m = k_n h(n) + t_n < h(n+1), \qquad 1 \le k_n \le z(n), \qquad 0 \le t_n < h(n). \]
We can assume that m is so large that $n \ge n_0$. Divide $\Delta(n)$ into pieces $D_1, D_2, D_3$ where
\begin{align*}
D_1 &= \{k_n + 1 \text{ rightmost slices of } \Delta(n)\}, \\
D_2 &= \{\text{top } t_n \text{ levels of } \Delta(n) \setminus D_1\}, \\
D_3 &= \{\text{bottom } h(n) - t_n \text{ levels of } \Delta(n) \setminus D_1\};
\end{align*}
see Figure 6.7.

Mixing on $D_1$: Since the levels of $D_1$ get interspersed with layers of spacer, $D_1$ occupies the top
\[ (k_n + 1)h(n) + (z(n) - 1) + (z(n) - 2) + \cdots + (z(n) - k_n - 1) = (k_n + 1)h(n) + \frac{1}{2}(k_n + 1)(2z(n) - k_n - 2) \]
levels of $\Delta(n+1)$.

Figure 6.7. The staircase represented as Δ(n) (left) and Δ(n + 1) (right).

Inside the representation $\Delta(n+1)$, define
\[ \bar{D}_1 = D_1 \setminus \Big( \{\text{bottom } h(n) + z(n) \text{ levels of } \Delta(n+1)\} \cup \{\text{rightmost slice of } \Delta(n+1)\} \Big). \]
Let $\bar{A}_1 = A \cap \bar{D}_1$. It consists of levels of $\Delta(n+1) \setminus \{\text{rightmost slice of } \Delta(n+1)\}$, and $\mu((A \cap D_1) \triangle \bar{A}_1) < \frac{1}{z(n+1)} \to 0$ as $n \to \infty$. Let I be one of these levels. Applying the iterate $T^m$ to I pushes it through the roof, where it splits up into $z(n+1) - 1$ intervals that intersect $z(n+1) - 1$ consecutive levels of $\Delta(n+1)$ (see Figure 6.7, the “dotted diagonal” in the right panel), and each of the intersections has mass $\mu(I)/(z(n+1) - 1)$. Let $I^*$ be the highest level of $\Delta(n+1)$ that $T^m(I)$ intersects. Then
\[ \mu(T^m(I) \cap B) = \frac{1}{z(n+1) - 1} \sum_{i=0}^{z(n+1)-2} \mu(T^{-i}(I^*) \cap B), \]
and $\mu(I^*) = \frac{z(n+1)}{z(n+1)-1}\mu(I)$. Let $A_1^* = \bigcup_{I \subset \bar{A}_1} I^*$. Summing over all $I \subset \bar{A}_1$ gives
\[ |\mu(T^m(\bar{A}_1) \cap B) - \mu(\bar{A}_1)\mu(B)| = \left| \frac{1}{z(n+1) - 1} \sum_{i=0}^{z(n+1)-2} \mu(T^{-i}(A_1^*) \cap B) - \left(1 + \frac{1}{z(n+1) - 1}\right) \mu(A_1^*)\mu(B) \right|. \]
By ergodicity (see Lemma 6.84)
\[ \frac{1}{z(n+1) - 1} \sum_{i=0}^{z(n+1)-2} \mu(T^{-i}(A_1^*) \cap B) - \mu(A_1^*)\mu(B) \to 0 \]
as $m \to \infty$ (and thus $z(n+1) \to \infty$). Naturally $\frac{1}{z(n+1)-1}\mu(A_1^*)\mu(B) \to 0$ as well. This proves (6.27) for the part $A_1 \subset A$.

Mixing on $D_2$: Let $\bar{A}_2 = A \cap D_2 \setminus \{\text{bottom } z(n)^2 \text{ levels of } D_2 \text{ in } \Delta(n)\}$. Then $\mu((A \cap D_2) \setminus \bar{A}_2) \le \frac{z(n)^2}{h(n)} \to 0$ as $m \to \infty$ (and hence $n \to \infty$). Let I be a level in $\bar{A}_2$, which is only a $(z(n) - k_n - 1)/z(n)$ part of a level in $\Delta(n)$. Under $T^m$, I goes through the roof $k_n + 1$ times, splitting into $z(n) - k_n - 1$ intervals which intersect $z(n) - k_n - 1$ levels of $\Delta(n)$, in an arithmetic progression $\{i_0 + i(k_n + 1)\}_{i=0}^{z(n)-k_n-2}$; see the dotted lines in Figure 6.7 (left). Let $I^*$ be the highest level of $\Delta(n)$ that $T^m(I)$ intersects. Then
\[ \mu(T^m(I) \cap B) = \frac{1}{z(n) - 1} \sum_{i=0}^{z(n)-2} \mu(T^{-i(k_n+1)}(I^*) \cap B). \]
Let $A_2^* = \bigcup_{I \subset \bar{A}_2} I^*$. Note that $\mu(\bar{A}_2) = \frac{z(n) - k_n - 1}{z(n)} \mu(A_2^*)$. Thus we can assume that $\varepsilon < \frac{k_n + 1}{z(n)} < 1 - \varepsilon$, because otherwise $|\mu(T^m(\bar{A}_2) \cap B) - \mu(\bar{A}_2)\mu(B)| < \varepsilon$ immediately.

Summing up over all $I \subset \bar{A}_2$ we obtain
\[ |\mu(T^m(\bar{A}_2) \cap B) - \mu(\bar{A}_2)\mu(B)| = \frac{z(n) - k_n - 1}{z(n)} \left| \frac{1}{z(n) - k_n - 1} \sum_{i=0}^{z(n)-k_n-2} \mu(T^{-i(k_n+1)}(A_2^*) \cap B) - \mu(A_2^*)\mu(B) \right|. \]
Since T is invertible mod μ and preserves μ, the expression in the absolute value bars can be estimated as
\begin{align*}
\frac{1}{z(n) - k_n - 1} &\sum_{i=0}^{z(n)-k_n-2} \mu(T^{-i(k_n+1)}(A_2^*) \cap B) - \mu(A_2^*)\mu(B) \\
&= \frac{1}{z(n) - k_n - 1} \sum_{i=0}^{z(n)-k_n-2} \mu(A_2^* \cap T^{i(k_n+1)}(B)) - \mu(A_2^*)\mu(B) \\
&= \int_{A_2^*} \left( \frac{1}{z(n) - k_n - 1} \sum_{i=0}^{z(n)-k_n-2} 1_B \circ T^{-i(k_n+1)} - \mu(B) \right) d\mu \\
&\le \int_X \left| \frac{1}{z(n) - k_n - 1} \sum_{i=0}^{z(n)-k_n-2} 1_B \circ T^{-i(k_n+1)} - \mu(B) \right| d\mu.
\end{align*}

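So we need to show that this expression tends to 0 as $n \to \infty$. Take $p \in \mathbb{N}$ such that $h(p-1) \le k_n + 1 < h(p)$. We have
\[ \frac{(z(n) - k_n - 1)(k_n + 1)}{h(p)} \ge \frac{\varepsilon z(n)(k_n + 1)}{h(p)} \ge \varepsilon \frac{h(p-1)^2}{h(p)} = \varepsilon\, \frac{z(p-1)h(p-1)}{h(p)} \cdot \frac{h(p-1)}{z(p-1)} \to \infty. \]
Choose $k'_n \ge 1$ minimal such that $k'_n(k_n + 1) \ge h(p)$. Then $\frac{z(n) - k_n - 1}{k'_n} \to \infty$ as well. By Lemma 6.81 we can choose a sequence $\ell_n \to \infty$ so slowly that $a_n := \left\lfloor \frac{z(n) - k_n - 1}{k'_n \ell_n} \right\rfloor \to \infty$, but
\[ \int_X \left| \frac{1}{\ell_n} \sum_{i=0}^{\ell_n-1} 1_B \circ T^{-i k'_n (k_n+1)} - \mu(B) \right| d\mu \to 0 \quad \text{as } n \to \infty. \tag{6.28} \]
In the rest of the proof, we abbreviate $S = T^{k_n+1}$ and $g = 1_B - \mu(B)$. By S-invariance of μ,
\[ \int_X \left| \frac{1}{\ell_n} \sum_{i=0}^{\ell_n-1} g \circ S^{-i k'_n} \right| d\mu = \int_X \left| \frac{1}{\ell_n} \sum_{i=0}^{\ell_n-1} g \circ S^{-i k'_n + j} \right| d\mu \]
for $0 \le j < k'_n$. Taking the average of this expression over $j = 0, \dots, k'_n - 1$ and using the triangle inequality, we find
\begin{align*}
\int_X \left| \frac{1}{\ell_n} \sum_{i=0}^{\ell_n-1} g \circ S^{-i k'_n} \right| d\mu
&= \int_X \frac{1}{k'_n} \sum_{j=0}^{k'_n-1} \left| \frac{1}{\ell_n} \sum_{i=0}^{\ell_n-1} g \circ S^{-i k'_n + j} \right| d\mu \\
&\ge \int_X \left| \frac{1}{k'_n \ell_n} \sum_{i=0}^{k'_n \ell_n - 1} g \circ S^{-i} \right| d\mu. \tag{6.29}
\end{align*}
Recall that $z(n) - k_n - 1 = a_n k'_n \ell_n + b_n$ for some integer $0 \le b_n < k'_n \ell_n$. Hence,
\begin{align*}
\int_X &\left| \frac{1}{z(n) - k_n - 1} \sum_{i=0}^{z(n)-k_n-2} g \circ S^{-i} \right| d\mu \\
&= \int_X \left| \frac{1}{a_n k'_n \ell_n + b_n} \sum_{i=0}^{a_n k'_n \ell_n + b_n - 1} g \circ S^{-i} \right| d\mu \\
&= \int_X \left| \frac{a_n k'_n \ell_n}{a_n k'_n \ell_n + b_n} \cdot \frac{1}{a_n k'_n \ell_n} \left( \sum_{i=0}^{a_n k'_n \ell_n - 1} g \circ S^{-i} + \sum_{i=a_n k'_n \ell_n}^{a_n k'_n \ell_n + b_n - 1} g \circ S^{-i} \right) \right| d\mu \\
&\le \int_X \left| \frac{1}{a_n k'_n \ell_n} \sum_{i=0}^{a_n k'_n \ell_n - 1} g \circ S^{-i} \right| d\mu + \frac{2 b_n \|g\|_\infty}{a_n k'_n \ell_n + b_n} \\
&\le \int_X \left| \frac{1}{a_n} \sum_{j=0}^{a_n-1} \frac{1}{k'_n \ell_n} \sum_{i=0}^{k'_n \ell_n - 1} g \circ S^{-(i + j k'_n \ell_n)} \right| d\mu + \frac{2\|g\|_\infty}{a_n} \\
&\le \int_X \left| \frac{1}{k'_n \ell_n} \sum_{i=0}^{k'_n \ell_n - 1} g \circ S^{-i} \right| d\mu + \frac{2\|g\|_\infty}{a_n},
\end{align*}
where in the last line we used the opposite argument of (6.29). Combining this result with (6.29) and (6.28), we get the required convergence. This finishes the proof for $D_2$.

Mixing on $D_3$: The argument here is the same as for $D_2$, except that now $T^m(I)$ only goes through the roof $k_n$ times. □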

Recall that, since this staircase example uses only one stack, it is of rank 1. In [244] the word-complexity of a small variation of this example is computed. Namely, instead of (6.24), the recursion is
\[ B_0 = 0, \qquad B_{n+1} = B_n \, B_n 1 \, B_n 11 \, B_n \cdots B_n 1^{n-1} B_n. \tag{6.30} \]
Then $h(0) = h(1) = 1$, $h(2) = 2$, $h(3) = 7$, and in general $h(n+1) = (n+1)h(n) + \frac{1}{2}n(n-1)$. For this example, [244, Proposition 1] gives $p(1) = 2$, $p(2) = 4$, $p(3) = 7$, $p(4) = 12$, $p(5) = 18$, $p(6) = 26$, $p(7) = 35$, $p(8) = 43$, and
\[ p(k+1) - p(k) = \begin{cases}
k + i + 1 & \text{for } k = h(n) + i,\ 1 \le i < k, \\
k + n & \text{for } h(n) + n < k \le 2h(n) + 1, \\
k + n - i & \text{for } 2h(n) + 2i \le k \le 2h(n) + 2i + 1,\ 1 \le i \le n - 2, \\
k + 2 & \text{for } 2h(n) + 2n - 2 < k \le 3h(n) + 2n - 3, \\
k + 1 & \text{for } 3h(n) + 2n - 2 \le k \le h(n+1).
\end{cases} \]
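The heights quoted for the recursion (6.30) can be checked mechanically. The following Python sketch builds the words $B_n$ literally and compares their lengths with the recursion $h(n+1) = (n+1)h(n) + \frac{1}{2}n(n-1)$. The function names `staircase_word` and `height` are illustrative and not taken from the text, and the case n = 0 is read as $B_1 = B_0$, an assumption that is consistent with h(1) = 1.

```python
def staircase_word(n):
    """Return B_n from (6.30): B_{k+1} = B_k B_k 1 B_k 11 ... B_k 1^{k-1} B_k.

    Assumption: for k = 0 the rule is read as B_1 = B_0.
    """
    b = "0"  # B_0
    for k in range(n):
        # k copies of B_k followed by 0, 1, ..., k-1 spacer symbols, then a final B_k
        b = "".join(b + "1" * i for i in range(k)) + b
    return b


def height(n):
    """Return h(n) from h(0) = 1 and h(k+1) = (k+1) h(k) + k(k-1)/2."""
    h = 1
    for k in range(n):
        h = (k + 1) * h + k * (k - 1) // 2
    return h


# Lengths of B_0, ..., B_5 computed directly from the words
heights = [len(staircase_word(n)) for n in range(6)]
```

The list `heights` starts with 1, 1, 2, 7, in agreement with the values h(0), ..., h(3) stated above; the sketch does not attempt to verify the complexity values p(k).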

This formula for $p(k+1) - p(k)$ amounts to quadratic complexity:
\[ \frac{1}{2}(k^2 + 3k - 2) \le p(k) \le \frac{1}{2}\left( k^2 + O\left( \frac{k \log k}{\log \log k} \right) \right). \]
For staircase systems with general $z(n)$, i.e.
\[ B_0 = 0, \qquad B_{n+1} = B_n \, B_n 1 \, B_n 11 \, B_n \cdots B_n 1^{z(n)-1} B_n, \tag{6.31} \]
the word-complexity satisfies $p(k) \ge \frac{1}{2}(k^2 + 3k - 2)$ with infinitely many values of k where equality holds. There is no definite upper bound apart from $p(k)$ being subexponential, but it can be superpolynomial.

6.7.3. Weak Mixing. Weak mixing refers to the decay of correlations on average.

Definition 6.83. A dynamical system $(X, \mathcal{B}, \mu, T)$ preserving a probability measure μ is called weak mixing if
\[ \frac{1}{n} \sum_{i=0}^{n-1} |\mu(T^{-i}(A) \cap B) - \mu(A)\mu(B)| \to 0 \quad \text{as } n \to \infty \tag{6.32} \]
for every $A, B \in \mathcal{B}$.

We can express ergodicity in analogy to (6.20) and (6.32):

Lemma 6.84. A probability-preserving dynamical system $(X, \mathcal{B}, T, \mu)$ is ergodic if and only if
\[ \frac{1}{n} \sum_{i=0}^{n-1} \mu(T^{-i}(A) \cap B) - \mu(A)\mu(B) \to 0 \quad \text{as } n \to \infty, \tag{6.33} \]
for all $A, B \in \mathcal{B}$. Note the absence of absolute value bars compared to (6.32).

Proof. Assume that T is ergodic, so by Birkhoff’s Ergodic Theorem 6.13, $\frac{1}{n} \sum_{i=0}^{n-1} 1_A \circ T^i(x) \to \mu(A)$ for μ-a.e. x. Multiplying by $1_B$ gives
\[ \frac{1}{n} \sum_{i=0}^{n-1} 1_A \circ T^i(x)\, 1_B(x) \to \mu(A)\, 1_B(x) \quad \mu\text{-a.e.} \]
Integrating over x (using the Dominated Convergence Theorem to swap limit and integral) gives $\lim_n \frac{1}{n} \sum_{i=0}^{n-1} \int_X 1_A \circ T^i(x)\, 1_B(x) \, d\mu = \mu(A)\mu(B)$.

Conversely, assume that $A = T^{-1}A$ and take $B = A$. Then we obtain $\mu(A) = \frac{1}{n} \sum_{i=0}^{n-1} \mu(T^{-i}(A) \cap A) \to \mu(A)^2$; hence $\mu(A) \in \{0, 1\}$. □
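The difference between (6.33) and (6.32) can be made concrete for an irrational circle rotation, which is ergodic but not weakly mixing. The following Python sketch is only a numerical illustration under stated assumptions: the sets are arcs $A = [0, a)$ and $B = [0, b)$, μ is Lebesgue measure on the circle, and the helper names `arc_overlap` and `correlation_averages` are hypothetical. For arcs, $\mu(R_\alpha^{-i}(A) \cap B)$ is an explicit overlap length, so both averages can be computed without sampling.

```python
import math


def arc_overlap(s, a, b):
    """Lebesgue measure of [s, s+a) ∩ [0, b) on the circle R/Z, for 0 <= s <= 1 and 0 < a, b <= 1."""
    total = 0.0
    for k in (-1, 0):  # the only lifts of the arc [s, s+a) that can meet [0, b)
        lo = max(s + k, 0.0)
        hi = min(s + a + k, b)
        total += max(0.0, hi - lo)
    return total


def correlation_averages(alpha, a, b, n):
    """Return the averages appearing in (6.33) and (6.32) for R_alpha, A = [0, a), B = [0, b)."""
    plain = 0.0
    absolute = 0.0
    for i in range(n):
        s = (-i * alpha) % 1.0  # R_alpha^{-i}(A) is the arc [s, s + a) mod 1
        deviation = arc_overlap(s, a, b) - a * b
        plain += deviation
        absolute += abs(deviation)
    return plain / n, absolute / n


alpha = (math.sqrt(5) - 1) / 2  # an irrational rotation angle, chosen for illustration
plain, absolute = correlation_averages(alpha, 1 / 3, 1 / 3, 20000)
```

In this experiment `plain` is close to 0, as (6.33) predicts for an ergodic system, while `absolute` stays bounded away from 0, in line with the failure of (6.32) for rotations.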

Theorem 6.85. We have the following implications:

Bernoulli ⇒ mixing ⇒ weak mixing ⇒ ergodic ⇒ recurrent.

None of the reverse implications holds in general.

Proof. Bernoulli ⇒ mixing holds by Lemma 6.75. The classical example of why mixing automorphisms need not be two-sided Bernoulli comes from Ornstein [437]. For non-invertible systems, it is easier to find mixing systems that are not one-sided Bernoulli. For instance, typical $C^2$ expanding circle maps have this property; see [123].

Mixing ⇒ weak mixing is immediate from the definition. Conversely, Examples 6.125, 6.121 and 6.124 all give weakly mixing systems that are not strongly mixing.

Weak mixing ⇒ ergodic holds by Lemma 6.84²³. Conversely, irrational circle rotations are ergodic but not weakly mixing.

Ergodic ⇒ recurrent. If $B \in \mathcal{B}$ has positive mass, then $A := \bigcup_{i \in \mathbb{N}} T^{-i}(B)$ is T-invariant up to a set of measure 0; see the Poincaré Recurrence Theorem 6.11. By ergodicity, $\mu(A) = 1$, which is the definition of recurrence. Conversely, rational circle rotations are recurrent but not ergodic. □

²³ A direct argument: Let $A = T^{-1}(A)$ be a measurable T-invariant set. Then by weak mixing, $\mu(A) = \frac{1}{n} \sum_{i=0}^{n-1} \mu(T^{-i}(A) \cap A) \to \mu(A)\mu(A) = \mu(A)^2$. This means that $\mu(A) = 0$ or 1.

There are many equivalent characterizations of weak mixing. The following refer mostly to the product system.

Theorem 6.86. Let $(X, \mathcal{B}, \mu, T)$ be a probability measure-preserving dynamical system. Then the following are equivalent:
(1) $(X, \mathcal{B}, \mu, T)$ is weak mixing.
(2) $\lim_{E \not\ni n \to \infty} \mu(T^{-n}A \cap B) = \mu(A)\mu(B)$ for all $A, B \in \mathcal{B}$ and a subset E of zero density.
(3) $T \times T$ is weak mixing.
(4) $T \times S$ is ergodic on $X \times Y$ for every ergodic system $(Y, \mathcal{C}, \nu, S)$.
(5) $T \times T$ is ergodic.

Proof. (1) ⇔ (2): Use Lemma 8.53 for $a_i = |\mu(T^{-i}(A) \cap B) - \mu(A)\mu(B)|$.

(2) ⇒ (3): For every $A, B, C, D \in \mathcal{B}$, there are subsets $E_1$ and $E_2$ of $\mathbb{N}$ of zero density such that
\[ \lim_{E_1 \not\ni n \to \infty} \mu(T^{-n}(A) \cap B) - \mu(A)\mu(B) = \lim_{E_2 \not\ni n \to \infty} \mu(T^{-n}(C) \cap D) - \mu(C)\mu(D) = 0. \]
The union $E = E_1 \cup E_2$ still has density 0, and
\begin{align*}
0 &\le \lim_{E \not\ni n \to \infty} \left| \mu \times \mu((T \times T)^{-n}(A \times C) \cap (B \times D)) - \mu \times \mu(A \times B) \cdot \mu \times \mu(C \times D) \right| \\
&= \lim_{E \not\ni n \to \infty} \left| \mu(T^{-n}(A) \cap B) \cdot \mu(T^{-n}(C) \cap D) - \mu(A)\mu(B)\mu(C)\mu(D) \right| \\
&\le \lim_{E \not\ni n \to \infty} \mu(T^{-n}(A) \cap B) \cdot |\mu(T^{-n}(C) \cap D) - \mu(C)\mu(D)| \\
&\qquad + \lim_{E \not\ni n \to \infty} \mu(C)\mu(D) \cdot |\mu(T^{-n}(A) \cap B) - \mu(A)\mu(B)| = 0.
\end{align*}

(3) ⇒ (4): If $T \times T$ is weakly mixing, then so is T itself. Suppose $(Y, \mathcal{C}, \nu, S)$ is an ergodic system; then for $A, B \in \mathcal{B}$ and $C, D \in \mathcal{C}$ we have
\begin{align*}
\frac{1}{n} \sum_{i=0}^{n-1} \mu(T^{-i}(A) \cap B)\, \nu(S^{-i}(C) \cap D) &= \frac{1}{n} \sum_{i=0}^{n-1} \mu(A)\mu(B)\, \nu(S^{-i}(C) \cap D) \\
&\qquad + \frac{1}{n} \sum_{i=0}^{n-1} \left( \mu(T^{-i}(A) \cap B) - \mu(A)\mu(B) \right) \nu(S^{-i}(C) \cap D).
\end{align*}
By ergodicity of S (see Lemma 6.84), $\frac{1}{n} \sum_{i=0}^{n-1} \nu(S^{-i}(C) \cap D) \to \nu(C)\nu(D)$, so the first term in the above expression tends to $\mu(A)\mu(B)\nu(C)\nu(D)$. The second term is majorized by $\frac{1}{n} \sum_{i=0}^{n-1} |\mu(T^{-i}(A) \cap B) - \mu(A)\mu(B)|$, which tends to 0 because T is weak mixing.

(4) ⇒ (5): By assumption $T \times S$ is ergodic for the trivial map $S : \{0\} \to \{0\}$. Therefore T itself is ergodic, and hence $T \times T$ is ergodic.

(5) ⇒ (1): First observe that ergodicity of $T \times T$ trivially implies ergodicity of T. Because of Lemma 8.53 it suffices to show that the limit
\[ L := \lim_n \frac{1}{n} \sum_{i=0}^{n-1} \left( \mu(T^{-i}(A) \cap B) - \mu(A)\mu(B) \right)^2 = 0. \]
Expanding the square gives
\[ L = \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} \mu(T^{-i}(A) \cap B)^2 + (\mu(A)\mu(B))^2 - 2\mu(A)\mu(B) \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} \mu(T^{-i}(A) \cap B). \]
The first limit is $(\mu(A)\mu(B))^2$ by ergodicity of $T \times T$; indeed apply Lemma 6.84 to $(T \times T)^{-i}(A \times A)$ and $B \times B$. Hence
\[ L = -2\mu(A)\mu(B) \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} \left( \mu(T^{-i}(A) \cap B) - \mu(A)\mu(B) \right), \]
which is indeed zero by ergodicity of T (use Lemma 6.84 again). □



6.8. Spectral Properties

For a measure space $(X, \mathcal{B}, \mu)$, the space $L^2(\mu)$ of complex-valued square-integrable observables, equipped with inner product $(f, g) = \int_X f(x) \cdot \overline{g(x)} \, d\mu$, is a Hilbert space. Let T be a measure-preserving transformation of $(X, \mathcal{B}, \mu)$. We restrict the Koopman operator (see Remark 6.4) to the Hilbert space $L^2(\mu)$: $U_T f = f \circ T$ for all $f \in L^2(\mu)$. Then the corresponding transfer operator $L_T$ defined via duality $\int_X L_T f \cdot g \, d\mu = \int_X f \cdot U_T g \, d\mu$ fixes constant functions.

Definition 6.87. A dynamical system $(X, T)$ has a continuous eigenvalue $\lambda \in \mathbb{C}$ if there is a continuous eigenfunction $\varphi : X \to \mathbb{C}$ such that $U_T \varphi = \varphi \circ T = \lambda\varphi$. A measurable dynamical system $(X, \mathcal{B}, \mu, T)$ has a measurable eigenvalue $\lambda \in \mathbb{C}$ if there is an $L^2(\mu)$ eigenfunction $\varphi : X \to \mathbb{C}$ such that $U_T \varphi = \varphi \circ T = \lambda\varphi$ μ-a.e.

The spectral properties of a dynamical system $(X, \mathcal{B}, \mu, T)$ refer to the spectral properties (in particular, the eigenvalues and eigenfunctions, and their span) of the Koopman operator $U_T$. In particular, the spectral properties of isomorphic systems are the same.

The Koopman operator is unitary because T preserves μ. Indeed
\[ (U_T f, U_T g) = \int_X f \circ T(x) \cdot \overline{g \circ T(x)} \, d\mu = \int_X (f \cdot \overline{g}) \circ T(x) \, d\mu = \int_X f \cdot \overline{g} \, d\mu = (f, g), \]
and therefore $U_T^* U_T = U_T U_T^* = I$. This has several consequences, common to all unitary operators.

Proposition 6.88. The spectrum $\sigma(U_T)$ of $U_T$ is a closed subgroup of the unit circle $S^1$. Eigenvectors to different eigenvalues $\kappa \neq \lambda$ are orthogonal. If μ is ergodic, then the eigenfunctions have constant modulus and each eigenspace is one-dimensional.

In fact, if μ is ergodic, then $\sigma(U_T) = S^1$, but for the proof of this result we refer to [429].

Proof. The spectrum of every operator is closed, and if κ is the eigenvalue of eigenfunction f, then
\[ (f, f) = (U_T f, U_T f) = (\kappa f, \kappa f) = |\kappa|^2 (f, f), \]
so κ lies on the unit circle. If $\kappa, \lambda \in S^1$ are eigenvalues with eigenfunctions f and g, respectively, then
\[ U_T(fg) = (fg) \circ T = (f \circ T) \cdot (g \circ T) = U_T f \cdot U_T g = \kappa\lambda (fg). \]
Also
\[ U_T(\bar{f}) = \bar{f} \circ T = \overline{f \circ T} = \overline{U_T f} = \bar{\kappa}\bar{f} = \kappa^{-1}\bar{f}, \]
so the set of eigenvalues forms a multiplicative subgroup of the unit circle (and this carries over to the closure $\sigma(U_T)$). Additionally,
\[ (f, g) = (U_T f, U_T g) = (\kappa f, \lambda g) = \kappa\bar{\lambda}(f, g), \]
and if $\kappa \neq \lambda$, then this can only hold if f and g are orthogonal.

Assume now that μ is ergodic, so by Corollary 6.7, the only eigenfunctions of eigenvalue 1 are constant μ-a.e. If f is an eigenfunction of eigenvalue κ, then $|f|$ is an eigenfunction of eigenvalue $|\kappa| = 1$, so $|f|$ is constant μ-a.e.; we can scale so that $|f| = 1$. If g is another eigenfunction of κ, scaled so that $|g| = 1$, then $f/g$ is an eigenfunction of eigenvalue 1, so $f = g$ μ-a.e. □

Lemma 6.89. If $(Y, \mathcal{C}, \nu, S)$ is a measure-theoretic factor of $(X, \mathcal{B}, \mu, T)$ (with factor map π and $\nu = \mu \circ \pi^{-1}$), then every eigenvalue of S is also an eigenvalue of T. In particular, the spectrum of S is contained in the spectrum of T, and isomorphic systems have the same eigenvalues and spectrum.

Proof. Let g be an eigenfunction of S, with eigenvalue λ. Then $f := g \circ \pi$ is an eigenfunction of T, because
\[ f \circ T = g \circ \pi \circ T = g \circ S \circ \pi = \lambda g \circ \pi = \lambda f \quad \mu\text{-a.e.} \]
Hence f is an eigenfunction of $(X, T, \mu)$ with the same eigenvalue λ. □



6.8.1. Spectral Measures and Decompositions. Given a non-negative measure $\nu \in M(S^1)$ on the circle, the Fourier coefficients of ν are defined as
\[ \hat{\nu}(n) = \int_{S^1} z^n \, d\nu = \int_0^1 e^{2\pi i n x} \, d\nu(e^{2\pi i x}). \]
For every sequence $(z_j)_{j \in \mathbb{N}}$ of complex numbers and $N \in \mathbb{N}$, we have
\begin{align*}
\sum_{j,k=1}^{N} z_j \bar{z}_k \hat{\nu}(j - k) &= \sum_{j,k=1}^{N} \int_0^1 z_j e^{2\pi i j x}\, \overline{z_k e^{2\pi i k x}} \, d\nu \\
&= \int_0^1 \sum_{j=1}^{N} z_j e^{2\pi i j x}\, \overline{\sum_{k=1}^{N} z_k e^{2\pi i k x}} \, d\nu \\
&= \int_0^1 \Big\| \sum_{j=1}^{N} z_j e^{2\pi i j x} \Big\|^2 d\nu \ge 0.
\end{align*}
This property of $(\hat{\nu}(n))_{n \in \mathbb{Z}}$ is called positive definiteness. Conversely, the Bochner-Herglotz Theorem (see e.g. [277, Chapter 5], [389], or [465, page 2]) states that for every positive definite sequence $(a_n)_{n \in \mathbb{Z}} \subset \mathbb{C}$, there is a unique non-negative measure $\nu \in M(S^1)$ such that $\hat{\nu}(n) = a_n$ for each n, and $\nu(S^1) = \sqrt{\sum_n |a_n|^2}$.

Let $(X, \mathcal{B}, \mu, T)$ be an invertible dynamical system. Given a function $f \in L^2(\mu)$, the sequence $a_n := (U_T^n f, f) = \int_X f \circ T^n \, \bar{f} \, d\mu$ is positive definite because
\begin{align*}
\sum_{j,k=1}^{N} z_j \bar{z}_k a_{j-k} &= \sum_{j,k=1}^{N} z_j \bar{z}_k (U_T^{j-k} f, f) = \sum_{j,k=1}^{N} (z_j U_T^j f, z_k U_T^k f) \\
&= \Big( \sum_{j=1}^{N} z_j U_T^j f, \sum_{k=1}^{N} z_k U_T^k f \Big) = \Big\| \sum_{j=1}^{N} z_j U_T^j f \Big\|^2 \ge 0.
\end{align*}
Therefore the Bochner-Herglotz Theorem associates a non-negative measure $\nu_f \in M(S^1)$ to f, called the spectral measure of f.

Remark 6.90. If U is an invertible unitary operator, then
\[ \hat{\nu}_f(-n) = (U^{-n} f, f) = (U^{-n} f, U^{-n} U^n f) = (f, U^n f) = \overline{(U^n f, f)} = \overline{\hat{\nu}_f(n)}, \]
for every $n \in \mathbb{N}$. Therefore it makes sense to define $\hat{\nu}_f(-n) := \overline{\hat{\nu}_f(n)}$ also for non-invertible unitary operators. Most of the theory remains valid.
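Positive definiteness of the Fourier coefficients can be checked numerically for a purely atomic measure. The following Python sketch is an illustration under stated assumptions: ν is a finite sum $\sum_k w_k \delta_{e^{2\pi i x_k}}$ with randomly chosen positions $x_k$ and weights $w_k > 0$, the indices run over $0, \dots, N-1$ instead of $1, \dots, N$ (which does not affect the differences $j - k$), and the function names are hypothetical. It compares the quadratic form $\sum_{j,k} z_j \bar{z}_k \hat{\nu}(j-k)$ with the integral $\int |\sum_j z_j e^{2\pi i j x}|^2 \, d\nu$.

```python
import cmath
import random


def fourier_coefficient(atoms, n):
    """nu_hat(n) = sum_k w_k exp(2 pi i n x_k) for nu = sum_k w_k delta at exp(2 pi i x_k)."""
    return sum(w * cmath.exp(2j * cmath.pi * n * x) for x, w in atoms)


def quadratic_form(atoms, z):
    """sum_{j,k} z_j conj(z_k) nu_hat(j - k) over all index pairs of z."""
    N = len(z)
    return sum(z[j] * z[k].conjugate() * fourier_coefficient(atoms, j - k)
               for j in range(N) for k in range(N))


def integral_form(atoms, z):
    """Integral of |sum_j z_j exp(2 pi i j x)|^2 with respect to the atomic measure nu."""
    return sum(w * abs(sum(zj * cmath.exp(2j * cmath.pi * j * x)
                           for j, zj in enumerate(z))) ** 2
               for x, w in atoms)


random.seed(0)
atoms = [(random.random(), random.random()) for _ in range(5)]  # pairs (x_k, w_k)
z = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(6)]
q = quadratic_form(atoms, z)
r = integral_form(atoms, z)
```

Up to rounding, `q` is real, non-negative and equal to `r`. For a single atom of weight 1 at λ the same function returns $\lambda^n$, which is the Fourier coefficient of a Dirac measure.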

Remark 6.91. For the Koopman operator $U_T$ of an invertible dynamical system $(X, \mathcal{B}, \mu, T)$, the Fourier coefficients
\[ \hat{\nu}_f(n) = (U_T^n f, f) = \int_X f \circ T^n \, \bar{f} \, d\mu \]
are the autocorrelation coefficients of the observable $f \in L^2(\mu)$. If μ is mixing, then $\hat{\nu}_f(n) \to 0$ for every $f \in L^2(\mu)$ with $\int_X f \, d\mu = 0$. Conversely, if $\sigma_f \ll \mathrm{Leb}$, then by the Riemann-Lebesgue Lemma, these Fourier coefficients, i.e. the autocorrelation coefficients of f, tend to 0. In fact, the correlation coefficients $(U_T^n f, g) = \int_X f \circ T^n \, \bar{g} \, d\mu$ of two observables $f, g \in L^2(\mu)$ are the Fourier coefficients of a complex measure $\sigma_{f,g}$. (This is an application of a more general version of the Bochner-Herglotz Theorem.)

Suppose that a unitary operator U acts on a Hilbert space H. We can decompose H into subspaces that are the linear spans of U-orbits of well-chosen functions in H; see [465, Theorem II.4]:

Theorem 6.92. Let U be an invertible unitary operator acting on a separable Hilbert space H. Then there is a (possibly finite) sequence of functions $h_j \in H$ such that
\[ \begin{cases} H = \bigoplus_j \mathrm{Span}(U^n h_j : n \in \mathbb{Z}), \\ \mathrm{Span}(U^n h_j : n \in \mathbb{Z}) \perp \mathrm{Span}(U^n h_k : n \in \mathbb{Z}) & \text{if } j \neq k. \end{cases} \tag{6.34} \]
The corresponding spectral measures satisfy $\nu_{h_1} \gg \nu_{h_2} \gg \nu_{h_3} \gg \cdots$. Moreover, if the $(h'_j)$ satisfy (6.34), then $\nu_{h'_j} \sim \nu_{h_j}$ for each j.

Definition 6.93. The spectral measure $\nu_{h_1}$ of the leading function $h_1$ in (6.34) is called the maximal spectral type. If $U = U_T$ is the Koopman operator of an invertible dynamical system, then we call $\nu_{h_1}$ the spectral measure of T and we will denote it as $\nu_T$.

Example 6.94. If f is an eigenfunction of $U_T$ to eigenvalue λ scaled so that $\|f\|_2 = 1$, then $\nu_f = \delta_\lambda$ is the Dirac measure at the eigenvalue. Indeed,
\[ \hat{\delta}_\lambda(n) = \int_{S^1} z^n \, d\delta_\lambda = \lambda^n = (\lambda^n f, f) = (U_T^n f, f). \]
For each eigenfunction f, $\mathrm{Span}(U_T^n f : n \in \mathbb{Z}) =: \mathrm{Span}(f)$ is only a one-dimensional subspace. However, the Kronecker factor $H_{pp} := \mathrm{Span}(f : U_T f = \lambda f)$ can be as large as the whole Hilbert space $L^2(\mu)$.

Using $\nu_T$, we can also give a (continuous) decomposition of $U_T$ in orthogonal projections, called the spectral decomposition; see [298]. For a fixed eigenfunction f (with eigenvalue $\lambda \in S^1$), we let $\Pi_\lambda : L^2(\mu) \to L^2(\mu)$ be the orthogonal projection onto the span of f. More generally, if $S \subset \sigma(U_T)$, we define $\Pi_S$ as the orthogonal projection on the largest closed subspace V such that $U_T|_V$ has spectrum contained in S. As with any orthogonal projection, they have the following properties:

• $\Pi_S^2 = \Pi_S$ ($\Pi_S$ is idempotent).
• $\Pi_S^* = \Pi_S$ ($\Pi_S$ is self-adjoint).
• $\Pi_S \Pi_{S'} = 0$ if $S \cap S' = \emptyset$.
• The kernel $\ker(\Pi_S) = V^\perp$, the orthogonal complement of V.

Theorem 6.95 (Spectral Decomposition of Unitary Operators). There is a measure $\nu_T$ on $S^1$ such that
\[ U_T = \int_{\sigma(U_T)} \lambda \, \Pi_\lambda \, d\nu_T(\lambda), \]
and $\nu_T(\lambda) \neq 0$ if and only if λ is an eigenvalue of $U_T$. Using the above properties of orthogonal projections, we also get
\[ U_{T^n} = U_T^n = \int_{\sigma(U_T)} \lambda^n \, \Pi_\lambda \, d\nu_T(\lambda). \]

6.8.2. Weak Mixing Revisited. Although not immediately apparent from the definition, the most important aspect of the notion of weak mixing is that it excludes the existence of eigenfunctions other than constant functions (with eigenvalue 1).

Lemma 6.96. A probability measure-preserving transformation $(X, \mathcal{B}, \mu, T)$ is weakly mixing if and only if the Koopman operator $U_T$ has no measurable eigenfunctions other than constants.

Proof. ⇒: Assume that $(X, \mathcal{B}, \mu, T)$ is weakly mixing. By Theorem 6.86, the product system $T \times T$ is ergodic. Suppose that f is an eigenfunction with eigenvalue κ. Write $\phi(x, y) = f(x)\overline{f(y)}$. Then
\[ \phi \circ (T \times T)(x, y) = \phi(Tx, Ty) = f(Tx)\overline{f(Ty)} = |\kappa|^2 \phi(x, y) = \phi(x, y), \]
because $|\kappa| = 1$ by Proposition 6.88. Hence φ is $T \times T$-invariant. By ergodicity of $T \times T$, φ must be constant $\mu \times \mu$-a.e. But then f must also be constant μ-a.e.

⇐: The other direction relies on spectral theory of unitary operators. If φ is an eigenfunction of $U_T$, then by assumption, φ is constant, so the eigenvalue is 1. Let $V = \mathrm{Span}(\phi)$ and let $\Pi_1$ be the orthogonal projection onto V; clearly $V^\perp = \{f \in L^2(\mu) : \int f \, d\mu = 0\}$. One can derive that the spectral measure $\nu_T$ cannot have any atoms, except possibly at λ = 1, corresponding to $\Pi_1$.


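Now take $f \in V^\perp$ and $g \in L^2(\mu)$ arbitrary. Using the Spectral Theorem 6.95, we have
$$
\begin{aligned}
\frac1n \sum_{i=0}^{n-1} |(U_T^i f, g)|^2
&= \frac1n \sum_{i=0}^{n-1} \Big| \int_{\sigma(U_T)} \lambda^i (\Pi_\lambda f, g)\, d\nu_T(\lambda) \Big|^2 \\
&= \frac1n \sum_{i=0}^{n-1} \int_{\sigma(U_T)} \lambda^i (\Pi_\lambda f, g)\, d\nu_T(\lambda)\ \overline{\int_{\sigma(U_T)} \kappa^i (\Pi_\kappa f, g)\, d\nu_T(\kappa)} \\
&= \frac1n \sum_{i=0}^{n-1} \iint_{\sigma(U_T)\times\sigma(U_T)} \lambda^i \bar\kappa^i\, (\Pi_\lambda f, g)\overline{(\Pi_\kappa f, g)}\, d\nu_T(\lambda)\, d\nu_T(\kappa) \\
&= \iint_{\sigma(U_T)\times\sigma(U_T)} \frac1n \sum_{i=0}^{n-1} \lambda^i \bar\kappa^i\, (\Pi_\lambda f, g)\overline{(\Pi_\kappa f, g)}\, d\nu_T(\lambda)\, d\nu_T(\kappa) \\
&= \iint_{\sigma(U_T)\times\sigma(U_T)} \frac1n\, \frac{1 - (\lambda\bar\kappa)^n}{1 - \lambda\bar\kappa}\, (\Pi_\lambda f, g)\overline{(\Pi_\kappa f, g)}\, d\nu_T(\lambda)\, d\nu_T(\kappa).
\end{aligned}
$$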

In the final line we used that the diagonal $\{\lambda = \kappa\}$ has $\nu_T \times \nu_T$-measure zero, because $\nu_T$ is non-atomic (except possibly the atom $\Pi_1$ at $\lambda = 1$, but then $\Pi_1 f = 0$). Now $\frac1n \frac{1 - (\lambda\bar\kappa)^n}{1 - \lambda\bar\kappa}$ is bounded (use l’Hôpital’s rule) and tends to 0 for $\lambda \neq \kappa$, so by the Bounded Convergence Theorem, we have
$$\lim_{n\to\infty} \frac1n \sum_{i=0}^{n-1} |(U_T^i f, g)|^2 = 0.$$
By Corollary 8.54, also $\lim_n \frac1n \sum_{i=0}^{n-1} |(U_T^i f, g)| = 0$ (i.e. without the square). Finally, if $f \in L^2(\mu)$ is arbitrary, then $f - (f, 1) \in V^\perp$. We find
$$
\begin{aligned}
0 &= \lim_{n\to\infty} \frac1n \sum_{i=0}^{n-1} |(U_T^i (f - (f, 1)), g)|
= \lim_{n\to\infty} \frac1n \sum_{i=0}^{n-1} |(U_T^i f - (f, 1), g)| \\
&= \lim_{n\to\infty} \frac1n \sum_{i=0}^{n-1} |(U_T^i f, g) - (f, 1)(1, g)|.
\end{aligned}
$$

Take f = 1A , g = 1B to get the definition of weak mixing.



Example 6.97. Circle rotations $R_\alpha$, of any rotation angle $\alpha \in [0, 1)$, are neither mixing nor weakly mixing. To prove non-mixing, set $A = B = [0, 1/3]$. There are infinitely many $n$ such that $R_\alpha^{-n}(A) \cap B \supset [0, 1/4]$, so $1/4 \le \mu(R_\alpha^{-n}(A) \cap B) \not\to \mu(A)\mu(B) = 1/9$. Furthermore, $R_\alpha$ has a non-constant eigenfunction $\psi : \mathbb{S}^1 \to \mathbb{C}$ defined as $\psi(x) = e^{2\pi i x}$ because $\psi \circ R_\alpha(x) = e^{2\pi i (x+\alpha)} = e^{2\pi i \alpha} \psi(x)$. Therefore $R_\alpha$ is not weakly mixing. Since the Sturmian shift with rotation number $\alpha \in [0, 1] \setminus \mathbb{Q}$ (with its unique invariant probability measure) is isomorphic to $(\mathbb{S}^1, R_\alpha, \mu)$, the absence of (weak) mixing carries over to the Sturmian shift.
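The failure of weak mixing in Example 6.97 can also be observed numerically. The following sketch is only an illustration under stated assumptions: it takes $A = B = [0, 1/3]$ as in the example, picks the golden-mean rotation angle (an arbitrary choice, not made in the text), computes $\mu(R_\alpha^{-n}(A) \cap B)$ exactly as an overlap of arcs, and averages $|\mu(R_\alpha^{-n}(A) \cap B) - \mu(A)\mu(B)|$. The helper names `overlap` and `cesaro_defect` are hypothetical.

```python
import math

def overlap(alpha, n, a=1 / 3):
    """Lebesgue measure of R_alpha^{-n}([0, a]) ∩ [0, a] on the circle R/Z (a <= 1/2)."""
    s = (-n * alpha) % 1.0  # R_alpha^{-n}([0, a]) is the arc [s, s + a] mod 1

    def segment(lo, hi):
        # length of [lo, hi] ∩ [0, a] on the real line
        return max(0.0, min(a, hi) - max(0.0, lo))

    # the arc may wrap around 0, so also test its lift shifted by -1
    return segment(s, s + a) + segment(s - 1.0, s - 1.0 + a)

def cesaro_defect(alpha, N, a=1 / 3):
    """Average of |mu(R^{-n}A ∩ A) - mu(A)^2| over 0 <= n < N."""
    return sum(abs(overlap(alpha, n, a) - a * a) for n in range(N)) / N

if __name__ == "__main__":
    alpha = (math.sqrt(5) - 1) / 2  # assumed irrational angle for the experiment
    print(cesaro_defect(alpha, 2000))
    print([n for n in range(1, 60) if overlap(alpha, n) >= 0.25])
```

For a weakly mixing system the averaged defect would tend to 0; for the rotation it stays bounded away from 0, and the second line lists return times $n$ at which the overlap is at least $1/4$.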


6.8.3. Pure Point Spectra. The spectral measure of $T$ decomposes as $\nu_T = \nu_{pp} + \nu_{ac} + \nu_{sing}$ where the following hold:
• $\nu_{pp}$ is the discrete or pure point part of $\nu_T$. It is an at most countable linear combination of Dirac measures, namely at every eigenvalue, so in particular at $\lambda = 1$. For weak mixing transformations $\nu_{pp} = c\delta_0$ for some $c \in (0, 1]$.
• $\nu_{ac}$ is absolutely continuous w.r.t. Lebesgue measure.
• $\nu_{sing}$ is non-atomic but singular w.r.t. Lebesgue measure.

The parts $\nu_{ac} + \nu_{sing} = \nu_{cont}$ together are called the continuous part of the spectral measure. It follows from a result by Wiener [558] that $\nu = \nu_{cont}$ if and only if the averages of Fourier coefficients $\frac{1}{2N+1} \sum_{n=-N}^{N} |\hat\nu(n)|^2 \to 0$.

Definition 6.98. A measure-preserving dynamical system $(X, \mathcal{B}, \mu, T)$ is said to have pure point spectrum (also called discrete spectrum) if the collection of eigenfunctions of the Koopman operator $U_T$ spans $L^2(\mu)$. That is, the Kronecker factor is $L^2(\mu)$. Equivalently, the spectral measure $\nu_T = \nu_{pp}$ is a countable linear combination of Dirac measures.

As we have seen in Proposition 6.88, the eigenvalues of $U_T$ form an (in case of pure point spectrum countable) subgroup of $\mathbb{S}^1$. Von Neumann [299] proved that this group is a complete isomorphic invariant among the measure-preserving dynamical systems with pure point spectrum; see [550, Theorem 3.4]$^{24}$:

Theorem 6.99. Two measure-preserving dynamical systems with pure point spectra are isomorphic if and only if their eigenvalues are the same.

The structure theorem by Halmos & von Neumann [299] associates pure point spectrum to group rotations:

Theorem 6.100. An ergodic probability measure-preserving system $(X, \mathcal{B}, \mu, T)$ on a compact metric space has pure point spectrum if and only if it is isomorphic to a rotation on a compact metrizable abelian group $G$ with Haar measure $\mu_G$, so there is $g_0 \in G$ such that $Tx = \varphi^{-1}(\varphi(x) + g_0)$, where $\varphi : X \to G$ is the isomorphism.

Examples of such group rotations are rotations on the torus $\mathbb{T}^d$ or addition on an odometer $\Sigma_p$, or (skew-)products of these. We give a proof following Kůrka [381, Theorem 2.55], who, however, works with continuous eigenfunctions only.

24 Walters uses the word conjugate for what is called isomorphic in this text. Conjugate in this text is topologically conjugate in Walters’s book.


Proof. ⇒: Suppose $(f_n)_{n\in\mathbb{N}}$ is a system of continuous eigenfunctions of $U_T$, spanning $L^2(\mu)$. Let $d$ be the metric of $X$, but we define a new metric $\rho$ as follows:
$$\rho(x, y) = \sum_{n\in\mathbb{N}} \frac{1}{2^n} |f_n(x) - f_n(y)|.$$
The triangle inequality is easily checked. For all $x \neq y \in X$, there is a function $g \in \mathrm{Span}(f_n : n \in \mathbb{N})$ such that $g(x) \neq g(y)$, so there must be some $n \in \mathbb{N}$ such that $f_n(x) \neq f_n(y)$, so $\rho(x, y) \neq 0$. Since the eigenvalues $\lambda_n$ of the $f_n$’s all lie on the unit circle, we have
$$\rho(T(x), T(y)) = \sum_{n\in\mathbb{N}} \frac{1}{2^n} |f_n(T(x)) - f_n(T(y))| = \sum_{n\in\mathbb{N}} \frac{|\lambda_n|}{2^n} |f_n(x) - f_n(y)| = \rho(x, y),$$
so $T$ is an (invertible!) isometry w.r.t. $\rho$.

The identity map $\mathrm{Id} : (X, d) \to (X, \rho)$ is continuous. To show this, take $\varepsilon > 0$ arbitrary and $N$ such that $\sum_{n>N} \frac{2}{2^n} \le \varepsilon/2$. Also using uniform continuity of the eigenfunctions $f_n$, we can choose $\delta > 0$ such that $d(x, y) < \delta$ implies that $|f_n(x) - f_n(y)| < \varepsilon/2$ for all $n \le N$. Therefore
$$\rho(x, y) = \sum_{n=1}^{N} \frac{1}{2^n} |f_n(x) - f_n(y)| + \sum_{n>N} \frac{1}{2^n} |f_n(x) - f_n(y)| \le \sum_{n=1}^{N} \frac{1}{2^n}\, \frac{\varepsilon}{2} + \frac{\varepsilon}{2} < \varepsilon.$$
Thus $(X, \rho)$, as a continuous image of the compact space $(X, d)$, is compact itself. Also $T$ is assumed to be transitive, so by Exercise 2.28 it is minimal (and in fact uniformly rigid; see Lemma 2.30).

It remains to give $(X, \rho)$ a group structure. Fix $x_0 \in X$, and define a homomorphism $h : \mathbb{Z} \to \mathrm{orb}(x_0)$ by $h(n) = T^n(x_0)$. Since $T$ is an isometry on $(X, \rho)$, it is easy to check that the addition on $\mathbb{Z}$ transfers to a uniformly continuous action on $\mathrm{orb}(x_0)$ and $T(x) = h(h^{-1}(x) + 1)$. But $\overline{\mathrm{orb}(x_0)} = X$, so this action extends continuously to $X$ and the group $G$ is the compactification of $\mathbb{Z}$ in the topology that $\mathbb{Z}$ inherits from $(X, \rho)$ via $h^{-1}$.

⇐: Let $\hat G$ be the group of characters of $G$; i.e. each $\gamma \in \hat G$ is a continuous function $\gamma : G \to \mathbb{S}^1$ such that $\gamma(g_1 + g_2) = \gamma(g_1)\gamma(g_2)$ for all $g_1, g_2 \in G$. Define $\hat T : G \to G$ as $\hat T(g) = g + g_0$, so $\hat T \circ \varphi = \varphi \circ T$ $\mu$-a.e. Then
$$\gamma(\hat T(g)) = \gamma(g + g_0) = \gamma(g_0) \cdot \gamma(g) = \lambda\gamma(g) \qquad \text{for } \lambda = \gamma(g_0) \in \mathbb{S}^1,$$


so each character is an eigenfunction. The linear span $\mathrm{Span}(\gamma : \gamma \in \hat G)$ is an algebra of complex-valued continuous functions, containing the constant function (because $\gamma \equiv 1$ is also a character). The Theorem of Stone-Weierstraß, see e.g. [56, Theorem 20.45], implies that $\mathrm{Span}(\gamma : \gamma \in \hat G)$ is dense in $C(G, \mathbb{C})$ and therefore also in $L^2(\mu_G)$. Hence $(G, \mu_G, \hat T)$ has pure point spectrum and so has the isomorphic system $(X, \mu, T)$. □

Corollary 6.101. Let $(X, T)$ be a continuous dynamical system on a compact metric space. If $T$ has pure point spectrum for each of its invariant measures, then $h_{top}(T) = 0$.

Proof. Since a group rotation is an isometry, it has zero topological entropy, so Haar measure has zero entropy as well. By Theorem 6.100, and since isomorphisms preserve entropy, each ergodic $T$-invariant measure has therefore zero entropy. Now use the Variational Principle 6.63. □

Definition 6.102. We say that $U_T$ has simple spectrum if there is a single $h \in L^2(\mu)$ such that $\mathrm{Span}(U_T^n h : n \in \mathbb{Z}) = L^2(\mu)$. In other words, for the decomposition of Theorem 6.92, the sequence $(h_j)$ consists of a single function $h$.

Dynamical systems with simple spectrum are necessarily ergodic, [429]. A theorem by von Neumann [431] says that pure point spectrum implies simple spectrum. We will not prove this theorem in full but only give the special case in Theorem 6.104 for circle rotations, where the eigenfunctions are the Fourier modes. This situation is not unrepresentative for the general case, because the eigenfunctions form a complete orthogonal system when the spectrum is pure point.

A result going back to Fomin [250] states that every equicontinuous system$^{25}$ has pure point spectrum. A measure-theoretic version of equicontinuity is in fact enough; see [323]. Recently, this result was extended to mean equicontinuity in [263, 394]:

Theorem 6.103. If $(X, T)$ is mean equicontinuous, then it has pure point spectrum w.r.t. each of its ergodic invariant measures.

This was again strengthened in [269] to $\mu$-mean equicontinuous systems with ergodic $T$-invariant measures $\mu$: $(X, T)$ is $\mu$-mean equicontinuous if and only if $T$ has pure point spectrum w.r.t. $\mu$. The condition that $\mu$ is ergodic was later removed in [322]; see also [263, Theorem 1.4]. For invertible maps$^{26}$, mean equicontinuity was shown to be equivalent to $(X, T)$ having a pure point spectrum; see [263, Corollary 1.6].

25 Recall that they are uniquely ergodic; see Theorem 6.20.
26 Recall from Corollary 2.36 that equicontinuous surjections are invertible, but mean equicontinuous maps need not be invertible.


6.8.4. Examples of Pure Point Spectra. That the dynamical systems in this section have pure point spectrum can now be seen as a special case of Theorem 6.99 or Theorem 6.103, but the specific proofs may be instructive, also because we omit the proofs of Theorem 6.99 and Theorem 6.103 and Fomin’s original proof.

Theorem 6.104. A Sturmian shift with rotation number $\alpha \notin \mathbb{Q}$ has pure point spectrum, with eigenvalues $e^{2\pi i \alpha n}$, $n \in \mathbb{Z}$. It also has simple spectrum.

Proof. Since the Sturmian shift with frequency $\alpha$ (with its unique invariant probability measure) is isomorphic to $(\mathbb{S}^1, \mathcal{B}, \mathrm{Leb}, R_\alpha)$, it suffices to consider the irrational circle rotation; it preserves Lebesgue measure Leb, so we take $\mu$ to be Lebesgue measure. The Koopman operator $U_{R_\alpha}$ has eigenfunctions $\psi_n : \mathbb{S}^1 \to \mathbb{C}$ defined as $\psi_n(x) = e^{2\pi n i x}$, $n \in \mathbb{Z}$, because $\psi_n \circ R_\alpha(x) = e^{2\pi n i (x + \alpha \bmod 1)} = e^{2\pi n i \alpha} \psi_n(x)$. But the $(\psi_n)_{n\in\mathbb{Z}}$ form the standard basis of Fourier modes spanning $L^2(\mu)$, so $U_{R_\alpha}$ has pure point spectrum.

Now for the simple spectrum part, irrational rotations $R_\alpha$ indeed have a simple spectrum, but the Fourier modes, i.e. the eigenfunctions, don’t play the role of $h$. Quite the opposite: take $h \in L^2(\mu)$ such that $\hat h(n) = \int_0^1 h(x) e^{2\pi i n x}\, dx \neq 0$ for all $n \in \mathbb{Z}$. We show that the orthogonal complement $\mathrm{Span}(U_T^n h : n \in \mathbb{Z})^\perp = \{0\}$. Indeed, suppose that $g \in L^2(\mu)$ satisfies $g \perp U_T^n h$ for all $n \in \mathbb{Z}$. Write $h = \sum_j \hat h(j) e^{-2\pi i j x}$ and $g = \sum_k \hat g(k) e^{-2\pi i k x}$, where both sequences of Fourier coefficients belong to $\ell^2(\mathbb{C})$. Then
$$
\begin{aligned}
0 = (U_T^n h, g) &= \Big( U_T^n \sum_{j\in\mathbb{Z}} \hat h(j) e^{-2\pi i j x},\ \sum_{k\in\mathbb{Z}} \hat g(k) e^{-2\pi i k x} \Big) \\
&= \Big( \sum_{j\in\mathbb{Z}} \hat h(j) e^{-2\pi i j (x + n\alpha)},\ \sum_{k\in\mathbb{Z}} \hat g(k) e^{-2\pi i k x} \Big) \\
&= \sum_{j,k\in\mathbb{Z}} e^{-2\pi i j n \alpha} \big( \hat h(j) e^{-2\pi i j x},\ \hat g(k) e^{-2\pi i k x} \big)
= \sum_{j\in\mathbb{Z}} \hat h(j)\overline{\hat g(j)}\, e^{-2\pi i j n \alpha}.
\end{aligned}
$$
Hence, if we abbreviate $\zeta = e^{-2\pi i \alpha} \in \mathbb{S}^1$ and $(b_j) = (\hat h(j)\overline{\hat g(j)}) \in \ell^2(\mathbb{C})$, then $\sum_j b_j \zeta^{nj} = 0$ for each $n \in \mathbb{Z}$. The function $b(z) := \sum_{j\in\mathbb{Z}} b_j z^j \in L^2(\mu)$ is continuous. But since $\{\zeta^n\}_{n\in\mathbb{Z}}$ is dense in $\mathbb{S}^1$ and $b(\zeta^n) = 0$, $b(z)$ is identically zero. By the Parseval inequality, $\sum_j |b_j|^2 \le \|b\|_2^2 = 0$, and therefore $b_j = \hat h(j)\overline{\hat g(j)} = 0$ for each $j$. But $\hat h(j) \neq 0$, so $\hat g(j) = 0$ for each $j \in \mathbb{Z}$ and $g(x) \equiv 0$ as well. □
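The eigenfunction relation used at the start of the proof of Theorem 6.104 is easy to check on sample points. The sketch below is a numerical illustration only; the angle $\alpha = \sqrt{2} - 1$ and the helper names `psi` and `eigen_defect` are assumptions made for the example and do not appear in the text.

```python
import cmath
import math

def psi(n, x):
    """Fourier mode psi_n(x) = exp(2 pi i n x) on the circle R/Z."""
    return cmath.exp(2j * cmath.pi * n * x)

def eigen_defect(n, alpha, samples=100):
    """Largest value of |psi_n(R_alpha x) - exp(2 pi i n alpha) psi_n(x)| on a grid of points x."""
    lam = cmath.exp(2j * cmath.pi * n * alpha)
    return max(
        abs(psi(n, (k / samples + alpha) % 1.0) - lam * psi(n, k / samples))
        for k in range(samples)
    )

if __name__ == "__main__":
    alpha = math.sqrt(2) - 1  # assumed irrational rotation angle
    for n in (-2, -1, 0, 1, 2, 3):
        print(n, eigen_defect(n, alpha))
```

Each defect should be at rounding level, in line with $\psi_n \circ R_\alpha = e^{2\pi n i \alpha}\psi_n$.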


Theorem 6.105. Every odometer has pure point spectrum.

Proof. Let $(X, a)$ be the odometer, with $a$-invariant measure $\mu$. Set $q_j = p_1 p_2 \cdots p_j$. Recall that the $j$-cylinder $Z_{[0^j]} = [0 \dots 0]$ ($j$ zeros) is periodic with period $q_j$. Abbreviate $Z_j^n = \sigma^n(Z_{[0^j]})$. Now $\lambda_j^k := e^{2\pi i k/q_j}$ for $0 \le k < q_j$ is an eigenvalue, because we can construct a corresponding eigenfunction $f_{j,k}$ of the Koopman operator as $f_{j,k}|_{Z_j^n} = e^{-2\pi i n k/q_j}$. In particular, as shown in Proposition 6.88, (fj,k )j∈N,0≤k 0 sufficiently small $|g_\varepsilon(x)| \le 1 - \varepsilon^2 < 1$ for $x \notin Z_j^n$. Clearly $g_\varepsilon$ is a linear combination of eigenfunctions, so $\int_X h\, g_\varepsilon \, d\mu = 0$. The algebraic power $g_\varepsilon^r$ is a linear combination of eigenfunctions too, and hence also $\int_X h\, g_\varepsilon^r \, d\mu = 0$. But since $|g_\varepsilon^r(x)| < (1 - \varepsilon^2)^r \to 0$ for each $x \notin Z_j^n$, we get $\lim_r \int_X h\, g_\varepsilon^r \, d\mu = \int_{Z_j^n} h \, d\mu = 0$. This contradiction shows that (fj,k )j∈N,0≤k 0 be arbitrary. λf (x). To get the same equality also for xmax ∈ XBV

Since $f$ is uniformly continuous, we can find $\delta > 0$ such that $f(B_\delta(x)) \subset B_\varepsilon(f(x))$ for every $x \in X_{BV}$. Therefore, for $y \in \mathrm{orb}(x) \cap B_\delta(x_{\max})$ with $\tau(y) \in B_\delta(\tau(x_{\max}))$, we have
$$
\begin{aligned}
|f(\tau(x_{\max})) - \lambda f(x_{\max})| &\le |f(\tau(x_{\max})) - \lambda f(y)| + |\lambda f(y) - \lambda f(x_{\max})| \\
&= |f(\tau(x_{\max})) - f(\tau(y))| + |\lambda|\, |f(y) - f(x_{\max})| \\
&\le \varepsilon + \varepsilon = 2\varepsilon. \qquad (6.44)
\end{aligned}
$$
Since $\varepsilon > 0$ was arbitrary, $f(\tau(x_{\max})) = \lambda f(x_{\max})$ as required. □



Equation (6.43) suggests that we can approximate $f$ by $\lambda^{r_n(x) + \rho_n(t(x_n))}$ where the power contains an “average phase” $\rho_n(v)$ if $t(x_n) = v \in V_n$ and an integer $r_n(x)$ indicating for how many iterates $\tau^j(x) \in [v]$. However, in the proofs below, it is convenient to approximate $f$ by a martingale $(f_n)_{n\in\mathbb{N}}$ obtained from the conditional expectations$^{32}$:
$$f_n(x) := E_\mu(f | P_n)(x) = \frac{1}{\mu(P_n[x])} \int_{P_n[x]} f \, d\mu, \qquad (6.45)$$
where $P_n[x]$ denotes the partition element of the $n$-th Kakutani-Rokhlin partition containing $x$. This sequence $(f_n)_{n\in\mathbb{N}}$ is indeed a martingale because
$$E_\mu(f_{n+1} | P_n) = E_\mu(E_\mu(f | P_{n+1}) | P_n) = E_\mu(f | P_n) = f_n. \qquad (6.46)$$
The Martingale Theorem (see e.g. [86, Theorem 35.6]) implies that $f_n \to f$ in $L^2(\mu)$ and $\mu$-a.e. as $n \to \infty$. Let $[v^{\min}]$ denote the cylinder set of the minimal path connecting $v_0$ with $v \in V_n$. Then the elements of $P_n$ are of the form $\tau^j([v^{\min}])$ for $v \in V_n$, $0 \le j < h_v(n)$, and for each $v \in V_n$, the measures $\mu(\tau^j([v^{\min}]))$ for $0 \le j < h_v(n)$ all coincide.

32 For background on martingales and conditional expectation, see e.g. [56, Chapter 21] and [86, Sections 34 & 35].

Assume now that $r_n(x) \ge 2$, so $\tau(x) \in [v]$. Then

6.9. Eigenvalues of Bratteli-Vershik Systems

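for $x \in \tau^j([v^{\min}]) = P_n[x]$,
$$
\begin{aligned}
f_n \circ \tau(x) &= \frac{1}{\mu(P_n[x])} \int_{P_n[x]} f \circ \tau \, d\mu
= \frac{1}{\mu(\tau^j([v^{\min}]))} \int_{\tau^j([v^{\min}])} f \circ \tau \, d\mu \\
&= \frac{1}{\mu([v^{\min}])} \int_{[v^{\min}]} f \circ \tau^{j+1} \, d\mu
= \frac{\lambda}{\mu([v^{\min}])} \int_{[v^{\min}]} f \circ \tau^{j} \, d\mu = \lambda f_n(x).
\end{aligned}
$$
Since averages over a function $f$ taking values in the unit circle lie in the closed unit disk, we can find $c_n(v) \in [0, 1]$ and $\rho_n : V_n \to \mathbb{R}$ such that
$$f_n(x) = c_n(v) \lambda^{-(r_n(x) + \rho_n(v))} \qquad \text{if } t(x_n) = v \in V_n. \qquad (6.47)$$
The $L^2(\mu)$-norm of $f_n$ satisfies
$$
\begin{aligned}
\|f_n\|_2^2 &= \sum_{v\in V_n} \sum_{j=0}^{h_v(n)-1} |c_n(v) \lambda^{-(r_n(x) + \rho_n(v))}|^2\, \mu([v^{\min}]) \\
&= \sum_{v\in V_n} |c_n(v)|^2\, h_v(n)\, \mu([v^{\min}]) \\
&= \sum_{v\in V_n} c_n(v)^2\, \mu(x \in X_{BV} : t(x_n) = v). \qquad (6.48)
\end{aligned}
$$
Since our system is linearly recurrent with constant $L$, we have
$$\frac{1}{L^2} \le \frac{\mu(x \in X_{BV} : t(x_n) = v)}{\mu(x \in X_{BV} : t(x_n) = w)} \le L^2 \qquad (6.49)$$
for all $v, w \in V_n$ and all $n \in \mathbb{N}$. Since $\|f_n\|_2^2 \to 1$ by the Martingale Convergence Theorem, (6.48) and (6.49) together imply that
$$1 \ge \min_{v\in V_n} c_n(v) \to 1 \qquad \text{as } n \to \infty. \qquad (6.50)$$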

Proof of Theorem 6.118. Since the BV-system is assumed to be linearly recurrent, say with constant $L$, $\#V_n \le L$ for all $n \in \mathbb{N}$ and $\tau$ preserves a single invariant probability measure $\mu$; see Corollary 6.29.

Measurable eigenvalues, “only if” direction: Let $f_n = E_\mu(f | P_n)$ be the martingale as in (6.45). By the Martingale Convergence Theorem $f_n \to f$ $\mu$-a.e. Also, since $E_\mu(f_{n+1} - f_n | P_n) = 0$,
$$E_\mu\big((f_{n+1} - f_n)(f_{m+1} - f_m)\big) = E_\mu\big(E_\mu((f_{n+1} - f_n)(f_{m+1} - f_m) | P_n)\big) = E_\mu\big((f_{m+1} - f_m)\, E_\mu(f_{n+1} - f_n | P_n)\big) = 0$$
for $n > m \ge 1$, because $f_{m+1} - f_m$ is then $P_n$-measurable. This makes the mixed terms disappear in
$$\sum_{n\ge1} \|f_{n+1} - f_n\|_2^2 = \Big\| \sum_{n\ge1} (f_{n+1} - f_n) \Big\|_2^2 \le \|f\|_2^2 < \infty.$$
As before, let $v^{\min}$ denote the path connecting $v_0$ and $v \in V_n$, and let $[v^{\min}]$ be the corresponding $n$-cylinder. Define for $v \in V_n$ and $w \in V_{n+1}$
$$J(v, w) = \{0 \le j < h_w(n+1) : \tau^j(w^{\min}) \in [v^{\min}]\}.$$
Then for $j \in J(v, w)$ and $x \in \tau^{j+k}([w^{\min}]) \subset \tau^k([v^{\min}])$ we have by (6.47):
$$f_{n+1}(x) = e^{2\pi i \alpha (j+k)} c_{n+1}(w) \lambda^{\rho_w(n+1)}, \qquad f_n(x) = e^{2\pi i \alpha k} c_n(v) \lambda^{\rho_v(n)},$$
for $0 \le k < h_v(n)$. Because all the sets $\tau^{j+k}([w^{\min}])$ have the same mass, it follows that
$$
\begin{aligned}
\|f_{n+1} - f_n\|_2^2 &\ge \sum_{k=0}^{h_v(n)-1} \int_{\tau^{j+k}([w^{\min}])} |f_{n+1} - f_n|^2 \, d\mu \\
&= \sum_{k=0}^{h_v(n)-1} \int_{[w^{\min}]} |e^{2\pi i \alpha j} c_{n+1}(w) \lambda^{\rho_w(n+1)} - c_n(v) \lambda^{\rho_v(n)}|^2 \, d\mu \\
&= \mu([w^{\min}])\, h_v(n)\, |e^{2\pi i \alpha j} c_{n+1}(w) \lambda^{\rho_w(n+1)} - c_n(v) \lambda^{\rho_v(n)}|^2 \\
&\ge \frac{1}{L^2}\, |e^{2\pi i \alpha j} c_{n+1}(w) \lambda^{\rho_w(n+1)} - c_n(v) \lambda^{\rho_v(n)}|^2,
\end{aligned}
$$
where the last line is by linear recurrence. Therefore
$$\sum_{n\ge1}\ \max_{v\in V_n,\, w\in V_{n+1}}\ \max_{j\in J(v,w)} |e^{2\pi i \alpha j} c_{n+1}(w) \lambda^{\rho_w(n+1)} - c_n(v) \lambda^{\rho_v(n)}|^2 < \infty. \qquad (6.51)$$
By (6.38), $0 \in J(v_n^{\min}, w)$ for every $n \in \mathbb{N}$ and $w \in V_{n+1}$, so (6.51) for $j = 0$ gives
$$\sum_{n\in\mathbb{N}}\ \max_{w\in V_{n+1}} |c_{n+1}(w) \lambda^{\rho_w(n+1)} - c_n(v_n^{\min}) \lambda^{\rho_{v_n^{\min}}(n)}|^2 < \infty.$$


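By the triangle inequality, for any $v \in V_n$, $w \in V_{n+1}$ also
$$
\begin{aligned}
|c_{n+1}(w) \lambda^{\rho_w(n+1)} - c_n(v) \lambda^{\rho_v(n)}|^2
&\le \Big( |c_{n+1}(w) \lambda^{\rho_w(n+1)} - c_n(v_n^{\min}) \lambda^{\rho_{v_n^{\min}}(n)}| \\
&\qquad + |c_n(v_n^{\min}) \lambda^{\rho_{v_n^{\min}}(n)} - c_{n-1}(v_{n-1}^{\min}) \lambda^{\rho_{v_{n-1}^{\min}}(n-1)}| \\
&\qquad + |c_{n-1}(v_{n-1}^{\min}) \lambda^{\rho_{v_{n-1}^{\min}}(n-1)} - c_n(v) \lambda^{\rho_v(n)}| \Big)^2 \\
&\le 9 \max\Big\{ |c_{n+1}(w) \lambda^{\rho_w(n+1)} - c_n(v_n^{\min}) \lambda^{\rho_{v_n^{\min}}(n)}|, \\
&\qquad |c_n(v_n^{\min}) \lambda^{\rho_{v_n^{\min}}(n)} - c_{n-1}(v_{n-1}^{\min}) \lambda^{\rho_{v_{n-1}^{\min}}(n-1)}|, \\
&\qquad |c_{n-1}(v_{n-1}^{\min}) \lambda^{\rho_{v_{n-1}^{\min}}(n-1)} - c_n(v) \lambda^{\rho_v(n)}| \Big\}^2 \qquad (6.52)
\end{aligned}
$$
is summable. By (6.50), $1 \ge \max_{v\in V_n} c_n(v) \ge \min_{v\in V_n} c_n(v) \to 1$ as $n \to \infty$. Therefore
$$
\begin{aligned}
|e^{2\pi i \alpha j} - 1|^2 &\le \Big( \Big| e^{2\pi i \alpha j} - \frac{c_n(v) \lambda^{\rho_v(n)}}{c_{n+1}(w) \lambda^{\rho_w(n+1)}} \Big| + \Big| \frac{c_n(v) \lambda^{\rho_v(n)}}{c_{n+1}(w) \lambda^{\rho_w(n+1)}} - 1 \Big| \Big)^2 \\
&\le 4 \Big( |e^{2\pi i \alpha j} c_{n+1}(w) \lambda^{\rho_w(n+1)} - c_n(v) \lambda^{\rho_v(n)}| + |c_{n+1}(w) \lambda^{\rho_w(n+1)} - c_n(v) \lambda^{\rho_v(n)}| \Big)^2.
\end{aligned}
$$
Combining this with (6.51) and (6.52), we obtain
$$\sum_{n\in\mathbb{N}}\ \max_{v\in V_n,\, w\in V_{n+1}}\ \max_{j\in J(v,w)} |e^{2\pi i \alpha j} - 1|^2 < \infty. \qquad (6.53)$$
Furthermore,
$$
\begin{aligned}
|\lambda^{h_v(n)} - 1|^2 = |e^{2\pi i \alpha h_v(n)} - 1|^2 &= |e^{2\pi i \alpha (j + h_v(n))} - e^{2\pi i \alpha j}|^2 \\
&\le \big( |e^{2\pi i \alpha (j + h_v(n))} - 1| + |e^{2\pi i \alpha j} - 1| \big)^2 \\
&\le 4 \max\big\{ |e^{2\pi i \alpha (j + h_v(n))} - 1|,\ |e^{2\pi i \alpha j} - 1| \big\}^2.
\end{aligned}
$$
For $j \in J(v, w)$, we have $j + h_v(n) \in J(v', w)$ for some $v' \in V_n$, and therefore the summability in (6.53) implies that $\max_{v\in V_n} |\lambda^{h_v(n)} - 1|^2$ is summable as well, completing this direction of the proof.

Measurable eigenvalues, “if” direction: Let $\lambda = e^{2\pi i \alpha}$ be the eigenvalue, and set $\kappa_n := \max_{v\in V_n} |||h_v(n)\alpha|||$ where we recall that $h_v(n) = \#\{n\text{-paths from } v_0 \text{ to } v \in V_n\}$.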


By assumption, $\sum_n \kappa_n^2 < \infty$. Define
$$\theta_n(x) = \sum_{\substack{e \in E_{n+1},\ e > x_{n+1} \\ t(e) = t(x_{n+1})}} h_{s(e)}(n)\, \alpha \bmod \mathbb{Z}.$$
Since $(X_{BV}, \tau)$ is linearly recurrent with recurrence constant $L$, we have $|||\theta_n(x)||| \le L\kappa_n$. Recalling $r_n(x)$ from (6.39), we have $r_n(x)\alpha \equiv \sum_{j=0}^{n-1} \theta_j(x) \bmod 1$. Define
$$g_n(x) = \Big( \sum_{j=0}^{n-1} \theta_j(x) - E_\mu(\theta_j) \Big) \bmod \mathbb{Z}. \qquad (6.54)$$
Then $g_n(x) - g_n \circ \tau(x) = \alpha$ whenever $t(x_{n+1}) = t(\tau(x)_{n+1})$. In order to show that $g := \lim_n g_n$ exists in $L^2(\mu)$, we decompose $g_{n+1} - g_n$ as
$$g_{n+1} - g_n = \underbrace{E_\mu(\theta_n - E_\mu(\theta_n) | P_n) \bmod \mathbb{Z}}_{Y_n} + \underbrace{(\theta_n - E_\mu(\theta_n | P_n)) \bmod \mathbb{Z}}_{Z_n}$$
and show that both $(Y_n)$ and $(Z_n)$ are summable in $L^2(\mu)$. For this we replace the mod $\mathbb{Z}$ by the closest distance $|||\cdot|||$ to the integers.

The $L^2(\mu)$-norm of $Z_n$ satisfies
$$\|Z_n\|_2 = \|\theta_n - E_\mu(\theta_n | P_n)\|_2 \le \|\theta_n\|_2 + \|E_\mu(\theta_n | P_n)\|_2 \le 2\|\theta_n\|_2 \le 2L\kappa_n,$$
so $(Z_n)$ is summable. Secondly, $Z_n$ is $P_m$-measurable for $m > n$ because it depends only on the first $n + 1$ edges of $x$ and
$$E_\mu(Z_m | P_m) = E_\mu(\theta_m | P_m) - E_\mu(E_\mu(\theta_m | P_m) | P_m) = 0.$$
Therefore, $E_\mu(Z_m Z_n) = E_\mu(E_\mu(Z_m Z_n | P_m)) = E_\mu(Z_n E_\mu(Z_m | P_m)) = 0$, so that the mixed terms in the following expression vanish:
$$E_\mu\Big( \Big( \sum_{m=n+1}^{\infty} Z_m \Big)^2 \Big) = \sum_{m=n+1}^{\infty} E_\mu(Z_m^2) \le C \sum_{m=n+1}^{\infty} \kappa_m^2 < \infty.$$
Therefore $\lim_n Z_n = Z$ exists in $L^2(\mu)$.


Now for Yn , let θn (e) be the value of θn at the cylinder set [e] = {x ∈ XBV , xn+1 = e} and recall that μ([v]) ≥ 1/L by linear recurrence. Then &Yn &2 = &Eμ (θn − Eμ (θn )|Pn )&2 = &Eμ (θn |Pn ) − Eμ (θn )&2 * * * * * *  



* * hv (n) μ([w]) hv (n)μ([w]) * − θn (e) * =* 1{s(xn+1 )=v} * hw (n + 1) μ([v]) hw (n + 1) *w∈Vn+1 e>xn+1 * * v∈V t(e)=t(x ) * n n+1 2 * * * * * * 

hv (n)μ([w])  *

* 1 * 1{s(xn+1 )=v} − 1 θn (e) * =* * μ([v]) *w∈Vn+1 e>xn+1 hw (n + 1) * * v∈V t(e)=t(x ) * n





n+1

hv (n)mv,w (n) μ([w])(L − 1)Lκn hw (n + 1)

2

w∈Vn+1 v∈Vn

=



μ([w])(L − 1)Lκn ≤ L2 κn .

w∈Vn+1

Since $\theta_n$ only depends on the edges in $E_{n+1}$, $Y_n$ is constant on cylinders $[w]$, $w \in V_n$. Thus w.r.t. the partition $Q_{n-1} = \{[w], w \in V_{n-1}\}$, the conditional expectation satisfies
$$Y_n(x) = E_\mu(\theta_n - E_\mu(\theta_n) | Q_{n-1}(x)) = \frac{1}{\mu([w])} \int_{[w]} \theta_n - E_\mu(\theta_n) \, d\mu =: q(w)$$
for $x \in [w]$. It follows that for $1 \le k \le n$, $v \in V_{n-k}$, and $x \in [v]$, the conditional expectation
$$
\begin{aligned}
E_\mu(Y_n | Q_{n-k})(x) &= \frac{1}{\mu([v])} \int_{[v]} Y_n \, d\mu
= \frac{1}{\mu([v])} \sum_{w\in V_n} \mu([v] \cap [w])\, q(w) \\
&= \sum_{w\in V_n} \Big( \frac{\mu([v] \cap [w])}{\mu([v])} - \mu([w]) \Big) q(w),
\end{aligned}
$$
where in the last line we used $\int_{X_{BV}} Y_n \, d\mu = \sum_{w\in V_n} \mu([w])\, q(w) = 0$. Since $(X_{BV}, \tau)$ is linearly recurrent and $|q(w)| \le L\kappa_n$, Lemma 6.36 gives us a $C > 0$ and $\beta \in (0, 1)$ such that $\big| \frac{\mu([vw])}{\mu([v])} - \mu([w]) \big| \le C\beta^k$. It follows that $E_\mu(Y_n | Q_{n-k}) \le C\beta^k L\kappa_n$.


Therefore * *2 ⎛⎛ ⎞2 ⎞ * n * n



*

* * ⎝⎝ Yj * Yj ⎠ ⎠ = Eμ (Yj Yk ) * * = Eμ *j=m+1 * j+m+1 m p from Ogden’s Lemma 7.13, and take   = Bm Bm Bm Bm−1 Bm−1 ∈ L(Xfeig ) z = Bm Bm Bm Bm

 

(7.5)

marked

2m−1

positions. Let z = rstuv be the decomof which we mark the last position promised by Ogden’s Lemma, and by (7.3), t has to intersect the  in (7.5). Therefore, if u = , then it is contained in the final block Bm−1  final block Bm−1 and hence |u| < 2m−1 . Thus item (2) above implies that / L(Xfeig ). Finally, if u = , then r has to intersect the final block rs5 tx5 v ∈   in (7.5), and s is contained in Bm−1 . Thus we can repeat the above Bm−1 argument with s instead of u.  Exercise 7.15. We compute the number of n-paths n starting in the leftmost node of Figure 7.2. (1) Let an and bn be the number of n-paths from the second and third node of Figure 7.2. By convention we set 0 = a0 = b0 = 1. Show that n = n−1 + an−1 and an = an−1 + bn−1 . (2) Show that b2n = b2n+1 . (3) Use the self-similarity of the graph of Figure 7.2 to conclude that b2n = an . Hence a0 = 1, a2 = 2, and an = an−1 + an/2 . (4) Conclude that the lap-number (f n |[0,1] ) of the Feigenbaum equals n ; it grows superpolynomially but subexponentially2 . From the shape of its production rules, it is clear that the language of Example 7.8 is context-free. No finite automaton can keep track of the precise number of 0’s before starting on the 1’s, but there is a simple memory device that can. Imagine that for every 0, we put a card on a stack, until 2 That

is, lim sup

1 n

log (f n ) = 0 but lim inf n

log (f n ) log n

= ∞.


7. Automata and Linguistic Complexity

we reach the first 1. At every 1 we remove a card again. If at the end of the word no cards are left on the stack, the word is accepted. This device is simple in construction: we can only add or remove at the top of the stack; what is further down cannot be read until all the cards above it are removed. On the other hand, the stack is unbounded, so it requires unbounded memory.

Formally, the (push-down) stack has its (finite) stack alphabet $C$ (think of cards of different color) which is different from $A$ and an “empty stack” symbol $\epsilon$. The transition function needs to include instructions for the stack:
$$\delta : Q \times A \times (C \cup \{\epsilon\}) \to Q \times (C \cup \{\epsilon, r\})$$
where $c \in C$ refers to adding a card of color $c$ to the stack, $r$ refers to removing the top card from the stack, and $\epsilon$ refers to leaving the stack unchanged. When the input is read entirely, its acceptance depends on the status of both the stack and state finally reached. The resulting automaton with stack is called a push-down automaton.

Theorem 7.16. A language is (not more complicated than) context-free if and only if it is recognized by a push-down automaton.

See [319, Section 5.3] for the proof. Using this theorem, it becomes clear that Dyck shifts from Example 7.7 are context-free languages (and non-regular if there are at least two types of brackets). We use a different color card for each type of bracket, add a card of the correct color for every opening bracket, and remove it for the corresponding closing bracket. If the correct color is not at the top of the stack, then there are linked sets of brackets.

7.2.3. Context-Sensitive Grammars. We call a grammar $(V, T, P, S)$ context-sensitive if its set $P$ of productions is finite and each of them has the form $\alpha \to \beta$, where $\alpha, \beta \in (V \cup T)^*$ and $|\beta| \ge |\alpha|$. The terminals themselves cannot change, but they can swap position with a variable. For example $aA \to Aa$ and $a_1 a_2 A \to B a_1 a_2$ are valid production rules in a context-sensitive grammar.

Remark 7.17. The word context-sensitive comes from a particular normal form of the productions, in which each of them has the form $\alpha_1 A \alpha_2 \to \alpha_1 B \alpha_2$, where $B \in (V \cup T)^*$ is a non-empty finite string of variables and terminals and $\alpha_1, \alpha_2 \in (V \cup T)^*$ are contexts in which the production rule can be applied. Only if $A$ is preceded by $\alpha_1$ and followed by $\alpha_2$, the production rule can be applied, leaving the context $\alpha_1, \alpha_2$ unchanged.

Example 7.18. In contrast to Example 7.8, consider the language $L = \{0^n 1^n 2^n : n \ge 1\}$. Pumping Lemma 7.11 can be applied to show that $L$ is

7.2. The Chomsky Hierarchy


not context-free. However $L$ is context-sensitive. For example, we can use the productions
$$S \to 012, \quad S \to 00A12, \quad A1 \to 1A, \quad 1A2 \to 11B22, \quad 1B \to B1, \quad 0B1 \to 00A1, \quad 1A2 \to 1122.$$
In practice, $A$ is a cursor moving to the right, doubling 12 when it hits the first 2. The procedure can stop here (by using the last production rule) or produce cursor $B$ that moves to the left, doubling 0 when it hits the first 0. Note that at any stage, there is at most one variable: $A$ can change into $B$ and $B$ into $A$. Only the last production rule can remove this one variable and stop the procedure altogether.

Example 7.19. The following set of productions produces the language $L = \{1^{2^n} : n \ge 1\}$, that is, strings of 1’s of length equal to a power of 2:

S → AC1B,    1D → D1,
C1 → 11C,    AD → AC,
CB → DB,     1E → E1,
CB → E,      AE → ε.

Here $A$ and $B$ are begin-marker and end-marker, and $C$ is a moving marker, doubling the number of 1’s when it moves to the right. When it reaches the end-marker $B$, then the following happens:
• It changes to a moving marker $D$, which just moves to the left until it hits begin-marker $A$, and changes itself into $C$ again. In this loop, the number of 1’s is doubled again.
• Or it merges with the end-marker $B$ to a new marker $E$. This marker $E$ moves left until it hits begin-marker $A$. It then merges with $A$ into the empty word: end of algorithm.

This language is context-sensitive, although the production rules $CB \to E$ and $AE \to \epsilon$ are strictly speaking not of the required form. The trick around it is to glue a terminal 1 to (pairs of) variables in a clever way and then call these glued strings the new variables of the grammar; see [319, page 224].

Proposition 7.20. Let $\chi : A \to A^*$ be a substitution on a finite alphabet $A = \{a_1, \dots, a_N\}$ with $\chi(a_1) = a_1 \cdots a_r$ and fixed point $\rho = \lim_n \chi^n(a_1)$. Then the corresponding substitution shift language $L(X_\rho)$ is context-sensitive.


Proof. The language $L(X_\rho)$ consists of all the finite subwords of $\rho$. We present production rules that generate $L(X_\rho)$ using the terminals $T = A$ and variables $\{A_i : i = 1, \dots, N\} \cup \{B, E, P, Q, R, S\}$. The production rules, with initial variable $B$, are$^{3}$

B → S R A_1 E        the starting rule;
R A_i → χ(A_i) R     χ acts on A_i in the same way as on a_i ∈ A;
R E → Q E            introduces a cursor to repeat the substitution;
A_i Q → Q A_i        the cursor walks backwards;
S Q → S R            initiates the next round of substitutions;
S Q → S P            initial step to replace variables with terminals;
P A_i → a_i P        replacing variables by terminals;
a_i P E → P E        to remove suffixes;
P E → ε              final step to remove P;
S a_i → S            acts as the left-shift σ to remove prefixes;
S → ε                final step to remove S.

It is straightforward to check that the first five rules mimic the substitution. The seventh and eighth rule take a subword of the result of the first five rules, and the last three rules remove the auxiliary variables $P$ and $S$. □

Proposition 7.21. Let $\alpha \in (0, 1)$ be a quadratic irrational number. Then the Sturmian subshift space $X_\alpha$ with frequency $\alpha$ has a context-sensitive language.

Proof. By Lagrange’s Theorem 8.45, the continued fraction expansion $\alpha = [0; a_1, \dots, a_m, \overline{a_{m+1}, \dots, a_{m+n}}]$ of every quadratic irrational is (pre)periodic. Recall from Section 4.3.5 that the S-adic transformation $(\chi_i)_{i\ge1}$ with $\chi_i = \chi_{0/1}^{a_i}$ (where $\chi_0$ and $\chi_1$ are defined in (4.31)) produces a Sturmian sequence with the required frequency. But we can rewrite this as $\chi_{pre} = \chi_{0/1}^{a_1} \circ \cdots \circ \chi_{0/1}^{a_m}$ (which is used once) and $\chi_{period} = \chi_{0/1}^{a_{m+1}} \circ \cdots \circ \chi_{0/1}^{a_{m+n}}$ (which is used repeatedly). Thus we can find a set of production rules in the same way as in Proposition 7.20. □
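The language $L(X_\rho)$ of Proposition 7.20 can be explored directly by iterating the substitution and collecting subwords. The sketch below is an illustration only: the Fibonacci substitution $a \to ab$, $b \to a$ is an assumed example (it satisfies the requirement that $\chi(a_1)$ starts with $a_1$), and the helper names `iterate_substitution` and `subwords` are hypothetical. It does not implement the production rules of the proof.

```python
def iterate_substitution(chi, start, n):
    """Return chi^n(start), where chi maps each letter to a word."""
    word = start
    for _ in range(n):
        word = "".join(chi[letter] for letter in word)
    return word

def subwords(word, k):
    """Set of all subwords (factors) of length k occurring in word."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

if __name__ == "__main__":
    # Assumed example: the Fibonacci substitution, whose fixed point is Sturmian.
    chi = {"a": "ab", "b": "a"}
    prefix = iterate_substitution(chi, "a", 12)  # a long prefix of the fixed point
    for k in range(1, 6):
        print(k, sorted(subwords(prefix, k)))
```

For this particular substitution the number of subwords of length $k$ found in a long prefix is $k + 1$, in line with the word-complexity of Sturmian sequences.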

A cardinality argument shows that not all Sturmian shifts or S-adic shifts (or β-shifts or unimodal shifts) can have context-sensitive grammars. Indeed, there are only countably many collections of production rules and uncountably many Sturmian shifts, etc. However, due to the quadratically irrational frequency, Xα involves an eventually periodic S-adic system, and then there are only countably many such systems. In [554] it is shown that the language of every unimodal map with unbounded kneading map 3 Note that the last three productions actually shorten strings. To avoid this, we can replace them by “dummy” variables: P E → DD, Sai → SD, and S → D, and finally put all dummies at the end by an extra production DB → BD for each B ∈ T ∪ V .


cannot be context-free. This adds examples to the conjecture that context-free languages coming from unimodal maps actually have to be regular (and thus be derived from a unimodal map with preperiodic critical point; see [564]).

Proposition 7.22. Let $\nu_{feig} = 1011\ 1010\ 10111011 \cdots$ be the Feigenbaum kneading sequence. The associated unimodal shift space $X_{feig}$ is context-sensitive.

Complete proofs of this (regarding the itineraries of all $x \in [0, 1]$, not just those contained in the attractor) were given in [153] and [565, Section 6.2]. We give an explicit set of production rules. In [554] it is also shown that two types of languages derived from Fibonacci unimodal maps are context-sensitive as well.

Proof. Recall the structure of $L(X_{feig})$ from Corollary 7.14. The production rules will be in groups, doing specific tasks.

(1) The initial word is $B C_r 1 H E$, where $B$ and $E$ are begin- and end-markers that will also be used to eventually remove symbols from the left and right.

(2) Produce a length $2^n$ prefix of $\nu_{feig}$ (for $n \ge 1$ arbitrary) with markers $H$ at positions $2^k$, $0 \le k \le n$. The markers $C_r$, $C_l$ are cursors running right and left, respectively.

BCr 1H Cr 0 Cr H BCl BCl

→ → → → →

B1H0Cr H, 11Cr , HCr , BCr , BHCs ,

XCr 1 Cr 0 Cr E XCl BCl

→ → → → →

X10Cr , X = B, 11r C, Cl E, Cl X, X = B, B  ,  BHCr .

(3) Produce an n-fold concatenation of ν2k−1 +1 · · · ν2k −1 ν2 k before the appearance of ν2k−1 +1 · · · ν2k −1 ν2k . Here markers S play the role of cards in the stack, and the cursor Cs puts them there. The cursors C0 and C1 are used to copy a 0 and 1 from the beginning of a subword to its end. Cursors Cl and Cr are again cursors going to the left and right, respectively. Cs a XCl HCl SaS HCl XCl Cr H

→ → → → → →

SaCs , Cs H Cl X, X = H, HCl Sa aCa S, HCl SaX HCs a, HCl a  Cl X, X =

H, Cr X HCs , Cr H

→ → → → → →

Cl H, aHCa , a ∈ {0, 1}, a Ca Xa = 1 − a, X = S, Cr , XCr if X = H, HCr .


(4) Remove the markers $H$.

C_r E → C_h,     B C_h → B′,
H C_h → C_h,     X C_h → C_h X,  X ≠ H.

(5) Remove symbols from the left and the right, and remove final variables.

B′ a → B′  for every a ∈ {0, 1},     B′ → ε,
a E → E    for every a ∈ {0, 1},     E → ε.

This concludes the set of production rules. □
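Production systems like the ones in this subsection can be explored mechanically. As a hedged illustration, the sketch below does this for the much simpler grammar of Example 7.18 rather than for the Feigenbaum grammar: it runs a breadth-first search over sentential forms and collects the terminal words up to a given length. Because none of these rules shortens a string, forms longer than the bound can be discarded safely. The names `RULES` and `generated_words` are hypothetical.

```python
from collections import deque

# Productions of Example 7.18 as (left-hand side, right-hand side) pairs.
RULES = [
    ("S", "012"), ("S", "00A12"), ("A1", "1A"), ("1A2", "11B22"),
    ("1B", "B1"), ("0B1", "00A1"), ("1A2", "1122"),
]

def generated_words(max_len, rules=RULES, start="S", terminals="012"):
    """Terminal words of length <= max_len derivable from start.

    Assumes every rule is non-contracting (|rhs| >= |lhs|), so pruning
    sentential forms longer than max_len loses no word.
    """
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if all(symbol in terminals for symbol in form):
            words.add(form)
            continue
        for lhs, rhs in rules:
            # apply the rule at every position where lhs occurs
            i = form.find(lhs)
            while i != -1:
                new_form = form[:i] + rhs + form[i + len(lhs):]
                if len(new_form) <= max_len and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
                i = form.find(lhs, i + 1)
    return words

if __name__ == "__main__":
    print(sorted(generated_words(12), key=len))
```

Up to the chosen length bound, the search should return only words of the form $0^n 1^n 2^n$. The same search would need a different pruning rule for grammars with shortening productions, such as the last rules of the proof above.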

In effect, a Turing machine is a finite automaton with a memory device in the form of an input tape that can be read, erased, and written on, in little steps of one symbol at the time, but otherwise without restrictions on the tape. If we restrict the tape to the length of the initial input, then we have a linearly bounded non-deterministic Turing machine or linearly bounded automaton. To avoid going beyond the initial input, we assume that the input is preceded by a begin-marker, that cannot be erased and to the left of which the reading/writing device cannot go. Similarly, the input is followed by an end-marker, that cannot be erased and to the right of which the reading/writing device cannot go. The next characterization is proved in [319, Theorem 9.5].

Theorem 7.23. A language is context-sensitive if and only if it is recognized by a linearly bounded non-deterministic Turing machine.

With this restriction, a Turing machine can still be very powerful. It can compute prime numbers in the sense that $\{1^p\}_{p \text{ prime}}$ is a context-sensitive language (but not context-free); see [486]. Theorem 7.23 suggests that to find languages which are not context-sensitive, one needs to search for problems that take a lot of memory to solve. The class EXPSPACE is the class of problems whose solution requires memory space of order $2^{p(n)}$ (but not less) for some polynomial $p(n)$ of the input length $n$. The known examples of such problems are complicated, and even more so to state in the form of a language, so we will not try to give an example.

7.2.4. Recursively Enumerable Grammars. A grammar is called recursively enumerable if there is no restriction anymore on the type of production rules. For this largest class in the Chomsky hierarchy, there is no restriction on the Turing machine anymore either, not even halting. That is, for every word in the language, the Turing machine needs to halt and accept the word, but for words not in the language, the Turing machine need not halt. If the Turing machine halts on every input (and hence can decide for every word if it belongs to the language or not), then the corresponding grammar is called
If the Turing machine halts on every input (and hence can decide for every word if it belongs to the language or not), then the corresponding grammar is called


recursive (without enumerable); recursively enumerable grammars form a strictly larger class than recursive grammars.

Theorem 7.24. A language is recursively enumerable if and only if it is recognized by a Turing machine.

A language that is not recursively enumerable can be shown to exist by a cardinality argument and also by a version of the diagonal argument. In short, let $\{M_i\}_{i \in \mathbb{N}}$ be an enumeration of all Turing machines, and let $w_i \in \{0,1\}^*$ be a word rejected by $M_i$ (or on which $M_i$ doesn’t halt). Then there is no Turing machine recognizing the language $L = \{w_i\}_{i \in \mathbb{N}}$.

Seen in terms of push-down automata, a Turing machine is equivalent to a push-down automaton with two stacks. Namely, we first read and store the input of the tape into the first stack; then it serves as the input plus the infinite left end of the tape. The second stack, empty at the beginning, serves as the infinite right end of the tape. Then all computations that otherwise would have been done on the tape can now be done with these two stacks.

In summary of the Chomsky hierarchy, we have Table 7.1:

Table 7.1. Summary of the Chomsky hierarchy.

Type                     Automaton                                            Productions
regular (sofic shift)    finite automaton                                     $A \to w$, $A \to wB$ (right-linear) or $A \to w$, $A \to Bw$ (left-linear)
context-free             push-down automaton                                  $A \to \gamma$, $\gamma \in (V \cup T)^*$
context-sensitive        linearly bounded non-deterministic Turing machine    $\alpha \to \beta$, $\alpha, \beta \in (V \cup T)^*$, $|\beta| \geq |\alpha|$ (or $\alpha A \beta \to \alpha\gamma\beta$, $\emptyset \neq \gamma \in (V \cup T)^*$)
recursively enumerable   Turing machine = two-stack push-down automaton       $\alpha \to \beta$ (no restrictions)
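The two-stack description of a Turing machine tape given before Table 7.1 can be made concrete with a small sketch. It is only an illustration under assumptions that are not in the text: the class name `TwoStackTape`, the blank symbol `_`, and the convention that the right stack also holds the cell under the head are all hypothetical choices.

```python
class TwoStackTape:
    """Turing machine tape stored in two stacks (Python lists used as stacks).

    `left` holds the cells strictly to the left of the head (top = cell next to the head).
    `right` holds the cell under the head and everything to its right (top = head cell).
    """

    BLANK = "_"  # hypothetical blank symbol, not taken from the text

    def __init__(self, word):
        self.left = []
        self.right = list(reversed(word))

    def read(self):
        # An empty right stack means the head sits on a blank cell.
        return self.right[-1] if self.right else self.BLANK

    def write(self, symbol):
        if self.right:
            self.right[-1] = symbol
        else:
            self.right.append(symbol)

    def move_right(self):
        # Pop the head cell from the right stack and push it onto the left stack.
        self.left.append(self.right.pop() if self.right else self.BLANK)

    def move_left(self):
        # Pop from the left stack and push onto the right stack.
        self.right.append(self.left.pop() if self.left else self.BLANK)

    def contents(self):
        return "".join(self.left) + "".join(reversed(self.right))


tape = TwoStackTape("abc")
tape.write("x")        # overwrite the first cell
tape.move_right()
tape.move_right()
tape.move_right()      # head is now one cell beyond the input
tape.write("d")
tape.move_left()
```

Every tape operation becomes a constant number of pushes and pops, which is the reason two stacks suffice while a single stack (a push-down automaton) does not.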

7.3. Automatic Sequences and Cobham’s Theorems

A deterministic finite automaton with output (DFAO) is a sextuple

(7.6)  $M_{out} = (Q, A, q_0, \delta, \tau, B),$

where the first four components are the same as in a finite automaton in (7.1) and $\tau : Q \to B$ is an output function that gives a symbol associated to each state in $Q$. For a word $w \in A^*$, we extend the transition function $\delta$ to


words $w = w_0 w_1 \dots w_{k-1}$, so that $\delta(q_0, w)$ is the state the DFAO is in when the last letter of $w$ is read (or when the automaton halts). Clearly

(7.7)  $\delta(q, w) = \delta(\delta(q, w_0 \dots w_{k-2}), w_{k-1}) = \delta(\delta(q, w_0), w_1 \dots w_{k-1}).$

Similarly, $\tau(q_0, w)$ denotes the symbol that is read off when the last letter of $w$ is read (or when the automaton halts), i.e. $\tau(q_0, w) = \tau(\delta(q_0, w))$. The following central notion was originally due to Büchi [130]; see also the monograph by Allouche & Shallit [20].

Definition 7.25. Fix a base $N \geq 2$. Let $A = \{0, 1, \dots, N-1\}$, and for the integer $n \geq 0$, let $w(n) \in A^*$ be the representation of $n$ in base $N$; i.e.
$$n = [w(n)]_N := \sum_{i=1}^{|w(n)|} w_i(n) N^{|w(n)|-i} \quad \text{and} \quad w_1(n) \neq 0.$$
A sequence $x \in B^{\mathbb{N}}$ is $N$-automatic if $x_n = \tau(q_0, w(n))$ for all $n \in \mathbb{N}$, and an automaton that generates $x$ is called an $N$-automaton. A sequence $x \in B^{\mathbb{N}}$ is called automatic if it is $N$-automatic for some $N \in \mathbb{N}$.

Example 7.26. The automaton in Figure 7.3 assigns the symbol 1 to every word $w \in L(X_{even})$ of the one-sided even shift (Example 1.4) and the symbol 0 to words $w \in \{0,1\}^* \setminus L(X_{even})$. The output function $\tau : Q \to \{0, 1\}$ is indicated by the second symbol at each state (i.e. the numbers in the circles).

[Figure 7.3: diagram of a DFAO with states $q_0/1$ (start), $q_1/0$ and $q_2/0$ and transitions labeled 0 and 1.]

Figure 7.3. The DFAO for the even shift. The label $q_i/t$ stands for $q_i, \tau(q_i)$.

As such, the sequence $x \in \{0,1\}^{\mathbb{N}}$ defined as
$$x_n = \begin{cases} 1, & n \text{ contains only even blocks of 1 in its binary expansion,} \\ 0, & \text{otherwise} \end{cases}$$
is a 2-automatic sequence.

[Figure 7.4: diagram of a DFAO with states $q_0/0$ (start) and $q_1/1$ and transitions labeled 0 and 1.]

Figure 7.4. The DFAO for the Thue-Morse sequence.
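A DFAO as in (7.6) is straightforward to simulate, which may help to see how Definition 7.25 produces a sequence from base-$N$ expansions. The sketch below is a minimal illustration: the transition tables are assumptions written out by hand (they cannot be read off from the text alone), chosen so that the first automaton outputs the parity of the number of 1s in the binary expansion (the Thue-Morse automaton) and the second outputs 1 exactly when all blocks of 1s in the binary expansion have even length.

```python
def base_digits(n, base):
    """Base-`base` digits of n, most significant first; n = 0 gives the empty word."""
    digits = []
    while n > 0:
        n, d = divmod(n, base)
        digits.append(d)
    return digits[::-1]


def run_dfao(delta, tau, q0, word):
    """Feed `word` to the DFAO and return the output tau(delta(q0, word))."""
    q = q0
    for a in word:
        q = delta[(q, a)]
    return tau[q]


def automatic_sequence(delta, tau, q0, base, length):
    """First `length` terms x_n = tau(q0, w(n)) of the automatic sequence."""
    return [run_dfao(delta, tau, q0, base_digits(n, base)) for n in range(length)]


# Assumed table for the Thue-Morse DFAO: reading 1 flips the state, reading 0 keeps it.
TM_DELTA = {("q0", 0): "q0", ("q0", 1): "q1", ("q1", 0): "q1", ("q1", 1): "q0"}
TM_TAU = {"q0": 0, "q1": 1}

# Assumed table for "only even blocks of 1": q1 = current block of 1s is odd so far,
# q2 = absorbing state reached once an odd block has been closed by a 0.
EVEN_DELTA = {
    ("q0", 0): "q0", ("q0", 1): "q1",
    ("q1", 0): "q2", ("q1", 1): "q0",
    ("q2", 0): "q2", ("q2", 1): "q2",
}
EVEN_TAU = {"q0": 1, "q1": 0, "q2": 0}

thue_morse = automatic_sequence(TM_DELTA, TM_TAU, "q0", 2, 16)
even_blocks = automatic_sequence(EVEN_DELTA, EVEN_TAU, "q0", 2, 8)
```

The automaton itself never sees the integer $n$, only its digits, which is why the same two functions serve for any base and any output alphabet.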


The Thue-Morse sequence $\rho_{TM} = 0110\ 1001\ 10010110\ 10 \cdots$ is used as an example of a 2-automatic sequence, because of its characterization as $x_n = \#\{\text{1’s in the binary expansion of } n\} \bmod 2$; see Example 1.6 and [20, Section 5.1]. Figure 7.4 gives its 2-automaton.

An exact characterization of which sequences are automatic was given by Cobham in 1972. This is sometimes called Cobham’s Little Theorem; see [61, 162] and [20, Theorem 6.3.2].

Theorem 7.27. A sequence $x$ is automatic if and only if $x = \psi(\rho)$ for some letter-to-letter substitution $\psi$ and some fixed point $\rho$ of a constant length substitution $\chi$.

Proof. The proof relies on a way to rewrite the DFAO as a pair of substitutions $\chi$ and $\psi$, and vice versa. The state space $Q$ is the alphabet of both substitutions, the base $N$ of the $N$-automaton is the length of the substitution words $\chi(q)$ and also the cardinality of the input alphabet $A = \{0, \dots, N-1\}$, and the output function $\tau : Q \to B$ is the letter-to-letter substitution $\psi : Q \to B$.

First assume that $\chi$ and $\psi$ are given so that $\chi(q_0)$ starts with $q_0$. Since $\chi$ has the fixed point $\rho$, the letter $q_0$ is the zeroth letter of $\rho = \rho_0 \rho_1 \rho_2 \dots$, and we take it as the initial state of the DFAO. Now define the transition function $\delta : Q \times A \to Q$ by letting $\delta(q, a)$ be the $a$-th letter of $\chi(q)$ (counting from zero). Then for $n \in \{0, \dots, N-1\}$, the representation of $n$ in base $N$ is simply $w(n) = n$. If this is the input word, then $q = \delta(q_0, w(n))$ is the $n$-th letter of $\chi(q_0)$, which is the $n$-th letter of $\rho$. We continue by induction, with induction hypothesis $\delta(q_0, w(n)) = \rho_n$. We verified this for $0 \leq n < N$ and assume that it holds for all $m < n$. Write $n = n'N + n''$ with $0 \leq n'' < N$, and let $\ell = |w(n)|$. Then
$$\begin{aligned}
\delta(q_0, w(n)) &= \delta(q_0, w(n)_1 \cdots w(n)_\ell) = \delta(\delta(q_0, w(n)_1 \cdots w(n)_{\ell-1}), w(n)_\ell) && \text{(by (7.7))} \\
&= \delta(\delta(q_0, w(n')), w(n)_\ell) = \delta(\rho_{n'}, n'') && \text{(by induction)} \\
&= \text{the } n''\text{-th letter of } \chi(\rho_{n'}) = \rho_{Nn'+n''} = \rho_n.
\end{aligned}$$
This completes the induction. It follows that $\tau(q_0, w(n)) = \tau(\rho_n) = \psi(\rho_n) = x_n$, as required for an automatic sequence.

Now for the converse, we are given the DFAO, and we can assume that $\delta(q_0, 0) = q_0$ for the initial state $q_0$, because this simply deals with the insignificant digits 0 in a base $N$ representation of $n$. Set $\psi = \tau : Q \to B$ and define


$\chi : Q \to Q^N$ by $\chi(q) = \delta(q, 0)\delta(q, 1) \dots \delta(q, N-1)$. Then $\chi(q_0)$ starts with $q_0$, and $\chi$ has a fixed point $\rho$ starting with the letter $q_0$. For $n \in \{0, \dots, N-1\}$ and its representation $w(n) = n$ in base $N$, we find that $\delta(q_0, w(n))$ is the $n$-th letter of $\chi(q_0)$, which is the $n$-th letter of $\rho$. The induction hypothesis is again $\delta(q_0, w(n)) = \rho_n$. We verified this for $0 \leq n < N$ and assume that it holds for $m < n$. Write $n = n'N + n''$ with $0 \leq n'' < N$, and let $\ell = |w(n)|$. Then
$$\begin{aligned}
\delta(q_0, w(n)) &= \delta(q_0, w(n)_1 \dots w(n)_\ell) = \delta(\delta(q_0, w(n)_1 \dots w(n)_{\ell-1}), w(n)_\ell) && \text{(by (7.7))} \\
&= \delta(\rho_{n'}, n'') && \text{(by induction)} \\
&= \text{the } n''\text{-th letter of } \chi(\rho_{n'}) = \rho_{Nn'+n''} = \rho_n.
\end{aligned}$$
This completes the induction. Again $\tau(q_0, w(n)) = \tau(\rho_n) = \psi(\rho_n) = x_n$. □
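The dictionary between DFAOs and pairs of substitutions used in the proof of Theorem 7.27 can be checked on a small case. The sketch below is illustrative only: the function names are hypothetical, and the example uses the Thue-Morse substitution $a \to ab$, $b \to ba$ with the letter-to-letter map $a \mapsto 0$, $b \mapsto 1$ as an assumed test case.

```python
def substitution_to_delta(chi):
    """Transition function with delta(q, a) = a-th letter of chi(q), counting from 0."""
    return {(q, a): word[a] for q, word in chi.items() for a in range(len(word))}


def delta_to_substitution(delta, states, base):
    """Constant length substitution chi(q) = delta(q,0) delta(q,1) ... delta(q,base-1)."""
    return {q: tuple(delta[(q, a)] for a in range(base)) for q in states}


def fixed_point_prefix(chi, q0, length):
    """Prefix of the fixed point of chi starting with q0 (chi(q0) must start with q0)."""
    word = [q0]
    while len(word) < length:
        word = [letter for q in word for letter in chi[q]]
    return word[:length]


def state_reached(delta, q0, n, base):
    """State delta(q0, w(n)), where w(n) is the base-`base` expansion of n."""
    digits = []
    while n > 0:
        n, d = divmod(n, base)
        digits.append(d)
    q = q0
    for a in reversed(digits):  # most significant digit first
        q = delta[(q, a)]
    return q


chi = {"a": ("a", "b"), "b": ("b", "a")}   # assumed example: Thue-Morse substitution
psi = {"a": 0, "b": 1}                     # letter-to-letter output map
delta = substitution_to_delta(chi)

rho = fixed_point_prefix(chi, "a", 32)
x = [psi[state_reached(delta, "a", n, 2)] for n in range(32)]
```

Here `state_reached(delta, "a", n, 2)` agrees with `rho[n]` for every `n`, which is exactly the induction hypothesis $\delta(q_0, w(n)) = \rho_n$ of the proof.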



All sequences that are eventually periodic are $N$-automatic for every base $N \geq 2$. In particular, every indicator sequence $x = 1_E$ of a finite set $E$ is automatic for every $N$. However, as soon as $\sup E > N^{\#Q}$ for some $N$-automaton $M_{out}$ with set of states $Q$, there must be a loop. That is, for some $m = [w]_N \in E$, the automaton $M_{out}$ reading $w$ must reach the same state $q \in Q$ twice: there must be a loop from $q$ to $q$. The (proof of the) Pumping Lemma 7.9 gives that $M_{out}$ must accept the words that take this loop an arbitrary number of times. These loops explain the existence of geometric progressions in automatic sequences, such as the indicator sequence $x = 1_{\{2^k : k \geq 0\}}$ of the powers of 2; its 2-automaton is shown in Figure 7.5. It also shows that $1_{\{n! : n \in \mathbb{N}\}}$, or the indicator sequence of any superexponentially increasing sequence, cannot be automatic.

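[Figure 7.5, left: diagram of a 2-automaton with states $q_0/0$ (start), $q_1/1$ and $q_2/0$ and transitions labeled 0 and 1. Right: the table below.]

k    $3^k$ in base 2
0    1
1    11
2    1001
3    11011
4    1010001
5    11110011
6    1011011001
7    100010001011
8    1100110100001

Figure 7.5. A 2-automaton recognizing the powers of 2, and the lack of structure of $3^k$ expressed in base 2.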
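Both halves of Figure 7.5 are easy to reproduce. The following sketch assumes a transition table for the three-state automaton (leading zeros loop at $q_0$, the first 1 leads to $q_1$, further zeros stay in $q_1$, and a second 1 leads to the absorbing state $q_2$); this table is a reconstruction for illustration, not copied from the figure.

```python
# Assumed transitions of a 2-automaton for the indicator sequence of {2^k : k >= 0}.
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q2", ("q2", "1"): "q2",
}
TAU = {"q0": 0, "q1": 1, "q2": 0}


def power_of_two_indicator(n):
    """Output of the automaton on the binary expansion of n (1 iff n is a power of 2)."""
    q = "q0"
    for bit in format(n, "b"):
        q = DELTA[(q, bit)]
    return TAU[q]


# Binary expansions of 3^k for k = 0, ..., 8, as in the table of Figure 7.5.
powers_of_three_base_2 = [format(3 ** k, "b") for k in range(9)]
```

The accepted words are exactly those of the form $0^* 1 0^*$, a single loop pumped arbitrarily often, whereas the binary expansions of $3^k$ show no such repeating block.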


We call $q \in Q$ a rejecting state if $\tau(q) = 0$ and an accepting state otherwise. From any DFAO, we can remove halting states $q$ by removing their halting status and instead adding labeled arrows $q \xrightarrow{a} q$ for each $a \in A$. These are the dashed arrows in Figure 7.5. If $M_{out}$ has no halting states and from every $q \in Q$ there is some path to an accepting state $q'$, then for every $u \in A^*$, we can find a suffix $v \in A^*$ such that $uv$ is accepted by $M_{out}$. A language with this property is called right dense. This is a substantial extra requirement; the 2-automaton from Figure 7.5, also when the halting states are removed, doesn’t satisfy it. It has the following consequence on the density of automatic sequences of right dense languages; see [20, Theorem 11.1.2]:

Proposition 7.28. If $x$ is $N$-automatic and for every $u \in A^*$ there is $v$ such that $x_n \neq 0$ for $n = [uv]_N$, then $\{n \in \mathbb{N} : x_n \neq 0\}$ is syndetic.

This is not surprising in view of Theorem 7.27 and the fact that fixed points of substitution shifts are syndetic.

For example, the indicator sequence $x = 1_{\{3^k : k \geq 0\}}$ of the powers of 3 is trivially 3-automatic, but the powers of 3 written in base 2 betray no pattern; see Figure 7.5. The natural question that inspired Büchi’s paper [130] is whether $x$ is 2-automatic. The answer relies on an elegant application of the Pumping Lemma 7.9.

Proposition 7.29. If $N$ and $\tilde N$ are multiplicatively independent, i.e. $\frac{\log \tilde N}{\log N} \notin \mathbb{Q}$, then the indicator sequence $x = 1_{\{\tilde N^k : k \geq 0\}}$ of the powers of $\tilde N$ is not $N$-automatic.

Before giving the proof, we need a simple result from number theory.

Lemma 7.30. If $N, \tilde N \in \mathbb{N}$ are multiplicatively independent, then for every $\varepsilon > 0$ there are $r, \tilde r \in \mathbb{N}$ such that $|\tilde N^{\tilde r} - N^r| < \varepsilon N^r$.

Proof. For every $m \in \mathbb{N}$, there is a unique $d_m \in \mathbb{N}$ such that $1 \leq \tilde N^m N^{-d_m} < N$. Let $\varepsilon > 0$ be given and divide $[0, N]$ into intervals of length $\leq \varepsilon$. The pigeonhole principle gives $m < m' \in \mathbb{N}$ such that $|\tilde N^{m'} N^{-d_{m'}} - \tilde N^m N^{-d_m}| < \varepsilon$. Multiplying with $N^{d_{m'}} \tilde N^{-m}$, we get
$$|\tilde N^{m'-m} - N^{d_{m'}-d_m}| < \varepsilon N^{d_{m'}} \tilde N^{-m} \leq \varepsilon N^{d_{m'}-d_m}.$$
Hence the lemma holds for $r = d_{m'} - d_m$ and $\tilde r = m' - m$. □
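Lemma 7.30 is an existence statement, but for small bases the exponents can simply be searched for. The sketch below uses a brute-force search with exact integer and rational arithmetic rather than the pigeonhole argument of the proof; the function name `close_powers` and the search order (smallest $r$ first) are illustrative assumptions.

```python
from fractions import Fraction


def close_powers(N, N_tilde, eps):
    """Return (r, r_tilde) with |N_tilde**r_tilde - N**r| < eps * N**r, smallest r first.

    Brute-force search; it terminates for multiplicatively independent N, N_tilde
    by Lemma 7.30 (and trivially when some powers coincide).
    """
    eps = Fraction(eps)
    r = 1
    while True:
        target = N ** r
        # Smallest exponent with N_tilde**r_tilde >= target; only this exponent and
        # the one below it can give a power of N_tilde closest to target.
        r_tilde = 1
        while N_tilde ** r_tilde < target:
            r_tilde += 1
        for candidate in (r_tilde - 1, r_tilde):
            if candidate >= 1 and abs(N_tilde ** candidate - target) < eps * target:
                return r, candidate
        r += 1


r, r_tilde = close_powers(2, 3, Fraction(1, 10))
```

For $N = 2$, $\tilde N = 3$ and $\varepsilon = 1/10$ the search stops at $2^8 = 256$ and $3^5 = 243$, whose difference 13 is below $25.6$.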



Proof of Proposition 7.29. Suppose $M_{out}$ is an $N$-automaton generating $x$; let $Q$ be its set of states. Let $\varepsilon = \tilde N^{-\#Q}$. By Lemma 7.30 we can find $r, \tilde r$ such that $|\tilde N^{\tilde r} - N^r| < \varepsilon N^r$. This means that
$$[10^{\#Q} v]_N = \tilde N^{\tilde r} = N^{\#Q+|v|} + [v]_N = N^r + [v]_N$$


for some word $v$. Hence, when $M_{out}$ parses $10^{\#Q} v$, it must see some state $q \in Q$ twice before it finishes reading $10^{\#Q}$. Say the corresponding loop from $q$ to $q$ has length $s$. By the Pumping Lemma 7.9, $M_{out}$ has to accept $10^{\#Q+ks} v$ for every integer $k \geq 0$. Therefore there is an $\ell(k) \in \mathbb{N}$ such that

(7.8)  $[10^{\#Q+ks} v]_N = \tilde N^{\ell(k)} = N^{\#Q+ks+|v|} + [v]_N = N^{r+ks} + [v]_N.$

Note that $\ell(k+1) - \ell(k)$ is bounded in $k$. Subtracting (7.8) for $k$ from (7.8) for $k+1$, we obtain
$$\tilde N^{\ell(k)} (\tilde N^{\ell(k+1)-\ell(k)} - 1) = N^{\#Q+ks+|v|} (N^s - 1).$$
This can only hold for all $k \geq 0$ if $\tilde N^{\tilde m} = N^m$ for some $m, \tilde m \in \mathbb{N}$, which contradicts our assumption that $N$ and $\tilde N$ are multiplicatively independent. This concludes the proof. □

Cobham [161] generalized this to general $\tilde N$-automatic sequences, not just the indicator sequences of the powers of $\tilde N$. This is Cobham’s Theorem.

Theorem 7.31. If $2 \leq N, \tilde N \in \mathbb{N}$ are multiplicatively independent, then the only sequences which are both $N$-automatic and $\tilde N$-automatic are eventually periodic.

Since Cobham’s proof$^4$ several others have been given; see [229, 301, 453, 473] and [20, Section 11]. We will follow a new and much shorter proof due to Krebs [371]. We start with a definition and lemma.

Definition 7.32. A sequence $(x_n)_{n \in \mathbb{N}}$ is locally periodic of period $p \geq 1$ on an integer interval $I$ if $x_n = x_{n+p}$ whenever $n, n+p \in I$.

Lemma 7.33. Let $(I_k)_{k \in \mathbb{N}}$ be a sequence of integer intervals such that $\mathbb{N}_0 \setminus \bigcup_{k \in \mathbb{N}} I_k$ is finite. Assume that $(x_n)_{n \geq 0}$ is a sequence that is locally periodic with period $p_k$ on $I_k$ for each $k \in \mathbb{N}$. If $\#(I_k \cap I_{k+1}) \geq p_k + p_{k+1}$ for all $k$, then $(x_n)_{n \in \mathbb{N}}$ is eventually periodic with period $p = \min_{k \in \mathbb{N}} p_k$.

Proof. The overlap of neighboring intervals $I_k$ and $I_{k+1}$ is large enough that the period $p_k$ extends to $I_k \cup I_{k+1}$. This is a special case of the Fine-Wilf Theorem [247, Theorem 3]. In detail:

(i) Suppose that $n, n + p_k \in I_k \cup I_{k+1}$. If $n + p_k \in I_k$, then $x_n = x_{n+p_k}$ by the local periodicity on $I_k$.

(ii) Otherwise $n, n + p_k \in I_{k+1}$ and by local periodicity on $I_{k+1}$ we can find $n' \equiv n \bmod p_{k+1}$ such that $n', n' + p_k \in I_k \cap I_{k+1}$; see Figure 7.6. But then $x_n = x_{n'} = x_{n'+p_k} = x_{n+p_k}$, using local periodicity on $I_{k+1}$, on $I_k$, and again on $I_{k+1}$, respectively.

4 Eilenberg [233] called Cobham’s proof correct, but long and unreasonable. However, Cobham’s proof is just six pages, and whether it is unreasonably technical is in the eye of the reader.

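[Figure 7.6: two overlapping integer intervals $I$ and $I'$, showing the positions of $n$ and $n+p$ in case (i) and of $n$, $n'$, $n'+p$ and $n+p$ in case (ii), with an overlap of length $p + p'$.]

Figure 7.6. Overlapping intervals with local periods $p$ and $p'$.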

Therefore the local period $p_k$ carries over to $I_{k+1}$. By the same argument, the period $p_k$ carries over to all next neighbors, both left and right. By induction, the period carries over to all $I_j$, $j \in \mathbb{N}$. Hence $x_n = x_{n+p_k}$ for all $n \geq \min I_1$. This is true for all $k$, so in particular for $p = \min_k p_k$. □

Proof of Theorem 7.31. Assume that the sequence $(x_n)_{n \geq 0}$ is both $N$-automatic and $\tilde N$-automatic. The aim is to prove that $(x_n)_{n \geq 0}$ is locally periodic. We will specify the integer intervals $I_k$, show that they have a local period, and finally show that they have sufficient overlap to apply Lemma 7.33. An important idea to achieve this overlap is to use larger input alphabets $A = \{0, 1, \dots, 2N-1\}$ and $\tilde A = \{0, 1, \dots, 2\tilde N-1\}$ than the bases suggest; according to [20, Theorem 6.8.6], there are DFAOs that produce the same automatic sequences in this case. We don’t change the bases, so integers may have multiple representations in the extended input alphabets, but for all representations, the DFAO will give the same output.

Without loss of generality, assume that $N < \tilde N$. Let $Q$ and $\tilde Q$ be the sets of states of the corresponding automata. Since we only have to prove that $(x_n)_{n \geq 0}$ is eventually periodic, it suffices to consider states $q$ such that $\delta(q_0, w) = q$ for infinitely many $w \in A^*$. If $w, w' \in A^*$ are such that $\delta(q_0, w) = \delta(q_0, w')$, then also $\delta(q_0, wz) = \delta(q_0, w'z)$ for every $z \in A^*$. For the $N$-automatic sequence $(x_n)_{n \geq 0}$ and any $r \in \mathbb{N}$, this means that

(7.9)  $x_{kN^r+j} = x_{k'N^r+j}$  for every $j \in \{0, \dots, 2N^r-1\}$,

if $k = [w]_N = \sum_{i=1}^{|w|} w_i N^{|w|-i}$ (as in Definition 7.25) and $k' = [w']_N = \sum_{i=1}^{|w'|} w'_i N^{|w'|-i}$ are the integers represented by $w$ and $w'$, respectively. The analogous statement holds for the $\tilde N$-automaton.

For each $q \in Q$, we can find $\tilde q \in \tilde Q$ and distinct integers $a_q, a'_q$ such that $q = \delta(q_0, w) = \delta(q_0, w')$ and $\tilde q = \delta(\tilde q_0, \tilde w) = \delta(\tilde q_0, \tilde w')$ whenever $w, w' \in A^*$ and $\tilde w, \tilde w' \in \tilde A^*$ are such that $a_q = [w]_N = [\tilde w]_{\tilde N}$ and $a'_q = [w']_N = [\tilde w']_{\tilde N}$. Let $\xi = \max_{q \in Q} \max\{a_q, a'_q\}$. Using Lemma 7.30 for $\varepsilon = 1/(8\xi)$ we can find


$r, \tilde r \in \mathbb{N}$, independently of $q \in Q$, such that
$$\xi |\tilde N^{\tilde r} - N^r| \leq \tfrac18 N^r.$$
In particular, $\frac78 N^r \leq \tilde N^{\tilde r}$. Define integer intervals
$$I_k = \left[ kN^r + \tfrac13 N^r,\ kN^r + \tfrac53 N^r \right] \cap \mathbb{N}, \qquad k \in \mathbb{N},$$
centered at $(k+1)N^r$ with radius $\frac23 N^r$. Assume that $k = [w]_N$ for some word $w$ with $\delta(q_0, w) = q$. Swapping $a_q$ and $a'_q$ if necessary, we fix the local period on $I_k$ as
$$p_q := (a'_q - a_q) \cdot (\tilde N^{\tilde r} - N^r) \in \left(0, \tfrac18 N^r\right].$$
To show that $p_q$ is indeed a local period, choose $j \in \{0, \dots, 2N^r-1\}$ so that $kN^r + j,\ kN^r + j + p_q \in I_k$. Then
$$|j - a_q(\tilde N^{\tilde r} - N^r) - \tilde N^{\tilde r}| \leq |j - N^r| + (a_q + 1)|\tilde N^{\tilde r} - N^r| \leq \tfrac23 N^r + \tfrac18 N^r < \tfrac78 N^r \leq \tilde N^{\tilde r},$$
and so $0 \leq j - a_q(\tilde N^{\tilde r} - N^r) < 2\tilde N^{\tilde r} - 1$. Therefore
$$\begin{aligned}
x_{kN^r+j} &= x_{a_q N^r + j} && \text{by (7.9) since } 0 \leq j < 2N^r \\
&= x_{a_q \tilde N^{\tilde r} + j - a_q(\tilde N^{\tilde r} - N^r)} \\
&= x_{a'_q \tilde N^{\tilde r} + j - a_q(\tilde N^{\tilde r} - N^r)} && \text{by the analogue of (7.9) for the } \tilde N\text{-automaton, since } 0 \leq j - a_q(\tilde N^{\tilde r} - N^r) < 2\tilde N^{\tilde r} \\
&= x_{a'_q N^r + j + (a'_q - a_q)(\tilde N^{\tilde r} - N^r)} \\
&= x_{kN^r + j + p_q} && \text{by (7.9) since } 0 \leq j < 2N^r.
\end{aligned}$$
Since $|I_k \cap I_{k+1}| = \frac13 N^r > 2 \max_{q \in Q} p_q$, the overlap of these integer intervals is as large as required in Lemma 7.33. □

Since all automatic sequences can basically be written as in Theorem 7.27 (cf. [405]), this retrospectively gives information on constant length substitution shifts; see Durand & Rigo [220, 229] and also [223] and references therein. The question is then whether Cobham’s Theorem extends beyond automatic sequences to fixed points of substitutions that are not constant length. Durand [223, Theorem 1] indeed proves a corresponding result.


Theorem 7.34. Let $\rho$ and $\tilde\rho$ be the fixed points of two primitive$^5$ substitutions whose associated matrices have leading eigenvalues $\lambda$ and $\tilde\lambda$. If there are substitutions $\psi$ and $\tilde\psi$ such that $\psi(\rho) = \tilde\psi(\tilde\rho)$ and this is not an eventually periodic sequence, then $\frac{\log \tilde\lambda}{\log \lambda} \in \mathbb{Q}$.

That non-trivially different substitutions can have the same fixed point is shown in e.g. Example 4.28.

Example 7.35. One application of Theorem 7.34, to the classification of inverse limit spaces of tent maps, helps in solving the so-called Ingram Conjecture. Let $T_s(x) = \min\{sx, s(1-x)\}$ be a tent map whose critical point $c = \frac12$ is periodic with period $N$. Restrict $T_s$ to the core $[c_2, c_1]$ where $c_n = T_s^n(c)$. The inverse limit space $\varprojlim([c_2, c_1], T_s)$ is the collection of backward orbits of $T_s : [c_2, c_1] \to [c_2, c_1]$ with product topology. Such inverse limit spaces consist of uncountably many continuous images of $\mathbb{R}$ as well as $N$ continuous images of $[0, \infty)$ (and the image of 0 is called an end-point), all lying dense in $\varprojlim([c_2, c_1], T_s)$. They can be embedded in the plane and then look similar to the Knaster continuum$^6$ of Figure 4.2 (left), except that instead of one end-point, $\varprojlim([c_2, c_1], T_s)$ has $N$ end-points. Ingram’s Conjecture states that if $T_s$ and $T_{\tilde s}$, both with periodic critical points, have different slopes, then their inverse limit spaces are not homeomorphic.

Let $c_2 = x_1 < x_2 < \dots < x_N = c_1$ be the critical orbit arranged in increasing order. It defines a Markov partition, see Example 3.9 and Theorem 3.14, and the leading eigenvalue of the corresponding transition matrix $A$ is $e^{h_{top}(T_s)} = s$. We can also associate a substitution $\chi$ to $T_s$ on the alphabet $\mathcal{A} = \{1, 2, \dots, M, \bar 1, \bar 2, \dots, \bar M\}$ (where $M = N - 1$) by
$$\chi : \begin{cases}
i \to j \cdots k & \text{if } T_s([x_{i-1}, x_i]) = [x_{j-1}, x_k] \text{ is orientation preserving,} \\
i \to \bar k \cdots \bar j & \text{if } T_s([x_{i-1}, x_i]) = [x_{j-1}, x_k] \text{ is orientation reversing,} \\
\bar i \to \bar k \cdots \bar j & \text{if } T_s([x_{i-1}, x_i]) = [x_{j-1}, x_k] \text{ is orientation preserving,} \\
\bar i \to j \cdots k & \text{if } T_s([x_{i-1}, x_i]) = [x_{j-1}, x_k] \text{ is orientation reversing.}
\end{cases}$$
Figure 7.7 gives the three possibilities if the period $N = 5$. As shown in [120, Lemma 3], the associated matrix of $\chi$ has the same eigenvalues as $A$ as well as $M$ eigenvalues on the unit circle. Since $c_2$ has period $N$, $\chi^N(1)$ starts with 1, and $\rho := \lim_n \chi^{nN}(1)$ is a fixed point of $\chi^N$. It represents the way the arc-component of the end-point $(c_2, c_1, c = c_N, c_{N-1}, \dots, c_3, c_2, c_1, c, \dots)$ coils through the inverse limit space.

5 In fact, [223] uses a slightly weaker assumption than primitive.

6 Which is the inverse limit space of $T_2$.


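[Figure 7.7: graphs of three tent maps with critical period 5, each with the points $c_2$, $c$ and $c_1$ marked on the axis, together with the associated substitutions listed below.]

First map: $\chi$: $1 \to 2$, $2 \to 3$, $3 \to 4$, $4 \to \bar 4 \bar 3 \bar 2 \bar 1$; $\bar 1 \to \bar 2$, $\bar 2 \to \bar 3$, $\bar 3 \to \bar 4$, $\bar 4 \to 1\,2\,3\,4$.

Second map: $\chi$: $1 \to 2\,3$, $2 \to 4$, $3 \to \bar 4 \bar 3$, $4 \to \bar 2 \bar 1$; $\bar 1 \to \bar 3 \bar 2$, $\bar 2 \to \bar 4$, $\bar 3 \to 3\,4$, $\bar 4 \to 1\,2$.

Third map: $\chi$: $1 \to 3\,4$, $2 \to \bar 4$, $3 \to \bar 3 \bar 2$, $4 \to \bar 1$; $\bar 1 \to \bar 4 \bar 3$, $\bar 2 \to 4$, $\bar 3 \to 2\,3$, $\bar 4 \to 1$.

Figure 7.7. Partitions for three different tent maps with critical period 5.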

It is a topological invariant in the sense that if $T_{\tilde s}$ is another tent map where the critical point has period $\tilde N$ and $\tilde\chi$ and $\tilde\rho$ are constructed analogously, then $\varprojlim([c_2, c_1], T_s)$ can only be homeomorphic to $\varprojlim([c_2, c_1], T_{\tilde s})$ if $N = \tilde N$ and there is a substitution $\psi$ such that $\psi(\tilde\rho) = \rho$; see [120]. But Theorem 7.34 implies that this can only happen if the logarithms of the leading eigenvalues of the associated matrices of $\chi$ and $\tilde\chi$ are rationally dependent, that is, if $\frac{h_{top}(T_s)}{h_{top}(T_{\tilde s})} \in \mathbb{Q}$.

By now, the Ingram Conjecture has been confirmed in [338], [526], and [51] for periodic, preperiodic, and general critical orbits, respectively. For multimodal maps, the Ingram Conjecture remains open. Although similar techniques are likely to work, the problem has an extra facet in that there are non-conjugate multimodal tent maps with the same entropy.

Chapter 8

Miscellaneous Background Topics

8.1. Pisot and Salem Numbers

Definition 8.1. A real number $\alpha$ is called algebraic if it is a zero of a nonconstant polynomial with integer coefficients. The smallest possible degree $d$ that such a polynomial can have is called the degree of $\alpha$, and the polynomial$^1$ $p \in \mathbb{Z}[x]$ of this smallest degree is called the minimal polynomial. The other solutions of $p(x) = 0$ are called the algebraic conjugates or Galois conjugates of $\alpha$. If the leading coefficient of $p(x)$ is 1, then $\alpha$ is called an algebraic integer. Non-algebraic numbers are called transcendental.

The set of algebraic numbers is countable, as one can check from the fact that for each $n \in \mathbb{N}$, there are only finitely many algebraic numbers that are the root of a degree $d$ polynomial with integer coefficients $a_i$ such that $d + \sum_{i=0}^d |a_i| = n$. Hence, most real numbers are transcendental. They are more difficult to specify (short of writing down all their decimal digits), but examples of transcendental numbers are $e$ and $\pi$ (in fact, $e^\alpha$ for every non-zero algebraic number $\alpha$), as proved by Hermite (1873) and Lindemann (1882), respectively. Apéry (1978) proved that $\zeta(3)$ is irrational; whether it is transcendental is not known. Hilbert asked in his 1900 address to the International Congress of Mathematicians whether numbers such as $2^{\sqrt 2}$ are transcendental (Hilbert’s 7-th Problem). This was solved a good thirty years later by Gelfond [272] and Schneider [491]: every number of the form $a^b$, where $a \neq 0, 1$ is algebraic and $b$ is an algebraic irrational, is transcendental.

1 For definiteness, we assume that the coefficients have no common prime divisor and the first coefficient is positive.


Among the algebraic numbers there are some classes that are responsible for special properties in various dynamical systems.

Definition 8.2. An algebraic integer $\alpha > 1$ is called a Pisot number if all its Galois conjugates, i.e. the other roots of its minimal polynomial (called the Pisot polynomial), are in the open unit disk. If the Galois conjugates are in the closed unit disk, with at least one on the boundary, then $\alpha$ is a Salem number.

For example, all the multinacci numbers, i.e. the leading solutions of the equations
$$x^d = x^{d-1} + x^{d-2} + \dots + 1,$$
are Pisot numbers. The numbers $x_a = \frac{a + \sqrt{a^2+4}}{2}$, i.e. the leading roots of $x^2 - ax - 1$, $a \in \mathbb{N}$, are all Pisot numbers. In particular, $x_1 = \frac12(1 + \sqrt 5)$ is the golden mean, $x_2 = 1 + \sqrt 2$ is the silver mean, and in general, the numbers $x_a$ are called the metallic means; see also (8.21).

Salem [485] showed that the set of Pisot numbers is closed, so there is a smallest Pisot number. This turns out to be the cubic irrational $x = 1.3247\dots$ solving $x^3 = x + 1$. It is known as the plastic number (see [1] for more on the history of this terminology), and it is isolated in the set of Pisot numbers [506]. The next one is the leading root of $x^4 = x^3 + 1$, and every other one is larger than $\sqrt 2$.

The smallest known Salem number is called Lehmer’s number $\lambda = 1.17628\dots$ [312, 391]; it is the leading root of Lehmer’s polynomial
$$p(x) = x^{10} + x^9 - x^7 - x^6 - x^5 - x^4 - x^3 + x + 1.$$
There are polynomials of lower degree that non-trivially have roots on the unit circle, for example $x^4 - 2x^3 - 2x + 1 = 0$, which has smallest possible degree, but the leading root is larger than Lehmer’s number. It is an open question whether all characteristic polynomials of non-negative integer matrices with roots on the unit circle are reducible (so not of Salem type).

Proposition 8.3. If $\alpha > 1$ is a Salem number, then its minimal polynomial is palindromic; i.e.
$$p(x) = a_d x^d + a_{d-1} x^{d-1} + \dots + a_1 x + a_0 = a_0 x^d + a_1 x^{d-1} + \dots + a_{d-1} x + a_d.$$
Except for $1/\alpha$, all Galois conjugates of $\alpha$ lie on the unit circle but are not roots of unity.

Proof. Let $p(x) = a_d x^d + \dots + a_1 x + a_0$ be the minimal polynomial of $\alpha$, and let $p^*(x) = a_0 x^d + \dots + a_{d-1} x + a_d$ be the reciprocal polynomial, i.e. $p(x)$ with the coefficients written in backward order. We need to show that $p(x) = p^*(x)$. Note that the reciprocal polynomial $p^*(x) = x^d p(1/x)$, so if $\alpha'$ is a Galois conjugate, then so is $1/\alpha'$. In particular $1/\alpha$ is a Galois conjugate, and no


other Galois conjugate $\alpha'$ can have $|\alpha'| < 1$, because then $|1/\alpha'| > 1$ and this contradicts that $\alpha$ is a Salem number. Thus all the remaining Galois conjugates $\alpha'$ lie on the unit circle, and the complex conjugate $\bar\alpha' = 1/\alpha'$ is also a root of $p(x)$ and of $p^*(x)$. But then $\alpha'$ is also a root of the polynomial $a_0 p(x) - a_d p^*(x)$ which has degree $< d$. This contradicts that $p(x)$ is the minimal polynomial of $\alpha$, unless $p(x) = p^*(x)$.

If $\alpha'$ is a root of unity, then its minimal polynomial, which is (a factor of) $x^r - 1$, divides $p(x)$, but is not equal to it. This contradicts that $p$ is irreducible. The proof is complete. □

Definition 8.4. An algebraic integer $\alpha > 1$ is a Perron number if all its algebraic conjugates $\alpha'$ satisfy $|\alpha'| < \alpha$.

That the leading eigenvalue of a non-negative aperiodic irreducible integer matrix is a Perron number follows directly from the Perron-Frobenius Theorem 8.58. The converse, however, is also true: every Perron number is the leading eigenvalue of a non-negative aperiodic irreducible matrix; see [397, Theorem 1]. In fact [397, Theorem 3], the algebraic conjugates $\alpha'$ of $\alpha$ satisfy $|\alpha'| \leq \alpha$ if and only if $\alpha$ is the leading eigenvalue of a non-negative irreducible integer matrix (without stipulating that this matrix is aperiodic).

Recall that $|||x|||$ denotes the distance of $x$ to the nearest integer.

Proposition 8.5. If $\alpha > 1$ is Pisot, then $|||\alpha^n||| \to 0$ exponentially fast.

Proof. Let $\alpha_1, \dots, \alpha_{d-1}$ be the Galois conjugates of $\alpha$, and let $A$ be a $d \times d$ integer matrix whose characteristic polynomial is the minimal polynomial of $\alpha$. Let $J = U^{-1}AU$ be the Jordan normal form of $A$, so $J^n = U^{-1}A^nU$ and $J^n$ and $A^n$ have the same characteristic polynomial $\det(\lambda I - A^n) = \lambda^d + p_{d-1}\lambda^{d-1} + \dots + p_1\lambda + p_0$, with the same coefficient $p_{d-1}$. This coefficient is minus the trace, so
$$\alpha^n + \sum_{i=1}^{d-1} \alpha_i^n = \operatorname{tr}(J^n) = \operatorname{tr}(A^n) = -p_{d-1} \in \mathbb{Z}.$$
Therefore
$$|||\alpha^n||| = \Big|\Big|\Big| \sum_{i=1}^{d-1} \alpha_i^n \Big|\Big|\Big| \to 0 \quad \text{exponentially fast.} \qquad □$$
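The trace argument in the proof of Proposition 8.5 can be observed numerically for the golden mean. The sketch below is an illustration under stated assumptions: it uses the companion-type matrix of $x^2 - x - 1$ (a choice made here, any integer matrix with this characteristic polynomial would do) and Python's `decimal` module with 60 digits of working precision.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # working precision; ample for the exponents used below


def mat_mul(X, Y):
    """Product of two integer matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def trace_of_power(A, n):
    """Trace of A**n, computed with exact integer arithmetic."""
    P = [[int(i == j) for j in range(len(A))] for i in range(len(A))]
    for _ in range(n):
        P = mat_mul(P, A)
    return sum(P[i][i] for i in range(len(A)))


def dist_to_nearest_integer(x):
    """|||x|||: distance from the Decimal x to the nearest integer."""
    return abs(x - x.to_integral_value())


A = [[0, 1], [1, 1]]                    # characteristic polynomial x^2 - x - 1
golden = (1 + Decimal(5).sqrt()) / 2    # its leading root, a Pisot number

distances = [dist_to_nearest_integer(golden ** n) for n in range(1, 41)]
```

For $n \geq 2$ the nearest integer to $\alpha^n$ is $\operatorname{tr}(A^n)$ (a Lucas number), and the distances shrink by the factor $1/\alpha \approx 0.618$ at every step, which is the modulus of the single Galois conjugate.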



8.1.1. Conditions for recursive sequences satisfying $|||\alpha G_n||| \to 0$. We will investigate for which irrational $\alpha$ (if any) $|||\alpha G_n||| \to 0$ for sequences $G_n$. This is of interest in several questions in ergodic theory and elsewhere; see e.g. Section 6.9. Pisot numbers are important, but this is not the whole story.

Proposition 8.6. Let the integer sequence $(G_n)_{n \geq 0}$ satisfy the recursion

(8.1)  $G_n = \sum_{i=1}^{d} a_{d-i} G_{n-i}$,  $G_0, \dots, G_{d-1} \in \mathbb{N}_0$ arbitrary,


where $a_i \in \mathbb{N}_0$, $a_1 \geq 1$ are the coefficients of a Pisot polynomial $p(x) := x^d - \sum_{i=0}^{d-1} a_i x^i$. If $\alpha$ is the leading root of $p(x)$, then

(8.2)  $|||\alpha G_n||| \to 0$ exponentially fast.

The recursive relation (8.1) can be written as

(8.3)  $\begin{pmatrix} G_{n-d+1} \\ G_{n-d+2} \\ \vdots \\ G_n \end{pmatrix} = A \begin{pmatrix} G_{n-d} \\ G_{n-d+1} \\ \vdots \\ G_{n-1} \end{pmatrix}, \qquad A = \begin{pmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \dots & \dots & 0 & 1 \\ a_0 & a_1 & \dots & a_{d-2} & a_{d-1} \end{pmatrix},$

so $p(x) = \det(xI - A)$. This has the advantage that in the Jordan decomposition of $A$, namely $A = UJU^{-1}$, the matrix
$$U = \begin{pmatrix} 1 & 1 & \dots & 1 \\ \lambda_1 & \lambda_2 & \dots & \lambda_d \\ \lambda_1^2 & \lambda_2^2 & \dots & \lambda_d^2 \\ \vdots & \vdots & & \vdots \\ \lambda_1^{d-1} & \lambda_2^{d-1} & \dots & \lambda_d^{d-1} \end{pmatrix}$$
is (the transpose of) a Vandermonde matrix, provided that all the eigenvalues $\lambda_i$ of $A$ are distinct. However, we get the same result for any matrix $A$ with characteristic polynomial $p(x) = x^d - \sum_{i=0}^{d-1} a_i x^i$. Indeed, if we take $G^{(0)} = (G_0^{(0)}, \dots, G_{d-1}^{(0)})^T \in \mathbb{N}_0^d$ arbitrary and set $G^{(n)} = A^n G^{(0)}$, then by the Cayley-Hamilton Theorem, $p(A) = 0$. Hence
$$G^{(n)} = a_{d-1} G^{(n-1)} + \dots + a_0 G^{(n-d)}.$$
In particular, each component of $G^{(n)}$ satisfies (8.1).

Proof of Proposition 8.6. Let $(v_i)_{i=1}^d$ be the eigenvectors of $A$ associated to eigenvalues $\lambda_i$ where $\lambda = \lambda_1 > 1 > |\lambda_i|$ for $i = 2, \dots, d$. For simplicity, we assume that $A$ is indeed complex diagonalizable; otherwise the proof becomes more technical but remains in essence the same. Decompose $(G_0, \dots, G_{d-1})^T = \sum_{i=1}^d c_i v_i$ for $c_i \in \mathbb{C}$; then

(8.4)  $(G_{n-d+1}, \dots, G_n)^T = A^{n-d} \sum_{i=1}^d c_i v_i = \sum_{i=1}^d c_i v_i \lambda_i^{n-d}.$

Therefore, taking the $d$-th component,
$$\lambda G_n - G_{n+1} = \Big( \sum_{i=1}^d c_i v_i \lambda_i^{n-d} (\lambda - \lambda_i) \Big)_d = \Big( \sum_{i=2}^{d} c_i v_i \lambda_i^{n-d} (\lambda - \lambda_i) \Big)_d$$
is indeed exponentially small in $n$. □
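Proposition 8.6 can be tested numerically on the tribonacci recursion $G_n = G_{n-1} + G_{n-2} + G_{n-3}$, whose polynomial $x^3 - x^2 - x - 1$ is one of the multinacci (Pisot) polynomials mentioned earlier. The sketch below makes several assumptions of its own: the helper names, the initial values $(3, 1, 4)$, the Newton starting point 2, and the 80-digit working precision are all arbitrary illustrative choices.

```python
from decimal import Decimal, getcontext

getcontext().prec = 80  # working precision, far more than needed for 80 terms


def leading_root(a, x0=Decimal(2), steps=60):
    """Newton iteration for the largest real root of x^d - sum_i a[i] x^i."""
    d = len(a)
    x = x0
    for _ in range(steps):
        p = x ** d - sum(a[i] * x ** i for i in range(d))
        dp = d * x ** (d - 1) - sum(i * a[i] * x ** (i - 1) for i in range(1, d))
        x = x - p / dp
    return x


def recursion(a, initial, length):
    """Terms G_0, ..., G_{length-1} of G_n = sum_{i=1}^{d} a[d-i] G_{n-i}, as in (8.1)."""
    d = len(a)
    G = list(initial)
    while len(G) < length:
        G.append(sum(a[d - i] * G[-i] for i in range(1, d + 1)))
    return G


a = [1, 1, 1]                       # p(x) = x^3 - x^2 - x - 1
alpha = leading_root(a)             # approximately 1.8393
G = recursion(a, [3, 1, 4], 82)     # arbitrary non-negative initial values

# Distance of alpha * G_n to the nearest integer, for each n.
dist = [abs(alpha * g - (alpha * g).to_integral_value()) for g in G]
```

For large $n$ the integer nearest to $\alpha G_n$ is $G_{n+1}$ itself, in line with the estimate of $\lambda G_n - G_{n+1}$ at the end of the proof.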




Remark 8.7. For the case $p(x) = x^2 - x - 1$ with $G_n = F_n$, $F_0 = 0$, $F_1 = 1$ the Fibonacci numbers, (8.4) reduces to Binet’s formula

(8.5)  $F_n = \frac{1}{\sqrt 5} \left( \gamma^n - (-\gamma)^{-n} \right), \qquad \gamma = \frac12 (1 + \sqrt 5).$

The problem we would like to solve (for various purposes in number theory, Diophantine approximation, and, as presented in Section 6.9) is under what conditions (8.2) has a solution. That is, when is $\alpha \in \mathbb{R}$ such that $|||\alpha G_n||| \to 0$? If the $G_n$’s are multiples of $q$, then $\alpha = p/q$ solves this equation. No $\alpha = p/q \in \mathbb{Q}$ (in lowest terms) solves the equation if $q \nmid G_n$ for all sufficiently large $n$. But for irrational $\alpha$ beyond Pisot numbers, it is an important and intriguing question, which sometimes, maybe surprisingly, has a positive answer.

For the rest of this section, let $p(x) = \det(xI - A) = x^d - \sum_{i=0}^{d-1} a_i x^i$ be the characteristic polynomial of an integer matrix $A$. Let $\Lambda = \{\lambda_i \in \mathbb{C} : p(\lambda_i) = 0\}$ be the set of eigenvalues of $A$, which we split into $\Lambda_+ = \{\lambda_i \in \Lambda : |\lambda_i| \geq 1\}$ and $\Lambda_- = \{\lambda_i \in \Lambda : |\lambda_i| < 1\}$. Let also $\lambda_1$ be the leading root of $p(x)$; if $A$ is a non-negative matrix, $\lambda_1 > 0$ due to the Perron-Frobenius Theorem 8.58.

Theorem 8.8. Assume that the eigenvalues $\lambda \in \Lambda_+$ are all distinct. Suppose that the integer sequence $(G_n)_{n \geq 0}$ satisfies (8.1). If there is a polynomial $g(x) = \sum_k g_k x^k \in \mathbb{Q}[x]$ such that $g(\lambda) = \alpha$ for all $\lambda \in \Lambda_+$, then
$$|||\ell \alpha G_n||| \to 0 \quad \text{for } \ell = \operatorname{lcm}\{\text{denominators of } g_k\}.$$
In addition,
$$|||\ell \alpha^k G_n||| \to 0 \quad \text{for every } k \in \mathbb{N} \text{ and some } \ell = \ell(k) \in \mathbb{N}.$$

This result goes back to Livshits; see [402, 403] and also [519]. For Pisot matrices, we can trivially choose $g(x) = x$ and $\alpha = \lambda_1$, the leading eigenvalue. By choosing $g(x) = x^k$, we see that $|||\lambda_1^k G_n||| \to 0$ for all $k \in \mathbb{N}$.

Example 8.9. In [246, Section 4], the following example is presented:
$$\chi : \begin{cases} 0 \to 0133, \\ 1 \to 12, \\ 2 \to 3, \\ 3 \to 0 \end{cases} \qquad \text{with associated matrix } A = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 2 & 0 & 1 & 0 \end{pmatrix}.$$
The characteristic polynomial
$$p(x) = x^4 - 2x^3 - x^2 + 2x - 1 = (x^2 - x - 1 - \sqrt 2)(x^2 - x - 1 + \sqrt 2)$$
has solutions
$$\lambda_1 = \frac{1 + \sqrt{5 + 4\sqrt 2}}{2} > 1, \qquad \lambda_2 = \frac{1 - \sqrt{5 + 4\sqrt 2}}{2} < -1$$
and
$$\lambda_3 = \frac{1 + i\sqrt{-5 + 4\sqrt 2}}{2}, \qquad \lambda_4 = \frac{1 - i\sqrt{-5 + 4\sqrt 2}}{2}$$
within the unit circle. For $g(x) = x^2 - x - 1$, we have $g(\lambda_1) = g(\lambda_2) = \sqrt 2$, so $\sqrt 2$ solves (8.2).
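Example 8.9 can be checked by direct computation. The sketch below is illustrative and makes its own choices: it takes $G_n$ to be the first component of $A^n e_0$ (one of many integer sequences satisfying the recursion given by the characteristic polynomial), uses 80-digit `decimal` arithmetic, and compares $\sqrt 2\, G_n$ with the integer $G_{n+2} - G_{n+1} - G_n$ obtained by applying $g(x) = x^2 - x - 1$ to the sequence.

```python
from decimal import Decimal, getcontext

getcontext().prec = 80  # working precision for sqrt(2) * G_n

# Associated matrix of the substitution 0 -> 0133, 1 -> 12, 2 -> 3, 3 -> 0:
# column j counts the letters occurring in the image of j.
A = [[1, 0, 0, 1],
     [1, 1, 0, 0],
     [0, 1, 0, 0],
     [2, 0, 1, 0]]


def apply_matrix(A, v):
    """Matrix-vector product with exact integers."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]


def first_components(A, v, length):
    """First components of v, Av, A^2 v, ... (an integer sequence G_0, G_1, ...)."""
    out = []
    for _ in range(length):
        out.append(v[0])
        v = apply_matrix(A, v)
    return out


G = first_components(A, [1, 0, 0, 0], 64)
sqrt2 = Decimal(2).sqrt()

# Distance of sqrt(2) * G_n to the nearest integer.
dist = [abs(sqrt2 * g - (sqrt2 * g).to_integral_value()) for g in G]

# Integer sequence obtained by applying g(x) = x^2 - x - 1 to the shift.
F = [G[n + 2] - G[n + 1] - G[n] for n in range(62)]
```

The difference $\sqrt 2\, G_n - F_n$ only involves the two eigenvalues inside the unit circle, of modulus $\sqrt{\sqrt 2 - 1} \approx 0.64$, which explains the geometric decay of `dist`.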

Proof of Theorem 8.8. By the recursive relation (8.1) we can write $G_n = \sum_{i=1}^{d} c_i \lambda_i^n$, where $\lambda_i$ are the roots of $p$. Here we assume that all roots of $p(x)$ are distinct, not just the roots outside the unit disk. Without this assumption we need to consider non-trivial Jordan blocks for these roots, which is more technical but not seriously different. Thus we have
$$\alpha G_n = \alpha \sum_{|\lambda_i| \geq 1} c_i \lambda_i^n + \alpha \sum_{|\lambda_i| < 1} c_i \lambda_i^n.$$

Suppose that the polynomial g(x) = Fn :=

m

gj Gn+j =

j=0

=

d

|λi |