Letter from the Editor Susan Jane Colley
Beginning with this volume, the Monthly embarks upon a transition to new leadership. Della Dumbaugh is now Editor-Elect, and will become Editor in 2022. She is Professor of Mathematics at the University of Richmond. Her research focuses on the history of mathematics, especially the history of algebra in the 20th century. In addition to the many articles and book chapters she has written or co-written, Della was one of the editors of A Century of Advancing Mathematics, published by the MAA in 2015 in celebration of the Association’s centennial. She is coauthor (with Joachim Schwermer) of Emil Artin and Beyond: Class Field Theory and L-Functions. Della will bring a fresh and interesting perspective to the Monthly and I look forward to working with her this year. I know that readers join me in welcoming Della to the Monthly enterprise. In addition, I am pleased to announce that Gary Kennedy, Professor Emeritus of Mathematics at the Ohio State University, Mansfield, has agreed to serve as the Monthly’s Deputy Editor for the year. This is a new position, designed to ensure the smooth operation of the journal, especially in unforeseen circumstances; the importance of this position has been made evident with the advent of COVID-19. Gary has served as a stalwart Associate Editor for the Monthly, and I have very much valued—and will continue to value—his mathematical taste and good counsel. As the new year begins, I would also like to express my ongoing gratitude to all the members of the Editorial Board for their wise work; to Bonnie Ponce, our Managing Editor; Beverly Ruedi, the Monthly’s Electronic Production and Publishing Manager; and the staff at Taylor & Francis and at the MAA who enable the Monthly to appear on your computer screen and in your mailbox. I hope that you will enjoy the mathematical treats to come.
doi.org/10.1080/00029890.2021.1840174
January 2021]
LETTER FROM THE EDITOR
Containing All Permutations Michael Engen and Vincent Vatter
Abstract. Numerous versions of the question “what is the shortest object containing all permutations of a given length?” have been asked over the past fifty years: by Karp (via Knuth) in 1972; by Chung, Diaconis, and Graham in 1992; by Ashlock and Tillotson in 1993; and by Arratia in 1999. The large variety of questions of this form, which have previously been considered in isolation, stands in stark contrast to the dearth of answers. We survey and synthesize these questions and their partial answers, introduce infinitely more related questions, and then establish an improved upper bound for one of these questions.
1. INTRODUCTION. What is the shortest object containing all permutations of length n? As we shall describe, there are a variety of such problems, going by an assortment of names including superpatterns and superpermutations. Throughout, we call all such problems universal permutation problems. The diversity of these problems stems from the multiple possible definitions of the terms involved.

To state these problems, it is necessary to view permutations as words. A word is simply a finite sequence of letters or entries drawn from some alphabet. The length of the word w, denoted |w| throughout, is its number of letters, and if w is a word of length at least i, then we denote by w(i) its ith letter. From this viewpoint, a permutation of length n is a word consisting of the letters [n] = {1, 2, . . . , n}, each occurring precisely once. Permutations are thus special types of words over the positive integers P. Two words $u, v \in P^n$ (i.e., both of length n, with positive integer letters) are order-isomorphic if, for all indices i, j ∈ [n], we have u(i) > u(j) ⇐⇒ v(i) > v(j).

In all universal permutation problems considered here, the object that is to contain all permutations of length n, called the universal object, is a word, but there are two different types of containment. Sometimes we insist that the word w contain each such permutation π as a contiguous subsequence, or factor, by which we mean that w can be expressed as a concatenation w = upv where the word p is order-isomorphic to π. At other times we merely insist that w contain each such permutation π as a subsequence, by which we mean that there are indices $1 \le i_1 < i_2 < \cdots < i_n \le |w|$ so that the word $p = w(i_1)w(i_2) \cdots w(i_n)$ is order-isomorphic to π.

These notions of containment give rise to two different universal permutation problems. To obtain infinitely many, we vary the size of the alphabet that the letters of the universal word w can be drawn from.
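The definitions above translate directly into a few lines of Python. The following is only a minimal sketch: the function names (`pattern`, `order_isomorphic`, `contains_as_factor`, `contains_as_subsequence`) and the sample word are illustrative choices, not notation from the paper, and the subsequence test is a brute-force search that is practical only for short words.

```python
from itertools import combinations

def pattern(word):
    """Replace each letter by its rank among the distinct letters of the word."""
    ranks = {letter: r for r, letter in enumerate(sorted(set(word)), start=1)}
    return tuple(ranks[letter] for letter in word)

def order_isomorphic(u, v):
    """u(i) > u(j) <=> v(i) > v(j) for all i, j, tested by comparing rank patterns."""
    return len(u) == len(v) and pattern(u) == pattern(v)

def contains_as_factor(w, perm):
    """Is some contiguous block of w order-isomorphic to perm?"""
    n = len(perm)
    return any(order_isomorphic(w[i:i + n], perm) for i in range(len(w) - n + 1))

def contains_as_subsequence(w, perm):
    """Is some (not necessarily contiguous) subsequence of w order-isomorphic to perm?"""
    n = len(perm)
    return any(order_isomorphic([w[i] for i in idx], perm)
               for idx in combinations(range(len(w)), n))

w = (3, 6, 1, 4)                               # hypothetical word over P
print(contains_as_factor(w, (2, 3, 1)))        # True: the factor 3 6 1
print(contains_as_factor(w, (1, 3, 2)))        # False: the factors are 3 6 1 and 6 1 4
print(contains_as_subsequence(w, (1, 3, 2)))   # True: the subsequence 3 6 4
```

Comparing rank patterns is equivalent to the pairwise condition in the definition, because two words induce the same weak order on their positions exactly when their rank patterns coincide.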
doi.org/10.1080/00029890.2021.1835384
MSC: Primary 05A05, Secondary 68R15; 05D99
© 2021 The Author(s). Published with license by Taylor & Francis Group, LLC. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way.
© THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 128

In the strictest form, we insist that w be a word over the alphabet [n], meaning that w is only allowed the symbols of the permutations it must contain. In this case, the notion of order-isomorphism reduces to equality: a word $p \in [n]^n$ is order-isomorphic to a permutation π of length n if and only if p = π. At the other end of the spectrum, we allow the letters of w to be arbitrary positive integers. Between these extremes, another interesting case stands out: when the alphabet is [n + 1], thus allowing the universal word one more symbol than the permutations it must contain.

Table 1. Current best upper bounds for the lengths of the shortest universal words in six flavors of universal permutation problems (for large n).

              words over [n]                                 words over [n + 1]    words over P
factor        n! + (n − 1)! + (n − 2)! + (n − 3)! + n − 3    n! + n − 1            n! + n − 1
subsequence   $n^2 - 7n/3 + 19/3$                            $(n^2 + n)/2$         $(n^2 + 1)/2$

Table 1 displays the best upper bounds established to date for the six versions of this question that have garnered the most interest. In the case of the rightmost two cells of the upper row, these upper bounds are known to be the actual answers. The bounds shown in Table 1 weakly decrease as we move from left to right (a word over [n] is also a word over [n + 1], which is also a word over P) and also as we go from top to bottom (factors are also subsequences). Another notable feature of this table is that the lengths of the shortest universal words over the alphabet [n] seem to be significantly greater than the lengths of the shortest universal words over the alphabet [n + 1], whose lengths seem to be either equal to or close to those of the shortest universal words over the largest possible alphabet, P.

We remark that some research in this area has sought a universal permutation instead of a universal word, but this is in fact equivalent to finding a universal word over P, as we briefly explain. The word $u \in P^n$ is order-homomorphic to the word $v \in P^n$ if, for all indices i, j ∈ [n], we have u(i) > u(j) =⇒ v(i) > v(j). Less formally, if u is order-homomorphic to v, then all strict inequalities between entries of u also hold between the corresponding entries of v, but equalities between entries of u may be broken in v.
It is clear that every word over P is order-homomorphic to at least one permutation (one simply needs to “break ties” among the letters of the word), and it follows that if u contains the permutation π (as a factor or subsequence) and u is order-homomorphic to v, then v also contains π (in the same sense, and indeed at the same indices, that u contains it). As every permutation is also a word over P, it follows that finding a universal permutation, in either the factor or subsequence setting, is equivalent to finding a universal word over P.

Each of the subsequent five sections of this paper is devoted to the examination of one of the cells of Table 1 (except for Section 3, which considers both the upper-center and upper-right cells). While the results described in Sections 2–5 are previously known, the results of Section 6 appear for the first time here. In the final section, we briefly describe some further variations on universal permutation problems.

2. AS FACTORS, OVER [n]. The case in the upper-left of Table 1 dates to a 1993 paper of Ashlock and Tillotson [5] and can be restated as follows.

What is the length of the shortest word over the alphabet [n] that contains each permutation of length n as a factor?
This version of the universal permutation problem has recently attracted a surprising amount of attention, including an article in The Verge [18] and two in Quanta Magazine [23, 30], and investigations are very much ongoing. We call a word over the alphabet [n] that contains all permutations of length n as factors an n-superpermutation. A (not particularly good) lower bound on the length of n-superpermutations is easy to establish by observing that every word w has at most |w| − n + 1 many factors of length n. Observation 1. Every n-superpermutation has length at least n! + n − 1. In the cases of n = 1 and n = 2, the shortest n-superpermutations are easy to find. The word 1 meets the demands for n = 1 and the word 121 is as short as possible for n = 2. The shortest 3-superpermutation has length 9—one more than the lower bound above, but may be shown to be optimal with a slightly more delicate argument, which we now present. First, there is a word of length 9, 123121321, that contains all permutations of length 3 as factors. Now suppose that the word w over the alphabet [3] contains all permutations of length 3 as factors. We say that the letter w(i) is wasted if the factor w(i − 2)w(i − 1)w(i) is not equal to a new permutation of length 3—either because not all of the letters are defined, or because it contains a repeated letter, or because that permutation occurs earlier in w. As each nonwasted letter corresponds to the first occurrence of a permutation, we have |w| = 3! + (# of wasted letters in w). Clearly the first two letters of w are wasted. If w contains an additional wasted letter, then its length must be at least 9. Suppose then that w does not contain any additional wasted letters. Thus each of the factors w(1)w(2)w(3), w(2)w(3)w(4), w(3)w(4)w(5), and w(4)w(5)w(6) must be equal to different permutations. 
However, the only way for these factors to be equal to permutations at all is to have w(4) = w(1), w(5) = w(2), and w(6) = w(3), and this implies that w(4)w(5)w(6) = w(1)w(2)w(3), a contradiction.

Computations by hand become more difficult at n = 4, but we invite the reader to check that the word 123412314231243121342132413214321 of length 33 is a 4-superpermutation, and that no shorter word suffices. As Ashlock and Tillotson [5] noticed, the lengths of these superpermutations are, respectively, 1! = 1, 2! + 1! = 3, 3! + 2! + 1! = 9, and 4! + 3! + 2! + 1! = 33. They also gave a recursive construction establishing the following result.

Proposition 2 (Ashlock and Tillotson [5, Theorem 3 and Lemma 5]). If there is an (n − 1)-superpermutation of length m, then there is an n-superpermutation of length n! + m.
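The small cases discussed above can be checked mechanically. The sketch below is an illustration only: the helper names `wasted_positions` and `is_superpermutation` and the dictionary `EXAMPLES` are assumptions made for this example, and words are represented as strings of digits, which limits it to n ≤ 9. It marks wasted letters exactly as in the n = 3 argument and confirms by exhaustive search that no word of length 8 over [3] is a 3-superpermutation.

```python
from itertools import product
from math import factorial

def wasted_positions(word, n):
    """1-based positions i whose factor word[i-n+1..i] is not a new permutation of 1..n."""
    letters = set("123456789"[:n])
    seen, wasted = set(), []
    for i in range(1, len(word) + 1):
        factor = word[max(0, i - n):i]          # the n letters ending at position i, if defined
        if len(factor) == n and set(factor) == letters and factor not in seen:
            seen.add(factor)                    # first occurrence of a permutation
        else:
            wasted.append(i)
    return wasted

def is_superpermutation(word, n):
    """Each nonwasted letter is the first occurrence of a permutation, so count them."""
    return len(word) - len(wasted_positions(word, n)) == factorial(n)

# The words quoted in the text for n = 1, 2, 3, 4.
EXAMPLES = {1: "1", 2: "121", 3: "123121321",
            4: "123412314231243121342132413214321"}

for n, word in EXAMPLES.items():
    print(n, len(word), is_superpermutation(word, n), wasted_positions(word, n))

# Exhaustive check of all 3**8 = 6561 words of length 8 over the alphabet [3].
print(any(is_superpermutation("".join(w), 3) for w in product("123", repeat=8)))
```

The identity |w| = n! + (number of wasted letters) used in the text is what `is_superpermutation` tests, so the same function serves both as a verifier and as a way to see where a given word wastes its letters.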
Proposition 2 guarantees an n-superpermutation of length at most n! + · · · + 2! + 1!. Given this construction and the lower bounds they had been able to compute, Ashlock and Tillotson made the natural conjecture that the shortest n-superpermutation has length n! + · · · + 2! + 1! for all n. They further conjectured that all of the shortest n-superpermutations were unique up to the relabeling of their letters. For about twenty years, very little progress seemed to have been made on these conjectures, although they were rediscovered many times on Internet forums such as MathExchange and StackOverflow (references to some of these rediscoveries are given in Johnston’s article [28]). Then, in 2013, Johnston [28] constructed multiple distinct n-superpermutations of length n! + · · · + 2! + 1! for all n ≥ 5, proving that at least one of Ashlock and Tillotson’s two conjectures must be false, although giving no hint as to which one. A year later, Benjamin Chaffin verified the n = 5 case of the length conjecture by computer (see Johnston’s blog post [29] for details), showing that no word of length less than 153 = 5! + 4! + 3! + 2! + 1! is a 5-superpermutation. This showed, via Johnston’s constructions, that Ashlock and Tillotson’s uniqueness conjecture was certainly false, although their length conjecture might still have held. The next case of the length conjecture to be verified would be n = 6, where the conjectured shortest length was 6! + 5! + 4! + 3! + 2! + 1! = 873. However, only weeks after Chaffin’s verification of the length conjecture for n = 5, Houston [24]— by viewing the problem as an instance of the traveling salesman problem—found a 6-superpermutation of length only 872. Whether this is the shortest 6-superpermutation is the focus of an ongoing distributed computing project at www.supermutations.net. 
Regardless of the outcome of that project, the 6-superpermutation of length 872 and Proposition 2 reduce the upper bound on the length of the shortest n-superpermutation to n! + · · · + 3! + 2! for all n ≥ 6.

After breaking the length conjecture of Ashlock and Tillotson in 2014, Houston created a Google discussion group called Superpermutators, where those interested in the problem could work on it in a loose and Polymath-esque manner, and most of the subsequent research mentioned here has been communicated there. The next breakthrough was made shortly after John Baez tweeted about Houston’s construction in September 2018. This tweet caused Greg Egan, who is known for his science fiction novels (coincidentally including one entitled Permutation City [13]), to become interested in the problem.

Egan found inspiration in an unpublished manuscript of Williams [44]. In that paper, Williams showed how to construct Hamiltonian paths and cycles in the Cayley graph on the symmetric group $S_n$ generated by the two permutations denoted by (12 · · · n) and (12) in cycle notation (see Sawada and Williams [42] for a published, streamlined construction). Williams’s construction had solved a forty-year-old conjecture of Nijenhuis and Wilf [38] (later included by Knuth as an exercise with a difficulty rating of 48/50 in Volume 4A of The Art of Computer Programming [32, Problem 71 of Section 7.2.1.2]), and, in October 2018, Egan showed how it could be adapted to prove the following.

Theorem 3 (Egan [14]). For all n ≥ 4, there is an n-superpermutation of length at most n! + (n − 1)! + (n − 2)! + (n − 3)! + n − 3.

For n = 6, the construction of Theorem 3 is worse than Houston’s (Theorem 3 gives a 6-superpermutation of length 873), but for n ≥ 7 this bound is strictly less than the bound of n! + · · · + 3! + 2! implied by Houston’s construction and Proposition 2.
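The comparison between the three upper bounds is plain arithmetic and easy to tabulate. In this short sketch the function names are illustrative labels for the bounds described above, not terminology from the literature.

```python
from math import factorial

def factorial_sum_bound(n):
    """n! + ... + 2! + 1!, the length conjectured by Ashlock and Tillotson."""
    return sum(factorial(k) for k in range(1, n + 1))

def houston_bound(n):
    """n! + ... + 3! + 2!, from the length-872 word and Proposition 2 (valid for n >= 6)."""
    return factorial_sum_bound(n) - 1

def egan_bound(n):
    """n! + (n-1)! + (n-2)! + (n-3)! + n - 3, the bound of Theorem 3 (valid for n >= 4)."""
    return factorial(n) + factorial(n - 1) + factorial(n - 2) + factorial(n - 3) + n - 3

for n in range(6, 10):
    print(n, factorial_sum_bound(n), houston_bound(n), egan_bound(n))
```

For n = 6 this reproduces the values 873, 872, and 873 mentioned in the text, and from n = 7 onward the Theorem 3 bound is the smallest of the three.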
The efforts described above yield upper bounds. For lower bounds, Ashlock and Tillotson improved on Observation 1 by focusing on wasted letters as we did earlier in the n = 3 case. For general n, we say that the letter w(i) is wasted if the factor w(i − n + 1)w(i − n + 2) · · · w(i) is either not a permutation of length n, or occurs earlier in w. The crucial observation is that if neither w(i) nor w(i + 1) is a wasted letter, then the permutations ending at those letters are cyclic rotations of each other. The n! permutations of length n can be partitioned into (n − 1)! disjoint cyclic classes, where the cyclic class of the permutation π consists of all of its cyclic rotations. For example, the cyclic class of the permutation 12345 is {12345, 23451, 34512, 45123, 51234}. Our reasoning above implies that upon completing a cyclic class (having visited all of its members), the next letter in the word (if there is one) must be wasted. Any n-superpermutation must complete all (n − 1)! cyclic classes, and thus doing so requires at least (n − 1)! − 1 wasted letters. Together with the n − 1 letters at the beginning of w, which are trivially wasted, we obtain the following result.

Proposition 4 (Ashlock and Tillotson [5, Proof of Theorem 18]). For all n ≥ 1, every n-superpermutation has length at least n! + (n − 1)! + n − 2.

At least since a 2013 blog post of Johnston [27], it had been known that there was an argument (on a website devoted to anime) claiming to improve on the lower bound provided by Proposition 4. However, the argument was far from what most mathematicians would consider a proof, and there had been no efforts to make it into one, in part because the claimed lower bound was so far from what was thought to be the correct answer at the time. However, Egan’s breakthrough quickly inspired several participants of the Superpermutators group to re-examine the argument.
In the process, it was realized not only that the argument was correct, but that it did not originate on the anime website where Johnston had found it. Instead, it had been copied there from a series of anonymous posts in 2011 on the somewhat-notorious Internet forum 4chan.

The crux of the argument is an idea that we call a trajectory (though the original proof called it a 2-loop). The proof of Proposition 4 suggests that in building an n-superpermutation, one might try to complete an entire cyclic class, then waste a single letter to enter a new cyclic class, and so on. For example, in the n = 5 case, suppose we visit the cyclic class of 12345 in order,

12345, 23451, 34512, 45123, 51234.

Once we have come to the 4 of 51234, there is a unique way to waste a single letter to move to a different cyclic class; this is to append the letters 15, and doing so moves us to the cyclic class of the permutation 23415. It would then be natural to complete this cyclic class, by visiting the permutations

23415, 34152, 41523, 15234, 52341.

After that it would again be natural to waste a letter to traverse to the cyclic class of 34125 and complete that class by visiting the permutations

34125, 41253, 12534, 25341, 53412.
Finally, by wasting a letter to enter and complete the cyclic class of 41235, we would encounter the permutations 41235, 12354, 23541, 35412, 54123 in that order. However, from that point there would be no way to waste a single letter to enter a new cyclic class; by appending a 5 we would cycle back to 41235, while appending a 45 would return us to 12345. In general, by following this procedure one would visit n − 1 cyclic classes before reaching a point where wasting a single letter would either cause us to stay in the same cyclic class or to return to the initial permutation. We define the trajectory of the permutation π of length n to consist of the sequence of n(n − 1) permutations visited by following this procedure starting at π, and thus the above sequence of permutations is the trajectory of 12345. We caution the reader that trajectories do not partition the set of permutations; while 23451 lies in the trajectory of 12345, the trajectories of 23451 and of 12345 contain different sets of permutations. As we read a superpermutation from left to right, we keep track of which trajectory we are on. We begin on the trajectory of the first permutation we see in the word. After that, we say that we change trajectories whenever the word deviates from the above pattern of traversing an entire cyclic class, wasting a letter, traversing an entire cyclic class, etc., and the trajectory we change to is the trajectory of the first permutation encountered after a change of trajectories. Changing trajectories obviously requires at least one wasted letter because one must at least change cyclic classes to change trajectories. We view the wasted letter immediately before entering the new trajectory (that is, encountering a new permutation) as the letter wasted to change trajectories. As each trajectory contains n(n − 1) permutations, any n-superpermutation must change trajectory at least (n − 2)! − 1 times, and doing so requires at least (n − 2)! − 1 wasted letters. 
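The procedure that defines a trajectory is short enough to write out. The sketch below assumes permutations are given as strings of distinct digits with n ≥ 3; the names `cyclic_class` and `trajectory` are illustrative. The step from one cyclic class to the next encodes the "waste one letter" move: if the class just completed started at p(1)p(2)···p(n), the next class starts at p(2)···p(n − 1)p(1)p(n), which is how 12345 leads to 23415 in the example.

```python
def cyclic_class(perm):
    """All cyclic rotations of perm, in the order in which a word visits them."""
    return [perm[i:] + perm[:i] for i in range(len(perm))]

def trajectory(perm):
    """The n(n - 1) permutations visited from perm by completing a cyclic class,
    wasting a single letter, completing the next class, and so on."""
    n = len(perm)
    visited, start = [], perm
    for _ in range(n - 1):
        visited.extend(cyclic_class(start))
        # After the last rotation p(n)p(1)...p(n-1), appending p(1) is the wasted
        # letter and appending p(n) then yields the first permutation of the new class.
        start = start[1:n - 1] + start[0] + start[n - 1]
    return visited

t = trajectory("12345")
print(len(t))          # 20 = 5 * 4
print(t[5:10])         # the second cyclic class, starting at 23415
print(set(trajectory("23451")) == set(t))   # False: trajectories do not partition
```

Since each trajectory has n(n − 1) permutations, at least n!/(n(n − 1)) = (n − 2)! trajectories are needed to cover all permutations, which is where the count of (n − 2)! − 1 changes of trajectory comes from.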
To improve on Proposition 4, we now argue as follows. As in the proof of Proposition 4, any n-superpermutation must complete all (n − 1)! cyclic classes, and doing so requires at least (n − 1)! − 1 wasted letters. We view the letter wasted immediately after completing a cyclic class as the letter wasted to leave a completed cyclic class. For example, suppose that our word begins with the prefix

123451234152314.

Thus we begin on the trajectory of 12345. The next four letters, 1234, complete the cyclic class of 12345. The letter immediately after that (the first 1 above) is wasted to leave that completed cyclic class. We then visit the permutations 23415, 34152, and 41523 in that order before wasting another letter (the second 1 above) to change trajectories. Finally, we note that the letters wasted to complete cyclic classes and those wasted to change trajectory must be distinct; indeed, this claim amounts to saying that when one has completed a cyclic class, wasting a single letter does not change trajectories. This completes the proof of the following result.

Theorem 5 (Anonymous 4chan poster). For all n ≥ 1, every n-superpermutation has length at least n! + (n − 1)! + (n − 2)! + n − 3.

Houston has shown (in the Superpermutators group) that the bound in Theorem 5 can be increased by 1. For general n, Theorem 3 and this improvement to Theorem 5
are the best results established so far. There had been some hope in the Superpermutators group that perhaps Egan’s construction could be made one letter shorter for n ≥ 7, while the lower bound could be increased by (n − 3)! − 1, so that the two met at n! + (n − 1)! + (n − 2)! + (n − 3)! + n − 4, but this has also been shown to be false in the n = 7 case. In this case, the original length conjecture of Ashlock and Tillotson suggested that the length of the shortest 7-superpermutation should be 7! + 6! + 5! + 4! + 3! + 2! + 1! = 5913, while Egan’s Theorem 3 gives a 7-superpermutation of length 7! + 6! + 5! + 4! + 4 = 5908. In February 2019, Bogdan Coanda made several theoretical improvements to the computer search for superpermutations and used these to find a 7-superpermutation of length 7! + 6! + 5! + 4! + 3 = 5907, thus matching the wishful thinking above. (Continuing the tradition of “publishing” progress on this problem in unorthodox places, Coanda announced his construction pseudonymously in the comment section of a YouTube video [39] about the problem.) Shortly thereafter, Egan and Houston modified Coanda’s approach to construct a 7-superpermutation of length 7! + 6! + 5! + 4! + 2 = 5906.

3. AS FACTORS, OVER [n + 1] AND P. In moving from the previous universal permutation problem to this one, we see for the first of two times the dramatic effect of adding a letter to the alphabet. Not only does the addition of a single letter seem to significantly shorten the universal words, but it changes the problem from one that remains wide open to one solved a decade ago.

A de Bruijn word of order n over the alphabet [k] is a word w of length $k^n$ such that every word in $[k]^n$ occurs exactly once as a cyclic factor in w, or equivalently, every such word occurs exactly once as a factor in the longer word $w(1)w(2) \cdots w(k^n)\, w(1)w(2) \cdots w(n-1)$.
These words were (mis)named for de Bruijn (see [11]) because in addition to establishing that such words exist, he showed that there are precisely $(k!)^{k^{n-1}}/k^n$ of them. An example of a de Bruijn word, written cyclically, is shown on the left of Figure 1.

In their highly influential 1992 paper, Chung, Diaconis, and Graham [8] explored generalizations of de Bruijn words to other types of objects, including permutations. (In fact, Diaconis and Graham [12, Chapter 4] state that their motivation was a magic trick.) As they defined it, a universal cycle (frequently shortened to ucycle) for the permutations of length n would be a word w of length n! (over some alphabet) such that every permutation of length n is order-isomorphic to a cyclic factor of w, or equivalently, to a factor of the slightly longer word $w(1)w(2) \cdots w(n!)\, w(1)w(2) \cdots w(n-1)$. An example of a universal cycle over [5], written cyclically, for the permutations of length 4 is shown on the right of Figure 1. If such a universal cycle w were to exist (which was the question they were interested in, leaving enumerative concerns for later), then the word $w(1)w(2) \cdots w(n!)\, w(1)w(2) \cdots w(n-1)$ would be, in our terms, a shortest possible answer to the universal permutation problem for factors over the alphabet P. In this way, their universal cycle of length 4! = 24 for the permutations of length 4 shown on the right of Figure 1 is converted (starting at noon and proceeding clockwise) into the universal word

123412534153214532413254 123

of length 4! + 4 − 1 = 27.

Figure 1. On the left, a de Bruijn word of order 3 over the alphabet [3]. On the right, a universal cycle for the permutations of length 4 over the alphabet [5].

Thus, together with the trivial lower bound of n! + n − 1 noted in Observation 1, the answer to the question posed in the upper-right-hand cell of Table 1 is implied by the following result.

Theorem 6 (Chung, Diaconis, and Graham [8]). For all positive integers n, there is a universal cycle over the alphabet [6n] for the permutations of length n.

Chung, Diaconis, and Graham left open the question of whether the alphabet [6n] could be shrunk. Proposition 4 shows that, for n ≥ 3, there cannot be a universal cycle over the alphabet [n] for the permutations of length n. Therefore the result below, established by Johnson in 2009, is best possible.

Theorem 7 (Johnson [26]). For all positive integers n, there is a universal cycle over the alphabet [n + 1] for the permutations of length n.

In terms of universal permutation problems, Theorem 7 establishes that there is a word of length n! + n − 1 over the alphabet [n + 1] that contains every permutation of length n as a factor.

4. AS SUBSEQUENCES, OVER [n]. The universal permutation problem for subsequences over the alphabet [n] pre-dates the others by 20 years. In a 1972 technical report entitled “Selected Combinatorial Research Problems” and edited together with Chvátal and Klarner, Knuth [10, Problem 36] stated the following problem, which he attributed to Richard Karp:

What is the shortest string of {1, 2, . . . , n} containing all permutations on n elements as subsequences? (For n = 3, 1213121; for n = 4, 123412314321; for n = 5, M. Newey claims the shortest has length 19.)
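Both kinds of universality met so far can be verified by brute force on the concrete words quoted above: the length-27 word obtained from the universal cycle (factors, up to order-isomorphism) and the two strings in Knuth's problem statement (literal subsequences over [n]). The helper names below (`digits`, `pattern`, `is_subsequence`, `universal_for_factors`, `universal_for_subsequences`) are assumptions made for this sketch.

```python
from itertools import permutations

def digits(s):
    """Turn a string of digits (spaces ignored) into a tuple of integer letters."""
    return tuple(int(c) for c in s if c != " ")

def pattern(word):
    """Rank pattern of a word; order-isomorphic words have equal patterns."""
    ranks = {letter: r for r, letter in enumerate(sorted(set(word)), start=1)}
    return tuple(ranks[letter] for letter in word)

def is_subsequence(p, w):
    """Greedy left-to-right test that p occurs in w as a subsequence."""
    it = iter(w)
    return all(letter in it for letter in p)

def universal_for_factors(word, n):
    """Every permutation of length n is order-isomorphic to some factor of word."""
    seen = {pattern(word[i:i + n]) for i in range(len(word) - n + 1)}
    return set(permutations(range(1, n + 1))) <= seen

def universal_for_subsequences(word, n):
    """Every permutation of [n] occurs literally as a subsequence of word."""
    return all(is_subsequence(p, word) for p in permutations(range(1, n + 1)))

cdg = digits("123412534153214532413254 123")
print(len(cdg), universal_for_factors(cdg, 4))                   # 27 True
print(universal_for_subsequences(digits("1213121"), 3))          # True
print(universal_for_subsequences(digits("123412314321"), 4))     # True
print(universal_for_subsequences(digits("121312"), 3))           # False: 321 is missing
```

Over the alphabet [n] order-isomorphism reduces to equality, which is why the subsequence check can compare letters directly rather than patterns.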
To this day, the lengths of the shortest universal words in this case are known exactly only for 1 ≤ n ≤ 7. These values were computed by Newey to be 1, 3, 7, 12, 19, 28, and 39 in his 1973 technical report [37], and he observed that this sequence is equal to $n^2 - 2n + 4$ for 3 ≤ n ≤ 7. In fact, Newey gave a construction of universal words of this length for all n ≥ 3, meaning $n^2 - 2n + 4$ is an upper bound on the answer to this universal permutation problem. While Newey remarked that it is an “obvious conjecture” that the length of the shortest universal word in this case is $n^2 - 2n + 4$, he also suggested a competing conjecture that would imply that the lengths grow like $n^2 - n \log_2(n)$. Simpler constructions of universal words of length $n^2 - 2n + 4$ were presented in a 1974 paper of Adleman [1], a 1975 paper of Koutas and Hu [33], and a 1976 paper of Galbiati and Preparata [16]. The latter two constructions were given a common
generalization in the 1980 paper of Mohanty [36]. Interestingly, of these four papers, only Koutas and Hu were bold (or foolish) enough to conjecture that $n^2 - 2n + 4$ is the true answer (it isn’t). After this initial flurry of activity, the problem lay dormant until the surprising 2011 work of Zălinescu [45], who lowered the upper bound by 1 for n ≥ 10, constructing a word of length $n^2 - 2n + 3$ that contains all permutations of length n as subsequences. However, his upper bound stood for just over one year before being improved upon, for n ≥ 11, by the following.

Theorem 8 (Radomirović [40]). For all n ≥ 7, there is a word over the alphabet [n] of length $n^2 - 7n/3 + 19/3$ containing subsequences equal to every permutation of length n.

For a lower bound on the length of a universal word in this context, we briefly present the elementary proof given by Kleitman and Kwiatkowski [31]. Let $w \in [n]^*$ be a word that contains each permutation of length n as a subsequence. Choose π(1) to be the symbol whose earliest occurrence in w is as late as possible, and note that this occurrence may not appear before w(n). Next, choose π(2) to be the symbol whose earliest occurrence after π(1) in w is as late as possible, and note that this occurrence must be at least n − 1 symbols later. Then, choose π(3) to be the symbol whose earliest occurrence after π(1)π(2) first appears as a subsequence in w is as late as possible, and note that this means that π(3) must occur at least n − 2 symbols later. Continuing in this manner, we construct a permutation π whose earliest possible occurrence in w requires at least

$$n + (n - 1) + \cdots + 2 + 1 = \frac{n^2 + n}{2}$$

symbols. Kleitman and Kwiatkowski [31] go on to prove (via a delicate inductive argument) a lower bound of $n^2 - c_\varepsilon n^{(7/4)+\varepsilon}$, where the constant $c_\varepsilon$ depends on $\varepsilon$. While this later bound lacks concreteness, it does establish that the lengths of the shortest universal words in this case are asymptotic to $n^2$.

5. AS SUBSEQUENCES, OVER [n + 1]. As in the factor case, by adding a single symbol to our alphabet, we again see a dramatic decrease in the length of the shortest universal word. To date, this version of the problem has only been studied implicitly, in the 2009 work of Miller [35], where she established the following bound.

Theorem 9 (Miller [35]). For all n ≥ 1, there is a word over the alphabet [n + 1] of length $(n^2 + n)/2$ containing subsequences order-isomorphic to every permutation of length n.

To establish this result, define the infinite zigzag word to be the word formed by alternating between ascending runs of the odd positive integers 1357 · · · and descending runs of the even positive integers · · · 8642,

1357 · · · · · · 8642 1357 · · · · · · 8642 1357 · · · · · · 8642 · · · .

While this object does not conform to most definitions of the word word in combinatorics, we hope the reader forgives us the slight expansion of the definition adopted here. We are interested in the leftmost embeddings of words over P into the infinite zigzag word.
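The word behind Theorem 9 is easy to generate: take the first n runs of the infinite zigzag word and keep only the letters of [n + 1]. The sketch below builds that word and checks, for small n, that every permutation π of length n has either π itself or the word obtained by adding 1 to each letter as a literal subsequence. The function names are illustrative, and the check is a finite experiment for n ≤ 6, not a proof.

```python
from itertools import permutations

def zigzag_word(n):
    """First n runs of the infinite zigzag word, restricted to the alphabet [n + 1]."""
    odds = list(range(1, n + 2, 2))           # ascending odd letters <= n + 1
    evens = list(range(2, n + 2, 2))[::-1]    # descending even letters <= n + 1
    word = []
    for run in range(n):
        word.extend(odds if run % 2 == 0 else evens)
    return word

def is_subsequence(p, w):
    """Greedy left-to-right test that p occurs in w as a subsequence."""
    it = iter(w)
    return all(letter in it for letter in p)

def zigzag_is_universal(n):
    """Each permutation, or its shift by +1, is a subsequence of zigzag_word(n)."""
    w = zigzag_word(n)
    return all(is_subsequence(p, w) or is_subsequence([x + 1 for x in p], w)
               for p in permutations(range(1, n + 1)))

for n in range(1, 7):
    w = zigzag_word(n)
    print(n, len(w), n * (n + 1) // 2, zigzag_is_universal(n))
```

For n = 5 the generated word is 135 642 135 642 135, of length 15 over the alphabet [6].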
We also need two definitions. First, given a word $p \in P^*$, we define the word $p^{+1} \in P^*$ to be the word formed by adding 1 to each letter of p, so $p^{+1}(i) = p(i) + 1$ for all indices i of p. Next we say that the word $p \in P^*$ has an immediate repetition if there is an index i with p(i) = p(i + 1), i.e., if p contains a factor equal to $\ell\ell$ for some letter $\ell \in P$.
Proposition 10. If the word $p \in P^n$ has no immediate repetitions, then either p or $p^{+1}$ occurs as a subsequence of the first n runs of the infinite zigzag word.

Before proving Proposition 10, note that permutations do not have immediate repetitions. Thus if π is a permutation of length n, Proposition 10 implies that either π or $\pi^{+1}$ occurs as a subsequence in the first n runs of the infinite zigzag word. Since $\pi^{+1}$ is order-isomorphic to π and both π and $\pi^{+1}$ are words over [n + 1], this implies that the restriction of the first n runs of the infinite zigzag word to the alphabet [n + 1] contains every permutation of length n. For example, in the case of n = 5 we obtain the universal word

135 642 135 642 135

of length 15 over the alphabet [6]. The restriction of the infinite zigzag word described above consists of n runs of average length (n + 1)/2: if n is odd, then all runs are of this length, while if n is even, then half are of length n/2 and half are of length (n + 2)/2. Thus Proposition 10 implies Theorem 9. While Proposition 10 does not appear explicitly in Miller [35], its proof, presented below, is adapted from her proof of Theorem 9.

Proof of Proposition 10. We define the score of the word $p \in P^*$, denoted by s(p), as the minimum number of runs that an initial segment of the infinite zigzag word must have in order to contain p, minus the length of p. Thus our goal is to show that for every word $p \in P^*$ without immediate repetitions, either $s(p) \le 0$ or $s(p^{+1}) \le 0$. In fact, we show that for such words we have $s(p) + s(p^{+1}) = 1$, which implies this.

We prove this claim by induction on the length of p. For the base case, we see that words consisting of a single odd letter are contained in the first run of the infinite zigzag word (thus corresponding to scores of 0) while words consisting of a single even letter are contained in the second run (corresponding to scores of 1). Thus for every $\ell \in P^1$ we have $s(\ell) + s(\ell^{+1}) = 1$, as desired.
Now suppose that the claim is true for all words p ∈ P^n without immediate repetitions and let ℓ ∈ P denote a letter. We see that, for any p ∈ P^n,

\[
s(p\ell) - s(p) =
\begin{cases}
-1 & \text{if } p(n) < \ell \text{ and both entries are odd, or if } p(n) > \ell \text{ and both entries are even;} \\
0 & \text{if } p(n) \text{ and } \ell \text{ are of different parity; or} \\
+1 & \text{if } p(n) < \ell \text{ and both entries are even, if } p(n) = \ell, \text{ or if } p(n) > \ell \text{ and both entries are odd.}
\end{cases}
\]

Because our words do not have immediate repetitions, we can ignore the possibility that ℓ = p(n). In the other cases, it can be seen by inspection that

\[
s(p\ell) - s(p) + s\bigl((p\ell)^{+1}\bigr) - s\bigl(p^{+1}\bigr) = 0.
\]

By rearranging these terms, we see that

\[
s(p\ell) + s\bigl((p\ell)^{+1}\bigr) = s(p) + s\bigl(p^{+1}\bigr).
\]
Since s(p) + s(p^{+1}) = 1 by induction, this completes the proof of the inductive claim, and thus also of the proposition.

We conclude our consideration of this case by providing a lower bound. Suppose that the word w over the alphabet [n + 1] contains subsequences order-isomorphic to every permutation of length n. For each letter ℓ ∈ [n + 1], let r_ℓ denote the number of occurrences of the letter ℓ in w. To create a subsequence of w that is order-isomorphic to a permutation, we must choose a letter of the alphabet [n + 1] to omit and then choose precisely one occurrence of each of the other letters. Thus the number of permutations that can be contained in w is at most

\[
\sum_{\ell \in [n+1]} r_1 \cdots r_{\ell-1} r_{\ell+1} \cdots r_{n+1} = r_1 \cdots r_{n+1} \sum_{\ell \in [n+1]} \frac{1}{r_\ell}.
\]

Setting m = |w| = Σ r_ℓ, we see that the above quantity attains its maximum over all (r_1, . . . , r_{n+1}) ∈ R^{n+1}_{≥0} when each r_ℓ is equal to m/(n + 1), and in that case the number of permutations contained in w is at most

\[
(n+1) \left( \frac{m}{n+1} \right)^{n}.
\]
If w is to contain all permutations of length n, then this quantity must be at least n!. Using the fact that k! ≥ (k/e)^k for all k, we must therefore have

\[
(n+1) \left( \frac{m}{n+1} \right)^{n} \ge n! \ge \left( \frac{n}{e} \right)^{n}.
\]

It follows that, asymptotically, we must have m ≥ n^2/e.

6. AS SUBSEQUENCES, OVER P. For the final cell of Table 1, we seek a word over the positive integers P that contains all permutations of length n as subsequences. As remarked upon in the Introduction, this is equivalent to seeking a permutation that contains all permutations of length n, and such a permutation is sometimes called an n-superpattern (e.g., by Bóna [7, Chapter 5, Exercises 19–22 and Problems Plus 9–12]).

The first result about universal permutations of this type was obtained by Simion and Schmidt in 1985 [43, Section 5]. They computed the number of 3-universal permutations of length m ≥ 5 to be

\[
m! - 6C_m + 5 \cdot 2^m + 4\binom{m}{2} - 2F_m - 14m + 20.
\]

(Here C_m denotes the mth Catalan number and F_m denotes the mth combinatorial Fibonacci number, so F_0 = F_1 = 1 and F_m = F_{m−1} + F_{m−2} for m ≥ 2.) However, the first to study this version of the universal permutation problem for general n was Arratia [4] in 1999.

As our alphabet has only expanded from the version of the problem discussed in the previous section, the upper bound of (n^2 + n)/2 established in Theorem 9 also holds for the version of the problem discussed in this section. It should be noted that before Miller [35] established Theorem 9 in 2009, Eriksson, Eriksson, Linusson, and Wästlund [15] had established an upper bound for this problem asymptotically equal to 2n^2/3.

Here, we give a new improvement to Miller’s upper bound. In order to do so, we further restrict the infinite zigzag word, and then break ties between its letters to obtain
a specific permutation ζ_n. To this end, we define the word z_n to be the restriction of the first n runs of the infinite zigzag word to the alphabet [n]. When n is even, each run of z_n has length n/2. When n is odd, z_n consists of (n + 1)/2 ascending odd runs, each of length (n + 1)/2, and (n − 1)/2 descending even runs, each of length (n − 1)/2. Thus we have

\[
|z_n| =
\begin{cases}
\dfrac{n^2}{2} & \text{if } n \text{ is even,} \\[1ex]
\dfrac{n^2 + 1}{2} & \text{if } n \text{ is odd.}
\end{cases}
\]

Next we choose a specific permutation, ζ_n, such that z_n is order-homomorphic to ζ_n. Recall that this means that for all indices i and j,

z_n(i) > z_n(j) ⟹ ζ_n(i) > ζ_n(j).

In constructing ζ_n, we have the freedom to break ties between equal letters of z_n. That is to say, if z_n(i) = z_n(j) for i ≠ j, then in constructing ζ_n we may choose whether ζ_n(i) < ζ_n(j) or ζ_n(i) > ζ_n(j) arbitrarily without affecting any other pair of comparisons and thus without losing any occurrences of permutations. We choose to break these ties by replacing all instances of a given letter k ∈ [n] in z_n by a decreasing subsequence in ζ_n. Thus for indices i < j, we have

z_n(i) = z_n(j) ⟹ ζ_n(i) > ζ_n(j).

This choice uniquely determines ζ_n (up to order-isomorphism), as all comparisons between its letters are determined either in z_n, if the corresponding letters of z_n differ, or by the rule above, if the corresponding letters of z_n are the same. Figure 2 shows the plots of z_5 and ζ_5, where the plot of a word w over P is the set {(i, w(i))} of points in the plane.
Figure 2. On the left, the further restriction we define of the infinite zigzag word, z_5. On the right, the normalized permutation formed by breaking ties, ζ_5.
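The construction of z_n and ζ_n is easy to carry out by machine. The following minimal Python sketch assumes that the infinite zigzag word alternates ascending runs of odd letters with descending runs of even letters, starting with an odd run; the function names zigzag_restriction and break_ties are hypothetical. Ties are broken exactly as described above, so that equal letters of z_n become a decreasing subsequence.

```python
def zigzag_restriction(n):
    """z_n: the first n runs of the infinite zigzag word, restricted to [n]."""
    odd_run = list(range(1, n + 1, 2))                          # ascending odd letters
    even_run = [k for k in range(n, 0, -1) if k % 2 == 0]       # descending even letters
    word = []
    for run in range(1, n + 1):
        word.extend(odd_run if run % 2 == 1 else even_run)
    return word


def break_ties(z):
    """Permutation zeta with z order-homomorphic to zeta; equal letters decrease."""
    # Smaller letters receive smaller values; among equal letters, the
    # later position receives the smaller value.
    order = sorted(range(len(z)), key=lambda i: (z[i], -i))
    zeta = [0] * len(z)
    for rank, i in enumerate(order, start=1):
        zeta[i] = rank
    return zeta


z5 = zigzag_restriction(5)
print(z5)
print(break_ties(z5))
```

Sorting positions by the key (letter, −position) is one convenient way to realize the tie-breaking rule; any method that produces the same relative order would serve equally well.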
In the following sequence of results, we show that ζ_n is almost universal. In fact, we show that ζ_n fails to be universal only for even n, and in that case, the only missing permutation is the decreasing permutation n · · · 21. The first of these results, Proposition 11, covers almost all permutations. (In fact, Proposition 12 shows that Proposition 11 handles all but 2^{n−1} permutations of length n.)
We say that two entries π(j) and π(k) form an inverse-descent if j < k and π(j) = π(k) + 1. (As the name is meant to indicate, if a pair of entries forms an inverse-descent in π, then the corresponding entries of π^{−1} form a descent.) If π(j) and π(k) form an inverse-descent and they are not adjacent in π (so k ≥ j + 2), then we say that they form a distant inverse-descent.

Proposition 11. If the permutation π of length n has a distant inverse-descent, then ζ_n contains a subsequence order-isomorphic to π.

Proof. Suppose that the entries π(a) and π(b) form a distant inverse-descent in π, meaning that π(a) = π(b) + 1 and b ≥ a + 2. We define the word p ∈ [n − 1]^n by

\[
p(i) =
\begin{cases}
\pi(i) & \text{if } \pi(i) \le \pi(b), \\
\pi(i) - 1 & \text{if } \pi(i) \ge \pi(a) = \pi(b) + 1.
\end{cases}
\]

The word p has two occurrences of the letter π(b), but because π(a) and π(b) form a distant inverse-descent, these two occurrences of π(b) in p do not constitute an immediate repetition. Thus, Proposition 10 shows that either p or p^{+1} occurs as a subsequence in the first n runs of the infinite zigzag word. As p and p^{+1} are both words over [n], whichever of these words occurs in the first n runs of the infinite zigzag word also occurs as a subsequence of z_n.

Suppose that this subsequence occurs in the indices 1 ≤ i_1 < i_2 < · · · < i_n ≤ |z_n|, so z_n(i_1) z_n(i_2) · · · z_n(i_n) is equal to either p or p^{+1}, and thus for j, k ∈ [n] we have

z_n(i_j) > z_n(i_k) ⟺ p(j) > p(k).

Because z_n is order-homomorphic to ζ_n, this implies that for all pairs of indices j, k ∈ [n] except the pair {a, b}, we have

ζ_n(i_j) > ζ_n(i_k) ⟺ p(j) > p(k) ⟺ π(j) > π(k).

Furthermore, since p(a) = p(b), we have z_n(i_a) = z_n(i_b), and so by our construction of ζ_n it follows that ζ_n(i_a) > ζ_n(i_b), while we know that π(a) > π(b) because those entries form an inverse-descent. This verifies that ζ_n(i_1) ζ_n(i_2) · · · ζ_n(i_n) is order-isomorphic to π, completing the proof.
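The first step of this proof, locating a distant inverse-descent and collapsing it into a repeated letter, can be sketched in a few lines of Python. The names find_distant_inverse_descent and collapse are hypothetical, and positions are 0-based in the code even though the text indexes from 1. The loop at the end counts the permutations of each small length that have no distant inverse-descent, which the text states number 2^{n−1}.

```python
from itertools import permutations


def find_distant_inverse_descent(pi):
    """Return 0-based positions (a, b) with pi[a] == pi[b] + 1 and b >= a + 2, else None."""
    position = {value: index for index, value in enumerate(pi)}
    for value in range(2, len(pi) + 1):
        a, b = position[value], position[value - 1]
        if b >= a + 2:
            return a, b
    return None


def collapse(pi, b):
    """The word p of the proof: every entry larger than pi[b] is decremented."""
    return [value if value <= pi[b] else value - 1 for value in pi]


pi = [3, 1, 4, 2]
a, b = find_distant_inverse_descent(pi)
p = collapse(pi, b)
print((a, b), p)   # p repeats the letter pi[b], but not in adjacent positions

for n in range(1, 7):
    count = sum(1 for q in permutations(range(1, n + 1))
                if find_distant_inverse_descent(q) is None)
    print(n, count)
```

For π = 3142 the entries 3 and 2 form the distant inverse-descent, and collapsing them gives a word over [3] with no immediate repetition, to which Proposition 10 applies.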
To describe the permutations that Proposition 11 does not apply to, we need the notions of sums of permutations and layered permutations. Given permutations π and σ of respective lengths m and n, their (direct) sum is the permutation π ⊕ σ of length m + n defined by

\[
(\pi \oplus \sigma)(i) =
\begin{cases}
\pi(i) & \text{if } 1 \le i \le m, \\
\sigma(i - m) + m & \text{if } m + 1 \le i \le m + n.
\end{cases}
\]

Pictorially, the plot of π ⊕ σ then consists of the plot of σ placed above and to the right of the plot of π, as shown on the left of Figure 3. A permutation is said to be layered if it can be expressed as a sum of decreasing permutations, and in this case, these decreasing permutations are themselves called the layers. An example of a layered permutation is shown on the right of Figure 3.

Proposition 12. The permutation π is layered if and only if it does not have a distant inverse-descent.

Proof. One direction is completely trivial: if π is layered then all of its inverse-descents are between consecutive entries, so it does not have a distant inverse-descent. For the other direction we use induction on the length of π. The empty permutation
Figure 3. The plot of the sum of π and σ is shown on the left. The figure on the right shows the plot of the layered permutation 21 3 654 87 with layer lengths 2, 1, 3, 2.
is layered, so the base case holds. If π is a nonempty permutation without distant inverse-descents, then it must begin with the entries π(1), π(1) − 1, . . . , 2, 1 in that order. This means that π = δ ⊕ σ where δ is a nonempty decreasing permutation and σ is a permutation shorter than π that also does not have any distant inverse-descents. By induction, σ is layered, and thus π is as well, completing the proof.

Having characterized the permutations to which Proposition 11 does not apply, we now show that almost all of them are nevertheless contained in ζ_n.

Proposition 13. If the permutation π of length n is layered and not a decreasing permutation of even length, then ζ_n contains a subsequence order-isomorphic to π.

Proof. Let π denote an arbitrary layered permutation of length n. To prove the result, we compute the score of π as in the proof of Proposition 10, show that this score can only take on the values 0 or ±1, and then describe an alternative embedding of π in ζ_n in the case where the score of π is 1, except when π is a decreasing permutation of even length.

Recall that the score of any word π, s(π), is defined as the number of initial runs of the infinite zigzag word necessary to contain π minus the length of π. As observed in the proof of Proposition 10, the score of a word does not change upon reading a letter of opposite parity. This implies that, while reading a layered permutation, the score changes only when transitioning from one layer to the next, and thus we compute the score of π layer-by-layer.

The change in score when moving from one layer of π to the next is determined by the parity of the last entry of the layer we are leaving and the first entry of the layer we are entering. Specifically, the score changes by −1 if both of these entries are odd and +1 if both are even. This shows that in order to compute the score of the layered permutation π, we simply need to know the parities of the first and last entries of each of its layers.
This information is represented by the labels of the nodes of the directed graph shown in Figure 4.
Figure 4. A directed graph describing the scoring of a layered permutation.
Moreover, not all transitions between these nodes are possible, because the last entry of a layer is precisely 1 greater than the first entry of the preceding layer. This is why there are only eight edges shown in Figure 4. In this figure, each of those edges
is labeled by the change in the score function. Note that the first layer must end with 1 (an odd entry), and its first entry must be either odd (for a score of 0) or even (for a score of 1); this is equivalent to starting our walk on the graph in Figure 4 at the node labeled (even–even) before any layers are read.

From this graphical interpretation of the scoring process, it is apparent that the score of a layered permutation can take on only three values: −1 if it ends at the node (odd–even); 0 if it ends at either node (even–even) or (odd–odd); or 1 if it ends at the node (even–odd). Except in this final case, we are done.

Now suppose that we are in the final case, so the ultimate layer of π is of (even–odd) type. The first entry of this layer is the greatest entry of π, so we know that π has even length. If π were a decreasing permutation then there would be nothing to prove (as we have not claimed anything in this case), so let us further suppose that π is not a decreasing permutation, and thus that π has at least two layers. We further divide this case into two cases. In both cases, as in the proof of Proposition 11, we construct a word p ∈ [n − 1]^n such that if z_n contains p, then ζ_n contains π.

First, suppose that the penultimate layer of π is of (even–odd) type and that this layer begins with the entry π(b). This implies that the penultimate layer of π has at least two entries (because its first and last entries have different parities). In this case, we define p by

\[
p(i) =
\begin{cases}
\pi(i) & \text{if } \pi(i) < \pi(b), \\
\pi(i) - 1 & \text{if } \pi(i) \ge \pi(b).
\end{cases}
\]
In other words, to form p from π we decrement the first entry of the penultimate layer and all entries of the ultimate layer. Because the penultimate layer of π has at least two entries, performing this operation creates an immediate repetition (of the entry π(b) − 1) at the beginning of this layer. For example, if π = 21 6543 87 then π(b) = 6 and we decrement the 6, 8, and 7 to obtain the word p = 21 5543 76. As with our previous constructions, if z_n contains an occurrence of p, then ζ_n will contain a copy of π.

We establish that z_n contains p by showing that s(p) = 0, which requires a further bifurcation into subcases. In both subcases, the scoring of p is computed by considering its score in the antepenultimate layer (the layer immediately before the penultimate layer), the score change when reading the newly decremented first entry of the penultimate layer, the score penalty of +1 because p contains an immediate repetition (namely, π(b) − 1 occurs twice in a row), and finally the score change between the penultimate and ultimate layers. We label these subcases by the final three nodes of the directed graph from Figure 4 visited while computing the score of π.

• The final three layers are of type (even–even)(even–odd)(even–odd). Note that this case includes the possibility that π has only two layers. If p has an antepenultimate layer, then the score while reading that layer is 0 and the ascent between its last entry and the newly decremented first entry of the penultimate layer is of different parity (even to odd), contributing 0 to the score. If p does not have an antepenultimate layer, then p begins with the newly decremented first entry of its penultimate layer, which contributes 0 to the score. In either case, the score of p is 0 upon reading the first entry of the penultimate layer. The immediate repetition in the penultimate layer contributes +1 to the score, while the ascent between the last entry of the penultimate layer and the newly decremented first entry of the ultimate layer is between two odd entries and thus contributes −1, so s(p) = 0.

• The final three layers are of type (even–odd)(even–odd)(even–odd). The score while reading the antepenultimate layer is +1. The ascent between the last entry of the antepenultimate layer and the newly decremented first entry of the penultimate layer is between two odd entries, so it contributes −1 to the score, the immediate repetition in the penultimate layer contributes +1, and the ascent between the last entry of the penultimate layer and the newly decremented first entry of the ultimate layer is again between two odd entries and thus contributes −1, so s(p) = 0.
It remains to treat the case where the penultimate layer is of (even–even) type. Note that this case includes the possibility that the penultimate layer consists of a single entry. Suppose that the penultimate layer ends with the entry π(a). We define p by

\[
p(i) =
\begin{cases}
\pi(i) & \text{if } \pi(i) < \pi(a) \text{ or } \pi(i) = n, \\
\pi(i) + 1 & \text{if } \pi(i) \ge \pi(a) \text{ and } \pi(i) \ne n.
\end{cases}
\]

Thus in forming p from π we increment all entries of the penultimate layer and all but the first entry of the ultimate layer. For example, if π = 21 3 654 87, then we increment the 6, 5, 4, and 7 to obtain the word p = 21 3 765 88. As before, if z_n contains an occurrence of p then ζ_n will contain a copy of π. Thus we need only show that s(p) = 0, which we do, as in the previous case, by considering the scoring of the final three layers. As in that case, we identify two subcases.

• The final three layers are of type (odd–even)(even–even)(even–odd). The score while reading the antepenultimate layer is −1. The ascent between the last entry of the antepenultimate layer and the newly incremented first entry of the penultimate layer is of different parity (even to odd) and thus contributes 0 to the score. The ascent between the newly incremented last entry of the penultimate layer and the first entry of the ultimate layer (which is n) is of different parity (odd to even) and thus contributes 0 to the score. Finally, the immediate repetition at the beginning of the ultimate layer (the two entries equal to n) contributes +1 to the score, so s(p) = 0.

• The final three layers are of type (odd–odd)(even–even)(even–odd). The score while reading the antepenultimate layer is 0. The ascent between the last entry of the antepenultimate layer and the newly incremented first entry of the penultimate layer contributes −1 to the score (as both entries are now odd). The ascent between the newly incremented last entry of the penultimate layer and the first entry of the ultimate layer (which is n) is of different parity (odd to even) and thus contributes 0 to the score. Finally, the immediate repetition at the beginning of the ultimate layer contributes +1 to the score, so s(p) = 0.

As we have considered all of the cases, the proof is complete.
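The conclusions of Propositions 11 and 13 can be confirmed by brute force for small n. The sketch below is self-contained and makes the same assumption about the infinite zigzag word as before (ascending odd runs alternating with descending even runs, starting with an odd run); the names zeta, pattern, and missing_patterns are hypothetical. It enumerates every length-n subsequence of ζ_n, so it is only practical for small n.

```python
from itertools import combinations, permutations


def zeta(n):
    """Build z_n and break ties so that equal letters form a decreasing subsequence."""
    odd_run = list(range(1, n + 1, 2))
    even_run = [k for k in range(n, 0, -1) if k % 2 == 0]
    z = []
    for run in range(1, n + 1):
        z.extend(odd_run if run % 2 == 1 else even_run)
    order = sorted(range(len(z)), key=lambda i: (z[i], -i))
    result = [0] * len(z)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def pattern(values):
    """The permutation order-isomorphic to a sequence of distinct values."""
    ranks = {v: r for r, v in enumerate(sorted(values), start=1)}
    return tuple(ranks[v] for v in values)


def missing_patterns(perm, n):
    """Permutations of length n that do not occur as a pattern in perm."""
    found = {pattern(sub) for sub in combinations(perm, n)}
    return sorted(set(permutations(range(1, n + 1))) - found)


for n in range(1, 6):
    print(n, len(zeta(n)), missing_patterns(zeta(n), n))

# Prepending a new maximum entry repairs the even case.
z4 = zeta(4)
print(missing_patterns([len(z4) + 1] + z4, 4))
```

For even n the only permutation reported missing should be the decreasing one, in line with the statement preceding Proposition 11.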
It remains only to conclude. The length of ζ_n is (n^2 + 1)/2 when n is odd and n^2/2 when n is even. When n is odd, we have established that ζ_n is universal. When n is even, however, Proposition 13 does not guarantee that ζ_n is universal. (Indeed, it can be checked that ζ_n is not universal when n is even.) In this case we nevertheless know that ζ_n contains the decreasing permutation (n − 1) · · · 21 (for instance because it contains the permutation (n − 1) · · · 21 ⊕ 1). Thus we obtain a universal permutation by prepending a new maximum entry to ζ_n, giving us the following bound.

Theorem 14. There is a word over P of length ⌈(n^2 + 1)/2⌉ containing subsequences order-isomorphic to every permutation of length n.

A computer search reveals that the bound in Theorem 14 is best possible for n ≤ 5. Alas, for n = 6 the bound in Theorem 14 is 19, but Arnar Arnarson [private
communication] has found that the permutation

6 14 10 2 13 17 5 8 3 12 9 16 1 7 11 4 15

of length 17 is universal for the permutations of length 6. Computations have shown that no shorter permutation is universal for the permutations of length 6.

The best lower bound in this case is still the one given by Arratia [4] in his initial work on the problem. Note that if the word w of length m over the alphabet P is to contain subsequences order-isomorphic to each permutation of length n, then we must have

\[
\binom{m}{n} \ge n!.
\]

As in the analysis of the lower bound of the previous section, using the fact that k! ≥ (k/e)^k for all k, we see that for the above inequality to hold we must have

\[
\left( \frac{me}{n} \right)^{n} \ge \binom{m}{n} \ge n! \ge \left( \frac{n}{e} \right)^{n},
\]

from which it follows that we must have m ≥ n^2/e^2. In fact, Arratia [4, Conjecture 2] conjectured that the length of the shortest universal permutation in this case is asymptotic to n^2/e^2.

7. FURTHER VARIATIONS. In case the infinitely many problems introduced so far are not enough, we conclude by briefly describing further variants that have received attention.

1. As observed in Section 2, there is no universal cycle over the alphabet [n] for the permutations of length n. However, Jackson [25] proved that there is a universal cycle over the alphabet [n] for all shorthand encodings of permutations of length n, where the shorthand encoding of the permutation π of length n is the word π(1) · · · π(n − 1). This result and some extensions are discussed in [32, Section 7.2.1.2, Exercises 111–113], where Knuth asked for an explicit construction of such a universal cycle (Jackson’s proof was nonconstructive). Knuth’s request was answered by Ruskey and Williams [41]. Further constructions have been given by Holroyd, Ruskey, and Williams [21, 22].

2. Gupta [19] considered a subsequence version of a universal cycle for permutations.
A rosary is a word w over the alphabet [n] such that every permutation of length n is contained as a subsequence of the word w(k)w(k + 1) · · · w(|w|)w(1)w(2) · · · w(k − 1) for some value of k. In other words, thinking of the letters as being arranged in a circle as on the left of Figure 5, we may start anywhere we like, but must traverse the rosary clockwise, and cannot return to where we started. Gupta conjectured that one could always construct a rosary of length at most n^2/2. This conjecture was discussed by Guy [20, Problem E22] and proved in the case where n is even by Lecouturier and Zmiaikou [34]. Gupta also considered the variant where one is allowed to traverse the rosary both clockwise and counterclockwise (see the right of Figure 5); he conjectured that one can always construct a rosary of length at most 3n^2/8 + 1/2 in this version of the problem.
Figure 5. Two rosaries presented by Gupta for permutations of length 6. The rosary on the left may only be read clockwise, while the rosary on the right may be read either clockwise or counterclockwise.
3. Albert and West [3] studied the existence of universal cycles in the sense of Section 3 for permutation classes, making no restrictions on the size of the alphabet. To describe their results, we define a partial order on the set of all finite permutations where σ ≤ π if π contains a subsequence that is order-isomorphic to σ. If σ ≰ π then we say that π avoids σ. A permutation class is a set closed downward in this order. Every permutation class can be specified by giving the set of minimal elements not in the class (this set is called the basis of the class), and when presented in this form, we use the notation Av(B) = {π : π avoids all β ∈ B}. Most of Albert and West’s results are negative in nature, but some classes they consider, such as Av(132, 312), do have universal cycles over the alphabet P. They say that a permutation class with such a universal cycle is value cyclic.

4. At the end of his paper, Arratia [4] defines t(n) to be the least integer m such that at least half of all permutations of length m contain subsequences order-isomorphic to every permutation of length n, and he states that Noga Alon has conjectured that t(n) is asymptotic to n^2/4. Figure 6 plots the proportions of these permutations of lengths 0 ≤ m ≤ 40 for n = 3, 4, 5, and 6. For n = 3, we compute these proportions exactly using the formula of Simion and Schmidt [43] mentioned at the beginning of Section 6, while for n ≥ 4, these plots are obtained by random sampling to a high level of confidence. This data
Figure 6. The proportion of permutations containing subsequences order-isomorphic to every permutation of length n, by length, for 3 ≤ n ≤ 6.
and further computations suggest the following values of t(n) for 1 ≤ n ≤ 8:

n      1   2   3   4   5   6   7   8
t(n)   1   3   7   13  20  28  36  48

While the first six values of t(n) above might lead the reader to suspect that t(n) is the nearest integer to πn^2/4, this seems not to hold for n = 7, 8. We leave it to the reader to decide whether these values support or undermine Alon’s conjecture that t(n) ∼ n^2/4.

5. Universal words over P containing, as subsequences, all permutations of length n from a proper permutation class have also been studied. Bannister, Cheng, Devanny, and Eppstein [6] construct a universal word of length n^2/4 + Θ(n) for the permutations of length n in the class Av(132), and they show that every proper subclass C ⊊ Av(132) has a universal word of length at most O(n log^{O(1)} n). In [6], among other results, Bannister, Devanny, and Eppstein find a universal word of length at most 22n^{3/2} + Θ(n) for the class Av(321). Finally, Albert, Engen, Pantone, and Vatter [2] consider the class of layered permutations, Av(231, 312). In addition to verifying a conjecture of Gray [17], they show that the length of the shortest universal word over P containing all layered permutations of length n as subsequences is given precisely by a(0) = 0 and

\[
a(n) = (n+1) \lceil \log_2 (n+1) \rceil - 2^{\lceil \log_2 (n+1) \rceil} + 1
\]

for n ≥ 1.

Added in page proofs. Chroman, Kwan, and Singhal [9] have established that the lower bound given at the end of Section 5 is certainly not the correct answer for that version of the problem, and have established a lower bound of 1.000076n^2/e^2 for the version of the problem discussed in Section 6, thereby refuting Arratia’s conjecture.

ACKNOWLEDGMENTS. We thank Michael Albert, Arnar Arnarson, Robert Brignall, Robin Houston, and Jay Pantone for numerous fruitful discussions that improved this work. We are additionally grateful to Jay Pantone for his assistance in verifying that no permutation of length 16 or less contains all permutations of length 6 as subsequences.
REFERENCES
[1] Adleman, L. (1974). Short permutation strings. Discrete Math. 10(2): 197–200. doi.org/10.1016/0012-365X(74)90116-2
[2] Albert, M., Engen, M., Pantone, J., Vatter, V. (2018). Universal layered permutations. Electron. J. Combin. 25(3): Paper #P3.23, 5 pp. doi.org/10.37236/7386
[3] Albert, M., West, J. (2009). Universal cycles for permutation classes. Discrete Math. Theor. Comput. Sci. Proc. AK: 39–50. dmtcs.episciences.org/2727
[4] Arratia, R. (1999). On the Stanley–Wilf conjecture for the number of permutations avoiding a given pattern. Electron. J. Combin. 6: Note 1, 4 pp. doi.org/10.37236/1477
[5] Ashlock, D., Tillotson, J. (1993). Construction of small superpermutations and minimal injective superstrings. Congr. Numer. 93: 91–98.
[6] Bannister, M., Cheng, Z., Devanny, W., Eppstein, D. (2014). Superpatterns and universal point sets. J. Graph Algorithms Appl. 18(2): 177–209. doi.org/10.7155/jgaa.00318
[7] Bóna, M. (2012). Combinatorics of Permutations, 2nd ed. Boca Raton, FL: CRC Press.
[8] Chung, F., Diaconis, P., Graham, R. (1992). Universal cycles for combinatorial structures. Discrete Math. 110(1–3): 43–59. doi.org/10.1016/0012-365X(92)90699-G
[9] Chroman, Z., Kwan, M., Singhal, M. (2020). Lower bounds for superpatterns and universal sequences. arxiv.org/abs/2004.02375
[10] Chvátal, V., Klarner, D., Knuth, D. (1972). Selected combinatorial research problems. Technical Report STAN-CS-72-292. Stanford University. i.stanford.edu/TR/CS-TR-72-292.html
[11] de Bruijn, N. (1975). Acknowledgement of priority to C. Flye Sainte-Marie on the counting of circular arrangements of 2^n zeros and ones that show each n-letter word exactly once. Technical Report T.H.-Report 75-WSK-06. Technische Hogeschool Eindhoven Nederland. pure.tue.nl/ws/files/1875614/252901.pdf
[12] Diaconis, P., Graham, R. (2012). Magical Mathematics: The Mathematical Ideas That Animate Great Magic Tricks. Princeton, NJ: Princeton Univ. Press.
[13] Egan, G. (1994). Permutation City. London: Orion Books Ltd.
[14] Egan, G. (2018). Superpermutations. gregegan.net/SCIENCE/Superpermutations/Superpermutations.html
[15] Eriksson, H., Eriksson, K., Linusson, S., Wästlund, J. (2007). Dense packing of patterns in a permutation. Ann. Comb. 11(3–4): 459–470. doi.org/10.1007/s00026-007-0329-7
[16] Galbiati, G., Preparata, F. (1976). On permutation-embedding sequences. SIAM J. Appl. Math. 30(3): 421–423. doi.org/10.1137/0130040
[17] Gray, D. (2015). Bounds on superpatterns containing all layered permutations. Graphs Combin. 31(4): 941–952. doi.org/10.1007/s00373-014-1429-x
[18] Griggs, M. (2018). An anonymous 4chan post could help solve a 25-year-old math mystery. The Verge, October 24.
theverge.com/2018/10/24/18019464/4chan-anon-anime-haruhi-math-mystery
[19] Gupta, H. (1981). On permutation-generating strings and rosaries. In: Rao, S., ed. Combinatorics and Graph Theory. Lecture Notes in Math., Vol. 885. Berlin: Springer-Verlag, pp. 272–275. doi.org/10.1007/BFb0092270
[20] Guy, R. (2004). Unsolved Problems in Number Theory. Problem Books in Math, 3rd ed. New York: Springer-Verlag.
[21] Holroyd, A., Ruskey, F., Williams, A. (2010). Faster generation of shorthand universal cycles for permutations. In: Thai, M., Sahni, S., eds. Computing and Combinatorics. Lecture Notes in Comput. Sci., Vol. 6196. Berlin: Springer-Verlag, pp. 298–307. doi.org/10.1007/978-3-642-14031-0_33
[22] Holroyd, A., Ruskey, F., Williams, A. (2012). Shorthand universal cycles for permutations. Algorithmica. 64(2): 215–245. doi.org/10.1007/s00453-011-9544-z
[23] Honner, P. (2019). Unscrambling the hidden secrets of superpermutations. Quanta Mag., January 16. quantamagazine.org/unscrambling-the-hidden-secrets-of-superpermutations-20190116/
[24] Houston, R. (2014). Tackling the minimal superpermutation problem. arxiv.org/abs/1408.5108
[25] Jackson, B. W. (1993). Universal cycles of k-subsets and k-permutations. Discrete Math. 117(1–3): 141–150. doi.org/10.1016/0012-365X(93)90330-V
[26] Johnson, J. R. (2009). Universal cycles for permutations. Discrete Math. 309(17): 5264–5270. doi.org/10.1016/j.disc.2007.11.004
[27] Johnston, N. (2013). The minimal superpermutation problem. April 10. njohnston.ca/2013/04/the-minimal-superpermutation-problem/
[28] Johnston, N. (2013). Non-uniqueness of minimal superpermutations. Discrete Math. 313(14): 1553–1557. doi.org/10.1016/j.disc.2013.03.024
[29] Johnston, N. (2014). All minimal superpermutations on five symbols have been found. August 13. njohnston.ca/2014/08/all-minimal-superpermutations-on-five-symbols-have-been-found/
[30] Klarreich, E. (2018). Mystery math whiz and novelist advance permutation problem. Quanta Mag., November 5.
quantamagazine.org/sci-fi-writer-greg-egan-and-anonymous-math-whiz-advance-permutation-problem-20181105/
[31] Kleitman, D., Kwiatkowski, D. (1976). A lower bound on the length of a sequence containing all permutations as subsequences. J. Combin. Theory Ser. A. 21(2): 129–136. doi.org/10.1016/0097-3165(76)90057-1
[32] Knuth, D. (2011). The Art of Computer Programming, Vol. 4A. Upper Saddle River, NJ: Addison-Wesley.
[33] Koutas, P. J., Hu, T. C. (1975). Shortest string containing all permutations. Discrete Math. 11(2): 125–132. doi.org/10.1016/0012-365X(75)90004-7
[34] Lecouturier, E., Zmiaikou, D. (2012). On a conjecture of H. Gupta. Discrete Math. 312(8): 1444–1452. doi.org/10.1016/j.disc.2011.12.027
[35] Miller, A. (2009). Asymptotic bounds for permutations containing many different patterns. J. Combin. Theory Ser. A. 116(1): 92–108. doi.org/10.1016/j.jcta.2008.04.007
[36] Mohanty, S. P. (1980). Shortest string containing all permutations. Discrete Math. 31(1): 91–95. doi.org/10.1016/0012-365X(80)90177-6
[37] Newey, M. C. (1973). Notes on a problem involving permutations as subsequences. Technical Report STAN-CS-73-340. Stanford University. i.stanford.edu/TR/CS-TR-73-340.html
[38] Nijenhuis, A., Wilf, H. (1975). Combinatorial Algorithms. New York: Academic Press.
[39] Parker, M. (2019). Superpermutations: the maths problem solved by 4chan. YouTube, January 28. youtube.com/watch?v=OZzIvl1tbPo
[40] Radomirović, S. (2012). A construction of short sequences containing all permutations of a set as subsequences. Electron. J. Combin. 19(4): Paper 31, 11 pp. doi.org/10.37236/2859
[41] Ruskey, F., Williams, A. (2010). An explicit universal cycle for the (n − 1)-permutations of an n-set. ACM Trans. Algorithms. 6(3): Art. 45, 12 pp. doi.org/10.1145/1798596.1798598
[42] Sawada, J., Williams, A. (2018). A Hamilton path for the sigma-tau problem. In: Czumaj, A., ed. Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). Philadelphia, PA: SIAM, pp. 568–575. doi.org/10.1137/1.9781611975031.37
[43] Simion, R., Schmidt, F. W. (1985). Restricted permutations. European J. Combin. 6(4): 383–406. doi.org/10.1016/S0195-6698(85)80052-4
[44] Williams, A. (2013). Hamiltonicity of the Cayley digraph on the symmetric group generated by σ = (12 · · · n) and τ = (12). arxiv.org/abs/1307.2549
[45] Zălinescu, E. (2011). Shorter strings containing all k-element permutations. Inform. Process. Lett. 111(12): 605–608. doi.org/10.1016/j.ipl.2011.03.018

MICHAEL ENGEN is a Ph.D. candidate at the University of Florida under the supervision of Vincent Vatter.
In the fall of 2019, he was a Chateaubriand fellow at Universit´e Paris Nord under the supervision of Fr´ed´erique Bassino. Department of Mathematics, University of Florida, Gainesville, FL 32611 [email protected]
VINCENT VATTER is an associate professor at the University of Florida. He obtained his Ph.D. from Rutgers University under the supervision of Doron Zeilberger before completing postdocs at the University of St. Andrews and Dartmouth College.
Department of Mathematics, University of Florida, Gainesville, FL 32611
Geometry and Algebra of the Deltoid Map
Joshua P. Bowman

Abstract. The geometry of the deltoid curve gives rise to a self-map of $\mathbb{C}^2$ that is expressed in coordinates by $f(x, y) = (y^2 - 2x,\ x^2 - 2y)$. This is one in a family of maps that generalize Chebyshev polynomials to several variables. We use this example to illustrate two important objects in complex dynamics: the Julia set and the iterated monodromy group.
1. INTRODUCTION. Complex dynamics is perhaps best known for the fractal images it produces. For instance, given a polynomial function $\mathbb{C} \to \mathbb{C}$, an important set to consider is the Julia set, whose points behave "chaotically" under iteration of the function; for most polynomials, the Julia set is a fractal. However, the Julia set is a smooth curve in the case of two special families: power maps, having the form $z \mapsto z^d$, and Chebyshev polynomials, of which the simplest example is $z \mapsto z^2 - 2$. For power maps, the Julia set is the unit circle, and for Chebyshev polynomials it is the segment $[-2, 2]$, contained in the real line. These structurally simple examples play a distinguished role in complex dynamics, and studying them can illuminate parts of the theory that apply in more complicated cases.

Power maps have an obvious generalization to functions from $\mathbb{C}^n$ to itself: just take the $d$th power of each coordinate. The higher-dimensional analogues of Chebyshev polynomials are not as obvious, however. In the 1980s, Veselov [19,20] and Hoffman–Withers [7] independently constructed a family of "Chebyshev-like" self-maps of $\mathbb{C}^n$ associated to each crystallographic root system of rank $n$. The cases where $n = 2$ have received much further attention (see, e.g., [9,11,14–17,21]), especially for the $A_2$ root system, which is connected with the deltoid curve (a.k.a. three-cusped hypocycloid or Steiner's hypocycloid).

This article presents a new approach to construct a quadratic $A_2$-type map $f$ based directly on geometric properties of the deltoid. For this reason, we call $f$ the deltoid map. The set of lines tangent to the deltoid will play a crucial role, and indeed we will see that $f$ preserves this set of lines.
(This fact was previously observed in [21]. The difference is that we construct the map from the tangent lines, rather than starting with the map and deducing from it the invariance of the tangent lines; in particular, our approach does not use the theory of root systems.) Using this invariance property, we will study two dynamical features of $f$: one geometric (the Julia set) and the other algebraic (the iterated monodromy group). Both of these objects will be formally defined later in the article.

The Julia set of $f$ is a real algebraic hypersurface $J$ of degree 4 (Corollary 1). We derive this property from a description of $J$ in terms of pedal curves, which arise from classical differential geometry (Theorem 1). The Julia set of $f$ is therefore considerably more interesting geometrically than in the case of a Chebyshev polynomial in one variable, the segment $[-2, 2]$ mentioned above.

The iterated monodromy group of $f$ is an affine Coxeter group (Theorem 2). Such groups are present implicitly in the construction from [19, 20] and explicitly in [7]. The connection with iterated monodromy groups is new, however, and extends the (very short) list of polynomial endomorphisms of $\mathbb{C}^n$, with $n \ge 2$, whose iterated monodromy groups are known (see [4,13] for the only other examples known to the author). In future work, we will show how these properties of the deltoid map generalize to other Chebyshev-like maps.

doi.org/10.1080/00029890.2021.1847630
MSC: Primary 37F10, Secondary 53A04; 20F55

Figure 1. Tracing out the deltoid as a hypocycloid.

2. LINES AND PLANES. In this section, we establish some notation and terminology. The complex projective line $\mathbb{CP}^1$ is identified with the one-point compactification of $\mathbb{C}$ (i.e., the Riemann sphere) in the usual way; generally $t \in \mathbb{C} \cup \{\infty\}$ will be used to mean this extended complex coordinate on $\mathbb{CP}^1$. The complex projective plane $\mathbb{CP}^2$ has homogeneous coordinates $[x : y : z]$, where $x$, $y$, and $z$ are complex numbers, not all zero; this means that $[x : y : z] = [\alpha x : \alpha y : \alpha z]$ for all $\alpha \in \mathbb{C} \setminus \{0\}$. We use $[a : b : c]^\vee$ to represent homogeneous coordinates on the dual projective plane $(\mathbb{CP}^2)^\vee$, whose elements are the lines in $\mathbb{CP}^2$, so that
$$[x : y : z] \in [a : b : c]^\vee \iff ax + by + cz = 0.$$
The affine plane $\mathbb{C}^2$ is canonically included in $\mathbb{CP}^2$ via the map $(x, y) \mapsto [x : y : 1]$. The complement of the image of $\mathbb{C}^2$ under this embedding is the complex line at infinity $L_\infty \cong \mathbb{CP}^1$, having equation $z = 0$; that is, $L_\infty = [0 : 0 : 1]^\vee$.

The real plane in $\mathbb{C}^2$ with equation $y = \bar{x}$ is a copy of the Euclidean plane, and it will be denoted by $\mathbb{E}^2$. Its closure in $\mathbb{CP}^2$ is a copy of the real projective plane, $\overline{\mathbb{E}^2} \cong \mathbb{RP}^2$, but we do not write it as such, because the coordinates induced on $\mathbb{E}^2$ as a subset of $\mathbb{C}^2$ are not real. We call $\partial\mathbb{E}^2 = \overline{\mathbb{E}^2} \setminus \mathbb{E}^2 = \overline{\mathbb{E}^2} \cap L_\infty \cong S^1$ the circle at infinity, trusting no confusion will arise from the fact that the (real) circle at infinity is contained in the (complex) line at infinity. As a real submanifold of $\mathbb{CP}^2$, $\mathbb{E}^2$ does not carry a complex structure (else its closure could not be the real projective plane, topologically), but the restriction of the coordinate $x$ to $\mathbb{E}^2$ provides a bijection $\mathbb{E}^2 \cong \mathbb{C}$. This is what we will always mean when we carry out constructions on $\mathbb{E}^2$ using a complex coordinate.

3. THE DELTOID AS A REAL CURVE AND AS A COMPLEX CURVE. In this section, we collect some known properties of the deltoid, especially regarding its tangent lines, that will be useful in our study.

The classical deltoid is the curve traced by a point marked on the circumference of a circle of radius 1 rolling without slipping inside a circle of radius 3. When the center of the smaller circle travels once counterclockwise around the center of the larger circle, a point on the smaller circle's circumference makes two clockwise revolutions around its center. (See Figure 1.) Because the centers remain 2 units apart, the deltoid can be
parametrized in $\mathbb{E}^2$ by
$$x = 2t + \bar{t}^{\,2}, \qquad |t| = 1.$$
This extends to a complex algebraic curve in the following way. Because $\mathbb{E}^2$ is embedded in $\mathbb{C}^2$ as the real plane $y = \bar{x}$, the parametrization of the deltoid in $\mathbb{C}^2$ becomes $(2t + \bar{t}^{\,2},\ 2\bar{t} + t^2)$ with $|t| = 1$. In order to make this parametrization holomorphic, we replace $\bar{t}$ with $t^{-1}$ (when $|t| = 1$, these are the same), and we define
$$\gamma(t) = \left(2t + \frac{1}{t^2},\ \frac{2}{t} + t^2\right), \qquad t \in \mathbb{C} \setminus \{0\}. \tag{1}$$
We can further extend $\gamma$ to a curve in $\mathbb{CP}^2$, which we also call $\gamma$, by appending an additional coordinate, initially equal to 1, then clearing denominators (which is allowed in homogeneous coordinates):
$$\gamma(t) = \left[2t^3 + 1 : 2t + t^4 : t^2\right], \qquad t \in \mathbb{CP}^1.$$
Note that $\gamma(0) = [1 : 0 : 0]$ and $\gamma(\infty) = [0 : 1 : 0]$. (To see why the latter expression is correct, rewrite $\gamma(t)$ as $\gamma(1/s)$, clear denominators, then let $s$ go to 0.) These are the only two points of $\mathbb{CP}^1$ that $\gamma$ sends to $L_\infty$. $\mathcal{D}$ will denote the image of $\gamma$ in either $\mathbb{C}^2$ or $\mathbb{CP}^2$, and $\mathcal{D}_{\mathbb{E}^2} = \mathcal{D} \cap \mathbb{E}^2$ is the real deltoid.

In $\mathbb{C}^2$, we have
$$\gamma'(t) = \left(2 - \frac{2}{t^3},\ -\frac{2}{t^2} + 2t\right) = 2\left(1 - \frac{1}{t^3}\right)(1, t),$$
and so $\gamma'(t)$ vanishes precisely when $t$ equals $1$, $\omega = e^{2\pi i/3}$, or $\omega^2 = e^{4\pi i/3}$; these cube roots of unity give rise to the three cusps of $\mathcal{D}$. At every other point of $\mathcal{D}$, a tangent vector is $(1, t)$. An equation for the line tangent to $\mathcal{D}$ at $\gamma(t)$ is therefore
$$\begin{vmatrix} 1 & x - 2t - t^{-2} \\ t & y - 2t^{-1} - t^2 \end{vmatrix} = 0,$$
which is equivalent to
$$t^3 - t^2 x + t y - 1 = 0. \tag{2}$$
This equation works equally well at the cusps, where $t^3 = 1$ and (2) reduces to $y = tx$, so each cusp also has a well-defined tangent line, which passes through the origin. It is worth remarking that in [10] the study of the real deltoid begins, not with any classical construction, but with equation (2), restricted to $y = \bar{x}$ and $|t| = 1$, which is simply called the "line equation" of the deltoid.

Equation (2) shows that a generic point $(x, y)$ of $\mathbb{C}^2$ lies on three tangent lines of $\mathcal{D}$. A point belongs to $\mathcal{D}$ if and only if at least two of these tangent lines coincide, which is to say that the discriminant of the left side of (2) (as a polynomial in $t$) is zero. Thus we obtain an affine equation for $\mathcal{D}$ (and an additional reason to name this set $\mathcal{D}$, since it is where a discriminant vanishes):
$$x^2 y^2 - 4\left(x^3 + y^3\right) + 18xy - 27 = 0. \tag{3}$$
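Equations (1), (2), and (3) can be sanity-checked numerically. The following is a minimal sketch, not part of the original article; the function names are illustrative choices, and the sample parameters are arbitrary nonzero complex numbers. It checks that $\gamma(t)$ lies on the line (2) with the same parameter, that $t$ is then a repeated root of (2), and that the discriminant (3) vanishes at $\gamma(t)$.

```python
# Numerical sanity check of equations (1)-(3); all names are illustrative.

def gamma(t):
    """Affine parametrization (1) of the deltoid, valid for t != 0."""
    return (2 * t + 1 / t**2, 2 / t + t**2)


def tangent_line(x, y, t):
    """Left side of equation (2); zero when (x, y) is on the tangent line with parameter t."""
    return t**3 - t**2 * x + t * y - 1


def tangent_line_dt(x, y, t):
    """Derivative of the left side of (2) with respect to t."""
    return 3 * t**2 - 2 * t * x + y


def deltoid_equation(x, y):
    """Left side of equation (3), the discriminant of (2) viewed as a cubic in t."""
    return x**2 * y**2 - 4 * (x**3 + y**3) + 18 * x * y - 27


samples = [2.0, -0.5, 0.7 + 0.4j, 1j, -1.3 - 0.2j]
for t in samples:
    x, y = gamma(t)
    assert abs(tangent_line(x, y, t)) < 1e-9      # gamma(t) lies on its own tangent line
    assert abs(tangent_line_dt(x, y, t)) < 1e-9   # t is a repeated root of (2)
    assert abs(deltoid_equation(x, y)) < 1e-8     # hence the discriminant (3) vanishes
```

The tolerances are loose on purpose, since the check is done in floating point rather than symbolically.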
Figure 2. Three properties of lines tangent to $\mathcal{D}$.
Now we can also parametrize the dual curve $\mathcal{D}^\vee$ in the dual projective plane $(\mathbb{CP}^2)^\vee$. From (2), we get the following parametrization of $\mathcal{D}^\vee$:
$$\check\gamma(t) = \left[-t^2 : t : t^3 - 1\right]^\vee. \tag{4}$$
In particular, we see that $\check\gamma(0) = \check\gamma(\infty) = [0 : 0 : 1]^\vee$, so that the line at infinity in $\mathbb{CP}^2$ is tangent to $\mathcal{D}$ at both $\gamma(0)$ and $\gamma(\infty)$. (This tangency can also be seen, less directly, from the fact that $L_\infty$ intersects $\mathcal{D}$, a curve of degree 4, in only two points.) From (4), we can deduce that an equation for $\mathcal{D}^\vee$ is
$$a^3 + b^3 = abc \tag{5}$$
(when $a$, $b$, and $c$ are real, this equation produces the folium of Descartes). This curve is smooth except for a self-intersection at $[0 : 0 : 1]^\vee$, which shows that the line at infinity is the only bitangent of $\mathcal{D}$.

Because equation (3) has degree four, a generic line in $\mathbb{CP}^2$ will intersect $\mathcal{D}$ in four points. Meanwhile, a generic element of $\mathcal{D}^\vee$ (that is, a line tangent to $\mathcal{D}$) will intersect $\mathcal{D}$ at two points besides the point of tangency. These other two points of intersection are connected with several interesting geometric properties; we state three of them here for later use. All three have easy algebraic proofs, which we leave to the reader. They are illustrated in Figure 2.

(A) For all $t \in \mathbb{C} \setminus \{0\}$, the line containing $\gamma(t)$ and $\gamma(-t)$ is tangent to $\mathcal{D}$ at $\gamma(1/t^2)$.
(B) The midpoint of $\gamma(t)$ and $\gamma(-t)$ in $\mathbb{C}^2$ lies on the curve $xy = 1$.
(C) The tangent lines $\check\gamma(t)$ and $\check\gamma(-t)$ intersect at a point also on $xy = 1$.

Property (A) will later form the basis for our geometrically-defined dynamical system. Properties (B) and (C) will relate to the critical points of the map.

The curve $\mathcal{C}$ with equation $xy = 1$ is, projectively speaking, a conic section. Its intersection $\mathcal{C}_{\mathbb{E}^2}$ with the plane $\mathbb{E}^2$ is the unit circle, having equation $|x|^2 = 1$.

The real deltoid $\mathcal{D}_{\mathbb{E}^2}$ is a Jordan curve in $\mathbb{E}^2$; let $K$ be the union of $\mathcal{D}_{\mathbb{E}^2}$ with its interior. $K$ consists of those points $x$ such that all solutions to $t^3 - x t^2 + \bar{x} t - 1 = 0$ lie on the unit circle $|t| = 1$; in other words, these are the points that lie on three "real" tangent lines. (See Figure 3, left and middle.)

4. THE DELTOID MAP. In this section, we use the geometric properties of $\mathcal{D}$ to define a map $f$ from $\mathbb{CP}^2$ to itself. First, we define a natural map $\check f$ on the dual curve $\mathcal{D}^\vee$. Given $L = T_x\mathcal{D} \in \mathcal{D}^\vee$, let $\check f(L)$ be the unique element of $\mathcal{D}^\vee$ such that $\{L, \check f(L)\}$ is the full set of tangent lines to $\mathcal{D}$ passing through $x$, as illustrated in Figure 3, right.
Figure 3. Left: The set $K \subset \mathbb{E}^2$ bounded by $\mathcal{D} \cap \mathbb{E}^2$. Middle: Tangent lines through the three cusps of $\mathcal{D}$ and their point of intersection at the origin. Right: For a generic tangent line $L \in \mathcal{D}^\vee$ there is another line $\check f(L) \in \mathcal{D}^\vee$ such that $\check f(L)$ is secant to $\mathcal{D}$ at the point $x$ where $L$ is tangent.
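Properties (A), (B), and (C) of Section 3, whose proofs are left to the reader, can be spot-checked numerically. The sketch below is only an illustration under floating-point arithmetic; the helper names are hypothetical, and the intersection of two tangent lines is computed by Cramer's rule applied to equation (2) written as $s^2 x - s y = s^3 - 1$.

```python
# Spot check of properties (A)-(C); helper names are illustrative.

def gamma(t):
    """Parametrization (1) of the deltoid."""
    return (2 * t + 1 / t**2, 2 / t + t**2)


def tangent_line(x, y, t):
    """Left side of equation (2)."""
    return t**3 - t**2 * x + t * y - 1


def intersect_tangent_lines(s1, s2):
    """Intersection of the tangent lines with parameters s1 != s2 (Cramer's rule)."""
    det = s1 * s2 * (s2 - s1)
    x = (s1 * (s2**3 - 1) - s2 * (s1**3 - 1)) / det
    y = (s1**2 * (s2**3 - 1) - s2**2 * (s1**3 - 1)) / det
    return (x, y)


for t in [0.8, 1.7j, -0.6 + 0.9j, 2 - 1j]:
    p, q = gamma(t), gamma(-t)
    s = 1 / t**2
    # (A): gamma(t) and gamma(-t) both lie on the tangent line with parameter 1/t^2.
    assert abs(tangent_line(p[0], p[1], s)) < 1e-9
    assert abs(tangent_line(q[0], q[1], s)) < 1e-9
    # (B): the midpoint of gamma(t) and gamma(-t) satisfies xy = 1.
    mx, my = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
    assert abs(mx * my - 1) < 1e-9
    # (C): the tangent lines with parameters t and -t meet on xy = 1.
    ix, iy = intersect_tangent_lines(t, -t)
    assert abs(tangent_line(ix, iy, t)) < 1e-9
    assert abs(tangent_line(ix, iy, -t)) < 1e-9
    assert abs(ix * iy - 1) < 1e-9
```

Property (A) is the one used to describe $\check f$ on $\mathcal{D}^\vee$: the second tangent line through $\gamma(t)$ has parameter $1/t^2$.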
(Note that $\check f(L)$ is the same as $L$ when $x = \gamma(t)$ for $t \in \{1, \omega, \omega^2, 0, \infty\}$, and it is distinct otherwise.) It follows from property (A) in the previous section that
$$\check f(\check\gamma(t)) = \check\gamma(1/t^2) \qquad \text{for all } t \in \mathbb{CP}^1.$$
In particular, $\check f$ fixes $L_\infty$ as an element of $\mathcal{D}^\vee$, but it is helpful to think of it as exchanging the points of tangency, namely $\gamma(0)$ and $\gamma(\infty)$.

Now we turn to our promised self-map of $\mathbb{C}^2$. First we observe that, given $(x, y) \in \mathbb{C}^2$, the solutions $t_1, t_2, t_3$ to (2) satisfy $t_1 t_2 t_3 = 1$ and
$$x = t_1 + t_2 + t_3, \qquad y = \frac{1}{t_1} + \frac{1}{t_2} + \frac{1}{t_3}. \tag{6}$$
Conversely, if $t_1, t_2, t_3$ are chosen to satisfy $t_1 t_2 t_3 = 1$, then the formulas (6) provide coefficients for the equation (2) to be solved by $t_1, t_2, t_3$.

Proposition 1. Suppose $\check\gamma(t_1)$, $\check\gamma(t_2)$, and $\check\gamma(t_3)$ are concurrent. Then so are $\check f(\check\gamma(t_1))$, $\check f(\check\gamma(t_2))$, and $\check f(\check\gamma(t_3))$.

Proof. If the point of concurrency lies on $L_\infty$, then the result is trivial, as at most two lines are involved. Otherwise a necessary and sufficient condition for concurrency is $t_1 t_2 t_3 = 1$. But if $t_1$, $t_2$, and $t_3$ satisfy this equality, then also $(1/t_1^2)(1/t_2^2)(1/t_3^2) = (1/t_1 t_2 t_3)^2 = 1$.

This proposition provides the basis for defining a map on all of $\mathbb{CP}^2$: given $x \in \mathbb{CP}^2$, let $L_1$, $L_2$, and $L_3$ be the three elements of $\mathcal{D}^\vee$ passing through $x$ (some of these may coincide). Then define $f(x)$ to be the point at which $\check f(L_1)$, $\check f(L_2)$, and $\check f(L_3)$ are concurrent. (See Figure 4.) To handle the special cases in which all three lines $L_1$, $L_2$, and $L_3$ coincide, we extend by continuity and define $f([1 : 0 : 0]) = [0 : 1 : 0]$, $f([0 : 1 : 0]) = [1 : 0 : 0]$, and whenever $\check f(L_1) = \check f(L_2) = \check f(L_3)$ passes through a cusp of $\mathcal{D}$, $f(x)$ is defined to be that cusp.

With this geometric definition in hand, we find polynomials that describe $f$.

Proposition 2. On $\mathbb{C}^2$, $f$ takes the form $(x, y) \mapsto (y^2 - 2x,\ x^2 - 2y)$. On $\mathbb{CP}^2$, this extends to $[x : y : z] \mapsto [y^2 - 2xz : x^2 - 2yz : z^2]$. On $L_\infty$, $f$ has the form $\zeta \mapsto 1/\zeta^2$.
Figure 4. Geometric definition of $f$. Any point $x \in \mathbb{CP}^2$ lies on three tangent lines of $\mathcal{D}$ (counted with multiplicity). The point of tangency for each of these lines lies on another element of $\mathcal{D}^\vee$, as seen in Figure 3. The resulting collection of three new tangent lines (again, counted with multiplicity) is concurrent at $f(x)$.
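The geometric definition illustrated in Figure 4 can be compared numerically with the polynomial formula of Proposition 2. The following is a minimal sketch with hypothetical function names: it starts from tangent-line parameters $t_1, t_2, t_3$ with $t_1 t_2 t_3 = 1$ (rather than solving the cubic (2)), recovers the point from formulas (6), applies $t \mapsto 1/t^2$ to each parameter, and compares the resulting point of concurrency with $(y^2 - 2x,\ x^2 - 2y)$.

```python
# Compare the geometric definition of f with the formula of Proposition 2.
# Function names are illustrative assumptions.

def f(x, y):
    """The deltoid map in affine coordinates (Proposition 2)."""
    return (y * y - 2 * x, x * x - 2 * y)


def point_from_parameters(t1, t2, t3):
    """Formulas (6): the point on the tangent lines with parameters t1, t2, t3 (product 1)."""
    return (t1 + t2 + t3, 1 / t1 + 1 / t2 + 1 / t3)


def f_geometric(t1, t2):
    """Send each tangent-line parameter t to 1/t**2, then take the point of concurrency."""
    t3 = 1 / (t1 * t2)
    new_parameters = [1 / t**2 for t in (t1, t2, t3)]
    return point_from_parameters(*new_parameters)


def close(p, q, tol=1e-9):
    """Componentwise comparison of two points of C^2."""
    return abs(p[0] - q[0]) < tol and abs(p[1] - q[1]) < tol


for t1, t2 in [(2.0, 0.25 + 0.5j), (1j, -0.7), (0.3 - 1.1j, 1.4 + 0.2j)]:
    t3 = 1 / (t1 * t2)
    x, y = point_from_parameters(t1, t2, t3)
    assert close(f(x, y), f_geometric(t1, t2))
```

Choosing the parameters first avoids a cubic root finder; the price is that the check only covers points produced this way, which by (6) is every point of $\mathbb{C}^2$.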
Proof. If $(x, y) \in \mathbb{C}^2$, and $t_1$, $t_2$, and $t_3$ are the roots of (2), then by the observations surrounding equation (6), we have
$$f(x, y) = \left(\frac{1}{t_1^2} + \frac{1}{t_2^2} + \frac{1}{t_3^2},\ t_1^2 + t_2^2 + t_3^2\right).$$
Now we observe that
$$\left(\frac{1}{t_1} + \frac{1}{t_2} + \frac{1}{t_3}\right)^2 - 2\left(\frac{1}{t_1 t_2} + \frac{1}{t_2 t_3} + \frac{1}{t_3 t_1}\right) = \frac{1}{t_1^2} + \frac{1}{t_2^2} + \frac{1}{t_3^2}$$
and
$$(t_1 + t_2 + t_3)^2 - 2\,(t_1 t_2 + t_2 t_3 + t_3 t_1) = t_1^2 + t_2^2 + t_3^2.$$
Because $t_1 t_2 t_3 = 1$, the sums subtracted on the left are equal to $x$ and $y$, respectively, which proves the result on $\mathbb{C}^2$. The formula on $\mathbb{CP}^2$ is then obtained by a standard homogenization process. Because $L_\infty$ is defined by $z = 0$, on this line the map becomes $[x : y : 0] \mapsto [y^2 : x^2 : 0]$; if we set $\zeta = y/x$, the result for $L_\infty$ becomes clear. Alternatively, for $L_\infty$ we could use the observations made previously that $f(\check\gamma(t)) = \check\gamma(1/t^2)$ and that $\check\gamma(t)$ intersects $L_\infty$ at $[1 : t : 0]$, so $\zeta = t$.

5. JULIA SET, FATOU SET, AND GREEN FUNCTION. Having defined the deltoid map $f$, we turn to some of its dynamical properties. Ideally, for any point $x \in \mathbb{CP}^2$, we would like to be able to predict the behavior of its orbit under $f$, which is the sequence $x, f(x), f^2(x), f^3(x), \ldots$, and also to say something about the orbits of points near $x$. (Here and in the rest of the article $f^n$ denotes the composition of $f$ with itself $n$ times; this notation is standard in dynamical systems.)

From the construction of $f$, we can already see that it has some exceptional properties: the deltoid $\mathcal{D}$ is forward invariant, meaning $f(\mathcal{D}) = \mathcal{D}$, and $f$ also sends each line tangent to $\mathcal{D}$ to another such line. These tangent lines will continue to be key in studying properties of $f$.

Notice that $f$ commutes with the involution $\iota(x, y) = (y, x)$. The composition $\iota \circ f = f \circ \iota$ is studied by Uchimura in [15–17] and Nakane in [11]. The dynamical properties of $f$ and $\iota \circ f$ are essentially identical.
A fundamental tool in complex dynamics is the partition of the dynamical space into the Fatou set, where the dynamics are "simple," and the Julia set, where the dynamics are "chaotic." More precisely, the Fatou set $\Omega = \Omega_f$ is the largest open set of $\mathbb{CP}^2$ on which the iterates of $f$ locally form an equicontinuous family; thus if $x$ and $y$ are
points of $\Omega$ that are sufficiently near each other, then $f^n(x)$ and $f^n(y)$ remain close (in $\mathbb{CP}^2$) as $n$ increases. The Julia set $J = J_f$ is the complement of $\Omega$; thus if $x$ is in $J$ and $y$ is close to $x$, then $f^n(x)$ and $f^n(y)$ may be very far apart.

On $L_\infty$, as we have seen, $f$ reduces to the power map $\zeta \mapsto 1/\zeta^2$. This map of $\mathbb{CP}^1$ exchanges $0$ and $\infty$ (in $\mathbb{CP}^2$, these are the points $[1 : 0 : 0]$ and $[0 : 1 : 0]$), and so these two points form a period 2 orbit. If $|\zeta| \ne 1$, then $\zeta^{(-2)^n}$ approaches the previously observed period 2 orbit. If $|\zeta| = 1$, then $\zeta^{(-2)^n}$ remains on the unit circle, while some nearby points are drawn to the $\{0, \infty\}$ orbit. Thus the Julia set of $f$ on $L_\infty$ is the circle at infinity, and the Fatou set in $L_\infty$ has two components, one containing $0$ and the other $\infty$.

To determine the Julia and Fatou sets of $f$ in $\mathbb{C}^2$, we introduce the Green function $G = G_f$ of $f$, which is defined [3, 8] by
$$G(x) = \lim_{n \to \infty} \frac{1}{2^n} \log^+ \left\| f^n(x) \right\|,$$
where $\log^+ = \max\{\log, 0\}$, and $\|\cdot\|$ is any norm on $\mathbb{C}^2$. This function measures how quickly points of $\mathbb{C}^2$ escape to infinity under iteration of $f$; it is zero precisely for those points whose orbits are bounded, which comprise the set $K$. It is a continuous, subharmonic function on $\mathbb{C}^2$, and it satisfies the functional equation $G(f(x, y)) = 2G(x, y)$.

For most self-maps of $\mathbb{C}^2$, the Green function cannot be explicitly calculated. The deltoid map is an exception.

Proposition 3. The Green function $G$ of the deltoid map $f$ can be calculated as follows: Given $(x, y) \in \mathbb{C}^2$, let $t_1$, $t_2$, and $t_3$ be the solutions to (2). Then
$$G(x, y) = \log \max\left\{ |t_1|,\ |t_2|,\ |t_3|,\ \frac{1}{|t_1|},\ \frac{1}{|t_2|},\ \frac{1}{|t_3|} \right\}. \tag{7}$$
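Notice that we do not need to use $\log^+$ in (7), because the set over which the maximum is taken contains at least one element that is greater than or equal to 1.

Proof of Proposition 3. Using the $L^\infty$ (maximum) norm on $\mathbb{C}^2$, we have
$$G(x, y) = \lim_{n \to \infty} \frac{1}{2^n} \log^+ \max\left\{ \left| t_1^{2^n} + t_2^{2^n} + t_3^{2^n} \right|,\ \left| \frac{1}{t_1^{2^n}} + \frac{1}{t_2^{2^n}} + \frac{1}{t_3^{2^n}} \right| \right\}.$$
Set $\tau = \max\left\{ |t_1|, |t_2|, |t_3|, |t_1|^{-1}, |t_2|^{-1}, |t_3|^{-1} \right\}$. Then $\tau \ge 1$, and we have
$$\frac{1}{2^n} \log \max\left\{ \left| t_1^{2^n} + t_2^{2^n} + t_3^{2^n} \right|,\ \left| \frac{1}{t_1^{2^n}} + \frac{1}{t_2^{2^n}} + \frac{1}{t_3^{2^n}} \right| \right\} - \log \tau \tag{8}$$
$$= \frac{1}{2^n} \log \max\left\{ \frac{\left| t_1^{2^n} + t_2^{2^n} + t_3^{2^n} \right|}{\tau^{2^n}},\ \frac{1}{\tau^{2^n}} \left| \frac{1}{t_1^{2^n}} + \frac{1}{t_2^{2^n}} + \frac{1}{t_3^{2^n}} \right| \right\}. \tag{9}$$
By our choice of $\tau$, the maximum of the set in (9) is bounded by 3. Therefore, as $n$ tends to $\infty$, the difference in (8) tends to 0. This shows that $G(x, y) = \log \tau$, as claimed.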
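Proposition 3 lends itself to a numerical comparison between the defining limit and the closed form (7). The sketch below is an illustration only: the function names are hypothetical, the limit is truncated at $n = 10$ iterations (an arbitrary choice that keeps the iterates within double-precision range for the sample points used), and the maximum norm is used as in the proof. With this truncation the two values are only expected to agree to about $\log(3)/2^{10} \approx 10^{-3}$.

```python
import cmath
import math

# Compare the truncated limit defining G with the closed form (7).
# Names and the truncation level n = 10 are illustrative assumptions.

def f(x, y):
    """The deltoid map in affine coordinates."""
    return (y * y - 2 * x, x * x - 2 * y)


def point_from_parameters(t1, t2):
    """Formulas (6) with t3 = 1/(t1*t2)."""
    t3 = 1 / (t1 * t2)
    return (t1 + t2 + t3, 1 / t1 + 1 / t2 + 1 / t3)


def green_by_iteration(x, y, n=10):
    """Approximate G(x, y) by 2**-n * log+ of the maximum norm of the n-th iterate."""
    for _ in range(n):
        x, y = f(x, y)
    norm = max(abs(x), abs(y))
    return max(math.log(norm), 0.0) / 2**n if norm > 0 else 0.0


def green_closed_form(t1, t2):
    """Right side of (7), with t3 = 1/(t1*t2)."""
    t3 = 1 / (t1 * t2)
    tau = max(max(abs(t), 1 / abs(t)) for t in (t1, t2, t3))
    return math.log(tau)


# An escaping point (tau = 1.5) and a point of K (all parameters on the unit circle).
escaping = (cmath.rect(1.5, 0.3), cmath.rect(0.9, -1.1))
bounded = (cmath.exp(0.4j), cmath.exp(2.1j))

for t1, t2 in (escaping, bounded):
    x, y = point_from_parameters(t1, t2)
    assert abs(green_by_iteration(x, y) - green_closed_form(t1, t2)) < 2e-3
```

Larger values of $n$ would tighten the agreement but eventually overflow for escaping points, since the iterates grow like $\tau^{2^n}$.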
Figure 5. Some pedal curves of the real deltoid in $\mathbb{E}^2$. In each image the point $O$ is indicated by a dot. Top: With respect to the center (a trifolium), with respect to the point opposite a cusp (a bifolium), and with respect to a cusp (a simple folium). Bottom: With respect to an exterior point, with respect to an interior point on an axis of symmetry, and with respect to a generic interior point.
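Curves like those in Figure 5 can be traced numerically by projecting the point $O$ orthogonally onto each tangent line of the real deltoid. The sketch below is a hedged illustration: it uses the complex coordinate $x$ on $\mathbb{E}^2$, writes $O = \alpha$ and $t = e^{i\theta}$, and relies on the observation (an assumption of this sketch, obtained by differentiating $2e^{i\theta} + e^{-2i\theta}$) that the tangent line at parameter $t$ has real direction $e^{-i\theta/2}$. It compares the projection with the closed form $x = \tfrac12\left(\alpha + t + \bar\alpha t^{-1} - t^{-2}\right)$ that appears in the proof of Theorem 1. All function names are hypothetical.

```python
import cmath
import math

# Pedal curve of the real deltoid with respect to a point alpha in E^2 = C.
# Function names are illustrative.

def deltoid_point(theta):
    """Point 2t + conj(t)^2 of the real deltoid, with t = exp(i*theta)."""
    t = cmath.exp(1j * theta)
    return 2 * t + t.conjugate() ** 2


def pedal_point(alpha, theta):
    """Orthogonal projection of alpha onto the tangent line with parameter exp(i*theta)."""
    p = deltoid_point(theta)
    d = cmath.exp(-0.5j * theta)  # unit vector along the tangent line
    return p + ((alpha - p) * d.conjugate()).real * d


def pedal_point_closed_form(alpha, theta):
    """Closed form (alpha + t + conj(alpha)/t - 1/t^2) / 2 from the proof of Theorem 1."""
    t = cmath.exp(1j * theta)
    return 0.5 * (alpha + t + alpha.conjugate() / t - 1 / t**2)


def real_line_equation(x, theta):
    """Left side of (2) restricted to y = conj(x); zero when x lies on the tangent line."""
    t = cmath.exp(1j * theta)
    return t**3 - t**2 * x + t * x.conjugate() - 1


for alpha in [0j, 3 + 0j, 0.4 - 0.2j, -2.5 + 1j]:
    for k in range(12):
        theta = 2 * math.pi * k / 12 + 0.1
        foot = pedal_point(alpha, theta)
        assert abs(real_line_equation(foot, theta)) < 1e-12  # the foot is on the tangent line
        assert abs(foot - pedal_point_closed_form(alpha, theta)) < 1e-12
```

Sampling `pedal_point(alpha, theta)` over $\theta \in [0, 2\pi)$ gives points that could be plotted to reproduce one panel of the figure; with $\alpha = 0$ the closed form reduces to $\tfrac12(t - t^{-2})$.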
In terms of the Green function, $\Omega$ is the set of points where $dd^c G$ vanishes. Here $dd^c = 2\pi i\,\partial\bar\partial$ is the so-called pluri-Laplacian, and the derivatives should properly be interpreted as currents ("differential forms with distributional coefficients"); for us, however, it is sufficient to know where $dd^c G = 0$. Because $dd^c \log^+ |t|$ vanishes except on the unit circle $S^1$, we obtain the following characterization of $J$.

Proposition 4. The Julia set of $f$ is the set $J$ of points $[x : y : z] \in \mathbb{CP}^2$ such that the polynomial $z(t^3 - 1) - xt^2 + yt$ has at least one root on $S^1$.

Given our geometric definition of $f$, this result is not surprising: as we have seen, the line $\check\gamma(t) \in \mathcal{D}^\vee$ intersects $L_\infty$ at $[1 : t : 0]$, and the circle at infinity, where $|t| = 1$, is precisely the Julia set of $f|_{L_\infty}$.

Nakane [11] provided a description of the foliation of $J$ by "stable disks" of the circle at infinity, as well as how external rays land at points of $K$. We shall take a different perspective and consider the intersection of $J$ with complex lines in $\mathbb{C}^2$ parallel to the $x$- and $y$-axes. In order to describe the result, however, we must invoke some classical differential geometry. Given a curve $C$ and a point $O$ in $\mathbb{E}^2$, the pedal curve of $C$ with respect to $O$ is the locus of points $P$ such that $P$ is the orthogonal projection of $O$ onto a line tangent to $C$. (See Figure 5 for some examples.)

At this point we can state our first main result, which says that the Julia set of the deltoid map on $\mathbb{C}^2$ geometrically decomposes into a disjoint union of pedal curves of the real deltoid.

Theorem 1. The intersection of $J$ with a line $L \ne L_\infty$ through $[1 : 0 : 0]$ (that is, parallel to the $x$-axis in $\mathbb{C}^2$) is the pedal curve of the real deltoid with respect to the $x$-coordinate of $L \cap \mathbb{E}^2$. Likewise, the intersection of $J$ with a line parallel to the $y$-axis
is the pedal curve of the real deltoid with respect to the $y$-coordinate of the intersection of this line and $\mathbb{E}^2$.

To prove this result, we will use the following projection from $\mathbb{C}^2$ to $\mathbb{E}^2$:
$$\mathrm{pr}_{\mathbb{E}^2}(x, y) = \left( \frac{x + \bar{y}}{2},\ \frac{y + \bar{x}}{2} \right).$$
This projection is orthogonal with respect to the standard Hermitian inner product on $\mathbb{C}^2$, namely $(x_1, y_1) \cdot (x_2, y_2) = x_1 \bar{x}_2 + y_1 \bar{y}_2$. Conveniently, it also preserves each complex line that is tangent to $\mathcal{D}$ at a point of $\mathcal{D}_{\mathbb{E}^2}$, which is the content of the next lemma.

Lemma 1. If $|t| = 1$ and $(x, y) \in \check\gamma(t)$, then also $\mathrm{pr}_{\mathbb{E}^2}(x, y) \in \check\gamma(t)$.

Proof. By assumption, $t$ and $(x, y)$ satisfy equation (2), $t^3 - t^2 x + ty - 1 = 0$, as well as its conjugate $\bar{t}^3 - \bar{t}^2 \bar{x} + \bar{t}\bar{y} - 1 = 0$. Because $|t| = 1$, we have $\bar{t} = t^{-1}$, and so, after multiplying the conjugate of (2) by $t^3$ we obtain $1 - t\bar{x} + t^2\bar{y} - t^3 = 0$. Subtracting this latter equation from (2) and dividing by 2 produces
$$t^3 - t^2\,\frac{x + \bar{y}}{2} + t\,\frac{y + \bar{x}}{2} - 1 = 0,$$
as desired.

A line in $\mathbb{C}^2$ parallel to the $x$-axis is determined by its $y$-coordinate. Let $L_\alpha$ be the line with equation $y = \bar\alpha$. The intersection of $L_\alpha$ with $\mathbb{E}^2$ is $L_\alpha \cap \mathbb{E}^2 = \{(\alpha, \bar\alpha)\}$. The restriction of $\mathrm{pr}_{\mathbb{E}^2}$ to $L_\alpha$ is a bijection, whose inverse $\lambda_\alpha : \mathbb{E}^2 \to L_\alpha$ is the affine map $\lambda_\alpha(x, \bar{x}) = (2x - \alpha, \bar\alpha)$. Notice, however, that with respect to the metrics induced on $\mathbb{E}^2$ and $L_\alpha$ by the Hermitian inner product on $\mathbb{C}^2$, $\lambda_\alpha$ is not just affine, but a similarity. To prove Theorem 1, therefore, it suffices to show that $\mathrm{pr}_{\mathbb{E}^2}(J \cap L_\alpha)$ is the pedal curve of $\mathcal{D} \cap \mathbb{E}^2$ with respect to $(\alpha, \bar\alpha)$. Or what is the same, we need to show that for all $t \in S^1$, the point $(x, \bar{x}) \in \mathbb{E}^2$ is the orthogonal projection of $(\alpha, \bar\alpha)$ onto $\check\gamma(t) \cap \mathbb{E}^2$ if and only if $\lambda_\alpha(x, \bar{x})$ is in $\check\gamma(t)$.

If $|t| = 1$, then the Hermitian inner product of the vectors $(1, t)$ and $(1, -t)$ is zero, so any two lines in $\mathbb{C}^2$ of the form $y = tx + b_1$ and $y = -tx + b_2$ are orthogonal.

Proof of Theorem 1. Let $|t| = 1$. The intersection of $\check\gamma(t)$ and $\mathbb{E}^2$ has the equation
$$t^3 - t^2 x + t\bar{x} - 1 = 0, \qquad \text{or} \qquad \bar{x} = tx - t^2 + t^{-1}.$$
The line through $(\alpha, \bar\alpha)$ that is orthogonal to $\check\gamma(t)$ is therefore $\bar{x} - \bar\alpha = -t(x - \alpha)$. These latter two equations together imply (by eliminating $\bar{x}$) that
$$tx - t^2 + t^{-1} = \bar\alpha - t(x - \alpha),$$
and solving for $x$ produces
$$x = \frac{1}{2}\left( \alpha + t + \bar\alpha t^{-1} - t^{-2} \right).$$
On the other hand, if $\lambda_\alpha(x, \bar{x})$ lies on $\check\gamma(t)$, and hence in $J$, then $t^3 - t^2(2x - \alpha) + t\bar\alpha - 1 = 0$, which produces the same solution for $x$, as desired. The proof for the intersection of $J$ with a line parallel to the $y$-axis is virtually identical.

From this geometric description of the intersection of $J$ with a horizontal or vertical line, we can find an algebraic equation for $J$ in $\mathbb{C}^2$.

Corollary 1. The Julia set of $f$ is the real hypersurface in $\mathbb{C}^2$ having the equation
$$2\,\mathrm{Re}\,(x - \bar{y})^3 + \mathrm{Re}\left[ (x - \bar{y})^2 (\bar{x}^2 - y^2) \right] = 0.$$

Proof. Start in $\mathbb{E}^2$ with the real lines
$$t^3 - t^2 x + t\bar{x} - 1 = 0 \qquad \text{and} \qquad \bar{x} - \bar\alpha = -t(x - \alpha),$$
then eliminate $t$ to get
$$\left( \frac{\bar{x} - \bar\alpha}{x - \alpha} \right)^3 + \left( \frac{\bar{x} - \bar\alpha}{x - \alpha} \right)^2 x + \frac{\bar{x} - \bar\alpha}{x - \alpha}\,\bar{x} + 1 = 0.$$
Now a point $(x, y) \in \mathbb{C}^2$ is in $J$ if $\mathrm{pr}_{\mathbb{E}^2}(x, y)$ satisfies this equation (meaning we replace $x$ with $(x + \bar{y})/2$ and $\bar{x}$ with $(\bar{x} + y)/2$) when $\alpha = \bar{y}$, which yields
$$2(\bar{x} - y)^3 + (\bar{x} - y)^2 (x^2 - \bar{y}^2) + (x - \bar{y})^2 (\bar{x}^2 - y^2) + 2(x - \bar{y})^3 = 0.$$
This is equivalent to the desired equation.

Note that in particular the equation in Corollary 1 is satisfied when $y = \bar{x}$, so $\mathbb{E}^2$ is entirely contained in $J$. This is to be expected, because every point of $\mathbb{E}^2$ lies on a line that intersects $L_\infty$ on the circle at infinity.

To end this section, we provide a description of the Fatou set $\Omega$.

Corollary 2. $\Omega$ has two components, each of which is biholomorphic to $(\mathbb{D} \times \mathbb{D})/\sigma$, where $\mathbb{D}$ is the open unit disk in $\mathbb{C}$ and $\sigma$ is the involution $\sigma(u, v) = (v, u)$. These two components are exchanged by $f$.

Proof of Corollary 2. Define the following two functions from $\mathbb{C}^2$ to $\mathbb{CP}^2$:
$$\Phi_x(u, v) = \left[ u^2 v + uv^2 + 1 : u + v + u^2 v^2 : uv \right],$$
$$\Phi_y(u, v) = \left[ u + v + u^2 v^2 : u^2 v + uv^2 + 1 : uv \right].$$
Direct computation shows that
$$(f \circ \Phi_x)(u, v) = \Phi_y(u^2, v^2) \qquad \text{and} \qquad (f \circ \Phi_y)(u, v) = \Phi_x(u^2, v^2),$$
and for $uv \ne 0$, $\Phi_x(1/u, 1/v) = \Phi_y(u, v)$. Geometrically, $u$ and $v$ are the $t$-parameters for two of the lines in $\mathcal{D}^\vee$ passing through $\Phi_x(u, v)$, the third being $1/uv$. Thus, $\Phi_x(u, v)$ is contained in $J$ if and only if either $u$ or $v$ lies on the unit circle, and the same holds for $\Phi_y(u, v)$. Together, $\Phi_x$ and $\Phi_y$ cover all of $\mathbb{CP}^2$. By definition of $J$ as the complement of $\Omega$, we see that $\Omega$ is covered by the two images of $\mathbb{D} \times \mathbb{D}$ via $\Phi_x$ and $\Phi_y$. Thus $\Omega$ has two connected components. The polynomials defining $\Phi_x$ and $\Phi_y$ are symmetric in $u$ and $v$, and distinct unordered pairs $\{u, v\}$ lead to different points of $\mathbb{CP}^2$ by $\Phi_x$ and $\Phi_y$. This proves the result.

The functions $\Phi_x$ and $\Phi_y$ are variants of the function used in [11] as an "inverse Böttcher coordinate" on the Julia set of $f$. We can see from the formulas for $f \circ \Phi_x$ and $f \circ \Phi_y$ how the orbit of any point of $\Omega$ tends uniformly and super-exponentially to the orbit consisting of $\Phi_x(0, 0) = [1 : 0 : 0]$ and $\Phi_y(0, 0) = [0 : 1 : 0]$.

6. ITERATED MONODROMY GROUP OF THE DELTOID MAP. We begin this final section with one more exceptional property of $f$. The Jacobian determinant of $f$ at $(x, y) \in \mathbb{C}^2$ is $4(1 - xy)$. Thus the locus of critical points in $\mathbb{C}^2$ is the curve $\mathcal{C}$ having equation $xy = 1$, whose importance was previously noted in Section 3. Indeed, because the lines $\check\gamma(t)$ and $\check\gamma(-t)$ have the same image under $\check f$, their point of intersection must be a critical point of $f$; by property (C), all such points lie on $\mathcal{C}$. If we parametrize $\mathcal{C}$ by $(t, 1/t)$, then we find that the image of a point of $\mathcal{C}$ can be written as
$$f\left(t, \frac{1}{t}\right) = \left( -2t + \frac{1}{t^2},\ t^2 - \frac{2}{t} \right) = \gamma(-t),$$
and so we see that $f(\mathcal{C}) = \mathcal{D}$. Because $\mathcal{D}$ is forward invariant under $f$, we conclude that $f$ is post-critically finite, meaning that the post-critical locus $\bigcup_{n \ge 1} f^n(\mathcal{C})$ is an algebraic curve (in this case, $\mathcal{D}$ itself). (Post-critically finite maps of $\mathbb{CP}^2$ were introduced in [5], under the name of "critically finite rational maps.") Set $\mathcal{X} = \mathbb{C}^2 \setminus \mathcal{D}$ and $\mathcal{X}_1 = \mathcal{X} \setminus \mathcal{C}$. Then the above property implies that $f|_{\mathcal{X}_1}$ is a covering map from $\mathcal{X}_1$ to $\mathcal{X}$, called a partial self-covering of $\mathcal{X}$.
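The statements about the critical locus at the start of Section 6 can be checked numerically. The sketch below uses hypothetical helper names and central finite differences (which are exact up to rounding for a quadratic map) to compare the Jacobian determinant with $4(1 - xy)$, and then checks that $f(t, 1/t) = \gamma(-t)$ lands on the curve (3).

```python
# Check: Jacobian determinant 4(1 - xy), and f maps the curve xy = 1 onto the deltoid.
# Helper names and the step size h are illustrative assumptions.

def f(x, y):
    """The deltoid map in affine coordinates."""
    return (y * y - 2 * x, x * x - 2 * y)


def gamma(t):
    """Parametrization (1) of the deltoid."""
    return (2 * t + 1 / t**2, 2 / t + t**2)


def deltoid_equation(x, y):
    """Left side of equation (3)."""
    return x**2 * y**2 - 4 * (x**3 + y**3) + 18 * x * y - 27


def jacobian_det(x, y, h=1e-4):
    """Jacobian determinant of f at (x, y) by central differences."""
    fxp, fxm = f(x + h, y), f(x - h, y)
    fyp, fym = f(x, y + h), f(x, y - h)
    a = (fxp[0] - fxm[0]) / (2 * h)  # d f_1 / dx
    c = (fxp[1] - fxm[1]) / (2 * h)  # d f_2 / dx
    b = (fyp[0] - fym[0]) / (2 * h)  # d f_1 / dy
    d = (fyp[1] - fym[1]) / (2 * h)  # d f_2 / dy
    return a * d - b * c


for x, y in [(0.3, -1.2), (1 + 2j, 0.5 - 0.5j), (2.0, 0.5)]:
    assert abs(jacobian_det(x, y) - 4 * (1 - x * y)) < 1e-6

for t in [1.5, -0.4 + 0.9j, 2j]:
    image = f(t, 1 / t)
    expected = gamma(-t)
    assert abs(jacobian_det(t, 1 / t)) < 1e-6           # (t, 1/t) is a critical point
    assert abs(image[0] - expected[0]) < 1e-9
    assert abs(image[1] - expected[1]) < 1e-9
    assert abs(deltoid_equation(*image)) < 1e-7          # the critical value lies on D
```

The last sample point of the first loop, $(2, 1/2)$, already lies on $xy = 1$, so the determinant there is zero.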
Let $x_0 = (0, 0) \in \mathcal{X}$; then the fundamental group $\pi_1(\mathcal{X}, x_0)$ permutes the set of preimages of $x_0$ by $f$ in a standard way: given $[\eta] \in \pi_1(\mathcal{X}, x_0)$ and $y \in f^{-1}(x_0)$, use $f$ to lift $\eta$ to a path $\bar\eta$ starting at $y$, and let $[\eta] \cdot y$ be the endpoint of $\bar\eta$. This defines a homomorphism $\mu_f$ from $\pi_1(\mathcal{X}, x_0)$ to the symmetric group on $f^{-1}(x_0)$, called the monodromy homomorphism. Likewise, if we set $\mathcal{X}_n = f^{-n}(\mathcal{X})$, then $f^n|_{\mathcal{X}_n}$ is a covering map, and $\pi_1(\mathcal{X}, x_0)$ acts on $f^{-n}(x_0)$ by the monodromy homomorphism $\mu_{f^n}$. The intersection
$$\kappa_f = \bigcap_{n \ge 1} \ker \mu_{f^n}$$
is a normal subgroup of $\pi_1(\mathcal{X}, x_0)$, consisting of all elements $[\eta]$ such that every lift of $\eta$ by every iterate of $f$ remains a loop. The quotient $\mathrm{IMG}(f) = \pi_1(\mathcal{X}, x_0)/\kappa_f$ is called the iterated monodromy group of $f$. (See [6, 12] for details.)
Iterated monodromy groups are a relatively recent addition to the complex dynamics toolbox. They have already proved useful in classification problems [1] and in determining the shape of Julia sets more complicated than that of the deltoid map [13]. Nevertheless, only a few such groups have been explicitly calculated, especially for maps in dimension greater than 1. A nice feature of $f$ is that $\mathrm{IMG}(f)$ can be found directly from the definition, which is how we will prove our second main result.

Theorem 2. $\mathrm{IMG}(f)$ is isomorphic to the affine Coxeter group $\tilde{A}_2$.

$\tilde{A}_2$ can be realized geometrically as the group generated by reflections across the sides of an equilateral triangle in the plane. It has the group presentation
$$\tilde{A}_2 = \left\langle g_1, g_2, g_3 \mid \forall k\ \ g_k^2 = 1,\ \ \forall j\,\forall k\ \ (g_j g_k)^3 = 1 \right\rangle.$$
On the other hand, the fundamental group $\pi_1(\mathcal{X}, x_0)$ is isomorphic to the related Artin group
$$\bar{A}_2 = \left\langle h_1, h_2, h_3 \mid \forall j\,\forall k\ \ h_j h_k h_j = h_k h_j h_k \right\rangle$$
(see [2] for a proof). Note that in $\tilde{A}_2$, the relation $(g_j g_k)^3 = 1$ is equivalent to $g_j g_k g_j = g_k g_j g_k$, and so $\tilde{A}_2$ can be obtained from $\bar{A}_2$ by adding the relations $h_k^2 = 1$ for $k = 1, 2, 3$. We will accomplish this in Lemma 2, then show that no additional relations are present in $\mathrm{IMG}(f)$.

First we find a useful set of generators for $\pi_1(\mathcal{X}, x_0)$: these can be chosen as circles contained in the lines $\check\gamma(\omega)$, $\check\gamma(\omega^2)$, and $\check\gamma(1)$ and passing through $x_0$. To see why, we use the Zariski–van Kampen theorem [18, 22], which states that generators can be obtained by taking a sufficiently general line $L$ and drawing loops around the finite set of points $L \cap \mathcal{D}$. The condition on $L$ is that $L \cap \mathcal{D}$ should have four distinct points in $\mathbb{C}^2$. We choose a line of the form $L = \{(x, y) \mid x + y = -a\}$, where $2 < a < 3$. Then (1) implies that $\gamma(t)$ lies on $L$ if $t^4 + 2t^3 + at^2 + 2t + 1 = 0$, and our choice of $a$ ensures that all solutions of this equation lie on the unit circle, which means all points of intersection in $L \cap \mathcal{D}$ lie in $\mathbb{E}^2$. (See Figure 6, left.)

Thus the four points of $L \cap \mathcal{D}$ lie in a straight (real) line, and so we can draw small loops around these inside the (complex) line $L$. Each such loop intersects $\mathbb{E}^2$ in two points: one in $K$, and one outside. Connect each loop in $L$ from the point where it intersects $K$ to $x_0$ with a line segment, so that it becomes an element of $\pi_1(\mathcal{X}, x_0)$ (with orientation given by the complex line in which it lies).

Let's label these elements. The real deltoid has three cusps, and between these lie three "branches":
• one from $\gamma(1)$ to $\gamma(\omega)$,
• one from $\gamma(\omega)$ to $\gamma(\omega^2)$, and
• one from $\gamma(\omega^2)$ to $\gamma(1)$.
Call these branches, respectively, $b_2$, $b_3$, and $b_1$, so that $b_k$ and $b_{k+1}$ meet at the cusp to which $\check\gamma(\omega^{k+2})$ lies tangent. (All indices are computed modulo 3.) The loops in $L$ surrounding $b_1$ and $b_2$ are homotopic in $\mathcal{X}$ to loops that lie in $\check\gamma(\omega)$ and $\check\gamma(\omega^2)$. On the other hand, the two loops surrounding $b_3$ are both homotopic to the same loop in $\check\gamma(1)$.
Figure 6. Left: The complex line $L$ with equation $x + y = a$ intersects the deltoid $\mathcal{D}$ at four points, all contained in $\mathbb{E}^2$, provided $-3 < a < -2$. Around each point of intersection, draw a loop inside $L$ that intersects $K$ at one point. When connected to $x_0$ by additional segments in $K$, these loops generate $\pi_1(\mathcal{X}, x_0)$. Right: Generators for $\pi_1(\mathcal{X}, x_0)$, homotopic to those found in left picture. Each loop $\eta_k$ is contained in the complex line $\check\gamma(\omega^k)$, which intersects $\mathcal{D}$ at the cusp $\gamma(\omega^k)$ and at the midpoint of the opposite branch $b_k$.
Thus $\pi_1(X, x_0)$ is generated by three elements, which have representatives lying in the lines $\check\gamma(\omega)$, $\check\gamma(\omega^2)$, and $\check\gamma(1)$. Call these, respectively, $\eta_1$, $\eta_2$, and $\eta_3$, so that $\eta_k$ wraps around $b_k \cap \check\gamma(\omega^k)$. (See Figure 6, right.)

Lemma 2. For each $k = 1, 2, 3$ and for all $n \ge 1$, $\mu_{f^n}([\eta_k])$ has order 2.

Proof. We want to show that every lift of $\eta_k$ by every iterate of $f$ is either a closed loop, or forms a closed loop with one other lift. We will use the fact that every lift of $\eta_k$ by any iterate of $f$ is contained in some line $\check\gamma(t) \in D^\vee$. The line $\check\gamma(t)$, when $t \in \mathbb{C} \setminus \{0\}$, can be parametrized by
\[ \sigma_t(s) = \left( t + \frac{s}{\sqrt{t}},\ \frac{1}{t} + s\sqrt{t} \right), \qquad s \in \mathbb{C}, \]
as may be checked directly from the equation for $\check\gamma(t)$. (Here, $\sqrt{t}$ can be either square root of $t$.) This parametrization of $\check\gamma(t)$ has the nice feature that when $s = 0$, the resulting point lies on the critical locus $\mathcal{C}$, since it is the midpoint of $\gamma(\sqrt{t})$ and $\gamma(-\sqrt{t})$ (see property (B) from Section 3). Now when we apply $f$ to $\sigma_t(s)$, we obtain
\[ f(\sigma_t(s)) = \left( \frac{1}{t^2} + (s^2 - 2)\,t,\ t^2 + (s^2 - 2)\,\frac{1}{t} \right) = \sigma_{1/t^2}(s^2 - 2). \]
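The conjugation identity above can be spot-checked numerically. The following is only a sketch: it assumes the coordinate form $f(x, y) = (y^2 - 2x,\ x^2 - 2y)$ for the deltoid map, which is not restated in this section, and the helper names `sigma` and `identity_gap` are hypothetical.

```python
import cmath

def f(x, y):
    # Assumed coordinate form of the deltoid map on C^2 (an assumption of this sketch).
    return (y * y - 2 * x, x * x - 2 * y)

def sigma(t, s, root):
    # Point sigma_t(s) of the line with parameter t; `root` must be a square root of t.
    return (t + s / root, 1 / t + s * root)

def identity_gap(t, s):
    """Largest coordinate-wise difference between f(sigma_t(s)) and sigma_{1/t^2}(s^2 - 2)."""
    root = cmath.sqrt(t)
    lhs = f(*sigma(t, s, root))
    # 1/t is a square root of 1/t^2, so it is an admissible choice on the right-hand side.
    rhs = sigma(1 / t**2, s * s - 2, 1 / t)
    return max(abs(lhs[0] - rhs[0]), abs(lhs[1] - rhs[1]))

# A few arbitrary complex sample values of (t, s) with t nonzero.
samples = [(0.7 + 0.4j, -1.3 + 0.2j), (-2.0 + 0.5j, 0.25 - 1.5j), (1j, 2.0 + 0j)]
gaps = [identity_gap(t, s) for t, s in samples]
```

Each entry of `gaps` should be at the level of floating-point rounding if the assumed form of $f$ matches the one used in the article.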
So we just need to consider the possible lifts of a closed curve in $\mathbb{C}$ by the polynomial $T(s) = s^2 - 2$, avoiding the post-critical set of $T$. (See, for example, Figure 7, which illustrates the four lifts of $\eta_3$ by $f$.) The critical point of $T(s)$ is 0, and its critical value is $-2$. The image of $-2$ by $T(s)$ is 2, which is a fixed point. Let $\eta$ be any loop in $\mathbb{C}$ that does not pass through $-2$ or 2. If $\eta$ does not encircle $-2$, then it lifts to a pair of disjoint loops; if $\eta$ encircles 2, then one of these loops encircles 2 and one encircles $-2$, otherwise neither lift encircles $-2$. If $\eta$ does encircle $-2$, then it lifts to a double cover of itself, consisting of two arcs, that does not encircle $-2$.
Figure 7. The four lifts of η3 by f . The loops lie in γˇ (−1), and the arcs lie in γˇ (1).
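The dichotomy for lifts by $T(s) = s^2 - 2$ can be illustrated with a small path-lifting experiment. This is a sketch under simplifying assumptions: the loop is a round circle (the centers and radii below are arbitrary choices), and the lift is followed by picking, at each step, whichever preimage $\pm\sqrt{w + 2}$ is closer to the previous point. The function name `lift_circle` is hypothetical.

```python
import cmath
import math

def T(s):
    return s * s - 2

def lift_circle(center, radius, steps=2000):
    """Lift the circle |w - center| = radius through T by continuity.

    Returns (start, end) of the lifted path, where start is the principal
    square root of eta(0) + 2.
    """
    def eta(k):
        return center + radius * cmath.exp(2j * math.pi * k / steps)

    start = cmath.sqrt(eta(0) + 2)
    s = start
    for k in range(1, steps + 1):
        root = cmath.sqrt(eta(k) + 2)
        # The two preimages of eta(k) are +root and -root; keep the nearer one.
        s = root if abs(root - s) <= abs(-root - s) else -root
    return start, s

# Circle around the fixed point 2 (does not encircle -2): the lift closes up.
start_a, end_a = lift_circle(2, 1)
# Circle around the critical value -2: the lift is an arc ending at minus its start.
start_b, end_b = lift_circle(-2, 1)
```

In the second case the two lifts (starting from `start_b` and from `-start_b`) together form one closed loop, which is the order-2 behavior used in Lemma 2.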
In other words, Lemma 2 says that the square of each generator $[\eta_k]$ is in $\kappa_f$. Together with the relations in $\pi_1(X, x_0) = \bar{A}_2$, this result implies that $\mathrm{IMG}(f)$ is a quotient of $\tilde{A}_2$. To complete the proof of Theorem 2, we need to show that no additional relations are present in $\mathrm{IMG}(f)$.

Proof of Theorem 2. Recall the realization of $\tilde{A}_2$ as the group generated by reflections $\rho_1, \rho_2, \rho_3$ across the sides of an equilateral triangle. This group can be expressed as the semidirect product $\Lambda \rtimes D_3$, where $\Lambda$ is the normal subgroup consisting of translations (isomorphic to $\mathbb{Z}^2$) and $D_3$ is the subgroup that fixes a vertex of the triangle (the dihedral group of order 6). $D_3$ is generated by the reflections in two adjacent sides of the triangle. Suppose $\varphi : \tilde{A}_2 \to \mathrm{IMG}(f)$ is the homomorphism that sends $\rho_k$ to $[\eta_k]\kappa_f$. If $\ker\varphi \cap D_3 \ne \{\mathrm{id}\}$, then the order of $\varphi(D_3)$ is either 1 or 2, because the group of rotations is the only nontrivial normal subgroup of $D_3$; in either case we must have $\varphi(\rho_1) = \varphi(\rho_2) = \varphi(\rho_3)$. On the other hand, if $\ker\varphi \cap \Lambda \ne \{\mathrm{id}\}$, then because this intersection is invariant under the action of $D_3$, it must contain two linearly independent elements $\lambda_1, \lambda_2$; the group $\Lambda/(\lambda_1\mathbb{Z} \oplus \lambda_2\mathbb{Z})$ is then finite and so is $\varphi(\Lambda)$. Therefore, in order to show that $\varphi$ is an isomorphism, it suffices to show that $[\eta_1]\kappa_f \ne [\eta_2]\kappa_f$ and that $\mathrm{IMG}(f)$ is infinite. The first condition is easily checked by observing that $\mu_f([\eta_1])$ and $\mu_f([\eta_2])$ are different permutations of $f^{-1}(x_0)$. The second condition may be seen by restricting our attention to an invariant line such as $\check\gamma(1)$; on this line $f$ behaves like the single-variable Chebyshev map $s \mapsto s^2 - 2$, and the iterated monodromy group of such a map is known to have elements of infinite order (see [12]).

REFERENCES
[1] Bartholdi, L., Nekrashevych, V. (2006). Thurston equivalence of topological polynomials. Acta Math. 197(1): 1–51.
[2] Bartolo, E. A., Agustín, J. I. C. (2009). On the topology of hypocycloids. In: Mathematical Physics and Field Theory: Julio Abad, "in Memoriam." Zaragoza: Prensas Universitarias de Zaragoza, pp. 83–98.
[3] Bedford, E., Jonsson, M. (2000). Dynamics of regular polynomial endomorphisms of $\mathbb{C}^k$. Amer. J. Math. 122(1): 152–212.
[4] Belk, J., Koch, S. (2010). Iterated monodromy for a two-dimensional map. In: Bonk, M., Gilman, J., Masur, H., Minsky, Y., Wolf, M., eds. In the Tradition of Ahlfors–Bers V. Contemp. Math., Vol. 510. Providence, RI: American Mathematical Society, pp. 1–12.
[5] Fornæss, J. E., Sibony, N. (1992). Critically finite rational maps of $\mathbb{P}^2$. In: Nagel, A., Stout, E. L., eds. The Madison Symposium on Complex Analysis. Contemp. Math., Vol. 137. Providence, RI: American Mathematical Society, pp. 245–260.
[6] Godillon, S. (2012). Introduction to iterated monodromy groups. Ann. Fac. Sci. Toulouse Math. 21(5): 1069–1118.
[7] Hoffman, M. E., Withers, W. D. (1988). Generalized Chebyshev polynomials associated with affine Weyl groups. Trans. Amer. Math. Soc. 308(1): 91–104.
[8] Hubbard, J. H., Papadopol, P. (1994). Superattractive fixed points in $\mathbb{C}^n$. Indiana J. Math. 43(1): 321–365.
[9] Lopes, A. O. (1990). Dynamics of real polynomials on the plane and triple point phase transition. Math. Comput. Model. 13(9): 17–32.
[10] Morley, F., Morley, F. V. (1933). Inversive Geometry. Boston, MA: Ginn and Company. (2014). Reprint. Mineola, NY: Dover Publications.
[11] Nakane, S. (2008). External rays for polynomial maps of two variables associated with Chebyshev maps. J. Math. Anal. Appl. 338(1): 552–562.
[12] Nekrashevych, V. (2011). Iterated monodromy groups. In: Campbell, C., Quick, M., Robertson, E., Roney-Dougal, C., Smith, G., Traustason, G., eds. Groups St Andrews 2009 in Bath. London Math. Soc. Lecture Note Series, Vol. 387. Cambridge, UK: Cambridge Univ. Press, pp. 41–93.
[13] Nekrashevych, V. (2012). The Julia set of a post-critically finite endomorphism of $\mathbb{PC}^2$. J. Mod. Dyn. 6(3): 327–375.
[14] Ryland, B., Munthe-Kaas, H. (2011). On multivariate Chebyshev polynomials and spectral approximations on triangles. In: Hesthaven, J., Rønquist, E., eds. Spectral and High Order Methods for Partial Differential Equations. Lect. Notes Comput. Sci. Eng., Vol. 76. Heidelberg: Springer, pp. 19–41.
[15] Uchimura, K. (2001). The set of points with bounded orbits for generalized Chebyshev mappings. Int. J. Bifur. Chaos. 11(1): 91–107.
[16] Uchimura, K. (2007). Dynamics of symmetric polynomial endomorphisms of $\mathbb{C}^2$. Michigan Math. J. 55(3): 483–511.
[17] Uchimura, K. (2009). Generalized Chebyshev maps of $\mathbb{C}^2$ and their perturbations. Osaka J. Math. 46(4): 995–1017.
[18] Van Kampen, E. (1933). On the fundamental group of an algebraic curve. Amer. J. Math. 55(1): 255–260.
[19] Veselov, A. P. (1986). Integrable polynomial mappings and Lie algebras. In: Kozlov, V. V., Fomenko, A. T., eds. Geometry, Differential Equations and Mechanics (Moscow 1985). Moscow: Moskov. Gos. Univ. Mekh.-Mat. Fak., pp. 59–63.
[20] Veselov, A. P. (1991). Integrable maps. Russian Math. Surveys. 46(5): 1–51.
[21] Withers, W. D. (1988). Folding polynomials and their dynamics. Amer. Math. Monthly. 95(5): 399–413.
[22] Zariski, O. (1929). On the problem of existence of algebraic functions of two variables possessing a given branch curve. Amer. J. Math. 51(2): 305–328.

JOSHUA BOWMAN is an assistant professor of mathematics in the Natural Sciences Division of Seaver College at Pepperdine University.
Natural Sciences Division, Pepperdine University, 24255 Pacific Coast Highway, Malibu, CA 90263
ORCID Joshua P. Bowman
http://orcid.org/0000-0002-7705-3069
Sums of Proper Powers

We say that $m$ is a proper power if $m = a^b$ for some integers $a, b > 1$. In [2], the authors proved that every positive integer $n \ge 33^{17} + 12$ can be written as the sum of four proper powers. We improve their result to:

Theorem. All $n \ge 28$ can be represented as a sum of four proper powers. Furthermore, the only exceptions are 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 22, 23, 27.

Proof. Let $n \ge 33^3 + 12$ be an odd number. Since $\gcd(\varphi(32), 3) = 1$ (where $\varphi$ is the Euler totient function), $a^3$ runs through all odd residue classes modulo 32 as $a$ runs through all odd residues modulo 32. Let $P \in \{3^3, 5^3, \ldots, 33^3\}$ be a cube such that $P \equiv n - 12 \bmod 32$. Then $n - P = 4(8k + 3)$. Gauss showed that every positive integer not congruent to 0, 4, 7 mod 8 is a sum of three coprime squares [1, p. 262]. Hence we may choose integers $x, y, z$ with $x^2 + y^2 + z^2 = 8k + 3$. Looking mod 8 we see $x, y, z$ are all odd, so we may assume $x, y, z \ge 1$. But then $n = P + (2x)^2 + (2y)^2 + (2z)^2$. Therefore, for all odd $n \ge 33^3 + 12 = 35949$, we have a representation of $n$ as a sum of four proper powers.

Let $n \ge 49^3 + 45 = 117694$ be even. Since $\gcd(3, \varphi(48)) = 1$, as $a$ runs through all odd residues modulo 48, $a^3$ also runs through all the odd residues modulo 48. Therefore, we can find an odd cube $P \in \{3^3, 5^3, \ldots, 49^3\}$ such that $P \equiv n - 45 \bmod 48$. Hence $n = P + (48k + 45)$ for some integer $k$. Since $48k + 45 \equiv 5 \bmod 8$, there exist integers $x \ge y \ge z \ge 0$ such that $x^2 + y^2 + z^2 = 48k + 45$ with $\gcd(x, y, z) = 1$. We want to show that none of $x, y, z$ is 0 or 1. First, note that $x^2 + y^2 \equiv 0 \bmod 3$ implies $x \equiv y \equiv 0 \bmod 3$. So if $z = 0$, then $\gcd(x, y, z) \ge 3$. Therefore $x, y, z \ge 1$. If $z = 1$, then $x^2 + y^2 = 48k + 44 \equiv 12 \bmod 16$. The squares modulo 16 are 0, 1, 4, 9. We cannot add two of them to get 12 mod 16; therefore the representation of $48k + 45$ as $x^2 + y^2 + z^2$ includes only proper powers. Hence, $n = P + x^2 + y^2 + z^2$ is the sum of four proper powers. Therefore, for even $n \ge 49^3 + 45$ we have a representation of $n$ as a sum of four proper powers.

To conclude one needs to check $n < 117693$. We computed all possible sums of four proper powers up to 117693 and we found all numbers between 1 and 117693 that are not represented as a sum of four proper powers.
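The final computational step can be reproduced on a small scale. The sketch below is not the authors' program; it is a brute-force search with a hypothetical bound of 500 (far below the 117693 needed for the full proof), using function names chosen here for illustration.

```python
def proper_powers(limit):
    """All m <= limit with m = a**b for integers a, b > 1."""
    powers = set()
    a = 2
    while a * a <= limit:
        p = a * a
        while p <= limit:
            powers.add(p)
            p *= a
        a += 1
    return sorted(powers)

def exceptions(limit):
    """Positive integers up to limit that are not a sum of four proper powers."""
    pp = proper_powers(limit)
    # Sums of two proper powers, then sums of two such sums.
    two = {p + q for p in pp for q in pp if p + q <= limit}
    four = {u + v for u in two for v in two if u + v <= limit}
    return [n for n in range(1, limit + 1) if n not in four]

found = exceptions(500)
```

Up to this bound, `found` should agree with the list of exceptions in the theorem; raising the bound to 117693 only changes the running time, not the method.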
REFERENCES
[1] Dickson, L. E. (1966). History of the Theory of Numbers, Vol. II: Diophantine Analysis. New York: Chelsea Publishing Co.
[2] Schinzel, A., Sierpiński, W. (1965). Sur les puissances propres. Bull. Soc. Roy. Sci. Liège. 34: 550–554.

Submitted by Paul Pollack, University of Georgia, and Enrique Treviño, Lake Forest College
doi.org/10.1080/00029890.2021.1847588
MSC: Primary 11D85, Secondary 11A07
Pick’s Theorem and Convergence of Multiple Fourier Series L. Brandolini, L. Colzani, S. Robins, and G. Travaglini
Abstract. We add another brick to the large building comprising proofs of Pick’s theorem. Although our proof is not the most elementary, it is short and reveals a connection between Pick’s theorem and the pointwise convergence of multiple Fourier series of piecewise smooth functions.
1. INTRODUCTION. A polygon in the Cartesian plane is simple if it has no holes and if its boundary does not intersect itself. It is called an integer polygon if all of its vertices have integer coordinates. Let $P$ be a simple integer polygon, $|P|$ its area, $I$ the number of integer points strictly inside $P$, and $B$ the number of integer points on the boundary $\partial P$. Then the following holds.

Theorem 1 (Pick).
\[ |P| = I + \frac{1}{2}B - 1. \tag{1} \]
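As the text notes, the statement is easy to verify on examples. The following sketch does so for one hypothetical L-shaped polygon, using integer arithmetic only: the shoelace formula for the area, a gcd count for $B$, and even-odd ray casting for $I$. All function names are illustrative choices.

```python
from math import gcd

def twice_area(poly):
    """Twice the unsigned area of a simple polygon (shoelace formula)."""
    n = len(poly)
    return abs(sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
                   for i in range(n)))

def boundary_count(poly):
    """Number of integer points on the boundary: gcd(|dx|, |dy|) per edge."""
    n = len(poly)
    return sum(gcd(abs(poly[(i + 1) % n][0] - poly[i][0]),
                   abs(poly[(i + 1) % n][1] - poly[i][1])) for i in range(n))

def on_segment(p, a, b):
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return (cross == 0 and min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))

def strictly_inside(p, poly):
    """Even-odd ray casting (ray towards +x), exact for integer input."""
    n = len(poly)
    inside = False
    for i in range(n):
        a, b = poly[i], poly[(i + 1) % n]
        if on_segment(p, a, b):
            return False
        if (a[1] > p[1]) != (b[1] > p[1]):
            # Sign test for "the edge crosses the ray to the right of p".
            t = (b[0] - a[0]) * (p[1] - a[1]) - (p[0] - a[0]) * (b[1] - a[1])
            if (t > 0) == (b[1] > a[1]):
                inside = not inside
    return inside

def interior_count(poly):
    xs = [v[0] for v in poly]
    ys = [v[1] for v in poly]
    return sum(strictly_inside((x, y), poly)
               for x in range(min(xs), max(xs) + 1)
               for y in range(min(ys), max(ys) + 1))

# Hypothetical example: an L-shaped hexagon with area 12.
L_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
I, B = interior_count(L_shape), boundary_count(L_shape)
```

For this polygon, Pick's formula (1) in doubled form reads `twice_area(L_shape) == 2 * I + B - 2`.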
In spite of the elementary statement, this is not an ancient result. It was published by Georg Pick in 1899, and first popularized by Hugo Steinhaus in 1937 in the Polish edition of Mathematical Snapshots; see [12, Chapter 4] for an English edition.

The theorem has many proofs and interesting features. Its statement can be explained to elementary school children, who could be asked to verify it on examples. On the other hand, it can be related to certain nontrivial topics in mathematics. See, e.g., [5] for a connection to Euler's formula for planar graphs, or [10] for a connection to Minkowski's theorem on integer points in convex bodies, or [3] for a complex-analytic proof.

A sketch of an easy proof runs as follows.
Step 1. A simple integer polygon can be triangulated into integer primitive triangles, with no integer points other than the vertices.
Step 2. Both terms $|P|$ and $I + \frac{1}{2}B - 1$ in (1) are "additive" with respect to the above triangulation.
Step 3. A primitive triangle together with one of its reflections gives a parallelogram that tiles the plane under integer translations.
Step 4. This latter parallelogram has area 1, so that (1) holds true for primitive triangles.

The purpose of this article is to exhibit a direct connection between Pick's theorem and harmonic analysis. The Fourier-analytic proof we give here is rather short and self-contained, it does not rely on any of the above geometric steps, and it is an elementary consequence of a classical result on pointwise convergence of multiple Fourier series. Moreover, it suggests a point of departure for higher-dimensional investigations.

doi.org/10.1080/00029890.2021.1839241
MSC: Primary 52B20, Secondary 42B05

In what follows, our standard reference for the harmonic analysis on Euclidean spaces is [13]. We recall some notations and some well-known results. If $f$ and $\varphi$ are integrable functions on $\mathbb{R}^d$, then $\varphi * f$ denotes the convolution:
\[ \varphi * f(x) = \int_{\mathbb{R}^d} \varphi(x - y)\, f(y)\, dy. \]
Moreover, $\hat f$ denotes the Fourier transform:
\[ \hat f(\xi) = \int_{\mathbb{R}^d} f(x)\, e^{-2\pi i \xi \cdot x}\, dx. \]
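It is easily verified that if $f$ is integrable on $\mathbb{R}^d$, then $\sum_{n \in \mathbb{Z}^d} f(n + x)$ is a periodic function integrable on the torus $\mathbb{R}^d/\mathbb{Z}^d$ and its Fourier coefficients are the restriction of $\hat f$ to the integer points in $\mathbb{Z}^d$,
\[
\begin{aligned}
\int_{\mathbb{R}^d/\mathbb{Z}^d} \Bigl( \sum_{n \in \mathbb{Z}^d} f(n + x) \Bigr) e^{-2\pi i m \cdot x}\, dx
&= \sum_{n \in \mathbb{Z}^d} \int_{n + [0,1)^d} f(y)\, e^{-2\pi i m \cdot (y - n)}\, dy \\
&= \int_{\mathbb{R}^d} f(y)\, e^{-2\pi i m \cdot y}\, dy = \hat f(m).
\end{aligned}
\]
Hence, formally, one has the Poisson summation formula
\[ \sum_{n \in \mathbb{Z}^d} f(n + x) = \sum_{m \in \mathbb{Z}^d} \hat f(m)\, e^{2\pi i m \cdot x}. \tag{2} \]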
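For a smooth, rapidly decaying function the formal identity (2) holds exactly and can be checked numerically. The sketch below uses a function not discussed in the text, the one-dimensional Gaussian $f(x) = e^{-\pi x^2}$, chosen because it is its own Fourier transform under the convention above; the truncation at 20 terms is an arbitrary choice.

```python
import cmath
import math

def lattice_side(x, terms=20):
    """Left-hand side of (2) for f(x) = exp(-pi x^2), truncated to |n| <= terms."""
    return sum(math.exp(-math.pi * (n + x) ** 2) for n in range(-terms, terms + 1))

def fourier_side(x, terms=20):
    """Right-hand side of (2); for this Gaussian, f_hat(m) = exp(-pi m^2)."""
    return sum(math.exp(-math.pi * m * m) * cmath.exp(2j * math.pi * m * x)
               for m in range(-terms, terms + 1))

# Differences between the two sides at a few sample points.
gaps = [abs(lattice_side(x) - fourier_side(x)) for x in (0.0, 0.25, 0.5, 0.9)]
```

The characteristic function of a polygon is not of this type, which is exactly why the regularized version in Theorem 2 is needed.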
Our proof of Pick's theorem is based on the Poisson summation formula applied to the characteristic function of a polygon. But there is a problem. The above formula as written does not immediately apply to nonsmooth functions, such as characteristic functions. Without additional assumptions the series in both sides of this identity may not converge pointwise. Even when both sides converge, they may differ. For example, observe that when a function is modified on a set of measure zero, such as the boundary of a polygon, the left-hand side of the formula may change, while the right-hand side remains the same. On the other hand, a correct formula can be obtained assuming natural regularity conditions on the function $f$ and using suitable summability methods for the Fourier series.

Next, we recall the elementary facts that are required in order to use Poisson summation correctly. Recall that if $\varphi$ and $f$ are square integrable, then the convolution is uniformly bounded:
\[
|\varphi * f(x)| = \Bigl| \int_{\mathbb{R}^d} \varphi(x - y)\, f(y)\, dy \Bigr|
\le \Bigl( \int_{\mathbb{R}^d} |\varphi(x - y)|^2\, dy \Bigr)^{1/2} \Bigl( \int_{\mathbb{R}^d} |f(y)|^2\, dy \Bigr)^{1/2},
\]
using the Cauchy–Schwarz inequality. It follows from the latter inequality, and the fact that square integrable functions can be approximated by continuous functions with compact support, that $\varphi * f$ is uniformly continuous. The Fourier transform of the convolution is the product of the Fourier transforms of the factors, i.e., $\widehat{\varphi * f}(\xi) = \hat\varphi(\xi)\, \hat f(\xi)$. The Fourier transform commutes with rotations, and in particular the Fourier transform of a radial function is radial. Moreover, for every dilation $\varepsilon > 0$ the Fourier transform of $\varphi_\varepsilon(x) := \varepsilon^{-d} \varphi(\varepsilon^{-1} x)$ is $\hat\varphi(\varepsilon\xi)$. The proofs of these statements involve elementary manipulations of integrals. Finally, if $\varphi$ is smooth with compact support, then $\hat\varphi$ is smooth and it has rapid decay at infinity. This follows by repeated integrations by parts in the integral that defines the Fourier transform.

A quick sketch of our proof of Pick's theorem now runs as follows. Take a smooth radial function $\varphi$ with compact support and integral 1, and define $f(x) = \varphi_\varepsilon * \chi_P(x)$, so that $\hat f(\xi) = \hat\varphi(\varepsilon\xi)\, \hat\chi_P(\xi)$. The Poisson summation formula applies to this function $f$, so that applying (2) with $x = 0$, we have
\[ \sum_{n \in \mathbb{Z}^2} \varphi_\varepsilon * \chi_P(n) = \sum_{m \in \mathbb{Z}^2} \hat\varphi(\varepsilon m)\, \hat\chi_P(m). \]
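Observe that the series on the left is finite, because $\varphi_\varepsilon * \chi_P$ has compact support, and the series on the right is absolutely convergent, because $\hat\chi_P$ is bounded and $\hat\varphi$ has fast decay at infinity. Also observe that, if $\varepsilon$ is small enough, $\varphi_\varepsilon * \chi_P(n)$ is the normalized measure of the angle at the point $n$:
\[
\varphi_\varepsilon * \chi_P(n) = \varepsilon^{-2} \int_{\mathbb{R}^2} \varphi(\varepsilon^{-1} y)\, \chi_P(n - y)\, dy =
\begin{cases}
0 & \text{if } n \notin P, \\
1 & \text{if } n \text{ is in the interior of } P, \\
1/2 & \text{if } n \text{ is in the interior of a side of } P, \\
\alpha/2\pi & \text{if } n \text{ is a vertex of } P, \text{ with interior angle } \alpha.
\end{cases}
\]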
See Figure 1.
Figure 1. Values of ϕε ∗ χP (n) at the integer points.
From the formula for the sum of the interior angles of a polygon it follows that
\[ \sum_{n \in \mathbb{Z}^2} \varphi_\varepsilon * \chi_P(n) = I + \frac{1}{2}B - 1. \]
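One can compute explicitly the Fourier transform $\hat\chi_P$ via the divergence theorem. In particular, we have
\[ \hat\varphi(0)\, \hat\chi_P(0) = \hat\chi_P(0) = |P|, \]
while for every $m \ne 0$,
\[ \hat\varphi(\varepsilon m)\, \hat\chi_P(m) = -\hat\varphi(-\varepsilon m)\, \hat\chi_P(-m), \]
and one of the main points is that all of these terms cancel. It follows that
\[ \sum_{m \in \mathbb{Z}^2} \hat\varphi(\varepsilon m)\, \hat\chi_P(m) = |P|. \]
Hence,
\[ |P| = I + \frac{1}{2}B - 1. \]
This is only a sketch of the proof, but it is not difficult to fill in the details. The details are contained in what follows.

2. CONVERGENCE OF FOURIER EXPANSIONS. The following variation on the classical Poisson summation formula is tailored for our problem.

Theorem 2. Let $\varphi$ and $f$ be square integrable functions on $\mathbb{R}^d$ with compact support. Assume that $\int_{\mathbb{R}^d} \varphi(x)\, dx = 1$, and also assume that for every $x$,
\[ f(x) = \lim_{\varepsilon \to 0+} \{ \varphi_\varepsilon * f(x) \}. \tag{3} \]
Then, for every $\varepsilon > 0$,
\[ \sum_{m \in \mathbb{Z}^d} \bigl| \hat\varphi(\varepsilon m)\, \hat f(m) \bigr| < +\infty. \]
Moreover, for every $x$,
\[ \sum_{n \in \mathbb{Z}^d} f(n + x) = \lim_{\varepsilon \to 0+} \Bigl\{ \sum_{m \in \mathbb{Z}^d} \hat\varphi(\varepsilon m)\, \hat f(m)\, e^{2\pi i m \cdot x} \Bigr\}. \tag{4} \]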
If $\varphi$ is smooth with compact support, then $\hat\varphi$ has fast decay at infinity and the theorem reduces to the classical Poisson summation formula. See, e.g., [13, Chapter 7, Corollary 2.6] and [13, Chapter 2, Theorem 3.16] for similar results where $\varphi$ is the Poisson kernel. Condition (3) is a regularity assumption on the function $f$, and it is satisfied at every point of continuity of the function. In particular, in dimension $d = 1$ and if $\varphi$ is even, at a jump discontinuity of $f$ the hypothesis is satisfied provided that
\[ f(x) = \lim_{\varepsilon \to 0+} \frac{f(x + \varepsilon) + f(x - \varepsilon)}{2}. \]
Recall that the one-dimensional Fourier series of a piecewise smooth function at a jump discontinuity converges precisely to the above limit.

Proof. Under the assumption of the theorem, the convolution $\varphi_\varepsilon * f$ is bounded with compact support, the sum $\sum_{n \in \mathbb{Z}^d} \varphi_\varepsilon * f(n + x)$ is finite, and it gives a bounded function on the torus $\mathbb{R}^d/\mathbb{Z}^d = [0, 1)^d$, with Fourier coefficients $\hat\varphi(\varepsilon m)\, \hat f(m)$,
\[
\int_{\mathbb{R}^d/\mathbb{Z}^d} \Bigl( \sum_{n \in \mathbb{Z}^d} \varphi_\varepsilon * f(n + x) \Bigr) e^{-2\pi i m \cdot x}\, dx
= \int_{\mathbb{R}^d} \varphi_\varepsilon * f(y)\, e^{-2\pi i m \cdot y}\, dy = \hat\varphi(\varepsilon m)\, \hat f(m).
\]
Hence, the function
\[ \sum_{n \in \mathbb{Z}^d} \varphi_\varepsilon * f(n + x) \]
has Fourier expansion
\[ \sum_{m \in \mathbb{Z}^d} \hat\varphi(\varepsilon m)\, \hat f(m)\, e^{2\pi i m \cdot x}. \]
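Recall that Fourier series may diverge at some points; hence it is not obvious that the above Fourier series converges and that it is equal pointwise to the function being expanded. Since the sum $\sum_{n \in \mathbb{Z}^d} \varphi_\varepsilon * f(n + x)$ is finite and has a bounded number of nonzero terms as $\varepsilon \to 0+$, for every $x$ the limit commutes with the sum:
\[ \lim_{\varepsilon \to 0+} \Bigl\{ \sum_{n \in \mathbb{Z}^d} \varphi_\varepsilon * f(n + x) \Bigr\} = \sum_{n \in \mathbb{Z}^d} f(n + x). \]
Then it is enough to show that for every $x$,
\[ \sum_{n \in \mathbb{Z}^d} \varphi_\varepsilon * f(n + x) = \sum_{m \in \mathbb{Z}^d} \hat\varphi(\varepsilon m)\, \hat f(m)\, e^{2\pi i m \cdot x}. \tag{5} \]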
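This follows from the fact that $\sum_{m \in \mathbb{Z}^d} |\hat\varphi(\varepsilon m)\, \hat f(m)|$ converges, which is a consequence of the following Plancherel–Pólya type inequality. Let $g$ be an integrable function with compact support and let $\psi$ be a smooth compactly supported function with $\psi(x) = 1$ on the support of $g$. Since $g(x) = \psi(x)\, g(x)$, $\hat g(\xi) = \hat\psi * \hat g(\xi)$, and $\hat\psi$ is rapidly decreasing, we have
\[
\sum_{m \in \mathbb{Z}^d} |\hat g(m)| = \sum_{m \in \mathbb{Z}^d} \Bigl| \int_{\mathbb{R}^d} \hat\psi(m - \xi)\, \hat g(\xi)\, d\xi \Bigr|
\le \sup_{\xi \in \mathbb{R}^d} \Bigl\{ \sum_{m \in \mathbb{Z}^d} |\hat\psi(m - \xi)| \Bigr\} \int_{\mathbb{R}^d} |\hat g(\xi)|\, d\xi
\le c \int_{\mathbb{R}^d} |\hat g(\xi)|\, d\xi.
\]
Observe that above the constant $c$ depends on $\psi(x)$, hence on the support of $g$. See, e.g., [11, Chapter 3] for more general inequalities of this type. Applying this inequality to the function $g(x) = \varphi_\varepsilon * f(x)$, we obtain
\[
\begin{aligned}
\sum_{m \in \mathbb{Z}^d} \bigl| \widehat{\varphi_\varepsilon * f}(m) \bigr|
= \sum_{m \in \mathbb{Z}^d} \bigl| \hat\varphi(\varepsilon m)\, \hat f(m) \bigr|
&\le c \int_{\mathbb{R}^d} \bigl| \hat\varphi(\varepsilon\xi)\, \hat f(\xi) \bigr|\, d\xi \\
&\le c \Bigl( \int_{\mathbb{R}^d} |\hat\varphi(\varepsilon\xi)|^2\, d\xi \Bigr)^{1/2} \Bigl( \int_{\mathbb{R}^d} |\hat f(\xi)|^2\, d\xi \Bigr)^{1/2} \\
&= c\, \varepsilon^{-d/2} \Bigl( \int_{\mathbb{R}^d} |\varphi(x)|^2\, dx \Bigr)^{1/2} \Bigl( \int_{\mathbb{R}^d} |f(x)|^2\, dx \Bigr)^{1/2}.
\end{aligned}
\]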
Observe that the factor $\varepsilon^{-d/2}$ does not contradict the existence of the limit as $\varepsilon \to 0+$. The above estimate is just what we need to show the pointwise equality (5) for every fixed $\varepsilon > 0$, since we already observed that the limit of the left-hand side of (5) exists.

3. PICK'S THEOREM. Our proof of Pick's theorem below is a corollary of the version of the Poisson summation formula above, applied to characteristic functions of integer polygons. Such characteristic functions do not satisfy the assumption (3) of Theorem 2, but they can be regularized by modifying the values at the boundary. It is a classical argument to restate Pick's theorem in terms of normalized angles as follows. Define a regularization of the characteristic function of the polygon $P$:
\[
\tilde\chi_P(x) =
\begin{cases}
0 & \text{if } x \notin P, \\
1 & \text{if } x \text{ is in the interior of } P, \\
1/2 & \text{if } x \text{ is in the interior of a side of } P, \\
\alpha/2\pi & \text{if } x \text{ is a vertex of } P, \text{ with interior angle } \alpha.
\end{cases}
\]
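Assuming that $P$ has $N$ vertices, since the sum of the inner angles is $\pi(N - 2)$, we have
\[
\begin{aligned}
\sum_{k \in \mathbb{Z}^2} \tilde\chi_P(k)
&= \sum_{\text{interior points of } P} 1 \;+ \sum_{\text{interior points of sides of } P} 1/2 \;+ \sum_{\text{vertices of } P} \alpha/2\pi \\
&= I + \frac{1}{2}(B - N) + \frac{1}{2}(N - 2) = I + \frac{1}{2}B - 1.
\end{aligned}
\]
Hence, Pick's theorem is reduced to the following.

Theorem 3. If $P$ is a simple integer polygon, then
\[ \sum_{n \in \mathbb{Z}^2} \tilde\chi_P(n) = |P|. \]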
Proof. Let $\varphi$ be a square integrable radial function with compact support and integral 1; for example let $\varphi(x) = 4\pi^{-1} \chi_{\{|x| < 1/2\}}(x)$. For $\varepsilon > 0$ small enough and for every $n \in \mathbb{Z}^2$ it can be easily shown that
\[ \varphi_\varepsilon * \tilde\chi_P(n) = \tilde\chi_P(n). \]
Then $\tilde\chi_P$ satisfies the assumption (3) of Theorem 2 and
\[ \sum_{n \in \mathbb{Z}^2} \tilde\chi_P(n) = \lim_{\varepsilon \to 0+} \Bigl\{ \sum_{m \in \mathbb{Z}^2} \hat\varphi(\varepsilon m)\, \hat{\tilde\chi}_P(m) \Bigr\}. \tag{6} \]
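Observe that in this identity the limit can be omitted as soon as $\varepsilon$ is small enough, and also observe that $\hat{\tilde\chi}_P(m) = \hat\chi_P(m)$, since $\tilde\chi_P(x) = \chi_P(x)$ except on a set of measure zero. Let $P$ have vertices $\{P_j\}$ and sides $\{P_j + t(P_{j+1} - P_j) : 0 \le t \le 1\}$ with outward unit normals $\{n_j\}$. Then, with the notation $P_{N+1} = P_1$, if $m \ne 0$ the divergence theorem yields
\[
\begin{aligned}
\hat\chi_P(m) &= \int_P e^{-2\pi i m \cdot x}\, dx = \int_P \operatorname{div}\Bigl( \frac{-m}{2\pi i\, |m|^2}\, e^{-2\pi i m \cdot x} \Bigr)\, dx \\
&= \frac{-1}{2\pi i} \sum_{j=1}^{N} |P_{j+1} - P_j|\, \frac{n_j \cdot m}{|m|^2} \int_0^1 e^{-2\pi i m \cdot (P_j + t(P_{j+1} - P_j))}\, dt.
\end{aligned}
\]
The one-dimensional integrals can be computed explicitly, and when $P_j$ and $m$ belong to $\mathbb{Z}^2$, then
\[
\int_0^1 e^{-2\pi i m \cdot (P_j + t(P_{j+1} - P_j))}\, dt =
\begin{cases}
0 & \text{if } m \cdot (P_{j+1} - P_j) \ne 0, \\
1 & \text{if } m \cdot (P_{j+1} - P_j) = 0.
\end{cases}
\]
Recalling that $\hat\varphi(0) = 1$ and $\hat\chi_P(0) = |P|$, we obtain
\[
\begin{aligned}
\sum_{m \in \mathbb{Z}^2} \hat\varphi(\varepsilon m)\, \hat\chi_P(m)
&= \hat\chi_P(0) + \sum_{m \in \mathbb{Z}^2 \setminus \{0\}} \hat\varphi(\varepsilon m)\, \hat\chi_P(m) \\
&= |P| - \frac{1}{2\pi i} \sum_{j=1}^{N} |P_{j+1} - P_j| \Biggl( \sum_{\substack{m \ne 0, \\ m \cdot (P_{j+1} - P_j) = 0}} \hat\varphi(\varepsilon m)\, \frac{m \cdot n_j}{|m|^2} \Biggr).
\end{aligned}
\tag{7}
\]
Finally, the sums inside the parentheses vanish, because, under the assumption that $\hat\varphi(\varepsilon m)$ is radial, $\hat\varphi(\varepsilon m)\, |m|^{-2}\, m \cdot n_j$ is an odd function of $m$. Hence
\[ \sum_{m \in \mathbb{Z}^2} \hat\varphi(\varepsilon m)\, \hat\chi_P(m) = |P|. \]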
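The edge formula for $\hat\chi_P(m)$ at integer frequencies is simple enough to evaluate directly. The sketch below assumes the vertices are integer points listed counterclockwise, so that $|P_{j+1} - P_j|\, n_j = (\Delta y, -\Delta x)$ for the edge vector $(\Delta x, \Delta y)$; the function name `chi_hat` and the two test polygons are illustrative choices, not taken from the article.

```python
import math

def chi_hat(vertices, m):
    """Fourier transform of the indicator of a lattice polygon at a nonzero
    integer frequency m, via the edge sum from the divergence theorem.
    Vertices: integer points in counterclockwise order."""
    m1, m2 = m
    norm2 = m1 * m1 + m2 * m2
    total = 0
    count = len(vertices)
    for j in range(count):
        (x0, y0), (x1, y1) = vertices[j], vertices[(j + 1) % count]
        dx, dy = x1 - x0, y1 - y0
        # Only edges orthogonal to m contribute; their inner integral equals 1.
        if m1 * dx + m2 * dy == 0:
            # |edge| * (outward normal . m) for counterclockwise orientation
            total += dy * m1 - dx * m2
    return -total / (2j * math.pi * norm2)

triangle = [(0, 0), (1, 0), (0, 1)]
square = [(0, 0), (1, 0), (1, 1), (0, 1)]

value = chi_hat(triangle, (1, 1))        # direct integration gives i / (2*pi)
opposite = chi_hat(triangle, (-1, -1))   # oddness: equals -value
```

The oddness `opposite == -value` is the cancellation used after (7), and `chi_hat(square, m)` vanishes for every nonzero integer `m`, in line with the fact that the unit square tiles the plane by integer translations.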
Observe that in the proof of the above theorem the assumption that the polygon is simple can be weakened. In particular, the formulation of Pick's theorem in terms of normalized interior angles also holds for integer polygons with holes.

4. FURTHER REMARKS. Pick's theorem, in the naive form that we know it, fails in dimension $d \ge 3$. Indeed, as observed by J. E. Reeve, the tetrahedron with vertices $(0, 0, 0)$, $(1, 0, 0)$, $(0, 1, 0)$, $(1, 1, N)$, has volume $N/6$, contains four integer points on the boundary, and has no integer points inside. Hence there is no simple relation between the volume and the integer points for general three-dimensional integer polytopes.
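Reeve's example can be confirmed by brute force. The sketch below enumerates the integer points of the bounding box and tests membership with exact barycentric coordinates; the function name `reeve_counts` and the sample values of $N$ are hypothetical.

```python
from fractions import Fraction

def reeve_counts(N):
    """Return (6 * volume, boundary points, interior points) for the tetrahedron
    with vertices (0,0,0), (1,0,0), (0,1,0), (1,1,N), N a positive integer."""
    # 6 * volume is the determinant of the edge vectors (1,0,0), (0,1,0), (1,1,N).
    six_volume = 1 * (1 * N - 0 * 1) - 0 * (0 * N - 0 * 1) + 0 * (0 * 1 - 1 * 1)
    boundary = interior = 0
    for x in (0, 1):
        for y in (0, 1):
            for z in range(N + 1):
                # Barycentric coordinates of (x, y, z) with respect to the four vertices.
                l3 = Fraction(z, N)
                bary = (1 - x - y + l3, x - l3, y - l3, l3)
                if min(bary) > 0:
                    interior += 1
                elif min(bary) == 0:
                    boundary += 1
    return six_volume, boundary, interior

results = {N: reeve_counts(N) for N in (1, 2, 5, 13)}
```

Each entry should read `(N, 4, 0)`: the lattice-point counts stay fixed while the volume $N/6$ grows without bound.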
Fascinating relations do appear, however, when an integer polyhedron is dilated by an integer factor. By Ehrhart's theorem from the 1950s, the number of integer points in a dilated integer polyhedron $P$ is a polynomial function of the integer dilation parameter, with leading coefficient equal to the volume of $P$. The reader may consult, for example, the books [1] and [2] for an account of Ehrhart's main theorems. The Poisson summation formula was used in [4] to analyze the Ehrhart polynomial of an integer polytope in $\mathbb{R}^d$.

The above-defined regularized discrete volume $\sum_{n \in \mathbb{Z}^d} \tilde\chi_P(n)$ can be easily defined in every dimension, but in general it is no longer equal to the Euclidean volume $|P|$. However, as we see from equations (6) and (7), it is still true that
\[ \sum_{n \in \mathbb{Z}^d} \tilde\chi_P(n) = |P| \quad \text{if and only if} \quad \sum_{0 \ne m \in \mathbb{Z}^d} \hat\varphi(\varepsilon m)\, \hat\chi_P(m) = 0 \tag{8} \]
for all sufficiently small $\varepsilon > 0$, and for every choice of $\varphi$ as before. That is, if an integer polytope $P$ satisfies (8), then by definition its continuous Euclidean volume is equal to its regularized discrete volume. We can call such integer polytopes concrete polytopes, following the tradition of [7], who used the first three letters of "continuous" and the last five letters of "discrete" to consider objects that can be described by both continuous methods and by discrete methods.

An interesting open problem is to characterize the concrete polytopes in $\mathbb{R}^d$; that is, what are the integer polytopes that enjoy the relation $\sum_{n \in \mathbb{Z}^d} \tilde\chi_P(n) = |P|$? As already shown by Barvinok [1], integer zonotopes are concrete polytopes, as well as integer symmetric polytopes whose facets are also symmetric. A more general family of concrete polytopes is given by multiple tilers. Indeed, an easy application of the Poisson summation formula (see [8], [9, p. 137]) tells us that the integer polytope $P$ multi-tiles $\mathbb{R}^d$ with the lattice of integer translations, if and only if $\hat\chi_P(m) = 0$ for every $m \in \mathbb{Z}^d \setminus \{0\}$, so that identity (8) is trivially satisfied. On the other hand Garber and Pak have recently shown that there exist concrete lattice polytopes in $\mathbb{R}^3$ which do not multi-tile $\mathbb{R}^3$ (see [6]).

ACKNOWLEDGMENTS. The third author was supported in part by Conselho Nacional de Desenvolvimento Científico e Tecnológico – CNPq (Proc. 423833/2018-9).
ORCID L. Brandolini
http://orcid.org/0000-0002-9670-9051
REFERENCES
[1] Barvinok, A. (2008). Integer Points in Polyhedra. Zurich Lectures in Advanced Mathematics. Zürich: European Mathematical Society.
[2] Beck, M., Robins, S. (2015). Computing the Continuous Discretely. Integer-Point Enumeration in Polyhedra, 2nd ed. New York: Springer.
[3] Diaz, R., Robins, S. (1995). Pick's formula via the Weierstrass ℘-function. Amer. Math. Monthly. 102(5): 431–437.
[4] Diaz, R., Robins, S. (1997). The Ehrhart polynomial of a lattice polytope. Ann. of Math. (2). 145(3): 503–518.
[5] Funkenbusch, W. W. (1974). From Euler's formula to Pick's formula using an edge theorem. Amer. Math. Monthly. 81(6): 647–648.
[6] Garber, A., Pak, I. (2020). Concrete polytopes may not tile the space. arxiv.org/abs/2003.04667v2
[7] Graham, R. L., Knuth, D. E., Patashnik, O. (1994). Concrete Mathematics, 2nd ed. Upper Saddle River, NJ: Addison-Wesley.
[8] Gravin, N., Robins, S., Shiryaev, D. (2012). Translational tilings by a polytope, with multiplicity. Combinatorica. 32(6): 629–649.
[9] Kolountzakis, M. N. (2004). The study of translation tilings with Fourier analysis. In: Brandolini, L., Colzani, L., Iosevich, A., Travaglini, G., eds. Fourier Analysis and Convexity. Boston, MA: Birkhäuser, pp. 131–187.
[10] Murty, M. R., Thain, N. (2007). Pick's theorem via Minkowski's theorem. Amer. Math. Monthly. 114(8): 732–736.
[11] Nikol'skiĭ, S. M. (1975). Approximation of Functions of Several Variables and Imbedding Theorems. (Danskin, J. M., trans.) New York-Heidelberg: Springer-Verlag.
[12] Steinhaus, H. (1983). Mathematical Snapshots. New York, NY: Oxford Univ. Press.
[13] Stein, E., Weiss, G. (1971). Introduction to Fourier Analysis on Euclidean Spaces. Princeton, NJ: Princeton Univ. Press.

LUCA BRANDOLINI is a full professor at the University of Bergamo, Italy. His research interests are harmonic analysis and its applications to geometric discrepancy and partial differential equations. He has published more than fifty research papers on these subjects.
Dipartimento di Ingegneria Gestionale, dell'Informazione e della Produzione, University of Bergamo, Dalmine, Italy
LEONARDO COLZANI received his Laurea in Matematica from the Università degli Studi di Milano and his Ph.D. from Washington University in St. Louis. He has been a full professor at the Università della Calabria and at the Università degli Studi di Milano, and he is currently at the Università di Milano-Bicocca. His main field of research is harmonic analysis.
Dipartimento di Matematica ed Applicazioni, Università di Milano-Bicocca, Milano, Italy

SINAI ROBINS is a full professor at the University of São Paulo, Brazil, working in the Institute for Mathematics and Statistics. His primary interests are discrete geometry, number theory, and Fourier analysis. He has coauthored the book Computing the Continuous Discretely: Integer point Enumeration in Polytopes with M. Beck, and enjoys doing math at the local coffee shop with his friends.
Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, Brazil

GIANCARLO TRAVAGLINI is a full professor of Mathematics at the Università di Milano-Bicocca. His main mathematical interests are Fourier analysis, geometric discrepancy, and mathematics education. He has authored the books Number Theory, Fourier Analysis and Geometric Discrepancy, Cambridge University Press (2014), and Studying Mathematics: The Beauty, the Toil and the Method, Springer (2018) (joint with M. Bramanti).
Dipartimento di Matematica e Applicazioni, Università di Milano-Bicocca, Milano, Italy
Perturbing the Mean Value Theorem: Implicit Functions, the Morse Lemma, and Beyond David Lowry-Duda and Miles H. Wheeler
Abstract. The mean value theorem of calculus states that, given a differentiable function $f$ on an interval $[a, b]$, there exists at least one mean value abscissa $c$ such that the slope of the tangent line at $(c, f(c))$ is equal to the slope of the secant line through $(a, f(a))$ and $(b, f(b))$. In this article, we study how the choices of $c$ relate to varying the right endpoint $b$. In particular, we ask: When can we write $c$ as a continuous function of $b$ in some interval? As we explore this question, we touch on the implicit function theorem, a simplified version of the Morse lemma, and the theory of analytic functions.
1. INTRODUCTION AND STATEMENT OF THE PROBLEM. The mean value theorem is one of the truly fundamental theorems of calculus. It says that if $f$ is a differentiable function defined on a closed interval $[a, b]$, then there is at least one $c$ in the open interval $(a, b)$ such that
\[ \frac{f(b) - f(a)}{b - a} = f'(c). \tag{1} \]
We call $c$ a mean value abscissa for $f$ on $[a, b]$. Looking at a graph $y = f(x)$ as in Figure 1, the left-hand side of (1) is the slope of the secant line from $(a, f(a))$ to $(b, f(b))$, while the right-hand side is the slope of the tangent line passing through $(c, f(c))$. Observe that there can be multiple possibilities for $c$; in the first graph in Figure 1 we could have chosen either $c$ or $c'$.
Figure 1. An illustration of the mean value theorem for the function $f(x) = x^3 - 3x^2 + 2x$. The straight lines are the secant lines. In each graph, the endpoints of the secant line and two mean value abscissae are indicated by points on the curve. Dashed lines from each point indicate corresponding $x$-values.
doi.org/10.1080/00029890.2021.1840879
MSC: Primary 26A24, Secondary 26A06; 26A15
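For the cubic of Figure 1 the mean value abscissae can be computed in closed form, since $f'(c) = \text{slope}$ is a quadratic equation in $c$. The sketch below does this; the endpoints $a = 0$, $b = 2$ and the perturbed endpoint $2.1$ are hypothetical choices, since the values used in the figure are not stated.

```python
import math

def f(x):
    return x**3 - 3 * x**2 + 2 * x

def df(x):
    return 3 * x**2 - 6 * x + 2

def mean_value_abscissae(a, b):
    """All c in (a, b) with f'(c) equal to the secant slope over [a, b]."""
    slope = (f(b) - f(a)) / (b - a)
    # f'(c) = slope is the quadratic 3c^2 - 6c + (2 - slope) = 0.
    disc = 36 - 12 * (2 - slope)
    if disc < 0:
        return []
    roots = {(6 - math.sqrt(disc)) / 6, (6 + math.sqrt(disc)) / 6}
    return sorted(c for c in roots if a < c < b)

cs = mean_value_abscissae(0, 2)        # two abscissae, 1 - 1/sqrt(3) and 1 + 1/sqrt(3)
cs_new = mean_value_abscissae(0, 2.1)  # both move only slightly when b moves slightly
```

In this example both abscissae vary continuously with $b$; the question of the article is whether such a continuous choice always exists.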
In this article, we are interested in how the possible solutions $c$ of (1) change as we vary one of the endpoints of the interval, say the right endpoint $b$. In particular, we will try to answer the following question: Suppose $c = c_0$ is our favorite mean value abscissa for $a = a_0$ and $b = b_0$. If $b$ changes slightly, can we also change $c$ slightly so that (1) is still satisfied? In other words, is there a locally continuous choice $c = C(b)$ of the mean value abscissa? For example, in the right-hand graph in Figure 1, we consider the new value $b_{\text{new}}$. Here, it appears that the small change from $b$ to $b_{\text{new}}$ corresponds to small changes from $c$ to $c_{\text{new}}$ and $c'$ to $c'_{\text{new}}$; is this always possible?

2. SOME EXAMPLES. To get a better feel for the problem we have set out for ourselves, let's graph some functions and their mean value abscissae. To make our life simpler, we will stick to examples with $a = a_0 = 0$ and $f(a_0) = f(b_0) = 0$. Then the left-hand side of (1) is zero when $a = a_0$ and $b = b_0$, and so any corresponding mean value abscissa $c = c_0$ has to be a critical point where $f'(c_0) = 0$. This may seem like a lot of assumptions, but in fact if someone hands us a more general function $f$ we can always consider the related function

$$g(x) = f(x) - \left[ \frac{f(b_0) - f(a_0)}{b_0 - a_0}\,(x - a_0) + f(a_0) \right],$$
which satisfies g(a0 ) = g(b0 ) = 0. Both f and g have the same solutions to the mean value condition (1).
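The normalization above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the cubic from Figure 1 and an arbitrary illustrative interval $[0.5, 2.5]$, and the helper name `residual` is hypothetical. It compares the two sides of (1) for $f$ and for $g$ at a few pairs $(b, c)$.

```python
# Sketch: subtracting the secant line through the endpoints leaves the
# mean value condition (1) unchanged. Function and interval are assumptions.

def f(x):
    return x**3 - 3 * x**2 + 2 * x      # the cubic from Figure 1

def df(x):
    return 3 * x**2 - 6 * x + 2         # f'(x)

A0, B0 = 0.5, 2.5                        # illustrative endpoints (assumed)
SLOPE = (f(B0) - f(A0)) / (B0 - A0)      # secant slope of f on [A0, B0]

def g(x):
    # g(x) = f(x) - [ SLOPE * (x - a0) + f(a0) ]
    return f(x) - (SLOPE * (x - A0) + f(A0))

def dg(x):
    return df(x) - SLOPE                 # g'(x)

def residual(func, dfunc, b, c, a=A0):
    # Left-hand side minus right-hand side of (1) on [a, b] at the point c.
    return (func(b) - func(a)) / (b - a) - dfunc(c)

if __name__ == "__main__":
    print("g(a0), g(b0):", g(A0), g(B0))
    for b, c in [(2.5, 1.0), (2.0, 0.7), (3.0, 1.9)]:
        print(b, c, residual(f, df, b, c), residual(g, dg, b, c))
```

The two residual columns agree for every pair $(b, c)$ with $a = a_0$ fixed, not only at solutions, because the secant slope is subtracted from both sides of (1).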
Figure 2. Left: The parabola $y = -x^2 + 2x$. Three pairs of points are shown: $(b_0, c_0)$, $(b_1, c_1)$, and $(b_2, c_2)$. For each $b_i$, the corresponding $c_i$ is a mean value abscissa on the interval $[a_0, b_i]$. Right: The graph of all mean value abscissae as a function of $b$ (where $a_0 = 0$ is fixed); $b$ is on the horizontal axis, and $c = b/2$ is on the vertical axis. Points corresponding to the three pairs on the left are noted. In the mean value theorem, $b > c$, and we represent this by shading the region where $c \ge b$.
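For the parabola of Figure 2, the relation $c = b/2$ can be confirmed numerically. This is a small sketch under stated assumptions: the function name `mean_value_abscissa` is hypothetical, and plain bisection is used only because $f'$ is strictly decreasing for this particular $f$.

```python
def f(x):
    return -x**2 + 2 * x                # the parabola of Figure 2

def df(x):
    return -2 * x + 2                   # f'(x)

def mean_value_abscissa(b, a=0.0, tol=1e-12):
    """Solve f'(c) = (f(b) - f(a)) / (b - a) for c in (a, b) by bisection.

    Bisection applies here because f' is strictly decreasing, so
    f'(c) - slope changes sign exactly once on (a, b)."""
    slope = (f(b) - f(a)) / (b - a)
    lo, hi = a, b                       # f'(lo) - slope > 0 > f'(hi) - slope
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if df(mid) - slope > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for b in (0.5, 1.0, 2.0, 3.0):
        print(b, mean_value_abscissa(b), b / 2)
```

For a general $f$ the derivative need not be monotone, so a root finder of this kind would return only one of possibly several abscissae.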
Consider the parabola at the left of Figure 2. There is only one choice of c0 : the vertex. When we slightly increase b0 to b1 , we have to slightly increase c0 to c1 . Similarly if we decrease b0 to b2 , then we have to decrease c0 to c2 . Plotting the mean value abscissae c for each b, we get the picture at the right of Figure 2. Looking at the figure, c appears to be a continuous function of b; indeed in this case we can solve (1) explicitly to get c = b/2. For more on functions where the ratio c/b is constant, see [1]. Next consider the more complicated graph at left in Figure 3. There are now two critical points. One is a local maximum, and the behavior near this point is very similar to the behavior near the vertex of the parabola. The second one, which we have labeled as c0 , is a nonextremal critical point (it is neither a local maximum nor a local minimum). Suppose that b1 is just a bit bigger than b0 . Then the slope of the secant January 2021]
PERTURBING THE MEAN VALUE THEOREM
51
Figure 3. Left: The graph of a function with an inflection point. The point (1, 1) is a mean value abscissa on the interval [0, 3], but there is no continuous extension of this solution to a neighborhood of b0 . The straight line is the secant line corresponding to the interval [a0 , b]. Right: The graph of all mean value abscissae, as in the previous figure. The behavior is substantially more complicated. At the point (b0 , c0 ), we observe that c is not a function of b.
line that appears on the left-hand side of (1) is strictly negative. When $c$ is close to $c_0$, on the other hand, the right-hand side $f'(c)$ of (1) is nonnegative. There is no solution to (1) without choosing $c$ far away from $c_0$.

3. THE IMPLICIT FUNCTION THEOREM. In the last section, we saw that the set of solutions $(b, c)$ of (1) can look quite complicated. On the right of Figure 3, for instance, $c$ is not a function of $b$ (the curve fails the "vertical line test") and $b$ is also not a function of $c$ (the curve fails the "horizontal line test"). This is possible because (1) is an implicit equation.

Implicit equations show up all over the place. For instance in geometry, $x^2 + y^2 = 1$ is the equation for the circle with radius one centered at the origin, while $x^2 + 4xy + y^2 = 2$ is the equation for a certain hyperbola. By subtracting any terms on the right-hand side, we can write any implicit equation in two variables as

$$F(x, y) = 0 \tag{2}$$
for some function $F$. It's often tempting to try and solve an implicit equation for one of the variables, and indeed that's how we got the formula $c = b/2$ for the example on the right of Figure 2. When the equation gets more complex, though, as it is in Figure 3, this method becomes very tedious and is quite often impossible!

Calculus offers a powerful tool, called the implicit function theorem, to help us understand implicit equations. The intuition behind the theorem is the following: Suppose that we have found one solution $(x_0, y_0)$, and we are interested in solutions of (2) near this initial solution. If the function $F$ is differentiable, then for $(x, y) \approx (x_0, y_0)$ we can approximate $F$ as

$$F(x, y) \approx F(x_0, y_0) + F_x(x_0, y_0)(x - x_0) + F_y(x_0, y_0)(y - y_0), \tag{3}$$
where the subscripts on $F$ are used to denote partial derivatives. This is a first-order Taylor approximation of $F$, a two-variable version of the tangent-line approximation for functions of a single variable. Plugging (3) into the equation (2) that we are trying to solve and using $F(x_0, y_0) = 0$, we get

$$F_x(x_0, y_0)(x - x_0) + F_y(x_0, y_0)(y - y_0) \approx 0. \tag{4}$$
While (4) is only approximately true, its advantage is that it is a linear equation. In particular, if $F_y(x_0, y_0) \neq 0$, then we can try to "solve for $y$," giving the approximation

$$y = Y(x) \approx y_0 - \frac{F_x(x_0, y_0)}{F_y(x_0, y_0)}\,(x - x_0). \tag{5}$$
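As an illustration of (5), consider the circle $x^2 + y^2 = 1$ mentioned earlier. The sketch below assumes the base point $(x_0, y_0) = (0.6, 0.8)$, chosen only for convenience, and compares the linear approximation with the explicit upper branch $y = \sqrt{1 - x^2}$; the names `Y_linear` and `Y_exact` are hypothetical.

```python
import math

def F(x, y):
    return x**2 + y**2 - 1              # unit circle as F(x, y) = 0

def Fx(x, y):
    return 2 * x                        # partial derivative in x

def Fy(x, y):
    return 2 * y                        # partial derivative in y

X0, Y0 = 0.6, 0.8                       # assumed base point on the circle

def Y_linear(x):
    # Right-hand side of (5): tangent-line approximation of y = Y(x).
    return Y0 - Fx(X0, Y0) / Fy(X0, Y0) * (x - X0)

def Y_exact(x):
    # Explicit upper branch of the circle, valid near (X0, Y0).
    return math.sqrt(1 - x**2)

def error(h):
    return abs(Y_linear(X0 + h) - Y_exact(X0 + h))

if __name__ == "__main__":
    for h in (0.02, 0.01, 0.005):
        print(h, error(h))
```

Halving $h$ reduces the error by roughly a factor of four, as expected from a first-order approximation with a second-order remainder.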
The implicit function theorem says that the conclusion of this intuitive argument is correct: as long as $F(x_0, y_0) = 0$ and $F_y(x_0, y_0) \neq 0$, we can indeed solve $F(x, y) = 0$ for $y$ when $(x, y)$ is close to $(x_0, y_0)$.

Theorem 1 (Implicit function theorem). Suppose that $F = F(x, y)$ is a continuously differentiable function and that at some point $(x_0, y_0)$ we have

$$F(x_0, y_0) = 0 \quad\text{and}\quad F_y(x_0, y_0) \neq 0. \tag{6}$$
Then there exists a continuously differentiable function $Y(x)$ such that the implicit equation $F(x, y) = 0$ is equivalent to the explicit equation $y = Y(x)$ whenever $(x, y)$ is sufficiently close to $(x_0, y_0)$.

For a proof of Theorem 1, see for instance [7, §13], which in fact treats a more general problem with many variables. Much more about the implicit function theorem can be found in the classic reference [3].

With the implicit function theorem in hand, we are now ready to investigate when there exist locally continuous choices of the mean value abscissa $c$ in (1). The first step is to rewrite (1) as $F(b, c) = 0$ where

$$F(b, c) = \frac{f(b) - f(a)}{b - a} - f'(c). \tag{7}$$
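The function $F$ in (7) is easy to explore numerically. The following sketch is an illustration only: it assumes the cubic from Figure 1 with $a = 0$, starts from the abscissa $c_0 = 1 + 1/\sqrt{3}$ at $b_0 = 2$, and follows the root of $F(b, c) = 0$ as $b$ increases, using Newton's method in $c$ with $\partial F / \partial c = -f''(c)$ computed directly from (7). The names `follow_abscissa` and `C_exact` are hypothetical.

```python
import math

A = 0.0                                  # fixed left endpoint (assumed)

def f(x):
    return x**3 - 3 * x**2 + 2 * x       # the cubic from Figure 1

def df(x):
    return 3 * x**2 - 6 * x + 2          # f'(x)

def d2f(x):
    return 6 * x - 6                     # f''(x)

def F(b, c):
    # Equation (7): secant slope on [A, b] minus tangent slope at c.
    return (f(b) - f(A)) / (b - A) - df(c)

def follow_abscissa(b0, c0, b1, steps=20, newton_iters=20):
    """Track a root c of F(b, c) = 0 from b0 to b1.

    b moves in small steps; at each step Newton's method in c is started
    from the previous root. The update uses dF/dc = -f''(c)."""
    c = c0
    for i in range(1, steps + 1):
        b = b0 + (b1 - b0) * i / steps
        for _ in range(newton_iters):
            c -= F(b, c) / (-d2f(c))
    return c

def C_exact(b):
    # Closed form of this branch for this particular cubic.
    return 1 + math.sqrt((b**2 - 3 * b + 3) / 3)

if __name__ == "__main__":
    c0 = 1 + 1 / math.sqrt(3)
    print(follow_abscissa(2.0, c0, 2.2), C_exact(2.2))
```

The iteration relies on $f''(c) \neq 0$ along the path; if $f''$ vanishes at the starting abscissa, the Newton step divides by zero and this simple scheme fails.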
From now on we assume that $f$ is twice continuously differentiable, in which case $F$ is once continuously differentiable. Suppose that $c_0$ is a mean value abscissa corresponding to $b_0$, i.e., that $F(b_0, c_0) = 0$. A quick computation shows that

$$F_b(b_0, c_0) = \frac{f'(b_0) - f'(c_0)}{b_0 - a}, \qquad F_c(b_0, c_0) = -f''(c_0),$$
where here we have used $F(b_0, c_0) = 0$ to simplify. Thus $F_c(b_0, c_0) \neq 0$ is true exactly when $f''(c_0) \neq 0$. And if $f''(c_0) \neq 0$, then by Theorem 1 there exists a continuously differentiable function $C(b)$ so that $F(b, c) = 0$ is equivalent to $c = C(b)$ whenever $(b, c)$ is close enough to $(b_0, c_0)$.

Although we have focused on the question of when the mean value abscissa $c$ can be written as a continuous function of the right endpoint $b$, we also have the data for the converse question: When is the right endpoint $b$ a function of the mean value abscissa $c$? By Theorem 1, $b$ can be written as a function of $c$ near $(b_0, c_0)$ when $F_b(b_0, c_0) \neq 0$, or equivalently when $f'(b_0) \neq f'(c_0)$. In total, we have proved the following theorem.

Theorem 2. Let $f$ be a twice continuously differentiable function, fix an interval $[a_0, b_0]$, and let $c_0$ be a mean value abscissa for $f$ in $[a_0, b_0]$.

(a) Suppose that $f''(c_0) \neq 0$. Then there is a continuously differentiable function $C(b)$ so that
$$\frac{f(b) - f(a)}{b - a} = f'(C(b))$$
for all $b$ close to $b_0$. There are no other solutions $(b, c)$ of (1) close to $(b_0, c_0)$.

(b) Suppose that $f'(b_0) \neq f'(c_0)$. Then there is a continuously differentiable function $B(c)$ so that
$$\frac{f(B(c)) - f(a)}{B(c) - a} = f'(c)$$
for all $c$ close to $c_0$. There are no other solutions $(b, c)$ of (1) close to $(b_0, c_0)$.

This theorem gives perspective on Figure 3. The mean value abscissa in that figure was at an inflection point, where $f''(c_0) = 0$, and so Theorem 2a is inconclusive about whether we can write $c = C(b)$. Looking at the figure it appears that we cannot. On the other hand $f'(b_0) < f'(c_0) = 0$, and so Theorem 2 implies that we can write $b = B(c)$.

4. THE MORSE LEMMA. We have now shown that there exist continuous choices of $c = C(b)$ around those mean value abscissae such that $f''(c_0) \neq 0$. Conversely, we have shown that when there is a mean value abscissa $c$ such that $f'(b_0) \neq f'(c_0)$, then $b$ can be written as a continuous function of $c$ in a neighborhood of $c_0$. But what if both $f''(c_0) = 0$ and $f'(b_0) = f'(c_0)$?

As before, we return to pictorial investigation. Fortunately, these are two strong constraints and we quickly identify interesting aspects from graphs. For ease, we suppose again that $f(a_0) = f(b_0) = 0$, and we now suppose that $f'(c_0) = f'(b_0) = f''(c_0) = 0$. In Figure 4, we examine three different functions $f$: each satisfies $f'''(c_0) \neq 0$, but $f''(b)$ is positive on the left, zero in the middle, and negative on the right. We've named these three values of $b$ as $b_m$, $b_i$, and $b_M$ (according to whether $b$ is a minimum, an inflection point, or a maximum, respectively).

Examining the top left graph of Figure 4, we observe that in a small neighborhood of $c_0$, all tangent lines have nonnegative slope. Similarly, for all $b$ in a small neighborhood around $b_m$, the secant lines from $(0, 0)$ to $(b, f(b))$ have nonnegative slope.
Qualitatively, it appears that for $b$ just a little less than $b_m$, we could vary $c$ to match slopes. But in which direction should $c$ be moved? We can see this apparent choice of direction in the mean value abscissa graph at bottom left: near $(b_m, c_0)$, the graph resembles an X. This reveals a key difference from the situation where $f''(c) \neq 0$. In both the implicit function theorem and Theorem 2, the resulting implicitly defined functions are unique. But here, it appears that sometimes there are multiple different continuous choices of $c(b)$, if there are any at all.

In the top right graph of Figure 4, we see that in a small neighborhood of $c_0$, all tangent lines again have nonnegative slope. But in a small neighborhood around $b_M$, the secant lines from $(0, 0)$ to $(b, f(b))$ all have nonpositive slope. Thus there is no hope at all to extend $c$ to a function on a larger neighborhood. We recognize this in the mean value abscissa graph below by seeing that $(b_M, c_0)$ is an isolated point.

The behavior in the top center graph, near $b_i$, is a bit more delicate. Here, in a small neighborhood of $c_0$, all tangent lines have nonpositive slope. For $b$ just to the left of $b_i$, the secant lines from $(0, 0)$ to $(b, f(b))$ have nonpositive slope, and so it qualitatively
Figure 4. Top: Three graphs of functions $f$ with an interval and corresponding mean value abscissa indicated. In each graph, $f'''(c_0) \neq 0$. In the left graph, $f''(b_m) > 0$. In the middle graph, $f''(b_i) = 0$. In the right graph, $f''(b_M) < 0$. Bottom: Below each graph is a plot of all mean value abscissae as a function of $b$, as in previous figures. In the first graph, there appear to be multiple choices for continuous functions $c(b)$. In the second graph, there is a continuous extension on an interval with $b_i$ as an endpoint. In the third graph, the initial solution is completely isolated.
appears that it might be possible to associate points near $c$ with matching slopes. But for $b$ just to the right of $b_i$, the secant lines all have positive slope, which cannot be matched to slopes of points in a neighborhood of $c$.

These examples indicate a wider variety of behavior, and it is not immediately obvious what the general rule is. We cannot directly apply the implicit function theorem to $F(b, c)$ at $(b_0, c_0)$ because both of its partial derivatives $F_b(b_0, c_0)$ and $F_c(b_0, c_0)$ are zero there. As with the second derivative test for local maxima and minima, the next natural step is to try and extract information from the second derivatives $F_{bb}$, $F_{bc}$, and $F_{cc}$ at $(b_0, c_0)$. This is the idea behind the Morse lemma, a simple version of which is the following.

Lemma 3 (Morse lemma). Let $F = F(x, y)$ be a three-times continuously differentiable function and suppose that $F(x_0, y_0) = F_x(x_0, y_0) = F_y(x_0, y_0) = 0$ but that

$$F_{xx}(x_0, y_0)\,F_{yy}(x_0, y_0) - \bigl(F_{xy}(x_0, y_0)\bigr)^2 \neq 0. \tag{8}$$

Then in a neighborhood of $(x_0, y_0)$ there is a change of coordinates $(x, y) \mapsto (u, v)$ so that

$$F(x, y) = \pm u^2 \pm v^2. \tag{9}$$
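A concrete instance may help make the conclusion (9) tangible. The sketch below uses an assumed example that is not taken from the article, $F(x, y) = x^2 + x^3 - y^2$ at the origin, where the change of coordinates $u = x\sqrt{1 + x}$, $v = y$ can be written down by hand and gives $F = u^2 - v^2$. The helper names `to_uv` and `max_model_error` are hypothetical.

```python
import math

def F(x, y):
    return x**2 + x**3 - y**2           # assumed example, critical point at (0, 0)

def to_uv(x, y):
    # Explicit change of coordinates, valid for x > -1:
    # u^2 = x^2 (1 + x) = x^2 + x^3 and v^2 = y^2.
    return x * math.sqrt(1 + x), y

# Second derivatives of F at the origin: Fxx = 2, Fyy = -2, Fxy = 0.
HESSIAN_DET = 2 * (-2) - 0**2           # nonzero, so condition (8) holds

def max_model_error(radius=0.3, n=31):
    """Largest value of |F(x, y) - (u^2 - v^2)| on a grid around the origin."""
    worst = 0.0
    for i in range(n):
        for j in range(n):
            x = -radius + 2 * radius * i / (n - 1)
            y = -radius + 2 * radius * j / (n - 1)
            u, v = to_uv(x, y)
            worst = max(worst, abs(F(x, y) - (u**2 - v**2)))
    return worst

if __name__ == "__main__":
    print("Hessian determinant:", HESSIAN_DET)
    print("max |F - (u^2 - v^2)|:", max_model_error())
```

Here one sign in (9) is a plus and one is a minus. The map $x \mapsto u$ is strictly increasing for $x > -2/3$, so it can be inverted near the origin.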
The number of minus signs on the right-hand side of (9) is called the Morse index of $F$ at $(x_0, y_0)$. It is independent of the particular choice of coordinates $(u, v)$, and is one of the basic ingredients in Morse theory [4]. By a "change of coordinates" $(x, y) \mapsto (u, v)$, we mean that $u$ and $v$ can be written as continuously differentiable functions of $(x, y)$, while at the same time $x$ and $y$ can be written as continuously differentiable functions of $(u, v)$. We also require that $(x_0, y_0) \mapsto (0, 0)$.
Remark. Those familiar with multivariable calculus might recognize the conditions of the Morse lemma as an alternative way of saying that the gradient of $F$ vanishes at $(x_0, y_0)$, but the Hessian matrix is invertible there.

A full proof of Lemma 3 involves the implicit function theorem in higher dimensions (or its close cousin the inverse function theorem). But there is a related result for functions $g = g(x)$ of a single variable that we can prove with Theorem 1 alone.

Lemma 4 (One-dimensional Morse lemma). Let $g = g(x)$ be a three-times continuously differentiable function and suppose that $g(x_0) = g'(x_0) = 0$ but $g''(x_0) \neq 0$. Then near $x_0$ there is a change of coordinates $x \mapsto u$ so that

$$g(x) = \pm u^2, \tag{10}$$
where we take $+$ in (10) when $g''(x_0) > 0$ and $-$ when $g''(x_0) < 0$.

Proof. Expanding $g$ in a Taylor series near $x_0$, the first two terms are zero and so we have the approximate formula

$$g(x) \approx \frac{g''(x_0)}{2}\,(x - x_0)^2.$$

More precisely, we can use Taylor's theorem to write

$$g(x) = \frac{g''(x_0)}{2}\,(x - x_0)^2 + r(x)(x - x_0)^2 = \left[ \frac{g''(x_0)}{2} + r(x) \right](x - x_0)^2 \tag{11}$$
where $r(x)$ is a remainder term that is small when $x$ is near $x_0$. Suppose that $g''(x_0) > 0$. Then the term in square brackets on the right-hand side of (11) is positive for $x$ close to $x_0$, and so the definition

$$u = (x - x_0)\sqrt{\frac{g''(x_0)}{2} + r(x)} \tag{12}$$
makes sense. Moreover, squaring (12) we see that (11) is equivalent to $g(x) = +u^2$. But we still need to check that $x \mapsto u$ is a valid change of coordinates. To do this, we study the zeros of the function

$$G(x, u) = (x - x_0)\sqrt{\frac{g''(x_0)}{2} + r(x)} - u$$

using the implicit function theorem. We calculate that $G(x_0, 0) = 0$ while the two partial derivatives

$$G_x(x_0, 0) = \sqrt{\frac{g''(x_0)}{2}} > 0, \qquad G_u(x_0, 0) = -1 < 0$$
are both nonzero. By the implicit function theorem, the first inequality shows that we can uniquely solve $G(x, u) = 0$ for $x = X(u)$ in a neighborhood of $(x, u) = (x_0, 0)$, while the second inequality shows that we can also solve for $u = U(x)$. Further, each of these maps is continuously differentiable. Thus $x \mapsto u$ is a valid change of coordinates.

The argument when $g''(x_0) < 0$ is almost identical, except that now we set

$$u = (x - x_0)\sqrt{-\left[ \frac{g''(x_0)}{2} + r(x) \right]}$$

and we get a minus sign instead of a plus sign in (10).

Returning to the problem at hand, observe that our function $F(b, c)$ defined in (7) can be written as

$$F(b, c) = g_1(b) - g_2(c) \tag{13}$$
where $g_1$ and $g_2$ are defined by

$$g_1(b) = \frac{f(b) - f(a_0)}{b - a_0} - f'(c_0), \qquad g_2(c) = f'(c) - f'(c_0). \tag{14}$$
We include the $f'(c_0)$ terms in (14) so that $g_1(b_0) = g_2(c_0) = 0$. Our assumptions that $F_b(b_0, c_0) = F_c(b_0, c_0) = 0$ while $F_{bb}(b_0, c_0) \neq 0$ and $F_{cc}(b_0, c_0) \neq 0$ translate to $g_1'(b_0) = g_2'(c_0) = 0$ and $g_1''(b_0) \neq 0$, $g_2''(c_0) \neq 0$. If the original function $f$ is four-times continuously differentiable, then $g_1$ and $g_2$ are three-times continuously differentiable, and we can apply Lemma 4 to find changes of coordinates $b \mapsto u$ and $c \mapsto v$ so that $g_1(b) = \pm u^2$ and $g_2(c) = \pm v^2$. Plugging back into (13) we get $F(b, c) = \pm u^2 \pm v^2$, which is exactly the conclusion (9) of the Morse lemma!

Introducing some notation, we have shown that for $(b, c)$ close to $(b_0, c_0)$ we can write $F(b, c) = \sigma_1 u^2 - \sigma_2 v^2$ where $\sigma_1$ and $\sigma_2$ are each either $+1$ or $-1$. In terms of the original function $f$, we note that $\sigma_1$ is the sign of $f''(b_0)$ and $\sigma_2$ is the sign of $f'''(c_0)$. There are a few different possibilities depending on the combinations of the signs $\sigma_1$ and $\sigma_2$.

(i) If $\sigma_1$ and $\sigma_2$ have opposite signs, then $F(b, c) = 0$ is equivalent to $u^2 = -v^2$. The only solution is $(u, v) = (0, 0)$ and this solution is isolated.
(ii) If $\sigma_1$ and $\sigma_2$ have the same sign, then $F(b, c) = 0$ is equivalent to $u^2 = v^2$. This has two solutions $u = \pm v$, and no other nearby solutions.

Looking again at Figure 4, we see that case (i) corresponds to the right graph, and case (ii) corresponds to the left graph. We summarize the results of our exploration in the following theorem.

Theorem 5. Let $f$ be a four-times continuously differentiable function, and suppose $c_0$ is a mean value abscissa for $f$ in the interval $[a_0, b_0]$. If $f''(c_0) = 0$ and $f'(b_0) = f'(c_0)$, but $f''(b_0)$ and $f'''(c_0)$ are both nonzero, then

• If $f''(b_0)$ and $f'''(c_0)$ have opposite signs, then $c_0$ cannot be extended to a continuous function $c = C(b)$ near $b_0$.
• If $f''(b_0)$ and $f'''(c_0)$ have the same sign, then there are two continuously differentiable functions $c = C_1(b)$ and $c = C_2(b)$ solving (1) for $b$ near $b_0$. There are no other nearby solutions.
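The same-sign case of Theorem 5 can be observed on a concrete polynomial. The example below is an assumption constructed for illustration and does not come from the article: $f$ is chosen so that $f'(x) = (x - 1)^2 (x - 3)(x - 1.4)$ and $f(0) = f(3) = 0$, which gives $a_0 = 0$, $b_0 = 3$, $c_0 = 1$, $f''(b_0) = 6.4 > 0$ and $f'''(c_0) = 1.6 > 0$. The hypothetical helper `count_abscissae_near_c0` counts solutions of (1) near $c_0$ by looking for sign changes on a fine grid.

```python
A0, B0, C0 = 0.0, 3.0, 1.0               # assumed interval and abscissa

def f(x):
    # Antiderivative of (x - 1)^2 (x - 3)(x - 1.4) with f(0) = 0; also f(3) = 0.
    return x**5 / 5 - 1.6 * x**4 + 14 * x**3 / 3 - 6.4 * x**2 + 4.2 * x

def df(x):
    # f'(x); it vanishes to second order at C0 and to first order at B0.
    return (x - 1)**2 * (x - 3) * (x - 1.4)

def count_abscissae_near_c0(b, half_width=0.2, n=4000):
    """Count sign changes of f'(c) - slope on a fine grid around C0."""
    slope = (f(b) - f(A0)) / (b - A0)
    count = 0
    prev = df(C0 - half_width) - slope
    for i in range(1, n + 1):
        c = C0 - half_width + 2 * half_width * i / n
        cur = df(c) - slope
        if prev * cur < 0:
            count += 1
        prev = cur
    return count

if __name__ == "__main__":
    for b in (2.95, 3.05):
        print(b, count_abscissae_near_c0(b))
```

For $b$ slightly on either side of $b_0$ the count is two, matching the two branches $C_1$ and $C_2$. A grid search of this kind is only a heuristic: it can miss roots that are closer together than the grid spacing.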
5. FURTHER GENERALIZATIONS. Looking back, if $f''(c_0) \neq 0$ we can use Theorem 2, while if $f''(c_0) = 0$ but $f'''(c_0) \neq 0$ and $f''(b_0) \neq 0$, then we can use Theorem 5. What if $f''(c_0) = f'''(c_0) = 0$ but $f^{(4)}(c_0) = f''''(c_0) \neq 0$? Or, even more ambitiously, what if all of the derivatives of $f$ at $c_0$ up to $f^{(200)}(c_0)$ vanish but $f^{(201)}(c_0) \neq 0$?

Looking back at our proof of Theorem 5, it is clear that we will need a generalization of Lemma 4. Thankfully there is one available; the proof is similar and so we leave it as an exercise to the reader.

Lemma 6. Fix an integer $k$ and suppose that $g$ is a $(k + 1)$-times continuously differentiable function with

$$g(x_0) = g'(x_0) = \cdots = g^{(k-1)}(x_0) = 0 \quad\text{but}\quad g^{(k)}(x_0) \neq 0. \tag{15}$$

Then near $x_0$ there is a change of coordinates $x \mapsto u$ so that

$$g(x) = \pm u^k. \tag{16}$$
If $k$ is odd then we can choose either $+$ or $-$ in (16), while if $k$ is even then we must take the sign of $g^{(k)}(x_0)$.

When (15) holds we say that $g$ has a zero of order $k$ at $x_0$. Suppose now that $f$ is a smooth function, $c_0$ is a mean value abscissa for $f$ on the interval $[a_0, b_0]$, and

$$f(b_0) = f'(b_0) = \cdots = f^{(k-1)}(b_0) = 0 \quad\text{but}\quad f^{(k)}(b_0) \neq 0, \tag{17a}$$

$$f'(c_0) = f''(c_0) = \cdots = f^{(\ell-1)}(c_0) = 0 \quad\text{but}\quad f^{(\ell)}(c_0) \neq 0, \tag{17b}$$

for some integers $k$ and $\ell$. Using the definitions in (14), we can check that $g_1$ has a zero of order $k$ at $b_0$ while $g_2$ has a zero of order $\ell - 1$ at $c_0$. Applying Lemma 6 twice to find changes of coordinates $b \mapsto u$ and $c \mapsto v$, this means that

$$F(b, c) = \pm u^k \pm v^{\ell - 1}, \tag{18}$$
analogous to the Morse lemma but with higher powers. Relating the $\pm$ signs back to $f^{(k)}(b_0)$ and $f^{(\ell)}(c_0)$ eventually leads to the following four cases:

(i) If $\ell$ is even, then there is one continuous solution $c = C(b)$ of (1) in a neighborhood of $(b_0, c_0)$ and no other nearby solutions.
(ii) If $\ell$ is odd, $k$ is even, and $f^{(k)}(b_0)$ and $f^{(\ell)}(c_0)$ have the same sign, then there are two continuous solutions $c = C_1(b)$ and $c = C_2(b)$ of (1) in a neighborhood of $(b_0, c_0)$ and no other nearby solutions.
(iii) If $\ell$ is odd, $k$ is even, and $f^{(k)}(b_0)$ and $f^{(\ell)}(c_0)$ have opposite signs, then $(b_0, c_0)$ is an isolated solution of (1).
(iv) If $\ell$ and $k$ are both odd, then there are two continuous solutions $c = C_1(b)$ and $c = C_2(b)$ of (1), but they are only defined in a one-sided neighborhood of $b_0$ where $f^{(k)}(b_0)\,f^{(\ell)}(c_0)\,(b - b_0) > 0$.

Unfortunately, even when our function $f$ is infinitely differentiable, it is possible that the functions $g_1$ or $g_2$ in (14) will have roots of "infinite order" where all of their derivatives vanish, in which case our argument above breaks down. For instance, the classic "bump function," defined to be $\exp(-1/(1 - x^2))$ for $-1 < x < 1$ and 0
otherwise, is infinitely differentiable but vanishes together with all of its derivatives at the points $x = 1$ and $x = -1$.

We can rule out this possibility by restricting to analytic functions. Recall that a function $f$ is analytic if the Taylor series for $f$ centered at each point $x_0$ converges to $f$ in a neighborhood of $x_0$. That is, for each $x_0$, we have the equality

$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(x_0)}{n!}\,(x - x_0)^n$$
for all $x$ in a neighborhood of $x_0$. One of the nice things about analytic functions is that we can only have $f^{(k)}(x_0) = 0$ for all $k \ge 1$ if $f$ is a constant function. This allows us to prove the following simple lemma.

Lemma 7. Let $f$ be a nonconstant analytic function satisfying $f(a_0) = f(b_0) = 0$. Then there is a mean value abscissa $c_0$ of $f$ on $[a_0, b_0]$ such that (17b) holds with $\ell$ even, i.e., the order of vanishing of $f'$ at $c_0$ is odd.

Proof. Since $f$ is nonconstant with $f(a_0) = f(b_0) = 0$, it must attain either a positive maximum or a negative minimum at some point $c_0$ in the interval $(a_0, b_0)$. Since $c_0$ is an extremum, $f'(c_0) = 0$, and since $f$ is analytic we know that (17b) holds for some integer $\ell$. If $\ell$ were odd, then a Taylor expansion would imply that

$$f(x) \approx f(c_0) + \frac{f^{(\ell)}(c_0)}{\ell!}\,(x - c_0)^{\ell}$$
was strictly increasing or strictly decreasing near $c_0$, contradicting the fact that $c_0$ is a local extremum.

Using Lemma 7, we can now prove a final theorem specialized to analytic functions.

Theorem 8. Let $f$ be real analytic on the interval $[a_0, b_0]$, and assume that neither $f$ nor $f'$ is constant. Then there is at least one mean value abscissa $c_0$ for $f$ on $[a_0, b_0]$ for which there exists a continuous function $c = C(b)$ so that
$$\frac{f(b) - f(a)}{b - a} = f'(C(b))$$
for all $b$ in a neighborhood of $b_0$. There are no other solutions near $(b_0, c_0)$.

Proof. By Lemma 7, we can find $c_0$ in $(a_0, b_0)$ where (17b) holds with $\ell$ even. The analyticity of $f$ also guarantees that (17a) holds for some integer $k$, and so we are in case (i) above, as desired.
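Theorem 8 promises a continuous choice $C(b)$ and nothing more. A small example suggests why continuity may be all one gets. The sketch below is an assumption chosen for illustration: $f(x) = 1 - (x - 1)^4$ on $[0, 2]$, with the maximum at $c_0 = 1$, so that $f'(c_0) = f''(c_0) = f'''(c_0) = 0$ and $f^{(4)}(c_0) \neq 0$. For this $f$, equation (1) can be solved for $c$ by a real cube root.

```python
import math

A0, B0, C0 = 0.0, 2.0, 1.0               # assumed interval and abscissa

def f(x):
    return 1 - (x - 1)**4                # f(A0) = f(B0) = 0, maximum at C0

def df(x):
    return -4 * (x - 1)**3               # f'(x)

def C(b):
    """Solve (f(b) - f(A0)) / (b - A0) = f'(c) explicitly for c."""
    s = -(f(b) - f(A0)) / (4 * (b - A0))        # (c - 1)^3 must equal s
    return 1 + math.copysign(abs(s) ** (1 / 3), s)   # real cube root of s

if __name__ == "__main__":
    for b in (1.9, 1.99, 2.0, 2.01, 2.1):
        slope = (f(b) - f(A0)) / (b - A0)
        print(b, C(b), slope - df(C(b)))
    # Difference quotient of C at B0 for a small step:
    h = 1e-6
    print((C(B0 + h) - C(B0)) / h)
```

Near $b_0 = 2$ the solution behaves like $1 + \sqrt[3]{(b - 2)/2}$, which is continuous but has an unbounded difference quotient at $b_0$. This is in contrast with Theorem 2a, where $C$ is continuously differentiable.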
Remark. As a final note, being "merely" infinitely differentiable is not strong enough to guarantee that there is always a choice of a mean value abscissa with a continuous dependence on the right endpoint. For a counterexample, see Figure 5. One can construct a smooth function of this shape from bump functions. On the indicated interval $[0, 3]$, every value $c_0$ with $1 \le c_0 \le 2$ is a valid mean value abscissa; this is reflected in the mean value abscissa plot on the right by a vertical line segment from
Figure 5. A smooth function where no continuous choice of mean value abscissa exists.
$(3, 1)$ to $(3, 2)$. For $b$ just to the left of $b_0$, the slope of the secant line is negative, and for $b$ just to the right of $b_0$, the slope of the secant line is positive. But any value $c_0$ is at least distance $1/2$ away either from every point $c$ where $f'(c) < 0$ or from every point $c$ where $f'(c) > 0$. There is no continuous choice of mean value abscissa for this function.

6. REFLECTION AND FURTHER QUESTIONS. The major selling point of Theorem 8 is that it gives us a continuous choice $c = C(b)$ of mean value abscissa without any assumptions on $f$ other than analyticity. The downside is that it only tells us about abscissas of type (i) above, and does not give us any information about types (ii)–(iv). It is also worth noting that in general we expect "most" abscissas to be "regular" enough that either Theorem 2 or Theorem 5 applies.

There are of course many more questions that one could ask about the set of solutions to (1). For instance, what if we allowed the left endpoint $a$ to vary as well as $b$ and looked for continuous choices $c = C(a, b)$? The techniques we have used will still be very powerful, but, for instance, the decomposition $F(b, c) = g_1(b) - g_2(c)$ in Sections 4 and 5 will no longer be as simple. One could also study the global structure of the solution sets shown in Figures 3 and 4. How many different connected components are there? In what ways can they "begin" and "end"? Answering such questions involves a very different set of mathematical techniques.

ACKNOWLEDGMENTS. D. L.-D. was supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE 0228243, EPSRC Programme Grant EP/K034383/1 LMF: L-Functions and Modular Forms, and the Simons Collaboration in Arithmetic Geometry, Number Theory, and Computation via the Simons Foundation grant 546235. M. H. W. was supported by the National Science Foundation under Grant No. DMS-1400926.
We also thank the many contributors to the Python programming packages NumPy [6], SymPy [5], and matplotlib [2], as we used this software for our own exploration and to create the functions and figures in this article. A copy and description of the code used for this article are available at davidlowryduda.com/choosing-functions-for-mvt-abscissa/.
ORCID David Lowry-Duda
http://orcid.org/0000-0002-8543-4558
REFERENCES
[1] Carter, P., Lowry-Duda, D. (2017). On functions whose mean value abscissas are midpoints, with connections to harmonic functions. Amer. Math. Monthly. 124(6): 535–542.
[2] Hunter, J. D. (2007). Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9(3): 90.
[3] Krantz, S. G., Parks, H. R. (2012). The Implicit Function Theorem: History, Theory, and Applications. New York, NY: Springer Science & Business Media.
[4] Matsumoto, Y. (2002). An Introduction to Morse Theory. Iwanami Series in Mathematics, Vol. 208. Providence, RI: American Mathematical Society.
[5] Meurer, A., Smith, C. P., Paprocki, M., Čertík, O., Kirpichev, S. B., Rocklin, M. A., Ivanov, S., Moore, J. K., Singh, S., Rathnayake, T. (2017). SymPy: Symbolic computing in Python. PeerJ Comput. Sci. 3: e103.
[6] Oliphant, T. E. (2006). A Guide to NumPy. Trelgol Publishing. web.mit.edu/dvp/Public/numpybook.pdf
[7] Strichartz, R. S. (2000). The Way of Analysis. Sudbury, MA: Jones & Bartlett Learning.

DAVID LOWRY-DUDA earned his Ph.D. from Brown University in 2017. He is a senior research scientist at the Institute for Computational and Experimental Research in Mathematics (ICERM), where he develops mathematical software and mathematics. Although he normally thinks about analytic number theory and algebraic geometry, the mean value theorem seems to pop up everywhere.
ICERM, 121 Main Street, Box E, 11th Floor, Providence, RI 02903, USA
[email protected]
MILES H. WHEELER earned his Ph.D. from Brown University in 2014. He is a lecturer in the Department of Mathematical Sciences at the University of Bath. In his downtime, he likes to play the ukulele. His mathematical interests include partial differential equations and fluid mechanics. University of Bath, Claverton Down, Bath BA2 7AY, UK [email protected]
100 Years Ago This Month in The American Mathematical Monthly Edited by Vadim Ponomarenko Girolamo Saccheri’s Euclides Vindicatus. Edited and translated by George Bruce Halsted. Chicago and London, Open Court, 1920. 8vo. 30+246 pp. Price $2.00. The Jesuit Girolamo Saccheri was born in 1667 and first taught in a college of his order in Milan where Tommaso Ceva, a brother of the Giovanni Ceva whose triangle theorem is well known, was teacher of mathematics. [. . . ] As to Dr. Halsted’s translation there is no indication in the work before us that any part of the work has appeared in print elsewhere before. Nevertheless this is the case. The last paragraph of a circular advertising the book under review, and signed by Dr. Halsted, is as follows: “The English translation is a revision of the first ever made into any language, published in 1894, but long unprocurable.” Hardly a single statement in this sentence is accurate. The translation in question (of propositions I–XXXVI in the first book) appeared in this Monthly, volumes 1–5, June, 1894– December, 1898. In “1894” the English translation of thirteen propositions only had appeared. The complete German translation of the first book was published before half of the English translation had appeared. The early numbers of the Monthly are not “unprocurable.” —Excerpted from R. C. Archibald, “Recent Publications” (1921). 28(1): 28–30.
A Good Question Won’t Go Away: An Example of Mathematical Research Robert F. Brown Abstract. The story of the question “must commuting maps of the unit interval have a common fixed point” is used to illustrate strategies that advance mathematical research.
1. INTRODUCTION. Some time ago, we were discussing this at a math conference: what would you say if you met someone at a cocktail party and she asked you about your work? If you reply “I teach and I do research” and she asks “what sort of research?” do you just get embarrassed and mumble “it’s hard to explain” or do you have something coherent to say? I do have something to say. “I work with geometric objects. I’m interested in what happens when you move the points around on these objects in a continuous way, which means that nearby points are moved to nearby locations. For example, if you continuously move the points on the sphere, that is, the surface of a ball, just a little bit, it turns out that at least one point has to stay fixed. On the other hand, I can move all the points of a torus, that is, the surface of a donut, a little bit just by rotating it. So that rotation has no fixed points. My specialty is called fixed point theory and what I’ve just described for you is how fixed point theory reveals a rather subtle mathematical difference between a sphere and a torus.”1 I probably would have told my cocktail party acquaintance just about as much as she ever wanted to know about mathematical research. I assume that readers of the Monthly have a much longer attention span for such topics, especially if they have not (yet) participated in research either in an undergraduate research program or as a graduate student. In this article, I will tell the story of a fixed point question, a story that demonstrates several significant features of mathematical research. 2. THE QUESTION. The question concerns moving points on a geometric object, but one even more basic than the sphere or torus: the interval I = [0, 1] in the real line. A continuous function, for which I’ll use the simpler term map, f from the interval to itself must have a fixed point. 
The reason is that the difference function d(x) = f (x) − x is a map because f is, d(0) ≥ 0, and d(1) ≤ 0, so the intermediate value theorem tells us that d(x∗ ) = 0 for some x∗ ∈ I and therefore f (x∗ ) = x∗ . Now consider two maps f and g of I to itself. Both have fixed points, but in general the fixed points of f have no relationship with those of g. However, suppose the maps f and g are related, are their fixed points also related? Specifically, The question: Suppose f, g : I → I commute, that is, f (g(x)) = g(f (x)) for all x ∈ I . Do their fixed point sets intersect? That is, if f and g are commuting selfmaps of the interval, do they have a common fixed point: a solution x∗ to the equations f (x∗ ) = x∗ = g(x∗ )? Although this question was not published, it was informally posed, independently, by three different people: Eldon Dyer in 1954, Alan Shields in 1955, and Lester Dubins 1 Unfortunately, I have never had the opportunity to say all that. Perhaps I should go to more cocktail parties.
doi.org/10.1080/00029890.2021.1847592
62
c THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 128
in 1956. The question may have been motivated by the fact that the common fixed point property was known to hold for commuting polynomials. J. F. Ritt [27] proved in 1923 that, besides rather obvious examples like f (x) = (p(x))m , g(x) = (p(x))n for some polynomial p(x), the only polynomials that commute are the Chebyshev polynomials. This much-studied class was known to have the common fixed point property, see [3].2 The earliest response to the question, in terms of publication date, was a paper by Ralph DeMarr [12], published in this Monthly in 1963. The research strategy that DeMarr employed was also used by several other researchers, as we shall see. It appeared that the hypotheses, that f and g are continuous and that they commute, were not strong enough to imply that they shared a fixed point. So DeMarr strengthened the hypotheses to require a more restrictive form of continuity: the maps f and g must be Lipschitz continuous. Recall that a map f : I → I is Lipschitz continuous if there exists α > 0 such that |f (x) − f (y)| < α|x − y| for all x, y ∈ I . Notice that the number α, called the Lipschitz constant, is not unique since the inequality will hold for any larger value. Let β be the Lipschitz constant for g. If the smallest possible values for both α and β is greater than 1, then he showed that the maps have a common fixed point provided these constants satisfy a certain inequality that thus relates the behavior of the two maps. However, if for one of the maps, call it f , we can use α = 1, that is, if |f (x) − f (y)| ≤ |x − y| for all x, y ∈ I so that f is what is called a nonexpansive map, then there is a common fixed point even if g is only continuous rather than Lipschitz continuous. To prove this, let (f ) denote the set of fixed points of f , which is a closed subset of I and thus has a maximum M and minimum m. If x ∈ [m, M], then f (x) − m ≤ |f (x) − m| = |f (x) − f (m)| ≤ |x − m| = x − m, so f (x) ≤ x. 
In the same way, we can see that $M - f(x) \le M - x$, so $f(x) \ge x$, and we have proved that if $f$ is a nonexpansive map, then its fixed point set $\mathrm{Fix}(f)$ is the whole interval $[m, M]$. Now if $x \in [m, M]$, then since $g$ commutes with $f$ we have $g(x) = g(f(x)) = f(g(x))$, so $g(x)$ is again a fixed point of $f$. Thus the restriction of $g$ is a map of $[m, M]$ to itself and the intermediate value theorem again tells us that $g(x^*) = x^*$ for some $x^* \in [m, M] = \mathrm{Fix}(f)$; that is, $x^*$ is a common fixed point. Thus, if a map commutes with a nonexpansive map, then there is a common fixed point.

A few years later, in another Monthly paper, Gerald Jungck [22] refined DeMarr’s conditions so that neither map need actually be Lipschitz continuous, but the maps still must exhibit similar closely related behavior.

Haskell Cohen’s paper [11] established the existence of a common fixed point by strengthening the hypotheses on the commuting functions $f$ and $g$ in a different way, namely that they are not only continuous but also open. In our setting that means that the images of subsets of $I$ that are intersections of open intervals of the real line with $I$ are subsets of the same kind.³ In particular, homeomorphisms are open maps, so Cohen’s result proves that commuting homeomorphisms have a common fixed point. Just as the posing of the problem occurred to several people independently, the same thing happened with the open map condition. Jon Folkman [14] and James Joichi [21] both improved Cohen’s result by showing that if just one of the commuting maps $f$ or $g$ is an open map, then they have a common fixed point.

² There are more extensive discussions of the background of the question in the introduction to [5] and in Section 4 of [26].
³ Cohen stated his condition in different terms, but it was subsequently established that his condition was equivalent to the concept of an open map, which is widely employed in topology.

A simple lemma in Folkman’s paper can give us a sense of the significance of the commutativity condition. We will not try to prove that a map of the interval that commutes with an open map has a common fixed point with it. Instead we will establish the much weaker result that a map that commutes with a monotone map has a common fixed point with it. Let $f$ be a monotone map of $I$ and $g$ a map that commutes with it. If $f$ is monotone decreasing, it has a single fixed point $x^*$ and, since the maps commute, $f(g(x^*)) = g(f(x^*)) = g(x^*)$, so $g(x^*)$ is a fixed point of $f$ and, since there is only one, $g(x^*) = x^*$. If $f$ is monotone increasing, let $x_0$ be a fixed point of $g$ and define a sequence in $I$ by setting $x_n = f(x_{n-1})$ for $n \ge 1$; then $(x_n)$ is a monotone sequence (increasing if $x_1 \ge x_0$ and decreasing otherwise, because $f$ preserves order) in the compact interval $I$ and therefore it converges to some $x^* \in I$. We can prove by induction that the points $x_n$ of the sequence are all fixed points of $g$: it’s true for $n = 0$ and if $g(x_{n-1}) = x_{n-1}$, then by commutativity $g(x_n) = g(f(x_{n-1})) = f(g(x_{n-1})) = f(x_{n-1}) = x_n$. Consequently, $x^*$ is a fixed point of $g$ because $g$ is continuous, as follows:
\[ g(x^*) = g\bigl(\lim_{n\to\infty} x_n\bigr) = \lim_{n\to\infty} g(x_n) = \lim_{n\to\infty} x_n = x^*. \]
But $x^*$ is also a fixed point of the continuous function $f$ because
\[ x^* = \lim_{n\to\infty} x_n = \lim_{n\to\infty} x_{n+1} = \lim_{n\to\infty} f(x_n) = f\bigl(\lim_{n\to\infty} x_n\bigr) = f(x^*). \]
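The increasing case of the argument above can be traced numerically. The following Python sketch is only an illustration under assumed data: the maps `f` and `g` are chosen here for convenience and do not come from the papers discussed. The map `f` is monotone increasing, `g` commutes with it, and the iteration starts from a fixed point of `g` that is not a fixed point of `f`.

```python
def f(x: float) -> float:
    """Monotone increasing map of [0, 1]: 2x^2 on [0, 1/2], identity on [1/2, 1]."""
    return 2.0 * x * x if x <= 0.5 else x


def g(x: float) -> float:
    """Continuous map of [0, 1] that fixes every point of [0, 1/2]."""
    return min(x, 0.5)


def iterate_from(x0: float, steps: int = 60) -> list[float]:
    """Return x_0, x_1 = f(x_0), ..., x_steps, starting from a fixed point of g."""
    if g(x0) != x0:
        raise ValueError("x0 must be a fixed point of g")
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1]))
    return xs


# Check commutativity f(g(x)) = g(f(x)) on a grid of sample points.
grid = [i / 1000 for i in range(1001)]
commute = all(abs(f(g(x)) - g(f(x))) < 1e-12 for x in grid)

xs = iterate_from(0.3)
x_star = xs[-1]
print("maps commute on the grid:", commute)
print("every iterate is fixed by g:", all(g(x) == x for x in xs))
print("approximate common fixed point:", x_star)
```

In this particular example the sequence decreases, which is why the argument only needs the sequence to be monotone rather than increasing.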
A different supplementary hypothesis for our question depends on the concept of a periodic point. Given a function $f : X \to X$, let $f^1(x) = f(x)$ and, in general, define $f^n : X \to X$ by $f^n(x) = f(f^{n-1}(x))$. A point $x \in X$ is a periodic point of $f$ if $f^n(x) = x$ for some $n \ge 1$. If $f^n(x) = x$ but $f^k(x) \ne x$ for all $1 \le k < n$, then $n$ is called the least period of the periodic point $x$. We will simplify the terminology by calling such a periodic point a period $n$ point.

Again similar conditions strong enough to imply a common fixed point were discovered independently. John Maxfield and W. J. Mourant [25] proved that commuting maps on the unit interval have a common fixed point if one of the maps has no period 2 points, in other words, if $f(f(x)) = x$ implies $f(x) = x$. The condition of Sherwood Chu and R. D. Moyer [9] is that there is a subinterval $[a, b]$ on which one of the maps has a fixed point and the other has no period 2 points.

The periodic point concept appears in the somewhat different approach to our question by Arthur Schwartz in [28]. In attempting to answer a question like ours, of whether certain hypotheses imply the desired conclusion, rather than strengthening the hypotheses, you can ask whether the given hypotheses are at least sufficient to establish a somewhat weaker conclusion. Rather than ask whether the commuting maps $f$ and $g$ have a common fixed point, we could require only that they have a common periodic point, of possibly different periods. In principle, this should be easier to prove since maps usually have many periodic points that are not fixed points. In [28] Schwartz did require a strengthening of the hypotheses, namely, that $f$ is differentiable rather than only continuous. His conclusion is that there is a fixed point of $f$ that is a periodic point of $g$, but it is not necessarily a fixed point of $g$.

3. THE ANSWER. It seems appropriate that a question that independently occurred to more than one person should have been answered independently by two people.
In 1967 William Boyce and John Huneke constructed, and in 1969 published in [5] and [20] respectively, examples of commuting maps of the interval that have no common fixed point, so the answer to our question is “no.” Before discussing the examples, let’s note that this could be viewed as good news for all the people mentioned above. If the
answer had been “yes,” that is, commuting maps of the interval always have a common fixed point, then all their results would just be partial solutions to a subsequently solved problem. But since the general answer is “no,” their contributions are part of a more interesting answer: “not always, but . . . ”

In [20], Huneke presented two examples. The first is identical to the example in Boyce’s paper while the other, obtained in a different manner, is, in Huneke’s words, “somewhat smoother” with regard to Lipschitz continuity. We will restrict this discussion to the example shared by Boyce and Huneke. According to Boyce, the key to the construction is contained in a paper of Glen Baxter [2] that was inspired by our question and had recently been published.

To understand what Baxter did, keep in mind that we can think of the graph of a map $f : I \to I$ as a curve in the unit square and the fixed points of $f$ as the intersections of its graph with the diagonal of the unit square. Starting with the usual commuting maps $f$ and $g$, Baxter noted that the map $f$ permutes the fixed points of the map $h$ defined by $h(x) = f(g(x))$: if $h(x) = x$, then $f(h(x)) = f(x)$ and by commutativity $f(h(x)) = f(f(g(x))) = f(g(f(x))) = h(f(x))$, so $h(f(x)) = f(x)$. Assuming that $h$ has a finite number of fixed points, he divided the fixed points of $h$ on $(0, 1)$ into three classes depending on how the graph of $h$ relates to the diagonal: crosses it heading down, crosses it heading up, or hits it but does not cross it. The theorem of Baxter is that $f$ can take a fixed point of $h$ only to a fixed point of the same class. If we number the fixed points of $h$ from left to right as $x_1$ to $x_N$, not all permutations of the subscripts are possible since, for instance, the graph of $h$ cannot cross the diagonal heading down at consecutive fixed points. The possible permutations became known as Baxter permutations. The example is based on a Baxter permutation of 13 crossings of the diagonal.
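Baxter's observation that $f$ sends fixed points of $h = f \circ g$ to fixed points of $h$ can be checked on a concrete commuting pair. The sketch below is an illustration under assumptions of our own: it uses the Chebyshev polynomials $T_2$ and $T_3$ on $[-1, 1]$ rather than maps of the unit interval, and these maps do share the fixed point 1, so this is not the Boyce and Huneke example. The fixed points of $h = T_6$ are the values $\cos\theta$ with $\cos 6\theta = \cos\theta$, that is, $\theta = 2\pi k/5$ or $\theta = 2\pi k/7$.

```python
import math


def f(x: float) -> float:
    """Chebyshev polynomial T_2."""
    return 2.0 * x * x - 1.0


def g(x: float) -> float:
    """Chebyshev polynomial T_3, which commutes with T_2."""
    return 4.0 * x ** 3 - 3.0 * x


def h(x: float) -> float:
    """The composite h = f o g, which equals T_6."""
    return f(g(x))


def fixed_points_of_h() -> list[float]:
    """Fixed points of T_6 on [-1, 1], listed from left to right."""
    angles = [0.0]
    angles += [2 * math.pi * k / 5 for k in (1, 2)]
    angles += [2 * math.pi * k / 7 for k in (1, 2, 3)]
    return sorted(math.cos(t) for t in angles)


def induced_permutation(points: list[float], tol: float = 1e-9) -> list[int]:
    """For each fixed point x of h, the index of the fixed point that f(x) lands on."""
    perm = []
    for x in points:
        y = f(x)
        j = min(range(len(points)), key=lambda i: abs(points[i] - y))
        if abs(points[j] - y) > tol:
            raise ValueError("f(x) is not a fixed point of h")
        perm.append(j)
    return perm


points = fixed_points_of_h()
perm = induced_permutation(points)
print("fixed points of h:", [round(x, 6) for x in points])
print("f permutes them as:", perm)
```

Because `induced_permutation` raises an error when an image is not a fixed point of `h`, a successful run confirms the observation for this pair, and the returned list shows which permutation of the indices arises.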
Boyce used a computer to reduce the number of possible Baxter permutations that he needed to study to 112 cases and he showed that the only one that could generate the example was number 101.⁴ In order to define the commuting functions $f$ and $g$, Boyce explicitly defined piecewise linear functions $f_1$, $f_2$, $g_1$, and $g_2$ from $I$ to itself; thus the graphs of those functions are line segments joined end to end. He then inductively defined sequences of functions $(f_n)$ and $(g_n)$ with the property that $f_n g_{n+1} = g_n f_{n+1}$. He proved that the sequences are uniformly convergent, so their limits $f$ and $g$, which commute, are continuous, and the $f_n$ and $g_n$ were defined in such a way that the fixed point sets of $f$ and $g$ are disjoint.

⁴ Since computers only became generally available to researchers in the 1950s, this 1967 example must have been one of the first uses of a computer to solve an abstract mathematical problem.

4. RESEARCH STRATEGIES AFTER THE ANSWER. Like a stone thrown into a still pond that sends out ripples in all directions, this question and its answer were just the starting points for wide-ranging research. Since our focus is on research strategies, I will just discuss some results from this literature that illustrate those strategies.⁵

⁵ A much more thorough exposition of the research inspired by our question is presented as Section 4 of [26].

After the publication of the examples of Boyce and Huneke, there were several improvements to previous results since the negative answer allowed for extensions of what was known. For example, Boyce in [6] refined the conditions of Maxfield and Mourant and of Chu and Moyer by proving that if maps of the interval commute and the set of period 2 points of one of them is connected, in particular if there is just one such point, then they still have a common fixed point. Much later, in [23], Jungck
greatly clarified the relationship between fixed and periodic points. The coincidence set of $f$ and $g$ is the set of points $x \in I$ such that $f(x) = g(x)$. Jungck proved that if $f$ and $g$ commute on their coincidence set, then they have a common fixed point if and only if $g$ has no periodic points other than fixed points.

Our survey of the literature prior to the answering of the question did not illustrate the research strategy that is probably the one most frequently employed: generalization. The definitions of commuting maps and common fixed points make sense for any maps $f, g : X \to X$ of any topological spaces, though in general you would not expect the first to imply the second even in the presence of rather strong supplementary conditions. What is known about maps of the unit interval certainly extends to any arc, that is, a space $X$ for which there is a homeomorphism $h : I \to X$, because in that case a map $f : X \to X$ can be translated to the map $h^{-1} f h : I \to I$. The question becomes more interesting if $X$ is an arc-like continuum, that is, a compact connected space that is a limit, in an appropriate sense, of arcs. Jan Boronski in [4] recently proved that there are even commuting homeomorphisms, not just maps, of an arc-like continuum that have no common fixed points.

In a different direction of generalization, suppose $Y$ is a simple triad, that is, three arcs that intersect only at a common endpoint so it looks like the letter “Y.” If $f, g : Y \to Y$ are commuting maps, do they at least have a coincidence point $y^* \in Y$, that is, $f(y^*) = g(y^*)$, even though it may not be a fixed point? In 2009, Eric McDowell published an extensive discussion [26] of coincidence points. But as yet the simple triad question remains unanswered.

Although commuting maps of the interval may fail to have a common fixed point, we have seen that additional hypotheses do imply its existence, in particular if one of the maps is open [14, 21].
Suppose that $f$ and $g$ are commuting open and onto maps of a space that has many of the topological properties of the interval. Must they have a common fixed point? In 1975, William Gray and Carol Smith discussed this question in [15] and proposed the properties that the space should share with the interval, but it seems that no progress has been made in answering their question.

A natural direction of generalization concerns dimension. The unit interval is one-dimensional; what happens if we consider commuting maps in the next dimension, that is, maps of the unit square $I^2$? Martin Grinc [16] studied what he called triangular maps, that is, maps $F : I^2 \to I^2$ of the form $F(x, y) = (f_1(x), f_2(x, y))$ for continuous $f_1$ and $f_2$. He found conditions under which commuting triangular maps of the unit square have common fixed points and, in particular, he and Lubomir Snoha [17] proved that Jungck’s theorem, that a map with no periodic points other than fixed points has a common fixed point with every map that commutes with it, extends to triangular maps. Antonio Linero in [24] considered a class of maps $F, G$ of the square that are of the form $F(x, y) = (f_2(y), f_1(x))$, $G(x, y) = (g_2(y), g_1(x))$ where the $f_i$ and $g_i$ are continuous. He called them Cournot maps, after the economist who introduced them in the 19th century. They are a significant class of functions in mathematical economics. He was able to extend results of Grinc to Cournot maps under suitable hypotheses.

Another quite different way to generalize results to two dimensions is to exploit the fact that, just as the points of the line can be viewed as the real numbers, the points of the plane represent the complex numbers, so a map defined on the plane can be viewed as a complex function. Dan Eustice in [13] proved that commuting maps of the 2-disc have a common fixed point if they are holomorphic, that is, complex differentiable, on the interior.
Eustice’s theorem was extended to holomorphic maps of all cartesian products of discs by Roberto Tauraso in [29].

There is a generalization that concerns maps on the unit interval, but now with respect to periodic points as a conclusion rather than a hypothesis. Must the sets of
all the periodic points of commuting maps have a nonempty intersection? As noted above, Schwartz proved that the answer is “yes” if one of the maps is differentiable, but Aliasghar Alikhani-Koopaei conjectured that it’s “no” if only continuity is assumed [1]. Jose Canovas and Antonio Linero proved in [8] that the answer is “yes” provided the set of periodic points of one of the maps is a closed subset of $I$, and in fact then a fixed point of that map is a periodic point of the other one. However, it appears that Alikhani-Koopaei’s conjecture is still unsettled.

Recall that the Baxter permutations arose as part of the investigation of our question and were the key to its negative answer. Of the $N!$ permutations of the numbers 1 to $N$, how many of them are Baxter permutations? In what the reviewer in [19] described as “a gem of enumeration,” Fan Chung, Ronald Graham, Verner Hoggatt, and Mark Kleiman obtained the desired formula in terms of binomial coefficients in [10]. But that was by no means the end of the interest in these combinatorial objects, which, as their extensive literature demonstrates, have close connections with many other topics in that area of mathematical research. For example, in [18], Olivier Guibert and Svante Linusson related the Baxter permutations to the Catalan numbers that were defined in the 19th century and in 2015, Benjamin Caffrey and his coauthors [7] introduced “snow leopard permutations” as a useful type of Baxter permutation.

I have assumed that readers of the Monthly have a greater attention span for a discussion of mathematical research than my hypothetical cocktail party companion and, if you are still reading, you have demonstrated it. By now I hope that I have convinced you: a good question just won’t go away.

ACKNOWLEDGMENTS. I thank the reviewers for their helpful comments, in particular for introducing me to some of the unsolved problems that are related to the question of this article.
REFERENCES
[1] Alikhani-Koopaei, A. (2003). On common fixed points, periodic points and recurrent points of continuous functions. Int. J. Math. Math. Sci. 2003(39): 2465–2473. doi.org/10.1155/S0161171203205366
[2] Baxter, G. (1964). On fixed points of the composite of commuting functions. Proc. Amer. Math. Soc. 15(6): 851–855. doi.org/10.1090/S0002-9939-1964-0184217-8
[3] Block, H., Thielman, H. (1951). Commuting polynomials. Quart. J. Math. 2(1): 241–243. doi.org/10.1093/qmath/2.1.241
[4] Boronski, J. (2019). A note on fixed points of abelian actions in dimension one. Proc. Amer. Math. Soc. 147(4): 1653–1655. doi.org/10.1090/proc/14365
[5] Boyce, W. (1969). Commuting functions with no common fixed points. Trans. Amer. Math. Soc. 137: 77–92. doi.org/10.1090/S0002-9947-1969-0236331-5
[6] Boyce, W. (1971). Γ-compact maps on an interval and fixed points. Trans. Amer. Math. Soc. 160: 87–102. doi.org/10.2307/1995792
[7] Caffrey, B., Egge, E., Michel, G., Rubin, K., Ver Steegh, J. (2015). Domino tilings of Aztec diamonds, Baxter permutations, and snow leopard permutations. Involve 8(5): 833–858. doi.org/10.2140/involve.2015.8.833
[8] Cánovas, J., Linero, A. (2005). On the dynamics of compositions of commuting interval maps. J. Math. Anal. Appl. 305(1): 296–303. doi.org/10.1016/j.jmaa.2004.11.045
[9] Chu, S., Moyer, R. (1966). On continuous functions, commuting functions and fixed points. Fund. Math. 59: 91–95. doi.org/10.4064/fm-59-1-91-95
[10] Chung, F., Graham, R., Hoggatt, V., Kleiman, M. (1978). The number of Baxter permutations. J. Combin. Theory. 24(3): 382–394. doi.org/10.1016/0097-3165(78)90068-7
[11] Cohen, H. (1964). On fixed points of commuting functions. Proc. Amer. Math. Soc. 15(2): 293–296. doi.org/10.1090/S0002-9939-1964-0184219-1
[12] DeMarr, R. (1963). A common fixed point theorem for commuting mappings. Amer. Math. Monthly. 70(5): 535–537. doi.org/10.2307/2312067
[13] Eustice, D. (1972). Holomorphic idempotents and common fixed points on the 2-disk. Michigan Math. J. 19(4): 347–352. doi.org/10.1307/mmj/1029000945
[14] Folkman, J. (1966). On functions that commute with full functions. Proc. Amer. Math. Soc. 17(2): 383–386. doi.org/10.1090/S0002-9939-1966-0190916-6
[15] Gray, W., Smith, C. (1975). Common fixed points of commuting mappings. Proc. Amer. Math. Soc. 53(1): 223–226. doi.org/10.1090/S0002-9939-1975-0377843-1
[16] Grinc, M. (1999). On common fixed points of commuting triangular maps. Bull. Pol. Acad. Sci. Math. 47(1): 61–67.
[17] Grinc, M., Snoha, L. (2000). Jungck theorem for triangular maps and related results. Appl. Gen. Topol. 1(1): 83–92. doi.org/10.4995/agt.2000.3025
[18] Guibert, O., Linusson, S. (2000). Doubly alternating Baxter permutations are Catalan. Discrete Math. 217(1–3): 157–166. doi.org/10.1016/S0012-365X(99)00261-7
[19] Harper, L. (1978). Review of [10]. Math. Rev. MR0491652 (82b:05011).
[20] Huneke, J. (1969). On common fixed points of commuting continuous functions on an interval. Trans. Amer. Math. Soc. 139: 371–381. doi.org/10.1090/S0002-9947-1969-0237724-2
[21] Joichi, J. (1966). On functions that commute with full functions and common fixed points. Nieuw Arch. Wisk. 14: 247–251.
[22] Jungck, G. (1966). Commuting mappings and common fixed points. Amer. Math. Monthly. 73(7): 735–738. doi.org/10.2307/2313982
[23] Jungck, G. (1992). Common fixed points for compatible maps on the unit interval. Proc. Amer. Math. Soc. 115(2): 495–499. doi.org/10.1090/S0002-9939-1992-1105040-0
[24] Linero, A. (2002/3). Common fixed points for commuting Cournot maps. Real Anal. Exchange 28(1): 121–143. doi.org/10.14321/realanalexch.28.1.0121
[25] Maxfield, J., Mourant, W. (1965). Common fixed points of commuting continuous functions on the unit interval. Indag. Math. 27: 668–670.
[26] McDowell, E. (2009). Coincidence values of commuting functions. Topology Proc. 34: 365–384.
[27] Ritt, J. F. (1923). Permutable rational functions. Trans. Amer. Math. Soc. 25(3): 399–448. doi.org/10.1090/S0002-9947-1923-1501252-3
[28] Schwartz, A. (1965). Common periodic points of commuting functions. Michigan Math. J. 12(3): 353–355. doi.org/10.1307/mmj/1028999371
[29] Tauraso, R. (1998). Common fixed points of commuting holomorphic maps of the polydisc which are expanding on the torus. Adv. Math. 138(1): 92–104. doi.org/10.1006/aima.1998.1742

ROBERT F. BROWN received his Ph.D., under the direction of Edward Fadell, from the University of Wisconsin in 1963.
Department of Mathematics, University of California, Los Angeles, CA 90095
[email protected]
Perfect Metric Spaces of Increasingly Large Cardinality

It is well known that every nonempty perfect subset of reals is uncountable. (Recall that a subset of a topological space is said to be perfect if it is closed and has no isolated points.) The same is true in complete metric spaces (see [2, p. 41] and [1, p. 93]). We provide a collection of perfect metric spaces having increasingly large cardinalities.

Theorem 1. For every perfect metric space of cardinality $\alpha$, there exists a perfect metric space of cardinality $2^\alpha$.

Proof. Let $(X, d)$ be any perfect metric space with cardinality $\alpha$. Then $d_1 := \frac{d}{1+d}$ is an equivalent bounded metric on $X$ and hence $(X, d_1)$ is also perfect. Fix any $x_0 \in X$. Let $Y$ denote the family of all functions from $X \setminus \{x_0\}$ into $\{0, 1\}$. Being a perfect metric space, $X$ is infinite. Hence $Y$ has cardinality $2^\alpha$. For all $f, g \in Y$, define
\[ D(f, g) := \begin{cases} 0, & \text{if } f = g; \\ \sup\{ d_1(x, x_0) : x \in X \setminus \{x_0\},\ f(x) \ne g(x) \}, & \text{if } f \ne g. \end{cases} \]
It can be verified that $D$ is a metric on $Y$. We claim that $(Y, D)$ is a perfect metric space. Let $f \in Y$ and $\epsilon \in (0, 1)$. Define $g \in Y$ as follows:
\[ g(x) := \begin{cases} 1 - f(x), & \text{if } d_1(x, x_0) < \epsilon/2; \\ f(x), & \text{if } d_1(x, x_0) \ge \epsilon/2. \end{cases} \]
Since $X$ is perfect, there exists some $x_1 \in X \setminus \{x_0\}$ such that $d_1(x_1, x_0) < \epsilon/2$. Hence $g(x_1) \ne f(x_1)$. Thus $0 < D(g, f) \le \epsilon/2 < \epsilon$. Therefore $f$ is a limit point of $Y$. Hence $(Y, D)$ is a perfect metric space.

Starting with $X = [0, 1]$ under the usual metric, the above result provides a sequence of perfect metric spaces having cardinalities $\mathfrak{c}, 2^{\mathfrak{c}}, 2^{2^{\mathfrak{c}}}, \ldots$. Let $\Lambda$ be the collection of cardinal numbers $\alpha$ such that there exists a perfect metric space with cardinality $\alpha$. Then $\Lambda$ contains each of the cardinal numbers $\mathfrak{c}, 2^{\mathfrak{c}}, 2^{2^{\mathfrak{c}}}, \ldots$. It is also easy to see that $\Lambda$ contains all $\alpha$ such that $\aleph_0 \le \alpha \le \mathfrak{c}$. The question remains open whether $\Lambda$ contains any other cardinal number, apart from these.

REFERENCES
[1] Pugh, C. C. (2002). Real Mathematical Analysis. New York: Springer-Verlag.
[2] Rudin, W. (1976). Principles of Mathematical Analysis, 3rd ed. Singapore: McGraw Hill.
—Submitted by Surinder Pal Singh Kainth, Panjab University, India doi.org/10.1080/00029890.2021.1834326 MSC: Primary 54A99
NOTES Edited by Vadim Ponomarenko
A Probabilistic Proof of the Spherical Excess Formula
Daniel A. Klain

Abstract. A probabilistic proof of Girard’s angle excess formula for the area of a spherical triangle emerges from the observation that an unbounded 3-dimensional convex cone, with single vertex at the origin, has only three kinds of 2-dimensional orthogonal projections: a 2-dimensional convex cone with one vertex, a 2-dimensional half-plane (an outcome with probability zero), and a 2-dimensional plane.

A triangle $T$ in the unit sphere with inner angles $\theta_1$, $\theta_2$, and $\theta_3$ has area given by the spherical excess formula¹:
\[ \mathrm{Area}(T) = \theta_1 + \theta_2 + \theta_3 - \pi. \tag{1} \]
See Figure 1.
Figure 1. A spherical triangle.
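Formula (1) can be sanity-checked numerically before reading the proof. The Python sketch below is an illustration, not part of the note: it computes the inner angles of a spherical triangle from its vertices and applies (1), then compares the result with the Van Oosterom and Strackee solid-angle formula, which is brought in here as an independent check. The helper names and the sample triangles are assumptions made for the example.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a):
    length = math.sqrt(dot(a, a))
    return tuple(x / length for x in a)


def inner_angle(a, b, c):
    """Inner angle at vertex a of the spherical triangle abc (unit vectors)."""
    n_ab, n_ac = cross(a, b), cross(a, c)
    cos_theta = dot(n_ab, n_ac) / math.sqrt(dot(n_ab, n_ab) * dot(n_ac, n_ac))
    return math.acos(max(-1.0, min(1.0, cos_theta)))


def area_by_excess(v1, v2, v3):
    """Area from formula (1): the sum of the inner angles minus pi."""
    angles = (inner_angle(v1, v2, v3), inner_angle(v2, v3, v1), inner_angle(v3, v1, v2))
    return sum(angles) - math.pi


def area_by_solid_angle(v1, v2, v3):
    """Independent check: tan(area / 2) = |v1 . (v2 x v3)| / (1 + v1.v2 + v2.v3 + v3.v1)."""
    numerator = abs(dot(v1, cross(v2, v3)))
    denominator = 1.0 + dot(v1, v2) + dot(v2, v3) + dot(v3, v1)
    return 2.0 * math.atan2(numerator, denominator)


# One octant of the sphere: three right angles, one eighth of the total area 4*pi.
octant = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
# A less symmetric triangle, chosen arbitrarily for the example.
skew = tuple(normalize(v) for v in ((1.0, 0.2, 0.1), (0.1, 1.0, 0.3), (0.2, 0.1, 1.0)))

for name, tri in (("octant", octant), ("skew", skew)):
    print(name, area_by_excess(*tri), area_by_solid_angle(*tri))
```

The octant is a convenient test case because its area is known in advance: one eighth of the sphere, or $\pi/2$.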
This note offers a probabilistic proof of the angle excess formula (1), based on the observation that an unbounded cone at the origin in $\mathbb{R}^3$ has only three kinds of 2-dimensional orthogonal projections: a cone in $\mathbb{R}^2$, a half-plane in $\mathbb{R}^2$ (an outcome with probability zero), and all of $\mathbb{R}^2$. See Figure 2. Observe that, if we omit the middle outcome of measure zero, the number of edges on each projected figure is twice the number of vertices.

Some notation will help to interpret angles as probabilities. Let $S$ denote the unit sphere in $\mathbb{R}^3$ centered at the origin, having surface area $4\pi$.

¹ This formula was discovered in 1603 by Thomas Harriot [6, p. 65] and is also known as Girard’s formula [2, p. 95].
doi.org/10.1080/00029890.2021.1839303 MSC: Primary 52A15

Figure 2. Projections of a 3-dimensional cone: a planar cone (1 vertex, 2 edges), a half-plane (measure zero outcome), and the whole plane (0 vertices, 0 edges).
Suppose that $P$ is a convex polytope in $\mathbb{R}^3$, and let $x$ be any point of $P$. The solid inner angle $a_P(x)$ of $P$ at $x$ is given by
\[ a_P(x) = \{ u \in S \mid x + \epsilon u \in P \text{ for some } \epsilon > 0 \}. \]
Let $\alpha_P(x)$ denote the measure of the solid inner angle $a_P(x) \subseteq S$, given by the usual surface area measure on subsets of the sphere. If $F$ is a vertex, edge, or facet of a convex polytope $P$, then the solid inner angle measure $\alpha_P(x)$ is the same at every point $x$ in the relative interior of $F$. This value will be denoted by $\alpha_P(F)$.

Consider the case of an unbounded cone $C$ with single vertex at the origin $o$, as in Figure 2. Specifically, let $v_1, v_2, v_3$ be three linearly independent unit vectors in $\mathbb{R}^3$, and let $C$ denote all nonnegative linear combinations:
\[ C = \{ t_1 v_1 + t_2 v_2 + t_3 v_3 \mid t_i \ge 0 \}. \]
The polyhedral cone $C$ has exactly one vertex at $o$ and three (unbounded) edges $e_i$ along the directions of the vectors $v_i$. Note that $\alpha_C(o)$ is the area of the spherical triangle with vertices at $v_i$. Denote the spherical angles of this triangle by $\theta_i$, as in Figure 1 (where $o$ lies at the center of the sphere in Figure 1).

Given a uniformly distributed random unit vector $u$, let $C_u$ denote the orthogonal projection of $C$ onto the plane $u^\perp$. Evidently $C_u$ will resemble one of the outcomes in Figure 2. Specifically, $C_u$ will cover the entire plane $u^\perp$ if and only if $u$ lies in the interior of $\pm a_C(o)$. It follows that $C_u = u^\perp$ with probability
\[ \frac{\mathrm{Area}(a_C(o)) + \mathrm{Area}(-a_C(o))}{4\pi} = \frac{2\alpha_C(o)}{4\pi} = \frac{\alpha_C(o)}{2\pi}. \]
Since the number of vertices of $C_u$ is either 0 or 1, the expected number of vertices of $C_u$ is given by the complementary probability
\[ E(\#\text{ of vertices}) = 1 - \frac{\alpha_C(o)}{2\pi}. \tag{2} \]
Meanwhile, an edge $e$ projects to the interior of $C_u$ if and only if $u$ lies in the interior of $\pm a_C(e)$. Taking the complement as before, $e$ projects to a boundary edge of $C_u$ with probability $1 - \frac{\alpha_C(e)}{2\pi}$. Observe that each solid inner angle measure $\alpha_C(e_i)$ is given by $2\theta_i$ (see Figure 3), so that the expected number of edges of $C_u$ is
\[ E(\#\text{ of edges}) = \sum_i \Bigl(1 - \frac{\alpha_C(e_i)}{2\pi}\Bigr) = \sum_i \Bigl(1 - \frac{\theta_i}{\pi}\Bigr) = 3 - \frac{\theta_1 + \theta_2 + \theta_3}{\pi}. \tag{3} \]

Figure 3. $\alpha_C(e) = 2\theta$.

Since the number of edges in $C_u$ is almost surely twice the number of vertices (see Figure 2), the identities (2) and (3) imply that
\[ 3 - \frac{\theta_1 + \theta_2 + \theta_3}{\pi} = E(\#\text{ of edges}) = 2E(\#\text{ of vertices}) = 2 - \frac{\alpha_C(o)}{\pi}. \tag{4} \]
It is now immediate from (4) that $\alpha_C(o) = \theta_1 + \theta_2 + \theta_3 - \pi$, as asserted in (1).

In higher dimensions a proliferation of cases makes this viewpoint much more complicated. However, variations of this approach are applied in [1], [3], [4, p. 315a], [5], and [7] to derive many fundamental formulas for intrinsic volumes of polyhedral angles and cones.

REFERENCES
[1] Amelunxen, D., Lotz, M. (2017). Intrinsic volumes of polyhedral cones: a combinatorial perspective. Disc. Comput. Geom. 58(2): 371–409.
[2] Coxeter, H. S. M. (1969). Introduction to Geometry, 2nd ed. New York: Wiley.
[3] Feldman, D. V., Klain, D. (2009). Angles as probabilities. Amer. Math. Monthly. 116(8): 732–735.
[4] Grünbaum, B. (2003). Convex Polytopes, 2nd ed. New York: Springer.
[5] Perles, M. A., Shephard, G. C. (1967). Angle sums of convex polytopes. Math. Scand. 21: 199–218.
[6] Stillwell, J. (1992). Geometry of Surfaces. New York: Springer.
[7] Welzl, E. (1994). Gram’s equation—a probabilistic proof. In: Karhumäki, J., Maurer, H., Rozenberg, G., eds. Results and Trends in Theoretical Computer Science (Graz, 1994). Lecture Notes in Computer Science, Vol. 812. Berlin: Springer, pp. 422–424.

Mathematical Sciences, University of Massachusetts Lowell, Lowell, MA 01854, USA
Daniel [email protected]
Foster, Turán, and Neighbors
Xiaomin Chen, Fenglin Huang, Shuhan Zhou, Mingxuan Zou, and Junchi Zuo

Abstract. In this note, we prove a graph inequality based on the sizes of the common neighborhoods. We also characterize the extremal graphs that achieve equality. The result was first discovered as a consequence of Foster’s classical theorem about electrical networks. We also present a short combinatorial proof that was inspired by a similar inequality related to Turán’s celebrated theorem.

1. INTRODUCTION. A (simple) graph $G$ is a pair $G = (V, E)$ where $V$ is the vertex set and $E \subseteq \binom{V}{2}$ is the edge set. For any pair of distinct vertices $u$ and $v$, we denote $\{u, v\}$ by $uv$. For a vertex $u$, the (open) neighborhood is defined as $N(u) = \{v \in V : uv \in E\}$. In particular, $u \notin N(u)$. For two vertices $u$ and $v$, $N(u) \cap N(v)$ is the set of their common neighbors. A graph is biconnected if deleting any vertex results in a graph that is still connected. A block of a graph is a subgraph that is biconnected and maximal with respect to this property. For other standard graph-theoretical notations not specified here, we refer to the textbook [2]. In this short note, we prove the following theorem for any connected graph.

Theorem 1. Let $G = (V, E)$ be a connected graph with $n$ vertices. Then
(a)
\[ \sum_{uv \in E} \frac{1}{|N(u) \cap N(v)| + 2} \ge \frac{n-1}{2}; \]
(b) equality in (a) holds if and only if every block of $G$ forms a clique.

We show here some easy examples. Denote by $S(G)$ the left-hand side of Theorem 1(a). For the complete graph $K_n$, $|N(u) \cap N(v)| = n - 2$ for each edge $uv$; summing over the $\binom{n}{2}$ edges, we get $S(K_n) = (n - 1)/2$. For any tree $T$ on $n$ vertices, $|N(u) \cap N(v)| = 0$ for each edge $uv$, and $S(T) = (n - 1)/2$. For the cycle $C_n$ with $n \ge 4$, it is also easy to see that $S(C_n) = n/2$.

The theorem was first discovered and proved by one of the authors using electrical networks. In Section 3, we give the original reasoning and a generalization. While the implication from the classical results in electrical networks is easy, we find the theorem, in its graph-theoretical form in terms of common degrees, interesting. In Section 2, we give a simple combinatorial proof, which was inspired by one of the classical proofs of Turán’s theorem.

2. GRAPH-THEORETICAL PROOFS. In one of the well-known proofs (see [1]) of Turán’s theorem [6], we have the inequality (see [3] and [7])
\[ \sum_{v \in V(G)} \frac{1}{|N(v)| + 1} \le \alpha(G), \tag{1} \]
doi.org/10.1080/00029890.2021.1839280 MSC: Primary 05C35, Secondary 05C09
where $\alpha(G)$ is the independence number of the graph $G$. It was the similarity of the forms of this inequality and that in Theorem 1 that inspired us to prove the theorem in a similar way. Consider a uniformly random permutation of the vertices; the left-hand side of (1) can be seen as the expected number of vertices that come before all their neighbors in the ordering. Because the set of such vertices in any ordering is an independent set, (1) follows. Similarly, we explain the left-hand side of Theorem 1(a) as the expected number of edges where one of the two endpoints comes before the other endpoint and all their common neighbors in a random ordering.

Definition. For a graph $G = (V, E)$ and any ordering of the vertices, that is, an injection $\pi : V \to \mathbb{N}$, call a pair $(u, v) \in V^2$ good with respect to $\pi$ if $uv \in E$, $\pi(u) < \pi(v)$, and $\pi(u) < \pi(w)$ for any $w \in N(u) \cap N(v)$. Define the graph $G^{(\pi)} = (V, E^{(\pi)})$ to be the subgraph whose edge set is formed by good pairs, that is, $E^{(\pi)} = \{\{u, v\} : (u, v) \text{ is good with respect to } \pi\}$.

Proof of Theorem 1. Pick an ordering $\pi : V \to [n]$ uniformly at random, define the random variable $\chi_{u,v}$ to be the indicator of the event that $(u, v)$ is good, and let $X = \sum_{(u,v),\, uv \in E} \chi_{u,v}$ be the number of good pairs. The pair $(u, v)$ is good exactly when $u$ comes first among the $|N(u) \cap N(v)| + 2$ vertices consisting of $u$, $v$, and their common neighbors, so for any edge $uv$,
\[ E\chi_{u,v} = \Pr(\chi_{u,v} = 1) = \frac{1}{|N(u) \cap N(v)| + 2}, \]
so
\[ EX = \sum_{(u,v) \in V^2,\, uv \in E} \frac{1}{|N(u) \cap N(v)| + 2}. \]
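The expectation just computed can be confirmed exactly on small graphs by enumerating every ordering. The Python sketch below is an illustration with names of our own choosing (`lhs_sum`, `expected_good_pairs`, and the sample graphs are assumptions, not notation from the note): it evaluates $S(G)$ with exact fractions, averages the number of good pairs over all orderings, and compares the average with $2S(G)$.

```python
from fractions import Fraction
from itertools import permutations


def make_graph(n, edges):
    """Adjacency sets of a simple graph on the vertices 0, ..., n-1."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def lhs_sum(adj):
    """S(G): the sum over edges uv of 1 / (|N(u) & N(v)| + 2)."""
    return sum(
        (Fraction(1, len(adj[u] & adj[v]) + 2)
         for u in adj for v in adj[u] if u < v),
        Fraction(0),
    )


def count_good_pairs(adj, pos):
    """Number of ordered pairs (u, v) that are good for the ordering pos."""
    return sum(
        1
        for u in adj for v in adj[u]
        if pos[u] < pos[v] and all(pos[u] < pos[w] for w in adj[u] & adj[v])
    )


def expected_good_pairs(adj):
    """Exact expectation of the number of good pairs over all orderings."""
    orders = list(permutations(adj))
    total = sum(
        count_good_pairs(adj, {v: i for i, v in enumerate(order)})
        for order in orders
    )
    return Fraction(total, len(orders))


complete4 = make_graph(4, [(u, v) for u in range(4) for v in range(u + 1, 4)])
path4 = make_graph(4, [(0, 1), (1, 2), (2, 3)])
cycle5 = make_graph(5, [(i, (i + 1) % 5) for i in range(5)])
# K_4 minus one edge: biconnected but not a clique, so (a) should be strict.
diamond = make_graph(4, [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)])

for name, graph in [("K4", complete4), ("P4", path4), ("C5", cycle5), ("diamond", diamond)]:
    s = lhs_sum(graph)
    print(name, "S(G) =", s, "bound =", Fraction(len(graph) - 1, 2),
          "E[X] == 2 S(G):", expected_good_pairs(graph) == 2 * s)
```

Exact rational arithmetic is used so that the equality cases of Theorem 1(b) can be tested without rounding error.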
Note that each edge $uv$ contributes two terms in the sum. Thus, Theorem 1 follows directly from the next lemma.

Lemma 1. Let $G$ be a connected graph with $n$ vertices.
(a) $G^{(\pi)}$ is connected for any injection $\pi : V \to \mathbb{N}$.
(b) $G^{(\pi)}$ is a tree for every injection $\pi : V \to \mathbb{N}$ if and only if every block of $G$ forms a clique.

We provide three short proofs of part (a) of Lemma 1, then a proof of part (b). The first proof uses induction.

Proof 1 of part (a). We use induction on $n$. The base case $n = 1$ is obvious. Now suppose $G$ is a connected graph with $n > 1$ vertices, $\pi$ is an ordering of the vertices, and $v$ is the last vertex in the ordering, that is, $\pi(v) = \max\{\pi(u) : u \in V\}$. Let $G_1, G_2, \ldots, G_k$ ($k \ge 1$) be the connected components of the graph $G - v$, denote by $V_i$ the vertex set of $G_i$, and denote by $\pi_i$ the restriction of $\pi$ to $V_i$.

By the inductive hypothesis, $G_i^{(\pi_i)}$ is connected for every $1 \le i \le k$. It is easy to check that $G_i^{(\pi_i)}$ is a subgraph of $G^{(\pi)}$: let $(a, b)$ be a good pair in $G_i$ with respect to $\pi_i$. We have $ab \in E(G_i) \subseteq E(G)$, $\pi(a) = \pi_i(a) < \pi_i(b) = \pi(b)$, and any vertex $c$ in $N_G(a) \cap N_G(b)$ is either $v$ or in the same connected component with $a, b$ in $G - v$, i.e., $c \in V_i$. In the former case $\pi(a) < \pi(c) = \pi(v)$; in the latter case $\pi(a) < \pi(c)$ since $c$ is also a common neighbor of $a$ and $b$ in $G_i$, and $(a, b)$ is good in $G_i$ with respect to $\pi_i$.

Let $a_i$ be the vertex in $N_G(v) \cap V_i$ with the smallest $\pi$-value. We claim that $(a_i, v)$ is also a good pair. Indeed, $a_i \in N_G(v)$ implies $a_i v \in E$; $\pi(a_i) < \pi(v)$ because $v$ has the maximum $\pi$-value; and any common neighbor $b$ of $a_i$ and $v$ is in $V_i$ and in $N_G(v)$, so by the definition of $a_i$, we have $\pi(a_i) < \pi(b)$.

Thus, $G^{(\pi)}$ contains all the edges of $G_i^{(\pi_i)}$ that connect all the vertices in $V_i$, and all the edges $v a_i$ that connect the $V_i$’s to $v$. It is therefore a connected graph.

The second proof shows that the edges of a “minimum coded tree” provide good pairs.

Proof 2 of part (a). For any spanning tree $T$ of $G$, define the weight
\[ w(T) = \sum_{uv \in E(T)} \min(\pi(u), \pi(v)). \]
Suppose T∗ is a spanning tree with minimum weight. We claim that any pair (u, v) with π(u) < π(v) and uv ∈ E(T∗) is good with respect to π. Assume the contrary, that there is a ∈ N(u) ∩ N(v) such that π(a) < π(u). Then T∗ − uv has two connected components Cu and Cv, where u is in Cu and v is in Cv. When a is in Cu (respectively, Cv), T′ = T∗ − uv + av (respectively, T′ = T∗ − uv + au) is a spanning tree of G, but, in both cases, w(T′) = w(T∗) − π(u) + π(a) < w(T∗), which is a contradiction.

In the last proof, we provide an algorithm that finds all the good pairs.

Proof 3 of part (a). We maintain a graph H and set H = G to begin with. For each edge uv ∈ E, define σ(uv) = min(π(u), π(v)). We process the edges in order of decreasing σ value, breaking ties arbitrarily: for each edge uv, if there is a triangle uva in H with π(a) < π(u) and π(a) < π(v), we delete uv from H.
Figure 1. Left: An edge uv is deleted because in the triangle auv, a comes before u and v in π . Right: The final graph G(π ) after we delete two more edges.
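The deletion procedure of Proof 3 can be sketched in a few lines of Python. This is only an illustrative check, not part of the authors' argument: the function names and the six-edge example graph are assumptions made here. The sketch compares the deletion algorithm with the definition of a good pair over every ordering of a small graph, confirms that G(π) is connected each time, and compares the average number of good pairs with the sum of 2/(|N(u) ∩ N(v)| + 2) over the edges.

```python
from fractions import Fraction
from itertools import permutations


def build_adjacency(edges):
    """Adjacency sets of a simple undirected graph given as a list of pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def good_edges_by_definition(adj, order):
    """Edges uv whose earlier endpoint precedes every common neighbor."""
    good = set()
    for u in adj:
        for v in adj[u]:
            if order[u] < order[v] and all(order[u] < order[w] for w in adj[u] & adj[v]):
                good.add(frozenset((u, v)))
    return good


def good_edges_by_deletion(adj, order):
    """Proof 3: scan edges by decreasing sigma and delete uv when the current
    graph H still holds a triangle uva with a earlier than both u and v."""
    h = {u: set(nbrs) for u, nbrs in adj.items()}
    all_edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    for edge in sorted(all_edges, key=lambda e: min(order[x] for x in e), reverse=True):
        u, v = tuple(edge)
        if any(order[a] < min(order[u], order[v]) for a in h[u] & h[v]):
            h[u].discard(v)
            h[v].discard(u)
    return {frozenset((u, v)) for u in h for v in h[u]}


def is_connected(vertices, edges):
    """Depth-first search on the graph (vertices, edges)."""
    adj = {x: set() for x in vertices}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    start = vertices[0]
    seen, stack = {start}, [start]
    while stack:
        for y in adj[stack.pop()]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == len(vertices)


def check_small_graph(edges):
    """Average number of good pairs over all orderings, and the predicted value."""
    adj = build_adjacency(edges)
    vertices = sorted(adj)
    total_good, count = 0, 0
    for perm in permutations(range(len(vertices))):
        order = dict(zip(vertices, perm))
        good = good_edges_by_definition(adj, order)
        assert good == good_edges_by_deletion(adj, order)
        assert is_connected(vertices, good)
        total_good += len(good)
        count += 1
    expected = sum(Fraction(2, len(adj[u] & adj[v]) + 2) for u, v in edges)
    return Fraction(total_good, count), expected


# Hypothetical example: two triangles sharing the edge 12, plus a pendant edge 34.
EXAMPLE_EDGES = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
average, expected = check_small_graph(EXAMPLE_EDGES)
```

Exact rational arithmetic is used so that the average can be compared with the predicted sum without rounding.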
On the one hand, H is always a subgraph of G. For any pair (u, v) where uv gets deleted, the triangle auv certifies that (u, v) is not a good pair. On the other hand, suppose (u, v) is not a good pair, so that there is a triangle auv in G with π(a) < π(u) and π(a) < π(v). When the algorithm considers uv, by our ordering, neither au nor av has yet been deleted from H, so uv will be deleted.
Hence, the edge set of the final graph H is indeed the edge set of G(π). Note that we only ever delete an edge of a triangle in H, so H is always connected.

Now we prove the second part of Lemma 1.

Proof of Lemma 1, part (b). Sufficiency: Let G be a graph whose every block is a clique and let π be an ordering of its vertices. Assume to the contrary that G(π) contains a cycle C whose vertices, sorted according to π, are π(v1) < π(v2) < · · · < π(vk). The vertex v2 must have two edges on C, so there is some i > 2 such that (v2, vi) is a good pair. However, the cycle C certifies that v1, v2, vi are in the same block, which is a clique, so v1 ∈ N(v2) ∩ N(vi); thus (v2, vi) is not a good pair. This is a contradiction.

Necessity: Suppose G has a block H that is not a clique, so there are two points in H that are distance 2 apart. Therefore there are a, b, c ∈ V(H) such that ac, bc ∈ E and ab ∉ E. Since H is a block, there are paths between a and b that do not go through c. Let P = (b, v1, v2, . . . , vk, a) be such a path with smallest k. We have k ≥ 1. Now let π be a permutation of the vertices where π(a) < π(b) < π(c) < π(v1) < π(v2) < · · · < π(vk) and π(vk) < π(u) for any u ∉ {a, b, c, v1, . . . , vk}. It is clear that (a, c) and (a, vk) are good pairs. (b, c) and (b, v1) are also good pairs because the only vertex appearing before b is a and it is not adjacent to b. Next we prove that (vi, vi+1) is a good pair for any 1 ≤ i < k; otherwise, there is an x such that π(x) < π(vi) and x vi vi+1 is a triangle. However, by the minimality of P, x ≠ a since avi ∉ E; x ≠ b since bvi+1 ∉ E; x ≠ vj for any j < i since vj vi+1 ∉ E. Hence all the edges on P are in G(π), and ac, bc are also in G(π); they form a cycle.

3. PROOF VIA ELECTRICAL NETWORKS. Given a connected graph G = (V, E), we construct an electrical network NG on the set of vertices, and connect each pair of adjacent vertices by a resistance of magnitude 1.
The resistance distance Ru,v between two vertices is the effective resistance between u and v in NG. It is well known that this is a metric on V. Theorem 1 was first discovered by Mingxuan while pondering the following beautiful fact in electrical networks, known as Foster’s first theorem (see [4, 5, 8]).

Theorem 2 ([4]). For a connected graph G = (V, E), denote by Ru,v the effective resistance between two vertices u and v in NG. Then the sum of effective resistances over the edges is
\[
\sum_{uv \in E} R_{u,v} = n - 1.
\]
Theorem 1 is a consequence of Foster’s theorem.

Proof of Theorem 1. (a) For any edge uv ∈ E, consider the network modified from NG by keeping the resistances on edges {uv} ∪ {uw, vw : w ∈ N(u) ∩ N(v)},
and deleting all the other edges (or, equivalently, increasing all the other resistances to +∞). The subgraph considered here is K2,d, where d is the number of common neighbors of u and v, with an edge added between u and v. The resulting network is equivalent to |N(u) ∩ N(v)| + 1 parallel edges, one having resistance 1 and the others having resistance 2 each. By Rayleigh’s monotonicity law, the effective resistance Ru,v is bounded above as
\[
R_{u,v} \le \frac{1}{\frac{1}{2}|N(u) \cap N(v)| + 1}. \tag{2}
\]
So, combined with Theorem 2,
\[
\sum_{uv \in E} \frac{1}{|N(u) \cap N(v)| + 2}
= \frac{1}{2} \sum_{uv \in E} \frac{1}{\frac{1}{2}|N(u) \cap N(v)| + 1}
\ge \frac{1}{2} \sum_{uv \in E} R_{u,v}
= \frac{n-1}{2}.
\]
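Foster’s theorem and the bound (2) can be checked on a small example with exact arithmetic. The sketch below is an illustration under assumptions made here, not the authors' computation: the function names and the example graph are hypothetical, and the effective resistance is obtained by grounding one endpoint, injecting a unit current at the other, and solving the reduced Laplacian system by Gauss-Jordan elimination.

```python
from fractions import Fraction


def effective_resistance(n, edges, u, v):
    """Effective resistance between u and v when every edge is a unit resistor.

    Vertex v is grounded, one unit of current enters at u, and the potential
    at u (which equals the resistance) is found by exact elimination."""
    keep = [x for x in range(n) if x != v]
    index = {x: i for i, x in enumerate(keep)}
    m = len(keep)
    # Augmented matrix [reduced Laplacian | unit current at u].
    rows = [[Fraction(0)] * (m + 1) for _ in range(m)]
    for a, b in edges:
        for x, y in ((a, b), (b, a)):
            if x != v:
                rows[index[x]][index[x]] += 1
                if y != v:
                    rows[index[x]][index[y]] -= 1
    rows[index[u]][m] = Fraction(1)
    for col in range(m):
        pivot = next(r for r in range(col, m) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        scale = rows[col][col]
        rows[col] = [entry / scale for entry in rows[col]]
        for r in range(m):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return rows[index[u]][m]


def common_neighbor_count(edges, u, v):
    """Number of common neighbors of u and v."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    return len(nbrs[u] & nbrs[v])


# Hypothetical example graph on vertices 0..4 (connected, 6 edges).
N = 5
EDGES = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
resistances = {(u, v): effective_resistance(N, EDGES, u, v) for u, v in EDGES}
foster_total = sum(resistances.values())
bounds_hold = all(
    resistances[(u, v)] <= Fraction(2, common_neighbor_count(EDGES, u, v) + 2)
    for u, v in EDGES
)
```

The reduced Laplacian of a connected graph is nonsingular, so a nonzero pivot always exists in the elimination.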
(b) It is easy to check the “if” direction. To check the “only if” direction, we assume that some block C of G is not a clique, and prove that for at least one pair of adjacent vertices equality in (2) does not hold. Since C is connected and not complete, it is well known that C has an induced P3, i.e., there are u, v, z ∈ V(C) such that uv, uz ∈ E and vz ∉ E. Since C is biconnected, C − u is connected, so we may pick a shortest path P from z to {v} ∪ (N(u) ∩ N(v)) in C − u. Now, in NG, only keep the edges {uv} ∪ {uw, vw : w ∈ N(u) ∩ N(v)} ∪ P. It is then easy to see that Ru,v
0. Then
\[
\mu(B) \le \mu(X) = \sum_{i=1}^{\infty} \frac{1}{2^i} = 1.
\]
Furthermore, by the density of A, there exists xk ∈ A such that xk ∈ B(x0, R). Therefore,
\[
\mu(B(x_0, R)) \ge \frac{1}{2^k}\, \delta_{x_k}(B(x_0, R)) = \frac{1}{2^k} > 0.
\]
1 A Borel measure is a measure that is defined on the Borel σ-algebra.
doi.org/10.1080/00029890.2021.1835339
MSC: Primary 30L99, Secondary 28E99; 28A05
In this way, we have proved that the measure μ of all open balls is positive and finite. Implication (2) ⇒ (3) is obvious.

(3) ⇒ (1): Let us fix x0 ∈ X and, for k ∈ N, put Bk = B(x0, k). For n, l ∈ N such that l ≥ l0 := 1/μ(B(x0, 1)) + 1, let A(k, n, l) denote the set of families of nonoverlapping open balls contained in Bk with radius smaller than 1/n and measure bigger than 1/l. Since l ≥ l0, the set A(k, n, l) is nonempty. We claim that there exists a maximal element in A(k, n, l). Indeed, let us suppose that it is not true; then for each α ∈ A(k, n, l) there exists β ∈ A(k, n, l) such that α ⊊ β. Therefore, from the axiom of dependent choice there exists a sequence αi ∈ A(k, n, l) such that αi ⊊ αi+1. Let α be the union of all the αi, i ≥ 1; then it is easy to see that α ∈ A(k, n, l). On the other hand, α contains an infinite number of nonoverlapping balls. Therefore, since for every B ∈ α we have that μ(B) > 1/l and B ⊂ Bk, we get a contradiction with the assumption that μ(Bk) < ∞. Let αk,n,l be a maximal element of A(k, n, l). It is easy to see that αk,n,l is a finite set. Next, let S(k, n, l) denote the set of centers of balls from αk,n,l. Finally, we define the set
\[
S = \bigcup_{k=1}^{\infty} \bigcup_{n=1}^{\infty} \bigcup_{l=l_0}^{\infty} S(k, n, l).
\]
Since S is obviously countable, we need to show that S is dense in X. For this purpose, we fix x ∈ X and ε > 0. Let k̃ be such that B(x, ε) ⊂ B(x0, k̃). We shall prove that
\[
B(x, \varepsilon) \cap \bigcup_{n=1}^{\infty} \bigcup_{l=l_0}^{\infty} S(\tilde{k}, n, l) \ne \emptyset.
\]
Suppose that this is not true; let ñ be such that 1/ñ < ε/3 and l̃ such that μ(B(x, 1/(2ñ))) > 1/l̃. Since for every y ∈ S(k̃, ñ, l̃) and δ < 1/ñ we have
\[
B(y, \delta) \cap B\left(x, \frac{1}{2\tilde{n}}\right) = \emptyset,
\]
we get a contradiction with the maximality of the family α indexed by (k̃, ñ, l̃).

We close this note with some remarks.

Remark 1. Under the assumption of the axiom of choice the implication (3) ⇒ (1) has been proved previously (see, e.g., [1, Proposition 1] or [2, Theorem 4.1], where the Vitali covering type theorem was applied). In the implication (3) ⇒ (1), we assume the axiom of dependent choice (ADC), which is weaker than the axiom of choice (AC). One can ask whether ADC is equivalent to the implication (3) ⇒ (1).

Remark 2. In the implication (1) ⇒ (2), we constructed a finite-valued measure with atoms. Let us observe that if the continuum hypothesis holds, then there is no finite-valued, nonatomic measure² on any separable metric space X that has domain equal to the entire power set P(X). Indeed, since X is separable and the continuum hypothesis holds, we have |X| ≤ ω or |X| = ω1. Therefore, if |X| ≤ ω, then obviously every nontrivial measure has atoms. In the latter case, i.e., |X| = ω1, the claim follows from the Ulam theorem (see [3, Theorem 5.6]). So the natural question is what might happen without the continuum hypothesis.

2 Nonatomic means here that the measure of singletons is equal to zero.
ACKNOWLEDGMENTS. The author would like to thank the reviewers for valuable comments and suggestions. The article was made during quarantine. So, special thanks go to my wife for patience and forbearance, and to my children for being polite.
REFERENCES
[1] Björn, A., Björn, J. (2011). Nonlinear Potential Theory on Metric Spaces. EMS Tracts in Mathematics, Vol. 17. Zürich: European Mathematical Society.
[2] Gaczkowski, M., Górka, P. (2009). Harmonic functions on metric measure spaces: convergence and compactness. Potential Anal. 31(3): 203–214.
[3] Oxtoby, J. C. (1970). Measure and Category. Graduate Texts in Mathematics, Vol. 2. New York-Berlin: Springer-Verlag.

Department of Mathematics and Information Sciences, Warsaw University of Technology, Pl. Politechniki 1, 00-661 Warsaw, Poland
[email protected]
What’s Special about the Perfect Number 6?

For n ∈ N, let σ(n) be the sum of all positive divisors of n. If σ(n) = 2n, then n is a perfect number. As early as 300 BC, Euclid gave a sufficient condition for a number to be perfect: if p and 2^p − 1 are both primes, then 2^{p−1}(2^p − 1) is a perfect number. In the mid-eighteenth century, Euler showed that the sufficient condition is also necessary for even perfect numbers. Nowadays, two big questions are still open. First, are there infinitely many even perfect numbers? Second, is there an odd perfect number?

Recently, several mathematicians have characterized numbers that divide the sum of their divisors raised to a power. In particular, given k ∈ N, define σk(n) to be the sum of d^k over all positive divisors d of n. Cai et al. [1] proved that given n = 2^α p (α ≥ 1 and p is prime), we have n | σ3(n) if and only if n is an even perfect number except 28. Continuing the work, Jiang [2] showed that the result still holds if n = 2^α p^β for any β ≥ 1. Inspired by these works, we prove the following result:

Theorem. Let n be an even perfect number. Then n | σk(n) for all odd k if and only if n = 6.

Proof. We first prove the backward implication. Let k ≥ 1 be an odd integer. Write k = 2j + 1 for some j ≥ 0. Then σk(6) = 1 + 2^{2j+1} + 3^{2j+1} + 6^{2j+1}. Observe that 3^{2j+1} − 3 = 3(3^{2j} − 1) ≡ 0 mod 6 and 2^{2j+1} − 2 = 2(4^j − 1) ≡ 0 mod 6. Hence, 3^k ≡ 3 mod 6 and 2^k ≡ 2 mod 6, and so 6 | σk(6), as desired.

For the forward implication, write n = 2^{p−1}(2^p − 1), where p is a prime. We show that n ∤ σp(n); because p > 2 (so p is odd) when n ≠ 6, the result will follow. We have σp(n) = σp(2^{p−1}(2^p − 1)) = (1 + 2^p + · · · + 2^{p(p−1)})(1 + (2^p − 1)^p). Suppose, for a contradiction, that n | σp(n). Since (2^p − 1, 1 + (2^p − 1)^p) = 1, we have
\[
2^p - 1 \mid 1 + 2^p + \cdots + 2^{p(p-1)} = \sum_{i=0}^{p-1} 2^{pi}. \tag{1}
\]
Each term in the summation is 1 mod 2^p − 1 and there are p terms; thus, the summation is p mod 2^p − 1, which contradicts (1) because 0 < p < 2^p − 1. Hence, n ∤ σp(n).

REFERENCES
[1] Cai, T., Chen, D., Zhang, Y. (2015). Perfect numbers and Fibonacci primes (I). Int. J. Number Theory. 11(1): 159–169.
[2] Jiang, X. (2018). On even perfect numbers. Colloq. Math. 154(1): 131–135.
Submitted by Hùng Việt Chu, Department of Mathematics, University of Illinois at Urbana-Champaign
doi.org/10.1080/00029890.2021.1839304
MSC: Primary 11A25
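The statement of the theorem on the perfect number 6 is easy to test for the first few even perfect numbers. The following sketch is an illustration only: the function names and the list of Mersenne exponents are choices made here. It uses the factored form of σk(n) from the proof, cross-checks it against a direct divisor sum for small n, and tests divisibility.

```python
# Exponents p for which 2^p - 1 is prime (the first seven, well-known values).
MERSENNE_EXPONENTS = (2, 3, 5, 7, 13, 17, 19)


def sigma_k_naive(n, k):
    """Sum of the k-th powers of the positive divisors of n, by direct search."""
    return sum(d ** k for d in range(1, n + 1) if n % d == 0)


def even_perfect(p):
    """The even perfect number attached to the Mersenne exponent p."""
    return 2 ** (p - 1) * (2 ** p - 1)


def sigma_k_even_perfect(p, k):
    """sigma_k of 2^(p-1) * q with q = 2^p - 1 prime, using multiplicativity."""
    q = 2 ** p - 1
    return sum(2 ** (i * k) for i in range(p)) * (1 + q ** k)


def divides_for_odd_k(p, max_k):
    """Check n | sigma_k(n) for every odd k up to max_k."""
    n = even_perfect(p)
    return all(sigma_k_even_perfect(p, k) % n == 0 for k in range(1, max_k + 1, 2))


six_always_divides = divides_for_odd_k(2, 99)
others_fail_at_k_equal_p = all(
    sigma_k_even_perfect(p, p) % even_perfect(p) != 0
    for p in MERSENNE_EXPONENTS
    if p > 2
)
```

A finite check of this kind cannot replace the proof; it only confirms the claimed pattern on the listed cases.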
PROBLEMS AND SOLUTIONS

Edited by Daniel H. Ullman, Daniel J. Velleman, and Douglas B. West with the collaboration of Paul Bracken, Ezra A. Brown, Zachary Franco, Christian Friesen, László Lipták, Rick Luttmann, Hosam Mahmoud, Frank B. Miles, Lenhard Ng, Kenneth Stolarsky, Richard Stong, Stan Wagon, Lawrence Washington, and Li Zhou.

Proposed problems should be submitted online at americanmathematicalmonthly.submittable.com/submit. Proposed solutions to the problems below should be submitted by May 31, 2021, via the same link. More detailed instructions are available online. Proposed problems must not be under consideration concurrently at any other journal nor be posted to the internet before the deadline date for solutions. An asterisk (*) after the number of a problem or a part of a problem indicates that no solution is currently available.
PROBLEMS

12223. Proposed by Michael Elgersma, Plymouth, MN, and James R. Roche, Ellicott City, MD. Two weighted m-sided dice have faces labeled with the integers 1 to m. The first die shows the integer i with probability p_i, while the second die shows the integer i with probability r_i. Alice rolls the two dice and sums the resulting integers; Bob then independently does the same. (a) For each m with m ≥ 2, find the probability vectors (p_1, . . . , p_m) and (r_1, . . . , r_m) that minimize the probability that Alice’s sum equals Bob’s sum. (b)* Generalize to n dice, with n ≥ 3.

12224. Proposed by Cherng-tiao Perng, Norfolk State University, Norfolk, VA. Let ABC be a triangle, with D and E on AB and AC, respectively. For a point F in the plane, let DF intersect BC at G and let EF intersect BC at H. Furthermore, let AF intersect BC at I, let DH intersect EG at J, and let BE intersect CD at K. Prove that I, J, and K are collinear.

12225. Proposed by Pakawut Jiradilok, Massachusetts Institute of Technology, Cambridge, MA, and Wijit Yangjit, University of Michigan, Ann Arbor, MI. Let Γ denote the gamma function, defined by
\[
\Gamma(x) = \int_0^{\infty} e^{-t} t^{x-1}\,dt
\]
for x > 0. (a) Prove that ⌈Γ(1/n)⌉ = n for every positive integer n, where ⌈y⌉ denotes the smallest integer greater than or equal to y. (b) Find the smallest constant c such that Γ(1/n) ≥ n − c for every positive integer n.

12226. Proposed by Jovan Vukmirovic, Belgrade, Serbia. Let x_1, x_2, and x_3 be real numbers, and define x_n for n ≥ 4 recursively by x_n = max{x_{n−3}, x_{n−1}} − x_{n−2}. Show that the sequence x_1, x_2, . . . is either convergent or eventually periodic, and find all triples (x_1, x_2, x_3) for which it is convergent.

12227. Proposed by Gregory Galperin, Eastern Illinois University, Charleston, IL, and Yury J. Ionin, Central Michigan University, Mount Pleasant, MI. Prove that for any integer n with n ≥ 3 there exist infinitely many pairs (A, B) such that A is a set of n consecutive positive integers, B is a set of fewer than n positive integers, A and B are disjoint, and
\[
\sum_{k \in A} \frac{1}{k} = \sum_{k \in B} \frac{1}{k}.
\]
doi.org/10.1080/00029890.2021.1840171
12228. Proposed by Hervé Grandmontagne, Paris, France. Prove
\[
\int_0^1 \frac{(\ln x)^2 \ln\left(2\sqrt{x}/(x^2+1)\right)}{x^2 - 1}\,dx = 2G^2,
\]
where G is Catalan’s constant \(\sum_{n=0}^{\infty} (-1)^n/(2n+1)^2\).

12229. Proposed by Moubinool Omarjee, Lycée Henri IV, Paris, France. Let f : [0, 1] → R be a function that has a continuous second derivative and that satisfies f(0) = f(1) and \(\int_0^1 f(x)\,dx = 0\). Prove
\[
30240 \left( \int_0^1 x f(x)\,dx \right)^2 \le \int_0^1 \left( f''(x) \right)^2 dx.
\]
SOLUTIONS

Two Generating Functions with Some Equal Coefficients

12113 [2019, 468]. Proposed by Richard P. Stanley, University of Miami, Coral Gables, FL. Define f(n) and g(n) for n ≥ 0 by
\[
\sum_{n \ge 0} f(n) x^n = \sum_{j \ge 0} x^{2^j} \prod_{k=0}^{j-1} \left( 1 + x^{2^k} + x^{3 \cdot 2^k} \right)
\]
and
\[
\sum_{n \ge 0} g(n) x^n = \prod_{i \ge 0} \left( 1 + x^{2^i} + x^{3 \cdot 2^i} \right).
\]
Find all values of n for which f(n) = g(n), and find f(n) for these values.

Solution by Harris Kwong, State University of New York, Fredonia, NY, and the editors. Equality occurs if and only if n = 1 (with f(1) = 1) or n has the form 3 · 2^p − 2^q with 0 ≤ q ≤ p. In this general case, f(n) is the Fibonacci number F_{p−q+2}, where F_1 = F_2 = 1 and F_k = F_{k−1} + F_{k−2} for k ≥ 3.

Note that f(0) = 0 < 1 = g(0) and f(1) = 1 = g(1), so we may henceforth assume n ≥ 2. The form of the generating function for g implies that g(n) is the number of partitions of n into distinct parts such that each part has the form 2^j or 3 · 2^j with j ≥ 0 and such that these two values for the same j are not both used. For each such partition, we can write n as the sum of a_j 2^j over 0 ≤ j ≤ d for some positive integer d, with a_j ∈ {0, 1, 3} for each j and with a_d ≠ 0. We represent such a partition as a word a_d · · · a_0 over the alphabet {0, 1, 3}, and we refer to a_d · · · a_0 as an “extended binary representation” or just “extended representation” of n. For example, 1031 is an extended representation of 15.

The distinction between f and g is that g(n) counts all extended representations of n (leading 0s ignored), while f(n) counts extended representations of n whose leftmost digit is 1. Thus f(n) = g(n) precisely when n is 1-pure, meaning that it has no extended representation with leftmost digit 3.

We show first that if n is 1-pure, then n = 3 · 2^p − 2^q with 0 ≤ q ≤ p. Since 2^{p+1} has this form with q = p, we may assume for this implication that the (ordinary) binary representation of n has at least two 1s. Write its leftmost portion as 10^r 1 with r ≥ 0, where b^r indicates a string of bs of length r. We begin by showing that r must equal 1 when n is 1-pure. Let x be the integer that is 10^r 1 in binary. Since x = 2^{r+1} + 1, we see that x is divisible by 3 when r is even and congruent to 2 modulo 3 when r is odd.

When r is even, we can use x/3 in binary to give an extended representation of x with all nonzero digits 3. Appending the rest of the binary representation of n gives an extended representation of n starting with 3. For example, when n is 37, with binary representation 100101, we have x = 9 and obtain the extended representation 3301. Hence r cannot be even.

When r is odd and positive, x ≡ 2 mod 3 allows us to write x = 3y + 5 for some nonnegative integer y. When r ≥ 3, we have x − 5 = 3y with y = (2^{r+1} − 4)/3, and y is both positive and divisible by 4. Thus we can build an extended representation of n by putting 3 times the binary representation of y/4 at the left, followed by two positions holding 13 to reach x, and then the trailing portion of the binary representation of n. For example, when n is 69, with binary representation 1000101, we have x = 17 and obtain the extended binary representation 31301. Hence r must equal 1.

Now suppose that after the initial 101, the binary representation of n has somewhere another 0 followed by a 1. Thus it starts with 101^s 0^t 1 with s, t ≥ 1, representing an integer z. Note that z = 1 + 2^{t+1}(2^{s+1} + 2^s − 1) ≡ 1 − 2^{t+1} mod 3. Thus z ≡ 0 mod 3 when t is odd and z ≡ 2 mod 3 when t is even. As discussed above, either case yields an extended representation of n starting with 3.

Therefore, whenever n is 1-pure and n ≥ 2, the binary representation of n has the form 101^s 0^t with s, t ≥ 0. Note that n = 2^t(2^{s+1} + 2^s − 1) = 3 · 2^{s+t} − 2^t, which has the claimed form with p = s + t and q = t, so p − q = s.

It remains to show that every integer n of this form is 1-pure and has F_{s+2} extended representations. We do both simultaneously by describing all the extended representations for such n. Converting an extended representation to a binary representation moves units to the left, since two units of 2^j convert to one unit of 2^{j+1}. Hence the rightmost t digits of any extended representation of 101^s 0^t are 0. Next we must obtain s digits that equal 1. Working from the right in an extended representation, when we see a 1 or a 0 we leave it alone. When we see a 3, we convert it to a 1 and add 1 to the digit at its left.
That digit must end up as 1 if we have not yet produced all s consecutive 1s, so the extended binary representation must have 0 to the left of the 3. There is also the possibility that the leftmost of these s positions is a 3 with a 1 to its left, so that 13 at the left end of the extended representation converts to 101 to match the left end of the binary representation of n.

Thus an extended representation that converts to 101^s 0^t is determined by the positions that are 3 within the string 1^s. Any of these positions can have a 3, but those having a 3 must be nonconsecutive. Once the 3s are fixed, there is a unique way to fill in the remaining positions to obtain an extended binary representation of n. Furthermore, either the leftmost position among these s is a 1, or it is a 3 with a 1 immediately to its left, so n is 1-pure.

We have shown that f(n) is the number of subsets of the positions {1, . . . , s} that do not contain consecutive positions. It is well known that the number of such subsets is the Fibonacci number F_{s+2}. Thus f(n) = F_{s+2} = F_{p−q+2}.

Editorial comment. Most solvers determined all values f(n) and g(n) recursively. They satisfy the same recurrence a(2n) = a(n) and a(2n + 1) = a(n) + a(n − 1), but the initial conditions f(0) = 0 and g(0) = 1 are different. The solution above emphasizes what the values actually count. The sequence associated with g is at oeis.org/A120562, while the sequence associated with f is at oeis.org/A082498. Rory Molinari observed that these and similar sequences were studied in S. Northshield (2010), Sums across Pascal’s triangle modulo 2, Congr. Numer. 200: 35–52. Northshield used the term “(3, 1)-hyperbinary representation” for the extended binary representation used here and proved that g(n) is the number of odd binomial coefficients of the form \(\binom{n-2i}{i}\) with i ≤ n/3. For more on hyperbinary representations, see T. Mansour and M. Shattuck (2011), Two further generalizations of the Calkin–Wilf tree, J. Combinatorics 2(4): 507–524.

Also solved by R. Chapman (UK), N. Hodges (UK), O. Kouba (Syria), P. Lalonde (Canada), R. Molinari, A. Natian, A. Pathak, R. Stong, R. Tauraso (Italy), L. Zhou, and the proposer.
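The answer to Problem 12113 can be checked by expanding both generating functions up to a fixed degree. The sketch below is an illustration with names chosen here (`extended_counts`, `predicted_values`, and the cutoff `LIMIT` are assumptions, not from the solution). It computes f and g as truncated power series and compares the set of n with f(n) = g(n) against the predicted set {1} ∪ {3 · 2^p − 2^q : 0 ≤ q ≤ p}, together with the Fibonacci values.

```python
def extended_counts(limit):
    """Coefficients f(0..limit) and g(0..limit) of the two generating functions."""
    f = [0] * (limit + 1)
    product = [0] * (limit + 1)  # product over k < j, truncated at degree limit
    product[0] = 1
    power = 1  # current value of 2^j
    while power <= limit:
        # Add x^(2^j) times the running product to the series for f.
        for n in range(limit + 1 - power):
            f[n + power] += product[n]
        # Multiply the running product by 1 + x^(2^j) + x^(3 * 2^j).
        updated = product[:]
        for n, coeff in enumerate(product):
            if coeff:
                for shift in (power, 3 * power):
                    if n + shift <= limit:
                        updated[n + shift] += coeff
        product = updated
        power *= 2
    # Factors with 2^i > limit do not change coefficients of degree <= limit.
    return f, product


def fibonacci(k):
    """F_k with F_1 = F_2 = 1."""
    a, b = 1, 1
    for _ in range(k - 1):
        a, b = b, a + b
    return a


def predicted_values(limit):
    """Map each n = 3 * 2^p - 2^q (0 <= q <= p) to F_(p-q+2), plus n = 1."""
    predicted = {1: 1}
    p = 0
    while 2 ** (p + 1) <= limit:  # smallest n for this p is 3 * 2^p - 2^p
        for q in range(p + 1):
            n = 3 * 2 ** p - 2 ** q
            if n <= limit:
                predicted[n] = fibonacci(p - q + 2)
        p += 1
    return predicted


LIMIT = 500
f, g = extended_counts(LIMIT)
equal = {n: f[n] for n in range(LIMIT + 1) if f[n] == g[n]}
```

Here `equal` maps each n ≤ 500 with f(n) = g(n) to the common value, so it can be compared directly with `predicted_values(LIMIT)`.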
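A Dirichlet Series with Reduced Numerators

12114 [2019, 469]. Proposed by Zachary Franco, Houston, TX. Let n be a positive integer, and let A_n = {1/n, 2/n, . . . , n/n}. Let a_n be the sum of the numerators in A_n when these fractions are expressed in lowest terms. For example, A_6 = {1/6, 1/3, 1/2, 2/3, 5/6, 1/1}, so a_6 = 1 + 1 + 1 + 2 + 5 + 1 = 11. Find \(\sum_{n=1}^{\infty} a_n/n^4\).

Solution by Tamas Wiandt, Rochester Institute of Technology, Rochester, NY. The value of the sum is
\[
\frac{\zeta(4)}{2} \left( \frac{\zeta(2)}{\zeta(3)} + 1 \right),
\]
where ζ is the Riemann zeta function. To establish this result, observe that
\[
a_n = \sum_{k=1}^{n} \frac{k}{\gcd(k,n)}
= 1 + \sum_{k=1}^{n-1} \frac{k}{\gcd(k,n)}
= 1 + \sum_{k=1}^{n-1} \frac{n-k}{\gcd(n-k,n)}
= 1 + \sum_{k=1}^{n-1} \frac{n-k}{\gcd(k,n)}
\]
\[
= 1 + \sum_{k=1}^{n} \frac{n-k}{\gcd(k,n)}
= 1 - a_n + \sum_{k=1}^{n} \frac{n}{\gcd(k,n)}.
\]
Solving for a_n yields
\[
a_n = \frac{1}{2} + \frac{1}{2} \sum_{k=1}^{n} \frac{n}{\gcd(k,n)}
= \frac{1}{2} + \frac{1}{2} \sum_{d \mid n} \frac{n}{d}\, \varphi\!\left(\frac{n}{d}\right)
= \frac{1}{2} + \frac{1}{2} \sum_{d \mid n} d\, \varphi(d), \tag{$*$}
\]
where φ is the Euler totient function. Using the well-known identity
\[
\sum_{d=1}^{\infty} \frac{\varphi(d)}{d^k} = \frac{\zeta(k-1)}{\zeta(k)}
\]
(when k > 2) for the Dirichlet series of the totient function,
\[
\sum_{n=1}^{\infty} \frac{a_n}{n^4}
= \sum_{n=1}^{\infty} \left( \frac{1}{2n^4} + \frac{1}{2} \sum_{d \mid n} \frac{d\,\varphi(d)}{n^4} \right)
= \frac{\zeta(4)}{2} + \frac{1}{2} \sum_{d=1}^{\infty} \sum_{k=1}^{\infty} \frac{d\,\varphi(d)}{d^4 k^4}
\]
\[
= \frac{\zeta(4)}{2} + \frac{1}{2} \sum_{d=1}^{\infty} \frac{d\,\varphi(d)}{d^4} \sum_{k=1}^{\infty} \frac{1}{k^4}
= \frac{\zeta(4)}{2} \left( 1 + \sum_{d=1}^{\infty} \frac{\varphi(d)}{d^3} \right)
= \frac{\zeta(4)}{2} \left( 1 + \frac{\zeta(2)}{\zeta(3)} \right).
\]
The interchange of the order of summation here is justified by the positivity of all summands.

Editorial comment. The computation in (∗) appears also in the solution of Problem 10829 [2000, 753; 2002, 763] in this Monthly. Many solvers noted the more general result
\[
\sum_{n=1}^{\infty} \frac{a_n}{n^s} = \frac{\zeta(s)}{2} \left( 1 + \frac{\zeta(s-2)}{\zeta(s-1)} \right)
\]
for s > 3. This can be shown by replacing the final set of equations in the given proof with
\[
\sum_{n=1}^{\infty} \frac{a_n}{n^s}
= \sum_{n=1}^{\infty} \left( \frac{1}{2n^s} + \frac{1}{2} \sum_{d \mid n} \frac{d\,\varphi(d)}{n^s} \right)
= \frac{\zeta(s)}{2} + \frac{1}{2} \sum_{d=1}^{\infty} \sum_{k=1}^{\infty} \frac{d\,\varphi(d)}{d^s k^s}
\]
\[
= \frac{\zeta(s)}{2} + \frac{1}{2} \sum_{d=1}^{\infty} \frac{d\,\varphi(d)}{d^s} \sum_{k=1}^{\infty} \frac{1}{k^s}
= \frac{\zeta(s)}{2} \left( 1 + \sum_{d=1}^{\infty} \frac{\varphi(d)}{d^{s-1}} \right)
= \frac{\zeta(s)}{2} \left( 1 + \frac{\zeta(s-2)}{\zeta(s-1)} \right).
\]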
Also solved by R. Boukharfane (Saudi Arabia), R. Chapman (UK), H. Chen, A. B. Dixit & S. Pathak (Canada), G. Fera (Italy), N. Fogarty & B. Bradie, D. Garth, J.-P. Grivaux (France), R. Guadalupe (Philippines), N. Hodges, J. Hyeonwook, Y. J. Ionin, O. Kouba (Syria), P. Lalonde (Canada), O. P. Lossers (Netherlands), R. Molinari, M. Omarjee (France), M. A. Prasad (India), M. D. Schmidt, N. C. Singer, A. Stadler (Switzerland), A. Stenger, R. Stong, R. Tauraso (Italy), L. Zhou, GCHQ Problem Solving Group (UK), and the proposer.
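Formula (∗) and the value of the series in Problem 12114 can be checked numerically. The sketch below is illustrative only: the function names, the cutoff of 1500 terms, and the hard-coded decimal value of ζ(3) (Apéry's constant) are assumptions introduced here. It computes a_n directly from the definition, compares it with (∗) for small n, and compares a partial sum with the closed form.

```python
from math import gcd, pi


def a_direct(n):
    """Sum of the reduced numerators of 1/n, 2/n, ..., n/n."""
    return sum(k // gcd(k, n) for k in range(1, n + 1))


def phi(d):
    """Euler totient by direct count (adequate for small d)."""
    return sum(1 for k in range(1, d + 1) if gcd(k, d) == 1)


def a_formula(n):
    """Formula (*): a_n = 1/2 + (1/2) * sum over d | n of d * phi(d)."""
    total = sum(d * phi(d) for d in range(1, n + 1) if n % d == 0)
    return (1 + total) // 2  # 1 + total equals 2 * a_n, so it is even


ZETA_2 = pi ** 2 / 6
ZETA_3 = 1.2020569031595942  # Apery's constant, hard-coded decimal value
ZETA_4 = pi ** 4 / 90

closed_form = ZETA_4 / 2 * (1 + ZETA_2 / ZETA_3)
TERMS = 1500
partial_sum = sum(a_direct(n) / n ** 4 for n in range(1, TERMS + 1))
```

Since (∗) gives a_n ≤ (1 + n²)/2, the tail after N terms is roughly at most 1/(2N), so the partial sum with 1500 terms should fall short of the closed form by less than 0.001.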
An Asymmetric Proof of a Symmetric Inequality

12115 [2019, 469]. Proposed by Marius Drăgan, Bucharest, Romania. Let a, b, c, and d be positive real numbers. Prove
\[
(a^3 + b^3)(a^3 + c^3)(a^3 + d^3)(b^3 + c^3)(b^3 + d^3)(c^3 + d^3) \ge (a^2b^2c^2 + a^2b^2d^2 + a^2c^2d^2 + b^2c^2d^2)^3.
\]

Solution by Sarah B. Seales, Prescott, Arizona. We begin by pairing up factors on the left side of the desired inequality to get
\[
(a^3 + b^3)(a^3 + c^3)(a^3 + d^3)(b^3 + c^3)(b^3 + d^3)(c^3 + d^3)
\]
\[
= (a^6 + a^3c^3 + a^3b^3 + b^3c^3)(d^6 + a^3b^3 + a^3d^3 + b^3d^3)(c^6 + b^3c^3 + b^3d^3 + c^3d^3). \tag{1}
\]
Next, we employ a generalization of Hölder’s inequality, which states that if a_{i,j} ≥ 0 for 1 ≤ i ≤ k and 1 ≤ j ≤ n, then
\[
\prod_{i=1}^{k} \left( \sum_{j=1}^{n} a_{i,j} \right) \ge \left( \sum_{j=1}^{n} \sqrt[k]{\prod_{i=1}^{k} a_{i,j}} \right)^{k}.
\]
Applying this inequality to the vectors (a^6, a^3c^3, a^3b^3, b^3c^3), (d^6, a^3b^3, a^3d^3, b^3d^3), and (c^6, b^3c^3, b^3d^3, c^3d^3), we get
\[
(a^6 + a^3c^3 + a^3b^3 + b^3c^3)(d^6 + a^3b^3 + a^3d^3 + b^3d^3)(c^6 + b^3c^3 + b^3d^3 + c^3d^3)
\]
\[
\ge \left( \sqrt[3]{a^6d^6c^6} + \sqrt[3]{(a^3c^3)(a^3b^3)(b^3c^3)} + \sqrt[3]{(a^3b^3)(a^3d^3)(b^3d^3)} + \sqrt[3]{(b^3c^3)(b^3d^3)(c^3d^3)} \right)^3 \tag{2}
\]
\[
= (a^2b^2c^2 + a^2b^2d^2 + a^2c^2d^2 + b^2c^2d^2)^3.
\]
Combining (1) and (2) gives the result.

Also solved by F. R. Ataev (Uzbekistan), R. Boukharfane (Saudi Arabia), R. Chapman (UK), M. Dincă (Romania), H. Y. Far, G. Fera (Italy), O. Geupel (Germany), L. Giugiuc (Romania), E. A. Herman, W. Janous (Austria), M. Kaplan & M. Goldenberg, J. C. Kieffer, K. T. L. Koo (China), J. H. Lindsey II, O. P. Lossers (Netherlands), L. Matejíčka (Slovakia), A. Pathak, C. R. Pranesachar (India), M. Reid, A. Stadler (Switzerland), R. Stong, R. Tauraso (Italy), L. Zhou, GCHQ Problem Solving Group (UK), and the proposer.
The Minimum Score in a Tournament

12116 [2019, 469]. Proposed by Rishubh Thaper, Fleminton, NJ. In a round-robin tournament with n players, each player plays every other player exactly once, and each match results in a win for one player and a loss for the other. When player A defeats player B, we call B the victim of A. At the end of the tournament, each player computes the total number of losses incurred by the player’s victims. Let q be the average of this quantity over all players. Prove that there exists a player with at most √q wins and a player with at most √q losses.

Solution by Nicole Grivaux, Paris, France. Let P_k denote the kth player, with d_k losses, and let S = nq. Since P_k is the victim of d_k players, the contribution of P_k to S is d_k^2. Thus nq is the sum of d_k^2 over all k. If d_k > √q for all k, then d_k^2 > q for all k, and so S > nq, a contradiction. Hence min_k {d_k} ≤ √q.

Player P_k has n − 1 − d_k wins. If n − 1 − d_k > √q for all k, then (n − 1 − d_k)^2 > q and d_k^2 > q + 2(n − 1)d_k − (n − 1)^2 for all k. Since the sum of the d_k is n(n − 1)/2, summing over k yields
\[
S > nq + 2n(n-1)^2/2 - n(n-1)^2 = nq,
\]
again contradicting S = nq. It follows that min_k {n − 1 − d_k} ≤ √q.

Also solved by R. Chapman (UK), N. Fogarty, O. Geupel (Germany), N. Hodges, Y. Ionin, M. Jones, J. H. Lindsey II, O. P. Lossers (Netherlands), R. Molinari, A. Pathak, M. A. Prasad, E. Schmeichel, R. Tauraso (Italy), L. Zhou, GCHQ Problem Solving Group (UK), and the proposer.
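The counting identity behind the tournament solution, that nq equals the sum of the squared loss counts, can be confirmed on random tournaments. The sketch below is a small illustration; the function names and the use of seeded random tournaments are choices made here. It computes q exactly from the definition and tests both conclusions in squared form, which avoids floating-point square roots.

```python
import random
from fractions import Fraction


def random_tournament(n, seed):
    """Return beats, where beats[i] is the set of players defeated by i."""
    rng = random.Random(seed)
    beats = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            winner, loser = (i, j) if rng.random() < 0.5 else (j, i)
            beats[winner].add(loser)
    return beats


def check_tournament(beats):
    """Return (identity holds, min losses <= sqrt(q), min wins <= sqrt(q))."""
    n = len(beats)
    losses = {i: n - 1 - len(beats[i]) for i in beats}
    # Each player totals the losses incurred by his or her victims.
    victim_totals = [sum(losses[v] for v in beats[i]) for i in beats]
    q = Fraction(sum(victim_totals), n)
    identity = n * q == sum(d * d for d in losses.values())
    min_losses = min(losses.values())
    min_wins = min(len(beats[i]) for i in beats)
    return identity, min_losses ** 2 <= q, min_wins ** 2 <= q


results = [
    check_tournament(random_tournament(n, seed))
    for n in range(2, 10)
    for seed in range(20)
]
```

Every entry of `results` should be a triple of `True` values if the solution's argument is right.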
A Complex Fibonacci Series

12118 [2019, 562]. Proposed by Hideyuki Ohtsuka, Saitama, Japan. Let F_n be the nth Fibonacci number, defined by F_0 = 0, F_1 = 1, and F_n = F_{n−1} + F_{n−2} when n ≥ 2. Compute
\[
\sum_{n=0}^{\infty} \frac{1}{F_{2mn} + F_m i},
\]
where m is an odd integer and i = √−1.

Solution by Aritro Pathak, student, Brandeis University, Waltham, MA. We use various identities for Fibonacci numbers to obtain the answer (1 − iϕ^m)/F_{2m}. Extend the definition of F_n to negative n by setting F_n = (−1)^{n+1} F_{−n}. Note first that k − r is odd when k = 2nm and r = m, since m is odd. Therefore, multiplying numerator and denominator by the complex conjugate and using Catalan’s identity F_k^2 − F_{k+r} F_{k−r} = (−1)^{k−r} F_r^2 yields
\[
\sum_{n=0}^{\infty} \frac{1}{F_{2mn} + F_m i}
= \sum_{n=0}^{\infty} \frac{F_{2nm} - iF_m}{F_{2nm}^2 + F_m^2}
= \sum_{n=0}^{\infty} \frac{F_{2nm} - iF_m}{F_{(2n+1)m} F_{(2n-1)m}}.
\]
By induction on r, it is easy to prove F_k L_r = F_{k+r} + (−1)^r F_{k−r}, where L_r is the Lucas number, equal to F_{r+1} + F_{r−1}. With k = r = m, we have F_m L_m = F_{2m}. With k = 2nm and r = m, we have F_{2nm} L_m = F_{(2n+1)m} − F_{(2n−1)m}. Replacing F_{2nm} in our expression for the sum, for the real part we obtain
\[
\sum_{n=0}^{\infty} \frac{F_{2nm}}{F_{(2n+1)m} F_{(2n-1)m}}
= \frac{1}{L_m} \sum_{n=0}^{\infty} \left( \frac{1}{F_{(2n-1)m}} - \frac{1}{F_{(2n+1)m}} \right)
= \frac{1}{L_m F_{-m}}
= \frac{1}{L_m F_m},
\]
where the last step uses the fact that m is odd. We have already noted L_m F_m = F_{2m}.

For the imaginary part, we use d’Ocagne’s identity F_k F_{r+1} − F_{k+1} F_r = (−1)^r F_{k−r}. With k = (2n + 1)m and r = (2n − 1)m (both odd), we have
\[
L_m F_m = F_{2m} = F_{(2n+1)m+1} F_{(2n-1)m} - F_{(2n+1)m} F_{(2n-1)m+1}.
\]
Thus
\[
\frac{F_m}{F_{(2n+1)m} F_{(2n-1)m}} = \frac{1}{L_m} \left( \frac{F_{(2n+1)m+1}}{F_{(2n+1)m}} - \frac{F_{(2n-1)m+1}}{F_{(2n-1)m}} \right).
\]
Again the sum telescopes, and we obtain
\[
\sum_{n=0}^{\infty} \frac{F_m}{F_{(2n+1)m} F_{(2n-1)m}} = \frac{1}{L_m} \left( \varphi - \frac{F_{-m+1}}{F_{-m}} \right),
\]
where ϕ is the golden ratio (1 + √5)/2. This simplifies to (F_{m−1} + ϕF_m)/F_{2m}. A final induction on m yields F_{m−1} + ϕF_m = ϕ^m, completing the proof.

Editorial comment. Several solvers commented that this problem generalizes problem B1180 from the Fibonacci Quarterly, Nov. 2015, p. 366, with solution Nov. 2016, p. 371.

Also solved by A. Berkane (Algeria), B. Bradie, R. Chapman (UK), K. Egamberganov (France), G. Fera (Italy), K. T. L. Koo (China), O. Kouba (Syria), H. Kwong, P. Lalonde (Canada), A. Natian, J. H. Nieto (Venezuela), M. Omarjee (France), N. C. Singer, A. Stadler (Switzerland), R. Stong, R. Tauraso (Italy), T. Wiandt, L. Zhou, GCHQ Problem Solving Group (UK), Missouri State University PSG, and the proposer.
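The closed form (1 − iϕ^m)/F_{2m} is easy to compare with partial sums of the series, which converges geometrically. The sketch below is an illustration with names and a cutoff of 60 terms chosen here; it uses Python's built-in complex numbers.

```python
def fib(n):
    """F_n with F_0 = 0 and F_1 = 1, for n >= 0."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def partial_sum(m, terms):
    """Sum of 1 / (F_{2mn} + i F_m) for n = 0, 1, ..., terms - 1."""
    fm = fib(m)
    return sum(1 / complex(fib(2 * m * n), fm) for n in range(terms))


def closed_form(m):
    """The value (1 - i * phi^m) / F_{2m} obtained in the solution."""
    phi = (1 + 5 ** 0.5) / 2
    return complex(1, -(phi ** m)) / fib(2 * m)


# Largest gap between the partial sum and the closed form for a few odd m.
max_gap = max(abs(partial_sum(m, 60) - closed_form(m)) for m in (1, 3, 5, 7))
```

For m = 1 the first terms are −i, (1 − i)/2, and (3 − i)/10, and the sum approaches 1 − iϕ.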
An Inequality of Cyclic Sums

12119 [2019, 562]. Proposed by Vu Thanh Tung, Nam Dinh, Vietnam. Let I be a real interval, and let F : I × I → R be a function such that
\[
\frac{\partial^3 F}{\partial x\, \partial y^2} \ge 0 \ge \frac{\partial^3 F}{\partial x^2\, \partial y}.
\]
For a positive integer n, suppose that a_1, . . . , a_n are real numbers in I satisfying a_1 ≥ a_2 ≥ · · · ≥ a_n, and let a_{n+1} = a_1. Prove
\[
\sum_{i=1}^{n} F(a_i, a_{i+1}) \ge \sum_{i=1}^{n} F(a_{i+1}, a_i).
\]

Solution by Robin Chapman, University of Exeter, Exeter, UK. Consider the special case where n = 3 and F(x, y) = xy^2, which satisfies the hypothesis on third partial derivatives. We have
\[
\sum_{i=1}^{3} F(a_i, a_{i+1}) - \sum_{i=1}^{3} F(a_{i+1}, a_i)
= a_1a_2^2 + a_2a_3^2 + a_1^2a_3 - a_1^2a_2 - a_2^2a_3 - a_1a_3^2
\]
\[
= -(a_1 - a_2)(a_1 - a_3)(a_2 - a_3) \le 0,
\]
with strict inequality when a_1 > a_2 > a_3. This means that the requested conclusion cannot hold in general. We prove instead the reverse inequality,
\[
\sum_{i=1}^{n} F(a_i, a_{i+1}) \le \sum_{i=1}^{n} F(a_{i+1}, a_i). \tag{$*$}
\]
We assume that the function F is smooth enough so that mixed partial derivatives do not depend on the order of the partial differentiation operators. The problem is trivial when n ≤ 2, so we assume n ≥ 3. Define G by the equation G(x, y) = F (y, x) − F (x, y).
Since $G(y, x) = -G(x, y)$ and $G(x, x) = 0$, inequality $(*)$ is equivalent to
\[ \sum_{i=1}^{n} G(a_i, a_{i+1}) \ge 0. \]
Define
\[ \Delta_n(a_1, \dots, a_n) = \sum_{i=1}^{n} G(a_i, a_{i+1}), \]
so that $(*)$ becomes $\Delta_n(a_1, \dots, a_n) \ge 0$. We have
\begin{align*}
\Delta_n(a_1, \dots, a_n) &= \Delta_{n-1}(a_1, \dots, a_{n-1}) - G(a_{n-1}, a_1) + G(a_{n-1}, a_n) + G(a_n, a_1) \\
&= \Delta_{n-1}(a_1, \dots, a_{n-1}) + G(a_1, a_{n-1}) + G(a_{n-1}, a_n) + G(a_n, a_1) \\
&= \Delta_{n-1}(a_1, \dots, a_{n-1}) + \Delta_3(a_1, a_{n-1}, a_n).
\end{align*}
Thus, to prove $(*)$ by induction on $n$, it suffices to prove that $\Delta_3(a, b, c) \ge 0$ whenever $a \ge b \ge c$ with $a, b, c \in I$. That is, we need to prove $(*)$ only for the case $n = 3$.
We now adopt the notation that $F_j$ denotes the partial derivative of $F$ with respect to the $j$th variable; similarly $F_{j,k}$ denotes a second partial derivative, and so on. We have
\[ \Delta_3(a, b, c) = G(a, b) + G(b, c) + G(c, a) = G(a, b) - G(a, c) - G(b, b) + G(b, c) = H(a) - H(b), \]
where
\[ H(x) = G(x, b) - G(x, c). \]
By the mean value theorem,
\[ \Delta_3(a, b, c) = (a - b) H'(r) = (a - b)\bigl(G_1(r, b) - G_1(r, c)\bigr) \]
for some $r$ with $a \ge r \ge b$. Applying the mean value theorem again gives
\[ \Delta_3(a, b, c) = (a - b)(b - c) G_{1,2}(r, s) = (a - b)(b - c)\bigl(F_{1,2}(s, r) - F_{1,2}(r, s)\bigr), \]
where $b \ge s \ge c$. We have $F_{1,2}(r, s) \le F_{1,2}(r, r)$ since $s \le r$ and $F_{1,2,2} \ge 0$. We also have $F_{1,2}(r, r) \le F_{1,2}(s, r)$ since $r \ge s$ and $F_{1,1,2} \le 0$. Hence $F_{1,2}(r, s) \le F_{1,2}(r, r) \le F_{1,2}(s, r)$, and we conclude
\[ \Delta_3(a, b, c) = (a - b)(b - c)\bigl(F_{1,2}(s, r) - F_{1,2}(r, s)\bigr) \ge 0, \]
as required.
Also solved by O. Kouba (Syria), J. H. Lindsey II, O. P. Lossers (Netherlands), and the proposer.
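The reduction to $n = 3$ and the resulting nonnegativity can be spot-checked numerically. The sketch below is an illustration under stated assumptions, not part of the published proof: it takes the sample function $F(x, y) = xy^2$ (for which $F_{1,2,2} = 2 \ge 0 \ge F_{1,1,2} = 0$), uses random nonincreasing integer sequences so that all arithmetic is exact, and calls the cyclic sum of $G(a_i, a_{i+1})$ by the hypothetical name `cyclic_sum`.

```python
import random


def F(x, y):
    """Sample function with F_{1,2,2} = 2 >= 0 >= F_{1,1,2} = 0 (an assumption)."""
    return x * y**2


def G(x, y):
    """G(x, y) = F(y, x) - F(x, y), as defined in the solution."""
    return F(y, x) - F(x, y)


def cyclic_sum(a):
    """Sum of G(a_i, a_{i+1}) over i = 1..n, with a_{n+1} = a_1."""
    n = len(a)
    return sum(G(a[i], a[(i + 1) % n]) for i in range(n))


rng = random.Random(0)  # fixed seed so the check is reproducible
for _ in range(1000):
    n = rng.randint(4, 8)
    a = sorted((rng.randint(-20, 20) for _ in range(n)), reverse=True)
    # Induction step: the n-term sum splits into an (n-1)-term sum
    # plus the 3-term sum on (a_1, a_{n-1}, a_n).
    assert cyclic_sum(a) == cyclic_sum(a[:-1]) + cyclic_sum([a[0], a[-2], a[-1]])
    # Reverse inequality (*) in its equivalent form.
    assert cyclic_sum(a) >= 0
```

A passing run does not replace the proof; it only confirms that the decomposition and the sign are consistent for this one choice of $F$.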
A Parametrized Inequality
12121 [2019, 563]. Proposed by Leonard Giugiuc, Drobeta Turnu Severin, Romania, and Kunihiko Chikaya, Tokyo, Japan. For what values of $k$ does
\[ k(a^x + b^x + c^x) + \sqrt{a(1 - b) + b(1 - c) + c(1 - a)} \ge 3k \]
hold for all $x \in (0, 1)$ and all positive real numbers $a$, $b$, and $c$ satisfying $a + b + c = 3$?
Solution by the proposers. We show that the inequality holds for $k \le \sqrt{3}/2$ but not for any larger $k$. We rewrite the required inequality in the form
\[ k\bigl(3 - (a^x + b^x + c^x)\bigr) \le \sqrt{a(1 - b) + b(1 - c) + c(1 - a)}. \tag{1} \]
We first observe that if $a$, $b$, and $c$ are positive real numbers satisfying $a + b + c = 3$, then by the AM–GM inequality,
\[ bc \le \frac{(b + c)^2}{4} = \frac{(3 - a)^2}{4}, \]
and therefore
\[ a(1 - b) + b(1 - c) + c(1 - a) = (a + b + c) - (ab + bc + ca) = 3 - a(b + c) - bc \ge 3 - a(3 - a) - \frac{(3 - a)^2}{4} = \frac{3}{4}(a - 1)^2. \]
It follows that the quantity inside the radical in (1) is nonnegative, and
\[ \sqrt{a(1 - b) + b(1 - c) + c(1 - a)} \ge \frac{\sqrt{3}}{2} |a - 1|. \tag{2} \]
By symmetry, we also have
\[ \sqrt{a(1 - b) + b(1 - c) + c(1 - a)} \ge \frac{\sqrt{3}}{2} |c - 1|. \tag{3} \]
Also, if $x \in (0, 1)$, then the function $f(t) = t^x$ is concave on $[0, 3]$. Therefore, by Jensen's inequality,
\[ \frac{a^x + b^x + c^x}{3} \le \left( \frac{a + b + c}{3} \right)^x = 1, \]
so $3 - (a^x + b^x + c^x) \ge 0$.
Suppose now that for some value of $k$, (1) holds for all $x \in (0, 1)$ and all positive $a$, $b$, and $c$ with $a + b + c = 3$. Taking the limit of (1) as $a \to 3^-$ and $b, c \to 0^+$, we get $k(3 - 3^x) \le \sqrt{3}$. Letting $x \to 0^+$, we conclude $2k \le \sqrt{3}$, so $k \le \sqrt{3}/2$.
Conversely, suppose $k \le \sqrt{3}/2$, and fix $a$, $b$, $c$, and $x$ as in the problem. We may assume without loss of generality that $a \ge b \ge c$, which implies that $a \ge 1 \ge c$.
Case 1: $b \ge 1$. Since $a^x \ge 1$, $b^x \ge 1$, and $c^x \ge c$, we have
\[ k\bigl(3 - (a^x + b^x + c^x)\bigr) \le \frac{\sqrt{3}}{2}\bigl(3 - (1 + 1 + c)\bigr) = \frac{\sqrt{3}}{2} |c - 1| \le \sqrt{a(1 - b) + b(1 - c) + c(1 - a)}, \]
where we have applied (3) in the last step.
Case 2: $b < 1$. Since $a^x \ge 1$, $b^x \ge b$, and $c^x \ge c$, by (2)
\[ k\bigl(3 - (a^x + b^x + c^x)\bigr) \le \frac{\sqrt{3}}{2}\bigl(3 - (1 + b + c)\bigr) = \frac{\sqrt{3}}{2}\bigl(3 - (1 + 3 - a)\bigr) = \frac{\sqrt{3}}{2} |a - 1| \le \sqrt{a(1 - b) + b(1 - c) + c(1 - a)}. \]
No other correct and complete solutions were received.
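Both directions of the answer $k \le \sqrt{3}/2$ can be illustrated numerically. The following is a rough floating-point sketch, not a proof; the function name `gap`, the sample sizes, the tolerance, and the specific test point are all assumptions made for illustration. It evaluates the right side of (1) minus the left side, samples random admissible $(a, b, c, x)$ at $k = \sqrt{3}/2$, and then shows that a slightly larger $k$ fails near the limiting configuration $a \to 3^-$, $b, c \to 0^+$ with small $x$.

```python
import math
import random

K_MAX = math.sqrt(3) / 2  # the threshold value from the solution


def gap(k, a, b, c, x):
    """Right side of (1) minus left side; (1) holds exactly when gap >= 0."""
    radicand = a * (1 - b) + b * (1 - c) + c * (1 - a)
    # Clamp tiny negative rounding errors before taking the square root.
    return math.sqrt(max(radicand, 0.0)) - k * (3 - (a**x + b**x + c**x))


rng = random.Random(1)  # fixed seed so the check is reproducible
for _ in range(20000):
    # Two sorted cut points in (0, 3) give a, b, c >= 0 with a + b + c = 3.
    u, v = sorted((rng.uniform(0, 3), rng.uniform(0, 3)))
    a, b, c = u, v - u, 3 - v
    if min(a, b, c) <= 0:
        continue  # the problem requires strictly positive a, b, c
    x = rng.uniform(0.001, 0.999)
    assert gap(K_MAX, a, b, c, x) >= -1e-9  # small floating-point tolerance

# A slightly larger k fails close to the limiting configuration:
# b and c must be tiny relative to x so that b**x and c**x are near 0.
assert gap(K_MAX + 0.01, 3.0, 1e-300, 1e-300, 0.01) < 0
```

Note the order of the limits: with $x$ fixed, $b^x \to 0$ only as $b \to 0^+$, so the failing point takes $b$ and $c$ extremely small before $x$ is made small.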