English Pages 272 [293] Year 2020
A First Course in Enumerative Combinatorics Carl G. Wagner
UNDERGRADUATE TEXTS • 49
In loving memory of my niece, Sara Louise Weller (1983–2005)
Contents
Preface
Notation
Chapter 1. Prologue: Compositions of an integer
1.1. Counting compositions
1.2. The Fibonacci numbers from a combinatorial perspective
1.3. Weak compositions
1.4. Compositions with arbitrarily restricted parts
1.5. The fundamental theorem of composition enumeration
1.6. Basic tools for manipulating finite sums
Exercises
Chapter 2. Sets, functions, and relations
2.1. Functions
2.2. Finite sets
2.3. Cartesian products and their subsets
2.4. Counting surjections: A recursive formula
2.5. The domain partition induced by a function
2.6. The pigeonhole principle for functions
2.7. Relations
2.8. The matrix representation of a relation
2.9. Equivalence relations and partitions
References
Exercises
Project
Chapter 3. Binomial coefficients
3.1. Subsets of a finite set
3.2. Distributions, words, and lattice paths
3.3. Binomial inversion and the sieve formula
3.4. Problème des ménages
3.5. An inversion formula for set functions
3.6. *The Bonferroni inequalities
References
Exercises
Chapter 4. Multinomial coefficients and ordered partitions
4.1. Multinomial coefficients and ordered partitions
4.2. Ordered partitions and preferential rankings
4.3. Generating functions for ordered partitions
Reference
Exercises
Chapter 5. Graphs and trees
5.1. Graphs
5.2. Connected graphs
5.3. Trees
5.4. *Spanning trees
5.5. *Ramsey theory
5.6. *The probabilistic method
References
Exercises
Project
Chapter 6. Partitions: Stirling, Lah, and cycle numbers
6.1. Stirling numbers of the second kind
6.2. Restricted growth functions
6.3. The numbers 𝜎(𝑛, 𝑘) and 𝑆(𝑛, 𝑘) as connection constants
6.4. Stirling numbers of the first kind
6.5. Ordered occupancy: Lah numbers
6.6. Restricted ordered occupancy: Cycle numbers
6.7. Balls and boxes: The twenty-fold way
References
Exercises
Projects
Chapter 7. Intermission: Some unifying themes
7.1. The exponential formula
7.2. Comtet’s theorem
7.3. Lancaster’s theorem
References
Exercises
Project
Chapter 8. Combinatorics and number theory
8.1. Arithmetic functions
8.2. Circular words
8.3. Partitions of an integer
8.4. *Power sums
8.5. 𝑝-orders and Legendre’s theorem
8.6. Lucas’s congruence for binomial coefficients
8.7. *Restricted sums of binomial coefficients
References
Exercises
Project
Chapter 9. Differences and sums
9.1. Finite difference operators
9.2. Polynomial interpolation
9.3. The fundamental theorem of the finite difference calculus
9.4. The snake oil method
9.5. *The harmonic numbers
9.6. Linear homogeneous difference equations with constant coefficients
9.7. Constructing visibly real-valued solutions to difference equations with obviously real-valued solutions
9.8. The fundamental theorem of rational generating functions
9.9. Inefficient recursive formulae
9.10. Periodic functions and polynomial functions
9.11. A nonlinear recursive formula: The Catalan numbers
References
Exercises
Project
Chapter 10. Enumeration under group action
10.1. Permutation groups and orbits
10.2. Pólya’s first theorem
10.3. The pattern inventory: Pólya’s second theorem
10.4. Counting isomorphism classes of graphs
10.5. 𝐺-classes of proper subsets of colorings / group actions
10.6. De Bruijn’s generalization of Pólya theory
10.7. Equivalence classes of boolean functions
References
Exercises
Chapter 11. Finite vector spaces
11.1. Vector spaces over finite fields
11.2. Linear spans and linear independence
11.3. Counting subspaces
11.4. The 𝑞-binomial coefficients are Comtet numbers
11.5. 𝑞-binomial inversion
11.6. The 𝑞-Vandermonde identity
11.7. 𝑞-multinomial coefficients of the first kind
11.8. 𝑞-multinomial coefficients of the second kind
11.9. The distribution polynomials of statistics on discrete structures
11.10. Knuth’s analysis
11.11. The Galois numbers
References
Exercises
Projects
Chapter 12. Ordered sets
12.1. Total orders and their generalizations
12.2. *Quasi-orders and topologies
12.3. *Weak orders and ordered partitions
12.4. *Strict orders
12.5. Partial orders: basic terminology and notation
12.6. Chains and antichains
12.7. Matchings and systems of distinct representatives
12.8. *Unimodality and logarithmic concavity
12.9. Rank functions and Sperner posets
12.10. Lattices
References
Exercises
Projects
Chapter 13. Formal power series
13.1. Semigroup algebras
13.2. The Cauchy algebra
13.3. Formal power series and polynomials over ℂ
13.4. Infinite sums in ℂ^ℕ
13.5. Summation interchange
13.6. Formal derivatives
13.7. The formal logarithm
13.8. The formal exponential function
References
Exercises
Projects
Chapter 14. Incidence algebra: The grand unified theory of enumerative combinatorics
14.1. The incidence algebra of a locally finite poset
14.2. Infinite sums in ℂ^{Int(𝑃)}
14.3. The zeta function and the enumeration of chains
14.4. The chi function and the enumeration of maximal chains
14.5. The Möbius function
14.6. Möbius inversion formulas
14.7. The Möbius functions of four classical posets
14.8. Graded posets and the Jordan–Dedekind chain condition
14.9. Binomial posets
14.10. The reduced incidence algebra of a binomial poset
14.11. Modular binomial lattices
References
Exercises
Projects
Appendix A. Analysis review
A.1. Infinite series
A.2. Power series
A.3. Double sequences and series
References
Appendix B. Topology review
B.1. Topological spaces and their bases
B.2. Metric topologies
B.3. Separation axioms
B.4. Product topologies
B.5. The topology of pointwise convergence
References
Appendix C. Abstract algebra review
C.1. Algebraic structures with one composition
C.2. Algebraic structures with two compositions
C.3. 𝑅-algebraic structures
C.4. Substructures
C.5. Isomorphic structures
References
Index
Preface
This book offers an introduction to enumerative combinatorics for advanced undergraduates as well as beginning graduate students. One of the attractive features of combinatorics, however, is the modest mathematical background required for its study, and I have occasionally had students as young as sophomores master this material. For much of the book, the only prerequisites are knowledge of elementary linear algebra and familiarity with power series (the latter reviewed in Appendix A). While a prior course covering logic and sets, functions and relations, etc., would also be helpful, those who have not had such a course or simply require a review will find an adequate account of these subjects in Chapter 2. Passing references to abstract algebraic structures (groups, rings, fields, algebras) are mostly shorthand for lists of properties. Appendix C includes a glossary of the basic terminology of abstract algebra, along with a few of its elementary theorems, which should suffice for reading most of the text (the exceptions being Chapter 10 and parts of Chapters 13 and 14). Concepts from point set topology also appear, but only in section 12.2, and in Chapters 13 and 14, where alternatives to a topological treatment are presented. In any case, there is an outline of all pertinent topological results in Appendix B. Sections devoted to special supplementary topics are designated with the asterisk symbol “*”. The material in this book and its style of presentation are deeply indebted to an approach to combinatorics developed by Gian-Carlo Rota and his students, especially Richard Stanley. In particular, bijective proofs (which establish the numerical identity 𝑎 = 𝑏 by counting a set 𝑆 in two different ways, or by exhibiting sets 𝐴 and 𝐵 with respective cardinalities 𝑎 and 𝑏, along with a bijection from 𝐴 to 𝐵) are preferred, whenever possible, to verification by symbol manipulation. 
I also make frequent use of the notion of an 𝑚-to-one surjection, which provides a clear and rigorous way of establishing that a set has 𝑎/𝑚 members. Although combinatorics texts typically follow the (entirely reasonable) practice of adopting the addition and multiplication rules as axioms, I have included careful derivations of these rules for interested readers, based on the beautiful exposition of Seth Warner in Chapter III of his Modern Algebra I (Prentice-Hall, 1965).
I have aimed for as concise an exposition as possible, relying on the substantial exercises to elaborate material in the body of the text. At the same time, I have included illustrative examples and solved problems when necessary to illuminate the more difficult topics in the text. Many chapters include collections of related, somewhat more challenging, exercises, labeled “Projects”, which can be assigned as take-home examinations or as honors problems.
Chapters 1–9 cover the standard tools of enumeration (recursion, generating functions, and the sieve formula) as well as the fundamental discrete structures (sets and multisets, words and permutations, ordered and unordered partitions of a set, graphs and trees, and combinatorial number theory) most frequently encountered in combinatorics. Chapter 10 covers the Pólya–de Bruijn theory of enumeration under group action, an important tool for counting equivalence classes of discrete structures. In a feature unusual for a book of this type, Chapter 11 gives a thorough account of the structure of vector spaces over finite fields (“𝑞-theory”), both for its intriguing parallels with the combinatorics of finite sets and as a prime example of a class of posets that play an important role later in the book. Chapter 12 deals with ordered structures, and in addition to the classical theory of posets (Dilworth’s chain and antichain decomposition theorems, Sperner theory, matchings and systems of distinct representatives, etc.), it contains supplementary material on other generalizations of total orders, such as quasi-orders and weak orders, as well as a discussion of their asymmetric counterparts, the strict orders.
Many combinatorics texts finesse convergence questions for generating functions with the comment that this can all be made rigorous by conceptualizing such functions as formal power series.
While there is nothing wrong with this practice if one is in a hurry to get on to other subjects, I have decided to devote Chapter 13 to a detailed account of formal power series, based on the beautiful paper by Ivan Niven in the 1969 American Mathematical Monthly.
Until the 1960s, many, if not most, mathematicians regarded combinatorics as a more or less unorganized collection of problems and their solutions. In decrying this attitude in his 1968 monograph, Combinatorial Identities, John Riordan nicely expressed it as the view that “. . . the challenge of verification provides the chief interest of the identity; once verified, it drops into an undifferentiated void”. Chapter 14 offers what I hope is a convincing refutation of this view, as embodied in Rota’s theory of incidence algebras over locally finite posets and the Doubilet–Rota–Stanley theory of binomial posets, arguably the most attractive unification to date of the essential core of enumerative combinatorics. In particular, the latter theory provides a satisfying explanation of why various types of enumeration problems are inevitably associated with particular types of generating functions and, when specialized to the case of modular binomial lattices, an explanation of why the formulas of 𝑞-theory so frequently reduce to formulas about chains “when 𝑞 = 0” and about sets “when 𝑞 = 1”.
There is more material in this text than can be taught in a typical one-semester undergraduate course. At a minimum, such a course might cover the unstarred sections of Chapters 1–9, with substantial class time devoted to the discussion of exercises. Time permitting, I would also cover sections 11.1–11.4, as well as the unstarred sections of Chapter 12. Chapters 9–13 are independent of each other, which allows instructors considerable flexibility in designing variations on the above. Those wishing to cover Chapter 14 (in, say, an honors undergraduate or master’s-level graduate course) will want to ensure that students have read in advance sections 11.1–11.5, the unstarred sections of Chapter 12, and sections 13.1–13.6.
My longtime friend and colleague, Shashikant Mulay, has been generous beyond measure with advice during the writing of this book, as well as many other projects. He is responsible in particular for the formulation and proofs of Theorems 13.5.1 and B.5.6. Two of my former doctoral students have also made important contributions. Mark Shattuck, now an accomplished combinatorist, has given the book a meticulous proofreading, checking the proofs of all of the theorems, as well as suggesting the proof of Theorem 11.11.1 and several exercises. Reid Davis was responsible for calling my initial attention to the important unifying role of modular binomial lattices, only later discovering identical ideas in earlier work of Doubilet, Rota, and Stanley. The thoughtful proposals of three anonymous referees for revisions and additions to my original manuscript have greatly improved the final text. Steve Kennedy and Jennifer Wright Sharp have been all that one could hope for as editors. To all of these individuals I offer my sincere thanks.
Notation
ℙ = the set of positive integers
ℕ = the set of nonnegative integers
ℤ = the set of integers
ℚ = the set of rational numbers
ℝ = the set of real numbers
ℂ = the set of complex numbers
∶= is equal by definition to
[𝑛] = {1, . . . , 𝑛} if 𝑛 ∈ ℙ; [0] = ∅
[𝑖, 𝑗] = {𝑖, 𝑖 + 1, . . . , 𝑗} if 𝑖, 𝑗 ∈ ℤ and 𝑖 ≤ 𝑗
⌊𝑥⌋ = the greatest integer ≤ 𝑥
⌈𝑥⌉ = the smallest integer ≥ 𝑥
|𝑆| = the cardinality of the set 𝑆
im(𝑓) = the image (or range) of the function 𝑓
$F_q$ = the finite field of cardinality 𝑞
$\delta_{i,j}$ = the Kronecker delta function (1 if 𝑖 = 𝑗 and 0 if 𝑖 ≠ 𝑗)
$x^{\underline{k}} = x(x-1)\cdots(x-k+1)$ if 𝑘 ∈ ℙ; $x^{\underline{0}} = 1$
$x^{\overline{k}} = x(x+1)\cdots(x+k-1)$ if 𝑘 ∈ ℙ; $x^{\overline{0}} = 1$
$\binom{x}{k} = x^{\underline{k}}/k!$
$\Delta f(x) = f(x+1) - f(x)$
$S_r(n) = \sum_{k=0}^{n} k^r$
Chapter 1
Prologue: Compositions of an integer
This chapter is intended as a preview of coming attractions, and features a case study of the following simple, but evocative, enumeration problem: Determine the number of compositions of the positive integer 𝑛, i.e., the number of ways in which 𝑛 may be expressed as an ordered sum of one or more positive integers (called the parts of the composition). In exploring this problem, we will encounter three different types of solutions, namely, closed forms, recursive formulas, and generating functions, each of which will make frequent appearances in the remainder of this text. Pursuing a special case of this problem, we consider compositions in which each part is equal to 1 or 2, encountering Fibonacci numbers in one of their many intriguing appearances in combinatorics. We conclude with a proof of the Fundamental Theorem of Composition Enumeration, which furnishes a simple generating function solution to the problem of counting compositions under arbitrary restrictions on their parts.
In what follows, the number of elements belonging to the finite set 𝐴 is denoted by |𝐴|. We assume from elementary combinatorics only the fact that an 𝑛-element set has $2^n$ subsets and $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$ subsets of cardinality 𝑘. In particular, there are $\binom{n}{k}$ sequential arrangements of 𝑘 indistinguishable symbols of one type (say, 𝑥’s) and 𝑛 − 𝑘 indistinguishable symbols of another type (say, 𝑦’s), since any such arrangement is completely determined by choosing 𝑘 of the 𝑛 positions in the sequence to be occupied by 𝑥’s. The term bijection is synonymous with what is frequently called a one-to-one correspondence. A bijection from a set 𝐴 to itself is called a permutation. The terms map and mapping are employed as synonyms for the term function. We often denote the function 𝑓 from 𝐴 to 𝐵 given by 𝑓(𝑥) = 𝑦 by the simple notation 𝑥 ⟼ 𝑦, eliminating the need to employ a function symbol. The set {0, 1, 2, . . . } of nonnegative integers is denoted by ℕ, and the set {1, 2, . . . } of positive integers by ℙ. We employ the notation $\sum_{n\ge k} a_n$ for the infinite series $a_k + a_{k+1} + a_{k+2} + \cdots$ as an alternative to $\sum_{n=k}^{\infty} a_n$.
In attempting to count a set 𝐴, it is often useful to employ the strategy of exhibiting a bijection between 𝐴 and some other (more easily enumerated) set 𝐵, and counting 𝐵 instead. Similarly, to establish a numerical identity of the form 𝑎 = 𝑏, one can identify sets 𝐴 and 𝐵, with |𝐴| = 𝑎 and |𝐵| = 𝑏, and exhibit a bijection between 𝐴 and 𝐵. Arguments of this sort, which are ubiquitous in enumerative combinatorics, are known as bijective (or combinatorial) proofs.
1.1. Counting compositions
Let comp(𝑛) denote the total number of compositions of the positive integer 𝑛. It is easy to see that comp(1) = 1, comp(2) = 2, comp(3) = 4, and comp(4) = 8, which suggests that
$$\operatorname{comp}(n) = 2^{n-1}. \tag{1.1.1}$$
Here is a bijective proof of the above formula. First, note that each composition of 𝑛 corresponds in one-to-one fashion with a certain sequential arrangement of 𝑛 balls, denoted by the symbol •, and 𝑗 bars, denoted by the symbol |, where 0 ≤ 𝑗 ≤ 𝑛 − 1. If 𝑛 = 3, for example, the correspondence is given by 1 + 1 + 1 ⟼ •|•|•; 1 + 2 ⟼ •|••; 2 + 1 ⟼ ••|•; 3 ⟼ •••. Each such arrangement is determined by choosing, in one of $2^{n-1}$ possible ways, a subset (possibly empty) of the 𝑛 − 1 spaces between the 𝑛 balls and inserting a bar in each of the selected spaces. Of course, selecting exactly 𝑘 − 1 of the 𝑛 − 1 spaces and inserting a bar in those spaces results in an arrangement corresponding to a composition of 𝑛 with exactly 𝑘 parts. So if we denote by comp(𝑛, 𝑘) the number of compositions of 𝑛 with exactly 𝑘 parts (henceforth, 𝑘-compositions of 𝑛), it follows that
$$\operatorname{comp}(n,k) = \binom{n-1}{k-1}. \tag{1.1.2}$$
The formula comp(𝑛) = $2^{n-1}$ provides a closed form solution to the problem of enumerating compositions of 𝑛. Another way of specifying comp(𝑛) involves a recursive formula, consisting of the initial value comp(1) = 1, along with the recurrence relation (more simply, recurrence)
$$\operatorname{comp}(n) = \operatorname{comp}(n-1) + \operatorname{comp}(n-1) = 2\operatorname{comp}(n-1), \qquad n \ge 2. \tag{1.1.3}$$
This recurrence follows from the observation that, among all compositions of 𝑛, there are comp(𝑛 − 1) compositions with initial part equal to 1 (since the map $1 + n_2 + \cdots \mapsto n_2 + \cdots$ is a bijection from the set of such compositions to the set of all compositions of 𝑛 − 1), as well as comp(𝑛 − 1) compositions with initial part greater than 1 (since the map $n_1 + \cdots \mapsto (n_1 - 1) + \cdots$ is a bijection from the set of such compositions to the set of all compositions of 𝑛 − 1). Using this recursive formula, it is trivial to verify the closed form comp(𝑛) = $2^{n-1}$ by induction. It is, however, not always this easy to derive closed forms from recursive formulas. Indeed, we will encounter enumeration problems for which we will have to be content with recursive solutions.
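As a sanity check on (1.1.1)–(1.1.3), here is a minimal Python sketch. The function names `compositions` and `comp_recursive` are hypothetical, introduced only for this illustration. The generator builds each composition of 𝑛 from a subset of the 𝑛 − 1 gaps between balls, exactly as in the bijective proof, and the loop compares the resulting counts with the closed forms and the recurrence.

```python
from itertools import product
from math import comb

def compositions(n):
    """Yield every composition of n (n >= 1) as a tuple of positive parts.

    Each composition corresponds to a choice, for each of the n - 1 gaps
    between n balls, of whether or not to insert a bar there.
    """
    for bars in product((False, True), repeat=n - 1):
        parts, run = [], 1
        for bar in bars:
            if bar:                 # a bar closes the current part
                parts.append(run)
                run = 1
            else:                   # no bar: the next ball joins this part
                run += 1
        parts.append(run)
        yield tuple(parts)

def comp_recursive(n):
    """comp(1) = 1 and comp(n) = 2 comp(n - 1) for n >= 2, as in (1.1.3)."""
    value = 1
    for _ in range(2, n + 1):
        value *= 2
    return value

for n in range(1, 11):
    found = list(compositions(n))
    assert len(set(found)) == len(found)                 # no repeats
    assert all(sum(c) == n for c in found)               # each sums to n
    assert len(found) == comp_recursive(n) == 2 ** (n - 1)          # (1.1.1)
    for k in range(1, n + 1):
        assert sum(1 for c in found if len(c) == k) == comb(n - 1, k - 1)  # (1.1.2)

print(sorted(compositions(3)))   # [(1, 1, 1), (1, 2), (2, 1), (3,)]
```

Enumerating all $2^{n-1}$ gap subsets is of course exponential; the sketch is meant only to confirm the formulas for small 𝑛.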
1.2. The Fibonacci numbers from a combinatorial perspective
Let $F_n$ denote the number of compositions of 𝑛 in which each part is equal to 1 or 2. Listing and counting a few such compositions reveals the following.
$F_1 = 1$, the sole relevant composition of 1 being 1.
$F_2 = 2$, the relevant compositions of 2 being 1 + 1 and 2.
$F_3 = 3$, the relevant compositions of 3 being 1 + 1 + 1, 1 + 2, and 2 + 1.
$F_4 = 5$, the relevant compositions of 4 being 1 + 1 + 1 + 1, 1 + 1 + 2, 1 + 2 + 1, 2 + 1 + 1, and 2 + 2.
$F_5 = 8$, the relevant compositions of 5 being 1 + 1 + 1 + 1 + 1, 1 + 1 + 1 + 2, 1 + 1 + 2 + 1, 1 + 2 + 1 + 1, 1 + 2 + 2, 2 + 1 + 1 + 1, 2 + 1 + 2, and 2 + 2 + 1.
Note that in each row 𝑛 above, the compositions with initial part equal to 1 are listed first, followed by those with initial part equal to 2. When 𝑛 ≥ 3, those with initial part equal to 1 arise by appending to that 1 each of the compositions in row 𝑛 − 1, and those with initial part equal to 2 arise by appending to that 2 each of the compositions in row 𝑛 − 2. This suggests that
$$F_n = F_{n-1} + F_{n-2}, \qquad n \ge 3. \tag{1.2.1}$$
This recurrence is easily established by noting that, among all compositions of 𝑛 in which each part is equal to 1 or 2, $F_{n-1}$ counts those with initial part equal to 1, and $F_{n-2}$ counts those with initial part equal to 2. (What are the bijections that rigorously establish these claims?) Can we derive a closed form expression for $F_n$? Set $F_0 = 1$, so that (1.2.1) holds for all 𝑛 ≥ 2, and consider the ordinary generating function 𝐹(𝑥) of the sequence $(F_n)_{n\ge0}$, where
$$F(x) = \sum_{n\ge0} F_n x^n. \tag{1.2.2}$$
The infinite series appearing in formula (1.2.2) is the first of a multitude that we will encounter in the following chapters. The convergence of such series, and the manipulations that we will perform on them, can be justified analytically (see Appendix A below for a review of the relevant theorems). As an alternative, however, sums such as (1.2.2) can be treated as so-called formal power series, for which convergence questions become almost trivial. See Chapter 13 below for details.
The following derivation of a closed form for 𝐹(𝑥) from (1.2.1) illustrates a useful general technique for deriving a generating function from a recursive formula. Write
$$\begin{aligned} F(x) &= F_0 + F_1x + \sum_{n\ge2}(F_{n-1}+F_{n-2})x^n = 1 + x + x\sum_{n\ge2}F_{n-1}x^{n-1} + x^2\sum_{n\ge2}F_{n-2}x^{n-2} \\ &= 1 + x + x\sum_{n\ge1}F_nx^n + x^2\sum_{n\ge0}F_nx^n = 1 + x + x\big(F(x)-1\big) + x^2F(x). \end{aligned}$$
Since the right-hand side simplifies to $1 + xF(x) + x^2F(x)$, solving for 𝐹(𝑥) yields
$$F(x) = \frac{1}{1-x-x^2}. \tag{1.2.3}$$
We next expand (1.2.3) as a power series. By the uniqueness of such series, the coefficient of $x^n$ in that series will furnish a closed form expression for $F_n$. Our first step is to factor the polynomial $1 - x - x^2$. For reasons that will shortly be clear, we want this factorization to take the form $(1-ax)(1-bx)$. Since such factorizations are often useful in combinatorics, let us pause and consider the general case. Suppose that we wish to factor the polynomial $p(x) = c_0 + \cdots + c_nx^n$, where $c_0c_n \ne 0$, in the form
$$p(x) = c_0(1-\rho_1x)\cdots(1-\rho_nx) \tag{1.2.4}$$
rather than the form, usually sought in algebra,
$$p(x) = c_n(x-r_1)\cdots(x-r_n). \tag{1.2.5}$$
Equating the constant terms in (1.2.4) and (1.2.5) yields $c_n = c_0/\big((-r_1)\cdots(-r_n)\big)$, and substituting this expression for $c_n$ in (1.2.5) yields
$$p(x) = c_0\Big(1 - \frac{1}{r_1}x\Big)\cdots\Big(1 - \frac{1}{r_n}x\Big). \tag{1.2.6}$$
So the parameters $\rho_i$ in what we’ll call the combinatorial factorization (1.2.4) of 𝑝(𝑥) are simply the reciprocals of the roots of 𝑝(𝑥). But things are even easier than that, for the reciprocals of the roots of 𝑝(𝑥) are just the roots of the reciprocal polynomial of 𝑝(𝑥), denoted by 𝑅𝑝(𝑥), where
$$Rp(x) := x^n p\Big(\frac{1}{x}\Big) = c_0x^n + c_1x^{n-1} + \cdots + c_{n-1}x + c_n, \tag{1.2.7}$$
since $p(r) = 0 \Rightarrow Rp(1/r) = r^{-n}p(r) = 0$. In the case (1.2.3) above, $p(x) = 1 - x - x^2$, and so $Rp(x) = x^2 - x - 1$. The roots of 𝑅𝑝(𝑥) are the golden ratio $\Phi := \frac{1+\sqrt5}{2}$ and its conjugate $\phi := \frac{1-\sqrt5}{2}$ $(= 1 - \Phi)$. The partial fraction decomposition of (1.2.3) yields
$$F(x) = \frac{1}{1-x-x^2} = \frac{1}{(1-\Phi x)(1-\phi x)} = \frac{\alpha}{1-\Phi x} + \frac{\beta}{1-\phi x}, \tag{1.2.8}$$
where $\alpha = \Phi/\sqrt5$ and $\beta = -\phi/\sqrt5$. Expanding $\alpha/(1-\Phi x)$ and $\beta/(1-\phi x)$ as the geometric series $\sum_{n\ge0}\alpha\Phi^nx^n$ and $\sum_{n\ge0}\beta\phi^nx^n$ then yields $F_n = \alpha\Phi^n + \beta\phi^n$, i.e., the closed form
$$F_n = \frac{1}{\sqrt5}\Phi^{n+1} - \frac{1}{\sqrt5}\phi^{n+1} = \frac{1}{\sqrt5}\Big(\frac{1+\sqrt5}{2}\Big)^{n+1} - \frac{1}{\sqrt5}\Big(\frac{1-\sqrt5}{2}\Big)^{n+1}. \tag{1.2.9}$$
Note that $|\Phi| > 1$ and $|\phi| < 1$. Two easily proved consequences of this are the asymptotic formula
$$F_n \sim \frac{1}{\sqrt5}\Big(\frac{1+\sqrt5}{2}\Big)^{n+1}, \tag{1.2.10}$$
where $a_n \sim b_n$ means that $\lim_{n\to\infty} a_n/b_n = 1$, and the limit
$$\lim_{n\to\infty}\frac{F_n}{F_{n-1}} = \frac{1+\sqrt5}{2}, \tag{1.2.11}$$
which asserts that the asymptotic growth rate of $F_n$ is equal to the golden ratio.
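The factorization and closed form above can be spot-checked in floating point. In the Python sketch below (the names `PHI`, `phi`, `p`, `fib_combinatorial`, and `fib_closed_form` are hypothetical), the recurrence with $F_0 = F_1 = 1$ is compared against (1.2.9), the reciprocals of the roots of $Rp(x) = x^2 - x - 1$ are confirmed to be roots of $p(x) = 1 - x - x^2$, and the ratios in (1.2.10) and (1.2.11) are printed for one moderately large 𝑛. This checks the formulas numerically under floating-point tolerance; it does not replace the derivation.

```python
from math import isclose, sqrt

PHI = (1 + sqrt(5)) / 2   # golden ratio: a root of Rp(x) = x^2 - x - 1
phi = (1 - sqrt(5)) / 2   # its conjugate, equal to 1 - PHI

def p(x):
    """The denominator p(x) = 1 - x - x^2 of F(x) in (1.2.3)."""
    return 1 - x - x * x

def fib_combinatorial(n):
    """F_0 = F_1 = 1 and F_n = F_{n-1} + F_{n-2} for n >= 2."""
    a, b = 1, 1                   # F_0, F_1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_closed_form(n):
    """Formula (1.2.9): F_n = (PHI^(n+1) - phi^(n+1)) / sqrt(5)."""
    return (PHI ** (n + 1) - phi ** (n + 1)) / sqrt(5)

# The reciprocals 1/PHI and 1/phi of the roots of Rp are roots of p, and
# p has the combinatorial factorization (1 - PHI x)(1 - phi x).
assert isclose(p(1 / PHI), 0, abs_tol=1e-12)
assert isclose(p(1 / phi), 0, abs_tol=1e-12)
for x in (-0.7, 0.1, 0.3):
    assert isclose(p(x), (1 - PHI * x) * (1 - phi * x), abs_tol=1e-12)

# (1.2.9) agrees with the recurrence (floating point, so round first).
for n in range(40):
    assert round(fib_closed_form(n)) == fib_combinatorial(n)

# (1.2.10) and (1.2.11): these ratios approach 1 and PHI, respectively.
n = 30
print(fib_combinatorial(n) / (PHI ** (n + 1) / sqrt(5)))   # very close to 1
print(fib_combinatorial(n) / fib_combinatorial(n - 1))     # very close to PHI
```

Rounding is needed because the floating-point evaluation of (1.2.9) is only approximately an integer; for 𝑛 below 40 the error is far smaller than 1/2.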
Remark 1.2.1. The numbers $F_n$ are of course the familiar Fibonacci numbers, but parameterized slightly differently from the “official” Fibonacci numbers (as dictated by the Fibonacci Society) $F_0 = 0$, $F_1 = 1$, $F_2 = 1$, etc. Our parameterization is frequently advantageous in stating combinatorial results, whereas the official parameterization often results in more salient statements of number-theoretic properties of the Fibonacci numbers. (Consider, for example, the charming identity $\gcd(F_m, F_n) = F_{\gcd(m,n)}$, which takes the unaesthetic form $\gcd(F_m, F_n) = F_{\gcd(m+1,n+1)-1}$ when expressed in terms of the combinatorial parameterization.)
Remark 1.2.2 (Fibonacci numbers and tilings). It is often advantageous to think of $F_n$ as enumerating tilings of the array 1 2 3 ⋯ 𝑛 using only squares (1 × 1 tiles) and dominoes (1 × 2 tiles). For example, the composition 2 + 1 + 1 + 2 + 2 + 1 of 9 is represented by the tiling [1 2][3][4][5 6][7 8][9], in which dominoes cover cells 1–2, 5–6, and 7–8, and squares cover cells 3, 4, and 9. More generally, any composition of 𝑛 can be represented as a tiling of the array 1 2 3 ⋯ 𝑛, where one is allowed to use tiles of dimension 1 × 𝑙 for 𝑙 = 1, . . . , 𝑛. In the exercises you are asked to furnish tiling-based proofs of the identities
$$F_{m+n} = F_mF_n + F_{m-1}F_{n-1}, \quad \text{for all } m, n \ge 1, \tag{1.2.12}$$
and
$$\sum_{k=0}^{n} F_k = F_{n+2} - 1. \tag{1.2.13}$$
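Before attempting the tiling-based proofs, it can help to see the tilings and test the identities on small cases. The Python sketch below (the function name `tilings` is hypothetical) lists every square-and-domino tiling of a 1 × 𝑛 board, recorded as the composition it represents, and checks (1.2.12) and (1.2.13) numerically. Such a check is evidence, not the combinatorial proof the exercises ask for.

```python
def tilings(n):
    """All tilings of a 1 x n board by squares (size 1) and dominoes (size 2).

    Each tiling is recorded as the composition of n it represents, read from
    left to right; e.g. (2, 1, 1, 2, 2, 1) is the tiling of Remark 1.2.2.
    """
    if n == 0:
        return [()]                                    # the empty board has one tiling
    result = [(1,) + t for t in tilings(n - 1)]        # starts with a square
    if n >= 2:
        result += [(2,) + t for t in tilings(n - 2)]   # starts with a domino
    return result

F = [len(tilings(n)) for n in range(16)]   # F[n] is the combinatorial F_n

# (1.2.12): F_{m+n} = F_m F_n + F_{m-1} F_{n-1} for m, n >= 1.
for m in range(1, 8):
    for n in range(1, 8):
        assert F[m + n] == F[m] * F[n] + F[m - 1] * F[n - 1]

# (1.2.13): F_0 + F_1 + ... + F_n = F_{n+2} - 1.
for n in range(12):
    assert sum(F[: n + 1]) == F[n + 2] - 1

print(F[:8])                              # [1, 1, 2, 3, 5, 8, 13, 21]
print((2, 1, 1, 2, 2, 1) in tilings(9))   # True
```

The split in `tilings` by the first tile is the same taxonomy (initial part 1 or 2) used to prove (1.2.1).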
1.3. Weak compositions
Suppose that 𝑛 is a nonnegative integer and (for the moment) 𝑘 is a positive integer. A weak composition of 𝑛 with 𝑘 parts (or weak 𝑘-composition of 𝑛) is a way of expressing 𝑛 as an ordered sum of 𝑘 nonnegative integers. For example, the weak 3-compositions of 2 are 0 + 0 + 2, 0 + 1 + 1, 0 + 2 + 0, 1 + 0 + 1, 1 + 1 + 0, and 2 + 0 + 0. There is clearly a one-to-one correspondence between the set of weak 𝑘-compositions of 𝑛 and the set of arbitrary sequential arrangements of 𝑛 copies of the symbol • and 𝑘 − 1 copies of the symbol |. In the preceding case (with such sequential arrangements bracketed by the symbols ⟦ and ⟧ for ease of reading), the correspondence is given by 0 + 0 + 2 ⟼ ⟦||••⟧; 0 + 1 + 1 ⟼ ⟦|•|•⟧; 0 + 2 + 0 ⟼ ⟦|••|⟧; 1 + 0 + 1 ⟼ ⟦•||•⟧; 1 + 1 + 0 ⟼ ⟦•|•|⟧; and 2 + 0 + 0 ⟼ ⟦••||⟧. (Note that when 𝑛 = 0 and 𝑘 = 1, the correspondence is given by 0 ⟼ ⟦ ⟧.) If we denote by wcomp(𝑛, 𝑘) the number of weak 𝑘-compositions of 𝑛, it follows that
$$\operatorname{wcomp}(n,k) = \binom{n+k-1}{k-1} = \binom{n+k-1}{n}, \qquad n \ge 0,\ k \ge 1, \tag{1.3.1}$$
for a sequential arrangement of 𝑛 balls and 𝑘 − 1 bars is fully determined by the choice of which of the 𝑛 + 𝑘 − 1 positions in this arrangement will be occupied by bars or, alternatively, which of those 𝑛 + 𝑘 − 1 positions will be occupied by balls.
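The balls-and-bars argument for (1.3.1) translates directly into code. In this Python sketch (the name `weak_compositions` is hypothetical), each weak 𝑘-composition of 𝑛 is produced by choosing which 𝑘 − 1 of the 𝑛 + 𝑘 − 1 positions hold bars, and the number of balls between consecutive bars gives the parts. The loop then compares the counts with both binomial forms in (1.3.1).

```python
from itertools import combinations
from math import comb

def weak_compositions(n, k):
    """Yield the weak k-compositions of n (n >= 0, k >= 1) as tuples.

    Choose the k - 1 bar positions among n + k - 1 slots; the remaining
    slots hold balls, and the balls between consecutive bars form a part.
    """
    total = n + k - 1
    for bar_positions in combinations(range(total), k - 1):
        parts, previous = [], -1
        for pos in bar_positions:
            parts.append(pos - previous - 1)   # balls strictly between bars
            previous = pos
        parts.append(total - previous - 1)     # balls after the last bar
        yield tuple(parts)

print(list(weak_compositions(2, 3)))
# [(0, 0, 2), (0, 1, 1), (0, 2, 0), (1, 0, 1), (1, 1, 0), (2, 0, 0)]

for n in range(7):
    for k in range(1, 6):
        found = list(weak_compositions(n, k))
        assert all(len(c) == k and sum(c) == n for c in found)
        assert len(set(found)) == len(found) == comb(n + k - 1, k - 1) == comb(n + k - 1, n)
```

Because `itertools.combinations` emits bar positions in lexicographic order, the weak 3-compositions of 2 come out in the same order as the list in the text.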
1.4. Compositions with arbitrarily restricted parts
If we adopt the common mathematical convention that empty sums (i.e., sums with no parts) take the value zero, we may define comp(𝑛), comp(𝑛, 𝑘), and wcomp(𝑛, 𝑘) for all nonnegative integers 𝑛 and 𝑘, the value 𝑘 = 0 corresponding to an empty sum. In particular, comp(0) = 1 and comp(𝑛, 0) = wcomp(𝑛, 0) = $\delta_{n,0}$ for all 𝑛 ∈ ℕ.
Keeping the preceding remarks in mind, suppose that we are given a nonempty set 𝐼 of nonnegative integers, along with integers 𝑛, 𝑘 ≥ 0, and that we wish to determine the number of ways in which 𝑛 may be expressed as an ordered sum of 𝑘 members of 𝐼. Denote this number by 𝐶(𝑛, 𝑘; 𝐼).
Theorem 1.4.1. For each 𝑘 ≥ 0,
$$\sum_{n\ge0} C(n,k;I)\,x^n = \Big(\sum_{i\in I} x^i\Big)^k. \tag{1.4.1}$$
Proof. Since $C(n,0;I) = \delta_{n,0}$, the result holds for 𝑘 = 0. Since 𝐶(𝑛, 1; 𝐼) = 1 if 𝑛 ∈ 𝐼 and 𝐶(𝑛, 1; 𝐼) = 0 otherwise, the result holds for 𝑘 = 1. If 𝑘 ≥ 2, then
$$\begin{aligned} \Big(\sum_{i\in I}x^i\Big)^k &= \Big(\sum_{n_1\in I}x^{n_1}\Big)\cdots\Big(\sum_{n_k\in I}x^{n_k}\Big) = \sum_{n_1,\dots,n_k\in I}x^{n_1+\cdots+n_k} \\ &= \sum_{n\ge0}\big|\{(n_1,\dots,n_k) : \text{each } n_j\in I \text{ and } n_1+\cdots+n_k = n\}\big|\,x^n \quad \text{(collecting like powers)} \\ &= \sum_{n\ge0} C(n,k;I)\,x^n. \end{aligned}$$
□
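Theorem 1.4.1 lends itself to a direct numerical check. In the Python sketch below, the helpers `poly_mul`, `C_from_power`, and `C_brute_force` are hypothetical names, and 𝐼 is assumed to be a finite set so that it can be stored explicitly. The coefficients of $(\sum_{i\in I}x^i)^k$ are obtained by repeated polynomial multiplication truncated at a fixed degree, and compared with a brute-force count of 𝑘-tuples from 𝐼 that sum to 𝑛.

```python
from itertools import product

def poly_mul(a, b, degree):
    """Product of two coefficient lists, discarding terms above x^degree."""
    out = [0] * (degree + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= degree:
                out[i + j] += ai * bj
    return out

def C_from_power(I, k, degree):
    """Coefficients of x^0, ..., x^degree in (sum_{i in I} x^i)^k,
    the right-hand side of (1.4.1); I is a finite set of integers >= 0."""
    base = [1 if i in I else 0 for i in range(degree + 1)]
    result = [1] + [0] * degree          # the empty product, for k = 0
    for _ in range(k):
        result = poly_mul(result, base, degree)
    return result

def C_brute_force(n, k, I):
    """C(n, k; I): the number of k-tuples of members of I that sum to n."""
    candidates = [i for i in I if i <= n]
    return sum(1 for t in product(candidates, repeat=k) if sum(t) == n)

N = 10
for I in ({1, 2}, {0, 2, 3}, {1, 3, 4, 7}):
    for k in range(5):
        assert C_from_power(I, k, N) == [C_brute_force(n, k, I) for n in range(N + 1)]

print(C_from_power({1, 2}, 3, 6))   # [0, 0, 0, 1, 3, 3, 1]
```

The printed list reflects $(x + x^2)^3 = x^3 + 3x^4 + 3x^5 + x^6$. Note that the check includes a set containing 0, for which 𝑘-tuples may contain zeros, as in weak compositions.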
1.5. The fundamental theorem of composition enumeration
If the set 𝐼 of allowable parts of a composition contains only positive integers, then 𝐶(𝑛, 𝑘; 𝐼) = 0 whenever 𝑘 > 𝑛. So $C(n;I) := \sum_{k=0}^{n} C(n,k;I)$ counts the total number of compositions of 𝑛 with all parts in 𝐼.
Theorem 1.5.1 (The fundamental theorem of composition enumeration). For every nonempty set 𝐼 of positive integers,
$$\sum_{n\ge0} C(n;I)\,x^n = \frac{1}{1-\sum_{i\in I}x^i}. \tag{1.5.1}$$
Proof.
$$\sum_{n\ge0}C(n;I)x^n = \sum_{n\ge0}\sum_{k=0}^{n}C(n,k;I)x^n = \sum_{k\ge0}\sum_{n\ge k}C(n,k;I)x^n = \sum_{k\ge0}\Big(\sum_{i\in I}x^i\Big)^k = \frac{1}{1-\sum_{i\in I}x^i}.$$
□
The summation interchange above can be proved analytically (see Appendix A) or by the theory of formal power series (as shown in section 13.5).
An alternative derivation of formula (1.2.3). Since $F_n = C(n;\{1,2\})$, it follows immediately from Theorem 1.5.1 that
$$\sum_{n\ge0} F_nx^n = \frac{1}{1-x-x^2}. \tag{1.5.2}$$
In the preceding, we were able to write down the ordinary generating function of the Fibonacci numbers directly, without first having to derive a recursive formula. On the other hand, starting with formula (1.5.2), one can recover the recursive formula for these numbers by multiplying each side of (1.5.2) by $1 - x - x^2$ and matching coefficients of like powers of 𝑥. More generally, whenever the ordinary generating function of a combinatorial sequence is a rational function (i.e., a ratio of two polynomials), a similar method may be used to find a recursive formula for the sequence. See Chapter 9 for an elaboration of these remarks.
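Both routes to $C(n;I)$ described above can be compared on small cases. The Python sketch below (all helper names are hypothetical, and 𝐼 is assumed to be a finite set of positive integers) computes the coefficients three ways: by summing the truncated powers $(\sum_{i\in I}x^i)^k$ as in the proof of Theorem 1.5.1; by the recurrence obtained from multiplying (1.5.1) by $1 - \sum_{i\in I}x^i$ and matching coefficients, namely $C(0;I) = 1$ and $C(n;I) = \sum_{i\in I,\, i\le n} C(n-i;I)$ for 𝑛 ≥ 1; and by brute-force listing of compositions.

```python
from itertools import product

def poly_mul(a, b, degree):
    """Product of two coefficient lists, truncated after x^degree."""
    out = [0] * (degree + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= degree:
                out[i + j] += ai * bj
    return out

def C_via_theorem(I, degree):
    """Coefficients of sum_{k>=0} (sum_{i in I} x^i)^k up to x^degree.

    Every part is positive, so the k-th power has no terms below x^k and
    powers with k > degree cannot affect the coefficients kept here.
    """
    base = [1 if i in I else 0 for i in range(degree + 1)]
    power = [1] + [0] * degree           # the k = 0 term
    total = power[:]
    for _ in range(degree):
        power = poly_mul(power, base, degree)
        total = [t + q for t, q in zip(total, power)]
    return total

def C_via_recurrence(I, degree):
    """Match coefficients in (1 - sum_{i in I} x^i) G(x) = 1."""
    C = [1] + [0] * degree
    for n in range(1, degree + 1):
        C[n] = sum(C[n - i] for i in I if i <= n)
    return C

def C_brute_force(n, I):
    """List all compositions of n via gap subsets (section 1.1) and keep
    those whose parts all lie in I."""
    if n == 0:
        return 1                          # the empty composition
    count = 0
    for cuts in product((False, True), repeat=n - 1):
        parts, run = [], 1
        for cut in cuts:
            if cut:
                parts.append(run)
                run = 1
            else:
                run += 1
        parts.append(run)
        if all(part in I for part in parts):
            count += 1
    return count

N = 12
for I in ({1, 2}, {1, 3, 4}, {2, 5}, set(range(1, N + 1))):
    expected = [C_brute_force(n, I) for n in range(N + 1)]
    assert C_via_theorem(I, N) == expected == C_via_recurrence(I, N)

print(C_via_recurrence({1, 2}, 9))   # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

With 𝐼 = {1, 2} the recurrence in `C_via_recurrence` is exactly $F_n = F_{n-1} + F_{n-2}$ with $F_0 = F_1 = 1$, which is the recovery of (1.2.1) from (1.5.2) described in the text.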
1.6. Basic tools for manipulating finite sums
Two commonly occurring, and closely related, techniques in enumerative combinatorics involve the reindexing of sums and the interchange of multiple sums. Here, for ease of reference, is a brief catalogue of commonly used tools. The quantities occurring in the following sums are typically nonnegative integers in this text, but the formulas obviously hold much more generally. For a review of the basic tools for manipulating infinite sums, readers should consult Appendix A, and, for an alternative treatment by means of the theory of formal power series, Chapter 13.
Top-down summation. If 𝑚 and 𝑛 are integers, with 𝑚 ≤ 𝑛, then
$$\sum_{k=m}^{n} a_k = \sum_{k=0}^{n-m} a_{n-k} = \sum_{k=m}^{n} a_{m+n-k}. \tag{1.6.1}$$
As suggested by the name, the leftmost sum above is equal to $a_m + a_{m+1} + \cdots + a_n$, and each of the other two sums to $a_n + a_{n-1} + \cdots + a_m$. When 𝑚 = 0, (1.6.1) reduces simply to $\sum_{k=0}^{n} a_k = \sum_{k=0}^{n} a_{n-k}$. Note, however, that $\sum_{k=1}^{n} a_k = \sum_{k=0}^{n-1} a_{n-k} = \sum_{k=1}^{n} a_{n+1-k}$.
Finite rectangular sums. Consider the following rectangular array.
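$$\begin{array}{cccc} a_{0,0} & a_{0,1} & \cdots & a_{0,m} \\ a_{1,0} & a_{1,1} & \cdots & a_{1,m} \\ \vdots & \vdots & & \vdots \\ a_{n,0} & a_{n,1} & \cdots & a_{n,m} \end{array} \tag{1.6.2}$$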
We may sum the elements of this array by (𝑖) summing each row, and then taking the sum of the row sums, or (𝑖𝑖) summing each column, and then taking the sum of the column sums. That is, if 𝑚, 𝑛 ≥ 0, then
$$\sum_{i=0}^{n}\Big(\sum_{j=0}^{m} a_{i,j}\Big) = \sum_{j=0}^{m}\Big(\sum_{i=0}^{n} a_{i,j}\Big). \tag{1.6.3}$$
Of course, a similar principle applies to the interchange of multiple sums, so long as the limits on the sums involved are independent of each other. Finite triangular sums. Consider the following triangular array.
$$\begin{array}{cccc} a_{0,0} & & & \\ a_{1,0} & a_{1,1} & & \\ \vdots & \vdots & \ddots & \\ a_{n,0} & a_{n,1} & \cdots & a_{n,n} \end{array} \tag{1.6.4}$$
Here again we can sum the elements of this array by (𝑖) summing each row, and then taking the sum of the row sums, or (𝑖𝑖) summing each column, and then taking the sum of the column sums. That is, for all 𝑛 ≥ 0,
$$\sum_{i=0}^{n}\Big(\sum_{j=0}^{i} a_{i,j}\Big) = \sum_{j=0}^{n}\Big(\sum_{i=j}^{n} a_{i,j}\Big). \tag{1.6.5}$$
One can write the sums in (1.6.5) more compactly in the form $\sum_{0\le j\le i\le n} a_{i,j}$, and this notation suggests a mechanical rule for fixing the limits of summation in higher-dimensional sums of triangular type when the summation indices are permuted: The limits of summation for a given summation index are the closest parameters to that index that have already been fixed. So, in the preceding example, when the order of summation is $\sum_i\sum_j$, we take 0 ≤ 𝑖 ≤ 𝑛 and, 𝑖 having now been fixed, we take 0 ≤ 𝑗 ≤ 𝑖. For the following triple triangular sum we have
$$\sum_{i=0}^{n}\sum_{j=0}^{i}\sum_{k=0}^{j} a_{i,j,k} = \sum_{0\le k\le j\le i\le n} a_{i,j,k} = \sum_{i=0}^{n}\sum_{k=0}^{i}\sum_{j=k}^{i} a_{i,j,k} = \text{etc.} \tag{1.6.6}$$
You are asked to complete formula (1.6.6) in Exercise 1.18 below.
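The triangular interchanges (1.6.5) and (1.6.6) can likewise be spot-checked. In this Python sketch of our own, the names `triangular_double` and `triangular_triple` are hypothetical. The entries are kept in a dictionary keyed by index tuples. The compact form is computed by filtering every triple in {0, . . . , 𝑛}³ through the chain 0 ≤ 𝑘 ≤ 𝑗 ≤ 𝑖 ≤ 𝑛. Only the orders actually written out in (1.6.6) are included.

```python
import itertools
import random

def triangular_double(a, n):
    """Both iterated sums in (1.6.5); a maps (i, j) with 0 <= j <= i <= n to a number."""
    rows_first = sum(a[(i, j)] for i in range(n + 1) for j in range(i + 1))
    cols_first = sum(a[(i, j)] for j in range(n + 1) for i in range(j, n + 1))
    return rows_first, cols_first

def triangular_triple(a, n):
    """The three sums written out in (1.6.6); a maps (i, j, k), 0 <= k <= j <= i <= n."""
    order_ijk = sum(a[(i, j, k)]
                    for i in range(n + 1) for j in range(i + 1) for k in range(j + 1))
    # Compact form: every triple satisfying the chain 0 <= k <= j <= i <= n.
    compact = sum(a[(i, j, k)]
                  for i, j, k in itertools.product(range(n + 1), repeat=3)
                  if 0 <= k <= j <= i <= n)
    # Order i, k, j: fix i, then k between 0 and i, then j between k and i.
    order_ikj = sum(a[(i, j, k)]
                    for i in range(n + 1) for k in range(i + 1) for j in range(k, i + 1))
    return order_ijk, compact, order_ikj

if __name__ == "__main__":
    rng = random.Random(1)
    n = 4
    a2 = {(i, j): rng.randint(-9, 9) for i in range(n + 1) for j in range(i + 1)}
    a3 = {(i, j, k): rng.randint(-9, 9)
          for i in range(n + 1) for j in range(i + 1) for k in range(j + 1)}
    print(triangular_double(a2, n), triangular_triple(a3, n))
```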
Exercises
In a number of the following exercises, you are asked to find recursive formulas (i.e., initial values and recurrence relations) for sequences of numbers that enumerate certain discrete structures, and to provide combinatorial proofs of their recurrences. These should be modeled after the arguments presented above in sections 1.1 and 1.2 to establish the recurrences (1.1.3) and (1.2.1). In each case you should express the set of discrete structures under consideration as the union of two or more pairwise disjoint subsets, and enumerate their union as the sum of the cardinalities of those subsets. Often, a very simple taxonomy (categorizing words, for example, according to their initial letter, or sets according to whether they contain, or do not contain, a particular distinguished element) will do the trick.
1.1. Prove formula (1.1.2) by showing that the mapping 𝑛1 + 𝑛2 + ⋯ + 𝑛𝑘 ⟼ {𝑛1, 𝑛1 + 𝑛2, . . . , 𝑛1 + 𝑛2 + ⋯ + 𝑛𝑘−1} from the set of all 𝑘-compositions of 𝑛 to the set of all (𝑘 − 1)-element subsets of the set {1, . . . , 𝑛 − 1} is a bijection. (When 𝑘 = 1, this is interpreted as the map 𝑛 ⟼ ∅.)
1.2. Determine recursive formulas, with combinatorial proofs, for the following sequences, where 𝑟, 𝑠 ≥ 2. A word of length 𝑛 in an alphabet 𝐴 is simply an ordered list of 𝑛 elements, with repetitions allowed, chosen from 𝐴. There is one word of length 0, the so-called empty word, which vacuously satisfies virtually any property. (Can you identify any exceptions?)
(a) 𝑤2(𝑛) := the number of words of length 𝑛 in {1, 2} with no two consecutive 1’s.
(b) 𝑤∗2(𝑛) := the number of words of length 𝑛 in {1, 2} with no three consecutive 1’s (two consecutive 1’s are allowed).
(c) 𝑤𝑟(𝑛) := the number of words of length 𝑛 in {1, 2, . . . , 𝑟} with no two consecutive 1’s.
(d) 𝑤𝑟,𝑠(𝑛) := the number of words of length 𝑛 in {1, 2, . . . , 𝑟} with no 𝑠 consecutive 1’s (𝑠 − 1 or fewer consecutive 1’s are allowed).
1.3. Determine recursive formulas, with combinatorial proofs, for the following sequences.
(a) 𝑠2(𝑛) := the number of subsets 𝐴 of {1, . . . , 𝑛} (or of ∅, if 𝑛 = 0) such that |𝑥 − 𝑦| ≥ 2 for all 𝑥, 𝑦 ∈ 𝐴 with 𝑥 ≠ 𝑦.
(b) 𝑠𝑟(𝑛) := the number of subsets 𝐴 of {1, . . . , 𝑛} (or of ∅, if 𝑛 = 0) such that |𝑥 − 𝑦| ≥ 𝑟 for all 𝑥, 𝑦 ∈ 𝐴 with 𝑥 ≠ 𝑦.
1.4. Give a combinatorial proof of the identity $F_n = \sum_{k=0}^{n}\binom{n-k}{k}$. Hint: What subclass of the compositions of 𝑛 enumerated by 𝐹𝑛 is counted by $\binom{n-k}{k}$?
1.5. Give tiling proofs of the following identities. (a) $\sum_{k=0}^{n} F_k = F_{n+2} - 1$, and (b) $F_{m+n} = F_m F_n + F_{m-1} F_{n-1}$, for all 𝑚, 𝑛 ≥ 1.
1.6. For real numbers 𝑎 and 𝑏, let 𝐺0 = 𝑎, 𝐺1 = 𝑏, and 𝐺𝑛 = 𝐺𝑛−1 + 𝐺𝑛−2 for 𝑛 ≥ 2. Prove that, with the exception of those cases in which 𝑎 + 𝑏Φ = 0, the asymptotic growth rate of 𝐺𝑛 is equal to Φ.
1.7. Define the sequence (𝐿𝑛)𝑛≥0 by the generating function $\sum_{n\ge 0} L_n x^n = \frac{2-x}{1-x-x^2}$. (a) Find a recursive formula for the numbers 𝐿𝑛 (which are known as Lucas numbers). (b) Show that 𝐿𝑛 = 𝐹𝑛 + 𝐹𝑛−2 for all 𝑛 ≥ 2. (c) Find a closed-form expression for 𝐿𝑛.
1.8. Prove formula (1.3.1) by showing that the mapping 𝑛1 + 𝑛2 + ⋯ + 𝑛𝑘 ⟼ {𝑛1 + 1, 𝑛1 + 𝑛2 + 2, . . . , 𝑛1 + 𝑛2 + ⋯ + 𝑛𝑘−1 + 𝑘 − 1} from the set of all weak 𝑘-compositions of 𝑛 to the set of all (𝑘 − 1)-element subsets of the set {1, 2, . . . , 𝑛 + 𝑘 − 1} is a bijection. (When 𝑘 = 1, this is interpreted as the map 𝑛 ⟼ ∅.)
1.9. (a) In how many ways may 𝑛 unlabeled (hence, indistinguishable) balls be distributed among 𝑘 boxes labeled 1, . . . , 𝑘? (b) In how many ways if at least one ball must be placed in each box?
1.10. Derive formula (1.1.1) from Theorem 1.5.1.
1.11. For all 𝑛 ∈ ℕ and 𝑘 ∈ ℙ, a sequence 0 = 𝑖0 ≤ 𝑖1 ≤ ⋯ ≤ 𝑖𝑘 = 𝑛 in ℕ is called a multichain of length k from 0 to n. (a) Determine the number of such multichains. (b) If 𝑛, 𝑘 ∈ ℙ, with 𝑘 ≤ 𝑛, a sequence 0 = 𝑖0 < 𝑖1 < ⋯ < 𝑖𝑘 = 𝑛 in ℕ is called a chain of length k from 0 to n. Determine the number of such chains.
1.12. Prove the following variant of Theorem 1.4.1. Suppose that (𝐼𝑖)1≤𝑖≤𝑘 is a sequence of nonempty sets of nonnegative integers, and let 𝐶(𝑛, 𝑘; (𝐼𝑖)) := the number of weak 𝑘-compositions 𝑛1 + ⋯ + 𝑛𝑘 = 𝑛 such that 𝑛𝑖 ∈ 𝐼𝑖, 𝑖 = 1, . . . , 𝑘. Then, for
all 𝑘 ≥ 1,
$$\sum_{n\ge 0} C(n,k;(I_i))\,x^n = \Bigl(\sum_{i\in I_1} x^i\Bigr)\cdots\Bigl(\sum_{i\in I_k} x^i\Bigr).$$
1.13. Let 𝑎𝑛 denote the number of compositions of 𝑛 with all parts odd, and let 𝑏𝑛 be the number of compositions of 𝑛 with all parts even. (a) Determine the ordinary generating function of (𝑎𝑛) and a recursive formula for this sequence, and give a combinatorial proof of this recursive formula. (b) Do the same for the sequence (𝑏𝑛).
1.14. Let 𝑐𝑛 denote the number of compositions of 𝑛 with all parts greater than 1. (a) Determine the ordinary generating function of (𝑐𝑛), and (b) determine a recursive formula for 𝑐𝑛. (c) Find a combinatorial proof of that recursive formula. (d) Express 𝑐𝑛 as a Fibonacci number for 𝑛 ≥ 2. (e) Find a combinatorial proof of your answer to part (d).
1.15. Let 𝑐𝑛 denote the number of compositions of 𝑛 with all parts congruent to 2 modulo 3. (a) Determine the ordinary generating function of (𝑐𝑛), and (b) use it to find a recursive formula for 𝑐𝑛. (c) Give a combinatorial proof of this recursive formula.
1.16. Suppose that $\sum_{n\ge 0} a_n x^n = \frac{1-x^4}{1-x-x^4}$. (a) Find a combinatorial interpretation of 𝑎𝑛, and (b) find a recursive formula for 𝑎𝑛.
1.17. Suppose that 𝑛 ≥ 2. Let ℰ be the set of compositions of 𝑛 with an even number of even parts, and let 𝒪 be the set of compositions of 𝑛 with an odd number of even parts. Find a bijection from ℰ to 𝒪.
1.18. Complete formula (1.6.6) above by determining the limits of summation in the last four sums below:
$$\sum_{i=0}^{n}\sum_{j=0}^{i}\sum_{k=0}^{j} a_{i,j,k} = \sum_{0\le k\le j\le i\le n} a_{i,j,k} = \sum_{i=0}^{n}\sum_{k=0}^{i}\sum_{j=k}^{i} a_{i,j,k} = \sum_{j=\square}^{\square}\sum_{k=\square}^{\square}\sum_{i=\square}^{\square} a_{i,j,k}$$
$$= \sum_{j=\square}^{\square}\sum_{i=\square}^{\square}\sum_{k=\square}^{\square} a_{i,j,k} = \sum_{k=\square}^{\square}\sum_{i=\square}^{\square}\sum_{j=\square}^{\square} a_{i,j,k} = \sum_{k=\square}^{\square}\sum_{j=\square}^{\square}\sum_{i=\square}^{\square} a_{i,j,k}.$$
1.19. Interchange summation in the double sum $\sum_{i=0}^{n}\sum_{j=0}^{2i} a_{i,j}$.
1.20. Suppose that 𝑛 ∈ ℕ, 𝑘 ∈ ℙ, and 𝑏1, . . . , 𝑏𝑘 are integers (possibly negative) such that 𝑏1 + ⋯ + 𝑏𝑘 ≤ 𝑛. Prove that there are $\binom{n-b_1-\cdots-b_k+k-1}{k-1}$ sequences of integers (𝑛1, . . . , 𝑛𝑘) satisfying (i) 𝑛𝑖 ≥ 𝑏𝑖 for 𝑖 = 1, . . . , 𝑘 and (ii) 𝑛1 + ⋯ + 𝑛𝑘 = 𝑛.
Chapter 2
Sets, functions, and relations
Readers who have had an undergraduate course in discrete mathematics or in logic and proof may be familiar with the material in sections 2.1, 2.7–2.9, and the beginning part of section 2.2. In that case, a quick reading of these sections will suffice. In the latter part of section 2.2 and in section 2.3 the addition rule and the product rule are derived from four postulates for the sets [𝑛]. Those uninterested in the derivation of these foundational rules of enumeration may simply adopt these rules as axioms.
2.0. Notation and terminology
We employ the following notation and terminology.
ℙ = {1, 2, . . . }, the set of positive integers.
ℕ = {0, 1, 2, . . . }, the set of nonnegative integers.
ℤ = {0, ±1, ±2, . . . }, the set of integers (from the German word Zahl).
The symbols 𝑖, 𝑗, 𝑘, 𝑚, and 𝑛 will always denote integers.
ℚ = the set of rational numbers.
ℝ = the set of real numbers.
ℂ = the set of complex numbers.
We shall avoid using the term natural number, since some authors use this term to denote a positive integer and others to denote a nonnegative integer. The following notation, due to Richard Stanley (1997), has been widely adopted in combinatorics. For all 𝑛 ∈ ℕ, [𝑛] = {𝑘 ∈ ℙ ∶ 1 ≤ 𝑘 ≤ 𝑛}. So [0] = ∅ and [𝑛] = {1, . . . , 𝑛} if 𝑛 ≥ 1. The set [𝑛] functions as our canonical 𝑛-element set, and allows us to avoid tedious repetition of the sentence “Let A be a set having 𝑛 elements” in stating definitions, problems, and theorems.
Set-theoretic notation. In this text, the symbol ⊆ denotes the subset relation, and the symbol ⊂ denotes the proper subset relation. The proposition 𝐴 ⊆ 𝐵 may also be expressed as 𝐵 ⊇ 𝐴, and the proposition 𝐴 ⊂ 𝐵 as 𝐵 ⊃ 𝐴. The number of elements belonging to the finite set 𝐴 is denoted by |𝐴|, and the set of all subsets of 𝐴 (called the power set of A) is denoted by $2^A$, a mnemonic for the familiar fact that $|2^A| = 2^{|A|}$. The intersection 𝐴 ∩ 𝐵 of sets 𝐴 and 𝐵 will sometimes be indicated simply by concatenation (i.e., by 𝐴𝐵). We denote by 𝐴 + 𝐵 the union of the disjoint sets 𝐴 and 𝐵, and by ∑𝑖 𝐴𝑖 the union of the family of pairwise disjoint subsets {𝐴𝑖}. Finally, $A \cap B^c$, the set-theoretic difference between A and B, will be denoted by 𝐴 − 𝐵 (an alternative notation being 𝐴\𝐵).
Conventions on operations indexed on the empty set. Empty sums = 0. Empty products = 1. Empty unions = ∅. Empty intersections = the universal set.
The principle of vacuous implication. In its most compact form, this principle might well have been listed in the previous section, since it simply asserts that the empty set is a subset of every set. But the full implications of this convention are frequently not appreciated. In particular, it entails that if there are no things with property 𝑃, then it is true that all things with property 𝑃 have property 𝑄. Along with the conventions detailed in the previous section, this principle is frequently invoked, as will be seen, to extend the enumeration of discrete structures to those involving the empty set.
A note on logical notation and terminology. If 𝑃 and 𝑄 are 𝑛-ary predicates and 𝑥1, . . . , 𝑥𝑛 are individual variables, then 𝑃(𝑥1, . . . , 𝑥𝑛) ⇒ 𝑄(𝑥1, . . . , 𝑥𝑛), where 𝑥1, . . . , 𝑥𝑛 are free (i.e., unquantified) variables, has the same meaning as ∀𝑥1 ⋯ ∀𝑥𝑛 (𝑃(𝑥1, . . . , 𝑥𝑛) → 𝑄(𝑥1, . . . , 𝑥𝑛)). The same is true when ⇒ is replaced by ⇔ and → is replaced by ↔.
The symbols ⇒ and ⇔ designate relations between propositions, and are verbalized, respectively, as “implies” and “is equivalent to”. The symbols → and ↔ are truth-functional connectives (known, respectively, as the conditional and the biconditional), with 𝑝 → 𝑞 and 𝑝 ↔ 𝑞 verbalized, respectively, as “if 𝑝, then 𝑞” and “𝑝 if and only if 𝑞”. These terminological niceties are, however, frequently ignored. So we say, for example, “If 𝑓 is differentiable on the open interval 𝐼, then 𝑓 is continuous on 𝐼”, with universal quantification over 𝐼 and 𝑓 being implicitly understood.
2.1. Functions
Given any nonempty sets 𝐴 and 𝐵, a function 𝑓 from 𝐴 to 𝐵 is a rule that assigns to each 𝑎 ∈ 𝐴 a single element of 𝐵, denoted by 𝑓(𝑎). The notation 𝑓 ∶ 𝐴 → 𝐵 indicates that 𝑓 is a function from 𝐴 to 𝐵. When a formula for 𝑓(𝑎) is available, one often indicates this by the notation 𝑎 ⟼ 𝑓(𝑎). Depending on the context, functions may also be called correspondences, functionals, maps, mappings, or transformations. The set 𝐴 is called the domain of 𝑓; the set 𝐵, the codomain of f. The range of f (or image of 𝐴 under 𝑓)
is the set im(𝑓) ∶= {𝑓(𝑎) ∶ 𝑎 ∈ 𝐴} = {𝑏 ∈ 𝐵 ∶ ∃𝑎 ∈ 𝐴 with 𝑓(𝑎) = 𝑏}. The set of all functions from 𝐴 to 𝐵 is denoted by $B^A$, a mnemonic, as we shall see, for the fact that $|B^A| = |B|^{|A|}$. The function 𝑖𝐴 ∶ 𝐴 → 𝐴, defined for all 𝑎 ∈ 𝐴 by 𝑖𝐴(𝑎) = 𝑎, is called the identity function on A. If 𝑓 ∶ 𝐴 → 𝐵 and 𝑔 ∶ 𝐵 → 𝐶, their composition 𝑔 ∘ 𝑓 ∶ 𝐴 → 𝐶 is defined for all 𝑎 ∈ 𝐴 by 𝑔 ∘ 𝑓(𝑎) ∶= 𝑔(𝑓(𝑎)). Composition of functions is associative, i.e., if ℎ ∶ 𝐶 → 𝐷, then ℎ ∘ (𝑔 ∘ 𝑓) = (ℎ ∘ 𝑔) ∘ 𝑓.
If 𝑓 ∶ 𝐴 → 𝐵, the graph of f is the set graph(𝑓) ∶= {(𝑎, 𝑓(𝑎)) ∶ 𝑎 ∈ 𝐴}. A function is fully determined by its domain, codomain, and graph (in fact, the latter two suffice, since the domain of a function can be recovered from its graph in the obvious way). This observation motivates a complementary, and often useful, conception of a function as a set of ordered pairs, according to which a function from 𝐴 to 𝐵 is a set 𝑓 ⊆ 𝐴 × 𝐵 such that, for every 𝑎 ∈ 𝐴, there is just one 𝑏 ∈ 𝐵 such that (𝑎, 𝑏) ∈ 𝑓. Employing this conception, we can make sense of functions with empty domains and/or codomains. In particular, given any set 𝐵 (empty or nonempty), there is just one function 𝑓 ∶ ∅ → 𝐵, namely, the empty function ∅. (Clearly, ∅ ⊆ ∅ × 𝐵 = ∅, and it is vacuously true that for all 𝑎 ∈ ∅, there is just one 𝑏 ∈ 𝐵 such that (𝑎, 𝑏) ∈ ∅.) In symbols, $B^{\varnothing} = \{\varnothing\}$. On the other hand, if 𝐴 ≠ ∅, there are no functions 𝑓 ∶ 𝐴 → ∅. In symbols, $\varnothing^{A} = \varnothing$ if 𝐴 ≠ ∅.
Remark 2.1.1. The conception of a function as a rule is known as the intensional conception, and the conception of a function as a set of ordered pairs as the extensional conception. The terms intension and extension are part of the vocabulary of logic and the philosophy of language, the former being synonymous with sense, meaning, and connotation, and the latter with reference and denotation.
Definition 2.1.2 (Injective, surjective, and bijective functions; equinumerous sets). Suppose that 𝑓 ∶ 𝐴 → 𝐵.
(i) 𝑓 is injective (in older terminology, one-to-one) if 𝑎1 ≠ 𝑎2 ⇒ 𝑓(𝑎1) ≠ 𝑓(𝑎2), or, equivalently, if 𝑓(𝑎1) = 𝑓(𝑎2) ⇒ 𝑎1 = 𝑎2. Note that the empty function from ∅ to 𝐵 is injective for every set 𝐵. It is easy to see that the composition of two or more injective functions is again injective. The noun associated with the adjective injective is injection.
(ii) 𝑓 is surjective (in older terminology, onto) if im(𝑓) = 𝐵, i.e., if, for all 𝑏 ∈ 𝐵, there exists at least one 𝑎 ∈ 𝐴 such that 𝑓(𝑎) = 𝑏. It is easy to see that the composition of two or more surjective functions is again surjective. The noun associated with the adjective surjective is surjection.
(iii) 𝑓 is bijective (in older terminology, one-to-one onto, or a one-to-one correspondence) if it is both injective and surjective, i.e., if, for all 𝑏 ∈ 𝐵, there exists a unique 𝑎 ∈ 𝐴, denoted 𝑓⁻¹(𝑏), such that 𝑓(𝑎) = 𝑏. If 𝑓 ∶ 𝐴 → 𝐵 is bijective, then 𝑓⁻¹, called the inverse of f, is a bijective function from 𝐵 to 𝐴, with 𝑓⁻¹ ∘ 𝑓 = 𝑖𝐴 (i.e., 𝑓⁻¹(𝑓(𝑎)) = 𝑎 for all 𝑎 ∈ 𝐴), and 𝑓 ∘ 𝑓⁻¹ = 𝑖𝐵 (i.e., 𝑓(𝑓⁻¹(𝑏)) = 𝑏 for all 𝑏 ∈ 𝐵). Note that the empty function from ∅ to 𝐵 is bijective if and only if 𝐵 = ∅. It is easy to see that the composition of two or more bijective functions is again bijective. The noun associated with the adjective bijective is bijection.
(iv) If there exists a bijection 𝑓 ∶ 𝐴 → 𝐵, we say that 𝐴 and 𝐵 are equinumerous (or equipotent, or equipollent, or have the same cardinality), symbolizing this relation
by 𝐴 ≅ 𝐵. It is easy to show that the relation ≅ is reflexive, symmetric, and transitive.
We shall use the following seemingly unremarkable theorem in a subsequent section.
Theorem 2.1.3. If 𝐴 ≅ 𝐵, with 𝑎 ∈ 𝐴 and 𝑏 ∈ 𝐵, then 𝐴 − {𝑎} ≅ 𝐵 − {𝑏}.
Proof. Suppose that 𝑓 ∶ 𝐴 → 𝐵 is a bijection. Define 𝑔 ∶ 𝐴 − {𝑎} → 𝐵 − {𝑏} by (𝑖) 𝑔(𝑥) = 𝑓(𝑥) if 𝑓(𝑥) ≠ 𝑏, and (𝑖𝑖) 𝑔(𝑥) = 𝑓(𝑎) if 𝑓(𝑥) = 𝑏. The function 𝑔 is a bijection. If 𝑓(𝑎) = 𝑏, then case (𝑖𝑖) never arises and 𝑔 is just the restriction of 𝑓. Otherwise 𝑔 agrees with 𝑓 except that the unique 𝑥 with 𝑓(𝑥) = 𝑏 is sent to 𝑓(𝑎), the one element of 𝐵 − {𝑏} that 𝑓 does not attain on 𝐴 − {𝑎}. □
Remark 2.1.4. Somewhat confusingly, the symbols 𝑓 and 𝑓⁻¹ are also used to denote certain set-valued functions of a set-valued variable. Specifically, if 𝑓 is any function from 𝐴 to 𝐵, then $f : 2^A \to 2^B$ is defined for all 𝐸 ⊆ 𝐴 by 𝑓(𝐸) ∶= {𝑓(𝑎) ∶ 𝑎 ∈ 𝐸}, and $f^{-1} : 2^B \to 2^A$ is defined for all 𝐻 ⊆ 𝐵 by 𝑓⁻¹(𝐻) ∶= {𝑎 ∈ 𝐴 ∶ 𝑓(𝑎) ∈ 𝐻}. The set 𝑓(𝐸) is called the image of E under f, and the set 𝑓⁻¹(𝐻) is called the preimage (or inverse image) of H under f. A function 𝑓 ∶ 𝐴 → 𝐵 is injective if and only if, for all 𝑏 ∈ 𝐵, 𝑓⁻¹({𝑏}) is either empty or a singleton set. It is surjective if and only if 𝑓(𝐴) = 𝐵, or, equivalently, for all 𝑏 ∈ 𝐵, 𝑓⁻¹({𝑏}) is nonempty.
Definition 2.1.5 (One- and two-sided inverses). If 𝑓 ∶ 𝐴 → 𝐵, a function 𝑔 ∶ 𝐵 → 𝐴 is called a left inverse of f if 𝑔 ∘ 𝑓 = 𝑖𝐴, a right inverse of f if 𝑓 ∘ 𝑔 = 𝑖𝐵, and a two-sided inverse of f if it is both a left and a right inverse of 𝑓.
Theorem 2.1.6. If 𝐴 and 𝐵 are nonempty sets, a function 𝑓 ∶ 𝐴 → 𝐵 has a left inverse if and only if it is injective.
Proof. Necessity. Suppose that 𝑔 is a left inverse of 𝑓 and 𝑓(𝑎1) = 𝑓(𝑎2). Then 𝑎1 = 𝑔(𝑓(𝑎1)) = 𝑔(𝑓(𝑎2)) = 𝑎2. Sufficiency. If 𝑓 is injective, define 𝑔 ∶ 𝐵 → 𝐴 by (𝑖) if 𝑏 ∈ im(𝑓), let 𝑔(𝑏) = the unique 𝑎 ∈ 𝐴 such that 𝑓(𝑎) = 𝑏, and (𝑖𝑖) if 𝑏 ∉ im(𝑓), let 𝑔(𝑏) = any element of 𝐴 whatsoever. Clearly 𝑔 ∘ 𝑓 = 𝑖𝐴. □
Corollary 2.1.7. If 𝐴 and 𝐵 are nonempty sets, and there exists an injection 𝑓 ∶ 𝐴 → 𝐵, then there exists a surjection 𝑔 ∶ 𝐵 → 𝐴.
Proof. By Theorem 2.1.6, 𝑓 has a left inverse 𝑔.
The function 𝑔 is surjective, since, for all 𝑎 ∈ 𝐴, there exists a 𝑏 ∈ 𝐵 (for example, 𝑏 = 𝑓(𝑎)) such that 𝑔(𝑏) = 𝑎. □
Theorem 2.1.8. If 𝐴 and 𝐵 are nonempty sets, a function 𝑓 ∶ 𝐴 → 𝐵 has a right inverse if and only if it is surjective.
Proof. Necessity. If ℎ is a right inverse of 𝑓, then, for all 𝑏 ∈ 𝐵, 𝑓(ℎ(𝑏)) = 𝑏, and so im(𝑓) = 𝐵. Sufficiency. If 𝑓 is surjective, then, for all 𝑏 ∈ 𝐵, 𝑓⁻¹({𝑏}) ≠ ∅. For each such 𝑏, let ℎ(𝑏) be equal to any member of 𝑓⁻¹({𝑏}). Clearly, 𝑓 ∘ ℎ = 𝑖𝐵. □
Remark 2.1.9 (The axiom of choice). In the above construction of the right inverse ℎ, we needed to select a member of each of the sets 𝑓⁻¹({𝑏}). If 𝐵, and hence the family {𝑓⁻¹({𝑏})}𝑏∈𝐵, is finite, such a selection is unproblematic in elementary set theory. If 𝐵
(and hence 𝐴) is infinite, however, it is necessary to justify this selection by invoking the axiom of choice, which extends the set-building capabilities of elementary set theory by postulating that for every family ℱ of nonempty subsets of a set 𝐴, there is a function 𝑐 ∶ ℱ → 𝐴, called a choice function, such that 𝑐(𝐸) ∈ 𝐸, for all 𝐸 ∈ ℱ. The axiom of choice has substantial applications in all areas of mathematics. A particularly good discussion of this axiom, along with its many equivalent formulations, may be found in Warner (1965, Volume II, Chapter XI). See also Velleman (2006). Corollary 2.1.10. If 𝐴 and 𝐵 are nonempty sets and there exists a surjection 𝑓 ∶ 𝐴 → 𝐵, then there exists an injection 𝑔 ∶ 𝐵 → 𝐴. Proof. By Theorem 2.1.8, 𝑓 has a right inverse 𝑔. Since 𝑓 ∘ 𝑔 = 𝑖𝐵 , 𝑓 is a left inverse of 𝑔, and so 𝑔 is injective by Theorem 2.1.6. □ Theorem 2.1.11. If 𝐴 and 𝐵 are nonempty sets, and the function 𝑓 ∶ 𝐴 → 𝐵 has a left inverse 𝑔 and a right inverse ℎ, then 𝑔 = ℎ. Proof. Exercise.
□
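For finite sets, the constructions in the proofs of Theorems 2.1.6 and 2.1.8 can be carried out directly. The following Python sketch uses our own conventions, not the book's. A function 𝑓 ∶ 𝐴 → 𝐵 is represented by a dictionary whose keys are the elements of 𝐴, and the helper names `left_inverse` and `right_inverse` are hypothetical. Because 𝐵 is finite, the "choice" in Theorem 2.1.8 is made by keeping the first preimage encountered, so no appeal to the axiom of choice is needed (compare Remark 2.1.9).

```python
def left_inverse(f, A, B):
    """Left inverse g: B -> A of an injective f: A -> B, as in the proof of Theorem 2.1.6.

    f is a dict with keys A and values in B; A must be nonempty.
    """
    if len(set(f.values())) != len(f):
        raise ValueError("f is not injective")
    fallback = next(iter(A))          # case (ii): b not in im(f) goes to any element of A
    g = {b: fallback for b in B}
    for a, b in f.items():            # case (i): b in im(f) goes to its unique preimage
        g[b] = a
    return g

def right_inverse(f, A, B):
    """Right inverse h: B -> A of a surjective f: A -> B, as in the proof of Theorem 2.1.8."""
    h = {}
    for a in A:                       # keep the first preimage met for each value f(a)
        h.setdefault(f[a], a)
    if set(h) != set(B):
        raise ValueError("f is not surjective")
    return h

if __name__ == "__main__":
    f = {1: "x", 2: "y", 3: "z"}                      # injective, not surjective onto B
    g = left_inverse(f, [1, 2, 3], ["w", "x", "y", "z"])
    print(all(g[f[a]] == a for a in f))               # g o f = i_A

    s = {1: "x", 2: "x", 3: "y", 4: "y"}              # surjective onto {"x", "y"}
    h = right_inverse(s, [1, 2, 3, 4], ["x", "y"])
    print(h, all(s[h[b]] == b for b in ["x", "y"]))   # f o h = i_B
```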
Remark 2.1.12. As a consequence of Theorem 2.1.11, if 𝑓 ∶ 𝐴 → 𝐵 is a bijection, then its inverse 𝑓⁻¹ is both the sole left inverse and the sole right inverse of 𝑓. Consequently, if we know that 𝑓 is bijective, then to show that 𝑓⁻¹ = 𝑔, it suffices to show either that 𝑔 is a left inverse of 𝑓 or that 𝑔 is a right inverse of 𝑓.
We conclude this section by noting that combinatorial problems involving functions are often more easily approached when interpreted in one of the following concrete ways.
Functions as words. If 𝐴 ⊆ ℤ, a function 𝑓 ∶ 𝐴 → 𝐵 is called a sequence, or word, in B. Usually, 𝐴 = ℕ, ℙ, or [𝑛]. In the latter case, 𝑓 is called a sequence of length n in B, symbolized (𝑓(1), . . . , 𝑓(𝑛)), or a word of length n in the alphabet B, symbolized 𝑓(1) ⋯ 𝑓(𝑛). Note that for each 𝑏 ∈ 𝐵, 𝑓⁻¹({𝑏}) is the set of positions in which the letter 𝑏 occurs. A word in 𝐵 is injective if and only if each letter in 𝐵 appears at most once in the word, and it is surjective if and only if each letter in 𝐵 appears at least once in the word. An injective word of length 𝑘 in 𝐵 is called a permutation of the letters in B, taken k at a time. If |𝐵| = 𝑛, a permutation of the letters in 𝐵, taken 𝑛 at a time, is simply called a permutation of B. In closely associated usage, the term permutation of B is also used to denote any bijection 𝜋 ∶ 𝐵 → 𝐵.
Functions as distributions. If 𝐴 is a set of labeled (hence, distinguishable) balls and 𝐵 is a set of labeled (hence, distinguishable) boxes, a function 𝑓 ∶ 𝐴 → 𝐵 defines a distribution of (all of) the balls among the boxes, with ball 𝑎 placed in box 𝑓(𝑎). Here, for each 𝑏 ∈ 𝐵, 𝑓⁻¹({𝑏}) denotes the set of balls placed in box 𝑏. The distribution 𝑓 is injective if and only if at most one ball is placed in each box, and it is surjective if and only if at least one ball is placed in each box.
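Viewing functions [𝑛] → 𝐵 as words makes them easy to enumerate by machine. The following Python sketch is our own illustration, and `count_words` is a hypothetical name. It lists all words of length 𝑛 in an alphabet 𝐵 with `itertools.product`, then keeps the injective words (no repeated letter) and the surjective words (every letter used). The total is |𝐵| to the power 𝑛. The empty-domain conventions of this section also show up here: 𝑛 = 0 yields the single empty word, and an empty alphabet with 𝑛 ≥ 1 yields no words at all.

```python
from itertools import product

def count_words(n, B):
    """Return (all, injective, surjective) counts for words of length n in alphabet B.

    Each word f(1)...f(n) is a tuple, i.e., a function [n] -> B.
    """
    letters = set(B)
    all_words = list(product(letters, repeat=n))                # |B|**n tuples
    injective = [w for w in all_words if len(set(w)) == n]      # no letter repeated
    surjective = [w for w in all_words if set(w) == letters]    # every letter appears
    return len(all_words), len(injective), len(surjective)

if __name__ == "__main__":
    print(count_words(3, "ab"))    # (8, 0, 6)
    print(count_words(2, "abc"))   # (9, 6, 0)
    print(count_words(0, "abc"))   # (1, 1, 0): only the empty word
    print(count_words(2, ""))      # (0, 0, 0): no functions from [2] to the empty set
```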
2.2. Finite sets
A set 𝐴 is finite if 𝐴 ≅ [𝑛] for some 𝑛 ∈ ℕ. A set is infinite if it is not finite. Properties of finite sets and their mappings are ultimately based on properties of the sets [𝑛], which can be rigorously derived from any of several axiomatizations of the set ℕ. (See, for example, the particularly comprehensive treatment in Warner (1965, Volume I, Sections 16, 17, and 19).) We shall adopt as postulates four such properties. Rather than stating all four postulates immediately, however, we have opted to highlight the role played by each postulate by introducing it just when needed to prove a particular theorem or sequence of theorems.
Postulate I. [𝑚] ≅ [𝑛] if and only if 𝑚 = 𝑛.
If 𝐴 ≅ [𝑛], we say that A has cardinality n (or that A is an n-set), symbolized by |𝐴| = 𝑛. The well-definedness of the cardinality of a finite set follows from Postulate I, along with the transitivity of the relation ≅. Of course, |[𝑛]| = 𝑛, for all 𝑛 ∈ ℕ.
Theorem 2.2.1. If 𝐴 and 𝐵 are finite sets, then 𝐴 ≅ 𝐵 if and only if |𝐴| = |𝐵|.
Proof. Necessity. Suppose that |𝐴| = 𝑚 and |𝐵| = 𝑛. Then there exist bijections 𝑓 ∶ 𝐴 → [𝑚] and 𝑔 ∶ 𝐵 → [𝑛]. Since 𝐴 ≅ 𝐵, there exists a bijection ℎ ∶ 𝐴 → 𝐵. It follows that 𝑔 ∘ ℎ ∘ 𝑓⁻¹ is a bijection from [𝑚] to [𝑛], and so [𝑚] ≅ [𝑛]. Then 𝑚 = 𝑛 by Postulate I, whence |𝐴| = |𝐵|. Sufficiency. Suppose that |𝐴| = |𝐵| = 𝑛. Then there exist a bijection 𝑓 ∶ 𝐴 → [𝑛] and a bijection 𝑔 ∶ 𝐵 → [𝑛]. So 𝑔⁻¹ ∘ 𝑓 is a bijection from 𝐴 to 𝐵, whence 𝐴 ≅ 𝐵. □
Note that the above theorem warrants the following types of bijective (or combinatorial) proofs, mentioned earlier in Chapter 1: (𝑖) To determine |𝐴|, identify a more easily enumerated set 𝐵 and a bijection 𝑓 ∶ 𝐴 → 𝐵, determine |𝐵|, and conclude that |𝐴| = |𝐵|. (𝑖𝑖) To establish an identity of the form 𝑎 = 𝑏, where 𝑎 and 𝑏 are expressions denoting nonnegative integers, exhibit sets 𝐴 and 𝐵 and a bijection 𝑓 ∶ 𝐴 → 𝐵, and show that |𝐴| = 𝑎 and |𝐵| = 𝑏.
Postulate II. If 𝐴 ⊆ [𝑛], then 𝐴 is finite, and |𝐴| ≤ 𝑛.
The above postulate leads immediately to the following theorem.
Theorem 2.2.2. If B is a finite set and 𝐴 ⊆ 𝐵, then 𝐴 is finite and |𝐴| ≤ |𝐵|.
Proof. Suppose that |𝐵| = 𝑛 and 𝑔 ∶ 𝐵 → [𝑛] is a bijection. Define 𝑓 ∶ 𝐴 → 𝐵 by 𝑓(𝑎) = 𝑎, for all 𝑎 ∈ 𝐴. Then 𝑔 ∘ 𝑓 is an injection from 𝐴 to [𝑛], and it is a bijection from 𝐴 onto the set 𝑔(𝑓(𝐴)) ⊆ [𝑛]. By Theorem 2.2.1 and Postulate II, it follows that |𝐴| = |𝑔(𝑓(𝐴))| ≤ 𝑛 = |𝐵|. □
Theorem 2.2.3. If 𝐴 and 𝐵 are finite sets and there exists an injection 𝑓 ∶ 𝐴 → 𝐵, then |𝐴| ≤ |𝐵|.
Proof. If 𝑓 is an injection from 𝐴 to 𝐵, it is a bijection from 𝐴 onto the set 𝑓(𝐴) ⊆ 𝐵. By Theorems 2.2.1 and 2.2.2, it follows that |𝐴| = |𝑓(𝐴)| ≤ |𝐵|. □
Corollary 2.2.4. If 𝐴 and 𝐵 are finite sets and there exist an injection 𝑓 ∶ 𝐴 → 𝐵 and an injection 𝑔 ∶ 𝐵 → 𝐴, then 𝐴 ≅ 𝐵. Proof. By Theorem 2.2.3, |𝐴| ≤ |𝐵| and |𝐵| ≤ |𝐴|, so |𝐴| = |𝐵|, and hence 𝐴 ≅ 𝐵 by Theorem 2.2.1.
□
Remark 2.2.5. Corollary 2.2.4 actually holds for arbitrary sets 𝐴 and 𝐵, but it requires a more complicated proof. The general result is called the Schröder–Bernstein theorem. See Velleman (2006, Section 7.3) for further details.
Remark 2.2.6 (The pigeonhole principle). The contrapositive of Theorem 2.2.3 asserts for finite sets 𝐴 and 𝐵 that if |𝐴| > |𝐵|, then there are no injections from 𝐴 to 𝐵. In other words, if |𝐴| > |𝐵| and 𝑓 ∶ 𝐴 → 𝐵, then there exists a 𝑏 ∈ 𝐵 such that |𝑓⁻¹({𝑏})| > 1. This is the simplest form of the pigeonhole principle (“If there are more pigeons than pigeonholes, and each pigeon must be placed in a pigeonhole, then there is at least one pigeonhole occupied by more than one pigeon”), which we elaborate below in section 2.6.
Theorem 2.2.7. If 𝐴 and 𝐵 are finite sets and there exists a surjection 𝑓 ∶ 𝐴 → 𝐵, then |𝐴| ≥ |𝐵|.
Proof. If 𝐴 = ∅, then 𝐵 = im(𝑓) = ∅ as well, and the result is trivial. Otherwise, by Corollary 2.1.10, the existence of a surjection from 𝐴 to 𝐵 implies the existence of an injection from 𝐵 to 𝐴, whence |𝐵| ≤ |𝐴|, by Theorem 2.2.3. □
Postulate III. For all 𝑛 ∈ ℙ, [𝑛] − {𝑛} = [𝑛 − 1].
Theorem 2.2.8. If |𝐴| = 𝑛 and 𝑎 ∈ 𝐴, then |𝐴 − {𝑎}| = 𝑛 − 1.
Proof. Since 𝐴 ≅ [𝑛], it follows from Theorem 2.1.3 and Postulate III that 𝐴 − {𝑎} ≅ [𝑛] − {𝑛} = [𝑛 − 1]. So |𝐴 − {𝑎}| = |[𝑛 − 1]| = 𝑛 − 1. □
Theorem 2.2.9. If 𝐵 is finite, 𝐴 ⊆ 𝐵, and |𝐴| = |𝐵|, then 𝐴 = 𝐵.
Proof. Suppose, to the contrary, that 𝐴 ⊂ 𝐵. Then there exists a 𝑏 ∈ 𝐵 such that 𝑏 ∉ 𝐴. So 𝐴 ⊆ 𝐵 − {𝑏}. By Theorems 2.2.2 and 2.2.8, it follows that |𝐴| ≤ |𝐵 − {𝑏}| = |𝐵| − 1, a contradiction. □
Theorem 2.2.10. Suppose that 𝐴 and 𝐵 are finite sets and |𝐴| = |𝐵|. A function 𝑓 ∶ 𝐴 → 𝐵 is injective if and only if it is surjective.
Proof. Necessity. If 𝑓 ∶ 𝐴 → 𝐵 is injective, then 𝑓 is a bijection from 𝐴 onto the set 𝑓(𝐴) ⊆ 𝐵, with |𝑓(𝐴)| = |𝐴| = |𝐵|. By Theorem 2.2.9, it follows that 𝑓(𝐴) = 𝐵. Sufficiency. Suppose that 𝑓 is surjective, but not injective. Then there exist 𝑎, 𝑎′ ∈ 𝐴, with 𝑎 ≠ 𝑎′, and 𝑓(𝑎) = 𝑓(𝑎′). Let 𝑔 ∶ 𝐴 − {𝑎′} → 𝐵 be the restriction of 𝑓 to 𝐴 − {𝑎′}.
The function 𝑔 is still surjective, and so, by Theorem 2.2.7, |𝐴 − {𝑎′ }| ≥ |𝐵|. But by Theorem 2.2.8, |𝐴 − {𝑎′ }| = |𝐴| − 1 < |𝐴|, and so |𝐴| > |𝐵|, a contradiction. □ The above theorem is extremely useful, for if we know that the finite sets 𝐴 and 𝐵 have the same cardinality, and we wish to demonstrate that the function 𝑓 ∶ 𝐴 → 𝐵 is a bijection, it suffices to show either that 𝑓 is injective or that 𝑓 is surjective. Readers who have had an introductory course in linear and abstract algebra may recall that this theorem plays a central role in the proof that every finite integral domain is a field and
that every monomorphism from a finite-dimensional vector space to a vector space of the same dimension is an isomorphism.
If 𝑎, 𝑏 ∈ ℕ, with 𝑎 ≤ 𝑏, let [𝑎, 𝑏] ∶= {𝑘 ∈ ℕ ∶ 𝑎 ≤ 𝑘 ≤ 𝑏}. The set [𝑎, 𝑏] is called the closed integer interval from a to b.
Postulate IV. For all 𝑚 ∈ ℕ and all 𝑛 ∈ ℙ, [𝑚 + 1, 𝑚 + 𝑛] ≅ [𝑛].
Theorem 2.2.11 (The addition rule). If 𝐴 and 𝐵 are finite, disjoint sets, then 𝐴 ∪ 𝐵 is finite and |𝐴 ∪ 𝐵| = |𝐴| + |𝐵|.
Proof. If 𝐵 = ∅, the result is clear, so suppose that |𝐴| = 𝑚 and |𝐵| = 𝑛, where 𝑚 ∈ ℕ and 𝑛 ∈ ℙ. Then there exists a bijection 𝑓 ∶ 𝐴 → [𝑚], and, by Postulate IV, there exists a bijection 𝑓′ ∶ 𝐵 → [𝑚 + 1, 𝑚 + 𝑛]. Since 𝐴 ∩ 𝐵 = ∅ and [𝑚] ∩ [𝑚 + 1, 𝑚 + 𝑛] = ∅, the function 𝑔 ∶= 𝑓 ∪ 𝑓′ is a bijection from 𝐴 ∪ 𝐵 to [𝑚] ∪ [𝑚 + 1, 𝑚 + 𝑛]. But [𝑚] ∪ [𝑚 + 1, 𝑚 + 𝑛] = [𝑚 + 𝑛]. For it is clear that (𝑖) [𝑚] ∪ [𝑚 + 1, 𝑚 + 𝑛] ⊆ [𝑚 + 𝑛]. Also, (𝑖𝑖) [𝑚 + 𝑛] ⊆ [𝑚] ∪ [𝑚 + 1, 𝑚 + 𝑛], for if (𝑖𝑖) fails, then there exists an integer 𝑘 ∈ [𝑚 + 𝑛] such that (𝑖𝑖𝑖) 𝑚 < 𝑘 < 𝑚 + 1. But (𝑖𝑖𝑖), along with Postulate III, implies that 𝑘 ∈ [𝑚 + 1] − {𝑚 + 1} = [𝑚], contradicting the inequality 𝑚 < 𝑘. By the preceding argument, we have |𝐴 ∪ 𝐵| = |[𝑚 + 𝑛]| = 𝑚 + 𝑛 = |𝐴| + |𝐵|. □
Corollary 2.2.12 (The extended addition rule). If the finite sets 𝐴1, . . . , 𝐴𝑛 are pairwise disjoint (𝑖 ≠ 𝑗 ⇒ 𝐴𝑖 ∩ 𝐴𝑗 = ∅), then 𝐴1 ∪ ⋯ ∪ 𝐴𝑛 is finite, and |𝐴1 ∪ ⋯ ∪ 𝐴𝑛| = |𝐴1| + ⋯ + |𝐴𝑛|.
Proof. By induction on 𝑛, based on Theorem 2.2.11. □
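Once 𝐴 and 𝐵 are listed, the bijection 𝑔 = 𝑓 ∪ 𝑓′ from the proof of Theorem 2.2.11 can be built explicitly. The following Python sketch makes some assumptions of our own: the finite sets are given as lists without repetitions, and `union_bijection` and `is_bijection_onto_interval` are hypothetical names. Here 𝑓 numbers the elements of 𝐴 by 1, . . . , 𝑚, and 𝑓′ numbers the elements of 𝐵 by 𝑚 + 1, . . . , 𝑚 + 𝑛, standing in for the bijection supplied by Postulate IV.

```python
def union_bijection(A, B):
    """Return g = f ∪ f' : A ∪ B -> [m + n] as a dict, following Theorem 2.2.11.

    A and B are lists of distinct elements, and the sets they list must be disjoint.
    """
    if set(A) & set(B):
        raise ValueError("A and B must be disjoint")
    m = len(A)
    f = {a: i for i, a in enumerate(A, start=1)}            # f : A -> [m]
    f_prime = {b: m + i for i, b in enumerate(B, start=1)}  # f': B -> [m+1, m+n]
    return {**f, **f_prime}                                 # graph of f ∪ f'

def is_bijection_onto_interval(g):
    """Check that the values of g are exactly 1, ..., len(g), each used once."""
    return sorted(g.values()) == list(range(1, len(g) + 1))

if __name__ == "__main__":
    g = union_bijection(["p", "q", "r"], ["s", "t"])
    print(g)                               # {'p': 1, 'q': 2, 'r': 3, 's': 4, 't': 5}
    print(is_bijection_onto_interval(g))   # True, so |A ∪ B| = 3 + 2
```

Merging the two dictionaries is safe precisely because the sets are disjoint, which mirrors the role of the hypothesis 𝐴 ∩ 𝐵 = ∅ in the proof.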
Along with their many other applications, the addition rules are indispensable in establishing recurrence relations by combinatorial arguments, as we have seen, for example, in the proofs of formulas (1.1.3) and (1.2.1). The following theorem states an important generalization of these rules.
Theorem 2.2.13 (The sieve formula). If the sets 𝐴1, . . . , 𝐴𝑛 are finite, then so is their union, and
$$|A_1\cup\cdots\cup A_n| = \sum_{1\le i\le n}|A_i| - \sum_{1\le i<j\le n}|A_iA_j| + \cdots + (-1)^{n-1}|A_1\cdots A_n|, \tag{2.2.1}$$
… |𝐵|, and |𝐵bij| = 𝑛! if |𝐴| = |𝐵| = 𝑛.
Proof. Clear. □
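For small examples, the sieve formula (2.2.1) can be compared with a direct count of the union. The following Python sketch is our own, and `sieve_rhs` is a hypothetical name. It evaluates the right-hand side of (2.2.1) by summing, for 𝑟 = 1, . . . , 𝑛, the sizes of all 𝑟-fold intersections with sign (−1)^(𝑟−1).

```python
from itertools import combinations

def sieve_rhs(sets):
    """Right-hand side of the sieve formula (2.2.1) for a list of finite sets A_1, ..., A_n."""
    sets = [set(s) for s in sets]
    total = 0
    for r in range(1, len(sets) + 1):
        sign = (-1) ** (r - 1)
        for family in combinations(sets, r):        # every choice of r of the sets
            total += sign * len(set.intersection(*family))
    return total

if __name__ == "__main__":
    A = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5, 6}]
    print(sieve_rhs(A), len(set().union(*A)))      # 6 6
```

This brute-force evaluation examines all 2ⁿ − 1 subfamilies, so it is only a checking tool for small 𝑛, not an efficient way to count a union.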
2.4. Counting surjections: A recursive formula
Suppose that 𝑛, 𝑘 ≥ 0. How many functions 𝑓 ∶ [𝑛] → [𝑘] are surjective? Let us denote the answer by 𝜎(𝑛, 𝑘). We know from earlier observations that 𝜎(𝑛, 0) = 𝛿𝑛,0 and 𝜎(0, 𝑘) = 𝛿0,𝑘. Also, 𝜎(𝑛, 𝑛) = 𝑛! and 𝜎(𝑛, 𝑘) = 0 whenever 𝑛 < 𝑘. While there is no simple closed-form expression for 𝜎(𝑛, 𝑘), there is a nice recursive formula.
Theorem 2.4.1. For all 𝑛, 𝑘 ≥ 0, 𝜎(𝑛, 0) = 𝛿𝑛,0 and 𝜎(0, 𝑘) = 𝛿0,𝑘. For all 𝑛, 𝑘 ≥ 1,
$$\sigma(n,k) = k\,\sigma(n-1,k-1) + k\,\sigma(n-1,k). \tag{2.4.1}$$
Proof. Consider all distributions of 𝑛 balls, labeled 1, . . . , 𝑛, among 𝑘 boxes, labeled 1, . . . , 𝑘, with no box left empty. The term 𝑘𝜎(𝑛 − 1, 𝑘 − 1) counts those in which ball 𝑛 is placed in a box by itself: choose that box in 𝑘 ways, then distribute the other 𝑛 − 1 balls among the remaining 𝑘 − 1 boxes, leaving none empty. The term 𝑘𝜎(𝑛 − 1, 𝑘) counts those in which ball 𝑛 is placed in a box with at least one other ball: distribute the other 𝑛 − 1 balls among all 𝑘 boxes, leaving none empty, then add ball 𝑛 to any of the 𝑘 boxes. (Note how using the language of distributions simplifies this proof. If we had employed functional terminology, we would have had to classify the surjective functions 𝑓 ∶ [𝑛] → [𝑘] according to whether 𝑓⁻¹({𝑓(𝑛)}) = {𝑛} or 𝑓⁻¹({𝑓(𝑛)}) ⊃ {𝑛}.) □
Remark 2.4.2 (Two bogus formulas for 𝜎(𝑛, 𝑘)). Neophytes often hit on one of the following incorrect formulas, (2.4.2) and (2.4.3), for 𝜎(𝑛, 𝑘).
$$\sigma(n,k) = n^{\underline{k}}\,k^{n-k}. \tag{2.4.2}$$
Putative derivation: (1) Choose 𝑖1 ∈ [𝑛] in one of 𝑛 possible ways, and set 𝑓(𝑖1 ) = 1. (2) Choose 𝑖2 ∈ [𝑛] − {𝑖1 } in one of 𝑛 − 1 possible ways, and set 𝑓(𝑖2 ) = 2.
Table 2.1. The numbers 𝜎(𝑛, 𝑘), for 0 ≤ 𝑛, 𝑘 ≤ 6 (rows indexed by 𝑘, columns by 𝑛)
        𝑛=0   𝑛=1   𝑛=2   𝑛=3   𝑛=4   𝑛=5   𝑛=6
𝑘=0       1     0     0     0     0     0     0
𝑘=1       0     1     1     1     1     1     1
𝑘=2       0     0     2     6    14    30    62
𝑘=3       0     0     0     6    36   150   540
𝑘=4       0     0     0     0    24   240  1560
𝑘=5       0     0     0     0     0   120  1800
𝑘=6       0     0     0     0     0     0   720
(3) Choose 𝑖3 ∈ [𝑛] − {𝑖1, 𝑖2} in one of 𝑛 − 2 possible ways, and set 𝑓(𝑖3) = 3. ... (k) Choose 𝑖𝑘 ∈ [𝑛] − {𝑖1, . . . , 𝑖𝑘−1} in one of 𝑛 − 𝑘 + 1 possible ways, and set 𝑓(𝑖𝑘) = 𝑘. Now, every element of the codomain [𝑘] already appears as a value of 𝑓, so we just map the remaining 𝑛 − 𝑘 elements in [𝑛] − {𝑖1, . . . , 𝑖𝑘} to arbitrary elements of [𝑘]. This procedure always produces a surjective function from [𝑛] to [𝑘], and it can be carried out in $n^{\underline{k}}\,k^{n-k}$ ways. So $\sigma(n,k) = n^{\underline{k}}\,k^{n-k}$.
$$\sigma(n,k) = k!\,k^{n-k}. \tag{2.4.3}$$
Putative derivation: (1) Choose 𝑗1 ∈ [𝑘] in one of 𝑘 possible ways, and set 𝑓(1) = 𝑗1. (2) Choose 𝑗2 ∈ [𝑘] − {𝑗1} in one of 𝑘 − 1 possible ways, and set 𝑓(2) = 𝑗2. ... (k) Choose 𝑗𝑘 ∈ [𝑘] − {𝑗1, . . . , 𝑗𝑘−1} in the only possible way, and set 𝑓(𝑘) = 𝑗𝑘. Now, every element of the codomain [𝑘] already appears as a value of 𝑓, so we just map the elements of [𝑛] − [𝑘] to arbitrary elements of [𝑘]. This procedure always produces a surjective function from [𝑛] to [𝑘], and it can be carried out in $k!\,k^{n-k}$ ways. So $\sigma(n,k) = k!\,k^{n-k}$. It is a worthwhile exercise to articulate carefully just what is wrong with the above arguments.
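Table 2.1 can be regenerated from the recurrence (2.4.1) and cross-checked by brute force. The following Python sketch is our own illustration, and `sigma_table`, `sigma_brute`, and `falling` are hypothetical names. It also evaluates the two bogus formulas (2.4.2) and (2.4.3), with `falling(n, k)` computing 𝑛(𝑛 − 1) ⋯ (𝑛 − 𝑘 + 1). The comparison only confirms numerically that those formulas disagree with 𝜎(𝑛, 𝑘). It does not explain what goes wrong in the putative derivations.

```python
from itertools import product
from math import factorial

def sigma_table(N):
    """Return sigma[n][k] for 0 <= n, k <= N, using the initial values and (2.4.1)."""
    sigma = [[0] * (N + 1) for _ in range(N + 1)]
    sigma[0][0] = 1  # sigma(n, 0) = delta_{n,0} and sigma(0, k) = delta_{0,k}
    for n in range(1, N + 1):
        for k in range(1, N + 1):
            sigma[n][k] = k * sigma[n - 1][k - 1] + k * sigma[n - 1][k]
    return sigma

def sigma_brute(n, k):
    """Count surjections [n] -> [k] directly: words of length n in [k] using every letter."""
    codomain = set(range(1, k + 1))
    return sum(1 for w in product(sorted(codomain), repeat=n) if set(w) == codomain)

def falling(n, k):
    """Falling factorial n(n - 1)...(n - k + 1)."""
    result = 1
    for i in range(k):
        result *= n - i
    return result

if __name__ == "__main__":
    N = 6
    sigma = sigma_table(N)
    for k in range(N + 1):                 # rows indexed by k, as in Table 2.1
        print(k, [sigma[n][k] for n in range(N + 1)])
    print(all(sigma[n][k] == sigma_brute(n, k)
              for n in range(N + 1) for k in range(N + 1)))
    n, k = 3, 2                            # true value versus (2.4.2) and (2.4.3)
    print(sigma[n][k], falling(n, k) * k ** (n - k), factorial(k) * k ** (n - k))
```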
2.5. The domain partition induced by a function
A partition of a set 𝐴 is a set of nonempty, pairwise disjoint subsets (called blocks) of 𝐴, with union equal to 𝐴. A function 𝑓 ∶ 𝐴 → 𝐵 induces a partition of 𝐴 with as many blocks as there are elements of im(𝑓).
Theorem 2.5.1. If 𝑓 ∶ 𝐴 → 𝐵, then 𝐴/𝑓 ∶= {𝑓⁻¹({𝑏}) ∶ 𝑏 ∈ im(𝑓)} is a partition of A, and 𝐴/𝑓 ≅ im(𝑓). In particular, if f is surjective, then 𝐴/𝑓 ≅ 𝐵.
Proof. For each 𝑏 ∈ im(𝑓), 𝑓⁻¹({𝑏}) = {𝑎 ∈ 𝐴 ∶ 𝑓(𝑎) = 𝑏} ≠ ∅. If 𝑏1, 𝑏2 ∈ im(𝑓) and 𝑏1 ≠ 𝑏2, then 𝑓⁻¹({𝑏1}) ∩ 𝑓⁻¹({𝑏2}) = ∅, since, if not, there would exist an 𝑎 ∈ 𝐴 such that 𝑓(𝑎) = 𝑏1 and 𝑓(𝑎) = 𝑏2, contradicting the fact that functions are by definition
single-valued. Finally, $\bigcup_{b\in \mathrm{im}(f)} f^{-1}(\{b\}) = A$, since for each 𝑎 ∈ 𝐴, 𝑎 ∈ 𝑓⁻¹({𝑓(𝑎)}). The map 𝑏 ↦ 𝑓⁻¹({𝑏}) is clearly a bijection from im(𝑓) to 𝐴/𝑓. □
The following theorems establish some useful connections between the cardinalities of the domain and range of a function when these sets are finite.
Theorem 2.5.2. Let 𝑓 ∶ 𝐴 → 𝐵, where 𝐴 and 𝐵 are finite. Then
$$|A| = \sum_{b\in B} |f^{-1}(\{b\})|. \tag{2.5.1}$$
Proof. By Theorem 2.5.1, $A = \sum_{b\in \mathrm{im}(f)} f^{-1}(\{b\})$, so by the addition rule and the fact that 𝑓⁻¹({𝑏}) = ∅ if 𝑏 ∉ im(𝑓),
$$|A| = \sum_{b\in \mathrm{im}(f)} |f^{-1}(\{b\})| = \sum_{b\in B} |f^{-1}(\{b\})|.$$
□
There is an amusing, and occasionally useful, corollary of Theorem 2.5.2, the STEEP (solution-to-every-enumeration-problem) corollary.
Corollary 2.5.3 (STEEP). For every finite set 𝐴, $|A| = \sum_{a\in A} 1$.
Proof. Let 𝐵 = 𝐴 and 𝑓 = 𝑖𝐴 in Theorem 2.5.2. □
If 𝑓 ∶ 𝐴 → 𝐵 and $|f^{-1}(\{b\})| = m$ for all 𝑏 ∈ 𝐵, where 𝑚 ∈ ℙ, then 𝑓 is called an 𝑚-to-one surjection.
Theorem 2.5.4. Suppose that 𝐴 and 𝐵 are finite and 𝑓 ∶ 𝐴 → 𝐵 is an 𝑚-to-one surjection. Then |𝐴| = 𝑚|𝐵|.
Proof. By Theorem 2.5.2 and Corollary 2.5.3,
$$|A| = \sum_{b \in B} |f^{-1}(\{b\})| = \sum_{b \in B} m = m \sum_{b \in B} 1 = m|B|. \qquad \square$$
The preceding result provides a nice, clean way of deriving a formula of the form |𝐵| = |𝐴|/𝑚, without a lot of vague talk about “counting 𝐴 and dividing by the overcount”. We shall use this result frequently in what follows, and you are strongly encouraged to make use of it in your proofs (as an elementary example, see the proof of Theorem 3.1.9 below). The following result generalizes Theorem 2.5.4.
Theorem 2.5.5. Suppose that 𝐴 and 𝐵 are finite sets, that {𝐵1, . . . , 𝐵𝑟} is a partition of 𝐵, and that 𝑓 ∶ 𝐴 → 𝐵. If (𝑚1, . . . , 𝑚𝑟) is a sequence of positive integers such that $|f^{-1}(\{b\})| = m_j$ for all 𝑗 ∈ [𝑟] and all 𝑏 ∈ 𝐵𝑗, then $|A| = \sum_{j=1}^{r} m_j |B_j|$.
Proof.
$$|A| = \sum_{b \in B} |f^{-1}(\{b\})| = \sum_{j=1}^{r} \sum_{b \in B_j} |f^{-1}(\{b\})| = \sum_{j=1}^{r} \sum_{b \in B_j} m_j = \sum_{j=1}^{r} m_j \sum_{b \in B_j} 1 = \sum_{j=1}^{r} m_j |B_j|. \qquad \square$$
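As a small illustration of Theorem 2.5.4 (our own example, with the hypothetical helper `preimage_sizes`), take 𝐴 to be the set of ordered pairs of distinct elements of [6] and 𝐵 the set of 2-element subsets of [6]; the map (𝑖, 𝑗) ↦ {𝑖, 𝑗} is a 2-to-one surjection, so |𝐵| = |𝐴|/2. The sketch below confirms the preimage sizes by enumeration rather than by argument.

```python
from collections import Counter
from itertools import permutations

def preimage_sizes(f, domain):
    """Map each b in im(f) to |f^{-1}({b})|."""
    return Counter(f(a) for a in domain)

n = 6
A = list(permutations(range(1, n + 1), 2))   # ordered pairs (i, j) of distinct elements of [n]
f = frozenset                                 # (i, j) |-> {i, j}
sizes = preimage_sizes(f, A)

m = 2
assert set(sizes.values()) == {m}             # every 2-subset has exactly m = 2 preimages
num_subsets = len(A) // m                     # Theorem 2.5.4: |B| = |A| / m
print(num_subsets)                            # 15, i.e., n(n - 1)/2
```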
2. Sets, functions, and relations
2.6. The pigeonhole principle for functions
We have already encountered (in Remark 2.2.6) an elementary form of the pigeonhole principle. The following elaborates on this idea in a useful way.
Theorem 2.6.1. If 𝐴 and 𝐵 are finite sets with 𝐵 ≠ ∅ and 𝑓 ∶ 𝐴 → 𝐵, then there exists a 𝑏 ∈ 𝐵 such that $|f^{-1}(\{b\})| \ge |A|/|B|$.
Proof. Suppose that, for all 𝑏 ∈ 𝐵, $|f^{-1}(\{b\})| < |A|/|B|$. By Theorem 2.5.2 it would then follow that $|A| = \sum_{b \in B} |f^{-1}(\{b\})| < \sum_{b \in B} |A|/|B| = |A|$, a contradiction. □
The preceding theorem is of course interesting only when |𝐴| > |𝐵|. Since the quantity |𝐴|/|𝐵| is equal to the average number of domain elements per codomain element, the theorem may be paraphrased as asserting that not every codomain element can be subaverage in this sense. Here are a few simple applications.
Theorem 2.6.2. In any subset of [2𝑛] of cardinality 𝑛 + 1, there are at least two numbers, one of which divides the other. This result is best possible, in the sense that it fails to hold in general for subsets of [2𝑛] of cardinality 𝑛.
Proof. Write each of the 𝑛 + 1 numbers as a power of 2 times an odd number. As there are only 𝑛 odd numbers < 2𝑛, Theorem 2.6.1, applied to the map sending each number to its odd part, shows that two of the 𝑛 + 1 numbers, say 𝑎 and 𝑏, have the same odd factor 𝜔. Suppose that $a = 2^j \omega$ and $b = 2^k \omega$. If 𝑗 < 𝑘, then 𝑎 divides 𝑏, and if 𝑗 > 𝑘, then 𝑏 divides 𝑎. In the 𝑛-element subset 𝑆 = {𝑛 + 1, 𝑛 + 2, . . . , 2𝑛} of [2𝑛] there are no divisibility relations, since any nontrivial multiple of any element of 𝑆 must be ≥ 2𝑛 + 2. □
Theorem 2.6.3 (Erdős and Szekeres (1935)). In any sequence $(x_1, x_2, \dots, x_{n^2+1})$ of distinct real numbers there is either an increasing subsequence with 𝑛 + 1 terms or a decreasing subsequence with 𝑛 + 1 terms.
Proof. Consider the map $f : \{x_1, x_2, \dots, x_{n^2+1}\} \to [n^2+1]$, where $f(x_i)$ is the length of the longest increasing subsequence beginning with $x_i$. If $f(x_i) \ge n+1$ for some 𝑖, the desired conclusion holds. If not, then $f : \{x_1, x_2, \dots, x_{n^2+1}\} \to [n]$, and so, by Theorem 2.6.1, there exists a 𝑗 ∈ [𝑛] such that $|f^{-1}(\{j\})| \ge (n^2+1)/n$, whence $|f^{-1}(\{j\})| \ge n+1$. So there are at least 𝑛 + 1 of the $x_i$’s, say $x_{i_1}, x_{i_2}, \dots, x_{i_{n+1}}$, where $1 \le i_1 < i_2 < \cdots < i_{n+1} \le n^2+1$, such that the longest increasing subsequence beginning with each of these $x_i$’s has length 𝑗. But then it must be the case that $x_{i_1} > x_{i_2} > \cdots > x_{i_{n+1}}$, for if it were the case that $x_{i_\ell} < x_{i_{\ell+1}}$ for some ℓ ∈ [𝑛], then $x_{i_\ell}$, followed by any increasing subsequence of length 𝑗 beginning with $x_{i_{\ell+1}}$, would be an increasing subsequence of length 𝑗 + 1 beginning with $x_{i_\ell}$, a contradiction. □
Remark 2.6.4. The preceding theorem is best possible in the sense that it fails to hold in general for sequences of $n^2$ distinct real numbers. You are asked to prove this, as well as an obvious generalization of this theorem to the case of a sequence of 𝑚𝑛 + 1 distinct reals, as exercises. As we shall see in Chapter 12, both Theorem 2.6.3 and the generalization that you are asked to prove in Exercise 2.12 are special cases of a much deeper result from the theory of partially ordered sets known as Dilworth’s lemma.
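The proof of Theorem 2.6.3 is constructive, and it can be transcribed almost line by line. In this sketch the names `lis_from` and `monotone_subsequence` are our own, and indices are 0-based as usual in Python. The values 𝑓(𝑥𝑖) are computed by a simple quadratic-time dynamic program (one reasonable choice among several); the code then either rebuilds an increasing subsequence with 𝑛 + 1 terms or returns the first 𝑛 + 1 terms of a largest fiber of 𝑓, which the proof shows must be decreasing.

```python
def lis_from(xs):
    """f[i] = length of the longest increasing subsequence of xs beginning with xs[i]."""
    f = [1] * len(xs)
    for i in range(len(xs) - 1, -1, -1):
        for j in range(i + 1, len(xs)):
            if xs[j] > xs[i]:
                f[i] = max(f[i], 1 + f[j])
    return f

def monotone_subsequence(xs, n):
    """For n*n + 1 distinct reals, return n + 1 terms that are increasing or
    decreasing, following the proof of Theorem 2.6.3."""
    if len(xs) != n * n + 1 or len(set(xs)) != len(xs):
        raise ValueError("expected n*n + 1 distinct numbers")
    f = lis_from(xs)
    for i in range(len(xs)):
        if f[i] >= n + 1:
            # Walk down the values of f one step at a time to rebuild an increasing run.
            seq, cur = [xs[i]], i
            while len(seq) < n + 1:
                cur = next(j for j in range(cur + 1, len(xs))
                           if xs[j] > xs[cur] and f[j] == f[cur] - 1)
                seq.append(xs[cur])
            return seq
    # Otherwise f takes values in [n]; by pigeonhole some fiber has >= n + 1 indices,
    # and the proof shows that the terms indexed by a fiber are decreasing.
    fibers = {}
    for i, length in enumerate(f):
        fibers.setdefault(length, []).append(i)
    largest = max(fibers.values(), key=len)
    return [xs[i] for i in largest[: n + 1]]

print(monotone_subsequence([7, 8, 9, 4, 5, 6, 1, 2, 3, 0], 3))   # [9, 6, 3, 0]
print(monotone_subsequence([3, 1, 4, 5, 9, 2, 6, 8, 7, 10], 3))  # [3, 4, 5, 6]
```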
2.7. Relations
Just as a function from 𝐴 to 𝐵 may be construed either intensionally or extensionally, the same is true of a relation from 𝐴 to 𝐵. On the former conception, a relation is simply a predicate, or property, generically denoted by 𝑅, which holds for certain pairs (𝑎, 𝑏) ∈ 𝐴 × 𝐵, in which case one writes 𝑎𝑅𝑏 (or, in predicate logic, 𝑅(𝑎, 𝑏)). We symbolize the fact that the relation 𝑅 fails to hold for the pair (𝑎, 𝑏) by ∼𝑎𝑅𝑏. Every function 𝑓 from 𝐴 to 𝐵 gives rise to a relation 𝑅 from 𝐴 to 𝐵, defined by 𝑎𝑅𝑏 ⇔ 𝑓(𝑎) = 𝑏. Binary relations are ubiquitous in mathematics. Among the most frequently occurring is the relation ∈ from a set 𝐴 to its power set $2^A$.
If 𝑅 is a relation from 𝐴 to 𝐵, the domain, range, and graph of 𝑅 are, respectively, the sets domain(𝑅) ∶= {𝑎 ∈ 𝐴 ∶ ∃𝑏 ∈ 𝐵 such that 𝑎𝑅𝑏}, range(𝑅) ∶= {𝑏 ∈ 𝐵 ∶ ∃𝑎 ∈ 𝐴 such that 𝑎𝑅𝑏}, and graph(𝑅) ∶= {(𝑎, 𝑏) ∶ 𝑎𝑅𝑏}. Note that the domain of a relation from 𝐴 to 𝐵 (unlike that of a function from 𝐴 to 𝐵) may be a proper subset of 𝐴.
Given 𝐴 and 𝐵, a relation from 𝐴 to 𝐵 is completely determined by its graph. This observation suggests defining a relation from 𝐴 to 𝐵 extensionally as any subset of 𝐴 × 𝐵. Strictly speaking, one should then write (𝑎, 𝑏) ∈ 𝑅, but we will often opt for the simpler intensional notation 𝑎𝑅𝑏. One of several benefits of construing a relation extensionally is that one can then consider the cardinality |𝑅| of a relation 𝑅. In what follows, we employ a mixture of intensional and extensional notation, as convenient.
If 𝑅 ⊆ 𝐴 × 𝐵 and 𝑎 ∈ 𝐴, then 𝑅(𝑎) ∶= {𝑏 ∈ 𝐵 ∶ 𝑎𝑅𝑏}. The relation 𝑅 is a function from 𝐴 to 𝐵 if and only if, for all 𝑎 ∈ 𝐴, |𝑅(𝑎)| = 1. If |𝑅(𝑎)| ≤ 1 for all 𝑎 ∈ 𝐴, then 𝑅 is called a partial function from 𝐴 to 𝐵. Equivalently, a partial function from 𝐴 to 𝐵 is a function from 𝐸 to 𝐵, where 𝐸 ⊆ 𝐴. Such functions arise in logic, in the theory of recursive functions. The following theorem generalizes Theorem 2.5.2.
Theorem 2.7.1.
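If 𝐴 and 𝐵 are finite sets and 𝑅 ⊆ 𝐴 × 𝐵, then $|R| = \sum_{a \in A} |R(a)|$.
Proof.
$$\begin{aligned} |R| &= |\{(a,b) \in A \times B : aRb\}| = \Big|\sum_{a \in A} \{(a,b) : b \in B \text{ and } aRb\}\Big| \\ &= \sum_{a \in A} |\{(a,b) : b \in B \text{ and } aRb\}| = \sum_{a \in A} |\{b \in B : aRb\}| = \sum_{a \in A} |R(a)|. \qquad \square \end{aligned}$$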
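With relations stored extensionally as sets of pairs, 𝑅(𝑎), the dual 𝑑𝑅, and the count in Theorem 2.7.1 take only a few lines. This Python sketch is illustrative; `section` and `dual` are hypothetical helper names, and the small relation 𝑅 is our own example.

```python
def section(R, a):
    """R(a) = {b : a R b}, for a relation R given extensionally as a set of pairs."""
    return {b for (x, b) in R if x == a}

def dual(R):
    """dR = {(b, a) : (a, b) in R}."""
    return {(b, a) for (a, b) in R}

A, B = {1, 2, 3}, {"x", "y"}
R = {(1, "x"), (1, "y"), (3, "y")}    # domain(R) = {1, 3}, a proper subset of A

assert len(R) == sum(len(section(R, a)) for a in A)             # Theorem 2.7.1
assert len(R) == sum(len(section(dual(R), b)) for b in B)       # the same count via dR
assert dual(dual(R)) == R                                       # ddR = R

# For a function f, dR(b) is the fiber f^{-1}({b}), which recovers Theorem 2.5.2.
f = {(1, "x"), (2, "y"), (3, "y")}
assert section(dual(f), "y") == {2, 3}
```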
How does Theorem 2.7.1 generalize Theorem 2.5.2? If 𝑅 ⊆ 𝐴 × 𝐵, the dual of 𝑅 is the relation 𝑑𝑅 ⊆ 𝐵 × 𝐴 defined by 𝑑𝑅 ∶= {(𝑏, 𝑎) ∶ (𝑎, 𝑏) ∈ 𝑅}. Equivalently, 𝑏(𝑑𝑅)𝑎 if and only if 𝑎𝑅𝑏. Clearly, 𝑑𝑑𝑅 = 𝑅 and |𝑑𝑅| = |𝑅|. Now if 𝑅 = 𝑓, a function from 𝐴 to 𝐵, then, by Corollary 2.5.3 and Theorem 2.7.1 (applied to both 𝑅 and 𝑑𝑅),
$$|A| = \sum_{a \in A} 1 = \sum_{a \in A} |R(a)| = |R| = |dR| = \sum_{b \in B} |dR(b)| = \sum_{b \in B} |f^{-1}(\{b\})|.$$
A relation from 𝑋 to 𝑋 is simply called a relation on 𝑋. Let 𝑅 be such a relation.
(1) 𝑅 is reflexive if, for all 𝑥 ∈ 𝑋, 𝑥𝑅𝑥.
(2) 𝑅 is irreflexive if, for all 𝑥 ∈ 𝑋, ∼𝑥𝑅𝑥.
(3) 𝑅 is symmetric if 𝑥𝑅𝑦 ⇒ 𝑦𝑅𝑥.
(4) 𝑅 is asymmetric if 𝑥𝑅𝑦 ⇒ ∼𝑦𝑅𝑥.
(5) 𝑅 is antisymmetric if (𝑥𝑅𝑦 and 𝑦𝑅𝑥) ⇒ 𝑥 = 𝑦.
(6) 𝑅 is transitive if (𝑥𝑅𝑦 and 𝑦𝑅𝑧) ⇒ 𝑥𝑅𝑧.
(7) 𝑅 is complete if 𝑥 ≠ 𝑦 ⇒ (𝑥𝑅𝑦 or 𝑦𝑅𝑥).
(8) 𝑅 is negatively transitive if (∼𝑥𝑅𝑦 and ∼𝑦𝑅𝑧) ⇒ ∼𝑥𝑅𝑧.
The following theorem delineates the relation between asymmetry and antisymmetry.
Theorem 2.7.2. A relation 𝑅 is asymmetric if and only if it is antisymmetric and irreflexive.
Proof. Necessity. If 𝑅 is asymmetric, then 𝑥𝑅𝑦 and 𝑦𝑅𝑥 never hold simultaneously, so the implication (𝑥𝑅𝑦 and 𝑦𝑅𝑥) ⇒ 𝑥 = 𝑦 holds vacuously. Moreover, if 𝑥𝑅𝑥 for some 𝑥, then ∼𝑥𝑅𝑥 by asymmetry, a contradiction. So 𝑅 is both antisymmetric and irreflexive.
Sufficiency. By contraposition, 𝑅 is antisymmetric if and only if 𝑥 ≠ 𝑦 ⇒ ∼(𝑥𝑅𝑦 and 𝑦𝑅𝑥). By irreflexivity, ∼𝑥𝑅𝑥, and hence ∼(𝑥𝑅𝑥 and 𝑥𝑅𝑥) holds for all 𝑥. So for all 𝑥 and 𝑦, we have ∼(𝑥𝑅𝑦 and 𝑦𝑅𝑥), which is equivalent to the implication 𝑥𝑅𝑦 ⇒ ∼𝑦𝑅𝑥. So 𝑅 is asymmetric. □
The following two theorems delineate the relation between transitivity and the perhaps unfamiliar property of negative transitivity.
Theorem 2.7.3. If 𝑅 is transitive and complete, then 𝑅 is negatively transitive.
Proof. Exercise.
□
Theorem 2.7.4. If 𝑅 is negatively transitive and antisymmetric, then 𝑅 is transitive. Proof. Exercise.
□
Suppose that 𝑅 is a relation on 𝑋. There are four important relations associated with 𝑅.
(i) The dual of 𝑅, denoted by 𝑑𝑅 (introduced earlier, but repeated here for completeness), is defined by 𝑑𝑅 ∶= {(𝑥, 𝑦) ∶ (𝑦, 𝑥) ∈ 𝑅}. Equivalently, 𝑥(𝑑𝑅)𝑦 if and only if 𝑦𝑅𝑥. Clearly, 𝑑𝑑𝑅 = 𝑅.
(ii) The complement of 𝑅, denoted by 𝑐𝑅, is defined by 𝑐𝑅 ∶= (𝑋 × 𝑋) − 𝑅. Equivalently, 𝑥(𝑐𝑅)𝑦 if and only if ∼𝑥𝑅𝑦. Clearly, 𝑐𝑐𝑅 = 𝑅.
(iii) The asymmetric part of 𝑅, denoted by 𝑎𝑅, is defined by 𝑎𝑅 ∶= {(𝑥, 𝑦) ∈ 𝑅 ∶ (𝑦, 𝑥) ∉ 𝑅}. Equivalently, 𝑥(𝑎𝑅)𝑦 if and only if 𝑥𝑅𝑦 and ∼𝑦𝑅𝑥. Clearly, 𝑎𝑎𝑅 = 𝑎𝑅.
(iv) The symmetric part of 𝑅, denoted by 𝑠𝑅, is defined by 𝑠𝑅 ∶= 𝑅 ∩ 𝑑𝑅. Equivalently, 𝑥(𝑠𝑅)𝑦 if and only if 𝑥𝑅𝑦 and 𝑦𝑅𝑥. Clearly, 𝑠𝑠𝑅 = 𝑠𝑅.
Theorem 2.7.5. If 𝑅 is a relation on 𝑋, then 𝑎𝑅 is asymmetric, 𝑠𝑅 is symmetric, and 𝑅 = 𝑎𝑅 + 𝑠𝑅.
Proof. Exercise.
□
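Theorems 2.7.2 to 2.7.5 can be sanity-checked, though of course not proved, by brute force over all 2⁹ = 512 relations on a 3-element set. The following sketch models relations as sets of pairs; the predicate names are our own and simply transcribe the relevant definitions above.

```python
from itertools import product

X = range(3)
PAIRS = [(x, y) for x in X for y in X]

def all_relations():
    """Yield every relation on X as a frozenset of ordered pairs."""
    for bits in product((0, 1), repeat=len(PAIRS)):
        yield frozenset(p for p, bit in zip(PAIRS, bits) if bit)

def irreflexive(R):
    return all((x, x) not in R for x in X)

def symmetric(R):
    return all((y, x) in R for (x, y) in R)

def asymmetric(R):
    return all((y, x) not in R for (x, y) in R)

def antisymmetric(R):
    return all(x == y for (x, y) in R if (y, x) in R)

def transitive(R):
    return all((x, z) in R for (x, y) in R for (w, z) in R if y == w)

def complete(R):
    return all((x, y) in R or (y, x) in R for x in X for y in X if x != y)

def negatively_transitive(R):
    return all((x, z) not in R for x in X for y in X for z in X
               if (x, y) not in R and (y, z) not in R)

relations = list(all_relations())
for R in relations:
    assert asymmetric(R) == (antisymmetric(R) and irreflexive(R))      # Theorem 2.7.2
    if transitive(R) and complete(R):
        assert negatively_transitive(R)                                # Theorem 2.7.3
    if negatively_transitive(R) and antisymmetric(R):
        assert transitive(R)                                           # Theorem 2.7.4
    aR = {(x, y) for (x, y) in R if (y, x) not in R}                   # asymmetric part
    sR = {(x, y) for (x, y) in R if (y, x) in R}                       # symmetric part
    assert asymmetric(aR) and symmetric(sR)                            # Theorem 2.7.5 ...
    assert not (aR & sR) and (aR | sR) == R                            # ... R = aR + sR
```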
2.8. The matrix representation of a relation
As noted above, the relations on 𝑋, construed extensionally, are simply subsets of 𝑋 × 𝑋. Among these relations are the empty relation ∅, the universal relation 𝑋 × 𝑋, and the identity relation {(𝑥, 𝑥) ∶ 𝑥 ∈ 𝑋}. If |𝑋| = 𝑛, there are clearly $2^{n^2}$ relations on 𝑋.
In analyzing relations on finite sets, it is often helpful to employ their boolean matrix representations, constructed as follows. If 𝑅 is a relation on 𝑋 = {𝑥1, . . . , 𝑥𝑛}, the matrix representation of 𝑅 is the 𝑛 × 𝑛 matrix $M = (m_{i,j})$, where $m_{i,j} = 1$ if $x_i R x_j$, and $m_{i,j} = 0$ if $\sim x_i R x_j$. On the set 𝑋 = [3], for example, the matrices of the empty relation, the universal relation, and the identity relation are, respectively,
$$\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}, \quad \text{and} \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Many properties of relations (most obviously, symmetry) reveal themselves clearly in their associated matrices. Matrix representations also facilitate enumeration. For example, it is easy to see that there are $2^{n^2-n}$ reflexive (respectively, irreflexive) relations on [𝑛], since these properties constrain only the diagonal entries of the relevant matrix representations. You are asked in the exercises to enumerate various other classes of relations on [𝑛].
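The matrix representation and the count $2^{n^2-n}$ of reflexive relations can be checked for small 𝑛 by enumerating 0/1 matrices. This is a brute-force sketch (exponential in 𝑛², so only for tiny cases); `matrix` and `count_reflexive` are hypothetical names.

```python
from itertools import product

def matrix(R, X):
    """Boolean matrix of a relation R on the list X: entry (i, j) is 1 iff X[i] R X[j]."""
    return [[int((xi, xj) in R) for xj in X] for xi in X]

def count_reflexive(n):
    """Count n-by-n 0/1 matrices (stored row by row) whose diagonal entries are all 1."""
    return sum(all(bits[i * n + i] for i in range(n))
               for bits in product((0, 1), repeat=n * n))

X = [1, 2, 3]
print(matrix({(x, x) for x in X}, X))    # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
assert all(count_reflexive(n) == 2 ** (n * n - n) for n in range(1, 4))
```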
2.9. Equivalence relations and partitions
A relation on 𝑋 is an equivalence relation if it is reflexive, symmetric, and transitive. If 𝑅 is such a relation and 𝑥 ∈ 𝑋, the 𝑅-equivalence class determined by 𝑥, denoted $[x]_R$, is defined by $[x]_R := \{y \in X : xRy\}$.
Theorem 2.9.1. If 𝑅 is an equivalence relation on 𝑋, then the distinct 𝑅-equivalence classes constitute a partition of 𝑋.
Proof. If 𝑥, 𝑦 ∈ 𝑋, then $[x]_R \cap [y]_R = \varnothing$ or $[x]_R = [y]_R$, since if $[x]_R \cap [y]_R \ne \varnothing$, then $[x]_R = [y]_R$. For suppose that $z \in [x]_R \cap [y]_R$, so that 𝑥𝑅𝑧 and 𝑦𝑅𝑧. If $u \in [x]_R$, then 𝑥𝑅𝑢, so by symmetry and transitivity 𝑢𝑅𝑧; also 𝑧𝑅𝑦 by symmetry. Hence 𝑢𝑅𝑦, that is, 𝑦𝑅𝑢, and $u \in [y]_R$. The proof that $u \in [y]_R \Rightarrow u \in [x]_R$ is identical. Hence, distinct 𝑅-equivalence classes are disjoint. Since, by reflexivity, $x \in [x]_R$ for all 𝑥 ∈ 𝑋, such equivalence classes are nonempty, with union equal to 𝑋. □
Remark 2.9.2. The preceding construction is easily seen to yield a bijection from the set of all equivalence relations on 𝑋 to the set of all partitions of 𝑋. (What is the inverse of this bijection?) So the equivalence relations on any set are equinumerous with the partitions of that set. In Chapter 6 we will study the Stirling numbers of the second kind 𝑆(𝑛, 𝑘) and the Bell numbers 𝐵𝑛, which enumerate, respectively, the partitions of [𝑛] with 𝑘 blocks and the total number of partitions of [𝑛].
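Theorem 2.9.1 can be illustrated by computing equivalence classes directly from the definition $[x]_R = \{y \in X : xRy\}$. In this sketch, congruence modulo 4 on {0, . . . , 11} stands in for a general equivalence relation, and the helper `classes` is our own.

```python
def classes(X, related):
    """The distinct R-equivalence classes [x]_R = {y in X : x R y}."""
    return {frozenset(y for y in X if related(x, y)) for x in X}

X = range(12)

def congruent_mod_4(x, y):
    return (x - y) % 4 == 0      # an equivalence relation on X

blocks = classes(X, congruent_mod_4)
assert all(blocks)                                  # every class is nonempty
# The sizes add up to |X| and the union is X, so the classes are pairwise disjoint.
assert sum(len(b) for b in blocks) == len(X)
assert set().union(*blocks) == set(X)
```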
References
[1] P. Erdős and A. Szekeres (1935): A combinatorial problem in geometry, Compositio Mathematica 2, 463–470. MR1556929
[2] M. Erickson (1996): Introduction to Combinatorics, Wiley. MR1409365
[3] R. Graham, D. Knuth, and O. Patashnik (1989): Concrete Mathematics, Addison-Wesley. MR1001562
[4] G.-C. Rota (1964): On the foundations of combinatorial theory I: theory of Möbius functions, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 2, 340–368. MR174487
[5] R. Stanley (1997): Enumerative Combinatorics, Volume 1, Cambridge University Press. MR1442260
[6] D. Velleman (2006): How to Prove It: A Structured Approach (2nd edition), Cambridge University Press. MR2200345
[7] S. Warner (1965): Modern Algebra, Volumes I and II, Prentice-Hall (reprinted in 1990 in a single volume by Dover Publications). MR1068318
Exercises
2.1. Prove that if 𝑓 ∶ 𝐴 → 𝐵, then
(a) for all 𝐸 ⊆ 𝐴, $f^{-1}(f(E)) \supseteq E$, with equality for all such 𝐸 if and only if 𝑓 is injective;
(b) for all 𝐻 ⊆ 𝐵, $f(f^{-1}(H)) \subseteq H$, with equality for all such 𝐻 if and only if 𝑓 is surjective;
(c) 𝑓(𝐴1 ∪ 𝐴2) = 𝑓(𝐴1) ∪ 𝑓(𝐴2) for all 𝐴1, 𝐴2 ⊆ 𝐴; and
(d) 𝑓(𝐴1 ∩ 𝐴2) ⊆ 𝑓(𝐴1) ∩ 𝑓(𝐴2) for all 𝐴1, 𝐴2 ⊆ 𝐴, with equality if 𝑓 is injective.
2.2. Suppose that 𝐴 and 𝐵 are finite, nonempty sets and that 𝑓 ∶ 𝐴 → 𝐵 is injective. How many left inverses does 𝑓 have?
2.3. Prove Theorem 2.1.11.
2.4. Describe carefully the fallacies involved in the derivation (in section 2.4 above) of the two bogus formulas for 𝜎(𝑛, 𝑘).
2.5. Prove that 𝜎(𝑛, 2) is always even.
2.6. If 𝑓 ∶ 𝐴 → 𝐵 and 𝑎 ∈ 𝐴, 𝑎 is a fixed point of 𝑓 if 𝑓(𝑎) = 𝑎 (so, for example, if 𝐴 ∩ 𝐵 = ∅, then 𝑓 has no fixed points). Let 𝑛, 𝑟, 𝑠 ∈ ℙ.
(a) If we select a random function 𝑓 ∶ [𝑟𝑛] → [𝑠𝑛], what is the probability $\pi_{r,s,n}$ that 𝑓 has no fixed points?
(b) Determine $\lim_{n\to\infty} \pi_{r,s,n}$.
2.7. Suppose that 𝑓 ∶ 𝐴 → 𝐵 and ∼ is an equivalence relation on 𝐵. Define a relation 𝑅 on 𝐴 by 𝑎𝑅𝑏 ⇔ 𝑓(𝑎) ∼ 𝑓(𝑏). Prove that 𝑅 is an equivalence relation on 𝐴. How does this result relate to Theorem 2.5.1?
2.8. If 𝐴 ≠ ∅, a function 𝑓 ∶ 𝐴 → 𝐴 is called a constant function if there exists a 𝑐 ∈ 𝐴 such that, for all 𝑥 ∈ 𝐴, 𝑓(𝑥) = 𝑐. Prove that a function 𝑓 ∶ 𝐴 → 𝐴 is a constant function if and only if, for all 𝑔 ∶ 𝐴 → 𝐴, 𝑓 ∘ 𝑔 = 𝑓.
2.9. Let 𝑓, 𝑔 ∶ [𝑛] → [𝑘], where 𝑛, 𝑘 > 0. We say that 𝑓 and 𝑔 are domain equivalent, symbolized 𝑓 ∼ 𝑔, if there exists a bijection 𝜎 ∶ [𝑛] → [𝑛] such that 𝑓 = 𝑔 ∘ 𝜎.
(a) Prove that ∼ is an equivalence relation on $[k]^{[n]}$.
(b) Into how many equivalence classes does ∼ partition $[k]^{[n]}$?
(c) Suppose that 𝑛 ≥ 𝑘. Into how many equivalence classes does ∼ partition $[k]^{[n]}_{\mathrm{surj}}$?
2.10. If 𝑘 > 0 and 𝑆 is a set, an ordered 𝑘-cover of 𝑆 is a sequence (𝑆1, . . . , 𝑆𝑘) of subsets of 𝑆, with 𝑆1 ∪ ⋯ ∪ 𝑆𝑘 = 𝑆. Determine, with proof, the number of ordered 𝑘-covers of [𝑛].
2.11. Suppose that there is a set of 𝑛 labeled balls and a finite set of labeled boxes. At least one box is red, at least one is blue, and the rest (if any) are white. Prove that if 𝑛 > 2, the number of ways to distribute the balls among the boxes so that all red and blue boxes remain empty is never equal to the number of ways to distribute the balls so that at least one red box and at least one blue box receive a ball. You may use without proof here any theorems of number theory that turn out to be relevant.
2.12. Suppose that $(x_1, x_2, \dots, x_{mn+1})$ is a sequence of 𝑚𝑛 + 1 distinct real numbers.
(a) Prove that this sequence must contain an increasing subsequence of length 𝑚 + 1 or a decreasing subsequence of length 𝑛 + 1.
(b) Prove that this result is best possible, in the sense that it no longer holds in general for a sequence of merely 𝑚𝑛 distinct reals.
2.13. Suppose that “is acquainted with” is construed as a symmetric, irreflexive relation. Prove that in any set 𝑆 of 𝑛 ≥ 2 individuals there are at least two with the same number of acquaintances in 𝑆.
2.14. (a) With the same terminology as in Exercise 2.13 above, prove that in any set 𝑆 of six or more individuals, there are at least three mutually acquainted individuals, or at least three mutually unacquainted individuals.
(b) Prove that this result is best possible, in the sense that it no longer holds in general for a set of five individuals.
2.15. Prove that any set of 𝑛 + 1 numbers chosen from [2𝑛] must contain two numbers that are relatively prime (i.e., that have no common prime divisor).
2.16. Determine the number of binary relations on [𝑛] that are, respectively,
(a) symmetric.
(b) asymmetric.
(c) antisymmetric.
(d) complete.
(e) In any cases where two of these families turn out to be equinumerous, find a bijection that independently establishes this fact. (f) How many tournaments (asymmetric, complete relations) are there on [𝑛]? 2.17. Prove Theorem 2.7.3. 2.18. Prove Theorem 2.7.4. 2.19. Prove Theorem 2.7.5. 2.20. Let 𝐴 be a set with at least two elements. Find a minimal set of properties from those given in section 2.7 that characterize the following relations on 𝐴: (a) the empty relation, (b) the universal relation, (c) the identity relation.
2.21. Prove the following relational pigeonhole principle, and show how part (c) below generalizes the pigeonhole principle for functions: If 𝐴 and 𝐵 are finite, nonempty sets and 𝑅 ⊆ 𝐴 × 𝐵, then (a) There exists an 𝑎 ∈ 𝐴 such that |𝑅(𝑎)| ≥ |𝑅|/|𝐴|. (b) There exists an 𝑎 ∈ 𝐴 such that |𝑅(𝑎)| ≤ |𝑅|/|𝐴|. (c) There exists a 𝑏 ∈ 𝐵 such that |𝑑𝑅(𝑏)| ≥ |𝑅|/|𝐵|. (d) There exists a 𝑏 ∈ 𝐵 such that |𝑑𝑅(𝑏)| ≤ |𝑅|/|𝐵|. 2.22. Among 21 mathematics majors at a small college, eight students are enrolled in abstract algebra, fifteen in combinatorics, ten in number theory, twelve in analysis, seven in logic, and twelve in probability. Prove that some math major is enrolled in four of the aforementioned courses.
Project 2.A
Recall the following.
(1) A sequence $(a_n)_{n\ge0}$ is said to be periodic if there exists some 𝑡 ∈ ℙ such that $a_{n+t} = a_n$ for all 𝑛 ≥ 0. The least such 𝑡 is called the period of the sequence.
(2) If 𝑎, 𝑏 ∈ ℤ and 𝑚 ∈ ℙ, we say that 𝑎 and 𝑏 are congruent modulo 𝑚, symbolized 𝑎 ≡ 𝑏 (mod 𝑚), if 𝑎 − 𝑏 is divisible by 𝑚. Let 𝑎 mod 𝑚 be equal by definition to the unique 𝑟 ∈ $\mathbb{Z}_m$ ∶= {0, . . . , 𝑚 − 1} such that 𝑎 ≡ 𝑟 (mod 𝑚).
(3) The sequence of Fibonacci numbers $(F_n)_{n\ge0}$ is defined by the recursive formula $F_0 = F_1 = 1$ and $F_n = F_{n-1} + F_{n-2}$ for all 𝑛 ≥ 2.
(a) Prove that, for every 𝑚 ∈ ℙ, the sequence $(F_n \bmod m)_{n\ge0}$ is periodic. The result is obvious for 𝑚 = 1, so suppose that 𝑚 ≥ 2. Hint: To simplify notation, abbreviate $F_n \bmod m$ by $f_n$. Show that $f_0 = f_1 = 1$ and
(*) $f_n \equiv f_{n-1} + f_{n-2} \pmod{m}$ for all 𝑛 ≥ 2.
There are only $m^2$ possible pairs of the form $(f_i, f_{i+1})$ in the infinite sequence $(f_n)_{n\ge0}$. So, by the pigeonhole principle, there must exist 𝑖, 𝑗 with 0 ≤ 𝑖 < 𝑗 such that $f_i = f_j$ and $f_{i+1} = f_{j+1}$. Conclude, using (*), that $f_{j-i} \equiv f_0 \pmod{m}$ and $f_{j-i+1} \equiv f_1 \pmod{m}$, whence $f_{j-i} = f_0$ and $f_{j-i+1} = f_1$. So $(f_n)_{n\ge0}$ is periodic (let 𝑡 = 𝑗 − 𝑖).
(b) Determine the periods of the sequences $(F_n \bmod m)_{n\ge0}$ for 1 ≤ 𝑚 ≤ 6.
(c) Can you extend the result of part (a) to other sequences?
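The argument in part (a) also suggests an algorithm: iterate the pair $(f_i, f_{i+1})$ until the starting pair $(f_0, f_1)$ recurs. Since each consecutive pair determines the rest of the sequence, the least 𝑡 ≥ 1 with $(f_t, f_{t+1}) = (f_0, f_1)$ is exactly the period. Here is a minimal Python sketch, assuming the text's convention $F_0 = F_1 = 1$; the name `fib_mod_period` is hypothetical, and readers working part (b) by hand may prefer to use it only to check their answers.

```python
def fib_mod_period(m):
    """Period of (F_n mod m), F_0 = F_1 = 1: the first return of the pair (f_0, f_1)."""
    start = (1 % m, 1 % m)
    a, b = start
    t = 0
    while True:
        a, b = b, (a + b) % m      # (f_t, f_{t+1}) -> (f_{t+1}, f_{t+2})
        t += 1
        if (a, b) == start:
            return t
```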
Chapter 3
Binomial coefficients
As enumerators of certain subsets of a finite set, binomial coefficients are arguably the most fundamental numbers in enumerative combinatorics. We have already seen how these numbers arise in counting compositions of an integer, and we will encounter here additional counting problems for which they furnish a solution. We will also catalogue the most frequently referenced binomial coefficient identities, with proofs based purely on their combinatorial interpretation, illustrating how much can be accomplished without appealing to closed-form expressions. One such identity, the simple formula for the alternating sum of binomial coefficients (Theorem 3.3.1), turns out to play a key role in the proof of the sieve formula (also known as the principle of inclusion and exclusion), one of the basic tools of enumeration. This identity also underlies the proof of a powerful counting method, the binomial inversion principle, the first of many inversion principles that we will encounter, culminating in their ultimate formulation (in Chapter 14) in Gian-Carlo Rota’s celebrated Möbius inversion principle.
3.1. Subsets of a finite set
In earlier chapters we have made use of some elementary properties of binomial coefficients. In this chapter we pursue a more detailed study of these important numbers. We showed previously (Corollary 2.3.4) that the set [𝑛] has $2^n$ subsets. For all 𝑘 ≥ 0, let $\binom{n}{k}$ denote, as usual, the number of 𝑘-element subsets of [𝑛]. In particular, $\binom{n}{k} = 0$ if 𝑘 > 𝑛. An old-fashioned (and highly misleading) term for subset is combination. So $\binom{n}{k}$ enumerates the combinations of 𝑛 things, taken 𝑘 at a time. The number $\binom{n}{k}$, read as “𝑛 choose 𝑘”, is the 𝑘th binomial coefficient of order 𝑛. It follows immediately that
$$\sum_{k=0}^{n} \binom{n}{k} = 2^n, \tag{3.1.1}$$
since each side of this identity represents a different way of counting the subsets of [𝑛]. A number of additional theorems follow directly from the combinatorial interpretation of the binomial coefficients, with no need to appeal to a closed-form expression for these numbers.
Theorem 3.1.1. For all 𝑛, 𝑘 ≥ 0, $\binom{n}{0} = 1$ and $\binom{0}{k} = \delta_{0,k}$. For all 𝑛, 𝑘 ≥ 1,
$$\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}. \tag{3.1.2}$$
Proof. Among all 𝑘-element subsets of [𝑛], $\binom{n-1}{k-1}$ counts those subsets containing 𝑛 as an element, and $\binom{n-1}{k}$ counts those subsets not containing 𝑛 as an element. □
The recurrence (3.1.2), known as Pascal’s formula, leads immediately to the familiar array known as Pascal’s triangle.
Table 3.1. The numbers $\binom{n}{k}$ (binomial coefficients), for 0 ≤ 𝑛, 𝑘 ≤ 6; columns are indexed by 𝑛 and rows by 𝑘
𝑛=0 𝑛=1 𝑛=2 𝑛=3 𝑛=4 𝑛=5 𝑛=6
𝑘=0 1 1 1 1 1 1 1
𝑘=1 0 1 2 3 4 5 6
𝑘=2 0 0 1 3 6 10 15
𝑘=3 0 0 0 1 4 10 20
𝑘=4 0 0 0 0 1 5 15
𝑘=5 0 0 0 0 0 1 6
𝑘=6 0 0 0 0 0 0 1
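Pascal’s formula translates directly into a table-filling loop. This sketch (our own; `pascal` is a hypothetical name) builds Table 3.1 from the boundary values in Theorem 3.1.1 and the recurrence (3.1.2) alone, then compares the result with Python’s `math.comb` and with identity (3.1.1).

```python
from math import comb

def pascal(N):
    """Table C[n][k] of binomial coefficients for 0 <= n, k <= N, via Theorem 3.1.1."""
    C = [[0] * (N + 1) for _ in range(N + 1)]
    for n in range(N + 1):
        C[n][0] = 1                                      # C(n, 0) = 1
        if n == 0:
            continue                                     # C(0, k) = 0 for k >= 1
        for k in range(1, N + 1):
            C[n][k] = C[n - 1][k - 1] + C[n - 1][k]      # Pascal's formula (3.1.2)
    return C

C = pascal(6)
assert [C[n][2] for n in range(7)] == [0, 0, 1, 3, 6, 10, 15]   # row k = 2 of Table 3.1
assert all(sum(C[n]) == 2 ** n for n in range(7))               # identity (3.1.1)
assert all(C[n][k] == comb(n, k) for n in range(7) for k in range(7))
```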
Theorem 3.1.2. For all 𝑛, 𝑘 ≥ 1,
$$\binom{n}{k} = \frac{n}{k}\binom{n-1}{k-1}. \tag{3.1.3}$$
Proof. Although the verification of (3.1.3) is a trivial algebraic exercise if one makes use of formula (3.1.11) below, we prefer to give a combinatorial proof of the equivalent identity
$$\binom{n}{k} k = n \binom{n-1}{k-1} \quad \text{for all } n, k \ge 1, \tag{3.1.4}$$
by noting that each side of (3.1.4) counts the set {(𝐴, 𝑎) ∶ 𝐴 ⊆ [𝑛], |𝐴| = 𝑘, and 𝑎 ∈ 𝐴} (choose 𝐴 and then 𝑎 ∈ 𝐴, or choose 𝑎 first and then the remaining 𝑘 − 1 elements of 𝐴 from [𝑛] − {𝑎}). □
The modest little identity (3.1.3) is worth keeping in mind. Here is an amusing consequence.
Theorem 3.1.3. If 𝑛, 𝑘 ≥ 1 and gcd(𝑛, 𝑘) = 1, then
$$\binom{n}{k} \equiv 0 \pmod{n}. \tag{3.1.5}$$
In particular, if 𝑝 is prime and 1 ≤ 𝑘 ≤ 𝑝 − 1, then $\binom{p}{k} \equiv 0 \pmod{p}$, and so $a, b \in \mathbb{Z} \Rightarrow (a+b)^p \equiv a^p + b^p \pmod{p}$.
Proof. If 𝑘 > 𝑛, then (3.1.5) holds irrespective of whether 𝑛 and 𝑘 are relatively prime. Suppose that 1 ≤ 𝑘 ≤ 𝑛. Since 𝑛 clearly divides the right-hand side of (3.1.4), it also divides the left-hand side. But since 𝑛 and 𝑘 are relatively prime, it must be the case that 𝑛 divides $\binom{n}{k}$. □
The following two results are sometimes called hockey stick theorems.
Theorem 3.1.4. For all 𝑛, 𝑘 ≥ 0,
$$\sum_{j=0}^{n} \binom{j}{k} = \sum_{j=k}^{n} \binom{j}{k} = \binom{n+1}{k+1}. \tag{3.1.6}$$
Proof. Among all (𝑘 + 1)-element subsets of [𝑛 + 1], $\binom{j}{k}$ counts those subsets whose largest element is 𝑗 + 1. □
Corollary 3.1.5. For all 𝑛, 𝑘 ≥ 0,
$$\sum_{j=0}^{k} \binom{n+j}{j} = \binom{n+k+1}{k}. \tag{3.1.7}$$
Proof. Exercise. □
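Both hockey stick identities are easy to spot-check numerically. The following sketch is an illustrative check over a finite range, not a proof; it uses Python’s `math.comb`, which returns 0 when 𝑘 > 𝑛, matching the convention of this section. The function name `check_hockey_sticks` is our own.

```python
from math import comb

def check_hockey_sticks(N):
    """Assert (3.1.6) and (3.1.7) for all 0 <= n, k <= N."""
    for n in range(N + 1):
        for k in range(N + 1):
            assert sum(comb(j, k) for j in range(n + 1)) == comb(n + 1, k + 1)      # (3.1.6)
            assert sum(comb(n + j, j) for j in range(k + 1)) == comb(n + k + 1, k)  # (3.1.7)

check_hockey_sticks(12)
# The two cases highlighted in Table 3.2:
assert comb(3, 0) + comb(4, 1) + comb(5, 2) == comb(6, 2) == 15
assert comb(3, 3) + comb(4, 3) + comb(5, 3) == comb(6, 4) == 15
```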
Table 3.2 shows a portion of Table 3.1 that illustrates why the previous two results are called hockey stick theorems.
Table 3.2. Two hockey sticks (columns indexed by 𝑛, rows by 𝑘)
𝑛=3 𝑛=4 𝑛=5 𝑛=6
𝑘=0 [1] 1 1 1
𝑘=1 3 [4] 5 6
𝑘=2 3 6 [10] [15]
𝑘=3 (1) (4) (10) 20
𝑘=4 0 1 5 (15)
The bracketed numbers illustrate the case $\binom{3}{0} + \binom{4}{1} + \binom{5}{2} = \binom{6}{2}$ of Corollary 3.1.5. The parenthesized numbers illustrate the case $\binom{3}{3} + \binom{4}{3} + \binom{5}{3} = \binom{6}{4}$ of Theorem 3.1.4.
The following is a useful identity due to Alexandre-Théophile Vandermonde (1735–1796).
Theorem 3.1.6. For all 𝑚, 𝑛, 𝑟 ≥ 0,
$$\sum_{k=0}^{r} \binom{m}{k}\binom{n}{r-k} = \binom{m+n}{r}. \tag{3.1.8}$$
Proof. Among all 𝑟-element subsets of [𝑚 + 𝑛], $\binom{m}{k}\binom{n}{r-k}$ counts those subsets exactly 𝑘 of whose elements are ≤ 𝑚 (and, hence, 𝑟 − 𝑘 of whose elements are > 𝑚). □
Remark 3.1.7 (Application). Consider a population of 𝑚 + 𝑛 things, 𝑚 of type 1 and 𝑛 of type 2. Suppose that we select 𝑟 of these things, randomly and without replacement. The random variable 𝑋, which records the number of things of type 1 so selected, is called a hypergeometric random variable, and for 𝑘 = 0, . . . , 𝑟, we have
$$P(X = k) = \frac{\binom{m}{k}\binom{n}{r-k}}{\binom{m+n}{r}}. \tag{3.1.9}$$
That $\sum_{k=0}^{r} P(X = k) = 1$ follows from (3.1.8).
In what follows, if 𝐴 is any finite set and 𝑘 ≥ 0, then $\binom{A}{k}$ denotes the set of all 𝑘-element subsets of 𝐴, and $A^{\underline{k}}$ denotes the set of all permutations of members of 𝐴, taken 𝑘 at a time. Of course, $\left|\binom{A}{k}\right| = \binom{|A|}{k}$ and $|A^{\underline{k}}| = |A|^{\underline{k}}$.
Theorem 3.1.8. If 0 ≤ 𝑘 ≤ 𝑛, then
$$\binom{n}{k} = \binom{n}{n-k}. \tag{3.1.10}$$
Proof. The map $E \mapsto E^c$ is a bijection from $\binom{[n]}{k}$ to $\binom{[n]}{n-k}$. □
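Formula (3.1.9) can be evaluated exactly with rational arithmetic, and the fact that the probabilities sum to 1 is then an instance of Vandermonde’s identity. In this sketch `hypergeometric_pmf` is a hypothetical name, and we assume 𝑟 ≤ 𝑚 + 𝑛 so that the denominator is nonzero.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf(m, n, r):
    """[P(X = 0), ..., P(X = r)] from formula (3.1.9), as exact fractions (r <= m + n)."""
    total = comb(m + n, r)
    return [Fraction(comb(m, k) * comb(n, r - k), total) for k in range(r + 1)]

pmf = hypergeometric_pmf(m=5, n=7, r=4)
assert sum(pmf) == 1          # equivalent to Vandermonde's identity (3.1.8)
```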
We have deliberately postponed the following theorem to the end of this section, in order to emphasize how much can be proved without appealing to a closed form for binomial coefficients.
Theorem 3.1.9. For all 𝑛, 𝑘 ≥ 0,
$$\binom{n}{k} = \frac{n^{\underline{k}}}{k!} \quad \left(= \frac{n!}{k!\,(n-k)!} \ \text{if } 0 \le k \le n\right). \tag{3.1.11}$$
Proof. The map $(i_1, \dots, i_k) \mapsto \{i_1, \dots, i_k\}$ is clearly a 𝑘!-to-one surjection from $[n]^{\underline{k}}$ to $\binom{[n]}{k}$, and so $n^{\underline{k}} = k!\binom{n}{k}$ by Theorem 2.5.4. □
Note that the formula $\binom{n}{k} = n^{\underline{k}}/k!$ has the virtue of holding for all 𝑛, 𝑘 ≥ 0, yielding the value $\binom{n}{k} = 0$ whenever 𝑘 > 𝑛.
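The 𝑘!-to-one surjection in the proof of Theorem 3.1.9 can be observed directly by grouping the 𝑘-permutations of [𝑛] according to their underlying sets. This sketch is our own illustration; it uses Python’s `math.perm(n, k)` for the falling factorial $n^{\underline{k}}$ (it returns 0 when 𝑘 > 𝑛).

```python
from collections import Counter
from itertools import permutations
from math import comb, factorial, perm

n, k = 6, 3
arrangements = list(permutations(range(1, n + 1), k))        # the set [n]^(falling k)
fiber_sizes = Counter(frozenset(t) for t in arrangements)     # (i_1,...,i_k) -> {i_1,...,i_k}

assert set(fiber_sizes.values()) == {factorial(k)}            # a k!-to-one surjection
assert len(arrangements) == perm(n, k)                        # |[n]^(falling k)| = n^(falling k)
assert len(fiber_sizes) == perm(n, k) // factorial(k) == comb(n, k)   # formula (3.1.11)
```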
3.2. Distributions, words, and lattice paths
The central result of this section, from which all of the subsequent applications are derived, is the near-trivial observation that binomial coefficients count functions with 2-element codomains, subject to prescribed preimage cardinalities.
Theorem 3.2.1. For 0 ≤ 𝑘 ≤ 𝑛,
$$\left|\{f : [n] \to [2] : |f^{-1}(\{1\})| = k \text{ and } |f^{-1}(\{2\})| = n-k\}\right| = \binom{n}{k}. \tag{3.2.1}$$
Proof. Choose from [𝑛] the 𝑘 elements that will be mapped to 1, and map the remaining 𝑛 − 𝑘 elements to 2. □
Corollary 3.2.2. There are $\binom{n}{k}$ distributions of 𝑛 labeled balls among boxes labeled 1 and 2, in which 𝑘 balls are placed in box 1 (and hence 𝑛 − 𝑘 balls in box 2).
Proof. Obvious. □
Corollary 3.2.3. There are $\binom{n}{k}$ sequential arrangements of 𝑘 indistinguishable things of one type (say, 𝑥’s) and 𝑛 − 𝑘 indistinguishable things of another type (say, 𝑦’s).
Proof. Obvious. □
Recall from Chapter 1 that a weak 𝑘-composition of 𝑛 is a sequence (𝑛1, . . . , 𝑛𝑘) of nonnegative integers summing to 𝑛. There we established a one-to-one correspondence between the set of such compositions and the set of sequential arrangements of 𝑛 copies of the symbol • and 𝑘 − 1 copies of the symbol |, and hence that $\mathrm{wcomp}(n, k) = \binom{n+k-1}{n}$. This is just one application of the preceding corollary. Here are two others.
If 𝑟, 𝑠 ∈ ℕ, a lattice path from (0, 0) to (𝑟, 𝑠) is a sequence of steps of the form (i) (𝑎, 𝑏) → (𝑎 + 1, 𝑏) or (ii) (𝑎, 𝑏) → (𝑎, 𝑏 + 1), beginning at (0, 0) and ending at (𝑟, 𝑠).
Theorem 3.2.4. There are $\binom{r+s}{r}$ lattice paths from (0, 0) to (𝑟, 𝑠).
Proof. Representing steps of type (i) above by 𝑥’s and steps of type (ii) by 𝑦’s yields a bijection from the set of lattice paths under consideration to the set of sequential arrangements of 𝑟 𝑥’s and 𝑠 𝑦’s. □
Theorem 3.2.5 (Binomial theorem). If 𝛼 and 𝛽 are commuting elements in a ring with identity and 𝑛 ≥ 0, then
$$(\alpha + \beta)^n = \sum_{k=0}^{n} \binom{n}{k} \alpha^k \beta^{n-k}. \tag{3.2.2}$$
In particular, if 𝑥 and 𝑦 are variables ranging over ℂ (or, indeed, any commutative ring with identity), then
$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}. \tag{3.2.3}$$
Proof. Using only the distributive law and the associativity of addition and multiplication, $(\alpha + \beta)^n$ may be expanded as the sum of all $2^n$ words of length 𝑛 in the alphabet {𝛼, 𝛽}, with concatenation indicating ring multiplication. Using the fact that 𝛼 and 𝛽 commute and summing like terms then yields, by Corollary 3.2.3,
$$(\alpha + \beta)^n = \sum_{k=0}^{n} \big(\text{the number of sequential arrangements of } k\ \alpha\text{'s and } n-k\ \beta\text{'s}\big)\, \alpha^k \beta^{n-k} = \sum_{k=0}^{n} \binom{n}{k} \alpha^k \beta^{n-k}. \qquad \square$$
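Theorem 3.2.4 and the word-counting proof of the binomial theorem can both be checked by enumeration. In this sketch (our own), `lattice_paths` counts paths by the recurrence on the last step, which is a different method from the bijective proof above, and the second part expands (𝑥 + 𝑦)⁵ into its 2⁵ words and groups them by their number of 𝑥’s.

```python
from itertools import product
from math import comb

def lattice_paths(r, s):
    """Count unit-step paths from (0, 0) to (r, s) by conditioning on the last step."""
    ways = [[1] * (s + 1) for _ in range(r + 1)]
    for a in range(1, r + 1):
        for b in range(1, s + 1):
            ways[a][b] = ways[a - 1][b] + ways[a][b - 1]
    return ways[r][s]

assert all(lattice_paths(r, s) == comb(r + s, r) for r in range(6) for s in range(6))

# Binomial theorem, proof-style: list all 2^n words in {x, y} and group them by #x's.
n = 5
counts = [0] * (n + 1)
for word in product("xy", repeat=n):
    counts[word.count("x")] += 1
assert counts == [comb(n, k) for k in range(n + 1)]
```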
The variant $\binom{m}{n} = \frac{m(m-1)\cdots(m-n+1)}{n!}$ of formula (3.1.11) suggests a generalization of binomial coefficients in which the parameter 𝑚 is replaced by an arbitrary real number 𝛼, so that
$$\binom{\alpha}{0} := 1 \quad \text{and} \quad \binom{\alpha}{n} := \frac{\alpha(\alpha-1)\cdots(\alpha-n+1)}{n!}, \quad \text{for } n > 0. \tag{3.2.4}$$
These numbers play a central role in a generalization of the binomial theorem due to Newton.
Theorem 3.2.6. For all real numbers 𝛼 and all real numbers 𝑥 such that |𝑥| < 1,
$$(1+x)^\alpha = \sum_{n \ge 0} \binom{\alpha}{n} x^n. \tag{3.2.5}$$
Proof. See Wade (2010, p. 256) and, for formal power series analogues, Chapter 13 (Theorems 13.6.5 and 13.7.6) below. □
Remark 3.2.7. Theorem 3.2.6 is frequently invoked in generating function solutions to enumeration problems. Here, for example, is a generating function derivation of formula (1.3.1). Note first that $\binom{-k}{n} = (-1)^n \binom{n+k-1}{n}$ for all 𝑛, 𝑘 ≥ 0. In particular, $\binom{-1}{n} = (-1)^n$. Then, since wcomp(𝑛, 𝑘) = 𝐶(𝑛, 𝑘; ℕ), Theorems 1.4.1 and 3.2.6 yield
$$\sum_{n \ge 0} \mathrm{wcomp}(n,k)\, x^n = \Big(\sum_{i \ge 0} x^i\Big)^k = (1-x)^{-k} = \sum_{n \ge 0} \binom{-k}{n} (-x)^n,$$
whence $\mathrm{wcomp}(n,k) = (-1)^n \binom{-k}{n} = \binom{n+k-1}{n}$. Note that this formula agrees with formula (1.3.1) if 𝑛 ≥ 0 and 𝑘 ≥ 1, but it has the added virtue of holding for 𝑘 = 0 as well.
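Definition (3.2.4) is straightforward to implement with exact rational arithmetic. The following sketch (our own; `gbinom` and `newton_partial_sum` are hypothetical names) checks the identity $\binom{-k}{n} = (-1)^n\binom{n+k-1}{n}$ from Remark 3.2.7 for small 𝑛 and 𝑘 ≥ 1, and compares a partial sum of Newton’s series (3.2.5) with √1.2 in floating point.

```python
from fractions import Fraction
from math import comb, factorial

def gbinom(alpha, n):
    """Generalized binomial coefficient (3.2.4); exact when alpha is an int or Fraction."""
    num = Fraction(1)
    for i in range(n):
        num *= alpha - i
    return num / factorial(n)

# binom(-k, n) = (-1)^n binom(n + k - 1, n), checked for k >= 1.
assert all(gbinom(-k, n) == (-1) ** n * comb(n + k - 1, n)
           for k in range(1, 6) for n in range(10))

def newton_partial_sum(alpha, x, terms):
    """Floating-point partial sum of the series (3.2.5) for (1 + x)**alpha, |x| < 1."""
    return sum(float(gbinom(alpha, n)) * x ** n for n in range(terms))

approx = newton_partial_sum(Fraction(1, 2), 0.2, 40)
assert abs(approx - 1.2 ** 0.5) < 1e-12
```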
3.3. Binomial inversion and the sieve formula
The following simple identity underlies a useful enumeration technique, known as binomial inversion, and also a noninductive proof of the sieve formula.
Theorem 3.3.1. For all 𝑛 ≥ 0,
$$\sum_{k=0}^{n} (-1)^k \binom{n}{k} = \delta_{n,0}. \tag{3.3.1}$$
Proof. Of course, the result follows immediately from formula (3.2.3), with 𝑥 = −1 and 𝑦 = 1, but we prefer a combinatorial proof. The result is obvious if 𝑛 = 0. If 𝑛 > 0, formula (3.3.1) is equivalent to
$$\sum_{k \equiv 0 \ (\mathrm{mod}\ 2)} \binom{n}{k} = \sum_{k \equiv 1 \ (\mathrm{mod}\ 2)} \binom{n}{k}. \tag{3.3.2}$$
Let ℰ ∶= {𝐴 ⊆ [𝑛] ∶ |𝐴| ≡ 0 (mod 2)} and 𝒪 ∶= {𝐴 ⊆ [𝑛] ∶ |𝐴| ≡ 1 (mod 2)}. The map 𝐴 ↦ 𝐴 − {1} if 1 ∈ 𝐴, and 𝐴 ↦ 𝐴 ∪ {1} if 1 ∉ 𝐴, is a bijection from ℰ to 𝒪 (and also from 𝒪 to ℰ). □
Theorem 3.3.2 (Orthogonality relations for binomial coefficients). If 0 ≤ 𝑗 ≤ 𝑛, then
$$\sum_{k=j}^{n} (-1)^{n-k} \binom{n}{k}\binom{k}{j} = \delta_{n,j} \tag{3.3.3}$$
and
$$\sum_{k=j}^{n} (-1)^{k-j} \binom{n}{k}\binom{k}{j} = \delta_{n,j}. \tag{3.3.4}$$
Proof. Note first that
$$\binom{n}{k}\binom{k}{j} = \binom{n}{j}\binom{n-j}{k-j}, \tag{3.3.5}$$
since each side of (3.3.5) counts the set {(𝐴, 𝐵) ∶ 𝐴 ⊆ 𝐵 ⊆ [𝑛], |𝐴| = 𝑗, and |𝐵| = 𝑘}. Hence, substituting 𝑖 = 𝑘 − 𝑗, the left-hand side of (3.3.3) is equal to
$$\binom{n}{j} \sum_{k=j}^{n} (-1)^{n-k} \binom{n-j}{k-j} = \binom{n}{j} \sum_{i=0}^{n-j} (-1)^{n-j-i} \binom{n-j}{i} = \binom{n}{j}\,\delta_{n-j,0} = \delta_{n,j}.$$
There is an analogous proof of (3.3.4), but it is unnecessary to furnish such a proof. For in both (3.3.3) and (3.3.4), the sums in question could have been taken from 𝑘 = 0 to 𝑘 = 𝑛, since $\binom{k}{j} = 0$ if 𝑘 < 𝑗. So (3.3.3) holds for all 𝑛 ≥ 0 if and only if, for all 𝑁 ≥ 0, $M_N^* M_N = I_N$, where the (𝑁 + 1) × (𝑁 + 1) matrix $M_N = (m_{n,k})_{0 \le n,k \le N}$ is defined by $m_{n,k} = \binom{n}{k}$, the (𝑁 + 1) × (𝑁 + 1) matrix $M_N^* = (m^*_{n,k})_{0 \le n,k \le N}$ is defined by $m^*_{n,k} = (-1)^{n-k}\binom{n}{k}$, and $I_N$ is the (𝑁 + 1) × (𝑁 + 1) identity matrix. Similarly, (3.3.4) holds for all 𝑛 ≥ 0 if and only if, for all 𝑁 ≥ 0, $M_N M_N^* = I_N$. But by basic linear algebra, $M_N^* M_N = I_N \Leftrightarrow M_N M_N^* = I_N$. □
Theorem 3.3.3 (Binomial inversion principle). For sequences $(a_n)_{n\ge0}$ and $(b_n)_{n\ge0}$ in ℂ, the following are equivalent:
$$\text{For all } n \ge 0, \quad b_n = \sum_{k=0}^{n} \binom{n}{k} a_k. \tag{3.3.6}$$
$$\text{For all } n \ge 0, \quad a_n = \sum_{k=0}^{n} (-1)^{n-k} \binom{n}{k} b_k. \tag{3.3.7}$$
Proof. Given (3.3.6), along with (3.3.3), the right-hand side of (3.3.7) is equal to
$$\sum_{k=0}^{n} (-1)^{n-k} \binom{n}{k} \sum_{j=0}^{k} \binom{k}{j} a_j = \sum_{j=0}^{n} a_j \sum_{k=j}^{n} (-1)^{n-k} \binom{n}{k}\binom{k}{j} = \sum_{j=0}^{n} a_j \delta_{n,j} = a_n.$$
The derivation of (3.3.6) from (3.3.7) is similar, using (3.3.4). Alternatively, one can show the equivalence of (3.3.6) and (3.3.7) by noting that they are equivalent, respectively, to the matrix equations
$$\text{For all } N \ge 0, \quad B_N = M_N A_N \tag{3.3.8}$$
and
$$\text{For all } N \ge 0, \quad A_N = M_N^* B_N, \tag{3.3.9}$$
where $A_N = (a_0, \dots, a_N)^t$, $B_N = (b_0, \dots, b_N)^t$, and $M_N$ and $M_N^*$ are as defined in the proof of Theorem 3.3.2. □
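Binomial inversion and the orthogonality relations can be checked side by side. In this sketch the two transforms implement (3.3.6) and (3.3.7) literally, and `M` and `M_star` are the matrices $M_N$ and $M_N^*$ of the proof of Theorem 3.3.2 for 𝑁 = 7; all helper names are our own. A small `sign` helper keeps every entry an integer, since `(-1) ** e` would return a float for negative `e`.

```python
from math import comb

def sign(e):
    """(-1)**e as an int, valid for negative e as well."""
    return 1 if e % 2 == 0 else -1

def binomial_transform(a):
    """b_n = sum_{k<=n} C(n, k) a_k, as in (3.3.6)."""
    return [sum(comb(n, k) * a[k] for k in range(n + 1)) for n in range(len(a))]

def inverse_binomial_transform(b):
    """a_n = sum_{k<=n} (-1)^(n-k) C(n, k) b_k, as in (3.3.7)."""
    return [sum(sign(n - k) * comb(n, k) * b[k] for k in range(n + 1))
            for n in range(len(b))]

a = [3, -1, 4, 1, -5, 9, 2, 6]
assert inverse_binomial_transform(binomial_transform(a)) == a
assert binomial_transform(inverse_binomial_transform(a)) == a

# Orthogonality relations as matrix identities: M*_N M_N = M_N M*_N = I_N.
N = 7
M = [[comb(n, k) for k in range(N + 1)] for n in range(N + 1)]
M_star = [[sign(n - k) * comb(n, k) for k in range(N + 1)] for n in range(N + 1)]

def matmul(P, Q):
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

I = [[int(i == j) for j in range(N + 1)] for i in range(N + 1)]
assert matmul(M_star, M) == I and matmul(M, M_star) == I
```

For instance, applying `inverse_binomial_transform` to the factorials 1, 1, 2, 6, 24, 120 returns 1, 0, 1, 2, 9, 44, the numbers of permutations with no fixed points, in line with the problème des rencontres treated next.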
Binomial inversion yields a rapid solution to the problème des rencontres, which asks for the number of permutations of [𝑛] having no fixed points (Fr., rencontres). A permutation of [𝑛] here is construed as a bijection 𝑓 ∶ [𝑛] → [𝑛] and a fixed point of 𝑓 is an element 𝑖 ∈ [𝑛] such that 𝑓(𝑖) = 𝑖. A permutation with no fixed points is called a dérangement, and the number of dérangements of [𝑛] is denoted by 𝑑𝑛 .
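Theorem 3.3.4 (Problème des rencontres). For all 𝑛 ≥ 0,
$$d_n = \sum_{k=0}^{n} (-1)^{n-k} n^{\underline{k}}. \tag{3.3.10}$$
Proof. If 0 ≤ 𝑘 ≤ 𝑛, the number of permutations of [𝑛] with exactly 𝑘 fixed points is clearly $\binom{n}{k} d_{n-k}$, and so
$$n! = \sum_{k=0}^{n} \binom{n}{k} d_{n-k} = \sum_{k=0}^{n} \binom{n}{n-k} d_k = \sum_{k=0}^{n} \binom{n}{k} d_k, \quad \text{for all } n \ge 0. \tag{3.3.11}$$
By Theorem 3.3.3 (with $a_k = d_k$ and $b_n = n!$), equation (3.3.11) is equivalent to
$$d_n = \sum_{k=0}^{n} (-1)^{n-k} \binom{n}{k} k! = \sum_{k=0}^{n} (-1)^{n-k} n^{\underline{k}}, \quad \text{for all } n \ge 0. \qquad \square$$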
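Corollary 3.3.5. If 𝜋𝑛 denotes the probability that a randomly chosen permutation of [𝑛] has no fixed points, then
$$\lim_{n\to\infty} \pi_n = e^{-1}. \tag{3.3.12}$$
Proof. By (3.3.10),
$$\lim_{n\to\infty} \pi_n = \lim_{n\to\infty} \frac{d_n}{n!} = \lim_{n\to\infty} \sum_{k=0}^{n} \frac{(-1)^{n-k}}{(n-k)!} = \lim_{n\to\infty} \sum_{k=0}^{n} \frac{(-1)^k}{k!} = \sum_{k \ge 0} \frac{(-1)^k}{k!} = e^{-1}. \qquad \square$$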
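Formula (3.3.10) and the rate of convergence in Corollary 3.3.5 can be checked numerically. This sketch (our own; the function names are hypothetical) compares (3.3.10) with a brute-force count for small 𝑛, using Python’s `math.perm(n, k)` for $n^{\underline{k}}$, and then tests the alternating-series error bound $|\pi_n - e^{-1}| < 1/(n+1)!$ in floating point.

```python
from itertools import permutations
from math import e, factorial, perm

def derangements(n):
    """d_n via formula (3.3.10), with n^(falling k) computed as math.perm(n, k)."""
    return sum((-1) ** (n - k) * perm(n, k) for k in range(n + 1))

def derangements_brute(n):
    """Count permutations of {0, ..., n-1} with no fixed points directly."""
    return sum(all(p[i] != i for i in range(n)) for p in permutations(range(n)))

assert all(derangements(n) == derangements_brute(n) for n in range(8))
print([derangements(n) for n in range(8)])   # [1, 0, 1, 2, 9, 44, 265, 1854]

# Alternating-series bound on the error in Corollary 3.3.5.
for n in range(1, 12):
    assert abs(derangements(n) / factorial(n) - 1 / e) < 1 / factorial(n + 1)
```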
There is an amusing version of the above, known as the hat-check problem: Each of 𝑛 gentlemen attending an opera checks his hat with an attendant. The checks on each hat become dislodged, and so the attendant simply returns the hats at random. What is the probability that no gentleman gets his own hat back? How does this probability change as 𝑛 increases? Some people guess that the probability approaches 1 as 𝑛 increases (there are, after all, an increasing number of “wrong” hats available to be returned to each gentleman). Others guess that the probability approaches 0 (with so many gentlemen, even an event with tiny probability ought to occur at least once). In fact, as shown in Corollary 3.3.5, the probability approaches the limit $e^{-1} \approx 0.368$, and rather quickly: since $e^{-1}$ is represented by an alternating series whose terms decrease in absolute value, $|\pi_n - e^{-1}| < 1/(n+1)!$ (see Wade (1995, Corollary 4.15)).
Binomial inversion also leads quickly to a formula for 𝜎(𝑛, 𝑘), the number of surjective functions 𝑓 ∶ [𝑛] → [𝑘].
Theorem 3.3.6. For all 𝑛, 𝑘 ≥ 0,
$$\sigma(n,k) = \sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} j^n. \tag{3.3.13}$$
Proof. Apply binomial inversion to the formula
$$k^n = \sum_{j=0}^{k}\binom{k}{j}\sigma(n,j), \tag{3.3.14}$$
which is established by observing that, among all functions $f$ from $[n]$ to $[k]$, $\binom{k}{j}\sigma(n,j)$ counts those for which $|\operatorname{im}(f)| = j$. □

Remark 3.3.7 (The sieve formula). Finally, we use Theorem 3.3.1 to give a noninductive proof of the sieve formula
$$|A_1 \cup \cdots \cup A_n| = \sum_{\emptyset \ne I \subseteq [n]} (-1)^{|I|-1}\,\Big|\bigcap_{i\in I} A_i\Big|. \tag{3.3.15}$$
Suppose that $A$ and $B$ are sets and $B \subseteq A$. The characteristic function of $B$, denoted $\chi_B$, is defined on $A$ by (i) $\chi_B(a) = 1$ if $a \in B$, and (ii) $\chi_B(a) = 0$ if $a \notin B$. Note that if $B$ is finite, then
$$|B| = \sum_{a\in A}\chi_B(a). \tag{3.3.16}$$
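Proof of (3.3.15). Let $A$ be a set containing each of $A_1, \ldots, A_n$, and let $A_I := \bigcap_{i\in I} A_i$. By (3.3.16), formula (3.3.15) is equivalent to
$$\sum_{a\in A}\chi_{A_1\cup\cdots\cup A_n}(a) = \sum_{\emptyset\ne I\subseteq[n]}(-1)^{|I|-1}\sum_{a\in A}\chi_{A_I}(a) = \sum_{a\in A}\ \sum_{\emptyset\ne I\subseteq[n]}(-1)^{|I|-1}\chi_{A_I}(a). \tag{3.3.17}$$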
Formula (3.3.17) holds term by term, in virtue of the fact that for each $a \in A$,
$$\chi_{A_1\cup\cdots\cup A_n}(a) = \sum_{\emptyset\ne I\subseteq[n]}(-1)^{|I|-1}\chi_{A_I}(a). \tag{3.3.18}$$
If $a$ is an element of none of the sets $A_i$, then (3.3.18) holds in the form $0 = 0$. Suppose then that $a \in A_i$ for precisely those $i \in J$, where $|J| = j > 0$. Then the left-hand side of (3.3.18) is equal to 1, and, since $\chi_{A_I}(a) = 1$ exactly when $I \subseteq J$, Theorem 3.3.1 shows that the right-hand side is equal to
$$\sum_{\emptyset \ne I \subseteq J}(-1)^{|I|-1} = \sum_{i=1}^{j}(-1)^{i-1}\binom{j}{i} = 1.$$
□
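A quick randomized check of the sieve formula (3.3.15) can make the bookkeeping concrete. The Python sketch below is an illustration only (`sieve_union_size` is a hypothetical helper name): it evaluates the right-hand side of (3.3.15) by summing over all nonempty index sets $I$ and compares the result with the size of the union computed directly.

```python
import random
from itertools import combinations


def sieve_union_size(sets):
    """Right-hand side of (3.3.15): sum over nonempty I of (-1)^(|I|-1) |A_I|."""
    n = len(sets)
    total = 0
    for size in range(1, n + 1):
        for index_set in combinations(range(n), size):
            a_i = set.intersection(*(sets[i] for i in index_set))  # the set A_I
            total += (-1) ** (size - 1) * len(a_i)
    return total


random.seed(0)
for _ in range(300):
    n = random.randint(1, 6)
    sets = [set(random.sample(range(15), random.randint(0, 10))) for _ in range(n)]
    assert sieve_union_size(sets) == len(set().union(*sets))
```

The double loop visits all $2^n - 1$ nonempty index sets, so this is practical only for small $n$, which is all the check needs.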
3.4. Problème des ménages

How many visually distinct ways are there to seat $n$ man-woman couples in chairs labeled $1, \ldots, 2n$ and placed around a circular table, so that men and women alternate and no one sits next to his or her partner? This is the problème des ménages, first posed by Lucas (1891). The solution to this problem was first published, without proof, by Touchard (1934). The first proof was given by Kaplansky (1943). The proof that we present is due to Bogart and Doyle (1986) and is based on two simple lemmas. In what follows, a domino is simply a $1 \times 2$ rectangle.
Lemma 3.4.1. Let $\mathrm{dom}(\text{line}, m, k)$ denote the number of ways to place $k$ indistinguishable, nonoverlapping dominos on the linear sequence of numbers $1, 2, \ldots, m$, where each domino covers two adjacent numbers. Then
$$\mathrm{dom}(\text{line}, m, k) = \binom{m-k}{k}. \tag{3.4.1}$$
Proof. Each such placement corresponds in one-to-one fashion to a sequential arrangement of $k$ 2's and $m - 2k$ 1's, as illustrated by the example $1\,[2,3]\,4\,5\,[6,7]\,[8,9]\,10$ (here $m = 10$ and $k = 3$), which corresponds to the sequence 1211221. Such a sequence has $(m - 2k) + k = m - k$ terms, $k$ of which are 2's, so formula (3.4.1) follows from Corollary 3.2.3. □

Lemma 3.4.2. Let $\mathrm{dom}(\text{circle}, m, k)$ denote the number of ways to place $k$ indistinguishable, nonoverlapping dominos on a circular sequence of numbers $1, 2, \ldots, m$, where each domino covers two adjacent numbers (so that $m$ and $1$ count as adjacent). Then
$$\mathrm{dom}(\text{circle}, m, k) = \frac{m}{m-k}\binom{m-k}{k}. \tag{3.4.2}$$
Proof. The placements under consideration fall into three classes: (i) those in which the number 1 is not covered by a domino, (ii) those in which a single domino covers both $m$ and 1 ($[m, 1]$), and (iii) those in which a single domino covers 1 and 2 ($[1, 2]$). There are $\mathrm{dom}(\text{line}, m-1, k)$ placements in class (i), and $\mathrm{dom}(\text{line}, m-2, k-1)$ placements in each of classes (ii) and (iii). So, by Lemma 3.4.1 and Pascal's rule,
$$\mathrm{dom}(\text{circle}, m, k) = \binom{m-k-1}{k} + 2\binom{m-k-1}{k-1} = \binom{m-k}{k} + \binom{m-k-1}{k-1} = \frac{m}{m-k}\binom{m-k}{k},$$
the last step because $\binom{m-k-1}{k-1} = \frac{k}{m-k}\binom{m-k}{k}$. □

We are now prepared to prove the main result of this section. Let $M_n$ denote the number of seatings of $n$ couples, subject to the restrictions described above. The number $M_n$ is called the $n$th ménage number.

Theorem 3.4.3. $M_1 = 0$ and, for all $n \ge 2$,
$$M_n = 2(n!)\sum_{k=0}^{n}(-1)^k\,\frac{2n}{2n-k}\binom{2n-k}{k}(n-k)!. \tag{3.4.3}$$
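Before turning to the proof, it may help to see Lemmas 3.4.1 and 3.4.2 and formula (3.4.3) confirmed by brute force in small cases. The Python sketch below is our own illustration with hypothetical helper names. Chairs are numbered $0, \ldots, 2n-1$ around the table, and two seatings count as different whenever some chair gets a different occupant (the chairs are labeled). Exact rational arithmetic is used so that the fractions $2n/(2n-k)$ in (3.4.3) introduce no rounding.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb, factorial


def dominoes_brute(m, k, circular):
    """Placements of k nonoverlapping dominoes on cells 0..m-1 (a cycle if circular)."""
    starts = range(m) if circular else range(m - 1)
    count = 0
    for chosen in combinations(starts, k):
        covered = set(chosen) | {(s + 1) % m for s in chosen}
        count += len(covered) == 2 * k
    return count


def menage_formula(n):
    """M_n from (3.4.3), which the text states for n >= 2."""
    total = sum(
        (-1) ** k * Fraction(2 * n, 2 * n - k) * comb(2 * n - k, k) * factorial(n - k)
        for k in range(n + 1)
    )
    return 2 * factorial(n) * total


def menage_brute(n):
    """Sex-alternating seatings of n couples on 2n labeled chairs in a circle,
    with no spouses side by side; each person is labeled by couple number."""
    chairs = 2 * n
    count = 0
    for men_parity in (0, 1):
        men_chairs = range(men_parity, chairs, 2)
        women_chairs = range(1 - men_parity, chairs, 2)
        for men in permutations(range(n)):
            for women in permutations(range(n)):
                couple_at = [0] * chairs
                for chair, couple in zip(men_chairs, men):
                    couple_at[chair] = couple
                for chair, couple in zip(women_chairs, women):
                    couple_at[chair] = couple
                # neighbors are always a man and a woman, so equal labels mean spouses
                if all(couple_at[c] != couple_at[(c + 1) % chairs] for c in range(chairs)):
                    count += 1
    return count


for m in range(3, 11):
    for k in range(m // 2 + 1):
        assert dominoes_brute(m, k, circular=False) == comb(m - k, k)
        assert dominoes_brute(m, k, circular=True) == Fraction(m, m - k) * comb(m - k, k)

for n in range(2, 6):
    assert menage_brute(n) == menage_formula(n)
print([menage_brute(n) for n in range(1, 6)])  # [0, 0, 12, 96, 3120]
```

The brute-force count examines $2(n!)^2$ seatings, so it is meant only as a sanity check for $n \le 5$ or so.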
Proof. Label the couples $1, 2, \ldots, n$. Let $S$ be the set of all sex-alternating seatings of the couples, and, for $i = 1, \ldots, n$, let $S_i$ be the set of all seatings in $S$ in which the spouses of couple $i$ are seated next to each other. By the sieve formula (3.3.15), applied to $S_1 \cup \cdots \cup S_n$,
$$M_n = |S - (S_1 \cup \cdots \cup S_n)| = |S| - \sum_{1 \le i \le n}|S_i| + \sum_{1\le i<j\le n}|S_i \cap S_j| - \cdots + (-1)^n|S_1 \cap \cdots \cap S_n| \tag{3.4.4}$$
[…] 0, so $(d_1 - 1) + \cdots + (d_n - 1) = n - 2$ and $d_1 + \cdots + d_n = 2(n-1)$. But, as previously noted, $d_1 + \cdots + d_n$ is equal to 2 times the number of edges in the tree.
5.4. *Spanning trees

Given any connected graph on $[n]$, a spanning tree of that graph is simply a tree on $[n]$, each edge of which is an edge of the original graph. Remarkably, there is a formula, due to Gustav Robert Kirchhoff (1824–1887), for the number of spanning trees of any connected graph. This formula involves the adjacency matrix of the graph, namely, the $n \times n$ matrix $A = (a_{i,j})$, where $a_{i,j} = 1$ if the vertices $i$ and $j$ are adjacent, and $a_{i,j} = 0$ otherwise.

Theorem 5.4.1 (Kirchhoff's matrix tree theorem). If $A$ is the adjacency matrix of a connected graph on the vertex set $[n]$, and $M := -A + D$, where $D = (d_{i,j})$ is the $n \times n$ diagonal matrix with $d_{i,i}$ equal to the degree of vertex $i$, then all cofactors of $M$ are equal, and their common value is equal to the number of spanning trees of the graph.

Proof. See Kirchhoff (1847), Harary (1969), or West (1996). □
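For readers who want to experiment with Theorem 5.4.1, here is a small Python sketch (our own; the helper names are hypothetical). It builds $M = D - A$ for a graph on the vertices $0, \ldots, n-1$, evaluates cofactors exactly by rational Gaussian elimination, and compares them with a brute-force count of spanning trees, that is, of the $(n-1)$-edge subsets containing no cycle.

```python
from fractions import Fraction
from itertools import combinations


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def kirchhoff_matrix(n, edges):
    """M = D - A for a simple graph on vertices 0..n-1."""
    m = [[0] * n for _ in range(n)]
    for u, v in edges:
        m[u][v] -= 1
        m[v][u] -= 1
        m[u][u] += 1
        m[v][v] += 1
    return m


def cofactor(m, i, j):
    """(-1)^(i+j) times the determinant of M with row i and column j deleted."""
    minor = [row[:j] + row[j + 1:] for r, row in enumerate(m) if r != i]
    return (-1) ** (i + j) * determinant(minor)


def find(parent, x):
    """Root of x in a union-find forest."""
    while parent[x] != x:
        x = parent[x]
    return x


def spanning_trees_brute(n, edges):
    """Count the (n-1)-edge subsets with no cycle; these are the spanning trees."""
    count = 0
    for subset in combinations(edges, n - 1):
        parent = list(range(n))
        acyclic = True
        for u, v in subset:
            ru, rv = find(parent, u), find(parent, v)
            if ru == rv:
                acyclic = False
                break
            parent[ru] = rv
        count += acyclic
    return count


# K_4 with the edge {0, 1} removed
edges = [e for e in combinations(range(4), 2) if e != (0, 1)]
m = kirchhoff_matrix(4, edges)
print({cofactor(m, i, j) for i in range(4) for j in range(4)}, spanning_trees_brute(4, edges))
```

For $K_4$ minus an edge, all sixteen cofactors come out equal to the brute-force count, 8.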
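Remark 5.4.2. Kirchhoff's theorem leads to a quick proof of Cayley's formula for the number $t_n$ of trees on $[n]$. For a tree on $[n]$ is simply a spanning tree of $K_n$, the complete graph on $[n]$. So
$$M = \begin{pmatrix} n-1 & -1 & \cdots & -1\\ -1 & n-1 & \cdots & -1\\ \vdots & \vdots & \ddots & \vdots\\ -1 & -1 & \cdots & n-1 \end{pmatrix}.$$
Taking the cofactor associated with the first row and column of $M$, and simplifying the resulting $(n-1)\times(n-1)$ matrix by row operations (subtract its last row from each of the other rows, then add $1/n$ times each of these new rows to the last row), yields a formula for $t_n$ as the determinant of the $(n-1)\times(n-1)$ upper triangular matrix indicated below:
$$t_n = \det\begin{pmatrix} n & 0 & \cdots & 0 & -n\\ 0 & n & \cdots & 0 & -n\\ \vdots & & \ddots & & \vdots\\ 0 & 0 & \cdots & n & -n\\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix} = n^{n-2}.$$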
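As a small worked instance of these row operations (our own illustration), take $n = 4$. The cofactor of $M$ associated with its first row and column is the determinant of the $3 \times 3$ matrix on the left below. Subtracting the last row from each of the first two rows, and then adding $\tfrac14$ of each new row to the last row, gives
$$t_4 = \det\begin{pmatrix}3&-1&-1\\-1&3&-1\\-1&-1&3\end{pmatrix} = \det\begin{pmatrix}4&0&-4\\0&4&-4\\0&0&1\end{pmatrix} = 4\cdot4\cdot1 = 16 = 4^{4-2}.$$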
5.5. *Ramsey theory

Frank Ramsey (1903–1930) was a brilliant British philosopher, mathematician, and economist, who made important contributions to the foundations of subjective probability and decision theory. Interestingly, the theorem for which he is best known among mathematicians was proved simply as a lemma in the process of proving a theorem in mathematical logic (Ramsey 1930). This theorem, which appears in its full generality below as Theorem 5.5.1, has evolved into a major area of combinatorial mathematics known as Ramsey theory. See Graham, Rothschild, and Spencer (1990).

Theorem 5.5.1 (Ramsey's theorem for two colors). For all integers $r, b \ge 2$ there exists a positive integer, and hence a smallest such integer, denoted $R(r,b)$, with the following property: If each edge of the complete graph $K_{R(r,b)}$ on $R(r,b)$ vertices is colored red or blue, then there is a complete subgraph on $r$ of these vertices, all of the edges of which are red ("$K_{R(r,b)}$ contains a red $K_r$"), or there is a complete subgraph on $b$ of these vertices, all of the edges of which are blue ("$K_{R(r,b)}$ contains a blue $K_b$"). Moreover,
$$R(r,b) \le R(r-1,b) + R(r,b-1), \quad \text{for all } r, b \ge 3. \tag{5.5.1}$$
Proof. By induction on $r + b$. It is clear that $R(r,2) = r$ and $R(2,b) = b$. Suppose that $R(r,b)$ exists for all $r$ and $b$ such that $r + b = s$. Then $R(r,b)$ exists for all $r$ and $b$ such that $r + b = s + 1$, by the following argument. We have already noted that this is the case if $r = 2$ or $b = 2$, so suppose that $r, b \ge 3$. By the induction hypothesis, both $R(r-1,b)$ and $R(r,b-1)$ exist. The existence of $R(r,b)$ and the upper bound (5.5.1) then follow from the fact that any red/blue coloring of the edges of $K_{R(r-1,b)+R(r,b-1)}$ must contain a red $K_r$ or a blue $K_b$. For given such a coloring of this complete graph, let $v$ be an arbitrary vertex. There are $R(r-1,b) + R(r,b-1) - 1$ edges emanating from $v$, and so, by the pigeonhole principle, at least $R(r-1,b)$ of these edges are red, or at least $R(r,b-1)$ of these edges are blue. Suppose, with no loss of generality, that at least $R(r-1,b)$ red edges emanate from $v$. By the induction hypothesis, the complete graph on $R(r-1,b)$ of the vertices connected to $v$ by red edges contains a red $K_{r-1}$ or a blue $K_b$. In the latter case, the desired result holds. In the former, add $v$ and all the red edges connecting $v$ to the vertices of the red $K_{r-1}$, producing thereby a red $K_r$. □

Corollary 5.5.2. $R(r,b) \le \binom{r+b-2}{r-1}$, for all $r, b \ge 2$.

Proof. By induction on $r + b$. By earlier observations, the above inequality holds in the form of an equality if $r = 2$ or $b = 2$. So suppose that it holds for all $r$ and $b$ such that $r + b = s$. If $r + b = s + 1$, we may assume by the preceding remark that $r, b \ge 3$. Hence, by (5.5.1) and the induction hypothesis,
$$R(r,b) \le R(r-1,b) + R(r,b-1) \le \binom{r+b-3}{r-2} + \binom{r+b-3}{r-1} = \binom{r+b-2}{r-1}.$$
□

Theorem 5.5.3. $R(r,b) > (r-1)(b-1)$, for all $r, b \ge 2$.

Proof. It suffices to exhibit a red/blue coloring of $K_{(r-1)(b-1)}$ that contains no red $K_r$ and no blue $K_b$, which is done as follows. Take $b - 1$ disjoint red copies of $K_{r-1}$, and color blue each edge connecting vertices in different copies. The resulting coloring contains no red $K_r$, since any set of $r$ vertices contains at least two from different red $K_{r-1}$'s. It contains no blue $K_b$, since any set of $b$ vertices contains at least two from the same red $K_{r-1}$. □

Table 5.2 presents known values of $R(r,b)$, taken from Graham, Rothschild, and Spencer (1990). Recall that $R(r,2) = r$ for all $r \ge 2$, $R(2,b) = b$ for all $b \ge 2$, and $R(m,n) = R(n,m)$ for all $m, n \ge 2$.

Table 5.2. Some Ramsey numbers
𝑏=3 𝑏=4 𝑏=5 𝑏=6 𝑏=7 𝑏=8 𝑏=9
𝑟=3 6 9 14 18 23 28 36
𝑟=4 – 18 25 – – – –
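The entry $R(3,3) = 6$ of Table 5.2 and the construction in the proof of Theorem 5.5.3 are small enough to check exhaustively. The Python sketch below is our own illustration (the helper names are hypothetical). It tries all $2^{15}$ red/blue colorings of $K_6$, shows that coloring the pentagon red and the pentagram blue avoids monochromatic triangles in $K_5$, and confirms that the coloring built from $b-1$ red copies of $K_{r-1}$ has no red $K_r$ and no blue $K_b$.

```python
from itertools import combinations, product


def has_mono_clique(n, color, size, c):
    """True if some `size` vertices of K_n have all joining edges of color c."""
    return any(
        all(color[e] == c for e in combinations(clique, 2))
        for clique in combinations(range(n), size)
    )


def every_coloring_forces(n, r, b):
    """True if every red/blue coloring of K_n has a red K_r or a blue K_b."""
    edges = list(combinations(range(n), 2))
    for choice in product(("red", "blue"), repeat=len(edges)):
        color = dict(zip(edges, choice))
        if not (has_mono_clique(n, color, r, "red") or has_mono_clique(n, color, b, "blue")):
            return False
    return True


def block_coloring(r, b):
    """Coloring from Theorem 5.5.3: b-1 red K_{r-1}'s, all other edges blue."""
    n = (r - 1) * (b - 1)
    color = {
        (u, v): "red" if u // (r - 1) == v // (r - 1) else "blue"
        for u, v in combinations(range(n), 2)
    }
    return n, color


# Pentagon edges red, pentagram (diagonal) edges blue: no monochromatic triangle.
pentagon = {(u, v): "red" if (v - u) % 5 in (1, 4) else "blue" for u, v in combinations(range(5), 2)}
assert not has_mono_clique(5, pentagon, 3, "red")
assert not has_mono_clique(5, pentagon, 3, "blue")
assert every_coloring_forces(6, 3, 3)  # hence R(3, 3) = 6

for r, b in [(3, 3), (3, 4), (4, 3), (4, 4)]:
    n, color = block_coloring(r, b)
    assert not has_mono_clique(n, color, r, "red")
    assert not has_mono_clique(n, color, b, "blue")
```

The exhaustive search grows like $2^{\binom{n}{2}}$, which is why nothing beyond $K_6$ is attempted here.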
Theorem 5.5.4 (Ramsey's theorem for $k$ colors). Given an integer $k \ge 2$, distinct colors $c_1, \ldots, c_k$, and integers $n_1, \ldots, n_k \ge 2$, there exists a positive integer, and hence a least such integer $R(n_1, \ldots, n_k)$, with the following property. If the edges of $K_{R(n_1,\ldots,n_k)}$ are colored using the colors $c_1, \ldots, c_k$, then, for some $i \in [k]$, $K_{R(n_1,\ldots,n_k)}$ contains a $c_i$-colored $K_{n_i}$. Moreover,
$$R(n_1, \ldots, n_k) \le R(R(n_1, \ldots, n_{k-1}), n_k), \quad \text{for all } k \ge 3. \tag{5.5.2}$$
Proof. We prove the existence of $R(n_1,\ldots,n_k)$ by induction on $k$, the case $k = 2$ holding by Theorem 5.5.1. Suppose that $R(n_1,\ldots,n_{k-1})$ exists for some $k \ge 3$. Then, for every $n_k \ge 2$, $R(n_1,\ldots,n_k)$ exists and satisfies (5.5.2) by the following argument. Consider a $k$-coloring of the edges of the complete graph $K_{R(R(n_1,\ldots,n_{k-1}),n_k)}$. Suppose, for vividness, that $c_1,\ldots,c_{k-1}$ are different shades of red and $c_k$ is blue. By not distinguishing among the different shades of red, this coloring can be considered as a red/blue coloring. So regarded, it contains a red $K_{R(n_1,\ldots,n_{k-1})}$ or a blue $K_{n_k}$. In the latter case, the desired result holds. In the former, we now distinguish the different shades of red again, and we have a $(k-1)$-coloring of $K_{R(n_1,\ldots,n_{k-1})}$, which, by the induction hypothesis, contains a $c_i$-colored $K_{n_i}$ for some $i \in [k-1]$. □

Obviously, $R(n_1,\ldots,n_k)$ is symmetric in $n_1,\ldots,n_k$. Apart from that and the crude bounds derivable from (5.5.2), along with (5.5.1) and Corollary 5.5.2, the only known significant results are $R(3,3,3) = 17$ and $13 \le R(4,4,3) \le 15$. The foregoing results, and the following corollary of Theorem 5.5.4, all suggest, roughly speaking, that within sufficiently large random discrete structures there will always exist certain uniform substructures.

Corollary 5.5.5 (Schur's lemma). For every integer $k \ge 1$, there exists a positive integer, and hence a smallest positive integer $S(k)$, with the following property. For every $k$-coloring of $[S(k)]$, there is a monochromatic solution to $x + y = z$ in $[S(k)]$. Moreover,
$$S(k) \le R(\underbrace{3, 3, \ldots, 3}_{k\ 3\text{'s}}) - 1. \tag{5.5.3}$$
Proof. Let $c : [R(3,3,\ldots,3) - 1] \to \{c_1,\ldots,c_k\}$ be any $k$-coloring. This map induces a $k$-coloring of the edges of the complete graph $K_{R(3,3,\ldots,3)}$, the vertices of which are labeled $1, 2, \ldots, R(3,3,\ldots,3)$, by assigning the edge between vertex $i$ and vertex $j$ the color $c(|i-j|)$. By the definition of $R(3,3,\ldots,3)$, we are guaranteed the existence of a monochromatic triangle, say, on the vertices $h$, $i$, and $j$, with $h < i < j$. So $c(i-h) = c(j-i) = c(j-h)$. But $(i-h) + (j-i) = (j-h)$, so $x = i - h$, $y = j - i$, $z = j - h$ is a monochromatic solution to $x + y = z$.
□
The numbers 𝑆(𝑘) are called Schur numbers, and their known values are 𝑆(1) = 2, 𝑆(2) = 5, 𝑆(3) = 14, and 𝑆(4) = 45. According to Erickson (1996), Schur established Corollary 5.5.5 while attempting to prove Fermat’s last theorem.
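The values $S(1) = 2$, $S(2) = 5$, and $S(3) = 14$ can be confirmed by a short backtracking search. The Python sketch below is an illustration under our own naming. It colors $1, 2, 3, \ldots$ in order, rejecting a color for $z$ whenever some $x + y = z$ with $x, y < z$ (allowing $x = y$, as in the definition of $S(k)$) would become monochromatic, and it only ever introduces the first unused color, since renaming colors changes nothing.

```python
def sum_free_colorable(n, k):
    """True if [n] admits a k-coloring with no monochromatic x + y = z."""
    color = [None] * (n + 1)  # color[z] for z = 1..n

    def extend(z, used):
        if z > n:
            return True
        for c in range(min(used + 1, k)):  # an existing color, or the first new one
            # every solution with largest term z has x and y = z - x in 1..z-1
            if all(not (color[x] == c and color[z - x] == c) for x in range(1, z)):
                color[z] = c
                if extend(z + 1, max(used, c + 1)):
                    return True
        color[z] = None
        return False

    return extend(1, 0)


def schur_number(k):
    """Smallest n such that every k-coloring of [n] has a monochromatic x + y = z."""
    n = 1
    while sum_free_colorable(n, k):
        n += 1
    return n


print([schur_number(k) for k in (1, 2, 3)])  # [2, 5, 14]
```

The same search becomes far slower for $k = 4$, so it is not a practical way to confirm $S(4) = 45$.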
5.6. *The probabilistic method

If $S$ is a finite, nonempty set and $D \subseteq S$, then to show that $D$ is nonempty, it suffices to show that $p(D) > 0$ for some probability distribution $p$ on $S$. This simple observation can be used to find a strict lower bound on the Ramsey number $R(r,b)$, as follows. Let $K_n$ denote the complete graph on $[n]$. For each set of vertices $I \subseteq [n]$, let $C_I^{\mathrm{red}}$ be equal to the set of all red/blue colorings of the edges of $K_n$ such that all edges connecting vertices in $I$ are red, and let $C_I^{\mathrm{blue}}$ be defined analogously. Then
$$\mathcal{C}_{r,b} := \bigcup_{\substack{I\subseteq[n]\\|I|=r}} C_I^{\mathrm{red}} \;\cup\; \bigcup_{\substack{I\subseteq[n]\\|I|=b}} C_I^{\mathrm{blue}} \tag{5.6.1}$$
is the set of all red/blue colorings of the edges of $K_n$ such that $K_n$ contains a red $K_r$ or a blue $K_b$. If $p(\mathcal{C}_{r,b}) < 1$ for some probability distribution $p$ on the set $\mathcal{C}$ of all $2^{\binom{n}{2}}$ red/blue colorings of the edges of $K_n$, then $\mathcal{C}_{r,b}$ is a proper subset of $\mathcal{C}$, and so $R(r,b) > n$. Suppose that $p$ is the uniform distribution on $\mathcal{C}$ (or equivalently, that the color of each edge is determined by the flip of a fair coin, with heads $\mapsto$ red and tails $\mapsto$ blue). Then, by Boole's inequality (and since $C_I^{\mathrm{red}}$ prescribes the colors of exactly $\binom{|I|}{2}$ edges),
$$p(\mathcal{C}_{r,b}) \le \sum_{\substack{I\subseteq[n]\\|I|=r}} p(C_I^{\mathrm{red}}) + \sum_{\substack{I\subseteq[n]\\|I|=b}} p(C_I^{\mathrm{blue}}) = \binom{n}{r}\Big(\frac12\Big)^{\binom{r}{2}} + \binom{n}{b}\Big(\frac12\Big)^{\binom{b}{2}}. \tag{5.6.2}$$
So every $n$ (and, in particular, the largest $n$) satisfying
$$\binom{n}{r}\Big(\frac12\Big)^{\binom{r}{2}} + \binom{n}{b}\Big(\frac12\Big)^{\binom{b}{2}} < 1 \tag{5.6.3}$$
is a strict lower bound on $R(r,b)$. In particular, suppose that $r = b = m$. Then
$$\binom{n}{m} < 2^{\binom{m}{2}-1} \implies R(m,m) > n. \tag{5.6.4}$$
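The inequalities (5.6.3) and (5.6.4) are easy to explore numerically. In the Python sketch below (our own; `probabilistic_lower_bound` is a hypothetical name), exact rational arithmetic finds the largest $n$ whose left-hand side is below 1. An optional parameter `p` plays the role of a biased coin with $p(\mathrm{red}) = p$; with the default $p = 1/2$ the test is (5.6.3), and for $r = b = m$ it reduces to (5.6.4). The loop relies on the left-hand side being nondecreasing in $n$ and starts from $n = 1$, where it equals $0$.

```python
from fractions import Fraction
from math import comb


def lhs(n, r, b, p):
    """C(n, r) p^C(r, 2) + C(n, b) (1 - p)^C(b, 2); with p = 1/2 this is (5.6.3)."""
    return comb(n, r) * p ** comb(r, 2) + comb(n, b) * (1 - p) ** comb(b, 2)


def probabilistic_lower_bound(r, b, p=Fraction(1, 2)):
    """Largest n with lhs(n) < 1, so that R(r, b) > n (assumes r, b >= 2, 0 < p < 1)."""
    p = Fraction(p)
    n = 1  # lhs(1) = 0 because C(1, r) = C(1, b) = 0
    while lhs(n + 1, r, b, p) < 1:
        n += 1
    return n


for m in range(3, 13):
    print(m, (m - 1) ** 2, probabilistic_lower_bound(m, m))
```

The printed table compares this bound with $(m-1)^2$ from Theorem 5.5.3; passing `p` as a `Fraction` keeps the comparison exact.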
Remark 5.6.1. Instead of flipping a fair coin to determine whether to color a given edge red or blue, we might use a coin that is biased so that $p(\mathrm{red}) = p$ and $p(\mathrm{blue}) = 1 - p$. Then any $n$ (and, in particular, the largest $n$) satisfying
$$\binom{n}{r}p^{\binom{r}{2}} + \binom{n}{b}(1-p)^{\binom{b}{2}} < 1 \tag{5.6.5}$$
is a strict lower bound on $R(r,b)$.

For $3 \le m \le 8$, the lower bound $(m-1)^2$ on $R(m,m)$ furnished by Theorem 5.5.3 is larger than that derivable from (5.6.4). But for $m \ge 9$, (5.6.4) yields larger lower bounds. We can also derive a general lower bound on $R(m,m)$ from (5.6.4), as follows.

Theorem 5.6.2. For all $m \ge 2$,
$$R(m,m) > 2^{m/2 - 1/2 - 1/m}(m!)^{1/m}. \tag{5.6.6}$$
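Proof. Let $N$ be the smallest value of $n$ for which
$$\binom{n}{m} \ge 2^{\binom{m}{2}-1}. \tag{5.6.7}$$
If $n < N$, then $R(m,m) > n$ by (5.6.4). In particular, $R(m,m) > N - 1$, and so
$$R(m,m) \ge N = (N^m)^{1/m} > \Big(\binom{N}{m} m!\Big)^{1/m} \ge \big(2^{\binom{m}{2}-1} m!\big)^{1/m} = 2^{m/2-1/2-1/m}(m!)^{1/m}.$$
(The strict inequality holds because $N^m > N(N-1)\cdots(N-m+1) = \binom{N}{m}m!$ for $m \ge 2$, and the next one is (5.6.7).)
□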
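To get a feel for how much is lost in passing from (5.6.4) to (5.6.6), one can tabulate both. The Python sketch below (our own illustration, with hypothetical names) computes the integer $N$ of (5.6.7) exactly and the right-hand side of (5.6.6) in floating point, and checks the inequality $N > 2^{m/2-1/2-1/m}(m!)^{1/m}$ established in the proof.

```python
from math import comb, factorial


def smallest_N(m):
    """Smallest n with C(n, m) >= 2^(C(m, 2) - 1), as in (5.6.7)."""
    n = m  # C(n, m) = 0 for n < m, so N >= m
    while comb(n, m) < 2 ** (comb(m, 2) - 1):
        n += 1
    return n


def bound_5_6_6(m):
    """Right-hand side of (5.6.6), evaluated in floating point."""
    return 2 ** (m / 2 - 1 / 2 - 1 / m) * factorial(m) ** (1 / m)


for m in range(2, 16):
    n_m = smallest_N(m)
    assert bound_5_6_6(m) < n_m  # the chain of inequalities in the proof
    print(m, n_m, round(bound_5_6_6(m), 1))
```

For $m = 9$, for example, (5.6.4) gives $R(9,9) \ge 66$, while (5.6.6) gives only $R(9,9) > 61.4$.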
References
[1] N. Alon and J. Spencer (1992): The Probabilistic Method, Wiley. MR1140703
[2] A. Cayley (1889): A theorem on trees, Quarterly Journal of Mathematics 23, 376–378.
[3] M. Erickson (1996): Introduction to Combinatorics, Wiley. MR1409365
[4] R. Graham, B. Rothschild, and J. Spencer (1990): Ramsey Theory, 2nd edition, Wiley. MR1044995
[5] F. Ramsey (1930): On a problem of formal logic, Proceedings of the London Mathematical Society (2nd series), 264–286. MR1576401
[6] F. Harary (1969): Graph Theory, Addison-Wesley. MR0256911
[7] G. Kirchhoff (1847): Über die Auflösung der Gleichungen, auf welche man bei der Untersuchung der linearen Vertheilung galvanischer Ströme geführt wird, Ann. Phys. Chem. 72, 497–505 (English trans. IRE Trans. Circuit Theory CT-5 (1958), 4–7).
[8] D. West (1996): Introduction to Graph Theory, Prentice Hall. MR1367739
Exercises

5.1. Prove that there are $2^{\binom{n-1}{2}}$ graphs on $V = [n]$ in which each vertex has even degree.
5.2. Determine the trees on 𝑉 = [8] with Prüfer codes (a) 744171 and (b) 111165.
5.3. Suppose that $A = \{a_1, \ldots, a_m\}$ and $B = \{b_1, \ldots, b_n\}$ are disjoint sets. The complete bipartite graph on $A \cup B$, denoted $K_{m,n}$, has an edge from each $a_i \in A$ to each $b_j \in B$, and no other edges. Use Kirchhoff's theorem to determine the number of spanning trees of $K_{m,n}$.

5.4. (a) Suppose that $n \ge 3$. It is obvious that there are $n$ spanning trees for the graph consisting solely of the cycle $1, 2, 3, \ldots, n, 1$. (b) Suppose that $n \ge 2$. It is obvious that there is just one spanning tree of the graph consisting solely of the path $1, 2, \ldots, n$. (c) Derive the assertions in parts (a) and (b) using Kirchhoff's theorem (Theorem 5.4.1).

5.5. Use the probabilistic method with (a) $p(\mathrm{red}) = p(\mathrm{blue}) = 0.5$ and (b) $p(\mathrm{red}) = 0.4$, $p(\mathrm{blue}) = 0.6$ to find strict lower bounds on the Ramsey number $R(4,5)$. (c) Can your answers to parts (a) or (b) be improved by invoking other theorems?
Project 5.A

Distinct vertices $x$, $y$, and $z$ in a graph form a triangle (or 3-cycle) if $\{x,y\}$, $\{y,z\}$, and $\{z,x\}$ are all edges. (a) If a graph on six vertices contains no triangles, at most how many edges can it have? (b) Answer part (a) for a graph on $n$ vertices containing no triangles.
Chapter 6
Partitions: Stirling, Lah, and cycle numbers
Set partitions and their “alter egos”, equivalence relations, are among the most fundamental concepts in mathematics. This chapter explores the enumeration of partitions under various constraints, including those in which their blocks are equipped with various discrete structures. A wide variety of combinatorial arrays arise in this way, and they turn out to do double duty as coefficients in expressing members of one polynomial basis as a linear combination of members of another such basis. The chapter concludes by observing that much of elementary enumerative combinatorics can be unified under the rubric (the twenty-fold way) of placing balls (labeled or unlabeled) in boxes (labeled or unlabeled, contents ordered or unordered).
6.1. Stirling numbers of the second kind

Recall that a partition of a set $A$ is a set of nonempty, pairwise disjoint subsets (called blocks) of $A$, with union equal to $A$. For each $n, k \ge 0$, let $S(n,k)$ denote the number of partitions $\{A_1, \ldots, A_k\}$ of $[n]$ with $k$ blocks. The numbers $S(n,k)$ are known as Stirling numbers of the second kind, and they clearly also enumerate (i) the equivalence relations on $[n]$ with $k$ equivalence classes, and (ii) the distributions of $n$ labeled balls among $k$ unlabeled (hence, indistinguishable) boxes, with no box left empty. Since the union of $k$ nonempty, pairwise disjoint sets has cardinality at least equal to $k$, it follows that $S(n,k) = 0$ if $k > n$. Also $S(0,0) = 1$, by our convention on empty unions.
Theorem 6.1.1. For all $n, k \ge 0$,
$$S(n,k) = \frac{\sigma(n,k)}{k!} \tag{6.1.1}$$
$$S(n,k) = \frac{1}{k!}\sum_{j=0}^{k}(-1)^{k-j}\binom{k}{j}j^n \tag{6.1.2}$$
$$S(n,k) = \frac{1}{k!}\sum_{\substack{n_1+\cdots+n_k=n\\ n_i>0}}\binom{n}{n_1,\ldots,n_k}. \tag{6.1.3}$$
Moreover, the numbers $S(n,k)$ are generated by the boundary values
$$S(n,0) = \delta_{n,0} \ \text{ and } \ S(0,k) = \delta_{0,k}, \quad \text{for all } n, k \ge 0, \tag{6.1.4}$$
and the recurrence relation
$$S(n,k) = S(n-1,k-1) + k\,S(n-1,k), \quad \text{for all } n, k > 0, \tag{6.1.5}$$
and they have the exponential generating function
$$\sum_{n=0}^{\infty} S(n,k)\frac{x^n}{n!} = \frac{1}{k!}(e^x - 1)^k, \quad \text{for all } k \ge 0. \tag{6.1.6}$$
Proof. Since $\sigma(n,k) = 0$ if $k > n$, and $\sigma(n,0) = \delta_{n,0}$ for all $n \ge 0$, formula (6.1.1) holds for these values of $n$ and $k$ by our preceding observations. If $1 \le k \le n$, then (6.1.1) follows from the fact that $(A_1, \ldots, A_k) \mapsto \{A_1, \ldots, A_k\}$ is a $k!$-to-one surjection from the set of ordered partitions of $[n]$ with $k$ blocks to the set of partitions of $[n]$ with $k$ blocks, together with the fact that there are $\sigma(n,k)$ ordered partitions of the former kind (a surjection $f : [n] \to [k]$ corresponds to the ordered partition $(f^{-1}(1), \ldots, f^{-1}(k))$). Formulas (6.1.2)–(6.1.6) then follow from formula (6.1.1), along with formulas (3.3.13), (4.1.3), Theorem 2.4.1, and example (4.3.3), respectively. Naturally, there is also a direct combinatorial proof of the recurrence relation (6.1.5): among all partitions of $[n]$ with $k$ blocks, $S(n-1,k-1)$ enumerates those in which $\{n\}$ is one of the blocks (the relevant bijection being $\{A_1, \ldots, A_{k-1}, \{n\}\} \mapsto \{A_1, \ldots, A_{k-1}\}$), and $k\,S(n-1,k)$ enumerates those in which $n$ belongs to a block of cardinality at least 2 (the relevant $k$-to-one surjection being the map $\{A_1, \ldots, A_k\} \mapsto \{A_1^*, \ldots, A_k^*\}$, where $A_i^* = A_i$ if $n \notin A_i$ and $A_i^* = A_i - \{n\}$ if $n \in A_i$). □
Table 6.1. The numbers 𝑆(𝑛, 𝑘) for 0 ≤ 𝑛, 𝑘 ≤ 6 (Stirling numbers of the second kind)
𝑘=0 𝑘=1 𝑘=2 𝑘=3 𝑘=4 𝑘=5 𝑘=6
𝑛=0 1 0 0 0 0 0 0
𝑛=1 0 1 0 0 0 0 0
𝑛=2 0 1 1 0 0 0 0
𝑛=3 0 1 3 1 0 0 0
𝑛=4 0 1 7 6 1 0 0
𝑛=5 0 1 15 25 10 1 0
𝑛=6 0 1 31 90 65 15 1
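Several parts of Theorem 6.1.1, together with formula (3.3.13) for $\sigma(n,k)$, can be cross-checked against Table 6.1 in a few lines. The Python sketch below (ours; the function names are hypothetical) builds $S(n,k)$ from the boundary values (6.1.4) and the recurrence (6.1.5), computes $\sigma(n,k)$ both from (3.3.13) and by listing every function $[n] \to [k]$, and checks (6.1.1) and (6.1.2).

```python
from itertools import product
from math import comb, factorial


def stirling2_table(n_max):
    """S(n, k) for 0 <= n, k <= n_max, from (6.1.4) and the recurrence (6.1.5)."""
    S = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    S[0][0] = 1  # S(n, 0) = delta_{n,0} and S(0, k) = delta_{0,k}
    for n in range(1, n_max + 1):
        for k in range(1, n_max + 1):
            S[n][k] = S[n - 1][k - 1] + k * S[n - 1][k]
    return S


def surjections_formula(n, k):
    """sigma(n, k) from (3.3.13)."""
    return sum((-1) ** (k - j) * comb(k, j) * j ** n for j in range(k + 1))


def surjections_brute(n, k):
    """Count functions [n] -> [k] (as tuples of values) whose image is all of [k]."""
    return sum(len(set(f)) == k for f in product(range(k), repeat=n))


S = stirling2_table(6)
for n in range(7):
    for k in range(7):
        assert surjections_formula(n, k) == surjections_brute(n, k)
        assert S[n][k] == surjections_formula(n, k) // factorial(k)  # (6.1.1), (6.1.2)
print(S[6])  # S(6, k) for k = 0..6: [0, 1, 31, 90, 65, 15, 1]
```

Note that the exact division by $k!$ is safe here because $\sigma(n,k)$ is always a multiple of $k!$.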
The row sums
$$B_n := \sum_{k=0}^{n} S(n,k) \tag{6.1.7}$$
are called Bell numbers (after Eric Temple Bell, 1883–1960), and they enumerate (i) the set of all partitions of $[n]$, (ii) the set of all equivalence relations on $[n]$, and (iii) the set of all distributions of $n$ labeled balls among $n$ unlabeled boxes.

Theorem 6.1.2. The Bell numbers are generated by the initial value $B_0 = 1$ and the recurrence
$$B_n = \sum_{k=1}^{n}\binom{n-1}{k-1}B_{n-k}, \quad \text{for all } n > 0, \tag{6.1.8}$$
or, equivalently (and more aesthetically),
$$B_{n+1} = \sum_{k=0}^{n}\binom{n}{k}B_k, \quad \text{for all } n \ge 0. \tag{6.1.9}$$
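Proof. Among all partitions of $[n]$, $\binom{n-1}{k-1}B_{n-k}$ enumerates those for which the cardinality of the block containing the element $n$ is equal to $k$: choose the $k-1$ other members of that block from $[n-1]$, and then partition the remaining $n-k$ elements arbitrarily. □

The exponential generating function for the Bell numbers is one of the prettiest formulas in mathematics.

Theorem 6.1.3.
$$\sum_{n=0}^{\infty} B_n\frac{x^n}{n!} = e^{e^x - 1}. \tag{6.1.10}$$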
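Proof. By (6.1.6) and (6.1.7), and since $S(n,k) = 0$ for $k > n$,
$$\sum_{n=0}^{\infty} B_n\frac{x^n}{n!} = \sum_{n=0}^{\infty}\sum_{k=0}^{n} S(n,k)\frac{x^n}{n!} = \sum_{k=0}^{\infty}\Big(\sum_{n=0}^{\infty} S(n,k)\frac{x^n}{n!}\Big) = \sum_{k=0}^{\infty}\frac{(e^x-1)^k}{k!} = e^{e^x-1}.$$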
The summation interchange here can be justified analytically or by the theory of formal power series. □

From (6.1.10) we may derive a series representation for $B_n$ analogous to formula (4.3.5).

Theorem 6.1.4 (Dobinski's formula (1877)). For all $n \ge 0$,
$$B_n = \frac{1}{e}\sum_{k=0}^{\infty}\frac{k^n}{k!}. \tag{6.1.11}$$
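Proof. With $D = d/dx$,
$$B_n = D^n e^{e^x-1}\big|_{x=0} = \frac{1}{e}D^n e^{e^x}\big|_{x=0} = \frac{1}{e}D^n\sum_{k=0}^{\infty}\frac{e^{kx}}{k!}\Big|_{x=0} = \frac{1}{e}\sum_{k=0}^{\infty}\frac{k^n}{k!}.$$
□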
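Formulas (6.1.9) and (6.1.11) give two quite different ways of computing Bell numbers, and it is reassuring to see them agree. In the Python sketch below (our own illustration), Dobinski's series is truncated after a fixed number of terms and rounded to the nearest integer. That truncation is an assumption of the sketch, harmless for small $n$ because the terms $k^n/k!$ eventually decrease faster than geometrically.

```python
from math import comb, e, factorial, fsum


def bell_recurrence(n_max):
    """B_0, ..., B_{n_max} from B_0 = 1 and (6.1.9)."""
    B = [1]
    for n in range(n_max):
        B.append(sum(comb(n, k) * B[k] for k in range(n + 1)))
    return B


def bell_dobinski(n, terms=100):
    """Dobinski's formula (6.1.11), truncated after `terms` terms and rounded."""
    return round(fsum(k ** n / factorial(k) for k in range(terms)) / e)


B = bell_recurrence(15)
assert all(bell_dobinski(n) == B[n] for n in range(16))
print(B[:11])  # [1, 1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975]
```

For large $n$, floating-point error in the Dobinski sum eventually exceeds $1/2$, so the recurrence (exact integer arithmetic) is the safer way to compute.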
Remark 6.1.5. It follows from (6.1.11) that any infinite series of the form $\sum_{k=0}^{\infty} p(k)/k!$, where $p(k)$ is a polynomial in $k$, may easily be summed. For if $p(k) = a_0 + a_1k + \cdots + a_nk^n$, then
$$\sum_{k=0}^{\infty}\frac{p(k)}{k!} = \sum_{k=0}^{\infty}\frac{1}{k!}\sum_{j=0}^{n}a_jk^j = \sum_{j=0}^{n}a_j\sum_{k=0}^{\infty}\frac{k^j}{k!} = \Big(\sum_{j=0}^{n}a_jB_j\Big)e. \tag{6.1.12}$$
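Formula (6.1.12) turns the summation of $\sum_k p(k)/k!$ into finite arithmetic with Bell numbers. As an illustration of our own, take $p(k) = 1 + 3k + k^2$: then $(B_0 + 3B_1 + B_2)e = (1 + 3 + 2)e = 6e$. The Python sketch below compares this closed form with a truncated numerical sum; `coeffs` lists $a_0, a_1, \ldots, a_n$, and the names are hypothetical.

```python
from math import comb, e, factorial, fsum


def bell_numbers(n_max):
    """B_0, ..., B_{n_max} via the recurrence (6.1.9)."""
    B = [1]
    for n in range(n_max):
        B.append(sum(comb(n, k) * B[k] for k in range(n + 1)))
    return B


def closed_form(coeffs):
    """(sum_j a_j B_j) e, the right-hand side of (6.1.12)."""
    B = bell_numbers(len(coeffs) - 1)
    return sum(a * B[j] for j, a in enumerate(coeffs)) * e


def truncated_series(coeffs, terms=60):
    """sum over k < terms of p(k)/k!, where p(k) = sum_j coeffs[j] * k**j."""
    return fsum(
        sum(a * k ** j for j, a in enumerate(coeffs)) / factorial(k) for k in range(terms)
    )


p = [1, 3, 1]  # p(k) = 1 + 3k + k^2
print(closed_form(p), truncated_series(p), 6 * e)
```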
Remark 6.1.6. From the generating function (6.1.10) one can also derive the recurrence (6.1.9). Differentiation of (6.1.10) yields
$$\sum_{n\ge0}B_{n+1}\frac{x^n}{n!} = e^x\cdot e^{e^x-1} = \sum_{n\ge0}\frac{x^n}{n!}\cdot\sum_{n\ge0}B_n\frac{x^n}{n!} = \sum_{n\ge0}\Big(\sum_{k=0}^{n}\binom{n}{k}B_k\Big)\frac{x^n}{n!}, \tag{6.1.13}$$
and equating coefficients of like powers of $x$ yields formula (6.1.9).
6.2. Restricted growth functions

In studying an unordered discrete structure, it is often useful to represent it by a so-called canonical ordered structure. In the case of a partition $\{A_1, \ldots, A_k\}$ of $[n]$, a commonly employed canonical representative is the ordered partition $(B_1, \ldots, B_k)$ obtained by arranging the blocks $A_i$ in increasing order of their smallest elements. So, for example, the partition $\{\{6,4,2\},\{3,9\},\{8\},\{5,1,7\}\}$ is represented by the canonical ordered partition $(\{5,1,7\},\{6,4,2\},\{3,9\},\{8\})$, which is often further simplified by writing the elements of each block in increasing order, and dispensing with brackets, parentheses, etc., as follows: 157 246 39 8.

Having constructed the canonical ordered partition representing a given partition, one then goes on to define a surjective function $g : [n] \to [k]$, where $g(i) = j$ whenever $i \in B_j$. Such surjections have the property that, in the sequence of functional values $(g(1), \ldots, g(n))$, the first appearance of $j$ precedes the first appearance of $j+1$ for $j = 1, \ldots, k-1$. In particular, it is always the case that $g(1) = 1$. Surjections $g : [n] \to [k]$ with the latter property are called restricted growth functions from $[n]$ to $[k]$, and they clearly stand in one-to-one correspondence with canonical ordered partitions of $[n]$ with $k$ blocks and, hence, with partitions of $[n]$ with $k$ blocks. As an illustration, the partition of $[9]$ represented by 157 246 39 8 is associated with the restricted growth function $g : [9] \to [4]$, where $(g(1), \ldots, g(9)) = (1, 2, 3, 2, 1, 2, 1, 4, 3)$.

As a consequence of the preceding observations, we now have an additional combinatorial interpretation of the Stirling number $S(n,k)$ as enumerator of the set of all restricted growth functions $g : [n] \to [k]$. This leads to yet another representation of $S(n,k)$.

Theorem 6.2.1. For all $n, k \ge 0$,
$$S(n,k) = \sum_{\substack{d_0+d_1+\cdots+d_k = n-k\\ d_i \ge 0}} 0^{d_0}1^{d_1}2^{d_2}\cdots k^{d_k}. \tag{6.2.1}$$
Proof. If $k = 0$, the sum in question consists of the single term $0^n = \delta_{n,0}$. If $k > n$, the sum in question is empty and thus takes the value 0. If $1 \le k \le n$, then, among all restricted growth functions $g : [n] \to [k]$, the term $0^{d_0}1^{d_1}2^{d_2}\cdots k^{d_k}$ enumerates those having the property that, in the sequence $(g(1), \ldots, g(n))$, there are $d_j$ numbers appearing between the first occurrence of $j$ and the first occurrence of $j+1$, for $j = 1, \ldots, k-1$, and there are $d_k$ numbers appearing after the first occurrence of $k$. (Each of the $d_j$ numbers in the $j$th stretch may be any of $1, \ldots, j$, and the factor $0^{d_0}$ vanishes unless $d_0 = 0$, reflecting the fact that $g(1) = 1$.) □

From formula (6.2.1) it is straightforward to derive the ordinary generating function of the $k$th column of Stirling's triangle.

Theorem 6.2.2. For all $k \in \mathbb{N}$,
$$\sum_{n=0}^{\infty} S(n,k)x^n = \frac{x^k}{(1-x)(1-2x)\cdots(1-kx)}, \quad \text{for } |x| < 1/k. \tag{6.2.2}$$

Proof. Exercise. □
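Restricted growth functions are easy to generate by machine, which gives a direct check of Theorems 6.2.1 and 6.2.2 for small $n$. The Python sketch below is our own illustration with hypothetical names:

* `is_rgf` tests the defining property: each value is at most one more than the largest value seen so far, and every value in $[k]$ occurs.
* `stirling2_6_2_1` evaluates the sum (6.2.1).
* `column_gf` expands the right-hand side of (6.2.2) as a power series by dividing $x^k$ by $1 - jx$ for $j = 1, \ldots, k$ in turn.

```python
from itertools import product


def is_rgf(g, k):
    """True if g = (g(1), ..., g(n)) is a restricted growth function onto [k]."""
    largest = 0
    for value in g:
        if value > largest + 1:  # value would appear before value - 1 has appeared
            return False
        largest = max(largest, value)
    return largest == k


def count_rgfs(n, k):
    return sum(is_rgf(g, k) for g in product(range(1, k + 1), repeat=n))


def weak_compositions(total, parts):
    """All tuples of `parts` nonnegative integers summing to `total`."""
    if parts == 0:
        if total == 0:
            yield ()
        return
    for first in range(total + 1):
        for rest in weak_compositions(total - first, parts - 1):
            yield (first,) + rest


def stirling2_6_2_1(n, k):
    """The sum (6.2.1) over d_0 + ... + d_k = n - k of 0^d_0 1^d_1 ... k^d_k."""
    if k > n:
        return 0
    total = 0
    for d in weak_compositions(n - k, k + 1):
        term = 1
        for j, d_j in enumerate(d):
            term *= j ** d_j  # Python evaluates 0 ** 0 as 1
        total += term
    return total


def column_gf(k, n_max):
    """Coefficients of x^0..x^n_max in x^k / ((1 - x)(1 - 2x)...(1 - kx))."""
    coeffs = [1 if n == k else 0 for n in range(n_max + 1)]
    for j in range(1, k + 1):
        for n in range(1, n_max + 1):  # divide the series by (1 - jx)
            coeffs[n] += j * coeffs[n - 1]
    return coeffs


assert is_rgf((1, 2, 3, 2, 1, 2, 1, 4, 3), 4)  # the example 157 246 39 8
for k in range(5):
    col = column_gf(k, 7)
    for n in range(8):
        assert count_rgfs(n, k) == stirling2_6_2_1(n, k) == col[n]
print(column_gf(2, 7))  # [0, 0, 1, 3, 7, 15, 31, 63]
```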
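There is also a very fine-grained enumeration of partitions of $[n]$ with $k$ blocks. Suppose that $1 \le k \le n$, $\lambda_1 + \cdots + \lambda_n = k$, and $1\lambda_1 + 2\lambda_2 + \cdots + n\lambda_n = n$, with each $\lambda_i \in \mathbb{N}$. Let $S(n,k;1^{\lambda_1}2^{\lambda_2}\cdots n^{\lambda_n})$ be equal by definition to the number of partitions of $[n]$ with $k$ blocks, $\lambda_i$ of which have cardinality $i$, for $i = 1, \ldots, n$.

Theorem 6.2.3. If $1 \le k \le n$, then
$$S(n,k;1^{\lambda_1}2^{\lambda_2}\cdots n^{\lambda_n}) = \frac{n!}{\lambda_1!\,\lambda_2!\cdots\lambda_n!\,(1!)^{\lambda_1}(2!)^{\lambda_2}\cdots(n!)^{\lambda_n}}. \tag{6.2.3}$$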
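Proof. The map
$$i_1i_2\cdots i_n \mapsto \big\{\{i_1\},\{i_2\},\ldots,\{i_{\lambda_1}\},\{i_{\lambda_1+1},i_{\lambda_1+2}\},\ldots,\{i_{\lambda_1+2\lambda_2-1},i_{\lambda_1+2\lambda_2}\},\ldots\big\},$$
which creates one-element blocks from the first $\lambda_1$ entries of the permutation $i_1i_2\cdots i_n$, two-element blocks from the next $\lambda_2$ pairs, three-element blocks from the next $\lambda_3$ triples, etc., is a $\lambda_1!\,\lambda_2!\cdots\lambda_n!\,(1!)^{\lambda_1}(2!)^{\lambda_2}\cdots(n!)^{\lambda_n}$-to-one surjection from the set of permutations of $[n]$, construed as words, to the set of partitions of $[n]$ under consideration. (A given partition is the image of exactly those words obtained from one of them by reordering, in any of $\lambda_i!$ ways, the $\lambda_i$ blocks of size $i$, and by reordering, in any of $i!$ ways, the entries within each block of size $i$.) □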
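Formula (6.2.3) can be tested by generating every partition of $[n]$ and sorting the partitions by block type. In the Python sketch below (our own illustration; the names are hypothetical), partitions are generated recursively by inserting $n$ either into an existing block of a partition of $[n-1]$ or into a new singleton block, and `lam[i]` plays the role of $\lambda_i$ (the entry `lam[0]` is unused).

```python
from collections import Counter
from math import factorial, prod


def set_partitions(n):
    """Yield every partition of {1, ..., n} as a list of blocks (lists)."""
    if n == 0:
        yield []
        return
    for partition in set_partitions(n - 1):
        for i in range(len(partition)):  # put n into an existing block
            yield partition[:i] + [partition[i] + [n]] + partition[i + 1:]
        yield partition + [[n]]  # or into a new singleton block


def formula_6_2_3(n, lam):
    """n! / prod_i (lam_i! (i!)^lam_i), where lam[i] counts the blocks of size i."""
    denominator = prod(factorial(lam[i]) * factorial(i) ** lam[i] for i in range(1, n + 1))
    return factorial(n) // denominator


for n in range(1, 8):
    tally = Counter()
    for partition in set_partitions(n):
        sizes = Counter(len(block) for block in partition)
        tally[tuple(sizes[i] for i in range(n + 1))] += 1
    for lam, count in tally.items():
        assert count == formula_6_2_3(n, lam)
print(sum(tally.values()))  # 877 = B_7 partitions of [7]
```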
6.3. The numbers 𝜎(𝑛, 𝑘) and 𝑆(𝑛, 𝑘) as connection constants

Recall that any sequence of polynomials $p_0(x), p_1(x), \ldots, p_n(x)$ over a field $K$, with $\deg p_k(x) = k$, is a basis for the vector space of all polynomials in $K[x]$ of degree $\le n$. In particular, the binomial polynomials $\binom{x}{0}, \binom{x}{1}, \ldots, \binom{x}{n}$, as well as the falling factorial polynomials $x^{\underline{0}}, x^{\underline{1}}, \ldots, x^{\underline{n}}$, are bases for the vector space of all polynomials of degree $\le n$ over any field of characteristic 0. Hence, the monomial $x^n$ may be written both as a linear combination of binomial polynomials and as a linear combination of falling factorial polynomials.

In what follows, we will have occasion to consider a number of polynomial identities. We will usually offer combinatorial, rather than algebraic, proofs of these identities, based on the familiar fact that a nonzero polynomial of degree $n$ over an integral domain $A$ has at most $n$ roots in $A$, multiplicities counted. So if $p(x) = a_0 + a_1x + \cdots + a_nx^n$ and $q(x) = b_0 + b_1x + \cdots + b_nx^n$ are polynomials of degree $\le n$ over $A$, and there exist distinct $r_1, \ldots, r_{n+1} \in A$ such that $p(r_k) = q(r_k)$, for $1 \le k \le n+1$, then $p(x) = q(x)$, in the sense that $a_i = b_i$, for $0 \le i \le n$. In other words, to prove such a polynomial identity, it suffices to prove sufficiently many instantiations of that identity. In the cases that we consider, such instantiations result in identities between nonnegative integers, which we establish by combinatorial arguments. Here is a typical example.
Theorem 6.3.1. For all $n \ge 0$,
$$x^n = \sum_{k=0}^{n}\sigma(n,k)\binom{x}{k}, \tag{6.3.1}$$
and so
$$x^n = \sum_{k=0}^{n}S(n,k)\,x^{\underline{k}}. \tag{6.3.2}$$
Proof. By the preceding remarks, to establish (6.3.1), it more than suffices to show that for all $r \in \mathbb{P}$, $r^n = \sum_{k=0}^{n}\sigma(n,k)\binom{r}{k}$, which follows by observing that, among all functions $f : [n] \to [r]$, the term $\sigma(n,k)\binom{r}{k}$ enumerates those for which $|\operatorname{im}(f)| = k$. Formula (6.3.2) follows from (6.1.1) and (6.3.1), since $\sigma(n,k)\binom{x}{k} = S(n,k)\,k!\binom{x}{k} = S(n,k)\,x^{\underline{k}}$. □

Remark 6.3.2 (Another proof of the recurrence for the Bell numbers). Although our combinatorial proof of formula (6.1.9) is simple, clear, and memorable, it is worth considering a proof of the recurrence
$$B_{n+1} = \sum_{k=0}^{n}\binom{n}{k}B_k, \quad \text{for all } n \ge 0, \tag{6.3.3}$$
that employs a powerful and broadly applicable tool of combinatorial analysis devised by Gian-Carlo Rota (1964), the method of linear functionals. Since the polynomial sequence $(x^{\underline{k}})_{k\ge0}$ is a basis of the $\mathbb{Q}$-vector space $\mathbb{Q}[x]$, we may define a linear functional $L$ on $\mathbb{Q}[x]$ by setting $L(x^{\underline{k}}) = 1$ for all $k \ge 0$. Then formula (6.3.2) implies that $L(x^n) = \sum_{k=0}^{n}S(n,k) = B_n$, and so formula (6.3.3) may be established by proving that
$$L(x^{n+1}) = \sum_{k=0}^{n}\binom{n}{k}L(x^k) = L((x+1)^n). \tag{6.3.4}$$
Since $x^{\underline{k+1}} = x\cdot(x-1)^{\underline{k}}$, we have $L(x\cdot(x-1)^{\underline{k}}) = L(x^{\underline{k+1}}) = 1 = L(x^{\underline{k}})$, for all $k \ge 0$. Again, since $(x^{\underline{k}})_{k\ge0}$ is a basis of $\mathbb{Q}[x]$, it follows (you should check this) that
$$L(x\cdot p(x-1)) = L(p(x)), \quad \text{for every } p(x) \in \mathbb{Q}[x]. \tag{6.3.5}$$
In particular, (6.3.5) holds for $p(x) = (x+1)^n$, so that $L(x^{n+1}) = L((x+1)^n)$.
□
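Rota's functional $L$ is concrete enough to implement. The Python sketch below is our own illustration, with hypothetical names; polynomials are plain coefficient lists $[a_0, a_1, \ldots]$ in the monomial basis. It rewrites a polynomial in the falling factorial basis using (6.3.2) and applies $L(x^{\underline{k}}) = 1$. It then checks three things: the values $L(x^n) = B_n$, the shift identity (6.3.5) for a few random polynomials, and the special case (6.3.4).

```python
import random
from math import comb


def stirling2_table(n_max):
    """S(n, k), 0 <= n, k <= n_max, from (6.1.4) and (6.1.5)."""
    S = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    S[0][0] = 1
    for n in range(1, n_max + 1):
        for k in range(1, n + 1):
            S[n][k] = S[n - 1][k - 1] + k * S[n - 1][k]
    return S


def L(poly):
    """Rota's functional: expand in falling factorials via (6.3.2), then L(x^(k)) = 1."""
    S = stirling2_table(len(poly))
    falling_coeffs = [sum(a * S[n][k] for n, a in enumerate(poly)) for k in range(len(poly))]
    return sum(falling_coeffs)


def shift(poly, c):
    """Coefficients of p(x + c)."""
    out = [0] * len(poly)
    for n, a in enumerate(poly):
        for i in range(n + 1):
            out[i] += a * comb(n, i) * c ** (n - i)
    return out


def times_x(poly):
    """Coefficients of x * p(x)."""
    return [0] + list(poly)


def monomial(n):
    return [0] * n + [1]


print([L(monomial(n)) for n in range(9)])  # [1, 1, 2, 5, 15, 52, 203, 877, 4140]

random.seed(2)
for _ in range(50):
    p = [random.randint(-5, 5) for _ in range(random.randint(1, 7))]
    assert L(times_x(shift(p, -1))) == L(p)  # (6.3.5)
for n in range(10):
    assert L(monomial(n + 1)) == L(shift(monomial(n), 1))  # (6.3.4)
```

All arithmetic here is on integers, so the identities are checked exactly rather than up to rounding.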
Remark 6.3.3 (Applications to probability). Recall that the $n$th moment of a random variable $X$ is equal to the expected value $E[X^n]$, and the $n$th factorial moment of $X$ to the expected value $E[X^{\underline{n}}]$. As the latter is often easier to evaluate than the former, the formula $E[X^n] = \sum_{k=0}^{n}S(n,k)E[X^{\underline{k}}]$, which follows from (6.3.2) and the linearity of the expectation operator, is worth remembering. In particular, one often computes the variance of $X$, $\mathrm{Var}[X] = E[X^2] - (E[X])^2$, by the alternative formula $\mathrm{Var}[X] = E[X^{\underline{2}}] + E[X](1 - E[X])$.

The nonnegative integer-valued random variable $X$ has a Poisson distribution with parameter $\lambda > 0$ (symbolized, $X \sim \mathrm{Poisson}(\lambda)$) if $P(X = k) = e^{-\lambda}\lambda^k/k!$, $k = 0, 1, 2, \ldots$. It is easy to show that $E[X] = \lambda$. More generally (as you are asked to show in one of the exercises), $E[X^n] = \sum_{k=0}^{n}S(n,k)\lambda^k$. In particular, if $X \sim \mathrm{Poisson}(1)$, then $E[X^n] = B_n$, the $n$th Bell number, which is also immediately obvious from Dobinski's formula (Theorem 6.1.4).
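The moment formula $E[X^n] = \sum_k S(n,k)\lambda^k$ for $X \sim \mathrm{Poisson}(\lambda)$ can be checked numerically against the defining series $\sum_k k^n e^{-\lambda}\lambda^k/k!$. The Python sketch below is our own; the names and the truncation point are assumptions chosen for small $n$ and $\lambda$. It also confirms that the moments of a Poisson(1) variable round to Bell numbers.

```python
from math import exp


def stirling2_rows(n_max):
    """Rows S(n, 0..n) for n = 0..n_max, from (6.1.4) and (6.1.5)."""
    rows = [[1]]
    for n in range(1, n_max + 1):
        prev = rows[-1] + [0]  # pad so that prev[k] = S(n-1, k) for k <= n
        rows.append([0] + [prev[k - 1] + k * prev[k] for k in range(1, n + 1)])
    return rows


def moment_by_series(n, lam, terms=200):
    """E[X^n] for X ~ Poisson(lam), summing k^n P(X = k) for k < terms."""
    total, prob = 0.0, exp(-lam)  # prob = P(X = 0)
    for k in range(terms):
        total += k ** n * prob
        prob *= lam / (k + 1)  # P(X = k + 1) from P(X = k)
    return total


def moment_by_stirling(n, lam):
    """sum_k S(n, k) lam^k."""
    return sum(s * lam ** k for k, s in enumerate(stirling2_rows(n)[n]))


for lam in (0.5, 1.0, 3.0):
    for n in range(7):
        a, b = moment_by_series(n, lam), moment_by_stirling(n, lam)
        assert abs(a - b) <= 1e-9 * max(1.0, b)
print([round(moment_by_series(n, 1.0)) for n in range(7)])  # [1, 1, 2, 5, 15, 52, 203]
```

With $n = 2$ the formula gives $E[X^2] = \lambda + \lambda^2$, so $\mathrm{Var}[X] = \lambda$, matching the variance identity above with $E[X^{\underline{2}}] = \lambda^2$.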
6.4. Stirling numbers of the first kind

Stirling numbers of the first kind, denoted $s(n,k)$, are the connection constants between the basis of falling factorial polynomials of degree $\le n$ and the basis $1, x, \ldots, x^n$. That is, for all $n \ge 0$,
$$x^{\underline{n}} = x(x-1)\cdots(x-n+1) = \sum_{k=0}^{n}s(n,k)x^k. \tag{6.4.1}$$
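The coefficients $s(n,k)$ in (6.4.1) can be computed by multiplying out $x(x-1)\cdots(x-n+1)$ one factor at a time. The Python sketch below (our own; the names are hypothetical) does this. It also checks the recurrence $s(n,k) = s(n-1,k-1) - (n-1)s(n-1,k)$, and it checks the description of $|s(n,k)|$ as the elementary symmetric function $E_{n-k}(1, 2, \ldots, n-1)$ of Remark 6.4.1, which is the form used in (6.6.3).

```python
from itertools import combinations
from math import prod


def stirling1_row(n):
    """Coefficients s(n, 0), ..., s(n, n) of x(x-1)...(x-n+1), as in (6.4.1)."""
    coeffs = [1]  # the empty product
    for i in range(n):
        new = [0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):  # multiply by (x - i)
            new[k + 1] += c
            new[k] -= i * c
        coeffs = new
    return coeffs


def elementary_symmetric(j, values):
    """E_j(values): the sum of all products of j of the values, as in (6.4.3)."""
    return sum(prod(choice) for choice in combinations(values, j))


print(stirling1_row(4))  # [0, -6, 11, -6, 1]

for n in range(1, 9):
    row, prev = stirling1_row(n), stirling1_row(n - 1) + [0]
    for k in range(1, n + 1):
        assert row[k] == prev[k - 1] - (n - 1) * prev[k]
        assert abs(row[k]) == elementary_symmetric(n - k, range(1, n))
```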
Remark 6.4.1 (Elementary symmetric functions). Recall that
$$(x+a_1)(x+a_2)\cdots(x+a_n) = \sum_{j=0}^{n}E_{n-j}(a_1,\ldots,a_n)x^j = \sum_{j=0}^{n}E_j(a_1,\ldots,a_n)x^{n-j}, \tag{6.4.2}$$
where $E_j(a_1,\ldots,a_n)$ is the $j$th elementary symmetric function in the numbers $a_1, \ldots, a_n$, i.e.,
$$E_j(a_1,\ldots,a_n) := \sum_{1\le i_1<i_2<\cdots<i_j\le n}a_{i_1}a_{i_2}\cdots a_{i_j}. \tag{6.4.3}$$
[…] 0,
$$c(n,k) = c(n-1,k-1) + (n-1)\,c(n-1,k). \tag{6.6.1}$$
Proof. Among all distributions of the type under consideration, $c(n-1,k-1)$ enumerates those in which ball $n$ is the sole occupant of its box. Correspondingly, $(n-1)c(n-1,k)$ enumerates those distributions in which ball $n$ shares a box with at least one other ball, since the map which removes ball $n$ from that box is an $(n-1)$-to-one surjection between the relevant classes of distributions. (Why?) □

Remark 6.6.2. The numbers $c(n,k)$ are called cycle numbers, for the following reason. Suppose that $\pi : [n] \to [n]$ is a permutation of $[n]$. The values taken on by $\pi$ are often exhibited as the word $\pi(1)\pi(2)\cdots\pi(n)$ or as the array
$$\begin{pmatrix}1 & 2 & \cdots & n\\ \pi(1) & \pi(2) & \cdots & \pi(n)\end{pmatrix}.$$
In addition, there is the so-called cycle notation for $\pi$, which, for the permutation
$$\pi = \begin{pmatrix}1 & 2 & 3 & 4 & 5 & 6 & 7\\ 4 & 2 & 7 & 1 & 3 & 6 & 5\end{pmatrix},$$
is given by $(14)(2)(375)(6)$. Note that $1 \mapsto 4 \mapsto 1$, $2 \mapsto 2$, $3 \mapsto 7 \mapsto 5 \mapsto 3$, and $6 \mapsto 6$, i.e., the permutation $\pi$ may be decomposed into four cycles. Of course we could have represented the cycle $(14)$ equally well by $(41)$, and the cycle $(375)$ equally well by $(753)$. And we need not have arranged the four cycles above in exactly that order. Our cyclic decomposition of $\pi$ is arranged in the canonical form in which the smallest member of each cycle is listed first, and the cycles are arranged, left to right, in increasing order of their first elements. But now it is clear that the number of permutations of $[n]$ with $k$ cycles is equal to $c(n,k)$, the number of distributions of balls labeled $1, \ldots, n$ among $k$ unlabeled, contents-ordered boxes, with no box left empty, and with the ball with smallest label in a box placed first in that box, since we can always arrange those unlabeled boxes from left to right, in increasing order of the numbers on their leftmost balls.

Table 6.4. The numbers 𝑐(𝑛, 𝑘) for 0 ≤ 𝑛, 𝑘 ≤ 6 (cycle numbers)
𝑛=0 𝑛=1 𝑛=2 𝑛=3 𝑛=4 𝑛=5 𝑛=6
𝑘=0 1 0 0 0 0 0 0
𝑘=1 0 1 1 2 6 24 120
𝑘=2 0 0 1 3 11 50 274
𝑘=3 0 0 0 1 6 35 225
𝑘=4 0 0 0 0 1 10 85
𝑘=5 0 0 0 0 0 1 15
𝑘=6 0 0 0 0 0 0 1
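The recurrence (6.6.1) and the permutation interpretation of Remark 6.6.2 can be checked against each other mechanically. The following Python sketch is our own illustration, not the book's; the function names are hypothetical. It builds c(n, k) from (6.6.1), with c(0, 0) = 1 and all other boundary values 0, and compares the result with a brute-force count of permutations of [n] by number of cycles.

```python
from collections import Counter
from itertools import permutations


def cycle_numbers(N):
    """Table c[n][k], 0 <= n, k <= N, built from the recurrence (6.6.1)."""
    c = [[0] * (N + 1) for _ in range(N + 1)]
    c[0][0] = 1  # boundary values: c(0, 0) = 1; every other c(n, 0) and c(0, k) is 0
    for n in range(1, N + 1):
        for k in range(1, N + 1):
            c[n][k] = c[n - 1][k - 1] + (n - 1) * c[n - 1][k]
    return c


def count_cycles(perm):
    """Number of cycles of the permutation i -> perm[i] of {0, ..., n-1}."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            i = start
            while i not in seen:  # walk once around the cycle containing start
                seen.add(i)
                i = perm[i]
    return cycles


def cycle_counts_by_brute_force(n):
    """Counter mapping k to the number of permutations of [n] with k cycles."""
    return Counter(count_cycles(p) for p in permutations(range(n)))


c = cycle_numbers(6)
for n in range(7):
    counts = cycle_counts_by_brute_force(n)
    assert all(counts[k] == c[n][k] for k in range(7))
print(c[6])  # [0, 120, 274, 225, 85, 15, 1]
```

The printed row is the 𝑛 = 6 column of Table 6.4.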
Since 𝑐(𝑛, 𝑘) is the number of permutations of [𝑛] with 𝑘 cycles, it is unsurprising that $\sum_{k=0}^{n} c(n, k) = n!$, that 𝑐(𝑛, 1) = (𝑛 − 1)!, and that 𝑐(𝑛, 𝑛) = 1. Comparison of Table 6.4 with Table 6.2 also suggests the following theorem.

Theorem 6.6.3. For all 𝑛, 𝑘 ≥ 0, 𝑐(𝑛, 𝑘) = |𝑠(𝑛, 𝑘)|.

Proof. Since 𝑐(𝑛, 0) = |𝑠(𝑛, 0)| = 𝛿𝑛,0 and 𝑐(0, 𝑘) = |𝑠(0, 𝑘)| = 𝛿0,𝑘, for all 𝑛, 𝑘 ≥ 0, it suffices to show that, for all 𝑛, 𝑘 > 0,

$$|s(n, k)| = |s(n-1, k-1)| + (n-1)\,|s(n-1, k)|. \tag{6.6.2}$$
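By formula (6.4.4), $s(n, k) = (-1)^{n-k} E_{n-k}(1, 2, \dots, n-1)$. So, since the middle term below is a sum of products of positive integers and hence nonnegative,

$$(-1)^{n-k} s(n, k) = E_{n-k}(1, 2, \dots, n-1) = |s(n, k)|. \tag{6.6.3}$$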
But from $s(n, k) = s(n-1, k-1) - (n-1)\,s(n-1, k)$ it follows that

$$(-1)^{n-k} s(n, k) = (-1)^{(n-1)-(k-1)} s(n-1, k-1) + (n-1)(-1)^{(n-1)-k} s(n-1, k),$$

which, by (6.6.3), establishes (6.6.2). □

As a result of Theorem 6.6.3, the cycle numbers 𝑐(𝑛, 𝑘) are sometimes called signless Stirling numbers of the first kind. Unsurprisingly, the numbers 𝑐(𝑛, 𝑘) are the connection constants between the basis $x^{\overline{0}}, x^{\overline{1}}, \dots, x^{\overline{n}}$ of rising factorial polynomials and the basis of monomials $x^0, x^1, \dots, x^n$.

Theorem 6.6.4. For all 𝑛 ≥ 0,

$$x^{\overline{n}} = x(x+1)\cdots(x+n-1) = \sum_{k=0}^{n} c(n, k)\, x^k. \tag{6.6.4}$$
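Before turning to the proof, Theorems 6.6.3 and 6.6.4 can be spot-checked numerically. The sketch below is ours, and its helper names are hypothetical. It expands x(x+1)⋯(x+n−1) and x(x−1)⋯(x−n+1) as integer coefficient lists and compares them with the recurrence (6.6.1) and with the elementary symmetric functions in (6.6.3). It checks small cases only and is not a proof.

```python
from itertools import combinations
from math import prod


def cycle_numbers(N):
    """Table c[n][k], 0 <= n, k <= N, from the recurrence (6.6.1)."""
    c = [[0] * (N + 1) for _ in range(N + 1)]
    c[0][0] = 1
    for n in range(1, N + 1):
        for k in range(1, N + 1):
            c[n][k] = c[n - 1][k - 1] + (n - 1) * c[n - 1][k]
    return c


def expand_product(shifts):
    """Coefficients (constant term first) of the product of (x + a) over a in shifts."""
    coeffs = [1]
    for a in shifts:
        # multiplying by (x + a): new[i] = a * old[i] + old[i - 1]
        coeffs = ([a * coeffs[0]]
                  + [coeffs[i - 1] + a * coeffs[i] for i in range(1, len(coeffs))]
                  + [coeffs[-1]])
    return coeffs


def elementary(j, values):
    """E_j(values): sum over j-element subsets of the product of their members, as in (6.4.3)."""
    return sum(prod(subset) for subset in combinations(values, j))


N = 7
c = cycle_numbers(N)
for n in range(N + 1):
    rising = expand_product(range(n))                 # x(x+1)...(x+n-1)
    falling = expand_product([-i for i in range(n)])  # x(x-1)...(x-n+1)
    for k in range(n + 1):
        assert rising[k] == c[n][k]                          # Theorem 6.6.4
        assert falling[k] == (-1) ** (n - k) * rising[k]     # Theorem 6.6.3
        assert rising[k] == elementary(n - k, range(1, n))   # formula (6.6.3)
print(expand_product(range(4)))  # [0, 6, 11, 6, 1]
```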
6. Partitions: Stirling, Lah, and cycle numbers
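Proof. Substitute −𝑥 for 𝑥 in formula (6.4.1), and apply formula (6.6.3) and Theorem 6.6.3. We may also give a combinatorial proof by showing that, for all 𝑟 ∈ ℙ,

$$r^{\overline{n}} = \sum_{k=0}^{n} c(n, k)\, r^k, \tag{6.6.5}$$

which suffices, since two polynomials that agree at infinitely many points are identical.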
If 𝑛 = 0, (6.6.5) is obvious, so suppose that 𝑛 ≥ 1. Let 𝒜 be the set of all distributions of 𝑛 balls, labeled 1, . . . , 𝑛, among 𝑟 contents-ordered boxes, labeled 1, . . . , 𝑟. Let ℬ be the set of all distributions of 𝑛 balls, labeled 1, . . . , 𝑛, among 𝑛 contents-ordered, unlabeled boxes, where the ball with smallest label in each box is placed first in that box, and, for 𝑘 = 1, . . . , 𝑛, let ℬ𝑘 be the set of distributions in ℬ with exactly 𝑘 nonempty boxes. Note that {ℬ1, . . . , ℬ𝑛} is a partition of ℬ. We will exhibit a map from 𝒜 to ℬ such that, for each 𝑘 = 1, . . . , 𝑛, every distribution in ℬ𝑘 has $r^k$ preimages in 𝒜 under this map. By Theorem 2.5.5 (used here for the first time), it will then follow that $|\mathcal{A}| = \sum_{k=1}^{n} |\mathcal{B}_k|\, r^k$, and since $|\mathcal{A}| = r^{\overline{n}}$ and $|\mathcal{B}_k| = c(n, k)$, this will establish (6.6.5). The map is defined as follows. Given a distribution in 𝒜, for example (with 𝑛 = 9 and 𝑟 = 4),

$$[{}^{\wedge}9\,{}^{\wedge}7\,{}^{\wedge}56]_1\; [{}^{\wedge}34\,{}^{\wedge}1]_2\; [\ ]_3\; [{}^{\wedge}28]_4, \tag{6.6.6}$$
move left to right through the boxes, creating a break (indicated above by ∧) before each new record low in that box (the initial element in each box is, by convention, a record low). From this configuration, create 𝑛 unlabeled contents-ordered boxes (using as many empty boxes as necessary), in each of which the ball with smallest label comes first. For example, the configuration (6.6.6) yields the nine-box configuration (with 𝑘 = 6)

$$[9][7][56][34][1][28][\ ][\ ][\ ]. \tag{6.6.7}$$
(Note that the order in which the above boxes appear is immaterial.) The above configuration has $4^6$ preimages under this map: the sequence in each of the 6 nonempty boxes in (6.6.7) may be placed in any of the boxes labeled 1, . . . , 4, and if more than one sequence is placed in a given labeled box, they are to be ordered in decreasing order of their first elements. Here, for example, are two preimages of the configuration (6.6.7) distinct from (6.6.6):

$$[\ ]_1\; [975634281]_2\; [\ ]_3\; [\ ]_4 \tag{6.6.8}$$

and

$$[561]_1\; [934]_2\; [728]_3\; [\ ]_4. \tag{6.6.9}$$
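More generally, any distribution in ℬ𝑘 has $r^k$ preimages in 𝒜 under the map in question. Each of its 𝑘 nonempty boxes may be sent to any of the 𝑟 labeled boxes. Ordering the sequences within each labeled box by decreasing first elements guarantees that the record lows in that box are exactly the first elements of those sequences, so the map returns the original distribution. □

It turns out that the cycle numbers count other categories of permutations as well. Given a permutation of [𝑛] represented as the word 𝑖1 𝑖2 ⋯ 𝑖𝑛, and given 𝑗 ∈ [𝑛], the element 𝑖𝑗 is called a
(1) left-to-right minimum (or record low) if 𝑘 < 𝑗 ⇒ 𝑖𝑘 > 𝑖𝑗;
(2) left-to-right maximum (or record high) if 𝑘 < 𝑗 ⇒ 𝑖𝑘 < 𝑖𝑗;
(3) right-to-left minimum if 𝑘 > 𝑗 ⇒ 𝑖𝑘 > 𝑖𝑗;
(4) right-to-left maximum if 𝑘 > 𝑗 ⇒ 𝑖𝑘 < 𝑖𝑗.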
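The four kinds of records, and the map Π0 → Π1 used in the proof of Theorem 6.6.5 below, can be sketched as follows. This is our own illustration. The helper names `records`, `canonical_cycles`, and `cycles_to_word` are hypothetical, and permutations are written as words in the integers 1, . . . , 𝑛. The last lines tally the permutations of [6] by number of left-to-right minima, which should reproduce the 𝑛 = 6 column of Table 6.4.

```python
import operator
from collections import Counter
from itertools import permutations


def records(word, better):
    """Records met while scanning word left to right; better(x, r) says x beats the record r."""
    found = []
    for x in word:
        if not found or better(x, found[-1]):
            found.append(x)
    return found


def canonical_cycles(word):
    """Cycles of i -> word[i-1]: smallest element first, cycles ordered by first element."""
    seen, cycles = set(), []
    for start in range(1, len(word) + 1):
        if start not in seen:
            cycle, i = [], start
            while i not in seen:
                seen.add(i)
                cycle.append(i)
                i = word[i - 1]
            cycles.append(tuple(cycle))
    return cycles


def cycles_to_word(cycles):
    """List the cycles by decreasing first element and concatenate them."""
    ordered = sorted(cycles, key=lambda cyc: cyc[0], reverse=True)
    return tuple(x for cyc in ordered for x in cyc)


w = (4, 2, 7, 1, 3, 6, 5)
print(records(w, operator.lt))        # left-to-right minima: [4, 2, 1]
print(records(w, operator.gt))        # left-to-right maxima: [4, 7]
print(records(w[::-1], operator.lt))  # right-to-left minima: [5, 3, 1]
print(records(w[::-1], operator.gt))  # right-to-left maxima: [5, 6, 7]
print(canonical_cycles(w), cycles_to_word(canonical_cycles(w)))
# [(1, 4), (2,), (3, 7, 5), (6,)] (6, 3, 7, 5, 2, 1, 4)

perms = list(permutations(range(1, 7)))
print(sorted(Counter(len(records(p, operator.lt)) for p in perms).items()))
# [(1, 120), (2, 274), (3, 225), (4, 85), (5, 15), (6, 1)]
```

Right-to-left records are found by scanning the reversed word, which lists them in the order used in the example above.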
Note that the first element of a permutation is, vacuously, both a left-to-right minimum and a left-to-right maximum. Similarly, the last element is, vacuously, both a right-to-left minimum and a right-to-left maximum. For example, the permutation 4271365 has three left-to-right minima (4, 2, and 1), two left-to-right maxima (4 and 7), three right-to-left minima (5, 3, and 1), and three right-to-left maxima (5, 6, and 7).

Theorem 6.6.5. For all 𝑛, 𝑘 ≥ 0, 𝑐(𝑛, 𝑘) enumerates each of the following classes of permutations of [𝑛]: those with (1) 𝑘 left-to-right minima, (2) 𝑘 left-to-right maxima, (3) 𝑘 right-to-left minima, and (4) 𝑘 right-to-left maxima.

Proof. Let Π0 denote the set of permutations of [𝑛] with 𝑘 cycles, and for 𝑡 = 1, 2, 3, 4, let Π𝑡 denote the set of permutations of [𝑛] satisfying condition (𝑡) above. We have shown earlier that |Π0| = 𝑐(𝑛, 𝑘). Given 𝜋 ∈ Π0, take its canonical cyclic representation (smallest element first in each cycle, cycles listed in increasing order of their first elements), and rearrange the cycles so that they are listed in decreasing order of their first elements. Now read off the elements of the cycles from left to right. The result is a permutation (represented as a word) whose 𝑘 left-to-right minima are exactly the first elements of the cycles, and the map just constructed is a bijection from Π0 to Π1. As an illustration, with 𝑛 = 7 and 𝑘 = 4, the permutation 𝜋 = (14)(2)(375)(6) is mapped to the permutation 6375214, which has four left-to-right minima (6, 3, 2, and 1). The remainder of the proof, involving the construction of bijections from Π1 to Π2, Π1 to Π3, and Π2 to Π4, is left as an exercise. □

Finally, we note that there is also a very fine-grained enumeration of permutations of [𝑛] with 𝑘 cycles. Suppose that 1 ≤ 𝑘 ≤ 𝑛, $\lambda_1 + \cdots + \lambda_n = k$, and $1\lambda_1 + 2\lambda_2 + \cdots + n\lambda_n = n$, with each 𝜆𝑖 ∈ ℕ. Let $c(n, k; 1^{\lambda_1} 2^{\lambda_2} \cdots n^{\lambda_n})$ be equal by definition to the number of permutations of [𝑛] with 𝑘 cycles, 𝜆𝑖 of which have length 𝑖, 𝑖 = 1, . . . , 𝑛.

Theorem 6.6.6 (Cauchy’s formula). If 1 ≤ 𝑘 ≤ 𝑛, then

$$c(n, k; 1^{\lambda_1} 2^{\lambda_2} \cdots n^{\lambda_n}) = \frac{n!}{\lambda_1!\,\lambda_2! \cdots \lambda_n!\; 1^{\lambda_1} 2^{\lambda_2} \cdots n^{\lambda_n}}. \tag{6.6.10}$$
Proof. We construe $c(n, k; 1^{\lambda_1} 2^{\lambda_2} \cdots n^{\lambda_n})$ as the number of distributions of balls labeled 1, . . . , 𝑛 among 𝑘 unlabeled, contents-ordered boxes, with no box left empty, with the ball with smallest label placed first in each box, and with 𝜆𝑗 being the number of boxes that receive exactly 𝑗 balls. Let ℬ denote the set of such distributions, and let 𝒜 be the set of permutations of balls 1, . . . , 𝑛. Given the permutation 𝜋 = 𝑏1 𝑏2 ⋯ 𝑏𝑛 ∈ 𝒜, create boxes for each of the first 𝜆1 balls in 𝜋, each of the next 𝜆2 pairs of balls, each of the next 𝜆3 triples of balls, etc., initially placing the balls in the boxes in the order in which they appear in 𝜋 (if 𝜆𝑗 = 0, no such boxes are created). Then move the ball with smallest label in each box to the initial position in that box, leaving the other balls undisturbed. The result is a distribution in ℬ, and each distribution in ℬ has $\lambda_1!\,\lambda_2! \cdots \lambda_n!\; 1^{\lambda_1} 2^{\lambda_2} \cdots n^{\lambda_n}$ preimages in 𝒜 under this map. (Why?) □
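Cauchy's formula is easy to test by brute force for small 𝑛. In the sketch below, which is our own and uses the hypothetical names `cycle_type` and `cauchy`, a permutation is a tuple on {0, . . . , 𝑛 − 1}. Its cycle type is recorded as the tuple (𝜆1, . . . , 𝜆𝑛) of the theorem, so 𝑘 = 𝜆1 + ⋯ + 𝜆𝑛 is determined by the type.

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod


def cycle_type(perm):
    """(lambda_1, ..., lambda_n): lambda_i = number of cycles of length i of i -> perm[i]."""
    n = len(perm)
    seen, lam = set(), [0] * n
    for start in range(n):
        if start not in seen:
            length, i = 0, start
            while i not in seen:
                seen.add(i)
                i = perm[i]
                length += 1
            lam[length - 1] += 1
    return tuple(lam)


def cauchy(lam):
    """Right-hand side of (6.6.10): n! / (lambda_1! ... lambda_n! 1^lambda_1 ... n^lambda_n)."""
    n = sum(i * m for i, m in enumerate(lam, start=1))
    return factorial(n) // prod(factorial(m) * i ** m for i, m in enumerate(lam, start=1))


counts = Counter(cycle_type(p) for p in permutations(range(6)))
assert all(count == cauchy(lam) for lam, count in counts.items())
print(cauchy((2, 1, 1, 0, 0, 0, 0)))  # cycle type of (14)(2)(375)(6): 420
```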
6.7. Balls and boxes: The twenty-fold way

Many of the numbers considered in the preceding chapters can be construed as counting distributions of 𝑛 balls among 𝑘 boxes, under various conditions. The balls may be labeled 1, . . . , 𝑛 or unlabeled, and the boxes labeled 1, . . . , 𝑘 or unlabeled. The contents of the boxes may be either ordered (in which case the balls must be labeled) or unordered. The idea of organizing the results of elementary combinatorics in this way originated with Gian-Carlo Rota. We have dubbed the following table The Twenty-fold Way, by analogy with Joel Spencer’s use of the term twelve-fold way to describe the first three lines of this table. The numbers designated by 𝑝(𝑛, 𝑘) in the table enumerate the partitions of the integer 𝑛 with 𝑘 parts (multisets of 𝑘 positive integers summing to 𝑛) or, equivalently, the distributions of 𝑛 unlabeled balls among 𝑘 unlabeled boxes, with no box left empty. We will study these numbers in more detail in section 8.2.

Table 6.5. The twenty-fold way
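| Condition | balls labeled 1, . . . , 𝑛; boxes labeled 1, . . . , 𝑘 | balls labeled 1, . . . , 𝑛; 𝑘 unlabeled boxes | 𝑛 unlabeled balls; boxes labeled 1, . . . , 𝑘 | 𝑛 unlabeled balls; 𝑘 unlabeled boxes |
|---|---|---|---|---|
| At most one ball per box | $k^{\underline{n}}$ | 1, if $n \le k$; 0, if $n > k$ | $\binom{k}{n}$ | 1, if $n \le k$; 0, if $n > k$ |
| No box empty | $\sigma(n, k)$ | $S(n, k) = \sigma(n, k)/k!$ | $\binom{n-1}{k-1}$ | $p(n, k)$ |
| No restrictions | $k^n$ | $\sum_{j=1}^{k} S(n, j)$ | $\binom{n+k-1}{n}$ | $\sum_{j=1}^{k} p(n, j)$ |
| Contents ordered, no box empty | $\lambda(n, k) = n!\binom{n-1}{k-1}$ | $L(n, k) = \frac{n!}{k!}\binom{n-1}{k-1}$ | | |
| Contents ordered | $k^{\overline{n}} = n!\binom{n+k-1}{n}$ | $\sum_{j=1}^{k} L(n, j)$ | | |
| Contents ordered, smallest first, no box empty | $k!\, c(n, k)$ | $c(n, k) = \lvert s(n, k) \rvert$ | | |
| Contents ordered, smallest first | $\sum_{j=1}^{k} k^{\underline{j}}\, c(n, j)$ | $\sum_{j=1}^{k} c(n, j)$ | | |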
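For labeled balls, the entries in the last four rows of Table 6.5 can be confirmed by brute force for small 𝑛 and 𝑘. In this sketch, which is our own and in which all function names are hypothetical, a distribution into labeled, contents-ordered boxes is a tuple of 𝑘 tuples obtained by cutting a permutation of 1, . . . , 𝑛 into consecutive pieces. Forgetting the box labels amounts to keeping the sorted multiset of nonempty boxes.

```python
from itertools import permutations
from math import comb, factorial, perm


def weak_compositions(n, k):
    """All k-tuples of nonnegative integers with sum n."""
    if k == 0:
        if n == 0:
            yield ()
        return
    for first in range(n + 1):
        for rest in weak_compositions(n - first, k - 1):
            yield (first,) + rest


def ordered_distributions(n, k):
    """Balls 1..n in boxes 1..k, contents of each box ordered: a set of k-tuples of tuples."""
    result = set()
    for word in permutations(range(1, n + 1)):
        for sizes in weak_compositions(n, k):
            boxes, start = [], 0
            for size in sizes:  # cut the word into consecutive boxes
                boxes.append(word[start:start + size])
                start += size
            result.add(tuple(boxes))
    return result


def unlabeled(dist):
    """Forget box labels; empty boxes become indistinguishable, so drop them."""
    return tuple(sorted(box for box in dist if box))


def cycle_number(n, k):
    """c(n, k) from the recurrence (6.6.1)."""
    if n == 0 or k == 0:
        return 1 if n == k else 0
    return cycle_number(n - 1, k - 1) + (n - 1) * cycle_number(n - 1, k)


def twenty_fold_counts(n, k):
    """Brute-force counts, keyed like table_formulas below."""
    dists = ordered_distributions(n, k)
    no_empty = {d for d in dists if all(d)}
    smallest_first = {d for d in dists if all(box[0] == min(box) for box in d if box)}
    both = no_empty & smallest_first
    return {
        "contents ordered": len(dists),
        "contents ordered, no box empty": len(no_empty),
        "L(n,k)": len({unlabeled(d) for d in no_empty}),
        "smallest first": len(smallest_first),
        "smallest first, no box empty": len(both),
        "c(n,k)": len({unlabeled(d) for d in both}),
    }


def table_formulas(n, k):
    """The corresponding entries of Table 6.5."""
    return {
        "contents ordered": factorial(n) * comb(n + k - 1, n),
        "contents ordered, no box empty": factorial(n) * comb(n - 1, k - 1),
        "L(n,k)": factorial(n) * comb(n - 1, k - 1) // factorial(k),
        "smallest first": sum(perm(k, j) * cycle_number(n, j) for j in range(1, k + 1)),
        "smallest first, no box empty": factorial(k) * cycle_number(n, k),
        "c(n,k)": cycle_number(n, k),
    }


print(twenty_fold_counts(4, 2) == table_formulas(4, 2))  # True
```

Brute force is only practical for very small 𝑛, since it scans 𝑛! times (number of weak compositions) candidate distributions.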
References

[1] G. Dobinski (1877): Grunert’s Archiv 61, 333–336.
[2] I. Lah (1955): Eine neue Art von Zahlen, und ihre Anwendung in der mathematischen Statistik, Mitteilungsblatt für Mathematischen Statistik 7, 203–212. MR74435
Exercises
[3] G.-C. Rota (1964): The number of partitions of a set, American Mathematical Monthly 71, 498–504. MR161805
Exercises

6.1. (For students of probability theory) (a) Suppose that the random variable 𝑋 has a Poisson distribution, with parameter 𝜆 > 0 (see Remark 6.3.3 above). Prove that $E[X^n] = \sum_{k=0}^{n} S(n, k)\lambda^k$. (b) Suppose that 𝑟, 𝑛 ∈ ℙ. Let the random variable 𝑋 record the number of fixed points in a randomly chosen permutation of [𝑟]. Determine $E[X^n]$.

6.2. Suppose that 𝑓, 𝑔 ∶ [𝑛] → [𝑘], where 𝑛, 𝑘 > 0. We say that 𝑓 is codomain equivalent to 𝑔 (symbolized by 𝑓 ≈ 𝑔) if there exists a permutation 𝜎 ∶ [𝑘] → [𝑘] such that 𝑓 = 𝜎 ∘ 𝑔. (a) Prove that ≈ is an equivalence relation on $[k]^{[n]}$. (b) Into how many equivalence classes does ≈ partition $[k]^{[n]}$? Consider in particular the case in which 𝑘 ≥ 𝑛. (c) Suppose that 𝑛 ≥ 𝑘. Into how many equivalence classes does ≈ partition $[k]^{[n]}_{\mathrm{surj}}$?

6.3. Let 𝐼 be a nonempty set of positive integers. For all 𝑛, 𝑘 ≥ 0, let 𝑆(𝑛, 𝑘; 𝐼) denote the number of partitions of [𝑛] with 𝑘 blocks, each having a cardinality belonging to 𝐼, and let $B(n; I) := \sum_{k=0}^{n} S(n, k; I)$. Prove theorems analogous to Theorems 4.3.1 and 4.3.4 for the exponential generating functions of the sequences $(S(n, k; I))_{n \ge 0}$ and $(B(n; I))_{n \ge 0}$.

6.4. For all 𝑛 ≥ 0, let $B_n^*$ be equal to the number of partitions of [𝑛] having no blocks of cardinality 1. (a) Determine the exponential generating function of $(B_n^*)_{n \ge 0}$. (b) Give a combinatorial proof of the formula $B_n = B_n^* + B_{n+1}^*$, for all 𝑛 ≥ 0.

6.5. (a) Find a formula for $Q_n :=$ the number of symmetric, transitive relations on [𝑛]. (b) Determine the exponential generating function of $(Q_n)_{n \ge 0}$.

6.6. (a) Suppose that $\sum_{n=0}^{\infty} a_n \frac{x^n}{n!} = e^{\sinh(x)}$. Find a combinatorial interpretation of $a_n$. (b) Suppose that $\sum_{n=0}^{\infty} b_n \frac{x^n}{n!} = e^{\cosh(x) - 1}$. Find a combinatorial interpretation of $b_n$.

6.7. Prove Euler’s theorem: For all 𝑛 ≥ 0, $n! = \int_0^{\infty} e^{-t} t^n \, dt$.

6.8. Let $(a_n)_{n \ge 0}$ be a sequence in ℝ, with $E(x) = \sum_{n=0}^{\infty} a_n \frac{x^n}{n!}$ and $A(x) = \sum_{n=0}^{\infty} a_n x^n$. Using the result of Exercise 6.7, prove that $A(x) = \int_0^{\infty} e^{-t} E(xt)\, dt$.

6.9. Use the result of Exercise 6.8 to derive formula (6.2.2) from formula (6.1.6).

6.10. Find a formula for $\sigma(n, k; 1^{\lambda_1} 2^{\lambda_2} \cdots n^{\lambda_n})$, defined by analogy with $S(n, k; 1^{\lambda_1} 2^{\lambda_2} \cdots n^{\lambda_n})$.
6.11. (a) Derive formula (6.2.2) from formula (6.2.1). (b) Derive formula (6.2.1) from formula (6.2.2).

6.12. Prove Theorem 6.4.3 (write $x^{\underline{n}}$ as $x^{\underline{n-1}}(x - (n-1))$).

6.13. Prove the following orthogonality relations for the Stirling numbers of the first and second kind. For all 𝑛, 𝑗 ∈ ℕ, with 0 ≤ 𝑗 ≤ 𝑛,
(a) $\sum_{k=j}^{n} s(n, k) S(k, j) = \delta_{n,j}$, and
(b) $\sum_{k=j}^{n} S(n, k) s(k, j) = \delta_{n,j}$.

6.14. Establish the following recurrence relations for Stirling numbers of the second kind, where 𝑛, 𝑘 > 0. Combinatorial proofs are preferred.
(a) $S(n, k) = \sum_{j=0}^{n-1} \binom{n-1}{j} S(j, k-1)$.
(b) $S(n, k) = \frac{1}{k} \sum_{j=1}^{n} \binom{n}{j} S(n-j, k-1)$.
(c) $S(n, k) = \sum_{j=0}^{n-1} k^{n-j-1} S(j, k-1)$.

6.15. Suppose that 1 ≤ 𝑘 ≤ 𝑛. Find a formula for $L(n, k; 1^{\lambda_1} 2^{\lambda_2} \cdots n^{\lambda_n})$, defined by analogy with $S(n, k; 1^{\lambda_1} 2^{\lambda_2} \cdots n^{\lambda_n})$.

6.16. Let the numbers 𝑙(𝑛, 𝑘) be defined as connection constants between falling and rising factorial polynomials, i.e., $x^{\underline{n}} = \sum_{k=0}^{n} l(n, k)\, x^{\overline{k}}$. Determine a formula for 𝑙(𝑛, 𝑘).

6.17. For each 𝑘 ≥ 0, determine the exponential generating function $\sum_{n \ge 0} L(n, k) \frac{x^n}{n!}$.

6.18. (a) Give an algebraic proof of the identity $L(n, k) = \sum_{j=k}^{n} c(n, j) S(j, k)$. (b) Give a combinatorial proof of the above identity.

6.19. State and prove an analogue of the multinomial theorem for $(x_1 + \cdots + x_k)^{\overline{n}}$.

6.20. It is a trivial algebraic exercise to show that $k^{\overline{n}} = (n + k - 1)^{\underline{n}}$ for all 𝑛, 𝑘 ≥ 0. Give a combinatorial proof of this identity for 𝑛, 𝑘 > 0, using the interpretation of $k^{\overline{n}}$ furnished by Theorem 6.5.1.

6.21. Give a combinatorial proof of the recurrence

$$c(n, k) = \sum_{j=0}^{n-1} (n-1)^{\underline{j}}\, c(n-j-1, k-1), \quad \text{for } n, k \ge 1.$$

Note that if 𝑛 < 𝑘, the above identity reduces to 0 = 0, so you need only consider the cases where 1 ≤ 𝑘 ≤ 𝑛. Then the upper limit 𝑛 − 1 in the above sum may be replaced with 𝑛 − 𝑘. (Why?)

6.22. A family 𝒜 of subsets of a set Ω is called an algebra if (i) ∅ ∈ 𝒜, (ii) Ω ∈ 𝒜, (iii) 𝐴, 𝐵 ∈ 𝒜 ⇒ 𝐴 ∪ 𝐵 ∈ 𝒜, and (iv) 𝐴 ∈ 𝒜 ⇒ 𝐴𝑐 ∈ 𝒜 (whence 𝐴, 𝐵 ∈ 𝒜 ⇒ 𝐴 ∩ 𝐵 ∈ 𝒜). Determine the number of distinct algebras on the set Ω = [𝑛].
6.B
Projects

6.A For each 𝑛 ≥ 0, let $L(n) := \sum_{k=0}^{n} L(n, k)$.
(a) Find a terse combinatorial interpretation of 𝐿(𝑛) and determine the exponential generating function $\sum_{n \ge 0} L(n) \frac{x^n}{n!}$.
(b) Use your answer to part (a) to prove that $L(n+1) = \sum_{j=0}^{n} \binom{n}{j} (n - j + 1)!\, L(j)$, for all 𝑛 ≥ 0. Hint: Use the fact that $D_x \sum_{n \ge 0} L(n) \frac{x^n}{n!} = \sum_{n \ge 0} L(n+1) \frac{x^n}{n!}$.
(c) Give a combinatorial proof of the recurrence relation in your answer to part (b).
(d) Derive the recurrence stated in part (b) by using the method of linear functionals, as described above in section 6.3.2.
(e) Derive the recursive formula 𝐿(0) = 𝐿(1) = 1 and $L(n+1) = (2n+1) L(n) - (n^2 - n) L(n-1)$, for all 𝑛 ≥ 1, using your answer to part (a) above.
(f) Give a combinatorial proof of the recurrence in part (e) above.
6.B A permutation 𝑖1 𝑖2 ⋯ 𝑖𝑛 is said to have an ascent from 𝑖𝑗 to 𝑖𝑗+1, where 1 ≤ 𝑗 ≤ 𝑛 − 1, if 𝑖𝑗 < 𝑖𝑗+1. For all 𝑛, 𝑘 ≥ 0, the Eulerian number 𝐴(𝑛, 𝑘) denotes the number of permutations of [𝑛] with exactly 𝑘 ascents. Clearly, 𝐴(𝑛, 𝑘) = 0 if 𝑘 ≥ 𝑛 ≥ 1, and 𝐴(𝑛, 0) = 1 for all 𝑛 ≥ 0 (with the convention that the empty permutation of the set [0] = ∅ has no ascents).
(a) Prove that $A(n, 1) = 2^n - n - 1$, for all 𝑛 ≥ 0.
(b) Prove that 𝐴(𝑛, 𝑘) = 𝐴(𝑛, 𝑛 − 𝑘 − 1), for 0 ≤ 𝑘 ≤ 𝑛 − 1.
(c) Prove that 𝐴(𝑛, 𝑘) = (𝑛 − 𝑘)𝐴(𝑛 − 1, 𝑘 − 1) + (𝑘 + 1)𝐴(𝑛 − 1, 𝑘), for all 𝑛, 𝑘 > 0, and tabulate the values of 𝐴(𝑛, 𝑘) for 0 ≤ 𝑛, 𝑘 ≤ 6.
(d) Prove that, for 1 ≤ 𝑘 ≤ 𝑛, the Stirling number of the second kind satisfies $S(n, k) = \frac{1}{k!} \sum_{j=0}^{k-1} A(n, j) \binom{n-1-j}{k-1-j}$.
(e) Prove Worpitzky’s identity, $x^n = \sum_{k=0}^{n} A(n, k) \binom{x+k}{n}$, for all 𝑛 ≥ 0.
Chapter 7
Intermission: Some unifying themes
This chapter identifies and develops several unifying themes that underlie material in the preceding chapters. The first is a broadly comprehensive generating function, the exponential formula, which includes as special cases many of the generating functions that we have encountered in the text and exercises of Chapters 4 and 6. The second, the theory of Comtet–Lancaster numbers, furnishes a general theory of connection constants, displaying their relations with certain classes of recursive formulas and generating functions. The proofs of many of the following theorems are left as exercises to encourage the reader to review and reframe earlier material.
7.1. The exponential formula

Let us call a function 𝑤 ∶ ℙ → ℝ a weight function and denote its exponential generating function by

$$w(x) = \sum_{n \ge 1} w(n) \frac{x^n}{n!}. \tag{7.1.1}$$
If (𝐴1, . . . , 𝐴𝑘) is an ordered partition of [𝑛] and 𝑤 is a weight function, let

$$w(A_1, \dots, A_k) := w(|A_1|) \cdots w(|A_k|) \tag{7.1.2}$$

and

$$P(n, k; w) := \sum_{(A_1, \dots, A_k)} w(A_1, \dots, A_k), \tag{7.1.3}$$

the sum in (7.1.3) being taken over all ordered partitions (𝐴1, . . . , 𝐴𝑘) of [𝑛] with 𝑘 blocks.
7. Intermission: Some unifying themes
Theorem 7.1.1. For all 𝑘 ≥ 0 and all weight functions 𝑤,

$$\sum_{n \ge 0} P(n, k; w) \frac{x^n}{n!} = (w(x))^k. \tag{7.1.4}$$

Proof. Exercise. □

Theorem 7.1.2. If $P(n; w) := \sum_{k=0}^{n} P(n, k; w)$, then

$$\sum_{n \ge 0} P(n; w) \frac{x^n}{n!} = \frac{1}{1 - w(x)}. \tag{7.1.5}$$

Proof. Exercise. □

Similarly, if {𝐴1, . . . , 𝐴𝑘} is a partition of [𝑛] and 𝑤 is a weight function, let

$$w\{A_1, \dots, A_k\} := w(|A_1|) \cdots w(|A_k|), \tag{7.1.6}$$

$$S(n, k; w) := \sum_{\{A_1, \dots, A_k\}} w\{A_1, \dots, A_k\}, \tag{7.1.7}$$

where the sum in (7.1.7) is taken over all partitions {𝐴1, . . . , 𝐴𝑘} of [𝑛] with 𝑘 blocks, and

$$B_n^{(w)} := \sum_{k=0}^{n} S(n, k; w). \tag{7.1.8}$$

The numbers 𝑆(𝑛, 𝑘; 𝑤) and $B_n^{(w)}$ are called, respectively, weighted Stirling numbers and weighted Bell numbers, and they reduce to the classical Stirling and Bell numbers when 𝑤(𝑛) ≡ 1.

Theorem 7.1.3. For all 𝑘 ≥ 0 and all weight functions 𝑤,

$$\sum_{n \ge 0} S(n, k; w) \frac{x^n}{n!} = \frac{1}{k!} (w(x))^k. \tag{7.1.9}$$

Proof. Immediate, from (7.1.4) and the fact that 𝑃(𝑛, 𝑘; 𝑤) = 𝑘! 𝑆(𝑛, 𝑘; 𝑤). □
Theorem 7.1.4 (The exponential formula). For all weight functions 𝑤,

$$\sum_{n \ge 0} B_n^{(w)} \frac{x^n}{n!} = e^{w(x)}. \tag{7.1.10}$$

Proof. Exercise. □
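The exponential formula can be compared numerically with the definition (7.1.8). The sketch below is our own, and its names are hypothetical. It computes the coefficients of $e^{w(x)}$ exactly with Python's `fractions.Fraction`, using the identity $F' = w'(x)F$ satisfied by $F = e^{w(x)}$, and checks them against a direct sum over set partitions. With 𝑤 ≡ 1 it should return the Bell numbers.

```python
from fractions import Fraction
from math import factorial, prod


def weighted_bell_egf(w, N):
    """B_n^(w) = n! [x^n] exp(w(x)) for n = 0..N, using the power-series identity F' = w'(x) F."""
    g = [Fraction(0)] + [Fraction(w(n), factorial(n)) for n in range(1, N + 1)]
    f = [Fraction(1)] + [Fraction(0)] * N  # F = exp(w(x)) has constant term 1
    for n in range(1, N + 1):
        # compare coefficients of x^(n-1) on both sides of F' = w'(x) F
        f[n] = sum(j * g[j] * f[n - j] for j in range(1, n + 1)) / n
    return [f[n] * factorial(n) for n in range(N + 1)]


def set_partitions(elements):
    """Generate all partitions of a list of distinct elements, as lists of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        yield [[first]] + partition              # first forms a block of its own
        for i in range(len(partition)):          # or joins an existing block
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def weighted_bell_brute(w, n):
    """B_n^(w) straight from (7.1.6)-(7.1.8): sum over partitions of products of block weights."""
    return sum(prod(w(len(block)) for block in p) for p in set_partitions(list(range(n))))


print([int(b) for b in weighted_bell_egf(lambda n: 1, 7)])  # [1, 1, 2, 5, 15, 52, 203, 877]
```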
Example 7.1.5. If 𝐼 ⊆ ℙ and 𝑤 = 𝜒𝐼, the characteristic function of the set 𝐼, Theorems 7.1.1 and 7.1.2 reduce to Theorems 4.3.1 and 4.3.4, and Theorems 7.1.3 and 7.1.4 reduce to the partition analogues of Theorems 4.3.1 and 4.3.4 that you were asked to derive in Exercise 6.3.

Remark 7.1.6 (Lah numbers are weighted Stirling numbers). Formulas (7.1.9) and (7.1.10) are often useful in counting partitions whose blocks are equipped with certain discrete structures. In such cases, 𝑤(𝑛) designates the number of structures of the relevant type that can be defined on an 𝑛-element set. The Lah number 𝐿(𝑛, 𝑘) can
clearly be construed as enumerating partitions of [𝑛] with 𝑘 blocks, each block being equipped with a total order, and so

$$L(n, k) = S(n, k; w), \quad \text{with } w(n) = n!. \tag{7.1.11}$$

It is then straightforward to show that

$$w(x) = \frac{x}{1 - x}, \tag{7.1.12}$$

$$\sum_{n \ge 0} L(n, k) \frac{x^n}{n!} = \frac{1}{k!} \left( \frac{x}{1 - x} \right)^k, \tag{7.1.13}$$

and, with $L(n) = \sum_{k=0}^{n} L(n, k)$,

$$\sum_{n \ge 0} L(n) \frac{x^n}{n!} = e^{\frac{x}{1 - x}}, \tag{7.1.14}$$

results which you may have derived in Project 6.A using the closed form $L(n, k) = \frac{n!}{k!} \binom{n-1}{k-1}$.

Remark 7.1.7 (Cycle numbers are weighted Stirling numbers). The cycle number 𝑐(𝑛, 𝑘) can clearly be construed as enumerating partitions of [𝑛] with 𝑘 blocks, each block of which is equipped with a total order for which the numerically smallest element of the block is listed first (or, equivalently, with a cyclic permutation). So

$$c(n, k) = S(n, k; w), \quad \text{with } w(n) = (n-1)!. \tag{7.1.15}$$

It is then straightforward to show that

$$w(x) = \sum_{n \ge 1} \frac{x^n}{n} = \ln \frac{1}{1 - x}, \quad \text{for } -1 \le x < 1, \tag{7.1.16}$$

and so

$$\sum_{n \ge 0} c(n, k) \frac{x^n}{n!} = \frac{1}{k!} \left\{ \ln \frac{1}{1 - x} \right\}^k. \tag{7.1.17}$$
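Remarks 7.1.6 and 7.1.7 can be spot-checked by computing weighted Stirling numbers directly. The following sketch is ours, not the book's. It uses the recurrence $S(n, k; w) = \sum_{m} \binom{n-1}{m-1} w(m) S(n-m, k-1; w)$, obtained by conditioning on the block that contains 𝑛. This is a standard argument that is not stated in the text. The sketch then compares the result with the coefficients of $w(x)^k/k!$ from (7.1.9).

```python
from fractions import Fraction
from math import comb, factorial


def weighted_stirling(w, N):
    """S(n, k; w) for 0 <= n, k <= N, by splitting off the block that contains n."""
    S = [[0] * (N + 1) for _ in range(N + 1)]
    S[0][0] = 1
    for n in range(1, N + 1):
        for k in range(1, n + 1):
            # the block containing n has m elements; choose its other m - 1 members
            S[n][k] = sum(comb(n - 1, m - 1) * w(m) * S[n - m][k - 1] for m in range(1, n + 1))
    return S


def egf_column(w, k, N):
    """n! [x^n] w(x)^k / k! for n = 0..N, the right side of (7.1.9), computed exactly."""
    base = [Fraction(0)] + [Fraction(w(n), factorial(n)) for n in range(1, N + 1)]
    power = [Fraction(1)] + [Fraction(0)] * N  # truncated series of w(x)^0
    for _ in range(k):
        power = [sum(power[i] * base[n - i] for i in range(n + 1)) for n in range(N + 1)]
    return [power[n] * factorial(n) / factorial(k) for n in range(N + 1)]


N = 7
lah = weighted_stirling(factorial, N)                   # w(n) = n!, as in (7.1.11)
cyc = weighted_stirling(lambda n: factorial(n - 1), N)  # w(n) = (n - 1)!, as in (7.1.15)
print(lah[4][1:5], cyc[6][1:7])
# [24, 36, 12, 1] [120, 274, 225, 85, 15, 1]
```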
Remark 7.1.8 (Dérangements are weighted partitions). A dérangement of [𝑛] can clearly be construed as a partition of [𝑛] with no singleton blocks and with all blocks equipped with a total order for which the numerically smallest element of the block is listed first. So, with 𝑑𝑛 being the number of dérangements of [𝑛],

$$d_n = B_n^{(w)}, \quad \text{where } w(1) = 0 \text{ and } w(n) = (n-1)! \text{ if } n \ge 2. \tag{7.1.18}$$

It then follows from the exponential formula that

$$\sum_{n \ge 0} d_n \frac{x^n}{n!} = \exp \sum_{n \ge 2} \frac{x^n}{n} = \exp\left\{ \ln\left( \frac{1}{1 - x} \right) - x \right\} = \frac{e^{-x}}{1 - x} = \frac{1}{1 - x} \sum_{n \ge 0} \frac{(-1)^n x^n}{n!} = \sum_{n \ge 0} \left( \sum_{k=0}^{n} \frac{(-1)^k}{k!} \right) x^n. \tag{7.1.19}$$

(For the last equality, recall that $\frac{1}{1-x} \sum_{n \ge 0} a_n x^n = \sum_{n \ge 0} s_n x^n$, where $s_n = \sum_{k=0}^{n} a_k$.) From (7.1.19) it follows that $d_n = n! \sum_{k=0}^{n} \frac{(-1)^k}{k!}$, a formula that we derived earlier using the sieve.
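The descriptions of 𝑑𝑛 in Remark 7.1.8 can be compared for small 𝑛: the definition, the weighted Bell number (7.1.18), and the closed form read off from (7.1.19). The following is a minimal sketch with hypothetical function names, using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def derangements_brute(n):
    """Count permutations of {0, ..., n-1} with no fixed point."""
    return sum(all(p[i] != i for i in range(n)) for p in permutations(range(n)))


def derangements_formula(n):
    """d_n = n! * sum_{k=0}^{n} (-1)^k / k!, as read off from (7.1.19)."""
    return factorial(n) * sum(Fraction((-1) ** k, factorial(k)) for k in range(n + 1))


def derangements_exponential_formula(N):
    """d_n = n! [x^n] exp(sum_{n>=2} x^n / n): weights w(1) = 0, w(n) = (n-1)! as in (7.1.18)."""
    g = [Fraction(0), Fraction(0)] + [Fraction(1, n) for n in range(2, N + 1)]
    f = [Fraction(1)] + [Fraction(0)] * N
    for n in range(1, N + 1):
        f[n] = sum(j * g[j] * f[n - j] for j in range(1, n + 1)) / n  # from F' = w'(x) F
    return [f[n] * factorial(n) for n in range(N + 1)]


print([derangements_brute(n) for n in range(8)])
# [1, 0, 1, 2, 9, 44, 265, 1854]
```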
7.2. Comtet’s theorem

This section deals with a class of triangular arrays with the following attractive property. Once one exhibits either of two recursive formulas, or a certain type of closed form known as a complete symmetric function, or a column generating function for the numbers in the array, or shows that the rows of the array are connection constants of a certain type, the four remaining ways of characterizing the array follow immediately.

Recall that every sequence $(p_k(x))_{k \ge 0}$ of polynomials over a field 𝐾, with $\deg p_k(x) = k$, is a basis for the 𝐾-vector space 𝐾[𝑥] of all polynomials over 𝐾, and $(p_k(x))_{0 \le k \le n}$ is a basis for the subspace of 𝐾[𝑥] comprising all polynomials of degree ≤ 𝑛. In particular, if $\deg p(x) \le n$, then there exist unique $c_k \in K$ such that $p(x) = \sum_{k=0}^{n} c_k p_k(x)$. In what follows, we write sums of the preceding type using the simpler notation $\sum_{k \ge 0} c_k p_k(x)$, with the understanding that $c_k = 0$ if 𝑘 > 𝑛.

Theorem 7.2.1 (Comtet (1972)). Let $(b_n)_{n \ge 0}$ be a sequence of complex numbers, and define polynomials $\varphi_n(x)$ for all 𝑛 ≥ 0 by

$$\varphi_0(x) = 1 \quad \text{and} \quad \varphi_n(x) = (x - b_0)(x - b_1) \cdots (x - b_{n-1}), \ \text{for all } n \ge 1. \tag{7.2.1}$$
The following are equivalent characterizations of an array $(A(n, k))_{n, k \ge 0}$:

$$\text{For all } n \ge 0, \quad x^n = \sum_{k \ge 0} A(n, k)\, \varphi_k(x). \tag{7.2.2}$$

$$\text{For all } n, k > 0, \quad A(n, k) = A(n-1, k-1) + b_k A(n-1, k), \tag{7.2.3}$$

subject to the boundary conditions $A(n, 0) = b_0^n$ and $A(0, k) = \delta_{0,k}$, for all 𝑛, 𝑘 ≥ 0.

$$\text{For all } n, k > 0, \quad A(n, k) = \sum_{j=k}^{n} A(j-1, k-1)\, b_k^{\,n-j}, \tag{7.2.4}$$

subject to the same boundary conditions as (7.2.3).

$$\text{For all } k \ge 0, \quad \sum_{n \ge 0} A(n, k) x^n = \frac{x^k}{(1 - b_0 x)(1 - b_1 x) \cdots (1 - b_k x)}. \tag{7.2.5}$$

$$\text{For all } n, k \ge 0, \quad A(n, k) = \sum_{\substack{d_0 + d_1 + \cdots + d_k = n - k \\ d_i \ge 0}} b_0^{d_0} b_1^{d_1} \cdots b_k^{d_k}. \tag{7.2.6}$$
Proof. (7.2.2) ⇒ (7.2.3): Setting 𝑥 = 𝑏0 in (7.2.2) yields $A(n, 0) = b_0^n$. In particular, 𝐴(0, 0) = 1. It is implicit in (7.2.2) that 𝐴(𝑛, 𝑘) = 0 if 𝑛 < 𝑘. In particular, 𝐴(0, 𝑘) = 0 if 𝑘 > 0, and so 𝐴(0, 𝑘) = 𝛿0,𝑘. If 𝑛 > 0, then

$$\begin{aligned}
\sum_{k \ge 0} A(n, k) \varphi_k(x) &= x^n = x^{n-1} \cdot x = \sum_{k \ge 0} A(n-1, k) \varphi_k(x) \cdot (x - b_k + b_k) \\
&= \sum_{k \ge 0} A(n-1, k) \varphi_{k+1}(x) + \sum_{k \ge 0} b_k A(n-1, k) \varphi_k(x) \\
&= \sum_{k \ge 1} A(n-1, k-1) \varphi_k(x) + b_0 A(n-1, 0) + \sum_{k \ge 1} b_k A(n-1, k) \varphi_k(x) \\
&= b_0^n + \sum_{k \ge 1} \{ A(n-1, k-1) + b_k A(n-1, k) \} \varphi_k(x).
\end{aligned}$$
Equating coefficients of 𝜑𝑘(𝑥) for each 𝑘 ≥ 1 establishes (7.2.3).

(7.2.3) ⇒ (7.2.2): Since 𝐴(0, 𝑘) = 𝛿0,𝑘, we have $x^0 = 1 = \sum_{k \ge 0} A(0, k) \varphi_k(x)$. Suppose that $x^{n-1} = \sum_{k \ge 0} A(n-1, k) \varphi_k(x)$ for some 𝑛 > 0. Then

$$\begin{aligned}
x^n = x^{n-1} \cdot x &= \sum_{k \ge 0} A(n-1, k) \varphi_k(x)(x - b_k + b_k) \\
&= \sum_{k \ge 0} A(n-1, k) \varphi_{k+1}(x) + \sum_{k \ge 0} b_k A(n-1, k) \varphi_k(x) \\
&\overset{\text{(as above)}}{=} b_0 A(n-1, 0) \varphi_0(x) + \sum_{k \ge 1} \{ A(n-1, k-1) + b_k A(n-1, k) \} \varphi_k(x) \\
&= \sum_{k \ge 0} A(n, k) \varphi_k(x).
\end{aligned}$$
(7.2.3) ⇔ (7.2.4): Exercise.

(7.2.3) ⇒ (7.2.5): Since $A(n, 0) = b_0^n$, (7.2.5) holds for 𝑘 = 0. Suppose that $\sum_{n \ge 0} A(n, k-1) x^n = \frac{x^{k-1}}{(1 - b_0 x) \cdots (1 - b_{k-1} x)}$ for some 𝑘 > 0. Then

$$\begin{aligned}
\sum_{n \ge 0} A(n, k) x^n &= A(0, k) + \sum_{n \ge 1} \{ A(n-1, k-1) + b_k A(n-1, k) \} x^n \\
&= 0 + x \sum_{n \ge 1} A(n-1, k-1) x^{n-1} + b_k \sum_{n \ge 1} A(n-1, k) x^n \\
&= x \sum_{n \ge 0} A(n, k-1) x^n + b_k x \sum_{n \ge 0} A(n, k) x^n \\
&= \frac{x^k}{(1 - b_0 x) \cdots (1 - b_{k-1} x)} + b_k x \sum_{n \ge 0} A(n, k) x^n,
\end{aligned}$$

whence

$$(1 - b_k x) \sum_{n \ge 0} A(n, k) x^n = \frac{x^k}{(1 - b_0 x) \cdots (1 - b_{k-1} x)},$$

which yields (7.2.5).
(7.2.5) ⇒ (7.2.6): By (7.2.5),

$$\begin{aligned}
\sum_{n \ge 0} A(n, k) x^n &= x^k \sum_{d_0 \ge 0} b_0^{d_0} x^{d_0} \cdots \sum_{d_k \ge 0} b_k^{d_k} x^{d_k}
= \sum_{d_0, \dots, d_k \ge 0} b_0^{d_0} \cdots b_k^{d_k}\, x^{d_0 + \cdots + d_k + k} \\
&= \sum_{n \ge 0} \Bigl\{ \sum_{\substack{d_0 + \cdots + d_k + k = n \\ d_i \ge 0}} b_0^{d_0} \cdots b_k^{d_k} \Bigr\} x^n,
\end{aligned}$$

from which (7.2.6) follows by equating coefficients of $x^n$.

(7.2.6) ⇒ (7.2.3): Given (7.2.6), it follows immediately that $A(n, 0) = b_0^n$ and, if 𝑘 > 𝑛, that 𝐴(𝑛, 𝑘) is an empty sum, which by convention takes the value zero. In particular, 𝐴(0, 𝑘) = 0 if 𝑘 > 0. Suppose that 𝑛, 𝑘 > 0. Separating the sum in (7.2.6) into two parts, corresponding to the cases 𝑑𝑘 = 0 and 𝑑𝑘 > 0, we may express 𝐴(𝑛, 𝑘)
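in the form

$$\begin{aligned}
A(n, k) &= \sum_{\substack{d_0 + \cdots + d_{k-1} = (n-1) - (k-1) \\ d_i \ge 0}} b_0^{d_0} \cdots b_{k-1}^{d_{k-1}}
\; + \; b_k \Bigl\{ \sum_{\substack{d_0 + \cdots + d_{k-1} + (d_k - 1) = n - 1 - k \\ d_0, \dots, d_{k-1}, d_k - 1 \ge 0}} b_0^{d_0} \cdots b_{k-1}^{d_{k-1}} b_k^{d_k - 1} \Bigr\} \\
&= A(n-1, k-1) + b_k A(n-1, k).
\end{aligned}$$

□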
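The equivalence of (7.2.2), (7.2.3), and (7.2.6) can be spot-checked numerically. In the sketch below, which is our own and uses hypothetical function names, the test sequence 𝑏𝑛 = 𝑛 is an illustrative choice for which the Comtet numbers are the Stirling numbers of the second kind. The choice 𝑏𝑛 ≡ 1 gives binomial coefficients. The closed form (7.2.6) is evaluated by enumerating weakly increasing index tuples with `itertools.combinations_with_replacement`, each of which encodes one choice of (𝑑0, . . . , 𝑑𝑘).

```python
from itertools import combinations_with_replacement
from math import comb, prod


def comtet_table(b, N):
    """A(n, k), 0 <= n, k <= N, from the recurrence (7.2.3) and its boundary conditions."""
    A = [[0] * (N + 1) for _ in range(N + 1)]
    for n in range(N + 1):
        A[n][0] = b[0] ** n              # A(n, 0) = b_0^n; this also gives A(0, 0) = 1
    for n in range(1, N + 1):
        for k in range(1, N + 1):        # A(0, k) = 0 for k > 0 is already in place
            A[n][k] = A[n - 1][k - 1] + b[k] * A[n - 1][k]
    return A


def comtet_closed_form(b, n, k):
    """A(n, k) as the complete symmetric function (7.2.6) of b_0, ..., b_k."""
    if k > n:
        return 0
    return sum(prod(b[i] for i in idx)
               for idx in combinations_with_replacement(range(k + 1), n - k))


def phi(b, k, x):
    """The polynomial phi_k(x) = (x - b_0)...(x - b_{k-1}) of (7.2.1), evaluated at x."""
    return prod(x - b[i] for i in range(k))


N = 6
b = list(range(N + 1))                   # b_n = n, so A(n, k) should be S(n, k)
A = comtet_table(b, N)
print(A[5])                              # [0, 1, 15, 25, 10, 1, 0]
```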
Remark 7.2.2. Let us call the numbers 𝐴(𝑛, 𝑘) the Comtet numbers associated with (𝑏𝑛). If 𝑏𝑛 = 𝑛, then 𝐴(𝑛, 𝑘) = 𝑆(𝑛, 𝑘), the Stirling number of the second kind. In particular, the recursive formula (6.1.5) for 𝑆(𝑛, 𝑘) leads immediately to the closed form (6.2.1), the column generating function (6.2.2), and the connection constant formula (6.3.2). If 𝑏𝑛 ≡ 1, then $A(n, k) = \binom{n}{k}$. In particular, (7.2.5) yields the column generating function $\sum_{n \ge 0} \binom{n}{k} x^n = \frac{x^k}{(1-x)^{k+1}}$. We shall apply Comtet’s theorem to 𝑞-analogues of the binomial coefficients in Chapter 11.

Remark 7.2.3. Recall that, for all 𝑗 ≥ 0, the 𝑗th elementary symmetric function $E_j(u_1, \dots, u_n)$ in the quantities 𝑢1, . . . , 𝑢𝑛 is given by the formula

$$E_j(u_1, \dots, u_n) = \sum_{1 \le i_1 < i_2 < \cdots < i_j \le n} u_{i_1} u_{i_2} \cdots u_{i_j}. \tag{7.2.7}$$

… > 0,
subject to the boundary conditions $A^*(n, 0) = \prod_{i=0}^{n-1} (a_i + b_0)$ and $A^*(0, k) = \delta_{0,k}$, for all 𝑛, 𝑘 ≥ 0.

$$A^*(n, k) = \sum_{j=k}^{n} A^*(j-1, k-1) \prod_{i=j}^{n-1} (a_i + b_k), \quad \text{for all } n, k > 0, \tag{7.3.4}$$

subject to the same boundary conditions as (7.3.3).

Proof. Prove that (7.3.2) ⇔ (7.3.3) and (7.3.3) ⇔ (7.3.4). □
Remark 7.3.2. We call 𝜑𝑛(𝑥) the falling factorial polynomial of degree 𝑛 associated with (𝑏𝑛), and 𝜌𝑛(𝑥) the rising factorial polynomial of degree 𝑛 associated with (𝑎𝑛). When 𝑎𝑛 ≡ 0, we have 𝐴∗(𝑛, 𝑘) = 𝐴(𝑛, 𝑘), and the equivalence of conditions (7.3.2)–(7.3.4) reduces to the equivalence of conditions (7.2.2)–(7.2.4). Moreover:
(i) If 𝑎𝑛 = 𝑏𝑛 = 𝑛, then 𝐴∗(𝑛, 𝑘) = 𝐿(𝑛, 𝑘), the Lah number.
(ii) If 𝑎𝑛 = 𝑛 and 𝑏𝑛 ≡ 0, then 𝐴∗(𝑛, 𝑘) = 𝑐(𝑛, 𝑘), the cycle number.
(iii) If 𝑎𝑛 = −𝑛 and 𝑏𝑛 ≡ 0, then 𝐴∗(𝑛, 𝑘) = 𝑠(𝑛, 𝑘), the Stirling number of the first kind.
Readers are encouraged to check that properties of the latter three arrays, established earlier by separate arguments, are simply special cases of formulas (7.3.2)–(7.3.4).
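Remark 7.3.2 can be checked numerically from the sum formula (7.3.4) and its boundary conditions alone. The sketch below is ours, and the function name is hypothetical. It fills in 𝐴∗(𝑛, 𝑘) row by row, so each 𝐴∗(𝑗 − 1, 𝑘 − 1) it needs has already been computed. It should reproduce Lah numbers, cycle numbers, and signed Stirling numbers of the first kind for the three choices of (𝑎𝑛) and (𝑏𝑛) in the remark.

```python
from math import comb, factorial, prod


def comtet_lancaster(a, b, N):
    """A*(n, k), 0 <= n, k <= N, from the sum formula (7.3.4) and its boundary conditions."""
    A = [[0] * (N + 1) for _ in range(N + 1)]
    for n in range(N + 1):
        A[n][0] = prod(a[i] + b[0] for i in range(n))   # A*(n, 0) = prod_{i<n} (a_i + b_0)
    for n in range(1, N + 1):
        for k in range(1, N + 1):                        # A*(0, k) = 0 for k > 0 is in place
            A[n][k] = sum(A[j - 1][k - 1] * prod(a[i] + b[k] for i in range(j, n))
                          for j in range(k, n + 1))
    return A


N = 6
up, zero, down = list(range(N + 1)), [0] * (N + 1), [-n for n in range(N + 1)]
lah = comtet_lancaster(up, up, N)      # case (i):   a_n = b_n = n
cyc = comtet_lancaster(up, zero, N)    # case (ii):  a_n = n, b_n = 0
s1 = comtet_lancaster(down, zero, N)   # case (iii): a_n = -n, b_n = 0
print(lah[4], cyc[6], s1[6])
# [0, 24, 36, 12, 1, 0, 0] [0, 120, 274, 225, 85, 15, 1] [0, -120, 274, -225, 85, -15, 1]
```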
References

[1] L. Comtet (1972): Nombres de Stirling généraux et fonctions symétriques, C. R. Acad. Sc. Paris 275: Série A, 747–750. MR307928
[2] M. Lancaster (1996): Generalizations of the Stirling, Lah, and Cycle Numbers of the First and Second Kinds, M.S. Thesis, University of Tennessee.
[3] H. Wilf (1990): generatingfunctionology, Academic Press. MR1034250
Exercises

7.1. Prove Theorem 7.1.1.

7.2. (a) Prove Theorem 7.1.2. (b) Let 𝑊𝑛 ∶= 𝑃(𝑛; 𝑤), where 𝑤 is a weight function. Find a recursive formula for 𝑊𝑛.

7.3. For all 𝑛 ≥ 0, let 𝑇𝑛 denote the number of ordered partitions of [𝑛], each block of which is equipped with a symmetric, transitive relation. (a) Determine the exponential generating function of the sequence (𝑇𝑛)𝑛≥0. (b) Find a recursive formula for this sequence.

7.4. A relation 𝑅 on a set 𝑋 is negatively transitive if ∼𝑥𝑅𝑦 and ∼𝑦𝑅𝑧 ⇒ ∼𝑥𝑅𝑧. (a) Prove that a relation 𝑅 on a set 𝑋 is symmetric, transitive, and negatively transitive if and only if 𝑅 = ∅ or 𝑅 = 𝑋 × 𝑋.
(b) For all 𝑛 ≥ 0, let 𝑀𝑛 denote the number of ordered partitions of [𝑛], each block of which is equipped with a symmetric, transitive, negatively transitive relation. Determine the exponential generating function of the sequence (𝑀𝑛)𝑛≥0. (c) Find a recursive formula for this sequence.

7.5. Prove Theorem 7.1.4.

7.6. If Π1 and Π2 are partitions of [𝑛], Π1 is said to be refined by Π2 (or Π2 is said to refine Π1) if every block of Π2 is contained in a block of Π1. (a) Let $B_n^{(2)} := |\{(\Pi_1, \Pi_2) : \Pi_1 \text{ and } \Pi_2 \text{ are partitions of } [n], \text{ and } \Pi_1 \text{ is refined by } \Pi_2\}|$. Determine the exponential generating function of $(B_n^{(2)})_{n \ge 0}$. (b) Generalize the result of part (a) to the numbers $B_n^{(r)} := |\{(\Pi_1, \Pi_2, \dots, \Pi_r) : \text{each } \Pi_i \text{ is a partition of } [n] \text{ and } \Pi_i \text{ is refined by } \Pi_{i+1}, \text{ for } 1 \le i \le r-1\}|$.

7.7. Prove that (7.2.3) ⇔ (7.2.4).

7.8. Prove that (7.3.2) ⇔ (7.3.3).

7.9. Prove that (7.3.3) ⇔ (7.3.4).

7.10. For all 𝑛 ∈ ℕ, let 𝑎𝑛 denote the number of permutations 𝜋 ∶ [𝑛] → [𝑛] such that 𝜋 ∘ 𝜋 = 𝑖[𝑛]. Determine the exponential generating function $\sum_{n \ge 0} a_n \frac{x^n}{n!}$. (Hint: What is the cycle structure of such a permutation?)
Project

7.A (a) With 𝜑𝑛(𝑥) defined by (7.2.1), 𝜌𝑛(𝑥) by (7.3.1), and 𝐴∗(𝑛, 𝑘) by (7.3.2), prove that

$$A^*(n, k) = \sum_{j=k}^{n} E_{n-j}(a_0, \dots, a_{n-1})\, C_{j-k}(b_0, \dots, b_k).$$

(Hint: Recall that $\rho_n(x) = \sum_{j=0}^{n} E_{n-j}(a_0, \dots, a_{n-1}) x^j$, and then write $x^j$ as a linear combination of $\varphi_0(x), \dots, \varphi_j(x)$ using (7.2.2) and (7.2.6).)
(b) Let (𝐴𝑖)𝑖≥0 be a sequence of finite, pairwise disjoint sets, with |𝐴𝑖| = 𝑎𝑖 for all 𝑖 ≥ 0. Using formula (6.4.3), show that $E_{n-j}(a_0, \dots, a_{n-1})$ counts the words $x_{i_1} x_{i_2} \cdots x_{i_{n-j}}$ of length 𝑛 − 𝑗 in the alphabet 𝐴0 ∪ ⋯ ∪ 𝐴𝑛−1, where $0 \le i_1 < i_2 < \cdots < i_{n-j} \le n-1$ and $x_{i_r} \in A_{i_r}$ for 𝑟 = 1, . . . , 𝑛 − 𝑗. Call such words ascending.
(c) Let (𝐵𝑖)𝑖≥0 be a sequence of finite, pairwise disjoint sets, with |𝐵𝑖| = 𝑏𝑖 for all 𝑖 ≥ 0. Using formula (7.2.8), show that $C_{j-k}(b_0, \dots, b_k)$ counts the words $y_{i_1} y_{i_2} \cdots y_{i_{j-k}}$ in the alphabet 𝐵0 ∪ ⋯ ∪ 𝐵𝑘, where $0 \le i_1 \le i_2 \le \cdots \le i_{j-k} \le k$ and $y_{i_r} \in B_{i_r}$ for 𝑟 = 1, . . . , 𝑗 − 𝑘. Call such words weakly ascending.
(d) Suppose that (𝐴0 ∪ 𝐴1 ∪ ⋯) ∩ (𝐵0 ∪ 𝐵1 ∪ ⋯) = ∅. Using the results of parts (a), (b), and (c) above, show that 𝐴∗(𝑛, 𝑘) counts the words of length 𝑛 − 𝑘 consisting of an ascending word (possibly empty) from the alphabet 𝐴0 ∪ ⋯ ∪ 𝐴𝑛−1 followed by a weakly ascending word (possibly empty) from the alphabet 𝐵0 ∪ ⋯ ∪ 𝐵𝑘.
7.A
(e) (challenging) Establish the recurrences (7.3.3) and (7.3.4) for sequences (𝑎𝑖 )𝑖≥0 and (𝑏𝑖 )𝑖≥0 in ℕ, using the combinatorial interpretation of 𝐴∗ (𝑛, 𝑘) established in part (d) above.
Chapter 8
Combinatorics and number theory
This chapter explores a number of intriguing connections between combinatorics and number theory. We begin by studying the ring of arithmetic (accent on the first syllable) functions, including the Euler phi-function, the zeta function, and the Möbius function, and we examine some combinatorial applications of the Möbius inversion formula, which turns out to generalize binomial inversion. We then turn to several topics in additive number theory, first considering the enumeration of partitions of an integer 𝑛 (multisets of positive integers that sum to 𝑛), and then the evaluation of power sums (expressions of the form $0^r + 1^r + \cdots + n^r$, with 𝑟, 𝑛 ∈ ℕ). We conclude by establishing some basic number-theoretic properties of factorials, and binomial and multinomial coefficients, including classical theorems of Lucas, Legendre, and Kummer.
8.1. Arithmetic functions

An arithmetic (or number-theoretic) function is a mapping 𝑓 ∶ ℙ → ℂ. Under the operations + and ∗ defined by

$$(f + g)(n) := f(n) + g(n) \tag{8.1.1}$$

and

$$(f * g)(n) := \sum_{d \mid n} f(d)\, g(n/d), \tag{8.1.2}$$
8. Combinatorics and number theory
where the notation 𝑑 | 𝑛 indicates a sum taken over all positive divisors 𝑑 of 𝑛, the set of arithmetic functions is an integral domain, with additive identity 𝟎(𝑛) ≡ 0 and multiplicative identity 𝟏(𝑛) = 𝛿1,𝑛. By associativity of the operation ∗, it follows that

$$f_1 * f_2 * \cdots * f_k(n) = \sum_{d_1 d_2 \cdots d_k = n} f_1(d_1) f_2(d_2) \cdots f_k(d_k). \tag{8.1.3}$$
Of particular interest are those arithmetic functions 𝑓 for which 𝑓(𝑛) records some number-theoretic property of 𝑛. Among the most important is the Euler 𝜙-function, defined for all 𝑛 ∈ ℙ by (8.1.4)
𝜙(𝑛) ∶= |{𝑘 ∈ [𝑛] ∶ gcd(𝑘, 𝑛) = 1}|.
(Numbers 𝑘 and 𝑛 satisfying gcd(𝑘, 𝑛) = 1 are said to be relatively prime, or coprime.) In particular, 𝜙(1) = 1, 𝜙(𝑝) = 𝑝 − 1 for every prime 𝑝, and, more generally, 𝜙(𝑝𝑛 ) = 𝑝𝑛 − 𝑝𝑛−1 ,
(8.1.5)
since, among the members of [𝑝𝑛 ], only the 𝑝𝑛−1 numbers of the form 𝑚𝑝, where 1 ≤ 𝑚 ≤ 𝑝𝑛−1 , fail to be relatively prime to 𝑝𝑛 . We evaluate 𝜙(𝑛) for arbitrary 𝑛 ∈ ℙ in Theorems 8.1.5 and 8.1.6 below. In addition to 𝜙, the zeta function 𝜁, defined by (8.1.6)
𝜁(𝑛) ≡ 1,
and the Möbius function 𝜇, defined by 𝜇(1) = 1, (8.1.7)
𝜇(𝑛) = 0 if 𝑝2 |𝑛 for some prime 𝑝, and 𝜇(𝑝1 ⋯ 𝑝𝑟 ) = (−1)𝑟 for distinct primes 𝑝1 , . . . , 𝑝𝑟 ,
play a crucial role in the theory of arithmetic functions.

Theorem 8.1.1. $\mu * \zeta = \zeta * \mu = \mathbf{1}$.

Proof. Since the operation ∗ is commutative, we need only show that $\mu * \zeta = \mathbf{1}$. It is clear that $\mu * \zeta(1) = 1$. If $n = p_1^{e_1}\cdots p_r^{e_r}$ is the prime factorization of 𝑛, where 𝑟 ≥ 1, then, since 𝜇(𝑑) = 0 unless 𝑑 is a product of distinct primes, Theorem 3.3.1 gives
$$\mu * \zeta(n) = \sum_{d\mid n}\mu(d) = \sum_{d\mid p_1\cdots p_r}\mu(d) = \sum_{k=0}^{r}\binom{r}{k}(-1)^k = 0.$$
□
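A quick numerical check of Theorem 8.1.1, again a sketch of ours rather than anything in the text: `mobius` is a hypothetical helper that evaluates (8.1.7) by trial division, and `mu_star_zeta` sums 𝜇(𝑑) over the divisors of 𝑛.

```python
def mobius(n):
    """Mobius function (8.1.7), via trial division."""
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:      # p^2 divides n
                return 0
            result = -result    # one more distinct prime factor
        p += 1
    if m > 1:                   # a single leftover prime factor
        result = -result
    return result


def mu_star_zeta(n):
    """(mu * zeta)(n) = sum of mu(d) over the divisors d of n."""
    return sum(mobius(d) for d in range(1, n + 1) if n % d == 0)


print([mobius(n) for n in range(1, 11)])        # [1, -1, -1, 0, -1, 1, -1, 0, 0, 1]
print([mu_star_zeta(n) for n in range(1, 11)])  # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```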
Theorem 8.1.2 (Möbius inversion principle). For arithmetic functions 𝑓, 𝑔 ∶ ℙ → ℂ, the following are equivalent.
$$\text{For all } n \ge 1,\quad g(n) = \sum_{d\mid n} f(d). \tag{8.1.8}$$
$$\text{For all } n \ge 1,\quad f(n) = \sum_{d\mid n}\mu(n/d)\,g(d) = \sum_{d\mid n}\mu(d)\,g(n/d). \tag{8.1.9}$$
Proof. The asserted equivalence amounts to the equivalence of 𝑔 = 𝑓 ∗ 𝜁 and 𝑓 = 𝑔 ∗ 𝜇, which follows immediately from Theorem 8.1.1 (convolve the first identity with 𝜇, or the second with 𝜁). Note that the transition from the first to the second sum in (8.1.9) above, which involves substituting 𝑛/𝑑 for 𝑑, is an analogue of top-down summation. □
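Möbius inversion can likewise be watched in action. In this sketch (the names `summatory` and `mobius_invert` and the test function `f` are illustrative assumptions, not the book's), we form 𝑔 = 𝑓 ∗ 𝜁 as in (8.1.8) and then recover 𝑓 from 𝑔 using (8.1.9).

```python
def divisors(n):
    """Positive divisors of n (trial division; small n only)."""
    return [d for d in range(1, n + 1) if n % d == 0]


def mobius(n):
    """Mobius function (8.1.7), via trial division."""
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0
            result = -result
        p += 1
    return -result if m > 1 else result


def summatory(f):
    """g = f * zeta, i.e. g(n) = sum of f(d) over d | n, as in (8.1.8)."""
    return lambda n: sum(f(d) for d in divisors(n))


def mobius_invert(g):
    """f = g * mu, i.e. f(n) = sum of mu(n/d) g(d) over d | n, as in (8.1.9)."""
    return lambda n: sum(mobius(n // d) * g(d) for d in divisors(n))


def f(n):
    """An arbitrary test function, chosen only for illustration."""
    return n * n - 7 * n + 3


g = summatory(f)
f_back = mobius_invert(g)
print(all(f_back(n) == f(n) for n in range(1, 200)))   # True
```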
Interestingly, the binomial inversion principle (Theorem 3.3.3) turns out to be a simple corollary of the Möbius inversion principle.

Corollary 8.1.3 (Binomial inversion principle, redux). For all complex sequences $(a_r)_{r\ge0}$ and $(b_r)_{r\ge0}$, the following are equivalent.
$$\text{For all } r \ge 0,\quad b_r = \sum_{k=0}^{r}\binom{r}{k}a_k. \tag{8.1.10}$$
$$\text{For all } r \ge 0,\quad a_r = \sum_{k=0}^{r}(-1)^{r-k}\binom{r}{k}b_k. \tag{8.1.11}$$
Proof (Carlitz (1978)). Suppose (8.1.10). Define 𝑓 ∶ ℙ → ℂ by (𝑖) $f(1) = a_0$, (𝑖𝑖) $f(p_1\cdots p_r) = a_r$ when $p_1,\ldots,p_r$ are distinct primes, and (𝑖𝑖𝑖) 𝑓(𝑛) = 0 otherwise. Define 𝑔 ∶ ℙ → ℂ by 𝑔 = 𝑓 ∗ 𝜁. In particular, $g(1) = f(1) = a_0$, and for all 𝑘 > 0 and distinct primes $p_1,\ldots,p_k$,
$$g(p_1\cdots p_k) = \sum_{d\mid p_1\cdots p_k} f(d) = \sum_{j=0}^{k}\binom{k}{j}a_j = b_k. \tag{8.1.12}$$
By Möbius inversion, 𝑓 = 𝑔 ∗ 𝜇. So for all 𝑟 > 0 and distinct primes $p_1,\ldots,p_r$,
$$a_r = f(p_1\cdots p_r) = \sum_{d\mid p_1\cdots p_r}\mu(p_1\cdots p_r/d)\,g(d) = \sum_{k=0}^{r}(-1)^{r-k}\binom{r}{k}b_k, \tag{8.1.13}$$
by (8.1.12). That (8.1.11) implies (8.1.10) follows from remarks included in the proof of Theorem 3.3.3. □

In the above proof, the product of any 𝑟 distinct primes stands in, so to speak, for the nonnegative integer 𝑟. Suppose that 𝐹 ∶ ℕ → ℂ and $\overline{F}$ ∶ ℙ → ℂ. Following Carlitz, we say that $\overline{F}$ extends 𝐹 if, for all 𝑛 ∈ ℕ and for every product $p_1\cdots p_n$ of distinct primes, it is the case that $F(n) = \overline{F}(p_1\cdots p_n)$. In particular, $F(0) = \overline{F}(1)$. Here is a simple example. By (8.1.3) and (8.1.6), we have
$$\zeta^k(m) = \sum_{\substack{d_1\cdots d_k = m\\ d_i \ge 1}} 1,$$
the number of weak ordered factorizations of 𝑚 with 𝑘 factors, and
$$(\zeta - \mathbf{1})^k(m) = \sum_{\substack{d_1\cdots d_k = m\\ d_i > 1}} 1,$$
the number of ordered factorizations of 𝑚 with 𝑘 factors (powers of arithmetic functions are taken with respect to ∗). If $p_1\cdots p_n$ is the product of any 𝑛 distinct primes, then $\zeta^k(p_1\cdots p_n) = k^n$, the number of weak ordered partitions of any 𝑛-element set having 𝑘 blocks, and $(\zeta - \mathbf{1})^k(p_1\cdots p_n) = \sigma(n,k)$, the number of ordered partitions of any 𝑛-element set having 𝑘 blocks. In other words, $\zeta^k$ extends 𝑛 ↦ $k^n$, and $(\zeta - \mathbf{1})^k$ extends 𝑛 ↦ 𝜎(𝑛, 𝑘). Additional examples may be found in the exercises.

Recall that the Euler 𝜙-function is defined for all 𝑛 ∈ ℙ by 𝜙(𝑛) = |{𝑘 ∈ [𝑛] ∶ gcd(𝑘, 𝑛) = 1}|. The following theorem refines this definition in a useful way and leads to a formula for 𝜙(𝑛).
Theorem 8.1.4. If 𝑑 | 𝑛, then |{𝑘 ∈ [𝑛] ∶ gcd(𝑘, 𝑛) = 𝑑}| = 𝜙(𝑛/𝑑).

Proof. The elements 𝑘 of [𝑛] satisfying gcd(𝑘, 𝑛) = 𝑑 are to be found among the 𝑛/𝑑 numbers 𝑑, 2𝑑, . . . , (𝑛/𝑑)𝑑, and gcd(𝑗𝑑, (𝑛/𝑑)𝑑) = 𝑑 if and only if gcd(𝑗, 𝑛/𝑑) = 1. □

Theorem 8.1.5. For all 𝑛 ≥ 1,
$$\phi(n) = n\sum_{d\mid n}\frac{\mu(d)}{d}. \tag{8.1.14}$$
Proof. Each 𝑘 ∈ [𝑛] satisfies gcd(𝑘, 𝑛) = 𝑑 for exactly one divisor 𝑑 of 𝑛, so by Theorem 8.1.4, $n = \sum_{d\mid n}\phi(n/d)$, i.e., 𝜆 = 𝜁 ∗ 𝜙, where 𝜆(𝑛) ∶= 𝑛. By Möbius inversion, we have 𝜙 = 𝜇 ∗ 𝜆, which is (8.1.14). □

Here is another interesting formula for 𝜙(𝑛).

Theorem 8.1.6. For all 𝑛 ≥ 1,
$$\phi(n) = n\prod_{p\mid n}\Bigl(1 - \frac{1}{p}\Bigr), \tag{8.1.15}$$
where the product is taken over all prime divisors 𝑝 of 𝑛.
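Before the proof, here is a small numerical cross-check (our own sketch, assuming exact rational arithmetic via Python's `fractions.Fraction`) that definition (8.1.4), formula (8.1.14), and formula (8.1.15) give the same values. The helper names are hypothetical.

```python
from fractions import Fraction
from math import gcd


def phi_count(n):
    """phi(n) straight from definition (8.1.4)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def prime_factors(n):
    """Distinct prime divisors of n, by trial division."""
    primes, m, p = [], n, 2
    while p * p <= m:
        if m % p == 0:
            primes.append(p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:
        primes.append(m)
    return primes


def mobius(n):
    """Mobius function (8.1.7): 0 unless n is a product of distinct primes."""
    primes = prime_factors(n)
    product = 1
    for p in primes:
        product *= p
    return (-1) ** len(primes) if product == n else 0


def phi_mobius(n):
    """phi(n) via (8.1.14): n times the sum of mu(d)/d over d | n."""
    return n * sum(Fraction(mobius(d), d) for d in range(1, n + 1) if n % d == 0)


def phi_product(n):
    """phi(n) via (8.1.15): n times the product of (1 - 1/p) over primes p | n."""
    result = Fraction(n)
    for p in prime_factors(n):
        result *= 1 - Fraction(1, p)
    return result


print([phi_count(n) for n in range(1, 13)])  # [1, 1, 2, 2, 4, 2, 6, 4, 6, 4, 10, 4]
```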
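Proof. If $n = p_1^{e_1}\cdots p_r^{e_r}$, then by (8.1.14),
$$\phi(n) = n\Bigl(1 - \sum_{1\le i\le r}\frac{1}{p_i} + \sum_{1\le i<j\le r}\frac{1}{p_ip_j} - \cdots + (-1)^r\frac{1}{p_1\cdots p_r}\Bigr)$$
… if 𝑘 > 𝑛, then 𝑝(𝑛, 𝑘) = 0. For all 𝑛 ≥ 0 and all 𝑘 > 0,
$$p(n+k,k) = p(n,0) + p(n,1) + \cdots + p(n,k). \tag{8.3.1}$$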
Proof. If 𝑛 = 0, the result is obvious, so suppose that 𝑛 > 0. Let 𝒜 denote the set of all partitions of 𝑛 with 𝑘 or fewer parts, and let ℬ be the set of all partitions of 𝑛 + 𝑘 with 𝑘 parts, representing a partition of the former type by a sequence $n_1 \le n_2 \le \cdots \le n_j$, where 1 ≤ 𝑗 ≤ 𝑘, and a partition of the latter type by a sequence $m_1 \le m_2 \le \cdots \le m_k$. Formula (8.3.1) now follows from the fact that the map
$$n_1 \le n_2 \le \cdots \le n_j \;\mapsto\; 1 \le 1 \le \cdots \le 1 \le n_1 + 1 \le n_2 + 1 \le \cdots \le n_j + 1,$$
where the latter sequence begins with 𝑘 − 𝑗 1’s, is a bijection from 𝒜 to ℬ. □

Table 8.2. The numbers 𝑝(𝑛, 𝑘) for 0 ≤ 𝑛, 𝑘 ≤ 6
𝑛=0 𝑛=1 𝑛=2 𝑛=3 𝑛=4 𝑛=5 𝑛=6
𝑘=0 1 0 0 0 0 0 0
𝑘=1 0 1 1 1 1 1 1
𝑘=2 0 0 1 1 2 2 3
𝑘=3 0 0 0 1 1 2 3
𝑘=4 0 0 0 0 1 1 2
𝑘=5 0 0 0 0 0 1 1
𝑘=6 0 0 0 0 0 0 1
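Table 8.2 can be regenerated from (8.3.1) alone. The sketch below is ours (the function name `p` simply mirrors the notation 𝑝(𝑛, 𝑘)); it rewrites (8.3.1) as 𝑝(𝑛, 𝑘) = 𝑝(𝑛 − 𝑘, 0) + ⋯ + 𝑝(𝑛 − 𝑘, 𝑘) for 1 ≤ 𝑘 ≤ 𝑛 and memoizes the results.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def p(n, k):
    """Number of partitions of n with exactly k parts, computed from (8.3.1).

    Base cases: p(0, 0) = 1, p(n, 0) = 0 for n > 0, and p(n, k) = 0 for k > n.
    For 1 <= k <= n, (8.3.1) with n replaced by n - k gives
    p(n, k) = p(n - k, 0) + p(n - k, 1) + ... + p(n - k, k).
    """
    if k == 0:
        return 1 if n == 0 else 0
    if k > n:
        return 0
    return sum(p(n - k, j) for j in range(k + 1))


# Reproduce Table 8.2: one printed row per k, columns n = 0..6.
for k in range(7):
    print(f"k={k}", *(p(n, k) for n in range(7)))
```

Summing a column, ∑𝑘 𝑝(𝑛, 𝑘), gives the partition number 𝑝(𝑛).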
8.3. Partitions of an integer
Figure 8.1. Ferrers diagram of the partition 5 ≥ 5 ≥ 3 ≥ 2 ≥ 2 ≥ 1 of 18
A partition of 𝑛 represented as a sequence $n_1 \ge n_2 \ge \cdots \ge n_k$ of weakly decreasing parts is usefully represented geometrically by its Ferrers diagram, illustrated in Figure 8.1 for the partition 5 ≥ 5 ≥ 3 ≥ 2 ≥ 2 ≥ 1 of 18. If we read the above diagram column by column (from left to right) instead of row by row (from top to bottom), we get the partition 6 ≥ 5 ≥ 3 ≥ 2 ≥ 2, which is known as the conjugate of 5 ≥ 5 ≥ 3 ≥ 2 ≥ 2 ≥ 1. Note that the number of parts in either partition is equal to the largest part in its conjugate. This suggests that the number of partitions of 𝑛 with 𝑘 parts is equal to the number of partitions of 𝑛 with largest part equal to 𝑘, which is indeed the case. But before we prove this result, originally due to Euler, we establish a lemma of independent interest.

Lemma 8.3.2. Suppose that $a_1,\ldots,a_n$ is any sequence of nonnegative integers, with $\max\{a_i : i\in[n]\} = m$. Then
$$\sum_{i=1}^{n} a_i = \sum_{k=1}^{m} g(k), \quad\text{where } g(k) := |\{i\in[n] : a_i \ge k\}|. \tag{8.3.2}$$
Proof.
$$\sum_{i=1}^{n} a_i = \sum_{i=1}^{n}\sum_{k=1}^{a_i} 1 = \sum_{k=1}^{m}\sum_{\substack{i\\ a_i\ge k}} 1 = \sum_{k=1}^{m} g(k).$$
□
Remark 8.3.3. Students of probability theory may note the similarity of formula (8.3.2) with the formula
$$E[X] = \int_0^\infty P(X > x)\,dx \tag{8.3.3}$$
for the expected value of a nonnegative random variable 𝑋. Indeed, (8.3.2) implies the special case of (8.3.3) in which 𝑋 takes the values $a_1,\ldots,a_n$ with equal probability.

Theorem 8.3.4 (Euler). The number of partitions of 𝑛 with 𝑘 parts is equal to the number of partitions of 𝑛 with largest part equal to 𝑘.

Proof. Suppose that $n_1 \ge n_2 \ge \cdots \ge n_k$ is a partition of 𝑛 with 𝑘 parts. The conjugate of this partition has the form $m_1 \ge m_2 \ge \cdots \ge m_{n_1}$, where $m_1\,(= k)$ is the number of parts of the partition $n_1 \ge n_2 \ge \cdots \ge n_k$ that are greater than or equal to 1, $m_2$ is the number of parts that are greater than or equal to 2, . . . , and $m_{n_1}$ is the number of parts that are ≥ $n_1$. Conjugation clearly takes a partition of 𝑛 with 𝑘 parts to a multiset of
positive integers with largest part equal to 𝑘. That $m_1 + m_2 + \cdots + m_{n_1} = n$ follows from the preceding lemma. Since conjugating twice returns the original partition, conjugation is a bijection between the two sets of partitions in question. □

As an illustration, the partition 5 ≥ 5 ≥ 5 ≥ 3 ≥ 3 ≥ 2 ≥ 1 ≥ 1 of 25 with eight parts is mapped to the partition 8 ≥ 6 ≥ 5 ≥ 3 ≥ 3 with largest part equal to 8.

Corollary 8.3.5. The number of partitions of 𝑛 with 𝑘 or fewer parts is equal to the number of partitions of 𝑛 with largest part less than or equal to 𝑘.

Proof. Obvious. □
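Conjugation is simple to compute directly from Lemma 8.3.2: the 𝑘th part of the conjugate is 𝑔(𝑘). The following sketch (helper names are our own) reproduces the illustration above and checks Theorem 8.3.4 for 𝑛 = 12 by brute-force enumeration; it is a check, not a substitute for the proof.

```python
def partitions(n, largest=None):
    """Yield the partitions of n as weakly decreasing lists of parts."""
    if largest is None:
        largest = n
    if n == 0:
        yield []
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest


def conjugate(parts):
    """Conjugate partition; its k-th part is g(k) = #{i : parts[i] >= k} (Lemma 8.3.2)."""
    if not parts:
        return []
    return [sum(1 for a in parts if a >= k) for k in range(1, max(parts) + 1)]


print(conjugate([5, 5, 5, 3, 3, 2, 1, 1]))   # [8, 6, 5, 3, 3]

# Theorem 8.3.4 for n = 12: "k parts" versus "largest part equal to k".
n = 12
by_parts = [sum(1 for q in partitions(n) if len(q) == k) for k in range(1, n + 1)]
by_largest = [sum(1 for q in partitions(n) if q[0] == k) for k in range(1, n + 1)]
print(by_parts == by_largest)                # True
```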
Theorem 8.3.4 is one of a large number of propositions in the theory of integer partitions of the form “the number of partitions of 𝑛 with property 𝑃 is equal to the number of partitions of 𝑛 with property 𝑄”. Here are two others. A partition is self-conjugate if it is equal to its own conjugate. The partitions 3+2+1 and 5+3+3+1+1 are each self-conjugate, which is immediately obvious from their Ferrers diagrams; see Figure 8.2.
Figure 8.2
Theorem 8.3.6 (Sylvester). The number of self-conjugate partitions of 𝑛 is equal to the number of partitions of 𝑛 with all parts distinct and odd.

Proof. Use Ferrers diagrams, noting that any odd part 2𝑗 + 1 can be folded at its middle box into a hook with 𝑗 + 1 boxes in its row and 𝑗 + 1 boxes in its column, which is the diagram of a self-conjugate partition, as illustrated in Figure 8.3.
Figure 8.3
Nesting the results of this procedure for each odd distinct part yields a bijection from the set of partitions of 𝑛 with distinct odd parts to the set of self-conjugate partitions of 𝑛, as illustrated in Figure 8.4. □
Figure 8.4
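Theorem 8.3.6 can be checked by brute force for small 𝑛. This sketch (ours) repeats the hypothetical `partitions` and `conjugate` helpers so that it runs on its own, then counts self-conjugate partitions and partitions into distinct odd parts.

```python
def partitions(n, largest=None):
    """Yield the partitions of n as weakly decreasing lists of parts."""
    if largest is None:
        largest = n
    if n == 0:
        yield []
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest


def conjugate(parts):
    """Conjugate partition: its k-th part counts the parts that are >= k."""
    if not parts:
        return []
    return [sum(1 for a in parts if a >= k) for k in range(1, max(parts) + 1)]


def self_conjugate_count(n):
    """Number of partitions of n equal to their own conjugate."""
    return sum(1 for q in partitions(n) if conjugate(q) == q)


def distinct_odd_count(n):
    """Number of partitions of n whose parts are odd and pairwise distinct."""
    return sum(1 for q in partitions(n)
               if all(a % 2 == 1 for a in q) and len(set(q)) == len(q))


print([self_conjugate_count(n) for n in range(1, 13)])
print([distinct_odd_count(n) for n in range(1, 13)])
```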
Theorem 8.3.7 (Euler). The number of partitions of 𝑛 with all parts odd is equal to the number of partitions of 𝑛 with all parts distinct.

Proof. Let $\{1^{\lambda_1}, 3^{\lambda_3}, 5^{\lambda_5}, \ldots, (2r+1)^{\lambda_{2r+1}}\}$ be a partition of 𝑛 with all parts odd. Map this partition to a partition of 𝑛 with all parts distinct, as follows. Write each positive $\lambda_i$ as the sum of distinct powers of 2, say, $\lambda_i = 2^{e_{i,1}} + 2^{e_{i,2}} + \cdots + 2^{e_{i,k}}$, where $0 \le e_{i,1} < e_{i,2} < \cdots < e_{i,k}$. Construct a new partition of 𝑛 by replacing $i^{\lambda_i}$ (i.e., the $\lambda_i$ parts equal to the odd number 𝑖) with the 𝑘 parts $2^{e_{i,1}}i, 2^{e_{i,2}}i, \ldots, 2^{e_{i,k}}i$. These 𝑘 parts differ from each other, since the exponents of 2 differ from each other. Moreover, if 𝑖 and 𝑗 are distinct odd numbers, it is never the case that $2^a i = 2^b j$, for this would contradict unique factorization. Since $2^{e_{i,1}}i + 2^{e_{i,2}}i + \cdots + 2^{e_{i,k}}i = \lambda_i i$ and $\lambda_1\cdot 1 + \lambda_3\cdot 3 + \cdots + \lambda_{2r+1}(2r+1) = n$, this procedure produces a partition of 𝑛 with all parts distinct. As an illustration, the partition $\{1^2, 3^3, 7^1, 9^5\}$ of 63 with all parts odd gives rise by this procedure to the partition $\{2^1\cdot 1, 2^0\cdot 3, 2^1\cdot 3, 2^0\cdot 7, 2^0\cdot 9, 2^2\cdot 9\} = \{2, 3, 6, 7, 9, 36\}$ of 63 with all parts distinct.

That this procedure produces a bijection between the sets of partitions in question may be seen by constructing its inverse, as follows: (𝑖) Write each distinct part as a power of 2, multiplied by an odd number. (𝑖𝑖) Group terms with the same odd factor. (𝑖𝑖𝑖) Repeat odd factors the appropriate number of times. As an illustration, using the example above, one writes $2 = 2^1\cdot 1$, $3 = 2^0\cdot 3$, $6 = 2^1\cdot 3$, $7 = 2^0\cdot 7$, $9 = 2^0\cdot 9$, and $36 = 2^2\cdot 9$. This yields the partition $1^2\,3^3\,7^1\,9^5$ of 63. □

The above proof is actually due to Sylvester. We will shortly present Euler’s generating function proof of this theorem, but first we will pause for a few basic observations about the ordinary generating functions of various classes of integer partitions.

Theorem 8.3.8. If 𝐼 is a nonempty subset of positive integers and 𝑝(𝑛; 𝐼) denotes the number of partitions of 𝑛 with all parts in 𝐼, then
$$\sum_{n\ge0} p(n;I)\,x^n = \prod_{i\in I}\frac{1}{1-x^i}. \tag{8.3.4}$$
In particular,
$$\sum_{n\ge0} p(n)\,x^n = \prod_{i\ge1}\frac{1}{1-x^i}. \tag{8.3.5}$$
Proof.
$$\prod_{i\in I}\frac{1}{1-x^i} = \prod_{i\in I}\sum_{\lambda_i\ge0} x^{i\lambda_i} = \sum_{n\ge0}\Bigl(\sum_{\substack{\sum_{i\in I} i\lambda_i = n\\ \lambda_i\ge0}} 1\Bigr)x^n = \sum_{n\ge0} p(n;I)\,x^n.$$
Formulas (8.3.4) and (8.3.5) can be construed as formal products (Niven (1969)). It can also be shown analytically that these products converge if 0 < 𝑥 < 1. □

Theorem 8.3.9. If 𝑝(𝑛; 𝐼, all parts distinct) denotes the number of partitions of 𝑛 with all parts distinct members of 𝐼, then
$$\sum_{n\ge0} p(n; I, \text{all parts distinct})\,x^n = \prod_{i\in I}(1+x^i). \tag{8.3.6}$$
In particular,
$$\sum_{n\ge0} p(n; \text{all parts distinct})\,x^n = \prod_{i\ge1}(1+x^i). \tag{8.3.7}$$
Proof. List the elements of 𝐼 in the increasing order $i_1 < i_2 < \cdots$, and let $i_j$ be the largest element of 𝐼 that is ≤ 𝑛. The coefficient of $x^n$ in the expansion of the product in (8.3.6) is equal to the number of subsequences of $(i_1, i_2, \ldots, i_j)$ whose elements sum to 𝑛, which is precisely 𝑝(𝑛; 𝐼, all parts distinct). □

Euler’s generating function proof that 𝑝(𝑛; all parts distinct) = 𝑝(𝑛; all parts odd) now proceeds as follows:
$$\begin{aligned}
\sum_{n\ge0} p(n;\text{all parts distinct})\,x^n &= (1+x)(1+x^2)(1+x^3)\cdots\\
&= \frac{(1+x)(1-x)(1+x^2)(1-x^2)(1+x^3)(1-x^3)\cdots}{(1-x)(1-x^2)(1-x^3)(1-x^4)(1-x^5)(1-x^6)\cdots}\\
&= \frac{(1-x^2)(1-x^4)(1-x^6)\cdots}{(1-x)(1-x^2)(1-x^3)(1-x^4)(1-x^5)(1-x^6)\cdots}\\
&= \prod_{i\ge1}\frac{1}{1-x^{2i-1}} = \sum_{n\ge0} p(n;\text{all parts odd})\,x^n.
\end{aligned}$$
□
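Both proofs of Theorem 8.3.7 translate into short programs. The sketch below is our own (all function names are hypothetical): it implements Sylvester's map through the binary expansion of each multiplicity $\lambda_i$, its inverse, and the truncated power series of (8.3.7) and of (8.3.4) with 𝐼 the set of odd numbers.

```python
from collections import Counter


def odd_to_distinct(odd_parts):
    """Sylvester's map from the proof of Theorem 8.3.7.

    Each odd part i occurring lam times is replaced by the parts 2^e * i,
    one for each power 2^e in the binary expansion of lam.
    """
    result = []
    for i, lam in Counter(odd_parts).items():
        e = 0
        while lam:
            if lam & 1:
                result.append(2 ** e * i)
            lam >>= 1
            e += 1
    return sorted(result)


def distinct_to_odd(distinct_parts):
    """Inverse map: write each part as 2^e * (odd), then repeat the odd factor 2^e times."""
    result = []
    for part in distinct_parts:
        e = 0
        while part % 2 == 0:
            part //= 2
            e += 1
        result.extend([part] * 2 ** e)
    return sorted(result)


def distinct_parts_series(N):
    """Coefficients of prod_{i>=1} (1 + x^i) up to x^N, as in (8.3.7)."""
    c = [1] + [0] * N
    for i in range(1, N + 1):
        for n in range(N, i - 1, -1):   # multiply in place by (1 + x^i)
            c[n] += c[n - i]
    return c


def odd_parts_series(N):
    """Coefficients of prod_{i>=1} 1/(1 - x^(2i-1)) up to x^N, as in (8.3.4)."""
    c = [1] + [0] * N
    for i in range(1, N + 1, 2):
        for n in range(i, N + 1):       # multiply in place by 1/(1 - x^i)
            c[n] += c[n - i]
    return c


example = [1] * 2 + [3] * 3 + [7] + [9] * 5      # {1^2, 3^3, 7^1, 9^5}, a partition of 63
print(odd_to_distinct(example))                  # [2, 3, 6, 7, 9, 36]
print(distinct_to_odd([2, 3, 6, 7, 9, 36]) == sorted(example))   # True
print(distinct_parts_series(12) == odd_parts_series(12))          # True
```

The series functions update a coefficient list in place, so no symbolic algebra library is needed; iterating downward for (1 + 𝑥^𝑖) and upward for 1/(1 − 𝑥^𝑖) is the standard way to keep each update using the right coefficients.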
The reciprocal of the generating function (8.3.5), known as Euler’s function, is given by the formula $\prod_{i\ge1}(1-x^i)$, which differs from (8.3.7) only in the appearance of minus signs in place of plus signs. Since (8.3.7) is the generating function for partitions with all parts distinct, one might suspect that Euler’s function has some connection with such partitions. To see that this is the case, let $p_e(n)$ denote the number of partitions of 𝑛 into an even number of distinct parts, and let $p_o(n)$ denote the number of partitions of 𝑛 into an odd number of distinct parts.

Theorem 8.3.10.
$$\prod_{i\ge1}(1-x^i) = \sum_{n\ge0}\bigl(p_e(n) - p_o(n)\bigr)x^n. \tag{8.3.8}$$
Proof. The factors $1 - x^i$ with 𝑖 > 𝑛 do not affect the coefficient of $x^n$, so consider the expansion of the finite product $(1-x)(1-x^2)\cdots(1-x^n)$, which consists, before combining like powers, of the sum of $2^n$ terms. A term $x^n$ will appear with coefficient +1 whenever $x^n$ materializes as the product of an even number of the terms $x, x^2, \ldots, x^n$, and with coefficient −1 whenever $x^n$ materializes as the product of an odd number of these terms. Since these terms have distinct exponents, such products correspond exactly to the partitions of 𝑛 into distinct parts, and the asserted result follows immediately by combining like powers. □
The preceding result is due to F. Franklin (1881), who employed it in a proof of the following, known as Euler’s pentagonal number theorem.

Theorem 8.3.11.
$$\begin{aligned}
\prod_{i\ge1}(1-x^i) &= \sum_{k=-\infty}^{\infty}(-1)^k x^{k(3k-1)/2}\\
&= 1 + \sum_{k\ge1}(-1)^k\bigl(x^{k(3k-1)/2} + x^{k(3k+1)/2}\bigr)\\
&= 1 - x - x^2 + x^5 + x^7 - x^{12} - x^{15} + \cdots.
\end{aligned}\tag{8.3.9}$$
Proof. In view of Theorem 8.3.10, it suffices to show that the right-hand side of formula (8.3.9) is equal to the generating function $\sum_{n\ge0}(p_e(n) - p_o(n))x^n$. Consider the partition 6 + 5 + 3 of 14 into three distinct parts, and its Ferrers diagram shown in Figure 8.5.
Figure 8.5
Let 𝑏 denote the number of blocks in the bottom row, and 𝑑 the number of blocks (marked above with the symbol ∗) in the rightmost diagonal with slope 45 degrees. Here 𝑏(= 3) > 𝑑(= 2), and so we can move the blocks marked with ∗ to the bottom of the above diagram, creating the Ferrers diagram (see Figure 8.6) of the partition 5 + 4 + 3 + 2 of 14 into four distinct parts. On the other hand, if we start with the latter partition, where now 𝑏(= 2) ≤ 𝑑(= 4), we can take the boxes in the bottom row and create a new rightmost diagonal, thereby recovering the partition with diagram given in Figure 8.5. In each case, we preserve the distinctness of the parts, and change the parity of the number of parts, creating a pairing between a partition into an odd number of distinct parts and a partition into an even number of distinct parts.
Figure 8.6
For the move of the rightmost diagonal of a diagram down to the bottom to create a valid Ferrers diagram of a partition with distinct parts, it is clearly necessary that 𝑏 > 𝑑. And moving the bottom row of a diagram into place as the rightmost diagonal of that diagram creates a valid diagram only if 𝑏 ≤ 𝑑. So, given a Ferrers diagram, there is at most one move that will create a valid diagram from the original. If that move does create a valid diagram, the complementary move will transform the latter into the original diagram, thereby creating a pairing of diagrams differing in parity in their number of parts.

In what situations will neither type of move lead to a valid diagram of a partition with distinct parts? An examination of all possible scenarios (you should check this) reveals that there are just two such cases. First, suppose that in the diagram of a partition of 𝑛 into distinct parts, it is the case that 𝑏 = 𝑑, and the bottom row and rightmost diagonal share a box, as in Figure 8.7.
Figure 8.7
In this case, the only possible move with the potential to create a valid diagram is the bottom-to-diagonal move. But as illustrated in Figure 8.7, this creates a hanging box. In general, if 𝑏 = 𝑑 = 𝑘, then
$$n = (2k-1) + (2k-2) + \cdots + k = k(3k-1)/2. \tag{8.3.10}$$
In all partitions of this 𝑛 into distinct parts, with the exception of the partition given in (8.3.10), those with an even number of parts can be paired by the procedures described above with those with an odd number of parts. The parity of the number of parts in the single exceptional partition is identical to that of 𝑘, and so the coefficient of $x^{k(3k-1)/2}$ in the expansion of the infinite product in (8.3.9) is equal to $(-1)^k$. Suppose next that in the diagram of a partition of 𝑛 into distinct parts, it is the case that 𝑏 = 𝑑 + 1, and the bottom row and rightmost diagonal share a box, as shown in Figure 8.8.
Figure 8.8
Here the only move with the potential to create a valid diagram is the diagonal-to-bottom move. But, as illustrated above, this move fails to create the diagram of a partition with distinct parts. In general, if 𝑏 = 𝑘 + 1 and 𝑑 = 𝑘, then
$$n = 2k + (2k-1) + \cdots + (k+1) = k(3k+1)/2. \tag{8.3.11}$$
In all partitions of this 𝑛 into distinct parts, with the exception of the partition given in (8.3.11), those with an even number of parts can be paired by the procedures described above with those with an odd number of parts. The parity of the number of parts in the single exceptional partition is identical to that of 𝑘, and so the coefficient of $x^{k(3k+1)/2}$ in the expansion of the infinite product in (8.3.9) is equal to $(-1)^k$. The coefficients of all remaining powers of 𝑥 are equal to zero. □

Remark 8.3.12. The numbers pent(𝑘) ∶= 𝑘(3𝑘 − 1)/2, with 𝑘 ≥ 1, are known as pentagonal numbers, terminology that is explained in Exercise 8.13. These numbers form the sequence 1, 5, 12, 22, 35, . . . . The numbers 𝑔(𝑘) ∶= 𝑘(3𝑘 − 1)/2, with 𝑘 ∈ ℤ, are called generalized pentagonal numbers; listed in the order 𝑔(0), 𝑔(1), 𝑔(−1), 𝑔(2), 𝑔(−2), . . . , they form the sequence 0, 1, 2, 5, 7, 12, 15, 22, 26, 35, 40, . . . .

As a consequence of formulas (8.3.5) and (8.3.9), we get the following beautiful recurrence relation for the partition numbers 𝑝(𝑛).

Theorem 8.3.13. For all 𝑛 ≥ 1,
$$p(n) = p(n-1) + p(n-2) - p(n-5) - p(n-7) + p(n-12) + p(n-15) - \cdots. \tag{8.3.12}$$
Proof. Exercise. Note that, since the partition function takes the value zero for negative arguments, the sum on the right-hand side of (8.3.12) has only finitely many nonzero terms for every 𝑛 ≥ 1. □

We have considered in this section only a few of the basic results of the theory of integer partitions. Readers interested in further pursuing the subject will find the book Integer Partitions by Andrews and Eriksson (2004) to be a beautifully lucid account of this rich part of number theory.
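Theorems 8.3.10, 8.3.11, and 8.3.13 can all be checked with truncated power series. In this sketch (ours; function names are hypothetical), `euler_function_series` expands $\prod_{i\ge1}(1-x^i)$ up to $x^N$, `pentagonal_series` builds the right-hand side of (8.3.9), `distinct_partitions` allows a direct count of $p_e(n) - p_o(n)$, and `partition_numbers` runs recurrence (8.3.12), taking 𝑝(𝑚) = 0 for 𝑚 < 0.

```python
def euler_function_series(N):
    """Coefficients of prod_{i>=1} (1 - x^i), truncated after x^N."""
    c = [1] + [0] * N
    for i in range(1, N + 1):
        for n in range(N, i - 1, -1):   # multiply in place by (1 - x^i)
            c[n] -= c[n - i]
    return c


def pentagonal_series(N):
    """Right-hand side of (8.3.9): (-1)^k at each exponent k(3k-1)/2, k in Z."""
    c = [0] * (N + 1)
    for k in range(-N, N + 1):
        e = k * (3 * k - 1) // 2        # generalized pentagonal number g(k) >= 0
        if e <= N:
            c[e] += 1 if k % 2 == 0 else -1
    return c


def distinct_partitions(n, largest=None):
    """Yield the partitions of n into distinct parts, each part <= largest."""
    if largest is None:
        largest = n
    if n == 0:
        yield []
        return
    for first in range(min(n, largest), 0, -1):
        for rest in distinct_partitions(n - first, first - 1):
            yield [first] + rest


def partition_numbers(N):
    """p(0), ..., p(N) from recurrence (8.3.12), with p(m) = 0 for m < 0."""
    p = [1] + [0] * N
    for n in range(1, N + 1):
        total, k = 0, 1
        while k * (3 * k - 1) // 2 <= n:
            sign = 1 if k % 2 == 1 else -1
            total += sign * p[n - k * (3 * k - 1) // 2]      # term p(n - g(k))
            if k * (3 * k + 1) // 2 <= n:
                total += sign * p[n - k * (3 * k + 1) // 2]  # term p(n - g(-k))
            k += 1
        p[n] = total
    return p


print(euler_function_series(15))  # [1, -1, -1, 0, 0, 1, 0, 1, 0, 0, 0, 0, -1, 0, 0, -1]
print(partition_numbers(12))      # [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42, 56, 77]
```

Only the generalized pentagonal numbers up to 𝑛 contribute to (8.3.12), and there are on the order of √𝑛 of them, which is why the recurrence is a practical way to tabulate 𝑝(𝑛).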
8.4. *Power sums

For all 𝑟, 𝑛 ∈ ℕ, the 𝑟th power sum $S_r(n)$ is defined by $S_r(n) = \sum_{k=0}^{n} k^r$, where, as usual, $0^0 = 1$. So $S_0(n) = n + 1$, $S_1(n) = 1 + \cdots + n = \binom{n+1}{2}$, etc. That $S_r(n)$ is a polynomial in 𝑛 of degree 𝑟 + 1 is a consequence of the following recursive formula.

Theorem 8.4.1. For all 𝑛 ≥ 0, $S_0(n) = n + 1$, and for all 𝑛 ≥ 0 and 𝑟 > 0,
$$S_r(n) = \frac{1}{r+1}\Bigl\{(n+1)^{r+1} - \sum_{i=0}^{r-1}\binom{r+1}{i}S_i(n)\Bigr\}. \tag{8.4.1}$$
Proof. By the binomial theorem, $(k+1)^{r+1} - k^{r+1} = \sum_{i=0}^{r}\binom{r+1}{i}k^i$, and so
$$\sum_{k=0}^{n}\bigl[(k+1)^{r+1} - k^{r+1}\bigr] = \sum_{k=0}^{n}\sum_{i=0}^{r}\binom{r+1}{i}k^i = \sum_{i=0}^{r}\binom{r+1}{i}\sum_{k=0}^{n}k^i = \sum_{i=0}^{r}\binom{r+1}{i}S_i(n).$$
Since the leftmost sum above telescopes to $(n+1)^{r+1}$, this yields the implicit recurrence $\sum_{i=0}^{r}\binom{r+1}{i}S_i(n) = (n+1)^{r+1}$, which yields (8.4.1) upon solving for $S_r(n)$, whose coefficient is $\binom{r+1}{r} = r + 1$. For a combinatorial proof of (8.4.1), see Wagner (1997). □
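Formula (8.4.1) can be run on coefficient lists to produce the polynomials $S_r(n)$ exactly. The sketch below is ours; it assumes Python's `fractions.Fraction` for exact arithmetic and `math.comb` for binomial coefficients, expands $(n+1)^{r+1}$ by the binomial theorem, and subtracts the earlier polynomials. The names `power_sum_polys` and `evaluate` are hypothetical.

```python
from fractions import Fraction
from math import comb


def power_sum_polys(R):
    """Coefficient lists [a(r,0), ..., a(r,r+1)] of S_r(n), r = 0..R, computed from (8.4.1)."""
    polys = [[Fraction(1), Fraction(1)]]            # S_0(n) = 1 + n
    for r in range(1, R + 1):
        # coefficients of (n + 1)^(r+1), by the binomial theorem
        coeffs = [Fraction(comb(r + 1, j)) for j in range(r + 2)]
        for i in range(r):                           # subtract C(r+1, i) S_i(n)
            for j, a in enumerate(polys[i]):
                coeffs[j] -= comb(r + 1, i) * a
        polys.append([c / (r + 1) for c in coeffs])
    return polys


def evaluate(coeffs, n):
    """Evaluate the polynomial sum_j coeffs[j] * n^j."""
    return sum(a * n ** j for j, a in enumerate(coeffs))


polys = power_sum_polys(5)
print([str(a) for a in polys[4]])   # ['0', '-1/30', '0', '1/3', '1/2', '1/5']
print(evaluate(polys[4], 10) == sum(k ** 4 for k in range(11)))   # True
```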
There is, however, a far better recursive procedure for generating the polynomials $S_r(n)$. Instead of factoring these polynomials (as is commonly done, and which obscures the simple procedure for constructing $S_r(n)$ from $S_{r-1}(n)$), let us write
$$S_r(n) = \sum_{j=0}^{r+1} a(r,j)\,n^j. \tag{8.4.2}$$
From formula (8.4.1) we may compute the following:
$$\begin{aligned}
S_1(n) &= \tfrac12 n + \tfrac12 n^2,\\
S_2(n) &= \tfrac16 n + \tfrac12 n^2 + \tfrac13 n^3,\\
S_3(n) &= \tfrac14 n^2 + \tfrac12 n^3 + \tfrac14 n^4,\\
S_4(n) &= -\tfrac{1}{30} n + \tfrac13 n^3 + \tfrac12 n^4 + \tfrac15 n^5,\\
S_5(n) &= -\tfrac{1}{12} n^2 + \tfrac{5}{12} n^4 + \tfrac12 n^5 + \tfrac16 n^6.
\end{aligned}$$
Tabulating the coefficients of these polynomials clarifies the connection between the coefficients of $S_r(n)$ and those of $S_{r-1}(n)$.

Table 8.3. 𝑎(𝑟, 𝑗) for 0 ≤ 𝑟 ≤ 5 and 0 ≤ 𝑗 ≤ 6
𝑗=0 𝑗=1 𝑗=2 𝑗=3 𝑗=4 𝑗=5 𝑗=6
𝑟=0 1 1
𝑟=1 0 1/2 1/2
𝑟=2 0 1/6 1/2 1/3
𝑟=3 0 0 1/4 1/2 1/4
𝑟=4 0 −1/30 0 1/3 1/2 1/5
𝑟=5 0 0 −1/12 0 5/12 1/2 1/6
A careful examination of Table 8.3 leads us to conjecture the following theorem.

Theorem 8.4.2. For all 𝑟 ≥ 1, 𝑎(𝑟, 0) = 0 and
$$\sum_{j=0}^{r+1} a(r,j) = \sum_{j=1}^{r+1} a(r,j) = 1, \tag{8.4.3}$$
whence
$$a(r,1) = 1 - \sum_{j=2}^{r+1} a(r,j). \tag{8.4.4}$$
For all 𝑟 ≥ 1 and 𝑗 ≥ 2,
$$a(r,j) = \frac{r}{j}\,a(r-1,j-1). \tag{8.4.5}$$
Proof (Owens 1992). Setting 𝑛 = 1 in (8.4.2) yields $\sum_{j=0}^{r+1} a(r,j) = S_r(1) = 0^r + 1^r = 1$. Setting 𝑛 = 0 in (8.4.2) for 𝑟 ≥ 1 yields $a(r,0) = S_r(0) = 0^r = 0$. Now consider the polynomial
$$S_r(x) := \sum_{j=0}^{r+1} a(r,j)\,x^j. \tag{8.4.6}$$
On the vector space ℝ[𝑥] of polynomials over the field of real numbers, the differentiation operator 𝐷 is a linear operator. So is the (forward) finite difference operator Δ, defined by Δ𝑝(𝑥) = 𝑝(𝑥 + 1) − 𝑝(𝑥). The finite difference analogue of the differentiation formula $Dx^n = nx^{n-1}$ is the formula $\Delta x^{\underline{n}} = n\,x^{\underline{n-1}}$ for the falling factorial $x^{\underline{n}} = x(x-1)\cdots(x-n+1)$, which (as we will see in Chapter 9) has the variant $\Delta\binom{x}{n} = \binom{x}{n-1}$. In what follows, we use the easily established fact that Δ and 𝐷 commute, i.e., for every 𝑝(𝑥) ∈ ℝ[𝑥],
$$\Delta(Dp(x)) = D(\Delta p(x)). \tag{8.4.7}$$
For fixed 𝑟 ≥ 0 and all 𝑛 ≥ 0, $S_r(n+1) - S_r(n) = (n+1)^r$. Since two polynomials that agree at infinitely many points are identical, this implies the polynomial identity
$$\Delta S_r(x) = S_r(x+1) - S_r(x) = (x+1)^r. \tag{8.4.8}$$
From (8.4.7) and (8.4.8) it follows that, for 𝑟 ≥ 1,
$$\Delta(DS_r(x)) = D(\Delta S_r(x)) = D(x+1)^r = r(x+1)^{r-1} = r\,\Delta S_{r-1}(x) = \Delta\,rS_{r-1}(x). \tag{8.4.9}$$
Hence,
$$\Delta\bigl[DS_r(x) - rS_{r-1}(x)\bigr] = 0, \tag{8.4.10}$$
from which it follows that the polynomial $DS_r(x) - rS_{r-1}(x)$ is equal to a constant,¹ namely, the constant $c = DS_r(0) - rS_{r-1}(0) = a(r,1) - r\delta_{r,1}$. Hence,
$$DS_r(x) = rS_{r-1}(x) + c, \tag{8.4.11}$$
which implies that
$$\sum_{j=1}^{r+1} j\,a(r,j)\,x^{j-1} = \sum_{j=0}^{r}(j+1)\,a(r,j+1)\,x^j = \sum_{j=0}^{r} r\,a(r-1,j)\,x^j + c. \tag{8.4.12}$$
Comparing coefficients of $x^j$ for 𝑗 ≥ 1 yields
$$(j+1)\,a(r,j+1) = r\,a(r-1,j) \text{ for all } r, j \ge 1, \text{ i.e., } j\,a(r,j) = r\,a(r-1,j-1) \text{ for all } r \ge 1 \text{ and all } j \ge 2. \tag{8.4.13}$$
□
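Theorem 8.4.2 gives a much lighter way to fill in Table 8.3 row by row: (8.4.5) determines 𝑎(𝑟, 𝑗) for 𝑗 ≥ 2 from the previous row, and (8.4.4) then determines 𝑎(𝑟, 1). A minimal sketch (ours; `power_sum_table` is a hypothetical name, and `Fraction` keeps the arithmetic exact):

```python
from fractions import Fraction


def power_sum_table(R):
    """Rows [a(r,0), ..., a(r,r+1)] for r = 0..R, filled in using Theorem 8.4.2."""
    rows = [[Fraction(1), Fraction(1)]]                    # S_0(n) = 1 + n
    for r in range(1, R + 1):
        row = [Fraction(0)] * (r + 2)                      # a(r, 0) = 0 for r >= 1
        for j in range(2, r + 2):
            row[j] = Fraction(r, j) * rows[r - 1][j - 1]   # (8.4.5)
        row[1] = 1 - sum(row[2:])                          # (8.4.4)
        rows.append(row)
    return rows


for r, row in enumerate(power_sum_table(5)):
    print(f"r={r}", *row)
```

Unlike (8.4.1), each new row costs only 𝑂(𝑟) operations, which is the sense in which this procedure is "far better".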
Remark 8.4.3. The numbers $B_r$ defined by $B_0 = a(0,1)$, $B_1 = -a(1,1)$, and $B_r = a(r,1)$ for all 𝑟 ≥ 2, are called Bernoulli numbers. We have $B_0 = 1$, $B_1 = -1/2$, $B_2 = 1/6$, $B_3 = 0$, $B_4 = -1/30$, $B_5 = 0$, $B_6 = 1/42$, $B_7 = 0$, $B_8 = -1/30$, $B_9 = 0$, $B_{10} = 5/66$, $B_{11} = 0$, $B_{12} = -691/2730$, etc. It may be proved that $B_r = 0$ for all odd 𝑟 ≥ 3, and that $B_r$ alternates in sign for even 𝑟 ≥ 2. If we define a function 𝛽 ∶ ℝ → ℝ by
$$\beta(x) = \frac{x}{e^x - 1} \text{ if } x \ne 0 \quad\text{and}\quad \beta(0) = 1\Bigl(= \lim_{x\to0}\frac{x}{e^x - 1}\Bigr), \tag{8.4.14}$$
then, for |𝑥| < 2𝜋,
$$\sum_{r=0}^{\infty} B_r\frac{x^r}{r!} = \beta(x). \tag{8.4.15}$$
Since $a(1,1) = B_1 + 1$ and $a(r,1) = B_r$ if 𝑟 ≠ 1, $\sum_{r=0}^{\infty} a(r,1)\frac{x^r}{r!} = \beta(x) + x = \frac{xe^x}{e^x - 1}$. The numbers 𝑎(𝑟, 1) are sometimes called second Bernoulli numbers.

¹ If $p(x) = \sum_{j=0}^{m} c_j\binom{x}{j}$, then $\Delta p(x) = \sum_{j=1}^{m} c_j\binom{x}{j-1}$. So if Δ𝑝(𝑥) = 0, then $c_1 = \cdots = c_m = 0$, since the set of polynomials $\{\binom{x}{j-1} : j = 1, \ldots, m\}$ is a basis for the vector space of polynomials of degree ≤ 𝑚 − 1. So $p(x) = c_0$.
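The relation between 𝑎(𝑟, 1) and $B_r$ in Remark 8.4.3 can be checked independently. Multiplying (8.4.15) by $e^x - 1$ and comparing coefficients gives the standard recurrence $\sum_{j=0}^{m}\binom{m+1}{j}B_j = 0$ for 𝑚 ≥ 1, with $B_0 = 1$. The sketch below (ours; function names are hypothetical) uses that recurrence and compares the result with 𝑎(𝑟, 1) computed from Theorem 8.4.2.

```python
from fractions import Fraction
from math import comb


def bernoulli_from_beta(R):
    """B_0..B_R from beta(x)(e^x - 1) = x: sum_{j=0}^{m} C(m+1, j) B_j = 0 for m >= 1."""
    B = [Fraction(1)]
    for m in range(1, R + 1):
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B


def second_bernoulli(R):
    """a(r, 1) for r = 0..R, via the recurrences (8.4.4) and (8.4.5)."""
    rows = [[Fraction(1), Fraction(1)]]
    for r in range(1, R + 1):
        row = [Fraction(0)] * (r + 2)
        for j in range(2, r + 2):
            row[j] = Fraction(r, j) * rows[r - 1][j - 1]
        row[1] = 1 - sum(row[2:])
        rows.append(row)
    return [row[1] for row in rows]


B = bernoulli_from_beta(12)
a1 = second_bernoulli(12)
print(B[12], a1[12])          # -691/2730 -691/2730
print(B[1], a1[1])            # -1/2 1/2
```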
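8.5. 𝑝-orders and Legendre’s theorem

If 𝑝 is a prime number and 𝑚 ∈ ℤ, the 𝑝-order of 𝑚, denoted $\operatorname{ord}_p(m)$, is defined by (𝑖) $\operatorname{ord}_p(0) = \infty$ and (𝑖𝑖) $\operatorname{ord}_p(m) = \max\{k : p^k \mid m\}$ if 𝑚 ≠ 0. It is easy to verify that, for all 𝑚, 𝑛 ∈ ℤ,
$$\operatorname{ord}_p(mn) = \operatorname{ord}_p(m) + \operatorname{ord}_p(n), \tag{8.5.1}$$
and
$$\operatorname{ord}_p(m+n) \ge \min\{\operatorname{ord}_p(m), \operatorname{ord}_p(n)\}, \tag{8.5.2}$$
with equality holding in (8.5.2) whenever $\operatorname{ord}_p(m) \ne \operatorname{ord}_p(n)$.

Theorem 8.5.1 (Legendre’s theorem). If 𝑝 is prime, 𝑛 ∈ ℕ, and $n = \sum_{i\ge0} n_i p^i$, where $0 \le n_i < p$, then
$$\operatorname{ord}_p(n!) = \sum_{j\ge1}\lfloor n/p^j\rfloor. \tag{8.5.3}$$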
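Proof #1. If 𝑛 ∈ ℕ and 𝑑 ∈ ℙ, there are clearly ⌊𝑛/𝑑⌋ multiples of 𝑑 in [𝑛]. Also, if 𝑘 ∈ ℤ and 𝑗 ∈ ℕ, then $\operatorname{ord}_p(k) \ge j$ if and only if $p^j \mid k$. Along with (8.5.1), this implies that
$$\begin{aligned}
\operatorname{ord}_p(n!) &= \sum_{k=1}^{n}\operatorname{ord}_p(k) = \sum_{j\ge0} j\times|\{k\in[n] : \operatorname{ord}_p(k) = j\}|\\
&= \sum_{j\ge0} j\times\bigl(|\{k\in[n] : \operatorname{ord}_p(k)\ge j\}| - |\{k\in[n] : \operatorname{ord}_p(k)\ge j+1\}|\bigr)\\
&= \sum_{j\ge0} j\times\bigl(|\{k\in[n] : p^j\mid k\}| - |\{k\in[n] : p^{j+1}\mid k\}|\bigr)\\
&= \sum_{j\ge0} j\times\bigl(\lfloor n/p^j\rfloor - \lfloor n/p^{j+1}\rfloor\bigr)\\
&= \sum_{j\ge1} j\lfloor n/p^j\rfloor - \sum_{j\ge1}(j-1)\lfloor n/p^j\rfloor = \sum_{j\ge1}\lfloor n/p^j\rfloor.
\end{aligned}$$
□

Proof #2. Apply Lemma 8.3.2, with $a_i = \operatorname{ord}_p(i)$ for 𝑖 ∈ [𝑛]; here $g(k) = |\{i\in[n] : p^k \mid i\}| = \lfloor n/p^k\rfloor$. □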
Corollary 8.5.2. Under the hypotheses of Theorem 8.5.1,
$$\operatorname{ord}_p(n!) = \Bigl(n - \sum_{i\ge0} n_i\Bigr)\Big/(p-1). \tag{8.5.4}$$
Proof. Exercise. □
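Legendre’s formula and (8.5.4) are easy to test against a direct computation of $\operatorname{ord}_p(n!)$. This is a sketch of ours with hypothetical function names; it checks particular values and proves nothing.

```python
from math import factorial


def ord_p(m, p):
    """p-order of a nonzero integer m: the largest k with p^k | m."""
    k = 0
    while m % p == 0:
        m //= p
        k += 1
    return k


def legendre(n, p):
    """ord_p(n!) via (8.5.3): the sum of floor(n / p^j) over j >= 1."""
    total, q = 0, p
    while q <= n:
        total += n // q
        q *= p
    return total


def digit_sum(n, p):
    """Sum of the base-p digits n_i of n, as in (8.5.4)."""
    s = 0
    while n:
        s += n % p
        n //= p
    return s


n, p = 100, 5
print(legendre(n, p), ord_p(factorial(n), p), (n - digit_sum(n, p)) // (p - 1))  # 24 24 24
```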
Note that if 0 ≤ 𝑘 ≤ 𝑛, then $n! = \binom{n}{k}k!\,(n-k)!$, and so by (8.5.1),
$$\operatorname{ord}_p\binom{n}{k} = \operatorname{ord}_p(n!) - \operatorname{ord}_p(k!) - \operatorname{ord}_p((n-k)!). \tag{8.5.5}$$
Formulas (8.5.5) and (8.5.4) yield the following intriguing interpretation of $\operatorname{ord}_p\binom{n}{k}$.

Theorem 8.5.3 (Kummer’s carry theorem). If 𝑝 is prime and 0 ≤ 𝑘 ≤ 𝑛, then $\operatorname{ord}_p\binom{n}{k}$ is equal to the number of carries that occur when the addition of 𝑘 and 𝑛 − 𝑘 is performed in 𝑝-ary arithmetic.

Proof. Suppose that $n = \sum_i n_ip^i$, $k = \sum_i k_ip^i$, and $n - k = \sum_i l_ip^i$, where $0 \le n_i, k_i, l_i < p$. In performing the 𝑝-ary addition of 𝑘 and 𝑛 − 𝑘, we have
$$\begin{aligned}
k_0 + l_0 &= n_0 + c_0p,\\
c_0 + k_1 + l_1 &= n_1 + c_1p,\\
c_1 + k_2 + l_2 &= n_2 + c_2p,
\end{aligned}\tag{8.5.6}$$
etc., where $c_i$ (which always takes the value 0 or 1) denotes the carry resulting from the addition in the 𝑖th place. So the number of carries is equal to the sum of the carries. Summing (8.5.6) over all relevant 𝑖 yields $\sum_i c_i + \sum_i k_i + \sum_i l_i = \sum_i n_i + p\sum_i c_i$, whence
$$\sum_i c_i = \sum_i (k_i + l_i - n_i)/(p-1). \tag{8.5.7}$$
But by (8.5.5) and (8.5.4), it is also the case that
$$\operatorname{ord}_p\binom{n}{k} = \frac{n - \sum_i n_i}{p-1} - \frac{k - \sum_i k_i}{p-1} - \frac{n - k - \sum_i l_i}{p-1} = \sum_i (k_i + l_i - n_i)/(p-1). \tag{8.5.8}$$
□
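Kummer’s theorem can be confirmed numerically by simulating 𝑝-ary addition. In this sketch (ours), `carries(a, b, p)` is a hypothetical helper that follows (8.5.6) digit by digit, and `ord_p` computes the 𝑝-order directly.

```python
from math import comb


def ord_p(m, p):
    """p-order of a nonzero integer m."""
    k = 0
    while m % p == 0:
        m //= p
        k += 1
    return k


def carries(a, b, p):
    """Number of carries when a and b are added in base p, as in (8.5.6)."""
    count, carry = 0, 0
    while a or b or carry:
        s = a % p + b % p + carry
        carry = 1 if s >= p else 0
        count += carry
        a //= p
        b //= p
    return count


print(carries(3, 7, 2), ord_p(comb(10, 3), 2))   # 3 3
```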
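Here is another interesting consequence of formula (8.5.5).

Theorem 8.5.4. If 1 < 𝑘 < 𝑛 − 1, then $\binom{n}{k}$ is never equal to a power of a single prime number.

Proof. In what follows, we make use of the following easily established inequality for the floor function. For all 𝑥, 𝑦 ∈ ℝ,
$$\lfloor x + y\rfloor \le \lfloor x\rfloor + \lfloor y\rfloor + 1. \tag{8.5.9}$$
Let $e_p(n) := \max\{j : p^j \le n\}$. Then
$$\operatorname{ord}_p\binom{n}{k} \le e_p(n), \tag{8.5.10}$$
since, by (8.5.5), (8.5.3), and (8.5.9), and because every term with $p^j > n$ vanishes,
$$\operatorname{ord}_p\binom{n}{k} = \sum_{\substack{j\ge1\\ p^j\le n}}\Bigl(\Bigl\lfloor\frac{k}{p^j} + \frac{n-k}{p^j}\Bigr\rfloor - \Bigl\lfloor\frac{k}{p^j}\Bigr\rfloor - \Bigl\lfloor\frac{n-k}{p^j}\Bigr\rfloor\Bigr) \le \sum_{\substack{j\ge1\\ p^j\le n}} 1 = \max\{j : p^j\le n\} = e_p(n).$$
Note that if 1 < 𝑘 < 𝑛 − 1, then $\binom{n}{k} > n$. Suppose now that 1 < 𝑘 < 𝑛 − 1, and that $\binom{n}{k} = p^t$ for some 𝑡 ≥ 1. Then $n < p^t$, so $p^{e_p(n)} \le n < p^t$ gives $e_p(n) < t = \operatorname{ord}_p\binom{n}{k}$, contradicting (8.5.10). □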
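The bound (8.5.10) and Theorem 8.5.4 can be spot-checked over a modest range. This sketch is ours; the range 4 ≤ 𝑛 < 60, the list of primes, and the helper names are arbitrary choices, and the result is a check, not a proof. The trial division in `is_prime_power` stays fast because every prime factor of $\binom{n}{k}$ is at most 𝑛.

```python
from math import comb


def ord_p(m, p):
    """p-order of a nonzero integer m."""
    k = 0
    while m % p == 0:
        m //= p
        k += 1
    return k


def e_p(n, p):
    """e_p(n) = max{j : p^j <= n}."""
    j, q = 0, p
    while q <= n:
        j, q = j + 1, q * p
    return j


def is_prime_power(m):
    """True if m = p^t for some prime p and some t >= 1."""
    if m < 2:
        return False
    p = 2
    while p * p <= m:
        if m % p == 0:          # p is the smallest prime factor of m
            while m % p == 0:
                m //= p
            return m == 1
        p += 1
    return True                 # m itself is prime


primes = [2, 3, 5, 7, 11, 13]
bound_ok = all(ord_p(comb(n, k), p) <= e_p(n, p)
               for n in range(4, 60) for k in range(2, n - 1) for p in primes)
never_prime_power = not any(is_prime_power(comb(n, k))
                            for n in range(4, 60) for k in range(2, n - 1))
print(bound_ok, never_prime_power)   # True True
```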
8.6. Lucas’s congruence for binomial coefficients

Recall (Theorem 3.1.3) that if 𝑝 is prime, then
$$\binom{p}{k} \equiv 0 \pmod p \quad\text{if } 1 \le k \le p-1. \tag{8.6.1}$$
The following theorem generalizes this congruence.

Theorem 8.6.1 (Lucas’s congruence). Suppose that 𝑝 is prime, that $n = n_0 + n_1p + \cdots + n_rp^r$, and that $k = k_0 + k_1p + \cdots + k_rp^r$, where $0 \le n_j, k_j < p$ for 𝑗 = 0, . . . , 𝑟. Then
$$\binom{n}{k} \equiv \binom{n_0}{k_0}\binom{n_1}{k_1}\cdots\binom{n_r}{k_r} \pmod p. \tag{8.6.2}$$
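Lucas’s congruence turns the computation of $\binom{n}{k} \bmod p$ into a product of small binomial coefficients, one per base-𝑝 digit. A minimal sketch (ours; `lucas_binomial_mod` is a hypothetical name), relying on the fact that Python's `math.comb(n, k)` returns 0 when k > n, which handles the case $k_j > n_j$:

```python
from math import comb


def lucas_binomial_mod(n, k, p):
    """C(n, k) mod p via (8.6.2): multiply C(n_j, k_j) over the base-p digits n_j, k_j."""
    result = 1
    while n or k:
        n_j, k_j = n % p, k % p
        result = result * comb(n_j, k_j) % p   # comb(n_j, k_j) == 0 when k_j > n_j
        n //= p
        k //= p
    return result


print(lucas_binomial_mod(10, 3, 3), comb(10, 3) % 3)   # 0 0
```

Only digits are ever passed to `comb`, so the sketch never forms the large integer $\binom{n}{k}$ itself.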
Proof. If $p(x) = \sum_{i=0}^{n} a_ix^i$ and $q(x) = \sum_{i=0}^{n} b_ix^i$ are polynomials over ℤ and 𝑝 is prime, we say that 𝑝(𝑥) and 𝑞(𝑥) are congruent (mod 𝑝), symbolized 𝑝(𝑥) ≡ 𝑞(𝑥) (mod 𝑝), if $a_i \equiv b_i \pmod p$ for 𝑖 = 0, . . . , 𝑛. It is easy to verify that congruence mod 𝑝 is an equivalence relation on ℤ[𝑥], and that if $p_j(x) \equiv q_j(x) \pmod p$ for 𝑗 = 1, . . . , 𝑟, then $\sum_{j=1}^{r} p_j(x) \equiv \sum_{j=1}^{r} q_j(x) \pmod p$ and $\prod_{j=1}^{r} p_j(x) \equiv \prod_{j=1}^{r} q_j(x) \pmod p$. By Theorem 3.1.3, it follows that $(1+x)^p \equiv 1 + x^p \pmod p$, and by induction that
$$(1+x)^{p^j} \equiv 1 + x^{p^j} \pmod p \quad\Bigl(\text{whence } \binom{p^j}{k} \equiv 0 \pmod p \text{ if } 1 \le k \le p^j - 1\Bigr). \tag{8.6.3}$$
If 𝑘 > 𝑛, then $k_j > n_j$ for some 𝑗, and so (8.6.2) holds in the form 0 ≡ 0 (mod 𝑝). So suppose that 0 ≤ 𝑘 ≤ 𝑛. Using uniqueness of the 𝑝-adic expansions of 𝑛 and 𝑘, we have
$$\sum_{k=0}^{n}\binom{n}{k}x^k = (1+x)^n = (1+x)^{n_0 + n_1p + \cdots + n_rp^r} \tag{8.6.4}$$
= ∑ 0≤𝑘𝑗 … 0. Establish this equivalence as a corollary of the Möbius inversion principle.

8.3. Let circword(𝑛, 𝑘; $n_1, \ldots, n_k$) denote the number of circular words of length 𝑛 in the alphabet [𝑘] in which the letter 𝑖 appears exactly $n_i$ times, and let 𝑑 ∶= gcd{$n_1, \ldots, n_k$}. Prove that
$$\operatorname{circword}(n,k;n_1,\ldots,n_k) = \frac{1}{n}\sum_{j\mid d}\frac{d}{j}\sum_{r\mid j}\mu(j/r)\binom{rn/d}{rn_1/d,\ \ldots,\ rn_k/d}.$$
8.4. Simplify the preceding double sum by showing that
$$\operatorname{circword}(n,k;n_1,\ldots,n_k) = \frac{1}{n}\sum_{r\mid d}\phi(r)\binom{n/r}{n_1/r,\ \ldots,\ n_k/r}.$$
8.5. Prove that $S_r(n) = \sum_{j=0}^{r}\sigma(r,j)\binom{n+1}{j+1}$.

8.6. In how many zeros does the base 10 representation of 1000! terminate?

8.7. Prove Corollary 8.5.2.

8.8. Generalize formula (8.3.5) by showing that
$$\sum_{n\ge0}\sum_{k\ge0} p(n,k)\,x^ny^k = \prod_{i\ge1}\frac{1}{1-x^iy}.$$
8.9. Express the generating function $\sum_{n\ge0} p(n, k \text{ or fewer parts})\,x^n$ as a product.

8.10. Express the generating function $\sum_{n\ge0} p(n,k)\,x^n$ as a product.

8.11. Let 𝑝(𝑛, smallest part = 𝑘) denote the number of partitions of 𝑛 with smallest part equal to 𝑘. Express the generating function $\sum_{n\ge0} p(n, \text{smallest part} = k)\,x^n$ as a product.

8.12. Prove the following generalization of Theorem 8.3.7. For all 𝑑 ≥ 2, the number of partitions of 𝑛 with no part divisible by 𝑑 is equal to the number of partitions of 𝑛 in which no part appears 𝑑 or more times.
8.13. Consider the following family of three nested pentagons, with the single dot at the top considered to be a degenerate pentagon:
Let pent(1) = 1, pent(2) = 5, pent(3) = 12, and, in general, let pent(𝑘) denote the number of dots in a nested array of 𝑘 pentagons, where each edge of the 𝑘th pentagon contains 𝑘 points. Prove by induction that, for all 𝑘 ≥ 1, pent(𝑘) = 𝑘(3𝑘 − 1)/2.

8.14. Prove Theorem 8.3.13. Hint: By formulas (8.3.5) and (8.3.9),
$$(1 - x - x^2 + x^5 + x^7 - x^{12} - x^{15} + \cdots)(1 + p(1)x + p(2)x^2 + \cdots) = 1.$$
8.15. We have defined a partition of the integer 𝑛 as a multiset of positive integers with sum equal to 𝑛. Less formally, one can think of such a partition as a way of writing 𝑛 as an unordered sum of positive integers, and thus as an unordered counterpart of a composition of 𝑛, defined in Chapter 1 as a way of writing 𝑛 as an ordered sum of positive integers. Thus, the relation between partitions of 𝑛 and compositions of 𝑛 appears at first glance to be similar to the relation between partitions of a set 𝑆 and ordered partitions of 𝑆. But the simple relation between the number of partitions of [𝑛] with 𝑘 blocks and the number of ordered partitions of [𝑛] with 𝑘 blocks has no counterpart for partitions of 𝑛 with 𝑘 parts and compositions of 𝑛 with 𝑘 parts. Explain why.
Project 8.A

Suppose that 𝐹 ∶ ℕ → ℂ and $\overline{F}$ ∶ ℙ → ℂ. Recall that $\overline{F}$ is said to extend 𝐹 if, for all 𝑛 ∈ ℕ and for every product $p_1\cdots p_n$ of distinct primes, it is the case that $F(n) = \overline{F}(p_1\cdots p_n)$. In particular, $F(0) = \overline{F}(1)$. In Exercise 3.22 you were asked to determine 𝑚(𝑛, 𝑘), the number of minimal ordered 𝑘-covers of [𝑛]. For each fixed 𝑘 ∈ ℙ and each 𝑛 ∈ ℙ, let

(a) $\tilde m(n,k) := |\{(d_1,\ldots,d_k) : \operatorname{lcm}(d_i) = n \text{ and, for all } j\in[k],\ \operatorname{lcm}(d_i : i\ne j) < n\}|$.

(b) $\overline{m}(n,k) := |\{(d_1,\ldots,d_k) : d_i \mid n,\ n \mid \prod d_i, \text{ and, for all } j\in[k],\ n \text{ does not divide } \prod_{i\ne j} d_i\}|$.

(c) $\hat m(n,k) := \sum_{j=0}^{k}(-1)^j\binom{k}{j}\zeta^{2^k-j-1}(n)$.

Prove that $\tilde m(n,k)$, $\overline{m}(n,k)$, and $\hat m(n,k)$ each extend 𝑚(𝑛, 𝑘), and find formulas for each of these extensions when $n = p_1^{n_1}p_2^{n_2}\cdots p_s^{n_s}$, where $p_1, p_2, \ldots, p_s$ are distinct primes.
Chapter 9
Differences and sums
This chapter begins with a brief account of the calculus of finite differences, a discrete analogue of differential and integral calculus. In particular, we shall see that an easily established discrete analogue of the fundamental theorem of calculus enables us to evaluate certain finite sums, just as the classical fundamental theorem enables the evaluation of certain definite integrals. We then consider another way to evaluate finite sums, namely, by evaluating their generating functions, a surprisingly simple and effective technique that Wilf (1990) has dubbed the snake oil method. We conclude with a study of difference equations, which play the same role in the calculus of finite differences as do differential equations in classical calculus. While the subject has many applications in fields far removed from combinatorics, our chief interest will be in applying the methods for solving such equations to the problem of deriving closed forms from recursive formulas.
9.1. Finite difference operators

Let 𝑆 ⊆ ℝ be such that 𝑥 ∈ 𝑆 ⇒ 𝑥 + 1 ∈ 𝑆 (typically, 𝑆 = ℕ, ℤ, or ℝ). Define the (forward) difference operator Δ for all 𝑓 ∶ 𝑆 → ℝ and all 𝑥 ∈ 𝑆 by
$$\Delta f(x) := f(x+1) - f(x). \tag{9.1.1}$$
Theorem 9.1.1. Δ is a linear operator on the ℝ-vector space $\mathbb{R}^S$, with the following properties.
(i) Δ𝑐 = 0 for every periodic constant 𝑐 (i.e., for every 𝑐 ∶ 𝑆 → ℝ such that 𝑐(𝑥) = 𝑐(𝑥 + 1) for all 𝑥 ∈ 𝑆).
(ii) $\Delta x^n = \sum_{k=0}^{n-1}\binom{n}{k}x^k$, and so Δ is degree-reducing on polynomial functions.
(iii) $\Delta x^{\underline{n}} = n\,x^{\underline{n-1}}$, whence $\Delta\binom{x}{n} = \binom{x}{n-1}$ for all 𝑛 > 0.
(iv) $\Delta 2^x = 2^x$.
(v) $\Delta f(x)g(x) = f(x)\Delta g(x) + g(x+1)\Delta f(x) = f(x+1)\Delta g(x) + g(x)\Delta f(x)$.
(vi) $\Delta\dfrac{f(x)}{g(x)} = \dfrac{g(x)\Delta f(x) - f(x)\Delta g(x)}{g(x)g(x+1)}$.

Proof. Exercise. □
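The rules of Theorem 9.1.1 can be spot-checked by treating Δ as a higher-order function. In this sketch (ours), `delta` and `falling` are hypothetical helpers, the sample functions 𝑓 and 𝑔 are arbitrary, and `Fraction` keeps the quotient rule exact even at negative integers.

```python
from fractions import Fraction
from math import comb


def delta(f):
    """Forward difference operator (9.1.1): (delta f)(x) = f(x + 1) - f(x)."""
    return lambda x: f(x + 1) - f(x)


def falling(x, n):
    """Falling factorial x(x - 1)...(x - n + 1)."""
    result = 1
    for i in range(n):
        result *= x - i
    return result


def f(x):
    """Sample function (arbitrary choice)."""
    return x ** 3 - 2 * x


def g(x):
    """Sample function that never vanishes, so f/g is defined everywhere."""
    return Fraction(3) ** x + 1


xs = range(-4, 8)
# (ii): delta x^5 = sum over k < 5 of C(5, k) x^k
print(all(delta(lambda x: x ** 5)(x) == sum(comb(5, k) * x ** k for k in range(5)) for x in xs))  # True
# (iii): delta of a falling factorial
print(all(delta(lambda x: falling(x, 4))(x) == 4 * falling(x, 3) for x in xs))  # True
# (v): product rule, first form
fg = delta(lambda x: f(x) * g(x))
print(all(fg(x) == f(x) * delta(g)(x) + g(x + 1) * delta(f)(x) for x in xs))  # True
# (vi): quotient rule
quot = delta(lambda x: f(x) / g(x))
print(all(quot(x) == (g(x) * delta(f)(x) - f(x) * delta(g)(x)) / (g(x) * g(x + 1)) for x in xs))  # True
```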
Higher order finite differences are defined inductively by (𝑖) Δ0 𝑓 = 𝑓 and (𝑖𝑖) Δ 𝑓 = Δ(Δ𝑛 𝑓) for all 𝑛 ≥ 0. Like Δ, each of the higher order differences Δ𝑛 is a linear operator on the ℝ- vector space ℝ𝑆 . Since Δ is degree-reducing on polynomial functions, it follows immediately that Δ𝑛 𝑝(𝑥) = 0 if deg 𝑝(𝑥) < 𝑛. The following theorem elaborates Theorem 9.1.1(𝑖𝑖𝑖). 𝑛+1
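Theorem 9.1.2. If $0 \le j \le n$,
$$\Delta^j x^{\underline{n}} = n^{\underline{j}}\, x^{\underline{n-j}} \quad\text{and}\quad \Delta^j \binom{x}{n} = \binom{x}{n-j}. \tag{9.1.2}$$
In particular, $\Delta^n x^{\underline{n}} = n!$ and $\Delta^n \binom{x}{n} = 1$.

Proof. Exercise. □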
As shown in the following theorem, the value $\Delta^k f(x)$ is determined by the values of $f$ at the $k+1$ distinct points $x, x+1, \dots, x+k$.

Theorem 9.1.3. For all $k \in \mathbb{N}$,
$$\Delta^k f(x) = \sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} f(x+j). \tag{9.1.3}$$
Proof. Define the shift operator $E$ on $\mathbb{R}^S$ by $Ef(x) = f(x+1)$, and the identity operator $I$ by $If(x) = f(x)$, with powers of these operators defined inductively by $E^0 f = I^0 f = f$, $E^{n+1} f = E(E^n f)$, and $I^{n+1} f = I(I^n f)$ for all $n \ge 0$. Clearly, $I^j f(x) = f(x)$ and $E^j f(x) = f(x+j)$ for all $j \ge 0$. Since the linear operators $E$ and $I$ commute in the ring of linear operators on the $\mathbb{R}$-vector space $\mathbb{R}^S$, and $\Delta = E - I$, it follows from Theorem 3.2.5 that $\Delta^k = (E - I)^k = \sum_{j=0}^{k} (-1)^{k-j}\binom{k}{j} E^j$, which is simply the operator form of formula (9.1.3). □

Remark 9.1.4. Alternating sums involving binomial coefficients can sometimes be rapidly evaluated by observing that they are higher differences of some function $f(x)$ evaluated at a particular value of $x$. For example, the orthogonality relation
$$\sum_{k=0}^{n} (-1)^{n-k} \binom{n}{k}\binom{k}{j} = \delta_{n,j}, \qquad 0 \le j \le n, \tag{9.1.4}$$
which was proved earlier as part of Theorem 3.3.2, follows immediately from the observation that, by (9.1.3), the above sum is equal to $\Delta^n \binom{x}{j}\big|_{x=0}$, along with the previously noted facts that $\Delta^n \binom{x}{j} = 0$ if $j < n$, and $\Delta^n \binom{x}{n} = 1$.
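Formula (9.1.3) and the orthogonality relation (9.1.4) can likewise be checked on small cases. In this sketch (helper names are illustrative), `delta_k_iterated` builds $\Delta^k f(x)$ by repeated differencing, `delta_k_formula` uses the alternating sum in (9.1.3), and `orthogonality_sum` evaluates the left side of (9.1.4).

```python
from math import comb

def delta_k_iterated(f, k, x):
    """Δ^k f(x) computed by k rounds of forward differencing of f(x), ..., f(x + k)."""
    values = [f(x + j) for j in range(k + 1)]
    for _ in range(k):
        values = [b - a for a, b in zip(values, values[1:])]
    return values[0]

def delta_k_formula(f, k, x):
    """Δ^k f(x) computed directly from formula (9.1.3)."""
    return sum((-1) ** (k - j) * comb(k, j) * f(x + j) for j in range(k + 1))

def orthogonality_sum(n, j):
    """Left side of (9.1.4), i.e. Δ^n C(x, j) evaluated at x = 0."""
    return sum((-1) ** (n - k) * comb(n, k) * comb(k, j) for k in range(n + 1))

sample = lambda t: t ** 3 - 7 * t + 2 ** t   # arbitrary test function
assert all(delta_k_iterated(sample, k, x) == delta_k_formula(sample, k, x)
           for k in range(7) for x in range(6))
assert all(orthogonality_sum(n, j) == (1 if j == n else 0)
           for n in range(12) for j in range(n + 1))
```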
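9.2. Polynomial interpolation

A polynomial $p(x)$ of degree $n$ is fully determined by its values at $n+1$ distinct values of $x$. When the values $p(0), p(1), \dots, p(n)$ are known, there is a particularly nice formula for $p(x)$, which furnishes a finite difference analogue of Taylor's theorem for polynomials.

Theorem 9.2.1. If $\deg p(x) = n$, then
$$p(x) = \sum_{k=0}^{n} \Delta^k p(0) \binom{x}{k}, \quad\text{where } \Delta^k p(0) := \Delta^k p(x)\big|_{x=0}. \tag{9.2.1}$$

Proof. Since the polynomials $\binom{x}{0}, \binom{x}{1}, \dots, \binom{x}{n}$ form a basis for the space of polynomials of degree at most $n$, we may write $p(x) = \sum_{j=0}^{n} a_j \binom{x}{j}$. By (9.1.2), and since $\Delta^k \binom{x}{j} = 0$ when $j < k$,
$$\Delta^k p(x) = \sum_{j=0}^{n} a_j\, \Delta^k \binom{x}{j} = \sum_{j=k}^{n} a_j \binom{x}{j-k},$$
and so $\Delta^k p(0) = a_k$. □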
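The proof of Theorem 9.2.1 suggests a simple procedure, sketched below in Python under the assumption that the values $p(0), \dots, p(n)$ are given as integers: take the top row of the difference table to obtain the coefficients $\Delta^k p(0)$, then evaluate $\sum_k \Delta^k p(0)\binom{x}{k}$ with exact rational arithmetic. The function names are illustrative; the sample data are those of Tables 9.1, 9.2, and 9.3.

```python
from fractions import Fraction
from math import factorial

def leading_differences(values):
    """Top row of a difference table: [Δ^0 p(0), Δ^1 p(0), ...] from p(0), ..., p(n)."""
    row, result = list(values), []
    while row:
        result.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return result

def binom_poly(x, k):
    """The binomial polynomial C(x, k) = x(x-1)...(x-k+1)/k!, exact for rational x."""
    prod = Fraction(1)
    for i in range(k):
        prod *= Fraction(x) - i
    return prod / factorial(k)

def newton_interpolate(values, x):
    """Evaluate the right side of (9.2.1) built from the values p(0), ..., p(n)."""
    return sum(d * binom_poly(x, k) for k, d in enumerate(leading_differences(values)))

print(leading_differences([10, 3, -17, 0, 100]))   # [10, -7, -13, 50, -4]  (Table 9.1)
print(leading_differences([0, 1, 9, 36, 100]))     # [0, 1, 7, 12, 6]       (Table 9.2)
```

Because `binom_poly` accepts rational arguments, the reconstructed polynomial can also be evaluated off the integers; for instance, `newton_interpolate([0, 1, 5, 14], Fraction(1, 2))` returns `Fraction(1, 4)`, the value of $n(n+1)(2n+1)/6$ at $n = 1/2$.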
Example 9.2.2. When implementing the interpolation formula (9.2.1), one can rapidly calculate the quantities $\Delta^k p(0)$ by means of a difference table, illustrated in Table 9.1 for the case in which one wishes to construct the unique polynomial $p(x)$ of degree $\le 4$ satisfying $p(0) = 10$, $p(1) = 3$, $p(2) = -17$, $p(3) = 0$, and $p(4) = 100$.

Table 9.1
$$
\begin{array}{c|rrrrr}
 & p(j) & \Delta p(j) & \Delta^2 p(j) & \Delta^3 p(j) & \Delta^4 p(j) \\ \hline
j=0 & 10 & -7 & -13 & 50 & -4 \\
j=1 & 3 & -20 & 37 & 46 & \\
j=2 & -17 & 17 & 83 & & \\
j=3 & 0 & 100 & & & \\
j=4 & 100 & & & &
\end{array}
$$
Reading off the values of $\Delta^k p(0)$ for $k = 0, \dots, 4$ from the first row of Table 9.1 yields the result
$$p(x) = 10\binom{x}{0} - 7\binom{x}{1} - 13\binom{x}{2} + 50\binom{x}{3} - 4\binom{x}{4}.$$
In the above example the values of $p(0), \dots, p(4)$ were, so to speak, externally prescribed, and our aim was simply to produce the unique polynomial of degree $\le 4$ taking those values. In a variant on this problem, suppose we have already proved that some function of combinatorial or number-theoretic interest must be a polynomial in $n$ of degree $d$. To identify that polynomial, we evaluate, by computation or listing-and-counting, the values of that function at $0, 1, \dots, d$ and apply formula (9.2.1), with $d$ in place of $n$ and $n$ in place of $x$. For example, we proved as a consequence of (8.4.1) that the power sum $S_r(n) = 0^r + 1^r + \cdots + n^r$ must be a polynomial in $n$ of degree $r+1$. Suppose now that we want to find a formula for $S_3(n)$. This formula follows immediately from the computed values $S_3(0) = 0$, $S_3(1) = 1$, $S_3(2) = 9$, $S_3(3) = 36$, and $S_3(4) = 100$, using Table 9.2.
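Table 9.2
$$
\begin{array}{c|rrrrr}
 & S_3(j) & \Delta S_3(j) & \Delta^2 S_3(j) & \Delta^3 S_3(j) & \Delta^4 S_3(j) \\ \hline
j=0 & 0 & 1 & 7 & 12 & 6 \\
j=1 & 1 & 8 & 19 & 18 & \\
j=2 & 9 & 27 & 37 & & \\
j=3 & 36 & 64 & & & \\
j=4 & 100 & & & &
\end{array}
$$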
It follows that $S_3(n) = \binom{n}{1} + 7\binom{n}{2} + 12\binom{n}{3} + 6\binom{n}{4} = n^2(n+1)^2/4$.

Finally, difference tables can be used to explore whether a given function $f(n)$ might be a polynomial in $n$. For example, suppose that we did not already know that the power sum $S_r(n)$ is a polynomial in $n$ of degree $r+1$, and we were seeking a formula for $S_2(n)$. We compute, say, the values $0, 1, 5, 14, 30, 55$, and $91$ of $S_2(n)$ for $n = 0, \dots, 6$, and construct a difference table (Table 9.3) for these numbers.

Table 9.3
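$$
\begin{array}{c|rrrrrrr}
 & S_2(j) & \Delta S_2(j) & \Delta^2 S_2(j) & \Delta^3 S_2(j) & \Delta^4 S_2(j) & \Delta^5 S_2(j) & \Delta^6 S_2(j) \\ \hline
j=0 & 0 & 1 & 3 & 2 & 0 & 0 & 0 \\
j=1 & 1 & 4 & 5 & 2 & 0 & 0 & \\
j=2 & 5 & 9 & 7 & 2 & 0 & & \\
j=3 & 14 & 16 & 9 & 2 & & & \\
j=4 & 30 & 25 & 11 & & & & \\
j=5 & 55 & 36 & & & & & \\
j=6 & 91 & & & & & &
\end{array}
$$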
The fact that $\Delta^4 S_2(0) = \Delta^5 S_2(0) = \Delta^6 S_2(0) = 0$ suggests (though it does not prove) that $S_2(n)$ might be a polynomial in $n$ of degree 3, leading to the conjecture that $S_2(n) = \binom{n}{1} + 3\binom{n}{2} + 2\binom{n}{3} = n(n+1)(2n+1)/6$. It would then remain to prove this conjecture, say, by induction on $n$.

Remark 9.2.3 (An alternative evaluation of $\sigma(n,k)$). Theorem 9.2.1 furnishes yet another way of deriving the formula for $\sigma(n,k)$, the number of surjective functions $f : [n] \to [k]$. By Theorem 6.3.1, $x^n = \sum_{k=0}^{n} \sigma(n,k)\binom{x}{k}$, and so by (9.2.1) and Theorem 9.1.3,
$$\sigma(n,k) = \Delta^k x^n\big|_{x=0} = \sum_{j=0}^{k} (-1)^{k-j}\binom{k}{j} j^n.$$
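Remark 9.2.3 is easy to test against a brute-force count. The sketch below (function names are illustrative) compares the alternating sum with a direct enumeration of all functions $[n] \to [k]$, represented as tuples of values in `range(k)`; the enumeration is exponential and intended only for small $n$ and $k$.

```python
from itertools import product
from math import comb

def surjections_formula(n, k):
    """σ(n, k) = Δ^k x^n at x = 0, expanded with (9.1.3)."""
    return sum((-1) ** (k - j) * comb(k, j) * j ** n for j in range(k + 1))

def surjections_brute(n, k):
    """Count functions [n] -> [k] (as tuples of values) whose image is all of [k]."""
    return sum(1 for f in product(range(k), repeat=n) if len(set(f)) == k)

print([surjections_formula(4, k) for k in range(5)])   # [0, 1, 14, 36, 24]
```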
9.3. The fundamental theorem of the finite difference calculus

If $\Delta F = f$, $F$ is called an antidifference of $f$, also symbolized as $\Delta^{-1} f = F$. If $F$ is an antidifference of $f$, then so is $F + c$, where $c$ is any periodic constant. The following theorem is a discrete analogue of the fundamental theorem of calculus. Unlike the proof of the latter, however, its proof is an utter triviality.
Theorem 9.3.1 (Fundamental theorem of the finite difference calculus). If $a, b \in \mathbb{Z}$, with $a \le b$, and $F$ is an antidifference of $f$, then
$$\sum_{j=a}^{b} f(j) = F(b+1) - F(a). \tag{9.3.1}$$

Proof. By assumption, $\sum_{j=a}^{b} f(j) = \sum_{j=a}^{b} \Delta F(j) = \sum_{j=a}^{b} \big(F(j+1) - F(j)\big)$, which telescopes to $F(b+1) - F(a)$. □

In order to apply the above theorem, it is obviously useful to have a table of antidifferences. Table 9.4 lists a few of the most basic, each of which can be verified in the obvious way. For more extensive tables, see Spiegel (1971).

Table 9.4. A short table of antidifferences
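$$
\begin{array}{c|l|l}
 & f(x) & \Delta^{-1} f(x) \\ \hline
1. & 0 & c \\
2. & 2^x & 2^x + c \\
3. & x^{\underline{n}},\ n \ge 0 & \frac{1}{n+1}\,x^{\underline{n+1}} + c \\
4. & \binom{x}{n},\ n \ge 0 & \binom{x}{n+1} + c \\
5. & x^n,\ n \ge 0 & \sum_{k=0}^{n} \sigma(n,k)\binom{x}{k+1} + c
\end{array}
$$
In each row, $c$ denotes an arbitrary periodic constant.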
In applications to the evaluation of finite sums, the variable $x$ in Table 9.4 is often replaced by a variable (such as $i$ or $j$) ranging over integers, as in Example 9.3.2.

Example 9.3.2. Here is a derivation using Theorem 9.3.1 of the binomial coefficient identity (3.1.6), proved earlier by a simple combinatorial argument. Since $F(j) = \binom{j}{k+1}$ is an antidifference of $f(j) = \binom{j}{k}$,
$$\sum_{j=0}^{n} \binom{j}{k} = F(n+1) - F(0) = \binom{n+1}{k+1} - \binom{0}{k+1} = \binom{n+1}{k+1}.$$
Additional applications of Theorem 9.3.1 appear in the chapter exercises.
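Theorem 9.3.1 turns a sum into two evaluations of an antidifference. The Python sketch below (helper names are illustrative) checks Example 9.3.2 and row 3 of Table 9.4 on small ranges; the integer division used for row 3 is exact because $j^{\underline{4}}$ is always divisible by $4! = 24$.

```python
from math import comb

def falling(x, n):
    """Falling factorial x(x - 1)...(x - n + 1)."""
    result = 1
    for i in range(n):
        result *= x - i
    return result

def sum_by_antidifference(F, a, b):
    """Right side of (9.3.1): sum_{j=a}^{b} f(j) = F(b + 1) - F(a), where ΔF = f."""
    return F(b + 1) - F(a)

# Example 9.3.2 (row 4 of Table 9.4): sum_{j=0}^{n} C(j, k) = C(n + 1, k + 1)
for n in range(15):
    for k in range(6):
        direct = sum(comb(j, k) for j in range(n + 1))
        assert direct == sum_by_antidifference(lambda j: comb(j, k + 1), 0, n) == comb(n + 1, k + 1)

# Row 3 of Table 9.4: sum_{j=2}^{9} j(j-1)(j-2), using F(j) = j^(4 falling) / 4
direct = sum(falling(j, 3) for j in range(2, 10))
via_table = sum_by_antidifference(lambda j: falling(j, 4) // 4, 2, 9)
print(direct, via_table)   # 1260 1260
```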
9.4. The snake oil method

In what follows we show how a sum $s(n) = \sum_{k=0}^{n} f(k)$ can sometimes be evaluated by identifying the generating function $\sum_{n\ge0} s(n)x^n$. In order to make use of this technique, which Wilf (1990) has dubbed the snake oil method, we need to have a stock of basic generating functions for reference. The following list, which includes several previously noted results for the sake of completeness, should suffice for our purposes. Although the series below can be construed as formal power series (though only for $\alpha \in \mathbb{Q}$ in the case of (9.4.1)), we have indicated their intervals of convergence (except in (9.4.1), where that interval depends on $\alpha$, but always includes $(-1, 1)$) for interested readers.
$$\sum_{n\ge0} \binom{\alpha}{n} x^n = (1+x)^\alpha, \quad\text{for all } \alpha \in \mathbb{R},\ |x| < 1. \tag{9.4.1}$$
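$$\sum_{n\ge0} \binom{n}{k} x^n = \frac{x^k}{(1-x)^{k+1}}, \quad\text{for all } k \in \mathbb{N},\ |x| < 1. \tag{9.4.2}$$
$$\sum_{n\ge0} \binom{2n}{n} x^n = \frac{1}{\sqrt{1-4x}}, \quad -1/4 \le x < 1/4. \tag{9.4.3}$$
$$\sum_{n\ge0} F_n x^n = \frac{1}{1-x-x^2}, \quad |x| < (\sqrt{5}-1)/2. \tag{9.4.4}$$
$$\sum_{n\ge1} \frac{x^n}{n} = \ln\Big(\frac{1}{1-x}\Big) = -\ln(1-x), \quad -1 \le x < 1. \tag{9.4.5}$$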
In what follows we apply the snake oil method to the evaluation of several familiar sums with easy combinatorial proofs, just to illustrate the sort of techniques involved. As will be seen in the examples below, the great virtue of the snake oil method is that, when successful, it enables one both to derive and to verify the value of a given sum simultaneously. Literally all applications of this method involve interchanging infinite multiple sums. Such summation interchange can be justified either analytically or by means of the theory of formal power series. See Appendix A for the former and Chapter 13 for the latter.
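Example 9.4.1. Evaluate $s(n) = \sum_{k=0}^{n} \binom{n}{k}$.

Solution:
$$
\begin{aligned}
\sum_{n\ge0} s(n)x^n &= \sum_{n\ge0}\sum_{k=0}^{n}\binom{n}{k}x^n = \sum_{k\ge0}\sum_{n\ge k}\binom{n}{k}x^n = \sum_{k\ge0}\frac{x^k}{(1-x)^{k+1}} \\
&= \frac{1}{1-x}\sum_{k\ge0}\Big(\frac{x}{1-x}\Big)^k = \frac{1}{1-x}\cdot\frac{1}{1-\frac{x}{1-x}} = \frac{1}{1-2x} = \sum_{n\ge0} 2^n x^n,
\end{aligned}
$$
whence $s(n) = 2^n$.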
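Example 9.4.2. Evaluate $s(n) = \sum_{k=0}^{n} \binom{r}{k}\binom{s}{n-k}$ for fixed $r, s \in \mathbb{N}$.

Solution:
$$
\begin{aligned}
\sum_{n\ge0} s(n)x^n &= \sum_{n\ge0}\sum_{k=0}^{n}\binom{r}{k}\binom{s}{n-k}x^n = \sum_{k\ge0}\binom{r}{k}x^k\sum_{n\ge k}\binom{s}{n-k}x^{n-k} \\
&\overset{(m=n-k)}{=} \sum_{k=0}^{r}\binom{r}{k}x^k\sum_{m=0}^{s}\binom{s}{m}x^m = (1+x)^{r+s} = \sum_{n=0}^{r+s}\binom{r+s}{n}x^n,
\end{aligned}
$$
whence $s(n) = \binom{r+s}{n}$.

Here is a somewhat more substantial application.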
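Example 9.4.3. For all $n \in \mathbb{N}$, evaluate $s(n) = \sum_{k=0}^{n} (-1)^{n-k} k^2$.

Solution:
$$
\begin{aligned}
\sum_{n\ge0} s(n)x^n &= \sum_{n\ge0}\sum_{k=0}^{n}(-1)^{n-k}k^2x^n = \sum_{k\ge0}k^2x^k\sum_{n\ge k}(-1)^{n-k}x^{n-k} \\
&\overset{(j=n-k)}{=} \sum_{k\ge0}k^2x^k\sum_{j\ge0}(-x)^j = \frac{1}{1+x}\sum_{k\ge0}\Big[2\binom{k}{2}+\binom{k}{1}\Big]x^k \\
&= \frac{1}{1+x}\Big\{2\sum_{k\ge0}\binom{k}{2}x^k + \sum_{k\ge0}\binom{k}{1}x^k\Big\} = \frac{1}{1+x}\Big\{\frac{2x^2}{(1-x)^3} + \frac{x}{(1-x)^2}\Big\} \\
&= \frac{x}{(1-x)^3} = \frac{1}{x}\cdot\frac{x^2}{(1-x)^3} = \sum_{m\ge2}\binom{m}{2}x^{m-1} \overset{(n=m-1)}{=} \sum_{n\ge1}\binom{n+1}{2}x^n,
\end{aligned}
$$
whence $s(0) = 0$ and $s(n) = \binom{n+1}{2}$ if $n > 0$.

Remark 9.4.4. The fact that $\binom{n+1}{2} = 1 + 2 + \cdots + n$ suggests a proof without words of the preceding result, which is illustrated in Figure 9.1 for $n = 5$.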
Figure 9.1
The following theorem furnishes a list of useful results for combining generating functions or deriving one generating function from another. As usual, these results can be construed purely formally or justified analytically on certain domains.

Theorem 9.4.5. Suppose that $f(x) = \sum_{n\ge0} a(n)x^n$ and $g(x) = \sum_{n\ge0} b(n)x^n$. Then
$$f(x) + g(x) = \sum_{n\ge0}\{a(n) + b(n)\}x^n. \tag{9.4.6}$$
$$f(x)\cdot g(x) = \sum_{n\ge0}\Big\{\sum_{k=0}^{n} a(k)b(n-k)\Big\}x^n. \tag{9.4.7}$$
$$\frac{1}{1-x}f(x) = \sum_{n\ge0} s(n)x^n, \quad\text{where } s(n) = \sum_{k=0}^{n} a(k). \tag{9.4.8}$$
$$(1-x)f(x) = \sum_{n\ge0} \nabla a(n)x^n, \quad\text{where } \nabla a(0) = a(0) \text{ and } \nabla a(n) = a(n) - a(n-1),\ n > 0. \tag{9.4.9}$$
$$D_x f(x) = \sum_{n\ge0}(n+1)a(n+1)x^n. \tag{9.4.10}$$
Furthermore, if $\varphi(x) = \sum_{n\ge0} a(n)\dfrac{x^n}{n!}$, then
$$D_x\varphi(x) = \sum_{n\ge0} a(n+1)\frac{x^n}{n!}. \tag{9.4.11}$$

Proof. Exercise. □
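Truncated coefficient lists give a quick way to experiment with the snake oil method. In this sketch (function names and the truncation order `N` are illustrative assumptions), each operation of Theorem 9.4.5 acts on the first few coefficients of a series; since the $n$th coefficient of a product depends only on coefficients of index at most $n$, truncation introduces no error in the entries that are kept. The example reproduces Example 9.4.3.

```python
from math import comb

def multiply(a, b):
    """(9.4.7): Cauchy product of two coefficient lists, truncated to len(a) terms."""
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(len(a))]

def partial_sums(a):
    """(9.4.8): coefficients of f(x)/(1 - x)."""
    out, running = [], 0
    for c in a:
        running += c
        out.append(running)
    return out

def backward_difference(a):
    """(9.4.9): coefficients of (1 - x) f(x)."""
    return [a[0]] + [a[n] - a[n - 1] for n in range(1, len(a))]

def derivative(a):
    """(9.4.10): coefficients of D_x f(x)."""
    return [(n + 1) * a[n + 1] for n in range(len(a) - 1)]

N = 12  # number of coefficients kept; an arbitrary truncation order
# Example 9.4.3: s(n) = sum_k (-1)^(n-k) k^2 is the coefficient sequence of
# (sum_k k^2 x^k) * 1/(1 + x), and 1/(1 + x) has coefficients (-1)^n.
s = multiply([k * k for k in range(N)], [(-1) ** n for n in range(N)])
print(s == [comb(n + 1, 2) for n in range(N)])   # True
```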
9.5. *The harmonic numbers

In this section we apply results from the previous section to study the numbers $H_n$, defined for all $n \ge 1$ by $H_n = \sum_{k=1}^{n} \frac{1}{k}$. $H_n$ is called the $n$th harmonic number, the term originating in musical theory, where the $n$th harmonic produced by a violin string is the fundamental tone produced by a string that is $1/n$ times the length of that string. The sequence of harmonic numbers serves as a discrete analogue of the natural logarithm, since $\nabla H_n = 1/n$. These numbers occur often in the analysis of algorithms (e.g., quicksort; see Graham et al. (1994, 28–29 and 272 ff.)), and also (as we show below) as the mean number of cycles in a randomly chosen permutation of $[n]$.

As is well known, $\lim_{n\to\infty} H_n = \infty$. Some sense of the magnitude of these numbers can be gotten from the (crude) bounds
$$\ln(n+1) < H_n \le 1 + \ln n, \qquad n \ge 1, \tag{9.5.1}$$
with equality on the right only when $n = 1$. These bounds are easily established by examining step-function approximations to the integrals defining $\ln(n+1)$ and $\ln n$.

As remarked above, the harmonic numbers have an interesting connection with permutations.

Theorem 9.5.1. If $\mu$ denotes the mean number of cycles in a randomly chosen permutation of $[n]$, then $\mu = H_n$.
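Proof. By Theorem 6.6.4,
$$\mu = \sum_{k=0}^{n} k\cdot\frac{c(n,k)}{n!} = \frac{1}{n!}\,D_x x^{\overline{n}}\Big|_{x=1} = \sum_{k=1}^{n}\frac{1}{k},$$
the last equality holding because $D_x x^{\overline{n}} = x^{\overline{n}}\sum_{i=0}^{n-1}\frac{1}{x+i}$ and $1^{\overline{n}} = n!$. □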
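Theorem 9.5.1 can be confirmed for small $n$ by averaging over all $n!$ permutations. In this sketch (names are illustrative), a permutation of $\{0, \dots, n-1\}$ is a tuple whose $i$th entry is the image of $i$, and exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import permutations

def cycle_count(perm):
    """Number of cycles of a permutation of {0, ..., n-1}; perm[i] is the image of i."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            i = start
            while i not in seen:      # walk around the cycle containing start
                seen.add(i)
                i = perm[i]
    return cycles

def harmonic(n):
    """H_n as an exact fraction."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

def mean_cycle_count(n):
    """Average number of cycles over all n! permutations, by exhaustive enumeration."""
    perms = list(permutations(range(n)))
    return Fraction(sum(cycle_count(p) for p in perms), len(perms))

print(mean_cycle_count(4), harmonic(4))   # 25/12 25/12
```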
The proof of the following theorem exhibits the use of formulas (9.4.8) and (9.4.10) in evaluating the partial sum
$$S_n := \sum_{k=1}^{n} H_k. \tag{9.5.2}$$
Theorem 9.5.2. $S_n = (n+1)[H_{n+1} - 1]$.

Proof. By formulas (9.4.5) and (9.4.8),
$$H(x) := \sum_{n\ge1} H_n x^n = \frac{-\ln(1-x)}{1-x}, \tag{9.5.3}$$
and so, again by formula (9.4.8),
$$S(x) := \sum_{n\ge1} S_n x^n = \frac{-\ln(1-x)}{(1-x)^2}. \tag{9.5.4}$$
Differentiating (9.5.3) and comparing with (9.5.4) gives $H'(x) = (1-x)^{-2} + S(x)$, and so, by (9.4.10),
$$
\begin{aligned}
S(x) = H'(x) - (1-x)^{-2} &= \sum_{n\ge0}(n+1)H_{n+1}x^n - D_x(1-x)^{-1} \\
&= \sum_{n\ge0}(n+1)H_{n+1}x^n - D_x\sum_{n\ge0}x^n = \sum_{n\ge0}(n+1)H_{n+1}x^n - \sum_{n\ge0}(n+1)x^n \\
&= \sum_{n\ge0}(n+1)[H_{n+1} - 1]x^n.
\end{aligned}
$$
□
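Theorem 9.5.2 and the bounds (9.5.1) can be tested with exact fractions, as in the following sketch (names are illustrative). Since $H_1 = 1 = 1 + \ln 1$, the upper bound is attained at $n = 1$, so the code tests it with `<=`; for $n \ge 2$ it is strict. The floating-point comparison is harmless here because, for these $n$, the gaps between the three quantities are far larger than any rounding error.

```python
from fractions import Fraction
from math import log

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

def partial_harmonic_sum(n):
    """S_n = H_1 + ... + H_n, the partial sum (9.5.2)."""
    return sum((harmonic(k) for k in range(1, n + 1)), Fraction(0))

for n in range(1, 40):
    assert partial_harmonic_sum(n) == (n + 1) * (harmonic(n + 1) - 1)   # Theorem 9.5.2
    # bounds (9.5.1); at n = 1 the upper bound is attained, so test it with <=
    assert log(n + 1) < float(harmonic(n)) <= 1 + log(n)
```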
Remark 9.5.3. Once Theorem 9.5.2 is known or conjectured, it is of course easily proved by induction. The virtue of the method of generating functions is that one discovers and proves this theorem simultaneously.

Remark 9.5.4. It is natural to wonder whether $H_n$ is ever an integer when $n \ge 2$. That this is never the case is a special case of a theorem of Erdős, which asserts that if $k_1 < k_2 < \cdots < k_n$, with $n \ge 2$, is any arithmetic progression of positive integers, then $\sum_{i=1}^{n} 1/k_i$ is not an integer.
9.6. Linear homogeneous difference equations with constant coefficients

Suppose that $d \in \mathbb{P}$, $c_0, c_1, \dots, c_d \in \mathbb{C}$, and $v_0, v_1, \dots, v_{d-1} \in \mathbb{C}$, with $c_0c_d \ne 0$. The equations
$$f(0) = v_0,\ f(1) = v_1,\ \dots,\ f(d-1) = v_{d-1} \tag{9.6.1}$$
and
$$c_d f(n+d) + c_{d-1}f(n+d-1) + \cdots + c_1 f(n+1) + c_0 f(n) = 0, \quad\text{for all } n \ge 0, \tag{9.6.2}$$
for an unknown function $f : \mathbb{N} \to \mathbb{C}$ constitute a linear homogeneous difference equation of order $d$ with constant coefficients (9.6.2) and initial conditions (9.6.1). Note that the above equations really amount to a certain simple kind of recursive formula for $f$, since (9.6.2) may be written in the form
$$f(n+d) = \frac{-1}{c_d}\sum_{k=0}^{d-1} c_k f(n+k), \quad\text{for all } n \ge 0. \tag{9.6.3}$$
Many recurrence relations arising in enumerative combinatorics have this form, and so it is of interest to have general methods for solving (9.6.2) subject to the initial conditions (9.6.1), the solution being a closed form expression for $f(n)$. As we shall see, these methods are analogous to those for solving linear homogeneous differential equations with constant coefficients. Moreover, just as the solutions to such differential equations are linear combinations of certain exponential functions, the solutions to linear homogeneous difference equations with constant coefficients are linear combinations of certain geometric progressions (the discrete analogue of exponential functions). It is customary to write (9.6.2) in operator form as
$$(c_dE^d + c_{d-1}E^{d-1} + \cdots + c_1E + c_0I)(f) = \mathbf{0}, \quad\text{where } \mathbf{0}(n) \equiv 0, \tag{9.6.4}$$
with $E$ and $I$ defined as in the proof of Theorem 9.1.3. It may seem odd to refer to (9.6.4) as a difference equation, since it is written in terms of the shift operator $E$. But substituting $\Delta + I$ for $E$ converts (9.6.4) into a genuine difference equation of the form
$$(b_d\Delta^d + b_{d-1}\Delta^{d-1} + \cdots + b_1\Delta + b_0I)(f) = \mathbf{0}. \tag{9.6.5}$$
Equations in the above form arise directly, especially in scientific applications, and, as we shall see, their solutions are closely related to the solutions of difference equations in the form (9.6.4). Unlike the case of differential equations, where the proofs of existence and uniqueness of solutions to initial value problems are nontrivial, the existence and uniqueness of solutions to (9.6.2), subject to the initial conditions (9.6.1), are immediately obvious. For (9.6.1) determines the values of $f(n)$ for $n = 0, \dots, d-1$, and the recurrence (9.6.3) derived from (9.6.2) determines $f(n)$ for all $n \ge d$.

We now wish to determine solutions to (9.6.2), as expressed in the operator form (9.6.4). Let
$$L = c_dE^d + c_{d-1}E^{d-1} + \cdots + c_1E + c_0I. \tag{9.6.6}$$
$L$ is clearly a linear operator on the $\mathbb{C}$-vector space $\mathbb{C}^{\mathbb{N}}$, and the set of all solutions $f$ to the equation $L(f) = \mathbf{0}$ is simply the null space of $L$, which we denote by $N_L$.

Theorem 9.6.1. $N_L$ is a $d$-dimensional subspace of $\mathbb{C}^{\mathbb{N}}$.

Proof. The map $f \mapsto (f(0), \dots, f(d-1))$ is clearly a linear transformation from $N_L$ to $\mathbb{C}^d$. It is surjective by the existence of a solution to $L(f) = \mathbf{0}$ for each sequence $(v_0, \dots, v_{d-1})$ of initial values, and it is injective by the uniqueness of such a solution, hence an isomorphism. So $\dim N_L = \dim \mathbb{C}^d = d$. □

Perhaps unsurprisingly, solutions to $L(f) = \mathbf{0}$ are closely related to roots of the characteristic polynomial $p_L(x)$ of $L$, defined by
$$p_L(x) = c_dx^d + c_{d-1}x^{d-1} + \cdots + c_1x + c_0. \tag{9.6.7}$$
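Theorem 9.6.2. If $p_L(\alpha) = 0$, where $\alpha \in \mathbb{C}$, then the geometric progression $f(n) = \alpha^n$ is a solution to the homogeneous equation $L(f) = \mathbf{0}$.

Proof. We have $p_L(\alpha) = c_d\alpha^d + c_{d-1}\alpha^{d-1} + \cdots + c_1\alpha + c_0 = 0$, whence $\alpha^n p_L(\alpha) = c_d\alpha^{n+d} + c_{d-1}\alpha^{n+d-1} + \cdots + c_1\alpha^{n+1} + c_0\alpha^n = 0$ for all $n \ge 0$, which is precisely the statement that $L(f)(n) = 0$ for all $n \ge 0$. □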
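The recursion (9.6.3) gives a direct way to compute solutions, and Theorem 9.6.2 can be checked against it. The sketch below uses a hypothetical operator $L = E^2 - 5E + 6I$, chosen only because its characteristic polynomial $x^2 - 5x + 6 = (x-2)(x-3)$ has integer roots; the function names are illustrative, and the coefficients are stored as the list `[c_0, c_1, ..., c_d]`.

```python
from fractions import Fraction

def solve_by_recursion(c, v, count):
    """First `count` terms of the solution of (9.6.2) with initial values v = (v_0, ..., v_{d-1}),
    where c = [c_0, c_1, ..., c_d], computed with the recursion (9.6.3)."""
    d = len(c) - 1
    f = [Fraction(x) for x in v]
    while len(f) < count:
        n = len(f) - d                         # f(n + d) is the next term
        f.append(-sum(c[k] * f[n + k] for k in range(d)) / Fraction(c[d]))
    return f[:count]

def char_poly(c, x):
    """Characteristic polynomial (9.6.7): p_L(x) = c_d x^d + ... + c_1 x + c_0."""
    return sum(ck * x ** k for k, ck in enumerate(c))

# Hypothetical example: L = E^2 - 5E + 6I, so p_L(x) = x^2 - 5x + 6 = (x - 2)(x - 3).
c = [6, -5, 1]
assert char_poly(c, 2) == 0 and char_poly(c, 3) == 0
# Theorem 9.6.2: the geometric progressions 2^n and 3^n solve L(f) = 0 ...
assert solve_by_recursion(c, [1, 2], 10) == [2 ** n for n in range(10)]
assert solve_by_recursion(c, [1, 3], 10) == [3 ** n for n in range(10)]
# ... and so does the combination with f(0) = 0, f(1) = 1, namely 3^n - 2^n.
print([int(t) for t in solve_by_recursion(c, [0, 1], 6)])   # [0, 1, 5, 19, 65, 211]
```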
Theorem 9.6.3. Suppose that the roots $\alpha_1, \dots, \alpha_d$ of $p_L(x)$ are distinct and $f_i(n) = \alpha_i^n$ for all $n \ge 0$. Then $\{f_1, \dots, f_d\}$ is a basis for $N_L$.

Proof. By Theorem 9.6.2, each $f_i \in N_L$, and since $\dim N_L = d$, it suffices to show that the functions $f_i$ are linearly independent. Suppose then that $\lambda_1f_1 + \cdots + \lambda_df_d = \mathbf{0}$. Instantiating this equation for $n = 0, 1, \dots, d-1$ yields the following matrix equation.
$$
\begin{bmatrix}
1 & 1 & \cdots & 1 \\
\alpha_1 & \alpha_2 & \cdots & \alpha_d \\
\alpha_1^2 & \alpha_2^2 & \cdots & \alpha_d^2 \\
\vdots & \vdots & & \vdots \\
\alpha_1^{d-1} & \alpha_2^{d-1} & \cdots & \alpha_d^{d-1}
\end{bmatrix}
\begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_d \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \tag{9.6.8}
$$
If the above matrix is invertible, then $\lambda_1 = \cdots = \lambda_d = 0$. Moreover, the equation
$$
\begin{bmatrix}
1 & 1 & \cdots & 1 \\
\alpha_1 & \alpha_2 & \cdots & \alpha_d \\
\alpha_1^2 & \alpha_2^2 & \cdots & \alpha_d^2 \\
\vdots & \vdots & & \vdots \\
\alpha_1^{d-1} & \alpha_2^{d-1} & \cdots & \alpha_d^{d-1}
\end{bmatrix}
\begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_d \end{bmatrix}
=
\begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ v_{d-1} \end{bmatrix} \tag{9.6.9}
$$
determines unique values of 𝜆1 , . . . , 𝜆𝑑 , resulting in a unique solution to the difference equation (9.6.2) with the initial values specified by (9.6.1). But by a classical theorem of Vandermonde, to be proved below, the determinant of the coefficient matrix in (9.6.8) is equal to ∏1≤𝑖 0, with 0𝑞 ∶= 0.
The $q$-factorial of the first kind $n!_q$ is defined by
$$n!_q := n_q\cdot(n-1)_q\cdots1_q \ \text{ if } n > 0, \quad\text{with } 0!_q := 1. \tag{11.1.2}$$
The $q$-factorial of the second kind $n^{\underline{n}}_q$ is defined by
$$n^{\underline{n}}_q := n_q\cdot(n_q - 1_q)\cdot(n_q - 2_q)\cdots(n_q - (n-1)_q) \ \text{ if } n > 0, \quad\text{with } 0^{\underline{0}}_q := 1. \tag{11.1.3}$$
Note that
$$n^{\underline{n}}_q = q^{\binom{n}{2}}\, n!_q \quad\text{for all } n \ge 0. \tag{11.1.4}$$
As we shall see, $n_q$, $n!_q$, and $n^{\underline{n}}_q$ all have combinatorial interpretations, with $n_q$ enumerating the one-dimensional subspaces of $F_q^n$, $n!_q$ the maximal chains in the lattice of subspaces of $F_q^n$, and $n^{\underline{n}}_q$ the ordered direct sum decompositions of $F_q^n$ with $n$ one-dimensional direct summands.

Remark 11.1.1. It should be emphasized that the symbols $!_q$ and $^{\underline{n}}_q$ are unitary notations, and one should take care to write the letter $q$ directly below the symbols $!$ and $-$, so that $n!_q$ is confused neither with $(n!)_q$ nor with $(n_q)!$, and $n^{\underline{n}}_q$ is confused neither with $(n^{\underline{n}})_q$ nor with $(n_q)^{\underline{n}}$. There is not much uniformity in $q$-notation in the literature. For example, Stanley (1997) uses $(n)$ for the $q$-integer $n_q$ and $(n)!$ for the $q$-factorial $n!_q$.

Remark 11.1.2. In much of this chapter $q$ is construed as the cardinality of a finite field. It should be noted, however, that formulas like (11.1.1)–(11.1.3) are perfectly meaningful if $q$ is construed as an indeterminate. Throughout this chapter, but especially in Sections 11.9 and 11.10, we will encounter the fascinating interplay between these two ways of construing $q$.
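For integer values of $q$, definitions (11.1.1) through (11.1.3) and identity (11.1.4) are easy to check numerically. The sketch below (names are illustrative) also confirms $n_q = (q^n - 1)/(q - 1)$, which is the number of one-dimensional subspaces of $F_q^n$, since each such subspace contains exactly $q - 1$ nonzero vectors.

```python
from math import comb

def q_int(n, q):
    """The q-integer n_q = 1 + q + ... + q^(n-1), with 0_q = 0."""
    return sum(q ** i for i in range(n))

def q_factorial(n, q):
    """The q-factorial of the first kind, n!_q = n_q (n-1)_q ... 1_q, with 0!_q = 1."""
    result = 1
    for i in range(1, n + 1):
        result *= q_int(i, q)
    return result

def q_factorial_second_kind(n, q):
    """n_q (n_q - 1_q)(n_q - 2_q) ... (n_q - (n-1)_q), with the empty product 1 for n = 0."""
    result = 1
    for i in range(n):
        result *= q_int(n, q) - q_int(i, q)
    return result

for q in (2, 3, 4, 5, 7, 8, 9):          # prime powers, though any integer q works here
    for n in range(9):
        assert q_factorial_second_kind(n, q) == q ** comb(n, 2) * q_factorial(n, q)   # (11.1.4)
        assert (q - 1) * q_int(n, q) == q ** n - 1   # n_q = (q^n - 1)/(q - 1)
```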
11.2. Linear spans and linear independence

We begin by recalling a few basic facts of linear algebra. If $V$ is a vector space over any field and $S$ is a subset of $V$, the linear span of $S$, denoted by $\langle S\rangle$, is the intersection of all subspaces of $V$ that contain $S$. In particular, $\langle\varnothing\rangle = \{\mathbf{0}\}$. Of course, $\langle S\rangle$ is a subspace of $V$, and is easily shown, when $S \ne \varnothing$, to consist of all finite linear combinations of members of $S$. If $x_1, \dots, x_n$ are members of $V$, the linear span of the set $S = \{x_1, \dots, x_n\}$ is denoted simply by $\langle x_1, \dots, x_n\rangle$.

A subset $S$ of $V$ is linearly independent if, for every finite sequence $x_1, \dots, x_k$ of distinct members of $S$, $\lambda_1x_1 + \cdots + \lambda_kx_k = \mathbf{0} \Rightarrow \lambda_1 = \cdots = \lambda_k = 0$. The empty set $\varnothing$ is (vacuously) a linearly independent subset of every vector space. A subset $B$ of $V$ is a basis of $V$ if it is linearly independent and $\langle B\rangle = V$. In particular, $\varnothing$ is a basis of the vector space $\{\mathbf{0}\}$. Any two bases of a vector space have the same (possibly infinite) cardinality, and the dimension of a vector space $V$, denoted $\dim V$, is defined as the cardinality of any basis of $V$.

In enumerating linearly independent subsets and bases of finite vector spaces $V$, it is useful to distinguish various sequential orderings of these sets. When a basis of $V$ is written as a sequence $(x_1, \dots, x_n)$, we shall refer to it as an ordered basis of $V$. The following result is fundamental.

Theorem 11.2.1. If $n, k \ge 0$, the number of linearly independent sequences $(x_1, \dots, x_k)$ in $F_q^n$ is equal to
$$\prod_{j=0}^{k-1}(q^n - q^j). \tag{11.2.1}$$
In particular, the number of ordered bases of $F_q^n$ is equal to
$$\prod_{j=0}^{n-1}(q^n - q^j). \tag{11.2.2}$$
Proof. Suppose first that $1 \le k \le n$. In constructing a linearly independent sequence $(x_1, \dots, x_k)$ in $F_q^n$, $x_1$ may be any of the $q^n - 1$ nonzero vectors in $F_q^n$, $x_2$ any of the $q^n - q$ vectors in $F_q^n - \langle x_1\rangle$, $x_3$ any of the $q^n - q^2$ vectors in $F_q^n - \langle x_1, x_2\rangle$, ..., and $x_k$ any of the $q^n - q^{k-1}$ vectors in $F_q^n - \langle x_1, \dots, x_{k-1}\rangle$. If $1 \le n < k$, then formula (11.2.1) takes the value 0 (its factor with $j = n$ vanishes), as it should. It remains to treat the cases in which $n = 0$ or $k = 0$. If $n = k = 0$, then formula (11.2.1) is an empty product, hence equal to 1, as it should be, $\varnothing$ being regarded as an empty sequence (or sequence of length 0) in $F_q^0 = \{\mathbf{0}\}$. If $0 = n < k$, then (11.2.1) takes the value 0, as it should, since $\{\mathbf{0}\}$ contains no linearly independent sequence of positive length. Finally, if $0 = k < n$, then (11.2.1) is again an empty product, and takes the value 1, as it should. Formula (11.2.2) is simply the case $k = n$ of (11.2.1), $\varnothing$ being regarded as an ordered basis of $V = \{\mathbf{0}\}$. □

Theorem 11.2.2. Suppose that $n, k \ge 0$. Let $U$ and $V$ be $F_q$-vector spaces, with $\dim U = k$ and $\dim V = n$. Denote by $i_q(k,n)$ the number of monomorphisms (i.e., injective linear transformations) $t : U \to V$. Then
$$i_q(k,n) = \prod_{j=0}^{k-1}(q^n - q^j), \tag{11.2.3}$$
and so if $\dim U = \dim V = n$, there are
$$\prod_{j=0}^{n-1}(q^n - q^j) \tag{11.2.4}$$
isomorphisms (bijective linear transformations) $t : U \to V$.

Proof. If $k = 0$ or $k > n$, formula (11.2.3) is obvious. If $1 \le k \le n$, let $(u_1, \dots, u_k)$ be an ordered basis of $U$. A linear transformation $t : U \to V$ is uniquely determined by the sequence $(t(u_1), \dots, t(u_k))$, which may be any sequence in $V$, and $t$ is injective if and only if this sequence is linearly independent in $V$; so (11.2.3) follows from (11.2.1). Since a linear transformation between two vector spaces of the same finite dimension is injective if and only if it is surjective, (11.2.4) follows from setting $k = n$ in (11.2.3). □
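Theorem 11.2.1 can be verified by brute force for $q = 2$. In this sketch (an illustrative encoding, not the book's notation), a vector of $F_2^n$ is an integer whose binary digits are its coordinates, and linear independence is tested by computing the rank over $F_2$ with an XOR-basis elimination.

```python
from itertools import product

def gf2_rank(vectors):
    """Rank over F_2 of vectors encoded as integers (bit i = coordinate i)."""
    basis = []                       # reduced vectors with distinct leading bits
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)        # clear b's leading bit from v if it is set
        if v:
            basis.append(v)
    return len(basis)

def count_independent_sequences(n, k):
    """Brute-force count of linearly independent sequences (x_1, ..., x_k) in F_2^n."""
    return sum(1 for seq in product(range(2 ** n), repeat=k) if gf2_rank(seq) == k)

def formula_11_2_1(n, k, q=2):
    """prod_{j=0}^{k-1} (q^n - q^j)."""
    result = 1
    for j in range(k):
        result *= q ** n - q ** j
    return result

print(count_independent_sequences(3, 3), formula_11_2_1(3, 3))   # 168 168
```

By Theorem 11.2.2 the same number is $i_2(k, n)$, since a monomorphism is determined by the (linearly independent) images of an ordered basis.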
11.3. Counting subspaces

For all $n, k \ge 0$, let $\binom{n}{k}_q$ denote the number of $k$-dimensional subspaces of $F_q^n$. The numbers $\binom{n}{k}_q$ are called $q$-binomial coefficients, or Gaussian coefficients.

Theorem 11.3.1. For all $n, k \ge 0$,
$$\binom{n}{k}_q = \prod_{j=0}^{k-1}(q^n - q^j)\Big/\prod_{j=0}^{k-1}(q^k - q^j). \tag{11.3.1}$$
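If $0 \le k \le n$, then
$$\binom{n}{k}_q = \frac{n!_q}{k!_q\,(n-k)!_q}. \tag{11.3.2}$$

Proof. If $k = 0$, formulas (11.3.1) and (11.3.2) take the value 1, as they should. If $k > n$, formula (11.3.1) takes the value 0, as it should. If $1 \le k \le n$, let $\mathcal{A}$ be the set of all linearly independent sequences $(x_1, \dots, x_k)$ in $F_q^n$ and let $\mathcal{B}$ be the set of all $k$-dimensional subspaces of $F_q^n$. By (11.2.1), $|\mathcal{A}| = \prod_{j=0}^{k-1}(q^n - q^j)$, and by definition, $|\mathcal{B}| = \binom{n}{k}_q$. By Theorem 11.2.1, the map $(x_1, \dots, x_k) \mapsto \langle x_1, \dots, x_k\rangle$ is a $\prod_{j=0}^{k-1}(q^k - q^j)$-to-one surjection from $\mathcal{A}$ to $\mathcal{B}$ (each $k$-dimensional subspace has that many ordered bases), and so (11.3.1) follows from Theorem 2.5.4. Formula (11.3.2) then follows from (11.3.1) upon writing $q^n - q^j = q^j(q^{n-j} - 1)$ and $q^k - q^j = q^j(q^{k-j} - 1)$, cancelling the common factor $q^{\binom{k}{2}}$, and dividing numerator and denominator by $(q-1)^k$. □

Remark 11.3.2. By formula (11.3.2), $\binom{n}{0}_q = \binom{n}{n}_q = 1$ and $\binom{n}{1}_q = \binom{n}{n-1}_q = n_q = 1 + q + \cdots + q^{n-1}$ if $n \ge 1$. More generally,
$$\binom{n}{k}_q = \binom{n}{n-k}_q, \qquad 0 \le k \le n, \tag{11.3.3}$$
as a trivial consequence of (11.3.2). We will present a combinatorial proof of this symmetry property in a later section.

There are two recursive formulas for the $q$-binomial coefficients.

Theorem 11.3.3. For all $n, k \ge 0$, $\binom{n}{0}_q = 1$ and $\binom{0}{k}_q = \delta_{0,k}$. For all $n, k > 0$,
$$\binom{n}{k}_q = \binom{n-1}{k-1}_q + q^k\binom{n-1}{k}_q \tag{11.3.4}$$
and
$$\binom{n}{k}_q = q^{n-k}\binom{n-1}{k-1}_q + \binom{n-1}{k}_q. \tag{11.3.5}$$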
Proof. If $k > n$, both formulas assert that $0 = 0$, so suppose that $1 \le k \le n$. By (11.3.3), formula (11.3.4) implies formula (11.3.5), and so it suffices to prove the former. There is of course a straightforward algebraic proof of (11.3.4) based on (11.3.2), but we prefer the following combinatorial proof. Let $W$ be a fixed, but arbitrary, one-dimensional subspace of $F_q^n$.

First, among all $k$-dimensional subspaces of $F_q^n$, $\binom{n-1}{k-1}_q$ counts those that contain $W$. The quickest proof of this assertion invokes the fact that the map $U \mapsto U/W$ is a bijection from the set of $k$-dimensional subspaces $U$ of $F_q^n$ that contain $W$ to the set of $(k-1)$-dimensional subspaces of the $(n-1)$-dimensional quotient space $F_q^n/W$. But the following elementary argument, variants of which occur frequently in this chapter, is of independent interest. Let $\mathcal{A}$ be the set of all linearly independent sequences $(x_1, \dots, x_k)$ in $F_q^n$ such that $x_1 \in W$. Clearly, $|\mathcal{A}| = (q-1)(q^n - q)\cdots(q^n - q^{k-1})$. Let $\mathcal{B}$ be the set of all $k$-dimensional subspaces of $F_q^n$ that contain $W$. The map $(x_1, \dots, x_k) \mapsto \langle x_1, \dots, x_k\rangle$ from $\mathcal{A}$ to $\mathcal{B}$ is a $(q-1)(q^k - q)\cdots(q^k - q^{k-1})$-to-one surjection. So
$$|\mathcal{B}| = \frac{(q-1)(q^n - q)\cdots(q^n - q^{k-1})}{(q-1)(q^k - q)\cdots(q^k - q^{k-1})} = \binom{n-1}{k-1}_q.$$

Second, among all $k$-dimensional subspaces of $F_q^n$, $q^k\binom{n-1}{k}_q$ counts those that do not contain $W$. For let $\mathcal{A}$ be the set of all linearly independent sequences $(x_1, \dots, x_k)$ in $F_q^n$ with $W$ not contained in $\langle x_1, \dots, x_k\rangle$, and let $\mathcal{B}$ be the set of all $k$-dimensional subspaces of $F_q^n$ that do not contain $W$. Since $x_i$ may be any vector outside the $i$-dimensional subspace $W + \langle x_1, \dots, x_{i-1}\rangle$, we have $|\mathcal{A}| = (q^n - q)(q^n - q^2)\cdots(q^n - q^k)$. The map $(x_1, \dots, x_k) \mapsto \langle x_1, \dots, x_k\rangle$ is a $(q^k - 1)(q^k - q)\cdots(q^k - q^{k-1})$-to-one surjection, and so
$$|\mathcal{B}| = \frac{|\mathcal{A}|}{(q^k - 1)(q^k - q)\cdots(q^k - q^{k-1})} = \frac{(q^n - q)\cdots(q^n - q^k)}{(q^k - 1)(q^k - q)\cdots(q^k - q^{k-1})} = q^k\binom{n-1}{k}_q.$$
The two counts sum to $\binom{n}{k}_q$, which proves (11.3.4). □
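The count $\binom{n}{k}_2$ can also be checked directly: every $k$-dimensional subspace of $F_2^n$ is spanned by some set of $k$ distinct nonzero vectors, so collecting the spans of all such sets and keeping those of size $2^k$ enumerates the subspaces. The sketch below (names are illustrative, vectors encoded as integers) compares that enumeration with (11.3.1) and with the recurrence (11.3.4).

```python
from itertools import combinations

def span(vectors):
    """All F_2-linear combinations of vectors encoded as integers."""
    result = {0}
    for v in vectors:
        result |= {w ^ v for w in result}
    return frozenset(result)

def count_subspaces(n, k):
    """Brute-force number of k-dimensional subspaces of F_2^n."""
    spans = {span(c) for c in combinations(range(1, 2 ** n), k)}
    return sum(1 for s in spans if len(s) == 2 ** k)

def gaussian_product(n, k, q=2):
    """Formula (11.3.1); the integer division is exact."""
    num = den = 1
    for j in range(k):
        num *= q ** n - q ** j
        den *= q ** k - q ** j
    return num // den

def gaussian_recursive(n, k, q=2):
    """Boundary values of Theorem 11.3.3 together with recurrence (11.3.4)."""
    if k == 0:
        return 1
    if n == 0:
        return 0
    return gaussian_recursive(n - 1, k - 1, q) + q ** k * gaussian_recursive(n - 1, k, q)

print([count_subspaces(4, k) for k in range(5)])   # [1, 15, 35, 15, 1]
```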
It follows from the preceding by a simple inductive argument that, if $0 \le k \le n$, then $\binom{n}{k}_q$ is a polynomial in $q$ with nonnegative integral coefficients. For example,
$$\binom{6}{0}_q = \binom{6}{6}_q = 1, \qquad \binom{6}{1}_q = \binom{6}{5}_q = 1 + q + q^2 + q^3 + q^4 + q^5,$$
$$\binom{6}{2}_q = \binom{6}{4}_q = 1 + q + 2q^2 + 2q^3 + 3q^4 + 2q^5 + 2q^6 + q^7 + q^8, \quad\text{and}$$
$$\binom{6}{3}_q = 1 + q + 2q^2 + 3q^3 + 3q^4 + 3q^5 + 3q^6 + 2q^7 + q^8 + q^9.$$
A sequence $(a_0, a_1, \dots, a_n)$ in $\mathbb{R}$ is said to be unimodal if there exists a (not necessarily unique) $j$ such that $0 \le j \le n$ and $a_0 \le \cdots \le a_j \ge a_{j+1} \ge \cdots \ge a_n$. Note that the coefficients of the polynomials $\binom{6}{k}_q$ above are unimodal. Sylvester (1973) was the first to prove that the coefficients of the polynomials $\binom{n}{k}_q$ are always unimodal,¹ and O'Hara (1990) has given a combinatorial proof of this result. See also Zeilberger (1989). We will further explore unimodality, as well as the stronger property of logarithmic concavity, in Chapter 12.
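Because the recurrence (11.3.4) only adds and shifts coefficient lists, it computes $\binom{n}{k}_q$ as a polynomial in an indeterminate $q$. The following sketch (function names are illustrative) reproduces the examples above and checks unimodality for small $n$; it does not, of course, prove Sylvester's theorem.

```python
def poly_add(a, b):
    """Sum of two coefficient lists (index = power of q), with trailing zeros trimmed."""
    size = max(len(a), len(b))
    out = [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(size)]
    while len(out) > 1 and out[-1] == 0:
        out.pop()
    return out

def q_binomial_poly(n, k):
    """Coefficient list of the Gaussian polynomial (n choose k)_q, from recurrence (11.3.4)."""
    if k == 0:
        return [1]
    if n == 0 or k > n:
        return [0]
    shifted = [0] * k + q_binomial_poly(n - 1, k)          # q^k * (n-1 choose k)_q
    return poly_add(q_binomial_poly(n - 1, k - 1), shifted)

def is_unimodal(seq):
    """True if seq weakly increases and then weakly decreases."""
    i = 0
    while i + 1 < len(seq) and seq[i] <= seq[i + 1]:
        i += 1
    while i + 1 < len(seq) and seq[i] >= seq[i + 1]:
        i += 1
    return i == len(seq) - 1

print(q_binomial_poly(6, 3))   # [1, 1, 2, 3, 3, 3, 3, 2, 1, 1]
```

Setting $q = 1$ (summing the coefficients) recovers the ordinary binomial coefficient, while evaluating at a prime power $q$ recovers the subspace count.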
11.4. The $q$-binomial coefficients are Comtet numbers

From the recurrence (11.3.4) it follows that the numbers $\binom{n}{k}_q$ are Comtet numbers, with $b_n = q^n$ for all $n \ge 0$. The following theorems are thus immediate consequences of Theorem 7.2.1.
¹ This result should not be confused with the fact that the numerical sequence $\binom{n}{0}_q, \binom{n}{1}_q, \dots, \binom{n}{n}_q$, where $q$ is a power of a prime number, is always unimodal, an elementary result that you are asked to prove in Exercise 11.2.
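Theorem 11.4.1. For all $n, k \ge 0$,
$$\binom{n}{k}_q = \sum_{\substack{d_0 + d_1 + \cdots + d_k = n-k \\ d_i \ge 0}} q^{d_1 + 2d_2 + \cdots + kd_k} \tag{11.4.1}$$
and, equivalently,
$$\binom{n}{k}_q = \sum_{\substack{d_1 + \cdots + d_k \le n-k \\ d_i \ge 0}} q^{d_1 + 2d_2 + \cdots + kd_k}. \tag{11.4.2}$$

Theorem 11.4.2. For all $k \ge 0$,
$$\sum_{n\ge0}\binom{n}{k}_q x^n = \frac{x^k}{(1-x)(1-qx)\cdots(1-q^kx)}. \tag{11.4.3}$$

Theorem 11.4.3 (Cauchy's $q$-binomial theorem). For all $n \ge 0$,
$$x^n = \sum_{k=0}^{n}\binom{n}{k}_q\prod_{j=0}^{k-1}(x - q^j). \tag{11.4.4}$$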
Proof. Immediate, from the fact that $q$-binomial coefficients are Comtet numbers, but let us give a combinatorial proof as well. By top-down summation of (11.4.4) (i.e., replacing $k$ by $n - k$) and the symmetry (11.3.3), we get the equivalent polynomial identity
$$x^n = \sum_{k=0}^{n}\binom{n}{k}_q\prod_{j=0}^{n-k-1}(x - q^j), \tag{11.4.5}$$
which, since a polynomial in $x$ that vanishes at infinitely many values of $x$ is identically zero, can be established by showing for all integers $r > 0$ that
$$(q^r)^n = \sum_{k=0}^{n}\binom{n}{k}_q(q^r - 1)(q^r - q)\cdots(q^r - q^{n-k-1}). \tag{11.4.6}$$
Since a linear transformation $t : F_q^n \to F_q^r$ is fully determined by its action on a basis, there are $(q^r)^n$ such linear transformations. Among these transformations,
$$\binom{n}{k}_q(q^r - 1)(q^r - q)\cdots(q^r - q^{n-k-1})$$
counts those having a null space of dimension $k$. To see this, choose a $k$-dimensional subspace $N$ of $F_q^n$, take an ordered basis $(x_1, \dots, x_k)$ of $N$, and extend it to an ordered basis $(x_1, \dots, x_k, \dots, x_n)$ of $F_q^n$. A linear transformation $t : F_q^n \to F_q^r$ has $N$ as its null space if and only if $t(x_1) = \cdots = t(x_k) = \mathbf{0}$ and $(t(x_{k+1}), \dots, t(x_n))$ is linearly independent in $F_q^r$, and by (11.2.1) there are $(q^r - 1)(q^r - q)\cdots(q^r - q^{n-k-1})$ such sequences. □

Corollary 11.4.4. For all $n \ge 0$,
$$\sum_{k=0}^{n}(-1)^k q^{\binom{k}{2}}\binom{n}{k}_q = \delta_{n,0}. \tag{11.4.7}$$

Proof. Set $x = 0$ in (11.4.4). Formula (11.4.7) is the $q$-analogue of the alternating-sum-of-binomial-coefficients formula (3.3.1). □
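Cauchy's $q$-binomial theorem is a polynomial identity, so it can be spot-checked at integer values of $x$ and $q$, as in this sketch (names are illustrative; `q_binomial` evaluates (11.3.1) with exact integer division). The loop also checks Corollary 11.4.4.

```python
def q_binomial(n, k, q):
    """(n choose k)_q at an integer q >= 2, via formula (11.3.1); the division is exact."""
    num = den = 1
    for j in range(k):
        num *= q ** n - q ** j
        den *= q ** k - q ** j
    return num // den

def cauchy_rhs(n, x, q):
    """Right side of (11.4.4): sum_k (n choose k)_q (x - 1)(x - q)...(x - q^(k-1))."""
    total, prod = 0, 1                 # prod holds (x - 1)...(x - q^(k-1))
    for k in range(n + 1):
        total += q_binomial(n, k, q) * prod
        prod *= x - q ** k
    return total

for q in (2, 3, 4):
    for n in range(7):
        assert all(cauchy_rhs(n, x, q) == x ** n for x in range(-5, 12))     # (11.4.4)
        alternating = sum((-1) ** k * q ** (k * (k - 1) // 2) * q_binomial(n, k, q)
                          for k in range(n + 1))
        assert alternating == (1 if n == 0 else 0)                         # (11.4.7)
```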
Remark 11.4.5. It follows from the symmetry of $q$-binomial coefficients that the simple alternating sum $\sum_{k=0}^{n}(-1)^k\binom{n}{k}_q = 0$ when $n$ is odd. But if $n \ge 2$ is even, such a simple result no longer holds. Indeed, Gauss showed (see Goldman and Rota 1969) that if $m \ge 1$, then
$$\sum_{k=0}^{2m}(-1)^k\binom{2m}{k}_q = (1-q)(1-q^3)\cdots(1-q^{2m-1}). \tag{11.4.8}$$
11.5. $q$-binomial inversion

This section is devoted to proving a $q$-analogue of the binomial inversion principle.

Lemma 11.5.1. If $0 \le j \le k \le n$, then
$$\binom{n}{k}_q\binom{k}{j}_q = \binom{n}{j}_q\binom{n-j}{k-j}_q. \tag{11.5.1}$$
In particular, when $j = 1$, (11.5.1) yields
$$\binom{n}{k}_q = \frac{n_q}{k_q}\binom{n-1}{k-1}_q. \tag{11.5.2}$$
Proof. Each side of (11.5.1) counts the set $\{(W, U) : W \subseteq U \subseteq F_q^n,\ \dim W = j \text{ and } \dim U = k\}$. □

Theorem 11.5.2 ($q$-orthogonality relations). If $0 \le j \le n$, then
$$\sum_{k=j}^{n}(-1)^{n-k}q^{\binom{n-k}{2}}\binom{n}{k}_q\binom{k}{j}_q = \delta_{j,n} \quad\text{and} \tag{11.5.3}$$
$$\sum_{k=j}^{n}(-1)^{k-j}q^{\binom{k-j}{2}}\binom{n}{k}_q\binom{k}{j}_q = \delta_{j,n}. \tag{11.5.4}$$

Proof. Exercise. □
Theorem 11.5.3 ($q$-binomial inversion principle). For any sequences $(a_n)_{n\ge0}$ and $(b_n)_{n\ge0}$ in $\mathbb{C}$, the following are equivalent.
$$\text{For all } n \ge 0, \quad b_n = \sum_{k=0}^{n}\binom{n}{k}_q a_k. \tag{11.5.5}$$
$$\text{For all } n \ge 0, \quad a_n = \sum_{k=0}^{n}(-1)^{n-k}q^{\binom{n-k}{2}}\binom{n}{k}_q b_k. \tag{11.5.6}$$

Proof. Exercise. □
Theorem 11.5.3 leads to the following useful variants of Cauchy’s 𝑞-binomial theorem.
Theorem 11.5.4. For all $n \ge 0$,
$$(x-1)(x-q)\cdots(x-q^{n-1}) = \sum_{k=0}^{n}(-1)^{n-k}q^{\binom{n-k}{2}}\binom{n}{k}_q x^k, \tag{11.5.7}$$
$$(x+1)(x+q)\cdots(x+q^{n-1}) = \sum_{k=0}^{n}q^{\binom{n-k}{2}}\binom{n}{k}_q x^k, \quad\text{and} \tag{11.5.8}$$
$$(1+x)(1+qx)\cdots(1+q^{n-1}x) = \sum_{k=0}^{n}q^{\binom{k}{2}}\binom{n}{k}_q x^k. \tag{11.5.9}$$
Proof. Formula (11.5.7) follows from (11.4.4) by $q$-binomial inversion, with $b_n = x^n$ and $a_k = (x-1)(x-q)\cdots(x-q^{k-1})$. Formula (11.5.8) follows from (11.5.7) by substituting $-x$ for $x$ and multiplying both sides by $(-1)^n$. Taking the reciprocal polynomial of each side of (11.5.8) yields (11.5.9). □

Let $\varepsilon_q(n,k)$ denote the number of epimorphisms (surjective linear transformations) $t : F_q^n \to F_q^k$. The following formula is an immediate consequence of Theorem 11.5.3.

Theorem 11.5.5. For all $n, k \ge 0$,
$$\varepsilon_q(n,k) = \sum_{j=0}^{k}(-1)^{k-j}q^{\binom{k-j}{2}}\binom{k}{j}_q q^{jn}. \tag{11.5.10}$$
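Proof. For fixed, but arbitrary, $n \ge 0$, it is the case that
$$q^{kn} = \sum_{j=0}^{k}\binom{k}{j}_q\varepsilon_q(n,j), \quad\text{for all } k \ge 0, \tag{11.5.11}$$
since, among all $q^{kn}$ linear transformations $t : F_q^n \to F_q^k$, the term $\binom{k}{j}_q\varepsilon_q(n,j)$ counts those having rank $j$ (choose the $j$-dimensional range of $t$, then an epimorphism onto it). Formula (11.5.10) follows from (11.5.11) by $q$-binomial inversion. □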
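Corollary 11.5.6.
$$\sum_{j=0}^{k}(-1)^{k-j}q^{\binom{k-j}{2}}\binom{k}{j}_q q^{jn} = 0, \quad\text{for all } k > n, \text{ and} \tag{11.5.12}$$
$$\sum_{j=0}^{n}(-1)^{n-j}q^{\binom{n-j}{2}}\binom{n}{j}_q q^{jn} = (q^n-1)(q^n-q)\cdots(q^n-q^{n-1}), \quad\text{for all } n \ge 0. \tag{11.5.13}$$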
Proof. Recall that if $t : U \to V$ is a linear transformation between finite-dimensional vector spaces $U$ and $V$ over a field $F$, then $\dim U = \dim N_t + \dim R_t$, where $N_t$ and $R_t$ denote, respectively, the null space and the range of $t$. The number $\dim N_t$ is called the nullity of $t$, and the number $\dim R_t$ is called the rank of $t$. It follows that if $\dim U < \dim V$, then there are no epimorphisms $t : U \to V$, and so formula (11.5.12) follows immediately from Theorem 11.5.5. Furthermore, if $\dim U = \dim V$, then $t : U \to V$ is an epimorphism if and only if it is an isomorphism, and so formula (11.5.13) follows from Theorem 11.5.5 and formula (11.2.4). □
Theorem 11.5.7. For all $n, k \ge 0$,
$$\varepsilon_q(n,k) = (q^n-1)(q^n-q)\cdots(q^n-q^{k-1}). \tag{11.5.14}$$

Proof #1. Formula (11.5.14) follows from formula (11.5.10) by substituting $k$ for $n$, $j$ for $k$, and $q^n$ for $x$ in (11.5.7).

Proof #2. A linear transformation $t : F_q^n \to F_q^k$ is surjective if and only if the rank of $t$ equals $k$ (and so the nullity of $t$ equals $n-k$). So to construct all epimorphisms $t : F_q^n \to F_q^k$, we first choose an $(n-k)$-dimensional subspace $N$ of $F_q^n$. We then let $(x_1,\ldots,x_{n-k})$ be an arbitrary ordered basis of $N$ and extend it to an ordered basis $(x_1,\ldots,x_{n-k},\ldots,x_n)$ of $F_q^n$. All epimorphisms $t : F_q^n \to F_q^k$ with null space $N$ are now constructed by setting $t(x_i) = \mathbf{0}$ for $1 \le i \le n-k$, and taking for $(t(x_{n-k+1}),\ldots,t(x_n))$ any ordered basis of $F_q^k$. It follows that
$$\varepsilon_q(n,k) = \binom{n}{n-k}_q (q^k-1)\cdots(q^k-q^{k-1}) = (q^n-1)\cdots(q^n-q^{k-1}).$$
Note that this is virtually the same proof as that of Theorem 11.4.3. □

Remark 11.5.8. It follows from formulas (11.5.14) and (11.2.3) that $\varepsilon_q(n,k) = i_q(k,n)$. This will come as no surprise to students of linear algebra. For finite-dimensional vector spaces $U$ and $V$ over any field, the number of epimorphisms from $U$ to $V$ equals the number of monomorphisms from $V$ to $U$ (see Exercise 11.7).
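As a sanity check on (11.5.10) and (11.5.14), one can count epimorphisms by brute force over the smallest field. The following Python sketch is our own illustration, not the book's method. It makes three assumptions:

- $q = 2$.
- A linear map $F_2^n \to F_2^k$ is encoded as a $k \times n$ matrix whose rows are $n$-bit integers.
- Such a map is surjective exactly when this matrix has rank $k$.

The function names (`gf2_rank`, `epi_brute_force`, and so on) are hypothetical.

```python
from itertools import product


def gf2_rank(rows):
    """Rank over GF(2) of row vectors given as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit of the pivot row
            # clear that bit from every remaining row
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def q_binomial(n, k, q):
    """Gaussian binomial coefficient (n choose k)_q for an integer q >= 2."""
    if k < 0 or k > n:
        return 0
    num = den = 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (i + 1) - 1
    return num // den


def epi_brute_force(n, k):
    """Number of k x n matrices over GF(2) of rank k, i.e. epimorphisms F_2^n -> F_2^k."""
    return sum(1 for rows in product(range(2 ** n), repeat=k) if gf2_rank(rows) == k)


def epi_product(n, k, q=2):
    """Right-hand side of (11.5.14): (q^n - 1)(q^n - q)...(q^n - q^(k-1))."""
    result = 1
    for i in range(k):
        result *= q ** n - q ** i
    return result


def epi_inversion(n, k, q=2):
    """Right-hand side of (11.5.10)."""
    return sum((-1) ** (k - j) * q ** ((k - j) * (k - j - 1) // 2)
               * q_binomial(k, j, q) * q ** (j * n) for j in range(k + 1))


for n in range(4):
    for k in range(4):
        # for k > n all three counts are 0, in line with (11.5.12)
        assert epi_brute_force(n, k) == epi_product(n, k) == epi_inversion(n, k)
```

The check covers only $n, k \le 3$ at $q = 2$, so it illustrates the formulas rather than proving them.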
11.6. The 𝑞-Vandermonde identity

It is a heuristic principle that virtually all binomial coefficient identities have $q$-analogues. So it is natural to expect a $q$-analogue of Theorem 3.1.6.

Theorem 11.6.1 ($q$-Vandermonde identity). For all $m, n, r \ge 0$,
$$\binom{m+n}{r}_q = \sum_{k=0}^{r} q^{(m-k)(r-k)} \binom{m}{k}_q \binom{n}{r-k}_q. \tag{11.6.1}$$
Proof. By (11.5.9),
$$(1+x)(1+qx)\cdots(1+q^{m-1}x) = \sum_{k=0}^{m} q^{\binom{k}{2}} \binom{m}{k}_q x^k \quad\text{and} \tag{11.6.2}$$
$$(1+x)(1+qx)\cdots(1+q^{n-1}x) = \sum_{j=0}^{n} q^{\binom{j}{2}} \binom{n}{j}_q x^j. \tag{11.6.3}$$
Substituting $q^m x$ for $x$ in (11.6.3) yields
$$(1+q^m x)(1+q^{m+1}x)\cdots(1+q^{m+n-1}x) = \sum_{j=0}^{n} q^{\binom{j}{2}+mj} \binom{n}{j}_q x^j. \tag{11.6.4}$$
Multiplying (11.6.2) by (11.6.4) yields
$$(1+x)(1+qx)\cdots(1+q^{m+n-1}x) = \sum_{r=0}^{m+n} x^r \sum_{k=0}^{r} q^{\binom{k}{2}+\binom{r-k}{2}+m(r-k)} \binom{m}{k}_q \binom{n}{r-k}_q. \tag{11.6.5}$$
But (11.5.9) also implies that
$$(1+x)(1+qx)\cdots(1+q^{m+n-1}x) = \sum_{r=0}^{m+n} q^{\binom{r}{2}} \binom{m+n}{r}_q x^r. \tag{11.6.6}$$
Compare the coefficients of $x^r$ in (11.6.5) and (11.6.6). Since $\binom{r}{2} = \binom{k}{2} + \binom{r-k}{2} + k(r-k)$, this comparison yields (11.6.1). □
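Identities such as (11.5.9) and (11.6.1) are easy to spot-check numerically. The sketch below is our own, and its function names are hypothetical. It specializes $q$ to a few small integers, so it tests the polynomial identities only at those values: this is a sanity check, not a proof. The sketch then does two things:

- It expands the left side of (11.5.9) as a polynomial in $x$ and compares coefficients with the right side.
- It compares both sides of the $q$-Vandermonde identity.

```python
def q_binomial(n, k, q):
    """Gaussian binomial coefficient (n choose k)_q for an integer q >= 2."""
    if k < 0 or k > n:
        return 0
    num = den = 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (i + 1) - 1
    return num // den


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def product_side(n, q):
    """Coefficients in x of (1 + x)(1 + qx)...(1 + q^(n-1) x), the left side of (11.5.9)."""
    coeffs = [1]
    for i in range(n):
        coeffs = poly_mul(coeffs, [1, q ** i])
    return coeffs


def sum_side(n, q):
    """Coefficients q^C(k,2) (n choose k)_q from the right side of (11.5.9)."""
    return [q ** (k * (k - 1) // 2) * q_binomial(n, k, q) for k in range(n + 1)]


def vandermonde_sum(m, n, r, q):
    """Right side of (11.6.1); k runs only where both q-binomials can be nonzero."""
    return sum(q ** ((m - k) * (r - k)) * q_binomial(m, k, q) * q_binomial(n, r - k, q)
               for k in range(max(0, r - n), min(m, r) + 1))


for q in (2, 3, 5):
    for n in range(7):
        assert product_side(n, q) == sum_side(n, q)
    for m in range(5):
        for n in range(5):
            for r in range(m + n + 2):
                assert q_binomial(m + n, r, q) == vandermonde_sum(m, n, r, q)
```

Restricting the range of $k$ keeps every exponent $(m-k)(r-k)$ nonnegative, so all arithmetic stays in exact integers.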
Remark 11.6.2. As one might imagine, there are a number of variants of formula (11.6.1). Unfortunately, none of them seems to exhibit the symmetry of $\binom{m+n}{r}_q$ in $m$ and $n$.
11.7. 𝑞-multinomial coefficients of the first kind

A sequence $(U_i)_{0 \le i \le k}$ of subspaces of $F_q^n$, with $\{\mathbf{0}\} = U_0 \subseteq U_1 \subseteq \cdots \subseteq U_k = F_q^n$, is called a multichain of length $k$ from $\{\mathbf{0}\}$ to $F_q^n$. For each weak $k$-composition $n_1 + \cdots + n_k = n$, the $q$-multinomial coefficient of the first kind, $\binom{n}{n_1,\ldots,n_k}_q$, is defined as the number of multichains $\{\mathbf{0}\} = U_0 \subseteq U_1 \subseteq \cdots \subseteq U_k = F_q^n$ such that
$$\dim U_i/U_{i-1} \;(= \dim U_i - \dim U_{i-1}) = n_i, \quad i = 1,\ldots,k, \tag{11.7.1}$$
or, equivalently, such that
$$\dim U_i = n_1 + \cdots + n_i, \quad i = 1,\ldots,k. \tag{11.7.2}$$
Theorem 11.7.1.
$$\binom{n}{n_1,\ldots,n_k}_q = \frac{n!_q}{(n_1)!_q \cdots (n_k)!_q}. \tag{11.7.3}$$
Proof. Let $\mathcal{A}$ denote the set of all ordered bases $(x_1,\ldots,x_n)$ of $F_q^n$, and let $\mathcal{B}$ denote the set of all multichains $\{\mathbf{0}\} = U_0 \subseteq U_1 \subseteq \cdots \subseteq U_k = F_q^n$ satisfying (11.7.2). The map
$$(x_1,\ldots,x_n) \mapsto (\{\mathbf{0}\}, \langle x_1,\ldots,x_{n_1}\rangle, \langle x_1,\ldots,x_{n_1+n_2}\rangle, \ldots, \langle x_1,\ldots,x_{n_1+\cdots+n_{k-1}}\rangle, F_q^n)$$
is a $(q-1)^n q^{\binom{n}{2}} (n_1)!_q \cdots (n_k)!_q$-to-one surjection from $\mathcal{A}$ to $\mathcal{B}$ (exercise). Formula (11.7.3) follows from this by Theorems 2.5.4 and 11.2.1. □
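A sequence $(U_i)_{0 \le i \le k}$ of subspaces of $F_q^n$, with $\{\mathbf{0}\} = U_0 \subset U_1 \subset \cdots \subset U_k = F_q^n$, is called a chain of length $k$ from $\{\mathbf{0}\}$ to $F_q^n$. In a chain $\{\mathbf{0}\} = U_0 \subset U_1 \subset \cdots \subset U_n = F_q^n$ of length $n$, we have $\dim U_i = i$, $i = 1,\ldots,n$. Such chains are maximal, in the sense that they are not contained in any longer chains.

Corollary 11.7.2. There are $n!_q$ maximal chains in $F_q^n$.

Proof. The maximal chains are precisely the multichains counted by $\binom{n}{1,\ldots,1}_q$. The result therefore follows from (11.7.3), since $1!_q = 1_q = 1$. □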
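Theorem 11.7.1 and Corollary 11.7.2 can be checked by brute force over the two-element field. The following sketch is our own illustration, and its names are hypothetical. It works as follows:

- Vectors of $F_2^n$ are $n$-bit integers, and a subspace is the frozenset of its vectors.
- Multichains with prescribed dimensions $n_1 + \cdots + n_i$ are counted recursively.
- The counts are compared with $n!_q/((n_1)!_q \cdots (n_k)!_q)$ evaluated at $q = 2$.

```python
from itertools import accumulate


def subspaces(n):
    """All subspaces of F_2^n, grown one vector at a time from {0}."""
    zero = frozenset({0})
    found, frontier = {zero}, [zero]
    while frontier:
        bigger = []
        for U in frontier:
            for v in range(1 << n):
                if v not in U:
                    W = U | {u ^ v for u in U}  # span of U and v
                    if W not in found:
                        found.add(W)
                        bigger.append(W)
        frontier = bigger
    return found


def count_multichains(n, parts):
    """Multichains {0} = U_0 <= U_1 <= ... <= U_k = F_2^n with dim U_i = n_1 + ... + n_i."""
    subs = subspaces(n)
    dims = list(accumulate(parts))

    def extend(U, i):
        if i == len(dims):
            return 1
        size = 2 ** dims[i]  # a d-dimensional subspace of F_2^n has 2^d vectors
        return sum(extend(W, i + 1) for W in subs if len(W) == size and U <= W)

    return extend(frozenset({0}), 0)


def q_factorial(m, q):
    result = 1
    for j in range(1, m + 1):
        result *= (q ** j - 1) // (q - 1)  # j_q = 1 + q + ... + q^(j-1)
    return result


def q_multinomial(parts, q):
    """Right side of (11.7.3), computed with exact integer division."""
    value = q_factorial(sum(parts), q)
    for p in parts:
        value //= q_factorial(p, q)
    return value


for parts in [(1, 1, 1), (1, 2), (2, 1), (0, 3), (1, 0, 2), (3,)]:
    assert count_multichains(3, parts) == q_multinomial(parts, 2)
# Corollary 11.7.2: the maximal chains in F_2^4
assert count_multichains(4, (1, 1, 1, 1)) == q_factorial(4, 2) == 315
```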
As one might expect, the $q$-multinomial coefficients have many properties analogous to those of ordinary multinomial coefficients. In particular,
$$\binom{n}{n_1,\ldots,n_k}_q = \binom{n}{n_1}_q \binom{n-n_1}{n_2}_q \cdots \binom{n-n_1-\cdots-n_{k-1}}{n_k}_q, \tag{11.7.4}$$
from which it follows that $\binom{n}{n_1,\ldots,n_k}_q$ is a polynomial in $q$ with nonnegative integral coefficients. From (11.7.3) it follows, for every permutation $g$ of $[k]$, that
$$\binom{n}{n_1,\ldots,n_k}_q = \binom{n}{n_{g(1)},\ldots,n_{g(k)}}_q, \tag{11.7.5}$$
and that
$$\deg \binom{n}{n_1,\ldots,n_k}_q = \binom{n}{2} - \sum_{i=1}^{k} \binom{n_i}{2} = \sum_{1 \le i < j \le k} n_i n_j. \tag{11.7.6}$$
[…]

11.8. 𝑞-multinomial coefficients of the second kind

[…] $j > 1$ and $x_1 + \cdots + x_j = \mathbf{0}$, so that $x_j = -x_1 - \cdots - x_{j-1} \ne \mathbf{0}$. But $x_j \in (U_1 + \cdots + U_{j-1}) \cap U_j$, and so (11.8.1) fails. (ii) Suppose that (11.8.1) fails, with $(U_1 + \cdots + U_i) \cap U_{i+1}$ containing the nonzero vector $x$. It is easy to show that $U_1 + \cdots + U_i = \{x_1 + \cdots + x_i : x_l \in U_l,\ l = 1,\ldots,i\}$. Since $x \in U_1 + \cdots + U_i$, there exist $x_l \in U_l$, $l = 1,\ldots,i$, such that $x = x_1 + \cdots + x_i$. So $x_1 + \cdots + x_i + (-x) + \mathbf{0} + \cdots + \mathbf{0} = \mathbf{0}$, where $-x \in U_{i+1}$ and the sum terminates in $k-i-1$ zero vectors. But $-x \ne \mathbf{0}$, and so (11.8.2) fails. □

Suppose that
$$U_1 + \cdots + U_k = V. \tag{11.8.3}$$
If, in addition, either of the equivalent conditions (11.8.1) or (11.8.2) holds, we say that the sequence $(U_1,\ldots,U_k)$ constitutes an ordered direct sum decomposition of $V$, with $k$ direct summands, symbolizing this by $V = U_1 \oplus \cdots \oplus U_k$.
Theorem 11.8.2. Let $V$ be a finite-dimensional vector space with subspaces $U_1,\ldots,U_k$ satisfying (11.8.3). Then $V = U_1 \oplus \cdots \oplus U_k$ if and only if
$$\dim V = \sum_{i=1}^{k} \dim U_i. \tag{11.8.4}$$
Proof. In proving both sufficiency and necessity, we will use the following fact. For any finite-dimensional subspaces $U_1,\ldots,U_k$ of an arbitrary vector space $V$,
$$\dim(U_1+\cdots+U_k) = \sum_{i=1}^{k} \dim U_i - \sum_{i=1}^{k-1} \dim\big((U_1+\cdots+U_i) \cap U_{i+1}\big), \tag{11.8.5}$$
which may be proved by induction on $k$ from the well-known result
$$\dim(U_1+U_2) = \dim U_1 + \dim U_2 - \dim(U_1 \cap U_2). \tag{11.8.6}$$
Sufficiency. Suppose that (11.8.3) and (11.8.4) hold. By (11.8.3) and (11.8.5),
$$\dim V = \sum_{i=1}^{k} \dim U_i - \sum_{i=1}^{k-1} \dim\big((U_1+\cdots+U_i) \cap U_{i+1}\big), \tag{11.8.7}$$
and comparing (11.8.7) to (11.8.4), we see that
$$\dim\big((U_1+\cdots+U_i) \cap U_{i+1}\big) = 0, \quad i = 1,\ldots,k-1, \tag{11.8.8}$$
which is equivalent to (11.8.1). Hence, $V = U_1 \oplus \cdots \oplus U_k$.

Necessity. Suppose that $V = U_1 \oplus \cdots \oplus U_k$. Then (11.8.3) holds, and we again have (11.8.7). But by the definition of an ordered direct sum, (11.8.1) holds, and hence so does (11.8.8). So (11.8.7) reduces to (11.8.4). □

Theorem 11.8.3. Suppose that $V$ is an $n$-dimensional vector space and $(n_1,\ldots,n_k)$ is a weak $k$-composition of $n$.
(i) Let $(x_1,\ldots,x_n)$ be an ordered basis of $V$, and let $U_1 = \langle x_1,\ldots,x_{n_1}\rangle$, $U_2 = \langle x_{n_1+1},\ldots,x_{n_1+n_2}\rangle$, …, $U_k = \langle x_{n_1+\cdots+n_{k-1}+1},\ldots,x_n\rangle$. Then $V = U_1 \oplus \cdots \oplus U_k$, with $\dim U_i = n_i$, $i = 1,\ldots,k$.
(ii) Let $V = U_1 \oplus \cdots \oplus U_k$, with $\dim U_i = n_i$, $i = 1,\ldots,k$. Suppose that $(x_1,\ldots,x_{n_1})$ is an ordered basis of $U_1$, $(x_{n_1+1},\ldots,x_{n_1+n_2})$ is an ordered basis of $U_2$, …, and $(x_{n_1+\cdots+n_{k-1}+1},\ldots,x_n)$ is an ordered basis of $U_k$. Then $(x_1,\ldots,x_n)$ is an ordered basis of $V$.

Proof. (i) Since $V = \langle x_1,\ldots,x_n\rangle \subseteq U_1 + \cdots + U_k \subseteq V$, we have $U_1 + \cdots + U_k = V$. Since $\{x_1,\ldots,x_n\}$ is linearly independent, each of its subsets is linearly independent, and so $\dim U_i = n_i$, $i = 1,\ldots,k$. It follows from Theorem 11.8.2 that $V = U_1 \oplus \cdots \oplus U_k$.
(ii) Let $x \in V$. Since $V = U_1 \oplus \cdots \oplus U_k$, there exist $y_i \in U_i$ such that $x = y_1 + \cdots + y_k$. But each $y_i$ is a linear combination of the $n_i$ vectors $x_{n_1+\cdots+n_{i-1}+1},\ldots,x_{n_1+\cdots+n_i}$, and so $\langle x_1,\ldots,x_n\rangle = V$. Suppose that $\lambda_1 x_1 + \cdots + \lambda_n x_n = \mathbf{0}$. Grouping the terms according to the summand $U_i$ to which they belong expresses $\mathbf{0}$ as a sum of $k$ vectors, the $i$th of which lies in $U_i$. By (11.8.2), each of these vectors is zero:
$$\lambda_{n_1+\cdots+n_{i-1}+1}\, x_{n_1+\cdots+n_{i-1}+1} + \cdots + \lambda_{n_1+\cdots+n_i}\, x_{n_1+\cdots+n_i} = \mathbf{0}, \quad i = 1,\ldots,k.$$
Since these vectors form a basis of $U_i$, it follows that $\lambda_{n_1+\cdots+n_{i-1}+1} = \cdots = \lambda_{n_1+\cdots+n_i} = 0$, $i = 1,\ldots,k$. So $\{x_1,\ldots,x_n\}$ is linearly independent. □
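In the remainder of this section, we return to the study of $F_q$-vector spaces. Let $(n_1,\ldots,n_k)$ be a weak $k$-composition of $n$. We denote by $\genfrac{\langle}{\rangle}{0pt}{}{n}{n_1,\ldots,n_k}_q$ the number of ordered direct sum decompositions $U_1 \oplus \cdots \oplus U_k = F_q^n$, where $\dim U_i = n_i$, $i = 1,\ldots,k$. We call $\genfrac{\langle}{\rangle}{0pt}{}{n}{n_1,\ldots,n_k}_q$ a $q$-multinomial coefficient of the second kind.

Theorem 11.8.4.
$$\genfrac{\langle}{\rangle}{0pt}{}{n}{n_1,\ldots,n_k}_q = \frac{q^{\binom{n}{2}}\, n!_q}{q^{\binom{n_1}{2}} (n_1)!_q \cdots q^{\binom{n_k}{2}} (n_k)!_q}. \tag{11.8.9}$$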
Proof. Let $\mathcal{A}$ denote the set of all ordered bases $(x_1,\ldots,x_n)$ of $F_q^n$, and let $\mathcal{B}$ denote the set of all ordered direct sum decompositions $F_q^n = U_1 \oplus \cdots \oplus U_k$, with $\dim U_i = n_i$, $i = 1,\ldots,k$. Consider the map $(x_1,\ldots,x_n) \mapsto (U_1,\ldots,U_k)$, where $U_i = \langle x_{n_1+\cdots+n_{i-1}+1},\ldots,x_{n_1+\cdots+n_i}\rangle$, $1 \le i \le k$. By Theorem 11.8.3(i), this is a map from $\mathcal{A}$ to $\mathcal{B}$.

By Theorem 11.8.3(ii), the preimages of each $(U_1,\ldots,U_k) \in \mathcal{B}$ are exactly the sequences $(x_1,\ldots,x_n)$ in $F_q^n$ for which $(x_{n_1+\cdots+n_{i-1}+1},\ldots,x_{n_1+\cdots+n_i})$ is an ordered basis of $U_i$, $i = 1,\ldots,k$. By Theorem 11.2.2, there are $\prod_{i=1}^{k} (q^{n_i}-1)(q^{n_i}-q)\cdots(q^{n_i}-q^{n_i-1})$ such preimages. Note that $(q^m-1)(q^m-q)\cdots(q^m-q^{m-1}) = (q-1)^m q^{\binom{m}{2}} m!_q$ for every $m \ge 0$, and that $n_1 + \cdots + n_k = n$. Formula (11.8.9) now follows from Theorem 2.5.4. □
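Corollary 11.8.5. There are $q^{\binom{n}{2}}\, n!_q$ ordered direct sum decompositions $F_q^n = U_1 \oplus \cdots \oplus U_n$ into one-dimensional direct summands.

Proof. Take $n_1 = \cdots = n_n = 1$ in (11.8.9), noting that $q^{\binom{1}{2}}\, 1!_q = 1_q = 1$. □

Corollary 11.8.6. If $0 \le k \le n$, then
$$\genfrac{\langle}{\rangle}{0pt}{}{n}{k,\,n-k}_q = \binom{n}{k}_q q^{k(n-k)}. \tag{11.8.10}$$

Proof. Immediate from (11.8.9), since $\binom{n}{2} - \binom{k}{2} - \binom{n-k}{2} = k(n-k)$. □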
If $U$ is a subspace of any vector space $V$ and $U \oplus W = V$, the subspace $W$ is called a supplement of $U$ in $V$.

Corollary 11.8.7. A $k$-dimensional subspace $U$ of $F_q^n$ has $q^{k(n-k)}$ supplements in $F_q^n$.

Proof. Exercise. □
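Corollaries 11.8.6 and 11.8.7 can likewise be spot-checked over $F_2$. The sketch below is our own, and its names are hypothetical. It does the following:

- It enumerates all subspaces of $F_2^n$.
- It counts ordered pairs $(U, W)$ with $\dim U = k$, $\dim W = n-k$ and $U \cap W = \{\mathbf{0}\}$. By Theorem 11.8.2, these are exactly the decompositions $U \oplus W = F_2^n$.
- It checks how many supplements each individual subspace has.

This illustrates the two statements for $n \le 4$; it does not prove them.

```python
def subspaces(n):
    """All subspaces of F_2^n (vectors are n-bit integers)."""
    zero = frozenset({0})
    found, frontier = {zero}, [zero]
    while frontier:
        bigger = []
        for U in frontier:
            for v in range(1 << n):
                if v not in U:
                    W = U | {u ^ v for u in U}
                    if W not in found:
                        found.add(W)
                        bigger.append(W)
        frontier = bigger
    return found


def dim(U):
    return len(U).bit_length() - 1  # |U| = 2^(dim U)


def q_binomial(n, k, q):
    if k < 0 or k > n:
        return 0
    num = den = 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (i + 1) - 1
    return num // den


def supplements(U, subs, n):
    """Subspaces W with dim W = n - dim U and U ∩ W = {0}, i.e. U ⊕ W = F_2^n."""
    return [W for W in subs if dim(W) == n - dim(U) and U & W == {0}]


for n in range(1, 5):
    subs = subspaces(n)
    for k in range(n + 1):
        pairs = sum(len(supplements(U, subs, n)) for U in subs if dim(U) == k)
        assert pairs == q_binomial(n, k, 2) * 2 ** (k * (n - k))  # (11.8.10)
        for U in subs:
            if dim(U) == k:
                assert len(supplements(U, subs, n)) == 2 ** (k * (n - k))  # Corollary 11.8.7
```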
11.9. The distribution polynomials of statistics on discrete structures

Let $\Delta$ be a finite set of discrete structures, and suppose that $s : \Delta \to \mathbb{N}$. The mapping $s$ is called a statistic on $\Delta$. The set $\Delta$ is frequently regarded as being equipped with the uniform probability distribution, so that $s$ is an (integer-valued) random variable. The distribution polynomial $p(\Delta, s, q)$ of $s$ on $\Delta$ is defined by
$$p(\Delta, s, q) := \sum_{\delta \in \Delta} q^{s(\delta)} = \sum_{k \ge 0} |\{\delta \in \Delta : s(\delta) = k\}|\, q^k, \tag{11.9.1}$$
where $q$ is an indeterminate (the reason for our use of $q$ here, rather than the usual $x$, $y$, or $z$, will become clear in what follows). Elementary examples include the following.
(i) $\Delta = 2^{[n]}$ and $s(A) = |A|$, with $p(\Delta, s, q) = \sum_{A \subseteq [n]} q^{|A|} = \sum_{k \ge 0} \binom{n}{k} q^k = (1+q)^n$.
(ii) $\Delta = S_n$ and $s(g) =$ the number of cycles in the permutation $g$, with $p(\Delta, s, q) = \sum_{k \ge 0} c(n,k) q^k = q^{\overline{n}} = q(q+1)\cdots(q+n-1)$.
Note that $p(\Delta, s, q)|_{q=1} = |\Delta|$ and $D_q\, p(\Delta, s, q)|_{q=1} / |\Delta| = \mu_s$, the mean value of $s$ on $\Delta$. Higher moments of $s$ can also be calculated from the distribution polynomial of $s$. For example, $D_q^2\, p(\Delta, s, q)|_{q=1} / |\Delta|$ is the mean value of $s(s-1)$. This is of interest because we frequently encounter an infinite sequence $(\Delta_n, s_n)_{n \ge 0}$ of sets of discrete structures and their associated statistics, and we wish to determine the asymptotic behavior of the random variables $s_n$.

Now consider a more substantial problem. Suppose that $a_1 a_2 \cdots a_n$ is a permutation of $[n]$. If $1 \le i < j \le n$ and $a_i > a_j$, the pair $(a_i, a_j)$ is said to constitute an inversion. For each $g \in S_n$, let $\mathrm{inv}(g)$ be the number of inversions in $g(1)g(2)\cdots g(n)$; this defines the statistic inv. Let $i(n,k)$ denote the number of permutations in $S_n$ with exactly $k$ inversions, where, of course, $0 \le k \le \binom{n}{2}$. We wish to determine the distribution polynomial $p(S_n, \mathrm{inv}, q) = \sum_{g \in S_n} q^{\mathrm{inv}(g)} = \sum_{k=0}^{\binom{n}{2}} i(n,k) q^k$. We first establish a recursive formula for the numbers $i(n,k)$.

Theorem 11.9.1. For all $k \ge 0$, $i(0,k) = \delta_{0,k}$, and for all $n > 0$ and all $k \ge 0$,
$$i(n,k) = \sum_{r=0}^{\min\{k,\,n-1\}} i(n-1, k-r). \tag{11.9.2}$$
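Proof. Among all permutations of $[n]$ with $k$ inversions, the term $i(n-1, k-r)$ counts those in which $r$ numbers appear to the right of the number $n$. Each of these $r$ numbers forms an inversion with $n$, and deleting $n$ leaves a permutation of $[n-1]$ with $k-r$ inversions; this correspondence is a bijection. □

Table 11.1. The numbers $i(n,k)$ for $0 \le n \le 5$ and $0 \le k \le 10$

       k=0  k=1  k=2  k=3  k=4  k=5  k=6  k=7  k=8  k=9 k=10
n=0      1
n=1      1
n=2      1    1
n=3      1    2    2    1
n=4      1    3    5    6    5    3    1
n=5      1    4    9   15   20   22   20   15    9    4    1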
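The recurrence (11.9.2) is easy to run. The following Python sketch is ours, and its names are hypothetical. It builds the rows of Table 11.1 from (11.9.2), treating $i(n-1, t)$ as 0 for $t > \binom{n-1}{2}$. It then cross-checks each row against a direct count of inversions over all permutations.

```python
from itertools import combinations, permutations


def inversions(word):
    """Number of pairs of positions i < j with word[i] > word[j]."""
    return sum(1 for i, j in combinations(range(len(word)), 2) if word[i] > word[j])


def inversion_rows(n_max):
    """rows[n][k] = i(n, k) for 0 <= k <= C(n, 2), computed with the recurrence (11.9.2)."""
    rows = [[1]]  # i(0, k) = delta_{0,k}
    for n in range(1, n_max + 1):
        prev = rows[-1]
        top = n * (n - 1) // 2
        rows.append([
            sum(prev[k - r] for r in range(min(k, n - 1) + 1) if k - r < len(prev))
            for k in range(top + 1)
        ])
    return rows


def brute_force_row(n):
    """Distribution of inv over S_n by direct enumeration."""
    counts = [0] * (n * (n - 1) // 2 + 1)
    for perm in permutations(range(1, n + 1)):
        counts[inversions(perm)] += 1
    return counts


rows = inversion_rows(6)
for n in range(7):
    assert rows[n] == brute_force_row(n)
for n, row in enumerate(rows[:6]):
    print(f"n={n}:", row)
```

The printed rows are the rows of Table 11.1.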
Using the recurrence (11.9.2), we can determine the distribution polynomial of inv on $S_n$. By $S_0$ we denote the set of permutations of $[0] = \varnothing$. There is just one such permutation (the empty permutation), and it has no inversions. So $p(S_0, \mathrm{inv}, q) = 1$. Also, clearly, $p(S_1, \mathrm{inv}, q) = 1$.

Theorem 11.9.2. For all $n \ge 1$,
$$p(S_n, \mathrm{inv}, q) = (1)(1+q)\cdots(1+q+\cdots+q^{n-1}) = n!_q. \tag{11.9.3}$$
Proof. It suffices to show that $p(S_n, \mathrm{inv}, q) = (1+q+\cdots+q^{n-1})\, p(S_{n-1}, \mathrm{inv}, q)$ for all $n > 1$, from which (11.9.3) follows by induction on $n$. First, we extend the notation $i(n,k)$ to arbitrary $k \in \mathbb{Z}$ by setting $i(n,k) = 0$ if $k < 0$. Then (11.9.2) may be written
$$i(n,k) = \sum_{r=0}^{n-1} i(n-1, k-r), \quad n > 0,\ k \ge 0. \tag{11.9.4}$$
If $n > 1$, then
$$\begin{aligned}
p(S_n, \mathrm{inv}, q) &= \sum_{k=0}^{\binom{n}{2}} i(n,k) q^k = \sum_{k=0}^{\binom{n}{2}} q^k \sum_{r=0}^{n-1} i(n-1, k-r) \\
&= \sum_{r=0}^{n-1} q^r \sum_{k=0}^{\binom{n}{2}} i(n-1, k-r) q^{k-r} = \sum_{r=0}^{n-1} q^r \sum_{k=r}^{\binom{n}{2}} i(n-1, k-r) q^{k-r} \\
&\overset{(t = k-r)}{=} \sum_{r=0}^{n-1} q^r \sum_{t=0}^{\binom{n}{2}-r} i(n-1, t) q^t = \sum_{r=0}^{n-1} q^r \sum_{t=0}^{\binom{n-1}{2}} i(n-1, t) q^t \\
&= (1+q+\cdots+q^{n-1})\, p(S_{n-1}, \mathrm{inv}, q).
\end{aligned} \tag{11.9.5}$$
In the middle sum above, the upper limit $\binom{n}{2} - r$ may be replaced by $\binom{n-1}{2}$ for two reasons. First, $i(n-1,t) = 0$ for $t > \binom{n-1}{2}$. Second, $\binom{n-1}{2} = \binom{n}{2} - (n-1) \le \binom{n}{2} - r$, since $r \le n-1$. □

The apparent row symmetry of Table 11.1 is easily proved.

Theorem 11.9.3. For all $n \ge 0$ and for $k = 0,\ldots,\binom{n}{2}$,
$$i(n,k) = i\left(n, \binom{n}{2} - k\right). \tag{11.9.6}$$
Proof. It suffices to prove that $p(S_n, \mathrm{inv}, q)$ and its reciprocal polynomial are identical. We leave the verification of this fact, as well as a combinatorial proof of (11.9.6), as exercises. □

Since $\sum_{k=0}^{\binom{n}{2}} i(n,k) q^k = (1)(1+q)\cdots(1+q+\cdots+q^{n-1})$, it follows from Exercise 1.12 that $i(n,k)$ enumerates two families. The first is the permutations of $[n]$ with $k$ inversions. The second is the weak $n$-compositions $k_1 + \cdots + k_n = k$ such that $0 \le k_i \le i-1$, $i = 1,\ldots,n$. Is there a nice bijection from the set of all permutations of $[n]$ with $k$ inversions to the set of such weak $n$-compositions of $k$? There is. It is the restriction, to the set of permutations of $[n]$ with $k$ inversions, of a bijection defined on all of $S_n$.

Theorem 11.9.4. Let $a_1 a_2 \cdots a_n$ be a permutation of $[n]$. For $i = 1,\ldots,n$, let $k_i$ be the number of $a_j$'s appearing to the right of $i$ in $a_1 a_2 \cdots a_n$ such that $i > a_j$. The map $a_1 a_2 \cdots a_n \mapsto (k_1,\ldots,k_n)$ is a bijection from $S_n$ to $\{(k_1,\ldots,k_n) : 0 \le k_i \le i-1,\ i = 1,\ldots,n\}$. Moreover, $k_1 + \cdots + k_n = k$ if and only if $a_1 a_2 \cdots a_n$ has $k$ inversions.

Proof. There are $i-1$ members of $[n]$ that are less than $i$, so $0 \le k_i \le i-1$. Each inversion $(i, a_j)$, with $i$ to the left of $a_j$ and $i > a_j$, is counted exactly once, namely in $k_i$. So $k_1 + \cdots + k_n$ is the number of inversions, and it remains only to exhibit the inverse of the map defined above. Our discussion is facilitated by consideration of the permutation 4271365, which is mapped to the inversion table $(k_1,\ldots,k_7) = (0,1,0,3,0,1,4)$. One recovers the permutation 4271365 from this table as follows.
1° Write down the number 1.
2° Since $k_2 = 1$, the number 2 must appear to the left of 1: 21.
3° Since $k_3 = 0$, the number 3 must appear to the right of both 2 and 1: 213.
4° Since $k_4 = 3$, the number 4 must appear to the left of 2, 1, and 3: 4213.
5° Since $k_5 = 0$, the number 5 must appear to the right of 4, 2, 1, and 3: 42135.
6° Since $k_6 = 1$, the number 6 must appear between 3 and 5 above: 421365.
7° Since $k_7 = 4$, the number 7 must appear between 2 and 1 above: 4271365.
In general, one inserts $1, 2, \ldots, n$ in turn. Each $i$ is placed so that exactly $k_i$ of the previously placed (and hence smaller) numbers lie to its right. Later insertions are larger than $i$ and so do not change $k_i$. □

Theorem 11.9.2 can be bootstrapped to cover the enumeration of inversions in sequential arrangements of members of the multiset $M = \{1^{n_1}, 2^{n_2}, \ldots, k^{n_k}\}$, where $n_1 + \cdots + n_k = n$. Let $S(M)$ denote the set of all $n!/(n_1! \cdots n_k!)$ sequential arrangements of $M$, with $\sigma$ denoting a generic such arrangement. Extend the inversion statistic inv from $S_n$ to $S(M)$ in the obvious way: an inversion in $\sigma = \sigma_1 \cdots \sigma_n$ is a pair of positions $i < j$ with $\sigma_i > \sigma_j$.

Theorem 11.9.5.
$$p(S(M), \mathrm{inv}, q) = \sum_{\sigma \in S(M)} q^{\mathrm{inv}(\sigma)} = \frac{n!_q}{(n_1)!_q \cdots (n_k)!_q} = \binom{n}{n_1,\ldots,n_k}_q. \tag{11.9.7}$$
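Proof. Consider the map from $S(M) \times S_{n_1} \times \cdots \times S_{n_k}$ to $S_n$ defined by $(\sigma, g_1,\ldots,g_k) \mapsto g$, where $g(1)g(2)\cdots g(n)$ is constructed as follows.
1° Replace the $n_1$ 1's in $\sigma$, from left to right, by $g_1(1), g_1(2), \ldots, g_1(n_1)$.
2° Replace the $n_2$ 2's in $\sigma$, from left to right, by $n_1 + g_2(1), n_1 + g_2(2), \ldots, n_1 + g_2(n_2)$.
⋮
k° Replace the $n_k$ $k$'s in $\sigma$, from left to right, by $n_1 + \cdots + n_{k-1} + g_k(1), \ldots, n_1 + \cdots + n_{k-1} + g_k(n_k)$.

As an illustration, suppose that $M = \{1^2, 2^3, 3^3\}$, $\sigma = 21331223$, $g_1(1)g_1(2) = 21$, $g_2(1)g_2(2)g_2(3) = 231$, and $g_3(1)g_3(2)g_3(3) = 312$. Then $(\sigma, g_1, g_2, g_3) \mapsto 42861537$. Keeping in mind that $M$ is given, it is easy to see that this mapping is a bijection. (How would one recover $(\sigma, g_1, g_2, g_3)$ from 42861537?)

Moreover, if $(\sigma, g_1,\ldots,g_k) \mapsto g$, then $\mathrm{inv}(g) = \mathrm{inv}(\sigma) + \mathrm{inv}(g_1) + \cdots + \mathrm{inv}(g_k)$. Indeed, an inversion of $g$ between entries coming from distinct letters of $\sigma$ corresponds to an inversion of $\sigma$. An inversion between entries coming from the same letter $c$ corresponds to an inversion of $g_c$. Hence
$$\sum_{g \in S_n} q^{\mathrm{inv}(g)} = \sum_{\sigma \in S(M)} q^{\mathrm{inv}(\sigma)} \sum_{g_1 \in S_{n_1}} q^{\mathrm{inv}(g_1)} \cdots \sum_{g_k \in S_{n_k}} q^{\mathrm{inv}(g_k)},$$
which, by (11.9.3), is equivalent to
$$n!_q = \sum_{\sigma \in S(M)} q^{\mathrm{inv}(\sigma)} \times (n_1)!_q \times \cdots \times (n_k)!_q. \tag{11.9.8}$$
□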
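This section uses two bijections: the inversion table of Theorem 11.9.4, and the map $(\sigma, g_1, \ldots, g_k) \mapsto g$ from the proof of Theorem 11.9.5. Both can be sketched in Python as follows. This is our own illustration, and the function names are hypothetical. The inverse of the inversion table uses the insertion procedure described after the worked example. The final check compares both sides of (11.9.8), as polynomials in $q$, for one small multiset.

```python
from collections import Counter
from itertools import combinations, permutations


def inversions(word):
    return sum(1 for i, j in combinations(range(len(word)), 2) if word[i] > word[j])


def inversion_table(perm):
    """(k_1, ..., k_n): k_i counts the entries smaller than i lying to the right of i."""
    where = {a: pos for pos, a in enumerate(perm)}
    return [sum(1 for a in perm[where[i] + 1:] if a < i) for i in range(1, len(perm) + 1)]


def from_inversion_table(table):
    """Insert 1, 2, ..., n so that exactly k_i smaller entries lie to the right of i."""
    word = []
    for i, k in enumerate(table, start=1):
        word.insert(len(word) - k, i)
    return word


def merge(sigma, gs):
    """The map (sigma, g_1, ..., g_k) -> g; gs[c - 1] is g_c written as a tuple of values."""
    offsets = [sum(len(g) for g in gs[:c]) for c in range(len(gs))]
    used = [0] * len(gs)  # how many copies of each letter have been replaced so far
    out = []
    for letter in sigma:
        c = letter - 1
        out.append(offsets[c] + gs[c][used[c]])
        used[c] += 1
    return out


def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def q_factorial_coeffs(m):
    """Coefficient list of m!_q = (1)(1 + q)...(1 + q + ... + q^(m-1))."""
    coeffs = [1]
    for j in range(1, m + 1):
        coeffs = poly_mul(coeffs, [1] * j)
    return coeffs


def arrangement_poly(multiset):
    """Coefficient list of the sum of q^inv(sigma) over all arrangements sigma of the multiset."""
    counts = Counter(inversions(w) for w in set(permutations(multiset)))
    return [counts[d] for d in range(max(counts) + 1)]


assert inversion_table([4, 2, 7, 1, 3, 6, 5]) == [0, 1, 0, 3, 0, 1, 4]
assert from_inversion_table([0, 1, 0, 3, 0, 1, 4]) == [4, 2, 7, 1, 3, 6, 5]
assert merge([2, 1, 3, 3, 1, 2, 2, 3], [(2, 1), (2, 3, 1), (3, 1, 2)]) == [4, 2, 8, 6, 1, 5, 3, 7]

# (11.9.8) for M = {1^2, 2^3, 3^1}: n!_q equals the arrangement polynomial times the (n_i)!_q
rhs = arrangement_poly([1, 1, 2, 2, 2, 3])
for n_i in (2, 3, 1):
    rhs = poly_mul(rhs, q_factorial_coeffs(n_i))
assert rhs == q_factorial_coeffs(6)
```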
Remark 11.9.6. We noted in (11.7.6) that $\deg \binom{n}{n_1,\ldots,n_k}_q = \sum_{1 \le i < j \le k} n_i n_j$ […] 1 − ∑𝑘=1 𝑎𝑘.)

11.2. Prove that the number sequence $\binom{n}{0}_q, \binom{n}{1}_q, \ldots, \binom{n}{n}_q$ is unimodal whenever $q > 0$.
11.3. Prove Theorem 11.5.2.
11.4. Prove Theorem 11.5.3.
11.5. (a) Determine the number (denoted by $s_q(n)$) of spanning subsets $S$ of $F_q^n$, i.e., sets $S \subseteq F_q^n$ such that $\langle S \rangle = F_q^n$. (b) Suppose that $n \ge k$. Determine the number of sequences $(x_1,\ldots,x_n)$ of vectors in $F_q^k$ such that $\langle x_1,\ldots,x_n \rangle = F_q^k$.
11.6. Determine the number of linear transformations $t : F_q^n \to F_q^m$ of rank $r$.
11.7. (For students of linear algebra) Let $U$ and $V$ be finite-dimensional vector spaces over an arbitrary field $F$. Prove that the number of surjective linear transformations from $U$ to $V$ is equal to the number of injective linear transformations from $V$ to $U$.
11.8. Furnish the missing details in the proof of Theorem 11.7.1.
11.9. Prove Corollary 11.8.7.
11.10. (a) Prove formula (11.9.6) by showing that $p(S_n, \mathrm{inv}, q)$ and its reciprocal polynomial are identical. (b) Prove (11.9.6) by exhibiting a bijection from the set of all permutations of $[n]$ with $k$ inversions to the set of all permutations of $[n]$ with $\binom{n}{2} - k$ inversions.
11.11. Determine the expected number of inversions in a randomly chosen permutation of $[n]$, where $n \ge 2$.
11.12. Prove Theorem 11.11.1 using the recurrences (11.3.4) and (11.3.5).
11.13. Suppose that $\binom{n}{k}_q = \sum_{i=0}^{k(n-k)} a_i q^i$. Prove that $a_i = a_{k(n-k)-i}$ for $0 \le i \le k(n-k)$, by showing that $\binom{n}{k}_q$ is identical with its reciprocal polynomial.
11.14. Let $\Lambda(k, n-k)$ denote the set of all minimal lattice paths from $(0,0)$ to $(k, n-k)$. For each $\lambda \in \Lambda(k, n-k)$, let $\beta(\lambda)$ denote the area above $\lambda$ and inside the rectangle with vertices $(0,0)$, $(0,n-k)$, $(k,n-k)$, and $(k,0)$. Determine the distribution polynomial $\sum_{\lambda \in \Lambda(k,n-k)} q^{\beta(\lambda)}$. How are Exercises 11.13 and 11.14 related?
11.15. Give a combinatorial proof of Theorem 11.6.1 using the interpretation of $\binom{m+n}{r}_q$ featured in the case $k = 2$ of Theorem 11.9.6.
Projects

11.A For any distribution polynomial $p(\Delta, s, q)$, the integer $p(\Delta, s, -1) := p(\Delta, s, q)|_{q=-1}$ clearly gives the difference between the number of members of $\Delta$ with even $s$-values and the number of those with odd $s$-values. In particular, when $p(\Delta, s, -1) = 0$, the statistic $s$ is said to be balanced on $\Delta$.
(a) Verify algebraically that the statistic that records the number of cycles in a permutation in $S_n$ is balanced for $n \ge 2$. Give a combinatorial proof of this fact by constructing a bijection from the set of permutations in $S_n$ with an even number of cycles to the set of permutations in $S_n$ with an odd number of cycles.
(b) Verify algebraically that the statistic that records the number of inversions in a permutation in $S_n$ is balanced for $n \ge 2$. Give a combinatorial proof of this fact by constructing a bijection from the set of permutations in $S_n$ having an even number of inversions to the set of permutations in $S_n$ having an odd number of inversions.
11.B If $n, k \ge 0$, let $\sigma_q(n,0) := \delta_{n,0}$ and $\sigma_q(0,k) := \delta_{0,k}$. If $n, k > 0$, let $\sigma_q(n,k)$ be equal to the number of sequences $(V_1,\ldots,V_n)$ of one-dimensional subspaces of $F_q^k$ such that $V_1 + \cdots + V_n = F_q^k$. In particular, $\sigma_q(n,k) = 0$ if $n < k$.
(i) Use $q$-binomial inversion to prove that $\sigma_q(n,k) = \sum_{j=0}^{k} (-1)^{k-j} q^{\binom{k-j}{2}} \binom{k}{j}_q\, j_q^{\,n}$, for all $n, k \ge 0$.
(ii) Prove that $\sigma_q(n,k) = k_q\,[q^{k-1}\sigma_q(n-1,k-1) + \sigma_q(n-1,k)]$, for all $n, k > 0$.
(iii) Following Davis (1991), define the $q$-difference operator $\Delta_q$ on any function $f : \mathbb{R} \to \mathbb{R}$ by
$$\Delta_q f(x) = \frac{f(qx+1) - f(x)}{(q-1)x + 1}.$$
Define powers of $\Delta_q$ inductively, by $\Delta_q^0 =$ the identity function on $\mathbb{R}^{\mathbb{R}}$, $\Delta_q^1 = \Delta_q$, and $\Delta_q^{k+1} = \Delta_q \circ \Delta_q^k$ for all $k \ge 1$. Find formulas for $\Delta_q f(x)g(x)$ and $\Delta_q \dfrac{f(x)}{g(x)}$.
(iv) Define polynomials $p_{n,q}(x)$ over $\mathbb{C}$ by $p_{0,q}(x) = 1$, $p_{1,q}(x) = x$, and $p_{n,q}(x) = x(x - 1_q)\cdots(x - (n-1)_q)$ if $n \ge 2$. Prove that $\Delta_q p_{0,q}(x) = 0$ and $\Delta_q p_{n,q}(x) = n_q\, p_{n-1,q}(x)$ for all $n \ge 1$.
(v) Prove that $x^n = \sum_{k \ge 0} \sigma_q(n,k) \dfrac{p_{k,q}(x)}{q^{\binom{k}{2}} k!_q}$ for all $n \ge 0$. (Hint: Prove that $(r_q)^n = \sum_{k \ge 0} \sigma_q(n,k) \dfrac{p_{k,q}(r_q)}{q^{\binom{k}{2}} k!_q}$ for all $r \ge n$, noting that $\dfrac{p_{k,q}(r_q)}{q^{\binom{k}{2}} k!_q} = \binom{r}{k}_q$, and referring back to your proof of (i).)
(vi) Prove that if $p(x)$ is a polynomial of degree $n$, then $p(x) = \sum_{k=0}^{n} \dfrac{\Delta_q^k p(0)}{k!_q}\, p_{k,q}(x)$.
(vii) Prove that $\sigma_q(n,k) = q^{\binom{k}{2}}\, \Delta_q^k x^n |_{x=0}$.
(viii) Prove that
$$\Delta_q^k f(x) = \frac{\sum_{j=0}^{k} (-1)^{k-j} q^{\binom{k-j}{2}} \binom{k}{j}_q f(q^j x + j_q)}{q^{\binom{k}{2}} [(q-1)x + 1]^k}.$$
Setting $f(x) = x^n$, with $x = 0$, in conjunction with (vi), yields an alternative proof of formula (i).
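Parts (i) and (ii) of Project 11.B can be spot-checked numerically before attempting the proofs. The sketch below is ours, and its names are hypothetical. It evaluates the alternating sum in (i) and the recurrence in (ii) for small integer values of $q$. For $q = 2$, it also compares them with a direct count. That count uses the fact that each one-dimensional subspace of $F_2^k$ contains exactly one nonzero vector, so sequences of lines correspond to sequences of nonzero vectors.

```python
from functools import lru_cache
from itertools import product


def q_int(j, q):
    return sum(q ** i for i in range(j))  # j_q = 1 + q + ... + q^(j-1)


def q_binomial(n, k, q):
    if k < 0 or k > n:
        return 0
    num = den = 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (i + 1) - 1
    return num // den


def sigma_sum(n, k, q):
    """The alternating sum in part (i)."""
    return sum((-1) ** (k - j) * q ** ((k - j) * (k - j - 1) // 2)
               * q_binomial(k, j, q) * q_int(j, q) ** n for j in range(k + 1))


@lru_cache(maxsize=None)
def sigma_rec(n, k, q):
    """The recurrence in part (ii), with sigma_q(n, 0) = delta_{n,0} and sigma_q(0, k) = delta_{0,k}."""
    if n == 0 or k == 0:
        return 1 if n == k else 0
    return q_int(k, q) * (q ** (k - 1) * sigma_rec(n - 1, k - 1, q) + sigma_rec(n - 1, k, q))


def gf2_rank(rows):
    rows, rank = list(rows), 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def sigma_brute_q2(n, k):
    """Sequences of n nonzero vectors of F_2^k (one per line) that span F_2^k."""
    return sum(1 for vs in product(range(1, 2 ** k), repeat=n) if gf2_rank(vs) == k)


for q in (2, 3, 4):
    for n in range(6):
        for k in range(6):
            assert sigma_sum(n, k, q) == sigma_rec(n, k, q)
for n in range(1, 5):
    for k in range(1, 4):
        assert sigma_brute_q2(n, k) == sigma_sum(n, k, 2)
```

Python evaluates `0 ** 0` as 1, which makes the $j = 0$ term of (i) behave correctly when $n = 0$.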
Chapter 12
Ordered sets
Among the numerous relations termed orders, the property of transitivity is the universal defining feature. This chapter explores several ordered structures that arise when transitivity is supplemented by one or more additional properties, including total orders, quasi-orders, weak orders, and their asymmetric counterparts. Our primary focus, however, is on partial orders, which appear in virtually all areas of mathematics involving the study of substructures of a given structure. Of additional interest to students of combinatorics is the fact that the seedbed of Rota’s unification of the subject (to be examined in detail in Chapter 14) is a family of algebras of bivariate functions defined on partially ordered sets with all intervals finite.
12.1. Total orders and their generalizations

A binary relation $R$ on a set $S$ is a total (or linear) order if it is reflexive, transitive, antisymmetric, and complete. A total order $R$ on a set $S$ is a well-ordering of $S$ if, for every nonempty $A \subseteq S$, there exists an $l \in A$ such that $lRa$ for all $a \in A$. When (as is frequently the case) a total order is represented by the symbol $\le$, such an element $l$ is called a smallest (or least) element of $A$. By antisymmetry, if a smallest element of a set exists, it is unique. There clearly exist totally ordered sets that are not well-ordered; for example, $\mathbb{Z}$ with its usual order has no least element. However, all such sets are infinite.

Theorem 12.1.1. If $\le$ is a total order on the finite set $S$, then $\le$ is a well-ordering of $S$.

Proof. We show, by induction on $|A|$, that every nonempty $A \subseteq S$ has a least element. If $|A| = 1$, the result holds by reflexivity of $\le$. Suppose that every $A \subseteq S$ for which $|A| = k$ contains a least element. If $B \subseteq S$ and $|B| = k+1$, select any $c \in B$. Then there exists an $a \in B - \{c\}$ such that $a \le b$ for all $b \in B - \{c\}$. By completeness, $a \le c$ or $c \le a$. If the former, then $a$ is the least element of $B$. If the latter, then, by transitivity, $c$ is the least element of $B$. □

Corollary 12.1.2. If $|S| = n$, there are $n!$ total orders on $S$.
Proof. By Theorem 12.1.1, the total orders on $S$ are exactly its well-orderings. Listing the elements of $S$ from least to greatest gives the obvious bijection from the set of well-orderings of $S$ to the set of all permutations of $S$ (construed as words). □

By omitting one or more of the defining properties of a total order, we arrive at three interesting generalizations:
(1) $R$ is a partial order if it is reflexive, transitive, and antisymmetric.
(2) $R$ is a weak order if it is reflexive, transitive, and complete.
(3) $R$ is a quasi-order (or preorder) if it is reflexive and transitive.
We will be chiefly concerned with partial orders in this chapter. Before turning to that subject, however, we take brief note of two connections: between quasi-orders and topologies on $S$, and between weak orders on $S$ and ordered partitions of $S$.
12.2. *Quasi-orders and topologies

Readers may wish to review the material in Appendix B before reading this section. It turns out that every topology on a set $S$ induces a quasi-order on $S$. Moreover, when $S$ is finite, there is a one-to-one correspondence between the set of topologies on $S$ and the set of quasi-orders on $S$. Proofs of all theorems in this section are left as exercises (see Project 12.A).

Theorem 12.2.1. Suppose that $\mathcal{T}$ is a topology on $S$. Define the binary relation $R$ on $S$ by $xRy$ if and only if $x$ belongs to every open set that contains $y$. Then $R$ is a quasi-order.

Theorem 12.2.2. $(S, \mathcal{T})$ is a $T_0$-space if and only if the quasi-order $R$ induced by $\mathcal{T}$ is a partial order.

Theorem 12.2.3. $(S, \mathcal{T})$ is a $T_1$-space if and only if the quasi-order $R$ induced by $\mathcal{T}$ is the identity relation. In particular, the quasi-order induced by any metric topology is the identity relation.

In view of Theorem 12.2.3, the map $\mathcal{T} \mapsto R$ from topologies on $S$ to quasi-orders on $S$ need not be injective. For example, the standard Euclidean topology on $\mathbb{R}$ and the discrete topology on $\mathbb{R}$ both induce the identity relation on $\mathbb{R}$. Suppose, however, that the domain of this map is restricted to the set of so-called final topologies on $S$, that is, topologies that are closed under arbitrary intersections as well as arbitrary unions. Then the map is bijective, as indicated below.

Theorem 12.2.4. Suppose that $R$ is a quasi-order on $S$. For each $y \in S$, let $B(y) := \{x \in S : xRy\}$. The family $\mathcal{B} := \{B(y) : y \in S\}$ is a base for a final topology on $S$.

Proof. Exercise. (Hint: Show that arbitrary intersections of members of $\mathcal{B}$ are equal to unions of members of $\mathcal{B}$.) □

The following theorem establishes that the map $\mathcal{T} \mapsto R$ from the set of all final topologies on $S$ to the set of all quasi-orders on $S$ is injective.

Theorem 12.2.5. If $R$ is the quasi-order induced by the final topology $\mathcal{T}$, and $\mathcal{T}^*$ is the final topology induced by $R$, then $\mathcal{T}^* = \mathcal{T}$.

The following theorem establishes that the map $\mathcal{T} \mapsto R$ from the set of all final topologies on $S$ to the set of all quasi-orders on $S$ is surjective.

Theorem 12.2.6. If $\mathcal{T}$ is the final topology induced by the quasi-order $R$, and $R^*$ is the quasi-order induced by $\mathcal{T}$, then $R^* = R$.

If the set $S$ is finite, then every topology on $S$ consists of finitely many open sets, and so every topology on $S$ is a final topology. So the following is a corollary of the preceding theorems.

Corollary 12.2.7. For every finite, nonempty set $S$, the map $\mathcal{T} \mapsto R$ defined in Theorem 12.2.1 is a bijection from the set of all topologies on $S$ to the set of all quasi-orders on $S$.
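For a three-element set, Corollary 12.2.7 can be confirmed exhaustively. The following Python sketch is our own, and its names are hypothetical. It proceeds in four steps:

- List all topologies on $\{0, 1, 2\}$. For a finite family, closure under pairwise unions and intersections suffices.
- Compute the quasi-order of Theorem 12.2.1 for each topology.
- Rebuild a topology from a quasi-order as all unions of the sets $B(y)$ of Theorem 12.2.4.
- Check that the two constructions are mutually inverse.

This checks one small case and is not a substitute for the proofs in Project 12.A.

```python
from itertools import combinations


def all_topologies(n):
    points = range(n)
    subsets = [frozenset(c) for r in range(n + 1) for c in combinations(points, r)]
    tops = []
    for mask in range(1 << len(subsets)):
        fam = {s for i, s in enumerate(subsets) if mask >> i & 1}
        if frozenset() in fam and frozenset(points) in fam and all(
                a | b in fam and a & b in fam for a in fam for b in fam):
            tops.append(frozenset(fam))
    return tops


def all_quasi_orders(n):
    pairs = [(x, y) for x in range(n) for y in range(n)]
    result = []
    for mask in range(1 << len(pairs)):
        rel = {p for i, p in enumerate(pairs) if mask >> i & 1}
        reflexive = all((x, x) in rel for x in range(n))
        transitive = all((x, z) in rel for (x, y) in rel for (w, z) in rel if y == w)
        if reflexive and transitive:
            result.append(frozenset(rel))
    return result


def induced_quasi_order(top, n):
    """x R y iff x belongs to every open set that contains y (Theorem 12.2.1)."""
    return frozenset((x, y) for x in range(n) for y in range(n)
                     if all(x in U for U in top if y in U))


def induced_topology(rel, n):
    """All unions of the sets B(y) = {x : x R y} (Theorem 12.2.4)."""
    opens = {frozenset()}
    for y in range(n):
        B = frozenset(x for x in range(n) if (x, y) in rel)
        opens |= {U | B for U in opens}
    return frozenset(opens)


n = 3
tops, quasi = all_topologies(n), all_quasi_orders(n)
assert len(tops) == len(quasi) == 29
assert all(induced_topology(induced_quasi_order(T, n), n) == T for T in tops)   # Theorem 12.2.5
assert all(induced_quasi_order(induced_topology(R, n), n) == R for R in quasi)  # Theorem 12.2.6
```

The count 29 agrees with the quasi-order column of Table 12.1 for $n = 3$.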
12.3. *Weak orders and ordered partitions

Recall that a weak order on a set $S$ is a binary relation on $S$ that is reflexive, transitive, and complete. The following theorems establish that there is a one-to-one correspondence between the set of weak orders on $S$ and the set of totally ordered partitions of $S$. Their proofs are left as exercises (see Project 12.B below).

Theorem 12.3.1. If $R$ is a weak order on $S$ and the relation $I$ on $S$ is defined by $xIy$ if and only if $xRy$ and $yRx$, then $I$ is an equivalence relation. Let $S/I$ be the partition of $S$ induced by $I$. For all $U, V \in S/I$, define $U \ge V$ if and only if there exist $u \in U$ and $v \in V$ such that $uRv$ (equivalently, for all $u \in U$ and $v \in V$, $uRv$). Then $\ge$ is a total order on $S/I$.

Theorem 12.3.2. Suppose that the partition $\mathcal{P}$ of $S$ is totally ordered by the relation $\ge$. Define the relation $R$ on $S$ for all $u, v \in S$ by $uRv \Leftrightarrow U \ge V$, where $u \in U \in \mathcal{P}$ and $v \in V \in \mathcal{P}$. Then $R$ is a weak order on $S$.

Theorem 12.3.3. If $(\mathcal{P}, \ge)$ is the totally ordered partition of $S$ induced by the weak order $R$, and $R^*$ is the weak order on $S$ induced by $(\mathcal{P}, \ge)$, then $R^* = R$.

Theorem 12.3.4. If $R$ is the weak order induced by the totally ordered partition $(\mathcal{P}, \ge)$ of $S$, and $(\mathcal{P}^*, \ge^*)$ is the totally ordered partition of $S$ induced by $R$, then $\mathcal{P}^* = \mathcal{P}$ and the relations $\ge^*$ and $\ge$ are identical.

Weak orders are of particular interest in an area of mathematical economics known as social choice (or social welfare) theory. This theory deals with methods for aggregating the weak orders of several individuals, or other entities, to produce a single societal order. Individuals (sometimes called voters) assess their weak orders on a (typically finite) set $S$. Its elements may be alternative social policies, candidates for public office, possible allocations of resources to various entities, and so on. The relation $xRy$ is interpreted to assert that an individual (or the society) regards alternative $x$ as being at least as desirable as alternative $y$.
If $R$ is a weak order, it is traditional in social choice theory to represent its asymmetric part $aR$ (defined in section 2.7) by the symbol $P$. The relation $xPy$ is interpreted as asserting that $x$ is preferred (or, for emphasis, strictly preferred) to $y$. The symmetric part $sR$ of $R$ (also defined in section 2.7) is traditionally represented by the symbol $I$, and the relation $xIy$ is interpreted as asserting indifference between $x$ and $y$.

There is a famous theorem of social welfare theory, due to Arrow (1951), which imposes severe limitations on the possibility of aggregating the weak orders of several individuals in a fair and rational way. Let $\mathcal{W}$ denote the set of all weak orders on the finite set $S$, where $|S| \ge 3$, and suppose that $n \ge 2$. An $n$-tuple $(R_1,\ldots,R_n) \in \mathcal{W}^n$ is called a preference profile; it represents a possible sequence of weak orders on $S$ assessed by individuals $i = 1, 2, \ldots, n$. Each mapping $\psi : \mathcal{W}^n \to \mathcal{W}$ is called a social welfare function (SWF). An SWF provides a comprehensive method for aggregating the individual preferences in each conceivable profile to produce a corresponding societal weak order.

Some social welfare functions are clearly unacceptable. These include the constant SWFs, which impose the same societal ordering for all profiles. They also include the dictatorial SWFs: for a fixed $d \in [n]$, such an SWF is defined for all $(R_1,\ldots,R_n) \in \mathcal{W}^n$ by $\psi(R_1,\ldots,R_n) = R_d$, thereby always adopting the preferential ordering of the same individual as the societal ordering.

A natural way of identifying fair and rational SWFs is to posit certain axiomatic restrictions on the class of acceptable SWFs. Two axioms have been deemed reasonable by many economists. (In what follows $\psi(R_1,\ldots,R_n)$ is denoted by $R$, $aR_i$ by $P_i$, and $aR$ by $P$.)
1. WP (Weak Pareto property). For all alternatives $x, y \in S$, if every individual strictly prefers $x$ to $y$ ($xP_iy$, $i = 1,\ldots,n$), then $xPy$.
2. IA (Irrelevance of alternatives). Let $(R_1,\ldots,R_n)$ and $(R_1^*,\ldots,R_n^*)$ be any two profiles, and let $x$ and $y$ be any alternatives. Suppose that, for each $i = 1,\ldots,n$, $R_i$ and $R_i^*$ rank $x$ and $y$ in exactly the same way ($xR_iy \Leftrightarrow xR_i^*y$ and $yR_ix \Leftrightarrow yR_i^*x$). Then $R$ and $R^*$ rank $x$ and $y$ in exactly the same way.

WP is prima facie reasonable.
The motivation for adopting IA is that the rankings provided by individual weak orders are purely ordinal. That is, the number of alternatives ranked between $x$ and $y$ is no indication of the strength of preference for $x$ over $y$.

Theorem 12.3.5 (Arrow impossibility theorem). Any SWF satisfying WP and IA is effectively dictatorial. That is, there exists an individual $d$ whose strict preferences are always adopted in the societal ordering: for all profiles, and all alternatives $x$ and $y$, $xP_dy \Rightarrow xPy$.

Proof. See Sen (1970). □
By Corollary 12.1.2 and Theorems 12.3.1–12.3.4, there are $n!$ total orders and $P_n$ weak orders on $[n]$, where $P_n$ is the horse race number defined in section 4.2. Enumerating quasi-orders, partial orders, and, more generally, transitive relations is considerably more difficult, but results have been computed for certain values of $n$. Table 12.1 is a partial listing of known results. For additional results, see the On-Line Encyclopedia of Integer Sequences (oeis.org).
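The correspondence of Theorems 12.3.1–12.3.4 and the count $P_n$ can be illustrated with a short Python sketch (ours; names hypothetical). The sketch does three things:

- It converts a weak order into its indifference classes, listed from top to bottom, and converts them back.
- It computes $P_n$ from the recurrence $P_0 = 1$, $P_m = \sum_{k=1}^{m} \binom{m}{k} P_{m-k}$ (choose the top block first). We use this recurrence for convenience; it is not quoted from section 4.2.
- It compares $P_n$ with a brute-force count of weak orders on small sets.

```python
from math import comb


def horse_race(n):
    """P_n via P_0 = 1 and P_m = sum_{k=1}^m C(m, k) P_{m-k}."""
    P = [1]
    for m in range(1, n + 1):
        P.append(sum(comb(m, k) * P[m - k] for k in range(1, m + 1)))
    return P[n]


def all_weak_orders(n):
    pairs = [(x, y) for x in range(n) for y in range(n)]
    result = []
    for mask in range(1 << len(pairs)):
        rel = {p for i, p in enumerate(pairs) if mask >> i & 1}
        complete = all((x, y) in rel or (y, x) in rel for x in range(n) for y in range(n))
        transitive = all((x, z) in rel for (x, y) in rel for (w, z) in rel if y == w)
        if complete and transitive:  # completeness with x = y gives reflexivity
            result.append(frozenset(rel))
    return result


def to_ordered_partition(rel, n):
    """Indifference classes of R (Theorem 12.3.1), most preferred class first."""
    blocks = []
    for x in range(n):
        for block in blocks:
            if (x, block[0]) in rel and (block[0], x) in rel:
                block.append(x)
                break
        else:
            blocks.append([x])
    # higher classes lie R-above more elements, so this sort realizes the total order >=
    blocks.sort(key=lambda b: -sum((b[0], y) in rel for y in range(n)))
    return [frozenset(b) for b in blocks]


def to_weak_order(blocks):
    """u R v iff the block of u is at or above the block of v (Theorem 12.3.2)."""
    level = {x: i for i, b in enumerate(blocks) for x in b}
    return frozenset((u, v) for u in level for v in level if level[u] <= level[v])


for n in range(4):
    weak = all_weak_orders(n)
    assert len(weak) == horse_race(n)
    assert all(to_weak_order(to_ordered_partition(R, n)) == R for R in weak)
```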
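Table 12.1. Enumeration of several types of transitive relations on [𝑛] for 1 ≤ 𝑛 ≤ 5
| 𝑛 | transitive relations | quasi-orders | partial orders | weak orders | total orders |
|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | 1 | 1 |
| 2 | 13 | 4 | 3 | 3 | 2 |
| 3 | 171 | 29 | 19 | 13 | 6 |
| 4 | 3994 | 355 | 219 | 75 | 24 |
| 5 | 154,303 | 6942 | 4231 | 541 | 120 |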
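The first rows of Table 12.1 can be confirmed by exhaustive search. The following Python sketch is only an illustration (the function names are ours, and relations on [𝑛] are encoded as sets of pairs drawn from {0, . . . , 𝑛 − 1}). It generates all 2^(𝑛²) relations and tests the defining properties directly, taking a quasi-order to be reflexive and transitive, a partial order to be an antisymmetric quasi-order, a weak order to be complete and transitive, and a total order to be an antisymmetric weak order. Since the search is exponential in 𝑛², it is practical only for very small 𝑛.

```python
from itertools import product

def relations(n):
    """Yield every relation on {0, ..., n-1} as a frozenset of ordered pairs."""
    pairs = [(x, y) for x in range(n) for y in range(n)]
    for bits in product((False, True), repeat=len(pairs)):
        yield frozenset(p for p, keep in zip(pairs, bits) if keep)

def is_transitive(R, n):
    # xRy and yRz must imply xRz
    return all((x, z) in R for (x, y) in R for z in range(n) if (y, z) in R)

def is_reflexive(R, n):
    return all((x, x) in R for x in range(n))

def is_antisymmetric(R):
    return all(x == y or (y, x) not in R for (x, y) in R)

def is_complete(R, n):
    # completeness (with x = y allowed) also forces reflexivity
    return all((x, y) in R or (y, x) in R for x in range(n) for y in range(n))

def count_orders(n):
    """Count the relations on [n] of each type listed in Table 12.1."""
    counts = dict.fromkeys(["transitive", "quasi", "partial", "weak", "total"], 0)
    for R in relations(n):
        if not is_transitive(R, n):
            continue
        counts["transitive"] += 1
        if is_reflexive(R, n):
            counts["quasi"] += 1
            if is_antisymmetric(R):
                counts["partial"] += 1
        if is_complete(R, n):
            counts["weak"] += 1
            if is_antisymmetric(R):
                counts["total"] += 1
    return counts

for n in range(1, 4):
    print(n, count_orders(n))
```

For 𝑛 = 1, 2, 3 the printed counts agree with the first three rows of Table 12.1; the larger entries require far more refined methods.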
12.4. *Strict orders
Readers may wish to review the material in section 2.7 before reading this section. Mathematicians (and, especially, mathematical economists) often prefer to work with the asymmetric parts of the various order relations introduced in the previous section (frequently, and confusingly, retaining the same terminology). This section clarifies the distinction between the two approaches, characterizing the asymmetric parts of total, weak, and partial orders by minimal sets of properties. We begin with three definitions.
(i) 𝑅 is a strict total order if it is irreflexive, transitive, and complete.
(ii) 𝑅 is a strict weak order if it is asymmetric and negatively transitive.
(iii) 𝑅 is a strict partial order if it is irreflexive and transitive.
The proofs of the theorems in this section are left as exercises (see Project 12.C). The following theorems establish that every strict total order is a strict weak order, and they characterize the family of strict total orders within the family of strict weak orders.
Theorem 12.4.1. If 𝑅 is irreflexive and transitive, then it is asymmetric.
Corollary 12.4.2. If 𝑅 is a strict total order or a strict partial order, then 𝑅 is asymmetric.
Theorem 12.4.3. Every strict total order is a strict weak order.
Theorem 12.4.4. 𝑅 is a strict total order if and only if it is a strict weak order and complete.
The following theorems establish that every strict weak order is a strict partial order, and they characterize the family of strict weak orders within the family of strict partial orders.
Theorem 12.4.5. Every strict weak order is a strict partial order.
Theorem 12.4.6. 𝑅 is a strict weak order if and only if it is a strict partial order and 𝑠𝑐𝑅 is transitive.
The relation 𝑠𝑐𝑅, known as the symmetric complement of 𝑅 (so that 𝑥(𝑠𝑐𝑅)𝑦 if and only if neither 𝑥𝑅𝑦 nor 𝑦𝑅𝑥), has the following properties.
Theorem 12.4.7. If 𝑅 is a strict weak order on 𝑋, then 𝑠𝑐𝑅 is an equivalence relation on 𝑋.
12. Ordered sets
Theorem 12.4.8. If 𝑅 is a strict total order on 𝑋, then 𝑠𝑐𝑅 = 𝐼, the identity relation on 𝑋.
Remark 12.4.9. If 𝑅 is a strict partial order, then 𝑠𝑐𝑅 is reflexive and symmetric. But 𝑠𝑐𝑅 need not be transitive, as shown by the example 𝑅 = {(𝑤, 𝑥), (𝑥, 𝑧), (𝑤, 𝑧), (𝑤, 𝑦)} on the set {𝑤, 𝑥, 𝑦, 𝑧}, where 𝑥(𝑠𝑐𝑅)𝑦 and 𝑦(𝑠𝑐𝑅)𝑧, but 𝑥𝑅𝑧 (so that 𝑥(𝑠𝑐𝑅)𝑧 fails).
The following theorems establish the connection between partial, weak, and total orders on a set 𝑋 and their respective asymmetric parts.
Theorem 12.4.10. The mapping 𝑅 ↦ 𝑎𝑅 is a bijection from the set of partial orders on 𝑋 to the set of strict partial orders on 𝑋, with inverse 𝑅 ↦ 𝑅 ∪ 𝐼, where 𝐼 is the identity relation on 𝑋.
Theorem 12.4.11. The mapping 𝑅 ↦ 𝑎𝑅 is a bijection from the set of weak orders on 𝑋 to the set of strict weak orders on 𝑋, with inverse 𝑅 ↦ 𝑅 ∪ 𝑠𝑐𝑅.
Theorem 12.4.12. The mapping 𝑅 ↦ 𝑎𝑅 is a bijection from the set of total orders on 𝑋 to the set of strict total orders on 𝑋, with inverse 𝑅 ↦ 𝑅 ∪ 𝑠𝑐𝑅 (= 𝑅 ∪ 𝐼).
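The theorems of this section are easy to test mechanically on small sets, although such a check is no substitute for the proofs requested in Project 12.C. In the Python sketch below (all helper names are hypothetical, not notation from the text), 𝑎𝑅 is computed as the set of pairs of 𝑅 whose reversal is not in 𝑅, and 𝑠𝑐𝑅 as the set of pairs (𝑥, 𝑦) with neither 𝑥𝑅𝑦 nor 𝑦𝑅𝑥. It reproduces the counterexample of Remark 12.4.9 and checks Theorems 12.4.10 and 12.4.11 exhaustively on a 3-element set.

```python
from itertools import product

def all_relations(X):
    """Yield every relation on the finite set X as a frozenset of pairs."""
    pairs = [(x, y) for x in X for y in X]
    for bits in product((False, True), repeat=len(pairs)):
        yield frozenset(p for p, keep in zip(pairs, bits) if keep)

def identity(X):
    return frozenset((x, x) for x in X)

def asym(R):
    """Asymmetric part aR: pairs (x, y) of R such that (y, x) is not in R."""
    return frozenset((x, y) for (x, y) in R if (y, x) not in R)

def sc(R, X):
    """Symmetric complement scR: pairs (x, y) with neither xRy nor yRx."""
    return frozenset((x, y) for x in X for y in X
                     if (x, y) not in R and (y, x) not in R)

def is_transitive(R):
    return all((x, w) in R for (x, y) in R for (z, w) in R if y == z)

def is_partial_order(R, X):
    antisymmetric = all(x == y or (y, x) not in R for (x, y) in R)
    return identity(X) <= R and is_transitive(R) and antisymmetric

def is_weak_order(R, X):
    complete = all((x, y) in R or (y, x) in R for x in X for y in X)
    return complete and is_transitive(R)

def is_strict_partial_order(R):
    return all(x != y for (x, y) in R) and is_transitive(R)

def is_strict_weak_order(R, X):
    asymmetric = all((y, x) not in R for (x, y) in R)
    # negative transitivity, in the equivalent form: xRz implies xRy or yRz
    negatively_transitive = all((x, y) in R or (y, z) in R
                                for (x, z) in R for y in X)
    return asymmetric and negatively_transitive

# Remark 12.4.9: a strict partial order whose symmetric complement is not transitive.
W = "wxyz"
R = frozenset({("w", "x"), ("x", "z"), ("w", "z"), ("w", "y")})
S = sc(R, W)
print(is_strict_partial_order(R), ("x", "y") in S, ("y", "z") in S, ("x", "z") in S)

# Theorems 12.4.10 and 12.4.11, checked exhaustively on a 3-element set.
X = range(3)
rels = list(all_relations(X))
partial = [P for P in rels if is_partial_order(P, X)]
weak = [Q for Q in rels if is_weak_order(Q, X)]
print(all(asym(P) | identity(X) == P for P in partial),
      all(asym(Q) | sc(asym(Q), X) == Q for Q in weak))
```

The first line confirms that 𝑥(𝑠𝑐𝑅)𝑦 and 𝑦(𝑠𝑐𝑅)𝑧 hold while 𝑥(𝑠𝑐𝑅)𝑧 does not; the second confirms that the stated inverses undo 𝑅 ↦ 𝑎𝑅 on every partial and weak order of a 3-element set.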
12.5. Partial orders: basic terminology and notation
As noted above, a partial order is a reflexive, transitive, antisymmetric relation. In accord with standard practice, we always denote a partial order by the symbol ≤. If 𝑃 is the set on which ≤ is defined, we call 𝑃 a partially ordered set, or poset. (Sometimes, for clarity, we write (𝑃, ≤) in place of 𝑃, and ≤𝑃 in place of ≤.) Although the empty relation on ∅ is vacuously reflexive, transitive, and antisymmetric, we shall often exclude this case from consideration. The asymmetric part of a partial order ≤ is denoted by <, and the converses of ≤ and < are denoted by ≥ and >, respectively. The relations < and > are strict partial orders (characterized, as shown in the preceding section, by the properties of irreflexivity and transitivity), and the relation ≥ is again a partial order. It is easy to show that 𝑥 < 𝑦 if and only if 𝑥 ≤ 𝑦 and 𝑥 ≠ 𝑦. By antisymmetry, the symmetric part of ≤ is simply the identity relation =.
If 𝑥, 𝑦 ∈ 𝑃 and 𝑥 ≤ 𝑦 or 𝑦 ≤ 𝑥, then 𝑥 and 𝑦 are said to be comparable (accent on the first syllable); otherwise 𝑥 and 𝑦 are incomparable. On every set 𝑃 the identity relation = is a partial order for which any two distinct elements are incomparable.
A subposet (or, for emphasis, an induced subposet) of the poset 𝑃 is a subset 𝑄 of 𝑃 and a partial order on 𝑄 such that 𝑥 ≤ 𝑦 in 𝑄 ⇔ 𝑥 ≤ 𝑦 in 𝑃. If the partial order on 𝑄 only has the weaker property 𝑥 ≤ 𝑦 in 𝑄 ⇒ 𝑥 ≤ 𝑦 in 𝑃, then 𝑄 is called a weak subposet of 𝑃. The posets 𝑃 and 𝑄 are isomorphic if there exists a bijection 𝑓 ∶ 𝑃 → 𝑄 such that 𝑥 ≤ 𝑦 in 𝑃 if and only if 𝑓(𝑥) ≤ 𝑓(𝑦) in 𝑄.
If 𝑎 ∈ 𝑃 and there exists no 𝑥 ∈ 𝑃 such that 𝑥 < 𝑎, then 𝑎 is called a minimal element. It is possible for every element of a poset to be minimal. (How?) Every finite nonempty poset 𝑃 contains at least one minimal element (proof by induction on |𝑃|). If there exists an element 0̂ ∈ 𝑃 such that 0̂ ≤ 𝑥 for all 𝑥 ∈ 𝑃, then 0̂ is called a smallest element of 𝑃. If 𝑃 has a smallest element, it is unique by antisymmetry, and we say that 𝑃 has a 0̂. If 𝑃 has a 0̂, then 0̂ is a minimal element of 𝑃, and, indeed, the only minimal element of 𝑃 (no 𝑎 ≠ 0̂ can be minimal, because 0̂ ≤ 𝑎, whence 0̂ < 𝑎).
If 𝑏 ∈ 𝑃 and there exists no 𝑦 ∈ 𝑃 such that 𝑦 > 𝑏, then 𝑏 is called a maximal element. If there exists an element 1̂ ∈ 𝑃 such that 𝑥 ≤ 1̂ for all 𝑥 ∈ 𝑃, then 1̂ is called a largest element of 𝑃. Note that 𝑏 is maximal with respect to ≤ if and only if it is minimal with respect to the partial order ≥, and 1̂ is a largest element with respect to ≤ if and only if it is a smallest element with respect to ≥. It follows immediately from the observations in the preceding paragraph that the following hold.
(1) It is possible for every element of a poset to be maximal.
(2) Every finite nonempty poset contains at least one maximal element.
(3) If a largest element of 𝑃 exists, it is unique (in which case we say that 𝑃 has a 1̂).
(4) If 𝑃 has a 1̂, then 1̂ is the unique maximal element of 𝑃.
In the above, we have invoked the duality principle for posets, which asserts that any theorem about the poset (𝑃, ≤) remains true in the poset (𝑃, ≥) when all occurrences of ≤ are replaced by ≥, all occurrences of < are replaced by >, the terms minimal and maximal are interchanged, and the symbols 0̂ and 1̂ are interchanged. You should make use of this principle whenever possible in deriving theorems about posets.
We noted above that the 0̂ of the poset 𝑃 is always the sole minimal element of 𝑃, and the 1̂ is always the sole maximal element. On the other hand, the existence of a unique minimal (respectively, maximal) element of 𝑃 need not imply the existence of a 0̂ (respectively, 1̂) of 𝑃. As seen below, however, all examples to the contrary occur when 𝑃 is infinite.
Theorem 12.5.1. Suppose that 𝑃 is finite. (i) If 𝑃 contains a unique minimal element, this element is the 0̂ of 𝑃. (ii) If 𝑃 contains a unique maximal element, this element is the 1̂ of 𝑃.
Proof. It suffices to prove (i), since (ii) follows from (i) by the duality principle. Suppose that 𝑎 is the unique minimal element of 𝑃, but 𝑎 is not the smallest element of 𝑃. Then there exists an 𝑥 ∈ 𝑃 such that ∼(𝑎 ≤ 𝑥). Since 𝑥 ≠ 𝑎, either (1) 𝑥 < 𝑎 or (2) 𝑥 and 𝑎 are incomparable. Alternative (1) is ruled out by the minimality of 𝑎, and so alternative (2) holds. Now 𝑥 can’t be minimal, since 𝑎 is the only minimal element. So there exists 𝑥1 such that 𝑥1 < 𝑥. Clearly, 𝑥1 ≠ 𝑎, since 𝑎 and 𝑥 are incomparable. Furthermore, 𝑥1 can’t be minimal, since 𝑎 is the only minimal element. So there exists 𝑥2 such that 𝑥2 < 𝑥1, whence 𝑥2 is distinct from 𝑥1, 𝑥, and 𝑎. If |𝑃| = 𝑛, continuing in this way exhausts the elements of 𝑃, with 𝑥𝑛−2 < 𝑥𝑛−3 < ⋯ < 𝑥1 < 𝑥, and 𝑎 incomparable with 𝑥 and each 𝑥𝑖. It follows that 𝑥𝑛−2 is minimal, contradicting the unique minimality of 𝑎. □
If 𝑥 and 𝑦 are elements of the poset 𝑃, we say that 𝑥 is covered by 𝑦 (or that 𝑦 covers 𝑥), symbolized by 𝑥 <· 𝑦, if 𝑥 < 𝑦 and there exists no 𝑧 such that 𝑥 < 𝑧 < 𝑦. When 𝑃 is finite, the covering relation <· completely determines ≤. Indeed, the strict order < is simply the transitive closure of <· (the intersection of all transitive relations that contain <·), and ≤ is obtained from < by adjoining the identity relation. In such cases there is a useful graphical representation of <·, called the Hasse diagram, in which elements of 𝑃 are represented as vertices, and the relation
𝑥 <· 𝑦 by an edge connecting 𝑥 to 𝑦, with 𝑥 below 𝑦. For example, Figure 12.1 is the Hasse diagram of the poset (𝑃, ≤), where 𝑃 = [6] and 𝑥 ≤ 𝑦 if and only if 𝑥 | 𝑦.
Figure 12.1
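For a finite poset, the covering relation can be extracted from ≤ by brute force, and ≤ can be recovered from it by taking the reflexive transitive closure. A minimal Python sketch, assuming the poset is supplied as a collection of elements together with a comparison function `leq` (both our own conventions), applied to the divisibility order on [6] of Figure 12.1:

```python
def covers(P, leq):
    """Covering pairs (x, y) of a finite poset: x < y with no z strictly between."""
    P = list(P)

    def lt(a, b):
        return a != b and leq(a, b)

    return sorted((x, y) for x in P for y in P
                  if lt(x, y) and not any(lt(x, z) and lt(z, y) for z in P))

def reflexive_transitive_closure(P, edges):
    """Recover the order relation from the covering pairs (Warshall's algorithm)."""
    P = list(P)
    reach = {(x, x) for x in P} | set(edges)
    for k in P:
        for i in P:
            for j in P:
                if (i, k) in reach and (k, j) in reach:
                    reach.add((i, j))
    return reach

def divides(a, b):
    return b % a == 0

P = range(1, 7)
edges = covers(P, divides)   # the edges of the Hasse diagram in Figure 12.1
print(edges)
```

The list `edges` consists of the six edges 1 <· 2, 1 <· 3, 1 <· 5, 2 <· 4, 2 <· 6, 3 <· 6, and `reflexive_transitive_closure(P, edges)` returns exactly the divisibility relation on [6].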
12.6. Chains and antichains
Recall that a poset 𝐶 is called a chain (or total order) if any two elements of 𝐶 are comparable. A subset 𝐶 of a poset 𝑃 is called a chain in 𝑃 if 𝐶, with ordering inherited from 𝑃, is totally ordered. For every poset 𝑃 and every 𝑥 ∈ 𝑃, {𝑥} and ∅ are chains in 𝑃. If 𝑃 is finite, the length of 𝑃, denoted 𝑙(𝑃), is defined by (12.6.1)
𝑙(𝑃) ∶= max{|𝐶| − 1 ∶ 𝐶 is a chain in 𝑃}.
So the length of 𝑃 is the number of links in a chain in 𝑃 of largest cardinality. A chain 𝐶 is maximal if there is no chain that properly contains 𝐶. A chain of maximum cardinality must of course be maximal, but a maximal chain need not be a chain of maximum cardinality. In any poset 𝑃, every chain is contained in a maximal chain. This fact, which is equivalent to the axiom of choice (see Kelley (1955)), is called the Hausdorff maximality principle. Of course, if 𝑃 is finite, one need not invoke this principle. For suppose that 𝐶 is a chain in the finite poset 𝑃. If 𝐶 is maximal, we are done. If not, 𝐶 is a proper subset of a chain 𝐶∗. If 𝐶∗ is maximal, we are done; if not, we repeat the argument with 𝐶∗ in place of 𝐶. Since 𝑃 is finite, this procedure must culminate in a maximal chain after a finite number of steps.
If 𝐴 is a subset of the poset 𝑃, 𝐴 is called an antichain if no two distinct elements of 𝐴 are comparable. For every poset 𝑃 and every 𝑥 ∈ 𝑃, {𝑥} and ∅ are antichains. If 𝑃 is finite, the width of 𝑃, denoted 𝑤(𝑃), is defined by (12.6.2)
𝑤(𝑃) ∶= max{|𝐴| ∶ 𝐴 is an antichain in 𝑃}.
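Definitions (12.6.1) and (12.6.2) can be evaluated directly for very small posets by examining every subset. The Python sketch below does exactly that, so its running time grows like 2^|𝑃|; the helper names are ours.

```python
from itertools import combinations

def is_chain(S, leq):
    return all(leq(x, y) or leq(y, x) for x, y in combinations(S, 2))

def is_antichain(S, leq):
    return not any(leq(x, y) or leq(y, x) for x, y in combinations(S, 2))

def length_and_width(P, leq):
    """Return (l(P), w(P)) of a small finite poset by checking every subset."""
    P = list(P)
    subsets = [S for r in range(len(P) + 1) for S in combinations(P, r)]
    length = max(len(S) - 1 for S in subsets if is_chain(S, leq))
    width = max(len(S) for S in subsets if is_antichain(S, leq))
    return length, width

def divides(a, b):
    return b % a == 0

print(length_and_width(range(1, 7), divides))
```

For the divisibility order on [6] it returns (2, 3), realized for instance by the chain 1 | 2 | 4 and the antichain {4, 5, 6}; this is consistent with Corollary 12.6.3 below, since 6 ≤ (2 + 1) × 3.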
The length and width of a poset are fundamental properties of that poset. Somewhat more esoteric, but fascinating nevertheless, is the notion of the dimension of a poset 𝑃, the definition of which is based on the following theorem.
Theorem 12.6.1 (Szpilrajn’s theorem). For every poset (𝑃, ≤), there exists a total order 𝑅 on 𝑃 that extends ≤, in the sense that 𝑥 ≤ 𝑦 ⇒ 𝑥𝑅𝑦.
Proof. See Szpilrajn (1930).
□
When 𝑃 is finite, the result of Theorem 12.6.1 may be established by a straightforward inductive argument. See Velleman (2006, p. 269). In accord with the above, the dimension of the poset (𝑃, ≤), denoted dim(𝑃), is defined as the smallest cardinal number 𝗆 such that ≤ is the intersection of 𝗆 total
orders on 𝑃. In general, 𝑃 (and thus 𝗆) may be infinite. Even when 𝑃 is finite, the determination of dim(𝑃) can be difficult. It is always the case, however, that dim(𝑃) ≤ 𝑤(𝑃). See Bogart (1990) and Trotter (1992) for further details.
The following generalizes Theorem 2.6.3 (Erdős and Szekeres) to posets (as you are asked to verify in Exercise 12.5).
Theorem 12.6.2 (Dilworth’s lemma). Let 𝑃 be a poset with |𝑃| = 𝑚𝑛 + 1. Then 𝑃 contains a chain 𝐶 with |𝐶| = 𝑚 + 1, or an antichain 𝐴 with |𝐴| = 𝑛 + 1.
Proof. Suppose that no chain in 𝑃 has cardinality greater than 𝑚. Define 𝑓 ∶ 𝑃 → [𝑚] by 𝑓(𝑥) = max{|𝐶| ∶ 𝐶 is a chain with largest element 𝑥}. By the pigeonhole principle, there exists 𝑘 ∈ [𝑚] such that |𝑓⁻¹({𝑘})| ≥ (𝑚𝑛 + 1)/𝑚 > 𝑛. So at least 𝑛 + 1 elements of 𝑃 are the largest elements of chains in 𝑃 of cardinality 𝑘, and none of these 𝑛 + 1 elements is the largest element of any chain of cardinality greater than 𝑘. It follows that these 𝑛 + 1 elements constitute an antichain: if two of them satisfied 𝑥 < 𝑦, then appending 𝑦 to a 𝑘-element chain with largest element 𝑥 would produce a chain of cardinality 𝑘 + 1 with largest element 𝑦. □
The above result is best possible, in the sense that it is no longer true for all posets of cardinality 𝑚𝑛. For if 𝑃 is the union of 𝑛 pairwise disjoint chains, each having cardinality 𝑚, with elements of different chains incomparable, then 𝑃 clearly contains neither a chain of cardinality 𝑚 + 1 nor an antichain of cardinality 𝑛 + 1.
Corollary 12.6.3. If 𝑃 is a finite poset, then |𝑃| ≤ (𝑙(𝑃) + 1) × 𝑤(𝑃).
Proof. Let 𝑙(𝑃) = 𝑙 and 𝑤(𝑃) = 𝑤, and suppose that |𝑃| > (𝑙 + 1)𝑤. Choose any (𝑙 + 1)𝑤 + 1 elements of 𝑃, with partial order inherited from 𝑃, and call the resulting poset 𝑃′. By Dilworth’s lemma, either (1) there exists a chain in 𝑃′ (hence, in 𝑃) of cardinality 𝑙 + 2, or (2) there exists an antichain in 𝑃′ (hence, in 𝑃) with cardinality 𝑤 + 1. In case (1), this contradicts the fact that 𝑙(𝑃) = 𝑙, and in case (2), the fact that 𝑤(𝑃) = 𝑤. □
Since the family of singleton subsets of a poset 𝑃 constitutes a partition of 𝑃, every poset may be partitioned into blocks which are chains, or into blocks which are antichains.
It is of interest, however, to do this using as few blocks as possible. The following theorems of Dilworth (1950) specify the minimal number of blocks required in such partitions.
Theorem 12.6.4 (Dilworth’s antichain decomposition theorem). If 𝑃 is a finite poset with 𝑙(𝑃) = 𝑙, then 𝑃 can be partitioned into 𝑙 + 1 antichains, but no fewer (i.e., the minimum number of blocks in an antichain partition of 𝑃 is equal to the maximum cardinality of any chain in 𝑃).
Proof. Since each element of a chain in 𝑃 must belong to a different block of any antichain partition of 𝑃, and there exists at least one chain in 𝑃 of cardinality 𝑙 + 1, any antichain partition of 𝑃 must contain at least 𝑙 + 1 blocks. Moreover, there exists an antichain partition of 𝑃 with 𝑙 + 1 blocks. For let 𝑓 ∶ 𝑃 → [𝑙 + 1] be defined by (12.6.3)
𝑓(𝑥) ∶= max{|𝐶| ∶ 𝐶 is a chain in 𝑃 with largest element 𝑥}.
For each 𝑘 ∈ [𝑙 + 1], 𝑓⁻¹({𝑘}) is an antichain (by the argument used in the proof of Dilworth’s lemma), and, as usual, the sets 𝑓⁻¹({𝑘}) are pairwise disjoint, with union equal to 𝑃. And each of the preimages 𝑓⁻¹({𝑘}) is nonempty, since
there exists in 𝑃 a chain of the form 𝑥1 < 𝑥2 < ⋯ < 𝑥𝑘 < ⋯ < 𝑥𝑙+1, from which it follows that 𝑥𝑘 ∈ 𝑓⁻¹({𝑘}) (a longer chain with largest element 𝑥𝑘 could be extended by 𝑥𝑘+1 < ⋯ < 𝑥𝑙+1 to a chain of cardinality greater than 𝑙 + 1). □
Theorem 12.6.5 (Dilworth’s chain decomposition theorem). If 𝑃 is a finite poset with 𝑤(𝑃) = 𝑤, then 𝑃 can be partitioned into 𝑤 chains, but no fewer (i.e., the minimum number of blocks in a chain partition of 𝑃 is equal to the maximum cardinality of any antichain in 𝑃).
Proof. Since each element of an antichain in 𝑃 must belong to a different block of any chain partition of 𝑃, and there exists an antichain with 𝑤 elements, any chain partition of 𝑃 must contain at least 𝑤 blocks. Moreover, there exists a chain partition of 𝑃 with 𝑤 blocks, which we prove by induction on |𝑃|, the case |𝑃| = 1 being obvious. To prove the assertion for a given poset 𝑃, suppose that it is true for all posets of cardinality less than |𝑃|, whatever their width and length may be. Let 𝑎 be maximal in 𝑃, and consider the subposet 𝑃∗ ∶= 𝑃 − {𝑎} of 𝑃. Suppose that 𝑤(𝑃∗) = 𝑟, so that, by the inductive hypothesis, 𝑃∗ may be partitioned into the chains 𝐶1, . . . , 𝐶𝑟. Since 𝑤(𝑃∗) = 𝑟, 𝑃∗ contains an 𝑟-element antichain. For any such antichain 𝐴, |𝐴 ∩ 𝐶𝑖| = 1 for 𝑖 = 1, . . . , 𝑟, and so each chain 𝐶𝑖 contains at least one element that belongs to some 𝑟-element antichain. Hence, for each 𝑖 = 1, . . . , 𝑟, we may define the element 𝑎𝑖 ∈ 𝑃∗ by (12.6.4)
𝑎𝑖 = max{𝑥 ∈ 𝐶𝑖 ∶ 𝑥 belongs to some 𝑟-element antichain in 𝑃∗}.
Now 𝐴 ∶= {𝑎1, . . . , 𝑎𝑟} is an antichain. For suppose that 𝑎𝑖 < 𝑎𝑗 for some 𝑖, 𝑗 ∈ [𝑟]. Then, contradicting the definition of 𝑎𝑗, there can be no 𝑟-element antichain in 𝑃∗ that contains 𝑎𝑗. For such an antichain would have to contain an element of 𝐶𝑖. But no element of 𝐶𝑖 that is greater than 𝑎𝑖 belongs to any 𝑟-element antichain. And 𝑎𝑖, and all of the elements of 𝐶𝑖 that are less than 𝑎𝑖, are less than 𝑎𝑗, and so they can’t belong to any antichain that contains 𝑎𝑗.
(I) Suppose that 𝐴 ∪ {𝑎} is an antichain in 𝑃. Now {𝐶1, . . . , 𝐶𝑟, {𝑎}} is a partition of 𝑃 into 𝑟 + 1 chains. Moreover, 𝑤(𝑃) = 𝑟 + 1, since 𝐴 ∪ {𝑎} is an (𝑟 + 1)-element antichain, and there can be no antichain 𝐴′ in 𝑃 with |𝐴′| > 𝑟 + 1. For such an antichain would have to have the contradictory properties (i) |𝐴′ ∩ 𝐶𝑖| ≤ 1 for 𝑖 = 1, . . . , 𝑟, and |𝐴′ ∩ {𝑎}| ≤ 1, and (ii) (𝐴′ ∩ 𝐶1) + ⋯ + (𝐴′ ∩ 𝐶𝑟) + (𝐴′ ∩ {𝑎}) = 𝐴′.
(II) Suppose that 𝐴 ∪ {𝑎} is not an antichain in 𝑃. As in (I) above, {𝐶1, . . . , 𝐶𝑟, {𝑎}} is a partition of 𝑃 into 𝑟 + 1 chains. But in this case, it turns out that 𝑤(𝑃) = 𝑟, and so the aforementioned partition is useless. The following argument exhibits a partition of 𝑃 into 𝑟 chains and demonstrates that 𝑤(𝑃) = 𝑟, completing the proof. First, since 𝐴 ∪ {𝑎} is not an antichain, 𝑎 must be comparable to some 𝑎𝑖 ∈ 𝐴, and since 𝑎 is maximal in 𝑃, it must be the case that 𝑎 > 𝑎𝑖. Let 𝐾 ∶= {𝑎} ∪ {𝑥 ∈ 𝐶𝑖 ∶ 𝑥 ≤ 𝑎𝑖}, which is a chain. By the definition of 𝑎𝑖, there are no 𝑟-element antichains containing any 𝑥 ∈ 𝐶𝑖 such that 𝑥 > 𝑎𝑖. Consider the family of sets 𝐅 = {𝐶1, . . . , 𝐶𝑖 − 𝐾, . . . , 𝐶𝑟}. If 𝐶𝑖 − 𝐾 = ∅, then the remaining chains in this family constitute a partition of 𝑃 − 𝐾 into 𝑟 − 1 chains, so there are no 𝑟-element antichains in 𝑃 − 𝐾. If 𝐶𝑖 − 𝐾 ≠ ∅, then 𝐅 constitutes a partition of 𝑃 − 𝐾 into 𝑟 chains,
and any 𝑟-element antichain in 𝑃 − 𝐾 must have nonempty intersection with each of the blocks of 𝐅, in particular, the block 𝐶𝑖 − 𝐾. But this is impossible, since no 𝑟-element antichain can have nonempty intersection with 𝐶𝑖 − 𝐾. So, again, there are no 𝑟-element antichains in 𝑃 − 𝐾. But there are (𝑟 − 1)-element antichains in 𝑃 − 𝐾 (consisting of the sole elements of 𝐴 ∩ 𝐶1, . . . , 𝐴 ∩ 𝐶𝑖−1, 𝐴 ∩ 𝐶𝑖+1, . . . , 𝐴 ∩ 𝐶𝑟, where 𝐴 is an 𝑟-element antichain of 𝑃∗). So 𝑤(𝑃 − 𝐾) = 𝑟 − 1. By the inductive hypothesis, 𝑃 − 𝐾 may be partitioned into 𝑟 − 1 chains. Adding the chain 𝐾 to this collection yields a partition of 𝑃 into 𝑟 chains. Since 𝑤(𝑃) = 𝑟 (it can’t be greater than 𝑟, since there is a partition of 𝑃 into 𝑟 chains, and it can’t be less than 𝑟, since 𝑃∗ ⊂ 𝑃 and 𝑤(𝑃∗) = 𝑟), we are done. □
Remark 12.6.6. Dilworth’s chain decomposition theorem applies directly to the problem of determining the minimal number of individuals required to complete a fixed schedule of tasks. Here 𝑃 comprises the set of tasks, and if 𝑥, 𝑦 ∈ 𝑃, 𝑥 < 𝑦 asserts that it is possible for an individual to complete task 𝑥 in time to take on task 𝑦. It follows from Dilworth’s theorem that the minimum number of individuals required is equal to the maximum number of tasks, no two of which can be performed by the same individual. In practice, of course, one wishes not only to determine the number of chains in a minimal chain decomposition but also to construct such a decomposition. See Ford and Fulkerson (1962) for a discussion of the latter problem.
In view of the above theorems, it is of obvious interest to ascertain the lengths and widths of posets. Consider first the case of the poset (2^[𝑛], ⊆) of subsets of [𝑛], ordered by inclusion. It is clear that the length of this poset is equal to 𝑛, since ∅ ⊂ [1] ⊂ [2] ⊂ ⋯ ⊂ [𝑛] is a chain of maximum cardinality. The width of this poset was first determined by Sperner (1928).
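Returning briefly to Remark 12.6.6: for small posets, both kinds of minimal decomposition are easy to construct by computer. The Python sketch below builds the antichain partition of Theorem 12.6.4 directly from the function 𝑓 of (12.6.3). For the chain partition it does not follow the inductive proof of Theorem 12.6.5; it uses instead the well-known reduction to maximum bipartite matching (each element is paired with at most one larger element that follows it in its chain, and a maximum matching 𝑀 yields |𝑃| − |𝑀| chains), solved with Kuhn’s augmenting-path algorithm. The function name and the input conventions are our own assumptions.

```python
def dilworth_decompositions(P, leq):
    """Return (antichain partition, chain partition) of a finite poset P."""
    P = list(P)

    def lt(a, b):
        return a != b and leq(a, b)

    # Antichain partition (Theorem 12.6.4): f(x) is the largest cardinality of a chain
    # with largest element x, as in (12.6.3); the blocks are the sets f^{-1}({k}).
    # Elements with fewer predecessors are processed first, so f(y) is known for all y < x.
    f = {}
    for x in sorted(P, key=lambda z: sum(lt(y, z) for y in P)):
        f[x] = 1 + max((f[y] for y in P if lt(y, x)), default=0)
    top = max(f.values(), default=0)
    antichains = [[x for x in P if f[x] == k] for k in range(1, top + 1)]

    # Chain partition (Theorem 12.6.5) via maximum bipartite matching:
    # succ[x] = y means that y follows x in its chain (so x < y).
    succ, pred = {}, {}

    def augment(x, seen):
        # Kuhn's augmenting-path search starting from the unmatched element x
        for y in P:
            if lt(x, y) and y not in seen:
                seen.add(y)
                if y not in pred or augment(pred[y], seen):
                    succ[x], pred[y] = y, x
                    return True
        return False

    for x in P:
        augment(x, set())

    chains = []
    for x in P:
        if x not in pred:                 # x is the smallest element of its chain
            chain = [x]
            while chain[-1] in succ:
                chain.append(succ[chain[-1]])
            chains.append(chain)
    return antichains, chains

def divides(a, b):
    return b % a == 0

antichain_blocks, chain_blocks = dilworth_decompositions(range(1, 7), divides)
print(antichain_blocks)
print(chain_blocks)
```

For the divisibility order on [6] this gives the antichains {1}, {2, 3, 5}, {4, 6} and a partition into three chains, in agreement with 𝑙(𝑃) + 1 = 3 and 𝑤(𝑃) = 3. Which three chains are produced depends on the order in which augmenting paths are found.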
The beautifully short proof presented below appeared in the very first volume of the Journal of Combinatorial Theory.
Theorem 12.6.7 (Sperner’s theorem). The width of the poset (2^[𝑛], ⊆) is equal to $\max\{\binom{n}{k} : 0 \le k \le n\} = \binom{n}{\lfloor n/2 \rfloor}$.
Proof (Lubell (1966)). Since, for each 𝑘 = 0, . . . , 𝑛, the set of all 𝑘-element subsets of [𝑛] is an antichain, the width of (2^[𝑛], ⊆) must be greater than or equal to $\mu := \max\{\binom{n}{k} : 0 \le k \le n\}$. So it suffices to show that for every antichain 𝔄 in 2^[𝑛], |𝔄| ≤ 𝜇. There are clearly 𝑛! maximal chains in 2^[𝑛], and for each 𝑆 ⊆ [𝑛], there are |𝑆|! (𝑛 − |𝑆|)! such chains that contain 𝑆. Since no chain can contain two distinct elements of 𝔄, if 𝑆1, 𝑆2 ∈ 𝔄 and 𝑆1 ≠ 𝑆2, the set of maximal chains containing 𝑆1 is disjoint from the set of maximal chains containing 𝑆2 (two such chains may of course have common elements, e.g., ∅ and [𝑛], but that is a different matter). So $\sum_{S \in \mathfrak{A}} |S|!\,(n-|S|)! \le n!$, and dividing each side of this inequality by 𝑛! yields
$$\sum_{S \in \mathfrak{A}} \binom{n}{|S|}^{-1} \le 1. \qquad (12.6.5)$$
Now, given any antichain 𝔄,
$$|\mathfrak{A}| = \sum_{S \in \mathfrak{A}} 1 = \mu \sum_{S \in \mathfrak{A}} \mu^{-1} \le \mu \sum_{S \in \mathfrak{A}} \binom{n}{|S|}^{-1} \le \mu,$$
by the definition of 𝜇, along with (12.6.5). We will show in section 12.8 (as a special case of a more general result) that $\mu = \binom{n}{\lfloor n/2 \rfloor}$. □
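Lubell’s inequality (12.6.5) and Sperner’s theorem can be checked exhaustively for small 𝑛. In the following Python sketch (our own encoding: a subset of [𝑛] is an 𝑛-bit integer, and 𝑆 ⊆ 𝑇 is tested by `S & T == S`), every antichain of (2^[𝑛], ⊆) is generated by backtracking, and the size of a largest antichain and the largest value of the left-hand side of (12.6.5) are recorded.

```python
from fractions import Fraction
from math import comb

def antichains_of_boolean_lattice(n):
    """Yield every antichain of (2^[n], ⊆); subsets of [n] are encoded as n-bit masks."""
    def comparable(s, t):
        return s & t == s or s & t == t

    def extend(start, chosen):
        yield list(chosen)
        for s in range(start, 1 << n):
            if not any(comparable(s, t) for t in chosen):
                chosen.append(s)
                yield from extend(s + 1, chosen)
                chosen.pop()

    yield from extend(0, [])

def lym_sum(antichain, n):
    """Left-hand side of (12.6.5): the sum of 1 / C(n, |S|) over S in the antichain."""
    return sum(Fraction(1, comb(n, bin(s).count("1"))) for s in antichain)

for n in range(1, 5):
    family = list(antichains_of_boolean_lattice(n))
    print(n, len(family),
          max(len(A) for A in family),               # the width of (2^[n], ⊆)
          max(lym_sum(A, n) for A in family) <= 1)   # inequality (12.6.5)
```

For 𝑛 = 1, . . . , 4 the largest antichain has 1, 2, 3, 6 elements, that is, $\binom{n}{\lfloor n/2\rfloor}$, and the sums in (12.6.5) never exceed 1. By Theorems 12.6.8 and 12.6.9 below, the numbers of antichains printed (3, 6, 20, 168) are the values 𝑀(1), . . . , 𝑀(4) of Dedekind’s problem.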
Antichains in a poset 𝑃 are intimately associated with two other discrete structures.
(i) A set 𝐼 ⊆ 𝑃 is an order ideal if (𝑦 ∈ 𝐼 and 𝑥 ≤ 𝑦) ⇒ 𝑥 ∈ 𝐼. In particular, the set 𝐼 = ∅ qualifies vacuously as an order ideal in any poset.
(ii) A function 𝑓 ∶ 𝑃 → {0, 1} is a monotone boolean function if 𝑥 ≤ 𝑦 ⇒ 𝑓(𝑥) ≤ 𝑓(𝑦).
Theorem 12.6.8. For every finite poset 𝑃, the number of antichains in 𝑃 is equal to the number of order ideals in 𝑃.
Proof. The map 𝐴 ↦ 𝐼𝐴 ∶= {𝑥 ∈ 𝑃 ∶ there exists 𝑦 ∈ 𝐴 with 𝑥 ≤ 𝑦} is a bijection from the set of antichains in 𝑃 to the set of order ideals in 𝑃. The inverse of this map is given by 𝐼 ↦ 𝐴𝐼 ∶= {𝑥 ∈ 𝐼 ∶ 𝑥 is maximal in the subposet 𝐼}. □
Theorem 12.6.9. For every finite poset 𝑃, the number of monotone boolean functions on 𝑃 is equal to the number of order ideals in 𝑃.
Proof. The map 𝑓 ↦ 𝐼𝑓 ∶= {𝑥 ∈ 𝑃 ∶ 𝑓(𝑥) = 0} is a bijection from the set of all monotone boolean functions on 𝑃 to the set of all order ideals in 𝑃. The inverse of this map is given by 𝐼 ↦ 𝑓𝐼, where 𝑓𝐼(𝑥) = 0 if 𝑥 ∈ 𝐼, and 𝑓𝐼(𝑥) = 1 if 𝑥 ∉ 𝐼. □
The problem of determining 𝑀(𝑛), the number of monotone boolean functions on the poset (2^[𝑛], ⊆), is known as Dedekind’s problem. By identifying subsets of [𝑛] with their characteristic functions, a monotone boolean function 𝑓 ∶ 2^[𝑛] → {0, 1} can be (and usually is) construed as a function 𝑓̂ ∶ {0, 1}ⁿ → {0, 1} having the property that if 𝑢𝑖 ≤ 𝑡𝑖 for 𝑖 = 1, . . . , 𝑛, then 𝑓̂(𝑢1, . . . , 𝑢𝑛) ≤ 𝑓̂(𝑡1, . . . , 𝑡𝑛). Known values of 𝑀(𝑛) include 𝑀(1) = 3, 𝑀(2) = 6, 𝑀(3) = 20, 𝑀(4) = 168, 𝑀(5) = 7581, and 𝑀(6) = 7,828,354. The values 𝑀(7) and 𝑀(8) have also been computed. See sequence A000372 in the On-Line Encyclopedia of Integer Sequences (oeis.org).
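The bijections in the proofs of Theorems 12.6.8 and 12.6.9 are easy to exercise on a small example. The Python sketch below (brute force, with hypothetical helper names) enumerates the antichains, the order ideals 𝐼𝐴, and the monotone boolean functions of a finite poset, and evaluates 𝑀(𝑛) for very small 𝑛 by counting monotone functions on (2^[𝑛], ⊆) directly. The search over all 2^(2^𝑛) functions makes this approach hopeless beyond 𝑛 = 3 or so.

```python
from itertools import combinations, product

def antichains(P, leq):
    """All antichains of the finite poset P, as frozensets."""
    P = list(P)
    for m in range(len(P) + 1):
        for A in combinations(P, m):
            if not any(leq(x, y) or leq(y, x) for x, y in combinations(A, 2)):
                yield frozenset(A)

def ideal_generated_by(A, P, leq):
    """The order ideal I_A of Theorem 12.6.8: the elements below some element of A."""
    return frozenset(x for x in P if any(leq(x, y) for y in A))

def maximal_elements(I, leq):
    """The inverse map of Theorem 12.6.8: the maximal elements of the subposet I."""
    return frozenset(x for x in I if not any(x != y and leq(x, y) for y in I))

def monotone_functions(P, leq):
    """All f: P -> {0, 1} with f(x) <= f(y) whenever x <= y, as dicts."""
    P = list(P)
    for values in product((0, 1), repeat=len(P)):
        f = dict(zip(P, values))
        if all(f[x] <= f[y] for x in P for y in P if leq(x, y)):
            yield f

def dedekind(n):
    """M(n), by counting monotone boolean functions on (2^[n], ⊆) directly."""
    cube = range(1 << n)                      # subsets of [n] as n-bit masks
    return sum(1 for _ in monotone_functions(cube, lambda s, t: s & t == s))

def divides(a, b):
    return b % a == 0

P = range(1, 7)
A_list = list(antichains(P, divides))
I_list = [ideal_generated_by(A, P, divides) for A in A_list]
print(len(A_list), len(set(I_list)), sum(1 for _ in monotone_functions(P, divides)))
print([dedekind(n) for n in range(1, 4)])
```

For the divisibility order on [6], the three counts on the first line coincide, as the two theorems predict, and the second line reproduces 𝑀(1) = 3, 𝑀(2) = 6, and 𝑀(3) = 20.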
12.7. Matchings and systems of distinct representatives
If 𝐴 and 𝐵 are finite sets and 𝑅 is a relation from 𝐴 to 𝐵, a function 𝑓 ∶ 𝐴 → 𝐵 is called a matching for 𝑅 if (i) 𝑓 is injective, and (ii) 𝑓(𝑎) = 𝑏 ⇒ 𝑎𝑅𝑏. If (𝐵1, . . . , 𝐵𝑛) is a sequence of subsets of the finite set 𝐵, a sequence (𝑏1, . . . , 𝑏𝑛) of distinct elements of 𝐵 is called a system of distinct representatives (or SDR) for (𝐵1, . . . , 𝐵𝑛) if 𝑏𝑖 ∈ 𝐵𝑖, 𝑖 = 1, . . . , 𝑛. These two notions are, not surprisingly, intimately related.
Theorem 12.7.1. Suppose that 𝐴 = {𝑎1, . . . , 𝑎𝑛} and 𝐵 are finite sets and 𝑅 ⊆ 𝐴 × 𝐵. The function 𝑓 ∶ 𝐴 → 𝐵 is a matching for 𝑅 if and only if (𝑓(𝑎1), . . . , 𝑓(𝑎𝑛)) is a system of distinct representatives for (𝑅(𝑎1), . . . , 𝑅(𝑎𝑛)).
Proof. Straightforward.
□
Theorem 12.7.2. The sequence (𝑏1 , . . . , 𝑏𝑛 ) is a system of distinct representatives for (𝐵1 , . . . , 𝐵𝑛 ) if and only if, for the relation 𝑅 ⊆ [𝑛] × 𝐵 defined by (𝑖, 𝑏) ∈ 𝑅 ⇔ 𝑏 ∈ 𝐵𝑖 , the function 𝑓 ∶ [𝑛] → 𝐵 defined by 𝑓(𝑖) = 𝑏𝑖 is a matching. Proof. Straightforward.
□
The following theorem, due to P. Hall (1935), is one of the classics of existential combinatorics.
Theorem 12.7.3 (The marriage theorem). The relation 𝑅 ⊆ 𝐴 × 𝐵 has a matching if and only if |𝑅(𝑆)| ≥ |𝑆| for all 𝑆 ⊆ 𝐴.
Proof. Necessity. Obvious.
Sufficiency. We may assume, with no loss of generality, that 𝐴 ∩ 𝐵 = ∅. Define a partial order ≤ on 𝐴 ∪ 𝐵 by 𝑥 ≤ 𝑦 ⇔ 𝑥 = 𝑦 or 𝑥𝑅𝑦. 𝐵 is an antichain in the poset (𝐴 ∪ 𝐵, ≤) since, if 𝑥 ≠ 𝑦, 𝑥 ≤ 𝑦 only if 𝑥 ∈ 𝐴 and 𝑦 ∈ 𝐵. Moreover, 𝐵 is an antichain of maximum cardinality. For suppose, to the contrary, that 𝑊 is an antichain and |𝑊| > |𝐵|. Now 𝑊 = (𝑊 ∩ 𝐴) + (𝑊 ∩ 𝐵), and so |𝑊 ∩ 𝐵| = |𝑊| − |𝑊 ∩ 𝐴|, whence
$$|B - (W \cap B)| = |B| - |W \cap B| = |B| - |W| + |W \cap A| < |W \cap A|. \qquad (12.7.1)$$
But
$$R(W \cap A) \subseteq B - (W \cap B). \qquad (12.7.2)$$
For, if not, there exist 𝑤1 ∈ 𝑊 ∩ 𝐴 and 𝑤2 ∈ 𝑊 ∩ 𝐵 such that 𝑤1𝑅𝑤2, and so 𝑤1 < 𝑤2, since, as members of the disjoint sets 𝑊 ∩ 𝐴 and 𝑊 ∩ 𝐵, 𝑤1 ≠ 𝑤2. But this would mean that 𝑊 is not an antichain. So by (12.7.2) and (12.7.1), |𝑅(𝑊 ∩ 𝐴)| ≤ |𝐵 − (𝑊 ∩ 𝐵)| < |𝑊 ∩ 𝐴|, contradicting the assumption that |𝑅(𝑆)| ≥ |𝑆| for all 𝑆 ⊆ 𝐴. It follows from Dilworth’s chain decomposition theorem that there is a partition of 𝐴 ∪ 𝐵 consisting of |𝐵| chains. Since the antichain 𝐵 meets each of these |𝐵| chains at most once, it contains exactly one element from each of them. Each chain contains either one or two elements, and each 𝑎 ∈ 𝐴 therefore belongs to a two-element chain {𝑎, 𝑏} with 𝑏 ∈ 𝐵 and 𝑎 < 𝑏, i.e., 𝑎𝑅𝑏. Since the chains are pairwise disjoint, we may match each 𝑎 ∈ 𝐴 to that 𝑏 ∈ 𝐵 for which {𝑎, 𝑏} is one of the chains. □
Here is an amusing elementary application of the marriage theorem. Consider the following truncated 6 × 6 board, with alternating red (𝑟) and white (𝑤) squares, and with deleted squares marked with an 𝑥.
𝑟∗ 𝑤 𝑟 𝑤 𝑟 𝑤
𝑥 𝑟∗ 𝑤 𝑟 𝑤 𝑟
𝑥 𝑥 𝑟∗ 𝑥 𝑟 𝑤
𝑤 𝑥 𝑥 𝑟 𝑤 𝑥
𝑟 𝑤 𝑥 𝑤 𝑟 𝑤
𝑤 𝑟 𝑤 𝑟 𝑤 𝑟
Can the 28 remaining squares be covered by nonoverlapping dominoes (rectangular pieces that cover two adjacent squares), with no domino extending beyond the board? The truncated board satisfies one obvious necessary condition for such a covering, namely, the fact that there are equal numbers of red and white squares (since a domino always covers a red and a white square). Nevertheless, such a covering is impossible here. For let 𝐴 be equal to the set of red squares, and let 𝐵 be equal to the set of white squares, and define a relation 𝑅 ⊆ 𝐴 × 𝐵 by 𝑎𝑅𝑏 if and only if squares 𝑎 and 𝑏 are adjacent. Clearly, there exists a covering of this truncated board if and only if 𝑅 has a matching, which, by the marriage theorem, will be the case if and only if
|𝑅(𝑆)| ≥ |𝑆| for all 𝑆 ⊆ 𝐴. But for the set 𝑆 consisting of the three red squares marked with an asterisk (∗ ), |𝑅(𝑆)| = 2.
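The deficient set of starred squares can also be found, and the impossibility of a covering confirmed, by computer. In the Python sketch below, rows and columns are numbered from 0 and the board is transcribed from the diagram above with the asterisks dropped (an encoding of our own). `hall_violations(k)` searches all sets 𝑆 of at most 𝑘 red squares with |𝑅(𝑆)| < |𝑆|, and `max_matching` computes the size of a largest matching for 𝑅 by Kuhn’s augmenting-path method, which is not the argument used in the text.

```python
from itertools import combinations

# The truncated board, row by row: r = red, w = white, x = deleted square.
BOARD = ["rwrwrw",
         "xrwrwr",
         "xxrxrw",
         "wxxrwx",
         "rwxwrw",
         "wrwrwr"]

RED = [(i, j) for i, row in enumerate(BOARD) for j, c in enumerate(row) if c == "r"]
WHITE = {(i, j) for i, row in enumerate(BOARD) for j, c in enumerate(row) if c == "w"}
STARRED = [(0, 0), (1, 1), (2, 2)]

def neighbours(a):
    """R(a): the white squares sharing an edge with the red square a."""
    i, j = a
    return {(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)} & WHITE

def R_of(S):
    """R(S), the union of R(a) over the red squares a in S."""
    return set().union(*(neighbours(a) for a in S))

def hall_violations(max_size):
    """All sets S of at most max_size red squares with |R(S)| < |S|."""
    return [S for k in range(1, max_size + 1) for S in combinations(RED, k)
            if len(R_of(S)) < len(S)]

def max_matching():
    """Size of a largest matching for R (Kuhn's augmenting-path algorithm)."""
    partner = {}                                  # white square -> red square

    def augment(a, seen):
        for b in neighbours(a):
            if b not in seen:
                seen.add(b)
                if b not in partner or augment(partner[b], seen):
                    partner[b] = a
                    return True
        return False

    return sum(augment(a, set()) for a in RED)

print(len(RED), len(WHITE))     # equal numbers of red and white squares
print(sorted(R_of(STARRED)))    # only two white neighbours for three red squares
print(hall_violations(3))
print(max_matching())
```

The only violating set with at most three squares is the starred set {(0, 0), (1, 1), (2, 2)}, and a largest matching covers only 13 of the 14 red squares, so no covering by 14 dominoes exists.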
12.8. *Unimodality and logarithmic concavity
A sequence (𝑎0, 𝑎1, . . . , 𝑎𝑛) of real numbers is unimodal if there exists a (not necessarily unique) integer 𝑗, with 0 ≤ 𝑗 ≤ 𝑛, such that (12.8.1)
𝑎0 ≤ ⋯ ≤ 𝑎𝑗 ≥ 𝑎𝑗+1 ≥ ⋯ ≥ 𝑎𝑛 .
The characteristic property of the discrete graph of a unimodal sequence is that it has no valleys, i.e., no ups after a down. More formally, the following holds.
Theorem 12.8.1. The sequence (𝑎0, 𝑎1, . . . , 𝑎𝑛) is unimodal if and only if 𝑎𝑘 > 𝑎𝑘+1 implies that 𝑎𝑘+1 ≥ ⋯ ≥ 𝑎𝑛 (the preceding inequalities being vacuously satisfied if 𝑘 = 𝑛 − 1).
Proof. Necessity. Suppose that (𝑎0, 𝑎1, . . . , 𝑎𝑛) is unimodal, with 𝑗 as in (12.8.1), and 𝑎𝑘 > 𝑎𝑘+1. Then 𝑘 ≥ 𝑗, and so 𝑎𝑘+1 ≥ ⋯ ≥ 𝑎𝑛. Sufficiency. If the implication 𝑎𝑘 > 𝑎𝑘+1 ⇒ 𝑎𝑘+1 ≥ ⋯ ≥ 𝑎𝑛 holds vacuously (i.e., there is no 𝑘 with 𝑎𝑘 > 𝑎𝑘+1), then 𝑎0 ≤ ⋯ ≤ 𝑎𝑛, and so (12.8.1) holds for 𝑗 = 𝑛. If there exists a 𝑘 such that 𝑎𝑘 > 𝑎𝑘+1, and we let 𝑗 denote the smallest such 𝑘, then (12.8.1) holds. □
Remark 12.8.2. If a sequence is unimodal, then it attains its maximum value on a set of one or more consecutive integers. The latter property is clearly not sufficient for unimodality, however, as illustrated by the sequence (2, 2, 1, 0, 1).
A sequence (𝑎0, 𝑎1, . . . , 𝑎𝑛) of real numbers is logarithmically concave (or log-concave, abbreviated 𝐿𝐶) if, for 1 ≤ 𝑘 ≤ 𝑛 − 1, (12.8.2)
$$a_k^2 \ge a_{k-1}a_{k+1},$$
and strongly logarithmically concave (or strongly log-concave), abbreviated 𝑆𝐿𝐶, if (12.8.3)
$$a_k^2 > a_{k-1}a_{k+1}.$$
The foregoing terminology is explained by the fact that a sequence (𝑎0 , 𝑎1 , . . . , 𝑎𝑛 ) of positive real numbers is 𝐿𝐶 (resp., 𝑆𝐿𝐶) if and only if (12.8.4)
$$\log a_k \;\ge\; (\text{resp.}, >)\;\; \tfrac{1}{2}\left(\log a_{k-1} + \log a_{k+1}\right).$$
Theorem 12.8.3. If (𝑎0, 𝑎1, . . . , 𝑎𝑛) is positive and LC, then it is unimodal.
Proof. Suppose 𝑎𝑘 > 𝑎𝑘+1. As noted earlier, if 𝑘 = 𝑛 − 1, the criterion for unimodality articulated in Theorem 12.8.1 is vacuously satisfied. If 𝑘 < 𝑛 − 1, then, in fact, 𝑎𝑘+1 > 𝑎𝑘+2. For if not, then 0 < 𝑎𝑘+1 ≤ 𝑎𝑘+2, which, along with 0 < 𝑎𝑘+1 < 𝑎𝑘, implies that $a_{k+1}^2 < a_k a_{k+2}$, contradicting LC. Iterating this argument as needed, it follows that 𝑎𝑘+1 > 𝑎𝑘+2 > ⋯ > 𝑎𝑛. □
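The definitions (12.8.1)–(12.8.3) translate directly into tests on finite lists. The short Python sketch below (function names ours) uses the criterion of Theorem 12.8.1 for unimodality and reproduces the examples of Remarks 12.8.2 and 12.8.4 below, together with a row of binomial coefficients.

```python
from math import comb

def is_unimodal(a):
    """Theorem 12.8.1: once a strict descent a_k > a_{k+1} occurs, no later ascent."""
    descended = False
    for x, y in zip(a, a[1:]):
        if x > y:
            descended = True
        elif x < y and descended:
            return False
    return True

def is_lc(a):
    """(12.8.2): a_k^2 >= a_{k-1} a_{k+1} for 1 <= k <= n - 1."""
    return all(a[k] ** 2 >= a[k - 1] * a[k + 1] for k in range(1, len(a) - 1))

def is_slc(a):
    """(12.8.3): a_k^2 > a_{k-1} a_{k+1} for 1 <= k <= n - 1."""
    return all(a[k] ** 2 > a[k - 1] * a[k + 1] for k in range(1, len(a) - 1))

print(is_unimodal([2, 2, 1, 0, 1]))                            # Remark 12.8.2
print(is_lc([0, 1, 0, 0, 1]), is_unimodal([0, 1, 0, 0, 1]))    # Remark 12.8.4
row = [comb(8, k) for k in range(9)]
print(is_slc(row), is_unimodal(row))                           # Theorems 12.8.3, 12.8.9
```

The three lines print False, then True False, then True True: (0, 1, 0, 0, 1) is LC without being unimodal, because of its internal zeros.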
Remark 12.8.4. The sequence (𝑎0, 𝑎1, . . . , 𝑎𝑛) has no internal zeros if there exist no integers 𝑖, 𝑗, and 𝑘, with 0 ≤ 𝑖 < 𝑗 < 𝑘 ≤ 𝑛, such that 𝑎𝑖 ≠ 0, 𝑎𝑗 = 0, and 𝑎𝑘 ≠ 0. A sequence with no internal zeros consists of a (possibly empty) zero sequence, followed by a (possibly empty) nonzero sequence, followed by a (possibly empty) zero sequence. In particular, a positive sequence has no internal zeros, and the same is true of the sequence (0, 0, . . . , 0). Clearly, the conclusion of Theorem 12.8.3 continues to hold if (𝑎0, 𝑎1, . . . , 𝑎𝑛) is nonnegative, LC, and has no internal zeros. Mere nonnegativity, along with LC, does not, however, suffice to guarantee unimodality, as illustrated by the sequence (0, 1, 0, 0, 1).
Theorem 12.8.5. If (𝑎0, 𝑎1, . . . , 𝑎𝑛) is positive and SLC, then it attains its maximum on a set of one or two consecutive integers.
Proof. Since SLC ⇒ LC, the sequence is unimodal by Theorem 12.8.3, and hence it attains its maximum on a set of one or more consecutive integers. But this set cannot contain three consecutive integers, for if 𝑎𝑘−1 = 𝑎𝑘 = 𝑎𝑘+1, then $a_k^2 = a_{k-1}a_{k+1}$, contradicting SLC. □
Remark 12.8.6. The positivity condition in the above theorem can also be replaced with nonnegativity and the absence of internal zeros. But note that, given SLC, the latter condition allows for at most one zero at the beginning, and at most one zero at the end, of the sequence in question.
The sequence (𝑎0, 𝑎1, . . . , 𝑎𝑛) is symmetric if 𝑎𝑘 = 𝑎𝑛−𝑘 for all 𝑘 = 0, . . . , 𝑛.
Theorem 12.8.7. If (𝑎0, 𝑎1, . . . , 𝑎𝑛) is unimodal and symmetric, it attains its maximum value at 𝑟 central consecutive values of 𝑘, where 𝑟 ≡ 𝑛 + 1 (mod 2). In particular, this maximum value is always attained, inter alia, at 𝑘 = ⌊𝑛/2⌋.
Proof. Exercise. □
Corollary 12.8.8. If (𝑎0, 𝑎1, . . . , 𝑎𝑛) is positive, symmetric, and SLC, its maximum value is attained at 𝑘 = 𝑛/2 if 𝑛 is even, and at ⌊𝑛/2⌋ and ⌊𝑛/2⌋ + 1 if 𝑛 is odd.
Proof. Exercise. □
Theorem 12.8.9. The sequence $\binom{n}{0}, \binom{n}{1}, \ldots, \binom{n}{n}$ is SLC.
Proof. If 1 ≤ 𝑘 ≤ 𝑛 − 1, then
$$\binom{n}{k}^2 = \frac{k+1}{k}\cdot\frac{n-k+1}{n-k}\binom{n}{k-1}\binom{n}{k+1} > \binom{n}{k-1}\binom{n}{k+1}. \qquad (12.8.5)$$
□
Corollary 12.8.10. $\max\{\binom{n}{k} : 0 \le k \le n\} = \binom{n}{\lfloor n/2 \rfloor}$.
Proof. Exercise. □
The following theorem states an important generalization of (12.8.5).
Theorem 12.8.11 (Newton’s inequality). If 𝑝(𝑥) = 𝑎0 + 𝑎1𝑥 + ⋯ + 𝑎𝑛𝑥ⁿ ∈ ℝ[𝑥] has degree 𝑛 ≥ 2, and all roots of 𝑝(𝑥) are real, then
$$a_k^2 \ge \frac{k+1}{k}\cdot\frac{n-k+1}{n-k}\, a_{k-1}a_{k+1}, \qquad 1 \le k \le n-1. \qquad (12.8.6)$$
Proof. If 𝑝(𝑥) has exclusively real roots, then the same is true of its derivative 𝐷𝑝(𝑥) and of the reciprocal polynomial 𝑅𝑝(𝑥) ∶= 𝑥^𝑚 𝑝(1/𝑥), where 𝑚 is the degree of 𝑝, since the roots of 𝑅𝑝(𝑥) are the reciprocals of the nonzero roots of 𝑝(𝑥). Consequently,
$$q(x) := D^{n-k-1} R D^{k-1} p(x) = \frac{(k-1)!\,(n-k+1)!}{2}\, a_{k-1}x^2 + k!\,(n-k)!\, a_k x + \frac{(k+1)!\,(n-k-1)!}{2}\, a_{k+1} \qquad (12.8.7)$$
has exclusively real roots. If 𝑎𝑘−1 = 0, then (12.8.6) holds trivially. If not, then 𝑞(𝑥) is a quadratic polynomial with exclusively real roots, and (12.8.6) follows from the nonnegativity of the discriminant of 𝑞(𝑥). □
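Newton’s inequality (12.8.6) can be spot-checked with exact rational arithmetic. The Python sketch below (helper names ours) expands a polynomial from a list of real, here integer, roots and tests (12.8.6) for every 𝑘; the polynomial 1 + 𝑥², whose roots are not real, shows that the hypothesis cannot simply be dropped.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

def poly_from_roots(roots):
    """Coefficients a_0, ..., a_n of (x - r_1)(x - r_2)...(x - r_n), lowest degree first."""
    coeffs = [Fraction(1)]
    for r in roots:
        # multiply the current polynomial by (x - r): x*p(x) - r*p(x)
        shifted = [Fraction(0)] + coeffs
        scaled = [-r * c for c in coeffs] + [Fraction(0)]
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs

def newton_holds(a):
    """Test (12.8.6) for 1 <= k <= n - 1, where a = [a_0, ..., a_n]."""
    n = len(a) - 1
    return all(a[k] ** 2 >= Fraction(k + 1, k) * Fraction(n - k + 1, n - k) * a[k - 1] * a[k + 1]
               for k in range(1, n))

# Every monic quartic with integer roots in [-3, 3] satisfies (12.8.6) ...
print(all(newton_holds(poly_from_roots(rs))
          for rs in combinations_with_replacement(range(-3, 4), 4)))
# ... but 1 + x^2, whose roots are not real, violates it at k = 1.
print(newton_holds([1, 0, 1]))
```

For (𝑥 + 1)ⁿ the inequality (12.8.6) holds with equality, which is exactly the identity in (12.8.5).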
Corollary 12.8.12. The following sequences are 𝑆𝐿𝐶 for 𝑛 ≥ 2: (i) $\left(\binom{n}{k}\right)_{0 \le k \le n}$, (ii) $\left(\binom{n}{k}_q\right)_{0 \le k \le n}$, and (iii) $(c(n,k))_{0 \le k \le n}$.
Proof. We have already proved that sequence (i) is SLC, but this result also follows immediately from Theorem 12.8.11 and the binomial theorem, $(1+x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k$. That sequence (ii) is SLC follows from the 𝑞-binomial theorem,
$$(1+x)(1+qx)\cdots(1+q^{n-1}x) = \sum_{k=0}^{n} q^{\binom{k}{2}} \binom{n}{k}_q x^k,$$
which, by Theorem 12.8.11, yields the inequality
$$q^{k^2-k}\binom{n}{k}_q^2 \ge \frac{k+1}{k}\cdot\frac{n-k+1}{n-k}\, q^{k^2-k+1}\binom{n}{k-1}_q\binom{n}{k+1}_q,$$
and, hence, for 𝑞 ≥ 1, the inequality
$$\binom{n}{k}_q^2 > \binom{n}{k-1}_q\binom{n}{k+1}_q.$$
You are asked to give an alternative proof of the latter inequality in Exercise 12.11, analogous to the proof of formula (12.8.5). The fact that sequence (iii) is SLC follows from the formula
$$x^{\overline{n}} = x(x+1)\cdots(x+n-1) = \sum_{k=0}^{n} c(n,k)\,x^k.$$
□
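The sequences of Corollary 12.8.12 can be generated from their recurrences and tested for SLC numerically. The Python sketch below assumes the recurrences 𝑐(𝑛, 𝑘) = 𝑐(𝑛 − 1, 𝑘 − 1) + (𝑛 − 1)𝑐(𝑛 − 1, 𝑘) for the cycle numbers and $\binom{n}{k}_q = \binom{n-1}{k-1}_q + q^k\binom{n-1}{k}_q$ for the 𝑞-binomial coefficients, and it tries only the integer values 𝑞 = 2, 3. It also checks the rising factorial expansion used in the proof, and the sign-alternated Stirling numbers of the first kind mentioned in Remark 12.8.13 below. The helper names are ours.

```python
def is_slc(a):
    """(12.8.3): a_k^2 > a_{k-1} a_{k+1} for 1 <= k <= n - 1."""
    return all(a[k] ** 2 > a[k - 1] * a[k + 1] for k in range(1, len(a) - 1))

def cycle_numbers(n):
    """[c(n, 0), ..., c(n, n)] via c(m, k) = c(m-1, k-1) + (m-1) c(m-1, k)."""
    row = [1]                                  # c(0, 0) = 1
    for m in range(1, n + 1):
        row = [(row[k - 1] if k >= 1 else 0) + (m - 1) * (row[k] if k < m else 0)
               for k in range(m + 1)]
    return row

def rising_factorial_coeffs(n):
    """Coefficients of x(x + 1)...(x + n - 1), lowest degree first."""
    coeffs = [1]
    for j in range(n):                         # multiply by (x + j)
        coeffs = [(coeffs[k - 1] if k >= 1 else 0) + j * (coeffs[k] if k < len(coeffs) else 0)
                  for k in range(len(coeffs) + 1)]
    return coeffs

def q_binomial_row(n, q):
    """[[n, 0]_q, ..., [n, n]_q] via [m, k]_q = [m-1, k-1]_q + q^k [m-1, k]_q."""
    row = [1]
    for m in range(1, n + 1):
        row = [(row[k - 1] if k >= 1 else 0) + q ** k * (row[k] if k < m else 0)
               for k in range(m + 1)]
    return row

print(cycle_numbers(5), rising_factorial_coeffs(5))
print(all(is_slc(cycle_numbers(n)) for n in range(2, 12)))
print(all(is_slc(q_binomial_row(n, q)) for n in range(2, 12) for q in (2, 3)))
stirling_first = [(-1) ** (5 - k) * c for k, c in enumerate(cycle_numbers(5))]
print(is_slc(stirling_first))                  # s(5, k) = (-1)^(5-k) c(5, k)
```

The first line shows the row 0, 24, 50, 35, 10, 1 twice, once from the recurrence and once from the expansion of 𝑥(𝑥 + 1)⋯(𝑥 + 4); the remaining lines report that every tested row is SLC.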
Remark 12.8.13. It is easy to see that if 𝑛 ≥ 2 and $(a_k)_{0 \le k \le n}$ is LC (respectively, SLC), then so are $((-1)^k a_k)_{0 \le k \le n}$ and $((-1)^{n-k} a_k)_{0 \le k \le n}$. In particular, it follows from the strong logarithmic concavity of the cycle numbers 𝑐(𝑛, 𝑘) that the Stirling numbers of the first kind, $s(n,k) = (-1)^{n-k} c(n,k)$, are SLC. It is also the case that the sequence $(S(n,k))_{0 \le k \le n}$ of Stirling numbers of the second kind is SLC, but the proof requires more than Newton’s inequality. See Kurtz (1972), where several sufficient conditions for SLC of the rows of a triangular array are established, based on the form of the recurrence relation generating the array. There is a long-standing conjecture that, for every 𝑛 ≥ 3, the sequence $(S(n,k))_{0 \le k \le n}$ of Stirling numbers of the second kind attains its maximum at a single value of 𝑘.
12.9. Rank functions and Sperner posets

If 𝑃 is any poset and 𝑥 ∈ 𝑃, the principal order ideal (POI) 𝐼𝑥 generated by 𝑥 is the set 𝐼𝑥 ∶= {𝑦 ∈ 𝑃 ∶ 𝑦 ≤ 𝑥}. If all principal order ideals in 𝑃 are finite, we may define the rank function 𝑟 ∶ 𝑃 → ℕ by 𝑟(𝑥) = 𝑙(𝐼𝑥) (= max{|𝐶| − 1 ∶ 𝐶 is a chain in 𝐼𝑥}).

Theorem 12.9.1. If 𝑃 is a finite poset and 𝑙(𝑃) = 𝑛, then the rank function 𝑟 of 𝑃 is a surjection from 𝑃 to {0, 1, . . . , 𝑛}, and 𝑟(𝑥) = 0 if and only if 𝑥 is a minimal element of 𝑃.

Proof. It is clear that 𝑟(𝑥) ≤ 𝑛 for every 𝑥 ∈ 𝑃, and so 𝑟 ∶ 𝑃 → {0, 1, . . . , 𝑛}. Since 𝑙(𝑃) = 𝑛, there is a chain 𝑥0 < 𝑥1 < ⋯ < 𝑥𝑛 in 𝑃, and no chain in 𝑃 has larger cardinality. For each 𝑘 = 0, 1, . . . , 𝑛, we have 𝑟(𝑥𝑘) = 𝑘, since 𝑥0 < 𝑥1 < ⋯ < 𝑥𝑘 is a chain in the principal order ideal 𝐼𝑥𝑘, and there can be no longer chain in 𝐼𝑥𝑘. For if there were such a chain, then appending 𝑥𝑘+1 < ⋯ < 𝑥𝑛 to that chain would create a chain in 𝑃 of length greater than 𝑛. It is obvious that 𝑟(𝑥) = 0 if and only if 𝑥 is minimal. □

Suppose that 𝑟 is the rank function of the poset 𝑃, where 𝑙(𝑃) = 𝑛. For each 𝑘 = 0, 1, . . . , 𝑛, let 𝑃𝑘 ∶= {𝑥 ∈ 𝑃 ∶ 𝑟(𝑥) = 𝑘}. The set 𝑃𝑘 is called the 𝑘th rank of 𝑃. By the preceding theorem, each of the ranks is nonempty, and so (𝑃0, 𝑃1, . . . , 𝑃𝑛) is an ordered partition of 𝑃. Moreover, each of the ranks is an antichain in 𝑃. Recall that the width 𝑤(𝑃) of a finite poset 𝑃 is the cardinality of a largest antichain in 𝑃. If such an antichain is to be found among the ranks of 𝑃, i.e., if

$$w(P) = \max\{|P_k| : 0 \le k \le n\}, \tag{12.9.1}$$
then 𝑃 is said to be Sperner (or a Sperner poset). To show that a poset 𝑃 of length 𝑛 is Sperner, one shows that, for every antichain 𝐴 in 𝑃, |𝐴| ≤ max{|𝑃𝑘| ∶ 0 ≤ 𝑘 ≤ 𝑛}. Recall that Lubell furnished just such a proof that the poset (2^[𝑛], ⊆) is Sperner (Theorem 12.6.7).

Remark 12.9.2. Perhaps the best known poset that fails to be Sperner (for large 𝑛) is the set 𝒫(𝑛), consisting of all partitions of [𝑛], ordered by refinement (if 𝒫 and 𝒬 are partitions of [𝑛], 𝒫 refines 𝒬, symbolized 𝒫 ≤ 𝒬, if every block of 𝒫 is a subset of a block of 𝒬). Canfield (1978) proved that the poset (𝒫(𝑛), ≤) fails to be Sperner for all sufficiently large 𝑛, thereby answering a question posed by Rota (1967). See also Graham (1978). Additional examples of Sperner and non-Sperner posets may be found in Stanley (1991).

Remark 12.9.3. A finite poset 𝑃 of length 𝑛 is said to be rank-unimodal (resp., rank-LC, rank-SLC, and rank-symmetric) if the numerical sequence (|𝑃0|, |𝑃1|, . . . , |𝑃𝑛|) is unimodal (resp., LC, SLC, and symmetric). Since the cardinalities of the ranks are positive, rank-LC implies rank-unimodality.
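To make the definitions of rank and width concrete, here is a brute-force sketch (hypothetical helper names; its running time is exponential in |𝑃|, so it is meant only for small posets) that computes the rank function, the width, and tests condition (12.9.1). It uses the fact that a longest chain in 𝐼𝑥 may be taken to end at 𝑥. The 6-element poset in the second example is an illustrative construction, not one taken from the references.

```python
from itertools import combinations


def rank_function(elements, leq):
    """r(x) = length of a longest chain in the principal order ideal I_x.
    A longest chain in I_x may be taken to end at x, so r(x) = 1 + max r(y) over y < x."""
    rank = {}

    def r(x):
        if x not in rank:
            below = [y for y in elements if y != x and leq(y, x)]
            rank[x] = 1 + max(r(y) for y in below) if below else 0
        return rank[x]

    for x in elements:
        r(x)
    return rank


def width(elements, leq):
    """Cardinality of a largest antichain, by brute force (small posets only)."""
    for size in range(len(elements), 0, -1):
        for subset in combinations(elements, size):
            if all(not leq(a, b) and not leq(b, a) for a, b in combinations(subset, 2)):
                return size
    return 0


def is_sperner(elements, leq):
    """Test (12.9.1): is the width equal to the size of a largest rank?"""
    sizes = {}
    for k in rank_function(elements, leq).values():
        sizes[k] = sizes.get(k, 0) + 1
    return width(elements, leq) == max(sizes.values())


# The Boolean lattice (2^[3], ⊆): ranks are the subset sizes, and it is Sperner.
boolean = [frozenset(s) for k in range(4) for s in combinations(range(3), k)]
print(is_sperner(boolean, lambda a, b: a <= b))                 # True

# Hypothetical 6-element poset: d < c1, d < c2 and e1 < f, e2 < f.
# Both ranks have 3 elements, but {c1, c2, e1, e2} is an antichain of size 4.
less = {("d", "c1"), ("d", "c2"), ("e1", "f"), ("e2", "f")}
pts = ["d", "c1", "c2", "e1", "e2", "f"]
print(is_sperner(pts, lambda x, y: x == y or (x, y) in less))   # False
```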
12.10. Lattices

If 𝑃 is any poset and 𝑥, 𝑦 ∈ 𝑃, an upper bound of 𝑥 and 𝑦 is an element 𝑏 ∈ 𝑃 such that 𝑥 ≤ 𝑏 and 𝑦 ≤ 𝑏. If 𝛽 is an upper bound of 𝑥 and 𝑦, and 𝛽 ≤ 𝑏 for every upper bound 𝑏 of 𝑥 and 𝑦, then 𝛽 is called a least upper bound (lub) or supremum (sup) of 𝑥 and 𝑦. If a least upper bound of 𝑥 and 𝑦 exists, it is unique. An element 𝑎 ∈ 𝑃 is a lower bound of 𝑥 and 𝑦 if 𝑎 ≤ 𝑥 and 𝑎 ≤ 𝑦, and a lower bound 𝛼 of 𝑥 and 𝑦 is called a greatest lower bound (glb) or infimum (inf) of 𝑥 and 𝑦 if 𝑎 ≤ 𝛼 for every lower bound 𝑎 of 𝑥 and 𝑦. If a greatest lower bound of 𝑥 and 𝑦 exists, it is unique. If every pair of elements (hence every nonempty finite set of elements) of a poset has a supremum and an infimum, the poset is called a lattice (sometimes, for clarity, an order-theoretic lattice) and is typically denoted by 𝐿, and its order ≤ is called a lattice order.

The order-theoretic conception of a lattice has an exact algebraic counterpart. An algebraic structure (𝐿, ∨, ∧) is called an algebraic lattice if the operations ∨ and ∧ have the following properties:
(i) The operations ∨ and ∧ are associative, commutative, and idempotent (𝑥 ∨ 𝑥 = 𝑥 ∧ 𝑥 = 𝑥).
(ii) The operations ∨ and ∧ satisfy the absorption laws 𝑥 ∨ (𝑥 ∧ 𝑦) = 𝑥 ∧ (𝑥 ∨ 𝑦) = 𝑥.

The following four theorems (the proofs of which are left as exercises; see Project 12.D) establish a one-to-one correspondence between order-theoretic and algebraic lattices on a set 𝐿.

Theorem 12.10.1. Suppose that (𝐿, ≤) is an order-theoretic lattice. If binary operations ∨ and ∧ are defined on 𝐿 by 𝑥 ∨ 𝑦 ∶= sup{𝑥, 𝑦} (sometimes read “join{𝑥, 𝑦}”) and 𝑥 ∧ 𝑦 ∶= inf{𝑥, 𝑦} (sometimes read “meet{𝑥, 𝑦}”), then (𝐿, ∨, ∧) is an algebraic lattice, called the algebraic lattice induced by (𝐿, ≤).

Theorem 12.10.2. If (𝐿, ∨, ∧) is an algebraic lattice, and the binary relation ≤ on 𝐿 is defined by 𝑥 ≤ 𝑦 ⇔ 𝑥 ∧ 𝑦 = 𝑥 (equivalently, 𝑥 ∨ 𝑦 = 𝑦), then (𝐿, ≤) is an order-theoretic lattice, called the order-theoretic lattice induced by (𝐿, ∨, ∧).

Theorem 12.10.3. If (𝐿, ≤) is an order-theoretic lattice and (𝐿, ∨, ∧) is the algebraic lattice induced by (𝐿, ≤), then the order-theoretic lattice induced by (𝐿, ∨, ∧) is precisely (𝐿, ≤).

Theorem 12.10.4. If (𝐿, ∨, ∧) is an algebraic lattice and (𝐿, ≤) is the order-theoretic lattice induced by (𝐿, ∨, ∧), then the algebraic lattice induced by (𝐿, ≤) is precisely (𝐿, ∨, ∧).
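Theorems 12.10.1–12.10.3 can be checked on a small example. The sketch below is a minimal illustration with hypothetical helpers `sup` and `inf`: it computes joins and meets in the divisors of 60, ordered by divisibility, purely from the order, then verifies the algebraic-lattice axioms and that 𝑥 ≤ 𝑦 ⇔ 𝑥 ∧ 𝑦 = 𝑥.

```python
from itertools import product
from math import gcd


def sup(L, leq, x, y):
    """Least upper bound of x and y computed from the order alone (None if none exists)."""
    uppers = [b for b in L if leq(x, b) and leq(y, b)]
    least = [b for b in uppers if all(leq(b, c) for c in uppers)]
    return least[0] if least else None


def inf(L, leq, x, y):
    """Greatest lower bound of x and y (None if none exists)."""
    lowers = [a for a in L if leq(a, x) and leq(a, y)]
    greatest = [a for a in lowers if all(leq(c, a) for c in lowers)]
    return greatest[0] if greatest else None


# Order-theoretic lattice: the divisors of 60, ordered by divisibility.
L = [d for d in range(1, 61) if 60 % d == 0]
leq = lambda a, b: b % a == 0            # a <= b  iff  a divides b

# Theorem 12.10.1: x v y := sup{x, y} and x ^ y := inf{x, y}, tabulated once.
join = {(x, y): sup(L, leq, x, y) for x in L for y in L}
meet = {(x, y): inf(L, leq, x, y) for x in L for y in L}

# Algebraic-lattice axioms: associativity, commutativity, idempotence, absorption.
for x, y, z in product(L, repeat=3):
    assert join[join[x, y], z] == join[x, join[y, z]]
    assert meet[meet[x, y], z] == meet[x, meet[y, z]]
for x, y in product(L, repeat=2):
    assert join[x, y] == join[y, x] and meet[x, y] == meet[y, x]
    assert join[x, meet[x, y]] == x and meet[x, join[x, y]] == x
    # Theorem 12.10.3: the order induced by (L, v, ^) is the original order.
    assert (meet[x, y] == x) == leq(x, y)
for x in L:
    assert join[x, x] == x and meet[x, x] == x

print(join[4, 6], meet[4, 6])   # 12 2
print(all(join[x, y] == x * y // gcd(x, y) for x, y in product(L, repeat=2)))  # True
```

In this lattice the join is the least common multiple and the meet is the greatest common divisor, as the last line confirms.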
Remark 12.10.5. Recall that if (𝑃, ≤) is a poset and 𝑄 ⊆ 𝑃 is equipped with the binary relation ≤𝑄 defined by 𝑥 ≤𝑄 𝑦 ⇔ 𝑥 ≤ 𝑦 (for 𝑥, 𝑦 ∈ 𝑄), then ≤𝑄 is a partial order on 𝑄. However, if (𝐿, ≤) is a lattice, 𝑄 ⊆ 𝐿, and ≤𝑄 is defined as above, then (𝑄, ≤𝑄) is a poset but need not be a lattice. And if it is a lattice, the binary operations ∨𝑄 and ∧𝑄 on 𝑄 induced by ≤𝑄 need not agree with the operations ∨ and ∧ induced by ≤ on 𝐿. Only when 𝑥 ∨𝑄 𝑦 = 𝑥 ∨ 𝑦 and 𝑥 ∧𝑄 𝑦 = 𝑥 ∧ 𝑦 for all 𝑥, 𝑦 ∈ 𝑄 do we call (𝑄, ≤𝑄) a sublattice of (𝐿, ≤). Figure 12.2 illustrates these distinctions.
(i) The set 𝑄 = {𝑏, 𝑐}, with ordering inherited from (𝐿, ≤), is not a lattice.
(ii) The set 𝑄 = {0̂, 𝑏, 𝑐, 1̂}, with ordering inherited from (𝐿, ≤), is a lattice. But 𝑏 ∨𝑄 𝑐 = 1̂ ≠ 𝑏 ∨𝐿 𝑐 = 𝑎 and 𝑏 ∧𝑄 𝑐 = 0̂ ≠ 𝑏 ∧𝐿 𝑐 = 𝑑. So (𝑄, ≤𝑄) is not a sublattice of (𝐿, ≤).
Figure 12.2. Hasse diagram of the lattice (𝐿, ≤)
References
[1] K. Arrow (1951): Social Choice and Individual Values, Cowles Commission Monograph 12, John Wiley and Sons.
[2] K. Bogart (1990): Introductory Combinatorics, Harcourt, Brace, Jovanovich. MR1206900
[3] E. Canfield (1978): On a problem of Rota, Advances in Mathematics 29, 1–10. MR480066
[4] R. Dilworth (1950): A decomposition theorem for partially ordered sets, Annals of Mathematics 51, 161–166.
[5] L. Ford and D. Fulkerson (1962): Flows in Networks, Princeton University Press. MR0159700
[6] R. Graham (1978): Maximum antichains in the partition lattice, The Mathematical Intelligencer 2, 84–86. MR505555
[7] P. Hall (1935): On representatives of subsets, J. London Math. Soc. 10, 26–30.
[8] J. Kelley (1955): General Topology, Van Nostrand (re-published in 1975 by Springer-Verlag). MR0070144
[9] D. Kurtz (1972): A note on concavity properties of triangular arrays of numbers, Journal of Combinatorial Theory 13, 135–139. MR304296
[10] D. Lubell (1966): A short proof of Sperner’s theorem, Journal of Combinatorial Theory 1, 299. MR194348
[11] L. Meshalkin (1963): Generalization of Sperner’s theorem on the number of subsets of a finite set, Theory of Probability and Its Applications 8, 203–204 (translation). MR0150049
[12] G.-C. Rota (1967): Research problem 2-1, Journal of Combinatorial Theory 2, 104.
[13] A. Sen (1970): Collective Choice and Social Welfare, Holden-Day.
[14] E. Sperner (1928): Ein Satz über Untermengen einer endlichen Menge, Mathematische Zeitschrift 27, 544–548. MR1544925
[15] R. Stanley (1991): Some applications of algebra to combinatorics, Discrete Applied Mathematics 34, 241–277. MR1137997
[16] E. Szpilrajn (1930): Sur l’extension de l’ordre partiel, Fundamenta Mathematicae 16, 386–389.
[17] W. Trotter (1992): Combinatorics and Partially Ordered Sets: Dimension Theory, The Johns Hopkins University Press. MR1169299
[18] D. Velleman (2006): How to Prove It: A Structured Approach, 2nd Edition, Cambridge University Press. MR2200345
Exercises
12.1. Suppose that (𝑃, ≤) is a poset and consider the map 𝑥 ↦ 𝐼𝑥 from 𝑃 to the set of all principal order ideals of 𝑃, ordered by the subset relation. Prove that this map is an order isomorphism. (Hence, up to isomorphism, the only posets are families of sets, ordered by the subset relation.)
12.2. Suppose that 𝑃 is a finite poset and 𝑆 ⊆ 𝑃. (a) Prove that the set of maximal elements of 𝑆 (with 𝑆 regarded as an induced subposet of 𝑃) constitutes an antichain. (b) Show by example that distinct subsets of 𝑃 may have the same set of maximal elements. (c) Prove that distinct order ideals must have distinct sets of maximal elements.
12.3. Suppose that 𝐀 = (𝐴1, . . . , 𝐴𝑘) and 𝐁 = (𝐵1, . . . , 𝐵𝑘) are ordered partitions of [𝑛]. We say that 𝐀 and 𝐁 are connected if, for some 𝑖, 𝐴𝑖 ⊂ 𝐵𝑖 or 𝐵𝑖 ⊂ 𝐴𝑖. Otherwise, 𝐀 and 𝐁 are unconnected. Prove that, in the set of all ordered partitions of [𝑛] with 𝑘 blocks, the largest set of pairwise unconnected ordered partitions has cardinality $\max\{\binom{n}{n_1,\ldots,n_k} : n_1+\cdots+n_k = n\}$. This result, due to Meshalkin (1963), reduces to Sperner’s theorem when 𝑘 = 2.
12.4. (For students of set theory) Prove that the following two formulations of the Hausdorff maximality principle are equivalent: (i) Every chain in a poset is contained in a maximal chain. (ii) Every poset contains a maximal chain.
12.5. Use Dilworth’s lemma (Theorem 12.6.2) to prove that if (𝑥1, 𝑥2, . . . , 𝑥𝑚𝑛+1) is a sequence of distinct real numbers, then this sequence contains an increasing subsequence of length 𝑚 + 1 or a decreasing subsequence of length 𝑛 + 1. (Hint: Define a partial order 𝑅 on 𝑃 = {𝑥1, 𝑥2, . . . , 𝑥𝑚𝑛+1} by 𝑥𝑖 𝑅 𝑥𝑗 ⇔ 𝑖 ≤ 𝑗 and 𝑥𝑖 ≤ 𝑥𝑗. To what subsequences of (𝑥1, 𝑥2, . . . , 𝑥𝑚𝑛+1) do the chains and antichains in the poset (𝑃, 𝑅) correspond?)
12.6. Prove Theorem 12.7.1: Suppose that 𝐴 = {𝑎1, . . . , 𝑎𝑛} and 𝐵 are finite sets and 𝑅 ⊆ 𝐴 × 𝐵. The function 𝑓 ∶ 𝐴 → 𝐵 is a matching for 𝑅 if and only if (𝑓(𝑎1), . . . , 𝑓(𝑎𝑛)) is a system of distinct representatives for (𝑅(𝑎1), . . . , 𝑅(𝑎𝑛)).
12.7. Prove Theorem 12.7.2: The sequence (𝑏1, . . . , 𝑏𝑛) is a system of distinct representatives for (𝐵1, . . . , 𝐵𝑛) if and only if, for the relation 𝑅 ⊆ [𝑛] × 𝐵 defined by (𝑖, 𝑏) ∈ 𝑅 ⇔ 𝑏 ∈ 𝐵𝑖, the function 𝑓 ∶ [𝑛] → 𝐵 defined by 𝑓(𝑖) = 𝑏𝑖 is a matching.
12.8. Prove Theorem 12.8.7: If (𝑎0, 𝑎1, . . . , 𝑎𝑛) is unimodal and symmetric, it attains its maximum value at 𝑟 central consecutive values of 𝑘, where 𝑟 ≡ 𝑛 + 1 (mod 2). In particular, this maximum value is always attained, inter alia, at 𝑘 = ⌊𝑛/2⌋.
12.9. Prove Corollary 12.8.8: If (𝑎0, 𝑎1, . . . , 𝑎𝑛) is positive, symmetric, and SLC, its maximum value is attained at 𝑘 = 𝑛/2 if 𝑛 is even, and at ⌊𝑛/2⌋ and ⌊𝑛/2⌋ + 1 if 𝑛 is odd.
12.10. (a) Prove Corollary 12.8.10: $\max\{\binom{n}{k} : 0 \le k \le n\} = \binom{n}{\lfloor n/2\rfloor}$. (b) Extend the preceding result to multinomial coefficients.
12.11. Show that the sequence $\big(\binom{n}{0}_q, \binom{n}{1}_q, \ldots, \binom{n}{n}_q\big)$ is SLC by a proof analogous to that of Theorem 12.8.9.
12.12. Let 𝑉 be an 𝑛-dimensional 𝔽𝑞-vector space, and let 𝑃 be the poset of subspaces of 𝑉, ordered by inclusion. Prove that 𝑃 is a Sperner poset. Determine the width of 𝑃 with as much specificity as possible.
12.13. (a) Prove that if 𝑅 is a quasi-order on a set 𝑋, then 𝑎𝑅 is a strict partial order on 𝑋. (b) Show by example that the mapping 𝑅 ↦ 𝑎𝑅 from the set of all quasi-orders on 𝑋 to the set of all strict partial orders on 𝑋 is not injective. (c) “A quasi-order on 𝑋 is the same as a partially ordered partition of 𝑋.” Give a mathematically precise formulation of this assertion, and prove it.
Projects
12.A Prove Theorems 12.2.1–12.2.6 and Corollary 12.2.7.
12.B Prove Theorems 12.3.1–12.3.4.
12.C Prove all theorems and corollaries in section 12.4. Hints: (i) In proving Theorem 12.4.10, you will need to show that if 𝑅 is transitive, then 𝑎𝑅 is transitive. To prove that the map 𝑅 ↦ 𝑎𝑅 is a bijection, with the asserted inverse, show that (1) for every partial order 𝑅, 𝑎𝑅 ∪ 𝐼 = 𝑅, and (2) for every strict partial order 𝑅, 𝑎(𝑅 ∪ 𝐼) = 𝑅. (ii) In proving Theorem 12.4.11, you will use Theorem 2.7.3. To prove that the map 𝑅 ↦ 𝑎𝑅 is a bijection, with the asserted inverse, show that (1) for every weak order 𝑅, 𝑎𝑅 ∪ 𝑠𝑐𝑎𝑅 = 𝑅 (Lemma: 𝑅 weak ⇒ 𝑠𝑐𝑎𝑅 = 𝑠𝑅), and (2) for every strict weak order 𝑅, 𝑎(𝑅 ∪ 𝑠𝑐𝑅) = 𝑅. (iii) In proving Theorem 12.4.12, you will need to show that if 𝑅 is complete and antisymmetric, then 𝑎𝑅 is complete. To prove that the map 𝑅 ↦ 𝑎𝑅 is a bijection, with the asserted inverse, show that (1) for every total order 𝑅, 𝑎𝑅 ∪ 𝐼 = 𝑅, and (2) for every strict total order 𝑅, 𝑎(𝑅 ∪ 𝐼) = 𝑅.
12.D Prove Theorems 12.10.1–12.10.4.
Chapter 13
Formal power series
We have previously remarked that convergence questions for combinatorial generating functions, viewed as power series over ℝ or ℂ, can be finessed by reconceptualizing such series as so-called formal power series. This chapter is devoted to an elaboration of that idea.
13.1. Semigroup algebras

Suppose that (𝑆, Δ) is a semigroup with identity 𝑒 and 𝐴 is a commutative ring with 1. Let 𝐴[𝑆] ∶= {𝐹 ∶ 𝑆 → 𝐴 ∶ 𝐹 is finitely nonzero}, and define addition (+), multiplication (⋅), and scalar multiplication ( . ) on 𝐴[𝑆] by (𝐹 + 𝐺)(𝑠) ∶= 𝐹(𝑠) + 𝐺(𝑠), $(F\cdot G)(s) := \sum_{u\Delta v = s} F(u)G(v)$, and (𝜆.𝐹)(𝑠) ∶= 𝜆(𝐹(𝑠)), where 𝑢, 𝑣, 𝑠 ∈ 𝑆 and 𝜆 ∈ 𝐴. It is straightforward to verify that (𝐴[𝑆], +, ⋅, .) is an 𝐴-algebra with additive identity 𝟎, where 𝟎(𝑠) ≡ 0, and multiplicative identity 𝟏, where 𝟏(𝑠) = 𝛿𝑒,𝑠, and that the operation ⋅ is commutative if the operation Δ is commutative. This algebra is called the algebra of 𝐴 relative to 𝑆. Such algebras are known generically as semigroup algebras, the most familiar example being the algebra of 𝐴 relative to the semigroup (ℕ, +), which is simply the algebra of polynomials over 𝐴.

Let (𝑆, Δ) and 𝐴 be as above, but suppose now that for each 𝑠 ∈ 𝑆, there are only finitely many ordered pairs (𝑢, 𝑣) such that 𝑢Δ𝑣 = 𝑠. In this case we can equip the set 𝐴^𝑆 of all functions from 𝑆 to 𝐴 with the structure of an 𝐴-algebra, using the very same definitions of addition, multiplication, and scalar multiplication as in the preceding paragraph. Here, however, the sum defining (𝐹 ⋅ 𝐺)(𝑠) is well-defined simply because it has finitely many terms, in contrast to the case of semigroup algebra products, which may have infinitely many terms (only finitely many of which are nonzero). The resulting structure (𝐴^𝑆, +, ⋅, .), which is called the unrestricted algebra of 𝐴 relative to 𝑆, is an 𝐴-algebra, with additive and multiplicative identities 𝟎 and 𝟏 defined as above, and the operation ⋅ is commutative if Δ is. Such algebras are known generically as unrestricted semigroup algebras. We have already encountered one such algebra in Chapter 8, the algebra of the field of complex numbers ℂ relative to the multiplicative semigroup of positive integers ℙ, based on the Dirichlet product $(F\cdot G)(n) = \sum_{d\mid n} F(d)G(n/d)$. In the next section, we introduce the unrestricted semigroup algebra with which we are chiefly concerned in this chapter, and show how it can be construed as an algebra of formal power series.
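A minimal sketch of the two kinds of product, assuming finitely nonzero functions are stored as Python dicts and 𝐴 = ℤ; `semigroup_product` and `dirichlet_at` are hypothetical names. For the unrestricted Dirichlet algebra only a single value (𝐹 ⋅ 𝐺)(𝑛) is computed, since that sum has finitely many terms even though 𝐹 and 𝐺 themselves need not be finitely nonzero.

```python
from collections import defaultdict


def semigroup_product(F, G, op):
    """(F.G)(s) = sum of F(u) G(v) over pairs with u op v = s, for finitely
    nonzero F, G stored as dicts {semigroup element: coefficient}."""
    H = defaultdict(int)
    for u, fu in F.items():
        for v, gv in G.items():
            H[op(u, v)] += fu * gv
    return {s: c for s, c in H.items() if c != 0}


def dirichlet_at(F, G, n):
    """Value at n of the Dirichlet product in the unrestricted algebra over (P, *):
    only the finitely many pairs (d, n/d) with d | n contribute."""
    return sum(F(d) * G(n // d) for d in range(1, n + 1) if n % d == 0)


# Over (N, +), A[S] is the polynomial algebra: (1 + X)(1 + X) = 1 + 2X + X^2.
print(semigroup_product({0: 1, 1: 1}, {0: 1, 1: 1}, lambda u, v: u + v))  # {0: 1, 1: 2, 2: 1}
# Over (P, *), with F = G = 1 everywhere, (F.G)(n) counts the divisors of n.
print(dirichlet_at(lambda n: 1, lambda n: 1, 12))                         # 6
```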
13.2. The Cauchy algebra

We consider now the unrestricted algebra of ℂ relative to the additive semigroup of nonnegative integers ℕ, so that, for all 𝐹, 𝐺 ∈ ℂ^ℕ, 𝑛 ∈ ℕ, and 𝜆 ∈ ℂ, we have

$$(F+G)(n) := F(n) + G(n), \tag{13.2.1}$$

$$F\cdot G(n) := \sum_{k=0}^{n} F(k)G(n-k), \quad\text{and} \tag{13.2.2}$$

$$(\lambda.F)(n) := \lambda(F(n)). \tag{13.2.3}$$
Henceforth, we shall usually omit the scalar multiplication symbol ( . ) in expressions such as (13.2.3). The product defined by (13.2.2) is called the Cauchy product of 𝐹 and 𝐺. By the remarks in section 13.1, ℂ^ℕ, equipped with the above operations, is a commutative ℂ-algebra with additive identity 𝟎, where 𝟎(𝑛) ∶≡ 0, and multiplicative identity 𝟏, where 𝟏(𝑛) ∶= 𝛿0,𝑛. This algebra is called the Cauchy algebra. If 𝐹 ∈ ℂ^ℕ, nonnegative integral powers of 𝐹 are defined inductively by (i) 𝐹^0 = 𝟏, and (ii) 𝐹^{𝑟+1} = 𝐹^𝑟 ⋅ 𝐹 for all 𝑟 ≥ 0. Note that if 𝑟 ≥ 2, then, by the general associativity principle (Warner (1971, Theorem A.5)), 𝐹1 ⋅ 𝐹2 ⋯ 𝐹𝑟 is unambiguous, with

$$(F_1\cdot F_2\cdots F_r)(n) = \sum_{\substack{n_1+n_2+\cdots+n_r=n\\ n_i\ge 0}} F_1(n_1)F_2(n_2)\cdots F_r(n_r). \tag{13.2.4}$$
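The Cauchy product (13.2.2) and formula (13.2.4) can be sketched on truncated sequences. The sketch assumes that a Python list holds 𝐹(0), . . . , 𝐹(𝑁 − 1); the helper names `cauchy` and `power` are hypothetical. Taking 𝐹(𝑛) = 1 for all 𝑛, the sum in (13.2.4) counts the weak compositions of 𝑛 with 𝑟 parts, of which there are $\binom{n+r-1}{r-1}$.

```python
from math import comb


def cauchy(F, G):
    """Cauchy product (13.2.2) of two truncated sequences of the same length N.
    Entries 0..N-1 are exact: (F.G)(n) uses only F(0..n) and G(0..n)."""
    return [sum(F[k] * G[n - k] for k in range(n + 1)) for n in range(len(F))]


def power(F, r):
    """F^r, with F^0 = 1 = (1, 0, 0, ...), truncated to len(F) entries."""
    result = [1] + [0] * (len(F) - 1)
    for _ in range(r):
        result = cauchy(result, F)
    return result


# With F(n) = 1 for all n, (13.2.4) counts the solutions of n1 + n2 + n3 = n
# in nonnegative integers, i.e., the weak compositions of n with 3 parts.
ones = [1] * 8
print(power(ones, 3))                        # [1, 3, 6, 10, 15, 21, 28, 36]
print([comb(n + 2, 2) for n in range(8)])    # [1, 3, 6, 10, 15, 21, 28, 36]
```

Truncation is harmless here because entry 𝑛 of a Cauchy product never looks at entries beyond index 𝑛.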
Theorem 13.2.1. If 𝐹, 𝐺 ≠ 𝟎, then 𝐹 ⋅ 𝐺 ≠ 𝟎.

Proof. Given 𝐹, 𝐺 ≠ 𝟎, let 𝑟 = min{𝑘 ∶ 𝐹(𝑘) ≠ 0} and 𝑠 = min{𝑘 ∶ 𝐺(𝑘) ≠ 0}. Then

$$F\cdot G(r+s) = \sum_{k=0}^{r+s} F(k)G(r+s-k) = F(r)G(s) \ne 0,$$

since 𝑘 < 𝑟 ⇒ 𝐹(𝑘) = 0, and 𝑘 > 𝑟 ⇒ 𝑟 + 𝑠 − 𝑘 < 𝑠, and so 𝐺(𝑟 + 𝑠 − 𝑘) = 0. Hence, 𝐹 ⋅ 𝐺 ≠ 𝟎. □

Theorem 13.2.2. 𝐹 ∈ ℂ^ℕ has a multiplicative inverse if and only if 𝐹(0) ≠ 0.

Proof. Necessity. If 𝐹 ⋅ 𝐺 = 𝟏, then 𝐹(0)𝐺(0) = (𝐹 ⋅ 𝐺)(0) = 1, and so 𝐹(0) ≠ 0. Sufficiency. Define 𝐺 ∶ ℕ → ℂ inductively by (i) 𝐺(0) = 1/𝐹(0) and (ii)

$$G(n+1) = -F(0)^{-1}\sum_{k=1}^{n+1} F(k)G(n+1-k).$$

It is easy to verify that 𝐹 ⋅ 𝐺 = 𝟏. □

Theorem 13.2.3. If 𝐹 ∈ ℂ^ℕ has a multiplicative inverse and 𝑟 ∈ ℙ, then 𝐹^𝑟 has a multiplicative inverse, and (𝐹^𝑟)^{−1} = (𝐹^{−1})^𝑟, their common value being denoted by 𝐹^{−𝑟}.

Proof. Straightforward. □
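The recursion in the proof of Theorem 13.2.2 translates directly into code. In this sketch (hypothetical helper `inverse`, exact arithmetic via Python’s `fractions` module, sequences truncated to a list), inverting 1 − 𝑋 − 𝑋^2 reproduces the Fibonacci numbers.

```python
from fractions import Fraction


def inverse(F):
    """Multiplicative inverse of a truncated sequence F with F(0) != 0 (Theorem 13.2.2):
    G(0) = 1/F(0) and G(n) = -F(0)^(-1) * sum_{k=1}^{n} F(k) G(n-k)."""
    if F[0] == 0:
        raise ValueError("F(0) = 0, so F has no multiplicative inverse")
    G = [Fraction(1, 1) / F[0]]
    for n in range(1, len(F)):
        G.append(-sum(F[k] * G[n - k] for k in range(1, n + 1)) / F[0])
    return G


# F = 1 - X - X^2: the entries of F^(-1) are the Fibonacci numbers.
G = inverse([1, -1, -1, 0, 0, 0, 0, 0])
print([int(g) for g in G])   # [1, 1, 2, 3, 5, 8, 13, 21]
```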
The set ℂ^ℕ_1 ∶= {𝐹 ∈ ℂ^ℕ ∶ 𝐹(0) = 1} is in some ways analogous to the set of positive real numbers. The following theorems are illustrative.
Theorem 13.2.4. For every 𝐹 ∈ ℂ^ℕ_1 and every 𝑟 ∈ ℙ, there exists in ℂ^ℕ_1 a unique 𝐺 (denoted 𝐹^{1/𝑟}) such that 𝐺^𝑟 = 𝐹.

Proof. By (13.2.4), we wish to find 𝐺 ∈ ℂ^ℕ_1 such that

$$F(n) = \sum_{\substack{n_1+\cdots+n_r=n\\ n_i\ge 0}} G(n_1)\cdots G(n_r), \quad\text{for all } n \ge 0. \tag{13.2.5}$$

For 𝑛 = 0, this yields 𝐺(0) = 1, and for 𝑛 = 1, 𝐺(1) = (1/𝑟)𝐹(1). In general, for 𝑛 ≥ 2, separating out the 𝑟 cases in (13.2.5) in which some 𝑛𝑖 = 𝑛 (each of which contributes 𝐺(𝑛), since 𝐺(0) = 1), and solving for 𝐺(𝑛), we have

$$G(n) = \frac{1}{r}\Big\{F(n) - \sum_{\substack{n_1+\cdots+n_r=n\\ 0\le n_i\le n-1}} G(n_1)G(n_2)\cdots G(n_r)\Big\}, \tag{13.2.6}$$

which determines 𝐺(𝑛) recursively. □
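Formula (13.2.6) can likewise be turned into a short program. Rather than enumerating compositions, this sketch (hypothetical helper names, exact rational arithmetic) uses the observation that, while 𝐺(𝑛) is still set to 0, entry 𝑛 of 𝐺^𝑟 is exactly the sum in (13.2.6) over 𝑛1 + ⋯ + 𝑛𝑟 = 𝑛 with all 𝑛𝑖 ≤ 𝑛 − 1.

```python
from fractions import Fraction


def cauchy(F, G):
    """Cauchy product (13.2.2), truncated to len(F) entries."""
    return [sum(F[k] * G[n - k] for k in range(n + 1)) for n in range(len(F))]


def power(F, r):
    """F^r truncated to len(F) entries (F^0 = 1)."""
    result = [Fraction(1)] + [Fraction(0)] * (len(F) - 1)
    for _ in range(r):
        result = cauchy(result, F)
    return result


def rth_root(F, r):
    """The unique G with G(0) = 1 and G^r = F (Theorem 13.2.4), for truncated F with F(0) = 1.
    Implements (13.2.6): while G(n) is still 0, entry n of G^r is exactly the sum over
    (n_1, ..., n_r) with every n_i <= n - 1."""
    if F[0] != 1:
        raise ValueError("Theorem 13.2.4 requires F(0) = 1")
    G = [Fraction(1)] + [Fraction(0)] * (len(F) - 1)
    for n in range(1, len(F)):
        partial = power(G[: n + 1], r)      # G(n) = 0 at this point
        G[n] = (F[n] - partial[n]) / r
    return G


# Square root of 1 + X: the binomial-series coefficients.
print([str(g) for g in rth_root([1, 1, 0, 0, 0], 2)])   # ['1', '1/2', '-1/8', '1/16', '-5/128']
# Cube root check: G^3 reproduces F on the computed entries.
F = [1, 2, 3, 4, 5, 6]
print(power(rth_root(F, 3), 3) == F)                     # True
```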
Theorem 13.2.5. For every 𝐹 ∈ ℂ^ℕ_1, 𝑟 ∈ ℙ, and 𝑞 ∈ ℤ, there exists in ℂ^ℕ_1 a unique 𝐺 (denoted 𝐹^{𝑞/𝑟}) such that 𝐺^𝑟 = 𝐹^𝑞.

Proof. Since 𝐹^𝑞(0) = 1, replace 𝐹 with 𝐹^𝑞 in Theorem 13.2.4. □
Theorem 13.2.6. Suppose that 𝐹, 𝐺 ∈ ℝ^ℕ, and that 𝑟 ∈ ℙ. If 𝑟 is odd and 𝐹^𝑟 = 𝐺^𝑟, then 𝐹 = 𝐺. If 𝑟 is even and 𝐹^𝑟 = 𝐺^𝑟, then 𝐹 = 𝐺 or 𝐹 = −𝐺.

Proof. If 𝐹 = 𝟎 or 𝐺 = 𝟎, the result is obvious, so suppose that 𝐹, 𝐺 ≠ 𝟎. Let 𝜔 denote the 𝑟th root of unity 𝑒^{2𝜋𝑖/𝑟}. Then

$$F^r - G^r = \prod_{j=1}^{r}(F - \omega^j G) = \mathbf{0}.$$

By Theorem 13.2.1, ℂ^ℕ has no zero divisors, so 𝐹 − 𝜔^𝑗𝐺 = 𝟎 for some 𝑗. If 𝜔^𝑗 ∉ ℝ, then 𝐹 − 𝜔^𝑗𝐺 ≠ 𝟎, since 𝐹 and 𝐺 are real-valued and 𝐺 ≠ 𝟎. If 𝑟 is odd, then 𝜔^𝑗 is real if and only if 𝑗 = 𝑟, and so 𝐹 − 𝜔^𝑟𝐺 = 𝐹 − 𝐺 = 𝟎, and 𝐹 = 𝐺. If 𝑟 is even, then 𝜔^𝑗 is real if and only if 𝑗 = 𝑟 or 𝑗 = 𝑟/2, and so 𝐹 − 𝐺 = 𝟎 or 𝐹 + 𝐺 = 𝟎, whence 𝐹 = 𝐺 or 𝐹 = −𝐺. □
13.3. Formal power series and polynomials over ℂ

It is time to make good on our claim that one can in good conscience deploy and manipulate combinatorial generating functions without attending to the convergence questions that arise in the classical analysis of real or complex power series. We begin by defining a function 𝑋 ∶ ℕ → ℂ by 𝑋(𝑛) ∶= 𝛿1,𝑛. Although it is a fixed member of ℂ^ℕ, 𝑋 is referred to as an indeterminate. It is easy to check that

$$X^r(n) = \delta_{r,n}, \quad\text{for all } r \in \mathbb{N}. \tag{13.3.1}$$

In particular, 𝑋^0 = 𝟏.

Theorem 13.3.1. For all 𝐹 ∈ ℂ^ℕ,

$$F = \sum_{n\ge 0} F(n)X^n. \tag{13.3.2}$$
Proof. Recall that if Ω is any nonempty set, the sequence (𝜏𝑛)𝑛≥0 in ℂ^Ω is said to converge pointwise to 𝜏 ∈ ℂ^Ω if, for all 𝜔 ∈ Ω, the sequence of complex numbers (𝜏𝑛(𝜔))𝑛≥0 converges to 𝜏(𝜔) with respect to the standard Euclidean metric on ℂ. In classical analysis Ω is usually an interval of real numbers, or a region of ℂ. In the present case, we take Ω = ℕ and 𝜏𝑛 = 𝑆𝑛 ∶= 𝐹(0)𝑋^0 + 𝐹(1)𝑋 + ⋯ + 𝐹(𝑛)𝑋^𝑛, and we interpret formula (13.3.2) to assert that (𝑆𝑛)𝑛≥0 converges pointwise to 𝐹. This is easily proved, for if 𝑗 ∈ ℕ, the sequence (𝐹(𝑛)𝑋^𝑛(𝑗))𝑛≥0 has at most one nonzero term, namely, 𝐹(𝑗)𝑋^𝑗(𝑗) = 𝐹(𝑗). So (i) 𝑆𝑛(𝑗) = 0 if 0 ≤ 𝑛 < 𝑗, and (ii) 𝑆𝑛(𝑗) = 𝐹(𝑗) for all 𝑛 ≥ 𝑗. In other words, for every 𝑗 ∈ ℕ, it is the case that, for sufficiently large 𝑛, 𝑆𝑛(𝑗) is actually equal to 𝐹(𝑗). (This means, of course, that the preceding proof holds, not only for the standard Euclidean metric, but for any metric whatsoever on ℂ, a point which we elaborate in the second paragraph below.) □

Expressions of the form ∑𝑛≥0 𝑎𝑛𝑋^𝑛 are called formal power series. By Theorem 13.3.1, it is trivially the case that ∑𝑛≥0 𝑎𝑛𝑋^𝑛 = ∑𝑛≥0 𝑏𝑛𝑋^𝑛 if and only if 𝑎𝑛 = 𝑏𝑛 for all 𝑛 ∈ ℕ. Along with formulas (13.2.1)–(13.2.3), Theorem 13.3.1 implies that

$$\sum_{n\ge0} a_nX^n + \sum_{n\ge0} b_nX^n = \sum_{n\ge0}(a_n+b_n)X^n, \tag{13.3.3a}$$

$$\sum_{n\ge0} a_nX^n \cdot \sum_{n\ge0} b_nX^n = \sum_{n\ge0}\Big(\sum_{k=0}^{n} a_kb_{n-k}\Big)X^n, \quad\text{and} \tag{13.3.3b}$$

$$\lambda\sum_{n\ge0} a_nX^n = \sum_{n\ge0}(\lambda a_n)X^n. \tag{13.3.3c}$$
The set of all functions 𝐹 ∈ ℂ^ℕ (when represented as formal power series as in (13.3.2)) is denoted by ℂ[[𝑋]]. Finitely nonzero functions 𝐹 ∈ ℂ^ℕ are called polynomials, since their representations as formal power series take the form $\sum_{k=0}^{n} F(k)X^k$, where 𝑛 = max{𝑘 ∶ 𝐹(𝑘) ≠ 0} (for 𝐹 ≠ 𝟎). The set of all polynomials, represented in this way, is denoted by ℂ[𝑋]. Clearly, ℂ[𝑋] is a subalgebra of ℂ[[𝑋]]. N.B. One should take care to distinguish between the cases in which (i) 𝐹 is a polynomial in the indeterminate 𝑋, and (ii) 𝐹(𝑛) is a polynomial function of 𝑛 (such as 𝐹(𝑛) = 𝑛^2).

As an aside to the proof of Theorem 13.3.1 above, we noted that the argument for the pointwise convergence of 𝑆𝑛 = 𝐹(0)𝑋^0 + 𝐹(1)𝑋 + ⋯ + 𝐹(𝑛)𝑋^𝑛 to 𝐹 holds for any metric on ℂ. In particular, it holds for the discrete metric 𝛿∗ defined by (1) 𝛿∗(𝑥, 𝑥) = 0 and (2) 𝛿∗(𝑥, 𝑦) = 1 if 𝑥 ≠ 𝑦, for which every subset of ℂ is open. Of course, there are sequences (𝐺𝑛)𝑛≥0 in ℂ^ℕ that converge pointwise to some 𝐺 ∈ ℂ^ℕ in the usual sense that lim𝑛→∞ 𝐺𝑛(𝑗) = 𝐺(𝑗) with respect to the standard Euclidean metric on ℂ, but for which, for some 𝑗, (𝐺𝑛(𝑗))𝑛≥0 fails to converge with respect to 𝛿∗. As it turns out, however, we shall only need to deal with sequences in ℂ^ℕ that converge pointwise, with ℂ equipped with the metric 𝛿∗, and hence with the following:
(1) sequences (𝑠𝑛)𝑛≥0 in ℂ that are strongly convergent (say, to 𝑠 ∈ ℂ), in the sense that 𝑠𝑛 is eventually equal to 𝑠;
(2) sequences (𝑆𝑛)𝑛≥0 in ℂ^ℕ that are strongly pointwise convergent (say, to 𝑆 ∈ ℂ^ℕ), in the sense that, for all 𝑗 ∈ ℕ, (𝑆𝑛(𝑗))𝑛≥0 strongly converges to 𝑆(𝑗); and
(3) infinite series ∑𝑛≥0 𝐹𝑛 that are strongly pointwise convergent (say, to 𝑆 ∈ ℂ^ℕ), in the sense that (𝑆𝑛)𝑛≥0 is strongly pointwise convergent to 𝑆, where 𝑆𝑛 ∶= 𝐹0 + ⋯ + 𝐹𝑛.

This is essentially the strategy followed by Niven (1969) in his beautiful treatment of formal power series and formal infinite products, which relies on the finitary nature of the sequences and series under consideration. In the remainder of this chapter we follow Niven’s exposition closely, with one important exception. We define an absolute value function on ℂ^ℕ, and analyze convergence in terms of the corresponding metric on ℂ^ℕ. Convergence of a sequence in ℂ^ℕ with respect to this metric turns out, unsurprisingly, to be equivalent to strong pointwise convergence (see Theorem 13.3.6). But there are at least two features that recommend our approach. First, we are able to discuss convergence without having to drill down so often to the pointwise behavior of sequences in ℂ^ℕ. Second, by stating theorems in terms of the aforementioned absolute value, we are able to highlight some lovely parallels with, as well as some striking departures from, classical analysis. We begin by introducing the important function ord ∶ ℂ^ℕ → ℕ ∪ {∞}, defined by

$$\operatorname{ord}(\mathbf{0}) := \infty, \quad\text{and} \tag{13.3.4a}$$

$$\operatorname{ord}(F) := \min\{j : F(j) \ne 0\}, \quad\text{for all } F \ne \mathbf{0}. \tag{13.3.4b}$$
So if 𝐹(0) = 𝐹(1) = ⋯ = 𝐹(𝑛 − 1) = 0 and 𝐹(𝑛) ≠ 0, then ord(𝐹) = 𝑛. In particular, ord(𝟏) = 0, ord(𝑋) = 1, and ord(𝑋^𝑛) = 𝑛. In what follows, we employ the convention 𝑛 + ∞ = ∞ + 𝑛 = ∞ + ∞ = ∞.

Theorem 13.3.2. For all 𝐹, 𝐺 ∈ ℂ^ℕ and all nonzero 𝜆 ∈ ℂ,

$$\operatorname{ord}(F\cdot G) = \operatorname{ord}(F) + \operatorname{ord}(G) \quad\text{and} \tag{13.3.5a}$$

$$\operatorname{ord}(\lambda F) = \operatorname{ord}(F). \tag{13.3.5b}$$

Proof. Exercise. □

Theorem 13.3.3. For all 𝐹, 𝐺 ∈ ℂ^ℕ,

$$\operatorname{ord}(F+G) \ge \min\{\operatorname{ord}(F), \operatorname{ord}(G)\}, \tag{13.3.6}$$

and if ord(𝐹) ≠ ord(𝐺), then ord(𝐹 + 𝐺) = min{ord(𝐹), ord(𝐺)}.

Proof. Exercise. □

Next we define the absolute value function | | on ℂ^ℕ by

$$|F| = 2^{-\operatorname{ord}(F)}, \tag{13.3.7}$$

where 2^{−∞} ∶= 0. Note that the range of | | is equal to the set {1, 1/2, 1/4, . . . , 0}.

Theorem 13.3.4. The function | | has the following properties.
(i) |𝐹| = 0 if and only if 𝐹 = 𝟎.
(ii) |𝐹 ⋅ 𝐺| = |𝐹||𝐺|.
(iii) For all nonzero 𝜆 ∈ ℂ, |𝜆𝐹| = |𝐹|.
(iv) |𝐹 + 𝐺| ≤ max{|𝐹|, |𝐺|}, and if |𝐹| ≠ |𝐺|, then |𝐹 + 𝐺| = max{|𝐹|, |𝐺|}.
Proof. Exercise. □
Property (iv) above is a stronger version of the triangle inequality, known as the non-Archimedean property. Note that |𝐹| = 1 if and only if 𝐹(0) ≠ 0, and |𝐹| < 1 if and only if 𝐹(0) = 0. Finally, we define a mapping 𝑑 ∶ ℂ^ℕ × ℂ^ℕ → {1, 1/2, 1/4, . . . , 0} by

$$d(F,G) := |F-G|. \tag{13.3.8}$$
Theorem 13.3.5. The mapping 𝑑 has the following properties. (i) For all 𝐹, 𝐺 ∈ ℂℕ , 𝑑(𝐹, 𝐺) = 0 if and only if 𝐹 = 𝐺. (ii) For all 𝐹, 𝐺 ∈ ℂℕ , 𝑑(𝐹, 𝐺) = 𝑑(𝐺, 𝐹). (iii) For all 𝐹, 𝐺, 𝐻 ∈ ℂℕ , 𝑑(𝐹, 𝐻) ≤ 𝑑(𝐹, 𝐺) + 𝑑(𝐺, 𝐻) and, in fact, (iv) 𝑑(𝐹, 𝐻) ≤ max{𝑑(𝐹, 𝐺), 𝑑(𝐺, 𝐻)}, with 𝑑(𝐹, 𝐻) = max{𝑑(𝐹, 𝐺), 𝑑(𝐺, 𝐻)} if 𝑑(𝐹, 𝐺) ≠ 𝑑(𝐺, 𝐻). Proof. Immediate, from Theorem 13.3.4.
□
By (i)–(iii) above, 𝑑 is a metric on ℂ^ℕ. Property (iv) is a stronger version of the triangle inequality (iii), known as the ultrametric inequality, and any metric with this property is called an ultrametric. In an ultrametric space, all triangles are isosceles. (Why?)

We are finally prepared to deal with convergence. Suppose that (𝐹𝑛)𝑛≥0 is a sequence in ℂ^ℕ and 𝐹 ∈ ℂ^ℕ. As in any metric space, we say that this sequence converges to 𝐹, symbolized lim𝑛→∞ 𝐹𝑛 = 𝐹, if lim𝑛→∞ 𝑑(𝐹𝑛, 𝐹) = 0.

Theorem 13.3.6. The following statements are equivalent.
(1) lim𝑛→∞ 𝐹𝑛 = 𝐹.
(2) lim𝑛→∞ 𝑑(𝐹𝑛, 𝐹) = 0.
(3) lim𝑛→∞ |𝐹𝑛 − 𝐹| = 0.
(4) lim𝑛→∞ 2^{−ord(𝐹𝑛 − 𝐹)} = 0.
(5) lim𝑛→∞ ord(𝐹𝑛 − 𝐹) = ∞.
(6) For all 𝑗 ∈ ℕ, there exists an 𝑁𝑗 ∈ ℕ such that 𝑛 ≥ 𝑁𝑗 ⇒ ord(𝐹𝑛 − 𝐹) > 𝑗.
(7) For all 𝑗 ∈ ℕ, there exists an 𝑁𝑗 ∈ ℕ such that 𝑛 ≥ 𝑁𝑗 ⇒ 𝐹𝑛(𝑖) = 𝐹(𝑖), 𝑖 = 0, 1, . . . , 𝑗.
(8) For all 𝑗 ∈ ℕ, there exists an 𝑛𝑗 ∈ ℕ such that 𝑛 ≥ 𝑛𝑗 ⇒ 𝐹𝑛(𝑗) = 𝐹(𝑗).

Proof. The equivalence of statements (1)–(7) is simply definitional. Statement (8) follows from statement (7) by setting 𝑛𝑗 = 𝑁𝑗 and invoking the case 𝑖 = 𝑗 of (7). Statement (7) follows from (8) by taking 𝑁𝑗 = max{𝑛0, 𝑛1, . . . , 𝑛𝑗}. □

Statement (8) says that, for all 𝑗 ∈ ℕ, 𝐹𝑛(𝑗) is eventually equal to 𝐹(𝑗), and so, as promised earlier, lim𝑛→∞ 𝐹𝑛 = 𝐹 exactly captures the notion of the strong pointwise convergence of (𝐹𝑛)𝑛≥0 to 𝐹.
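The order function, the absolute value, and the ultrametric 𝑑 are easy to compute for polynomials. The sketch below (hypothetical helpers; a Python list is taken to represent a finitely nonzero element of ℂ^ℕ) checks the ultrametric inequality and the isosceles property on a handful of sample triples. This illustrates the “(Why?)” above on examples but does not prove it.

```python
import math
from fractions import Fraction
from itertools import product


def ord_(F):
    """ord(F) = min{j : F(j) != 0}, with ord(0) = infinity (13.3.4).
    F is a finitely nonzero sequence (a polynomial) given as a list."""
    return next((j for j, c in enumerate(F) if c != 0), math.inf)


def absval(F):
    """|F| = 2^(-ord F), with 2^(-infinity) = 0 (13.3.7)."""
    k = ord_(F)
    return Fraction(0) if k == math.inf else Fraction(1, 2 ** k)


def d(F, G):
    """d(F, G) = |F - G| (13.3.8), padding the shorter list with zeros."""
    n = max(len(F), len(G))
    F, G = F + [0] * (n - len(F)), G + [0] * (n - len(G))
    return absval([a - b for a, b in zip(F, G)])


samples = [[1, 2, 3], [1, 2, 0, 5], [0, 1], [0, 0, 7], [1, 2, 3, 4], [0]]
for F, G, H in product(samples, repeat=3):
    assert d(F, H) <= max(d(F, G), d(G, H))        # ultrametric inequality
    sides = sorted([d(F, G), d(G, H), d(F, H)])
    assert sides[1] == sides[2]                     # the two longest sides agree

print(d([1, 2, 3], [1, 2, 0, 5]))   # 1/4: the sequences agree at j = 0, 1 only
```

The last line is an instance of Theorem 13.3.6(7): 𝑑(𝐹, 𝐺) ≤ 2^{−(𝑗+1)} exactly when 𝐹 and 𝐺 agree at 0, 1, . . . , 𝑗.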
Theorem 13.3.7. If (𝐹𝑛 )𝑛≥0 and (𝐺𝑛 )𝑛≥0 are sequences in ℂℕ , 𝜆 ∈ ℂ, lim𝑛→∞ 𝐹𝑛 = 𝐹, and lim𝑛→∞ 𝐺𝑛 = 𝐺, then lim𝑛→∞ (𝐹𝑛 + 𝐺𝑛 ) = 𝐹 + 𝐺 and lim𝑛→∞ 𝜆𝐹𝑛 = 𝜆𝐹. Proof. Straightforward, using Theorem 13.3.6(8).
□
Remark 13.3.8. Students of topology will recognize that the equivalence of statements (1) and (8) in Theorem 13.3.6 is just what would be the case if ℂℕ were equipped with the so-called topology of pointwise convergence, where ℂ is equipped with the discrete topology. In fact, the topology on ℂℕ induced by the ultrametric 𝑑 is precisely this topology, as shown in Appendix B.
13.4. Infinite sums in ℂ^ℕ

Suppose that (𝐹𝑛)𝑛≥0 is a sequence in ℂ^ℕ. For all 𝑛 ≥ 0, let

$$S_n := F_0 + \cdots + F_n. \tag{13.4.1}$$

If there exists an 𝑆 ∈ ℂ^ℕ such that lim𝑛→∞ 𝑆𝑛 = 𝑆, we say that the infinite series ∑𝑛≥0 𝐹𝑛 converges to 𝑆, symbolized by

$$\sum_{n\ge0} F_n = S. \tag{13.4.2}$$
Theorem 13.4.1. The infinite series ∑𝑛≥0 𝐹𝑛 converges if and only if lim𝑛→∞ 𝐹𝑛 = 𝟎. (“This is soft analysis”. – Joseph Schoenfield (1969))

Proof. Necessity. Suppose that ∑𝑛≥0 𝐹𝑛 = 𝑆, with 𝑆𝑛 defined by (13.4.1). Then for each 𝑗 ∈ ℕ, there exists an 𝑛𝑗 ∈ ℕ such that 𝑛 ≥ 𝑛𝑗 ⇒ 𝑆𝑛(𝑗) = 𝑆(𝑗). So if 𝑛 ≥ 𝑛𝑗 + 1, then 𝐹𝑛(𝑗) = 𝑆𝑛(𝑗) − 𝑆𝑛−1(𝑗) = 𝑆(𝑗) − 𝑆(𝑗) = 0, and so lim𝑛→∞ 𝐹𝑛 = 𝟎 by Theorem 13.3.6(8). Sufficiency. Suppose that lim𝑛→∞ 𝐹𝑛 = 𝟎. Then for each 𝑗 ∈ ℕ, there exists an 𝑛𝑗 ∈ ℕ such that 𝑛 ≥ 𝑛𝑗 ⇒ 𝐹𝑛(𝑗) = 0. Setting 𝑆(𝑗) ∶= 𝐹0(𝑗) + ⋯ + 𝐹𝑛𝑗−1(𝑗), we have 𝑛 ≥ 𝑛𝑗 ⇒ 𝑆𝑛(𝑗) = 𝑆(𝑗), and so lim𝑛→∞ 𝑆𝑛 = 𝑆. □

It follows as a corollary of the above theorem that every infinite series of the form ∑𝑛≥0 𝑎𝑛𝑋^𝑛 converges (cf. Theorem 13.3.1). If (𝐹𝑛)𝑛≥0 is a sequence in ℂ^ℕ, the sequence (𝐺𝑛)𝑛≥0 is called a rearrangement of (𝐹𝑛)𝑛≥0 if there exists a permutation 𝜎 of ℕ such that 𝐺𝑛 = 𝐹𝜎(𝑛) for all 𝑛.

Theorem 13.4.2. If ∑𝑛≥0 𝐹𝑛 converges and 𝜎 is any permutation of ℕ, then ∑𝑛≥0 𝐹𝜎(𝑛) converges and

$$\sum_{n\ge0} F_n = \sum_{n\ge0} F_{\sigma(n)}. \tag{13.4.3}$$
Proof. Consider the matrix 𝑀 = (𝐹𝑛(𝑗))𝑛,𝑗≥0, with 𝑛 being the row indicator, and 𝑗 the column indicator. By Theorem 13.4.1, lim𝑛→∞ 𝐹𝑛 = 𝟎, and so all columns of 𝑀 are finitely nonzero. Since 𝑀𝜎 = (𝐹𝜎(𝑛)(𝑗))𝑛,𝑗≥0 comes from 𝑀 by the obvious permutation of the rows of 𝑀, the columns of 𝑀𝜎 are also finitely nonzero, and the 𝑗th column sum of 𝑀𝜎 is equal to the 𝑗th column sum of 𝑀 for all 𝑗 ≥ 0. □
In classical complex analysis, a sequence (𝑎𝑛)𝑛≥0 is said to be summable if (i) ∑𝑛≥0 𝑎𝑛 converges, and (ii) for every permutation 𝜎 of ℕ, ∑𝑛≥0 𝑎𝜎(𝑛) converges, with ∑𝑛≥0 𝑎𝜎(𝑛) = ∑𝑛≥0 𝑎𝑛. As will be familiar to most readers, the mere convergence of ∑𝑛≥0 𝑎𝑛 does not imply the summability of (𝑎𝑛)𝑛≥0. Indeed, (𝑎𝑛)𝑛≥0 is summable if and only if ∑𝑛≥0 |𝑎𝑛| converges. See Wade (2005, Theorems 4.17 and 4.18), as well as Appendix A. By Theorem 13.4.2, the situation is quite different for sequences in ℂ^ℕ, where convergence of ∑𝑛≥0 𝐹𝑛 with respect to the ultrametric 𝑑 is equivalent to summability of (𝐹𝑛)𝑛≥0. In what follows, this equivalence enables us to refer directly to the summability of sequences, omitting any reference to the convergence of their associated series.

Suppose that (𝐹𝑛)𝑛≥0 is a sequence in ℂ^ℕ. The sequence (𝐺𝑛)𝑛≥0 is said to result from insertion of parentheses in (𝐹𝑛)𝑛≥0 if there exists a strictly increasing function 𝑝 ∶ ℕ → ℕ, with 𝑝(0) = 0, and $G_n := \sum_{p(n)\le i<p(n+1)} F_i$ […] (𝐹𝑛)𝑛>𝑟 is summable, and

$$\sum_{n\ge0} F_n = \sum_{0\le n\le r} F_n + \sum_{n>r} F_n. \tag{13.4.6}$$

Proof. Straightforward. □

Let ℂ^ℕ_0 ∶= {𝐹 ∈ ℂ^ℕ ∶ 𝐹(0) = 0}. Recall that 𝐹 ∈ ℂ^ℕ_0 if and only if |𝐹| < 1.
Theorem 13.4.5 (Geometric series in ℂ^ℕ). If |𝐹| < 1, then (𝐹^𝑛)𝑛≥0 is summable, and

$$\sum_{n\ge0} F^n = (\mathbf{1} - F)^{-1}. \tag{13.4.7}$$

In particular,

$$\sum_{n\ge0} X^n = (\mathbf{1} - X)^{-1}. \tag{13.4.8}$$
Proof. By Theorem 13.3.4(ii), |𝐹^𝑛| = |𝐹|^𝑛, and since |𝐹| < 1, lim𝑛→∞ |𝐹^𝑛| = 0. So by Theorem 13.3.6(3), lim𝑛→∞ 𝐹^𝑛 = 𝟎, and so, by Theorem 13.4.1, (𝐹^𝑛)𝑛≥0 is summable. To complete the proof, it suffices to show that (𝟏 − 𝐹) ⋅ ∑𝑛≥0 𝐹^𝑛 = 𝟏. By equation (13.4.5),

$$(\mathbf{1}-F)\cdot\sum_{n\ge0}F^n = \sum_{n\ge0}(\mathbf{1}-F)\cdot F^n = \sum_{n\ge0}\big(F^n - F^{n+1}\big) = \lim_{N\to\infty}\big\{(F^0-F^1)+\cdots+(F^N-F^{N+1})\big\} = \lim_{N\to\infty}\big(\mathbf{1}-F^{N+1}\big) = \mathbf{1},$$

where the last step uses Theorem 13.3.7 and lim𝑁→∞ 𝐹^{𝑁+1} = 𝟎. □
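The proof of Theorem 13.4.5 can be watched in action. With 𝐹 = 𝑋 + 𝑋^2 (so |𝐹| = 1/2), this sketch (hypothetical helper names, sequences truncated to lists) forms the partial sums 𝐹^0 + ⋯ + 𝐹^𝑚 and shows that each entry stabilizes, in agreement with (𝟏 − 𝑋 − 𝑋^2)^{−1}, whose entries are the Fibonacci numbers.

```python
def cauchy(F, G):
    """Cauchy product (13.2.2), truncated to len(F) entries."""
    return [sum(F[k] * G[n - k] for k in range(n + 1)) for n in range(len(F))]


def geometric_partial_sums(F, terms):
    """Partial sums S_m = F^0 + F^1 + ... + F^m for m = 0, ..., terms - 1,
    each truncated to len(F) entries; requires F(0) = 0, i.e. |F| < 1."""
    if F[0] != 0:
        raise ValueError("need F(0) = 0")
    power = [1] + [0] * (len(F) - 1)    # F^0 = 1
    total = [0] * len(F)
    sums = []
    for _ in range(terms):
        total = [a + b for a, b in zip(total, power)]
        sums.append(total)
        power = cauchy(power, F)
    return sums


F = [0, 1, 1, 0, 0, 0, 0, 0, 0, 0]      # F = X + X^2, so |F| = 1/2
sums = geometric_partial_sums(F, 10)
print(sums[-1])   # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
# Entry j of S_m is already final once m >= j, because ord(F^m) = m.
print(all(sums[m][j] == sums[-1][j] for j in range(10) for m in range(j, 10)))  # True
```

This is exactly the strong pointwise convergence of Theorem 13.3.6(8): each coefficient of the partial sums is eventually constant.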
13.5. Summation interchange

In view of Theorems 13.4.1–13.4.4, it is tempting to conclude that one can play fast and loose with summation of any sort in the realm of formal power series. However, one cannot be too cavalier in summing the members of a double sequence in ℂ^ℕ, even when, displayed as a rectangular array, its rows and columns are finitely nonzero. Consider, for example, the double sequence (𝐹𝑖,𝑗)𝑖,𝑗≥0 shown in Table 13.1, in which 𝐹𝑖,𝑖+1 = 𝟏, 𝐹𝑖+1,𝑖 = −𝟏, and all other entries are 𝟎.

Table 13.1. The double sequence (𝐹𝑖,𝑗)𝑖,𝑗≥0 (row index 𝑖, column index 𝑗)

     𝟎    𝟏    𝟎    𝟎    𝟎    𝟎    𝟎   ⋯
    −𝟏    𝟎    𝟏    𝟎    𝟎    𝟎    𝟎   ⋯
     𝟎   −𝟏    𝟎    𝟏    𝟎    𝟎    𝟎   ⋯
     𝟎    𝟎   −𝟏    𝟎    𝟏    𝟎    𝟎   ⋯
     𝟎    𝟎    𝟎   −𝟏    𝟎    𝟏    𝟎   ⋯
     ⋮    ⋮    ⋮    ⋮    ⋮    ⋮    ⋮   etc.
In the above array, the rows and the row sums are summable, and the columns and the column sums are summable. But the row sums are 𝟏, 𝟎, 𝟎, . . . , so the sum of the row sums is equal to 𝟏, while the column sums are −𝟏, 𝟎, 𝟎, . . . , so the sum of the column sums is equal to −𝟏. And summing this array along its diagonals (i.e., 𝟎 + (−𝟏 + 𝟏) + (𝟎 + 𝟎 + 𝟎) + (𝟎 + (−𝟏) + 𝟏 + 𝟎) + ⋯) yields a still different sum, namely, 𝟎. We would like to identify conditions that allow us to sum the members of a double sequence in ℂ^ℕ by rows, columns, or diagonals, each yielding the same sum. Actually, we can accomplish considerably more. Let us say that (𝐹𝑖,𝑗)𝑖,𝑗≥0 is summable if there exists an 𝑆 ∈ ℂ^ℕ such that, for every bijection 𝑔 ∶ ℕ → ℕ × ℕ, the sequence (𝐺𝑛)𝑛≥0 in ℂ^ℕ defined by 𝐺𝑛 ∶= 𝐹𝑔(𝑛) is summable to 𝑆, in which case we write ∑𝑖,𝑗≥0 𝐹𝑖,𝑗 = 𝑆. For an arbitrary double sequence (𝐹𝑖,𝑗)𝑖,𝑗≥0 in ℂ^ℕ and 𝑟 ∈ ℕ, let Π𝑟 ∶= {(𝑖, 𝑗) ∈ ℕ × ℕ ∶ 0 ≤ ord(𝐹𝑖,𝑗) ≤ 𝑟}. If Π𝑟 is finite for all 𝑟, we say that (𝐹𝑖,𝑗)𝑖,𝑗≥0 is finitary.

Theorem 13.5.1. If the double sequence (𝐹𝑖,𝑗)𝑖,𝑗≥0 is finitary, then the following hold.
(i) It is summable.
(ii) Each of its rows and columns is summable.
13. Formal power series
(iii) The sequence of row sums and the sequence of column sums are both summable, and the sum of the row sums is equal to the sum of the column sums, i.e.,
$$\sum_{i\ge0}\Big(\sum_{j\ge0}F_{i,j}\Big) = \sum_{j\ge0}\Big(\sum_{i\ge0}F_{i,j}\Big). \tag{13.5.1}$$
(iv) Furthermore, denoting the common value of the preceding sums by 𝑆, it is the case that
$$\sum_{i\ge0}\Big(\sum_{j=0}^{i}F_{i-j,j}\Big) = S \quad \text{(diagonal summation of } (F_{i,j})_{i,j\ge0}\text{)}. \tag{13.5.2}$$
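By contrast, for a finitary double sequence all three orders of summation agree. A minimal sketch, using the hypothetical finitary example 𝐹𝑖,𝑗 = 𝑋^(𝑖+𝑗) (for which Π𝑟 = {(𝑖, 𝑗) ∶ 𝑖 + 𝑗 ≤ 𝑟}) and comparing coefficients modulo X^N; the common sum should be ∑𝑛≥0 (𝑛 + 1)𝑋ⁿ = (𝟏 − 𝑋)⁻².

```python
N = 10  # compare the coefficients of X^0, ..., X^(N-1)

def monomial(k):
    """X^k as a coefficient list truncated mod X^N (all zeros when k >= N)."""
    s = [0] * N
    if k < N:
        s[k] = 1
    return s

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def F(i, j):
    # Hypothetical finitary example: F_{i,j} = X^(i+j), so ord(F_{i,j}) = i + j.
    return monomial(i + j)

zero = [0] * N
# Entries with i + j >= N vanish mod X^N, so indices below N suffice in every order.
rows = zero
for i in range(N):
    row = zero
    for j in range(N):
        row = add(row, F(i, j))      # row sum R_i
    rows = add(rows, row)            # sum of the row sums

cols = zero
for j in range(N):
    col = zero
    for i in range(N):
        col = add(col, F(i, j))      # column sum C_j
    cols = add(cols, col)            # sum of the column sums

diags = zero
for i in range(N):
    for j in range(i + 1):
        diags = add(diags, F(i - j, j))  # diagonal summation (13.5.2)

print(rows == cols == diags)  # True
print(rows)                   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```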
Proof. (i) Suppose 𝑔 ∶ ℕ → ℕ × ℕ is a bijection and 𝐺𝑛 ∶= 𝐹𝑔(𝑛). To show that (𝐺𝑛)𝑛≥0 is summable it suffices by Theorem 13.4.1 to show that, for all 𝑟 ∈ ℕ, there exists an 𝑛𝑟 ∈ ℕ such that 𝑛 > 𝑛𝑟 ⇒ ord(𝐺𝑛) > 𝑟. Clearly, 𝑛𝑟 = max 𝑔⁻¹(Π𝑟) satisfies this desideratum (take 𝑛𝑟 = 0 if Π𝑟 = ∅). Suppose now that ℎ ∶ ℕ → ℕ × ℕ is also a bijection and 𝐻𝑛 ∶= 𝐹ℎ(𝑛). Let 𝜎 ∶= 𝑔⁻¹ ∘ ℎ. Clearly, 𝜎 is a permutation of ℕ, and 𝐺𝜎(𝑛) = 𝐹𝑔(𝜎(𝑛)) = 𝐹ℎ(𝑛) = 𝐻𝑛. As a rearrangement of (𝐺𝑛)𝑛≥0, the sequence (𝐻𝑛)𝑛≥0 is summable, and ∑𝑛≥0 𝐻𝑛 = ∑𝑛≥0 𝐺𝑛. (ii) For each 𝑟 ∈ ℕ, let 𝑖𝑟 ∶= max{𝑖 ∶ (𝑖, 𝑗) ∈ Π𝑟}, and let 𝑗𝑟 ∶= max{𝑗 ∶ (𝑖, 𝑗) ∈ Π𝑟}. Then, for every 𝑖 ∈ ℕ, 𝑗 > 𝑗𝑟 ⇒ ord(𝐹𝑖,𝑗) > 𝑟. So lim𝑗→∞ 𝐹𝑖,𝑗 = 𝟎, and (𝐹𝑖,𝑗)𝑗≥0 is summable, with, say, ∑𝑗≥0 𝐹𝑖,𝑗 = 𝑅𝑖. Similarly, for every 𝑗 ∈ ℕ, 𝑖 > 𝑖𝑟 ⇒ ord(𝐹𝑖,𝑗) > 𝑟. So lim𝑖→∞ 𝐹𝑖,𝑗 = 𝟎, and (𝐹𝑖,𝑗)𝑖≥0 is summable, with, say, ∑𝑖≥0 𝐹𝑖,𝑗 = 𝐶𝑗. (iii) Suppose, as in (i) above, that 𝑔 ∶ ℕ → ℕ × ℕ is a bijection, 𝐺𝑛 = 𝐹𝑔(𝑛), and ∑𝑛≥0 𝐺𝑛 = 𝑆𝐺. Let 𝑆𝐺,𝑛 ∶= 𝐺0 + ⋯ + 𝐺𝑛, and 𝑆𝑅,𝑛 ∶= 𝑅0 + ⋯ + 𝑅𝑛. To show that (𝑅𝑛)𝑛≥0 is summable and ∑𝑛≥0 𝑅𝑛 = 𝑆𝐺, it suffices to show that, for each 𝑟 ∈ ℕ, ord(𝑆𝐺,𝑛 − 𝑆𝑅,𝑛) > 𝑟 for 𝑛 sufficiently large. Let 𝑖𝑟 and 𝑗𝑟 be defined as in part (ii) above. Note that Π𝑟 ⊆ Π(𝑖𝑟, 𝑗𝑟) ∶= {0, 1, . . . , 𝑖𝑟} × {0, 1, . . . , 𝑗𝑟}. Now let 𝑛 be sufficiently large to ensure that Π(𝑖𝑟, 𝑗𝑟) ⊆ {𝑔(0), 𝑔(1), . . . , 𝑔(𝑛)}. In particular, this implies that 𝑛 ≥ 𝑖𝑟. Then
$$\operatorname{ord}(S_{G,n} - S_{R,n}) = \operatorname{ord}\Big[\Big(S_{G,n} - \sum_{\substack{0\le i\le i_r\\ 0\le j\le j_r}}F_{i,j}\Big) - \Big(S_{R,n} - \sum_{\substack{0\le i\le i_r\\ 0\le j\le j_r}}F_{i,j}\Big)\Big] > r,$$
since (1) each of the finitely many terms in $S_{G,n} - \sum_{0\le i\le i_r,\ 0\le j\le j_r}F_{i,j}$ has order greater than 𝑟, and (2)
𝑆 𝑅,𝑛 − ∑ 𝐹𝑖,𝑗 = (𝑆 𝑅,𝑖𝑟 − ∑ 𝐹𝑖,𝑗 ) + ∑ 𝑅𝑖 0≤𝑖≤𝑖𝑟 0≤𝑗≤𝑗𝑟
0≤𝑖≤𝑖𝑟 0≤𝑗≤𝑗𝑟
𝑖𝑟 𝑗𝑟
𝑖𝑟 𝑛𝑗 implies that 𝐹𝑛(𝑗) = 0, and there exists an 𝑛𝑗+1 such that 𝑛 > 𝑛𝑗+1 implies that 𝐹𝑛(𝑗 + 1) = 0, whence 𝒟𝐹𝑛(𝑗) = (𝑗 + 1)𝐹𝑛(𝑗 + 1) = 0. Hence, lim𝑛→∞ 𝒟𝐹𝑛 = 𝟎, and (𝒟𝐹𝑛)𝑛≥0 is summable. Moreover, for each 𝑗 ∈ ℕ,
$$\begin{aligned}\Big[\mathcal{D}\sum_{n\ge0}F_n\Big](j) &= (j+1)\Big(\sum_{n\ge0}F_n\Big)(j+1) = (j+1)\sum_{n=0}^{n_{j+1}}F_n(j+1) = \sum_{n=0}^{n_{j+1}}(j+1)F_n(j+1)\\ &= \sum_{n\ge0}(j+1)F_n(j+1) = \sum_{n\ge0}\mathcal{D}F_n(j) = \Big[\sum_{n\ge0}\mathcal{D}F_n\Big](j),\end{aligned}$$
which establishes equation (13.6.11). □
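The mechanism of this proof (only finitely many 𝐹𝑛 affect any given coefficient) is easy to see on truncated series. The sketch below is an illustration under stated assumptions: series are stored modulo X^N, the sequence 𝐹𝑛 = (𝑋 + 𝑋²)ⁿ is a hypothetical example, and the top coefficient of a truncated derivative is not determined, so only the first N − 1 coefficients are compared.

```python
from fractions import Fraction

N = 12  # series are handled modulo X^N

def mul(a, b):
    """Product of two series, truncated mod X^N."""
    c = [Fraction(0)] * N
    for i, ai in enumerate(a):
        if ai:
            for j in range(N - i):
                c[i + j] += ai * b[j]
    return c

def deriv(a):
    """(D a)(j) = (j + 1) a(j + 1); the top coefficient is not determined mod X^N, so it is set to 0."""
    return [(j + 1) * a[j + 1] for j in range(N - 1)] + [Fraction(0)]

# Hypothetical summable sequence F_n = (X + X^2)^n; ord(F_n) = n, so F_n -> 0.
base = [Fraction(0), Fraction(1), Fraction(1)] + [Fraction(0)] * (N - 3)
terms, power = [], [Fraction(1)] + [Fraction(0)] * (N - 1)
for n in range(N):  # F_n vanishes mod X^N for n >= N
    terms.append(power)
    power = mul(power, base)

total = [sum(column) for column in zip(*terms)]             # sum of the F_n
lhs = deriv(total)                                          # D(sum F_n)
rhs = [sum(column) for column in zip(*map(deriv, terms))]   # sum of the D F_n
print(lhs[:N - 1] == rhs[:N - 1])   # True
print([int(c) for c in total[:8]])  # [1, 1, 2, 3, 5, 8, 13, 21]
```

The sum here is (𝟏 − 𝑋 − 𝑋²)⁻¹, whose coefficients are Fibonacci numbers.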
13.7. The formal logarithm
By (9.4.5) it follows that $\ln(1+x) = \sum_{n\ge1}(-1)^{n+1}x^n/n$ if $-1 < x \le 1$. So it is perhaps unsurprising that the formal logarithm function ℒ is defined for |𝐹| < 1 by
$$\mathcal{L}(\mathbf{1}+F) := \sum_{n\ge1}\frac{(-1)^{n+1}}{n}F^n, \tag{13.7.1}$$
which converges by Theorem 13.4.1. In particular, ℒ(𝟏 + 𝑋) = 𝑋 − 𝑋²/2 + 𝑋³/3 − 𝑋⁴/4 + ⋯. Note that $\mathcal{L}:\mathbb{C}^{\mathbb{N}}_1 \to \mathbb{C}^{\mathbb{N}}_0$.
Theorem 13.7.1. For all $G \in \mathbb{C}^{\mathbb{N}}_1$,
$$\mathcal{D}(\mathcal{L}(G)) = G^{-1}\cdot\mathcal{D}(G). \tag{13.7.2}$$
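Proof. Suppose that 𝐺 = 𝟏 + 𝐹, where $F \in \mathbb{C}^{\mathbb{N}}_0$. Then, by (13.6.11), (13.6.4), (13.6.6), (13.4.5), and (13.4.7),
$$\mathcal{D}(\mathcal{L}(G)) = \mathcal{D}\sum_{n\ge1}\frac{(-1)^{n+1}}{n}F^n = \sum_{n\ge1}(-1)^{n+1}F^{n-1}\cdot\mathcal{D}F = (\mathbf{1}+F)^{-1}\cdot\mathcal{D}F = G^{-1}\cdot\mathcal{D}G,$$
since 𝒟𝐺 = 𝒟𝐹. □
Theorem 13.7.2. If $F, G \in \mathbb{C}^{\mathbb{N}}_1$, then
$$\mathcal{L}(F\cdot G) = \mathcal{L}(F) + \mathcal{L}(G), \tag{13.7.3}$$
$$\mathcal{L}(F^{-1}) = -\mathcal{L}(F), \tag{13.7.4}$$
and, for all 𝑚 ∈ ℤ,
$$\mathcal{L}(F^{m}) = m\mathcal{L}(F). \tag{13.7.5}$$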
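Definition (13.7.1) and identities (13.7.2)–(13.7.4) can be spot-checked on truncated series before reading the proof. This is a minimal sketch assuming series are stored modulo X^N with exact rational coefficients; `L`, `inverse`, and `series` are illustrative helper names, and the test series 𝐹 and 𝐺 are arbitrary hypothetical members of ℂℕ with constant term 1.

```python
from fractions import Fraction

N = 10  # series are handled modulo X^N

def series(*coeffs):
    """Coefficient list of length N with the given low-order coefficients."""
    return [Fraction(c) for c in coeffs] + [Fraction(0)] * (N - len(coeffs))

def mul(a, b):
    """Product of two series, truncated mod X^N."""
    c = [Fraction(0)] * N
    for i, ai in enumerate(a):
        if ai:
            for j in range(N - i):
                c[i + j] += ai * b[j]
    return c

def deriv(a):
    """(D a)(j) = (j + 1) a(j + 1); the top coefficient is not determined mod X^N, so it is set to 0."""
    return [(j + 1) * a[j + 1] for j in range(N - 1)] + [Fraction(0)]

def inverse(g):
    """Reciprocal of g (requires g(0) != 0), solving (g h)(n) = [n == 0] for h(0), h(1), ... in turn."""
    h = [Fraction(0)] * N
    h[0] = 1 / g[0]
    for n in range(1, N):
        h[n] = -sum(g[k] * h[n - k] for k in range(1, n + 1)) / g[0]
    return h

def L(G):
    """Formal logarithm (13.7.1) of G = 1 + F with F(0) = 0: sum of (-1)^(n+1) F^n / n."""
    assert G[0] == 1
    F = [Fraction(0)] + G[1:]
    result, power = [Fraction(0)] * N, F
    for n in range(1, N):  # F^n has order >= n, so it vanishes mod X^N once n >= N
        result = [r + Fraction((-1) ** (n + 1), n) * p for r, p in zip(result, power)]
        power = mul(power, F)
    return result

F = series(1, 1, 3)      # hypothetical F = 1 + X + 3X^2
G = series(1, -2, 0, 1)  # hypothetical G = 1 - 2X + X^3

print(L(series(1, 1))[:5])  # coefficients 0, 1, -1/2, 1/3, -1/4 of L(1 + X)
print(deriv(L(G))[:N - 1] == mul(inverse(G), deriv(G))[:N - 1])  # (13.7.2): True
print(L(mul(F, G)) == [x + y for x, y in zip(L(F), L(G))])       # (13.7.3): True
print(L(inverse(F)) == [-x for x in L(F)])                       # (13.7.4): True
```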
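Proof. Since $\mathcal{L}(F\cdot G) \in \mathbb{C}^{\mathbb{N}}_0$ and $\mathcal{L}(F) + \mathcal{L}(G) \in \mathbb{C}^{\mathbb{N}}_0$, it suffices by Theorem 13.6.1(𝑖) to show that 𝒟(ℒ(𝐹 ⋅ 𝐺)) = 𝒟(ℒ(𝐹) + ℒ(𝐺)). But
$$\begin{aligned}\mathcal{D}(\mathcal{L}(F\cdot G)) &= (F\cdot G)^{-1}\cdot\mathcal{D}(F\cdot G) = (G^{-1}\cdot F^{-1})\cdot(G\cdot\mathcal{D}F + F\cdot\mathcal{D}G)\\ &= F^{-1}\cdot\mathcal{D}F + G^{-1}\cdot\mathcal{D}G = \mathcal{D}(\mathcal{L}(F)) + \mathcal{D}(\mathcal{L}(G)) = \mathcal{D}(\mathcal{L}(F) + \mathcal{L}(G)).\end{aligned}$$
The proofs of (13.7.4) and (13.7.5) are left as exercises. □
Theorem 13.7.3. For all 𝑟 ∈ ℚ and all $G \in \mathbb{C}^{\mathbb{N}}_1$,
$$\mathcal{L}(G^{r}) = r\mathcal{L}(G). \tag{13.7.6}$$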
Proof. Suppose that 𝑟 = 𝑚/𝑛, where 𝑚 ∈ ℤ and 𝑛 ∈ ℙ. Let $H := G^{m/n}$, so that $G^m = H^n$, and $\mathcal{L}(G^m) = \mathcal{L}(H^n)$. So, by (13.7.5), 𝑚ℒ(𝐺) = 𝑛ℒ(𝐻), which implies (13.7.6). □ Theorem 13.7.4. For all $G \in \mathbb{C}^{\mathbb{N}}_1$, ℒ(𝐺) = 𝟎 if and only if 𝐺 = 𝟏. Proof. Sufficiency. Straightforward. Necessity. Since ℒ(𝐺) = 𝟎, 𝒟(ℒ(𝐺)) = 𝐺⁻¹ ⋅ 𝒟(𝐺) = 𝟎. Multiplying by 𝐺 yields 𝒟(𝐺) = 𝟎, so 𝐺 = 𝜆𝟏 for some 𝜆 ∈ ℂ. Since $G \in \mathbb{C}^{\mathbb{N}}_1$, 𝜆 = 1. □ Theorem 13.7.5. The mapping ℒ is injective. Proof. If $G, H \in \mathbb{C}^{\mathbb{N}}_1$ and ℒ(𝐺) = ℒ(𝐻), then, by (13.7.3) and (13.7.4), 𝟎 = ℒ(𝐺) − ℒ(𝐻) = ℒ(𝐺 ⋅ 𝐻⁻¹), and so, by Theorem 13.7.4, 𝐺 ⋅ 𝐻⁻¹ = 𝟏, and 𝐺 = 𝐻. □ We conclude this section with a generalization of Theorem 13.6.5.
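Theorem 13.7.6. Suppose that |𝐹| < 1. For all 𝑟 ∈ ℚ,
$$(\mathbf{1}+F)^r = \sum_{n\ge0}\binom{r}{n}F^n. \tag{13.7.7}$$
Proof. The sequence $\big(\binom{r}{n}F^n\big)_{n\ge0}$ is clearly summable, so suppose that $\sum_{n\ge0}\binom{r}{n}F^n = G$. By Theorem 13.6.6, $\mathcal{D}(G) = \mathcal{D}(F)\cdot\sum_{n\ge1}n\binom{r}{n}F^{n-1}$, and so
$$\begin{aligned}(\mathbf{1}+F)\cdot\mathcal{D}(G) &= \mathcal{D}(F)\cdot\sum_{n\ge1}n\binom{r}{n}F^{n-1} + \mathcal{D}(F)\cdot\sum_{n\ge1}n\binom{r}{n}F^{n}\\ &= \mathcal{D}(F)\cdot\sum_{n\ge1}n\binom{r}{n}F^{n-1} + \mathcal{D}(F)\cdot\sum_{n\ge2}(n-1)\binom{r}{n-1}F^{n-1}\\ &= r\mathcal{D}(F) + \mathcal{D}(F)\cdot\sum_{n\ge2}\Big(n\binom{r}{n} + (n-1)\binom{r}{n-1}\Big)F^{n-1}\\ &= r\mathcal{D}(F) + r\mathcal{D}(F)\cdot\sum_{n\ge2}\binom{r}{n-1}F^{n-1} = r\mathcal{D}(F)\cdot\sum_{n\ge0}\binom{r}{n}F^{n} = rG\cdot\mathcal{D}(F).\end{aligned}$$
(The fourth equality uses the identity $n\binom{r}{n} + (n-1)\binom{r}{n-1} = r\binom{r}{n-1}$.) Multiplying the result, $(\mathbf{1}+F)\cdot\mathcal{D}(G) = rG\cdot\mathcal{D}(F)$, of the above derivation by $G^{-1}\cdot(\mathbf{1}+F)^{-1}$ (note that $G(0) = \binom{r}{0} = 1$, so $G$ is invertible) yields
$$\begin{aligned}\mathcal{D}(\mathcal{L}(G)) = G^{-1}\cdot\mathcal{D}(G) &= r(\mathbf{1}+F)^{-1}\cdot\mathcal{D}(F) = r(\mathbf{1}+F)^{-1}\cdot\mathcal{D}(\mathbf{1}+F)\\ &= r\mathcal{D}(\mathcal{L}(\mathbf{1}+F)) = \mathcal{D}(r\mathcal{L}(\mathbf{1}+F)) = \mathcal{D}(\mathcal{L}((\mathbf{1}+F)^r)).\end{aligned}$$
Since ℒ(𝐺) and ℒ((𝟏 + 𝐹)^𝑟) are both members of $\mathbb{C}^{\mathbb{N}}_0$, it follows from Theorem 13.6.1 that ℒ(𝐺) = ℒ((𝟏 + 𝐹)^𝑟). Since ℒ is injective, $(\mathbf{1}+F)^r = G = \sum_{n\ge0}\binom{r}{n}F^n$. □ See Exercise 13.9 for a nice application of the above theorem.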
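Theorem 13.7.6 can be illustrated by summing the right-hand side of (13.7.7) modulo X^N and comparing it with powers and reciprocals computed directly. The sketch assumes the usual generalized binomial coefficient 𝑟(𝑟 − 1)⋯(𝑟 − 𝑛 + 1)/𝑛!; the example 𝐹 = 𝑋 + 𝑋² and the helper names are hypothetical.

```python
from fractions import Fraction

N = 10  # series are handled modulo X^N

def mul(a, b):
    """Product of two series, truncated mod X^N."""
    c = [Fraction(0)] * N
    for i, ai in enumerate(a):
        if ai:
            for j in range(N - i):
                c[i + j] += ai * b[j]
    return c

def binom(r, n):
    """Generalized binomial coefficient r(r - 1)...(r - n + 1) / n! for rational r."""
    result = Fraction(1)
    for k in range(n):
        result = result * (r - k) / (k + 1)
    return result

def binomial_series(F, r):
    """The right-hand side of (13.7.7), sum of binom(r, n) F^n, mod X^N (requires F(0) = 0)."""
    assert F[0] == 0
    result = [Fraction(0)] * N
    power = [Fraction(1)] + [Fraction(0)] * (N - 1)  # F^0
    for n in range(N):  # ord(F^n) >= n, so later terms vanish mod X^N
        c = binom(r, n)
        result = [s + c * p for s, p in zip(result, power)]
        power = mul(power, F)
    return result

one = [Fraction(1)] + [Fraction(0)] * (N - 1)
F = [Fraction(0), Fraction(1), Fraction(1)] + [Fraction(0)] * (N - 3)  # hypothetical F = X + X^2
one_plus_F = [u + f for u, f in zip(one, F)]

root = binomial_series(F, Fraction(1, 2))  # (1 + F)^(1/2)
cube = binomial_series(F, 3)               # (1 + F)^3; binom(3, n) = 0 for n > 3
recip = binomial_series(F, -1)             # (1 + F)^(-1)

print(mul(root, root) == one_plus_F)                          # True
print(cube == mul(one_plus_F, mul(one_plus_F, one_plus_F)))  # True
print(mul(recip, one_plus_F) == one)                          # True
```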
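13.8. The formal exponential function
Suppose that |𝐹| < 1, so that $\lim_{n\to\infty}F^n/n! = \mathbf{0}$. Then $(F^n/n!)_{n\ge0}$ is summable, and we define the formal exponential function ℰ by
$$\mathcal{E}(F) := \mathbf{1} + F + F^2/2! + \cdots = \sum_{n\ge0}F^n/n!. \tag{13.8.1}$$
In particular, $\mathcal{E}(X) = \sum_{n\ge0}X^n/n!$. Note that $\mathcal{E}:\mathbb{C}^{\mathbb{N}}_0 \to \mathbb{C}^{\mathbb{N}}_1$.
Theorem 13.8.1. For all $F \in \mathbb{C}^{\mathbb{N}}_0$,
$$\mathcal{D}(\mathcal{E}(F)) = \mathcal{E}(F)\cdot\mathcal{D}(F). \tag{13.8.2}$$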
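Proof. By formula (13.8.1), Theorem 13.6.6, formula (13.6.6), and formula (13.4.5),
$$\mathcal{D}(\mathcal{E}(F)) = \mathcal{D}\sum_{n\ge0}\frac{F^n}{n!} = \sum_{n\ge0}\mathcal{D}\frac{F^n}{n!} = \sum_{n\ge1}\Big(\frac{F^{n-1}}{(n-1)!}\cdot\mathcal{D}(F)\Big) = \Big(\sum_{n\ge0}\frac{F^n}{n!}\Big)\cdot\mathcal{D}(F) = \mathcal{E}(F)\cdot\mathcal{D}(F).$$
□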
Theorem 13.8.2. For all 𝐹 ∈ ℂℕ 0, ℒ(ℰ(𝐹)) = 𝐹. Proof. By Theorems 13.7.1 and 13.8.1, we have 𝒟(ℒ(ℰ(𝐹))) = (ℰ(𝐹))−1 ⋅ℰ(𝐹)⋅𝒟(𝐹) = 𝒟(𝐹), and so ℒ(ℰ(𝐹)) = 𝐹 by Theorem 13.6.1(𝑖). □
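The exponential (13.8.1), identity (13.8.2), Theorem 13.8.2, and the companion identity ℰ(ℒ(𝐺)) = 𝐺 of Theorem 13.8.3 can likewise be spot-checked modulo X^N. As before, this is a sketch with illustrative helper names and hypothetical test series, not an implementation from the text.

```python
from fractions import Fraction

N = 10  # series are handled modulo X^N

def series(*coeffs):
    """Coefficient list of length N with the given low-order coefficients."""
    return [Fraction(c) for c in coeffs] + [Fraction(0)] * (N - len(coeffs))

def mul(a, b):
    """Product of two series, truncated mod X^N."""
    c = [Fraction(0)] * N
    for i, ai in enumerate(a):
        if ai:
            for j in range(N - i):
                c[i + j] += ai * b[j]
    return c

def deriv(a):
    """(D a)(j) = (j + 1) a(j + 1); the top coefficient is not determined mod X^N, so it is set to 0."""
    return [(j + 1) * a[j + 1] for j in range(N - 1)] + [Fraction(0)]

def E(F):
    """Formal exponential (13.8.1), sum of F^n / n!, for F(0) = 0."""
    assert F[0] == 0
    result, power, factorial = [Fraction(0)] * N, series(1), 1
    for n in range(N):  # F^n vanishes mod X^N once n >= N
        result = [r + p / factorial for r, p in zip(result, power)]
        power = mul(power, F)
        factorial *= n + 1
    return result

def L(G):
    """Formal logarithm (13.7.1) of G = 1 + F: sum of (-1)^(n+1) F^n / n."""
    assert G[0] == 1
    F = [Fraction(0)] + G[1:]
    result, power = [Fraction(0)] * N, F
    for n in range(1, N):
        result = [r + Fraction((-1) ** (n + 1), n) * p for r, p in zip(result, power)]
        power = mul(power, F)
    return result

X = series(0, 1)
F = series(0, 1, 0, -2, 5)  # hypothetical F = X - 2X^3 + 5X^4, with F(0) = 0
G = series(1, 3, -1)        # hypothetical G = 1 + 3X - X^2, with G(0) = 1

print(E(X)[:5])  # coefficients 1, 1, 1/2, 1/6, 1/24
print(deriv(E(F))[:N - 1] == mul(E(F), deriv(F))[:N - 1])  # (13.8.2): True
print(L(E(F)) == F)  # Theorem 13.8.2: True
print(E(L(G)) == G)  # Theorem 13.8.3: True
```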
Theorem 13.8.3. For all 𝐺 ∈ ℂℕ 1, ℰ(ℒ(𝐺)) = 𝐺. Proof. By Theorem 13.8.2, ℒ(ℰ(ℒ(𝐺))) = ℒ(𝐺), and so ℰ(ℒ(𝐺)) = 𝐺 by Theorem 13.7.5. □ The preceding exposition has established formal power series analogues of most of the combinatorial generating functions encountered in previous chapters, the sole exception being the infinite products arising in Chapter 8 as generating functions for integer partitions. Interested readers will find an account of formal infinite products in Niven (1969).
References
[1] E. Cashwell and C. Everett (1959): The ring of number-theoretic functions, Pacific Journal of Mathematics 9, 975–985.
[2] H. Gould (1974): Coefficient identities for powers of Taylor and Dirichlet series, American Mathematical Monthly 81, 3–14.
[3] I. Niven (1969): Formal power series, American Mathematical Monthly 76, 871–889. MR252386
[4] R. Rosenthal (1953): On functions with infinitely many derivatives, Proceedings of the American Mathematical Society 4, 600–602.
[5] R. Sedgewick and P. Flajolet (2013): An Introduction to the Analysis of Algorithms, Addison-Wesley.
[6] J. Shoenfield (1969): Remark made during the author’s defense of a dissertation (directed by Leonard Carlitz) on 𝑝-adic analysis.
[7] W. Wade (2005): An Introduction to Analysis, Prentice-Hall. MR911681
[8] S. Warner (1971): Classical Modern Algebra, Prentice-Hall. MR0267998
Exercises
13.1. Prove Theorem 13.2.3.
13.2. Prove Theorems 13.3.2 and 13.3.3.
13.3. Prove Theorem 13.3.4.
13.4. Prove Theorem 13.4.4.
13.5. Prove the converse of Corollary 13.5.2: If the double sequence (𝐹𝑖,𝑗)𝑖,𝑗≥0 is summable, then it is finitary.
13.6. Suppose that the sequence (𝐻𝑖)𝑖≥0 is summable, and ord(𝐻𝑖) > 𝑟 for all 𝑖. Prove that ord(∑𝑖≥0 𝐻𝑖) > 𝑟.
13.7. Prove Theorem 13.6.1.
13.8. Prove Theorem 13.6.2.
13.9. Suppose that 𝐹(0) = 𝐹(1) = 1, and 𝐹(𝑛) = 𝐹(𝑛 − 1) + 𝐹(𝑛 − 2) for all 𝑛 ≥ 2. Derive the results about the Fibonacci numbers introduced in Chapter 1 using formal power series. Specifically, prove that $F = (\mathbf{1} - X - X^2)^{-1}$, and, expanding this expression by Theorem 13.7.6, rederive the result of Exercise 1.4, namely, $F(n) = \sum_{k=0}^{n}\binom{n-k}{k}$.
13.10. Prove the formal power series analogue of Theorem 4.3.5, namely,
$$\sum_{n\ge0}P_nX^n/n! = (\mathbf{1} - (\mathcal{E}(X) - \mathbf{1}))^{-1},$$
justifying the summation interchange in the proof.
13.11. Prove the formal power series analogue of Theorem 6.1.3, namely,
$$\sum_{n\ge0}B_nX^n/n! = \mathcal{E}(\mathcal{E}(X) - \mathbf{1}),$$
justifying the summation interchange in the proof.
Projects 13.A (For students of algebra) The ring of number theoretic functions, introduced above in section 8.1, turns out to be isomorphic to the ring of formal power series over ℂ in a countably infinite number of indeterminates. Cashwell and Everett (The ring of number-theoretic functions, Pacific Journal of Mathematics 9 (1959), 975–985) prove that the latter ring, and hence the former, is a unique factorization domain. Compose a detailed exposition of their proof.
13.B (For students of analysis) An old theorem of Borel states that, given an arbitrary sequence (𝑎𝑛)𝑛≥0 in ℝ, there exists a function 𝑓 that is infinitely differentiable on the open interval (−1, 1) such that, for all 𝑛, 𝑎𝑛 = 𝑓⁽ⁿ⁾(0)/𝑛! (the power series ∑𝑛≥0 𝑎𝑛𝑥ⁿ may, of course, converge only for 𝑥 = 0). Read the proof of this theorem in R. Rosenthal (1953): On functions with infinitely many derivatives, Proceedings of the American Mathematical Society 4, 600–602, and compose a detailed exposition of Rosenthal’s proof. Can this theorem be used to finesse convergence questions for combinatorial generating functions? See the paper of H. Gould (1974): Coefficient identities for powers of Taylor and Dirichlet series, American Mathematical Monthly 81, 3–14.
13.C In section 11.9 we discussed the distribution polynomial $\sum_{\delta\in\Delta}q^{s(\delta)}$ of a statistic 𝑠 ∶ Δ → ℕ on a finite set of discrete structures Δ. This idea can be extended to the case in which Δ is any set of combinatorial objects (including symbols from a formal language), even when Δ is infinite (as long as $s^{-1}(\{n\})$ is finite for all 𝑛 ∈ ℕ), by regarding
$\sum_{\delta\in\Delta}q^{s(\delta)} = \sum_{n\ge0}|s^{-1}(\{n\})|\,q^n$ as a formal power series in the indeterminate 𝑞. Moreover, if Γ is another set of combinatorial objects equipped with a statistic 𝑢, then the product and sum of the formal power series $\sum_{\delta\in\Delta}q^{s(\delta)}$ and $\sum_{\gamma\in\Gamma}q^{u(\gamma)}$ are connected in an interesting way with the Cartesian product Δ × Γ and with the union of disjoint copies of Δ and Γ. Such constructions turn out to be useful in identifying functional equations satisfied by certain ordinary generating functions. Read section 3.9 (“The symbolic method”) of the text An Introduction to the Analysis of Algorithms, by R. Sedgewick and P. Flajolet (Addison-Wesley 2013), which elaborates on this idea, and prepare a class presentation of this material, with some illustrative applications.
Chapter 14
Incidence algebra: The grand unified theory of enumerative combinatorics
We are at last in a position to consider what is arguably the most attractive unification to date of a substantial part of enumerative combinatorics, namely, the theory of incidence algebras of a locally finite poset. Introduced by Gian-Carlo Rota (1964) in the seminal paper, On the foundations of combinatorial theory (I): Theory of Möbius functions, and further developed in a series of foundational papers by his students and other mathematicians attracted to the elegance of that theory (as well as Rota’s charismatic personality and crystalline prose), this theory has come to occupy a position of central importance in enumerative combinatorics. Of particular note is the paper, On the foundations of combinatorial theory (VI): The idea of generating function, by Doubilet, Rota, and Stanley (1972), which introduced the theory of binomial posets and their factorial functions, thereby providing a satisfying explanation of why various types of combinatorial problems are inevitably associated with particular types of generating functions.
14.1. The incidence algebra of a locally finite poset
If 𝑥 ≤ 𝑦 in the poset 𝑃, the set [𝑥, 𝑦] ∶= {𝑧 ∈ 𝑃 ∶ 𝑥 ≤ 𝑧 ≤ 𝑦} is called the closed interval from x to y, and the set (𝑥, 𝑦) ∶= {𝑧 ∈ 𝑃 ∶ 𝑥 < 𝑧 < 𝑦} the open interval from x to y. A poset is said to be locally finite if all of its closed (equivalently, open) intervals are finite. In what follows, all posets are assumed to be locally finite, and Int(𝑃) denotes the set of all closed intervals of 𝑃. Consider the set $\mathbb{C}^{\mathrm{Int}(P)} = \{f : \mathrm{Int}(P) \to \mathbb{C}\}$, equipped with pointwise addition (denoted by +) and scalar multiplication (denoted by ., or simply by concatenation)
14. Incidence Algebra
and product ∗ defined by
$$f*g(x,y) := \sum_{x\le z\le y}f(x,z)g(z,y), \tag{14.1.1}$$
where we abuse notation by writing 𝑓(𝑥, 𝑦) instead of the correct, but cumbersome, notation 𝑓([𝑥, 𝑦]).
Theorem 14.1.1. The structure $(\mathbb{C}^{\mathrm{Int}(P)}, +, *, .)$ is a ℂ-algebra, called the incidence algebra of P, and is denoted by 𝐼𝐴(𝑃). The multiplicative identity 𝟏 of 𝐼𝐴(𝑃) is defined by
$$\mathbf{1}(x,y) = \delta_{x,y}, \text{ i.e., } \mathbf{1}(x,y) = 1 \text{ if } x = y, \text{ and } \mathbf{1}(x,y) = 0 \text{ if } x < y, \tag{14.1.2}$$
and a function 𝑓 has a multiplicative inverse if and only if, for all 𝑥 ∈ 𝑃, 𝑓(𝑥, 𝑥) ≠ 0.
Proof. The associativity of ∗ follows from the computation
$$\begin{aligned}f*(g*h)(x,y) &= \sum_{x\le z\le y}f(x,z)(g*h)(z,y) = \sum_{x\le z\le y}f(x,z)\sum_{z\le w\le y}g(z,w)h(w,y)\\ &= \sum_{x\le z\le w\le y}f(x,z)g(z,w)h(w,y) = \sum_{x\le w\le y}\Big(\sum_{x\le z\le w}f(x,z)g(z,w)\Big)h(w,y)\\ &= \sum_{x\le w\le y}(f*g)(x,w)h(w,y) = (f*g)*h(x,y).\end{aligned}$$
It follows from the generalized associativity principle (Warner (1965, Theorem 18.5)) that, for every 𝑘 ∈ ℙ, 𝑓1 ∗ 𝑓2 ∗ ⋯ ∗ 𝑓𝑘 is unambiguous, with
$$f_1*f_2*\cdots*f_k(x,y) = \sum_{x\le z_1\le z_2\le\cdots\le z_{k-1}\le y}f_1(x,z_1)f_2(z_1,z_2)\cdots f_k(z_{k-1},y). \tag{14.1.3}$$
That 𝟏 is the multiplicative identity of 𝐼𝐴(𝑃) follows from the computations
$$(f*\mathbf{1})(x,y) = \sum_{x\le z\le y}f(x,z)\mathbf{1}(z,y) = f(x,y)\mathbf{1}(y,y) = f(x,y)$$
and
$$(\mathbf{1}*f)(x,y) = \sum_{x\le z\le y}\mathbf{1}(x,z)f(z,y) = \mathbf{1}(x,x)f(x,y) = f(x,y).$$
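The product (14.1.1), the identity (14.1.2), and the invertibility criterion can be exercised on a small example. The sketch below assumes the hypothetical poset of divisors of 12 ordered by divisibility and stores each function on Int(𝑃) as a dictionary keyed by (𝑥, 𝑦). The helper `left_inverse` solves 𝑔 ∗ 𝑓 = 𝟏 one interval at a time, in the spirit of the inductive construction in part (𝑖𝑖) of this proof; applied to the function 𝜁 that is identically 1, it should reproduce the classical Möbius values 𝜇(𝑥, 𝑦) = 𝜇(𝑦/𝑥) of the divisor lattice.

```python
from fractions import Fraction

# Hypothetical example poset: the divisors of 12, ordered by divisibility.
P = [d for d in range(1, 13) if 12 % d == 0]

def leq(x, y):
    return y % x == 0

intervals = [(x, y) for x in P for y in P if leq(x, y)]

def interval(x, y):
    """The closed interval [x, y] = {z : x <= z <= y}."""
    return [z for z in P if leq(x, z) and leq(z, y)]

def conv(f, g):
    """The product (14.1.1): (f * g)(x, y) = sum over x <= z <= y of f(x, z) g(z, y)."""
    return {(x, y): sum(f[x, z] * g[z, y] for z in interval(x, y))
            for (x, y) in intervals}

delta = {(x, y): int(x == y) for (x, y) in intervals}  # the identity (14.1.2)
zeta = {(x, y): 1 for (x, y) in intervals}

def left_inverse(f):
    """Solve g * f = delta one interval at a time; needs f(x, x) != 0 for all x."""
    g = {}
    # Smaller intervals first, so g(x, z) is known for every z in [x, y) when needed.
    for (x, y) in sorted(intervals, key=lambda xy: len(interval(*xy))):
        if x == y:
            g[x, y] = Fraction(1) / f[y, y]
        else:
            s = sum(g[x, z] * f[z, y] for z in interval(x, y) if z != y)
            g[x, y] = -s / f[y, y]
    return g

mu = left_inverse(zeta)
print(conv(mu, zeta) == delta, conv(zeta, mu) == delta)  # True True
print(mu[1, 2], mu[1, 6], mu[1, 12])                     # -1 1 0

# Associativity and identity checks on arbitrary (hypothetical) functions.
f = {(x, y): x + y for (x, y) in intervals}
g = {(x, y): y // x for (x, y) in intervals}
h = {(x, y): x * y - 1 for (x, y) in intervals}
print(conv(conv(f, g), h) == conv(f, conv(g, h)))  # True
print(conv(f, delta) == f == conv(delta, f))       # True
```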
We conclude by establishing the asserted necessary and sufficient conditions for 𝑓 to have a multiplicative inverse. (𝑖) Suppose that 𝑔 is the multiplicative inverse of 𝑓. In particular, 𝑔 ∗ 𝑓 = 𝟏, and so, for all 𝑥 ∈ 𝑃, 𝑔 ∗ 𝑓(𝑥, 𝑥) = 𝑔(𝑥, 𝑥)𝑓(𝑥, 𝑥) = 𝟏(𝑥, 𝑥) = 1, and so 𝑓(𝑥, 𝑥) ≠ 0. (𝑖𝑖) Suppose that, for all 𝑥 ∈ 𝑃, 𝑓(𝑥, 𝑥) ≠ 0. Define 𝑔 and ℎ inductively on the cardinality of [𝑥, 𝑦] by 𝑔(𝑥, 𝑥) = ℎ(𝑥, 𝑥) = 1/𝑓(𝑥, 𝑥) for all 𝑥 ∈ 𝑃, with (14.1.4) −1 −1 ∑ 𝑔(𝑥, 𝑧)𝑓(𝑧, 𝑦) and ℎ(𝑥, 𝑦) = ∑ 𝑓(𝑥, 𝑧)ℎ(𝑧, 𝑦). 𝑔(𝑥, 𝑦) = 𝑓(𝑦, 𝑦) 𝑥≤𝑧 0, |𝐹𝑛 𝑥𝑛 | ≤ |(2𝑥)𝑛 | for all real 𝑥. But the geometric series ∑𝑛≥0 (2𝑥)𝑛 has radius of convergence 𝑅 = 1/2, and so ∑𝑛≥0 |(2𝑥)𝑛 | converges for |𝑥| < 1/2. So by Theorem A.1.3, ∑𝑛≥0 |𝐹𝑛 𝑥𝑛 |, and hence ∑𝑛≥0 𝐹𝑛 𝑥𝑛 , converges for |𝑥| < 1/2. (In fact, the radius of convergence of ∑𝑛≥0 𝐹𝑛 𝑥𝑛 is equal to (√5 − 1)/2 ≐ 0.618, which follows from the closed form (1.2.10) for 𝐹𝑛 . But we only needed to confirm that ∑𝑛≥0 𝐹𝑛 𝑥𝑛 converges on some open interval containing 0 in advance of deriving that closed form.) Application A.2. Analytic justification of Theorem 1.4.1. Suppose that 𝐼 is a nonempty set of nonnegative integers. If 𝐼 is finite and 𝑘 ≥ 0, the fact that (A.2.1)
$$\Big(\sum_{i\in I}x^i\Big)^k = \sum_{n\ge0}C(n,k;I)\,x^n$$
follows simply from the rule for multiplying polynomials, the right-hand sum being finitely nonzero. Suppose, however, that 𝐼 is infinite. We need to find some 𝑟 > 0 such that $\sum_{i\in I}x^i$ converges absolutely for |𝑥| < 𝑟. Write this sum in the form $\sum_{i\ge0}a_ix^i$, where $a_i = 1$ if $i \in I$, and $a_i = 0$ otherwise. Since $|a_ix^i| \le |x^i|$ for all 𝑖, and the geometric series $\sum_{i\ge0}x^i$ converges absolutely for |𝑥| < 1, the desired result follows from Theorem A.1.3 and the aforementioned extension of Theorem A.2.2.
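For finite 𝐼, (A.2.1) is just polynomial multiplication, which the following sketch carries out modulo x^N (an illustration, not from the text). Since elements of 𝐼 that are at least N cannot affect the coefficients of x⁰, …, x^(N−1), the same routine applies to a truncation of an infinite 𝐼; for 𝐼 = ℙ it should reproduce the familiar count comb(𝑛 − 1, 𝑘 − 1) of compositions of 𝑛 with 𝑘 parts. The function name `composition_counts` is hypothetical.

```python
from math import comb

def composition_counts(I, k, N):
    """C(n, k; I) for 0 <= n < N: the coefficients of (sum_{i in I} x^i)^k mod x^N.
    Elements of I that are >= N cannot affect these coefficients, so they are ignored,
    which also lets I be a finite truncation of an infinite set."""
    base = [0] * N
    for i in I:
        if i < N:
            base[i] = 1
    result = [1] + [0] * (N - 1)  # (sum x^i)^0 = 1
    for _ in range(k):
        new = [0] * N
        for a, ca in enumerate(result):
            if ca:
                for b in range(N - a):
                    new[a + b] += ca * base[b]
        result = new
    return result

print(composition_counts({1, 2}, 3, 8))  # [0, 0, 0, 1, 3, 3, 1, 0]

# I = the positive integers, truncated: compositions of n into k positive parts
# are counted by comb(n - 1, k - 1).
N = 12
counts = composition_counts(range(1, N), 4, N)
print(all(counts[n] == comb(n - 1, 3) for n in range(1, N)))  # True
```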
A.3. Double sequences and series
We begin with a cautionary example. Consider the infinite double sequence
$$\begin{array}{rrrrrc} 1 & -1 & 0 & 0 & 0 & \cdots\\ 0 & 1 & -1 & 0 & 0 & \cdots\\ 0 & 0 & 1 & -1 & 0 & \cdots\\ 0 & 0 & 0 & 1 & -1 & \cdots\\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{array} \tag{A.3.1}$$
Each row sum converges, and the sum of the row sums converges to 0. Each column sum converges, and the sum of the column sums converges to 1. So the sum of the row sums is not equal to the sum of the column sums. This example shows that one must proceed with caution when interchanging double (and, of course, higher-dimensional) infinite sums, even when, as above, the rows and columns of an infinite double sequence are finitely nonzero. The following theorem states a condition that guarantees that the order of summation is immaterial.
Theorem A.3.1. Suppose that (𝛼𝑛,𝑘)𝑛,𝑘≥0 is an infinite double sequence of real numbers, construed as an infinite rectangular array, with 𝑛 designating the row and 𝑘 designating the column occupied by 𝛼𝑛,𝑘. If, for all 𝑘 ≥ 0, ∑𝑛≥0 𝛼𝑛,𝑘 converges absolutely, with ∑𝑛≥0 |𝛼𝑛,𝑘| = 𝐶𝑘, and ∑𝑘≥0 𝐶𝑘 converges, then, for all 𝑛 ≥ 0, ∑𝑘≥0 𝛼𝑛,𝑘 converges absolutely, ∑𝑛≥0 (∑𝑘≥0 𝛼𝑛,𝑘) and ∑𝑘≥0 (∑𝑛≥0 𝛼𝑛,𝑘) converge, and ∑𝑛≥0 (∑𝑘≥0 𝛼𝑛,𝑘) = ∑𝑘≥0 (∑𝑛≥0 𝛼𝑛,𝑘).
In combinatorial applications of the preceding theorem, it is typically the case that $\alpha_{n,k} = a_{n,k}x^n$ and that $a_{n,k}$ is a nonnegative integer that denotes the cardinality of some set of discrete structures indexed on the pair (𝑛, 𝑘). Moreover, in all of the examples that we consider, it will be the case for each 𝑛 ≥ 0 that the sequence (𝑎𝑛,𝑘)𝑘≥0 is finitely nonzero, usually because 𝑎𝑛,𝑘 = 0 if 𝑘 > 𝑛. With 𝑠𝑛 ∶= ∑𝑘≥0 𝑎𝑛,𝑘, the goal is then to show that the infinite series $\sum_{n\ge0}s_nx^n$, or sometimes $\sum_{n\ge0}s_nx^n/n!$, has a positive radius of convergence, and to identify the function 𝐹(𝑥) represented by this series in reasonably simple terms. This is typically accomplished via the following steps.
(1) Identify some 𝑟 > 0 (with 𝑟 = +∞ a possibility) such that, for all 𝑘 ≥ 0,
$$\sum_{n\ge0}|a_{n,k}x^n| = \sum_{n\ge0}a_{n,k}|x|^n$$
converges on the interval (−𝑟, 𝑟), with $C_k(x) := \sum_{n\ge0}a_{n,k}|x|^n$ and $c_k(x) := \sum_{n\ge0}a_{n,k}x^n$.
(2) Identify some 𝜌 > 0, with 𝜌 ≤ 𝑟, such that $\sum_{k\ge0}C_k(x)$, and hence $\sum_{k\ge0}c_k(x)$, converges for all 𝑥 ∈ (−𝜌, 𝜌).
A. Analysis review
(3) Apply Theorem A.3.1 to conclude that
$$\sum_{n\ge0}s_nx^n = \sum_{n\ge0}\Big(\sum_{k\ge0}a_{n,k}\Big)x^n = \sum_{k\ge0}\Big(\sum_{n\ge0}a_{n,k}x^n\Big) = \sum_{k\ge0}c_k(x)$$
for all 𝑥 ∈ (−𝜌, 𝜌). Identify, if possible, a reasonably simple function 𝐹 such that $\sum_{k\ge0}c_k(x) = F(x)$ on (−𝜌, 𝜌).
Remark. Note that while the row sums, being finitely nonzero, trivially converge absolutely for every 𝑥, we establish the absolute convergence of the column sums in the first step. This is because it is typically difficult to establish the convergence of the sum of the row sums directly and, indeed, such convergence is confirmed only in step (3) above.
Application A.3. Analytic justification of Theorem 1.5.1 (The fundamental theorem of composition enumeration). Consider the array $(C(n,k;I)x^n)_{n,k\ge0}$. It has already been established (see Application A.2) for each 𝑘 ≥ 0 that $\sum_{n\ge0}C(n,k;I)x^n = \big(\sum_{i\in I}x^i\big)^k$ on the interval (−1, 1). It follows immediately that
$$\sum_{n\ge0}|C(n,k;I)x^n| = \sum_{n\ge0}C(n,k;I)|x|^n = \Big(\sum_{i\in I}|x|^i\Big)^k$$
converges on (−1, 1). Since 0 ∉ 𝐼,
$$\sum_{i\in I}|x|^i \le \sum_{i\ge1}|x|^i = \frac{|x|}{1-|x|} < 1 \quad \text{if } x \in J := (-1/2, 1/2). \tag{A.3.2}$$
So if 𝑥 ∈ 𝐽, then $\sum_{k\ge0}\big(\sum_{n\ge0}|C(n,k;I)x^n|\big) = \sum_{k\ge0}\big(\sum_{i\in I}|x|^i\big)^k$, a geometric series that converges by (A.3.2), and so $\sum_{n\ge0}C(n;I)x^n = \sum_{k\ge0}\big(\sum_{i\in I}x^i\big)^k$ by Theorem A.3.1. To conclude that $\sum_{k\ge0}\big(\sum_{i\in I}x^i\big)^k = \dfrac{1}{1-\sum_{i\in I}x^i}$ for 𝑥 ∈ 𝐽, it suffices by (A.3.2) to note that $\big|\sum_{i\in I}x^i\big| \le \sum_{i\in I}|x|^i$ by Theorem A.1.1(𝑖𝑖).
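The conclusion of Application A.3 can be checked numerically for particular 𝐼 and 𝑥 ∈ 𝐽. The sketch below uses hypothetical examples and floating-point arithmetic (so the comparison uses a relative tolerance); it computes 𝐶(𝑛; 𝐼) by classifying compositions according to their first part, sums the first 200 terms of ∑𝑛≥0 𝐶(𝑛; 𝐼)𝑥ⁿ, and compares the result with 1/(1 − ∑𝑖∈𝐼 𝑥ⁱ).

```python
import math

def total_compositions(I, N):
    """C(n; I) for 0 <= n < N, where I is a set of positive integers (0 not in I).
    Classifying compositions of n >= 1 by their first part i gives
    C(n; I) = sum of C(n - i; I) over i in I with i <= n, and C(0; I) = 1."""
    C = [1] + [0] * (N - 1)
    for n in range(1, N):
        C[n] = sum(C[n - i] for i in I if i <= n)
    return C

def check(I, x, N=200):
    """Return a partial sum of sum_n C(n; I) x^n and the closed form 1 / (1 - sum_{i in I} x^i)."""
    C = total_compositions(I, N)
    partial = sum(c * x**n for n, c in enumerate(C))
    closed_form = 1 / (1 - sum(x**i for i in I))
    return partial, closed_form

# Hypothetical examples with 0 not in I and x in J = (-1/2, 1/2).
examples = [({1, 2}, 0.3), ({1, 3, 5}, -0.4), (set(range(1, 50)), 0.4)]
for I, x in examples:
    partial, closed_form = check(I, x)
    print(math.isclose(partial, closed_form, rel_tol=1e-9))  # True for each example
```

For 𝐼 = {1, 2} the numbers 𝐶(𝑛; 𝐼) are 1, 1, 2, 3, 5, …, in line with the Fibonacci numbers of Chapter 1.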
Appendix B
Topology review
B.1. Topological spaces and their bases
If 𝑆 is a nonempty set, a topology on 𝑆 is a family 𝒯 of subsets (called open sets) of 𝑆 such that (𝑖) 𝒯 is closed under arbitrary unions (so, by our convention on empty unions, it is always the case that ∅ ∈ 𝒯), and (𝑖𝑖) 𝒯 is closed under finite intersections (so, by our convention on empty intersections, it is always the case that 𝑆 ∈ 𝒯). If 𝒯 is a topology on 𝑆, the pair (𝑆, 𝒯) is called a topological space. The family 𝒯 = {∅, 𝑆} is a topology on 𝑆, called the indiscrete topology on S, and the family 𝒯 = $2^S$ of all subsets of 𝑆 is a topology on 𝑆, called the discrete topology on S. A sequence (𝑥𝑛)𝑛≥0 in 𝑆 is said to converge to 𝑥 ∈ 𝑆 with respect to the topology 𝒯 (symbolized by 𝑥𝑛 → 𝑥) if, for all 𝐺 ∈ 𝒯 with 𝑥 ∈ 𝐺, there exists an 𝑁𝐺 ∈ ℕ such that 𝑛 > 𝑁𝐺 ⇒ 𝑥𝑛 ∈ 𝐺 (i.e., 𝑥𝑛 is eventually in every open set that contains x). Every sequence in 𝑆 converges to every member of 𝑆 with respect to the indiscrete topology. A sequence (𝑥𝑛)𝑛≥0 in 𝑆 converges to 𝑥 ∈ 𝑆 with respect to the discrete topology if and only if there exists an 𝑁 ∈ ℕ such that 𝑛 > 𝑁 ⇒ 𝑥𝑛 = 𝑥 (i.e., 𝑥𝑛 is eventually equal to x). A family ℬ ⊆ 𝒯 is a base for the topology 𝒯 if every 𝐺 ∈ 𝒯 is equal to a union of members of ℬ. The following theorem characterizes those families ℬ ⊆ $2^S$ that constitute a base for some topology on 𝑆.
Theorem B.1.1. If ℬ ⊆ $2^S$, the family of all unions of members of ℬ is a topology on 𝑆 if and only if (𝑖) ∪ℬ = 𝑆, and (𝑖𝑖) for all 𝐵1, 𝐵2 ∈ ℬ, 𝐵1 ∩ 𝐵2 is equal to a union of members of ℬ.
Proof. Necessity. Obvious. Sufficiency. (𝑎) For any ℬ ⊆ $2^S$ satisfying (𝑖), the family of all unions of members of ℬ is clearly closed under arbitrary unions, and contains both ∅ and 𝑆. (𝑏) Suppose that ℬ1, ℬ2 ⊆ ℬ. Then (∪ℬ1) ∩ (∪ℬ2) = ∪(𝐵1 ∩ 𝐵2), taken over all (𝐵1, 𝐵2) ∈ ℬ1 × ℬ2. By (𝑖𝑖) each set 𝐵1 ∩ 𝐵2 is equal to a union of members of ℬ, and so (∪ℬ1) ∩ (∪ℬ2) is a union of members of ℬ. □
B. Topology review
Suppose that (𝑆, 𝒯) is a topological space, 𝑈 ⊆ 𝑆, and 𝑎 ∈ 𝑆. We call 𝑎 an accumulation point of U if every open set containing 𝑎 contains at least one point of 𝑈 that is distinct from 𝑎. The set of all accumulation points of 𝑈, denoted by 𝑈′, is called the derived set of U, and the set $\overline{U} := U \cup U'$ is called the closure of U.
B.2. Metric topologies A mapping 𝑑 ∶ 𝑆 × 𝑆 → [0, ∞) is called a metric on S if it satisfies the following properties. (i) 𝑑(𝑥, 𝑦) = 0 if and only if 𝑥 = 𝑦. (ii) For all 𝑥, 𝑦 ∈ 𝑆, 𝑑(𝑥, 𝑦) = 𝑑(𝑦, 𝑥). (iii) For all 𝑥, 𝑦, 𝑧 ∈ 𝑆, 𝑑(𝑥, 𝑧) ≤ 𝑑(𝑥, 𝑦) + 𝑑(𝑦, 𝑧). Property (iii) is, for obvious reasons, known as the triangle inequality. A pair (𝑆, 𝑑), with 𝑑 a metric on 𝑆, is called a metric space. If (𝑆, 𝑑) is a metric space, 𝑥 ∈ 𝑆, and 𝑟 > 0, the open ball of radius r about x is the set 𝐵(𝑥; 𝑟) ∶= {𝑦 ∈ 𝑆 ∶ 𝑑(𝑥, 𝑦) < 𝑟}. Theorem B.2.1. If 𝑑 is a metric on 𝑆, the family of open balls {𝐵(𝑥; 𝑟) ∶ 𝑥 ∈ 𝑆 and 𝑟 > 0} is a base for a topology, denoted by 𝒯𝑑 , on 𝑆. Proof. It is clear that the union of all open balls determined by 𝑑 is equal to 𝑆. Suppose that 𝐵(𝑥1 ; 𝑟1 ) and 𝐵(𝑥2 ; 𝑟2 ) are open balls. To establish criterion (𝑖𝑖) of Theorem B.1.1, it suffices to show that, for every 𝑥0 ∈ 𝐵(𝑥1 ; 𝑟1 ) ∩ 𝐵(𝑥2 ; 𝑟2 ), there exists an 𝑟 > 0 such that 𝐵(𝑥0 ; 𝑟) ⊆ 𝐵(𝑥1 ; 𝑟1 ) ∩ 𝐵(𝑥2 ; 𝑟2 ). We may simply take 𝑟 = min{𝑟1 − 𝑑(𝑥0 , 𝑥1 ), 𝑟2 − 𝑑(𝑥0 , 𝑥2 )}. For if 𝑥 ∈ 𝐵(𝑥0 ; 𝑟), then 𝑑(𝑥, 𝑥1 ) ≤ 𝑑(𝑥, 𝑥0 ) + 𝑑(𝑥0 , 𝑥1 ) < 𝑟1 and 𝑑(𝑥, 𝑥2 ) ≤ 𝑑(𝑥, 𝑥0 ) + 𝑑(𝑥0 , 𝑥2 ) < 𝑟2 . So 𝑥 ∈ 𝐵(𝑥1 ; 𝑟1 ) ∩ 𝐵(𝑥2 ; 𝑟2 ). □ The topology 𝒯𝑑 is called the topology on S induced by d. Such topologies are known generically as metric topologies, and a topology 𝒯 on 𝑆 is called metrizable if there exists a metric 𝑑 on 𝑆 such that 𝒯 = 𝒯𝑑 . From Theorem B.2.1 we get the following characterization of the open sets of a metric topology on 𝑆: A set 𝐺 ⊆ 𝑆 is open if and only if, for all 𝑥 ∈ 𝐺, there exists an 𝑟 > 0 and an open ball 𝐵(𝑥; 𝑟) such that 𝐵(𝑥; 𝑟) ⊆ 𝐺. This characterization will be familiar to students of analysis, who will have encountered it as a definition of open sets in Euclidean spaces. 
If (𝑆, 𝑑) is a metric space and (𝑠𝑛)𝑛≥0 is a sequence in 𝑆, we say that this sequence converges to 𝑠 ∈ 𝑆 (symbolized 𝑠𝑛 → 𝑠) if, for all 𝜀 > 0, there exists an 𝑛𝜀 ∈ ℕ such that 𝑛 > 𝑛𝜀 ⇒ 𝑑(𝑠𝑛, 𝑠) < 𝜀 (equivalently, 𝑠𝑛 ∈ 𝐵(𝑠; 𝜀)). In view of the above discussion, this formulation of convergence coincides with the definition of convergence introduced earlier in arbitrary topological spaces, applied to the topology 𝒯𝑑. Theorem B.2.2. Suppose that (𝑆, 𝑑) is a metric space and that 𝑈 ⊆ 𝑆. If $a \in \overline{U}$, there is a sequence (𝑠𝑛)𝑛≥0 in 𝑈 such that 𝑠𝑛 → 𝑎.
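Proof. Since $a \in \overline{U} = U \cup U'$, it is the case that, for every 𝑛 ∈ ℕ, the open ball 𝐵(𝑎; 1/(𝑛 + 1)) contains a member of 𝑈 (namely 𝑎 itself if 𝑎 ∈ 𝑈, and a point of 𝑈 distinct from 𝑎 if 𝑎 ∈ 𝑈′). If 𝑠𝑛 is any such member, then 𝑑(𝑠𝑛, 𝑎) < 1/(𝑛 + 1), and so clearly 𝑠𝑛 → 𝑎. □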
B.3. Separation axioms A topological space (𝑆, 𝒯) is called a 𝑇0 -space if, for all distinct 𝑥, 𝑦 ∈ 𝑆, there is an open set 𝐺𝑥 ∈ 𝒯 such that 𝑥 ∈ 𝐺𝑥 and 𝑦 ∉ 𝐺𝑥 or there is an open set 𝐺𝑦 ∈ 𝒯 such that 𝑦 ∈ 𝐺𝑦 and 𝑥 ∉ 𝐺𝑦 . (𝑆, 𝒯) is called a 𝑇1 -space if, for all distinct 𝑥, 𝑦 ∈ 𝑆, there is an open set 𝐺𝑥 ∈ 𝒯 such that 𝑥 ∈ 𝐺𝑥 and 𝑦 ∉ 𝐺𝑥 and there is an open set 𝐺𝑦 ∈ 𝒯 such that 𝑦 ∈ 𝐺𝑦 and 𝑥 ∉ 𝐺𝑦 . (𝑆, 𝒯) is called a 𝑇2 -space (or Hausdorff space) if, for all distinct 𝑥, 𝑦 ∈ 𝑆, there exist disjoint open sets 𝐺𝑥 , 𝐺𝑦 ∈ 𝒯 such that 𝑥 ∈ 𝐺𝑥 and 𝑦 ∈ 𝐺𝑦 . It is easy to show that every metric space is a Hausdorff space.
B.4. Product topologies Suppose that (𝑌𝑥 , 𝒯𝑥 )𝑥∈𝑋 is a family of topological spaces indexed on the set 𝑋. The Cartesian product ∏𝑥∈𝑋 𝑌𝑥 is formally defined as the set of functions 𝑓 ∶ 𝑋 → ⋃𝑥∈𝑋 𝑌𝑥 for which 𝑓(𝑥) ∈ 𝑌𝑥 for all 𝑥 ∈ 𝑋. Members of ∏𝑥∈𝑋 𝑌𝑥 are commonly denoted by 𝑋-indexed families {𝑦𝑥 }𝑥∈𝑋 , where 𝑦𝑥 = 𝑓(𝑥). In Theorem B.4.1 we exhibit a base for a topology on ∏𝑥∈𝑋 𝑌𝑥 . This topology is called the product topology (or weak topology) on ∏𝑥∈𝑋 𝑌𝑥 . Theorem B.4.1. Let (𝑌𝑥 , 𝒯𝑥 )𝑥∈𝑋 be a family of topological spaces indexed on the set 𝑋. A base for a topology on ∏𝑥∈𝑋 𝑌𝑥 is given by the family of all subsets of ∏𝑥∈𝑋 𝑌𝑥 having the form ∏𝑥∈𝑋 𝐺𝑥 , with each 𝐺𝑥 ∈ 𝒯𝑥 and 𝐺𝑥 = 𝑌𝑥 for all but finitely many 𝑥 ∈ 𝑋. Proof. It is clear that the union of the aforementioned family is equal to ∏𝑥∈𝑋 𝑌𝑥 , so suppose that ∏𝑥∈𝑋 𝐺𝑥 and ∏𝑥∈𝑋 𝐺𝑥′ are members of that family. Then ∏𝑥∈𝑋 𝐺𝑥 ∩ ∏𝑥∈𝑋 𝐺𝑥′ = ∏𝑥∈𝑋 (𝐺𝑥 ∩ 𝐺𝑥′ ) is itself a member of that family, and hence trivially equal to a union of members of that family. □
B.5. The topology of pointwise convergence
Suppose, as in the preceding section, that (𝑌𝑥, 𝒯𝑥)𝑥∈𝑋 is a family of topological spaces indexed on the set 𝑋. If 𝑌𝑥 = 𝑌 and 𝒯𝑥 = 𝒯 for all 𝑥 ∈ 𝑋, then the product topology on $\prod_{x\in X}Y = Y^X$ is frequently called the topology of pointwise convergence, terminology explained by Theorem B.5.1. Theorem B.5.1. Let (𝑌, 𝒯) be a topological space, and let 𝒯∗ be the product topology on $Y^X$. A sequence of functions (𝑓𝑛)𝑛≥0 in $Y^X$ converges to a function $f \in Y^X$ with respect to 𝒯∗ if and only if, for every 𝑥 ∈ 𝑋, the sequence (𝑓𝑛(𝑥))𝑛≥0 in 𝑌 converges to 𝑓(𝑥) with respect to 𝒯. Proof. Sufficiency. If 𝑓𝑛(𝑥) → 𝑓(𝑥) for all 𝑥 ∈ 𝑋, then, for every 𝑥 ∈ 𝑋 and every 𝐺𝑥 ∈ 𝒯 such that 𝑓(𝑥) ∈ 𝐺𝑥, there exists an 𝑁𝑥 such that 𝑛 > 𝑁𝑥 ⇒ 𝑓𝑛(𝑥) ∈ 𝐺𝑥. To show that 𝑓𝑛 → 𝑓, suppose that 𝑓 ∈ ∏𝑥∈𝑋 𝐺𝑥, a member of the basis of 𝒯∗. Let 𝐼 be the finite subset of 𝑋 for which 𝐺𝑥 ≠ 𝑌. If 𝐼 = ∅, then ∏𝑥∈𝑋 𝐺𝑥 = $Y^X$, and so 𝑓𝑛 ∈ ∏𝑥∈𝑋 𝐺𝑥
for every 𝑛. If 𝐼 ≠ ∅, let 𝑁 = max{𝑁𝑥 ∶ 𝑥 ∈ 𝐼}. Then 𝑛 > 𝑁 ⇒ 𝑓𝑛 ∈ ∏𝑥∈𝑋 𝐺𝑥. Necessity. Suppose that 𝑓𝑛 → 𝑓. If 𝑥′ ∈ 𝑋 and 𝑓(𝑥′) ∈ 𝐺, where 𝐺 ∈ 𝒯, let ∏𝑥∈𝑋 𝐺𝑥 be the member of the basis of 𝒯∗ defined by (𝑖) 𝐺𝑥′ = 𝐺 and (𝑖𝑖) 𝐺𝑥 = 𝑌 for all 𝑥 ≠ 𝑥′. Since 𝑓𝑛 → 𝑓, we must have 𝑓𝑛 ∈ ∏𝑥∈𝑋 𝐺𝑥, and hence 𝑓𝑛(𝑥′) ∈ 𝐺, for 𝑛 sufficiently large. □ Corollary B.5.2. If 𝑌 is equipped with the discrete topology, and $Y^X$ with the corresponding topology of pointwise convergence, then a sequence (𝑓𝑛)𝑛≥0 in $Y^X$ converges to 𝑓 if and only if, for every 𝑥 ∈ 𝑋, 𝑓𝑛(𝑥) is eventually equal to 𝑓(𝑥). Proof. Obvious, in view of Theorem B.5.1 and the remarks in section B.1.
□
Suppose now that ℂ is equipped with the discrete topology, and $\mathbb{C}^X$ is equipped with the corresponding topology of pointwise convergence 𝒯∗. Let (𝑓𝑛)𝑛≥0 be a sequence in $\mathbb{C}^X$ and, for all 𝑛 ≥ 0, let 𝑠𝑛 ∶= 𝑓0 + ⋯ + 𝑓𝑛, where, for each 𝑥 ∈ 𝑋, 𝑠𝑛(𝑥) ∶= 𝑓0(𝑥) + ⋯ + 𝑓𝑛(𝑥). As usual, the infinite series ∑𝑛≥0 𝑓𝑛 is said to converge to $s \in \mathbb{C}^X$ with respect to 𝒯∗ (symbolized ∑𝑛≥0 𝑓𝑛 = 𝑠) if 𝑠𝑛 → 𝑠 with respect to 𝒯∗. Corollary B.5.3. If ℂ is equipped with the discrete topology, and $\mathbb{C}^X$ with the corresponding topology of pointwise convergence 𝒯∗, the infinite series ∑𝑛≥0 𝑓𝑛 converges with respect to 𝒯∗ if and only if, for all 𝑥 ∈ 𝑋, 𝑓𝑛(𝑥) is eventually equal to 0. Proof. By Corollary B.5.2, (𝑠𝑛)𝑛≥0 converges to some 𝑠 if and only if, for every 𝑥 ∈ 𝑋, the sequence (𝑠𝑛(𝑥))𝑛≥0 is eventually constant. If 𝑠𝑛(𝑥) = 𝑠(𝑥) for all 𝑛 > 𝑁𝑥, then, for 𝑛 > 𝑁𝑥 + 1, 𝑓𝑛(𝑥) = 𝑠𝑛(𝑥) − 𝑠𝑛−1(𝑥) = 𝑠(𝑥) − 𝑠(𝑥) = 0. Conversely, if 𝑓𝑛(𝑥) = 0 for all 𝑛 > 𝑀𝑥, then 𝑠𝑛(𝑥) = 𝑠𝑀𝑥(𝑥) for all 𝑛 ≥ 𝑀𝑥. □ Corollary B.5.4. Under the hypotheses of Corollary B.5.3, if ∑𝑛≥0 𝑓𝑛 converges with respect to 𝒯∗ and 𝜎 ∶ ℕ → ℕ is a bijection, then ∑𝑛≥0 𝑓𝜎(𝑛) converges with respect to 𝒯∗, and ∑𝑛≥0 𝑓𝜎(𝑛) = ∑𝑛≥0 𝑓𝑛. Proof. By Corollary B.5.3, for every 𝑥 ∈ 𝑋, the sequence (𝑓𝑛(𝑥))𝑛≥0, and hence the sequence (𝑓𝜎(𝑛)(𝑥))𝑛≥0, is finitely nonzero, with ∑𝑛≥0 𝑓𝜎(𝑛)(𝑥) = ∑𝑛≥0 𝑓𝑛(𝑥) being equal to the sum of those finitely many nonzero terms. □ If 𝑋 = ℕ, the conclusions of the preceding corollaries are identical to those of Theorems 13.3.6, 13.4.1, and 13.4.2. This is not surprising since, as shown below, the underlying topologies are identical. Theorem B.5.5. The topology 𝒯𝑑 on ℂℕ induced by the ultrametric 𝑑 defined by (13.3.8) is identical with the topology of pointwise convergence 𝒯∗ on ℂℕ, where ℂ is equipped with the discrete topology. Proof. It suffices to show that every member of a basis of 𝒯𝑑 is a union of members of a basis of 𝒯∗, and conversely. (𝑖) Suppose that 𝐵(𝐹; 𝑟) is an open ball for the ultrametric 𝑑.
If 𝑟 > 1, then 𝐵(𝐹; 𝑟) = ℂℕ, which is a member of the basis of 𝒯∗. If $2^{-j} < r \le 2^{-j+1}$ for 𝑗 ≥ 1, and 𝐻 ∈ 𝐵(𝐹; 𝑟), then $d(H, F) \le 2^{-j}$, and so 𝐻(𝑖) = 𝐹(𝑖) for 0 ≤ 𝑖 ≤ 𝑗 − 1. Let
𝑐𝑖 denote the common value of 𝐻(𝑖) and 𝐹(𝑖) for 0 ≤ 𝑖 ≤ 𝑗 − 1. Then ∏𝑖≥0 𝐺𝑖, where 𝐺𝑖 = {𝑐𝑖} for 0 ≤ 𝑖 ≤ 𝑗 − 1 and 𝐺𝑖 = ℂ for 𝑖 ≥ 𝑗, is a member of the basis of 𝒯∗. Moreover, 𝐻 ∈ ∏𝑖≥0 𝐺𝑖 ⊆ 𝐵(𝐹; 𝑟). (𝑖𝑖) Suppose that ∏𝑖≥0 𝐺𝑖 is a member of the basis of 𝒯∗. If each 𝐺𝑖 = ℂ, then, as noted above, ∏𝑖≥0 𝐺𝑖 = ℂℕ = 𝐵(𝐹; 𝑟) for any 𝐹 ∈ ℂℕ and any 𝑟 > 1. Otherwise, only finitely many 𝐺𝑖 differ from ℂ, and so there exists a 𝑗 ≥ 1 such that 𝐺𝑖 = ℂ for 𝑖 ≥ 𝑗. Suppose that 𝐹 ∈ ∏𝑖≥0 𝐺𝑖. We claim that $B(F; 2^{-j}) \subseteq \prod_{i\ge0}G_i$. For if $H \in B(F; 2^{-j})$, then, since 𝐻(𝑖) = 𝐹(𝑖) ∈ 𝐺𝑖 for 0 ≤ 𝑖 ≤ 𝑗 − 1 and 𝐺𝑖 = ℂ for 𝑖 ≥ 𝑗, we have 𝐻(𝑖) ∈ 𝐺𝑖 for all 𝑖 ≥ 0, whence 𝐻 ∈ ∏𝑖≥0 𝐺𝑖. □ We remarked in Chapter 14 that if Int(𝑃) is uncountable, then the topology of pointwise convergence on $\mathbb{C}^{\mathrm{Int}(P)}$, where ℂ is equipped with the discrete topology, is not metrizable. In fact, a much more general result holds. Theorem B.5.6 (Mulay (2019)). Suppose that X is an uncountable set and (𝑌𝑥, 𝒯𝑥)𝑥∈𝑋 is a family of topological spaces. Suppose that for every 𝑥 ∈ 𝑋, there exist 𝑎𝑥, 𝑏𝑥 ∈ 𝑌𝑥, and an open set 𝑉𝑥 ⊂ 𝑌𝑥 such that 𝑎𝑥 ∈ 𝑉𝑥 and 𝑏𝑥 ∈ 𝑌𝑥 − 𝑉𝑥. Then 𝑌 ∶= ∏𝑥∈𝑋 𝑌𝑥, endowed with the product topology, is not metrizable. Proof. Let 𝑎 ∶= {𝑎𝑥}𝑥∈𝑋, 𝑏 ∶= {𝑏𝑥}𝑥∈𝑋, and 𝑆 ∶= {{𝑦𝑥}𝑥∈𝑋 ∶ 𝑦𝑥 = 𝑏𝑥 for all but finitely many 𝑥}. We prove that 𝑌 is not metrizable by showing that $a \in \overline{S}$, but that there is no sequence in 𝑆 that converges to 𝑎; by Theorem B.2.2, this cannot happen in a metrizable space. To show the first of these assertions, suppose that 𝐺 ∶= ∏𝑥∈𝑋 𝐺𝑥 is a basic open set in 𝑌 that contains 𝑎. Then 𝐽 ∶= {𝑥 ∈ 𝑋 ∶ 𝐺𝑥 ≠ 𝑌𝑥} is finite. Let 𝑧𝑥 ∶= 𝑎𝑥 if 𝑥 ∈ 𝐽, and 𝑧𝑥 ∶= 𝑏𝑥 if 𝑥 ∈ 𝑋 − 𝐽. Clearly, 𝑧 ∶= {𝑧𝑥}𝑥∈𝑋 ∈ 𝑆 ∩ 𝐺. To show the second of these assertions, let (𝜎𝑛)𝑛≥1 be an arbitrary sequence in 𝑆, with 𝜎𝑛 = {𝜎𝑛,𝑥}𝑥∈𝑋, and exhibit an open set 𝑊 that contains 𝑎, but no 𝜎𝑛, as follows: For each 𝑛 ≥ 1, let 𝐻𝑛 ∶= {𝑥 ∈ 𝑋 ∶ 𝜎𝑛,𝑥 ≠ 𝑏𝑥}. Clearly, each 𝐻𝑛 is finite, and so 𝐻 ∶= ⋃𝑛≥1 𝐻𝑛 is countable.
Let 𝑥′ ∈ 𝑋 − 𝐻, and 𝑊 = ∏𝑥∈𝑋 𝑊𝑥 , where 𝑊𝑥′ = 𝑉𝑥′ , and 𝑊𝑥 = 𝑌𝑥 if 𝑥 ≠ 𝑥′ . Then 𝑎 ∈ 𝑊, but since 𝜎𝑛,𝑥′ = 𝑏𝑥′ for every 𝑛 ≥ 1, and 𝑏𝑥′ ∈ 𝑌𝑥′ − 𝑉𝑥′ , no 𝜎𝑛 is a member of 𝑊. □
References
[1] J. Kelley (1953): General Topology, Van Nostrand. MR0070144
[2] S. Mulay (2019): personal communication.
[3] W. Pervin (1964): Foundations of General Topology, Academic Press. MR0165477
Appendix C
Abstract algebra review
This appendix is essentially a glossary of the basic terminology of abstract algebra. It is by no means complete, featuring only definitions of terms that occur in the text. In addition, some of the elementary theorems of the subject are stated. Proofs of these theorems are occasionally included in order to convey something of the spirit of algebraic reasoning, the rest being left as exercises. In the interest of simplicity, however, certain results may not be stated in their most general form. Readers who wish to pursue the subject in more depth will find excellent accounts in Warner (1971, 1990).
C.1. Algebraic structures with one composition
A composition (or binary operation) on the nonempty set 𝑆 is a mapping from 𝑆 × 𝑆 to 𝑆. The image of an ordered pair (𝑥, 𝑦) ∈ 𝑆 × 𝑆 under such a mapping is typically denoted by placing a symbol between 𝑥 and 𝑦. The symbols + and ⋅ occur most frequently, but there are many other possibilities, and we shall use the symbol ∗ in this introductory section in order to preclude any preconceptions that might be associated with more familiar composition symbols. A set equipped with one or more compositions constitutes an algebraic structure. Let (𝑆, ∗) be such a structure.
(i) Identities. If 𝑒ₗ ∈ 𝑆 and 𝑒ₗ ∗ 𝑥 = 𝑥 for all 𝑥 ∈ 𝑆, then 𝑒ₗ is called a left identity. If 𝑒ᵣ ∈ 𝑆 and 𝑥 ∗ 𝑒ᵣ = 𝑥 for all 𝑥 ∈ 𝑆, then 𝑒ᵣ is called a right identity. If 𝑆 contains both a left identity 𝑒ₗ and a right identity 𝑒ᵣ, then, since 𝑒ᵣ is a right identity, 𝑒ₗ ∗ 𝑒ᵣ = 𝑒ₗ, and, since 𝑒ₗ is a left identity, 𝑒ₗ ∗ 𝑒ᵣ = 𝑒ᵣ. The common value 𝑒 of 𝑒ₗ and 𝑒ᵣ, the two-sided identity, is clearly the only one- or two-sided identity in 𝑆, and is simply called the identity of 𝑆.
(ii) Associativity. The composition ∗ (as well as the algebraic structure (𝑆, ∗)) is said to be associative if 𝑥 ∗ (𝑦 ∗ 𝑧) = (𝑥 ∗ 𝑦) ∗ 𝑧 for all 𝑥, 𝑦, 𝑧 ∈ 𝑆. In an associative algebraic structure (𝑆, ∗), the value of a nonparenthesized expression of the form 𝑥₁ ∗ 𝑥₂ ∗ ⋯ ∗ 𝑥ₙ is unambiguous.
(iii) Inverses. Suppose that 𝑆 contains the identity 𝑒, and 𝑥 ∈ 𝑆. If there exists an 𝑥ₗ′ ∈ 𝑆 such that 𝑥ₗ′ ∗ 𝑥 = 𝑒, then 𝑥ₗ′ is called a left inverse of 𝑥. If there exists an 𝑥ᵣ′ ∈ 𝑆 such that 𝑥 ∗ 𝑥ᵣ′ = 𝑒, then 𝑥ᵣ′ is called a right inverse of 𝑥. If an associative algebraic structure (𝑆, ∗) with identity 𝑒 contains both a left inverse 𝑥ₗ′ and a right inverse 𝑥ᵣ′ of 𝑥, then
𝑥ₗ′ = 𝑥ₗ′ ∗ 𝑒 = 𝑥ₗ′ ∗ (𝑥 ∗ 𝑥ᵣ′) = (𝑥ₗ′ ∗ 𝑥) ∗ 𝑥ᵣ′ = 𝑒 ∗ 𝑥ᵣ′ = 𝑥ᵣ′.
The common value 𝑥′ of 𝑥ₗ′ and 𝑥ᵣ′, the two-sided inverse of 𝑥, is clearly the only one- or two-sided inverse of 𝑥, and is simply called the inverse of 𝑥.
(iv) Commutativity. The composition ∗ (as well as the algebraic structure (𝑆, ∗)) is said to be commutative if 𝑥 ∗ 𝑦 = 𝑦 ∗ 𝑥 for all 𝑥, 𝑦 ∈ 𝑆.
(v) Semigroups and groups. (𝑆, ∗) is called a semigroup if it is associative. It is called a group if it is associative, contains an identity element, and every member of 𝑆 has an inverse. Commutative groups are almost universally referred to by mathematicians as abelian groups. There are of course many nonabelian groups, the best known being the so-called symmetric group (𝑆𝑋, ∘) on a set 𝑋 with at least three elements, where 𝑆𝑋 denotes the set of all permutations of 𝑋 (i.e., bijections 𝜎 ∶ 𝑋 → 𝑋) and ∘ denotes composition of functions.
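For a finite set, the definitions in (i)–(v) can be checked mechanically. The Python sketch below (function names are hypothetical, not from the text) represents a composition as a two-argument function and tests closure, associativity, the identity, and two-sided inverses by brute force. Permutations of {0, 1, 2} are stored as tuples, and composition follows the convention (𝑠 ∘ 𝑡)(𝑖) = 𝑠(𝑡(𝑖)).

```python
from itertools import permutations, product

def compose(s, t):
    """Composition s ∘ t of permutations of {0, ..., n-1} stored as tuples:
    (s ∘ t)(i) = s(t(i))."""
    return tuple(s[t[i]] for i in range(len(t)))

def identity_of(elements, op):
    """Return the two-sided identity of (elements, op), or None if none exists."""
    for e in elements:
        if all(op(e, x) == x and op(x, e) == x for x in elements):
            return e
    return None

def is_commutative(elements, op):
    return all(op(x, y) == op(y, x) for x, y in product(elements, repeat=2))

def is_group(elements, op):
    """Brute-force check of closure, associativity, identity, and inverses."""
    elements = list(elements)
    members = set(elements)
    if any(op(x, y) not in members for x, y in product(elements, repeat=2)):
        return False  # op is not a composition on this set
    if any(op(x, op(y, z)) != op(op(x, y), z)
           for x, y, z in product(elements, repeat=3)):
        return False  # not associative, so not even a semigroup
    e = identity_of(elements, op)
    if e is None:
        return False
    # every x needs a two-sided inverse y with x * y = y * x = e
    return all(any(op(x, y) == e and op(y, x) == e for y in elements)
               for x in elements)

S3 = list(permutations(range(3)))   # the symmetric group on {0, 1, 2}
print(is_group(S3, compose))        # True
print(is_commutative(S3, compose))  # False: S3 is nonabelian
```

By contrast, {0, 1, 2, 3} under multiplication mod 4 is a semigroup with identity 1, but `is_group` rejects it because 0 has no inverse.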
C.2. Algebraic structures with two compositions
In this section and the next, we revert to the use of + and ⋅ as composition symbols, with the caveat that one should assume as properties of the compositions denoted by these symbols only those that are explicitly postulated. We refer to the composition + as addition and the composition ⋅ as multiplication. In an algebraic structure labeled (𝑆, +), the identity, if it exists, is called the additive identity, and is denoted by the symbol 0. The inverse of 𝑥 ∈ 𝑆, if it exists, is called the additive inverse of 𝑥, and is denoted by the symbol −𝑥. In an algebraic structure labeled (𝑆, ⋅), the identity, if it exists, is called the multiplicative identity, and is denoted by the symbol 1. The inverse of 𝑥 ∈ 𝑆, if it exists, is called the multiplicative inverse of 𝑥, and is denoted by the symbol 𝑥⁻¹.
(i) Rings. An algebraic structure (𝑅, +, ⋅) is called a ring if the following three conditions are satisfied.
(R1) (𝑅, +) is an abelian group.
(R2) (𝑅, ⋅) is a semigroup.
(R3) The operation ⋅ distributes over +, in the sense that, for all 𝑥, 𝑦, 𝑧 ∈ 𝑅, (1) 𝑥 ⋅ (𝑦 + 𝑧) = 𝑥 ⋅ 𝑦 + 𝑥 ⋅ 𝑧, and (2) (𝑦 + 𝑧) ⋅ 𝑥 = 𝑦 ⋅ 𝑥 + 𝑧 ⋅ 𝑥.
If the ring 𝑅 contains the multiplicative identity 1 (where, to avoid trivialities, 1 ≠ 0), 𝑅 is called a ring with identity (or a ring with 1). If the operation ⋅ is commutative, 𝑅 is called a commutative ring.
Theorem C.2.1. In any ring (𝑅, +, ⋅), the following hold for all 𝑥, 𝑦 ∈ 𝑅.
(C.2.1) 0 ⋅ 𝑥 = 𝑥 ⋅ 0 = 0.
(C.2.2) (−𝑥) ⋅ 𝑦 = 𝑥 ⋅ (−𝑦) = −(𝑥 ⋅ 𝑦).
(C.2.3) (−𝑥) ⋅ (−𝑦) = 𝑥 ⋅ 𝑦.
(C.2.4) If 𝑥 has a multiplicative inverse, so does −𝑥, and (−𝑥)⁻¹ = −(𝑥⁻¹).
Proof. Exercise (or see Warner (1971, Theorem 14.5)).
□
(ii) Integral domains. If (𝑅, +, ⋅) is a commutative ring with 1, 𝑅 is called an integral domain if it satisfies the cancellation property
(C.2.5) 𝑎 ⋅ 𝑥 = 𝑎 ⋅ 𝑦 and 𝑎 ≠ 0 ⇒ 𝑥 = 𝑦.
Theorem C.2.2. Suppose that (𝑅, +, ⋅) is an integral domain. If 𝑥, 𝑦 ∈ 𝑅 and 𝑥 ⋅ 𝑦 = 0, then 𝑥 = 0 or 𝑦 = 0. Proof. Exercise.
□
(iii) Fields. If (𝑅, +, ⋅) is a commutative ring with 1, and every nonzero 𝑥 ∈ 𝑅 has a multiplicative inverse, then 𝑅 is called a field. So a field (𝑅, +, ⋅) simply consists of two abelian groups, (𝑅, +) and (𝑅 − {0}, ⋅), that are interconnected by the distributivity property. The reader is doubtless familiar with the fields of rational, real, and complex numbers. As noted in Chapter 11, there are also fields with finitely many elements, among them the fields (𝑍𝑝, ⊕, ⊙), where 𝑝 is a prime number, 𝑍𝑝 ∶= {0, 1, . . . , 𝑝 − 1}, 𝑥 ⊕ 𝑦 ∶= (𝑥 + 𝑦) mod 𝑝, and 𝑥 ⊙ 𝑦 ∶= (𝑥 ⋅ 𝑦) mod 𝑝.
Theorem C.2.3. Every field is an integral domain, and every finite integral domain is a field.
Proof. The proof of the first assertion is straightforward. To prove the second, suppose that 𝑎 is a nonzero element of the finite integral domain 𝑅. By Theorem C.2.2, 𝑎 ⋅ 𝑥 ≠ 0 whenever 𝑥 ≠ 0, so the map 𝑥 ↦ 𝑎 ⋅ 𝑥 sends the finite set 𝑅 − {0} to itself. By (C.2.5), this map is injective, and consequently it is surjective, by Theorem 2.2.10. Since 1 ∈ 𝑅 − {0}, it follows that 𝑎 ⋅ 𝑥 = 1 for some 𝑥 ∈ 𝑅, and, since 𝑅 is commutative, this 𝑥 is the multiplicative inverse of 𝑎. □
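Theorem C.2.3 can be observed by brute force in the rings (𝑍𝑛, ⊕, ⊙), where, going slightly beyond the text, we allow any 𝑛 ≥ 2 rather than only primes (𝑍𝑛 is then still a commutative ring with 1). The Python sketch below, with hypothetical helper names, tests the cancellation property (C.2.5) and the existence of multiplicative inverses separately and finds that they agree, holding exactly when 𝑛 is prime.

```python
def has_cancellation(n):
    """Property (C.2.5) in Z_n: a ⊙ x = a ⊙ y with a != 0 forces x = y."""
    return all(x == y or (a * x) % n != (a * y) % n
               for a in range(1, n) for x in range(n) for y in range(n))

def units(n):
    """Elements of Z_n that have a multiplicative inverse under ⊙."""
    return [x for x in range(n) if any((x * y) % n == 1 for y in range(n))]

def is_field(n):
    """Z_n (n >= 2) is a field iff every nonzero element is a unit."""
    return units(n) == list(range(1, n))

def zero_divisors(n):
    """Nonzero x with x ⊙ y = 0 for some nonzero y (cf. Theorem C.2.2)."""
    return [x for x in range(1, n) if any((x * y) % n == 0 for y in range(1, n))]

print([n for n in range(2, 20) if is_field(n)])  # [2, 3, 5, 7, 11, 13, 17, 19]
print(zero_divisors(6))                          # [2, 3, 4]
```

Finiteness is essential here: ℤ satisfies (C.2.5) but is not a field, and a brute-force search of this kind has nothing to say about infinite rings.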
C.3. 𝑅-algebraic structures
Suppose that (𝑅, +, ⋅) is a commutative ring with 1, and 𝐴 is an algebraic structure with one or more compositions. To differentiate these two structures, we denote members of the former, called scalars, by Greek letters (except for 0 and 1), and members of the latter by Roman letters. A scalar multiplication is a mapping from 𝑅 × 𝐴 to 𝐴, with the image of the pair (𝜆, 𝑥) under this mapping denoted by 𝜆.𝑥. The structure 𝐴, equipped with such a scalar multiplication, is called an 𝑅-algebraic structure.
(i) 𝑅-algebras. We call the 𝑅-algebraic structure (𝐴, +, ⋅, .) an 𝑅-algebra if it has the following properties, for all 𝜆, 𝜇 ∈ 𝑅 and all 𝑥, 𝑦 ∈ 𝐴.
(C.3.1) (𝐴, +, ⋅) is a ring. The additive identity of this ring is denoted by 𝟎.
(C.3.2) 𝜆.(𝑥 + 𝑦) = 𝜆.𝑥 + 𝜆.𝑦.
(C.3.3) (𝜆 + 𝜇).𝑥 = 𝜆.𝑥 + 𝜇.𝑥.
(C.3.4) (𝜆 ⋅ 𝜇).𝑥 = 𝜆.(𝜇.𝑥).
(C.3.5) 1.𝑥 = 𝑥.
(C.3.6) 𝜆.(𝑥 ⋅ 𝑦) = (𝜆.𝑥) ⋅ 𝑦 = 𝑥 ⋅ (𝜆.𝑦).
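As a concrete instance (our choice of example, not one worked here in the text), integer polynomials taken modulo 𝑡^𝑁, i.e., ℤ[𝑡]/(𝑡^𝑁), form a commutative ℤ-algebra under coefficientwise addition, the truncated Cauchy product, and coefficientwise scalar multiplication. The Python sketch below, with hypothetical names and 𝑁 = 4, spot-checks (C.3.2)–(C.3.6) on random elements. Such checks illustrate the axioms but prove nothing, and the ring axioms (C.3.1) are not tested.

```python
import random

N = 4  # hypothetical truncation: (c0, c1, c2, c3) stands for c0 + c1 t + c2 t^2 + c3 t^3

def add(x, y):
    """Coefficientwise addition."""
    return tuple(a + b for a, b in zip(x, y))

def mul(x, y):
    """Cauchy product, discarding every term of degree >= N."""
    return tuple(sum(x[i] * y[k - i] for i in range(k + 1)) for k in range(N))

def smul(lam, x):
    """Scalar multiplication lam.x by an integer scalar lam."""
    return tuple(lam * a for a in x)

def spot_check(trials=200, seed=0):
    """Test (C.3.2)-(C.3.6) on random elements; a sanity check, not a proof."""
    rng = random.Random(seed)

    def elem():
        return tuple(rng.randint(-5, 5) for _ in range(N))

    for _ in range(trials):
        lam, mu = rng.randint(-5, 5), rng.randint(-5, 5)
        x, y = elem(), elem()
        assert smul(lam, add(x, y)) == add(smul(lam, x), smul(lam, y))  # (C.3.2)
        assert smul(lam + mu, x) == add(smul(lam, x), smul(mu, x))      # (C.3.3)
        assert smul(lam * mu, x) == smul(lam, smul(mu, x))              # (C.3.4)
        assert smul(1, x) == x                                          # (C.3.5)
        assert (smul(lam, mul(x, y)) == mul(smul(lam, x), y)
                == mul(x, smul(lam, y)))                                # (C.3.6)
    return True

print(spot_check())                       # True
print(mul((1, 1, 0, 0), (1, -1, 1, -1)))  # (1, 0, 0, 0), since (1 + t)(1 - t + t^2 - t^3) = 1 - t^4
```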
The 𝑅-algebra (𝐴, +, ⋅, .) is said to be commutative if the operation ⋅ is commutative. The multiplicative identity of the ring (𝐴, +, ⋅), if it exists, is denoted by 1, and is also called the identity of the 𝑅-algebra (𝐴, +, ⋅, .). If (𝐴, +) is an abelian group, the 𝑅-algebraic structure (𝐴, +, .) is called an 𝑅-module if it satisfies conditions (C.3.2)–(C.3.5) above. Students of linear algebra will recognize that such structures generalize the concept of a vector space, which is simply an 𝑅-module in which 𝑅 is a field. As we will make no use of module theory in this text, we will limit our remarks about modules to the following theorem.
Theorem C.3.1. In any 𝑅-module (𝐴, +, .), hence in any 𝑅-algebra (𝐴, +, ⋅, .), the following hold for all 𝜆 ∈ 𝑅 and all 𝑥 ∈ 𝐴.
(C.3.7) 𝜆.𝟎 = 0.𝑥 = 𝟎.
(C.3.8) 𝜆.(−𝑥) = (−𝜆).𝑥 = −(𝜆.𝑥).
(C.3.9) (−1).𝑥 = −𝑥.
Proof. Exercise (or see Warner (1971, Theorem 27.2)).
□
C.4. Substructures
Suppose that (𝑆, ∗) is an algebraic structure and that 𝐻 is a nonempty subset of 𝑆. When can we regard 𝐻, with composition ∗ inherited from 𝑆, as an algebraic structure in its own right? It is clearly both necessary and sufficient that 𝐻 be a stable subset of 𝑆 with respect to ∗, in the sense that 𝑥, 𝑦 ∈ 𝐻 ⇒ 𝑥 ∗ 𝑦 ∈ 𝐻 (a property also expressed by saying that 𝐻 is closed under ∗). Even if this condition on 𝐻 is satisfied, the algebraic structure (𝐻, ∗) need not inherit all the properties of (𝑆, ∗). The properties of associativity and commutativity will clearly be inherited. Suppose, however, that (𝑆, ∗) has identity 𝑒, and 𝐻 is a stable subset of 𝑆. The example (𝑆, ∗) = (ℤ, +), with 𝐻 = ℙ, shows that 𝑒 need not belong to 𝐻, and, indeed, that 𝐻 need not contain an identity element at all. And, as shown by the example (𝑆, ∗) = ({−1, 0, 1}, ⋅), with 𝐻 = {0}, if 𝐻 does contain an identity element (in this case, 0), it may differ from the identity element of 𝑆 (in this case, 1). Similar remarks apply to inverses, and so some caution is in order when dealing with substructures. Our interest here will focus on when stable subsets of some common algebraic structures inherit all of the defining properties of those structures.
(i) Subsemigroups. Suppose that (𝑆, ∗) is a semigroup and 𝐻 is a stable subset of 𝑆. If (𝐻, ∗) is itself a semigroup, we say that (𝐻, ∗) is a subsemigroup of (𝑆, ∗), or simply that 𝐻 is a subsemigroup of 𝑆. Since associativity is inherited, every stable subset 𝐻 of 𝑆 is a subsemigroup of 𝑆.
(ii) Subgroups. Suppose that (𝐺, ∗) is a group, and 𝐻 is a stable subset of 𝐺. We say that (𝐻, ∗) is a subgroup of (𝐺, ∗), or simply that 𝐻 is a subgroup of 𝐺, if (𝐻, ∗) is itself a group. Here, the mere fact that 𝐻 is a stable subset of 𝐺 does not ensure that 𝐻 is a subgroup of 𝐺, as the example of ℙ in (ℤ, +) shows.
Theorem C.4.1. Suppose that (𝐺, ∗) is a group (with identity 𝑒, and the inverse of 𝑥 in 𝐺 denoted by 𝑥′, as in section C.1). If 𝐻 is a stable subset of 𝐺, then 𝐻 is a subgroup of 𝐺 if and only if (1) 𝑒 ∈ 𝐻, and (2) for all 𝑥 ∈ 𝐻, 𝑥′ ∈ 𝐻.
Proof. Necessity. Suppose that 𝐻 is a subgroup of 𝐺 and the identity element of 𝐻 is 𝑓. Then 𝑓 ∗ 𝑓 = 𝑓 = 𝑓 ∗ 𝑒, and so 𝑓′ ∗ (𝑓 ∗ 𝑓) = 𝑓′ ∗ (𝑓 ∗ 𝑒). But 𝑓′ ∗ (𝑓 ∗ 𝑓) = (𝑓′ ∗ 𝑓) ∗ 𝑓 = 𝑒 ∗ 𝑓 = 𝑓. Similarly, 𝑓′ ∗ (𝑓 ∗ 𝑒) = (𝑓′ ∗ 𝑓) ∗ 𝑒 = 𝑒, and so 𝑓 = 𝑒. Suppose next that 𝑥 ∈ 𝐻, and let 𝑦 be the inverse of 𝑥 in the group 𝐻, so that 𝑥 ∗ 𝑦 = 𝑓 (= 𝑒). Then 𝑥′ ∗ (𝑥 ∗ 𝑦) = 𝑥′ ∗ 𝑒 = 𝑥′. But also 𝑥′ ∗ (𝑥 ∗ 𝑦) = (𝑥′ ∗ 𝑥) ∗ 𝑦 = 𝑒 ∗ 𝑦 = 𝑦, and so 𝑥′ = 𝑦 ∈ 𝐻.
Sufficiency. Associativity is inherited from 𝐺, and conditions (1) and (2) supply an identity and inverses in 𝐻, so (𝐻, ∗) is a group. □
Remark C.4.2. If the group (𝐺, ∗) is finite, then every stable subset of 𝐺 is a subgroup. Proof. Exercise.
□
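Remark C.4.2 can be observed by brute force in a small case. The sketch below (our choice of group, 𝑍12 under addition mod 12, and hypothetical helper names) enumerates every nonempty subset, keeps the stable ones, and confirms that each satisfies conditions (1) and (2) of Theorem C.4.1 and is therefore a subgroup. The six stable subsets it finds are the sets {0, 𝑑, 2𝑑, . . .} for the divisors 𝑑 of 12.

```python
from itertools import combinations

def is_stable(H, op):
    """H is closed under op."""
    return all(op(x, y) in H for x in H for y in H)

def satisfies_c41(H, e, inv):
    """Conditions (1) and (2) of Theorem C.4.1."""
    return e in H and all(inv(x) in H for x in H)

n = 12  # the finite group (Z_12, addition mod 12), chosen for illustration

def add_mod(x, y):
    return (x + y) % n

def neg_mod(x):
    return (-x) % n

# all 2**12 - 1 nonempty subsets, filtered down to the stable ones
stable = [set(c) for k in range(1, n + 1)
          for c in combinations(range(n), k) if is_stable(set(c), add_mod)]
print(len(stable))                                        # 6
print(all(satisfies_c41(H, 0, neg_mod) for H in stable))  # True
```

Finiteness matters: in the infinite group (ℤ, +), the stable subset ℙ fails condition (1).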
(iii) Subrings. Suppose that (𝑅, +, ⋅) is a ring, and 𝐻 is a nonempty subset of 𝑅 that is closed under + and ⋅. If 𝐻, with compositions inherited from 𝑅, is itself a ring, 𝐻 is called a subring of 𝑅.
Theorem C.4.3. A subset 𝐻 of the ring 𝑅 that is closed under + and ⋅ is a subring of 𝑅 if and only if (𝐻, +) is a subgroup of (𝑅, +).
Proof. It suffices to observe that the commutativity of +, the associativity of ⋅, and distributivity are all inherited from 𝑅.
□
(iv) Subfields. Suppose that (𝐹, +, ⋅) is a field and that 𝐻 is a nonempty subset of 𝐹 that is closed under + and ⋅. If 𝐻, with compositions inherited from 𝐹, is itself a field, then 𝐻 is called a subfield of 𝐹. Theorem C.4.4. A subset 𝐻 of the field 𝐹 that is closed under + and ⋅ is a subfield of 𝐹 if and only if (𝐻, +) is a subgroup of (𝐹, +), and (𝐻 − {0}, ⋅) is a subgroup of (𝐹 − {0}, ⋅). Proof. Straightforward.
□
(v) Subalgebras. Let (𝐴, +, ⋅, .) be an 𝑅-algebra. Suppose that 𝐻 is a nonempty subset of 𝐴 that is closed under + and ⋅, and also under scalar multiplication (in the sense that, for all 𝜆 ∈ 𝑅 and all 𝑥 ∈ 𝐻, 𝜆.𝑥 ∈ 𝐻). If (𝐻, +, ⋅, .) is itself an 𝑅-algebra, 𝐻 is called a subalgebra of 𝐴.
Theorem C.4.5. Suppose that (𝐴, +, ⋅, .) is an 𝑅-algebra, and 𝐻 is a nonempty subset of 𝐴 that is closed under + and ⋅, and also under scalar multiplication. Then 𝐻 is a subalgebra of 𝐴.
Proof. Exercise (use properties (C.3.1)–(C.3.6) and (C.3.7)–(C.3.9)).
□
C.5. Isomorphic structures
The algebraic structures (𝑆₁, ∗₁) and (𝑆₂, ∗₂) are said to be isomorphic if there exists a bijection 𝑓 ∶ 𝑆₁ → 𝑆₂ such that
(1) 𝑓(𝑥 ∗₁ 𝑦) = 𝑓(𝑥) ∗₂ 𝑓(𝑦) for all 𝑥, 𝑦 ∈ 𝑆₁.
Any bijection 𝑓 satisfying property (1) is called an isomorphism from 𝑆₁ to 𝑆₂. It is easy to show that if 𝑓 is an isomorphism from 𝑆₁ to 𝑆₂, then 𝑓⁻¹ is an isomorphism from 𝑆₂ to 𝑆₁. Property (1) may also be expressed in the equivalent form
(2) 𝑥 ∗₁ 𝑦 = 𝑓⁻¹(𝑓(𝑥) ∗₂ 𝑓(𝑦)),
which highlights the fact that one can carry out a computation in 𝑆₁ by an appropriate computation in 𝑆₂, with the result of the latter mapped back to 𝑆₁. This is of course the procedure that generations of (precalculator) algebra II students followed in evaluating the product of positive numbers 𝑥 and 𝑦 by the formula 𝑥 ⋅ 𝑦 = antilog(log 𝑥 + log 𝑦), based on the isomorphism log from the group (ℝ⁺, ⋅) to the group (ℝ, +). The notion of an isomorphism extends to algebraic structures with more than one composition in the obvious way. The 𝑅-algebraic structures (𝐴₁, +₁, ⋅₁, .₁) and (𝐴₂, +₂, ⋅₂, .₂) are said to be isomorphic if there exists an isomorphism 𝑓 between the structures (𝐴₁, +₁, ⋅₁) and (𝐴₂, +₂, ⋅₂) such that, for all 𝜆 ∈ 𝑅 and all 𝑥 ∈ 𝐴₁, 𝑓(𝜆 .₁ 𝑥) = 𝜆 .₂ 𝑓(𝑥). A property of an algebraic, or 𝑅-algebraic, structure is said to be algebraic if it is preserved under isomorphisms. It is easy to see that, among others, the properties of associativity, commutativity, and distributivity, as well as the existence of identity elements and inverses, are all algebraic properties. Consequently, any algebraic structure isomorphic to a semigroup (respectively, group, ring, integral domain, field) is itself a semigroup (respectively, group, ring, integral domain, field). Furthermore, any 𝑅-algebraic structure isomorphic to an 𝑅-algebra is itself an 𝑅-algebra. When two structures are isomorphic, they are, for all (algebraic) intents and purposes, indistinguishable. Yet we may prefer to work in one such structure rather than in its isomorphic twin, based on the greater salience or simplicity of the former. A particularly vivid illustration of this phenomenon appears in Chapter 14 (section 10), in which the reduced incidence algebra of a binomial poset is shown to be isomorphic to an algebra of formal power series.
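Form (2) of the isomorphism property is exactly what the logarithm-table computation exploits. The Python sketch below (hypothetical function names; floating-point arithmetic, so the product is correct only up to rounding) multiplies positive reals by passing through log and exp. It also checks property (1) for a small finite example of our choosing: 𝑘 ↦ 𝑖^𝑘 from 𝑍4 under addition mod 4 to {1, 𝑖, −1, −𝑖} under complex multiplication. The map is injective between two four-element sets, hence a bijection, hence an isomorphism.

```python
import math
from itertools import product

def multiply_via_logs(x, y):
    """Form (2) with f = log: x * y = f^{-1}(f(x) + f(y)), valid for x, y > 0."""
    return math.exp(math.log(x) + math.log(y))

# A finite example: f(k) = i^k from (Z_4, addition mod 4)
# to ({1, i, -1, -i}, complex multiplication), tabulated to avoid powers.
POWERS_OF_I = [1, 1j, -1, -1j]

def f(k):
    return POWERS_OF_I[k]

def is_isomorphism(domain, op1, op2, fn):
    """Check that fn is injective on domain and satisfies property (1)."""
    images = [fn(x) for x in domain]
    injective = len(set(images)) == len(images)
    homomorphic = all(fn(op1(x, y)) == op2(fn(x), fn(y))
                      for x, y in product(domain, repeat=2))
    return injective and homomorphic

print(multiply_via_logs(3.0, 7.0))  # approximately 21.0, up to rounding
print(is_isomorphism(range(4), lambda x, y: (x + y) % 4,
                     lambda u, v: u * v, f))  # True
```

Replacing `f` by 𝑘 ↦ (−1)^𝑘 still satisfies property (1) but is not injective, so `is_isomorphism` rejects it.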
References
[1] S. Warner (1971): Classical Modern Algebra, Prentice-Hall. MR0267998
[2] S. Warner (1990): Modern Algebra, Volumes I and II, Dover Publications. MR1068318
Index
Addition and extended addition rule, 17 Aigner, M., 236, 248 Algebraic structures: with one composition, 261 isomorphic, 266 over a commutative ring with identity, 263, 264 substructures of, 264 with two compositions, 262, 263 Alon, N., 66 Andrews, G., 109, 116 Apostol, T., 251, 254 Arithmetic (number-theoretic) function, 97 Arrow impossibility theorem, 190 Bell numbers: and moments of a Poisson distribution, 83 and partitions of a set (resp., equivalence relations), 27, 71 exponential generating function for, 71 recurrence for, 71 Berge, C., 159 Bertrand’s ballot problem, 138 Bijective (combinatorial) proof, 2, 8 Binomial coefficient: and lattice paths, 35 as enumerator of subsets of a given cardinality, 31 recurrence, 32 table of (Pascal’s triangle), 32 Binomial inversion principle, 37
Binomial poset: definition, 240 factorial function of, 240 incidence coefficient of, 242 reduced incidence algebra of, 243 Bogart, K., 39, 45, 195, 205 Bonferroni inequalities, 43 Burnside’s lemma, 144 Canfield, E., 203, 205 Catalan numbers: and Dyck words (subdiagonal lattice paths), 137 and parenthesizations of a word, 136 and triangulations of a polygon, 138 recurrence, generating function, and closed form for, 136 Cauchy product, 210 Cauchy’s formula, 81 Cayley, A., 59, 66 Characteristic function of a set, 20, 39 Chi function (of an interval in a locally finite poset), 232 Cigler, J., 183 Circular words, 100–101 Combinatorial factorization of a polynomial, 4 Complete symmetric function, 92 Composition of a positive integer: generating functions for, 6 pictorial representation and enumeration of, 2 under restrictions on its parts, 6
weak, 5 Comtet’s theorem, 90, 93 Conjunctive normal form (of a boolean function), 157 Cycle index of a permutation, 146 Cycle numbers: and restricted ordered occupancy, 78 and permutations, 79 as connection constants, 80 as signless Stirling numbers of the first kind, 79 as weighted Stirling numbers, 89 recurrence and table, 78 recurrence and table, 79 Davis, R., 183, 186, 248 De Bruijn’s generalization of Polya’s theorem, 155, 159 Dedekind’s problem, 198 Dilworth’s antichain decomposition theorem, 195 Dilworth’s chain decomposition theorem, 196 Dilworth’s lemma, 195 Dirichlet product, 97, 210 Disjunctive normal form (of a boolean function), 157 Distribution polynomial: for integer partitions, 179 for statistics on discrete structures, 175 for the inversion statistic on integer sequences, 178 for the inversion statistic on permutations, 176 Dobinski’s formula, 71, 81 Doubilet, P., 227, 248 Doyle, P., 40, 44 Equinumerous (equipollent, equipotent) sets, 14 Equivalence relation, 27 Erdős, P., 24, 27 Erickson, M., 65, 66 Erikson, K., 109, 116 Euler 𝜙-function, 98, 99 Eulerian derivative, 182 Eulerian number, 85 Exponential formula, 88 Falling factorial polynomial, 20 Fibonacci number: and tilings, 5
asymptotic growth rate of, 4 closed form for, 4 combinatorial interpretation of, 3 of a binomial poset, 250 recurrence and generating function for, 3 Finite difference: and polynomial interpolation, 121 antidifferences, table of, 123 definition and basic properties of, 119 finite difference calculus, fundamental theorem of, 122 relation to the shift operator, 120 Finitely additive measure, 19 Ford, L., 197, 205 Formal derivative (of a formal power series), 219 Formal power series, 212 Fulkerson, D., 197, 205 Function (map, mapping, functional, transformation, operator) as a distribution, 15 as a sequence or word, 15 domain partition induced by, 22 domain, codomain and range of, 13 extensional and intensional conceptions of, 13 graph of, 13 injective, surjective and bijective, 13 one- and two-sided inverses of, 14 partial, 47 weakly (resp., strictly) increasing, 44 Galois numbers of a finite vector space, ff. 181 Golden ratio, 3 Goldman, J., 182, 183 Graded poset, 239 Graham, R., 21, 27, 66, 205 Graph: as an irreflexive, binary relation, 57 complete, 58 complete bipartite, 66 connected, 58 edge coloring of, 62 edge of, 57 enumeration of isomorphism classes of, 151 labeled and unlabeled, 58 vertex, vertex adjacency, degree of a vertex, 57 Greatest lower bound (infimum), 204
Gross, O., 53, 54 Hall, P., 233–234 Harary, F., 62, 66, 154, 159 Harrison, M., 159 Hausdorff maximality principle, 194, 205 Incidence algebra (of a locally finite poset), 228 Indeterminate, 211 Irrelevance of alternatives (for a social welfare function), 190 Jordan–Dedekind chain condition, 239 Kaplansky, I., 40, 44 Kelley, J., 205, 259 Kirchhoff, G., 62, 66 Kitchen, J., 251, 254 Knuth, D., 21, 27, 181, 183 Kurtz, D., 203, 205 Lagrange interpolation theorem, 140 Lah numbers: recurrence, closed form, and table, 78 Lah numbers: and ordered occupancy, 77 as connection constants, 78 as weighted Stirling numbers, 88 recurrence, closed form, and table, 77 Lancaster’s theorem, 93 Lattice: algebraic, 204 order-theoretic, 204 sublattice, 204 Least upper bound (supremum), 203 Legendre’s theorem, 211 Linear difference equation: and rational generating functions, 132 for periodic and polynomial functions, 135 homogeneous, with constant coefficients, 127 in operator form, 127 solution using its characteristic polynomial, 128 Liu, C., 130, 140 Logarithmic concavity (of a real sequence), 200 Lubell, D., 197, 205 Lucas, E.:
and a congruence for binomial coefficients, 114 and the problème des ménages, 40, 44 Marriage theorem, 199 Matching, 198 McCluskey, E., 158, 159 Meshalkin, L., 205 Method of linear functionals, 182 Metric: discrete, 212 ultrametric, 214 Mobius function: of a positive integer, 98 of an interval in a locally finite poset, 233 Mobius inversion principle: binomial inversion as a special case, 99 for arithmetic functions, 98 for bivariate functions on a locally finite poset, 235 for univariate functions on a locally finite poset, 236 Modular binomial lattice: characteristic of, 247 definition of, 247 Mulay, S., 181, 183, 259 Multinomial coefficients: abbreviated notation for, 51 and distributions with prescribed occupancy numbers, 49 and ordered partitions of a set, 50 as enumerators of words with prescribed letter frequencies, 50 recurrence for, 51 Multiplication rule, 20 Newton’s inequality, 203 Niven, I., 213, 223 O’Hara, K., 183 Orbit (of a permutation group), 144 Ordered direct sum decomposition of a vector space, 173 Ordered partitions of a set: and preferential rankings (weak orders), 52 asymptotic formula for, 53 recurrence and exponential generating functions for, 49–50 infinite series for, 53 p-order of an integer, 112
Palmer, E., 147, 159 Partially ordered set (poset): antichain in, 194 chain in, 194 comparability of two elements of, 192 covering relation between two elements of, 193 dimension of, 194 duality principle for, 193 graded, 239 length (resp., width) of, 194 maximal chain in, 194 minimal (resp., maximal) element of, 192, 193 monotone boolean function on, 198 order ideal in, 198 smallest (resp., largest) element of, 192, 193 subposet (induced subposet) of, 192 weak subposet of, 192 Partition of a set: definition, 22 and equivalence relations, 27 enumeration of, by number of blocks, 69 Partition of an integer: as a distribution of unlabeled balls among unlabeled boxes, 101 as a multiset of positive integers, 101 Ferrers diagram of, 102 generating functions for, 105–106 pentagonal number theorem, 107 recurrence and table, 102 self-conjugate, 104 Patashnik, O., 21, 27 Pattern inventory of a permutation group, 149 Permutation group: as a subgroup of the symmetric group, 143 Burnside’s lemma, 144 orbits induced by, 144 permutational equivalence of, 161 Permutation: as a bijective self-map, 15 as a word, 15 Cauchy’s formula, 81, 147 cycle decomposition of, 79 enumeration by number of cycles, 79 Pervin, W., 259 Pigeonhole principle: elementary form, 17
for functions, 23 for relations, 29 Polya’s first and second theorems, 146–149 Power sum: and Bernoulli numbers, 111 definition, 109 recurrences for, 109, 110 Prüfer code, 60 Principal dual order ideal (of a poset), 236 Principal order ideal (of a poset), 236 Probabilistic method (for determining bounds on Ramsey numbers), 65 Problème des rencontres: and the hat-check problem, 38 solution by binomial inversion, 38 q-binomial (Gaussian) coefficient, 165 q-binomial inversion principle, 169 q-factorials of the first and second kinds, 163–164 q-integer, 163 q-multinomial coefficients of the first and second kind, 172 q-Vandermonde identity, 171 Quasi-order (preorder): as a partially ordered partition, 207 as a reflexive, transitive relation, 188 connection with topologies, 189 Quine, W., 158, 159 Ramsey, F., 62, 66 Rank function (of a poset), 203 Rational generating functions (fundamental theorem of), 132 Reciprocal polynomial, 4 Relation: covering, in a partially ordered set, 193 domain and range of, 24 dual, complement, symmetric and asymmetric part of, 26 graph of, 24 Intensional and extensional conception of, 24 matrix representation of, 26 pigeonhole principle for, 30 symmetric complement of, 191 types of (reflexive, symmetric, asymmetric, antisymmetric, transitive, complete, negatively transitive), 25 Restricted growth function, 72 Rising factorial polynomial, 21
Rota, G.-C., 19, 27, 81, 182, 183, 227, 248, 249 Rothschild, B., 62, 66 Schoenfield, J., 215, 223 Schur’s lemma, 65 Semigroup algebra, 209 Sen, A., 190, 205 Shattuck, M., 181, 183 Sieve formula (principle of inclusion and exclusion): abstract form (inversion formula for set functions), 41 basic form, 19 complementary form, 19 noninductive proof of, 39 Snake oil method, 123 Social welfare function, 190 Spanning subset (of a vector space), 183 Spanning tree (of a connected graph), 62 Spencer, J., 64, 66, 81 Sperner poset, 203 Sperner’s theorem, 197 Spiegel, M., 122, 140 Stanley, R., 11, 27, 131, 227, 248, 249 Stirling numbers of the first kind: and elementary symmetric functions, 75 as connection constants, 75 recurrence and table, 76 signless, 79 Stirling numbers of the second kind: and restricted growth functions, 72 and set partitions with prescribed number of blocks, 69 as connection constants, 74 exponential generating function for, 70 recurrence for, 70 table of (Stirling’s triangle), 70 Stone, H., 159 Strict order, 191 Strong convergence (of a complex sequence), 213 Strong logarithmic concavity (of a real sequence), 200 Strong pointwise convergence: of a sequence in ℂ^ℕ, 213 of an infinite series in ℂ^ℕ, 213 of sequences and series in ℂ^Int(𝑃), 229 Summability: of a complex sequence, 216, 251 of a double sequence in ℂ^ℕ, 217 of a sequence in ℂ^Int(𝑃), 230
of a sequence in ℂ^ℕ, 216 Sylvester, J., 104, 105 System of distinct representatives (SDR), 198–199 Szekeres, A., 24, 27 Szpilrajn’s theorem, 194, 205 Top-down summation, 7 Topology: base of, 255 definition of, 255 discrete, 255 indiscrete, 255 metric, 256 of pointwise convergence, 257 product, 257 separability axioms for, 257 Total (linear) order, 187 Touchard, J., 39, 44 Tournament, 29 Tree: as connected graph with no cycles, 59 Cayley’s formula for the number of, 59 Kirchhoff’s matrix tree theorem, 62 Prüfer code of, 60, 61 Trotter, W., 195, 205 Twenty-fold way, 77, 78 Unimodal sequence of real numbers, 167, 200 Vacuous implication, principle of, 12 Vandermonde, A.-T.: binomial coefficient identity, 33 determinant factorization theorem, 129 Velleman, D., 14, 27, 194, 205 Wade, W., 36, 251, 254 Wagner, C., 109, 117, 180, 183 Warner, S., 210, 223, 261, 266 Weak order: as a reflexive, transitive, complete relation, 188 connection with ordered partitions, 189 Weak Pareto property (of a social welfare function), 190 Weight function on the positive integers: definition, 87 weighted Stirling and Bell numbers, 88 Well-ordering, 187 West, D., 59, 66 Wilf, H., 44, 46, 123, 140 Worpitsky’s identity, 85
Zeilberger, D., 183 Zeta function: of a positive integer, 98 of an interval in a locally finite poset, 231
A First Course in Enumerative Combinatorics provides an introduction to the fundamentals of enumeration for advanced undergraduates and beginning graduate students in the mathematical sciences. The book offers a careful and comprehensive account of the standard tools of enumeration (recursion, generating functions, sieve and inversion formulas, enumeration under group actions) and their application to counting problems for the fundamental structures of discrete mathematics, including sets and multisets, words and permutations, partitions of sets and integers, and graphs and trees. The author’s exposition has been strongly influenced by the work of Rota and Stanley, highlighting bijective proofs, partially ordered sets, and an emphasis on organizing the subject under various unifying themes, including the theory of incidence algebras. In addition, there are distinctive chapters on the combinatorics of finite vector spaces, a detailed account of formal power series, and combinatorial number theory. The reader is assumed to have a knowledge of basic linear algebra and some familiarity with power series. There are over 200 well-designed exercises ranging in difficulty from straightforward to challenging. There are also sixteen large-scale honors projects on special topics appearing throughout the text. The author is a distinguished combinatorialist and award-winning teacher, and he is currently Professor Emeritus of Mathematics and Adjunct Professor of Philosophy at the University of Tennessee. He has published widely in number theory, combinatorics, probability, decision theory, and formal epistemology. His Erdős number is 2.
For additional information and updates on this book, visit www.ams.org/bookpages/amstext-49
AMSTEXT/49
This series was founded by the highly respected mathematician and educator, Paul J. Sally, Jr.