DIOPHANTINE APPROXIMATION
Jan-Hendrik Evertse
Fall 2015
Chapter 1
Introduction

We give a brief overview of the contents of this course.
1.1
Geometry of numbers
A set $C \subset \mathbb{R}^n$ is called convex if for every $x, y \in C$, the line segment connecting them, that is $\{(1-t)x + ty : 0 \le t \le 1\}$, is also contained in $C$. A central symmetric convex body in $\mathbb{R}^n$ is a closed, bounded, convex subset $C$ of $\mathbb{R}^n$ that contains $0$ as an interior point and is symmetric about $0$, i.e., for every $x \in C$ we have $-x \in C$.

Theorem 1.1 (Minkowski's first convex body theorem, 1896). Let $C \subset \mathbb{R}^n$ be a central symmetric convex body of volume $\mathrm{vol}\, C \ge 2^n$. Then $C$ contains a point $x \in \mathbb{Z}^n$ with $x \ne 0$.

We give a consequence.

Theorem 1.2 (Dirichlet, 1842). Let $\xi \in \mathbb{R} \setminus \mathbb{Q}$. Then there are infinitely many pairs $(x, y)$ such that

(1.1) $|\xi - x/y| \le y^{-2}$, $x, y \in \mathbb{Z}$, $y > 0$, $\gcd(x, y) = 1$.

Proof. Let $Q \in \mathbb{Z}$, $Q > 3$, and consider the convex body
$$C_Q := \{(x, y) \in \mathbb{R}^2 : |x - \xi y| \le Q^{-1},\ |y| \le Q\}.$$
This body is a parallelogram of volume $\mathrm{vol}\, C_Q = 2Q^{-1} \cdot 2Q = 2^2$. Hence, by Theorem 1.1, there is a non-zero point $(x, y) \in C_Q \cap \mathbb{Z}^2$. If we divide $(x, y)$ by the gcd of $x$ and $y$, the resulting point is still in $C_Q$. Hence there is $(x, y) \in C_Q \cap \mathbb{Z}^2$ with $\gcd(x, y) = 1$. If $y = 0$ then $x = \pm 1$. The point $(\pm 1, 0)$ does not belong to $C_Q$ since $Q > 3$. Hence $y \ne 0$. By replacing $(x, y)$ by $(-x, -y)$ we can make $y$ positive.

Thus, we obtain for every $Q > 3$ a point $(x_Q, y_Q) \in C_Q \cap \mathbb{Z}^2$ with $\gcd(x_Q, y_Q) = 1$ and $y_Q > 0$. We claim that if we let $Q \to \infty$, then $(x_Q, y_Q)$ runs through infinitely many different pairs of integers. For assume the contrary. Then there is a fixed pair $(x_0, y_0)$ such that $(x_Q, y_Q) = (x_0, y_0)$ for arbitrarily large $Q$. But then $|x_0 - \xi y_0| \le Q^{-1}$ for arbitrarily large $Q$, and thus $x_0 - \xi y_0 = 0$, $\xi = x_0/y_0$. This is against our assumption $\xi \in \mathbb{R} \setminus \mathbb{Q}$. We conclude that indeed there are infinitely many distinct points among the $(x_Q, y_Q)$. For these points, we have
$$\left|\xi - \frac{x_Q}{y_Q}\right| \le (Q y_Q)^{-1} \le y_Q^{-2}.$$
Theorem 1.2 follows.

Exercise 1.1. Let $\xi \in \mathbb{Q}$. Prove that there is a constant $c(\xi) > 0$ such that if $(x, y)$ is a pair of integers with $\gcd(x, y) = 1$, $y > 0$ and $x/y \ne \xi$, then $|\xi - x/y| \ge c(\xi) y^{-1}$. Deduce that Theorem 1.2 is false for $\xi \in \mathbb{Q}$.

We state without proof the following generalization of Dirichlet's Theorem, which can also be deduced from Minkowski's convex body theorem.

Theorem 1.3 (Dirichlet, 1842). Let $\xi_1, \ldots, \xi_n$ be $n \ge 2$ real numbers, not all belonging to $\mathbb{Q}$. Then there are infinitely many tuples $(x_1, \ldots, x_n, y) \in \mathbb{Z}^{n+1}$ such that
$$\left|\xi_1 - \frac{x_1}{y}\right| \le y^{-1-1/n},\ \ldots,\ \left|\xi_n - \frac{x_n}{y}\right| \le y^{-1-1/n}, \quad y > 0,\ \gcd(x_1, \ldots, x_n, y) = 1.$$

In the section on the Geometry of Numbers, we discuss a far-reaching generalization of Minkowski's Theorem, and some further applications.
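The construction in the proof of Theorem 1.2 can be tried out numerically. The following is a minimal sketch, assuming $\xi = \sqrt{2}$ as the irrational number; the function names `close_to_sqrt2` and `dirichlet_point` are hypothetical and chosen only for this illustration. For each $Q$ it searches the body $C_Q$ by brute force (the proof itself is non-constructive) and uses exact integer arithmetic to test $|x - \sqrt{2}\,y| \le 1/q$.

```python
from math import gcd, isqrt

def close_to_sqrt2(x, y, q):
    """Exact test of |x - sqrt(2)*y| <= 1/q for positive integers x, y, q.

    Equivalent to q*x - 1 <= sqrt(2)*q*y <= q*x + 1, which is compared after squaring.
    """
    return (q * x - 1) ** 2 <= 2 * (q * y) ** 2 <= (q * x + 1) ** 2

def dirichlet_point(Q):
    """Return a coprime pair (x, y) in C_Q with y > 0, found by brute force."""
    for y in range(1, Q + 1):
        lo = isqrt(2 * y * y)              # floor(sqrt(2) * y)
        for x in (lo, lo + 1):             # the two integers nearest to sqrt(2) * y
            if close_to_sqrt2(x, y, Q):
                g = gcd(x, y)
                return x // g, y // g
    raise AssertionError("excluded by Theorem 1.1")

# Distinct points (x_Q, y_Q) obtained while Q grows, sorted by denominator.
points = sorted({dirichlet_point(Q) for Q in range(3, 200)}, key=lambda p: p[1])
for x, y in points:
    # Each point satisfies |sqrt(2) - x/y| <= y^(-2), i.e. |x - sqrt(2)*y| <= 1/y.
    print(x, y, close_to_sqrt2(x, y, y))
```

As $Q$ grows the search is forced to produce new pairs, which mirrors the argument that the points $(x_Q, y_Q)$ cannot stay fixed.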
1.2
Approximation of algebraic numbers by rational numbers
In general, for given $\xi \in \mathbb{R}$ one may ask whether Theorem 1.2 remains true if we replace $y^{-2}$ by a smaller function, say $y^{-2-\delta}$ with $\delta > 0$. That is, we may ask whether the inequality

(1.2) $|\xi - x/y| \le y^{-2-\delta}$ in $x, y \in \mathbb{Z}$ with $y > 0$, $\gcd(x, y) = 1$

has infinitely many solutions. It is certainly possible to construct numbers $\xi$ for which this is the case, as is shown by the exercise below.

Exercise 1.2. Let $a$ be an integer $> 3$ and put $\xi := \sum_{n=0}^{\infty} 10^{-a^{2n}}$. Then the inequality $|\xi - x/y| \le y^{-a}$ has infinitely many solutions in integers $x, y$ with $y > 0$, $\gcd(x, y) = 1$.

The number $\xi$ constructed in this exercise seems very artificial. One may wonder whether there are "reasonable" numbers $\xi$ for which (1.2) has infinitely many solutions for some $\delta > 0$. A famous and difficult theorem by K.F. Roth (1955) states that this is not the case if $\xi$ is algebraic. Recall that a number $\xi$ is called algebraic if there is a non-zero polynomial $P \in \mathbb{Q}[X]$ with $P(\xi) = 0$.

Theorem 1.4 (Roth, 1955). Let $\xi \in \mathbb{R}$ be an algebraic number and let $\delta > 0$. Then the inequality
$$|\xi - x/y| \le y^{-2-\delta} \text{ in } x, y \in \mathbb{Z} \text{ with } y > 0,\ \gcd(x, y) = 1$$
has only finitely many solutions.

The proof of this result is too long to be included in this course. We will give some applications of this result to Diophantine equations.

Likewise, one may ask whether Theorem 1.3 is best possible. In case that $\xi_1, \ldots, \xi_n$ are all real algebraic numbers we have the following famous result of W.M. Schmidt. A set of numbers $\{\xi_1, \ldots, \xi_n\}$ in $\mathbb{C}$ is said to be linearly independent over $\mathbb{Q}$ if
$$\{(x_1, \ldots, x_n) \in \mathbb{Q}^n : x_1\xi_1 + \cdots + x_n\xi_n = 0\} = \{(0, \ldots, 0)\}.$$
Theorem 1.5 (Schmidt, 1971). Let $\xi_1, \ldots, \xi_n$ be algebraic numbers in $\mathbb{R}$ such that $\{1, \xi_1, \ldots, \xi_n\}$ is linearly independent over $\mathbb{Q}$. Further, let $\delta > 0$. Then there are only finitely many tuples $(x_1, \ldots, x_n, y) \in \mathbb{Z}^{n+1}$ such that $y > 0$, $\gcd(x_1, \ldots, x_n, y) = 1$, and
$$|\xi_1 - x_1/y| \le y^{-1-(1/n)-\delta},\ \ldots,\ |\xi_n - x_n/y| \le y^{-1-(1/n)-\delta}.$$

Theorem 1.5 is a consequence of a far more general, very central result in Diophantine approximation, the Subspace Theorem. This theorem is too difficult to be stated in this introduction, but we will discuss it later. The Subspace Theorem has many consequences, in particular to Diophantine equations and inequalities, but also to other areas in number theory.
1.3
Transcendence
Recall that a number $\xi \in \mathbb{C}$ is transcendental (over $\mathbb{Q}$) if it is not algebraic, i.e., there is no non-zero $P \in \mathbb{Q}[X]$ with $P(\xi) = 0$. The following counting argument implies that transcendental numbers exist.

Theorem 1.6. (i) $\mathbb{R}$ is uncountable.
(ii) The set of algebraic numbers in $\mathbb{C}$ is countable.

Proof. (i) We use Cantor's diagonal argument. It suffices to prove that the open interval $]0, 1[\ = \{x \in \mathbb{R} : 0 < x < 1\}$ is uncountable. We have to prove that $]0, 1[\ \setminus T \ne \emptyset$ for any countable subset $T$ of $]0, 1[$. Take an arbitrary countable subset $T$, and represent its numbers by their decimal expansions. The assumption that $T$ is countable means that its elements can be arranged in a sequence, say
$$0.x_{11}x_{12}x_{13}\ldots$$
$$0.x_{21}x_{22}x_{23}\ldots$$
$$0.x_{31}x_{32}x_{33}\ldots$$
$$\vdots$$
There is $0.y_1y_2y_3\ldots \in\ ]0, 1[$ with $y_i \ne x_{ii}$ for all $i \ge 1$ (choosing each $y_i$ different from $0$ and $9$ avoids ambiguous expansions), and this number clearly does not belong to $T$.

(ii) A number $\xi \in \mathbb{C}$ is algebraic if there is a non-zero $F \in \mathbb{Z}[X]$ such that $F(\xi) = 0$. For $F = a_0X^r + a_1X^{r-1} + \cdots + a_r \in \mathbb{Z}[X]$ we put $S(F) := \max(r, |a_0|, \ldots, |a_r|)$. Note that for a given value of $S$ there are only finitely many $F \in \mathbb{Z}[X]$ with $S(F) = S$, and each of these $F$ has only finitely many zeros in $\mathbb{C}$. Now order the algebraic numbers in a sequence as follows: first take all algebraic numbers which are zeros of polynomials $F \in \mathbb{Z}[X]$ with $S(F) = 1$, then take the algebraic numbers not considered so far that are zeros of polynomials $F \in \mathbb{Z}[X]$ with $S(F) = 2$, and so on. In this way, we eventually obtain all algebraic numbers, and thus they can be arranged in a sequence.

It is of course a much more interesting (and difficult) problem whether numbers "from nature" such as $e$ and $\pi$ are transcendental. In 1873, Hermite proved that $e$ is transcendental, and in 1882 Lindemann did the same for $\pi$. In fact, Lindemann proved the following result, which covers both. Here we define $e^z := \sum_{n=0}^{\infty} z^n/n!$ for $z \in \mathbb{C}$.

Theorem 1.7. Let $\alpha \in \mathbb{C}$ be a non-zero algebraic number. Then $e^{\alpha}$ is transcendental.

To deduce from this that $\pi$ is transcendental, assume that it is algebraic. Then $\pi i$ would be algebraic, but $e^{\pi i} = -1$ is not transcendental, contradicting Lindemann's Theorem.

In our course we will discuss also other transcendence results. To give some flavour, we finish with a proof that $e$ is irrational. We start with a simple but useful irrationality criterion.

Lemma 1.8. Let $\xi \in \mathbb{R}$. Assume there is a sequence of pairs of integers $(x_n, y_n)$ with $y_n > 0$ such that $x_n/y_n \ne \xi$ and $|x_n - \xi y_n| \to 0$ as $n \to \infty$. Then $\xi \notin \mathbb{Q}$.

Proof. Assume that $\xi \in \mathbb{Q}$. Then $\xi = a/b$ with $a, b \in \mathbb{Z}$, $b > 0$. Then for any pair of integers $x, y$ with $y > 0$, $x/y \ne \xi$ we have
$$|x - \xi y| = \frac{|bx - ay|}{b} \ge \frac{1}{b}$$
since the numerator is a non-zero integer. Hence a sequence of pairs $(x_n, y_n)$ as in the statement of the lemma cannot exist.
Theorem 1.9. $e \notin \mathbb{Q}$.

Proof. We use the identity $e = \sum_{k=0}^{\infty} 1/k!$. We apply Lemma 1.8 with $y_n = n!$ and $x_n = n! \sum_{k=0}^{n} \frac{1}{k!}$. Then
$$|x_n - e y_n| = n! \sum_{k=n+1}^{\infty} \frac{1}{k!},$$
hence
$$0 < |x_n - e y_n| = \frac{1}{n+1} + \frac{1}{(n+1)(n+2)} + \frac{1}{(n+1)(n+2)(n+3)} + \cdots < \frac{1}{n+1} + \frac{1}{(n+1)^2} + \frac{1}{(n+1)^3} + \cdots = \frac{1}{n} \to 0 \text{ as } n \to \infty.$$
This proves that $e \notin \mathbb{Q}$.

Exercise 1.3. Complete the following irrationality proof for $\pi$ (attributed to Cartwright, 1945). Assume that $\pi = a/b$ with $a, b \in \mathbb{Z}_{>0}$, $\gcd(a, b) = 1$.

(i) Define
$$I_n := \int_{-1}^{1} (1 - x^2)^n \cos(\tfrac{1}{2}\pi x)\, dx \quad \text{for } n = 0, 1, 2, \ldots.$$
Prove that
$$I_0 = \frac{4}{\pi}, \quad I_1 = \frac{32}{\pi^3}, \quad I_n = \frac{8n}{\pi^2}\Big((2n - 1)I_{n-1} - (2n - 2)I_{n-2}\Big) \text{ for } n \ge 2.$$

(ii) Prove that $\dfrac{a^{2n+1}}{n!} \cdot I_n \in \mathbb{Z}$ for $n \ge 0$.

(iii) Prove that $0 < \dfrac{a^{2n+1}}{n!} \cdot I_n < 1$ for $n$ sufficiently large, and deduce a contradiction.
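The estimate used in the proof of Theorem 1.9 can be checked with exact rational arithmetic. The sketch below is only an illustration: the helper `tail_gap` is a hypothetical name, and it sums finitely many terms of the tail $n! \sum_{k > n} 1/k!$, so it yields a lower approximation of $|x_n - e y_n|$ rather than the exact value. Since every truncated sum is below the full tail, it must still lie strictly between $1/(n+1)$ (its first term) and $1/n$.

```python
from fractions import Fraction
from math import factorial

def tail_gap(n, terms=40):
    """Partial sum of n! * sum_{k=n+1}^{infinity} 1/k!, using `terms` terms (at least 2)."""
    return sum(Fraction(factorial(n), factorial(k))
               for k in range(n + 1, n + 1 + terms))

for n in (1, 2, 5, 10, 50):
    gap = tail_gap(n)
    # The bounds 1/(n+1) < gap < 1/n force |x_n - e*y_n| to tend to 0 without being 0.
    print(n, float(gap), Fraction(1, n + 1) < gap < Fraction(1, n))
```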
Chapter 2
Geometry of numbers

Literature:
W.M. Schmidt, Diophantine approximation, Lecture Notes in Mathematics 785, Springer Verlag 1980, Chap. II, §§1,2, Chap. IV, §1
J.W.S. Cassels, An Introduction to the Geometry of Numbers, Springer Verlag 1997, Classics in Mathematics series, reprint of the 1971 edition
C.L. Siegel, Lectures on the Geometry of Numbers, Springer Verlag 1989
2.1
Introduction
Geometry of numbers is concerned with the study of lattice points in certain bodies in $\mathbb{R}^n$, where $n \ge 2$. We discuss Minkowski's theorems on lattice points in central symmetric convex bodies. In this introduction we give the necessary definitions.

Lattices. A (full) lattice in $\mathbb{R}^n$ is an additive group
$$L = \{z_1v_1 + \cdots + z_nv_n : z_1, \ldots, z_n \in \mathbb{Z}\}$$
where $\{v_1, \ldots, v_n\}$ is a basis of $\mathbb{R}^n$, i.e., $\{v_1, \ldots, v_n\}$ is linearly independent. We call $\{v_1, \ldots, v_n\}$ a basis of $L$. The determinant of $L$ is defined by
$$d(L) := |\det(v_1, \ldots, v_n)|,$$
that is, the absolute value of the determinant of the matrix with columns $v_1, \ldots, v_n$.

We show that the determinant of a lattice does not depend on the choice of the basis. Recall that $GL(n, \mathbb{Z})$ is the multiplicative group of $n \times n$-matrices with entries in $\mathbb{Z}$ and determinant $\pm 1$.

Lemma 2.1. Let $L$ be a lattice, and $\{v_1, \ldots, v_n\}$, $\{w_1, \ldots, w_n\}$ two bases of $L$. Then there is a matrix $U = (u_{ij}) \in GL(n, \mathbb{Z})$ such that

(2.1) $w_i = \sum_{j=1}^{n} u_{ij}v_j$ for $i = 1, \ldots, n$.

Consequently, $|\det(v_1, \ldots, v_n)| = |\det(w_1, \ldots, w_n)|$.

Proof. Let $U$ be the matrix expressing $w_1, \ldots, w_n$ in terms of $v_1, \ldots, v_n$, that is, the matrix given by (2.1). A priori, $U$ is just a non-singular matrix, but since $w_1, \ldots, w_n$ lie in the lattice generated by $v_1, \ldots, v_n$, it must have its entries in $\mathbb{Z}$.

Let $U^{-1} = (u'_{ij})$. Then from linear algebra we know that $v_i = \sum_{j=1}^{n} u'_{ij}w_j$ for $i = 1, \ldots, n$. Now $U^{-1}$ has its entries in $\mathbb{Z}$ since $v_1, \ldots, v_n$ lie in the lattice generated by $w_1, \ldots, w_n$. Since both $\det U$ and $\det U^{-1}$ are integers and must be multiplicative inverses of one another, we have $\det U = \pm 1$, i.e., $U \in GL(n, \mathbb{Z})$.

Finally, we observe that
$$|\det(w_1, \ldots, w_n)| = |\det U| \cdot |\det(v_1, \ldots, v_n)| = |\det(v_1, \ldots, v_n)|.$$
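Lemma 2.1 can be illustrated on a small example in $\mathbb{R}^2$. In the sketch below, the basis vectors and the matrix `U` are assumptions picked only for illustration; `det2` is a hypothetical helper. A second basis is built from the first through (2.1) with an integer matrix of determinant $1$, and both bases give the same value of $d(L)$.

```python
def det2(a, b):
    """Determinant of the 2x2 matrix with columns a and b."""
    return a[0] * b[1] - a[1] * b[0]

v = [(2, 1), (1, 3)]            # basis v1, v2 of a lattice L in R^2 (example data)
U = [(1, 1), (2, 3)]            # integer matrix with det = 1*3 - 1*2 = 1, so U is in GL(2, Z)
U_inv = [(3, -1), (-2, 1)]      # its inverse, again with integer entries

def combine(M, basis):
    """Return the vectors sum_j M[i][j] * basis[j] for i = 0, 1, as in (2.1)."""
    return [tuple(M[i][0] * basis[0][k] + M[i][1] * basis[1][k] for k in range(2))
            for i in range(2)]

w = combine(U, v)               # second basis w1, w2 of the same lattice
print(w, abs(det2(*v)), abs(det2(*w)))
print(combine(U_inv, w) == v)   # v is recovered from w with integer coefficients
```

Because both `U` and `U_inv` have integer entries, each basis lies in the lattice generated by the other, which is exactly the situation in the proof.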
Let $L, M$ be two lattices in $\mathbb{R}^n$ with $M \subseteq L$. Choose bases $\{v_1, \ldots, v_n\}$ of $L$, $\{w_1, \ldots, w_n\}$ of $M$. Let $U = (u_{ij})$ be the matrix expressing $w_1, \ldots, w_n$ in terms of $v_1, \ldots, v_n$. Then $U$ has its entries in $\mathbb{Z}$. We define the index of $M$ in $L$ by
$$[L : M] := |\det U|.$$
The relation $\det(w_1, \ldots, w_n) = \det U \cdot \det(v_1, \ldots, v_n)$ easily translates into
$$d(M) = [L : M] \cdot d(L)$$
and this shows that $[L : M]$ does not depend on the choices of the bases of $L$ and $M$.

Convex bodies. Recall that a subset $C$ of $\mathbb{R}^n$ is convex if for any two points $x, y \in C$, also the line segment connecting them, i.e., $\{tx + (1 - t)y : 0 \le t \le 1\}$, is contained in $C$. A central symmetric convex body in $\mathbb{R}^n$ is a closed, bounded, convex subset $C$ of $\mathbb{R}^n$ having $0$ as an interior point, and which is symmetric about $0$, i.e., if $x \in C$ then also $-x \in C$.

Examples. (i). If $C$ is a central symmetric convex body, then so is its dilation with factor $\lambda > 0$, $\lambda C := \{\lambda x : x \in C\}$.
(ii). If $C$ is a central symmetric convex body and $\varphi$ an invertible linear transformation of $\mathbb{R}^n$, then $\varphi(C)$ is also a central symmetric convex body.
(iii). Parallelepipeds and ellipsoids: Let
$$K_n = \{x \in \mathbb{R}^n : \max_{1 \le i \le n} |x_i| \le 1\}, \qquad B_n = \{x \in \mathbb{R}^n : x_1^2 + \cdots + x_n^2 \le 1\}$$
be the $n$-dimensional unit cube and the $n$-dimensional Euclidean unit ball, respectively, where $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$. A (symmetric) parallelepiped in $\mathbb{R}^n$ is the image of $K_n$ under an invertible linear transformation of $\mathbb{R}^n$, and a (symmetric) ellipsoid in $\mathbb{R}^n$ the image of $B_n$ under an invertible linear transformation of $\mathbb{R}^n$. Both are central symmetric convex bodies.
Exercise 2.1. Recall that a norm on $\mathbb{R}^n$ is a function $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}_{\ge 0}$ such that
• $\|\lambda x\| = |\lambda| \cdot \|x\|$ for all $x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}$;
• $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in \mathbb{R}^n$;
• $\|x\| = 0 \iff x = 0$.
Prove that the unit ball $B_{\|\cdot\|} = \{x \in \mathbb{R}^n : \|x\| \le 1\}$ is a central symmetric convex body.
Hint. You may use that all norms on $\mathbb{R}^n$ induce the usual topology, that is, a subset $U$ of $\mathbb{R}^n$ is open if and only if for every $x_0 \in U$ there is $\delta > 0$ such that $\{x \in \mathbb{R}^n : \|x - x_0\| < \delta\} \subseteq U$.

In fact, every central symmetric convex body arises from a norm. For a central symmetric convex body $C$ we define on $\mathbb{R}^n$ the function
$$\|x\|_C := \min\{\lambda \in \mathbb{R}_{\ge 0} : x \in \lambda C\}.$$

Lemma 2.2. (i) $\|\cdot\|_C$ is well defined.
(ii) $\|\cdot\|_C$ defines a norm on $\mathbb{R}^n$.
(iii) $\lambda C = \{x \in \mathbb{R}^n : \|x\|_C \le \lambda\}$ for $\lambda > 0$.

Proof. (i). Clearly, $\|0\|_C = 0$. Suppose $x \ne 0$. Consider the set $S := \{\xi x : \xi \in \mathbb{R}_{\ge 0},\ \xi x \in C\}$. Then $S \ne \{0\}$, for since $0$ is an interior point of $C$ there is $\delta > 0$ with $\delta x \in C$. Further, since $C$ is compact and $S$ is a closed subset of $C$, the set $S$ is compact. $S$ is homeomorphic to $I = \{\xi \in \mathbb{R}_{\ge 0} : \xi x \in C\}$. Hence $I$ is compact. So $I$ has a maximum, call it $\mu_0$. Then $\mu_0 > 0$ since $S \ne \{0\}$.

We claim that $\|x\|_C = \mu_0^{-1}$. First, $\mu_0 x \in C$, hence $x \in \mu_0^{-1}C$. Further, if $x \in \lambda C$, then $\lambda^{-1}x \in C$, hence $\lambda^{-1} \le \mu_0$. So $\lambda \ge \mu_0^{-1}$. This proves our claim, hence (i).

(ii). We have shown above that $\|x\|_C = \mu_0^{-1} > 0$ if $x \ne 0$. The proofs of the other two norm properties are left to the reader.

(iii). Left to the reader.

Exercise 2.2. Prove (ii) and (iii).
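The definition of $\|x\|_C$ can be made concrete numerically. The following is a rough sketch, not a method from the text: it assumes $C$ is given by a membership test `contains` (a hypothetical interface), that $x \in \texttt{hi} \cdot C$ for the starting value `hi`, and it approximates the minimum by bisection in floating point. The three bodies used are the cube $K_2$, the region $|x_1| + |x_2| \le 1$, and the ball $B_2$.

```python
def gauge(x, contains, hi=1e6, tol=1e-12):
    """Approximate ||x||_C = min{lam >= 0 : x in lam*C} by bisection.

    `contains(p)` tests membership of p in C; x lies in lam*C iff contains(x / lam).
    Assumes x lies in hi*C. Convexity and 0 in C make membership monotone in lam.
    """
    if all(c == 0 for c in x):
        return 0.0
    lo = 0.0
    while hi - lo > tol * max(1.0, hi):
        mid = (lo + hi) / 2
        if contains([c / mid for c in x]):
            hi = mid            # x is in mid*C, so the minimum is at most mid
        else:
            lo = mid            # x is not in mid*C, so the minimum exceeds mid
    return hi

def in_cube(p):
    return max(abs(c) for c in p) <= 1

def in_octahedron(p):
    return sum(abs(c) for c in p) <= 1

def in_ball(p):
    return sum(c * c for c in p) <= 1

x = (3.0, -4.0)
print(gauge(x, in_cube), gauge(x, in_octahedron), gauge(x, in_ball))
```

For these bodies the function recovers, up to rounding, the maximum norm, the sum norm and the Euclidean norm of $x$, in line with Lemma 2.2 (iii).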
2.2
Minkowski’s first convex body theorem
Using Lebesgue theory, one can define an $n$-dimensional volume $\mathrm{vol}(S) \in \mathbb{R}_{\ge 0} \cup \{\infty\}$ (the so-called Lebesgue measure) for subsets $S$ of $\mathbb{R}^n$ from a large class, the so-called measurable subsets of $\mathbb{R}^n$. We do not need the precise definition of Lebesgue measure or measurable set. What is important to us is that all open sets and all closed sets are measurable, bounded measurable sets have finite volume, and the empty set has volume $0$. The volume of $S$ is equal to the Riemann integral $\int_S dx_1 \cdots dx_n$ for every set $S$ for which this integral is defined. However, there are measurable sets $S$ for which the Riemann integral is not defined.

We mention some important properties of the volume:

1. Let $S$ be a measurable subset of $\mathbb{R}^n$. Then every translate $a + S := \{a + x : x \in S\}$ is also measurable and $\mathrm{vol}(a + S) = \mathrm{vol}(S)$. Further, if $\varphi$ is a linear transformation of $\mathbb{R}^n$, then $\varphi(S)$ is measurable and $\mathrm{vol}(\varphi(S)) = |\det \varphi| \cdot \mathrm{vol}(S)$.

2. Let $S \subset \mathbb{R}^n$ be measurable. Then $S^c = \mathbb{R}^n \setminus S$ is measurable.

3. Let $S_n$ ($n = 1, 2, 3, \ldots$) be a countable collection of measurable subsets of $\mathbb{R}^n$. Then $S = \bigcup_{n=1}^{\infty} S_n$ is measurable. Moreover, if the sets $S_n$ are pairwise disjoint, then $\mathrm{vol}(S) = \sum_{n=1}^{\infty} \mathrm{vol}(S_n)$.
Theorem 2.3 (Minkowski's first convex body theorem, 1896). Let $C$ be a central symmetric convex body in $\mathbb{R}^n$ and $L$ a lattice in $\mathbb{R}^n$ of rank $n$. Suppose that
$$\mathrm{vol}(C) \ge 2^n d(L).$$
Then $C$ contains a point from $L \setminus \{0\}$.

Choose a basis $\{v_1, \ldots, v_n\}$ of $L$. We call
$$F := \{x_1v_1 + \cdots + x_nv_n : x_i \in \mathbb{R},\ 0 \le x_i < 1 \text{ for } i = 1, \ldots, n\}$$
a fundamental parallelepiped for $L$. Notice that $F$ has volume $d(L)$, the translates $u + F$ ($u \in L$) are pairwise disjoint, and
$$\mathbb{R}^n = \bigcup_{u \in L} (u + F).$$

Proof. We first assume that $\mathrm{vol}(C) > 2^n d(L)$. Then the set $\frac{1}{2}C = \{\frac{1}{2}x : x \in C\}$ has volume $> d(L)$. For $u \in L$, define $S_u := \frac{1}{2}C \cap (u + F)$. Then the sets $S_u$ ($u \in L$) are pairwise disjoint and their union is precisely $\frac{1}{2}C$. Hence
$$\sum_{u \in L} \mathrm{vol}(S_u) = \mathrm{vol}(\tfrac{1}{2}C) > d(L).$$
We shift the sets $S_u$ into $F$, that is, we define
$$S_u^* := -u + S_u = (-u + \tfrac{1}{2}C) \cap F \quad \text{for } u \in L.$$
Since $S_u^*$ has the same volume as $S_u$, we have
$$\sum_{u \in L} \mathrm{vol}(S_u^*) = \sum_{u \in L} \mathrm{vol}(S_u) > d(L) = \mathrm{vol}(F).$$
That is, we have a collection of subsets $S_u^*$ ($u \in L$) of $F$, the sum of whose volumes is larger than the volume of $F$. So there are two distinct $u, v \in L$ such that $S_u^* \cap S_v^* \ne \emptyset$. Pick a point $a \in S_u^* \cap S_v^*$. Then for certain $x, y \in \frac{1}{2}C$ we have $x - u = y - v = a$. Hence $x - y = u - v \in L \setminus \{0\}$.

Now $2x, 2y \in C$, by the symmetry of $C$ we have $-2y \in C$, and by the convexity of $C$ we have $\frac{1}{2}(2x - 2y) = x - y \in C$. This shows that $C$ contains a non-zero point from $L$.

Now assume that $\mathrm{vol}(C) = 2^n d(L)$. Suppose that $C$ does not contain a non-zero point from $L$. Then since $C$ is compact and $L$ is discrete, there is $\lambda > 1$ such that $\lambda C$ does not contain a non-zero point from $L$. But this contradicts what has been established above, since $\mathrm{vol}(\lambda C) = \lambda^n \mathrm{vol}(C) > 2^n d(L)$.

Exercise 2.3. Prove the following theorem of Blichfeldt. Let $S$ be a measurable, not necessarily convex, subset of $\mathbb{R}^n$ with $\mathrm{vol}(S) > d(L)$. Then there are $x, y \in S$ with $x \ne y$ and $x - y \in L$.

We give some consequences.

Corollary 2.4. Let $l_i = \alpha_{i1}X_1 + \cdots + \alpha_{in}X_n$ ($i = 1, \ldots, n$) be linear forms with real coefficients and with $\det(l_1, \ldots, l_n) \ne 0$. Let $A_1, \ldots, A_n$ be positive reals with
$$A_1 \cdots A_n \ge |\det(l_1, \ldots, l_n)|.$$
Then there is a non-zero $x \in \mathbb{Z}^n$ with
$$|l_1(x)| \le A_1,\ \ldots,\ |l_n(x)| \le A_n.$$

Proof. Define
$$C = \{x \in \mathbb{R}^n : |l_i(x)| \le A_i\ (i = 1, \ldots, n)\}, \qquad R := \{y = (y_1, \ldots, y_n) \in \mathbb{R}^n : |y_i| \le A_i\ (i = 1, \ldots, n)\}.$$
Then $R = \varphi(C)$, where $\varphi(x) = (l_1(x), \ldots, l_n(x))$. Notice that $\varphi$ is a linear transformation of $\mathbb{R}^n$ of determinant $\det(l_1, \ldots, l_n)$. Since applying a linear transformation $\varphi$ to a set has the effect that the volume of that set is multiplied by $|\det \varphi|$, we have $\mathrm{vol}(R) = |\det \varphi| \, \mathrm{vol}(C)$, hence
$$\mathrm{vol}(C) = |\det \varphi|^{-1} \mathrm{vol}(R) = |\det(l_1, \ldots, l_n)|^{-1} \, 2^n A_1 \cdots A_n \ge 2^n.$$
Now it follows at once from Theorem 2.3, applied with $L = \mathbb{Z}^n$, that $C$ contains a non-zero point from $\mathbb{Z}^n$.
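Corollary 2.4 only asserts existence, but in small cases a solution can be found by direct search. The sketch below uses two linear forms and bounds $A_1, A_2$ that are assumptions made up for this illustration, with $A_1 A_2 = 5 \ge |\det(l_1, l_2)| \approx 4.89$. The search box size `B` is also an assumption: for these particular forms the body $C$ fits inside $|x_1|, |x_2| \le 30$. Floating-point arithmetic is used, so borderline cases would need more care.

```python
from itertools import product
from math import sqrt

# Rows (alpha_i1, alpha_i2) of two linear forms l_1, l_2 in X1, X2 (example data).
forms = [(sqrt(2), sqrt(3)), (1.0, -sqrt(5))]
A = (0.1, 50.0)

det = forms[0][0] * forms[1][1] - forms[0][1] * forms[1][0]
assert A[0] * A[1] >= abs(det)      # hypothesis of Corollary 2.4

def value(form, x):
    """Evaluate the linear form at the integer point x."""
    return form[0] * x[0] + form[1] * x[1]

B = 30  # search box; large enough to contain C for this example
solutions = [x for x in product(range(-B, B + 1), repeat=2)
             if x != (0, 0)
             and all(abs(value(l, x)) <= a for l, a in zip(forms, A))]
print(solutions)
```

The list is symmetric under $x \mapsto -x$, reflecting the central symmetry of $C$.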
We recall some well-known results of Dirichlet from 1842, concerning the approximation of real numbers by rational numbers. Dirichlet proved these results using his own box principle (if $m$ objects are put into $n$ boxes where $m > n$ then two objects are in the same box). Dirichlet's results can be obtained alternatively from the above corollary.

Corollary 2.5 (Dirichlet, 1842). Let $\xi$ be a real irrational number. Then there are infinitely many pairs of integers $(x, y)$ with $\gcd(x, y) = 1$, $y > 0$ and
$$\left|\xi - \frac{x}{y}\right| \le y^{-2}.$$

This has been proved in the introduction. We deduce some generalizations.

Corollary 2.6 (Dirichlet, 1842). (i) Let $\xi_1, \ldots, \xi_n$ be real numbers, at least one of which is irrational. Then there are infinitely many tuples of integers $(x_1, \ldots, x_n, y)$ with $\gcd(x_1, \ldots, x_n, y) = 1$, $y > 0$ and

(2.2) $\left|\xi_i - \dfrac{x_i}{y}\right| \le y^{-1-1/n}$ for $i = 1, \ldots, n$.

(ii) Let $\xi_1, \ldots, \xi_n$ be real numbers such that $1, \xi_1, \ldots, \xi_n$ are linearly independent over $\mathbb{Q}$. Then there are infinitely many tuples of integers $(x, y_1, \ldots, y_n)$ with $(y_1, \ldots, y_n) \ne (0, \ldots, 0)$ and
$$|\xi_1y_1 + \cdots + \xi_ny_n - x| \le \max(|y_1|, \ldots, |y_n|)^{-n}.$$

Proof. We prove only (i). Notice that (2.2) is equivalent to

(2.3) $|x_i - \xi_iy| \le y^{-1/n}$ for $i = 1, \ldots, n$.

The idea is to consider instead of (2.2) the system of inequalities

(2.4) $|x_i - \xi_iy| \le Q^{-1/n}$ ($i = 1, \ldots, n$), $|y| \le Q$,

for any integer $Q > 1$, and to let $Q$ vary. If $(x_1, \ldots, x_n, y)$ is a tuple of integers satisfying this system with $y > 0$, then $Q^{-1/n} \le y^{-1/n}$, so we obtain a solution of (2.3) and hence of (2.2).

Notice that the system of linear forms $X_1 - \xi_1X_{n+1}, \ldots, X_n - \xi_nX_{n+1}, X_{n+1}$ has determinant $1$, and that the bounds in (2.4) have product $(Q^{-1/n})^n \cdot Q = 1$. So by Corollary 2.4, for every integer $Q > 1$ there is a non-zero tuple of integers $(x_1, \ldots, x_n, y)$ satisfying (2.4). If $y = 0$ then $|x_i| \le Q^{-1/n} < 1$ for all $i$, so $x_1 = \cdots = x_n = 0$, which is impossible. Hence $y \ne 0$. By changing signs and dividing out the gcd of $x_1, \ldots, x_n, y$ if necessary, we obtain a solution $\mathbf{x}_Q = (x_1, \ldots, x_n, y)$ of (2.4) with $y > 0$ and $\gcd(x_1, \ldots, x_n, y) = 1$.

We claim that if we let $Q \to \infty$, then $\mathbf{x}_Q$ runs through an infinite set. Indeed, suppose the contrary. Then there is an infinite sequence of integers $Q_j \to \infty$ such that for each $Q_j$ the point $\mathbf{x}_{Q_j}$ is equal to some fixed tuple of integers $\mathbf{x} = (x_1, \ldots, x_n, y)$ independent of $j$. But then $|x_i - \xi_iy| \le Q_j^{-1/n}$ for all $j$, hence $\xi_i = x_i/y \in \mathbb{Q}$ for $i = 1, \ldots, n$, against our assumption.

Now replacing in (2.4) the upper bound $Q^{-1/n}$ by $y^{-1/n}$ as suggested above, we obtain infinitely many solutions of (2.2).

Exercise 2.4. Prove the following common generalization of both (i) and (ii). Let $m, n$ be positive integers and
$$l_i = \xi_{i1}X_1 + \cdots + \xi_{in}X_n \quad (i = 1, \ldots, m)$$
linear forms with real coefficients satisfying
$$\{y \in \mathbb{Z}^n : l_i(y) \in \mathbb{Z} \text{ for } i = 1, \ldots, m\} = \{0\}.$$
Then there are infinitely many tuples $(x, y)$, with $x = (x_1, \ldots, x_m) \in \mathbb{Z}^m$, $y = (y_1, \ldots, y_n) \in \mathbb{Z}^n$, $y \ne 0$, such that
$$|l_i(y) - x_i| \le \Big(\max_{1 \le j \le n} |y_j|\Big)^{-n/m} \quad \text{for } i = 1, \ldots, m.$$
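Part (i) of Corollary 2.6 with $n = 2$ says that $\max(\|\xi_1 y\|, \|\xi_2 y\|) \le y^{-1/2}$ for infinitely many positive integers $y$, where $\|\cdot\|$ denotes the distance to the nearest integer. The short search below is an illustration only, assuming $\xi_1 = \sqrt{2}$ and $\xi_2 = \sqrt{3}$ and a search range up to $2000$; `dist_to_int` is a hypothetical helper and the computation is done in floating point.

```python
from math import sqrt

def dist_to_int(t):
    """Distance from the real number t to the nearest integer."""
    return abs(t - round(t))

xi1, xi2 = sqrt(2), sqrt(3)

# All y up to 2000 for which both xi1*y and xi2*y are within y^(-1/2) of an integer.
good = [y for y in range(1, 2001)
        if max(dist_to_int(xi1 * y), dist_to_int(xi2 * y)) <= y ** -0.5]
print(good)
```

Small values of $y$ qualify almost trivially because the bound $y^{-1/2}$ is weak there; the content of the corollary is that the list never stops growing as the range is extended.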
Remarks. 1. Corollary 2.5 has been improved by Hurwitz (1891) as follows. If $\xi$ is any irrational real number, then there are infinitely many pairs of integers $(x, y) \in \mathbb{Z}^2$ such that
$$|\xi - x/y| \le \frac{1}{\sqrt{5}}\, y^{-2}, \quad y > 0,\ \gcd(x, y) = 1.$$
A second result of Hurwitz states that there are $\xi$ (e.g. $\xi = \frac{1}{2}(1 + \sqrt{5})$) for which the constant $1/\sqrt{5}$ is optimal, that is, for every $A < 1/\sqrt{5}$, the inequality $|\xi - x/y| \le Ay^{-2}$ has only finitely many solutions in pairs $(x, y) \in \mathbb{Z}^2$ with $y > 0$. Hurwitz' proofs are based on the theory of continued fractions (not to be discussed in this course).

2. Given a real number $\theta$, we denote by $\|\theta\|$ the distance of $\theta$ to the nearest integer, i.e., $\|\theta\| = \min\{|\theta - m| : m \in \mathbb{Z}\}$. Corollary 2.6 implies that for any two real numbers $\xi_1, \xi_2$, not both in $\mathbb{Q}$, there are infinitely many positive integers $y$ such that
$$\|\xi_1y\| \le y^{-1/2}, \quad \|\xi_2y\| \le y^{-1/2}.$$
This implies that there are infinitely many positive integers $y$ such that
$$y\,\|\xi_1y\| \cdot \|\xi_2y\| \le 1.$$
In fact, this is true also if both $\xi_1, \xi_2 \in \mathbb{Q}$, since then there are infinitely many integers $y$ with $\|\xi_1y\| = 0$, $\|\xi_2y\| = 0$. The following famous conjecture, due to Littlewood, is still open:

Littlewood's Conjecture. Let $\xi_1, \xi_2$ be any two real numbers. Then for every $\varepsilon > 0$ there exists a positive integer $y$ such that
$$y\,\|\xi_1y\| \cdot \|\xi_2y\| < \varepsilon.$$

Note that $\|x\| \le \frac{1}{2}$ for every $x \in \mathbb{R}$. So Littlewood's conjecture would imply also that for any $n \ge 3$ reals $\xi_1, \ldots, \xi_n$, and for any $\varepsilon > 0$ there is a positive integer $y$ with
$$y\,\|\xi_1y\| \cdots \|\xi_ny\| < \varepsilon.$$

Exercise 2.5. Deduce from Hurwitz' second result in Remark 1 that there are $\xi \in \mathbb{R}$ for which there is a constant $c(\xi) > 0$ such that $y \cdot \|\xi y\| \ge c(\xi)$ for all $y \in \mathbb{Z}_{>0}$ (that is, there is no one-dimensional analogue of Littlewood's Conjecture).
2.3
Minkowski’s second convex body theorem
Let $L$ be a lattice in $\mathbb{R}^n$ and $C$ a central symmetric convex body in $\mathbb{R}^n$.

Definition. The $n$ successive minima $\lambda_1, \ldots, \lambda_n$ of $C$ with respect to $L$ are defined as follows: $\lambda_i$ is the minimum of all positive reals $\lambda$ such that $\lambda C$ contains at least $i$ linearly independent points from $L$.

Lemma 2.7. The successive minima $\lambda_1, \ldots, \lambda_n$ of $C$ with respect to $L$ are well-defined, and $0 < \lambda_1 \le \cdots \le \lambda_n < \infty$.

Proof. Let $\|\cdot\|_C$ be the norm associated with $C$, defined by $\|x\|_C = \min\{\lambda \in \mathbb{R}_{\ge 0} : x \in \lambda C\}$. Recall that $\lambda C = \{x \in \mathbb{R}^n : \|x\|_C \le \lambda\}$.

We can order the points of $L$ as a sequence $x_0 = 0, x_1, x_2, \ldots$ such that $0 = \|x_0\|_C < \|x_1\|_C \le \|x_2\|_C \le \cdots$. To see this, consider for each positive integer $m$ the points $x \in L$ with $m - 1 < \|x\|_C \le m$; these are the points with $x \in mC$, $x \notin (m - 1)C$. Since $L$ is discrete and $mC$ is closed and bounded, there are only finitely many such points and these can be ordered according to their $\|\cdot\|_C$-values.

Let $v_1 := x_1$, and for $i = 2, 3, \ldots, n$, let $v_i$ be the first element in the sequence $\{x_k\}_{k=1}^{\infty}$ that is linearly independent of $v_1, \ldots, v_{i-1}$. Put $\lambda_i := \|v_i\|_C$ for $i = 1, \ldots, n$. Then $0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n < \infty$.

We show that $\lambda_1, \ldots, \lambda_n$ are the successive minima of $C$ with respect to $L$. It is clear that $\lambda_1$ is the first minimum. Suppose that there is an index $i \ge 2$ such that $\lambda_i$ is not the $i$-th minimum. Then there is $\mu < \lambda_i$ such that $\mu C$ contains $i$ linearly independent points from $L$. At least one of these points, say $y$, must be linearly independent of $v_1, \ldots, v_{i-1}$. For this $y$ we have $\|y\|_C \le \mu < \lambda_i$, so in the sequence $\{x_k\}_{k=1}^{\infty}$ constructed above it occurs before $v_i$. But this is impossible, since in $\{x_k\}_{k=1}^{\infty}$ the points between $v_1$ and $v_2$ are linearly dependent on $v_1$, the points between $v_2$ and $v_3$ are linearly dependent on $v_1$ and $v_2$, $\ldots$, the points between $v_{i-1}$ and $v_i$ are linearly dependent on $v_1, \ldots, v_{i-1}$. This proves our lemma.

Remark. The above construction gives linearly independent vectors $v_1, \ldots, v_n \in L$ with $v_i \in \lambda_iC$ and in fact $\|v_i\|_C = \lambda_i$ for $i = 1, \ldots, n$. These vectors need not form a basis of $L$.

Minkowski's second convex body theorem gives an optimal upper and lower bound for the product of the successive minima of a central symmetric convex body $C$ with respect to a lattice $L$.

Theorem 2.8 (Minkowski's second convex body theorem, 1910). Let $L$ be a lattice and $C$ a central symmetric convex body, both in $\mathbb{R}^n$, and let $\lambda_1, \ldots, \lambda_n$ be the successive minima of $C$ with respect to $L$. Then
$$\frac{2^n}{n!} \cdot \frac{d(L)}{\mathrm{vol}(C)} \le \lambda_1 \cdots \lambda_n \le 2^n \cdot \frac{d(L)}{\mathrm{vol}(C)}.$$

We show that Minkowski's second convex body theorem implies his first.

Second convex body theorem $\Rightarrow$ First convex body theorem. Since $\lambda_1^n \le \lambda_1 \cdots \lambda_n$, Minkowski's second convex body theorem implies that $\lambda_1^n \le 2^n d(L)/\mathrm{vol}(C)$. Assume that $\mathrm{vol}(C) \ge 2^n d(L)$; then $\lambda_1 \le 1$. Now $\lambda_1C$ contains a non-zero point from $L$ and $\lambda_1C \subseteq C$; hence $C$ contains a non-zero point from $L$.

Example 1. Let $B_n$ be the Euclidean ball in $\mathbb{R}^n$, given by $x_1^2 + \cdots + x_n^2 \le 1$. Let $L$ be a lattice of rank $n$ in $\mathbb{R}^n$. Let $\lambda_1, \ldots, \lambda_n$ be the successive minima of $B_n$ with respect to $L$. It is clear that $x \in \lambda B_n$ if and only if $\|x\|_2 \le \lambda$, where $\|x\|_2 = \big(\sum_{j=1}^{n} x_j^2\big)^{1/2}$ is the Euclidean norm. There are linearly independent vectors $v_1, \ldots, v_n \in L$ with $\|v_i\|_2 \le \lambda_i$ for $i = 1, \ldots, n$. In fact, $v_1$ is a (not necessarily unique) shortest non-zero vector in $L$, and for $i = 2, \ldots, n$, $v_i$ is a shortest vector in $L$ outside the linear subspace spanned by $v_1, \ldots, v_{i-1}$. Now Theorem 2.8 implies that
$$\prod_{i=1}^{n} \|v_i\|_2 \le 2^n V(n)^{-1} d(L),$$
where $V(n) = \mathrm{vol}(B_n)$. Recall that $V(1) = 2$, $V(2) = \pi$, and $V(n) = \frac{2\pi}{n} V(n - 2)$ for $n \ge 3$. We mention once more that $v_1, \ldots, v_n$ need not be a basis of $L$.

Example 2. We prove that the constant $2^n$ in the upper bound of Theorem 2.8 is best possible, i.e., the theorem becomes false if $2^n$ is replaced by a smaller quantity.

Let $L$ be any lattice in $\mathbb{R}^n$ of rank $n$ and choose a basis $v_1, \ldots, v_n$ of $L$. Further, let $\lambda_1, \ldots, \lambda_n$ be reals with $0 < \lambda_1 \le \cdots \le \lambda_n$. Define
$$C_1 := \Big\{x_1\lambda_1^{-1}v_1 + \cdots + x_n\lambda_n^{-1}v_n : x_1, \ldots, x_n \in \mathbb{R},\ \max_{1 \le i \le n} |x_i| \le 1\Big\}.$$
The set $C_1$ is the image of the $n$-dimensional cube
$$K_n := \{x = (x_1, \ldots, x_n) \in \mathbb{R}^n : \max_{1 \le i \le n} |x_i| \le 1\}$$
under the linear transformation
$$\varphi : \mathbb{R}^n \to \mathbb{R}^n : e_i \mapsto \lambda_i^{-1}v_i \quad (i = 1, \ldots, n),$$
where $e_1, \ldots, e_n$ is the standard basis of $\mathbb{R}^n$. Consequently, $C_1$ is a central symmetric convex body with volume
$$\mathrm{vol}(C_1) = |\det(\varphi)| \cdot \mathrm{vol}(K_n) = 2^n \frac{|\det(v_1, \ldots, v_n)|}{\lambda_1 \cdots \lambda_n} = 2^n d(L)(\lambda_1 \cdots \lambda_n)^{-1}.$$
Thus, $\lambda_1 \cdots \lambda_n = 2^n d(L)\, \mathrm{vol}(C_1)^{-1}$.

We now show that $\lambda_1, \ldots, \lambda_n$ are the successive minima of $C_1$ with respect to $L$. For $\lambda > 0$ we have
$$\lambda C_1 = \Big\{\sum_{j=1}^{n} x_j\lambda_j^{-1}v_j : x_j \in \mathbb{R},\ |x_j| \le \lambda \text{ for } j = 1, \ldots, n\Big\}.$$
This implies that $\lambda_iC_1$ contains the $i$ linearly independent points $v_1, \ldots, v_i$. Let $\lambda < \lambda_i$ and let $y = \sum_{j=1}^{n} y_jv_j$ with $y_1, \ldots, y_n \in \mathbb{Z}$ be a lattice point in $\lambda C_1$. Then $|y_j| \le \lambda/\lambda_j < 1$ for $j \ge i$, that is, $|y_i| < 1, \ldots, |y_n| < 1$, implying that $y_i = \cdots = y_n = 0$. So all lattice points in $\lambda C_1$ lie in the $(i - 1)$-dimensional space spanned by $v_1, \ldots, v_{i-1}$, and this space cannot contain $i$ linearly independent points. So $\lambda_i$ is the $i$-th successive minimum of $C_1$. It follows that indeed $\lambda_1, \ldots, \lambda_n$ are the successive minima of $C_1$.

Example 3. We prove that the factor $2^n/n!$ in the lower bound of Theorem 2.8 is best possible. With $L$, $v_1, \ldots, v_n$ and $\lambda_1, \ldots, \lambda_n$ as in Example 2, let
$$C_2 := \Big\{x_1\lambda_1^{-1}v_1 + \cdots + x_n\lambda_n^{-1}v_n : x_1, \ldots, x_n \in \mathbb{R},\ \sum_{i=1}^{n} |x_i| \le 1\Big\}.$$
Then $C_2$ is the image under $\varphi : e_i \mapsto \lambda_i^{-1}v_i$ ($i = 1, \ldots, n$) of the $n$-dimensional octahedron
$$O_n := \Big\{y = (y_1, \ldots, y_n) \in \mathbb{R}^n : \sum_{i=1}^{n} |y_i| \le 1\Big\}.$$
We have $\mathrm{vol}(O_n) = 2^n/n!$ (verify this!). It follows that $C_2$ is a central symmetric convex body with volume
$$\mathrm{vol}(C_2) = |\det \varphi| \cdot \frac{2^n}{n!} = \frac{2^n}{n!}\, d(L)(\lambda_1 \cdots \lambda_n)^{-1},$$
hence $\lambda_1 \cdots \lambda_n = \frac{2^n}{n!}\, d(L)/\mathrm{vol}(C_2)$.

Exercise 2.6. Prove that $\lambda_1, \ldots, \lambda_n$ are the successive minima of $C_2$ with respect to $L$.

We first deduce the lower bound for $\lambda_1 \cdots \lambda_n$ in Theorem 2.8. After that, we prove the upper bound in the special case that $C$ is an ellipsoid. For a proof of the upper bound for arbitrary $C$, which is much more involved, we refer to the book of Cassels, Chapter 8.

We need a lemma.
Lemma 2.9. Let $v_1, \ldots, v_r \in \mathbb{R}^n$. Then
$$\Big\{\sum_{i=1}^{r} x_iv_i : x_i \in \mathbb{R},\ \sum_{i=1}^{r} |x_i| \le 1\Big\}$$
is the smallest convex subset of $\mathbb{R}^n$, symmetric about $0$, that contains $v_1, \ldots, v_r$, that is, the set itself is convex and symmetric about $0$, and it is contained in every other convex set which is symmetric about $0$ and contains $v_1, \ldots, v_r$.

Exercise 2.7. Prove this lemma.

Proof of the lower bound in Theorem 2.8. Choose linearly independent vectors $v_1, \ldots, v_n$ of $L$ such that $v_i \in \lambda_iC$ for $i = 1, \ldots, n$. Then $\lambda_i^{-1}v_i \in C$ for $i = 1, \ldots, n$. Consider the set $C_2$ from Example 3. By Lemma 2.9, this is the smallest symmetric convex set containing the points $\lambda_i^{-1}v_i \in C$ ($i = 1, \ldots, n$). Hence $C_2 \subseteq C$. We mention that $v_1, \ldots, v_n$ generate a sublattice $M$ of $L$ of rank $n$; therefore, $|\det(v_1, \ldots, v_n)| = d(M) = [L : M] \cdot d(L) \ge d(L)$. Hence,
$$\mathrm{vol}(C) \ge \mathrm{vol}(C_2) = \frac{2^n}{n!}\, d(M)(\lambda_1 \cdots \lambda_n)^{-1} \ge \frac{2^n}{n!}\, d(L)(\lambda_1 \cdots \lambda_n)^{-1}.$$
This implies the lower bound for $\lambda_1 \cdots \lambda_n$ from Theorem 2.8.

We now prove the upper bound for ellipsoids.

Theorem 2.10. Let $L$ be a lattice in $\mathbb{R}^n$ and $E$ an ellipsoid in $\mathbb{R}^n$ which is symmetric about $0$. Then for the successive minima $\lambda_1, \ldots, \lambda_n$ of $E$ with respect to $L$ we have
$$\lambda_1 \cdots \lambda_n \le 2^n d(L)/\mathrm{vol}(E).$$

Proof. We first observe that there is no loss of generality in assuming that $E = B_n$ is the Euclidean unit ball given by $\|x\|_2 = (x_1^2 + \cdots + x_n^2)^{1/2} \le 1$. Recall that there is a linear transformation $\varphi$ of $\mathbb{R}^n$ such that $\varphi(B_n) = E$. Let $L' = \varphi^{-1}(L)$. Then clearly, the successive minima of $B_n$ with respect to $L'$ are equal to the successive minima of $E$ with respect to $L$. Further,
$$\mathrm{vol}(E) = |\det \varphi| \, \mathrm{vol}(B_n), \qquad d(L) = |\det \varphi| \, d(L'),$$
which implies $d(L)/\mathrm{vol}(E) = d(L')/\mathrm{vol}(B_n)$. This argument shows that it suffices to prove Theorem 2.10 for $L'$, $B_n$ instead of $L$, $E$.
So let $\lambda_1,\ldots,\lambda_n$ be the successive minima of $B_n$ with respect to a given lattice $L$ in $\mathbb{R}^n$. This means that $\lambda_i$ is the smallest $\lambda > 0$ such that the set of $x \in L$ with $\|x\|_2 \leq \lambda$ contains $i$ linearly independent points. Define
\[ \mu := \left( \frac{2^n d(L)}{\mathrm{vol}(B_n)\,\lambda_1\cdots\lambda_n} \right)^{1/n}. \]
We have to prove that $\mu \geq 1$.

Let $v_1,\ldots,v_n$ be linearly independent vectors in $L$ such that $v_i \in \lambda_i B_n$, i.e., $\|v_i\|_2 \leq \lambda_i$ for $i = 1,\ldots,n$. Recall from linear algebra that by means of the Gram-Schmidt orthogonalization procedure, one can construct an orthonormal basis $e_1,\ldots,e_n$ of $\mathbb{R}^n$ such that for $i = 1,\ldots,n$, the linear subspace of $\mathbb{R}^n$ spanned by $e_1,\ldots,e_i$ is equal to the linear subspace of $\mathbb{R}^n$ spanned by $v_1,\ldots,v_i$. Let $\psi$ be the linear transformation of $\mathbb{R}^n$ defined by $\psi(e_i) = \lambda_i e_i$ for $i = 1,\ldots,n$. Then $\psi$ has determinant $\lambda_1\cdots\lambda_n$. Hence
\[ \mathrm{vol}\big(\mu\psi(B_n)\big) = \mu^n \lambda_1\cdots\lambda_n\,\mathrm{vol}(B_n) = 2^n d(L). \]
By Minkowski’s first convex body theorem, $\mu\psi(B_n)$ contains a non-zero lattice point, $x$, say, of $L$.

The idea of the proof is now to estimate $\|\psi^{-1}(x)\|_2$ from above and below. A comparison of both estimates gives $\mu \geq 1$. First observe that $\psi^{-1}(x) \in \mu B_n$. This means precisely that $\|\psi^{-1}(x)\|_2 \leq \mu$. We now show that $\|\psi^{-1}(x)\|_2 \geq 1$. Then $\mu \geq 1$ follows.

Since $x \in L \setminus \{0\}$ we have $\|x\|_2 \geq \lambda_1$. Let $i$ be the largest index such that $\|x\|_2 \geq \lambda_i$. We claim that $x$ lies in the linear space $V_i$ generated by $v_1,\ldots,v_i$. Indeed, this is clearly true if $i = n$ since $V_n = \mathbb{R}^n$. Suppose that $i < n$. Then $\|x\|_2 < \lambda_{i+1}$. The vectors $v_1,\ldots,v_i$ have Euclidean norm $\leq \lambda_i < \lambda_{i+1}$, and by the definition of $\lambda_{i+1}$ there cannot be $i+1$ linearly independent points in $L$ of Euclidean norm $< \lambda_{i+1}$. Hence $x \in V_i$.

As mentioned before, $V_i$ is also spanned by $e_1,\ldots,e_i$. So we have in fact $x = x_1 e_1 + \cdots + x_i e_i$ with $x_1,\ldots,x_i \in \mathbb{R}$. Clearly, $\psi^{-1}(x) = \sum_{k=1}^{i} (x_k/\lambda_k) e_k$. Hence, using $\lambda_1 \leq \cdots \leq \lambda_i$,
\[ \|\psi^{-1}(x)\|_2^2 = \sum_{k=1}^{i} (x_k/\lambda_k)^2 \geq \Big( \sum_{k=1}^{i} x_k^2 \Big) \Big/ \lambda_i^2 = \|x\|_2^2/\lambda_i^2 \geq 1. \]
This completes the proof of Theorem 2.10.
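As a numerical sanity check of Theorem 2.10 in dimension $n = 2$, the following minimal sketch computes the successive minima of the unit disc by brute force and compares their product with $2^n d(L)/\mathrm{vol}(B_n)$. The function name `successive_minima_2d`, the search box `bound` and the sample basis are illustrative assumptions, not part of the notes; the search is only reliable when the minima are attained by coefficients inside the box.

```python
import math
from itertools import product

def successive_minima_2d(v1, v2, bound=10):
    """Brute-force lambda_1, lambda_2 of the unit disc B_2 for the lattice Z*v1 + Z*v2.

    Only coefficients in [-bound, bound] are searched, so the result is
    reliable only when the minima are attained inside that box (an assumption).
    """
    points = []
    for a, b in product(range(-bound, bound + 1), repeat=2):
        if (a, b) != (0, 0):
            x = (a * v1[0] + b * v2[0], a * v1[1] + b * v2[1])
            points.append((math.hypot(x[0], x[1]), x))
    points.sort()
    lam1, first = points[0]
    # lambda_2 is the norm of the shortest lattice vector not collinear with `first`
    for norm, x in points:
        if abs(first[0] * x[1] - first[1] * x[0]) > 1e-12:
            return lam1, norm
    raise ValueError("v1 and v2 are not linearly independent")

v1, v2 = (2.0, 0.0), (1.0, 3.0)               # hypothetical sample basis
lam1, lam2 = successive_minima_2d(v1, v2)
d_L = abs(v1[0] * v2[1] - v1[1] * v2[0])      # d(L) = |det V|
upper = 2 ** 2 * d_L / math.pi                # 2^n d(L) / vol(B_n) with n = 2
print(lam1, lam2, lam1 * lam2 <= upper)
```

For higher dimensions a brute-force search of this kind quickly becomes impractical; the sketch is meant only to make the inequality concrete.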
2.4
Dual lattices
Henceforth, vectors in $\mathbb{R}^n$ will be column vectors, unless otherwise stated. As usual, the standard inner product of $x = (x_1,\ldots,x_n)^T$, $y = (y_1,\ldots,y_n)^T \in \mathbb{R}^n$ is given by $\langle x, y\rangle = \sum_{i=1}^{n} x_i y_i$. Here, by $A^T$ we denote the transpose of a matrix $A$.

Let $L$ be a lattice in $\mathbb{R}^n$. The dual of $L$ is given by
\[ L^* := \{x \in \mathbb{R}^n : \langle x, y\rangle \in \mathbb{Z} \text{ for all } y \in L\}. \]
Let $\{v_1,\ldots,v_n\}$ be a basis of $L$, and $V$ the matrix with columns $v_1,\ldots,v_n$. Thus, $L = \{Vz : z \in \mathbb{Z}^n\}$. Using $\langle x, Vy\rangle = \langle V^T x, y\rangle$ for $x, y \in \mathbb{R}^n$, we obtain
\[ L^* = \{x \in \mathbb{R}^n : \langle x, Vz\rangle \in \mathbb{Z}\ \forall z \in \mathbb{Z}^n\} = \{x \in \mathbb{R}^n : \langle V^T x, z\rangle \in \mathbb{Z}\ \forall z \in \mathbb{Z}^n\} = \{x \in \mathbb{R}^n : V^T x \in \mathbb{Z}^n\} = \{(V^T)^{-1} w : w \in \mathbb{Z}^n\}. \]
Hence $L^*$ is a lattice in $\mathbb{R}^n$, with basis the columns $v_1^*,\ldots,v_n^*$ of $(V^T)^{-1}$. Note that
\[ d(L^*) = |\det (V^T)^{-1}| = |\det V|^{-1} = d(L)^{-1}. \]
We denote by $B_n$ the $n$-dimensional Euclidean ball given by $\|x\|_2 = \langle x, x\rangle^{1/2} \leq 1$.

Theorem 2.11. Let $L$ be a lattice in $\mathbb{R}^n$ and $L^*$ its dual. Further, let $\lambda_1,\ldots,\lambda_n$ be the successive minima of $B_n$ with respect to $L$, and $\lambda_1^*,\ldots,\lambda_n^*$ the successive minima of $B_n$ with respect to $L^*$. Then
\[ 1 \leq \lambda_i \lambda_{n+1-i}^* \leq c(n) \quad \text{for } i = 1,\ldots,n, \]
where $c(n) = 4^n\,\mathrm{vol}(B_n)^{-2}$.

Proof. We first deduce the lower bound for $\lambda_i\lambda_{n+1-i}^*$. Let $v_1,\ldots,v_n$ be linearly independent vectors from $L$ such that $v_i \in \lambda_i B_n$, i.e., $\|v_i\|_2 \leq \lambda_i$ for $i = 1,\ldots,n$. Likewise, let $v_1^*,\ldots,v_n^*$ be linearly independent vectors from $L^*$ such that $\|v_i^*\|_2 \leq \lambda_i^*$ for $i = 1,\ldots,n$. Now let $i \in \{1,\ldots,n\}$, and consider the set of vectors
\[ \{x \in \mathbb{R}^n : \langle v_k, x\rangle = 0 \text{ for } k = 1,\ldots,i\}. \]
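Since $v_1,\ldots,v_i$ are linearly independent, this is a linear subspace of $\mathbb{R}^n$ of dimension $n-i$. Hence at least one of the vectors $v_1^*,\ldots,v_{n+1-i}^*$ does not lie in this space. It follows that there are indices $k \leq i$, $l \leq n+1-i$, such that $\langle v_k, v_l^*\rangle \neq 0$. But $\langle v_k, v_l^*\rangle \in \mathbb{Z}$, since $v_k \in L$, $v_l^* \in L^*$. Hence $|\langle v_k, v_l^*\rangle| \geq 1$. Now by the Cauchy-Schwarz inequality,
\[ 1 \leq |\langle v_k, v_l^*\rangle| \leq \|v_k\|_2\,\|v_l^*\|_2 \leq \lambda_k\lambda_l^* \leq \lambda_i\lambda_{n+1-i}^*. \]
This establishes the lower bound.

To prove the upper bound, recall that by Theorem 2.10, we have
\[ \lambda_1\cdots\lambda_n \leq 2^n d(L)\,(\mathrm{vol}\,B_n)^{-1}, \qquad \lambda_1^*\cdots\lambda_n^* \leq 2^n d(L^*)\,(\mathrm{vol}\,B_n)^{-1}. \]
Further, $d(L^*) = d(L)^{-1}$. Hence
\[ \prod_{i=1}^{n} (\lambda_i\lambda_{n+1-i}^*) \leq 4^n d(L)\,d(L^*)\,\mathrm{vol}(B_n)^{-2} = c(n). \]
Since each factor $\lambda_j\lambda_{n+1-j}^*$ is at least $1$ by the lower bound, it follows that for $i = 1,\ldots,n$,
\[ \lambda_i\lambda_{n+1-i}^* \leq \frac{c(n)}{\prod_{j\neq i} \lambda_j\lambda_{n+1-j}^*} \leq c(n). \]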
This proves Theorem 2.11.
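The construction of the dual basis as the columns of $(V^T)^{-1}$ can be made concrete with a small exact computation. The following is a minimal sketch for $n = 2$ with an integer basis; the helper name `dual_basis_2d` and the sample basis are illustrative assumptions.

```python
from fractions import Fraction

def dual_basis_2d(v1, v2):
    """Return the columns w1, w2 of (V^T)^{-1}, where V has columns v1, v2.

    v1 and v2 are pairs of integers; the arithmetic is exact.
    """
    a, c = v1                      # V = [[a, b], [c, d]]
    b, d = v2
    det = a * d - b * c            # det V, assumed non-zero
    # V^T = [[a, c], [b, d]], so (V^T)^{-1} = (1/det) * [[d, -c], [-b, a]]
    w1 = (Fraction(d, det), Fraction(-b, det))   # first column
    w2 = (Fraction(-c, det), Fraction(a, det))   # second column
    return w1, w2, det

def inner(x, y):
    return x[0] * y[0] + x[1] * y[1]

v1, v2 = (2, 0), (1, 3)            # hypothetical sample basis of L
w1, w2, det = dual_basis_2d(v1, v2)
d_dual = abs(w1[0] * w2[1] - w1[1] * w2[0])      # d(L*)
print(w1, w2, d_dual)
```

The inner products $\langle v_i, v_j^*\rangle$ come out as the identity matrix, and `d_dual` equals $1/|\det V|$, in line with $d(L^*) = d(L)^{-1}$.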
2.5
Kronecker’s approximation theorem
Recall that by Dirichlet’s Theorem, if $\xi_1,\ldots,\xi_n$ are real numbers of which at least one is irrational, then there are infinitely many tuples of integers $x_1,\ldots,x_n,y$ such that
\[ |\xi_i - x_i/y| \leq y^{-1-1/n} \ \text{for } i = 1,\ldots,n, \quad y > 0. \]
This implies that for every $\varepsilon > 0$, there exists $(x_1,\ldots,x_n,y) \in \mathbb{Z}^{n+1}$ such that
\[ |\xi_i y - x_i| \leq \varepsilon \ \text{for } i = 1,\ldots,n, \quad y > 0. \]
Kronecker’s approximation theorem deals with systems of inhomogeneous inequalities of the shape
\[ |\xi_i y - x_i - \theta_i| \leq \varepsilon \ (i = 1,\ldots,n) \quad \text{in } x_1,\ldots,x_n,y \in \mathbb{Z}, \tag{2.5} \]
where $\theta_1,\ldots,\theta_n$ are any real numbers.
Theorem 2.12. Let $\xi_1,\ldots,\xi_n,\theta_1,\ldots,\theta_n$ be real numbers. Suppose that $1,\xi_1,\ldots,\xi_n$ are linearly independent over $\mathbb{Q}$. Then for every $\varepsilon > 0$, there exists $(x_1,\ldots,x_n,y) \in \mathbb{Z}^{n+1}$ with (2.5).

Remark. The condition that $1,\xi_1,\ldots,\xi_n$ be linearly independent over $\mathbb{Q}$ cannot be removed. For suppose that $1,\xi_1,\ldots,\xi_n$ are linearly dependent over $\mathbb{Q}$. Then there are integers $a_1,\ldots,a_n,a_0$, not all $0$, such that $a_1\xi_1 + \cdots + a_n\xi_n = a_0$. In fact, at least one of $a_1,\ldots,a_n$ is non-zero. Choose $\theta_1,\ldots,\theta_n \in \mathbb{R}$ such that $a_1\theta_1 + \cdots + a_n\theta_n \notin \mathbb{Z}$. Let $\delta$ be the distance from $a_1\theta_1 + \cdots + a_n\theta_n$ to the nearest integer. We show that for sufficiently small $\varepsilon > 0$, (2.5) is not solvable. Indeed, let $(x_1,\ldots,x_n,y) \in \mathbb{Z}^{n+1}$ be a solution of (2.5). Then
\[ \Big| \sum_{i=1}^{n} a_i(\xi_i y - x_i - \theta_i) \Big| \leq \sum_{i=1}^{n} |a_i| \cdot |\xi_i y - x_i - \theta_i| \leq \varepsilon \sum_{i=1}^{n} |a_i|. \]
But on the other hand,
\[ \Big| \sum_{i=1}^{n} a_i(\xi_i y - x_i - \theta_i) \Big| = \Big| a_0 y - \sum_{i=1}^{n} a_i x_i - \sum_{i=1}^{n} a_i\theta_i \Big| \geq \delta. \]
Hence (2.5) is unsolvable for $\varepsilon < \delta / \sum_{i=1}^{n} |a_i|$.
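We need the following lemma.

Lemma 2.13. Let $L$ be a lattice in $\mathbb{R}^n$, and $\lambda_1,\ldots,\lambda_n$ the successive minima of $B_n$ with respect to $L$. Then for every $b \in \mathbb{R}^n$ there is $u \in L$ such that
\[ \|u - b\|_2 \leq \tfrac{1}{2}(\lambda_1 + \cdots + \lambda_n). \]

Proof. Choose linearly independent $v_1,\ldots,v_n \in L$ such that $v_i \in \lambda_i B_n$ for $i = 1,\ldots,n$. Then $b = \sum_{i=1}^{n} t_i v_i$ with $t_1,\ldots,t_n \in \mathbb{R}$. Define $a_i \in \mathbb{Z}$, $r_i \in \mathbb{R}$ by
\[ t_i = a_i + r_i, \qquad -\tfrac{1}{2} < r_i \leq \tfrac{1}{2}, \]
and put $u := \sum_{i=1}^{n} a_i v_i$. Then clearly, $u \in L$, and
\[ \|u - b\|_2 = \Big\| \sum_{i=1}^{n} r_i v_i \Big\|_2 \leq \tfrac{1}{2} \sum_{i=1}^{n} \|v_i\|_2 \leq \tfrac{1}{2} \sum_{i=1}^{n} \lambda_i. \]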
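The rounding step in the proof of Lemma 2.13 is effectively an algorithm: express $b$ in the vectors $v_i$, round each coefficient to the nearest integer, and take the corresponding lattice point. Here is a minimal sketch for $n = 2$ in exact arithmetic. The function name `round_to_lattice` and the sample data are illustrative assumptions, and the code only checks the intermediate bound $\|u-b\|_2 \leq \frac12\sum\|v_i\|_2$, since it does not verify that the chosen $v_i$ attain the successive minima.

```python
import math
from fractions import Fraction

def round_to_lattice(v1, v2, b):
    """Return u = a1*v1 + a2*v2 with integers a_i obtained by rounding the
    coordinates t_i of b = t1*v1 + t2*v2 so that -1/2 < t_i - a_i <= 1/2."""
    det = Fraction(v1[0] * v2[1] - v1[1] * v2[0])
    # Cramer's rule for the coefficients t1, t2
    t1 = (b[0] * v2[1] - b[1] * v2[0]) / det
    t2 = (v1[0] * b[1] - v1[1] * b[0]) / det
    # a = ceil(t - 1/2) gives r = t - a in the half-open interval (-1/2, 1/2]
    a1 = math.ceil(t1 - Fraction(1, 2))
    a2 = math.ceil(t2 - Fraction(1, 2))
    return (a1 * v1[0] + a2 * v2[0], a1 * v1[1] + a2 * v2[1])

v1, v2 = (2, 0), (1, 3)                       # hypothetical sample vectors in L
b = (Fraction(7, 2), Fraction(5, 2))          # hypothetical target point
u = round_to_lattice(v1, v2, b)
dist = math.hypot(float(u[0] - b[0]), float(u[1] - b[1]))
bound = 0.5 * (math.hypot(*v1) + math.hypot(*v2))
print(u, dist <= bound)
```

The point found this way need not be the lattice point closest to $b$; the lemma only claims the stated upper bound for its distance.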
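Exercise 2.8. Let $L$, $\lambda_1,\ldots,\lambda_n$ be as above. Prove that there exists $b \in \mathbb{R}^n$ such that $\|u - b\|_2 \geq \tfrac{1}{2}\lambda_1$ for every $u \in L$.

Proof of Theorem 2.12. Let $M$ be a large positive integer, to be chosen later. Consider the lattice in $\mathbb{R}^{n+1}$,
\[ L_M = \big\{ \big(\varepsilon^{-1}(x_1 - \xi_1 y),\ldots,\varepsilon^{-1}(x_n - \xi_n y),\, M^{-1}y\big)^T : x_1,\ldots,x_n,y \in \mathbb{Z} \big\} = \{Az : z \in \mathbb{Z}^{n+1}\}, \]
where
\[ A = \begin{pmatrix} \varepsilon^{-1} & & 0 & -\varepsilon^{-1}\xi_1 \\ & \ddots & & \vdots \\ 0 & & \varepsilon^{-1} & -\varepsilon^{-1}\xi_n \\ 0 & \cdots & 0 & M^{-1} \end{pmatrix} \]
and $z = (x_1,\ldots,x_n,y)^T$. Put $b = (\varepsilon^{-1}\theta_1,\ldots,\varepsilon^{-1}\theta_n,0)^T$. We show that there is a positive integer $M$ such that for the successive minima of the $(n+1)$-dimensional Euclidean ball $B_{n+1}$ with respect to $L_M$ we have
\[ \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_{n+1} \leq \frac{2}{n+1}. \tag{2.6} \]
Then by Lemma 2.13, there is $u = Az \in L_M$ with $z = (x_1,\ldots,x_n,y)^T \in \mathbb{Z}^{n+1}$ such that $\|u - b\|_2 \leq \tfrac{1}{2}(\lambda_1 + \cdots + \lambda_{n+1}) \leq 1$, and this can be rewritten as
\[ \sum_{i=1}^{n} \varepsilon^{-2}(x_i - \xi_i y - \theta_i)^2 + M^{-2}y^2 \leq 1. \]
This implies $|x_i - \xi_i y - \theta_i| \leq \varepsilon$ for $i = 1,\ldots,n$. Since $x_i - \xi_i y = \xi_i(-y) - (-x_i)$, the integers $-x_1,\ldots,-x_n,-y$ satisfy (2.5).

By Theorem 2.11, applied in dimension $n+1$, we have $\lambda_{n+1}\lambda_1^* \leq c(n+1)$, where $\lambda_1^*$ is the first minimum of $B_{n+1}$ with respect to the dual lattice $L_M^*$. Hence to prove (2.6) it suffices to show that
\[ \lambda_1^* \geq \tfrac{1}{2}(n+1)\cdot c(n+1) =: c'(n). \]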
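It is easy to verify that
\[ (A^T)^{-1} = \begin{pmatrix} \varepsilon & & 0 & 0 \\ & \ddots & & \vdots \\ 0 & & \varepsilon & 0 \\ M\xi_1 & \cdots & M\xi_n & M \end{pmatrix}. \]
Hence
\[ L_M^* = \{(A^T)^{-1}z : z \in \mathbb{Z}^{n+1}\} = \big\{ \big(\varepsilon x_1,\ldots,\varepsilon x_n,\, M(\xi_1 x_1 + \cdots + \xi_n x_n + y)\big)^T : x_1,\ldots,x_n,y \in \mathbb{Z} \big\}. \]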
We assume without loss of generality that $\varepsilon < c'(n)$. Let $\mu$ be the minimum of all numbers $|\xi_1 x_1 + \cdots + \xi_n x_n + y|$, taken over all integers $x_1,\ldots,x_n,y$ such that
\[ |x_i| \leq \varepsilon^{-1}c'(n) \ \text{for } i = 1,\ldots,n, \qquad |\xi_1 x_1 + \cdots + \xi_n x_n + y| \leq \tfrac{1}{2}, \qquad (x_1,\ldots,x_n,y) \neq 0. \]
Then $\mu$ is the minimum of finitely many real numbers which are all positive, since $1,\xi_1,\ldots,\xi_n$ are linearly independent over $\mathbb{Q}$. Hence $\mu > 0$. Now let $M$ be an integer with $M > c'(n)/\mu$. Then for every non-zero $u \in L_M^*$ we have $\|u\|_2 > c'(n)$, since either one of the numbers $\varepsilon x_1,\ldots,\varepsilon x_n$, or $M(\xi_1 x_1 + \cdots + \xi_n x_n + y)$, has absolute value $> c'(n)$. This implies that if $\lambda \leq c'(n)$, then $\lambda B_{n+1}$ does not contain a non-zero point from $L_M^*$, that is, $\lambda_1^* > c'(n)$. This completes our proof.

In fact, Kronecker proved a much more general approximation theorem, of which Theorem 2.12 is just a special case. As usual, $\|x\|_2$ denotes the Euclidean norm of a vector $x \in \mathbb{R}^n$.

Theorem 2.14 (Kronecker, 1887). Let $A$ be an $m \times n$-matrix with real entries, and $b \in \mathbb{R}^m$ a column vector. Then the following two assertions are equivalent:
(i) For every $y \in \mathbb{R}^m$ with $A^T y \in \mathbb{Z}^n$ we have $\langle b, y\rangle \in \mathbb{Z}$;
(ii) For every $\varepsilon > 0$ there is $z \in \mathbb{Z}^n$ such that $\|Az - b\|_2 \leq \varepsilon$.

For a proof, we refer to Siegel, Chapter II.

Exercise 2.9. a) Prove (ii)$\Longrightarrow$(i). b) Deduce Theorem 2.12 from Theorem 2.14.
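To get a feeling for Theorem 2.12 in the simplest case $n = 1$, one can search for a solution of (2.5) directly. The following is a minimal sketch in floating-point arithmetic; the function name `kronecker_solution`, the search limit `y_max` and the sample values $\xi = \sqrt{2}$, $\theta = 0.3$, $\varepsilon = 0.01$ are illustrative assumptions, and only positive $y$ are tried although the theorem allows any $y \in \mathbb{Z}$.

```python
import math

def kronecker_solution(xi, theta, eps, y_max=10**6):
    """Search for integers (x, y), 1 <= y <= y_max, with |xi*y - x - theta| <= eps.

    Returns the first pair found, or None if the search limit is reached.
    Floating-point rounding makes this a heuristic check, not a proof.
    """
    for y in range(1, y_max + 1):
        x = round(xi * y - theta)          # best integer x for this y
        if abs(xi * y - x - theta) <= eps:
            return x, y
    return None

xi, theta, eps = math.sqrt(2), 0.3, 0.01   # hypothetical sample data
solution = kronecker_solution(xi, theta, eps)
print(solution)
```

Taking a rational $\xi$ instead, for example $\xi = 1/2$ with $\theta = 0.3$ and small $\varepsilon$, the search fails, which matches the Remark after Theorem 2.12.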
Chapter 3 Some algebra Literature: S. Lang, Algebra, 2nd ed. Addison-Wesley, 1984. Chaps. III,V,VII,VIII,IX. P. Stevenhagen, Dictaat Algebra 2, Algebra 3 (Dutch).
This chapter contains some algebra that is used throughout the course. The material in Section 3.1 will mainly be used as a reference source; we will not discuss it during the course nor will we examine it. But the concepts and results from this section will be used throughout.

A ring $A$ is always supposed to have a (necessarily unique) unit element $1$ such that $x \cdot 1 = 1 \cdot x = x$ for all $x \in A$. An element $a \in A$ is called a unit of $A$ if there is $b \in A$ with $ba = ab = 1$; this necessarily unique element $b$ is denoted by $a^{-1}$. The units of $A$ form a multiplicative group, the unit group of $A$, which we denote by $A^*$. In particular, the unit group of a field $K$ is $K^* = K \setminus \{0\}$.

An integral domain is a commutative ring without divisors of $0$, i.e., it does not contain non-zero elements $a, b$ such that $ab = 0$. The quotient field of an integral domain $A$ is denoted by $Q_A$. The field $Q_A$ consists of all fractions $a/b$ with $a, b \in A$, $b \neq 0$, where two fractions $a/b$, $c/d$ are identified if $ad = bc$.

The basic fact needed in both the theories of field extensions and ring extensions to be discussed below is division with remainder for polynomials. We agree that $\deg 0 = -\infty$. Then if $A$ is an integral domain we have $\deg fg = \deg f + \deg g$ for all $f, g \in A[X]$.

Lemma 3.1. Let $A$ be a commutative ring, and $f, g \in A[X]$ polynomials such that the leading coefficient of $f$ is in $A^*$. Then there are polynomials $q, r \in A[X]$ such that $g = qf + r$, $\deg q \leq \deg g - \deg f$ and $\deg r < \deg f$. The polynomials $q, r$ are uniquely determined by $f, g$.

Proof. Induction on the degree of $g$. First assume that $\deg g < \deg f$. If $g = qf + r$ for some $q, r \in A[X]$ with $q \neq 0$ and $\deg r < \deg f$, then since the leading coefficient of $f$ is in $A^*$, we have $\deg qf \geq \deg f$, while on the other hand $\deg qf = \deg(g - r) < \deg f$, which is impossible. Hence $q = 0$, $r = g$. Suppose that $\deg g = m$ and $\deg f = n$ with $m \geq n$. Let $a, b$ be the leading coefficients of $f, g$, respectively. So $a \in A^*$. Then apply the induction hypothesis to $g - ba^{-1}X^{m-n}f$, which is in $A[X]$ and has degree smaller than $m$.

As a consequence, we obtain the extended Euclidean algorithm for polynomials with coefficients from a field.

Lemma 3.2. Let $K$ be a field and $f, g \in K[X]$ polynomials of positive degree. Assume that $f, g$ are not both divisible by a polynomial from $K[X]$ of positive degree. Then there are $a, b \in K[X]$ such that $af + bg = 1$, $\deg a < \deg g$ and $\deg b < \deg f$.

Proof. We proceed by induction on $m := \deg f + \deg g$. For $m = 2$, the assertion is straightforward to verify. Assume $m \geq 3$. Let $\deg g \geq \deg f$; then $\deg g \geq 2$. By the previous lemma there are $q, r \in K[X]$ such that $g = qf + r$, $\deg q \leq \deg g - \deg f$ and $\deg r < \deg f$. Our assumption on $f, g$ implies that $r \neq 0$. If $\deg r = 0$ we are done. Let $\deg r > 0$. Clearly, $f$ and $r$ do not have a non-constant common factor, since otherwise it would divide $g$ as well. So we can apply the induction hypothesis and infer that there are $a', b' \in K[X]$ with $a'r + b'f = 1$, $\deg a' < \deg f$ and $\deg b' < \deg r$. Substituting $g - qf$ for $r$ we obtain $af + bg = 1$ with $a = -a'q + b'$, $b = a'$.
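Lemmas 3.1 and 3.2 are both constructive. The sketch below implements them over $K = \mathbb{Q}$, with polynomials stored as coefficient lists, lowest degree first. The representation and the names `trim`, `add`, `mul`, `divide`, `bezout` are assumptions made for illustration, and the Bézout coefficients are computed by the usual iterative form of the extended Euclidean algorithm rather than by the recursion used in the proof.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def divide(g, f):
    """Return (q, r) with g = q*f + r and deg r < deg f (Lemma 3.1), f non-zero."""
    g, f = trim(g), trim(f)
    q = [Fraction(0)] * max(len(g) - len(f) + 1, 0)
    r = [Fraction(c) for c in g]
    while len(r) >= len(f):
        shift = len(r) - len(f)
        coeff = r[-1] / f[-1]              # b * a^{-1} in the notation of the proof
        q[shift] = coeff
        for i, c in enumerate(f):
            r[i + shift] -= coeff * c      # subtract coeff * X^shift * f
        r = trim(r)
    return trim(q), r

def bezout(f, g):
    """Return (a, b) with a*f + b*g = 1 for coprime f, g (Lemma 3.2)."""
    r0, r1 = trim(f), trim(g)
    a0, a1 = [Fraction(1)], []             # invariant: r0 = a0*f + b0*g
    b0, b1 = [], [Fraction(1)]             #            r1 = a1*f + b1*g
    while r1:
        q, r = divide(r0, r1)
        neg_q = [-c for c in q]
        r0, r1 = r1, r
        a0, a1 = a1, add(a0, mul(neg_q, a1))
        b0, b1 = b1, add(b0, mul(neg_q, b1))
    if len(r0) != 1:
        raise ValueError("f and g have a non-constant common factor")
    c = r0[0]                              # non-zero constant gcd
    return [x / c for x in a0], [x / c for x in b0]

f = [-2, 0, 0, 1]                          # X^3 - 2
g = [1, 1, 1]                              # X^2 + X + 1
a, b = bezout(f, g)
print(a, b, add(mul(a, f), mul(b, g)))
```

In this example one finds $a = -1$ and $b = X - 1$, so the degree bounds $\deg a < \deg g$ and $\deg b < \deg f$ of Lemma 3.2 can be read off directly.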
3.1
Field extensions
Let $k, K$ be fields such that $k$ is a subfield of $K$. We call $K$ an extension of $k$, notation $k \subset K$ or $K \supset k$. Notice that in this case, $K$ is a $k$-vector space with an additional structure, namely the multiplication of $K$.
Let $K \supset k$ be a field extension. We say that $\alpha \in K$ is algebraic over $k$ if there is a non-zero polynomial $f \in k[X]$ with $f(\alpha) = 0$, and transcendental over $k$ otherwise.

Lemma 3.3. Let $\alpha \in K$ be algebraic over $k$. Then there is a unique monic polynomial $f_{\alpha,k} \in k[X]$ such that $f_{\alpha,k}(\alpha) = 0$ and $\deg f_{\alpha,k} \leq \deg g$ for every non-zero $g \in k[X]$ with $g(\alpha) = 0$. This polynomial has the following properties:
(i) for $g \in k[X]$ we have $g(\alpha) = 0$ if and only if $f_{\alpha,k}$ divides $g$ in $k[X]$;
(ii) $f_{\alpha,k}$ is irreducible in $k[X]$.

Proof. We can clearly choose a non-zero polynomial $f$ of minimal degree among all non-zero polynomials $g \in k[X]$ with $g(\alpha) = 0$ and then make it monic. We prove that $f$ satisfies (i) and (ii).

To prove (i), take $g \in k[X]$ with $g(\alpha) = 0$. By division with remainder, there are $q, r \in k[X]$ with $g = qf + r$ and $\deg r < \deg f$. Then $r(\alpha) = 0$. By our choice of $f$ we must have $r = 0$, i.e., $f$ divides $g$. Conversely, if $f$ divides $g$ then clearly $g(\alpha) = 0$.

To prove (ii), assume that $f$ is reducible. Then $f = gh$, where $g, h \in k[X]$ are polynomials of degree smaller than that of $f$. At least one of these polynomials has $\alpha$ as a zero, say $g$. But this contradicts the choice of $f$.

Property (i) implies that the polynomial $f$ chosen above is uniquely determined.

Definition. The monic polynomial $f_{\alpha,k}$ from Lemma 3.3 is called the minimal polynomial of $\alpha$ over $k$. The degree of $\alpha$ over $k$ is the degree of $f_{\alpha,k}$.

Given $\alpha_1,\ldots,\alpha_r \in K$, we denote by $k(\alpha_1,\ldots,\alpha_r)$ the smallest subfield of $K$ containing $k$ and $\alpha_1,\ldots,\alpha_r$. Thus,
\[ k(\alpha_1,\ldots,\alpha_r) = \left\{ \frac{f(\alpha_1,\ldots,\alpha_r)}{g(\alpha_1,\ldots,\alpha_r)} : f, g \in k[X_1,\ldots,X_r],\ g(\alpha_1,\ldots,\alpha_r) \neq 0 \right\}. \]

Lemma 3.4. Let $\alpha \in K$ be algebraic over $k$ and suppose that $\alpha$ has degree $n$ over $k$. Then $k(\alpha)$ is an $n$-dimensional $k$-vector space with basis $\{1, \alpha, \ldots, \alpha^{n-1}\}$, in other words, every element of $k(\alpha)$ can be expressed uniquely as
\[ c_0 + c_1\alpha + \cdots + c_{n-1}\alpha^{n-1} \quad \text{with } c_0,\ldots,c_{n-1} \in k. \tag{3.1} \]
Proof. The elements $1, \alpha, \ldots, \alpha^{n-1}$ are linearly independent over $k$, for otherwise there would be a non-zero polynomial $p \in k[X]$ of degree $< n$ with $p(\alpha) = 0$, which is impossible since it cannot be a multiple of $f_{\alpha,k}$. Consider the $k$-vector space $V$ with $k$-basis $1, \alpha, \ldots, \alpha^{n-1}$, i.e.,
\[ V := \{c_0 + c_1\alpha + \cdots + c_{n-1}\alpha^{n-1} : c_0,\ldots,c_{n-1} \in k\} = \{p(\alpha) : p \in k[X],\ \deg p < n\}. \]
We show that $V$ is a field. Then necessarily $V = k(\alpha)$ and we are done. $V$ is clearly closed under addition and subtraction, so it suffices to prove that it is closed under multiplication and multiplicative inversion. Let $\beta, \gamma \in V$ with $\beta\gamma \neq 0$. We have $\beta = f(\alpha)$, $\gamma = g(\alpha)$ with $f, g \in k[X]$ of degree $< n$. By division with remainder, there are $q, r \in k[X]$ with $fg = qf_{\alpha,k} + r$ and $\deg r < n$. Hence $\beta\gamma = r(\alpha) \in V$. Further, $f$ and $f_{\alpha,k}$ do not have a common non-constant factor since $f_{\alpha,k}$ is irreducible, so by the extended Euclidean algorithm there are $a, b \in k[X]$, with $\deg a < n$, such that $af + bf_{\alpha,k} = 1$. Hence $a(\alpha)\beta = 1$, i.e., $\beta^{-1} = a(\alpha) \in V$.

Definition. Let $K \supset k$ be a field extension. Then $K \supset k$ is called an algebraic extension if every element of $K$ is algebraic over $k$, and a transcendental extension otherwise. Further, $K \supset k$ is called a finite extension if $K$ is finite dimensional as a $k$-vector space. The $k$-dimension of $K$ is called the degree of $K$ over $k$, notation $[K : k]$.

By Lemma 3.4, if $\alpha$ is algebraic over $k$ and $\deg f_{\alpha,k} = n$, then $[k(\alpha) : k] = n$.

Lemma 3.5. Let $k \subset K \subset M$ be a tower of field extensions. Then $M \supset k$ is finite if and only if $M \supset K$ and $K \supset k$ are finite, and in this case, $[M : k] = [M : K] \cdot [K : k]$.

Proof. Assume that $M \supset k$ is finite. Then $K$ is finite dimensional over $k$, being a subspace of $M$. Hence $K \supset k$ is finite. Further, $M$ is finitely generated as a $K$-vector space since it is finitely generated as a $k$-vector space. Hence $M \supset K$ is finite. Conversely, suppose that both $K \supset k$ and $M \supset K$ are finite. Let $\{\alpha_1,\ldots,\alpha_m\}$ be a $k$-basis of $K$, and $\{\beta_1,\ldots,\beta_n\}$ a $K$-basis of $M$. Then $\{\alpha_i\beta_j : i = 1,\ldots,m,\ j = 1,\ldots,n\}$ is a $k$-basis of $M$. Hence $[M : k] = [M : K] \cdot [K : k]$.

Lemma 3.6. Let $K \supset k$ be finite and $\alpha \in K$. Then $\alpha$ is algebraic over $k$ of degree dividing $[K : k]$.
Proof. $[K : k] = [K : k(\alpha)] \cdot [k(\alpha) : k]$.

Lemma 3.7. Let $K \supset k$ be a field extension. Then $K \supset k$ is finite if and only if $K = k(\alpha_1,\ldots,\alpha_r)$ for certain $\alpha_1,\ldots,\alpha_r$ that are algebraic over $k$.

Proof. First suppose that $K \supset k$ is finite. Then $K$ has a basis, say $\{\alpha_1,\ldots,\alpha_r\}$, over $k$. By Lemma 3.6, $\alpha_1,\ldots,\alpha_r$ are algebraic over $k$, and clearly, $K = k(\alpha_1,\ldots,\alpha_r)$. Conversely, suppose that $K = k(\alpha_1,\ldots,\alpha_r)$, where $\alpha_1,\ldots,\alpha_r$ are algebraic over $k$. Let $K_0 = k$ and $K_i = k(\alpha_1,\ldots,\alpha_i)$ for $i = 1,\ldots,r$. Then $K_i = K_{i-1}(\alpha_i)$ for $i = 1,\ldots,r$. Hence $K_i \supset K_{i-1}$ is finite for $i = 1,\ldots,r$ and so, by Lemma 3.5, $K \supset k$ is finite.

Proposition 3.8. Let $K \supset k$ be any field extension.
(i) Let $\alpha, \beta \in K$ be algebraic over $k$, with $\beta \neq 0$. Then $\alpha \pm \beta$, $\alpha\beta$ and $\alpha/\beta$ are algebraic over $k$.
(ii) Let $\beta_1,\ldots,\beta_n \in K$ be algebraic over $k$ and let $\alpha \in K$ be a zero of $X^n + \beta_1 X^{n-1} + \cdots + \beta_n$. Then $\alpha$ is algebraic over $k$.

Proof. (i) Consider the field $k(\alpha, \beta)$. By Lemma 3.7 this is a finite extension of $k$, and by Lemma 3.6, all elements of $k(\alpha, \beta)$ are algebraic over $k$.
(ii) Let $L = k(\beta_1,\ldots,\beta_n)$, $M = L(\alpha)$. By Lemma 3.7, $L$ is finite over $k$, and by Lemma 3.4, $M$ is finite over $L$. Hence by Lemma 3.5, $M$ is finite over $k$. So by Lemma 3.6, $\alpha$ is algebraic over $k$.

Proposition 3.8 implies that the set of $\alpha \in K$ that are algebraic over $k$ forms a field, the algebraic closure of $k$ in $K$. Another consequence of Proposition 3.8 is that if $\alpha \in K$ is algebraic over the algebraic closure of $k$ in $K$, then $\alpha$ belongs to this algebraic closure.

Definition. Let $K \subset L$ be a field extension and $\alpha_1,\ldots,\alpha_r \in L$. We say that $\alpha_1,\ldots,\alpha_r$ are algebraically dependent over $K$ if there is a non-zero polynomial $P \in K[X_1,\ldots,X_r]$ such that $P(\alpha_1,\ldots,\alpha_r) = 0$. If no such polynomial exists, we say that $\alpha_1,\ldots,\alpha_r$ are algebraically independent over $K$.

Proposition 3.9. Let $k \subset K \subset L$ be a tower of field extensions such that $k \subset K$ is algebraic, and let $\alpha_1,\ldots,\alpha_r \in L$. Then
\[ \alpha_1,\ldots,\alpha_r \text{ algebraically independent over } k \iff \alpha_1,\ldots,\alpha_r \text{ algebraically independent over } K. \]
To prove this, we need a lemma.

Lemma 3.10. Let $K \subset L$ be a field extension and $\alpha_1,\ldots,\alpha_r \in L$. Then $\alpha_1,\ldots,\alpha_r$ are algebraically dependent over $K$ if and only if there is $i$ such that $\alpha_i$ is algebraic over $K(\alpha_1,\ldots,\alpha_{i-1},\alpha_{i+1},\ldots,\alpha_r)$.

Proof. Suppose that $\alpha_1,\ldots,\alpha_r$ are algebraically dependent over $K$. Then there is a non-zero $P \in K[X_1,\ldots,X_r]$ with $P(\alpha_1,\ldots,\alpha_r) = 0$. Suppose that for instance the variable $X_r$ occurs in $P$. Then we can write $P$ as $\sum_{i=0}^{t} P_i(X_1,\ldots,X_{r-1})X_r^i$, where the $P_i$ are polynomials with coefficients in $K$, with $t > 0$ and $P_t \neq 0$. By substituting $\alpha_i$ for $X_i$ for $i = 1,\ldots,r$, we get that $\alpha_r$ is algebraic over $K(\alpha_1,\ldots,\alpha_{r-1})$.

Conversely, assume that say $\alpha_r$ is algebraic over $K(\alpha_1,\ldots,\alpha_{r-1})$. Then $\alpha_r$ is a zero of a polynomial $X^t + \beta_1 X^{t-1} + \cdots + \beta_t$ with $\beta_1,\ldots,\beta_t \in K(\alpha_1,\ldots,\alpha_{r-1})$. Thus, $\beta_i = f_i(\alpha_1,\ldots,\alpha_{r-1})/g_i(\alpha_1,\ldots,\alpha_{r-1})$ with $f_i, g_i \in K[X_1,\ldots,X_{r-1}]$ for $i = 1,\ldots,t$. We may write $f_i/g_i = h_i/h_0$ where $h_0 = g_1\cdots g_t$ and $h_i \in K[X_1,\ldots,X_{r-1}]$ for $i = 1,\ldots,t$. Then clearly, $P(\alpha_1,\ldots,\alpha_r) = 0$, where
\[ P = h_0(X_1,\ldots,X_{r-1})X_r^t + h_1(X_1,\ldots,X_{r-1})X_r^{t-1} + \cdots + h_t(X_1,\ldots,X_{r-1}). \]
Proof of Proposition 3.9. The implication $\Leftarrow$ is obvious (left to the reader). We prove $\Rightarrow$ by contraposition. Assume that $\alpha_1,\ldots,\alpha_r$ are algebraically dependent over $K$. By Lemma 3.10 we may assume that $\alpha_r$ is algebraic over $K(\alpha_1,\ldots,\alpha_{r-1})$. That is, $\alpha_r$ is a zero of $X^t + \beta_1 X^{t-1} + \cdots + \beta_t$ with $\beta_1,\ldots,\beta_t \in K(\alpha_1,\ldots,\alpha_{r-1})$. The elements $\beta_1,\ldots,\beta_t$ are rational expressions in elements from $K$ and $\alpha_1,\ldots,\alpha_{r-1}$, hence are algebraic over $k(\alpha_1,\ldots,\alpha_{r-1})$. By Proposition 3.8, $\alpha_r$ is algebraic over $k(\alpha_1,\ldots,\alpha_{r-1})$. Now Lemma 3.10 implies that $\alpha_1,\ldots,\alpha_r$ are algebraically dependent over $k$.
3.2
Ring extensions
Given two rings A, B, when writing A ⊂ B or B ⊃ A we always mean that A is a subring of B, i.e., A is a ring with the addition, multiplication and unit element of B. We call A ⊂ B or B ⊃ A a ring extension. It is possible to set up a theory for ring extensions similar to that for field extensions. We consider extensions B ⊃ A of commutative rings A, B.
A role similar to that of vector spaces in the theory of field extensions is played by modules in the theory of ring extensions. In general, if $A$ is a commutative ring, then an $A$-module is a set $M$, endowed with an addition $+ : M \times M \to M$ and scalar multiplication $\cdot : A \times M \to M$, which satisfy precisely the same axioms as the addition and scalar multiplication of a vector space, except that the scalars are taken from a ring $A$ instead of a field. In particular, if $A$ is a field, then $M$ is a vector space over that field. A $\mathbb{Z}$-module is simply an abelian group.

Let $A$ be a commutative ring, and $M$ an $A$-module. We call $M'$ an $A$-submodule of $M$ if it is closed under the addition and scalar multiplication of $M$. That is, $M'$ is an $A$-submodule of $M$ if and only if for all $\alpha, \beta \in M'$, $r, s \in A$ we have $r\alpha + s\beta \in M'$.

We say that $M$ is finitely generated over $A$ if there is a finite set of elements $\alpha_1,\ldots,\alpha_r \in M$ such that $M = \{\sum_{i=1}^{r} x_i\alpha_i : x_i \in A\}$. We call $\{\alpha_1,\ldots,\alpha_r\}$ an $A$-basis for $M$ if $\alpha_1,\ldots,\alpha_r$ generate $M$ as an $A$-module, and if they are $A$-linearly independent, i.e., there is no $(x_1,\ldots,x_r) \in A^r \setminus \{0\}$ with $\sum_{i=1}^{r} x_i\alpha_i = 0$.

One notable difference between vector spaces and modules is that finitely generated vector spaces always have a basis, whereas finitely generated modules over a ring need not have a basis. An $A$-module which does have a basis is called free.

Example. View the residue class ring $\mathbb{Z}/n\mathbb{Z}$ (with $n$ a positive integer) as a $\mathbb{Z}$-module. Write the residue class $a \bmod n$ as $\overline{a}$. As a $\mathbb{Z}$-module, $\mathbb{Z}/n\mathbb{Z}$ is generated by $\overline{1}$. But for each $\overline{a} \in \mathbb{Z}/n\mathbb{Z}$ we have $n \cdot \overline{a} = \overline{0}$. Hence each element of $\mathbb{Z}/n\mathbb{Z}$ is linearly dependent over $\mathbb{Z}$, and so $\mathbb{Z}/n\mathbb{Z}$ cannot have a $\mathbb{Z}$-basis.
Now let $A \subset B$ be an extension of commutative rings. We denote the unit element of $A$ by $1$. Clearly, $B$ may be viewed as an $A$-module. We call $A \subset B$ or $B \supset A$ a finite ring extension and say that $B$ is finite over $A$, if $B$ is finitely generated as an $A$-module.

Given $\alpha_1,\ldots,\alpha_r \in B$, we denote by $A[\alpha_1,\ldots,\alpha_r]$ the smallest subring of $B$ containing $A$ and $\alpha_1,\ldots,\alpha_r$. Thus,
\[ A[\alpha_1,\ldots,\alpha_r] = \{f(\alpha_1,\ldots,\alpha_r) : f \in A[X_1,\ldots,X_r]\}. \]
An element $\alpha \in B$ is said to be integral over $A$ if there is a monic polynomial $f \in A[X]$ with $f(\alpha) = 0$.
Lemma 3.11. Let $A \subset B$ be an extension of commutative rings, and $\alpha \in B$. Then the following are equivalent:
(i) $\alpha$ is integral over $A$;
(ii) $A[\alpha]$ is finite over $A$;
(iii) there is a non-zero, finitely generated $A$-submodule $M$ of $B$ such that $1 \in M$ and $\alpha M \subseteq M$, where $\alpha M = \{\alpha x : x \in M\}$.

Proof. (i)$\Longrightarrow$(ii). Let $f \in A[X]$ be a monic polynomial with $f(\alpha) = 0$. Let $\beta \in A[\alpha]$. Then $\beta = g(\alpha)$ with $g \in A[X]$. Since $f$ is monic, using division with remainder we find $q, r \in A[X]$ with $g = qf + r$, and $\deg r < \deg f = n$. We may write $r = c_0 + c_1 X + \cdots + c_{n-1}X^{n-1}$ with $c_i \in A$. Thus,
\[ \beta = g(\alpha) = q(\alpha)f(\alpha) + r(\alpha) = r(\alpha) = \sum_{i=0}^{n-1} c_i\alpha^i. \]
It follows that $A[\alpha]$ is generated as an $A$-module by $1, \alpha, \ldots, \alpha^{n-1}$.

(ii)$\Longrightarrow$(iii). Trivial (take $M = A[\alpha]$).

(iii)$\Longrightarrow$(i). Let $\{\omega_1,\ldots,\omega_r\}$ be a set of $A$-module generators for $M$. Since $1 \in M$ we may assume that $\omega_1 = 1$. Further, since $\alpha\omega_i \in M$ we have $\alpha\omega_i = c_{i1}\omega_1 + \cdots + c_{ir}\omega_r$ with $c_{ij} \in A$ for $i, j = 1,\ldots,r$. This can be rewritten as
\[ (\alpha I - C)\begin{pmatrix} \omega_1 \\ \vdots \\ \omega_r \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}, \]
where $I$ is the $r \times r$ unit matrix and $C = (c_{ij})_{i,j=1,\ldots,r}$ is the $r \times r$-matrix with $c_{ij}$ on the $i$-th row and $j$-th column. We multiply both sides of this identity on the left with the matrix consisting of the minors of $\alpha I - C$, i.e., with $D = (M_{ij})_{i,j=1,\ldots,r}$, where $M_{ij}$ is $(-1)^{i+j}$ times the determinant of the matrix obtained by removing the $j$-th row and $i$-th column from $\alpha I - C$. Then since $D(\alpha I - C) = \det(\alpha I - C) \cdot I$, we obtain $\det(\alpha I - C)\,\omega_i = 0$ for $i = 1,\ldots,r$. Taking $i = 1$ and using $\omega_1 = 1$ gives $\det(\alpha I - C) = 0$. That is, $\alpha$ is a zero of $\det(XI - C)$, which is a monic polynomial from $A[X]$.
Corollary 3.12. Let $A \subset B$ be a finite extension of commutative rings. Then every $\alpha \in B$ is integral over $A$.

Proof. Apply Lemma 3.11, (iii)$\Longrightarrow$(i) with $M = B$.

Lemma 3.13. Let $A, B, C$ be commutative rings such that $B$ is finite over $A$ and $C$ is finite over $B$. Then $C$ is finite over $A$.

Proof. Suppose that $B$ is generated as an $A$-module by $\alpha_1,\ldots,\alpha_m$, and $C$ is generated as a $B$-module by $\beta_1,\ldots,\beta_n$. A straightforward computation shows that $C$ is generated as an $A$-module by $\alpha_i\beta_j$ ($i = 1,\ldots,m$, $j = 1,\ldots,n$).

Lemma 3.14. Let $A, B$ be commutative rings. Then $B$ is finite over $A$ if and only if $B = A[\alpha_1,\ldots,\alpha_r]$ for certain $\alpha_1,\ldots,\alpha_r$ that are integral over $A$.

Proof. Suppose that $B$ is finite over $A$, say $B$ is generated as an $A$-module by $\alpha_1,\ldots,\alpha_r$. By Corollary 3.12, $\alpha_1,\ldots,\alpha_r$ are integral over $A$, and clearly, $B = A[\alpha_1,\ldots,\alpha_r]$. Conversely, suppose that $B = A[\alpha_1,\ldots,\alpha_r]$ where $\alpha_1,\ldots,\alpha_r$ are integral over $A$. Let $B_0 := A$ and $B_i := A[\alpha_1,\ldots,\alpha_i]$ for $i = 1,\ldots,r$. Thus, $B_i = B_{i-1}[\alpha_i]$ for $i = 1,\ldots,r$. By Lemma 3.11, (i)$\Longrightarrow$(ii), $B_i$ is finite over $B_{i-1}$ for $i = 1,\ldots,r$, and then by Lemma 3.13, $B = B_r$ is finite over $A$.

Proposition 3.15. Let $A \subset B$ be an extension of commutative rings.
(i) Let $\alpha, \beta \in B$ be integral over $A$. Then $\alpha \pm \beta$ and $\alpha\beta$ are integral over $A$.
(ii) Let $\beta_1,\ldots,\beta_n \in B$ be integral over $A$, and let $\alpha \in B$ be a zero of $X^n + \beta_1 X^{n-1} + \cdots + \beta_n$. Then $\alpha$ is integral over $A$.

Proof. (i). By Lemma 3.14, the ring $A[\alpha, \beta]$ is finite over $A$, and then by Corollary 3.12, all elements of $A[\alpha, \beta]$ are integral over $A$.
(ii). Let $C = A[\beta_1,\ldots,\beta_n]$. By Lemma 3.14, $C$ is finite over $A$. The element $\alpha$ is integral over $C$, so by Lemma 3.11, $C[\alpha]$ is finite over $C$. Hence by Lemma 3.13, $C[\alpha]$ is finite over $A$, and then by Corollary 3.12, $\alpha$ is integral over $A$.

Definition. Let $A \subset B$ be an extension of commutative rings. By Proposition 3.15 (i), the set
\[ C = \{\alpha \in B : \alpha \text{ is integral over } A\} \]
is a subring of $B$, called the integral closure of $A$ in $B$. Note that by Proposition 3.15 (ii), every element of $B$ that is integral over $C$ already belongs to $C$.

Definition. Let $A$ be an integral domain. We say that $A$ is integrally closed if every $\alpha \in Q_A$ that is integral over $A$ belongs to $A$.

Lemma 3.16. $\mathbb{Z}$ is integrally closed.

Proof. Let $a/b \in \mathbb{Q}$ with $a, b \in \mathbb{Z}$, $b > 0$, and $\gcd(a, b) = 1$. Suppose that $a/b$ is integral over $\mathbb{Z}$. Then there is a monic polynomial $f = X^d + c_1 X^{d-1} + \cdots + c_d \in \mathbb{Z}[X]$ with
\[ f(a/b) = (a/b)^d + c_1(a/b)^{d-1} + \cdots + c_d = 0. \]
Multiplying with $b^d$ gives
\[ a^d + b\,(c_1 a^{d-1} + c_2 a^{d-2}b + \cdots + c_d b^{d-1}) = 0. \]
This implies that $b$ divides $a^d$, hence $b = 1$ since $\gcd(a, b) = 1$.
3.3
Algebraic numbers and algebraic integers
A number $\alpha \in \mathbb{C}$ is called algebraic if it is algebraic over $\mathbb{Q}$, and transcendental otherwise.

Let $\alpha \in \mathbb{C}$ be algebraic. We denote by $f_\alpha$ the minimal polynomial of $\alpha$ over $\mathbb{Q}$, that is the unique monic, irreducible polynomial $f \in \mathbb{Q}[X]$ with $f(\alpha) = 0$. Every polynomial $g \in \mathbb{Q}[X]$ with $g(\alpha) = 0$ is a multiple of $f_\alpha$. We observe that
\[ f_\alpha = (X - \alpha^{(1)}) \cdots (X - \alpha^{(d)}) \]
where $\alpha^{(1)},\ldots,\alpha^{(d)}$ are distinct elements of $\mathbb{C}$. Indeed, if two of these numbers, say $\alpha^{(i)}$ and $\alpha^{(j)}$, were equal, then $\alpha^{(i)}$ would be a zero of the derivative $f_\alpha'$ of $f_\alpha$. But then $\alpha^{(i)}$ would be a zero of $g := \gcd(f_\alpha, f_\alpha')$, hence $\deg g \geq 1$. But this is impossible since $f_\alpha$ is irreducible. One of the numbers $\alpha^{(i)}$ is equal to $\alpha$.

We call $d$ the degree of $\alpha$, and $\alpha^{(1)},\ldots,\alpha^{(d)}$ the conjugates of $\alpha$. The number $\alpha$ is called totally real if all $\alpha^{(i)}$ are in $\mathbb{R}$, and totally complex if all $\alpha^{(i)}$ are in $\mathbb{C} \setminus \mathbb{R}$. In general, some of the $\alpha^{(i)}$ are in $\mathbb{R}$ and some not in $\mathbb{R}$. For instance, $\alpha = \sqrt[3]{2}$ has minimal polynomial $X^3 - 2$, and conjugates $\sqrt[3]{2}$, $\zeta\sqrt[3]{2}$, and $\zeta^2\sqrt[3]{2}$, where $\zeta$ is a primitive cube root of unity.

Recall that by Proposition 3.8, if $\alpha, \beta$ are algebraic then so are $\alpha \pm \beta$, $\alpha\beta$ and $\alpha/\beta$ (if $\beta \neq 0$). Further, if $\alpha$ is a zero of some polynomial $X^d + \beta_1 X^{d-1} + \cdots + \beta_d$ with all $\beta_i$ algebraic numbers, then $\alpha$ is algebraic. Let
\[ \overline{\mathbb{Q}} := \{\alpha \in \mathbb{C} : \alpha \text{ is algebraic over } \mathbb{Q}\}. \]
A field $K$ is called algebraically closed if every monic polynomial $f \in K[X]$ can be factored as $\prod_{i=1}^{d} (X - \alpha_i)$ with $\alpha_1,\ldots,\alpha_d \in K$. It is well-known that $\mathbb{C}$ is algebraically closed. Together with the remarks made above, it follows that $\overline{\mathbb{Q}}$ is an algebraically closed field, the algebraic closure of $\mathbb{Q}$ (in $\mathbb{C}$).

Suppose $f_\alpha = X^d + b_1 X^{d-1} + \cdots + b_{d-1}X + b_d$; then $b_1,\ldots,b_d \in \mathbb{Q}$. Let $a_0 \in \mathbb{Z}_{>0}$ be the least common multiple of the denominators of $b_1,\ldots,b_d$. Put $F_\alpha := a_0 f_\alpha$. Then
\[ F_\alpha = a_0 X^d + a_1 X^{d-1} + \cdots + a_d, \quad \text{with } a_0,\ldots,a_d \in \mathbb{Z},\ \gcd(a_0,\ldots,a_d) = 1. \]
We call $F_\alpha$ the primitive minimal polynomial of $\alpha$ (terminology invented by the author; not used in general!). We define the height of $\alpha$ by
\[ H(\alpha) := \max(|a_0|,\ldots,|a_d|). \]

Examples. 1. Let $\alpha = a/b$ with $a, b \in \mathbb{Z}$, $b > 0$, $\gcd(a, b) = 1$. Then $\alpha$ has primitive minimal polynomial $bX - a$, hence $H(\alpha) = \max(|a|, b)$.
2. Let $\alpha = \tfrac{1}{5}(1 + 2\sqrt{3})$. Then
\[ f_\alpha = \big(X - \tfrac{1}{5}(1 + 2\sqrt{3})\big)\big(X - \tfrac{1}{5}(1 - 2\sqrt{3})\big) = X^2 - \tfrac{2}{5}X - \tfrac{11}{25}, \]
\[ F_\alpha = 25X^2 - 10X - 11, \qquad H(\alpha) = 25. \]

Exercise 3.1. Let $\alpha$ be a non-zero algebraic number.
(i) Prove that $H(\alpha^{-1}) = H(\alpha)$.
(ii) Prove that $(H(\alpha) + 1)^{-1} \leq |\alpha| \leq H(\alpha) + 1$.

Definition. A number $\alpha \in \mathbb{C}$ is called an algebraic integer if it is integral over $\mathbb{Z}$, that is, if there is a monic polynomial $f \in \mathbb{Z}[X]$ such that $f(\alpha) = 0$. Elements of $\mathbb{Z}$ are sometimes called rational integers. A number $\alpha \in \mathbb{C}$ is called an algebraic unit if both $\alpha$ and $\alpha^{-1}$ are algebraic integers.

Recall that by Proposition 3.15, if $\alpha, \beta$ are algebraic integers then so are $\alpha \pm \beta$ and $\alpha\beta$. Further, if $\alpha$ is a zero of some polynomial $X^d + \beta_1 X^{d-1} + \cdots + \beta_d$ with all $\beta_i$ algebraic integers, then $\alpha$ is an algebraic integer. This shows that the algebraic integers in $\mathbb{C}$ form a ring, which we denote by $\overline{\mathbb{Z}}$.

Lemma 3.17. Let $\alpha$ be an algebraic number, and $f_\alpha$ its minimal polynomial. Then $\alpha$ is an algebraic integer if and only if $f_\alpha \in \mathbb{Z}[X]$.

Proof. If $f_\alpha \in \mathbb{Z}[X]$ then clearly, $\alpha$ is an algebraic integer. Assume conversely that $\alpha$ is an algebraic integer. Then there is a monic polynomial $f \in \mathbb{Z}[X]$ with $f(\alpha) = 0$. The polynomial $f_\alpha$ divides $f$ in $\mathbb{Q}[X]$. Hence $\alpha^{(1)},\ldots,\alpha^{(d)}$, being the zeros of $f_\alpha$, are also zeros of $f$, and so they are algebraic integers. But then the coefficients of $f_\alpha = \prod_{i=1}^{d} (X - \alpha^{(i)})$ are algebraic integers. These coefficients also lie in $\mathbb{Q}$. Since all algebraic integers in $\mathbb{Q}$ actually belong to $\mathbb{Z}$ (Lemma 3.16), we must have $f_\alpha \in \mathbb{Z}[X]$.

Lemma 3.18. Let $\alpha$ be an algebraic number, and let $a_0$ be the leading coefficient of the primitive minimal polynomial $F_\alpha$ of $\alpha$. Then $a_0\alpha$ is an algebraic integer.

Proof. Let $F_\alpha = a_0 X^d + a_1 X^{d-1} + \cdots + a_d$. Then
\[ 0 = a_0^{d-1}F_\alpha(\alpha) = (a_0\alpha)^d + a_1(a_0\alpha)^{d-1} + \cdots + a_0^{d-1}a_d, \]
hence $a_0\alpha$ is a zero of a monic polynomial from $\mathbb{Z}[X]$.

Definition. Given a non-zero algebraic number $\alpha$, we define the denominator of $\alpha$ to be the smallest positive $a \in \mathbb{Z}$ such that $a\alpha$ is an algebraic integer, notation $\mathrm{den}(\alpha)$.

Exercise 3.2. (i) Let $G = b_0 X^m + b_1 X^{m-1} + \cdots + b_m$ where $b_0,\ldots,b_m$ are algebraic integers with $b_0 \neq 0$. Let $\alpha \in \mathbb{C}$ with $G(\alpha) = 0$. Prove that $G/(X - \alpha)$ is a polynomial whose coefficients are algebraic integers.
Hint. Induction on $m$. In the induction step, use that $b_0\alpha$ is an algebraic integer.
(ii) We can express the primitive minimal polynomial of an algebraic number $\alpha$ as $F_\alpha = a_0\prod_{i=1}^{d} (X - \alpha^{(i)})$, where $\alpha^{(1)},\ldots,\alpha^{(d)}$ are the conjugates of $\alpha$. Prove that for each subset $\{i_1,\ldots,i_k\}$ of $\{1,\ldots,d\}$, the number $a_0\alpha^{(i_1)} \cdots \alpha^{(i_k)}$ is an algebraic integer.
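The passage from the minimal polynomial $f_\alpha$ to the primitive minimal polynomial $F_\alpha$ and the height $H(\alpha)$ is a short computation once $f_\alpha$ is known. The sketch below carries it out with exact rationals. The function names `primitive_polynomial` and `height` are illustrative assumptions, and the input is assumed to be the list of coefficients of a monic minimal polynomial, highest degree first; irreducibility is not checked.

```python
from fractions import Fraction
from math import gcd

def primitive_polynomial(monic_coeffs):
    """Given [1, b_1, ..., b_d] (rational, highest degree first), return the
    integer coefficients [a_0, ..., a_d] of a_0 * f, where a_0 is the least
    common multiple of the denominators of the b_i."""
    coeffs = [Fraction(c) for c in monic_coeffs]
    a0 = 1
    for c in coeffs:
        a0 = a0 * c.denominator // gcd(a0, c.denominator)   # running lcm
    return [int(a0 * c) for c in coeffs]

def height(int_coeffs):
    """Maximum absolute value of the coefficients."""
    return max(abs(c) for c in int_coeffs)

# alpha = (1 + 2*sqrt(3))/5 has f_alpha = X^2 - (2/5) X - 11/25
F = primitive_polynomial([1, Fraction(-2, 5), Fraction(-11, 25)])
print(F, height(F))

# alpha = 3/4 has f_alpha = X - 3/4
G = primitive_polynomial([1, Fraction(-3, 4)])
print(G, height(G))
```

Because the input is monic and $a_0$ is exactly the least common multiple of the denominators, the resulting integer coefficients have greatest common divisor $1$, as required in the definition of $F_\alpha$.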
3.4
Algebraic number fields
A(n algebraic) number field is a finite extension of $\mathbb{Q}$. Thus, $K$ is an algebraic number field if and only if $K = \mathbb{Q}(\alpha_1,\ldots,\alpha_r)$ for certain algebraic numbers $\alpha_1,\ldots,\alpha_r$. The degree of an algebraic number field $K$ is the degree $[K : \mathbb{Q}]$.

We state some facts without proof.

Lemma 3.19 (Theorem of the primitive element). Let $K$ be an algebraic number field of degree $d$. Then there is $\theta \in K$ such that $K = \mathbb{Q}(\theta)$, and $K$ is a $\mathbb{Q}$-vector space with basis $\{1, \theta, \ldots, \theta^{d-1}\}$.

Example. $\mathbb{Q}(\sqrt{2}, \sqrt{3}) = \mathbb{Q}(\sqrt{2} + \sqrt{3})$.
To verify this, observe first that $\sqrt{3} \notin \mathbb{Q}(\sqrt{2})$. Hence $K := \mathbb{Q}(\sqrt{2}, \sqrt{3})$ has degree $[K : \mathbb{Q}(\sqrt{2})] \cdot [\mathbb{Q}(\sqrt{2}) : \mathbb{Q}] = 4$. For a $\mathbb{Q}$-basis of $K$ one may take $\{1, \sqrt{2}, \sqrt{3}, \sqrt{6}\}$. On the other hand, $L := \mathbb{Q}(\sqrt{2} + \sqrt{3})$ is a subfield of $K$, and it has the four $\mathbb{Q}$-linearly independent elements $1$, $\sqrt{2} + \sqrt{3}$, $(\sqrt{2} + \sqrt{3})^2 = 5 + 2\sqrt{6}$, $(\sqrt{2} + \sqrt{3})^3 = 11\sqrt{2} + 9\sqrt{3}$. Hence $L = K$.

An embedding of a number field $K$ in $\mathbb{C}$ is an injective field homomorphism of $K$ into $\mathbb{C}$. An embedding of $K$ in $\mathbb{C}$ necessarily leaves the elements of $\mathbb{Q}$ unchanged. This has the following consequences. First, let $\sigma$ be an embedding of $K$ in $\mathbb{C}$, $\alpha_1,\ldots,\alpha_r \in K$, and $\beta = f(\alpha_1,\ldots,\alpha_r)/g(\alpha_1,\ldots,\alpha_r)$ with $f, g \in \mathbb{Q}[X_1,\ldots,X_r]$. Then $\sigma(\beta) = f(\sigma(\alpha_1),\ldots,\sigma(\alpha_r))/g(\sigma(\alpha_1),\ldots,\sigma(\alpha_r))$. In particular, if $K = \mathbb{Q}(\alpha_1,\ldots,\alpha_r)$, then $\sigma$ is uniquely determined by its images on $\alpha_1,\ldots,\alpha_r$. Second, if $f \in \mathbb{Q}[X]$, and $\alpha \in K$ is a zero of $f$, then also $\sigma(\alpha)$ is a zero of $f$. For $f(\sigma(\alpha)) = \sigma(f(\alpha)) = 0$.

Lemma 3.20. Let $K$ be an algebraic number field of degree $d$. Then there are precisely $d$ distinct embeddings of $K$ in $\mathbb{C}$.

These $d$ embeddings can be described explicitly as follows. Suppose that $K = \mathbb{Q}(\theta)$. Let $f_\theta$ be the minimal polynomial of $\theta$; it has degree $d$. Then $f_\theta = (X - \theta^{(1)}) \cdots (X - \theta^{(d)})$ with $\theta^{(1)},\ldots,\theta^{(d)} \in \mathbb{C}$.
As mentioned above, the embeddings of $K$ in $\mathbb{C}$ map $\theta$ to the zeros of $f_\theta$, and each embedding is uniquely determined by its image of $\theta$. Hence the $d$ embeddings $\sigma_1, \ldots, \sigma_d$ of $K$ in $\mathbb{C}$ can be given by $\sigma_i(\theta) = \theta^{(i)}$.

Example. Let $K = \mathbb{Q}(\sqrt[4]{2})$. The minimal polynomial of $\sqrt[4]{2}$ is $X^4 - 2$, and the zeros of $X^4 - 2$ are $i^k\sqrt[4]{2}$ ($k = 0, 1, 2, 3$), where $i^2 = -1$. Thus,
\[ K = \Big\{ \sum_{j=0}^{3} x_j(\sqrt[4]{2})^j : x_j \in \mathbb{Q} \Big\} \]
and the four embeddings of $K$ in $\mathbb{C}$ are given by
\[ \sum_{j=0}^{3} x_j(\sqrt[4]{2})^j \mapsto \sum_{j=0}^{3} x_j(i^k\sqrt[4]{2})^j \qquad (k = 0, 1, 2, 3). \]
Let $K$ be an algebraic number field, and $L$ a finite extension of $K$. An embedding $\tau$ of $L$ in $\mathbb{C}$ is called a continuation of an embedding $\sigma$ of $K$ in $\mathbb{C}$ if $\tau|_K = \sigma$, i.e., $\sigma(x) = \tau(x)$ for $x \in K$. Obviously, each embedding of $L$ in $\mathbb{C}$ is a continuation of some embedding of $K$ in $\mathbb{C}$.

Lemma 3.21. Each embedding of $K$ in $\mathbb{C}$ can be continued in precisely $[L : K]$ ways to an embedding of $L$ in $\mathbb{C}$.

Example. Let $K = \mathbb{Q}(\sqrt[4]{2})$, $L = \mathbb{Q}(\sqrt[12]{2})$. Then $L \supset K$ and $[L : K] = 3$. The four embeddings of $K$ in $\mathbb{C}$ are given by $\sigma_k(\sqrt[4]{2}) = i^k\sqrt[4]{2}$ ($k = 0, 1, 2, 3$). The twelve embeddings of $L$ in $\mathbb{C}$ are given by $\tau_l(\sqrt[12]{2}) = \zeta_{12}^{l}\sqrt[12]{2}$ ($l = 0, 1, \ldots, 11$), where $\zeta_{12}$ is a primitive 12-th root of unity with $\zeta_{12}^3 = i$. Since $\tau_l(\sqrt[4]{2}) = \tau_l(\sqrt[12]{2})^3 = i^{l}\sqrt[4]{2}$, the continuations of $\sigma_k$ to $L$ are the $\tau_l$ with $l \equiv k \pmod 4$, $0 \leq l \leq 11$.

Corollary 3.22. Let $K$ be an algebraic number field of degree $d$, and let $\sigma_1, \ldots, \sigma_d$ be the embeddings of $K$ in $\mathbb{C}$. Further, let $\alpha \in K$. Then each of the conjugates of $\alpha$ occurs precisely $[K : \mathbb{Q}(\alpha)]$ times in the sequence $\sigma_1(\alpha), \ldots, \sigma_d(\alpha)$.

Proof. Suppose $\mathbb{Q}(\alpha)$ has degree $m$. Let $f_\alpha = \prod_{i=1}^{m}(X - \alpha^{(i)})$. Then the $m$ embeddings $\tau_1, \ldots, \tau_m$ of $\mathbb{Q}(\alpha)$ in $\mathbb{C}$ are determined by $\tau_i(\alpha) = \alpha^{(i)}$ for $i = 1, \ldots, m$. Since each $\tau_i$ has precisely $d/m$ continuations to $K$ and each embedding of $K$ in $\mathbb{C}$ is a continuation of some $\tau_i$, each of the numbers $\alpha^{(i)}$ occurs precisely $d/m = [K : \mathbb{Q}(\alpha)]$ times among $\sigma_1(\alpha), \ldots, \sigma_d(\alpha)$.
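The example with $\mathbb{Q}(\sqrt[4]{2}) \subset \mathbb{Q}(\sqrt[12]{2})$ can be checked numerically. The following is a minimal sketch, not part of the notes: it assumes the concrete choice $\zeta_{12} = e^{2\pi i/12}$ (so that $\zeta_{12}^3 = i$), and the helper names `tau_image` and `restriction_index` are hypothetical. For each $l$ it finds the $k$ for which $\tau_l$ restricts to $\sigma_k$, using floating-point comparison with a small tolerance.

```python
import cmath

# Real 12th root of 2, and the assumed choice zeta = exp(2*pi*i/12), so zeta**3 == i.
r12 = 2 ** (1 / 12)
r4 = 2 ** (1 / 4)
zeta = cmath.exp(2j * cmath.pi / 12)

def tau_image(l):
    """Image of 2**(1/12) under the embedding tau_l of L = Q(2**(1/12))."""
    return zeta ** l * r12

def restriction_index(l):
    """Return k in {0,1,2,3} such that tau_l(2**(1/4)) = i**k * 2**(1/4)."""
    image = tau_image(l) ** 3  # tau_l(2^(1/4)) = tau_l(2^(1/12))^3
    for k in range(4):
        if abs(image - (1j ** k) * r4) < 1e-9:
            return k
    raise ValueError("image is not a conjugate of 2**(1/4)")

# For each embedding sigma_k of K, list the indices l of its continuations tau_l.
continuations = {k: [l for l in range(12) if restriction_index(l) == k]
                 for k in range(4)}
print(continuations)
```

Each $\sigma_k$ should receive exactly $[L : K] = 3$ continuations, in line with Lemma 3.21.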
Let $K$ be an algebraic number field of degree $d$ and $\sigma_1, \ldots, \sigma_d$ the embeddings of $K$ in $\mathbb{C}$. The characteristic polynomial of $\alpha \in K$ is defined by
\[ \chi_{\alpha,K} := \prod_{i=1}^{d}(X - \sigma_i(\alpha)). \]
In case that $K = \mathbb{Q}(\alpha)$, we have that $\chi_{\alpha,K} = f_\alpha$ is the minimal polynomial of $\alpha$. In case that $\mathbb{Q}(\alpha)$ is strictly smaller than $K$ we have the following.

Lemma 3.23. Let $\alpha \in K$ and let $f_\alpha$ be the minimal polynomial of $\alpha$. Then $\chi_{\alpha,K} = f_\alpha^{[K:\mathbb{Q}(\alpha)]}$. Hence $\chi_{\alpha,K} \in \mathbb{Q}[X]$.

Proof. Use that $f_\alpha = (X - \alpha^{(1)}) \cdots (X - \alpha^{(m)})$ where $\alpha^{(1)}, \ldots, \alpha^{(m)}$ are the conjugates of $\alpha$, and apply Corollary 3.22.

Definition. Let $K$ be an algebraic number field. The ring of integers of $K$ is given by
\[ O_K = \{\alpha \in K : \alpha \text{ is an algebraic integer}\}. \]
That is, $O_K$ is the integral closure of $\mathbb{Z}$ in $K$. The group of units (invertible elements) of $O_K$ is denoted by $O_K^*$.

We observe that if $\alpha \in O_K$ and $\sigma$ is an embedding of $K$ in $\mathbb{C}$, then $\sigma(\alpha)$ is an algebraic integer. For $\alpha$ is a zero of a monic $f \in \mathbb{Z}[X]$, hence so is $\sigma(\alpha)$.

Lemma 3.24. Let $\alpha \in K$. Then $\alpha \in O_K \iff \chi_{\alpha,K} \in \mathbb{Z}[X]$.

Proof. $\Longleftarrow$ $\chi_{\alpha,K}$ is a monic polynomial having $\alpha$ as a root. $\Longrightarrow$ Lemma 3.17 implies $f_\alpha \in \mathbb{Z}[X]$, and then Lemma 3.23 implies $\chi_{\alpha,K} \in \mathbb{Z}[X]$.

Definition. Let $K$ be an algebraic number field of degree $d$ and $\sigma_1, \ldots, \sigma_d$ the embeddings of $K$ in $\mathbb{C}$. Then the trace and norm of $\alpha \in K$ over $\mathbb{Q}$ are given by, respectively,
\[ \mathrm{Tr}_{K/\mathbb{Q}}(\alpha) = \sum_{i=1}^{d}\sigma_i(\alpha), \qquad N_{K/\mathbb{Q}}(\alpha) = \prod_{i=1}^{d}\sigma_i(\alpha). \]
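Up to sign, these numbers are coefficients of $\chi_{\alpha,K}$. So by Lemma 3.23 these numbers belong to $\mathbb{Q}$; moreover, if $\alpha \in O_K$ then by Lemma 3.24 they belong to $\mathbb{Z}$. Notice that for $\alpha, \beta \in K$ we have
\[ \mathrm{Tr}_{K/\mathbb{Q}}(\alpha + \beta) = \mathrm{Tr}_{K/\mathbb{Q}}(\alpha) + \mathrm{Tr}_{K/\mathbb{Q}}(\beta), \qquad N_{K/\mathbb{Q}}(\alpha\beta) = N_{K/\mathbb{Q}}(\alpha)N_{K/\mathbb{Q}}(\beta). \]

Lemma 3.25. Let $\alpha \in O_K$. Then $\alpha \in O_K^* \iff N_{K/\mathbb{Q}}(\alpha) = \pm 1$.

Proof. $\Longrightarrow$ Both $\alpha$, $\alpha^{-1}$ are in $O_K$, hence $N_{K/\mathbb{Q}}(\alpha) \in \mathbb{Z}$, $N_{K/\mathbb{Q}}(\alpha^{-1}) \in \mathbb{Z}$. But $N_{K/\mathbb{Q}}(\alpha)N_{K/\mathbb{Q}}(\alpha^{-1}) = 1$, hence $N_{K/\mathbb{Q}}(\alpha) = \pm 1$. $\Longleftarrow$ Without loss of generality, $\sigma_1(\alpha) = \alpha$. Then $\alpha^{-1} = \pm\sigma_2(\alpha) \cdots \sigma_d(\alpha)$ is an algebraic integer, hence belongs to $O_K$.

Exercise 3.3. Let $K$ be a quadratic number field, that is, an algebraic number field of degree 2.
(i) Prove that $K = \mathbb{Q}(\sqrt{d}) = \{a + b\sqrt{d} : a, b \in \mathbb{Q}\}$, where $d \in \mathbb{Z} \setminus \{0, 1\}$ and $d$ is not divisible by the square of an integer $\neq 1$. Also determine the two embeddings of $K$ in $\mathbb{C}$.
(ii) Let $\alpha = a + b\sqrt{d}$ with $a, b \in \mathbb{Q}$. Prove the following: if $d \equiv 2 \pmod 4$ or $d \equiv 3 \pmod 4$ then $\alpha \in O_K$ if and only if $a, b \in \mathbb{Z}$; if $d \equiv 1 \pmod 4$ then $\alpha \in O_K$ if and only if $2a, 2b \in \mathbb{Z}$ and $2a \equiv 2b \pmod 2$.
Hint. Determine the minimal polynomial $f_\alpha$ of $\alpha$ and use that its coefficients are in $\mathbb{Z}$.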
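Lemmas 3.24 and 3.25 are easy to try out in a quadratic field. The sketch below is an illustration under stated assumptions, not part of the notes: it takes $K = \mathbb{Q}(\sqrt{d})$ with the two embeddings $\sqrt{d} \mapsto \pm\sqrt{d}$, so that the characteristic polynomial of $a + b\sqrt{d}$ is $X^2 - 2aX + (a^2 - db^2)$. The function names are hypothetical.

```python
from fractions import Fraction

def trace_and_norm(a, b, d):
    """Trace and norm over Q of a + b*sqrt(d), via sqrt(d) -> +sqrt(d), -sqrt(d)."""
    a, b = Fraction(a), Fraction(b)
    trace = 2 * a              # (a + b*sqrt(d)) + (a - b*sqrt(d))
    norm = a * a - d * b * b   # (a + b*sqrt(d)) * (a - b*sqrt(d))
    return trace, norm

def is_algebraic_integer(a, b, d):
    """Lemma 3.24: X^2 - Tr*X + N must have coefficients in Z."""
    trace, norm = trace_and_norm(a, b, d)
    return trace.denominator == 1 and norm.denominator == 1

def is_unit(a, b, d):
    """Lemma 3.25: an element of O_K is a unit exactly when its norm is +1 or -1."""
    return is_algebraic_integer(a, b, d) and trace_and_norm(a, b, d)[1] in (1, -1)

half = Fraction(1, 2)
print(trace_and_norm(1, 1, 2), is_unit(1, 1, 2))              # 1 + sqrt(2)
print(trace_and_norm(half, half, 5), is_unit(half, half, 5))  # (1 + sqrt(5))/2
print(is_algebraic_integer(half, half, 3))                    # (1 + sqrt(3))/2
```

The last call returns `False` because the norm of $(1 + \sqrt{3})/2$ is $-1/2$, which is not an integer.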
3.5
Galois theory
Let $K$ be an algebraic number field with $K \subset \mathbb{C}$. The field $K$ is called normal if $K = \mathbb{Q}(\alpha_1, \ldots, \alpha_r)$ where $\alpha_1, \ldots, \alpha_r$ are such that $(X - \alpha_1) \cdots (X - \alpha_r) \in \mathbb{Q}[X]$.

Let $K$ be a normal algebraic number field and $\alpha_1, \ldots, \alpha_r$ as above. Let $\sigma$ be an embedding of $K$ in $\mathbb{C}$. In general, if $f \in \mathbb{Q}[X]$ and $\alpha$ is a zero of $f$ in $K$, then $f(\sigma(\alpha)) = 0$. As a consequence, $\sigma$ permutes $\alpha_1, \ldots, \alpha_r$. Since $K$ consists of rational functions in $\alpha_1, \ldots, \alpha_r$ with coefficients in $\mathbb{Q}$, this implies that $\sigma$ is an isomorphism mapping $K$ to itself, i.e., an automorphism of $K$. The automorphisms of $K$ form a group under composition, the Galois group of $K$, notation $\mathrm{Gal}(K/\mathbb{Q})$. We state without proof some properties of the Galois group.
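Proposition 3.26. Let $K$ be a normal algebraic number field, and $\mathrm{Gal}(K/\mathbb{Q})$ its Galois group.
(i) $\mathrm{Gal}(K/\mathbb{Q})$ is a group of order $[K : \mathbb{Q}]$. We have $\{x \in K : \sigma(x) = x \ \forall \sigma \in \mathrm{Gal}(K/\mathbb{Q})\} = \mathbb{Q}$.
(ii) There is a bijection between the subgroups of $\mathrm{Gal}(K/\mathbb{Q})$ and the subfields of $K$, given by
\[ H \leq \mathrm{Gal}(K/\mathbb{Q}) \ \longrightarrow\ K^H := \{x \in K : \sigma(x) = x \ \forall \sigma \in H\}, \]
\[ \mathrm{Gal}(K/L) := \{\sigma \in \mathrm{Gal}(K/\mathbb{Q}) : \sigma|_L = \mathrm{id}\} \ \longleftarrow\ L, \]
and the order of $H$ is equal to $[K : K^H]$.
(iii) Let $f \in \mathbb{Q}[X]$ be an irreducible polynomial having at least one root in $K$. Then all roots of $f$ lie in $K$, for any two roots $\alpha, \beta$ of $f$ there is $\sigma \in \mathrm{Gal}(K/\mathbb{Q})$ such that $\sigma(\alpha) = \beta$, and each $\sigma \in \mathrm{Gal}(K/\mathbb{Q})$ permutes the roots of $f$.

Remarks. 1. The bijection in (ii) reverses inclusions: if $H_1$ is a subgroup of $H_2$, then $K^{H_2}$ is a subfield of $K^{H_1}$.
2. Every algebraic number field $K$ can be extended to a normal number field. Let $K = \mathbb{Q}(\theta)$. Then for the minimal polynomial $f_\theta$ of $\theta$ we have $f_\theta = (X - \theta^{(1)}) \cdots (X - \theta^{(d)})$. Clearly, $N := \mathbb{Q}(\theta^{(1)}, \ldots, \theta^{(d)})$ is normal, and $N$ contains $K$.

Example. Let $K = \mathbb{Q}(\sqrt{2}, \sqrt{3})$. Then $K = \mathbb{Q}(\sqrt{2}, -\sqrt{2}, \sqrt{3}, -\sqrt{3})$ is generated by the four roots of $(X^2 - 2)(X^2 - 3)$, hence it is normal. We have seen before that $[K : \mathbb{Q}] = 4$, and that $K$ has basis $\{1, \sqrt{2}, \sqrt{3}, \sqrt{6}\}$ over $\mathbb{Q}$. Hence $G := \mathrm{Gal}(K/\mathbb{Q})$ has order 4. Any $\sigma \in \mathrm{Gal}(K/\mathbb{Q})$ maps $\sqrt{2}$ to a root of $X^2 - 2$, hence to $\pm\sqrt{2}$. Likewise, $\sigma$ maps $\sqrt{3}$ to $\pm\sqrt{3}$. Thus, $G = \{\sigma_{ab} : a, b = 1, -1\}$, where $\sigma_{ab}(\sqrt{2}) = a\sqrt{2}$, $\sigma_{ab}(\sqrt{3}) = b\sqrt{3}$. It is easy to check that $G$ is the Klein four group, with $\sigma_{11}$ the identity. The table below gives the subgroups of $G$ and the corresponding subfields of $K$.

    H                                   K^H = {x ∈ K : σ(x) = x ∀σ ∈ H}
    $\{\mathrm{id}\}$                   $\mathbb{Q}(\sqrt{2}, \sqrt{3})$
    $\{\mathrm{id}, \sigma_{1,-1}\}$    $\mathbb{Q}(\sqrt{2})$
    $\{\mathrm{id}, \sigma_{-1,1}\}$    $\mathbb{Q}(\sqrt{3})$
    $\{\mathrm{id}, \sigma_{-1,-1}\}$   $\mathbb{Q}(\sqrt{6})$
    $G$                                 $\mathbb{Q}$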
As an example, we compute the subfield $K^H$ corresponding to the subgroup $H = \{\mathrm{id}, \sigma_{-1,-1}\}$ of $G$. Recall that $\{1, \sqrt{2}, \sqrt{3}, \sqrt{6}\}$ is a basis of $K$. Thus, every element of $K$ can be expressed uniquely as $x_0 + x_1\sqrt{2} + x_2\sqrt{3} + x_3\sqrt{6}$ with $x_i \in \mathbb{Q}$. Now $\sigma_{-1,-1}$ maps $\beta = x_0 + x_1\sqrt{2} + x_2\sqrt{3} + x_3\sqrt{6}$ to $x_0 - x_1\sqrt{2} - x_2\sqrt{3} + x_3\sqrt{6}$, and thus $\sigma_{-1,-1}(\beta) = \beta$ if and only if $x_1 = x_2 = 0$. This shows that $K^H = \mathbb{Q}(\sqrt{6})$.

Exercise 3.4. Let $\sqrt[3]{2}$ be the real cube root of 2, and $\zeta$ a primitive cube root of unity.
(i) Prove that the field $K := \mathbb{Q}(\sqrt[3]{2}, \zeta)$ is normal and that $[K : \mathbb{Q}] = 6$.
(ii) Determine the subfields of $K$.
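The subgroup/subfield table for $\mathbb{Q}(\sqrt{2}, \sqrt{3})$ can be reproduced mechanically. The sketch below is an illustration, not part of the notes: it encodes each $\sigma_{ab}$ by the signs with which it scales the basis $1, \sqrt{2}, \sqrt{3}, \sqrt{6}$ (using $\sigma_{ab}(\sqrt{6}) = ab\sqrt{6}$). Because every $\sigma_{ab}$ acts diagonally on this basis, the fixed field of a subgroup is spanned by the basis elements that all its members leave unchanged. The names `BASIS`, `sigma`, `compose` and `fixed_basis` are hypothetical.

```python
from itertools import product

BASIS = ("1", "sqrt2", "sqrt3", "sqrt6")

def sigma(a, b):
    """Signs by which sigma_ab scales the basis elements 1, sqrt2, sqrt3, sqrt6."""
    return (1, a, b, a * b)

def compose(s, t):
    """Composition of two diagonal sign maps is the entrywise product of signs."""
    return tuple(x * y for x, y in zip(s, t))

# The four automorphisms, indexed by (a, b) with a, b in {1, -1}.
G = {(a, b): sigma(a, b) for a, b in product((1, -1), repeat=2)}

def fixed_basis(H):
    """Basis elements fixed by every sigma_ab with (a, b) in H; they span K^H."""
    return [BASIS[i] for i in range(4) if all(G[h][i] == 1 for h in H)]

subgroups = [
    [(1, 1)],
    [(1, 1), (1, -1)],
    [(1, 1), (-1, 1)],
    [(1, 1), (-1, -1)],
    list(G),
]
for H in subgroups:
    print(H, "->", fixed_basis(H))
```

Every element of `G` composed with itself gives the identity signs, which reflects that $G$ is the Klein four group.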
Chapter 4 Transcendence results

We recall some basic definitions. We call $\alpha \in \mathbb{C}$ transcendental if it is not algebraic, i.e., if it is not a zero of a non-zero polynomial from $\mathbb{Q}[X]$. We call numbers $\alpha_1, \ldots, \alpha_n \in \mathbb{C}$ algebraically independent if they are algebraically independent over $\mathbb{Q}$, i.e., if there is no non-zero polynomial $P \in \mathbb{Q}[X_1, \ldots, X_n]$ such that $P(\alpha_1, \ldots, \alpha_n) = 0$. According to Proposition 3.8, this is equivalent to $\alpha_1, \ldots, \alpha_n$ being algebraically independent over $\overline{\mathbb{Q}}$, the field of algebraic numbers. Given a subset $S$ of $\mathbb{C}$, we define the transcendence degree of $S$, notation $\mathrm{trdeg}\, S$, to be the maximal number $t$ such that $S$ contains $t$ algebraically independent elements.
4.1
The transcendence of e.
We define as usual $e^z = \sum_{n=0}^{\infty} z^n/n!$ for $z \in \mathbb{C}$. Further, $\overline{\mathbb{Q}} = \{\alpha \in \mathbb{C} : \alpha \text{ algebraic over } \mathbb{Q}\}$.
Theorem 4.1 (Hermite, 1873). $e$ is transcendental.

We assume that $e$ is algebraic. This means that there are $q_0, q_1, \ldots, q_n \in \mathbb{Z}$ with
\[ q_0 + q_1e + \cdots + q_ne^n = 0, \qquad q_0 \neq 0. \tag{4.1} \]
Under this hypothesis, we construct $M \in \mathbb{Z}$ with $M \neq 0$ and $|M| < 1$ and obtain a contradiction. We need some auxiliary results. Of course we have to use certain properties of $e$. We use that $(e^z)' = e^z$.

Let $f \in \mathbb{C}[X]$ be a polynomial. For $z \in \mathbb{C}$ we define
\[ F(z) := \int_0^z e^{z-u}f(u)\,du. \tag{4.2} \]
Here the integration is over the line segment from $0$ to $z$. We may parametrize this line segment by $u = tz$, $0 \leq t \leq 1$. Thus,
\[ F(z) = \int_0^1 e^{z(1-t)}f(zt)\,z\,dt. \]
Lemma 4.2. Suppose $f$ has degree $m$. Then
\[ F(z) = e^z\Big(\sum_{j=0}^{m} f^{(j)}(0)\Big) - \sum_{j=0}^{m} f^{(j)}(z). \]

Proof. Repeated integration by parts.

Corollary 4.3. Let $f$ be as in Lemma 4.2. Then
\[ q_0F(0) + \cdots + q_nF(n) = -\sum_{a=0}^{n}\sum_{j=0}^{m} q_af^{(j)}(a). \]

Proof. Apply Lemma 4.2 with $z = a$, multiply by $q_a$ and sum over $a = 0, \ldots, n$. The terms containing $e^a$ add up to $\big(\sum_{a=0}^{n} q_ae^a\big)\big(\sum_{j=0}^{m} f^{(j)}(0)\big)$, which is $0$ by (4.1).

Our aim is to show that for a suitable choice of $f$, the quantity $M := q_0F(0) + \cdots + q_nF(n)$ is a non-zero integer with $|M| < 1$. Note that Corollary 4.3 gives an identity with an analytic expression on the left-hand side, and an algebraic expression on the right-hand side. We prove that $M$ is a non-zero integer by analyzing the right-hand side, and $|M| < 1$ by analyzing the left-hand side. For the latter, we need the following simple estimate.

Lemma 4.4. Let $f \in \mathbb{C}[X]$ be any polynomial and let $F$ be given by (4.2). Then for $z \in \mathbb{C}$ we have
\[ |F(z)| \leq |z| \cdot e^{|z|} \cdot \sup_{u \in \mathbb{C},\, |u| \leq |z|} |f(u)|. \]
Proof. We have
\[ |F(z)| \leq \int_0^1 |e^{z(1-t)}f(zt)z|\,dt \leq \int_0^1 e^{|z|}|z| \cdot |f(zt)|\,dt \leq |z| \cdot e^{|z|} \cdot \sup_{u \in \mathbb{C},\, |u| \leq |z|} |f(u)|. \]
Let $p$ be a prime number, which is chosen later to be sufficiently large to make all estimates work. We take
\[ f(X) := \frac{1}{(p-1)!} \cdot X^{p-1}\{(X-1)(X-2) \cdots (X-n)\}^p. \tag{4.3} \]
In this case,
\[ M = \sum_{a=0}^{n} q_aF(a) = -\sum_{a=0}^{n}\sum_{j=0}^{np+p-1} q_af^{(j)}(a). \tag{4.4} \]
Lemma 4.5. We have
\[ f^{(p-1)}(0) = \{(-1)^n n!\}^p; \tag{4.5} \]
\[ f^{(j)}(a) = 0 \ \text{ for } a = 0, \ldots, n,\ j = 0, \ldots, p-1,\ (a, j) \neq (0, p-1); \tag{4.6} \]
\[ f^{(j)}(a) \equiv 0 \pmod p \ \text{ for } a = 0, \ldots, n,\ j \geq p. \tag{4.7} \]
Proof. In general, if $g$ is a polynomial of the shape $(X - a)^rh$ with $a \in \mathbb{C}$, $h \in \mathbb{C}[X]$, then $g^{(m)}(a) = 0$ for $m = 0, \ldots, r-1$ and $g^{(r)}(a) = r!\,h(a)$. This implies (4.5), (4.6). To prove (4.7), observe that for any $g = c_rX^r + \cdots + c_0 \in \mathbb{C}[X]$ and all $j \geq 0$ we have
\[ \frac{1}{j!}g^{(j)} = c_r\binom{r}{j}X^{r-j} + c_{r-1}\binom{r-1}{j}X^{r-j-1} + \cdots + c_j. \tag{4.8} \]
In particular, since $(p-1)!f \in \mathbb{Z}[X]$ and the binomial coefficients are integers, we have for $j \geq p$, $a = 0, \ldots, n$ that $(p-1)!f^{(j)}/j! \in \mathbb{Z}[X]$. Since $j!/(p-1)!$ is an integer divisible by $p$ for $j \geq p$, it follows that $f^{(j)}(a)/p \in \mathbb{Z}$. This implies at once (4.7).

Lemma 4.6. Assume that $p > |q_0n|$. Then $M$ is a non-zero integer.

Proof. Since $p > |q_0n|$, the prime $p$ divides neither $q_0$ nor $n!$. So from (4.5) it follows that the term $q_0f^{(p-1)}(0)$ is an integer not divisible by $p$, while by (4.6) and (4.7) all other terms $q_af^{(j)}(a)$ in the right-hand side of (4.4) are integers that are either $0$ or divisible by $p$. Hence $M$ is an integer not divisible by $p$; in particular, $M \neq 0$.
Lemma 4.7. For $p$ sufficiently large, we have $|M| < 1$.

Proof. By Lemma 4.4, we have for $a = 0, \ldots, n$,
\[ |F(a)| \leq a \cdot e^{a} \cdot \sup_{|u| \leq a} |f(u)|. \]
For $a, b = 0, \ldots, n$, and $u \in \mathbb{C}$ with $|u| \leq a$ we have $|u - b| \leq |u| + |b| \leq 2n$. Hence
\[ \sup_{|u| \leq a} |f(u)| \leq \frac{(2n)^{np+p-1}}{(p-1)!} \leq \frac{c^p}{(p-1)!}, \]
say, where $c$ is a constant independent of $p, a, b$. This implies
\[ |M| \leq \sum_{a=0}^{n} |q_aF(a)| \leq \Big(\sum_{a=0}^{n} |q_a| \cdot a \cdot e^{a}\Big)\frac{c^p}{(p-1)!}. \]
For $p$ sufficiently large this is $< 1$, since for any $c > 1$, $\frac{c^p}{(p-1)!} \to 0$ as $p \to \infty$.
Summarizing, our assumption that $e$ is algebraic implies that there is a quantity $M$ which is, by Lemma 4.6, a non-zero integer and, by Lemma 4.7, of absolute value $< 1$. Since this is absurd, $e$ must be transcendental.
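The arithmetic core of the proof, Lemma 4.5, can be checked by exact computation for small parameters. The following is a minimal sketch, not part of the notes: it builds $(p-1)!\,f$ from (4.3) as an integer coefficient list, differentiates it repeatedly, and returns all values $f^{(j)}(a)$ as exact fractions. The helper names are hypothetical, and the parameters $n = 2$, $p = 5$ are chosen only to keep the output small.

```python
from fractions import Fraction
from math import factorial

def poly_mul(u, v):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(u) + len(v) - 1)
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            out[i + j] += a * b
    return out

def derivative(u):
    """Formal derivative; the derivative of a constant is the empty list."""
    return [i * c for i, c in enumerate(u)][1:]

def evaluate(u, x):
    """Horner evaluation of the polynomial u at x."""
    acc = 0
    for c in reversed(u):
        acc = acc * x + c
    return acc

def hermite_values(n, p):
    """Map (a, j) -> f^(j)(a) for f = X^(p-1) ((X-1)...(X-n))^p / (p-1)!."""
    g = [0] * (p - 1) + [1]              # g = (p-1)! * f, starting from X^(p-1)
    for a in range(1, n + 1):
        for _ in range(p):
            g = poly_mul(g, [-a, 1])     # multiply by (X - a)
    values = {}
    j = 0
    while g:                             # stops after the (np+p-1)-th derivative
        for a in range(n + 1):
            values[(a, j)] = Fraction(evaluate(g, a), factorial(p - 1))
        g = derivative(g)
        j += 1
    return values

n, p = 2, 5
vals = hermite_values(n, p)
print(vals[(0, p - 1)], ((-1) ** n * factorial(n)) ** p)   # both sides of (4.5)
print(all(v == 0 for (a, j), v in vals.items()
          if j < p and (a, j) != (0, p - 1)))               # (4.6)
print(all(v.denominator == 1 and v.numerator % p == 0
          for (a, j), v in vals.items() if j >= p))         # (4.7)
```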
4.2
The Lindemann-Weierstrass theorem
Lindemann proved in 1882 that $e^\alpha$ is transcendental for non-zero algebraic $\alpha$, and Weierstrass proved in 1885 that if $\alpha_1, \ldots, \alpha_n$ are algebraic numbers that are linearly independent over $\mathbb{Q}$, then $e^{\alpha_1}, \ldots, e^{\alpha_n}$ are algebraically independent over $\mathbb{Q}$. The following result, due to A. Baker, is in fact equivalent to the Lindemann-Weierstrass Theorem.

Theorem 4.8. Let $\alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n \in \overline{\mathbb{Q}}$. Suppose that $\alpha_1, \ldots, \alpha_n$ are pairwise distinct, and that $\beta_1, \ldots, \beta_n \neq 0$. Then
\[ \beta_1e^{\alpha_1} + \cdots + \beta_ne^{\alpha_n} \neq 0. \]

Before proving this theorem, we state some corollaries.

Corollary 4.9. (i) Let $\alpha \in \overline{\mathbb{Q}}$ be non-zero. Then $e^\alpha$ is transcendental.
(ii) $\pi$ is transcendental.
Proof. (i) Suppose that $e^\alpha =: \beta$ is algebraic. Then it follows that $1 \cdot e^\alpha - \beta \cdot e^0 = 0$, contradicting Theorem 4.8.
(ii) Suppose that $\pi$ is algebraic. Then $\pi i$ is algebraic. But $e^{\pi i} = -1$ is not transcendental, contradicting (i).

Corollary 4.10. (i) Let $\alpha \in \overline{\mathbb{Q}}$ and $\alpha \neq 0$. Then $\sin\alpha$, $\cos\alpha$, and $\tan\alpha$ are transcendental.
(ii) Let $\alpha \in \overline{\mathbb{Q}}$ and $\alpha \neq 0, 1$. Then $\log\alpha$ is transcendental (for any choice of $\log\alpha$).

Exercise 4.1. Prove Corollary 4.10.

Corollary 4.11. Let $\alpha_1, \ldots, \alpha_n \in \overline{\mathbb{Q}}$. Then the following two assertions are equivalent:
(i) $\alpha_1, \ldots, \alpha_n$ are linearly independent over $\mathbb{Q}$;
(ii) $e^{\alpha_1}, \ldots, e^{\alpha_n}$ are algebraically independent.

Exercise 4.2. Prove Corollary 4.11.

We start with some preliminary comments on the proof of Theorem 4.8. Our proof of the transcendence of $e$ was by contradiction: we assumed that $q_0 + q_1e + \cdots + q_ne^n = 0$ for certain rational integers $q_0, \ldots, q_n$, and constructed from this a non-zero integer $M$ with $|M| < 1$. To prove the Lindemann-Weierstrass Theorem, we may again proceed by contradiction and assume that $\beta_1e^{\alpha_1} + \cdots + \beta_ne^{\alpha_n} = 0$. By following the transcendence proof of $e$, but replacing $0, 1, \ldots, n$ by $\alpha_1, \ldots, \alpha_n$ and $q_0, \ldots, q_n$ by $\beta_1, \ldots, \beta_n$, we obtain a non-zero algebraic integer $M$, not necessarily in $\mathbb{Q}$, such that $|M| < 1$. This is, however, not a contradiction. For instance, $\frac{1}{2}(1 - \sqrt{5})$ is an algebraic integer of absolute value $< 1$.

Instead, we use the following observation. Let $\alpha$ be in the ring of integers of an algebraic number field $K$. Recall that the characteristic polynomial of $\alpha$ is $\chi_{\alpha,K}(X) = \prod_\sigma (X - \sigma(\alpha))$ where the product is over all embeddings $\sigma : K \hookrightarrow \mathbb{C}$ of $K$. This polynomial has its coefficients in $\mathbb{Z}$. So in particular the norm $N_{K/\mathbb{Q}}(\alpha) = \prod_\sigma \sigma(\alpha)$ is in $\mathbb{Z}$. If $\alpha \neq 0$, we have $N_{K/\mathbb{Q}}(\alpha) \neq 0$, hence $|N_{K/\mathbb{Q}}(\alpha)| \geq 1$. So from the assumption that Theorem 4.8 is false we have to construct somehow a non-zero algebraic integer whose norm has absolute value $< 1$.

If we just follow the transcendence proof of $e$ without any modifications, we only get a non-zero algebraic integer $M$ with $|M| < 1$, but we cannot show that its norm has absolute value $< 1$. A priori, for some embedding $\sigma$, the quantity $|\sigma(M)|$ may be very large, and then $|N_{K/\mathbb{Q}}(M)|$ may be $\geq 1$.

To circumvent this, we deduce from the expression $\sum_{i=1}^{n}\beta_ie^{\alpha_i}$ a new expression $\sum_{i=1}^{t}\delta_ie^{\gamma_i}$, where the $\gamma_i, \delta_i$ satisfy certain symmetry conditions. These symmetry conditions allow us to construct, under the hypothesis $\sum_{i=1}^{t}\delta_ie^{\gamma_i} = 0$, a non-zero algebraic integer having a norm with absolute value $< 1$. Thus, we obtain a weaker version of the Lindemann-Weierstrass Theorem, which asserts that under the said symmetry conditions, $\sum_{i=1}^{t}\delta_ie^{\gamma_i} \neq 0$. But as will be seen, this weaker version implies the general Lindemann-Weierstrass Theorem.

Theorem 4.12 ("Weak Lindemann-Weierstrass Theorem"). Let $L \subset \mathbb{C}$ be a normal algebraic number field. Let $\gamma_1, \ldots, \gamma_t, \delta_1, \ldots, \delta_t \in L$ be such that $\gamma_1, \ldots, \gamma_t$ are distinct, $\delta_1 \cdots \delta_t \neq 0$, and suppose moreover that each $\tau \in \mathrm{Gal}(L/\mathbb{Q})$ permutes the pairs $(\gamma_1, \delta_1), \ldots, (\gamma_t, \delta_t)$. Then
\[ \delta_1e^{\gamma_1} + \cdots + \delta_te^{\gamma_t} \neq 0. \]

We say that $\tau$ permutes the pairs $(\gamma_1, \delta_1), \ldots, (\gamma_t, \delta_t)$ if $(\tau(\gamma_1), \tau(\delta_1)), \ldots, (\tau(\gamma_t), \tau(\delta_t))$ is a permutation of $(\gamma_1, \delta_1), \ldots, (\gamma_t, \delta_t)$.

We first prove the implication Theorem 4.12 $\Longrightarrow$ Theorem 4.8. After that, we prove Theorem 4.12.

Theorem 4.12 $\Longrightarrow$ Theorem 4.8. Assume that Theorem 4.8 is false. This means that there are $\alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n \in \overline{\mathbb{Q}}$ such that $\alpha_1, \ldots, \alpha_n$ are distinct, $\beta_1, \ldots, \beta_n$ are non-zero, and
\[ \beta_1e^{\alpha_1} + \cdots + \beta_ne^{\alpha_n} = 0. \]
We derive from this a contradiction to Theorem 4.12.

Let $L$ be the number field generated by $\alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n$ and their conjugates. Then $L$ is a normal number field. Let $\mathrm{Gal}(L/\mathbb{Q}) = \{\tau_1, \ldots, \tau_d\}$. Recall that if $\gamma \in L$, then the set $\{\tau_1(\gamma), \ldots, \tau_d(\gamma)\}$ contains all conjugates of $\gamma$. Since the factor belonging to the identity of $\mathrm{Gal}(L/\mathbb{Q})$ vanishes, we clearly have
\[ \prod_{i=1}^{d}\Big(\tau_i(\beta_1)e^{\tau_i(\alpha_1)} + \cdots + \tau_i(\beta_n)e^{\tau_i(\alpha_n)}\Big) = 0. \]
By expanding the product, we get
\[ \sum_{i_1=1}^{n} \cdots \sum_{i_d=1}^{n} \tau_1(\beta_{i_1}) \cdots \tau_d(\beta_{i_d}) \cdot e^{\tau_1(\alpha_{i_1}) + \cdots + \tau_d(\alpha_{i_d})} = 0. \tag{4.9} \]
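Each $\tau \in \mathrm{Gal}(L/\mathbb{Q})$ permutes the pairs
\[ \big(\tau_1(\alpha_{i_1}) + \cdots + \tau_d(\alpha_{i_d}),\ \tau_1(\beta_{i_1}) \cdots \tau_d(\beta_{i_d})\big), \]
since $\tau\tau_1, \ldots, \tau\tau_d$ is a permutation of $\tau_1, \ldots, \tau_d$.

The exponents $\tau_1(\alpha_{i_1}) + \cdots + \tau_d(\alpha_{i_d})$ need not be distinct. We group together the terms with equal exponents. Let $\gamma_1, \ldots, \gamma_s$ be the distinct values among $\tau_1(\alpha_{i_1}) + \cdots + \tau_d(\alpha_{i_d})$ ($1 \leq i_1, \ldots, i_d \leq n$), and for $k = 1, \ldots, s$, denote by $J_k$ the set of tuples $(i_1, \ldots, i_d)$ such that $\tau_1(\alpha_{i_1}) + \cdots + \tau_d(\alpha_{i_d}) = \gamma_k$. Then (4.9) becomes
\[ \sum_{k=1}^{s}\delta_ke^{\gamma_k} = 0, \quad \text{where } \delta_k = \sum_{(i_1, \ldots, i_d) \in J_k} \tau_1(\beta_{i_1}) \cdots \tau_d(\beta_{i_d}). \tag{4.10} \]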
Notice that each $\tau \in \mathrm{Gal}(L/\mathbb{Q})$ permutes the pairs $(\gamma_1, \delta_1), \ldots, (\gamma_s, \delta_s)$.

A priori, all coefficients $\delta_k$ might be $0$. However, we show that there is a tuple $(i_1, \ldots, i_d)$ such that $\tau_1(\alpha_{i_1}) + \cdots + \tau_d(\alpha_{i_d})$ is different from all the other exponents. Thus, for some $k$, the set $J_k$ has cardinality 1, and $\delta_k \neq 0$.

Define a total ordering on $\mathbb{C}$ by setting $\theta < \zeta$ if $\mathrm{Re}\,\theta < \mathrm{Re}\,\zeta$ or if $\mathrm{Re}\,\theta = \mathrm{Re}\,\zeta$ and $\mathrm{Im}\,\theta < \mathrm{Im}\,\zeta$. This ordering has the property that if $\theta_j, \zeta_j$ are complex numbers with $\theta_j \leq \zeta_j$ for $j = 1, \ldots, r$, and $\theta_j < \zeta_j$ for at least one $j$, then $\sum_{j=1}^{r}\theta_j < \sum_{j=1}^{r}\zeta_j$.

Since $\alpha_1, \ldots, \alpha_n$ were assumed to be distinct, for each $\tau \in \mathrm{Gal}(L/\mathbb{Q})$, the numbers $\tau(\alpha_1), \ldots, \tau(\alpha_n)$ are distinct. Hence for each $k \in \{1, \ldots, d\}$, there is an index $i_k$ such that $\tau_k(\alpha_{i_k}) > \tau_k(\alpha_j)$ for $j \neq i_k$. This implies
\[ \tau_1(\alpha_{i_1}) + \cdots + \tau_d(\alpha_{i_d}) > \tau_1(\alpha_{j_1}) + \cdots + \tau_d(\alpha_{j_d}) \]
for all tuples $(j_1, \ldots, j_d) \neq (i_1, \ldots, i_d)$, and so $\tau_1(\alpha_{i_1}) + \cdots + \tau_d(\alpha_{i_d})$ is distinct from the other exponents.

Assume without loss of generality that $\delta_1, \ldots, \delta_t$ are the non-zero numbers among $\delta_1, \ldots, \delta_s$. Then (4.10) becomes
\[ \sum_{k=1}^{t}\delta_ke^{\gamma_k} = 0. \tag{4.11} \]
By construction, the numbers $\gamma_1, \ldots, \gamma_t$ are distinct algebraic numbers. Further, $\delta_1, \ldots, \delta_t$ are non-zero. As observed before, each $\tau \in \mathrm{Gal}(L/\mathbb{Q})$ permutes the pairs $(\gamma_1, \delta_1), \ldots, (\gamma_s, \delta_s)$ from (4.10). But then $\tau$ permutes also the pairs with $\delta_k \neq 0$, i.e., $\tau$ permutes $(\gamma_1, \delta_1), \ldots, (\gamma_t, \delta_t)$. Now Theorem 4.12 implies that (4.11) is false. Thus, our assumption that Theorem 4.8 is false leads to a contradiction.

Proof of Theorem 4.12. We follow the transcendence proof of $e$, with the necessary modifications. Before proceeding, we observe that there is no loss of generality in assuming that $\delta_1, \ldots, \delta_t$ are algebraic integers. Indeed, there is a positive $m \in \mathbb{Z}$ such that $m\delta_1, \ldots, m\delta_t$ are algebraic integers (e.g., we may take for $m$ the product of the denominators of $\delta_1, \ldots, \delta_t$), and clearly, the conditions and conclusion of Theorem 4.12 are unaffected if we replace $\delta_i$ by $m\delta_i$ for $i = 1, \ldots, t$.

Let $\gamma_1, \ldots, \gamma_t$ be distinct algebraic numbers and $\delta_1, \ldots, \delta_t$ non-zero algebraic integers from the normal number field $L$, such that each $\tau \in \mathrm{Gal}(L/\mathbb{Q})$ permutes the pairs $(\gamma_1, \delta_1), \ldots, (\gamma_t, \delta_t)$. Assume that
\[ \delta_1e^{\gamma_1} + \cdots + \delta_te^{\gamma_t} = 0. \tag{4.12} \]
Let again $p$ be a prime number. Further, let $l$ be a positive rational integer such that $l\gamma_1, \ldots, l\gamma_t$ are all algebraic integers (e.g., the product of the denominators of $\gamma_1, \ldots, \gamma_t$). For $k = 1, \ldots, t$, define
\[ f_k(X) := \frac{1}{(p-1)!} \cdot l^{tp}(X - \gamma_k)^{p-1}\prod_{\substack{j=1 \\ j \neq k}}^{t}(X - \gamma_j)^p, \]
\[ F_k(z) := \int_0^z e^{z-u}f_k(u)\,du, \]
\[ M_k := \delta_1F_k(\gamma_1) + \cdots + \delta_tF_k(\gamma_t). \]
We prove the following:
1) for each $\tau \in \mathrm{Gal}(L/\mathbb{Q})$ we have $\tau(M_1) \in \{M_1, \ldots, M_t\}$;
2) for sufficiently large $p$, $M_1$ is a non-zero algebraic integer;
3) $|M_k| < 1$ for $k = 1, \ldots, t$ and sufficiently large $p$.
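From 1) and 3) it follows that $|N_{L/\mathbb{Q}}(M_1)| < 1$, since $N_{L/\mathbb{Q}}(M_1)$ is the product of the numbers $\tau(M_1)$ ($\tau \in \mathrm{Gal}(L/\mathbb{Q})$), each of which has absolute value $< 1$. But this contradicts 2), because the norm of a non-zero algebraic integer has absolute value $\geq 1$.

Lemma 4.13. (i) We have
\[ M_k = -\sum_{j=1}^{t}\sum_{m=0}^{tp-1}\delta_jf_k^{(m)}(\gamma_j) \ \text{ for } k = 1, \ldots, t. \]
(ii) For each $\tau \in \mathrm{Gal}(L/\mathbb{Q})$ we have $\tau(M_1) \in \{M_1, \ldots, M_t\}$.

Proof. (i) This follows at once from Lemma 4.2 and our assumption $\sum_{j=1}^{t}\delta_je^{\gamma_j} = 0$.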
(ii) There is a permutation $\tau^*$ of $1, \ldots, t$ such that $(\tau(\gamma_k), \tau(\delta_k)) = (\gamma_{\tau^*(k)}, \delta_{\tau^*(k)})$ for $k = 1, \ldots, t$. By applying $\tau$ to the coefficients of $f_1$, we obtain
\[ \frac{l^{tp}}{(p-1)!}(X - \tau(\gamma_1))^{p-1}\prod_{j=2}^{t}(X - \tau(\gamma_j))^p = \frac{l^{tp}}{(p-1)!}(X - \gamma_{\tau^*(1)})^{p-1}\prod_{j=2}^{t}(X - \gamma_{\tau^*(j)})^p = f_{\tau^*(1)}. \]
Hence, by (i),
\[ \tau(M_1) = -\sum_{j=1}^{t}\sum_{m=0}^{tp-1}\tau(\delta_j)f_{\tau^*(1)}^{(m)}(\tau(\gamma_j)) = -\sum_{j=1}^{t}\sum_{m=0}^{tp-1}\delta_{\tau^*(j)}f_{\tau^*(1)}^{(m)}(\gamma_{\tau^*(j)}) = M_{\tau^*(1)}. \]
Given two algebraic numbers $\alpha, \beta$ and $m \in \mathbb{Z}_{>0}$, we write $\alpha \equiv \beta \pmod m$ if $(\alpha - \beta)/m$ is an algebraic integer.

Lemma 4.14. We have
\[ f_1^{(p-1)}(\gamma_1) = l^{tp}\Big\{\prod_{k=2}^{t}(\gamma_1 - \gamma_k)\Big\}^p, \tag{4.13} \]
\[ f_1^{(j)}(\gamma_m) = 0 \ \text{ for } m = 1, \ldots, t,\ j = 0, \ldots, p-1,\ (m, j) \neq (1, p-1), \tag{4.14} \]
\[ f_1^{(j)}(\gamma_m) \equiv 0 \pmod p \ \text{ for } m = 1, \ldots, t,\ j \geq p. \tag{4.15} \]
Proof. The proofs of (4.13) and (4.14) are completely analogous to those of (4.5) and (4.6) in Lemma 4.5. We prove only (4.15). Let $m \in \{1, \ldots, t\}$ and $j \geq p$. Define
\[ g(X) := f_1(X/l) = \frac{1}{(p-1)!} \cdot l\,(X - l\gamma_1)^{p-1}\prod_{k=2}^{t}(X - l\gamma_k)^p. \]
Then $(p-1)!\,g$ has algebraically integral coefficients. Using (4.8), one easily shows that the coefficients of $(p-1)!\,g^{(j)}/j!$ are algebraic integers. Hence for $j \geq p$, $g^{(j)}(l\gamma_m)/p$ is an algebraic integer, and therefore,
\[ \frac{f_1^{(j)}(\gamma_m)}{p} = \frac{l^jg^{(j)}(l\gamma_m)}{p} \]
is an algebraic integer. This implies at once (4.15).

Lemma 4.15. For $p$ sufficiently large, $M_1$ is a non-zero algebraic integer.

Proof. An application of Lemma 4.14 to the expression for $M_1$ in Lemma 4.13 (i) gives
\[ M_1 \equiv -\delta_1A^p \pmod p \quad \text{with } A := l^t\prod_{k=2}^{t}(\gamma_1 - \gamma_k). \]
Both $\delta_1$, $A$ are algebraic integers, hence $M_1$ is an algebraic integer. We prove that for sufficiently large $p$, $\delta_1A^p/p$ is not an algebraic integer. Then necessarily, $M_1 \neq 0$. Assume that $\delta_1A^p/p$ is an algebraic integer. Let $b = N_{L/\mathbb{Q}}(\delta_1)$, $B = N_{L/\mathbb{Q}}(A)$. Then $b, B \in \mathbb{Z}$, and the norm $N_{L/\mathbb{Q}}(\delta_1A^p/p) = bB^p/p^d$ is in $\mathbb{Z}$, where $d = [L : \mathbb{Q}]$. But this is impossible if $p > |bB|$.

Lemma 4.16. For $p$ sufficiently large we have $|M_k| < 1$ for $k = 1, \ldots, t$.

Exercise 4.3. Prove this lemma.

Thus our assumption that Theorem 4.12 is false implies Lemmas 4.13, 4.15 and 4.16, and these together give a contradiction.
4.3
Other transcendence results
We give an overview of some other transcendence results, without proof. Recall that the logarithm on $\mathbb{C}$ is a multi-valued function: the logarithm of a non-zero complex number $z$ is defined by $\log z := \log|z| + i\arg z$, where $\log|z|$ is the usual real logarithm of the positive number $|z|$ and $\arg z$ is any choice for the argument of $z$. In the theorems below, we fix a real $\varphi_0$, and take $\arg z \in [\varphi_0, \varphi_0 + 2\pi)$. Having fixed $\varphi_0$, we have fixed the argument, the logarithm, and also $a^b := e^{b\log a}$ for $a, b \in \mathbb{C}$ with $a \neq 0$, where $e^z := \sum_{n=0}^{\infty} z^n/n!$. The theorems hold for any choice of $\varphi_0$.

Theorem 4.17 (Gel'fond, Schneider, 1934). Let $\alpha, \beta \in \overline{\mathbb{Q}}$ with $\alpha \neq 0, 1$, $\beta \notin \mathbb{Q}$. Then $\alpha^\beta$ is transcendental.

Corollary 4.18. Let $\alpha \in \overline{\mathbb{Q}}$ with $\alpha \notin \mathbb{Q}i$. Then $e^{\pi\alpha}$ is transcendental.

Proof. Choose $\log(-1) = \pi i$. Then $e^{\pi\alpha} = e^{-i\alpha\log(-1)} = (-1)^{-i\alpha}$. Since $-i\alpha$ is algebraic and not in $\mathbb{Q}$, Theorem 4.17 applies.

Corollary 4.19. Let $\alpha_1, \alpha_2, \beta_1, \beta_2 \in \overline{\mathbb{Q}}$ be non-zero. Assume that $\log\alpha_1$, $\log\alpha_2$ are linearly independent over $\mathbb{Q}$. Then $\beta_1\log\alpha_1 + \beta_2\log\alpha_2 \neq 0$.

Proof. Suppose $\beta_1\log\alpha_1 + \beta_2\log\alpha_2 = 0$. Put $\gamma := -\beta_1/\beta_2$, so that $\log\alpha_2 = \gamma\log\alpha_1$. Then by assumption, $\gamma \notin \mathbb{Q}$, and
\[ \alpha_2 = e^{\log\alpha_2} = e^{\gamma\log\alpha_1} = \alpha_1^\gamma, \]
contradicting Theorem 4.17.

In 1966, A. Baker proved the following far-reaching generalization.

Theorem 4.20 (A. Baker, 1966). Let $\alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n \in \overline{\mathbb{Q}}$ be non-zero. Assume that $\log\alpha_1, \ldots, \log\alpha_n$ are linearly independent over $\mathbb{Q}$. Then $\beta_1\log\alpha_1 + \cdots + \beta_n\log\alpha_n$ is transcendental.

Definition. We say that non-zero complex numbers $\alpha_1, \ldots, \alpha_n$ are multiplicatively dependent if there are $x_1, \ldots, x_n \in \mathbb{Z}$, not all $0$, such that $\alpha_1^{x_1} \cdots \alpha_n^{x_n} = 1$. Otherwise, $\alpha_1, \ldots, \alpha_n$ are called multiplicatively independent.
Corollary 4.21. Let $n \geq 1$. Let $\alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n, \gamma \in \overline{\mathbb{Q}}$ be such that $\alpha_1, \ldots, \alpha_n \neq 0$, $\alpha_1, \ldots, \alpha_n$ are multiplicatively independent, and $(\beta_1, \ldots, \beta_n) \notin \mathbb{Q}^n$. Then $e^\gamma\alpha_1^{\beta_1} \cdots \alpha_n^{\beta_n}$ is transcendental.

Proof. Suppose that $\delta := e^\gamma\alpha_1^{\beta_1} \cdots \alpha_n^{\beta_n}$ is algebraic. We have
\[ e^{\log\delta} = e^{\gamma + \beta_1\log\alpha_1 + \cdots + \beta_n\log\alpha_n} \]
and so, for some $k \in \mathbb{Z}$,
\[ \log\delta = \gamma + \beta_1\log\alpha_1 + \cdots + \beta_n\log\alpha_n + 2k\pi i. \tag{4.16} \]
Note that $\log(-1) = (2l+1)\pi i$ for some $l \in \mathbb{Z}$ depending on the choice of the log. By Theorem 4.20, $\log\delta, \log\alpha_1, \ldots, \log\alpha_n$ and $\log(-1)$ are linearly dependent over $\mathbb{Q}$, that is, there are $x_1, \ldots, x_n, x_{n+1}, x_{n+2} \in \mathbb{Z}$, not all $0$, such that
\[ x_1\log\alpha_1 + \cdots + x_n\log\alpha_n + x_{n+1}\log(-1) + x_{n+2}\log\delta = 0. \tag{4.17} \]
Eliminating $\log\delta$ from (4.16) and (4.17), we get
\[ x_{n+2}\gamma + (x_{n+2}\beta_1 + x_1)\log\alpha_1 + \cdots + (x_{n+2}\beta_n + x_n)\log\alpha_n + \Big(x_{n+2} \cdot \frac{2k}{2l+1} + x_{n+1}\Big)\log(-1) = 0. \]
In (4.17), at least one of $x_1, \ldots, x_n, x_{n+2}$ is $\neq 0$, since $\log(-1) \neq 0$. Since $(\beta_1, \ldots, \beta_n) \notin \mathbb{Q}^n$ we have $x_{n+2}\beta_i + x_i \neq 0$ for at least one $i \in \{1, \ldots, n\}$. Applying again Theorem 4.20, we infer that there are $y_1, \ldots, y_{n+1} \in \mathbb{Z}$, not all $0$, such that
\[ y_1\log\alpha_1 + \cdots + y_n\log\alpha_n + y_{n+1}\log(-1) = 0. \]
In fact, at least one of $y_1, \ldots, y_n$ must be $\neq 0$. Now we get
\[ \alpha_1^{2y_1} \cdots \alpha_n^{2y_n} = e^{2y_1\log\alpha_1 + \cdots + 2y_n\log\alpha_n} = e^{-2y_{n+1}\log(-1)} = e^{-2y_{n+1}(2l+1)\pi i} = 1, \]
contrary to our assumption.

In fact, the conditions that the $\alpha_i$ be multiplicatively independent, and the $\beta_i$ not all in $\mathbb{Q}$, are needed only if $\gamma = 0$. These conditions can be dropped if we assume that $\gamma \neq 0$.
Exercise 4.4. Let $\alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n, \gamma \in \overline{\mathbb{Q}}$, and suppose that $\gamma, \alpha_1, \ldots, \alpha_n \neq 0$. Using Corollary 4.21, prove that $e^\gamma\alpha_1^{\beta_1} \cdots \alpha_n^{\beta_n}$ is transcendental.

There is a far-reaching conjecture, due to Schanuel, which implies all results mentioned before and much more.

Schanuel's Conjecture. (1960's) Let $x_1, \ldots, x_n$ be any (not necessarily algebraic) complex numbers which are linearly independent over $\mathbb{Q}$. Then
\[ \mathrm{trdeg}(x_1, \ldots, x_n, e^{x_1}, \ldots, e^{x_n}) \geq n. \]

We give some examples of known cases.

Examples. 1. Let $x \in \mathbb{C}^*$. Then either $x$ is transcendental, or $x$ is algebraic and then by Lindemann's Theorem, $e^x$ is transcendental. Hence $\mathrm{trdeg}(x, e^x) \geq 1$. Schanuel's Conjecture is still open for $n \geq 2$.
2. Let $\alpha_1, \ldots, \alpha_n \in \overline{\mathbb{Q}}$ and suppose they are linearly independent over $\mathbb{Q}$. By Corollary 4.11 (a consequence of the Lindemann-Weierstrass Theorem), $e^{\alpha_1}, \ldots, e^{\alpha_n}$ are algebraically independent. Hence $\mathrm{trdeg}(\alpha_1, \ldots, \alpha_n, e^{\alpha_1}, \ldots, e^{\alpha_n}) = n$.

We deduce some consequences of Schanuel's Conjecture which are still open.

Conjecture. $e$ and $\pi$ are algebraically independent.

Proof under assumption of Schanuel's Conjecture. The transcendence degree of a set of numbers does not change if some algebraic numbers are added to or removed from it. Moreover, the transcendence degree of this set does not change if we multiply its elements with non-zero algebraic numbers. So by Schanuel's conjecture, applied to $x_1 = 1$, $x_2 = \pi i$,
\[ \mathrm{trdeg}(e, \pi) = \mathrm{trdeg}(e, \pi i) = \mathrm{trdeg}(1, \pi i, e, e^{\pi i}) = 2. \]

Conjecture. Let $\alpha_1, \ldots, \alpha_n \in \overline{\mathbb{Q}}$ be such that $\alpha_1, \ldots, \alpha_n \neq 0$ and $\log\alpha_1, \ldots, \log\alpha_n$ are linearly independent over $\mathbb{Q}$. Then $\log\alpha_1, \ldots, \log\alpha_n$ are algebraically independent.

Proof under assumption of Schanuel's Conjecture. We have
\[ \mathrm{trdeg}(\log\alpha_1, \ldots, \log\alpha_n) = \mathrm{trdeg}(\log\alpha_1, \ldots, \log\alpha_n, \alpha_1, \ldots, \alpha_n) \geq n. \]
The above conjecture implies that for every non-zero polynomial $P \in \overline{\mathbb{Q}}[X_1, \ldots, X_n]$ we have $P(\log\alpha_1, \ldots, \log\alpha_n) \neq 0$. Baker's Theorem 4.20 implies that this holds for linear polynomials $P \in \overline{\mathbb{Q}}[X_1, \ldots, X_n]$, but even for quadratic polynomials $P$ this is still open. For instance, the above conjecture implies that $\log 2 \cdot \log 3$ is transcendental, but as yet not even this very special case could be proved.

We mention some other consequences of Schanuel's conjecture, the deduction of which is left as an exercise.

Exercise 4.5. Deduce the following from Schanuel's conjecture:
(i) Let $\alpha \in \overline{\mathbb{Q}}$, $\alpha \notin i\mathbb{Q}$. Then $\pi$ and $e^{\pi\alpha}$ are algebraically independent.
(ii) Let $\alpha, \beta \in \overline{\mathbb{Q}}$ with $\alpha \notin \{0, 1\}$ and $\beta$ of degree $d \geq 2$. Then $\alpha^\beta, \alpha^{\beta^2}, \ldots, \alpha^{\beta^{d-1}}$ are algebraically independent.
(iii) Define the sequence $\{x_n\}_{n=1}^{\infty}$ by $x_1 = e$ and $x_n = e^{x_{n-1}}$ for $n \geq 2$, i.e., $x_2 = e^e$, $x_3 = e^{e^e}$, etc. Then $x_1, \ldots, x_N$ are algebraically independent for every $N \geq 1$.
(iv) Let $\alpha \in \overline{\mathbb{Q}} \setminus \{0, 1\}$. Then $\log\alpha$, $\log\log\alpha$ are algebraically independent.
(v) Let $p, q$ be two distinct prime numbers. Then for every irrational $x \in \mathbb{R}$, at least one of the numbers $p^x$, $q^x$ is transcendental.

Remark. The following has been proved. In 1996, Nesterenko proved (among other things) that $\pi$, $e^\pi$ and $\Gamma(\frac{1}{4})$ are algebraically independent. Recall that $\Gamma(x) = \int_0^\infty t^{x-1}e^{-t}\,dt$ for $x > 0$, that $\Gamma(n) = (n-1)!$ for every positive integer $n$, and that $\Gamma(\frac{1}{2}) = \sqrt{\pi}$.

For $\alpha, \beta$ as in (ii), Diaz proved in 1989 that
\[ \mathrm{trdeg}(\alpha^\beta, \alpha^{\beta^2}, \ldots, \alpha^{\beta^{d-1}}) \geq [(d+1)/2] \]
where $[x]$ is the largest integer $\leq x$. This settles (ii) for $d = 3$.

In the 1960's, Lang and Ramachandra independently proved (among other things) that if $p_1, p_2, p_3$ are three distinct primes and $x$ an irrational real, then at least one of the numbers $p_1^x, p_2^x, p_3^x$ is transcendental.
4.4
Siegel’s Lemma
In the next section we will prove a special case of the Gel'fond-Schneider Theorem, namely that if $\alpha, \beta$ are real algebraic numbers with $\alpha \neq 0, 1$, $\beta \notin \mathbb{Q}$, then $\alpha^\beta$ is transcendental. In the present section we develop a tool which is very important in Diophantine approximation, the so-called Siegel's Lemma, which was formally stated for the first time by Siegel in 1929, but known before. Essentially, it states that under certain hypotheses, a system of $M$ homogeneous linear equations in $N$ unknowns
\[ \begin{array}{c} a_{11}x_1 + \cdots + a_{1N}x_N = 0 \\ \vdots \\ a_{M1}x_1 + \cdots + a_{MN}x_N = 0 \end{array} \tag{4.18} \]
has a non-trivial solution in integer coordinates, the absolute values of which are not too large.

Theorem 4.22 (Siegel's Lemma). Assume that $N > M > 0$, $A \geq 1$, and $a_{ij} \in \mathbb{Z}$, $|a_{ij}| \leq A$ for $i = 1, \ldots, M$, $j = 1, \ldots, N$. Then (4.18) has a solution $\mathbf{x} = (x_1, \ldots, x_N) \in \mathbb{Z}^N \setminus \{\mathbf{0}\}$ with
\[ \max_{1 \leq i \leq N} |x_i| \leq (NA)^{M/(N-M)}. \]
Proof. For $i = 1, \ldots, M$, $\mathbf{x} \in \mathbb{Z}^N$, put $l_i(\mathbf{x}) := \sum_{j=1}^{N} a_{ij}x_j$ and let
\[ -C_i := \sum_{j=1}^{N}\min(a_{ij}, 0), \qquad D_i := \sum_{j=1}^{N}\max(a_{ij}, 0). \]
Notice that $C_i + D_i \leq NA$. Let $B$ be a positive integer, and let $S_B$ be the set of $\mathbf{y} = (y_1, \ldots, y_N) \in \mathbb{Z}^N$ with $0 \leq y_j \leq B$ for $j = 1, \ldots, N$. For each $\mathbf{y} \in S_B$ we have
\[ -C_iB \leq l_i(\mathbf{y}) \leq D_iB \ \text{ for } i = 1, \ldots, M. \]
Notice that $S_B$ has cardinality $(B+1)^N$. Further, if $\mathbf{y}$ runs through $S_B$, then $(l_1(\mathbf{y}), \ldots, l_M(\mathbf{y}))$ runs through a set of cardinality at most
\[ \prod_{i=1}^{M}(C_iB + D_iB + 1) \leq (NAB + 1)^M. \]
We choose $B$ such that $(B+1)^N > (NAB+1)^M$. Then by the box principle, there are distinct $\mathbf{y}_1, \mathbf{y}_2 \in S_B$ with $l_i(\mathbf{y}_1) = l_i(\mathbf{y}_2)$ for $i = 1, \ldots, M$. Take $\mathbf{x} = \mathbf{y}_1 - \mathbf{y}_2$. Then $\mathbf{x}$ satisfies (4.18) and $|x_i| \leq B$ for $i = 1, \ldots, N$.

We finish our proof by showing that the choice $B = [(NA)^{M/(N-M)}]$ is valid. Indeed, with this choice of $B$ we have $(B+1)^{N-M} > (NA)^M$, hence
\[ (B+1)^N > (NA(B+1))^M \geq (NAB+1)^M. \]
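The box-principle argument is constructive enough to run on small systems. The sketch below is an illustration under stated assumptions, not an efficient algorithm: it takes $A$ to be the largest $|a_{ij}|$ (at least 1), computes $B = [(NA)^{M/(N-M)}]$ by exact integer arithmetic, enumerates the box $S_B$ and returns the difference of the first two points with the same image. The function names and the sample system are hypothetical.

```python
from itertools import product

def siegel_bound(N, M, A):
    """Largest integer B with B**(N-M) <= (N*A)**M, i.e. [(NA)^(M/(N-M))] for integer A."""
    target = (N * A) ** M
    B = 0
    while (B + 1) ** (N - M) <= target:
        B += 1
    return B

def small_solution(rows):
    """Non-zero integer solution of rows * x = 0 with max |x_i| <= B, by pigeonhole."""
    M, N = len(rows), len(rows[0])
    if not N > M > 0:
        raise ValueError("need N > M > 0")
    A = max(1, max(abs(c) for row in rows for c in row))
    B = siegel_bound(N, M, A)
    seen = {}
    for y in product(range(B + 1), repeat=N):        # y runs through S_B
        image = tuple(sum(a * t for a, t in zip(row, y)) for row in rows)
        if image in seen:                            # two points with equal image
            x = tuple(u - v for u, v in zip(y, seen[image]))
            return x, B
        seen[image] = y
    raise AssertionError("cannot happen: (B+1)^N exceeds the number of images")

rows = [[1, 2, -3], [2, -1, 1]]   # M = 2 equations, N = 3 unknowns, A = 3
x, B = small_solution(rows)
print(x, B)
```

The enumeration stops at the first collision, so the work done is typically far less than the full $(B+1)^N$ points, but the method is only practical for very small $N$ and $A$.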
We need a generalization where the coefficients $a_{ij}$ are algebraic integers instead of just rational integers. To deduce this, we need some facts from algebraic number theory, which we recall without proof.

Let $K$ be an algebraic number field of degree $d$. Then its ring of integers $O_K$ is a free $\mathbb{Z}$-module of rank $d$, that is, there are $\omega_1, \ldots, \omega_d \in O_K$ such that every element $\beta$ of $O_K$ can be expressed uniquely as
\[ \beta = x_1\omega_1 + \cdots + x_d\omega_d \ \text{ with } x_1, \ldots, x_d \in \mathbb{Z}. \tag{4.19} \]
We call $\{\omega_1, \ldots, \omega_d\}$ an integral basis of $K$. Let $\sigma_1, \ldots, \sigma_d$ be the embeddings of $K$ in $\mathbb{C}$. Then
\[ \sigma_i(\beta) = x_1\sigma_i(\omega_1) + \cdots + x_d\sigma_i(\omega_d) \ \text{ for } i = 1, \ldots, d, \]
or $\mathbf{b} = \Omega\mathbf{x}$ where $\mathbf{b} := (\sigma_1(\beta), \ldots, \sigma_d(\beta))^T$, $\mathbf{x} = (x_1, \ldots, x_d)^T$, and
\[ \Omega := \begin{pmatrix} \sigma_1(\omega_1) & \cdots & \sigma_1(\omega_d) \\ \vdots & & \vdots \\ \sigma_d(\omega_1) & \cdots & \sigma_d(\omega_d) \end{pmatrix}. \]
The matrix $\Omega$ is invertible; in fact, $D_K := (\det\Omega)^2$ is a non-zero rational integer, the discriminant of $K$.

We define the house of an algebraic integer $\alpha$ by
\[ \overline{|\alpha|} := \max(|\alpha^{(1)}|, \ldots, |\alpha^{(m)}|), \]
where α(1) , . . . , α(m) are the conjugates of α. If α belongs to an algebraic number field K of degree d and σ1 , . . . , σd are the embeddings of K in C, then each of the conjugates of α occurs d/m times among σ1 (α), . . . , σd (α). Hence we have also α = maxi |σi (α)|. Lemma 4.23. There is an effectively computable number C(K) depending on K and ω1 , . . . , ωd , such that if β ∈ OK then for the rational integers x1 , . . . , xd given by (4.19) we have max |xi | 6 C(K) · β . 16i6d
Proof. By expanding $\mathbf{x} = \Omega^{-1}\mathbf{b}$ we get $x_i = \sum_{j=1}^d \theta_{ij}\sigma_j(\beta)$ for $i = 1,\dots,d$, where $\Omega^{-1} = (\theta_{ij})$. Apply the triangle inequality.
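To make the proof of Lemma 4.23 concrete, here is a small numerical sketch for the assumed example $K = \mathbb{Q}(\sqrt{2})$ with integral basis $\{1, \sqrt{2}\}$ (this field and all names in the code are illustrative choices, not taken from the notes). It recovers the coordinates $x_1, x_2$ of $\beta = 3 + 5\sqrt{2}$ from its two conjugates via $\mathbf{x} = \Omega^{-1}\mathbf{b}$, and takes for $C(K)$ the largest absolute row sum of $\Omega^{-1}$, which is what the triangle inequality yields.

```python
import math

SQRT2 = math.sqrt(2.0)
# Rows: embeddings sigma_1, sigma_2; columns: omega_1 = 1, omega_2 = sqrt(2).
OMEGA = [[1.0, SQRT2], [1.0, -SQRT2]]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

THETA = inverse_2x2(OMEGA)

def coordinates(conjugates):
    """x = Omega^{-1} b: coordinates with respect to the integral basis."""
    return [sum(t * b for t, b in zip(row, conjugates)) for row in THETA]

def lemma_constant():
    """An admissible C(K): maximal absolute row sum of Omega^{-1}."""
    return max(sum(abs(t) for t in row) for row in THETA)

x1, x2 = 3, 5                                   # beta = 3 + 5*sqrt(2)
b = [x1 + x2 * SQRT2, x1 - x2 * SQRT2]          # (sigma_1(beta), sigma_2(beta))
house = max(abs(v) for v in b)
coords = coordinates(b)
print(coords, lemma_constant() * house)
```

Since the computation is in floating point, the recovered coordinates are only approximately integral; rounding them is safe here because the true values are known to be integers.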
We consider again systems (4.18), but now the coefficients $a_{ij}$ are from $O_K$.

Corollary 4.24. Let $[K : \mathbb{Q}] = d$, let $M, N$ be integers with $N > dM > 0$, let $A$ be a real $\geq 1$, and suppose that
\[ a_{ij} \in O_K, \quad \overline{|a_{ij}|} \leq A \quad \text{for } i = 1,\dots,M,\ j = 1,\dots,N. \]
Then (4.18) has a solution $\mathbf{x} = (x_1,\dots,x_N) \in \mathbb{Z}^N \setminus \{\mathbf{0}\}$ such that
\[ \max_{1 \leq j \leq N} |x_j| \leq (C(K)NA)^{dM/(N-dM)}. \tag{4.20} \]
Proof. In general, if we write $\beta \in O_K$ as $\sum_{k=1}^d y_k\omega_k$ with $y_k \in \mathbb{Z}$, then $\beta = 0$ if and only if $y_k = 0$ for $k = 1,\dots,d$. By Lemma 4.23 we have
\[ a_{ij} = \sum_{k=1}^d \omega_k b_{ikj} \quad \text{with } b_{ikj} \in \mathbb{Z},\ |b_{ikj}| \leq C(K)\cdot A. \]
Then $\sum_{j=1}^N a_{ij}x_j = \sum_{k=1}^d \omega_k \big(\sum_{j=1}^N b_{ikj}x_j\big)$ for $\mathbf{x} \in \mathbb{Z}^N$, hence (4.18) holds for some $\mathbf{x} \in \mathbb{Z}^N$ if and only if
\[ b_{ik1}x_1 + \cdots + b_{ikN}x_N = 0 \quad \text{for } i = 1,\dots,M,\ k = 1,\dots,d. \]
Notice that this is a system of $dM$ equations in $N$ unknowns. By Theorem 4.22 this system has a non-trivial solution $\mathbf{x} \in \mathbb{Z}^N$ with (4.20).
4.5 The Gel’fond-Schneider Theorem
We prove the following theorem.

Theorem 4.25. Let $\alpha, \beta$ be real algebraic numbers such that $\alpha > 0$, $\alpha \neq 1$ and $\beta \notin \mathbb{Q}$. Then $\alpha^\beta$ is transcendental.

Here $\alpha^\beta = e^{\beta\log\alpha}$ with the usual natural logarithm for positive real numbers. The proof in the case that $\alpha, \beta$ are not both real or $\alpha < 0$ goes along the same lines, but with additional complications. Gel’fond and Schneider independently proved the above theorem, in the general case where $\alpha, \beta$ may be complex, with somewhat different proofs. We follow Schneider’s proof.

We assume that $\gamma := \alpha^\beta$ is algebraic. Let $K = \mathbb{Q}(\alpha,\beta,\gamma)$, $d = [K : \mathbb{Q}]$. Recall that there are positive integers $m_1, m_2, m_3$ such that $m_1\alpha$, $m_2\beta$, $m_3\gamma$ are algebraic integers. Then with $m = m_1m_2m_3$, the numbers $m\alpha$, $m\beta$, $m\gamma$ are algebraic integers.

Let $D_1, D_2, L$ be parameters, which will be chosen optimally later. In what follows, $c_1, c_2, \dots$ will be constants depending only on $\alpha, \beta, \gamma$, and independent of $D_1, D_2, L$.

Lemma 4.26. Assume that $D_1D_2 \geq 2dL^2$, $L \geq 5$. Then there are integers $a_{ij}$ ($i = 0,\dots,D_1-1$, $j = 0,\dots,D_2-1$), not all zero, such that the function
\[ F(z) = \sum_{i=0}^{D_1-1}\sum_{j=0}^{D_2-1} a_{ij}z^i\alpha^{jz} \tag{4.21} \]
has zeros $a + b\beta$ with $a, b = 1,\dots,L$, and such that
\[ |a_{ij}| \leq \exp\big(c_1(D_1\log L + D_2L)\big) \quad (i = 0,\dots,D_1-1,\ j = 0,\dots,D_2-1). \tag{4.22} \]
Proof. We have to find $a_{ij} \in \mathbb{Z}$, not all zero, such that $F(a + b\beta) = 0$ for $a, b = 1,\dots,L$. Using $\alpha^{a+b\beta} = \alpha^a\gamma^b$, this translates into a system of $L^2$ linear equations in $D_1D_2$ unknowns,
\[ \sum_{i=0}^{D_1-1}\sum_{j=0}^{D_2-1} a_{ij}(a+b\beta)^i\alpha^{aj}\gamma^{bj} = 0 \quad (a, b = 1,\dots,L). \]
To apply Siegel’s Lemma we want all coefficients of this system of equations to be algebraic integers. To this end, we multiply the equations with $m^{D_1+2LD_2}$ and obtain
\[ \sum_{i=0}^{D_1-1}\sum_{j=0}^{D_2-1} a_{ij}\,m^{D_1+2LD_2}(a+b\beta)^i\alpha^{aj}\gamma^{bj} = 0 \quad (a, b = 1,\dots,L). \tag{4.23} \]
We estimate the houses of the coefficients $m^{D_1+2LD_2}(a+b\beta)^i\alpha^{aj}\gamma^{bj}$ of this system. Let $\sigma : K \hookrightarrow \mathbb{C}$ be an embedding of $K$. Then for the image under $\sigma$ of a typical coefficient we have
\[ \big|m^{D_1+2LD_2}(a+b\sigma(\beta))^i\sigma(\alpha)^{aj}\sigma(\gamma)^{bj}\big| \leq m^{D_1+2LD_2}\big(L(1+|\sigma(\beta)|)\big)^{D_1}(1+|\sigma(\alpha)|)^{LD_2}(1+|\sigma(\gamma)|)^{LD_2} \leq \exp\big(c_2(D_1\log L + D_2L)\big), \]
where the constant $c_2$ has been chosen large enough in terms of $m$, $\sigma(\alpha)$, $\sigma(\beta)$, $\sigma(\gamma)$. By making $c_2$ even larger, we can make it independent of $\sigma$, so that the houses of the coefficients of system (4.23) are all bounded above by $\exp\big(c_2(D_1\log L + D_2L)\big)$. Now Corollary 4.24 implies that system (4.23) has a solution in integers $a_{ij}$, not all zero, such that
\[ |a_{ij}| \leq \Big(C(K)D_1D_2\,e^{c_2(D_1\log L + D_2L)}\Big)^{dL^2/(D_1D_2-dL^2)} \leq \exp\big(c_1(D_1\log L + D_2L)\big), \]
choosing $c_1$ sufficiently large. Here we have used our assumption $D_1D_2 \geq 2dL^2$.

We now choose the parameters $D_1, D_2, L$ such that $D_1D_2 = 2dL^2$ and $D_1 = D_2L$ (to make $D_1\log L$ and $D_2L$ about equal), i.e.
\[ D_1 = \sqrt{2d}\cdot L^{3/2}, \quad D_2 = \sqrt{2d}\cdot L^{1/2}. \tag{4.24} \]
Then the estimate in Lemma 4.26 becomes
\[ |a_{ij}| \leq \exp\big(c_3L^{3/2}\log L\big). \tag{4.25} \]
We note that $F(z)$ is a so-called exponential polynomial, i.e., a function of the shape
\[ E(z) = \sum_{k=1}^r p_k(z)e^{\gamma_kz}, \]
where the $p_k(z)$ are non-zero polynomials, and the $\gamma_k$ distinct numbers. We need a simple result on the number of zeros of such a function.
Lemma 4.27. Assume that the $\gamma_k$ and the coefficients of the $p_k$ are all reals. Put $M := \sum_{k=1}^r (1 + \deg p_k)$. Then $E(z)$ has at most $M$ zeros in $\mathbb{R}$.

Exercise 4.6. Prove this lemma.
Hint. Proceed by induction on $M$. Apply Rolle’s Theorem, which asserts that if $G$ is a differentiable real function and $a, b$ are reals with $a < b$ and $G(a) = G(b) = 0$, then there is $c$ with $a < c < b$ and $G'(c) = 0$.

Notice that we can apply this lemma to our above function $F(z)$, thanks to our assumption that $\alpha, \beta$ are real. Thus, this lemma implies that $F(z)$ has at most $D_1D_2 = 2dL^2$ zeros. We know already that $F(z)$ has the $L^2$ zeros $a + b\beta$ ($1 \leq a, b \leq L$). These zeros are all different, since $\beta \notin \mathbb{Q}$.

We briefly sketch the idea how to derive a contradiction from this. Details are provided later. Here it is important that we have some freedom to choose the parameters $D_1, D_2, L$ introduced above. Thus, we can choose $L$ sufficiently large to make all estimates work.

We show that for all sufficiently small $\varepsilon > 0$ and all sufficiently large $L$, we have $F(a + b\beta) = 0$ for all integers $a, b$ with $1 \leq a, b \leq L^{1+\varepsilon}$. Thus, $F$ has at least $L^{2(1+\varepsilon)}$ zeros. But for $L$ sufficiently large this is larger than $2dL^2$, and so this contradicts Lemma 4.27.

To prove that $F(a + b\beta) = 0$ for all integers $a, b$ with $1 \leq a, b \leq L^{1+\varepsilon}$, we proceed as follows. The number $A := m^{D_1+2D_2[L^{1+\varepsilon}]}F(a + b\beta)$ is an algebraic integer. Using an analytic argument, we show that $|A|$ is very small. Further, by a trivial estimate we show that $|\sigma(A)|$ is not too large for the embeddings $\sigma$ of $K$ in $\mathbb{C}$ not equal to the identity. It will follow that $|N_{K/\mathbb{Q}}(A)| = |\prod_\sigma \sigma(A)| < 1$, where the product is over all embeddings, the identity included. But then, $F(a + b\beta) = A = 0$, since the norm of a non-zero algebraic integer is a non-zero element of $\mathbb{Z}$.

We now work out the details. We need a few facts from complex analysis. Recall that an entire function is a function $f : \mathbb{C} \to \mathbb{C}$ that is everywhere analytic, i.e.,
\[ f'(z) = \lim_{w \to z} \frac{f(w) - f(z)}{w - z} \]
exists for every $z \in \mathbb{C}$. The following two lemmas are standard, and their proofs can be found in any textbook on complex analysis.

Lemma 4.28. Let $f$ be an entire function and $a \in \mathbb{C}$ a zero of $f$. Then the function $g$ defined by $g(z) = \frac{f(z)}{z-a}$ if $z \neq a$ and $g(a) = f'(a)$ is also entire.
Lemma 4.29 (Maximum Modulus Principle). Let $f$ be an entire function. For $R > 0$, define
\[ |f|_R := \sup_{|z|=R} |f(z)|. \]
Then for every $z \in \mathbb{C}$ with $|z| \leq R$ we have $|f(z)| \leq |f|_R$, i.e., $|f(z)|$ attains its maximum on the disk $|z| \leq R$ on the boundary of that disk.

As a consequence of these two lemmas we obtain the following:

Lemma 4.30. Let $f$ be an entire function and $a_1,\dots,a_r$ distinct zeros of $f$. Let $R, T$ be reals such that $|a_i| \leq R$ for $i = 1,\dots,r$ and $T \geq 2R$. Then
\[ |f(z)| \leq |f|_T\Big(\frac{4R}{T}\Big)^r \quad \text{for all } z \in \mathbb{C} \text{ with } |z| \leq R. \]
Proof. Let $g(z) = f(z)/\big((z-a_1)\cdots(z-a_r)\big)$. By Lemma 4.28, $g(z)$ is entire. Let $z \in \mathbb{C}$ with $|z| \leq R$. On the one hand, by Lemma 4.29,
\[ |f(z)| \leq |g(z)|\prod_{i=1}^r(|z| + |a_i|) \leq |g(z)|(2R)^r \leq |g|_T(2R)^r; \]
on the other hand, we have for $w \in \mathbb{C}$ with $|w| = T$,
\[ |g(w)| = \frac{|f(w)|}{|w-a_1|\cdots|w-a_r|} \leq |f(w)|\cdot(2/T)^r, \]
since $|w - a_i| \geq |w| - |a_i| \geq T - R \geq \frac{1}{2}T$. Hence $|g|_T \leq |f|_T(2/T)^r$. Our lemma follows.

Proof of Theorem 4.25. Let $0 < \varepsilon < 1/8$. Put
\[ R := (1+|\beta|)L^{1+\varepsilon}, \quad T := (1+|\beta|)L^{1+2\varepsilon}. \]
Our intention is to show that if $L$ is sufficiently large, then $F(a + b\beta) = 0$ for all integers $a, b$ with $1 \leq a, b \leq L^{1+\varepsilon}$. This yields $L^{2(1+\varepsilon)}$ distinct zeros of $F$, since $\beta \notin \mathbb{Q}$. Assuming $L$ is sufficiently large, this contradicts Lemma 4.27, since by that lemma, $F$ cannot have more than $D_1D_2 = 2dL^2$ zeros.

Choose integers $a, b$ with $1 \leq a, b \leq L^{1+\varepsilon}$. We first estimate $|F(a + b\beta)|$ by applying Lemma 4.30. A simple application of the triangle inequality gives
\[ |F|_T \leq \sum_{i=0}^{D_1-1}\sum_{j=0}^{D_2-1} |a_{ij}|\,T^i(1+|\alpha|)^{Tj} \leq D_1D_2\,e^{c_3L^{3/2}\log L}\,T^{D_1}(1+|\alpha|)^{TD_2}. \]
Using our choices $D_1 = \sqrt{2d}\,L^{3/2}$, $D_2 = \sqrt{2d}\,L^{1/2}$, $T = (1+|\beta|)L^{1+2\varepsilon}$, we see that the term with exponent $TD_2 = (1+|\beta|)\sqrt{2d}\,L^{(3/2)+2\varepsilon}$ dominates. We thus obtain
\[ |F|_T \leq \exp\big(c_4L^{(3/2)+2\varepsilon}\big). \]
As a consequence, applying Lemma 4.30 with the $r = L^2$ zeros $a' + b'\beta$ ($1 \leq a', b' \leq L$) and $4R/T = 4L^{-\varepsilon}$,
\[ |F(a + b\beta)| \leq \exp\big(c_4L^{(3/2)+2\varepsilon}\big)\cdot(4L^{-\varepsilon})^{L^2}. \]
We multiply $F(a + b\beta)$ with $m^{D_1+2D_2[L^{1+\varepsilon}]}$ to obtain an algebraic integer, i.e.,
\[ A := m^{D_1+2D_2[L^{1+\varepsilon}]}\sum_{i=0}^{D_1-1}\sum_{j=0}^{D_2-1} a_{ij}(a+b\beta)^i(\alpha^a\gamma^b)^j. \]
Now clearly,
\[ |A| \leq \exp\big(c_5L^{(3/2)+2\varepsilon} - \varepsilon L^2\log L\big). \]
We estimate the absolute values of the other conjugates of $A$. Let $\sigma$ be an embedding of $K = \mathbb{Q}(\alpha,\beta,\gamma)$ in $\mathbb{C}$. Then by the triangle inequality,
\[ |\sigma(A)| \leq m^{D_1+2D_2[L^{1+\varepsilon}]}\sum_{i=0}^{D_1-1}\sum_{j=0}^{D_2-1} |a_{ij}|\,(a + b|\sigma(\beta)|)^i\big(|\sigma(\alpha)|^a|\sigma(\gamma)|^b\big)^j \]
\[ \leq D_1D_2\,m^{D_1+2D_2[L^{1+\varepsilon}]}\,e^{c_3L^{3/2}\log L}\,\big((1+|\sigma(\beta)|)L^{1+\varepsilon}\big)^{D_1}\big((1+|\sigma(\alpha)|)(1+|\sigma(\gamma)|)\big)^{D_2L^{1+\varepsilon}} \leq \exp\big(c_6L^{(3/2)+\varepsilon}\big). \]
So altogether we have an upper bound for $|A|$ and upper bounds for $|\sigma(A)|$ for the other $d - 1$ embeddings $\sigma$ of $K$ not equal to the identity. By taking their product we obtain
\[ |N_{K/\mathbb{Q}}(A)| \leq \exp\big(c_5L^{(3/2)+2\varepsilon} + (d-1)c_6L^{(3/2)+\varepsilon} - \varepsilon L^2\log L\big). \]
Now we choose $L$ large enough so that the exponent becomes negative; this is possible since $\varepsilon < 1/8$, so that the term $\varepsilon L^2\log L$ grows faster than the other terms in the exponent as $L \to \infty$. Further, we assume $L^{2(1+\varepsilon)} > 2dL^2 = D_1D_2$. With such a choice of $L$, we get $|N_{K/\mathbb{Q}}(A)| < 1$. Since the norm of a non-zero algebraic integer is a non-zero integer, this must imply $A = 0$, or equivalently, $F(a + b\beta) = 0$. This holds for all integers $a, b$ with $1 \leq a, b \leq L^{1+\varepsilon}$. As explained above, this contradicts Lemma 4.27. Our proof of Theorem 4.25 is complete.
Chapter 5
Linear forms in logarithms

Literature:
A. Baker, Transcendental Number Theory, Cambridge University Press, 1975.
T.N. Shorey, R. Tijdeman, Exponential Diophantine equations, Cambridge University Press, 1986; reprinted 2008.
5.1 Lower bounds for linear forms in logarithms
We fix $\varphi_0 \in \mathbb{R}$, and define the complex logarithm by $\log z = \log|z| + i\arg z$ with $\varphi_0 < \arg z \leq \varphi_0 + 2\pi$. In the results below, the choice of $\varphi_0$ does not matter. We write $\overline{\mathbb{Q}}$ for the field of algebraic numbers. We recall Baker’s transcendence result from the previous chapter.

Theorem 5.1 (A. Baker, 1966). Let $\alpha_1,\dots,\alpha_m \in \overline{\mathbb{Q}} \setminus \{0,1\}$, $\gamma \in \overline{\mathbb{Q}}$ and $\beta_1,\dots,\beta_m \in \overline{\mathbb{Q}} \setminus \{0\}$. Assume that $\log\alpha_1,\dots,\log\alpha_m$ are linearly independent over $\mathbb{Q}$. Then
\[ \gamma + \beta_1\log\alpha_1 + \cdots + \beta_m\log\alpha_m \neq 0. \]
One may ask about quantitative versions of this theorem, i.e., can we give a strictly positive lower bound for the absolute value of the left-hand side? In 1967, Baker indeed obtained such a lower bound, which we conveniently refer to as a ‘lower bound for a linear form in logarithms’. Baker’s lower bound turned out to be an extremely powerful tool, not only in transcendence theory, but also in applications which do not have anything to do with transcendence, such as Diophantine equations and Gauss’ class number 1 problem. For this reason, Baker’s lower bound from 1967 was improved by Baker himself and others. We will give some applications to certain Diophantine equations.

We recall a lower bound for linear forms in logarithms by Baker from 1975. Recall that the height $H(\alpha)$ of $\alpha \in \overline{\mathbb{Q}}$ is the maximum of the absolute values of the coefficients of the primitive minimal polynomial $F_\alpha$ of $\alpha$.

Theorem 5.2 (A. Baker, 1975). Let $\alpha_1,\dots,\alpha_m \in \overline{\mathbb{Q}} \setminus \{0,1\}$ and $\gamma, \beta_1,\dots,\beta_m \in \overline{\mathbb{Q}}$. Assume that
\[ \Lambda := \gamma + \beta_1\log\alpha_1 + \cdots + \beta_m\log\alpha_m \neq 0. \]
Then
\[ |\Lambda| \geq (eB)^{-C} \quad (e = 2.7182\ldots) \]
where $B = \max(H(\gamma), H(\beta_1),\dots,H(\beta_m))$, and where $C$ is an effectively computable positive number depending on $m$, the degrees and heights of $\alpha_1,\dots,\alpha_m$, and $\varphi_0$.

The assertion that $C$ is effectively computable means that by going through the proof of Theorem 5.2 one can compute an explicit value of $C$.

For our applications, we restrict ourselves to the case that $\gamma = 0$ and $\beta_i = b_i \in \mathbb{Z}$ for $i = 1,\dots,m$. In that case, we can get rid of the logarithms.

Corollary 5.3. Let $\alpha_1,\dots,\alpha_m \in \overline{\mathbb{Q}} \setminus \{0,1\}$ and let $b_1,\dots,b_m \in \mathbb{Z}$ such that
\[ \alpha_1^{b_1}\cdots\alpha_m^{b_m} \neq 1. \]
Then
\[ |\alpha_1^{b_1}\cdots\alpha_m^{b_m} - 1| \geq (eB)^{-C'}, \]
where $B := \max(|b_1|,\dots,|b_m|)$ and where $C'$ is an effectively computable number depending only on $m$ and on the degrees and heights of $\alpha_1,\dots,\alpha_m$.

Proof. For the logarithm of a complex number $z$ we choose $\log z = \log|z| + i\arg z$ with $-\pi < \arg z \leq \pi$. With this choice of $\log$ we have
\[ \log(1+z) = \sum_{n=1}^\infty \frac{(-1)^{n-1}}{n}\cdot z^n \quad \text{for } z \in \mathbb{C} \text{ with } |z| < 1. \]
Using this power series expansion, one easily shows that
\[ |\log(1+z)| \leq |z|(1 + |z| + |z|^2 + \cdots) \leq 2|z| \quad \text{for } z \in \mathbb{C} \text{ with } |z| \leq \tfrac{1}{2}. \]
We apply this with $z := \alpha_1^{b_1}\cdots\alpha_m^{b_m} - 1$. If $|z| > 1/2$ we are done, so we suppose that $|z| \leq 1/2$. We have to estimate from below $|\log(1+z)|$.
Recall that the complex logarithm is additive only modulo $2\pi i$. That is,
\[ \log(1+z) = b_1\log\alpha_1 + \cdots + b_m\log\alpha_m + 2k\pi i = b_1\log\alpha_1 + \cdots + b_m\log\alpha_m + 2k\log(-1) \]
for some $k \in \mathbb{Z}$, since $\log(-1) = \pi i$. Applying Theorem 5.2 we get
\[ |\log(1+z)| \geq \big(e\max(B, |2k|)\big)^{-C_1}, \]
where $C_1$ is an effectively computable constant depending only on $m$ and $\alpha_1,\dots,\alpha_m$. Since $|\log(1+z)| \leq 2|z| \leq 1$ we have
\[ |2k\pi i| \leq 1 + \sum_{j=1}^m |\log\alpha_j|\cdot|b_j| \leq \Big(1 + \sum_{j=1}^m |\log\alpha_j|\Big)\cdot B = C_2B, \]
say. Hence $|2k| \leq C_2B$ and so $|\log(1+z)| \geq (eC_2B)^{-C_1}$. This implies
\[ |z| \geq \tfrac{1}{2}(eC_2B)^{-C_1} \geq (eB)^{-C'} \]
for a suitable $C'$, as required.

For completeness, we give a completely explicit version of Corollary 5.3 in the case that $\alpha_1,\dots,\alpha_m$ are rational numbers. Recall that the height of a rational number $a = x/y$ with $x, y \in \mathbb{Z}$ coprime, is given by $H(a) := \max(|x|, |y|)$.

Theorem 5.4 (Matveev, 2000). Let $a_1,\dots,a_m$ be non-zero rational numbers and let $b_1,\dots,b_m$ be integers such that
\[ a_1^{b_1}\cdots a_m^{b_m} \neq 1. \]
Then
\[ |a_1^{b_1}\cdots a_m^{b_m} - 1| \geq (eB)^{-C'}, \]
where $B = \max(|b_1|,\dots,|b_m|)$ and
\[ C' = \tfrac{1}{2}e\cdot m^{4.5}\,30^{m+3}\prod_{j=1}^m \max\big(1, \log H(a_j)\big). \]
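Since the exponent $C'$ in Theorem 5.4 is completely explicit, it can be evaluated directly. The following is a minimal sketch that transcribes the formula as stated above; the function names `height` and `matveev_exponent` are illustrative, not from the notes.

```python
import math
from fractions import Fraction

def height(a):
    """H(a) = max(|x|, |y|) for a = x/y in lowest terms."""
    a = Fraction(a)
    return max(abs(a.numerator), abs(a.denominator))

def matveev_exponent(rationals):
    """C' = (1/2) e m^4.5 30^(m+3) prod_j max(1, log H(a_j))."""
    m = len(rationals)
    log_heights = 1.0
    for a in rationals:
        log_heights *= max(1.0, math.log(height(a)))
    return 0.5 * math.e * m ** 4.5 * 30 ** (m + 3) * log_heights

# Exponent for the pair (2, 3), as used for differences of powers of 2 and 3.
print(matveev_exponent([2, 3]))
```

Even for two small integers the exponent is of the order of $10^8$, which illustrates why bounds obtained this way are effective but very large.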
To illustrate the power of the above results we give a quick application.

Corollary 5.5. Let $a, b$ be integers with $a \geq 2$, $b \geq 2$. Then there is an effectively computable number $C_1 > 0$, depending only on $a, b$, such that for any two positive integers $m, n$ with $a^m \neq b^n$,
\[ |a^m - b^n| \geq \frac{\max(a^m, b^n)}{(e\max(m,n))^{C_1}}. \]
Consequently, for any non-zero integer $k$, there exists an effectively computable number $C_2$, depending on $a, b, k$ such that if $m, n$ are positive integers with $a^m - b^n = k$, then $m, n \leq C_2$.

Proof. Let $m, n$ be positive integers with $a^m \neq b^n$. Put $B := \max(m, n)$. Assume without loss of generality that $a^m \geq b^n$. By Corollary 5.3 or Theorem 5.4 we have
\[ |1 - b^na^{-m}| \geq (eB)^{-C_1}, \]
where $C_1$ is an effectively computable number depending only on $a, b$. Multiplying with $a^m$ gives our first assertion.

Now let $m, n$ be positive integers with $a^m - b^n = k$. Put again $B := \max(m, n)$. Since $a, b \geq 2$ we have $a^m \geq 2^m$, $b^n \geq 2^n$, hence $\max(a^m, b^n) \geq 2^B$. So,
\[ |k| \geq 2^B\cdot(eB)^{-C_1}. \]
This proves that $B$ is bounded above by an effectively computable number depending on $a, b, k$.

Exercise 5.1. In 1995, Laurent, Mignotte and Nesterenko proved the following explicit estimate for linear forms in two logarithms. Let $a_1, a_2$ be two positive rational numbers $\neq 1$. Further, let $b_1, b_2$ be non-zero integers. Suppose that $\Lambda := b_1\log a_1 - b_2\log a_2 \neq 0$. Then
\[ \log|\Lambda| \geq -24.34\,\Big(\max\Big\{\log\Big(\frac{|b_1|}{\log H(a_2)} + \frac{|b_2|}{\log H(a_1)}\Big) + 0.14,\ 21\Big\}\Big)^2\log H(a_1)\log H(a_2). \]
Using this estimate, compute an upper bound $C$, such that for all positive integers $m, n$ with $97^m - 89^n = 8$ we have $m, n \leq C$.
Hint. Use $|\log(1+z)| \leq 2|z|$ if $|z| \leq \frac{1}{2}$.
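Corollary 5.5 reduces an equation $a^m - b^n = k$ to a finite search once $C_2$ is known. As a small illustration of the search step only, the sketch below enumerates solutions inside a window chosen by hand; the function name `power_differences` and the window size are assumptions, and a finite search by itself proves nothing about solutions outside the window.

```python
def power_differences(a, b, k, limit):
    """All (m, n) with 1 <= m, n <= limit and a**m - b**n == k (exact integers)."""
    # For b >= 2 the powers b**n are distinct, so this lookup table is well defined.
    exponent_of = {b ** n: n for n in range(1, limit + 1)}
    solutions = []
    for m in range(1, limit + 1):
        target = a ** m - k
        if target in exponent_of:
            solutions.append((m, exponent_of[target]))
    return solutions

print(power_differences(3, 2, 1, 60))   # solutions of 3^m - 2^n = 1 in the window
print(power_differences(3, 2, -1, 60))  # solutions of 3^m - 2^n = -1 in the window
```

Python integers have arbitrary precision, so the comparison is exact even for large exponents.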
In 1844, Catalan conjectured that the equation in four unknowns,
\[ x^m - y^n = 1 \quad \text{in } x, y, m, n \in \mathbb{Z} \text{ with } x, y, m, n \geq 2 \]
has only one solution, that is, $3^2 - 2^3 = 1$. In 1976, as one of the striking consequences of the results on linear forms in logarithms mentioned above, Tijdeman proved that there is an effectively computable constant $C$, such that for every solution $(x, y, m, n)$ of Catalan’s equation, one has $x^m, y^n \leq C$. The constant $C$ can be computed but it is extremely large. Several people tried to prove Catalan’s conjecture, on the one hand by reducing Tijdeman’s constant $C$ using sharper linear forms in logarithm estimates, on the other hand by showing with techniques from algebraic number theory that $x^m, y^n$ have to be very large as long as $(x^m, y^n) \neq (3^2, 2^3)$, and finally using heavy computations. This didn’t lead to success. In 2002 Mihăilescu managed to prove Catalan’s conjecture by an algebraic method which is completely independent of linear forms in logarithms.

We give another application. Consider the sequence $\{a_n\}$ with $a_n = 2^n$ for $n = 0, 1, 2, \dots$. Note that $a_n - a_{n-1} = \frac{1}{2}a_n$. Similarly, we may consider the increasing sequence $\{a_n\}$ of numbers which are all composed of primes from $\{2, 3\}$, i.e.,
\[ 1, 2, 3, 4, 6, 8, 9, 12, 16, 18, 24, 27, 32, \dots \]
and ask how the gap $a_n - a_{n-1}$ compares with $a_n$ as $n \to \infty$. More generally, we may take a finite set of primes and ask this question about the sequence of consecutive integers composed of these primes.

Theorem 5.6 (Tijdeman, 1974). Let $S = \{p_1,\dots,p_t\}$ be a finite set of distinct primes, and let $a_0 < a_1 < a_2 < \cdots$ be the sequence of consecutive positive integers composed of primes from $S$. Then there are effectively computable positive numbers $c_1, c_2$, depending on $t, p_1,\dots,p_t$, such that
\[ a_n - a_{n-1} \geq \frac{a_n}{c_1(\log a_n)^{c_2}} \quad \text{for } n = 1, 2, \dots. \]
Proof. Let $n \geq 1$. We have $a_n = p_1^{k_1}\cdots p_t^{k_t}$ and $a_{n-1} = p_1^{l_1}\cdots p_t^{l_t}$ with non-negative integers $k_i, l_i$. By Corollary 5.3,
\[ 1 - \frac{a_{n-1}}{a_n} = |1 - p_1^{l_1-k_1}\cdots p_t^{l_t-k_t}| \geq (eB)^{-C}, \]
where $B := \max(|l_1 - k_1|,\dots,|l_t - k_t|)$ and $C$ is effectively computable and depends only on $t, p_1,\dots,p_t$. Note that
\[ k_i \leq \frac{\log a_n}{\log p_i} \leq \frac{\log a_n}{\log 2}, \qquad l_i \leq \frac{\log a_{n-1}}{\log p_i} \leq \frac{\log a_n}{\log 2} \quad \text{for } i = 1,\dots,t, \]
hence $B \leq \log a_n/\log 2$. It follows that
\[ a_n - a_{n-1} \geq a_n(e\log a_n/\log 2)^{-C}. \]
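The sequence in Theorem 5.6 is easy to generate for small $S$, which gives a feel for how the relative gaps $(a_n - a_{n-1})/a_n$ behave. The sketch below is only an experiment; the function names and the search bound are illustrative choices.

```python
def smooth_numbers(primes, bound):
    """Sorted list of all positive integers <= bound composed of the given primes."""
    found = {1}
    frontier = [1]
    while frontier:
        n = frontier.pop()
        for p in primes:
            q = n * p
            if q <= bound and q not in found:
                found.add(q)
                frontier.append(q)
    return sorted(found)

def relative_gaps(sequence):
    """Pairs (a_n, (a_n - a_{n-1}) / a_n) for consecutive terms of the sequence."""
    return [(a, (a - prev) / a) for prev, a in zip(sequence, sequence[1:])]

terms = smooth_numbers([2, 3], 10 ** 6)
smallest = min(relative_gaps(terms), key=lambda pair: pair[1])
print(len(terms), smallest)
```

Theorem 5.6 says that the relative gap cannot shrink faster than a negative power of $\log a_n$; the experiment only shows the trend on an initial segment.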
5.2 Dirichlet’s Unit Theorem
We want to apply the results from the previous section to certain Diophantine equations, and for this, we need some facts on units in algebraic number fields.

Let $K$ be an algebraic number field of degree $d$. Recall that $K$ has precisely $d$ embeddings in $\mathbb{C}$. An embedding $\sigma$ of $K$ in $\mathbb{C}$ is called real if $\sigma(K) \subset \mathbb{R}$, and complex otherwise. If $\sigma$ is a complex embedding of $K$, then so is $\overline{\sigma} : x \mapsto \overline{\sigma(x)}$, i.e., the composition of $\sigma$ and complex conjugation. Hence the complex embeddings of $K$ occur in complex conjugate pairs $\{\sigma, \overline{\sigma}\}$, and so, the number of complex embeddings of $K$ is even. Let us denote by $r_1$ the number of real embeddings of $K$, and by $2r_2$ the number of complex embeddings of $K$. Thus, $r_1 + 2r_2 = d$. Further, we order the embeddings $\sigma_1,\dots,\sigma_d$ of $K$ in such a way that $\sigma_1,\dots,\sigma_{r_1}$ are the real embeddings, and
\[ \sigma_{r_1+r_2+1} = \overline{\sigma_{r_1+1}},\ \dots,\ \sigma_{r_1+2r_2} = \overline{\sigma_{r_1+r_2}}. \]
We denote as usual by $O_K$ the ring of integers of $K$, and by $O_K^*$ the group of units of $O_K$. Further, we define the norm and house of $\alpha \in O_K$ by, respectively,
\[ N_{K/\mathbb{Q}}(\alpha) := \sigma_1(\alpha)\cdots\sigma_d(\alpha), \qquad \overline{|\alpha|} := \max_{1 \leq i \leq d}|\sigma_i(\alpha)|. \]
Recall that the norm is multiplicative, and that $N_{K/\mathbb{Q}}(\alpha) \in \mathbb{Z}$ for $\alpha \in O_K$.

Lemma 5.7. Let $\alpha \in O_K$. Then $\alpha \in O_K^* \iff N_{K/\mathbb{Q}}(\alpha) = \pm 1$.

Proof. $\Longrightarrow$. Let $\alpha \in O_K^*$. Then $\alpha, \alpha^{-1} \in O_K$. Hence $N_{K/\mathbb{Q}}(\alpha) \in \mathbb{Z}$, $N_{K/\mathbb{Q}}(\alpha^{-1}) \in \mathbb{Z}$. But the product of these two integers is $N_{K/\mathbb{Q}}(1) = 1$, hence both integers are $\pm 1$.

$\Longleftarrow$. Suppose $\sigma_1 = \mathrm{id}$. Then $\alpha^{-1} = \pm\prod_{i=2}^d \sigma_i(\alpha)$ is an algebraic integer in $K$, hence in $O_K$. So $\alpha \in O_K^*$.
To study the units of $O_K$, it will be useful to consider the logarithms of the absolute values of their conjugates. More precisely, we consider the map
\[ \overrightarrow{\log} : O_K^* \to \mathbb{R}^d : \varepsilon \mapsto (\log|\sigma_1(\varepsilon)|,\dots,\log|\sigma_d(\varepsilon)|). \]
This is clearly a group homomorphism from $O_K^*$ with multiplication to $\mathbb{R}^d$ with addition. For $\varepsilon \in O_K^*$ we have
\[ \log|\sigma_{r_1+r_2+i}(\varepsilon)| = \log|\sigma_{r_1+i}(\varepsilon)| \ \text{ for } i = 1,\dots,r_2, \qquad \sum_{i=1}^d \log|\sigma_i(\varepsilon)| = 0, \]
so $\overrightarrow{\log}$ maps $O_K^*$ to the linear subspace $H$ of $\mathbb{R}^d$, consisting of the vectors $\mathbf{x} = (x_1,\dots,x_d) \in \mathbb{R}^d$ satisfying the equations
\[ x_{r_1+r_2+1} = x_{r_1+1},\ \dots,\ x_{r_1+2r_2} = x_{r_1+r_2}, \qquad x_1 + \cdots + x_d = 0. \]
Notice that $H$ has dimension $r := d - (r_2 + 1) = r_1 + r_2 - 1$.

The following result is known as Dirichlet’s Unit Theorem. For a proof we refer to any textbook on algebraic number theory. A lattice in a real vector space $V$ is an additive group
\[ \{z_1\mathbf{a}_1 + \cdots + z_m\mathbf{a}_m : z_1,\dots,z_m \in \mathbb{Z}\} \]
where $\{\mathbf{a}_1,\dots,\mathbf{a}_m\}$ is a basis of $V$. We call $\{\mathbf{a}_1,\dots,\mathbf{a}_m\}$ also a basis of the lattice.

Theorem 5.8 (Dirichlet). The image of $\overrightarrow{\log}$ is a lattice in $H$. The kernel of $\overrightarrow{\log}$ is the group $U_K$ of roots of unity of $K$, and this group is finite.

Choose units $\varepsilon_1,\dots,\varepsilon_r$ such that $\overrightarrow{\log}(\varepsilon_1),\dots,\overrightarrow{\log}(\varepsilon_r)$ form a basis of the lattice $\overrightarrow{\log}(O_K^*)$ (we call such $\varepsilon_1,\dots,\varepsilon_r$ a system of fundamental units for $K$). Then for every $\varepsilon \in O_K^*$, there are unique integers $b_1,\dots,b_r$ such that
\[ \overrightarrow{\log}(\varepsilon) = b_1\overrightarrow{\log}(\varepsilon_1) + \cdots + b_r\overrightarrow{\log}(\varepsilon_r). \]
Hence $\varepsilon \in O_K^*$ can be expressed uniquely as
\[ \zeta\varepsilon_1^{b_1}\cdots\varepsilon_r^{b_r} \quad \text{with } \zeta \in U_K,\ b_1,\dots,b_r \in \mathbb{Z}. \tag{5.1} \]
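As a concrete instance of (5.1), take the assumed example $K = \mathbb{Q}(\sqrt{2})$, where $r_1 = 2$, $r_2 = 0$, so $r = 1$, the roots of unity are $\pm 1$, and $1 + \sqrt{2}$ is a fundamental unit. The sketch below represents $a + b\sqrt{2}$ by the pair `(a, b)` and recovers the exponent $b_1$ of a unit from the first coordinate of $\overrightarrow{\log}$; all function names are illustrative.

```python
import math

FUNDAMENTAL_UNIT = (1, 1)  # 1 + sqrt(2)

def multiply(u, v):
    """(a + b sqrt2)(c + d sqrt2) as a pair of integers."""
    (a, b), (c, d) = u, v
    return (a * c + 2 * b * d, a * d + b * c)

def norm(u):
    """N(a + b sqrt2) = (a + b sqrt2)(a - b sqrt2)."""
    a, b = u
    return a * a - 2 * b * b

def power(u, n):
    """u**n for an integer n >= 0, by repeated multiplication."""
    result = (1, 0)
    for _ in range(n):
        result = multiply(result, u)
    return result

def exponent(u):
    """b_1 with u = +-(1 + sqrt2)^{b_1}, read off from log|sigma_1(u)|."""
    assert norm(u) in (1, -1), "not a unit (Lemma 5.7)"
    a, b = u
    ratio = math.log(abs(a + b * math.sqrt(2))) / math.log(1 + math.sqrt(2))
    return round(ratio)

unit = power(FUNDAMENTAL_UNIT, 5)
print(unit, norm(unit), exponent(unit))
```

The logarithm is evaluated in floating point, so this is reliable only for exponents of moderate size; for units close to zero under $\sigma_1$ the subtraction in `a + b * sqrt(2)` loses precision.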
We deduce some consequences.
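Lemma 5.9. There is an effectively computable number $C > 0$ depending on $K, \varepsilon_1,\dots,\varepsilon_r$, such that for every $\varepsilon \in O_K^*$ we have
\[ \max(|b_1|,\dots,|b_r|) \leq C\cdot\log\overline{|\varepsilon|}, \]
where $b_1,\dots,b_r$ are the integers defined by (5.1).

Proof. From $\varepsilon = \zeta\varepsilon_1^{b_1}\cdots\varepsilon_r^{b_r}$ we deduce
\[ \log|\sigma_i(\varepsilon)| = \sum_{j=1}^r b_j\log|\sigma_i(\varepsilon_j)| \quad (i = 1,\dots,d) \]
or in matrix notation
\[ \begin{pmatrix} \log|\sigma_1(\varepsilon_1)| & \cdots & \log|\sigma_1(\varepsilon_r)| \\ \vdots & & \vdots \\ \log|\sigma_d(\varepsilon_1)| & \cdots & \log|\sigma_d(\varepsilon_r)| \end{pmatrix} \cdot \begin{pmatrix} b_1 \\ \vdots \\ b_r \end{pmatrix} = \begin{pmatrix} \log|\sigma_1(\varepsilon)| \\ \vdots \\ \log|\sigma_d(\varepsilon)| \end{pmatrix}. \]
The matrix has rank $r = \dim H$ since its columns form a basis of $H$. So we can select $r$ rows, say with indices $i_1,\dots,i_r$, which form an invertible $r \times r$-matrix. Denote by $\Omega$ the inverse of this matrix, and let $\Omega = (a_{ij})_{i,j=1,\dots,r}$. Then
\[ b_i = \sum_{j=1}^r a_{ij}\log|\sigma_{i_j}(\varepsilon)| \quad \text{for } i = 1,\dots,r. \]
Note that the terms $a_{ij}$ are determined by $K, \varepsilon_1,\dots,\varepsilon_r$ and are effectively computable in terms of these quantities. By one of the homework exercises, $\overline{|\varepsilon|} \geq 1$ and $|\sigma_j(\varepsilon)| \geq \overline{|\varepsilon|}^{\,1-d}$ for $j = 1,\dots,d$. Hence $\big|\log|\sigma_j(\varepsilon)|\big| \leq d\log\overline{|\varepsilon|}$ for $j = 1,\dots,d$. Now an application of the triangle inequality gives
\[ \max_{1 \leq i \leq r}|b_i| \leq \Big(\max_{1 \leq i \leq r}\sum_{j=1}^r |a_{ij}|\Big)\cdot d\log\overline{|\varepsilon|} = C\cdot\log\overline{|\varepsilon|}. \]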
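The next lemma states that given $\alpha \in O_K \setminus \{0\}$, we can find $\varepsilon \in O_K^*$ such that all conjugates of $\varepsilon\alpha$ have about the same absolute value. Then the maximum of these absolute values, which is $\overline{|\varepsilon\alpha|}$, is about the $d$-th root of the product of these absolute values, which is $|N_{K/\mathbb{Q}}(\varepsilon\alpha)| = |N_{K/\mathbb{Q}}(\alpha)|$ since $N_{K/\mathbb{Q}}(\varepsilon) = \pm 1$.

Lemma 5.10. There is an effectively computable number $c > 0$ with the following property: for every non-zero $\alpha \in O_K$ there is $\varepsilon \in O_K^*$ such that
\[ c^{-1}|N_{K/\mathbb{Q}}(\alpha)|^{1/d} \leq \overline{|\varepsilon\alpha|} \leq c\,|N_{K/\mathbb{Q}}(\alpha)|^{1/d}. \tag{5.2} \]
Proof. In general, if $L$ is a lattice in $H$, then for every point $\mathbf{x} \in H$ there is a point $\mathbf{u} \in L$ such that $\|\mathbf{u} - \mathbf{x}\|_2 \leq c(L)$ for some number $c(L)$ depending only on $L$. This $c(L)$ can be computed in terms of a basis of $L$. By applying this with $L = \overrightarrow{\log}(O_K^*)$, we see that there is an effectively computable number $c_1 > 0$, depending on the lattice $\overrightarrow{\log}(O_K^*)$, such that for every $\mathbf{x} \in H$ there is $\varepsilon \in O_K^*$ with $\|\mathbf{x} - \overrightarrow{\log}(\varepsilon)\|_2 \leq c_1$. This implies
\[ \big|x_i - \log|\sigma_i(\varepsilon)|\big| \leq c_1 \quad \text{for } i = 1,\dots,d, \]
where $\mathbf{x} = (x_1,\dots,x_d)$. We apply this with
\[ x_i := -\log|\sigma_i(\alpha)| + \tfrac{1}{d}\log|N_{K/\mathbb{Q}}(\alpha)| \quad (i = 1,\dots,d). \]
With these $x_i$, the point $\mathbf{x}$ is easily seen to lie in $H$. It follows that there is $\varepsilon \in O_K^*$ such that
\[ \Big|\log|\sigma_i(\varepsilon)| + \log|\sigma_i(\alpha)| - \tfrac{1}{d}\log|N_{K/\mathbb{Q}}(\alpha)|\Big| \leq c_1 \quad \text{for } i = 1,\dots,d, \]
i.e.,
\[ \Big|\log|\sigma_i(\varepsilon\alpha)| - \tfrac{1}{d}\log|N_{K/\mathbb{Q}}(\alpha)|\Big| \leq c_1 \quad \text{for } i = 1,\dots,d. \]
Choosing $i$ with $\overline{|\varepsilon\alpha|} = |\sigma_i(\varepsilon\alpha)|$, we get (5.2) with $c := e^{c_1}$.

Corollary 5.11. Given $\alpha \in O_K \setminus \{0\}$, one can effectively determine a finite set of divisors $\gamma_1,\dots,\gamma_m$ of $\alpha$ in $O_K$ such that for each divisor $\beta$ of $\alpha$ in $O_K$ there are $\varepsilon \in O_K^*$ and $i \in \{1,\dots,m\}$ such that $\beta = \varepsilon\gamma_i$.

Proof. Let $\beta$ be a divisor of $\alpha$. Then $N_{K/\mathbb{Q}}(\beta)$ divides $N_{K/\mathbb{Q}}(\alpha)$. By the previous lemma, there is $\varepsilon \in O_K^*$, such that
\[ \overline{|\varepsilon\beta|} \leq c\,|N_{K/\mathbb{Q}}(\beta)|^{1/d} \leq c\,|N_{K/\mathbb{Q}}(\alpha)|^{1/d}. \]
By one of the homework exercises, there are only finitely many algebraic integers $\gamma$ of degree at most $d$ and house at most $c\,|N_{K/\mathbb{Q}}(\alpha)|^{1/d}$. This implies that there are at most finitely many $\gamma \in O_K$ with $\overline{|\gamma|} \leq c\,|N_{K/\mathbb{Q}}(\alpha)|^{1/d}$. In fact, these can be determined effectively, and for each of these $\gamma$ it can be checked whether they divide $\alpha$ in $O_K$. This leaves us with a finite set $\{\gamma_1,\dots,\gamma_m\}$ as in the statement of our corollary.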
5.3 Unit equations and Thue equations
Let $K$ be an algebraic number field. We consider the so-called unit equation
\[ \alpha x + \beta y = 1 \quad \text{in } x, y \in O_K^*, \tag{5.3} \]
where $\alpha, \beta \in K^*$.

Theorem 5.12. Eq. (5.3) has at most finitely many solutions, and these can be determined effectively.

In 1921, Siegel proved that (5.3) has only finitely many solutions, but his proof is ineffective, in the sense that it shows only that there are only finitely many solutions, but it does not give a method how to determine them. Our proof, based on lower bounds for linear forms in logarithms, does give a method to determine the solutions. This effective proof is already implicit in work of Baker from the 1960’s. Győry (1978) made this explicit.

Proof. Let $(x, y)$ be a solution of (5.3). By (5.1), there are $\zeta_1, \zeta_2 \in U_K$, as well as $a_1,\dots,a_r, b_1,\dots,b_r \in \mathbb{Z}$, such that
\[ x = \zeta_1\varepsilon_1^{a_1}\cdots\varepsilon_r^{a_r}, \quad y = \zeta_2\varepsilon_1^{b_1}\cdots\varepsilon_r^{b_r}. \]
Thus,
\[ \alpha\zeta_1\varepsilon_1^{a_1}\cdots\varepsilon_r^{a_r} + \beta\zeta_2\varepsilon_1^{b_1}\cdots\varepsilon_r^{b_r} = 1. \]
We assume without loss of generality that $B := \max(|a_1|,\dots,|b_r|) = |b_r|$. We estimate from above and below,
\[ \Lambda_i := |\sigma_i(\alpha)\sigma_i(\zeta_1)\sigma_i(\varepsilon_1)^{a_1}\cdots\sigma_i(\varepsilon_r)^{a_r} - 1| = |\sigma_i(\beta)\sigma_i(y)| \]
for a suitable choice of $i$. In fact, let $|\sigma_i(y)|$ be the smallest, and $|\sigma_j(y)| = \overline{|y|}$ the largest among $|\sigma_1(y)|,\dots,|\sigma_d(y)|$. Then by Lemma 5.7,
\[ |\sigma_i(y)|^{d-1}\cdot\overline{|y|} \leq 1 \]
and subsequently by Lemma 5.9,
\[ |\sigma_i(y)| \leq \overline{|y|}^{\,-1/(d-1)} \leq e^{-B/(C(d-1))}. \]
This leads to
\[ \Lambda_i \leq |\sigma_i(\beta)|\,e^{-B/(C(d-1))}. \]
By Corollary 5.3 we have $\Lambda_i \geq (eB)^{-C'}$ for some effectively computable number $C'$ depending on $\alpha, \varepsilon_1,\dots,\varepsilon_r$ and the finitely many roots of unity of $K$. We infer
\[ (eB)^{-C'} \leq |\sigma_i(\beta)|\,e^{-B/(C(d-1))} \]
and this leads to an effectively computable upper bound for $B$.

Remark. There are practical algorithms to solve equations of the type (5.3) which work well as long as the degree of the field $K$ is not too large, and $K$ has a system of fundamental units whose heights are not too large. These algorithms are based on lower bounds for linear forms in logarithms and the Lenstra-Lenstra-Lovász lattice basis reduction algorithm (LLL-algorithm). For instance, in 2000 Wildanger determined all solutions of the equation $x + y = 1$ in $x, y \in O_K^*$, with $K = \mathbb{Q}(\cos(2\pi/19))$. The number field $K$ has degree 9 and all its embeddings are real. Thus, the unit group $O_K^*$ has rank 8.

In general, a form of degree $d$ in $n$ variables is a homogeneous polynomial $F(X_1,\dots,X_n)$ of degree $d$, i.e., a polynomial consisting of terms $cX_1^{i_1}\cdots X_n^{i_n}$ with $i_1 + \cdots + i_n = d$. Note that $F(tX_1,\dots,tX_n) = t^dF(X_1,\dots,X_n)$. A binary form of degree $d$ is a homogeneous polynomial of degree $d$ in two variables, i.e.,
\[ F(X, Y) = a_0X^d + a_1X^{d-1}Y + \cdots + a_dY^d. \]
Suppose that $F$ has its coefficients in $K$. Let $a_r$ be the first non-zero coefficient of $F$ from the left, so that $F(X, 1)$ has degree $d - r$. In some finite extension $L$ of $K$, we can factor $F(X, 1)$ as $a_r(X - \alpha_1)^{r_1}\cdots(X - \alpha_t)^{r_t}$, where $\alpha_1,\dots,\alpha_t$ are distinct, and the multiplicities $r_1,\dots,r_t > 0$. Then, since $r_1 + \cdots + r_t = d - r$, we get
\[ F(X, Y) = Y^dF(X/Y, 1) = a_rY^{r}(X - \alpha_1Y)^{r_1}\cdots(X - \alpha_tY)^{r_t}. \tag{5.4} \]
Thus, a binary form can be factored into linear forms.

A Thue equation is an equation of the shape
\[ F(x, y) = m \quad \text{in } x, y \in \mathbb{Z}, \tag{5.5} \]
where $F$ is a binary form with coefficients in $\mathbb{Z}$ and $m$ is a non-zero integer.

We consider some special cases. First suppose that $F$ is linear. Then (5.5) becomes $ax + by = m$ in $x, y \in \mathbb{Z}$. As is well-known, this equation has no solution if $\gcd(a, b)$ does not divide $m$, and infinitely many solutions if $\gcd(a, b)$ does divide $m$.

Next, suppose that $F$ is quadratic. Then (5.5) specializes to $ax^2 + bxy + cy^2 = m$. If the discriminant $D = b^2 - 4ac < 0$ then this equation describes an ellipse, and this has only finitely many points $(x, y) \in \mathbb{Z}^2$ on it. In fact, these points may be determined by rewriting the equation as
\[ a\big(x + (b/2a)y\big)^2 + (|D|/4a)y^2 = m. \]
In case that $D > 0$ the equation may have infinitely many solutions, e.g., the Pell equation $x^2 - dy^2 = 1$ where $d > 1$ is a positive integer, not equal to a square. In fact it can be shown that if $D = b^2 - 4ac > 0$ and $D$ is not a square, then $ax^2 + bxy + cy^2 = m$ has either no, or infinitely many solutions.

Another special case is, where $F$ may have arbitrary degree $d$ but the coefficient $a_0$ of $X^d$ is 0. Then $F$ is divisible by $Y$, so if $(x, y)$ is a solution of (5.5), then $y$ divides $m$. For each divisor $y$ of $m$ there are at most finitely many integers $x$ with $F(x, y) = m$, which, if they exist, can be determined effectively.

We prove the following.

Theorem 5.13. Let $F \in \mathbb{Z}[X, Y]$ be a binary form of degree $d$. Suppose that the coefficient of $X^d$ in $F$ is non-zero and that $F(X, 1)$ has at least three distinct zeros in $\mathbb{C}$. Let $m$ be a non-zero integer. Then the equation
\[ F(x, y) = m \quad \text{in } x, y \in \mathbb{Z} \tag{5.5} \]
has only finitely many solutions.

In 1909, the Norwegian mathematician A. Thue proved in an ineffective way that Eq. (5.5) has only finitely many solutions. In 1967, Baker gave an effective proof of this.

Proof. By assumption, $F(X, Y) = a_0X^d + \cdots + a_dY^d$ with $a_0 \neq 0$. We make a reduction to the case $a_0 = 1$. If $(x, y)$ is a solution of (5.5), then, by multiplying with $a_0^{d-1}$,
\[ (a_0x)^d + a_1(a_0x)^{d-1}y + a_2a_0(a_0x)^{d-2}y^2 + \cdots + a_da_0^{d-1}y^d = ma_0^{d-1}. \]
Thus, $(a_0x, y)$ satisfies a Thue equation $F'(x', y') = m'$, where the coefficient of $X^d$ in $F'$ is 1.

Henceforth, we assume that the coefficient of $X^d$ in $F$ is 1. Then
\[ F(X, Y) = (X - \alpha_1Y)^{r_1}\cdots(X - \alpha_tY)^{r_t} \]
with $\alpha_1,\dots,\alpha_t \in \mathbb{C}$, $r_1,\dots,r_t > 0$ and $t \geq 3$.

We want to reduce (5.5) to a unit equation. The crucial observation here is that the three linear forms in two variables $X - \alpha_iY$ ($i = 1, 2, 3$) are linearly dependent. More precisely, we have Siegel’s identity
\[ (\alpha_2 - \alpha_3)(X - \alpha_1Y) + (\alpha_3 - \alpha_1)(X - \alpha_2Y) + (\alpha_1 - \alpha_2)(X - \alpha_3Y) = 0. \]
This implies that if $(x, y) \in \mathbb{Z}^2$ is a solution of (5.5), then
\[ \frac{\alpha_2 - \alpha_3}{\alpha_2 - \alpha_1}\cdot\frac{x - \alpha_1y}{x - \alpha_3y} + \frac{\alpha_3 - \alpha_1}{\alpha_2 - \alpha_1}\cdot\frac{x - \alpha_2y}{x - \alpha_3y} = 1. \tag{5.6} \]
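Siegel’s identity and its consequence (5.6) are purely formal and can be sanity-checked with exact rational arithmetic. The sketch below does this for rational values of $\alpha_1, \alpha_2, \alpha_3$ only (in the proof they are algebraic integers of $K$); the function names and the sample values are illustrative assumptions.

```python
from fractions import Fraction

def siegel_identity(alphas, x, y):
    """Left-hand side of Siegel's identity; it should vanish identically."""
    a1, a2, a3 = alphas
    return ((a2 - a3) * (x - a1 * y)
            + (a3 - a1) * (x - a2 * y)
            + (a1 - a2) * (x - a3 * y))

def unit_equation_lhs(alphas, x, y):
    """Left-hand side of (5.6); requires a1 != a2 and x != a3 * y."""
    a1, a2, a3 = (Fraction(a) for a in alphas)
    first = (a2 - a3) / (a2 - a1) * (x - a1 * y) / (x - a3 * y)
    second = (a3 - a1) / (a2 - a1) * (x - a2 * y) / (x - a3 * y)
    return first + second

# Sample: F(X, Y) = X (X - Y)(X + Y), i.e. alphas = (0, 1, -1), at (x, y) = (5, 2).
print(siegel_identity((0, 1, -1), 5, 2), unit_equation_lhs((0, 1, -1), 5, 2))
```

Note that (5.6) holds for every pair $(x, y)$ with $x \neq \alpha_3y$; the Thue equation enters only afterwards, when the factors $x - \alpha_iy$ are shown to be divisors of $m$.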
Let $K = \mathbb{Q}(\alpha_1,\dots,\alpha_t)$. Then $\alpha_1,\dots,\alpha_t \in O_K$ since they are zeros of the monic polynomial $F(X, 1) \in \mathbb{Z}[X]$. Let $(x, y) \in \mathbb{Z}^2$ be a solution of (5.5). Then the numbers $x - \alpha_iy$ ($i = 1, 2, 3$) divide $m$ in $O_K$. By Corollary 5.11, we have $x - \alpha_iy = \mu_i\varepsilon_i$, where $\mu_i$ belongs to an effectively determinable finite set and $\varepsilon_i \in O_K^*$ for $i = 1, 2, 3$. By substituting this into (5.6) we obtain
\[ \frac{\alpha_2 - \alpha_3}{\alpha_2 - \alpha_1}\cdot\frac{\mu_1}{\mu_3}\cdot\frac{\varepsilon_1}{\varepsilon_3} + \frac{\alpha_3 - \alpha_1}{\alpha_2 - \alpha_1}\cdot\frac{\mu_2}{\mu_3}\cdot\frac{\varepsilon_2}{\varepsilon_3} = 1. \]
We may view this as a unit equation with unknowns $\varepsilon_1/\varepsilon_3$, $\varepsilon_2/\varepsilon_3$. We have only finitely many possibilities for each $\mu_i$ which can be determined effectively, and by Theorem 5.12, for each choice of $\mu_1, \mu_2, \mu_3$ we have only finitely many possibilities for the pair $(\varepsilon_1/\varepsilon_3, \varepsilon_2/\varepsilon_3)$ which can be determined effectively. Consequently, if $(x, y)$ runs through the solutions of (5.5), then the quotient $(x - \alpha_1y)/(x - \alpha_3y) = (\mu_1/\mu_3)(\varepsilon_1/\varepsilon_3)$ runs through a finite set which can be determined effectively. We can compute $x/y$ from $(x - \alpha_1y)/(x - \alpha_3y)$ and then $x, y$ from $y^dF(x/y, 1) = F(x, y) = m$. In this way, it follows that (5.5) has only finitely many solutions which can be determined effectively.

Remark. Today, Thue equations can really be solved in practice, and several packages contain routines to solve Thue equations (KANT, Maple but to my knowledge not yet SAGE). These routines are based on lower bounds for linear forms in logarithms, and the LLL-algorithm.

Exercise 5.2. Let $F(X, Y) \in \mathbb{Z}[X, Y]$ be a positive definite binary form of degree $d \geq 3$, i.e., the coefficient of $X^d$ is $> 0$ and the zeros of $F(X, 1)$ are all in $\mathbb{C} \setminus \mathbb{R}$. Prove, without using lower bounds for linear forms in logarithms, that for each positive integer $m$ the equation $F(x, y) = m$ has only finitely many solutions in $x, y \in \mathbb{Z}$.

We finish with stating, without proof, an effective finiteness result for the equation
\[ by^n = f(x) \quad \text{in } x, y \in \mathbb{Z}, \tag{5.7} \]
where n > 2, b is a non-zero integer and f ∈ Z[X]. For n = 2 this is called a hyperelliptic equation and for n > 3 a superelliptic equation. Such equations can be reduced to unit equations or Thue equations. 80
Theorem 5.14 (Baker, 1968). Assume that $f$ has no multiple zeros and that $f$ has degree at least 2 if $n \geq 3$ and degree at least 3 if $n = 2$. Then (5.7) has only finitely many solutions, and its set of solutions can be determined effectively.
We consider a special case to illustrate the idea of the proof. Consider the equation
(5.8) $y^3 = 2x(x - 3)$ in $x, y \in \mathbb{Z}$.
Let $(x, y)$ be a solution of (5.8). The gcd of $2x$ and $x - 3$ divides 6. So if $p$ is a prime number $\geq 5$, then $p$ divides at most one of $2x$, $x - 3$, and if it divides one of these numbers, the exponent of $p$ in the unique factorization of that number is divisible by 3. It follows that $2x = au^3$, $x - 3 = bv^3$ where $ab$ is a third power, and both $a$ and $b$ are composed of primes from $\{2, 3\}$. In fact, we may assume that the exponents on 2, 3 in $a, b$ are either 0, 1, or 2, since powers $2^{3k}$, $3^{3l}$ can be absorbed by $u, v$. Thus, $a, b \in \{\pm 2^k 3^l : k, l = 0, 1, 2\}$. Considering the solutions $(x, y)$ of (5.8) with fixed $a, b$, we get a Thue equation $au^3 - 2bv^3 = 6$. By determining the solutions $(u, v)$ for each of these Thue equations, we can determine the solutions of (5.8).
A similar approach can be followed for equations $y^n = f(x)$ with $n \geq 3$ if $f$ is not reducible over $\mathbb{Z}$. Then $f$ factorizes over some algebraic number field $K$, and we have to make a reduction to Thue equations of which the unknowns are taken from $O_K$ instead of $\mathbb{Z}$. For such equations one has a finiteness result similar to Theorem 5.13. In the case $n = 2$, one can make a reduction only to Thue equations where the involved binary form has degree 2, and in this case, Thue’s theorem is not applicable. Then one needs a more complicated argument.
Exercise 5.3. Let $a_i, b_i, c_i$ ($i = 1, 2$) be integers with $a_1, b_1, c_1, a_2, b_2, c_2, a_1 b_2 - a_2 b_1, b_1 c_2 - b_2 c_1 \neq 0$. Prove that there are only finitely many triples $(x, y, z) \in \mathbb{Z}^3$ satisfying the system of equations
(5.9) $a_1 x^2 - b_1 z^2 = c_1$, $a_2 y^2 - b_2 z^2 = c_2$.
Hint. Let $K = \mathbb{Q}(\sqrt{a_1}, \sqrt{b_1}, \sqrt{a_2}, \sqrt{b_2})$. Apply Theorem 5.12 and the ideas in the proof of Theorem 5.13 to the identities
$$\sqrt{b_2}\,(x\sqrt{a_1} + z\sqrt{b_1}) - \sqrt{b_1}\,(y\sqrt{a_2} + z\sqrt{b_2}) = x\sqrt{a_1 b_2} - y\sqrt{a_2 b_1},$$
$$\sqrt{b_2}\,(x\sqrt{a_1} - z\sqrt{b_1}) - \sqrt{b_1}\,(y\sqrt{a_2} - z\sqrt{b_2}) = x\sqrt{a_1 b_2} - y\sqrt{a_2 b_1}.$$
Then conclude that if $(x, y, z)$ runs through the solutions of (5.9) then $(x\sqrt{a_1} + z\sqrt{b_1})/(x\sqrt{a_1} - z\sqrt{b_1})$ runs through a finite set.
Exercise 5.4. Use Exercise 5.3 to prove that the equation $y^2 = x(2x - 3)(4x - 5)$ in $x, y \in \mathbb{Z}$ has only finitely many solutions.
In 1976, Schinzel and Tijdeman obtained the surprising result that Eq. (5.7) has no solutions with $y \neq 0, \pm 1$ if $n$ is too large.
Theorem 5.15 (Schinzel-Tijdeman, 1976). Let $b$ be a non-zero integer and $f(X) \in \mathbb{Z}[X]$ a polynomial of degree at least 2 without multiple zeros. Then there is an effectively computable number $C$ depending on $f$ such that if $by^n = f(x)$ is solvable in $x, y \in \mathbb{Z}$ with $y \neq 0, \pm 1$, then $n \leq C$.
Exercise 5.5. (i) Prove that the equation $x^n - 2y^n = 1$ in $x, y \in \mathbb{Z}$ with $x > 2$, $y > 2$ has no solutions if $n > 15000$.
Hint. Applying the estimate of Laurent-Mignotte-Nesterenko from Exercise 5.1 to an appropriate linear form in two logarithms you will get a lower estimate depending on $n$ and $x, y$. But you can also derive an upper estimate which depends on $n, x, y$. Comparing the two estimates leads to an upper bound for $n$ independent of $x, y$.
(ii) Let $a, b, c$ be positive integers. Prove that there is a number $C$, effectively computable in terms of $a, b, c$, such that the equation $ax^n - by^n = c$ has no solutions if $n > C$. In the case $a = b$ you may give an elementary proof, without using the result of Laurent-Mignotte-Nesterenko.
(iii) Prove that the equation $y^z = \binom{x}{3}$ in $x, y, z \in \mathbb{Z}$ with $x > 4$, $y > 2$, $z > 3$ has only finitely many solutions.
5.4
p-adic analogues
The results mentioned in Section 5.1 have so-called p-adic analogues. We give one example. Recall that each non-zero rational number can be expressed uniquely as a product of prime powers. We may express this as
$$a = \pm \prod_{p \in P} p^{\mathrm{ord}_p(a)},$$
where $P$ is the set of prime numbers, and the exponents $\mathrm{ord}_p(a)$ are integers, at most finitely many of which are non-zero. We define the p-adic absolute value of $a$ by $|a|_p := p^{-\mathrm{ord}_p(a)}$ for $a \in \mathbb{Q}^*$, $|0|_p := 0$. For instance, $-72/343 = -2^3 \cdot 3^2 \cdot 7^{-3}$, hence $|-72/343|_2 = 2^{-3}$, $|-72/343|_3 = 3^{-2}$, $|-72/343|_7 = 7^3$. Notice that for any prime number $p$ we have $|ab|_p = |a|_p |b|_p$, $|a + b|_p \leq \max(|a|_p, |b|_p)$ for $a, b \in \mathbb{Q}$. The last inequality is called the strong triangle inequality or ultrametric inequality. In general, if $a_1, \ldots, a_r$ are rational numbers such that $|a_1|_p > |a_i|_p$ for $i = 2, \ldots, r$, then
(5.10) $|a_1 + \cdots + a_r|_p = |a_1|_p$.
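The definitions above are easy to experiment with. The following minimal sketch computes ord_p and the p-adic absolute value for rational numbers in exact arithmetic; the function names `ord_p` and `abs_p` are illustrative choices, not notation fixed by these notes beyond the mathematical symbols they mirror.

```python
from fractions import Fraction


def ord_p(a, p):
    """Exponent of the prime p in the factorization of the non-zero rational a."""
    a = Fraction(a)
    if a == 0:
        raise ValueError("ord_p(0) is not defined")
    n, d, e = a.numerator, a.denominator, 0
    while n % p == 0:   # powers of p in the numerator count positively
        n //= p
        e += 1
    while d % p == 0:   # powers of p in the denominator count negatively
        d //= p
        e -= 1
    return e


def abs_p(a, p):
    """p-adic absolute value |a|_p = p^(-ord_p(a)), with |0|_p = 0."""
    a = Fraction(a)
    if a == 0:
        return Fraction(0)
    return Fraction(p) ** (-ord_p(a, p))


a = Fraction(-72, 343)
print(abs_p(a, 2), abs_p(a, 3), abs_p(a, 7))  # the example from the text
```

For instance, with p = 7, a = 1/7 and b = 3 one has |a|_7 = 7 > 1 = |b|_7, and `abs_p(a + b, 7)` returns 7, in line with (5.10).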
The strong triangle inequality implies that the p-adic absolute value $|\cdot|_p$ defines a metric $d_p$ on $\mathbb{Q}$, given by $d_p(x, y) := |x - y|_p$. Two numbers $x, y \in \mathbb{Q}$ are p-adically close if $d_p(x, y)$ is small, which means that $x - y = a/b$ where $a, b$ are coprime integers and $a$ is divisible by a high power of $p$. From topology it is known how to complete a metric space, by adding to this space the limits of all its Cauchy sequences. The metric completion of $\mathbb{Q}$ with metric $d_p$ is denoted $\mathbb{Q}_p$. As it turns out, addition and multiplication on $\mathbb{Q}$ can be extended to $\mathbb{Q}_p$, and this makes $\mathbb{Q}_p$ into a field, the field of p-adic numbers. In Diophantine approximation, $|\cdot|_p$ and $\mathbb{Q}_p$ have the same ‘status’ as the ordinary absolute value and $\mathbb{R}$, and many results in Diophantine approximation and transcendence theory have analogues in the p-adic setting. We will not go into this. To give a flavour, we give an analogue of Corollary 5.3 in the case that $\alpha_1, \ldots, \alpha_m$ are rational numbers. There is a more general version for algebraic $\alpha_1, \ldots, \alpha_m$ but it requires knowledge of algebraic number theory to state this.
Theorem 5.16 (Yu, 1986). Let $p$ be a prime number, let $a_1, \ldots, a_m$ be non-zero rational numbers with $|a_i|_p = 1$ for $i = 1, \ldots, m$. Further, let $b_1, \ldots, b_m$ be integers such that $a_1^{b_1} \cdots a_m^{b_m} \neq 1$. Put $B := \max(|b_1|, \ldots, |b_m|)$. Then
$$|a_1^{b_1} \cdots a_m^{b_m} - 1|_p \geq (eB)^{-C}$$
where $C$ is an effectively computable number depending on $p$, $m$ and $a_1, \ldots, a_m$.
For $m = 1$ there is a sharper result which can be proved by elementary means (exercise below). But for $m \geq 2$ the proof is very difficult. One may define p-adic logarithms and translate the theorem into a lower bound for the p-adic absolute value of a linear form in p-adic logarithms. We do not work this out.
Exercise 5.6. Let $a$ be an integer, and $p$ a prime, such that $|a|_p \leq p^{-1}$ if $p > 2$ and $|a|_2 \leq 2^{-2}$ if $p = 2$. Prove that for any positive integer $b$ we have $|(1 + a)^b - 1|_p = |ab|_p \geq 1/ab$.
Hint. First prove this for $b$ with $|b|_p = 1$ and for $b = p$. Then prove it for $b = up^t$ where $u$ is an integer not divisible by $p$ and $t \geq 0$.
We give an application. Let $S = \{p_1, \ldots, p_t\}$ be a finite set of prime numbers and define the multiplicative group
$$U_S := \{\pm p_1^{z_1} \cdots p_t^{z_t} : z_1, \ldots, z_t \in \mathbb{Z}\}.$$
Theorem 5.17. The equation x + y = 1 in x, y ∈ US
(5.11)
has only finitely many solutions, and these can be determined effectively. Proof. Let (x, y) be a solution of (5.11). We may write x = u/w, y = v/w where u, v, w are integers with gcd(u, v, w) = 1. Then (5.12)
u + v = w.
The integers $u, v, w$ are composed of primes from $S$, and moreover, no prime divides two numbers among $u, v, w$ since $u, v, w$ are coprime. After reordering the primes $p_1, \ldots, p_t$, we may assume that
$$u = \pm p_1^{b_1} \cdots p_r^{b_r}, \quad v = \pm p_{r+1}^{b_{r+1}} \cdots p_s^{b_s}, \quad w = \pm p_{s+1}^{b_{s+1}} \cdots p_t^{b_t},$$
where $0 \leq r \leq s \leq t$ and the $b_i$ are non-negative integers (empty products are equal to 1; for instance if $r = 0$ then $u = \pm 1$). We have to prove that $B := \max(b_1, \ldots, b_t)$ is bounded above by an effectively computable number depending only on $p_1, \ldots, p_t$. By symmetry, we may assume that $B = b_t$. Then using $-(u/v) - 1 = -(w/v)$ we obtain
$$0 < |\pm p_1^{b_1} \cdots p_r^{b_r} p_{r+1}^{-b_{r+1}} \cdots p_s^{-b_s} - 1|_{p_t} = |w/v|_{p_t} = p_t^{-b_t} = p_t^{-B}.$$
From Theorem 5.16 we obtain that $|\cdots|_{p_t} \geq (eB)^{-C}$, where $C$ is effectively computable in terms of $p_1, \ldots, p_t$. Hence $(eB)^{-C} \leq p_t^{-B}$. So indeed, $B$ is bounded above by an effectively computable number depending on $p_1, \ldots, p_t$.
Remark. In his PhD-thesis from 1988, de Weger gave a practical algorithm, based on strong linear forms in logarithms estimates and the LLL-basis reduction algorithm, to solve equations of the type (5.11). As a consequence, he showed that the equation $x + y = 1$ has precisely 545 solutions in positive integers $x, y \in U_S$ with $0 < x \leq y$, where $S = \{2, 3, 5, 7, 11, 13\}$.
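To get a feeling for equation (5.11), one can search for solutions with small exponents. The sketch below is a naive enumeration, not de Weger's algorithm: it lists all x = ±p_1^{z_1}···p_t^{z_t} with |z_i| ≤ `bound` and keeps those for which 1 − x again lies in U_S. The exponent bound is an assumption supplied by the caller; the code does not derive it from Theorem 5.16, so by itself it proves nothing about completeness. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def is_s_unit(q, primes):
    """True if the rational q is of the form +-p_1^z_1 ... p_t^z_t."""
    if q == 0:
        return False
    n, d = abs(q.numerator), q.denominator
    for p in primes:
        while n % p == 0:
            n //= p
        while d % p == 0:
            d //= p
    return n == 1 and d == 1


def s_unit_solutions(primes, bound):
    """Solutions (x, y) of x + y = 1 in S-units, where all exponents of x
    lie in [-bound, bound]. The bound is a search parameter, not a proven one."""
    solutions = set()
    for exps in product(range(-bound, bound + 1), repeat=len(primes)):
        x = Fraction(1)
        for p, e in zip(primes, exps):
            x *= Fraction(p) ** e
        for sign in (1, -1):
            y = 1 - sign * x
            if is_s_unit(y, primes):
                solutions.add((sign * x, y))
    return solutions


sols = s_unit_solutions((2, 3), 4)
print(len(sols))  # 21 for S = {2, 3}
```

For S = {2, 3} the coprime relations 1 + 1 = 2, 1 + 2 = 3, 1 + 3 = 4 and 1 + 8 = 9 account for all 21 solutions found (3 from the first relation, 6 from each of the others), so the exponent bound 4 already suffices in this small case.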
Let $K$ be an algebraic number field and let $\Gamma$ be a finitely generated, multiplicative subgroup of $K^*$, i.e., there are $\gamma_1, \ldots, \gamma_t \in \Gamma$ such that every element of $\Gamma$ can be expressed as $\zeta \gamma_1^{z_1} \cdots \gamma_t^{z_t}$ where $\zeta$ is a root of unity in $K$, and $z_1, \ldots, z_t$ are integers. Further, let $a, b$ be non-zero elements from $K$ and consider the equation
(5.13) $ax + by = 1$ in $x, y \in \Gamma$.
The following result is a common generalization of both Theorems 5.12 and 5.17.
Theorem 5.18 (Győry, 1979). Equation (5.13) has only finitely many solutions, and these can be determined effectively.
In 1960, Lang gave an ineffective proof of this result, by combining earlier work of Siegel (1921), Mahler (1933) and Parry (1950). Győry’s proof is based on Corollary 5.3 and a generalization of Theorem 5.16 for algebraic numbers.
Chapter 6 Approximation of algebraic numbers by rationals Literature: W.M. Schmidt, Diophantine approximation, Lecture Notes in Mathematics 785, Springer Verlag 1980, Chap.II, §§1,2, Chap. IV, §1 L.J. Mordell, Diophantine Equations, Pure and applied Mathematics series, vol. 30, Academic Press, 1969. reprint of the 1971 edition.
6.1
Liouville’s Theorem and Roth’s Theorem
We are interested in the problem how well a given real algebraic number can be approximated by rational numbers. Recall that the height $H(\xi)$ of a rational number $\xi$ is given by $H(\xi) := \max(|x|, |y|)$, where $x, y$ are coprime integers such that $\xi = x/y$. In Homework exercise 10b, you were asked to prove the following inequality, which is a variation on a result of Liouville from 1844:
Theorem 6.1. Let $\alpha$ be an algebraic number of degree $d \geq 1$. Then there is an effectively computable number $c(\alpha) > 0$ such that
(6.1) $|\alpha - \xi| \geq c(\alpha) H(\xi)^{-d}$ for every $\xi \in \mathbb{Q}$ with $\xi \neq \alpha$.
Here we may take $c(\alpha) = \mathrm{den}(\alpha)^{-d} \cdot (1 + \overline{|\alpha|})^{1-d}$.
Let $\alpha$ be an algebraic number of degree $d > 1$. One of the central problems in Diophantine approximation is to improve (6.1) by replacing $H(\xi)^{-d}$ on the right-hand side by $H(\xi)^{-\kappa}$ with $\kappa < d$. More precisely, the problem is whether there exist $\kappa < d$ and a constant $c(\alpha, \kappa) > 0$ depending only on $\alpha, \kappa$, such that
(6.2) $|\xi - \alpha| \geq c(\alpha, \kappa) H(\xi)^{-\kappa}$ for every $\xi \in \mathbb{Q}$.
Recall that by Dirichlet’s Theorem, there exist infinitely many pairs of integers $x, y$ such that $|x/y - \alpha| \leq |y|^{-2}$, $y \neq 0$. For such solutions we have $|x| \leq (|\alpha| + 1) \cdot |y|$. Hence, writing $\xi = x/y$ we infer that there is a constant $c_1(\alpha) > 0$ such that $|\xi - \alpha| \leq c_1(\alpha) H(\xi)^{-2}$ for infinitely many $\xi \in \mathbb{Q}$. This shows that there cannot exist an inequality of the shape (6.2) with $\kappa < 2$. In particular, for rational or quadratic algebraic numbers $\alpha$, Theorem 6.1 gives the best possible result in terms of the exponent on $H(\xi)$.
Now let $\alpha$ be a real algebraic number of degree $d \geq 3$. In 1909, the Norwegian mathematician A. Thue made an important breakthrough by showing that for every $\kappa > \frac{d}{2} + 1$ there exists a constant $c(\alpha, \kappa) > 0$ such that (6.2) holds. In 1921, C.L. Siegel proved the same for every $\kappa > 2\sqrt{d}$. In 1949, A.O. Gel’fond and independently Freeman Dyson (the famous physicist) improved this to $\kappa > \sqrt{2d}$. Finally, in 1955, K.F. Roth proved the following result, for which he was awarded the Fields medal.
Theorem 6.2 (Roth, 1955). Let $\alpha$ be a real algebraic number of degree $d \geq 3$. Then for every $\kappa > 2$ there exists a constant $c(\alpha, \kappa) > 0$ such that
(6.2) $|\xi - \alpha| \geq c(\alpha, \kappa) H(\xi)^{-\kappa}$ for every $\xi \in \mathbb{Q}$.
As mentioned before, Roth’s Theorem is valid also if $\alpha$ is a rational or quadratic number (with the proviso that $\xi \neq \alpha$ if $\alpha \in \mathbb{Q}$) but then it is weaker than (6.1). Further, Roth’s Theorem holds true also for complex, non-real algebraic numbers $\alpha$; then we have in fact $|\xi - \alpha| \geq |\mathrm{Im}\, \alpha|$ for $\xi \in \mathbb{Q}$, i.e., (6.2) holds even with $\kappa = 0$.
Exercise 6.1. Let $\alpha$ be a real algebraic number of degree $d \geq 3$. Prove that the following three assertions are equivalent:
(i) for every $\kappa > 2$ there is a constant $c(\alpha, \kappa) > 0$ with (6.2);
(ii) for every $\kappa > 2$, the inequality
(6.3) $|\xi - \alpha| \leq H(\xi)^{-\kappa}$ in $\xi \in \mathbb{Q}$
has only finitely many solutions;
(iii) for every $\kappa > 2$, $C > 0$, the inequality
(6.4) $|\xi - \alpha| \leq C H(\xi)^{-\kappa}$ in $\xi \in \mathbb{Q}$
has only finitely many solutions.
It should be noted that Theorem 6.1 is effective, i.e., the constant $c(\alpha)$ in (6.1) can be computed. In contrast, the results of Thue, Siegel, Gel’fond, Dyson and Roth mentioned above are ineffective, i.e., with their methods of proof one can prove only the existence of a constant $c(\alpha, \kappa) > 0$ as in (6.2), but one cannot compute such a constant. Equivalently, the methods of proof of Thue, ..., Roth show that the inequalities (6.3), (6.4) have only finitely many solutions, but they do not provide a method to determine these solutions.
Thue used his result on the approximation of algebraic numbers stated above to prove his famous theorem that the equation $F(x, y) = m$ in $x, y \in \mathbb{Z}$, where $F$ is a binary form in $\mathbb{Z}[X, Y]$ such that $F(X, 1)$ has at least three distinct roots and $m$ is a non-zero integer, has at most finitely many solutions. We prove a more general result. A binary form $F(X, Y) \in \mathbb{Z}[X, Y]$ is called square-free if it is not divisible in $\mathbb{C}[X, Y]$ by $(\alpha X + \beta Y)^2$ for some $\alpha, \beta \in \mathbb{C}$, not both 0.
Theorem 6.3. Let $F(X, Y) \in \mathbb{Z}[X, Y]$ be a square-free binary form of degree $d \geq 3$. Then for every $\kappa > 2$ there is a constant $c(F, \kappa) > 0$ such that for every pair of integers $(x, y)$ with $F(x, y) \neq 0$ we have
(6.5) $|F(x, y)| \geq c(F, \kappa) \max(|x|, |y|)^{d-\kappa}$.
If $F$ is a binary form of degree $d \leq 2$ the theorem holds true as well but then it is trivial since $|F(x, y)| \in \mathbb{Z}$, hence $\geq 1$.
Proof. We prove the inequality only for pairs of integers $(x, y)$ with $|y| \geq |x|$. Then the inequality can be deduced for pairs $(x, y)$ with $|x| \geq |y|$ by interchanging $x, y$ and repeating the argument below. Next, we restrict to the case that $|y| \geq |x|$ and $F$ is not divisible by $Y$. If $F$ is divisible by $Y$ we have $F = Y \cdot F_1$ where $F_1 \in \mathbb{Z}[X, Y]$ is a square-free binary form of degree $d - 1 \geq 2$ which is not divisible by $Y$. Then if the inequality holds for $F_1$ and with $d - 1$ instead of $d$, it follows automatically for $F$.
So assume that $F$ is a square-free binary form of degree $d \geq 2$ that is not divisible by $Y$. Then $F(X, Y) = a_0 X^d + a_1 X^{d-1} Y + \cdots + a_d Y^d$ with $a_0 \neq 0$, and so, $F(X, Y) = a_0 (X - \alpha_1 Y) \cdots (X - \alpha_d Y)$ with $\alpha_1, \ldots, \alpha_d$ distinct. Let $(x, y)$ be a pair of integers with $F(x, y) \neq 0$ and $|y| \geq |x|$. Then $y \neq 0$. Let $\xi := x/y$. Notice that $|y| = \max(|x|, |y|) \geq H(\xi)$ (with equality if $\gcd(x, y) = 1$). Let $i$ be the index with
$$|\xi - \alpha_i| = \min_{j=1,\ldots,d} |\xi - \alpha_j|.$$
By Theorem 6.2 we have $|\xi - \alpha_i| \geq c(\alpha_i, \kappa) H(\xi)^{-\kappa} \geq c(\alpha_i, \kappa) \max(|x|, |y|)^{-\kappa}$. For $j \neq i$ we have $|\alpha_i - \alpha_j| \leq |\alpha_i - \xi| + |\xi - \alpha_j| \leq 2|\xi - \alpha_j|$, implying $|\xi - \alpha_j| \geq \frac{1}{2} |\alpha_i - \alpha_j|$. Hence
$$|F(x, y)| = |y|^d \cdot |a_0| \prod_{j=1}^{d} |\xi - \alpha_j| = \max(|x|, |y|)^d \cdot |a_0| \prod_{j=1}^{d} |\xi - \alpha_j| \geq c(\alpha_i, \kappa) |a_0| \prod_{j \neq i} \tfrac{1}{2} |\alpha_i - \alpha_j| \cdot \max(|x|, |y|)^{d-\kappa}.$$
Taking for $c(F, \kappa)$ the minimum over $i = 1, \ldots, d$ of the constants on the right-hand side gives (6.5).
We deduce Thue’s Theorem.
Corollary 6.4. Let $F(X, Y)$ be a binary form in $\mathbb{Z}[X, Y]$ such that $F(X, 1)$ has at least three distinct roots. Further, let $m$ be a non-zero integer. Then the equation $F(x, y) = m$ in $x, y \in \mathbb{Z}$ has at most finitely many solutions.
Proof. We first make a reduction to the case that $F(X, Y)$ is square-free, by showing that $F$ is divisible in $\mathbb{Z}[X, Y]$ by a square-free binary form $F^* \in \mathbb{Z}[X, Y]$ of degree $\geq 3$. We can factor the polynomial $F(X, 1)$ as $c\, g_1(X)^{k_1} \cdots g_t(X)^{k_t}$ where $c$ is a non-zero integer and $g_1(X), \ldots, g_t(X)$ are irreducible polynomials in $\mathbb{Z}[X]$ none of which is a constant multiple of the others. Let $f^*(X) := g_1(X) \cdots g_t(X)$. Then $f^* \in \mathbb{Z}[X]$, and $\deg f^* =: d \geq 3$ since $F(X, 1)$ has at least three zeros in $\mathbb{C}$. We have $F(X, 1) = f^*(X) g(X)$ with $g \in \mathbb{Z}[X]$. Put $F^*(X, Y) := Y^d f^*(X/Y)$ and $G(X, Y) := Y^{\deg F - d} g(X/Y)$. Then $F = F^* G$ with $G \in \mathbb{Z}[X, Y]$. The polynomial $f^*$ has degree $d \geq 3$ and $d$ distinct zeros, and it divides $F(X, 1)$ in $\mathbb{Z}[X]$. Hence $F^*$ is square-free, $F^*$ has degree $d \geq 3$ and $F^*$ divides $F$ in $\mathbb{Z}[X, Y]$.
Let $x, y$ be integers with $F(x, y) = m$. Then $F^*(x, y)$ divides $m$. Take $\kappa$ with $2 < \kappa < d$. Then by Theorem 6.3, $|m| \geq |F^*(x, y)| \geq c(F^*, \kappa) \max(|x|, |y|)^{d-\kappa}$, implying that $|x|, |y|$ are bounded.
The total degree of a polynomial $G = \sum_{\mathbf{i}} a_{\mathbf{i}} X_1^{i_1} \cdots X_r^{i_r}$, notation totdeg $G$, is the maximum of all quantities $i_1 + \cdots + i_r$, taken over all tuples $\mathbf{i} = (i_1, \ldots, i_r)$ with $a_{\mathbf{i}} \neq 0$. For instance, $3X_1^7 X_2^5 X_3^2 - 2X_1 X_2^{12} X_3^2$ has total degree 15.
Exercise 6.2. Let $F \in \mathbb{Z}[X, Y]$ be a square-free binary form of degree $d \geq 4$, and let $G \in \mathbb{Z}[X, Y]$ be a polynomial of total degree $\leq d - 3$. Prove that there are only finitely many pairs $(x, y) \in \mathbb{Z}^2$ with $F(x, y) = G(x, y)$ and $F(x, y) \neq 0$.
As mentioned before, the proof of Roth’s Theorem is ineffective, and an effective proof of Roth’s Theorem seems to be very far away. There are however effective improvements of Liouville’s inequality, i.e., inequalities of the shape
$$|\xi - \alpha| \geq c(\alpha, \kappa) H(\xi)^{-\kappa} \quad \text{for } \xi \in \mathbb{Q}$$
where $\alpha$ is algebraic of degree $d \geq 3$ and $\kappa < d$ (but very close to $d$) and with some explicit expression for $c(\alpha, \kappa)$. We mention the following result of the Russian mathematician Fel’dman, obtained using lower bounds for linear forms in logarithms.
Theorem 6.5 (Fel’dman, 1971). Let $\alpha$ be a real algebraic number of degree $d \geq 3$. Then there exist effectively computable numbers $c_1(\alpha), c_2(\alpha) > 0$ depending on $\alpha$ such that
(6.6) $|\xi - \alpha| \geq c_1(\alpha) H(\xi)^{-d + c_2(\alpha)}$ for $\xi \in \mathbb{Q}$.
The proof is too complicated to be given here, but we can give a brief sketch. The hard core is the following effective result on Thue equations, given by Fel’dman. The proof is by making explicit the arguments of the proof given in the previous chapter.
Lemma 6.6. Let $F \in \mathbb{Z}[X, Y]$ be a binary form such that $F(X, 1)$ has at least three zeros in $\mathbb{C}$. Then there are effectively computable numbers $A, B$ depending only on $F$, such that for each non-zero integer $m$ and each solution $(x, y) \in \mathbb{Z}^2$ of $F(x, y) = m$ we have $\max(|x|, |y|) \leq A|m|^B$.
Proof of Theorem 6.5 (assuming Lemma 6.6). Let $f(X) = a_0 X^d + a_1 X^{d-1} + \cdots + a_d = a_0 (X - \alpha^{(1)}) \cdots (X - \alpha^{(d)})$ be the primitive minimal polynomial of $\alpha$ and $F(X, Y) := Y^d f(X/Y)$. Then $F(X, Y)$ is a binary form in $\mathbb{Z}[X, Y]$. Let $\xi = x/y$ with $x, y \in \mathbb{Z}$ coprime. Then $f(\xi) \neq 0$ and this implies $m := F(x, y) \neq 0$. By Lemma 6.6 we have
(6.7) $|F(x, y)| = |m| \geq \big(\max(|x|, |y|)/A\big)^{1/B} = \big(H(\xi)/A\big)^{1/B}$,
where $A, B$ are effectively computable positive numbers depending on $F$, hence $\alpha$. It remains to estimate from below $|\xi - \alpha|$ in terms of $|F(x, y)|$ and $H(\xi)$.
Assume $\alpha^{(1)} = \alpha$. Notice that $F(x, y) = a_0 \prod_{i=1}^{d} (x - \alpha^{(i)} y)$. We estimate the factors as follows:
$$|x - \alpha^{(1)} y| = |\xi - \alpha| \cdot |y| \leq |\xi - \alpha| \cdot H(\xi),$$
$$|x - \alpha^{(i)} y| \leq |x| + |\alpha^{(i)}| \cdot |y| \leq (1 + |\alpha^{(i)}|) H(\xi) \quad (i = 2, \ldots, d).$$
Thus,
$$|F(x, y)| = |a_0| \prod_{i=1}^{d} |x - \alpha^{(i)} y| \leq |a_0| \cdot |\xi - \alpha| \cdot H(\xi)^d \prod_{i=2}^{d} (1 + |\alpha^{(i)}|) = |\xi - \alpha| \cdot C(\alpha) H(\xi)^d,$$
say, where $C(\alpha)$ is effectively computable. Hence $|\xi - \alpha| \geq |F(x, y)| \cdot C(\alpha)^{-1} H(\xi)^{-d}$. Combined with (6.7) this gives
$$|\xi - \alpha| \geq A^{-1/B} C(\alpha)^{-1} H(\xi)^{-d + 1/B} = c_1(\alpha) H(\xi)^{-d + c_2(\alpha)}$$
with $c_1(\alpha) = A^{-1/B} C(\alpha)^{-1}$, $c_2(\alpha) = 1/B$.
The quantities $c_1(\alpha), c_2(\alpha)$ are very small numbers for which one can find an explicit expression by going through the proof. For instance, Bugeaud proved in 1998 that (6.6) holds with
$$c_1(\alpha) = \exp\Big(-10^{27d} d^{16d} H^{d-1} \big(\log(edH)\big)^{d-1}\Big), \qquad c_2(\alpha) = \Big(10^{27d} d^{16d} H^{d-1} \big(\log(edH)\big)^{d-1}\Big)^{-1}$$
where $H = H(\alpha)$ is the height of $\alpha$.
One can obtain better results for certain special classes of algebraic numbers using other methods. M. Bennett obtained good effective improvements of Liouville’s inequality for various numbers of the shape $\sqrt[m]{a}$ where $m$ is a positive integer and $a$ a positive rational number. For instance he showed that
(6.8) $|\xi - \sqrt[3]{2}| \geq \frac{1}{4} H(\xi)^{-2.45}$ for $\xi \in \mathbb{Q}$.
Exercise 6.3. Using (6.8), compute explicit constants $A, B$ such that the following holds: for any solution $x, y \in \mathbb{Z}$ of $x^3 - 2y^3 = m$ we have $\max(|x|, |y|) \leq A|m|^B$.
Hint. Go through the proof of Theorem 6.3 and compute a constant $c$ such that $|x^3 - 2y^3| \geq c \max(|x|, |y|)^{3-2.45}$ for all $x, y \in \mathbb{Z}$. Notice that we may have $|x| > |y|$.
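Inequality (6.8) can be spot-checked numerically for fractions of small height. The sketch below uses ordinary floating-point arithmetic, which is adequate here because the two sides differ by a comfortable margin in the tested range; the search limits and the function name `bennett_holds` are arbitrary illustrative choices, and a finite search of this kind is of course no substitute for Bennett's proof.

```python
from math import gcd

CBRT2 = 2 ** (1 / 3)  # floating-point approximation of the cube root of 2


def bennett_holds(x, y):
    """Check |x/y - 2^(1/3)| >= (1/4) * H^(-2.45) for coprime x, y with y > 0."""
    height = max(abs(x), abs(y))
    return abs(x / y - CBRT2) >= 0.25 * height ** -2.45


# Look for violations among all reduced fractions x/y with 1 <= y <= 200, |x| <= 400.
violations = [
    (x, y)
    for y in range(1, 201)
    for x in range(-400, 401)
    if gcd(x, y) == 1 and not bennett_holds(x, y)
]
print(violations)  # expected to be empty
```

The tightest small case is ξ = 1, where |1 − 2^{1/3}| ≈ 0.26 only just exceeds 1/4.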
The techniques used by Thue, ..., Roth cannot be used in general to solve Diophantine equations, but together with suitable refinements, they allow one to give explicit upper bounds for the number of solutions of Diophantine equations. For instance we have:
Theorem 6.7 (Bombieri, Schmidt, 1986). Let $F(X, Y)$ be a binary form in $\mathbb{Z}[X, Y]$ such that $F(X, 1)$ has precisely $d \geq 3$ distinct roots. Then the equation $F(x, y) = 1$ in $x, y \in \mathbb{Z}$ has at most $c \cdot d$ solutions where $c$ is a positive constant not depending on $F$.
The importance of the result is that the bound is uniform, i.e., for all binary forms $F$ we get the upper bound $cd$. It is possible to compute $c$ explicitly. Bombieri and Schmidt showed that for sufficiently large $d$, $c$ can be taken equal to 430. Probably the constant $c$ can be improved, but the dependence on $d$ is optimal. For instance, let $F(X, Y) = (X - a_1 Y) \cdots (X - a_d Y) + Y^d$, where $a_1, \ldots, a_d$ are distinct integers. Then the equation $F(x, y) = 1$ has the $d$ solutions $(a_1, 1), \ldots, (a_d, 1)$.
M. Bennett proved the following remarkable result:
Theorem 6.8 (Bennett, 2002). Let $d$ be an integer with $d \geq 3$ and let $a, b$ be positive integers. Then the equation $|ax^d - by^d| = 1$ has at most one solution in positive integers $x, y$.
For instance, the equation $(a + 1)x^d - ay^d = 1$ has $(1, 1)$ as its only solution in positive integers. In his proof, Bennett uses various techniques (good lower bounds for linear forms in two logarithms, Diophantine approximation techniques based on so-called hypergeometric functions, and heavy computations).
We finish with an exercise related to the abc-conjecture, formulated by Masser and Oesterlé in 1985. The radical rad($N$) of an integer $N$ is the product of the primes dividing $N$. For instance, $\mathrm{rad}(\pm 2^3 5^7 11^8) = 2 \cdot 5 \cdot 11$.
abc-conjecture. For every $\varepsilon > 0$ there is a constant $C(\varepsilon) > 0$ such that for all positive integers $a, b, c$ with $a + b = c$, $\gcd(a, b, c) = 1$ we have $c \leq C(\varepsilon)\, \mathrm{rad}(abc)^{1+\varepsilon}$.
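The radical is straightforward to compute by trial division for integers of moderate size. The following minimal sketch (the function name `rad` mirrors the notation of the text; trial division is an illustrative choice, not an efficient factoring method) reproduces the example above and evaluates the radical for the triple 1 + 8 = 9.

```python
def rad(n):
    """Product of the distinct primes dividing the non-zero integer n."""
    n = abs(n)
    if n == 0:
        raise ValueError("rad(0) is not defined")
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            result *= p
            while n % p == 0:   # strip all copies of the prime p
                n //= p
        p += 1
    if n > 1:                   # whatever remains is a single prime
        result *= n
    return result


print(rad(2**3 * 5**7 * 11**8))  # 110 = 2 * 5 * 11
print(rad(1 * 8 * 9))            # 6, smaller than c = 9
```

The second line shows a triple with c > rad(abc); the conjecture asserts that c cannot exceed rad(abc) by more than the factor permitted by the exponent 1 + ε and the constant C(ε).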
The abc-conjecture has many striking consequences. As an example we deduce a weaker version of Fermat’s Last Theorem. Let $x, y, z$ be positive coprime integers and $n \geq 4$. Assume that $x^n + y^n = z^n$. Apply the abc-conjecture with $a = x^n$, $b = y^n$, $c = z^n$. Then $\mathrm{rad}(abc) \leq xyz \leq z^3$. On the other hand, $z^n \leq C(\varepsilon)(z^3)^{1+\varepsilon}$. Taking $\varepsilon < \frac{1}{4}$ it follows that $z$ and $n$ are bounded.
Exercise 6.4. Assuming the abc-conjecture, prove that the equation $x^m + y^n = z^p$ has only finitely many solutions in positive integers $x, y, z, m, n, p$ with $x > 1$, $y > 1$, $z > 1$, $\gcd(x, y, z) = 1$ and $\frac{1}{m} + \frac{1}{n} + \frac{1}{p} < 1$.
Granville and Langevin proved independently that the abc-conjecture is equivalent to the following:
Granville-Langevin conjecture. Let $F(X, Y) \in \mathbb{Z}[X, Y]$ be a square-free binary form of degree $d \geq 3$. Then for every $\kappa > 2$ there is a constant $C(F, \kappa) > 0$ such that
$$\mathrm{rad}(F(x, y)) \geq C(F, \kappa) \max(|x|, |y|)^{d-\kappa}$$
for every $x, y \in \mathbb{Z}$ with $\gcd(x, y) = 1$, $F(x, y) \neq 0$.
Exercise 6.5. (i) Prove that the Granville-Langevin conjecture implies the abc-conjecture (the converse is also true but this is much more difficult to prove).
(ii) Prove that the Granville-Langevin conjecture implies Roth’s Theorem.
(iii) An integer $n \neq 0$ is called powerful if every prime in the prime factorization of $n$ occurs with exponent at least 2. In other words, $n$ is powerful if it can be expressed as $\pm a^2 b^3$ for certain positive integers $a, b$ not both equal to 1. Let $F(X, Y) \in \mathbb{Z}[X, Y]$ be a square-free binary form of degree at least 5. Assuming the Granville-Langevin conjecture, prove that there are only finitely many pairs of integers $x, y$ with $\gcd(x, y) = 1$ such that $F(x, y)$ is powerful.
(iv) Assuming the Granville-Langevin conjecture, prove the following. Let $f(X) \in \mathbb{Z}[X]$ be a polynomial of degree $d \geq 2$ with $d$ distinct zeros in $\mathbb{C}$. Then for every $\varepsilon > 0$ there is a constant $C'(f, \varepsilon) > 0$ such that $\mathrm{rad}(f(x)) \geq C'(f, \varepsilon)|x|^{d-1-\varepsilon}$ for all $x \in \mathbb{Z}$ with $f(x) \neq 0$.
Hint. Construct from $f$ a binary form $F$ of degree $d + 1$.
(v) Deduce the following conjecture of Schinzel: if $f$ is any square-free polynomial in $\mathbb{Z}[X]$ of degree $\geq 3$, then there are only finitely many integers $x$ such that $f(x)$ is powerful.
6.2
Thue’s approximation theorem
We intend to prove the following result of Thue:
Theorem 6.9. Let $\alpha$ be a real algebraic number of degree $d \geq 3$ and $\kappa > \frac{d}{2} + 1$. Then the inequality
(6.9) $|\xi - \alpha| \leq H(\xi)^{-\kappa}$ in $\xi \in \mathbb{Q}$
has only finitely many solutions.
Our basic tool will be Siegel’s Lemma, which we recall here. We consider systems of linear equations
(6.10) $a_{11} x_1 + \cdots + a_{1N} x_N = 0, \;\ldots,\; a_{M1} x_1 + \cdots + a_{MN} x_N = 0$
with coefficients $a_{ij}$ from the ring of integers $O_K$ of a number field $K$.
Proposition 6.10. Let $K$ be an algebraic number field of degree $d$. Then there is a constant $C(K) > 0$ for which the following holds. Let $M, N$ be integers with $N > dM > 0$, let $A$ be a real $\geq 1$, and suppose that
$$a_{ij} \in O_K, \quad \overline{|a_{ij}|} \leq A \quad \text{for } i = 1, \ldots, M,\ j = 1, \ldots, N.$$
Then (6.10) has a solution $x = (x_1, \ldots, x_N) \in \mathbb{Z}^N \setminus \{0\}$ such that
(6.11) $\max_{1 \leq j \leq N} |x_j| \leq (C(K) N A)^{dM/(N - dM)}$.
Proof. This is Corollary 4.24 from Chapter 4.
We introduce some notation. The norm of a polynomial $P = \sum_{i=0}^{D} p_i X^i \in \mathbb{C}[X]$ is given by $\|P\| := \sum_{i=0}^{D} |p_i|$. It is not difficult to check that
(6.12) $|P(\alpha)| \leq \|P\| \max(1, |\alpha|)^{\deg P}$ for $P \in \mathbb{C}[X]$, $\alpha \in \mathbb{C}$,
(6.13) $\|P + Q\| \leq \|P\| + \|Q\|$, $\|PQ\| \leq \|P\| \cdot \|Q\|$ for $P, Q \in \mathbb{C}[X]$.
From these properties it can be deduced that if $P \in \mathbb{C}[X]$, $\alpha \in \mathbb{C}$, then for the polynomial $\tilde{P}(X) := P(X + \alpha)$ we have
(6.14) $\|\tilde{P}\| \leq \|P\| (1 + |\alpha|)^{\deg P}$.
Exercise 6.6. Prove (6.12)–(6.14).
The k-th divided derivative of a polynomial $P \in \mathbb{C}[X]$ is defined by $P^{((k))} := P^{(k)}/k!$. Thus, if $P = \sum_{i=0}^{D} p_i X^i$, then
$$P^{((k))} = \sum_{i=0}^{D} \binom{i}{k} p_i X^{i-k} \quad \text{with } \binom{a}{b} := 0 \text{ if } b > a.$$
Notice that if $P \in \mathbb{Z}[X]$ then also $P^{((k))} \in \mathbb{Z}[X]$. Further, since each binomial coefficient $\binom{i}{k}$ can be estimated from above by $2^i \leq 2^{\deg P}$, we have
(6.15) $\|P^{((k))}\| \leq 2^{\deg P} \|P\|$.
Lastly, we have the product rule
(6.16) $(PQ)^{((k))} = \sum_{j=0}^{k} P^{((k-j))} Q^{((j))}$ for $P, Q \in \mathbb{C}[X]$.
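The norm and the divided derivatives are easy to compute for polynomials with integer coefficients. The sketch below represents a polynomial by its coefficient list [p_0, p_1, ..., p_D] (a representation chosen here for illustration; the helper names are assumptions, not notation from the notes) and checks (6.15) and the product rule (6.16) on one example.

```python
from math import comb


def norm(p):
    """Sum of the absolute values of the coefficients, i.e. ||P||."""
    return sum(abs(c) for c in p)


def divided_derivative(p, k):
    """Coefficient list of P^((k)) = P^(k)/k!; the empty list is the zero polynomial."""
    return [comb(i, k) * p[i] for i in range(k, len(p))]


def add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def mul(p, q):
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def trim(p):
    """Drop trailing zero coefficients so that equal polynomials compare equal."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


P = [1, -3, 0, 2]   # 1 - 3X + 2X^3
Q = [4, 0, 5]       # 4 + 5X^2
print(divided_derivative(P, 2))  # [0, 6], i.e. P''/2! = 6X

for k in range(7):
    rhs = []
    for j in range(k + 1):
        rhs = add(rhs, mul(divided_derivative(P, k - j), divided_derivative(Q, j)))
    assert trim(divided_derivative(mul(P, Q), k)) == trim(rhs)                # (6.16)
    assert norm(divided_derivative(P, k)) <= 2 ** (len(P) - 1) * norm(P)      # (6.15)
```

Because the binomial coefficients are integers, `divided_derivative` maps integer coefficient lists to integer coefficient lists, which is the remark that P ∈ Z[X] implies P^((k)) ∈ Z[X].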
A brief outline of the proof of Theorem 6.9. More details and explanation are given later. We follow the usual procedure to assume that (6.9) has infinitely many solutions, and to construct a non-zero integer of absolute value $< 1$.
The first step of the proof is to take, for any positive integer $r$, non-zero polynomials $P_r, Q_r \in \mathbb{Z}[X]$ of degree as small as possible such that $P_r - \alpha Q_r$ is divisible by $(X - \alpha)^r$. Using Siegel’s Lemma, one can prove the existence of such $P_r, Q_r$ of degree at most $m := [(\frac{1}{2} d + \varepsilon) r]$ for any $\varepsilon > 0$.
To see this, view the coefficients of $P_r, Q_r$ as a system of $2m + 2$ unknowns. The condition that $P_r - \alpha Q_r$ is divisible by $(X - \alpha)^r$ is equivalent to
$$P_r^{((k))}(\alpha) - \alpha Q_r^{((k))}(\alpha) = 0 \quad \text{for } k = 0, \ldots, r - 1,$$
and by expanding this, we get a system of $r$ linear equations with coefficients in $K := \mathbb{Q}(\alpha)$ in the $2m + 2$ unknown coefficients of $P_r, Q_r$. Now $[K : \mathbb{Q}] = d$ and $2m + 2 > dr$, hence by Proposition 6.10, this system has a non-trivial solution in integers.
In the second step, we take two solutions of (6.9), say $\xi_1 = x_1/y_1$, $\xi_2 = x_2/y_2$ with $x_i, y_i \in \mathbb{Z}$, $\gcd(x_i, y_i) = 1$, $y_i > 0$ for $i = 1, 2$, and consider the number
$$A_r := y_1^{[(\frac{1}{2}+\varepsilon)dr]}\, y_2\, \big(P_r(\xi_1) - \xi_2 Q_r(\xi_1)\big).$$
This number is clearly an integer. We want to show that we can choose the solutions $\xi_1, \xi_2$ such that $A_r \neq 0$ and $|A_r| < 1$, thus obtaining a contradiction.
To prove $|A_r| < 1$, we write $P_r - \alpha Q_r = V_r \cdot (X - \alpha)^r$ with $V_r \in \mathbb{C}[X]$ and obtain
$$A_r = y_1^{[(\frac{1}{2}+\varepsilon)dr]}\, y_2\, \big(V_r(\xi_1)(\xi_1 - \alpha)^r - (\xi_2 - \alpha) Q_r(\xi_1)\big).$$
To keep our discussion informal, we ignore $\varepsilon$, and just pretend that $|V_r(\xi_1)| \leq 1$, $|Q_r(\xi_1)| \leq 1$. Define $\lambda := \log H(\xi_2)/\log H(\xi_1)$, and write $H_1 := H(\xi_1)$. Notice that we can make both $H_1$ and $\lambda$ as large as we like, since we assumed that (6.9) has infinitely many solutions. Then $y_1 \leq H_1$, $y_2 \leq H_1^{\lambda}$, and thus,
$$|A_r| \lesssim H_1^{(dr/2)+\lambda} \big(H_1^{-\kappa r} + H_1^{-\lambda\kappa}\big) = H_1^{\lambda + r(-\kappa + d/2)} + H_1^{(1-\kappa)\lambda + dr/2}.$$
We want to make both exponents negative, to make $|A_r| < 1$. This amounts to
$$r > \frac{\lambda}{\kappa - \frac{1}{2} d}, \qquad r < \frac{2(\kappa - 1)\lambda}{d}.$$
The condition $\kappa > \frac{1}{2} d + 1$ is equivalent to [...]
[...] $b\alpha \in O_K$. By expanding the above expression and multiplying with $b^m$ we obtain
$$\sum_{i=0}^{m} \binom{i}{k} b^m \alpha^i p_i - \sum_{i=0}^{m} \binom{i}{k} b^m \alpha^{i+1} q_i = 0 \quad (k = 0, \ldots, r - 1),$$
which is a system of $r$ linear equations in $2m + 2 > (1 + 2\varepsilon)dr$ unknowns with coefficients in $O_K$. Thus, the number of unknowns is larger than $[K : \mathbb{Q}]$ times the number of equations, and the condition of Proposition 6.10 is satisfied. As a consequence, the above system has a non-trivial solution $(p_0, \ldots, p_m, q_0, \ldots, q_m) \in \mathbb{Z}^{2m+2}$ such that
$$\max\big(\max_i |p_i|, \max_i |q_i|\big) \leq (C(K)(2m + 2)A)^{\frac{dr}{2m+2-dr}} \leq (C(K)(2 + 2\varepsilon)drA)^{1/2\varepsilon},$$
where (with $0 \leq i \leq m$, $0 \leq k \leq r$),
$$A = \max\Big(\max_{k,i} \overline{\Big|\binom{i}{k} b^m \alpha^i\Big|},\ \max_{i,k} \overline{\Big|\binom{i}{k} b^m \alpha^{i+1}\Big|}\Big) \leq 2^m b^m \max(1, \overline{|\alpha|})^{m+1} \leq \big(2b \max(1, \overline{|\alpha|})\big)^{(2+2\varepsilon)dr}.$$
So using $C(K)(2 + 2\varepsilon) \leq (C(K)(2 + 2\varepsilon))^{(2+2\varepsilon)dr}$, $dr \leq 2^{(2+2\varepsilon)dr}$, we see that Lemma 6.11 holds with $C_1 = \big(4C(K)(2 + 2\varepsilon) b \cdot \max(1, \overline{|\alpha|})\big)^{d(1+\varepsilon^{-1})}$.
We now take two solutions $\xi_1, \xi_2$ of (6.9) and show that for every $r$ there is a not too large $k$ such that $P_r^{((k))}(\xi_1) - \xi_2 Q_r^{((k))}(\xi_1) \neq 0$. We start with a simple lemma.
Lemma 6.12. Let $F \in \mathbb{Q}[X]$, $\beta$ an algebraic number such that $(X - \beta)^m$ divides $F$, and $f \in \mathbb{Q}[X]$ the minimal polynomial of $\beta$. Then $f^m$ divides $F$.
Proof. By the unique factorization in $\mathbb{Q}[X]$, we can factor $F$ in $\mathbb{Q}[X]$ as $c f_1^{k_1} \cdots f_s^{k_s}$ where $c \in \mathbb{Q}$, $f_1, \ldots, f_s$ are distinct monic irreducible polynomials in $\mathbb{Q}[X]$ and $k_1, \ldots, k_s > 0$. Since two monic irreducible polynomials cannot have a common zero, $\beta$ is a zero of precisely one of $f_1, \ldots, f_s$, say $f_1$. Then $f_1 = f$ and $k_1 \geq m$ since $(X - \beta)^m$ divides $F$ and $f$ has no double zeros.
Lemma 6.13. Let $\xi_1, \xi_2$ be two rational numbers, and $r$ a positive integer. Then there is $k \leq d(2\varepsilon r + 1)$ such that $P_r^{((k))}(\xi_1) \neq \xi_2 Q_r^{((k))}(\xi_1)$.
Proof. The proof rests upon an analysis of the polynomial $F := P_r Q_r' - P_r' Q_r$.
We first show that $F$ is not identically 0. Assume the contrary. One of $P_r, Q_r$, say $Q_r$, is not identically 0. Then $(P_r/Q_r)' = 0$, hence $P_r/Q_r$ is identically equal to some constant $c \in \mathbb{Q}$. But then, $Q_r = (c - \alpha)^{-1}(P_r - \alpha Q_r)$ is divisible by $(X - \alpha)^r$. By Lemma 6.12, $Q_r$ is divisible by $f^r$, where $f$ is the minimal polynomial of $\alpha$. But this is impossible, since by our assumption $\varepsilon < \frac{1}{2}$ we have $r \deg f = rd > (\frac{1}{2} + \varepsilon)dr \geq \deg Q_r$.
We now prove our lemma. Assume there exists an integer $t \geq 1$ such that
$$P_r^{((k))}(\xi_1) = \xi_2 Q_r^{((k))}(\xi_1) \quad \text{for } k = 0, \ldots, t$$
(if not, we are done). By eliminating $\xi_2$ we obtain
$$P_r^{((k))}(\xi_1) Q_r^{((l))}(\xi_1) - P_r^{((l))}(\xi_1) Q_r^{((k))}(\xi_1) = 0 \quad \text{for } k, l \leq t.$$
For each $k \geq 0$, $F^{((k))}$ is a linear combination of $P_r^{((l))} Q_r^{((m))} - P_r^{((m))} Q_r^{((l))}$, $0 \leq l, m \leq k + 1$. Hence $F^{((k))}(\xi_1) = 0$ for $k \leq t - 1$, and therefore, $F$ is divisible by $(X - \xi_1)^t$.
By construction, $P_r - \alpha Q_r$ is divisible by $(X - \alpha)^r$, hence $P_r' - \alpha Q_r'$ is divisible by $(X - \alpha)^{r-1}$. So $F = Q_r'(P_r - \alpha Q_r) - Q_r(P_r' - \alpha Q_r')$ is divisible by $(X - \alpha)^{r-1}$. But $F \in \mathbb{Q}[X]$, hence by Lemma 6.12 it is divisible by $f^{r-1}$. So $F$ is in fact divisible by $(X - \xi_1)^t f^{r-1}$. Since
$$\deg F \leq \max(\deg P_r + \deg Q_r', \deg P_r' + \deg Q_r) \leq (1 + 2\varepsilon)dr - 1, \qquad \deg f = d,$$
it follows that $t \leq (1 + 2\varepsilon)dr - 1 - d(r - 1) = d(2\varepsilon r + 1) - 1$. This proves our lemma.
Take two solutions $\xi_1, \xi_2$ of (6.9). Write $\xi_i = x_i/y_i$ with $x_i, y_i \in \mathbb{Z}$, $\gcd(x_i, y_i) = 1$ and $y_i > 0$ for $i = 1, 2$. For integers $r > 0$, $k \geq 0$ consider the number
$$A_{rk} := y_1^{[(\frac{1}{2}+\varepsilon)dr]}\, y_2\, \big(P_r^{((k))}(\xi_1) - \xi_2 Q_r^{((k))}(\xi_1)\big).$$
This is clearly an integer, and by Lemma 6.13 there is $k < d(2\varepsilon r + 1)$ such that $A_{rk} \neq 0$. We proceed to estimate $|A_{rk}|$.
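Lemma 6.14. Let $\xi_1, \xi_2$ be two solutions of (6.9) with $H(\xi_2) \ge H(\xi_1)$ and put
\[ \lambda := \frac{\log H(\xi_2)}{\log H(\xi_1)}. \]
Then
\[ |A_{rk}| \le C_2^r \bigl(H(\xi_1)^u + H(\xi_1)^v\bigr) \]
where $C_2$ is an effectively computable number depending only on $\alpha, \kappa, \varepsilon$, and where
\[ u = \bigl(\tfrac12 d - \kappa + (2\kappa+1)\varepsilon d\bigr) r + d\kappa + \lambda, \qquad v = (\tfrac12+\varepsilon)dr - (\kappa-1)\lambda. \]

Proof. We write $H_1 := H(\xi_1)$, $H_2 := H(\xi_2)$. Thus, $H_2 = H_1^{\lambda}$. The polynomial $P_r^{((k))} - \alpha Q_r^{((k))}$ is divisible by $(X-\alpha)^{r-k}$. Hence
\[ P_r^{((k))} - \alpha Q_r^{((k))} = V \cdot (X-\alpha)^{r-k} \quad \text{with } V \in \mathbb{C}[X]. \]
This gives
\[ |A_{rk}| = \bigl|y_1^{[(\frac12+\varepsilon)dr]} y_2\bigr| \cdot \bigl|V(\xi_1)(\xi_1-\alpha)^{r-k} - (\xi_2-\alpha)Q_r^{((k))}(\xi_1)\bigr| \]
\[ \le \max\bigl(|V(\xi_1)|, |Q_r^{((k))}(\xi_1)|\bigr) \cdot \Bigl( \bigl|y_1^{(\frac12+\varepsilon)dr} y_2\bigr| \cdot |\xi_1-\alpha|^{r-k} + \bigl|y_1^{(\frac12+\varepsilon)dr} y_2\bigr| \cdot |\xi_2-\alpha| \Bigr). \]
We take for granted for the moment that $|V(\xi_1)| \le C_2^r$, $|Q_r^{((k))}(\xi_1)| \le C_2^r$. Notice that
\[ \bigl|y_1^{(\frac12+\varepsilon)dr} y_2\bigr| \le H_1^{(\frac12+\varepsilon)dr} H_2 = H_1^{(\frac12+\varepsilon)dr+\lambda}, \]
\[ |\xi_2-\alpha| \le H_2^{-\kappa} = H_1^{-\kappa\lambda}, \]
\[ |\xi_1-\alpha|^{r-k} \le (H_1^{-\kappa})^{r-k} \le (H_1^{-\kappa})^{r-2d\varepsilon r-d} = H_1^{-\kappa r(1-2d\varepsilon)+d\kappa}. \]
A combination of these estimates gives our lemma.

We finish with estimating $|Q_r^{((k))}(\xi_1)|$, $|V(\xi_1)|$. Here, we use (6.12)–(6.15). Define $\tilde P(X) := P_r^{((k))}(X+\alpha)$, $\tilde Q(X) := Q_r^{((k))}(X+\alpha)$, $\tilde V(X) := V(X+\alpha)$ and $\tilde\xi := \xi_1 - \alpha$. Then
\[ \|\tilde P\| \le \|P_r^{((k))}\| (1+|\alpha|)^{(\frac12+\varepsilon)dr} \le \bigl(2(1+|\alpha|)\bigr)^{(\frac12+\varepsilon)dr} \|P_r\| \le \bigl(2(1+|\alpha|)\bigr)^{(\frac12+\varepsilon)dr} C_1^r \]
and likewise $\|\tilde Q\| \le \bigl(2(1+|\alpha|)\bigr)^{(\frac12+\varepsilon)dr} C_1^r$. Since $\tilde P - \alpha\tilde Q = X^{r-k}\tilde V$ we have
\[ \|\tilde V\| \le \|\tilde P\| + |\alpha| \cdot \|\tilde Q\| \le (1+|\alpha|)\bigl(2(1+|\alpha|)\bigr)^{(\frac12+\varepsilon)dr} C_1^r. \]
Together with $|\tilde\xi| = |\xi_1-\alpha| \le 1$, this leads to
\[ |Q_r^{((k))}(\xi_1)| = |\tilde Q(\tilde\xi)| \le \|\tilde Q\| \le C_2^r, \qquad |V(\xi_1)| = |\tilde V(\tilde\xi)| \le \|\tilde V\| \le C_2^r, \]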
with $C_2 := 2^{(\frac12+\varepsilon)d}(1+|\alpha|)^{1+(\frac12+\varepsilon)d} C_1$. This completes our proof.

Proof of Theorem 6.9. We take solutions $\xi_1, \xi_2$ of (6.9) with $H(\xi_2) \ge H(\xi_1)$. Then by Lemma 6.13, for every integer $r \ge 1$, there is $k \le d(2\varepsilon r+1)$ such that $A_{rk}$ is a non-zero integer. We want to show that there are $r \ge 1$ and $\xi_1, \xi_2$ such that $|A_{rk}| < 1$ for all $k \le d(2\varepsilon r+1)$ and obtain a contradiction. To this end, we want to choose $r$ in such a way that $u \le -\varepsilon dr$, $v \le -\varepsilon dr$. Then $|A_{rk}| \le 2C_2^r H(\xi_1)^{-\varepsilon dr} < 1$ if we assume in addition that $H(\xi_1) > (2C_2)^{1/(d\varepsilon)}$. The conditions
\[ u = \bigl(\tfrac12 d - \kappa + (2\kappa+1)\varepsilon d\bigr) r + d\kappa + \lambda \le -\varepsilon dr, \qquad v = (\tfrac12+\varepsilon)dr - (\kappa-1)\lambda \le -\varepsilon dr \]
can be reworked into a lower and an upper bound for $r$ in terms of $\lambda$; in fact, provided $\kappa - \frac12 d - (2\kappa+2)\varepsilon d > 0$, these conditions are equivalent to
\[ \frac{d\kappa+\lambda}{\kappa - \frac12 d - (2\kappa+2)\varepsilon d} \le r \le \frac{(\kappa-1)\lambda}{(\frac12+2\varepsilon)d}. \tag{6.19} \]
We now have to use our condition $\kappa > \frac12 d + 1$. This condition implies that for every sufficiently small $\varepsilon > 0$ we have $\kappa - \frac12 d - (2\kappa+2)\varepsilon d > 0$ and
\[ \frac{\kappa-1}{(\frac12+2\varepsilon)d} - \frac{1}{\kappa - \frac12 d - (2\kappa+2)\varepsilon d} > 0; \]
we can choose such $\varepsilon$ depending only on $d, \kappa$. With this choice of $\varepsilon$, both the upper and lower bound in (6.19) are increasing functions of $\lambda$ and the upper bound increases faster than the lower bound. So there is $\lambda_0$ such that for every $\lambda \ge \lambda_0$, the upper and lower bound in (6.19) differ by at least $1$ and there is an integer $r \ge 1$ with (6.19).

We finish the proof. Assume that (6.9) has infinitely many solutions. So (6.9) has solutions $\xi_1, \xi_2$ such that
\[ H(\xi_1) > (2C_2)^{1/(d\varepsilon)}, \qquad \frac{\log H(\xi_2)}{\log H(\xi_1)} =: \lambda \ge \lambda_0. \tag{6.20} \]
Then there is an integer $r \ge 1$ with (6.19), hence with $u \le -\varepsilon dr$, $v \le -\varepsilon dr$. Together with Lemma 6.14 this implies $|A_{rk}| \le C_2^r \cdot 2H(\xi_1)^{-\varepsilon dr} < 1$ for $k \le d(2\varepsilon r+1)$. On the other hand, by Lemma 6.13, there is $k \le d(2\varepsilon r+1)$ such that $A_{rk}$ is a nonzero integer. This gives the contradiction we want. So (6.9) cannot have infinitely many solutions.

Remark. To obtain a contradiction, we did not need the assumption that (6.9) has infinitely many solutions, but merely that there are solutions $\xi_1, \xi_2$ of (6.9) with (6.20). In other words, solutions $\xi_1, \xi_2$ of (6.9) with (6.20) cannot exist. The constants $C_2$ and $\lambda_0$ are effectively computable. So in fact we can prove the following sharpening of Theorem 6.9:

Theorem 6.15. Let $\alpha$ be an algebraic number of degree $d$ and $\kappa > \frac12 d + 1$. There are effectively computable constants $C, \lambda_0$ depending on $\alpha, \kappa$, such that if $\xi_1$ is a solution of
\[ |\xi - \alpha| \le H(\xi)^{-\kappa} \quad \text{in } \xi \in \mathbb{Q} \tag{6.9} \]
with $H(\xi_1) > C$, then for any other solution $\xi$ of (6.9) we have $H(\xi) \le H(\xi_1)^{\lambda_0}$.

It should be noted that Theorem 6.15 would give an effective proof of Thue's Theorem in case we were extremely lucky and knew a solution $\xi_1$ of (6.9) with $H(\xi_1) > C$. However, to find such a solution seems quite hopeless, since the constant $C$ is very large. It is very likely that such a solution $\xi_1$ does not even exist. However, there are variations on Thue's method, which work only for special algebraic numbers $\alpha$ of the shape $\sqrt[d]{a}$ with $a \in \mathbb{Q}$, where the constant $C$ is much smaller and where a solution $\xi_1$ of (6.9) with $H(\xi_1) > C$ is known. For such $\alpha$ one can derive very strong effective approximation results, for instance Bennett's estimate (6.8) mentioned in Section 6.1. On the other hand Theorem 6.15 can be used to estimate the number of solutions of (6.9). This is worked out in the exercise below.

Exercise 6.7. (i) Let $\xi_1, \xi_2$ be distinct rational numbers. Prove that $|\xi_1 - \xi_2| \ge \bigl(H(\xi_1)H(\xi_2)\bigr)^{-1}$.
(ii) Let $\alpha$ be a real number, and $\kappa > 2$, and consider the inequality
\[ |\xi - \alpha| \le H(\xi)^{-\kappa} \quad \text{in } \xi \in \mathbb{Q} \text{ with } \xi > \alpha. \tag{6.21} \]
Prove that if $\xi_1, \xi_2$ are two distinct solutions of (6.21) with $H(\xi_2) \ge H(\xi_1)$, then $H(\xi_2) \ge H(\xi_1)^{\kappa-1}$. (So there are large gaps between the solutions of (6.21); we call such an inequality a gap principle.) Hint. Estimate from above $|\xi_1 - \xi_2|$.
(iii) Let $A \ge 2$, $c > 1$. Prove that the number of solutions $\xi$ of (6.21) with $A \le H(\xi) < A^c$ is bounded above by $1 + \frac{\log c}{\log(\kappa-1)}$.
(iv) Let $\alpha$ be a real algebraic number of degree $d \ge 3$ and $\kappa > \frac{d}{2} + 1$. Compute an explicit upper bound for the number of solutions of (6.21) in terms of the constants $C$ and $\lambda_0$ from Theorem 6.15.
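Part (i) of the exercise can be sanity-checked by machine before one proves it. The following Python sketch is not part of the original notes; the helper names `height`, `rationals_up_to` and `check_separation` are our own. It uses exact rational arithmetic to test the inequality $|\xi_1 - \xi_2| \ge (H(\xi_1)H(\xi_2))^{-1}$ on all pairs of distinct rationals with bounded numerator and denominator. It is only an experiment, not a proof.

```python
from fractions import Fraction
from itertools import combinations
from math import gcd


def height(xi: Fraction) -> int:
    """H(xi) = max(|x|, |y|) for xi = x/y in lowest terms."""
    return max(abs(xi.numerator), abs(xi.denominator))


def rationals_up_to(bound: int) -> list:
    """All rationals x/y in lowest terms with |x| <= bound and 1 <= y <= bound."""
    return [
        Fraction(x, y)
        for y in range(1, bound + 1)
        for x in range(-bound, bound + 1)
        if gcd(x, y) == 1
    ]


def check_separation(bound: int) -> bool:
    """Test |xi1 - xi2| >= 1 / (H(xi1) * H(xi2)) for all distinct pairs."""
    points = rationals_up_to(bound)
    return all(
        abs(a - b) >= Fraction(1, height(a) * height(b))
        for a, b in combinations(points, 2)
    )


print(check_separation(12))
```

The check succeeds because $|x_1/y_1 - x_2/y_2| = |x_1y_2 - x_2y_1|/(y_1y_2)$ has a non-zero integer in the numerator, which is the idea behind the requested proof.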
Chapter 7 The Subspace Theorem Literature: W.M. Schmidt, Diophantine approximation, Lecture Notes in Mathematics 785, Springer Verlag 1980, Chap.IV,VI,VII
The Subspace Theorem is a higher dimensional generalization of Roth’s Theorem on the approximation of algebraic numbers by rational numbers. We explain the Subspace Theorem, give some applications to simultaneous Diophantine approximation, and then an application to higher dimensional generalizations of Thue equations, the so-called norm form equations.
7.1
The Subspace Theorem and some applications
In the formulation of the Subspace Theorem, we need some notions from linear algebra, which we recall below. Let $n$ be an integer $\ge 1$ and $r \le n$. We say that linear forms $L_1 = \sum_{j=1}^n \alpha_{1j}X_j, \ldots, L_r = \sum_{j=1}^n \alpha_{rj}X_j$ with coefficients in $\mathbb{C}$ are linearly dependent if there are $c_1, \ldots, c_r \in \mathbb{C}$, not all $0$, such that $c_1L_1 + \cdots + c_rL_r \equiv 0$. Otherwise, $L_1, \ldots, L_r$ are called linearly independent. If $r = n$, then $L_1, \ldots, L_n$ are linearly independent if and only if their coefficient determinant $\det(L_1, \ldots, L_n) = \det(\alpha_{ij})_{1 \le i,j \le n} \ne 0$. A linear subspace $T$ of $\mathbb{Q}^n$ of dimension $r$ can be described as
\[ T = \Bigl\{ \sum_{i=1}^r z_i a_i : z_1, \ldots, z_r \in \mathbb{Q} \Bigr\}, \]
where $a_1, \ldots, a_r$ are linearly independent vectors from $\mathbb{Q}^n$, or alternatively as
\[ T = \{x \in \mathbb{Q}^n : L_1(x) = 0, \ldots, L_{n-r}(x) = 0\} \]
where $L_1, \ldots, L_{n-r}$ are linearly independent linear forms in $X_1, \ldots, X_n$ with coefficients from $\mathbb{Q}$. The norm of $x = (x_1, \ldots, x_n) \in \mathbb{Z}^n$ is given by $\|x\| := \max(|x_1|, \ldots, |x_n|)$. In what follows, $\overline{\mathbb{Q}}$ is the set of algebraic numbers in $\mathbb{C}$.

Theorem 7.1. (Subspace Theorem, W.M. Schmidt, 1972). Let $n \ge 2$, let
\[ L_i = \alpha_{i1}X_1 + \cdots + \alpha_{in}X_n \quad (i = 1, \ldots, n) \]
be $n$ linearly independent linear forms with coefficients in $\overline{\mathbb{Q}}$ and let $C > 0$, $\delta > 0$. Then the set of solutions of the inequality
\[ |L_1(x) \cdots L_n(x)| \le C\|x\|^{-\delta} \quad \text{in } x \in \mathbb{Z}^n \tag{7.1} \]
is contained in a union $T_1 \cup \cdots \cup T_t$ of finitely many proper linear subspaces of $\mathbb{Q}^n$.

Remark. The proof of the Subspace Theorem is ineffective, i.e., it does not enable one to determine the subspaces. There is however a quantitative version of the Subspace Theorem which gives an explicit upper bound for the number of subspaces. This is an important tool for estimating the number of solutions of various types of Diophantine equations.

We show that the Subspace Theorem implies Roth's Theorem. Recall that the height of $\xi \in \mathbb{Q}$ is $H(\xi) = \max(|x|, |y|)$, where $\xi = x/y$, $x, y \in \mathbb{Z}$, $\gcd(x, y) = 1$.

Corollary 7.2. Let $\alpha \in \overline{\mathbb{Q}}$ and $C > 0$, $\kappa > 2$. Then the inequality
\[ |\xi - \alpha| \le C \cdot H(\xi)^{-\kappa} \quad \text{in } \xi \in \mathbb{Q} \tag{7.2} \]
has only finitely many solutions.
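To get a feeling for the role of the exponent in (7.2), one can search for solutions by brute force. The Python sketch below is an illustration only and not part of the original notes: the function `solutions`, the choice $\alpha = \sqrt{2}$, $C = 1$ and the search bounds are our own assumptions, and floating-point arithmetic is used, which is adequate only for small denominators. With $\kappa = 2$ the convergents of $\sqrt{2}$ keep producing solutions as the bound grows, while with $\kappa = 3$ the search finds a single one.

```python
from fractions import Fraction
from math import gcd, sqrt


def height(xi: Fraction) -> int:
    """H(xi) = max(|x|, |y|) for xi = x/y in lowest terms."""
    return max(abs(xi.numerator), abs(xi.denominator))


def solutions(alpha: float, kappa: float, bound: int, c: float = 1.0) -> list:
    """Reduced fractions x/y with 1 <= y <= bound and |x/y - alpha| <= c * H(x/y)**(-kappa)."""
    found = []
    for y in range(1, bound + 1):
        # Since H >= 1, any solution satisfies |x/y - alpha| <= c,
        # so only numerators in this window need to be tried.
        lo = int((alpha - c) * y) - 1
        hi = int((alpha + c) * y) + 1
        for x in range(lo, hi + 1):
            if gcd(x, y) != 1:
                continue
            xi = Fraction(x, y)
            if abs(x / y - alpha) <= c * height(xi) ** (-kappa):
                found.append(xi)
    return found


print(solutions(sqrt(2), 2, 50))   # includes the convergents 1, 3/2, 7/5, 17/12, 41/29
print(solutions(sqrt(2), 3, 200))  # [Fraction(1, 1)]
```

Such a search cannot replace Corollary 7.2, since it says nothing about denominators beyond the chosen bound.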
Proof. Let ξ = x/y be a solution of (7.2), with x, y ∈ Z, gcd(x, y) = 1. Write κ = 2 + δ with δ > 0. By multiplying (7.2) with y 2 we obtain |y(x − αy)| 6 Cy 2 max(|x|, |y|)−2−δ 6 C · max(|x|, |y|)−δ . Since the linear forms Y and X − αY are linearly independent, this is an inequality to which the Subspace Theorem is applicable. It follows that the pairs of integers (x, y) ∈ Z2 with gcd(x, y) = 1 such that ξ = x/y is a solution of (7.2) lie in a union of finitely many proper, i.e., one-dimensional linear subspaces of Q2 . But a given one-dimensional subspace of Q2 consists of all points of the shape λ(x0 , y0 ) with λ ∈ Q where (x0 , y0 ) ∈ Z2 , thus the rational number ξ is uniquely determined by the subspace. This proves Roth’s Theorem. Exercise 7.1. Let Li = αi1 X1 + · · · + αin Xn (i = 1, . . . , m, m < n) be linearly independent linear forms with coefficients in Q and let C > 0, δ > 0. Deduce from the Subspace Theorem, that the set of solutions x ∈ Zn of the inequality |L1 (x) · · · Lm (x)| 6 Ckxkm−n−δ lie in a union of finitely many proper linear subspaces of Qn . The Subspace Theorem states that the set of solutions of (7.1) is contained in a finite union of proper linear subspaces of Qn , but one may wonder whether (7.1) has only finitely many solutions. For instance, it may be that there is a non-zero x0 ∈ Zn with L1 (x0 ) = 0. Then for every λ ∈ Z, the point λx0 is a solution to (7.1), and this gives infinitely many solutions to (7.1). To avoid such a construction, let us consider (7.3)
0 < |L1 (x) · · · Ln (x)| 6 C · kxk−δ in x ∈ Zn .
In the case n = 2 the number of solutions is indeed finite. Lemma 7.3. Let Li = αi1 X + αi2 Y (i = 1, 2) be two linearly independent linear forms with coefficients in Q and let C > 0, δ > 0. Then the inequality (7.4)
0 < |L1 (x)L2 (x)| 6 Ckxk−δ in x = (x, y) ∈ Z2
has only finitely many solutions. 109
Proof. By the Subspace Theorem, the solutions of (7.4) lie in finitely many one-dimensional linear subspaces of $\mathbb{Q}^2$. So we have to prove that each of these subspaces contains only finitely many solutions. Let $T$ be one of these subspaces. Then $T = \{\lambda x_0 : \lambda \in \mathbb{Q}\}$ where we may choose $x_0 = (x_0, y_0) \in \mathbb{Z}^2$ with $\gcd(x_0, y_0) = 1$. Note that $\lambda(x_0, y_0) \in \mathbb{Z}^2$ if and only if $\lambda \in \mathbb{Z}$. If $L_1(x_0)L_2(x_0) = 0$ then (7.4) has no solutions in $T$. Suppose that $L_1(x_0)L_2(x_0) \ne 0$. Then $x = \lambda x_0$ is a solution of (7.4) if and only if
\[ 0 < \lambda^2 |L_1(x_0)L_2(x_0)| \le C \cdot |\lambda|^{-\delta}\|x_0\|^{-\delta}, \]
i.e., if $\lambda \ne 0$ and $|\lambda|^{2+\delta} \le C\|x_0\|^{-\delta}|L_1(x_0)L_2(x_0)|^{-1}$. This shows that $|\lambda|$ is bounded, hence that $T$ contains only finitely many solutions of (7.4).

In case that $n \ge 3$, (7.3) may very well have infinitely many solutions. We illustrate this with an example.

Example. Let $0 < \delta < 1$ and consider the inequality
\[ 0 < |(x_1 + \sqrt2 x_2 + \sqrt3 x_3)(x_1 - \sqrt2 x_2 + \sqrt3 x_3)(x_1 - \sqrt2 x_2 - \sqrt3 x_3)| \le \|x\|^{-\delta} \tag{7.5} \]
to be solved in $x = (x_1, x_2, x_3) \in \mathbb{Z}^3$. Notice that the three linear forms on the left-hand side are linearly independent. Consider the triples of integers $x = (x_1, x_2, x_3) \in \mathbb{Z}^3$ with $x_3 = 0$, $x_1x_2 \ne 0$. For these points, $\|x\| = \max(|x_1|, |x_2|)$. By Dirichlet's Theorem, the inequality
\[ \Bigl|\sqrt2 - \frac{x_1}{x_2}\Bigr| \le |x_2|^{-2} \]
has infinitely many solutions $(x_1, x_2) \in \mathbb{Z}^2$ with $x_2 \ne 0$. For these solutions, $\|x\|$ has the same order of magnitude as $|x_2|$. Indeed,
\[ |x_1/x_2| \le |x_2|^{-2} + \sqrt2 \le 1 + \sqrt2, \]
and so $\|x\| = \max(|x_1|, |x_2|) \le (1+\sqrt2)|x_2|$. So for the points under consideration,
\[ 0 < |(x_1 + \sqrt2 x_2 + \sqrt3 x_3)(x_1 - \sqrt2 x_2 + \sqrt3 x_3)(x_1 - \sqrt2 x_2 - \sqrt3 x_3)| = |(x_1 + \sqrt2 x_2)(x_1 - \sqrt2 x_2)^2| \]
\[ \le (1+\sqrt2)\|x\| \cdot (|x_2|^{-1})^2 \le (1+\sqrt2)^3\|x\|^{-1} \le \|x\|^{-\delta}, \]
provided $\|x\|$ is sufficiently large. It follows that (7.5) has infinitely many solutions $x$ in the subspace $x_3 = 0$.

Exercise 7.2. (i) Prove that (7.5) has infinitely many solutions in the spaces $x_1 = 0$ and $x_2 = 0$.
(ii) Prove that (7.5) has only finitely many solutions with $x_1x_2x_3 \ne 0$.
Hint. The solutions of (7.5) lie in finitely many proper linear subspaces of $\mathbb{Q}^3$. Let $T$ be one of these subspaces. Let $ax_1 + bx_2 + cx_3 = 0$ be a non-trivial equation vanishing identically on $T$, with at least one of $a, b, c \ne 0$. Since we only have to consider spaces $T$ containing solutions with $x_1x_2x_3 \ne 0$, we may assume that at most one among $a, b, c$ is zero. Given a solution $(x_1, x_2, x_3)$ of (7.5) in $T \cap \mathbb{Z}^3$, express one of the variables $x_1, x_2, x_3$ as a linear combination of the two others and substitute this in (7.5). What results is a product of three linear forms in two variables. You may handle this using an extension of the Subspace Theorem, treated below.

Let $L_1, \ldots, L_r$ be linear forms with coefficients in $\mathbb{C}$ in the variables $X_1, \ldots, X_n$, where $r \ge n$. We say that $L_1, \ldots, L_r$ (or more correctly the hyperplanes $L_1 = 0, \ldots, L_r = 0$ defined by them) are in general position if each $n$-tuple of linear forms among $L_1, \ldots, L_r$ is linearly independent.

Theorem 7.4. Let $L_i = \alpha_{i1}X_1 + \cdots + \alpha_{in}X_n$
(i = 1, . . . , r, r > n)
be r linear forms with coefficients in Q in general position and let C > 0, δ > 0. Then the set of solutions of the inequality (7.6)
|L1 (x) · · · Lr (x)| 6 C · kxkr−n−δ in x ∈ Zn
is contained in a union T1 ∪ · · · ∪ Tt of finitely many proper linear subspaces of Qn . This can be deduced by combining the Subspace Theorem with the following lemma. Lemma 7.5. Let M1 , . . . , Mn be linearly independent linear forms in X1 , . . . , Xn with complex coefficients. Then there is a constant C > 0 such that kxk 6 C max |M1 (x)|, . . . , |Mn (x)| for all x ∈ Cn . 111
Proof. Since the linear forms $M_1, \ldots, M_n$ are linearly independent, they span the complex vector space of all linear forms in $X_1, \ldots, X_n$ with complex coefficients. So we can express $X_1, \ldots, X_n$ as linear combinations of $M_1, \ldots, M_n$, i.e.,
\[ X_i = \sum_{j=1}^n \beta_{ij}M_j \quad \text{with } \beta_{ij} \in \mathbb{C} \ (i = 1, \ldots, n). \]
Take $x = (x_1, \ldots, x_n) \in \mathbb{C}^n$ and put $M := \max_{1 \le i \le n} |M_i(x)|$. Then
\[ \max_{1 \le i \le n} |x_i| \le \max_{1 \le i \le n} \sum_{j=1}^n |\beta_{ij}| \cdot |M_j(x)| \le C \cdot M \quad \text{with } C := \max_{1 \le i \le n} \sum_{j=1}^n |\beta_{ij}|. \]
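The constant $C$ in this proof is explicit: the matrix $(\beta_{ij})$ is the inverse of the coefficient matrix of $M_1, \ldots, M_n$, and $C$ is its largest absolute row sum. The Python sketch below is our own illustration, not part of the notes. It assumes rational coefficients so that exact arithmetic can be used, although the lemma allows complex ones, and the names `invert`, `lemma_constant` and `evaluate` are hypothetical helpers.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a square matrix with rational entries by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment each row with the corresponding row of the identity matrix.
    aug = [
        [Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the linear forms are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def lemma_constant(forms):
    """C = max_i sum_j |beta_ij|, where (beta_ij) inverts the coefficient matrix."""
    return max(sum(abs(b) for b in row) for row in invert(forms))


def evaluate(forms, x):
    """Values M_1(x), ..., M_n(x) of the linear forms at the point x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in forms]


forms = [[1, 1], [1, -1]]  # M1 = X1 + X2, M2 = X1 - X2
C = lemma_constant(forms)
x = (3, -5)
print(C, max(abs(t) for t in x) <= C * max(abs(v) for v in evaluate(forms, x)))  # 1 True
```

For these two forms the inverse matrix has rows $(\frac12, \frac12)$ and $(\frac12, -\frac12)$, so $C = 1$.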
Proof of Theorem 7.4. We partition the solutions $x$ of (7.6) into a finite number of subsets according to the ordering of the numbers $|L_1(x)|, \ldots, |L_r(x)|$, and show that each of these subsets lies in at most finitely many proper linear subspaces of $\mathbb{Q}^n$. Consider the solutions $x \in \mathbb{Z}^n$ from one of these subsets, say for which $|L_1(x)| \le \cdots \le |L_r(x)|$. By Lemma 7.5, for $i = n+1, \ldots, r$, since $L_1, \ldots, L_{n-1}, L_i$ are linearly independent, there is a constant $C_i$ such that for all solutions $x$ under consideration,
\[ \|x\| \le C_i \max(|L_1(x)|, \ldots, |L_{n-1}(x)|, |L_i(x)|) = C_i |L_i(x)|. \]
Together with (7.6) this implies
\[ |L_1(x) \cdots L_n(x)| \le C\|x\|^{r-n-\delta} \prod_{i=n+1}^r |L_i(x)|^{-1} \le C \cdot (C_{n+1} \cdots C_r)\|x\|^{-\delta}. \]
So the solutions $x$ under consideration lie in at most finitely many proper linear subspaces of $\mathbb{Q}^n$.

The next result is a slight variation on a theorem of Dirichlet.
Lemma 7.6. Let $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$ be numbers that are linearly independent over $\mathbb{Q}$. Then there is $C > 0$ such that the inequality
\[ |\alpha_1x_1 + \cdots + \alpha_nx_n| \le C\|x\|^{1-n} \quad \text{in } x = (x_1, \ldots, x_n) \in \mathbb{Z}^n \tag{7.7} \]
has infinitely many solutions.

Proof. Let $\beta_i = -\alpha_i/\alpha_n$ $(i = 1, \ldots, n-1)$. For instance from Minkowski's convex body theorem (see Chapter 2), one deduces that there are infinitely many $x = (x_1, \ldots, x_n) \in \mathbb{Z}^n$ such that
\[ |x_n - \beta_1x_1 - \cdots - \beta_{n-1}x_{n-1}| \le \max(|x_1|, \ldots, |x_{n-1}|)^{1-n}. \tag{7.8} \]
Given a solution of this inequality, it follows easily that
\[ |x_n| \le 1 + \sum_{i=1}^{n-1} |\beta_i| \cdot |x_i| \le C' \max_{1 \le i \le n-1} |x_i|, \]
say, hence $\|x\| \le C' \max_{1 \le i \le n-1} |x_i|$. By inserting this into (7.8) and multiplying with $|\alpha_n|$ we get (7.7) with $C = |\alpha_n|C'^{\,n-1}$.

In the case that the coefficients $\alpha_i$ are algebraic, we have the following counterpart.

Theorem 7.7. Let $\alpha_1, \ldots, \alpha_n \in \overline{\mathbb{Q}}$ and $C > 0$, $\delta > 0$. Then the inequality
\[ 0 < |\alpha_1x_1 + \cdots + \alpha_nx_n| \le C\|x\|^{1-n-\delta} \quad \text{in } x = (x_1, \ldots, x_n) \in \mathbb{Z}^n \tag{7.9} \]
has only finitely many solutions.

Remark. For $n = 2$ this implies Roth's Theorem.

Proof. We proceed by induction on $n$. For $n = 1$ the assertion is obvious. (Here we use our assumption $\alpha_1x_1 \ne 0$.) Let $n > 1$ and suppose Theorem 7.7 is true for linear forms in fewer than $n$ variables. We apply the Subspace Theorem. We may assume that at least one of the coefficients $\alpha_1, \ldots, \alpha_n$ is non-zero, otherwise there are no solutions. Suppose that $\alpha_1 \ne 0$. Then (7.9) implies
\[ |(\alpha_1x_1 + \cdots + \alpha_nx_n)x_2 \cdots x_n| \le C\|x\|^{-\delta} \]
and by the Subspace Theorem, the solutions of the latter lie in a union of finitely many proper linear subspaces $T_1, \ldots, T_t$ of $\mathbb{Q}^n$. We consider only solutions with $\alpha_1x_1 + \cdots + \alpha_nx_n \ne 0$. Therefore, without loss of generality we may assume that $\alpha_1x_1 + \cdots + \alpha_nx_n$ is not identically $0$ on any of the spaces $T_1, \ldots, T_t$. Consider the solutions of (7.9) in $T_i$. Choose a non-trivial linear form vanishing identically on $T_i$, $a_1x_1 + \cdots + a_nx_n = 0$. Suppose for instance that $a_n \ne 0$. Then $x_n$ can be expressed as a linear combination of $x_1, \ldots, x_{n-1}$. By substituting this into (7.9) we obtain an inequality
\[ 0 < |\beta_1x_1 + \cdots + \beta_{n-1}x_{n-1}| \le C\|x\|^{1-n-\delta} \le C \Bigl(\max_{1 \le i \le n-1} |x_i|\Bigr)^{2-n-\delta}. \]
By the induction hypothesis, the latter inequality has only finitely many solutions $(x_1, \ldots, x_{n-1})$. So $T_i$ contains only finitely many solutions $x$ of (7.9). Applying this to $T_1, \ldots, T_t$ we obtain that (7.9) has altogether only finitely many solutions.

Instead of approximating a given algebraic number $\alpha$ by rationals, we can also consider the approximation of $\alpha$ by algebraic numbers of degree at most $d$. Recall that the primitive minimal polynomial of $\xi \in \overline{\mathbb{Q}}$ is the polynomial $F := a_0X^d + a_1X^{d-1} + \cdots + a_d \in \mathbb{Z}[X]$ such that $F(\xi) = 0$, $F$ is irreducible, and $a_0 > 0$, $\gcd(a_0, \ldots, a_d) = 1$. Then the height of $\xi$ is $H(\xi) := \max(|a_0|, \ldots, |a_d|)$. We consider
\[ |\xi - \alpha| \le C \cdot H(\xi)^{-\kappa} \quad \text{in } \xi \in \overline{\mathbb{Q}} \text{ with } \deg \xi \le d. \tag{7.10} \]
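Theorem 7.8. For every $C > 0$, $\kappa > d+1$, (7.10) has only finitely many solutions.

Proof. Write $\kappa = d + 1 + \delta$ with $\delta > 0$. Let $\xi$ be a solution of (7.10). Let $F = x_0 + x_1X + \cdots + x_dX^d$ be the primitive minimal polynomial of $\xi$. Then $x := (x_0, \ldots, x_d) \in \mathbb{Z}^{d+1}$ and $H(\xi) = \|x\|$. We want to show that there are only finitely many possibilities for $F$, and to this end, we want to estimate from above $|F(\alpha)| = |\sum_{i=0}^d x_i\alpha^i|$ and apply Theorem 7.7. Since $F(\xi) = 0$ we have
\[ |F(\alpha)| = \Bigl| \int_0^1 F'(\xi + t(\alpha - \xi)) \cdot (\alpha - \xi)\,dt \Bigr| \le |\alpha - \xi| \cdot \max_{0 \le t \le 1} |F'(\xi + t(\alpha - \xi))|. \]
Using $|\xi + t(\alpha - \xi)| \le |\alpha| + |\alpha - \xi| \le |\alpha| + C$ for $0 \le t \le 1$, we obtain
\[ |F'(\xi + t(\alpha - \xi))| \le \sum_{i=1}^d |x_i| \cdot i(|\alpha| + C)^{i-1} \le C'\|x\|, \]
say. Hence $|F(\alpha)| \le |\xi - \alpha| \cdot C'\|x\|$. There are only finitely many $\xi$ which are conjugate to $\alpha$. For the remaining solutions $\xi$ of (7.10) we have $F(\alpha) \ne 0$, and so
\[ 0 < |x_0 + x_1\alpha + \cdots + x_d\alpha^d| \le CC'\|x\|^{1-\kappa} = CC'\|x\|^{-d-\delta}. \]
By Theorem 7.7, applied with $n = d+1$ and the coefficients $1, \alpha, \ldots, \alpha^d$, this inequality has only finitely many solutions $x \in \mathbb{Z}^{d+1}$. Hence there are only finitely many possibilities for $F$, and thus for $\xi$.

Exercise 7.3. Let $C > 0$, $\delta > 0$, and let $\alpha_1, \ldots, \alpha_n$ be real algebraic numbers such that
\[ 1, \alpha_1, \ldots, \alpha_n \text{ are linearly independent over } \mathbb{Q}. \tag{7.11} \]
Consider the system of inequalities
\[ |x_1 - \alpha_1x_{n+1}| \le C\|x\|^{-\frac1n-\delta}, \ \ldots, \ |x_n - \alpha_nx_{n+1}| \le C\|x\|^{-\frac1n-\delta} \tag{7.12} \]
to be solved simultaneously in $x = (x_1, \ldots, x_{n+1}) \in \mathbb{Z}^{n+1}$. Prove that (7.12) has only finitely many solutions.
Hint. First apply the Subspace Theorem. Then show that a subspace $T$, given by an equation $a_1x_1 + \cdots + a_{n+1}x_{n+1} = 0$ with $a_i \in \mathbb{Q}$, say, contains only finitely many solutions. There is no obvious way to do this with the Subspace Theorem, so you have to prove this directly without using the Subspace Theorem. Here you have to use assumption (7.11).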
7.2
Norm form equations
Recall that if $F(X, Y)$ is an irreducible binary form in $\mathbb{Q}[X, Y]$ of degree $d$ with coefficient of $X^d$ equal to $1$, say, then
\[ F(X, Y) = \prod_{i=1}^d (X - \alpha^{(i)}Y) \]
where $\alpha^{(1)}, \ldots, \alpha^{(d)}$ are the conjugates of an algebraic number $\alpha$. If $K = \mathbb{Q}(\alpha)$, and $\sigma_1, \ldots, \sigma_d$ are the embeddings of $K$ in $\mathbb{C}$, with $\sigma_i(\alpha) = \alpha^{(i)}$, then
\[ F(X, Y) = \prod_{i=1}^d (X - \sigma_i(\alpha)Y) = N_{K/\mathbb{Q}}(X - \alpha Y). \]
That is, $F$ is a norm form in two variables. The Thue equation $N_{K/\mathbb{Q}}(x - \alpha y) = c$ in $x, y \in \mathbb{Z}$ has only finitely many solutions if $[K : \mathbb{Q}] \ge 3$ (by Thue's Theorem) or if $K$ is an imaginary quadratic field (then the solutions represent points with integer coordinates on an ellipse). It may have infinitely many solutions if $K$ is real quadratic. For instance if $K = \mathbb{Q}(\sqrt{d})$ with $d$ a positive, non-square integer, then the Pell equation $x^2 - dy^2 = N_{K/\mathbb{Q}}(x - \sqrt{d}\,y) = 1$ has infinitely many solutions.

We consider a generalization of the Thue equation, involving norm forms of an arbitrary number of variables. Let $K = \mathbb{Q}(\theta)$ be an algebraic number field of degree $d$. Then the monic minimal polynomial $f_\theta$ of $\theta$ can be expressed as $f_\theta = \prod_{i=1}^d (X - \theta^{(i)})$, where $\theta^{(1)}, \ldots, \theta^{(d)} \in \mathbb{C}$ are the conjugates of $\theta$. The embeddings of $K$ in $\mathbb{C}$ are given by $\sigma_i(\theta) = \theta^{(i)}$ for $i = 1, \ldots, d$. Define $G := \mathbb{Q}(\theta^{(1)}, \ldots, \theta^{(d)})$. Then $G$ is a normal number field. Denote by $\mathrm{Gal}(G/\mathbb{Q})$ the Galois group, i.e., the group of automorphisms of $G$. Recall that each $\tau \in \mathrm{Gal}(G/\mathbb{Q})$ permutes $\theta^{(1)}, \ldots, \theta^{(d)}$. On the other hand $\tau$ is uniquely determined by its images on $\theta^{(1)}, \ldots, \theta^{(d)}$. Hence each $\tau \in \mathrm{Gal}(G/\mathbb{Q})$ may be identified with a permutation of $\theta^{(1)}, \ldots, \theta^{(d)}$, and thus $\mathrm{Gal}(G/\mathbb{Q})$ is isomorphic to a subgroup of $S_d$ (that is, the permutation group on $d$ elements).

Now suppose that $2 \le n \le d$ and let $\alpha_1, \ldots, \alpha_n$ be elements of $K$ which are linearly independent over $\mathbb{Q}$, that is, the only solution in $x_1, \ldots, x_n \in \mathbb{Q}$ of $x_1\alpha_1 + \cdots + x_n\alpha_n = 0$ is $x_1 = \cdots = x_n = 0$. Define the polynomial
\[ F(X_1, \ldots, X_n) := N_{K/\mathbb{Q}}(\alpha_1X_1 + \cdots + \alpha_nX_n) := \prod_{i=1}^d \bigl(\sigma_i(\alpha_1)X_1 + \cdots + \sigma_i(\alpha_n)X_n\bigr). \]
Notice that if we apply any $\tau$ from the Galois group $\mathrm{Gal}(G/\mathbb{Q})$, then it permutes the linear factors of $F$, hence it leaves the coefficients of $F$ unchanged. So $F$ has its coefficients in $\mathbb{Q}$.
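A small computation makes the rationality of the coefficients concrete. The Python sketch below is our own illustration, not taken from the notes. It assumes the specific field $K = \mathbb{Q}(\theta)$ with $\theta^3 = 2$ and the elements $\alpha_1 = 1$, $\alpha_2 = \theta$, $\alpha_3 = \theta^2$ (so $n = d = 3$), and evaluates the norm as the determinant of the multiplication map on the basis $1, \theta, \theta^2$, a standard alternative to multiplying out the conjugate factors. The function names are hypothetical.

```python
from itertools import product


def norm_cubic(x1: int, x2: int, x3: int) -> int:
    """N_{K/Q}(x1 + x2*t + x3*t^2) for K = Q(t) with t^3 = 2.

    The columns of the matrix are the coordinates of e, t*e and t^2*e
    (e being the element itself) on the basis 1, t, t^2; the norm is
    the determinant of this matrix.
    """
    a, b, c = x1, x2, x3
    m = [[a, 2 * c, 2 * b],
         [b, a, 2 * c],
         [c, b, a]]
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def closed_form(x1: int, x2: int, x3: int) -> int:
    """The same norm form written out as a polynomial with integer coefficients."""
    return x1**3 + 2 * x2**3 + 4 * x3**3 - 6 * x1 * x2 * x3


# The two expressions agree on a grid of integer points.
print(all(norm_cubic(*v) == closed_form(*v) for v in product(range(-4, 5), repeat=3)))
print(norm_cubic(1, -1, 0), norm_cubic(1, 1, 1))  # -1 1
```

The last line shows two elements of norm $\pm 1$, namely $1 - \theta$ and $1 + \theta + \theta^2$.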
We deal with the so-called norm form equation (7.13)
NK/Q (α1 x1 + · · · + αn xn ) = c in x = (x1 , . . . , xn ) ∈ Zn .
If $n = 2$, the left-hand side is a binary form and (7.13) becomes a Thue equation. In 1972, Schmidt gave a necessary and sufficient condition such that (7.13) has only finitely many solutions. His proof was based on the Subspace Theorem. Here, we prove a special case of his result.

Theorem 7.9. Suppose that $n < d$, and let $\alpha_1, \ldots, \alpha_n$ be elements of $K$ which are linearly independent over $\mathbb{Q}$. Assume that $\mathrm{Gal}(G/\mathbb{Q}) \cong S_d$. Then (7.13) has only finitely many solutions.

We need some lemmas.

Lemma 7.10. The vectors $(\sigma_1(\alpha_i), \ldots, \sigma_d(\alpha_i))$ $(i = 1, \ldots, n)$ are linearly independent in $\mathbb{C}^d$.

Proof. In general, any linearly independent subset of a finite dimensional vector space can be augmented to a basis of that space. In particular, we can augment $\{\alpha_1, \ldots, \alpha_n\}$ to a $\mathbb{Q}$-basis $\{\alpha_1, \ldots, \alpha_d\}$ of $K$. As a consequence, there are $b_{ij} \in \mathbb{Q}$ such that
\[ \theta^i = \sum_{j=1}^d b_{ij}\alpha_j \quad \text{for } i = 0, \ldots, d-1. \]
Then also, $\sigma_i(\theta)^j = \sum_{k=1}^d b_{jk}\sigma_i(\alpha_k)$ for $i = 1, \ldots, d$, $j = 0, \ldots, d-1$, and this leads to a matrix identity and determinant identity
\[ \bigl(\sigma_i(\theta)^j\bigr) = \bigl(\sigma_i(\alpha_j)\bigr) \cdot \bigl(b_{ij}\bigr)^T, \qquad \det\bigl(\sigma_i(\theta)^j\bigr) = \det\bigl(\sigma_i(\alpha_j)\bigr) \cdot \det\bigl(b_{ij}\bigr). \]
By Vandermonde's identity we have
\[ \det\bigl(\sigma_i(\theta)^j\bigr) = \prod_{1 \le i < j \le d} (\theta^{(j)} - \theta^{(i)}) \ne 0. \]
Hence $\det(\sigma_i(\alpha_j)) \ne 0$, so the vectors $(\sigma_1(\alpha_j), \ldots, \sigma_d(\alpha_j))$ $(j = 1, \ldots, d)$ are linearly independent, and in particular so are those with $j \le n$.

[...]

Let $n \ge 2$, and assume the theorem is true for norm form equations in fewer than $n$ unknowns. Since the linear forms $L_1, \ldots, L_d$ are in general position, by Theorem 7.4, for any $C > 0$, $\delta > 0$ the set of solutions of
\[ |F(x)| = |L_1(x) \cdots L_d(x)| \le C\|x\|^{d-n-\delta} \]
lies in a union of finitely many proper linear subspaces of Qn . It follows that the solutions of (7.13) lie in only finitely many proper linear subspaces of Qn . We show that (7.13) has only finitely many solutions in each of these subspaces. Let T be one of these subspaces. For solutions in T , one of the coordinates can be expressed as a linear combination of the others, with coefficients in Q. Say that we have xn = a1 x1 + · · · + an−1 xn−1 identically on T , where ai ∈ Q. By substituting this in (7.13) we get a norm form equation in n − 1 variables NK/Q (β1 x1 + · · · + βn−1 xn−1 ) = c, where βi = αi + ai αn for i = 1, . . . , n − 1. It is not difficult to show that β1 , . . . , βn−1 are linearly independent over Q. Hence by the induction hypothesis, this last equation has only finitely many solutions (x1 , . . . , xn−1 ) ∈ Zn−1 . This implies that the original equation (7.13) has only finitely many solutions (x1 , . . . , xn ) ∈ T . This completes our proof. We give examples of norm form equations with infinitely many solutions. We recall the following fact: Lemma 7.12. Let K be an algebraic number field and α an element of the ring of integers OK of K. Then α is a unit of OK ⇐⇒ NK/Q (α) = ±1. Proof. Well-known. It is more convenient to rewrite (7.13) as (7.14)
NK/Q (ξ) = c in ξ ∈ M,
where M := {α1 x1 + · · · + αn xn : x1 , . . . , xn ∈ Z}. Notice that M is a free Z-module in K of rank n, i.e., its elements can be expressed uniquely as Z-linear combinations of a basis of n elements. ∗ Take an algebraic number field K such that the unit group OK of the ring of integers of K is infinite. By Dirichlet’s Unit Theorem, this holds for any number field K which is not Q or an imaginary quadratic field (i.e. a number field of the
√ shape Q( −a) with a ∈ Z>0 ). Take M = OK . It is known that OK is a free ∗ Z-module of rank equal to [K : Q]. Now clearly, if ε ∈ OK , then ξ = ε2 is a solution to NK/Q (ξ) = 1 in ξ ∈ OK , and so this last norm form equation has infinitely many solutions. More generally, (7.14) has infinitely many solutions if µOL = {µξ : ξ ∈ OL } ⊆ M for some µ ∈ K ∗ , and some subfield L of K which is not equal to Q or to an imaginary quadratic field. Now Schmidt’s result on norm form equations is as follows. Theorem 7.13. (W.M. Schmidt, 1972) Let K be an algebraic number field, α1 , . . . , αn elements of K which are linearly independent over Q, and Pn M := i=1 αi xi : xi ∈ Z . Then the following two assertions are equivalent: (i) there do not exist µ ∈ K ∗ and a subfield L of K not equal to Q or to an imaginary quadratic field such that µOL ⊆ M; (ii) for every c ∈ Q∗ , the equation NK/Q (ξ) = c in ξ ∈ M
(7.14)
has only finitely many solutions. The implication (i)=⇒(ii) is deduced from the Subspace Theorem. The proof is too difficult to be included here. We prove only the other implication, that is, if (i) is false then there is c ∈ Q∗ such that (7.14) has infinitely many solutions. Indeed, for every ε ∈ OL∗ we have µε2 ∈ M and NK/Q (ε) = ±1. Thus, by letting ε run through OL∗ , we obtain infinitely many elements ξ = µε2 ∈ M with NK/Q (ξ) = NK/Q (µ)NK/Q (ε)2 = NK/Q (µ).
Example. Let
\[ K = \mathbb{Q}(\sqrt[6]{2}), \qquad M := \{x_1\sqrt[6]{2} + x_2\sqrt{2} + x_3\sqrt[6]{2^5} : x_1, x_2, x_3 \in \mathbb{Z}\}. \]
Notice that $K$ contains the subfield $L = \mathbb{Q}(\sqrt[3]{2})$. One can show that
\[ O_L = \{x_1 + x_2\sqrt[3]{2} + x_3\sqrt[3]{4} : x_i \in \mathbb{Z}\}, \qquad O_L^* = \{\pm(1 - \sqrt[3]{2})^n : n \in \mathbb{Z}\}. \]
We have $M = \sqrt[6]{2}\,O_L$ and $N_{K/\mathbb{Q}}(1 - \sqrt[3]{2}) = 1$. Hence every $n \in \mathbb{Z}$ yields a solution $\xi := \sqrt[6]{2}\,(1 - \sqrt[3]{2})^n \in M$ of
\[ N_{K/\mathbb{Q}}(\xi) = N_{K/\mathbb{Q}}(\sqrt[6]{2}) = -2. \]

Exercise 7.4. Let $K = \mathbb{Q}(\sqrt[5]{2})$. Note that the embeddings of $K$ in $\mathbb{C}$ are given by $\sigma_i(\sqrt[5]{2}) = \rho^i\sqrt[5]{2}$ for $i = 0, \ldots, 4$, where $\rho = e^{2\pi\sqrt{-1}/5}$. Let $c \in \mathbb{Q}^*$, and consider the norm form equation
\[ N_{K/\mathbb{Q}}(x_1 + \sqrt[5]{2}\,x_2 + \sqrt[5]{4}\,x_3) = \prod_{i=0}^4 (x_1 + \rho^i\sqrt[5]{2}\,x_2 + \rho^{2i}\sqrt[5]{4}\,x_3) = c \quad \text{in } x_1, x_2, x_3 \in \mathbb{Z}. \tag{7.15} \]
(i) Prove that the left-hand side of (7.15) is a product of linear forms in general position.
(ii) Prove that if $\alpha, \beta \in K^*$ and $\alpha/\beta \notin \mathbb{Q}$, then the linear forms $\sigma_i(\alpha)X_1 + \sigma_i(\beta)X_2$ $(i = 0, \ldots, 4)$ are in general position.
(iii) Prove that (7.15) has only finitely many solutions (you are allowed to apply Theorem 7.4 but not Theorem 7.13).
Chapter 8 P-adic numbers Literature: N. Koblitz, p-adic Numbers, p-adic Analysis, and Zeta-Functions, 2nd edition, Graduate Texts in Mathematics 58, Springer Verlag 1984, corrected 2nd printing 1996, Chap. I,III
8.1
Absolute values
The $p$-adic absolute value $|\cdot|_p$ on $\mathbb{Q}$ is defined as follows: if $a \in \mathbb{Q}$, $a \ne 0$ then write $a = p^m b/c$ where $b, c$ are integers not divisible by $p$ and put $|a|_p = p^{-m}$; further, put $|0|_p = 0$.

Example. Let $a = -2^{-7}3^8 5^{-3}$. Then $|a|_2 = 2^7$, $|a|_3 = 3^{-8}$, $|a|_5 = 5^3$, $|a|_p = 1$ for $p \ge 7$.

We give some properties:
\[ |ab|_p = |a|_p |b|_p \quad \text{for } a, b \in \mathbb{Q}^*; \]
\[ |a + b|_p \le \max(|a|_p, |b|_p) \quad \text{for } a, b \in \mathbb{Q}^* \text{ (ultrametric inequality)}. \]
Notice that the last property implies that $|a + b|_p = \max(|a|_p, |b|_p)$ if $|a|_p \ne |b|_p$. It is common to write the ordinary absolute value $|a| = \max(a, -a)$ on $\mathbb{Q}$ as $|a|_\infty$, to call $\infty$ the 'infinite prime' and to define $M_{\mathbb{Q}} := \{\infty\} \cup \{\text{primes}\}$. Then we have the important product formula:
\[ \prod_{p \in M_{\mathbb{Q}}} |a|_p = 1 \quad \text{for } a \in \mathbb{Q},\ a \ne 0. \]
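The definition and the product formula are easy to check by machine on the example above. The following Python sketch is our own illustration; the function names `padic_abs` and `product_formula` are hypothetical, and the caller is assumed to supply every prime that divides the numerator or denominator of $a$.

```python
from fractions import Fraction


def padic_abs(a, p: int) -> Fraction:
    """|a|_p = p**(-m), where a = p**m * b/c with p dividing neither b nor c."""
    a = Fraction(a)
    if a == 0:
        return Fraction(0)
    m = 0
    num, den = a.numerator, a.denominator
    while num % p == 0:   # count powers of p in the numerator
        num //= p
        m += 1
    while den % p == 0:   # and subtract those in the denominator
        den //= p
        m -= 1
    return Fraction(p) ** (-m)


def product_formula(a, primes) -> Fraction:
    """Product of |a|_v over the infinite prime and the listed finite primes."""
    result = abs(Fraction(a))
    for p in primes:
        result *= padic_abs(a, p)
    return result


a = Fraction(-(3 ** 8), 2 ** 7 * 5 ** 3)   # a = -2^(-7) * 3^8 * 5^(-3)
print(padic_abs(a, 2), padic_abs(a, 3), padic_abs(a, 5), padic_abs(a, 7))  # 128 1/6561 125 1
print(product_formula(a, [2, 3, 5]))  # 1
```

Primes not dividing numerator or denominator contribute a factor $1$, which is why a finite list of primes suffices.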
We define more generally absolute values on fields. Let K be any field. An absolute value on K is a function | · | : K → R>0 with the following properties: |ab| = |a| · |b| for a, b ∈ K; |a + b| 6 |a| + |b| for a, b ∈ K (triangle inequality); |a| = 0 ⇐⇒ a = 0. Notice that these properties imply that |1| = 1. The absolute value | · | is called nonarchimedean if the triangle inequality can be replaced by the stronger ultrametric inequality or strong triangle inequality |a + b| 6 max(|a|, |b|) for a, b ∈ K . An absolute value not satisfying the ultrametric inequality is called archimedean. If K is a field with absolute value | · | and L an extension of K, then an extension or continuation of | · | to L is an absolute value on L whose restriction to K is | · |. Examples. 1) Every field K can be endowed with the trivial absolute value | · |, given by |a| = 0 if a = 0 and |a| = 1 if a 6= 0. It is not hard to show that if K is a finite field then there are no non-trivial absolute values on K. 2) The ordinary absolute value | · |∞ on Q is archimedean, while the p-adic absolute values are all non-archimedean. 3) Let K be any field, and K(t) the field of rational functions of K. For a polynomial f ∈ K[t] define |f | = 0 if f = 0 and |f | = edeg f if f 6= 0. Further, for a rational function f /g with f, g ∈ K[t] define |f /g| = |f |/|g|. Verify that this defines a non-archimedean absolute value on K(t). Let K be a field. Two absolute values |·|1 , |·|2 on K are called equivalent if there is α > 0 such that |x|2 = |x|α1 for all x ∈ K. We state without proof the following result: Theorem 8.1. (Ostrowski) Every non-trivial absolute value on Q is equivalent to either the ordinary absolute value or a p-adic absolute value for some prime number p. 124
8.2
Completions
Let K be a field, | · | a non-trivial absolute value on K, and {ak }∞ k=0 a sequence in K. We say that {ak }∞ k=0 converges to α with respect to | · | if limk→∞ |ak − α| = 0. is called a Cauchy sequence with respect to |·| if limm,n→∞ |am −an | = Further, {ak }∞ k=0 0. Notice that any convergent sequence is a Cauchy sequence. We say that K is complete with respect to | · | if every Cauchy sequence w.r.t. | · | in K converges to a limit in K. For instance, R and C are complete w.r.t. the ordinary absolute value. Ostrowski proved that any field complete with respect to an archimedean absolute value is isomorphic to R or C. Every field K with an absolute value can be extended to an up to isomorphism complete field, the completion of K. Theorem 8.2. Let K be a field with non-trivial absolute value | · |. There is an up ˜ of K, called the to absolute value preserving isomorphism unique extension field K completion of K, having the following properties: ˜ also denoted | · |, such that K ˜ is (i) | · | can be continued to an absolute value on K, complete w.r.t. | · |; ˜ i.e., every element of K ˜ is the limit of a sequence from K. (ii) K is dense in K, Proof. Basically one has to mimic the construction of R from Q or the construction of a completion of a metric space in topology. We give a sketch. Cauchy sequences, limits, etc. are all with respect to | · |. The set of Cauchy sequences in K with respect to | · | is closed under termwise addition and multiplication {an } + {bn } := {an + bn }, {an } · {bn } := {an · bn }. With these operations they form a ring, which we denote by R. It is not difficult to verify that the sequences {an } such that an → 0 with respect to | · | form a maximal ideal in R, which we denote by M. Thus, the quotient R/M is a field, which is our ˜ completion K. ˜ by choosing a representative {an } of α, We define the absolute value |α| of α ∈ K 125
and putting $|\alpha| := \lim_{n\to\infty} |a_n|$, where now the limit is with respect to the ordinary absolute value on $\mathbb{R}$. It is not difficult to verify that this is well-defined, that is, the limit exists and is independent of the choice of the representative $\{a_n\}$. We may view $K$ as a subfield of $\tilde{K}$ by identifying $a \in K$ with the element of $\tilde{K}$ represented by the constant Cauchy sequence $\{a\}$. In this manner, the absolute value on $\tilde{K}$ constructed above extends that of $K$, and moreover, every element of $\tilde{K}$ is a limit of a sequence from $K$. So $K$ is dense in $\tilde{K}$. One shows that $\tilde{K}$ is complete, that is, any Cauchy sequence $\{a_n\}$ in $\tilde{K}$ has a limit in $\tilde{K}$, by taking very good approximations $b_n \in K$ of $a_n$ and then taking the limit of the $b_n$. Finally, if $K'$ is another complete field with absolute value extending that on $K$ such that $K$ is dense in $K'$, one obtains an isomorphism from $\tilde{K}$ to $K'$ as follows: Take $\alpha \in \tilde{K}$. Choose a sequence $\{a_k\}$ in $K$ converging to $\alpha$; this is necessarily a Cauchy sequence. Then map $\alpha$ to the limit of $\{a_k\}$ in $K'$.

Corollary 8.3. Assume that $|\cdot|$ is a non-trivial, non-archimedean absolute value on $K$. Then the extension of $|\cdot|$ to $\tilde{K}$ is also non-archimedean.

Proof. Let $a, b \in \tilde{K}$. Choose sequences $\{a_k\}$, $\{b_k\}$ in $K$ that converge to $a$, $b$, respectively. Then
\[ |a+b| = \lim_{k\to\infty} |a_k + b_k| \leq \lim_{k\to\infty} \max(|a_k|, |b_k|) = \max(|a|, |b|). \]

8.3
p-adic Numbers and p-adic integers
In everything that follows, $p$ is a prime number. The completion of $\mathbb{Q}$ with respect to $|\cdot|_p$ is called the field of $p$-adic numbers, notation $\mathbb{Q}_p$. The continuation of $|\cdot|_p$ to $\mathbb{Q}_p$ is also denoted by $|\cdot|_p$. This is a non-archimedean absolute value on $\mathbb{Q}_p$. Convergence, limits, Cauchy sequences and the like will all be with respect to $|\cdot|_p$. As mentioned before, by identifying $a \in \mathbb{Q}$ with the class of the constant Cauchy sequence $\{a\}$, we may view $\mathbb{Q}$ as a subfield of $\mathbb{Q}_p$.

Lemma 8.4. The value set of $|\cdot|_p$ on $\mathbb{Q}_p$ is $\{0\} \cup \{p^m : m \in \mathbb{Z}\}$.
Proof. Let $x \in \mathbb{Q}_p$, $x \neq 0$. Choose again a sequence $\{x_k\}$ in $\mathbb{Q}$ converging to $x$. Then $|x|_p = \lim_{k\to\infty} |x_k|_p$. For $k$ sufficiently large we have $|x_k|_p = p^{m_k}$ for some $m_k \in \mathbb{Z}$. Since the sequence of numbers $p^{m_k}$ converges to a non-zero limit, we must have $m_k = m \in \mathbb{Z}$ for $k$ sufficiently large. Hence $|x|_p = p^m$.

The set $\mathbb{Z}_p := \{x \in \mathbb{Q}_p : |x|_p \leq 1\}$ is called the ring of $p$-adic integers. Notice that if $x, y \in \mathbb{Z}_p$ then $|x - y|_p \leq \max(|x|_p, |y|_p) \leq 1$. Hence $x - y \in \mathbb{Z}_p$. Further, if $x, y \in \mathbb{Z}_p$ then $|xy|_p \leq 1$, which implies $xy \in \mathbb{Z}_p$. So $\mathbb{Z}_p$ is indeed a ring. Viewing $\mathbb{Q}$ as a subfield of $\mathbb{Q}_p$, we have
\[ \mathbb{Z}_p \cap \mathbb{Q} = \left\{ \tfrac{a}{b} : a, b \in \mathbb{Z},\ p \nmid b \right\}. \]
It is not hard to show that the group of units of $\mathbb{Z}_p$, that is, the set of elements $x \in \mathbb{Z}_p$ with $x^{-1} \in \mathbb{Z}_p$, is equal to $\mathbb{Z}_p^* = \{x \in \mathbb{Q}_p : |x|_p = 1\}$. Further, $M_p := \{x \in \mathbb{Q}_p : |x|_p < 1\}$ is an ideal of $\mathbb{Z}_p$. In fact, $M_p$ is the only maximal ideal of $\mathbb{Z}_p$ since any ideal of $\mathbb{Z}_p$ not contained in $M_p$ contains an element of $\mathbb{Z}_p^*$, hence generates the whole ring $\mathbb{Z}_p$. Noting
\[ |x|_p < 1 \iff |x|_p \leq p^{-1} \iff |x/p|_p \leq 1 \iff x/p \in \mathbb{Z}_p \]
for $x \in \mathbb{Q}_p$, we see that $M_p = p\mathbb{Z}_p$.

For $\alpha, \beta \in \mathbb{Q}_p$ we write $\alpha \equiv \beta \pmod{p^m}$ if $(\alpha - \beta)/p^m \in \mathbb{Z}_p$. This is equivalent to $|\alpha - \beta|_p \leq p^{-m}$. Notice that if $\alpha = a_1/b_1$, $\beta = a_2/b_2$ with $a_1, b_1, a_2, b_2 \in \mathbb{Z}$ and $p \nmid b_1 b_2$, then
\[ a_1 \equiv a_2 \pmod{p^m},\ b_1 \equiv b_2 \pmod{p^m} \implies \alpha \equiv \beta \pmod{p^m}. \]
For $p$-adic numbers, "very small" means "divisible by a high power of $p$", and two $p$-adic numbers $\alpha$ and $\beta$ are $p$-adically close if and only if $\alpha - \beta$ is divisible by a high power of $p$.

Lemma 8.5. For every $\alpha \in \mathbb{Z}_p$ and every positive integer $m$ there is a unique $a_m \in \mathbb{Z}$ such that $|\alpha - a_m|_p \leq p^{-m}$ and $0 \leq a_m < p^m$. Hence $\mathbb{Z}$ is dense in $\mathbb{Z}_p$.

Proof. There is a rational number $a/b$ (with coprime $a, b \in \mathbb{Z}$) such that $|\alpha - (a/b)|_p \leq p^{-m}$ since $\mathbb{Q}$ is dense in $\mathbb{Q}_p$. At most one of $a$, $b$ is divisible by $p$ and
it cannot be $b$ since $|a/b|_p \leq 1$. Hence there is an integer $a_m$ with $b a_m \equiv a \pmod{p^m}$ and $0 \leq a_m < p^m$. Thus,
\[ |\alpha - a_m|_p \leq \max\bigl(|\alpha - (a/b)|_p,\ |(a/b) - a_m|_p\bigr) \leq p^{-m}. \]
This shows the existence of $a_m$. As for the uniqueness, if $a_m'$ is another integer with the properties specified in the lemma, we have $|a_m - a_m'|_p \leq p^{-m}$, hence $a_m \equiv a_m' \pmod{p^m}$, implying $a_m = a_m'$.

Theorem 8.6. The non-zero ideals of $\mathbb{Z}_p$ are $p^m\mathbb{Z}_p$ ($m = 0, 1, 2, \ldots$) and $\mathbb{Z}_p/p^m\mathbb{Z}_p \cong \mathbb{Z}/p^m\mathbb{Z}$. In particular, $\mathbb{Z}_p/p\mathbb{Z}_p \cong \mathbb{F}_p$.

Proof. Let $I$ be a non-zero ideal of $\mathbb{Z}_p$ and choose $\alpha \in I$ for which $|\alpha|_p$ is maximal. Then $|\alpha|_p = p^{-m}$ with $m \in \mathbb{Z}_{\geq 0}$. We have $p^{-m}\alpha \in \mathbb{Z}_p^*$, hence $p^m \in I$. Further, for $\beta \in I$ we have $|\beta p^{-m}|_p \leq 1$, hence $\beta \in p^m\mathbb{Z}_p$. Hence $I \subset p^m\mathbb{Z}_p$. This implies $I = p^m\mathbb{Z}_p$. The homomorphism
\[ \mathbb{Z}/p^m\mathbb{Z} \to \mathbb{Z}_p/p^m\mathbb{Z}_p : \ a \ (\mathrm{mod}\ p^m) \mapsto a \ (\mathrm{mod}\ p^m) \]
is clearly injective, and also surjective in view of Lemma 8.5. Hence $\mathbb{Z}/p^m\mathbb{Z} \cong \mathbb{Z}_p/p^m\mathbb{Z}_p$.

Lemma 8.7. Let $\{a_k\}_{k=0}^{\infty}$ be a sequence in $\mathbb{Q}_p$. Then $\sum_{k=0}^{\infty} a_k$ converges in $\mathbb{Q}_p$ if and only if $\lim_{k\to\infty} a_k = 0$. Further, every convergent series in $\mathbb{Q}_p$ is unconditionally convergent, i.e., neither the convergence, nor the value of the series, are affected if the terms $a_k$ are rearranged.

Proof. Suppose that $\alpha := \sum_{k=0}^{\infty} a_k$ converges. Then
\[ a_n = \sum_{k=0}^{n} a_k - \sum_{k=0}^{n-1} a_k \to \alpha - \alpha = 0. \]
Conversely, suppose that $a_k \to 0$ as $k \to \infty$. Let $\alpha_n := \sum_{k=0}^{n} a_k$. Then for any integers $m, n$ with $0 < m < n$ we have
\[ |\alpha_n - \alpha_m|_p = \Bigl|\sum_{k=m+1}^{n} a_k\Bigr|_p \leq \max(|a_{m+1}|_p, \ldots, |a_n|_p) \to 0 \quad \text{as } m, n \to \infty. \]
So the partial sums $\alpha_n$ form a Cauchy sequence, hence must converge to a limit in $\mathbb{Q}_p$.
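To prove the second part of the lemma, let $\sigma$ be a bijection from $\mathbb{Z}_{\geq 0}$ to $\mathbb{Z}_{\geq 0}$. We have to prove that $\sum_{k=0}^{\infty} a_{\sigma(k)} = \sum_{k=0}^{\infty} a_k$. Equivalently, we have to prove that $\sum_{k=0}^{M} a_k - \sum_{k=0}^{M} a_{\sigma(k)} \to 0$ as $M \to \infty$, i.e., for every $\varepsilon > 0$ there is $N$ such that
\[ \Bigl|\sum_{k=0}^{M} a_k - \sum_{k=0}^{M} a_{\sigma(k)}\Bigr|_p < \varepsilon \quad \text{for every } M \geq N. \]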
Let $\varepsilon > 0$. There is $N$ such that $|a_k|_p < \varepsilon$ for all $k \geq N$. Choose $N_1 \geq N$ such that $\{\sigma(0), \ldots, \sigma(N_1)\}$ contains $\{0, \ldots, N\}$ and let $M \geq N_1$. Then in the sum $S := \sum_{k=0}^{M} a_k - \sum_{k=0}^{M} a_{\sigma(k)}$, after cancellation only terms $a_k$ with $k > N$ and $a_{\sigma(k)}$ with $\sigma(k) > N$ occur. Hence each term in $S$ has $p$-adic absolute value $< \varepsilon$ and therefore, by the ultrametric inequality, $|S|_p < \varepsilon$.

We now show that every element of $\mathbb{Z}_p$ has a "Taylor series expansion," and every element of $\mathbb{Q}_p$ a "Laurent series expansion" where instead of powers of a variable $X$ one takes powers of $p$.

Theorem 8.8. (i) Every element of $\mathbb{Z}_p$ can be expressed uniquely as $\sum_{k=0}^{\infty} b_k p^k$ with $b_k \in \{0, \ldots, p-1\}$ for $k \geq 0$ and conversely, every such series belongs to $\mathbb{Z}_p$.
(ii) Every non-zero element of $\mathbb{Q}_p$ can be expressed uniquely as $\sum_{k=-k_0}^{\infty} b_k p^k$ with $k_0 \in \mathbb{Z}$, $b_k \in \{0, \ldots, p-1\}$ for $k \geq -k_0$ and $b_{-k_0} \neq 0$ and conversely, every such series belongs to $\mathbb{Q}_p$.

Proof. We first prove part (i). First observe that by Lemma 8.7, a series $\sum_{k=0}^{\infty} b_k p^k$ with $b_k \in \{0, \ldots, p-1\}$ converges in $\mathbb{Q}_p$. Further, it belongs to $\mathbb{Z}_p$, since $|\sum_{k=0}^{\infty} b_k p^k|_p \leq \max_{k \geq 0} |b_k p^k|_p \leq 1$. Let $\alpha \in \mathbb{Z}_p$. Define sequences $\{\alpha_k\}_{k=0}^{\infty}$ in $\mathbb{Z}_p$, $\{b_k\}_{k=0}^{\infty}$ in $\{0, \ldots, p-1\}$ inductively as follows:
(8.1) $\alpha_0 := \alpha$; for $k = 0, 1, \ldots$, let $b_k \in \{0, \ldots, p-1\}$ be the integer with $\alpha_k \equiv b_k \pmod{p}$ and put $\alpha_{k+1} := (\alpha_k - b_k)/p$.
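By induction on $k$, one easily deduces that for $k \geq 0$,
\[ \alpha_k \in \mathbb{Z}_p, \qquad \alpha = \sum_{j=0}^{k} b_j p^j + p^{k+1}\alpha_{k+1}. \]
Hence $|\alpha - \sum_{j=0}^{k} b_j p^j|_p \leq p^{-k-1}$ for $k \geq 0$. It follows that
\[ \alpha = \lim_{k\to\infty} \sum_{j=0}^{k} b_j p^j = \sum_{j=0}^{\infty} b_j p^j. \]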
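Notice that the integer $a_m$ from Lemma 8.5 is precisely $\sum_{k=0}^{m-1} b_k p^k$. Since $a_m$ is uniquely determined, so must be the integers $b_k$.

We prove part (ii). As above, any series $\sum_{k=-k_0}^{\infty} b_k p^k$ with $b_k \in \{0, \ldots, p-1\}$ converges in $\mathbb{Q}_p$. Let $\alpha \in \mathbb{Q}_p$ with $\alpha \neq 0$. Suppose that $|\alpha|_p = p^{k_0}$. Then $\beta := p^{k_0}\alpha$ has $|\beta|_p = 1$, so it belongs to $\mathbb{Z}_p$. Applying (i) to $\beta$ we get
\[ \alpha = p^{-k_0}\beta = p^{-k_0} \sum_{k=0}^{\infty} c_k p^k \]
with $c_k \in \{0, \ldots, p-1\}$, and $c_0 \neq 0$ since $|\beta|_p = 1$, which implies (ii).

Corollary 8.9. $\mathbb{Z}_p$ is uncountable.

Proof. Apply Cantor's diagonal method.

We use the following notation:
\[ \alpha = 0.\,b_0 b_1 \ldots\ (p) \quad \text{if } \alpha = \sum_{k=0}^{\infty} b_k p^k, \]
\[ \alpha = b_{-k_0} \cdots b_{-1}\,.\,b_0 b_1 \ldots\ (p) \quad \text{if } \alpha = \sum_{k=-k_0}^{\infty} b_k p^k \text{ with } k_0 > 0. \]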
We can describe several of the definitions given above in terms of $p$-adic expansions. For instance, for non-zero $\alpha \in \mathbb{Q}_p$ we have $|\alpha|_p = p^{-m}$ where $\alpha = \sum_{k=m}^{\infty} b_k p^k$ with $b_k \in \{0, \ldots, p-1\}$ for $k \geq m$ and $b_m \neq 0$. Next, if $\alpha = \sum_{k=0}^{\infty} a_k p^k$, $\beta = \sum_{k=0}^{\infty} b_k p^k \in \mathbb{Z}_p$ with $a_k, b_k \in \{0, \ldots, p-1\}$, then $\alpha \equiv \beta \pmod{p^m} \iff a_k = b_k$ for $k < m$.

For $p$-adic numbers given in their $p$-adic expansions, one has the same addition with carry algorithm as for real numbers given in their decimal expansions, except that for $p$-adic numbers one has to work from left to right instead of right to left. Likewise, one has subtraction and multiplication algorithms for $p$-adic numbers which are precisely the same as for real numbers, except that one has to work from left to right instead of right to left.
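The addition-with-carry procedure can be sketched in a few lines. This is an illustrative sketch only: it assumes each $p$-adic integer is given by a finite list of its first digits $b_0, b_1, \ldots$ (lowest power of $p$ first), and the name `padic_add` is a hypothetical helper, not notation from these notes.

```python
def padic_add(a_digits, b_digits, p):
    """Add two p-adic integers given by their leading digits b_0, b_1, ...

    Works from left to right (from p^0 upwards) and returns as many digits
    of the sum as are determined by the shorter input, i.e. the sum modulo
    p^n with n = min(len(a_digits), len(b_digits)).
    """
    n = min(len(a_digits), len(b_digits))
    out, carry = [], 0
    for k in range(n):
        s = a_digits[k] + b_digits[k] + carry
        out.append(s % p)   # digit of p^k in the sum
        carry = s // p      # carried to the digit of p^(k+1)
    return out


# 5 = 2 + 1*3 and 7 = 1 + 2*3 in base 3; their sum 12 = 0 + 1*3 + 1*9.
print(padic_add([2, 1, 0, 0], [1, 2, 0, 0], 3))  # [0, 1, 1, 0]
```

Because a carry only ever moves towards higher powers of $p$, the first $n$ digits of the sum depend only on the first $n$ digits of the summands, which is why truncated digit lists suffice.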
Theorem 8.10. Let $\alpha = \sum_{k=-k_0}^{\infty} b_k p^k$ with $b_k \in \{0, \ldots, p-1\}$ for $k \geq -k_0$. Then
\[ \alpha \in \mathbb{Q} \iff \{b_k\}_{k=-k_0}^{\infty} \text{ is ultimately periodic.} \]

Proof. $\Longleftarrow$ Exercise.
$\Longrightarrow$ Without loss of generality, we assume that $\alpha \in \mathbb{Z}_p$ (if $\alpha \in \mathbb{Q}_p$ with $|\alpha|_p = p^{k_0}$, say, then we proceed further with $\beta := p^{k_0}\alpha$ which is in $\mathbb{Z}_p$). Suppose that $\alpha = A/B$ with $A, B \in \mathbb{Z}$, $\gcd(A, B) = 1$. Then $p$ does not divide $B$ (otherwise $|\alpha|_p > 1$). Let $C := \max(|A|, |B|)$. Let $\{\alpha_k\}_{k=0}^{\infty}$ be the sequence defined by (8.1). Notice that $\alpha_k$ determines uniquely the numbers $b_k, b_{k+1}, \ldots$.

Claim. $\alpha_k = A_k/B$ with $A_k \in \mathbb{Z}$, $|A_k| \leq C$.

This is proved by induction on $k$. For $k = 0$ the claim is obviously true. Suppose the claim is true for $k = m$ where $m \geq 0$. Then
\[ \alpha_{m+1} = \frac{\alpha_m - b_m}{p} = \frac{(A_m - b_m B)/p}{B}. \]
Since $\alpha_m \equiv b_m \pmod{p}$ we have that $A_m - b_m B$ is divisible by $p$. So $A_{m+1} := (A_m - b_m B)/p \in \mathbb{Z}$. Further,
\[ |A_{m+1}| \leq \frac{C + (p-1)|B|}{p} \leq C. \]
This proves our claim. Now since the integers $A_k$ all belong to $\{-C, \ldots, C\}$, there must be indices $l < m$ with $A_l = A_m$, that is, $\alpha_l = \alpha_m$. But then, $b_{k+m-l} = b_k$ for all $k \geq l$, proving that $\{b_k\}_{k=0}^{\infty}$ is ultimately periodic.

Examples. (i) We determine the 3-adic expansion of $-\frac{2}{5}$. We compute the numbers $\alpha_k$, $b_k$ according to (8.1). Notice that $\frac{1}{5} \equiv 2 \pmod{3}$.
\[
\begin{array}{c|ccccc}
k & 0 & 1 & 2 & 3 & 4 \\ \hline
\alpha_k & -\frac{2}{5} & -\frac{4}{5} & -\frac{3}{5} & -\frac{1}{5} & -\frac{2}{5} \\
b_k & 2 & 1 & 0 & 1 & 2
\end{array}
\]
It follows that the sequence of 3-adic digits $\{b_k\}_{k=0}^{\infty}$ of $-\frac{2}{5}$ is periodic with period $2, 1, 0, 1$ and that
\[ -\frac{2}{5} = 2 \times 3^0 + 1 \times 3^1 + 0 \times 3^2 + 1 \times 3^3 + 2 \times 3^4 + 1 \times 3^5 + 0 \times 3^6 + 1 \times 3^7 + \cdots = 0.\,2101\,2101 \ldots\ (3) = 0.\,\overline{2101}\ (3). \]
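The recursion (8.1) and the periodicity argument from the proof of Theorem 8.10 can be carried out mechanically for a rational $p$-adic integer. The following is a minimal sketch; the function names `next_digit`, `padic_digits` and `padic_period` are illustrative, and the modular inverse via `pow(x, -1, p)` assumes Python 3.8 or later.

```python
from fractions import Fraction


def next_digit(alpha: Fraction, p: int) -> int:
    """The digit b in {0, ..., p-1} with alpha = b (mod p), for p not dividing the denominator."""
    if alpha.denominator % p == 0:
        raise ValueError("alpha is not a p-adic integer")
    return (alpha.numerator * pow(alpha.denominator, -1, p)) % p


def padic_digits(alpha, p: int, count: int):
    """First `count` digits b_0, b_1, ... of a rational alpha in Z_p, following (8.1)."""
    alpha = Fraction(alpha)
    digits = []
    for _ in range(count):
        b = next_digit(alpha, p)
        digits.append(b)
        alpha = (alpha - b) / p          # alpha_{k+1} := (alpha_k - b_k) / p
    return digits


def padic_period(alpha, p: int):
    """Return (pre-period, period) of the digit sequence of a rational alpha in Z_p.

    The loop stops as soon as some alpha_k repeats; by the claim in the proof
    of Theorem 8.10 the alpha_k take only finitely many values, so it terminates.
    """
    alpha = Fraction(alpha)
    seen, digits = {}, []
    while alpha not in seen:
        seen[alpha] = len(digits)
        b = next_digit(alpha, p)
        digits.append(b)
        alpha = (alpha - b) / p
    start = seen[alpha]
    return digits[:start], digits[start:]


print(padic_digits(Fraction(-2, 5), 3, 8))   # [2, 1, 0, 1, 2, 1, 0, 1]
print(padic_period(Fraction(-2, 5), 3))      # ([], [2, 1, 0, 1])
```

Detecting a repeated $\alpha_k$ rather than a repeated digit is what makes the period detection reliable, since $\alpha_k$ determines all later digits while a single digit does not.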
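(ii) We determine the 2-adic expansion of $\frac{1}{56}$. Notice that $\frac{1}{56} = 2^{-3} \times \frac{1}{7}$. We start with the 2-adic expansion of $\frac{1}{7}$.
\[
\begin{array}{c|ccccc}
k & 0 & 1 & 2 & 3 & 4 \\ \hline
\alpha_k & \frac{1}{7} & -\frac{3}{7} & -\frac{5}{7} & -\frac{6}{7} & -\frac{3}{7} \\
b_k & 1 & 1 & 1 & 0 & 1
\end{array}
\]
So
\[ \frac{1}{7} = 0.\,1\,\overline{110}\ (2), \qquad \frac{1}{56} = 111.\,\overline{011}\ (2). \]

8.4
The p-adic topology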
The ball with center $a \in \mathbb{Q}_p$ and radius $r$ in the value set $\{0\} \cup \{p^m : m \in \mathbb{Z}\}$ of $|\cdot|_p$ is defined by $B(a, r) := \{x \in \mathbb{Q}_p : |x - a|_p \leq r\}$. Notice that if $b \in B(a, r)$ then $|b - a|_p \leq r$. So by the ultrametric inequality, for $x \in B(a, r)$ we have $|x - b|_p \leq \max(|x - a|_p, |a - b|_p) \leq r$, i.e. $x \in B(b, r)$. So $B(a, r) \subseteq B(b, r)$. Similarly one proves $B(b, r) \subseteq B(a, r)$. Hence $B(a, r) = B(b, r)$. In other words, any point in a ball can be taken as center of the ball.

We define the $p$-adic topology on $\mathbb{Q}_p$ as follows. A subset $U$ of $\mathbb{Q}_p$ is called open if for every $a \in U$ there is $m > 0$ such that $B(a, p^{-m}) \subset U$. It is easy to see that this topology is Hausdorff: if $a, b$ are distinct elements of $\mathbb{Q}_p$, and $m$ is an integer with $p^{-m} < |a - b|_p$, then the balls $B(a, p^{-m})$ and $B(b, p^{-m})$ are disjoint. But apart from this, the $p$-adic topology has some strange properties.

Theorem 8.11. Let $a \in \mathbb{Q}_p$, $m \in \mathbb{Z}$. Then $B(a, p^{-m})$ is both open and compact in the $p$-adic topology.

Proof. The ball $B(a, p^{-m})$ is open since for every $b \in B(a, p^{-m})$ we have $B(b, p^{-m}) = B(a, p^{-m})$.
To prove the compactness we modify the proof of the Heine-Borel theorem stating that every closed bounded set in $\mathbb{R}$ is compact. Assume that $B_0 := B(a, p^{-m})$ is not compact. Then there is an infinite open cover $\{U_\alpha\}_{\alpha \in A}$ of $B_0$ no finite subcollection of which covers $B_0$. Take $x \in B(a, p^{-m})$. Then $|(x - a)/p^m|_p \leq 1$. Hence there is $b \in \{0, \ldots, p-1\}$ such that $\frac{x - a}{p^m} \equiv b \pmod{p}$. But then, $x \in B(a + bp^m, p^{-m-1})$. So $B(a, p^{-m}) = \bigcup_{b=0}^{p-1} B(a + bp^m, p^{-m-1})$ is the union of $p$ balls of radius $p^{-m-1}$. It follows that there is a ball $B_1 \subset B(a, p^{-m})$ of radius $p^{-m-1}$ which cannot be covered by finitely many sets from $\{U_\alpha\}_{\alpha \in A}$. By continuing this argument we find an infinite sequence of balls $B_0 \supset B_1 \supset B_2 \supset \cdots$, where $B_i$ has radius $p^{-m-i}$, such that $B_i$ cannot be covered by finitely many sets from $\{U_\alpha\}_{\alpha \in A}$.

We show that the intersection of the balls $B_i$ is non-empty. For $i \geq 0$, choose $x_i \in B_i$. Thus, $B_i = B(x_i, p^{-m-i})$. Then $\{x_i\}_{i \geq 0}$ is a Cauchy sequence since $|x_i - x_j|_p \leq p^{-m-\min(i,j)} \to 0$ as $i, j \to \infty$. Hence this sequence has a limit $x^*$ in $\mathbb{Q}_p$. Now we have $|x_i - x^*|_p = \lim_{j\to\infty} |x_i - x_j|_p \leq p^{-m-i}$, hence $x^* \in B_i$, and so $B_i = B(x^*, p^{-m-i})$ for $i \geq 0$. The point $x^*$ belongs to one of the sets, $U$, say, of $\{U_\alpha\}_{\alpha \in A}$. Since $U$ is open, for $i$ sufficiently large the ball $B_i$ must be contained in $U$. This gives a contradiction.

Corollary 8.12. Every non-empty open subset of $\mathbb{Q}_p$ is disconnected.

Proof. Let $U$ be an open non-empty subset of $\mathbb{Q}_p$. Take $a \in U$. Then $B := B(a, p^{-m}) \subset U$ for some $m \in \mathbb{Z}$. By increasing $m$ we can arrange that $B$ is strictly smaller than $U$. Now $B$ is open and also $U \setminus B$ is open since $B$ is compact, hence closed in the Hausdorff space $\mathbb{Q}_p$. Hence $U$ is the union of two non-empty disjoint open sets.
8.5
Algebraic extensions of Qp
We fix an algebraic closure $\overline{\mathbb{Q}_p}$ of $\mathbb{Q}_p$. We construct an extension of $|\cdot|_p$ to $\overline{\mathbb{Q}_p}$. For polynomials $f, g \in \mathbb{Z}_p[X]$ we write $f \equiv g \pmod{p^m}$ if $p^{-m}(f - g) \in \mathbb{Z}_p[X]$. Given $f \in \mathbb{Z}_p[X]$ and a sequence of polynomials $f_m \in \mathbb{Z}_p[X]$ ($m = 1, 2, \ldots$), we write $\lim_{m\to\infty} f_m = f$ if for every $k \geq 0$, the sequence of coefficients of $X^k$ in $f_m$ converges to the coefficient of $X^k$ in $f$. Clearly, $\lim_{m\to\infty} f_m = f$ if and only if there is a sequence of non-negative integers $a_m$ with $a_m \to \infty$ (in $\mathbb{R}$) such that $f_m \equiv f \pmod{p^{a_m}}$.
An important tool is the so-called Hensel's Lemma, which gives a method to derive, from a factorization of a polynomial $f \in \mathbb{Z}_p[X]$ modulo $p$, a factorization of $f$ in $\mathbb{Z}_p[X]$.

Theorem 8.13. Let $f, g_1, h_1$ be polynomials in $\mathbb{Z}_p[X]$ such that $f \neq 0$, $f \equiv g_1 h_1 \pmod{p}$, $\gcd(g_1, h_1) \equiv 1 \pmod{p}$, $g_1$ is monic, $0 < \deg g_1 < \deg f$, $\deg g_1 h_1 \leq \deg f$. Then there exist polynomials $g, h \in \mathbb{Z}_p[X]$ such that $f = gh$, $g \equiv g_1 \pmod{p}$, $h \equiv h_1 \pmod{p}$, $g$ is monic, $\deg g = \deg g_1$.

Proof. By induction on $m$, we prove that there are polynomials $g_m, h_m \in \mathbb{Z}_p[X]$ such that
(8.2) $f \equiv g_m h_m \pmod{p^m}$, $g_m \equiv g_1 \pmod{p}$, $h_m \equiv h_1 \pmod{p}$, $g_m$ is monic, $\deg g_m = \deg g_1$, $\deg g_m h_m \leq \deg f$.
For $m = 1$ this follows from our assumption. Let $m \geq 2$, and suppose that there are polynomials $g_{m-1}, h_{m-1}$ satisfying (8.2) with $m - 1$ instead of $m$. We try to find $u, v \in \mathbb{Z}_p[X]$ such that $g_m = g_{m-1} + p^{m-1}u$, $h_m = h_{m-1} + p^{m-1}v$ satisfy (8.2). By assumption, $A := p^{1-m}(f - g_{m-1}h_{m-1}) \in \mathbb{Z}_p[X]$. Notice that $f \equiv g_m h_m \pmod{p^m}$ if and only if
\[ f - (g_{m-1} + p^{m-1}u)(h_{m-1} + p^{m-1}v) \equiv 0 \pmod{p^m} \iff A \equiv v g_{m-1} + u h_{m-1} \pmod{p} \iff A \equiv v g_1 + u h_1 \pmod{p}. \]
Thanks to our assumption $\gcd(g_1, h_1) \equiv 1 \pmod{p}$ such $u, v$ exist, and in fact, we can choose $u$ with $\deg u < \deg g_1$. Then clearly, $g_m = g_{m-1} + p^{m-1}u$, $h_m = h_{m-1} + p^{m-1}v$ satisfy (8.2).

Now for each term $X^k$, the coefficients of $X^k$ in the $g_m$ form a Cauchy sequence, hence have a limit, so we can take $g := \lim_{m\to\infty} g_m$. Then $g$ is monic, and $0 < \deg g < \deg f$. Likewise, we can define $h := \lim_{m\to\infty} h_m$. Then
\[ f - gh = \lim_{m\to\infty} (f - g_m h_m) = 0. \]
This completes our proof.
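A simple root $x_0$ of $f$ modulo $p$ corresponds to a factorization $f \equiv (X - x_0)h_1 \pmod{p}$ with coprime factors, so lifting such a root is a special case of Hensel's Lemma. The sketch below illustrates this special case only, not the factorization procedure of the proof above. It assumes $f$ has integer coefficients, uses Newton's iteration in place of the step-by-step construction of $g_m, h_m$, and all names (`poly_eval`, `poly_derivative`, `lift_root`) are hypothetical.

```python
def poly_eval(coeffs, x, mod):
    """Evaluate a_0 x^n + ... + a_n modulo `mod` (coeffs listed from a_0 down to a_n)."""
    acc = 0
    for c in coeffs:
        acc = (acc * x + c) % mod
    return acc


def poly_derivative(coeffs):
    """Coefficients of the formal derivative, in the same ordering."""
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]


def lift_root(coeffs, x0, p, k):
    """Lift a simple root x0 of f modulo p to a root of f modulo p^k."""
    dcoeffs = poly_derivative(coeffs)
    if poly_eval(coeffs, x0, p) != 0 or poly_eval(dcoeffs, x0, p) == 0:
        raise ValueError("x0 must be a simple root of f modulo p")
    x, mod, target = x0 % p, p, p ** k
    while mod < target:
        mod = min(mod * mod, target)                     # the precision doubles per step
        inv = pow(poly_eval(dcoeffs, x, mod), -1, mod)   # f'(x) is a unit modulo p
        x = (x - poly_eval(coeffs, x, mod) * inv) % mod  # Newton step
    return x


# A square root of 2 in Z_7, known modulo 7^4, starting from 3^2 = 2 (mod 7).
r = lift_root([1, 0, -2], 3, 7, 4)
print(r * r % 7**4)  # 2
```

The returned integer is only an approximation of the $p$-adic root: it agrees with it modulo $p^k$, in the sense of Lemma 8.5.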
Corollary 8.14. Let $f = a_0 X^n + a_1 X^{n-1} + \cdots + a_n \in \mathbb{Q}_p[X]$ be irreducible. Put $M := \max(|a_0|_p, \ldots, |a_n|_p)$. Let $k$ be the smallest index $i$ such that $|a_i|_p = M$. Then $k = 0$ or $k = n$.

Proof. Assume that $0 < k < n$. So $|a_i|_p < |a_k|_p$ for $i < k$ and $|a_i|_p \leq |a_k|_p$ for $i > k$. Put $\tilde{f} := a_k^{-1} f$ and $b_i := a_i/a_k$. Then
\[ \tilde{f} = b_0 X^n + \cdots + b_{k-1} X^{n-k+1} + X^{n-k} + b_{k+1} X^{n-k-1} + \cdots + b_n \]
with $|b_i|_p < 1$ for $i < k$ and $|b_i|_p \leq 1$ for $i > k$. Now $\tilde{f} \in \mathbb{Z}_p[X]$, $b_0, \ldots, b_{k-1}$ are divisible by $p$, and thus,
\[ \tilde{f} \equiv (X^{n-k} + b_{k+1} X^{n-k-1} + \cdots + b_n) \cdot 1 \pmod{p}. \]
By applying Hensel's Lemma, we infer that there are polynomials $g, h \in \mathbb{Z}_p[X]$ such that $\tilde{f} = gh$ and $\deg g = n - k$. Then $\tilde{f}$, hence $f$, is reducible, contrary to our assumption.

We are now ready to define an extension of $|\cdot|_p$ to $\overline{\mathbb{Q}_p}$. Given $\alpha \in \overline{\mathbb{Q}_p}$, let $f = X^n + a_1 X^{n-1} + \cdots + a_n \in \mathbb{Q}_p[X]$ be the monic minimal polynomial of $\alpha$ over $\mathbb{Q}_p$, that is, the monic polynomial in $\mathbb{Q}_p[X]$ of smallest degree having $\alpha$ as a root. Then we put
\[ |\alpha|_p := |a_n|_p^{1/n}. \]
Let $\alpha^{(1)} = \alpha, \ldots, \alpha^{(n)}$ be the conjugates of $\alpha$, i.e., the roots of $f$ in $\overline{\mathbb{Q}_p}$. Let $L$ be any finite extension of $\mathbb{Q}_p$ containing $\alpha$, and suppose that $[L : \mathbb{Q}_p] = m$. Then $L$ has precisely $m$ embeddings in $\overline{\mathbb{Q}_p}$ that leave the elements of $\mathbb{Q}_p$ unchanged, say $\sigma_1, \ldots, \sigma_m$. Now in the sequence $\sigma_1(\alpha), \ldots, \sigma_m(\alpha)$, each of the conjugates $\alpha^{(1)}, \ldots, \alpha^{(n)}$ occurs precisely $m/n$ times. Define the norm $N_{L/\mathbb{Q}_p}(\alpha) := \sigma_1(\alpha) \cdots \sigma_m(\alpha)$. Then
\[ |\alpha|_p = |a_n|_p^{1/n} = |\alpha^{(1)} \cdots \alpha^{(n)}|_p^{1/n} = |N_{L/\mathbb{Q}_p}(\alpha)|_p^{1/[L:\mathbb{Q}_p]}. \]
In case that $\alpha \in \mathbb{Q}_p$, the minimal polynomial of $\alpha$ is $X - \alpha$, and thus we get back our already defined $|\alpha|_p$.

Theorem 8.15. $|\cdot|_p$ defines a non-archimedean absolute value on $\overline{\mathbb{Q}_p}$.
Proof. Let $\alpha, \beta \in \overline{\mathbb{Q}_p}$, and take $L = \mathbb{Q}_p(\alpha, \beta)$. Then
\[ |\alpha\beta|_p = |N_{L/\mathbb{Q}_p}(\alpha\beta)|_p^{1/[L:\mathbb{Q}_p]} = |N_{L/\mathbb{Q}_p}(\alpha)|_p^{1/[L:\mathbb{Q}_p]}\, |N_{L/\mathbb{Q}_p}(\beta)|_p^{1/[L:\mathbb{Q}_p]} = |\alpha|_p |\beta|_p. \]
To prove that $|\alpha + \beta|_p \leq \max(|\alpha|_p, |\beta|_p)$, assume without loss of generality that $|\alpha|_p \leq |\beta|_p$ and put $\gamma := \alpha/\beta$. Then $|\gamma|_p \leq 1$, and we have to prove that $|1 + \gamma|_p \leq 1$. Let $f = X^n + a_1 X^{n-1} + \cdots + a_n$ be the minimal polynomial of $\gamma$ over $\mathbb{Q}_p$. Then $|a_n|_p = |\gamma|_p^n \leq 1$, and by Corollary 8.14, also $|a_i|_p \leq 1$ for $i = 1, \ldots, n-1$. Now the minimal polynomial of $\gamma + 1$ is $f(X - 1) = X^n + \cdots + f(-1)$ and so
\[ |\gamma + 1|_p = |f(-1)|_p^{1/n} = |(-1)^n + a_1(-1)^{n-1} + \cdots + a_n|_p^{1/n} \leq \max(1, |a_1|_p, \ldots, |a_n|_p)^{1/n} \leq 1, \]
as required.
We recall Eisenstein's irreducibility criterion for polynomials in $\mathbb{Z}_p[X]$.

Lemma 8.16. Let $f(X) = X^n + a_1 X^{n-1} + \cdots + a_{n-1}X + a_n \in \mathbb{Z}_p[X]$ be such that $a_i \equiv 0 \pmod{p}$ for $i = 1, \ldots, n$, and $a_n \not\equiv 0 \pmod{p^2}$. Then $f$ is irreducible in $\mathbb{Q}_p[X]$.

Proof. Entirely analogous to the proof of the Eisenstein criterion for polynomials in $\mathbb{Z}[X]$.

Example. Let $\alpha$ be a zero of $X^3 - 8X + 10$ in $\overline{\mathbb{Q}_2}$. The polynomial $X^3 - 8X + 10$ is irreducible in $\mathbb{Q}_2[X]$, hence it is the minimal polynomial of $\alpha$. It follows that $|\alpha|_2 = |10|_2^{1/3} = 2^{-1/3}$.

We finish with some facts which we state without proof.

Theorem 8.17. (i) Let $K$ be a finite extension of $\mathbb{Q}_p$. Then there is precisely one absolute value on $K$ whose restriction to $\mathbb{Q}_p$ is $|\cdot|_p$, and this is given by $|N_{K/\mathbb{Q}_p}(\cdot)|_p^{1/[K:\mathbb{Q}_p]}$. Further, $K$ is complete with respect to this absolute value.
(ii) $\overline{\mathbb{Q}_p}$ is not complete with respect to $|\cdot|_p$.
(iii) The completion $\mathbb{C}_p$ of $\overline{\mathbb{Q}_p}$ with respect to $|\cdot|_p$ is algebraically closed.
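The Eisenstein example in this section can be reproduced with a short computation. This is a sketch under a simplifying assumption: the polynomial is monic with coefficients in $\mathbb{Z}$ rather than in $\mathbb{Z}_p$, so that divisibility by $p$ can be tested on ordinary integers. The names `valuation`, `is_eisenstein` and `root_exponent` are illustrative.

```python
from fractions import Fraction


def valuation(n: int, p: int) -> int:
    """Largest v with p^v dividing the nonzero integer n."""
    if n == 0:
        raise ValueError("the valuation of 0 is +infinity")
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return v


def is_eisenstein(coeffs, p: int) -> bool:
    """Check the hypotheses of Lemma 8.16 for f = X^n + a_1 X^(n-1) + ... + a_n.

    `coeffs` is [1, a_1, ..., a_n] with integer entries (an assumption of this sketch).
    """
    if len(coeffs) < 2 or coeffs[0] != 1:
        return False
    lower = coeffs[1:]
    return all(a % p == 0 for a in lower) and lower[-1] % (p * p) != 0


def root_exponent(coeffs, p: int) -> Fraction:
    """For irreducible monic f, return e with |alpha|_p = p^(-e) for any root alpha.

    This uses |alpha|_p = |a_n|_p^(1/n), so e = v_p(a_n) / n.
    """
    n = len(coeffs) - 1
    return Fraction(valuation(coeffs[-1], p), n)


f = [1, 0, -8, 10]                  # X^3 - 8X + 10
print(is_eisenstein(f, 2))          # True
print(root_exponent(f, 2))          # 1/3
```

Returning the exponent as an exact fraction avoids computing the irrational number $2^{-1/3}$ in floating point.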
8.6
Exercises
In the exercises below, $p$ always denotes a prime number and convergence is with respect to $|\cdot|_p$.

Exercise 8.1. (a) Determine the $p$-adic expansion of $-1$.
(b) Let $\alpha = \sum_{k=0}^{\infty} b_k p^k$ with $b_k \in \{0, \ldots, p-1\}$ for $k \geq 0$. Determine the $p$-adic expansion of $-\alpha$.
Exercise 8.2. Let $\alpha \in \mathbb{Q}_p$, $\alpha \neq 0$. Prove that $\alpha$ has a finite $p$-adic expansion if and only if $\alpha = a/p^r$ where $a$ is a positive integer and $r$ a non-negative integer.

Exercise 8.3. Let $\alpha = \sum_{k=-k_0}^{\infty} b_k p^k \in \mathbb{Q}_p$ where $b_k \in \{0, \ldots, p-1\}$ for $k \geq -k_0$ and $b_{-k_0} \neq 0$. Suppose that the sequence $\{b_k\}_{k=-k_0}^{\infty}$ is ultimately periodic, i.e., there exist $r, s$ with $r \geq -k_0$, $s > 0$ such that $b_{k+s} = b_k$ for all $k \geq r$. Prove that $\alpha \in \mathbb{Q}$.

Exercise 8.4. Let $\alpha \in \mathbb{Z}_p$ with $|\alpha - 1|_p \leq p^{-1}$. In this exercise you are asked to define $\alpha^x$ for $x \in \mathbb{Z}_p$ and to show that this exponentiation has the expected properties. You may use without proof that the limit of the sum, product etc. of two sequences in $\mathbb{Z}_p$ is the sum, product etc. of the limits.
(a) Prove that $\left|\frac{\alpha^p - 1}{\alpha - 1}\right|_p \leq p^{-1}$.
(b) Let $u$ be a positive integer. Prove that $|\alpha^u - 1|_p \leq |u|_p |\alpha - 1|_p$. Hint. Write $u = p^m b$ where $b$ is not divisible by $p$ and use induction on $m$.
(c) Let $u, v$ be positive integers. Prove that $|\alpha^u - \alpha^v|_p \leq |u - v|_p |\alpha - 1|_p$.
(d) We now define $\alpha^x$ for $x \in \mathbb{Z}_p$ as follows. Take a sequence of positive integers $\{a_k\}_{k=0}^{\infty}$ such that $\lim_{k\to\infty} a_k = x$ and define
\[ \alpha^x := \lim_{k\to\infty} \alpha^{a_k}. \]
Prove that this is well-defined, i.e., the limit exists and is independent of the choice of the sequence $\{a_k\}_{k=0}^{\infty}$.
(e) Prove that for $x, y \in \mathbb{Z}_p$ we have $|\alpha^x - \alpha^y|_p \leq |x - y|_p |\alpha - 1|_p$. (Hint. Take sequences of positive integers converging to $x$, $y$.) Then show that if $\{x_k\}_{k=0}^{\infty}$ is a sequence in $\mathbb{Z}_p$ such that $\lim_{k\to\infty} x_k = x$ then $\lim_{k\to\infty} \alpha^{x_k} = \alpha^x$ (so the function $x \mapsto \alpha^x$ is continuous).
(f) Prove the following properties of the above defined exponentiation:
(i) $(\alpha\beta)^x = \alpha^x \beta^x$ for $\alpha, \beta \in \mathbb{Z}_p$, $x \in \mathbb{Z}_p$ with $|\alpha - 1|_p \leq p^{-1}$, $|\beta - 1|_p \leq p^{-1}$;
(ii) $\alpha^{x+y} = \alpha^x \alpha^y$, $(\alpha^x)^y = \alpha^{xy}$ for $\alpha \in \mathbb{Z}_p$ with $|\alpha - 1|_p \leq p^{-1}$, $x, y \in \mathbb{Z}_p$.

Remark. In 1935, Mahler proved the following $p$-adic analogue of the Gel'fond-Schneider Theorem: let $\alpha, \beta$ be elements of $\mathbb{Z}_p$, both algebraic over $\mathbb{Q}$, such that $|\alpha - 1|_p \leq p^{-1}$ and $\beta \notin \mathbb{Q}$. Then $\alpha^{\beta}$ is transcendental over $\mathbb{Q}$.

Exercise 8.5. Denote by $\mathbb{C}((t))$ the field of formal Laurent series
\[ \sum_{k=k_0}^{\infty} b_k t^k \]
with $k_0 \in \mathbb{Z}$, $b_k \in \mathbb{C}$ for $k \geq k_0$. We define an absolute value $|\cdot|_0$ on $\mathbb{C}((t))$ by $|0|_0 := 0$ and $|\alpha|_0 := c^{-k_0}$ ($c > 1$ some constant) where
\[ \alpha = \sum_{k=k_0}^{\infty} b_k t^k \quad \text{with } b_{k_0} \neq 0. \]
This absolute value is clearly non-archimedean.
(a) Prove that $\mathbb{C}((t))$ is complete w.r.t. $|\cdot|_0$.
(b) Define $|\cdot|_0$ on the field of rational functions $\mathbb{C}(t)$ by $|0|_0 := 0$ and $|\alpha|_0 := c^{-k_0}$ if $\alpha \neq 0$, where $k_0$ is the integer such that $\alpha = t^{k_0} f/g$ with $f, g$ polynomials not divisible by $t$. Prove that $\mathbb{C}((t))$ is the completion of $\mathbb{C}(t)$ w.r.t. $|\cdot|_0$.

Exercise 8.6. In this exercise you are asked to work out a $p$-adic analogue of Newton's method to approximate the roots of a polynomial (which is in fact a special case of Hensel's Lemma). Let $f = a_0 X^n + \cdots + a_n \in \mathbb{Z}_p[X]$. The derivative of $f$ is $f' = n a_0 X^{n-1} + \cdots + a_{n-1}$.
(a) Let $a, x \in \mathbb{Z}_p$ and suppose that $x \equiv 0 \pmod{p^m}$ for some positive integer $m$. Prove that $f(a + x) \equiv f(a) \pmod{p^m}$ and $f(a + x) \equiv f(a) + f'(a)x \pmod{p^{2m}}$. Hint. Use that $f(a + X) \in \mathbb{Z}_p[X]$.
(b) Let $x_0 \in \mathbb{Z}$ such that $f(x_0) \equiv 0 \pmod{p}$, $f'(x_0) \not\equiv 0 \pmod{p}$. Define the sequence $\{x_n\}_{n=0}^{\infty}$ recursively by
\[ x_{n+1} := x_n - \frac{f(x_n)}{f'(x_n)} \quad (n \geq 0). \]
Prove that $x_n \in \mathbb{Z}_p$, $f(x_n) \equiv 0 \pmod{p^{2^n}}$, $f'(x_n) \not\equiv 0 \pmod{p}$ for $n \geq 0$.
(c) Prove that $x_n$ converges to a zero of $f$ in $\mathbb{Z}_p$.
(d) Prove that $f$ has precisely one zero $\xi \in \mathbb{Z}_p$ such that $\xi \equiv x_0 \pmod{p}$.

Exercise 8.7. In this exercise, $p$ is a prime $> 2$.
(a) Let $d$ be a positive integer such that $d \not\equiv 0 \pmod{p}$ and $x^2 \equiv d \pmod{p}$ is solvable. Show that $x^2 = d$ is solvable in $\mathbb{Z}_p$.
(b) Let $a, b$ be two positive integers such that none of the congruence equations $x^2 \equiv a \pmod{p}$, $x^2 \equiv b \pmod{p}$ is solvable in $x \in \mathbb{Z}$. Prove that $ax^2 \equiv b \pmod{p}$ is solvable in $x \in \mathbb{Z}$. Hint. Use that the multiplicative group $(\mathbb{Z}/p\mathbb{Z})^*$ is cyclic of order $p - 1$. This implies that there is an integer $g$ such that $(\mathbb{Z}/p\mathbb{Z})^* = \{g^m \bmod p : m = 0, \ldots, p-2\}$.
(c) Let $K$ be a quadratic extension of $\mathbb{Q}_p$. Prove that $K = \mathbb{Q}_p(\sqrt{d})$ for some $d \in \mathbb{Z}_p$. Next, prove that $\mathbb{Q}_p(\sqrt{d_1}) = \mathbb{Q}_p(\sqrt{d_2})$ if and only if $d_1/d_2$ is a square in $\mathbb{Q}_p$.
(d) Determine all quadratic extensions of $\mathbb{Q}_5$.
(e) Prove that for any prime $p > 2$, $\mathbb{Q}_p$ has up to isomorphism only three distinct quadratic extensions.

Exercise 8.8. (a) Prove that $x^{p-1} = 1$ has precisely $p - 1$ solutions in $\mathbb{Z}_p$, and that these solutions are different modulo $p$.
(b) Let $S$ consist of $0$ and of the solutions in $\mathbb{Z}_p$ of $x^{p-1} = 1$. Let $\alpha \in \mathbb{Z}_p$. Prove that for any positive integer $m$, there are $\xi_0, \ldots, \xi_{m-1} \in S$ such that $\alpha \equiv \sum_{k=0}^{m-1} \xi_k p^k \pmod{p^m}$. Then prove that there is a sequence $\{\xi_k\}_{k=0}^{\infty}$ in $S$ such that $\alpha = \sum_{k=0}^{\infty} \xi_k p^k$. (This is called the Teichmüller representation of $\alpha$.)
Exercise 8.9. In this exercise you may use the following facts on $p$-adic power series (the coefficients are always in $\mathbb{Q}_p$, and $m, m' \in \mathbb{Z}$).
1) Suppose $f(x) = \sum_{n=0}^{\infty} a_n (x - x_0)^n$, $g(x) = \sum_{n=0}^{\infty} b_n (x - x_0)^n$ converge and are equal on $B(x_0, p^{-m})$. Then $a_n = b_n$ for all $n \geq 0$.
2) Suppose that for $x \in B(x_0, p^{-m})$, $f(x) = \sum_{n=0}^{\infty} a_n (x - x_0)^n$ converges and $|f(x) - f(x_0)|_p \leq p^{-m'}$. Further, suppose that $g(x) = \sum_{n=0}^{\infty} b_n (x - f(x_0))^n$ converges on $B(f(x_0), p^{-m'})$. Then the composition $g(f(x))$ can be expanded as a power series $\sum_{n=0}^{\infty} c_n (x - x_0)^n$ which converges on $B(x_0, p^{-m})$.
3) We define the derivative of $f(x) = \sum_{n=0}^{\infty} a_n (x - x_0)^n$ by
\[ f'(x) := \sum_{n=1}^{\infty} n a_n (x - x_0)^{n-1}. \]
If $f$ converges on $B(x_0, p^{m})$ then so does $f'$. The derivative satisfies the same sum rule, product rule, quotient rule and chain rule as the derivative of a function on $\mathbb{R}$, e.g., $g(f(x))' = g'(f(x)) f'(x)$.

Now define the $p$-adic exponential function and $p$-adic logarithm by
\[ \exp_p x := \sum_{n=0}^{\infty} \frac{x^n}{n!}, \qquad \log_p x := \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n} \cdot (x - 1)^n. \]
Further, let $r = 1$ if $p > 2$, $r = 2$ if $p = 2$. Prove the following properties.
(a) Prove that $\exp_p(x)$ converges and $|\exp_p(x) - 1|_p = |x|_p$ for $x \in B(0, p^{-r})$. Hint. Prove that $|x^n/n!|_p \to 0$ as $n \to \infty$, and $|x^n/n!|_p < |x|_p$ for $n \geq 2$.
(b) Prove that $\log_p(x)$ converges and $|\log_p x|_p = |x - 1|_p$ for $x \in B(1, p^{-r})$.
(c) Prove that $\exp_p(x + y) = \exp_p(x)\exp_p(y)$ for $x, y \in B(0, p^{-r})$. Hint. Fix $y$ and consider the function in $x$, $f(x) := \exp_p(y)^{-1} \exp_p(x + y)$. Then $f(x)$ can be expanded as a power series $\sum_{n=0}^{\infty} a_n x^n$. Its derivative $f'(x)$ can be computed in the same way as one would for real or complex functions. This leads to conditions on the coefficients $a_n$.
(d) Prove that $\log_p(xy) = \log_p(x) + \log_p(y)$ for $x, y \in B(1, p^{-r})$.
(e) Prove that $\log_p(\exp_p x) = x$ for $x \in B(0, p^{-r})$.
(f) Prove that $\exp_p(\log_p x) = x$ for $x \in B(1, p^{-r})$.
Chapter 9 The p-adic Subspace Theorem Literature: B. Edixhoven, J.-H. Evertse (eds.), Diophantine Approximation and Abelian Varieties, Introductory Lectures, Lecture Notes in Mathematics 1566, Springer Verlag 1993, Chap.IV
9.1
Results
The $p$-adic Subspace Theorem deals with Diophantine inequalities in which several different absolute values occur (e.g., the ordinary absolute value and $|\cdot|_{p_1}, \ldots, |\cdot|_{p_s}$ for distinct primes $p_1, \ldots, p_s$). Recall that the $p$-adic absolute value $|\cdot|_p$ has a unique continuation to $\overline{\mathbb{Q}_p}$ (the algebraic closure of $\mathbb{Q}_p$). By 'algebraic' we always mean 'algebraic over $\mathbb{Q}$'. We start with a generalization of Roth's Theorem.

Theorem 9.1. Let $p_1, \ldots, p_s$ be distinct prime numbers. Let $\alpha$ be an algebraic number in $\mathbb{R}$ and for $i = 1, \ldots, s$, let $\alpha_{p_i}$ be a number in $\mathbb{Q}_{p_i}$ which is algebraic over $\mathbb{Q}$. Finally, let $\kappa > 2$. Then the inequality
(9.1) $|\alpha - \xi| \cdot |\alpha_{p_1} - \xi|_{p_1} \cdots |\alpha_{p_s} - \xi|_{p_s} \leq H(\xi)^{-\kappa}$ in $\xi \in \mathbb{Q}$
has only finitely many solutions.

Example. Consider
\[ |\sqrt[3]{2} - \xi| \cdot |\sqrt[3]{3} - \xi|_2 \leq H(\xi)^{-\kappa} \quad \text{in } \xi \in \mathbb{Q} \]
where $\kappa > 2$. Here, $\sqrt[3]{3} = 3^{1/3} \in \mathbb{Q}_2$ is defined by Exercise 8.6. Theorem 9.1 implies that there are only finitely many $\xi \in \mathbb{Q}$ such that $\xi$ is very close to $\sqrt[3]{2}$ but 2-adically not too close to $\sqrt[3]{3}$, or conversely; and likewise only finitely many $\xi$ that are moderately close to $\sqrt[3]{2}$ and also 2-adically moderately close to $\sqrt[3]{3}$.

We now formulate the $p$-adic Subspace Theorem. This involves again absolute values $|\cdot|, |\cdot|_{p_1}, \ldots, |\cdot|_{p_s}$ and for each of these absolute values, a system of $n$ linearly independent linear forms in $X_1, \ldots, X_n$.

Theorem 9.2. Let $n \geq 2$, $\varepsilon > 0$, and let $p_1, \ldots, p_s$ be distinct prime numbers. Further, let $L_{1\infty}, \ldots, L_{n\infty}$ be linearly independent linear forms in $X_1, \ldots, X_n$ with coefficients in $\mathbb{C}$ that are algebraic over $\mathbb{Q}$, and for $j = 1, \ldots, s$, let $L_{1,p_j}, \ldots, L_{n,p_j}$ be linearly independent linear forms in $X_1, \ldots, X_n$ with coefficients in $\mathbb{Q}_{p_j}$ that are algebraic over $\mathbb{Q}$. Consider the inequality
(9.2) $|L_{1\infty}(\mathbf{x}) \cdots L_{n\infty}(\mathbf{x})| \cdot \prod_{j=1}^{s} |L_{1,p_j}(\mathbf{x}) \cdots L_{n,p_j}(\mathbf{x})|_{p_j} \leq \|\mathbf{x}\|^{-\varepsilon}$ in $\mathbf{x} \in \mathbb{Z}^n$.
Then there are a finite number of proper linear subspaces $T_1, \ldots, T_t$ of $\mathbb{Q}^n$ such that all solutions of (9.2) lie in $T_1 \cup \cdots \cup T_t$.

Proof of Theorem 9.1. Let $\xi$ be a solution of (9.1). Write $\xi = x/y$ with $x, y \in \mathbb{Z}$, $\gcd(x, y) = 1$. Multiply (9.1) with $A := (|y| \cdot |y|_{p_1} \cdots |y|_{p_s})^2$. Notice that $|y|_{p_j} \leq 1$ for $j = 1, \ldots, s$. Hence $A \leq y^2 \leq H(\xi)^2$. Let $\varepsilon = \kappa - 2$. Then (9.1) implies
\[ |(x - \alpha y)y| \cdot \prod_{j=1}^{s} |(x - \alpha_{p_j} y)y|_{p_j} \leq \max(|x|, |y|)^{-\varepsilon}. \]
The solutions $(x, y) \in \mathbb{Z}^2$ of the latter lie in only finitely many proper linear subspaces of $\mathbb{Q}^2$, which are at most one-dimensional, and each of these gives rise to a single fraction $\xi = x/y$. So (9.1) has only finitely many solutions.

Example. Let $\varepsilon > 0$. We show that the inequality
(9.3) $|2^u + 3^v - 5^w| \leq \max(|2^u|, |3^v|, |5^w|)^{1-\varepsilon}$
has only finitely many solutions in non-negative integers $u, v, w$.

Write $x_1 = 2^u$, $x_2 = 3^v$, $x_3 = 5^w$, $\mathbf{x} = (x_1, x_2, x_3)$. We first show that the set of solutions $\mathbf{x}$ lies in the union of finitely many proper linear subspaces of $\mathbb{Q}^3$. Consider for the moment those solutions for which $\|\mathbf{x}\| = |x_3|$. Notice that
\[ |x_1 x_2 x_3|_2 \cdot |x_1 x_2 x_3|_3 \cdot |x_1 x_2 x_3|_5 = 2^{-u} 3^{-v} 5^{-w} = |x_1 x_2 x_3|^{-1}. \]
In combination with (9.3), this gives
\[ |(x_1 + x_2 - x_3) x_1 x_2| \cdot |x_1 x_2 x_3|_2 \cdot |x_1 x_2 x_3|_3 \cdot |x_1 x_2 x_3|_5 \leq |x_3|^{-1} \|\mathbf{x}\|^{1-\varepsilon} \leq \|\mathbf{x}\|^{-\varepsilon}. \]
The solutions of the latter inequality lie in the union of finitely many proper linear subspaces of $\mathbb{Q}^3$. So the solutions of (9.3) with $\|\mathbf{x}\| = |x_3|$ lie in finitely many proper linear subspaces of $\mathbb{Q}^3$. In a similar way one proves that the solutions with $\|\mathbf{x}\| = |x_1|$ or with $\|\mathbf{x}\| = |x_2|$ lie in finitely many proper linear subspaces of $\mathbb{Q}^3$. It is left as an exercise to prove that if $T$ is a two-dimensional linear subspace of $\mathbb{Q}^3$ then $T$ contains only finitely many solutions of (9.3).
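The identity $|x_1x_2x_3|_2 \cdot |x_1x_2x_3|_3 \cdot |x_1x_2x_3|_5 = |x_1x_2x_3|^{-1}$ used in this example can be verified numerically for small exponents. The following is a small illustrative sketch; the helper names `padic_abs_int` and `product_of_abs` are hypothetical, and the check covers only finitely many triples $(u, v, w)$, so it is a sanity check rather than a proof.

```python
from fractions import Fraction


def padic_abs_int(n: int, p: int) -> Fraction:
    """|n|_p for an integer n, as an exact fraction."""
    if n == 0:
        return Fraction(0)
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return Fraction(1, p ** v)


def product_of_abs(u: int, v: int, w: int) -> Fraction:
    """|n| * |n|_2 * |n|_3 * |n|_5 for n = 2^u * 3^v * 5^w."""
    n = 2 ** u * 3 ** v * 5 ** w
    return n * padic_abs_int(n, 2) * padic_abs_int(n, 3) * padic_abs_int(n, 5)


# The product equals 1 for every triple of non-negative exponents tested.
print(all(product_of_abs(u, v, w) == 1
          for u in range(6) for v in range(6) for w in range(6)))
```

The same computation illustrates the criterion of Lemma 9.5 in this chapter: a rational number built only from the primes 2, 3, 5 has product of absolute values equal to 1.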
As for the basic Subspace Theorem discussed in Chapter 7, there is a version with linear forms in general position.

Theorem 9.3. Let $\varepsilon > 0$, and let $p_1, \ldots, p_s$ be distinct prime numbers. Further, let $L_{1\infty}, \ldots, L_{r\infty}$ ($r \geq n$) be linear forms in $X_1, \ldots, X_n$ in general position with coefficients in $\mathbb{C}$ that are algebraic over $\mathbb{Q}$, and for $j = 1, \ldots, s$, let $L_{1,p_j}, \ldots, L_{r_j,p_j}$ ($r_j \geq n$) be linear forms in $X_1, \ldots, X_n$ in general position with coefficients in $\mathbb{Q}_{p_j}$ that are algebraic over $\mathbb{Q}$. Consider the inequality
(9.4) $|L_{1\infty}(\mathbf{x}) \cdots L_{r\infty}(\mathbf{x})| \cdot \prod_{j=1}^{s} |L_{1,p_j}(\mathbf{x}) \cdots L_{r_j,p_j}(\mathbf{x})|_{p_j} \leq \|\mathbf{x}\|^{r-n-\varepsilon}$
in $\mathbf{x} \in \mathbb{Z}^n$ with $\gcd(x_1, \ldots, x_n) = 1$. Then there are a finite number of proper linear subspaces $T_1, \ldots, T_t$ of $\mathbb{Q}^n$ such that all solutions of (9.4) lie in $T_1 \cup \cdots \cup T_t$.

Proof. We partition the solutions $\mathbf{x} \in \mathbb{Z}^n$ of (9.4) in classes depending on which $n$ quantities among $|L_{1\infty}(\mathbf{x})|, \ldots, |L_{r\infty}(\mathbf{x})|$ are the smallest, and likewise, for $j = 1, \ldots, s$, which $n$ quantities among $|L_{1,p_j}(\mathbf{x})|_{p_j}, \ldots, |L_{r_j,p_j}(\mathbf{x})|_{p_j}$ are the smallest. It suffices to show that the solutions in a given class lie in finitely many proper linear subspaces of $\mathbb{Q}^n$.

Consider for instance the solutions $\mathbf{x} \in \mathbb{Z}^n$ such that $|L_{1\infty}(\mathbf{x})|, \ldots, |L_{n\infty}(\mathbf{x})|$ are the smallest among $|L_{1\infty}(\mathbf{x})|, \ldots, |L_{r\infty}(\mathbf{x})|$ and $|L_{1,p_j}(\mathbf{x})|_{p_j}, \ldots, |L_{n,p_j}(\mathbf{x})|_{p_j}$ are the smallest among $|L_{1,p_j}(\mathbf{x})|_{p_j}, \ldots, |L_{r_j,p_j}(\mathbf{x})|_{p_j}$ for $j = 1, \ldots, s$. Let $i \geq n + 1$. According to Lemma 7.4, there is a constant $C_i$ such that for the solutions under consideration,
\[ \|\mathbf{x}\| \leq C_i |L_{i\infty}(\mathbf{x})|. \]
Let $j \in \{1, \ldots, s\}$. Since we consider only solutions whose coordinates have gcd 1, for each solution $\mathbf{x} = (x_1, \ldots, x_n)$ under consideration, there is an index $k$ with $|x_k|_{p_j} = 1$. Since $L_{1,p_j}, \ldots, L_{n-1,p_j}, L_{i,p_j}$ span the vector space of all linear forms over $\mathbb{Q}_{p_j}$, we have $X_k = \alpha_1 L_{1,p_j} + \cdots + \alpha_{n-1} L_{n-1,p_j} + \alpha_n L_{i,p_j}$ for certain constants $\alpha_1, \ldots, \alpha_n$. So by the ultrametric inequality,
\[ 1 = |x_k|_{p_j} \leq \max\bigl(|\alpha_1|_{p_j} |L_{1,p_j}(\mathbf{x})|_{p_j}, \ldots, |\alpha_{n-1}|_{p_j} |L_{n-1,p_j}(\mathbf{x})|_{p_j},\ |\alpha_n|_{p_j} |L_{i,p_j}(\mathbf{x})|_{p_j}\bigr) \leq C_{i,p_j} |L_{i,p_j}(\mathbf{x})|_{p_j} \]
for some constant $C_{i,p_j}$. By combining these inequalities with (9.4), we obtain
\[ |L_{1\infty}(\mathbf{x}) \cdots L_{n\infty}(\mathbf{x})| \cdot \prod_{j=1}^{s} |L_{1,p_j}(\mathbf{x}) \cdots L_{n,p_j}(\mathbf{x})|_{p_j} \leq C \|\mathbf{x}\|^{-\varepsilon} \]
for some constant $C > 0$. Now apply Theorem 9.2 to the latter.

Let $F(X, Y) \in \mathbb{Z}[X, Y]$ be a square-free binary form of degree $n \geq 3$ and $p_1, \ldots, p_s$ distinct prime numbers. We consider the so-called Thue-Mahler equation
(9.5) $|F(x, y)| = p_1^{z_1} \cdots p_s^{z_s}$ in $x, y, z_1, \ldots, z_s \in \mathbb{Z}$ with $\gcd(x, y) = 1$.
Notice that if we drop the condition $\gcd(x, y) = 1$ it is possible to construct infinitely many solutions from a given solution. We prove the following.

Theorem 9.4. (Mahler, 1933). Equation (9.5) has only finitely many solutions.

We use the following important fact.

Lemma 9.5. Let $u \in \mathbb{Q}$. Then $u = \pm p_1^{w_1} \cdots p_s^{w_s}$ for certain integers $w_1, \ldots, w_s$ if and only if $|u| \cdot |u|_{p_1} \cdots |u|_{p_s} = 1$.
Proof. Trivial.

Proof of Theorem 9.4. If $F(1, 0) \neq 0$ then the form $F$ can be factored as $a_0 (X - \alpha_1 Y) \cdots (X - \alpha_n Y)$ with $\alpha_1, \ldots, \alpha_n$ distinct, while if $F(1, 0) = 0$, $F$ can be factored as $a_0 Y (X - \alpha_1 Y) \cdots (X - \alpha_{n-1} Y)$ with $\alpha_1, \ldots, \alpha_{n-1}$ distinct. In both cases, $F$ is a product of $n$ linear forms in two variables in general position. Take $\varepsilon$ with $0 < \varepsilon < n - 2$. Then by Lemma 9.5 we have for any solution $(x, y, z_1, \ldots, z_s)$ of (9.5),
\[ |F(x, y)| \cdot \prod_{j=1}^{s} |F(x, y)|_{p_j} = 1 \leq \max(|x|, |y|)^{n-2-\varepsilon}. \]
By Theorem 9.3, the set of solutions $(x, y) \in \mathbb{Z}^2$ of this inequality lies in the union of finitely many one-dimensional linear subspaces of $\mathbb{Q}^2$. Each such subspace contains at most two solutions with $\gcd(x, y) = 1$. This proves that (9.5) has only finitely many solutions.

Remark. The above proof of the finiteness of the number of solutions of the Thue-Mahler equation is based on the p-adic Subspace Theorem and is therefore ineffective. There is however an alternative, effective proof of Theorem 9.4. There are effective lower bounds for the p-adic absolute value of linear forms in p-adic logarithms of algebraic numbers, similar to those mentioned in Chapter 5. Then one can prove Theorem 9.4, with an effective upper bound for $\max(|x|, |y|)$, by combining estimates for linear forms in ‘ordinary logarithms’ with estimates for linear forms in $p_j$-adic logarithms for $j = 1, \ldots, s$.

Recall that in Chapter 5, we considered the unit equation $ax + by = 1$ where the unknowns $x, y$ are taken from the unit group $O_K^*$ of the ring of integers $O_K$ of an algebraic number field $K$. It was proved that this equation has only finitely many solutions. By Dirichlet’s Unit Theorem, the group $O_K^*$ is finitely generated, and we have $O_K^* \cong W \times \mathbb{Z}^r$ where $W$ is the group of roots of unity in $K$ (which is finite), and where $r$ is the unit rank. Recall that $r = r_1 + r_2 - 1$ where $r_1$ is the number of embeddings $K \to \mathbb{R}$ and $r_2$ the number of complex conjugate pairs of embeddings $\sigma, \bar{\sigma} : K \to \mathbb{C}$, where $\bar{\sigma}$ is the composition of $\sigma$ and complex conjugation.
We consider a much more general situation where $x, y$ are taken from an arbitrary finitely generated multiplicative group in an arbitrary field of characteristic 0. For such a finitely generated group $\Gamma$ we have $\Gamma \cong \Gamma_{\mathrm{tors}} \times \mathbb{Z}^r$ where $\Gamma_{\mathrm{tors}}$ is the (necessarily finite) torsion subgroup of $\Gamma$, consisting of roots of unity. Thus,
\[ \Gamma = \{ \zeta g_1^{m_1} \cdots g_r^{m_r} : \zeta \in \Gamma_{\mathrm{tors}},\ m_1, \ldots, m_r \in \mathbb{Z} \} \tag{9.6} \]
for certain generators $g_1, \ldots, g_r$.

Theorem 9.6. (Lang, 1960). Let $K$ be any field of characteristic 0, let $a, b$ be non-zero elements of $K$, and let $\Gamma$ be a finitely generated subgroup of the multiplicative group $K^*$ of $K$. Then the equation
\[ ax + by = 1 \quad \text{in } x, y \in \Gamma \tag{9.7} \]
has only finitely many solutions.

Lang’s proof is ineffective. From Theorem 5.17, which we proved in Chapter 5, one can derive an effective proof of the above theorem in the special case that $\Gamma$ is a subgroup of $\mathbb{Q}^*$ and that $a, b \in \mathbb{Q}^*$. We now give another, but ineffective, proof of this result. Let $g_1, \ldots, g_r$ be a set of generators of $\Gamma$ as in (9.6). Let $p_1, \ldots, p_s$ be primes such that the numerators and denominators of $a, b, g_1, \ldots, g_r$ are composed of primes from $p_1, \ldots, p_s$. Write $ax = u/w$, $by = v/w$, where $u, v, w$ are integers, necessarily composed of primes from $p_1, \ldots, p_s$, with $\gcd(u, v, w) = 1$ and $u + v = w$. Now clearly, we have
\[ |uv(u + v)| = p_1^{z_1} \cdots p_s^{z_s}, \quad \gcd(u, v) = 1 \]
for certain non-negative integers $z_1, \ldots, z_s$. This is a Thue-Mahler equation. Therefore there are only finitely many possibilities for the pair $(u, v)$, hence for $(u, v, w)$, hence for $(x, y)$.

Remark. In case that the group $\Gamma$ is contained in an algebraic number field $K$, it is possible to give an effective proof of Theorem 9.6, see Theorem 5.18. If the degree of $K$ and the number of generators of $\Gamma$ are not too large, there is a practical algorithm to determine all solutions.

Example. Let $\Gamma$ be the multiplicative group generated by 2, 3, 5, 7, 11, 13 and consider the equation
\[ x + y = 1 \quad \text{in } x, y \in \Gamma \text{ with } x \leq y. \tag{9.8} \]
We give some solutions:
\[ \left(\tfrac{1}{2}, \tfrac{1}{2}\right), \quad \left(\tfrac{3}{7}, \tfrac{4}{7}\right), \quad \left(\tfrac{2}{13}, \tfrac{11}{13}\right), \quad \left(\tfrac{3993}{20800}, \tfrac{16807}{20800}\right) = \left(\frac{3 \cdot 11^3}{2^6 \cdot 5^2 \cdot 13}, \frac{7^5}{2^6 \cdot 5^2 \cdot 13}\right). \]
In his thesis of 1988, de Weger determined all 545 solutions of (9.8).
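The four solutions listed in the example can be checked mechanically. The following Python sketch is only a verification aid and has nothing to do with de Weger's method for finding all solutions. The helper names `strip_primes`, `in_gamma` and `is_solution` are ours and purely illustrative; membership in $\Gamma$ is tested by removing the primes 2, ..., 13 from numerator and denominator.

```python
from fractions import Fraction

PRIMES = (2, 3, 5, 7, 11, 13)  # generators of Gamma in the example


def strip_primes(n, primes=PRIMES):
    """Divide the positive integer n by the given primes as often as possible."""
    for p in primes:
        while n % p == 0:
            n //= p
    return n


def in_gamma(q, primes=PRIMES):
    """True if q is a positive rational whose numerator and denominator
    are composed of the given primes only (illustrative helper)."""
    q = Fraction(q)
    return (q > 0
            and strip_primes(q.numerator, primes) == 1
            and strip_primes(q.denominator, primes) == 1)


def is_solution(x, y):
    """True if (x, y) solves x + y = 1 with x, y in Gamma and x <= y."""
    return in_gamma(x) and in_gamma(y) and x + y == 1 and x <= y


listed = [
    (Fraction(1, 2), Fraction(1, 2)),
    (Fraction(3, 7), Fraction(4, 7)),
    (Fraction(2, 13), Fraction(11, 13)),
    (Fraction(3993, 20800), Fraction(16807, 20800)),
]
checks = [is_solution(x, y) for x, y in listed]
print(checks)  # [True, True, True, True]
```

Exact rational arithmetic via `Fraction` avoids any rounding issue in the test $x + y = 1$.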
9.2
Further applications
Let $K$ be a field of characteristic 0 and $\Gamma$ a finitely generated subgroup of $K^*$. Further, let $n \geq 2$ and $\alpha_1, \ldots, \alpha_n \in K^*$. We consider the equation
\[ \alpha_1 x_1 + \cdots + \alpha_n x_n = 1 \quad \text{in } x_1, \ldots, x_n \in \Gamma. \tag{9.9} \]
If $n \geq 3$ this equation may have infinitely many solutions. For instance, let $2 \leq m < n$ and suppose (9.9) has a solution $(x_1, \ldots, x_n)$ with
\[ \alpha_1 x_1 + \cdots + \alpha_m x_m = 1, \quad \alpha_{m+1} x_{m+1} + \cdots + \alpha_n x_n = 0. \]
Then for every $u \in \Gamma$, the tuple $(x_1, \ldots, x_m, u x_{m+1}, \ldots, u x_n)$ is also a solution of (9.9). Assuming the group $\Gamma$ is infinite, we obtain in this way infinitely many solutions of (9.9). More generally, we can construct infinitely many solutions from a given solution $(x_1, \ldots, x_n)$ with a vanishing subsum $\sum_{i \in I} \alpha_i x_i = 0$ for some non-empty subset $I$ of $\{1, \ldots, n\}$. To make such easy constructions of infinite sets of solutions impossible, we consider only solutions without vanishing subsums.

Definition. A solution $(x_1, \ldots, x_n)$ of (9.9) is called non-degenerate if
\[ \sum_{i \in I} \alpha_i x_i \neq 0 \quad \text{for each non-empty subset } I \text{ of } \{1, \ldots, n\}. \]
Theorem 9.7. (Van der Poorten, Schlickewei, Laurent, E., 1980’s) Equation (9.9) has only finitely many non-degenerate solutions.

Roughly speaking, the proof consists of two steps. In the first step one makes a reduction from the general case that $K$ is a field of characteristic 0 to the special case that $K$ is an algebraic number field by using techniques from algebraic geometry.
To treat the case that $\Gamma$ is contained in an algebraic number field one has to apply the ‘p-adic Subspace Theorem over number fields,’ which is a generalization of the p-adic Subspace Theorem which involves absolute values on an algebraic number field and in which the unknowns are algebraic integers of that number field. Since in these notes we have only the p-adic Subspace Theorem over $\mathbb{Q}$ at our disposal, we assume henceforth $\Gamma \subset \mathbb{Q}^*$, $\alpha_1, \ldots, \alpha_n \in \mathbb{Q}^*$ and prove Theorem 9.7 in this special case.

Lemma 9.8. There are finitely many proper linear subspaces $T_1, \ldots, T_t$ of $\mathbb{Q}^n$ such that the set of solutions $(x_1, \ldots, x_n)$ of (9.9) (non-degenerate or not) lies in $T_1 \cup \cdots \cup T_t$.

Proof. We use the ‘general position version’ of the p-adic Subspace Theorem. There are elements $g_1, \ldots, g_r$ of $\mathbb{Q}^*$ such that every element of $\Gamma$ can be expressed as $\pm g_1^{u_1} \cdots g_r^{u_r}$ with $u_1, \ldots, u_r \in \mathbb{Z}$. Let $p_1, \ldots, p_s$ be the prime numbers occurring in the numerators and denominators of $\alpha_1, \ldots, \alpha_n, g_1, \ldots, g_r$. Take a solution $(x_1, \ldots, x_n)$ of (9.9) and write
\[ \alpha_i x_i = \frac{y_i}{w} \quad (i = 1, \ldots, n) \]
where $y_1, \ldots, y_n, w$ are integers with $\gcd(y_1, \ldots, y_n, w) = 1$. Further write $y = (y_1, \ldots, y_n)$. Clearly, $y_1 + \cdots + y_n = w$ and $y_1, \ldots, y_n, w$ are composed of primes from $p_1, \ldots, p_s$. This implies
\[ |y_1 \cdots y_n (y_1 + \cdots + y_n)| \cdot \prod_{j=1}^{s} |y_1 \cdots y_n (y_1 + \cdots + y_n)|_{p_j} = 1 \leq \|y\|^{(n+1)-n-\varepsilon} \tag{9.10} \]
where $0 < \varepsilon < 1$. The linear forms $y_1, \ldots, y_n, y_1 + \cdots + y_n$ are in general position. So by the general position version of the p-adic Subspace Theorem, the set of solutions $y = (y_1, \ldots, y_n) \in \mathbb{Z}^n$ of (9.10) lies in the union of finitely many proper linear subspaces of $\mathbb{Q}^n$. If for instance $y$ lies in a subspace with equation $a_1 y_1 + \cdots + a_n y_n = 0$ then $(x_1, \ldots, x_n)$ lies in the subspace $a_1 \alpha_1 x_1 + \cdots + a_n \alpha_n x_n = 0$. This proves the lemma.

Lemma 9.9. There is a finite set $U \subset \mathbb{Q}^*$ such that for every solution $(x_1, \ldots, x_n)$ of (9.9) (non-degenerate or not) there are distinct indices $i, j \in \{1, \ldots, n\}$ with $x_i/x_j \in U$.
Proof. We proceed by induction on $n$. First let $n = 2$. The solutions $(x_1, x_2)$ lie in finitely many subspaces $a_1 x_1 + a_2 x_2 = 0$. Thus for the quotient $x_1/x_2$ there are only finitely many possible values. Now let $n \geq 3$ and assume that the lemma is true for equations of type (9.9) in fewer than $n$ variables. By the previous lemma, there are proper linear subspaces $T_1, \ldots, T_t$ of $\mathbb{Q}^n$ such that the solutions of (9.9) lie in $T_1 \cup \cdots \cup T_t$. Consider the solutions in $T \in \{T_1, \ldots, T_t\}$. Suppose that $T$ is given by an equation $a_1 x_1 + \cdots + a_n x_n = 0$ with one of the coefficients, say $a_n$, not equal to 0. By expressing $x_n$ as a linear combination of $x_1, \ldots, x_{n-1}$ and substituting this into (9.9), we obtain an equation $\alpha_1' x_1 + \cdots + \alpha_{n-1}' x_{n-1} = 1$. By the induction hypothesis, there is a finite set $U_T$ such that for every solution of the latter equation there are distinct indices $i, j \in \{1, \ldots, n-1\}$ with $x_i/x_j \in U_T$ (this holds true even if some of the coefficients $\alpha_j'$ are 0). Now the lemma holds with $U := U_{T_1} \cup \cdots \cup U_{T_t}$.

Proof of Theorem 9.7. We proceed again by induction on $n$, starting with $n = 1$. For $n = 1$ the assertion is trivial. Let $n \geq 2$ and suppose Theorem 9.7 is true for equations in fewer than $n$ variables. Suppose the set $U$ from the previous lemma is $\{\beta_1, \ldots, \beta_m\}$. Then the non-degenerate solutions $(x_1, \ldots, x_n)$ of (9.9) can be divided into finitely many sets $S_{ijk}$ ($i, j = 1, \ldots, n$, $i \neq j$, $k = 1, \ldots, m$), where $S_{ijk}$ is the set of solutions with $x_i/x_j = \beta_k$. We have to show that each of these sets is finite. Consider for instance the solutions in $S_{n,n-1,1}$, i.e., with $x_n/x_{n-1} = \beta_1$. These solutions satisfy
\[ \alpha_1 x_1 + \cdots + \alpha_{n-2} x_{n-2} + (\alpha_{n-1} + \beta_1 \alpha_n) x_{n-1} = 1. \tag{9.11} \]
We may assume that $\alpha_{n-1} + \beta_1 \alpha_n \neq 0$ (otherwise the solutions in $S_{n,n-1,1}$ would have $\alpha_{n-1} x_{n-1} + \alpha_n x_n = 0$ and such solutions have been excluded). Further, it is easy to check that if each subsum of the left-hand side of (9.9) is non-zero, then so is each subsum of the left-hand side of (9.11). By the induction hypothesis, (9.11) has only finitely many non-degenerate solutions $(x_1, \ldots, x_{n-1})$. Hence the set $S_{n,n-1,1}$ contains only finitely many non-degenerate solutions of (9.9). The same reasoning applies for the other sets $S_{ijk}$. This finishes the induction step.
We now deal with linear recurrence sequences. A sequence $U = \{u_h\}_{h=0}^{\infty}$ with terms in $\mathbb{C}$ is called a linear recurrence sequence if it is given by a linear recurrence
\[ u_h = c_1 u_{h-1} + \cdots + c_k u_{h-k} \quad \text{for } h \geq k, \tag{9.12} \]
where $c_1, \ldots, c_k$ are constants in $\mathbb{C}$ and $c_k \neq 0$, and by initial values $u_0, \ldots, u_{k-1}$. Given a linear recurrence sequence $U$, there are various linear recurrences which it may satisfy but there is a unique one with minimal length $k$ (exercise). This $k$ is called the order of the linear recurrence sequence $U$, and the polynomial $f_U(X) = X^k - c_1 X^{k-1} - \cdots - c_k$ the companion polynomial of $U$.

Theorem 9.10. Let $U = \{u_h\}_{h=0}^{\infty}$ be a linear recurrence sequence in $\mathbb{C}$ with companion polynomial $f_U(X) = X^k - c_1 X^{k-1} - \cdots - c_k$. Write $f_U(X) = (X - \theta_1)^{e_1} \cdots (X - \theta_m)^{e_m}$, where $\theta_1, \ldots, \theta_m$ are distinct complex numbers and $e_1, \ldots, e_m$ positive integers. Then there are polynomials $g_1, \ldots, g_m \in \mathbb{C}[X]$ of degrees at most $e_1 - 1, \ldots, e_m - 1$, respectively, such that
\[ u_h = g_1(h) \theta_1^h + \cdots + g_m(h) \theta_m^h \quad \text{for } h \geq 0. \tag{9.13} \]
Conversely, any sequence satisfying (9.13) is a linear recurrence sequence.

Proof. Consider the power series
\[ y(z) = \sum_{h=0}^{\infty} \frac{u_h}{h!} z^h. \]
One proves easily by induction on $h$ that there is a constant $C > 0$ such that $|u_h| \leq C^h$ for all $h \geq 0$. Hence $y(z)$ converges, and thus defines an analytic function, everywhere on $\mathbb{C}$. Using that the sequence $U$ satisfies recurrence relation (9.12), it follows easily that $y$ satisfies the linear differential equation
\[ y^{(k)} = c_1 y^{(k-1)} + \cdots + c_{k-1} y' + c_k y. \]
By the theory of linear differential equations, the set of solutions of the latter equation is a complex vector space with basis $\{z^j e^{\theta_i z} : i = 1, \ldots, m,\ j = 0, \ldots, e_i - 1\}$. Hence there are $c_{ij} \in \mathbb{C}$ such that
\[
\begin{aligned}
y(z) &= \sum_{i=1}^{m} \sum_{j=0}^{e_i - 1} c_{ij} z^j e^{\theta_i z} \\
&= \sum_{i=1}^{m} \sum_{j=0}^{e_i - 1} c_{ij} \sum_{l=0}^{\infty} \theta_i^l \frac{z^{l+j}}{l!} \\
&= \sum_{h=0}^{\infty} \left\{ \sum_{i=1}^{m} \left( \sum_{j=0}^{e_i - 1} c_{ij}\, h(h-1) \cdots (h-j+1)\, \theta_i^{-j} \right) \theta_i^h \right\} \frac{z^h}{h!}.
\end{aligned}
\]
This implies that $\{u_h\}_{h=0}^{\infty}$ satisfies (9.13). Conversely, if $\{u_h\}_{h=0}^{\infty}$ satisfies (9.13) then by reversing the above argument one shows that $y(z) = \sum_{h=0}^{\infty} (u_h/h!) z^h$ satisfies a linear differential equation with constant coefficients, and subsequently that $\{u_h\}_{h=0}^{\infty}$ is a linear recurrence sequence.
Example. Let $U = \{u_h\}_{h=0}^{\infty}$ be given by $u_h = 10u_{h-1} - 31u_{h-2} + 30u_{h-3}$ ($h \geq 3$), $u_0 = 1$, $u_1 = 0$, $u_2 = -12$. The companion polynomial of $U$ is given by $f_U(X) = X^3 - 10X^2 + 31X - 30 = (X - 2)(X - 3)(X - 5)$. By Theorem 9.10 there are constants $c_1, c_2, c_3$ such that $u_h = c_1 2^h + c_2 3^h + c_3 5^h$. Substituting $h = 0, 1, 2$ one obtains $c_1 = 1$, $c_2 = 1$, $c_3 = -1$ and $u_h = 2^h + 3^h - 5^h$.

The zero set of a linear recurrence sequence $U = \{u_h\}_{h=0}^{\infty}$ is defined by $Z_U := \{h \in \mathbb{Z}_{\geq 0} : u_h = 0\}$ and the zero multiplicity of $U$ is $N_U := \# Z_U$. With the notation from Theorem 9.10, the set $Z_U$ is the set of solutions of
\[ g_1(h) \theta_1^h + \cdots + g_m(h) \theta_m^h = 0 \quad \text{in } h \in \mathbb{Z}_{\geq 0}. \tag{9.14} \]
This is called an exponential-polynomial equation. A linear recurrence sequence $U = \{u_h\}_{h=0}^{\infty}$ is called non-degenerate if the zeros $\theta_1, \ldots, \theta_m$ of its companion polynomial are such that none of the quotients $\theta_i/\theta_j$ ($1 \leq i < j \leq m$) is a root of unity.
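The example sequence $u_h = 10u_{h-1} - 31u_{h-2} + 30u_{h-3}$ with $u_0 = 1$, $u_1 = 0$, $u_2 = -12$ can be checked numerically. The Python sketch below is only an illustration; the function names `recurrence_terms` and `closed_form` and the cut-off `N = 30` are our own choices. It compares the recurrence with the exponential-polynomial expression $2^h + 3^h - 5^h$ and lists the zeros among the first `N` terms.

```python
def recurrence_terms(coeffs, initial, count):
    """First `count` terms of u_h = c_1 u_{h-1} + ... + c_k u_{h-k},
    where coeffs = [c_1, ..., c_k] and initial = [u_0, ..., u_{k-1}]."""
    terms = list(initial)
    while len(terms) < count:
        # terms[-i] is u_{h-i} when the next term to compute is u_h.
        terms.append(sum(c * terms[-i] for i, c in enumerate(coeffs, start=1)))
    return terms[:count]


def closed_form(h):
    """The expression 2^h + 3^h - 5^h from the example."""
    return 2**h + 3**h - 5**h


N = 30  # number of terms to inspect (arbitrary cut-off)
terms = recurrence_terms([10, -31, 30], [1, 0, -12], N)
matches = terms == [closed_form(h) for h in range(N)]
zeros = [h for h, u in enumerate(terms) if u == 0]

print(matches)  # True
print(zeros)    # [1]
```

Python integers have arbitrary precision, so the comparison is exact. Here the finite check happens to find the whole zero set: $5^h > 2^h + 3^h$ for all $h \geq 2$, so $h = 1$ is the only zero.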
Theorem 9.11. (Skolem-Mahler-Lech, 1953) Let $U$ be a non-degenerate linear recurrence sequence. Then its zero set is finite. Stated equivalently, if $\theta_1, \ldots, \theta_m$ are non-zero complex numbers such that none of the quotients $\theta_i/\theta_j$ ($1 \leq i, j \leq m$, $i \neq j$) is a root of unity and if $g_1(X), \ldots, g_m(X)$ are polynomials in $\mathbb{C}[X]$, not all equal to 0, then Eq. (9.14) has only finitely many solutions.

There are two very different proofs. In the first proof, which was the one given by Skolem, Mahler and Lech, one ‘maps’ the linear recurrence sequence to a sequence with terms in $\mathbb{Q}_p$ for a suitable prime $p$ and then uses techniques from p-adic analysis. In the second proof, one ‘maps’ the linear recurrence sequence to a sequence with terms in an algebraic number field, and then applies the p-adic Subspace Theorem over number fields.

Here we prove Theorem 9.11 in the special case that the companion polynomial $f_U$ of $U = \{u_h\}_{h=0}^{\infty}$ does not have multiple zeros, i.e., in Theorem 9.10 we have $e_1 = \cdots = e_m = 1$. Then the polynomials $g_i(h)$ in (9.13) have degree 0, so $u_h = \sum_{i=1}^{m} g_i \theta_i^h$ for $h \geq 0$ where the $g_i$ are constants, which we may assume to be non-zero after discarding the terms with $g_i = 0$. That is, we have to show that the equation
\[ g_1 \theta_1^h + \cdots + g_m \theta_m^h = 0 \]
has finitely many solutions in $h \in \mathbb{Z}_{\geq 0}$. We proceed by induction on $m$. For $m = 1$ there are no solutions and we are done. Let $m \geq 2$ and suppose the theorem is true if we have fewer than $m$ terms. Let $a_i := -g_i/g_m$, $\beta_i := \theta_i/\theta_m$. Then the equation reduces to
\[ a_1 \beta_1^h + \cdots + a_{m-1} \beta_{m-1}^h = 1. \tag{9.15} \]
Further, none of the numbers $\beta_i$, nor any of the quotients $\beta_i/\beta_j$ ($i \neq j$), is a root of unity. We apply Theorem 9.7 with the group $\Gamma$ generated by $\beta_1, \ldots, \beta_{m-1}$. Since no $\beta_i$ is a root of unity, distinct integers $h$ give distinct tuples $(\beta_1^h, \ldots, \beta_{m-1}^h)$. It follows that there are only finitely many integers $h$ which satisfy (9.15) and for which none of the subsums of the left-hand side of (9.15) vanishes, i.e.,
\[ \sum_{i \in I} a_i \beta_i^h \neq 0 \quad \text{for each non-empty subset } I \text{ of } \{1, \ldots, m-1\}. \]
But by the induction hypothesis, each equation $\sum_{i \in I} a_i \beta_i^h = 0$ has only finitely many solutions $h$. So altogether, (9.15) has only finitely many solutions $h$.

Remark. Using a much refined version of the p-adic Subspace Theorem over number fields, Schmidt proved the following:

Theorem 9.12. (Schmidt, 2000) Let $U$ be a non-degenerate linear recurrence sequence with terms in $\mathbb{C}$ of order $k$. Then for its zero multiplicity we have $N_U \leq \exp \exp \exp(20k)$.

This has been improved by Amoroso and Viada (2011) to $N_U \leq \exp \exp(70k)$. Bavencoffe and Bézivin (Une Famille Remarquable de Suites Récurrentes Linéaires, Monatshefte für Mathematik 120 (1995), 189–203) found examples of non-degenerate linear recurrence sequences $U$ of arbitrarily large order $k$, having $N_U \geq \frac{1}{2} k^2 - \frac{1}{2} k + 1$; no linear recurrence sequences of order $k$ with larger zero multiplicity are known. In fact, let
\[ P_k(X) := \frac{X^{k+1} + (-2)^{k-1} X + (-2)^k}{X + 2}; \]
verify that $P_k(X) \in \mathbb{Z}[X]$. Let $U = \{u_n\}_{n=0}^{\infty}$ be the linear recurrence sequence with companion polynomial $P_k$ and initial values $u_0 = \cdots = u_{k-2} = 0$, $u_{k-1} = 1$. Bavencoffe and Bézivin proved that $U$ is non-degenerate, and moreover, that $u_n = 0$ for
\[ n = l(k+1) + q \ \text{ with } l \geq 0,\ q \geq 0,\ l + q \leq k - 2, \qquad n = j(2k+1) \ \text{ with } 1 \leq j \leq k - 1. \]
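The claim $P_k(X) \in \mathbb{Z}[X]$ amounts to saying that $X + 2$ divides the numerator $X^{k+1} + (-2)^{k-1} X + (-2)^k$, i.e., that the numerator vanishes at $X = -2$. The following Python sketch checks this for small $k$ by synthetic division. It is a numerical sanity check, not a proof, and the function name `p_k_coefficients` is our own.

```python
def p_k_coefficients(k):
    """Integer coefficients of P_k(X), highest degree first, obtained by
    dividing X^(k+1) + (-2)^(k-1) X + (-2)^k by X + 2 (requires k >= 1)."""
    # Numerator coefficients for X^(k+1), X^k, ..., X^2, X, 1.
    numerator = [1] + [0] * (k - 1) + [(-2) ** (k - 1), (-2) ** k]
    # Synthetic (Horner) division by X - (-2).
    quotient = []
    carry = 0
    for coeff in numerator:
        carry = coeff + carry * (-2)
        quotient.append(carry)
    remainder = quotient.pop()  # value of the numerator at X = -2
    if remainder != 0:
        raise ValueError("X + 2 does not divide the numerator")
    return quotient


print(p_k_coefficients(3))  # [1, -2, 4, -4], i.e. X^3 - 2X^2 + 4X - 4

# P_k should be monic of degree k for every k tested.
degrees_ok = all(len(p_k_coefficients(k)) == k + 1 for k in range(1, 21))
print(degrees_ok)  # True
```

Since the divisor $X + 2$ is monic, the quotient automatically has integer coefficients once the remainder is 0.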
9.3
Exercises
Exercise 9.1. Let $p_1, p_2, p_3$ be distinct prime numbers, $A_1, A_2, A_3$ non-zero integers, and $\varepsilon > 0$. Prove that the inequality
\[ |A_1 p_1^{u_1} + A_2 p_2^{u_2} + A_3 p_3^{u_3}| \leq \max(p_1^{u_1}, p_2^{u_2}, p_3^{u_3})^{1-\varepsilon} \]
has only finitely many solutions in non-negative integers $u_1, u_2, u_3$.

Exercise 9.2. Let $p$ be a prime number, $\alpha$ a real algebraic number and $\varepsilon > 0$.
(1) Prove that the inequality
\[ \left| \alpha - \frac{x}{p^u} \right| \leq \max(x, p^u)^{-1-\varepsilon} \]
has only finitely many solutions in integers $x, u$ with $u > 0$.
(2) Prove that the inequality
\[ \left| \alpha - \frac{x}{p^u - 1} \right| \leq \max(x, p^u)^{-1-\varepsilon} \]
has only finitely many solutions in integers $x, u$ with $u > 0$.

Exercise 9.3. Let $\varepsilon > 0$. Prove that the inequality
\[ \left| \left( \frac{3}{2} \right)^n - u \right| \leq e^{-\varepsilon n} \]
has only finitely many solutions in non-negative integers $n, u$.
Hint. Let $x = 3^n$, $y = u 2^n$ and apply in an appropriate way the p-adic Subspace Theorem.

Exercise 9.4. Let $f(X) = a_0 X^n + a_1 X^{n-1} + \cdots + a_n \in \mathbb{Z}[X]$ be a square-free polynomial, i.e., without multiple zeros, and let $p_1, \ldots, p_s$ be distinct prime numbers. We consider the equation
\[ |f(\xi)| = p_1^{z_1} \cdots p_s^{z_s} \quad \text{in } \xi \in \mathbb{Q},\ z_1, \ldots, z_s \in \mathbb{Z}. \tag{9.16} \]
(1) Let $(\xi, z_1, \ldots, z_s)$ be a solution of (9.16). Prove that $|\xi|_p \leq 1$ for every prime $p$ with $p \notin \{p_1, \ldots, p_s\}$, $p \nmid a_0$.
(2) Let $n \geq 2$. Prove that (9.16) has only finitely many solutions. What if $n = 1$?
Hint. Write $\xi = x/y$ with $x, y \in \mathbb{Z}$, $\gcd(x, y) = 1$ and reduce (9.16) to a Thue-Mahler equation.

Exercise 9.5. Let $p$ be a prime number, $\alpha \in \mathbb{Z}_p$, $\alpha \notin \mathbb{Q}$.
(1) Prove that for every positive integer $m$ there are integers $x, y$, not both 0, such that $|x - \alpha y|_p \leq p^{-2m}$, $|x| \leq p^m$, $|y| \leq p^m$.
Hint. Choose a positive integer $a$ such that $|\alpha - a|_p \leq p^{-2m}$ and show that if $x, y$ is a solution then $x - ay = p^{2m} u$ for some $u \in \mathbb{Z}$.
(2) Prove that the inequality $|x - \alpha y|_p \leq \max(|x|, |y|)^{-2}$ has infinitely many solutions in $(x, y) \in \mathbb{Z}^2$.
(3) Suppose that $\alpha$ is algebraic and let $\varepsilon > 0$. Prove that the inequality $|x - \alpha y|_p \leq \max(|x|, |y|)^{-2-\varepsilon}$ has only finitely many solutions in $(x, y) \in \mathbb{Z}^2$.

Exercise 9.6. For a finite set of primes $S = \{p_1, \ldots, p_s\}$, denote by $U_S$ the set of integers of the shape $\pm p_1^{u_1} \cdots p_s^{u_s}$ with $u_1, \ldots, u_s \in \mathbb{Z}_{\geq 0}$. Let $S_0, \ldots, S_n$ be pairwise disjoint sets of prime numbers, and $a_0, \ldots, a_n$ non-zero integers. Prove that the equation
\[ a_0 x_0 + \cdots + a_n x_n = 0 \quad \text{in } x_0 \in U_{S_0}, \ldots, x_n \in U_{S_n} \]
has only finitely many solutions.

Exercise 9.7. Let $U = \{u_h\}_{h=0}^{\infty}$ be a linear recurrence sequence with terms in $\mathbb{C}$.
(1) Prove that the following two assertions are equivalent:
(a) $u_h = c_1 u_{h-1} + \cdots + c_k u_{h-k}$ for all $h \geq k$;
(b) $\sum_{h=0}^{\infty} u_h X^h = g(X)/h(X)$, where $h(X) = 1 - c_1 X - \cdots - c_k X^k$ and $g(X)$ is a polynomial of degree at most $k - 1$.
(2) Let $I_U$ be the set of all polynomials $d_0 X^m + \cdots + d_m \in \mathbb{C}[X]$ ($m \geq 0$, $d_0, \ldots, d_m \in \mathbb{C}$) such that $d_0 u_h + d_1 u_{h-1} + \cdots + d_m u_{h-m} = 0$ for all $h \geq m$. Prove that $I_U$ is an ideal of the ring $\mathbb{C}[X]$, generated by the companion polynomial of $U$.
(3) Give a necessary and sufficient condition, in terms of the companion polynomial of $U$, such that $U$ is periodic (i.e., there is $m > 0$ such that $u_{h+m} = u_h$ for all $h \geq 0$).
(4) Give an example of a non-periodic linear recurrence sequence $U = \{u_h\}_{h=0}^{\infty}$ such that $Z_U = \{h \in \mathbb{Z}_{\geq 0} : u_h = 0\}$ is infinite.
Exercise 9.8. An arithmetic progression is a sequence $a, a + d, a + 2d, \ldots$ where $a, d$ are integers with $d > 0$. Let $U = \{u_h\}_{h=0}^{\infty}$ be a linear recurrence sequence with terms in $\mathbb{C}$. We do not assume that $U$ is non-degenerate. Assuming the Skolem-Mahler-Lech Theorem, prove that either $Z_U$ is finite, or $Z_U$ is the union of a finite set and a finite number of arithmetic progressions.
Hint. Assume that $U$ is degenerate and let $\theta_1, \ldots, \theta_m$ be the roots of the companion polynomial of $U$. Let $N$ be a positive integer such that all roots of unity among the quotients $\theta_i/\theta_j$ have order dividing $N$. Consider the sequences $\{u_{hN+i}\}_{h=0}^{\infty}$ ($i = 0, \ldots, N - 1$).

Exercise 9.9. A linear recurrence sequence $U = \{u_h\}_{h=0}^{\infty}$ is called strongly non-degenerate if for the zeros $\theta_1, \ldots, \theta_m$ of the companion polynomial of $U$, neither any of the numbers $\theta_i$ ($i = 1, \ldots, m$), nor any of the quotients $\theta_i/\theta_j$ ($1 \leq i, j \leq m$, $i \neq j$) is a root of unity.
(1) Let $U$ be a strongly non-degenerate linear recurrence sequence with terms in $\mathbb{C}$. Prove that for every $a \in \mathbb{C}$, the set $Z_U(a) := \{h \in \mathbb{Z}_{\geq 0} : u_h = a\}$ is finite.
(2) Let $U = \{u_h\}_{h=0}^{\infty}$ be a linear recurrence sequence with companion polynomial $f(X) = (X - \theta_1)(X - \theta_2)$ where none of $\theta_1, \theta_2, \theta_1/\theta_2$ is a root of unity. Prove that the set $T_U := \{(h, l) \in \mathbb{Z}^2 : u_h = u_l,\ 0 < h < l\}$ is finite.
Hint. Use Theorem 9.7.
Remark. One can show that $T_U$ is finite for every strongly non-degenerate linear recurrence sequence $U$.