English Pages 354 Year 2016
Volker Diekert, Manfred Kufleitner, Gerhard Rosenberger, Ulrich Hertrampf Discrete Algebraic Methods De Gruyter Graduate
Volker Diekert, Manfred Kufleitner, Gerhard Rosenberger, Ulrich Hertrampf
Discrete Algebraic Methods | Arithmetic, Cryptography, Automata and Groups
Mathematics Subject Classification 2010 11-01, 11Y11, 12-01, 14H52, 20E06, 20M05, 20M35, 68Q45, 68Q70, 68R15, 94A60
Authors
Prof. Dr. Volker Diekert, University of Stuttgart, Department of Computer Science, Universitätsstr. 38, 70569 Stuttgart, Germany
Prof. Dr. Gerhard Rosenberger, University of Hamburg, Department of Mathematics, Bundesstr. 55, 20146 Hamburg, Germany
PD Dr. Manfred Kufleitner, University of Stuttgart, Department of Computer Science, Universitätsstr. 38, 70569 Stuttgart, Germany
Prof. Dr. Ulrich Hertrampf, University of Stuttgart, Department of Computer Science, Universitätsstr. 38, 70569 Stuttgart, Germany
ISBN 978-3-11-041332-8 e-ISBN (PDF) 978-3-11-041333-5 e-ISBN (EPUB) 978-3-11-041632-9 Library of Congress Cataloging-in-Publication Data A CIP catalog record for this book has been applied for at the Library of Congress. Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de. © 2016 Walter de Gruyter GmbH, Berlin/Boston Cover image: John Phelan, Wikimedia Commons. This file is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license. Typesetting: le-tex publishing services GmbH, Leipzig, Germany Printing and binding: CPI books GmbH, Leck ♾ Printed on acid-free paper Printed in Germany www.degruyter.com
Preface

This book is based on courses taught by the authors over many years at universities in Dortmund, Hamburg, Munich, and Stuttgart. Two textbooks emerged from the German lecture notes, Elemente der diskreten Mathematik [28] and Diskrete algebraische Methoden [27]. While the first book has a strong emphasis on combinatorial topics, the second book focuses on algebraic ideas. This English edition is a revised and extended version of the second book. Essentially, no knowledge beyond high school mathematics is required for reading this book. However, the density of the presentation suggests a certain amount of mathematical maturity, as, for example, required from students in computer science or mathematics studying for a master’s degree. Every mathematically inclined reader should be able to follow the topics, but the effort might vary.

Contents. Discrete algebraic methods form a future-oriented topic whose fundamentals are becoming increasingly significant. The selection of the material is also influenced by the authors’ personal affinities. We begin with a general chapter on algebraic structures, providing the foundation for the rest of the book. Each of the subsequent chapters can be read independently, but there are various connections. For example, the chapter on elliptic curves is motivated by applications in cryptography. Algebraic cryptography is actually a guiding theme, given that cryptography is ubiquitous in modern society. Our focus is on asymmetric cryptographic methods. They are mathematically more interesting and challenging than symmetric procedures, which are designed to optimize performance. We also present Shamir’s attack on the Merkle–Hellman scheme. Due to this attack, the Merkle–Hellman scheme no longer has any practical use. However, the Merkle–Hellman method serves as a lesson on how skeptical we should be regarding unproven security claims.
Also, the general question, whether it makes sense to build cryptosystems on NP-hard problems, remains topical. The purely speculative answer we tend to give to this question is “no,” but the margin here is too narrow to include our justification.

The chapter on number theoretic algorithms is central. For example, large random prime numbers are needed in cryptosystems such as Rivest–Shamir–Adleman (RSA). So the method relies on the ability to efficiently generate primes. To date, RSA is the most popular asymmetric cryptographic method. It is named after Rivest, Shamir, and Adleman, who were the first to publish the algorithm in 1977 (although it seems that the scheme was known to British intelligence much earlier). We also study alternatives for RSA, such as the Diffie–Hellman procedure for secure key exchange. If we rely on this procedure, it might be reassuring to understand why known algorithms are unable to compute discrete logarithms on moderate input sizes. For calculations with large numbers, the analysis is often based on the fact that numbers can be multiplied in nearly linear time. We discuss Schönhage and Strassen’s algorithm for integer multiplication, which achieves this time bound.

In the chapter on the recognition of primes, we present the deterministic polynomial time test by Agrawal, Kayal, and Saxena. It is called the AKS test after its inventors. Even though it was only discovered in 2002, it is the archetypical application of discrete algebraic methods. Our presentation of the AKS test follows the exposition of the original article to a large extent. Nevertheless, we incorporate some simplifications compared to the original paper. In particular, we do not use cyclotomic polynomials. All we need is the basic fact that splitting fields exist. This refurbishment makes it possible to discuss the AKS test in a high school study group with a committed teacher and motivated students. We strongly encourage school teachers to undertake such a project.

The chapter on elliptic curves is mainly dedicated to number theoretic and cryptographic applications. Cryptography based on elliptic curves is well established in practice; and this area will be of growing importance in the near future. We provide all the necessary mathematics for this purpose. One big problem for the understanding of elliptic curves is that a great portion of users have no idea why computations should lead to correct results. This applies to a lesser extent to RSA-based methods, because computing modulo n is easily comprehensible. In comparison, elliptic curves constitute inherently complex mathematical objects. A user might easily be able to implement the needed operations on elliptic curves, needing only basic arithmetic operations. The formalism thus is available in the form of recipes, but usually the recipes (and also the books containing them) do not reveal why they are working correctly. This leads to justified skepticism towards elliptic curves. We counter this skepticism by not avoiding the difficult part.
The proof of the group structure is presented in full detail and is based entirely upon the chapter on algebraic structures. In particular, we do not expect the reader to have previous knowledge from algebraic geometry or from complex analysis. Combinatorics on words studies structural properties of sequences, such as repetitions. Motivated by Higman’s lemma on subsequences, we give a concise introduction to Ramsey theory and well-quasi-orderings. The chapter culminates in a proof of Kruskal’s tree theorem. The chapter on automata leads us to an area of theoretical computer science, in which the theory of semigroups plays a central role. We deal with regular languages in a general context, which is crucial for understanding deterministic and nondeterministic automata. Two key results in this area are the characterization of star-free languages by Schützenberger and the decomposition theorem of Krohn and Rhodes. For both results, we present new and simplified proofs. We also give a recent interpretation of Green’s relations in terms of local divisors. The above-mentioned results concern languages over finite words. However, the need for formal methods in hardware and software verification makes it necessary to analyze systems that do not intend to terminate, leading to the study of infinite words and the connection between
logic and automata on infinite objects. Examples of such systems are operating systems, embedded systems in cars, planes, etc. Here, we deal with Büchi automata. In the 1960s, Büchi showed that monadic second-order logic over infinite words is decidable; we give a detailed proof of this result. Two more applications of automata theory follow: decidability of Presburger arithmetic and a simple automata theoretic proof that there is an effective description for the solution set of linear Diophantine systems. The final chapter is devoted to discrete infinite groups. The chosen topics belong to the algorithmic branch of combinatorial group theory, as developed in the classic book by Lyndon and Schupp. We chose to explain standard constructions using confluent term rewriting systems. This leads to mathematically precise statements and in many cases directly to corresponding algorithms. In the section on free groups, the influence of Serre and Stallings on modern group theory becomes obvious. A particularly nice application of Stallings’ ideas is the geometric proof for the fact that the automorphism group of finitely generated free groups is generated by (finitely many) Whitehead automorphisms. Authors. Volker Diekert holds the chair of Theoretical Computer Science at the University of Stuttgart. He studied in Hamburg and Montpellier, completing a Diplôme des Études Supérieures with Alexander Grothendieck. He graduated in mathematics in Hamburg and regularly attended seminars offered by Ernst Witt. He earned his PhD with Jürgen Neukirch in Regensburg. These late mathematicians and truly impressive personalities have had an enduring influence in his life. Manfred Kufleitner studied computer science with a minor in mathematics, and then earned his PhD in computer science in Stuttgart under the supervision of the first author. Shortly after, he spent one year in Bordeaux. 
He has been a guest professor at Technische Universität München and an acting professor at the University of Hamburg. Gerhard Rosenberger comes with the greatest life experience among the authors. He has a strong background in teaching mathematics and much experience in writing textbooks. His research work has been enriched by extended stays in Russia and in the United States. In his scientific collaboration, he has co-authors from more than 25 different countries. Currently, he teaches at the University of Hamburg. Ulrich Hertrampf started his career in pure mathematics. After his diploma thesis in the area of group representations, he earned his PhD in the area of practical computer science, but returned to theoretical research when he completed his habilitation in the structural complexity theory group of Klaus Wagner in Würzburg. Cover. The title of the book is Discrete Algebraic Methods, or DAM for short; and the book’s cover page shows the Hoover Dam. It is a concrete construction to gather a huge pool of water. The purpose is not to block it, but to provide a continuous stream
of water and hydroelectric power. Having this interpretation in mind, we are glad that we are permitted to reproduce the picture by John Phelan. Miscellaneous. We believe that combining exercises with the study of advanced material deepens the understanding of the subject matter. Exercises can be found at the end of each chapter and complete solutions are given in the appendix. The exercises are of variable difficulty; their order corresponds to the presentation of the topics within a chapter. All chapters end with short summaries to reinforce learning. We prefer succinct explanations over verbose ones, thus leaving free space for the reader’s own considerations. Throughout, we give complete proofs. Most chapters end with topics which lie beyond standard contents. For aesthetic reasons, we have intentionally omitted punctuation marks at the end of displayed formulas. Our style of writing is motivated by gender-free mathematics. The use of gender is purely grammatical. Thus, if we speak of “she” or “he” we intend no preference to women or men. If “Alice” and “Bob” appear occasionally, we mean “Person A” and “Person B.” If mathematicians are mentioned by name, we include biographical details, whenever it appears appropriate and if publicly available. In some cases, we obtained permission to mention years of birth. In other words, these details are partly missing.
Acknowledgments A successful realization of this book would have been impossible without support from the Theoretical Computer Science department at the University of Stuttgart. The entire team was heavily involved in the preparation of the manuscript. Their dedicated participation in proofreading and their help in writing solutions to the exercises are gratefully acknowledged. In particular, the following former and present members made numerous contributions to the completion of the German and English editions of the book: Lukas Fleischer, Jonathan Kausch, Jürn Laun, Alexander Lauser, Jan Philipp Wächter, Tobias Walter, and Armin Weiß. Helpful technical support also came from Heike Photien, Horst Prote, and Martin Seybold as well as from Steven Kerns at the Language Center of the University of Stuttgart. During fall 2015, visiting researchers at the University of Stuttgart read parts of the book and provided helpful comments. This includes Jorge Almeida (Universidade do Porto, Porto), Murray Elder (The University of Newcastle, Australia), Théo Pierron (LaBRI, Bordeaux), M. Hossein Shahzamanian (Universidade do Porto, Porto), and Marc Zeitoun (LaBRI, Bordeaux). Those mathematical, stylistic, and orthographic errors that undoubtedly remain shall be charged to the authors. Last but not least, we thank DeGruyter for publishing our book. Stuttgart and Hamburg, January 2016
VD, MK, GR, UH
Contents

Preface
1 Algebraic structures
  1.1 Groups
  1.2 Regular polygons
  1.3 Symmetric groups
  1.4 Rings
  1.5 Modular arithmetic
    1.5.1 Euclidean algorithm
    1.5.2 Ideals in the integers
    1.5.3 Chinese remainder theorem
    1.5.4 Euler’s totient function
  1.6 Polynomials and formal power series
  1.7 Hilbert’s basis theorem
  1.8 Fields
  1.9 Finite fields
  1.10 Units modulo n
  1.11 Quadratic reciprocity
  Exercises
  Summary
2 Cryptography
  2.1 Symmetric encryption methods
  2.2 Monoalphabetic cipher
  2.3 Polyalphabetic cipher
  2.4 Frequency analysis and coincidence index
  2.5 Perfect security and the Vernam one-time pad
  2.6 Asymmetric encryption methods
  2.7 RSA cryptosystem
  2.8 Rabin cryptosystem
  2.9 Diffie–Hellman key exchange
  2.10 ElGamal cryptosystem
  2.11 Cryptographic hash functions
  2.12 Digital signatures
  2.13 Secret sharing
  2.14 Digital commitment
  2.15 Shamir’s attack on the Merkle–Hellman cryptosystem
  Exercises
  Summary
3 Number theoretic algorithms
  3.1 Runtime analysis of algorithms
  3.2 Fast exponentiation
  3.3 Binary GCD
  3.4 Probabilistic recognition of primes
    3.4.1 Fermat primality test and Carmichael numbers
    3.4.2 Solovay–Strassen primality test
    3.4.3 Miller–Rabin primality test
    3.4.4 Applications of the Miller–Rabin scheme
    3.4.5 Miller–Rabin versus Solovay–Strassen
  3.5 Extracting roots in finite fields
    3.5.1 Tonelli’s algorithm
    3.5.2 Cipolla’s algorithm
  3.6 Integer factorization
    3.6.1 Pollard’s p − 1 algorithm
    3.6.2 Pollard’s rho algorithm for factorization
    3.6.3 Quadratic sieve
  3.7 Discrete logarithm
    3.7.1 Shanks’ baby-step giant-step algorithm
    3.7.2 Pollard’s rho algorithm for the discrete logarithm
    3.7.3 Pohlig–Hellman algorithm for group order reduction
    3.7.4 Index calculus
  3.8 Multiplication and division
  3.9 Discrete Fourier transform
  3.10 Primitive roots of unity
  3.11 Schönhage–Strassen integer multiplication
  Exercises
  Summary
4 Polynomial time primality test
  4.1 Basic idea
  4.2 Combinatorial tools
  4.3 Growth of the least common multiple
  4.4 Of small numbers and large orders
  4.5 Agrawal–Kayal–Saxena primality test
  Summary
5 Elliptic curves
  5.1 Group law
    5.1.1 Lines
    5.1.2 Polynomials over elliptic curves
    5.1.3 Divisors
    5.1.4 Picard group
  5.2 Applications of elliptic curves
    5.2.1 Diffie–Hellman key exchange with elliptic curves
    5.2.2 Pseudocurves
    5.2.3 Factorization using elliptic curves
    5.2.4 Goldwasser–Kilian primality certificates
  5.3 Endomorphisms of elliptic curves
  Exercises
  Summary
6 Combinatorics on words
  6.1 Commutation, transposition and conjugacy
  6.2 Fine and Wilf’s periodicity lemma
  6.3 Kruskal’s tree theorem
  Exercises
  Summary
7 Automata
  7.1 Recognizable sets
  7.2 Rational sets
  7.3 Regular languages
  7.4 Star-free languages
  7.5 Krohn–Rhodes theorem
  7.6 Green’s relations
  7.7 Automata over infinite words
    7.7.1 Deterministic Büchi automata
    7.7.2 Union and intersection
    7.7.3 Omega-rational languages
    7.7.4 Recognizability of omega-regular languages
    7.7.5 Monadic second-order logic over infinite words
  7.8 Presburger arithmetic
  7.9 Solutions of linear Diophantine systems
  Exercises
  Summary
8 Discrete infinite groups
  8.1 Classical algorithmic problems
  8.2 Residually finite monoids
  8.3 Presentations
  8.4 Rewriting systems
    8.4.1 Termination and confluence
    8.4.2 Semi-Thue systems
  8.5 Solving the word problem in finitely presented monoids
  8.6 Free partially commutative monoids and groups
  8.7 Semidirect products
  8.8 Amalgamated products and HNN extensions
  8.9 Rational sets and Benois’ theorem
  8.10 Free groups
  8.11 The automorphism group of free groups
  8.12 The special linear group SL(2, ℤ)
  Exercises
  Summary
Solutions to exercises
  Chapter 1
  Chapter 2
  Chapter 3
  Chapter 5
  Chapter 6
  Chapter 7
  Chapter 8
Bibliography
Index
1 Algebraic structures

Algebra is one of the oldest mathematical disciplines. Language and techniques from modern algebra are used in most areas of mathematics. Algebraic methods have their origins in the elementary theory of numbers and the study of equations. The elementary theory of numbers considers the properties of number systems such as the integers, the rationals, or the reals. The theory of equations, on the other hand, deals with solving equations and systems thereof. Both subjects date back to classical times. The Babylonians already knew how to solve linear and quadratic equations and how to compute square roots several hundred years before the Common Era. Their applications included land surveying and mapping, architecture, tax collection, and the computation of calendars. General procedures for solving cubic equations, however, were only discovered in the 16th century by Scipione del Ferro (1465–1526), rediscovered by Niccolò Fontana Tartaglia (1500–1557) and published by Gerolamo Cardano (1501–1576); this is the reason why the formulas for the solutions were named after Cardano. A bit later, Lodovico Ferrari (1522–1565), a student of Cardano, found the respective solutions for equations of degree 4.

The availability of computational methods also extended the number ranges. The discovery of complex numbers, for example, is often attributed to Cardano and Rafael Bombelli (1526–1573), whereas the (positive) rational numbers were already known in ancient Egypt. In antiquity, it was understood that √2 is not rational. The proof is credited to Theaitetos (about 415–369 B.C.) and its written version can be found in Book X of Euclid’s Elements. Nevertheless, √2 as a solution of X² = 2 has a simple description and, more important in antiquity, the hypotenuse of a right-angled triangle with legs of length 1 gives a geometric interpretation.
The first real numbers that are not solutions of polynomial equations with rational coefficients appeared in 1844 when Joseph Liouville (1809–1882) specified them. Such numbers are called transcendental in contrast to algebraic numbers, which are solutions of a system of polynomial equations with rational coefficients. The two best-known examples of transcendental numbers are Euler’s number e (named after Leonhard Euler, 1707–1783) and the number π. The proof that e is transcendental was given by Charles Hermite (1822–1901) in 1872. The corresponding result for π from 1882 is due to Carl Louis Ferdinand von Lindemann (1852–1939). For Euler’s constant γ = 0.57772 . . . , it is open whether or not γ is transcendental. The methods for showing “nonfeasibility” are often more abstract than the ones for feasibility. Giving an according polynomial suffices to show that some number is algebraic. But how can one show that no polynomial with rational coefficients has π as a root? How can one show that there is no general formula for the roots of polynomials of degree 5? Or how can one show that the ancient problem of trisecting an angle is not possible using only a ruler and a compass? Such questions had strong influence on the development of modern algebra.
On the one hand, people tried to get a better understanding of the structure of numbers; on the other hand, a broader appearance of vectors and matrices motivated the investigation of generalized arithmetic operations. Niels Henrik Abel (1802–1829) showed in 1824 that there is no formula for solving equations of degree 5. This result was conjectured in 1799 by Paolo Ruffini (1765–1822), who gave an incomplete proof, which in turn influenced Abel’s work. Building on Abel’s ideas, Évariste Galois (1811–1832) investigated general equations and realized that the development of group theory was very significant. Pushed by David Hilbert (1862–1943), the so-called axiomatic approach became more and more popular among mathematicians at the end of the 19th century. The idea was to explore interesting objects (for example, numbers) not directly. Instead, investigations were based on a small set of given conditions (axioms), which are, among others, satisfied by the objects in question. This initiated the study of more abstract number systems. Ernst Steinitz (1871–1928) published the first axiomatic study of abstract fields in 1910. As a further generalization of fields, the concept of rings was introduced during the first few years of the 20th century. Amalie Emmy Noether (1882–1935) provided the basis for the study of commutative rings in 1921. The theory of semigroups and monoids was developed comparatively late. This is mainly due to the intuitive idea that the more structure an object has, the more properties can be derived for these objects. Many mathematicians thought that semigroups were too general to have any nontrivial structural properties. The first to disprove this view were Kenneth Bruce Krohn and John Lewis Rhodes (born 1937) in 1965. They described a decomposition of finite semigroups into simple groups and flip–flops,¹ see Theorem 7.26. This result is often considered to be the foundation of semigroup theory. 
The real significance of semigroups was only noticed as computer science became more and more important. In particular, the close relationship between semigroups and automata contributed to this.

¹ By flip–flop we mean the three-element monoid U₂ = {1, a, b} with xy = y for y ≠ 1.

A more general point of view is taken by universal algebra, sometimes also referred to as general algebra. It was established in 1935 by Garrett Birkhoff (1911–1996). Here, arbitrary algebraic structures are investigated and frequently appearing concepts – such as homomorphisms and substructures – are presented in a unified form. The idea of generalization is even pushed a little further in category theory. Samuel Eilenberg (1913–1998) and Saunders MacLane (1909–2005) developed the concept of categories and functors in 1942. Category theory, due to its universality, appears in many forms; for example, the semantics of programming languages can be presented in a mathematical framework within this theory.

Applications of algebra are extraordinarily diverse; for example, algebraic ideas play an important role in automata theory, algebraic coding theory and codes of variable length, cryptography, symmetry considerations in chemistry and physics, computer algebra systems, and graph theory. Modern algebra is characterized by numerous notions. In fact, this is one of the contributions of these theories, as important properties and criteria can be described and investigated in a uniform way. This chapter is devoted to a detailed study of groups; then we consider rings, polynomials, and fields, as far as they fit into this book’s framework or are needed in order to understand the other chapters. A short overview of the various algebraic structures is given by the following diagrams.

[Diagram, one operation: semigroups (associative law) ⊇ monoids (neutral element) ⊇ groups (existence of inverses).]
[Diagram, two operations: rings (distributive laws) ⊇ domains (x ⋅ y ≠ 0 for x, y ≠ 0) ⊇ fields (commutative multiplication, multiplicative inverses).]
We first give a brief overview of some elementary concepts of algebra. For binary operations ∘ : M × M → M, we often use the infix notation x ∘ y instead of ∘(x, y), and if the underlying operation is clear, then we write xy for short. An operation on M is associative if for all x, y, z ∈ M, we have (xy)z = x(yz), and it is commutative or Abelian (named after Niels Henrik Abel), if xy = yx holds for all x, y ∈ M. For instance, multiplication forms a commutative and associative operation on the set of natural numbers ℕ = {0, 1, 2, …}.

An element e ∈ M is called idempotent if e² = e. It is called neutral if xe = ex = x for all x ∈ M. A neutral element is idempotent, but the converse does not hold: consider the set {0, 1} with multiplication, then both elements are idempotent, 1 is neutral, but 0 is not. A neutral element is often referred to as an identity. Similarly, an element z ∈ M is a zero element if xz = zx = z for all x ∈ M.

A set M, together with an associative operation, forms a semigroup. A semigroup with a neutral element is called a monoid. If in a monoid M with neutral element e each element x has an inverse y such that xy = yx = e, then M forms a group. We write (M, ⋅, e) when emphasizing the operation and the neutral element. In the case of two operations + and ⋅, the rules connecting these two operations can be formulated. This leads to the concepts of rings and fields, which we will discuss later.

Example 1.1. The set M = {0, …, n} with the operation xy = min{x + y, n} forms a finite commutative monoid. The element n is a zero element, and the element 0 is neutral. Since zero elements cannot have an inverse, M is not a group if n ⩾ 1. ◊
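The claims of Example 1.1 can be checked by brute force for a small n. The following Python sketch is only an illustration under our own assumptions: the helper names (`truncated_sum`, `is_associative`, and so on) are hypothetical and not taken from the book, and the check covers a single value n = 5 rather than proving the general statement.

```python
from itertools import product

def truncated_sum(n):
    """Operation of Example 1.1 on {0, ..., n}: x * y = min(x + y, n)."""
    return lambda x, y: min(x + y, n)

def is_associative(elements, op):
    # (xy)z == x(yz) for all triples
    return all(op(op(x, y), z) == op(x, op(y, z))
               for x, y, z in product(elements, repeat=3))

def is_commutative(elements, op):
    return all(op(x, y) == op(y, x) for x, y in product(elements, repeat=2))

def idempotents(elements, op):
    return [e for e in elements if op(e, e) == e]

def neutral_elements(elements, op):
    return [e for e in elements
            if all(op(x, e) == x == op(e, x) for x in elements)]

def zero_elements(elements, op):
    return [z for z in elements
            if all(op(x, z) == z == op(z, x) for x in elements)]

def has_inverse(x, elements, op, e):
    return any(op(x, y) == e == op(y, x) for y in elements)

# Example 1.1 with n = 5 (an arbitrary small choice for illustration).
n = 5
M = list(range(n + 1))
op = truncated_sum(n)

print("associative:", is_associative(M, op))
print("commutative:", is_commutative(M, op))
print("neutral elements:", neutral_elements(M, op))
print("zero elements:", zero_elements(M, op))
print("invertible elements:", [x for x in M if has_inverse(x, M, op, 0)])
```

The same helpers apply to the set {0, 1} with multiplication: both elements are idempotent, but only 1 is neutral.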
Roughly speaking, a subset Y of an algebraic structure Z forms a substructure if Y itself has the same structural properties as those claimed for Z. For example, consider the semigroup M = {1, 0} with multiplication as operation; in this case, {1} and {0} are subsemigroups of M. The semigroup M forms a monoid, but only {1} is a submonoid. Even though {0} forms a monoid, it is not a submonoid since it does not have the same identity as M. A substructure Y of Z is generated by X if it is the smallest substructure containing X. We denote this substructure by ⟨X⟩.

A mapping between two algebraic structures which is compatible with the respective operations (such as + and ⋅) and which maps neutral elements to neutral elements is called a homomorphism. Thus, for a homomorphism φ : M → N between two monoids M and N, the two properties φ(xy) = φ(x)φ(y) and φ(1_M) = 1_N are required; here, 1_M and 1_N denote the neutral elements of M and N, respectively. A bijection φ always has an inverse map φ⁻¹. If both mappings φ and φ⁻¹ are homomorphisms, then φ is called an isomorphism. In most cases, a bijective homomorphism already is an isomorphism. An automorphism is an isomorphism from a structure onto itself.

For numbers k, ℓ ∈ ℤ, we write k | ℓ if there is an integer m ∈ ℤ with km = ℓ; in this case, k divides ℓ, and ℓ is a multiple of k. The greatest common divisor of two integers k, ℓ, not both equal to 0, is the largest nonnegative integer d such that d | k and d | ℓ, written as d = gcd(k, ℓ). In addition, it is natural to set gcd(0, 0) = 0. Two integers k, ℓ are coprime if gcd(k, ℓ) = 1. We say that k is congruent to ℓ modulo n (and write k ≡ ℓ mod n) if there is m ∈ ℤ with k = ℓ + mn, that is, if n | k − ℓ. The integer n is the modulus.
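The definitions of divisibility, greatest common divisor, and congruence translate almost literally into code. The sketch below is a hedged illustration: the function names `divides`, `gcd_by_definition`, and `congruent` are our own hypothetical choices, and the brute-force gcd only mirrors the definition (it is not an efficient algorithm). Python's standard `math.gcd` follows the same convention gcd(0, 0) = 0.

```python
from math import gcd

def divides(k, l):
    """k | l: there is an integer m with k * m == l."""
    if k == 0:
        return l == 0          # 0 * m == l is only solvable for l == 0
    return l % k == 0

def gcd_by_definition(k, l):
    """Largest nonnegative d with d | k and d | l; gcd(0, 0) is set to 0."""
    if k == 0 and l == 0:
        return 0
    bound = max(abs(k), abs(l))
    return max(d for d in range(1, bound + 1)
               if divides(d, k) and divides(d, l))

def congruent(k, l, n):
    """k is congruent to l modulo n, that is, n | k - l."""
    return divides(n, k - l)

print(gcd_by_definition(12, 18), gcd(12, 18))
print(congruent(17, 5, 12), congruent(-1, 11, 12))
print(gcd(9, 14) == 1)  # 9 and 14 are coprime
```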
1.1 Groups

The notion of a group goes back to Galois, who examined the solutions of polynomial equations over the rational numbers and tried to classify them into groups of solvable equations. Galois was the first to recognize the close connection between solutions of polynomial equations, the corresponding field extensions, and permutation groups. Old manuscripts indicate that he wrote a survey of his mathematical insights the night before a fatal duel, which took place early in the morning of May 30th, 1832. He died one day later at the age of 20. The importance of his work was recognized only posthumously in the middle of the 19th century.

We study groups from an abstract point of view. A group G is a set with a binary operation (g, h) ↦ g ⋅ h which is associative and thus satisfies (x ⋅ y) ⋅ z = x ⋅ (y ⋅ z). Furthermore, there is a neutral element 1 ∈ G such that 1 ⋅ x = x ⋅ 1 = x, and for each x ∈ G there is an inverse y with x ⋅ y = y ⋅ x = 1. In fact, it is sufficient to require the existence of left inverses since these are automatically also right inverses, see Exercise 1.13 (a). Moreover, the inverses are uniquely determined and we write x⁻¹ for the inverse y of x. If the operation is written additively (i.e., using + instead of ⋅), then the inverse is denoted by −x and the neutral element is 0. For example, the integers ℤ, with
addition, form a group. A subset H of a group G is a subgroup if H contains the neutral element of G and if H with the operation of G forms a group; see Exercise 1.14 for other characterizations of subgroups. For any subset X ⊆ G, the subgroup of G generated by X is ⟨X⟩. Thus, ⟨X⟩ contains exactly those group elements that can be written as a product of elements x and x−1 with x ∈ X. The empty product yields the neutral element. For X = {x1 , . . . , x n }, we also write ⟨x1 , . . . , x n ⟩ instead of ⟨{x1 , . . . , x n }⟩. In the following, let G be a group, g, g1 , g2 ∈ G be the arbitrary group elements and H be a subgroup of G. We call the set gH = { gh | h ∈ H } the (left) coset of H with respect to g. Analogously, Hg is the right coset with respect to g. The set of cosets is denoted by G/H, that is G/H = { gH | g ∈ G } Analogously, H \ G = { Hg | g ∈ G } is the set of right cosets. Lemma 1.2. For cosets and right cosets the following properties hold: (a) |H| = |gH| = |Hg| (b) g1 H ∩ g2 H ≠ 0 ⇔ g1 ∈ g2 H ⇔ g1 H ⊆ g2 H ⇔ g1 H = g2 H (c) |G/H| = |H \ G| Proof: (a) The mapping g⋅ : H → gH, x → gx is a bijection with inverse map g−1 ⋅ : gH → H, x → g−1 x. Thus, |H| = |gH|, and by symmetry also |H| = |Hg|. (b) Let g1 h1 = g2 h2 with h1 , h2 ∈ H. Then, g1 = g2 h2 h−1 1 ∈ g2 H. From g1 ∈ g 2 H, we obtain g1 H ⊆ g2 H ⋅ H = g2 H. Now, g1 H ⊆ g2 H immediately implies g1 H ∩ g2 H ≠ 0, and by symmetry g2 H ⊆ g1 H. Thus, all four statements in (b) are equivalent. −1 (c) g1 ∈ g2 H is true if and only if g−1 1 ∈ Hg 2 holds. Thus, using (b), we obtain a −1 bijection gH → Hg from the set G/H to H \ G. Lemma 1.2 (b) shows that different cosets of H are disjoint. Each element g ∈ G belongs to the coset gH. These two facts together show that the cosets induce a partition of G, where by (a) all classes are of the same size. G .. .
Fig. 1.1. The group G as a disjoint union of cosets.
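The partition into cosets can be checked on a small example. The following Python sketch uses the additive group ℤ/12ℤ and the subgroup H = {0, 4, 8}; these choices, as well as the helper name `left_cosets`, are assumptions made for illustration only.

```python
def left_cosets(n, subgroup):
    """Return the distinct cosets g + H of H in the additive group Z/nZ."""
    cosets = []
    for g in range(n):
        coset = frozenset((g + h) % n for h in subgroup)
        if coset not in cosets:
            cosets.append(coset)
    return cosets


n = 12
H = {0, 4, 8}
cosets = left_cosets(n, H)

# All cosets have the same size as H (Lemma 1.2 (a)).
assert all(len(c) == len(H) for c in cosets)
# Distinct cosets are disjoint and together cover the group (Lemma 1.2 (b)).
assert sum(len(c) for c in cosets) == n
assert set().union(*cosets) == set(range(n))

for c in cosets:
    print(sorted(c))
```

Since the group is written additively here, the coset gH appears as g + H in the code.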
The index of H in G is [G : H] = |G/H|. Lemma 1.2 (c) yields [G : H] = |H \ G|. The order of G is its size |G|. The order of an element g ∈ G is the order of ⟨g⟩. If ⟨g⟩ is finite, then the order of g is the smallest positive integer n such that $g^n = 1$. The following fundamental theorem is named after Joseph Louis Lagrange, 1736–1813.

Theorem 1.3 (Lagrange). |G| = [G : H] ⋅ |H|

Proof: By Lemma 1.2, the group G is the disjoint union of [G : H] cosets of H, each of size |H|; see Figure 1.1.

For finite groups G, Lagrange’s theorem has the following consequences: the order of a subgroup H divides the order of G. If H = ⟨g⟩ for g ∈ G, this means that the order of any group element g ∈ G divides the order of G. If K is a subgroup of H and H is a subgroup of G, then we obtain [G : K] = [G : H][H : K].

Theorem 1.4. Let g be a group element of order d. Then $g^n = 1 \Leftrightarrow d \mid n$.

Proof: If n = kd, then $g^n = (g^d)^k = 1^k = 1$. For the converse, let $g^n = 1$ and n = kd + r with 0 ⩽ r < d. Then, $1 = g^n = (g^d)^k g^r = 1 \cdot g^r = g^r$, and therefore r = 0 by the minimality of d. So, n = kd is a multiple of d.

Corollary 1.5. Let G be a finite group. For all g ∈ G, we have $g^{|G|} = 1$.

Proof: Let d be the order of g ∈ G. Then, d divides |G| by Lagrange’s theorem, and thus $g^{|G|} = 1$ by Theorem 1.4.

A group G is called cyclic if G is generated by a single element x, that is, G = ⟨x⟩. In this case, x is called a generator of G. Cyclic groups are either finite or isomorphic to (ℤ, +, 0). If |⟨x⟩| = |⟨y⟩| holds for cyclic groups ⟨x⟩ and ⟨y⟩, then the mapping $x^i \mapsto y^i$ is a group isomorphism. Therefore, cyclic groups are uniquely determined (up to isomorphism) by the group order. For all n ⩾ 1, the set $\{1, x, x^2, \dots, x^{n-1}\}$ with the operation $x^i \cdot x^j = x^{(i+j) \bmod n}$ is a cyclic group with n elements, generated by x. The operation in this group is illustrated in Figure 1.2.
Fig. 1.2. Group of order n generated by x.
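The statements about element orders can be tried out numerically. The sketch below works in the additive cyclic group ℤ/nℤ, where the power $g^k$ corresponds to the multiple k ⋅ g; the helper name `element_order` and the choice n = 12 are assumptions for illustration.

```python
def element_order(g, n):
    """Order of g in the additive group Z/nZ: the least k >= 1 with k*g = 0 mod n."""
    k, acc = 1, g % n
    while acc != 0:
        acc = (acc + g) % n
        k += 1
    return k


n = 12
orders = {g: element_order(g, n) for g in range(n)}

# The order of every element divides the group order (consequence of Theorem 1.3).
assert all(n % d == 0 for d in orders.values())

# Theorem 1.4 in additive notation: k*g = 0 holds exactly when the order of g divides k.
assert all(((k * g) % n == 0) == (k % orders[g] == 0)
           for g in range(n) for k in range(1, 3 * n))

# The generators of Z/nZ are exactly the elements of order n.
print([g for g, d in orders.items() if d == n])
```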
Theorem 1.6. Groups of prime order are cyclic.

Proof: Let |G| be prime. We can choose g ∈ G \ {1} because |G| ⩾ 2. Now, |⟨g⟩| divides |G| by Lagrange’s theorem. Since |G| is prime and |⟨g⟩| ⩾ 2, we obtain |⟨g⟩| = |G| and ⟨g⟩ = G.

Theorem 1.7. Subgroups of cyclic groups are cyclic.

Proof: Let G = ⟨g⟩ and U be a subgroup of G. Since the trivial subgroup {1} is cyclic, we may assume in the following that U ≠ {1}. There exists n > 0 with $g^n \in U$. Let n be minimal with this property. For an arbitrary element $g^k \in U$, we have $g^{k \bmod n} \in U$. Since n was chosen to be minimal, we conclude k ≡ 0 mod n, and therefore U is generated by $g^n$.

Theorem 1.8. Let G be an Abelian group and let g, h ∈ G be elements with coprime orders m, n ∈ ℕ. Then, gh has order mn.

Proof: Let d be the order of gh. From $(gh)^{mn} = (g^m)^n (h^n)^m = 1 \cdot 1 = 1$, together with Theorem 1.4, we see that d is a divisor of mn. If d ≠ mn, then mn has a prime divisor p satisfying $d \mid \frac{mn}{p}$. But gcd(m, n) = 1, and thus either p | m or p | n, but not both. Assume that p | m and p ∤ n; the other case is symmetric. Then, $(gh)^{\frac{m}{p} n} = g^{\frac{m}{p} n} \cdot 1 \neq 1$ since $\frac{m}{p} n$ is not a multiple of m. This contradicts $d \mid \frac{mn}{p}$; thus we have d = mn.
The following theorem is named after Augustin Louis Cauchy (1789–1857). The proof presented here was given by McKay (1923–2012) in [73].

Theorem 1.9 (Cauchy). Let G be a finite group and p be a prime divisor of |G|. Then, G contains an element of order p.

Proof: Let n = |G| and $S = \{ (g_1, \dots, g_p) \in G^p \mid g_1 \cdots g_p = 1 \text{ in } G \}$. In each p-tuple $(g_1, \dots, g_p) \in S$, the elements $g_1, \dots, g_{p-1} \in G$ can be chosen arbitrarily and $g_p$ is then uniquely determined by $g_p = (g_1 \cdots g_{p-1})^{-1}$. Hence, $|S| = n^{p-1}$. Define f : S → S by $f(g_1, \dots, g_p) = (g_p, g_1, \dots, g_{p-1})$, that is, f cyclically shifts the elements. Note that indeed $(g_p, g_1, \dots, g_{p-1}) \in S$. For every s ∈ S, one can consider the restriction of f to $F_s = \{f(s), f^2(s), \dots, f^p(s)\}$. This restriction is a bijection, and since $f^p$ is the identity, the order of f (as a bijection on $F_s$) is a divisor of p, that is, it is either 1 or p. In particular, $|F_s|$ is either 1 or p. The sets $\{ F_s \mid s \in S \}$ form a partition of S. Let k be the number of partition classes with one element and ℓ be the number of partition classes with p elements. We obtain $n^{p-1} = |S| = k + p\ell$, and p | n yields k ≡ 0 mod p. From (1, …, 1) ∈ S, we obtain k ⩾ 1 and therefore k ⩾ p. Hence, there exists s ∈ S \ {(1, …, 1)} with f(s) = s, that is, for $s = (g_1, \dots, g_p)$, we have $(g_1, \dots, g_p) = f(s) = (g_p, g_1, \dots, g_{p-1})$. We conclude $g_1 = g_2 = \cdots = g_p \neq 1$, and the definition of S yields $g_1^p = 1$. Therefore, $g_1 \in G$ has order p by Theorem 1.4.

The rest of this section is devoted to the homomorphism theorem for groups. As we shall see later, this is the basis for an analogous result in ring theory. Let φ : G → K be
a group homomorphism, that is, $\varphi(g_1 g_2) = \varphi(g_1)\,\varphi(g_2)$ for all $g_1, g_2 \in G$. We define the following sets:
ker(φ) = { g ∈ G | φ(g) = 1 }
im(φ) = φ(G) = { φ(g) ∈ K | g ∈ G }
We call ker(φ) the kernel of φ and im(φ) the image of φ. A subgroup H of G is normal if the left cosets of H are equal to the right cosets of H, that is, if gH = Hg for all g ∈ G. This is equivalent to saying that the left and the right cosets define the same partition of G. If G is commutative, then obviously all subgroups are normal.

Theorem 1.10. For every subset H of a group G, the following properties are equivalent:
(a) H is a normal subgroup of G.
(b) H is a subgroup such that G/H forms a group with respect to the operation $g_1 H \cdot g_2 H = g_1 g_2 H$; its neutral element is H.
(c) H is the kernel of a group homomorphism φ : G → K.
(d) H is a subgroup and for all g ∈ G, we have $gHg^{-1} \subseteq H$.

Proof: (a) ⇒ (b): We have $g_1 H g_2 H = g_1 (H g_2) H = g_1 (g_2 H) H = g_1 g_2 H$. This shows that the operation on G/H is well defined and associative and that H is the neutral element. Being well defined means that the operation is independent of the chosen representatives $g_1$ and $g_2$. The inverse of gH is $g^{-1}H \in G/H$.
(b) ⇒ (c): Consider the mapping φ : G → G/H, g ↦ gH. Then, $\varphi(g_1 g_2) = g_1 g_2 H = g_1 H \cdot g_2 H = \varphi(g_1)\,\varphi(g_2)$. This shows that φ is a homomorphism. The kernel of φ is ker(φ) = { g ∈ G | gH = H } = H.
(c) ⇒ (d): Let φ : G → K be a group homomorphism with ker(φ) = H. Then, φ(1) = 1 yields 1 ∈ H. For all $g_1, g_2 \in H$, we have $\varphi(g_1 g_2^{-1}) = \varphi(g_1)\,\varphi(g_2)^{-1} = 1 \cdot 1 = 1$, and thus $g_1 g_2^{-1} \in \ker(\varphi) = H$. Therefore, H is a subgroup; see Exercise 1.14 (a). For all g ∈ G and all h ∈ H, we have $\varphi(ghg^{-1}) = \varphi(g)\varphi(h)\varphi(g)^{-1} = \varphi(g)\varphi(g)^{-1} = 1$ and thus $gHg^{-1} \subseteq H = \ker(\varphi)$.
(d) ⇒ (a): From $gHg^{-1} \subseteq H$, we obtain gH ⊆ Hg and $Hg^{-1} \subseteq g^{-1}H$ for all g ∈ G. Since all group elements in G can be represented as inverses, the latter implies Hg ⊆ gH for all g ∈ G. Both relations together yield gH = Hg for all g ∈ G.

If H is a normal subgroup, then the group of Theorem 1.10 (b) is called the quotient group of G modulo H. The mapping G → G/H, g ↦ gH is a homomorphism with kernel H.
In the case of cyclic groups, quotient groups can be interpreted as “spooling.” Consider the finite cyclic group $C_6$ = {0, 1, 2, 3, 4, 5} with the operation given by addition modulo 6. Then, {0, 3} is a subgroup. Since cyclic groups are commutative, {0, 3} is normal and we can form the quotient group $C_6/\{0, 3\}$; see Figure 1.3. For the infinite cyclic group (ℤ, +, 0) and its normal subgroup 4ℤ a similar picture arises; see Figure 1.4. Here, we identify ℤ/4ℤ with the representatives 0, 1, 2, 3. Each element of this set corresponds to the coset in which it occurs; for example, 3 corresponds to the coset 3 + 4ℤ = {…, −1, 3, 7, …}. The same interpretation works quite analogously for the infinite, noncyclic group (ℝ, +, 0) and its subgroup 4ℤ. Thus, we obtain an illustration of the infinite group ℝ/4ℤ as a circle of length 4.

Fig. 1.3. Illustration of $C_6/\{0, 3\}$: spooling of $C_6$ onto the cosets {0, 3}, {1, 4}, and {2, 5}.

Fig. 1.4. Illustration of ℤ/4ℤ: spooling of ℤ onto the cosets 4ℤ, 1 + 4ℤ, 2 + 4ℤ, and 3 + 4ℤ.

Theorem 1.11. Subgroups of index 2 are normal.

Proof: Let [G : H] = 2. Then,
\[ gH = \begin{cases} H & \text{if } g \in H \\ G \setminus H & \text{if } g \notin H \end{cases} \; = Hg \]
and therefore, H is normal.
Fig. 1.5. Illustration of the homomorphism theorem for groups.
Theorem 1.12 (Homomorphism theorem for groups). Let φ : G → K be a group homomorphism. Then, φ induces an isomorphism
\[ \overline{\varphi} : G/\ker(\varphi) \to \operatorname{im}(\varphi), \qquad g \ker(\varphi) \mapsto \varphi(g) \]

Proof: Let H = ker(φ). The mapping $\overline{\varphi}$ is well defined: if $g_1 H = g_2 H$, then $g_1 = g_2 h$ for a suitable h ∈ H. This yields $\overline{\varphi}(g_1 H) = \varphi(g_1) = \varphi(g_2 h) = \varphi(g_2)\varphi(h) = \varphi(g_2) = \overline{\varphi}(g_2 H)$ since φ(h) = 1. Once $\overline{\varphi}$ is well defined, it is obviously a homomorphism, and by construction it is surjective. It remains to show that $\overline{\varphi}$ is injective. Let $\overline{\varphi}(g_1 H) = \overline{\varphi}(g_2 H)$. Then, $\varphi(g_1) = \varphi(g_2)$ and $\varphi(g_1^{-1} g_2) = 1$. Finally, this yields $g_1^{-1} g_2 \in H$ and therefore $g_1 H = g_1 (g_1^{-1} g_2 H) = g_2 H$.

Figure 1.5 gives an illustration of the induced isomorphism $\overline{\varphi}$ in Theorem 1.12.

Remark 1.13. The homomorphism theorem for groups together with Theorem 1.10 shows that it is equivalent to consider quotient groups or homomorphic images of groups. Furthermore, we see that a group homomorphism φ is injective if and only if ker(φ) = {1} holds. ◊
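The homomorphism theorem can be verified by brute force on a small example. The following Python sketch assumes the additive group ℤ/12ℤ and the homomorphism x ↦ 3x; both are hypothetical choices made only to illustrate Theorem 1.12.

```python
n = 12


def phi(x):
    """Hypothetical homomorphism Z/12Z -> Z/12Z, x -> 3x (additive notation)."""
    return (3 * x) % n


kernel = {x for x in range(n) if phi(x) == 0}
image = {phi(x) for x in range(n)}

# Map every coset g + ker(phi) to the common value of phi on that coset.
induced = {}
for g in range(n):
    coset = frozenset((g + h) % n for h in kernel)
    values = {phi(x) for x in coset}
    assert len(values) == 1      # phi is constant on each coset: well defined
    induced[coset] = values.pop()

# The induced map is a bijection between G/ker(phi) and im(phi).
assert set(induced.values()) == image
assert len(induced) == len(image)

print(sorted(kernel), sorted(image))
```

The dictionary `induced` plays the role of the induced isomorphism: its keys are the cosets of the kernel and its values are the elements of the image.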
1.2 Regular polygons

In this section, we want to demonstrate the concepts introduced so far by looking at mappings on regular polygons. A regular n-gon is an undirected graph $C_n = (V, E)$ with vertices V = ℤ/nℤ and edges E = { {i, i + 1} | i ∈ ℤ/nℤ }. The cases n = 0, n = 1, and n = 2 are degenerate, and we thus assume that n ⩾ 3 in the remainder of this section. An n-gon always has n edges.
Figure: the triangle (vertices 0, 1, 2), the quadrangle (vertices 0, 1, 2, 3), and the pentagon (vertices 0, 1, 2, 3, 4).
An automorphism of a graph G = (V, E) is a bijective mapping φ : V → V satisfying
∀x, y ∈ V : {x, y} ∈ E ⇔ {φ(x), φ(y)} ∈ E
The set of automorphisms $\mathrm{Aut}(G) \subseteq V^V$ of a graph G = (V, E) forms a group with the composition of mappings as operation; the neutral element is $\mathrm{id}_V$. The group Aut(G) is called the automorphism group or the symmetry group of G. In this section, we investigate the groups $D_n = \mathrm{Aut}(C_n)$, the automorphism groups of regular n-gons; they are called dihedral groups.

If $\varphi \in D_n$, then φ is uniquely determined by the values φ(0) and φ(1). Let φ(0) = i with i ∈ ℤ/nℤ. Then, for φ(1) the only possible values are φ(1) = i + 1 or φ(1) = i − 1, because vertex i only has edges to vertices i − 1 and i + 1. Since we are only interested in the case n ⩾ 3, the vertices i − 1 and i + 1 are different. If φ(1) = i + 1, then φ(2) = i + 2 follows: i and i + 2 are the only neighbors of i + 1 and φ(2) ≠ i = φ(0) because φ is bijective. Continuing this way, we obtain φ(j) = φ(0) + j for all j ∈ ℤ/nℤ. In the other case, φ(1) = i − 1, we symmetrically get φ(j) = φ(0) − j for all j ∈ ℤ/nℤ. This means $|D_n| = 2n$ for n ⩾ 3, because for φ(0), we have n values to choose from and two orientations are possible. The six elements of $D_3$ can be illustrated as follows:
Figure: the six vertex labelings of the triangle given by id, δ, δ², σ, δσ, and δ²σ.
Because of 6 = 3!, all permutations of the vertices 0, 1, and 2 are contained in $D_3$. Define the rotation $\delta \in D_n$ and the reflection $\sigma \in D_n$ by
δ(j) = j + 1 and σ(j) = −j
for j ∈ ℤ/nℤ. The k-fold application of δ is $\delta^k(j) = j + k$ and its inverse is $(\delta^k)^{-1} = \delta^{-k} = \delta^{n-k}$. The order of δ is n and the order of σ is 2. We can visualize σ as the reflection across the line through the vertex 0 and the center of the polygon. If $\varphi \in D_n$ satisfies φ(0) = i and φ(1) = i + 1, then $\varphi = \delta^i$. And if φ(0) = i and φ(1) = i − 1, then $\varphi = \delta^i \sigma$. It follows that
\[ D_n = \{\mathrm{id}, \delta, \delta^2, \dots, \delta^{n-1}, \sigma, \delta\sigma, \delta^2\sigma, \dots, \delta^{n-1}\sigma\} \]
Next, we give some identities for $D_n$:
\[ \sigma^2 = \mathrm{id}, \qquad \delta\sigma = \sigma\delta^{-1}, \qquad \delta^n = \mathrm{id} \]
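These identities can be checked mechanically by representing each element of the dihedral group as a tuple of images. The following Python sketch is one possible encoding; the helper names `compose` and `dihedral_group` and the choice n = 5 are assumptions for illustration.

```python
def compose(f, g):
    """Composition of maps on Z/nZ stored as tuples: (f o g)(j) = f(g(j))."""
    return tuple(f[g[j]] for j in range(len(g)))


def dihedral_group(n):
    """Return (delta, sigma, elements); elements lists delta^i and delta^i sigma."""
    delta = tuple((j + 1) % n for j in range(n))   # rotation: j -> j + 1
    sigma = tuple((-j) % n for j in range(n))      # reflection: j -> -j
    elements = []
    power = tuple(range(n))                        # delta^0 = id
    for _ in range(n):
        elements.append(power)                     # delta^i
        elements.append(compose(power, sigma))     # delta^i sigma
        power = compose(delta, power)              # delta^(i+1)
    return delta, sigma, elements


n = 5
delta, sigma, elements = dihedral_group(n)
identity = tuple(range(n))
delta_inv = tuple((j - 1) % n for j in range(n))

assert len(set(elements)) == 2 * n                          # 2n distinct elements
assert compose(sigma, sigma) == identity                    # sigma^2 = id
assert compose(delta, sigma) == compose(sigma, delta_inv)   # delta sigma = sigma delta^-1

# Every element maps each edge {j, j+1} to an edge, so it is an automorphism of C_n.
assert all((phi[(j + 1) % n] - phi[j]) % n in (1, n - 1)
           for phi in elements for j in range(n))
```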
The subgroup S = ⟨σ⟩ = {id, σ} of $D_n = \langle \delta, \sigma \rangle$ contains two elements and, by Lagrange’s theorem, it has n cosets. We obtain $D_n/S = \{S, \delta S, \dots, \delta^{n-1} S\}$. From $\delta\sigma = \sigma\delta^{-1} \notin S\delta$ for n ⩾ 3, we conclude δS ≠ Sδ. So, the subgroup S is not normal. In the case n = 3, dividing $D_3$ into left cosets and right cosets of S yields the following partitions:

Left cosets: S = {id, σ}, δS = {δ, δσ}, δ²S = {δ², δ²σ}
Right cosets: S = {id, σ}, Sδ = {δ, δ²σ}, Sδ² = {δ², δσ}
Next, we consider the subgroup $N = \langle \delta \rangle = \{\mathrm{id}, \delta, \delta^2, \dots, \delta^{n-1}\}$ of order n. This group has index 2 in $D_n$; its cosets are N and $\sigma N = \{\sigma, \sigma\delta, \sigma\delta^2, \dots, \sigma\delta^{n-1}\}$. By Theorem 1.11, the subgroup N is normal. This also follows from $\sigma\delta^i = \delta^{-i}\sigma$.

For n ⩾ 2, consider $D_{2n}$. The subgroup $E = \langle \delta^2 \rangle = \{\mathrm{id}, \delta^2, \delta^4, \dots, \delta^{2n-2}\}$ contains n elements. Therefore, the index of E in $D_{2n}$ is 4. Its four cosets are E, δE, σE, and σδE. We have $\sigma\delta^{2i} = \delta^{2n-2i}\sigma$; in particular, 2n − 2i is even. This implies σE = Eσ. From this, together with δE = Eδ, we conclude that E is normal. Therefore, $D_{2n}/E$ is a group with four elements. We choose {id, δ, σ, σδ} as representatives of the four cosets of E. In addition to the relations within $D_{2n}$, we obtain $\delta^2 = \mathrm{id}$ in the quotient. Instead of dealing with the cosets, we define the group structure directly on the set of representatives, using the following multiplication table.

 ⋅  | id   δ    σ    σδ
----+-------------------
 id | id   δ    σ    σδ
 δ  | δ    id   σδ   σ
 σ  | σ    σδ   id   δ
 σδ | σδ   σ    δ    id
The table is symmetric and all diagonal entries are id. In other words, the group is Abelian and all nontrivial elements are of order 2. Therefore, the group cannot be cyclic. This group is known as the Klein four-group, named after Felix Christian Klein (1849–1925). It is isomorphic to the direct product ℤ/2ℤ × ℤ/2ℤ and to the automorphism group of the following graph with four vertices:

Figure: a graph on the four vertices 0, 1, 2, and 3.
1.3 Symmetric groups

Let $S_n$ be the set of all permutations on {1, …, n}, called the symmetric group on n elements. It indeed forms a group with the operation given by the composition of mappings, where πσ is defined by (πσ)(i) = π(σ(i)) for 1 ⩽ i ⩽ n. If G is a group with at most n elements and g ∈ G, then x ↦ gx defines a permutation on G. Therefore, every group of order at most n is isomorphic to a subgroup of $S_n$. In this section, we have a closer look at the elements of $S_n$.

Let π be a permutation on {1, …, n}. Then a pair $(i, j) \in \{1, \dots, n\}^2$ with i < j is called an inversion of π if π(i) > π(j). If we think of (π(1), …, π(n)) as a sequence, then (i, j) is an inversion if the elements at positions i and j are in the wrong order. By I(π), we denote the number of inversions of π.

Example 1.14. Consider the permutation $\pi \in S_9$ given by (π(1), …, π(9)) = (3, 1, 7, 2, 4, 6, 9, 5, 8). Then the inversions of π are (1, 2), (1, 4), (3, 4), (3, 5), (3, 6), (3, 8), (6, 8), (7, 8), and (7, 9). One way of illustrating the inversions is by writing (1, …, 9) twice, one above the other. Then, we draw a line from each number i in the first line to π(i) in the second line:

Figure: the numbers 1, …, 9 in two rows, with a line from each i in the top row to π(i) in the bottom row.
Every crossing of two lines corresponds to an inversion.
◊
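The inversions in Example 1.14 can be recomputed directly from the definition. The Python sketch below stores a permutation as the tuple (π(1), …, π(n)); the function name `inversions` is an assumption for illustration.

```python
def inversions(perm):
    """Inversions (i, j), with 1-based positions, of the permutation (pi(1), ..., pi(n))."""
    n = len(perm)
    return [(i + 1, j + 1)
            for i in range(n)
            for j in range(i + 1, n)
            if perm[i] > perm[j]]


pi = (3, 1, 7, 2, 4, 6, 9, 5, 8)
print(inversions(pi))
print(len(inversions(pi)))  # the number I(pi) of inversions
```

This quadratic enumeration follows the definition literally; it is meant for checking small examples, not for efficiency.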
A permutation in $S_n$ has at most $\binom{n}{2}$ inversions, and the permutation π = (n, …, 1) with π(i) = n − i + 1 attains this upper bound. The identity mapping is the only permutation without inversions. The sign of π is $\operatorname{sgn}(\pi) = (-1)^{I(\pi)}$; it is an element of the multiplicative group {1, −1}.

Lemma 1.15. Let π : {1, …, n} → {1, …, n} be a permutation. Then, we have
\[ \operatorname{sgn}(\pi) = \prod_{1 \leqslant i < j \leqslant n} \frac{\pi(j) - \pi(i)}{j - i} \]

[…]

1.4 Rings

[…] n > 1, the n × n matrices over ℤ form a noncommutative ring. The rationals ℚ, the reals ℝ, and the complex numbers ℂ are fields.
Example 1.20. If R is a ring, then $R^R$ = { f : R → R | f is a mapping } forms a ring with pointwise addition and multiplication. Formally, for $f, g \in R^R$ the mappings $f + g \in R^R$ and $f \cdot g \in R^R$ are defined by
(f + g)(r) = f(r) + g(r)
(f ⋅ g)(r) = f(r) ⋅ g(r)
Even if R is a field, $R^R$ is not a field.
◊
A subset I ⊆ R is an additive subgroup of R if (I, +, 0) is a subgroup of (R, +, 0). Multiplicative submonoids are defined similarly. A subset S of a ring R is a subring if it is both an additive subgroup and a multiplicative submonoid. In particular, every subring is a ring with the same 0 and 1. A subring S of a field R is called a subfield if S itself is a field. An additive subgroup I ⊆ R satisfying R ⋅ I ⋅ R ⊆ I is called an ideal. An ideal can only be a subring if 1 belongs to I, but then we have I = R. The subsets {0} and R are the trivial ideals of R. We will see in Theorem 1.32 that all ideals in ℤ are of the form nℤ.

The set of additive cosets of I is R/I = { r + I | r ∈ R }. Since addition in R is commutative, R/I forms an Abelian group with addition
(r + I) + (s + I) = r + s + I
Let us define multiplication on R/I by
(r + I) ⋅ (s + I) = rs + I
As the following computation shows, this multiplication is well defined:
(r + I)(s + I) = rs + Is + rI + II ⊆ rs + I
The associativity of multiplication and the distributive property for R/I now follow from the corresponding properties in R. We call R/I the quotient ring of R modulo I. The elements of R/I are called residue classes or residues. We say r is congruent to s modulo I if r + I = s + I. In computations modulo I, we can switch back and forth between representatives and classes.

Let X ⊆ I. If every element r ∈ I can be written as $r = \sum_{x \in X} r_x x s_x$ with $r_x, s_x \in R$ such that $r_x \neq 0 \neq s_x$ for only finitely many x, then X generates I. If a finite generating set exists, then the ideal is called finitely generated. An ideal I is principal if it is generated by a single element, that is, if I = RrR for some r ∈ R. In ℤ, every ideal is principal because it is generated by the greatest common divisor of all its elements; see Theorem 1.32.
A mapping φ : R → S between rings R and S is called a (ring) homomorphism if the following conditions hold for all x, y ∈ R: (a) φ(x + y) = φ(x) + φ(y) (b) φ(x ⋅ y) = φ(x) ⋅ φ(y) (c) φ(1) = 1.
The first property means that φ is a group homomorphism with respect to addition. This also implies φ(0) = 0. The last two properties guarantee that φ is a monoid homomorphism with respect to multiplication. In particular, (c) does not follow from (b). A bijective ring homomorphism is a ring isomorphism because, due to bijectivity, the inverse mapping is a homomorphism, too. If I ⊆ R is an ideal, then R → R/I, r ↦ r + I defines a ring homomorphism.

Let φ : R → S be a ring homomorphism. Its kernel is the ideal ker(φ) = { r ∈ R | φ(r) = 0 }. The image of φ is the subring im(φ) = { φ(r) | r ∈ R } of S. Unlike im(φ), the ideal ker(φ) is not a subring unless im(φ) = {0}. In analogy to Theorem 1.10, one can show that I is an ideal if and only if I is the kernel of a ring homomorphism φ : R → S; see Exercise 1.24.

Example 1.21. Let R be a ring and r ∈ R. Then the mapping $\varphi : R^R \to R$, f ↦ f(r) is a ring homomorphism. Its kernel is ker(φ) = { f : R → R | f(r) = 0 }. The mappings with a zero at r form an ideal in $R^R$. ◊

Example 1.22. Consider the derivative operator
\[ D : \{ f : \mathbb{R} \to \mathbb{R} \mid f \text{ is differentiable} \} \to \mathbb{R}^{\mathbb{R}}, \qquad f \mapsto f' \]
We have (f + g)′ = f′ + g′, that is, D is a group homomorphism with respect to addition. On the other hand, ker(D) = { f | f is a constant mapping } is not an ideal. Therefore, D is not a ring homomorphism. ◊

The following theorem is analogous to the homomorphism theorem for groups.

Theorem 1.23 (Homomorphism theorem for rings). Let φ : R → S be a ring homomorphism. Then, φ induces the ring isomorphism
R/ker(φ) → im(φ), r + ker(φ) ↦ φ(r)

Proof: The set φ(r + ker(φ)) has only one element, namely φ(r). Therefore, r + ker(φ) ↦ φ(r) defines a ring homomorphism. From Theorem 1.12, it follows that it is a bijection, and bijective ring homomorphisms are isomorphisms.

An easy consequence of the homomorphism theorem for rings is that homomorphisms of finite fields into themselves are bijective. We show a more general statement.

Corollary 1.24.
Ring homomorphisms from a field into a ring with 0 ≠ 1 are injective. In particular, all homomorphisms between fields are injective. Proof: Let φ be a homomorphism and suppose that φ(z) = 0 for z ≠ 0. Then, 1 = φ(1) = φ(z−1 z) = φ(z−1 ) ⋅ φ(z) = φ(z−1 ) ⋅ 0 = 0, contradicting the assumption. Thus ker(φ) = {0}, and the homomorphism theorem for rings 1.23 yields injectivity. An ideal M ⊆ R is maximal if M ≠ R and there is no ideal I with M ⊊ I ⊊ R. Note that R itself is an ideal. Thus, the “largest” ideal R is not maximal in R.
Theorem 1.25. An ideal M ⊆ R in a commutative ring R is maximal if and only if R/M is a field.

Proof: Let M ⊆ R be maximal and x + M ∈ R/M a residue class with x ∉ M. We show that x + M is invertible in R/M. Note that M + xR is an ideal. Since M is maximal and x ∉ M, it follows that M + xR = R. Then, 1 can be written as 1 = m + xy for m ∈ M and y ∈ R. The product xy is congruent to 1 modulo M and thus y + M is the multiplicative inverse of x + M. This shows that each class x + M in R/M, except the zero element M, is invertible. Consequently, R/M is a field. Now for the converse, let R/M be a field. Then 1 ∉ M, because 0 ≠ 1 in all fields. Let I ⊆ R be an ideal with M ⊊ I, and consider x ∈ I \ M. Then x + M ≠ 0 + M and since R/M is a field, there is r ∈ R such that (x + M) ⋅ (r + M) = 1 + M. Then 1 + M = xr + M; therefore R = R + M = xR + M ⊆ I because I is an ideal containing both xR and M. This shows I = R, so M is maximal.

An element r of a ring R is a zero divisor if there exists s ∈ R \ {0} with rs = 0 or sr = 0. For example, in the ring ℤ/6ℤ the numbers 2 and 3 are zero divisors. Invertible elements cannot be zero divisors. A ring is a domain if 0 is the only zero divisor. By this definition, in domains we always have 0 ≠ 1. An integral domain is a commutative domain.

For an integer k ∈ ℤ and a ring $(R, +, \cdot, 0_R, 1_R)$, we define
\[ k \cdot 1_R = \underbrace{1_R + \cdots + 1_R}_{k \text{ times}} \]
We therefore write k ∈ R, meaning the element $k \cdot 1_R$. In this notation, for 0, 1 ∈ ℤ, we have $0 = 0_R \in R$ and $1 = 1_R \in R$. Two cases are possible. Either all positive integers in R are different from zero or there is a smallest positive number n with n = 0 in R. In the first case, we say that R has characteristic 0; in the second case, the characteristic is n with n > 0. In any case, we denote the characteristic of R by char(R). The ideal char(R) ⋅ ℤ is the kernel of the homomorphism ℤ → R, $k \mapsto k \cdot 1_R$. The homomorphism theorem for rings (Theorem 1.23) shows that ℤ/char(R)ℤ is isomorphic to a subring of R. Every subring S of R has the same characteristic as R.

Lemma 1.26. Let R be a domain. Then, either char(R) = 0 or char(R) is prime.

Proof: Suppose that char(R) = mn with m, n > 1. Since mn = 0 and m ≠ 0 ≠ n in R, both m and n are zero divisors in R, which is impossible in a domain.

Fields have no zero divisors other than 0, and therefore their characteristic is either zero or a prime number. Fields of characteristic zero contain ℤ as a subring, and since all s ≠ 0 are invertible, fractions $\frac{r}{s} = rs^{-1}$ are elements of the field for all r, s ∈ ℤ with s ≠ 0. Thus, these fields contain the rational numbers ℚ as a uniquely determined subfield. Other examples of fields of characteristic zero are ℝ, ℂ, or $\{ a + b\sqrt{-1} \mid a, b \in \mathbb{Q} \} \subsetneq \mathbb{C}$. All fields of characteristic p for a prime number p contain ℤ/pℤ. Therefore, the fields ℚ and ℤ/pℤ are minimal with respect to set inclusion; they are called prime fields.
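For the rings ℤ/nℤ, zero divisors and the characteristic can be computed by exhaustive search. The following Python sketch assumes n ⩾ 2 and represents residue classes by the numbers 0, …, n − 1; the function names are hypothetical.

```python
def zero_divisors(n):
    """Nonzero zero divisors of Z/nZ: all r != 0 with r*s = 0 for some s != 0."""
    return [r for r in range(1, n)
            if any((r * s) % n == 0 for s in range(1, n))]


def characteristic(n):
    """Smallest k >= 1 with k * 1 = 0 in Z/nZ, found by repeated addition of 1."""
    k, acc = 1, 1 % n
    while acc != 0:
        acc = (acc + 1) % n
        k += 1
    return k


print(zero_divisors(6))   # 2 and 3 from the example above, and also 4
print(zero_divisors(7))   # empty: no nonzero zero divisors
print(characteristic(6), characteristic(7))
```

In the composite case n = 6, the ring has nonzero zero divisors and composite characteristic, which is consistent with Lemma 1.26.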
The binomial coefficient $\binom{x}{k}$ for x ∈ ℂ and k ∈ ℤ is defined as follows:
\[ \binom{x}{k} = \begin{cases} \dfrac{x \cdot (x-1) \cdots (x-k+1)}{k!} & \text{for } k \geqslant 0 \\ 0 & \text{for } k < 0 \end{cases} \]
For k ⩾ 0, we have exactly k factors in both the numerator and the denominator; in particular, for k = 0 both products are empty and thus evaluate to the neutral element 1. For n, k ∈ ℕ with k ⩽ n, we obtain $\binom{n}{k} = \frac{n!}{k!(n-k)!} = \binom{n}{n-k}$. For all x ∈ ℂ and k ∈ ℤ, the following addition theorem holds:
\[ \binom{x}{k} = \binom{x-1}{k-1} + \binom{x-1}{k} \]
This can be seen as follows. For k < 0, both sides are 0, whereas for k = 0, both sides evaluate to 1. Now let k > 0. Then,
\[ \binom{x}{k} = \frac{x}{k} \binom{x-1}{k-1} = \frac{x-k}{k} \binom{x-1}{k-1} + \frac{k}{k} \binom{x-1}{k-1} = \binom{x-1}{k} + \binom{x-1}{k-1} \]
For n ∈ ℕ, we have $\binom{n}{0} = \binom{n}{n} = 1$. Now by induction, using the addition theorem, we see that the binomial coefficient $\binom{n}{k}$ is an integer for all n, k ∈ ℕ. In particular, the binomial coefficients for n ∈ ℕ and k ∈ ℤ can be interpreted as multiples of the neutral element $1_R$ in any ring R.

Theorem 1.27 (Binomial theorem). Let R be a commutative ring. For all x, y ∈ R and all n ∈ ℕ, we have $(x + y)^n = \sum_k \binom{n}{k} x^k y^{n-k}$.

Proof: The sum on the right-hand side runs over all k ∈ ℤ, but almost all of these summands are zero. This convention is often helpful, because index shifts are made much easier. The proof is by induction on n. For n = 0, both sides reduce to the neutral element $1_R$. Now let n > 0. Then
\begin{align*}
(x + y)^n &= (x + y)(x + y)^{n-1} = (x + y) \sum_k \binom{n-1}{k} x^k y^{n-1-k} \\
&= \Big( \sum_k \binom{n-1}{k} x^{k+1} y^{n-1-k} \Big) + \Big( \sum_k \binom{n-1}{k} x^k y^{n-k} \Big) \\
&= \Big( \sum_k \binom{n-1}{k-1} x^k y^{n-k} \Big) + \Big( \sum_k \binom{n-1}{k} x^k y^{n-k} \Big) \\
&= \sum_k \Big( \binom{n-1}{k-1} + \binom{n-1}{k} \Big) x^k y^{n-k} = \sum_k \binom{n}{k} x^k y^{n-k}
\end{align*}
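The definition of the binomial coefficient and the addition theorem can be tested with exact arithmetic. The following Python sketch restricts x to rational numbers (an assumption; the text allows x ∈ ℂ) and uses the hypothetical helper `binom`.

```python
from fractions import Fraction


def binom(x, k):
    """Binomial coefficient x(x-1)...(x-k+1)/k! for rational x and integer k."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for i in range(k):
        # Multiply by the factor (x - i) and divide by (i + 1).
        result = result * (Fraction(x) - i) / (i + 1)
    return result


# Addition theorem for several rational x and integers k, including k <= 0.
for x in [Fraction(1, 2), Fraction(-3), Fraction(7, 3), 5]:
    for k in range(-2, 6):
        assert binom(x, k) == binom(x - 1, k - 1) + binom(x - 1, k)

# Binomial theorem over the integers; the summands with k < 0 or k > n vanish.
for x, y, n in [(2, 3, 5), (-1, 4, 6), (7, 0, 3)]:
    assert (x + y) ** n == sum(binom(n, k) * x**k * y**(n - k) for k in range(n + 1))

print(binom(5, 2), binom(Fraction(1, 2), 2))
```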
The binomial theorem is an important tool for the consideration of mappings $r \mapsto r^p$ in commutative rings.

Theorem 1.28. Let R be a commutative ring with prime characteristic p. Then, $r \mapsto r^p$ defines a ring homomorphism.

Proof: Clearly, $1^p = 1$ and $(rs)^p = r^p s^p$. It remains to show that $(r + s)^p = r^p + s^p$. By the binomial theorem, $(r + s)^p = \sum_k \binom{p}{k} r^k s^{p-k}$. All binomial coefficients $\binom{p}{k}$ for 1 ⩽ k ⩽ p − 1 are divisible by p. Since the characteristic of R is p, all these binomial coefficients are zero in R and thus $(r + s)^p = \binom{p}{0} r^0 s^p + \binom{p}{p} r^p s^0 = s^p + r^p$.

The homomorphism in Theorem 1.28 is usually referred to as the Frobenius homomorphism, named after Ferdinand Georg Frobenius (1849–1917). It is especially interesting for fields, because in this case it is, by Corollary 1.24, injective and thus, for finite fields, necessarily also bijective. As an application of Theorem 1.28 for R = ℤ/pℤ, we prove Fermat’s little theorem (named after Pierre de Fermat, 160?–1665). In particular, on prime fields ℤ/pℤ the Frobenius homomorphism is the identity.

Theorem 1.29 (Fermat’s little theorem). Let p be prime. Then, $a^p \equiv a \bmod p$ for all a ∈ ℤ.

Proof: It suffices to prove the claim for a ∈ ℕ. We proceed by induction on a. For a = 0, the statement is true. Now, $(a + 1)^p \equiv a^p + 1^p \equiv a + 1 \bmod p$. The first congruence follows from Theorem 1.28 and the second from the induction hypothesis.

In particular, under the assumptions of Theorem 1.29, for all a ∈ ℤ which are invertible in ℤ/pℤ, we obtain $a^{p-1} \equiv 1 \bmod p$.
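Theorems 1.28 and 1.29 are easy to test for small primes with Python's built-in modular exponentiation `pow(a, n, m)`. In the sketch below, the helper `satisfies_fermat`, the search bound, and the sample values p = 7 and n = 6 are assumptions for illustration.

```python
from math import comb


def satisfies_fermat(n, bound=50):
    """Check a^n = a (mod n) for all integers a with |a| <= bound."""
    return all(pow(a, n, n) == a % n for a in range(-bound, bound + 1))


p = 7

# The binomial coefficients (p choose k) with 1 <= k <= p - 1 are divisible by p ...
assert all(comb(p, k) % p == 0 for k in range(1, p))

# ... hence r -> r^p is additive on Z/pZ (Theorem 1.28) ...
assert all(pow(r + s, p, p) == (pow(r, p, p) + pow(s, p, p)) % p
           for r in range(p) for s in range(p))

# ... and Fermat's little theorem holds for p (Theorem 1.29).
assert satisfies_fermat(p)

# For the composite modulus 6 the congruence fails, for example at a = 2.
assert not satisfies_fermat(6)
print(pow(2, 6, 6))
```

The check only covers finitely many values of a, so it illustrates the theorem rather than proving it.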
1.5 Modular arithmetic

Let us start with a quick review of the most basic terms. The greatest common divisor of two integers k and ℓ is denoted by gcd(k, ℓ); it is the largest natural number that divides both k and ℓ. The greatest common divisor of k and 0 is defined as the number |k|. Two numbers are coprime if their greatest common divisor is 1. For computations in the quotient ring ℤ/nℤ, the following notation is used for integers k, ℓ, and n:
k ≡ ℓ mod n
It means that k + nℤ = ℓ + nℤ, that is, k and ℓ differ by a multiple of n. If this holds, we say that k is congruent to ℓ modulo n; the integer n is called the modulus. By k mod n, we denote the unique number r ∈ {0, …, |n| − 1} satisfying k ≡ r mod n. Frequently, k mod n is taken as the representative of the residue class k + nℤ. By (ℤ/nℤ)∗, we denote the group of units of the ring ℤ/nℤ. These are the residue classes that have a multiplicative inverse.
1.5.1 Euclidean algorithm

The Euclidean algorithm (Euclid of Alexandria, around 300 BC) is an efficient method for computing the greatest common divisor. Since gcd(k, ℓ) = gcd(−k, ℓ) = gcd(ℓ, k), it suffices to consider natural numbers 0 < k ⩽ ℓ. Let ℓ = qk + r for 0 ⩽ r < k. The integer r is called the remainder and we have r = ℓ mod k. Each number dividing k and the remainder r also divides the sum ℓ = qk + r. Each number dividing k and ℓ also divides the difference r = ℓ − qk. This yields a recursive version of the Euclidean algorithm:

/∗ Assume that k ⩾ 0, ℓ ⩾ 0 ∗/
function gcd(k, ℓ)
begin
  if k = 0 then return ℓ
  else return gcd(ℓ mod k, k) fi
end

If k ⩽ ℓ, then after two recursive calls, the smaller parameter is less than k/2. In particular, the recursion depth is in O(log k). The next theorem is often named after Étienne Bézout (1730–1783) who showed a corresponding statement for polynomials.

Theorem 1.30 (Bézout’s lemma). Let k, ℓ ∈ ℤ. Then there exist a, b ∈ ℤ such that
gcd(k, ℓ) = ak + bℓ

Proof: We may assume that ℓ > k > 0; all other cases are either trivial or can be reduced to this case. Let $r_0 = \ell$ and $r_1 = k$. The Euclidean algorithm successively computes remainders $r_0 > r_1 > r_2 > \cdots > r_n > r_{n+1} = 0$ satisfying $r_{i-1} = q_i r_i + r_{i+1}$ for appropriate $q_i \in \mathbb{N}$. It follows that $\gcd(k, \ell) = \gcd(r_{i+1}, r_i) = \gcd(0, r_n) = r_n$. We now show that for all i ∈ {0, …, n} there are integers $a_i$ and $b_i$ such that $a_i r_{i+1} + b_i r_i = r_n$. For i = n, we can set $a_n = 0$ and $b_n = 1$. Now let i < n and suppose that $a_{i+1}$ and $b_{i+1}$ are already defined, that is, $a_{i+1} r_{i+2} + b_{i+1} r_{i+1} = r_n$. With $r_{i+2} = r_i - q_{i+1} r_{i+1}$, we obtain $(b_{i+1} - a_{i+1} q_{i+1}) r_{i+1} + a_{i+1} r_i = r_n$. Thus, $a_i = b_{i+1} - a_{i+1} q_{i+1}$ and $b_i = a_{i+1}$ have the desired property.

The proof presented above yields the following procedure: the extended Euclidean algorithm computes gcd(k, ℓ) and, additionally, two numbers a and b with the property ak + bℓ = gcd(k, ℓ).

/∗ Assume that k ⩾ 0, ℓ ⩾ 0 ∗/
/∗ The result is (a, b, t) with ak + bℓ = t = gcd(k, ℓ) ∗/
function ext-gcd(k, ℓ)
begin
  if k = 0 then return (0, 1, ℓ)
  else
    (a, b, t) := ext-gcd(ℓ mod k, k);
    return (b − a ⋅ ⌊ℓ/k⌋, a, t)
  fi
end
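The two procedures translate almost literally into Python. The sketch below assumes nonnegative inputs, as in the pseudocode; the name `ext_gcd` replaces ext-gcd because hyphens are not allowed in Python identifiers, and the parameter `l` stands for ℓ.

```python
def gcd(k, l):
    """Greatest common divisor of k, l >= 0, following the recursive pseudocode."""
    return l if k == 0 else gcd(l % k, k)


def ext_gcd(k, l):
    """Return (a, b, t) with a*k + b*l == t == gcd(k, l) for k, l >= 0."""
    if k == 0:
        return (0, 1, l)
    a, b, t = ext_gcd(l % k, k)
    # Here a*(l mod k) + b*k == t and l mod k == l - (l // k)*k.
    return (b - a * (l // k), a, t)


a, b, t = ext_gcd(12, 42)
assert a * 12 + b * 42 == t == gcd(12, 42)
print(a, b, t)
```

Because the recursion depth grows only logarithmically in the smaller argument, the recursive formulation should stay well within Python's default recursion limit for inputs of ordinary size.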
Let n ∈ ℕ. For each number k that is coprime to n, the extended Euclidean algorithm provides two integers a, b with ak + bn = 1. Modulo n, we obtain ak ≡ 1 mod n. Thus, a is a multiplicative inverse of k. We have a closer look at (ℤ/nℤ)∗, the invertible elements of ℤ/nℤ, in the following theorem:

Theorem 1.31. Let n ∈ ℕ. Then the following statements hold:
(a) (ℤ/nℤ)∗ = { k + nℤ | gcd(k, n) = 1 }
(b) The number n is prime if and only if ℤ/nℤ is a field.
(c) For every k ∈ ℤ, the mapping ℤ/nℤ → ℤ/nℤ, x ↦ kx is bijective if and only if gcd(k, n) = 1.

Proof: (a) We have k ∈ (ℤ/nℤ)∗ if and only if 1 can be written in the form 1 = kℓ + mn. By Theorem 1.30, this is equivalent to gcd(k, n) = 1.
(b) The ring ℤ/nℤ is a field if and only if all nonzero elements are invertible. By (a) this means that all numbers from 1 to n − 1 are coprime to n. This in turn is equivalent to primality.
(c) If gcd(k, n) = 1, then k is invertible in ℤ/nℤ. Hence, the multiplication with k has an inverse mapping, thereby showing that it is a bijective function. For the converse, suppose that x ↦ kx is surjective. Then, the element 1 has a preimage, and thus there exists x ∈ ℤ with kx ≡ 1 mod n. It follows that kx + ℓn = 1 for some ℓ ∈ ℤ. This shows gcd(k, n) = 1.
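Parts (a) and (c) of Theorem 1.31 can be checked exhaustively for a small modulus. The following Python sketch uses the built-in `pow(k, -1, n)`, which computes modular inverses in Python 3.8 and later; the helper `units` and the modulus n = 12 are assumptions for illustration.

```python
from math import gcd


def units(n):
    """Representatives in {0, ..., n-1} of the units of Z/nZ (Theorem 1.31 (a))."""
    return [k for k in range(n) if gcd(k, n) == 1]


n = 12

# Every k coprime to n has a multiplicative inverse modulo n.
for k in units(n):
    a = pow(k, -1, n)
    assert (a * k) % n == 1

# Multiplication by k permutes Z/nZ exactly when gcd(k, n) = 1 (Theorem 1.31 (c)).
for k in range(n):
    is_bijective = len({(k * x) % n for x in range(n)}) == n
    assert is_bijective == (gcd(k, n) == 1)

print(units(n))
```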
1.5.2 Ideals in the integers Ideals are a concept from algebra which also plays an important role in modular arithmetic. The interest in ideals of ℤ comes from the fact that divisibility in ℤ can be formulated in terms of subset relations of ideals. We have k | ℓ ⇔ ℓℤ ⊆ kℤ From the following theorem, we obtain a converse of this statement: each subset relation of ideals in ℤ represents some divisibility of numbers. Theorem 1.32. Let I be an ideal of ℤ. Then, there is a unique number n ∈ ℕ such that I = nℤ. Here, n is the greatest common divisor of all numbers in I. Particularly, kℤ + ℓℤ = gcd(k, ℓ) ℤ Proof: If I = {0}, then n = 0 satisfies the assertion. Let now I ≠ {0}. For all a, b ∈ I, we can find p, q ∈ ℤ with gcd(a, b) = ap + bq. So, gcd(a, b) ∈ I if and only if a, b ∈ I. The claim now follows from the observation that the greatest common divisor of all numbers in I is the uniquely determined smallest positive number in I. Actually, the proof of Theorem 1.32 shows that all ideals in a ring are generated by a single element as soon as division with the remainder is available. This, for example,
is also the case in polynomial rings over fields; see Section 1.6. Another advantage of the algebraic point of view on modular arithmetic is that by using algebraic concepts one might find simpler formulations and new interpretations for certain interrelations and contexts.

If I and J are ideals of a ring R, then I ∩ J is an ideal, too. For k, ℓ ∈ ℤ, we denote by lcm(k, ℓ) ∈ ℕ the least common multiple of k and ℓ. A tight relation between lcm and gcd is given by lcm(k, ℓ) ⋅ gcd(k, ℓ) = |k| ⋅ |ℓ|.

Theorem 1.33. kℤ ∩ ℓℤ = lcm(k, ℓ)ℤ

Proof: Since kℤ ∩ ℓℤ is an ideal of ℤ, by Theorem 1.32, there exists n ∈ ℕ with kℤ ∩ ℓℤ = nℤ. From nℤ ⊆ kℤ and nℤ ⊆ ℓℤ, we obtain k | n and ℓ | n. So, n is a common multiple of k and ℓ and thus n ⩾ lcm(k, ℓ). Let t be any common multiple of k and ℓ. Then, tℤ ⊆ kℤ ∩ ℓℤ = nℤ and consequently n | t. This shows n ⩽ lcm(k, ℓ) and therefore n is the least common multiple of k and ℓ.
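The identities kℤ + ℓℤ = gcd(k, ℓ)ℤ and kℤ ∩ ℓℤ = lcm(k, ℓ)ℤ can be checked experimentally on a finite window of integers. The following Python sketch does this by brute force; the helper names `multiples`, `sum_ideal` and `lcm` as well as the window size are our own assumptions, chosen only for illustration.

```python
from math import gcd


def lcm(k, l):
    """Least common multiple in N, using lcm * gcd = |k| * |l|."""
    return abs(k * l) // gcd(k, l) if k and l else 0


def multiples(n, bound):
    """Elements of the ideal nZ inside the window [-bound, bound]."""
    if n == 0:
        return {0}
    return {x for x in range(-bound, bound + 1) if x % n == 0}


def sum_ideal(k, l, bound):
    """Elements of kZ + lZ inside the window [-bound, bound].

    The coefficient range is large enough to reach every element of the
    window that can be written as k*s + l*t.
    """
    reach = bound + abs(k) + abs(l)
    found = set()
    for s in range(-reach, reach + 1):
        for t in range(-reach, reach + 1):
            v = k * s + l * t
            if abs(v) <= bound:
                found.add(v)
    return found
```

With k = 12 and ℓ = 18, the window of kℤ + ℓℤ coincides with that of 6ℤ, and the intersection of the windows of 12ℤ and 18ℤ coincides with that of 36ℤ.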
1.5.3 Chinese remainder theorem

For coprime numbers k and ℓ, the quotient ring ℤ/kℓℤ can be decomposed into the components ℤ/kℤ and ℤ/ℓℤ. This is exactly what is stated by the Chinese remainder theorem. The most important part of the statement is the following lemma.

Lemma 1.34. For coprime numbers k, ℓ ∈ ℤ, the following mapping is surjective:

π : ℤ → ℤ/kℤ × ℤ/ℓℤ, x ↦ (x + kℤ, x + ℓℤ)

Furthermore, π induces a bijection between ℤ/kℓℤ and ℤ/kℤ × ℤ/ℓℤ by the mapping (x mod kℓ) ↦ (x mod k, x mod ℓ).

Proof: Consider (x + kℤ, y + ℓℤ). Since k and ℓ are coprime, there are numbers a, b ∈ ℤ with ak + bℓ = 1. Thus, bℓ ≡ 1 mod k and ak ≡ 1 mod ℓ. For x, y ∈ ℤ, we obtain the following properties of yak + xbℓ:

yak + xbℓ ≡ 0 + x ⋅ 1 ≡ x mod k
yak + xbℓ ≡ y ⋅ 1 + 0 ≡ y mod ℓ

But then π(yak + xbℓ) = (x + kℤ, y + ℓℤ) and π is surjective. We have π(x′) = π(x) for all x′ ∈ x + kℓℤ, so (x mod kℓ) ↦ (x mod k, x mod ℓ) is well defined. Therefore, π induces a surjection of ℤ/kℓℤ onto ℤ/kℤ × ℤ/ℓℤ. Finally, we observe that both ℤ/kℓℤ and ℤ/kℤ × ℤ/ℓℤ have kℓ elements. Thus, the induced mapping is bijective.

If R1 and R2 are rings, we can define a ring structure on the Cartesian product R1 × R2 by component-wise addition and multiplication. More specifically, addition is defined by (x1, y1) + (x2, y2) = (x1 + x2, y1 + y2). The zero element is given by the pair (0, 0) ∈ R1 × R2. Similarly, multiplication is defined by (x1, y1) ⋅ (x2, y2) = (x1 ⋅ x2, y1 ⋅ y2) and the neutral element is (1, 1). This ring is called the direct product of R1 and R2.

In a somewhat more elementary form, the following theorem is originally due to Sun Tzu in the 3rd century. However, it was only published in 1247 by Qin Jiushao (ca. 1202–1261).

Theorem 1.35 (Chinese remainder theorem). Let k, ℓ ∈ ℤ be coprime. Then the following mapping defines an isomorphism of rings:

ℤ/kℓℤ → ℤ/kℤ × ℤ/ℓℤ, x + kℓℤ ↦ (x + kℤ, x + ℓℤ)

Proof: The mapping is compatible with addition and multiplication and corresponds to the bijective mapping from Lemma 1.34.

A consequence of the Chinese remainder theorem is that, for the coprime numbers k and ℓ, the multiplicative group (ℤ/kℓℤ)∗ is isomorphic to (ℤ/kℤ)∗ × (ℤ/ℓℤ)∗.
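The proof of Lemma 1.34 is constructive: the preimage of (x + kℤ, y + ℓℤ) is yak + xbℓ, where ak + bℓ = 1. A minimal Python sketch of this construction follows. The function names `bezout`, `crt_pair` and `pi` are hypothetical and only chosen for this illustration; positive coprime moduli are assumed.

```python
def bezout(k, l):
    """Return (g, a, b) with a*k + b*l == g == gcd(k, l) for k, l >= 0."""
    if l == 0:
        return k, 1, 0
    g, a, b = bezout(l, k % l)
    return g, b, a - (k // l) * b


def crt_pair(x, y, k, l):
    """Smallest z >= 0 with z = x mod k and z = y mod l, for coprime k, l > 0."""
    g, a, b = bezout(k, l)
    if g != 1:
        raise ValueError("k and l must be coprime")
    # As in the proof: y*a*k + x*b*l is congruent to x mod k and to y mod l.
    return (y * a * k + x * b * l) % (k * l)


def pi(z, k, l):
    """The mapping z -> (z mod k, z mod l) from Lemma 1.34."""
    return (z % k, z % l)
```

For k = 5 and ℓ = 7, the call `crt_pair(2, 3, 5, 7)` returns 17, and indeed 17 ≡ 2 mod 5 and 17 ≡ 3 mod 7. Applying `pi` to all z in {0, …, 34} produces 35 different pairs, which reflects the bijection of the lemma.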
1.5.4 Euler’s totient function

In this section, we deal with the number of elements in (ℤ/nℤ)∗. To this end, we define Euler’s totient function φ(n) as

φ(n) = |(ℤ/nℤ)∗|

By Theorem 1.31, the function φ(n) counts the integers in the range from 1 to n which are coprime to n. Thus φ(1) = 1. The Chinese remainder theorem yields φ(kℓ) = φ(k)φ(ℓ) for the coprime numbers k and ℓ. This is due to the fact that x + kℓℤ is invertible if and only if both x + kℤ and x + ℓℤ are invertible. To determine the value of the totient function for arbitrary numbers, we only have to find the value of φ(n) for the cases where n is a prime power p^k. So, let p be prime and k ⩾ 1. From the numbers 1, …, p^k exactly the p^{k−1} numbers p, 2p, …, p^k are divisible by p and therefore not coprime to p^k. The remaining p^k − p^{k−1} numbers are coprime to p^k. This shows

φ(p^k) = (p − 1)p^{k−1} = (1 − p^{−1})p^k

Hence, we can compute φ(n) for any number n as soon as we know its prime factorization. An important property of Euler’s totient function is due to the following generalization of Fermat’s little theorem.

Theorem 1.36 (Euler). If gcd(a, n) = 1, then a^{φ(n)} ≡ 1 mod n.
Proof: This is an immediate consequence of Corollary 1.5 since a ∈ (ℤ/nℤ)∗ and φ(n) is the order of the group (ℤ/nℤ)∗.

In particular, we can determine the inverse of any element a ∈ (ℤ/nℤ)∗ by computing a^{φ(n)−1}. We write ∑_{d|n} φ(d) for the sum over all φ(d) such that d is a positive divisor of n.

Theorem 1.37. ∑_{d|n} φ(d) = n

Proof: Consider the following set of n fractions:

N = { 0/n, 1/n, …, (n−1)/n }

By cancellation, we can write every fraction m/n as k/d for the coprime numbers k and d such that d | n. Hence, we obtain

N = { k/d | d | n, 0 ⩽ k < d, gcd(k, d) = 1 }

Grouping the fractions by their denominator yields the following partition of N into disjoint sets:

N = ⋃_{d|n} { k/d | 0 ⩽ k < d, gcd(k, d) = 1 }

Finally, |{ k/d | 0 ⩽ k < d, gcd(k, d) = 1 }| = |{ k | 0 ⩽ k < d, gcd(k, d) = 1 }| = φ(d) completes the proof.

The following theorem gives an important connection between group theory and modular arithmetic.

Theorem 1.38. Let G be a cyclic group of order n and d be a positive divisor of n. Then, G has φ(d) elements of order d.

Proof: Let g ∈ G be a generator. By ψ(d), we denote the number of elements of order d. Since the order of every element in G divides n, we have ∑_{d|n} ψ(d) = n. By Theorem 1.37, it suffices to show that ψ(d) ⩾ φ(d) for every divisor d of n. Consider k ∈ {1, …, d − 1} with gcd(k, d) = 1. Let

h = g^{kn/d}

We have h^d = (g^n)^k = 1^k = 1. Since n does not divide ikn/d for i ∈ {1, …, d − 1}, we have h^i = g^{ikn/d} ≠ 1 for all i ∈ {1, …, d − 1}. Thus, the order of h is d. Every number k ∈ (ℤ/dℤ)∗ defines an element of order d such that different numbers define different group elements. This completes the proof of ψ(d) ⩾ φ(d).
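The statements of this subsection lend themselves to small numerical experiments. The following Python sketch computes φ(n) from the prime factorization via φ(p^k) = (p − 1)p^{k−1}, and counts the elements of a given order in the additive cyclic group ℤ/nℤ. Trial division and all function names are our own choices for illustration and are only suitable for small n.

```python
from math import gcd


def prime_factorization(n):
    """Return {prime: exponent} for n >= 1 by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def phi(n):
    """Euler's totient via phi(p^k) = (p - 1) * p^(k - 1) and multiplicativity."""
    result = 1
    for p, k in prime_factorization(n).items():
        result *= (p - 1) * p ** (k - 1)
    return result


def divisors(n):
    """All positive divisors of n."""
    return [d for d in range(1, n + 1) if n % d == 0]


def count_elements_of_order(n, d):
    """Number of x in the additive group Z/nZ whose order equals d."""
    # The order of x in Z/nZ is n / gcd(x, n).
    return sum(1 for x in range(n) if n // gcd(x, n) == d)
```

With these helpers one can confirm, for small n, that `phi(n)` agrees with a direct count of the residues coprime to n, that the values `phi(d)` over all divisors d of n sum to n (Theorem 1.37), and that `count_elements_of_order(n, d)` equals `phi(d)` (Theorem 1.38).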
1.6 Polynomials and formal power series

Throughout this section, R is a commutative ring. A formal power series with coefficients in R is an infinite sequence (a_0, a_1, …) where all a_i are from R. A polynomial is a formal power series with a_i = 0 for almost all i ∈ ℕ. As usual by almost all, we mean “all but finitely many.” The degree of a polynomial f = (a_0, a_1, …) is denoted by deg(f) and it is the maximum index d ∈ ℕ with a_d ≠ 0. The polynomial f with a_i = 0 for all i ∈ ℕ is called the zero polynomial and we define its degree to be −∞. Polynomials of degree at most 0 are called constants; they are identified with the elements of R. We usually write a formal power series (a_0, a_1, …) as a formal sum

f(X) = ∑_{i⩾0} a_i X^i

Here, we read X as a formal symbol or as a variable. A polynomial can be written as a finite formal sum

f(X) = ∑_{i=0}^{d} a_i X^i

If f is not the zero polynomial, we may require that d = deg(f) and a_d ≠ 0. In this case, we call a_d the leading coefficient of the polynomial f(X). The leading coefficient of the zero polynomial is zero by definition. A formal power series (a_0, a_1, …) can also be interpreted as a mapping ℕ → R with i ↦ a_i. We denote the set of formal power series by R⟦X⟧ and the set of polynomials by R[X]. The latter is called the polynomial ring over R. The set of formal power series R⟦X⟧ is a ring with the operations

∑_{i⩾0} a_i X^i + ∑_{i⩾0} b_i X^i = ∑_{i⩾0} (a_i + b_i) X^i

∑_{i⩾0} a_i X^i ⋅ ∑_{j⩾0} b_j X^j = ∑_{i,j⩾0} a_i X^i ⋅ b_j X^j = ∑_{k⩾0} ( ∑_{i+j=k} a_i b_j ) X^k

In this ring, the polynomials form a subring but not an ideal. A polynomial f ∈ R[X] can also be interpreted as a polynomial mapping f̃ : R → R by substituting values for the variable X:

f̃ : R → R, r ↦ ∑_{i⩾0} a_i r^i

The mapping f ↦ f̃ is called the evaluation homomorphism. To simplify notation, we write f(r) instead of f̃(r). The notation f(r) is extended to formal power series, provided that the value of the infinite series ∑_{i⩾0} a_i r^i is well defined in the ring R. This depends on the notion of convergence in R. In any case, for power series we have f(0) = a_0. The mappings R⟦X⟧ → R with f ↦ f(0) and R[X] → R^R with f ↦ f̃ are ring homomorphisms, which in general, however, are not injective. If R is finite, then R^R is also finite, whereas R[X] for R ≠ {0} is infinite. If R is an infinite field, for example, R = ℚ or R = ℝ, then the polynomial mapping f̃ corresponding to f can only be the constant zero function if all the coefficients
of f are zero. In this case, we can therefore identify polynomials and polynomial mappings. Since very often polynomials are only considered over ℚ or ℝ, the distinction between polynomials and polynomial mappings might look artificial and unnecessary. But for finite fields it is necessary, as the following example shows.

Example 1.39. Let R = ℤ/2ℤ and f(X) = X^2 + X ∈ R[X]. Then, f(X) has degree 2 and it is not the zero polynomial. But if f(X) is interpreted as the mapping f̃ : ℤ/2ℤ → ℤ/2ℤ, r ↦ r^2 + r, we find that f̃ is the zero function since f(0) = f(1) = 0 for 0, 1 ∈ ℤ/2ℤ. ◊

An advantage of this purely formal point of view is that invertibility of formal power series does not depend on convergence. Moreover, certain properties of the ring R can directly be transferred to the ring of formal power series R⟦X⟧.

Theorem 1.40. The following properties transfer from R to R⟦X⟧:
(a) R⟦X⟧ is an integral domain if and only if the ring R is.
(b) f ∈ R⟦X⟧ is invertible if and only if f(0) ∈ R is.

Proof: (a) If R has a zero divisor, then so does R⟦X⟧ because R is a subring. Now let R be an integral domain and f = (a_0, a_1, …) and g = (b_0, b_1, …) be two elements of R⟦X⟧ \ {0}. Let a_i be the first nonzero coefficient of f and b_j the first nonzero coefficient of g. Then, f ⋅ g = (c_0, c_1, …) with

c_{i+j} = ∑_{k+ℓ=i+j} a_k b_ℓ = a_i b_j ≠ 0
In particular, f ⋅ g is not the zero polynomial. Thus, R⟦X⟧ is an integral domain.
(b) If f is invertible, so must be f(0) since (f ⋅ g)(0) = f(0) ⋅ g(0). For the converse, let f = (a_0, a_1, …) and assume that f(0) = a_0 ∈ R is invertible. Define b_0 by a_0 ⋅ b_0 = 1. Now, let k ⩾ 1 and (b_0, …, b_{k−1}) be already defined. Then define b_k by a_0 ⋅ b_k = − ∑ a_i b_{k−i} 0 0, the ring F[X]/f is not zero. For the converse, let f be not irreducible, that is, f is constant or a product of polynomials of smaller degree. First, suppose that f is constant. In the case of f = 0, the quotient F[X]/0 is isomorphic to F[X] and the latter is not a field since polynomials having roots cannot be inverted. If f is a nonzero constant, then F[X]/f is the zero ring and thus not a field. If the polynomial f can be written as a product of two polynomials of strictly smaller degree, f = gh, then g ≢ 0 mod f and h ≢ 0 mod f but gh ≡ 0 mod f. Thus F[X]/f contains zero divisors and therefore is not a field.

Next, we examine the size of F[X]/f.

Theorem 1.52. Let F be a field and f ∈ F[X] \ {0}. Then, F[X]/f as an additive group is isomorphic to F^{deg(f)}.
Proof: Let d = deg(f) and F_d = { g ∈ F[X] | deg(g) < d }. As an additive group F_d is isomorphic to F^d since a polynomial of degree less than d is defined by the sequence of coefficients (a_0, …, a_{d−1}) ∈ F^d. The homomorphism g ↦ g mod f from F_d to F[X]/f is surjective because, by Theorem 1.44, the residue classes of F[X]/f have representatives of degree less than d. It is injective since g = g mod f for all g ∈ F_d. This proves the theorem.

Example 1.53. Let F = ℤ/2ℤ and f = X^3 + X + 1 ∈ F[X]. In F[X]/f, we have f ≡ 0 mod f. Since −1 = +1 holds in F, this is equivalent to X^3 ≡ X + 1 mod f. This allows us to reduce exponents i ⩾ 3 in X^i. For example, we obtain

X^6 + X^4 ≡ X^3 ⋅ (X + 1) + X^4 ≡ X^3 ≡ X + 1 mod f

Since deg(f) = 3 and f(r) ≠ 0 for all r ∈ F, f is irreducible in F[X]. Using Theorems 1.51 and 1.52, we finally see that F[X]/f is a field with eight elements. ◊

Example 1.54. The Advanced Encryption Standard (AES) works with the irreducible polynomial X^8 + X^4 + X^3 + X + 1 of degree 8 over F = ℤ/2ℤ. Here, all computations are done in the field F[X]/(X^8 + X^4 + X^3 + X + 1). This field contains 256 elements that can be represented by 1 byte each, the ith bit being the coefficient of X^i for 0 ⩽ i ⩽ 7. The addition operation is the bitwise exclusive-or of 2 bytes and the multiplication rule follows directly from the polynomial representation given above. ◊
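The byte representation in Example 1.54 can be made concrete with a short Python sketch. It encodes a polynomial over ℤ/2ℤ as an integer whose ith bit is the coefficient of X^i, and reduces modulo f after every multiplication by X. The names `gf_mul` and `gf_pow` and the default parameters are our own assumptions; only the modulus 0x11B, which encodes X^8 + X^4 + X^3 + X + 1, is taken from the example.

```python
AES_MODULUS = 0x11B  # bit pattern of X^8 + X^4 + X^3 + X + 1


def gf_mul(a, b, modulus=AES_MODULUS, degree=8):
    """Multiply a and b in F_2[X]/f, where f has the given bit pattern and degree.

    Both arguments must already be reduced, that is, smaller than 2**degree.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in F_2[X] is bitwise exclusive-or
        b >>= 1
        a <<= 1                  # multiply a by X
        if (a >> degree) & 1:
            a ^= modulus         # reduce: replace X^degree using f = 0
    return result


def gf_pow(a, e, modulus=AES_MODULUS, degree=8):
    """Compute a^e in F_2[X]/f by repeated multiplication (e >= 0)."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, a, modulus, degree)
    return result
```

For the field of Example 1.53, one passes `modulus=0b1011` and `degree=3`. With X encoded as `0b010`, the value `gf_pow(0b010, 6, 0b1011, 3) ^ gf_pow(0b010, 4, 0b1011, 3)` equals `0b011`, which is the bit pattern of X + 1 and matches the computation X^6 + X^4 ≡ X + 1 mod f.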
1.7 Hilbert’s basis theorem

A commutative ring R is called Noetherian if each ideal is generated by a finite set. The term refers to Emmy Noether who was a student of Hilbert and the first female university professor for mathematics in Germany. Hilbert’s basis theorem states that polynomial rings over Noetherian rings are Noetherian themselves. It formed the basis for the rapid development of algebraic geometry in the 20th century. We will restrict our considerations to commutative rings throughout this section.

Every field F is a Noetherian ring since the only two ideals {0} and F require at most one generating element. The ring ℤ is Noetherian since, according to Theorem 1.32, every ideal is principal. By Theorem 1.45, also every ideal of the polynomial ring F[X] in one variable X over a field F is principal; therefore F[X] is Noetherian, too. In the polynomial ring ℂ[X, Y] in two variables, the ideal ⟨X, Y⟩ is not principal because if f(X, Y) is supposed to generate the polynomial X, then Y must not occur in f(X, Y). But then the polynomial Y cannot be generated by f(X, Y). As we shall see, ℂ[X, Y] is Noetherian. Generally, the polynomial ring R[X_1, …, X_n] in n variables is defined inductively as R[X_1, …, X_n] = R′[X_n] for R′ = R[X_1, …, X_{n−1}].

Theorem 1.55 (Hilbert’s basis theorem). Let n ∈ ℕ and R be a Noetherian commutative ring. Then the polynomial ring R[X_1, …, X_n] is Noetherian.
Proof: Since R[X_1, …, X_n] = R′[X_n] for R′ = R[X_1, …, X_{n−1}], it suffices to show the theorem for the polynomial ring R[X] in one variable. Let I ⊆ R[X] be an ideal. For f ∈ I, let ℓ_f be the leading coefficient of f. For f, g ∈ I, both ℓ_f + ℓ_g and ℓ_f − ℓ_g are the leading coefficients of polynomials of degree at most max{deg(f), deg(g)} in I. The set { ℓ_f | f ∈ I } is an ideal in R. So, there is a finite set of polynomials B′ ⊆ I such that

{ ℓ_f | f ∈ I } = ⟨ℓ_b | b ∈ B′⟩

Let d = max{ deg(b) | b ∈ B′ }. For every e < d, the set { ℓ_f | f ∈ I, deg(f) ⩽ e } is an ideal in R. Thus there is a finite set B_e ⊆ I satisfying

{ ℓ_f | f ∈ I, deg(f) ⩽ e } = ⟨ℓ_b | b ∈ B_e⟩

We let B = B′ ∪ ⋃ e 1. If f is not irreducible, then f can be decomposed into a product f = gh with deg(g) < d and deg(h) < d. The statement is now obtained by induction. First, let E′ be the splitting field of g over F. We then construct the splitting field E of h over E′. Finally, let f be irreducible. Then, E′ = F[X]/f is a field by Theorem 1.51. From the finiteness of F, together with Theorem 1.52, we conclude that E′ is finite, too. After renaming the variable X to α, we can write E′ = F[α]/f(α) and use X again as the name of a variable. In E′, by construction we have f(α) = 0. Thus, α is a root and f ∈ E′[X] can be decomposed into f(X) = (X − α)g(X), where deg(g) < d. Note that the leading coefficient of g is β. Inductively, for g and E′ there is a field extension E where g splits into linear factors. Thus, over E[X] we obtain

g(X) = β ∏_{i=1}^{d−1} (X − α_i)

where the leading coefficient of f is still β ∈ F. Substituting α_d = α, we obtain the desired result

f(X) = β ∏_{i=1}^{d} (X − α_i)
A field E is algebraically closed if every nonzero polynomial f(X) ∈ E[X] admits a factorization f(X) = β ∏_{i=1}^{deg(f)} (X − α_i) for β, α_i ∈ E. Applying Theorem 1.58 “infinitely often” to a field F and all polynomials over this field leads to the so-called algebraic closure of F. By construction, the algebraic closure of F is algebraically closed. The formal proof for the existence of the algebraic closure is based on the axiom of choice. Furthermore, one can show that the algebraic closure of a field is unique up to isomorphism; this means that all minimal field extensions of F where every polynomial splits into linear factors are isomorphic. The algebraic closure of a field F is always infinite, even if F itself is finite. The complex numbers ℂ are the algebraic closure of the field of real numbers ℝ but not of the rational numbers ℚ because, among others, the number π is not in the algebraic closure of ℚ.

Let n ⩾ 1 and G ⊆ ℂ be the set of roots of the polynomial X^n − 1 over the complex numbers ℂ. Since the polynomial X^n − 1 and its derivative nX^{n−1} are coprime, X^n − 1 has n different roots. These roots form a finite subgroup of ℂ∗. By Theorem 1.56, the group G is cyclic of order n. This shows that every cyclic group can be found as a subgroup of ℂ∗. We now want to examine the situation in arbitrary fields.

Theorem 1.59. Let n ⩾ 1 be an integer with char(F) ∤ n. Then there is a field extension E of F and an element α ∈ E∗ of order n such that

X^n − 1 = ∏_{i=0}^{n−1} (X − α^i)

We call α a primitive root of unity of order n.

Proof: We already know that f(X) = X^n − 1 splits into linear factors in an appropriate field extension E. The roots of f(X) form a finite subgroup of E∗ consisting of at most n elements. By Theorem 1.56, this subgroup is cyclic. Furthermore, we have f′(X) = nX^{n−1}. Since char(F) does not divide n, we have f′(α) ≠ 0 for all α ∈ E∗. Since f(0) ≠ 0, all roots of X^n − 1 are simple. Thus, they are pairwise distinct and the cyclic subgroup is generated by an element α of order n.
1.9 Finite fields

A special case of Theorem 1.56 shows that the multiplicative group of finite fields is always cyclic. This fact is of central importance for the study of finite fields. Every field is a vector space over its prime field. In particular, a finite field F is a finite-dimensional vector space over its prime field 𝔽_p for a prime number p. If the dimension is n, then F contains exactly p^n elements. We state this property in the following theorem and give an alternative proof that does not require linear algebra, but uses Cauchy’s theorem instead.

Theorem 1.60. Let F be a finite field of characteristic p. Then there exists n ⩾ 1 with |F| = p^n.

Proof: By Lemma 1.26, the number p is prime. Then, in the additive group of F, all nonzero elements have order p. By Cauchy’s theorem (Theorem 1.9), for every prime divisor q of |F|, there is an element of order q. Thus, p is the only prime divisor of |F| and hence, |F| = p^n for an appropriate exponent n ⩾ 1.

Theorem 1.61. Let p be a prime number and n ⩾ 1. Then there exists a field with p^n elements.

Proof: Let q = p^n and let E be a finite splitting field of f(X) = X^q − X over 𝔽_p. The field E exists by Theorem 1.58. The derivative of f is f′(X) = −1. Thus, f and f′ are coprime and, by Corollary 1.50, all roots of f are simple. This shows that X^q − X has exactly q different roots in E. By Theorem 1.28, the n-fold application of the Frobenius automorphism yields an automorphism φ : E → E, x ↦ x^q. The roots of f are those elements x which satisfy φ(x) = x. These elements form a field with p^n elements.

Theorem 1.62. Let F and K be finite fields and |K| a power of |F|. Then, F is isomorphic to a subfield of K.

Proof: Let |F| = q = p^n for a prime number p and n ⩾ 1. By Theorem 1.56, both F∗ and K∗ are cyclic. First, let α be a generator of F∗. Then, α is a root of the polynomial X^{q−1} − 1 ∈ 𝔽_p[X]. Let f be a factor of X^{q−1} − 1 with f(α) = 0 in F such that f is irreducible in 𝔽_p[X]. By Theorem 1.51, the quotient ring 𝔽_p[X]/f is a field. The homomorphism φ : 𝔽_p[X]/f → F, g(X) ↦ g(α) is well defined and thus surjective. Therefore, φ is an isomorphism by Corollary 1.24. Without loss of generality, we assume that F = 𝔽_p[X]/f in the remainder of the proof.

Since K contains q^k elements for some k ⩾ 1, its prime field is 𝔽_p. The cyclic group K∗ contains q^k − 1 elements and thus it contains a cyclic subgroup of order q − 1. Therefore, X^{q−1} − 1 can be decomposed into linear factors over K, and f as a divisor of X^{q−1} − 1 has a root β ∈ K. We now define a homomorphism ψ : 𝔽_p[X]/f → K, g(X) ↦ g(β). As a homomorphism of fields it is injective and thus F appears as an isomorphic subfield of K.

An immediate consequence of the previous theorem is that, up to isomorphism, every finite field is uniquely determined by its number of elements. This justifies the notation 𝔽_q or GF(q), respectively, for the field with q elements. By combining the results in this section, we obtain the following classification of finite fields.

Corollary 1.63. For every prime power q > 1, up to isomorphism there is a unique field 𝔽_q with q elements. Moreover, all finite fields are of this form.

Proof: By Theorem 1.61, there exists a field 𝔽_q with q elements. By Theorem 1.62, this field is unique up to isomorphism. Finally, by Theorem 1.60, there are no other finite fields.
Example 1.64. Let n ⩾ 1. By Theorem 1.61, there is a field extension E over 𝔽_2 with 2^n elements and it can be written as E = 𝔽_2[X]/f for an irreducible polynomial f(X) ∈ 𝔽_2[X] of degree n. If we regard E as a vector space over 𝔽_2, then the elements 1, X, …, X^{n−1} from E are linearly independent. If we define α_i = (0 ⋅⋅⋅ 010 ⋅⋅⋅ 0) with 1 at the (n − i)th position from the left, then X^i ↦ α_i induces a bijection between E and the set 𝔹^n of 0-1 sequences of length n. Here, any polynomial of degree less than n is represented by the sequence of its coefficients. In particular, this yields the structure of a field on 𝔹^n.

Now let f(X) = X^3 + X^2 + 1 ∈ 𝔽_2[X]. This polynomial has no roots in 𝔽_2. Its degree is 3, so f is irreducible and E = 𝔽_2[X]/f is a field with eight elements. The elements of E are the linear combinations of 1, X, X^2 with coefficients from 𝔽_2. The sequences of these coefficients result in a representation of E by 0-1 sequences of length 3. Multiplying such sequences, for example, we obtain (101)(111) = (X^2 + 1)(X^2 + X + 1) = 1 = (001) because (X^2 + 1)(X^2 + X + 1) ≡ 1 mod f in 𝔽_2[X]. ◊
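The multiplication of 0-1 sequences in Example 1.64 can be reproduced by first multiplying the polynomials over 𝔽_2 and then taking the remainder modulo f. The Python sketch below follows this two-step approach; it reads a 0-1 sequence as a binary number, so that the rightmost bit is the coefficient of X^0. The names `clmul`, `poly_mod` and `field_mul` are hypothetical and chosen only for this illustration.

```python
def clmul(a, b):
    """Product of two polynomials over F_2 encoded as bit patterns (no carries)."""
    result = 0
    shift = 0
    while b >> shift:
        if (b >> shift) & 1:
            result ^= a << shift
        shift += 1
    return result


def poly_mod(a, f):
    """Remainder of a modulo the nonzero polynomial f over F_2."""
    deg_f = f.bit_length() - 1
    while a.bit_length() - 1 >= deg_f:
        # Cancel the leading term of a with a shifted copy of f.
        a ^= f << (a.bit_length() - 1 - deg_f)
    return a


def field_mul(a, b, f):
    """Multiplication in F_2[X]/f."""
    return poly_mod(clmul(a, b), f)


F = 0b1101  # X^3 + X^2 + 1
product = field_mul(0b101, 0b111, F)
print(format(product, "03b"))
```

Under this encoding the printed sequence is 001, in agreement with (101)(111) = (001). Taking successive powers of X, encoded as `0b010`, runs through all seven nonzero elements, which illustrates that the multiplicative group of this field is cyclic.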
1.10 Units modulo n

We have seen that the group of units (ℤ/nℤ)∗ is cyclic if n is prime. In this section, we want to describe the structure of the multiplicative group (ℤ/nℤ)∗ for prime powers n. The structure of (ℤ/nℤ)∗ for arbitrary numbers n can then be obtained using the Chinese remainder theorem. Considering the group G = (ℤ/8ℤ)∗, we can identify G with the odd numbers 1, 3, 5, 7. We have a^2 = 1 for all a ∈ G. Particularly G, being of order 4, is not a cyclic group. We start with two technical lemmas.

Lemma 1.65. Let p be an odd prime and e ⩾ 2. For all a ∈ ℤ, we have

(1 + ap)^{p^{e−2}} ≡ 1 + ap^{e−1} mod p^e

Proof: For e = 2, the statement is trivial. So, let e > 2. Inductively, we have (1 + ap)^{p^{e−3}} ≡ 1 + ap^{e−2} mod p^{e−1}. Hence, there is a number b ∈ ℤ such that (1 + ap)^{p^{e−3}} = 1 + ap^{e−2} + bp^{e−1}. Using p | (p choose k) for 1 ⩽ k < p, we obtain

(1 + ap)^{p^{e−2}} = (1 + ap^{e−2} + bp^{e−1})^p = ∑_k (p choose k) ((a + bp)p^{e−2})^k ∈ 1 + p(a + bp)p^{e−2} + p^e ℤ + (a + bp)^p p^{p(e−2)}

Now, p^e | p^{p(e−2)} because p > 2, and thus (1 + ap)^{p^{e−2}} ≡ 1 + ap^{e−1} mod p^e follows.
Lemma 1.66. Let e ⩾ 3. Then for all a ∈ ℤ, we have (1 + 4a)^{2^{e−3}} ≡ 1 + 2^{e−1}a mod 2^e.

Proof: The statement is trivial for e = 3. Let e > 3. Then by inductive hypothesis, we may assume that (1 + 4a)^{2^{e−4}} = 1 + 2^{e−2}a + 2^{e−1}b for b ∈ ℤ. Using e ⩽ 2e − 4, we obtain

(1 + 4a)^{2^{e−3}} = (1 + 2^{e−2}a + 2^{e−1}b)^2 = 1 + 2^{e−1}(a + 2b) + 2^{2e−4}(a + 2b)^2 ≡ 1 + 2^{e−1}a mod 2^e
We now use these two lemmas to show that certain elements in (ℤ/p^eℤ)∗ have a large order.

Theorem 1.67. Let p be an odd prime. Then, for all e ⩾ 1 the group (ℤ/p^eℤ)∗ is cyclic. For e ∈ {1, 2}, the group (ℤ/2^eℤ)∗ is cyclic, too. If e > 2, then (ℤ/2^eℤ)∗ is isomorphic to the additive group ℤ/2ℤ × ℤ/2^{e−2}ℤ and therefore not cyclic.

Proof: The multiplicative group of a finite field is cyclic. This shows the assertion for all cases where e = 1. Now let p be odd and e ⩾ 2. We choose a generator g of (ℤ/pℤ)∗. Then, g^{p−1} = 1 + ap and (p + g)^{p−1} = 1 + bp for suitable integers a and b. Suppose that p | a and p | b. Then, g^p ≡ g mod p^2 and (p + g)^p ≡ p + g mod p^2, which leads to a contradiction as follows:

p + g ≡ (p + g)^p ≡ ∑_k (p choose k) p^k g^{p−k} ≡ g^p ≡ g mod p^2

Thus, without loss of generality, we may assume that gcd(a, p) = 1; otherwise we could take p + g as generating element of (ℤ/pℤ)∗. Lemma 1.65 yields (1 + ap)^{p^{e−1}} ≡ 1 + ap^e mod p^{e+1} and therefore (1 + ap)^{p^{e−1}} ≡ 1 mod p^e. In particular, the order of 1 + ap in (ℤ/p^eℤ)∗ divides p^{e−1}. Again from Lemma 1.65, we conclude that (1 + ap)^{p^{e−2}} ≡ 1 + ap^{e−1} ≢ 1 mod p^e. Therefore, the order of g^{p−1} in (ℤ/p^eℤ)∗ is p^{e−1} and, thus, the order of g is (p − 1)p^{e−1}. Consequently, (ℤ/p^eℤ)∗ is cyclic.

The group (ℤ/4ℤ)∗ is cyclic because it is of order 2. We now come to the case p = 2 and e > 2. Lemma 1.66 implies 5^{2^{e−2}} = (1 + 4)^{2^{e−2}} ≡ 1 + 2^e mod 2^{e+1} and thus 5^{2^{e−2}} ≡ 1 mod 2^e. Quite similarly, one can show that 5^{2^{e−3}} ≡ 1 + 2^{e−1} ≢ 1 mod 2^e. Therefore, the order of the number 5 in (ℤ/2^eℤ)∗ is 2^{e−2}. Let G be the subgroup of (ℤ/2^eℤ)∗ generated by 5. Suppose that −1 ∈ G; in other words, we have 5^n ≡ −1 mod 2^e for some n ∈ ℕ. Then, −1 ≡ 5^n ≡ 1 mod 4, a contradiction. This shows G ∩ −G = ∅. From |G ∪ −G| = 2^{e−1} = |(ℤ/2^eℤ)∗| we obtain G ∪ −G = (ℤ/2^eℤ)∗. In particular, the group homomorphism ℤ/2ℤ × ℤ/2^{e−2}ℤ → (ℤ/2^eℤ)∗, (a, b) ↦ (−1)^a 5^b is surjective and hence bijective.
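Theorem 1.67 and the two lemmas can be checked for small moduli by brute force. The Python sketch below computes element orders in (ℤ/nℤ)∗ and tests whether some unit has order |(ℤ/nℤ)∗|; the function names are our own, and the exhaustive search is only meant for small n.

```python
from math import gcd


def element_order(a, n):
    """Order of a in (Z/nZ)*; a must be coprime to n."""
    one = 1 % n
    order, x = 1, a % n
    while x != one:
        x = x * a % n
        order += 1
    return order


def is_cyclic_unit_group(n):
    """Check by exhaustive search whether (Z/nZ)* has a generator."""
    unit_list = [a for a in range(n) if gcd(a, n) == 1]
    return any(element_order(a, n) == len(unit_list) for a in unit_list)


def lemma_1_65_holds(p, e, a):
    """Test (1 + a*p)^(p^(e-2)) = 1 + a*p^(e-1) mod p^e for an odd prime p."""
    return pow(1 + a * p, p ** (e - 2), p ** e) == (1 + a * p ** (e - 1)) % p ** e
```

For example, `is_cyclic_unit_group(27)` is true while `is_cyclic_unit_group(8)` is false, and `element_order(5, 2 ** e)` equals `2 ** (e - 2)` for the exponents e > 2 that we tried, as the proof predicts.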
1.11 Quadratic reciprocity

Let 𝔽_q be a finite field. An element a ∈ 𝔽_q is a square or a quadratic residue if there exists b ∈ 𝔽_q with b^2 = a. In this case, we call b a square root of a. If b is a square root of a, then so is −b. Since quadratic equations over fields have at most two solutions,
b and −b are the only square roots of a. In general, not all elements of a field are squares. As a simple example, consider 𝔽_3 = {0, 1, −1}, where −1 is not a square. The following theorem, named after Euler, gives an effective criterion to test whether a ∈ 𝔽_q is a square.

Theorem 1.68 (Euler’s criterion). Let 𝔽_q be a finite field and q ∈ ℕ be odd. Then the mapping 𝔽_q∗ → {1, −1}, a ↦ a^{(q−1)/2} is a surjective homomorphism of (multiplicative) groups. Moreover,

a ∈ 𝔽_q∗ is a square ⇔ a^{(q−1)/2} = 1

Proof: If a is a square, then there exists b ∈ 𝔽_q∗ with b^2 = a. From Corollary 1.5, we obtain a^{(q−1)/2} = b^{q−1} = 1. Now, let a not be a square. By Theorem 1.56, the multiplicative group 𝔽_q∗ is cyclic. Let g be a generator of this group and choose k ∈ ℕ such that a = g^{2k+1}. From g^{q−1} = 1, we conclude g^{(q−1)/2} ∈ {−1, 1} because the polynomial X^2 − 1 has no other roots. Since the order of g is q − 1, we obtain g^{(q−1)/2} = −1 and thus

a^{(q−1)/2} = g^{(2k+1)(q−1)/2} = (−1)^{2k+1} = −1

Now we have shown that in all cases a^{(q−1)/2} ∈ {−1, 1} and 1 occurs if and only if a is a square. This also shows that the mapping is a homomorphism. For surjectivity, we note that 1^{(q−1)/2} = 1 and g^{(q−1)/2} = −1 for all generators g of 𝔽_q∗.

Euler’s criterion implies that there are as many squares as nonsquares in 𝔽_q∗ if q is odd. We take a look at squares in prime fields ℤ/pℤ. At first glance, the fields ℤ/pℤ and ℤ/qℤ for different primes p and q do not seem to have very much in common. But, surprisingly, both structures are tightly connected by their quadratic residues. This is what the law of quadratic reciprocity is about.

Let n be an arbitrary odd number and a ∈ ℤ coprime to n. The mapping x ↦ ax mod n defines a permutation on the set {0, …, n − 1}. We let

(a/n) = sgn(x ↦ ax mod n)

and call (a/n) ∈ {1, −1} the Jacobi symbol (named after Carl Gustav Jacob Jacobi, 1804–1851); it is the sign of the permutation x ↦ ax mod n. The connection between quadratic residues and signs of permutations is given by the following result of Yegor Ivanovich Zolotarev (1847–1878).

Theorem 1.69 (Zolotarev’s lemma). Let p be an odd prime and a ∈ ℤ such that p ∤ a. Then, a is a square modulo p if and only if (a/p) = 1.

Proof: Lemma 1.15 shows p−1 p (ay mod p) − (ax mod p) a ( ) = ∏ ≡ a ( 2) ≡ a 2 p 0⩽x y . Therefore, π has (m choose 2)(n choose 2) inversions. This shows

sgn(μ ∘ ν^{−1}) = (−1)^{(m(m−1)/2) ⋅ (n(n−1)/2)} = (−1)^{((m−1)/2) ⋅ ((n−1)/2)}
Exercises
Now, we have shown (a) for positive numbers because ( mn ) (b) Using Lemma 1.15, we obtain (
−1
| 41
= ( mn ).
n n−1 −1 x−y = (−1)(2) = (−1) 2 ) = sgn(x → −x mod n) = ∏ n y−x 0⩽x 1. Show that the additive groups (ℤ/mℤ) × (ℤ/nℤ) and ℤ/mnℤ are not isomorphic.

1.31. Find s ∈ ℕ with 51 ⋅ s ≡ 1 mod 98.

1.32. Find the two smallest numbers n_1, n_2 ∈ ℕ such that the following congruences for i = 1, 2 are satisfied: n_i ≡ 2 mod 3, n_i ≡ 3 mod 4 and n_i ≡ 6 mod 7.

1.33. Show that 7^{2n^4 + 2n^2} ≡ 1 mod 60 for all n ∈ ℕ.
1.34. Show that 5 is the only prime number of the form z^4 + 4 with z ∈ ℤ.
Hint: Use polynomial division to determine a polynomial p such that z^4 + 4 = (z^2 − 2z + 2) ⋅ p(z).

1.35. Compute gcd(X^6 + X^5 + X^3 + X, X^8 + X^7 + X^6 + X^4 + X^3 + X + 1) in the polynomial ring 𝔽_2[X] over the two-element field 𝔽_2.

1.36. Let f(X) = X^5 + X^2 + 1 ∈ 𝔽_2[X]. Show that f(X) is irreducible over the two-element field 𝔽_2.
1.37. Let t ⩾ 1. The polynomial f(X) ∈ ℝ[X] is called t-thin if it is the sum of at most t terms of the form a_i X^i. Then, f(X) = ∑_{i⩾0} a_i X^i and a_i ≠ 0 for at most t indices i. Show that the number of positive real roots of f(X) ≠ 0 is bounded by t − 1.

1.38. Let 0 ≠ f(X) = ∑_i a_i X^i ∈ ℝ[X] be a polynomial of degree d. The number of sign changes of f(X) is the number of sign changes in the sequence (a_0, …, a_d); and this is defined as the number of sign changes in the subsequence that remains if all a_j with a_j = 0 are omitted. The number of sign changes in the sequence (0, 1, 0, −3, 4, 0, 2, −1) thus is that of (1, −3, 4, 2, −1), which is 3.
(a) Let 0 < λ ∈ ℝ be a positive real number. Show that the number of sign changes of g(X) = (X − λ)f(X) is strictly greater than the number of sign changes of f(X).
(b) (Descartes’ rule of signs; named after René Descartes, 1596–1650) The number of positive real roots (with multiplicities) of f(X) is bounded by the number of sign changes in the sequence (a_0, …, a_d).

1.39. Let f(X) = X^n + a_{n−1}X^{n−1} + ⋅⋅⋅ + a_0 ∈ ℤ[X]. Show that if r ∈ ℚ is a root of f(X), then r ∈ ℤ and r divides a_0.

1.40. (Eisenstein’s criterion; named after F. Gotthold M. Eisenstein, 1823–1852) Let f(X) = a_n X^n + ⋅⋅⋅ + a_0 X^0 ∈ ℤ[X] with n ⩾ 1 and a_n ≠ 0. Suppose there exists a prime number p such that p ∤ a_n and p | a_{n−1}, …, p | a_0, but p^2 ∤ a_0. Show that f(X) is irreducible over ℚ.

1.41. Prove the following statements using Eisenstein’s criterion from the previous exercise:
(a) Let p be a prime number and f(X) = X^n − p for n ⩾ 1. Then, f(X) is irreducible over ℚ and thus in particular ⁿ√p ∉ ℚ.
(b) Let p be a prime number and f(X) = X^{p−1} + X^{p−2} + ⋅⋅⋅ + 1. Then, f(X) is irreducible over ℚ.
Hint: f(X) can be written as f(X) = (X^p − 1)/(X − 1) ∈ ℤ[X], and f(X) is irreducible over ℚ if and only if f(X + 1) is.

1.42. We extend formal power series to series of the form

F(X) = ∑_{i∈ℤ} f_i X^i

Claim: The series S(X) = ∑_{i∈ℤ} X^i satisfies S(X) = 0.
Proof: X ⋅ S(X) = S(X) and therefore S(X) ⋅ (1 − X) = 0. On the other hand, (1 − X) ⋅ ∑_{i⩾0} X^i = 1. From these facts, we obtain

S(X) = S(X) ⋅ 1 = S(X) ⋅ (1 − X) ⋅ ∑_{i⩾0} X^i = 0 ⋅ ∑_{i⩾0} X^i = 0

Where is the error in this proof?
1.43. Consider the set ℚ[√2] = { a + b√2 | a, b ∈ ℚ } with addition defined by (a + b√2) + (a′ + b′√2) = (a + a′) + (b + b′)√2 and multiplication by (a + b√2) ⋅ (a′ + b′√2) = (aa′ + 2bb′) + (ab′ + a′b)√2. Show that, with these operations, ℚ[√2] is a field.

1.44. Let F be a field, E a field extension of F and α ∈ E a root of the polynomial p(X) ∈ F[X] with deg(p) ⩾ 1. Prove that there is a unique polynomial m(X) ∈ F[X] with the leading coefficient 1 and the property

{ g(X) | g(X) ∈ F[X], g(α) = 0 } = { f(X)m(X) | f(X) ∈ F[X] }

The polynomial m(X) is called the minimal polynomial of α.

1.45. Let a ∈ ℤ and p an odd prime with p ∤ a. Let m be the order of a in (ℤ/pℤ)∗ and n the order of a in (ℤ/p^eℤ)∗ for e ⩾ 1. Show that m | n and n/m ∈ { p^i | 0 ⩽ i < e }.

1.46. Let 𝔽 be a field with q elements for an odd number q. Show that a ∈ 𝔽 is a square if and only if X − a ∈ 𝔽[X] divides the polynomial X^{(q−1)/2} − 1.
Summary

Notions
– semigroup, monoid
– Abelian, commutative
– homomorphism
– isomorphism
– group, inverse element
– generated substruct. $\langle X \rangle$
– coset
– order of a group
– order of an element
– index $[G : H]$
– cyclic group
– generating element
– kernel $\ker(\varphi)$, image $\mathrm{im}(\varphi)$
– normal subgroup
– quotient group $G/H$
– symmetry group
– symmetric group
– inversion
– sign $\mathrm{sgn}(\pi)$
– transposition
– (commutative) ring
– zero ring
– skew field
– field
– subring, subfield
– ring homomorphism
– ideal, quotient ring
– principal ideal
– maximal ideal
– zero divisor
– (integral) domain
– characteristic $\mathrm{char}(R)$
– prime field
– Frobenius homomorphism
– formal power series
– polynomials $R[X]$
– zero polynomial
– degree $\deg(f)$
– leading coefficient
– derivative $f'$
– root, multiplicity
– irreducible
– Noetherian ring
– field extension
– splitting field
– algebraic closure
– finite field $\mathbb{F}_q$
– Jacobi symbol
– Legendre symbol

Methods and results
– partitioning into cosets
– Lagrange’s theorem: $|G| = [G : H] \cdot |H|$
– $g^{|G|} = 1$. Moreover: $g^n = 1 \Leftrightarrow$ order of $g$ divides $n$
– groups of prime order are cyclic
– subgroups of cyclic groups are cyclic
– $G$ Abelian group, $g, h \in G$ with coprime orders $m, n$ $\Rightarrow$ $gh$ is of order $mn$
– Cauchy’s theorem: $p$ prime divisor of $|G|$ $\Rightarrow$ $G$ has an element of order $p$
– normal subgroups are exactly the kernels of group homomorphisms
– subgroups of index 2 are normal subgroups
– homomorphism theorem for groups: $\varphi : G \to H$ homomorphism $\Rightarrow$ $G/\ker(\varphi) \cong \mathrm{im}(\varphi)$
– structure of the symmetry group of a regular polygon
– $\pi$ permutation on $\{1, \ldots, n\}$. Then, $\mathrm{sgn}(\pi) = (-1)^{\#\text{ of inversions}}$ = ∏i 0.

Alice and Bob choose a finite group $G$ to be the set of ciphers $Y$ as well as the set of keys $K$. The group $G$ is public, but it has to be large enough that we can embed $X$ as a subset in $G$. Bob chooses a secret key $k \in G$ uniformly at random. Then he communicates the key $k$ to Alice over a secure channel. For example, Alice and Bob meet personally before the decision about the text $x$ is taken. When Bob intends to inform Alice about $x$, he sends the product $y = xk$ over a public channel. Alice recovers $x$ by computing
2.5 Perfect security and the Vernam one-time pad | 57
$x = yk^{-1}$. The probability for each ciphertext $y$ is now $\Pr[y] = \sum_x \Pr[x^{-1}y] \Pr[x] = 1/|G|$ and thus independent of $x \in X$. The problem is that an observer Oscar knows $y$ and perhaps later, for example, from Alice’s reaction, he might know $x$ as well. Knowing $x$ and $y$ reveals $k$. So, the key $k$ is good for one time only. (Even if Oscar does not know $x$, he could detect that the same message was sent again unless $k$ is changed.)

An advantage is that the method is valid in any group that is large enough, so we are free to choose the group structure. A reasonable simplification is to demand $k = k^{-1}$ for all $k \in G$ because then no inverses have to be computed and encryption and decryption methods are identical. Now $k = k^{-1}$ is equivalent to $k^2 = 1$. So we can, for example, choose $G = (\mathbb{Z}/2\mathbb{Z})^n$ for $n \in \mathbb{N}$ with $\log_2 |X| < n$. The elements of $G$ are then the bit strings $y \in \mathbb{B}^n = \{0, 1\}^n$ and the group operation is the bitwise exclusive-or $\oplus$. Therefore, encryption and decryption in this case can be performed very efficiently.

For this special case, we obtain the cryptosystem using the so-called Vernam one-time pad by Gilbert Vernam (1890–1960) who published it in 1918 and had it patented shortly thereafter. It is an encryption scheme with a “one-time key” because, for each message $x$, a new key must be chosen to guarantee the security. One can view the Vernam one-time pad as a generalization of the Vigenère cipher. The subjects of encryption are bit sequences of length $n$. Here, $X \subseteq \mathbb{B}^n = Y = K$. Encryption and decryption using the key $k \in \mathbb{B}^n$ then work as follows:
$$c(z, k) = d(z, k) = z \oplus k$$
From the above discussion, it is clear that one-time encryption with the Vernam one-time pad is perfectly secure. But this guarantee is lost if the key is used twice. This is why the scheme is called a one-time pad.
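The XOR-based scheme $c(z, k) = d(z, k) = z \oplus k$ can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the text: messages are byte strings rather than bit strings, the helper name `xor_bytes` is hypothetical, and the key is drawn with the standard library's `secrets` module. The last line shows why the key is good for one use only: a known plaintext together with its ciphertext reveals the key.

```python
import secrets

def xor_bytes(z: bytes, k: bytes) -> bytes:
    """Compute z XOR k bytewise; this is both c(z, k) and d(z, k)."""
    if len(z) != len(k):
        raise ValueError("text and key must have the same length")
    return bytes(a ^ b for a, b in zip(z, k))

message = b"attack at dawn"
key = secrets.token_bytes(len(message))  # fresh, uniformly random key

cipher = xor_bytes(message, key)         # encryption
recovered = xor_bytes(cipher, key)       # decryption is the same operation

# Known-plaintext scenario: x XOR c(x, k) = x XOR x XOR k = k
leaked_key = xor_bytes(message, cipher)
```

Here `recovered` equals `message`, and `leaked_key` equals `key`, so any second message encrypted under the same key would be readable by Oscar.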
As noted above, an attacker can determine the key in a known-plaintext scenario: knowing a plaintext $x$ and its corresponding ciphertext $c(x, k)$, the attacker can compute the key $k$ by $x \oplus c(x, k) = x \oplus x \oplus k = k$. A Vernam one-time pad is used whenever perfect security is required so that the efforts of key generation and transmission become secondary issues.² A major problem is that the number of different keys is not smaller than that of the plaintexts. So, if we can transmit the key secretly, why can we not simply use this transmission for $x$ instead? The point is that the key generation is completely decoupled from the text: Alice and Bob can agree on $k$ in advance, before $x$ is fixed. As the next theorem shows, the difficulty due to the large number of keys cannot be avoided.

² The Vernam one-time pad was used on the red telephone which connected the White House to the Kremlin during the Cold War.

Theorem 2.1. Let $X$ denote a set of texts with $\Pr[x] > 0$ for all $x \in X$, let $K$ be a set of keys, and suppose that the ciphertexts $Y$ satisfy $\Pr[y] > 0$ for all $y \in Y$. If the underlying cryptosystem is perfectly secure, then $|X| \leq |Y| \leq |K|$.

Proof: First, for each key $k \in K$, the encoding function $x \mapsto c(x, k)$ is an injective mapping from $X$ to $Y$, which implies $|X| \leq |Y|$. Then, because of the perfect security property, for a fixed $x_0 \in X$ and each $y \in Y$, there has to be a key $k \in K$ with $y = c(x_0, k)$. Otherwise, we have $\Pr[y \mid x_0] = 0 \neq \Pr[y]$, contradicting the perfect security property. Therefore, $k \mapsto c(x_0, k)$ is a surjective mapping from $K$ to $Y$ and thus $|K| \geq |Y|$.

With a large number of keys, a design for a uniformly distributed random key selection is not obvious. Theorem 2.1 shows that the key length cannot be reduced without sacrificing perfect security. Now we can show that the uniform distribution is also essential if the key generation is independent of the text. Therefore, we must not rely on the so-called pseudo-random generators in the key generation for the Vernam one-time pad, or the system is not perfectly secure anymore.

Theorem 2.2. Let $|X| = |Y| = |K|$ such that $\Pr[x] > 0$ for all $x \in X$. If keys are chosen independently of plaintexts, then the following statements are equivalent:
(a) The cryptosystem is perfectly secure.
(b) For every $k \in K$, we have $\Pr[k] = 1/|K|$, and for all $x \in X$ and $y \in Y$ there is a key $k_{xy} \in K$ with $c(x, k_{xy}) = y$.

Proof: From $\Pr[k] = 1/|K|$, we obtain $\Pr[y \mid x] = \Pr[k_{xy} \mid x] = \Pr[k_{xy}] = 1/|K|$, and then $\Pr[y] = \sum_x \Pr[x] \Pr[y \mid x] = (\sum_x \Pr[x])/|K| = 1/|K|$. Thus, the cryptosystem is perfectly secure. Conversely, let $\Pr[y] = \Pr[y \mid x]$. Suppose that $c(x, k_1) = y = c(x, k_2)$ for two different keys $k_1, k_2 \in K$. Then, since $|K| = |Y|$, there exists some ciphertext $y' \in Y$ such that there is no $k \in K$ with $c(x, k) = y'$. Thus, $\Pr[y'] = \Pr[y' \mid x] = 0$, a contradiction to $|X| = |Y|$. Therefore, for every $x$ and every $y$ there exists a unique key $k_{xy} \in K$ with $c(x, k_{xy}) = y$. Similarly, for every key $k \in K$ and every cipher $y \in Y$, there exists a unique $x \in X$ with $k = k_{xy}$. This yields $\Pr[k] = \Pr[k_{xy}] = \Pr[y \mid x] = \Pr[y]$. In particular, we have $\Pr[k_1] = \Pr[y] = \Pr[k_2]$ for all $k_1, k_2 \in K$. Consequently, $\Pr[k] = 1/|K|$ for all $k \in K$.
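The role of the uniform key distribution in Theorem 2.2 can be checked by brute force on a tiny instance. The following sketch is an illustration only: the function name `is_perfectly_secure`, the 2-bit text space and the concrete probabilities are assumptions chosen for the example. It tests the condition $\Pr[y \mid x] = \Pr[y]$ for all $x$ and $y$, with keys chosen independently of texts, using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

def is_perfectly_secure(pr_x, pr_k, c):
    """Check Pr[y | x] == Pr[y] for all x, y (keys independent of texts)."""
    ciphertexts = {c(x, k) for x, k in product(pr_x, pr_k)}
    for y in ciphertexts:
        # Pr[y | x] is the total probability of all keys k with c(x, k) = y
        cond = {
            x: sum((pk for k, pk in pr_k.items() if c(x, k) == y), Fraction(0))
            for x in pr_x
        }
        pr_y = sum(pr_x[x] * cond[x] for x in pr_x)
        if any(cond[x] != pr_y for x in pr_x):
            return False
    return True

def xor(x, k):
    return x ^ k

# Texts, keys and ciphertexts are the 2-bit strings 0..3
texts = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}
uniform_keys = {k: Fraction(1, 4) for k in range(4)}
biased_keys = {0: Fraction(1, 2), 1: Fraction(1, 6), 2: Fraction(1, 6), 3: Fraction(1, 6)}

secure_uniform = is_perfectly_secure(texts, uniform_keys, xor)  # True
secure_biased = is_perfectly_secure(texts, biased_keys, xor)    # False
```

With uniform keys every conditional probability is 1/4; with the biased keys, $\Pr[y = 0 \mid x = 0] = 1/2$ while $\Pr[y = 0] = 1/3$, so perfect security is lost.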
2.6 Asymmetric encryption methods The cryptosystems considered so far are symmetric, as they use the same key for encryption and decryption. Therefore, before the communication, Alice and Bob need to agree on a key which is unknown to an attacker. For a key exchange, Alice and Bob can, for example, meet in a café and exchange the key personally. Back home again, they can exchange messages over an insecure channel with the help of the key. But what is to be done when no café is available at the right time? The problem can be solved by using different keys for encryption and decryption, which leads to the concept of asymmetric cryptosystems. The idea was first published in 1976 by Diffie and Hellman. Alice uses two keys for the communication. The first one is a public key that is made visible to anyone. However, the second key is private and secret, known solely to Alice. If Bob wants to send a message to Alice, he uses her public key for encryption. Alice in turn uses her private key to decrypt Bob’s message. This
scheme elegantly solves the problem of key exchange because the communicated key is Alice’s public key, but nothing else. This is why the method is also called a public-key cryptosystem. Even if Oscar eavesdrops on the conversation, his only possibility is to decode Bob’s message based on the public key and the ciphertext. Therefore, methods of this kind have to be designed in such a way that this attack by Oscar cannot be performed efficiently. The problem is that we hardly know any way to prove such a statement rigorously. In public-key cryptography, you have to be content with relative computational security or even with pragmatic security.

There is another danger: Oscar pretends to be Alice by exchanging her public key with his. Then he may receive and decrypt messages that were meant for Alice. This scenario is often referred to as a man-in-the-middle attack. In Section 2.12, we present a method for Alice to label her key with some kind of signature to prevent these attacks.

The main goal in public-key cryptography is to ensure that Oscar is not able to determine the original message from the ciphertext along with the public key. One relies on relative computational security for this, meaning that Oscar could decrypt messages in principle, but his resources in time or energy, according to our current knowledge, are not sufficient to do so. Therefore, public-key systems are based on a computational problem which is believed to be not efficiently solvable from a practical point of view. We consider the following asymmetric encryption methods:
– RSA: This is probably the best known and most successful asymmetric method. It is based on the factorization of large integers.
– Rabin: This is related to the Rivest–Shamir–Adleman (RSA) system and it also relies on integer factorization.
– Diffie–Hellman: The method is not directly designed for encryption, but it is a method of exchanging keys for symmetric cryptosystems. Its security is based on the computation of discrete logarithms.
– ElGamal: This cryptosystem is related to the Diffie–Hellman key exchange. Its security is also based on the discrete logarithm.
– Merkle–Hellman: It is based on the NP-complete knapsack problem. However, Shamir was able to show that it is not secure. We present Shamir’s attack at the end of this chapter.

Even though it is widely believed that the factorization of large numbers and the discrete logarithm problem cannot be solved efficiently, no proof is known for these conjectures. Therefore, we are always interested in new cryptosystems based on different computational problems: for instance, group-based cryptography uses noncommutative groups as a platform and algebraic operations like conjugacy to design cryptographic schemes. So far, these schemes have not fulfilled initial promises and, at present, practical applications are rare. This might drastically change if simple schemes like RSA become insecure due to more efficient factorization algorithms.
60 | 2 Cryptography
2.7 RSA cryptosystem

The best-known public-key encryption method is RSA. It is named after the last names of its inventors Ronald Linn Rivest (born 1947), Adi Shamir (born 1952), and Leonard Adleman (born 1945). In RSA, Alice chooses two distinct primes $p$ and $q$ and computes the product $n = pq$. Next, Alice computes two numbers $e, s \geq 3$ such that $es \equiv 1 \bmod (p-1)(q-1)$. Typically, Alice chooses $e$, say $e = 17$ in case that $17 \nmid \varphi(n)$, and then she computes $s$ using the extended Euclidean algorithm. A small number $e$, like $e = 17$, is not a problem as long as we take care that no message $x$ is encrypted with $x^e \leq n$. However, the number $s$ should be large; otherwise, the private key $(n, s)$ is insecure due to an attack of Michael J. Wiener [110]. If $s$ has less than one-quarter as many bits as the modulus $n$, then, in typical cases, this attack may discover the secret $s$.

The public key is the pair $(n, e)$, and encryption is done by $x \mapsto x^e \bmod n$. Alice decrypts by $y \mapsto y^s \bmod n$. Let $y = x^e \bmod n$. If $es = 1 + (p-1)k$, then
$$y^s \equiv x^{es} \equiv x \cdot (x^{p-1})^k \equiv \begin{cases} 0 & \text{if } p \mid x \\ x \cdot 1^k & \text{otherwise} \end{cases} \;\equiv\; x \bmod p$$
by Fermat’s little theorem. Analogously, one can show that $y^s \equiv x \bmod q$. In other words, both $p$ and $q$ divide $y^s - x$. Since $p$ and $q$ are coprime, $n = pq$ divides $y^s - x$ and, hence, we have $y^s \equiv x \bmod n$. If $x \in \{0, \ldots, n-1\}$, then we obtain $y^s \bmod n = x$. Thus, every encrypted message is decrypted correctly.

In Chapter 3, we show how to find large primes $p$ and $q$ (Section 3.4), what properties $p$ and $q$ should have in order to ensure that $n$ cannot be factorized easily (Section 3.6), how $x^e \bmod n$ can be computed efficiently (Section 3.2), and that the number $n$ can easily be factorized as soon as one knows the secret key $s$ (end of Section 3.4.4). Since we assume that factorization cannot be done efficiently, this implies that the secret key is secure. However, it is not known whether the decryption of the RSA method is secure. It is conceivable that, even without the knowledge of $s$, an attacker could decrypt the ciphertext.

It is not possible to unambiguously assign a secret key $s$ to the decryption function. Indeed, one can also determine an exponent $s'$ by the rule $es' \equiv 1 \bmod \operatorname{lcm}(p-1, q-1)$; since $p - 1$ and $q - 1$ are even, clearly $\operatorname{lcm}(p-1, q-1) < (p-1)(q-1)$. Consequently, $y \mapsto y^s \bmod n$ and $y \mapsto y^{s'} \bmod n$ define the same mappings on $\{0, \ldots, n-1\}$. But in general, $s \neq s'$. For example, $y \mapsto y^{23} \bmod 55$ and $y \mapsto y^3 \bmod 55$ define the same mappings on $\{0, \ldots, 54\}$; they satisfy $7 \cdot 23 \equiv 1 \bmod 20$ and $7 \cdot 3 \equiv 1 \bmod 20$.

One of the problems in RSA, and in many similar cryptosystems, occurs when only a few messages are likely to be sent. Then Oscar simply encrypts all these messages with the public key $(n, e)$. For each of these messages, he stores the pair consisting of plaintext and corresponding ciphertext. Later, if Oscar eavesdrops on a message encrypted by $(n, e)$, he compares it to his list of messages. If the ciphertext occurs in his list, then he knows the corresponding plaintext.
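The toy parameters from the example above ($n = 55$, $e = 7$, exponents 23 and 3) can be replayed in Python. This is a sketch for illustration only, with numbers far too small for any real use; the names `encrypt`, `decrypt`, `s_alt` and the candidate list are hypothetical. It assumes Python 3.8 or later, where the built-in `pow(e, -1, m)` returns a modular inverse. The final lines mimic Oscar's lookup table when only a few messages are likely.

```python
from math import gcd

p, q = 5, 11
n = p * q                        # 55
phi = (p - 1) * (q - 1)          # 40
lam = phi // gcd(p - 1, q - 1)   # lcm(p - 1, q - 1) = 20

e = 7
s = pow(e, -1, phi)              # inverse of e modulo (p-1)(q-1)
s_alt = pow(e, -1, lam)          # inverse of e modulo lcm(p-1, q-1)

def encrypt(x: int) -> int:
    return pow(x, e, n)

def decrypt(y: int, exponent: int = s) -> int:
    return pow(y, exponent, n)

# Both exponents decrypt every message in {0, ..., n-1} correctly
all_correct = all(decrypt(encrypt(x)) == x for x in range(n))
same_mapping = all(decrypt(y, s) == decrypt(y, s_alt) for y in range(n))

# Oscar's table for a small set of likely messages (hypothetical values)
likely_messages = [2, 3, 42]
table = {encrypt(x): x for x in likely_messages}
guessed = table.get(encrypt(42))  # Oscar recovers 42 without knowing s
```

Here `s` is 23 and `s_alt` is 3, matching the example in the text.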
To prevent this type of attack, a sufficient amount of random bits is usually appended to each message. In practice,
128–256 random bits are considered to be secure. It can be useful to insert the random bits at various (fixed) positions of each message.
2.8 Rabin cryptosystem

In the Rabin cryptosystem (developed by Michael Oser Rabin, born 1931), a message is encrypted by squaring it in the quotient ring modulo $n = pq$ for two large prime numbers $p$ and $q$. The method was published in 1979. It should be noted that the decryption is ambiguous: indeed, if Bob sends the value $y \in \{0, \ldots, n-1\}$ to Alice, where $x^2 \equiv y \bmod n$, then Alice finds four different square roots of $y$ and she has to decide which is the correct one for $x$. The number of choices can be reduced to 2 by requiring the selection of $x$ in the set $\{1, \ldots, \frac{n-1}{2}\}$ because $x^2 \equiv (n - x)^2 \bmod n$. But this does not resolve ambiguity completely. This disadvantage is partly compensated by the fact that the decryption can be proved to be essentially as hard as the factorization of $n$. No such statement is known for RSA.

It is easy to put more assumptions on $x$ so that the decryption is very likely to be unique. As a typical example, you can require the first $k$ bits of a message to match the last $k$ bits. However, with additional assumptions about the encrypted messages, the connection between the hardness of decryption and that of integer factorization may become more and more incoherent. It seems to be a principal dilemma with Rabin’s method. Still, the mathematical and philosophical consequences are interesting enough to justify discussing the Rabin cryptosystem. The system works as follows:
(a) Alice chooses two distinct primes $p$ and $q$ with $p \equiv q \equiv -1 \bmod 4$ and computes $n = pq$.
(b) The public key is $n$, and the private key is $(p, q)$.
(c) If Bob wants to send a message $x \in \mathbb{Z}/n\mathbb{Z}$ to Alice, he computes $y = x^2 \bmod n$ and sends $y$.
(d) In order to decrypt $y$, Alice computes the values
$$z_p = y^{\frac{p+1}{4}} \bmod p \quad \text{and} \quad z_q = y^{\frac{q+1}{4}} \bmod q$$
and then, using the Chinese remainder theorem, obtains up to four different values of $z \in \mathbb{Z}/n\mathbb{Z}$ with $z \equiv \pm z_p \bmod p$ and $z \equiv \pm z_q \bmod q$.

To prove the correctness of the method, we have to show that one of the four possible values of $z$ equals $x$. From $x^2 \equiv y \bmod n$, we can conclude $x^2 \equiv y \bmod p$ and $x^2 \equiv y \bmod q$. Since $p$ and $q$ are prime, each of the equations $X^2 \equiv y \bmod p$ and $X^2 \equiv y \bmod q$ has at most two solutions $\pm x_p \in \mathbb{Z}/p\mathbb{Z}$ and $\pm x_q \in \mathbb{Z}/q\mathbb{Z}$, respectively. In particular, $x$ satisfies $x \equiv \pm x_p \bmod p$ and $x \equiv \pm x_q \bmod q$. By symmetry, it suffices to show that $z_p^2 \equiv y \bmod p$, because then $\{-z_p, +z_p\} = \{-x_p, +x_p\}$. For $p \mid y$, nothing has to be shown. So let $x, y \in (\mathbb{Z}/p\mathbb{Z})^*$. From Fermat’s little theorem, we have $y^{\frac{p-1}{2}} \equiv x^{p-1} \equiv 1 \bmod p$. This yields
$$z_p^2 \equiv y^{\frac{p+1}{2}} \equiv y \cdot y^{\frac{p-1}{2}} \equiv y \cdot 1 \bmod p$$
and thus the correctness of the Rabin method.

As mentioned before, Alice has to be able to identify the original message $x$ out of the four possible values for $z$. For this purpose, it is customary to limit the space of plaintexts in order to make it possible for Alice to choose the right one with high probability. Therefore, the first and the last $k$ bits of a message $x$ are often required to coincide; say $k = 20$ or $k = 64$.

If $n$ can be factorized, then it becomes easy to decrypt a message $y$ by performing the same steps as Alice does. However, an attacker does not have to proceed this way to decrypt $y$, but might be able to determine the original message $x$ in an entirely different way. Suppose there is an efficient algorithm $R$ such that, on input $y \in \{ x^2 \mid x \in \mathbb{Z}/n\mathbb{Z} \}$, it computes $R(y)$ with $R(y)^2 \equiv y \bmod n$. The idea is that a potential attacker will use $R$ for deciphering. We show that $R$ can be used to factorize $n$. First, we choose $x \in \{2, \ldots, n-1\}$ at random. If $\gcd(x, n) \neq 1$, then we have factorized $n$. So let $\gcd(x, n) = 1$ and $z = R(x^2 \bmod n)$. There are four possibilities which are equally likely:
(a) $x \equiv z \bmod p$ and $x \equiv z \bmod q$
(b) $x \equiv z \bmod p$ and $x \equiv -z \bmod q$
(c) $x \equiv -z \bmod p$ and $x \equiv z \bmod q$
(d) $x \equiv -z \bmod p$ and $x \equiv -z \bmod q$.
In each of the cases (b) and (c), the computation of $\gcd(x + z, n)$ yields a nontrivial factor of $n$, and $\gcd(x - z, n)$ gives the other factor of $n$. Therefore, we factorized $n$ with a success probability of 1/2. The success probability can be increased arbitrarily by repeating this algorithm for several rounds. Therefore, if we can compute square roots modulo $n$, then we can factorize $n$. So, under the assumption that factorizing $n$ is not possible, no attacker can have an algorithm for deciphering messages in the Rabin cryptosystem.
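Both the decryption steps and the reduction from factoring to square roots can be sketched in Python. This is a minimal illustration under stated assumptions: the toy primes $p = 7$, $q = 11$, the helper names `crt`, `rabin_roots`, `factor_with_oracle`, and the oracle `R` are all hypothetical. In particular, the oracle is simulated here with the secret factorization, purely to demonstrate that any such root-finding algorithm reveals the factors. Python 3.8 or later is assumed for `pow(a, -1, m)`.

```python
import random
from math import gcd

def crt(a, p, b, q):
    """Return z modulo p*q with z = a mod p and z = b mod q (p, q coprime)."""
    return (a * q * pow(q, -1, p) + b * p * pow(p, -1, q)) % (p * q)

def rabin_roots(y, p, q):
    """Square roots of a square y modulo p*q, for primes p = q = 3 mod 4."""
    zp = pow(y, (p + 1) // 4, p)
    zq = pow(y, (q + 1) // 4, q)
    return {crt(sp * zp, p, sq * zq, q) for sp in (1, -1) for sq in (1, -1)}

def factor_with_oracle(n, R, rng, rounds=200):
    """Factorize n using an oracle R that returns some square root mod n."""
    for _ in range(rounds):
        x = rng.randrange(2, n)
        g = gcd(x, n)
        if g == 1:
            z = R(x * x % n)
            g = gcd(x + z, n)   # nontrivial in cases (b) and (c)
        if 1 < g < n:
            return g, n // g
    raise RuntimeError("no factor found; try more rounds")

p, q = 7, 11
n = p * q
y = 20 * 20 % n                 # Bob encrypts x = 20 and sends y = 15
roots = rabin_roots(y, p, q)    # Alice's four candidates, one of them is 20

# Simulated oracle: always returns the smallest root, independent of x
R = lambda value: min(rabin_roots(value, p, q))
factors = factor_with_oracle(n, R, random.Random(1))
```

Each round succeeds with probability about 1/2, so the bound of 200 rounds is only a safeguard for the sketch.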
If we are using the variant where the first and the last $k$ bits of $x$ coincide, then an attacker only needs to have an algorithm $R$ which, on input
$$y \in \{ x^2 \mid x \in \mathbb{Z}/n\mathbb{Z} \text{ and the first and the last } k \text{ bits of } x \text{ coincide} \}$$
computes $R(y)$ with $R(y)^2 \equiv y \bmod n$. In this setting, you still choose $x \in \{2, \ldots, n-1\}$ at random (choosing $x$ such that the first and the last $k$ bits coincide is pointless since the result will be $z = x$); and you can assume that $\gcd(x, n) = 1$. The element $y = x^2 \bmod n$ is in the domain of $R$ with a probability in the order of magnitude of $2^{-k}$. Therefore, after an expected number of $2^k$ rounds, we find $y$ in the domain of $R$ and we can proceed with $z = R(y)$ as before. Therefore, with increasing $k$, this reduction becomes increasingly loose. So, from a practical point of view, the reduction still works for $k = 20$, but it fails for $k = 64$. Some other reduction might work even for $k = 64$, but to date no such approach is known.
2.9 Diffie–Hellman key exchange

Consider the following scenario where Alice and Bob wish to establish an encrypted communication over the Internet. For efficiency reasons, they want to use a symmetric encryption method and thus they need a common secret key. Agreement on such a key can be performed securely and efficiently using the Diffie–Hellman key exchange (Bailey Whitfield Diffie, born 1944; Martin Edward Hellman, born 1945).

Alice and Bob agree on a prime number $p$ and a number $g \bmod p$ generating a sufficiently large subgroup of $\mathbb{F}_p^*$. Since $\mathbb{F}_p^*$ is cyclic, there are $\varphi(p-1)$ generators of $\mathbb{F}_p^*$. Therefore, it is often required for $g$ to generate the whole group $\mathbb{F}_p^*$. For practical applications, however, a weaker assumption is sufficient. You can choose $p$ such that $p - 1$ is divided by a large prime number $q$. For a randomly chosen value of $g$, we have $g^{(p-1)/q} \not\equiv 1 \bmod p$ with probability $1 - 1/q$. Thus, a random element $g$ almost surely passes the test $g^{(p-1)/q} \not\equiv 1 \bmod p$. If $g \in (\mathbb{Z}/p\mathbb{Z})^*$ passes the test, then $q$ divides the order of $g$.

Alice and Bob agree on the values of $p$ and $g$. These need not be kept secret. So they may communicate via e-mail. Then Alice and Bob produce secret random numbers $a$ and $b$ in $\{2, \ldots, p-1\}$, respectively. Alice computes $A = g^a \bmod p$ and Bob $B = g^b \bmod p$. Now Alice publicly sends $A$ to Bob, and Bob publicly sends $B$ back to Alice. Since $A^b \equiv g^{ab} \equiv B^a \bmod p$, they can use the key
$$K = (A^b \bmod p) = (B^a \bmod p) \in \{1, \ldots, p-1\}$$
for their subsequent communication. For instance, they could use $K$ as a secret key in some symmetric cryptosystem. The following example uses very small numbers, in particular $q = 11$. In realistic applications, numbers with several hundred decimal digits are used.

Example 2.3. Alice and Bob agree on $p = 23$ and $g = 5$. Then, $g^2 \equiv 2 \bmod 23$. Therefore, the order of $g$ in $(\mathbb{Z}/23\mathbb{Z})^*$ is at least 11. In fact, $g$ generates $(\mathbb{Z}/23\mathbb{Z})^*$ and thus its order is 22. Alice chooses (randomly) $a = 6$, Bob chooses $b = 13$. Alice computes $A \equiv 5^6 \equiv 8 \bmod 23$ and sends $A = 8$ to Bob. Bob in turn computes $B \equiv 5^{13} \equiv 21 \bmod 23$ and sends $B = 21$ to Alice. Alice computes $K = 21^6 \bmod 23 = 18$. Bob computes $K = 8^{13} \bmod 23 = 18$. Everyone may see the numbers 23, 5, 8, and 21, but the secret key $K = 18$ remains hidden. ◊

The security of this method depends on the difficulty of computing the discrete logarithm. In practice, very large primes are used. The prime number $q$ is maximal with respect to $p$ if $q = (p-1)/2$ is also a prime. In this case, $q$ is called a Sophie Germain prime (Sophie Germain, 1776–1831), and $g$ can be chosen to be 2. The values of $p, g, A, B$ are known to any eavesdropper, but the Diffie–Hellman problem is to compute $K$. The Diffie–Hellman problem is solvable if the discrete logarithm $a$ of $A \equiv g^a \bmod p$ or $b$ of $B \equiv g^b \bmod p$ can be computed. So far, no efficient algorithm for the Diffie–Hellman problem is known.

However, a third party Oscar could try to get between Alice and Bob, and then pretend to be Alice communicating with Bob, and to be Bob communicating with Alice, without them realizing that. To prevent this man-in-the-middle attack, Alice and Bob can provide signatures for all messages sent. Digital signatures are discussed in Section 2.12.
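The numbers of Example 2.3 can be replayed with Python's built-in modular exponentiation. The sketch below is illustrative only: the helper names `passes_subgroup_test` and `brute_force_dlog` are hypothetical, and the brute-force discrete logarithm is feasible only because $p = 23$ is tiny, which is exactly why realistic parameters have several hundred decimal digits.

```python
p, g = 23, 5          # public parameters from Example 2.3
q = 11                # large prime divisor of p - 1 (here p - 1 = 2 * 11)

def passes_subgroup_test(g: int, p: int, q: int) -> bool:
    """Test g^((p-1)/q) != 1 mod p, so that q divides the order of g."""
    return pow(g, (p - 1) // q, p) != 1

a, b = 6, 13          # secret exponents of Alice and Bob
A = pow(g, a, p)      # Alice sends A = 8
B = pow(g, b, p)      # Bob sends B = 21

K_alice = pow(B, a, p)
K_bob = pow(A, b, p)  # both obtain K = 18

def brute_force_dlog(g: int, target: int, p: int) -> int:
    """Find the smallest i with g^i = target mod p by exhaustive search."""
    return next(i for i in range(p - 1) if pow(g, i, p) == target)

# An eavesdropper who can compute discrete logarithms recovers the key
a_found = brute_force_dlog(g, A, p)
K_oscar = pow(B, a_found, p)
```

For this toy instance the eavesdropper's key `K_oscar` coincides with the shared key, illustrating that the scheme is only as strong as the discrete logarithm problem in the chosen group.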
2.10 ElGamal cryptosystem

The ElGamal method was developed in 1985 by Taher ElGamal (born 1955) and is used for both digital signatures and encryption [40]. Like RSA, the ElGamal method is an asymmetric encryption algorithm using a public and a private key. While the security of the RSA algorithm is based on the fact that no efficient factorization algorithm is known, ElGamal uses the discrete logarithm problem. The basic idea is simple: multiply a message with the key obtained from the Diffie–Hellman key exchange.

Again, a prime number $p$ with a large prime factor $q$ of $p - 1$ is needed. Typically, one first chooses $q$, a sufficiently large prime, and then checks primality of $p = kq + 1$ for some relatively small even number $k$. If $g^k \not\equiv 1 \bmod p$ for $g \in (\mathbb{Z}/p\mathbb{Z})^*$, then $q$ divides the order of $g$. A random element satisfies this property with a probability of $1 - 1/q$. Alice chooses a random number $a \in \{2, \ldots, p-2\}$ as secret key and computes
$$A = g^a \bmod p$$
The public key consists of the triple $(p, g, A)$. The encryption of a message $m$ by Bob to Alice is performed blockwise, so we can assume that the plaintext $m$ is from the set $\{0, \ldots, p-1\}$. Bob chooses a random number $b \in \{2, \ldots, p-2\}$ and computes
$$B = g^b \bmod p \quad \text{and} \quad c = A^b m \bmod p$$
The ciphertext consists of the pair $(B, c)$. Alice decrypts the text by multiplying $c$ by $B^{p-1-a}$ in $\mathbb{F}_p^*$. This leads to the correct result
$$B^{p-1-a} c \equiv B^{-a} c \equiv g^{-ab} c \equiv A^{-b} A^b m \equiv m \bmod p$$

Example 2.4. Let $p$ be the Sophie Germain prime 11 and $g = 2$. Let the secret key be $a = 4$. Alice publishes the triple $(p, g, A) = (11, 2, 5)$. Bob wants to send the message $m = 7$ and has chosen $b = 3$ and thus $B = 8$. He computes $c \equiv 5^3 \cdot 7 \equiv 6 \bmod 11$ and sends the pair $(8, 6)$ to Alice. She obtains $8^6 \cdot 6 \equiv 7 \bmod 11$, which is the correct plaintext. ◊

A disadvantage of the ElGamal method is that the ciphertext is twice as long as the plaintext. Moreover, Bob needs to use a new random number $b$ for each message $m$.
Otherwise, Oscar can easily decrypt all messages as soon as he knows at least one pair of a plaintext and a ciphertext. If $(B, c)$ and $(B, c')$ are ciphertexts corresponding to messages $m$ and $m'$, respectively, and if Oscar knows $(B, c)$ and $(B, c')$ as well as $m$, he obtains $m' \equiv m c^{-1} c' \bmod p$. Typically, secret messages do not remain secret very long. For example, Oscar may simply assume that the first message is Hello Alice.

In an analogous manner, ElGamal encryption can be done over any finite cyclic group. In particular, this can be done over a cyclic subgroup of an elliptic curve. Thus, it serves as the basis for elliptic curve cryptography (see, e.g., [5]). We discuss elliptic curves and their groups in Section 5.
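Example 2.4 and the danger of reusing $b$ can be traced in a short Python sketch. The function names `encrypt` and `decrypt` and the second message $m' = 9$ are assumptions made for the illustration; the arithmetic follows the formulas of this section. Python 3.8 or later is assumed for the modular inverse `pow(c, -1, p)`.

```python
p, g = 11, 2
a = 4                     # Alice's secret key
A = pow(g, a, p)          # public key component, A = 5

def encrypt(m: int, b: int) -> tuple[int, int]:
    """Return the ciphertext (B, c) with B = g^b and c = A^b * m mod p."""
    return pow(g, b, p), pow(A, b, p) * m % p

def decrypt(B: int, c: int) -> int:
    """Multiply c by B^(p-1-a), which equals B^(-a) in F_p^*."""
    return pow(B, p - 1 - a, p) * c % p

B, c = encrypt(7, b=3)    # Bob sends (8, 6)
m = decrypt(B, c)         # Alice recovers 7

# Bob carelessly reuses b = 3 for a second message m' = 9
B2, c2 = encrypt(9, b=3)

# Oscar knows m, c and c2 (same B) and computes m' = m * c^(-1) * c2 mod p
m2_oscar = m * pow(c, -1, p) * c2 % p
```

Oscar obtains the second plaintext without ever learning $a$ or $b$, which is why a fresh $b$ is required for every message.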
2.11 Cryptographic hash functions

One of the applications of cryptographic hash functions is digital signatures, as described in the next section. In general, the alphabet in this section is $\mathbb{B} = \{0, 1\}$; however, the concepts introduced here can be adapted to arbitrary finite alphabets. A hash function is a mapping $h : \mathbb{B}^* \to \mathbb{B}^n$ for $n \in \mathbb{N}$; it maps words of arbitrary length to words of fixed length $n$. A compression function is a mapping $g : \mathbb{B}^m \to \mathbb{B}^n$ for natural numbers $m > n$; it maps words of a given length to words of a fixed smaller length. Neither hash functions nor compression functions are injective. To be able to efficiently use hash functions in cryptography, we require $h(x)$ to be easily computable for all $x$. However, it is crucial that one cannot efficiently find an $x$ for a given $s \in \mathbb{B}^n$ with $h(x) = s$. A mapping having these two properties is called a one-way function.

From the current state of research it is not known whether one-way functions exist at all. Based on the principle of relative computational security, however, there are many mappings for which it is generally assumed that finding preimages is hard. A function for which no efficient method for finding preimages is currently known is the exponentiation in finite groups. Here, computing preimages means just solving the discrete logarithm problem; see Section 3.7.

A collision is a pair $(x, x')$ such that $h(x) = h(x')$ and $x \neq x'$. A function $h$ is called collision resistant if it is computationally hard to find a collision $(x, x')$. All hash and compression functions have collisions since they are not injective. Collision-resistant hash functions are one-way functions: if an algorithm can compute the preimage $x$ of an image $y$, then one can randomly choose an $x'$ and compute $y = h(x')$ and some preimage $x \in h^{-1}(y)$. Now, $(x, x')$ is a collision if $x \neq x'$. The probability for this is high enough since $h$ is far from being injective. Thus, if collisions cannot be computed efficiently, then preimages also cannot be determined efficiently.
An encryption function $c_k : \mathbb{B}^n \to \mathbb{B}^n$ with key $k \in \mathbb{B}^n$ yields the following canonical compression functions $g : \mathbb{B}^{2n} \to \mathbb{B}^n$:
(a) $g(k, x) = c_k(x) \oplus x$
(b) $g(k, x) = c_k(x) \oplus x \oplus k$
(c) $g(k, x) = c_k(x \oplus k) \oplus x$
(d) $g(k, x) = c_k(x \oplus k) \oplus x \oplus k$.
Here, $\oplus$ is the bitwise exclusive-or. All these mappings can easily be computed (assuming that this is the case for $c_k$), but they are difficult to invert if we have a good encryption function. More robust statements are usually not available. On the one hand, this is because such statements about the security of encryption functions are not known; on the other hand, it is also important how $\oplus$ and $c_k$ behave relative to each other. If, for example, $c_k(x) = x \oplus k$, then all of the above compression functions are easy to invert.
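The last remark can be made concrete with a small Python experiment. The sketch below plugs the deliberately weak choice $c_k(x) = x \oplus k$ into the four constructions; the function names `g_a` to `g_d` and the use of 4-bit integers as stand-ins for bit strings are assumptions for illustration. In each case the result no longer depends on $x$, so preimages can be written down immediately.

```python
def c(k: int, x: int) -> int:
    """A deliberately weak 'encryption': c_k(x) = x XOR k."""
    return x ^ k

def g_a(k, x): return c(k, x) ^ x           # collapses to k
def g_b(k, x): return c(k, x) ^ x ^ k       # collapses to 0
def g_c(k, x): return c(k, x ^ k) ^ x       # collapses to 0
def g_d(k, x): return c(k, x ^ k) ^ x ^ k   # collapses to k

pairs = [(k, x) for k in range(16) for x in range(16)]  # all 4-bit inputs

a_is_k = all(g_a(k, x) == k for k, x in pairs)
b_is_zero = all(g_b(k, x) == 0 for k, x in pairs)
c_is_zero = all(g_c(k, x) == 0 for k, x in pairs)
d_is_k = all(g_d(k, x) == k for k, x in pairs)
```

For instance, a preimage of any value $s$ under (a) is $(k, x) = (s, x)$ for an arbitrary $x$, and any two inputs with the same $k$ collide.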
Merkle–Damgård construction

We now assume that we have a collision resistant compression function $g : \mathbb{B}^{2n} \to \mathbb{B}^n$; for example, we may use one of the functions defined above, if we have a sufficiently good encryption function. Using $g$, we intend to construct a collision resistant hash function $h : \mathbb{B}^* \to \mathbb{B}^n$. For this purpose, we use the Merkle–Damgård construction designed by Ralph C. Merkle (born 1952) and Ivan Bjerre Damgård (born 1956). To simplify the presentation, we assume that we only compute hash values of messages with less than $2^n$ bits. For the typical value of $n = 128$, this number is greater than the estimated number of atoms in the universe. Collisions with more than $2^{128}$ bits could not be written down anyway.

So let $x \in \mathbb{B}^*$ be any word with less than $2^n$ bits. We add as few zeros as possible on the right of $x$ in order to obtain a word $x'$ whose length is divisible by $n$. Let $\ell_x \in \mathbb{B}^n$ be the binary encoding of the length of $x$; we allow leading zeros such that $\ell_x$ has length $n$. The length of the word $\bar{x} = x' \ell_x$ is divisible by $n$. We can therefore write $\bar{x} = x_1 \cdots x_s$ with $x_i \in \mathbb{B}^n$ and $s \geq 1$. Note that $x_s = \ell_x$. Now, we let
$$H_0 = 0^n \in \mathbb{B}^n \quad \text{and} \quad H_i = g(H_{i-1} \cdot x_i) \text{ for } i \geq 1;$$
here, $\cdot$ denotes the concatenation operation. The hash value of $x$ is defined as
$$h(x) = H_s$$

Suppose we know two different words $x, y \in \mathbb{B}^*$ of length less than $2^n$ satisfying $h(x) = h(y)$. We show that, under this assumption, we are able to find a collision for $g$; to do that, we define $\bar{x} = x_1 \cdots x_s$ and $\bar{y} = y_1 \cdots y_t$ as extensions of $x$ and $y$ as defined above, with $x_i, y_i \in \mathbb{B}^n$. Let $H_0, \ldots, H_s$ be the intermediate values in the computation of the hash value of $x$, and $G_0, \ldots, G_t$ those for $y$. Without loss of generality, let $s \leq t$.

If $H_{s-i-1} \neq G_{t-i-1}$ and $H_{s-i} = G_{t-i}$, then $(H_{s-i-1} \cdot x_{s-i},\ G_{t-i-1} \cdot y_{t-i})$ is a collision of $g$, as both words are mapped by $g$ to $H_{s-i}$. If this situation does not occur, then, since $H_s = h(x) = h(y) = G_t$, the property $H_{s-i} = G_{t-i}$ holds for all $i \in \{0, \ldots, s\}$. Under this assumption, we show that there is an index $i \in \{0, \ldots, s-1\}$ with $x_{s-i} \neq y_{t-i}$. If, on the contrary, we have $x_{s-i} = y_{t-i}$ for all $0 \leq i < s$, then in particular $\ell_x = x_s = y_t = \ell_y$. Consequently, $x$ and $y$ have the same length, resulting in $s = t$ and $x = y$; this is a contradiction. Thus, an index $i \in \{0, \ldots, s-1\}$ with $x_{s-i} \neq y_{t-i}$ exists. Then, $(H_{s-i-1} \cdot x_{s-i},\ G_{t-i-1} \cdot y_{t-i})$ is a collision of $g$. Therefore, if $g$ is collision resistant, then $h$ is collision resistant.
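The padding and chaining steps of the Merkle–Damgård construction can be sketched as follows. Several choices here are assumptions made only for the illustration and are not part of the text: words are byte strings, $n = 128$ bits (16 bytes), and the compression function `g` is a stand-in built by truncating SHA-256 from Python's `hashlib`; the names `N_BYTES`, `g` and `md_hash` are hypothetical.

```python
import hashlib

N_BYTES = 16  # n = 128 bits

def g(block: bytes) -> bytes:
    """Stand-in compression function from 2n bits to n bits (an assumption)."""
    if len(block) != 2 * N_BYTES:
        raise ValueError("compression input must have length 2n")
    return hashlib.sha256(block).digest()[:N_BYTES]

def md_hash(x: bytes) -> bytes:
    """Merkle-Damgard hash: zero-pad x, append the length block, chain g."""
    pad = (-len(x)) % N_BYTES                 # as few zeros as possible
    x_padded = x + b"\x00" * pad              # the word x'
    length_block = (8 * len(x)).to_bytes(N_BYTES, "big")  # l_x with leading zeros
    x_bar = x_padded + length_block
    H = b"\x00" * N_BYTES                     # H_0 = 0^n
    for i in range(0, len(x_bar), N_BYTES):
        H = g(H + x_bar[i:i + N_BYTES])       # H_i = g(H_{i-1} . x_i)
    return H

digest = md_hash(b"abc")
empty_digest = md_hash(b"")
zero_byte_digest = md_hash(b"\x00")
```

The empty word and a single zero byte differ only in the length block after padding, which is what keeps them apart; this mirrors the role of $\ell_x$ in the collision argument above.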
2.12 Digital signatures

In 1991, the Digital Signature Algorithm (DSA) was proposed by the US National Institute of Standards and Technology (NIST). To send an appropriately signed message, a random number generator and an efficient primality test are required. Additionally, one relies on well-known secure hash functions, like the ones described in the previous section. A hash function computes a small fixed-length value from an arbitrarily long bit string. In the scenario of DSA, hash values with bit length 160 are needed. We now describe the nine steps Alice has to perform in order to sign a message according to the DSA standard. The first five steps are independent of the message content:
(1) Alice chooses a prime number $q$ with about 160 bits.
(2) She selects some even number $k$ such that $p = kq + 1$ is a prime number with 512–1024 bits.
(3) She searches for $g_0 \in \mathbb{F}_p^*$ satisfying $g_0^k \not\equiv 1 \bmod p$ and computes $g = g_0^k \bmod p$.
(4) Alice randomly chooses $x \in \{1, \ldots, q-1\}$ and computes $y = g^x \bmod p$.
(5) Alice publishes $(q, p, g, y)$ and specifies the hash function.

So far, Alice’s only secret is the number $x$. It might be unclear what $g \equiv g_0^k \not\equiv 1 \bmod p$ means and why Alice succeeds in finding such an element $g$. We know that the multiplicative group $\mathbb{F}_p^*$ is cyclic, and since $q \mid p - 1$, it has a unique subgroup $U$ of order $q$. In fact, Alice needs a generator $g$ of $U$. For each $g_0 \in \mathbb{F}_p^*$, the element $g_0^k$ is in $U$, which is guaranteed by Fermat’s little theorem. If $g \not\equiv 1 \bmod p$, then $g$ is a generator because $q$ is prime. The chance of finding a generator of $U$ is very high: the homomorphism $\mathbb{F}_p^* \to U$ with $f \mapsto f^k$ is surjective and all elements of $U$ are met equally often. Therefore, the probability for $g$ to generate the group $U$ is $1 - 1/q$. The method’s security depends on the fact that no efficient algorithm is known for computing $x \in \{1, \ldots, q-1\}$ from $y \in \{1, \ldots, p-1\}$. The only known method is to
compute the discrete logarithm of $y$ to base $g$. The order $q$ has a binary length of 160 bits and should thus be too large to do that.

After these general preliminaries, we now describe the steps that Alice needs to perform when sending a signed message $m$ to Bob.

(6) Alice computes the hash value $h \in \{1, \ldots, q-1\}$ of the message $m$.
(7) She randomly chooses a sufficiently large number $\ell \in \{1, \ldots, q-1\}$.
(8) She computes the remainder $r = ((g^\ell \bmod p) \bmod q) \in \{0, \ldots, q-1\}$.
(9) Finally, she determines $s = (\ell^{-1}(h + xr) \bmod q) \in \{0, \ldots, q-1\}$. If $s = 0$, then Alice restarts the procedure at step (7).

The last step is possible because $\ell \in (\mathbb{Z}/q\mathbb{Z})^*$. The signature of the message $m$ is the pair $(r, s)$, and Alice sends $(m, r, s)$ to Bob. The signature is a bit string of length 320, that is, 40 bytes.

If Bob receives a message $m$ with signature $(r, s)$, he verifies the signature as follows. He computes the hash value $h$ and then $u = s^{-1}h \bmod q$ and $\upsilon = s^{-1}r \bmod q$. Then, he checks whether the following congruence holds:

$$r \equiv (g^u y^\upsilon \bmod p) \bmod q$$

If the congruence is satisfied, he accepts the signature. In order to see that all correctly built signatures are accepted, we consider the following lines. From Alice's computations, we know that $s\ell \equiv h + xr \bmod q$. This yields

$$\ell \equiv s^{-1}h + x s^{-1} r \equiv u + x\upsilon \bmod q$$
$$g^\ell \equiv g^u g^{x\upsilon} \equiv g^u y^\upsilon \bmod p$$

and consequently $r \equiv (g^u y^\upsilon \bmod p) \bmod q$, as desired. Note that the signature $(r, s)$ only fits one single hash value $h$. Thus, it is not possible to eavesdrop on a signature $(r, s)$ and use it for a different message $m'$ with the hash value $h' \neq h$.

Suppose that Oscar tries to fake a signature. We may assume that he is using the correct hash value $h$ for the message $m$. The problem for Oscar is that, after the values $(h, r, s)$ have been chosen, the values of $\ell \bmod q$ and $x \bmod q$ directly determine each other. So, Oscar can then compute $\ell$ if and only if he can compute $x$. If he has no information about $x$ at all, then for him all elements $g^\ell \in U$ are equally likely. For solving the congruence $r \equiv (g^\ell \bmod p) \bmod q$, nothing better is known than computing the discrete logarithm of $y$ to base $g$. In order to prevent the public key $(q, p, g, y)$ from being compromised, one can use certification agencies for signing the public key.

As a second method for digital signatures, let us mention RSA signatures. The idea here is to reverse the order of encryption and decryption in a cryptosystem. The RSA algorithm works correctly, because the encryption function $c$ and the decryption function $d$ are mutually inverse functions. Bob knows Alice's public RSA key and can compute the encryption function $c$, but only Alice knows the decryption function $d$. If Alice wants to send a signed message $m$ to Bob, then she computes the hash value $h(m)$ and
sends the pair (m, d(h(m))) to Bob. Bob, receiving a message (m, s) from Alice, computes h(m) and accepts if h(m) = c(s). If the message indeed originates from Alice, then c(s) = c(d(h(m))) = h(m).
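The DSA steps (1) to (9) and Bob's verification can be traced with toy numbers. The following Python sketch is only an illustration under stated assumptions: the parameters q = 101 and k = 78 (so p = 7879) are far too small for real use, SHA-256 reduced into {1, ..., q − 1} stands in for the 160-bit hash function, and the function names are hypothetical.

```python
import hashlib
import secrets

# Toy parameters (assumption: chosen for readability, not for security).
q = 101            # step (1): a prime q
k = 78             # step (2): an even k such that p = k*q + 1 is prime
p = k * q + 1      # p = 7879


def keygen():
    """Steps (3)-(5): find a generator g of the subgroup U and a key pair."""
    while True:
        g0 = secrets.randbelow(p - 1) + 1      # g0 in {1, ..., p-1}
        g = pow(g0, k, p)
        if g != 1:                             # g0^k must not be 1 mod p
            break
    x = secrets.randbelow(q - 1) + 1           # secret x in {1, ..., q-1}
    y = pow(g, x, p)                           # public y = g^x mod p
    return g, x, y


def hash_to_range(message: bytes) -> int:
    """Step (6): map the message to h in {1, ..., q-1} (assumed construction)."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") % (q - 1) + 1


def sign(message: bytes, g: int, x: int):
    """Steps (6)-(9): return the signature (r, s)."""
    h = hash_to_range(message)
    while True:
        l = secrets.randbelow(q - 1) + 1       # step (7)
        r = pow(g, l, p) % q                   # step (8)
        s = pow(l, -1, q) * (h + x * r) % q    # step (9)
        if s != 0:                             # otherwise restart at step (7)
            return r, s


def verify(message: bytes, r: int, s: int, g: int, y: int) -> bool:
    """Bob's check: r == (g^u * y^v mod p) mod q."""
    if not (0 <= r < q and 0 < s < q):
        return False
    h = hash_to_range(message)
    s_inv = pow(s, -1, q)
    u = s_inv * h % q
    v = s_inv * r % q
    return r == (pow(g, u, p) * pow(y, v, p) % p) % q
```

With such a small q, forged signatures succeed by chance with noticeable probability, so the sketch only demonstrates that correctly built signatures are accepted.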
2.13 Secret sharing

Secret sharing is about the following situation: Alice wants to distribute a secret $s$ among $n$ people in such a way that $t$ of these people together are able to reconstruct $s$; however, if fewer than $t$ people meet, it must not be possible for them to gain any information about $s$. Mechanisms like this are often used in access controls for systems or information: for instance, one can imagine a company with twenty employees that has a very important customer database. This database is encrypted with a key $s$. The owner, Alice, does not want any single employee to gain access to the database because he could steal the file and sell it to a competitor. But if three or more employees meet, they should be able to work with the customer database. It should not matter at all which three of the twenty employees are trying to access entries of the customer file. Another useful property is the ability to add another person with access to the secret $s$. In the above example, Alice could hire more employees.

We first consider a mechanical solution of this problem. Suppose that the secret is hidden behind some door. Alice could build a door with several locks such that each lock has to be unlocked before the door opens. For every lock, she can produce arbitrarily many keys. Let $\{1, \ldots, n\}$ be the set of employees, and let $A_1, \ldots, A_m$ be the subsets of $\{1, \ldots, n\}$ with $t - 1$ elements. We note that $m = \binom{n}{t-1}$. Alice builds $m$ locks, each one labeled by a subset $A_j$.
[Figure: a door with $m$ locks labeled $A_1, A_2, \ldots, A_{m-1}, A_m$.]
An employee $i \in \{1, \ldots, n\}$ gets the key to lock $A_j$ if and only if $i \notin A_j$. Therefore, if $B = A_j$ is a meeting of $t - 1$ employees, then none of the employees in $B$ has the key to the lock $A_j$. In particular, no subset of fewer than $t$ employees can open the door. On the other hand, if $B$ is a subset of at least $t$ employees, then $B \setminus A_j \neq \emptyset$ for all $j \in \{1, \ldots, m\}$. This shows that every key is in the possession of some member in $B$ and thus, the group $B$ can open the door.
In order to implement this method without physical locks, Alice could fix a large finite commutative group $G$; for instance, Alice could use $G = \mathbb{Z}/n\mathbb{Z}$. We assume that the secret is an element $s \in G$. Every key to the virtual lock $A_j$ is a pair $(j, g_j)$ where $g_j$ is a random element in $G$, except for $g_m$, which is chosen as $g_m = s - \sum_{j=1}^{m-1} g_j$. Then, all keys together can recover $s = \sum_{j=1}^{m} g_j$. On the other hand, if some key is missing, then all elements in $G$ could be the secret key (with the same probability if the elements $g_j$ are chosen independently and uniformly at random). The purpose of the first component in the keys is to (easily) allow $g_j = g_k$ for $j \neq k$.

If Alice hires a new employee $n + 1$, she has to install $\binom{n}{t-2} = \binom{n+1}{t-1} - \binom{n}{t-1}$ new locks. The labels of these locks are subsets $B_j$ of $\{1, \ldots, n+1\}$ with $n + 1 \in B_j$ and $|B_j| = t - 1$. Thus, in the physical setting, the new employee $n + 1$ gets the keys to all the old locks, and the keys to the new locks are distributed accordingly among the old employees. In the virtual implementation, Alice has to choose a new element $g_j \in G$ for every lock but one, including the old locks. The remaining key is chosen such that the sum yields $s$. If Alice keeps the old keys for the locks $A_1, \ldots, A_m$, then the new employee can recover the secret $s$ all on his own by summing up his keys.

The large number of keys and the difficulties when adding participants are serious drawbacks. These problems are solved by an approach developed by Shamir [94].
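The virtual lock construction can be sketched in a few lines of Python. This is an illustration under assumptions: the group is taken to be the integers modulo a caller-supplied `modulus`, and the names `lock_labels`, `deal` and `recover` are hypothetical.

```python
from itertools import combinations
import secrets


def lock_labels(n, t):
    """All (t-1)-element subsets A_j of {1, ..., n}; one virtual lock per subset."""
    return [frozenset(c) for c in combinations(range(1, n + 1), t - 1)]


def deal(secret, n, t, modulus):
    """Return {employee i: {lock index j: g_j}}; employee i holds g_j iff i is not in A_j."""
    locks = lock_labels(n, t)
    # Random g_j for every lock but the last; the last one makes the sum equal the secret.
    values = [secrets.randbelow(modulus) for _ in locks[:-1]]
    values.append((secret - sum(values)) % modulus)
    return {
        i: {j: g for j, (label, g) in enumerate(zip(locks, values)) if i not in label}
        for i in range(1, n + 1)
    }


def recover(key_rings, num_locks, modulus):
    """Pool the key rings of a group; return the secret, or None if a key is missing."""
    pooled = {}
    for ring in key_rings:
        pooled.update(ring)
    if len(pooled) < num_locks:
        return None
    return sum(pooled.values()) % modulus
```

For n = 5 and t = 3 this produces ten locks; any three key rings together contain all ten values, while any two miss exactly the lock labeled by that pair.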
Shamir's secret sharing scheme

We assume that the secret $s$ is a natural number: for instance, $s$ could encode a password in decimal. Alice first chooses a prime number $p$ which should be much larger than $n$ and $s$. The prime number $p$ is published. Now, Alice constructs a random polynomial $a(X)$ by choosing coefficients $a_1, \ldots, a_{t-1} \in \mathbb{F}_p$ independently and uniformly at random. These coefficients together with the secret $s$ define the polynomial

$$a(X) = s + \sum_{i=1}^{t-1} a_i X^i \in \mathbb{F}_p[X]$$

The degree of $a$ is smaller than $t$, and at 0, the polynomial $a$ evaluates to $a(0) = s$. Next, Alice communicates the information $(i, a(i))$ to person $i$ using a secure channel. If $t$ or more of these people work together, they can reconstruct the coefficients of the polynomial $a$ with their $t$ values and, thus, the secret $s$. For instance, they may use Lagrange interpolation. For $t$ pairwise different elements $x_1, \ldots, x_t \in \mathbb{F}_p$, the $i$th Lagrange polynomial $\ell_i(X)$ is defined by

$$\ell_i(X) = \prod_{\substack{1 \le j \le t \\ j \neq i}} \frac{X - x_j}{x_i - x_j}$$

Thus, $\ell_i$ is a polynomial of degree $t - 1$, and

$$\ell_i(x_j) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$

If $a(x_1), \ldots, a(x_t)$ are known, we can determine $\tilde{a}(X) = \sum_{1 \le i \le t} a(x_i)\,\ell_i(X)$. The polynomial $a(X) - \tilde{a}(X)$ has zeros in each of $x_1, \ldots, x_t$. Since the degree of $a(X) - \tilde{a}(X)$ is at most $t - 1$, it has to be the zero polynomial. We conclude $a(X) = \tilde{a}(X)$. This shows that $t$ or more people can reconstruct the secret $s$. It is also easy to see that further people can be added easily.

We still have to show that fewer than $t$ people are not able to obtain information about the secret $s$. For this, let $t - 1$ different points $(x_i, a(x_i)) \in (\mathbb{F}_p \setminus \{0\}) \times \mathbb{F}_p$ for $i \in \{1, \ldots, t-1\}$ be known. If fewer than $t - 1$ points are known, even less information is available. By Lagrange interpolation, there are $p$ polynomials $\tilde{a}(X) \in \mathbb{F}_p[X]$ of degree less than $t$ satisfying $\tilde{a}(x_i) = a(x_i)$ for all $i \in \{1, \ldots, t-1\}$. Each of these polynomials would result in a different secret $\tilde{s} = \tilde{a}(0)$. Since each such polynomial $\tilde{a}(X)$ is equally likely, every secret $\tilde{s}$ is equally likely. We note that if we had used $\mathbb{Z}$ or $\mathbb{Q}$ instead of $\mathbb{F}_p$, then the possible elements $\tilde{s}$ could not all have the same probability; in addition, the coefficients $a_i$ could not be chosen uniformly at random.
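Shamir's scheme translates almost directly into code. The sketch below is a minimal illustration, assuming the caller supplies a prime `p` larger than both the secret and the number of participants; the function names are hypothetical, and reconstruction evaluates the Lagrange interpolation formula only at X = 0.

```python
import secrets


def make_shares(secret, n, t, p):
    """Return the shares (i, a(i)) for i = 1, ..., n with a(0) = secret."""
    # Coefficients s, a_1, ..., a_{t-1}: the polynomial has degree smaller than t.
    coeffs = [secret % p] + [secrets.randbelow(p) for _ in range(t - 1)]

    def a(x):
        value = 0
        for c in reversed(coeffs):          # Horner evaluation in F_p
            value = (value * x + c) % p
        return value

    return [(i, a(i)) for i in range(1, n + 1)]


def reconstruct(shares, p):
    """Recover a(0) from at least t shares via sum of a(x_i) * l_i(0)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if j != i:
                num = num * (0 - xj) % p    # numerator of l_i evaluated at 0
                den = den * (xi - xj) % p   # denominator of l_i
        secret = (secret + yi * num * pow(den, -1, p)) % p
    return secret
```

Adding a participant only requires evaluating the same polynomial at a new point, which is why the drawbacks of the lock construction do not arise here.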
2.14 Digital commitment

The following conversation between Alice, an employee of the investment advisory company Ruin Invest, and a potential customer Bob was intercepted.³

Bob: Tell me five shares you recommend for purchase. If they all rise during the next month, then I will be your customer.
Alice: If I told you the shares, then you could invest without paying us. I suggest you look at our recommendations of the last month. You will be amazed what gains you would have made on a purchase.
Bob: How can I be sure that you really recommended these shares? You could just pick five winners. If you now tell me ongoing shares, I know you cannot change your choice. I would not cheat and invest in the shares you recommended.
Alice: We do not cheat either, and we will tell you the recommended shares of the last month.

³ This scenario was taken from the lecture notes of a course taught by Holger Petersen at the University of Stuttgart in Summer Semester 2007.

The mutual distrust leads to a problem here. As a mechanical solution, the current recommendation could be deposited in a sealed envelope at a safe place. Then the choice could be checked afterward. An electronic version to solve this problem is referred to as digital commitment. Alice commits herself to a piece of information $t$ (here just 1 bit). Using a symmetric encryption method, the following steps are performed:

(1) Alice randomly chooses an (initially secret) key $k$.
(2) Bob randomly generates a string $x$ and sends it to Alice.
(3) Alice encrypts $xt$ using the key $k$ by computing $y = c_k(xt)$.
(4) Alice sends $y$ to Bob.

Thus, Alice has committed herself to $xt$. Later on, Bob can verify Alice's choice:

(5) Alice sends the key $k$ to Bob.
(6) Bob decrypts the message $y$ using the key $k$ and verifies that $d_k(y) = xt$ for a bit $t$.

Then, Bob is convinced that Alice had committed herself to $t$. How can Alice cheat? She could try to find a key $k'$ resulting in a desired $t'$. But then also $d_{k'}(y) = xt'$ has to hold for the random string $x$. With a randomly selected key $k$, this corresponds to a known plaintext attack, which a good cryptosystem should be able to withstand. Bob is in an even worse position because, besides the ciphertext $y$, he only knows a prefix of the plaintext (namely, $x$).

A public collision resistant hash function $f$ can also be used to make a commitment. The protocol for this is as follows:

(1) Alice randomly chooses two bit strings $x_1, x_2$.
(2) Alice sends $f(x_1 x_2 t)$ and $x_1$ to Bob.

Now again Alice has committed herself to $t$. Later, Bob can verify Alice's information $t$:

(3) Alice sends $x_1 x_2 t$ to Bob.
(4) Bob compares $f(x_1 x_2 t)$ and $x_1$ with those values he previously obtained from Alice.

Since $f$ is difficult to invert, Bob cannot determine the bit $t$ before he receives $x_1 x_2 t$ (especially because several preimages of $f(x_1 x_2 t)$ could have the prefix $x_1$). Disclosing a part of the random information makes it possible to check whether Alice chose special strings that would make it easier to find a collision. So Alice has to disclose the “true” value of $t$.
The main advantage of this protocol is that Bob does not have to send any messages: for instance, Alice could use a radio station, newspaper ads, a web page or her Twitter account. Example 2.5. Alice and Bob have a telephone conversation. They want to decide who has to attend the cryptography course the next day to take notes. For this purpose, each of them flips a coin. If both have the same result (both have heads or both have tails), then Alice has to attend the course; otherwise Bob has to go. If Alice tells Bob the result of her coin toss, he could choose his result such that Alice has to go; conversely, the same situation occurs if Bob is the first to tell the result of his coin toss. Therefore, they agree on the following protocol. Alice commits to a bit t, then Bob tells Alice his
coin toss result in the form of a bit $t'$. Finally, Alice reveals her bit $t$, and both of them know who will attend the course. ◊
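The hash-based commitment and the coin toss of Example 2.5 can be sketched as follows. This is an illustration under assumptions: SHA-256 plays the role of the public hash function f, the random strings are 16 bytes each, the bit t is stored as one trailing byte, and all function names are hypothetical.

```python
import hashlib
import secrets

BLOCK = 16  # assumed byte length of each random string x1, x2


def commit(t):
    """Alice: return (public part sent to Bob, opening kept secret for now)."""
    x1 = secrets.token_bytes(BLOCK)
    x2 = secrets.token_bytes(BLOCK)
    opening = x1 + x2 + bytes([t])              # the string x1 x2 t
    public = (hashlib.sha256(opening).digest(), x1)
    return public, opening


def open_commitment(public, opening):
    """Bob: return the committed bit, or None if the opening does not match."""
    digest, x1 = public
    ok = (
        len(opening) == 2 * BLOCK + 1
        and opening[:BLOCK] == x1
        and opening[-1] in (0, 1)
        and hashlib.sha256(opening).digest() == digest
    )
    return opening[-1] if ok else None


def who_attends(alice_bit, bob_bit):
    """Example 2.5: equal results send Alice to the course, otherwise Bob."""
    return "Alice" if alice_bit == bob_bit else "Bob"
```

In the telephone protocol, Alice first sends `public`, Bob then announces his bit, and only afterwards does Alice send `opening`, which Bob checks with `open_commitment`.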
2.15 Shamir's attack on the Merkle–Hellman cryptosystem

The Merkle–Hellman cryptosystem is based on the knapsack problem, which is one of the classical NP-complete problems [44]. The cryptosystem was developed by Merkle and Hellman and published in 1978 [75]. Their hope was to have designed a secure cryptosystem that is based on a provably hard problem (unless P = NP). This hope did not last long: soon after its presentation, it was destroyed by Shamir. In the following, we describe the Merkle–Hellman scheme in its original form and, after that, we explain Shamir's method for breaking it. The presentation essentially follows [95], but in a somewhat less technical (and less general) way.

The input to the knapsack problem is a sequence $(s_1, \ldots, s_n, c)$ of natural numbers in binary, and the question is whether there is a subset $I \subseteq \{1, \ldots, n\}$ such that

$$\sum_{i \in I} s_i = c$$

We understand $c$ to be the capacity of a knapsack and $s_i$ the size of the $i$th piece. Then, we want to know whether it is possible to fill the knapsack completely. The algorithmic difficulty arises from the binary encoding of the numbers, because when the numbers are given in unary, then the problem is solvable in polynomial time; see Exercise 2.18. On the other hand, the problem becomes very simple in some special cases, even if the inputs are in binary.

We call a sequence of positive real numbers $(s_1, \ldots, s_n)$ superincreasing if $s_j > \sum_{i=1}^{j-1} s_i$ for all $1 \le j \le n$. The sequence $s_i = 2^i$ for $i \in \mathbb{N}$ is superincreasing, whereas the Fibonacci sequence $F_i$ is not. The knapsack problem for superincreasing sequences is solvable in linear time: when considering $s_i$ in the order from $n$ down to 1, one has to include $i$ in $I$ if and only if the remaining capacity suffices. In particular, solutions are unique.
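The linear-time procedure for superincreasing sequences can be sketched in Python as follows; the function names are hypothetical and indices are reported 1-based, as in the text.

```python
def is_superincreasing(s):
    """Check that every element exceeds the sum of all preceding elements."""
    total = 0
    for x in s:
        if x <= total:
            return False
        total += x
    return True


def solve_superincreasing(s, c):
    """Return the unique index set I with sum of s_i over I equal to c, or None."""
    chosen = []
    # Scan from s_n down to s_1; take a piece whenever the remaining capacity suffices.
    for i in range(len(s) - 1, -1, -1):
        if s[i] <= c:
            chosen.append(i + 1)
            c -= s[i]
    return sorted(chosen) if c == 0 else None
```

The greedy choice is forced: if the largest piece that still fits were left out, all smaller pieces together could not fill the remaining capacity.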
The Merkle–Hellman cryptosystem

The cryptographic idea is to hide the property of being superincreasing by a modulo computation. The Merkle–Hellman cryptosystem consists of two parts: one private and one public. Alice secretly chooses a superincreasing sequence $(s_1, \ldots, s_n)$ and a modulus $m$. This number has to satisfy $m > \sum_{i=1}^{n} s_i$. Preferably, $m$ should be prime because Alice needs two more secret numbers $u, w \in \{2, \ldots, m-1\}$ such that $uw \equiv 1 \bmod m$. Concretely, it was suggested to have $n = 100$ and the numbers $s_i$ having $n + i - 1$ bits while $m$ should have $2n$ bits. In order to hide the order of the numbers $s_i$, Alice chooses a permutation $\pi$ on $\{1, \ldots, n\}$. She publishes a sequence $(a_1, \ldots, a_n)$ with values $0 < a_i < m$ and

$$a_{\pi(i)} = s_i u \bmod m$$

Then, $a_i$ is a number with $2n$ bits. Bob encrypts a bit string $(x_1, \ldots, x_n) \in \mathbb{B}^n$ of length $n$ by the sum

$$y = \sum_{i \le n} x_i a_i$$

Bob sends $y$ to Alice on a public channel. It follows that $y < n 2^{2n}$, so $y$ consists of about $2n + \log n$ bits. Alice receives $y$ and computes $c = yw \bmod m$. She knows

$$c \equiv \sum_{i \le n} x_{\pi(i)} a_{\pi(i)} w \equiv \sum_{i \le n} x_{\pi(i)} s_i \bmod m$$

and by the choice of $m$ this yields $c = \sum_{i \le n} x_{\pi(i)} s_i$. Since $(s_1, \ldots, s_n)$ is superincreasing, she can reconstruct the sequence $(x_{\pi(1)}, \ldots, x_{\pi(n)}) \in \mathbb{B}^n$ and therefore she knows all the $x_i$ for $1 \le i \le n$.
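A toy version of the scheme just described might look as follows in Python. It is a sketch under assumptions: the bit lengths suggested in the text are ignored, the pseudo-random generator is passed in by the caller and is not cryptographically secure, indices are 0-based, n should be at least 2, and all function names are hypothetical.

```python
import random


def next_prime(n):
    """Smallest prime larger than n (trial division, fine for toy sizes)."""
    c = max(n + 1, 2)
    while any(c % d == 0 for d in range(2, int(c ** 0.5) + 1)):
        c += 1
    return c


def keygen(n, rng):
    """Return (private key, public sequence a)."""
    s, total = [], 0
    for _ in range(n):                       # superincreasing: s_j > s_1 + ... + s_{j-1}
        x = total + rng.randint(1, 10)
        s.append(x)
        total += x
    m = next_prime(total)                    # prime modulus m > sum of all s_i
    u = rng.randint(2, m - 1)
    w = pow(u, -1, m)                        # u * w = 1 mod m
    perm = list(range(n))
    rng.shuffle(perm)                        # perm[i] plays the role of pi(i)
    a = [0] * n
    for i in range(n):
        a[perm[i]] = s[i] * u % m            # a_{pi(i)} = s_i * u mod m
    return (s, m, w, perm), a


def encrypt(x, a):
    """Bob: y is the sum of x_i * a_i."""
    return sum(xi * ai for xi, ai in zip(x, a))


def decrypt(y, private):
    """Alice: c = y * w mod m, then the greedy superincreasing knapsack."""
    s, m, w, perm = private
    c = y * w % m
    x = [0] * len(s)
    for i in range(len(s) - 1, -1, -1):
        if s[i] <= c:
            x[perm[i]] = 1                   # this bit is x_{pi(i)}
            c -= s[i]
    return x
```

For example, `private, a = keygen(8, random.Random(1))` followed by `decrypt(encrypt(x, a), private)` returns the bit list `x` again.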
Shamir's attack

We will show that the Merkle–Hellman system in the form just described is insecure and can be broken. The crucial observation is that it is not necessary to compute the permutation $\pi$, the sequence $(s_1, \ldots, s_n)$, or the numbers $m$, $u$, or $w$. The attacker aims for a permutation $\sigma$, a superincreasing sequence of positive rational numbers $(r_1, \ldots, r_n, 1)$, and a number $V$ with $0 < V < 1$ such that

$$r_i \equiv a_{\sigma(i)} V \bmod 1$$

Here, $a \equiv b \bmod 1$ means that the difference $a - b$ is an integer. If the aim is achieved, it follows that

$$yV = \sum_{i \le n} x_i a_i V = \sum_{i \le n} x_{\sigma(i)} a_{\sigma(i)} V \equiv \sum_{i \le n} x_{\sigma(i)} r_i \bmod 1$$

Since $1 > \sum_{i=1}^{n} r_i$ and because the sequence of the $r_i$ is superincreasing, an attacker obtains all $x_{\sigma(i)}$ efficiently. We may have $\pi \neq \sigma$, but the interest is in the sequence $(x_1, \ldots, x_n)$ only. Since $\sigma$ and the $x_{\sigma(i)}$ are known, the $x_i$ can be computed. It is not clear how an attacker can achieve this task, but there is a very important aid: the attacker knows that $\sigma$, the sequence $(r_1, \ldots, r_n)$, and the number $V$ exist: consider $\sigma = \pi$, $r_i = s_i/m$, and $V = w/m$. We fix the unknown value $\upsilon = w/m$ and we try to approach $\upsilon$ with our value $V$.

For each index $i \in \{1, \ldots, n\}$, we consider the function $f_i : [0, 1) \to [0, 1)$ with $f_i(V) = a_i V \bmod 1$. As usual, we have $[0, 1) = \{\, x \in \mathbb{R} \mid 0 \le x < 1 \,\}$. The graph of this function is a sawtooth wave. There are exactly $a_i$ zeros $0, 1/a_i, \ldots, (a_i - 1)/a_i$; and, except for the jumps, the slope is always $a_i$. In the following, numbers of the form $q/a_i$ with $0 \le q < a_i$ are called positions.
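The existence claim above (σ = π, r_i = s_i/m, V = w/m) can be checked with exact rational arithmetic. The following Python sketch uses a small hand-picked key, which is an assumption made purely for illustration (s = (2, 3, 7, 14, 30), m = 59, u = 17, w = 7, and an arbitrary permutation stored 0-based); it is not part of the attack itself, which has to find a suitable V without knowing w and m.

```python
from fractions import Fraction

# Hypothetical toy key: 17 * 7 = 119 = 2 * 59 + 1, so u * w = 1 mod m.
s = [2, 3, 7, 14, 30]
m, u, w = 59, 17, 7
perm = [2, 0, 4, 1, 3]                       # perm[i] plays the role of pi(i)

a = [0] * len(s)
for i, si in enumerate(s):
    a[perm[i]] = si * u % m                  # public sequence

V = Fraction(w, m)                           # the value called upsilon in the text
r = [(a[perm[i]] * V) % 1 for i in range(len(s))]   # r_i = a_{pi(i)} * V mod 1


def decrypt_with_rationals(y):
    """Recover the plaintext bits from y using only a, perm, r and V."""
    c = (y * V) % 1                          # congruent to the sum of x_{pi(i)} * r_i
    x = [0] * len(s)
    for i in range(len(s) - 1, -1, -1):      # greedy, since (r_1, ..., r_n) is superincreasing
        if r[i] <= c:
            x[perm[i]] = 1
            c -= r[i]
    return x
```

Here every `r[i]` equals `Fraction(s[i], m)`, and sorting the indices by the value of `a[i] * V % 1` reproduces `perm`, which is the ordering property the attack exploits.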
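[Figure: graph of the sawtooth function $f_i(V)$ for $V \in [0, 1)$, with values in $[0, 1)$ and zeros at the positions $0/a_i, 1/a_i, 2/a_i, \ldots$]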
At the point $\upsilon$, the function $f_{\pi(i)}$ evaluates to

$$f_{\pi(i)}(\upsilon) = a_{\pi(i)} \upsilon \bmod 1 = (s_i u \bmod m)\, w/m \bmod 1 = s_i/m$$

Since $(s_1/m, \ldots, s_n/m, 1)$ is superincreasing, this means that the permutation $\pi$ is determined by the order of the values $f_i(\upsilon)$. This is the crucial observation that makes breaking the Merkle–Hellman method possible. The problem is now reduced to finding a sufficiently good approximation of the value $\upsilon$.

There are fewer than $n^4$ possibilities for the initial sequence $(\pi(1), \ldots, \pi(4))$. Without loss of generality, we can assume to be in a phase of the attack where the values $\pi(1), \ldots, \pi(4)$ are correct. If the initial sequence is not correct, it is not clear whether the aim will be achieved. If the aim is not achieved, then we try the next possibility for $\pi(1), \ldots, \pi(4)$. If the aim is achieved with an incorrect hypothesis for $\pi(1), \ldots, \pi(4)$, this does not matter. Thus, after a maximum of four exchanges in the sequence $(a_1, \ldots, a_n)$, we can assume that we have $\pi(j) = j$ for $1 \le j \le 4$; and, without restriction, $a_1, \ldots, a_4$ correspond to the smallest values $s_1, \ldots, s_4$.

Let $J = \{1, 2, 3, 4\}$. We choose $\lambda \ge 0$ with $a_j > 2^{2n+j-\lambda}$ for all $j \in J$. The attack is easier if $\lambda$ is small. If $a_1, \ldots, a_4$ were uniformly distributed, then we would expect $\lambda \le 8$. From a practical viewpoint, this justifies assuming that $\lambda$ is a small constant. Actually, a weaker assumption suffices in the mathematical analysis. Later, we see that the attack is successful with high probability as long as $\lambda$ is sublinear. Next, we consider for each $j \in J$ a natural number $c_j$ with $0 \le c_j < a_j$ and

$$\upsilon = \frac{c_j}{a_j} + \varepsilon_j \tag{2.1}$$

where $0 \le \varepsilon_j < 1/a_j$. We have $a_j \upsilon = c_j + a_j \varepsilon_j$ and thus $a_j \upsilon \bmod 1 = s_j/m = a_j \varepsilon_j < 1$ for all $j \in J$ (because $\pi(j) = j$ for $j \in J$). From the estimations $m > 2^{2n-1}$, $a_j > 2^{2n+j-\lambda}$ and $s_j < 2^{n+j-1}$ (from the bit length of $s_j$), we conclude $2^{2n-1+2n+j-\lambda} \varepsilon_j < 2^{n+j-1}$ and thus

$$\varepsilon_j < 2^{-3n+\lambda} \tag{2.2}$$

Let $\ell \in J$ be such that $\frac{c_j}{a_j} \le \frac{c_\ell}{a_\ell}$ for all $j \in J$. In the following, we assume that $c_\ell \neq 0$. We discuss this again later and provide a justification for the assumption. For the time being, we can consider the case $c_\ell \neq 0$ as the case to treat first. From (2.1) and (2.2), we obtain

$$0 \le \frac{c_\ell}{a_\ell} - \frac{c_j}{a_j} = \varepsilon_j - \varepsilon_\ell < 2^{-3n+\lambda}$$

Because of $c_\ell \neq 0$, the following system of Diophantine inequalities in the four unknowns $c_j$ with $j \in J$ has an integer solution:

$$0 \le \frac{c_\ell}{a_\ell} - \frac{c_j}{a_j} < 2^{-3n+\lambda}, \qquad 1 \le c_\ell < a_\ell \quad \text{and} \quad 0 \le c_j < a_j \ \text{for } j \in J \setminus \{\ell\} \tag{2.3}$$
By $\upsilon$, we know that at least one solution of the system (2.3) is guaranteed. For the following, we only need the value $c_\ell$ and the index $\ell \in J$. However, we need to compute all possible solutions for each $\ell \in J$ and thus, a priori, we obtain a list $c_{\ell,1}, c_{\ell,2}, c_{\ell,3}, \ldots$. The number of $\ell \in J$ is only 4, but the respective lists could become too long and thus cause the attack to fail. But the following heuristic shows that one does not have to be afraid of the lists becoming too long. On the contrary, we can assume that we will find exactly one $\ell$ and only one $c_\ell \neq 0$, and even this only if we investigate the correct set $\{\pi(1), \ldots, \pi(4)\}$. In all other cases, it is likely that no solution will be found (which then also shows that $\{\pi(1), \ldots, \pi(4)\}$ was not chosen correctly).

To explain the heuristic, we assume for the moment that the numbers $a_j$ for $j \in J$ are independently and uniformly distributed random variables. If we fix a position $0 < p/a_\ell < 1$, then the probability for the interval $[\,p/a_\ell - 2^{-3n+\lambda},\ p/a_\ell\,]$ to have a nonempty intersection with each set of positions $\mathbb{N}/a_j$ for $j \in J \setminus \{\ell\}$ is at most $(2^{-n+\lambda})^3 = 2^{-3n+3\lambda}$. Summing up over all positions, we obtain the result that there exists a solution with a probability which is at most $2^{-n+3\lambda}$. If $\lambda$ is sublinear, then this probability converges to zero. (For $\lambda \in O(1)$, the probability approaches zero exponentially fast.) A problem with the estimation is that the values $a_j$ are not independent and, for the right choices of $\{\pi(1), \ldots, \pi(4)\}$ and $\ell$, there exists some solution. However, in many cases there is only one solution (or very few). Moreover, from the user perspective, we must deal with the case where the solution $(c_j)_{j \in J} \in \mathbb{N}^4$ is unique. This solution can be found by standard methods in polynomial time as there is only a constant number of inequalities. Should there be, contrary to expectations, several solutions, an attacker might find all solutions by checking them one by one.
For simplicity, we may assume that $\{\pi(1), \ldots, \pi(4)\}$ is correct and for this choice exactly one solution $c_\ell$ of the system (2.3) with $\ell \in J$ exists and that $c_\ell$ and $\ell \in J$ are known. Defining $\alpha = c_\ell/a_\ell$ and $\varepsilon = 2^{-3n+\lambda}$, from Equation (2.1), we obtain the property $\upsilon \in [\alpha, \alpha + \varepsilon]$. For each index $1 \le i \le n$, at most one position $p/a_i$ can be within the interval $[\alpha, \alpha + \varepsilon]$ because the distance $\frac{1}{a_i}$ between $p/a_i$ and $(p + 1)/a_i$ is too large for $\varepsilon$. The graph of the function $f_i$ with $f_i(V) = a_i V \bmod 1$ within the interval $[\alpha, \alpha + \varepsilon]$ therefore consists of at most two segments, which can easily be computed. There are at most
$O(n^2)$ intersections, and we partition the interval $[\alpha, \alpha + \varepsilon]$ into at most $O(n^2)$ subintervals such that no intersections and no positions are in the interior of the subintervals. Since $(0, s_1/m, \ldots, s_n/m, 1)$ is superincreasing, $\upsilon$ has to be in the interior of such a subinterval. This is a good occasion to reintegrate the previously excluded case $c_\ell = 0$, which leads to the special case $\alpha = 0$. Within the interval $[0, \varepsilon]$, there are no intersections at all, and we only get one additional interval $(0, \varepsilon)$ with the possibility $\upsilon \in (0, \varepsilon)$. Now, let $(\beta, \beta + \delta)$ be the interior of a subinterval (where now we allow $\beta = 0$ and $\delta = \varepsilon$). The graph of each function $f_i$ yields a unique segment within $(\beta, \beta + \delta)$. Thus, the segments are linearly ordered from bottom to top and this defines a permutation $\sigma$.

[Figure: the segments of $f_{\sigma(1)}, f_{\sigma(2)}, f_{\sigma(3)}, f_{\sigma(4)}$ over the interval from $\beta$ to $\beta + \delta$, plotted against $V$.]
For every $V \in (\beta, \beta + \delta)$, the corresponding permutation $\sigma$ has the property

$$0 < a_{\sigma(1)} V \bmod 1 < \cdots < a_{\sigma(n)} V \bmod 1 < 1$$

Only very few of these sequences $(a_{\sigma(1)} V \bmod 1, \ldots, a_{\sigma(n)} V \bmod 1, 1)$ will be superincreasing. However, at the latest during the attack phase, in which the guess of $\{\pi(1), \ldots, \pi(4)\}$ and $c_\ell$ is correct, the existence of such a sequence in one of the subintervals is guaranteed. In this phase, it is the interval $(\beta, \beta + \delta)$ with $\upsilon \in (\beta, \beta + \delta)$. Here, $\sigma$ coincides with $\pi$. Of course, we do not know at all whether we are in the right phase or at the right $\beta$. But we know that this is possible and we just want to check whether $(a_{\sigma(1)} V \bmod 1, \ldots, a_{\sigma(n)} V \bmod 1, 1)$ is superincreasing. First, for $1 \le i \le n$, we define natural numbers $c_i$ by $\lfloor a_i \beta \rfloor$. Then, for $V \in (\beta, \beta + \delta)$, we have $a_i V \bmod 1 = a_i V - c_i$. For each interval $(\beta, \beta + \delta)$ with associated permutation $\sigma$ we obtain the following $n + 3$ linear inequalities in the indeterminate $V$:

$$\sum_{k=1}^{i-1} \left(a_{\sigma(k)} V - c_{\sigma(k)}\right) < a_{\sigma(i)} V - c_{\sigma(i)} \ \ (1 \le i \le n), \qquad \sum_{k=1}^{n} \left(a_k V - c_k\right) < 1, \qquad \beta < V < \beta + \delta$$

[…]

… $b > 1$, and $f, g : \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$ with $f(1) \le g(1)$. If $f(n) \le a \cdot f(n/b) + g(n)$ for all $n \ge b$, then $f(b^k) \le \sum_{i=0}^{k} a^i \cdot g(b^{k-i})$ for all $k \in \mathbb{N}$.

Proof: For $k = 0$, the claim is true. Now, let $k > 0$. By definition, we have $f(b^k) \le a \cdot f(b^{k-1}) + g(b^k)$. The inductive hypothesis for $k - 1$ yields

$$f(b^k) \le g(b^k) + a \sum_{i=0}^{k-1} a^i \cdot g(b^{k-1-i}) = \sum_{i=0}^{k} a^i \cdot g(b^{k-i})$$
Distinguishing different cases depending on the specific values of the parameters $a$ and $b$ and the specific function $g$, we obtain the following theorem. Here, $\log n = \max(1, \log_2 n)$.

Theorem 3.2 (Master theorem). Let $a, b, c \in \mathbb{R}_{\ge 0}$ with $b > 1$ and let $f : \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$ be nondecreasing with $f(n) \le a \cdot f(n/b) + O(n^c)$. Then

$$f(n) \in \begin{cases} O(n^c) & \text{if } a < b^c \\ O(n^c \log n) & \text{if } a = b^c \\ O(n^{\log_b a}) & \text{if } a > b^c \end{cases}$$
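As a quick orientation, the three cases of the theorem can be read off for a few standard recurrences. The algorithm names attached to them here are the usual textbook associations and are not taken from this section.

```latex
\begin{align*}
f(n) &\le 2 f(n/2) + O(n^2): & a = 2 < 4 = b^c &\ \Longrightarrow\ f(n) \in O(n^2) \\
f(n) &\le 2 f(n/2) + O(n)   \ \text{(merge sort)}: & a = 2 = b^c &\ \Longrightarrow\ f(n) \in O(n \log n) \\
f(n) &\le f(n/2) + O(1)     \ \text{(binary search)}: & a = 1 = b^c &\ \Longrightarrow\ f(n) \in O(\log n) \\
f(n) &\le 3 f(n/2) + O(n)   \ \text{(Karatsuba multiplication)}: & a = 3 > 2 = b^c &\ \Longrightarrow\ f(n) \in O(n^{\log_2 3})
\end{align*}
```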
Proof: Suppose that $f(1) \le d$ and $f(n) \le a \cdot f(n/b) + d n^c$ for some $d > 0$. The function $h(n) = f(n)/d$ satisfies $h(1) \le 1$ and $h(n) \le a \cdot h(n/b) + n^c$. If $h$ satisfies the claim of the theorem, so does $f$. It therefore suffices to deal with the case $f(1) \le 1$ and $f(n) \le a \cdot f(n/b) + n^c$. Furthermore, it is enough to consider $n = b^k$ with $k \in \mathbb{N}$: the function $f$ is nondecreasing and since $p(bn) \in O(p(n))$ for the functions $p$ on the right-hand side, we can bound the value on $n$ by the value on the next $b$-power. With $g(n) = n^c$, Lemma 3.1 yields

$$f(n) = f(b^k) \le \sum_{i=0}^{k} a^i \cdot b^{c(k-i)} = n^c \sum_{i=0}^{k} \left(\frac{a}{b^c}\right)^i$$

If $a < b^c$, the limit of the geometric series yields a constant factor

$$f(n) \le n^c \cdot \sum_{i=0}^{\infty} \left(\frac{a}{b^c}\right)^i = n^c \cdot \frac{1}{1 - \frac{a}{b^c}} \in O(n^c)$$

For $a = b^c$, we have $f(n) \le n^c \cdot (k + 1) \in O(n^c \log n)$. The remaining case is $a > b^c$:

$$f(n) \le n^c \cdot \sum_{i=0}^{k} \left(\frac{a}{b^c}\right)^i = n^c \cdot \frac{\left(\frac{a}{b^c}\right)^{k+1} - 1}{\frac{a}{b^c} - 1} \in O\!\left(n^c \left(\frac{a}{b^c}\right)^{\log_b(n)}\right) = O(n^{\log_b a})$$
Thus, the proof is complete for all cases.

The following example gives another typical approach for solving recurrences. The idea is to verify an educated guess.

Example 3.3. Let $f : \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$ be a nondecreasing function such that $f(n) \le 2 f(\frac{n}{2} + 4) + dn$ for some $d \ge 1$. If we had $f(n) \le 2 f(\frac{n}{2}) + dn$, then $f(n) \in O(n \log n)$ by the master theorem. The “+4” should not change the asymptotics, and we therefore conjecture that for some sufficiently large constant $k \ge 4d$ we have $f(n) \le kn \log_2 n$ for all $n \ge 1$. Here, “sufficiently large” means that the conjecture holds for all $n$ between 1 and $256d$, that is, the base case is true. Let now $n > 256d$ and suppose the conjecture holds for values smaller than $n$. Then

$$\begin{aligned}
f(n) &\le 2 f\!\left(\frac{n}{2} + 4\right) + dn \\
&\le 2k \left(\frac{n}{2} + 4\right) \log_2\!\left(\frac{n}{2} + 4\right) + dn \\
&\le (kn + 8k) \log_2\!\left(\frac{n}{\sqrt{2}}\right) + dn \\
&= (kn + 8k)\left(\log_2 n - 1/2\right) + dn \\
&\le kn \log_2 n \underbrace{{} - \frac{kn}{2} + 8k \log_2 n + dn}_{[\ldots]}
\end{aligned}$$

[…]

    […] while n > 0 do
      if n is odd then e := a ⋅ e fi;
      a := a²;
      n := ⌊n/2⌋
    od;
    return e
    end
If $n$ is odd, then $a^n = a \cdot a^{n-1}$, and $n - 1$ is even. Therefore, $n$ can be halved in each iteration. If $\hat{a}$ and $\hat{n}$ are the initial values of $a$ and $n$, respectively, then the loop invariant is $e \cdot a^n = \hat{a}^{\hat{n}}$. In each execution of the while loop at most two monoid operations are performed, so this algorithm requires at most $2\lfloor \log_2 n \rfloor + 2$ operations. The maximum number of operations is reached on numbers of the form $n = 2^k - 1$.

In the case of $M = \mathbb{Z}/m\mathbb{Z}$, with modular multiplication as the operation, the above algorithm is also called fast modular exponentiation. This is meant to emphasize that you do not compute the number $a^n \in \mathbb{Z}$ first and then reduce modulo $m$, but that all intermediate results are computed modulo $m$. Therefore, these numbers are limited by $m$, whereas $a^n$ can already become very large for rather small numbers $a$ and $n$.

We note that for cryptographic applications, in order to make sure that you cannot deduce any information about the input by measuring the execution time or the energy consumption, it is often desirable for the running time not to depend on $n$ but only on the number of digits of $n$. For this purpose, the first line of the while loop could be replaced by

    if n is odd then e := a ⋅ e else f := a ⋅ f fi;
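The loop discussed above can be written in Python for an arbitrary monoid. In this sketch the monoid is passed as a multiplication function `mul` together with its neutral element `one`; these parameter names, and the wrapper `mod_exp`, are illustrative assumptions.

```python
def fast_exp(a, n, mul, one):
    """Compute a^n in a monoid; the invariant is e * a^n = (initial a)^(initial n)."""
    e = one
    while n > 0:
        if n % 2 == 1:
            e = mul(a, e)
        a = mul(a, a)
        n //= 2
    return e


def mod_exp(a, n, m):
    """Fast modular exponentiation: every intermediate result is reduced modulo m."""
    return fast_exp(a % m, n, lambda x, y: x * y % m, 1 % m)
```

Counting every call of `mul`, including the first multiplication by the neutral element and the final squaring, gives exactly 2⌊log₂ n⌋ + 2 operations for n = 15, namely 8.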
Remark 3.4. The above approach is not always optimal. Consider the computation of $a^{15}$. Using fast exponentiation, we need six operations: squaring three times yields the elements $a^2, a^4, a^8$; and with three further multiplications, we obtain $a^{15} = a \cdot a^2 \cdot a^4 \cdot a^8$. If, on the other hand, we compute the five elements $a^2$, $a^3 = a \cdot a^2$, $a^6 = (a^3)^2$, $a^{12} = (a^6)^2$ and $a^{15} = a^{12} \cdot a^3$ with one operation each, then these five multiplications lead to the desired result. The general approach of this kind is usually referred to as addition chains. An addition chain for $n \in \mathbb{N} \setminus \{0\}$ is a sequence $(a_0, \ldots, a_k)$ of natural numbers with the properties $a_0 = 1$, $a_k = n$, and each number $a_i$ (for $i \ge 1$) is the sum of two (not necessarily different) preceding numbers. Here, the length $k$ is the number of monoid operations required to compute $a^n$ by this addition chain. The addition chain for 15 that results from fast exponentiation is $(1, 2, 3, 4, 7, 8, 15)$. The alternate computation with five multiplications uses the addition chain $(1, 2, 3, 6, 12, 15)$. The main algorithmic disadvantage (apart from having to construct optimal addition chains, which, however, does not require monoid operations) is that, in general, all intermediate results need to be stored. At best, nearly half of the required monoid operations can be saved using addition chains; see Section 4.6.3 in [55]. ◊
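The two chains from Remark 3.4 can be checked mechanically. The following Python sketch, with hypothetical function names, validates an addition chain and evaluates a power along it while counting monoid operations; it stores all intermediate powers, which is exactly the disadvantage mentioned in the remark.

```python
def is_addition_chain(chain, n):
    """a_0 = 1, a_k = n, and every later entry is a sum of two earlier entries."""
    if not chain or chain[0] != 1 or chain[-1] != n:
        return False
    for i in range(1, len(chain)):
        earlier = chain[:i]
        if not any(chain[i] - x in earlier for x in earlier):
            return False
    return True


def power_by_chain(a, chain, mul):
    """Return (a^n, number of operations) for a valid addition chain ending in n."""
    powers = {1: a}                       # exponent -> stored intermediate result
    operations = 0
    for i in range(1, len(chain)):
        target = chain[i]
        x = next(x for x in chain[:i] if target - x in powers)
        powers[target] = mul(powers[x], powers[target - x])
        operations += 1
    return powers[chain[-1]], operations
```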
3.3 Binary GCD

We have seen, in Section 1.5.1, how to compute the greatest common divisor with a logarithmic number of recursive calls. The algorithm presented there can be applied to arbitrary rings which admit a Euclidean division, so-called Euclidean domains. This includes polynomial rings over fields; see Theorem 1.44. However, in many settings, the operation “$\ell \bmod k$” is considered to be expensive in the sense that it uses significantly more time and energy than, for instance, addition and subtraction. The binary gcd (the procedure bgcd below) efficiently computes the greatest common divisor of two natural numbers such that all divisions and multiplications are by the number 2. Assuming that we represent natural numbers in binary, these are nothing but bitwise shifts.

    /∗ Assume that k ⩾ 0, ℓ ⩾ 0 ∗/
    function bgcd(k, ℓ)
    begin
      if k = 0 or ℓ = 0 then return k + ℓ
      elsif k and ℓ are even then return 2 ⋅ bgcd(k/2, ℓ/2)
      elsif k is even and ℓ is odd then return bgcd(k/2, ℓ)
      elsif (k is odd and ℓ is even) or k < ℓ then return bgcd(ℓ, k)
      else return bgcd(k − ℓ, ℓ)
      fi
    end

The number of recursive calls is in $O(\log k + \log \ell)$ since, after at most two calls, at least one of the arguments is divided by 2. Note that 2 is a common divisor of $k$ and $\ell$ if and only if both of them are even. Since $\gcd(k, \ell) = \gcd(k - \ell, \ell)$, the correctness of the algorithm follows.

It is possible to extend the binary gcd such that, on input $k$ and $\ell$, it additionally computes numbers $a$ and $b$ with $ak + b\ell = \gcd(k, \ell)$. Moreover, this extension also does not use arbitrary division and multiplication; see Exercise 3.3.
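A direct Python transcription of the procedure bgcd might look as follows; it uses only shifts, parity tests, comparison and subtraction. The helper `agrees_with_math_gcd` is an illustrative addition that compares the result with Python's built-in `math.gcd` on small inputs.

```python
import math


def bgcd(k, l):
    """Binary gcd for integers k >= 0 and l >= 0, following the case distinction of the text."""
    if k == 0 or l == 0:
        return k + l
    if k & 1 == 0 and l & 1 == 0:          # both even: 2 is a common divisor
        return bgcd(k >> 1, l >> 1) << 1
    if k & 1 == 0:                         # k even, l odd
        return bgcd(k >> 1, l)
    if l & 1 == 0 or k < l:                # k odd and l even, or both odd with k < l
        return bgcd(l, k)
    return bgcd(k - l, l)                  # both odd, k >= l: the difference is even


def agrees_with_math_gcd(limit):
    """Check bgcd against math.gcd for all 0 <= k, l < limit."""
    return all(bgcd(k, l) == math.gcd(k, l)
               for k in range(limit) for l in range(limit))
```

Since the recursion depth is logarithmic in the arguments, the recursive form is unproblematic in Python even for numbers with several hundred bits.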
3.4 Probabilistic recognition of primes

For many applications, such as the RSA cryptosystem, very large prime numbers are needed. This leads to the question of how to rapidly check a given (large) number $n$ for primality. In the next chapter, we will present a deterministic polynomial time algorithm for this problem. However, its running time cannot compete with the probabilistic methods presented here. The two main procedures in this section, the Solovay–Strassen and the Miller–Rabin primality test, work in rounds. In each round, a number $a \in \{1, \ldots, n-1\}$ is selected at random and, depending on $a$, the output is either that $n$ is composite or that $n$ is probably prime. If the output is that $n$ is composite, then $n$ is actually composite. But whenever the output says that $n$ is probably prime, then $n$ could still be composite. However, both methods have the property that, for a composite number $n$, with probability at least 1/2, the chosen $a$ reveals $n$ as composite. Therefore, after $k$ independent rounds, a composite number is discovered with a probability of at least $1 - 2^{-k}$. If after 100 rounds each time the output claimed that $n$ may be prime, then the error probability is less than

0.00000 00000 00000 00000 00000 00000 78887
3.4.1 Fermat primality test and Carmichael numbers

On input n, the Fermat test consists of two steps. First, it randomly chooses a ∈ {1, …, n − 1}. Then, it computes $b = a^{n-1} \bmod n$. If n is prime, then Fermat’s little theorem yields b = 1. Thus, if b ≠ 1, then n is composite. Otherwise, if b = 1, then n might be prime, but we do not know for sure. It could therefore be helpful to repeat the procedure in order to possibly reveal n as composite.

Carmichael numbers are named after Robert Daniel Carmichael (1879–1967). A Carmichael number is a composite number n satisfying $a^{n-1} \equiv 1 \bmod n$ for all $a \in (\mathbb{Z}/n\mathbb{Z})^*$. This means that the Fermat test fails for such numbers because there is no real chance to find a suitable a which proves that n is composite (of course, there is always the tiny chance to choose $a \notin (\mathbb{Z}/n\mathbb{Z})^*$ and find a divisor of n). There are infinitely many Carmichael numbers [2]. The first three Carmichael numbers are 561, 1105, and 1729. The Carmichael numbers are the Knödel numbers $K_1$. They are named after Walter Knödel (born 1926) who was the founding Dean of the Department of Computer Science at the University of Stuttgart. A number n belongs to $K_r$ if $a^{n-r} \equiv 1 \bmod n$ for all $a \in (\mathbb{Z}/n\mathbb{Z})^*$.

The following theorem collects some basic properties of Carmichael numbers. A natural number n is square-free if $m^2 \nmid n$ for all integers m ⩾ 2, that is, no prime power of exponent greater than 1 appears in the prime factorization of n.

Theorem 3.5. Let n be a Carmichael number. Then the following properties hold:
(a) n is odd,
(b) n is square-free, and
(c) n has at least three different prime factors.

Proof: (a) Note that n ⩾ 4. If n is even, then $-1 = (-1)^{n-1} \equiv 1 \bmod n$, a contradiction.
(b) Suppose that $n = p^d u$ for some prime number p such that p ∤ u and d ⩾ 2. Then we have
$$(p^{d-1} + 1)^p \equiv \sum_k \binom{p}{k} p^{(d-1)k} \equiv 1 + p \cdot p^{d-1} + p^{2d-2}\mathbb{Z} \equiv 1 \bmod p^d$$
This yields $(p^{d-1} + 1)^n \equiv 1 \bmod p^d$ and $(p^{d-1} + 1)^{n-1} \not\equiv 1 \bmod p^d$. Let $b \equiv p^{d-1} + 1 \bmod p^d$ and $b \equiv 1 \bmod u$. Then, b is an element in $(\mathbb{Z}/n\mathbb{Z})^*$ satisfying $b^{n-1} \not\equiv 1 \bmod n$, a contradiction.
(c) Assume that n = pq for two prime numbers p ≠ q. Let g ∈ ℤ/nℤ be a generator modulo p and congruent 1 modulo q. The order of g is p − 1. On the other hand, $g^{n-1} = 1$ since n is a Carmichael number. This yields p − 1 | n − 1 = p(q − 1) + p − 1 and thus p − 1 | q − 1. Symmetrically, we have q − 1 | p − 1 and thus p = q, a contradiction. Therefore, n has at least three different prime factors.
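The definition of Carmichael numbers can be checked by brute force for small n. The following sketch is only meant as an illustration; the function names `is_prime_trial`, `fermat_round` and `is_carmichael` are chosen here and do not appear in the text, and the exhaustive loop over all bases is only sensible for small inputs.

```python
from math import gcd, isqrt

def is_prime_trial(n):
    """Primality by trial division; adequate for the small n used here."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def fermat_round(n, a):
    """One round of the Fermat test: True means 'n might be prime'."""
    return pow(a, n - 1, n) == 1

def is_carmichael(n):
    """Composite n such that every unit a modulo n passes the Fermat test."""
    if n < 4 or is_prime_trial(n):
        return False
    return all(fermat_round(n, a) for a in range(1, n) if gcd(a, n) == 1)

# Carmichael numbers below 2000, found by exhaustive search
small_carmichael = [n for n in range(2, 2000) if is_carmichael(n)]
```

The search reproduces the three numbers 561, 1105 and 1729 named above. Note that a base such as a = 3 still exposes 561 as composite, but only because gcd(3, 561) ≠ 1.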
3.4.2 Solovay–Strassen primality test

Robert Martin Solovay (born 1938) and Volker Strassen (born 1936) combined Euler’s criterion and the law of quadratic reciprocity to obtain a probabilistic primality test, the Solovay–Strassen test. The basic idea is the following. To assure the primality of a given odd number n ⩾ 3, in each round you verify that $a^{(n-1)/2} \bmod n = \left(\frac{a}{n}\right)$ for a randomly chosen $a \in (\mathbb{Z}/n\mathbb{Z})^*$. More precisely, one round of the Solovay–Strassen test works as follows:
(1) Select a ∈ {1, …, n − 1} at random.
(2) Compute $b = a^{(n-1)/2} \bmod n$. If b ∉ {1, −1}, halt and output “n is composite.”
(3) Compute $c = \left(\frac{a}{n}\right)$. If b = c, halt and output “n is probably prime.” Otherwise, output “n is composite.”

For the computation of b, we use fast exponentiation and c is computed using the algorithm for the Jacobi symbol. Note that b ∈ {1, −1} implies $a \in (\mathbb{Z}/n\mathbb{Z})^*$. If n is prime, then the output is “n is probably prime” by Euler’s criterion and Zolotarev’s lemma. We now show that, in every round, composite numbers are detected with probability greater than 1/2. If n survived k rounds of this procedure, then the probability that n is prime is at least $1 - 2^{-k}$. In contrast to the Fermat test with its Carmichael numbers, this test has no exceptional composite numbers.

Theorem 3.6. Let $E(n) = \{\, a \in (\mathbb{Z}/n\mathbb{Z})^* \mid a^{(n-1)/2} \equiv \left(\frac{a}{n}\right) \bmod n \,\}$ for an odd number n ⩾ 3. Then, E(n) is a subgroup of $(\mathbb{Z}/n\mathbb{Z})^*$ and we have
$$E(n) = (\mathbb{Z}/n\mathbb{Z})^* \iff n \text{ is prime}$$

Proof: E(n) contains the element 1; so it is not empty and, by Theorem 1.70 (d), E(n) is a group. The direction from right to left follows from Euler’s criterion (Theorem 1.68) and Zolotarev’s lemma (Theorem 1.69). For the remaining direction, suppose that $E(n) = (\mathbb{Z}/n\mathbb{Z})^*$ for a composite number n. In particular, n is a Carmichael number. From Theorem 3.5, n is square-free. We can write n = pu for a prime number p ⩾ 3 with p ∤ u. Let a be a non-square modulo p that is congruent 1 modulo u. In particular, $a \in (\mathbb{Z}/n\mathbb{Z})^*$. Then, we have
$$\left(\frac{a}{n}\right) = \left(\frac{a}{p}\right)\left(\frac{a}{u}\right) = (-1) \cdot 1 = -1$$
From $E(n) = (\mathbb{Z}/n\mathbb{Z})^*$, we obtain $a^{(n-1)/2} \equiv -1 \bmod n$ and therefore $a^{(n-1)/2} \equiv -1 \bmod u$. This is a contradiction to a ≡ 1 mod u.
If n is an odd and composite number, then the index of the subgroup E(n) in (ℤ/nℤ)∗ is at least 2. Thus, more than half of the elements of {1, . . . , n − 1} are not in E(n). All such elements lead to a round of the Solovay–Strassen test in which n is detected as composite.
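One round of the Solovay–Strassen test can be sketched in Python as follows. The Jacobi symbol is computed with the standard reciprocity-based loop, which is assumed here to play the role of "the algorithm for the Jacobi symbol" mentioned above; the function names are illustrative only.

```python
import random

def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n >= 1, computed via quadratic reciprocity."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):       # (2/n) = -1 iff n = 3 or 5 mod 8
                result = -result
        a, n = n, a                   # reciprocity step
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0   # 0 signals gcd(a, n) != 1

def solovay_strassen_round(n, a):
    """One round for odd n >= 3 and base a; True means 'n is probably prime'."""
    b = pow(a, (n - 1) // 2, n)       # step (2)
    if b not in (1, n - 1):
        return False                  # n is composite
    c = jacobi(a, n) % n              # step (3); maps -1 to n - 1
    return b == c

def solovay_strassen(n, rounds=20, rng=random):
    """Repeat the round with independent random bases."""
    return all(solovay_strassen_round(n, rng.randrange(1, n)) for _ in range(rounds))
```

For the Carmichael number 561, counting the bases a ∈ {1, …, 560} that pass a round gives a value below 280, in line with the index argument above, whereas every base passes for a prime such as 101.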
3.4.3 Miller–Rabin primality test

The initial situation is as follows: let n ⩾ 3 be an odd number. After a certain number of trial divisions, we suspect that n could be prime. The Miller–Rabin test is named after Gary Lee Miller and Michael Oser Rabin. In every round, it either refutes the primality or it yields the result that n is prime with a probability of at least 3/4. The Miller–Rabin test refines the Fermat test and it can also handle Carmichael numbers. It can even be efficiently performed for large numbers and, by repeating it independently, it can be made sufficiently reliable for all practical purposes, such as cryptographic applications. Given an odd number n ⩾ 3, the Miller–Rabin test works as follows:
(1) Write $n - 1 = 2^\ell u$ for an odd number u.
(2) Select a ∈ {1, …, n − 1} at random and let $b = a^u \bmod n$.
(3) If b = 1, halt and output “n is probably prime.”
(4) For b ≠ 1, construct the following sequence by successive squaring in ℤ/nℤ:
$$(b, b^2, b^{2^2}, b^{2^3}, \ldots, b^{2^{\ell-1}})$$
(5) If −1 occurs in this sequence, halt and output “n is probably prime”; otherwise output “n is composite.”

First, let n be prime. Then by Fermat’s little theorem, we have $a^{n-1} \equiv 1 \bmod n$ for all a ∈ {1, …, n − 1}. If b ≡ 1 mod n, the output is “n is probably prime.” So let b ≢ 1 mod n. Since $b^{2^\ell} \equiv a^{n-1} \equiv 1 \bmod n$, the sequence $(b, b^2, b^{2^2}, b^{2^3}, \ldots, b^{2^{\ell-1}})$ contains an element c whose square in ℤ/nℤ is equal to 1, and c itself is not 1. But ℤ/nℤ is a field in which the polynomial $X^2 - 1$ only has the two roots 1 and −1. Therefore, −1 occurs in the list. This shows that in any case the output is “n is probably prime” whenever n is actually prime.

It remains to consider the case where n is composite. In Theorem 3.11, we show that then in at most 1/4 of all cases the output is “n is probably prime.”

Proposition 3.7. Let n = rst ∈ ℕ for pairwise coprime numbers r, s, t ⩾ 1 such that r, s ⩾ 3 are odd. Let u be odd, ℓ ⩾ 1 and
$$G = \{\, a \in (\mathbb{Z}/n\mathbb{Z})^* \mid a^{2^\ell u} \equiv 1 \bmod rs \,\}$$
$$H = \{\, a \in (\mathbb{Z}/n\mathbb{Z})^* \mid a^k \equiv \pm 1 \bmod rs \,\}$$
for $k = \min\{\, 2^{\ell'-1} u \mid \forall a \in G : a^{2^{\ell'} u} \equiv 1 \bmod rs \,\}$. Then, H ≠ G.
Proof: Since ℓ ⩾ 1, we have −1 ∈ G and thus k ∈ ℕ. By construction of k there exists a ∈ G with $a^{2k} \equiv 1 \bmod rs$ and $a^k \not\equiv 1 \bmod rs$. Without loss of generality, let $a^k \not\equiv 1 \bmod r$. From the Chinese remainder theorem, there exists b with b ≡ a mod r and b ≡ 1 mod st. We have $b^{2k} \equiv 1 \bmod rs$ and hence b ∈ G. On the other hand, b ∉ H because $b^k \not\equiv \pm 1 \bmod rs$. This shows H ≠ G.

Using Proposition 3.7, one can easily prove an error probability of at most 1/2 for the Miller–Rabin test: for any odd composite number n, we can consider the groups
$$G = \{\, a \in (\mathbb{Z}/n\mathbb{Z})^* \mid a^{2^\ell u} \equiv 1 \bmod n \,\}$$
$$H = \{\, a \in (\mathbb{Z}/n\mathbb{Z})^* \mid a^k \equiv \pm 1 \bmod n \,\}$$
for $k = \min\{\, 2^{\ell'-1} u \mid \forall a \in G : a^{2^{\ell'} u} \equiv 1 \bmod n \,\}$. Remember that $n - 1 = 2^\ell u$ for an odd number u. All elements $a \in (\mathbb{Z}/n\mathbb{Z})^* \setminus H$ lead to the output “n is composite.” If $G \neq (\mathbb{Z}/n\mathbb{Z})^*$, then we also have $H \neq (\mathbb{Z}/n\mathbb{Z})^*$. Otherwise, n is a Carmichael number and by Theorem 3.5, we can write n = rs for odd coprime numbers r, s ⩾ 3. Proposition 3.7 yields H ≠ G. Therefore, in any case, the index of H in $(\mathbb{Z}/n\mathbb{Z})^*$ is at least 2. This gives an error probability of at most 1/2. Proving the error probability 1/4 is slightly more involved.

Lemma 3.8. Let $n = p^d \neq 9$ for some odd prime p. At most 1/5 of all elements $a \in (\mathbb{Z}/n\mathbb{Z})^*$ yield the output “n is probably prime” in the Miller–Rabin test.
Proof: The group $(\mathbb{Z}/n\mathbb{Z})^*$ is cyclic of order $(p-1)p^{d-1}$. Let g be a generator of $(\mathbb{Z}/n\mathbb{Z})^*$. We have $\gcd((p-1)p^{d-1}, n-1) = p-1$. Therefore, we have $(g^k)^{n-1} \equiv 1 \bmod n$ if and only if $p^{d-1} \mid k$. Hence, exactly p − 1 elements $a \in (\mathbb{Z}/n\mathbb{Z})^*$ satisfy $a^{n-1} \equiv 1 \bmod n$. If $a^{n-1} \not\equiv 1 \bmod n$, then
$$1, -1 \notin \{b, b^2, \ldots, b^{2^{\ell-1}}\}$$
and the output is “n is composite.” Unless p = 3 and d = 2, we have $\frac{p-1}{(p-1)p^{d-1}} = 1/p^{d-1} \leq 1/5$.
Lemma 3.9. Let n = rs for odd coprime numbers r, s ⩾ 3 and suppose that n is not a Carmichael number. At most 1/4 of all elements $a \in (\mathbb{Z}/n\mathbb{Z})^*$ yield the output “n is probably prime” in the Miller–Rabin test.

Proof: Let
$$G = \{\, a \in (\mathbb{Z}/n\mathbb{Z})^* \mid a^{2^\ell u} \equiv 1 \bmod rs \,\}$$
$$H = \{\, a \in (\mathbb{Z}/n\mathbb{Z})^* \mid a^k \equiv \pm 1 \bmod rs \,\}$$
for $k = \min\{\, 2^{\ell'-1} u \mid \forall a \in G : a^{2^{\ell'} u} \equiv 1 \bmod rs \,\}$. Note that H is a subgroup of G which, in turn, is a subgroup of $(\mathbb{Z}/n\mathbb{Z})^*$. Since n is not a Carmichael number, we have $G \neq (\mathbb{Z}/n\mathbb{Z})^*$, and Proposition 3.7 shows H ≠ G. Therefore, the index of H in $(\mathbb{Z}/n\mathbb{Z})^*$ is at least 4.

Next, we show that the output “n is probably prime” implies a ∈ H. If b = 1, then a ∈ H. Let now $b^{2^i} \equiv -1 \bmod n$ for i < ℓ. Then, $a^{2^\ell u} \equiv 1 \bmod rs$ and thus a ∈ G. In particular, the element a is considered in the definition of k and thus, $2^i u$ divides k. This shows a ∈ H. Therefore, all elements $a \in (\mathbb{Z}/n\mathbb{Z})^* \setminus H$ lead to the output “n is composite.”

In the case of n being a Carmichael number, we use the property that n has at least three different prime factors for applying Proposition 3.7 twice.

Lemma 3.10. Let n be a Carmichael number. At most 1/4 of all elements $a \in (\mathbb{Z}/n\mathbb{Z})^*$ yield the output “n is probably prime” in the Miller–Rabin test.

Proof: By Theorem 3.5, we have n = rst for coprime numbers r, s, t ⩾ 3. Let $G = (\mathbb{Z}/n\mathbb{Z})^*$ and ℓ′ ∈ ℕ minimal such that
$$G = \{\, a \in (\mathbb{Z}/n\mathbb{Z})^* \mid a^{2^{\ell'} u} \equiv 1 \bmod rs \,\}$$
Since −1 ∈ G, we have ℓ′ ⩾ 1. We define $h = 2^{\ell'-1} u$. Let
$$H = \{\, a \in (\mathbb{Z}/n\mathbb{Z})^* \mid a^h \equiv \pm 1 \bmod rs \,\}$$
$$J = \{\, a \in (\mathbb{Z}/n\mathbb{Z})^* \mid a^h \equiv \pm 1 \bmod n \,\}$$
Note that J is a subgroup of H and H is a subgroup of G. Proposition 3.7 shows H ≠ G. If J ≠ H, then the index of J in G is at least 4, and we are done, since all a ∈ G \ J yield the output “n is composite.” So let J = H, that is, $a^h \equiv \pm 1 \bmod rs$ implies $a^h \equiv \pm 1 \bmod n$ for all a ∈ G. If $a^h \equiv -1 \bmod n$, then $a^h \equiv -1 \bmod rs$; now, the element â defined by â ≡ a mod rs and â ≡ 1 mod t is in H \ J, a contradiction. This shows $J = \{\, a \in (\mathbb{Z}/n\mathbb{Z})^* \mid a^h \equiv 1 \bmod n \,\}$. Let m ∈ ℕ be minimal with $J = \{\, a \in (\mathbb{Z}/n\mathbb{Z})^* \mid a^{2^m u} \equiv 1 \bmod n \,\}$. We set $k = 2^{m-1} u$. By Proposition 3.7,
$$K = \{\, a \in (\mathbb{Z}/n\mathbb{Z})^* \mid a^k \equiv \pm 1 \bmod n \,\}$$
is a proper subgroup of J. Hence, the index of K in G is at least 4.

It remains to show that the output “n is probably prime” implies a ∈ K. If b = 1, then a ∈ K. Let now $b^{2^i} \equiv -1 \bmod n$ for i < ℓ. Then, $b^{2^i} \equiv -1 \bmod rs$ and thus i < ℓ′. It follows that a ∈ H = J. In particular, a is considered in the definition of m. We conclude i < m and a ∈ K. Therefore, all a ∈ G \ K yield the output “n is composite.”

Theorem 3.11. Let n ⩾ 3 be odd and composite. At most 1/4 of all elements a ∈ {1, …, n − 1} yield the output “n is probably prime” in the Miller–Rabin test.

Proof: All elements a ∈ {1, …, n − 1} with $a \notin (\mathbb{Z}/n\mathbb{Z})^*$ yield the output “n is composite.” For n ≠ 9, the claim therefore follows from Lemmas 3.8, 3.9 and 3.10. For n = 9, the Miller–Rabin test considers eight possible choices for a, but only a = 1 and a = −1 lead to the output “n is probably prime.”

This shows that the Miller–Rabin test yields a given level of certainty with half as many rounds as the Solovay–Strassen test.
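Steps (1) to (5) of the Miller–Rabin test translate almost directly into code. The following Python sketch is one possible rendering; the function names and the default of 20 rounds are assumptions made for illustration.

```python
import random

def miller_rabin_round(n, a):
    """One round for odd n >= 3 and base a in {1, ..., n-1}.

    Returns True for 'n is probably prime' and False for 'n is composite'.
    """
    # Step (1): write n - 1 = 2^l * u with u odd.
    l, u = 0, n - 1
    while u % 2 == 0:
        u //= 2
        l += 1
    b = pow(a, u, n)              # step (2)
    if b == 1:                    # step (3)
        return True
    # Steps (4) and (5): look for -1 among b, b^2, ..., b^(2^(l-1)).
    for _ in range(l):
        if b == n - 1:
            return True
        b = b * b % n
    return False

def is_probably_prime(n, rounds=20, rng=random):
    """Repeat the round with independent random bases."""
    if n < 3 or n % 2 == 0:
        return n == 2
    return all(miller_rabin_round(n, rng.randrange(1, n)) for _ in range(rounds))
```

For n = 9 the only bases that pass a round are 1 and 8, as stated in the proof of Theorem 3.11, and for the Carmichael number 561 the number of passing bases stays below the bound 560/4 = 140.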
3.4.4 Applications of the Miller–Rabin scheme

Proposition 3.7 does not require that $2^\ell u = n - 1$. This allows us to extend the Miller–Rabin scheme to other settings. Let n ⩾ 3 be odd. Suppose that ℓ ⩾ 1 and that u is odd. We can arbitrarily choose a ∈ {1, …, n − 1}, set $b = a^u \bmod n$, and then consider the sequence $b, b^2, b^{2^2}, \ldots, b^{2^{\ell-1}}$. Let
$$G = \{\, a \in (\mathbb{Z}/n\mathbb{Z})^* \mid a^{2^\ell u} \equiv 1 \bmod n \,\}$$
$$H = \{\, a \in G \mid a^k \equiv \pm 1 \bmod n \,\}$$
for $k = \min\{\, 2^{\ell'-1} u \mid \ell' \geq 0,\ \forall a \in G : a^{2^{\ell'} u} = 1 \,\}$. Proposition 3.7 shows H ≠ G. For every element a ∉ H one of the following properties holds:
(a) $a^{2^\ell u} \not\equiv 1 \bmod n$, or
(b) $-1 \notin \{b, b^2, b^{2^2}, \ldots, b^{2^{\ell-1}}\}$

Thus, for a random element a ∈ {1, …, n − 1} one of these two conditions is satisfied in at least half of the cases. If condition (a) is satisfied because a is not invertible modulo n, then we find a nontrivial divisor of n, namely gcd(a, n). If, in contrast, condition (a) is satisfied for $a \in (\mathbb{Z}/n\mathbb{Z})^*$, then $G \neq (\mathbb{Z}/n\mathbb{Z})^*$. If $a^{2^\ell u} \equiv 1 \bmod n$ and condition (b) holds, then there is an element $c \in \{b, b^2, b^{2^2}, \ldots, b^{2^{\ell-1}}\}$ such that $c^2 \equiv 1 \bmod n$ and $c \not\equiv \pm 1 \bmod n$. But then (c − 1)(c + 1) ≡ 0 mod n, and gcd(c − 1, n) is a nontrivial divisor of n. In particular, half of all elements a ∈ {1, …, n − 1} yield either a divisor of n or the result $G \neq (\mathbb{Z}/n\mathbb{Z})^*$. If φ(n) divides $2^\ell u$, then $G = (\mathbb{Z}/n\mathbb{Z})^*$ and we can use the Miller–Rabin scheme as a probabilistic method for factorizing n. For example, if all prime factors of φ(n) are elements of the set $\{p_1, \ldots, p_m\}$, then n can be factored by choosing ℓ and u such that $2^\ell u = \prod_{i=1}^{m} p_i^{\lfloor \log_{p_i} n \rfloor}$ holds.
Security of the secret key in RSA

Using the Miller–Rabin scheme, we will show that the secret key for the RSA cryptosystem is secure. For the RSA algorithm, two sufficiently large primes p and q are determined and n = pq is computed. Then numbers e, s ⩾ 3 with es ≡ 1 mod φ(n) are computed. The public key is the pair (n, e), and the secret key is s. Encryption is done by $x \mapsto x^e \bmod n$; decryption can be accomplished by $y \mapsto y^s \bmod n$. If it is possible to factorize n, then it is possible, from the knowledge of e, to determine the secret key s efficiently. However, it is generally believed that there is no efficient method for factorizing large numbers, and that it is therefore not feasible to factorize n. We want to show that with the knowledge of the secret key s, it is easily possible to determine the factors of n. Since it is assumed that the latter cannot be performed efficiently, there can be no efficient method to determine s with the knowledge of the public key (n, e).

Let s ∈ ℕ be a number with $(x^e)^s = x^{es} \equiv x \bmod n$ for all x ∈ ℤ. Then, $a^{es-1} \equiv 1 \bmod n$ for all $a \in (\mathbb{Z}/n\mathbb{Z})^*$. We write $es - 1 = 2^\ell u$ with an odd u. From Proposition 3.7, using the same notation, we obtain $H \subsetneq G = (\mathbb{Z}/n\mathbb{Z})^*$. In particular, the following procedure yields a factor of n with high probability:
(1) Choose a ∈ {2, …, n − 1} at random.
(2) If gcd(a, n) ≠ 1, then we have a divisor of n.
(3) Let $b = a^u \bmod n$.
(4) If b ≠ 1, then consider the sequence $b, b^2, b^{2^2}, \ldots, b^{2^{\ell-1}}$ modulo n. If −1 does not occur in this sequence, then, because of $b^{2^\ell} \equiv 1 \bmod n$, the sequence contains an element $c \not\equiv \pm 1 \bmod n$ with $c^2 \equiv 1 \bmod n$. Thus, computing gcd(c − 1, n), we obtain a divisor of n.
(5) If no divisor has been found so far, repeat the process with a new value of a.

As we saw above, this procedure finds a divisor of n with a probability of at least 1/2 in each iteration.
Under the assumption that one cannot factorize n efficiently, this shows that it is not possible for an attacker to derive the secret key from the public key in the RSA cryptosystem.
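The five-step procedure for recovering a factor of n from the key pair can be sketched as follows. The function name `factor_from_rsa_key` and the cap of 64 attempts are assumptions made for this illustration; the toy key n = 3233 = 61 ⋅ 53, e = 17, s = 2753 used afterwards is far too small for any real use.

```python
import random
from math import gcd

def factor_from_rsa_key(n, e, s, attempts=64, rng=random):
    """Try to find a nontrivial divisor of n from a valid RSA key pair (e, s).

    Returns a divisor of n, or None if all attempts were unlucky.
    """
    # Write e*s - 1 = 2^l * u with u odd.
    l, u = 0, e * s - 1
    while u % 2 == 0:
        u //= 2
        l += 1
    for _ in range(attempts):
        a = rng.randrange(2, n)           # step (1)
        d = gcd(a, n)
        if d != 1:                        # step (2): lucky hit
            return d
        b = pow(a, u, n)                  # step (3)
        # Step (4): walk along b, b^2, b^(2^2), ... until the value 1 appears;
        # c is the element whose square is 1.
        for _ in range(l):
            c, b = b, b * b % n
            if b == 1:
                if c not in (1, n - 1):   # nontrivial square root of 1
                    return gcd(c - 1, n)
                break                     # this a is useless, step (5)
    return None
```

With the toy key, `factor_from_rsa_key(3233, 17, 2753)` returns one of the prime factors 53 or 61, which one depends on the random base.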
3.4.5 Miller–Rabin versus Solovay–Strassen

The Solovay–Strassen test plays a minor role in practice and it cannot compete with the Miller–Rabin test, the latter being conceptually and algorithmically simpler. The Solovay–Strassen test loses because, for a fixed choice of a, whenever the Solovay–Strassen test deduces that n is composite, then so does the Miller–Rabin test. We show this in Theorem 3.12.

Let n ⩾ 3 be odd and a ∈ ℤ with gcd(a, n) = 1. The number n is a pseudoprime to base a if $a^{n-1} \equiv 1 \bmod n$ (and so, n with the choice of a passes the Fermat test); n is an Eulerian pseudoprime to base a if $a^{(n-1)/2} \equiv \left(\frac{a}{n}\right) \bmod n$ is true (and so, n with the choice of a passes the Solovay–Strassen test); finally, n is a strong pseudoprime to base a if $n - 1 = 2^\ell u$ for an odd u and one of the following conditions is satisfied:
– $a^u \equiv 1 \bmod n$, or
– there exists 0 ⩽ k < ℓ such that $a^{2^k u} \equiv -1 \bmod n$

On this choice of a, strong pseudoprimes pass the Miller–Rabin test. The fewer bases there are such that a composite number n is an (Eulerian or strong, respectively) pseudoprime, the more likely the respective primality test will detect that n is composite. Carmichael numbers are pseudoprimes to all bases. If n is an Eulerian pseudoprime to base a, then $a^{(n-1)/2} \bmod n \in \{1, -1\}$ and therefore $a^{n-1} = (a^{(n-1)/2})^2 \equiv 1 \bmod n$. So n is a pseudoprime to base a. Thus, the Solovay–Strassen test on composite numbers never performs worse (and on Carmichael numbers it always performs better) than the Fermat test. The next theorem shows that the Miller–Rabin test performs even better; see also Exercise 3.4.

Theorem 3.12. Every strong pseudoprime n to base a is also an Eulerian pseudoprime to base a.

Proof: Let $a \in (\mathbb{Z}/n\mathbb{Z})^*$ such that the output of the Miller–Rabin test is “n is probably prime.” We need to show that the Solovay–Strassen test gives the same output. We write $n - 1 = 2^\ell u$ with u odd. Note that ℓ ⩾ 1 since n is odd. Let $b = a^u \bmod n$.
Since $a^{(n-1)/2}, \left(\frac{a}{n}\right) \in \{1, -1\}$ and u is odd, we have
$$b^{(n-1)/2} \equiv \left(a^{(n-1)/2}\right)^u \equiv a^{(n-1)/2} \bmod n$$
$$\left(\frac{b}{n}\right) = \left(\frac{a^u}{n}\right) = \left(\frac{a}{n}\right)^u = \left(\frac{a}{n}\right)$$
It therefore remains to show that $b^{(n-1)/2} \equiv \left(\frac{b}{n}\right) \bmod n$. If b = 1, then this is true. Let now $b^{2^{k-1}} \equiv -1 \bmod n$ for some k ∈ {1, …, ℓ}. In particular, the order of b modulo n is $2^k$. We let $n = p_1 \cdots p_t$ for not necessarily distinct primes $p_i$. Let $p_i - 1 = 2^{\ell_i} u_i$ for $u_i$ odd. By reordering, we can assume that $\ell_1 \leq \cdots \leq \ell_t$. We have $b^{2^k} \equiv 1 \not\equiv -1 \equiv b^{2^{k-1}} \bmod p_i$ for all i. Therefore, the order of b modulo $p_i$ is $2^k$, and we obtain $k \leq \ell_1$. Let s be the number of exponents $\ell_i$ with $k = \ell_i$. For i ⩽ s we have $p_i - 1 = 2^k u_i = 2^k(1 + 2z_i) = 2^k + 2^{k+1} z_i \equiv 2^k \bmod 2^{k+1}$, and if i > s, then $2^{k+1} \mid p_i - 1$ and thus $p_i \equiv 1 \bmod 2^{k+1}$. This yields
$$n \equiv \begin{cases} (2^k + 1)^{2s'} \equiv 1^{s'} \equiv 1 \bmod 2^{k+1} & \text{for } s = 2s' \\ (2^k + 1)^{2s'+1} \equiv 2^k + 1 \not\equiv 1 \bmod 2^{k+1} & \text{for } s = 2s' + 1 \end{cases}$$
If s is even, then $2^{k+1} \mid n - 1$ and $b^{(n-1)/2} \equiv 1 \bmod n$. Otherwise, if s is odd, then $2^{k+1} \nmid n - 1$ and $b^{(n-1)/2} \equiv -1 \bmod n$. Euler’s criterion and Zolotarev’s lemma yield
$$\left(\frac{b}{p_i}\right) = \begin{cases} -1 & \text{if } i \leq s \\ 1 & \text{if } i > s \end{cases}$$
Thus, $\left(\frac{b}{n}\right) = \prod_i \left(\frac{b}{p_i}\right) = (-1)^s$. We conclude $b^{(n-1)/2} \equiv \left(\frac{b}{n}\right) \bmod n$, as desired.
The proof of Theorem 3.12 is adapted from Carl Bernard Pomerance (born 1944), John Lewis Selfridge (1927–2010), and Samuel Standfield Wagstaff, Jr. (born 1945) [85].
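Theorem 3.12 can be checked empirically for small moduli. The following Python sketch searches for pairs (n, a) where n is a strong pseudoprime to base a but not an Eulerian pseudoprime; by the theorem, no such pair exists. All function names are illustrative, and the Jacobi symbol is computed with the standard reciprocity-based loop, which is an assumption about the algorithm referred to in the text.

```python
from math import gcd

def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n >= 1."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def is_euler_pseudoprime(n, a):
    """a^((n-1)/2) = (a/n) mod n, for odd n >= 3 and gcd(a, n) = 1."""
    return pow(a, (n - 1) // 2, n) == jacobi(a, n) % n

def is_strong_pseudoprime(n, a):
    """a^u = 1 or a^(2^k u) = -1 mod n for some 0 <= k < l, where n - 1 = 2^l u."""
    l, u = 0, n - 1
    while u % 2 == 0:
        u //= 2
        l += 1
    b = pow(a, u, n)
    if b == 1:
        return True
    for _ in range(l):
        if b == n - 1:
            return True
        b = b * b % n
    return False

def counterexamples(limit):
    """Pairs (n, a) with odd 3 <= n < limit that would contradict Theorem 3.12."""
    return [(n, a)
            for n in range(3, limit, 2)
            for a in range(1, n)
            if gcd(a, n) == 1
            and is_strong_pseudoprime(n, a)
            and not is_euler_pseudoprime(n, a)]
```

Such a search is of course no proof; it only confirms the statement for the finitely many moduli examined.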
3.5 Extracting roots in finite fields

Extracting roots in a group G is the following problem. Let an element a ∈ G be given, which is known to be a square, that is, an element b ∈ G with $a = b^2$ exists. The goal is to determine such an element b satisfying $a = b^2$. In general, the solution is not unique. It is often difficult to extract roots modulo n. In this section, we show that in finite fields, this problem is efficiently solvable using probabilistic algorithms.

Let 𝔽 be a finite field. On input a ∈ 𝔽, we want to compute b ∈ 𝔽 with $b^2 = a$. We call b a square root of a and say that a is a square or a quadratic residue. If b is a root of a, then so is −b. Since quadratic equations over fields have at most two solutions, we conclude that b and −b are the only roots of a. There are some special cases where extracting roots is particularly simple. Before we examine a general method, let us first consider two such special cases.

Let G be a finite group of odd order and a ∈ G. We let $b = a^{(|G|+1)/2}$. The crucial point here is that (|G| + 1)/2 is an integer. From Corollary 1.5, we have $a^{|G|} = 1$. Thus, $b^2 = a^{|G|} a = a$. So b is a square root of a. This covers the case $\mathbb{F} = \mathbb{F}_{2^n}$ because here the order of the multiplicative group $\mathbb{F}^* = \mathbb{F} \setminus \{0\}$ is odd. In particular, each element in $\mathbb{F}_{2^n}$ is a square.

Before we proceed to the second special case, we want to remark that, in general, not every element of a field is a square. A simple example for that is $\mathbb{F}_3 = \{0, 1, -1\}$. In this field, −1 is not a square. We can use Euler’s criterion to check whether an element a of a field 𝔽 with an odd number of elements is a square: we have $a^{(|\mathbb{F}|-1)/2} = 1$ if and only if a is a square. We now investigate the case |𝔽| ≡ −1 mod 4 separately since, in this case, extracting roots is simple. The reason is that exactly one of the elements a and −a in $\mathbb{F}^*$ is a square: since |𝔽| + 1 is divisible by 4, the element $b = a^{(|\mathbb{F}|+1)/4}$ is well defined and we have
$$b^2 = a^{\frac{|\mathbb{F}|+1}{2}} = a \cdot a^{\frac{|\mathbb{F}|-1}{2}} = \begin{cases} a & \text{if } a \text{ is a square} \\ -a & \text{otherwise} \end{cases}$$
Thus, if a is a square, then b is one of its square roots. Next, we consider two algorithms to extract roots in arbitrary finite fields 𝔽 for which |𝔽| is odd.
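The special case |𝔽| ≡ −1 mod 4 takes a single modular exponentiation. The sketch below restricts itself to prime fields $\mathbb{F}_p$ represented by integers modulo p; this restriction and the function name are assumptions made for illustration.

```python
def sqrt_mod_p_3mod4(a, p):
    """Square root of a modulo a prime p with p % 4 == 3.

    Returns b with b*b = a mod p, or None if a is not a square modulo p
    (in that case the candidate b is a square root of -a).
    """
    if p % 4 != 3:
        raise ValueError("this shortcut needs p congruent to 3 modulo 4")
    a %= p
    b = pow(a, (p + 1) // 4, p)
    return b if b * b % p == a else None
```

For p = 23, the call `sqrt_mod_p_3mod4(2, 23)` returns a root of 2 (the two roots are 5 and 18), while `sqrt_mod_p_3mod4(5, 23)` returns `None` because 5 is not a square modulo 23.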
3.5.1 Tonelli’s algorithm

Let 𝔽 be a finite field with an odd number q = |𝔽| of elements. We present the probabilistic method for extracting roots, as developed by Alberto Tonelli (1849–1921) in 1891. To this end, we write $q - 1 = 2^\ell u$ with u odd. Let
$$G_i = \{\, g \in \mathbb{F}^* \mid g^{2^i u} = 1 \,\}$$
Each set $G_{i-1}$ is a subgroup of $G_i$, and $G_\ell = \mathbb{F}^*$. Since $\mathbb{F}^*$ is cyclic, all groups $G_i$ are cyclic, too. Let x be a generator of $\mathbb{F}^*$. Then, $x^{2^{\ell-i}}$ is a generator of $G_i$. In particular, we can see that the index of $G_{i-1}$ in $G_i$ is 2.

Let $g \in \mathbb{F}^*$ be not a square. With Euler’s criterion, we obtain $-1 = g^{2^{\ell-1} u} = (g^{2^i})^{2^{\ell-i-1} u}$. Thus, $g^{2^i} \in G_{\ell-i} \setminus G_{\ell-i-1}$, and, moreover, $G_{\ell-i-1}$ and $g^{2^i} G_{\ell-i-1}$ are the two cosets of $G_{\ell-i-1}$ in $G_{\ell-i}$; that is, you can alternate between these two cosets by multiplication with $g^{2^i}$. Each quotient $G_i / G_{i-1}$ is generated by a power of g. Thus, also $\mathbb{F}^* / G_0$ is generated by g. In particular, each element $a \in \mathbb{F}^*$ can be written as $a = g^k h$ with $h \in G_0$. The idea now is to extract the roots of $g^k$ and h separately. For h, this is easy because $G_0$ is of odd order. If a is a square, then k is even: Euler’s criterion yields
$$1 = a^{2^{\ell-1} u} = (g^k h)^{2^{\ell-1} u} = g^{2^{\ell-1} u k} \cdot 1$$
and, because of $g^{2^{\ell-1} u} = -1$, we see that k must be even, showing that $g^{k/2}$ is a root of $g^k$. Therefore, it only remains to determine the exponent k. Let $k = \sum_{j \geq 0} k_j 2^j$ be the binary representation of k. For the first i bits $k_0, \ldots, k_{i-1}$ of k, using $g^{2^\ell u} = 1$, we obtain
$$1 = h^{2^{\ell-i} u} = (a g^{-k})^{2^{\ell-i} u} = a^{2^{\ell-i} u}\, g^{-(\sum_{j \geq 0} k_j 2^j) \cdot 2^{\ell-i} u} = a^{2^{\ell-i} u}\, g^{-(\sum_{j=0}^{i-1} k_j 2^j) \cdot 2^{\ell-i} u} = \left(a g^{-\sum_{j=0}^{i-1} k_j 2^j}\right)^{2^{\ell-i} u}$$
This shows that $a g^{-\sum_{j=0}^{i-1} k_j 2^j} \in G_{\ell-i}$. If $k_0, \ldots, k_{i-2}$ are known already, then the bit $k_{i-1}$ is uniquely determined, because $g^{2^{i-1}} \notin G_{\ell-i}$. Finally, this yields the following algorithm:
(1) Choose a random element $g \in \mathbb{F}^*$ until g is not a square.
(2) Determine $k_0, \ldots, k_{\ell-1}$ successively such that $a g^{-\sum_{j=0}^{i-1} k_j 2^j} \in G_{\ell-i}$ for all i.
(3) Let $k = \sum_{j=0}^{\ell-1} k_j 2^j$ and $h = a g^{-k}$.
(4) Return $b = g^{k/2} h^{(u+1)/2}$ as a root of a.
In each iteration of the first step, g is not a square with probability 1/2. Therefore, a suitable g is found after only a few rounds with high probability. To check whether $c \in G_{\ell-i}$, we compute $c^{2^{\ell-i} u}$. Using fast exponentiation, this takes O(log q) operations in 𝔽. The test has to be carried out ℓ times, so O(ℓ log q) operations in 𝔽 suffice for this part. Since ℓ ⩽ log q, the worst case here is $O(\log^2 q)$ operations. The remaining steps are in O(log q).

This can be slightly improved using a trick of Daniel Shanks (1917–1996). For this purpose, we compute $c = a^u$. The element $g^u$ is not a square since
$$(g^u)^{2^{\ell-1} u} = (g^{2^{\ell-1} u})^u = (-1)^u = -1$$
Replacing g by $g^u$, after step (1) of Tonelli’s algorithm, we may assume that $g^{2^{\ell-1}} = -1$. The condition $a g^{-\sum_{j=0}^{i-1} k_j 2^j} \in G_{\ell-i}$ now is equivalent to $\left(c g^{-\sum_{j=0}^{i-1} k_j 2^j}\right)^{2^{\ell-i}} = 1$. This can be checked using O(ℓ) field operations. Thus, the complexity of the algorithm is improved to $O(\ell^2 + \log q)$ operations in 𝔽.
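The four steps of Tonelli’s algorithm, with the membership test written in the form suggested by Shanks’ trick, can be sketched as follows. The sketch assumes a prime field $\mathbb{F}_p$ represented by integers modulo p and Python 3.8 or later (for modular inverses via `pow`). The names `tonelli`, `c` and `z` are chosen here, and for brevity the power of the inverse is recomputed in every iteration, so the operation count is not the optimized one from the text.

```python
import random

def tonelli(a, p, rng=random):
    """Square root of a quadratic residue a modulo an odd prime p."""
    a %= p
    if a == 0:
        return 0
    if pow(a, (p - 1) // 2, p) != 1:
        raise ValueError("a is not a square modulo p")
    # Write p - 1 = 2^l * u with u odd.
    l, u = 0, p - 1
    while u % 2 == 0:
        u //= 2
        l += 1
    # Step (1): find a non-square g by Euler's criterion.
    while True:
        g = rng.randrange(1, p)
        if pow(g, (p - 1) // 2, p) == p - 1:
            break
    # Shanks' trick: test membership with c = a^u and z = g^u.
    c = pow(a, u, p)
    z_inv = pow(pow(g, u, p), -1, p)
    # Step (2): determine the bits k_0, ..., k_{l-1} of k one at a time.
    k = 0
    for i in range(1, l + 1):
        t = c * pow(z_inv, k, p) % p           # candidate with bit k_{i-1} = 0
        if pow(t, 1 << (l - i), p) != 1:       # not in G_{l-i}: the bit must be 1
            k |= 1 << (i - 1)
    # Steps (3) and (4): h = a * g^(-k) has odd order dividing u.
    h = a * pow(g, -k, p) % p
    return pow(g, k // 2, p) * pow(h, (u + 1) // 2, p) % p
```

For a prime such as p = 41, where p − 1 = 2³ ⋅ 5, the loop in step (2) runs three times; for p ≡ 3 mod 4 it runs once and the result coincides, up to sign, with the shortcut for |𝔽| ≡ −1 mod 4.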
3.5.2 Cipolla’s algorithm

Let 𝔽 be a finite field with an odd number of elements. We now present and investigate the randomized algorithm of Michele Cipolla (1880–1947) for finding a square root. This algorithm works with polynomials in 𝔽[X]. On input a ∈ 𝔽, the idea is to construct a field extension in which it is easy to compute a square root of a. In the following, let the input $a \in \mathbb{F}^*$ be a square:
(1) Repeatedly choose elements t ∈ 𝔽 at random until $t^2 - 4a$ is not a square.
(2) Let $f = X^2 - tX + a$.
(3) Compute $b = X^{(|\mathbb{F}|+1)/2} \bmod f$. This is the desired square root of a.

We first show that, with high probability, a suitable element t is chosen in the first step.

Theorem 3.13. Let $a \in \mathbb{F}^*$ be a square. If t ∈ 𝔽 is randomly chosen, then $t^2 - 4a$ is not a square with probability (|𝔽| − 1)/(2|𝔽|).

Proof: By completing the square, we see that $t^2 - 4a$ is a square if and only if the polynomial $X^2 - tX + a$ splits into linear factors over 𝔽. The number of polynomials $X^2 - tX + a$ with t ∈ 𝔽 that split into linear factors over 𝔽 is equal to the number of polynomials (X − α)(X − β) with αβ = a. If you choose α, then β is uniquely determined (since a ≠ 0). We have α = β if and only if α is one of the two square roots of a. There are |𝔽| − 3 different ways to choose α such that α ≠ β and αβ = a. Due to the symmetry in α and β, this yields (|𝔽| − 3)/2 polynomials. Additionally, there are the two possibilities for α = β being a square root of a. The remaining (|𝔽| − 1)/2 polynomials do not split into linear factors. This proves the statement of the theorem.

If $t^2 - 4a$ is not a square, then the polynomial $f = X^2 - tX + a \in \mathbb{F}[X]$ is irreducible. Using Theorem 1.51, we see that 𝕂 = 𝔽[X]/f is a field that contains 𝔽. The field 𝕂 contains X as an element. An important property of the element X ∈ 𝕂 is provided by the following theorem.

Theorem 3.14. In 𝕂, we have $X^{|\mathbb{F}|+1} = a$.

Proof: If $|\mathbb{F}| = p^q$ for some prime p, then the mapping φ : 𝕂 → 𝕂 with $\varphi(x) = x^{|\mathbb{F}|}$ is the q-fold application of the Frobenius homomorphism $x \mapsto x^p$; see Theorem 1.28. In particular, φ is a homomorphism. Since the roots of the polynomial $Y^{|\mathbb{F}|} - Y \in \mathbb{K}[Y]$ are exactly the elements of 𝔽, we see that φ(x) = x if and only if x ∈ 𝔽. Consider the polynomial $h = Y^2 - tY + a \in \mathbb{K}[Y]$. The roots of this polynomial are X and t − X. Since the coefficients of h are in 𝔽, the elements φ(X) and φ(t − X) are roots of h, too. For instance, we have $0 = \varphi(0) = \varphi(h(X)) = \varphi(X)^2 - t\varphi(X) + a$. We conclude {X, t − X} = {φ(X), φ(t − X)}. Now, X ∉ 𝔽 yields φ(X) ≠ X and thus $X^{|\mathbb{F}|} = \varphi(X) = t - X$. Finally, we obtain $X^{|\mathbb{F}|+1} = X X^{|\mathbb{F}|} = X(t - X) = a$.

From Theorem 3.13, it follows that, with high probability after a few runs, the algorithm will find a suitable element t ∈ 𝔽. Therefore, Cipolla’s algorithm can be performed efficiently. Once t is found, the algorithm executes O(log |𝔽|) field operations. From Theorem 3.14, we see that b ∈ 𝕂 is a square root of a. Now a, by assumption, has two square roots in 𝔽. But since the equation $Y^2 = a$ has only two solutions, it follows that b ∈ 𝔽. This shows the validity of Cipolla’s algorithm. If $a \in \mathbb{F}^*$ is not a square, then Cipolla’s algorithm still computes a square root b of a, but in this case we have b ∈ 𝕂 \ 𝔽.
Also, note that if a is not a square, then the success probability for finding an appropriate t is (|𝔽| + 1)/(2|𝔽|); this is slightly better than in the case of a being a square.
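Cipolla’s algorithm can be sketched for a prime field $\mathbb{F}_p$ as follows. Elements of 𝕂 = 𝔽[X]/f are represented as pairs (c0, c1) standing for c0 + c1⋅X; this representation, the helper `mul` and the function name `cipolla` are assumptions made for illustration.

```python
import random

def cipolla(a, p, rng=random):
    """Square root of a quadratic residue a modulo an odd prime p."""
    a %= p
    if a == 0:
        return 0
    if pow(a, (p - 1) // 2, p) != 1:
        raise ValueError("a is not a square modulo p")
    # Step (1): choose t until t^2 - 4a is a non-square (Euler's criterion).
    while True:
        t = rng.randrange(p)
        if pow((t * t - 4 * a) % p, (p - 1) // 2, p) == p - 1:
            break

    # Step (2): arithmetic modulo f = X^2 - t*X + a, that is, X^2 = t*X - a.
    def mul(x, y):
        c0, c1 = x
        d0, d1 = y
        return ((c0 * d0 - a * c1 * d1) % p,
                (c0 * d1 + c1 * d0 + t * c1 * d1) % p)

    # Step (3): compute X^((p+1)/2) mod f by square-and-multiply.
    result, base, e = (1, 0), (0, 1), (p + 1) // 2
    while e:
        if e & 1:
            result = mul(result, base)
        base = mul(base, base)
        e >>= 1
    b0, b1 = result
    assert b1 == 0   # the root lies in the base field, as shown in the text
    return b0
```

The assertion at the end reflects the argument following Theorem 3.14: for a square a, the computed element has no X-component.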
3.6 Integer factorization The security of many cryptosystems is based on the assumption that it is impossible to factorize large numbers efficiently. In this chapter, we will introduce some methods to demonstrate that the factorization of certain numbers is easy; from that we immediately obtain criteria for poorly chosen keys in certain cryptosystems. To factorize a number n one needs to find a factor m of n with 1 < m < n. To completely decompose n into its prime factors, one can go on by factorizing m and n/m. Before starting to factorize a number n, one can use a primality test like the Miller–Rabin test to check whether n is composite at all.
3.6.1 Pollard’s p − 1 algorithm

An idea of John Michael Pollard for factorization is the following. Let p be a prime divisor of n. With Fermat’s little theorem, we obtain $a^k \equiv 1 \bmod p$ for all multiples k of p − 1. If p − 1 contains only prime factors of a set B, then we can use $k = \prod_{q \in B} q^{\lfloor \log_q n \rfloor}$. Frequently, B is chosen to be the set of all primes up to a certain size. If $a^k \not\equiv 1 \bmod n$, we obtain a nontrivial divisor of n by computing $\gcd(a^k - 1, n)$. This leads to the following algorithm to find a divisor of n:
(1) Let $k = \prod_{q \in B} q^{\lfloor \log_q n \rfloor}$.
(2) Compute $\gcd(a^k - 1, n)$ for a randomly chosen a ∈ {2, …, n − 1}.
(3) If this does not yield a proper divisor of n, then try a new value of a or a larger base B of prime numbers.

Of course, in step (2), we first compute $b = a^k \bmod n$ with fast modular exponentiation and then gcd(b − 1, n). The problem with Pollard’s p − 1 algorithm is that the structure of the group $(\mathbb{Z}/n\mathbb{Z})^*$ is fixed. If no prime divisor p of n has the property that p − 1 has only small prime factors, then this method does not work. Using an overlarge base B makes k huge, and thus the algorithm becomes inefficient.
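The three steps can be sketched in Python as follows. Here B is taken to be the set of all primes up to a parameter `bound`, and the base a is a fixed default rather than a random choice so that the example is reproducible; both are assumptions made for illustration.

```python
from math import gcd

def primes_up_to(bound):
    """All primes <= bound by a simple sieve."""
    sieve = [True] * (bound + 1)
    primes = []
    for q in range(2, bound + 1):
        if sieve[q]:
            primes.append(q)
            for m in range(q * q, bound + 1, q):
                sieve[m] = False
    return primes

def pollard_p_minus_1(n, bound, a=2):
    """Try to split n using B = primes up to bound; returns a proper divisor or None."""
    b = a % n
    for q in primes_up_to(bound):
        # q^floor(log_q n) is the largest power of q not exceeding n
        qe = q
        while qe * q <= n:
            qe *= q
        b = pow(b, qe, n)       # after the loop, b = a^k mod n
    d = gcd(b - 1, n)
    return d if 1 < d < n else None
```

For n = 1403 = 23 ⋅ 61 we have 61 − 1 = 2² ⋅ 3 ⋅ 5, so the primes up to 5 suffice and `pollard_p_minus_1(1403, 5)` finds the divisor 61. With the larger bound 11, also 23 − 1 = 2 ⋅ 11 divides k, the gcd becomes n itself, and the call returns `None`.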
3.6.2 Pollard’s rho algorithm for factorization

Another idea of Pollard’s for integer factorization uses the birthday paradox. The basic idea behind so-called ρ algorithms is as follows. From the birthday paradox, there is a high probability that in a sequence of $O(\sqrt{|M|})$ random elements from a finite set M two chosen elements are the same. Instead of a random sequence, you determine a pseudo-random sequence $m_0, m_1, \ldots$ with the property that $m_{i+1}$ can be uniquely determined from $m_i$. Then, if $m_i = m_j$, we obtain $m_{i+k} = m_{j+k}$ for all k ⩾ 0. Moreover, from the viewpoint of an observer, $m_0, m_1, \ldots$ should behave like a random sequence. In particular, the probability of finding two indices i < j with $m_i = m_j$ after $O(\sqrt{|M|})$ steps is high. The shape of the image below motivates the term “ρ algorithm.”

[Figure: the sequence $m_0, m_1, \ldots, m_{i-1}$ forms a tail that enters a cycle at $m_i = m_j$; the cycle runs through $m_{i+1}, \ldots, m_{j-1}$ and back, giving the shape of the letter ρ.]
Assume that n has a prime divisor p, and let f : ℤ/pℤ → ℤ/pℤ be an arbitrary mapping. The target set of f being finite, there exist integers j > i ⩾ 0 with $f^j(x_0) = f^i(x_0)$. If f behaves sufficiently randomly, then, with high probability, this phenomenon will already occur after the first $O(\sqrt{p})$ steps. The function $f(x) = x^2 + a \bmod p$ with a ∉ {−2, −1, 0} is often used; $f(x) = x^2 + 1 \bmod p$ is a typical example. As, unfortunately, we do not know p yet, we compute $F(x) = x^2 + a \bmod n$ instead of f. This yields
$$F^i(x_0) \equiv f^i(x_0) \bmod p$$
Now, we hope to find two elements that are the same in the f-sequence, but different in the F-sequence. For the corresponding indices i ≠ j, we have
$$f^i(x_0) \equiv f^j(x_0) \bmod p$$
$$F^i(x_0) \not\equiv F^j(x_0) \bmod n$$
Using this pair (i, j), we now obtain a nontrivial factor of n by computing $\gcd(F^j(x_0) - F^i(x_0), n)$. The problem is that storing each of the $F^i(x_0)$ is too expensive, as is the comparison of all $F^i(x_0)$ with each other. The solution is to continually increase the distance which we suspect is between i and j. Let $x_i = F^i(x_0)$ and $y_i = F^{2i}(x_0)$. If $x_i \equiv x_{i+k} \bmod p$, then we obtain $x_j \equiv x_{j+k} \bmod p$ for all j ⩾ i. In particular, for j = kℓ ⩾ i we have $x_j \equiv x_{2j} = y_j \bmod p$. This leads to the following algorithm:
(1) Choose $x_0 \in \{0, \ldots, n-1\}$ at random, and let $y_0 = x_0$.
(2) For all i ⩾ 1, successively consider the pairs $(x_i, y_i) = (F(x_{i-1}), F^2(y_{i-1}))$ and check whether $\gcd(y_i - x_i, n)$ yields a nontrivial divisor of n.
(3) Once $\gcd(y_i - x_i, n) = n$ holds, the procedure is stopped and restarted with a new initial value $x_0$ or a different function F.

If $\gcd(y_i - x_i, n) = n$ holds, then we have found a cycle for the F-sequence. The problem is that the ρ algorithm for factorization is only efficient if n has a sufficiently small prime factor p. The expected number of arithmetic operations of the procedure to find a divisor of n is $O(\sqrt{p})$. If the smallest prime divisor is of order $O(\sqrt{n})$, then $O(\sqrt[4]{n})$ operations have to be carried out. This is not feasible for large integers.
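The three steps above amount to running two copies of the F-sequence at single and double speed. A minimal Python sketch, assuming $F(x) = x^2 + a \bmod n$ and a fixed start value instead of a random one so that the run is reproducible (the function name and the defaults are illustrative):

```python
from math import gcd

def pollard_rho(n, x0=2, a=1):
    """Search for a nontrivial divisor of n with F(x) = x^2 + a mod n.

    Returns a divisor d with 1 < d < n, or None if the F-sequence closed its
    cycle first; in that case one restarts with another x0 or another a.
    """
    def F(x):
        return (x * x + a) % n

    x = y = x0                  # step (1)
    while True:
        x = F(x)                # x_i = F(x_{i-1})
        y = F(F(y))             # y_i = F^2(y_{i-1})
        d = gcd(y - x, n)       # step (2)
        if d == n:              # step (3): cycle of the F-sequence found
            return None
        if d > 1:
            return d
```

For n = 8051 = 83 ⋅ 97 with the defaults, the pairs $(x_i, y_i)$ are (5, 26), (26, 7474) and (677, 871); the difference 871 − 677 = 194 = 2 ⋅ 97 reveals the factor 97 in the third step.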
3.6.3 Quadratic sieve
The quadratic sieve is a factorization algorithm which does not require the input n to have some special structure. The basic idea is to find integers x and y such that $x^2 \equiv y^2 \bmod n$ and $x \not\equiv \pm y \bmod n$. In this case, both gcd(x − y, n) and gcd(x + y, n) yield a nontrivial factor of n.
We first fix a factor base B consisting of −1 together with a list of prime numbers. A typical choice is B = {−1} ∪ { p ⩽ b | p is prime } for some bound b depending on n. An integer is B-smooth if it can be written as a product of elements in B. In the first phase, we compute a large set of numbers S such that $r^2 \bmod n$ is B-smooth for all r ∈ S. Then, in the second phase, we choose a subset T of S such that all prime exponents in $\prod_{r \in T} (r^2 \bmod n)$ are even. Finally, we set $x = \prod_{r \in T} r$ and y is computed from $\prod_{r \in T} (r^2 \bmod n)$ by halving all prime exponents. By construction, we have $x^2 \equiv y^2 \bmod n$. However, if x ≡ ±y mod n, then we either have to try a different choice for T, compute an even larger set S, or start all over again with a bigger bound b.
Let us consider the first phase in more detail. Let m = ⌊√n⌋ and R = {−c + m, . . . , −1 + m, m, 1 + m, . . . , c + m} for some bound c significantly smaller than m/2. We define $f(r) = r^2 - n$ for r ∈ R. Note that |f(r)| is in the order of magnitude of 2cm < n. We want to compute
\[ S = \{\, r \in R \mid f(r) \text{ is } B\text{-smooth} \,\} \]
For every element r ∈ S, there is a factorization
\[ f(r) = \prod_{p \in B} p^{e_{p,r}} \]
The computation of S starts with the vector (f(−c + m), . . . , f(c + m)). For every prime p ∈ B, if some entry in this vector is divisible by p, we divide it by p as often as possible. If we end up with an entry 1 or −1, then the initial integer at this position is B-smooth. Note that this computation also gives the prime exponents $e_{p,r}$. It is too costly to check for every entry whether it is divisible by p. Here, the sieving comes into play. The entry f(r) is divisible by p if and only if $f(r) = r^2 - n \equiv 0 \bmod p$. We compute solutions $r_1$ and $r_2$ of the quadratic equation $r^2 \equiv n \bmod p$; this can be achieved with Tonelli’s or Cipolla’s algorithm from Section 3.5. We have $r_i^2 \equiv n \bmod p$, and f(r) is divisible by p if and only if $r \in \{r_1, r_2\} + p\mathbb{N}$. By construction of the vector, it suffices to consider c/p entries.
In the second phase, we want to compute a subset T ⊆ S such that in the product
\[ \prod_{r \in T} f(r) = \prod_{p \in B} p^{\sum_{r \in T} e_{p,r}} \]
all exponents $\sum_{r \in T} e_{p,r}$ for p ∈ B are even (including the exponent of −1). To this end, we need to find a nontrivial solution of the following system of linear equations modulo 2 with variables $X_r \in \{0, 1\}$:
\[ \Bigl( \sum_{r \in S} e_{p,r} X_r = 0 \Bigr)_{p \in B} \]
In particular, we want to have |S| ⩾ |B|. The set T is given by
\[ T = \{\, r \in S \mid X_r = 1 \,\} \]
Finally, we set $x = \prod_{r \in T} r$ and $y = \prod_{p \in B} p^{(\sum_{r \in T} e_{p,r})/2}$. If B is too small, we have to choose a huge c in order to have |S| ⩾ |B|. On the other hand, if B is large, the sieving takes a lot of time and we have to solve a large system of linear equations. Typically, you start with rather “small” values of b and c in the order of magnitude of 2√log n and then increase them if the algorithm does not succeed with the factorization of n.
There are many optimizations of the quadratic sieve. For instance, it suffices to only consider primes p in the factor base such that n modulo p is a square; otherwise the quadratic equation $r^2 \equiv n \bmod p$ has no solution.
Example 3.15. Let n = 91. We consider the factor base B = {−1, 2, 3, 5}. With m = 9 and c = 2, we obtain R = {7, 8, 9, 10, 11}. For r ∈ R, we compute $f(r) = r^2 - 91$. This yields the initial vector (f(7), . . . , f(11)) = (−42, −27, −10, 9, 30). The following table summarizes the sieving stage.

r                        7     8     9    10    11
f(r)                   −42   −27   −10     9    30
Sieving with p = 2     −21   −27    −5     9    15
Sieving with p = 3      −7    −1    −5     1     5
Sieving with p = 5      −7    −1    −1     1     1

The B-smooth numbers are $-27 = (-1) \cdot 3^3$ and $-10 = (-1) \cdot 2 \cdot 5$ and $9 = 3^2$ and $30 = 2 \cdot 3 \cdot 5$. Possible choices for T are $T_1 = \{8, 9, 10, 11\}$ and $T_2 = \{8, 9, 11\}$ and $T_3 = \{10\}$. For $T_1$, we obtain $x_1 = 8 \cdot 9 \cdot 10 \cdot 11 = 7920$ and $y_1 = 2 \cdot 3^3 \cdot 5 = 270$; we can omit the sign of $y_1$ since both $+y_1$ and $-y_1$ are considered. We have $\gcd(x_1 - y_1, 91) = 1$ and $\gcd(x_1 + y_1, 91) = 91$. Therefore, $T_1$ does not yield a factorization of 91. For $T_2$, we obtain $x_2 = 8 \cdot 9 \cdot 11 = 792$ and $y_2 = 2 \cdot 3^2 \cdot 5 = 90$. This yields the factorization 91 = 13 ⋅ 7 since $\gcd(x_2 - y_2, 91) = 13$ and $\gcd(x_2 + y_2, 91) = 7$. Finally, for $T_3$, we obtain $x_3 = 10$ and $y_3 = 3$. This immediately yields the factors $x_3 - y_3 = 7$ and $x_3 + y_3 = 13$. ◊
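Example 3.15 is small enough to replay in code. The following Python sketch is a toy version under two simplifying assumptions that deviate from the text: smoothness is tested by plain trial division instead of sieving, and the subset T is found by trying all subsets instead of solving the linear system modulo 2. The function names are hypothetical.

```python
from itertools import combinations
from math import gcd, isqrt

def smooth_exponents(value, primes):
    """Exponents of value over [-1] + primes, or None if value is not smooth."""
    if value == 0:
        return None
    exps = [1 if value < 0 else 0]      # exponent of -1
    value = abs(value)
    for p in primes:
        e = 0
        while value % p == 0:
            value //= p
            e += 1
        exps.append(e)
    return exps if value == 1 else None

def toy_quadratic_sieve(n, primes, c):
    """Return (divisor, T) for the first subset T that splits n, else None."""
    m = isqrt(n)
    relations = {}                      # r -> exponent vector of f(r) = r^2 - n
    for r in range(m - c, m + c + 1):
        exps = smooth_exponents(r * r - n, primes)
        if exps is not None:
            relations[r] = exps
    for size in range(1, len(relations) + 1):
        for T in combinations(sorted(relations), size):
            total = [sum(relations[r][k] for r in T)
                     for k in range(len(primes) + 1)]
            if any(e % 2 for e in total):
                continue                # some exponent is odd
            x = 1
            for r in T:
                x = x * r % n
            y = 1
            for p, e in zip(primes, total[1:]):
                y = y * pow(p, e // 2, n) % n
            d = gcd(x - y, n)
            if 1 < d < n:
                return d, T
    return None

print(toy_quadratic_sieve(91, (2, 3, 5), 2))   # (7, (10,)), the choice T3 above
```

The brute-force subset search is exponential in |S| and only acceptable for toy inputs; a real implementation would use Gaussian elimination over the two-element field.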
3.7 Discrete logarithm Currently, no efficient methods for integer factorization are known, and it is widely believed that no such methods exist. The security of many cryptosystems depends heavily on this assumption. However, you cannot completely rule out the possibility that one day factorizing can be done sufficiently fast. Therefore, we are interested in
cryptosystems whose security is based on the hardness of other computational problems. One such problem is the discrete logarithm. Let G be a group and g ∈ G an element of order q. Let U be the subgroup of G generated by g, so |U| = q. Then, $x \mapsto g^x$ defines an isomorphism ℤ/qℤ → U which can be computed using fast exponentiation in many applications. The discrete logarithm problem is the task of determining the element x mod q from the knowledge of the elements g and $g^x$ in G. If G is the additive group (ℤ/nℤ, +, 0), this is easy using the extended Euclidean algorithm. But what should be done when the Euclidean algorithm is not available? It seems that, then, the inverse mapping of $x \mapsto g^x$ cannot be computed efficiently. In applications, it is therefore sufficient to work with the multiplicative group of a finite field or a finite group of points on an elliptic curve. With elliptic curves, you can apparently reach the same level of security with fewer bits, but since we have not yet dealt with elliptic curves, we will restrict ourselves to the case $\mathbb{F}_p^*$ for a prime number p.
We fix the prime number p and the element $g \in \mathbb{F}_p^*$. Let the order of g in $\mathbb{F}_p^*$ be q. Then, in particular, q is a divisor of p − 1. Suppose we are given $h = g^x \in \mathbb{F}_p^*$. The task is to determine x ⩽ p − 1 with $g^x = h$. The naive method is to exhaustively try out all possible values for x. The expected search time is approximately q/2. This is hopeless if q, written in binary, has a length of 80 bits or more. In typical applications, q and p have a bit size of about 160 to 1000 bits each. The computation of $x^y \bmod z$ is very efficient when using fast modular exponentiation, even for numbers x, y, z with more than 1000 bits. Therefore, the mapping ℤ/qℤ → U with $x \mapsto g^x$ can, in fact, be computed efficiently. The currently known algorithms for the discrete logarithm problem on inputs of bit size log n achieve a runtime of O(√n) on elliptic curves. This is still exponential in the bit size log n; it is significantly better than O(n), but too slow to treat 160-bit numbers reasonably. Note that a 160-bit number has an order of magnitude of $2^{160}$.
3.7.1 Shanks’ baby-step giant-step algorithm
Let G be a finite group of order n, let g ∈ G be an element of order q, and let y be a power of g. We search for a number x with $g^x = y$, the discrete logarithm of y to base g. A classical algorithm to compute the discrete logarithm is the baby-step giant-step algorithm by Shanks. Here, a number m ∈ ℕ is chosen, which should be of approximate size √q. If q is unknown, you can instead choose m to have approximate size √n and substitute q by n in the following. For all r ∈ {m − 1, . . . , 0} in the baby steps, one computes the values $g^{-r} = g^{q-r} \in G$ and stores the pairs $(r, y g^{-r})$ in a table B. In practical implementations, you should use hash tables for efficiency reasons. Note that for exactly one r with 0 ⩽ r < m there is a number s with x = sm + r. Substituting $z = y g^{-r}$, the pair (r, z) is in the table and $z = g^{sm}$. The giant steps are as follows. We compute $h = g^m$, and then for all s with 0 ⩽ s < ⌈q/m⌉, we check whether
a table entry $(r, h^s)$ exists. If we find such numbers r and s, then x = sm + r is the desired value. The running time is dominated by m, that is, approximately √q or √n, respectively. This is not particularly pleasing, but no faster methods are known for general groups. A bigger problem with this method is the huge amount of memory needed to store the m pairs (r, z). Even if you can reduce the table size by hashing and other elaborate methods, an enormous amount of data still has to be maintained and stored. For realistic sizes like $q \geq 2^{100}$, the baby-step giant-step algorithm is not feasible.
3.7.2 Pollard’s rho algorithm for the discrete logarithm
As before, let G be a finite group of order n, let g ∈ G be an element of order q, and y a power of g. We search for a number x with $g^x = y$. The runtime of Pollard’s ρ algorithm to compute x is similar to that of Shanks’ baby-step giant-step algorithm; however, it requires only a constant amount of storage space. The price for this advantage is that the ρ algorithm yields no worst-case time bound, but rather a bound that is based on the birthday paradox. It only provides an expected time complexity of O(√q). First, we decompose G into three disjoint sets, $P_1$, $P_2$ and $P_3$. In $G = \mathbb{F}_p^*$, for example, you could define the $P_i$ by residues modulo 3. In principle, the exact decomposition is irrelevant, but it should be easy to compute and ensure that the following scheme yields something like a random walk through the subgroup of G generated by g. We think of a pair (r, s) ∈ ℤ/qℤ × ℤ/qℤ as a representation of $g^r y^s \in G$. The above partition defines a mapping f : ℤ/qℤ × ℤ/qℤ → ℤ/qℤ × ℤ/qℤ as follows:
\[ f(r, s) = \begin{cases} (r + 1, s) & \text{if } g^r y^s \in P_1 \\ (2r, 2s) & \text{if } g^r y^s \in P_2 \\ (r, s + 1) & \text{if } g^r y^s \in P_3 \end{cases} \]
Let $f(r, s) = (r', s')$ and $h = g^r y^s$ and $h' = g^{r'} y^{s'}$; then $h' = gh$ for $h \in P_1$, $h' = h^2$ for $h \in P_2$ and $h' = hy$ for $h \in P_3$. The ρ algorithm starts with a random pair $(r_1, s_1)$ and iterates the computation by $(r_{i+1}, s_{i+1}) = f(r_i, s_i)$ for i ⩾ 1. We define $h_i = g^{r_i} y^{s_i}$ and observe that $h_{i+1}$ can uniquely be determined from $h_i$. Therefore, the sequence $(h_1, h_2, \ldots)$ in G encounters a cycle after a finite initial segment and, therefore, it looks like the Greek letter ρ, again explaining the name of the algorithm. There exist t and r with $h_t = h_{t+r}$. Now $h_\ell = h_{\ell+r}$ holds for all ℓ ⩾ t. We use this fact to keep the required storage space bounded by a constant. We could proceed as we did in Section 3.6.2 to find different indices i, j with $h_i = h_j$. However, we want to use an alternative approach, which could also be used in the ρ algorithm for factorization. We work in phases: at the beginning of each phase, a value $h_\ell$ is stored. In the first phase,
we let ℓ = 1. Then, for k ∈ {1, . . . , ℓ}, we successively compute the value $h_{\ell+k}$ and compare it with $h_\ell$. We stop the procedure as soon as a k is found such that $h_\ell = h_{\ell+k}$. Otherwise, we replace $h_\ell$ by $h_{2\ell}$ and ℓ by 2ℓ and start a new phase. Note that, at the latest, we will stop if ℓ is greater than or equal to t + r because then, with k = r, we reach the situation $h_\ell = h_{\ell+k}$.
Thus, we stop with the pairs $(r, s) = (r_\ell, s_\ell)$ and $(r', s') = (r_{\ell+k}, s_{\ell+k})$, for which $g^r y^s = g^{r'} y^{s'}$. This implies $g^{r+xs} = g^{r'+xs'}$ and therefore $r + xs \equiv r' + xs' \bmod q$. Now, $x(s - s') \equiv r' - r \bmod q$. The probability for $s = s'$ is small and if q is prime, we can solve the congruence uniquely. This method, however, can also be applied to more general situations and we might possibly obtain different x to solve the last congruence. If possible, we check for all these solutions x whether $g^x = y$ holds. Otherwise, we restart the procedure, using another random pair $(r_1, s_1)$.
The analysis of the running time depends on the expected minimal value t + r, yielding $h_t = h_{t+r}$. If the sequence $(h_1, h_2, \ldots)$ behaves randomly, then we look for k such that the probability for the existence of $h_t = h_{t+r}$ with 1 ⩽ t < t + r ⩽ k is greater than 1/2. The birthday paradox leads to an expected running time of O(√q) for Pollard’s ρ algorithm.
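The phase-based search can be sketched as follows for $G = \mathbb{F}_p^*$ and prime order q. This is a sketch under assumptions: the name `rho_dlog`, the seed parameter and the partition by the integer residue of h modulo 3 are illustrative choices; the modular inverse `pow(ds, -1, q)` requires Python 3.8 or later.

```python
import random

def rho_dlog(g, y, p, q, seed=0):
    """Find x with g^x = y in F_p^*, where g has prime order q."""
    rng = random.Random(seed)
    y %= p

    def step(h, r, s):
        part = h % 3
        if part == 0:                        # P1: h' = g*h, (r+1, s)
            return h * g % p, (r + 1) % q, s
        if part == 1:                        # P2: h' = h^2, (2r, 2s)
            return h * h % p, 2 * r % q, 2 * s % q
        return h * y % p, r, (s + 1) % q     # P3: h' = h*y, (r, s+1)

    while True:
        r, s = rng.randrange(q), rng.randrange(q)
        stored = (pow(g, r, p) * pow(y, s, p) % p, r, s)   # h_1
        ell, match = 1, None
        while match is None:
            current = stored
            for _ in range(ell):             # compute h_(ell+k), k = 1..ell
                current = step(*current)
                if current[0] == stored[0]:
                    match = current
                    break
            else:                            # no hit: store h_(2*ell), double ell
                stored, ell = current, 2 * ell
        _, r1, s1 = stored
        _, r2, s2 = match
        ds = (s1 - s2) % q
        if ds == 0:
            continue                         # s = s': restart with a new pair
        x = (r2 - r1) * pow(ds, -1, q) % q   # x(s - s') = r' - r mod q
        if pow(g, x, p) == y:
            return x

# 4 has order 509 in F_1019^* (1019 = 2 * 509 + 1, both prime).
print(rho_dlog(4, pow(4, 200, 1019), 1019, 509))   # 200
```

Only the stored triple and the current triple are kept, which reflects the constant storage requirement stated above.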
3.7.3 Pohlig–Hellman algorithm for group order reduction
If the prime factorization of the order of a cyclic group G is known, then the discrete logarithm problem can be reduced to the prime divisors. The corresponding method was published by Stephen Pohlig and Martin Hellman. Let |G| = n with
\[ n = \prod_{p \mid n} p^{e(p)} \]
where the product runs over all prime divisors p of n. Let y, g ∈ G be given such that y is a power of g. We look for a number x with $y = g^x$. We define
\[ n_p = \frac{n}{p^{e(p)}}, \qquad g_p = g^{n_p}, \qquad y_p = y^{n_p} \]
The elements $G_p = \{\, h^{n_p} \mid h \in G \,\}$ form a subgroup of G. If h generates G, then $h^{n_p}$ generates $G_p$. In particular, $G_p$ contains exactly $p^{e(p)}$ elements. The following theorem shows that it is sufficient to solve the discrete logarithm problem in the groups $G_p$.
Theorem 3.16. For all primes p with p | n let $x_p \in \mathbb{N}$ be given such that $y_p = g_p^{x_p}$. If $x \equiv x_p \bmod p^{e(p)}$ for all primes p | n, then $y = g^x$.
Proof: We have $(g^{-x} y)^{n_p} = g_p^{-x_p} y_p = 1$. Thus, the order of $g^{-x} y$ is a divisor of $n_p$ for all p | n. The greatest common divisor of all $n_p$ is 1, so the order of $g^{-x} y$ also must be 1. This shows $y = g^x$.
Using the Chinese remainder theorem, Theorem 3.16 lets us compute a solution in G from the solutions in the groups $G_p$. We further simplify the problem for $G_p$ by
reducing the discrete logarithm to groups of order p. Without loss of generality let now $|G| = p^e$ for a prime number p. Since $x < p^e$, there exists a unique representation
\[ x = x_0 \cdot p^0 + \cdots + x_{e-1} \cdot p^{e-1} \]
with $0 \leq x_i < p$. We successively compute $x_0, \ldots, x_{e-1}$ as follows. Let $x_0, \ldots, x_{i-1}$ be known already for i ⩾ 0. For $z_i = y\, g^{-(x_0 \cdot p^0 + \cdots + x_{i-1} \cdot p^{i-1})}$ we have
\[ g^{x_i \cdot p^i + \cdots + x_{e-1} \cdot p^{e-1}} = z_i \]
Raising both sides to the $p^{e-i-1}$th power yields
\[ \bigl(g^{p^{e-1}}\bigr)^{x_i} = z_i^{p^{e-i-1}} \tag{3.1} \]
since $g^{p^{e'}} = 1$ for $e' \geq e$. The element $g^{p^{e-1}}$ generates a group of order at most p, and the number $x_i$ is obtained by solving the discrete logarithm problem of equation (3.1) within this group. This can, for example, be done using Shanks’ baby-step giant-step algorithm or Pollard’s ρ algorithm with O(√p) group operations. For arbitrary cyclic groups G with |G| = n, we thus have a method for computing the discrete logarithm which requires
\[ O\Bigl( \sum_{p \mid n} e(p) \cdot \bigl( \log n + \sqrt{p} \bigr) \Bigr) \]
group operations. Here, the rough estimate e(p) ⋅ log n covers the computation of $g_p$, $y_p$, and $z_i$.
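The two reductions (to prime powers via Theorem 3.16, then to order p via equation (3.1)) can be sketched as follows for a subgroup of $\mathbb{F}_p^*$. Assumptions of this sketch: the function names are hypothetical, the factorization of the group order is passed in as a dictionary, and the order-p subproblem is solved by brute force in O(p) steps rather than by one of the O(√p) methods named in the text. Negative exponents in `pow` require Python 3.8 or later.

```python
def dlog_prime_power(g, y, mod, p, e):
    """x in [0, p^e) with g^x = y mod `mod`, where g has order p^e."""
    gamma = pow(g, p ** (e - 1), mod)            # element of order p
    x = 0
    for i in range(e):
        z = y * pow(g, -x, mod) % mod            # z_i = y * g^(-(x_0 + ... ))
        target = pow(z, p ** (e - i - 1), mod)   # right-hand side of (3.1)
        # Solve gamma^(x_i) = target by brute force (stand-in for Shanks/rho).
        x_i = next(d for d in range(p) if pow(gamma, d, mod) == target)
        x += x_i * p ** i
    return x

def pohlig_hellman(g, y, mod, factorization):
    """x with g^x = y mod `mod`; factorization maps p -> e(p) for n = ord(g)."""
    n = 1
    for p, e in factorization.items():
        n *= p ** e
    x, modulus = 0, 1
    for p, e in factorization.items():
        pe = p ** e
        n_p = n // pe
        x_p = dlog_prime_power(pow(g, n_p, mod), pow(y, n_p, mod), mod, p, e)
        # Chinese remaindering: keep x mod modulus, enforce x = x_p mod p^e.
        t = (x_p - x) * pow(modulus, -1, pe) % pe
        x += modulus * t
        modulus *= pe
    return x

# 2 has order 100 = 2^2 * 5^2 in F_101^*.
print(pohlig_hellman(2, pow(2, 73, 101), 101, {2: 2, 5: 2}))   # 73
```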
3.7.4 Index calculus
Index calculus is an algorithm for computing discrete logarithms in $(\mathbb{Z}/q\mathbb{Z})^*$ for some prime q. Given $g \in (\mathbb{Z}/q\mathbb{Z})^*$ and a ∈ ⟨g⟩, we want to compute an integer x with $a = g^x$. We choose some bound b and we let B = { p | p prime, p ⩽ b } be a factor base. Remember that an integer is B-smooth if and only if it can be written as a product of numbers in B. The index calculus algorithm consists of two stages. First, for every p ∈ B we compute $y_p$ with
\[ g^{y_p} \equiv p \bmod q \]
Second, we try to find y such that $a g^y \bmod q$ is B-smooth; this is done by randomly choosing y until this condition holds. Then, we can write $a g^y \bmod q = \prod_{p \in B} p^{e_p}$ and obtain
\[ a g^y \equiv \prod_{p \in B} p^{e_p} \equiv \prod_{p \in B} g^{e_p y_p} \equiv g^{\sum_{p \in B} e_p y_p} \bmod q \]
and thus $x = -y + \sum_{p \in B} e_p y_p \bmod (q - 1)$ satisfies $a \equiv g^x \bmod q$.
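It remains to describe the computation of the $y_p$. To this end, we randomly choose many different z such that $g^z \bmod q$ is B-smooth, that is, we can write $g^z \bmod q = \prod_{p \in B} p^{f(p,z)}$. We stop if we have found a large set Z of such numbers z, at least |B| + 1. Then the $y_p$’s are obtained as solutions of the system of linear equations
\[ \Bigl( z \equiv \sum_{p \in B} f(p, z)\, y_p \bmod (q - 1) \Bigr)_{z \in Z} \]
Note that the modulus is q − 1, the order of $(\mathbb{Z}/q\mathbb{Z})^*$, since the equations relate exponents of g.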
The expected running time of the index calculus algorithm is similar to that of the quadratic sieve. The main ingredient in the algorithm is the notion of B-smoothness. It is not easy to (efficiently) generalize this notion to groups other than $(\mathbb{Z}/q\mathbb{Z})^*$. This is one of the main reasons for considering elliptic curves over finite fields $\mathbb{F}_q$ with q odd; in this setting, it is widely believed that such a generalization does not exist. It also explains the popularity of cryptosystems based on elliptic curves: the same level of security can be achieved with smaller keys than for cryptosystems based on $(\mathbb{Z}/q\mathbb{Z})^*$.
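The second stage of index calculus can be illustrated with a toy sketch. It deliberately takes two shortcuts that are assumptions of this example and not part of the algorithm as described: the logarithms $y_p$ of the factor base are obtained by brute force instead of solving the linear system, and y is increased step by step instead of being chosen at random. All function names are hypothetical.

```python
def factor_over_base(value, base):
    """Exponent list of value over the factor base, or None if not B-smooth."""
    if value == 0:
        return None
    exps = []
    for p in base:
        e = 0
        while value % p == 0:
            value //= p
            e += 1
        exps.append(e)
    return exps if value == 1 else None

def brute_force_log(h, g, q):
    """Stand-in for stage one: smallest y with g^y = h mod q."""
    return next(y for y in range(q - 1) if pow(g, y, q) == h)

def index_calculus_stage_two(a, g, q, base, base_logs):
    """Find x with g^x = a mod q, given g^(y_p) = p mod q for all p in base."""
    y = 0
    while True:
        exps = factor_over_base(a * pow(g, y, q) % q, base)
        if exps is not None:                     # a * g^y mod q is B-smooth
            total = sum(e * y_p for e, y_p in zip(exps, base_logs))
            return (-y + total) % (q - 1)
        y += 1

q, g, base = 101, 2, [2, 3, 5, 7]
base_logs = [brute_force_log(p, g, q) for p in base]
x = index_calculus_stage_two(59, g, q, base, base_logs)
print(pow(g, x, q))   # 59
```

Since g generates the whole group in this example, the search for y always terminates: at the latest, $a g^y \bmod q = 1$ is reached, which is trivially B-smooth.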
3.8 Multiplication and division
The classical method of multiplying two binary numbers of length n requires $\Theta(n^2)$ operations. Let the two numbers r and s of 2k bits each be of the following form:

r = [ A | B ]        s = [ C | D ]

Here, A represents the k most significant bits of r and B the k least significant bits. Similarly, C and D represent the most and least significant bits of s. In other words, we can write $r = A\,2^k + B$ and $s = C\,2^k + D$. From that we obtain
\[ rs = AC\,2^{2k} + (AD + BC)\,2^k + BD \]
Instead of following this approach, Karatsuba’s algorithm (Anatoly Alexeyevich Karatsuba, 1937–2008) recursively computes the three products AC, (A + B)(C + D), and BD. Then, rs can be computed performing only three multiplications of numbers with bit length at most k + 1:
\[ rs = AC\,2^{2k} + (A + B)(C + D)\,2^k - (AC + BD)\,2^k + BD \]
Let $t_{\mathrm{mult}}(n)$ be the time which this recursive method takes to multiply numbers of n bits. Since n-bit numbers can be added in time O(n), the master theorem yields the following estimate for $t_{\mathrm{mult}}(n)$:
\[ t_{\mathrm{mult}}(n) = 3 \cdot t_{\mathrm{mult}}(n/2) + O(n) \in O(n^{\log_2 3}) = O(n^{1.58496\cdots}) \]
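The recursion can be sketched directly on Python integers. The cutoff value 16 for the base case and the name `karatsuba` are arbitrary choices of this sketch; Python's built-in multiplication is used only for such small operands.

```python
def karatsuba(r, s):
    """Multiply non-negative integers with three recursive products."""
    if r < 16 or s < 16:
        return r * s                              # small base case
    k = max(r.bit_length(), s.bit_length()) // 2
    mask = (1 << k) - 1
    A, B = r >> k, r & mask                       # r = A * 2^k + B
    C, D = s >> k, s & mask                       # s = C * 2^k + D
    AC = karatsuba(A, C)
    BD = karatsuba(B, D)
    middle = karatsuba(A + B, C + D) - AC - BD    # equals AD + BC
    return (AC << (2 * k)) + (middle << k) + BD

print(karatsuba(123456789, 987654321) == 123456789 * 987654321)   # True
```

The shifts implement the multiplications by $2^k$ and $2^{2k}$, so apart from the three recursive calls only additions, subtractions and shifts occur.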
Thus, by a divide-and-conquer approach, we have reduced the exponent of the classical method from 2 to approximately 1.58496. Another important arithmetic operation is the computation of a mod m. Because of
\[ a \bmod m = a - m \left\lfloor \frac{a}{m} \right\rfloor \]
this operation can be reduced to subtraction, multiplication and integer division. For subtraction, the complexity of the classical method is linear in the number of digits. For multiplication, we have just seen an algorithm with subquadratic running time. So, now division remains to be investigated. To this end, we sketch in the following a method, which, using only subtractions and multiplications, is able to compute sufficiently many decimal places of the reciprocal of a given integer. Once we know enough bits of $\frac{1}{m}$ after the decimal point, we can compute $\lfloor \frac{a}{m} \rfloor$ by multiplying a and $\frac{1}{m}$. Those bit values of $\frac{1}{m}$ contributing merely to the fractional part of this product need not be computed.
To approximate $\frac{1}{m}$, we can use the Newton method (Sir Isaac Newton, 1643–1727) to find the root of $f(x) = \frac{1}{x} - m$. For this purpose, we compute better and better approximations $x_i$ to this root by the rule
\[ x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)} \]
In our particular case, this results in the rule
\[ x_{i+1} = 2x_i - m x_i^2 \]
If we begin with an initial value $x_0$ between 0 and $\frac{1}{m}$, the sequence $(x_i)_{i \in \mathbb{N}}$ very quickly converges to $\frac{1}{m}$. A good initial value is, for example, $x_0 = 2^{-\lfloor \log_2 m \rfloor - 1}$. The value $\lfloor \log_2 m \rfloor + 1$ can be found easily by counting the number of binary digits of m. Without further computation, the initial value $x_0$ can be specified in binary by a sequence of zeros after the decimal point, followed by a 1. The problem that we have to work with noninteger numbers can be eliminated by previously multiplying by a sufficiently large power of 2; this can be achieved by a simple shift operation.
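A fixed-point version of this iteration can be sketched as follows. Assumptions of the sketch: the scaling exponent k, the stopping rule and the final correction loops are choices made here to obtain an exact quotient, and the name `floor_div_newton` is hypothetical. The integer X stands for $2^k x_i$, so the rule becomes $X \mapsto 2X - \lfloor mX^2 / 2^k \rfloor$.

```python
def floor_div_newton(a, m):
    """Compute a // m for a >= 0, m >= 1 with shifts, +, - and * only."""
    k = a.bit_length() + m.bit_length() + 1      # fixed-point precision
    x = 1 << (k - m.bit_length())                # x0 = 2^k * 2^(-floor(log2 m) - 1)
    while True:
        nxt = 2 * x - ((m * x * x) >> k)         # Newton step, scaled by 2^k
        if nxt <= x:                             # no progress: x >= 2^k / m
            break
        x = nxt
    quotient = (a * x) >> k                      # a * (1/m), fraction dropped
    # The approximation may be off by one; correct with multiplications only.
    while (quotient + 1) * m <= a:
        quotient += 1
    while quotient * m > a:
        quotient -= 1
    return quotient

a, m = 10**30 + 7, 123456789
print(floor_div_newton(a, m) == a // m)          # True
print(a - m * floor_div_newton(a, m) == a % m)   # True
```

The second line of the usage example is the identity $a \bmod m = a - m \lfloor a/m \rfloor$ from above.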
3.9 Discrete Fourier transform
The goal of the discrete Fourier transform (named after Jean Baptiste Joseph Fourier, 1768–1830) is to multiply polynomials very efficiently. Let R be a commutative ring and b ⩾ 1 a natural number with an inverse in R. That is, the multiplicative inverse $b^{-1} \in R$ exists. This is always the case if R is a field of characteristic zero, like ℂ. It also holds in fields of characteristic p such that gcd(b, p) = 1. And it is also true if $b = 2^r$ is a power of two and R = ℤ/nℤ for odd n. Note that ℤ/nℤ may have various zero divisors.
The b-fold direct product $R^b$ of R consists of all vectors $(u_0, \ldots, u_{b-1})$ with $u_i \in R$. For $R^b$, we can consider two different multiplications to be defined in a natural way. First, we can perform a componentwise multiplication:
\[ (u_0, \ldots, u_{b-1}) \cdot (\upsilon_0, \ldots, \upsilon_{b-1}) = (u_0 \upsilon_0, \ldots, u_{b-1} \upsilon_{b-1}) \]
Or we interpret a vector $(u_0, \ldots, u_{b-1})$ as a polynomial $\sum_i u_i X^i$ and multiply polynomials in the quotient ring $R[X]/(X^b - 1)$. To distinguish these two kinds of multiplication, we denote the product of the polynomials f and g as f ∗ g (it is called the convolution product). By convention $u_i = 0$ for all i < 0 and all i ⩾ b. This eases notation because we need no explicit summation limits. In this quotient ring, the identity $X^b = 1$ holds, and we obtain
\[ \Bigl( \sum_i u_i X^i \Bigr) * \Bigl( \sum_i \upsilon_i X^i \Bigr) = \sum_i \Bigl( \sum_j u_j \upsilon_{i-j} \Bigr) X^i = \sum_i \Bigl( \sum_j u_j \upsilon_{i-j} \Bigr) X^{i \bmod b} \]
Let now ω ∈ R be a bth root of unity, that is, an element satisfying $\omega^b = 1$. Then, we can evaluate $f(X) \in R[X]/(X^b - 1)$ at powers $\omega^i$ and $f(X) \mapsto (f(1), f(\omega), f(\omega^2), \ldots, f(\omega^{b-1}))$ defines a ring homomorphism from $R[X]/(X^b - 1)$ into $R^b$. A primitive bth root of unity is an element ω ∈ R \ {1} satisfying the following conditions:
(a) $\omega^b = 1$,
(b) $\sum_{i=0}^{b-1} \omega^{ki} = 0$ for all 1 ⩽ k < b.
If ω is a primitive bth root of unity, then $\omega^{-1}$ is as well, since the equation in condition (b) can be multiplied by $\omega^{-k(b-1)}$. If c ⩾ 1 divides b, then $\omega^c$ is a primitive (b/c)th root of unity in R because we have
\[ \sum_{i=0}^{b/c-1} \omega^{kci} = c^{-1} \sum_{i=0}^{b/c-1} c\, \omega^{k'i} = c^{-1} \sum_{i=0}^{b-1} \omega^{k'i} = c^{-1} \cdot 0 = 0 \]
with $k' = kc < b$. Note that, in the third term of the equation above, each power $\omega^{k'i}$ is counted exactly c times. From Corollary 1.57, the complex number $e^{(2\pi\sqrt{-1})/b}$ is a primitive bth root of unity in ℂ.
The discrete Fourier transform is possible in each ring R in which b has a multiplicative inverse $b^{-1}$ and where a primitive bth root of unity ω exists. For such rings R, the polynomial ring $R[X]/(X^b - 1)$ with the identity $X^b = 1$ and the direct product $R^b$ are isomorphic. The isomorphism can be explained by matrix multiplications. The {0, . . . , b − 1} × {0, . . . , b − 1} matrices $F = (\omega^{ij})_{i,j}$ and $\bar{F} = (\omega^{-ij})_{i,j}$ satisfy
\[ F \cdot \bar{F} = (\omega^{ij})_{i,j} \cdot (\omega^{-ij})_{i,j} = \Bigl( \sum_{k=0}^{b-1} \omega^{ik-kj} \Bigr)_{i,j} = \Bigl( \sum_{k=0}^{b-1} \omega^{k(i-j)} \Bigr)_{i,j} \]
Here, $(a_{ij})_{i,j}$ denotes the matrix with entry $a_{ij}$ at the (i, j)-position. For i ≠ j we have $\sum_{k=0}^{b-1} \omega^{k(i-j)} = 0$, and for i = j this sum equals b. Thus, $F \cdot \bar{F}$ is a diagonal matrix with value b on the diagonal. In particular, the matrices F and $\bar{F}$ are invertible and $F \cdot \bar{F} \cdot b^{-1}$ is the unit matrix. Now, $(a_0, \ldots, a_{b-1}) \cdot F = (\sum_k a_k \omega^{kj})_j$ and we obtain the following ring isomorphism $F : R[X]/(X^b - 1) \to R^b$, mapping each element $f(X) = \sum_i a_i X^i$ to
\[ (a_0, \ldots, a_{b-1}) \cdot F = (f(1), f(\omega), f(\omega^2), \ldots, f(\omega^{b-1})) \]
In particular, multiplication with the matrix F can be interpreted as evaluating a polynomial f at points $1, \omega, \omega^2, \ldots, \omega^{b-1}$. The mapping $F : R[X]/(X^b - 1) \to R^b$ is called discrete Fourier transform. For the inverse mapping, the matrix F is replaced by $\bar{F}$ and in the end the result is multiplied by the scalar $b^{-1}$.

[Figure: coefficient sequence for f and g → (evaluation F) → sequences $f(\omega^i)$, $g(\omega^i)$ → (pointwise multiplication) → sequence $f(\omega^i) \cdot g(\omega^i)$ → (interpolation $\bar{F} \cdot b^{-1}$) → coefficient sequence for the product f ∗ g]
Fig. 3.1. Computing f ∗ g using the discrete Fourier transform.
To compute the coefficients $z_i$ in $f(X) * g(X) = \sum_i z_i X^i$ according to Figure 3.1, we use the following strategy:
(1) Compute F(f(X)) and F(g(X)), that is, $f(\omega^i)$ and $g(\omega^i)$ for 0 ⩽ i < b.
(2) Evaluate the products $h_i = f(\omega^i) \cdot g(\omega^i)$ in R for 0 ⩽ i < b.
(3) Compute $(h_0, \ldots, h_{b-1}) \cdot \bar{F} \cdot b^{-1} = (z_0, \ldots, z_{b-1})$.
Since F is a ring isomorphism, we have
\[ f * g = (F(f) \cdot F(g)) \cdot \bar{F} \cdot b^{-1} \]
Next, let us consider the special case that b is a power of 2, that is, $b = 2^r$ for r ⩾ 1. Then, the computation of F(f(X)) can efficiently be performed using a divide-and-conquer strategy. This leads to the fast Fourier transform (abbreviated FFT) as follows. We write polynomials f(X) of degree less than b in the form
\[ f(X) = f_0(X^2) + X f_1(X^2) \]
The polynomials $f_j$ have degree less than b/2 and $\omega^2$ is a primitive (b/2)th root of unity. If $f = (a_0, \ldots, a_{b-1})$, then $f_0 = (a_0, a_2, \ldots, a_{b-2})$ and $f_1 = (a_1, a_3, \ldots, a_{b-1})$.
This yields
\[ f(\omega^i) = f_0(\omega^{2i}) + \omega^i f_1(\omega^{2i}) \]
Let $F' : R[X]/(X^{b/2} - 1) \to R^{b/2}$ be the Fourier transform of dimension b/2 with the root of unity $\omega^2$. If we have computed $F'(f_0) = (u_0, \ldots, u_{b/2-1})$ and $F'(f_1) = (\upsilon_0, \ldots, \upsilon_{b/2-1})$, then we can determine F(f) as follows. With vectors
\[ u = (u_0, \ldots, u_{b/2-1}, u_0, \ldots, u_{b/2-1}) \qquad \upsilon = (\upsilon_0, \ldots, \upsilon_{b/2-1}, \upsilon_0, \ldots, \upsilon_{b/2-1}) \qquad w = (1, \omega, \omega^2, \ldots, \omega^{b-1}) \]
of length b, we determine by componentwise computation
\[ F(f) = u + w \cdot \upsilon \]
The transformed $F'(f_0)$ and $F'(f_1)$ can be recursively computed with the same procedure. The additions in R and multiplications with $\omega^i$ and $b^{-1}$ are called elementary arithmetic operations. Let t(b) be the number of elementary arithmetic operations necessary in this scheme for computing F(f). Then, t(b) satisfies the recurrence t(b) ⩽ 2 t(b/2) + O(b) since to determine the Fourier transform two recursive calls for $f_0$ and $f_1$ of half the size are needed. The results of these two recursive calls will be combined to obtain the transform of f using a linear number of operations. The master theorem, Theorem 3.2, then yields t(b) ∈ O(b log b).
If we want to compute the product f(X) ∗ g(X) of two polynomials of degree less than d in R[X], then we can choose b as the smallest power of 2 with b ⩾ 2d. If a primitive bth root of unity ω is known, then using the FFT according to the scheme in Figure 3.1, the product of f(X) and g(X) can be computed with only O(d log d) elementary arithmetic operations. The naive approach, in contrast, requires $O(d^2)$ elementary arithmetic operations.
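The recursive scheme and the three-step strategy of Figure 3.1 can be sketched for the ring R = ℤ/modℤ. The choice of ring, the function names and the example values (modulus 17 with ω = 2, which has order 8 in the field $\mathbb{F}_{17}$) are assumptions of this sketch; `pow(x, -1, mod)` requires Python 3.8 or later.

```python
def fft(coeffs, omega, mod):
    """Return (f(1), f(omega), ..., f(omega^(b-1))) in Z/modZ, b a power of 2."""
    b = len(coeffs)
    if b == 1:
        return [coeffs[0] % mod]
    omega_sq = omega * omega % mod
    u = fft(coeffs[0::2], omega_sq, mod)         # F'(f0): even coefficients
    v = fft(coeffs[1::2], omega_sq, mod)         # F'(f1): odd coefficients
    result, w = [0] * b, 1
    for i in range(b):                           # F(f) = u + w * v, componentwise
        half = i % (b // 2)
        result[i] = (u[half] + w * v[half]) % mod
        w = w * omega % mod
    return result

def cyclic_convolution(f, g, omega, mod):
    """Coefficients of f * g in (Z/modZ)[X]/(X^b - 1) via Figure 3.1."""
    b = len(f)
    values = [x * y % mod for x, y in zip(fft(f, omega, mod), fft(g, omega, mod))]
    # Interpolation: transform with omega^(-1), then multiply by b^(-1).
    z = fft(values, pow(omega, -1, mod), mod)
    b_inv = pow(b, -1, mod)
    return [c * b_inv % mod for c in z]

# (1 + 2X + 3X^2)(4 + 5X) = 4 + 13X + 22X^2 + 15X^3, reduced modulo 17.
f = [1, 2, 3, 0, 0, 0, 0, 0]
g = [4, 5, 0, 0, 0, 0, 0, 0]
print(cyclic_convolution(f, g, 2, 17))           # [4, 13, 5, 15, 0, 0, 0, 0]
```

Padding both inputs with zeros up to length b ⩾ 2d, as in the example, ensures that the reduction modulo $X^b - 1$ does not wrap any coefficient around.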
3.10 Primitive roots of unity
Let b ∈ ℕ and 𝔽 be a field of characteristic zero or characteristic p with gcd(b, p) = 1. Then, b is invertible in 𝔽. Moreover, the formal derivative of the polynomial $X^b - 1$ is $bX^{b-1}$. Therefore, the polynomial $X^b - 1$ has no multiple roots. In a field extension of 𝔽, the polynomial $X^b - 1$ splits into linear factors and the bth roots of unity in the field extension are pairwise distinct. The group of bth roots of unity thus has b elements. It is cyclic and generated by an element ω. Then, $\omega^b = 1$ and $\omega^k \neq 1$ for all 1 ⩽ k < b. This implies $(1 - \omega^k) \sum_{i=0}^{b-1} \omega^{ki} = 1 - \omega^{bk} = 0$ for all 1 ⩽ k < b. Since fields have no zero divisors, $\sum_{i=0}^{b-1} \omega^{ki} = 0$ follows; and the generator ω is a primitive bth root of unity. If 𝔽 = ℂ, for example, we may choose $\omega = e^{(2\pi\sqrt{-1})/b}$.
In general, however, a ring may have zero divisors, and therefore the properties $\omega^b = 1$ and $\omega^k \neq 1$ for all 1 ⩽ k < b are not sufficient to ensure $\sum_{i=0}^{b-1} \omega^{ki} = 0$. The remaining
part of this section is dedicated to the following theorem which is crucial for Schönhage and Strassen’s algorithm for the fast multiplication of large numbers.
Theorem 3.17. Let $b = 2^r$ and let m be a multiple of b, and let $n = 2^m + 1$. We define $\psi = 2^{m/b}$ and $\omega = \psi^2$. Then the following properties hold in ℤ/nℤ:
(a) $b^{-1} = -2^{m-r}$
(b) $\psi^b = -1$
(c) ω is a primitive bth root of unity.
Proof: In ℤ/nℤ we have $2^m = -1$. The claims $b^{-1} = -2^{m-r}$ and $\psi^b = -1$, as well as $\omega^b = 1$, are therefore trivial. We have to show $\sum_{i=0}^{b-1} \omega^{ki} = 0$ for all 1 ⩽ k < b. Since $b = 2^r$ is a power of 2, an induction on r yields
\[ \sum_{i=0}^{b-1} \omega^{ki} = (1 + \omega^k) \sum_{i=0}^{b/2-1} (\omega^2)^{ki} = \prod_{p=0}^{r-1} \bigl(1 + \omega^{2^p k}\bigr) \]
Now let $k = 2^\ell u$ with an odd number u. Using 0 ⩽ ℓ < r, we have a factor of the form $1 + \omega^{2^{r-1} u}$ in the above product. Now, $\omega^{2^{r-1}} = \psi^{2 \cdot 2^{r-1}} = \psi^b = -1$. Since u is odd, we obtain $1 + \omega^{2^{r-1} u} = 1 + (-1)^u = 1 - 1 = 0$.
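The three claims of Theorem 3.17 are easy to check numerically for small parameters. The following sketch does so by direct computation; the function name `check_theorem_3_17` and the chosen parameter pairs are illustrative.

```python
def check_theorem_3_17(r, m):
    """Verify (a), (b), (c) of Theorem 3.17 in Z/nZ for b = 2^r, n = 2^m + 1."""
    b = 1 << r
    if m % b != 0:
        raise ValueError("m must be a multiple of b")
    n = (1 << m) + 1
    psi = pow(2, m // b, n)
    omega = psi * psi % n
    claim_a = b * (-pow(2, m - r, n)) % n == 1            # b^(-1) = -2^(m-r)
    claim_b = pow(psi, b, n) == n - 1                     # psi^b = -1
    claim_c = pow(omega, b, n) == 1 and all(
        sum(pow(omega, k * i, n) for i in range(b)) % n == 0
        for k in range(1, b)
    )                                                     # primitive bth root
    return claim_a, claim_b, claim_c

print(check_theorem_3_17(2, 4))    # (True, True, True): b = 4, n = 17
print(check_theorem_3_17(3, 8))    # (True, True, True): b = 8, n = 257
```

For r = 3 and m = 24 the modulus $2^{24} + 1 = 97 \cdot 257 \cdot 673$ is not prime, so ℤ/nℤ has zero divisors; the theorem still applies in that case.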
3.11 Schönhage–Strassen integer multiplication
Multiplying two n-bit numbers requires $n^2$ bit operations when using the classical method. In 1960, Karatsuba noticed that it can be done much faster with a surprisingly simple divide-and-conquer approach; he showed that asymptotically less than $n^{1.6}$ operations are sufficient; see Section 3.8. A major breakthrough came when Arnold Schönhage (born 1934) and Volker Strassen were able to present a method with a nearly linear number of arithmetic operations in 1971 [91]. More specifically, they showed that the multiplication of two n-bit numbers can be done in time O(n log n log log n) on a multitape Turing machine. It took 35 more years, until Martin Fürer (born 1947) could further improve this time bound to $O(n \log n \, 2^{\log^* n})$; see [43]. Here, $\log^* n$ denotes the number of applications of the logarithm needed to have a result less than 1. The function $2^{\log^* n}$ asymptotically grows much slower than log log n; however, on all realistic inputs, log log n is smaller.

n               1     2     100    2^100    2^(2^100)    2^(2^(2^100))
log log n       1     1     3      7        100          2^100
2^(log* n)      1     2     16     32       64           128

In the remainder of this section, we prove the result of Schönhage–Strassen. Theorem 3.18. The multiplication of two natural numbers of bit size n can be done with O(n log n log log n) bit operations.
The input consists of two natural numbers u and υ. We may assume that the binary representation of the product uυ requires at most n bits and that $n = 2^s$ is a power of 2. It is sufficient to compute uυ modulo $2^n + 1$. In the corresponding quotient ring, $2^n = -1$ holds. We define $b = 2^{\lceil s/2 \rceil}$ and $\ell = 2^{\lfloor s/2 \rfloor}$. As a rule of thumb, we note
– $2^n + 1$ is large and odd.
– $n = 2^s$ is the input size and a power of 2.
– s ∈ ℕ is a small number.
– $b = 2^{\lceil s/2 \rceil}$, $\ell = 2^{\lfloor s/2 \rfloor}$, n = bℓ, b | 2ℓ and ℓ ⩽ √n ⩽ b ⩽ 2ℓ.
We decompose the input into b blocks of length ℓ and write $u = \sum_i u_i 2^{\ell i}$ and $\upsilon = \sum_i \upsilon_i 2^{\ell i}$, where $0 \leq u_i, \upsilon_i < 2^\ell$ for all i. By convention, $u_i = \upsilon_i = 0$ holds for all i < 0 and for all i ⩾ b. Let $y_i = \sum_j u_j \upsilon_{i-j}$; then $y_i \neq 0$ can only hold for 0 ⩽ i < 2b − 1. For 0 ⩽ i < b at most i + 1 summands $u_j \upsilon_{i-j}$ in $y_i$ are nonzero, which yields $y_i < (i + 1) 2^{2\ell}$; and in $y_{b+i}$ at most b − i − 1 summands do not vanish, resulting in $y_{b+i} < (b - i - 1) 2^{2\ell}$. With $2^{b\ell} = -1$, we obtain
\[ u\upsilon = \sum_i \Bigl( \sum_j u_j \upsilon_{i-j} \Bigr) 2^{\ell i} = \sum_i y_i 2^{\ell i} = \sum_{i=0}^{b-1} \underbrace{(y_i - y_{b+i})}_{=\, w_i} 2^{\ell i} \]
in $\mathbb{Z}/(2^n + 1)\mathbb{Z}$. Since $-(b - i - 1) 2^{2\ell} < w_i < (i + 1) 2^{2\ell}$, it is sufficient to determine the values $w_i$ modulo $b(2^{2\ell} + 1)$. Since b is a power of 2 and $2^{2\ell} + 1$ is odd, according to the Chinese remainder theorem we can first compute the $w_i$ modulo b, then modulo $2^{2\ell} + 1$, and finally combine these two intermediate steps for the result modulo $b(2^{2\ell} + 1)$. Letting $w'_i \equiv w_i \bmod b$ and $w''_i \equiv w_i \bmod 2^{2\ell} + 1$, we obtain $w_i$ from $w'_i$ and $w''_i$ by the computation
\[ w_i \equiv w'_i (2^{2\ell} + 1) - w''_i 2^{2\ell} \mod b(2^{2\ell} + 1) \]
since $2^{2\ell} \equiv 0 \bmod b$. The resulting values $w_i$ with $-(b - i - 1) 2^{2\ell} < w_i < (i + 1) 2^{2\ell}$ each have O(√n) bits. From $w_0, \ldots, w_{b-1}$ we compute
\[ \sum_{i=0}^{b-1} w_i 2^{\ell i} \bmod (2^n + 1) \]
This requires O(√n) additions or subtractions, respectively, each on operands of length O(√n). These can be performed by the classical methods in linear time. It remains to show how to find the values $w'_i$ and $w''_i$.
We first describe how the values $w'_i$ are computed. First, we let $u'_i = u_i \bmod b$ and $\upsilon'_i = \upsilon_i \bmod b$ for 0 ⩽ i < b. This is easy because b is a power of 2 and $u_i$ and $\upsilon_i$ are given in binary. Since $y'_i = \sum_j u'_j \upsilon'_{i-j}$, we have $w'_i = (y'_i - y'_{b+i}) \bmod b$. Therefore, it is sufficient to determine the $y'_i$ for 0 ⩽ i < 2b. We know that $0 \leq y'_i < 2b^3$, and therefore 1 + 3 log b < 4 log b bits for each $y'_i$ are sufficient. Let $u' = \sum_i u'_i 2^{(4 \log b) i}$ and $\upsilon' = \sum_i \upsilon'_i 2^{(4 \log b) i}$. This means that $u'$ can be obtained from the $u'_i$ by inserting
sufficiently long blocks of zeros between them; υ can be constructed in the same way. Then, we have u ⋅ υ = ∑ ( ∑ u j υi−j )2(4 log b)i = ∑ yi 2(4 log b)i i
u
j
i
υ
The binary lengths of and are bounded by 4b log b, so with the method of Karatsuba, we can compute the product u ⋅ υ in time O((b log b)1.6 ) ⊆ O(b 2 ) = O(n). From the result, one can directly extract the values y i since the corresponding blocks do not overlap. Next, we want to determine the wi . We let N = 22ℓ +1 and R = ℤ/Nℤ. In particular, 2ℓ 2 = −1 in R. Note that N is approximately 22√n + 1. The number N is large, but it is much smaller than 2n . The subsequent calculations are performed in the polynomial ring R[X]/(X b − 1). The elements ψ = 22ℓ/b and ω = ψ2 have the properties given by Theorem 3.17. Thus, this polynomial ring is isomorphic to the direct product R b , and multiplication of polynomials can be reduced to multiplications in R as shown in Section 3.9. We consider the following two polynomials:
$$f(X) = \sum_i u_i \psi^i X^i \qquad g(X) = \sum_i \upsilon_i \psi^i X^i$$
Since $\psi^b = -1$ and $X^b = 1$, we have the following:
$$f(X) * g(X) = \sum_i \Bigl( \sum_j u_j \upsilon_{i-j} \Bigr) \psi^i X^i = \sum_i y_i \psi^i X^i = \sum_{i=0}^{b-1} (y_i - y_{b+i}) \psi^i X^i$$
By the factor $\psi^i$ in the definition of $f$ and $g$ it is ensured that $y_{b+i}$ has the desired negative sign. Let $h(X) = f(X) * g(X) \in R[X]/(X^b - 1)$ and write $h(X) = \sum_{i=0}^{b-1} z_i X^i$. Then, we obtain the $w''_i$ as follows:
$$w''_i = z_i \psi^{-i} \bmod 2^{2\ell} + 1$$
We compute the $z_i$ using the fast discrete Fourier transform, in which we recursively use the same algorithm for multiplication in $R$.

It remains to show how to compute the number $z_i \psi^{-i} \bmod 2^{2\ell} + 1$ efficiently. The time bound $O(\ell)$ is sufficient. Elements of $R$ are represented by numbers $z$ in $\{0, \ldots, 2^{2\ell}\}$. Numbers $z \in R$ can efficiently be multiplied with $-1$ by computing $2^{2\ell} + 1 - z$ with the classical method. Because of $\psi^{-i} = -\psi^{b-i}$ in $R$, it suffices to compute elements $z\psi^j \bmod N$ for $0 < j < b$. Now, $\psi^j \equiv 2^k \bmod N$ for $k = 2\ell j/b \in \{1, \ldots, 2\ell - 1\}$. The value $z2^k$ can be computed by left-shifting $z$ by $k$ bits. We can split the result of the shift operation into two $2\ell$-bit blocks $z', z'' \in \{0, \ldots, 2^{2\ell} - 1\}$ such that $z2^k = z' + z'' 2^{2\ell}$. Subtracting $z''$ from $z'$ modulo $N$ now yields the desired result since $z' - z'' \equiv z' + z'' 2^{2\ell} = z2^k \equiv z\psi^j \bmod N$.
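The shift-and-subtract step just described can be tried out on small parameters. The following is a minimal sketch, not the book's implementation; the function names `mul_pow2_mod` and `div_by_psi_power` are hypothetical, and it assumes that $b$ divides $2\ell$ so that $\psi = 2^{2\ell/b}$ is an integer power of 2.

```python
def mul_pow2_mod(z, k, l):
    """Return z * 2**k mod N for N = 2**(2*l) + 1, using shifts and one subtraction."""
    N = (1 << (2 * l)) + 1
    shifted = z << k                          # z * 2^k
    low = shifted & ((1 << (2 * l)) - 1)      # block z'
    high = shifted >> (2 * l)                 # block z''
    # 2^(2l) = -1 in Z/NZ, hence z' + z'' * 2^(2l) = z' - z''
    return (low - high) % N


def div_by_psi_power(z, i, l, b):
    """Return z * psi**(-i) mod N for psi = 2**(2*l/b) and 0 <= i < b.

    Uses psi^(-i) = -psi^(b-i); assumes b divides 2*l.
    """
    N = (1 << (2 * l)) + 1
    if i == 0:
        return z % N
    k = 2 * l * (b - i) // b                  # psi^(b-i) = 2^k
    return (N - mul_pow2_mod(z, k, l)) % N
```

For example, with $\ell = 8$ and $b = 4$ we have $N = 2^{16} + 1$ and $\psi = 16$; multiplying the result of `div_by_psi_power(z, i, 8, 4)` by $\psi^i$ modulo $N$ gives back $z$.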
Overview of the algorithm

In the following, we give a sketch of how the algorithm works. Here, $x \in R$ means that the element $x$ from the ring $R$ is given or was computed. Arrows stand for a causal connection. In order to give a better insight into the proportions, $b$ and $\ell$ are replaced by $\sqrt{n}$. Starting from $u, \upsilon \in \mathbb{Z}/(2^n + 1)\mathbb{Z}$, the computation splits into two branches:
$$u, \upsilon \xrightarrow{(1)} u', \upsilon' \in \mathbb{Z} \xrightarrow{(2)} u'\upsilon' \in \mathbb{Z} \xrightarrow{(3)} (w'_0, \ldots, w'_{\sqrt{n}-1}) \in (\mathbb{Z}/\sqrt{n}\,\mathbb{Z})^{\sqrt{n}}$$
$$u, \upsilon \xrightarrow{(4)} f(X), g(X) \in (\mathbb{Z}/(2^{2\sqrt{n}} + 1)\mathbb{Z})[X] / (X^{\sqrt{n}} - 1) \xrightarrow{(5)} F(f(X)), F(g(X)) \in (\mathbb{Z}/(2^{2\sqrt{n}} + 1)\mathbb{Z})^{\sqrt{n}}$$
$$\xrightarrow{(6)} F(f(X)) \cdot F(g(X)) \in (\mathbb{Z}/(2^{2\sqrt{n}} + 1)\mathbb{Z})^{\sqrt{n}} \xrightarrow{(7)} f(X) * g(X) \in (\mathbb{Z}/(2^{2\sqrt{n}} + 1)\mathbb{Z})[X] / (X^{\sqrt{n}} - 1) \xrightarrow{(8)} (w''_0, \ldots, w''_{\sqrt{n}-1}) \in (\mathbb{Z}/(2^{2\sqrt{n}} + 1)\mathbb{Z})^{\sqrt{n}}$$
The two branches are then combined:
$$(w'_i)_i, (w''_i)_i \xrightarrow{(9)} (w_0, \ldots, w_{\sqrt{n}-1}) \in (\mathbb{Z}/\sqrt{n}(2^{2\sqrt{n}} + 1)\mathbb{Z})^{\sqrt{n}} \xrightarrow{(10)} u\upsilon \in \mathbb{Z}/(2^n + 1)\mathbb{Z}$$
The individual steps of the algorithm can be summarized as follows:
(1) The numbers $u$ and $\upsilon$ each are decomposed in $\sqrt{n}$ blocks of $\sqrt{n}$ bits. Each block is computed modulo $\sqrt{n}$; then $u'$ and $\upsilon'$ are compiled from the blocks with sufficiently many zeros inserted.
(2) The numbers $u'$ and $\upsilon'$ are multiplied using the Karatsuba multiplication algorithm.
(3) Due to the shielding by sufficiently many zeros, the blocks $w'_0, \ldots, w'_{\sqrt{n}-1}$ can be reconstructed from the product $u'\upsilon'$.
(4) The $\sqrt{n}$-bit blocks are viewed as numbers modulo $2^{2\sqrt{n}} + 1$ and interpreted as coefficients of polynomials $f$ and $g$.
(5) The Fourier transforms $F(f)$ and $F(g)$ of $f$ and $g$ are computed.
(6) In the computation of $F(f) \cdot F(g)$ for each of the $\sqrt{n}$ multiplications in the ring $\mathbb{Z}/(2^{2\sqrt{n}} + 1)\mathbb{Z}$, the algorithm is called recursively.
(7) Using the inverse Fourier transform the product of the polynomials $f$ and $g$ is determined.
(8) The values $w''_i$ appear as coefficients of the polynomial $f(X) * g(X)$.
(9) With the Chinese remainder theorem the numbers $w'_i \bmod \sqrt{n}$ and $w''_i \bmod (2^{2\sqrt{n}} + 1)$ can be combined to $w_i \bmod \sqrt{n}(2^{2\sqrt{n}} + 1)$.
(10) The product $u\upsilon$ is computed from the numbers $w_i$.
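The data flow of steps (1) to (10) can be checked on small inputs. The sketch below is only an illustration under stated assumptions: it substitutes plain schoolbook convolutions for both the Karatsuba multiplication of steps (2) and (3) and the FFT-based multiplication of steps (5) to (8), so it shows the decomposition and the Chinese remainder step but not the running time. The names `negacyclic` and `toy_multiply` are hypothetical, and the inputs are restricted to $0 \le u, \upsilon < 2^n$ with $n = b\ell$.

```python
def negacyclic(xs, ys, modulus):
    """Return the list of w_i = y_i - y_{b+i} mod modulus, with y_i = sum_j xs[j] * ys[i-j]."""
    b = len(xs)
    w = [0] * b
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            if i + j < b:
                w[i + j] += x * y          # contributes to y_{i+j}
            else:
                w[i + j - b] -= x * y      # contributes to y_{b+(i+j-b)}, negative sign
    return [c % modulus for c in w]


def toy_multiply(u, v, b, l):
    """Return u * v mod 2**(b*l) + 1 via blocks of l bits.

    Assumes 0 <= u, v < 2**(b*l) and b a power of two with b <= 2**(2*l).
    """
    n = b * l
    N = (1 << (2 * l)) + 1
    mask = (1 << l) - 1
    ub = [(u >> (l * i)) & mask for i in range(b)]
    vb = [(v >> (l * i)) & mask for i in range(b)]

    w1 = negacyclic([x % b for x in ub], [y % b for y in vb], b)   # w'_i = w_i mod b
    w2 = negacyclic(ub, vb, N)                                     # w''_i = w_i mod N

    total = 0
    for i in range(b):
        # Chinese remainder step: w_i = w'_i * N - w''_i * 2^(2l) mod b*N
        c = (w1[i] * N - w2[i] * (1 << (2 * l))) % (b * N)
        # pick the representative with -(b-i-1) * 2^(2l) < w_i < (i+1) * 2^(2l)
        if c >= (i + 1) << (2 * l):
            c -= b * N
        total += c * (1 << (l * i))
    return total % ((1 << n) + 1)
```

With $b = 4$ and $\ell = 4$ this multiplies 16-bit numbers modulo $2^{16} + 1$, and the result can be compared against `u * v % (2**16 + 1)`.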
Runtime analysis

Computing $F(f(X))$, $F(g(X))$, and $(h_0, \ldots, h_{b-1}) \cdot \overline{F} \cdot b^{-1}$ takes $O(b \log b)$ elementary arithmetic operations, each of which can be performed with $O(\ell)$ bit operations. This results in $O(\ell \cdot b \log b) = O(n \log n)$ operations. Furthermore, $b$ products $h_i = f(\omega^i) \cdot g(\omega^i)$ modulo $2^{2\ell} + 1$ have to be computed recursively. Let $M(n)$ be the number of bit operations needed to multiply two $n$-bit numbers using the Schönhage–Strassen algorithm. Then, we obtain the following recurrence:
$$M(n) \le b \cdot M(2\ell) + O(n \log n)$$
Thus, $M(n)/n \le 2M(2\ell)/2\ell + O(\log n)$. For $n = 2^s$ and $t(s) = M(2^s)/2^s$, we obtain
$$t(s) \le 2t(s/2 + 1) + O(s)$$
This yields $t(s) \in O(s \log s)$ by the master theorem (see Theorem 3.2 and Example 3.3). Thus, we finally have $M(n) \in O(n \log n \log \log n)$.
Exercises

3.1. (Master theorem II) Let $\sum_{i=0}^{k} \alpha_i < 1$ and $f(n) \le \sum_{i=0}^{k} f(\lceil \alpha_i n \rceil) + O(n)$. Show that $f(n) \in O(n)$.

3.2. Show that on input two binary numbers $a, b \in \mathbb{N}$, one can decide in polynomial time whether there exists $c \in \mathbb{Q}$ with $c^2 = a/b$.

3.3. Give an extension of the binary gcd algorithm such that, on input $k, \ell \in \mathbb{N}$, it computes $a, b, t$ with $ak + b\ell = t = \gcd(k, \ell)$. The algorithm should not use division and multiplications by numbers other than 2.

3.4. Show that 1729 is an Eulerian pseudoprime to base 2, but not a strong pseudoprime to base 2. In addition, use the Miller–Rabin scheme to find a nontrivial divisor of 1729.

3.5. (Lucas test; named after Édouard Lucas, 1842–1891) Let $n \ge 1$ and $a \in \mathbb{Z}$. Show that $n$ is prime if the following two conditions are satisfied:
(i) $a^{n-1} \equiv 1 \bmod n$ and
(ii) $a^{(n-1)/q} \not\equiv 1 \bmod n$ for all prime divisors $q$ of $n - 1$.

3.6. (Pépin test; named after Théophile Pépin, 1826–1904) Let $n \ge 1$. Show that the Fermat number $f_n = 2^{2^n} + 1$ is prime if and only if $3^{(f_n - 1)/2} \equiv -1 \bmod f_n$.

3.7. Let $p > 2$ be a prime number and consider the Mersenne number $n = 2^p - 1$ (named after Marin Mersenne, 1588–1648).
(a) Show that if $n$ is prime, then $f(X) = X^2 - 4X + 1 \in \mathbb{F}_n[X]$ is irreducible.
(b) Suppose that $n$ is prime and let $K = \mathbb{F}_n[X]/f$. Show that $X^n = 4 - X$ and $(X - 1)^{n+1} = -2$ in $K$.
(c) Suppose that $n$ is prime and let $K = \mathbb{F}_n[X]/f$. Show that $(X - 1)^{n+1} = 2X^{(n+1)/2}$ in $K$.
(d) Show that $n$ is prime if and only if $X^{(n+1)/2} \equiv -1 \bmod X^2 - 4X + 1$ in $(\mathbb{Z}/n\mathbb{Z})[X]$.

3.8. (Lucas–Lehmer test; named after É. Lucas and Derrick Lehmer, 1905–1991) Let $p > 2$ be a prime number and $n = 2^p - 1$ the corresponding Mersenne number. The sequence $(\ell_j)_{j \in \mathbb{N}}$ is defined by $\ell_0 = 4$ and $\ell_{j+1} = \ell_j^2 - 2 \bmod n$. Show that $n$ is prime if and only if $\ell_{p-2} = 0$.
Hint: Show that $\ell_j \equiv X^{2^j} + (4 - X)^{2^j} \bmod X^2 - 4X + 1$ in $(\mathbb{Z}/n\mathbb{Z})[X]$.

3.9. A linear Diophantine equation has the form $a_1 X_1 + \cdots + a_n X_n = a_{n+1}$ with $a_i \in \mathbb{Z}$. Explain how to check whether a solution $(x_1, \ldots, x_n) \in \mathbb{Z}^n$ exists and how to compute it.

3.10. Compute the square root of 2 in the field $\mathbb{F}_{41} = \mathbb{Z}/41\mathbb{Z}$. Use Tonelli’s algorithm and choose $g = 3$ in the first step.

3.11. Let $p$ be prime with $p \equiv 5 \bmod 8$. We consider the field $\mathbb{F}_p$ and a square $a \in \mathbb{F}_p$. Show that if $a^{(p-1)/4} = 1$, then $a^{(p+3)/8}$ is a square root of $a$; and if $a^{(p-1)/4} = -1$, then $2^{-1}(4a)^{(p+3)/8}$ is a square root of $a$.

3.12. Let $p$ be a prime number with $p \equiv -1 \bmod 4$ and let $f(X) = X^2 + 1$.
(a) Show that the polynomial $f(X)$ is irreducible in $\mathbb{F}_p[X]$.
(b) Show how to efficiently find an element $a$ with $1 \le a < p - 1$ such that $\left(\frac{a}{p}\right) = 1$ and $\left(\frac{a+1}{p}\right) = -1$.
(c) Show that in $\mathbb{F}_{p^2} = \mathbb{F}_p[X]/f$ it is possible to extract roots in deterministic polynomial time.

3.13. Compute the square root of 2 in the field $\mathbb{F}_{23} = \mathbb{Z}/23\mathbb{Z}$. Use Cipolla’s algorithm and choose $t = 0$ in the first step.

3.14. Use Pollard’s $p - 1$ algorithm to find a nontrivial divisor of $n = 253$. In the first step, let $B = \{2, 3\}$ and $a = 2$, in the second step $B = \{2, 3, 5\}$ and $a = 2$.

3.15. Find a nontrivial divisor of $n = 689$ with Pollard’s $\rho$ algorithm. Use $F(x) = x^2 + 1 \bmod n$ and the initial value $x_0 = 12$.

3.16. Compute the discrete logarithm of 3 to base 2 in the multiplicative group $(\mathbb{Z}/19\mathbb{Z})^*$ using Shanks’ baby-step giant-step algorithm.

3.17. Let $G$ be a finite group and $g \in G$. Show how to determine the order of $g$ using Shanks’ baby-step giant-step algorithm.
3.18. Let $G = (\mathbb{Z}/23\mathbb{Z})^*$ and $g = 3$.
(a) Determine the order of $g$ in $G$.
(b) Use Pollard’s $\rho$ algorithm to compute the discrete logarithm of 18 to base 3 in $G$. Decompose $G$ with respect to the residue classes modulo 3 and use $r_1 = s_1 = 1$ as initial values.

3.19. Compute the discrete logarithm of 5 to base 2 in the multiplicative group $(\mathbb{Z}/19\mathbb{Z})^*$. Reduce the problem with the Pohlig–Hellman method to discrete logarithms in groups of smaller order.

3.20. Let $p = 2^{2^k} + 1$ be a Fermat prime. Show that discrete logarithms in $(\mathbb{Z}/p\mathbb{Z})^*$ can be computed in deterministic polynomial time.

3.21. Let $q = 97$, $b = 4$ and $\omega = -22$.
(a) Show that $\omega$ is a primitive $b$th root of unity in $\mathbb{F}_q$.
(b) Specify the Fourier matrices $F, \overline{F} \in \mathbb{F}_q^{b \times b}$ with respect to $\omega$.
(c) Compute the product $f * g$ of the polynomials $f(X) = 1 + X + X^2$ and $g(X) = 2 + 3X$ using the FFT.

3.22. Let $p_1, p_2, p_3 \in 2^{56}\mathbb{Z} + 1$ be distinct primes of at most 64 bits, let $\omega_i$ be a primitive $2^{56}$th root of unity in the field $K_i = \mathbb{Z}/p_i\mathbb{Z}$, and let $F_i$ be the Fourier transform in $K_i$ with respect to $\omega_i$. On input two numbers $u = \sum_{j \ge 0} u_j 2^{64j}$ and $\upsilon = \sum_{j \ge 0} \upsilon_j 2^{64j}$ with $0 \le u_j, \upsilon_j < 2^{64}$ we determine $U_i(X) = \sum_{j \ge 0} (u_j \bmod p_i) X^j$ and $V_i(X) = \sum_{j \ge 0} (\upsilon_j \bmod p_i) X^j$. Then, in $K_i$ we compute the coefficients of
$$W_i(X) = U_i(X) * V_i(X) = F_i^{-1}(F_i(U_i) \cdot F_i(V_i)) = \sum_{j \ge 0} w_{i,j} X^j$$
using the FFT. Finally, we determine the polynomial $W(X) = \sum_{j \ge 0} w_j X^j$ with $w_j \equiv w_{i,j} \bmod p_i$ by means of the Chinese remainder theorem. At last, we return the number $w = W(2^{64})$. What is the running time of this algorithm? What are the maximum values for $u$ and $\upsilon$ to ensure that $w = u\upsilon$ holds?
Summary

Notions
– addition chain
– Carmichael number
– square-free
– pseudoprime
– Eulerian pseudoprime
– strong pseudoprime
– square, quadratic residue
– square root
– root extraction in groups
– factorization
– pseudo-random sequence
– B-smooth
– discrete logarithm
– primitive root of unity
– elementary arithmetic operation

Methods and results
– master theorem for estimation of recurrence equations
– birthday paradox: For random sequences of $m$ events from $\Omega$ with $m \ge \sqrt{2|\Omega| \ln 2}$, the probability to have two identical elements in the sequence is greater than 1/2.
– fast exponentiation for computing $a^n$ with $O(\log n)$ multiplications
– binary gcd
– Fermat test
– properties of Carmichael numbers
– The error probability of the Solovay–Strassen primality test is at most 1/2.
– description of the Miller–Rabin primality test
– The error probability of the Miller–Rabin primality test is at most 1/4.
– Miller–Rabin scheme yields an algorithm for factorization of $n$ if a multiple of $\varphi(n)$ is known or if $\varphi(n)$ has only small prime divisors
– Miller–Rabin scheme as evidence for the secret RSA key to be secure
– $n$ Eulerian pseudoprime to base $a$ ⇒ $n$ pseudoprime to base $a$
– $n$ strong pseudoprime to base $a$ ⇒ $n$ Eulerian pseudoprime to base $a$
– root extraction in groups of odd order
– root extraction in $\mathbb{F}_q$ with $q \equiv -1 \bmod 4$
– Tonelli’s algorithm for extracting roots in $\mathbb{F}_q$ with odd number $q$
– Cipolla’s algorithm for extracting roots in $\mathbb{F}_q$ with odd $q$
– Pollard’s $p - 1$ algorithm
– Pollard’s $\rho$ algorithm for integer factorization
– quadratic sieve
– Shanks’ baby-step giant-step algorithm for discrete logarithm
– Pollard’s $\rho$ algorithm for discrete logarithm
– Pohlig–Hellman algorithm for discrete logarithm: step 1 reduction to prime powers, step 2 reduction to prime divisors
– index calculus algorithm
– Karatsuba integer multiplication in $O(n^{1.58496\ldots})$
– division using Newton method
– discrete Fourier transform: ring isomorphism $R[X]/(X^b - 1) \to R^b$
– fast Fourier transform: computing the discrete Fourier transform with $O(b \log b)$ elementary arithmetic operations
– primitive roots of unity in $\mathbb{Z}/(2^m + 1)\mathbb{Z}$
– Schönhage–Strassen integer multiplication in $O(n \log n \log \log n)$
4 Polynomial time primality test

In August 2002, a message rapidly spread out, reaching all mathematically interested people throughout the world within hours: “PRIMES is in P.” The scientists Manindra Agrawal, Neeraj Kayal, and Nitin Saxena from India had found a polynomial-time primality test. Composed of the initials of their last names, this method is called the AKS test. It is neither based on unproven hypotheses like, for example, the generalized Riemann hypothesis (Georg Friedrich Bernhard Riemann, 1826–1866), nor does it provide any probabilistic claims like the Miller–Rabin test from Section 3.4.3. The problem called PRIMES, that is, “check whether an integer, given in binary, is prime,” is thus in the complexity class P of problems decidable in deterministic polynomial time. The simplicity of the method is sensational. We are able to treat all aspects of the proof in full detail here. No specialized number theoretical insight that would go beyond the scope of this book is required. The work of Agrawal, Kayal, and Saxena was awarded the Gödel Prize (Kurt Gödel, 1906–1978)¹ in 2006. By now, a number of textbooks have presented the AKS test; see, for example [19, 32, 41]. Our presentation becomes a little easier due to the use of field extensions instead of cyclotomic polynomials. The necessary prerequisites for understanding the AKS test are modest: only parts of Sections 1.1, 1.4, 1.5, 1.6, and 1.8 are required.
4.1 Basic idea

The following lemma is the main idea behind the AKS primality test. The proof is similar to Fermat’s little theorem.

Lemma 4.1. Let $a, n \in \mathbb{N}$ with $\gcd(a, n) = 1$. The number $n$ is prime if and only if the following polynomial congruence holds in $\mathbb{Z}[X]$:
$$(X + a)^n \equiv X^n + a \mod n$$
Proof: First, suppose that $n$ is prime. We have $(X + a)^n \equiv X^n + a^n \equiv X^n + a \bmod n$. The first congruence follows from Theorem 1.28 and the second from Fermat’s little Theorem 1.29. For the converse, consider a prime number $p$ which is a proper divisor of $n$ and let $p^k$ be the largest power of $p$ dividing $n$. Then, $p^k$ is also the largest power of $p$ dividing $n(n - 1) \cdots (n - p + 1)$. It follows that $p^{k-1}$ is the largest power of $p$ dividing $\binom{n}{p}$. Thus, the term $\binom{n}{p} X^p a^{n-p}$ does not vanish modulo $n$.
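For small $n$, Lemma 4.1 can be checked directly by expanding $(X + a)^n$ with the binomial theorem. The following is a minimal sketch for experimentation only; the function name `expanded_congruence_holds` is hypothetical, and the cost grows linearly in $n$, so it is not a usable primality test.

```python
from math import comb


def expanded_congruence_holds(n, a):
    """Check (X + a)^n = X^n + a mod n by computing all n + 1 coefficients (n >= 2)."""
    # coefficient of X^k in (X + a)^n is binom(n, k) * a^(n-k)
    coeffs = [comb(n, k) * pow(a, n - k, n) % n for k in range(n + 1)]
    target = [a % n] + [0] * (n - 1) + [1]   # coefficients of X^n + a
    return coeffs == target
```

For instance, `expanded_congruence_holds(4, 1)` is false because the coefficient $\binom{4}{2} = 6$ of $X^2$ does not vanish modulo 4.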
1 The Gödel Prize is awarded annually for the best work in the field of theoretical computer science, published in the last 13 years.
The implication from right to left is not required in the following; as a primality test, it cannot be used in this form, anyway, because if we multiply out $(X + a)^n$, in principle all the $n + 1$ coefficients have to be computed, and the number of coefficients is exponential in the input size $\lfloor \log_2 n \rfloor + 1$. The congruence $(X + a)^n \equiv X^n + a \bmod n$ is weakened to $(X + a)^n \equiv X^n + a \bmod (X^r - 1, n)$ for a small number $r \in \mathbb{N}$. This simplified congruence has to be checked for all $a$ with $a \le \ell$ for some sufficiently large $\ell$. The idea is to show that $r$ and $\ell$ can be chosen to be small, that is, polylogarithmic in $n$.

We first note that the congruence $(X + a)^n \equiv X^n + a \bmod (X^r - 1, n)$ can also be seen as an equality $(X + a)^n = X^n + a$ in the quotient ring $\mathbb{Z}[X]/(X^r - 1, n)$, where $(X^r - 1, n)$ is the ideal generated by the polynomials $X^r - 1$ and $n$. Put another way: a congruence
$$f_1(X) \equiv f_2(X) \mod (X^r - 1, n)$$
means that there are polynomials $g(X), h(X) \in \mathbb{Z}[X]$ satisfying
$$f_1(X) = f_2(X) + (X^r - 1)g(X) + nh(X)$$
From this description, it is obvious that congruence modulo $n$ implies congruence modulo $(X^r - 1, n)$. The computation of $(X + a)^n \bmod (X^r - 1, n)$ can be done by fast exponentiation and we may replace each power $X^r$ by 1 and compute coefficients modulo $n$. The polynomials then have at most $r$ coefficients, each of which is smaller than $n$. Thus, if $r$ is polylogarithmic in $n$, then $(X + a)^n \equiv X^n + a \bmod (X^r - 1, n)$ can be checked in polynomial time.
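The computation of $(X + a)^n \bmod (X^r - 1, n)$ by fast exponentiation can be sketched as follows. This is an illustrative sketch, not an optimized implementation: polynomials are represented as coefficient lists of length $r$, the multiplication is the quadratic schoolbook method, and the names `poly_mul`, `poly_pow`, and `congruence_holds` are hypothetical. It assumes $r \ge 2$ and $n \ge 2$.

```python
def poly_mul(f, g, r, n):
    """Multiply two coefficient lists of length r modulo (X^r - 1, n)."""
    h = [0] * r
    for i, fi in enumerate(f):
        if fi == 0:
            continue
        for j, gj in enumerate(g):
            idx = (i + j) % r                  # X^r is replaced by 1
            h[idx] = (h[idx] + fi * gj) % n    # coefficients are reduced modulo n
    return h


def poly_pow(f, e, r, n):
    """Compute f^e modulo (X^r - 1, n) by repeated squaring."""
    result = [1] + [0] * (r - 1)
    base = f[:]
    while e > 0:
        if e & 1:
            result = poly_mul(result, base, r, n)
        base = poly_mul(base, base, r, n)
        e >>= 1
    return result


def congruence_holds(a, n, r):
    """Check (X + a)^n = X^n + a modulo (X^r - 1, n); assumes r >= 2, n >= 2."""
    lhs = poly_pow([a % n, 1] + [0] * (r - 2), n, r, n)
    rhs = [0] * r
    rhs[n % r] = 1                             # X^n reduces to X^(n mod r)
    rhs[0] = (rhs[0] + a) % n
    return lhs == rhs
```

For a prime $n$ the check succeeds for every $a$ and $r$, as the congruence modulo $n$ implies the weakened one. For the composite $n = 4$ with $r = 3$ and $a = 1$ it fails, since $(X + 1)^4 \equiv 2X^2 + X + 1$ while $X^4 + 1 \equiv X + 1$.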
4.2 Combinatorial tools

In the next three sections, we provide some tools for the correctness proof of the AKS test. We start with two elementary combinatorial theorems. For a set $A$ let $\binom{A}{k}$ be the set of all $k$-element subsets of $A$.

Theorem 4.2. For every finite set $A$, we have $\left|\binom{A}{k}\right| = \binom{|A|}{k}$.

Proof: Let $|A| = n$. The theorem is correct for $k \le 0$ or $k \ge n$. For $0 < k < n$ there are $n(n - 1) \cdots (n - k + 1)$ sequences $(a_1, \ldots, a_k)$ of pairwise different elements $a_i \in A$. Two such sequences represent the same subset of $A$ if the sequences coincide up to a permutation of the indices. There are $k!$ such permutations.

The following theorem often appears in connection with drawing balls from an urn as “unordered selection with repetition.” Note that $0 \in \mathbb{N}$ is a natural number.

Theorem 4.3. For all $r, s \in \mathbb{N}$, we have $\left|\{\, (e_1, \ldots, e_s) \in \mathbb{N}^s \mid \sum_{i=1}^{s} e_i \le r \,\}\right| = \binom{r+s}{s}$.

Proof: We imagine $r + s$ points arranged in a horizontal row. From these, we choose $s$ points and replace them by bars. By Theorem 4.2, there are $\binom{r+s}{s}$ possibilities for choosing the positions of the bars. Each such selection corresponds to one $s$-tuple $(e_1, \ldots, e_s) \in \mathbb{N}^s$ with $\sum_{i=1}^{s} e_i \le r$:
$$\underbrace{\underbrace{\bullet \bullet \cdots \bullet \bullet}_{e_1 \text{ points}} \;|\; \underbrace{\bullet \bullet \cdots \bullet \bullet}_{e_2 \text{ points}} \;|\; \cdots \;|\; \underbrace{\bullet \bullet \cdots \bullet \bullet}_{e_s \text{ points}} \;|\; \underbrace{\bullet \bullet \cdots \bullet \bullet}_{\text{remainder}}}_{r \text{ points and } s \text{ bars}}$$
First, $e_1$ points are chopped off by the first bar. After the first bar, $e_2$ points are chopped off by the second bar, and so on. The $s$th bar is followed by some remaining points in order to obtain a total of $r$ points. This gives a bijection between the solutions of the inequality and the possible choices of points and bars.

For $r = t - 1$ and $s = \ell + 1$, we have $\binom{r+s}{s} = \binom{t+\ell}{\ell+1} = \frac{(t+\ell)!}{(t-1)!\,(\ell+1)!} = \binom{t+\ell}{t-1}$. Therefore, the previous theorem yields
$$\left|\Bigl\{\, (e_0, \ldots, e_\ell) \in \mathbb{N}^{\ell+1} \Bigm| \sum_{i=0}^{\ell} e_i < t \,\Bigr\}\right| = \binom{t+\ell}{t-1} \qquad (4.1)$$
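Theorem 4.3 and Equation (4.1) are easy to confirm by brute force for small parameters. The sketch below simply enumerates all tuples; the function name `count_tuples` is hypothetical.

```python
from itertools import product
from math import comb


def count_tuples(s, r):
    """Count the tuples (e_1, ..., e_s) of natural numbers with e_1 + ... + e_s <= r."""
    # each e_i is at most r, so it suffices to search in {0, ..., r}^s
    return sum(1 for e in product(range(r + 1), repeat=s) if sum(e) <= r)


# Theorem 4.3 for s = 3, r = 4
print(count_tuples(3, 4), comb(4 + 3, 3))
# Equation (4.1) for t = 4, l = 2: tuples (e_0, e_1, e_2) with sum < 4
print(count_tuples(2 + 1, 4 - 1), comb(4 + 2, 4 - 1))
```

Each printed pair should consist of two equal numbers.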
4.3 Growth of the least common multiple

We denote the least common multiple of the numbers $1, \ldots, n$ by $\mathrm{lcm}(n)$. In the following, we derive a lower bound for the growth of $\mathrm{lcm}(n)$. The proof we present is based on an article by Nair [79].

Lemma 4.4. For all $m, n \in \mathbb{N}$ with $1 \le m \le n$, we have $m\binom{n}{m} \mid \mathrm{lcm}(n)$.

Proof: We investigate the integral $I = \int_0^1 x^{m-1}(1 - x)^{n-m}\,dx$. The evaluation is done in two different ways. First, we apply the binomial theorem to rewrite $(1 - x)^{n-m}$ as $\sum_k (-1)^k \binom{n-m}{k} x^k$. Then, we obtain
$$x^{m-1}(1 - x)^{n-m} = \sum_k (-1)^k \binom{n-m}{k} x^{m-1+k}$$
Thus, the evaluation of the integral yields
$$I = \sum_k (-1)^k \binom{n-m}{k} \int_0^1 x^{m-1+k}\,dx = \sum_k (-1)^k \binom{n-m}{k} \frac{1}{m+k}$$
If we multiply $I$ with $\mathrm{lcm}(n)$, then $I \cdot \mathrm{lcm}(n)$ becomes an alternating sum of integers since $\frac{\mathrm{lcm}(n)}{m+k} \in \mathbb{N}$ for $0 \le k \le n - m$. The value of $I$ being positive, we see that $I \cdot \mathrm{lcm}(n)$ is in $\mathbb{N}$. By induction on $n - m$, we now show that $1/I = m\binom{n}{m}$. For $m = n$, we have
$$I = \int_0^1 x^{m-1}(1 - x)^{n-n}\,dx = \int_0^1 x^{m-1}\,dx = \Bigl[\frac{1}{m} x^m\Bigr]_0^1 = \frac{1}{m} = \frac{1}{m\binom{m}{m}}$$
Let now $1 \le m < n$. We use partial integration $\int u'\upsilon\,dx = u\upsilon - \int u\upsilon'\,dx$ with $u = \frac{1}{m} x^m$ and $u' = x^{m-1}$ and with $\upsilon = (1 - x)^{n-m}$ and $\upsilon' = -(n - m)(1 - x)^{n-m-1}$. Note that $u(1) \cdot \upsilon(1) = u(0) \cdot \upsilon(0) = 0$. Thus, we obtain
$$\begin{aligned}
I = \int_0^1 u'\upsilon\,dx &= -\int_0^1 u\upsilon'\,dx && \text{since } u(1) \cdot \upsilon(1) = 0 = u(0) \cdot \upsilon(0) \\
&= \frac{n-m}{m} \int_0^1 x^{(m+1)-1}(1 - x)^{n-(m+1)}\,dx \\
&= \frac{n-m}{m} \cdot \frac{1}{(m+1)\binom{n}{m+1}} && \text{by induction} \\
&= \frac{1}{m\binom{n}{m}}
\end{aligned}$$
This shows $\frac{\mathrm{lcm}(n)}{m\binom{n}{m}} \in \mathbb{N}$ and finally $m\binom{n}{m} \mid \mathrm{lcm}(n)$.

Since $\binom{2n}{n}$ is the largest binomial coefficient in the binomial expansion of $4^n = (1 + 1)^{2n} = \sum_k \binom{2n}{k}$, we see that $\binom{2n}{n}$ is greater than the average value $4^n/(2n + 1)$. For $n \ge 3$, in particular
$$\binom{2n}{n} > \frac{4^n}{2n+1} > 2^n \qquad (4.2)$$
By Lemma 4.4, this yields $\mathrm{lcm}(2n) \ge n\binom{2n}{n} > n2^n$ for $n \ge 3$. We improve this estimate a little in the following theorem.
Theorem 4.5. For all $n \ge 7$, we have $\mathrm{lcm}(n) > 2^n$.

Proof: Lemma 4.4 yields two divisors of $\mathrm{lcm}(2n + 1)$:
$$(2n + 1)\binom{2n}{n} = (n + 1)\binom{2n+1}{n+1} \mid \mathrm{lcm}(2n + 1)$$
$$n\binom{2n}{n} \mid \mathrm{lcm}(2n) \mid \mathrm{lcm}(2n + 1)$$
Since $n$ and $2n + 1$ are coprime, $n(2n + 1)\binom{2n}{n}$ is a divisor of $\mathrm{lcm}(2n + 1)$. With Equation (4.2), we obtain $n \cdot 4^n < n(2n + 1)\binom{2n}{n} \le \mathrm{lcm}(2n + 1)$. Let $n \ge 4$. Then
$$2^{2n+2} = 4 \cdot 2^{2n} \le n \cdot 4^n < \mathrm{lcm}(2n + 1) \le \mathrm{lcm}(2n + 2)$$
Therefore, $2^n < \mathrm{lcm}(n)$ for all $n \ge 9$. The only remaining cases to investigate are $n = 7$ and $n = 8$. We directly compute $2^7 = 128 < 420 = \mathrm{lcm}(7)$ and $2^8 = 256 < 840 = \mathrm{lcm}(8)$.
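Both Lemma 4.4 and Theorem 4.5 can be confirmed numerically for small values. In the sketch below, the helper name `lcm_upto` is hypothetical; it computes $\mathrm{lcm}(n)$ in the sense of this section, that is, the least common multiple of $1, \ldots, n$.

```python
from math import comb, gcd


def lcm_upto(n):
    """Least common multiple of 1, ..., n (the value denoted lcm(n) in the text)."""
    result = 1
    for k in range(1, n + 1):
        result = result * k // gcd(result, k)
    return result


# Lemma 4.4: m * binom(n, m) divides lcm(n) for 1 <= m <= n
assert all(lcm_upto(n) % (m * comb(n, m)) == 0
           for n in range(1, 30) for m in range(1, n + 1))

# Theorem 4.5: lcm(n) > 2^n for n >= 7
assert all(lcm_upto(n) > 2 ** n for n in range(7, 200))
```

The bound $n \ge 7$ cannot be lowered: `lcm_upto(6)` is 60, which is smaller than $2^6 = 64$.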
4.4 Of small numbers and large orders

Let $\gcd(n, r) = 1$. Then $\mathrm{ord}_r(n)$ denotes the order of $n$ in the multiplicative group $(\mathbb{Z}/r\mathbb{Z})^*$, that is
$$\mathrm{ord}_r(n) = \min \{\, i \mid i \ge 1,\ n^i \equiv 1 \bmod r \,\}$$
The following technical lemma states that, under certain assumptions, a small number $r$ is guaranteed to exist such that $\mathrm{ord}_r(n)$ is large with respect to $r$. For the rest of this section $\log$, as usual, denotes the logarithm to base 2.

Lemma 4.6. Let $m \ge 2$ and let $n$ be a prime number with $m^2 \log n < n$. Then there exists a positive number $r \le m^2 \log n$ satisfying $\mathrm{ord}_r(n) > m$.

Proof: Let $s = \lfloor m^2 \log n \rfloor$. Since $n$ is prime and $n > s$, the order $\mathrm{ord}_r(n)$ is defined for all $r$ with $1 \le r \le s$. By contradiction, we assume that $\mathrm{ord}_r(n) \le m$ for all such $r$. In particular, for each $r \in \{1, \ldots, s\}$ there exists $i \in \{1, \ldots, m\}$ such that $n^i \equiv 1 \bmod r$. So $r$ divides $n^i - 1$ and therefore also the product $\prod_{i=1}^{m} (n^i - 1)$. This implies that $\mathrm{lcm}(s)$ is a divisor of $\prod_{i=1}^{m} (n^i - 1)$. Since $m \ge 2$, we have $s \ge 7$. Using Theorem 4.5, we obtain
$$2^s < \mathrm{lcm}(s) \le \prod_{i=1}^{m} (n^i - 1) < n^{\binom{m+1}{2}}$$
It follows $m^2 \log n - 1 < s < \binom{m+1}{2} \log n$ and thus $m \le 1$, a contradiction.

Theorem 4.7. Let $n \ge 2^{25}$ be a prime number. Then there exists $r \in \mathbb{N}$ such that $r \le \log^5 n$ and $\mathrm{ord}_r(n) > \log^2 n$.

Proof: For $n \ge 2^{25}$ we have $n > \log^5 n$, and with $m = \lfloor \log^2 n \rfloor$, by Lemma 4.6 there is a suitable number $r$ satisfying $r \le \log^5 n$.
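The number $r$ promised by Theorem 4.7 can be found by a straightforward search. The following sketch is an illustration under assumptions: the function names `multiplicative_order` and `find_r` are hypothetical, the search starts at $r = 2$ and skips values of $r$ that are not coprime to $n$, and the logarithm is evaluated in floating point, which is adequate for experiments but not for a rigorous implementation.

```python
from math import gcd, log2


def multiplicative_order(n, r):
    """Order of n in (Z/rZ)^*; assumes r >= 2 and gcd(n, r) == 1."""
    x = n % r
    k = 1
    while x != 1:
        x = x * n % r
        k += 1
    return k


def find_r(n):
    """Smallest r >= 2 with gcd(n, r) == 1 and ord_r(n) > log^2 n."""
    bound = log2(n) ** 2
    r = 2
    while True:
        if gcd(n, r) == 1 and multiplicative_order(n, r) > bound:
            return r
        r += 1


n = 2 ** 31 - 1            # a Mersenne prime, larger than 2^25
r = find_r(n)
print(r, multiplicative_order(n, r), log2(n) ** 2, log2(n) ** 5)
```

For a prime $n \ge 2^{25}$, Theorem 4.7 guarantees that the loop stops with some $r \le \log^5 n$; since $\mathrm{ord}_r(n) < r$, the value found is always larger than $\log^2 n$.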
4.5 Agrawal–Kayal–Saxena primality test

The AKS primality test on input $n \in \mathbb{N}$ works as follows.
(1) If $n < 2^{25}$, check primality of $n$ directly. From now on, we assume that $n > \log^5 n$.
(2) Check whether $n = m^k$ for a number $m$ and some $k \ge 2$. In the following, we may assume that $n \ne p^k$ for all primes $p$ and all $k \ge 2$.
(3) Search for $r \in \{\lceil \log^2 n \rceil + 1, \ldots, \lfloor \log^5 n \rfloor\}$ with $\gcd(r, n) = 1$ and $\mathrm{ord}_r(n) > \log^2 n$. If no such number $r$ can be found (or we find $r$ with $\gcd(r, n) \ne 1$), then by Theorem 4.7 the input $n$ is composite and we halt the procedure. From now on, we assume that $r \in \mathbb{N}$ with $\gcd(r, n) = 1$ and $\log^2 n < r \le \log^5 n$, as well as $\mathrm{ord}_r(n) > \log^2 n$.
(4) For all $a \in \{2, \ldots, r - 1\}$ check whether $\gcd(a, n) = 1$. From now on, we additionally assume that $\gcd(a, n) = 1$ for all $1 \le a \le r$.
(5) Let $\ell = \lfloor \sqrt{\varphi(r)} \log n \rfloor$ and for all $a \in \{1, \ldots, \ell\}$ check the congruence
$$(X + a)^n \equiv X^n + a \mod (X^r - 1, n)$$
If the answer is positive for all $a \in \{1, \ldots, \ell\}$, then $n$ is prime, otherwise $n$ is composite.

Theorem 4.8. The AKS primality test is correct and it runs in polynomial time (in the input size $\log n$).

The remainder of this section is devoted to the proof of Theorem 4.8. The number $2^{25}$ is a constant and has no influence on the asymptotic runtime. For each exponent $k \le \log n$ it can be checked by binary search whether a number $m$ exists such that $n = m^k$. In particular, the test whether $n = m^k$ can be done in polynomial time. The values $a, \ell, r$ are bounded by $\log^5 n$. The Euclidean algorithm for testing whether $\gcd(a, n) = 1$ has polynomial running time, and for each pair $(a, r)$, the congruence
$$(X + a)^n \equiv X^n + a \mod (X^r - 1, n)$$
can be checked in polynomial time as well.

For a prime number $n$, the AKS test outputs that $n$ is prime. This follows from Lemma 4.1 and Theorem 4.7. It remains to show that the test is able to expose composite numbers. We therefore assume, to the contrary, that the AKS algorithm claims $n$ to be prime, even though there exists a prime number $p < n$ which divides $n$. Overall, we can make the following assumptions:
– $n$ is composite
– $p$ is prime and $p \mid n$
– $\forall k \ge 1 : n \ne p^k$
– $r \le \log^5 n < n$
– $\forall a \in \{1, \ldots, r\} : \gcd(a, n) = 1$
– $\log^2 n < \mathrm{ord}_r(n) \le \varphi(r)$
– $\ell = \lfloor \sqrt{\varphi(r)} \log n \rfloor \le \varphi(r)$
– $\forall a \in \{1, \ldots, \ell\} : (X + a)^n \equiv X^n + a \bmod (X^r - 1, n)$

From these assumptions, we will derive a contradiction. Let $\mathbb{F}$ be the splitting field of the polynomial $f(X) = X^r - 1$ over $\mathbb{F}_p$. Then, $\mathbb{F}$ is a finite field extension of $\mathbb{F}_p$, in which the polynomial $f(X)$ splits into linear factors
$$f(X) = X^r - 1 = \prod_{i=1}^{r} (X - \alpha_i)$$
with $\alpha_i \in \mathbb{F}$. Since $f'(X) = rX^{r-1}$ with $r \not\equiv 0 \bmod p$, we have $f'(\alpha_i) \ne 0$ for all roots $\alpha_i$. Therefore, the roots are pairwise distinct by Proposition 1.49. Thus, the set
$$U = \{\alpha_1, \ldots, \alpha_r\} \subseteq \mathbb{F}^*$$
is a subgroup of order $r$ which, according to Theorem 1.56, is cyclic. There are $\varphi(r)$ possibilities for choosing a generator of $U$. Let
$$L = \{0, 1, \ldots, \ell\}$$
Since neither 0 nor 1 generate $U$ and since $\ell \le \varphi(r)$, some generator $\alpha$ of $U$ satisfies $-\alpha \ne a$ for all $a \in L$. We fix $\alpha$ and we note that $\alpha + a \ne 0$ in $\mathbb{F}$ for all $a \in L$. We have $r < p$ since $\gcd(a, n) = 1$ for all $a \in \{1, \ldots, r\}$. Because of $\ell < r < p$, we can consider $L$ as a subset of $\mathbb{F}_p \subseteq \mathbb{F}$.

Definition 4.9. We call a number $m \in \mathbb{N}$ introspective for a polynomial $g(X) \in \mathbb{F}_p[X]$ if
$$\forall \beta \in U : g(\beta)^m = g(\beta^m)$$

Lemma 4.10. Let $m_1, m_2$ be introspective for $g(X) \in \mathbb{F}_p[X]$. Then also $m_1 m_2$ is introspective for $g(X)$.

Proof: For $\beta \in U$, we also have $\beta^{m_1} \in U$. This yields $g(\beta)^{m_1 m_2} = g(\beta^{m_1})^{m_2} = g(\beta^{m_1 m_2})$.

Lemma 4.11. The numbers $\frac{n}{p}$ and $p$ are introspective for all polynomials $(X + a)$ with $a \in L$.

Proof: The prime $p$ is introspective for every polynomial $g(X) \in \mathbb{F}_p[X]$ because in characteristic $p$, the mapping $x \mapsto x^p$ is an automorphism of fields. Thus, $g(\beta)^p = g(\beta^p)$ for all $\beta \in \mathbb{F}$. It remains to show $(\beta + a)^{n/p} = \beta^{n/p} + a$ for all $\beta \in U$. Again, we take advantage of the fact that $x \mapsto x^p$ is an automorphism of fields. Therefore, in $\mathbb{F}$, we obtain
$$\begin{aligned}
& (\beta + a)^{n/p} = \beta^{n/p} + a \\
\Leftrightarrow\ & (\beta + a)^n = (\beta^{n/p} + a)^p \\
\Leftrightarrow\ & (\beta + a)^n = \beta^n + a^p \\
\Leftrightarrow\ & (\beta + a)^n = \beta^n + a
\end{aligned}$$
By assumption, we have $(X + a)^n \equiv X^n + a \bmod (X^r - 1, n)$. This yields $(X + a)^n \equiv X^n + a \bmod (X^r - 1, p)$. Together with $\beta^r = 1$, we conclude $(\beta + a)^n = \beta^n + a$ in $\mathbb{F}$. This shows that $\frac{n}{p}$ is introspective.

Lemma 4.12. Let $m$ be introspective for the polynomials $g(X), h(X) \in \mathbb{F}_p[X]$. Then $m$ is introspective for the polynomial $(g \cdot h)(X) = g(X) \cdot h(X)$.

Proof: We have $(g \cdot h)(\beta)^m = g(\beta)^m \cdot h(\beta)^m = g(\beta^m) \cdot h(\beta^m) = (g \cdot h)(\beta^m)$.

Lemma 4.13. For all $i, j \ge 0$, all $a \in L$ and all $e_a \in \mathbb{N}$, the number $\left(\frac{n}{p}\right)^i p^j$ is introspective for the polynomial $g(X) = \prod_{a \in L} (X + a)^{e_a}$.

Proof: By Lemma 4.11 and Lemma 4.12, both $\frac{n}{p}$ and $p$ are introspective for $g(X)$. Lemma 4.10 shows that $i$ and $j$ can be chosen arbitrarily.
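In the following, let
$$G = \Bigl\{\, \Bigl(\frac{n}{p}\Bigr)^i p^j \bmod r \in \mathbb{Z}/r\mathbb{Z} \Bigm| i, j \ge 0 \,\Bigr\}$$
be the subgroup of $(\mathbb{Z}/r\mathbb{Z})^*$ generated by $\frac{n}{p}$ and $p$. Define $t = |G|$ and observe that $\log^2 n < \mathrm{ord}_r(n) \le t \le \varphi(r)$. Let $P$ be the set of polynomials
$$P = \Bigl\{\, \prod_{a \in L} (X + a)^{e_a} \Bigm| \sum_{a \in L} e_a < t \,\Bigr\}$$
Equation (4.1) shows $|P| = \binom{t+\ell}{t-1}$. Let
$$\mathcal{G} = \{\, g(\alpha) \in \mathbb{F} \mid g \in P \,\}$$
Then $\mathcal{G} \subseteq \mathbb{F}^*$ because $\alpha + a \ne 0$ holds for all $a \in L = \{0, \ldots, \ell\}$. Here, $G$ denotes the subgroup of $(\mathbb{Z}/r\mathbb{Z})^*$ and $\mathcal{G}$ the subset of $\mathbb{F}^*$. We give two different estimates for $|\mathcal{G}|$, and then show that they contradict each other.

Lemma 4.14. The mapping $P \to \mathbb{F}^*$ with $g(X) \mapsto g(\alpha)$ is injective.

Proof: Let $g_1(X), g_2(X) \in P$ with $g_1(\alpha) = g_2(\alpha)$. Consider $m = \left(\frac{n}{p}\right)^i p^j$. Then, we have
$$g_1(\alpha^m) = g_1(\alpha)^m = g_2(\alpha)^m = g_2(\alpha^m)$$
Thus, the polynomial $g_1(X) - g_2(X)$ has at least the $t$ roots $\alpha^m$ with $m \in G$. On the other hand, the degree of $g_1(X) - g_2(X)$ is less than $t$, so $g_1(X) = g_2(X)$ and therefore the mapping $g(X) \mapsto g(\alpha)$ is injective.

As a consequence of Lemma 4.14, we have $|\mathcal{G}| = |P| = \binom{t+\ell}{t-1}$. We consider a set of introspective numbers $\hat{I} \subseteq \mathbb{N}$ defined as follows:
$$\hat{I} = \Bigl\{\, \Bigl(\frac{n}{p}\Bigr)^i p^j \Bigm| 0 \le i, j \le \lfloor \sqrt{t} \rfloor \,\Bigr\}$$
At this point, we have to use the assumption that $n$ has at least two prime divisors, thus ensuring that $\hat{I}$ contains sufficiently many numbers. Since $\frac{n}{p}$ has a prime divisor different from $p$, the set $\hat{I}$ contains exactly $(\lfloor \sqrt{t} \rfloor + 1)^2$ elements, and $|\hat{I}| > t = |G|$. The mapping $\hat{I} \to G : m \mapsto (m \bmod r)$ thus cannot be injective, which means that there exist $m_1, m_2$ with $1 \le m_1 < m_2$ and $m_1, m_2 \in \hat{I}$, such that $m_1 \equiv m_2 \bmod r$. We further remark that $m_2 \le n^{\sqrt{t}}$. Now we show that all elements $g(\alpha) \in \mathcal{G}$ are roots of the nontrivial polynomial $Y^{m_1} - Y^{m_2}$. From $\alpha^r = 1$, we obtain $\alpha^{m_1} = \alpha^{m_2}$. For $g(\alpha) \in \mathcal{G}$ it follows that $g(\alpha)^{m_1} - g(\alpha)^{m_2} = g(\alpha^{m_1}) - g(\alpha^{m_2}) = 0$. As a consequence $|\mathcal{G}| \le m_2 \le n^{\sqrt{t}}$ because the polynomial $Y^{m_1} - Y^{m_2}$ has at most $m_2$ roots. We obtain a contradiction by the following computation:
$$\begin{aligned}
|\mathcal{G}| &= \binom{t+\ell}{t-1} && \text{Lemma 4.14} \\
&\ge \binom{\ell + 1 + \lfloor \sqrt{t} \log n \rfloor}{\lfloor \sqrt{t} \log n \rfloor} && \text{since } \sqrt{t} > \log n \\
&\ge \binom{2\lfloor \sqrt{t} \log n \rfloor + 1}{\lfloor \sqrt{t} \log n \rfloor} && \text{since } \ell = \lfloor \sqrt{\varphi(r)} \log n \rfloor \ge \lfloor \sqrt{t} \log n \rfloor \\
&> 2^{\sqrt{t} \log n} && \text{using Equation (4.2), as } n \text{ is large enough} \\
&= n^{\sqrt{t}}
\end{aligned}$$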
This proves the correctness of the AKS test and thus Theorem 4.8. We now summarize the steps of the proof. The assumptions in our proof by contradiction lead us to a field $\mathbb{F}$ and a choice of $\alpha \in \mathbb{F}$. A principle used several times is the fact that in fields the zero polynomial is the only polynomial having more roots than indicated by the degree; in combinatorics, this argument is known as the polynomial method. Then, a set $G$ of exponents is considered which, applied to $\alpha$, lead to different elements $\alpha^m$ from $\mathbb{F}$. Depending on $|G|$, we describe a set $P$ of polynomials. For the set $P$, by construction, we know its cardinality. The set $P$ is embedded into the multiplicative group of $\mathbb{F}$ by evaluation at $\alpha$. The subset of $\mathbb{F}$ which corresponds to $P$ is called $\mathcal{G}$ and we obtain $|\mathcal{G}| = |P|$. The next step shows that all elements of $\mathcal{G}$ are roots of a polynomial of small degree. This yields a second estimate of $|\mathcal{G}|$ which, however, contradicts the first.
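Step (2) of the test relies on the binary search mentioned in the running-time argument of this section: for each exponent $k \le \log n$ one searches for $m$ with $m^k = n$. A minimal sketch of this check follows; the function names `integer_root` and `is_perfect_power` are hypothetical, and only exact integer arithmetic is used.

```python
def integer_root(n, k):
    """Largest integer m with m**k <= n, found by binary search (n >= 1, k >= 1)."""
    lo, hi = 1, 1 << (n.bit_length() // k + 1)   # hi**k > n, so the root is below hi
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** k <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def is_perfect_power(n):
    """True iff n = m**k for some integers m >= 2 and k >= 2."""
    # n = m^k with m >= 2 forces k <= log2(n) <= n.bit_length()
    for k in range(2, n.bit_length() + 1):
        m = integer_root(n, k)
        if m > 1 and m ** k == n:
            return True
    return False
```

There are $O(\log n)$ exponents, and each binary search needs $O(\log n)$ iterations on numbers of $O(\log n)$ bits, which is in line with the polynomial bound stated in the text.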
Summary

Notions
– PRIMES
– deterministic polynomial time P
– $f_1(X) \equiv f_2(X) \bmod (X^r - 1, n)$
– $\binom{A}{k}$ denotes the set of $k$-subsets of $A$
– $\mathrm{lcm}(n)$ is the least common multiple of $1, \ldots, n$
– $\mathrm{ord}_r(n)$ is the order of $n$ in $(\mathbb{Z}/r\mathbb{Z})^*$
– introspective number
– polynomial method

Methods and results
– Let $\gcd(a, n) = 1$. Then, $n$ is prime ⇔ $(X + a)^n \equiv X^n + a \bmod n$
– For every finite set $A$, we have $\left|\binom{A}{k}\right| = \binom{|A|}{k}$.
– $\left|\{\, (e_1, \ldots, e_s) \in \mathbb{N}^s \mid \sum_{i=1}^{s} e_i \le r \,\}\right| = \binom{r+s}{s}$
– $\left|\{\, (e_0, \ldots, e_\ell) \in \mathbb{N}^{\ell+1} \mid \sum_{i=0}^{\ell} e_i < t \,\}\right| = \binom{t+\ell}{t-1}$
– $\mathrm{lcm}(n) > 2^n$ for $n \ge 7$
– If $n \ge 2^{25}$ is prime, then there exists $r \le \log^5 n$ with $\mathrm{ord}_r(n) > \log^2 n$.
– AKS primality test on input $n \ge 2^{25}$:
(1) If $n = m^k$ for $k \ge 2$, then output “n is composite.”
(2) Compute $r \le \log^5 n$ with $\gcd(r, n) = 1$ and $\mathrm{ord}_r(n) > \log^2 n$.
(3) If such an $r$ does not exist, then output “n is composite.”
(4) If $\gcd(a, n) \ne 1$ for some $a \in \{2, \ldots, r - 1\}$, then output “n is composite.”
(5) Let $\ell = \lfloor \sqrt{\varphi(r)} \log n \rfloor$.
(6) If $(X + a)^n \not\equiv X^n + a \bmod (X^r - 1, n)$ for some $a \in \{1, \ldots, \ell\}$, then output “n is composite.”
(7) Otherwise, output “n is prime.”
– polynomial running time of the AKS test
– If $n$ is prime, then the AKS test outputs that $n$ is prime.
– If $n$ is composite, then the AKS test outputs that $n$ is composite.
5 Elliptic curves

Johann Carl Friedrich Gauss (1777–1855) showed that the complex numbers ℂ can be identified with the plane ℝ × ℝ. By contracting the infinitely far border of this plane to a single point, we obtain the surface of a sphere. The extra point in the drawing corresponds to the North Pole and the real numbers together with this point appear as a circle on the sphere.
The typical shape of an elliptic curve over ℂ is more complicated and it emerges when we consider a grid. In the simplest case, this is the additive subgroup $L = \mathbb{Z} + i\mathbb{Z}$. For the grid $L$, we can form the quotient group $\mathbb{C}/L$. We can visualize this group as a three-dimensional image similar to the surface of a sphere. However, $\mathbb{C}/L$ does not give a sphere but a torus, that is, the surface of a donut. We start with the unit square $[0, 1] \times [0, 1]$ inside $\mathbb{R} \times \mathbb{R}$. Then, the four corners of $[0, 1] \times [0, 1]$ belong to the grid $L$ (when identifying $\mathbb{R} \times \mathbb{R}$ with $\mathbb{C}$); and we obtain $\mathbb{C}/L$ by gluing together the top and the bottom edge as well as the left and the right edge. This results in a torus as a visualization of $\mathbb{C}/L$. In $\mathbb{C}/L$, seen as an additive group, there are exactly four elements of order at most 2. In the unit square $[0, 1] \times [0, 1]$, they correspond to the points $(0, 0)$, $(0, 1/2)$, $(1/2, 0)$, and $(1/2, 1/2)$. Thus, the Klein four-group $\mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2\mathbb{Z}$ appears as a subgroup of $\mathbb{C}/L$.

Consider a cubic equation of type $Y^2 = X^3 + AX + B$ with constants $A$ and $B$ and two variables $X$ and $Y$ over the complex numbers $\mathbb{C}$; and let $E = \{\, (a, b) \in \mathbb{C} \times \mathbb{C} \mid b^2 = a^3 + Aa + B \,\}$ be its set of solutions. If the roots of the polynomial $X^3 + AX + B$ are all different, then $E$ is a smooth surface of (real) dimension 2 inside the space $\mathbb{C} \times \mathbb{C}$ of dimension 4. The set $E$ can naturally be identified with the surface of a torus; as in the step from a plane to a sphere, one has to add some extra “point at infinity.” The connection between $E$ and the surface of a torus is given by the following parametrization: there exist $w_1, w_2 \in \mathbb{C} \setminus \{0\}$ with $\frac{w_1}{w_2} \notin \mathbb{R}$ and a total function $f : \mathbb{C} \to \mathbb{C} \cup \{\infty\}$ with $f(z) = f(z + w)$ for all $w$ in the grid $L = \{\, mw_1 + nw_2 \mid m, n \in \mathbb{Z} \,\}$ such that
$$E \cup \{(\infty, \infty)\} = \{\, (f(z), f'(z)) \mid z \in \mathbb{C} \,\}$$
As a partial mapping from $\mathbb{C}$ to $\mathbb{C}$, the function $f$ is meromorphic and $f'$ is its derivative; we set $f'(z) = \infty$ whenever $f'(z)$ is undefined over $\mathbb{C}$. The parametrization has the additional property that the mapping $\mathbb{C}/L \to E \cup \{(\infty, \infty)\}$ with $z \mapsto (f(z), f'(z))$ is bijective. This allows us to identify $E \cup \{(\infty, \infty)\}$ with the torus $\mathbb{C}/L$. A meromorphic function $f$ with $f(z) = f(z + w)$ for all $w \in L$ is called elliptic; and the above connection coins the term elliptic curve. For more details on parametrizations of elliptic curves, we refer to the existing literature and textbooks such as [57] or [90].
The real solutions of $Y^2 = X^3 + AX + B$ can be obtained as the intersection of a plane and the torus defined by $E$. For instance, taking the intersection of the torus with the equatorial plane, we obtain two circles, one positioned inside the other. If we cut the outer circle and pull the endpoints to infinity, then the standard image of an elliptic curve over the real numbers emerges (see the second plot in Figure 5.1), and the points on this curve still satisfy the cubic equation. We can imagine that, in this case, the point $(\infty, \infty)$ from the torus has become an infinitely distant point, called the point at infinity. The three plots in Figure 5.1 show the elliptic curves (over $\mathbb{R}$) given by the equations $Y^2 = X^3 - X + B$ with $B \in \{-1, 0, 1\}$. The third one, for example, shows the set of points
\[ \{ (a,b) \in \mathbb{R} \times \mathbb{R} \mid b^2 = a^3 - a + 1 \} \]
Fig. 5.1. Elliptic curves over $\mathbb{R}$. The three panels show $Y^2 = X^3 - X - 1$, $Y^2 = X^3 - X$, and $Y^2 = X^3 - X + 1$.
We call such a set of points a curve. If $(a, b)$ is a point on the curve, then $(a, -b)$ is another one. The picture does not really look like an ellipse and, in fact, the connection between ellipses and elliptic curves can only be understood via a detour to elliptic functions.
Due to our focus on discrete methods, elliptic curves over finite fields are of particular interest. To this end, we start with an algebraic equation $Y^2 = X^3 + AX + B$ with coefficients in $\mathbb{Z}$ and investigate the points satisfying this equation modulo some prime $p$, that is,
\[ \{ (a,b) \in \mathbb{F}_p \times \mathbb{F}_p \mid b^2 = a^3 + Aa + B \} \]
where $\mathbb{F}_p$ denotes the finite field $\mathbb{Z}/p\mathbb{Z}$. Figure 5.2 gives two examples of such "curves" for $p = 101$. There are some hidden lines and circular structures, but the exact scheme
Fig. 5.2. Elliptic curves over $\mathbb{Z}/101\mathbb{Z}$. The two panels show $Y^2 \equiv X^3 + 48X + 51 \bmod 101$ and $Y^2 \equiv X^3 + 2X + 3 \bmod 101$; both axes range from $-50$ to $50$.
looks rather random. This "regular chaos" makes elliptic curves of significant interest for cryptographic applications. In the distant future, the Rivest–Shamir–Adleman (RSA) algorithm could be broken, while cryptography using elliptic curves might remain secure. It is widely believed that techniques such as the quadratic sieve (Section 3.6.3) or the index calculus method (Section 3.7.4) cannot be efficiently adapted to elliptic curves; this supports the use of elliptic curves in cryptography and it is the main reason for smaller key sizes in elliptic curve cryptosystems. The position of the points follows a precise geometric structure, thereby forming an Abelian group. A computer is able to perform computations in this group's structure very efficiently.
The fascination of elliptic curves has influenced whole generations of mathematicians. The study of such curves led Andrew Wiles (born 1953) and Richard Taylor (born 1962) to the proof of Fermat's Last Theorem, which states that, for every integer $n \geqslant 3$, the equation $X^n + Y^n = Z^n$ has no positive integer solution.
In the following, let $K$ denote a field and $k$ an algebraically closed field containing $K$. Thus, $k$ is always infinite; it is a field extension of $K$ in which each polynomial splits into linear factors. For $K = \mathbb{Q}$ or $K = \mathbb{R}$, we choose $k = \mathbb{C}$. For $K = \mathbb{F}_p$, we content ourselves with the knowledge that an algebraic closure exists and that it is infinite. We do not treat the subject in full generality: fields of characteristic 2 are omitted completely, and for characteristic 3, we only consider curves in Weierstrass normal form (Karl Theodor Wilhelm Weierstrass, 1815–1897). A general elliptic curve is defined by an equation of the form
\[ Y^2 + CXY + DY = X^3 + EX^2 + AX + B \]
for constants $A$, $B$, $C$, $D$, and $E$. The equation is in Weierstrass normal form if $C = D = E = 0$.
If the characteristic is neither 2 nor 3, then, by affine transformations, every elliptic curve is equivalent to one in Weierstrass normal form; see Exercise 5.1. We choose coefficients $A, B \in K$ and consider the polynomial
\[ s(X) = X^3 + AX + B = (X - a_1)(X - a_2)(X - a_3) \]
We are only interested in the (generic) case where the three roots $a_1, a_2, a_3 \in k$ are pairwise distinct. This is equivalent to $4A^3 + 27B^2 \neq 0$; see Exercise 5.2. The polynomial $s(X)$ leads us to an equation with two variables
\[ Y^2 = X^3 + AX + B \]
The solutions of this equation can be interpreted as a curve in $K \times K$. We call the set
\[ E(K) = \{ (a,b) \in K \times K \mid b^2 = a^3 + Aa + B \} \]
an elliptic curve over $K$. In particular, $E(K)$ is a subset of $E(k)$. As we have seen earlier, it is natural to add a so-called point at infinity, denoted by $O$. By abuse of notation, we also call
\[ \widetilde{E}(K) = E(K) \cup \{O\} \]
an elliptic curve. In projective coordinates, the point at infinity would belong to the curve in a natural way from the very beginning. Unlike $E(K)$, the curve $E(k)$ always contains infinitely many points: since $k$ is algebraically closed, we can always extract square roots. In particular, for each of the infinitely many $a \in k$ there exists $b \in k$ with $(a, b) \in E(k)$.
For a point $P = (a, b) \in k \times k$, we define $\overline{P} = (a, -b)$ and $\overline{O} = O$. Thus, besides $O$, exactly the three points $P_i = (a_i, 0)$ in $\widetilde{E}(k)$ satisfy $\overline{P_i} = P_i$. As we will see later, these four points form a Klein four-group with $O$ as the neutral element. If the polynomial $s(X)$ already splits into linear factors over $K$, then the points $P_i$ are in $E(K)$.
If $q$ is a prime power, we have $|E(\mathbb{F}_q)| \leqslant 2q$ because each $X$-coordinate provides at most two solutions in the $Y$-coordinate. A deep result by Helmut Hasse (1898–1979) provides a much more precise statement:
\[ q - 2\sqrt{q} \leqslant |E(\mathbb{F}_q)| \leqslant q + 2\sqrt{q} \tag{5.1} \]
Formula (5.1), in a way, provides a connection between chance and necessity¹. Suppose for every $a \in \mathbb{F}_q$, we independently and uniformly at random chose $c \in \mathbb{F}_q$ and counted the pairs $(a, b)$, $(a, -b)$ if and only if $c = b^2$ is a square. Then, we would expect $q$ pairs with a deviation of about $\sqrt{q}$. One can think of $c$ as a randomized equivalent of $s(a)$. By Hasse's theorem, this probabilistic behavior coincides rather precisely with the actual bounds.
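The bound (5.1) can be checked experimentally for small primes. The following is a minimal sketch, not an efficient point-counting method: it enumerates all pairs by brute force, which is only feasible for small $p$. The function names are chosen for this illustration, and the two parameter pairs are those of Figure 5.2.

```python
def discriminant_is_nonzero(A, B, p):
    """True if 4A^3 + 27B^2 != 0 in F_p, i.e. s(X) has three distinct roots."""
    return (4 * A**3 + 27 * B**2) % p != 0


def affine_points(A, B, p):
    """Brute-force list of all (a, b) in F_p x F_p with b^2 = a^3 + A*a + B."""
    # Group the elements b of F_p by the value of b^2.
    square_roots = {}
    for b in range(p):
        square_roots.setdefault(b * b % p, []).append(b)
    return [(a, b)
            for a in range(p)
            for b in square_roots.get((a**3 + A * a + B) % p, [])]


def satisfies_hasse(A, B, p):
    """Check q - 2*sqrt(q) <= |E(F_q)| <= q + 2*sqrt(q) for q = p."""
    n = len(affine_points(A, B, p))
    # |n - p| <= 2*sqrt(p) is equivalent to (n - p)^2 <= 4p over the integers.
    return (n - p) ** 2 <= 4 * p


# The two curves of Figure 5.2 over Z/101Z.
for A, B in [(48, 51), (2, 3)]:
    print(A, B, len(affine_points(A, B, 101)), satisfies_hasse(A, B, 101))
```

The comparison is done on squares so that no floating-point square root is needed.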
5.1 Group law
The interest in elliptic curves in the context of cryptography or number theoretic algorithms is based upon a binary operation $+$ on $\widetilde{E}(K)$ such that $(\widetilde{E}(K), +, O)$ forms an Abelian group. We present a geometric approach, which uses lines and their intersection with $\widetilde{E}(K)$ to define the addition. It turns out to be quite obvious that $+$ maps two points of $\widetilde{E}(K)$ to a point on $\widetilde{E}(K)$, that $O$ is the neutral element, that every point has an inverse, and that $+$ is commutative. The only unobvious property is associativity, which is surprisingly difficult to prove. We give a complete and self-contained proof of associativity in Sections 5.1.2 and 5.1.3.
5.1.1 Lines
A line in $k \times k$ is a set of points $L$ that can either be described by the equation $X = a$ or by $Y = \mu X + \nu$. In the first case, we speak of a vertical. We imagine the point at infinity $O$ lying on every vertical but on no other line.
1 According to Jacques Monod’s (1910–1976) book Le hasard et la nécessité. Essai sur la philosophie naturelle de la biologie moderne from 1970.
We investigate the intersection of lines with points on an elliptic curve $E(k)$ given by $Y^2 = s(X)$ with $s(X) = X^3 + AX + B$. If $L$ is a vertical given by $X = a$, then the points $P = (a, b)$ and $\overline{P} = (a, -b)$ are in the intersection $L \cap E(k)$ whenever $b^2 = a^3 + Aa + B$. For $b = 0$, we have $P = \overline{P}$ and the point is counted twice. Together with the point at infinity $O$, this gives exactly three points in the intersection of $\widetilde{E}(k)$ and the vertical.
If $L$ is described by the equation $Y = \mu X + \nu$ and if $P = (a, b)$ belongs to the intersection of $L$ and $E(k)$, then $a$ is a root of the following polynomial:
\[ t(X) = s(X) - (\mu X + \nu)^2 = X^3 - \mu^2 X^2 + (A - 2\mu\nu)X + B - \nu^2 \]
This polynomial of degree 3 over $k$ has exactly three (not necessarily distinct) roots, and each root uniquely defines an associated $Y$-coordinate via the linear equation. If we write $t(X) = (X - x_1)(X - x_2)(X - x_3)$, then $a \in \{x_1, x_2, x_3\}$ and, by comparing the coefficients of $X^2$, we obtain $x_1 + x_2 + x_3 = \mu^2$. Let $y_i = \mu x_i + \nu$ for $i \in \{1, 2, 3\}$. Then, we obtain
\[ L \cap E(k) = \{ (x_i, y_i) \mid i = 1, 2, 3 \} \]
If we know the two points $(x_1, y_1)$ and $(x_2, y_2)$, then we can determine the third point $(x_3, y_3)$ using the following formulas:
\[ x_3 = \mu^2 - x_2 - x_1, \qquad y_3 = \mu x_3 + \nu \]
We find the parameter $\nu$ by
\[ \nu = y_1 - \mu x_1 \]
To determine the parameter $\mu$, we have to distinguish between the cases $x_1 \neq x_2$ and $x_1 = x_2$. Assume that $x_1 \neq x_2$. Then, $\mu$ can be computed as
\[ \mu = \frac{y_2 - y_1}{x_2 - x_1} \]
Otherwise, if $x_1 = x_2$, then $x_1$ is a double root of $t(X)$, and we have
\[ 0 = t'(x_1) = 3x_1^2 - 2\mu^2 x_1 + A - 2\mu\nu = 3x_1^2 + A - 2\mu y_1 \]
We have $y_1 \neq 0$ because otherwise $s'(x_1) = 0$, that is, $x_1$ would also be a double root of $s(X)$, a case which we have excluded from our considerations. Together with the fact that $2 \neq 0$ in $k$, this yields
\[ \mu = \frac{3x_1^2 + A}{2y_1} \]
The value $\mu$ is the slope of the tangent of the elliptic curve at the point $(x_1, y_1)$.
Many computations can be performed in the field $K$ without having to consider the algebraic closure $k$. Let $x_1, A, B, \mu, \nu \in K$ and as given earlier $t(X) = s(X) - (\mu X + \nu)^2 = (X - x_1)(X - x_2)(X - x_3)$. Then, $x_2 \in K$ if and only if $x_3 \in K$. If $x_2$ is in $K$, then $L \cap E(k) = \{ (x_i, y_i) \mid i = 1, 2, 3 \} \subseteq E(K)$. Thus, if two points of a line belong
to $\widetilde{E}(K)$, then the third point is in $\widetilde{E}(K)$, too. This leads to the crucial idea of how to obtain the commutative group structure on the set $\widetilde{E}(K)$. We let the point at infinity $O$ be the neutral element, and the sum of any three points of $\widetilde{E}(K)$ (with multiplicities) lying on a common line is defined to be zero.
We make this explicit by giving formulas for the group operation. Several cases need to be distinguished. For all $P \in \widetilde{E}(K)$, we define $P + O = O + P = P$. The sum of two points $P = (x_1, y_1)$ and $Q = (x_2, y_2)$ of $E(K)$ is computed as follows:
(a) If $x_1 \neq x_2$, we draw a line through $P$ and $Q$. This means we compute $\mu = \frac{y_2 - y_1}{x_2 - x_1}$ and $x_3 = \mu^2 - x_2 - x_1$ and finally $y_3 = y_1 + \mu(x_3 - x_1)$. We let $\overline{R} = (x_3, -y_3)$ and
\[ P + Q = \overline{R} \]
(b) If $x_1 = x_2$ and $y_1 \neq -y_2$, then $y_1 = y_2 \neq 0$ and $P = Q$. We determine the tangent slope $\mu = \frac{3x_1^2 + A}{2y_1}$ and $x_3 = \mu^2 - 2x_1$ as well as $y_3 = y_1 + \mu(x_3 - x_1)$. The third point on the line through $P$ and $Q$ is $R = (x_3, y_3)$. To achieve $P + Q + R = O$, we define
\[ 2P = P + Q = \overline{R} \]
(c) Finally, if $x_1 = x_2$ and $y_1 = -y_2$, then $P$ and $Q$ are on a vertical, and the third point on this line is $O = \overline{O}$. Therefore, we define
\[ P + Q = O \]
Theorem 5.1. The aforementioned operation on $\widetilde{E}(K)$ defines the structure of an Abelian group with $O$ as its neutral element.
Proof: We have already seen that $P + Q \in \widetilde{E}(K)$ for $P, Q \in \widetilde{E}(K)$. From the rules given earlier it follows that
– $O$ is the neutral element,
– the inverse $-P$ of $P = (a, b) \in E(K)$ is $\overline{P} = (a, -b)$, and
– we have $P + Q = Q + P$ for all $P, Q \in \widetilde{E}(K)$.
The only condition which is not clear and, in fact, difficult to show, is the associative law
\[ (P + Q) + R = P + (Q + R) \]
In order to show associativity, we take a detour via polynomials. The presentation of this proof is the subject of the following two sections.
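The three cases (a), (b), and (c) translate directly into a short routine. The following is a minimal sketch for $K = \mathbb{F}_p$ with a prime $p > 3$; representing $O$ by `None` and the function name `add` are choices made for this illustration, and coordinates are assumed to be reduced to $\{0, \ldots, p-1\}$. The coefficient $B$ does not occur in the formulas, so it is not a parameter.

```python
O = None  # the point at infinity (representation chosen for this sketch)


def add(P, Q, A, p):
    """Sum of two points of Y^2 = X^3 + A*X + B over F_p, p > 3 prime."""
    if P is O:
        return Q
    if Q is O:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return O                                             # case (c): vertical
    if x1 != x2:
        mu = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p       # case (a): secant
    else:
        mu = (3 * x1 * x1 + A) * pow(2 * y1 % p, -1, p) % p  # case (b): tangent
    x3 = (mu * mu - x1 - x2) % p
    y3 = (y1 + mu * (x3 - x1)) % p     # R = (x3, y3) is the third point on the line
    return (x3, -y3 % p)               # P + Q is the mirror image of R


# Example on Y^2 = X^3 + X + 1 over F_5.
print(add((0, 1), (2, 1), 1, 5))   # secant case, gives (3, 4)
print(add((0, 1), (0, 1), 1, 5))   # tangent case, gives (4, 2)
```

The modular inverse is computed with the three-argument `pow` and exponent `-1`, which requires Python 3.8 or later. On a curve this small, associativity can also be confirmed by exhaustively testing all triples of points.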
5.1.2 Polynomials over elliptic curves
We define and investigate polynomials over $E(k)$. As given earlier, let $s(X) = X^3 + AX + B$. Let $a_1, a_2, a_3 \in k$ be the roots of $s(X)$; they are assumed to be pairwise different. The polynomial ring in two variables $X$ and $Y$ is denoted by $k[X, Y]$. Each polynomial $f(X, Y) \in k[X, Y]$ can be evaluated at $(a, b) \in k \times k$ to obtain $f(a, b)$. We restrict
ourselves to the evaluation at points of the elliptic curve. At these points, the evaluation of $f(X, Y) \in k[X, Y]$ coincides with the evaluation of $g(X, Y) = f(X, Y) + (Y^2 - s(X)) \cdot h(X, Y)$ for each polynomial $h(X, Y) \in k[X, Y]$. Thus, in the ring $k[X, Y]$ we can introduce an equation $Y^2 = s(X)$ and consider the quotient ring
\[ k[x, y] = k[X, Y] / \langle Y^2 = s(X) \rangle \]
where $\langle Y^2 = s(X) \rangle$ is the ideal generated by $Y^2 - s(X)$. We call $k[x, y]$ the polynomial ring over $E(k)$. The element $x \in k[x, y]$ is the residue class of $X$ modulo $\langle Y^2 = s(X) \rangle$ and, similarly, $y$ is the residue class of $Y$. For a polynomial $f \in k[x, y]$ and a point $P = (a, b) \in E(k)$, the value $f(P) = f(a, b) \in k$ is well defined, since all polynomials $g$ in the residue class of $f$ satisfy $g(P) = f(P)$ for all $P \in E(k)$.
In the following, we show that the ring $k[x, y]$ behaves very similarly to the usual polynomial ring. To this end, we introduce the norm of a polynomial, which allows us to use our knowledge about polynomials in one variable. Using the norm, we introduce a notion of degree that satisfies the degree formula. The usual notion of a degree cannot be used here since, for example, $y^2 = s(x)$ would have both degree 2 and degree 3. Then, we show that, for points on the elliptic curve, the order of a root can be defined in a reasonable manner. It is not entirely obvious how this can be achieved since, for instance, the point $(a_1, 0)$ of the elliptic curve is a root of the polynomial $x + y - a_1$, but neither $x - a_1$ nor $y$ can be factored out.
By $k[x]$, we denote the image of the polynomial ring $k[X]$ in $k[x, y]$. It is clear that $k[X]$ can be identified with $k[x]$. Note, however, that $(y^3 - y^2)(y + 1)$ is in $k[x]$ while neither $y^3 - y^2$ nor $y + 1$ is in $k[x]$. Every polynomial $f \in k[x, y]$ can be written as $f = \upsilon(x) + y \cdot w(x)$ for $\upsilon(x), w(x) \in k[x]$: Suppose that $f = \sum_{i \geqslant 0} f_i(x) y^i$ for $f_i(x) \in k[x]$. Then
\[ f = \sum_{i \geqslant 0} f_i(x) y^i = \sum_{i \geqslant 0} f_{2i}(x) s^i(x) + y \sum_{i \geqslant 0} f_{2i+1}(x) s^i(x) \]
and we can set $\upsilon(x) = \sum_{i \geqslant 0} f_{2i}(x) s^i(x)$ and $w(x) = \sum_{i \geqslant 0} f_{2i+1}(x) s^i(x)$.
Lemma 5.2. If $f \in k[x, y]$ is a polynomial with $f \neq 0$, then it has only a finite number of roots in $E(k)$.
Proof: Let $f = \upsilon(x) + y \cdot w(x)$. We reduce the problem to a polynomial with only one variable. Consider
\[ g(x) = f \cdot (\upsilon(x) - y w(x)) = \upsilon^2(x) - y^2 w^2(x) = \upsilon^2(x) - s(x) w^2(x) \in k[x] \]
If $f(P) = 0$ for infinitely many $P \in E(k)$, then the polynomial $g$ has infinitely many roots in $k$. Thus, it is the zero polynomial. Since the degrees in $x$ of $w^2(x)$ and $\upsilon^2(x)$,
respectively, are even (or $-\infty$) but the degree of $s(x)$ is 3, it follows that $\upsilon(x) = w(x) = 0$ and thus $f = 0$.
The lemma implies uniqueness of the representation $f = \upsilon(x) + y w(x)$. To see this, let $\upsilon(x) + y w(x) = \tilde{\upsilon}(x) + y \tilde{w}(x)$ and consider $g = \upsilon(x) - \tilde{\upsilon}(x) + y (w(x) - \tilde{w}(x))$. Then, $g(P) = 0$ for all $P \in E(k)$, and thus, from Lemma 5.2 we obtain $\upsilon(x) = \tilde{\upsilon}(x)$ and $w(x) = \tilde{w}(x)$.
For $f = \upsilon(x) + y w(x) \in k[x, y]$, we define $\overline{f} \in k[x, y]$ by $\overline{f} = \upsilon(x) - y w(x)$. Since the representation $f = \upsilon(x) + y w(x)$ is unique, $\overline{f}$ is well defined. Implicitly, $\overline{f}$ was already used in the proof given earlier, even though we did not know (and did not use) that it is well defined. We can now formally define the norm of $f$:
\[ N(f) = f \overline{f} = \upsilon^2(x) - s(x) w^2(x) \]
Thus, the norm $N(f)$ is a polynomial in $k[x]$. In particular, the norm of $x$ is $x^2$ and the norm of $y$ is $-s(x)$. One can easily check that $N(f \cdot g) = N(f) \cdot N(g)$. For $f \in k[x]$ let $\deg_x(f)$ be the degree of $f$ as a polynomial in $k[X]$. Then, we have
\[ \deg_x(N(f)) = \max \{ 2 \deg_x(\upsilon(x)),\ 3 + 2 \deg_x(w(x)) \} \]
This holds because the degree of $\upsilon^2(x)$ is even (or $-\infty$) but the degree of $s(x) w^2(x)$ is odd (or $-\infty$). Moreover, we note that $N(f) = 0$ implies $f = 0$. We define the degree of $f \in k[x, y]$ by
\[ \deg(f) = \deg_x(N(f)) \]
Note that $\deg(f)$ can never be 1, whereas all other values $-\infty, 0, 2, 3, \ldots$ are in the image of $\deg$. In the following, we will see that this is a key idea for understanding elliptic curves. The degree mapping factors as follows:
\[ \deg \colon k[x, y] \xrightarrow{\;N\;} k[x] \xrightarrow{\;\deg_x\;} \{-\infty\} \cup \mathbb{N} \]
These mappings are monoid homomorphisms, where the operations in $k[x, y]$ and $k[x]$ are multiplicative while in $\{-\infty\} \cup \mathbb{N}$ it is additive. Let us keep in mind that $\deg(f) \neq 1$ and $\deg(fg) = \deg(f) + \deg(g)$ for all $f, g \in k[x, y]$.
Suppose that $fg = 0$ in $k[x, y]$. Then, $N(f) N(g) = 0$ in $k[x]$ and therefore $N(f) = 0$ or $N(g) = 0$. This implies $f = 0$ or $g = 0$, thereby showing that $k[x, y]$ is an integral domain, that is, the zero polynomial 0 is the only zero divisor.
Next, we want to define the order $\mathrm{ord}_P(f)$ of a polynomial $f \in k[x, y]$ at a point $P \in E(k)$. Remember that, over an algebraically closed field $k$, every polynomial $f \in k[X]$ can be split into linear factors
\[ f(X) = \prod_{i=1}^{n} (X - x_i)^{d_i} \]
where $d_i$ is the order (or multiplicity) of the root $x_i \in k$ and $\sum_{i=1}^{n} d_i = \deg_x(f)$. For polynomials $f \in k[x, y]$, the situation is quite similar.
Theorem 5.3. Let $f \in k[x, y]$ with $f \neq 0$ and $P = (a, b) \in E(k)$. For exactly one $d \in \mathbb{N}$ there exist $g, h \in k[x, y]$ such that $g(P) \neq 0 \neq h(P)$ and
\[ fg = (x - a)^d h \quad \text{if } a \notin \{a_1, a_2, a_3\}, \qquad fg = y^d h \quad \text{if } a \in \{a_1, a_2, a_3\} \]
where $a_1, a_2, a_3$ are the roots of $s(X)$.
Proof: First, we show the uniqueness of $d$ for $a \notin \{a_1, a_2, a_3\}$. Consider $fg = (x - a)^d h$ and $f \tilde{g} = (x - a)^e \tilde{h}$ for $d > e \geqslant 0$. We obtain $(x - a)^d \tilde{g} h = f g \tilde{g} = (x - a)^e \tilde{h} g$ and therefore
\[ (x - a)^e \left( (x - a)^{d-e} \tilde{g} h - \tilde{h} g \right) = 0 \]
Since $(x - a)^e$ is not the zero polynomial, we obtain $(x - a)^{d-e} \tilde{g} h = \tilde{h} g$. Using $d > e$ we see that $((x - a)^{d-e} \tilde{g} h)(P) = 0$ and thus $(\tilde{h} g)(P) = \tilde{h}(P) g(P) = 0$. This shows that $g(P) = 0$ or $\tilde{h}(P) = 0$. Hence, for every $a$, there is at most one $d$ satisfying $fg = (x - a)^d h$ and $g(P) \neq 0 \neq h(P)$.
Next, we show the uniqueness of the exponent $d$ for $a \in \{a_1, a_2, a_3\}$. Note that in this situation, we have $b = 0$. Let $fg = y^d h$ and $f \tilde{g} = y^e \tilde{h}$ for $d > e \geqslant 0$. As before, we obtain
\[ y^e \left( y^{d-e} \tilde{g} h - \tilde{h} g \right) = 0 \]
Since $y^e$ is nonzero, this yields $y^{d-e} \tilde{g} h = \tilde{h} g$. Now, $d > e$ implies $(y^{d-e} \tilde{g} h)(P) = 0$ and thus $\tilde{h}(P) g(P) = 0$. Therefore, $g(P) = 0$ or $\tilde{h}(P) = 0$.
We now prove the existence of $d$. By Corollary 1.47, there exists an integer $e \geqslant 0$ and polynomials $\upsilon, w \in k[x]$ with $f = (x - a)^e (\upsilon(x) + y w(x))$ and $\upsilon(a) \neq 0$ or $w(a) \neq 0$. First, consider the case $a \notin \{a_1, a_2, a_3\}$ and $b \neq 0$. If $\upsilon(a) + b w(a) \neq 0$, we can choose $d = e$, $g = 1$ and $h = \upsilon(x) + y w(x)$. If $\upsilon(a) + b w(a) = 0$, then $\upsilon(a) - b w(a) \neq 0$ has to hold because otherwise $2\upsilon(a) = 0 = 2 b w(a)$ in contradiction to $\upsilon(a) \neq 0$ or $w(a) \neq 0$. Thus, the polynomial $g = \upsilon(x) - y w(x)$ satisfies $g(P) \neq 0$ and $fg = (x - a)^e N(g)$. The polynomial $(x - a)^e N(g)$ is in $k[x]$ and can thus be written as $(x - a)^d h(x)$ with $h(a) = h(P) \neq 0$.
For the other case, without loss of generality we may assume that $a = a_1$ and $b = 0$. Then
\[ f \cdot (x - a_2)^e (x - a_3)^e = s^e(x) (\upsilon(x) + y w(x)) = y^{2e} (\upsilon(x) + y w(x)) \]
For $\upsilon(a) \neq 0$ we can choose $d = 2e$ and $g = (x - a_2)^e (x - a_3)^e$ and $h = \upsilon(x) + y w(x)$ since $b = 0$ ensures $h(P) = \upsilon(a)$. In the case of $\upsilon(a) = 0$, we have $\upsilon(x) = (x - a)^c \tilde{\upsilon}(x)$ for $c > 0$ and $\tilde{\upsilon}(a) \neq 0$. This yields
\[ f \cdot (x - a_2)^{c+e} (x - a_3)^{c+e} = y^{2e} \left( s^c(x) \tilde{\upsilon}(x) + y \tilde{w}(x) \right) \]
with $\tilde{w}(x) = (x - a_2)^c (x - a_3)^c w(x)$. Note that $\tilde{w}(a) \neq 0$. If we let $h = \tilde{w}(x) + y s^{c-1}(x) \tilde{\upsilon}(x)$, then also $h(P) = \tilde{w}(a) \neq 0$ because $b = 0$. Then
\[ f \cdot (x - a_2)^{c+e} (x - a_3)^{c+e} = y^{2e+1} h \]
and we have reached our goal with $d = 2e + 1$.
For a polynomial $f \neq 0$ and a point $P \in E(k)$ let $d \in \mathbb{N}$ be defined as in Theorem 5.3. We call $d$ the order of $f$ at $P$ and write $d = \mathrm{ord}_P(f)$. Note that $f(P) = 0$ if and only if $\mathrm{ord}_P(f) > 0$. If $\mathrm{ord}_P(f) > 0$, we also speak of the multiplicity of the root $P$. An important property of the order follows directly from the uniqueness in Theorem 5.3:
\[ \mathrm{ord}_P(f \cdot g) = \mathrm{ord}_P(f) + \mathrm{ord}_P(g) \]
for all $f, g \in k[x, y]$ and $P \in E(k)$.
Theorem 5.4. Let $f, h \in k[x, y]$ with $f \neq 0 \neq h$ and $\mathrm{ord}_P(f) \leqslant \mathrm{ord}_P(h)$ for all $P \in E(k)$. Then, there is a polynomial $g \in k[x, y]$ satisfying $fg = h$.
Proof: It suffices to show the existence of $g \in k[x, y]$ with $f \overline{f} g = h \overline{f}$ since then $\overline{f} (fg - h) = 0$ and thus $fg = h$. Because of $\mathrm{ord}_P(f \overline{f}) \leqslant \mathrm{ord}_P(h \overline{f})$ for all points $P$, by replacing $f$ by $f \overline{f}$ (and $h$ by $h \overline{f}$), we may thus assume that $f \in k[x]$. We use induction on the degree of $f$. If $\deg_x(f) = 0$, then $f \in k$ is a constant and $g = f^{-1} h$ satisfies the requirements of the theorem.
For $\deg_x(f) = 1$, it is sufficient to consider the normalized case $f = x - a$. Let $h = \upsilon(x) + y w(x)$ and let $P = (a, b)$ be a point on the curve. From $\mathrm{ord}_P(x - a) = \mathrm{ord}_{\overline{P}}(x - a) \geqslant 1$, we obtain $\mathrm{ord}_P(h) \geqslant 1$ and $\mathrm{ord}_{\overline{P}}(h) \geqslant 1$. Thus $\upsilon(a) + b w(a) = 0 = \upsilon(a) - b w(a)$. For $b \neq 0$ this yields $\upsilon(a) = w(a) = 0$ and, hence, $h$ contains a factor $x - a$. The remaining case is $b = 0$, $\upsilon(a) = 0$ and $w(a) \neq 0$. We may assume that $a = a_1$. Since $(x - a)(x - a_2)(x - a_3) = s(x) = y^2$, we have $\mathrm{ord}_P(x - a) = 2$. On the other hand, $\mathrm{ord}_P(h) = 1$ because
\[ h \cdot (x - a_2)(x - a_3) = s(x) \tilde{\upsilon}(x) + y \tilde{w}(x) = y \left( y \tilde{\upsilon}(x) + \tilde{w}(x) \right) \]
Here, $\tilde{w}(x) = (x - a_2)(x - a_3) w(x)$, and $\tilde{\upsilon}(x)$ is determined by $\upsilon(x) = (x - a) \tilde{\upsilon}(x)$. This contradicts $\mathrm{ord}_P(x - a) \leqslant \mathrm{ord}_P(h)$. Therefore, the situation $b = 0$, $\upsilon(a) = 0$ and $w(a) \neq 0$ cannot occur for $\deg_x(f) = 1$.
Now let $\deg_x(f) > 1$. Then, $f$ can be written as $f = f_1 f_2$, where both $\deg_x(f_1)$ and $\deg_x(f_2)$ are smaller than $\deg_x(f)$.
We can apply the induction hypothesis to f1 and h. This yields the existence of g1 ∈ k[x, y] with f1 ⋅ g1 = h. Now, ordP (f2 ) ⩽ ordP (g1 ) for all P ∈ E(k). Again, by induction there exists g2 ∈ k[x, y] with f2 g2 = g1 . Thus, we obtain fg2 = f1 f2 g2 = f1 g1 = h.
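The norm and the degree on $k[x, y]$ introduced in this section can be explored with a few lines of code. The following is a rough sketch under explicit assumptions: it works over $\mathbb{F}_{101}$ rather than over an algebraically closed field (which suffices for computing norms and degrees), it fixes $s(x) = x^3 - x + 1$, and it stores an element $\upsilon(x) + y\,w(x)$ as a pair of coefficient lists (lowest degree first). All names are chosen for this illustration.

```python
P_MOD = 101                   # work over F_101 instead of k (assumption of this sketch)
S = [1, P_MOD - 1, 0, 1]      # s(x) = x^3 - x + 1, coefficients from low to high degree


def trim(f):
    """Drop trailing zero coefficients; the zero polynomial is []."""
    while f and f[-1] == 0:
        f = f[:-1]
    return f


def padd(f, g):
    """Sum of two polynomials in k[x]."""
    n = max(len(f), len(g))
    f, g = f + [0] * (n - len(f)), g + [0] * (n - len(g))
    return trim([(a + b) % P_MOD for a, b in zip(f, g)])


def pmul(f, g):
    """Product of two polynomials in k[x]."""
    if not f or not g:
        return []
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % P_MOD
    return trim(out)


def mul(f, g):
    """(v1 + y*w1)(v2 + y*w2) = (v1*v2 + s*w1*w2) + y*(v1*w2 + w1*v2), using y^2 = s(x)."""
    (v1, w1), (v2, w2) = f, g
    return (padd(pmul(v1, v2), pmul(S, pmul(w1, w2))),
            padd(pmul(v1, w2), pmul(w1, v2)))


def norm(f):
    """N(f) = v^2 - s*w^2 as an element of k[x]."""
    v, w = f
    minus_s = [(-c) % P_MOD for c in S]
    return padd(pmul(v, v), pmul(minus_s, pmul(w, w)))


def deg(f):
    """deg(f) = deg_x(N(f)); the zero polynomial gets -infinity."""
    n = norm(f)
    return len(n) - 1 if n else float("-inf")


x = ([0, 1], [])   # the element x
y = ([], [1])      # the element y
print(deg(x), deg(y), deg(mul(x, y)))   # degrees 2, 3 and 2 + 3 = 5
```

In this representation, no choice of $\upsilon$ and $w$ produces degree 1, in line with the degree formula $\max\{2\deg_x(\upsilon), 3 + 2\deg_x(w)\}$.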
5.1.3 Divisors
A divisor (with nonnegative coefficients) in this context is a formal sum $D = \sum_{P \in E(k)} n_P P$, where $n_P \in \mathbb{N}$ for all $P$ and $n_P = 0$ for almost all $P$. Therefore, the sum is finite. Divisors can be added:
\[ \Big( \sum_{P \in E(k)} m_P P \Big) + \Big( \sum_{P \in E(k)} n_P P \Big) = \sum_{P \in E(k)} (m_P + n_P) P \]
Note that the addition of divisors is associative. The empty sum with $n_P = 0$ for all $P \in E(k)$ is the neutral element. The degree of a divisor $D = \sum_{P \in E(k)} n_P P$ is given by
\[ \deg(D) = \sum_{P \in E(k)} n_P \]
Obviously, $\deg(D_1 + D_2) = \deg(D_1) + \deg(D_2)$. By Lemma 5.2, every polynomial $f \in k[x, y]$ with $f \neq 0$ defines a unique divisor $\mathrm{div}(f)$ by
\[ \mathrm{div}(f) = \sum_{P \in E(k)} \mathrm{ord}_P(f) P \]
Since $\mathrm{ord}_P(f \cdot g) = \mathrm{ord}_P(f) + \mathrm{ord}_P(g)$, it follows that $\mathrm{div}(f \cdot g) = \mathrm{div}(f) + \mathrm{div}(g)$. Divisors of type $\mathrm{div}(f)$ are called principal divisors.
Let us first determine the principal divisor of $f = x - a$. Let $P = (a, b)$. If $a \in \{a_1, a_2, a_3\}$, say $a = a_1$, then we have $(x - a)(x - a_2)(x - a_3) = s(x) = y^2$ and hence $\mathrm{ord}_P(x - a) = 2$; this yields $\mathrm{div}(x - a) = 2P$. For $a \notin \{a_1, a_2, a_3\}$, we have $P \neq \overline{P}$ and $\mathrm{div}(x - a) = P + \overline{P}$. So regardless of $a$, we always obtain $\mathrm{div}(x - a) = P + \overline{P}$.
In the next step let $f = f(x) \in k[x]$. Then, $f = \prod_{i=1}^{n} (x - x_i)^{d_i}$ with $\deg_x(f) = \sum_{i=1}^{n} d_i$. If we choose elements $y_i \in k$ such that $P_i = (x_i, y_i) \in E(k)$, then the principal divisor of $f$ is
\[ \mathrm{div}(f) = \sum_{i=1}^{n} d_i (P_i + \overline{P_i}) \]
Then for $f \in k[x]$, we obtain the formula
\[ \deg(f) = 2 \deg_x(f) = \deg(\mathrm{div}(f)) \]
For a divisor $D = \sum_{P \in E(k)} n_P P$ we further define $\overline{D} = \sum_{P \in E(k)} n_P \overline{P}$. Obviously, $\deg(D) = \deg(\overline{D})$. Now let $f \in k[x, y]$, then $f(P) = \overline{f}(\overline{P})$. It follows that $\mathrm{ord}_P(f) = \mathrm{ord}_{\overline{P}}(\overline{f})$ and therefore $\mathrm{div}(\overline{f}) = \overline{\mathrm{div}(f)}$. We obtain
\[ 2 \deg(f) = \deg(N(f)) = \deg(\mathrm{div}(N(f))) = \deg(\mathrm{div}(f)) + \deg(\mathrm{div}(\overline{f})) = 2 \deg(\mathrm{div}(f)) \]
This yields
\[ \deg(f) = \deg(\mathrm{div}(f)) \]
Therefore, a principal divisor can never have degree 1. Since for $P = (a, b) \in E(k)$ the divisor $P + \overline{P} = \mathrm{div}(x - a)$ is a principal divisor, all divisors of type $D + \overline{D}$ are principal divisors.
Let us examine the principal divisors of a linear polynomial f = μx + ν + γy for μ, ν, γ ∈ k. The case γ = 0 has already been treated earlier. So, without loss of generality let γ = −1. The principal divisor can be obtained from the three points in the intersection of the line L = { (x, y) ∈ k × k | y = μx + ν } and the elliptic curve. The corresponding formulas can be found in Section 5.1.1.
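For a line through two points with different $x$-coordinates, the three intersection points can be computed and checked against the factorization $t(X) = (X - x_1)(X - x_2)(X - x_3)$ from Section 5.1.1. The following is a small sketch over $\mathbb{F}_p$, assuming that both input points lie on the curve; the function names are illustrative only.

```python
def third_intersection(P, Q, A, p):
    """For points P, Q with distinct x-coordinates on Y^2 = X^3 + A*X + B over F_p,
    return (mu, nu, R) such that the line Y = mu*X + nu passes through P, Q and R."""
    (x1, y1), (x2, y2) = P, Q
    mu = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    nu = (y1 - mu * x1) % p
    x3 = (mu * mu - x1 - x2) % p
    return mu, nu, (x3, (mu * x3 + nu) % p)


def t_coefficients(mu, nu, A, B, p):
    """Coefficients (high to low) of t(X) = s(X) - (mu*X + nu)^2."""
    return [1, -mu * mu % p, (A - 2 * mu * nu) % p, (B - nu * nu) % p]


def from_roots(x1, x2, x3, p):
    """Coefficients (high to low) of (X - x1)(X - x2)(X - x3)."""
    return [1, -(x1 + x2 + x3) % p,
            (x1 * x2 + x1 * x3 + x2 * x3) % p, -(x1 * x2 * x3) % p]


# Line through P = (0, 1) and Q = (2, 1) on Y^2 = X^3 + X + 1 over F_5.
mu, nu, R = third_intersection((0, 1), (2, 1), 1, 5)
print(mu, nu, R)
print(t_coefficients(mu, nu, 1, 1, 5) == from_roots(0, 2, R[0], 5))
```

For $f = \mu x + \nu - y$, the three points returned here make up $\mathrm{div}(f)$, which matches $\deg(f) = \max\{2 \cdot 1, 3 + 0\} = 3$.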
5.1.4 Picard group
In the following, all computations are performed modulo principal divisors. Formally, we define an equivalence relation by $C \sim D$ if there exist polynomials $f, g \in k[x, y]$ such that $C + \mathrm{div}(f) = D + \mathrm{div}(g)$. Then $[D]$, as usual, denotes the equivalence class of $D$. If $C_1 \sim D_1$ and $C_2 \sim D_2$, then also $C_1 + C_2 \sim D_1 + D_2$. Thus, the set of equivalence classes is a commutative monoid with respect to the operation $[C] + [D] = [C + D]$. All principal divisors are in the same class, which is the neutral element 0. This monoid is in fact a group because $D + \overline{D}$ is a principal divisor, that is, $[D + \overline{D}] = 0$ and hence $[\overline{D}] = -[D]$. The group consisting of these classes is called the Picard group (Charles Émile Picard, 1856–1941) and it is denoted by $\mathrm{Pic}^0(E(k))$. We show that the Picard group $\mathrm{Pic}^0(E(k))$ can be identified with $\widetilde{E}(k)$.
Let us first consider the class of principal divisors. There is no other divisor in this class: suppose that $D + \mathrm{div}(f) = \mathrm{div}(h)$ for polynomials $f, h \in k[x, y]$. Then, $\mathrm{ord}_P(f) \leqslant \mathrm{ord}_P(h)$ for all $P \in E(k)$ and, by Theorem 5.4, there exists a polynomial $g$ with $fg = h$. Thus, $D = \mathrm{div}(g)$ is a principal divisor. In particular, the zero class does not contain any divisor of degree 1. For $P \in E(k)$, it then follows that $[P] \neq 0$ in $\mathrm{Pic}^0(E(k))$ and therefore the Picard group is not trivial. If we consider two points $P, Q \in E(k)$, then $[P] \neq [Q]$ is equivalent to $[P + \overline{Q}] \neq 0$. If $Q \neq P \neq \overline{Q}$, the $x$-coordinates of $P$ and $Q$ are different and there is a unique line through $P$ and $\overline{Q}$. This line intersects the elliptic curve in a third point $R$. Then, $P + \overline{Q} + R$ is a principal divisor for a linear polynomial and therefore $[P + \overline{Q}] = [\overline{R}] \neq 0$. This shows that $[P] \neq [Q]$. If $Q \neq P = \overline{Q}$, then $P \neq (a_i, 0)$ and $[P + \overline{Q}] = [2P] = [\overline{R}]$ for the third point $R$ in the intersection of the tangent at $P$ with the curve $E(k)$. Thus, for two different points $P, Q \in E(k)$, we always have $[P] \neq [Q]$. It remains to consider the point at infinity $O$ and we define $[O] = 0$.
Theorem 5.5. The mapping
\[ \widetilde{E}(k) \to \mathrm{Pic}^0(E(k)), \qquad P \mapsto [P] \]
defines an isomorphism of Abelian groups.
Proof: As we have just seen, the mapping is injective. It is surjective, too. To see this, we give an equivalence preserving reduction process on the set of divisors. A divisor of the form $P + \overline{P}$ can be replaced by 0. For each divisor $P + Q$ with $P \neq \overline{Q}$ we find a line and thus another point $\overline{R}$ on the curve with the property that $P + Q + \overline{R}$ is a principal divisor. Thus, $P + Q$ can be replaced by $R$. Each step reduces the degree of a divisor. The algorithm ends at 0 and therefore at the point at infinity or at a divisor of degree 1, that is, a point on the curve. The homomorphism property $P + Q \mapsto [P] + [Q]$ directly follows from the construction.
Since the Picard group is an Abelian group, the points of the elliptic curve $\widetilde{E}(k)$ also form an Abelian group. Furthermore, if $K$ is a subfield of $k$, the formulas in Section 5.1.1 show that $\widetilde{E}(K) = \{ (a, b) \in K \times K \mid b^2 = s(a) \} \cup \{O\}$ is a subgroup. Addition, as introduced in Section 5.1.1, coincides with addition in the Picard group. The associative law, thus, is a consequence of the interpretation of the points on the curve as divisors and the interpretation of lines as principal divisors. This completes the proof of Theorem 5.1.
5.2 Applications of elliptic curves
This section is mostly self-contained. We therefore repeat the most important properties of elliptic curves. Let $K$ be a field with a characteristic different from 2 and let $A, B \in K$ with $4A^3 + 27B^2 \neq 0$. The set of points defined by the equation $Y^2 = X^3 + AX + B$ is
\[ E(K) = \{ (x, y) \in K \times K \mid y^2 = x^3 + Ax + B \} \]
If we add a new point $O$, called the point at infinity, then we obtain the elliptic curve $\widetilde{E}(K)$ defined by the parameters $A$ and $B$:
\[ \widetilde{E}(K) = E(K) \cup \{O\} \]
There is an addition $+$ on $\widetilde{E}(K)$ such that $(\widetilde{E}(K), +, O)$ forms an Abelian group with neutral element $O$. For points $P = (x_1, y_1)$ and $Q = (x_2, y_2)$ in $E(K)$, the sum $P + Q$ is given by the following formulas.
(a) If $x_1 \neq x_2$, then for $\mu = \frac{y_2 - y_1}{x_2 - x_1}$, we define $x_3 = \mu^2 - x_2 - x_1$ and $y_3 = \mu(x_1 - x_3) - y_1$ and $P + Q = (x_3, y_3)$.
(b) If $x_1 = x_2$ and $y_1 = y_2 \neq 0$, we have $P = Q$. In this case, we use $\mu = \frac{3x_1^2 + A}{2y_1}$ for defining $x_3 = \mu^2 - 2x_1$ and $y_3 = \mu(x_1 - x_3) - y_1$ and $2P = P + Q = (x_3, y_3)$.
(c) If $x_1 = x_2$ and $y_1 = -y_2$, we define $P + Q = O$.
Case (c) gives a simple formula for computing the inverse. For $P = (x_1, y_1) \in E(K)$, we have $-P = (x_1, -y_1)$. Case (a) occurs if and only if $P \neq Q$ and $P \neq -Q$. We presented a complete proof of the group laws in Section 5.1. In particular, we have $P + Q \in \widetilde{E}(K)$ for all $P, Q \in \widetilde{E}(K)$. The addition on $\widetilde{E}(K)$ might look mysterious at first sight but it has a geometric motivation, which is given in Section 5.1.1.
Many cryptographic protocols are based on arithmetic in cyclic groups. A typical example is the subgroup $\langle g \rangle$ of $(\mathbb{Z}/p\mathbb{Z})^*$ generated by $g \in (\mathbb{Z}/p\mathbb{Z})^*$. The analogue
for elliptic curves is $\langle P \rangle$ where $P$ is a point on an elliptic curve over a finite field. The general idea is to replace the operation $g^k$ in $(\mathbb{Z}/p\mathbb{Z})^*$ by
\[ k \cdot P = \underbrace{P + \cdots + P}_{k \text{ summands}} \]
on the elliptic curve. For the efficient computation of k ⋅ P, you can use fast exponentiation with the only difference that multiplication is replaced by addition. For the sake of security, certain requirements have to be met by the subgroup ⟨P⟩ generated by P: for example, this group should be large, and its order should have a large prime divisor. The advantage of elliptic curves is that the same level of security as in (ℤ/pℤ)∗ can already be achieved with smaller key lengths. This is mainly because algorithmic techniques that rely on linear algebra seem not to work over elliptic curves: for instance, to date there has been no efficient adaptation of the quadratic sieve or the index calculus algorithm.
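Fast exponentiation with multiplication replaced by addition is often called double-and-add. The following is a minimal sketch over $\mathbb{F}_p$ with a prime $p > 3$; it restates the addition formulas (a) to (c) so that the block runs on its own. Representing $O$ by `None` and the names `add` and `multiply` are choices made for this illustration, and the routine makes no attempt at constant-time execution.

```python
O = None  # the point at infinity (representation chosen for this sketch)


def add(P, Q, A, p):
    """Addition on Y^2 = X^3 + A*X + B over F_p following cases (a), (b), (c)."""
    if P is O:
        return Q
    if Q is O:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return O                                             # case (c)
    if x1 != x2:
        mu = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p       # case (a)
    else:
        mu = (3 * x1 * x1 + A) * pow(2 * y1 % p, -1, p) % p  # case (b)
    x3 = (mu * mu - x1 - x2) % p
    return (x3, (mu * (x1 - x3) - y1) % p)


def multiply(k, P, A, p):
    """Compute k*P for k >= 0 with O(log k) additions."""
    result = O
    while k > 0:
        if k & 1:                 # current binary digit of k is 1
            result = add(result, P, A, p)
        P = add(P, P, A, p)       # P runs through P, 2P, 4P, 8P, ...
        k >>= 1
    return result


# On Y^2 = X^3 + X + 1 over F_5 the group has 9 elements, so 9*P = O.
print(multiply(2, (0, 1), 1, 5), multiply(9, (0, 1), 1, 5))
```

The loop scans the binary digits of $k$ from the least significant end, so the number of additions grows with the bit length of $k$ rather than with $k$ itself.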
5.2.1 Diffie–Hellman key exchange with elliptic curves
We use Diffie and Hellman's key exchange protocol as an example of how to apply elliptic curves in cryptography; see Section 2.9 for the classical version. Alice and Bob wish to agree upon a common key which is not known to anybody else. The common key could then be used for a symmetric encryption method such as AES. The problem is that all communication between Alice and Bob can be intercepted.
First, Alice chooses an elliptic curve $\widetilde{E}(\mathbb{Z}/p\mathbb{Z})$ together with a point $P = (x, y)$ on this curve. The naive approach for this is to choose the parameters $A$, $B$, and $p$ of the curve as well as a random element $x$ such that $x^3 + Ax + B$ is a square in $\mathbb{Z}/p\mathbb{Z}$; and then Alice computes $y$ as a square root of $x^3 + Ax + B$ modulo $p$. This could be done using Tonelli's or Cipolla's algorithm or, as in the Rabin cryptosystem, she could rely on special prime numbers $p$ for which the extraction of roots is easy. The following procedure is much simpler: Alice first chooses a large prime number $p$ and elements $A, x, y \in \mathbb{Z}/p\mathbb{Z}$ at random, and then she computes $B = y^2 - x^3 - Ax \bmod p$. The elliptic curve $\widetilde{E}(\mathbb{Z}/p\mathbb{Z})$ is fully defined by the parameters $A$, $B$, and $p$. By construction, the point $P = (x, y)$ lies on the curve.
Alice sends the public key $(p, A, B, x, y)$ to Bob. As a secret key, Alice chooses $a \in \mathbb{N}$. Then, she sends the coordinates of the point $a \cdot P$ to Bob. Bob proceeds analogously. He chooses a secret number $b \in \mathbb{N}$ and sends $b \cdot P$ to Alice. They are now both able to compute the point
\[ Q = ab \cdot P = a \cdot (b \cdot P) = b \cdot (a \cdot P) \in \widetilde{E}(\mathbb{Z}/p\mathbb{Z}) \]
Neither of them needs to know the other's secret key. The point $Q$ can be used as their common secret key (for example by taking the first $n$ bits of the binary representation). The computation of $a$, when given $P$ and $a \cdot P$, is exactly the discrete logarithm to base $P$
in the group $\widetilde{E}(\mathbb{Z}/p\mathbb{Z})$. One can therefore assume that intercepting the communication does not help in retrieving the secret key. The computation of $Q$, when given the points $P$, $a \cdot P$ and $b \cdot P$, is known as the Diffie–Hellman problem, and it is believed not to be efficiently solvable. We note that the ElGamal cryptosystem can also be adapted to elliptic curves in the very same way as the Diffie–Hellman key exchange protocol.
5.2.2 Pseudocurves
The proof of the group laws for $\widetilde{E}(K)$ in Section 5.1 relies on the fact that $K$ is a field. Now, we consider elliptic curves over quotient rings $\mathbb{Z}/n\mathbb{Z}$. If $n$ is composite, then such rings are not fields. Thus, we cannot rely on validity of the group laws anymore. Even the sum of two points is not necessarily defined in all cases since the denominators in the above computations need not be invertible.
Let $n \in \mathbb{N}$ be an odd number greater than 1 and let $A, B \in \mathbb{Z}$ such that $\gcd(4A^3 + 27B^2, n) = 1$. The pseudocurve defined by the parameters $A$ and $B$ is
\[ \widetilde{E}(\mathbb{Z}/n\mathbb{Z}) = \{ (x, y) \in (\mathbb{Z}/n\mathbb{Z})^2 \mid y^2 \equiv x^3 + Ax + B \bmod n \} \cup \{O\} \]
The name pseudocurve shall indicate that $\mathbb{Z}/n\mathbb{Z}$ need not be a field. As usual, we identify $\mathbb{Z}/n\mathbb{Z}$ with $\{0, \ldots, n - 1\}$. We transfer the addition operation from elliptic curves to pseudocurves, but with the restriction that the sum is only partially defined. The point at infinity $O$ is neutral and for $P = (x_1, y_1)$ and $Q = (x_2, y_2)$ in $\widetilde{E}(\mathbb{Z}/n\mathbb{Z}) \setminus \{O\}$, we define $P + Q$ as follows.
(a) If $\gcd(x_2 - x_1, n) = 1$, then let $\mu = \frac{y_2 - y_1}{x_2 - x_1}$.
(b) If $x_1 = x_2$ and $y_1 = y_2$ with $\gcd(y_1, n) = 1$, then we define $\mu = \frac{3x_1^2 + A}{2y_1}$.
In both cases, we define the addition $P + Q = (x_3, y_3)$ by $x_3 = \mu^2 - x_2 - x_1$ and $y_3 = \mu(x_1 - x_3) - y_1$. All computations are modulo $n$.
(c) If $x_1 = x_2$ and $y_1 = -y_2$, we define $P + Q = O$.
There are three possibilities for $P + Q$ to be undefined. First, we could have $x_1 \not\equiv x_2 \bmod n$, but $x_2 - x_1$ is not in $(\mathbb{Z}/n\mathbb{Z})^*$. Second, we could have $P = Q$, but $y_1$ is not in $(\mathbb{Z}/n\mathbb{Z})^*$. And the third possibility for composite numbers $n$ is $x_1 \equiv x_2 \bmod n$ but $y_1 \not\equiv \pm y_2 \bmod n$.
Theorem 5.6. Let $P, Q \in \widetilde{E}(\mathbb{Z}/n\mathbb{Z})$ such that $P + Q$ is defined. Then, $P + Q \in \widetilde{E}(\mathbb{Z}/n\mathbb{Z})$.
Proof: We may assume that $P \neq O$ and $Q \neq O$. Let $P = (x_1, y_1)$ and $Q = (x_2, y_2)$. First, suppose that $x_2 - x_1$ is invertible modulo $n$. In this case, the claim essentially follows from the derivation (see Section 5.1.1) of the earlier given formulas. However, a few minor adjustments are necessary. Let $\mu \equiv (y_2 - y_1) \cdot (x_2 - x_1)^{-1} \bmod n$. Then,
$P + Q = (x_3, y_3)$ with $x_3 = \mu^2 - x_2 - x_1 \bmod n$ and $y_3 = \mu(x_1 - x_3) - y_1 \bmod n$. For $\nu \equiv y_1 - \mu x_1 \bmod n$, we have

$$y_2 \equiv \mu(x_2 - x_1) + y_1 \equiv \mu x_2 + \nu \bmod n$$

Therefore, $P$ and $Q$ are points on the line $Y = \mu X + \nu$ over $\mathbb{Z}/n\mathbb{Z}$. Let $s(X) = X^3 + AX + B$ and $g(X) = \mu X + \nu$. Both $x_1$ and $x_2$ are roots of the polynomial $t(X) = s(X) - g^2(X)$ of degree 3 over $\mathbb{Z}/n\mathbb{Z}$. From Corollary 1.47, we obtain $t(X) = (X - x_1)\, r(X)$ for a polynomial $r$ of degree 2. Since $x_2 - x_1$ is invertible, $x_2$ is a root of $r$. Together with the degree formula from Theorem 1.42 this yields $t(X) = (X - x_1)(X - x_2)(X - \hat{x}_3)$ for $\hat{x}_3 \in \mathbb{Z}/n\mathbb{Z}$. By comparing the coefficients at $X^2$, we see that $\hat{x}_3 \equiv x_3 \bmod n$. For $\hat{y}_3 = g(x_3)$, we have $0 \equiv s(x_3) - g^2(x_3) \equiv s(x_3) - \hat{y}_3^2 \bmod n$ and thus $(x_3, \hat{y}_3) \in \tilde{E}(\mathbb{Z}/n\mathbb{Z})$. Moreover, $\hat{y}_3 = \mu x_3 + \nu \equiv -\mu(x_1 - x_3) + y_1 \equiv -y_3 \bmod n$ shows $P + Q = (x_3, y_3) \in \tilde{E}(\mathbb{Z}/n\mathbb{Z})$.

The second case is $x_1 \equiv x_2 \bmod n$. We can exclude the case that $y_1 \equiv -y_2 \bmod n$ because the point at infinity belongs to $\tilde{E}(\mathbb{Z}/n\mathbb{Z})$. Since $P + Q$ is defined, we can conclude that $y_1 \equiv y_2 \bmod n$ and that $2y_1$ is invertible modulo $n$; remember that $n$ is odd. In particular, $P = Q$. Let $P + Q = 2P = (x_3, y_3)$ in $\tilde{E}(\mathbb{Z}/n\mathbb{Z})$. We use closure of the operation on elliptic curves over $\mathbb{Q}$. We interpret $x_1, y_1 \in \{0, \ldots, n-1\} \subseteq \mathbb{Q}$. Let $B' = B + kn$ with

$$y_1^2 = x_1^3 + Ax_1 + B' \quad \text{in } \mathbb{Q}$$

Let $E'$ be the curve defined by $A$ and $B'$. Then, $\tilde{E}'(\mathbb{Z}/n\mathbb{Z}) = \tilde{E}(\mathbb{Z}/n\mathbb{Z})$ and $P = (x_1, y_1) \in \tilde{E}'(\mathbb{Q})$. Due to the addition rules, there exist $x_3', y_3' \in \mathbb{Z}$ with

$$2P = \left( \frac{x_3'}{(2y_1)^2}, \frac{y_3'}{(2y_1)^3} \right) \in \tilde{E}'(\mathbb{Q})$$

and $x_3' \cdot (2y_1)^{-2} \equiv x_3 \bmod n$ as well as $y_3' \cdot (2y_1)^{-3} \equiv y_3 \bmod n$. Here, $(2y_1)^{-j}$ denotes the inverse of $(2y_1)^j$ modulo $n$. Now, $2P \in \tilde{E}'(\mathbb{Q})$ yields $(x_3, y_3) \in \tilde{E}(\mathbb{Z}/n\mathbb{Z})$.

Let $m > 1$ be a divisor of $n$. Then, $\tilde{E}(\mathbb{Z}/m\mathbb{Z})$ inherits the property of being a pseudocurve from $\tilde{E}(\mathbb{Z}/n\mathbb{Z})$. By the homomorphism theorem for rings, we know that $\bmod\ m : \mathbb{Z}/n\mathbb{Z} \to \mathbb{Z}/m\mathbb{Z},\ x \mapsto x \bmod m$ is a ring homomorphism. We extend this mapping to points $P = (x, y) \in \tilde{E}(\mathbb{Z}/n\mathbb{Z})$ by

$$P \bmod m = (x \bmod m,\ y \bmod m)$$

and define $O \bmod m = O$. If $P$ is a point on the pseudocurve $\tilde{E}(\mathbb{Z}/n\mathbb{Z})$, then $P \bmod m$ is a point on the pseudocurve $\tilde{E}(\mathbb{Z}/m\mathbb{Z})$. An important special case of this modulo operation on curve points occurs if $m$ is prime. Then, $\tilde{E}(\mathbb{Z}/m\mathbb{Z})$ is an elliptic curve since $m > 2$ and $4A^3 + 27B^2 \neq 0$ in $\mathbb{Z}/m\mathbb{Z}$. The following theorem connects the operations
on the two pseudocurves. The main problem is that $P + Q$ in $\tilde{E}(\mathbb{Z}/n\mathbb{Z})$ might a priori be obtained by different formulas than $(P \bmod m) + (Q \bmod m)$ in $\tilde{E}(\mathbb{Z}/m\mathbb{Z})$. We show that this cannot happen.

Theorem 5.7. Let $m > 1$ be a divisor of $n$ and let $P, Q \in \tilde{E}(\mathbb{Z}/n\mathbb{Z})$ be points such that $P + Q$ is defined. Then, $(P \bmod m) + (Q \bmod m)$ in $\tilde{E}(\mathbb{Z}/m\mathbb{Z})$ is defined and

$$(P \bmod m) + (Q \bmod m) = (P + Q) \bmod m$$

Proof: We may assume that $P = (x_1, y_1) \neq O$ and $Q = (x_2, y_2) \neq O$. The ring homomorphism $\bmod\ m : \mathbb{Z}/n\mathbb{Z} \to \mathbb{Z}/m\mathbb{Z}$ is compatible with forming inverses, that is, for $\gcd(x, n) = 1$ we have

$$(x \bmod n)^{-1} \bmod m = (x \bmod m)^{-1}$$

On the left-hand side of the equation, $(x \bmod n)^{-1}$ denotes the inverse of $x$ modulo $n$ while, on the right-hand side, the element $x \bmod m$ is inverted in $\mathbb{Z}/m\mathbb{Z}$. If we apply the same formulas for the addition of the points in $\tilde{E}(\mathbb{Z}/n\mathbb{Z})$ and $\tilde{E}(\mathbb{Z}/m\mathbb{Z})$, then the homomorphism property yields the desired result. In particular, the sum $(P \bmod m) + (Q \bmod m)$ is defined in this situation. The remaining cases are
(a) $x_1 \not\equiv x_2 \bmod n$ and $x_1 \equiv x_2 \bmod m$
(b) $x_1 \equiv x_2 \bmod n$, $y_1 \equiv y_2 \not\equiv 0 \bmod n$ and $y_1 \equiv -y_2 \bmod m$
(c) $x_1 \equiv x_2 \bmod n$ and $y_1 \not\equiv \pm y_2 \bmod n$
In case (a), the sum $P + Q$ is not defined because $x_2 - x_1$ is divisible by $m$ and therefore not invertible modulo $n$. Thus, the requirements of the theorem are not satisfied. In case (b), the sum $P + Q$ is not defined either because $2y_1 \equiv y_1 + y_2 \bmod n$ and $m \mid y_1 + y_2$. Thus, $2y_1$ is not invertible modulo $n$. If case (c) occurs, then again $P + Q$ is not defined.
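The partial addition of this section and the statement of Theorem 5.7 can be tried out on a small example. The following Python sketch is an illustration only: the names `pseudo_add` and `reduce_point` are hypothetical, `None` encodes the point $O$, an undefined sum is signalled by an `ArithmeticError`, and the parameters $n = 77$, $A = 1$, $P = (2, 3)$ are chosen arbitrarily.

```python
from math import gcd

def pseudo_add(P, Q, A, n):
    """Partial addition on the pseudocurve modulo an odd n > 1; None stands for O."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if gcd(x2 - x1, n) == 1:                              # case (a)
        mu = (y2 - y1) * pow((x2 - x1) % n, -1, n) % n
    elif x1 == x2 and y1 == y2 and gcd(y1, n) == 1:       # case (b); n odd, so 2*y1 is invertible
        mu = (3 * x1 * x1 + A) * pow(2 * y1 % n, -1, n) % n
    elif x1 == x2 and (y1 + y2) % n == 0:                 # case (c)
        return None
    else:
        raise ArithmeticError("P + Q is undefined")
    x3 = (mu * mu - x2 - x1) % n
    y3 = (mu * (x1 - x3) - y1) % n
    return (x3, y3)

def reduce_point(P, m):
    """The map P -> P mod m on curve points."""
    return None if P is None else (P[0] % m, P[1] % m)

n, A = 77, 1
P = (2, 3)
B = (P[1]**2 - P[0]**3 - A * P[0]) % n        # B is derived from the point, as in the text
assert gcd(4 * A**3 + 27 * B**2, n) == 1      # pseudocurve condition

S = pseudo_add(P, P, A, n)                    # 2P modulo 77
for m in (7, 11):                             # the divisors m > 1 of n below n
    Pm = reduce_point(P, m)
    assert pseudo_add(Pm, Pm, A, m) == reduce_point(S, m)   # Theorem 5.7
```

The function does not check that its arguments lie on the curve; that is left to the caller.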
5.2.3 Factorization using elliptic curves

In order to factorize an integer $n > 1$, you try to find $m \in \{2, \ldots, n-1\}$ with $m \mid n$. Although this is still unproven, the factorization of numbers given in binary is believed not to be solvable in polynomial time. Therefore, a factorization algorithm is only applied after you have checked that the number in question is composite. We already discussed several primality tests. A second typical preparatory step is to exclude small divisors by trial division. The idea of using elliptic curves for factorization is due to Hendrik Willem Lenstra Jr. (born 1949). In the following, we present a probabilistic variant of this approach.

The basic idea is Pollard’s $p - 1$ algorithm, which we repeat briefly. Let $n$ be a composite number, $p$ a prime factor of $n$ and let $a \in \mathbb{N}$ with $\gcd(a, n) = 1$. We assume that $k \in \mathbb{N}$ is a multiple of $p - 1$. By Fermat’s little theorem, we have $a^{p-1} \equiv 1 \bmod p$, and thus also

$$a^k \equiv 1 \bmod p$$
Now $p \mid a^k - 1$ and $p \mid n$. If $a^k \not\equiv 1 \bmod n$ then $\gcd(a^k - 1, n)$ yields a nontrivial divisor of $n$. In this method, we have some freedom in the choice of $a$ and $k$. A possible strategy for choosing $k$ is motivated by the hope that $p - 1$ can be factorized into small prime divisors. Since we do not know $p$ at this point, you could choose $k$ as

$$k = \prod_{\ell \leqslant C} \ell^{\lfloor \log_\ell n \rfloor}$$

Here, $C$ is an expected upper bound for the prime divisors of $p - 1$. In order to keep $k$ within a reasonable size, $C$ must not be too large. For $a$, an arbitrary value from $(\mathbb{Z}/n\mathbb{Z})^*$ can be chosen. The disadvantage of the method is that the structure of $\mathbb{Z}/n\mathbb{Z}$ cannot be changed. If, for example, no prime factor $p$ of $n$ has the property that $p - 1$ has only small prime divisors, then Pollard’s $p - 1$ algorithm is not successful. This is where pseudocurves come into play because, for every $n$, there are numerous pseudocurves $\tilde{E}(\mathbb{Z}/n\mathbb{Z})$.

Lenstra’s method also starts by choosing a bound $C$ and constructing a number $k$ as before. The greater the number $C$, the better the chance of finding a divisor. However, the time for the computation also grows with $C$. The idea is to randomly select a pseudocurve $\tilde{E}(\mathbb{Z}/n\mathbb{Z})$ and a point $P \in \tilde{E}(\mathbb{Z}/n\mathbb{Z})$. Then, you try to compute $k \cdot P$ with the hope that, at some point in this computation, the sum of two points is undefined. This will then yield a nontrivial divisor of $n$. If the computation of $k \cdot P$ is successful, then we repeat this step with another pseudocurve. This is not possible with Pollard’s $p - 1$ algorithm.

We now discuss the details of the method. The algorithm first chooses $A, x, y \in \{0, \ldots, n-1\}$ at random. Then, $B$ is computed by

$$B = y^2 - x^3 - Ax \bmod n$$

If $\gcd(4A^3 + 27B^2, n) \neq 1$, then we either found a nontrivial divisor of $n$ or we repeat the process. The pseudocurve $\tilde{E}(\mathbb{Z}/n\mathbb{Z})$ is now given by the equation $Y^2 = X^3 + AX + B$ with $\gcd(4A^3 + 27B^2, n) = 1$ and the point on $\tilde{E}(\mathbb{Z}/n\mathbb{Z})$ is $P = (x, y)$.

Remark 5.8. If we determined $A$ and $B$ first, then it would be harder to find a nonzero point on the curve. Note that neither Tonelli’s nor Cipolla’s algorithm can be used for the extraction of roots since $\mathbb{Z}/n\mathbb{Z}$ is not a field. ◊

The operation on pseudocurves need not be defined for all possible arrangements of parentheses. For example, in the computation of $5 \cdot P$, it is possible that $((P + P) + (P + P)) + P$ is defined whereas $(P + P) + ((P + P) + P)$ is undefined. The difference here is that, in the second case, $3 \cdot P = (P + P) + P$ occurs as an intermediate result. Moreover, for two different parenthesizations, where all intermediate results are defined, we cannot expect that both of them yield the same result. This problem is circumvented by using a fixed algorithm for the computation of $k \cdot P$. Thus, the parenthesization is uniquely determined by $k$. We can view $k \cdot P$ as the result produced by this particular algorithm. This result can also be undefined. To compute $k \cdot P$ as efficiently as possible, you use fast exponentiation with addition rather than multiplication as operation.
Proposition 5.9. If the operation $Q + R$ for two points $Q, R \in \tilde{E}(\mathbb{Z}/n\mathbb{Z})$ is not defined, this yields a nontrivial divisor of $n$.

Proof: Let $Q = (x_1, y_1)$ and $R = (x_2, y_2)$. If $Q + R$ is undefined, then there are three possible reasons. The first one is that $x_1 \not\equiv x_2 \bmod n$, but $x_2 - x_1$ is not invertible modulo $n$. In this case, $x_2 - x_1$ is not a multiple of $n$ and not coprime to $n$. Thus, $\gcd(x_2 - x_1, n)$ is a nontrivial divisor of $n$. The second possibility is $x_1 \equiv x_2 \bmod n$ and $y_1 \equiv y_2 \not\equiv 0 \bmod n$, but $y_1$ is not invertible modulo $n$. Now, $\gcd(y_1, n)$ yields a nontrivial divisor of $n$. The third case is $x_1 \equiv x_2 \bmod n$, but $y_1 \not\equiv \pm y_2 \bmod n$. We have

$$y_2^2 - y_1^2 \equiv (x_2^3 + Ax_2 + B) - (x_1^3 + Ax_1 + B) \equiv 0 \bmod n$$

Therefore, $y_2^2 - y_1^2 = (y_2 + y_1)(y_2 - y_1)$ is a multiple of $n$, but neither $y_2 + y_1$ nor $y_2 - y_1$ is a multiple of $n$. But then both $\gcd(y_2 + y_1, n)$ and $\gcd(y_2 - y_1, n)$ are nontrivial divisors of $n$.

If the operation $Q + R$ for two points $Q$ and $R$ during an intermediate step in the computation of $k \cdot P$ is not defined, then we obtain a nontrivial divisor of $n$. It remains to explain why we can hope for some addition during the computation to be undefined. Let $p \geqslant 3$ be a prime factor of $n$. Then, $\tilde{E}(\mathbb{Z}/p\mathbb{Z})$ is an elliptic curve. The hope is that the order of $\tilde{E}(\mathbb{Z}/p\mathbb{Z})$ has only small prime divisors because then $k \cdot (P \bmod p) = O$ in $\tilde{E}(\mathbb{Z}/p\mathbb{Z})$. It is very unlikely that, for all other prime divisors $q$ of $n$, we also have $k \cdot (P \bmod q) = O$ in $\tilde{E}(\mathbb{Z}/q\mathbb{Z})$. From Theorem 5.7, it follows that $k \cdot P$ is not defined in $\tilde{E}(\mathbb{Z}/n\mathbb{Z})$; otherwise $k \cdot P = O$ would follow because only the point at infinity is mapped to the point at infinity mod $p$. This in turn would imply $k \cdot (P \bmod q) = O$ in $\tilde{E}(\mathbb{Z}/q\mathbb{Z})$, contradicting our assumption.
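Putting the pieces of this section together, the method can be sketched in Python as follows. This is a minimal illustration and not a tuned implementation: all function names are hypothetical, the product defining $k$ is taken over the primes $\ell \leqslant C$ (an assumption of this sketch, with $C \leqslant n$), the input $n$ is assumed to be odd and composite, and the default values of `C` and `tries` are arbitrary.

```python
import random
from math import gcd

class DivisorFound(Exception):
    """Signals an undefined addition; carries the divisor from Proposition 5.9."""
    def __init__(self, d):
        super().__init__(d)
        self.d = d

def padd(P, Q, A, n):
    """Partial addition on the pseudocurve modulo odd n; None stands for O."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 != x2:
        d = gcd(x2 - x1, n)
        if d != 1:
            raise DivisorFound(d)                 # first reason in Proposition 5.9
        mu = (y2 - y1) * pow((x2 - x1) % n, -1, n) % n
    elif (y1 + y2) % n == 0:
        return None                               # P + Q = O
    elif y1 == y2:
        d = gcd(y1, n)
        if d != 1:
            raise DivisorFound(d)                 # second reason
        mu = (3 * x1 * x1 + A) * pow(2 * y1 % n, -1, n) % n
    else:
        raise DivisorFound(gcd(y2 - y1, n))       # third reason
    x3 = (mu * mu - x1 - x2) % n
    return (x3, (mu * (x1 - x3) - y1) % n)

def pmul(k, P, A, n):
    """k*P with one fixed double-and-add scheme, so the parenthesization depends only on k."""
    R = None
    while k > 0:
        if k & 1:
            R = padd(R, P, A, n)
        k >>= 1
        if k:
            P = padd(P, P, A, n)
    return R

def smooth_exponent(n, C):
    """Product of the largest powers l^e <= n over the primes l <= C."""
    k = 1
    for l in range(2, C + 1):
        if all(l % d for d in range(2, int(l**0.5) + 1)):   # l is prime
            e = l
            while e * l <= n:
                e *= l
            k *= e
    return k

def lenstra(n, C=50, tries=100, rng=random):
    """Return a nontrivial divisor of the odd composite n, or None if no curve succeeded."""
    k = smooth_exponent(n, C)
    for _ in range(tries):
        A, x, y = (rng.randrange(n) for _ in range(3))
        B = (y * y - x**3 - A * x) % n
        d = gcd(4 * A**3 + 27 * B * B, n)
        if 1 < d < n:
            return d                              # lucky: the discriminant reveals a divisor
        if d == n:
            continue                              # not a pseudocurve, choose again
        try:
            pmul(k, (x, y), A, n)                 # success means: try another curve
        except DivisorFound as e:
            return e.d
    return None

print(lenstra(455839))                            # 455839 = 599 * 761
```

Forming $k$ explicitly keeps the sketch close to the text; for large $C$ one would rather multiply the point by one prime power after another.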
5.2.4 Goldwasser–Kilian primality certificates

The idea behind a primality certificate is to provide an efficiently verifiable proof. If a number $n$ has a primality certificate, then the proof should show that $n$ is prime with absolute certainty. One such approach goes back to Henry Cabourn Pocklington (1870–1952).

Theorem 5.10 (Pocklington). Let $a, k, n, q \in \mathbb{N}$ with $n - 1 = q \cdot k$ and $q > k$. Suppose that the following properties are satisfied:
(a) $q$ is prime,
(b) $a^{n-1} \equiv 1 \bmod n$, and
(c) $\gcd(a^k - 1, n) = 1$.
Then, $n$ is prime.

Proof: Suppose that $n$ is composite. Then there exists a prime divisor $p$ of $n$ with $p \leqslant \sqrt{n}$. Let $d$ be the order of $a$ in $(\mathbb{Z}/p\mathbb{Z})^*$. From (b), we obtain $a^{n-1} \equiv 1 \bmod p$. Therefore, $d$ is a divisor of $n - 1$. From (c), we obtain that $d$ does not divide $k$. Then,
(a) yields that $q$ is a divisor of $d$. Altogether, we have $\sqrt{n} > p - 1 \geqslant d \geqslant q$. But $q > k$ yields $q \geqslant \sqrt{n}$, a contradiction. Consequently, $n$ is prime.

Example 5.11. We present a proof for the primality of 2 922 259 that (with the help of a computer) can easily be checked. We have $2\,922\,259 - 1 = 1721 \cdot 1698$ and

$$2^{2\,922\,259 - 1} \equiv 1 \bmod 2\,922\,259 \qquad \gcd(2^{1698} - 1,\ 2\,922\,259) = 1$$

Both facts can be checked efficiently using fast modular exponentiation and the Euclidean algorithm. If we knew that 1721 is prime, then Pocklington’s theorem would show that 2 922 259 is prime. We use the same approach to give a proof for the primality of 1721. We have $1721 - 1 = 43 \cdot 40$ and

$$2^{1721 - 1} \equiv 1 \bmod 1721 \qquad \gcd(2^{40} - 1,\ 1721) = 1$$

Finally, we “prove” that 43 is prime since $43 - 1 = 7 \cdot 6$ and

$$2^{43 - 1} \equiv 1 \bmod 43 \qquad \gcd(2^{6} - 1,\ 43) = 1$$

Since we know that 7 is prime, the number 43 is prime, too. This in turn implies that 1721 is prime and finally that 2 922 259 is prime. The primality certificate consists of all numbers involved in the proof given earlier:

n1 = 43           q1 = 7       a1 = 2
n2 = 1721         q2 = 43      a2 = 2
n3 = 2 922 259    q3 = 1721    a3 = 2

The process (in reverse order) can also be used to produce “provable” primes. ◊
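Checking such a certificate is mechanical. The following Python sketch verifies a chain of triples $(n_i, q_i, a_i)$ as in Example 5.11; the function names and the representation of the certificate as a list of tuples are assumptions made for this illustration, and the small set of primes accepted without proof is an arbitrary choice.

```python
from math import gcd

def pocklington_step(n, q, a):
    """Check n - 1 = q*k with q > k and conditions (b), (c) of Theorem 5.10.
    The primality of q (condition (a)) is NOT checked here."""
    if n < 2 or (n - 1) % q != 0:
        return False
    k = (n - 1) // q
    return (q > k
            and pow(a, n - 1, n) == 1                # fast modular exponentiation
            and gcd(pow(a, k, n) - 1, n) == 1)       # Euclidean algorithm

def verify_certificate(chain, known_primes=(2, 3, 5, 7)):
    """Each q must be a prime established earlier in the chain or a known small prime."""
    proven = set(known_primes)
    for n, q, a in chain:
        if q not in proven or not pocklington_step(n, q, a):
            return False
        proven.add(n)
    return True

certificate = [(43, 7, 2), (1721, 43, 2), (2922259, 1721, 2)]
print(verify_certificate(certificate))
```

Reordering the chain makes the verification fail, because each step relies on the primality established by the previous one.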
The problem with Pocklington’s primality certificates is that they only work for numbers $n$ where $n - 1$ has a large prime factor. The algorithm of Shafrira Goldwasser (born 1958) and Joseph John Kilian carries Pocklington’s idea over to elliptic curves. Here, by choosing different curves, numerous groups are available. Similar to the situation with integer factorization, one would apply this method only if $n$ has passed a probabilistic primality test such as the Miller–Rabin test.

Theorem 5.12 (Goldwasser and Kilian 1986). Let $n \in \mathbb{N}$ and let $\tilde{E}(\mathbb{Z}/n\mathbb{Z})$ be a pseudocurve. Let $O \neq P \in \tilde{E}(\mathbb{Z}/n\mathbb{Z})$ and $q > (\sqrt[4]{n} + 1)^2$ a prime number. If $q \cdot P = O$ in $\tilde{E}(\mathbb{Z}/n\mathbb{Z})$, then $n$ is prime.

Proof: Let $q \cdot P = O$ in $\tilde{E}(\mathbb{Z}/n\mathbb{Z})$ and suppose that $n$ is composite. Then there exists a prime factor $p$ of $n$ with $p \leqslant \sqrt{n}$. Let $d$ be the order of $P \bmod p$ in the elliptic curve $\tilde{E}(\mathbb{Z}/p\mathbb{Z})$. From Theorem 5.7, we have $q \cdot (P \bmod p) = O$ in $\tilde{E}(\mathbb{Z}/p\mathbb{Z})$. This implies $d \mid q$. Since $q$ is prime and $d \neq 1$, we obtain $d = q$. Thus, $q \leqslant |\tilde{E}(\mathbb{Z}/p\mathbb{Z})|$. According to the Hasse bound in equation (5.1), however, $|\tilde{E}(\mathbb{Z}/p\mathbb{Z})| \leqslant (\sqrt[4]{n} + 1)^2$ and we have a contradiction.

Remark 5.13. If you require the stronger assumption $q > 2\sqrt{n} + 1$ in Theorem 5.12, then the statement can be shown without using the Hasse bound. Instead, the weaker (and easy to prove) estimate $|\tilde{E}(\mathbb{Z}/p\mathbb{Z})| \leqslant 2p + 1$ in the earlier proof is sufficient. The assumption $q > 2\sqrt{n} + 1$ is not a limitation for applications. ◊

Let us now describe Goldwasser and Kilian’s algorithm. An important step in this algorithm is to efficiently determine the number of points on an elliptic curve over a finite field. The first deterministic polynomial time algorithm for this task was derived by René Schoof (born 1955). If $m = \log(p)$ is the number of bits of the prime $p$, it takes $O(m^9)$ operations to compute $|\tilde{E}(\mathbb{Z}/p\mathbb{Z})|$, and thus in its original form it was not practicable. With some improvements, especially by Arthur Oliver Lonsdale Atkin (1925–2008) and Noam Elkies (born 1966), the computation essentially takes $O(m^4)$ operations. Another ingredient is a probabilistic algorithm for computing square roots in finite fields. This can be done with either Tonelli’s or Cipolla’s algorithm; see Section 3.5.

We want to prove that $n \in \mathbb{N}$ is prime. Moreover, the test shall provide a certificate for this property. For small numbers $n$, we can solve this by means of a table lookup. Therefore, we can assume $n$ to be sufficiently large. First, by a probabilistic procedure, we convince ourselves that $n$ is prime with high probability. Then, we choose a random pseudocurve $\tilde{E}(\mathbb{Z}/n\mathbb{Z})$ and, for example by using Schoof’s algorithm, we compute the number $|\tilde{E}(\mathbb{Z}/n\mathbb{Z})|$ under the assumption that $n$ is prime. If this calculation is not possible, for example, due to a division by zero, then $n$ is composite. We keep on searching for pseudocurves until $|\tilde{E}(\mathbb{Z}/n\mathbb{Z})| = kq$ for a probable prime $q$ which satisfies $(\sqrt[4]{n} + 1)^2 < q < n/2$. Before we certify $q$ as prime, we choose a random point $P = (x, y)$ on $\tilde{E}(\mathbb{Z}/n\mathbb{Z})$. For this, we repeatedly choose $x \in \mathbb{Z}/n\mathbb{Z}$ at random until $y \in \mathbb{Z}/n\mathbb{Z}$ is found such that $y^2 \equiv x^3 + Ax + B \bmod n$. To determine $y$, we use one of the randomized methods for extracting roots in finite fields. Should the process fail, then the chances of finding a proper divisor of $n$ are good, thus proving that $n$ is composite.

In the next step, we compute $P' = k \cdot P$ in $\tilde{E}(\mathbb{Z}/n\mathbb{Z})$. If $k \cdot P$ is not defined, then $n$ is composite. If $P' = O$, we search for a new point $P \in \tilde{E}(\mathbb{Z}/n\mathbb{Z})$. For $P' \neq O$, the point $P'$ must have the order $q$, otherwise $n$ is composite. If the computation of $q \cdot P' = O$ in $\tilde{E}(\mathbb{Z}/n\mathbb{Z})$ is successful, then from Theorem 5.12, we obtain that $n$ is prime, provided that $q$ is prime. Therefore, we apply the method recursively on input $q$ and thus certify $q$ to be prime. The certificate for the primality of $n$ consists of the parameters of $\tilde{E}(\mathbb{Z}/n\mathbb{Z})$, the point $P'$, and the number $q$ together with a certificate for its primality.

If this algorithm yields a result, this result is always correct. However, it cannot be guaranteed that the algorithm always terminates after finitely many steps and returns a result. Due to the diversity of elliptic curves, however, in practice it turns out that the algorithm terminates with very high probability. Moreover, its running time can compete with the AKS algorithm from Chapter 4 (and, in contrast to the AKS algorithm, it additionally yields a primality certificate, which can be verified efficiently).
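The step of the algorithm that picks a random point on the pseudocurve can be sketched as follows. Instead of Tonelli’s or Cipolla’s algorithm, this Python illustration uses the shortcut for the special case $n \equiv 3 \bmod 4$, where a square root of a square $v$ modulo a prime $n$ is $v^{(n+1)/4}$; this restriction, the function name `random_point` and the bound `max_tries` are assumptions of the sketch and not part of the algorithm as described.

```python
import random

def random_point(A, B, n, rng=random, max_tries=1000):
    """Try to find (x, y) with y^2 = x^3 + A*x + B mod n, for a probable prime n = 3 mod 4.
    Returns None if no point was found, which for large n hints that n is composite."""
    if n % 4 != 3:
        raise ValueError("this sketch only covers n congruent to 3 modulo 4")
    for _ in range(max_tries):
        x = rng.randrange(n)
        v = (x**3 + A * x + B) % n
        y = pow(v, (n + 1) // 4, n)       # candidate square root of v
        if y * y % n == v:                # holds exactly when the candidate is a root
            return (x, y)
    return None

# Toy example: the curve Y^2 = X^3 + X + 6 over Z/11Z
print(random_point(1, 6, 11))
```

The final test `y * y % n == v` is what makes the routine safe: a value is only returned if it really is a square root.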
5.3 Endomorphisms of elliptic curves

In the previous section, we used the Hasse bound from Equation (5.1) to improve the performance of the Goldwasser–Kilian primality test. Without the Hasse bound, we obtain a sufficiently good but slightly weaker result. On the other hand, the Hasse bound is indispensable for a deeper understanding of elliptic curves. The known proofs of this formula are based on the study of endomorphisms. This section provides an introduction to the basic concepts and theorems concerning endomorphisms of elliptic curves. The proofs of many results in this section make nice exercises.

Let $k$ be an algebraically closed field with a characteristic different from 2, and let $E(k)$ be the elliptic curve defined by $Y^2 = X^3 + AX + B$. In particular, $s(X) = X^3 + AX + B$ has no multiple roots. In Section 5.1.2, we defined the polynomial ring $k[x, y]$ over $E(k)$ as

$$k[x, y] = k[X, Y] / \langle Y^2 = s(X) \rangle$$

The element $x \in k[x, y]$ is the residue class of $X$ and $y$ is the residue class of $Y$. In $k[x, y]$, we have $y^2 = s(x)$. The image of $k[X]$ in $k[x, y]$ is denoted by $k[x]$. Since $y^2 = s(x)$ in $k[x, y]$, we have $y^2 \in k[x]$. Every polynomial $f \in k[x, y]$ can uniquely be written as $f = v(x) + y\,w(x)$ with $v(x), w(x) \in k[x]$. Let $\overline{f} = v(x) - y\,w(x)$. Then, $N(f) = f\,\overline{f} = v^2(x) - s(x)\,w^2(x) \in k[x]$ is the norm of $f$. We have $N(f) = 0$ if and only if $f = 0$. The ring $k[x, y]$ has no zero divisors: if $fg = 0$ in $k[x, y]$, then $N(f)N(g) = 0$ in $k[x]$, and therefore $N(f) = 0$ or $N(g) = 0$. Thus, $k[x, y]$ is an integral domain.

We can form the field $k(x, y)$ consisting of all fractions $\frac{p(x,y)}{q(x,y)}$ for polynomials $p(x, y), q(x, y) \in k[x, y]$ with $q(x, y) \neq 0$. Two fractions $\frac{p(x,y)}{q(x,y)}$ and $\frac{u(x,y)}{v(x,y)}$ define the same element of $k(x, y)$ if and only if $p(x, y)\,v(x, y) = u(x, y)\,q(x, y)$ in $k[x, y]$. Addition and multiplication are defined in the same way as for “usual” fractions. We call $k(x, y)$ the function field of $E(k)$; its elements are rational functions. A rational function $r(x, y) = \frac{p(x,y)}{q(x,y)}$ induces a (partial) mapping $E(k) \to k$ by $(a, b) \mapsto r(a, b)$ for $P = (a, b) \in E(k)$, and this mapping is defined for almost all $P \in E(k)$; it is undefined if and only if $q(a, b) = 0$.

Lemma 5.14. If two rational functions $r(x, y)$ and $t(x, y)$ in $k(x, y)$ coincide at infinitely many points $P \in E(k)$, then $r(x, y) = t(x, y)$.

Proof: Let $r = \frac{p}{q}$ and $t = \frac{u}{v}$ for $p, q, u, v \in k[x, y]$. Then, $pv$ coincides with $uq$ at infinitely many points $P \in E(k)$. By Lemma 5.2, we see that $pv = uq$ and thus $r = t$.
If $r = p/q$ for $p, q \in k[x, y]$, then we can multiply both the numerator and the denominator by $\overline{q}$. This shows that every rational function $r(x, y)$ can be written as

$$r(x, y) = \frac{u(x) + y\,v(x)}{w(x)}$$

with $u(x), v(x), w(x) \in k[x]$ and $w(x) \neq 0$. The order of a point (see Theorem 5.3) is a concept which can also be applied to rational functions. For $r = p/q \in k(x, y)$ and $P \in E(k)$, let $d_p$ and $d_q$ be the orders of $p$ and $q$ at $P$, respectively. The difference $d = d_p - d_q$ is a well-defined integer and we call $d \in \mathbb{Z}$ the order of $r$ at $P$. For $d \geqslant 0$, the image $r(P)$ is defined; this is the case for almost all points of the curve. For $d > 0$ we speak of a zero or root, and for $d < 0$ we speak of a pole. By Theorem 5.4, rational functions without poles are polynomials.

If the field $k$ has characteristic $p > 2$ and if $A$ and $B$ belong to a finite field extension $\mathbb{F}_q$ of the prime field $\mathbb{F}_p$, then we consider the Frobenius automorphism $k \to k,\ a \mapsto a^q$. For $a \in k$, we have $a = a^q$ if and only if $a \in \mathbb{F}_q$. In particular, $A^q = A$ and $B^q = B$. Hence, $s^q(x) = s(x^q)$. It follows that the Frobenius mapping

$$\phi_q : E(k) \to E(k), \quad (a, b) \mapsto (a^q, b^q)$$

is well defined and bijective because for all $x, y \in k$, we have $y^2 = s(x)$ if and only if $y^{2q} = s^q(x) = s(x^q)$. Whenever we mention the Frobenius mapping $\phi_q$ in the remainder of this section, we implicitly assume that $A, B \in \mathbb{F}_q$ and that $q$ is a prime power.

Let $\rho : E(k) \to E(k)$ be a partial function such that there exist rational functions $r(x, y), t(x, y) \in k(x, y)$ which satisfy $\rho(P) = (r(P), t(P))$ for all points $P$ in the domain of $\rho$. We call $\rho$ a rational morphism if it is defined almost everywhere. The domain of $\rho$ must not contain the poles of $r$ or $t$. Note that $\rho$ uniquely determines the two rational functions $r(x, y)$ and $t(x, y)$. The composition of rational morphisms is again a rational morphism. The Frobenius mapping $\phi_q$ is a typical example of a rational morphism; it is therefore called the Frobenius morphism. Moreover, $\phi_q$ is defined everywhere and it is bijective.

We give two important examples of rational morphisms. For every point $T = (a, b) \in E(k)$, the translation $\tau$ defined by $\tau(P) = P + T$ is rational because for all $(x, y) \in E(k) \setminus \{T, -T\}$, we have $\tau(x, y) = (x_T, y_T)$ with

$$x_T = \frac{(y - b)^2}{(x - a)^2} - x - a \qquad y_T = \frac{(y - b)(x - x_T)}{(x - a)} - y$$
It is just as easy to see that $\sigma(P) = 2P$ is a rational morphism. A point $(x, y) \in E(k)$ with $y \neq 0$ satisfies $\sigma(x, y) = (\hat{x}, \hat{y})$ for

$$\hat{x} = \frac{(3x^2 + A)^2}{4y^2} - 2x \qquad \hat{y} = \frac{(3x^2 + A)(x - \hat{x})}{2y} - y$$

The morphism $\sigma$ also defines a group homomorphism. This leads to the following notion. An endomorphism of $E(k)$ is a group homomorphism of $\tilde{E}(k)$ into itself, which is either trivial (i.e., it maps $E(k)$ to $O$) or it coincides with a rational morphism almost everywhere. We immediately see that nontrivial endomorphisms have a finite kernel because the image of almost every point is in $E(k)$ and, thus, different from $O$.

Lemma 5.15. If endomorphisms $\alpha$ and $\beta$ coincide at almost all points of $E(k)$, then $\alpha = \beta$.

Proof: We may assume that $\alpha$ and $\beta$ are nontrivial. Let $P \in E(k)$. We show that $\alpha(P) = \beta(P)$. For almost every point $T \in E(k)$, we have $\alpha(P + T) = \beta(P + T) \neq O$, and almost all points $T \in E(k)$ satisfy $\alpha(T) = \beta(T) \neq O$. We can therefore choose some point $T$ which satisfies both properties. Since $\alpha$ and $\beta$ are homomorphisms, we obtain $\alpha(P) = \alpha(P + T) - \alpha(T) = \beta(P + T) - \beta(T) = \beta(P)$.

If we set $\phi_q(O) = O$, then the following theorem shows that $\phi_q$ is an endomorphism.

Theorem 5.16. The Frobenius morphism $\phi_q$ is an endomorphism.

Proof: See Exercise 5.9.

Theorem 5.17. The endomorphisms form a ring with the operations defined by $(\alpha + \beta)(P) = \alpha(P) + \beta(P)$ and $(\alpha \cdot \beta)(P) = \alpha(\beta(P))$.

Proof: The endomorphisms of an Abelian group form a ring with these operations. Thus, we only have to show the closure as rational morphisms. This is clear for composition. See Exercise 5.7 for addition.

The following lemma provides a normal form of endomorphisms.

Lemma 5.18. Let $\alpha$ be a nontrivial endomorphism. Then there exist polynomials $p(x), q(x), u(x), v(x) \in k[x]$ such that $\alpha(x, y) = \left( \frac{p(x)}{q(x)}, \frac{y\,u(x)}{v(x)} \right)$ for almost all points $(x, y) \in E(k)$. Moreover, neither $p(x)/q(x)$ nor $u(x)/v(x)$ is constant.

Proof: See Exercise 5.8.

Theorem 5.19. Let $\alpha$ be an endomorphism with $\alpha(P) = (r(P), t(P))$ for almost all points $P \in E(k)$. Then the following properties hold:
(a) $\alpha(P) = (r(P), t(P))$ for all points $P \in E(k)$ such that $r(P)$ is defined.
(b) The kernel of $\alpha$ is the set of points $P \in E(k)$ for which $r(P)$ is not defined.
Proof: By Lemma 5.18, we can assume that $r(x, y) = \frac{p(x)}{q(x)}$ and $t(x, y) = \frac{y\,u(x)}{v(x)}$. Therefore, almost all $(x, y) \in E(k)$ satisfy

$$\frac{(x^3 + Ax + B)\,u^2(x)}{v^2(x)} = \frac{y^2 u^2(x)}{v^2(x)} = t^2(x, y) = r^3(x, y) + A\,r(x, y) + B = \frac{p^3(x) + A\,p(x)\,q^2(x) + B\,q^3(x)}{q^3(x)}$$

We may assume that $u(x)$ and $v(x)$ have no common roots. Since all roots in the denominator $v^2(x)$ appear twice, but $x^3 + Ax + B$ only has simple roots, $\frac{p^3(x) + A\,p(x)\,q^2(x) + B\,q^3(x)}{q^3(x)}$ has to have a pole at all roots of $v(x)$. Thus, $v(x) = 0$ implies $q(x) = 0$. In other words, if $r(P)$ is defined, then $t(P)$ is defined. The rational morphism $\rho$ given by $\rho(P) = (r(P), t(P))$ is therefore defined at all points where $r$ does not have a pole.

For property (a) it remains to show that $\rho(P) = \alpha(P)$ holds for all points $P \in E(k)$ such that $\rho(P)$ is defined. This is done by applying the proof technique from Lemma 5.15; the main difference is that we cannot assume $\rho$ to be an endomorphism. For almost all points $T \in E(k)$, the mapping $P \mapsto \rho(P + T) - \rho(T)$ for $P \in E(k)$ is a rational morphism, that is, there exist rational functions $\hat{r}, \hat{t} \in k(x, y)$ such that the rational morphism $\rho_T = (\hat{r}, \hat{t})$ coincides with this mapping at almost all points $P$. The image $\rho_T(P)$ is defined if and only if both $\hat{r}(P)$ and $\hat{t}(P)$ are defined. Note that $\rho_T(P)$ could be defined even if $\rho(P + T)$ is undefined; for instance, this could be the case for $P = -T$. Let $T$ be a point such that $\rho(T)$ is defined and it satisfies $\rho(T) = \alpha(T)$. For almost all points $P \in E(k)$, we have

$$\rho_T(P) = \rho(P + T) - \rho(T) = \alpha(P + T) - \alpha(T) = \alpha(P)$$

This shows that $\rho_T$ coincides with $\alpha$ at almost all $P$ and, hence, $\rho_T$ coincides with $\rho$ at almost all points $P$. By Lemma 5.14, we obtain $\hat{r} = r$ and $\hat{t} = t$, that is, $\rho_T = \rho$. Consider a point $P \in E(k)$ in the domain of $\rho$. There exists a point $T$ such that $\rho(P + T) = \alpha(P + T)$ and $\rho(T) = \alpha(T)$. This yields $\rho(P) = \rho_T(P) = \alpha(P)$, as desired.

For property (b), we show that $\rho(P)$ is defined if and only if $\alpha(P) \neq O$. The implication from left to right follows from property (a). For the other direction, let $\alpha(P) \neq O$. We find a point $T$ such that $\rho(P + T) = \alpha(P + T)$ and $\rho(T) = \alpha(T)$, and this yields $\alpha(P) = \rho_T(P) = \rho(P)$. In particular, $\rho(P)$ is defined.

Let $\alpha$ be an endomorphism with $\alpha(x, y) = \left( \frac{p(x)}{q(x)}, \frac{y\,u(x)}{v(x)} \right)$ for almost all $(x, y) \in E(k)$ such that the polynomials $p(x)$ and $q(x)$ have no common roots. The degree of $\alpha$ is

$$\deg(\alpha) = \max \{ \deg_x(p(x)),\ \deg_x(q(x)) \}$$

Note that, by Lemma 5.14, the degree of a nontrivial endomorphism is well defined. We call $\alpha$ separable if one of the derivatives $p'(x)$ or $q'(x)$ is not the zero function. The Frobenius endomorphism $\phi_q$ has degree $q$, but it is not separable. A direct computation shows that multiplication by 2, that is $\alpha(P) = 2P$, is separable (Exercise 5.10).
The degree of $\alpha$ is 4. This can be computed directly but it also follows from the next theorem.

Theorem 5.20. The following properties hold for each nontrivial endomorphism $\alpha$:
(a) The group homomorphism $\alpha$ is surjective.
(b) If $\alpha$ is separable, then $|\ker(\alpha)| = \deg(\alpha)$.
(c) If $\alpha$ is not separable, then $|\ker(\alpha)| < \deg(\alpha)$.

Proof: We start with the proof of surjectivity. Let $P \in E(k)$. Obviously $P = (P + T) - T$ for all $T \in E(k)$. If both $(P + T)$ and $T$ belong to the image of $\alpha$, then so does $P$. It therefore suffices to show that for almost every point $P \in E(k)$ there exists $Q \in E(k)$ with $\alpha(Q) = P$. In particular, we do not need to consider the three points $P$ with $P = -P$. Let $p(x), q(x), u(x), v(x) \in k[x]$ be the polynomials given by Lemma 5.18. Moreover, we assume that $p(x)$ and $q(x)$ have no common roots. For almost all points $(c, d) \in E(k)$, the property $\alpha(c, d) = (a, b)$ implies $p(c)/q(c) = a$. In particular, $c$ is a root of $p(x) - a\,q(x)$. We therefore consider polynomials of the form $p(x) - a\,q(x)$ for $a \in k$. For almost all $a \in k$, we have $\deg(\alpha) = \deg_x(p(x) - a\,q(x))$. Note that $\deg(\alpha) \geqslant 1$ since $p/q$ is not constant.

If $P = (a, b) \in E(k)$ is a point such that the polynomial $p(x) - a\,q(x)$ is not constant, then there exists a root $c$ of this polynomial. For this root, we find some element $d \in k$ such that $Q = (c, d)$ and $-Q = (c, -d)$ are the only points on the curve $E(k)$ with $c$ as first component. By Theorem 5.19, the first component of $\alpha(Q)$ is $\frac{p(c)}{q(c)} = a$. Hence, there are only two possibilities: either $\alpha(Q) = P$ and $\alpha(-Q) = -P$ or $\alpha(Q) = -P$ and $\alpha(-Q) = P$. Note that $Q \neq -Q$ since we only consider points $P$ with $P \neq -P$. This shows that every root of $p(x) - a\,q(x)$ accounts for a unique point $Q$ with $\alpha(Q) = P$; we might have to replace $d$ by $-d$. In particular, this shows that $\alpha$ is surjective. In addition, since $p(x) - a\,q(x)$ has at most $\deg(\alpha)$ roots, this shows that $|\ker(\alpha)| = |\alpha^{-1}(P)| \leqslant \deg(\alpha)$. If $\alpha$ is not separable, then the derivative $p'(x) - a\,q'(x)$ of $p(x) - a\,q(x)$ is zero. In particular, $p(x) - a\,q(x)$ has multiple roots; this yields $|\ker(\alpha)| < \deg(\alpha)$.

For the remainder of this proof, let $\alpha$ be separable. Let $C \subseteq k$ be the set of roots of the polynomial $p(x)\,q'(x) - p'(x)\,q(x)$. It remains to be shown that there exists $a \in k$ such that the polynomial $p(x) - a\,q(x)$ has no multiple roots and its degree is $\deg(\alpha)$. Since there are infinitely many $a$ such that the degree of $p(x) - a\,q(x)$ is $\deg(\alpha)$, there is such an $a \in k \setminus \{0\}$ with $a \neq p(\hat{c})/q(\hat{c})$ for all $\hat{c} \in C$. Suppose that $p(x) - a\,q(x)$ has a multiple root $c$, that is, $p(c) - a\,q(c) = 0$ and $p'(c) - a\,q'(c) = 0$. The multiplication of the two equations $p(c) = a\,q(c)$ and $a\,q'(c) = p'(c)$ yields $a\,p(c)\,q'(c) = a\,p'(c)\,q(c)$. Since $a \neq 0$, we obtain $c \in C$. This contradicts the choice of $a$ because $a = p(c)/q(c)$. If $P = (a, b)$ is a point on the curve, then $|\ker(\alpha)| = |\alpha^{-1}(P)| \geqslant \deg(\alpha)$. This completes the proof of the theorem.

Remark 5.21. From Exercise 5.11, the mapping $P \mapsto \phi_q(P) - P$ defines the surjective endomorphism $(\phi_q - 1)$, and its kernel consists of the elliptic curve $\tilde{E}(\mathbb{F}_q)$.
To prove Hasse’s theorem, the following path can be taken. First, show that $(\phi_q - 1)$ is separable. Thus, the problem of determining $|\tilde{E}(\mathbb{F}_q)|$ is reduced to giving a bound for the degree of $(\phi_q - 1)$. This is rather difficult and technical. The proof typically uses the concept of Weil pairings and is, for example, presented in [108]. Weil pairings were defined in 1940 by André Weil (1906–1998). ◊
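While the proof of Hasse’s theorem is beyond this section, the bound itself is easy to check numerically for small primes. The following Python sketch counts the points of $\tilde{E}(\mathbb{F}_p)$ with the Legendre symbol formula stated in Exercise 5.3 (plus one for the point $O$) and tests the bound in its usual two-sided form $|N - (p + 1)| \leqslant 2\sqrt{p}$; this form, the function names and the sample curves are assumptions of the illustration, since Equation (5.1) is not reproduced here.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, computed with Euler's criterion."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

def curve_order(A, B, p):
    """Number of points on Y^2 = X^3 + A*X + B over F_p, including the point at infinity."""
    return p + 1 + sum(legendre(x**3 + A * x + B, p) for x in range(p))

def satisfies_hasse(N, p):
    """Check |N - (p + 1)| <= 2*sqrt(p) using integers only."""
    return (N - p - 1) ** 2 <= 4 * p

for p, A, B in [(11, 1, 6), (5, 1, 0), (101, 2, 3)]:
    N = curve_order(A, B, p)
    print(p, N, satisfies_hasse(N, p))
```

The first two sample curves are the ones from Exercises 5.4 and 5.5. The running time is linear in $p$, so this is only a sanity check and no substitute for Schoof’s algorithm.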
Further reading

There is a huge number of scientific articles and textbooks on elliptic curves. Silverman’s book [97] is a standard reference. However, for the proofs, the author often refers, without further explanation, to the book by Robin Hartshorne (born 1938) on algebraic geometry [48], which treats the area in a very general way and uses the concept of schemes as defined by Alexander Grothendieck (1928–2014). Hartshorne’s book was not written to be an introduction to the theory of elliptic curves and, therefore, is not suitable for that purpose. A good introduction to the theory of elliptic curves is, for example, the book by Washington [108]. We would further like to mention [57, 65]. Cryptographic applications of elliptic curves are also treated in textbooks such as [58] by Neal Koblitz (born 1948). However, this book does not prove the group structure. (The proof of the group structure of elliptic curves as given here uses the method of “divisors” and requires no knowledge beyond the scope of the current book.) Koblitz and Victor Saul Miller (born 1947) are considered to be co-founders of elliptic curve cryptography (ECC), see [56] and [76]. Since the mid-1980s, cryptography using elliptic curves has developed rapidly and become the most established method in practice other than the RSA cryptosystem.
Exercises

5.1. A general elliptic curve is defined by an equation of the following type:

$$\tilde{Y}^2 + c\tilde{X}\tilde{Y} + d\tilde{Y} = \tilde{X}^3 + e\tilde{X}^2 + \tilde{A}\tilde{X} + \tilde{B} \qquad (5.2)$$

(a) Show that over fields of characteristic different from 2, Equation (5.2) can be transformed into the following form by changing coordinates:

$$Y^2 = \hat{X}^3 + \hat{e}\hat{X}^2 + \hat{A}\hat{X} + \hat{B} \qquad (5.3)$$

(b) Show that over fields of characteristic different from 3, Equation (5.3) can be transformed into Weierstrass normal form $Y^2 = X^3 + AX + B$ by a coordinate shift of $X$.

5.2. Show that the polynomial $X^3 + AX + B$ has multiple zeros if and only if $4A^3 + 27B^2 = 0$. Make sure that your reasoning is also valid for characteristic 2 and 3.
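5.3. Let p ⩾ 3 be prime and let E be an elliptic curve over 𝔽_p defined by Y² = X³ + AX + B. For a ∈ 𝔽_p we let
\[ \left(\frac{a}{p}\right) = \begin{cases} 1 & \text{if } a \neq 0 \text{ and } a \text{ is a square in } \mathbb{F}_p \\ -1 & \text{if } a \neq 0 \text{ and } a \text{ is not a square in } \mathbb{F}_p \\ 0 & \text{if } a = 0 \end{cases} \]
Show that
\[ |E(\mathbb{F}_p)| = p + \sum_{a \in \mathbb{F}_p} \left(\frac{a^3 + Aa + B}{p}\right). \]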
5.4. Let E be the curve defined by Y² = X³ + X + 6 over the field 𝔽_11.
(a) Show that X³ + X + 6 has no multiple roots in the algebraic closure of 𝔽_11.
(b) Show that Ẽ(𝔽_11) is cyclic.
5.5. Let E be the curve defined by Y² = X³ + X over the field 𝔽_5.
(a) Show that X³ + X has no multiple roots in the algebraic closure of 𝔽_5.
(b) Show that Ẽ(𝔽_5) is isomorphic to the Klein four-group ℤ/2ℤ × ℤ/2ℤ.
Note: In the following exercises, let E(k) always be an elliptic curve given by the equation Y² = X³ + AX + B over an algebraically closed field k of characteristic different from 2. Moreover, let 4A³ + 27B² ≠ 0.
5.6. Show that { P ∈ Ẽ(k) | 3P = O } is isomorphic to the group ℤ/3ℤ × ℤ/3ℤ.
5.7. Show that, for endomorphisms α and β of an elliptic curve, α + β defined by (α + β)(P) = α(P) + β(P) is also a rational morphism.
5.8. Let α be a nontrivial endomorphism. Show that there exist polynomials p(x), q(x), u(x), υ(x) ∈ k[x] such that α(x, y) = (p(x)/q(x), y·u(x)/υ(x)) for almost all (x, y) ∈ E(k). In addition, show that neither p(x)/q(x) nor u(x)/υ(x) is constant.
5.9. Show that the Frobenius morphism ϕ_q is an endomorphism. Hint: Use that Ẽ(k) = Pic⁰(E(k)) by Theorem 5.5.
5.10. Show that the endomorphism α(P) = 2P of E(k) is separable.
5.11. Show that the mapping P ↦ ϕ_q(P) − P defines a surjective endomorphism (ϕ_q − 1) and that the kernel consists of the elliptic curve Ẽ(𝔽_q) over 𝔽_q.
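The character sum in Exercise 5.3 can be checked numerically on the small curves of Exercises 5.4 and 5.5. The following is only a sketch for experimentation, not a proof: the function names are our own, the quadratic character is computed via Euler's criterion, and the brute-force count is feasible only for small primes p.

```python
def legendre(a, p):
    """Quadratic character of a modulo an odd prime p, via Euler's criterion."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def affine_points(A, B, p):
    """All pairs (x, y) in F_p x F_p with y^2 = x^3 + A*x + B (brute force)."""
    return [(x, y) for x in range(p) for y in range(p)
            if (y * y - (x ** 3 + A * x + B)) % p == 0]


def count_by_character_sum(A, B, p):
    """The right-hand side p + sum of Legendre symbols from Exercise 5.3."""
    return p + sum(legendre(x ** 3 + A * x + B, p) for x in range(p))
```

Both counts refer to the affine curve E(𝔽_p); the point at infinity O of Ẽ(𝔽_p) has to be added separately.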
Summary
Notions
– elliptic curve E and Ẽ
– point at infinity O
– inverse point P
– line, vertical
– addition of points
– polynomial ring over E
– norm N(f)
– degree of a polynomial deg(f)
– order ord_P(f)
– divisor
– degree of a divisor deg(D)
– principal divisor div(f)
– Picard group Pic⁰(E)
– pseudocurve
– primality certificate
– rational function
– function field
– order of a rational function at P
– zeros and poles
– rational morphism
– Frobenius morphism ϕ_q
– endomorphism ring
– degree of an endomorphism
– separable endomorphism
Methods and results
– For each elliptic curve E over 𝔽_q, we have |E| ⩽ 2q and |Ẽ| ⩽ 2q + 1.
– Over algebraically closed fields, every line intersects a curve Ẽ at three points (with multiplicities).
– The addition P + Q = −R is chosen such that P, Q, R are located on a line.
– Every polynomial f ≠ 0 has only finitely many roots on a curve E.
– The representation f = υ(x) + yw(x) in k[x, y] is unique.
– The order of a root of f ∈ k[x, y] is uniquely determined.
– If f ≠ 0 ≠ h and ord_P(f) ⩽ ord_P(h) for all P ∈ E, then f divides h.
– ord_P(f ⋅ g) = ord_P(f) + ord_P(g)
– div(f ⋅ g) = div(f) + div(g)
– deg(f) = deg(div(f))
– Ẽ and Pic⁰(E) are isomorphic; in particular, Ẽ is a group.
– construction of a point P and a curve E with P ∈ E
– Diffie–Hellman key exchange with elliptic curves
– structure of pseudocurves
– Lenstra’s integer factorization with elliptic (pseudo)curves
– Pocklington’s prime number certification
– Goldwasser and Kilian’s prime number certification
– If two rational functions coincide at infinitely many points, then they are identical.
– P ↦ P + T is a rational morphism.
– P ↦ 2P is a rational morphism.
– Endomorphisms have a finite kernel.
– If two endomorphisms coincide at almost all points, then they are identical.
– The Frobenius morphism is a bijective endomorphism.
– Endomorphisms form a ring.
– normal form of nontrivial endomorphisms
– relation between endomorphisms and their rational morphisms
– Endomorphisms are surjective.
– For separable endomorphisms, the degree is the size of the kernel.
– For nonseparable endomorphisms, the degree is larger than the size of the kernel.
6 Combinatorics on words
Combinatorics on words deals with structural statements and properties of words and formal languages. Typical applications are text algorithms such as pattern recognition and the theory of variable length codes. The foundations were laid by the Norwegian mathematician Axel Thue (1863–1922) in the early 20th century. We refer to the Lothaire textbooks [66–68] for background information and highlights in this field.
Let Σ be a set. We view Σ as an alphabet and the elements of Σ are therefore called letters. We are mainly interested in the case where Σ is finite. By Σ^n, we denote the set of sequences of n letters over Σ. For (a_1, …, a_n) ∈ Σ^n, we also write a_1 ⋯ a_n. We say that a_1 ⋯ a_n is a word of length |a_1 ⋯ a_n| = n, and its alphabet is alph(a_1 ⋯ a_n) = {a_1, …, a_n} ⊆ Σ. Let Σ∗ denote the set of all (finite) words over Σ:
\[ \Sigma^* = \bigcup_{n \in \mathbb{N}} \Sigma^n \]
We use ε to denote the empty word. It is the only element of Σ∗ of length 0, and the only one whose alphabet is empty. We have ∅∗ = {ε}. Once Σ is nonempty, Σ∗ is an infinite set. On Σ∗, we can define a concatenation ⋅ as follows. For a_1 ⋯ a_n, b_1 ⋯ b_m ∈ Σ∗ let
(a_1 ⋯ a_n) ⋅ (b_1 ⋯ b_m) = a_1 ⋯ a_n b_1 ⋯ b_m
The concatenation is associative and the empty word ε is its neutral element. If x, u, υ, w are words with x = uυw, then u is called a prefix, υ a factor and w a suffix of x. For u ∈ Σ∗, we denote by u^n the n-fold concatenation u ⋯ u of u. If u = x^n holds for some n ∈ ℕ, then we call u a power of x. Together with the concatenation as operation, Σ∗ forms a monoid.¹ It is the free monoid over the set Σ. The term “free” is motivated by the following algebraic property of Σ∗:
Theorem 6.1 (Universal property). Let M be a monoid and φ : Σ → M a mapping. Then φ can uniquely be extended to a homomorphism φ : Σ∗ → M by defining φ(a_1 ⋯ a_n) = φ(a_1) ⋯ φ(a_n) for n ⩾ 0.
Proof: For n = 0, we have φ(ε) = 1, as needed for a monoid homomorphism. Let a_1 ⋯ a_n, b_1 ⋯ b_m ∈ Σ∗ be two words. Then, we have
φ(a_1 ⋯ a_n b_1 ⋯ b_m) = φ(a_1) ⋯ φ(a_n) φ(b_1) ⋯ φ(b_m) = φ(a_1 ⋯ a_n) φ(b_1 ⋯ b_m)
1 In the context of monoids, we use the symbol 1 for the empty word, too. Another standard notation for the empty word is λ. The symbol λ refers to the original work of Thue who published in German, where leeres Wort means empty word.
This shows that the extension to Σ∗ is a monoid homomorphism. Let ψ : Σ∗ → M be an arbitrary monoid homomorphism such that ψ(a) = φ(a) for all a ∈ Σ. Then, ψ(a_1 ⋯ a_n) = ψ(a_1) ⋯ ψ(a_n) = φ(a_1) ⋯ φ(a_n) = φ(a_1 ⋯ a_n). Thus, ψ = φ, proving the uniqueness of the extension of φ.
Example 6.2. Let M be any monoid. The free monoid M∗ over the set M is infinite. Applying Theorem 6.1 to the identity mapping M → M, we obtain a surjective homomorphism, the evaluation homomorphism M∗ → M, (m_1, …, m_n) ↦ m_1 ⋯ m_n, where m_1 ⋯ m_n is a product in M. ◊
Example 6.3. Let Σ be an arbitrary set. The mapping Σ → ℕ with a ↦ 1 extends to a homomorphism |⋅| : Σ∗ → (ℕ, +, 0), which is the length function w ↦ |w|. ◊
Example 6.4. Applying Theorem 6.1 to the mapping Σ → 2^Σ, a ↦ {a} yields that the alphabet mapping Σ∗ → (2^Σ, ∪, ∅) with w ↦ alph(w) is a homomorphism, that is, alph(ε) = ∅ and alph(υw) = alph(υ) ∪ alph(w). ◊
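The universal property and the two preceding examples can be sketched in a few lines. In this illustration, words are Python strings, a monoid is given by a binary operation together with its neutral element, and the helper name extend is our own choice.

```python
from functools import reduce


def extend(phi, op, unit):
    """Extend a map phi on letters to a homomorphism from words into the monoid (op, unit)."""
    def hom(word):
        # phi(a_1 ... a_n) = phi(a_1) op ... op phi(a_n); the empty word maps to unit.
        return reduce(op, (phi(a) for a in word), unit)
    return hom


# Example 6.3: every letter maps to 1 in (N, +, 0).
length = extend(lambda a: 1, lambda m, n: m + n, 0)

# Example 6.4: every letter a maps to {a} in (2^Sigma, union, empty set).
alph = extend(lambda a: frozenset({a}), lambda s, t: s | t, frozenset())
```

For instance, length("abba") evaluates 1 + 1 + 1 + 1, and alph("abba") is the union of the singletons {a} and {b}.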
6.1 Commutation, transposition and conjugacy
We are interested in the structural relation between words which satisfy certain equations. For example, what can we say about words x and y such that xy = yx, that is, words x and y which commute? Since powers of elements commute, there is a simple sufficient condition. If x = r^k and y = r^m for certain k, m ∈ ℕ, then xy = yx follows. Theorem 6.5 (b) tells us that the converse holds in free monoids. Besides commutation, we also study transposition and conjugation. Let M be any monoid. Elements x, y ∈ M are called transposed, if there are r, s ∈ M such that x = rs and y = sr. Transposition is a reflexive and symmetric relation, but in general it is not transitive; see Exercise 6.1. However, we show that transposition is a transitive relation in free monoids. An element x ∈ M is called conjugate to y ∈ M (more precisely left-conjugate) if there exists z ∈ M such that zx = yz. The conjugacy relation is always reflexive and transitive, but in general it is not symmetric: for example, consider M = a∗ ∪ ba∗ ⊆ {a, b}∗ with the multiplication defined by x ⋅ a = xa and x ⋅ b = b for all x. Then b is conjugate to a, but not vice versa. Conjugation is symmetric in groups and is therefore an equivalence relation since in this case all three equations zx = yz, x = z⁻¹yz, and y = zxz⁻¹ are equivalent. If elements x, y in a monoid are transposed by x = rs and y = sr, then they are also conjugate because sx = srs = ys. Theorem 6.5 (a) states that in free monoids the converse is also true. Thus, in free monoids, conjugation and transposition coincide. We will see in Section 8.10 that the statements about commutation, transposition, and conjugation from free monoids carry over to free groups.
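The monoid M = a∗ ∪ ba∗ from the preceding paragraph is small enough to experiment with. The sketch below encodes its elements as Python strings and searches for a witness z with z ⋅ x = y ⋅ z among the elements a^n and ba^n with n up to a chosen bound. The function names and the bounded search are our own illustration; a failed search is of course only evidence, not a proof, that no witness exists.

```python
def mul(x, y):
    """Product in M = a* ∪ ba*: multiplying by the letter b resets the word to b."""
    # y is either a^n (then x·y = x a^n) or b a^n (then x·y = b a^n = y).
    return y if y.startswith("b") else x + y


def elements(bound):
    """The elements a^n and b a^n of M for 0 <= n <= bound."""
    return [prefix + "a" * n for n in range(bound + 1) for prefix in ("", "b")]


def conjugacy_witness(x, y, bound):
    """Some z among elements(bound) with z·x = y·z, or None if there is none."""
    for z in elements(bound):
        if mul(z, x) == mul(y, z):
            return z
    return None
```

With x = b and y = a, the search finds the witness z = b, in accordance with b ⋅ b = b = a ⋅ b; with x = a and y = b, it finds nothing.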
The following theorem by Roger Conant Lyndon (1917–1988) and Marcel-Paul Schützenberger (1920–1996) is fundamental in combinatorics on words. It describes the structure of conjugate words. The second part of the theorem deals with pairs of commuting words.
Theorem 6.5 (Lyndon, Schützenberger). Let x, y, z ∈ Σ∗ be words.
(a) If x ≠ ε and zx = yz, then there are r, s ∈ Σ∗ and k ∈ ℕ such that x = sr, y = rs and z = (rs)^k r. In particular, conjugation coincides with transposition, and both are equivalence relations.
(b) If xy = yx, then x and y both are powers of the same word r ∈ Σ∗.
Proof: (a) If |z| ⩽ |x|, then z is a suffix of x. Therefore, according to the following picture, a word s with y = zs and x = sz exists. Then the claim holds for r = z and k = 0.
[Figure: the word zx = yz in the case |z| ⩽ |x|, factorized as z ⋅ x in the upper row and as y ⋅ z in the lower row, with y = zs and x = sz.]
For |z| > |x|, we obtain the following picture:
[Figure: the word zx = yz in the case |z| > |x|, factorized as z ⋅ x in the upper row and as y ⋅ z in the lower row, with z = z′x = yz′.]
Thus, z = z′x = yz′ and |z′| < |z| since x ≠ ε. By induction on |z| there exist words r, s ∈ Σ∗ and an exponent k′ ∈ ℕ with x = sr, y = rs and z′ = (rs)^{k′} r. Now, the claim holds with k = k′ + 1.
(b) If x = ε, then the claim holds with r = y. Otherwise, we may assume, by symmetry, that both x and y are not empty. Using part (a), we obtain x = st = ts and y = (st)^k s. By induction on |xy|, the words s and t are powers of the same word r. Thus, the statement is true for x and y, too.
Every nonempty word w can be written as w = r^k where r is nonempty and k ∈ ℕ is maximal. The word r is called the primitive root of w. If we have k = 1, then w is equal to its primitive root, and we say that w is primitive in this case. A primitive root is uniquely defined. For this and another basic property of primitive words, we refer to Exercise 6.5 and Exercise 6.6. Theorem 6.5 (b) shows that two nonempty words commute if and only if they have the same primitive root.
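The decomposition of Theorem 6.5 (a) and the primitive root can both be computed directly. The following sketch uses our own function names and Python strings as words; the decomposition follows the idea of the proof, peeling copies of y off the front of z, but it is written iteration-free via integer division and is not taken from the text.

```python
def primitive_root(w):
    """Shortest word r with w = r^k; w is assumed to be nonempty."""
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w[:d] * (n // d) == w:
            return w[:d]


def conjugacy_decomposition(x, y, z):
    """Given zx = yz with x nonempty, return (r, s, k) with x = sr, y = rs, z = (rs)^k r."""
    assert x and z + x == y + z
    k = len(z) // len(x)          # |x| = |y|, and z = y^k r with |r| < |y|
    r = z[k * len(x):]
    s = y[len(r):]                # r is a prefix of y, so y = r s
    return r, s, k
```

For x = ba, y = ab and z = aba, we have zx = ababa = yz, and the sketch returns r = a, s = b and k = 1. The decomposition is not unique in general; the function returns the one in which r is shorter than y.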
6.2 Fine and Wilf’s periodicity lemma
Nathan Jacob Fine (1916–1994) and Herbert Saul Wilf (1931–2012) gave another sufficient condition for commutativity in free monoids: if u^m and υ^n have a sufficiently long prefix in common, then u and υ commute. They gave a tight bound for the meaning of “sufficiently long.” The result is nontrivial and very useful. In order to have a comfortable induction hypothesis, we show a slightly stronger statement. This is a typical example of “loading the induction” where a stronger statement is easier to prove. The proof given here was found by Jeffrey Outlaw Shallit (born 1957).
Theorem 6.6. Let u, υ ∈ Σ∗ be nonempty, s ∈ u{u, υ}∗ and t ∈ υ{u, υ}∗. If s and t have a common prefix of length |u| + |υ| − gcd(|u|, |υ|), then uυ = υu.
Proof: We use induction on |u| + |υ|. Without loss of generality, assume that |u| ⩽ |υ|. The statement is trivial for |u| = |υ|, so let 1 ⩽ |u| < |υ|. From gcd(|u|, |υ|) < |υ|, we obtain υ = uw for some nonempty word w. It suffices to show that uw = wu holds, because then uυ = u(uw) = u(wu) = (uw)u = υu follows. Since |s| ⩾ |u| + |υ| − gcd(|u|, |υ|) > |u|, we have s ∈ uu{u, w}∗. Since t has uw as a prefix, we can write s = us′ and t = ut′ for s′ ∈ u{u, w}∗ and t′ ∈ w{u, w}∗. Now, gcd(|u|, |υ|) = gcd(|u|, |w|) and |υ| = |u| + |w|. Hence, s′ and t′ have a common prefix of length |u| + |w| − gcd(|u|, |w|). The induction hypothesis yields uw = wu as desired.
A natural number p ⩾ 1 is a period of the word a_1 ⋯ a_n with a_i ∈ Σ if a_i = a_{i+p} for all 1 ⩽ i ⩽ n − p. For example, the word aabaabaa has the periods 3, 6, 7, 8. The length |u| is always a period of a nonempty word u. Fine and Wilf’s periodicity lemma from 1965 is often formulated in the form of Corollary 6.7: if a sufficiently long word w has two periods, then their greatest common divisor is a period of w, too.
Corollary 6.7 (Periodicity lemma). If a word w with |w| ⩾ p + q − gcd(p, q) has two periods p and q, then gcd(p, q) is also a period of w.
Proof: Let w be a prefix of both s = u^k and t = υ^ℓ, where |u| = p and |υ| = q. Theorem 6.6 yields uυ = υu. By Theorem 6.5 (b), u and υ both are powers of a word x. Thus, s also is a power of x and |x| as well as all multiples of it are periods of w. The length of x divides both p and q, therefore |x| also divides gcd(p, q). Hence, gcd(p, q) is a period of w.
For all positive integers p, q, we inductively define the pair of words σ(p, q) ∈ {a, b}∗ × {a, b}∗ as follows:
\[ \sigma(p, q) = \begin{cases} (a^p, a^{p-1}b) & \text{if } p = q \\ (\upsilon, u) & \text{if } p > q \text{ and } \sigma(q, p) = (u, \upsilon) \\ (u, u\upsilon) & \text{if } p < q \text{ and } \sigma(p, q - p) = (u, \upsilon) \end{cases} \]
The pair of words (u, υ) = σ(p, q) has the properties |u| = p, |υ| = q and uυ ≠ υu, and sufficiently long powers of u and υ coincide on the first p + q − gcd(p, q) − 1 symbols. Therefore, the bound |u| + |υ| − gcd(|u|, |υ|) in the periodicity lemma is optimal. For instance, for p ⩾ 2, the word w = a^{p−1} b a^{p−1} has the periods p and p + 1, but gcd(p, p + 1) = 1 is not a period of w. This is not a contradiction to the periodicity lemma because |w| = 2p − 1 < p + (p + 1) − 1.
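The notions of this section lend themselves to small experiments. The sketch below computes all periods of a word, builds σ(p, q) following the inductive definition, and measures how long the powers of the two words agree. The function names are ours, and "powers" are truncated to length |u| + |υ|, which suffices because the bound p + q − gcd(p, q) is smaller than that.

```python
from math import gcd


def periods(w):
    """All p >= 1 such that w[i] == w[i + p] whenever both positions exist."""
    n = len(w)
    return [p for p in range(1, n + 1)
            if all(w[i] == w[i + p] for i in range(n - p))]


def sigma(p, q):
    """The pair of words sigma(p, q) over {a, b} for positive integers p, q."""
    if p == q:
        return "a" * p, "a" * (p - 1) + "b"
    if p > q:
        u, v = sigma(q, p)
        return v, u
    u, v = sigma(p, q - p)
    return u, u + v


def common_prefix_of_powers(u, v):
    """Length of the longest common prefix of uuu... and vvv..., capped at |u| + |v|."""
    total = len(u) + len(v)
    s = (u * total)[:total]
    t = (v * total)[:total]
    n = 0
    while n < total and s[n] == t[n]:
        n += 1
    return n
```

For example, σ(2, 3) = (ab, aba), and the powers ababab⋯ and abaaba⋯ agree on exactly the first 3 = 2 + 3 − 1 − 1 symbols.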
6.3 Kruskal’s tree theorem
A word a_1 ⋯ a_n with a_i ∈ Σ is a subword of υ if υ can be written as υ = υ_0 a_1 υ_1 ⋯ a_n υ_n. We write u ≼ υ if u is a subword of υ. The term scattered subword is sometimes used to emphasize the distinction between “subword” and “factor.” We want to show that, if L is an infinite set of words over a finite alphabet, then there exist two different words u, υ ∈ L with u ≼ υ. This fact is known as Higman’s lemma. We prove it in the slightly more abstract context of quasi-orderings and well-quasi-orderings. This approach allows us to derive Kruskal’s tree theorem by using the same methods. Kruskal’s tree theorem is a generalization of Higman’s lemma to the case of finite trees.
A quasi-ordering (X, ⩽) is a set X equipped with a reflexive and transitive relation ⩽, that is, ⩽ is a subset of X × X and for all x, y, z ∈ X, we have x ⩽ x and the implication: if x ⩽ y and y ⩽ z, then x ⩽ z. In contrast to partial orders, quasi-orderings need not be antisymmetric, that is, x ⩽ y and y ⩽ x does not imply x = y. We write x < y if x ⩽ y and y ⩽̸ x. Two elements x, y ∈ X are incomparable if neither x ⩽ y nor y ⩽ x; otherwise x, y are called comparable. A subset of pairwise comparable elements is called a chain. A subset Y ⊆ X is an antichain if any two different elements of Y are incomparable. A sequence (x_i)_{i∈ℕ} with x_i ∈ X is decreasing if x_i > x_{i+1} for all i; it is nondecreasing if x_i ⩽ x_{i+1} for all i. A sequence (x_i)_{i∈ℕ} is called good if there exist i < j with x_i ⩽ x_j. If an infinite sequence is not good, then it is bad. A quasi-ordering (X, ⩽) is called a well-quasi-ordering if every infinite sequence is good. Every finite quasi-ordering is a well-quasi-ordering. In particular, the identity on a finite set always defines a well-quasi-ordering. The integers (ℤ, |) with divisibility | form a quasi-ordering, which is not a well-quasi-ordering, because, for example, the sequence of prime numbers is bad.
The natural numbers with the usual order (ℕ, ⩽) form a well-quasi-ordering: let (x_i)_{i∈ℕ} be a sequence in ℕ and let x_j be the smallest value in the sequence; then clearly x_j ⩽ x_{j+1}. The integers (ℤ, ⩽), however, do not form a well-quasi-ordering since, for example, the sequence (−i)_{i∈ℕ} is bad. Similarly, the nonnegative real numbers (ℝ_{⩾0}, ⩽) do not form a well-quasi-ordering, because (1/(i+1))_{i∈ℕ} is bad.
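Whether an infinite sequence is good cannot be decided by inspecting finitely many elements, but the definition can at least be explored on finite prefixes. The following sketch, with names of our own choosing, looks for the first pair of indices i < j with x_i ⩽ x_j in a finite list.

```python
def first_good_pair(seq, leq):
    """Indices (i, j) with i < j and seq[i] <= seq[j] w.r.t. leq, or None if there are none."""
    for j in range(len(seq)):
        for i in range(j):
            if leq(seq[i], seq[j]):
                return i, j
    return None


def divides(a, b):
    """Divisibility a | b for nonzero a."""
    return b % a == 0


def usual_order(a, b):
    return a <= b
```

A prefix of the sequence of primes contains no such pair with respect to divisibility, and neither does a prefix of (−i)_{i∈ℕ} with respect to the usual order, whereas any sufficiently long prefix of a sequence of natural numbers does.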
One of the main motivations to consider well-quasi-orderings is that, in many respects, they behave like finite sets. An important tool in the interplay between finite and infinite sets is Ramsey’s theorem (Frank Plumpton Ramsey, 1903–1930). This theorem is known in many different variants; here, we only deal with edge colorings of infinite complete graphs with finitely many colors.
Theorem 6.8 (Ramsey 1930). Let V be an infinite set of vertices and C a finite set of colors. For each coloring c : $\binom{V}{2}$ → C there is an infinite subset X ⊆ V such that $\binom{X}{2}$ is colored with a single color with respect to c.
Proof: We define finite subsets X_i ⊆ V and infinite subsets Y_i ⊆ V satisfying
– X_i ⊆ X_{i+1} and X_{i+1} \ X_i = {x_{i+1}}, Y_{i+1} ∪ {x_{i+1}} ⊆ Y_i, X_i ∩ Y_i = ∅,
– for all x_i there is a color r_i ∈ C with c(x_i, y) = r_i for all y ∈ Y_i.
We start with X_0 = ∅ and Y_0 = V. Now, let X_i and Y_i be already defined. We choose an arbitrary vertex x_{i+1} ∈ Y_i and let X_{i+1} = X_i ∪ {x_{i+1}}. Since C is finite, there is an infinite subset Y_{i+1} ⊆ Y_i \ {x_{i+1}} such that all edges {x_{i+1}, y} for y ∈ Y_{i+1} have the same color in the coloring c. We now define X = ⋃_{i⩾0} X_i and a vertex coloring d : X → C by d(x_i) = r_i. Then there is an infinite subset X′ ⊆ X such that all vertices in X′ have the same color r with respect to d. For x_i, x_j ∈ X′ with i < j, we have c(x_i, x_j) = d(x_i) = r because x_j ∈ Y_i. Therefore, all edges from $\binom{X'}{2}$ have the color r in c.
We now present a simple but useful characterization of well-quasi-orderings.
Theorem 6.9. Let (X, ⩽) be a quasi-ordering. Then the following properties are equivalent:
(a) (X, ⩽) is a well-quasi-ordering, that is, all infinite sequences are good.
(b) There are no infinite decreasing sequences and no infinite antichains.
(c) Every infinite sequence (x_i)_{i∈ℕ} in X has an infinite nondecreasing subsequence (x_{i_j})_{j∈ℕ}, that is, we have x_{i_j} ⩽ x_{i_{j+1}} for all j.
Proof: The implications from (a) to (b) and from (c) to (a) are trivial. We now show (b) ⇒ (c). Let (x_i)_{i∈ℕ} be an infinite sequence in X. We define a coloring c on { (i, j) ∈ ℕ × ℕ | i < j } by
\[ c(i, j) = \begin{cases} \leqslant & \text{if } x_i \leqslant x_j \\ > & \text{if } x_i > x_j \\ \parallel & \text{if } x_i, x_j \text{ are incomparable} \end{cases} \]
This defines an infinite complete graph with vertex set ℕ, whose edges show three colors. According to Ramsey’s theorem, there is an infinite monochromatic subset {i_1, i_2, …} ⊆ ℕ. By assumption, the color of this subset can neither be > nor ‖ and thus, x_{i_1} ⩽ x_{i_2} ⩽ ⋯ is an infinite nondecreasing sequence.
Remark 6.10. Many results concerning well-quasi-orderings are of the form “If (X, ⩽) is a well-quasi-ordering, then (X′, ⩽′) is one, too.” For example, (Y, ⩽) is a well-quasi-ordering if Y ⊆ X for a well-quasi-ordering (X, ⩽); here, the order on Y is the restriction of the order on X. Similarly, a quasi-ordering (X, ≼) is a well-quasi-ordering if (X, ⩽) is a well-quasi-ordering and ⩽ is a subset of ≼. In this case (X, ≼) is called a coarsening of (X, ⩽). ◊
If (X, ⩽) and (Y, ⩽) are two quasi-orderings, then the product order on X × Y is componentwise: two pairs (x, y), (x′, y′) ∈ X × Y satisfy (x, y) ⩽ (x′, y′) if both x ⩽ x′ and y ⩽ y′. The product order is again a quasi-ordering.
Theorem 6.11. If both (X, ⩽) and (Y, ⩽) are well-quasi-orderings, then the product order (X × Y, ⩽) is also a well-quasi-ordering.
Proof: Let (x_i, y_i)_{i∈ℕ} be a sequence in X × Y. By Theorem 6.9 (c), there is a nondecreasing subsequence (x_{i_j})_{j∈ℕ} of (x_i)_{i∈ℕ}, that is, we have x_{i_j} ⩽ x_{i_{j+1}} for all j. Since (Y, ⩽) is a well-quasi-ordering, the sequence (y_{i_j})_{j∈ℕ} has two elements y_{i_k} ⩽ y_{i_ℓ} for k < ℓ. This yields (x_{i_k}, y_{i_k}) ⩽ (x_{i_ℓ}, y_{i_ℓ}).
The following fact about ℕ^k with componentwise order is often attributed to a paper of Dickson (Leonard Eugene Dickson, 1874–1954) published in 1913 [22]. It has been rediscovered many times and seems to have been known earlier, too.
Corollary 6.12 (Dickson’s lemma). (ℕ^k, ⩽) is a well-quasi-ordering.
Proof: Since (ℕ, ⩽) is a well-quasi-ordering, the claim follows from Theorem 6.11 by induction on k.
By Dickson’s lemma, ℕ^k cannot contain an infinite antichain; however, arbitrarily large (finite) antichains occur. For example, A_n = { (i, n − i) | 0 ⩽ i ⩽ n } is an antichain of size n + 1 in ℕ².
Let (X, ⩽) be a quasi-ordering. On words from X∗, we define the subword relation as follows. Let u = a_1 ⋯ a_n with a_i ∈ X. We let u ≼ υ if υ has a factorization υ = υ_0 b_1 υ_1 ⋯ b_n υ_n with b_i ∈ X and υ_i ∈ X∗, such that a_i ⩽ b_i for all i.
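The product order and the subword relation over a quasi-ordering are easy to test mechanically. In the sketch below, the function names are ours, words are arbitrary Python sequences, and the quasi-ordering is passed as a function leq. The subword test uses a greedy strategy that matches every letter of u against the leftmost admissible letter of υ; this is a standard technique and not taken from the text.

```python
def product_leq(leq_x, leq_y):
    """Componentwise order on pairs built from two quasi-orderings."""
    return lambda p, q: leq_x(p[0], q[0]) and leq_y(p[1], q[1])


def is_antichain(elems, leq):
    """True if any two different elements are incomparable."""
    return all(not leq(x, y) for x in elems for y in elems if x != y)


def is_subword(u, v, leq):
    """Greedy test for u ≼ v with respect to the quasi-ordering leq on letters."""
    letters = iter(v)
    # For each letter a of u, consume v up to the next letter b with a <= b.
    return all(any(leq(a, b) for b in letters) for a in u)
```

With the usual order on ℕ in both components, the set A_n from above is recognized as an antichain; with equality on letters, is_subword specializes to the subword relation on Σ∗.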
This way, (X∗, ≼) forms a quasi-ordering. If u ≼ υ, then u is called a subword of υ. The following result by Graham Higman (1917–2008) from 1952 states that the subword relation over a well-quasi-ordering is itself a well-quasi-ordering [49]. We present the proof of Crispin St. John Alvah Nash-Williams (1932–2001); it uses the technique of so-called minimal bad sequences [80].
Theorem 6.13 (Higman’s lemma). If (X, ⩽) is a well-quasi-ordering, then (X∗, ≼) with the subword relation is a well-quasi-ordering, too.
Proof: Suppose that (X∗, ≼) is not a well-quasi-ordering: then there are bad sequences in X∗. We inductively define a minimal bad sequence (u_i)_{i∈ℕ} as follows. Assume we have already constructed u_0, …, u_{i−1} suitably. Then for u_i, we choose some shortest word such that there exists a bad sequence beginning with u_0, …, u_i. This process defines a bad sequence (u_i)_{i∈ℕ}. In particular, all words u_i are nonempty. We write u_i = a_i υ_i with a_i ∈ X and υ_i ∈ X∗. By Theorem 6.9 (c), the sequence (a_i)_{i∈ℕ} contains an infinite nondecreasing subsequence (a_{i_j})_{j∈ℕ}. Note that u_i ≼ υ_j implies u_i ≼ u_j. So, if the sequence (υ_{i_j})_{j∈ℕ} were bad, then also u_0, …, u_{i_0−1}, υ_{i_0}, υ_{i_1}, υ_{i_2}, … would be bad, contradicting the definition of u_{i_0}. Thus, there are k < ℓ with υ_{i_k} ≼ υ_{i_ℓ}. Using a_{i_k} ⩽ a_{i_ℓ}, this yields u_{i_k} ≼ u_{i_ℓ} in contradiction to the fact that (u_i)_{i∈ℕ} is bad. Consequently, there are no bad sequences, and (X∗, ≼) is a well-quasi-ordering.
A frequent application of Higman’s lemma is in the case of the well-quasi-ordering (Σ, =) for a finite alphabet Σ. Then, u = a_1 ⋯ a_n with a_i ∈ Σ is a subword of υ if υ = υ_0 a_1 υ_1 ⋯ a_n υ_n for υ_i ∈ Σ∗. From this special case of Higman’s lemma, one can easily derive Dickson’s lemma: k-tuples (n_1, …, n_k) ∈ ℕ^k are encoded as words a_1^{n_1} ⋯ a_k^{n_k} over the alphabet Σ = {a_1, …, a_k}. A sequence (x_i)_{i∈ℕ} in ℕ^k corresponds in this way to a sequence (u_i)_{i∈ℕ} of words u_i ∈ Σ∗; and x_i ⩽ x_j holds if and only if u_i is a subword of u_j.
A generalization of Higman’s lemma is Kruskal’s tree theorem (Joseph Bernard Kruskal, 1928–2010) from 1960. It shows that the set of finite trees is a well-quasi-ordering with respect to the subtree relation [61]. Depending on what kind of trees are considered, one obtains different versions of Kruskal’s tree theorem. We use rooted trees in which the children of each node are ordered. Moreover, the nodes in the trees are labeled with elements from a well-quasi-ordering. This scenario yields one of the most general variants of Kruskal’s tree theorem.
Let X be a set. We inductively define the class of finite trees T_X over X as follows:
– If t_1, …, t_n are trees in T_X, then also x(t_1, …, t_n) is a tree in T_X for all x ∈ X.
Here x(t_1, …, t_n) denotes the tree whose root is labeled x and whose children are (in this order) the trees t_1, …, t_n. Since the order of a node’s children is essential, one often speaks of ordered trees. The tree x(t_1, …, t_n) is illustrated in Figure 6.1. Note that the trees here are oriented from top to bottom; in particular, parent nodes are always depicted closer to the top than their child nodes. For n = 0, we see that every tree consisting only of a root labeled by x belongs to T_X.
Fig. 6.1. A tree x(t_1, …, t_n).
Let (X, ⩽) be a quasi-ordering. We inductively define the subtree relation ≪ on T_X as follows:
– Let n ⩾ 0. If x ⩽ y and s_i ≪ t_i, then for all finite sequences u_i of trees from T_X, we have x(s_1, …, s_n) ≪ y(u_0, t_1, u_1, …, t_n, u_n).
– If s ≪ t, then also s ≪ x(t) for all x ∈ X.
– If r ≪ s and s ≪ t, then r ≪ t.
The idea is that s ≪ t if and only if one can obtain the tree s from t by applying the following three operations: (a) replacing the label y ∈ X of some node by x ∈ X, provided that x ⩽ y; (b) removing subtrees of an arbitrary node in t; and (c) removing (“skipping”) nodes having only one child. In particular, ≪ is a quasi-ordering. If s ≪ t holds, then we call s a subtree of t. In the following example from T_{a,b} over the well-quasi-ordering ({a, b}, =), we have t_3 ≪ t_2 ≪ t_1, whereas t_4 is not a subtree of t_1.
[Figure: four example trees with node labels a and b: tree t_1, tree t_2 ≪ t_1, tree t_3 ≪ t_2, and tree t_4 ≪̸ t_1.]
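The subtree relation on ordered trees can be decided by a short recursive procedure. The sketch below is our own illustration and relies on the usual recursive characterization of this kind of embedding, which we assume here without proof: s ≪ t holds if s already embeds into some child of t, or if the root labels are related and the children of s embed, in order, into distinct children of t. Trees are encoded as pairs (label, children); the four example trees from the figure are not reproduced, so the test data below are made up.

```python
def tree(label, *children):
    """The tree label(children...), encoded as a pair (label, tuple of children)."""
    return (label, children)


def embeds(s, t, leq):
    """Decide s ≪ t for ordered trees, with leq the quasi-ordering on labels."""
    (x, s_children), (y, t_children) = s, t
    # Case 1: s embeds into a single child of t (the remaining subtrees are removed).
    if any(embeds(s, c, leq) for c in t_children):
        return True
    # Case 2: the roots are matched; the children of s are matched greedily,
    # from left to right, against distinct children of t.
    if not leq(x, y):
        return False
    remaining = iter(t_children)
    return all(any(embeds(c, d, leq) for d in remaining) for c in s_children)
```

For t = a(b(a), b) over ({a, b}, =), the tree a(a, b) is a subtree of t, whereas a(b, a) and b(b) are not.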
Theorem 6.14 (Kruskal’s tree theorem). If (X, ⩽) is a well-quasi-ordering, then (T_X, ≪) with the subtree relation is a well-quasi-ordering, too.
Proof: Suppose that (T_X, ≪) is not a well-quasi-ordering. Then there are bad sequences. We inductively define a minimal bad sequence (t_i)_{i∈ℕ} of trees t_i ∈ T_X. Let t_0, …, t_{i−1} be suitably constructed already. Then we choose a tree t_i with a minimum number of nodes such that a bad sequence exists, which begins with t_0, …, t_i. This process yields a bad sequence (t_i)_{i∈ℕ}. Let x_i be the label of the root of t_i. Since (X, ⩽) is a well-quasi-ordering, by Theorem 6.9 (c) there is an infinite nondecreasing subsequence (x_{i_j})_{j∈ℕ} of (x_i)_{i∈ℕ} with x_{i_j} ⩽ x_{i_{j+1}}. We write t_{i_j} = x_{i_j}(s^j_1, …, s^j_{n_j}) with s^j_k ∈ T_X. Let S_j = {s^j_1, …, s^j_{n_j}} and S = ⋃_{j∈ℕ} S_j.
Claim. (S, ≪) with the subtree relation is a well-quasi-ordering.
Proof of the claim: Suppose that (s_i)_{i∈ℕ} is a bad sequence in S. Choose j minimal such that S_j ∩ { s_i | i ∈ ℕ } ≠ ∅. By omitting a finite number of elements at the beginning of the sequence, we may assume that s_0 ∈ S_j. Due to the minimality of t_{i_j}, the sequence t_0, …, t_{i_j−1}, s_0, s_1, … is good. Since both (t_i)_{i∈ℕ} and (s_i)_{i∈ℕ} are bad, there exist t_k and s_ℓ with k < i_j such that t_k ≪ s_ℓ. Let s_ℓ ∈ S_m. Then t_k ≪ x_{i_m}(s_ℓ) ≪ t_{i_m}, and by the choice of j, we have k < i_m. This is a contradiction.
By Higman’s lemma, the words S∗ with the subword relation ≼ form a well-quasi-ordering. We consider the sequence (u_j)_{j∈ℕ} with u_j = s^j_1 ⋯ s^j_{n_j} ∈ S∗. Since it is good, there are k < ℓ with u_k ≼ u_ℓ. Together with x_{i_k} ⩽ x_{i_ℓ}, we finally obtain t_{i_k} ≪ t_{i_ℓ}, a contradiction.
The proof of Kruskal’s tree theorem presented here is essentially the one given by Nash-Williams [80]. Higman’s lemma is a special case of Kruskal’s tree theorem: a word a_1 ⋯ a_n ∈ X∗ with a_i ∈ X is encoded by the tree a(a_1, …, a_n) for a fixed a ∈ X; alternatively one could also use the more vertical encoding a_1(a_2(⋯ a_{n−1}(a_n) ⋯)). Frequently, one considers trees in which one does not distinguish the order of the children. This variant of Kruskal’s tree theorem can easily be derived from Theorem 6.14 by using Remark 6.10. The subtree relation appearing in the unordered case is nothing but the so-called graph minor relation. In a series of 20 publications with altogether more than 500 pages, George Neil Robertson (born 1938) and Paul D. Seymour (born 1950) proved that the finite graphs with the graph minor relation form a well-quasi-ordering. The final result is given in [88].
Exercises 6.1. Consider the monoid M = Σ∗ /{ac = ca} with generating set Σ = {a, b, c}, where words of the form uacυ and ucaυ are identified with each other; this means we perform operations like we do with words, but with the additional rule ac = ca. So, for example, in M, we have bcaaa = bacaa = baaca = baaac, but abc ≠ cba. Show that the transposition of elements in M is not a transitive relation. 6.2. (Levi’s lemma, named after Friedrich Wilhelm Daniel Levi, 1888–1966) Let M be a monoid. Show that the following properties are equivalent: (i) M is isomorphic to the free monoid Σ∗ over a set Σ. (ii) There is a homomorphism φ : M → ℕ with φ−1 (0) = 1M , and for all p, q, x, y ∈ M such that pq = xy there exists an element u ∈ M for which either p = xu, y = uq or x = pu, q = uy is satisfied. 6.3. Determine a nonfree monoid, in which for all elements p, q, x, y with pq = xy there is an element u such that either p = xu, y = uq or x = pu, q = uy is satisfied (compare with Exercise 6.2). 6.4. Let < be a linear order on Σ. We define the lexicographic order ≺ on Σ ∗ by u ≺ υ if u is a proper prefix of υ or u = ras and υ = rbt with r, s, t ∈ Σ∗ , a, b ∈ Σ and a < b. Show: (a) ∀w ∈ Σ∗ : u ≺ υ ⇔ wu ≺ wυ (b) If u is not a prefix of υ, then ∀w, z ∈ Σ∗ : u ≺ υ ⇒ uw ≺ υz.
6.5. A word w ∈ Σ∗ is called primitive, if it cannot be written as w = u i with i > 1. Show that the following statements are equivalent: (i) w is primitive. (ii) w is not a proper factor of w2 . (iii) The cyclic permutation υa of w = aυ with a ∈ Σ is primitive. 6.6. A primitive root of a word w is a primitive word u with w = u i for i ∈ ℕ. Show that every nonempty word w has a unique primitive root. 6.7. A word w ∈ Σ∗ is called Lyndon word if w is primitive and w = uυ ≺ υu holds for each decomposition w = uυ such that u ≠ ε ≠ υ. Here, ≺ again is the lexicographic order induced by a given linear order on the letters. Show that the following statements are equivalent: (i) w is a Lyndon word. (ii) For all proper suffixes υ of w, we have w ≺ υ. (iii) w ∈ Σ or there is a decomposition w = uυ of w into Lyndon words u, υ with u ≺ υ. 6.8. (Lyndon factorization [16]) Let < be a linear order on Σ and let ≺ be the corresponding lexicographic order on Σ∗ . Show that every word w ∈ Σ∗ can uniquely be decomposed into w = ℓ1 ⋅ ⋅ ⋅ ℓn , where ℓ1 , . . . , ℓn are Lyndon words with ℓn ≼ ⋅ ⋅ ⋅ ≼ ℓ1 .
Summary
Notions
– letter
– alphabet Σ, words Σ∗
– length of a word
– empty word ε
– concatenation
– free monoid
– prefix
– suffix
– factor
– power
– transposed
– conjugate
– period
– quasi-ordering
– incomparable
– chain, antichain
– decreasing sequence
– nondecreasing sequence
– good/bad sequence
– well-quasi-ordering
– subword
– subword relation ≼
– (ordered) tree
– subtree
– subtree relation ≪
Methods and results
– Universal property of free monoids: each mapping φ : Σ → M can uniquely be extended to a homomorphism φ : Σ∗ → M.
– Lyndon and Schützenberger: x ≠ ε, zx = yz ⇒ x = sr, y = rs, z = (rs)^k r.
– xy = yx ⇒ x, y both are powers of a word r.
– periodicity lemma: word w has periods p, q and |w| ⩾ p + q − gcd(p, q) ⇒ w has period gcd(p, q).
– Ramsey: for each coloring c : $\binom{V}{2}$ → C with |V| = ∞ and |C| < ∞ there is X ⊆ V with |X| = ∞ and |c($\binom{X}{2}$)| = 1.
– Let (X, ⩽) be a quasi-ordering. Then: (X, ⩽) is a well-quasi-ordering ⇔ there are no infinite decreasing sequences and no infinite antichains ⇔ every infinite sequence has an infinite nondecreasing subsequence.
– Subsets of well-quasi-orderings are well-quasi-orderings.
– Coarsenings of well-quasi-orderings are well-quasi-orderings.
– (X, ⩽) and (Y, ⩽) well-quasi-orderings ⇒ (X × Y, ⩽) with componentwise comparison is a well-quasi-ordering.
– Dickson’s lemma: (ℕ^k, ⩽) is a well-quasi-ordering.
– Higman’s lemma: The subword relation is a well-quasi-ordering.
– Kruskal’s tree theorem: The subtree relation is a well-quasi-ordering.
7 Automata Various topics in theoretical computer science deal with the classification of formal languages with respect to their complexity, where different disciplines use different measures of “complexity.” Typically, a formal language is a subset of a finitely generated free monoid Σ∗ . The word “formal” serves to distinguish these languages from natural languages. In many cases, we omit this adjective and call L ⊆ Σ∗ a language. The classification of languages can be done from rather different points of view. A wellknown approach is the Chomsky hierarchy by Avram Noam Chomsky (born 1928), where grammars are used as description mechanisms. The class of regular languages coincides with Type-3 in Chomsky’s hierarchy. This class is particularly robust as it admits various other natural characterizations. In this chapter, we examine regular languages not only within free monoids but also in arbitrary monoids. The two main mechanisms to describe regular languages, regular expressions and deterministic finite state automata, lead to different classes. Therefore, we distinguish between rational and recognizable sets. We will see that rational sets correspond to nondeterministic finite automata and regular expressions, whereas deterministic finite-state automata define recognizable sets. Moreover, there are finitely generated monoids where the class of rational subsets is strictly larger than the class of recognizable sets. Section 7.3 shows that the concepts rational and recognizable coincide over finitely generated free monoids, which allows an unambiguous use of the term regular in this context. Let L and K be the subsets of a monoid M. The product L ⋅ K is defined by L ⋅ K = { uυ | u ∈ L, υ ∈ K }. Moreover, we let L0 = {1} and L n+1 = L ⋅ L n for n ∈ ℕ. Thus, we can define the submonoid generated by L as follows: L∗ = ⋃ L n = { u 1 ⋅ ⋅ ⋅ u n | u i ∈ L, n ⩾ 0 } n∈ℕ
By definition, L∗ is the smallest submonoid of M which contains L. The operation L ↦ L∗ is called the Kleene star after Stephen Cole Kleene (1909–1994) or simply the star operation. The Kleene star depends on the ambient monoid M; it must not be confused with the notation for the free monoid over L. For example, the Kleene star of {a, aa} for a ∈ Σ contains the word aaa = a ⋅ aa = aa ⋅ a = a ⋅ a ⋅ a. However, in the free monoid generated by the alphabet {a, aa}, the words a ⋅ aa, aa ⋅ a and a ⋅ a ⋅ a are all different. Frequently, we use shorthands; for example, we may write LK instead of L ⋅ K or, if u ∈ M, then we identify u with the singleton {u}. It is also common to write L + K to denote the union L ∪ K.
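To make the dependence on the ambient monoid concrete, the following Python sketch computes the finite approximation L^0 ∪ ⋯ ∪ L^n of L∗ for a subset L of a monoid that is given by its operation and its neutral element. The function names set_product, powers_up_to and factorizations are illustrative choices and not notation from the text; words are modeled as Python strings.

```python
def set_product(L, K, op):
    """Return L ⋅ K = { op(u, v) | u in L, v in K }."""
    return {op(u, v) for u in L for v in K}


def powers_up_to(L, n, op, one):
    """Return L^0 ∪ L^1 ∪ ... ∪ L^n, a finite approximation of L*."""
    power = {one}                            # L^0 = {1}
    result = set(power)
    for _ in range(n):
        power = set_product(L, power, op)    # L^(k+1) = L ⋅ L^k
        result |= power
    return result


def factorizations(word, L):
    """List all ways to write the string `word` as a product of elements of L."""
    if word == "":
        return [[]]
    found = []
    for u in sorted(L):
        if u and word.startswith(u):
            found += [[u] + rest for rest in factorizations(word[len(u):], L)]
    return found


# Strings under concatenation: the Kleene star of {a, aa} inside {a}*.
L = {"a", "aa"}
star_approx = powers_up_to(L, 3, lambda u, v: u + v, "")

# The same word aaa arises from three different products of generators.
ways = factorizations("aaa", L)

# The additive monoid Z/6Z: the submonoid generated by {2}.
even_residues = powers_up_to({2}, 3, lambda u, v: (u + v) % 6, 0)
```

In the ambient monoid the three products in `ways` are the same element aaa, whereas in the free monoid over the two-letter alphabet {a, aa} they would be three different words.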
7.1 Recognizable sets
The concept of recognizability is based on an algebraic approach to formal language theory. It is particularly well suited to describe the “complexity” of subclasses of regular languages and to decide membership in such subclasses. Star-free languages are one of the most prominent of these subclasses, and we discuss them in Section 7.4.
Let M and N be monoids. Typically, N is finite. A subset L ⊆ M is recognized by a homomorphism φ : M → N if L = φ⁻¹(φ(L)). In this case, we say that N recognizes the set L ⊆ M. A subset of M is called recognizable if it is recognized by a finite monoid N. The set of all recognizable subsets of M is denoted by REC(M). If φ is given by the images of a generating set Σ of M, then we can test membership of u ∈ L for u = a_1 ⋯ a_n with a_i ∈ Σ by a “table look-up”: we compute φ(u) = φ(a_1) ⋯ φ(a_n) in N and check whether φ(u) belongs to the finite table φ(L) ⊆ N.
Remark 7.1. In a finite monoid M, every subset of M is recognized by the identity mapping M → M. In particular, finite subsets of a finite monoid are recognizable. In a group G the converse holds: if the finite subsets are recognizable, then G is finite. Indeed, let N be finite and φ : G → N be a surjective homomorphism recognizing the singleton {1}. Then N is a group and {1} is the preimage of the neutral element of N. Therefore, the kernel of φ : G → N is trivial. This means that φ is injective. Hence, G is finite because N is finite. ◊
Theorem 7.2. The class of recognizable sets over a monoid M is a Boolean algebra. This means that it is closed under finite union, finite intersection, and complementation. It is also closed under inverse homomorphisms. This means that if ψ : M′ → M is a homomorphism and if L ∈ REC(M), then ψ⁻¹(L) ∈ REC(M′).
Proof: Every finite monoid recognizes ∅ and M. This shows the closure under empty union and empty intersection.
Let L_1, L_2 be recognizable sets and for i = 1, 2, let φ_i : M → N_i be a homomorphism with L_i = φ_i⁻¹(F_i) where F_i = φ_i(L_i). The homomorphism φ_1 : M → N_1 also recognizes the complement M \ L_1. Consider ψ : M → N_1 × N_2 with ψ(u) = (φ_1(u), φ_2(u)). Then ψ⁻¹(F_1 × F_2) = L_1 ∩ L_2 and ψ⁻¹((F_1 × N_2) ∪ (N_1 × F_2)) = L_1 ∪ L_2. Let ψ : M′ → M be a homomorphism and let L ⊆ M be recognized by φ : M → N; then ψ⁻¹(L) ⊆ M′ is recognized by φ ∘ ψ : M′ → N. This shows that the class of recognizable sets is closed under inverse homomorphisms.
We have just seen that recognizable sets are closed under inverse homomorphisms, but, in general, they are not closed under homomorphisms. Consider, for example, the homomorphism ψ : {a, b}∗ → ℤ with a ↦ 1 and b ↦ −1. The singleton {ε} ⊆ {a, b}∗ is recognizable in {a, b}∗, but ψ({ε}) = {0} ⊆ ℤ is not recognizable in ℤ by Remark 7.1. In fact, ψ(K) is not recognizable in the group ℤ for any nonempty subset K ⊆ a∗. The following example yields an even stronger “nonclosure property.” It was given by Shmuel Winograd (born 1936) and shows that, in general, recognizable sets are neither closed under product nor under Kleene star.
Example 7.3. We extend the addition on ℤ to an operation on M = ℤ ∪ {e, a} by e + m = m + e = m for m ∈ M, a + a = 0, and a + k = k + a = k for k ∈ ℤ. This way M = (M, +, e) forms a commutative monoid with e as its neutral element. Consider a homomorphism φ : M → N with φ⁻¹(φ(L)) = L, and let φ̃ : ℤ → N be the restriction of φ to ℤ ⊆ M. With F = φ(L) we have φ̃⁻¹(F) = φ⁻¹(F) ∩ ℤ = L ∩ ℤ. Thus, L ∈ REC(M) implies L ∩ ℤ ∈ REC(ℤ). By Remark 7.1, we obtain {0} ∉ REC(ℤ). Next, we show that {a} ∈ REC(M). Let N = {e, a, n} be a commutative monoid with neutral element e, zero element n and a + a = n. Let φ : M → N be the homomorphism with e ↦ e, a ↦ a and k ↦ n for k ∈ ℤ. Then φ⁻¹(a) = {a} and therefore {a} ∈ REC(M). On the other hand, L_1 = {a} + {a} = {0} and L_2 = {a}∗ = {e, a, 0} are not in REC(M), because otherwise also L_i ∩ ℤ = {0} ∈ REC(ℤ). ◊
A congruence in a monoid M generalizes the notion of “being congruent modulo n” for integers. By definition, it is an equivalence relation ≡ ⊆ M × M such that u ≡ υ implies xuy ≡ xυy for all u, υ, x, y ∈ M. For u ∈ M, let [u] = { υ ∈ M | u ≡ υ } be its congruence class. The set of congruence classes { [u] | u ∈ M } is denoted by M/≡ and it forms a monoid with the multiplication [u] ⋅ [υ] = [uυ]. The operation is well defined because u ≡ u′ and υ ≡ υ′ yield uυ ≡ u′υ ≡ u′υ′. From the corresponding properties of the monoid M, it follows that the operation on M/≡ is associative with [1] as neutral element. (It follows “par transport de structures.”) Moreover, the mapping μ : M → M/≡ with μ(u) = [u] is a surjective homomorphism.
Let L be any subset of M. Then we assign to L the syntactic congruence ≡_L by letting u ≡_L υ if ∀x, y ∈ M : xuy ∈ L ⇔ xυy ∈ L. Hence, u can “syntactically” be replaced by υ in any context without changing the membership with respect to L.
The relation ≡_L is obviously a congruence and the congruence classes form the syntactic monoid of L. It is denoted by Synt(L). The syntactic monoid Synt(L) recognizes the subset L (i.e., μ⁻¹(μ(L)) = L) because for μ(u) = μ(υ) either u, υ ∈ L or u, υ ∉ L. The next theorem implies that Synt(L) is the minimal monoid recognizing L.
Theorem 7.4. Let L ⊆ M and φ : M → N be a homomorphism with L = φ⁻¹(φ(L)). Then φ(u) ↦ [u] defines a surjective homomorphism from the submonoid φ(M) ⊆ N to Synt(L).
Proof: The mapping φ(u) ↦ [u] is well defined: for φ(u) = φ(υ) and x, y ∈ M we have xuy ∈ L ⇔ φ(xuy) ∈ φ(L) ⇔ φ(xυy) ∈ φ(L) ⇔ xυy ∈ L. Thus [u] = [υ]. Furthermore, φ(M) → Synt(L) : φ(u) ↦ [u] is a surjective mapping. The homomorphism property now follows from 1 = φ(1) ↦ [1], φ(uυ) = φ(u)φ(υ) as well as φ(uυ) ↦ [uυ] and φ(u)φ(υ) ↦ [u][υ] for all u, υ ∈ M.
Another concept related to recognizability by monoids is acceptance by deterministic automata. An M-automaton A = (Q, ⋅, q0, F) consists of a set of states Q, an initial state q0 ∈ Q, a set of final states F ⊆ Q, and a transition function ⋅ : Q × M → Q satisfying the following properties for all q ∈ Q and u, υ ∈ M:
(q ⋅ u) ⋅ υ = q ⋅ (uυ) and q ⋅ 1 = q
We define the set L(A) that is accepted by the M-automaton A by L(A) = { u ∈ M | q0 ⋅ u ∈ F }. In mathematical terms, Q is a set on which M acts on the right. We can also think that, by reading an element u ∈ M in a state q, we move from q to state q ⋅ u.
Choosing a generating set Σ ⊆ M, we may represent M-automata as edge-labeled directed graphs. The states are the vertices and, in our graphical representation, an initial state is marked by an incoming arrow, whereas final states are characterized by double circles. We let δ = { (p, u, p ⋅ u) | p ∈ Q, u ∈ Σ } and we draw an edge from p to p ⋅ u which is labeled by u. The edges in this graph are called transitions. If we refer to this representation, we also denote an M-automaton as a tuple A = (Q, Σ, δ, q0, F). We use the following convention in pictures.
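[Figure: drawing conventions. An initial state p is marked by an incoming arrow and a final state by a double circle; a transition (p, u, q) is drawn as an edge from p to q labeled u, and a transition (q, u, q) as a loop at q labeled u.]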
To determine p ⋅ u for arbitrary pairs (p, u) ∈ Q × M, it is sufficient to factorize u = u_1 ⋯ u_n with u_i ∈ Σ and then follow the states p, p ⋅ u_1, …, p ⋅ u_1 ⋯ u_n successively. Since Σ is a generating set, this path from p to p ⋅ u exists. M-automata defined this way are complete and deterministic. The automaton is complete because the transition function Q × Σ → Q, (q, u) ↦ q ⋅ u is totally defined. It is deterministic because, for each q ∈ Q and for each u ∈ M, there is at most one state q ⋅ u. (In Section 7.2, we extend this concept to nondeterministic M-automata.) We say that a state q ∈ Q is reachable if there is some u ∈ M with q = q0 ⋅ u. Without loss of generality, we can assume that all states are reachable, because removing
an unreachable state does not change the accepted set L(A). Let Σ ≠ ∅. Since an M-automaton is complete, we see that δ is finite if and only if Q and Σ are finite. Thus, if M is not finitely generated, then δ is always infinite. If M is finitely generated, then we can choose Σ to be finite and we see that δ is finite if and only if the set of reachable states is finite.
Example 7.5. The set {−1, +1} is a generating set of the monoid ℤ of integers, with addition as the monoid operation. The ℤ-automaton A, defined by A = ({0, 1, 2, 3, 4, 5}, δ, 0, {1, 4}) with δ(q, k) = (q + k) mod 6 has the following graphical representation:
[Figure: the ℤ-automaton A with states 0, …, 5; each state q has a transition labeled +1 to (q + 1) mod 6 and a transition labeled −1 to (q − 1) mod 6. The initial state is 0 and the final states are 1 and 4.]
The set accepted by A is L(A) = { k + 6ℓ | k ∈ {1, 4}, ℓ ∈ ℤ }.
◊
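The ℤ-automaton of Example 7.5 can be simulated directly. The following sketch factorizes an integer over the generating set {−1, +1} and follows the corresponding path; the names step, factorize and accepts are illustrative assumptions and do not appear in the text.

```python
FINAL_STATES = {1, 4}


def step(q, k):
    """Transition function of Example 7.5: q ⋅ k = (q + k) mod 6."""
    return (q + k) % 6


def factorize(m):
    """Write the integer m as a sum of generators from {+1, -1}."""
    return [1 if m > 0 else -1] * abs(m)


def accepts(m, initial=0):
    """Follow the path labeled by a factorization of m and test the end state."""
    q = initial
    for k in factorize(m):
        q = step(q, k)
    return q in FINAL_STATES


# All accepted integers between -6 and 6.
accepted = [m for m in range(-6, 7) if accepts(m)]
```

The list `accepted` contains −5, −2, 1 and 4, in agreement with L(A) = { k + 6ℓ | k ∈ {1, 4}, ℓ ∈ ℤ }.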
To every subset L ⊆ M, we can assign a canonical automaton, which is called the minimal automaton of L. For this purpose, we first define the quotient by an element u ∈ M by L(u) = { υ ∈ M | uυ ∈ L }. These subsets are interpreted as states. L(u) is also commonly denoted by u⁻¹L, which is closer to the notation of a (left) quotient. Note that L = L(1) and this is the natural initial state. Formally, the minimal automaton A_L of L ⊆ M is defined as follows:
A_L = (Q, ⋅, q0, F)
Q = { L(u) | u ∈ M }
q0 = L = L(1)
F = { L(u) | 1 ∈ L(u) }
L(u) ⋅ υ = L(uυ)
Let L(u) = L(u′). Then x ∈ L(uυ) ⇔ uυx ∈ L ⇔ υx ∈ L(u) = L(u′) ⇔ u′υx ∈ L ⇔ x ∈ L(u′υ). This shows that L(uυ) = L(u′υ). As a consequence, L(u) ⋅ υ is well defined; it does not depend on the choice of representatives for the class L(u). Moreover, L(u) ⋅ 1 = L(u) and (L(u) ⋅ υ) ⋅ w = L(uυ) ⋅ w = L(uυw) = L(u) ⋅ υw. The minimal automaton A_L accepts exactly the set L, because L(1) ⋅ u = L(u), and u ∈ L is equivalent to 1 ∈ L(u).
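For a concrete instance of these definitions, take the set L accepted in Example 7.5. Since L is a union of residue classes modulo 6, every quotient L(u) is such a union as well, so the sketch below represents L(u) by its residues modulo 6. This finite representation and the names quotient, states and finals are assumptions made for illustration only.

```python
MODULUS = 6
RESIDUES = frozenset({1, 4})   # L = { k + 6l | k in {1, 4}, l in Z }


def quotient(u):
    """Return L(u) = { v | u + v in L }, represented by its residues modulo 6."""
    return frozenset(v for v in range(MODULUS) if (u + v) % MODULUS in RESIDUES)


# The distinct quotients are the states of the minimal automaton.
states = {quotient(u) for u in range(-12, 13)}

# Initial state: the quotient by the neutral element, which is 0 in Z.
initial = quotient(0)

# L(u) is final if and only if the neutral element 0 belongs to L(u).
finals = {s for s in states if 0 in s}
```

Only three distinct quotients occur, and exactly one of them, L(1), is final.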
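Example 7.6. We consider the set L = { k + 6ℓ | k ∈ {1, 4}, ℓ ∈ ℤ } as in Example 7.5. Adding multiples of 3 does not change membership in L. It therefore suffices to determine the languages L(0), L(1) and L(2): L(0) = 1 + 3ℤ, L(1) = 3ℤ and L(2) = −1 + 3ℤ. This yields the following minimal automaton A_L:
[Figure: minimal automaton A_L with the three states L(0), L(1) and L(2); the transitions labeled +1 lead from L(0) to L(1), from L(1) to L(2) and from L(2) to L(0), and the transitions labeled −1 run in the opposite direction.]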
◊ The following theorem justifies the name “minimal” automaton. Theorem 7.7. Let L ⊆ M be accepted by an M-automaton A = (Q, Σ, ⋅, q0 , F) and AL the minimal automaton of L. Then q0 ⋅ u → L(u) is a surjective mapping from the set of reachable states of A onto the set of states in AL . In particular, the number of states of AL is at most the number of states of A. Proof: From q = q0 ⋅ u, we immediately obtain L(u) = { υ | q ⋅ υ ∈ F }. Therefore, the mapping q0 ⋅ u → L(u) is well defined. It is surjective, since for each u ∈ M the reachable state q0 ⋅ u is a preimage of L(u).
Automata minimization Theorem 7.7 implicitly provides a minimization method for M-automata. For a state p ∈ Q, let L(p) = {υ ∈ M | p ⋅ υ ∈ F} be the language at p. The set accepted by A is exactly the language at the initial state. Moreover, we see that L(p) = L(q) implies L(p⋅u) = L(q⋅u) for all u ∈ M. Let us merge states which define the same language into a single class [p] and let [p] ⋅ u = [p ⋅ u]. Then Theorem 7.7 tells us that the mapping q0 ⋅ u → L(u) induces a bijection between { [p] | p ∈ Q is reachable } and the states { L(u) | u ∈ M } of the minimal automaton. The following marking algorithm by Edward Forrest Moore (1925–2003) [77] is applied to an M-automaton with states Q and δ ⊆ Q × Σ × Q. After preprocessing we know that all states are reachable from the initial state q0 . Now, in a first phase, the algorithm marks pairs of states whose components define different languages as explained below. Then, in a second phase one chooses a representative from each set of unmarked states. The first of the two phases satisfies the invariant that a pair (p, q) ∈ Q × Q is only marked if L(p) ≠ L(q). The first phase works as follows:
(1) Choose any linear order ⩽ on Q (for example, Q = {1, …, n} with the natural order).
(2) Let P = { (p, q) ∈ Q × Q | p < q } be the set of all ordered pairs. (Typically, P is represented as an upper triangular matrix.)
(3) Mark every pair (p, q) ∈ P for which one of the states p, q is final and the other one is not.
(4) Consider a marked pair (p′, q′) ∈ P. Mark all unmarked pairs (p, q) ∈ P where {p′, q′} = {p ⋅ a, q ⋅ a} for some a ∈ Σ. Then, delete the pair (p′, q′) from the set P.
(5) Repeat the previous step until all remaining pairs in P are unmarked. Hence, no pairs are deleted anymore. This concludes the first phase.
Let us see that the invariant L(p) ≠ L(q) holds for all marked pairs (p, q). This is certainly true if, say, p is final and q is not final because then 1 ∈ L(p) \ L(q). Now, consider a marked pair (p′, q′). By induction (and by symmetry in p′ and q′) we may assume that there is some u′ ∈ L(p′) \ L(q′). The algorithm marks all pairs (p, q) ∈ P with {p′, q′} = {p ⋅ a, q ⋅ a} for a ∈ Σ. This implies u = au′ ∈ (L(p) \ L(q)) ∪ (L(q) \ L(p)). Thus, the invariant L(p) ≠ L(q) is satisfied. As a consequence, if there are p, q ∈ Q with p < q and L(p) = L(q), then the pair (p, q) has not been deleted and it is still a member of P.
We also need the converse. Let (p, q) ∈ P with L(p) ≠ L(q) and let u ∈ (L(p) \ L(q)) ∪ (L(q) \ L(p)). If |u| = 0, then (p, q) is marked in step (3) and deleted. In the other case we have u = aυ and υ ∈ (L(p ⋅ a) \ L(q ⋅ a)) ∪ (L(q ⋅ a) \ L(p ⋅ a)). By induction on |u|, either the pair (p ⋅ a, q ⋅ a) or the pair (q ⋅ a, p ⋅ a) is marked. This implies that (p, q) is marked and then deleted in this case, as well.
In the following, we only consider the remaining pairs (p, q) in P. The idea for the second phase is based on the following observation. If we view (p, q) as an undirected edge {p, q}, then, in the resulting undirected graph (Q, P), connected components are cliques.
The languages at all states in the same clique are equal and can be identified with the states of A_L. So in the second phase, we assign to every p ∈ Q a state p̂ which is in the same connected component as p. Say, p̂ is minimal with respect to the chosen linear order of Q. There are various ways to do this algorithmically; one can choose any of them. The set Q̂ = { p̂ | p ∈ Q } is in canonical bijection with the set of states in the minimal automaton A_L. The bijection is induced by mapping p = q0 ⋅ u to the quotient L(u). Since we started with an automaton where every state is reachable, we can write p = q0 ⋅ u for some u and the mapping q0 ⋅ u ↦ L(u) is well defined and injective on Q̂ by construction of the graph. Finally, we let δ̂ = { (p̂, a, q̂) | (p, a, q) ∈ δ }. Then, the automaton (Q̂, Σ, δ̂, q̂0, F ∩ Q̂) is, up to renaming, the same M-automaton as the minimal automaton A_L.
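The two phases of the marking algorithm can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the automaton is complete, deterministic and has only reachable states; the transition function is passed as a dictionary; and instead of deleting processed pairs from P, the code keeps a worklist of marked pairs that still have to be processed. The name moore_minimize and the data layout are illustrative.

```python
from itertools import combinations


def moore_minimize(states, alphabet, delta, q0, finals):
    """Minimize a complete deterministic automaton whose states are all reachable.

    states -- list of states; its order is the chosen linear order on Q
    delta  -- dict mapping (state, letter) to the successor state
    Returns (states, delta, q0, finals) of the merged automaton.
    """
    pairs = list(combinations(states, 2))          # all (p, q) with p < q
    # Step (3): one state is final, the other one is not.
    marked = {(p, q) for p, q in pairs if (p in finals) != (q in finals)}
    worklist = list(marked)
    # Steps (4) and (5): propagate marks backwards along transitions.
    while worklist:
        p1, q1 = worklist.pop()
        for p, q in pairs:
            if (p, q) in marked:
                continue
            if any({delta[p, a], delta[q, a]} == {p1, q1} for a in alphabet):
                marked.add((p, q))
                worklist.append((p, q))
    # Second phase: the representative of p is the smallest state of its class.
    rep = {}
    for i, p in enumerate(states):
        rep[p] = next((q for q in states[:i] if (q, p) not in marked), p)
    new_states = [p for p in states if rep[p] == p]
    new_delta = {(rep[p], a): rep[delta[p, a]] for p in states for a in alphabet}
    return new_states, new_delta, rep[q0], {rep[f] for f in finals}


# The automaton of Example 7.5 over the generating set {+1, -1}.
states = list(range(6))
alphabet = [+1, -1]
delta = {(q, k): (q + k) % 6 for q in states for k in alphabet}
min_states, min_delta, min_q0, min_finals = moore_minimize(
    states, alphabet, delta, 0, {1, 4})
```

For Example 7.5 the pairs (0, 3), (1, 4) and (2, 5) remain unmarked, so the result has the three states 0, 1 and 2, which correspond to L(0), L(1) and L(2) in Example 7.6.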
Syntactic monoids as transition monoids
To an M-automaton A = (Q, ⋅, q0, F), we assign its transition monoid by the canonical interpretation of an element in M as a mapping on states; for u ∈ M we define the mapping δ_u : Q → Q by δ_u(q) = q ⋅ u. The transition monoid of A is the submonoid { δ_u | u ∈ M } of the monoid { f : Q → Q | f is a function } of all mappings from Q to Q. We have δ_υ(δ_u(q)) = q ⋅ uυ = δ_{uυ}(q) for all q ∈ Q. Thus, the operation δ_u ⋅ δ_υ = δ_{uυ} is well defined. The homomorphism that maps u ∈ M to δ_u recognizes L(A) because we have u ∈ L(A) if and only if δ_u(q0) ∈ F.
Theorem 7.8. There is a canonical isomorphism between the syntactic monoid of L ⊆ M and the transition monoid of the minimal automaton A_L.
Proof: Let T be the transition monoid of A_L. By Theorem 7.4 the homomorphism δ_u ↦ [u] is a surjection from T onto the syntactic monoid Synt(L). We only have to show that it is injective. So let [u] = [υ], that is, u ≡_L υ. For a state L(x) = { y | xy ∈ L } in A_L we have δ_u(L(x)) = L(x) ⋅ u = L(xu). For all y ∈ M this yields the equivalence y ∈ δ_u(L(x)) = L(xu) ⇔ xuy ∈ L ⇔ xυy ∈ L ⇔ y ∈ L(xυ) = δ_υ(L(x)) and therefore δ_u(L(x)) = δ_υ(L(x)). This shows δ_u = δ_υ.
The following theorem by John R. Myhill, Sr. (1923–1987) and Anil Nerode (born 1932) states that recognizability and acceptance by M-automata with finitely many states are equivalent [81].
Theorem 7.9 (Myhill, Nerode). Let L ⊆ M. The following statements are equivalent:
(a) L is recognizable, that is, L ∈ REC(M).
(b) L is accepted by an M-automaton with finitely many states.
(c) The minimal automaton A_L has finitely many states.
(d) The syntactic monoid Synt(L) is finite.
Proof: (a) ⇒ (b): Let N be a finite monoid and φ : M → N a homomorphism with φ⁻¹(φ(L)) = L. We define a finite M-automaton A_N = (N, ⋅, 1_N, φ(L)) with n ⋅ u = n φ(u). Now L(A_N) = { u ∈ M | 1_N ⋅ φ(u) ∈ φ(L) } = φ⁻¹(φ(L)) = L, that is, A_N accepts the set L.
(b) ⇒ (c): This follows from Theorem 7.7.
(c) ⇒ (d). If AL has only finitely many states Q L , then there are only finitely many mappings from Q L to Q L . Therefore, the transition monoid of AL is finite and thus, by Theorem 7.8, so is the syntactic monoid Synt(L). (d) ⇒ (a): This follows, since the syntactic monoid Synt(L) recognizes the set L.
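The transition monoid of a finite automaton can be computed by closing the identity mapping under the mappings of the generators. The sketch below assumes that the states are numbered 0, …, n − 1 and that each generator is given as a tuple of successor states; the name transition_monoid and this encoding are illustrative choices.

```python
def transition_monoid(n, generators):
    """Return all mappings δ_u on the states 0, ..., n-1.

    generators -- dict mapping each letter a to the tuple (0⋅a, 1⋅a, ..., (n-1)⋅a)
    A mapping f is stored as the tuple (f(0), ..., f(n-1)).
    """
    identity = tuple(range(n))                    # δ_1 for the neutral element
    elements = {identity}
    frontier = [identity]
    while frontier:
        f = frontier.pop()
        for g in generators.values():
            h = tuple(g[f[q]] for q in range(n))  # δ_{ua}(q) = (q ⋅ u) ⋅ a
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements


# Example 7.5: six states with q ⋅ k = (q + k) mod 6.
big = transition_monoid(6, {+1: tuple((q + 1) % 6 for q in range(6)),
                            -1: tuple((q - 1) % 6 for q in range(6))})

# Example 7.6: the minimal automaton, with L(0), L(1), L(2) encoded as 0, 1, 2.
small = transition_monoid(3, {+1: (1, 2, 0), -1: (2, 0, 1)})
```

The automaton of Example 7.5 has a transition monoid with six elements, whereas the minimal automaton of Example 7.6 yields only three; by Theorem 7.8 the latter is a copy of Synt(L).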
7.2 Rational sets
In this section, we introduce rational sets and extend the concept of M-automata to nondeterministic automata. Rational expressions define rational sets and we show that rational sets and nondeterministic finite M-automata have the same expressive power. The family RAT(M) of rational sets over a monoid M is inductively defined as follows:
– L ∈ RAT(M) for all finite subsets L of M.
– If K, L ∈ RAT(M), then also K ∪ L ∈ RAT(M) and KL ∈ RAT(M).
– If L ∈ RAT(M), then also L∗ ∈ RAT(M).
Here, as defined previously, KL = K ⋅ L = { uυ ∈ M | u ∈ K, υ ∈ L }, and L∗ denotes the submonoid of M generated by L.
The definition of a rational set is given together with its semantics as a subset of M. Sometimes it is useful to distinguish between the subset L and its syntactic description as a rational (or regular) expression. If + is not used for the monoid operation, then the syntactic operator + has the same meaning as ∪, so a finite set L = {u_1, …, u_k} is given by the expression u_1 + ⋯ + u_k, too. If k = 0, then we use ∅ as an expression. We define KL and L∗ to be rational expressions if K and L are rational expressions. Moreover, brackets are used to avoid ambiguity; for example, the expression a(b + c)∗ is another syntax for {a}{b, c}∗. However, for the monoid M = ℕ × ℕ × ℕ with addition + as operation, the expression (0, 2, 4) + ((0, 1, 2) ∪ (1, 0, 2))∗ denotes {(0, 2, 4)}{(0, 1, 2), (1, 0, 2)}∗. So, this is a case where + does not mean “union.” We will make sure that there is no risk of confusion. In M as above, the expression (0, 2, 4) + ((0, 1, 2) ∪ (1, 0, 2))∗ describes the set { (ℓ, m, n) | 2(ℓ + m) = n and m ⩾ 2 }.
Remark 7.10. The class of rational sets is closed under homomorphisms. Indeed, let ψ : M → M′ be a homomorphism and let L, K ⊆ M be any languages. If L is finite, then ψ(L) is finite. Moreover, ψ(L ∪ K) = ψ(L) ∪ ψ(K), ψ(L ⋅ K) = ψ(L) ⋅ ψ(K), and ψ(L∗) = (ψ(L))∗.
◊ A nondeterministic M-automaton A = (Q, δ, I, F) consists of a set of states Q, a transition relation δ ⊆ Q × M × Q, a set of initial states I ⊆ Q, and a set of final states F ⊆ Q. It is called finite if δ is a finite set. The elements of δ are called transitions, and the element u ∈ M is the label of the transition (p, u, q). Roughly speaking, the automaton A decorates the elements of M with states, by assigning to each u ∈ M possible runs. A run on A is a sequence of the form r = q1 u 1 q2 ⋅ ⋅ ⋅ u n q n+1 with (q i , u i , q i+1 ) ∈ δ for all 1 ⩽ i ⩽ n. We say r is a run on u ∈ M, if u = u1 ⋅ ⋅ ⋅ u n ; it is accepting if q1 ∈ I and q n+1 ∈ F. The accepted set of A is L(A) = { u ∈ M | A has an accepting run on u }
Two M-automata are equivalent if they accept the same set. For a given generating set Σ of M, an automaton A is called spelling if δ ⊆ Q × Σ × Q, that is, if all labels are elements of Σ. A spelling nondeterministic M-automaton is called complete if for each state p ∈ Q and each a ∈ Σ, there exists some (p, a, q) ∈ δ. It is called deterministic if the set of initial states I contains exactly one state and if for each state p ∈ Q and each element u ∈ M there is at most one state q ∈ Q such that there is a run p u_1 q_1 ⋯ u_n q_n with u = u_1 ⋯ u_n and q_n = q. For a complete and deterministic spelling automaton, we can define p ⋅ u = q and we obtain an M-automaton in the sense of Section 7.1. Conversely, by choosing a generating set Σ for M, each M-automaton as defined in Section 7.1 can be converted into a complete and deterministic spelling automaton in the sense given earlier. The notion of nondeterministic M-automaton is therefore more general than the term M-automaton as used before.
The following automaton over the monoid ℕ × ℕ with addition as operation accepts the set { (m, n) | m = n − 4, m even }:
[Figure: nondeterministic automaton over ℕ × ℕ; the transition labels appearing in the picture are (0, 0), (0, 4), (2, 2) and (4, 0).]
For the conversion of rational sets to nondeterministic automata, we use the Thompson construction [103] named after Kenneth Lane Thompson (born 1943). He is a well-known computer science pioneer and primarily known for his work as a co-developer of the Unix operating system.
Lemma 7.11. For each set L ∈ RAT(M), there is a nondeterministic finite M-automaton A with L = L(A).
Proof: Let 1 ∈ M be the neutral element. For every rational set L, we construct an automaton A_L with the following invariants:
– L = L(A_L).
– A_L has exactly one initial state i_L. It will be the single state without incoming transitions.
– A_L has exactly one final state f_L ≠ i_L. It will be the single state without outgoing transitions.
The construction is by induction on the number of nested operations union, product and star, which are necessary to obtain L. If L = {u_1, …, u_n} is finite, then A_L is the following automaton:
[Figure: the two states i_L and f_L, joined by n transitions from i_L to f_L with labels u_1, …, u_n.]
If L = K ∪ K′ for K, K′ ∈ RAT(M) and if the automata A_K and A_K′ are constructed already, then A_L is given by
[Figure: A_K and A_K′ side by side, with transitions labeled 1 from the new state i_L to i_K and to i_K′, and transitions labeled 1 from f_K and from f_K′ to the new state f_L.]
This means that we take the disjoint union of A_K and A_K′ including all transitions and add a new initial state i_L and a new final state f_L. If L = KK′, then A_L is defined by
[Figure: A_K followed by A_K′, connected by a transition labeled 1 from f_K to i_K′.]
For L = K∗ we finally define
[Figure: A_K together with the new states i_L and f_L and four additional transitions labeled 1.]
The last construction for L = K∗ is correct because in A_K there is no incoming arc to i_K and there is no outgoing arc leaving f_K. Thus, L = L(A_L) in all cases.
Next, we show that for all automata one can assume that there are no ε-transitions of the form (p, 1, q). Such a transition allows a spontaneous switch from state p to q. Accordingly, the procedure in the following proof is often referred to as elimination of ε-transitions.
Lemma 7.12. Let A be a nondeterministic finite M-automaton and Σ a given generating set of M. Then there exists an equivalent nondeterministic finite M-automaton B with labels from Σ and only one initial state.
Proof: We transform A = (Q, δ, I, F) by a sequence of substitutions into an automaton B = (Q′, δ″, {i}, F′) having the desired properties. Since δ is finite, we can assume Q to be finite, too. Let u ∈ M \ (Σ ∪ {1}) and let (p, u, q) ∈ δ be a transition. Then there exist a_1, …, a_n ∈ Σ with u = a_1 ⋯ a_n and n ⩾ 2. We add new states q_1, …, q_{n−1} to Q and remove the transition (p, u, q) from δ. Then we add the transitions
(p, a_1, q_1), (q_1, a_2, q_2), …, (q_{n−1}, a_n, q)
In this manner, we proceed with every transition whose label is in M \ (Σ ∪ {1}). Next, we add a new state i and transitions { (i, 1, q0) | q0 ∈ I }. From now on, the state i is the only initial state. By introducing more transitions with label 1, we can assume that whenever (p, 1, q) and (q, 1, r) are transitions, then (p, 1, r) is a transition, too. We define the set of final states as
F′ = F ∪ { p ∈ Q′ | (p, 1, q) is a transition with q ∈ F }
This does not change the accepted set. Therefore, we have now obtained an automaton B′ = (Q′, δ′, {i}, F′), which is equivalent to A and whose labels all are in Σ ∪ {1}. We now remove all transitions (p, 1, q) ∈ δ′ and add the transitions { (p, a, q′) | (q, a, q′) ∈ δ′, a ∈ Σ } instead. This yields the automaton B. Each run in B has a canonical counterpart in B′. Now let r = q_0 u_1 q_1 ⋯ u_n q_n be an accepting run of B′ with u_i ∈ Σ ∪ {1}. Since we added all transitive 1-transitions to B′, we can assume that two consecutive labels u_i and u_{i+1} never are both 1. By construction of F′, we can assume that u_n ≠ 1 (except for n = 0). So if u_i = 1, then i < n and u_{i+1} ≠ 1. Therefore, in the case u_i = 1, we can always replace the partial run q_{i−1} u_i q_i u_{i+1} q_{i+1} of r by q_{i−1} u_{i+1} q_{i+1} and obtain an accepting run in B. This shows L(B) = L(B′). Together with L(B′) = L(A), the proof of the lemma is complete.
Generalizing nondeterministic automata, we now allow rational sets as labels. An M-automaton with rational labels is a tuple A = (Q, δ, I, F), where Q, I, F have the same meaning as in the definition of nondeterministic automata and δ is a subset of Q × RAT(M) × Q. A run of A is a sequence of the form q_0 u_1 q_1 ⋯ u_n q_n with q_i ∈ Q, u_i ∈ M such that for all i ∈ {1, …, n} there is a transition (q_{i−1}, L_i, q_i) ∈ δ with u_i ∈ L_i. Accepting runs and the accepted set L(A) of A are defined as above. As before, A is called finite if δ is a finite set.
Of course, this does not mean that a rational set L ∈ RAT(M) which occurs as a label of a transition has to be finite. Next, we want to show that nondeterministic automata accept rational sets. We do so by changing labels of transitions; see the proof of the next lemma. The method is often referred to as state elimination.
Lemma 7.13. Let A be a finite M-automaton with rational labels. Then L(A) ∈ RAT(M).
Proof: Let A = (Q, δ, I, F). Without loss of generality Q = {q_2, …, q_n} with n ⩾ 2. Since the class of rational sets is closed under union, we can assume that for all q_i, q_j ∈ Q at most one transition (q_i, L_{i,j}, q_j) ∈ δ exists. We add two new states q_0 and q_1 to Q and obtain the set of states Q′ = {q_0, …, q_n}. Define
δ′ = δ ∪ { (q_0, {1}, i) | i ∈ I } ∪ { (f, {1}, q_1) | f ∈ F }
The automaton B = (Q′, δ′, {q_0}, {q_1}) is equivalent to A. Moreover, B satisfies the following invariants: the initial state q_0 has no incoming transitions, the final state q_1 has no outgoing transitions and for q_i, q_j ∈ Q′ there is at most one transition (q_i, L_{i,j}, q_j) ∈ δ′. Moreover, L_{i,j} ∈ RAT(M). In the following, we let L_{i,j} = ∅ if there is no transition from q_i to q_j.
We continue the process of “state elimination” as long as n > 1. If this is the case, we construct an automaton C = (Q″, δ″, {q_0}, {q_1}) equivalent to B with Q″ = {q_0, …, q_{n−1}} such that C also satisfies the three invariants of B. First, we remove the state q_n and all its adjacent transitions, then we replace each transition (q_i, L_{i,j}, q_j) ∈ δ′ with i, j < n by (q_i, L′_{i,j}, q_j) where
L′_{i,j} = L_{i,j} ∪ L_{i,n} L_{n,n}∗ L_{n,j}
The procedure is also visualized in the following picture:
[Figure: in automaton B, the state q_n carries a loop labeled L_{n,n} and lies on a path from q_i to q_j with labels L_{i,n} and L_{n,j}, next to the direct transition from q_i to q_j labeled L_{i,j}; in automaton C, a single transition from q_i to q_j labeled L_{i,j} ∪ L_{i,n} L_{n,n}∗ L_{n,j} remains.]
By repeatedly eliminating states, we can assume that in B we eventually have n = 1. By construction, we obtain L(B) = L_{0,1} ∈ RAT(M).
Theorem 7.14. Let M be a monoid and L ⊆ M. Then the following statements are equivalent:
(a) L is rational.
(b) L is accepted by a nondeterministic finite M-automaton.
(c) L is accepted by a nondeterministic finite spelling M-automaton with only one initial state.
(d) L is accepted by a finite M-automaton with rational labels.
Proof: The implication (a) ⇒ (b) is Lemma 7.11, and (b) ⇒ (c) can be seen with Lemma 7.12. For (c) ⇒ (d) one replaces all transitions (p, u, q) by (p, {u}, q). Finally, (d) ⇒ (a) is Lemma 7.13.
Lemma 7.15. Every rational subset L ⊆ M is contained in a finitely generated submonoid of M.
Proof: Since every nondeterministic finite automaton A has only finitely many edges, only a finite set U ⊆ M of labels can occur. In particular, L(A) ⊆ U∗ ⊆ M. Therefore, by Theorem 7.14 every rational subset L of M is contained in a finitely generated submonoid of M.
A consequence of the previous lemma is that M is a rational subset of M if and only if M is finitely generated. In particular, the class of rational subsets is, in general, not closed under inverse homomorphisms, because the homomorphism ψ : M → {1} yields the subset M as preimage of the rational subset {1}. However, rational subsets are closed under homomorphisms, because in nondeterministic automata one can replace every label by its homomorphic image. The next theorem of Dawson James McKnight, Jr. (1928–1981) characterizes finitely generated monoids by the inclusion of REC(M) into RAT(M), see [74].
Theorem 7.16 (McKnight). A monoid M is finitely generated if and only if REC(M) ⊆ RAT(M).
Proof: By Lemma 7.15, the recognizable set M can only be rational if M is finitely generated. The converse follows from Theorem 7.14 because, if M is finitely generated, then we can view a finite M-automaton (in the sense of Section 7.1) as a nondeterministic spelling automaton.
If M is a free monoid over a finite alphabet, then the rational sets inside M are also called regular. In the next section, we deal with regular languages. We will see that the regular languages and the recognizable sets coincide; this is Kleene’s theorem.
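For the free monoid Σ∗, the state elimination from the proof of Lemma 7.13 can be sketched in a few lines of Python. As an assumption made purely for illustration, rational expressions are written as strings in the syntax of Python's re module, the empty set is represented by None and the set {ε} by the empty string; every label passed in should be a single letter or an already bracketed expression. The helper names union, concat, star and eliminate_states are hypothetical.

```python
import re
from itertools import product


def union(x, y):
    """Expression for X ∪ Y; None stands for the empty set."""
    if x is None:
        return y
    if y is None:
        return x
    return f"(?:{x}|{y})"


def concat(*parts):
    """Expression for a product; it is empty as soon as one factor is empty."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)


def star(x):
    """Expression for X*; both ∅* and {ε}* equal {ε}, written as the empty string."""
    return "" if not x else f"(?:{x})*"


def eliminate_states(states, labels, initials, finals):
    """labels maps (p, q) to an expression; returns an expression for L(A) or None."""
    start, end = object(), object()          # the two new states q0 and q1
    lab = dict(labels)
    lab.update({(start, i): "" for i in initials})
    lab.update({(f, end): "" for f in finals})
    remaining = list(states)
    while remaining:
        r = remaining.pop()                  # the state to be removed
        loop = star(lab.get((r, r)))
        rest = remaining + [start, end]
        for p, q in product(rest, repeat=2):
            # New label: old label united with the detour through r.
            detour = concat(lab.get((p, r)), loop, lab.get((r, q)))
            new = union(lab.get((p, q)), detour)
            if new is not None:
                lab[p, q] = new
        lab = {pq: x for pq, x in lab.items() if r not in pq}
    return lab.get((start, end))


# Hypothetical example: words over {a, b} with an even number of letters a.
labels = {(0, 0): "b", (0, 1): "a", (1, 0): "a", (1, 1): "b"}
expr = eliminate_states([0, 1], labels, [0], [0])
even = bool(re.fullmatch(expr, "abba"))
odd = bool(re.fullmatch(expr, "ab"))
```

The resulting expression depends on the order in which states are removed; different orders give different but equivalent expressions.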
7.3 Regular languages We have seen that the classes of recognizable and rational subsets are incomparable, in general. Finite subsets of M are always rational, but they are not recognizable, for example, in infinite groups; see Remark 7.1. On the other hand, the recognizable set ℕ over the monoid of natural numbers ℕ with multiplication as operation is not rational since the monoid (ℕ, ⋅, 1) is not finitely generated. In the following, we will stick to the convention of speaking about languages and words when we mean subsets and elements in finitely generated free monoids. We now show that the properties of being regular, rational and recognizable are equivalent for
finitely generated free monoids. This result is known as Kleene’s theorem on regular languages [54].
Theorem 7.17 (Kleene). Let Σ be a finite alphabet. Then REC(Σ∗) = RAT(Σ∗).
Proof: The inclusion REC(Σ∗) ⊆ RAT(Σ∗) follows from Theorem 7.16. Now, let L ⊆ Σ∗ be rational. From Theorem 7.14, we know that there exists a spelling nondeterministic finite Σ∗-automaton A = (Q, δ, I, F) with n = |Q| states which accepts the language L. Using the so-called subset construction, we design an equivalent deterministic automaton with at most 2^n states. The result of this construction is the subset automaton B. The states of B are subsets of Q, which explains the upper bound 2^n. The construction starts by defining I ⊆ Q as the unique initial state in B. Suppose that P ⊆ Q has already been defined as a state of B. For each letter a ∈ Σ define
P ⋅ a = { q ∈ Q | ∃p ∈ P : (p, a, q) ∈ δ }
We add the set P ⋅ a to the set of states of B if it is not there already, making more and more subsets of Q become states of B. For all words w ∈ Σ∗, the state I ⋅ w of B consists of all states q ∈ Q of A for which a run of A on w, starting in some initial state p ∈ I and ending at q, exists. The proof of this fact is carried out by induction on the length of w. Accordingly, we define the final states of B as the set of subsets which contain at least one final state of A. So, B defines the desired deterministic automaton, which accepts L.
Let Σ be a finite alphabet; then we denote by REG(Σ∗) the family of regular languages. Theorem 7.17 says REG(Σ∗) = REC(Σ∗) = RAT(Σ∗). By general closure properties of recognizable and rational sets, Theorem 7.2 and Remark 7.10, we obtain the following assertion.
Corollary 7.18. Let Σ be a finite alphabet. Then REG(Σ∗) is a Boolean algebra. If φ : Σ∗ → Γ∗ is any homomorphism to a finitely generated free monoid, then we have φ(REG(Σ∗)) ⊆ REG(Γ∗) and φ⁻¹(REG(Γ∗)) ⊆ REG(Σ∗). The corresponding constructions in Corollary 7.18 are effective.
This means, if regular languages L and K are given, for example, by rational expressions, then we can effectively construct a recognizing homomorphism or a rational expression for, say, L \ K. The next theorem states another useful closure property for regular languages, but effectiveness is not always guaranteed.

Theorem 7.19. Let Σ be a finite alphabet, L ∈ REG(Σ∗) be a regular language and K ⊆ Σ∗ be any language. Then the quotient

LK^{-1} = { u ∈ Σ∗ | ∃v ∈ K : uv ∈ L }

is regular. Moreover, if K is regular, and if L and K are specified by accepting finite automata, then we can effectively find an automaton accepting LK^{-1}.
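As an illustration of the effectiveness claims above, the subset construction from the proof of Theorem 7.17 can be sketched in a few lines of Python. This is a minimal sketch, not the authors' formulation: the function names `subset_construction` and `accepts`, the encoding of δ as a set of triples, and the three-state example automaton are all assumptions made for illustration. Only the subsets reachable from I are generated, which is why the result may have far fewer than 2^n states.

```python
def subset_construction(alphabet, delta, initial, final):
    """Determinize a spelling NFA given by a set `delta` of triples (p, a, q).

    Returns (states, transitions, start, accepting) of the subset automaton,
    restricted to the subsets reachable from the initial set.
    """
    start = frozenset(initial)
    states = {start}
    transitions = {}
    todo = [start]
    while todo:
        P = todo.pop()
        for a in alphabet:
            # P . a = { q | there is p in P with (p, a, q) in delta }
            Pa = frozenset(q for (p, b, q) in delta if b == a and p in P)
            transitions[(P, a)] = Pa
            if Pa not in states:
                states.add(Pa)
                todo.append(Pa)
    # A subset is final if it contains at least one final state of the NFA.
    accepting = {P for P in states if P & set(final)}
    return states, transitions, start, accepting


def accepts(dfa, word):
    """Run the deterministic subset automaton on `word`."""
    _, transitions, start, accepting = dfa
    P = start
    for a in word:
        P = transitions[(P, a)]
    return P in accepting


# Hypothetical example: words over {a, b} whose second-to-last letter is a.
DELTA = {(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "a", 2), (1, "b", 2)}
dfa = subset_construction("ab", DELTA, {0}, {2})
```

For this example only four of the eight subsets of {0, 1, 2} are reachable, so the bound 2^n is not attained here.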
Proof: Let us assume that L is given by some finite automaton with state space Q and set of final states F. In order to define an accepting automaton for LK^{-1}, we replace the set of final states by the new set

F′ = { q ∈ Q | ∃v ∈ K : v labels a path from q to some state in F }

It is clear that the new automaton accepts the quotient LK^{-1}. If K is given by a finite automaton, then we can use, for example, the product automaton construction to compute F′.

The effectiveness in Theorem 7.19 is given if K belongs to a class of formal languages where first, the class is closed under intersection with regular sets and second, we can decide emptiness. So, Theorem 7.19 is effective for “context-free” but not for “context-sensitive” languages; see for example [51].

Over a finitely generated free monoid Σ∗ the concepts of deterministic and nondeterministic finite automata are equivalent. Henceforth, we say that a nondeterministic finite automaton (Q, δ, I, F) is spelling if the transition relation δ is a subset of Q × Σ × Q. We also use the notation NFA as an abbreviation for a spelling automaton. We use the abbreviation ε-NFA if δ ⊆ Q × (Σ ∪ {ε}) × Q. The subset construction in the proof of Theorem 7.17 shows that for every ε-NFA with n states there exists an equivalent deterministic one with at most 2^n states. Equivalence means that they both accept the same language L ⊆ Σ∗. Minimization leads to the minimal automaton A_L and its transition monoid is the syntactic monoid Synt(L); it is the smallest recognizing monoid for L. This construction yields |Synt(L)| ⩽ 2^{n⋅2^n} because first, A_L has at most 2^n states (and this bound is tight) and second, for an automaton A with s states, the transition monoid has at most s^s elements (and this bound is tight again). However, if we start with an ε-NFA with n states which accepts a language L, then the proof of the next theorem shows that we can realize a recognizing homomorphism inside the monoid of Boolean n × n-matrices.
This yields |Synt(L)| ⩽ 2^{n^2}. To see the difference, consider n = 3. The number 2^{n⋅2^n} = 2^{24} is greater than 10 million, but 2^{n^2} = 2^9 = 512. Another advantage of using n × n-matrices is that they can be multiplied efficiently.

Theorem 7.20. Let Σ be finite and A an ε-NFA over Σ∗ with n states. Then, the syntactic monoid of L = L(A) has at most 2^{n^2} elements.

Proof: Let A = (Q, δ, I, F) where without loss of generality Q = {1, . . . , n}. By assumption, we have δ ⊆ Q × (Σ ∪ {ε}) × Q. Removing ε-transitions does not increase the set of states. Thus, without loss of generality we may assume δ ⊆ Q × Σ × Q. For each letter a ∈ Σ, we define a Boolean n × n-matrix A^a ∈ 𝔹^{n×n} by letting the entry A^a_{i,j} be 1 if (i, a, j) ∈ δ, and 0 otherwise. The Boolean values 𝔹 = {0, 1} with the disjunction ∨ as addition and the conjunction ∧ as multiplication form a semiring, and, hence, the Boolean n × n-matrices with multiplication form a monoid 𝔹^{n×n}. Thus, the mapping a ↦ φ(a) = A^a defines a homomorphism φ from Σ∗ to 𝔹^{n×n}. If w = a_1 ⋯ a_m is a word and φ(w) = B = A^{a_1} ⋯ A^{a_m}, then B_{i,j} = 1 is true if and only if there is a path
from i to j labeled by w. Moreover, for P = { B ∈ 𝔹^{n×n} | ∃i ∈ I, j ∈ F : B_{i,j} = 1 }, we obtain φ^{-1}(P) = L. Therefore, φ recognizes the language L. By Theorem 7.4, the syntactic monoid is the homomorphic image of a submonoid of 𝔹^{n×n}. Therefore, it contains at most 2^{n^2} elements.

An important consequence of Kleene’s theorem is that the class of regular languages is closed under finite intersection and complementation. As shown by the following example, RAT(M) is not closed under finite intersection, in general. Moreover, if RAT(M) is not closed under finite intersection, then it is not closed under complementation by the rules of De Morgan (Augustus De Morgan, 1806–1871).

Example 7.21 (Nonclosure under finite intersection). Let M = {a, c}∗ × b∗. Algebraically, M is a direct product of two free monoids. Let L = (a, b)∗(c, 1)∗ and K = (a, 1)∗(c, b)∗. The sets L, K ⊆ M are rational. The intersection L ∩ K is the set { (a^n c^n, b^n) | n ∈ ℕ }. Consider the projection π : M → {a, c}∗ which erases the second component. This is a homomorphism and the image of (a^n c^n, b^n) ∈ M is a^n c^n. Thus, π(L ∩ K) = { a^n c^n | n ∈ ℕ }. It is clear that for m ≠ n the words a^m and a^n are not syntactically equivalent. This is true for all four languages L, K, L ∩ K ⊆ M, and π(L ∩ K). Hence, none of them is recognizable. The point is that {a, c}∗ is free. Hence, by Kleene’s theorem, π(L ∩ K) is not rational. Since the rational subsets are closed under homomorphic images, L ∩ K is not rational. ◊
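Returning to Theorem 7.20, the recognizing homomorphism into Boolean matrices can be sketched as follows. This is an illustrative sketch under stated assumptions: states are numbered 0, …, n − 1 rather than 1, …, n, the helper names (`letter_matrix`, `phi`, `in_language`, `image_monoid`) are hypothetical, and the example NFA is made up. The function `image_monoid` enumerates the submonoid φ(Σ∗) of 𝔹^{n×n}, whose size is bounded by 2^{n^2}.

```python
def identity(n):
    """Neutral element of the monoid of Boolean n x n matrices."""
    return tuple(tuple(int(i == j) for j in range(n)) for i in range(n))


def bool_mat_mul(A, B):
    """Matrix product over the Boolean semiring (or as sum, and as product)."""
    n = len(A)
    return tuple(
        tuple(int(any(A[i][k] and B[k][j] for k in range(n))) for j in range(n))
        for i in range(n)
    )


def letter_matrix(n, delta, a):
    """Entry (i, j) is 1 iff (i, a, j) is a transition."""
    return tuple(tuple(int((i, a, j) in delta) for j in range(n)) for i in range(n))


def phi(n, delta, word):
    """Homomorphic image of `word`: product of the letter matrices."""
    B = identity(n)
    for a in word:
        B = bool_mat_mul(B, letter_matrix(n, delta, a))
    return B


def in_language(n, delta, initial, final, word):
    """word is accepted iff phi(word) has a 1 at some (initial, final) entry."""
    B = phi(n, delta, word)
    return any(B[i][j] for i in initial for j in final)


def image_monoid(n, delta, alphabet):
    """All matrices phi(w) for w in alphabet*, found by closure under the generators."""
    gens = [letter_matrix(n, delta, a) for a in alphabet]
    seen = {identity(n)}
    todo = [identity(n)]
    while todo:
        B = todo.pop()
        for G in gens:
            C = bool_mat_mul(B, G)
            if C not in seen:
                seen.add(C)
                todo.append(C)
    return seen


# Hypothetical NFA: words over {a, b} whose second-to-last letter is a.
N_STATES = 3
DELTA = {(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "a", 2), (1, "b", 2)}
IMAGE = image_monoid(N_STATES, DELTA, "ab")
```

Since the syntactic monoid is a homomorphic image of `IMAGE`, the size of `IMAGE` is an upper bound for |Synt(L)| in this example.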
7.4 Star-free languages

A fundamental result by Schützenberger relates star-free languages (defined in the following) to finite aperiodic monoids [92]. Here, a monoid N is called aperiodic if there exists n ∈ ℕ such that u^{n+1} = u^n for all u ∈ N. Aperiodic monoids are orthogonal to groups. A nontrivial group is never aperiodic; and if in an aperiodic monoid M an element u has a right-inverse v with uv = 1, then u = v = 1. Indeed, if uv = 1, then u^n v^n = 1 for all n ∈ ℕ. Now, if in addition u^{n+1} = u^n for some n, then u = u u^n v^n = u^{n+1} v^n = u^n v^n = 1. (The situation for v is symmetric.) Thus, aperiodic monoids are also called group-free.
Local divisors

We begin with a useful construction on monoids, which simplifies the proof of Theorem 7.22 and is also the main tool in Section 7.5. Local divisors also show up in Section 7.6. For further applications, we refer to [23–30]. Let M be a monoid and c ∈ M. We define a new operation ∘ on its subset cM ∩ Mc by

xc ∘ cy = xcy
Assume that xc = x′c and cy = cy′. Then xc ∘ cy = xcy = x′cy = x′cy′ = x′c ∘ cy′. This shows that ∘ is well defined. Let cx = x′c and cy be elements of cM ∩ Mc. Then cx ∘ cy = x′c ∘ cy = x′cy = cxy. Computation rules of M carry over to cM ∩ Mc. In particular, the operation ∘ on cM ∩ Mc is associative, because the operation of M is associative. Thus, M_c = (cM ∩ Mc, ∘, c) is a monoid with c as its neutral element; it is called the local divisor of M at c. If x^n = x^{n+1} for all x ∈ M, then a straightforward computation shows that we have y^n = y^{n+1} for all y in the local divisor M_c, too. Indeed, let cx = yc ∈ cM ∩ Mc and n ∈ ℕ. Taking the n-fold power of xc in the monoid M_c, we obtain x^n c. Thus, any local divisor of an aperiodic monoid is aperiodic.

Actually, M_c is the homomorphic image of the submonoid { x ∈ M | xc ∈ cM } under the mapping φ(x) = xc. The mapping φ is a homomorphism since φ(x)φ(y) = (xc) ∘ (yc) = xyc = φ(xy) and φ(1) = c, which is the neutral element in M_c. Moreover, since each y ∈ M_c = cM ∩ Mc can be written as y = xc, we obtain φ(x) = y and φ is surjective. Hence, M_c is indeed a “divisor” in the sense of Section 7.5. Moreover, if c is idempotent (i.e., c^2 = c), then M_c is isomorphic to the so-called local submonoid cMc of M. These two properties explain the name local divisor.

If M is aperiodic and 1 ≠ c ∈ M, then we have seen cm ≠ 1 for all m ∈ M. Therefore, 1 ∉ cM ∩ Mc and hence |cM ∩ Mc| < |M|. This is the crucial property that makes local divisors attractive for induction.
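To make the construction concrete, the following Python sketch computes a local divisor of a small aperiodic monoid. It is only an illustration under stated assumptions: the monoid is given as a set of transformations (tuples) generated by the letters of a hand-built three-state automaton for (ab)∗ with a sink state, and the names `compose`, `generate`, and `local_divisor` are hypothetical. The product xy in the monoid means "apply x first, then y".

```python
def compose(s, t):
    """Product st of two transformations on {0, ..., n-1}: apply s, then t."""
    return tuple(t[s[q]] for q in range(len(s)))


def generate(n, generators):
    """Monoid of transformations generated by `generators` (closure from the identity)."""
    identity = tuple(range(n))
    seen = {identity}
    todo = [identity]
    while todo:
        s = todo.pop()
        for g in generators:
            sg = compose(s, g)
            if sg not in seen:
                seen.add(sg)
                todo.append(sg)
    return seen


def local_divisor(monoid, c):
    """Return the carrier cM ∩ Mc and the operation xc ∘ cy = xcy."""
    cM = {compose(c, m) for m in monoid}
    Mc = {compose(m, c) for m in monoid}
    carrier = cM & Mc

    def op(u, v):
        # Write u = xc for some x in M; since v = cy, the product xv equals xcy.
        # The text shows that the result does not depend on the choice of x.
        x = next(m for m in monoid if compose(m, c) == u)
        return compose(x, v)

    return carrier, op


# Letters of an assumed automaton for (ab)*: state 0 is initial and final, 2 is a sink.
A = (1, 2, 2)
B = (2, 0, 2)
M = generate(3, [A, B])
CARRIER, OP = local_divisor(M, A)
```

In this example M has six elements, while the local divisor at the element induced by the letter a has only two, with that element as neutral element.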
SF = AP

As an application of the algebraic approach for recognizable languages, we present a decidable characterization of star-free languages. Decidability means that for any standard way to represent a regular language L, say, for example, by an accepting NFA, there is an algorithm which returns “yes” if and only if L is star free. Moreover, in the positive case it gives a star-free expression for L. The family SF(M) of star-free sets over M is inductively defined as follows:
– If L is a finite subset of M, then L ∈ SF(M).
– If K, L ∈ SF(M), then also K ∪ L ∈ SF(M) and KL ∈ SF(M).
– If L ∈ SF(M), then also M \ L ∈ SF(M).

Since SF(M) is closed under finite union and complementation, SF(M) is closed under finite intersection, too. In contrast to a regular expression, a star-free expression may use complementation, but no Kleene star. This explains the name “star-free.” Consider the singletons {aa} and {ab} in {a, b}∗. Singletons are star free by definition. In Example 7.25, we will see that {aa}∗ is not a star-free language, whereas {ab}∗ is star free (if a ≠ b). Thus, a star in the syntactic description does not tell us whether the corresponding language is star free or not. The example {aa}∗ also shows that the class of star-free languages is not closed under Kleene star. Over a finite alphabet, star-free languages are regular. However, over arbitrary monoids, the class of rational sets and the class of star-free sets are incomparable, in general.
The next theorem by Schützenberger is often considered the second most important fundamental theorem on regular languages (after Kleene’s theorem).

Theorem 7.22 (Schützenberger). Let L ⊆ Σ∗ be a language over a finite alphabet Σ. Then the following assertions are equivalent:
(a) L is star free.
(b) The syntactic monoid Synt(L) is finite and aperiodic.
(c) L is recognized by a finite aperiodic monoid.

The remainder of this section is devoted to the proof of this theorem. For a finite alphabet Σ, the class of regular star-free languages SF(Σ∗) can alternatively be defined as follows: first, ∅ ∈ SF(Σ∗); second, if a ∈ Σ, then {a} ∈ SF(Σ∗); and third, for languages K, L ∈ SF(Σ∗), the languages K ∪ L, Σ∗ \ L, and KL are elements of SF(Σ∗). Let Γ ⊆ Σ. Then Γ∗ is star free because

Γ∗ = Σ∗ \ ⋃_{b ∈ Σ\Γ} (Σ∗ \ ∅) b (Σ∗ \ ∅)

Moreover, SF(Γ∗) ⊆ SF(Σ∗) because, given a star-free expression for L ∈ SF(Γ∗), we can substitute every complementation Γ∗ \ K by (Σ∗ \ K) ∩ Γ∗. We can replace an intersection by using complementation, union, and complementation. This yields a star-free expression for L over the alphabet Σ.

Lemma 7.23. The syntactic monoid of a star-free language is aperiodic.

Proof: We show that for every star-free language L ∈ SF(Σ∗), there exists a number n(L) ∈ ℕ such that for all x, y, u ∈ Σ∗:

x u^{n(L)} y ∈ L ⇔ x u^{n(L)+1} y ∈ L

We set n(∅) = 1, and for a ∈ Σ, we define n({a}) = 2. Now let K, L ∈ SF(Σ∗) be languages for which n(L) and n(K) are suitably defined already. Then, we let n(Σ∗ \ L) = n(L), as well as n(K ∪ L) = max(n(K), n(L)) and n(KL) = n(K) + n(L) + 1. We only prove the correctness of n(KL); the corresponding proofs for the other constructions are similar. Let w = x u^{n(K)+n(L)+2} y ∈ KL. We assume that the splitting of w = w_1 w_2 into w_1 ∈ K and w_2 ∈ L is done within one of the u-factors. Splittings inside x or y are easier. Let u = u′u″, w_1 = x u^{n_1} u′ ∈ K and w_2 = u″ u^{n_2} y ∈ L with n_1 + 1 + n_2 = n(K) + n(L) + 2. We have either n_1 ⩾ n(K) + 1 or n_2 ⩾ n(L) + 1. Without loss of generality, let n_1 ⩾ n(K) + 1. By our assumptions, x u^{n_1−1} u′ ∈ K and hence x u^{n(K)+n(L)+1} y ∈ KL. Analogously, one can show that x u^{n(K)+n(L)+1} y ∈ KL implies x u^{n(K)+n(L)+2} y ∈ KL.

The difficult part in the proof of Schützenberger’s theorem is handled next. We present the proof from [62].

Lemma 7.24. If L ⊆ Σ∗ is recognized by a finite aperiodic monoid M, then L is star free.

Proof: Let φ : Σ∗ → M be a homomorphism recognizing L. We have L = ⋃_{p ∈ φ(L)} φ^{-1}(p). Therefore, it suffices to show for all p ∈ M that φ^{-1}(p) is star free. Let Σ_1 = { a ∈ Σ |
φ(a) = 1 }. In aperiodic monoids, uvw = 1 implies uv = 1 and this implies v = 1. Hence, φ^{-1}(1) = Σ_1∗ ∈ SF(Σ∗). Next, let p ≠ 1. For c ∈ Σ with φ(c) ≠ 1, we define Σ_c = Σ \ {c}. Restricting φ to the domain Σ_c∗ now yields the homomorphism φ_c : Σ_c∗ → M with w ↦ φ(w). Since each word w with φ(w) ≠ 1 contains a letter c with φ(c) ≠ 1, we obtain

φ^{-1}(p) = ⋃_{c ∈ Σ, φ(c) ≠ 1} ⋃_{p = p_1 p_2 p_3} φ_c^{-1}(p_1) ⋅ (φ^{-1}(p_2) ∩ cΣ∗ ∩ Σ∗c) ⋅ φ_c^{-1}(p_3)

The language φ_c^{-1}(p_1) contains the prefixes before the first c and φ_c^{-1}(p_3) corresponds to the suffixes after the last c. By induction on the size of the alphabet, we obtain that φ_c^{-1}(p_1) and φ_c^{-1}(p_3) are elements of SF(Σ_c∗) ⊆ SF(Σ∗). It remains to show that for p ∈ φ(c)M ∩ Mφ(c) the language φ^{-1}(p) ∩ cΣ∗ ∩ Σ∗c is star free. Let T = { φ(w) | w ∈ Σ_c∗ }. We define a substitution
σ : {ε} ∪ cΣ∗ → T∗ with cv_1 ⋯ cv_k ↦ φ_c(v_1) ⋯ φ_c(v_k)

where v_i ∈ Σ_c∗. Here, T∗ denotes the free monoid over the set T rather than the Kleene star. The mapping σ replaces maximal c-free factors v by φ_c(v). Let ψ : T∗ → M_{φ(c)} be the homomorphism into the local divisor of M at φ(c) that results from T → M_{φ(c)} with φ_c(w) ↦ φ(cwc) by the universal property of free monoids (Theorem 6.1). For w = cv_1 cv_2 ⋯ cv_k ∈ {ε} ∪ cΣ∗ with v_i ∈ Σ_c∗ and k ⩾ 0, we have

ψσ(w) = ψ(φ_c(v_1) φ_c(v_2) ⋯ φ_c(v_k))
      = φ(cv_1 c) ∘ φ(cv_2 c) ∘ ⋯ ∘ φ(cv_k c)
      = φ(cv_1 cv_2 ⋯ cv_k c) = φ(wc)

Thus, wc ∈ φ^{-1}(p) ∩ cΣ∗ ∩ Σ∗c ⇔ w ∈ σ^{-1}ψ^{-1}(p) and hence φ^{-1}(p) ∩ cΣ∗ ∩ Σ∗c = σ^{-1}ψ^{-1}(p) ⋅ c. Therefore, it is sufficient to show that σ^{-1}ψ^{-1}(p) ∈ SF(Σ∗). Recall that we have φ(c) ≠ 1, hence |M_{φ(c)}| < |M| since M is aperiodic. By induction on the size of the recognizing monoid, we obtain ψ^{-1}(p) ∈ SF(T∗), and by induction on the size of the alphabet φ_c^{-1}(t) ∈ SF(Σ∗) for all t ∈ T. By structural induction, we now show that σ^{-1}(K) ∈ SF(Σ∗) for every language K ∈ SF(T∗):

σ^{-1}(t) = c ⋅ φ_c^{-1}(t)   for t ∈ T
σ^{-1}(T∗ \ K) = (Σ∗ \ σ^{-1}(K)) ∩ ({ε} ∪ cΣ∗)   for K ∈ SF(T∗)
σ^{-1}(K_1 ∪ K_2) = σ^{-1}(K_1) ∪ σ^{-1}(K_2)   for K_1, K_2 ∈ SF(T∗)
σ^{-1}(K_1 ⋅ K_2) = σ^{-1}(K_1) ⋅ σ^{-1}(K_2)   for K_1, K_2 ∈ SF(T∗)

The last relation holds because of σ(cw_1 cw_2) = σ(cw_1)σ(cw_2).
Proof of Theorem 7.22: If L is recognized by a finite aperiodic monoid, then, by Lemma 7.24, L is star free. If L is star free, then by Lemma 7.23 Synt(L) is aperiodic. With Kleene’s theorem (Theorem 7.17), we obtain SF(Σ∗) ⊆ RAT(Σ∗) and that all star-free languages are recognizable. By Theorem 7.9, Synt(L) is finite. Since Synt(L) recognizes the language L, there is some finite aperiodic monoid recognizing L.

Schützenberger’s theorem follows. Given a regular expression for L, we can effectively compute the corresponding syntactic monoid. Given a finite monoid M, we can check whether x^{|M|+1} = x^{|M|} for all x ∈ M. This last condition expresses that a finite monoid is aperiodic. Thus, whether a given regular language is star free can be checked using this method.

Example 7.25. Let Σ = {a, b}. The language L = (ab)∗ ∈ RAT(Σ∗) is star free, because

(ab)∗ = {ε} ∪ (aΣ∗ ∩ Σ∗b ∩ (Σ∗ \ Σ∗aaΣ∗) ∩ (Σ∗ \ Σ∗bbΣ∗)) ∈ SF(Σ∗)

The syntactic monoid of L has six elements {1, a, b, ab, ba, 0}. It is aperiodic because x^3 = x^2 for all elements x. In contrast, the syntactic monoid of the language (aa)∗ ∈ RAT({a}∗) is the group ℤ/2ℤ, so (aa)∗ is not star free. ◊

Star-free languages play a central role in formal verification. In this area, specifications are formulated using some simple logical formalism; and many popular logics turn out to have exactly the same expressive power as star-free languages. For a survey, refer to [24].
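The decision procedure just described can be tried out on the two languages of Example 7.25. The sketch below is illustrative only: it assumes that the minimal complete deterministic automata are already given (here they are written down by hand, with a sink state for (ab)∗), and it uses the fact stated earlier that the transition monoid of the minimal automaton is the syntactic monoid. The names `transition_monoid`, `power`, and `is_aperiodic` are hypothetical.

```python
def compose(s, t):
    """Product st of two transformations on {0, ..., n-1}: apply s, then t."""
    return tuple(t[s[q]] for q in range(len(s)))


def transition_monoid(n, letters):
    """Transition monoid of a complete DFA; `letters` maps each letter to its transformation."""
    identity = tuple(range(n))
    seen = {identity}
    todo = [identity]
    while todo:
        s = todo.pop()
        for t in letters.values():
            st = compose(s, t)
            if st not in seen:
                seen.add(st)
                todo.append(st)
    return seen


def power(x, k):
    """k-fold power of the transformation x."""
    result = tuple(range(len(x)))
    for _ in range(k):
        result = compose(result, x)
    return result


def is_aperiodic(monoid):
    """Check x^(|M|+1) == x^|M| for all x in the finite monoid M."""
    m = len(monoid)
    return all(power(x, m + 1) == power(x, m) for x in monoid)


# Assumed minimal DFA for (ab)*: state 0 initial and final, state 2 is a sink.
SYNT_AB = transition_monoid(3, {"a": (1, 2, 2), "b": (2, 0, 2)})
# Assumed minimal DFA for (aa)* over the alphabet {a}: two states swapped by a.
SYNT_AA = transition_monoid(2, {"a": (1, 0)})
```

The first monoid has six elements and passes the test, the second is the two-element group and fails it, in line with Example 7.25.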
7.5 Krohn–Rhodes theorem

Krohn and Rhodes studied the structure of finite monoids via decompositions. In 1965, they showed that every finite monoid can be represented as a wreath product of flip–flops and finite simple groups [59]; here, a flip–flop is a three element monoid. A representation as a wreath product of flip–flops and finite simple groups is called a Krohn–Rhodes decomposition. The most common presentation of the Krohn–Rhodes theorem uses so-called transformation monoids; see [29, 38, 60, 87, 101]. The concept of transformation monoids is a mixture of automata and monoids. Gérard Lallement (1935–2006) published a proof of the Krohn–Rhodes theorem in 1971, which only uses monoids in the traditional sense [63]; unfortunately, his proof had a flaw, and the corrected version relies on a more technical notion of wreath products [64]. The proof presented here only uses monoids in the traditional sense, too, but it is inspired by a similar proof using transformation monoids [29]. The main technique is local divisors as introduced in Section 7.4. As a byproduct of this approach, we do not use any semigroups which are not monoids.

We start this section with an introduction to some useful terminology, including divisors and wreath products. In order to prove that a monoid T divides some given
wreath product, it is often more illustrative to apply the so-called wreath product principle. Then, we consider flip–flops and their generalizations: monoids where every nonneutral element is a right zero. We present a decomposition technique for arbitrary finite groups into simple groups. A group G is simple if {1} and G are its only normal subgroups. The induction parameter in our proof of the Krohn–Rhodes theorem is the number of elements x satisfying {x} ≠ Mx ≠ M, that is, the elements which are neither a right zero nor invertible. We decompose a monoid M into a local divisor M_c and an augmented submonoid N^#. The latter is a monoid extension of N, where, for every element x ∈ N, there exists a unique right zero \overline{x} ∈ N^# such that \overline{x} y = \overline{xy} for all x, y ∈ N. Moreover, the right zeros of N are also right zeros in N^#.

We need the following notions to state the Krohn–Rhodes theorem; further details are given in the following. A monoid M is a divisor of a monoid N if M is a homomorphic image of a submonoid of N. Let M^N be the set of all mappings f : N → M. The wreath product M ≀ N is the set M^N × N with the composition (f, a)(g, b) = (h, ab) where the mapping h : N → M is defined by h(x) = f(x)g(xa) for x ∈ N. The flip–flop monoid U_2 = {1, a, b} with neutral element 1 is defined by xa = a and xb = b for all x ∈ U_2.

Theorem 7.26 (Krohn, Rhodes). Let M be a finite monoid. Let C_M be the closure of U_2 and all simple group divisors of M under wreath products. Then M divides a monoid in C_M.

A simple group divisor is a divisor which is a simple group. Another way of stating the Krohn–Rhodes theorem is by saying that every finite monoid M divides a sequence of wreath products where each factor is either the flip–flop U_2 or a simple group divisor of M; such a sequence is called a Krohn–Rhodes decomposition of M. We note that the wreath product ≀ on monoids is not associative, that is, in general (M ≀ N) ≀ T is not isomorphic to M ≀ (N ≀ T).
The corresponding operation on transformation monoids is associative. Throughout this section, for a finite monoid M, we let C_M be the smallest class of finite monoids containing U_2 and the simple group divisors of M such that the wreath product of any two monoids in C_M is again in C_M. After some preparatory work (on divisors and wreath products), we prove the Krohn–Rhodes theorem by showing that it holds for bigger and bigger classes of monoids until, at the end of this section, we are able to give the proof for arbitrary monoids.
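The definition of the wreath product given above, (f, a)(g, b) = (h, ab) with h(x) = f(x)g(xa), can be checked mechanically on small monoids. The following Python sketch is only an illustration: the class `Monoid`, the function `wreath`, and the encoding of a mapping f : N → M as a tuple indexed by the elements of N are assumptions made here, not notation from the text. The example builds U_2 ≀ ℤ/2ℤ.

```python
from itertools import product


class Monoid:
    """A finite monoid given by its element list, multiplication, and neutral element."""

    def __init__(self, elements, mul, one):
        self.elements = list(elements)
        self.mul = mul
        self.one = one


def wreath(M, N):
    """Wreath product M ≀ N on the set M^N × N."""
    index = {x: i for i, x in enumerate(N.elements)}

    def mul(p, q):
        (f, a), (g, b) = p, q
        # h(x) = f(x) * g(xa) for every x in N
        h = tuple(M.mul(f[index[x]], g[index[N.mul(x, a)]]) for x in N.elements)
        return (h, N.mul(a, b))

    elements = [
        (f, a)
        for f in product(M.elements, repeat=len(N.elements))
        for a in N.elements
    ]
    one = (tuple(M.one for _ in N.elements), N.one)
    return Monoid(elements, mul, one)


# Flip-flop U_2 = {1, a, b}: xa = a and xb = b for all x, and 1 is neutral.
U2 = Monoid(["1", "a", "b"], lambda x, y: x if y == "1" else y, "1")
# The cyclic group of order two, written additively.
Z2 = Monoid([0, 1], lambda x, y: (x + y) % 2, 0)
W = wreath(U2, Z2)
```

The resulting monoid has |U_2|^{|ℤ/2ℤ|} ⋅ |ℤ/2ℤ| = 18 elements; associativity and neutrality of (ι, 1) can be verified by brute force over all triples.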
Divisors and surjective partial homomorphisms

Let M and N be monoids. If a submonoid U of N and a surjective homomorphism φ : U → M exist, then M is a divisor of N. In this case, we write M ≺ N. The intuitive idea is that N is more powerful than M (for instance, in the sense that every language recognizable by M is also recognizable by N). A group divisor is a divisor which
forms a group. In many cases, it is more convenient to work with surjective partial homomorphisms to prove M ≺ N. A partial mapping ψ : N → M is a partial homomorphism if ψ(nn′) is defined whenever both ψ(n) and ψ(n′) are defined, and then we have ψ(nn′) = ψ(n)ψ(n′). As usual, the partial homomorphism ψ is surjective if for every m ∈ M there exists n ∈ N with ψ(n) = m. As shown in the following lemma, a surjective partial homomorphism is a mere reformulation of division.

Lemma 7.27. We have M ≺ N if and only if there exists a surjective partial homomorphism ψ : N → M.

Proof: First, let M ≺ N. Then there is a submonoid U of N and a surjective homomorphism ψ : U → M. The mapping ψ is a surjective partial homomorphism from N to M. For the other direction, let ψ : N → M be a surjective partial homomorphism. The domain of ψ is a subsemigroup U of N. If U contains the neutral element 1 of N, then ψ(n) = ψ(n ⋅ 1) = ψ(n)ψ(1) for all n ∈ U. This shows that ψ(1) is neutral for all elements ψ(n). Since ψ is surjective, we obtain ψ(1) = 1. Hence, ψ : U → M is a surjective homomorphism. If 1 ∉ U, let φ : U ∪ {1} → M be defined by φ(1) = 1 and φ(n) = ψ(n) for n ∈ U. Then U ∪ {1} is a submonoid of N and φ is a surjective homomorphism.

We freely apply Lemma 7.27 without further reference.

Lemma 7.28. If M ≺ N and N ≺ T, then M ≺ T.

Proof: Let ψ_1 : N → M and ψ_2 : T → N be surjective partial homomorphisms. Then their composition ψ_1 ∘ ψ_2 : T → M is a surjective partial homomorphism.
Wreath products

For a monoid M and an arbitrary set X, let M^X = { f : X → M | f is a mapping }. For f, g ∈ M^X, we define f ∗ g by (f ∗ g)(x) = f(x) g(x). The set M^X can be interpreted as a product of |X| copies of M, and ∗ is the pointwise operation on the components. In particular, M^X forms a monoid with ∗ as the operation. Let M and N be two monoids. For g : N → M and a ∈ N the mapping ag : N → M is defined by (ag)(x) = g(xa). Note that a(f ∗ g) = af ∗ ag. We can now define an operation on M^N × N as follows. For pairs (f, a) and (g, b) in M^N × N, we set

(f, a)(g, b) = (f ∗ ag, ab)

For the set M^N × N with this operation, we write M ≀ N and call it the wreath product of M and N. Next, we show that M ≀ N forms a monoid. The neutral element is (ι, 1) with ι(x) = 1 for all x ∈ N. Consider (f, a), (g, b), (h, c) ∈
M^N × N. Then the following computation shows that the operation in M ≀ N is associative:

((f, a)(g, b))(h, c) = (f ∗ ag, ab)(h, c)
                     = (f ∗ ag ∗ abh, abc)
                     = (f ∗ a(g ∗ bh), abc)
                     = (f, a)(g ∗ bh, bc) = (f, a)((g, b)(h, c))

Next, we show that the direct product is a divisor of the wreath product.

Lemma 7.29. Let M and N be two monoids. Then M × N ≺ M ≀ N.

Proof: Let U be the submonoid of M^N × N consisting of all the elements (f, a) such that f is constant, that is, there exists m ∈ M with f(x) = m for all x ∈ N. A surjective homomorphism φ : U → M × N is given by φ(f, a) = (f(1), a). Note that f(1) = f(x) for all x ∈ N since f is constant. This yields φ((f, a)(g, b)) = φ(f ∗ ag, ab) = (f(1) g(a), ab) = (f(1) g(1), ab) = (f(1), a)(g(1), b) = φ(f, a) φ(g, b).

As the following lemma shows, the division relation between monoids carries over to wreath products.

Lemma 7.30. If M ≺ S and N ≺ T, then M ≀ N ≺ S ≀ T.

Proof: Let σ : S → M and τ : T → N be surjective partial homomorphisms. Let

G = { g : T → S | σ(g(t_1)) = σ(g(t_2)) if τ(t_1) = τ(t_2) is defined }

be the functions in S^T which are compatible with σ and τ. For (g, t) ∈ G × T, we define ψ(g, t) = (σ(g), τ(t)) ∈ M^N × N by (σ(g))(n) = σ(g(τ^{-1}(n))). By choice of G, the term σ(g(τ^{-1}(n))) defines a unique element in M. For every mapping f : N → M, we can choose an extension f_e ∈ G such that f_e(t) = s for some s ∈ σ^{-1}(f(τ(t))). If τ(t) is undefined, then f_e(t) can be chosen arbitrarily; for instance, we could set f_e(t) = 1 whenever τ(t) is undefined. If (f, n) ∈ M^N × N and τ(t) = n, then ψ(f_e, t) = (f, n). This shows that ψ is surjective. It remains to show that ψ is a partial homomorphism. Suppose that ψ(g, t) and ψ(ĝ, t̂) are defined. Then

ψ(g, t) ψ(ĝ, t̂) = (σ(g), τ(t)) (σ(ĝ), τ(t̂))
                 = (σ(g) ∗ τ(t)σ(ĝ), τ(t t̂))
                 = (σ(g) ∗ σ(tĝ), τ(t t̂))
                 = (σ(g ∗ tĝ), τ(t t̂))
                 = ψ(g ∗ tĝ, t t̂) = ψ((g, t)(ĝ, t̂))

Note that if τ(t_1) = τ(t_2) is defined, then τ(t_1 t) = τ(t_2 t) is defined and thus σ(tĝ(t_1)) = σ(ĝ(t_1 t)) = σ(ĝ(t_2 t)) = σ(tĝ(t_2)). Therefore, we indeed have tĝ ∈ G and, hence, g ∗ tĝ ∈ G.

The combination of Lemmas 7.28, 7.29 and 7.30 tells us that division ≺ is transitive and that it is compatible with wreath products and direct products. For instance, if both M and N divide some monoid in C_T, then so do M ≀ N and M × N. Remember that C_T is the
closure of U_2 and all simple group divisors of the monoid T under wreath products. We will apply these properties of division without further reference.

If ν : A∗ → N is a homomorphism to a monoid N, then we can consider the alphabet A_ν = N × A and the length-preserving mapping σ_ν : A∗ → A_ν∗ defined by

σ_ν(a_1 ⋯ a_k) = (1, a_1)(ν(a_1), a_2) ⋯ (ν(a_1 ⋯ a_{k−1}), a_k)

for a_i ∈ A. One can think of σ_ν as decorating the ith letter a_i with the element ν(a_1 ⋯ a_{i−1}) ∈ N. The homomorphism ν defines an automaton with states N whose transition function is given by n ⋅ a = n ν(a). The initial state is the neutral element 1. This allows us to interpret the letter (ν(a_1 ⋯ a_{i−1}), a_i) as a pair consisting of the previous state ν(a_1 ⋯ a_{i−1}) and the current letter a_i. Using these notions, we can introduce the wreath product principle as a method for showing division of wreath products. If M uses the decorated word σ_ν(u) as input, then this behavior can also be realized by the wreath product M ≀ N on u.

[Figure: cascade diagram with input u ∈ A∗, components N and M, and outputs ν(u) and μ(σ_ν(u)).]
Fig. 7.1. Visualization of the wreath product M ≀ N as a cascade operation.
Proposition 7.31 (Wreath product principle). Let M, N, and T be monoids, and let φ : A∗ → T be a surjective homomorphism. Suppose there exist homomorphisms ν : A∗ → N and μ : A_ν∗ → M such that, for all words u ∈ A∗, the images ν(u) and μ(σ_ν(u)) together determine φ(u), that is,

(ν(u) = ν(v) ∧ μ(σ_ν(u)) = μ(σ_ν(v))) ⇒ φ(u) = φ(v)

for all u, v ∈ A∗. Then T ≺ M ≀ N.

Proof: For a ∈ A, we define f_a : N → M by f_a(n) = μ(n, a). Let ψ : A∗ → M ≀ N be the homomorphism defined by ψ(a) = (f_a, ν(a)) for a ∈ A. For every word u = a_1 ⋯ a_k
with a_i ∈ A, the image ψ(u) = (f, ν(u)) satisfies

f = f_{a_1} ∗ ν(a_1)f_{a_2} ∗ ⋯ ∗ ν(a_1 ⋯ a_{k−1})f_{a_k}

This yields

f(1) = f_{a_1}(1) ⋅ f_{a_2}(ν(a_1)) ⋯ f_{a_k}(ν(a_1 ⋯ a_{k−1}))
     = μ(1, a_1) μ(ν(a_1), a_2) ⋯ μ(ν(a_1 ⋯ a_{k−1}), a_k)
     = μ(σ_ν(u))

In particular, ψ(u) = ψ(v) implies both ν(u) = ν(v) and μ(σ_ν(u)) = μ(σ_ν(v)), and thus it implies φ(u) = φ(v). Therefore, ψ(u) ↦ φ(u) defines a surjective partial homomorphism M ≀ N → T.

There exists a converse of the wreath product principle as stated in Proposition 7.31; see Exercise 7.14: if T ≺ M ≀ N, then there exist homomorphisms ν and μ with the above properties. However, this direction uses a direct product of |N| copies of M when defining μ.

Using the notation from Proposition 7.31, we give an automata theoretic view on the wreath product principle; see Figure 7.1 for a sketch. Suppose we are given an input a_1 ⋯ a_k with a_i ∈ A. Let n_0 = 1 and n_i = ν(a_1 ⋯ a_i) for i ⩾ 1. The mapping σ_ν decorates the letter a_i with n_{i−1}. To this end, we can view ν as an automaton which reads a_i and updates its state from n_{i−1} to n_i = n_{i−1} ⋅ ν(a_i). Let m_0 = 1 and m_i = μ((n_0, a_1) ⋯ (n_{i−1}, a_i)) for i ⩾ 1. In step i, the automaton for μ reads the decorated letter (n_{i−1}, a_i) and it updates its state by m_i = m_{i−1} ⋅ μ(n_{i−1}, a_i). The input letters and the successive states of the two components are as follows, where each letter is read between two consecutive states:

A∗:  a_1, a_2, a_3, ⋯, a_{i−1}, a_i, a_{i+1}, ⋯, a_k
N:   1, n_1, n_2, ⋯, n_{i−2}, n_{i−1}, n_i, ⋯, n_{k−1}, n_k
M:   1, m_1, m_2, ⋯, m_{i−2}, m_{i−1}, m_i, ⋯, m_{k−1}, m_k
In these terms, the wreath product principle says the following: if, for every input a_1 ⋯ a_k, we can deduce the element φ(a_1 ⋯ a_k) ∈ T from the pair (n_k, m_k), then T ≺ M ≀ N.
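The cascade just described can be simulated directly. The following Python sketch is a minimal illustration under stated assumptions: the functions `decorate` and `cascade` and all parameter names are hypothetical, and the concrete example is made up. In the example, N = ℤ/2ℤ tracks the parity of the number of letters read so far, and M is the monoid U_{a,b} of right zeros, which stores the last letter that was read in state 0 of N.

```python
def decorate(word, nu_letter, n_mul, n_one):
    """Compute the decorated word: each letter is paired with the N-state before it.

    Returns the list of decorated letters together with the final state nu(word).
    """
    state = n_one
    decorated = []
    for a in word:
        decorated.append((state, a))
        state = n_mul(state, nu_letter(a))
    return decorated, state


def cascade(word, nu_letter, n_mul, n_one, mu_letter, m_mul, m_one):
    """Run both components and return the pair (n_k, m_k)."""
    decorated, n_k = decorate(word, nu_letter, n_mul, n_one)
    m_k = m_one
    for (n, a) in decorated:
        m_k = m_mul(m_k, mu_letter(n, a))
    return n_k, m_k


# Hypothetical example over A = {a, b}.
def nu_letter(a):
    return 1                                   # every letter flips the parity


def n_mul(x, y):
    return (x + y) % 2                         # the group Z/2Z


def mu_letter(n, a):
    return a if n == 0 else "1"                # remember letters read in state 0


def m_mul(x, y):
    return x if y == "1" else y                # right-zero monoid with neutral "1"


RESULT = cascade("abba", nu_letter, n_mul, 0, mu_letter, m_mul, "1")
```

For the input abba the decorated word is (0, a)(1, b)(0, b)(1, a), so the run ends in the pair (0, b).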
Right zeros and units

An element x of a monoid M is a right zero if ax = x for all a ∈ M. This is equivalent to saying that Mx = {x}. If x ∈ M is a right zero and a is an arbitrary element of M, then the products xa and ax = x are also right zeros. For every set X there is a monoid such that all elements in X are right zeros and there is no other element except for the neutral element 1_U ∉ X. We define this monoid U_X = X ∪ {1_U} with neutral element 1_U by xy = y for all x, y ∈ X. Associativity follows because a product of elements in X,
regardless of the order of execution, is always the last element. We write 1_U for the neutral element in U_X since in some of our applications given as follows, X will be a monoid (in which case X contains an element 1 different from 1_U). If |X| = |Y|, then U_X and U_Y are isomorphic. Therefore, we also write U_n instead of U_X with |X| = n.

Proposition 7.32. U_{2n} ≺ U_n × U_2

Proof: Let X = {x_1, . . . , x_n}, Y = {y_1, . . . , y_n}, and Z = {x, y}. We show that U_{X∪Y} ≺ U_X × U_Z. We define a surjective partial homomorphism ψ : U_X × U_Z → U_{X∪Y} by ψ(x_i, x) = x_i and ψ(x_i, y) = y_i as well as ψ(1_U, 1_U) = 1_U. Obviously, ψ is surjective. It is a partial homomorphism since ψ(x_i, z)ψ(x_j, ẑ) = ψ(x_j, ẑ) = ψ((x_i, z)(x_j, ẑ)) for all z, ẑ ∈ Z.

Since U_n is a submonoid of U_{n+m} for all m, n ⩾ 0, we trivially have U_{2n−1} ≺ U_{2n}. Hence, by repeatedly applying Proposition 7.32, we see that U_n divides the direct product of at most 1 + log n copies of U_2. This leads to the following observation. Remember that C_M denotes the wreath product closure of U_2 and all simple group divisors of M.

Corollary 7.33. For every monoid M and every n ⩾ 0, the monoid U_n divides a monoid in C_M.

Proof: By Proposition 7.32, the monoid U_n divides some k-fold direct product of U_2. Hence, it divides some k-fold wreath product of U_2. The claim follows since U_2 ∈ C_M and C_M is closed under wreath products.

The following theorem yields Krohn–Rhodes decompositions of finite groups G because these can be decomposed iteratively until no nontrivial normal subgroups remain.

Theorem 7.34. Let G be a group and N a normal subgroup of G. Then G ≺ N ≀ (G/N).

Proof: Let H = G/N and let h_1, h_2, … ∈ G with h_1 = 1 be the representatives of the cosets of N, that is, for all x ∈ G there is a unique i with Nh_i = Nx. We can identify H with {h_1, h_2, …} by the bijection Nh_i ↦ h_i. For g ∈ G let [g] = h_i, if Ng = Nh_i. Note that [1] = h_1 = 1.
The operation in H is given by [x][y] = [xy]. This is well defined because, for all x, y ∈ G, we have N[x][y] = Nxy = N[xy]. For g ∈ G, let f_g : H → N be given by f_g(x) = xg[xg]^{-1}. Note that xg[xg]^{-1} ∈ N since Nxg = N[xg]. We define a surjective partial homomorphism ψ : N ≀ H → G by ψ(f_g, [g]) = f_g(h_1)[g] = f_g(1)[g]. Since f_g(1)[g] = g[g]^{-1}[g] = g, the mapping ψ is indeed surjective. It remains to show that ψ is a partial homomorphism. Consider two elements (f_a, [a]) and (f_b, [b]) in N^H × H. For every x ∈ H, we have

(f_a ∗ [a]f_b)(x) = f_a(x) ⋅ f_b([xa]) = xa[xa]^{-1} ⋅ [xa]b[xab]^{-1} = xab[xab]^{-1} = f_{ab}(x)
This shows that $f_a * [a]f_b = f_{ab}$. Thus,

\[ \psi((f_a, [a])(f_b, [b])) = \psi(f_a * [a]f_b, [ab]) = \psi(f_{ab}, [ab]) = ab = \psi(f_a, [a])\,\psi(f_b, [b]) \]

An interpretation of Theorem 7.34 in terms of the wreath product principle is as follows. Let N be a normal subgroup of a group G and let H = G/N. By choosing representatives for the cosets, we can assume H ⊆ G. Moreover, the representative of the coset N is 1. Let [g] denote the image of g ∈ G in H. Suppose we are given a sequence $a_1 \cdots a_k$ of elements in G and we want to compute the resulting product. First, H decorates the input sequence by $(p_0, a_1) \cdots (p_{k-1}, a_k)$ where $p_i = [a_1 \cdots a_i] \in H$. We obtain $p_i$ by $p_{i-1} \cdot [a_i]$. In the next phase, N reads the decorated input and it computes $q_i = a_1 \cdots a_i [a_1 \cdots a_i]^{-1} \in N$. This is done by

\[ q_i = q_{i-1} \cdot [a_1 \cdots a_{i-1}]\, a_i\, [a_1 \cdots a_i]^{-1} = q_{i-1} \cdot p_{i-1} a_i [p_{i-1} a_i]^{-1} \]

In particular, the element $p_{i-1} a_i [p_{i-1} a_i]^{-1}$ is uniquely determined by the decorated letter $(p_{i-1}, a_i)$. In the following picture, each letter stands above the states in which it is read.

\[
\begin{array}{l|cccccc}
G^* & a_1 & a_2 & a_3 & \cdots & a_k & \\ \hline
G/N & 1 & [a_1] & [a_1 a_2] & \cdots & p_{k-1} & p_k = [a_1 \cdots a_k] \\
N & 1 & a_1 [a_1]^{-1} & a_1 a_2 [a_1 a_2]^{-1} & \cdots & q_{k-1} & q_k = a_1 \cdots a_k [a_1 \cdots a_k]^{-1}
\end{array}
\]
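The two-phase evaluation can be tried out on a small example. The following is only a sketch under assumptions that are not part of the text: G is taken to be the symmetric group S3 (permutations written as tuples), N is the alternating group A3, and the helper names `mul`, `inv`, `rep`, `evaluate` and `direct` are hypothetical. The product is recovered as $q_k p_k$ from the two final states.

```python
from functools import reduce
from itertools import permutations, product

def mul(p, q):
    """Product pq in S3, read as: apply p first, then q."""
    return tuple(q[i] for i in p)

def inv(p):
    """Inverse permutation."""
    r = [0] * len(p)
    for i, j in enumerate(p):
        r[j] = i
    return tuple(r)

E = (0, 1, 2)                    # neutral element, representative of the coset N
T = (1, 0, 2)                    # a transposition, representative of the other coset
N = {E, (1, 2, 0), (2, 0, 1)}    # the normal subgroup A3 (assumed example)

def rep(g):
    """The chosen representative [g] of the coset Ng."""
    return E if g in N else T

def evaluate(word):
    """Compute a_1 ... a_k as q_k p_k with states p in G/N and q in N."""
    p, q = E, E                          # p_0 = 1 and q_0 = 1
    for a in word:
        pa = mul(p, a)
        step = mul(pa, inv(rep(pa)))     # element of N determined by the letter (p, a)
        q = mul(q, step)
        p = rep(pa)
    return mul(q, p)

def direct(word):
    """Reference value: multiply the letters in G from left to right."""
    return reduce(mul, word, E)

S3 = list(permutations(range(3)))
print(all(evaluate(w) == direct(w) for w in product(S3, repeat=3)))
```

The update of q uses the decorated letter (p, a) only, which is the point of the construction: the N-component never needs to see the prefix itself.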
Finally, the value of the product a1 ⋅ ⋅ ⋅ a k in G is obtained as q k p k . By the wreath product principle, this shows G ≺ N ≀ H. Corollary 7.35. If G is a finite group, then G divides a monoid in CG . Proof: The proof is by induction on |G|. If G is simple, then G ∈ C G . Otherwise, there exists a nontrivial normal subgroup N of G. By induction, the group N divides a monoid in CN and the group G/N divides a monoid in CG/N . We have CN ∪ CG/N ⊆ CG . Thus, by Theorem 7.34, the group G divides a monoid in CG . We note that the wreath product of two groups is again a group. Since U2 is not involved in the above decomposition of groups, every finite group G divides a group in C G . Remember that an element x of a monoid M is invertible if there exists y ∈ M with xy = yx = 1. If M is finite, then this is equivalent to Mx = M; see Exercise 1.3. The invertible elements of M are also called units. The following theorem leads to the Krohn–Rhodes decomposition of monoids M such that all elements x ∈ M satisfy either Mx = {x} or Mx = M, that is, every element is either a right zero or invertible. Theorem 7.36. If a monoid M can be written as a union of Z and G such that all elements in Z are right zeros and all elements in G are invertible, then M ≺ U Z ≀ G. Proof: We can assume that M is nontrivial and, hence, Z and G are disjoint. For z ∈ Z, we define the mapping f z : G → U Z by f z (g) = zg−1 . Note that zg−1 is indeed a right
zero. Let $\iota : G \to U_Z$ with $\iota(g) = 1_U$ be the constant mapping to the neutral element $1_U$ of $U_Z$. We set $1_U g = g$ for all g ∈ G. This allows us to define a surjective partial homomorphism $\psi : U_Z \wr G \to M$ by $\psi(f, g) = f(1)\,g$ if f is either of the form $f = f_z$ or f = ι. The mapping ψ is surjective since ψ(ι, g) = g for g ∈ G and $\psi(f_z, 1) = z$ for z ∈ Z. It remains to show that ψ is a partial homomorphism. Consider h ∈ G and $(f, g) \in U_Z \wr G$ such that ψ(f, g) is defined. Then

\[ \psi((f, g)(\iota, h)) = \psi(f * g\iota, gh) = \psi(f, gh) = f(1)gh = \psi(f, g)\,\psi(\iota, h) \]

and for z ∈ Z we have

\[ \psi((f, g)(f_z, h)) = \psi(f * g f_z, gh) = \psi(f_{zg^{-1}}, gh) = zg^{-1} \cdot gh = zh = \psi(f, g)\,zh = \psi(f, g)\,\psi(f_z, h) \]

where the second to last equality uses the fact that z is a right zero.

Suppose M is a union of right zeros Z and units G. The decomposition in Theorem 7.36 can also be formulated in terms of the wreath product principle. We want to evaluate a sequence $a_1 \cdots a_k$ of elements in M by using a decoration mechanism and products in Z and G, only. Let $g_i$ be the value of $a_1 \cdots a_i$ when ignoring all right zeros. Formally, all right zeros are mapped to the neutral element of G. The component for $U_Z$ reads the decorated input $(1, a_1)(g_1, a_2) \cdots (g_{k-1}, a_k)$. Whenever it reads a letter $(g_{i-1}, a_i)$ for some right zero $a_i \in Z$, it sets its state to $z_i = a_i g_{i-1}^{-1}$. If $a_i$ is in G, then the state $z_i = z_{i-1}$ does not change. Finally, the value of the product $a_1 \cdots a_k$ is obtained as $z_k g_k$. Suppose i is maximal such that $a_i$ is a right zero. Then $a_1 \cdots a_k = a_i \cdot a_{i+1} \cdots a_k$ and $a_{i+1} \cdots a_k$ is a group element. This yields $z_k g_k = z_i g_k = a_i g_{i-1}^{-1} \cdot g_k = a_i \cdot a_{i+1} \cdots a_k$, as desired. For example, consider the input abybzaa for a, b ∈ G and y, z ∈ Z. This yields the following picture, in which each letter stands above the states in which it is read:

\[
\begin{array}{l|cccccccc}
M & a & b & y & b & z & a & a & \\ \hline
G & 1 & a & ab & ab & abb & abb & abba & abbaa \\
U_Z & 1_U & 1_U & 1_U & y(ab)^{-1} & y(ab)^{-1} & z(abb)^{-1} & z(abb)^{-1} & z(abb)^{-1}
\end{array}
\]
We obtain the result zaa of the product abybzaa by multiplying the last two states z(abb)−1 ⋅ abbaa. Since it is always possible to compute the value of a1 ⋅ ⋅ ⋅ a k from the pair of states at the end of the run, the wreath product principle yields M ≺ U Z ≀ G. Corollary 7.37. If every element in a finite monoid M is either a right zero or a unit, then M divides a monoid in CM . Proof: Let Z be the right zeros in M, and let G be the units. By Theorem 7.36 it suffices to show that each of U Z and G divides a monoid in CM . The claim for U Z follows by Corollary 7.33, and the claim for G follows by Corollary 7.35.
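The evaluation scheme behind Theorem 7.36 can be checked mechanically on a small monoid. The sketch below makes assumptions beyond the text: M consists of the six permutations of {0, 1, 2} (the units) together with the three constant maps (the right zeros), maps act from left to right, and `None` plays the role of $1_U$. All function names are hypothetical.

```python
from functools import reduce
from itertools import permutations, product

ONE = (0, 1, 2)

def mul(p, q):
    """Product pq of two maps on {0, 1, 2}: apply p first, then q."""
    return tuple(q[i] for i in p)

def inv(g):
    """Inverse of a permutation."""
    r = [0] * len(g)
    for i, j in enumerate(g):
        r[j] = i
    return tuple(r)

UNITS = list(permutations(range(3)))     # the group G
ZEROS = [(c, c, c) for c in range(3)]    # the right zeros Z (constant maps)
M = UNITS + ZEROS

def evaluate(word):
    """Evaluate a_1 ... a_k as z_k g_k, following the decoration scheme."""
    g = ONE      # state of the G component
    z = None     # state of the U_Z component, None stands for 1_U
    for a in word:
        if a in ZEROS:
            z = mul(a, inv(g))   # letter (g_{i-1}, a_i) with a right zero a_i
        else:
            g = mul(g, a)        # right zeros are ignored by the G component
    return g if z is None else mul(z, g)

def direct(word):
    """Reference value: multiply in M from left to right."""
    return reduce(mul, word, ONE)

print(all(evaluate(w) == direct(w) for w in product(M, repeat=3)))
```

Note that the $U_Z$ component only overwrites its state and never multiplies, which reflects the multiplication xy = y in $U_Z$.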
Neither right zero nor unit

By Corollary 7.37, the Krohn–Rhodes theorem holds for monoids M which can be written as a disjoint union of right zeros and units. Hence, the “difficult” elements of M are exactly those which are neither a right zero nor a unit, that is, the elements x ∈ M with {x} ≠ Mx ≠ M. This leads to the following definition. For a monoid M let

K(M) = { x ∈ M | {x} ≠ Mx ≠ M }

Lemma 7.38. If N is a finite submonoid of a monoid M, then K(N) ⊆ K(M).

Proof: Let x ∈ K(N). Since x is not a right zero, there exists a ∈ N with ax ≠ x. Thus, x is not a right zero in M either. Suppose that x has an inverse y ∈ M. The element x generates a cyclic subsemigroup ⟨x⟩ of N. Since ⟨x⟩ is finite, there exists n ⩾ 1 with $x^{2n} = x^n$; see Exercise 1.6 (a). Multiplication by $y^n$ yields $x^n = 1$. Then $y = x^{n-1} \in \langle x \rangle \subseteq N$, and x is already invertible in N, which contradicts x ∈ K(N).

Remember that the local divisor of M at c ∈ M is the set $M_c = cM \cap Mc$ with the operation xc ∘ cy = xcy. We have seen in Section 7.4 that $M_c$ is a divisor of M. It is the homomorphic image of the submonoid { x ∈ M | xc ∈ cM } of M.

Lemma 7.39. Let M be a finite monoid, and let c ∈ K(M). Then $|K(M_c)| < |K(M)|$.

Proof: Since $c \in K(M) \setminus K(M_c)$, it suffices to show that $K(M_c) \subseteq K(M)$. If x ∈ cM ∩ Mc is a right zero in M, then yx = x for all y ∈ M. In particular, for all yc ∈ cM we have yc ∘ x = yx = x. Hence, x is a right zero in $M_c$. Suppose that there exists an element x ∈ cM ∩ Mc which is invertible in M. Then, yx = 1 for some y ∈ M. It follows that M = Myx ⊆ Mx ⊆ Mc ⊆ M and hence M = Mc; this is a contradiction since c is not invertible in M. Therefore, we have $K(M_c) \subseteq K(M)$, as desired.
Augmented monoids Let M be a monoid. Every element a ∈ M induces a mapping τ a : M → M by τ a (x) = xa. If two such mappings τ a and τ b coincide, then a = τ a (1) = τ b (1) = b. Thus, every mapping τ a uniquely determines the element a ∈ M. Moreover, there is a strong connection between the composition of mappings and the product in M. This is more evident if we reverse the order of the composition; let τ a τ b = τ b ∘ τ a , that is, we first apply τ a and then we apply τ b . With this notation, we have τ a τ b = τ ab . This gives an interpretation of the elements of M as mappings M → M. The mapping τ a is constant if and only if a is a right zero. If we add all constant mappings, then { τ b | b ∈ M } together with the set of constants is closed under composition; in particular, it forms a monoid. If a is a right zero, then the constant mapping x → a is τ a ∈ { τ b | b ∈ M }. If a is not a right zero, then the mapping x → a is not in { τ b | b ∈ M } and we have introduced a new element.
We reproduce the augmentation of constants on the monoid M without the detour over mappings. Let the augmented monoid $M^\#$ of M be given by the disjoint union

\[ M^\# = M \cup \{\, \overline{a} \mid a \in M \text{ is not a right zero} \,\} \]

The idea is that if a is not a right zero, then we add the “constant mapping to a.” The operation on M is extended to $M^\#$ by

\[ x\,\overline{a} = \overline{a}, \qquad \overline{a}\,b = \begin{cases} ab & \text{if } ab \in M \text{ is a right zero} \\ \overline{ab} & \text{if } ab \in M \text{ is not a right zero} \end{cases} \]

for $x \in M^\#$ and a, b ∈ M. The neutral element of M is also neutral in $M^\#$. A straightforward computation shows that the operation on $M^\#$ is associative. Thus, $M^\#$ forms a monoid extension of M. All elements in $\{\, \overline{x} \mid x \in M \text{ is not a right zero} \,\}$ are right zeros in $M^\#$, every right zero in M is also a right zero in $M^\#$, and the invertible elements in M and $M^\#$ coincide. Therefore, we have $K(M) = K(M^\#)$.

Lemma 7.40. If $G \prec M^\#$ and G is a group, then G ≺ M.

Proof: Since $1 \in M \subseteq M^\#$, it suffices to consider the case |G| ⩾ 2. Let $U \subseteq M^\#$ be a submonoid and φ : U → G a surjective homomorphism. We consider mutually inverse elements a, b ∈ G \ {1} and arbitrary preimages e, x, y ∈ U with φ(e) = 1, φ(x) = a and φ(y) = b. Suppose that e is a right zero; then 1 = φ(e) = φ(xe) = φ(x)φ(e) = a ⋅ 1 = a in contradiction to the choice of a. Suppose that x is a right zero. Then a = φ(x) = φ(yx) = φ(y)φ(x) = ba = 1 yields a contradiction. Thus, U does not contain any right zeros. Since all elements in $M^\# \setminus M$ are right zeros, we obtain U ⊆ M. This shows G ≺ M.
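The “straightforward computation” for associativity can also be delegated to a brute-force check on a small example. The following sketch assumes, purely for illustration, that M is the monoid of all maps on {0, 1, 2} acting from left to right; an element of the augmented monoid is encoded as a pair `(a, flag)`, where `flag` marks the added constant belonging to a. The encoding and all names are hypothetical.

```python
from itertools import product

def mul(p, q):
    """Product pq of two maps on {0, 1, 2}: apply p first, then q."""
    return tuple(q[i] for i in p)

M = list(product(range(3), repeat=3))          # all maps on three points
RIGHT_ZEROS = {x for x in M if all(mul(y, x) == x for y in M)}

# (a, False) is the element a of M, (a, True) is the added constant for a
M_SHARP = [(a, False) for a in M] + [(a, True) for a in M if a not in RIGHT_ZEROS]

def mul_sharp(x, y):
    """Multiplication in the augmented monoid."""
    (a, a_added), (b, b_added) = x, y
    if b_added:                      # x times an added constant is that constant
        return y
    ab = mul(a, b)
    if a_added and ab not in RIGHT_ZEROS:
        return (ab, True)            # added constant for a, times b, stays an added constant
    return (ab, False)               # product in M, or ab is already a right zero of M

ONE = ((0, 1, 2), False)
associative = all(
    mul_sharp(mul_sharp(x, y), z) == mul_sharp(x, mul_sharp(y, z))
    for x in M_SHARP for y in M_SHARP for z in M_SHARP
)
print(len(M_SHARP), associative)
```

In this example the right zeros of M are exactly the three constant maps, so 24 new elements are added.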
Proof of the Krohn–Rhodes theorem

The central part in our proof of the Krohn–Rhodes theorem is the following result. It gives a decomposition of M as a wreath product of a local divisor $M_c$ and an augmented submonoid $N^\#$.

Theorem 7.41. Let M be a monoid generated by A; let c ∈ A and N be the submonoid of M generated by A \ {c}. Then

\[ M \prec (U_N \times M_c) \wr (N^\# \times U_1) \]

Proof: Let $Z = \{\, \overline{x} \mid x \in N \text{ is not a right zero} \,\}$ be disjoint from N, and let $N^\# = N \cup Z$ be the augmented monoid. We extend $\overline{\,\cdot\,}$ to a mapping $\overline{\,\cdot\,} : N \to N^\#$ by setting $\overline{x} = x$ if x ∈ N is a right zero. Then for all x ∈ N the element $\overline{x}$ is a right zero in $N^\#$. In $N^\#$, we have $\overline{a}\,b = \overline{ab}$ for all a, b ∈ N. For each element $x \in N^\#$ there exists a unique element n ∈ N with $x \in \{n, \overline{n}\}$; we denote this element by $\hat{x}$. Let $U_1 = \{1, 0\}$ with the usual multiplication of integers as operation. Let φ : A∗ → M be the natural homomorphism, and let $\nu : A^* \to N^\# \times U_1$ be the homomorphism defined by

\[ \nu(a) = \begin{cases} (a, 1) & \text{if } a \in A \setminus \{c\} \\ (\overline{1}, 0) & \text{if } a = c \end{cases} \]

We want to apply the wreath product principle. The extended alphabet is $A_\nu = N^\# \times U_1 \times A$ and the length-preserving mapping $\sigma : A^* \to A_\nu^*$ is given by $\sigma(a_1 \cdots a_k) = (1, a_1)(\nu(a_1), a_2) \cdots (\nu(a_1 \cdots a_{k-1}), a_k)$. The mapping σ adds information given by ν to a word. We define a homomorphism $\mu : A_\nu^* \to U_N \times M_c$ by

\[ \mu(x, 1, c) = (\hat{x}, c), \qquad \mu(x, 0, c) = (1_U, c\hat{x}c), \qquad \mu(x, e, a) = (1_U, c) \]

for a ∈ A \ {c}, $x \in N^\#$ and $e \in U_1$. Note that $(1_U, c)$ is the neutral element of $U_N \times M_c$. Consider a word $u = a_1 \cdots a_k$ with $a_i \in A$. We show that φ(u) is uniquely determined by the combination of ν(u) and μ(σ(u)). In its first component, the homomorphism ν multiplies the letters in u from left to right, except that whenever it sees the letter c, it restarts its evaluation from 1. Moreover, the second component of ν keeps track of whether there already was an occurrence of c. The mapping σ provides μ with this evaluation of ν. The homomorphism μ ignores all letters a ∈ A \ {c}. When seeing the first occurrence of the letter c, the homomorphism μ stores the evaluation of ν at the prefix before the first c in its $U_N$ component, and this value remains unchanged in the rest of the computation. At all future c’s, the homomorphism μ takes the evaluation $\hat{x}$ of the factor between the current c and the previous c, and it updates its current value m by $m \circ c\hat{x}c = m\hat{x}c$. In particular, at every point after the first c, the $M_c$ component contains the evaluation of the input from the first c up to the previous c. For instance, the word u = aacaacacca for a ∈ A \ {c} leads to the following picture, in which each letter stands above the states in which it is read.

\[
\begin{array}{l|ccccccccccc}
A^* & a & a & c & a & a & c & a & c & c & a & \\ \hline
U_1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
N^\# & 1 & a & a^2 & \overline{1} & \overline{a} & \overline{a^2} & \overline{1} & \overline{a} & \overline{1} & \overline{1} & \overline{a} \\
M_c & c & c & c & c & c & c & ca^2c & ca^2c & ca^2cac & ca^2cacc & ca^2cacc \\
U_N & 1_U & 1_U & 1_U & a^2 & a^2 & a^2 & a^2 & a^2 & a^2 & a^2 & a^2
\end{array}
\]
The components for U1 and N # can only read the input from A∗ . The components for M c and U N , on the other hand, each have access to the first three lines. Suppose that ν(u) = (x, e) ∈ N # × U1 and μ(σ(u)) = (n, m) ∈ U N × M c . If e = 1, then u contains no letter c and, thus, φ(u) = x. Suppose that e = 0. In this case, the
letter c occurs in u. If we write u = prs for p, s ∈ (A \ {c})∗ and r ∈ cA∗ ∩ A∗c, then we have φ(p) = n, φ(r) = m and $\varphi(s) = \hat{x}$. This shows $\varphi(u) = n \cdot m \cdot \hat{x}$. In any case, ν(u) and μ(σ(u)) uniquely determine φ(u). By the wreath product principle, this yields $M \prec (U_N \times M_c) \wr (N^\# \times U_1)$.

We can finally prove the Krohn–Rhodes theorem from page 194; that is, we start with $U_2$ and the simple group divisors of M as building blocks, and we show that M can be composed from these using wreath products and division.

Proof of the Krohn–Rhodes theorem (Theorem 7.26): The proof is by induction on |K(M)|. If K(M) = ∅, then the claim follows from Corollary 7.37. Now let K(M) ≠ ∅. Let A be a minimal set of generators for M. Then there is a generator c ∈ A ∩ K(M). Let N be the submonoid of M generated by A \ {c}. The minimality of A yields c ∉ N. By Lemma 7.38, we have $K(N^\#) = K(N) \subseteq K(M) \setminus \{c\}$. Lemma 7.40 shows that every group divisor of $N^\#$ is also a group divisor of M and hence $C_{N^\#} \subseteq C_M$. By induction, the monoid $N^\#$ divides a monoid in $C_M$. Lemma 7.39 shows $|K(M_c)| < |K(M)|$, and thus, by induction, $M_c$ divides a monoid in $C_{M_c} \subseteq C_M$. Finally, we conclude that M divides a monoid in $C_M$ by Corollary 7.33 and Theorem 7.41.
7.6 Green’s relations

Throughout this section, M denotes a monoid. If M is finite, then for all x ∈ M there are minimal t ⩾ 0, p ⩾ 1 (t = threshold and p = period) such that $x^{t+p} = x^t$. Let k be minimal such that t ⩽ kp < t + p. Then Exercise 1.6 (b) shows that $x^{kp}$ is the unique idempotent power of x. Moreover, if n = |M|, then $x^{kp} = x^{n!}$. Hence, for each finite monoid M there is a positive natural number ω such that $x^\omega$ is idempotent for every x ∈ M.

In the following, we define the notion of D-class. In a finite monoid M, the D-class D(s) of an element s ∈ M consists of all elements t ∈ M such that MsM = MtM. A main result in this section has been stated first in [18]; the local divisors $M_s$ and $M_t$ are isomorphic as soon as MsM = MtM. As a consequence, classical results concerning Green’s relations have a natural and transparent algebraic interpretation.

Let us start with a general notion of ideals in monoids. A left ideal (resp. right ideal, resp. two-sided ideal) in a monoid M is a subset I ⊆ M such that MI ⊆ I (resp. IM ⊆ I, resp. MIM ⊆ I). Our focus is on principal ideals. A principal ideal is of the form Mx, xM, or MxM, where x ∈ M. The concept of ideals leads to natural partial orders $\leqslant_L$, $\leqslant_R$, $\leqslant_J$ and induced equivalence relations $\sim_L$, $\sim_R$, $\sim_J$. They are given by

\[
\begin{array}{lcl@{\qquad}lcl}
x \leqslant_L y & \Leftrightarrow & Mx \subseteq My & x \sim_L y & \Leftrightarrow & Mx = My \\
x \leqslant_R y & \Leftrightarrow & xM \subseteq yM & x \sim_R y & \Leftrightarrow & xM = yM \\
x \leqslant_J y & \Leftrightarrow & MxM \subseteq MyM & x \sim_J y & \Leftrightarrow & MxM = MyM
\end{array}
\]
Instead of $\sim_L$, $\sim_R$, $\sim_J$, we also write L, R, and J for the corresponding subsets in M × M. These are three out of five relations referred to as Green’s relations. The other two relations are H and D, defined by H = L ∩ R and D = L ∘ R. The relations were introduced and first studied by James Alexander Green (1926–2014) in 1951. As the other four, D is an equivalence relation. For finite monoids this follows from Lemma 7.42 which shows J = D. Since the definition of J is symmetric, J = D implies D = L ∘ R = R ∘ L. In Exercise 7.18, we will see that the latter assertion L ∘ R = R ∘ L holds even in infinite monoids; and therefore D always is an equivalence relation. But, in general, we may have D ⊊ J. This is shown in Exercise 7.1 (c). The picture is as follows:

H = L ∩ R ⊆ L ∪ R ⊆ D ⊆ J

If G is a group, then H = J = G × G. Hence Green’s relations tell us nothing about the group. However, they are fundamental in the theory of (finite) semigroups. If a monoid M contains a group G as a subsemigroup, then all elements in G are H-equivalent. Let e ∈ G be its neutral element, then e is idempotent. This implies G ⊆ H(e). In particular, if M is finite and x ∈ M, then $x^\omega$ is the neutral element in the finite cyclic group $\{x^t, \ldots, x^{t+p-1}\}$ (where $x^t = x^{t+p}$ and p ⩾ 1) and $x^t \sim_H \cdots \sim_H x^{t+p-1}$.

Lemma 7.42. Let M be a finite monoid and s, t be J-equivalent elements. Then the following assertions hold:
(a) If $s \leqslant_R t$ (resp. $s \leqslant_L t$), then $s \sim_R t$ (resp. $s \sim_L t$).
(b) L(s) ∩ R(t) ≠ ∅
(c) J = D = L ∘ R = R ∘ L; therefore, D is an equivalence relation.

Proof: Since $s \sim_J t$, we can write s = ptu and t = qsυ.
(a) Since $s \leqslant_R t$, we may assume that s = tu ∈ tM. Hence,

\[ t = qs\upsilon = qtu\upsilon = q^\omega t (u\upsilon)^\omega = (q^\omega t (u\upsilon)^\omega)(u\upsilon)^\omega = t(u\upsilon)^\omega \in sM \]

This implies $t \leqslant_R s$. We conclude $s \sim_R t$.
(b) We have

\[ s = pq\, s\, \upsilon u = (pq)^\omega ((pq)^\omega s (\upsilon u)^\omega) = (pq)^\omega s \]

As a consequence, $s \leqslant_L qs \leqslant_L s$ and hence, $s \sim_L qs$. By symmetry, $s \sim_R s\upsilon$. This implies qsM = qsυM = tM; and qs ∈ L(s) ∩ R(t).
(c) We have shown that L(s) ∩ R(t) ≠ 0. Thus, there exists z with s ∼L z ∼R t. We obtain J ⊆ L ∘ R. The other inclusion L ∘ R ⊆ J is trivial. Therefore, J = L ∘ R. By symmetry, J = R∘L. By definition, D = L∘R. This proves the claim J = D = L∘R = R∘L. Now, D is reflexive because L and R are reflexive. It is symmetric since L ∘ R = R ∘ L. It is transitive since L∘R∘L∘R =L∘L∘R∘R= L∘R
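For a concrete finite monoid, Green's relations can be computed directly from the principal ideals, which gives a quick sanity check of Lemma 7.42 (c). The sketch below assumes, as an example that is not taken from the text, the monoid of all maps on {0, 1, 2} acting from left to right; the helper names are hypothetical.

```python
from itertools import product

def mul(p, q):
    """Product pq of two maps on {0, 1, 2}: apply p first, then q."""
    return tuple(q[i] for i in p)

M = list(product(range(3), repeat=3))

# principal left, right and two-sided ideals of every element
Mx = {x: frozenset(mul(m, x) for m in M) for x in M}
xM = {x: frozenset(mul(x, m) for m in M) for x in M}
MxM = {x: frozenset(mul(m, y) for y in xM[x] for m in M) for x in M}

def relation(ideal):
    """All pairs of elements generating the same ideal."""
    return {(x, y) for x in M for y in M if ideal[x] == ideal[y]}

def compose(A, B):
    """The pair (x, z) is in the composition iff x A y and y B z for some y."""
    return {(x, z) for (x, y) in A for (y2, z) in B if y == y2}

def classes(rel):
    """The classes of an equivalence relation on M."""
    return {frozenset(y for y in M if (x, y) in rel) for x in M}

L, R, J = relation(Mx), relation(xM), relation(MxM)
H = L & R
D = compose(L, R)

print(D == J, D == compose(R, L), sorted(len(c) for c in classes(J)))
```

In this example the J-classes are the sets of maps of a fixed rank, so there are three of them.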
Let M be a finite monoid and s ∈ M. Recall the definition of a local divisor $M_s$. It is the set sM ∩ Ms together with the multiplication xs ∘ sy = xsy. The neutral element is s. Note that the H-class H(s) is a subset of sM ∩ Ms. It turns out that H(s) is exactly the set of units in $M_s$. Indeed, let $x \sim_H s$, then s = yx for some y. We have ys ∈ Ms; and since sM = xM we have ys ∈ ysM = yxM = sM. Hence, ys ∈ sM ∩ Ms and ys ∘ x = yx = s. Thus, every element in H(s) has an inverse with respect to ∘. For the other direction, let y′s = sy ∈ sM ∩ Ms such that y′s ∘ x = y′x = s and x ∘ sy = xy = s. Then, we have s ∈ xM ∩ Mx. This shows that $x \sim_H s$. Thus, H(s) is the set of units in the local divisor. If we endow the set H(s) with the operation ∘, then it becomes a group. It is called the Schützenberger group of the H-class H(s). It has been known that all Schützenberger groups in a given D-class D are isomorphic. Actually, a more general statement is true: if s, t ∈ D, then the local divisors $M_s$ and $M_t$ are isomorphic. This will be shown next.
Generalization of Green’s lemma

The following theorem is a slightly enriched version of Green’s classical lemma.

Theorem 7.43. Let M be a monoid and s, t be R-equivalent elements such that s = tu and t = sυ. Then the right multiplication ⋅υ, mapping x to xυ, induces a bijection Ms → Mt, x ↦ xυ. The mapping ⋅υ enjoys the following additional properties:
(a) It induces an isomorphism between the local divisors $M_s$ and $M_t$.
(b) It maps L(s) bijectively onto L(t).
(c) If M is finite, then $L(s) \xrightarrow{\cdot\upsilon} L(t)$ respects H-classes. More precisely, if M is finite, then H(x) ⋅ υ = H(xυ) for x ∈ L(s).

Proof: Let x ∈ Ms. Then xυ ∈ Mt and xυu = x since sυu = s. This shows that $Ms \xrightarrow{\cdot\upsilon} Mt$ is injective and $Mt \xrightarrow{\cdot u} Ms$ is surjective. By symmetry, $Ms \xrightarrow{\cdot\upsilon} Mt$ is bijective. Since sMυ ⊆ tM, we also have a bijection between sM ∩ Ms and tM ∩ Mt.
(a) Consider the bijection φ : sM ∩ Ms → tM ∩ Mt with φ(x) = xυ. Then we have φ(xs ∘ sy) = φ(xsy) = xsyυ and φ(xs) ∘ φ(sy) = xt ∘ syυ = xsyυ. Hence, φ(xs ∘ sy) = φ(xs) ∘ φ(sy); and φ is an isomorphism between the local divisors $M_s$ and $M_t$.
(b) Let x ∈ L(s), then Mxυ = Msυ = Mt. Hence, L(s) ⋅ υ ⊆ L(t). By symmetry, L(t) ⋅ u ⊆ L(s). This implies that ⋅υ induces a bijection between L(s) and L(t).
(c) Let x ∈ L(s), then $x\upsilon \sim_L t$. Trivially, $x\upsilon \leqslant_R x$. Moreover, $x\upsilon \sim_J x$ since $x\upsilon \sim_L t \sim_J s \sim_L x$. By Lemma 7.42, we conclude $x\upsilon \sim_R x$. This says that H(x) is mapped to H(xυ). Again, as xυu = x, we obtain H(x) ⋅ υ = H(xυ).

Remark 7.44. In Exercise 7.1 (b), we see that Lemma 7.42 fails for infinite monoids M, in general. More precisely, there are examples where $s \sim_J t$ and $s \leqslant_R t$, but $s \not\sim_R t$. We can choose the same example to find $s \sim_J t$ and L(s) ∩ R(t) = ∅.
Finally, if M is infinite, then L(s) → R(t), x ↦ xυ does not always respect H-classes. Thus in general, the finiteness assumption in Theorem 7.43 (c) is necessary, too. ◊

The following corollary generalizes what is known as Green’s theorem.

Corollary 7.45. Let M be a finite monoid and let s, t be J-equivalent. Then the local divisors $M_s$ and $M_t$ are isomorphic. In particular, the Schützenberger groups H(s) and H(t) are isomorphic and therefore |H(s)| = |H(t)|. If e ∈ M is an idempotent, then H(e) is a subgroup in the monoid M.

Proof: Since $s \sim_J t$ there is some r ∈ R(s) ∩ L(t) by Lemma 7.42. Hence, by Theorem 7.43 (and left–right symmetry), we know that the local divisors $M_s$, $M_r$, and $M_t$ are isomorphic. Hence their groups of units are isomorphic. These groups of units are the Schützenberger groups H(s), H(r), and H(t). If e ∈ M is an idempotent, then the multiplication ∘ in the local divisor is exactly the same as in M. Indeed $e^2 = e$ implies xe ⋅ ey = xey = xe ∘ ey.
Egg boxes Lemma 7.42 shows that we can partition each D-class of a finite monoid into a disjoint union of L- (resp. R-) classes. These L- (resp. R-) classes are pairwise incomparable with respect to 1. To handle this case endow Σ and Σ with linear orderings such that a < b in Σ implies a < b in Σ for all a ∈ f −1 (a ) and b ∈ f −1 (b ). ̂ for f ∗ (w ). In order to check that Next, compute the lexicographical normal form w a ∈ Σ is the first letter in the lexicographical normal form for w , we consider all b ∈ Σ with a ≠ b . If (a , b ) ∈ D , then there are (a, b) ∈ D with f(a) = a and f(b) = b , because f is surjective on edges. In this case, the first a must be before all ̂ . If (a , b ) ∈ I , then f −1 (a ) and f −1 (b ) are not empty because f is surjective b in w ̂ before any letter of f −1 (b ) on vertices. In this case all letters of f −1 (a ) appear in w ̂ . This is due to the choice of the linear order on Σ. Once a is detected, we appears in w are done by induction since M(Σ, I) is cancellative: either by the inclusion (8.5) or by a simple direct argument using the defining relations ab = ba for (a, b) ∈ I. Corollary 8.16. Let M(Σ, I) be a trace monoid and D the collection of subsets {a, b} ⊆ Σ such that (a, b) ∉ I (including the case a = b). For each {a, b} ∈ D let π ab be the natural projection of M(Σ, I) onto the free monoid {a, b}∗ , which is defined by erasing all letters different from a, b. Then, we obtain a canonical embedding π : M(Σ, I) →
$\prod_{\{a,b\} \in D} \{a, b\}^*, \quad w \mapsto (\pi_{ab}(w))_{\{a,b\} \in D}$
In particular, every trace monoid is a submonoid of a finite direct product of finitely generated free monoids; and therefore its word problem is solvable in linear time. The analog of Proposition 8.15 and Corollary 8.16 fails for free partially commutative groups. For example, consider Σ = {a, b, c} with the linear order a < b < c, where a and c are independent, but (a, b) ∉ I and (b, c) ∉ I. For group elements x, y, let [x, y] = xyx−1 y−1 denote the commutator of x and y. It turns out that the lexicographical normal form of [[a, b], [c, b]] is the word ababcbacbabcbc In particular, [[a, b], [c, b]] ≠ 1 in G(Σ, I). There are two projections π ab and π bc of G(Σ, I) onto F(a, b) and F(b, c), respectively. However, π ab ([c, b]) = 1 and π bc ([a, b]) = 1. Hence, π[[a, b], [c, b]] = 1 in the direct product F(a, b) × F(b, c). Actually, the graph group F(a, b, c)/{ac = ca} is no subgroup of any finite direct product of free groups [35]. The subgroup structure of RAAGs is rich and complicated. It has been studied from various angles which go far beyond the scope of this textbook. For more advanced topics, see, for example [34, 111]. Based on (Σ, I) the right-angled Coxeter group C(Σ, I) is defined as follows: C(Σ, I) = Σ∗ / { a2 = 1, ab = ba (a, b) ∈ I } These Coxeter groups are therefore quotients of the respective graph groups. Since nontrivial Coxeter groups have torsion, they cannot appear as subgroups of graph
groups. However, every graph group is a subgroup of some right-angled Coxeter group, see Exercise 8.4.
8.7 Semidirect products

The construction of semidirect products follows the scheme we have seen in the construction of free products in Example 8.7. We let M and K be monoids with associated (potentially infinite) alphabets $\Sigma_M = M \setminus \{1\}$ and $\Sigma_K = K \setminus \{1\}$. We assume that these alphabets are disjoint, and the monoids share the neutral element 1. The notion of semidirect product refers to M, K and a monoid homomorphism α : K → Aut(M), where Aut(M) is the automorphism group of the monoid M. For instance, K = M = ℤ (with common neutral element 0) and $\alpha(x)(m) = (-1)^x \cdot m$. In this section, we write ${}^{x}m$ instead of α(x)(m) for x ∈ K and m ∈ M. In this notation, we obtain

\[ {}^{x}m \; {}^{x}m' = {}^{x}(mm') \quad \text{and} \quad {}^{x}({}^{y}m) = {}^{xy}m \]

The idea now is to realize ${}^{x}m$ as an inner automorphism $xmx^{-1} = {}^{x}m$. For monoids there is no $x^{-1}$, in general. Hence, we replace the equation $xmx^{-1} = {}^{x}m$ by $xm = {}^{x}m\,x$, which is equivalent in the group case and makes sense in all monoids. As for free products, let $\Sigma = \Sigma_M \cup \Sigma_K$ and $S = S_M \cup S_K$. We extend S to a system $S_\alpha$ by

\[ S_\alpha = S_M \cup S_K \cup \{\, xm \to {}^{x}m\,x \mid x \in \Sigma_K,\ m \in \Sigma_M \,\} \]

The system $S_\alpha$ terminates because it reduces length-lexicographically if we choose the letters from $\Sigma_M$ smaller than those from $\Sigma_K$. Local confluence follows from the above computation rules. Hence, the system $S_\alpha$ is convergent. The normal form computation pushes the elements of K to the right. Then we may interpret $\mathrm{IRR}(S_\alpha)$ as the set of pairs in M × K. The semidirect product $M \rtimes_\alpha K$ is defined by the set M × K together with the multiplication

\[ (m, x) \cdot (n, y) = (m \cdot {}^{x}n,\ xy) \]

By construction $M \rtimes_\alpha K = \Sigma^*/S_\alpha$ is the quotient monoid of the free product M ∗ K with the (additional) defining relations $xm = {}^{x}m\,x$. Moreover, M and K are embedded in $M \rtimes_\alpha K$ via M ≅ M × {1} and K ≅ {1} × K, which means that we may consider M and K as submonoids of $M \rtimes_\alpha K$. If M and K are groups, then M is a normal subgroup; the conjugation action $(1, x)(m, 1)(1, x^{-1})$ with x ∈ K, m ∈ M, results in $({}^{x}m, 1) = (\alpha(x)(m), 1)$ and, therefore, it becomes an automorphism α(x) ∈ Aut(M).

As an example, we now consider the case K = M = ℤ. There are exactly two semidirect products, the direct product and the non-Abelian group ℤ ⋊ ℤ with the multiplication

\[ (m, x) \cdot (n, y) = (m + (-1)^x \cdot n,\ x + y) \]
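The multiplication rule of the non-Abelian group ℤ ⋊ ℤ is easy to experiment with. The following sketch encodes elements as pairs of Python integers; the function names `sign`, `mul` and `inv` are hypothetical. It uses the parity of x instead of the power $(-1)^x$ so that negative exponents stay within the integers.

```python
from itertools import product

def sign(x):
    """The value (-1)^x for an integer x, which may be negative."""
    return -1 if x % 2 else 1

def mul(p, q):
    """(m, x) * (n, y) = (m + (-1)^x * n, x + y)."""
    (m, x), (n, y) = p, q
    return (m + sign(x) * n, x + y)

def inv(p):
    """Inverse of (m, x) with respect to the neutral element (0, 0)."""
    m, x = p
    return (-sign(x) * m, -x)

a, t = (1, 0), (0, 1)
print(mul(a, t), mul(t, a))           # two different results: the group is not Abelian
print(mul(mul(t, (5, 0)), inv(t)))    # conjugation by (0, 1) negates the first component
```

Conjugating (m, 0) by (0, x) yields $((-1)^x m, 0)$, in line with the general statement that conjugation by (1, x) realizes α(x).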
For the case K = ℤ/2ℤ and M = ℤ, the situation is quite similar. We also get only one non-Abelian semidirect product ℤ ⋊ (ℤ/2ℤ) with the multiplication as above, but in the second component the computation is modulo 2.

Example 8.17. The following are typical instances of semidirect products:
– The non-Abelian semidirect product ℤ ⋊ (ℤ/2ℤ) is isomorphic to the free product (ℤ/2ℤ) ∗ (ℤ/2ℤ). Indeed, let a = (1, 1) ∈ ℤ ⋊ (ℤ/2ℤ) and b = (0, 1) ∈ ℤ ⋊ (ℤ/2ℤ), then $a^2 = b^2 = (0, 0)$. Moreover, ab = (1, 0) and, therefore, we have a surjective homomorphism from $(ℤ/2ℤ) * (ℤ/2ℤ) = \{a, b\}^* / \{a^2 = b^2 = 1\}$ onto ℤ ⋊ (ℤ/2ℤ). The inverse homomorphism is given by $(m, x) \mapsto (ab)^m b^x$. It is a homomorphism since $(ab)^m = (ba)^{-m}$ in (ℤ/2ℤ) ∗ (ℤ/2ℤ) for m ∈ ℤ.
– The Baumslag–Solitar group BS(1, −1) was defined as the semidirect product ℤ ⋊ ℤ.
– The group BS(1, 2) corresponds to the semidirect product ℤ[1/2] ⋊ ℤ. Here ℤ[1/2] is the additive group of the fractions $p/2^k \in ℚ$ with p, k ∈ ℤ. The multiplication in ℤ[1/2] ⋊ ℤ is defined by

\[ (r, m) \cdot (s, n) = (r + 2^m s,\ m + n) \]

The proof that BS(1, 2) and ℤ[1/2] ⋊ ℤ are isomorphic is along the same lines as above. For BS(1, 2), we map a to (1, 0) and t to (0, 1) in ℤ[1/2] ⋊ ℤ. The defining relation $tat^{-1} = a^2$ holds in ℤ[1/2] ⋊ ℤ; the elements (1, 0) and (0, 1) generate the semidirect product; and, finally, the normal form description in Equation (8.1) shows that a ↦ (1, 0), t ↦ (0, 1) induces an injection. ◊
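The defining relation of BS(1, 2) can be verified inside ℤ[1/2] ⋊ ℤ with exact rational arithmetic. This is a minimal sketch in which elements are pairs (r, m) with r a `fractions.Fraction`; the names `mul`, `inv`, `a` and `t` are chosen for illustration only.

```python
from fractions import Fraction

def mul(p, q):
    """(r, m) * (s, n) = (r + 2^m * s, m + n) with exact fractions."""
    (r, m), (s, n) = p, q
    return (r + Fraction(2) ** m * s, m + n)

def inv(p):
    """Inverse of (r, m) with respect to the neutral element (0, 0)."""
    r, m = p
    return (-Fraction(2) ** (-m) * r, -m)

a = (Fraction(1), 0)    # image of the generator a
t = (Fraction(0), 1)    # image of the generator t

lhs = mul(mul(t, a), inv(t))    # t a t^{-1}
rhs = mul(a, a)                 # a^2
print(lhs == rhs)
```

Conjugating a by $t^{-1}$ instead halves the first component, which is how the proper fractions $p/2^k$ arise from the two generators.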
8.8 Amalgamated products and HNN extensions

We explain the construction of amalgamated products and HNN extensions for groups, only. We start with the amalgamated product and consider three groups U, G, and H together with two injective homomorphisms φ : U → G and ψ : U → H. For example, U, G and H can simply be the finite cyclic groups ℤ/2ℤ, ℤ/4ℤ and ℤ/6ℤ, respectively. We now define the amalgamated product $G *_U H$ as the quotient group of the free product G ∗ H by

\[ G *_U H = G * H / \{\, \varphi(u) = \psi(u) \mid u \in U \,\} \tag{8.6} \]

We recognize the following universal property: the homomorphisms from $G *_U H$ to a group (or a monoid) K correspond exactly to the pairs (g, h) of homomorphisms g : G → K and h : H → K which fulfill the condition

g(φ(u)) = h(ψ(u)) for all u ∈ U

In Section 8.12, we consider the special linear group SL(2, ℤ) of 2×2 matrices with coefficients in ℤ and determinant 1. If we choose $S = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ and $R = \begin{pmatrix} 0 & -1 \\ 1 & 1 \end{pmatrix}$, then S has order 4 and R has order 6. In fact, we have $S^4 = R^6 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and $S^2 = R^3 = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$. In particular, we obtain a pair of homomorphisms ℤ/4ℤ → SL(2, ℤ) with 1 mod 4 ↦ S and ℤ/6ℤ → SL(2, ℤ) with 1 mod 6 ↦ R. Due to the universal property of amalgamated products, there is a homomorphism of $(ℤ/4ℤ) *_{ℤ/2ℤ} (ℤ/6ℤ)$ into SL(2, ℤ) because of the relation $S^2 = R^3$. Theorem 8.38 shows that this homomorphism in fact is an isomorphism.

Let us return to the general situation G ∗ H/{ φ(u) = ψ(u) | u ∈ U }. It is a priori unclear whether the natural maps $G \to G *_U H$ and $H \to G *_U H$ are injective. In principle, $G *_U H$ could be the trivial group. However, this is not the case; and to prove this we introduce a convergent semi-Thue system S ⊆ Σ∗ × Σ∗ which represents $G *_U H$. Here S and Σ will be infinite if G ∪ H is infinite. For this, we may simply consider the situation that φ : U → G and ψ : U → H are inclusions and that G ∩ H = U. This simplifies the notation.
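The orders of the two matrices S and R in SL(2, ℤ) mentioned above, and the relation between their powers, can be confirmed with a few lines of integer arithmetic. This sketch represents 2×2 matrices as nested tuples; the helpers `matmul`, `power` and `order` are hypothetical names.

```python
def matmul(X, Y):
    """Product of two 2x2 integer matrices given as nested tuples."""
    return tuple(
        tuple(sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

I2 = ((1, 0), (0, 1))
NEG_I2 = ((-1, 0), (0, -1))
S = ((0, 1), (-1, 0))
R = ((0, -1), (1, 1))

def power(X, n):
    """The n-th power of X for n >= 0."""
    P = I2
    for _ in range(n):
        P = matmul(P, X)
    return P

def order(X, bound=12):
    """Smallest k with 1 <= k <= bound and X^k = I, or None if there is none."""
    P = X
    for k in range(1, bound + 1):
        if P == I2:
            return k
        P = matmul(P, X)
    return None

print(order(S), order(R), power(S, 2) == power(R, 3) == NEG_I2)
```

The product SR is the upper triangular matrix with entries 1, 1, 0, 1, an element of infinite order, so the group generated by S and R is infinite although both generators have finite order.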
[Figure: the groups G and H with their common subgroup U]
We now divide G and H into cosets of U. For this, we choose a complete system A ⊆ G of (left) coset representatives of U in G with 1 ∈ A and a complete system B ⊆ H of (left) coset representatives of U in H with 1 ∈ B. As alphabets we choose $\Sigma_A = (U \cup A) \setminus \{1\}$ and $\Sigma_B = (U \cup B) \setminus \{1\}$ and $\Sigma = \Sigma_A \cup \Sigma_B$. Each element g ∈ G ∪ H has a unique representation as a word cu ∈ Σ∗ with c ∈ A ∪ B and u ∈ U. (For the neutral element 1 ∈ G ∪ H, we have c = u = 1 ∈ A ∩ B ∩ U, and cu = 1 is the empty word.) More precisely, there is a representation of length 0 (g = 1), length 1 (1 ≠ g ∈ U ∪ A ∪ B) or length 2 (otherwise). We denote elements in G or H with square brackets. The semi-Thue system S consists of the following rules:

\[
\begin{array}{ll}
\upsilon w \to au & \text{if } \upsilon, w \in \Sigma_A,\ u \in U,\ a \in A,\ [\upsilon w] = [au] \text{ and } \upsilon \neq a \\
\upsilon w \to bu & \text{if } \upsilon, w \in \Sigma_B,\ u \in U,\ b \in B,\ [\upsilon w] = [bu] \text{ and } \upsilon \neq b
\end{array}
\tag{8.7}
\]
Theorem 8.18. The semi-Thue system S of (8.7) is convergent and Σ∗ /S is canonically isomorphic to the amalgamated free product G ∗U H. Proof: First, we convince ourselves that Σ∗ /S and G ∗U H are isomorphic. The isomorphism is induced by the natural inclusions Σ A → G and Σ B → H. It is therefore canonical, and this justifies the notation Σ∗ /S = G ∗U H. Because Σ A ∩ Σ B ⊆ U, the natural homomorphism Σ∗ → G ∗U H is well defined since in the amalgamated product all u ∈ G ∩ U are identified with u ∈ H ∩ U. The group G ∗U H is generated by the elements of G and H, and these generators can already be represented by words from Σ∗ with length at most two. So the map Σ∗ → G ∗U H is surjective. Due to the
claims [υw] = [au] and [υw] = [bu], also the induced map $\pi : \Sigma^*/S \to G *_U H$ is well defined and a surjective homomorphism. We have to show that π is injective. For this purpose, we define the maps $\varphi_G : G \to \Sigma^*/S$ and $\varphi_H : H \to \Sigma^*/S$ by $\varphi_G(g) = au$ and $\varphi_H(h) = bu$ if [g] = [au], a ∈ A, u ∈ U and [h] = [bu], b ∈ B, u ∈ U, respectively. These maps are well defined, because A and B are complete sets of representatives of the left cosets of U in G and H, respectively. We have to show that $\varphi_G$ and $\varphi_H$ are homomorphisms. Here the system S comes into play. Let $\varphi_G(g) = au$, $\varphi_G(g') = a'u'$ and $\varphi_G(gg') = a''u''$. With a maximum of four rewriting steps, we obtain

\[ \varphi_G(g)\,\varphi_G(g') = a u a' u' \overset{\leqslant 1}{\underset{S}{\Longrightarrow}} a \tilde{a} \tilde{u} u' \overset{\leqslant 1}{\underset{S}{\Longrightarrow}} a \tilde{a} \hat{u} \overset{\leqslant 2}{\underset{S}{\Longrightarrow}} a'' u'' = \varphi_G(gg') \]
It follows that $\varphi_G$ is a homomorphism. Analogously, $\varphi_H$ is a homomorphism. For all u ∈ U = G ∩ H we have $\varphi_G(u) = \varphi_H(u)$, which means that $\varphi_G$ and $\varphi_H$ induce a homomorphism $\varphi : G *_U H \to \Sigma^*/S$. Finally, φ(π(c)) = c for all c ∈ Σ, which gives that π is injective. The proof that $\Sigma^*/S = G *_U H$ was a bit tedious, but purely mechanical.

Next, we show that the system S is convergent. This requires proving termination and local confluence. We start with the termination. Let x ∈ Σ∗ be a word of length n. We show that for x at most $O(n^2)$ rewriting steps are possible. There are at most n length reductions. Let now C = Σ \ U. We consider the length-preserving rule υw → cu with c ∈ A ∪ B, u ∈ U. Then |cu| = 2 and c ∈ C. For w ∈ U, we would get υ = c which is excluded. Hence, without loss of generality, let w = a′ ∈ A, and then also c = a ∈ A. For υ ∈ A the rule has the form a″a′ → au. An application of the rule a″a′ → au reduces the number of letters from C. Therefore, altogether a rule of this type can be applied at most n times because no rule application increases the number of letters from C. Thus it is enough to count how often a rule of the type u′a′ → au with a, a′ ∈ A and u, u′ ∈ U can be applied. By such a rule application a letter from U moves to the right, and this can happen at most n times per letter before it reaches the right end. Since we have at most n letters from U, we therefore obtain a maximum of $O(n^2)$ rule applications.

The proof of the local confluence is pure routine by considering all critical pairs. The critical pairs arise from words of length 3. In such a word, all three letters either belong to $\Sigma_A$ or belong to $\Sigma_B$. Indeed, if a word υuw consisting of three letters υ, u, w has the middle letter u ∈ U, then either υu is irreducible or υ ∈ U. All words whose letters belong entirely to $\Sigma_A$ or to $\Sigma_B$ have an interpretation in G or H and therefore a unique irreducible normal form, either the form au or the form bu with a ∈ A, b ∈ B and u ∈ U.
Thus, the system S represents the amalgamated product G ∗_U H and is convergent. A direct consequence of Theorem 8.18 is that G and H embed into G ∗_U H as desired: all words of the form au or bu with a ∈ A, b ∈ B and u ∈ U are irreducible. This implies the announced embedding of G ∪ H into the amalgamated product G ∗_U H. The normal forms in IRR(S) consist of alternating sequences of letters from A and B, and, possibly, a last letter from U. Therefore, each word w ∈ IRR(S) can be
written as w = a_1 b_1 ⋯ a_m b_m u with m ⩾ 0, and all a_i ∈ A, b_j ∈ B and u ∈ U. If we require that, in addition, a_i ≠ 1 for 2 ⩽ i ⩽ m and b_j ≠ 1 for 1 ⩽ j < m, then the representation is unique. For many applications, this system S is unnecessarily meticulous. Often we content ourselves with the alphabet Γ = (G ∪ H) \ {1} and rules which reduce words of length 2 or 3 to words in Γ ∪ {1} in each step. For this purpose, define S′ as follows: S′ = { υuw → [υuw] | u ∈ U ∧ (υ, w ∈ Γ ∩ G ∨ υ, w ∈ Γ ∩ H) }. If we start with a word w ∈ Γ*, then we obtain a word which is irreducible with respect to S′ and has the form w = g_1 ⋯ g_n with n ⩾ 0, where g_i ∈ Γ for all 1 ⩽ i ⩽ n and g_i ∈ G ⇔ g_{i+1} ∉ G for all 1 ⩽ i < n. To implement this effectively, we assume that G is finitely generated by an alphabet Γ_0 and H is finitely generated by an alphabet Γ_1 where Γ_0 ∩ Γ_1 = ∅. We further assume that we can effectively determine if a word u ∈ Γ_i* belongs to U. Also we assume that, for a word u ∈ U ∩ Γ_i*, we can effectively compute a representation as a word in Γ_{1−i}*. Finally, we must be able to decide whether g = 1 for g ∈ Γ_0* ∪ Γ_1*. Let us summarize this again. The word problem in G ∗_U H is solvable if the following conditions are satisfied: the word problems in G and H are solvable, the membership problem for the subgroup U is solvable in G and H, and for u ∈ U, we can effectively switch between the representations as words in Γ_0* and Γ_1*. We now consider another important construction in combinatorial group theory, the HNN extensions, which are named after their inventors Graham Higman (1917–2008), Bernhard Hermann Neumann (1909–2002), and Hanna Neumann (1914–1971). Their explicit construction is given in [50]. We start with a group G and an isomorphism φ : A → B between subgroups A and B of G. The central idea is to embed G into a bigger group in which φ becomes an inner automorphism, that is, φ is realized by an element t with tat⁻¹ = φ(a) for all a ∈ A.
The required property leads to the following construction. Let t be a new symbol and F(t) the free group generated by t; then, we have F(t) ≅ ℤ. The HNN extension of G with respect to φ : A → B is the following quotient group: HNN(G; A, B, φ) = G ∗ F(t)/{ tat⁻¹ = φ(a) | a ∈ A }. In order to see that G and F(t) can be embedded into the HNN extension, we argue analogously as in the case of amalgamated products. The formal proofs just follow the already known technique step by step. As the alphabet we now choose Σ = {t, t̄} ∪ (G \ {1}), and thus obtain a canonical surjective homomorphism Σ* → HNN(G; A, B, φ). We choose complete systems
C ⊆ G and D ⊆ G of (left) coset representatives of A in G with 1 ∈ C and of B in G with 1 ∈ D, respectively. Each g ∈ G has a unique representation g = ca = db with a ∈ A, b ∈ B, c ∈ C, and d ∈ D. In the HNN extension we have: gt̄ = cat̄ = ct̄φ(a) and gt = dbt = dtφ⁻¹(b). Therefore, we may push the elements of A to the right by means of t⁻¹ = t̄ and analogously the elements of B by means of t. The representatives remain on the left side. This consideration leads us to the following semi-Thue system S:
\[
\begin{array}{rcll}
t\bar{t} & \to & 1 & \\
\bar{t}t & \to & 1 & \\
gh & \to & [gh] & \text{if } g, h \in G \setminus \{1\} \text{ and } gh = [gh] \text{ in } G \\
g\bar{t} & \to & c\,\bar{t}\,\varphi(a) & \text{if } c \in C,\ 1 \neq a \in A \text{ and } g = [ca] \text{ in } G \\
gt & \to & d\,t\,\varphi^{-1}(b) & \text{if } d \in D,\ 1 \neq b \in B \text{ and } g = [db] \text{ in } G
\end{array}
\tag{8.8}
\]
The isomorphism between Σ*/S and HNN(G; A, B, φ) and the local confluence of S are easy to prove, that is, the proofs are purely mechanical and follow the scheme we had for amalgamated products. We have to pay more attention to termination since the system S may even increase lengths. Again, let x ∈ Σ* be a word of length n. First, we mark all letters in the word except the t's and the t̄'s. Then we remove the marking for those 1 ≠ c ∈ C and 1 ≠ d ∈ D where there is a t̄ or t, respectively, on the right. We retain this rule for the markings during the rewriting steps. After a rule application, a new marking can only be generated if, for instance, there is a factor ct̄t and t̄t gets deleted. This reduces the number of t's and t̄'s, and the number of marks is therefore limited in total by n. Hence, each rule application reduces the number of t's and t̄'s, or it reduces the number of markings, or some marking is passed exactly one position closer to the right edge. We thus obtain a maximum of n² possible rewriting steps. In particular, we obtain the following theorem: Theorem 8.19. The system S of equation (8.8) is convergent, and Σ*/S is canonically isomorphic to the HNN extension G ∗ F(t)/{ tat⁻¹ = φ(a) | a ∈ A }. From Theorem 8.19, we obtain that the irreducible normal forms for HNN extensions have the following unique representation: g = r_1 θ_1 ⋯ r_n θ_n h with n ⩾ 0, h ∈ G and either r_i ∈ C and θ_i = t̄ or r_i ∈ D and θ_i = t for all 1 ⩽ i ⩽ n. In particular, G and F(t) embed into the HNN extension, and the HNN extension is always a group that has ℤ as a quotient group and, therefore, contains ℤ as a subgroup, too. As in the case of amalgamated products the system S from (8.8) is unnecessarily meticulous. In order to solve the word problem, we do not need a unique normal form. It is enough to consider Britton reductions which were introduced in [10] by John
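Leslie Britton (1927–1994). By definition, a Britton reduction means to apply one of the following rules:
\[
\begin{array}{rcll}
gh & \to & [gh] & \text{for } g, h \in G \setminus \{1\} \text{ and } gh = [gh] \text{ in } G \\
t\,a\,\bar{t} & \to & \varphi(a) & \text{for } a \in A \\
\bar{t}\,b\,t & \to & \varphi^{-1}(b) & \text{for } b \in B
\end{array}
\]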
The new system is length reducing and for each word from Σ* it provides a Britton reduced word of the form g = g_1 θ_1 ⋯ g_n θ_n h with n ⩾ 0, g_1, . . . , g_n, h ∈ G, and θ_i ∈ {t, t̄} for i = 1, . . . , n and g_i ≠ 1 for 2 ⩽ i ⩽ n. This Britton reduced normal form is not unique: for instance, tφ⁻¹(b) and bt are both Britton reduced but represent the same element in HNN(G; A, B, φ). Let g = g_1 θ_1 ⋯ g_n θ_n h be Britton reduced and θ_1, . . . , θ_n the sequence of the occurring t's and t̄'s. If we now further apply rules from the convergent system S of (8.8), then we realize that the sequence θ_1, . . . , θ_n does not change anymore. This sequence is therefore determined solely by the group element g. Hence, g ∈ G if and only if n = 0, and g = 1 if and only if n = 0 and h = 1 in G. If G has a solvable word problem and if we can effectively compute Britton reductions, then the HNN extension G ∗ F(t)/{ tat⁻¹ = φ(a) | a ∈ A } has a solvable word problem. Example 8.20. The Baumslag–Solitar groups BS(p, q) defined in Example 8.8 are HNN extensions of ℤ. The Waack group W defined in Example 8.9 is an HNN extension of BS(1, 2) = F(a, t)/{tat⁻¹ = a²}. Actually, we presented two different realizations: BS(1, 2) ∗ F(s)/{sas⁻¹ = a²} and BS(1, 2) ∗ F(r)/{rar⁻¹ = a}. Thus, different looking HNN extensions may define isomorphic groups. ◊ The similarity of the constructions for amalgamated products and HNN extensions is not a coincidence: it is part of the Bass–Serre theory. This theory was outlined by Jean-Pierre Serre (born 1926) [93] and brought to its final form by Hyman Bass (born 1932).
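The Britton reduction described above can be sketched for the Baumslag–Solitar groups, viewed as HNN extensions of ℤ. The sketch assumes the presentation BS(p, q) = F(a, t)/{ta^p t⁻¹ = a^q} with p, q ⩾ 1 (consistent with BS(1, 2) above, but the convention of Example 8.8 is an assumption here), so that A = ⟨a^p⟩, B = ⟨a^q⟩ and φ(a^{pj}) = a^{qj}. The encoding of words (integers k for a^k, the strings 't' and 'T' for t and t̄) and the function names are hypothetical choices for this illustration.

```python
def britton_reduce(word, p, q):
    """Britton-reduce a word in BS(p, q) = <a, t | t a^p t^-1 = a^q>.

    `word` is a list whose entries are integers k (standing for a^k)
    or the strings 't' and 'T' (T stands for the inverse of t).
    Assumes p >= 1 and q >= 1.
    """
    stack = []  # invariant: the stack always holds a Britton reduced word

    def push_power(k):
        # Rule gh -> [gh]: merge adjacent powers of a and drop a^0.
        if stack and isinstance(stack[-1], int):
            k += stack.pop()
        if k != 0:
            stack.append(k)

    for tok in word:
        if isinstance(tok, int):
            push_power(tok)
            continue
        # tok is 't' or 'T': look for a factor  theta a^k tok  at the end.
        has_power = bool(stack) and isinstance(stack[-1], int)
        k = stack[-1] if has_power else 0
        i = len(stack) - (2 if has_power else 1)  # position of theta
        if i >= 0 and tok == 'T' and stack[i] == 't' and k % p == 0:
            del stack[i:]
            push_power(k // p * q)   # t a^(pj) T -> a^(qj)
        elif i >= 0 and tok == 't' and stack[i] == 'T' and k % q == 0:
            del stack[i:]
            push_power(k // q * p)   # T a^(qj) t -> a^(pj)
        else:
            stack.append(tok)
    return stack


def is_trivial(word, p, q):
    # A word represents 1 iff its Britton reduced form has n = 0 and h = 1.
    return britton_reduce(word, p, q) == []


if __name__ == "__main__":
    print(is_trivial(['t', 1, 'T', -2], 1, 2))    # t a t^-1 a^-2 in BS(1, 2)
    print(britton_reduce(['T', 1, 't'], 1, 2))    # no rule applies
```

The stack keeps the already processed prefix Britton reduced, so a new reducible factor can only appear at the right end when a letter t or t̄ is read. This gives a single left-to-right pass; the exponents, however, may grow during the reduction.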
The Baumslag group BG(1, 2)
Let us consider another HNN extension BG(1, 2) given as BG(1, 2) = BS(1, 2) ∗ F(b)/{bab⁻¹ = t}. It is called the Baumslag group. The generator t is redundant and BG(1, 2) is generated by a and b with a single defining relation: BG(1, 2) = F(a, b)/{(bab⁻¹) a (ba⁻¹b⁻¹) = a²}. At first glance, this might appear to be a definition similar to the one for the Waack group W above. However, the behavior is quite different. Baumslag constructed his
group BG(1, 2) as an example of a noncyclic one-relator group all of whose finite quotients are cyclic. Later Gersten showed that the so-called Dehn function grows faster than any elementary function. We do not define these notions here, but due to the discovery of Gersten, the group BG(1, 2) is also called the Baumslag–Gersten group in the literature [84]. The word problem in BG(1, 2) is solvable because we can effectively compute a Britton reduced normal form as follows. The input is a word w ∈ {a, a⁻¹, b, b⁻¹, t, t⁻¹}*, which we decompose as w = g_0 β_1 g_1 ⋯ β_m g_m with m ⩾ 0, β_i ∈ {b, b⁻¹} and g_i ∈ {a, a⁻¹, t, t⁻¹}*. If m = 0, then w = g_0 ∈ BS(1, 2) is Britton reduced. If m = 1, then w ∉ BS(1, 2), and w is Britton reduced. Now, let m ⩾ 2. We consider all factors of the form β_i g_i β_{i+1} with β_i = β_{i+1}⁻¹ for 1 ⩽ i < m. If β_i = b, then we test if g_i ∈ BS(1, 2) = ℤ[1/2] ⋊ ℤ has the form g_i = (τ, 0) with τ ∈ ℤ. If yes, then g_i = a^τ in BS(1, 2), and we replace the factor β_i g_i β_{i+1} by t^τ or (0, τ), respectively, if we calculate directly in the decomposition ℤ[1/2] ⋊ ℤ. Analogously, if β_i = b⁻¹ and g_i = (0, τ) with τ ∈ ℤ, then we replace β_i g_i β_{i+1} by a^τ or (τ, 0), respectively. After a maximum of m such steps, we obtain a Britton reduced word. This looks harmless enough, but the exponents can get huge very quickly; see Exercise 8.11. This is closely related to the fast growing Dehn function; and for this reason, the group BG(1, 2) was for many years a candidate for a one-relator group with an extremely difficult word problem. However, it had to be removed from that list [31, 78]: the word problem for BG(1, 2) is solvable in cubic time using a data structure called power circuit.
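To get a feeling for how quickly the exponents can grow, consider the following family of words (our own illustration, not necessarily the one intended in Exercise 8.11): let w_0 = a and w_{k+1} = (b w_k b⁻¹) a (b w_k⁻¹ b⁻¹). Using bab⁻¹ = t and tat⁻¹ = a², one checks by induction that w_k = a^{T(k)} in BG(1, 2), where T(0) = 1 and T(k+1) = 2^{T(k)}: indeed b a^τ b⁻¹ = t^τ and t^τ a t^{−τ} = a^{2^τ} for τ ⩾ 0. The word length satisfies |w_{k+1}| = 2|w_k| + 5 and is therefore only exponential in k, whereas the exponent is a tower of twos. The function names below are hypothetical.

```python
def tower(k):
    """Exponent T(k) with w_k = a^T(k): T(0) = 1 and T(k+1) = 2 ** T(k)."""
    value = 1
    for _ in range(k):
        value = 2 ** value
    return value


def word_length(k):
    """Length of w_k: |w_0| = 1 and |w_(k+1)| = 2 * |w_k| + 5."""
    length = 1
    for _ in range(k):
        length = 2 * length + 5
    return length


if __name__ == "__main__":
    # Print k, the length of w_k, and the exponent of the reduced form a^T(k).
    for k in range(5):
        print(k, word_length(k), tower(k))
```

Already for k = 5 the exponent T(5) = 2^65536 has almost twenty thousand decimal digits, although w_5 is a word of fewer than 200 letters. Writing exponents in binary therefore does not suffice for an efficient algorithm.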
8.9 Rational sets and Benois’ theorem
As usual, if M is a monoid and R, R′ are subsets of M, then R ⋅ R′ denotes the product R ⋅ R′ = { xy ∈ M | x ∈ R, y ∈ R′ } and R* denotes the submonoid of M which is generated by R. If we let R⁰ = {1} and Rⁿ⁺¹ = R ⋅ Rⁿ, we have R* = ⋃{ Rⁿ | n ∈ ℕ }. As in Section 7.2, we define the family RAT(M) of rational sets inductively:
– Finite subsets R ⊆ M are rational.
– If R, R′ are rational, then so are R ∪ R′, R ⋅ R′ and R*.
Every finitely generated submonoid N ⊆ M is rational, but there also exist rational submonoids which are not finitely generated. The standard example for this is N = {(0, 0)} ∪ { (m, n) ∈ ℕ × ℕ | m ⩾ 1 }; N is a submonoid of ℕ × ℕ but it cannot be finitely generated because for any finite set of pairs (m_i, n_i) the element (1, 1 + max{n_i}) is in N but not in the submonoid generated by the pairs (m_i, n_i). The submonoid N is
rational because N = {(0, 0)} ∪ {(1, 0)} + ({(1, 0)} ∪ {(0, 1)})*. For groups the situation is different. Theorem 8.21. Let H be a rational subgroup of a group G. Then H is finitely generated. Proof: Since H is rational, there is a finite subset A ⊆ G such that H can be described as a rational set over A*. For this let π : A* → G be the homomorphism induced by the inclusion of A in G; then there is a regular set R ⊆ A* with π(R) = H. By Kleene's theorem, Theorem 7.17, the language R is recognized by a deterministic finite automaton 𝒜 over the alphabet A and therefore π(L(𝒜)) = H. Without loss of generality we have A = A⁻¹, and for each u ∈ A* there is a word ū ∈ A* with π(uū) = π(ūu) = 1. Let n be the number of states of 𝒜. It is enough to show that the finite set π(W) with W = { uυū ∈ A* | π(uυū) ∈ H and |uυ| ⩽ n } generates the subgroup H. Since π(L(𝒜)) = H, it is sufficient to show that for each word w ∈ L(𝒜) the element π(w) is in the subgroup generated by π(W). We show this by induction on |w|. If |w| ⩽ n, then w ∈ W because in this case we may choose the empty word for u. Now, let |w| > n. When reading the word in 𝒜, we must see one of the n states at least twice. Therefore, let uυ be a prefix of w with |u| < |uυ| ⩽ n such that the words u and uυ lead to the same state of 𝒜. We write w = uυz. Since the automaton is deterministic, the word uz is recognized, too. Hence, by induction, π(uz) is in the subgroup generated by π(W). Because π(uυū uz) = π(uυz) = π(w) ∈ H and π(uz) ∈ H, we must have π(uυū) ∈ H. Therefore, uυū ∈ W, and π(w) = π(uυū) π(uz) is in the subgroup generated by π(W). A classical theorem in formal language theory, Theorem 7.17, states that the rational sets of finitely generated free monoids form an effective Boolean algebra. As soon as we meet partial commutation, the situation changes. Let Σ = {a, b, c} be an alphabet with three generators. Consider the (free partially commutative) monoid M = {a, b, c}*/{ab = ba, bc = cb}.
Exercise 8.6 shows that the family of rational subsets in M is not closed under intersection. We are particularly interested in rational sets of finitely generated free groups F(Σ) and want to show that they form an effective Boolean algebra, that is, the family of rational sets is closed under finite union and complementation, and moreover we may effectively compute the finite union and the complementation (and hence the intersection). This statement about free groups was published in 1969 by Benois [6]. Her proof can be extended to other finitely generated groups and monoids which may be presented by certain confluent semi-Thue systems. We start in a very general setting. For a finite alphabet Γ, a finite semi-Thue system S ⊆ Γ* × Γ* is called monadic if the following condition is satisfied: for all rules (ℓ, r) ∈ S, we have |ℓ| > |r| and |r| ⩽ 1. A monadic system is therefore length reducing and right components are either empty or have exactly one letter. For infinite systems, we additionally require
that for each r ∈ Γ ∪ {1} the set L(r) = { ℓ ∈ Γ* | (ℓ, r) ∈ S } of the left components is regular. For finite systems this requirement is automatically satisfied. Theorem 8.22. Let Γ be finite, S ⊆ Γ* × Γ* be a confluent monadic semi-Thue system, and let M = Γ*/S be the quotient monoid. Then the family of the rational sets RAT(M) is an effective Boolean algebra. In particular, membership in a rational set is decidable. Proof: Let π : Γ* → M = Γ*/S be the canonical projection. As just described, we define for r ∈ Γ ∪ {1} the regular set L(r) = { ℓ ∈ Γ* | (ℓ, r) ∈ S } of the left components. For the specification of a rational set R ⊆ M, we use a nondeterministic finite automaton A on Γ such that π(L(A)) = R. We allow ε-edges, which could be removed if necessary, but we require that the edges are labeled only with letters or the empty word. If nondeterministic finite automata for two rational sets R and R′ are given, then the standard constructions in Section 7.2 provide automata for R ∪ R′, R ⋅ R′, and R*. Intersection can be obtained from union and complementation. Hence, for a nondeterministic finite automaton A, it suffices to effectively construct another nondeterministic finite automaton A″ with π(L(A″)) = M \ π(L(A)). Next, we give the construction of A″ step by step. First, we enlarge the language L(A) within Γ* without changing R = π(L(A)). We “flood” the automaton with more edges. Let Q be the state set of A. For each pair (p, q) ∈ Q², we define a regular set L(p, q) ⊆ Γ*, which consists of all words that we would accept if p were the only initial state and q the only final state. Then we test for each r ∈ Γ ∪ {1} whether L(p, q) ∩ L(r) ≠ ∅. This is an effective test. If the intersection is nonempty, then we insert an edge from p to q with label r in the automaton A, provided it does not already exist. We iterate this procedure until (after a maximum of (|Γ| + 1)|Q|² rounds) no new edges can be included.
Note that, by the monadic property, we increase the number of edges but we do not insert new states. The set π(L(A)) was not changed: to see this, consider an accepted word urυ, which, when reading r, passes a new edge from the state p to the state q. Then there is a rule (ℓ, r) ∈ S with ℓ ∈ L(p, q). Hence, also uℓυ is accepted, and the accepting path uses the new edge less frequently. Moreover, π(urυ) = π(uℓυ), which means that the flooding does not change the rational set R. By a slight abuse of language, we continue to refer to the new automaton as A. It now has the following important property: from u ⇒_S υ and u ∈ L(A) it follows that υ ∈ L(A). We define
\[
\widehat{R} = \{\, \hat{u} \in \mathrm{IRR}(S) \mid \exists u \in L(A) : u \overset{*}{\underset{S}{\Longrightarrow}} \hat{u} \,\}
\]
Then R̂ is a set of normal forms for R, and π provides a bijection between R̂ and R. Furthermore, R̂ ⊆ L(A). As the next step, we construct a nondeterministic finite automaton A′ which accepts the complement Γ* \ L(A); see Theorem 7.17. The set π(L(A′)) contains more elements than the complement M \ R, but if we consider the intersection with the irreducible words, then, as desired, we obtain
\[
L(A') \cap \mathrm{IRR}(S) = \mathrm{IRR}(S) \setminus \widehat{R} = \mathrm{IRR}(S) \setminus L(A)
\]
It remains to show that IRR(S) is regular. For this purpose, we write the complement of IRR(S) as the union of finitely many regular languages Γ* L(r) Γ*. Thus, we have effectively found an automaton A″ with L(A″) = IRR(S) \ L(A). It follows that π(L(A″)) = M \ R, which proves the theorem. Corollary 8.23 (Benois). The rational sets in a finitely generated free group form an effective Boolean algebra. Proof: Example 8.6 shows that the monadic system S = { aā → 1 | a ∈ Γ } is strongly confluent.
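For free groups the flooding step in the proof of Theorem 8.22 becomes very concrete, and it already yields a membership test for rational subsets. The following is a minimal sketch under stated assumptions, not the construction of the text verbatim: letters of Σ are lowercase characters, the inverse of a letter is its uppercase version, an automaton is a set of triples (p, label, q) with label "" for an ε-edge, and all function names are hypothetical. Instead of testing L(p, q) ∩ L(1) ≠ ∅ literally, the sketch adds an ε-edge from p to q whenever a path p →a ⋯ →ā q exists with only ε-edges in between; the surrounding ε-edges are handled by the ε-closure during acceptance.

```python
def inv(letter):
    # Assumed encoding: 'a' and 'A' are mutually inverse letters.
    return letter.swapcase()


def free_reduce(word):
    """Cancel factors a a^-1 until the word is reduced."""
    stack = []
    for c in word:
        if stack and stack[-1] == inv(c):
            stack.pop()
        else:
            stack.append(c)
    return "".join(stack)


def eps_closure(states, edges):
    """All states reachable from `states` via edges labeled ""."""
    closure, todo = set(states), list(states)
    while todo:
        p = todo.pop()
        for (x, label, y) in edges:
            if x == p and label == "" and y not in closure:
                closure.add(y)
                todo.append(y)
    return closure


def saturate(edges):
    """Flooding for S = { a a^-1 -> 1 }: only edges are added, no states."""
    edges = set(edges)
    changed = True
    while changed:
        changed = False
        for (p, a, p1) in list(edges):
            if a == "":
                continue
            for p2 in eps_closure({p1}, edges):
                for (x, b, q) in list(edges):
                    if x == p2 and b == inv(a) and (p, "", q) not in edges:
                        edges.add((p, "", q))
                        changed = True
    return edges


def accepts(edges, initial, finals, word):
    current = eps_closure({initial}, edges)
    for c in word:
        step = {q for (p, label, q) in edges if p in current and label == c}
        current = eps_closure(step, edges)
    return bool(current & set(finals))


def member(edges, initial, finals, word):
    """Does the group element given by `word` lie in pi(L(A))?"""
    return accepts(saturate(edges), initial, finals, free_reduce(word))


if __name__ == "__main__":
    # Automaton for the language (ab)* b^-1 with initial state 0 and final state 2.
    example = {(0, "a", 1), (1, "b", 0), (0, "B", 2)}
    print(member(example, 0, {2}, "a"))   # a = (ab) b^-1 in the free group
    print(member(example, 0, {2}, "ab"))
```

The test is correct because the saturated automaton accepts every word obtained by free reduction from an accepted word; hence a group element lies in the rational set if and only if its reduced word is accepted. The example shows that the word a is not in the original language but represents an element of the rational set.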
8.10 Free groups
A central theorem about free groups says that their subgroups are free, too. This theorem was first shown by Jacob Nielsen (1890–1959) for finitely generated subgroups [82] and by Otto Schreier (1901–1929) in general. Like vector spaces and free monoids, free groups are, up to isomorphism, completely determined by their rank, that is, by the cardinality of their basis. For subgroups of finite index we present the rank formula by Schreier, which allows the computation of the rank of the subgroup. We already know from Corollary 8.23 that the membership problem in finitely generated subgroups is solvable, because finitely generated subgroups are rational subsets. Some classical proofs of these results are technical and complicated. In a modern approach one typically follows the ideas of Serre. In [93], it is shown that a group is free if and only if it acts “freely and without inversion” on trees. Another approach uses Stallings graphs [99]. These are special deterministic automata on free groups. They are named after John Robert Stallings, Jr. (1935–2008). The approaches of Serre and Stallings are closely related. Essential parts of Stallings' method can be derived from the ideas of Benois, which we presented in Section 8.9. These results were historically more than a decade ahead of Stallings' graphs, but hardly known among group theorists. In the following, let F = F(Σ) be a free group with basis Σ. The rank of F is defined as the cardinality |Σ| of Σ. Let us convince ourselves that the rank is well defined, that is, it does not depend on the chosen basis. The proof uses the well-known fact that a vector space, up to isomorphism, is uniquely determined by its dimension. Theorem 8.24. Let F(Σ) and F(Σ′) be isomorphic free groups. Then there is a bijection between the bases Σ and Σ′.
Proof: Let F = F(Σ). Consider the subgroup N of F generated by the squares. N consists of products of the form x_1² ⋯ x_k² with k ∈ ℕ and x_i ∈ F. The subgroup N of F is normal because z(x_1² ⋯ x_k²)z⁻¹ = (zx_1z⁻¹)² ⋯ (zx_kz⁻¹)². In the quotient group F/N all elements have order 2. It is therefore Abelian, because xy = xy(yx)² = (x(yy)x)yx = yx ∈ F/N. Therefore, F/N is an 𝔽₂-vector space. Now we consider the 𝔽₂-vector space 𝔽₂^(Σ) which consists of the maps χ : Σ → 𝔽₂ with χ(b) = 0 for almost all letters b. The set { χ_a | a ∈ Σ } with χ_a(a) = 1 and χ_a(b) = 0 for b ≠ a forms a basis of 𝔽₂^(Σ); φ : 𝔽₂^(Σ) → F/N, χ_a ↦ a ∈ F/N, defines a surjective linear map. The inverse map is induced by the homomorphism ρ : F(Σ) → 𝔽₂^(Σ), a ↦ χ_a, and this homomorphism gives a linear map ψ : F(Σ)/N → 𝔽₂^(Σ), a ↦ χ_a; and we have χ_a = ψ(φ(χ_a)) for all χ_a from the basis of 𝔽₂^(Σ). In particular, φ is injective and hence an isomorphism between 𝔽₂-vector spaces:
\[
F(\Sigma) \longrightarrow F/N \overset{\psi}{\longrightarrow} \mathbb{F}_2^{(\Sigma)} \overset{\varphi}{\longrightarrow} F/N
\]
The definition of F/N is independent of the basis of F; and therefore also the dimension of F/N as an 𝔽₂-vector space. This dimension is equal to the dimension of 𝔽₂^(Σ), that is, equal to |Σ|. Hence, if F(Σ) and F(Σ′) are isomorphic, then there is a bijection between Σ and Σ′. Certainly, if there is a bijection between Σ and Σ′, then F(Σ) and F(Σ′) are isomorphic. In the following, let Γ = Σ ∪ Σ⁻¹ and let π : Γ* → F be the homomorphism induced by the inclusion of Γ in F. We may interpret words in Γ* as group elements of F. For a word w ∈ Γ*, we often write w ∈ F instead of π(w) ∈ F. As usual, a word w ∈ Γ* is called reduced if it does not contain any factor aa⁻¹ with a ∈ Γ. Now, let G be a subgroup of F. We want to show that G is free. Our goal is more ambitious and, therefore, it can only be achieved with a little additional effort. We present a method for determining the rank of finitely generated groups. We start with the Schreier graph of the right cosets G\F for G in F. The set of vertices is V = G\F. The set E of directed edges consists of pairs (u, a) ∈ V × Γ with source s(u, a) = u and target t(u, a) = ua. Note that this graph may contain multiple edges and loops. The number of vertices is the index [F : G], which we denote by n. Here, n is finite or n = ∞. We label the edge (u, a) by λ(u, a) = a. If we define G as the unique initial and final state, then we obtain a deterministic automaton that recognizes the language L ⊆ Γ* with π(L) = G. It proves advantageous to move to a partial automaton. For this, let Q be the set of right cosets Gu that lie in the Schreier graph on a path which is labeled by a reduced word uυ with π(uυ) ∈ G. The set Q may be infinite, and we have Q = { Gu | ∃υ ∈ Γ* with π(uυ) ∈ G and uυ is reduced }. Now, let K(G, Σ) be the subgraph of the Schreier graph which is induced by Q. We let E be its edge set. The edges are labeled, and we have G ∈ Q.
Hence, we may interpret K(G, Σ) as an automaton by defining G to be the initial and final state. The automaton
accepts a language R ⊆ Γ* with π(R) ⊆ G. Since each element in G is represented by a reduced word, we actually have π(R) = G. We call the automaton K(G, Σ) the kernel of the Schreier graph. The graph K(G, Σ) = (Q, E) is connected and it has an edge labeling λ : E → Γ. We orient the edges via their labeling by E⁺ = { e ∈ E | λ(e) ∈ Σ }. An undirected edge is, according to our convention, a collection {e, ē} of two oriented edges e and ē. In our case e is a tuple e = (p, a) with p ∈ Q and a ∈ Γ. Letting ē = (pa, a⁻¹), we have λ(ē) = a⁻¹. If we want to stress that we consider an oriented edge, we call it an arc. Thus, an arc is the same as an oriented edge. A tree is a nonempty connected undirected graph (U, T) without loops, multiple edges or cycles. A tree (Q, T) with T ⊆ E is called a spanning tree of (Q, E). Like any connected graph, (Q, E) has a spanning tree (Q, T). In our applications, (Q, E) is connected and countable. In order to show that it has a spanning tree, we arrange the edges in some linear list (e_1, e_2, . . . ). Let T_0 = ∅ and inductively define T_i for i ⩾ 1 as follows: if the edges in T_{i−1} ∪ { e_j | j > i } connect all vertices of Q, then let T_i = T_{i−1}; otherwise let T_i = T_{i−1} ∪ {e_i}. Finally, define T = ⋃_{i∈ℕ} T_i; then (Q, T) becomes a spanning tree of (Q, E). For arbitrary graphs, the proof is similar using transfinite induction or Zorn's lemma. Theorem 8.25. Let G be a subgroup of F = F(Σ). Then G is a free group. In the notation just introduced, G is isomorphic to the free group F(∆) with ∆ = E⁺ \ T where (Q, T) is a spanning tree of the graph K(G, Σ). Moreover, if Q is finite and if m = |E|/2 ∈ ℕ ∪ {∞} denotes the number of undirected edges in K(G, Σ), then G has rank |∆| = m + 1 − |Q|. Proof: We interpret G in the free group F(E⁺). In other words, we interpret K(G, Σ) as an undirected graph and we want to find G as a subgroup in the free group whose generators are the arcs in E⁺.
Recall that we have λ(ē) = λ(e)⁻¹ for all edges e. Hence, we obtain a well-defined homomorphism λ : F(E⁺) → F. For any two vertices p, q ∈ Q, the uniquely determined shortest path in the spanning tree from p to q is denoted by T[p, q]. This is a sequence of arcs; in particular, we may interpret T[p, q] as an element of F(E⁺). In the group F(E⁺), we then have T[p, q] = T[q, p]⁻¹. Next, we define for each arc e an element τ(e) ∈ F(E⁺) by the sequence of arcs: τ(e) = T[G, s(e)] ⋅ e ⋅ T[t(e), G]. So, we walk in the spanning tree from the coset G to the source of e, pass through e and then, in the spanning tree, go back to the initial vertex G. In particular, we have λ(τ(e)) ∈ G. Moreover, in order to compute λ(τ(e)) ∈ G, we can use any path in K(G, Σ) which starts and ends in the coset G and passes through the arc e.
If t(e) = s(f) for arcs e, f then τ(e) ⋅ τ(f) = T[G, s(e)] ⋅ e ⋅ f ⋅ T[t(f), G] ∈ F(E⁺) because the middle paths between G and t(e) = s(f) cancel each other. Now, let e_1 ⋯ e_m ∈ E* be a sequence of arcs from G to G; then we have λ(e_1 ⋯ e_m) = λ(τ(e_1)) ⋯ λ(τ(e_m)) ∈ G for the labeling. In particular, λ(τ(F(E⁺))) = G because the labels for the sequences of arcs in K(G, Σ) are sufficient to generate G. Note that τ(e) = 1 ∈ F(E⁺) and λ(τ(e)) = 1 ∈ G for e ∈ T. Thus, we obtain a surjective homomorphism λ ∘ τ : F(∆) → G. It only remains to show that λ ∘ τ is injective; the rank formula then follows because, for nonempty graphs with |Q| vertices, a spanning tree has |Q| − 1 undirected edges. Suppose that λ(τ(w)) = 1 for a reduced word w = e_1 ⋯ e_k with e_i ∈ ∆ ∪ ∆̄ and k ⩾ 0. Then e_{i−1} ≠ ē_i for 2 ⩽ i ⩽ k, and no edge e_i belongs to T for 1 ⩽ i ⩽ k. The sequence of arcs τ(w) = T[G, s(e_1)] e_1 T[t(e_1), G] ⋯ T[G, s(e_k)] e_k T[t(e_k), G] ∈ E* can be reduced for k ⩾ 2 by the rules eē → 1, possibly within factors T[t(e_{i−1}), G] T[G, s(e_i)]. But no reduction can involve an arc e_i ∈ ∆ ∪ ∆̄ because the neighbors remain edges of the spanning tree or have the form e_{i−1} or e_{i+1}. By performing all reductions, we obtain a reduced chain of arcs f_1 ⋯ f_ℓ from G to G with k ⩽ ℓ and f_{i−1} ≠ f̄_i for all 2 ⩽ i ⩽ ℓ. We still have λ(f_1 ⋯ f_ℓ) = 1. Suppose that ℓ ⩾ 1; then there is an index i with λ(f_{i−1}) = λ(f_i)⁻¹. But t(f_{i−1}) = s(f_i) because we have a sequence of arcs, and this means f_{i−1} = f̄_i, which does not hold. This shows that ℓ = 0 and, hence, we have k = 0. Therefore, w is the empty word. We say that K(G, Σ) is complete if, for each letter a ∈ Γ and every state Gu ∈ Q, we have Gua ∈ Q, too. In this case, there is an edge from Gu to Gua in K(G, Σ) labeled by a. Since in the Schreier graph there is exactly one edge labeled by a from Gu to Gua, the graph K(G, Σ) is complete if and only if it coincides with the Schreier graph.
Two situations are of special interest: Lemma 8.26. If G is a nontrivial normal subgroup or if G is of finite index in F(Σ), then K(G, Σ) is complete and, hence, K(G, Σ) is the Schreier graph. Proof: Let Gu be a right coset and u = a_1 ⋯ a_k reduced with k ⩾ 0 and a_i ∈ Γ. We show that Gu ∈ Q. Then K(G, Σ) is the whole Schreier graph, and K(G, Σ) is therefore complete. This is true for k = 0. Now, let k ⩾ 1. If Σ = {a} is a singleton then G is of finite index n; and by means of Ga^i for 0 ⩽ i ⩽ n, we travel through the Schreier graph. Now, let |Σ| ⩾ 2 and b ∈ Σ with b ∉ {a_k, a_k⁻¹}. The map Gw ↦ Gwb permutes the right cosets. If G has finite index, then there exists n ⩾ 1 with Gubⁿ = Gu. Since a_1 ⋯ a_k bⁿ a_k⁻¹ ⋯ a_1⁻¹ is reduced, it defines a path from G to G. In particular, Gu ∈ Q.
Now assume that G is a nontrivial normal subgroup of infinite index in F(Σ) and u = a_1 ⋯ a_k as above. In particular, |Σ| ⩾ 2 and there exist a, b ∈ Σ with a_k = a ≠ b. By assumption G ≠ {1}. Hence, there is a shortest nonempty reduced word w ∈ G. This word cannot have the form cυc⁻¹ with c ∈ Γ because cυc⁻¹ ∈ G implies υ ∈ G, since G is normal. If w neither begins with a⁻¹ nor ends in a, then a_1 ⋯ a_k w a_k⁻¹ ⋯ a_1⁻¹ is reduced and it describes a path from G to G in K(G, Σ). As a consequence Gu ∈ Q. Thus, we may assume that w either begins with a⁻¹ or ends in a. It cannot be both as we pointed out above. By symmetry let w end in a. Then the first letter of w is some c ≠ a⁻¹. By interchanging the role of b and b⁻¹ if necessary, we may assume that b ≠ c ≠ a⁻¹. As a consequence, uυ = a_1 ⋯ a_k b⁻¹ w b a_k⁻¹ ⋯ a_1⁻¹ is reduced. Again, uυ describes a path from G to G in K(G, Σ) and therefore Gu ∈ Q. Lemma 8.27. Let G be a subgroup of a finitely generated free group F(Σ). Then the following statements are equivalent: (a) The subgroup G is rational. (b) The subgroup G is finitely generated. (c) The kernel K(G, Σ) is finite. Proof: If K(G, Σ) is finite then G is rational. A rational subgroup is finitely generated by Theorem 8.21. Hence, let G be finitely generated. We only have to show that Q is finite, where as above K(G, Σ) = (Q, E). Since G is finitely generated, there are finitely many words w_1, . . . , w_k ∈ Γ* such that each element in G can be represented as a product over A = {w_1, . . . , w_k}. The reading of a word w ∈ A* in the Schreier graph uses only a finite part Q′ of the right cosets, and all reduced words are formed by reduction of words from A*. Let Q′ be the finite set of vertices that are visited when reading w_1 w_2 ⋯ w_k in the Schreier graph. Remember that, after reading any of the words w_i, we return to the starting node G. Thus, when reading a word from A*, only vertices from Q′ are visited.
Since any reduced word representing an element in G can be obtained by reducing a word from A*, it follows that Q ⊆ Q′ and Q is finite. Theorem 8.28. Let F = F(Σ) be a finitely generated free group and let G be a nontrivial normal subgroup. Then, G has finite index in F if and only if G is finitely generated. Proof: If G has finite index then K(G, Σ) is finite and, hence, G is finitely generated by Lemma 8.27. Conversely, if G is finitely generated, then K(G, Σ) is finite. Now, Lemma 8.26 shows that K(G, Σ) is the Schreier graph. Therefore, the index is finite. We now give the classical form of Theorem 8.25. Theorem 8.29 (Nielsen and Schreier). Subgroups of free groups are free. If F is a finitely generated free group of rank r and G a subgroup of finite index n, then G has a finite basis ∆ and the following rank formula holds: |∆| − 1 = n ⋅ (r − 1)
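As an illustration of Theorem 8.25 and the rank formula, the following sketch computes a basis of a finite-index subgroup from its Schreier graph. The input format is an assumption made for this example: the subgroup G is given as the stabilizer of the point 0 under a transitive action of F(Σ) on {0, . . . , n − 1}, where perms[a][v] is the vertex reached from v by the arc labeled a ∈ Σ. Letters of Σ are lowercase, inverses are written in uppercase, and the function names are hypothetical. The spanning tree is found by breadth-first search along arcs from E⁺, which suffices here because the graph is finite; each arc e ∈ E⁺ \ T then contributes the generator λ(τ(e)).

```python
from collections import deque


def inverse(word):
    # Assumed encoding: uppercase letters are the inverses of lowercase ones.
    return word[::-1].swapcase()


def schreier_basis(perms, n):
    """Basis of the stabilizer of 0 for a transitive action on {0,...,n-1}.

    perms maps each letter a of the basis to a list with perms[a][v] = v.a
    """
    path = {0: ""}     # path[v] is the label of the tree path T[G, v]
    tree = set()       # arcs (v, a) from E+ that belong to the spanning tree
    queue = deque([0])
    while queue:
        v = queue.popleft()
        for a in sorted(perms):
            w = perms[a][v]
            if w not in path:
                path[w] = path[v] + a
                tree.add((v, a))
                queue.append(w)
    if len(path) != n:
        raise ValueError("the action is not transitive")
    basis = []
    for v in range(n):
        for a in sorted(perms):
            if (v, a) not in tree:
                # label of tau(e) = T[G, s(e)] . e . T[t(e), G]
                basis.append(path[v] + a + inverse(path[perms[a][v]]))
    return basis


if __name__ == "__main__":
    # Words of even length in F(a, b): index n = 2 in a free group of rank r = 2.
    basis = schreier_basis({"a": [1, 0], "b": [1, 0]}, 2)
    print(basis, len(basis) - 1 == 2 * (2 - 1))
```

In the example the Schreier graph has n = 2 vertices and m = n ⋅ r = 4 undirected edges, so the basis has m + 1 − n = 3 elements, in accordance with |∆| − 1 = n ⋅ (r − 1).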
Proof: Let F = F(Σ). The statement was already proved in Theorem 8.25; only the rank formula had a slightly different form. We obtain the classical version directly from Lemma 8.26, which says that K(G, Σ) is the Schreier graph. Therefore, |Q| = n = [F : G] and m = n ⋅ r, and hence |∆| − 1 = m − |Q| = n ⋅ (r − 1). Corollary 8.30. If x and y are commuting elements of a free group, then x and y generate a cyclic subgroup. Proof: The subgroup G generated by x and y is free by Theorem 8.29 and commutative by assumption. Hence, G is either trivial or of rank 1. In both cases G is cyclic. Corollary 8.30 corresponds to Theorem 6.5 (b) for free monoids. To conclude this section, we show the corresponding result to Theorem 6.5 (a) for free groups. For the remainder of this section, let F(Σ) be the free group over Σ and Γ = Σ ∪ Σ⁻¹ ⊆ F(Σ). We write ā = a⁻¹ for a ∈ Γ. A word w ∈ Γ* is called cyclically reduced if ww is reduced. In particular, w is reduced; and if w begins with a letter a ∈ Γ, then it does not end with ā. Each element x ∈ F(Σ) is conjugate to an element which can be represented by a cyclically reduced word w. Recall that words x, y ∈ Γ* are transposed if and only if we can write x = rs and y = sr. Clearly, transposed words represent conjugate elements in F(Σ). The next theorem states the converse for cyclically reduced words. It was published by Lyndon and Schützenberger in 1962. Theorem 8.31. Let F(Σ) be the free group over Σ and x, y ∈ Γ* be cyclically reduced. If zxz⁻¹ = y in F(Σ) for some z, then there exist reduced words r, s ∈ Γ* with x = sr and y = rs in Γ*. Proof: We may assume that z is given by a reduced word in Γ*. If zx and yz are both reduced then the words are identical and the statement follows from Theorem 6.5 (a). Now, let zx be not reduced (the arguments for yz are analogous). Then there is a letter a ∈ Γ such that z = z′a⁻¹ and x = ax′. It follows that z′x′a = zxa = yza = yz′ in F(Σ). Since x is cyclically reduced, x′a is cyclically reduced, too.
By induction on |z| there exist reduced words r , s ∈ Γ ∗ with x a = s r and y = r s . Now, x = ax and x a are transposed, and also x a and y are transposed. By Theorem 6.5 (a) transposition is transitive, and therefore x and y are transposed words. Corollary 8.32. Dehn’s problems, that is, the word problem, the conjugacy problem, and the isomorphism problem are algorithmically solvable for finitely generated free groups. Proof: The word problem can be solved by free reduction. To solve the conjugacy problem we need to cyclically reduce two input words. Then we have to test if these cyclically reduced words are transposed. This test gives the correct answer by Theorem 8.31. Finally, two free groups are isomorphic if and only if their bases have the same cardinality.
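The first two procedures in the proof of Corollary 8.32 can be sketched in a few lines. The following Python sketch is only an illustration under an encoding that is an assumption of this example, not of the text: a lowercase letter stands for a generator and the matching uppercase letter for its inverse. The function names are hypothetical.

```python
def inverse(letter: str) -> str:
    """Formal inverse of a letter: 'a' <-> 'A' (encoding assumed for this sketch)."""
    return letter.swapcase()


def free_reduce(word: str) -> str:
    """Cancel adjacent pairs x x^-1 using a stack; this solves the word problem."""
    stack = []
    for letter in word:
        if stack and stack[-1] == inverse(letter):
            stack.pop()          # the new letter cancels the previous one
        else:
            stack.append(letter)
    return "".join(stack)


def cyclically_reduce(word: str) -> str:
    """Freely reduce, then strip first/last letters while they are mutually inverse."""
    w = free_reduce(word)
    while len(w) >= 2 and w[0] == inverse(w[-1]):
        w = w[1:-1]              # a factor of a reduced word is still reduced
    return w


def is_trivial(word: str) -> bool:
    """Word problem: the word represents 1 iff its free reduction is empty."""
    return free_reduce(word) == ""


def are_conjugate(x: str, y: str) -> bool:
    """Conjugacy test: cyclically reduce both words, then test for transposition.

    Two words of equal length are transposed iff one occurs as a factor
    of the square of the other.
    """
    u, v = cyclically_reduce(x), cyclically_reduce(y)
    return len(u) == len(v) and v in u + u
```

For instance, `are_conjugate("abA", "b")` holds because the first word cyclically reduces to `b`, while `ab` and `aB` are not conjugate since they are cyclically reduced but not transposed.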
8.11 The automorphism group of free groups

In this section, we give a graph theoretic interpretation of a classical result by Nielsen [83]: the automorphism group Aut(F) of a finitely generated free group F is finitely generated. The classical proof uses elementary transformations. Our proof is based on the notion of Whitehead automorphisms. In fact, the automorphism group Aut(F) is finitely presented [70], but we do not show this here.
Whitehead automorphisms

Let F = F(Σ) be a free group of rank n, that is, |Σ| = n. We define a finite family of automorphisms of F which are named after John Henry Constantine Whitehead (1904–1960). Every permutation of Σ induces a Whitehead automorphism of F. There are n! automorphisms of this type. For a ∈ Σ, we define a Whitehead automorphism i_a as follows:

\[ i_a(b) = \begin{cases} a^{-1} & \text{for } b = a \\ b & \text{for } b \in \Sigma \setminus \{a\} \end{cases} \]

The automorphism i_a inverts the letter a and leaves all other letters invariant. There are n automorphisms of this type. The last set of Whitehead automorphisms is defined for a ∈ Σ and three pairwise disjoint subsets L, R, M of Σ with a ∈ M. Each tuple (a, L, R, M) defines a Whitehead automorphism W_(a,L,R,M) as follows:

\[ W_{(a,L,R,M)}(b) = \begin{cases} ab & \text{for } b \in L \\ ba^{-1} & \text{for } b \in R \\ aba^{-1} & \text{for } b \in M \\ b & \text{for } b \in \Sigma \setminus (L \cup M \cup R) \end{cases} \]

There are n ⋅ 4ⁿ⁻¹ automorphisms of this type. Sometimes one also counts the inverse automorphism W_(a,L,R,M)⁻¹ as a Whitehead automorphism but this does not really matter, because we have W_(a,L,R,M)⁻¹ = i_a ∘ W_(a,L,R,M) ∘ i_a. Among the Whitehead automorphisms, we find the subset of elementary Nielsen transformations which consists of the automorphisms i_a and λ_ab = W_(a,{b},∅,{a}). The following theorem shows that Aut(F(Σ)) is finitely generated. Exercise 8.14 (a) then shows that only four elementary Nielsen transformations are sufficient to generate Aut(F).

Theorem 8.33. The automorphism group Aut(F(Σ)) is generated by the Whitehead automorphisms.

Our proof is based on lectures by Saul Schleimer given at the Centre de Recerca Matemàtica in Barcelona (Catalonia) in September 2012. According to Schleimer
and [107], our proof of Theorem 8.33 follows unpublished ideas of Stallings. The intention is not to present the shortest possible proof, but to understand why Whitehead automorphisms form a natural set of generators for Aut(F(Σ)). In fact, we can get even a little more from the proof: without any additional effort it shows that finitely generated free groups are Hopfian.

In the following, X and Y describe finite sets and F(X) and F(Y) the corresponding free groups. Let then X̃ = X ∪ X⁻¹ ⊆ F(X) and Ỹ = Y ∪ Y⁻¹ ⊆ F(Y). We want to study surjective homomorphisms of F(X) onto F(Y). Without loss of generality, we may assume that Y ≠ ∅. We call a homomorphism π : F(X) → F(Y) a projection if

π(X) ⊆ {1} ∪ Ỹ ⊆ {1} ∪ π(X̃)

That is, a projection is surjective, and letters are either deleted or mapped to letters or their inverses. For X = Y every projection is a product of Whitehead automorphisms. Hence, the following statement generalizes Theorem 8.33.

Theorem 8.34. Let φ : F(X) → F(Y) be a homomorphism.
(a) Then it is decidable whether φ is surjective, that is, if φ is an epimorphism.
(b) If φ is an epimorphism, then φ can be factorized as φ = π ∘ ψ where π is a projection and ψ a product of Whitehead automorphisms.

Theorem 8.34 provides the assertion that finitely generated free groups are Hopfian, because for |X| = |Y| projections are isomorphisms. We also see that if finitely generated free groups are isomorphic, then their rank is equal. (See Theorem 8.24 for the general statement including infinitely generated free groups.) Indeed, if |X| < |Y|, then there are no projections from F(X) to F(Y). Thus, epimorphisms (resp. isomorphisms) from F(X) onto F(Y) exist if and only if |X| ⩾ |Y| (resp. |X| = |Y|).

The rest of this section is devoted to the proof of Theorem 8.34. Without loss of generality, we may assume that X and Y are nonempty finite sets.
A graphical realization of a homomorphism of F(X) to F(Y) is a pair (Γ, Φ) where Γ is a marked, connected, edge labeled, finite graph, together with a spanning tree, and Φ is a mapping of X̃ to the set of edges of Γ. Formally, Γ is a tuple (1, V, E, T, λ) and Φ : X̃ → E is a mapping with the following properties:
– The pair (V, E) is a connected graph with vertex set V and finite nonempty edge set E.
– There exists a marked vertex in V, which is denoted by 1.
– For each directed edge e, we denote the source by s(e) ∈ V and the target by t(e) ∈ V. The graph may have loops and multiple edges.
– To each edge e ∈ E exactly one edge ē ∈ E is assigned such that s(ē) = t(e), t(ē) = s(e) and the edge assigned to ē is e again.
– λ : E → Ỹ maps each edge e to an element in Ỹ = Y ∪ Y⁻¹ such that λ(ē) = λ(e)⁻¹. The mapping λ induces a homomorphism λ : E∗ → F(Y).
– We let E+ = λ⁻¹(Y) be the set of positively oriented edges. (Drawings depict these edges, only.) The homomorphism λ : E∗ → F(Y) factorizes through the free group F(E+). Hence, we also have a homomorphism λ : F(E+) → F(Y) between free groups.
– (V, T) is a spanning tree of (V, E). An edge in E \ T is called a bridge.
– Φ induces a bijection between X̃ and the set E \ T of the bridges.
– For all a ∈ X, the edge Φ(a⁻¹) is the edge assigned to Φ(a), that is, Φ(a) traversed in the opposite direction.
Every graphical realization defines a homomorphism φ : F(X) → F(Y) as follows. Let p, q ∈ V be two vertices. As done earlier, T[p, q] denotes the shortest path in the spanning tree (V, T) from p to q. For a ∈ X and e = Φ(a) ∈ E, we put

φ_Γ(a) = T[1, s(e)] e T[t(e), 1]

Thus, φ_Γ(a) is a closed path in (V, E) which starts at the marked vertex 1, walks in the spanning tree from 1 to the source of Φ(a), passes through the edge Φ(a) and then goes back in the spanning tree to the marked vertex 1. Hence, we may interpret the word φ_Γ(a) ∈ E∗ also in the free group F(E+). Since φ_Γ(a⁻¹) = φ_Γ(a)⁻¹ holds in F(E+), we obtain a well-defined homomorphism φ_Γ : F(X) → F(E+). Finally, we define

φ(a) = λ(φ_Γ(a)) = λ(T[1, s(e)] e T[t(e), 1]) ∈ F(Y)

We obtain the following picture:

\[ \varphi \colon F(X) \xrightarrow{\;\varphi_\Gamma\;} F(E_+) \xrightarrow{\;\lambda\;} F(Y) \]

We say that (Γ, Φ) is a graphical realization of φ : F(X) → F(Y). A graphical realization of a homomorphism is by no means unique; see, for example, Figure 8.5.

Before we proceed further, let us show that every homomorphism φ : F(X) → F(Y) has some graphical realization (Γ, Φ). This is simple. Consider a ∈ X, then we may write φ(a) as φ(a) = y_1 ⋯ y_m with y_i ∈ Ỹ. Because Y ≠ ∅ and since we allow factors of the form yy⁻¹, we may assume that m ⩾ 2. For each letter a ∈ X, we draw a cycle of length |φ(a)| with initial and terminal vertex 1 and label the ith edge with the ith letter in the word φ(a). We add the first m − 1 edges of the cycle to the spanning tree and the last edge e_m becomes a bridge. Finally, we define Φ(a) = e_m. The resulting graphical realization of φ looks like a flower with petals of different size.

Fig. 8.5. Two different graphical realizations of the Whitehead automorphism W_(a,{b},{c},{a,d}).

Given a graphical realization (Γ, Φ) and a word u ∈ X̃∗, we may interpret φ(u) as the word φ_Γ(u) in E∗ or as an element of the free group F(E+) or by virtue of λ as an element of F(Y). However, there is no risk of confusion.

The trivial homomorphism has a graphical realization by a graph with two points 1 and 2, which are connected in the direction from 1 to 2 by |X| + 1 different arcs and all of them have label y, where y is some fixed letter in the nonempty set Y.
[Figure: the vertices 1 and 2 joined by |X| + 1 arcs labeled y. Top arc in the spanning tree and |X| bridges.]
The identity map id : F(X) → F(X) is realized by a so-called rose. This is a graph with one vertex 1 in which X̃ is the set of loops, and the involution corresponds to the map x ↦ x̄ = x⁻¹. Further, the labeling λ of the loops is induced by the map from X to Ỹ. For id_X : F(X) → F(X) the labeling is also the identity.

[Figure: “Rose of the identity” for X = {a, b, c, d, e}: one vertex 1 with loops a, b, c, d, e.]
Roses also realize those projections that map letters to letters or their inverses. If π : F(X) → F(Y) is a projection of this form, then we label the loops of the rose with π. The rose shows us whether it encodes a projection. For this, it is necessary and sufficient that all y ∈ Y appear as labels.

[Figure: Projection of {a, b, c, d, e} onto {a, b, c}, drawn as a rose with vertex 1.]
Fig. 8.6. Different spanning trees.

Fig. 8.7. Replacement of the bridge A by the edge e.
In the following pictures, we draw the edges from the spanning tree thicker and as straight lines while the bridges are curved; see Figure 8.6.

Example 8.35. We use Figure 8.7 and make the following agreement. Let X = E+ \ T = {A, B, C, D, E} (with Φ(x) = x) and Y = {a, b, c, d, e}. The edges e ∈ E+ are also indicated with their labels λ(e) ∈ Y. Thus, the upper picture tells us:

φ(A) = baea
φ(B) = baeab(ba)⁻¹
φ(C) = baeac(baea)⁻¹
φ(D) = d(bae)⁻¹
φ(E) = bdae

In the lower picture the spanning tree has been changed which leads to new paths:

φ′(A) = baea
φ′(B) = a⁻¹ab(ba)⁻¹
φ′(C) = a⁻¹aca⁻¹a
φ′(D) = da
φ′(E) = bdae

This gives the following result:

φ(A) = φ′(A)
φ(B) = φ′(AB)
φ(C) = φ′(ACA⁻¹)
φ(D) = φ′(DA⁻¹)
φ(E) = φ′(E)

Therefore, φ can be factorized as φ′ ∘ W_(A,{B},{D},{A,C}). ◊
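The factorization claimed in Example 8.35 can be checked mechanically by free reduction. The sketch below is an illustration only: the representation of words as lists of (generator, exponent) pairs and all function names are assumptions of this example. It encodes the images of φ and φ′ read off above and compares φ with φ′ ∘ W_(A,{B},{D},{A,C}) on every generator.

```python
def word(text):
    """Parse 'b a e' into [('b', 1), ('a', 1), ('e', 1)] (positive letters only)."""
    return [(g, 1) for g in text.split()]


def inv(w):
    """Formal inverse of a word: reverse the order and negate the exponents."""
    return [(g, -e) for g, e in reversed(w)]


def reduce_word(w):
    """Free reduction with a stack."""
    stack = []
    for g, e in w:
        if stack and stack[-1] == (g, -e):
            stack.pop()
        else:
            stack.append((g, e))
    return stack


def apply_hom(hom, w):
    """Apply a homomorphism given on generators to a word and reduce the result."""
    image = []
    for g, e in w:
        image += hom[g] if e == 1 else inv(hom[g])
    return reduce_word(image)


def whitehead(a, L, R, M, generators):
    """Images of the generators under W_(a,L,R,M) as defined in the text."""
    images = {}
    for b in generators:
        if b in L:
            images[b] = [(a, 1), (b, 1)]
        elif b in R:
            images[b] = [(b, 1), (a, -1)]
        elif b in M:
            images[b] = [(a, 1), (b, 1), (a, -1)]   # reduces to a for b = a
        else:
            images[b] = [(b, 1)]
    return images


# Images from the upper picture (phi) and the lower picture (phi_prime).
phi = {
    "A": word("b a e a"),
    "B": word("b a e a b") + inv(word("b a")),
    "C": word("b a e a c") + inv(word("b a e a")),
    "D": word("d") + inv(word("b a e")),
    "E": word("b d a e"),
}
phi_prime = {
    "A": word("b a e a"),
    "B": inv(word("a")) + word("a b") + inv(word("b a")),
    "C": inv(word("a")) + word("a c") + inv(word("a")) + word("a"),
    "D": word("d a"),
    "E": word("b d a e"),
}
psi = whitehead("A", {"B"}, {"D"}, {"A", "C"}, "ABCDE")

factorization_holds = all(
    reduce_word(phi[x]) == apply_hom(phi_prime, psi[x]) for x in "ABCDE"
)
```

With these definitions `factorization_holds` evaluates to `True`, which matches the five identities listed in the example.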
A graphical realization specifies the base point, the orientation, a spanning tree, etc. However, we are interested in homomorphisms only modulo Whitehead automorphisms. Thus, we can think that a graphical realization without a specification of the items above defines a class of homomorphisms. If every member in such a class is obtained from any other member by a multiplication of a Whitehead automorphism, then we do not lose any substantial information. This will become clear in the next subsections.
Free choice of the base point

Let 1′ ∈ V be a vertex different from 1 which we want to make the marked initial and final vertex. Instead of (Γ, Φ), we now consider (Γ′, Φ) with Γ′ = (1′, V, E, T, λ). Then, for a ∈ X, we may compute φ′(a) in F(Y) by

φ′(a) = λ(T[1′, 1] φ(a) T[1, 1′]) = hφ(a)h⁻¹

where h = λ(T[1′, 1]) ∈ F(Y). Therefore, we obtain φ′ from φ by applying an inner automorphism γ of F(Y). In particular, φ′ is surjective if and only if φ is surjective. If φ is surjective then there exists x ∈ F(X) with φ(x) = h. In this case φ′(a) = φ(xax⁻¹) for all a ∈ X, that is, φ′ = φ ∘ ψ, where ψ is the inner automorphism a ↦ xax⁻¹ of F(X). Conjugation by a single letter c ∈ X is the Whitehead automorphism W_(c,∅,∅,X). Hence, ψ is a product of Whitehead automorphisms. We may therefore move the base point around: every vertex can be chosen to be the base point.
Changing the orientation

It is convenient to change the orientation of identically labeled edges simultaneously. Consider a label y ∈ Y and then take λ′ = i_y ∘ λ as the new edge labeling. This gives a new pair (Γ′, Φ) and defines φ′ : F(X) → F(Y) with i_y ∘ φ′ = φ. If φ′ = γ′ ∘ π′ for an inner automorphism γ′ of F(Y) and a projection π′, then we have φ = γ ∘ π for an inner automorphism γ of F(Y) and a projection π. This follows because i_y ∘ γ′ ∘ π′(a) = i_y(h) i_y(π′(a)) i_y(h)⁻¹ for some h ∈ F(Y) and because π = i_y ∘ π′ is a projection. Therefore, we may consider (Γ′, Φ) and φ′ instead of (Γ, Φ) and φ. Thus, in the drawing: if y ∈ Y, then we are free to change the orientation of all arcs labeled by y without changing the label of these arcs.
Replacing λ(Φ(a)) by its inverse

Let a ∈ X and e = Φ(a) ∈ E \ T be a bridge. If we define Φ′(a) = ē and Φ′(b) = Φ(b) for all a ≠ b ∈ X, then we obtain φ′ = φ ∘ i_a. Since i_a is a Whitehead automorphism, in this situation we may consider Φ′ and φ′ instead of Φ and φ.
Changing the spanning tree

This is the heart of the process. A local change of the spanning tree causes a right multiplication by some Whitehead automorphism. Let A ∈ E \ T be a bridge with s(A) = q ≠ 1 and t(A) = 1. We want to construct a spanning tree T′ that contains A, and we want to compute the homomorphism F(X) → F(Y) that results from the modified graphical realization. If there is a parallel edge e ∈ T, that is, s(e) = s(A), t(e) = t(A) with λ(A) = λ(e), then we simply exchange the roles of A and e: the edge A enters the spanning tree and e becomes the bridge. Now, let Φ(a) = A where a ∈ X. Consider the path T[1, q] = T[1, p] e with e ∈ T and s(e) = p, t(e) = q. This runs in the spanning tree from 1 to the source of A. We remove e and ē from T and we insert A and Ā into the spanning tree. This defines a new spanning tree T′. We define Φ′(a) = e; and for b ∈ X with b ≠ a, we let Φ′(b) = Φ(b). Note that e is a bridge with respect to T′. Let φ′ : F(X) → F(Y) be induced by ((1, V, E, T′, λ), Φ′). The path φ(a) = T[1, p] e A has not changed because T[1, p] = T′[1, p] and A = T′[q, 1], hence φ(a) = φ′(a). Now consider b ∈ X with b ≠ a. Then Φ(b) = B is still a bridge and we have

φ′(b) = T′[1, s(B)] B T′[t(B), 1]

There are now four different cases that exactly correspond to the distinctions in Whitehead automorphisms.

(a) The path T′[1, s(B)] begins with the edge Ā, but T′[t(B), 1] does not end with A. Then the following equations hold in the free group F(E+):

φ(b) = T[1, q] T[q, s(B)] B T[t(B), 1] = (T[1, q] A) (Ā T[q, s(B)] B T[t(B), 1]) = φ(a)φ′(b) = φ′(a)φ′(b) = φ′(ab)

(b) The path T′[1, s(B)] does not begin with the edge Ā, but T′[t(B), 1] ends with A. Then the following equations hold in the free group F(E+):

φ(b) = T[1, s(B)] B T[t(B), q] A Ā T[q, 1] = φ′(b)φ(a⁻¹) = φ′(ba⁻¹)

(c) The path T′[1, s(B)] begins with the edge Ā and T′[t(B), 1] ends with A. Then the following equations hold in the free group F(E+):

φ(b) = T[1, q] A Ā T[q, s(B)] B T[t(B), q] A Ā T[q, 1] = φ(a)φ′(b)φ(a⁻¹) = φ′(aba⁻¹)

(d) The path φ′(b) does not begin with the edge Ā and does not end with A. Then the whole path contains neither A nor Ā. In this case we have φ′(b) = φ(b).

The change of the spanning tree thus results in φ′ : F(X) → F(Y) where φ = φ′ ∘ ψ and ψ is a Whitehead automorphism. The inverse of a Whitehead automorphism is a (product of) Whitehead automorphism(s). Hence, φ′ = φ ∘ ψ⁻¹. We may therefore change spanning trees in the described manner. This is explicitly done in Example 8.35. Since we also may change the orientation of equally labeled edges, we may assume the following, if necessary: if e, f are equally labeled edges which have the common source 1 and the targets p and q, respectively, and if further 1, p, and q are pairwise distinct, then the above method provides a spanning tree T with e, f ∈ T.
Determinization

We say that a graphical realization (Γ, Φ) is deterministic, if for each u ∈ V and each y ∈ Ỹ there is at most one vertex υ, which is the target t(e) for edges e with s(e) = u and λ(e) = y. Thus, for each word w = y_1 ⋯ y_m ∈ Ỹ∗ there is at most one vertex q ∈ V, which is reachable from u by a path labeled with w.

Suppose that υ = s(e) = s(f) is the source of two edges e and f, which have the same label y = λ(e) = λ(f) but different final vertices p = t(e) ≠ t(f) = q. This violates the determinism. Note that the determinism is not violated by parallel edges, that is, if t(e) = t(f). We do not have to remove parallel edges with the same labels. They indicate that the homomorphism is not injective, but this fact becomes visible in the projection π at the end, too.

Without restriction we can assume that t(f) ≠ υ; otherwise we interchange the role of e and f. By changing the orientation, if necessary, we may assume that y ∈ Y. By moving the base point we may also assume that υ = 1. In particular, 1 ≠ q. Furthermore, we may assume that f ∈ T is an edge in the spanning tree. We now fold the graph along the edges e and f. This means, we identify e with f, ē with f̄, and p with q. Formally, we remove f and f̄ from E and hence also from T. All remaining edges with source or target q get the new source or target p. Finally, we remove the now isolated vertex q. Thus, we obtain a new graph Γ′ = (1, V′, E′, T′, λ′), which is still connected. Since we have removed exactly one vertex and exactly one undirected edge from T, the pair (V′, T′) is a spanning tree of (V′, E′).

Moreover, if 1 ≠ t(e), then we may assume, by the above construction, that e ∈ T is an edge of the spanning tree, too, because then e and f are equally labeled edges which both have the source 1 and the targets p and q, respectively, where 1, p, and q are pairwise different. Then e ∈ T′, and the folding of e and f does not change the outputs of φ in the group F(Y). This means, we have φ = φ′.
[Figure: folding of two edges e and f with λ(e) = λ(f) = y and common source 1; the targets p and q are identified.]
If 1 = t(e), then the situation is somewhat different. Now, e is a loop and thus a bridge. Therefore, there is exactly one a ∈ X̃ with Φ(a) = e, and we have φ(a) = λ(e) = y. Without loss of generality, let a ∈ X and y ∈ Y.

[Figure: the case 1 = t(e), where e is a loop at 1 with λ(e) = y and λ(f) = y; folding identifies q with 1.]
We compute the new, induced homomorphism φ′. We already saw that φ′(a) = φ(a). Now, let a ≠ b ∈ X. For φ(b), there are again only the four cases we already know.
(a) φ(b) begins with f but does not end with f̄.
(b) φ(b) does not begin with f but ends with f̄.
(c) φ(b) begins with f and ends with f̄.
(d) φ(b) contains neither f nor f̄.

The paths φ′(b) arise out of the paths φ(b) by deleting all edges f and f̄. However, a deletion changes the label and therefore the image in F(Y). To compensate for this, we emphasize that we do not delete the edges f and f̄ in φ(b), but replace f by e as well as f̄ by ē. After this modification, φ(b) becomes a word w_b ∈ E′∗, and we obtain λ(φ(b)) = λ(w_b) ∈ F(Y). The word w_b is a closed path in (V′, E′), and according to the above cases, in F(Y), we obtain
(a) φ(b) = λ(w_b) = λ(e) φ′(b) = φ′(a)φ′(b) = φ′(ab)
(b) φ(b) = λ(w_b) = φ′(b) λ(ē) = φ′(ba⁻¹)
(c) φ(b) = λ(w_b) = λ(e) φ′(b) λ(ē) = φ′(aba⁻¹)
(d) φ(b) = φ′(b)

We remark that φ = φ′ ∘ ψ for a Whitehead automorphism ψ.
The rose

By the above procedures, we finally obtain a deterministic graphical realization (Γ̂, Φ̂) for a homomorphism φ̂ : F(X) → F(Y) such that

φ = γ ∘ φ̂ ∘ ψ    (8.9)

In Equation (8.9), the mapping γ is an inner automorphism of F(Y), and ψ is a product of Whitehead automorphisms of F(X). With Γ̂, we now can recognize if φ is surjective. First of all, this is the case if and only if φ̂ is surjective.

Let us make a preliminary observation. Suppose w = uzz̄υ is the label of a path in Γ̂ from 1 to 1 with u, υ ∈ Ỹ∗, z ∈ Ỹ, and z̄ = z⁻¹. By determinism an edge e corresponds to the position of z, and for z̄ we may choose the edge ē without changing the vertices on the path. Hence, uυ also describes a path from 1 to 1. If ŵ is now the reduced word in Ỹ∗ which corresponds to w, then there is a path from 1 to 1 labeled with ŵ.

Suppose φ is surjective; then certainly y ∈ φ(F(X)) = φ̂(F(X)) for all y ∈ Y. The letter y is a reduced word and, by the preliminary consideration, the graph Γ̂ contains a loop at 1 labeled by y. This is a necessary condition. Conversely, the existence of these loops is sufficient to guarantee the surjectivity of φ.

For the final part in the proof of Theorem 8.34, we may assume that φ : F(X) → F(Y) is surjective. As we have explained above, if y = φ(z), then

yφ(x)y⁻¹ = φ(zxz⁻¹)

Since an inner automorphism is a product of Whitehead automorphisms, the inner automorphism γ in Equation (8.9) is not needed. Thus, for some product of Whitehead automorphisms ψ of F(X), we may write

φ = φ̂ ∘ ψ

The graph Γ̂ is deterministic, and since φ is surjective, for all y ∈ Y there exists a self-loop at 1 with label y. Thus, there can be no other vertices because of determinism. The final graph Γ̂ is therefore a rose: a graph with one vertex 1, in which the set of loops can be identified with X̃. Therefore, we have λ̂ = φ̂ for the labeling and π = φ̂ for the desired projection from F(X) onto F(Y). As mentioned above, our notion of determinism does not exclude parallel edges. Thus, we may have |X| > |Y| and several x ∈ X can be mapped to the same y. If this happens, then φ is not injective and parallel self-loops with the same label appear in the rose. This concludes the proof of Theorem 8.34.
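The decision procedure behind Theorem 8.34 (a) can be sketched as follows. This is a simplified illustration, not the full construction of the proof: it builds the flower from the images of the generators, folds equally labeled edges with a common source until the graph is deterministic, and then tests for a loop labeled y at the base point for every y ∈ Y. The spanning tree and the mapping Φ, which the proof tracks in order to obtain the Whitehead factor ψ, are ignored here. The function name and the encoding of letters as (generator, sign) pairs are assumptions of this sketch.

```python
def is_surjective(images, Y):
    """Decide surjectivity of F(X) -> F(Y) by folding the flower graph.

    images: dict mapping each generator of X to a list of (y, sign) pairs,
            sign = +1 for y and -1 for its inverse.
    Y:      iterable of the generators of the target group.
    """
    edges = set()        # directed labeled edges (source, (y, sign), target)
    next_vertex = 1      # vertex 0 plays the role of the marked vertex 1
    for w in images.values():
        current = 0
        for i, (y, sign) in enumerate(w):
            if i == len(w) - 1:
                target = 0                     # the petal closes at the base point
            else:
                target = next_vertex
                next_vertex += 1
            edges.add((current, (y, sign), target))
            edges.add((target, (y, -sign), current))   # the reversed edge
            current = target

    while True:
        seen = {}
        clash = None
        for u, label, v in edges:
            first = seen.setdefault((u, label), v)
            if first != v:                     # same source and label, two targets
                clash = (first, v)
                break
        if clash is None:
            break                              # the graph is deterministic
        keep, drop = min(clash), max(clash)    # vertex 0 always survives
        edges = {
            (keep if u == drop else u, label, keep if v == drop else v)
            for u, label, v in edges
        }

    # Surjective iff every generator y labels a loop at the base point.
    return all((0, (y, 1), 0) in edges for y in Y)
```

For example, the homomorphism with a ↦ xy and b ↦ y folds to a rose with loops x and y and is therefore surjective, whereas a ↦ x², b ↦ y leaves a second vertex and is not.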
8.12 The special linear group SL(2, ℤ)

The special linear group SL(2, ℤ) consists of all 2 × 2 matrices with integer coefficients and determinant 1. This group acts on the complex numbers without the rational numbers by linear fractional transformations: if $M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ ∈ SL(2, ℤ) and z ∈ ℂ \ ℚ, we set

\[ M \cdot z = M(z) = \frac{az + b}{cz + d} \]

Since $M^{-1} = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$, we have M ⋅ z ∈ ℚ only if z itself is rational. The group SL(2, ℤ) acts on the upper half plane ℍ = { z ∈ ℂ | im(z) > 0 }, and also analogously on { z ∈ ℂ | im(z) < 0 } and ℝ \ ℚ. To see this, write z = r + is and z̄ = r − is with r, s ∈ ℝ. Then

\[ \frac{az + b}{cz + d} = \frac{(az + b)(c\bar{z} + d)}{|cz + d|^2} = \frac{ac\,z\bar{z} + bd + (ad + bc)\,r}{|cz + d|^2} + \frac{(ad - bc)\,is}{|cz + d|^2} \]

Since zz̄ ∈ ℝ and ad − bc = 1, we obtain that the imaginary part of (az + b)/(cz + d) is im(z) up to a multiplication by a positive real number. Moreover, the operation is a group operation because

\[ \left( \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} a' & b' \\ c' & d' \end{pmatrix} \right) \cdot z = \frac{(aa' + bc')z + (ab' + bd')}{(ca' + dc')z + (cb' + dd')} = \frac{a\,\frac{a'z + b'}{c'z + d'} + b}{c\,\frac{a'z + b'}{c'z + d'} + d} \]

Also M ⋅ z = z for all z if and only if $M = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = 1$ or $M = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} = -1$. The kernel of the homomorphism corresponding to the group operation is therefore exactly the subgroup {±1} ⊆ SL(2, ℤ), which is also the center of SL(2, ℤ). In general, the center of a group is the subgroup of those elements which commute with all other elements of the group. In particular, the center is a normal subgroup. The modular group PSL(2, ℤ) is defined as the quotient SL(2, ℤ)/{±1}. In particular, the homomorphism of PSL(2, ℤ) into the group of all invertible maps from ℍ to ℍ is injective. In other words, PSL(2, ℤ) is a discrete subgroup of the group of linear fractional transformations from ℍ to ℍ; in particular, PSL(2, ℤ) acts “faithfully” on the upper half plane.

The aim of this section is to give algebraic descriptions of SL(2, ℤ) and PSL(2, ℤ). We will show that SL(2, ℤ) is the amalgamated product of ℤ/4ℤ and ℤ/6ℤ such that the amalgamation is over their respective cyclic subgroups of order 2; and the modular group is the free product of ℤ/2ℤ and ℤ/3ℤ.

We start with the study of the matrices in SL(2, ℤ) without negative entries. This forms a monoid, henceforth denoted by SL(2, ℕ). We will see that SL(2, ℕ) is the free monoid on two generators T and U where T, U ∈ SL(2, ℕ) are given by

\[ T = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \qquad U = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \]
Proposition 8.36. The monoid SL(2, ℕ) is free with basis {T, U}.

Proof: Let $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ ∈ SL(2, ℤ) with a, b, c, d ⩾ 0. Then a ⩾ c or d ⩾ b because ad − bc = 1. Suppose that a > c and d > b. Then

1 = ad − bc ⩾ (c + 1)(b + 1) − bc = 1 + c + b

and, hence, b = c = 0 and a = d = 1. Therefore, (a − c)(b − d) ⩾ 0 whenever $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is not the identity matrix. That is, if $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is not the identity matrix, then either the upper row is strictly greater than the lower row, or, conversely, the lower row is strictly greater than the upper row. Note that a = c and b = d is impossible. A row is strictly greater than another one if, first, it is greater or equal in both components and, second, strictly greater in at least one component.

Next, we show that the monoid SL(2, ℕ) is freely generated by T and U. We have

\[ \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a + c & b + d \\ c & d \end{pmatrix} \qquad \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a & b \\ a + c & b + d \end{pmatrix} \]

This shows that every product of a (nonempty) sequence of T’s and U’s results in a matrix with one row strictly greater than the other one. Moreover, each matrix in SL(2, ℕ) can uniquely be written as a product of T’s and U’s. This is true for the identity matrix, which corresponds to the empty word. Now, let $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ ∈ SL(2, ℕ) be a matrix in which the upper row is strictly greater than the lower row. Note that c + d > 0. Then a = a′ + c and b = b′ + d for a′, b′ ∈ ℕ and

\[ \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a' & b' \\ c & d \end{pmatrix} = \begin{pmatrix} a' + c & b' + d \\ c & d \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \]

Thus the first factor must be T. Since a′ + b′ + c + d < a + b + c + d, we can now decompose $\begin{pmatrix} a' & b' \\ c & d \end{pmatrix}$ inductively. If the lower row is strictly greater than the upper row, then the first factor is U, and the induction argument remains unchanged.

The geometric interpretation of the matrices T and U is given by their effect on a complex number z with im(z) > 0. For each n ∈ ℤ, we obtain

\[ T^n(z) = \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix} \cdot z = z + n \qquad U^n(z) = \begin{pmatrix} 1 & 0 \\ n & 1 \end{pmatrix} \cdot z = \frac{z}{nz + 1} \]

In particular, the matrices T and U both have infinite order. Next, we consider the two matrices S, R ∈ SL(2, ℤ) given by

\[ S = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \qquad R = \begin{pmatrix} 0 & -1 \\ 1 & 1 \end{pmatrix} \]

We have S² = −1 and R³ = −1; hence, S has order 4 and R has order 6. Moreover, −R = ST and $RS = -STS = STS^{-1} = U^{-1} = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}$. Geometrically (in the complex plane) we may interpret S, T, R, and R² as follows:

\[ S(z) = \frac{-1}{z} \qquad T(z) = z + 1 \qquad R(z) = \frac{-1}{z + 1} \qquad R^2(z) = \frac{-(z + 1)}{z} \]
Hence S(z) is an involution, T(z) a translation, and R(z) a rotation of order 3. Recall that R has order 6 in SL(2, ℤ), but, in the quotient group PSL(2, ℤ), its order is 3.

Lemma 8.37. Any two of the four matrices R, S, T, U generate the group SL(2, ℤ).

Proof: Any two of the four matrices generate the three matrices R, T and U by the above formulas, and hence, also S. This follows by an extensive case distinction. For instance, if we have T and U then we obtain R⁻¹ = R⁵ by

U⁻¹T = −STST = −R² = R⁵

Now, let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ ∈ SL(2, ℤ). We show that A is in the subgroup generated by S, T, and U. If c = 0 then d ≠ 0, and we replace A by AS. If c < 0 then we replace A by −A = S²A. Now we may assume that c > 0. If we replace A by TⁿATⁿ for a sufficiently large n, then, without loss of generality, we obtain a, b, c, d > 0. Now A is already in the submonoid generated by T and U (Proposition 8.36). This proves the lemma.

Theorem 8.38. The groups PSL(2, ℤ) and SL(2, ℤ) have a presentation with R and S as generators and S² = R³ = 1 or S⁴ = R⁶ = S²R³ = 1, respectively, as defining relations. Consequently, PSL(2, ℤ) is the free product of ℤ/2ℤ and ℤ/3ℤ, and SL(2, ℤ) is the amalgamated product of ℤ/4ℤ with ℤ/6ℤ such that the amalgamation is over their respective cyclic subgroups of order 2.

Proof: The free product of the two cyclic groups ℤ/2ℤ and ℤ/3ℤ has a presentation G′ = {r, s}∗/{s² = r³ = 1}. The amalgamated product of ℤ/4ℤ and ℤ/6ℤ, with amalgamation of their respective cyclic subgroups of order 2, has a presentation G = {r, s}∗/{s⁴ = r⁶ = 1, s² = r³}. Consider the homomorphisms φ : G → SL(2, ℤ) and φ′ : G′ → PSL(2, ℤ) induced by r ↦ R and s ↦ S. Both homomorphisms are surjective by Lemma 8.37. Now, let w ∈ {r, s}∗ with φ(w) = 1 and φ′(w) = 1, respectively. If w ∈ s∗ then it follows already that w = 1 in G and w = 1 in G′, respectively. The analogous statement holds for w ∈ r∗.

Each element of G (respectively G′), which is not in the subgroup generated by a conjugate of s or r, can be written as a word

\[ w = s^{j_0} r^{i_1} (s r^{i_2}) \cdots (s r^{i_m}) s^{j_m} \]

with m ⩾ 1 and 0 ⩽ j_0 ⩽ 3 (respectively 0 ⩽ j_0 ⩽ 1), 0 ⩽ j_m ⩽ 1, and 1 ⩽ i_k ⩽ 2 for 1 ⩽ k ⩽ m. Note that in G, we can move the factors s² and r³ to the left, and that we have s² = r³. The only difference between G and G′ is the range of j_0. The values for j_0 are 0, 1, 2, 3 in G and 0, 1 in G′. It is enough to show that φ′(w) ≠ 1 for such a word w. After a suitable conjugation with r or r², if necessary, we may assume that

\[ w = r^{i_1} s \cdots r^{i_{m-1}} s\, r^{i_m}, \quad m \geqslant 2, \]

with 1 ⩽ i_j ⩽ 2 for j = 1, . . . , m. Now, we apply a ping-pong argument: the modular group also operates on ℝ \ ℚ by virtue of the above formulas. We obtain S(z) = −1/z, R(z) = −1/(z + 1) and R²(z) = −(z + 1)/z. In particular, R and R² map positive values to negative ones, and S maps negative values to positive ones. Hence, if we apply φ′(w) to an arbitrary positive, irrational number z like √2, then we obtain a negative value for φ′(w)(z). This implies φ′(w) ≠ 1.
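The ping-pong step can be checked with exact integer arithmetic for short words. The sketch below is an illustration under assumptions of this example (matrices as nested tuples, hypothetical function names): it first confirms the relations S² = −1, R³ = −1, −R = ST and RS = U⁻¹, and then enumerates the alternating words r^{i_1} s ⋯ s r^{i_m} up to a chosen length. For each such word it tests that the product matrix maps every positive real number to a negative one, which for a matrix with rows (a, b) and (c, d) of determinant 1 amounts to the two rows having opposite weak signs.

```python
from itertools import product

IDENT = ((1, 0), (0, 1))
NEG_IDENT = ((-1, 0), (0, -1))
S = ((0, 1), (-1, 0))
R = ((0, -1), (1, 1))
T = ((1, 1), (0, 1))
U_INV = ((1, 0), (-1, 1))


def mul(M, N):
    """Product of two 2x2 integer matrices given as nested tuples."""
    (a, b), (c, d) = M
    (e, f), (g, h) = N
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def neg(M):
    """Entrywise negation."""
    return tuple(tuple(-x for x in row) for row in M)


def alternating_product(exponents):
    """Matrix of r^{i_1} s r^{i_2} s ... s r^{i_m} under r -> R, s -> S."""
    result = IDENT
    for k, i in enumerate(exponents):
        if k > 0:
            result = mul(result, S)
        for _ in range(i):
            result = mul(result, R)
    return result


def maps_positive_to_negative(M):
    """True if (az + b)/(cz + d) < 0 for every real z > 0."""
    (a, b), (c, d) = M
    return (min(a, b) >= 0 and max(c, d) <= 0) or (max(a, b) <= 0 and min(c, d) >= 0)


def ping_pong_holds(max_m):
    """Check all alternating words with 2 <= m <= max_m and exponents in {1, 2}."""
    for m in range(2, max_m + 1):
        for exps in product((1, 2), repeat=m):
            M = alternating_product(exps)
            if M in (IDENT, NEG_IDENT) or not maps_positive_to_negative(M):
                return False
    return True
```

Such a finite check is of course no substitute for the argument in the proof; it only confirms the sign pattern on the words it enumerates.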
Since T and U generate a free submonoid in the group SL(2, ℤ), we might suspect that they generate a free subgroup in SL(2, ℤ). But this is in contradiction to Lemma 8.37 because SL(2, ℤ) contains elements of order 4. Nevertheless, both SL(2, ℤ) and PSL(2, ℤ) contain all free groups with finite or countable rank. To show this, it is enough to prove the existence of a free subgroup of rank 2. For this purpose, we consider the following two matrices in SL(2, ℤ):

\[ A = RSR = \begin{pmatrix} 0 & -1 \\ 1 & 2 \end{pmatrix} \qquad B = SRSRS = \begin{pmatrix} -2 & 1 \\ -1 & 0 \end{pmatrix} \]
Corollary 8.39. The matrices A and B generate a free subgroup of rank 2 of the modular group PSL(2, ℤ) and, therefore, also of the group SL(2, ℤ).

Proof: Let W ∈ PSL(2, ℤ) be a nonempty reduced word in A, A⁻¹, B and B⁻¹, written in terms of R and S. By Theorem 8.38, we may write PSL(2, ℤ) as {R, S}∗/{S² = R³ = 1}, and we obtain normal forms by cancelling the factors S² and R³ in W. The normal forms for A, A⁻¹, B, and B⁻¹ are A = RSR, A⁻¹ = R²SR², B = SRSRS, and B⁻¹ = SR²SR²S. We have to show that W ≠ 1 in PSL(2, ℤ). For this purpose, it is enough to prove by induction that the last letter X ∈ {A, A⁻¹, B, B⁻¹} of W determines the last three letters of the normal form of W in {R, S}∗: for X = A it is RSR, for X = A⁻¹ it is SR², for X = B it is SRS, and for X = B⁻¹ it is R²S. In particular, the normal form is not the empty word, and therefore W ≠ 1 in PSL(2, ℤ).

Remark 8.40. The embedding of the free group of rank 2 in the group SL(2, ℤ) gives another proof of Proposition 8.12. Since SL(2, ℤ) is residually finite, free groups are residually finite, too. In order to see this, it is enough to consider free groups of finite rank; and free groups of finite rank can be embedded into a free group of rank 2 by Exercise 8.8, which appears as a subgroup of SL(2, ℤ). ◊

The Baumslag–Solitar group BS(2, 3) from Example 8.8 is not Hopfian by Exercise 8.10. Hence, it is not residually finite by Exercise 8.9. Therefore, in contrast to free groups, BS(2, 3) cannot be embedded into SL(2, ℤ) or into any other linear group.
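As a sanity check of Corollary 8.39, one can multiply out all short reduced words in A, A⁻¹, B, B⁻¹ and confirm that none of them equals ±1. The following sketch is illustrative only; the encoding of A⁻¹ and B⁻¹ by the lowercase letters `a` and `b`, the nested-tuple matrices, and the function names are assumptions of this example. The matrices A and B are computed from R and S rather than typed in.

```python
from itertools import product

IDENT = ((1, 0), (0, 1))
NEG_IDENT = ((-1, 0), (0, -1))
S = ((0, 1), (-1, 0))
R = ((0, -1), (1, 1))


def mul(M, N):
    """Product of two 2x2 integer matrices given as nested tuples."""
    (a, b), (c, d) = M
    (e, f), (g, h) = N
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def inv(M):
    """Inverse of a matrix of determinant 1."""
    (a, b), (c, d) = M
    return ((d, -b), (-c, a))


def mul_all(*factors):
    """Product of several matrices, taken from left to right."""
    result = IDENT
    for M in factors:
        result = mul(result, M)
    return result


A = mul_all(R, S, R)
B = mul_all(S, R, S, R, S)
# Lowercase letters stand for the inverses (a convention of this sketch).
GENS = {"A": A, "a": inv(A), "B": B, "b": inv(B)}


def reduced_words(length):
    """All words of the given length without a factor X X^-1."""
    for letters in product(GENS, repeat=length):
        if all(x != y.swapcase() for x, y in zip(letters, letters[1:])):
            yield letters


def no_relation_up_to(max_length):
    """True if no nonempty reduced word of length <= max_length is +1 or -1."""
    return all(
        mul_all(*(GENS[x] for x in w)) not in (IDENT, NEG_IDENT)
        for n in range(1, max_length + 1)
        for w in reduced_words(n)
    )
```

Excluding both the identity and its negative means that the check is carried out in PSL(2, ℤ), as in the proof; freeness in SL(2, ℤ) then follows.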
Exercises 8.1. Let M = Σ∗ /S for a finite length reducing and confluent semi-Thue system S. Show that the word problem for M is decidable in linear time. 8.2. Prove Proposition 8.11, that is, show that all residually finite and finitely presented monoids have a decidable word problem. 8.3. Let (Σ, I) be a finite, undirected graph and let M(Σ, I) = Σ∗ / { ab = ba | (a, b) ∈ I }
282 | 8 Discrete infinite groups
be the corresponding free partially commutative monoid. A transitive orientation of (Σ, I) is a subset I^+ ⊆ I with I = { (a, b), (b, a) | (a, b) ∈ I^+ } such that (a, b), (b, c) ∈ I^+ implies (a, c) ∈ I^+. Show:
(a) Let I^+ ⊆ I be a transitive orientation of (Σ, I). Then the semi-Thue system S^+ = { ba → ab | (a, b) ∈ I^+ } is convergent.
(b) Let S ⊆ Σ^2 × Σ^2 be a convergent semi-Thue system with Σ∗/S = M(Σ, I). Then I^+ = { (a, b) ∈ Σ × Σ | ab ∈ IRR(S) } is a transitive orientation of (Σ, I).
(c) Let M(Σ, I) and M(Σ′, I′) be isomorphic, that is, M(Σ, I) ≅ M(Σ′, I′). Then the graphs (Σ, I) and (Σ′, I′) are isomorphic.
(d) Let M(Σ, I) ≅ (ℕ × ℕ) ∗ ℕ. Then the transposition relation (uυ, υu) in M(Σ, I) is not transitive.
(e) How many vertices does the smallest graph without a transitive orientation contain?
8.4. Let C(Σ, I) = Σ∗/{ a^2 = 1, ab = ba | (a, b) ∈ I } be a finitely generated right-angled Coxeter group.
(a) Construct a convergent semi-Thue system S_RACG ⊆ Σ∗ × Σ∗, which represents C(Σ, I) such that almost all rules are length preserving.
(b) Let G = G(V, E) be a graph group. Find an embedding of G in a Coxeter group C = C(Σ, I).
(c) For (Σ, I) let ℱ be the set of cliques, that is, ℱ = { F ⊆ Σ | ∀a, b ∈ F : (a, b) ∈ I }. Consider the finite semi-Thue system T ⊆ ℱ∗ × ℱ∗ with the rules
  F F′ → (F \ {a})(F′ \ {a})   for a ∈ F ∩ F′
  F F′ → (F ∪ {a})(F′ \ {a})   for a ∈ F′ \ F and F ∪ {a} ∈ ℱ
  ∅ → 1
Show: T is convergent and ℱ∗/T is isomorphic to the Coxeter group C(Σ, I).
8.5. Show that the class of rational sets is closed under homomorphisms, but not closed under inverse homomorphisms. Hint: For the nonclosure use the standard fact that { a^n b^n | n ∈ ℕ } is not rational (i.e., regular) for a two letter alphabet {a, b} and that the family RAT({a, b}∗) is closed under intersection.
8.6. Let M = {a, b, c}∗/{ac = ca, bc = cb} be the direct product of the free monoids c∗ and {a, b}∗. Show that the family of rational sets RAT(M) is not closed under intersection. Hint: Reuse ideas from Exercise 8.5.
8.7. Show that in free groups, the intersection of two finitely generated subgroups is finitely generated. Hint: Use that the set of reduced words which belong to a finitely generated subgroup is regular.
8.8. Let F({a, b}) = F_2 be the free group with two generators. Show that the set U = { a^n b a^{−n} | n ∈ ℤ } ⊆ F_2 is the basis of a free subgroup. In particular, F_2 contains each free group of finite or countable rank as a subgroup.
8.9. Show that finitely generated, residually finite groups are Hopfian.
8.10. Show that the Baumslag–Solitar group BS(2, 3) = {a, t}∗/{t a^2 t^{−1} = a^3} is not Hopfian.
8.11. Show that, during the normal form computation (corresponding to Example 8.20) in the group BG(1, 2), extremely long words may arise; for instance, words of the form a^{τ(n)}. Here τ : ℕ → ℕ is the tower function defined by τ(0) = 1 and τ(n + 1) = 2^{τ(n)}.
8.12. Let G be a non-Abelian finite group and let Z(G) denote its center, that is, Z(G) = { g ∈ G | ∀h ∈ G : gh = hg }. Show that the index of Z(G) is at least 4. Using this fact, answer the following question. “What is the probability that two group elements commute?” More precisely, we seek an upper bound for the probability in non-Abelian finite groups, which is tight for certain groups. The answer has been given in a paper (with the question as its title) published by Gustafson in 1973.
8.13. Use Theorem 6.5 for proving Corollary 8.30: commuting elements in a free group lie in a common cyclic subgroup.
8.14. In this exercise, we will show that the automorphism group of a finitely generated free group F(Σ) is generated by at most four elements.
(a) For a ∈ Σ, we define the automorphism i_a by i_a(a) = a^{−1} and i_a(c) = c for a ≠ c ∈ Σ. For b ∈ Σ with a ≠ b, we further define the automorphisms λ_ab and ρ_ba by λ_ab(b) = ab, ρ_ba(b) = ba^{−1} and λ_ab(c) = ρ_ba(c) = c for b ≠ c ∈ Σ. (Recall that the automorphisms i_a and λ_ab are referred to as the elementary Nielsen transformations.) First, show that the automorphisms ρ_ba may be expressed by elementary Nielsen transformations. Then use Theorem 8.34 to show that the automorphism group of F(Σ) can be generated by the elementary Nielsen transformations.
(b) Show that the automorphism group of F(Σ) is generated by at most four elements. Moreover, show that three generators are sufficient if F(Σ) has rank 2.
8.15. By Proposition 8.36, all words over the alphabet {T, U} (and hence also all bit sequences in {0, 1}∗) may be encoded as 2 × 2-matrices with entries in ℕ. Moreover, each matrix in SL(2, ℕ) describes a unique word in {T, U}∗. Now, let W ∈ {T, U}∗ be a word of length ℓ and let φ(W) = \begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix} ∈ SL(2, ℕ) be the corresponding matrix product. Show that a_i ⩽ F_{ℓ+1} for i = 1, 2, 3, 4, where F_{ℓ+1} is the (ℓ + 1)th Fibonacci number. As usual, F_0 = 0, F_1 = 1, and F_{n+1} = F_n + F_{n−1} for n ⩾ 1.
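The bound in Exercise 8.15 is easy to probe numerically before proving it. The sketch below assumes the usual generators T = (1 1; 0 1) and U = (1 0; 1 1) of SL(2, ℕ); this section does not restate them, so that choice is an assumption, as are all function names.

```python
from itertools import product

# Assumed generators of SL(2, N); they are not restated in this section.
T = ((1, 1), (0, 1))
U = ((1, 0), (1, 1))

def mul(x, y):
    """Product of two 2x2 integer matrices given as nested tuples."""
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def phi(word):
    """Matrix product described by a word over the letters 'T' and 'U'."""
    m = ((1, 0), (0, 1))
    for letter in word:
        m = mul(m, T if letter == "T" else U)
    return m

def fib(n):
    """Fibonacci numbers with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def bound_holds(max_len):
    """Check that every entry of phi(W) is at most F_(l+1) for all |W| = l <= max_len."""
    for length in range(max_len + 1):
        limit = fib(length + 1)
        for word in product("TU", repeat=length):
            if max(max(row) for row in phi(word)) > limit:
                return False
    return True
```

For instance, `phi("TUTU")` has largest entry 5 = F_5, so under the assumed generators the bound is attained by alternating words.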
8.16. Let A ∈ PSL(2, ℤ) be an element in the modular group of order 2 and A(z) = (az + b)/(cz + d). Furthermore, let S ∈ PSL(2, ℤ) with S(z) = −1/z.
(a) Show that a + d = 0.
(b) Show that A and S are conjugate in PSL(2, ℤ).
Hint: Use the normal form R^{i_1} S ⋯ R^{i_{m−1}} S R^{i_m}, m ⩾ 2, 1 ⩽ i_j ⩽ 2 for j = 1, …, m and i_1 = i_m, as in the proof of Theorem 8.38.
8.17. Let n ∈ ℕ, n ⩾ 1. Prove Fermat’s theorem on sums of two squares:
(a) If −1 is a quadratic residue modulo n, that is, −1 ≡ q^2 mod n for some q ∈ ℤ, then n is a sum of two squares in ℤ, that is, n = x^2 + y^2 with x, y ∈ ℤ. Hint: If −1 ≡ q^2 mod n, then there exists p ∈ ℤ with q^2 + pn = −1. Form A(z) = (qz + n)/(pz − q). By Exercise 8.16 (b) the element A ∈ PSL(2, ℤ) is conjugate to S, that is, there exists X ∈ PSL(2, ℤ) with X(z) = (xz + y)/(uz + υ) and XSX^{−1} = A.
(b) If n = x^2 + y^2 with x, y ∈ ℤ and gcd(x, y) = 1, then −1 is a square modulo n.
(c) Use Exercises 8.17 (a) and 8.17 (b) for showing that a prime number p is a sum of two squares if and only if p = 2 or p ≡ 1 mod 4.
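The statement of Exercise 8.17 (c) can be tested by brute force for small primes. The following sketch is only a sanity check under that finite range; the helper names are hypothetical and `math.isqrt` needs Python 3.8 or later.

```python
from math import isqrt

def is_prime(n):
    """Trial division; sufficient for the small range used here."""
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))

def two_squares(n):
    """Return (x, y) with x <= y and x^2 + y^2 = n, or None if no such pair exists."""
    for x in range(isqrt(n) + 1):
        y = isqrt(n - x * x)
        if x <= y and x * x + y * y == n:
            return (x, y)
    return None

def fermat_holds_below(limit):
    """Check: a prime p is a sum of two squares iff p = 2 or p = 1 mod 4."""
    return all(
        (two_squares(p) is not None) == (p == 2 or p % 4 == 1)
        for p in range(2, limit)
        if is_prime(p)
    )
```

For example, `two_squares(13)` finds 13 = 2^2 + 3^2, while `two_squares(7)` returns `None`, in line with 7 ≡ 3 mod 4.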
Summary
Notions
– word problem
– conjugacy problem
– isomorphism problem
– residually finite monoid
– Hopfian group
– presentations of monoids and groups
– rewriting system
– confluence, termination, convergence
– Church–Rosser property
– semi-Thue system
– free group
– free product
– Baumslag–Solitar group BS(p, q)
– Waack group W
– Tietze transformation
– free partially commutative monoid
– graph group
– semidirect product
– amalgamated product
– HNN extension
– Britton reduction
– Baumslag group
– rational set
– rank
– (kernel of the) Schreier graph
– Whitehead automorphism
– Nielsen transformation
– graphical realization
– special linear group SL(2, ℤ)
– modular group PSL(2, ℤ)
Methods and results
– Finitely generated residually finite groups are Hopfian.
– Strong confluence ⇒ confluence.
– Confluence ⇔ Church–Rosser.
– Local confluence and termination ⇒ confluence.
– Local confluence can be tested based on critical pairs.
– If S ⊆ Γ∗ × Γ∗ is a finite convergent semi-Thue system, then the word problem in Γ∗/S is decidable.
– Residually finite, finitely presented monoids have a decidable word problem.
– Free groups are residually finite.
– Trace monoids embed into RAAGs and their word problem can be solved in linear time.
– Convergent systems for amalgamated products and HNN extensions.
– Groups embed themselves into their amalgamated products and HNN extensions.
– 1 ≠ w ∈ G ∗_U H or 1 ≠ w ∈ HNN(G; A, B, φ) ⇒ w is Britton reducible.
– Subgroups are rational subsets if and only if they are finitely generated.
– Benois: rational subsets of a finitely generated free group form an effective Boolean algebra.
– Free groups are isomorphic if and only if their bases have the same cardinality.
– If K(G, Σ) is the kernel of the Schreier graph with edge set E and spanning tree T, then G is isomorphic to the free group F(E^+ \ T).
– Nielsen–Schreier theorem: subgroups of free groups are free.
– If F is a finitely generated free group and |G \ F| = n < ∞, then the rank formula (rank(F) − 1) ⋅ n = rank(G) − 1 holds.
– Word / conjugacy / isomorphism problems are decidable for finitely generated free groups.
– Aut(F) is finitely generated by the Whitehead automorphisms.
– The monoid SL(2, ℕ) is free with basis {T, U}.
– SL(2, ℤ) = {R, S}∗/{S^4 = R^6 = S^2 R^3 = 1} and PSL(2, ℤ) = {R, S}∗/{S^2 = R^3 = 1} (ping–pong argument)
– Countable free groups are subgroups of PSL(2, ℤ) and SL(2, ℤ).
Solutions to exercises
Chapter 1
1.1. Suppose that e and f are neutral. Then, e = ef = f.
1.2. Suppose that y and z are inverses of x, in particular, yx = 1 = xz. Then, y = y ⋅ 1 = yxz = 1 ⋅ z = z.
1.3. We consider the mapping δ : M → M with δ(x) = ax. It is injective since δ(x) = δ(y) implies x = 1 ⋅ x = bax = b ⋅ δ(x) = b ⋅ δ(y) = bay = 1 ⋅ y = y. Then, δ is also surjective because M is finite. Let c be the element in M that is mapped to 1 ∈ M, that is, 1 = δ(c) = ac. Then, b = b ⋅ 1 = bac = 1 ⋅ c = c and therefore ab = 1.
1.4. The set of all mappings f : ℕ → ℕ forms an infinite monoid with composition as operation. The neutral element is the identity mapping id with id(n) = n. Let a : ℕ → ℕ with a(0) = 0 and a(n) = n − 1 for n ⩾ 1. For the mapping b : ℕ → ℕ with b(n) = n + 1, we have a ∘ b = id, but from (b ∘ a)(0) = b(a(0)) = b(0) = 1, we obtain b ∘ a ≠ id.
1.5. Define c = bab. Then, aca = ababa = aba = a and cac = bababab = babab = bab = c.
1.6 (a) Since S is finite, there are t, p ⩾ 1 with x^t = x^{t+p}. It follows x^j = x^{j+ℓp} for all j ⩾ t and ℓ ⩾ 0.
[Figure: the powers x, x^2, …, x^{t−1} lead into a cycle x^t = x^{t+p}, x^{t+1}, x^{t+2}, …, x^{t+p−1}; each arrow is labeled ⋅x.]
Let k be minimal such that t ⩽ kp < t + p. Then, x^{kp} is idempotent since x^{kp+ip} = x^{kp} for all i ⩾ 0. Thus, {x^t, x^{t+1}, …, x^{t+p−1}} is a cyclic subgroup of order p which has the idempotent element x^{kp} as a neutral element; and we have 1 ⩽ kp ⩽ |S|. There is only one power of x which is idempotent: it is x^{kp}. Indeed, let x^m and x^n be idempotent. Then, x^n = (x^n)^2 = (x^n)^3 = ⋯ and thus x^m = (x^m)^n = x^{mn} = (x^n)^m = x^n.
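The construction in Exercise 1.6 (a) can be traced on a concrete finite semigroup. The sketch below is an illustration only: it finds t and p with x^t = x^{t+p} by listing powers, then returns the idempotent power x^{kp}. The multiplicative semigroup ℤ/12ℤ and all names are assumptions chosen for the example.

```python
def index_and_period(x, mul):
    """Return (t, p) with t, p >= 1 minimal such that x^t = x^(t+p)."""
    seen = {}            # maps a power of x to its exponent
    power, k = x, 1
    while power not in seen:
        seen[power] = k
        power = mul(power, x)
        k += 1
    t = seen[power]      # x^k equals the earlier power x^t
    return t, k - t

def idempotent_power(x, mul):
    """Return (kp, x^(kp)) where k is minimal with t <= kp < t + p."""
    t, p = index_and_period(x, mul)
    k = -(-t // p)       # ceiling of t / p
    e = x
    for _ in range(k * p - 1):
        e = mul(e, x)
    return k * p, e

def mul12(a, b):
    """Example operation: multiplication modulo 12."""
    return a * b % 12
```

With `mul12`, the powers of 2 are 2, 4, 8, 4, …, so t = 2 and p = 2, and x^2 = 4 is the idempotent power since 4 ⋅ 4 ≡ 4 mod 12.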
1.6 (b) Exercise 1.6 (a) shows that x^ℓ is idempotent for some 1 ⩽ ℓ ⩽ |S|. This implies x^{mℓ} = x^ℓ for all m ⩾ 1. Therefore, x^{|S|!} is idempotent for all x ∈ S.
1.7. Let x ∈ H and g = φ(x) ∈ G. Then, e = x^n ∈ H is idempotent for some n ⩾ 1 by Exercise 1.6 (b). We have φ(e) = 1 and therefore φ(eHe) = G. Since eHe ⊆ H and H is minimal, we conclude eHe = H. The element e is neutral in eHe, hence neutral in H. Finally, consider the subset xH of M. It is a subsemigroup with φ(xH) = gG = G. Hence, xH = H by minimality of H. Thus, there is some y ∈ H with xy = e. By Exercise 1.3, we also have yx = e. This shows that H is a group.
1.8 (a) Let e, f be idempotent and x = \overline{ef}. We have ef ⋅ xe ⋅ ef = ef ⋅ x ⋅ ef = ef and xe ⋅ ef ⋅ xe = (x ⋅ ef ⋅ x) ⋅ e = xe. This shows xe = x. Symmetrically, we obtain fx = x. Now, x^2 = xe ⋅ fx = x.
1.8 (b) If e is idempotent, then eee = e and hence \overline{e} = e. Since \overline{\overline{x}} = x for all elements x, Exercise 1.8 (a) shows that ef is idempotent whenever e, f are idempotent. In particular, the idempotent elements form a subsemigroup. Finally, ef ⋅ fe ⋅ ef = ef ⋅ ef = ef and fe ⋅ ef ⋅ fe = fe ⋅ fe = fe. This yields fe = \overline{ef} = ef. Hence, the subsemigroup of the idempotents is commutative.
1.9. Let a_{n+1} be an arbitrary element in S. Consider the n + 1 elements a_1 ⋯ a_i for 1 ⩽ i ⩽ n + 1. By the pigeonhole principle, we have a_1 ⋯ a_i = a_1 ⋯ a_j for some 1 ⩽ i < j ⩽ n + 1. In particular, i ⩽ n. For b = a_{i+1} ⋯ a_j, we obtain a_1 ⋯ a_i = a_1 ⋯ a_i b, as desired.
1.10. Let X be a finite set with 1_U ∉ X. Then, U_X = X ∪ {1_U} with the operation ab = b for b ∈ X and ab = a for b = 1_U forms a monoid with |X| + 1 elements. For Y ⊆ X, U_Y is a submonoid of U_X. We choose X such that |X| is even and Y ⊆ X such that |Y| is odd. Now M = U_X and U = U_Y meet the requirements of the exercise.
1.11.
Show associativity of ∘ as follows: ((x, s, y) ∘ (x′, s′, y′)) ∘ (x″, s″, y″) = (x, sφ(y, x′)s′, y′) ∘ (x″, s″, y″) = (x, sφ(y, x′)s′φ(y′, x″)s″, y″) = (x, s, y) ∘ (x′, s′φ(y′, x″)s″, y″) = (x, s, y) ∘ ((x′, s′, y′) ∘ (x″, s″, y″)).
1.12. Let G be a group and x^2 = x for some x ∈ G. Let y be the inverse of x, that is, xy = 1. Then, x = xxy = xy = 1. Next, let M be a finite monoid and let 1 be the only idempotent. Then, as each x has some idempotent power x^ω with ω ⩾ 1 (see Exercise 1.6 (a)), we conclude that x^ω = 1. Hence, x^{ω−1} is an inverse of x.
1.13 (a) First, we show that every left inverse element is also right inverse. Let h be the left inverse element of g and g′ the left inverse element of h. Then, hg = e and g′h = e and therefore gh = egh = g′hgh = g′eh = g′h = e. Thus, h is the inverse of g. Further, we conclude ge = ghg = eg = g.
1.13 (b) Let G be any set with |G| > 1. Define the operation by xy = x for all x, y ∈ G. Then each element e ∈ G satisfies the conditions but G is not a group.
1.13 (c) Let a ∈ G be fixed. There exists e ∈ G with ea = a. For all b ∈ G there exists x ∈ G such that b = ax. Thus, eb = eax = ax = b and e is left neutral. Left inverse elements exist as solutions of ya = e. Thus, G is a group by Exercise 1.13 (a).
1.14 (a) (i) ⇒ (ii): We have 1 ∈ S, so S ≠ ∅. Moreover, for all y ∈ S, also y^{−1} is in S. Therefore, we have xy^{−1} ∈ S for all x, y ∈ S. (ii) ⇒ (i): S ≠ ∅, so there exists x ∈ S. Thus also 1 = x ⋅ x^{−1} ∈ S and furthermore x^{−1} = 1 ⋅ x^{−1} ∈ S. The closure of S under the group operation follows from xy = x(y^{−1})^{−1} ∈ S. Consequently, S is a subgroup of G.
1.14 (b) (i) ⇒ (ii): This follows because S is closed under the group operation and 1 ∈ S. (ii) ⇒ (i): Let x ∈ S. Since S is finite and x^i ∈ S for all i > 0, the order r of x is finite. Therefore, 1 = x^r ∈ S and x^{−1} = x^{r−1} ∈ S. Thus, S is a group.
1.14 (c) Let G be the additive group ℤ and S = ℕ. Then, S is closed under addition but S is not a group.
1.15. (i) ⇒ (ii) is trivial. (ii) ⇒ (i): We have φ(1_G) = φ(1_G ⋅ 1_G) = φ(1_G)φ(1_G). By Exercise 1.12, we have φ(1_G) = 1_H. Moreover, φ(x^{−1}) = φ(x^{−1})φ(x)φ(x)^{−1} = φ(x^{−1}x)φ(x)^{−1} = 1_H ⋅ φ(x)^{−1} = φ(x)^{−1}.
1.16. By m^2 = 1, the element m itself is the inverse of m ∈ M. Furthermore, mn = m ⋅ 1 ⋅ n = m(mn)(mn)n = mm(nm)nn = 1 ⋅ nm ⋅ 1 = nm for all m, n ∈ M. Thus, M is a commutative group.
1.17. Let U be the set of elements having odd order. We have 1 ∈ U, thus U is not empty. We show that U forms a subgroup of G. Let x, y ∈ U and let k be the order of x and ℓ the order of y. Since G is commutative, we have (xy^{−1})^{kℓ} = x^{kℓ} y^{−kℓ} = (x^k)^ℓ (y^ℓ)^{−k} = 1. So, the order of xy^{−1} divides kℓ. Since kℓ is odd, so is the order of xy^{−1} and therefore xy^{−1} ∈ U. Exercise 1.14 (a) yields the desired result.
1.18. Since H is a subgroup, we have HH = H = H^{−1}. Therefore, we can conclude from xH = yH that xHx^{−1} = xHHx^{−1} = xHH^{−1}x^{−1} = xH(xH)^{−1} = yH(yH)^{−1} = yHH^{−1}y^{−1} = yHHy^{−1} = yHy^{−1}.
1.19 (a) Let g ∈ G.
From [G : H] = n, we obtain |{g i H | i ⩾ 0}| ⩽ n. So, there are 0 ⩽ j < k ⩽ n with g j H = g k H. Multiplying on the left by g−j yields H = g k−j H and thus g k−j ∈ H. Note that i = k − j satisfies 1 ⩽ i ⩽ n. 1.19 (b) We consider the mapping π : G → S G/H into the symmetric group, where π(g) : G/H → G/H is the mapping xH → gxH. The mapping π is a homomorphism and ker(π) = N. Thus, N is a normal subgroup and |G/N| ⩽ n! by the homomorphism theorem. It remains to show that N is the largest normal subgroup contained in H. For this purpose, let K ⊆ H be a normal subgroup of G. Then, K = xKx−1 ⊆ xHx−1 for all x ∈ G. But then, we also have K ⊆ ⋂ x∈G xHx−1 = N. 1.20. Let D4 be the symmetry group of a regular rectangle with vertex set {0, 1, 2, 3} as in Section 1.2. This group has eight elements and is generated by a rotation
δ : n ↦ n + 1 mod 4 and a reflection σ. Remember that σ leaves the vertices 0 and 2 fixed and swaps 1 and 3. Let τ be the reflection that leaves the vertices 1 and 3 fixed and interchanges 0 and 2. The subgroup K = ⟨σ, τ⟩ has index 2 in D4 and is therefore normal. (The group K is isomorphic to the Klein four-group.) Clearly, ⟨σ⟩ is a normal subgroup of K. Finally, we have δσδ^{−1} = τ ∉ ⟨σ⟩ and thus ⟨σ⟩ is not a normal subgroup of D4.
1.21. First, let G be commutative. Then, f is a group homomorphism because f(x ⋅ y) = (x ⋅ y)^{−1} = y^{−1} ⋅ x^{−1} = x^{−1} ⋅ y^{−1} = f(x) ⋅ f(y). Now, let f be a group homomorphism. Then, x ⋅ y = (y^{−1} ⋅ x^{−1})^{−1} = f(y^{−1} ⋅ x^{−1}) = f(y^{−1}) ⋅ f(x^{−1}) = y ⋅ x. We conclude that G is commutative.
1.22. Let n be the order of a. Then, f(a)^n = f(a^n) = f(1) = 1. Thus, the order of f(a) divides n.
1.23. Let 0 ≠ r ∈ R. Then the mapping s ↦ sr is injective since sr = s′r implies (s − s′)r = 0 and thus s − s′ = 0. Due to the finiteness of R, the mapping is also surjective. Therefore, there is a left inverse s of r such that sr = 1. Exercise 1.13 (a) shows that (R \ {0}, ⋅, 1) is a group and thus R is a skew field. Remark: Wedderburn’s theorem (named after Joseph Henry Maclagan Wedderburn, 1882–1942) states that every finite skew field is already a field, which means that R, under the given circumstances, is even commutative. Ernst Witt (1911–1991) found a proof in the early 1930s, which even nowadays is still considered to be the simplest one for this result [1].
1.24. (i) ⇒ (ii): Let I be an ideal. Then, (I, +, 0) is a normal subgroup of (R, +, 0), and by Theorem 1.10, the quotient (R/I, +, I) is a group. Moreover, for the product of the sets r1 + I and r2 + I, we have that (r1 + I) ⋅ (r2 + I) ⊆ r1 r2 + r1 I + r2 I + I ⋅ I ⊆ r1 r2 + I + I + I = r1 r2 + I. Let r1 + I = s1 + I and r2 + I = s2 + I. Then there are i1, i2 ∈ I such that s1 = r1 + i1 and s2 = r2 + i2.
Thus, we obtain (s1 + I)(s2 + I) ⊆ s1 s2 + I = (r1 + i1 )(r2 + i2 ) + I = r1 r2 + r1 i2 + r2 i1 + i1 i2 + I ⊆ r1 r2 + r1 I + r2 I + I + I ⊆ r1 r2 + I. This shows that the multiplication given by (r1 + I, r2 + I) → r + I with (r1 + I)(r2 + I) ⊆ r + I is well defined. The associativity of multiplication and the distributive law on R/I now follow from the corresponding laws on R. (ii) ⇒ (iii): Let R/I be a ring. Then, φ : R → R/I, r → r + I by Theorem 1.10 is a group homomorphism with respect to addition and its kernel is I. It remains to show that φ is a ring homomorphism. We have φ(1) = 1 + I and 1 + I is the neutral element with respect to multiplication in R/I. Furthermore φ(r1 )φ(r2 ) = (r1 + I)(r2 + I) = r1 r2 + I = φ(r1 r2 ). (iii) ⇒ (i): Let φ : R → S be a homomorphism of commutative rings with kernel I. By Theorem 1.10 I is a normal subgroup of R and thus in particular a subgroup. For r ∈ R, we have φ(rI) = φ(r)φ(I) = φ(r) ⋅ {0} = {0}. Thus, rI is a subset of the kernel I of φ. This shows that I is an ideal.
1.25. We consider the mapping φ : R/I → R/J, r + I ↦ r + J. This mapping is well defined and surjective since I ⊆ J. From (r1 + J) ⋅ (r2 + J) = r1 r2 + J, we see that φ is a homomorphism. Let φ(r + I) = J. This is the case if and only if r + J = J. And this, in turn, is the case if and only if r ∈ J. Therefore, ker(φ) = J/I. The claim now follows using the homomorphism theorem.
1.26. (i) If i1, i2 ∈ I and j1, j2 ∈ J, then (i1 + j1) + (i2 + j2) = (i1 + i2) + (j1 + j2) ∈ I + J. Hence, I + J is a commutative subgroup because the according properties transfer from I and J to I + J. Moreover, r1(i + j)r2 = r1 i r2 + r1 j r2 ∈ I + J for i ∈ I, j ∈ J and r1, r2 ∈ R. Thus, I + J is an ideal. (ii) Consider I = J = ⟨X, Y⟩ in R = F[X, Y] for a field F. We have X^3, Y^3 ∈ I ⋅ J. If I ⋅ J is an ideal, then X^3 + Y^3 ∈ I ⋅ J, that is, X^3 + Y^3 = (aX + bY)(cX + dY) = acX^2 + (ad + bc)XY + bdY^2 for suitable a, b, c, d ∈ F[X, Y]. But then the degree of ac must be 1. Without loss of generality let a = X. Then, dX + bc has to be 0. However, since neither b nor c can contain an X, there is no solution to the equation dX + bc = 0. So the (complex) product of two ideals, in general, is not an ideal itself. (iii) We consider I = ⟨X⟩ and J = ⟨Y⟩ in R = F[X, Y]. Then, X, Y ∈ I ∪ J, but X + Y ∉ I ∪ J. So in general, I ∪ J is not an ideal. (iv) The intersection of two groups is a group itself. It remains to show that R(I ∩ J)R ⊆ I ∩ J. Let i ∈ I ∩ J and r1, r2 ∈ R. Since I and J are ideals, we have r1 i r2 ∈ I and r1 i r2 ∈ J. So r1 i r2 ∈ I ∩ J, which shows that I ∩ J is an ideal.
1.27. Let I = ⟨6, X^2 − 2⟩. Assume that I = ⟨r(X)⟩. Then the degree of r must be less than or equal to the degree of 6 and, thus, the degree of r(X) is 0. Since 1, 2, 3 ∉ I, we obtain r(X) = ±6. Hence, X^2 − 2 ∉ ⟨r(X)⟩ = I, a contradiction. The quotient ring R/I is not a field because (2 + I)(3 + I) = 6 + I = I. Hence, I cannot be maximal.
1.28.
3^4 = 81 ≡ 1 mod 16, so the order of 3 divides 4. Since 3^2 = 9 ≢ 1 mod 16, the order is 4.
1.29. The group (ℤ/60ℤ)∗ contains φ(60) = φ(2^2) ⋅ φ(3) ⋅ φ(5) = 16 elements. The orders of the elements are divisors of 16. All divisors of 16 are even, except for the number 1. But 1 is the only element of order 1. Thus, there remain 15 elements with even order.
1.30. ℤ/mnℤ contains an element of order mn. In the group (ℤ/mℤ) × (ℤ/nℤ), however, for each element of order r, we know that r divides lcm(m, n). But from gcd(m, n) > 1, we conclude lcm(m, n) = mn/gcd(m, n) < mn.
1.31. Using the Euclidean algorithm, we obtain 98 = 2 ⋅ 51 − 4 and 51 = 13 ⋅ 4 − 1. Back substitution yields 1 = 13 ⋅ 4 − 51 = 13(2 ⋅ 51 − 98) − 51 = −13 ⋅ 98 + 25 ⋅ 51. Thus, s = 25 is the desired number.
1.32. We rewrite the congruences as n_i ≡ −1 mod 3, n_i ≡ −1 mod 4 and n_i ≡ −1 mod 7. This yields the potential solutions n_1 = −1 + 84 = 83 and n_2 = −1 + 2 ⋅ 84 = 167, because 84 = 3 ⋅ 4 ⋅ 7 and we know that modulo 84 the solution is unique. Since
the difference between two solutions must be a multiple of 84, the second smallest solution cannot be less than 167.
1.33. For all n ∈ ℕ, the number n^4 + n^2 is even and, thus, 2n^4 + 2n^2 = 4m for a suitable integer m. Therefore, 7^{2n^4+2n^2} = 7^{4m} = 49^{2m} ≡ (−11)^{2m} ≡ 121^m ≡ 1^m ≡ 1 mod 60.
1.34.
    z^4                        + 4 = (z^2 − 2z + 2)(z^2 + 2z + 2)
  − z^4 + 2z^3 − 2z^2
          2z^3 − 2z^2          + 4
        − 2z^3 + 4z^2 − 4z
                 2z^2 − 4z + 4
               − 2z^2 + 4z − 4
                             0
We obtain p(z) = z^2 + 2z + 2 and z^4 + 4 = p(−z) ⋅ p(z). Differentiation yields p′(z) = 2z + 2 and p″(z) = 2; so p has its only minimum at z = −1, and we have p(−1) = 1. Consequently, p(z) ⩾ 1 for all z ∈ ℤ. For z^4 + 4 to be prime, we must have p(z) = 1 or p(−z) = 1. This is the case if and only if z = −1 or z = 1. Both cases yield the value 5.
1.35. Let f = X^8 + X^7 + X^6 + X^4 + X^3 + X + 1 and g = X^6 + X^5 + X^3 + X. We compute gcd(f, g) using the Euclidean algorithm.
  f = g ⋅ (X^2 + 1) + X^4 + X^3 + 1
  g = (X^4 + X^3 + 1) X^2 + X^3 + X^2 + X
  X^4 + X^3 + 1 = (X^3 + X^2 + X) X + X^2 + 1
  X^3 + X^2 + X = (X^2 + 1)(X + 1) + 1
  X^2 + 1 = 1 ⋅ (X^2 + 1) + 0
Thus, gcd(f, g) = 1.
1.36. Suppose that f(X) is not irreducible over 𝔽2. Then, f(X) = g(X)h(X) over 𝔽2 with deg(g), deg(h) ∈ {1, 2, 3, 4} and deg(g) + deg(h) = 5. Since f(X) has no root in 𝔽2, we conclude deg(g), deg(h) ∈ {2, 3}. Without loss of generality let deg(g) = 2. The polynomials of degree 2 over 𝔽2 are X^2 + X + 1, X^2 + X, X^2 + 1 and X^2. But only X^2 + X + 1 is irreducible since each of the others has a root in 𝔽2. By the division algorithm for polynomials, we find f(X) = (X^2 + X + 1)(X^3 + X^2) + 1. So, X^2 + X + 1 is not a divisor of f(X) and therefore f(X) is irreducible.
1.37. If t = 1, we have f(X) = a_i X^i for i ∈ ℕ and a_i ≠ 0 because 0 ≠ f(X). So, there are no positive roots. Let therefore t ⩾ 2. In general, by division with an appropriate X^i, we may assume that a_0 ≠ 0. Then the derivative f′(X) is a (t − 1)-thin polynomial and by induction it has at most t − 2 roots. Between each pair of real roots of f(X), by Rolle’s theorem (named after Michel Rolle, 1652–1719), there is at least one root of f′(X). The claim follows.
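The divisions in the solutions to Exercises 1.35 and 1.36 can be replayed with a few lines of code. In this sketch a polynomial over 𝔽2 is stored as an integer whose bit i is the coefficient of X^i, so addition is XOR; this encoding and the function names are assumptions of the example. The polynomial X^5 + X^2 + 1 used for Exercise 1.36 is inferred from the division identity stated in that solution.

```python
def deg(a):
    """Degree of a nonzero polynomial over F_2 stored as a bit mask."""
    return a.bit_length() - 1

def polydivmod(a, b):
    """Quotient and remainder of a divided by b over F_2 (b must be nonzero)."""
    q = 0
    while a and deg(a) >= deg(b):
        shift = deg(a) - deg(b)
        q ^= 1 << shift          # add X^shift to the quotient
        a ^= b << shift          # subtract (= add) X^shift * b
    return q, a

def polygcd(a, b):
    """Euclidean algorithm over F_2."""
    while b:
        a, b = b, polydivmod(a, b)[1]
    return a

F = 0b111011011   # X^8 + X^7 + X^6 + X^4 + X^3 + X + 1
G = 0b1101010     # X^6 + X^5 + X^3 + X
F5 = 0b100101     # X^5 + X^2 + 1 (inferred, see lead-in)
```

Here `polydivmod(F, G)` reproduces the first line of the Euclidean scheme, and `polydivmod(F5, 0b111)` gives quotient X^3 + X^2 with remainder 1.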
1.38 (a) We obtain g(X) = ∑_i b_i X^i = ∑_i (a_{i−1} − λa_i) X^i. Without loss of generality, we assume that a_0 ≠ 0. Consider an index i − 1 with i ⩾ 1 after which in the sequence (a_0, …, a_d) a sign change occurs. Therefore, a_{i−1} ≠ 0 and for a_{i−1} < 0, we have a_i ⩾ 0 (respectively for a_{i−1} > 0 we have a_i ⩽ 0). It follows that at a sign change a_{i−1} always has the same sign as b_i. Now b_0 = −λa_0, so b_0 and a_0 have a different sign, but b_{d+1} = a_d ≠ 0. Thus, the number of sign changes must have increased.
1.38 (b) Let 0 < λ_1 ⩽ ⋯ ⩽ λ_k be the sequence of positive real roots with multiplicities. Then, we have f(X) = (X − λ_1) ⋯ (X − λ_k)h(X). By k-fold application of Exercise 1.38 (a) the claim follows.
1.39. For n = 0 the statement is clear. Now, let n ⩾ 1. Let r = s/t with s, t ∈ ℤ, t ≠ 0 and gcd(s, t) = 1. Now,
  f(s/t) = 0 = s^n/t^n + a_{n−1} s^{n−1}/t^{n−1} + ⋯ + a_0.
Then, s^n + a_{n−1} t s^{n−1} + ⋯ + a_0 t^n = 0. Thus, t divides s^n and therefore t = ±1 because gcd(s, t) = 1. Hence, r ∈ ℤ. Without loss of generality, assume that t = 1. We obtain s(s^{n−1} + a_{n−1} s^{n−2} + ⋯ + a_1) = −a_0, and thus s = r is a divisor of a_0.
1.40. We first show that if f(X) is irreducible over ℤ, then it is irreducible over ℚ. Assume that f(X) is reducible over ℚ. Then, there exist coprime numbers s, t ∈ ℕ \ {0} and polynomials g(X), h(X) ∈ ℤ[X] with f(X) = (s/t) g(X)h(X) such that the greatest common divisor of the coefficients of g(X) is 1 and the greatest common divisor of the coefficients of h(X) is also 1. Suppose there exists a prime number q such that q divides every coefficient of g(X)h(X). Then, g(X)h(X) mod q is the zero polynomial in (ℤ/qℤ)[X]; this is a contradiction since neither g(X) mod q nor h(X) mod q is zero. Hence, the greatest common divisor of the coefficients of g(X)h(X) is 1. Now, tf(X) = sg(X)h(X) yields t = ±1. This shows that f(X) is reducible over ℤ. It remains to show that f(X) is irreducible over ℤ. Let f(X) = g(X)h(X) with g(X) = b_r X^r + ⋯ + b_0 and h(X) = c_s X^s + ⋯ + c_0 for b_i, c_i ∈ ℤ. We have to show that r = 0 or s = 0. Since p is a prime divisor of a_0 = b_0 c_0 and p^2 is not, p either divides b_0 or c_0, but not both. Let p | b_0 and p ∤ c_0. Because of p ∤ a_n = b_r c_s, there exists a smallest index m > 0 with p ∤ b_m, but p | b_i for all i < m. Letting c_j = 0 for j > s, we obtain a_m = b_m c_0 + (b_{m−1} c_1 + ⋯ + b_0 c_m). The prime p divides (b_{m−1} c_1 + ⋯ + b_0 c_m), but not b_m c_0, and thus p does not divide a_m. Therefore m = n. This shows r = n and s = 0.
1.41 (a) This is a direct consequence of Eisenstein’s criterion.
1.41 (b) We obtain f(X + 1) = ((X + 1)^p − 1)/X = X^{p−1} + \binom{p}{1} X^{p−2} + ⋯ + \binom{p}{p−1} ∈ ℤ[X]. But p | \binom{p}{i} for 1 ⩽ i ⩽ p − 1 and p^2 ∤ \binom{p}{p−1}.
1.42. In the extension considered here, the operation of multiplication is not associative anymore. In particular, (S(X) ⋅ (1 − X)) ⋅ ∑i⩾0 X i ≠ S(X) ⋅ ((1 − X) ⋅ ∑i⩾0 X i ).
1.43. Since the operations are chosen in such a way that they correspond to the according operations in ℝ, we see that ℚ[√2] is a subring of ℝ. It remains to show that the inverse of (a + b√2) ≠ 0 is an element of ℚ[√2]. Since at least one of a and b is nonzero and the square root of 2 is not in ℚ, we have a^2 ≠ 2b^2, that is, a^2 − 2b^2 ≠ 0, and therefore
  1/(a + b√2) = (a − b√2)/(a^2 − 2b^2) = a/(a^2 − 2b^2) + (−b/(a^2 − 2b^2)) √2 ∈ ℚ[√2].
1.44. The set I = { g(X) ∈ F[X] | g(α) = 0 } is an ideal. It contains at least one polynomial besides the zero polynomial since p ∈ I. By Theorem 1.45, there is a unique polynomial m(X) with leading coefficient 1 which generates I.
1.45. From a^n ≡ 1 mod p^e, we obtain a^n ≡ 1 mod p and thus, m | n. Since a^m ≡ 1 mod p, we have a^m = 1 + bp for some b ∈ ℤ. By Lemma 1.65, we have a^{m p^{e−1}} = (1 + bp)^{p^{e−1}} ≡ 1 + b p^e mod p^{e+1} and therefore a^{m p^{e−1}} ≡ 1 mod p^e. This yields n | m p^{e−1}. We conclude n/m ∈ { p^i | 0 ⩽ i < e }.
1.46. The polynomial X − a divides X^{(q−1)/2} − 1 if and only if a is a root of X^{(q−1)/2} − 1 if and only if a^{(q−1)/2} = 1 if and only if a is a square. The last equivalence is Euler’s criterion (Theorem 1.68).
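The last equivalence in the solution to Exercise 1.46 is Euler's criterion, which is easy to confirm numerically. The sketch below restricts itself to prime fields 𝔽_q with q an odd prime; that restriction and the function names are assumptions of the example.

```python
def nonzero_squares(q):
    """Set of nonzero squares in Z/qZ."""
    return {x * x % q for x in range(1, q)}

def euler_criterion_holds(q):
    """For an odd prime q: a^((q-1)/2) = 1 in F_q exactly when a is a nonzero square."""
    squares = nonzero_squares(q)
    return all(
        (pow(a, (q - 1) // 2, q) == 1) == (a in squares)
        for a in range(1, q)
    )

def roots_of_half_power(q):
    """Roots of X^((q-1)/2) - 1 in F_q, found by evaluating at every nonzero a."""
    return {a for a in range(1, q) if pow(a, (q - 1) // 2, q) == 1}
```

For q = 11, `roots_of_half_power(11)` returns the five squares {1, 3, 4, 5, 9}, so X^5 − 1 splits into the linear factors X − a for exactly these a.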
Chapter 2
2.1. Considering the times of Frederick the Great and Voltaire one can assume that the message was written in French. We obtain: “ce soir sous P a cent sous six.” This reads as “ce soir souper à Sans Souci.” So Frederick the Great was inviting to dinner at his palace in Potsdam, and Voltaire replied with a big “G” and a small “a.” This means “G grand a petit.” So he announced with “j’ai grand appétit” his ravenous appetite.
2.2. It may be natural to conjecture that Exercise 1.3 is applicable and that the functions c_k and d_k have to be mutually inverse. However, this is not the case due to different domains and target sets. As a simple counter-example consider the encryption function c_k : {1} → {1, 2} with c_k(1) = 1 and the decryption function d_k : {1, 2} → {1} with d_k(1) = d_k(2) = 1.
2.3 (a) From ℤ/77ℤ = ℤ/7ℤ × ℤ/11ℤ, we obtain φ(n) = 6 ⋅ 10 = 60.
2.3 (b) The Euclidean algorithm yields s = 7.
2.3 (c) x ≡ y^s = 5^7 ≡ 47 mod 77.
2.4. We have x = x^{2⋅13−5⋅5}. Thus, we compute (x^5)^{−1} ≡ 514^{−1} ≡ 134 mod 551 with the extended Euclidean algorithm. This yields x ≡ 438^2 ⋅ 134^5 ≡ 77 mod 551.
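The computation in the solution to Exercise 2.4 follows a general pattern: two encryptions of the same message under coprime exponents and a common modulus reveal the message. The sketch below assumes, as the solution suggests, that 438 is the ciphertext for exponent 13 and 514 the one for exponent 5; the function names are hypothetical, and the negative exponent in `pow` requires Python 3.8 or later.

```python
def ext_gcd(a, b):
    """Return (g, u, v) with u*a + v*b == g == gcd(a, b)."""
    u0, v0, u1, v1 = 1, 0, 0, 1
    while b:
        q, r = divmod(a, b)
        a, b = b, r
        u0, u1 = u1, u0 - q * u1
        v0, v1 = v1, v0 - q * v1
    return a, u0, v0

def recover_message(n, e1, y1, e2, y2):
    """Recover x from y1 = x^e1 and y2 = x^e2 mod n, assuming gcd(e1, e2) = 1
    and that both ciphertexts are invertible modulo n."""
    g, u, v = ext_gcd(e1, e2)
    if g != 1:
        raise ValueError("exponents must be coprime")
    # x = x^(u*e1 + v*e2) = y1^u * y2^v; pow handles negative exponents.
    return pow(y1, u, n) * pow(y2, v, n) % n
```

With the numbers above, `ext_gcd(13, 5)` yields the coefficients 2 and −5 used in the solution, and `recover_message(551, 13, 438, 5, 514)` returns 77. The same `pow` call confirms Exercise 2.3 (c): `pow(5, 7, 77)` is 47.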
2.5. We have d(c(x)) = (x^e mod n)^s mod n ≡ x^{es} ≡ x^{1+k(p_i − 1)} ≡ x mod p_i for some k ∈ ℕ. By the Chinese remainder theorem, we obtain d(c(x)) ≡ x mod n. From x ∈ {0, …, n − 1} we conclude d(c(x)) = x.
2.6 (a) The encrypted message is y ≡ 17^2 ≡ 36 mod n.
2.6 (b) Recall that n = 11 ⋅ 23 = 253. First, we determine z_11 ≡ 36^{(11+1)/4} ≡ 5 mod 11 and z_23 ≡ 36^{(23+1)/4} ≡ 6 mod 23. By the Chinese remainder theorem, from the conditions z ≡ ±5 mod 11 and z ≡ ±6 mod 23, we obtain the four solutions z ∈ {6, 17, 236, 247}.
2.6 (c) The encoding function is injective on the given domain: suppose there exist x, x̃ ∈ 00{0, 1}^4 00 with x > x̃ and x^2 ≡ x̃^2 mod 253. Then, (x − x̃)(x + x̃) ≡ 0 mod 253. Now, there are two cases. Either one of the two factors is 0 or one is a multiple of 11, and the other a multiple of 23. We have x − x̃ ≠ 0 and x + x̃ ≢ 0 mod 253 because x + x̃ cannot exceed 120. Thus, the only remaining case is x − x̃ being a multiple of 11 or 23. The numbers x and x̃ are both multiples of 4. From 0 ⩽ (x − x̃)/4 ⩽ 15, we obtain that x − x̃ can only be a multiple of 11, and (x − x̃)/4 = 11 has to hold. With x/4 ⩽ 15 this yields x̃/4 ⩽ 4 and (x + x̃)/4 ⩽ 11 + 2 ⋅ 4 = 19 < 23. In particular, x + x̃ is not divisible by 23. This is a contradiction; and it shows that such x and x̃ cannot exist.
2.7 (a) The number n is prime, so φ(n) = n − 1 = 46 and the order of g is a divisor of 46 = 2 ⋅ 23. From 5^2 = 25 ≢ 1 mod 47 and 5^{23} ≡ −1 mod 47, we obtain that the order of g is 46.
2.7 (b) We have A ≡ 5^a ≡ 5^{16} ≡ 17 mod 47 and B ≡ 5^b ≡ 5^9 ≡ 40 mod 47. The secret key is k ≡ A^b ≡ B^a ≡ 21 mod 47.
2.7 (c) B = 40 has already been determined. The ciphertext is computed by y ≡ A^b ⋅ x ≡ k ⋅ x ≡ 21 ⋅ 33 ≡ 35 mod 47.
2.8. Since each element x ∈ X has a unique image y = h(x), each x is counted in exactly one ‖y‖. Thus, ∑_{y∈Y} ‖y‖ = |X|. Substituting this equation into m = ∑_{y∈Y} ‖y‖/|Y| directly yields m = |X|/|Y|. A collision (x, x′) with h(x) = h(x′) = y accounts for two different elements in ‖y‖. The number of these pairs for a fixed y is \binom{‖y‖}{2}. This yields N = ∑_{y∈Y} \binom{‖y‖}{2}. The equation N = (1/2) ∑_{y∈Y} ‖y‖^2 − |X|/2 is obtained using \binom{‖y‖}{2} = ‖y‖(‖y‖ − 1)/2 and ∑_{y∈Y} ‖y‖ = |X|. We have
  ∑_{y∈Y} (‖y‖ − m)^2 = ∑_{y∈Y} ( ‖y‖^2 − 2‖y‖ ⋅ |X|/|Y| + (|X|/|Y|)^2 )
                      = ( ∑_{y∈Y} ‖y‖^2 ) − 2|X| ⋅ |X|/|Y| + |Y| ⋅ (|X|/|Y|)^2
                      = 2N + |X| − |X|^2/|Y|.
In particular, 0 ⩽ ∑_{y∈Y} (‖y‖ − m)^2 = 2N + |X| − |X|^2/|Y|. This yields N ⩾ (1/2)(|X|^2/|Y| − |X|).
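The numbers in the solutions to Exercises 2.6 (b) and 2.7 can be reproduced directly. The sketch below computes the four Rabin square roots via the Chinese remainder theorem for primes p, q ≡ 3 mod 4, and redoes the Diffie–Hellman and ElGamal arithmetic with the values from the text. The helper names are hypothetical, and the modular inverse via `pow(m, -1, n)` needs Python 3.8 or later.

```python
def crt_pair(r1, m1, r2, m2):
    """Solve z = r1 mod m1 and z = r2 mod m2 for coprime m1, m2."""
    inv = pow(m1, -1, m2)
    return (r1 + m1 * ((r2 - r1) * inv % m2)) % (m1 * m2)

def rabin_roots(y, p, q):
    """All square roots of y modulo p*q, for primes p, q = 3 mod 4
    and y a square modulo both primes."""
    zp = pow(y, (p + 1) // 4, p)
    zq = pow(y, (q + 1) // 4, q)
    return sorted(
        {crt_pair(sp, p, sq, q) for sp in (zp, p - zp) for sq in (zq, q - zq)}
    )

# Exercise 2.7 with the values from the text: modulus 47, generator 5,
# secret exponents a = 16 and b = 9, plaintext 33.
A = pow(5, 16, 47)
B = pow(5, 9, 47)
shared_key = pow(A, 9, 47)
ciphertext = shared_key * 33 % 47
```

`rabin_roots(36, 11, 23)` returns the four candidates listed in 2.6 (b); the original message 17 is among them.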
2.9. By definition, h_1 is collision resistant. Now, let i ⩾ 1 and let x_1 x_2 ≠ x_1′ x_2′ be a collision of h_{i+1}. We may assume that x_1 ≠ x_1′. According to the definition of h_{i+1}, two cases are possible. Either h_i(x_1)h_i(x_2) = h_i(x_1′)h_i(x_2′) and thus, x_1 ≠ x_1′ is a collision of h_i. Or, in the other case, h_i(x_1)h_i(x_2) ≠ h_i(x_1′)h_i(x_2′) is a collision of h_1.
2.10. We have a^{80 115 359} ≡ a^{1 294 755} mod n. Therefore a^{80 115 359 − 1 294 755} ≡ 1 mod n. Furthermore, 80 115 359 − 1 294 755 = 78 820 604 = 4 ⋅ 19 705 151. Since φ(n) = 4p′q′, we try to find a number b such that b^{19 705 151} ≢ 1 mod n; see Section 3.4.4 for details on this approach. We randomly choose b = 13. Then, 13^{19 705 151} ≡ 10 067 mod n and 10 067^2 ≡ 1 mod n. Thus, (10 067 − 1)(10 067 + 1) ≡ 0 mod n, and we compute gcd(10 066, n) = 719 and gcd(10 068, n) = 839. This yields the factorization n = 719 ⋅ 839.
2.11 (a) First, let u_k(x) = (γ, δ) = (α^s, (x − mγ)s^{−1}). Then, β^γ γ^δ ≡ α^{mγ} γ^δ ≡ α^{mγ} α^{sδ} ≡ α^{mγ} α^{s⋅s^{−1}(x−mγ)} ≡ α^x mod p, and thus υ_k(x, γ, δ) = true. Conversely, let υ_k(x, γ, δ) = true and let t be the discrete logarithm of γ to base α. Then, β^γ γ^δ ≡ α^x mod p, and thus α^{x−mγ} ≡ γ^δ ≡ α^{tδ} mod p. We further obtain x − mγ ≡ tδ mod (p − 1). This is equivalent to δ ≡ (x − mγ)t^{−1} mod (p − 1). Thus, (γ, δ) is a valid signature for x.
2.11 (b) Choose u, υ with gcd(υ, p − 1) = 1. Define γ = α^u β^υ mod p, δ = −γυ^{−1} mod p − 1 and x = uδ mod p − 1. Then, β^γ γ^δ ≡ β^γ α^{uδ} β^{−υυ^{−1}γ} ≡ α^{uδ} ≡ α^x mod p.
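The factoring step in the solution to Exercise 2.10 can be replayed in a few lines. The modulus n = 603 241 is not stated explicitly in this section; it is taken here as the product 719 ⋅ 839 of the factors given at the end of the solution, which is an assumption of the sketch, as are the function names.

```python
from math import gcd

def split_with_root_of_one(n, r):
    """Given r with r^2 = 1 mod n and r != +-1 mod n, return two proper factors of n.
    Returns None if r is not a nontrivial square root of 1."""
    r %= n
    if r * r % n != 1 or r in (1, n - 1):
        return None
    return gcd(r - 1, n), gcd(r + 1, n)

N = 719 * 839                      # assumed modulus, see lead-in
EXPONENT = 80115359 - 1294755      # a^EXPONENT = 1 mod n by the observed collision
ODD_PART = EXPONENT // 4           # 19 705 151
ROOT = pow(13, ODD_PART, N)        # candidate square root of 1, base b = 13
FACTORS = split_with_root_of_one(N, ROOT)
```

With base 13 the candidate is 10 067, which squares to 1 modulo n, so the two gcd computations split n. For a base where the power is ±1, the function returns `None` and another base has to be tried.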
2.11 (c) We have to show that β^λ λ^μ ≡ α^{x′} mod p holds. Let y = (hγ − jδ)^{−1} mod p − 1. Since (γ, δ) is a valid signature, we have γ^δ ≡ β^{−γ} α^x mod p. This yields β^λ λ^μ ≡ β^λ (γ^h α^i β^j)^{δλy} ≡ β^λ (γ^δ)^{hλy} α^{iδλy} β^{jδλy} ≡ β^λ β^{−γhλy} α^{xhλy} α^{iδλy} β^{jδλy} ≡ β^λ β^{−γhλy} α^{x′} β^{jδλy} ≡ α^{x′} β^{λ−λ(hγ−jδ)(hγ−jδ)^{−1}} ≡ α^{x′} mod p.
2.12. Let the four people be 1, 2, 3, 4. We have m = \binom{4}{2}, and the subsets of {1, 2, 3, 4} with two elements are A_1 = {1, 2}, A_2 = {1, 3}, A_3 = {1, 4}, A_4 = {2, 3}, A_5 = {2, 4} and A_6 = {3, 4}. For the first five of these subsets, we randomly choose the keys k_1 = (1, 9), k_2 = (2, 4), k_3 = (3, 5), k_4 = (4, 1) and k_5 = (5, 1). Then the key for A_6 is k_6 = (6, 11) since 11 ≡ 7 − (9 + 4 + 5 + 1 + 1) mod 12. Person 1 gets the keys {k_4, k_5, k_6}, person 2 gets {k_2, k_3, k_6}, person 3 gets {k_1, k_3, k_5} and person 4 gets {k_1, k_2, k_4}. If three of the four participants come together, then they have all the keys k_j in their subsets. Thus, they can compute s = (∑_{j=1}^{6} k_j mod 12) = 7.
2.13. We choose the prime number p = 61. We share the secret using the polynomial a(X) = 42 + a_1 X ∈ 𝔽_p[X] with the randomly chosen coefficient a_1 = 23. Then, a(1) = 4, a(2) = 27 and a(3) = 50. Thus, the pairs (1, 4), (2, 27) and (3, 50) shall be distributed. Any two of these pairs can be used to reconstruct the secret. For example, let (1, 4) and (2, 27) be given. We obtain the equations a_0 + a_1 = 4 and a_0 + 2a_1 = 27. We find a_0 = 4 − a_1 and substitute 4 − a_1 for a_0 in the second equation. This results in (4 − a_1) + 2a_1 = 27, so a_1 = 23 and further a_0 + 23 = 4, or a_0 ≡ −19 ≡ 42 mod 61.
2.14. We use a multi-step algorithm based on Shamir’s secret sharing. First, the secret key is distributed between three keys such that two of them are sufficient to reconstruct the secret key. Two of these keys are given to the directors. The third key is again divided into ten keys with the property that seven of them suffice to decrypt the
secret. Seven of these ten keys are passed on to the heads of department. The remaining three keys are again divided as secret (for this, a suitable encoding of the three keys as one secret is required). And finally this secret is divided into a total of 87 keys such that eleven of them are sufficient to reconstruct the secret. Obviously, the keys of eleven of the employees can compensate the lack of three department heads, and seven department heads can compensate the lack of a director. 2.15. Let persons 1, 2 and 3 have salaries g1 , g2 and g3 , respectively. The protocol works as follows: first, person 1 sends a (large) random number z to person 2, who in turn sends z + g2 to person 3. Since person 3 does not know the number z, it is not possible to infer the salary g2 from this. Then, person 3 sends the sum z + g2 + g3 to person 1. Person 1 is able to compute the sum g1 + g2 + g3 and thus also the average salary. Person 1 communicates it to the other two persons. This protocol can be easily generalized to cases of more employees. 2.16. Let w1 < ⋅ ⋅ ⋅ < w n be the possible salaries. Let c B be Bob’s public encryption function and d B his private decryption function. Alice randomly chooses x and sends d = c B (x) − a to Bob, where a is her salary. Bob now computes y1 , . . . , y n with y i = d B (d + w i ). For j such that w j = a, we have y j = x. To conceal his salary, Bob applies a one-way function f and computes z i = f(y i ) for all i ∈ {1, . . . , n}. Without loss of generality let z i ≠ z j + 1 for 1 ⩽ i, j ⩽ n. (Otherwise, Alice has to choose a new x or Alice and Bob agree upon a different hash function.) If b = w k is Bob’s salary, then he sends the sequence z1 , . . . , z k , z k + 1, . . . , z n + 1 to Alice. Now, a ⩽ b if and only if f(x) occurs in the sequence. 2.17. The dealer commits himself to a number between 0 and 36. Then, all players make their bids in plaintext. Finally, the dealer reveals his number. 2.18. 
The problem can be solved via dynamic programming. We fill a $\{0, \dots, c\} \times n$ table $T$. By $T_{i,j}$, we denote the entry in row $i$ and column $j$. The table is initialized with $T_{0,j} = 1$, and then iteratively filled by the following rule.
\[ T_{i,j} = \begin{cases} 1 & \text{if } T_{i - s_j,\, j-1} = 1 \text{ or } T_{i,\, j-1} = 1 \\ 0 & \text{otherwise} \end{cases} \]
An entry $T_{i,j} = 1$ means that a solution for the weight $i$ already exists using $s_1, \dots, s_j$. Thus, a solution of the knapsack instance exists if $T_{c,n} = 1$.
2.19. The inverse of $u$ in $(\mathbb{Z}/71\mathbb{Z})^*$ is $w = 5$. Alice chose the $a_i$ such that $s_i = a_i \cdot w \bmod 71$. So she used the superincreasing sequence $(2, 5, 9, 17, 37)$. Alice determined $c = 90 \cdot w = 90 \cdot 5 \equiv 24 \bmod 71$. The only solution of the knapsack problem is $24 = 1 \cdot 2 + 1 \cdot 5 + 0 \cdot 9 + 1 \cdot 17 + 0 \cdot 37$. Therefore, the plaintext is $(1, 1, 0, 1, 0)$.
2.20. For $i = 0$ the statement is trivial. For $i \geq 1$, induction yields:
\[ s_{i+1} \geq 2s_i = s_i + s_i > s_i + \sum_{j=1}^{i-1} s_j = \sum_{j=1}^{i} s_j \]
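The table-filling rule of Solution 2.18 can be sketched as follows. This is a minimal illustration, assuming that the table has columns $0, \dots, n$ (column 0 standing for "no weights used") and that $T_{i,0} = 0$ for $i > 0$; the function names are hypothetical. The instance from Solution 2.19 serves as an example.

```python
def knapsack_table(s, c):
    """Fill T[i][j] = 1 iff weight i is a subset sum of s[0], ..., s[j-1]."""
    n = len(s)
    T = [[0] * (n + 1) for _ in range(c + 1)]
    for j in range(n + 1):
        T[0][j] = 1  # weight 0 is always reachable (empty selection)
    for j in range(1, n + 1):
        for i in range(1, c + 1):
            without_sj = T[i][j - 1] == 1
            with_sj = i >= s[j - 1] and T[i - s[j - 1]][j - 1] == 1
            if without_sj or with_sj:
                T[i][j] = 1
    return T

def solvable(s, c):
    """A solution of the knapsack instance exists iff T[c][n] = 1."""
    return knapsack_table(s, c)[c][len(s)] == 1

print(solvable([2, 5, 9, 17, 37], 24))  # True, since 24 = 2 + 5 + 17
print(solvable([2, 5, 9, 17, 37], 3))   # False
```

The running time is proportional to $c \cdot n$, which is polynomial in the value of $c$ but not in its bit length.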
Chapter 3

3.1. Let $\gamma_0 > 0$ be such that $f(n) \leq \sum_{i=0}^{k} f(\lceil \alpha_i n \rceil) + \gamma_0 n$. Let further $\varepsilon > 0$ and $n_0 \in \mathbb{N}$ be numbers such that $\alpha_i n_0 \leq n_0 - 1$ for all $i \in \{1, \dots, k\}$ and $\sum_{i=0}^{k} \lceil \alpha_i n \rceil \leq (1 - \varepsilon)n$ for all $n \geq n_0$. Finally, choose $\gamma$ large enough to ensure that $\gamma_0 < \gamma\varepsilon$ and $f(n) < \gamma n$ for all $n < n_0$. By induction on $n$ we show that $f(n) < \gamma n$. For $n < n_0$ the assertion is satisfied by choice of $\gamma$. For $n \geq n_0$ we have
\[ f(n) \leq \sum_{i=0}^{k} f(\lceil \alpha_i n \rceil) + \gamma_0 n \leq \sum_{i=0}^{k} \gamma \cdot \lceil \alpha_i n \rceil + \gamma_0 n \leq (\gamma(1 - \varepsilon) + \gamma_0) \cdot n < \gamma n \]
Remark: A well-known application of the master theorem II is the computation of the median of a sequence of $n$ numbers with only $O(n)$ comparisons. This shows that it is not necessary to sort the whole sequence to determine the median.
3.2. Without loss of generality we assume that $a \neq 0 \neq b$. Divide each of $a$ and $b$ by $\gcd(a, b)$. Then, we can independently determine the square roots of the numerator and denominator using binary search.
3.3. The extended binary gcd algorithm is as follows:
/∗ Assume that k ⩾ 0, ℓ ⩾ 0 ∗/
/∗ The result is (a, b, t) with ak + bℓ = t = gcd(k, ℓ) ∗/
function ext-bgcd(k, ℓ)
begin
  if k = 0 or ℓ = 0 then return (1, 1, k + ℓ)
  elsif k and ℓ are even then
    (a, b, t) := ext-bgcd(k/2, ℓ/2); return (a, b, 2t)
  elsif k is even and ℓ is odd then
    (a, b, t) := ext-bgcd(k/2, ℓ);
    if a is even then return (a/2, b, t)
    else return ((a − ℓ)/2, b + k/2, t) fi
  elsif (k is odd and ℓ is even) or k < ℓ then
    (a, b, t) := ext-bgcd(ℓ, k); return (b, a, t)
  else
    (a, b, t) := ext-bgcd(k − ℓ, ℓ); return (a, b − a, t)
  fi
end
3.4. We have $2^{(n-1)/2} = 2^{864} \equiv 1 \bmod 1729$ and by Theorem 1.70 (c) we obtain $\left(\frac{2}{n}\right) = (-1)^{(n^2-1)/8} = 1$. Note that $1729 \equiv 1 \bmod 16$ and therefore $\frac{n^2-1}{8}$ is an even number, so 1729 is an Eulerian pseudoprime to base 2. We write $1728 = 2^\ell u = 2^6 \cdot 27$ and let $b = 645 \equiv 2^{27} \not\equiv 1 \bmod 1729$. Thus, $(b^{2^0}, b^{2^1}, b^{2^2}, b^{2^3}, b^{2^4}, b^{2^5}) = (645, 1065, 1, 1, 1, 1)$ modulo 1729. Since $-1$ does not occur in this sequence, 1729 is not a strong pseudoprime to base 2. Further, for $c = 1065$, we have $c^2 \equiv 1 \bmod 1729$. Consequently, $1064 \cdot 1066 = (c - 1)(c + 1) \equiv 0 \bmod 1729$. Therefore, $\gcd(1064, 1729) = 133$ is a nontrivial divisor of $1729 = 133 \cdot 13 = 7 \cdot 19 \cdot 13$. Moreover, let us note that in this particular case out of all $a \in \{1, \dots, n - 1\}$ (or all $a \in \{1, \dots, n - 1\}$ coprime to $n$, respectively) the Fermat test fails for 75% (or 100%, respectively), the Solovay–Strassen test for 39.5% (or approximately 52.7%, respectively) and the Miller–Rabin test for approximately 9.4% (or 12.5%, respectively). Thus, in particular, 1729 is a Carmichael number.
3.5. Let $r$ be the order of $a$ in $(\mathbb{Z}/n\mathbb{Z})^*$. From (i), we obtain $r \mid n - 1$, and with (ii) this yields $r = n - 1$. Thus, $|(\mathbb{Z}/n\mathbb{Z})^*| = n - 1$, and $n$ is prime.
3.6. Suppose $f_n$ is prime. Then, Euler's criterion yields $3^{(f_n-1)/2} \equiv \left(\frac{3}{f_n}\right) \bmod f_n$. Together with $f_n \equiv 1 \bmod 4$, using the law of quadratic reciprocity, we obtain $\left(\frac{3}{f_n}\right) = \left(\frac{f_n}{3}\right) \equiv f_n^{(3-1)/2} = f_n \equiv -1 \bmod 3$. The converse direction is a consequence of the Lucas test in Exercise 3.5.
3.7 (a) By the law of quadratic reciprocity we have $\left(\frac{3}{n}\right) = -\left(\frac{n}{3}\right) = -\left(\frac{1}{3}\right) = -1$ because $2^p - 1 \equiv (-1)^p - 1 \equiv 1 \bmod 3$ for odd $p$. Using the well-known quadratic formula it follows that $f$ is irreducible.
3.7 (b) The polynomial $g(Y) = Y^2 - 4Y + 1$ in $K[Y]$ has the roots $X$ and $4 - X$. These are the only zeros of $g$ because $K$ is a field. The coefficients of $g$ are in $\mathbb{F}_n$, so $0 = g(X)^n = g(X^n)$ follows. Therefore, $X^n \in \{X, 4 - X\}$. Since the roots of $Y^n - Y$ are exactly the elements of $\mathbb{F}_n$ and $X \notin \mathbb{F}_n$, we obtain $X^n \neq X$ and thus $X^n = 4 - X$. This yields
\[ (X - 1)^{n+1} = (X - 1)^n (X - 1) = (X^n - 1)(X - 1) = (3 - X)(X - 1) = -X^2 + 4X - 3 = -2 \]
Note that $a \mapsto a^n$ is the Frobenius homomorphism and that $a^n = a$ for all $a \in \mathbb{F}_n$.
3.7 (c) In $\mathbb{F}_n$ and thus also in $K$, by Euler's criterion we have $2^{(n-1)/2} = 1$ because $\left(\frac{2}{n}\right) = 1$. Moreover, $(X - 1)^2 = 2X$ in $K$. This yields
\[ (X - 1)^{n+1} = \left((X - 1)^2\right)^{(n+1)/2} = (2X)^{(n+1)/2} = 2X^{(n+1)/2} \]
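The identities of Exercise 3.7 can be checked numerically for small Mersenne primes. The sketch below is an illustration under stated assumptions: elements of $(\mathbb{Z}/n\mathbb{Z})[X]/(X^2 - 4X + 1)$ are represented as pairs $(a_0, a_1)$ standing for $a_0 + a_1 X$, and the helper names are hypothetical.

```python
def mul(a, b, n):
    """Multiply a0 + a1*X and b0 + b1*X modulo n and X^2 - 4X + 1."""
    a0, a1 = a
    b0, b1 = b
    # X^2 = 4X - 1, so the term a1*b1*X^2 contributes -a1*b1 + 4*a1*b1*X.
    return ((a0 * b0 - a1 * b1) % n, (a0 * b1 + a1 * b0 + 4 * a1 * b1) % n)

def power(a, e, n):
    """Square-and-multiply exponentiation in the quotient ring."""
    result = (1, 0)
    while e > 0:
        if e & 1:
            result = mul(result, a, n)
        a = mul(a, a, n)
        e >>= 1
    return result

for p in (3, 5, 7, 13):
    n = 2**p - 1                      # Mersenne primes 7, 31, 127, 8191
    x = (0, 1)                        # the element X
    x_minus_1 = (n - 1, 1)            # the element X - 1
    print(p,
          power(x_minus_1, n + 1, n) == (n - 2, 0),   # (X - 1)^(n+1) = -2
          power(x, (n + 1) // 2, n) == (n - 1, 0))    # X^((n+1)/2) = -1
```

Both comparisons hold for each listed prime; for the composite number $2^{11} - 1 = 2047$ the second one fails, in line with part (d).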
3.7 (d) We define $f(X) = X^2 - 4X + 1$. For the direction from right to left, let the congruence hold, and let $q$ be the smallest prime divisor of $n$. In $\mathbb{F}_q[X]$ we have $X^{n+1} \equiv 1 \bmod f(X)$ and $X^{(n+1)/2} \not\equiv 1 \bmod f(X)$. Therefore, in the group of units of $R = \mathbb{F}_q[X]/f$, the element $X$ has order $n + 1$. If $n$ is composite, then $q \leq \sqrt{n}$. Since $R$ contains $q^2$ elements, we know $n \geq q^2 > |R^*| \geq n + 1$, a contradiction. Thus, $n$ is prime. For the converse direction, let $n$ be prime and let $K = \mathbb{F}_n[X]/f$. In the field $K$ we have $2X^{(n+1)/2} = (X - 1)^{n+1} = -2$ and thus $X^{(n+1)/2} = -1$, as desired.
3.8. Let $R = \mathbb{Z}/n\mathbb{Z}$, let $f(X) = X^2 - 4X + 1$ and let $K = R[X]/f$. By induction on $j$ we first show that in $K$ the property $\ell_j = X^{2^j} + (4 - X)^{2^j}$ is satisfied. For $j = 0$ we have $\ell_0 = 4 = X + (4 - X)$. Now let $j \geq 0$. Using $X(4 - X) = 1$, we obtain
\[ \begin{aligned} \ell_{j+1} = \ell_j^2 - 2 &= \left(X^{2^j} + (4 - X)^{2^j}\right)^2 - 2 \\ &= X^{2^{j+1}} + (4 - X)^{2^{j+1}} + 2X^{2^j}(4 - X)^{2^j} - 2 \\ &= X^{2^{j+1}} + (4 - X)^{2^{j+1}} + 2\left(X(4 - X)\right)^{2^j} - 2 \\ &= X^{2^{j+1}} + (4 - X)^{2^{j+1}} + 2 \cdot 1^{2^j} - 2 = X^{2^{j+1}} + (4 - X)^{2^{j+1}} \end{aligned} \]
We have $X^{(n+1)/2} = -1$ if and only if $X^{(n+1)/2 - k} = -X^{-k}$. For $k = (n + 1)/4$, together with $X^{-1} = 4 - X$, we obtain the equivalence
\[ X^{(n+1)/2} = -1 \iff \underbrace{X^{(n+1)/4} + (4 - X)^{(n+1)/4}}_{\ell_{p-2}} = 0 \]
The assertion follows from Exercise 3.7.
3.9. We have the equivalence $a_{n+1} \in a_1\mathbb{Z} + \cdots + a_n\mathbb{Z} \iff \gcd(a_1, \dots, a_n) \mid a_{n+1}$. Here, $g = \gcd(a_1, \dots, a_n) = \gcd(\gcd(a_1, \dots, a_{n-1}), a_n)$. In particular, using the extended Euclidean algorithm, numbers $y_i \in \mathbb{Z}$ with $a_1 y_1 + \cdots + a_n y_n = g$ can be computed. Setting $x_i = y_i a_{n+1}/g$, this yields a solution for the equation.
3.10. We let $q = 41$, $u = 5$ and $\ell = 3$. Then, $q - 1 = u2^\ell$. Furthermore, from $g^{(q-1)/2} \equiv -1 \bmod 41$, we can see that $g$ is not a square in $\mathbb{F}_{41}$. In the second step we now successively determine bits $k_0$, $k_1$ and $k_2$. Let $x_i = a g^{-\sum_{j=0}^{i} k_j 2^j}$, especially $x_{-1} = a$. We determine $k_i \in \{0, 1\}$ from $k_0, \dots, k_{i-1}$ and let $k_i = 0$ if and only if $x_{i-1}^{u2^{\ell-i-1}} \equiv 1 \bmod q$ is satisfied.
– For $i = 0$ we have $x_{-1}^{u2^{\ell-1}} = a^{(q-1)/2} \equiv 1 \bmod q$ and we let $k_0 = 0$. Note that by Euler's criterion the input would not be a square if we had $k_0 = 1$.
– For $i = 1$ we compute $x_0 = x_{-1} \cdot (g^{-1})^{k_0 2^0} = x_{-1} = 2$. This yields $x_0^{u2^{\ell-2}} \equiv -1 \bmod q$ and we let $k_1 = 1$.
– For $i = 2$ we compute $x_1 = x_0 \cdot (g^{-1})^{k_1 2^1} \equiv 2 \cdot 14^2 \equiv 23 \bmod 41$; for the second congruence we use $g^{-1} \equiv 14 \bmod 41$. Then, we obtain $x_1^{u2^{\ell-3}} \equiv -1 \bmod q$ and we let $k_2 = 1$.
Thus, we obtain $k = k_0 + 2k_1 + 4k_2 = 6$. Finally $b = 2^{(u+1)/2} g^{-uk/2} = 2^3 \cdot 14^{5 \cdot 3} \equiv 24 \bmod 41$ is a square root of 2 in $\mathbb{F}_{41}$. The second solution is $17 = 41 - 24$.
3.11. First, let $a^{(p-1)/4} = 1$ and $b = a^{(p+3)/8}$. Then, $b^2 = a^{(p+3)/4} = a \cdot a^{(p-1)/4} = a \cdot 1 = a$. Now let $a^{(p-1)/4} = -1$ and $b = 2^{-1}(4a)^{(p+3)/8}$. From Euler's criterion and the law of quadratic reciprocity it follows that $2^{(p-1)/2} = \left(\frac{2}{p}\right) = -1$. Thus, we obtain $b^2 = 2^{-2}(4a)^{(p+3)/4} = 2^{-2} \cdot 2^{(p+3)/2} \cdot a^{(p+3)/4} = 2^{(p-1)/2} \cdot a^{(p-1)/4} \cdot a = (-1) \cdot (-1) \cdot a = a$.
3.12 (a) From the law of quadratic reciprocity it follows that $\left(\frac{-1}{p}\right) = -1$, so $-1$ is not a square, that is, $X^2 = -1$ has no solution in $\mathbb{F}_p$. Since $f(X)$ has degree 2, it is irreducible.
3.12 (b) We have $\left(\frac{1}{p}\right) = 1$ and $\left(\frac{p-1}{p}\right) = \left(\frac{-1}{p}\right) = -1$. Therefore, more generally, let $1 \leq b < c \leq p - 1$ with $\left(\frac{b}{p}\right) = 1$ and $\left(\frac{c}{p}\right) = -1$. If $c = b + 1$, then we let $a = b$. Otherwise, we consider $d = \lfloor (b + c)/2 \rfloor$ and compute $\left(\frac{d}{p}\right)$. Depending on this result being either $-1$ or $1$, we recursively continue the algorithm with the numbers $b < d$ or $d < c$. This binary search returns the requested number $a$ after at most $O(\log p)$ evaluations of the Jacobi symbol.
3.12 (c) Let $a$ be the number computed in the previous part of the exercise. The elements $a + 1$ and $-1$ are both nonsquares, therefore $-(a + 1)$ is a square. Since $p \equiv -1 \bmod 4$, one can efficiently compute $b, c \in \mathbb{F}_p$ with $b^2 = a$ and $c^2 = -(a + 1)$. Now $b^2 + c^2 = -1$. We let $g = b + cX \in \mathbb{F}_{p^2}$. Assume that $g$ is a square. Then there are $s, t \in \mathbb{F}_p$ with $(s + tX)^2 = g$. Thus, $s^2 - t^2 = b$ and $2st = c$. Let $h = (s - tX)^2$. Then, $h = b - cX$ and thus $gh = b^2 + c^2 = -1$. With $r = (s + tX)(s - tX) = s^2 + t^2 \in \mathbb{F}_p$, we obtain $r^2 = gh = -1$. This is a contradiction to $\left(\frac{-1}{p}\right) = -1$. Consequently, $g$ is not a square. We can now use the deterministic part of Tonelli's algorithm to extract roots in $\mathbb{F}_{p^2}$.
3.13. We let $a = 2$ and $t = 0$, and we observe that $(t^2 - 4a)^{11} \equiv -1 \bmod 23$; thus, $t = 0$ is a valid choice for the first step in Cipolla's algorithm. We now determine $X^{12}$ by repeatedly replacing $X^2$ by $tX - a = -2$ and computing coefficients in $\mathbb{Z}/23\mathbb{Z}$:
\[ X^{12} = (X \cdot X^2)^4 \equiv (-2X)^4 = 2^4 (X^2)^2 \equiv 2^4 (-2)^2 \equiv 18 \bmod (X^2 + 2) \]
So, 18 and $5 = 23 - 18$ are the two square roots of 2 in $\mathbb{F}_{23}$.
3.14. For $B = \{2, 3\}$ we find $k = 2^7 \cdot 3^5 = 128 \cdot 243 = 31\,104$. Using fast modular exponentiation and $a = 2$ we further obtain $a^k \equiv 82 \bmod n$. Now, $\gcd(a^k - 1, n) = \gcd(81, 253) = 1$ and we did not find a factor. For $B = \{2, 3, 5\}$ we find $k = 2^7 \cdot 3^5 \cdot 5^3 = 128 \cdot 243 \cdot 125 = 3\,888\,000$. Using fast modular exponentiation and $a = 2$ we now obtain $a^k \equiv 133 \bmod n$. Then, $\gcd(a^k - 1, n) = \gcd(132, 253) = 11$ and we have found a nontrivial factor of $n$. Let us remark that the $p - 1$ algorithm with $B = \{2, 3\}$ finds a nontrivial factor for roughly 24% of all $a \in \{2, \dots, n - 1\}$. For $B = \{2, 3, 5\}$ the probability is 84%.
3.15.
We let $x_0 = y_0 = 12$ and obtain the following values:

  $x_1 = 145$    $y_1 = 356$    $\gcd(y_1 - x_1, n) = 1$
  $x_2 = 356$    $y_2 = 144$    $\gcd(y_2 - x_2, n) = 53$

Thus, 53 is a divisor of $n$.
3.16. The order $n$ of $(\mathbb{Z}/19\mathbb{Z})^*$ is 18. We let $g = 2$, $y = 3$ and $m = \lfloor \sqrt{n} \rfloor = 4$. Performing the baby-steps results in the following table $B$:

  $r$                     3    2    1    0
  $yg^{n-r} \bmod 19$    17   15   11    3
As to the giant-steps, we compute $h = 2^4 = 16$ and successively $h^0 = 1$, $h^1 = 16$, $h^2 \equiv 9$. None of these values can be found in the second row of the table. Finally, for $s = 3$ we obtain $h^s \equiv 11$, which can be found in the table $B$ at $r = 1$. Thus, $x = 3 \cdot m + r = 13$ is the desired value.
3.17. Compute the smallest $n > 0$ with $g^n = 1$. To this end check the giant-steps $s$ in ascending order and discard the solution $r = 0$ and $s = 0$. A small optimization is the following. If two baby-steps $(r, a)$ and $(r', a)$ with $r < r'$ occur, one only stores the entry $(r, a)$ in the table $B$.
3.18 (a) The order of $G$ is $22 = 2 \cdot 11$. We have $g^2 = 9 \not\equiv 1 \bmod 23$ and $g^{11} \equiv 1 \bmod 23$. So, 11 is the order of $g$.
3.18 (b) We let $y = 18$, we let $f : \mathbb{Z}/q\mathbb{Z} \times \mathbb{Z}/q\mathbb{Z} \to \mathbb{Z}/q\mathbb{Z} \times \mathbb{Z}/q\mathbb{Z}$ such that
\[ f(r, s) = \begin{cases} (r + 1, s) & \text{if } (g^r y^s \bmod 23) \equiv 0 \bmod 3 \\ (2r, 2s) & \text{if } (g^r y^s \bmod 23) \equiv 1 \bmod 3 \\ (r, s + 1) & \text{if } (g^r y^s \bmod 23) \equiv 2 \bmod 3 \end{cases} \]
and we define a sequence $(r_i, s_i)$ with $(r_1, s_1) = (1, 1)$ and $(r_{i+1}, s_{i+1}) = f(r_i, s_i)$ for $i \geq 1$. Further, we let $h_i = g^{r_i} y^{s_i} \bmod 23$. For $\ell \in \{1, 2, 4\}$ we compute the values of $(r_\ell, s_\ell)$ as well as $h_\ell$; and for $k \in \{1, \dots, \ell\}$ the values of $(r_{\ell+k}, s_{\ell+k})$ and $h_{\ell+k}$ are displayed in the following table:

  $\ell$   $(r_\ell, s_\ell)$   $h_\ell$   $k$   $(r_{\ell+k}, s_{\ell+k})$   $h_{\ell+k}$
  1        (1, 1)               8          1     (1, 2)                       6
  2        (1, 2)               6          1     (2, 2)                       18
                                           2     (3, 2)                       8
  4        (3, 2)               8          1     (3, 3)                       6
                                           2     (4, 3)                       18
                                           3     (5, 3)                       8
                                           4     (5, 4)                       6
The algorithm terminates if $h_\ell = h_{\ell+k}$, that is, in the current case at $\ell = 4$ and $k = 3$; the values for $\ell = k = 4$ are actually not computed anymore. We have $g^3 y^2 = h_\ell = h_{\ell+k} = g^5 y^3$. If $y = g^x$, then $g^{3+2x} = g^{5+3x}$ and therefore $3 + 2x \equiv 5 + 3x \bmod 11$. One solution of this congruence is 9. Since $3^9 \equiv 18 \bmod 23$, this is indeed the desired value. Note that the algorithm does not detect that $h_2$ and $h_5$ share the value 6, since the values $h_5, h_6, h_7, h_8$ are exclusively compared with $h_4$.
3.19. We let $g = 2$ and $y = 5$. The order of $(\mathbb{Z}/19\mathbb{Z})^*$ is $n = 18 = 2 \cdot 3^2$. We choose parameters as displayed in the following table:

  $p$   $e_p$   $n_p$   $g_p$   $y_p$   $x_p$
  2     1       9       18      1       0
  3     2       2       4       6       7
For $p \in \{2, 3\}$ we have $n_p = n/p^{e_p}$, $g_p = g^{n_p} \bmod q$ and $y_p \equiv y^{n_p} \equiv g_p^{x_p} \bmod q$. If we let $x = 16$, then the congruences $x \equiv 0 \bmod 2$ and $x \equiv 7 \bmod 3^2$ are satisfied and by Theorem 3.16 we know that 16 is the desired value. To determine the value $x_3$ we write $x_3 = d_0 + d_1 \cdot 3$. The value $d_0$ can be determined as a solution of the equation $((g_3)^3)^{d_0} \equiv (y_3)^3 \bmod 19$. Now $(g_3)^3 = 4^3 \equiv 7 \bmod 19$ and $(y_3)^3 = 6^3 \equiv 7 \bmod 19$ and, moreover, $d_0 = 1$ trivially is a solution. In order to determine $d_1$ we first let $z_1 = 11$. Then, $z_1 \equiv y_3 g_3^{-d_0} \bmod 19$ and $d_1$ is a solution of the congruence $((g_3)^3)^{d_1} \equiv z_1 \bmod 19$, that is, by insertion $7^{d_1} \equiv 11 \bmod 19$. One solution is $d_1 = 2$ and we obtain $x_3 = 1 + 2 \cdot 3 = 7$.
3.20. The order of $(\mathbb{Z}/p\mathbb{Z})^*$ is a power of 2. Using the Pohlig–Hellman algorithm, it suffices to compute discrete logarithms in $O(\log p)$ two-element groups.
3.21 (a) A simple computation yields $\omega^{b/2} \equiv -1 \bmod 97$ and $\omega^b \equiv 1 \bmod 97$.
3.21 (b)
\[ F = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -22 & -1 & 22 \\ 1 & -1 & 1 & -1 \\ 1 & 22 & -1 & -22 \end{pmatrix} \qquad \overline{F} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 22 & -1 & -22 \\ 1 & -1 & 1 & -1 \\ 1 & -22 & -1 & 22 \end{pmatrix} \]
3.21 (c) The degree of $f * g$ is 3, therefore we can choose $b = 4$ and use $\omega = -22$ as primitive $b$th root of unity. This yields the following course of the FFT on the two inputs $2 + 3X$ and $1 + X + X^2$:

  $2 + 3X$:        even part $2 \mapsto (2, 2)$,         odd part $3 \mapsto (3, 3)$,   merge $\mapsto (5, 33, -1, -29)$
  $1 + X + X^2$:   even part $1 + X \mapsto (2, 0)$,     odd part $1 \mapsto (1, 1)$,   merge $\mapsto (3, -22, 1, 22)$

These schemes have the following meaning: if $f, f_0, f_1$ are polynomials such that $f(X) = f_0(X^2) + Xf_1(X^2)$ and $w = (1, \omega, \omega^2, \dots, \omega^{b-1})$, then the split of $f$ into the even part $f_0$ and the odd part $f_1$ is the recursion, and if $u$ and $\upsilon$ are the results for $f_0$ and $f_1$, then $(u, u) + w(\upsilon, \upsilon)$ is the merge of the results, where addition and multiplication are meant componentwise. We compute the componentwise product $(3, -22, 1, 22) \cdot (5, 33, -1, -29) \equiv (15, 50, -1, 41) \bmod 97$. As to the inverse transformation, we use the same scheme, only the primitive $b$th root of unity used here is $\omega^{-1} = 22$. Then

  $(15, 50, -1, 41)$:   even part $(15, -1) \mapsto 14 + 16X$,   odd part $(50, 41) \mapsto -6 + 9X$,   merge $\mapsto 8 + 20X + 20X^2 + 12X^3$
To obtain the result, we have to multiply $8 + 20X + 20X^2 + 12X^3$ by $b^{-1} = -24$ and we end up with $(f * g)(X) = 2 + 5X + 5X^2 + 3X^3$.
3.22. Let $u$ and $\upsilon$ be natural numbers with a binary representation of at most $n$ bits. The time complexity of the method is dominated by the time needed for the FFT and therefore it is $O(n \log n)$. There are at most $m = \lceil n/64 \rceil$ indices $i$, for which $u_i$ as well as $\upsilon_i$ are nonzero. We consider the product $u\upsilon = (\sum_j u_j 2^{64j})(\sum_j \upsilon_j 2^{64j}) = \sum_j (\sum_k u_k \upsilon_{j-k}) 2^{64j}$ with the usual convention that $u_j = \upsilon_j = 0$ for $j < 0$ and for $j \geq m$. In order to have the $z_j = \sum_k u_k \upsilon_{j-k}$ uniquely determined by $w_j$, the inequality $z_j < p_1 p_2 p_3$ has to hold. At most $m$ summands in $z_j$ are nonzero, and each of these summands is a product of two numbers bounded by $2^{64}$. Therefore, we obtain $z_j \leq m \cdot 2^{64} \cdot 2^{64}$. Since $p_i > 2^{56}$ it suffices if the number $m$ of 64-bit blocks of $u$ and $\upsilon$ satisfies $m \leq 2^{3 \cdot 56}/2^{2 \cdot 64} = 2^{40}$. Furthermore, $2m \leq 2^{56}$ has to hold in order to have no overflow in the Fourier transform. This already follows from the fact that $m \leq 2^{40}$. Therefore, using this method numbers with a binary representation of $64 \cdot 2^{40}$ bits can be multiplied. This corresponds to a binary representation of more than eight terabytes. With very large primes $p_i$ (as close to 64 bits as possible) even larger numbers can be multiplied.
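The computation of Solution 3.21 (c) can be replayed with a short recursive FFT over $\mathbb{Z}/97\mathbb{Z}$. This is a minimal sketch, assuming that the input length is a power of two and that the root of unity satisfies $\omega^{b/2} \equiv -1$; the function names `fft` and `multiply` are hypothetical.

```python
P = 97            # modulus from Exercise 3.21
OMEGA = -22 % P   # primitive 4th root of unity used in the solution

def fft(coeffs, omega, p=P):
    """Evaluate the polynomial at 1, omega, omega^2, ... (length a power of 2)."""
    b = len(coeffs)
    if b == 1:
        return [coeffs[0] % p]
    # Recursion: f(X) = f0(X^2) + X * f1(X^2)
    even = fft(coeffs[0::2], omega * omega % p, p)
    odd = fft(coeffs[1::2], omega * omega % p, p)
    out = [0] * b
    w = 1
    for i in range(b // 2):
        t = w * odd[i] % p
        out[i] = (even[i] + t) % p            # merge: (u, u) + w * (v, v)
        out[i + b // 2] = (even[i] - t) % p   # uses omega^(b/2) = -1
        w = w * omega % p
    return out

def multiply(f, g, b=4, omega=OMEGA, p=P):
    """Multiply two coefficient lists via FFT, pointwise product, inverse FFT."""
    fa = fft(f + [0] * (b - len(f)), omega, p)
    ga = fft(g + [0] * (b - len(g)), omega, p)
    prod = [x * y % p for x, y in zip(fa, ga)]
    inv = fft(prod, pow(omega, -1, p), p)     # same scheme with omega^(-1)
    b_inv = pow(b, -1, p)                     # b^(-1) = -24 mod 97
    return [c * b_inv % p for c in inv]

print(fft([2, 3, 0, 0], OMEGA))     # [5, 33, 96, 68], i.e. (5, 33, -1, -29) mod 97
print(multiply([2, 3], [1, 1, 1]))  # [2, 5, 5, 3]
```

Values are reduced to the range $0, \dots, 96$, so $-1$ appears as 96 and $-29$ as 68.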
Chapter 5 5.1 (a) If the characteristic is not 2, we complete the square on the left-hand side of ̃ Equation (5.2) and obtain Ỹ 2 + 2( 2c X̃ + 2d )Ỹ + ( 2c X̃ + 2d )2 − t(X)̃ = (Ỹ + 2c X̃ + 2d )2 − t(X), ̃ where t is a quadratic polynomial in X, which we push to the right-hand side. Now let Y = Ỹ + 2c X̃ + 2d and X̂ = X̃ and gather the coefficients on the right-hand side. ̂ and combine the coefficients. 5.1 (b) Let X̂ = X − e/3 5.2. Our computations are performed in an algebraically closed field k. If the polynomial has a multiple root, then we have X 3 + AX + B = (X − a)2 (X − b) for some a, b ∈ k. Multiplying out and comparing coefficients yields 2a + b = 0, 2ab + a2 = A and −a2 b = B. By substituting b = −2a in the second and third equation, we obtain A = −3a2 and B = 2a3 . It follows that 4A3 +27B2 = −4⋅27a6 +27⋅4a6 = 0 independent of the field’s characteristic. For the converse, let 4A3 + 27B2 = 0. If the characteristic is neither 2 nor 3, then this yields (−A/3)3 = (B/2)2 . Let a be a square root of −A/3; then A = −3a2 and B = 2a3 . For b = −2a the same computation as in the first part yields (X − a)2 (X − b) = X 3 + AX + B and thus, a is a multiple root. If the characteristic is 2, then 4A3 + 27B2 = 0 if and only if B = 0. Then, X 3 + AX = X(X 2 + A) = X(X + a)2 for a ∈ k with a2 = A. If the characteristic is 3, then 4A3 + 27B2 = 0 is equivalent to A = 0. We obtain X 3 + B = (X + b)3 for b ∈ k with b 3 = B.
5.3. Let $s(X) = X^3 + AX + B$. We show that for every $a \in \mathbb{F}_p$ there are $1 + \left(\frac{s(a)}{p}\right)$ points in $E(\mathbb{F}_p)$ with the value $a$ in the $X$-coordinate. If $s(a) = 0$, then $(a, 0)$ is the only point. If $\left(\frac{s(a)}{p}\right) = 1$, then there exists a square root $b \in \mathbb{F}_p$ of $s(a)$ and the two points $(a, b)$ and $(a, -b)$ are the only ones with this $X$-coordinate. If $\left(\frac{s(a)}{p}\right) = -1$, then no point on the curve has the value $a$ as an $X$-coordinate. This yields
\[ |E(\mathbb{F}_p)| = \sum_{a \in \mathbb{F}_p} \left(1 + \left(\frac{s(a)}{p}\right)\right) = p + \sum_{a \in \mathbb{F}_p} \left(\frac{s(a)}{p}\right) \]
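The counting formula of Solution 5.3 can be compared with a brute-force enumeration. The sketch below is only an illustration, assuming an odd prime $p$ and computing the Legendre symbol via Euler's criterion; the function names are hypothetical. The curves $Y^2 = X^3 + X + 6$ over $\mathbb{F}_{11}$ and $Y^2 = X^3 + X$ over $\mathbb{F}_5$ are used as test cases.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

def count_affine_points(p, A, B):
    """|E(F_p)| = p + sum of (s(a)/p) over a in F_p, with s(X) = X^3 + AX + B."""
    return p + sum(legendre(a**3 + A * a + B, p) for a in range(p))

def brute_force_points(p, A, B):
    """List all (x, y) in F_p x F_p with y^2 = x^3 + Ax + B."""
    return [(x, y) for x in range(p) for y in range(p)
            if (y * y - (x**3 + A * x + B)) % p == 0]

print(count_affine_points(11, 1, 6), len(brute_force_points(11, 1, 6)))  # 12 12
print(brute_force_points(5, 1, 0))  # [(0, 0), (2, 0), (3, 0)]
```

Both functions count affine points only; the point at infinity has to be added separately to obtain the group order.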
5.4 (a) 4 ⋅ 13 + 27 ⋅ 62 = 8 ≠ 0 in 𝔽11 . By Exercise 5.2, this proves the claim. 5.4 (b) For each x ∈ 𝔽11 we distinguish the three possible cases whether x3 + x + 6 equals 0, is a square in 𝔽11 or is not a square in 𝔽11 . Then, we obtain from Exercise 5.3 the set E(𝔽11 ) = {(2, 4), (2, 7), (3, 5), (3, 6), (5, 2), (5, 9), (7, 2), (7, 9), (8, 3), (8, 8), ̃ 11 )| = |E(𝔽11 ) ∪ O| = 12 + 1 = 13 is prime and thus (10, 2), (10, 9)}. Therefore, |E(𝔽 ̃ 11 ) is cyclic. E(𝔽 5.5 (a) 4 ⋅ 13 = −1 ≠ 0 in 𝔽5 . By Exercise 5.2, this proves the claim. 5.5 (b) Similar to the previous exercise, we obtain E(𝔽5 ) = {(0, 0), (2, 0), (3, 0)}. Each ̃ 5 ) is isomorphic to the Klein fourof these three elements is of order 2. Hence, E(𝔽 group; any two elements of E(𝔽5 ) generate the group. ̃ | 3P = O } is a subgroup of E(k). ̃ 5.6. It is easy to see that { P ∈ E(k) Let P = (a, b) ∈ E(k) be a point of order 3. We have b ≠ 0 because otherwise P would be of order 2. 2 Moreover 2P = −P and therefore ( 3a2b+A )2 − 2a = a. Together with b 2 = a3 + Aa + B, 4 2 2 this yields 3a + 6Aa + 12Ba − A = 0. We therefore consider the polynomial t(X) = 3X 4 + 6AX 2 + 12BX − A2 . Every point of order 3 would lead to the same polynomial; in particular, the X-coordinate of every point in { P ∈ E(k) | 3P = O } is a root of t. Moreover, every root of t is the X-coordinate of two different points in { P ∈ E(k) | 3P = O }. If n is the number of different roots of t, then |{ P ∈ E(k) | 3P = O }| = 2n ⩽ 8. The derivative of t is t (X) = 12(X 3 + AX + B). The polynomial t has no multiple roots because X 3 + AX + B has three distinct roots. Therefore, t has no triple roots. Suppose that t(X) has two different double roots c ≠ d. Then, t(X) = 3(X −c)2 (X −d)2 = 3(X 2 − 2cX + c2 )(X 2 − 2dX + d2 ). Comparing coefficients at X 3 yields 0 = −6(c + d). Then, c = −d and therefore t(X) = 3(X 2 − c2 )2 = 3X 4 − 6c2 X 2 + 3c4 . This implies B = 0 as well as A = −c2 and A2 = −3c4 . 
We obtain A = B = 0, which contradicts 4A3 + 27B2 ≠ 0. Therefore, t has at least two simple roots c1 ≠ c2 . We further have c3i + Ac i + B ≠ 0 for i ∈ {1, 2}. Thus for suitable d i ≠ 0 we can find four different points (c i , ±d i ) on E(k). By construction of t, all four points P i satisfy 2P i = −P i . This shows that all four points have order 3. Every Abelian group, in which at least four elements have order 3, contains ℤ/3ℤ × ℤ/3ℤ. So there are at least 8 points of order 3. 5.7. We may assume that neither α nor β is trivial. For α = β, we have (α + β)(P) = 2α(P), and in this case α + β is rational. Next, we consider the case α(P) = (f(P), g(P))
and β(P) = (f(P), h(P)) for almost all P ∈ E(k). We may assume that α ≠ β, and therefore g(P) ≠ h(P) for almost all points P ∈ E(k). Then the only possibility is g(P) = −h(P) for infinitely many P ∈ E(k). This implies g = −h in k(x, y). Thus, α(P) = −β(P) for almost all P ∈ E(k), and furthermore α = −β by Lemma 5.15. This shows that α + β is the trivial endomorphism. The only remaining case is α(P) = (f1 (P), g(P)) and β(P) = (f2 (P), h(P)) for almost all P ∈ E(k) such that f1 −f2 ≠ 0 ∈ k(x, y). So for almost all P, we have f1 (P) ≠ f2 (P) and using the addition formula on the elliptic curve for points with different x-coordinate we find the corresponding rational morphism for α + β. 5.8. Let α(P) = (r1 (P), r2 (P)) for almost all P ∈ E(k). We already know that we can i (x) . Since α is an endomorphism, we have α(x, −y) = −α(x, y). write r i (x, y) = u i (x)+yυ q i (x) Therefore, υ1 (x) = 0 ∈ k[x] and u 2 (x) = 0 ∈ k[x]. This yields the desired polynomials p(x) = u 1 (x), q(x) = q1 (x), u(x) = υ2 (x) and υ(x) = q2 (x). If p(x)/q(x) or u(x)/υ(x) were constant, then the image of α would be a finite set, in contradiction to the kernel being finite. 5.9. We only have to show that ϕ q is a group homomorphism. Since ϕ q is a bijection on E(k), the inverse mapping ψ(P) = ϕ−1 q (P) is defined and it suffices to show ̃ that ψ is a group homomorphism. We identify E(k) with Pic0 (E(k)) according to Theorem 5.5. The mapping P → ψ(P) induces an automorphism of the group of divisors by ∑P∈E(k) n P P → ∑P∈E(k) n p ψ(P), which we denote by ψ, too. It remains to show that principal divisors are mapped to principal divisors because then ψ induces a homomorphism on Pic0 (E(k)). Consider a principal divisor div(f) = ∑P∈E(k) ordP (f)P with f ∈ k[x, y]. We obtain ψ(div(f)) = ∑ ordP (f)ψ(P) = ∑ ordϕ q (P) (f)P P∈E(k)
P∈E(k)
By Theorem 5.3, we obtain ordϕ q (P) (f) = q ⋅ ordP (f) = ordP (f q ) for all P ∈ E(k). Thus, ψ(div(f)) = div(f q ) is a principal divisor. Therefore, ψ induces an automorphism of ̃ the Abelian group E(k). Hence, the inverse mapping ϕ q is a homomorphism, too. 5.10. We write the rational morphism α as α = (f(x, y), g(x, y)) with f(x, y) = 2 ( 3x2y+A )2 − 2x. For a suitable polynomial p(x), we have f(x, y) = p(x) s(x) for s(x) = 3 x + Ax + B. Since s(x) has no multiple roots, s (x) is not the zero polynomial. For y = 0 there must be a pole at p(x) s(x) . Therefore, p(x) and s(x) have no common zeros. 5.11. The Frobenius morphism ϕ q is an endomorphism, therefore (ϕ q − 1) is one as well (according to Theorem 5.17). The kernel (except for O) consists of exactly the points (a, b) ∈ E(k) with a = a q and b = b q . Hence, we have (a, b) ∈ E(𝔽q ). (See for example the proof of Theorem 1.61).
Chapter 6 6.1. We denote the fact that u and υ are transposed by u ∼ υ. Then abc = a(bc) ∼ (bc)a = bac = (ba)c ∼ c(ba) = cba, but abc ≁ cba. 6.2. For the implication (i) ⇒ (ii) we may assume that M = Σ∗ . Let φ(a) = 1 for all a ∈ Σ. Then φ : Σ∗ → ℕ defines a homomorphism with φ(w) = 0 if and only if w is empty (the number φ(w) is the length of w). The empty word is the neutral element of Σ∗ . If pq = xy, then we write p = a1 ⋅ ⋅ ⋅ a k , q = a k+1 ⋅ ⋅ ⋅ a m , x = b 1 ⋅ ⋅ ⋅ b ℓ , y = b ℓ+1 ⋅ ⋅ ⋅ b n with a i , b j ∈ Σ. From pq = xy, we obtain m = n and a i = b i for all i ∈ {1, . . . , n}. Without loss of generality let k ⩾ ℓ, otherwise we exchange (p, q) and (x, y). With u = aℓ+1 ⋅ ⋅ ⋅ a k , we therefore get p = xu and y = uq. ̃ ab | a, b ∈ M̃ }. Now let M be a monoid satisfying (ii). For M̃ = M\{1}, we let Σ = M\{ The set Σ contains those elements of M which are not decomposable. By induction over φ(w), we show that each element from M can be written as a product of elements from Σ. If φ(w) = 0, then w is the neutral element, which appears as the empty product. Now let φ(w) > 0. If w ∈ Σ, then there is nothing to show. So let w = uυ with ̃ Then φ(u) < φ(u) + φ(υ) = φ(w). Analogously, φ(υ) < φ(w). By induction, u, υ ∈ M. u and υ can be written as products of elements from Σ and thus also w = uυ. This shows that M is generated by Σ. Finally, let a1 ⋅ ⋅ ⋅ a m = b 1 ⋅ ⋅ ⋅ b n with a i , b j ∈ Σ. In order to show that M corresponds to the free monoid Σ∗ , we have to prove that m = n and a i = b i for all i ∈ {1, . . . , m}. This is done by induction on m + n. For m = 0, we have 0 = φ(1) = φ(b 1 ⋅ ⋅ ⋅ b n ) = φ(b 1 ) + ⋅ ⋅ ⋅ + φ(b n ). From φ(c) > 0 for all c ∈ Σ, we obtain n = 0. Now let m ⩾ 1. φ(b 1 ⋅ ⋅ ⋅ b n ) = φ(a1 ⋅ ⋅ ⋅ a m ) > 0 yields n ⩾ 1. Thus, there is u ∈ M with a1 = b 1 u, b 2 ⋅ ⋅ ⋅ b n = ua2 ⋅ ⋅ ⋅ a m or b 1 = a1 u, a2 ⋅ ⋅ ⋅ a m = ub 2 ⋅ ⋅ ⋅ b n . Without loss of generality let a1 = b 1 u and b 2 ⋅ ⋅ ⋅ b n = ua2 ⋅ ⋅ ⋅ a m . 
From a1 , b 1 ∈ Σ and the construction of Σ it follows that u = 1; this shows a1 = b 1 and a2 ⋅ ⋅ ⋅ a m = b 2 ⋅ ⋅ ⋅ b n . Inductively, we obtain m = n and a i = b i for all i ∈ {2, . . . , m}. 6.3. Consider any nontrivial group G. Then G, viewed as a monoid, is not free. For pq = xy choose u = p−1 x ∈ G. Then pu = x and q = uy. 6.4 (a) Let u ≺ υ. If u is a proper prefix of υ, then wu is a proper prefix of wυ. If u = ras and υ = rbt with r, s, t ∈ Σ∗ , a, b ∈ Σ and a < b, then wu = (wr)as and wυ = (wr)bt. Thus, in both cases, wu ≺ wυ. Conversely, let wu ≺ wυ. If wu is a proper prefix of wυ, then u is a proper prefix of υ. If wu = r as and wυ = r bt and a < b, then |w| ⩽ |r |. Let r = wr. Then u = ras and υ = rbt, and therefore u ≺ υ. 6.4 (b) Since u is not a prefix of υ, we consequently have u = ras and υ = rbt with r, s, t ∈ Σ∗ , a, b ∈ Σ and a < b. This yields uw = ra(sw) and υz = rb(tz) and therefore uw ≺ υz. 6.5. (i) ⇒ (ii): Suppose that w is a proper factor of w2 , that is, w2 = uwυ with u ≠ ε ≠ υ. Then there are words s, t ∈ Σ∗ such that uw = wt and wυ = sw. From that
we obtain |u| = |t|, |υ| = |s| and w = st = us = tυ, and further u = t and s = υ, which implies w = st = us = ts. By Theorem 6.5, w is not primitive. (ii) ⇒ (i): If w = u i with i > 1 then w2 = u 2i = u(u i )u i−1 and w is a proper factor of w2 . (i) ⇔ (iii) holds because w = u i with i > 1, u = au if and only if υa = u (au )i−1 a = (u a)i . The root u a of υa, therefore, is the cyclic permutation of the root u of w. 6.6. Let u and υ be primitive roots. Then, without loss of generality w = u i = υ j with 1 ⩽ i ⩽ j. For i = 1, w is primitive and therefore u = υ = w. For i ⩾ 2, we have |u| + |υ| ⩽ |w| and by the periodicity lemma, gcd(|u|, |υ|) is a period of w, too. But this yields u = υ. 6.7. (i) ⇒ (ii): Let υ be a proper suffix of w = uυ. We first show that υ cannot be a proper prefix of w. Assume to the contrary that w = υt. By Theorem 6.5 we hace υ = (rs)k r, u = rs and t = sr for r, s ∈ Σ∗ . Since w is primitive, we may assume that r ≠ ε and rs ≠ sr. Since w is a Lyndon word, we obtain w = (rs)k+1 r ≺ r(rs)k+1 . Using Exercise 6.4 the word r can be reduced at the beginning, and we obtain (sr)k+1 ≺ (rs)k+1 . Again by Exercise 6.4 we can multiply this equation from the right by r and obtain (sr)k+1 r ≺ (rs)k+1 r = w. This is a contradiction to w being a Lyndon word. Thus, no proper suffix of a Lyndon word w can occur as a prefix of w. If υ ≺ uυ = w, then by Exercise 6.4 we have υu ≺ uυ. This is not possible because w is a Lyndon word. Therefore, w ≺ υ as required. (ii) ⇒ (i): For factorizations w = uυ, we have uυ ≺ υ ≺ υu. If w = u i with i > 1 then w ≺ u ≺ u i = w. Thus, w is primitive, and therefore a Lyndon word. In the remainder of the solution we use (i) ⇔ (ii). (i) ⇒ (iii): The statement is clear for w ∈ Σ. So let w ∈ ̸ Σ. Since all letters are Lyndon words, one can decompose w = uυ with maximal |υ|, such that u ≠ ε and υ is a Lyndon word. We have u ≺ uυ ≺ υ since uυ is a Lyndon word. It remains to show that u is a Lyndon word, too. 
Without loss of generality, let u ∈ ̸ Σ, otherwise u is already a Lyndon word. Now consider a proper suffix u of u. The word u υ is not a Lyndon word because υ was chosen to be maximal. Thus, there is a suffix t of u υ with t ≺ u υ. From u ≺ t follows u ≺ t ≺ u υ and then t = u s. The word s is a suffix of the Lyndon word υ, and thus υ ≺ s, which implies u υ ≺ u s = t, a contradiction. Therefore, t ≼ u and altogether, we obtain u ≺ uυ ≺ t ≼ u . From (ii) now follows that u is a Lyndon word, as we had to show. (iii) ⇒ (i): If w ∈ Σ, then the statement is clear. So let w = uυ with u ≺ υ for Lyndon words u, υ. We first show the auxiliary claim uυ ≺ υ. If u is not a prefix of υ, then this follows from Exercise 6.4. If υ = uυ , then υ ≺ υ because υ is a Lyndon word. But then uυ ≺ uυ = υ. We now show that uυ ≺ s for every proper suffix s of uυ. If s is a suffix of υ, then uυ ≺ υ ≺ s. Otherwise, s = tυ. Since t is a proper suffix of u, we have u ≺ t and therefore uυ ≺ tυ = s by Exercise 6.4. So uυ is a Lyndon word. 6.8. Since each letter in Σ is a Lyndon word, there exists a decomposition of w into Lyndon words. Let w = ℓ1 ⋅ ⋅ ⋅ ℓn be such a decomposition with minimal n. If ℓi ≺ ℓi+1 at one place i ∈ {1, . . . , n − 1}, then ℓi ℓi+1 by Exercise 6.7 is a Lyndon word. This is impossible since n was chosen to be minimal. Therefore, ℓn ≼ ℓn−1 ≼ ⋅ ⋅ ⋅ ≼ ℓ1 . Thus,
there is a decomposition of the required kind. It remains to show the uniqueness of this decomposition. So let w = ℓ1 ⋅ ⋅ ⋅ ℓn = ℓ1 ⋅ ⋅ ⋅ ℓm be two such decompositions. We show that ℓ1 = ℓ1 and then get the result by the inductive hypothesis for ℓ2 ⋅ ⋅ ⋅ ℓn = ℓ2 ⋅ ⋅ ⋅ ℓm . So without loss of generality let ℓ1 = ℓ1 ⋅ ⋅ ⋅ ℓi υ with i ⩾ 1 and υ being a nonempty prefix of ℓi+1 . Then by Exercise 6.7, we have ℓ1 ≺ υ. Moreover, υ ≼ ℓi+1 ≼ ℓ1 ≺ ℓ1 since υ is a prefix of ℓi+1 . Altogether, we obtain ℓ1 ≺ ℓ1 , a contradiction.
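The decomposition from Exercise 6.8 can also be computed. The sketch below does not follow the existence proof above; it swaps in Duval's algorithm, a standard linear-time method, and checks the Lyndon property through the suffix criterion of Exercise 6.7 (ii). It assumes that words are Python strings compared with the built-in lexicographic order, in which a proper prefix is smaller; the function names are hypothetical.

```python
def is_lyndon(w):
    """A nonempty word is a Lyndon word iff it is smaller than all its proper suffixes."""
    return len(w) > 0 and all(w < w[i:] for i in range(1, len(w)))

def lyndon_factorization(w):
    """Duval's algorithm: factor w into a non-increasing sequence of Lyndon words."""
    n, i, factors = len(w), 0, []
    while i < n:
        j, k = i + 1, i
        # Extend the current run while it stays a prefix of a power of a Lyndon word.
        while j < n and w[k] <= w[j]:
            k = i if w[k] < w[j] else k + 1
            j += 1
        # The run consists of copies of one Lyndon word of length j - k.
        while i <= k:
            factors.append(w[i:i + j - k])
            i += j - k
    return factors

factors = lyndon_factorization("banana")
print(factors)  # ['b', 'an', 'an', 'a']
print(all(is_lyndon(f) for f in factors))
```

By the uniqueness part of Exercise 6.8, any correct method has to return the same factors.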
Chapter 7
7.1 (a) We can express Synt(L) as a regular set as follows: Synt(L) = {ε} ∪ a+ ∪ b+ ∪ abb∗ ∪ a∗ab ∪ {ba}. To see that these are the shortest words, calculate [ba]_L = {a, b}∗ba{a, b}∗, [a^n]_L = {a^n} for n ⩾ 1, and [abb^n]_L = { a^k b^{k+n} | k ⩾ 1 }. The result follows by symmetry in a and b.
7.1 (b) Consider ab and x, y, r, s ∈ Synt(L) such that xyab ≡_L ab and ab ≡_L abrs. Then we have x = y = r = s = ε. Hence, [ab] ∼_D [u] implies u = ab as words. On the other hand [aab] ∼_J [ab] because (aab)b ≡_L ab. Hence, D ≠ J.
7.1 (c) We have [abb] ⩽_R [ab], but there is no word u such that [ab] = [abbu].
7.2. As we saw in the solution of Exercise 1.6 (a), there are numbers t, p ∈ ℕ with p ⩾ 1 and x^t = x^{t+p}. If we choose t and p minimal, then S = {x, . . . , x^{t+p−1}} and n = t + p − 1. In particular, the semigroup is uniquely determined by t ∈ {1, . . . , n} and different values for t define nonisomorphic semigroups.
7.3. The set product ⋅ defines an associative operation on the power set 2^M, where we let A ⋅ B = { ab ∈ M | a ∈ A, b ∈ B }. Therefore, n = (2^{|M|})! satisfies A^n = A^{2n} for all A ⊆ M. In particular, φ(Σ^n) = φ(Σ)^n = φ(Σ)^{2n} = φ(Σ^{2n}).
7.4. Let φ : M → N be a homomorphism into a finite monoid N which recognizes L. Let P = { x ∈ N | x^2 ∈ φ(L) }. Then
φ^{−1}(P) = { u ∈ M | φ(u) ∈ P } = { u ∈ M | φ(uu) = φ(u)φ(u) ∈ φ(L) } = { u ∈ M | uu ∈ L } = √L
Thus, φ^{−1}(φ(√L)) = φ^{−1}(φ(φ^{−1}(P))) = φ^{−1}(P) = √L and √L is recognized by N.
7.5. First, let L = K_1 × K_2 and for i ∈ {1, 2} let φ_i : M_i → N_i be a homomorphism into a finite monoid N_i with K_i = φ_i^{−1}(φ_i(K_i)). Let ψ : M_1 × M_2 → N_1 × N_2 with ψ(m_1, m_2) = (φ_1(m_1), φ_2(m_2)). Then
ψ^{−1}(ψ(L)) = ψ^{−1}(φ_1(K_1) × φ_2(K_2)) = φ_1^{−1}(φ_1(K_1)) × φ_2^{−1}(φ_2(K_2)) = K_1 × K_2 = L
So, L is recognizable. The direction from right to left now follows, because the class of recognizable languages is closed under union. For the converse, let φ : M_1 × M_2 → N be a homomorphism into a finite monoid N with φ^{−1}(φ(L)) = L. We define the homomorphisms ψ_1 : M_1 → N and ψ_2 : M_2 → N by ψ_1(m_1) = φ(m_1, 1) and ψ_2(m_2) = φ(1, m_2). This yields the homomorphism ψ : M_1 × M_2 → N × N with ψ(m_1, m_2) = (ψ_1(m_1), ψ_2(m_2)). We now show that the homomorphism ψ recognizes the language L. To this end, let P = { (n_1, n_2) ∈ N × N | n_1 n_2 ∈ φ(L) }. Then, we have
ψ^{−1}(P) = { (m_1, m_2) | ψ(m_1, m_2) ∈ P } = { (m_1, m_2) | ψ_1(m_1) ψ_2(m_2) ∈ φ(L) } = { (m_1, m_2) | φ(m_1, 1) φ(1, m_2) ∈ φ(L) } = { (m_1, m_2) | φ(m_1, m_2) ∈ φ(L) } = φ^{−1}(φ(L)) = L
and therefore ψ^{−1}(ψ(L)) = ψ^{−1}(ψ(ψ^{−1}(P))) = ψ^{−1}(P) = L. So, ψ recognizes the set L. We obtain
L = ψ^{−1}(ψ(L)) = ⋃_{(n_1,n_2)∈ψ(L)} ψ^{−1}(n_1, n_2) = ⋃_{(n_1,n_2)∈ψ(L)} ψ_1^{−1}(n_1) × ψ_2^{−1}(n_2)
that is, L is of the desired form.
7.6. We have u ∈ L(B) ⇔ q_0 ⋅ u ∈ Q \ F ⇔ q_0 ⋅ u ∉ F ⇔ u ∈ M \ L(A).
7.7 (a) Choose F = (F_1 × Q_2) ∪ (Q_1 × F_2). Then u ∈ L(B) if and only if (q_1, q_2) ⋅ u ∈ F. This is equivalent to q_1 ⋅ u ∈ F_1 or q_2 ⋅ u ∈ F_2 and therefore to u ∈ L(A_1) ∪ L(A_2).
7.7 (b) Choose F = F_1 × F_2. Then u ∈ L(B) if and only if (q_1, q_2) ⋅ u ∈ F. This is equivalent to q_1 ⋅ u ∈ F_1 and q_2 ⋅ u ∈ F_2 and therefore to u ∈ L(A_1) ∩ L(A_2).
7.8 (a) We apply the Thompson construction from Lemma 7.11, which yields the following automaton.
[Figure: automaton produced by the Thompson construction; the visible edge labels are ab, a, abb, and 1.]
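The choices of final states in Exercises 7.6 and 7.7 can be checked on small examples. The following Python sketch builds the product automaton for two hypothetical deterministic automata over {a, b} (one accepting words with an even number of a's, one accepting words ending in b; neither appears in the text) and compares acceptance with the expected Boolean combination. The names `run`, `product_automaton`, and `accepts` are illustrative assumptions.

```python
from itertools import product

def run(delta, state, word):
    """Return the state reached from `state` by reading `word` (complete DFA)."""
    for letter in word:
        state = delta[(state, letter)]
    return state

def product_automaton(delta1, delta2, alphabet):
    """Transitions on pairs of states: (p, q) . a = (p . a, q . a)."""
    states1 = {p for (p, _) in delta1}
    states2 = {q for (q, _) in delta2}
    return {((p, q), a): (delta1[(p, a)], delta2[(q, a)])
            for p in states1 for q in states2 for a in alphabet}

def accepts(delta, start, finals, word):
    return run(delta, start, word) in finals

ALPHABET = "ab"
# Hypothetical automaton A1: words with an even number of a's.
DELTA1 = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
Q1, F1 = {0, 1}, {0}
# Hypothetical automaton A2: words ending in b.
DELTA2 = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}
Q2, F2 = {0, 1}, {1}

DELTA = product_automaton(DELTA1, DELTA2, ALPHABET)
F_UNION = {(p, q) for p in Q1 for q in Q2 if p in F1 or q in F2}  # Exercise 7.7 (a)
F_INTERSECTION = {(p, q) for p in F1 for q in F2}                 # Exercise 7.7 (b)
F_COMPLEMENT = Q1 - F1                                            # Exercise 7.6, applied to A1

# All words of length at most 6, used for an exhaustive comparison.
WORDS = ["".join(t) for n in range(7) for t in product(ALPHABET, repeat=n)]
```

The design point is that all three constructions reuse the same transition structure and differ only in the set of final states.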
7.8 (b) The construction of Lemma 7.12 first inserts intermediate states:
[Figure: automaton with the states 0 to 6; edges labeled a, b, and 1.]
Removing 1-edges and unreachable states yields the following spelling automaton:
[Figure: spelling automaton with the states 0 to 6; edges labeled a and b.]
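7.8 (c) We use the power set construction of Theorem 7.17. Notice that the automaton is already minimal.
[Figure: deterministic automaton obtained by the power set construction; its states include {1, 3, 4}, {2, 5}, and {1, 3}, and its edges are labeled a and b.]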
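As a rough illustration of the power set construction used in Exercise 7.8 (c), the following Python sketch determinizes a small nondeterministic automaton. The example automaton (words over {a, b} whose second-to-last letter is a) is a hypothetical stand-in, since the automaton of the exercise is only given as a figure; the function name and data layout are assumptions as well.

```python
from itertools import product

def determinize(delta, initial, finals, alphabet):
    """Power set construction, restricted to the reachable subsets.

    delta maps (state, letter) to a set of successor states.
    Returns (states, dfa_delta, start, dfa_finals) with frozensets as states.
    """
    start = frozenset(initial)
    dfa_delta = {}
    seen, todo = {start}, [start]
    while todo:
        subset = todo.pop()
        for a in alphabet:
            target = frozenset(q for p in subset for q in delta.get((p, a), ()))
            dfa_delta[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # A subset is final if it contains at least one final state.
    dfa_finals = {subset for subset in seen if subset & set(finals)}
    return seen, dfa_delta, start, dfa_finals

def dfa_accepts(dfa_delta, start, dfa_finals, word):
    state = start
    for letter in word:
        state = dfa_delta[(state, letter)]
    return state in dfa_finals

# Hypothetical NFA: accepts the words whose second-to-last letter is a.
NFA_DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
STATES, DFA_DELTA, START, DFA_FINALS = determinize(NFA_DELTA, {0}, {2}, "ab")

# All words of length at most 7, used for an exhaustive comparison.
WORDS = ["".join(t) for n in range(8) for t in product("ab", repeat=n)]
```

Only subsets reachable from the initial subset are generated, which corresponds to discarding unreachable states.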
7.8 (d) It is sufficient to interchange final and nonfinal states in B.
[Figure: the automaton from 7.8 (c) with final and nonfinal states interchanged; edges labeled a and b.]
7.8 (e) We use the construction from Lemma 7.13 to obtain a rational expression. First, we add the distinguished initial and final state and re-number all states. Note that in the resulting automaton there is at most one edge between any two states except for the loop at q6. Therefore, we also replace the loop's labeling by {a, b}.
[Figure: automaton with the states q0 to q8; edges labeled 1, a, and b, and a loop labeled {a, b} at q6.]
Next, we eliminate the state q8.
[Figure: automaton with the states q0 to q7; edges labeled 1, a, b, ba, and bb, and a loop labeled {a, b} at q6.]
Eliminating q7 results in the following automaton.
[Figure: automaton with the states q0 to q6; edges labeled 1, a, b, ba, and bb, and a loop labeled {a, b} at q6.]
The next state to eliminate is q6.
[Figure: automaton with the states q0 to q5; besides edges labeled 1, a, b, and ba, it has edges labeled {1} ∪ b{a, b}∗, b ∪ bb{a, b}∗, and {1} ∪ bb{a, b}∗.]
After eliminating q5, we have
[Figure: automaton with the states q0 to q4; edge labels 1, a, b, {1} ∪ b{a, b}∗, ba ∪ a(ba)∗a, and {1} ∪ bb{a, b}∗ ∪ a(ba)∗(b ∪ bb{a, b}∗).]
Then, we eliminate q4 . . .
[Figure: automaton with the states q0 to q3; edge labels 1, a, {1} ∪ b{a, b}∗, a ∪ b(ba ∪ a(ba)∗a), and b({1} ∪ bb{a, b}∗ ∪ a(ba)∗(b ∪ bb{a, b}∗)).]
. . . and q3.
[Figure: automaton with the states q0, q2, and q1; edge labels 1 and {1} ∪ b{a, b}∗ ∪ a(a ∪ b(ba ∪ a(ba)∗a))∗ b({1} ∪ bb{a, b}∗ ∪ a(ba)∗(b ∪ bb{a, b}∗)).]
Thus, we obtain
{1} ∪ b{a, b}∗ ∪ a(a ∪ b(ba ∪ a(ba)∗a))∗ b({1} ∪ bb{a, b}∗ ∪ a(ba)∗(b ∪ bb{a, b}∗))
as a rational expression for the complement.
7.9. We can merge q5 and q6 into one state as well as q2, q3 and q4. This yields the following automaton.
[Figure: automaton with the states q0, q1, {q2, q3, q4}, and {q5, q6}; edges labeled a and b.]
7.10. States in P(A^ρ) are subsets of Q. In this automaton, we have P ⋅ a = { q ∈ Q | ∃p ∈ P : (q, a, p) ∈ δ }. Moreover, F is the initial state and P ⊆ Q is a final state if and only if q0 ∈ P. Since L(P(A^ρ)) = L(A)^ρ it suffices to show that different states in P(A^ρ) define different languages. So, let P ≠ P′ and without loss of generality q ∈ P \ P′.
By assumption, there is a word w = a_1 ⋯ a_n with a_i ∈ Σ and q0 ⋅ a_1 ⋯ a_n = q. Thus, w^ρ = a_n ⋯ a_1 belongs to the language of P. Since A is deterministic, w^ρ can only be accepted from those states in P(A^ρ) which contain q. In particular, w^ρ does not belong to the language of P′.
7.11 (a) The claim follows using the construction in the proof of Theorem 7.4 together with the fact that V is closed under direct products.
7.11 (b) The monoid Synt(L) recognizes the language L. Since V is closed under divisors, the claim follows from Theorem 7.4.
7.12. Let (2^Σ, ∪, ∅) be the monoid of subsets of Σ and alph : Σ∗ → 2^Σ the canonical homomorphism defined by a ↦ {a}. Then, alph is surjective and recognizes each language L which is a Boolean combination of languages of the form B∗ for B ⊆ Σ. In 2^Σ the equations x^2 = x and xy = yx are valid. The syntactic monoid of L is a homomorphic image of 2^Σ, so the equations x^2 = x and xy = yx hold there, too. Now let φ : Σ∗ → M be a homomorphism recognizing L and let M be a monoid in which the equations x^2 = x and xy = yx are valid. Let u = a_1 ⋯ a_n be a word with a_i ∈ Σ. Due to the equations in M we have φ(u) = ∏_{a∈alph(u)} φ(a). Therefore, the value φ(u) is determined by alph(u). Using the notation [A] for { u ∈ Σ∗ | alph(u) = A }, we obtain
L = ⋃_{A∈alph(L)} [A] = ⋃_{A∈alph(L)} ( A∗ \ ⋃_{B⊊A} B∗ )
Thus, L is of the desired form. It should be noted that finiteness of M was not used in the proof. The image φ(Σ∗) is finite as a consequence of the equations alone.
7.13 (a) Divisors of aperiodic monoids are aperiodic. For the direction from left to right, it is therefore sufficient to consider an aperiodic group G. In G, we have 1 = g^n (g^{−1})^n = g^{n+1} (g^{−1})^n = g ⋅ 1 = g for n ∈ ℕ with g^n = g^{n+1}. So the group G contains only the neutral element. Now let all group divisors of M be trivial. We set n = |M|!. By Exercise 1.6 (b), we have x^n = x^{2n} for all x ∈ M. For each x ∈ M the set U = {1} ∪ { x^m | m ⩾ n } is a submonoid of M. The mapping φ : U → { x^m | m ⩾ n } with φ(1) = x^n and φ(x^m) = x^m for m ⩾ n defines a surjective homomorphism since, writing m = n + m′, we have x^m ⋅ x^n = x^n ⋅ x^m = x^n ⋅ x^{n+m′} = x^n x^n x^{m′} = x^{2n} x^{m′} = x^n x^{m′} = x^m. Moreover, G = { x^m | m ⩾ n } is a group with neutral element x^n. The inverse of x^m is x^{(n−1)m} because x^m ⋅ x^{(n−1)m} = x^{nm} = x^n. By assumption |{ x^m | m ⩾ n }| = 1 and thus x^n = x^{n+1}.
7.13 (b) If M is aperiodic, then M ∈ C follows from the Krohn–Rhodes theorem and the first part of the exercise. For the converse, we note that U_2 is aperiodic, and that divisors of aperiodic monoids are aperiodic. It remains to show that the wreath product of aperiodic monoids is aperiodic. Let n ∈ ℕ and M, N be two aperiodic monoids with x^n = x^{n+1} for all x ∈ M ∪ N. We now consider the wreath product M ≀ N and we
show that (f, x)^{2n} = (f, x)^{2n+1} for all (f, x) ∈ M^N × N. We have
(f, x)^{2n} = (f ∗ xf ∗ x^2 f ∗ ⋯ ∗ x^{2n−1} f, x^{2n})
Now, x^n = x^m for all m ⩾ n yields
(f, x)^{2n} = (f ∗ xf ∗ ⋯ ∗ x^n f ∗ x^n f ∗ ⋯ ∗ x^n f, x^{2n}) = (f ∗ xf ∗ ⋯ ∗ x^{n−1} f ∗ (x^n f)^n, x^{2n})
For f ∈ M^N and x ∈ N, we have x^n f ∈ M^N and hence, (x^n f)^n = (x^n f)^{n+1}. In N we have x^{2n} = x^{2n+1}. Thus,
(f, x)^{2n} = (f ∗ xf ∗ ⋯ ∗ x^{n−1} f ∗ (x^n f)^{n+1}, x^{2n+1}) = (f, x)^{2n+1}
This shows that M ≀ N is aperiodic.
7.14. We identify M^{|N|} with M^N. Suppose φ(a) = (f_a, n_a) with f_a : N → M and n_a ∈ N. Let ν(a) = n_a and μ(n, a) = n f_a. By induction on |u|, we obtain φ(u) = (ψ(σ_ν(u)), ν(u)) for all u ∈ A∗.
7.15. First, let M ≺ N. Then there is a submonoid U of N and a surjective homomorphism φ : U → M. The partial mapping φ : N → M defines a covering of M by N. Each preimage from φ^{−1}(m) is a cover of m ∈ M. For the other direction, let ψ : N → M be a covering. We define
U = { n ∈ N | n is a cover of some element m ∈ M }
We have 1 ∈ U because 1 is a cover of the neutral element of M. Let â and b̂ be covers of a, b ∈ M. Then ψ(n)ab = ψ(n â)b = ψ(n â b̂). Thus, â b̂ is a cover of ab ∈ M. This shows that U is a submonoid of N. We define a mapping φ : U → M by φ(â) = a. Suppose that â = b̂. Then â is both a cover of a and of b. Since ψ is surjective, there exists n ∈ N with ψ(n) = 1. Then a = ψ(n)a = ψ(n â) = ψ(n b̂) = ψ(n)b = b. This shows that φ is well defined. Now, φ(1) = 1 and φ(â b̂) = ab = φ(â)φ(b̂) because â b̂ is a cover of ab. This shows that φ is a surjective homomorphism.
7.16. Let X = {x_0, . . . , x_n}, let Y = X \ {x_0} and let Z = {x_0, t} with t ≠ x_0. We show that U_X ≺ U_Y × U_Z. For this, we define a covering ψ : U_Y × U_Z → U_X by
ψ(y, z) = z for z = x_0, and ψ(y, z) = y otherwise.
A cover of the neutral element 1_U is (1_U, 1_U), a cover of x_0 is (1_U, x_0), and a cover of x_i with i ⩾ 1 is given by (x_i, t); this is true because for all x ∈ U_X, we have
ψ(y, z) x = ψ(y, z) = ψ((y, z)(1_U, 1_U)) for x = 1_U,
ψ(y, z) x = x_0 = ψ(y, x_0) = ψ((y, z)(1_U, x_0)) for x = x_0,
ψ(y, z) x = x = ψ(x, t) = ψ((y, z)(x, t)) for x ∉ {1_U, x_0}.
Thus, in all cases ψ(y, z)x = ψ((y, z) x̂).
7.17. We use the notation from the proof of Theorem 7.41. We define a partial mapping ψ : (U_N × M_c)^{N^# × U_1} × N^# × U_1 → M, and show that it is a covering. The elements of (U_N × M_c)^{N^# × U_1} are written as pairs of mappings (f, g) with f : N^# × U_1 → U_N and g : N^# × U_1 → M_c. For (f, g, x, e) ∈ (U_N × M_c)^{N^# × U_1} × N^# × U_1 with x ∈ N^# and e ∈ U_1, we define
ψ(f, g, x, e) = x∗ if e = 1 and g(1, 1) = c,
ψ(f, g, x, e) = f(1, 1) ⋅ g(1, 1) ⋅ x∗ if e = 0 and f(1, 1) ∈ N.
For e = 1 and g(1, 1) ≠ c as well as for e = 0 and f(1, 1) = 1_U we have ψ undefined. The mapping ψ is surjective since M can be written as M = N ∪ N ⋅ M_c ⋅ N; this follows from the fact that a product w = a_1 ⋯ a_n with a_i ∈ A either does not contain the element c, or w can be factored directly before the first c and directly behind the last c (these two occurrences of c might possibly coincide). The first part N in the union N ∪ N ⋅ M_c ⋅ N is covered by e = 1, while the second part occurs as image of the elements (f, g, x, 0) with f(1, 1) ∈ N. If w = a_1 ⋯ a_n ∈ M and each a_i is covered by â_i, then â_1 ⋯ â_n is a cover of w. Therefore, it suffices to show that each generator a ∈ A has a cover. We let ĉ = (f_c, g_c, 1, 0), and for a ≠ c we let â = (f_a, g_a, a, 1), where
f_c(x, 1) = x∗ ∈ U_N, g_c(x, 1) = c ∈ M_c,
f_c(x, 0) = 1_U ∈ U_N, g_c(x, 0) = c x∗ c ∈ M_c,
f_a(x, e) = 1_U ∈ U_N, g_a(x, e) = c ∈ M_c
for x ∈ N^# and e ∈ U_1. Now one can easily verify that indeed covers are defined. For completeness, we present the corresponding calculations. Let u = (f, g, x, e) ∈ (U_N × M_c)^{N^# × U_1} × N^# × U_1 and a ∈ A \ {c}. We first consider the case e = 1 and g(1, 1) = c. Then
ψ(u ⋅ ĉ) = ψ(u ⋅ (f_c, g_c, 1, 0)) = ψ(f ∗ (x, 1)f_c, g ∗ (x, 1)g_c, 1, 0) = (f(1, 1) f_c(x, 1)) ⋅ (g(1, 1) ∘ g_c(x, 1)) ⋅ 1 = (f(1, 1) x∗) ⋅ (g(1, 1) ∘ c) = x∗ ⋅ g(1, 1) = x∗ ⋅ c = ψ(u) ⋅ c
Here, the product f(1, 1) x∗ is computed in U_N. Recall that the mapping (x, e)f_c : N^# × U_1 → U_N is defined by ((x, e)f_c)(y, k) = f_c(yx, ke), the mapping f ∗ f′ is defined by (f ∗ f′)(y, k) = f(y, k) f′(y, k), and g ∗ g′ results from (g ∗ g′)(y, k) = g(y, k) ∘ g′(y, k). Let again e = 1 and g(1, 1) = c. Then
ψ(u ⋅ â) = ψ(u ⋅ (f_a, g_a, a, 1)) = ψ(f ∗ (x, 1)f_a, g ∗ (x, 1)g_a, xa, 1) = (xa)∗ = x∗ a = ψ(u) ⋅ a
For the second to last equality note that xa ∈ {x∗ a, x∗ a}. For e = 0, we obtain the following computations:
ψ(u ⋅ ĉ) = ψ(f ∗ (x, 0)f_c, g ∗ (x, 0)g_c, 1, 0) = (f(1, 1) f_c(x, 0)) ⋅ (g(1, 1) ∘ g_c(x, 0)) ⋅ 1 = f(1, 1) ⋅ (g(1, 1) ∘ c x∗ c) = f(1, 1) ⋅ g(1, 1) ⋅ x∗ c = ψ(u) ⋅ c
and
ψ(u ⋅ â) = ψ(f ∗ (x, 0)f_a, g ∗ (x, 0)g_a, xa, 0) = (f(1, 1) f_a(x, 0)) ⋅ (g(1, 1) ∘ g_a(x, 0)) ⋅ (xa)∗ = f(1, 1) ⋅ g(1, 1) ⋅ x∗ a = ψ(u) ⋅ a
Thus, ψ is a covering.
7.18. We need to prove that for all x, y ∈ M the following equivalence holds.
(∃z : x ∼_L z ∼_R y) ⇔ (∃z : y ∼_L z ∼_R x)
By symmetry it is enough to show the implication from left to right. Hence, assume that there are x, y, z ∈ M such that x ∼_L z ∼_R y. Since x ∈ Mx = Mz, there exists t ∈ M with x = tz. There also exists s ∈ M with y = zs. Observe that x ∼_L z implies xw ∼_L zw and x ∼_R y implies wx ∼_R wy for all w ∈ M. Therefore, we know that tz ∼_R ty and xs ∼_L zs. From this we get
x = tz ∼_R ty = tzs = xs ∼_L zs = y
7.19. Consider any s, t ∈ S. Let e = t^{|S|!} be the idempotent power of t; see Exercise 1.6 (a). We have e^2 = e and hence (se)e = se. The consequence is [s][t]^ω ⊆ [se][e]^ω. But this is all we need to see the claim.
7.20. We first assume that φ recognizes the language L. Let α ∼_φ β and α ∈ L with α = u_1 u_2 ⋯, β = υ_1 υ_2 ⋯ and φ(u_i) = x_i = φ(υ_i). We want to show that β ∈ L. By Lemma 7.56 there exist t ∈ S and a sequence of indices 1 ⩽ i_1 < i_2 < ⋯ with x_{i_j+1} ⋯ x_{i_{j+1}} = t for all j ⩾ 1. In particular, for the words u′_j = u_{i_j+1} ⋯ u_{i_{j+1}} and υ′_j = υ_{i_j+1} ⋯ υ_{i_{j+1}}, we have φ(u′_j) = φ(υ′_j) = t. Let u′_0 = u_1 ⋯ u_{i_1} and υ′_0 = υ_1 ⋯ υ_{i_1}. Then φ(u′_0) = φ(υ′_0) = s for some s ∈ S and α = u′_0 u′_1 ⋯ as well as β = υ′_0 υ′_1 ⋯. From α ∈ [s][t]^ω ∩ L follows [s][t]^ω ⊆ L, and with β ∈ [s][t]^ω, we finally obtain β ∈ L. For the converse direction, consider α ∈ [s][t]^ω ∩ L and β ∈ [s][t]^ω. Then, the factorization resulting from α, β ∈ [s][t]^ω yields α ∼_φ β. Therefore, β ∈ L and [s][t]^ω ⊆ L.
7.21. We start with a Büchi automaton A = (Q, δ, {q0}, F) where F ⊆ Q is a set of final states and δ ⊆ Q × Σ × Q. We enlarge the class of automata by allowing automata of the form (Q, δ, I, F) where now I ⊆ Q is a set of initial states and δ ⊆ Q × Σ∗ × Q is finite. An infinite word α is accepted if and only if there exist a factorization α = u_1 u_2 ⋯ into finite words u_i and a run
q_0 →[u_1] q_1 →[u_2] ⋯ q_i →[u_{i+1}] q_{i+1} →[u_{i+2}] ⋯
which starts in q_0 ∈ I and which visits final states infinitely often.
Next, we shift the acceptance conditions to transitions. We define the set of final transitions T ⊆ δ by T = { (p, u, q) ∈ δ | q ∈ F }. In this model a run as above is accepting if and only if first, q_0 ∈ I and second, there are infinitely many i with (q_i, u_{i+1}, q_{i+1}) ∈ T. It is clear that every ω-regular language can still be accepted in the new model. Now, we reduce the class of automata again. We move again to the situation where I = {q0} and where moreover, there are no incoming transitions to q0. This is trivial. We add a new state q0 and we add nonfinal ε-transitions from q0 to all states in I. Next, we only allow labels on transitions which are either letters or the empty word ε. This is again standard: if we have (p, u, q) ∈ δ with |u| ⩾ 2 then we introduce new states and we split the transition into a path of length |u|. The new transitions are final if and only if (p, u, q) ∈ T. In the notation, we arrived at an automaton B = (Q, δ, {q0}, T) with δ ⊆ Q × (Σ ∪ {ε}) × Q and T ⊆ δ. Having such an automaton, we “flood” it with additional transitions. For each letter a ∈ Σ and each pair (p, q) ∈ Q × Q we check whether there is a path p →[a′] p′ →[a″] q which is labeled by a = a′a″. If (p, a, q) ∈ T, we do nothing. If (p, a, q) ∈ δ \ T, but p →[a′] p′ →[a″] q uses a final transition, then we augment T by replacing T with T ∪ {(p, a, q)}. If (p, a, q) ∉ δ, then we augment δ by replacing δ with δ ∪ {(p, a, q)}. This procedure is repeatedly applied until no changes in T or δ occur anymore. The procedure terminates after at most 2 ⋅ |Σ| ⋅ |Q|^2 steps. We constructed an automaton B′ = (Q, δ′, {q0}, T′) with δ ⊆ δ′ ⊆ Q × (Σ ∪ {ε}) × Q and T ⊆ T′ without changing the accepted language. The point is that we can throw away all ε-transitions in δ′ because we are never forced to use them. Thus, without restriction, all labels on transitions are letters. The new automaton is again denoted by B = (Q, δ, {q0}, T). The final step is to shift the acceptance conditions back to states. For this, we let Q′ be a disjoint copy of Q and we let Q̃ = Q ∪ Q′. For each (p, a, q) ∈ δ, we introduce an additional transition (p′, a, q) ∈ δ̃ where p′ ∈ Q′ is the copy of p. Moreover, for each final transition (p, a, q) ∈ T, we introduce an additional transition (p, a, q′) ∈ δ̃ where q′ ∈ Q′ is the copy of q. The new set of transitions is denoted by δ̃. This yields a Büchi automaton C = (Q̃, δ̃, {q0}, Q′). In this automaton all states from Q′ are final. There is only one initial state and this state does not have incoming transitions. Moreover, as a matter of fact there are no transitions between final states. If we compare the original language L(A) with L(C), then we see that they are equal.
Chapter 8
8.1. The idea is that we first compute w ⇒*_S ŵ ∈ IRR(S) and z ⇒*_S ẑ ∈ IRR(S) and then compare the irreducible words ŵ and ẑ letter by letter. It is therefore sufficient to give an algorithm which, on input w ∈ Σ∗, computes w ⇒*_S ŵ ∈ IRR(S) in time O(|w|). We choose δ > 0 with |ℓ| ⩾ (1 + δ)|r| for all (ℓ, r) ∈ S. This is possible because S is finite and length reducing. Then we assign to each word pair (u, υ) its weight γ(u, υ) = |u| + (1 + δ)|υ|. We start with the word pair (1, w) where 1 is the empty word. Let |w| = n. Then γ(1, w) = (1 + δ)|w| = (1 + δ)n and 1 ∈ IRR(S). We keep as an invariant that we only generate word pairs (u, υ) with u ∈ IRR(S) and uυ = w in M, that is, uυ ⇔*_S w. The goal is to get the word pair (ŵ, 1) after finitely many steps. If this is not yet attained, then υ = aυ′ for some a ∈ Σ and υ′ ∈ Σ∗. If now ua ∈ IRR(S), that is, if ua is irreducible, then we replace the pair (u, υ) in one time step by the pair (ua, υ′). Here the weight has been reduced by the constant δ. Now, let ua be reducible. Then necessarily ua = u′ℓ for some (ℓ, r) ∈ S. In this case, we take one step to rewrite the pair (u, υ) by the pair (u′, rυ′). For the weights we obtain γ(u′, rυ′) ⩽ γ(u′ℓ, υ′) = γ(ua, υ′) ⩽ γ(u, υ) − δ. So, in both cases the invariant is preserved. Therefore, the algorithm is correct and computes ŵ in time (1 + 1/δ)n ∈ O(n).
8.2. We may assume that the monoid is given as M = Γ∗/S where S ⊆ Γ∗ × Γ∗ is finite. On input u, υ, we have to check whether u ⇔*_S υ. For this we run two algorithms in parallel, both of which work in stages. In stage n, the first one computes the list of words w such that u ⇔^{⩽n}_S w. If, at some point, the first algorithm detects u ⇔^{⩽n}_S υ, then we stop with the positive answer u = υ in M. In stage n, the second algorithm computes the list of all homomorphisms h : M → N where N is a finite monoid (say, given by its multiplication table) such that |N| ⩽ n. The list is effectively computable because there are n^{|Γ|} mappings h from Γ to N if N is a finite monoid of size n; each such mapping defines a homomorphism h : Γ∗ → N; and h defines a homomorphism h : M → N if and only if h(ℓ) = h(r) for all (ℓ, r) ∈ S. Having constructed this list of homomorphisms, we check whether h(u) ≠ h(υ). If, at some point, the second algorithm detects h(u) ≠ h(υ), then we stop with the negative answer u ≠ υ in M. Since M is residually finite, one of the two algorithms will eventually stop with the correct answer.
8.3 (a) The system S+ terminates because it is length-lexicographically reducing for the order on Σ induced by I+. It is locally confluent because if cab ⇐_{S+} cba ⇒_{S+} bca, then cab ⇒_{S+} acb ⇒_{S+} abc ⇐_{S+} bac ⇐_{S+} bca.
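The confluence argument of Exercise 8.3 (a) can be explored by exhaustive search on short words. The following Python sketch computes every irreducible word reachable from a given word under a set of length-preserving two-letter rules. The concrete rule sets are illustrative assumptions: `S_PLUS` corresponds to the transitive orientation with rules ba → ab, cb → bc, ca → ac on Σ = {a, b, c}, and `S_CYCLIC` replaces ca → ac by ac → ca, which is the non-transitive situation discussed in Exercise 8.3 (b).

```python
from itertools import product

def normal_forms(word, rules):
    """Return the set of irreducible words reachable from `word`.

    rules maps two-letter left-hand sides to two-letter right-hand sides.
    All rules preserve length, so only finitely many words are reachable
    and the search with a visited set always terminates.
    """
    seen, stack, irreducible = {word}, [word], set()
    while stack:
        w = stack.pop()
        successors = {w[:i] + rules[w[i:i + 2]] + w[i + 2:]
                      for i in range(len(w) - 1) if w[i:i + 2] in rules}
        if not successors:
            irreducible.add(w)
        for s in successors - seen:
            seen.add(s)
            stack.append(s)
    return irreducible

# Transitive orientation (assumed example): every pair of letters commutes.
S_PLUS = {"ba": "ab", "cb": "bc", "ca": "ac"}
# Non-transitive orientation (assumed example): ba -> ab, cb -> bc, but ac -> ca.
S_CYCLIC = {"ba": "ab", "cb": "bc", "ac": "ca"}

# All words over {a, b, c} of length at most 5.
WORDS = ["".join(t) for n in range(6) for t in product("abc", repeat=n)]
```

With `S_PLUS`, the word cba has the single normal form abc, matching the rewriting chain shown above. With `S_CYCLIC`, the same word reaches the two distinct irreducible words bca and cab, so that system is not confluent.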
8.3 (b) For all pairs (a, b) ∈ I, we must have either ba → ab ∈ S or ab → ba ∈ S. Let I+ = { (a, b) ∈ Σ × Σ | ba → ab ∈ S }. Then first I = { (a, b), (b, a) | (a, b) ∈ I+ }. Suppose that I+ is not a transitive orientation. Then there are (a, b), (b, c) ∈ I+ with (a, c) ∉ I+. That means, S contains rules ba → ab and cb → bc but no rule ca → ac. For n ∈ ℕ consider the rewritings c^n a^n b^n ⇐*_{S+} c^n b^n a^n ⇒*_{S+} b^n c^n a^n. The factors a^n b^n and b^n c^n are irreducible, because if, for instance, there were another irreducible word in the class of a^n b^n,
then in this word a factor ba would have to appear. But this is not possible. Now, also c^n a^n is irreducible because either (a, c) ∉ I or ac → ca ∈ S. Since S is finite we necessarily have c^n a^n b^n, b^n c^n a^n ∈ IRR(S) for sufficiently large n, contradicting c^n a^n b^n = b^n c^n a^n ∈ M(Σ, I).
8.3 (c) Let M = M(Σ, I). By virtue of (8.2) we may assign a length to each element of M. Therefore, M is generated by the elements of length 1, and this generating set is minimal. We have Σ = (M \ {1}) \ (M \ {1})^2, and hence Σ is determined by M. The edge set I results from the elements of Σ which are different but commute in M.
8.3 (d) By Exercise 8.3 (c) we have Σ = {a, b, c} and I = {(a, c), (c, a)} with ac = ca, ab ≠ ba and bc ≠ cb in M. Then abc and cab are transposed, and also cab = acb and cba are transposed. But there is no transposition of abc which directly leads to cba.
8.3 (e) All graphs with four vertices have a transitive orientation but the cycle C_5 with 5 vertices and 5 edges does not. Hence, the answer is 5.
8.4 (a) We choose a linear order for Σ. Then the system is almost identical to the system of equation (8.3) or equation (8.4), respectively. In addition, we just have to delete all squares of letters.
S_RACG = { bua → abu | a, b ∈ Σ, a < b, (a, bu) ∈ I } ∪ { aua → u | a ∈ Σ, (a, u) ∈ I }
8.4 (b) Let Ṽ be a disjoint copy of V and Σ = V ∪ Ṽ. For (a, b) ∈ E, we put (a, b), (ã, b), (a, b̃) and (ã, b̃) into I, but no other pairs. In particular, a and ã do not commute in C. We now consider the homomorphism φ : G → C, induced by a ↦ aã. (Recall that (a, b) ∈ E implies aãbb̃ = bb̃aã in C.) It remains to show that φ is injective. For this purpose, we first choose a linear order for Σ, in which the elements a and ã are disposed whenever they are adjacent to each other. Because V ⊆ Σ this also induces a linear order on V. Now, let 1 ≠ g ∈ G and let w ∈ V∗ be a length-lexicographically shortest word representing g. Then φ(w) is an irreducible normal form for the system S_RACG from the solution of Exercise 8.4 (a), because the factors aã and ãa prevent identical elements from being next to each other.
8.4 (c) The convergence of T is pure routine. We only show that φ : Σ → ℱ, a ↦ {a}, induces an isomorphism φ : C(Σ, I) → ℱ∗/T. For (a, b) ∈ I, we obtain φ(a)φ(b) = {a, b} in ℱ∗/T. Furthermore φ(a^2) = ∅ = 1 in ℱ∗/T for a ∈ Σ. Since φ(Σ) generates the group ℱ∗/T, the map φ is surjective. Now, consider the map ψ : ℱ → C(Σ, I), F ↦ ∏_{a∈F} a. Since F ∈ ℱ is a clique, it does not matter in which order we evaluate the product ∏_{a∈F} a ∈ C(Σ, I). Because of the rules in T the map ψ induces a homomorphism from ℱ∗/T onto C(Σ, I). Since ψ(φ(a)) = a for all a ∈ Σ, the homomorphism φ is injective.
8.5. Closure under homomorphisms is straightforward from the definition. For the nonclosure under inverse homomorphisms, consider the homomorphism h :
{a, b}∗ → ℤ where h(a) = 1 and h(b) = −1. Then we have a∗b∗ ∩ h^{−1}(0) = { a^n b^n | n ∈ ℕ }. As a finite set, {0} is rational, and a∗b∗ is rational, too, but { a^n b^n | n ∈ ℕ } is not.
8.6. In M, we consider the intersection R = (ac)∗b∗ ∩ a∗(bc)∗ of two rational subsets. It is the set of all elements w ∈ M which have the same number of a's, b's, and c's. Assume that R is rational and consider the projection π : M → {a, b}∗ which deletes all c's. By Exercise 8.5 we see that π(R) is rational (i.e., regular) in {a, b}∗. But this is a contradiction, because π(R) = { a^n b^n | n ∈ ℕ } is not regular.
8.7. Let G_1 and G_2 be two finitely generated subgroups of a free group F. Without loss of generality, let F = F(Σ) be a finitely generated free group. By Section 8.9, the sets of the reduced words in (Σ ∪ Σ^{−1})∗ which represent the elements of G_1 and G_2 are in each case regular. The intersection of regular languages is regular, and this provides a regular set for the intersection G_1 ∩ G_2.
8.8. Let K be the kernel of the projection F_2 → ℤ, a ↦ 1, b ↦ 0. The Schreier graph of K has as vertex set { Ka^n | n ∈ ℤ }, and each vertex Ka^n has two outgoing edges, labeled by a (to vertex Ka^{n+1}) and a^{−1} (to vertex Ka^{n−1}), as well as two loops with labels b and b^{−1}, respectively. The edges labeled with a or a^{−1} form a spanning tree. Let ∆ be the set of edges labeled with b. By Theorem 8.25, the kernel K is isomorphic to the free group F(∆). A corresponding isomorphism is defined in the proof of that theorem. This exactly yields the set U as the image of ∆.
8.9. Let G be a group generated by k elements and φ : G → G a surjective homomorphism. Suppose that φ is not injective. Then there exists an element 1 ≠ g ∈ G with φ(g) = 1. Since G is residually finite there exists a surjective homomorphism π : G → E onto a finite group E with π(g) ≠ 1. Now, for n ∈ ℕ consider the homomorphism π_n : G → E, given by π_n(h) = π(φ^n(h)). Since φ is surjective, we find h_n ∈ G with φ^n(h_n) = g.
Therefore, π_n(h_n) = π(g) ≠ 1 but π_{n+1}(h_n) = π(φ(g)) = π(1) = 1. Hence, π_m(h_n) = 1 for all m > n; in particular π_m ≠ π_n for all m ≠ n. A homomorphism from G to E is determined by the images of the k generators. Hence, there are at most |E|^k homomorphisms. This is a contradiction, and therefore φ must be injective.
8.10. We show that BS(p, q) is not Hopfian if p and q do not have the same set of prime divisors. Let r be a prime number which divides p but not q. In the reverse case use the isomorphism BS(p, q) → BS(q, p) given by a ↦ a and t ↦ t^{−1}. We consider the element [a, ta^{p/r}t^{−1}]. Here [x, y] = xyx^{−1}y^{−1} is the commutator of x and y; [x, y] = 1 if and only if xy = yx. The word [a, ta^{p/r}t^{−1}] = ata^{p/r}t^{−1}a^{−1}ta^{−p/r}t^{−1} is Britton reduced, so in particular [a, ta^{p/r}t^{−1}] ≠ 1. But it is contained in the kernel of the homomorphism φ : BS(p, q) → BS(p, q) with t ↦ t and a ↦ a^r. The homomorphism φ therefore is not injective. But φ is surjective because r and q are coprime, and therefore a can be represented as a product of the elements φ(a) = a^r and φ(ta^{p/r}t^{−1}) = a^q. That means, BS(p, q) is not Hopfian under the given conditions on p and q.
8.11. In the group BG(1, 2) = BS(1, 2) ∗ F(b)/{bab^{−1} = t}, we have ba^e b^{−1} = t^e and t^e a t^{−e} = a^{2^e}. If we define A(0) = a and A(n + 1) = bA(n)b^{−1} a bA(n)^{−1}b^{−1}, then A(n) has only exponential length, but the normal form of A(n) has the form a^{τ(n)}.
8.12. The center is a normal subgroup in G and since G is not commutative, the quotient group G/Z(G) is not trivial. If the index of Z(G) in G is less than 4, then G/Z(G) is cyclic. Assume by contradiction that this is possible. Then there exists some a ∈ G such that all g, h ∈ G can be written as products g = a^k b and h = a^ℓ c, where k, ℓ ∈ ℕ and b, c ∈ Z(G). This is a contradiction because
gh = a^k b a^ℓ c = a^{k+ℓ} bc = a^{ℓ+k} cb = hg
Now, let us derive an upper bound for the probability that a pair (g, h) ∈ G × G satisfies gh = hg. For g ∈ G let C(g) = { h ∈ G | gh = hg } be the centralizer of g. It is a subgroup, and for g ∉ Z(G), we obtain |C(g)| ⩽ |G|/2. As we have just seen, the probability for a randomly chosen g to be in the center Z(G) is at most 1/4. In that case, every h commutes with g, but if g ∉ Z(G), then at most half of the h commute with g. Since at least 3/4 of the elements are not in the center, we conclude that the desired upper bound is given by 1/4 + 3/4 ⋅ 1/2 = 5/8. Finally, to see that the bound is sharp consider the dihedral group D with 8 elements: the center has index 4 and the centralizer of each element outside the center has index 2. Note that there are infinitely many groups where the bound 5/8 is sharp: if H is any finite Abelian group, then the bound is sharp for D × H, too.
8.13. Let x, y ∈ F with xy = yx. We may assume that x and y are reduced. If xy is also reduced, then this holds also for yx and the statement follows from Theorem 6.5 (b).
Without loss of generality, let |x| > |y|. Then we may write x = x′s and y = s^{−1}y′ for a maximal suffix s of x, which is not empty. For y = s^{−1}, we have xy = x′ = yx = yx′y^{−1}, that is, x′y = yx′, and the statement follows by induction on |x|. Therefore, without loss of generality, we may assume that y′ is not the empty word. Conjugation with s gives sxs^{−1} = sx′ and sys^{−1} = y′s^{−1}, and these elements commute. If sx′ is not reduced then we may apply induction on |xy|, and the statement follows. Now, let sx′ be reduced. But then the word sx′y′s^{−1} is reduced because of the maximality of |s| and the symmetry in x and y, and we may apply Theorem 6.5 (b) again.
8.14 (a) We have ρ_ba = i_b ∘ λ_ab ∘ i_b. Therefore, it is enough to show that every Whitehead automorphism can be expressed by automorphisms of the form i_a, λ_ab and ρ_ba. First, consider the permutation π_ab of Σ, which interchanges the two letters a, b with a ≠ b and leaves all other letters fixed. We have π_ab = λ_ba^{−1} ∘ i_a ∘ ρ_ba ∘ λ_ba because ρ_ba ∘ λ_ba(a) = b and λ_ba^{−1} ∘ i_a(b) = b as well as ρ_ba ∘ λ_ba(b) = ba^{−1} and λ_ba^{−1} ∘ i_a(ba^{−1}) = a. Hence, all transpositions can be expressed as desired, and therefore all permutations
of Σ because the complete permutation group of Σ is generated by the transpositions. It remains to show that the Whitehead automorphisms of the form W_(a,L,R,M) can be expressed as desired. But this is possible because W_(a,L,R,M) = ∏_{b∈L∪M} λ_ab ⋅ ∏_{c∈R∪M} ρ_ca.
8.14 (b) We may assume that |Σ| ⩾ 2. The permutation group of Σ is generated by a transposition and a cyclic permutation of length |Σ|. If |Σ| = 2, we only need the transposition. Moreover, for a single pair (a, b) with a, b ∈ Σ, the automorphisms i_a and λ_ab are sufficient for generating all regular and elementary Nielsen transformations. Finally, the statement follows from Exercise 8.14 (a).
8.15. The solution can be obtained by induction on ℓ by showing min{a_1, a_3}, min{a_2, a_4} ⩽ F_ℓ and max{a_1, a_3}, max{a_2, a_4} ⩽ F_{ℓ+1}.
8.16 (a) Let A ∈ PSL(2, ℤ) be an element of order 2. Let A(z) = (az + b)/(cz + d). In matrix representation we have
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad A^2 = \begin{pmatrix} a^2 + bc & b(a + d) \\ c(a + d) & d^2 + bc \end{pmatrix}
If both b and c were 0, then we would have a = d = ±1 and hence A(z) = z for all z, in contradiction to the fact that A has order 2. So b ≠ 0 or c ≠ 0, and a + d = 0 follows.
8.16 (b) Let S(z) = −1/z and R(z) = −1/(z + 1). Suppose that, in PSL(2, ℤ), the matrix A is not conjugate to S. Since R has order 3, possibly after a suitable conjugation, we may assume that A = R^{i_1} S ⋯ R^{i_{m−1}} S R^{i_m}, m ⩾ 2, with 1 ⩽ i_ℓ ⩽ 2 for 1 ⩽ ℓ ⩽ m and i_1 = i_m. But, as in the proof of Theorem 8.38, this matrix A has infinite order, which is a contradiction. Therefore, in PSL(2, ℤ), the matrix A is conjugate to S.
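The trace condition of Exercise 8.16 (a) and the orders of S and R can be confirmed numerically. The following Python sketch works with integer 2×2 matrices of determinant 1 and treats A and −A as the same element of PSL(2, ℤ). The search range for the entries and all helper names are assumptions made for illustration.

```python
from itertools import product

def mul(A, B):
    """Product of two 2x2 integer matrices given as nested tuples."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

IDENTITY = ((1, 0), (0, 1))
NEG_IDENTITY = ((-1, 0), (0, -1))

def is_trivial_in_psl(A):
    """A represents the neutral element of PSL(2, Z) iff A is I or -I."""
    return A in (IDENTITY, NEG_IDENTITY)

S = ((0, -1), (1, 0))  # S(z) = -1/z
R = ((0, -1), (1, 1))  # R(z) = -1/(z + 1)

# All determinant-1 matrices with entries in a small (arbitrarily chosen) range.
MATRICES = [((a, b), (c, d))
            for a, b, c, d in product(range(-4, 5), repeat=4)
            if a * d - b * c == 1]

# Elements of order 2 in PSL(2, Z): nontrivial, but with trivial square.
INVOLUTIONS = [A for A in MATRICES
               if not is_trivial_in_psl(A) and is_trivial_in_psl(mul(A, A))]
```

Every matrix in `INVOLUTIONS` has trace a + d = 0, as shown in the solution, and conversely each trace-zero matrix in the range squares to −I.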
8.17 (a) Let $-1$ be a quadratic residue modulo $n$. Then there are $p, q \in \mathbb{Z}$ with $-q^2 - pn = 1$. Define $A \in \mathrm{PSL}(2, \mathbb{Z})$ with $A(z) = \frac{qz+n}{pz-q}$. An easy calculation shows that $A$ has order 2. From Exercise 8.16 (b), we know that, in $\mathrm{PSL}(2, \mathbb{Z})$, the matrix $A$ is conjugate to $S$. This means that there is a matrix $X \in \mathrm{PSL}(2, \mathbb{Z})$ with $X(z) = \frac{xz+y}{uz+\upsilon}$ and $XSX^{-1} = A$. Hence, we obtain
$$XSX^{-1}(z) = \frac{(-\upsilon y - ux)z + x^2 + y^2}{(-\upsilon^2 - u^2)z + \upsilon y + ux} = \frac{qz + n}{pz - q}.$$
Since $n \in \mathbb{N}$, we have $n = x^2 + y^2$.
8.17 (b) Since $\gcd(x, y) = 1$, there exist $u, \upsilon \in \mathbb{Z}$ with $x\upsilon - yu = 1$. We construct $X \in \mathrm{PSL}(2, \mathbb{Z})$ with $X(z) = \frac{xz+y}{uz+\upsilon}$. It follows that $XSX^{-1}(z) = \frac{qz + x^2 + y^2}{pz - q} = \frac{qz+n}{pz-q}$ for $p = -\upsilon^2 - u^2$ and $q = -\upsilon y - ux$. From $-q^2 - pn = 1$, we obtain $-1 \equiv q^2 \bmod n$.
8.17 (c) The case $p = 2$ is trivial. Now, let $p \geq 3$. For $p = x^2 + y^2$, we automatically have $\gcd(x, y) = 1$. Now, Exercises 8.17 (a) and 8.17 (b) show that $p$ is a sum of two squares if and only if $-1$ is a square modulo $p$; and $-1$ is a square modulo $p$ if and only if $(\mathbb{Z}/p\mathbb{Z})^*$ has an element of order 4. The multiplicative group $(\mathbb{Z}/p\mathbb{Z})^*$ is cyclic and has order $p - 1$. It contains an element of order 4 if and only if $p \equiv 1 \bmod 4$.
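The identities in Exercise 8.17 can be tried out on small numbers. The sketch below is an illustration under stated assumptions rather than part of the solution: the function names and the sample values x = 2, y = 1, u = 1, υ = 1 are hypothetical, and the check over primes below 200 is a brute-force test, not a proof.

```python
from math import isqrt

def conjugate_of_S(x, y, u, v):
    """For X(z) = (xz+y)/(uz+v) with xv - yu = 1, return (q, n, p) such that
    X S X^{-1}(z) = (qz + n)/(pz - q), where S(z) = -1/z."""
    if x * v - y * u != 1:
        raise ValueError("X must have determinant 1")
    return (-v * y - u * x, x * x + y * y, -v * v - u * u)

def is_prime(m):
    """Trial division; adequate for the small range used here."""
    return m >= 2 and all(m % d for d in range(2, isqrt(m) + 1))

def is_sum_of_two_squares(m):
    """Brute-force search for integers x, y >= 0 with x^2 + y^2 = m."""
    for x in range(isqrt(m) + 1):
        y = isqrt(m - x * x)
        if x * x + y * y == m:
            return True
    return False

# 8.17 (b) for n = 5 = 2^2 + 1^2 with the sample choice u = v = 1.
q, n, p = conjugate_of_S(2, 1, 1, 1)
assert -q * q - p * n == 1      # determinant condition for A
assert (q * q + 1) % n == 0     # -1 is congruent to q^2 modulo n

# 8.17 (c): an odd prime is a sum of two squares iff it is 1 modulo 4.
odd_primes = [m for m in range(3, 200) if is_prime(m)]
assert all(is_sum_of_two_squares(m) == (m % 4 == 1) for m in odd_primes)
```

For the sample values, `conjugate_of_S` returns q = −3, n = 5 and p = −2, so that −q² − pn = −9 + 10 = 1 and q² = 9 ≡ −1 mod 5.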
Index A Abel, N. H. 2, 3, 15 Abelian 3 accepted language 211 accepted set 176, 181 accepting run 181 addition chain 87 addition law on elliptic curve 137 addition theorem 19 Adleman, L. V, 60 Advanced Encryption Standard 32 AES see Advanced Encryption Standard Agrawal, M. VI, 121 AKS primality test 121 algebraic closure 35 algebraic number 1 algebraically closed field 35 algorithm – AKS 121 – baby-step giant-step 104 – binary gcd 87, 117, 297 – Brzozowski’s minimization 232 – Cipolla 98 – Dehn 247 – division 109 – elliptic curve integer factorization 148 – Euclidean 20 – extended Euclidean 21 – fast exponentiation 86 – fast Fourier transform 111 – fast modular exponentiation 86 – index calculus 107 – Karatsuba 108 – marking 178 – Moore’s minimization 178 – Pohlig–Hellman 106 – Pollard’s p − 1 100 – Pollard’s ρ (factorization) 100 – Pollard’s ρ (logarithm) 105 – probabilistic primality test 88 – quadratic sieve 101 – Schönhage–Strassen 113 – Tonelli 97 Alice VIII, 51 almost all 26 alphabet 161, 162, 237
alternating group 15 amalgamated product 253 antichain 165 aperiodic monoid 189 arc 264 Arnold, A. 217, 218 Artin, E. 249 associativity 3 – elliptic curves 137, 143 asymptotic estimate 83 Atkin, A. O. L. 152 attack – brute-force 52 – chosen-ciphertext 51 – chosen-plaintext 51 – ciphertext-only 51 – known-plaintext 51 – man-in-the-middle 59 – Shamir 74 – Wiener 60 augmented monoid 203 automata minimization 178 – Brzozowski 232 – Moore 178 automaton 176 – Büchi 211 – complement 232 – complete 176, 182 – deterministic 176, 182 – deterministic Büchi 211 – equivalent 182 – finite 181, 184 – minimal 177 – minimization 178 – nondeterministic 181 – product 232 – run 181, 184, 211 – spelling 182, 188 – subset 187 – trim 231 – with rational labels 184 automorphism 4 – Frobenius 20, 154 – graph 11 – inner 252 – Whitehead 268
automorphism group 11 axiom of choice 35 B baby-step giant-step algorithm 104 bad sequence 165 basis of a free group 243 Bass, H. 258 Baumslag, G. 245 Baumslag group 258 Baumslag–Gersten group 259 Baumslag–Solitar group 245 Benois, M. 260, 262 Benois’ theorem 262 Bézout, É. 21 Bézout’s lemma 21 big-O see O-notation binary gcd 87, 117, 297 binomial coefficient 19 binomial expansion see binomial theorem binomial theorem 19 Birkhoff, G. 2 birthday paradox 85 birthday paradox algorithm see ρ algorithm Bob VIII, 51 Bombelli, R. 1 Boone, W. W. 236 Brandt monoid 208 bridge 270 Britton, J. L. 258 Britton reduction 258 brute-force attack 52 Brzozowski, J. A. 233 Brzozowski’s minimization algorithm 232 B-smooth number 102, 107 Büchi automaton 211 – deterministic 211 – run 211 Büchi, J. R. VII, 210, 214, 218, 219, 221 Büchi’s theorem 219, 221 C Caesar, G. J. 52 Caesar’s cipher 52 cancellative monoid 250 Cardano, G. 1 Carmichael number 88 Carmichael, R. D. 88 Cauchy, A. L. 7
Cauchy’s theorem 7 center of a group 278, 283 centralizer 322 chain 165 channel encoding 50 characteristic 18 Chinese remainder theorem 24 Chomsky, A. N. 173 Church, A. 238 Church–Rosser property 239 cipher 50 – Caesar 52 – monoalphabetic 52 – polyalphabetic 53 – shift 52 – Vigenère 54 ciphertext 50 Cipolla, M. 98 Cipolla’s algorithm 98 closed formula see sentence coarsening 167 coefficient 26 – leading 26 coincidence index 55 collision resistant function 65 commutative 3 commutative ring 15 comparable elements 165 complement automaton 232 complete automaton 176, 182 complete rewriting system 239 complex product 249 compression see data compression compression function 65 computational security 51 concatenation 161, 237 concurrency 248 confluent 239 – locally 239 – strongly 239 congruence 175, 241 – syntactic 175, 217 congruent 4, 16 conjugacy problem 236 conjugate element 162 constant polynomial 26 construction – subset 187 – Thompson 182
content see alphabet convergent rewriting system 239 convolution product 110 coprime 4, 20, 30 coset 5 – left 5 – right 5 cover 233 covering 233 Coxeter group 249 – right-angled 251 Coxeter, H. 249 critical pair 242 – factor 242 – overlap 242 cryptanalysis 49 cryptographic system 50 cryptography 49 cryptosystem 50 – asymmetric 58 – Caesar’s cipher 52 – Diffie–Hellman 59, 63 – ElGamal 59, 64 – elliptic Diffie–Hellman 145 – elliptic ElGamal 146 – key exchange 63 – Merkle–Hellman 59, 73 – monoalphabetic cipher 52 – multi-prime RSA 79 – polyalphabetic cipher 53 – public-key 59 – Rabin 59, 61 – RSA 59, 60, 94 – shift cipher 52 – symmetric 50 – Vernam one-time pad 57 – Vigenère cipher 54 curve 133 – elliptic 131, 134, 144 – pseudo- 146 cyclic group 6 cyclic semigroup 232 cyclically reduced word 267 D D-class 206 Damgård, I. B. 66 data compression 50, 55 Davis, M. 228
De Morgan, A. 189 decoding function 50 decomposition – Krohn–Rhodes 193, 194 decreasing sequence 165 decryption function 50 defining relation 241 degree – divisor 142 – endomorphism 156 – polynomial 26, 139 degree formula 27 Dehn function 259 Dehn, M. 236 Dehn’s algorithm 247 del Ferro, S. 1 dependence alphabet 250 derivative 17, 28 Descartes, R. 45 Descartes’ rule of signs 45 deterministic automaton 176, 182 deterministic Büchi automaton 211 deterministic graphical realization 275 deterministic language 211 DFA see deterministic automaton DFT see discrete Fourier transform Dickson, L. E. 167 Dickson’s lemma 167 differentiation rules 28 Diffie, B. W. V, 58, 63, 145 Diffie–Hellman key exchange 59, 63 – on elliptic curves 145 Diffie–Hellman problem 63, 146 digital commitment 71 digital signature 67 Digital Signature Algorithm 67 dihedral group 11 direct product 24 directed edge 263 discrete Fourier transform 109, 111 discrete logarithm 103, 104 – baby-step giant-step algorithm 104 – index calculus 107 – Pohlig–Hellman algorithm 106 – Pollard’s ρ algorithm 105 distributivity 15 divide 4 division 109 – polynomial 29
divisor 141, 194 – greatest common 4 – group 194 – local 190, 202 – principal 142 – zero 18 domain 18 – integral 18 DSA see Digital Signature Algorithm Dyck language 244 Dyck, W. F. A., Ritter von 244 E ECC 158 edge – arc 264 – bridge 270 – directed 263 – source 263 – target 263 egg box diagram 208 Eilenberg, S. 2 Eisenstein, F. G. M. 45 Eisenstein’s criterion 45 element – comparable 165 – conjugate 162 – generating 6 – idempotent 3, 42 – incomparable 165 – inverse 3, 4 – invertible 200 – left inverse 41 – left neutral 42 – neutral 3, 4 – period 205 – regular 42 – right inverse 41 – right neutral 42 – square 38, 96 – threshold 205 – transposed 162 – unit 15, 200 – zero 3 elementary arithmetic operation 112 ElGamal cryptosystem 59, 64 – on elliptic curves 146 ElGamal signature 79 ElGamal, T. 64
Elgot, C. C. 221 elimination – of ε-transitions 183 – of states 184 elimination of ε-transitions 183 elliptic curve 131, 132, 134, 144 – group operation 137 elliptic curve integer factorization 148 elliptic function 132 empty word 161, 237 encoding function 50 encryption function 50 endomorphism 155 – Frobenius 155 – separable 156 ε-NFA 188 equivalent automata 182 ε-transition 183 Euclid of Alexandria 1, 20 Euclidean algorithm 20 – binary 87 – binary extended 117, 297 – extended 21 Euler, L. 1, 24, 39, 47 Eulerian pseudoprime 95 Euler’s criterion 39 Euler’s number 1 Euler’s theorem 24 Euler’s totient function 24 evaluation homomorphism 26, 162 even permutation 15 exponentiation – fast 86 – fast modular 86 expression – rational 181 – regular see rational expression – star-free 190 extended Euclidean algorithm 21 – binary 117, 297 F φ function see totient function factor 161 factor critical pair 242 factor group see quotient group factor ring see quotient ring
fast exponentiation 86 – modular 86 fast Fourier transform 111 fast powering see fast exponentiation Fermat number 117 Fermat, P. de 20, 88, 134, 284 Fermat primality test 88 Fermat’s Last Theorem 134 Fermat’s little theorem 20 Fermat’s theorem on sums of two squares 284 Ferrari, L. 1 FFT see fast Fourier transform Fibonacci number 283 field 3, 15, 33 – algebraically closed 35 – finite 35 – function 153 – Galois 33 – prime 18 – skew 15 – splitting 34 – sub- 16 – used in AES 32 field extension 34 final state 176, 181 final transition 318 Fine, N. J. 164 finite automaton 181 finite field – used in AES 32 finitely generated 16 finitely presented group 244 finitely presented monoid 243 first-order logic 223 first-order variable 220 flip–flop 2, 194 formal language 173 formal power series 26 – derivative 28 Fourier, J. B. J. 109 Frederick the Great 49, 293 free group 243 – basis 243 – rank 243, 262 free monoid 161, 238 free partially commutative group 249 free partially commutative monoid 248 free product 245 free variable 224
frequency analysis 53 Frobenius automorphism 154 Frobenius endomorphism 155 Frobenius, F. G. 20, 154, 160 Frobenius homomorphism 20 Frobenius morphism 154 function – collision resistant 65 – compression 65 – decoding 50 – decryption 50 – Dehn 259 – elliptic 132 – encoding 50 – encryption 50 – hash 65 – one-way 65 – rational 153 – totient 24 – tower 283 – transition 176 function field 153 Fürer, M. 113 G Galois field 33 Galois, É. 2, 4, 15 Gauss, J. C. F. 131 gcd see greatest common divisor; Euclidean algorithm generalized Baumslag–Solitar group 246 generate 4 generated ideal 16 generator 6 Germain, S. 63 Gersten, S. M. 259 global solution 51 Gödel, K. P. 121, 223, 227 Gödel’s incompleteness theorem 223, 227 Goldwasser, S. 151, 152, 160 Goldwasser–Kilian primality certificates 150 good sequence 165 graph – rose 271, 277 – Schreier 263 – Stallings 262 graph group 249 graphical realization 269, 270 – deterministic 275
greatest common divisor 4, 20, 30 Green, J. A. VI, 206 Green’s lemma 207 Green’s relations 206 – D 206 – H 206 – J 206 – L 206 – R 206 Gromov, M. L. 247 Grothendieck, A. VII, 158 group 3, 4 – alternating 15 – automorphism 11 – Baumslag 258 – Baumslag–Gersten 259 – Baumslag–Solitar 245 – center 278, 283 – Coxeter 249 – cyclic 6 – dihedral 11 – finitely presented 244 – free 243 – free partially commutative 249 – generalized Baumslag–Solitar 246 – graph 249 – Hopfian 237 – hyperbolic 247 – Klein four- 12 – modular 278 – multiplicative 15 – Picard 143 – quotient 8 – residually finite 237 – right-angled Artin 249 – right-angled Coxeter 251 – Schützenberger 207, 208 – simple 194 – small cancellation 247 – special linear 237, 277 – symmetric 13 – symmetry 11 – Waack 246 group divisor 194 group homomorphism 43 group of units 15 group operation on elliptic curve 137 group-based cryptography 59
group-free monoid see aperiodic monoid Gustafson, W. H. 283 H H-class 206 halting problem 243 Hartshorne, R. 158 hash function 65 Hasse bound 135 Hasse, H. 135, 152, 158 Hasse’s theorem 135 Hellman, M. E. V, 58, 63, 73, 106, 145 Hermite, Ch. 1 Higman, G. VI, 167, 256 Higman’s lemma 167 Hilbert, D. 2, 32, 227 Hilbert’s basis theorem 32 Hilbert’s tenth problem 227 HNN extension 256 homomorphism 4 – evaluation 162 – Frobenius 20 – graphical realization 269, 270 – group 43 – partial 195 – projection 269 – ring 16 – syntactic 217 homomorphism theorem – groups 9 – rings 17 Hopf, H. 237 Hopfian group 237 hyperbolic group 247 I ideal 16, 205 – finitely generated 16 – generated 16 – left 205 – maximal 17 – principal 16, 205 – right 205 – two-sided 205 idempotent element 3, 42 identity 3 image 8, 17 incomparable elements 165 independence 248
independence relation 248 index – stabilization 232 index calculus 107 index of a subgroup 6 infinite word 210 infix see factor initial state 176, 181 inner automorphism 252 integer factorization 99 – elliptic curves 148 – Pollard’s p − 1 algorithm 100 – Pollard’s ρ algorithm 100 – quadratic sieve 101 integer multiplication – Karatsuba 108 – Schönhage–Strassen 113 integral domain 18 interpolation 70 introspective number 127 inverse element 3, 4 inversion 13 invertible element 200 irreducible normal form 241 irreducible polynomial 31 isomorphism 4 – of rings 17 isomorphism problem 236 J J-class 206 Jacobi, C. G. J. 39 Jacobi symbol 39 K Karatsuba, A. A. 108, 113 Karatsuba’s algorithm 108 Kayal, N. VI, 121 Keller, R. M. 248 Kerckhoffs, A. 51 Kerckhoffs’ principle 51 kernel 8, 17 kernel of the Schreier graph 264 key 50 – private 58 – public 58 – secret 58 key exchange 63, 145 Kilian, J. J. 151, 152, 160
Kleene, S. C. 173, 186, 187 Kleene star 173 Kleene’s theorem 187 Klein, F. Chr. 12 Klein four-group 12 knapsack problem 73 Knödel number 88 Knödel, W. 88 Koblitz, N. 158 Krohn, K. B. VI, 2, 193, 194 Krohn–Rhodes decomposition 193, 194 Krohn–Rhodes theorem 194 Kronecker delta 55 Kronecker, L. 55 Kruskal, J. B. 168, 169 Kruskal’s tree theorem 169 L L-class 206 label of a transition 181 Lagrange interpolation 70 Lagrange, J.-J. de 6 Lagrange polynomial 70 Lagrange’s theorem 6 Lallement, G. 193 Landau, E. G. H. 83 language 173, 186, 210 – accepted 211 – at a state 178 – deterministic 211 – Dyck 244 – formal 173 – mirror 233 – ω- 210 – ω-rational 214 – regular 186 – reverse 233 – star-free 190 Laocoön 52 law of quadratic reciprocity 40 lcm see least common multiple leading coefficient 26 least common multiple 23, 123 left-conjugate see conjugate left coset 5 left ideal 205 left inverse 41 left neutral 42 Legendre, A.-M. 40
Legendre symbol 40 Lehmer, D. H. 118 lemma – Bézout 21 – Dickson 167 – Fine–Wilf 164 – Green 207 – Higman 167 – Levi 170 – periodicity 164 – Zolotarev 39 length of a word 161, 237 length-lexicographic 237 Lenstra, H., Jr. 148, 149 letter 161, 237 Levi, F. W. D. 170 Levi’s lemma 170 lexicographic order 170, 237 Lindemann, C. L. F. von 1 line 135 linear Diophantine equation 118 linear Diophantine system 228 linear set 224 Liouville, J. 1 local divisor 190, 202 local solution 51 local submonoid 190 locally confluent 239 logic – first-order 223 – monadic second-order 210, 219 Lucas, É 117, 118 Lucas primality test 117 Lucas–Lehmer primality test 118 Lyndon factorization 171 Lyndon, R. C. VII, 163, 171, 267 Lyndon word 171 Lyndon–Schützenberger theorem 163 M MacLane, S. 2 man-in-the-middle attack 59 marking algorithm 178 master theorem 84 master theorem II 117 Matiyasevich, Yu. 227 M-automaton see automaton maximal ideal 17 Mazurkiewicz, A. 248
McKay, J. H. 7 McKnight, D. J., Jr. 186 McKnight’s theorem 186 median 297 Merkle, R. C. 66, 73 Merkle–Damgård construction 66 Merkle–Hellman cryptosystem 59, 73 Mersenne, M. 117 Mersenne number 117 Mezei, J. 232 Mezei’s theorem 232 Miller, G. L. 90 Miller–Rabin primality test 90 minimal automaton 177 minimal polynomial 46 mirror language 233 modular arithmetic 20 modular group 278 modulo 4, 8, 16 modulus 4, 20 monadic second-order logic 210, 219 monadic semi-Thue system 260 monoalphabetic cipher 52 monoalphabetic substitution 52 Monod, J. 135 monoid 3 – aperiodic 189 – augmented 203 – Brandt 208 – cancellative 250 – defining relations 241 – divisor 194 – finitely presented 243 – flip–flop 194 – free 161, 238 – free partially commutative 248 – group-free see aperiodic – presentation 238 – residually finite 237 – syntactic 175 – trace 248 – transition 180 Moore, E. F. 178 Moore’s minimization algorithm 178 morphism – Frobenius 154 – rational 154 MSO see monadic second-order logic
multiple 4 – least common 23, 123 multiple exponent attack 79 multiple root 30 multiplication table 241 multiplicative group 15 multiplicity of a root 30, 141 multi-prime RSA 79 Myhill, J. R. 180 Myhill–Nerode theorem 180 N Nair, M. 123 Nash-Williams, C. St. J. A. 167, 170 natural numbers 3 Nerode, A. 180 Neukirch, J. VII Neumann, B. H. 256 Neumann, H. 256 neutral element 3, 4 Newton, I., Sir 109 Newton method 109 NFA 188 – ε- 188 Nielsen, J. 262, 266, 268 Nielsen transformation 268, 283 Nielsen–Schreier theorem 266 Noether, A. E. 2, 32 Noetherian rewriting system 239 Noetherian ring 32 nondecreasing sequence 165 nondeterministic automaton 181 norm 139 norm of a polynomial 153 normal form – Britton reduced 258 – irreducible 241 – Weierstrass 134, 158 normal subgroup 8 Novikov, P. S. 236 number – algebraic 1 – Carmichael 88 – coprime 4 – Eulerian pseudoprime 95 – Euler’s 1 – Fermat 117 – Fibonacci 283 – introspective 127
– Knödel 88 – Mersenne 117 – natural 3 – pseudoprime 95 – smooth 102, 107 – Sophie Germain 63 – square-free 89 – strong pseudoprime 95 – transcendental 1 O ω-language 210 – deterministic 211 – rational 214 O-notation 83 ω-rational language 214 ω-regular 211 ω-word 210 odd permutation 15 one-time pad 57 one-way function 65 operation – Abelian 3 – associative 3 – commutative 3 – distributive 15 – on elliptic curves 137, 144 – on pseudocurves 146 order – length-lexicographic 237 – lexicographic 170, 237 – of a group 6 – of a polynomial at P 141 – of a rational function at P 154 – of a root see multiplicity – of an element 6 – product 167 – shortlex 237 ordered tree 168 Oscar 51 overlap critical pair 242 P pair – critical 242 – factor critical 242 – overlap critical 242 partial homomorphism 195 Pépin primality test 117
Pépin, Th. 117 perfect security 51, 56 period of a word 164 period of an element 205 periodicity lemma 164 permutation – even 15 – odd 15 – sign 13 Picard group 143 plaintext 50 Pocklington, H. C. 150, 151 Pocklington’s theorem 150 Poe, E. A. 53, 54 Pohlig, S. 106 Pohlig–Hellman algorithm 106 point at infinity 132, 144 pole of a rational function 154 Pollard, J. M. 100, 105 Pollard’s p − 1 algorithm 100 Pollard’s ρ algorithm (factorization) 100 Pollard’s ρ algorithm (logarithm) 105 polyalphabetic cipher 53 polynomial 26 – constant 26 – coprime 30 – degree 26 – derivative 28 – irreducible 31 – Lagrange 70 – minimal 46 – root of a 30 – thin 45 – zero 26 – zero of a 30 polynomial division 29 polynomial method 129 polynomial ring 26 – over E(k) 138, 153 Pomerance, C. B. 96 power circuit 259 power of a word 161 power series 26 powerset construction see subset construction pragmatic security 52 prefix 161 Presburger arithmetic 223 Presburger, M. 223, 226 Presburger’s theorem 226
presentation 238
primality certificate 150
– Goldwasser–Kilian 150
– Pocklington 151
primality test
– AKS 121
– deterministic 121
– Fermat 88
– Goldwasser–Kilian 150
– Lucas 117
– Lucas–Lehmer 118
– Miller–Rabin 90
– Pépin 117
– Pocklington 150
– probabilistic 88
– Solovay–Strassen 89
prime field 18
prime number
– Fermat 117
– Mersenne 117
– pseudo- 95
– Sophie Germain 63
primitive root 35, 110, 163
– of a word 171
primitive word 163, 171
principal divisor 142
principal ideal 16, 205
principal root see primitive root
problem
– conjugacy 236
– Diffie–Hellman 63, 146
– discrete logarithm 104
– extracting square roots 96
– halting 243
– Hilbert’s tenth 227
– integer factorization 99
– isomorphism 236
– knapsack 73
– word 236
product
– amalgamated 253
– direct 24
– free 245
– semidirect 252
– wreath 194, 195
product automaton 232
product order 167
projection 269
pseudocurve 146
– partially defined sum 146
pseudoprime 95
– Eulerian 95
– strong 95
pseudo-random sequence 100
pseudovariety see variety
public-key cryptosystem 59
Putnam, H. 228

Q
Qin, J. 24
quadratic reciprocity 40
quadratic residue 38, 96
quadratic sieve 101
quasi-ordering 165
– well- 165
quotient group 8
quotient of a language 177
quotient ring 16

R
ρ algorithm 100
R-class 206
RAAG see right-angled Artin group
Rabin cryptosystem 59, 61
Rabin, M. O. 61, 90
Ramsey, F. P. 166
Ramsey’s theorem 166
rank formula 266
rank of a free group 243, 262
rational expression 181
rational function 153
rational morphism 154
rational set 181, 259
reachable state 176
reciprocity 38
recognizable 174
recognize 174, 216
– strongly 216
– weakly 216
recurrence 83
reduced word 263
Rees, D. 42
Rees sandwich semigroup 42
regular class 209
regular element 42
regular expression see rational expression
regular language 186
regular n-gon 10
relation
– Green’s 206
– independence 248
relatively prime see coprime
remainder 21
residually finite 237
residue class 16
– quadratic 38
reverse language 233
rewriting system 238
– Church–Rosser 239
– complete 239
– confluent 239
– convergent 239
– locally confluent 239
– Noetherian 239
– semi-Thue 241
– strongly confluent 239
– terminating 239
Rhodes, J. L. VI, 2, 193, 194
Riemann, G. F. B. 121
right-angled Artin group 249
right-angled Coxeter group 251
right coset 5
right ideal 205
right inverse 41
right neutral 42
right zero 198
ring 3, 15
– commutative 15
– Noetherian 32
– polynomial 26
– zero 15
ring homomorphism 16
ring isomorphism 17
Rivest, R. L. V, 60
Robertson, G. N. 170
Robinson, J. 228
Rolle, M. 291
Rolle’s theorem 291
root
– of unity 35
– primitive 35, 110, 163
root of a polynomial 30
– multiple 30
– multiplicity of a 30
– primitive 35
– simple 30
root of a rational function 154
rose 271, 277
Rosser, J. B. 238
Rousseau, G. 41
RSA cryptosystem 59, 60, 94
– multiple exponent attack 79
– multi-prime 79
RSA signature 68
Ruffini, P. 2, 15
run 181, 184
– accepting 181

S
Saxena, N. VI, 121
scattered subword see subword
scheme 158
Schleimer, S. 268
Schönhage, A. VI, 113
Schönhage–Strassen multiplication 113
Schoof, R. J. 152
Schreier graph 263
– complete 265
– kernel 264
Schreier, O. 262, 266
Schupp, P. VII
Schützenberger group 207, 208
Schützenberger, M.-P. VI, 163, 189, 191, 207, 267
Schützenberger’s theorem 191
secret sharing 69
– Shamir 70
– subset of keys 69
security
– absolute see perfect security
– computational 51
– perfect 51, 56
– pragmatic 52
– relative computational 52
Selfridge, J. L. 96
semidirect product 252
semigroup 3
– cyclic 232
– Rees sandwich 42
– syntactic 217
semilinear set 224
semi-Thue system 241
– monadic 260
sentence 220, 224
separable endomorphism 156
sequence
– bad 165
– decreasing 165
– good 165
– nondecreasing 165
– superincreasing 73
Serre, J. P. VII, 258, 262
set
– accepted 181
– linear 224
– rational 181, 259
– recognizable 174
– regular 186
– semilinear 224
– solution 229
– star-free 190
Seymour, P. D. 170
Shallit, J. O. 164
Shamir, A. V, 60, 70, 73, 74, 78
Shamir’s attack 74
Shamir’s secret sharing 70
Shanks, D. 98, 104
Shannon, C. E. 56
shift cipher 52
shortlex 237
sign of a permutation 13
signature
– digital 67
– DSA 67
– ElGamal 79
– RSA 68
simple group 194
simple root 30
skew field 15
small cancellation group 247
smooth number 102, 107
Solitar, D. 245
Solovay, R. M. 89
Solovay–Strassen primality test 89
solution set 229
Sophie Germain prime 63
source encoding 50, 55
source of an edge 263
spanning tree 264
special linear group 237, 277
spelling automaton 182, 188
splitting field 34
square 38, 96
square root 38, 96
square-free number 89
Squier, C. C. 247
stabilization index 232
Stallings graph 262
Stallings, J. R., Jr. VII, 262, 269
star see Kleene star
star operation 173
star-free expression 190
star-free set 190
state
– final 176, 181
– initial 176, 181
– reachable 176
state elimination 184
Steinitz, E. 2
Strassen, V. VI, 89, 113
string see word
string rewriting system see semi-Thue system
strong pseudoprime 95
strongly confluent 239
strongly recognize see recognize
subfield 16
subgroup 5
– index of a 6
– normal 8
submonoid 4
– local 190
subring 16
subsemigroup 4
subset automaton 187
subset construction 187
substitution
– monoalphabetic 52
– polyalphabetic 53
substructure 4
– generated 4
subtree 169
subword 165, 167
suffix 161
Sun, T. 24
superincreasing sequence 73
symbol
– Jacobi 39
– Legendre 40
symmetric group 13
symmetry group 11
syntactic congruence 175, 217
syntactic homomorphism 217
syntactic monoid 175
syntactic semigroup 217

T
target of an edge 263
Tartaglia, N. F. 1
terminating rewriting system 239
Theaitetos 1
theorem
– Benois 262
– binomial 19
– Büchi 219, 221
– Cauchy 7
– Chinese remainder 24
– degree formula 27
– Euler 24
– Euler’s criterion 39
– Fermat’s Last 134
– Fermat’s little 20
– Fine–Wilf 164
– Gödel’s incompleteness 223, 227
– Goldwasser–Kilian 151
– Hasse 135
– Hilbert’s basis 32
– homomorphism (for groups) 9
– homomorphism (for rings) 17
– Kleene 187
– Krohn–Rhodes 194
– Kruskal’s tree 169
– Lagrange 6
– Lyndon–Schützenberger 163
– master 84
– master II 117
– McKnight 186
– Mezei 232
– Myhill–Nerode 180
– Nielsen–Schreier 266
– Pocklington 150
– Presburger 226
– Ramsey 166
– Rolle 291
– Schützenberger 191
– sums of two squares 284
– Wedderburn 289
Thompson construction 182
Thompson, K. L. 182
threshold of an element 205
Thue, A. 161, 241
Tietze transformation 246
Tietze, H. F. F. 236, 246
Tonelli, A. 97
Tonelli’s algorithm 97
torus 131
totient function 24
tower function 283
trace monoid 248
track 222, 225
Trakhtenbrot, B. A. 221
transcendental number 1
transformation
– Nielsen 268, 283
– Tietze 246
transition 176, 181
– ε- 183
– elimination of ε- 183
– final 318
– label of a 181
transition function 176
transition monoid 180
transposed elements 162
transposition 14
tree 168, 264
– ordered 168
– spanning 264
– sub- 169
trim automaton 231
two-sided ideal 205

U
ultimately periodic word 214
unit 15, 200
universal property 161
unknown see variable

V
variable 26
– first-order 220
– free 224
– monadic second-order 220
variety 233
Vernam, G. 57
Vernam one-time pad 57
vertical 135
Vigenère, B. de 54
Vigenère cipher 54
Voltaire 49, 293

W
Waack group 246
Waack, S. 246
Wagner, K. W. VII
Wagstaff, S. S., Jr. 96
weakly recognize 216
Wedderburn, J. H. M. 289
Wedderburn’s theorem 289
Weierstrass equation see Weierstrass normal form
Weierstrass, K. T. W. 134, 158
Weierstrass normal form 134, 158
Weil, A. 158
Weil pairing 158
well-quasi-ordering 165
Whitehead automorphism 268
Whitehead, J. H. C. VII, 268
Wiener attack 60
Wiener, M. J. 60
Wiles, A. 134
Wilf, H. S. 164
Winograd, S. 175
Witt, E. VII, 289
word 161, 186, 237
– alphabet of a 161
– cyclically reduced 267
– empty 161, 237
– finite 161
– infinite 210
– length 161, 237
– Lyndon 171
– period 164
– power 161
– primitive 163, 171
– reduced 263
– ultimately periodic 214
word problem 236, 238
wreath product 194, 195
wreath product principle 197

Z
zero see root of a polynomial
– of a polynomial 30
– of a rational function 154
– right 198
zero divisor 18
zero element 3
zero polynomial 26
zero ring 15
Zolotarev, Y. I. 39
Zolotarev’s lemma 39