An Introduction to Benford's Law [Course Book ed.] 9781400866588

This book provides the first comprehensive treatment of Benford's law, the surprising logarithmic distribution of significant digits.


English Pages 256 Year 2015



Table of contents :
Title
Copyright
Contents
Preface
1 Introduction
1.1 History
1.2 Empirical evidence
1.3 Early explanations
1.4 Mathematical framework
2 Significant Digits and the Significand
2.1 Significant digits
2.2 The significand
2.3 The significand σ-algebra
3 The Benford Property
3.1 Benford sequences
3.2 Benford functions
3.3 Benford distributions and random variables
4 The Uniform Distribution and Benford’s Law
4.1 Uniform distribution characterization of Benford’s law
4.2 Uniform distribution of sequences and functions
4.3 Uniform distribution of random variables
5 Scale-, Base-, and Sum-Invariance
5.1 The scale-invariance property
5.2 The base-invariance property
5.3 The sum-invariance property
6 Real-valued Deterministic Processes
6.1 Iteration of functions
6.2 Sequences with polynomial growth
6.3 Sequences with exponential growth
6.4 Sequences with super-exponential growth
6.5 An application to Newton’s method
6.6 Time-varying systems
6.7 Chaotic systems: Two examples
6.8 Differential equations
7 Multi-dimensional Linear Processes
7.1 Linear processes, observables, and difference equations
7.2 Nonnegative matrices
7.3 General matrices
7.4 An application to Markov chains
7.5 Linear difference equations
7.6 Linear differential equations
8 Real-valued Random Processes
8.1 Convergence of random variables to Benford’s law
8.2 Powers, products, and sums of random variables
8.3 Mixtures of distributions
8.4 Random maps
9 Finitely Additive Probability and Benford’s Law
9.1 Finitely additive probabilities
9.2 Finitely additive Benford probabilities
10 Applications of Benford’s Law
10.1 Fraud detection
10.2 Detection of natural phenomena
10.3 Diagnostics and design
10.4 Computations and Computer Science
10.5 Pedagogical tool
List of Symbols
Bibliography
Index

AN INTRODUCTION TO BENFORD’S LAW


Arno Berger and Theodore P. Hill

PRINCETON UNIVERSITY PRESS PRINCETON AND OXFORD

Copyright © 2015 by Princeton University Press. Published by Princeton University Press, 41 William Street, Princeton, New Jersey 08540. In the United Kingdom: Princeton University Press, 6 Oxford Street, Woodstock, Oxfordshire, OX20 1TW. All Rights Reserved. ISBN: 978-0-691-16306-2. Library of Congress Control Number: 2014953765. British Library Cataloging-in-Publication Data is available. This book has been composed in LaTeX. The publisher would like to acknowledge the authors of this volume for providing the camera-ready copy from which this book was printed. Printed on acid-free paper. ∞ press.princeton.edu Printed in the United States of America 10 9 8 7 6 5 4 3 2 1


Preface

This book is an up-to-date reference on Benford's law, a statistical phenomenon first documented in the nineteenth century. Benford's law, also known as the significant-digit law, is a subject of great beauty, encompassing counterintuitive predictions, deep mathematical theories, and widespread applications ranging from fraud detection to diagnosis and design of mathematical models. Building on over a decade of our joint work, this text is a self-contained comprehensive treatment of the theory of Benford's law that includes formal definitions and proofs, open problems, dozens of basic theorems we discovered in the process of writing that have not before appeared in print, and hundreds of examples. Complementing the theory are overviews of its history, new empirical evidence, and applications.

Inspiration for this project has come first and foremost from the wide variety of lay people and scientists who kept contacting us with basic questions about Benford's law, and repeatedly asked for a good reference text. Not knowing of any, we decided to write one ourselves. Our main goal in doing so has been to assimilate the essential mathematical and statistical aspects of Benford's law, and present them in a way we hope will aid researchers interested in applications, and will also inspire further theoretical advances in this fascinating field.

After a brief overview of the history and empirical evidence of the law, the book makes a smooth progression through the field: basic facts about significant digits, Benford sequences, functions, and random variables; tools from the theory of uniform distribution; scale-, base-, and sum-invariance; one-dimensional dynamical systems and differential equations; powers of matrices, Markov chains, and difference equations; and products, powers, and mixtures of random variables. Two concluding chapters contain summaries of the finitely additive theory of the law, and five general areas of applications.
Many of the illustrative examples and verbal descriptions are also intended for the non-theoretician, and are accompanied by figures, graphs, and tables that we hope will be helpful to all readers. An Introduction to Benford's Law is intended as a reference tool for a broad audience: lay people interested in learning about the history and numerous current applications of this surprising statistical phenomenon; undergraduate students wanting to understand some of the basics; graduate students and professionals in science, engineering, and accounting who are contemplating using or already using Benford's law in their own research; and professional mathematicians and statisticians, both those conducting theoretical or applied research in


the field, and those in other areas who want to learn or at least have access to the basic mathematics underlying the subject. Most of the formal statements of theorems are accessible to an advanced undergraduate mathematics student, and although the proofs sometimes require familiarity with more advanced topics such as measure and ergodic theory, they should be accessible to most mathematics and statistics graduate students. As such, we hope the book may also provide a good base for a special topics course or seminar.

We wish to thank the collaborators on our own research on Benford's law, notably Leonid Bunimovich, Gideon Eshun, Steven Evans, Bahar Kaynar, Kent Morrison, Ad Ridder, and Klaus Schürger; the second author also wishes to express his deep gratitude to Lester Dubins, from whom he first learned about Benford's law and who strongly encouraged him to write such a book, and to Amos Tversky for his advice and insights into testing a Benford theory about fabricating data. We gratefully acknowledge Bhisham Bherwani for his excellent copyediting, Kathleen Cioffi and Quinn Fusting at Princeton University Press for the fine administrative support, and especially our editor Vickie Kearn, who has been very enthusiastic about this project from the beginning and has helped us every step of the way. Finally, we both are grateful to Erika Rogers for continued technical and editorial support, assistance in researching the applications, and for designing and maintaining the Benford database [24], which currently contains listings of over 800 research papers, books, newspaper articles, software, and videos.

Comments and suggestions for improvement by readers of this book will be gratefully received.

Arno Berger and Theodore P. Hill, December 2014

Chapter One

Introduction

Benford's law, also known as the First-digit or Significant-digit law, is the empirical gem of statistical folklore that in many naturally occurring tables of numerical data, the significant digits are not uniformly distributed as might be expected, but instead follow a particular logarithmic distribution. In its most common formulation, the special case of the first significant (i.e., first non-zero) decimal digit, Benford's law asserts that the leading digit is not equally likely to be any one of the nine possible digits 1, 2, ..., 9, but is 1 more than 30% of the time, and is 9 less than 5% of the time, with the probabilities decreasing monotonically in between; see Figure 1.1. More precisely, the exact law for the first significant digit is

\[
\operatorname{Prob}(D_1 = d) = \log_{10}\Bigl(1 + \frac{1}{d}\Bigr) \quad \text{for all } d = 1, 2, \ldots, 9; \tag{1.1}
\]

here, D_1 denotes the first significant decimal digit, e.g.,

\[
D_1\bigl(\sqrt{2}\bigr) = D_1(1.414) = 1, \qquad D_1\bigl(\pi^{-1}\bigr) = D_1(0.3183) = 3, \qquad D_1\bigl(e^{\pi}\bigr) = D_1(23.14) = 2.
\]

Hence, the two smallest digits occur as the first significant digit with a combined probability close to 50 percent, whereas the two largest digits together have a probability of less than 10 percent, since

\[
\operatorname{Prob}(D_1 = 1) = \log_{10} 2 = 0.3010, \qquad \operatorname{Prob}(D_1 = 2) = \log_{10} \tfrac{3}{2} = 0.1760,
\]

and

\[
\operatorname{Prob}(D_1 = 8) = \log_{10} \tfrac{9}{8} = 0.05115, \qquad \operatorname{Prob}(D_1 = 9) = \log_{10} \tfrac{10}{9} = 0.04575.
\]
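These values are straightforward to verify numerically; a minimal Python check of (1.1), with a helper name of our own choosing:

```python
import math

def benford_p(d: int) -> float:
    """First-digit probability log10(1 + 1/d) from (1.1)."""
    return math.log10(1 + 1 / d)

probs = {d: benford_p(d) for d in range(1, 10)}

# The nine probabilities sum to 1: the product (1+1/1)(1+1/2)...(1+1/9)
# telescopes to 10, so the logarithms sum to log10(10) = 1.
assert abs(sum(probs.values()) - 1.0) < 1e-12

print(round(probs[1] + probs[2], 4))  # 0.4771 -- nearly half
print(round(probs[8] + probs[9], 4))  # 0.0969 -- less than 10 percent
```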

The complete form of Benford's law also specifies the probabilities of occurrence of the second and higher significant digits, and more generally, the joint distribution of all the significant digits. A general statement of Benford's law that includes the probabilities of all blocks of consecutive initial significant


digits (d_1, d_2, ..., d_m), where d_1 is in {1, 2, ..., 9}, and d_j is in {0, 1, ..., 9} for all j ≥ 2,

\[
\operatorname{Prob}\bigl(D_1 = d_1, D_2 = d_2, \ldots, D_m = d_m\bigr) = \log_{10}\Bigl(1 + \Bigl(\sum\nolimits_{j=1}^{m} 10^{m-j} d_j\Bigr)^{-1}\Bigr), \tag{1.2}
\]

where D_2, D_3, D_4, etc. represent the second, third, fourth, etc. significant decimal digits, e.g.,

\[
D_2\bigl(\sqrt{2}\bigr) = 4, \qquad D_3\bigl(\pi^{-1}\bigr) = 8, \qquad D_4\bigl(e^{\pi}\bigr) = 4.
\]

For example, (1.2) yields the probabilities for the individual second significant digits,

\[
\operatorname{Prob}(D_2 = d_2) = \sum\nolimits_{j=1}^{9} \log_{10}\Bigl(1 + \frac{1}{10j + d_2}\Bigr) \quad \text{for all } d_2 = 0, 1, \ldots, 9, \tag{1.3}
\]

which also are not uniformly distributed on all the possible second digit values 0, 1, ..., 9, but are strictly decreasing, although they are much closer to uniform than the first digits; see Figure 1.1.

d                 0      1      2      3      4      5      6      7      8      9
Prob(D1 = d)      0  30.10  17.60  12.49   9.69   7.91   6.69   5.79   5.11   4.57
Prob(D2 = d)  11.96  11.38  10.88  10.43  10.03   9.66   9.33   9.03   8.75   8.49
Prob(D3 = d)  10.17  10.13  10.09  10.05  10.01   9.97   9.94   9.90   9.86   9.82
Prob(D4 = d)  10.01  10.01  10.00  10.00  10.00   9.99   9.99   9.99   9.98   9.98

Figure 1.1: Probabilities (in percent) of the first four significant decimal digits, as implied by Benford's law (1.2); note that the first row is simply the first-digit law (1.1).

More generally, (1.2) yields the probabilities for longer blocks of digits as well. For instance, the probability that a number has the same first three significant digits as π = 3.141 is

\[
\operatorname{Prob}\bigl(D_1 = 3, D_2 = 1, D_3 = 4\bigr) = \log_{10}\Bigl(1 + \frac{1}{314}\Bigr) = \log_{10}\frac{315}{314} = 0.001380.
\]
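The block formula (1.2) is easy to evaluate, since the sum of the terms 10^(m−j) d_j is just the digit block read as an integer (e.g., 314); a short sketch, with a function name of our own:

```python
import math

def benford_block_p(digits) -> float:
    """Probability under (1.2) of an initial block of significant digits,
    e.g., (3, 1, 4) for numbers beginning 3.14..."""
    n = 0
    for d in digits:
        n = 10 * n + d  # the sum of 10**(m-j) * d_j is the block as an integer
    return math.log10(1 + 1 / n)

# Same first three significant digits as pi = 3.141...:
p = benford_block_p((3, 1, 4))  # log10(315/314)
assert abs(p - 0.001380) < 1e-5
```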

A perhaps surprising corollary of the general form of Benford's law (1.2) is that the significant digits are dependent, and not independent as one might expect


[74]. To see this, note that (1.3) implies that the (unconditional) probability that the second digit equals 1 is

\[
\operatorname{Prob}(D_2 = 1) = \sum\nolimits_{j=1}^{9} \log_{10}\Bigl(1 + \frac{1}{10j + 1}\Bigr) = \log_{10}\frac{6029312}{4638501} = 0.1138,
\]

whereas it follows from (1.2) that if the first digit is 1, the (conditional) probability that the second digit also equals 1 is

\[
\operatorname{Prob}(D_2 = 1 \mid D_1 = 1) = \frac{\log_{10} 12 - \log_{10} 11}{\log_{10} 2} = 0.1255.
\]

Note. Throughout, real numbers such as √2 and π are displayed to four correct significant decimal digits. Thus an equation like √2 = 1.414 ought to be read as 1414 ≤ 1000·√2 < 1415, and not as √2 = 1414/1000. The only exceptions to this rule are probabilities given in percent (as in Figure 1.1), as well as the numbers ∆ and ∆∞, introduced later; all these quantities only attain values between 0 and 100, and are shown to two correct digits after the decimal point. Thus, for instance, ∆ = 0.00 means 0 ≤ 100·∆ < 1, but not necessarily ∆ = 0.
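Both quantities can be recomputed directly from (1.3) and (1.2); a minimal sketch (the function name is ours):

```python
import math

def second_digit_p(d2: int) -> float:
    """Unconditional second-digit probability from (1.3)."""
    return sum(math.log10(1 + 1 / (10 * j + d2)) for j in range(1, 10))

uncond = second_digit_p(1)           # Prob(D2 = 1), about 0.1138

# Conditional on D1 = 1: Prob(D1 = 1, D2 = 1) / Prob(D1 = 1), via (1.2)
joint = math.log10(1 + 1 / 11)       # probability of the block "11"
cond = joint / math.log10(2)         # about 0.1255

assert cond > uncond                 # so D1 and D2 are dependent
```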

1.1 HISTORY

The first known reference to the logarithmic distribution of leading digits dates back to 1881, when the American astronomer Simon Newcomb noticed "how much faster the first pages [of logarithmic tables] wear out than the last ones," and, after several short heuristics, deduced the logarithmic probabilities shown in the first two rows of Figure 1.1 for the first and second digits [111]. Some fifty-seven years later the physicist Frank Benford rediscovered the law [9], and supported it with over 20,000 entries from 20 different tables including such diverse data as catchment areas of 335 rivers, specific heats of 1,389 chemical compounds, American League baseball statistics, and numbers gleaned from front pages of newspapers and Reader's Digest articles; see Figure 1.2 (rows A, E, P, D, and M, respectively). Although P. Diaconis and D. Freedman offer convincing evidence that Benford manipulated round-off errors to obtain a better fit to the logarithmic law [47, p. 363], even the unmanipulated data are remarkably close. Benford's article attracted much attention and, Newcomb's article having been overlooked, the law became known as Benford's law, and many articles on the subject appeared. As R. Raimi observed nearly half a century ago [127, p. 521],

This particular logarithmic distribution of the first digits, while not universal, is so common and yet so surprising at first glance that it has given rise to a varied literature, among the authors of which are mathematicians, statisticians, economists, engineers, physicists, and amateurs.

The online database [24] now references more than 800 articles on Benford's law, as well as other resources (books, websites, lectures, etc.).


Figure 1.2: Benford’s original data from [9]; reprinted courtesy of the American Philosophical Society.

1.2 EMPIRICAL EVIDENCE

Many tables of numerical data, of course, do not follow Benford's law in any sense. Telephone numbers in a given region typically begin with the same few digits, and never begin with a 1; lottery numbers in all common lotteries are distributed uniformly, not logarithmically; and tables of heights of human adults, whether given in feet or meters, clearly do not begin with a 1 about 30% of the time. Even "neutral" mathematical data such as square-root tables of integers do not follow Benford's law, as Benford himself discovered (see row K in Figure 1.2 above), nor do the prime numbers, as will be seen in later chapters.

On the other hand, since Benford's popularization of the law, an abundance of additional empirical evidence has appeared. In physics, for example, D. Knuth [90] and J. Burke and E. Kincanon [31] observed that of the most commonly used physical constants (e.g., the speed of light and the force of gravity listed on the inside cover of an introductory physics textbook), about 30% have leading significant digit 1; P. Becker [8] observed that the decimal parts of failure (hazard) rates often have a logarithmic distribution; and R. Buck et al., in studying the values of the 477 radioactive half-lives of unhindered alpha decays that were accumulated throughout the past century, and that vary over many orders of magnitude, found that the frequency of occurrence of the first digits of both measured and calculated values of the half-lives is in "good agreement" with Benford's law [29]. In scientific calculations, A. Feldstein and P. Turner called the assumption of logarithmically distributed mantissas "widely used and well established" [57, p. 241]; R. Hamming labeled the appearance of the logarithmic distribution in floating-point numbers "well-known" [70, p. 1609]; and Knuth observed that "repeated calculations with real numbers will nearly always tend to yield better and better approximations to a logarithmic distribution" [90, p. 262].

Additional empirical evidence of Benford's law continues to appear. M. Nigrini observed that the digital frequencies of certain entries in Internal Revenue Service files are an extremely good fit to Benford's law (see [113] and Figure 1.3); E. Ley found that "the series of one-day returns on the Dow-Jones Industrial Average Index (DJIA) and the Standard and Poor's Index (S&P) reasonably agrees with Benford's law" [98]; and Z. Shengmin and W. Wenchao found that "Benford's law reasonably holds for the two main Chinese stock indices" [148]. In the field of biology, E. Costas et al. observed that in a certain cyanobacterium, "the distribution of the number of cells per colony satisfies Benford's law" [39, p. 341]; S. Docampo et al. reported that "gross data sets of daily pollen counts from three aerobiological stations (located in European cities with different features regarding vegetation and climatology) fit Benford's law" [49, p. 275]; and J. Friar et al. found that "the Benford distribution produces excellent fits" to certain basic genome data [60, p. 1].
Figure 1.3 compares the probabilities of occurrence of first digits predicted by (1.1) to the distributions of first digits in four datasets: the combined data reported by Benford in 1938 (second-to-last row in Figure 1.2); the populations of the 3,143 counties in the United States in the 2010 census [102]; all numbers appearing on the World Wide Web as estimated using a Google search experiment [97]; and over 90,000 entries for Interest Received in U.S. tax returns from the IRS Individual Tax Model Files [113]. To instill in the reader a quantitative perception of closeness to, or deviation from, the first-digit law (1.1), for every distribution of the first significant decimal digit shown in this book, the number

\[
\Delta = 100 \cdot \max\nolimits_{1 \le d \le 9} \Bigl| \operatorname{Prob}(D_1 = d) - \log_{10}\Bigl(1 + \frac{1}{d}\Bigr) \Bigr|
\]

will also be displayed. Note that ∆ is simply the maximum difference, in percent, between the probabilities of the first significant digits of the given distribution and the Benford probabilities in (1.1). Thus, for example, ∆ = 0 indicates exact conformance to (1.1), and ∆ = 12.08 indicates that the probability of some digit d ∈ {1, 2, ..., 9} differs from log10(1 + d^{-1}) by 12.08%, and the probability of no other digit differs by more than this.
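A sketch of this statistic in Python (the function name is ours, not the book's):

```python
import math

def benford_delta(first_digit_probs) -> float:
    """Maximum deviation, in percent, from the first-digit law (1.1);
    first_digit_probs[d] is the observed probability of first digit d."""
    return 100 * max(abs(first_digit_probs[d] - math.log10(1 + 1 / d))
                     for d in range(1, 10))

# Exact conformance to (1.1) gives Delta = 0:
exact = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
assert benford_delta(exact) == 0.0

# A uniform first-digit distribution deviates most at d = 1:
uniform = {d: 1 / 9 for d in range(1, 10)}
print(round(benford_delta(uniform), 2))  # 18.99
```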

[Figure 1.3: Comparisons of four datasets to Benford's law (1.1): exact first-digit law (∆ = 0); Benford's combined data (∆ = 0.89); populations of US counties (∆ = 1.41); numbers on WWW (∆ = 1.54); US tax returns (∆ = 0.48).]

All these statistics aside, the authors also highly recommend that the justifiably skeptical reader perform a simple experiment, such as randomly selecting numerical data from front pages of several local newspapers, or from "a Farmer's Almanack" as Knuth suggests [90], or running a Google search similar to the Dartmouth classroom project described in [97].

1.3 EARLY EXPLANATIONS

Since the empirical significant-digit law (1.1) or (1.2) does not specify a well-defined statistical experiment or sample space, most early attempts to explain the appearance of Benford's law argued that it is "merely the result of our way of writing numbers" [67] or "a built-in characteristic of our number system" [159]. The idea was to first show that the set of real numbers satisfies (1.1) or (1.2), and then suggest that this explains the empirical statistical evidence. A common starting point has been to try to establish (1.1) for the positive integers, beginning with the prototypical set

\[
\{D_1 = 1\} = \{1, 10, 11, \ldots, 18, 19, 100, 101, \ldots, 198, 199, 1000, 1001, \ldots\},
\]

the set of positive integers with first significant digit 1. The source of difficulty and much of the fascination of the first-digit problem is that the set {D_1 = 1} does not have a natural density among the integers; that is, the proportion of integers in the set {D_1 = 1} up to N, i.e., the ratio

\[
\frac{\#\{1 \le n \le N : D_1(n) = 1\}}{N}, \tag{1.4}
\]

does not have a limit as N goes to infinity, unlike the sets of even integers or primes, say, which have natural densities 1/2 and 0, respectively. It is easy to see that the empirical density (1.4) of {D_1 = 1} oscillates repeatedly between 1/9 and 5/9, and thus it is theoretically possible to assign any number between 1/9 and 5/9 as the "probability" of this set. Similarly, the empirical density of {D_1 = 9} forever oscillates between 1/81 and 1/9; see Figure 1.4.
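The oscillation is easy to observe by brute force; a small sketch of the ratio (1.4):

```python
def leading_digit_density(N: int, d: int = 1) -> float:
    """Proportion of n in {1, ..., N} with first significant digit d,
    i.e., the ratio (1.4)."""
    return sum(1 for n in range(1, N + 1) if str(n)[0] == str(d)) / N

# The ratio keeps swinging between values near 1/9 and values near 5/9
# instead of settling down to a limit:
for N in (9, 19, 99, 199, 999, 1999):
    print(N, round(leading_digit_density(N), 3))
```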


Figure 1.4: The sets {D_1 = 1} and {D_1 = 9} do not have a natural density (and neither does {D_1 = d} for any d = 2, 3, ..., 8).

Many partial attempts to put Benford's law on a solid logical basis have been made, beginning with Newcomb's own heuristics, and continuing through the decades with various urn model arguments and mathematical proofs; Raimi [127] has an excellent review of these. But as the eminent logician, mathematician, and philosopher C. S. Peirce once observed, "in no other branch of mathematics is it so easy for experts to blunder as in probability theory" [63, p. 273], and the arguments surrounding Benford's law certainly bear that out. Even W. Feller's classic and hugely influential text [58] contains a critical flaw that apparently went unnoticed for half a century. Specifically, the claim by Feller and subsequent authors that "regularity and large spread implies Benford's Law" is fallacious for any reasonable definitions of regularity and spread (measure of dispersion) [21].

1.4 MATHEMATICAL FRAMEWORK

A crucial part of (1.1), of course, is an appropriate interpretation of Prob. In practice, this can take several forms. For sequences of real numbers (x_1, x_2, ...), Prob usually refers to the limiting proportion (or relative frequency) of elements in the sequence for which an event such as {D_1 = 1} occurs. Equivalently, fix a positive integer N and calculate the probability that the first digit is 1 in an experiment where one of the elements x_1, x_2, ..., x_N is selected at random (each with probability 1/N); if this probability has a limit as N goes to infinity, then the limiting probability is designated Prob(D_1 = 1). Implicit in this usage of Prob is the assumption that all limiting proportions of interest actually exist. Similarly, for real-valued functions f : [0, +∞) → R, fix a positive real number T, choose a number τ at random uniformly between 0 and T, and calculate the probability that f(τ) has first significant digit 1. If this probability has a limit as T → +∞, then Prob(D_1 = 1) is that limiting probability. For a random variable or probability distribution, on the other hand, Prob simply denotes the underlying probability of the given event. Thus, if X is a random variable, then Prob(D_1(X) = 1) is the probability that the first significant digit of X is 1. Finite datasets of real numbers can also be dealt with this way, with Prob being the empirical distribution of the dataset.

One of the main themes of this book is the robustness of Benford's law. In the context of sequences of numbers, for example, iterations of linear maps typically follow Benford's law exactly; Figure 1.5 illustrates the convergence of first-digit probabilities for the Fibonacci sequence (1, 1, 2, 3, 5, 8, 13, ...). As will be seen in Chapter 6, not only do iterations of most linear functions follow Benford's law exactly, but iterations of most functions close to linear also follow Benford's law exactly. Similarly, as will be seen in Chapter 8, powers and products of very general classes of random variables approach Benford's law in the limit; Figure 1.6 illustrates this starting with U(0, 1), the standard random variable uniformly distributed between 0 and 1. Similarly, if random samples from different randomly-selected probability distributions are combined, the resulting meta-sample also typically converges to Benford's law; Figure 1.7 illustrates this by comparing two of Benford's original empirical datasets with the combination of all his data.
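The Fibonacci illustration of Figure 1.5 can be reproduced in a few lines, using exact integer arithmetic and the ∆ statistic defined above (helper names are ours):

```python
import math
from collections import Counter

def fibonacci_first_digits(N: int) -> Counter:
    """First significant digits of the first N Fibonacci numbers 1, 1, 2, 3, ..."""
    counts = Counter()
    a, b = 1, 1
    for _ in range(N):
        counts[int(str(a)[0])] += 1
        a, b = b, a + b
    return counts

N = 1000
counts = fibonacci_first_digits(N)
delta = 100 * max(abs(counts[d] / N - math.log10(1 + 1 / d))
                  for d in range(1, 10))
# delta comes out well under 1 percent, in line with Figure 1.5
```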

[Figure 1.5: Probabilities that a number chosen uniformly from among the first N Fibonacci numbers has first significant digit d: N = 10 (∆ = 12.08), N = 100 (∆ = 1.88), N = 1000 (∆ = 0.19).]

[Figure 1.6: First-digit probabilities of powers of a U(0, 1) random variable X: X (∆ = 18.99), X^2 (∆ = 10.94), X^10 (∆ = 2.38).]

Non-decimal bases

Throughout this book, attention will normally be restricted to decimal (i.e., base-10) significant digits, and when results for more general bases are employed, that will be made explicit. From now on, therefore, log x will always denote the logarithm base 10 of x, while ln x is the natural logarithm of x. For convenience, the convention log 0 := ln 0 := 0 is adopted. Nearly all the results in this book that are stated only with respect to base 10 carry over easily to arbitrary integer bases b ≥ 2, and the interested reader may find some pertinent details in [15]. In particular, the general form of (1.2) with respect to any such base b is

\[
\operatorname{Prob}\bigl(D_1^{(b)} = d_1, D_2^{(b)} = d_2, \ldots, D_m^{(b)} = d_m\bigr) = \log_b\Bigl(1 + \Bigl(\sum\nolimits_{j=1}^{m} b^{m-j} d_j\Bigr)^{-1}\Bigr), \tag{1.5}
\]

where log_b denotes the base-b logarithm, and D_1^{(b)}, D_2^{(b)}, D_3^{(b)}, etc. are the first, second, third, etc. significant digits base b, respectively; so in (1.5), d_1 is an integer in {1, 2, ..., b − 1}, and for j ≥ 2, d_j is an integer in {0, 1, ..., b − 1}. Note that in the case m = 1 and b = 2, (1.5) reduces to Prob(D_1^{(2)} = 1) = 1, which is trivially true because the first significant digit base 2 of every non-zero number is 1.
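A quick sketch of (1.5), generalizing the decimal block formula to an arbitrary base (function name is ours):

```python
import math

def benford_block_p_base(digits, b: int = 10) -> float:
    """Probability under (1.5) of an initial block of base-b significant digits."""
    n = 0
    for d in digits:
        n = b * n + d  # the block read as a base-b integer
    return math.log(1 + 1 / n, b)

# m = 1 and b = 2: the first significant binary digit is always 1.
assert benford_block_p_base((1,), b=2) == 1.0

# b = 10 recovers the decimal law (1.1):
assert abs(benford_block_p_base((1,)) - math.log10(2)) < 1e-12
```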

This book is organized as follows. Chapter 2 contains formal definitions, examples, and graphs of significant digits and the significand (mantissa) function, and also of the probability spaces needed to formulate Benford's law precisely, including the crucial natural domain of "events," the so-called significand σ-algebra. Chapter 3 defines Benford sequences, functions, and random variables, with examples of each. Chapters 4 and 5 contain four of the main mathematical characterizations of Benford's law, with proofs and examples. Chapters 6 and 7 study Benford's law in the context of deterministic processes, including both one- and multi-dimensional discrete-time dynamical systems and algorithms as well as continuous-time processes generated by differential equations. Chapter 8 addresses Benford's law for random variables and stochastic processes, including products of random variables, mixtures of distributions, and random maps. Chapter 9 offers a glimpse of the complementary theory of Benford's law in the non-traditional context of finitely additive probability theory, and Chapter 10 provides a brief overview of the many applications of Benford's law that continue to appear in a wide range of disciplines.

[Figure 1.7: Empirical first-digit probabilities in Benford's original data (see Figure 1.2): Spec. Heat (row E, ∆ = 6.10), Atomic Wgt. (row J, ∆ = 17.09), Math tables (row K, ∆ = 4.40), and Combined data (∆ = 0.89).]

The mathematical detail in this book is on several levels: The basic explanations, and many of the figures and comments, are intended for a general scientific audience; the formal statements of definitions and theorems are accessible to an undergraduate mathematics student; and the proofs, some of which contain basics of measure and ergodic theory, are accessible to a mathematics graduate student (or a diligent undergraduate).

Chapter Two

Significant Digits and the Significand

Benford's law is a statement about the statistical distribution of significant (decimal) digits or, equivalently, about significands, viz., fraction parts in floating-point arithmetic. Thus, a natural starting point for any study of Benford's law is the formal definition of significant digits and the significand function.

2.1 SIGNIFICANT DIGITS

Informally, the first significant decimal digit of a positive real number x is the first non-zero digit appearing in the decimal expansion of x; e.g., the first significant digits of 2015 and 0.2015 are both 2. The second significant digit is the decimal digit following the first significant digit, so the second significant digits of 2015 and 0.2015 are both 0. This informal definition, however, is ambiguous about whether the first significant digit of 19.99... = 20 is 1 or 2. The convention used throughout this book, as seen in the following formal definition, is that the first significant digit of 19.99... is 2. Interchanging the strict and non-strict inequalities in the definition would change that convention, but would not affect the essential conclusions about Benford's law in the following chapters.

Definition 2.1. For every non-zero real number x, the first significant (decimal) digit of x, denoted by D1(x), is the unique integer j ∈ {1, 2, ..., 9} satisfying 10^k j ≤ |x| < 10^k (j + 1) for some (necessarily unique) k ∈ Z. Similarly, for every m ≥ 2, m ∈ N, the m-th significant (decimal) digit of x, denoted by Dm(x), is defined inductively as the unique integer j ∈ {0, 1, ..., 9} such that

10^k (Σ_{i=1}^{m−1} Di(x) 10^{m−i} + j) ≤ |x| < 10^k (Σ_{i=1}^{m−1} Di(x) 10^{m−i} + j + 1)

for some (necessarily unique) k ∈ Z; for convenience, Dm(0) := 0 for all m ∈ N.

Note that, by definition, the first significant digit D1(x) of every non-zero x is not 0, whereas the second, third, etc. significant digits may be any integers in {0, 1, ..., 9}.

Example 2.2. D1(√2) = D1(−√2) = D1(10√2) = 1, D2(√2) = 4, D3(√2) = 1; D1(π^{−1}) = D1(10π^{−1}) = 3, D2(π^{−1}) = 1, D3(π^{−1}) = 8.
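Definition 2.1 translates directly into code. The following minimal Python sketch (ours, not part of the book; the function name is our choice) reads off Dm(x) by shifting the decimal point so that the m-th significant digit lands in the ones place.

```python
import math

def significant_digit(x: float, m: int) -> int:
    """m-th significant decimal digit D_m(x) in the sense of Definition 2.1."""
    if x == 0:
        return 0                          # convention: D_m(0) := 0
    s = abs(x)
    k = math.floor(math.log10(s))         # unique k with 10**k <= s < 10**(k+1)
    # shift so the m-th significant digit sits in the ones place, then read it off
    return int(s / 10 ** (k - m + 1)) % 10
```

For instance, significant_digit(2015, 1) and significant_digit(0.2015, 1) both return 2, matching Example 2.2 and the remarks above. (Floating-point rounding can matter for borderline inputs such as 19.99...; the exact Definition 2.1 has no such caveat.)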


2.2 THE SIGNIFICAND

Informally, the significand of a real number is its coefficient when expressed in floating-point ("scientific") notation, so the significand of 2015 = 2.015 · 10^3 is 2.015. Unlike in the definition of significant digit, there is no ambiguity here: the significand of 19.99... = 20 is 1.99... = 2. The formal definition of the significand (function) is this:

Definition 2.3. The (decimal) significand function S : R → [1, 10) is defined as follows: If x ≠ 0 then S(x) = t, where t is the unique number in [1, 10) with |x| = 10^k t for some (necessarily unique) k ∈ Z; if x = 0 then, for convenience, S(0) := 0. Explicitly, S is given by

S(x) = 10^{log|x| − ⌊log|x|⌋} for all x ≠ 0;

here ⌊x⌋ denotes the largest integer less than or equal to the real number x. (The function ⌊·⌋ is often referred to as the floor function, e.g., ⌊π⌋ = ⌊3.141⌋ = 3, and ⌊−π⌋ = −4.) Observe that, for all x ∈ R,

S(10^k x) = S(x) = S(S(x)) for every k ∈ Z.

Figure 2.1 depicts the graph of the significand function and the graph of its logarithm. Between integer values of log|x|, the latter is a linear function of log|x|. This fact will play an important role in the theory of Benford's law in later chapters.

Figure 2.1: Graphs of the significand function S (left) and its logarithm (right).

Note. The original word used in American English to describe the coefficient of floating-point numbers in computer hardware seems to have been mantissa, and this usage remains common among some computer scientists and practitioners. However, this use of the word mantissa is discouraged by the IEEE floating-point standard committee and by many professionals because it conflicts with the pre-existing usage of mantissa for the fractional part of a logarithm. In accordance


with the IEEE standard, only the term significand will be used henceforth in this book. (With the significand S = S(x) as in Definition 2.3, the (traditional) mantissa of x would simply be log S(x).) The reader should also note that in the literature, the significand is sometimes taken to have values in [0.1, 1) rather than in [1, 10).

Example 2.4. S(√2) = S(−√2) = S(10√2) = √2 = 1.414..., S(π^{−1}) = S(10π^{−1}) = 10π^{−1} = 3.183... .
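The closed-form expression in Definition 2.3 can also be checked numerically; here is a minimal Python sketch (ours, not the book's) of the significand function.

```python
import math

def significand(x: float) -> float:
    """Decimal significand S(x) in [1, 10); S(0) := 0 by convention."""
    if x == 0:
        return 0.0
    lg = math.log10(abs(x))
    return 10 ** (lg - math.floor(lg))    # 10^(log|x| - floor(log|x|))
```

Up to floating-point error, significand(2015) is 2.015 and significand(-math.sqrt(2)) equals math.sqrt(2), and significand(10**k * x) agrees with significand(x) for moderate integers k, matching Example 2.4 and the displayed invariance property.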


The significand uniquely determines all the significant digits, and vice versa. This relationship is recorded in the following proposition, which follows immediately from Definitions 2.1 and 2.3.

Proposition 2.5. Let x be any real number. Then:

(i) S(x) = Σ_{m∈N} 10^{1−m} Dm(x);

(ii) Dm(x) = ⌊10^{m−1} S(x)⌋ − 10⌊10^{m−2} S(x)⌋ for every m ∈ N.

Thus, Proposition 2.5(i) expresses the significand of a number as an explicit function of the significant digits of that number, and (ii) expresses the significant digits as a function of the significand. It is important to note that the definitions of significand and significant digits per se do not involve any decimal expansion of x. However, it is clear from Proposition 2.5(i) that the significant digits provide a decimal expansion of S(x), and that a sequence (dm) in {0, 1, ..., 9} is the sequence of significant digits of some positive real number if and only if d1 ≠ 0 and dm ≠ 9 for infinitely many m.

Example 2.6. It follows from Proposition 2.5, together with Examples 2.2 and 2.4, that

S(√2) = D1(√2) + 10^{−1} D2(√2) + 10^{−2} D3(√2) + ... = 1.414... = √2,

as well as that

D1(√2) = ⌊√2⌋ = 1,
D2(√2) = ⌊10√2⌋ − 10⌊√2⌋ = 4,
D3(√2) = ⌊100√2⌋ − 10⌊10√2⌋ = 1, etc.
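Both identities of Proposition 2.5 are easy to test numerically. The sketch below (our code, not the book's) recovers digits from the significand via (ii) and then checks (i) through a partial sum.

```python
import math

def significand(x: float) -> float:
    lg = math.log10(abs(x))
    return 10 ** (lg - math.floor(lg))

def digit_via_prop25(x: float, m: int) -> int:
    """D_m(x) = floor(10^(m-1) S(x)) - 10 * floor(10^(m-2) S(x)), Prop. 2.5(ii)."""
    s = significand(x)
    return math.floor(10 ** (m - 1) * s) - 10 * math.floor(10 ** (m - 2) * s)

def significand_via_prop25(x: float, terms: int = 12) -> float:
    """Partial sum of Prop. 2.5(i): sum over m of 10^(1-m) * D_m(x)."""
    return sum(10 ** (1 - m) * digit_via_prop25(x, m) for m in range(1, terms + 1))
```

With twelve terms the partial sum of (i) agrees with S(x) to within roughly 10^{-10}, the truncation error of the digit expansion.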

Since the significant digits determine the significand, and are in turn determined by it, the informal version (1.2) of Benford's law in Chapter 1 has an immediate and very concise counterpart in terms of the significand function, namely,

Prob(S ≤ t) = log t for all 1 ≤ t < 10. (2.1)

(Recall that log denotes the base-10 logarithm throughout.) As noted earlier, the formal versions of (1.2) and (2.1) will be developed in detail below.

2.3 THE SIGNIFICAND σ-ALGEBRA

The informal statements (1.1), (1.2), and (2.1) of Benford's law all involve probabilities. Hence, to formulate mathematically precise versions of these statements, it is necessary to reformulate them in the setting of rigorous probability theory. The fundamental domain of standard modern probability theory is a non-empty set Ω (the "outcome space") together with a σ-algebra of subsets of Ω (the "events"). Recall that a σ-algebra A on Ω is simply a family of subsets of Ω such that ∅ ∈ A, and A is closed under complements and countable unions, that is,

A ∈ A ⟹ A^c := {ω ∈ Ω : ω ∉ A} ∈ A, and An ∈ A for all n ∈ N ⟹ ⋃_{n∈N} An ∈ A.

The largest σ-algebra on a non-empty set Ω is the so-called power set of Ω, the set of all subsets of Ω. If Ω is a finite set, this power σ-algebra is often the natural setting for most probabilistic purposes. For larger Ω such as the real line R, however, the power set contains many complicated and non-constructible sets which cause difficulties in mathematical analysis, including standard countably additive probability theory, even with the most common probability distributions such as uniform, exponential, and normal distributions. As detailed below, therefore, instead of the power set, the standard σ-algebra on R used to define and analyze both continuous and discrete distributions is the so-called Borel σ-algebra B, a proper subset (sub-σ-algebra) of the power set of R.

Given any collection E of subsets of Ω, there is a (unique) smallest σ-algebra on Ω containing E, referred to as the σ-algebra generated by E and denoted by σ(E). As indicated above, one of the most important examples of σ-algebras in real analysis and probability theory is the Borel σ-algebra B on R, which by definition is the σ-algebra generated by all intervals of real numbers, i.e., B = σ({J ⊂ R : J an interval}). If C ⊂ R then B(C) is understood to be the σ-algebra C ∩ B := {C ∩ B : B ∈ B} on C; for brevity, write B[a, b) instead of B([a, b)), where [a, b) = {x ∈ R : a ≤ x < b}, and write B+ instead of B(R+), where R+ = {x ∈ R : x > 0}.

Example 2.7. The Borel σ-algebras B[0, 1) and B[1, 10), which play important roles in the theory of Benford's law, are simply the smallest σ-algebras on [0, 1) and [1, 10) containing all intervals of the form [a, b), where 0 ≤ a ≤ b < 1 and 1 ≤ a ≤ b < 10, respectively.

Just as the power set of R is simply too large a σ-algebra for much of standard real analysis and probability theory, and is therefore replaced by the smaller Borel σ-algebra B on R, the σ-algebra B is itself too large for analyzing the significant digit and significand functions that are essential to formulating and analyzing Benford's law. Instead, a smaller collection of sets, i.e., a proper sub-σ-algebra of B, will be seen to be the natural domain for Benford's law.


In order to describe this new σ-algebra, it is helpful to review the notion of a σ-algebra generated by a function. Given any function f : Ω → R, recall that for every subset C of R, the set f^{−1}(C) ⊂ Ω, called the pre-image of C under f, is defined as

f^{−1}(C) = {ω ∈ Ω : f(ω) ∈ C}.

The σ-algebra on Ω generated by the collection of sets

E = {f^{−1}(J) : J ⊂ R, J an interval}

is also referred to as the σ-algebra generated by f; it will be denoted by σ(f). Thus σ(f) is the smallest σ-algebra on Ω that contains all sets of the form {ω ∈ Ω : a ≤ f(ω) ≤ b} for every a, b ∈ R. It is easy to check that in fact σ(f) = {f^{−1}(B) : B ∈ B}. Similarly, a whole family F of functions f : Ω → R defines a σ-algebra σ(F), via

σ(F) = σ(⋃_{f∈F} σ(f)) = σ({f^{−1}(J) : J ⊂ R, J an interval, f ∈ F}),

which is simply the smallest σ-algebra on Ω containing {ω ∈ Ω : a ≤ f(ω) ≤ b} for all a, b ∈ R and all f ∈ F. In probability theory, functions f : Ω → R with σ(f) ⊂ A are called random variables. Probability textbooks typically use symbols X, Y, etc., rather than f, g, etc., to denote random variables, and this practice will be followed here also.

Example 2.8. (i) Let f : [0, 1] → R be the so-called tent map defined by f(x) = 1 − |2x − 1|; e.g., see [45, §1.8]. Since

f^{−1}([0, a]) = [0, a/2] ∪ [1 − a/2, 1] for all a ∈ [0, 1],

it is clear that σ(f) is the proper sub-σ-algebra of B[0, 1] consisting of all Borel sets in [0, 1] that are symmetric about x = 1/2, i.e., with 1 − A denoting the set {1 − a : a ∈ A},

σ(f) = {A ∈ B[0, 1] : A = 1 − A}.

(ii) Let X be a Bernoulli random variable, i.e., a random variable on (R, B) taking only the values 0 and 1. Then σ(X) is the sub-σ-algebra of B given by

σ(X) = {∅, {0}, {1}, {0, 1}, R, R\{0}, R\{1}, R\{0, 1}};

here and throughout, A\B = A ∩ B^c is the set of all elements in A that are not in B.

One key step in formulating Benford's law precisely is identifying the correct σ-algebra. As it turns out, in the significant digit framework there is only one natural candidate, which, although strictly smaller than B, is nevertheless both intuitive and easy to describe. Informally, this σ-algebra is simply the collection of all subsets of R that can be described in terms of the significand function.


Definition 2.9. The significand σ-algebra S is the σ-algebra on R+ generated by the significand function S, i.e., S = R+ ∩ σ(S).

The importance of the σ-algebra S comes from the fact that for every event A ∈ S and every x > 0, knowing the significand S(x) of x is enough to decide whether x ∈ A or x ∉ A. Worded slightly more formally, this observation reads as follows.

Lemma 2.10. For every function f : R+ → R the following statements are equivalent:

(i) f is completely determined by S, i.e., there exists a function ϕ : [1, 10) → R with σ(ϕ) ⊂ B[1, 10) such that f(x) = ϕ(S(x)) for all x ∈ R+;

(ii) σ(f) ⊂ S.

Proof. Assume (i) and let J ⊂ R be any interval. Then B = ϕ^{−1}(J) ∈ B and f^{−1}(J) = S^{−1}(ϕ^{−1}(J)) = S^{−1}(B) ∈ S, showing that σ(f) ⊂ S.

Conversely, if σ(f) ⊂ S then f(10x) = f(x) for all x > 0. To see this, assume by way of contradiction that f(x0) < f(10x0) for some x0 > 0, let

A = f^{−1}([f(x0) − 1, (f(x0) + f(10x0))/2]) ∈ σ(f) ⊂ S,

and note that x0 ∈ A whereas 10x0 ∉ A. Since A = S^{−1}(B) for some B ∈ B, this leads to the contradiction that S(x0) ∈ B and S(x0) = S(10x0) ∉ B. Hence f(10x) = f(x) for all x > 0, and by induction f(10^k x) = f(x) for all k ∈ Z. Given t ∈ [1, 10), pick any x > 0 with S(x) = t and define ϕ(t) = f(x). Since any two choices of x differ by a factor 10^k for some k ∈ Z, ϕ : [1, 10) → R is well-defined, and ϕ(S(x)) = f(x) for all x > 0. Moreover, for any interval J ⊂ R and t ∈ [1, 10), ϕ(t) ∈ J if and only if t ∈ ⋃_{k∈Z} 10^k f^{−1}(J). By assumption, the latter set belongs to S, which in turn shows that σ(ϕ) ⊂ B[1, 10). ∎

Thus Lemma 2.10 states that the significand σ-algebra S is the family of all events A ⊂ R+ that can be described completely in terms of their significands, or, equivalently (by Theorem 2.11 below), in terms of their significant digits. For example, the set A1 of positive numbers whose first significant digit is 1 and whose third significant digit is not 7, i.e.,

A1 = {x > 0 : D1(x) = 1, D3(x) ≠ 7},

belongs to S, as do the set A2 of all x > 0 whose significant digits are all 5 or 6, i.e.,

A2 = {x > 0 : Dm(x) ∈ {5, 6} for all m ∈ N},

and the set A3 of numbers whose significand is rational,

A3 = {x > 0 : S(x) ∈ Q}.


On the other hand, the interval [1, 2], for instance, does not belong to S. This follows from the next theorem, which provides a useful characterization of the significand sets, i.e., the members of the family S. For its formulation, for every x ∈ R and every set C ⊂ R, let xC = {xc : c ∈ C}.

Theorem 2.11 ([74]). For every A ∈ S,

A = ⋃_{k∈Z} 10^k S(A), (2.2)

where S(A) = {S(x) : x ∈ A} ⊂ [1, 10). Moreover,

S = R+ ∩ σ(D1, D2, D3, ...) = {⋃_{k∈Z} 10^k B : B ∈ B[1, 10)}. (2.3)

Proof. By definition,

S = R+ ∩ σ(S) = R+ ∩ {S^{−1}(B) : B ∈ B} = R+ ∩ {S^{−1}(B) : B ∈ B[1, 10)}.

Thus, given any A ∈ S, there exists a set B ∈ B[1, 10) with

A = R+ ∩ S^{−1}(B) = ⋃_{k∈Z} 10^k B.

Since S(A) = B, it follows that (2.2) holds for all A ∈ S.

To prove (2.3), first observe that by Proposition 2.5(i) the significand function S is completely determined by the significant digits D1, D2, D3, ..., so σ(S) ⊂ σ(D1, D2, D3, ...) and hence S ⊂ R+ ∩ σ(D1, D2, D3, ...). Conversely, by Proposition 2.5(ii), every Dm is determined by S; thus σ(Dm) ⊂ σ(S) for all m ∈ N, showing that σ(D1, D2, D3, ...) ⊂ σ(S) as well. To verify the remaining equality in (2.3), note that for every A ∈ S, S(A) ∈ B[1, 10) and hence A = ⋃_{k∈Z} 10^k B for B = S(A), by (2.2). Conversely, every set of the form ⋃_{k∈Z} 10^k B = R+ ∩ S^{−1}(B) with B ∈ B[1, 10) clearly belongs to S. ∎

Note that for every A ∈ S there is a unique set B ∈ B[1, 10) such that A = ⋃_{k∈Z} 10^k B, and (2.2) shows that in fact B = S(A).

Example 2.12. The set A4 of positive numbers with first significant digit 1 and all other significant digits 0,

A4 = {10^k : k ∈ Z} = {..., 0.01, 0.1, 1, 10, 100, ...},

belongs to S. This can be seen either by observing that A4 is the set of positive reals with significand exactly equal to 1, i.e., A4 = R+ ∩ S^{−1}({1}), or by noting that A4 = {x > 0 : D1(x) = 1, Dm(x) = 0 for all m ≥ 2}, or by using (2.3) and the fact that A4 = ⋃_{k∈Z} 10^k {1} and {1} ∈ B[1, 10).

Example 2.13. The singleton set {1} and the interval [1, 2] do not belong to S, since the number 1 cannot be distinguished from the number 10 using only significant digits, as both have first significant digit 1 and all other significant digits 0. Nor, for the same reason, can the interval [1, 2] be distinguished from the interval [10, 20]. Formally, neither of these sets is of the form ⋃_{k∈Z} 10^k B for any B ∈ B[1, 10).


Although the significand function and σ-algebra above were defined in the setting of real numbers, the same concepts carry over immediately to the most fundamental setting of all, namely, the set of positive integers N. In this case, the induced σ-algebra SN on N is interesting in its own right. Note that, as in the case of the significand σ-algebra, SN does not include all subsets of positive integers, i.e., it is a smaller σ-algebra than the power set of N.

Example 2.14. The restriction SN of S to subsets of N, i.e.,

SN = {N ∩ A : A ∈ S},

is a σ-algebra on N. A characterization of SN analogous to that of S given in Theorem 2.11 is as follows: Denote by N10 the set of all positive integers not divisible by 10, i.e., N10 = N\10N. Then

SN = {A ⊂ N : A = ⋃_{k∈N0} 10^k B for some B ⊂ N10}.

A typical member of SN is

{271, 2710, 3141, 27100, 31410, 271000, 314100, ...}.

For instance, note that the set {31410, 314100, 3141000, ...} does not belong to SN: all significant digits of 31410 and 3141 are identical, so the two numbers are indistinguishable in terms of significant digits, and if the former belonged to some A ∈ SN then the latter would too. Note also that the corresponding significand function on N still only takes values in [1, 10), as before, but may never be an irrational number. In fact, the possible values of S on N are even more restricted: S(n) = t for some n ∈ N if and only if t ∈ [1, 10) and 10^k t ∈ N for some integer k ≥ 0.

The next lemma establishes some basic closure properties of the significand σ-algebra that will be essential later in studying characteristic aspects of Benford's law such as scale- and base-invariance. To formulate these properties concisely, for every C ⊂ R+ and n ∈ N, let C^{1/n} = {x > 0 : x^n ∈ C}.

Lemma 2.15. The following properties hold for the significand σ-algebra S:

(i) S is self-similar with respect to multiplication by integer powers of 10, i.e., 10^k A = A for every A ∈ S and k ∈ Z;

(ii) S is closed under multiplication by scalars, i.e., aA ∈ S for every A ∈ S and a > 0;

(iii) S is closed under integral roots, i.e., A^{1/n} ∈ S for every A ∈ S and n ∈ N.

Informally, property (i) says that every significand set remains unchanged when multiplied by an integer power of 10, reflecting the simple fact that shifting the decimal point keeps all the significant digits, and hence the set itself, unchanged; (ii) asserts that if every element of a set expressible solely in terms of significant digits is multiplied by a positive constant, then the new set is also expressible by significant digits; similarly, (iii) states that the collection of square (cube, fourth, etc.) roots of the elements of every significand set is also expressible in terms of significant digits alone.

Proof. (i) This is clear from (2.2) since S(10^k A) = S(A) for every k ∈ Z.

(ii) Given A ∈ S, by (2.3) there exists B ∈ B[1, 10) such that A = ⋃_{k∈Z} 10^k B. In view of (i), assume without loss of generality that 1 < a < 10. Then

aA = ⋃_{k∈Z} 10^k aB = ⋃_{k∈Z} 10^k ((aB ∩ [a, 10)) ∪ ((1/10)aB ∩ [1, a))) = ⋃_{k∈Z} 10^k C,

with C = (aB ∩ [a, 10)) ∪ ((1/10)aB ∩ [1, a)) ∈ B[1, 10), showing that aA ∈ S.

(iii) Since intervals of the form [1, 10^s] with 0 < s < 1 generate B[1, 10), i.e., since B[1, 10) = σ({[1, 10^s] : 0 < s < 1}), it is enough to verify the claim for the special case A = ⋃_{k∈Z} 10^k [1, 10^s] for every 0 < s < 1. In this case,

A^{1/n} = ⋃_{k∈Z} 10^{k/n} [1, 10^{s/n}] = ⋃_{k∈Z} ⋃_{j=0}^{n−1} 10^k [10^{j/n}, 10^{(j+s)/n}] = ⋃_{k∈Z} 10^k C,

with C = ⋃_{j=0}^{n−1} [10^{j/n}, 10^{(j+s)/n}] ∈ B[1, 10). Hence A^{1/n} ∈ S. ∎

By Theorem 2.11, the significand σ-algebra S is the same as the significant digit σ-algebra σ(D1, D2, D3, ...), so the closure properties established in Lemma 2.15 carry over to sets determined by significant digits. Note also that S is not closed under taking integer powers: If A ∈ S and n ∈ N, then it is easy to check that A^n ∈ S if and only if

S(A)^n = B ∪ 10B ∪ ... ∪ 10^{n−1} B for some B ∈ B[1, 10).

The next example illustrates closure under multiplication by a scalar and under integral roots, as well as lack of closure under powers.

Example 2.16. Let A5 be the set of positive real numbers with first significant digit less than 3, i.e.,

A5 = {x > 0 : D1(x) < 3} = {x > 0 : 1 ≤ S(x) < 3} = ⋃_{k∈Z} 10^k [1, 3).

Then

2A5 = {x > 0 : D1(x) ∈ {2, 3, 4, 5}} = {x > 0 : 2 ≤ S(x) < 6} = ⋃_{k∈Z} 10^k [2, 6) ∈ S,

and also

A5^{1/2} = {x > 0 : S(x) ∈ [1, √3) ∪ [√10, √30)} = ⋃_{k∈Z} 10^k ([1, √3) ∪ [√10, √30)) ∈ S,

whereas, on the other hand, clearly

A5^2 = ⋃_{k∈Z} 10^{2k} [1, 9) ∉ S,

since [1, 9) ⊂ A5^2 but [10, 90) ⊄ A5^2; see Figure 2.2.
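A quick numerical illustration of Example 2.16 (our Python sketch, not from the book): membership in a significand set depends on S(x) only, and this is exactly what fails for A5^2.

```python
import math

def significand(x: float) -> float:
    lg = math.log10(abs(x))
    return 10 ** (lg - math.floor(lg))

def in_A5(x: float) -> bool:
    """A5 = {x > 0 : 1 <= S(x) < 3}, i.e. first significant digit 1 or 2."""
    return 1 <= significand(x) < 3

def in_A5_squared(x: float) -> bool:
    """x lies in A5^2 iff its positive square root lies in A5."""
    return in_A5(math.sqrt(x))
```

Here 4 = 2^2 and 40 share the significand 4, yet only 4 belongs to A5^2 (since √40 ≈ 6.32 is not in A5); no set defined purely in terms of significands can separate 4 from 40, which is why A5^2 ∉ S.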

Figure 2.2: The significand σ-algebra S is closed under multiplication by scalars and under integral roots but not under integral powers (bottom), as seen here for the set A5 of Example 2.16.

Example 2.17. Recall the significand σ-algebra SN on the positive integers defined in Example 2.14. Unlike its continuous counterpart S, the family SN is not even closed under multiplication by a positive integer, since, for example,

A6 = N ∩ {x > 0 : S(x) = 2} = {2, 20, 200, ...} ∈ SN, but

5A6 = {10, 100, 1000, ...} ∉ SN.

Of course, this does not rule out the fact that some events determined by significant digits, i.e., some members of SN, still belong to SN after multiplication by an integer. For example, if

A7 = {n ∈ N : D1(n) = 1} = {1, 10, 11, ..., 18, 19, 100, 101, ...} ∈ SN

then

3A7 = {3, 30, 33, ..., 54, 57, 300, 303, ...} ∈ SN.

It is easy to see that, more generally, SN is closed under multiplication by m ∈ N precisely if the greatest common divisor gcd(m, 10) equals 1, that is, whenever m and 10 have no nontrivial common factor. Moreover, like S, the σ-algebra SN is closed under integral roots: If A = ⋃_{k∈N0} 10^k B with B ⊂ N10 then A^{1/n} = ⋃_{k∈N0} 10^k B^{1/n} ∈ SN. With A7 from above, for instance,

A7^{1/2} = {n ∈ N : S(n) ∈ [1, √2) ∪ [√10, √20)} = {1, 4, 10, 11, 12, 13, 14, 32, 33, ..., 43, 44, 100, 101, ...} ∈ SN.

Thus many of the conclusions drawn later for positive real numbers carry over to positive integers in a straightforward way.

Chapter Three
The Benford Property

In order to translate the informal versions (1.1), (1.2), and (2.1) of Benford's law into more precise formal statements, it is necessary to specify exactly what the Benford property means in various mathematical contexts. For the purpose of this book, the objects of interest fall mainly into three categories: sequences of real numbers, real-valued functions defined on [0, +∞), and probability distributions and random variables.

Since (1.1), (1.2), and (2.1) are statements about probabilities, it is useful to first review the formal definition of probability. Given a non-empty set Ω and a σ-algebra A of subsets of Ω, recall that a probability measure on (Ω, A) is a function P : A → [0, 1] such that P(∅) = 0, P(Ω) = 1, and

P(⋃_{n∈N} An) = Σ_{n∈N} P(An)

whenever the sets An ∈ A are disjoint. The natural probabilistic interpretation of P is that, for every A ∈ A, the number P(A) ∈ [0, 1] is the probability that the event {ω ∈ Ω : ω ∈ A} occurs. Two of the most important examples of probability measures are the discrete uniform distribution on a non-empty finite set A ⊂ Ω, where the probability of any set B ⊂ Ω is simply

#(B ∩ A)/#A,

and its continuous counterpart the uniform distribution λa,b on a finite interval [a, b) with a < b, technically referred to as the (normalized) Lebesgue measure on [a, b), or more precisely on ([a, b), B[a, b)), given by

λa,b([c, d]) = (d − c)/(b − a) for every [c, d] ⊂ [a, b). (3.1)

In advanced analysis courses, it is shown that (3.1) does indeed entail a unique, consistent definition of λa,b(B) for every B ∈ B[a, b), and λa,b([a, b)) = 1. Another example of a probability measure, on any (Ω, A), is the Dirac measure (or point mass) concentrated at some ω ∈ Ω, symbolized by δω. In this case, δω(A) = 1 if ω ∈ A, and δω(A) = 0 otherwise. Throughout, probability measures on (Ω, A) with Ω ⊂ R and A ⊂ B will typically be denoted by capital Roman letters P, Q, etc.

3.1 BENFORD SEQUENCES

A sequence (xn) = (x1, x2, ...) of real numbers is a (base-10) Benford sequence if, as N → ∞, the limiting proportion of indices n ≤ N for which xn has first significant digit d exists and equals log(1 + d^{−1}) for all d ∈ {1, 2, ..., 9}, and similarly for the limiting proportions of the occurrences of all other finite blocks of initial significant digits. The formal definition is as follows.

Definition 3.1. A sequence (xn) of real numbers is a Benford sequence, or Benford for short, if

lim_{N→∞} #{1 ≤ n ≤ N : S(xn) ≤ t}/N = log t for all t ∈ [1, 10),

or, equivalently, if for all m ∈ N, all d1 ∈ {1, 2, ..., 9}, and all dj ∈ {0, 1, ..., 9}, j ≥ 2,

lim_{N→∞} #{1 ≤ n ≤ N : Dj(xn) = dj for j = 1, 2, ..., m}/N = log(1 + (Σ_{j=1}^{m} 10^{m−j} dj)^{−1}).
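Definition 3.1 can be probed empirically. The following Python sketch (ours; the cutoff N = 10^4 is an arbitrary choice) tabulates the first-digit proportions of the powers of 2, which the surrounding text asserts form a Benford sequence, using the fractional part of n·log10(2) to read off the first digit of 2^n without computing the huge integer itself.

```python
import math

LOG2 = math.log10(2)

def first_digit_pow2(n: int) -> int:
    """First significant digit of 2**n, via the fractional part of n*log10(2)."""
    frac = (n * LOG2) % 1.0
    return int(10 ** frac)          # significand of 2**n is 10**frac in [1, 10)

N = 10_000
counts = [0] * 10
for n in range(1, N + 1):
    counts[first_digit_pow2(n)] += 1

props = [counts[d] / N for d in range(1, 10)]            # empirical proportions
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]  # log(1 + 1/d)
deviation = max(abs(p - b) for p, b in zip(props, benford))
```

For N = 10^4 the maximal deviation over d is already below 0.01; by Definition 3.1 it tends to 0 as N → ∞.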

As will be shown below, the sequence of powers of 2, i.e., (2^n) = (2, 4, 8, ...), is Benford. Note, however, that (2^n) is not base-2 Benford: the second significant digit base 2 of 2^n is 0 for every n, i.e., D2^{(2)}(2^n) = 0, whereas the generalized version (1.5) of Benford's law requires that

Prob(D2^{(2)} = 0) = 1 − Prob(D2^{(2)} = 1) = log_2 3 − 1 > 1/2.

Similarly, as will be seen below, the powers of 3, i.e., (3^n) = (3, 9, 27, ...), also form a Benford sequence, and so do the sequence of factorials (n!) and the sequence of Fibonacci numbers (Fn). Common sequences that are not Benford include the positive integers (n), the powers of 10, i.e., (10^n), and the sequence of logarithms (log n).

The notion of Benford sequence given in Definition 3.1 offers a natural interpretation of Prob in the informal expressions (1.1)–(1.3): A sequence of real numbers (xn) = (x1, x2, ...) is Benford if, when one of the first N entries x1, x2, ..., xN is chosen (uniformly) at random, the probability that its first significant digit is d approaches the Benford probability log(1 + d^{−1}) as N → ∞ for every d ∈ {1, 2, ..., 9}, and similarly for all other blocks of significant digits.

Example 3.2. Two specific sequences of positive integers will be used repeatedly to illustrate key concepts concerning Benford's law: the Fibonacci numbers and the prime numbers. Both sequences play prominent roles in many areas of mathematics.

(i) As will be seen in Example 4.18 below, the sequence of Fibonacci numbers (Fn) = (1, 1, 2, 3, 5, 8, 13, ...), where every entry is the sum of its two predecessors, and F1 = F2 = 1, is Benford. The first N = 10^2 elements of the Fibonacci

sequence already conform very well to the first-digit version (1.1) of Benford's law, with Prob being interpreted as relative frequency; see Figures 1.5 and 3.1. The conformance becomes even better if the first N = 10^4 elements are considered; see Figure 3.3.

(ii) Example 4.17(v) below will establish that the sequence of prime numbers (pn) = (2, 3, 5, 7, 11, 13, 17, ...) is not Benford. Figure 3.2 illustrates how poorly the first hundred prime numbers conform to (1.1). The conformance is even worse if the first ten thousand primes are considered; see Figure 3.3.

Figure 3.1: The first one hundred Fibonacci numbers conform to the first-digit law (1.1) quite well (∆ = 1.88).

Alternative notions of Benford sequences

The notion of Benford sequence specified in Definition 3.1 is based on the classical concept of natural density of subsets of the positive integers. Recall that a subset C of N is said to have (natural) density ρ if

lim_{N→∞} #{1 ≤ n ≤ N : n ∈ C}/N = ρ. (3.2)
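The ratio in (3.2) can be computed exactly by counting decade by decade. The Python sketch below (ours, not the book's) does this for the set of positive integers with leading digit 1, for which, as discussed in Chapter 1 and again below, the ratio oscillates instead of converging.

```python
def count_leading_one(N: int) -> int:
    """#{1 <= n <= N : D_1(n) = 1}, counted one decade [10^k, 2*10^k) at a time."""
    total, p = 0, 1
    while p <= N:
        total += max(0, min(N, 2 * p - 1) - p + 1)
        p *= 10
    return total

# the ratio in (3.2) has no limit for this set:
low = count_leading_one(10 ** 7) / 10 ** 7                    # about 1/9 = 0.111...
high = count_leading_one(2 * 10 ** 7 - 1) / (2 * 10 ** 7 - 1) # about 5/9 = 0.555...
```

Along N = 10^k the ratio sinks toward 1/9, while along N = 2·10^k − 1 it climbs toward 5/9, the two extremes quoted below.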

For example, the set of even positive integers has density 1/2; the set of integral multiples of 3, i.e., the set {3n : n ∈ N}, has density 1/3; and the set of prime

Figure 3.2: The first one hundred prime numbers conform poorly to the first-digit law (1.1) (∆ = 10.30).

numbers and all finite sets have density 0. In these terms, Definition 3.1 says that a sequence of real numbers (xn) is Benford if and only if the density of the set {n ∈ N : S(xn) ≤ t} exists and is log t for all t ∈ [1, 10). For instance, since the Fibonacci sequence (Fn) is Benford (see Example 4.18 below), this means that the limiting proportion of Fibonacci numbers that start with a 1 exists and is log 2. In other words, the density of {n ∈ N : D1(Fn) = 1} is exactly log 2.

Not all subsets of N have natural densities, however, as was seen in Chapter 1 with the set C = {n ∈ N : D1(n) = 1} of positive integers with leading digit 1, for which

lim inf_{N→∞} #{1 ≤ n ≤ N : n ∈ C}/N = 1/9 < 5/9 = lim sup_{N→∞} #{1 ≤ n ≤ N : n ∈ C}/N,

so the limit in (3.2) does not exist. Thus, in particular, the sequence of positive integers (n) is not Benford in the natural density sense of Definition 3.1.

Apart from natural density there are many other important notions of density in use, especially in analysis and number theory; see [68, 104] for recent surveys. To understand how these notions are related, note first that (3.2) can equivalently be written as

lim_{N→∞} (1/N) Σ_{n=1}^{N} 1C(n) = ρ, (3.3)

where 1C is the indicator (or characteristic) function of C ⊂ N, i.e., 1C(n) = 1 if n ∈ C, and 1C(n) = 0 otherwise. Thus the concept of natural density is based on the arithmetic mean (Cesàro average), the uniform weight assigning 1/N to each of the first N terms of the sequence (1C(n)). While this is arguably the simplest notion of density, often alternative definitions are considered that are based on non-uniform weights. Two such alternative densities in particular, the logarithmic density and the H∞-density, have been studied extensively in


Figure 3.3: The Fibonacci sequence (blue data) is Benford, hence the deviation ∆ from (1.1) goes to 0 as N → ∞; this is clearly not the case for the sequence of prime numbers (red data).

the context of Benford sequences [104, 126, 127, 141]. Both of these densities assign lighter weights to later terms, and unlike the uniformly weighted natural density underlying the notion of Benford sequence in Definition 3.1, both of these alternative densities assign the value log 2 to the set of positive integers with leading digit 1.

The logarithmic (or harmonic) density, a powerful tool in analytic number theory, replaces the uniform weights 1/N in (3.3) with the strictly decreasing weights (1/n) · 1/(1 + 1/2 + ... + 1/N). Recall that lim_{N→∞} (1 + 1/2 + ... + 1/N)/ln N = 1. Accordingly, a set C ⊂ N is said to have logarithmic density ρ if

lim_{N→∞} (1/ln N) Σ_{n=1}^{N} (1/n) 1C(n) = ρ. (3.4)
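A direct numerical check of (3.4) is straightforward; the sketch below is ours (the cutoff N = 10^5 is arbitrary, and the convergence is only logarithmically fast, so the estimate is still a few percent off at this range).

```python
import math

def logarithmic_density_estimate(indicator, N: int) -> float:
    """(1 / ln N) * sum_{n=1}^{N} indicator(n)/n, the quantity in (3.4)."""
    total = sum(1.0 / n for n in range(1, N + 1) if indicator(n))
    return total / math.log(N)

leads_with_one = lambda n: str(n)[0] == "1"
est = logarithmic_density_estimate(leads_with_one, 10 ** 5)
# est slowly approaches log 2 = 0.30103...
```

Unlike the arithmetic ratio in (3.2), which keeps oscillating between 1/9 and 5/9 forever, this weighted average settles down toward log 2.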

It is well known (e.g., [104]) that the natural density is strictly stronger than the logarithmic density. This means that every set with natural density ρ has logarithmic density ρ as well, but there are subsets C of N which have a logarithmic density yet do not have a natural density, i.e., the limit in (3.4) does exist whereas the limit in (3.3) does not. A prominent example is the set of positive integers with leading digit 1, i.e., C = {n ∈ N : D1 (n) = 1}, whose logarithmic density is log 2 whereas, as seen above, its natural density does not exist. In


terms of logarithmic density, a sequence of real numbers (x_n) is sometimes called logarithmic Benford (e.g., [104]) if, for all t ∈ [1, 10), the logarithmic density of {n ∈ N : S(x_n) ≤ t} is log t. For example, in contrast to the fact that neither the sequence of positive integers (n) nor the sequence of prime numbers (p_n) is Benford (see Figures 3.2, 3.3, and Example 4.17(v) below), both sequences are logarithmic Benford (e.g., [104, 127]). On the other hand, since the natural density is stronger than the logarithmic density, every sequence that is Benford, such as (2^n) and (F_n), is automatically logarithmic Benford as well.

A second alternative density that, like the logarithmic density, also assigns the "Benford value" log t to the set of positive integers with significands not exceeding t, is the so-called H_∞-density (e.g., [59, 104, 141]). The H_∞-density, an iterated averaging method, is defined inductively as follows: For any set C ⊂ N, let

    H_{0,n} = 1_C(n) for n ∈ N   and   H_{m,n} = (1/n) Σ_{j=1}^{n} H_{m−1,j} for m, n ∈ N .

By definition, the set C has H_∞-density ρ if

    lim_{m→∞} lim inf_{n→∞} H_{m,n} = ρ = lim_{m→∞} lim sup_{n→∞} H_{m,n} .

It can be shown that the natural density is strictly stronger than the H_∞-density, which in turn is strictly stronger than the logarithmic density [104]. As with the logarithmic density, the sequence of positive integers (n) is also H_∞-Benford in the sense that every set {n ∈ N : S(n) ≤ t} has H_∞-density log t. Conversely, as with logarithmic density, every Benford sequence is also H_∞-Benford. Some sequences such as (10^n), of course, are not Benford in any of these senses.

As will become apparent in subsequent chapters, one of the simplest but also most fundamental examples in this book is the sequence (a^n) with a > 0. For every such sequence, the notions of logarithmic Benford and H_∞-Benford are both equivalent to Definition 3.1. Another basic example is the sequence (a^{b^n}) with b > 1. As will be explained in Section 6.4, this sequence is Benford for (Lebesgue) almost every, but not every, a > 0. Again this situation would remain completely unchanged if, instead of Definition 3.1, the notion of logarithmic Benford or H_∞-Benford were used throughout.

Many other notions of density exist, and can also be used to define alternative concepts of Benford sequences. Arithmetic averages, however, play a central role in probability and statistics, as evidenced by the strong law of large numbers, the classical ergodic theorems, and the empirical distribution functions appearing in the fundamental Glivenko–Cantelli Theorem. Thus it is the arithmetic average, with its uniform weights, that is used exclusively in this book for the definition of Benford sequence. Extensions to settings of logarithmic density, H_∞-density, and other notions of density are beyond the scope of this book, and a possible topic for future research.
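The iterated averaging behind the H_∞-density can be sketched numerically. The code below (an illustration only; the definition involves iterated limits, not a fixed cutoff N) computes a few levels H_{m,n} for the leading-digit-1 set and shows how averaging damps the oscillation of the natural-density averages.

```python
import math

def leading_digit(n: int) -> int:
    while n >= 10:
        n //= 10
    return n

N = 10**4
# H_{0,n} = 1_C(n) for C = {n : D1(n) = 1}
H0 = [1.0 if leading_digit(n) == 1 else 0.0 for n in range(1, N + 1)]

def average_once(h):
    # one averaging step: H_{m,n} = (1/n) * sum_{j=1}^{n} H_{m-1,j}
    out, s = [], 0.0
    for j, v in enumerate(h, start=1):
        s += v
        out.append(s / j)
    return out

H1 = average_once(H0)   # m = 1: the natural-density partial averages
H2 = average_once(H1)   # m = 2: averages of averages
H3 = average_once(H2)   # m = 3
print(round(H1[-1], 4), round(H2[-1], 4), round(H3[-1], 4))
```

At n = 10^4 the level-1 average sits at a trough of its oscillation, far from log 2 ≈ 0.301, while the higher levels are noticeably closer; full convergence requires the iterated limits in the definition.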

CHAPTER 3

3.2 BENFORD FUNCTIONS

Benford's law also appears frequently in real-valued functions such as those arising as solutions of initial value problems for differential equations; see Chapters 6 and 7. Thus, the starting point here is to define what it means for a function to follow Benford's law. In direct analogy with the terminology for sequences, a function f is a (base-10) Benford function, or simply Benford, if the limiting proportion of time τ < T that the first digit of f(τ) equals d_1 is exactly log(1 + d_1^{−1}), and more generally, if for all t in [1, 10), the proportion of time τ < T that the significand of f is less than or equal to t approaches log t as T → +∞.

To formalize this notion, first recall that a function f : R → R is (Borel) measurable if f^{−1}(J) is a Borel set, i.e., f^{−1}(J) ∈ B for every interval J ⊂ R. With the terminology introduced in Section 2.3, this is equivalent to saying that every set in the σ-algebra generated by f is a Borel set, i.e., σ(f) ⊂ B. Slightly more generally, for any set Ω and any σ-algebra A on Ω, a function f : Ω → R is (Borel) measurable if σ(f) ⊂ A. The collection of Borel measurable functions f : R → R contains all functions of practical interest. For example, every piecewise continuous function (meaning that f has at most countably many discontinuities) is measurable. Thus every polynomial, trigonometric, and exponential function is Borel measurable, and so is every probability density function of any relevance. In fact, it is a difficult exercise to produce a function that is not measurable, or a set C ⊂ R that is not a member of B, and this can be done only in a non-constructive way.
For all practical purposes, therefore, the reader may simply read "set" for "Borel set," and "function" for "Borel measurable function."

Recall that given a set Ω and a σ-algebra A on Ω, a measure µ on (Ω, A) is a function µ : A → [0, +∞] that has all the properties of a probability measure, except that µ(A) may also be smaller or larger than 1, and even infinite. A very important example of a measure is the so-called Lebesgue measure on (R, B), denoted by λ here and throughout. The basic defining property of λ is that λ([a, b]) = b − a for every interval [a, b] ⊂ R, that is, λ is an extension of the informal notion of length. This measure λ is related to the probability measures λ_{a,b} considered in (3.1), since, e.g., λ(B) = lim_{N→∞} 2N λ_{−N,N}(B ∩ [−N, N]) for every B ∈ B. It is customary to also use the symbol λ, often without a subscript, to denote the restriction of Lebesgue measure to (C, B(C)), with the Borel set C being clear from the context. The next definition formalizes the notion of a function having the Benford property.

Definition 3.3. A (Borel measurable) function f : [0, +∞) → R is Benford if

    lim_{T→+∞} λ({τ ∈ [0, T) : S(f(τ)) ≤ t}) / T = log t for all t ∈ [1, 10) ,


or, equivalently, if for all m ∈ N, all d_1 ∈ {1, 2, . . . , 9}, and all d_j ∈ {0, 1, . . . , 9}, j ≥ 2,

    lim_{T→+∞} λ({τ ∈ [0, T) : D_j(f(τ)) = d_j for j = 1, 2, . . . , m}) / T = log(1 + (Σ_{j=1}^{m} 10^{m−j} d_j)^{−1}) .

Directly analogous to the probabilistic interpretation of a Benford sequence, the definition of a Benford function given in Definition 3.3 also offers a natural probabilistic interpretation: A function f : [0, +∞) → R is Benford if, when a time τ is chosen (uniformly) at random in [0, T), the probability that the first digit of f(τ) equals d approaches log(1 + d^{−1}) as T → +∞, for every d ∈ {1, 2, . . . , 9}, and similarly for all other blocks of significant digits. As will be seen in Example 4.9 below, the function f(t) = e^{at} is Benford whenever a ≠ 0, but the functions f(t) = t and f(t) = (sin t)^2 are not Benford.


Figure 3.4: The function f_1(t) = e^{t/2} is Benford, but the function f_2(t) = t is not.
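The occupation-time fractions of Definition 3.3 can be estimated numerically. The sketch below uses a grid as a crude stand-in for Lebesgue measure; the horizon T, grid size, and tolerances are arbitrary choices, and only illustrate the limit.

```python
import math

def significand(x: float) -> float:
    # S(x) = 10^<log10|x|> for x != 0, with the convention S(0) = 0
    if x == 0:
        return 0.0
    lx = math.log10(abs(x))
    return 10 ** (lx - math.floor(lx))

def time_fraction(f, T: float, t: float, steps: int = 200_000) -> float:
    # grid estimate of lambda({tau in [0,T) : S(f(tau)) <= t}) / T
    h = T / steps
    return sum(1 for i in range(steps) if significand(f(i * h)) <= t) / steps

T = 500.0
# f1(tau) = e^tau is Benford: the fractions approach log10 t.
for t in (2.0, 5.0):
    print(t, round(time_fraction(math.exp, T, t), 3), round(math.log10(t), 3))
# f2(tau) = tau is not Benford: at T = 500 the fraction for t = 2 is about 0.222.
print(round(time_fraction(lambda x: x, T, 2.0), 3))
```

For the identity function the fraction oscillates with T and never settles at log t, which is exactly the failure of the Benford property.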

3.3 BENFORD DISTRIBUTIONS AND RANDOM VARIABLES

Benford's law appears in a wide variety of statistics and probability settings, including: mixtures of random samples; stochastic models such as geometric Brownian motion that are of great importance for modeling real-world processes; and products of independent, identically distributed random variables. (The term independent, identically distributed will henceforth be abbreviated i.i.d., in accordance with standard practice.) This section lays the foundations for analyzing the Benford property for probability distributions and random variables.

Recall from Section 2.3 that a probability space is a triple (Ω, A, P) where Ω is a non-empty set (the set of outcomes), A is a σ-algebra on Ω (the family of events), and P is a probability measure on (Ω, A). A (real-valued) random variable X on (Ω, A, P) is simply a Borel measurable function X : Ω → R, and its distribution P_X is the probability measure on (R, B) defined by

    P_X((−∞, x]) = P(X ≤ x) for all x ∈ R .

A probability measure on (R, B) will be referred to as a Borel probability measure on R. Again, since all subsets of R of any practical interest are Borel sets, the modifier "Borel" will be suppressed unless there is a potential for confusion, i.e., the reader may read "probability measure on R" for "Borel probability measure on R." Every probability measure P on R is uniquely determined by its distribution function F_P, defined as

    F_P(x) = P((−∞, x]) for all x ∈ R .

As is well known and easy to check, the function F_P is right-continuous and non-decreasing, with lim_{x→−∞} F_P(x) = 0 and lim_{x→+∞} F_P(x) = 1. For the sake of notational simplicity, write F_X instead of F_{P_X} for every random variable X. The probability measure P, or any random variable X with P_X = P, is continuous (or atomless) if P({x}) = 0 for every x ∈ R, or, equivalently, if the function F_P is continuous. The probability measure P is absolutely continuous (a.c.) if, for any B ∈ B, P(B) = 0 whenever λ(B) = 0. By the Radon–Nikodym Theorem, this is equivalent to P having a density, i.e., there exists a measurable function f_P : R → [0, +∞) such that

    P([a, b]) = ∫_a^b f_P(x) dx for all [a, b] ⊂ R .   (3.5)

Again, for simplicity, write f_X instead of f_{P_X} for every a.c. random variable X. Note that (3.5) implies ∫_{−∞}^{+∞} f_P(x) dx = 1. Every a.c. probability measure on (R, B) is continuous, but not vice versa; e.g., see [36].

Definition 3.4. A Borel probability measure P on R is Benford if

    P({x ∈ R : S(x) ≤ t}) = P(S^{−1}({0} ∪ [1, t])) = log t for all t ∈ [1, 10) .

A random variable X on a probability space (Ω, A, P) is Benford if P_X is Benford, i.e., if

    F_{S(X)}(t) = P(S(X) ≤ t) = P_X({x ∈ R : S(x) ≤ t}) = log t for all t ∈ [1, 10) ,

or, equivalently, if S(X) is an absolutely continuous random variable with density f_{S(X)}(t) = t^{−1} log e for t ∈ [1, 10). In terms of its significant digits, X is Benford if and only if

    P(D_j(X) = d_j for j = 1, 2, . . . , m) = log(1 + (Σ_{j=1}^{m} 10^{m−j} d_j)^{−1}) ,

for all m ∈ N, all d_1 ∈ {1, 2, . . . , 9}, and all d_j ∈ {0, 1, . . . , 9}, j ≥ 2.


Figure 3.5: The distribution functions (top) and densities of S(X) and log S(X), respectively, for every Benford random variable X.

Example 3.5. If X is a Benford random variable on some probability space (Ω, A, P), then

    P(D_1(X) = 1) = P(1 ≤ S(X) < 2) = log 2 = 0.3010 . . . ,
    P(D_1(X) = 9) = log(10/9) = 0.04575 . . . ,
    P(D_1(X) = 3, D_2(X) = 1, D_3(X) = 4) = log(315/314) = 0.001380 . . . .

It follows easily from Definition 3.4 that for every Benford random variable X, the significant digits D_1(X), D_2(X), . . . of X are asymptotically independent and uniformly distributed, i.e.,

    lim_{n→∞} P(D_{n+1}(X) = d_1, D_{n+2}(X) = d_2, . . . , D_{n+m}(X) = d_m) = 10^{−m}
    for all m ∈ N and all d_1, d_2, . . . , d_m ∈ {0, 1, . . . , 9} .   (3.6)

There is nothing special about the Benford distribution in this sense, as (3.6) holds for every random variable with a density [76, Cor. 4.4]. As the following example shows, there are many Benford probability measures on the positive real numbers, and correspondingly many positive random variables that are Benford.

Example 3.6. (i) If U is a random variable uniformly distributed on [0, 1), then the random variable X = 10^U and its probability distribution P_X are both Benford, i.e., X = S(X) is absolutely continuous with density t^{−1} log e for


t ∈ [1, 10). This follows since

    P(S(X) ≤ t) = P(X ≤ t) = P(10^U ≤ t) = P(U ≤ log t) = log t for all t ∈ [1, 10) .

On the other hand, Y = 2^U is clearly not Benford, since P(S(Y) > 2) = 0 ≠ 1 − log 2.

(ii) More generally, for every integer k, the probability measure P_k with density f_k(x) = x^{−1} log e on [10^k, 10^{k+1}) is Benford, and so is (1/2)(P_k + P_{k+1}). In fact, every convex combination of the (P_k)_{k∈Z}, i.e., every probability measure Σ_{k∈Z} q_k P_k with 0 ≤ q_k ≤ 1 for all k and Σ_{k∈Z} q_k = 1, is Benford.

(iii) The absolutely continuous random variable X with density

    f_X(x) = x^{−2}(x − 1) log e for x ∈ [1, 10) ,  f_X(x) = 10 x^{−2} log e for x ∈ [10, 100) ,  f_X(x) = 0 otherwise ,

is Benford, since the random variable S(X) is also absolutely continuous, with density t^{−1} log e for t ∈ [1, 10), even though X is not of the type of distribution considered in (ii).

Definition 3.7. The Benford distribution B is the unique probability measure on (R^+, S) with

    B(S ≤ t) = B(∪_{k∈Z} 10^k [1, t]) = log t for all t ∈ [1, 10) ,

or, equivalently, for all m ∈ N, all d_1 ∈ {1, 2, . . . , 9}, and all d_j ∈ {0, 1, . . . , 9}, j ≥ 2,

    B(D_j = d_j for j = 1, 2, . . . , m) = log(1 + (Σ_{j=1}^{m} 10^{m−j} d_j)^{−1}) .

Given any probability P on (R, B), let |P| denote the probability given by

    |P|(B) = P({x ∈ R : |x| ∈ B}) for all B ∈ B .

Clearly, |P| is concentrated on [0, +∞), i.e., |P|([0, +∞)) = 1, and

    F_{|P|}(x) = 0 if x < 0 ,  F_{|P|}(x) = F_P(x) − F_P(−x) + P({−x}) if x ≥ 0 ;

in particular, therefore, if P is continuous or a.c., then so is |P|, its density in the latter case being f_P(x) + f_P(−x), for all x ≥ 0, where f_P is the density of P.


The next theorem characterizes Benford random variables in terms of the Benford distribution and the moments (expected values of positive integral powers) of S(X). To this end, recall that the expectation, or expected (or mean) value of X is

    E[X] = ∫_Ω X dP = ∫_R x dP_X(x) ,

provided that this integral exists. More generally, for every measurable function g : R → R, the expectation of the random variable g(X) is

    E[g(X)] = ∫_Ω g(X) dP = ∫_R g(x) dP_X(x) .

In particular, if E[X] exists, then var X := E[(X − EX)^2] is the variance of X.

Theorem 3.8. Let X be a random variable. The following are equivalent:

(i) X is Benford;

(ii) P(|X| ∈ A) = B(A) for all A ∈ S;

(iii) E[S(X)^n] = (10^n − 1) n^{−1} log e for all n ∈ N.

Proof. The equivalence of (i) and (ii) follows immediately from Definitions 3.4 and 3.7. Statement (iii) follows from (i) by direct calculation, since Definition 3.4 implies that S(X) has density t^{−1} log e on [1, 10). Statement (iii) implies (i) since the distribution of every bounded random variable such as S(X) is uniquely determined by its moments; e.g., see [37, Exc. 6.6.5]. □

Note that the Benford distribution B is a probability distribution on the significant digits, or the significand, of the underlying data, and not on the raw data themselves. That is, B is a probability measure on the family of sets defined by the base-10 significand, i.e., on (R^+, S), but not on the larger (R^+, B^+) or the still larger (R, B). For example, the probability B({1}) is not defined, simply because the singleton set {1} cannot be defined in terms of significant digits or significands alone, and hence does not belong to the domain of B.

In the framework of Examples 2.14 and 2.17, it is tempting to call an N-valued random variable X, i.e., P(X ∈ N) = 1, or its distribution P_X, Benford on N if

    P(S(X) ≤ t) = P_X({n ∈ N : S(n) ≤ t}) = log t for all t ∈ [1, 10) .   (3.7)

However, no such random variable exists! To see this, for every n ∈ N_10 let A_n = ∪_{k∈N_0} 10^k {n} ∈ S_N, and note that N equals the disjoint union of the sets A_n, and that S(A_n) = {10^{⟨log n⟩}}; here ⟨log n⟩ ∈ [0, 1) denotes the fractional part of log n, that is, ⟨log n⟩ = log n − ⌊log n⌋. Thus the random variable S(X) is concentrated on the countable set B := S(N) = {10^{⟨log n⟩} : n ∈ N_10}, that is, P(S(X) ∈ B) = 1. On the other hand, (3.7) implies that S(X) is continuous, and so P(S(X) ∈ B) = 0. This contradiction shows that a random variable X


supported on N, i.e., with P(X ∈ N) = 1, cannot be Benford. However, given ε > 0, it is not hard to find a random variable X_ε supported on N with

    |P(S(X_ε) ≤ t) − log t| < ε for all t ∈ [1, 10) .   (3.8)

The next example shows two such random variables, each supported on a finite number of points. The first is distributed non-uniformly on an interval of positive integers, and the second is distributed uniformly on a non-uniform range of positive integers.

Example 3.9. (i) Fix N ∈ N and let Y_N be a random variable defined by

    P(Y_N = j) = 1/(j c_N) for j = 10^N, 10^N + 1, . . . , 10^{N+1} − 1 ,

where c_N = Σ_{j=10^N}^{10^{N+1}−1} j^{−1}. Note that Y_N has 9 · 10^N possible outcomes and may be thought of as a discrete approximation of the Benford random variable with distribution P_N in Example 3.6(ii). From

    P_{S(Y_N)} = c_N^{−1} Σ_{j=10^N}^{10^{N+1}−1} j^{−1} δ_{S(j)} = c_N^{−1} Σ_{j=1}^{10^{N+1}−10^N} (10^N + j − 1)^{−1} δ_{1+10^{−N}(j−1)} ,

together with the elementary estimate

    ln(M/L) < Σ_{j=L}^{M} j^{−1} < ln((M + 1)/(L − 1)) ,

valid for all L, M ∈ N with 2 ≤ L < M, it is straightforward to deduce that, for all 1 ≤ t < 10,

    |P(S(Y_N) ≤ t) − log t| < −log(1 − 10^{−N}) < (1/2) · 10^{−N} .

Thus (3.8) holds for X_ε = Y_N with N > |log ε|.

(ii) Fix N ∈ N, and let Z_N be the random variable uniformly distributed on the N positive integers ⌊10^{N+j/N}⌋, where j = 0, 1, . . . , N − 1, that is,

    P(Z_N = ⌊10^{N+j/N}⌋) = 1/N

for all j = 0, 1, . . . , N − 1 .

Note that S(⌊10^{N+j/N}⌋) ≈ S(10^{N+j/N}) = 10^{j/N}, and hence it is plausible that log S(Z_N) is approximately uniform on [0, 1) for large N. More precisely,

    10^{j/N} − 10^{−N} ≤ S(⌊10^{N+j/N}⌋) ≤ 10^{j/N} for all j = 1, 2, . . . , N − 1 ,

and therefore, for every N,

    #{1 ≤ j ≤ N − 1 : j ≤ N log t}/N ≤ P(S(Z_N) ≤ t) ≤ #{1 ≤ j ≤ N − 1 : j ≤ N log(t + 10^{−N})}/N + 1/N ,

which in turn implies that, for all 1 ≤ t < 10,

    |P(S(Z_N) ≤ t) − log t| ≤ log(1 + 10^{−N}) + N^{−1} ≤ N^{−1} log 11 .

Hence, given ε > 0, (3.8) is satisfied with X_ε = Z_N and N > ε^{−1} log 11. This example is reminiscent of the distribution shown in [23, Thm. 3.22] to be the most scale-invariant probability distribution on N points; cf. Section 5.1.
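The bound in part (i) is easy to verify directly for small N. The sketch below builds the distribution of S(Y_N) for N = 3 (an arbitrary choice) and checks the bound (3.8) with ε = (1/2)·10^{−N} over a grid of t values.

```python
import math

N = 3
js = range(10**N, 10**(N + 1))            # support of Y_N
cN = sum(1.0 / j for j in js)             # normalizing constant c_N

def cdf_S_YN(t: float) -> float:
    # P(S(Y_N) <= t); on the support, S(j) = j / 10^N, so S(j) <= t iff j <= t * 10^N
    return sum(1.0 / j for j in js if j <= t * 10**N) / cN

worst = max(abs(cdf_S_YN(1 + 9 * i / 200) - math.log10(1 + 9 * i / 200))
            for i in range(200))
print(worst < 0.5 * 10 ** (-N))
```

Note that c_N is close to ln 10, in keeping with Y_N being a discrete approximation of the density x^{−1} log e.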

None of the standard continuous probability distributions (e.g., uniform, exponential, normal, etc.) are Benford, and the next example illustrates this with five common distributions. Here, as well as on several later occasions, the quantity

    ∆_∞ := 100 · sup_{1≤t<10} |F_{S(X)}(t) − log t|

is used as a convenient measure of the distance of the random variable X from Benford's law.

Example 3.10. (i) If X is uniformly distributed on [0, 1), then for every 1 ≤ t < 10,

    F_{S(X)}(t) = Σ_{k=1}^{∞} 10^{−k}(t − 1) = (t − 1)/9 ,

which does not equal log t, so X is not Benford; numerically, ∆_∞ = 26.88 (see Figure 3.6).

(ii) If X is exponential with parameter 1, i.e., P(X > x) = e^{−x} for all x ≥ 0, then

    P(D_1(X) = 1) = Σ_{k∈Z} (e^{−10^k} − e^{−2·10^k}) > e^{−10^{−1}} − e^{−2·10^{−1}} + e^{−1} − e^{−2} + e^{−10} − e^{−20} = 0.3186 > log 2 ,

and hence exp(1) is not Benford either. An explicit calculation shows that, for every 1 ≤ t < 10,

    F_{S(X)}(t) = Σ_{k∈Z} (F_X(10^k t) − F_X(10^k)) = Σ_{k∈Z} (e^{−10^k} − e^{−10^k t}) .

Since F_{S(X)}(t) ≢ log t, the random variable X is not Benford. Numerically, one finds ∆_∞ = 3.05; see also Figure 3.6. Thus, even though X is not exactly Benford, it is close to being Benford in the sense that |P(S(X) ≤ t) − log t| is small for all t ∈ [1, 10); see [56, 96, 108] for a detailed analysis of the exponential distribution's relation to Benford's law.
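The series for F_{S(X)} converges fast enough that ∆_∞ for exp(1) can be evaluated directly; the truncation range K and the grid size below are ad hoc choices.

```python
import math

def F_S_exp(t: float, K: int = 12) -> float:
    # F_S(t) = sum_{k in Z} (exp(-10^k) - exp(-10^k * t)), truncated to |k| <= K
    return sum(math.exp(-(10.0 ** k)) - math.exp(-(10.0 ** k) * t)
               for k in range(-K, K + 1))

grid = [1 + 9 * i / 20000 for i in range(20000)]
delta_inf = 100 * max(abs(F_S_exp(t) - math.log10(t)) for t in grid)
print(round(delta_inf, 2))
```

The computed value agrees with the ∆_∞ = 3.05 reported in the text for the exponential distribution.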


(iii) If X has the Pareto distribution with parameter 1, i.e., P(X > x) = x^{−1} for all x ≥ 1, then for all 1 ≤ t < 10,

    F_{S(X)}(t) = P(X ∈ ∪_{k=0}^{∞} [10^k, 10^k t]) = Σ_{k=0}^{∞} (10^{−k} − 10^{−k} t^{−1}) = (10/9)(t − 1)/t ,

so X is not Benford.

(iv) If X has the arcsin distribution, i.e., P(X ≤ x) = (2/π) arcsin √x for all 0 ≤ x < 1, then for every 1 ≤ t < 10,

    F_{S(X)}(t) = P(S(X) ≤ t) = P(X ∈ ∪_{n∈N} 10^{−n} [1, t])
               = (2/π) Σ_{n=1}^{∞} (arcsin(10^{−n/2} √t) − arcsin(10^{−n/2}))
               = (2/π) Σ_{k=0}^{∞} ((2k)! / (2^{2k} (k!)^2 (2k + 1))) · (t^{k+1/2} − 1)/(10^{k+1/2} − 1) ,

and hence in particular

    F_{S(X)}(√10) = (2/π) Σ_{k=0}^{∞} ((2k)! / (2^{2k} (k!)^2 (2k + 1))) · 1/(10^{k/2+1/4} + 1)
                 < (2/π) Σ_{k=0}^{∞} ((2k)! / (2^{2k} (k!)^2 (2k + 1))) · 10^{−(k/2+1/4)}
                 = (2/π) arcsin(10^{−1/4}) = 0.3801 . . . < log √10 ,

so X is not Benford.

(v) If X is standard normal then, for every t ∈ [1, 10),

    P(S(X) ≤ t) = 2 Σ_{k∈Z} (Φ(10^k t) − Φ(10^k)) ,

where Φ is the distribution function of X, that is,

    Φ(x) = F_X(x) = P(X ≤ x) = (1/√(2π)) ∫_{−∞}^{x} e^{−τ²/2} dτ , x ∈ R .

A numerical calculation indicates that ∆_∞ = 6.05. Though larger than in the exponential case, the deviation of X from Benford's law is still rather small; see Figure 3.6.

As is suggested by the special cases in Example 3.10, none of the familiar classical probability distributions or random variables, such as uniform, exponential, Pareto, normal, beta, binomial, or gamma distributions, are exactly Benford. On the other hand, some standard parametrized families of distributions are arbitrarily close to being Benford for some values of the parameters, whereas other families of distributions are bounded strictly away from being Benford for all values of the parameters. The next two theorems illustrate this with the families of Pareto and positive uniform distributions, respectively.
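The closed form in Example 3.10(iii) is easy to confirm by simulation, using inverse-transform sampling for the 1-Pareto law; sample size, seed, and tolerances are arbitrary.

```python
import math
import random

random.seed(7)
n = 200_000

def significand(x: float) -> float:
    e = math.floor(math.log10(x))
    return x / 10 ** e

# X = 1/(1 - U) is Pareto with parameter 1: P(X > x) = 1/x for x >= 1.
sig = [significand(1.0 / (1.0 - random.random())) for _ in range(n)]
for t in (2.0, 5.0):
    emp = sum(1 for s in sig if s <= t) / n
    closed = (10 / 9) * (t - 1) / t        # closed form from Example 3.10(iii)
    print(t, round(emp, 3), round(closed, 3), round(math.log10(t), 3))
```

The empirical significand distribution matches (10/9)(t − 1)/t closely, and is visibly far from log t, as the text asserts.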


Figure 3.6: The distribution functions of S(X) (top) and their deviation from log t (bottom), where X has a uniform U(0,1) (∆_∞ = 26.88), exponential exp(1) (∆_∞ = 3.05), 1-Pareto (∆_∞ = 26.88), arcsin (∆_∞ = 28.77), and standard normal (∆_∞ = 6.05) distribution, respectively; see Example 3.10.

Theorem 3.11. Let X_α be a Pareto random variable with parameter α > 0, i.e., P(X_α > x) = x^{−α} for all x ≥ 1. Then max_{1≤t<10} (F_{S(X_α)}(t) − log t) is positive and decreases strictly to 0 as α ↘ 0; in particular, X_α is not Benford for any α > 0.

Proof. For every α > 0,

    F_{S(X_α)}(t) = P(X_α ∈ ∪_{k=0}^{∞} [10^k, 10^k t]) = Σ_{k=0}^{∞} (10^{−αk} − 10^{−αk} t^{−α}) = (t^{−α} − 1)/(10^{−α} − 1) = h_t(−α) .

For every 1 < t < 10, therefore, F_{S(X_α)}(t) − log t > 0 is strictly decreasing to 0 as α ↘ 0, and so is max_{1≤t<10} (F_{S(X_α)}(t) − log t). □

Corollary 3.12. Let X be a Pareto random variable with parameter α > 0. Then max_{1≤t<10} (F_{S(X^n)}(t) − log t) → 0 as n → ∞.

Proof. For every x ≥ 1, P(X^n > x) = P(X > x^{1/n}) = x^{−α/n}, showing that X^n is Pareto with parameter α/n. Hence the conclusion follows from Theorem 3.11. □

In contrast to Theorem 3.11, no uniform distribution is even close to Benford's law, no matter how large its range or where it is centered. This statement can be quantified explicitly for positive uniform distributions as follows; here ψ(δ) := (10/9)(1 − 10^{−δ}) − δ and δ* := 1 − log(9/ln 10) = 0.4079 . . . , so that (1/2) ψ(δ*) = 0.1344 . . . .

Theorem 3.13 ([12]). For every uniformly distributed positive random variable X,

    max_{1≤t<10} |F_{S(X)}(t) − log t| ≥ (1/2) ψ(δ*) > 0.134 .

Proof. Since S(10^k X) = S(X) for every k ∈ Z, it suffices to consider X uniformly distributed on (0, 10^δ] with δ ∈ [0, 1) and, substituting s = log t, to bound G_δ(s) := F_{⟨log X⟩}(s) − s on [0, 1]. A direct calculation shows that max_{0≤s≤1} G_δ(s) = G_δ(δ) = ψ(δ). If 0 ≤ δ < δ*, then G_δ has a negative minimum at s = 1 + δ − δ* > δ, with

    G_δ(1 + δ − δ*) = (10/9)(10^{−δ*} − 10^{−δ}) − δ + δ* = ψ(δ) − ψ(δ*) ;

hence max_{0≤s≤1} G_δ(s) = ψ(δ) as well as −min_{0≤s≤1} G_δ(s) = ψ(δ*) − ψ(δ). Similarly, if δ* < δ < 1 then G_δ has a negative minimum at s = δ − δ* < δ, with

    G_δ(δ − δ*) = (10/9)(10^{−δ*} − 10^{−δ}) − δ + δ* = ψ(δ) − ψ(δ*) .

In summary, therefore, for every 0 ≤ δ < 1,

    max_{0≤s≤1} G_δ(s) = ψ(δ) ,  −min_{0≤s≤1} G_δ(s) = ψ(δ*) − ψ(δ) ,

and consequently

    max_{0≤s<1} |G_δ(s)| ≥ max{ψ(δ), ψ(δ*) − ψ(δ)} ≥ (1/2) ψ(δ*) > 0 .

Similar arguments show that max_{0≤s≤1} |F_{⟨log X⟩}(s) − s| ≥ (1/2) ψ(δ*) holds more generally for every uniformly distributed positive random variable X; the interested reader is referred to [12] for details. □

Figure 3.8: Distribution functions of S(X) for the random variable X = U(0, T) and different values of 1 ≤ T < 10. These distribution functions consist of two linear parts, with the "break-point" located on the dotted curve (top left); for T = 4, for instance, ∆_∞ = 23.12. Note that the maximum vertical distance between any such distribution function and the (blue) Benford curve log t is bounded away from zero. In fact, the diagram also depicts those situations for which this maximum vertical distance is minimal (green; T = 1.275 or 6.324, ∆_∞ = 13.44) and maximal (red; T = 1 or 2.558, ∆_∞ = 26.88), respectively.

Remark. It is easy to see that max_{1≤t<10} |F_{S(X)}(t) − log t| ≥ δ for some δ > 0, not depending on the particular parameters, whenever the random variable X is uniformly distributed (on an open interval that may contain 0), normal, or exponential. Unlike in Theorem 3.13, however, the best possible (that is, largest) value of δ is not known to the authors in any of these cases.

The next lemma provides a convenient framework for studying probabilities on the significand σ-algebra by translating them into probability measures on the classical space of Borel subsets of [0, 1), that is, on ([0, 1), B[0, 1)). For a proper formulation, observe that for every function f : Ω → R with σ(f) ⊂ A


and every probability measure P on (Ω, A), f and P together induce a probability measure P_f on (R, B) in a natural way, namely, by setting

    P_f(B) = P(f^{−1}(B)) for all B ∈ B .   (3.9)

For instance, if X is a (real-valued) random variable on the probability space (Ω, A, P), then the measure P_X induced via (3.9) is simply the distribution of X, so the two usages of the symbol P_X are consistent. (Note that some textbooks denote P_f by P ◦ f^{−1}, and this notation will be used later in this book also.) The special case of interest for significands is (Ω, A) = (R^+, S) and f = log S.

Lemma 3.14. The function ℓ : R^+ → [0, 1) defined by ℓ(x) = log S(x) establishes a one-to-one and onto correspondence (measure isomorphism) between probability measures on (R^+, S) and probability measures on ([0, 1), B[0, 1)).

Proof. From ℓ^{−1}([a, b]) = S^{−1}([10^a, 10^b]) for all 0 ≤ a < b < 1, it follows that σ(ℓ) = R^+ ∩ σ(S) = S, and hence P_ℓ as defined by (3.9), with any probability measure P on (R^+, S), is a well-defined probability measure on ([0, 1), B[0, 1)).

Conversely, given any probability measure P on ([0, 1), B[0, 1)) and any A in S, let B ∈ B[0, 1) be the unique set for which A = ∪_{k∈Z} 10^k 10^B, where 10^B := {10^s : s ∈ B}, and define P^ℓ(A) = P(B). It is readily confirmed that ℓ(A) = B, ℓ^{−1}(B) = A, and P^ℓ is a well-defined probability measure on (R^+, S). Moreover,

    (P^ℓ)_ℓ(B) = P^ℓ(ℓ^{−1}(B)) = P^ℓ(A) = P(B) for all B ∈ B[0, 1) ,

showing that (P^ℓ)_ℓ = P, and hence every probability measure on ([0, 1), B[0, 1)) is of the form P_ℓ with the appropriate P. On the other hand,

    (P_ℓ)^ℓ(A) = P_ℓ(B) = P(ℓ^{−1}(B)) = P(A) for all A ∈ S ;

thus (P_ℓ)^ℓ = P, and the correspondence P ↦ P_ℓ is one-to-one as well. In summary, P ↔ P_ℓ is bijective. □

From the proof of Lemma 3.14 it is clear that a bijective correspondence between probability measures on (R^+, S) and on ([0, 1), B[0, 1)), respectively, could have been established in many other ways as well, e.g., by using the function ℓ̃(x) = (1/9)(S(x) − 1) instead of ℓ; with this choice of ℓ̃, for instance, (δ_{10^k})_{ℓ̃} = δ_0 = (δ_{10^k})_ℓ for all k ∈ Z. The special role of ℓ defined in Lemma 3.14 only becomes apparent through its relation to Benford's law. To see this, recall that B is the (unique) probability measure on (R^+, S) with

    B({x > 0 : S(x) ≤ t}) = B(∪_{k∈Z} 10^k [1, t]) = log t for all 1 ≤ t < 10 .


In view of (2.1), the probability measure B on (R^+, S) is the most natural formalization of Benford's law. On the other hand, it will become clear in subsequent chapters that on ([0, 1), B[0, 1)), the uniform distribution λ_{0,1} has many special properties that make it play a very important role. The relevance of the specific choice for ℓ in Lemma 3.14, therefore, is that B_ℓ = λ_{0,1}. (Notice, for example, that with the function ℓ̃ = (1/9)(S − 1) from above, the distribution function of B_{ℓ̃} is s ↦ log(9s + 1), and so clearly B_{ℓ̃} ≠ λ_{0,1}.) The reader will learn shortly why, for a deeper understanding of Benford's law, the relation B_ℓ = λ_{0,1} is very beneficial indeed.
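The relation B_ℓ = λ_{0,1}, and its failure for ℓ̃, can be seen empirically. The sketch below generates Benford significands as S = 10^U per Example 3.6(i); seed, sample size, and tolerances are arbitrary choices.

```python
import math
import random

random.seed(3)
n = 200_000
# Benford significands: S = 10^U with U uniform on [0,1)
sig = [10 ** random.random() for _ in range(n)]
ell = [math.log10(s) for s in sig]        # ell(x)  = log10 S(x)
ellt = [(s - 1) / 9 for s in sig]         # ell~(x) = (S(x) - 1)/9
for s0 in (0.2, 0.5, 0.8):
    p_ell = sum(1 for v in ell if v <= s0) / n
    p_ellt = sum(1 for v in ellt if v <= s0) / n
    # B_ell = lambda_{0,1}, whereas B_ell~ has distribution function log10(9s + 1)
    print(s0, round(p_ell, 3), round(p_ellt, 3), round(math.log10(9 * s0 + 1), 3))
```

The image of B under ℓ is (statistically) uniform, while under ℓ̃ it follows s ↦ log(9s + 1), exactly as stated above.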

Chapter Four

The Uniform Distribution and Benford's Law

The uniform distribution characterization of Benford's law is undoubtedly the most basic and powerful of all characterizations, largely because the mathematical theory of uniform distribution modulo one is very well developed; see [50, 93] for authoritative surveys. This chapter records and develops tools from that theory which will be used throughout this book to establish Benford behavior of sequences, functions, and random variables.

4.1 UNIFORM DISTRIBUTION CHARACTERIZATION OF BENFORD'S LAW

Here and throughout, denote by ⟨x⟩ the fractional part of any real number x, that is, ⟨x⟩ = x − ⌊x⌋. For example, ⟨π⟩ = ⟨3.1415 . . .⟩ = 0.1415 . . . = π − 3. Recall that λ denotes Lebesgue measure on (R, B).

Definition 4.1. A sequence (x_n) = (x_1, x_2, . . .) of real numbers is uniformly distributed modulo one, abbreviated henceforth as u.d. mod 1, if

    lim_{N→∞} #{1 ≤ n ≤ N : ⟨x_n⟩ ≤ s} / N = s for all s ∈ [0, 1) ;

a (Borel measurable) function f : [0, +∞) → R is u.d. mod 1 if

    lim_{T→+∞} λ({t ∈ [0, T) : ⟨f(t)⟩ ≤ s}) / T = s for all s ∈ [0, 1) ;

a random variable X on a probability space (Ω, A, P) is u.d. mod 1 if

    P(⟨X⟩ ≤ s) = s for all s ∈ [0, 1) ;

and a probability measure P on (R, B) is u.d. mod 1 if

    P({x : ⟨x⟩ ≤ s}) = P(∪_{k∈Z} [k, k + s]) = s for all s ∈ [0, 1) .

Remark. A function that is u.d. mod 1 in the sense of Definition 4.1 is often called continuously uniformly distributed modulo one (c.u.d. mod 1) in the literature.

The next elementary result, implicit in [9] and [111] and explicitly stated in [46], is one of the main tools in the theory of Benford's law because it allows application of the powerful theory of uniform distribution modulo one. (Recall the conventions log 0 = 0 and S(0) = 0.)


Theorem 4.2 (Uniform distribution characterization). A sequence of real numbers (or a Borel measurable function, a random variable, a Borel probability measure) is Benford if and only if the decimal logarithm of its absolute value is uniformly distributed modulo one.

Proof. Let X be a random variable. Then, for all s ∈ [0, 1),

    P(⟨log |X|⟩ ≤ s) = P(log |X| ∈ ∪_{k∈Z} [k, k + s]) = P(|X| ∈ ∪_{k∈Z} [10^k, 10^{k+s}]) + P(X = 0) = P(S(X) ≤ 10^s) .

Hence, by Definitions 3.4 and 4.1, the random variable X is Benford if and only if P(S(X) ≤ 10^s) = log 10^s = s for all s ∈ [0, 1), i.e., if and only if log |X| is u.d. mod 1. The proofs for sequences, functions, and probability measures are completely analogous. □

The following proposition from the basic theory of uniform distribution mod 1, together with Theorem 4.2, will be instrumental in establishing the Benford property for many sequences, functions, and random variables.

Proposition 4.3. (i) The sequence (x_n) is u.d. mod 1 if and only if (kx_n + b) is u.d. mod 1 for every integer k ≠ 0 and every b ∈ R. Also, the sequence (x_n) is u.d. mod 1 if and only if (y_n) is u.d. mod 1, whenever lim_{n→∞} |y_n − x_n| = 0.

(ii) The function f is u.d. mod 1 if and only if kf(t) + b is u.d. mod 1 for every integer k ≠ 0 and every b ∈ R.

(iii) The random variable X is u.d. mod 1 if and only if kX + b is u.d. mod 1 for every integer k ≠ 0 and every b ∈ R.

Proof. (i) The "if" part is obvious with k = 1, b = 0. For the "only if" part, assume that (x_n) is u.d. mod 1, and recall that λ_{0,1} denotes Lebesgue measure on ([0, 1), B[0, 1)). Note first that

    lim_{N→∞} #{n ≤ N : ⟨x_n⟩ ∈ C} / N = λ_{0,1}(C)

whenever C is a finite union of intervals. Let k be a non-zero integer and observe that, for any 0 < s < 1,

    {x : ⟨kx⟩ ≤ s} = {x : ⟨x⟩ ∈ ∪_{j=0}^{k−1} [j/k, (j + s)/k]} if k > 0 ,
    {x : ⟨kx⟩ ≤ s} = {x : ⟨x⟩ ∈ ∪_{j=0}^{|k|−1} [(j + 1 − s)/|k|, (j + 1)/|k|]} if k < 0 .


Consequently,

    lim_{N→∞} #{n ≤ N : ⟨kx_n⟩ ≤ s} / N = λ_{0,1}(∪_{j=0}^{k−1} [j/k, (j + s)/k]) = k · (s/k) = s for k > 0 ,

and

    lim_{N→∞} #{n ≤ N : ⟨kx_n⟩ ≤ s} / N = λ_{0,1}(∪_{j=0}^{|k|−1} [(j + 1 − s)/|k|, (j + 1)/|k|]) = |k| · (s/|k|) = s for k < 0 ,

showing that (kx_n) is u.d. mod 1. Similarly, note that, for any b, s ∈ (0, 1),

    {x : ⟨x + b⟩ ≤ s} = {x : ⟨x⟩ ∈ [0, s − b] ∪ [1 − b, 1)} if s ≥ b ,
    {x : ⟨x + b⟩ ≤ s} = {x : ⟨x⟩ ∈ [1 − b, 1 + s − b]} if s < b .

Thus, assuming without loss of generality that 0 ≤ b < 1,

    lim_{N→∞} #{n ≤ N : ⟨x_n + b⟩ ≤ s} / N = λ_{0,1}([0, s − b] ∪ [1 − b, 1)) = s if s ≥ b ,

and

    lim_{N→∞} #{n ≤ N : ⟨x_n + b⟩ ≤ s} / N = λ_{0,1}([1 − b, 1 + s − b]) = s if s < b ,

and hence (x_n + b) is also u.d. mod 1. The second assertion is clear from the definition. The proofs of (ii) and (iii) are completely analogous. □

It follows immediately from Theorem 4.2 and Proposition 4.3 that powers and reciprocals of Benford sequences, functions, and random variables are also Benford.

Theorem 4.4. Let (x_n), f, and X be a Benford sequence, function, and random variable, respectively. Then, for all a ∈ R and k ∈ Z with ak ≠ 0, the sequence (a x_n^k), the function a f^k, and the random variable a X^k are also Benford.

Without the hypothesis that k ∈ Z, the conclusion of Theorem 4.4 may fail. In general, the Benford property is not preserved under taking roots, in contrast to the situation for the significand σ-algebra, which is closed under roots but not under powers; see Lemma 2.15 and Example 2.16.

Example 4.5. Let (x_n) be the sequence (S(2^n)) = (2, 4, 8, 1.6, 3.2, . . .), let f be the function f(t) = S(e^t), and let X be the random variable X = 10^{U(0,1)}. Then none of (√x_n), √f, and √X is Benford, since all their significands are less than √10. On the other hand, since S(S(x)) = S(x) for all x, (x_n) is Benford by Example 4.7(i) below, f is Benford by Example 4.9(i), and X is Benford by Example 3.6(i).
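Theorem 4.2 reduces Benford behavior of (2^n) to equidistribution of (n log 2) mod 1, which is quick to check numerically; the cutoff N and tolerances below are arbitrary choices.

```python
import math

N = 100_000
c = math.log10(2)
frac = [(n * c) % 1.0 for n in range(1, N + 1)]       # <n log10(2)>
for s in (0.25, 0.5, 0.75):
    assert abs(sum(1 for x in frac if x <= s) / N - s) < 0.005
# The first digit of 2^n is 1 exactly when <n log10(2)> lies in [0, log10(2)):
d1 = sum(1 for x in frac if x < c) / N
print(round(d1, 4))
```

The observed frequency of leading digit 1 among the first N powers of 2 is close to log 2, as the uniform distribution characterization predicts.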

4.2 UNIFORM DISTRIBUTION OF SEQUENCES AND FUNCTIONS

This section focuses on tools and techniques useful in analyzing the Benford behavior of sequences and functions. The first two propositions record, for ease of reference, several basic results from the theory of uniform distribution of sequences and functions.

Proposition 4.6. Let (x_n) = (x_1, x_2, . . .) be a sequence of real numbers.

(i) If lim_{n→∞} (x_{n+1} − x_n) = θ for some irrational θ, then (x_n) is u.d. mod 1.

(ii) If (x_n) is periodic, i.e., x_{n+p} = x_n for some p ∈ N and all n, then (nθ + x_n) is u.d. mod 1 if and only if θ is irrational.

(iii) The sequence (x_n) is u.d. mod 1 if and only if (x_n + a log n) is u.d. mod 1 for all a ∈ R.

(iv) If (x_n) is u.d. mod 1 and non-decreasing, then (x_n / log n) is unbounded.

(v) If f is differentiable for t ≥ t_0, if f′(t) tends to zero monotonically as t → +∞, and if t|f′(t)| → +∞ as t → +∞, then the sequence (f(n)) is u.d. mod 1.

(vi) Suppose lim_{n→∞} n(y_{n+1} − y_n) = 0 for the sequence (y_n) of real numbers. Then (x_n) is u.d. mod 1 if and only if (x_n + y_n) is u.d. mod 1.

Proof. Conclusion (i) is a classical result of van der Corput [93, Thm. I.3.3]; (ii) follows directly from Weyl's criterion [93, Thm. I.2.1]; (iii) is [11, Lem. 2.8]; (iv) is [15, Lem. 2.4(i)]; (v) is Fejér's Theorem [93, Cor. I.2.1]; and (vi) follows from [130, Bsp. 3.I] using a suitable C¹-function f that satisfies f(n) = y_n for all n, e.g., f(t) = y_⌊t⌋ + (y_{⌊t⌋+1} − y_⌊t⌋)(3⟨t⟩² − 2⟨t⟩³). □

The converse of (i) is not true in general: (xn ) may be u.d. mod 1 even if (xn+1 − xn ) has a rational limit. Also, in (ii) the sequence (nθ) cannot be replaced by an arbitrary uniformly distributed sequence (θn ), i.e., (θn + xn ) may not be u.d. mod 1 even though (θn ) is u.d. mod 1 and (xn ) is periodic.

Example 4.7. (i) The sequences (nπ) = (π, 2π, ...), (n log π), and (n log 2) are all u.d. mod 1, by Proposition 4.6(i). Thus, by Theorem 4.2, the sequences (10^{nπ}), (π^n), and (2^n) are all Benford sequences. Similarly, (xn) = (n√2) is u.d. mod 1, whereas (xn·√2) = (2n) = (2, 4, ...) clearly is not, since ⟨2n⟩ = 0 for all n. Thus the requirement in Proposition 4.3(i) that k be an integer cannot be dropped. For an analogous example using random variables, let X be uniform on (0, log₂ 10). Since a random variable that is uniform on (0, a) for some a > 0 is u.d. mod 1 if and only if a ∈ N, X is not u.d. mod 1, but X log 2 is. By Theorem 4.2, therefore, the random variable 2^X is Benford, but 10^X is not.

(ii) The sequence (log n) is not u.d. mod 1. A short calculation shows that, for every s ∈ [0, 1), the sequence (N^{−1} #{1 ≤ n ≤ N : ⟨log n⟩ ≤ s})_{N∈N} has

(1/9)(10^s − 1)   and   (10/9)(1 − 10^{−s})

as its limit inferior and limit superior, respectively. Thus, by Theorem 4.2, the sequence of positive integers (n) is not Benford, and neither is (an^b) for any a, b ∈ R, by Theorem 4.4.

(iii) The sequence (√(a + bn)) is u.d. mod 1 for all a, b > 0, as follows easily from Proposition 4.6(v) using the function f(t) = √(a + bt). Hence the sequence (10^{√(a+bn)}) is Benford for all a, b > 0. z
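The dichotomy between (2^n) in (i) and the positive integers (n) in (ii) is easy to observe numerically. Below is a minimal sketch (plain Python; the helper names are ours, not the book's) that tallies leading digits over the first 10,000 terms of each sequence:

```python
import math

def first_digit(n):
    # Leading significant digit of a positive integer, read off its decimal string.
    return int(str(n)[0])

def digit_freqs(seq):
    # Empirical frequency of each leading digit d = 1, ..., 9.
    counts = [0] * 10
    for x in seq:
        counts[first_digit(x)] += 1
    return [counts[d] / len(seq) for d in range(1, 10)]

benford = [math.log10(1 + 1 / d) for d in range(1, 10)]

pow2 = digit_freqs([2 ** n for n in range(1, 10001)])   # (2^n): Benford
ints = digit_freqs(range(1, 10001))                     # (n): not Benford

print(max(abs(p - b) for p, b in zip(pow2, benford)))   # close to 0
print(max(abs(q - b) for q, b in zip(ints, benford)))   # bounded away from 0
```

Exact integer arithmetic is used for 2^n, so no floating-point overflow occurs even for large n.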

The concepts of uniform distribution for sequences and functions are closely related, and in order to decide whether a Borel measurable function is u.d. mod 1, the following result is often useful; part (i) is [93, Thm. I.9.6(b)], whereas part (ii) follows directly from Proposition 4.6(iv) together with [93, Thm. I.9.7].

Proposition 4.8. Let f : [0, +∞) → R be Borel measurable.

(i) If, for some δ0 > 0, the sequence (f(nδ)) is u.d. mod 1 for almost all 0 < δ < δ0, then the function f is u.d. mod 1 as well.

(ii) If lim sup_{t→+∞} |f(t)|/log t < +∞, and f is continuously differentiable and monotone, then f is not u.d. mod 1.

Example 4.9. (i) The function f(t) = at + b with real a, b is u.d. mod 1 if and only if a ≠ 0. Clearly, if a = 0 then f is constant and hence not u.d. mod 1. On the other hand, if a ≠ 0 then (f(nδ)) = (aδn + b) is u.d. mod 1 if (and only if) aδ is irrational, by Proposition 4.6(ii), and hence for all but countably many δ > 0. Thus f is u.d. mod 1 by Proposition 4.8(i). This can also be confirmed directly by an elementary calculation: If a > 0 then ⟨at + b⟩ ≤ s if and only if t ∈ [(k − b)/a, (k − b + s)/a] for some k ∈ Z. Note that each of the intervals [(k − b)/a, (k − b + s)/a] has the same length s/a. Thus, given T > 0 and s ∈ [0, 1),

(⌊aT⌋ − 2) s/a ≤ λ({t ∈ [0, T) : ⟨at + b⟩ ≤ s}) ≤ (⌊aT⌋ + 2) s/a,

and since lim_{T→+∞} (⌊aT⌋ ± 2) s/(aT) = s, the function f is u.d. mod 1. The argument for the case a < 0 is similar. As a consequence, although the function g(t) = at is not Benford for any a, the function g(t) = e^{at} is Benford whenever a ≠ 0, via Theorem 4.2, since log g(t) = at log e is u.d. mod 1; see Figure 4.1.

(ii) The function f(t) = log |at + b| is not u.d. mod 1 for any a, b ∈ R. Indeed, if a = 0 then f is constant and hence not u.d. mod 1. On the other hand, if a ≠ 0 then lim_{t→+∞} f(t)/log t = 1, and f is continuously differentiable and monotone for t > |b|/|a|. Hence Proposition 4.8(ii) shows that f is not u.d. mod 1. Again, this can also be seen by means of an elementary calculation: For

a ≠ 0, essentially the same calculation as the one in Example 4.7(ii) above shows that, for every s ∈ [0, 1),

lim inf_{T→+∞} λ({t ∈ [0, T) : ⟨log |at + b|⟩ ≤ s})/T = (1/9)(10^s − 1),

and

lim sup_{T→+∞} λ({t ∈ [0, T) : ⟨log |at + b|⟩ ≤ s})/T = (10/9)(1 − 10^{−s}).

Again, by Theorem 4.2, this implies that g(t) = at + b is not Benford for any a, b ∈ R. Similarly, f(t) = − log(1 + t²) is not u.d. mod 1, so g(t) = (1 + t²)^{−1} is not Benford; see Figure 4.1. In fact, it is clear that if g is continuously differentiable and monotone with lim_{t→+∞} |g(t)| = +∞, but lim sup_{t→+∞} |g(t)|/t^a < +∞ for some a > 0, then g is not Benford. Thus, the function g(t) = at^b, for instance, is not Benford for any a, b ∈ R; recall the analogous conclusion for sequences in Example 4.7(ii).

(iii) The function f(t) = e^t is u.d. mod 1. To see this, let T > 0 and N = ⌊e^T⌋, and recall that t − t²/2 ≤ ln(1 + t) ≤ t for all t ≥ 0. Given 0 ≤ s < 1, it follows from

λ({t ∈ [0, T) : ⟨e^t⟩ ≤ s}) = Σ_{n=1}^{N−1} ln(1 + s/n) + min{T − ln N, ln(1 + s/N)}

that

(s Σ_{n=1}^{N−1} n^{−1} − (1/2) s² Σ_{n=1}^{N−1} n^{−2}) / ln(N + 1) ≤ λ({t ∈ [0, T) : ⟨e^t⟩ ≤ s})/T ≤ (s Σ_{n=1}^{N−1} n^{−1} + ln(1 + N^{−1})) / ln N,

and hence indeed lim_{T→+∞} T^{−1} λ({t ∈ [0, T) : ⟨e^t⟩ ≤ s}) = s. Alternatively, note that (f(nδ)) = (e^{nδ}), and Proposition 4.14 below shows that this sequence is u.d. mod 1 for almost all δ > 0. Hence Proposition 4.8(i) again implies that f is u.d. mod 1. In fact, af is u.d. mod 1 for every real a ≠ 0, and so by Theorem 4.2, the function g(t) = e^{e^t} = 10^{e^t log e}, for instance, is Benford.

(iv) For the function f(t) = (sin t)², it is straightforward to check that, given any 0 ≤ s < 1,

lim_{T→+∞} λ({t ∈ [0, T) : ⟨(sin t)²⟩ ≤ s})/T = (2/π) arcsin √s.

Thus, asymptotically ⟨f⟩ is not uniform on (0, 1) but rather arcsin-distributed; see Example 3.10(iv). Theorem 4.2 implies that g(t) = 10^{(sin t)²} is not Benford. This also follows easily from Theorem 4.10 below.
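The arcsine limit in (iv) is easy to corroborate by a numerical time average over a long interval. Below is a minimal sketch (plain Python; the horizon T and grid size N are arbitrary choices of ours):

```python
import math

T, N = 2000.0, 200000                          # time horizon and grid resolution
vals = [math.sin(T * k / N) ** 2 for k in range(N)]

for s in (0.1, 0.5, 0.9):
    # Fraction of time (sin t)^2 lies below s, over [0, T).
    emp = sum(1 for v in vals if v <= s) / N
    print(s, emp, 2 / math.pi * math.asin(math.sqrt(s)))
```

The printed pairs agree to about two decimal places, consistent with the stated limit (2/π) arcsin √s.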

(v) For the function f(t) = log((sin t)²), it follows from (iv) that the asymptotic distribution of ⟨f⟩ has density

d/ds Σ_{n=1}^∞ (2/π)(arcsin 10^{(s−n)/2} − arcsin 10^{−n/2}) = (ln 10/π) Σ_{n=1}^∞ (10^{n−s} − 1)^{−1/2} > (ln 10/π) · 10^{s/2}/(10^{1/2} − 1),

for 0 ≤ s < 1. Thus clearly f is not u.d. mod 1, showing that g(t) = (sin t)² is not Benford; see Figure 4.1 and Theorem 4.10 below. z

[Figure 4.1 plots log S(f1), log S(f2), and log S(f3) for f1(t) = e^{−t}, f2(t) = (1 + t²)^{−1}, and f3(t) = (sin t)².]

Figure 4.1: While the function f1 is Benford, the functions f2, f3 are not; see Example 4.9.

The only Benford functions encountered thus far are functions like e^{−t} and e^t, which converge to either 0 or ±∞ as t → +∞. Much later, it will be seen that e^t cos(πt), for example, is also Benford whereas 10^t cos(πt) is not; see Example 7.57 below. On the other hand, a function f may be Benford even if lim inf_{t→+∞} f(t) and lim sup_{t→+∞} f(t) are different but are both finite, as is the case if f is bounded and periodic but not constant. As the next theorem suggests, however, such Benford functions are quite rare. Recall that f : [0, +∞) → R is periodic, with period p > 0, if f(t + p) = f(t) for all t ≥ 0. For instance, all functions in Example 4.9(iv, v) are periodic with period p = π.

Theorem 4.10. Let f : [0, +∞) → R be a (Borel measurable) periodic function with period p > 0. Then the following are equivalent:

(i) f is Benford;

(ii) ∫_0^p |f(t)|^{2πın log e} dt = 0 for all n ∈ N;

(iii) f(U) is a Benford random variable, where U is uniform on (0, p).

Moreover, if the function f is differentiable then it is not Benford.

Proof. By [93, Exc. I.9.9], a periodic function g : [0, +∞) → R with period p > 0 is u.d. mod 1 if and only if ∫_0^p e^{2πıng(t)} dt = 0 for all n ∈ N. With g = log |f|, the equivalence of (i) and (ii) then follows from Theorem 4.2. To see the equivalence of (i) and (iii), note that for all 1 ≤ t < 10,

lim_{T→+∞} λ({τ ∈ [0, T) : S(f(τ)) ≤ t})/T = λ({τ ∈ [0, p) : S(f(τ)) ≤ t})/p = P(S(f(U)) ≤ t),

where the first equality follows from the fact that S(f) : [0, +∞) → [1, 10) is periodic with period p, and the second equality follows from the random variable U being uniform on (0, p). To prove the last assertion, let f be differentiable and periodic with period p > 0. Without loss of generality, assume that f is not constant. It follows that m := max_{t∈[0,p]} |f(t)| > 0, and, since |f| is continuous, |f(t0)| = m, where t0 = min{t ∈ [0, p] : |f(t)| = m}. Since f is not constant, it can be assumed that t0 > 0. (Otherwise, replace f by f(· + c) with the appropriate c > 0.) Furthermore, since S(10^k f) = S(|f|) for all k ∈ Z, it can also be assumed that 1 < m ≤ 10 and f(t0) > 0. For every n ∈ N, let

tn = sup{0 ≤ t ≤ t0 : f(t) ≤ m − 1/n},

with the convention sup ∅ := 0. Clearly, the sequence (tn) is non-decreasing with tn ≤ t0, and is hence convergent, with t∞ := lim_{n→∞} tn and t∞ ≤ t0. If t∞ < t0 then S(f(t)) = S(m) for all t∞ < t < t0. With the random variable U being uniform on (0, p), therefore, S(f(U)) equals S(m) with probability at least (t0 − t∞)/p > 0, and hence f(U) is not Benford. If, on the other hand, t∞ = t0 then tn < t0 for all n, by the continuity of f. In this case, for every n ∈ N consider the open interval Jn := (f(tn), m) = (m − 1/n, m) and note that S(f(U)) ∈ Jn whenever U ∈ (tn, t0). Thus, if f(U) were Benford, then, for all sufficiently large n,

(f(t0) − f(tn))/(t0 − tn) = (m − S(f(tn)))/(t0 − tn)
≥ log(m/S(f(tn))) / ((t0 − tn) log e) = P(S(f(U)) ∈ Jn) / ((t0 − tn) log e)
≥ P(U ∈ (tn, t0)) / ((t0 − tn) log e) = 1/(p log e) > 0,

which is impossible because f is differentiable at t0 with vanishing derivative; see Example 7.16(iii) for a similar but somewhat less general argument. Thus f(U) is not Benford, so by the equivalence of (i) and (iii), f is not Benford. 

Example 4.11. The following three functions are all periodic with period p = 1, and all satisfy both min_{t≥0} fj(t) = 1 and sup_{t≥0} fj(t) = 10:

f1(t) = 1 + 9⟨t⟩,   f2(t) = 11/2 + (9/2) sin(2πt),   f3(t) = 10^{|1 − 2⟨t + 1/4⟩|}.

Note that f1 is discontinuous whereas f2 and f3 are continuous. With U being uniform on (0, 1), clearly f1(U) is uniform on (1, 10). As the latter random variable is not Benford by Theorem 3.13, Theorem 4.10 shows that f1 is not Benford. Also, the function f2 is differentiable and hence not Benford, again by Theorem 4.10. On the other hand, the random variable ⟨U + 1/4⟩ is uniform on (0, 1), and so is |1 − 2⟨U + 1/4⟩|. Hence f3 is Benford; see Figure 4.2. Note that f3 is smooth on [0, 1] except at two points. z
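The different behavior of f1(U) and f3(U) can be reproduced by simulation. Below is a minimal sketch (plain Python; seed and sample size are arbitrary choices of ours):

```python
import math
import random

random.seed(1)

def significand(x):
    # S(x) in [1, 10) for x != 0.
    return 10.0 ** (math.log10(abs(x)) % 1.0)

def f1(t):
    # f1(t) = 1 + 9<t>, as in Example 4.11.
    return 1.0 + 9.0 * (t % 1.0)

def f3(t):
    # f3(t) = 10^{|1 - 2<t + 1/4>|}, as in Example 4.11.
    return 10.0 ** abs(1.0 - 2.0 * ((t + 0.25) % 1.0))

N = 200000
u = [random.random() for _ in range(N)]

for t in (2.0, 5.0, 9.0):
    emp3 = sum(1 for x in u if significand(f3(x)) <= t) / N
    emp1 = sum(1 for x in u if significand(f1(x)) <= t) / N
    # If f3(U) is Benford, emp3 should be near log10(t); emp1 should not be.
    print(t, emp3, emp1, math.log10(t))
```

The simulated significand distribution of f3(U) matches log10(t), while that of f1(U), being uniform on (1, 10), does not.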

Figure 4.2: Three periodic functions with period p = 1. Only f3 is Benford; see Example 4.11.

As was seen in Examples 4.7 and 4.9, the sequence (2^n) and the function f(t) = e^t are both Benford. The next theorem shows that all sequences and functions sufficiently close to such exponential Benford sequences and functions are themselves exactly Benford. The robustness of the Benford property, of which this is a simple example, will be a recurrent theme throughout this book.

Theorem 4.12. (i) Let (an) and (bn) be sequences of real numbers with |an| → +∞ as n → ∞, and such that lim_{n→∞} |an/bn| exists and is positive. Then (an) is Benford if and only if (bn) is Benford.

(ii) Let f, g : [0, +∞) → R be (Borel measurable) functions with |f(t)| → +∞ as t → +∞, and such that lim_{t→+∞} |f(t)/g(t)| exists and is positive. Then f is Benford if and only if g is Benford.

Proof. To prove (i), let (an) and (bn) be sequences of real numbers with |an| → +∞, and, by Theorem 4.4, assume without loss of generality that lim_{n→∞} |an/bn| = 1. Thus |bn| → +∞ and log |an| − log |bn| = log |an/bn| → 0. If (an) is Benford, then by Theorem 4.2, (log |an|) is u.d. mod 1, so Proposition 4.3(i) implies that (log |bn|) is also u.d. mod 1. By Theorem 4.2 again, (bn) is Benford. As the roles of (an) and (bn) can be interchanged, this establishes (i). The proof of (ii) is analogous. 

Example 4.13. (i) The sequence (⌊π^n⌋) = (3, 9, 31, 97, 306, ...) is Benford [117, A001672]. This follows from Theorem 4.12(i), since |⌊π^n⌋ − π^n| < 1 for all n, and since (π^n) is Benford, as was seen in Example 4.7(i). Alternatively, the conclusion follows directly from Weyl's criterion and an easy approximation argument.

(ii) The hyperbolic cosine function f(t) = cosh t = (e^t + e^{−t})/2 is Benford. Indeed, e^t is Benford, so by Theorem 4.4, e^t/2 is Benford. Since |cosh t − e^t/2| ≤ 1/2 for all t ≥ 0, Theorem 4.12(ii) implies that cosh t is a Benford function.

(iii) By Example 4.9(ii), the function f(t) = t is not Benford, so by Theorem 4.4, g(t) = t^k is not Benford for any k ∈ Z. Thus, by Theorem 4.12(ii), no polynomial or rational function is Benford. z

A third very useful tool from the basic theory of uniform distribution is Koksma's metric theorem [93, Thm. I.4.3]. For its formulation, recall that a property of real numbers is said to hold for (Lebesgue) almost all (a.a.) x ∈ [a, b) if there exists a nullset N, i.e., N ∈ B[a, b) with λ_{a,b}(N) = 0, such that the property holds for all x ∉ N. The probabilistic interpretation of a given property of real numbers holding for a.a. x is that this property holds almost surely (a.s.), i.e., with probability one, for every random variable that has a density (i.e., is absolutely continuous).

Proposition 4.14.
Let J ⊂ R be an interval and, for every n ∈ N, let fn : J → R be continuously differentiable. If f′m − f′n is monotone on J, and |f′m(x) − f′n(x)| ≥ a > 0 for all m ≠ n, where a does not depend on x ∈ J, m, and n, then (fn(x)) is u.d. mod 1 for almost all x ∈ J.

Example 4.15. Fix a real number b > 1 and consider the sequence (b^n). Clearly, if b is an integer then ⟨b^n⟩ ≡ 0, and so (b^n) is not u.d. mod 1. On the other hand, for non-integer values of b it is generally not known whether (b^n) is u.d. mod 1, even if b is rational. For example, it is a famous open problem whether or not ((3/2)^n) is u.d. mod 1; see [50, p. 137]. However, Proposition 4.14 implies that (b^n x) is u.d. mod 1 for almost all x ∈ R. Indeed, with fn(x) := b^n x, clearly f′m − f′n is constant (hence monotone), and for all m ≠ n,

|f′m(x) − f′n(x)| = |b^m − b^n| ≥ b(b − 1) > 0   for all x ∈ R.

By Proposition 4.14, therefore, the set

Ub := {x ∈ R : (b^n x) is u.d. mod 1}

has full measure, so R \ Ub is a nullset. That Ub is not trivial, i.e., Ub ≠ R \ {0}, is easy to see for integral b (in which case Q ∩ Ub = ∅), and in fact is true for every b > 1. The proof of Theorem 6.46 below shows that, despite being a nullset, the set R \ Ub is uncountable and dense in R. Note that, even for integer b, it may be very hard to decide whether x ∈ Ub for any specific x ∈ R. For example, for no integer q ≥ 2 is it known whether (q^n √2) is u.d. mod 1, i.e., whether √2 ∈ Uq; see [42]. z

The next theorem establishes a necessary and sufficient condition for an asymptotically exponential sequence to be Benford.

Theorem 4.16 ([22]). Let (bn) be a sequence of real numbers such that lim_{n→∞} |bn/a^n| exists and is positive for some a > 0. Then (bn) is Benford if and only if log a is irrational.

Proof. Observe that

(bn) is Benford ⟺ (a^n) is Benford ⟺ (n log a) is u.d. mod 1 ⟺ log a is irrational,
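Almost-everywhere statements like Proposition 4.14 are awkward to probe by iterating (b^n x) in floating point, since rounding errors are amplified by roughly b^n. The simplest almost-all result of the same flavor, the equidistribution of (nx) for a.a. x (see Example 4.17(iii) below), is easy to check for a concrete irrational such as x = √2. Below is a minimal sketch (plain Python; the value of N is an arbitrary choice of ours):

```python
import math

N = 100000
alpha = math.sqrt(2)

# Fractional parts <n*sqrt(2)> for n = 1, ..., N.
fracs = [(n * alpha) % 1.0 for n in range(1, N + 1)]

# If the sequence is u.d. mod 1, the fraction of terms below s approaches s.
for s in (0.25, 0.5, 0.75):
    emp = sum(1 for f in fracs if f <= s) / N
    print(s, emp)
```

Here the floating-point error in n·√2 stays far below 1 for the range of n used, so the computed fractional parts are reliable.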

where the first equivalence follows from Theorem 4.12(i), the second from Theorem 4.2, and the third from Proposition 4.6(ii). 

Example 4.17. (i) By Theorem 4.16 the sequence (2^n) is Benford since log 2 is irrational, but (10^n) is not Benford since log 10 = 1 ∈ Q. Similarly, (0.2^n), (3^n), (0.3^n), (0.01 · 0.2^n + 0.2 · 0.01^n) are Benford, whereas (0.1^n), ((√10)^n), (0.1 · 0.02^n + 0.02 · 0.1^n) are not.

(ii) The sequence (0.2^n + (−0.2)^n) is not Benford, since all odd terms are zero, but (0.2^n + (−0.2)^n + 0.03^n) is Benford — although this does not follow directly from Theorem 4.16.

(iii) By Proposition 4.14, the sequence (nx) = (x, 2x, ...) is u.d. mod 1 for almost all real x, but clearly not for all x, as taking x ∈ Q shows. Consequently, by Theorem 4.2, (10^{nx}) is Benford for almost all, but not all, real x, e.g., (10^{nx}) is not Benford when x is rational.

(iv) By Proposition 4.6(iv) or Example 4.7(ii), the sequence (log n) is not u.d. mod 1, so by Theorem 4.2 the sequence (n) of positive integers is not Benford, and neither is (an) for any a ∈ R; see Figure 4.3.

(v) Let (pn) = (2, 3, 5, 7, 11, 13, 17, ...) denote the sequence of prime numbers. By the Prime Number Theorem, lim_{n→∞} pn/(n ln n) = 1; e.g., see [117, A000040]. Hence it follows from Proposition 4.6(iv) that (pn) is not Benford; see Figure 4.3. z

Example 4.18. The Fibonacci sequence (Fn) = (1, 1, 2, 3, 5, 8, 13, ...) is given explicitly (i.e., non-recursively) by the well-known formula (e.g., see [117, A000045])

Fn = (1/√5) ((1 + √5)/2)^n − (1/√5) ((1 − √5)/2)^n = (φ^n − (−φ^{−1})^n)/√5,   n ∈ N,

[Figure 4.3 plots #{1 ≤ n ≤ N : D1(xn) = d}/N against N, for d = 1 and d = 9, for the sequences (xn) = (⌊9 · 10^{n/7}⌋) = (12, 17, 24, 33, 46, ...), (Fn) = (1, 1, 2, 3, 5, ...), (2n) = (2, 4, 6, 8, 10, ...), and (pn) = (2, 3, 5, 7, 11, ...).]
Figure 4.3: For every Benford sequence (xn) and d ∈ {1, 2, ..., 9}, the sequence (#{1 ≤ n ≤ N : D1(xn) = d}/N) converges to log(1 + d^{−1}) as N → ∞. Thus if this sequence does not converge (bottom) or has a different limit (top right), then (xn) is not Benford; see Example 4.17.

where φ = (1 + √5)/2 ≈ 1.618. Since φ > 1 and, as is easily checked, log φ is irrational, Theorem 4.16 implies that (Fn) is Benford; see Figure 4.3. Sequences such as (Fn) which are generated by (linear) recurrence relations will be studied in detail in Chapters 6 and 7. z
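The conclusion of Example 4.18 is easy to corroborate numerically, using exact integer arithmetic for the Fibonacci numbers. Below is a minimal sketch (plain Python; the number of terms is an arbitrary choice of ours):

```python
import math

# First 5000 Fibonacci numbers, computed exactly by the recurrence.
N = 5000
fibs = []
a, b = 1, 1
for _ in range(N):
    fibs.append(a)
    a, b = b, a + b

# Tally leading digits and compare with the Benford probabilities log10(1 + 1/d).
counts = [0] * 10
for f in fibs:
    counts[int(str(f)[0])] += 1

for d in range(1, 10):
    print(d, counts[d] / N, math.log10(1 + 1 / d))
```

The observed frequencies track log10(1 + 1/d) closely, as Theorem 4.16 predicts.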

4.3 UNIFORM DISTRIBUTION OF RANDOM VARIABLES

This section focuses on tools for the analysis of the Benford property of random variables. First, Theorem 4.2 alone is used to exhibit several Benford and non-Benford random variables.

Example 4.19. (i) As a simple extension of Example 4.7(i), Theorem 4.2 implies that if U is uniform on (0, a) for some a > 0, then for every b ∈ R+, b^U is Benford if and only if a|log b| ∈ N.

(ii) No exponential random variable is u.d. mod 1. Specifically, let X be an exponential random variable with mean σ > 0, i.e.,

FX(x) = max{0, 1 − e^{−x/σ}},   x ∈ R,
so var X = σ². For every integer k ≥ 0,

P(k ≤ X < k + 1/2) = FX(k + 1/2) − FX(k) > FX(k + 1) − FX(k + 1/2) = P(k + 1/2 ≤ X < k + 1),

and since Σ_{k=0}^∞ P(k ≤ X < k + 1) = 1, this implies that

P(⟨X⟩ < 1/2) = Σ_{k=0}^∞ P(k ≤ X < k + 1/2) > 1/2,

showing that X is not u.d. mod 1. To obtain more quantitative information, observe that, for every 0 ≤ s < 1,

F⟨X⟩(s) = P(⟨X⟩ ≤ s) = Σ_{k=0}^∞ (FX(k + s) − FX(k)) = (1 − e^{−s/σ})/(1 − e^{−1/σ}),

from which it follows via a straightforward calculation that

R(σ) := max_{0≤s<1} |F⟨X⟩(s) − s| = (e^{1/σ} − 1)^{−1} − σ + σ ln(σe^{1/σ} − σ) > 0

contains either an irrational number, or rational numbers with arbitrarily large denominators. In either case, therefore, the set

SX1 := {⟨x1 + x2 + ... + xn⟩ : n ∈ N; x1, x2, ..., xn ∈ supp PX1} ⊂ [0, 1)

is dense in [0, 1). Since the forthcoming main argument rests on an application of the ergodic theorem, some ergodic theory notation and terminology will be used. Specifically, endow the space of sequences in [0, 1),

S∞ := [0, 1)^{N0} = {(sk)_{k∈N0} : sk ∈ [0, 1) for all k},

with the (product) σ-algebra

B∞ := ⊗_{k∈N0} B[0, 1) := σ({B0 × B1 × ... × Bk × [0, 1) × [0, 1) × ... : k ∈ N0; B0, B1, ..., Bk ∈ B[0, 1)}).

A probability measure P∞ is uniquely defined on (S∞, B∞) by setting

P∞(B0 × B1 × ... × Bk × [0, 1) × [0, 1) × ...) = P(X0 ∈ B0, X1 ∈ B1, ..., Xk ∈ Bk) = λ0,1(B0) Π_{j=1}^{k} P(Xj ∈ Bj)

for all k ∈ N0 and all B0, B1, ..., Bk ∈ B[0, 1). Define a map σ∞ of S∞ into itself by

σ∞((s0, s1, s2, ...)) = (⟨s0 + s1⟩, s2, s3, ...),   (sk)_{k∈N0} ∈ S∞.

Clearly, σ∞ is measurable, i.e., σ∞^{−1}(A) ∈ B∞ for every A ∈ B∞. Moreover, since X0 = U(0, 1) is independent of the i.i.d. variables X1, X2, ...,

P∞(σ∞^{−1}(B0 × B1 × ... × Bk × [0, 1) × [0, 1) × ...))
= P(⟨X0 + X1⟩ ∈ B0, X2 ∈ B1, ..., X_{k+1} ∈ Bk)
= λ0,1(B0) P(X1 ∈ B1, ..., Xk ∈ Bk)
= P∞(B0 × B1 × ... × Bk × [0, 1) × [0, 1) × ...),

which shows that the two probability measures P∞ and P∞ ∘ σ∞^{−1} on (S∞, B∞) are identical, i.e., σ∞ is P∞-preserving. The crucial step in the argument now consists in showing that actually more is true, namely,

σ∞ is ergodic.   (4.2)

Recall that this simply means that every σ∞-invariant set A ∈ B∞ has P∞-measure zero or one; more formally, P∞(σ∞^{−1}(A) ∆ A) = 0 for A ∈ B∞ implies that P∞(A) ∈ {0, 1}; here the symbol ∆ denotes the symmetric difference of two sets, i.e., A∆B = (A \ B) ∪ (B \ A). It is straightforward to show (see [134, p. 300]) that ergodicity of σ∞ follows from (and in fact is equivalent to) the following property:

If λ0,1(⟨B + X1⟩ ∆ B) = 0 with probability one for some B ∈ B[0, 1), then λ0,1(B) ∈ {0, 1};   (4.3)

here and throughout the remainder of the proof, for any B ∈ B[0, 1) and x ∈ R, the notation ⟨B + x⟩ := {⟨b + x⟩ : b ∈ B} will be used. Thus, to prove (4.2) it is enough to verify (4.3), and this will now be done. Assume, therefore, that λ0,1(B) > 0, and λ0,1(⟨B + X1⟩ ∆ B) = 0 with probability one. It follows that λ0,1(⟨B + X1 + ... + Xn⟩ ∆ B) = 0 with probability one for every n ∈ N, and hence by Fubini's Theorem, λ0,1(⟨B + s⟩ ∆ B) = 0 for every s ∈ SX1. Define a probability measure Q on ([0, 1), B[0, 1)) by

Q(C) = λ0,1(C ∩ B)/λ0,1(B),   C ∈ B[0, 1),

and observe that, for every C ∈ B[0, 1) and s ∈ SX1,

|Q(⟨C + s⟩) − Q(C)| = |λ0,1(⟨C + s⟩ ∩ B) − λ0,1(C ∩ B)| / λ0,1(B)
= |λ0,1(C ∩ ⟨B − s⟩) − λ0,1(C ∩ B)| / λ0,1(B)
≤ λ0,1(⟨B − s⟩ ∆ B) / λ0,1(B) = λ0,1(B ∆ ⟨B + s⟩) / λ0,1(B) = 0.

Thus Q(⟨C + s⟩) = Q(C) for every C ∈ B[0, 1) and s ∈ SX1. Taking Fourier coefficients yields, for every k ∈ Z,

e^{2πıks} Q̂(k) = Q̂(k)   for all s ∈ SX1.   (4.4)

Since SX1 is dense in [0, 1), given any k ∈ Z \ {0}, there exists a sequence (sn) in SX1 with lim_{n→∞} ⟨|k|sn⟩ = 1/2. Hence (4.4) implies that Q̂(k) = 0 for all k ≠ 0, which in turn shows that Q = λ0,1. Thus λ0,1(B)² = λ0,1(B), and since λ0,1(B) > 0, it follows that λ0,1(B) = 1. This establishes (4.3) and, as explained in the preceding paragraph, also (4.2). The scene is now set for an application of the Birkhoff Ergodic Theorem, which states that for every (Borel measurable) function f : [0, 1) → C with ∫_0^1 |f(x)| dx < +∞,

(1/n) Σ_{j=0}^{n} f(⟨s0 + s1 + ... + sj⟩) → ∫_0^1 f(x) dx   as n → ∞

for all (sk)_{k∈N0} ∈ S∞, with the possible exception of a P∞-nullset. In probabilistic terms, this means that

lim_{n→∞} (1/n) Σ_{j=0}^{n} f(⟨X0 + X1 + ... + Xj⟩) = ∫_0^1 f(x) dx   a.s.   (4.5)

Assume henceforth that f is actually continuous with lim_{x↑1} f(x) = f(0), e.g., f(x) = e^{2πıx}. For any such f, as well as any s ∈ [0, 1) and m ∈ N, denote the set

{ω ∈ Ω : lim sup_{n→∞} |(1/n) Σ_{j=1}^{n} f(⟨s + X1(ω) + ... + Xj(ω)⟩) − ∫_0^1 f(x) dx| < 1/m}

simply by Ω_{f,s,m}. By (4.5), 1 = ∫_0^1 P(Ω_{f,s,m}) ds, and hence P(Ω_{f,s,m}) = 1 for a.a. s ∈ [0, 1). Since f is uniformly continuous, for every m ≥ 2 there exists sm > 0 such that P(Ω_{f,sm,m}) = 1 and Ω_{f,sm,m} ⊂ Ω_{f,0,⌊m/2⌋}. From

1 = P(∩_{m≥2} Ω_{f,sm,m}) ≤ P(∩_{m≥2} Ω_{f,0,⌊m/2⌋}) ≤ 1,

it is clear that

lim_{n→∞} (1/n) Σ_{j=1}^{n} f(⟨X1 + ... + Xj⟩) = ∫_0^1 f(x) dx   a.s.   (4.6)

Since the intersection of countably many sets of full measure itself has full measure, choosing f(x) = e^{2πıkx}, k ∈ Z, in (4.6) shows that, with probability one,

lim_{n→∞} (1/n) Σ_{j=1}^{n} e^{2πık(X1 + ... + Xj)} = ∫_0^1 e^{2πıkx} dx = 0   for all k ∈ Z, k ≠ 0.   (4.7)

By Weyl's criterion [93, Thm. I.2.1], (4.7) is equivalent to

P((Σ_{j=1}^{n} Xj) is u.d. mod 1) = 1,

which establishes (i). Conversely, suppose (ii) does not hold, i.e., P(X1 ∈ (1/m)Z) = 1 for some m ∈ N. This means that mX1 is an integer with probability one, and so is mSn for all n ∈ N. In other words, (⟨mSn⟩) = (0, 0, ...) with probability one. But since a sequence (xn) is u.d. mod 1 if and only if (mxn) is u.d. mod 1 for all m ∈ N, by Proposition 4.3(i), this implies that (Sn) is not u.d. mod 1, i.e., (i) does not hold. 

Example 4.25. (i) If X1, X2, ... are i.i.d. and X1 is continuous, or, more generally, if P(X1 ∈ Q) = 0, then, by Theorem 4.24, the sequence of partial sums (Sn) is u.d. mod 1 with probability one.

(ii) Let X1, X2, ... be i.i.d. with P(X1 = j^{−1}) = 2^{−j} for all j ∈ N. Then, by Theorem 4.24, (Sn) is u.d. mod 1 with probability one even though all the increments S_{n+1} − Sn = X_{n+1} are rational with probability one, and so is Sn for every n. z
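The almost-sure conclusion of Theorem 4.24 is easy to visualize by simulating a single path of the random walk. Below is a minimal sketch (plain Python, with exponentially distributed increments as one choice of a continuous X1; seed and sample size are ours):

```python
import random

random.seed(3)

# One path of S_n = X_1 + ... + X_n reduced mod 1, with i.i.d. Exp(1) increments.
N = 100000
s, fracs = 0.0, []
for _ in range(N):
    s = (s + random.expovariate(1.0)) % 1.0
    fracs.append(s)

# Along this path, <S_n> should look uniformly distributed on [0, 1).
for thresh in (0.2, 0.5, 0.8):
    emp = sum(1 for f in fracs if f <= thresh) / N
    print(thresh, emp)
```

With a continuous increment distribution, Theorem 4.24(ii) holds, so almost every path behaves this way; repeating the run with a different seed gives the same picture.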

Clearly, if X1 has a density, then property (ii) holds in both Theorems 4.22 and 4.24, so if X1 , X2 , . . . are i.i.d. and X1 has a density, then the partial sums (Sn ) are u.d. mod 1 in both the distributional sense and with probability one. As it turns out, these same two properties also hold for every random walk on the unit circle that has identical increments, i.e., if Xn = X1 and hence Sn = nX1 for all n ∈ N — provided that X1 has a density. Proposition 4.26. If the random variable X1 has a density, then:

(i) lim_{n→∞} P(⟨nX1⟩ ≤ s) = s for all 0 ≤ s < 1;

(ii) The sequence (nX1) = (X1, 2X1, ...) is u.d. mod 1 with probability one.

This last proposition, which will not be used anywhere in this book, follows immediately from Theorem 4.2 and Theorem 8.8 below; it may also be proved directly using the same tools as in the proof of Theorem 4.22, together with the identity P̂⟨nX1⟩(k) = P̂⟨X1⟩(nk) and Proposition 4.14, respectively. Unlike properties (i) in Theorems 4.22 and 4.24, however, which both hold for all continuous X1, the conclusion of Proposition 4.26(i) may fail if X1 is continuous but does not have a density; see Example 8.9(ii) below.

Chapter Five

Scale-, Base-, and Sum-Invariance

The purpose of this chapter is to establish and illustrate three basic invariance properties of the Benford distribution that are instrumental in demonstrating whether or not certain datasets are Benford, and that also prove helpful for predicting which empirical data are likely to follow Benford's law closely.

5.1 THE SCALE-INVARIANCE PROPERTY

One popular hypothesis often related to Benford's law is that of scale-invariance. Informally put, scale-invariance captures the intuitively attractive notion that any universal law should be independent of units. For instance, if a sufficiently large aggregation of data is converted from meters to feet, US$ to €, etc., then while the individual numbers may change, any statement about the overall distribution of significant digits should not be affected by this conversion. R. Pinkham [124] credits Hamming with the idea of scale-invariance, and attempts to prove that the Benford distribution is the only scale-invariant distribution. This idea has subsequently been used by numerous authors to explain the appearance of Benford's law in many real-life data sets, by arguing that the data in question should be invariant under changes of scale and thus must be Benford. Although this scale-invariance conclusion is correct in the proper setting (see Theorem 5.3 below), Knuth [90] observed that Pinkham's argument fails because of the implicit assumption that there is a scale-invariant Borel probability measure on R+, when in fact no such probability measure exists; cf. [127]. Indeed, the only real-valued random variable X that is scale-invariant, i.e., such that X and aX have the same distribution for all scaling factors a > 0, is the random variable that is zero almost surely, that is, P(X = 0) = 1. Clearly, any such random variable is scale-invariant since X = aX with probability one. To see that this is the only scale-invariant random variable, simply observe that for any scale-invariant X,

P(X ≠ 0) = lim_{a→+∞} P(|X| > a^{−1}) = lim_{a→+∞} P(|a²X| > a) = lim_{a→+∞} P(|X| > a) = 0,

where the third equality is due to scale-invariance. Thus no non-zero random variable is scale-invariant.

Note, however, that the measure on (R+, B+) defined as

μ([c, d]) = ∫_c^d (log e)/x dx = log(d/c)   for every 0 < c < d

is scale-invariant. To see this, for each a > 0 let μa denote the measure induced by μ and the scaling with factor a, that is, μa = μ ∘ f^{−1} with the linear function f(x) = ax. Then

μa([c, d]) = ∫_{c/a}^{d/a} (log e)/x dx = log(d/c) = μ([c, d])   for every 0 < c < d,

and hence μa = μ. Note that μ is not finite, i.e., μ(R+) = +∞, but is σ-finite. (A measure μ on (Ω, A) is σ-finite if Ω = ∪_{n∈N} An for some sequence (An) in A, and μ(An) < +∞ for all n.) In a similar spirit, a sequence (xn) of real numbers may be called scale-invariant if

lim_{N→∞} #{1 ≤ n ≤ N : axn ∈ [c, d]}/N = lim_{N→∞} #{1 ≤ n ≤ N : xn ∈ [c, d]}/N

for all a > 0 and c < d. For example, the sequences (√n), (n^{−1}), and

(2, 2^{−1}, 2, 3, 2^{−1}, 3^{−1}, 2, ..., (n − 1)^{−1}, 2, 3, ..., n, 2^{−1}, 3^{−1}, ..., n^{−1}, 2, ...)

all are scale-invariant. As above, it is not hard to see that

lim_{N→∞} #{1 ≤ n ≤ N : xn ∈ [c, d]}/N = 0   for every c < d with cd > 0,

whenever (xn) is scale-invariant. Most entries of a scale-invariant sequence of real numbers, therefore, are close to either 0 or ±∞.

While no positive random variable X can be scale-invariant, as shown above, it may nevertheless have scale-invariant significant digits. For this, however, X has to be Benford. In fact, Theorem 5.3 below shows that being Benford is not only necessary but also sufficient for X to have scale-invariant significant digits. This result will first be stated in terms of probability distributions. Recall from Definition 2.9 that S denotes the significand σ-algebra on R+.

Definition 5.1. Let A ⊃ S be a σ-algebra on R+. A probability measure P on (R+, A) has scale-invariant significant digits if

P(aA) = P(A)   for all a > 0 and A ∈ S,

or, equivalently, if for all m ∈ N, all d1 ∈ {1, 2, ..., 9}, all dj ∈ {0, 1, ..., 9}, j ≥ 2, and all a > 0,

P({x : Dj(ax) = dj for j = 1, 2, ..., m}) = P({x : Dj(x) = dj for j = 1, 2, ..., m}).

Example 5.2. (i) The Benford probability measure B on (R+, S) has scale-invariant significant digits. This follows from Theorem 5.3 below but can also be seen from a direct calculation. Indeed, if A = ∪_{k∈Z} 10^k [1, 10^s] with 0 ≤ s < 1, then, given any a > 0,

aA = ∪_{k∈Z} 10^{k+log a} [1, 10^s] = ∪_{k∈Z} 10^{k+⟨log a⟩} [1, 10^s] = ∪_{k∈Z} 10^k B,

where the set B is given by

B = [10^{⟨log a⟩}, 10^{⟨log a⟩+s}]   if ⟨log a⟩ < 1 − s,
B = [1, 10^{⟨log a⟩−1+s}] ∪ [10^{⟨log a⟩}, 10]   if ⟨log a⟩ ≥ 1 − s.

From this, it follows that

B(aA) = (⟨log a⟩ + s) − ⟨log a⟩ = s   in the first case, and
B(aA) = (⟨log a⟩ − 1 + s) + (1 − ⟨log a⟩) = s   in the second case,

so in either case B(aA) = s = B(A),

showing that B has scale-invariant significant digits.

(ii) The Dirac probability measure δ1 at x = 1 does not have scale-invariant significant digits, since δ1({x : D1(x) = 1}) = 1 but δ1({x : D1(2x) = 1}) = 0.

(iii) Let the random variable X be uniform on (0, 1), i.e., X = U(0, 1). Then, for example,

P(D1(X) = 1) = 1/9 < 11/27 = P(D1((3/2)X) = 1),

so X does not have scale-invariant significant digits. z
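The contrast between (i) and (iii) can be reproduced by simulation: rescaling leaves the significant digits of a Benford sample essentially unchanged, but visibly shifts those of a uniform sample. Below is a minimal sketch (plain Python; seed, sample size, and the scaling factor 3 are choices of ours):

```python
import math
import random

random.seed(5)

def significand(x):
    # S(x) in [1, 10) for x != 0.
    return 10.0 ** (math.log10(abs(x)) % 1.0)

def p_first_digit_1(sample, a):
    # Empirical P(D1(aX) = 1), i.e., P(1 <= S(aX) < 2).
    return sum(1 for x in sample if significand(a * x) < 2.0) / len(sample)

N = 200000
benford_sample = [10.0 ** random.random() for _ in range(N)]  # X = 10^U, Benford
uniform_sample = [random.random() for _ in range(N)]          # X = U(0,1), not Benford

# Rescaling by 3 leaves the Benford digit frequency near log10(2) ...
print(p_first_digit_1(benford_sample, 1.0), p_first_digit_1(benford_sample, 3.0))
# ... but shifts the uniform sample's frequency substantially.
print(p_first_digit_1(uniform_sample, 1.0), p_first_digit_1(uniform_sample, 3.0))
```

For the uniform sample the unscaled frequency is near 1/9, in line with the exact values computed in (iii).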

Figure 5.1: Visualizing the scale-invariant significant digits of the Benford distribution B; see Example 5.2(i).

In fact, the Benford distribution is the only probability measure (on the significand σ-algebra S) having scale-invariant significant digits.

Theorem 5.3 (Scale-invariance characterization [73]). A probability measure P on (R+, A) with A ⊃ S has scale-invariant significant digits if and only if P(A) = B(A) for every A ∈ S, i.e., if and only if P is Benford.

Proof. Fix any probability measure P on (R+, A), denote by P0 its restriction to (R+, S), and let Q = P0 ∘ ℓ^{−1} with ℓ as in Lemma 3.14. By Lemma 3.14, Q is a probability measure on ([0, 1), B[0, 1)). Moreover, under the correspondence established by ℓ, the statement

(5.1)

Q(hB + xi) = Q(B) for all x ∈ R, B ∈ B[0, 1) ,

(5.2)

is equivalent to

where hB + xi = {hb + xi : b ∈ B}. Fix any random variable X for which the distribution of hXi is given by Q. Then (5.2) simply means that, for every x ∈ R, the distributions of hXi and hX + xi coincide. By Theorem 4.21, this is the case if and only if X is u.d. mod 1, i.e., Q = λ0,1 . (For the “if” part, note that a constant random variable is independent of every other random variable.) Hence (5.1) is equivalent to P0 ◦ ℓ−1 = λ0,1 = B ◦ ℓ−1 , and so, by Lemma 3.14, (5.1) is equivalent to P0 = B.  Example 5.4. For every integer k, let  −1 x log e if 10k ≤ x < 10k+1 , fk (x) = 0 otherwise ; P P also, let qk ≥ 0. If k∈Z qk = 1 then, by Example 3.6(ii), k∈Z qk fk is the + + density of a Benford probability measure P on (R , B ). By Theorem 5.3, P has scale-invariant signif cant digits. Note that, in full agreement with earlier observations, P is not scale-invariant, as       qk = P [10k , 10k+1 ) = P 10k−l [10l , 10l+1 ) = P [10l , 10l+1 ) = ql

cannot possibly hold for all pairs of integers (k, l).
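Example 5.4 can be verified numerically: within every decade, the density x⁻¹ log e integrates over {D1 = d} to exactly log(1 + 1/d), so the first-digit law of the mixture is Benford's regardless of the weights q_k. A small sketch (the function names are ours):

```python
import math

LOG10E = math.log10(math.e)

def digit_mass(d, k):
    """Integral of f_k(x) = (log10 e)/x over {x in [10^k, 10^(k+1)) : D1(x) = d}."""
    lo, hi = d * 10.0 ** k, (d + 1) * 10.0 ** k
    return LOG10E * (math.log(hi) - math.log(lo))  # = log10(hi/lo) = log10(1 + 1/d)

# Within every decade the conditional first-digit law is exactly Benford's,
# so any mixture over decades (weights q_k summing to 1) has Benford digits too.
for k in (-3, 0, 5):
    for d in range(1, 10):
        assert abs(digit_mass(d, k) - math.log10(1 + 1 / d)) < 1e-12
    assert abs(sum(digit_mass(d, k) for d in range(1, 10)) - 1.0) < 1e-12
```

The per-decade digit masses are independent of k, which is exactly why the q_k drop out of the first-digit distribution while still ruling out full scale-invariance of P.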


In analogy to Definition 5.1, a sequence (x_n) of real numbers is said to have scale-invariant significant digits if

lim_{N→∞} #{1 ≤ n ≤ N : S(a x_n) ≤ t}/N = lim_{N→∞} #{1 ≤ n ≤ N : S(x_n) ≤ t}/N   for all a > 0, t ∈ [1, 10) .     (5.3)

Implicit in (5.3) is the assumption that the limits on both sides exist for all t. The definition of a function having scale-invariant significant digits is analogous. To formulate an analogue of Theorem 5.3 using this terminology, recall that a set C ⊂ N has (natural) density ρ ∈ [0, 1] if lim_{N→∞} #{n ≤ N : n ∈ C}/N exists and equals ρ. For example, ρ({n : n even}) = 1/2 and ρ({n : n prime}) = 0, whereas {n : D1(n) = 1} does not have a density.
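The last claim is easy to see numerically: along N = 10^k the proportion of n ≤ N with D1(n) = 1 approaches 1/9, while along N = 2·10^k it approaches 5/9, so no single limit can exist. A quick sketch:

```python
def leading_one_prop(N):
    """Proportion of 1 <= n <= N whose first digit is 1."""
    return sum(str(n)[0] == "1" for n in range(1, N + 1)) / N

# Two subsequences of #{n <= N : D1(n) = 1}/N with different limits (1/9 vs. 5/9):
low = [leading_one_prop(10 ** k) for k in (2, 3, 4)]
high = [leading_one_prop(2 * 10 ** k) for k in (2, 3, 4)]
assert max(low) < 0.13    # e.g. 112/1000 = 0.112 at N = 10^3
assert min(high) > 0.55   # e.g. 1111/2000 = 0.5555 at N = 2*10^3
```

The proportion oscillates between these extremes forever, which is precisely the failure of natural density for {n : D1(n) = 1}.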


Theorem 5.5. (i) A sequence (x_n) of real numbers has scale-invariant significant digits if and only if the set {n : x_n ≠ 0} has density ρ ∈ [0, 1], and either ρ = 0 or else (x_{n_j})_{j∈N} is Benford, where n_1 < n_2 < . . . and {n : x_n ≠ 0} = {n_j : j ∈ N}. In particular, if ρ = 1 then the sequence (x_n) has scale-invariant significant digits if and only if it is Benford.

(ii) Assume λ({t ≥ 0 : f(t) = 0}) < +∞ for the (Borel measurable) function f : [0, +∞) → R. Then f has scale-invariant significant digits if and only if f is Benford.

Proof. (i) Assume first that (x_n) has scale-invariant significant digits. By (5.3),

G(s) = lim_{N→∞} #{n ≤ N : S(x_n) ≤ 10^s}/N

exists for every 0 ≤ s < 1. Clearly, 0 ≤ G(s) ≤ 1, and G is non-decreasing, and hence continuous except for at most countably many jump discontinuities. As will now be shown, G is in fact continuous on [0, 1). To this end, suppose that G were discontinuous at 0 < s_0 < 1. In this case, there exists δ > 0 such that G(s_0 + ε) − G(s_0 − ε) ≥ δ for all sufficiently small ε > 0. Pick any s_1 with 0 < s_1 < s_0 and note that, for all N ∈ N and ε < min{s_1, 1 − s_0},

#{n ≤ N : S(10^{s_1−s_0} x_n) ≤ 10^{s_1+ε}} − #{n ≤ N : S(10^{s_1−s_0} x_n) ≤ 10^{s_1−ε}}
    = #{n ≤ N : S(x_n) ≤ 10^{s_0+ε}} − #{n ≤ N : S(x_n) ≤ 10^{s_0−ε}} .

With the assumption that the sequence (x_n) has scale-invariant significant digits, it follows that

G(s_1 + ε) − G(s_1 − ε) = G(s_0 + ε) − G(s_0 − ε) ≥ δ ,

which in turn shows that G is discontinuous at s_1 as well. Since 0 < s_1 < s_0 was arbitrary, this is impossible, and consequently G is continuous on (0, 1). Moreover, for every 1 < t < 10 and all sufficiently small ε > 0,

lim sup_{N→∞} #{n ≤ N : S(x_n) = t}/N ≤ G(log t + ε) − G(log t − ε) ,

showing that ρ({n ∈ N : S(x_n) = t}) = 0. Since

G(0) = lim_{N→∞} #{n ≤ N : S(5x_n) ≤ 1}/N = lim_{N→∞} #{n ≤ N : x_n = 0 or S(x_n) = 2}/N ,

again by scale-invariant significant digits, it is clear that C := {n ∈ N : x_n ≠ 0} has density ρ = ρ(C) = 1 − G(0). In addition, G is also (right-)continuous at s = 0.


To complete the proof of (i), note that there is nothing else to show in the case G(0) = 1. From now on, therefore, assume G(0) < 1. Define a continuous, non-decreasing function H : [0, 1) → R as

H(s) = (G(s) − G(0)) / (1 − G(0)) ,   0 ≤ s < 1 .

Note that C is infinite, and with C = {n_j : j ∈ N}, where n_1 < n_2 < . . .,

H(s) = lim_{N→∞} #{n ≤ N : 1 ≤ S(x_n) ≤ 10^s} / #{n ≤ N : x_n ≠ 0} = lim_{N→∞} #{j ≤ N : S(x_{n_j}) ≤ 10^s}/N ;

so H only takes into account the non-zero entries of (x_n). Define h : R → R as

h(x) = H(⟨x⟩) − ⟨x⟩   for all x ∈ R .

Clearly, h is 1-periodic with h(0) = 0, and |h(x)| ≤ 1 for all x ∈ R. Note that in terms of the function H, the invariance property (5.3) simply reads

H(s) = H(1 + s − ⟨log a⟩) − H(1 − ⟨log a⟩)      if s < ⟨log a⟩ ,
H(s) = 1 − H(1 − ⟨log a⟩) + H(s − ⟨log a⟩)     if s ≥ ⟨log a⟩ ,

provided that log a ∉ Z. In terms of h, this is equivalent to

h(x) = h(1 + x − ⟨log a⟩) − h(1 − ⟨log a⟩)   for all x ∈ R, a > 0 .     (5.4)

Thus, the function h(1 + x − ⟨log a⟩) − h(x) is constant for every a > 0. Since h is bounded and 1-periodic, it can be represented (at least in the L²-sense) by a Fourier series h(x) = Σ_{k∈Z} c_k e^{2πıkx}, from which it follows that

h(1 + x − ⟨log a⟩) − h(x) = Σ_{k∈Z} c_k (e^{2πık(1+x−⟨log a⟩)} − e^{2πıkx})
                          = Σ_{k∈Z} c_k (e^{−2πık⟨log a⟩} − 1) e^{2πıkx} .

Fix a > 0 such that log a is irrational, e.g., a = 2. Then e^{−2πık⟨log a⟩} ≠ 1 whenever k ≠ 0, which in turn implies that c_k = 0 for all k ≠ 0, i.e., h is constant almost everywhere. Thus H(s) = s + c_0 for a.a. s ∈ [0, 1), and in fact H(s) ≡ s because H is continuous with H(0) = 0. In summary, therefore,

lim_{N→∞} #{j ≤ N : S(x_{n_j}) ≤ 10^s}/N = s   for all s ∈ [0, 1) ,

showing that (x_{n_j}) is Benford. For the converse in (i), note that if ρ = 0 then (5.3) holds with both sides equal to 1 for all t ∈ [1, 10). Assume, therefore, that ρ > 0 and (x_{n_j}) is Benford.


In this case, G(s) = sρ + 1 − ρ for all 0 ≤ s < 1. Thus h(x) ≡ 0, so (5.4) and hence also (5.3) hold, i.e., (x_n) has scale-invariant significant digits.

The proof of (ii) is completely analogous, utilizing the function

G(s) = lim_{T→+∞} λ({τ ≤ T : S(f(τ)) ≤ 10^s})/T ,   0 ≤ s < 1 .

Note that the assumption λ({t ≥ 0 : f(t) = 0}) < +∞ implies G(0) = 0 whenever f has scale-invariant significant digits. □

Example 5.6. Let (x_n) be the sequence of either Fibonacci or prime numbers. In both cases, x_n ≠ 0 for all n, and hence by Theorem 5.5(i) (x_n) has scale-invariant significant digits if and only if it is Benford. Thus by Examples 4.18 and 4.17(v), respectively, the sequence (F_n) has scale-invariant significant digits, and (p_n) does not. These facts are illustrated empirically in Figures 5.2 to 5.4, which show the relevant data for a = 2 and a = 7, and for the first 10^2 (Figures 5.2 and 5.3) and 10^4 (Figure 5.4) entries of both sequences.

Remark. The notion of scale-invariant digits for sequences (x_n) in (5.3) is somewhat more general than that for probability measures P in Definition 5.1, in that the entries of (x_n) need not all be positive, whereas P(R⁺) = 1. It would be straightforward to extend Definition 5.1, and also Definition 5.10 below, to probability measures on (R, A) with A ⊃ σ(S). However, since the ensuing characterizations of Benford's law (Theorems 5.3 and 5.13, respectively) would in turn be more cumbersome to state, throughout the remainder of this chapter it will be assumed for simplicity that P(R⁺) = 1, or at the very least that P({0}) = 0.

The next example is an elegant and entertaining application of the ideas underlying Theorems 5.3 and 5.5 to the mathematical theory of games. The game may easily be understood by a schoolchild, yet it has proven a challenge for game theorists not familiar with Benford's law, especially Theorem 5.3.

Example 5.7 ([109]).
Consider a two-person game where Player A and Player B each independently choose a (real) number greater than 1, and Player A wins if the product of their two numbers starts with a 1, 2, or 3; otherwise, Player B wins. Using the tools presented in this section, it may easily be seen that there is a strategy for Player A to choose her numbers so that she wins with probability at least log 4 ≈ 60.2%, no matter what strategy Player B uses. Conversely, there is a strategy for Player B so that Player A wins no more than log 4 of the time, no matter what strategy Player A uses. The idea is simple, using the scale-invariance property of Benford's law established in Theorem 5.3: If Player A chooses her number X randomly according to Benford's law, then since Benford's law is scale-invariant, it follows from Theorem 4.21(i) and Example 5.2(i) that X · y is still Benford no matter what number y Player B chooses, so Player A will win with the probability that a



Figure 5.2: The first one hundred Fibonacci numbers have approximately scale-invariant significant digits: Multiplying F_1, F_2, . . . , F_100 by 2 (top half) and by 7, respectively, leaves the distribution of first significant digits virtually unchanged; also see Figure 3.1.
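The empirical behavior summarized in Figure 5.2 is easy to reproduce. The sketch below counts first digits of F_n, 2F_n, and 7F_n for n ≤ 100 and compares them with the Benford probabilities (the tolerance is our choice; the figure's deviations are well under two percentage points):

```python
import math
from collections import Counter

def d1(n):
    """First digit of a positive integer."""
    return int(str(n)[0])

# First 100 Fibonacci numbers (exact, using Python's big integers)
fib = [1, 1]
while len(fib) < 100:
    fib.append(fib[-1] + fib[-2])

for a in (1, 2, 7):  # multiply the whole sequence by a and count first digits
    freq = Counter(d1(a * f) for f in fib)
    for d in range(1, 10):
        # within a few percentage points of the Benford probability log10(1 + 1/d)
        assert abs(freq[d] / 100 - math.log10(1 + 1 / d)) < 0.05
```

Running the same loop over the first hundred primes instead of Fibonacci numbers makes the assertions fail badly, in line with Figure 5.3.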

Benford random variable has first significant digit less than 4, i.e., with probability exactly log 4. Conversely, if Player B chooses his number Y according to Benford's law then, using scale-invariance again, x · Y is Benford, so Player A will again win with probability exactly log 4. In game-theoretic parlance, this means that the game has a value, and the value is log 4. Moreover, it can be



Figure 5.3: The first one hundred prime numbers do not have scale-invariant significant digits: Multiplying p_1, p_2, . . . , p_100 by 2 (top half) and by 7, respectively, leads to very different distributions of first significant digits; also see Figure 3.2.

shown that Benford's law is the only optimal strategy for each player [20, pp. 48–50].

Theorem 5.3 shows that for a probability measure P on (R⁺, B⁺) to have scale-invariant significant digits it is necessary (and sufficient) that P be Benford. In fact, as noted in [150], this conclusion already follows from a much weaker assumption: It is enough to require that the probability of a single significant digit remain unchanged under scaling.

Theorem 5.8. For every random variable X with P(X = 0) = 0 the following statements are equivalent:

(i) X is Benford;

(ii) There exists a number d ∈ {1, 2, . . . , 9} such that

P(D1(aX) = d) = P(D1(X) = d)

for all a > 0 ;

(iii) There exists a number d ∈ {1, 2, . . . , 9} such that

P(D1(aX) = d) = log(1 + d⁻¹)

for all a > 0 .

Figure 5.4: When the sample size is increased from N = 10^2 to N = 10^4, the Fibonacci numbers are even closer to scale-invariance (top half). For the primes, this is not the case; also see Figure 3.3.

Proof. Assume first that X is Benford. By Theorem 5.3, X has scale-invariant significant digits. Thus for every a > 0,

P(D1(aX) = d) = P(D1(X) = d) = log(1 + d⁻¹)   for all d = 1, 2, . . . , 9 ,

so (i) implies (ii) and (iii). Clearly (ii) follows from (iii), so assume next that (ii) holds. For every 0 ≤ s < 1 let G(s) = P(S(X) < 10^s). Hence G is non-decreasing and left-continuous, with G(0) = 0, and

P(D1(X) = d) = G(log(1 + d)) − G(log d) .

Define a function g : R → R by setting g(x) = G(⟨x⟩) − ⟨x⟩. Thus g is 1-periodic and Riemann-integrable, with g(0) = 0 and |g(x)| ≤ 1. Specifically,

P(D1(X) = d) = g(log(1 + d)) − g(log d) + log(1 + d⁻¹) ,

and a straightforward calculation shows that, for all a > 0,

P(D1(aX) = d) = g(log(1 + d) − ⟨log a⟩) − g(log d − ⟨log a⟩) + log(1 + d⁻¹) .

With this, the assumption that P(D1(aX) = d) = P(D1(X) = d) for all a > 0 simply means that the function g(log(1 + d) − x) − g(log d − x) is constant. A

73

SCALE-, BASE-, AND SUM-INVARIANCE

Fourier series argument similar to that in the proof of Theorem 5.5 now applies: With g(x) = Σ_{k∈Z} c_k e^{2πıkx}, it follows that

g(log(1 + d) − x) − g(log d − x) = Σ_{k∈Z} c_k (e^{2πık log(1+d)} − e^{2πık log d}) e^{−2πıkx}
                                 = Σ_{k∈Z} c_k e^{2πık log d} (e^{2πık log(1+d⁻¹)} − 1) e^{−2πıkx} ,

and since log(1 + d⁻¹) is irrational for every d ∈ N, necessarily c_k = 0 for all k ≠ 0; hence g is constant almost everywhere, and so G(s) = s + c_0 for a.a. s ∈ [0, 1). Since G is non-decreasing and left-continuous, G(s) = s + c_0 for all 0 < s < 1, and since 0 ≤ G(s) ≤ 1, it follows that c_0 = 0, i.e., G(s) ≡ s, which in turn implies (i). Thus the assertions (i), (ii), and (iii) are all equivalent. □

Close inspection of the above proof shows that Theorem 5.8 can be strengthened in various ways. On the one hand, other significant digits can be considered. For example, the theorem and its proof remain largely unchanged if in (ii) it is assumed that, for some m ≥ 2 and some d ∈ {0, 1, . . . , 9},

P(Dm(aX) = d) = P(Dm(X) = d)   for all a > 0 .

On the other hand, it is enough to assume in (ii) that, for some d ∈ {1, 2, . . . , 9},

P(D1(a_n X) = d) = P(D1(X) = d)   for all n ∈ N ,

with the sequence (a_n) of positive numbers being such that {⟨log a_n⟩ : n ∈ N} is dense in [0, 1). Possible choices for such a sequence include (2^n), (n^2), and the sequence of prime numbers. Thus, for example, X is Benford if and only if

P(D1(2^n X) = 1) = P(D1(X) = 1)   for all n ∈ N .

Example 5.9 ("Ones-scaling test" [150]). In view of Theorem 5.8, to informally test whether a sample of data comes from a Benford distribution, simply compare the proportion of the sample that has first significant digit 1 with the proportion after the data have been rescaled, i.e., multiplied by a, a^2, a^3, . . ., where log a is irrational, e.g., a = 2. In fact, it is enough to only consider rescalings by, for instance, a^{n²} with n = 1, 2, 3, . . .. On the other hand, note that merely assuming

P(D1(2X) = d) = P(D1(X) = d)   for all d = 1, 2, . . . , 9     (5.5)

is not sufficient to guarantee that X is Benford. For instance, (5.5) is satisfied if X attains each of the four values 1, 2, 4, 8 with equal probability 1/4.

Recall that a random variable X is discrete if P(X ∈ C) = 1 for some countable set C ⊂ R. Clearly, no discrete random variable that is not identically zero can have scale-invariant significant digits. On the other hand, even if C is finite, X may be close to being scale-invariant. In this context, [23] introduces a scale-distortion metric for n-point data sets, and shows that a sequence of real numbers is Benford if and only if the scale-distortion of the first n entries (data points) tends to zero as n goes to infinity.
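The ones-scaling test of Example 5.9, together with the counterexample to (5.5), can be sketched as follows (the sample sizes, the non-Benford comparison distribution, and the tolerances are our choices):

```python
import math
import random
from collections import Counter

def d1(x):
    """First significant decimal digit of x != 0 (robust to tiny float error)."""
    return int(f"{abs(x):.6e}"[0])

def ones_prop(data):
    return sum(d1(x) == 1 for x in data) / len(data)

random.seed(2)
benford = [10 ** random.random() for _ in range(50_000)]
narrow = [random.lognormvariate(0.0, 0.1) for _ in range(50_000)]  # far from Benford

def ones_scaling(data, a=2.0, steps=6):
    """Proportion of leading 1s after rescaling by a^0, a^1, ..., a^(steps-1)."""
    return [ones_prop([a ** n * x for x in data]) for n in range(steps)]

# Benford data: the proportion stays near log10(2) under every rescaling ...
assert all(abs(p - math.log10(2)) < 0.02 for p in ones_scaling(benford))
# ... while for the narrow distribution it swings wildly with the scale:
props = ones_scaling(narrow)
assert max(props) - min(props) > 0.3

# Counterexample (5.5): X uniform on {1, 2, 4, 8} has the same D1-distribution
# as 2X (the digits {1, 2, 4, 8} map to {2, 4, 8, 1}), yet X is not Benford.
assert Counter(d1(x) for x in [1, 2, 4, 8]) == Counter(d1(2 * x) for x in [1, 2, 4, 8])
```

Note that the counterexample only matches the distribution for the single scale a = 2; rescaling by 3, say, already breaks the match, in line with Theorem 5.8(ii) requiring invariance for all a > 0.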

5.2 THE BASE-INVARIANCE PROPERTY

One possible drawback to the hypothesis of scale-invariance in some tables is the special role played by the constant 1. For example, consider two physical laws, namely, Newton's lex secunda F = ma and his universal law of gravitation F = Gm_1m_2/r^2. Both laws involve universal constants. In the lex secunda, the constant is usually made equal to 1 by the choice of units of measurement, and this 1 is then not recorded in most tables of universal constants. On the other hand, the constant G in the universal law of gravitation is typically recorded as a fundamental constant. If a "complete" list of universal physical constants also included the 1s, it would seem plausible that this special constant might occur with strictly positive frequency. But that would clearly violate scale-invariance, since then the constant 2, and in fact every other constant as well, would occur with this same positive probability, which is impossible.

Instead, suppose it is assumed that any reasonable universal significant-digit law should have base-invariant significant digits, that is, the law should be equally valid when rewritten in terms of bases other than 10. In fact, "every argument that applies to 10 applies to [other bases] b mutatis mutandis" [127, p. 536]. As will be seen shortly, a hypothesis of base-invariant significant digits characterizes mixtures of Benford's law and a Dirac probability measure concentrated at the special constant 1, which may occur with positive probability.

Just as the only scale-invariant real-valued random variable is one that is 0 with probability one, the only positive random variable X that is base-invariant, i.e., X = 10^Y for some random variable Y such that Y, 2Y, 3Y, . . . all have the same distribution, is the random variable that almost surely equals 1, that is, P(X = 1) = 1. This follows from the fact that nY has the same distribution for all n ∈ N, and hence P(Y = 0) = 1, as seen earlier.
On the other hand, a positive random variable (or sequence, function, distribution) can have base-invariant significant digits. The idea behind base-invariance of significant digits is simply this: The base-10 significand event A corresponds to the base-100 event A^{1/2}, since the new base b = 100 is the square of the original base b = 10. As a concrete example, denote by A the set of positive real numbers with first significant digit 1, i.e.,

A = {x > 0 : D1(x) = 1} = {x > 0 : S(x) ∈ [1, 2)} .

It is easy to see that A^{1/2} is the set

A^{1/2} = {x > 0 : S(x) ∈ [1, √2) ∪ [√10, √20)} .

Now consider the base-100 significand function S100, i.e., for any x ≠ 0, S100(x) is the unique number in [1, 100) such that |x| = 100^k S100(x) for some (necessarily unique) integer k. (To emphasize that the usual significand function S is taken relative to base 10, it will be denoted by S10 throughout this section.) Clearly,

A = {x > 0 : S100(x) ∈ [1, 2) ∪ [10, 20)} .
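The correspondence between S10 and S100 can be checked directly; here is a small sketch (the `significand` helper is ours):

```python
import math

def significand(x, base=10):
    """S_b(x): the unique number in [1, base) with |x| = base^k * S_b(x), k integer."""
    return base ** (math.log(abs(x), base) % 1)

# D1(x) = 1, i.e. S10(x) in [1, 2), holds exactly when S100(x) in [1, 2) u [10, 20).
for x in (1.5, 15.0, 150.0, 0.015, 1234.0, 2.5, 0.3, 99.0):
    in_A = 1 <= significand(x, 10) < 2
    in_A_base100 = (1 <= significand(x, 100) < 2) or (10 <= significand(x, 100) < 20)
    assert in_A == in_A_base100
```

For instance, 150 has S10 = 1.5 and S100 = 1.5 (since 150 = 100^1 · 1.5), while 1234 has S10 = 1.234 but S100 = 12.34, landing in the second interval of the base-100 event.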


Hence, letting s = log 2,

{x > 0 : S_b(x) ∈ [1, b^{s/2}) ∪ [b^{1/2}, b^{(1+s)/2})} = A^{1/2}   if b = 10 ,
{x > 0 : S_b(x) ∈ [1, b^{s/2}) ∪ [b^{1/2}, b^{(1+s)/2})} = A         if b = 100 .

Thus, if a distribution P on the significand σ-algebra S has base-invariant significant digits, then P(A) and P(A^{1/2}) should be the same, and similarly for other integral roots (corresponding to other integral powers of the original base b = 10). Thus P(A) = P(A^{1/n}) should hold for all n ∈ N. (Recall from Lemma 2.15(iii) that A^{1/n} ∈ S for all A ∈ S and n ∈ N, so those probabilities are well-defined.) This motivates the following definition.

Definition 5.10. Let A ⊃ S be a σ-algebra on R⁺. A probability measure P on (R⁺, A) has base-invariant significant digits if P(A) = P(A^{1/n}) for all A ∈ S and all n ∈ N.

Example 5.11. (i) Recall that δ_a denotes the Dirac measure concentrated at the point a, that is, δ_a(A) = 1 if a ∈ A, and δ_a(A) = 0 if a ∉ A. The probability measure δ1 clearly has base-invariant significant digits since 1 ∈ A if and only if 1 ∈ A^{1/n}. Similarly, δ_{10^k} has base-invariant significant digits for every k ∈ Z. On the other hand, δ2 does not have base-invariant significant digits since, with A = {x > 0 : S10(x) ∈ [1, 3)}, δ2(A) = 1 but δ2(A^{1/2}) = 0.

(ii) It is easy to see that the Benford distribution B has base-invariant significant digits. Indeed, for any 0 ≤ s < 1, let

A = {x > 0 : S10(x) ∈ [1, 10^s]} = ⋃_{k∈Z} 10^k [1, 10^s] ∈ S .

Then, as seen in the proof of Lemma 2.15(iii),

A^{1/n} = ⋃_{k∈Z} 10^k ⋃_{j=0}^{n−1} [10^{j/n}, 10^{(j+s)/n}] ,

and therefore

B(A^{1/n}) = Σ_{j=0}^{n−1} (log 10^{(j+s)/n} − log 10^{j/n}) = Σ_{j=0}^{n−1} ((j + s)/n − j/n) = s = B(A) .

(iii) The uniform distribution λ0,1 on (0, 1) does not have base-invariant significant digits. For instance, again taking A = {x > 0 : D1(x) = 1} yields

λ0,1(A^{1/2}) = Σ_{n∈N} 10^{−n} (√2 − 1 + √20 − √10) = 1/9 + (1/9)(√5 − 1)(2 − √2) > 1/9 = λ0,1(A) .

(iv) The probability measure (1/2)δ1 + (1/2)B has base-invariant significant digits since both δ1 and B do.
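The computation in (ii), and the failure in (iii), can be replayed numerically; a short sketch (the helper names are ours):

```python
import math

def benford_mass(intervals):
    """B-mass of {x > 0 : S10(x) in union of intervals}, each within [1, 10)."""
    return sum(math.log10(b) - math.log10(a) for a, b in intervals)

def nth_root_event(s, n):
    """Significand intervals of A^(1/n) for A = {x > 0 : S10(x) <= 10^s}."""
    return [(10 ** (j / n), 10 ** ((j + s) / n)) for j in range(n)]

# (ii): B(A^(1/n)) = s = B(A) for every n
for s in (0.1, 0.5, 0.99):
    for n in (1, 2, 3, 7):
        assert abs(benford_mass(nth_root_event(s, n)) - s) < 1e-9

# (iii): under the uniform distribution on (0, 1), A = {D1 = 1} has mass 1/9,
# but A^(1/2) = {S10 in [1, sqrt 2) u [sqrt 10, sqrt 20)} gets strictly more:
mass_half = sum(10.0 ** -n for n in range(1, 30)) * (
    (math.sqrt(2) - 1) + (math.sqrt(20) - math.sqrt(10))
)
assert mass_half > 1 / 9
```

The value of `mass_half` is about 0.1916, matching the closed form 1/9 + (1/9)(√5 − 1)(2 − √2) in (iii).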



Figure 5.5: Visualizing the base-invariant significant digits of Benford's law; see Example 5.11(ii).

Example 5.12. Completely analogously to the case of scale-invariance, it is possible to introduce a notion of a sequence or function having base-invariant significant digits and to formulate an analogue of Theorem 5.5 in the context of Theorem 5.13 below. With this, the sequence (F_n) has base-invariant significant digits, whereas the sequence (p_n) does not. This is illustrated empirically in Figures 5.6 to 5.8.

The next theorem is the main result for base-invariant significant digits. It shows that convex combinations as in Example 5.11(iv) are the only probability distributions with base-invariant significant digits. To put the argument in perspective, recall that the proof of the scale-invariance theorem (Theorem 5.3) ultimately depended on Theorem 4.21, which in turn was proved analytically using Fourier analysis. The situation here is similar: An analytical result (Lemma 5.15 below) identifies all probability measures on ([0, 1), B[0, 1)) that are invariant under every map T_n(s) = ⟨ns⟩ on [0, 1). Once this tool is available, it will be straightforward to prove the following analogue of Theorem 5.3.

Theorem 5.13 (Base-invariance characterization [73]). A probability measure P on (R⁺, A) with A ⊃ S has base-invariant significant digits if and only if, for some q ∈ [0, 1],

P(A) = q δ1(A) + (1 − q) B(A)   for every A ∈ S .     (5.6)

Corollary 5.14. A continuous probability measure P on R⁺ has base-invariant significant digits if and only if P(A) = B(A) for all A ∈ S, i.e., if and only if P is Benford.

Recall that λ0,1 is Lebesgue measure on ([0, 1), B[0, 1)), and T_n(s) = ⟨ns⟩ for every n ∈ N and s ∈ [0, 1). Generally, if T : [0, 1) → R is measurable and


Figure 5.6: Illustrating the (approximate) base-invariance of the first one hundred Fibonacci numbers; see Example 5.12.

T([0, 1)) ⊂ [0, 1), a probability measure P on ([0, 1), B[0, 1)) is said to be T-invariant (or, T is P-preserving) if P ∘ T⁻¹ = P. Which probability measures are T_n-invariant for all n ∈ N? A complete answer to this question is provided by the following lemma.

Lemma 5.15 ([73]). A probability measure P on ([0, 1), B[0, 1)) is T_n-invariant for all n ∈ N if and only if P = q δ0 + (1 − q) λ0,1 for some q ∈ [0, 1].

Proof. Let (P̂(k))_{k∈Z} denote the Fourier coefficients of P, and observe that

(P ∘ T_n⁻¹)^(k) = P̂(nk)   for all k ∈ Z, n ∈ N .



Figure 5.7: Illustrating the lack of base-invariance for the first one hundred prime numbers; see Example 5.12.

Assume first that P = q δ0 + (1 − q) λ0,1 for some q ∈ [0, 1]. From δ̂0(k) ≡ 1 and λ̂0,1(k) = 0 for all k ≠ 0, it follows that

P̂(k) = 1   if k = 0 ,
P̂(k) = q   if k ≠ 0 .

For every n ∈ N and k ∈ Z\{0}, therefore, (P ∘ T_n⁻¹)^(k) = q, and clearly (P ∘ T_n⁻¹)^(0) = 1. Thus (P ∘ T_n⁻¹)^(k) = P̂(k) for all k ∈ Z, and Lemma 4.20(iii) shows that P ∘ T_n⁻¹ = P for all n ∈ N.

Conversely, assume P is T_n-invariant for all n ∈ N. In this case, for every n ∈ N, P̂(n) = (P ∘ T_n⁻¹)^(1) = P̂(1), and also P̂(−n) = (P ∘ T_n⁻¹)^(−1) = P̂(−1).


Figure 5.8: Increasing the sample size from N = 10^2 to N = 10^4 makes the sample of Fibonacci numbers' leading digits even closer to being base-invariant (top half). As in the case of scale-invariance, this is not at all true for the primes; see Figures 3.3 and 5.4. Note, however, that (p_n^7) conforms to the first-digit law considerably better than (p_n); cf. Theorem 8.8 below.

Since P̂(−k) is the complex conjugate of P̂(k), there exists q ∈ C such that

P̂(k) = q    if k > 0 ,
P̂(k) = 1    if k = 0 ,
P̂(k) = q̄    if k < 0 .

Also, observe that, for every x ∈ R,

lim_{n→∞} (1/n) Σ_{j=1}^{n} e^{2πıjx} = 1   if x ∈ Z ,
lim_{n→∞} (1/n) Σ_{j=1}^{n} e^{2πıjx} = 0   if x ∉ Z .

Using this and the Dominated Convergence Theorem, it follows from

P({0}) = lim_{n→∞} (1/n) Σ_{j=1}^{n} ∫_0^1 e^{2πıjs} dP(s) = lim_{n→∞} (1/n) Σ_{j=1}^{n} P̂(j) = q

that q is real, and in fact q ∈ [0, 1]. Hence the Fourier coefficients of P are exactly the same as those of the probability measure q δ0 + (1 − q) λ0,1. By uniqueness of Fourier coefficients (Lemma 4.20(iii)), therefore, P = q δ0 + (1 − q) λ0,1. □
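The "if" direction of Lemma 5.15 can be illustrated by simulation: pushing a sample of P = q δ0 + (1 − q) λ0,1 forward under T_n leaves both the atom at 0 and the uniform part in place. A rough sketch (the sample size, the value of q, and the tolerances are our choices):

```python
import random

random.seed(3)
q = 0.3
# Sample from P = q*delta_0 + (1 - q)*lambda_{0,1}
sample = [0.0 if random.random() < q else random.random() for _ in range(100_000)]

for n in (2, 3, 10):
    image = [(n * s) % 1.0 for s in sample]  # push forward under T_n(s) = <ns>
    # the atom at 0 keeps mass q ...
    assert abs(image.count(0.0) / len(image) - q) < 0.01
    # ... and the distributions agree at a few test points of the CDF
    for t in (0.25, 0.5, 0.75):
        before = sum(s <= t for s in sample) / len(sample)
        after = sum(s <= t for s in image) / len(image)
        assert abs(before - after) < 0.01
```

By contrast, pushing forward a measure that is not of this form, say a point mass at 1/3, under T_3 moves all its mass to 0, so invariance for all n fails, as the "only if" direction demands.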


Note that P is T_{mn}-invariant if it is both T_m- and T_n-invariant. Thus, in Lemma 5.15 it is enough to require only that P be T_n-invariant whenever n is a prime number. It is natural to ask how small the set M of natural numbers n can be so that T_n-invariance for all n ∈ M suffices in the "only if" part of Lemma 5.15. By the observation just made, it can be assumed that M is closed under multiplication, and hence is a (multiplicative) semi-group. If M is lacunary, i.e., M ⊂ {p^m : m ∈ N} for some p ∈ N, then probability measures P satisfying P ∘ T_n⁻¹ = P for all n ∈ M exist in abundance, so M is too small to imply the "only if" part of Lemma 5.15. If, on the other hand, M is not lacunary, then in general it is not known whether an appropriate analogue of Lemma 5.15 may hold. For example, if M = {2^{m1} 3^{m2} : m1, m2 ∈ N0} then the probability measure P = (1/4) Σ_{j=1}^{4} δ_{j/5} is T_n-invariant for every n ∈ M, but it is a famous open question of Furstenberg [54] whether any continuous probability measure with this property exists, except, of course, for P = λ0,1. In the context of Benford's law, this famous question can equivalently be stated as follows: If X is a continuous random variable for which X, X^2, and X^3 all have the same distribution of significant digits, does it follow that X is Benford?

Proof of Theorem 5.13. As in the proof of Theorem 5.3, fix any probability measure P on (R⁺, A), denote by P0 its restriction to (R⁺, S), and consider the probability measure Q = P0 ∘ ℓ⁻¹. Observe that P0 has base-invariant significant digits if and only if Q is T_n-invariant for all n ∈ N. Indeed, with 0 ≤ s < 1 and A = {x > 0 : S10(x) ≤ 10^s},

Q ∘ T_n⁻¹([0, s]) = Q(⋃_{j=0}^{n−1} [j/n, (j + s)/n]) = P0(⋃_{k∈Z} 10^k ⋃_{j=0}^{n−1} [10^{j/n}, 10^{(j+s)/n}]) = P0(A^{1/n}) ,

and hence Q ∘ T_n⁻¹ = Q for all n precisely if P0 has base-invariant significant digits. In this case, by Lemma 5.15, Q = q δ0 + (1 − q) λ0,1 for some q ∈ [0, 1], which in turn implies that P0(A) = q δ1(A) + (1 − q) B(A) for every A ∈ S. □

5.3 THE SUM-INVARIANCE PROPERTY

No finite data set can obey Benford’s law exactly, since the Benford probabilities of sets with m given significant digits become arbitrarily small as m goes to infinity, and no discrete probability measure with finitely many atoms can take arbitrarily small positive values. But, as first observed by Nigrini [112], if a table of real data approximately follows Benford’s law, then the sum of the significands of all entries in the table with first significant digit 1 is very close to the sum of the significands of all entries with first significant digit 2, and to the sum of the significands of entries with the other possible first significant digits as well. This clearly implies that the table must contain more entries starting with 1 than with 2, more entries starting with 2 than with 3, and so forth. Similarly,


the sums of significands of entries with (D1, D2, . . . , Dm) = (d1, d2, . . . , dm) are approximately equal for all tuples (d1, d2, . . . , dm) of a fixed length m. In fact, even the sum-invariance of first or first and second digits yields a distribution close to Benford’s law; see Figures 5.9 and 5.11. Nigrini conjectured, and partially proved, that this sum-invariance property also characterizes Benford’s law. Note that it is the significands of the data, rather than the data themselves, that are added. Simply summing up the raw data will not lead to any meaningful conclusion, since the resulting sums may be dominated by a few very large numbers. It is only through considering significands that the magnitude of the individual numbers becomes irrelevant.

    d                  1      2      3      4      5      6      7      8      9
    Nd = #{xn = d}   2520   1260    840    630    504    420    360    315    280
    100·Nd/N        35.34  17.67  11.78   8.83   7.06   5.89   5.04   4.41   3.92

    N = Σd Nd = 7129   (∆ = 5.24)

Figure 5.9: The smallest sample x1, x2, . . . , xN from {1, 2, . . . , 9} that has exact sum-invariance for the first significant digit consists of N = 7129 elements; see Definition 5.16 below.

To motivate a precise definition of sum-invariance, let A = {x1, x2, . . . , xN} be a finite data set (unordered tuple) of N different positive numbers, as in Nigrini’s framework. Then the sum of the significands of all entries in A with first significant digit d is simply

    Σ_{j=1}^{N} S(xj) 1_{[d,d+1)}(D1(xj)) = Σ_{j=1}^{N} S(xj) 1_{[d,d+1)}(S(xj)).

(Recall that 1_C denotes the indicator function of the set C, that is, 1_C(x) = 1 if x ∈ C, and 1_C(x) = 0 otherwise.) Thus if X is a random variable uniformly distributed on A, the expected value of the sum of the significands of the entries with first significant digit d is simply

    (1/N) Σ_{j=1}^{N} S(xj) 1_{[d,d+1)}(S(xj)) = E[S(X) 1_{[d,d+1)}(S(X))],

so X has sum-invariant first significant digits if E[S(X) 1_{[d,d+1)}(S(X))] is independent of d ∈ {1, 2, . . . , 9}; see Figure 5.9 for such a finite data set A. Generalizing this to second and higher significant digits leads to the following natural notion of sum-invariance of significant digits for random variables. For convenience, for all m ∈ N, all d1 ∈ {1, 2, . . . , 9}, and all dj ∈ {0, 1, . . . , 9}, j ≥ 2, let

    C(d1, d2, . . . , dm) = {t ∈ [1, 10) : Dj(t) = dj for j = 1, 2, . . . , m}.


Note that C(d1, d2, . . . , dm) ⊂ [1, 10) is an interval of length 10^{1−m}; in fact, 10^{m−1} C(d1, d2, . . . , dm) = [N, N + 1) with N = Σ_{j=1}^{m} 10^{m−j} dj ∈ N. Also, clearly S(x) ∈ C(D1(x), D2(x), . . . , Dm(x)) for all x ≠ 0 and m ∈ N. For example, C(4, 0, 3) = [4.03, 4.04) = 10^{−2} [403, 404).

Definition 5.16. A random variable X has sum-invariant significant digits if, for every m ∈ N, the value of E[S(X) 1_{C(d1,d2,...,dm)}(S(X))] is independent of d1, d2, . . . , dm.
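Definition 5.16 is easy to probe numerically. The sketch below (illustrative code, not from the text) first checks that the N = 7129 data set of Figure 5.9, which contains 2520/d copies of each digit d, has exactly equal first-digit significand sums, and then estimates E[S(X) 1_{[d,d+1)}(S(X))] for a Benford random variable by Monte Carlo.

```python
import math
import random

# The Figure 5.9 data set: 2520/d copies of each digit d = 1,...,9.
data = [d for d in range(1, 10) for _ in range(2520 // d)]
print(len(data))  # -> 7129

# Sum of significands over each first-digit class: (2520/d)*d = 2520 for all d.
sums = [sum(x for x in data if x == d) for d in range(1, 10)]
print(sums)  # -> [2520, 2520, ..., 2520]

# For a Benford X, S(X) = 10^U with U uniform on [0, 1), and
# E[S(X) 1_{[d,d+1)}(S(X))] = log10(e) ~ 0.4343 for every d.
random.seed(1)
S = [10 ** random.random() for _ in range(200_000)]
est = [sum(s for s in S if d <= s < d + 1) / len(S) for d in range(1, 10)]
print([round(e, 3) for e in est], round(math.log10(math.e), 3))
```

The nine Monte Carlo estimates all cluster near log10 e, in line with Example 5.17(iii) below.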


Figure 5.10: The area of the red region A1 left of the distribution function F (and below the quantile function QF, as defined in (5.7) below) is the expected sum of the (significands of the) numbers with first significant digit 1; that of the green region A9 is the expected sum of the numbers with first significant digit 9. Since these areas are unequal, any random variable X with F = F_{S(X)} does not have sum-invariant significant digits; see Definition 5.16, Theorem 5.18, and Lemma 5.19.

Example 5.17. (i) If X is uniformly distributed on (0, 1), then X does not have sum-invariant significant digits. This follows from Theorem 5.18 below but can also be seen by a simple direct calculation: For every d1 ∈ {1, 2, . . . , 9},

    E[S(X) 1_{C(d1)}(S(X))] = Σ_{n=1}^{∞} 10^n ∫_{10^{−n} d1}^{10^{−n}(d1+1)} x dx = (1/9) d1 + 1/18,
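The series in this direct calculation can be verified with exact rational arithmetic; the helper name below is ours, not the book’s.

```python
from fractions import Fraction

# E[S(X) 1_{C(d1)}(S(X))] for X uniform on (0,1):
# sum over n of 10^n * (integral of x dx over [10^-n d1, 10^-n (d1+1))),
# which should equal d1/9 + 1/18.
def expected_significand_sum(d1, terms=50):
    total = Fraction(0)
    for n in range(1, terms + 1):
        lo, hi = Fraction(d1, 10**n), Fraction(d1 + 1, 10**n)
        total += 10**n * (hi**2 - lo**2) / 2   # 10^n * integral of x dx
    return total

for d1 in range(1, 10):
    series = expected_significand_sum(d1)
    exact = Fraction(d1, 9) + Fraction(1, 18)
    print(d1, float(series), float(exact))
```

Truncating the geometric series after 50 terms leaves an error of order 10^{-50}, so the two columns printed agree to machine precision.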

Figure 5.11: Empirical distributions of the significands S(xi) of the smallest samples with exact sum-invariance of significant digits: N = 7129 (∆∞ = 35.34) and N = 1.637·10^41 (∆∞ = 4.25); exact sum-invariance for all digits corresponds to E[S·1_{[d,d+1)}(S)] = log e = 0.4342.

Lemma 5.20. Let X be a random variable with distribution function F and quantile function Q as in (5.7), and assume that P(a < X < b) = 1 for some 0 < a < b. Then the following are equivalent:

(i) There exists C > 0 with ∫_{F(x1)}^{F(x2)} Q(s) ds = C(x2 − x1) for all x2 > x1, x1, x2 ∈ (a, b);

(ii) F(x) = (log x − log a)/(log b − log a) for all x ∈ (a, b).

Moreover, (i) and (ii) both imply that C = (b − a)^{−1} E[X] = (ln b − ln a)^{−1}.

Proof. Assume first that (i) holds. Then F is strictly increasing on (a, b), and hence Q ◦ F(x1) = x1 for all x1 ∈ (a, b). Also, F is differentiable almost everywhere on (a, b); see [37, Thm. 1.3.1]. Moreover,

    C(x2 − x1) ≥ Q ◦ F(x1) · (F(x2) − F(x1)) ≥ a (F(x2) − F(x1))


shows that F is absolutely continuous (in fact, Lipschitz). Since F is continuous and strictly increasing, so is Q. Pick any x1 for which F′(x1) exists, and observe that by assumption (i),

    C = lim_{x2↓x1} (x2 − x1)^{−1} ∫_{F(x1)}^{F(x2)} Q(s) ds
      = lim_{x2↓x1} (F(x2) − F(x1))/(x2 − x1) · (F(x2) − F(x1))^{−1} ∫_{F(x1)}^{F(x2)} Q(s) ds
      = F′(x1) · Q ◦ F(x1) = F′(x1) x1.

Since F is absolutely continuous with F(a) = 0, for every x1 ∈ (a, b),

    F(x1) = ∫_a^{x1} F′(x) dx = ∫_a^{x1} C x^{−1} dx = C (ln x1 − ln a),

and since lim_{x↑b} F(x) = 1, C = (ln b − ln a)^{−1}. Thus

    F(x) = (log x − log a)/(log b − log a),

establishing (ii). Conversely, (ii) implies (i) via a simple calculation, with Q(s) = a^{1−s} b^s, and (5.8) yields E[X] = ∫_0^1 a^{1−s} b^s ds = (b − a)(ln b − ln a)^{−1} = C(b − a). □
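For a = 1 and b = 10 the equivalence in Lemma 5.20 can be sanity-checked numerically: with F(x) = log x, the quantile function is Q(s) = 10^s, and the integral in (i) is linear in x2 − x1 with C = (ln 10)^{−1}. The midpoint-rule integrator below is illustrative only.

```python
import math

a, b = 1.0, 10.0
C = 1.0 / (math.log(b) - math.log(a))   # = 1/ln 10

def F(x):                     # condition (ii) with a = 1, b = 10
    return math.log10(x)

def integral_Q(s1, s2, steps=100_000):
    # midpoint rule for the integral of Q(s) = 10^s over [s1, s2]
    h = (s2 - s1) / steps
    return h * sum(10 ** (s1 + (k + 0.5) * h) for k in range(steps))

for x1, x2 in [(1.5, 2.0), (2.0, 7.3), (9.0, 9.9)]:
    lhs = integral_Q(F(x1), F(x2))
    rhs = C * (x2 - x1)
    print(round(lhs, 8), round(rhs, 8))   # the two columns agree
```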

These two lemmas provide a simple proof of the sum-invariance characterization of Benford’s law.

Proof of Theorem 5.18. Since the definitions of a random variable X having sum-invariant significant digits and being Benford both involve only the significand of X, without loss of generality assume that 1 ≤ X < 10 with probability one, so S(X) = X. Assume first that X has sum-invariant significant digits. Note that this implies E[X 1_C(X)] = (1/9) 10^{1−m} E[X] = (1/9) λ(C) E[X] for every C = C(d1, d2, . . . , dm). As a consequence, P(X = t) = 0 for all 1 ≤ t < 10; hence P(1 < X < 10) = 1. Since the intervals C(d1, d2, . . . , dm) generate the Borel σ-algebra B[1, 10), the equality (1/9) λ(C) E[X] = E[X 1_C(X)] actually holds for all C ∈ B[1, 10). In particular, therefore, Lemma 5.19 shows that for all 1 < t1 < t2 < 10,

    (1/9)(t2 − t1) E[X] = E[X 1_{(t1,t2]}(X)] = ∫_{FX(t1)}^{FX(t2)} QX(s) ds,

and Lemma 5.20 with a = 1 and b = 10 yields FX(t) = log t for all t ∈ [1, 10), i.e., X is Benford. The converse, i.e., that every Benford random variable has sum-invariant significant digits, has been shown in Example 5.17(iii). □

As an application, Theorem 5.18 provides another informal test for goodness-of-fit to Benford’s law: Simply calculate the differences between the sums of the significands of the data corresponding to the same initial sequence of significant digits, and if these differences are large, the data are not a good fit to Benford’s law; see [112]. Also, Theorem 5.18, together with Theorems 5.3 and 5.13, has the following immediate consequence.

Theorem 5.21. If a probability measure has scale-invariant or sum-invariant significant digits then it has base-invariant significant digits.
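The goodness-of-fit idea can be sketched in code; the function names, the synthetic data, and the spread statistic below are illustrative choices, not Nigrini’s actual test from [112].

```python
import math
import random

def significand(x):
    x = abs(x)
    return x / 10 ** math.floor(math.log10(x))

def digit_sums(data):
    # sums of significands over each first-digit class 1,...,9
    sums = [0.0] * 9
    for x in data:
        s = significand(x)
        sums[int(s) - 1] += s
    return sums

random.seed(7)
benford = [10 ** random.uniform(0, 4) for _ in range(50_000)]   # close to Benford
uniform = [random.uniform(1, 10_000) for _ in range(50_000)]    # not Benford

for name, data in [("benford-like", benford), ("uniform", uniform)]:
    sums = digit_sums(data)
    spread = (max(sums) - min(sums)) / sum(sums)   # ad hoc discrepancy statistic
    print(name, [round(s) for s in sums], round(spread, 3))
```

For the Benford-like sample the nine sums are nearly equal (small spread); for the uniform sample they grow roughly linearly in the digit, flagging a poor fit.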

Chapter Six

Real-valued Deterministic Processes

In many disciplines of science, one-dimensional deterministic (i.e., non-random) systems provide the simplest models for processes that evolve over time. Mathematically, these models take the form of one-dimensional difference or differential equations, and this chapter presents the basic theory of Benford’s law for them. Specifically, conditions are studied under which these models conform to Benford’s law by generating Benford sequences and functions, respectively. In the first seven sections, the focus is on discrete-time systems (i.e., difference equations) because they are somewhat easier to work with explicitly. Once the Benford properties of discrete-time systems are understood, it is straightforward to establish the analogous properties for continuous-time systems (i.e., differential equations), which is done in the chapter’s eighth and final section. Throughout the chapter, recall that by Theorem 4.2 a sequence (xn) of real numbers, or a real-valued function f, is Benford if and only if (log |xn|), or log |f|, is uniformly distributed modulo one.

6.1  ITERATION OF FUNCTIONS

In a one-dimensional first-order difference equation (or one-step recursion) the state of a process at time n ∈ N, described by the real number xn, is completely determined by the preceding state xn−1. More formally,

    xn = f(xn−1),  n ∈ N,   (6.1)

where f: C → R is a function that maps C ⊂ R into itself, i.e., f(C) ⊂ C. In what follows, the set C often equals R+ or [c, +∞) for some (large) c ≥ 0. In accordance with standard terminology in dynamics [45, 88], the function f is also referred to as a map. Given any x0 ∈ C, the difference equation (6.1) recursively defines a sequence (xn) in C, often called the orbit of x0 (under f). With the nth iterate of f denoted by f^n, i.e., f^0 = idC and f^n = f ◦ f^{n−1} for every n ∈ N, the orbit of x0 is simply (xn) = (f^n(x0)). Note that this interpretation of the orbit as a sequence differs from terminology sometimes used in dynamics [88] where the orbit of x0 ∈ C is the mere set {f^n(x0) : n ∈ N}. Notice also that, here and throughout, a clear notational distinction is made between the iterates of a real-valued map f, written as f^n(x), and its (usually different) integer powers, written as f(x)^n; thus, for instance with f(x) = e^x,

    f^2(x) = e^{e^x} ≠ e^{2x} = f(x)^2.
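The orbit and iterate notation can be mirrored in a few lines of illustrative code (the helper names are ours, not the book’s):

```python
import math

# iterate(f, x0, n) computes the n-th iterate f^n(x0) of (6.1);
# orbit(f, x0, n) returns (x1, ..., xn) = (f(x0), ..., f^n(x0)).
def iterate(f, x0, n):
    x = x0
    for _ in range(n):
        x = f(x)
    return x

def orbit(f, x0, n):
    xs, x = [], x0
    for _ in range(n):
        x = f(x)
        xs.append(x)
    return xs

print(orbit(lambda t: 2 * t, 1.0, 4))  # -> [2.0, 4.0, 8.0, 16.0]

# The notational point from the text: for f(x) = e^x, the second iterate
# f^2(x) = e^(e^x) differs from the integer power f(x)^2 = e^(2x).
f = math.exp
x = 0.5
print(iterate(f, x, 2))   # e^(e^0.5), about 5.2
print(f(x) ** 2)          # e^1, about 2.72
```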


Despite its modest appearance, (6.1) can exhibit an enormous variety of different phenomena, depending on the particular properties of the map f. The simplest scenario occurs if f^p(x0) = x0 for some p ∈ N. In this case, the orbit of x0 is a periodic sequence

    (xn) = (f(x0), f^2(x0), . . . , f^{p−1}(x0), x0, f(x0), . . .)

of period p, i.e., xn+p = xn for all n, and x0 is a periodic point, or a fixed point if p = 1. Furthermore, call x0 and its orbit attracting if, in addition, limn→∞ |f^{np+j}(x) − f^j(x0)| = 0 for every j = 1, 2, . . . , p whenever |x − x0| is sufficiently small. Clearly, (xn) is not Benford in this situation, except possibly if p = 1 and x0 = 0. Notice that the sequence (S(xn)) may be periodic even when (xn) is not: E.g., for f(x) = x√10 and x0 = 1 the generated orbit (xn) = (10^{n/2}) tends to +∞, but (S(xn)) = (10^{⟨n/2⟩}) = (√10, 1, √10, . . .) is 2-periodic. In a slight abuse of terminology, call +∞ a fixed point of f whenever limx→+∞ f(x) = +∞, and call it an attracting fixed point if, in addition, limn→∞ f^n(x) = +∞ for all sufficiently large x. For example, f(x) = x − 1 + e^{−x} has 0 and +∞ as an attracting and a non-attracting (repelling) fixed point, respectively, whereas f(x) = e^x has +∞ as its unique attracting fixed point. The following examples illustrate the wide scope of (6.1) with respect to Benford’s law, and also motivate many of the results presented throughout this chapter.

Example 6.1. (i) Consider the map f(x) = x + 2√x + 1. The orbit of any x0 ≥ 0 under f can be computed explicitly as (xn) = ((√x0 + n)^2). In particular, if x0 = 0 then (xn) = (n^2). Since (log xn) = (2 log(√x0 + n)) is increasing but (log xn / log n) is bounded, (xn) is not Benford, by Proposition 4.6(iv). Theorem 6.4 below yields this conclusion directly from simple properties of the map f; see Example 6.5(ii).

(ii) As seen in Example 4.17(v), the sequence (pn) = (2, 3, 5, 7, 11, . . .) of prime numbers is not Benford. To see that (pn) fits into the framework of (6.1), let f(x) = x + 1 whenever x ≤ 2 = p1, and on the interval [pn, pn+1] let f be linear with f(pn) = pn+1 and f(pn+1) = pn+2. Then (pn) = (f^n(1)). Note that f is continuous, and

    |f(x) − x| ≤ max{pn+2 − pn+1, pn+1 − pn}  for all x ∈ [pn, pn+1].

Again, Theorem 6.4 below yields directly that (pn) is not Benford; see Example 6.6. z

Example 6.2. (i) By Theorem 4.16, the sequence (a^n x0) is Benford for every x0 ≠ 0 or for none, depending on whether log |a| is irrational or not. Note that this sequence is simply the orbit of x0 under the map f(x) = ax. The Benford properties of (essentially) linear maps are clarified by Theorem 6.13 below.

(ii) The Fibonacci sequence (Fn) = (1, 1, 2, 3, 5, . . .) is Benford; see Example 4.18. By means of the (linear) maps f1(x) ≡ x and fn(x) := (Fn/Fn−1) x for n ≥ 2,


the sequence (Fn) can be considered the unique solution, with x0 = 1, of the difference equation

    xn = fn(xn−1),  n ∈ N.   (6.2)

Note that, unlike the map f in (6.1), the maps fn in (6.2) explicitly depend on n. In dynamical systems parlance, (6.2) is nonautonomous (i.e., explicitly time-varying) whereas (6.1) is autonomous (i.e., not explicitly time-varying). Nonautonomous difference equations will be studied in some detail in Section 6.6. Specifically, Theorem 6.40 and Lemma 6.44 directly apply to (6.2), showing that the sequence (fn ◦ · · · ◦ f1(x0)) is Benford for all x0 ≠ 0. Note also that while fn does depend on n, this dependence becomes negligible rather quickly as n → ∞: With ϕ = (1 + √5)/2 > 1,

    ϕ^{2n} |fn(x) − ϕx| ≤ (2 + 3ϕ)|x|  for all n ∈ N, x ∈ R;

in particular, limn→∞ fn(x) = ϕx uniformly on every compact interval.

(iii) Let fn(x) = nx, so again fn explicitly depends on n. Unlike in (ii), the sequence (fn) does not converge in any reasonable sense. Nevertheless, results for nonautonomous systems presented in Section 6.6 apply and show that (fn ◦ · · · ◦ f1(x0)) = (n! x0) is Benford for all x0 ≠ 0; in particular, (n!) is Benford [15, 46]. z
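Both claims are easy to illustrate with exact integer arithmetic; the sketch below (our code) uses the first 1000 terms rather than the 10^4 of Figure 6.1, to keep the computation light.

```python
import math
from collections import Counter

# Leading-digit frequencies (in percent) for the first 1000 Fibonacci
# numbers and the first 1000 factorials; both closely match Benford's law.
def first_digit(n):
    return int(str(n)[0])

fib, a, b = [], 1, 1
for _ in range(1000):
    fib.append(a)
    a, b = b, a + b

facts, f = [], 1
for n in range(1, 1001):
    f *= n
    facts.append(f)

for name, seq in [("(Fn)", fib), ("(n!)", facts)]:
    freq = Counter(first_digit(x) for x in seq)
    print(name, [round(100 * freq[d] / len(seq), 1) for d in range(1, 10)])

print("Benford", [round(100 * math.log10(1 + 1 / d), 1) for d in range(1, 10)])
```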

Example 6.3. Consider the map f(x) = 10x^2. Note that f(0) = 0 and f(±1/10) = 1/10; otherwise (f^n(x0)) tends to +∞ or to 0, depending on whether |x0| > 1/10 or |x0| < 1/10. Informally put, every orbit with |x0| ≠ 1/10 is attracted to one of the two fixed points at +∞ and 0. Unlike the case in Example 6.2 above, not every orbit is Benford. For instance, if x0 = 1 then S(xn) = S(f^n(1)) = 1 for all n. For another example, let x0 = 10^{1/3} and note that (S(xn)) = (10^{⟨(−1)^n/3⟩}) = (10^{2/3}, 10^{1/3}, 10^{2/3}, . . .) is 2-periodic. In fact, the explicit formula f^n(x) = 10^{2^n − 1} x^{2^n} shows that (S(xn)) is eventually periodic whenever log |x0| is rational, so clearly (f^n(x0)) is not Benford in this case. Nevertheless, Proposition 6.20 below asserts that (f^n(x0)) is Benford for most x0 ∈ R. z

From the above examples the reader may have gained the following general impressions (see also Figure 6.1): Sequences with polynomial growth or decay like (n^2) or (pn) are not Benford; sequences with exponential growth or decay, generated in the simplest case as orbits of linear maps like f(x) = ax, are either all Benford or none, depending on a; and finally, sequences with super-exponential growth or decay, generated as orbits of maps like f(x) = 10x^2, may or may not be Benford, and such maps exhibit a delicate mixture of Benford and non-Benford orbits. A main goal of this chapter is to demonstrate rigorously that, by and large, these impressions are correct. For the sake of notational clarity, results are often formulated only for maps f: R+ → R+ even though they can easily be extended to maps f: R → R. The details of such extensions are occasionally hinted at in examples, but more often are left to the interested reader.


    d                          1      2      3      4      5      6      7      8      9      ∆
    (n^2)                  19.19  14.69  12.37  10.95   9.84   9.08   8.45   7.91   7.52  10.91
    (pn)                   16.01  11.29  10.97  10.55  10.13  10.13  10.27  10.03  10.06  14.09
    (2^n)                  30.10  17.61  12.49   9.70   7.91   6.70   5.79   5.12   4.58   0.00
    (Fn)                   30.11  17.62  12.50   9.68   7.92   6.68   5.80   5.13   4.56   0.01
    (n!)                   29.56  17.89  12.76   9.63   7.94   7.15   5.71   5.10   4.26   0.54
    xn = 10 xn−1^2, x0 = 2 30.19  17.66  12.68   9.56   7.83   6.97   5.45   5.13   4.53   0.34

Figure 6.1: Relative frequencies (in percent) of the leading significant (decimal) digit for the first 10^4 terms of the sequences (xn) discussed in Examples 6.1 to 6.3.

Throughout, the following standard terminology is used: For any two functions f, g: R+ → R, the statement f = O(g) as x → +∞ means that lim sup_{x→+∞} |f(x)/g(x)| < +∞, whereas f = o(g) as x → +∞ means that lim_{x→+∞} f(x)/g(x) = 0. Note that, for every a ∈ R, f = o(x^a) implies f = O(x^a), which in turn implies f = o(x^{a+ε}) for every ε > 0. For sequences of functions fn: R+ → R, a uniform version of this terminology is employed that reduces to the above in case fn ≡ f: Specifically, fn = O(g) uniformly as x → +∞ if there exists c > 0 such that |fn(x)/g(x)| < c for all n ∈ N and x ≥ c; and fn = o(g) uniformly as x → +∞ if, for every ε > 0, there exists c > 0 such that |fn(x)/g(x)| < ε for all n ∈ N and x ≥ c.

6.2  SEQUENCES WITH POLYNOMIAL GROWTH

Recall from Example 4.7(ii) that neither of the sequences (n) and (n^2) is Benford. Clearly, both sequences are orbits (f^n(0)) under the maps f(x) = x + 1 and


f(x) = x + 2√x + 1, respectively. As the following theorem shows, for these maps the orbit (f^n(x0)) is not Benford for any x0. The simple reason is that in both cases the map f is close to the identity map, and no orbit under the latter is Benford.

Theorem 6.4. Let f: R+ → R+ be a map such that f(x) = x + g(x) with g(x) ≥ 0 for all sufficiently large x. If g = o(x^{1−ε}) as x → +∞ with some ε > 0, then for all sufficiently large x0, the orbit (f^n(x0)) is not Benford.

Proof. Pick ξ > 0 such that g(x) ≥ 0 whenever x ≥ ξ. For every x0 ≥ ξ, the non-decreasing sequence (xn) = (f^n(x0)) is clearly not Benford whenever bounded. Assume therefore that limn→∞ xn = +∞, and let yn = log xn for every n ∈ N. Then (yn) is non-decreasing as well, with limn→∞ yn = +∞, and yn = yn−1 + h(yn−1), where

    h(y) = log(1 + g(10^y)/10^y).

Recall that g = o(x^{1−ε}) as x → +∞, by assumption. Fix 0 < δ < ε and consider the auxiliary function H: R+ → R+ given by

    H(y) = (1/δ) log(1 + 10^{−δy}).

Since h = o(10^{−εy}) as y → +∞, but lim_{y→+∞} 10^{δy} H(y) = δ^{−1} log e, it follows that h(y) ≤ H(y) for the appropriate η > 0 and all y ≥ η. Fix any Y0 ≥ η and define a sequence (Yn) recursively as Yn = Yn−1 + H(Yn−1), n ≥ 1. Given any y0 ≥ max{η, log ξ}, note that

    y1 − Y1 = y0 − Y0 + h(y0) − H(Y0) ≤ y0 − Y0 + H(y0) − H(Y0).

If y0 ≤ Y0 then y1 − Y1 ≤ H(y0) ≤ H(η). On the other hand, if y0 > Y0 then y1 − Y1 ≤ y0 − Y0 since H is decreasing. In either case, therefore,

    η ≤ y0 ≤ y1 ≤ Y1 + max{H(η), |y0 − Y0|},

and iterating the same argument shows that

    η ≤ yn ≤ Yn + max{H(η), |y0 − Y0|}  for all n.   (6.3)

Next note that Yn is explicitly, i.e., non-recursively, given by

    Yn = (1/δ) log(1 + 10^{δ Yn−1}) = (1/δ) log(2 + 10^{δ Yn−2}) = . . . = (1/δ) log(n + 10^{δ Y0}).   (6.4)

From (6.3) and (6.4), it follows that (yn / log n) is bounded, and hence by Proposition 4.6(iv), the sequence (yn) is not u.d. mod 1 for y0 ≥ max{η, log ξ}. Whenever x0 ≥ max{10^η, ξ}, therefore, the orbit (f^n(x0)) is not Benford. □


Example 6.5. (i) For the map f(x) = x + 1, g(x) ≡ 1 = O(1) = O(x^0) as x → +∞. Consequently, (f^n(x0)) = (x0 + n) is not Benford for any sufficiently large x0, nor in fact for any x0 ∈ R.

(ii) Similarly, for the map f(x) = x + 2√x + 1 in Example 6.1(i),

    0 < g(x) = 2√x + 1 = O(x^{1/2})  as x → +∞.

For every x0 ∈ R+, therefore, (f^n(x0)) is not Benford either. z

Example 6.6. Consider the map introduced in Example 6.1(ii), namely,

    f(x) = x + 1  if x ≤ p1 = 2,
    f(x) = (pn+2 − pn+1)/(pn+1 − pn) · (x − pn) + pn+1  if pn ≤ x < pn+1,

where (pn) = (2, 3, 5, 7, 11, . . .) is the sequence of prime numbers. As observed in Example 6.1(ii),

    1 ≤ g(x) = f(x) − x ≤ max{pn+2 − pn+1, pn+1 − pn}  for all x ∈ [pn, pn+1].

A classical result due to G. Hoheisel asserts that, for some ε > 0,

    pn+1 − pn = o(pn^{1−ε})  as n → ∞;   (6.5)

see [5] where it is shown that ε can be chosen as large as ε = 19/40 = 0.475. From (6.5) it is clear that g = o(x^{1−ε}) as x → +∞. Thus Theorem 6.4 implies that (f^n(x0)) is not Benford for any x0 ∈ R. In particular, (f^n(1)) = (pn) is not Benford, as already seen in Example 4.17(v). z
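The piecewise-linear map f of Example 6.6 can be reproduced in code from a table of primes; iterating it from x0 = 1 then yields (f^n(1)) = (pn). The trial-division prime generator and the linear search below are purely illustrative.

```python
def primes(k):
    # first k primes by trial division (illustrative only)
    ps, n = [], 2
    while len(ps) < k:
        if all(n % p for p in ps if p * p <= n):
            ps.append(n)
        n += 1
    return ps

ps = primes(50)

def f(x):
    if x <= 2:
        return x + 1
    # find n with p_n <= x < p_{n+1} and interpolate linearly,
    # so that f(p_n) = p_{n+1} and f(p_{n+1}) = p_{n+2}
    for i in range(len(ps) - 2):
        if ps[i] <= x < ps[i + 1]:
            slope = (ps[i + 2] - ps[i + 1]) / (ps[i + 1] - ps[i])
            return slope * (x - ps[i]) + ps[i + 1]
    raise ValueError("x outside tabulated range")

x, prime_orbit = 1, []
for _ in range(40):
    x = f(x)
    prime_orbit.append(round(x))
print(prime_orbit[:10])  # -> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Since every iterate lands exactly on a prime, the interpolation term vanishes at each step and the orbit reproduces (pn) without rounding error.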

The next example demonstrates that the two key assumptions on g in Theorem 6.4 are, in a sense, best possible.

Example 6.7. (i) Clearly g(x) ≥ 0 for all sufficiently large x whenever lim inf_{x→+∞} g(x) > 0. However, the conclusion of Theorem 6.4 may fail under the assumption lim inf_{x→+∞} g(x) ≥ 0, which, on first sight, may seem only insignificantly weaker. To see this, consider

    f(x) = x − x/(x^2 + 2),

for which

    g(x) = −x/(x^2 + 2) = O(x^{−1})  as x → +∞;

hence limx→+∞ g(x) = 0. Nevertheless, f^n(x0) → 0 for every x0 ∈ R+ (in fact, even for every x0 ∈ R), and since f′(0) = 1/2, by Corollary 6.18 below, (f^n(x0)) is Benford for every x0 ≠ 0.

(ii) The assumption in Theorem 6.4 that g = o(x^{1−ε}) for some ε > 0 is sharp in that it cannot be weakened to g = o(x). To see this, consider the map f: R+ → R+ with

    f(x) = 10^{√((log x)^2 + 1)} > x,


for which

    g(x) = f(x) − x = x (10^{√((log x)^2 + 1) − log x} − 1) = x (10^{1/(√((log x)^2 + 1) + log x)} − 1)

shows that g = O(x/log x) as x → +∞. Moreover, for every x0 > 0 and n ∈ N,

    f^n(x0) = 10^{√((log f^{n−1}(x0))^2 + 1)} = 10^{√((log f^{n−2}(x0))^2 + 2)} = . . . = 10^{√((log x0)^2 + n)},

and since (√(a + n)) is u.d. mod 1 whenever a ≥ 0, by Example 4.7(iii), (f^n(x0)) is Benford for every x0 > 0 — despite the fact that g(x) > 0 for all x > 0, and g = o(x) as x → +∞. z

Example 6.8. Theorem 6.4 carries over to maps f: R → R more or less verbatim, with the proviso that g(x) = |f(x)| − |x| ≥ 0 whenever |x| is sufficiently large, and g = o(|x|^{1−ε}) as |x| → +∞. Thus, for instance, no orbit (f^n(x0)) under f(x) = −2x + (x^3 − x − 1)^{1/3} is Benford whenever |x0| is sufficiently large, because

    0 ≤ |f(x)| − |x| = |x| (2 − (1 − x^{−2} − x^{−3})^{1/3} − 1) = O(|x|^{−1})  as |x| → +∞. z

Many results in this section have immediate counterparts for maps with 0 rather than +∞ as an attracting fixed point. One only has to consider reciprocals and recall from Theorem 4.4 that a real sequence (xn) is Benford if and only if (1/xn) is Benford.

Corollary 6.9. Let f: R → R be C^2, and assume that |f(x)| ≤ |x| for some δ > 0 and all |x| ≤ δ. If |f′(0)| = 1 then (f^n(x0)) is not Benford whenever |x0| ≤ δ.

Example 6.10. (i) The smooth map f(x) = sin x satisfies |f(x)| < |x| for all x ≠ 0, and limn→∞ f^n(x0) = 0 for every x0. Since f′(0) = 1, Corollary 6.9 applies and shows that (f^n(x0)) is not Benford for any x0 ∈ R. Using Proposition 4.6(iv), this could also be seen directly by observing that the sequence (f^n(x0)) is monotone and limn→∞ n f^n(x0)^2 = 3 for all x0 ≠ 0.

(ii) The smooth map f: R → R with

    f(x) = x − x^3/(x^4 + 1)

also satisfies |f(x)| < |x| for all x ≠ 0, and limn→∞ f^n(x0) = 0 for every x0. Again, f′(0) = 1, and (f^n(x0)) is not Benford for any x0 ∈ R, by Corollary 6.9.

(iii) For the smooth map f: R → R given by f(x) = x − (1/6)x^3,

    |f(x)| = |x| |1 − x^2/6| < |x|  for all x ∈ (−2√3, 2√3) \ {0}.

Note that f is simply the third-order Taylor polynomial of sin x at x = 0. Hence it is plausible that near the fixed point at x0 = 0, the behavior of f is very similar to that of the sine function in (i). Indeed, (f^n(x0)) is not Benford whenever


|x0| < 2√3, by Corollary 6.9. Also, f(±2√3) = ∓2√3, and hence (f^n(±2√3)) is 2-periodic and thus not Benford either. On the other hand, if |x0| > 2√3 then |f^n(x0)| → +∞ as n → ∞, but this scenario is not covered by Theorem 6.4 or its extension mentioned in Example 6.8, simply because the difference |f(x)| − |x| grows as (1/6)|x|^3 as |x| → +∞, and hence clearly is not o(|x|^{1−ε}) for any ε > 0. However, it follows from Theorem 6.23 below that (f^n(x0)) is Benford for most but not all |x0| > 2√3; see also Example 6.25. z

Example 6.11. In Corollary 6.9, the smoothness assumption on f can be weakened somewhat, but the conclusion is not generally true if f is merely C^1. For instance, the function

    f(x) = 10^{−√((log x)^2 + 1)}  if x > 0,
    f(x) = 0  if x = 0,
    f(x) = −10^{−√((log |x|)^2 + 1)}  if x < 0,

satisfies |f(x)| < |x| for all x ≠ 0, and f′(0) = 1. But f is only C^1, and for every x0 ≠ 0 and n ∈ N,

    log |f^n(x0)| = −√((log |f^{n−1}(x0)|)^2 + 1) = . . . = −√((log |x0|)^2 + n).

By Example 4.7(iii), the sequence (−√(a + n)) is u.d. mod 1 for a ≥ 0, and so the orbit (f^n(x0)) is Benford whenever x0 ≠ 0. z

6.3  SEQUENCES WITH EXPONENTIAL GROWTH

Recall from Theorem 4.16 that, for any real number a and any x0 ≠ 0, the sequence (a^n x0) is Benford if and only if log |a| is irrational. In the dynamical systems terminology of the present chapter, for the map f(x) = ax, every orbit (f^n(x0)) with x0 ≠ 0 is Benford, or none is, depending on whether log |a| is irrational or not. The following example provides an alternative, graphical description of the dynamics of f with regard to Benford’s law.

Example 6.12. Given a > 0, note that the map f(x) = ax has the property that, for every x, x̃ ∈ R+,

    S(x) = S(x̃)  ⟹  S(f(x)) = S(f(x̃)).   (6.6)

As a consequence of (6.6), there exists a (uniquely determined) map f0 on [1, 10) such that S(f(x)) = f0(S(x)) for all x > 0. (Note that the same map f0 is obtained if a is replaced by 10^k a for any integer k.) For every n ∈ N and x0 > 0, therefore, S(f^n(x0)) = f0^n(S(x0)), and by means of the map h: [0, 1) → [0, 1) with h(s) = log f0(10^s), it follows that

    S(f^n(x0)) = f0^n(S(x0)) = 10^{h^n(⟨log x0⟩)}.


Hence, the orbit (f^n(x0)) of x0 under f is Benford if and only if the orbit (h^n(s0)) of s0 under h is u.d. in [0, 1), where s0 = log S(x0) = ⟨log x0⟩. The latter is the case precisely when log a is irrational. For a concrete example, choose a = 2. Then

    f0(t) = 2t  if 1 ≤ t < 5,
    f0(t) = t/5  if 5 ≤ t < 10,

and

    h(s) = ⟨s + log 2⟩ = s + log 2  if 0 ≤ s < log 5,
    h(s) = s − log 5  if log 5 ≤ s < 1;

see Figure 6.2. z
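The rotation h(s) = ⟨s + log 2⟩ is easily simulated; in the illustrative code below, the tiny guard constant is our own safeguard against floating-point rounding exactly at the significand boundaries 2, 4, and 8.

```python
import math
from collections import Counter

# Iterating h(s) = <s + log 2> on [0, 1) and reading off the significand
# t = 10^s reproduces the first digits of (2^n); since log 2 is irrational,
# their frequencies approach Benford's law.
log2 = math.log10(2)
s, digits = 0.0, []                       # s0 = <log x0> = 0 for x0 = 1
for n in range(1, 10_001):
    s = (s + log2) % 1.0                  # one step of h
    digits.append(int(10 ** s + 1e-9))    # guard against rounding at 2, 4, 8

# cross-check the first 300 digits against exact powers of 2
p = 1
for n in range(300):
    p *= 2
    assert digits[n] == int(str(p)[0])

freq = Counter(digits)
for d in range(1, 10):
    print(d, round(100 * freq[d] / 10_000, 2),
          round(100 * math.log10(1 + 1 / d), 2))
```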


Figure 6.2: For the map f(x) = 2x, the sequence (f^n(x0)) is Benford for all x0 > 0; see Example 6.12.

In general, if a map f: R+ → R+ has the form f(x) = ax + g(x) with a function g that is small in some sense but not identically zero, then (6.6) will usually not hold, and so the simple graphical analysis of Example 6.12 fails. However, as the following theorem, the main result of the present section, shows, the all-or-nothing behavior with regard to Benford’s law does persist nevertheless, provided that g is small enough.

Theorem 6.13. Let f: R+ → R+ be a map such that f(x) = ax + g(x) with a > 1 and g = o(x) as x → +∞.

(i) If log a ∉ Q then (f^n(x0)) is Benford for every sufficiently large x0.

(ii) If log a ∈Q and g = o(x/ log x) as x → +∞ then, for every suÿ ciently large x0 , f n (x0 ) is not Benford.


Proof. (i) Since g(x) = o(x) as x → +∞, there exists ξ > 0 such that |g(x)| ≤ (1/2)(a − 1)x for all x ≥ ξ, and consequently

    xn = f(xn−1) = a xn−1 + g(xn−1) ≥ (1/2)(a + 1) xn−1,

provided that xn−1 ≥ ξ. Thus f^n(x0) ≥ ξ for all n, and f^n(x0) → +∞ as n → ∞ whenever x0 ≥ ξ; moreover, g(xn)/xn → 0. Letting yn = log xn for every n,

    yn − yn−1 = log(f(xn−1)/xn−1) = log a + log(1 + g(xn−1)/(a xn−1)) → log a  as n → ∞,

and Proposition 4.6(i) implies that (yn) is u.d. mod 1 if log a is irrational. In that case, therefore, (f^n(x0)) is Benford for every x0 ≥ ξ.

(ii) Assume now that g = o(x/log x) as x → +∞. As in (i), pick ξ > 0 such that xn = f^n(x0) ≥ ξ for every n, and f^n(x0) → +∞ as n → ∞ whenever x0 ≥ ξ. Note that, given any ε > 0,

    |log(1 + g(xn)/(a xn))| ≤ |g(xn)|/(2 a xn) < (ε log a)/log xn

for all sufficiently large n, where the first inequality holds since |log(1 + x)| ≤ (1/2)|x| for all x ≥ −1/4. Letting zn = yn − n log a yields limn→∞ zn/n = 0 and

    n |zn+1 − zn| = n |log(1 + g(xn)/(a xn))| ≤ (ε n log a)/log xn = (ε log a)/(log a + zn/n),

so lim supn→∞ n |zn+1 − zn| ≤ ε. In fact, limn→∞ n(zn+1 − zn) = 0 since ε > 0 was arbitrary. By Proposition 4.6(ii, vi), the sequence (yn) is u.d. mod 1 (if and) only if (n log a) is, and thus (f^n(x0)) is not Benford whenever x0 ≥ ξ and log a ∈ Q. □

Corollary 6.14. Let f: R+ → R+ be a map such that, with some a > 1, f(x) − ax = o(x/log x) as x → +∞. Then, for every sufficiently large x0, (f^n(x0)) is Benford if and only if log a is irrational.

Remark. In Theorem 6.13 and Corollary 6.14, and also in Theorem 6.4 of the previous section, the map f is not even required to be (Borel) measurable, let alone continuous.

Example 6.15. (i) By Corollary 6.14, every orbit under f(x) = 2x + e^{−x} is Benford. Note that no simple explicit formula is available for f^n.

(ii) The map f(x) = 2x − e^{−x} has a unique (repelling) fixed point, namely, x* = 0.5671... If x0 > x* then f^n(x0) → +∞, so Corollary 6.14 implies that (f^n(x0)) is Benford. On the other hand, if x0 < x* then f^n(x0) → −∞ super-exponentially fast. This scenario is not covered by any result so far. However, Proposition 6.31 below implies that (f^n(x0)) is Benford for most but not all x0 < x*. z


Just as with Theorem 6.4, Corollary 6.14 carries over to maps f: R → R: If |f(x)| − a|x| = o(|x|/log |x|) as |x| → +∞ for some a > 1 then, for every sufficiently large |x0|, the sequence (f^n(x0)) is Benford if and only if log a is irrational.

Example 6.16. Consider the map f(x) = −2x + 3. Since

    ||f(x)| − 2|x|| = ||2x − 3| − |2x|| ≤ 3  for all x ∈ R,

clearly |f(x)| − 2|x| = O(1) as |x| → +∞, so (f^n(x0)) is Benford provided that |x0| is sufficiently large, and in fact is Benford for every x0 ≠ 1. Note that, for every n ≥ 2,

    xn = f^n(x0) = f(xn−1) = −2xn−1 + 3 = −xn−1 − (−2xn−2 + 3) + 3 = −xn−1 + 2xn−2.

Thus (f^n(x0)) solves a linear second-order difference equation (or two-step recursion). The Benford properties of (solutions of) linear difference equations are studied systematically in Chapter 7. For instance, Theorem 7.41 implies that not every solution of xn = −xn−1 + 2xn−2, and correspondingly not every orbit (f^n(x0)), can be Benford. In the present example, however, the only exception occurs for x0 = 1. z

As the following example shows, the requirement that g = o(x/log x) as x → +∞ in Theorem 6.13(ii) is sharp in the sense that the latter result, as well as the “only if” part in Corollary 6.14, may fail if g = o(x/(log x)^δ) for some 0 ≤ δ < 1. Under the latter, weaker assumption, (f^n(x0)) may be Benford for all x0 even if log a is rational.

Example 6.17. Given any 0 ≤ δ < 1, fix 0 < ε < 1 − δ, and define a homeomorphism h_ε : R → R as

h_ε(y) = y + y^ε if y ≥ 0 , and h_ε(y) = y − |y|^ε if y < 0 ,

and observe that

h_ε^{±1}(y)/y − 1 = o(|y|^{−δ}) as |y| → +∞ ,

as well as that |h_ε^{−1}(y)| ≥ 1 for every |y| ≥ 2. Next define a map f : R+ → R+ by setting

f(x) = 10^{h_ε(1 + h_ε^{−1}(log x))} .

It is straightforward to check that f(x) − 10x = o(x/(log x)^δ) as x → +∞. Thus a = 10, and log a = 1 is rational. On the other hand, for every n ∈ N,

f^n(x) = 10^{h_ε(n + h_ε^{−1}(log x))} , x > 0 .

REAL-VALUED DETERMINISTIC PROCESSES


Hence, given any x0 > 0, with c = h_ε^{−1}(log x0) ∈ R,

log f^n(x0) = n + c + (n + c)^ε ≡ c + (n + c)^ε = H(n) (mod 1) for all n ≥ |c| ,

with H(t) = c + (t + c)^ε. By Proposition 4.6(v), the sequence (H(n)) is u.d. mod 1, and so (f^n(x0)) is Benford for every x0 ∈ R+. z

Theorem 6.13 yields the following straightforward corollary when reciprocals are considered.

Corollary 6.18. Let f : R → R be C² with f(0) = 0 and 0 < |f′(0)| < 1. Then, for every x0 ≠ 0 sufficiently close to 0, (f^n(x0)) is Benford if and only if log |f′(0)| is irrational.

Example 6.19. The map f(x) = x − 1/3 + (1/3)e^{−x} is smooth, with f(0) = 0 and f′(0) = 2/3, so 0 < |f′(0)| < 1. Since log(2/3) is irrational, Corollary 6.18 shows that (f^n(x0)) is Benford for every x0 ≠ 0 sufficiently close to 0. z

Proposition 6.20. Let f(x) = ax^b with a > 0, b > 1. Then (f^n(x0)) is Benford for almost all x0 > 0, but every non-empty open interval in R+ contains uncountably many x0 for which (f^n(x0)) is not Benford.

Example 6.21. Let f(x) = 10x². By Proposition 6.20, (f^n(x0)) is Benford for almost all x0 > 0, and in fact even for almost all x0 ∈ R because f(x0) ≥ 0, but not for all x0. For instance, f^n(x0) = 10^{2^n − 1} x0^{2^n} always has first digit D1 = 1 if x0 = 10^k for some k ∈ Z. As in the case of the exactly linear map in Example 6.12, f(x) = 10x² has the property (6.6) and hence induces a map f0 on [1, 10), according to S(f(x)) = f0(S(x)). Concretely,

f0(t) = S(f(t)) = t² if 1 ≤ t < √10 , and f0(t) = t²/10 if √10 ≤ t < 10 .

The associated map h on [0, 1) is simply h(s) = log f0(10^s) = ⟨2s⟩. Again, (f^n(x0)) is Benford if and only if (h^n(s0)) is u.d. in [0, 1), where s0 is given by s0 = log S(x0) = ⟨log |x0|⟩; see Figure 6.4. z
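The exceptional orbit of Example 6.21 and the induced map h(s) = ⟨2s⟩ can both be checked exactly (Python; the specific starting values 10 and 1/3 are just for illustration):

```python
from fractions import Fraction

def f(x):
    return 10 * x * x

# exceptional orbit: for x0 = 10^k, every iterate is a power of 10,
# so the first significant digit is always 1
x = 10
first_digits = []
for _ in range(8):
    x = f(x)
    first_digits.append(str(x)[0])
assert first_digits == ["1"] * 8

# induced map h(s) = <2s> on [0,1): the orbit of s0 = 1/3, which corresponds
# to S(x0) = 10^(1/3), is periodic and hence not u.d. in [0,1)
s, h_orbit = Fraction(1, 3), []
for _ in range(6):
    s = (2 * s) % 1
    h_orbit.append(s)
print(h_orbit)  # alternates 2/3, 1/3, 2/3, ...
```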

To put Proposition 6.20 into perspective, given any map f : R+ → R+, let

B = { x ∈ R+ : (f^n(x)) is Benford } . (6.7)

With this, if f has the special form f(x) = ax^b, then Proposition 6.20 asserts that R+ \ B = {x ∈ R+ : x ∉ B}, the set of exceptional points, is a (Lebesgue) nullset. Thus most x > 0 belong to B. However, R+ \ B is also uncountable and everywhere dense in R+. In fact, the proof of Theorem 6.46 (a generalization of Proposition 6.20) given in Section 6.6 shows that the set B is of first category, i.e., a countable union of nowhere dense sets. From a topological point of view, therefore, most x ∈ R+ do not belong to B. This discrepancy between the measure-theoretic and the topological point of view is not uncommon in ergodic theory and may explain why it is difficult to find even a single point x0 for which (f^n(x0)) is Benford for, say, f(x) = 10x² — despite the fact that Proposition 6.20 guarantees the existence of such points in abundance. The sets B and



Figure 6.4: For the map f(x) = 10x², the sequence (f^n(x0)) is Benford for almost all, but not all, x0. Shown are the induced significand map f0 on [1, 10) (with branches t² and t²/10) and the associated map h(s) = ⟨2s⟩ on [0, 1), related by t = 10^s, s = log t. An exceptional orbit, corresponding to S(x0) = 10^{1/3}, is indicated in red; see Examples 6.3 and 6.21.

R+ \ B are intertwined in a rather complicated way, and the reader may want to compare this to the much more clear-cut situation prevailing for the maps of Corollary 6.14 where, for every c > 0 sufficiently large, either [c, +∞) ∩ B or [c, +∞) \ B is empty, depending on whether log a is rational or not.

The maps of Proposition 6.20 will now be used as basic models for studying the Benford properties of more general maps. More concretely, consider maps f : R+ → R+ with the property that

f(x) − ax^b = o(x^b) as x → +∞ , (6.8)

for some a > 0 and b > 1. Recall that in order to understand the Benford properties of the essentially linear maps f of Section 6.3, all that was needed was a growth condition on the deviation f(x) − ax; see Theorem 6.13. For the power-like maps given by (6.8), the situation is more complicated in that even if f(x) − ax^b decays very rapidly, the Benford properties of the orbits (f^n(x0)) may be quite unlike those in Proposition 6.20.

Example 6.22. The goal of this example is to demonstrate that, given any (decreasing) function g : R+ → R+, the deviation f(x) − ax^b in (6.8) may decay faster than g, i.e., f(x) − ax^b = o(g) as x → +∞, and yet, in stark contrast to Proposition 6.20, (f^n(x0)) may be Benford for every (large) x0 ∈ R+, or for none at all. Throughout, let (N_n) be any increasing sequence of positive integers, to be further specified shortly.

(i) Fix 0 < η < 1 such that (2^n η) is u.d. mod 1. (Such η exist; see Example 4.15.) Define a map h : [η, +∞) → R as follows: Given any y ≥ η, there exists



a unique integer l(y) with 2^{l(y)} η ≤ y < 2^{l(y)+1} η. With this, define h as

h(y) = 2^{1+l(y)} η + 2^{−N_{l(y)}} ⌊2^{1+N_{l(y)}} (y − 2^{l(y)} η)⌋ , y ≥ η ,

and note that

2y − 2^{−N_{l(y)}} < h(y) ≤ 2y for all y ≥ η .

Moreover, it is readily confirmed that h²(y) = 2h(y) for every y ≥ η. It follows that h^n(y) − 2^{n+l(y)} η is an integer whenever n > N_{l(y)}, showing in turn that (h^n(y)) is u.d. mod 1 for all y ≥ η. Now consider the map f : [10^η, +∞) → R+ given by f(x) = 10^{h(log x)}, for which

f(x) − x² = O(x² 2^{−N_{l(log x)}}) as x → +∞ . (6.9)

From (6.9), it is clear that f(x) − x² = o(g) as x → +∞, provided that (N_n) is growing sufficiently fast. Since log f^n(x0) = h^n(log x0), it follows that (f^n(x0)) is Benford for every x0 ≥ 10^η. In other words, [10^η, +∞) \ B is empty.

(ii) Define a map h : [0, +∞) → R as

h(y) = 2^{−N_{⌊y⌋}} ⌊2^{1+N_{⌊y⌋}} y⌋ + 1 , y ≥ 0 .

Note that

2y ≤ 2y + 1 − 2^{−N_{⌊y⌋}} < h(y) ≤ 2y + 1 for all y ≥ 0 ,

and h^n(y) is an integer whenever n is sufficiently large. Thus (h^n(y)) is not u.d. mod 1 for any y ≥ 0. Analogously to (i), consider the map f : [1, +∞) → R+ given by f(x) = 10^{h(log x)}, for which

f(x) − 10x² = O(x² 2^{−N_{⌊log x⌋}}) as x → +∞ .

Again, it is clear that f(x) − 10x² = o(g) as x → +∞, provided that (N_n) grows fast enough. As in (i), log f^n(x0) = h^n(log x0), and so (f^n(x0)) is not Benford for any x0 ≥ 1, i.e., [1, +∞) ∩ B is empty. z
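Part (ii) can be made concrete with exact rational arithmetic. In the sketch below (Python), the particular choices N_k = k + 1 and y0 = 1/3 are illustrative assumptions; any increasing sequence (N_n) behaves the same way:

```python
from fractions import Fraction
from math import floor

def N(k):
    return k + 1          # an arbitrary increasing sequence of positive integers

def h(y):
    # h(y) = 2^(-N_[y]) * floor(2^(1 + N_[y]) * y) + 1, as in Example 6.22(ii)
    n = N(floor(y))
    return Fraction(floor(2 ** (1 + n) * y), 2 ** n) + 1

y, orbit = Fraction(1, 3), []
for _ in range(6):
    assert 2 * y <= h(y) <= 2 * y + 1   # the stated bounds on h
    y = h(y)
    orbit.append(y)

print(orbit)  # [3/2, 4, 9, 19, 39, 79]: integer from the second step onward,
              # so this h-orbit is certainly not u.d. mod 1
```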

As evidenced by Example 6.22, in order to relate the Benford properties of maps satisfying (6.8) to those of the reference map f(x) = ax^b, merely prescribing the order of magnitude as x → +∞ of the deviation g(x) = f(x) − ax^b is not enough. Rather, the analytic properties of f, such as continuity or differentiability, also play a role. As it turns out, if f is continuous then there always exist many points x0 for which (f^n(x0)) is Benford, but also many exceptional points, i.e., neither of the sets B and R+ \ B in (6.7) is empty. In fact, both sets contain many points. (Note that neither of the two maps f considered in Example 6.22 is continuous.)

Theorem 6.23. Let f : R+ → R+ be a map such that f(x) = ax^b + g(x) with a > 0, b > 1, and g = o(x^b) as x → +∞.


(i) If f is continuous then, for every c > 0, there exist uncountably many x0 ≥ c forwhich f n (x0 ) is Benford, but also uncountably many x0 ≥ c for which f n (x0 ) is not Benford, i.e., [c, +∞) ∩ B and [c, +∞) \ B are both uncountable. ′ b−1 (ii) If f is continuously differentiable  n and  g = o(x / log x) as x → +∞, then there exists c > 0 such that f (x0 ) is Benford for almost all x0 ≥ c, i.e., [c, +∞) \ B is a (Lebesgue) nullset.

For the proof of Theorem 6.23, it is convenient to utilize (a simple variant of) the technique of shadowing. While the argument employed here is elementary, note that in dynamical systems theory, shadowing is a powerful and sophisticated tool; e.g., see [121, 123]. To appreciate the usefulness of a shadowing argument in the context of Benford’s law, consider any map within the scope of Theorem 6.23, that is, let f(x) = ax^b + g(x) with a > 0, b > 1, and g(x) = o(x^b) as x → +∞. Assume a = 1 for convenience and notice that (x_n) = (f^n(x0)) is strictly increasing, and that lim_{n→∞} f^n(x0) = +∞ whenever x0 is sufficiently large. With y_n = log x_n, therefore,

y_n = log f(10^{y_{n−1}}) = b y_{n−1} + log(1 + g(10^{y_{n−1}})/10^{b y_{n−1}}) =: h(y_{n−1}) , n ∈ N ,

and the crucial question is whether or not (y_n) = (h^n(y0)) is u.d. mod 1 for some, or even most, y0 = log x0. Recall from Example 4.15 that (b^n y) is u.d. mod 1 for almost all, but not all, y ∈ R. However, given any y0, the two sequences (h^n(y0)) and (b^n y0) may have very little in common. In fact, (h^n(y0) − b^n y0) will typically be unbounded, even if h(y) − by is bounded. For example, with the very simple map h(y) = 2y + 1, the orbit (h^n(y0)) = (2^n y0 + 2^n − 1) is bounded if and only if y0 = −1, whereas (2^n y0) is bounded only for y0 = 0. For every y0, therefore, (h^n(y0) − 2^n y0) is unbounded. In this situation, the key observation of shadowing, made precise by Lemma 6.24 below, is that while (h^n(y0) − b^n y0) is unbounded for every given y0, there nevertheless exists a unique point ŝ(y0) ∈ R, informally referred to as the shadow of y0, with the property that (h^n(y0) − b^n ŝ(y0)) remains bounded. (In the case of h(y) = 2y + 1 considered earlier, h^n(y0) = 2^n y0 + 2^n − 1 = 2^n (y0 + 1) − 1 shows that ŝ(y0) = y0 + 1.)
Thus (h^n(y0)) resembles (b^n y) for some y — in fact y = ŝ(y0) — and a closer inspection of the map ŝ and its analytic properties is virtually all that is needed in order to establish Theorem 6.23.

Lemma 6.24. (Shadowing Lemma) Let h : R → R be a map such that h(y) = by + Γ(y) with b > 1. If lim sup_{y→+∞} |Γ(y)| < +∞ then there exist η ∈ R and a function ŝ : [η, +∞) → R such that the sequence (h^n(y) − b^n ŝ(y)) is bounded for every y ≥ η. The function ŝ is continuous whenever h is continuous. Moreover, if lim_{y→+∞} Γ(y) = 0 then lim_{y→+∞} (ŝ(y) − y) = 0 and, for every y ≥ η, lim_{n→∞} (h^n(y) − b^n ŝ(y)) = 0.
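For the map h(y) = 2y + 1 from the discussion above (b = 2, Γ ≡ 1), the series (6.10) from the proof below and the boundedness of h^n(y) − b^n ŝ(y) can be checked numerically (Python; the starting point y0 = 0.5 and the truncation of the series at 60 terms are arbitrary choices):

```python
def h(y):
    return 2 * y + 1

def shadow(y, terms=60):
    # truncated series (6.10); here Gamma is identically 1, so the
    # geometric sum is (almost exactly) 1 and the shadow is y + 1
    return y + sum(2.0 ** (-j) for j in range(1, terms + 1))

y0 = 0.5
s_hat = shadow(y0)
assert abs(s_hat - (y0 + 1)) < 1e-12      # the shadow of y0 is y0 + 1

y, errs = y0, []
for n in range(1, 41):
    y = h(y)
    errs.append(y - 2 ** n * s_hat)

print(errs[0], errs[-1])  # both near -1: the difference stays bounded, as claimed
```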


Proof. Since lim sup_{y→+∞} |Γ(y)| is finite, there exist η0 ∈ R and γ > 0 such that |Γ(y)| ≤ γ for all y ≥ η0. Thus if y ≥ η := max{η0, 2γ/(b − 1)} then

h(y) = by + Γ(y) ≥ by − γ = (b − 1)y − γ + y ≥ y + γ ,

and hence h^n(y) ≥ y ≥ η for all n, and lim_{n→∞} h^n(y) = +∞. In this case, therefore,

h^n(y) = b h^{n−1}(y) + Γ∘h^{n−1}(y) = . . . = b^n y + Σ_{j=1}^{n} b^{n−j} Γ∘h^{j−1}(y) .

Since b > 1, the number

ŝ(y) = y + Σ_{j=1}^{∞} b^{−j} Γ∘h^{j−1}(y) (6.10)

is well-defined for every y ≥ η. Moreover, for every n,

|h^n(y) − b^n ŝ(y)| = |Σ_{j=n+1}^{∞} b^{n−j} Γ∘h^{j−1}(y)| = |Σ_{j=1}^{∞} b^{−j} Γ∘h^{j+n−1}(y)| ≤ γ/(b − 1) . (6.11)

Thus the sequence (h^n(y) − b^n ŝ(y)) is bounded. (Since b > 1, given any y ≥ η, it is clear that ŝ(y) is the only point with this property.) If h is continuous on [η, +∞) then so is Γ∘h^{j−1} for every j ∈ N. In this case, the sum on the right in (6.10) converges uniformly, by the Weierstrass M-test, and hence the function ŝ is continuous also. Finally, assume that lim_{y→+∞} Γ(y) = 0. Given any ε > 0, there exists y_ε ≥ η such that |Γ(y)| ≤ ε(b − 1) whenever y ≥ y_ε. Thus, if y ≥ y_ε then h^{j−1}(y) ≥ y ≥ y_ε ≥ η for all j ∈ N, which in turn implies that |ŝ(y) − y| ≤ Σ_{j=1}^{∞} b^{−j} ε(b − 1) = ε and, by the middle expression in (6.11), also |h^n(y) − b^n ŝ(y)| ≤ ε for every n. Since every y ≥ η satisfies h^m(y) ≥ y_ε for all sufficiently large m, it follows that lim_{y→+∞} (ŝ(y) − y) = 0 and lim_{n→∞} (h^n(y) − b^n ŝ(y)) = 0 for every y ≥ η. □

Proof of Theorem 6.23. Let f(x) = ax^b + g(x) with a > 0, b > 1, and g = o(x^b) as x → +∞. Without loss of generality, it can be assumed that a = 1. (Otherwise replace f(x) by αf(x/α) with α = a^{1/(b−1)}.) Given any x0 > 0, let y_n = log x_n = log f^n(x0) and observe that y_n = h(y_{n−1}), where h : R → R is given by

h(y) = by + log(1 + g(10^y)/10^{by}) . (6.12)

Note that g = o(x^b) as x → +∞ implies lim_{y→+∞} (h(y) − by) = 0.


To prove assertion (i) in Theorem 6.23, assume that f is continuous. From (6.12), it is clear that h is also continuous, and by Lemma 6.24, there exist η ∈ R and a continuous function ŝ : [η, +∞) → R with ŝ(y) − y → 0 as y → +∞ such that lim_{n→∞} (h^n(y) − b^n ŝ(y)) = 0 for all y ≥ η. Thus, by Proposition 4.3(i), for y ≥ η, (h^n(y)) is u.d. mod 1 if and only if (b^n ŝ(y)) is. Fix any c > 0 with log c ≥ η. By the Intermediate Value Theorem, ŝ([log c, +∞)) ⊃ [ŝ(log c), +∞), and by Proposition 6.20, the sets

{ y ≥ ŝ(log c) : (b^n y) is u.d. mod 1 } and { y ≥ ŝ(log c) : (b^n y) is not u.d. mod 1 }

are both uncountable. Hence the set

U_c = { y ≥ log c : (h^n(y)) is u.d. mod 1 } = { y ≥ log c : (b^n ŝ(y)) is u.d. mod 1 }

is uncountable, and so is [log c, +∞) \ U_c. In other words, the sets

[c, +∞) ∩ B = { x0 ≥ c : (f^n(x0)) is Benford } and [c, +∞) \ B = { x0 ≥ c : (f^n(x0)) is not Benford }

are both uncountable.

To prove (ii), assume that f is C¹ and g′ = o(x^{b−1}/ log x) as x → +∞. With h : R → R given by (6.12), recall that ŝ(y) = y + Σ_{j=1}^{∞} b^{−j} Γ∘h^{j−1}(y), where

Γ(y) = log(1 + g(10^y)/10^{by}) , y ∈ R ,

and consequently

Γ′(y) = (10^y g′(10^y) − b g(10^y)) / (10^{by} + g(10^y)) .

The assumption g′ = o(x^{b−1}/ log x) as x → +∞ implies that Γ′(y) = o(y^{−1}) as y → +∞. Deduce from

d/dy (b^{−j} Γ∘h^{j−1}(y)) = b^{−j} Γ′∘h^{j−1}(y) (h^{j−1})′(y) = b^{−1} Γ′∘h^{j−1}(y) Π_{ℓ=1}^{j−1} (1 + Γ′∘h^{ℓ−1}(y)/b)

that lim_{y→+∞} (b^{−j} Γ∘h^{j−1})′(y) = 0, and also, with the appropriate constant γ0 > 0, that

|d/dy (b^{−j} Γ∘h^{j−1}(y))| ≤ γ0 b^{−j} for all j ∈ N , y ≥ η .


Thus the function ŝ defined by (6.10) is C¹, with its derivative ŝ′ given by termwise differentiation, and the Dominated Convergence Theorem implies

lim_{y→+∞} ŝ′(y) = lim_{y→+∞} (1 + Σ_{j=1}^{∞} (b^{−j} Γ∘h^{j−1})′(y)) = 1 .

For every sufficiently large c > 0, therefore, ŝ is a diffeomorphism of [log c, +∞) onto [ŝ(log c), +∞), and so maps nullsets onto nullsets. In particular, the set [log c, +∞) \ U_c is a nullset. In other words, [c, +∞) \ B is a nullset. □

Remark. (i) Under the assumptions of Theorem 6.23(ii), the above proof shows that, for every sufficiently large c > 0, the set [c, +∞) \ B is dense in [c, +∞). By (i) of the theorem, it is also uncountable. Again, it can be shown that [c, +∞) ∩ B is of first category; see [13].

(ii) The assumption in Theorem 6.23(ii) that g′ = o(x^{b−1}/ log x) as x → +∞ can be relaxed to g′ = o(x^{b−1}/(log log x)^{1+ε}) for some ε > 0. On the other hand, as demonstrated by Example 6.27 below, the conclusion of that theorem may fail if g′ = O(x^{b−1}).

Example 6.25. Let the map f be polynomial of degree at least two, i.e.,

f(x) = a_p x^p + a_{p−1} x^{p−1} + . . . + a_1 x + a_0 ,

where p ∈ N \ {1} and a_0, a_1, . . . , a_{p−1}, a_p ∈ R with a_p ≠ 0. Assume without loss of generality that a_p > 0. (Otherwise replace f(x) by −f(−x) if p is even, or by f²(x) if p is odd.) The map f is continuously differentiable, and (f(x) − a_p x^p)′ = O(x^{p−2}) as x → +∞. By Theorem 6.23, (f^n(x0)) is Benford for almost all, but not all, x0 with |x0| sufficiently large.

For a simple concrete example, let f(x) = x² + 1. Since f^n(x) ≥ n for every n ∈ N and x ∈ R, therefore, (f^n(x0)) is Benford for almost all x0 ∈ R, but the set {x0 : (f^n(x0)) is not Benford} is also uncountable and dense in R. At the time of this writing, it is not known whether (f^n(0)) = (1, 2, 5, 26, 677, . . .), or in fact (f^n(k)) for any integer k, is Benford; note that f^n(0) equals the number of binary trees of height less than n [117, A003095].

For another concrete example, consider the (polynomial) map f(x) = x − (1/6)x³ of Example 6.10(iii). As seen earlier, |f^n(x0)| → +∞ whenever |x0| > 2√3, and it is clear from the above that (f^n(x0)) is Benford for almost all, but not all, x0 with |x0| > 2√3. z
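The orbit of 0 under f(x) = x² + 1 mentioned above is easy to generate exactly (Python; the cut-off at seven iterates is arbitrary, since the terms grow double-exponentially):

```python
# orbit of f(x) = x^2 + 1 starting at x0 = 0, in exact integer arithmetic;
# the n-th term counts binary trees of height less than n (OEIS A003095)
orbit = [0]
for _ in range(7):
    orbit.append(orbit[-1] ** 2 + 1)

print(orbit[1:])                       # 1, 2, 5, 26, 677, 458330, ...
print([str(x)[0] for x in orbit[1:]])  # its first significant digits
```

Whether these first digits conform to Benford's law in the limit is, as stated above, an open question.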

Careful inspection of the proof of Theorem 6.23(ii) shows that it suffices to require that the map f : R+ → R+ be absolutely continuous. Recall that this means that f is an indefinite integral of its derivative f′, which exists almost everywhere on R+. For example, every convex or continuously differentiable function is absolutely continuous. (This notion of absolute continuity of functions is consistent with the standard notion of absolute continuity of random variables as follows: A real-valued random variable X is absolutely continuous if and only if its distribution function F_X is absolutely continuous.) If Theorem


6.23(ii) is thus generalized, the symbol g′ has to be understood as the (measurable) function that coincides with f′(x) − abx^{b−1} whenever f′(x) exists, and g′(x) := 0 otherwise.

Example 6.26. Consider the continuous map f : R+ → R+ with

f(x) = (2n − 1)x − n(n − 1) for all n ∈ N , x ∈ (n − 1, n] .

Clearly, the map f, a piecewise linear interpolation of the map f0(x) = x², is not continuously differentiable. However,

0 ≤ f(x) − x² = −x² + (2n − 1)x − n(n − 1) = 1/4 − (1/4)(2x − (2n − 1))² ≤ 1/4 ,

and so Theorem 6.23(i) applies, since g(x) = f(x) − x² = O(1) as x → +∞. Moreover, f is convex, hence absolutely continuous, and

|g′(x)| = |f′(x) − 2x| = |2n − 1 − 2x| < 1 for all x ∈ (n − 1, n) .

Thus g′ = O(1) as x → +∞, where g′(n) := 0 for every n ∈ N. Theorem 6.23(ii), generalized to absolutely continuous maps f as outlined above, then shows that (f^n(x0)) is Benford for almost all sufficiently large x0 — in fact for almost all x0 > 1 — but there are also uncountably many exceptional points. z

Example 6.27. This example shows that the condition g′ = o(x^{b−1}/ log x) in Theorem 6.23(ii) is needed even if f is very smooth. Let h : R → R be the (real-analytic) function with h(0) = 0 and

h(y) = 2y − sin(2πy²)/(2πy) , y ≠ 0 .

Since h(k) = 2k ∈ Z and h′(k) = 0 for all k ∈ Z \ {0}, clearly dist(h^n(y), Z) → 0 as n → ∞ whenever |y| ≥ 1 and dist(y, Z) is sufficiently small; here, as usual, dist(y, Z) = min{|y − k| : k ∈ Z}. Specifically, it is straightforward to show that h(J) ⊂ J, with the countable union of intervals

J = ∪_{k∈Z\{0}} [k − 1/(25|k|), k + 1/(25|k|)] ,

and lim_{n→∞} (h^n(y) − 2^n k) = 0 for every y ∈ J and the appropriate integer k. For every y ∈ J, therefore, (h^n(y)) is not u.d. mod 1. (It is tempting to speculate whether the set {y ∈ R : (h^n(y)) is u.d. mod 1} is actually a nullset. If it were, then the sequence (2^n √2), for instance, would not be u.d. mod 1 or, in number-theoretic parlance, the number √2 would not be 2-normal. This in turn would settle a well-known open problem; cf. Example 4.15.) Next define a (real-analytic) map f : R+ → R+ as

f(x) = 10^{h(log x)} = x² 10^{−sin(2π(log x)²)/(2π log x)} = x² + g(x) ,


with the deviation g given by

g(x) = f(x) − x² = x² (10^{−sin(2π(log x)²)/(2π log x)} − 1) = O(x²/ log x) as x → +∞ .

Hence Theorem 6.23(i) applies, with a = 1, b = 2. Also, it is readily confirmed that g′ = O(x) as x → +∞. However, it is clear from the above that (f^n(x0)) is not Benford whenever log x0 ∈ J or, equivalently, whenever

x0 ∈ 10^J := {10^y : y ∈ J} = ∪_{k∈Z\{0}} 10^k [10^{−1/(25|k|)}, 10^{1/(25|k|)}] ,

and the set 10^J has positive (in fact, infinite) Lebesgue measure. Thus the conclusion of Theorem 6.23(ii) does not hold. z

As usual, Theorem 6.23 has a simple corollary via taking reciprocals.

Corollary 6.28. [15, Thm. 4.1] Let f : R → R be a smooth map with f(0) = 0, f′(0) = 0, and f^{(p)}(0) ≠ 0 for some p ∈ N \ {1}. Then (f^n(x0)) is Benford for almost all x0 sufficiently close to 0, but there are also uncountably many exceptional points.

Example 6.29. For the smooth map f(x) = x − 1 + e^{−x}, f(0) = f′(0) = 0 but f″(0) = 1 ≠ 0. For almost all x0 sufficiently close to 0, therefore, (f^n(x0)) is Benford. Since lim_{n→∞} f^n(x0) = 0 for every x0, and f maps nullsets to nullsets, it is clear from Corollary 6.28 that in fact (f^n(x0)) is Benford for almost all, but not all, x0 ∈ R. z

Example 6.30. In Corollary 6.28, the assumption f^{(p)}(0) ≠ 0 for some p ≥ 2 is crucial in that the conclusion may fail if f^{(p)}(0) = 0 for all p ∈ N. For a simple example, let h : R → R be any smooth non-decreasing map with h(y) = (2k − 1)³ for every k ∈ Z and all y ∈ [2k − 1, 2k]. The map f : R → R with f(0) = 0 and

f(x) = 10^{h(log |x|)} , x ≠ 0 ,

is smooth, and f^{(p)}(0) = 0 for every p ∈ N. If log |x0| ∈ ∪_{k∈Z} [2k − 1, 2k] or, equivalently, if

x0 ∈ ∪_{k∈Z} 10^{2k−1} ([−10, −1] ∪ [1, 10]) =: J ,

then (f^n(x0)) is not Benford since S(f^n(x0)) ≡ 1. Note that J ∩ (−ε, ε) has positive measure for every ε > 0. Thus the conclusion of Corollary 6.28 fails for the smooth map f. z

To conclude this section, note that maps such as f(x) = e^x that have an even faster growth than f(x) = ax^b are not covered by any of the results discussed so far. The following proposition addresses some such maps; the result is a special case of Theorem 6.49 in Section 6.6, and a proof is given there.


Proposition 6.31. Let f : R+ → R+ be a map such that, for some c ≥ 0, both of the following conditions hold:

(i) The function log f(10^x) is convex on (c, +∞);

(ii) (log f(10^x) − log f(10^c))/(x − c) > 1 for all x > c.

Then (f^n(x0)) is Benford for almost all sufficiently large x0, but there also exist uncountably many x0 > c for which (f^n(x0)) is not Benford.

Remark. When applied to f(x) = ax^b with a > 0, b > 1, the assertion of Proposition 6.31 is weaker than that of Proposition 6.20 in that the latter does not require x0 to be “sufficiently large” (a property that may depend on a and b) and also guarantees denseness of exceptional points.

Example 6.32. For f(x) = e^x, the map h(x) = log f(10^x) = 10^x log e is convex on R, and (h(x) − h(0))/x > 1 for all x > 0. Hence Proposition 6.31 applies with c = 0 and shows that (f^n(x0)) is Benford for almost all sufficiently large x0. In fact, since f^n(x0) > 2^{n−2} for every x0 ∈ R and n ≥ 2, the sequence (f^n(x0)) is Benford for almost all, but not all, x0 ∈ R. Again, it can be shown that the set of exceptional points is dense in R, besides being uncountable by Proposition 6.31. z

Example 6.33. The (polynomial) map f(x) = x² − 2x + 2 satisfies (ii) in Proposition 6.31, provided that c > (1/2) log 2. However, log f(10^x) is concave on (log(2 + √2), +∞), and consequently Proposition 6.31 does not apply. Recall that, on the other hand, Theorem 6.23 does apply, showing that (f^n(x0)) is Benford for almost all, but not all, x0 ∉ [0, 2]; see also Example 6.25. z
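Conditions (i) and (ii) of Proposition 6.31 for f(x) = e^x, with c = 0, can be checked numerically (Python; the sampling grid and the midpoint test for convexity are ad hoc illustrations, not a proof):

```python
import math

def H(x):
    # H(x) = log10 f(10^x) for f(x) = e^x, i.e., H(x) = 10^x * log10(e)
    return 10 ** x * math.log10(math.e)

xs = [0.1 * k for k in range(1, 50)]  # sample points in (0, 5)

# condition (ii) with c = 0: the slope (H(x) - H(0)) / x exceeds 1 for x > 0
assert all((H(x) - H(0)) / x > 1 for x in xs)

# condition (i): convexity, here probed via the midpoint inequality
assert all(H((u + v) / 2) <= (H(u) + H(v)) / 2 for u, v in zip(xs, xs[1:]))

print("conditions (i) and (ii) hold at all sample points")
```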

Notice that Proposition 6.31 does not impose any restriction on the order of magnitude as x → +∞ of the deviation of f from some reference map f0, such as f0(x) = x, f0(x) = ax, and f0(x) = ax^b used in Theorems 6.4, 6.13, and 6.23, respectively.

Example 6.34. The map f : R+ → R+ with

f(x) = x^m/(m − 1)! for every m ∈ N , x ∈ (m − 1, m] ,

is neither continuously differentiable nor O(x^b) as x → +∞, for any b > 1. Nevertheless, Proposition 6.31 applies directly with c = 0, showing that (f^n(x0)) is Benford for almost all, but not all, x0 > 1. z

6.5 AN APPLICATION TO NEWTON’S METHOD

In scientific calculations using digital computers and floating-point arithmetic, roundoff errors are inevitable, and as Knuth points out in his classic text The Art of Computer Programming [90, pp. 253–255],


[I]n order to analyze the average behavior of floating-point arithmetic algorithms (and in particular to determine their average running time), we need some statistical information that allows us to determine how often various cases arise . . . [If, for example, the] leading digits tend to be small [, that] makes the most obvious techniques of “average error” estimation for floating-point calculations invalid. The relative error due to rounding is usually . . . more than expected.

One of the most widely used floating-point algorithms is Newton’s method for finding the roots of a given (differentiable) function numerically. Thus when using Newton’s method, it is important to keep track of the distribution of significant digits (or significands) of the approximations generated by the method. As will be seen shortly, the differences between successive Newton approximations, and the differences between the successive approximations and the unknown root, often exhibit exactly the type of non-uniformity of significant digits alluded to by Knuth — they typically follow Benford’s law.

Throughout this section, let g : I → R be a differentiable function defined on some open interval I ⊂ R, and denote by N_g the map associated with g by Newton’s method, that is,

N_g(x) = x − g(x)/g′(x) for all x ∈ I with g′(x) ≠ 0 .

For N_g to be defined wherever g is, set N_g(x) = x if g′(x) = 0. Using Newton’s method for finding roots of g (i.e., real numbers x* with g(x*) = 0) amounts to picking an initial point x0 ∈ I and iterating N_g. Henceforth in this section, (x_n) denotes the resulting sequence of approximations starting at x0, that is, (x_n) = (N_g^n(x0)). Clearly, if (x_n) converges to x*, say, and if N_g is continuous at x*, then N_g(x*) = x*, so x* is a fixed point of N_g, and g(x*) = 0. (Note that according to the definition of N_g used here, N_g(x*) = x* could also mean that g′(x*) = 0. If, however, g′(x*) = 0 but g(x*) ≠ 0, then N_g is not continuous at x* unless g is constant.) It is this correspondence between the roots of g and the fixed points of N_g that makes Newton’s method work locally. Often, every fixed point x* of N_g is attracting, i.e., lim_{n→∞} N_g^n(x0) = x* for all x0 sufficiently close to x*. (Observe that if g is linear near x*, i.e., g(x) = a(x − x*) for some a ≠ 0 and all x near x*, then N_g(x) = x* for all x near x*.)

To formulate a result about Benford’s law for Newton’s method, it will be assumed that the function g : I → R is real-analytic. Recall that this means that g can be represented by its Taylor series in a neighborhood of each point of I. Although real-analyticity is a strong assumption indeed, the class of real-analytic functions covers most practically relevant cases, including all polynomials, and all rational, exponential, and trigonometric functions, and compositions thereof.

If g : I → R is real-analytic and x* ∈ I is a root of g, i.e., if g(x*) = 0, then g(x) = (x − x*)^m h(x) for some m ∈ N and some real-analytic h : I → R with h(x*) ≠ 0. The number m is the multiplicity of the root x*; if m = 1


then x* is referred to as a simple root. The following theorem becomes plausible upon observing that g(x) = (x − x*)^m h(x) implies that N_g is real-analytic in a neighborhood of x*, and

N_g′(x) = g(x)g″(x)/g′(x)² = (m(m − 1)h(x)² + 2m(x − x*)h′(x)h(x) + (x − x*)²h″(x)h(x)) / (m²h(x)² + 2m(x − x*)h′(x)h(x) + (x − x*)²h′(x)²) ,

so that in particular N_g′(x*) = 1 − m^{−1}. For the following main result on Benford’s law for Newton’s method, recall that (x_n) = (N_g^n(x0)) is the sequence of Newton’s method approximations for the root x* of g, starting at x0.

Theorem 6.35 ([19]). Let the function g : I → R be real-analytic with g(x*) = 0, and assume that g is not linear.

(i) If x* is a simple root, then (x_n − x*) and (x_{n+1} − x_n) are both Benford for (Lebesgue) almost all, but not all, x0 in a neighborhood of x*.

(ii) If x* is a root of multiplicity at least two, then (x_n − x*) and (x_{n+1} − x_n) are Benford for all x0 ≠ x* sufficiently close to x*.

The proof of Theorem 6.35 given in [19] uses the following lemma, which may be of independent interest for studying Benford’s law in other numerical approximation procedures. Part (i) is an analogue of Theorem 4.12, and (ii) and (iii) follow directly from Corollaries 6.28 and 6.18, respectively.

Lemma 6.36. Let f : I → I be C^∞, and assume that f(x*) = x* for some x* ∈ I.

(i) If f′(x*) ≠ 1, then for every x0 with lim_{n→∞} f^n(x0) = x*, the sequence (f^n(x0) − x*) is Benford precisely when (f^{n+1}(x0) − f^n(x0)) is Benford.

(ii) If f′(x*) = 0 but f^{(p)}(x*) ≠ 0 for some p ∈ N \ {1}, then (f^n(x0) − x*) is Benford for (Lebesgue) almost all, but not all, x0 in a neighborhood of x*.

(iii) Suppose 0 < |f′(x*)| < 1. Then, for every x0 ≠ x* sufficiently close to x*, (f^n(x0) − x*) is Benford if and only if log |f′(x*)| is irrational.

In order to relate Theorem 6.35 to the results of the previous sections, suppose the real-analytic function g : I → R is not linear, and assume g(x*) = 0. For convenience, define a (real-analytic) auxiliary map f as f(x) = N_g(x + x*) − x*. Then f(0) = 0, f′(0) = N_g′(x*) = 1 − m^{−1}, and f^n(x − x*) = N_g^n(x) − x* for all n ∈ N and x ∈ I. It follows that (x_n − x*) = (f^n(x0 − x*)) and (x_{n+1} − x_n) = (f^{n+1}(x0 − x*) − f^n(x0 − x*)); hence the Benford properties of (x_n − x*) and (x_{n+1} − x_n) can also be studied directly using Corollaries 6.18 and 6.28, together with Lemma 6.36(i).


Example 6.37. (i) Let g(x) = e^x − 2. Then g has a unique simple root, namely x* = ln 2 = 0.6931, and N_g(x) = x − 1 + 2e^{−x}. By Theorem 6.35(i), the sequences (x_n − x*) and (x_{n+1} − x_n) are both Benford for almost all x0 near x*. In fact, the auxiliary map f defined above simply takes the form f(x) = x − 1 + e^{−x}, and Example 6.29, together with Lemma 6.36(i), shows that (x_n − x*) and (x_{n+1} − x_n) are Benford for almost all, but not all, x0 ∈ R; see Figure 6.5.

(ii) Let g(x) = (e^x − 2)³. Then g has a triple root at x* = ln 2, and N_g(x) = x − 1/3 + (2/3)e^{−x}. By Theorem 6.35(ii), the sequences (x_n − x*) and (x_{n+1} − x_n) are both Benford for every x0 ≠ x* near x*; see Figure 6.5. The auxiliary map f here is f(x) = x − 1/3 + (1/3)e^{−x}, and again it follows directly from Example 6.19 and Lemma 6.36(i) that (x_n − x*) and (x_{n+1} − x_n) are Benford unless N_g(x0) = x*, that is, unless x0 = x* or x0 = x* − 1.903 = −1.210; in the latter case, clearly, both sequences are identically zero. z
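Part (i) of the example is easily reproduced (Python; the starting point x0 = 1 and the five-step horizon are arbitrary, and the printed digit list is a finite illustration rather than a statistical test):

```python
import math

def Ng(x):
    # Newton map for g(x) = e^x - 2, as in Example 6.37(i)
    return x - 1 + 2 * math.exp(-x)

x, diffs = 1.0, []
for _ in range(5):
    x_new = Ng(x)
    diffs.append(abs(x_new - x))
    x = x_new

assert abs(x - math.log(2)) < 1e-12   # super-exponential convergence to ln 2
print([f"{d:e}"[0] for d in diffs])   # first significant digits of |x_{n+1} - x_n|
```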

Figure 6.5: Visualizing Newton’s method and Theorem 6.35: For the function g1(x) = e^x − 2, with Newton map N_{g1}(x) = x − 1 + 2e^{−x}, the sequence (N_{g1}^n(x0)) converges to the unique simple root x* = ln 2 super-exponentially (left); for g2(x) = g1(x)³ = (e^x − 2)³, with N_{g2}(x) = x − 1/3 + (2/3)e^{−x}, convergence of (N_{g2}^n(x0)) to the triple root x* is only exponential; see Example 6.37.

Utilizing Lemma 6.36, an analogue of Theorem 6.35 can be established for other root-finding algorithms as well.

Example 6.38. Let g(x) = x + x³ and consider the successive approximations (ξ_n) for the root x* = 0 of g, generated iteratively from ξ0 ∈ R by the Jacobi–Steffensen method,

ξ_n = ξ_{n−1} − g(ξ_{n−1})² / (g(ξ_{n−1}) − g(ξ_{n−1} − g(ξ_{n−1}))) , n ≥ 1 .

For almost all, but not all, ξ0 near x* = 0, the sequence (ξ_n) is Benford. This follows from Lemma 6.36(ii), since ξ_n = J_g^n(ξ0), with the Jacobi–Steffensen transformation

J_g(x) = −x⁵ (1 − x²)/(1 + x² − x⁴ + x⁶) ,

and J_g(0) = J_g′(0) = . . . = J_g^{(4)}(0) = 0, yet J_g^{(5)}(0) ≠ 0. Alternatively, J_g = N_h with the real-analytic function h(x) = (x + x³) e^{x⁴/4 − x²}, so Theorem 6.35(i) applies directly as well. z
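The closed form of J_g can be verified in exact rational arithmetic (Python; the starting point ξ0 = 1/2 and the three-step horizon are arbitrary choices):

```python
from fractions import Fraction

def g(x):
    return x + x ** 3

def js_step(x):
    # one Jacobi-Steffensen step for g, exactly as displayed above
    return x - g(x) ** 2 / (g(x) - g(x - g(x)))

def Jg(x):
    # the closed-form transformation J_g
    return -x ** 5 * (1 - x ** 2) / (1 + x ** 2 - x ** 4 + x ** 6)

xi = Fraction(1, 2)
for _ in range(3):
    assert js_step(xi) == Jg(xi)   # the two expressions agree exactly
    xi = Jg(xi)

print(float(xi))  # already astronomically close to the root x* = 0
```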

If g fails to be real-analytic, then N_g may not be well-behaved analytically. For instance, N_g may have discontinuities even if g is C^∞ and g′ > 0. Pathologies like this can cause Newton’s method to fail for a variety of reasons, of which the reader can get an impression from [19, Sec. 4]. Even if N_g is smooth, (x_n) may not be Benford.

Example 6.39. (i) For the C^∞-function g given by

g(x) = e^{(x² − x^{−2})/2} if x ≠ 0 , and g(0) = 0 ,

the associated Newton map

N_g(x) = x − x³/(x⁴ + 1)

is smooth (in fact, real-analytic), and lim_{n→∞} x_n = lim_{n→∞} N_g^n(x0) = 0 = x* for every x0 ∈ R. However, N_g′(0) = 1 and, as seen already in Example 6.10(ii), the sequence (x_n) is not Benford for any x0.

(ii) Similarly, if g is the C^∞-function

g(x) = e^{−3x^{−2}} if x ≠ 0 , and g(0) = 0 ,

then the associated map N_g(x) = x − (1/6)x³ is smooth. Here lim_{n→∞} x_n = 0 = x* only if |x0| < 2√3, and (x_n) is not Benford in this case; see Example 6.10(iii). On the other hand, recall from Example 6.25 that (x_n) is Benford for almost all, but not all, x0 with |x0| > 2√3. (Note, however, that lim_{n→∞} |x_n| = +∞ for |x0| > 2√3, and so, as a root-finding algorithm, Newton’s method fails completely in this case.) z

As will be seen in Chapter 10, the appearance of Benford’s law in Newton’s method and other root-finding methods has important implications for estimating errors in scientific calculations that rely on such algorithms.

6.6 TIME-VARYING SYSTEMS

Except for Example 6.2(ii, iii), the sequences considered in this chapter so far have all been generated by the iteration of a single map f or, in dynamical systems parlance, by an autonomous system (or recurrence relation). Autonomous systems constitute a classical and well-studied field. Beyond this field there has been, in the recent past, an increased interest in systems that are nonautonomous, i.e., explicitly time-varying in one way or another [89]. This development is motivated and driven by important practical applications as well as purely mathematical questions. In this context, it is interesting to study how the results discussed previously extend to systems with the map f changing with n. In full generality, this is a very wide topic with many open problems, both conceptual and computational. Only a small number of pertinent results (some with partial proofs or no proofs at all) and examples are presented here. The interested reader is referred to [13] for a fuller account and references, and to [91, 95] for an intriguing specific problem.

Throughout this section, let (fn) be a sequence of functions that map C ⊂ R into itself. As in previous sections, most results and examples are for C = R+ but can easily be adjusted to other settings. Given any x0 ∈ C, the sequence
$$(x_n) = \bigl(f_1(x_0),\, f_2(f_1(x_0)),\, \ldots\bigr) = \bigl(f_n \circ \ldots \circ f_1(x_0)\bigr)$$
is the (nonautonomous) orbit of x0 (under f1, f2, ...). Equivalently, (xn) is the unique solution of the (nonautonomous) recurrence relation
$$x_n = f_n(x_{n-1}), \qquad n \in \mathbb{N}.$$

As an analogue of the maps leading to exponential growth (as studied in Section 6.3), consider a sequence of maps fn : R+ → R+ of the form
$$f_n(x) = a_n x + g_n(x), \qquad n \in \mathbb{N}, \tag{6.13}$$
where an > 0 and the function gn is small in some sense. Note that unlike in Theorem 6.13, an may be strictly less than 1 for some n in (6.13). The following result contains Corollary 6.14 as a special case.

Theorem 6.40. For every n ∈ N, let fn : R+ → R+ be a map such that fn(x) = an x + gn(x) with an > 0. Assume that lim inf_{n→∞} an > 1 and that gn = o(x/log x) uniformly as x → +∞. Then, for every sufficiently large x0, (fn ◦ ... ◦ f1(x0)) is Benford if and only if (∏_{j=1}^n aj) = (a1, a1a2, ...) is Benford.

Proof. Pick δ > 0 and N0 ∈ N such that an ≥ 1 + 2δ for all n ≥ N0. Also, choose ξ1 > 10 large enough to ensure |gn(x) log x| < δx for all x ≥ ξ1 and n ∈ N. With this,
$$f_n(x) = x\Bigl(a_n + \frac{g_n(x)}{x}\Bigr) > x(a_n - \delta) \ge (1+\delta)x \quad\text{for all } x \ge \xi_1,\ n \ge N_0.$$


Observe that lim_{x→+∞} f_{N0} ◦ ... ◦ f1(x) = +∞. Hence there exists ξ2 > 0 such that f_{N0} ◦ ... ◦ f1(x) ≥ ξ1 whenever x ≥ ξ2. For every x ≥ ξ2 and n ≥ N0, therefore,
$$f_n \circ \ldots \circ f_1(x) = f_n \circ \ldots \circ f_{N_0+1}\bigl(f_{N_0} \circ \ldots \circ f_1(x)\bigr) \ge (1+\delta)^{n-N_0}\xi_1.$$

Given any x0 ≥ ξ2, define yn = log xn = log fn ◦ ... ◦ f1(x0). Moreover, let zn = yn − log ∏_{j=1}^n aj, and note that, for every ε > 0,
$$n|z_{n+1} - z_n| = n\Bigl|\log\Bigl(1 + \frac{g_{n+1}(x_n)}{a_{n+1}x_n}\Bigr)\Bigr| \le \frac{n\,|g_{n+1}(x_n)|}{x_n} \le \frac{\varepsilon n}{2\log x_n} \le \frac{\varepsilon n}{(n - N_0)\log(1+\delta) + \log\xi_1}$$

for all sufficiently large n. Thus lim sup_{n→∞} n|z_{n+1} − z_n| ≤ ε/log(1+δ), and since ε > 0 was arbitrary, lim_{n→∞} n(z_{n+1} − z_n) = 0. By Proposition 4.6(vi), the sequence (yn) is u.d. mod 1 if and only if (log ∏_{j=1}^n aj) is. In other words, (xn) is Benford if and only if (∏_{j=1}^n aj) is. □

Example 6.41. (i) Let the (linear) maps fn : R → R be fn(x) = (2 + 1/n)x. Thus an = 2 + 1/n > 2, and Theorem 6.40 applies with gn ≡ 0. It follows that (fn ◦ ... ◦ f1(x0)) is Benford for all x0 > 0 (in fact, it is Benford for all x0 ≠ 0), provided that (∏_{j=1}^n aj) is Benford. The latter clearly is the case since
$$\log\prod_{j=1}^{n} a_j - \log\prod_{j=1}^{n-1} a_j = \log a_n \xrightarrow{n\to\infty} \log 2 \notin \mathbb{Q},$$
and so (log ∏_{j=1}^n aj) is u.d. mod 1, by Proposition 4.6(i).

(ii) Consider the (non-linear) maps fn : R+ → R+ given by
$$f_n(x) = \frac12\Bigl(x + \sqrt{|5x^2 - 4(-1)^n|}\Bigr) = \begin{cases} \frac12\bigl(x + \sqrt{5x^2 + 4}\bigr) & \text{if } n \text{ is odd},\\[2pt] \frac12\bigl(x + \sqrt{|5x^2 - 4|}\bigr) & \text{if } n \text{ is even},\end{cases}$$
for which, with φ = (1 + √5)/2 again,
$$|f_n(x) - \varphi x| = \frac{\bigl||5x^2 - 4(-1)^n| - 5x^2\bigr|}{2\sqrt{|5x^2 - 4(-1)^n|} + 2x\sqrt5} \le \frac{2}{x\sqrt5}, \qquad x > 0,$$

and hence fn(x) − φx = O(x^{-1}) uniformly as x → +∞. Since log φ is irrational, Theorem 6.40 shows that (xn) = (fn ◦ ... ◦ f1(x0)) is Benford for all sufficiently large x0 > 0, and in fact (xn) is Benford for all x0 ∈ R because f1(x0) > 0 and lim_{n→∞} xn = +∞ for every x0. Specifically, the special case x0 = 0 yields (xn) = (1, 1, 2, 3, 5, ...) = (Fn), providing yet another proof that the sequence (Fn) of Fibonacci numbers is Benford, as already seen in Example 4.18. z

The following two examples demonstrate that Theorem 6.40 is best possible in that its conclusion may fail if either the assumption lim inf_{n→∞} an > 1 or the uniformity in gn = o(x/log x) is weakened. (That the order of magnitude in gn = o(x/log x) is sharp as well has already been demonstrated in Example 6.17.)
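Example 6.41(ii) can be replayed in code: starting from x0 = 0, the two alternating maps reproduce the Fibonacci numbers exactly, because |5F² − 4(−1)^n| evaluated along the orbit is always the square of a Lucas number. A floating-point sketch (exact as long as all integers fit comfortably into double precision):

```python
import math

def f(n, x):
    # the alternating maps of Example 6.41(ii)
    return 0.5 * (x + math.sqrt(abs(5 * x * x - 4 * (-1) ** n)))

x, orbit = 0.0, []
for n in range(1, 26):
    x = f(n, x)
    orbit.append(round(x))

# the same numbers via the usual Fibonacci recurrence
fib = [1, 1]
while len(fib) < 25:
    fib.append(fib[-1] + fib[-2])
print(orbit == fib)  # -> True
```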


Example 6.42. In Theorem 6.40, the assumption lim inf_{n→∞} an > 1 cannot be replaced by the weaker requirement that an > 1 for all sufficiently large n, let alone by the still weaker assumption that lim inf_{n→∞} an ≥ 1. To see this, let 0 < δ < 1 and
$$a_n = 10^{(n+1)^\delta - n^\delta}, \qquad n \in \mathbb{N}.$$

Note that an > 1, and an − (1 + δn^{δ−1}/log e) = O(n^{2δ−2}) as n → ∞. Also, the sequence (∏_{j=1}^n aj) = (10^{(n+1)^δ − 1}) is Benford because (n^δ) is u.d. mod 1 by Proposition 4.6(v). For every constant c > 0, the smooth map f∞ : R+ → R with
$$f_\infty(x) = x + c - \frac{x}{\bigl(\log(2+x)\bigr)^2}$$

is convex, with f∞(0) = c, f∞'(0) < 0, and lim_{x→+∞} f∞(x) = +∞. Thus it is possible to choose c = c* such that f∞ has a unique fixed point x* > 0 with f∞'(x*) = 0. (Numerically, one finds c* = 5.155 and x* = 1.584.) With this, next consider the maps fn : R+ → R+ given by

$$f_n(x) = a_n x + c^* - \frac{x}{\bigl(\log(2+x)\bigr)^2}, \qquad n \in \mathbb{N}.$$

With gn(x) ≡ c* − x/(log(2+x))^2, clearly gn = o(x/log x) uniformly as x → +∞. Thus all assumptions of Theorem 6.40 are satisfied, except for the fact that lim_{n→∞} an = 1. Since fn(x) ≤ an x + c* for all n ∈ N and x ∈ R+, it is clear that log xn = log fn ◦ ... ◦ f1(x0) = O(n^δ) as n → ∞. On the other hand, it is readily confirmed that fn(x) < x for all sufficiently large n and x* + 1 ≤ x ≤ 10^{n^{(1−δ)/3}}. Therefore, whenever δ < (1 − δ)/3, or equivalently δ < 1/4, lim sup_{n→∞} xn < +∞. Since fn → f∞ uniformly on every compact subset of R+, it follows that lim_{n→∞} xn = x* for every x0 ∈ R+. Hence for 0 < δ < 1/4 and any x0 ∈ R+, the sequence (xn) = (fn ◦ ... ◦ f1(x0)) is not Benford but (∏_{j=1}^n aj) is. Thus the conclusion of Theorem 6.40 fails. z

Example 6.43. To require that gn = o(x/log x) as x → +∞ for every n ∈ N is not enough in Theorem 6.40, i.e., the uniformity assumption cannot in general be dropped from that theorem. For a simple example demonstrating this, consider the maps fn : R+ → R+ with
$$f_n(x) = 1 + \sqrt{4x^2 + 9^n} - 3^n = 2x + 1 + \frac{9^n}{\sqrt{4x^2 + 9^n} + 2x} - 3^n, \qquad n \in \mathbb{N}.$$
Here an ≡ 2, and for every n ∈ N,

$$g_n(x) = 1 + \frac{9^n}{\sqrt{4x^2 + 9^n} + 2x} - 3^n = O(1) \quad\text{as } x \to +\infty.$$

Thus all assumptions of Theorem 6.40 are met, except that gn = o(x/log x) does not hold uniformly as x → +∞ because, for instance,
$$g_n(x)\,\frac{\log x}{x}\Big|_{x=3^n} = -n(3 - \sqrt5)\log 3 + n3^{-n}\log 3.$$


Since fn(x) ≤ 2x + 1, it is clear that xn = O(2^n) as n → ∞. On the other hand, fn(x) < x for all sufficiently large n and 2 ≤ x ≤ 3^n/2. Also, since
$$f_n(x) = 1 + \sqrt{4x^2 + 9^n} - 3^n = 1 + \frac{4x^2}{\sqrt{4x^2 + 9^n} + 3^n} \xrightarrow{n\to\infty} 1$$
uniformly on every compact subset of R+, it follows that lim_{n→∞} xn = 1 for every x0 > 0. Since (∏_{j=1}^n aj) = (2^n) is Benford while (xn) = (fn ◦ ... ◦ f1(x0)) is not, the conclusion of Theorem 6.40 again fails. z

As Theorem 6.40 shows, in order to understand the Benford property of nonautonomous orbits (xn) = (fn ◦ ... ◦ f1(x0)) with fn given by (6.13), one only has to decide whether the sequence (∏_{j=1}^n aj) is Benford. The following simple observation helps with this task.

Lemma 6.44. Let (an) be a sequence of positive real numbers. Then the sequence (∏_{j=1}^n aj) = (a1, a1a2, ...) is Benford if

(i) lim_{n→∞} an = a∞ exists, a∞ > 0, and log a∞ is irrational; or

(ii) an = g(n) for all n ∈ N, where g is any non-constant polynomial.

Proof. If (i) holds then
$$\log\prod_{j=1}^{n+1} a_j - \log\prod_{j=1}^{n} a_j = \log a_{n+1} \xrightarrow{n\to\infty} \log a_\infty,$$
and hence (∏_{j=1}^n aj) is Benford whenever log a∞ ∉ Q, by Proposition 4.6(i).

Assume in turn that (ii) holds, i.e., an = g(n) for all n ∈ N and some non-constant polynomial g(t) = αp t^p + α_{p−1} t^{p−1} + ... + α0, where p ∈ N and α0, ..., α_{p−1}, αp ∈ R with αp > 0. It is straightforward to see that, for all n ∈ N,
$$\log\prod_{j=1}^n a_j = \sum_{j=1}^n \log g(j) = p\sum_{j=1}^n \log j + n\log\alpha_p + \frac{\alpha_{p-1}}{\alpha_p}\log e\sum_{j=1}^n \frac1j + \beta_n,$$

where (βn) is a convergent sequence. The Euler summation formula, for instance, shows that the sequences
$$\Bigl(\sum_{j=1}^n \log j - \tfrac12\log n - n\log n + n\log e\Bigr) \quad\text{and}\quad \Bigl(\sum_{j=1}^n \frac{\log e}{j} - \log n\Bigr)$$

both converge. It follows that the sequence
$$\Bigl(\sum_{j=1}^n \log g(j) - pn\log n - n(\log\alpha_p - p\log e) - \Bigl(\frac p2 + \frac{\alpha_{p-1}}{\alpha_p}\Bigr)\log n\Bigr)$$
is convergent as well. With Proposition 4.6(iii), (∏_{j=1}^n aj) is Benford if and only if (h(n)) is u.d. mod 1, where h(t) = pt log t + t(log αp − p log e), and an application of [93, Exc. I.2.26] shows that (h(n)) is u.d. mod 1. □


Example 6.45. (i) Let f1(x) = x, and for n ≥ 2 consider the linear maps fn(x) = (Fn/F_{n−1})x. Thus an = Fn/F_{n−1} and lim_{n→∞} an = φ = (1 + √5)/2, and since log φ is irrational, (fn ◦ ... ◦ f1(x0)) = (Fn x0) is Benford for all x0 ≠ 0. Specifically, choosing x0 = 1 provides yet another proof that (Fn) is Benford.

(ii) For fn(x) = nx, Lemma 6.44 implies that (fn ◦ ... ◦ f1(x0)) = (n! x0) is Benford for every x0 ≠ 0, as already suggested by Figure 6.1. z
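The Benford property of (n!) in part (ii) is easy to probe empirically: accumulate log10 n! and read the first significant digit off the fractional part (a sketch; the sample size 10,000 is arbitrary):

```python
import math

N = 10_000
counts = [0] * 10
log_fact = 0.0
for n in range(1, N + 1):
    log_fact += math.log10(n)              # log10(n!) without huge integers
    frac = log_fact - math.floor(log_fact)
    counts[int(10 ** frac)] += 1           # first significant digit of n!

freq_1 = counts[1] / N
print(round(freq_1, 4), round(math.log10(2), 4))
```

The observed frequency of a leading 1 comes out close to log10 2 ≈ 0.3010, and the remaining digits match log10(1 + 1/d) similarly well.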

The remainder of this section focuses on nonautonomous analogues of the maps leading to super-exponential growth (as studied in Section 6.4). This also provides a natural opportunity to present proofs for (more general versions of) two results stated and used earlier (Propositions 6.20 and 6.31). To formulate a nonautonomous analogue of Proposition 6.20, consider the maps fn : R+ → R+ with
$$f_n(x) = a_n x^{b_n}, \qquad n \in \mathbb{N}, \tag{6.14}$$
where (an) and (bn) are sequences of positive and non-zero real numbers, respectively. The following is a generalization of Proposition 6.20, to which it reduces if an ≡ a and bn ≡ b. Note that unlike for the latter, bn > 1 is not required to hold for every n in (6.14).

Theorem 6.46. For every n ∈ N, let fn(x) = an x^{bn} with an > 0 and bn ≠ 0. If lim inf_{n→∞} |bn| > 1 then (fn ◦ ... ◦ f1(x0)) is Benford for almost all x0 > 0, but every non-empty open interval in R+ contains uncountably many x0 for which (fn ◦ ... ◦ f1(x0)) is not Benford.

Proof. For every n ∈ N, let hn(y) = log fn(10^y) = bn y + log an. Note that if x0 > 0 then xn = fn ◦ ... ◦ f1(x0) > 0 for all n, and
$$y_n = \log x_n = \log f_n \circ \ldots \circ f_1(x_0) = h_n \circ \ldots \circ h_1(\log x_0).$$
Clearly, the map Hn = hn ◦ ... ◦ h1 is linear for every n; in fact
$$H_n(y) = h_n \circ \ldots \circ h_1(y) = b_1 b_2 \cdots b_n\Bigl(y + \sum_{j=1}^n \frac{\log a_j}{b_1 b_2 \cdots b_j}\Bigr).$$
Since lim inf_{n→∞} |bn| > 1 and Hn has non-zero slope, it can be assumed without loss of generality that |bn| ≥ 10^α > 1 for some α > 0 and all n ∈ N. Also, H'_m − H'_n is constant (hence Hm − Hn is monotone), and for all m > n,
$$|H'_m - H'_n| = |b_{n+1} \cdots b_m - 1|\,|b_1 \cdots b_n| \ge (10^\alpha - 1)\,10^{\alpha n} > 0.$$
By Proposition 4.14, (Hn(y)) is u.d. mod 1 for almost all y ∈ R. Thus (xn) is Benford for almost all x0 > 0. It remains to establish the claims regarding the exceptional points. To this end, for every 0 < δ < 1/4 denote by Nδ the smallest integer such that δ10^{αNδ} ≥ 1, i.e., Nδ = −⌊α^{−1} log δ⌋, and fix δ > 0 so small that 5δNδ < 1. Fix any y ∈ R and any 0 < ε < 1/4. Since |b1 b2 ··· b_{Nε}| · 2ε ≥ 2 · 10^{αNε} ε ≥ 2, the interval


H_{Nε}([y − ε, y + ε]) has length at least 2. Consequently, there exist an integer k1 and a closed interval J1 ⊂ [y − ε, y + ε] such that H_{Nε}(J1) = [k1 + 1/2 − δ, k1 + 1/2 + δ]. But then H_{Nδ+Nε}(J1) again has length at least 2, and so there exist k2 ∈ Z and a closed interval J2 ⊂ J1 such that H_{Nε}(J2) ⊂ [k1 + 1/2 − δ, k1 + 1/2 + δ] and
$$H_{N_\delta+N_\varepsilon}(J_2) = [k_2 + \tfrac12 - \delta,\; k_2 + \tfrac12 + \delta].$$
Continuing in this manner, there exist integers k1, k2, ... and closed intervals J1 ⊃ J2 ⊃ ... such that, for every n ∈ N,
$$H_{(j-1)N_\delta+N_\varepsilon}(J_n) \subset [k_j + \tfrac12 - \delta,\; k_j + \tfrac12 + \delta] \quad\text{for all } j = 1, 2, \ldots, n-1,$$
and
$$H_{(n-1)N_\delta+N_\varepsilon}(J_n) = [k_n + \tfrac12 - \delta,\; k_n + \tfrac12 + \delta].$$

Since it is the intersection of a sequence of nested compact (and non-empty) intervals, the set ⋂_{n=1}^∞ Jn is not empty, and for every y* ∈ ⋂_{n=1}^∞ Jn,
$$H_{(n-1)N_\delta+N_\varepsilon}(y^*) \in [k_n + \tfrac12 - \delta,\; k_n + \tfrac12 + \delta] \quad\text{for all } n \in \mathbb{N}.$$

It follows that
$$\liminf_{N\to\infty} \frac{\#\{1 \le n \le N : \langle H_n(y^*)\rangle \in [\frac12 - \delta, \frac12 + \delta]\}}{N} \ge \frac{1}{N_\delta} > 5\delta, \tag{6.15}$$
and so (Hn(y*)) is not u.d. mod 1. For convenience, let
$$U = \bigl\{y \in \mathbb{R} : (H_n(y)) \text{ is u.d. mod } 1\bigr\}.$$
With this, the above argument shows that R \ U is dense. Moreover, consider the continuous, 1-periodic function ψ : R → [0, 1] given by
$$\psi(t) = \begin{cases} 0 & \text{if } |\langle t\rangle - \frac12| \ge 2\delta,\\ 2 - |\frac12 - \langle t\rangle|/\delta & \text{if } \delta \le |\langle t\rangle - \frac12| < 2\delta,\\ 1 & \text{if } |\langle t\rangle - \frac12| < \delta.\end{cases}$$
Since ∫₀¹ ψ(t) dt = 3δ, (6.15) implies lim inf_{N→∞} (1/N) ∑_{n=1}^N ψ(Hn(y*)) ≥ 5δ, and therefore
$$\liminf_{N\to\infty}\Bigl(\frac1N\sum_{n=1}^N \psi\bigl(H_n(y^*)\bigr) - \int_0^1 \psi(t)\,dt\Bigr) \ge 2\delta. \tag{6.16}$$

For every m ∈ N, define the set Um ⊂ R as
$$U_m = \Bigl\{y \in \mathbb{R} : \frac1N\sum_{n=1}^N \psi\bigl(H_n(y)\bigr) - \int_0^1 \psi(t)\,dt \le \delta \text{ for all } N \ge m\Bigr\},$$

and note that Um is closed because the functions ψ and H1 , H2 , . . . are continuous. Also, U1 ⊂ U2 ⊂ . . ., and (6.16) shows that Um does not contain any open intervals, that is, Um has empty interior. Finally, note that if


y ∈ U, i.e., if (Hn(y)) is u.d. mod 1, then it follows from [93, Cor. I.1.2] that lim_{N→∞} (1/N) ∑_{n=1}^N ψ(Hn(y)) = ∫₀¹ ψ(t) dt, and hence y ∈ Um for all sufficiently large m. In other words, U ⊂ ⋃_{m=1}^∞ Um. Thus the set U is contained in the countable union of the nowhere dense sets U1, U2, .... For every non-empty open interval J ⊂ R, J = ⋃_{m∈N}(J ∩ Um) ∪ ⋃_{x∈J\U}{x}, so since J ∩ Um and {x} are nowhere dense for all m ∈ N and x ∈ R, the Baire Category Theorem implies that J \ U is uncountable. It follows that the set {x0 ∈ J : (xn) is not Benford} is uncountable for every non-empty open interval J ⊂ R+. □

Example 6.47. (i) Let fn(x) = 2^n x² for every n ∈ N. By Theorem 6.46, (xn) = (fn ◦ ... ◦ f1(x0)) is Benford for almost all, but not all, x0 ∈ R. Note that in order to draw this conclusion it is not necessary to know that lim_{n→∞} xn = 0 if and only if |x0| ≤ 1/4, and lim_{n→∞} xn = +∞ otherwise.

(ii) Let fn(x) = 2^n/x² for every n ∈ N. Again, Theorem 6.46 applies and shows that (xn) = (fn ◦ ... ◦ f1(x0)) is Benford for almost all, but not all, x0 ∈ R. As in (i), this conclusion does not depend on the specific behavior of (xn), which is now slightly more complicated: If |x0| > 2^{2/9} then lim_{n→∞} x_{2n} = +∞ and lim_{n→∞} x_{2n−1} = 0, whereas for |x0| < 2^{2/9} the roles of (x_{2n}) and (x_{2n−1}) are reversed. Moreover, xn = 2^{(2+3n)/9} → +∞ if |x0| = 2^{2/9}. z

In Theorem 6.46, it is not enough to assume that bn > 1 for all (sufficiently large) n ∈ N. Under the latter, weaker assumption, (fn ◦ ... ◦ f1(x0)) may be Benford for every x0 > 0, or for none at all, as the following example shows.

Example 6.48. (i) Consider the maps fn(x) = 10^n x^{3^{1/n}}, i.e., an = 10^n and bn = 3^{1/n} > 1 + n^{−1}, for which
$$x_n = f_n \circ \ldots \circ f_1(x_0) = 10^{\sum_{j=1}^{n} j \prod_{i=j+1}^{n} 3^{1/i}}\; x_0^{3^{\sum_{j=1}^{n} 1/j}}, \qquad n \in \mathbb{N}.$$
Recall the well-known fact that ∑_{j=1}^n 1/j − ln n − γ − (1/2)n^{−1} + (1/12)n^{−2} = O(n^{−4}) as n → ∞, where γ = 0.5772 is Euler's constant. With this, it is straightforward to show that
$$\log x_n = \frac{n^2}{2 - \ln 3} + \alpha_1 n^{\ln 3} + \alpha_2 n + \alpha_3 n^{\ln 3 - 1} + \beta_n,$$
with a convergent sequence (βn) and the appropriate α1, α2, α3 ∈ R, where the latter depend on log x0 but, more importantly, are independent of n. It now follows from [93, Thm. I.3.1] that (log xn) is u.d. mod 1, i.e., (xn) is Benford for every x0 > 0.

(ii) On the other hand, if an ≡ 1 and bn = 3^{1/n²} > 1 + n^{−2}, and thus fn(x) = x^{3^{1/n²}}, then
$$x_n = f_n \circ \ldots \circ f_1(x_0) = x_0^{\prod_{j=1}^n 3^{1/j^2}} = x_0^{3^{\sum_{j=1}^n 1/j^2}}, \qquad n \in \mathbb{N},$$
and so, for every x0 > 0 the sequence (xn) is not Benford since it converges to the finite positive limit lim_{n→∞} xn = x0^α with α = 3^{∑_{j=1}^∞ 1/j²} = 3^{π²/6} = 6.093. z


In analogy to Section 6.4, maps more general than fn(x) = an x^{bn} are now considered. In situations where most of the maps fn : R+ → R+ are strongly expanding, the following generalized version of Proposition 6.31 may be useful. Only an outline of the proof is given here, under an additional assumption, and the reader is referred to [13] for a complete proof.

Theorem 6.49. Let c ≥ 0 and, for every n ∈ N, let fn : R+ → R+ be a map such that both of the following conditions hold:

(i) The function log fn(10^x) is convex on (c, +∞);

(ii) (log fn(10^x) − log fn(10^c))/(x − c) ≥ bn > 0 for all x > c.

If lim inf_{n→∞} bn > 1 then (fn ◦ ... ◦ f1(x0)) is Benford for almost all sufficiently large x0, but there also exist uncountably many x0 > c for which (fn ◦ ... ◦ f1(x0)) is not Benford.

Outline of Proof. For every n ∈ N, let gn(x) = log fn(10^x). Then, by assumption, every map gn is convex on (c, +∞), and gn(x) − gn(c) ≥ bn(x − c). The main idea of the proof is given here only for the special case where, in addition, x^{−1}gn(x) is non-decreasing and bn ≥ b > 1 for some fixed b; the complete details are in [15, Sec. 5].

By Theorem 4.2, (fn ◦ ... ◦ f1(x0)) is Benford if and only if (log fn ◦ ... ◦ f1(x0)) is u.d. mod 1. Since log maps sets of measure zero into sets of measure zero, setting
$$S_n(x) = \log f_n \circ \ldots \circ f_1(10^x) = g_n \circ \ldots \circ g_1(x),$$
it suffices to show that for all sufficiently large j ∈ N,
$$\bigl(S_n(x)\bigr) \text{ is u.d. mod 1 for almost all } x \in [j-1, j]. \tag{6.17}$$

Fix 0 < s < 1 and let Yn = 1_{[0,s)}(⟨Sn⟩), i.e., Yn = 1 if ⟨Sn⟩ < s, and Yn = 0 otherwise. Since a random variable X is uniformly distributed on [j − 1, j] if and only if P(X ≤ j − 1 + s) = s for all rational 0 < s < 1, and since countable unions of sets of measure zero have measure zero themselves, to establish (6.17) it suffices to show that
$$\frac{Y_1 + \cdots + Y_n}{n} \to s \quad\text{a.s. as } n \to \infty. \tag{6.18}$$

In general the (Yn) are neither independent nor identically distributed, so the classical Strong Law of Large Numbers does not apply. However, using the fact that if log f and log g are convex, non-decreasing, and nonnegative, then so are f, g, and log(f ◦ g), and the fact [15, Lem. 5.6] that if f : [0, 1] → R is convex, non-decreasing, and nonnegative, then for all 0 < s < 1,
$$s - \frac{1}{f'_+(0)} \le \lambda\bigl(\{x \in [0,1] : \langle f(x)\rangle \le s\}\bigr) \le s + \frac{2}{f'_+(0)},$$


(6.18) can be established [15, Thm. 5.5] using a strong law for averages of bounded random variables satisfying an O(N^{−3})-growth constraint on their correlations [100, p. 154]. □

Remark. As in the autonomous context of Section 6.4, when applied to the maps fn(x) = an x^{bn} with positive an, bn, Theorem 6.49 only yields a weaker form of Theorem 6.46.

Example 6.50. (i) Let fn be given by
$$f_n(x) = \begin{cases} x^2 & \text{if } n \text{ is a prime number},\\ 2x & \text{if } n \text{ is not a prime number}.\end{cases}$$
By Theorem 6.49, (xn) = (fn ◦ ... ◦ f1(x0)) is Benford for almost all, but not all, sufficiently large x0, and in fact for almost all x0 ∈ R, since x4 > 1 and in any case lim_{n→∞} xn = +∞.

(ii) For fn(x) = x^{2n} + 1, n ∈ N, Theorem 6.49 applies with c = 0 and bn = n. Hence (fn ◦ ... ◦ f1(x0)) is Benford for almost all, but not all, x0 ∈ R.

(iii) With the maps fn : R+ → R+ given by
$$f_n(x) = \begin{cases} x & \text{if } n \text{ is a prime number},\\ x^2 & \text{if } n \text{ is not a prime number},\end{cases}$$
assumptions (i) and (ii) in Theorem 6.49 hold with c = 0 and bn = 1 or bn = 2, depending on whether n is prime or not. Consequently, lim inf_{n→∞} bn = 1, and the theorem does not apply. However, it follows from [10, Thm. 3.1] that (fn ◦ ... ◦ f1(x0)) is Benford for almost all x0 ∈ R+. z

In Theorem 6.49, the assumptions of convexity and lim inf_{n→∞} bn > 1 can be relaxed somewhat [13]. However, as the following example shows, the conclusion of the theorem may fail if one of its hypotheses is violated for even a single n.

Example 6.51. The functions fn given by
$$f_n(x) = \begin{cases} x^2 & \text{if } n \ne 2015,\\ 10 & \text{if } n = 2015,\end{cases}$$
satisfy (i) and (ii) in Theorem 6.49 for every n ≠ 2015, but do not satisfy (ii) for n = 2015. Clearly, (xn) is not Benford for any x0 > 0 because D1(xn) = 1 whenever n ≥ 2015. z

6.7 CHAOTIC SYSTEMS: TWO EXAMPLES

The scenarios studied so far for their conformance to Benford’s law have all been dynamically very simple indeed: In Theorems 6.13, 6.23, 6.40, and 6.49, limn→∞ xn = +∞ holds automatically for all relevant initial values x0 , whereas limn→∞ xn = 0 in Corollaries 6.18 and 6.28. While  this dynamical simplicity of (xn ) does not necessarily force the behavior of S(xn ) to be equally simple


(recall Example 6.21), it raises the question of what may happen under more general circumstances, that is, in situations where (some or most) orbits exhibit a less trivial long-term behavior. The present section presents two simple examples in this regard. Both systems are chaotic in the sense that nearby orbits diverge quickly, exhibiting what appears to be a completely erratic pattern of recurrence. While the latter is very intricate mathematically, as far as Benford's law is concerned, fortunately, the discussion of the two examples can be kept quite informal and non-technical. Nevertheless, the examples illustrate how Benford sequences, though not completely absent, may be less prevalent here than for the simple dynamical systems studied in earlier sections.

Example 6.52. Let f : R → R be the classical tent map defined in Example 2.8(i), i.e., f(x) = 1 − |2x − 1|. Since f(x) = 2x for all x ≤ 1/2, it is clear that (f^n(x0)) is Benford whenever x0 ∉ [0, 1]. Also, f(0) = f(1) = 0. As far as Benford's law is concerned, therefore, it only remains to analyze (f^n(x0)) for 0 < x0 < 1. To this end, similarly to Section 6.4, consider the set
$$B = \bigl\{x \in [0,1] : (f^n(x)) \text{ is Benford}\bigr\}.$$
From the graph of f and its iterates (see Figure 6.6), it is evident that f has many periodic points, and the latter are actually dense in [0, 1]. Concretely, it is not hard to see that x0 is a p-periodic point (p ∈ N) whenever
$$x_0 \in \Bigl\{\frac{2j}{2^p + 1} : j = 1, 2, \ldots, 2^{p-1}\Bigr\}.$$
For example, x0 = 2/3 is a fixed point, and x0 = 4/9 is 3-periodic, its orbit being (8/9, 2/9, 4/9, 8/9, 2/9, ...). Thus [0, 1] \ B contains many points. Moreover, note that for every 0 < a < b < 1,
$$\lambda_{0,1} \circ f^{-1}([a,b]) = \lambda_{0,1}\Bigl(\bigl[\tfrac a2, \tfrac b2\bigr] \cup \bigl[1 - \tfrac b2,\, 1 - \tfrac a2\bigr]\Bigr) = b - a = \lambda_{0,1}([a,b]),$$
which shows that λ0,1 ◦ f^{−1} = λ0,1, i.e., f is λ0,1-preserving. In fact, it can be shown that f is even ergodic with respect to λ0,1. By the Birkhoff Ergodic Theorem, (xn) = (f^n(x0)) is distributed according to λ0,1 for (Lebesgue) almost all x0 ∈ [0, 1]. Recall from Example 3.10(i) that for every such x0 the sequence (S(xn)) is uniformly distributed in [1, 10), and hence is not Benford. It follows that λ0,1(B) = 0, and the reader may wonder whether B ≠ ∅, i.e., whether B contains any points at all. As the following argument shows, it does indeed. Let IL and IR denote, respectively, the left and right half of [0, 1], that is, IL = [0, 1/2] and IR = [1/2, 1], and note that f(x) = 2x whenever x ∈ IL. Thus xn = 2x_{n−1}, and (xn) would clearly be Benford if only xn = f^n(x0) ∈ IL held for all n. Unfortunately, though, the latter is impossible, since the only point x0 with f^n(x0) ∈ IL for all n ∈ N is the fixed point x0 = 0, the orbit of which trivially is not Benford. Recall, however, that being Benford is an asymptotic property, and hence in order for (xn) to be Benford it is enough for xn ∈ IL


Figure 6.6: For the tent map f(x) = 1 − |2x − 1|, typical orbits (xn) = (f^n(x0)) with 0 < x0 < 1 are uniformly distributed in [0, 1] (top right) and hence have a non-Benford asymptotic distribution of first significant digits (bottom right); see Example 6.52.

to hold for most n. To be concrete, consider any sequence (ωn) in {L, R}, i.e., (ωn) is a sequence made up of the two symbols L and R, of the form
$$(\omega_n) = (\underbrace{L, L, \ldots, L}_{N_1 \text{ times}}, R, \underbrace{L, L, \ldots, L}_{N_2 \text{ times}}, R, \underbrace{L, L, \ldots, L}_{N_3 \text{ times}}, R, L, \ldots), \tag{6.19}$$
where (Nn) is any sequence of positive integers. Since, for every n ∈ N, the set
$$J_n = I_L \cap \bigcap_{j=1}^n (f^j)^{-1}(I_{\omega_j}) = \bigl\{x \in [0, \tfrac12] : f^j(x) \in I_{\omega_j} \text{ for all } j = 1, 2, \ldots, n\bigr\}$$
is a closed interval of length 2^{−(n+1)} with J1 ⊃ J2 ⊃ ..., it is clear that there exists a unique point xω ∈ [0, 1/2] such that f^n(xω) ∈ I_{ωn} for every n ∈ N. Specifically, if (Nn) is increasing, and hence lim_{n→∞} Nn = ∞, it follows from [13, Lem. 2.7(i)], and can also be verified directly, that (f^n(xω)) is Benford. For a concrete example, choose (Nn) = (n), i.e., take
$$(\omega_n) = (L, R, L, L, R, L, L, L, R, L, L, L, L, R, L, L, L, L, L, R, L, \ldots),$$
in which case
$$x_\omega = \sum_{n=1}^\infty 2^{-n(n+3)/2}(-1)^{n+1} = 2^{-2} - 2^{-5} + 2^{-9} - 2^{-14} + 2^{-20} - \ldots = 0.2206\ldots,$$


and (f^n(xω)) is Benford. Notice that if f^p(x0) = xω for some p ∈ N, then (f^n(x0)) is Benford as well. It follows that B is dense in [0, 1]. Finally, observe that (6.19) with arbitrary increasing (Nn) yields uncountably many different points xω. Thus B is uncountable. Despite being a nullset, therefore, the set B is actually quite large in that it is uncountable and dense in [0, 1]. z

Remark. Conclusions similar to those in Example 6.52 also hold for the popular (full) logistic map f(x) = 4x(1 − x); see [13]. For an alternative approach to Benford's law for tent and logistic maps, the reader is referred to [151].

Example 6.53. Dynamically, the smooth map f : R → R given by
$$f(x) = \begin{cases} 1 - e^{8x(x-1)/(2x-1)^2} & \text{if } x \ne \tfrac12,\\ 1 & \text{if } x = \tfrac12,\end{cases}$$
has much in common with the tent map in Example 6.52: If x0 ∉ [0, 1] then (xn) = (f^n(x0)) is very regular in that lim_{n→∞} xn = x*, with x* = −6.310 denoting the unique attracting fixed point of f. In this case, (xn) clearly is not Benford. (Using Lemma 6.36, it is not hard to see that (xn − x*) and (x_{n+1} − xn) are Benford.) Also, f(0) = f(1) = 0, and once again it only remains to consider the case 0 < x0 < 1, for which the orbit (f^n(x0)) is typically quite chaotic. However, as Figure 6.7 (top right) suggests, this chaotic behavior is significantly different from the one observed in Example 6.52 in that typical orbits now seem to be close to x = 0 disproportionately often, that is, xn ≈ 0 for many n. More formally, it can be proved that for every ε > 0 and almost all 0 < x0 < 1,
$$\lim_{N\to\infty} \frac{\#\{1 \le n \le N : f^n(x_0) < \varepsilon\}}{N} = 1. \tag{6.20}$$
Note next that f(x) ≈ f'(0)x = 8x for x ≈ 0, and (6.20), together with the fact that log 8 is irrational, strongly suggests that (xn) is Benford whenever (6.20) holds, namely, for almost all x0. A rigorous argument [13] shows that indeed
$$\lambda_{0,1}\bigl(\bigl\{x_0 \in [0,1] : (x_n) \text{ is Benford}\bigr\}\bigr) = 1.$$

Unlike in the previous example, Benford orbits of f now constitute a set of full measure. The deeper reason for this, and indeed for (6.20), can be seen in the fact that, unlike the tent map, the map f does not preserve any absolutely continuous probability measure, but does preserve an absolutely continuous infinite measure with well-understood properties [161]. z
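The typical (non-Benford) behavior in Example 6.52 can also be checked numerically: orbits equidistribute in [0, 1], so every first digit occurs with frequency about 1/9 ≈ 0.111 instead of Benford's log10 2 ≈ 0.301 for the digit 1. One caveat: the tent map discards one binary digit per step, so a plain double-precision orbit collapses to 0 within about 55 iterations. The sketch below therefore uses Python's `decimal` module with a long pseudo-random seed, an arbitrary stand-in for a "generic" starting point:

```python
import random
from decimal import Decimal, getcontext

getcontext().prec = 1200  # ~0.3 decimal digits are lost per step, so 1200
                          # digits comfortably support 3000 iterations
random.seed(1)            # arbitrary seed for a "generic" starting point
x = Decimal("0." + "".join(random.choice("0123456789") for _ in range(1100)))

half, one = Decimal("0.5"), Decimal(1)
counts = [0] * 10
for _ in range(3000):
    x = 2 * x if x <= half else 2 * (one - x)  # tent map f(x) = 1 - |2x - 1|
    s = x.scaleb(-x.adjusted())                # significand of x in [1, 10)
    counts[int(s)] += 1

print([round(c / 3000, 3) for c in counts[1:]])
```

All nine frequencies come out near 1/9, in line with the uniform significand distribution of λ0,1-typical orbits.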

6.8 DIFFERENTIAL EQUATIONS

By presenting a few results on, and examples of, differential equations, i.e., continuous-time deterministic processes, this section aims at convincing the reader that the emergence of Benford’s law is not at all restricted to discrete-time


dynamics. Rather, solutions of ordinary or partial differential equations often are Benford as well.

Figure 6.7: The map f in Example 6.53 has a flat critical point, and (f^n(x0)) is Benford for almost all x0 ∈ [0, 1].

Recall that a (Borel measurable) function f : [0, +∞) → R is Benford if and only if log |f| is u.d. mod 1. Let F : R → R be a continuously differentiable function, and, given any x0 ∈ R, consider the initial value problem (IVP)
$$\dot x = F(x), \qquad x(0) = x_0; \tag{6.21}$$

here, as usual, ẋ denotes the first derivative of x = x(t) with respect to t, that is, ẋ = dx/dt. To keep the presentation simple, it will be assumed throughout that:

For every x0 ∈ R, the IVP (6.21) has a unique solution x = x(t) for all t ≥ 0. (6.22)

The following simple, explicitly solvable examples illustrate how solutions of (6.21) may or may not be Benford.

Example 6.54. (i) If F(x) ≡ 1 then the solution of (6.21) is x(t) = x0 + t, and hence not Benford, by Example 4.9(ii).

(ii) For F(x) = ax with a ∈ R, one finds x(t) = x0 e^{at}, and so x is Benford unless ax0 = 0, by Example 4.9(iii).

(iii) If F(x) = −x³ then x(t) = x0/√(1 + 2tx0²), and Proposition 4.8(ii) shows that no solution of (6.21) is Benford. z
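For case (ii) with a = 1 and x0 = 1 (both arbitrary choices), sampling x(t) = e^t over a long time window shows the fraction of time each leading digit occupies settling at log10(1 + 1/d):

```python
import math

T, N = 1000.0, 500_000   # time horizon and sample count, both arbitrary
c = math.log10(math.e)   # log10 x(t) = t * log10 e  for x(t) = e^t
counts = [0] * 10
for k in range(N):
    frac = (k * T / N * c) % 1.0
    counts[int(10 ** frac)] += 1   # first significant digit of x(t)

freq = [counts[d] / N for d in range(10)]
print([round(f, 3) for f in freq[1:]])
print([round(math.log10(1 + 1 / d), 3) for d in range(1, 10)])
```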

Note that if F (x0 ) = 0 then x(t) ≡ x0 is the unique solution of (6.21), referred to as an equilibrium (or stationary) solution. Clearly, no equilibrium


solution is Benford. On the other hand, if F(x0) > 0, say, then by (6.22) either lim_{t→+∞} x(t) = +∞ or lim_{t→+∞} x(t) = x* for some x* ∈ R with x* > x0. In the first case, F(x) > 0 for all x > x0, and it can be assumed without loss of generality that x0 = 0. (Otherwise replace x by x + x0.) In the second case, F(x) > 0 for all x ∈ (x0, x*), but F(x*) = 0, and x* − x solves (6.21) with F(x) and x0 replaced by F̃(x) = −F(x* − x) and x* − x0, respectively. Note that F̃(0) = 0 and F̃(x) < 0 for all x ∈ (0, x* − x0). The case of F(x0) < 0 can be dealt with similarly. As far as Benford's law is concerned, therefore, only the following two special cases of (6.21) have to be considered with x0 > 0: Either F(x) > 0 for all x > 0 (and hence lim_{t→+∞} x(t) = +∞), or else F(0) = 0 and F(x) < 0 for all x > 0 (and hence lim_{t→+∞} x(t) = 0). Simple results pertaining to these two cases will now be discussed separately.

Assume first that F(x) > 0 for all x > 0. As suggested by Example 6.54(i), if F(x) only grows slowly as x → +∞, or does not grow at all, then the solutions of (6.21) are not Benford. More formally, the following continuous-time analogue of Theorem 6.4 holds.

Theorem 6.55. Let F : R → R be C¹, and assume that F(x) > 0 for all x > 0. If F = o(x^{1−ε}) as x → +∞ with some ε > 0, then, for every x0 > 0, the solution of (6.21) is not Benford.

Proof. Since lim_{t→+∞} x(t) = +∞, there exists t0 ≥ 0 with ẋ(t) ≤ x(t)^{1−ε} for all t ≥ t0. But then, for every t ≥ t0,
$$x(t)^\varepsilon = x(t_0)^\varepsilon + \int_{t_0}^t \frac{d}{d\tau}x^\varepsilon\,d\tau = x(t_0)^\varepsilon + \int_{t_0}^t \varepsilon x(\tau)^{\varepsilon-1}\dot x(\tau)\,d\tau \le x(t_0)^\varepsilon + \varepsilon(t - t_0),$$

which in turn shows that lim sup_{t→+∞} log x(t)/log t ≤ ε^{−1} < +∞, and hence, by Proposition 4.8(ii), log x is not u.d. mod 1, i.e., x is not Benford. □

Example 6.56. If F(x) = x³/(1 + x⁴) then F = O(x^{−1}) as x → +∞, and so Theorem 6.55 applies, showing that no solution of (6.21) with x0 > 0 is Benford. Since d/dt(−x) = −ẋ = −F(x) = F(−x), the same is true for x0 < 0. Finally, x0 = 0 is an equilibrium, so no solution of (6.21) is Benford. Note that rather than by invoking Theorem 6.55, this conclusion could also have been reached via an explicit calculation: For every x0 ≠ 0,
$$x(t)^2 = \frac{x_0^4 + 2tx_0^2 - 1 + \sqrt{(x_0^4 + 2tx_0^2 - 1)^2 + 4x_0^4}}{2x_0^2}, \qquad t \ge 0,$$
which again shows that lim sup_{t→+∞} log |x(t)|/log t < +∞ and, as in the proof of Theorem 6.55, implies that x is not Benford. z

In complete analogy to the discrete-time case (Example 6.7(ii)), the assumption F = o(x^{1−ε}) as x → +∞ in Theorem 6.55 cannot be weakened to F = o(x), as the following simple example shows.


Example 6.57. Let F : R → R+ be any C¹-function with F(x) = x/log x for all x ≥ 2. If x0 ≥ 2 then
$$\bigl(\log x(t)\bigr)^2 = (\log x_0)^2 + \int_0^t \frac{2\log e\,\log x(\tau)}{x(\tau)}\,\dot x(\tau)\,d\tau = (\log x_0)^2 + 2t\log e, \qquad t \ge 0.$$
Since log x(t) = √((log x0)² + 2t log e) is u.d. mod 1 by Example 4.7(iii) and Proposition 4.8(i), every solution of (6.21) is Benford. Thus the conclusion of Theorem 6.55 may fail if F = o(x) as x → +∞. z

As the example F(x) = ax with a > 0 illustrates, if F(x) grows faster than o(x^{1−ε}) as x → +∞, then all solutions of (6.21) may be Benford. This is true in greater generality, as the following theorem, a continuous-time analogue of Theorem 6.13, shows. To fully appreciate the result, notice that the rate of growth of F(x) as x → +∞ is limited by (6.22) in that the latter implies that lim inf_{x→+∞} F(x)/x^{1+ε} = 0 for every ε > 0.

Theorem 6.58. Let F : R → R be C¹, and assume that F(x) > 0 for all x > 0. If F(x)/x converges to a finite positive limit as x → +∞, or if F(x)/x is non-decreasing on [c, +∞) for some c > 0, then the solution of (6.21) is Benford for every x0 > 0.

Proof. Assume first that lim_{x→+∞} F(x)/x = a ∈ R+. With x(t) → +∞ as t → +∞, and letting y = log x, it follows that
$$\dot y(t) = \log e\,\frac{\dot x(t)}{x(t)} = \log e\,\frac{F(x(t))}{x(t)} \xrightarrow{t\to+\infty} a\log e. \tag{6.23}$$
Fix any δ ∈ (0, 1) and let yn = y(nδ) for all n ∈ N. With this,
$$y_{n+1} - y_n = \int_{n\delta}^{(n+1)\delta} \dot y(t)\,dt = \int_0^\delta \dot y(n\delta + t)\,dt \xrightarrow{n\to\infty} a\delta\log e.$$

For all but countably many δ, therefore, the sequence (yn ) is u.d. mod 1, and Proposition 4.8(i) implies that y is u.d. mod 1 as well. In other words, x is Benford. To complete the proof, assume that F (x)/x is non-decreasing on [c, +∞). For all suÿ ciently large t, the function y = y(t) has a positive non-decreasing derivative, by (6.23). Hence y is u.d. mod 1 by [93, Exc. I.9.13], and again x is Benford.  Example 6.59. Let F (x) = 0
0, √ x2 + 1 1 √ −1= , x x2 + x x2 + 1

REAL-VALUED DETERMINISTIC PROCESSES

131

hence lim_{x→+∞} F(x)/x = 1. (Note that F(x)/x is actually decreasing on R⁺.) By Theorem 6.58, every solution of (6.21) is Benford. Again, this can be confirmed by an explicit calculation, since

x(t) = x₀ cosh t + √(x₀² + 1) sinh t = cosh t · (x₀ + √(x₀² + 1) tanh t), t ≥ 0,

is Benford for every x0 ∈ R; see Example 4.13(ii).

z
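A sketch of a numerical check (an illustration, not part of the text; the initial value x₀ = 1 is an arbitrary choice): by the addition formula for sinh, the explicit solution above equals sinh(t + asinh x₀), so its logarithm grows linearly and the empirical first-digit frequencies of sampled values should approach Benford's law.

```python
import math

# Sketch (illustration, not from the text): the solution of dx/dt = sqrt(x^2+1)
# is x(t) = x0*cosh(t) + sqrt(x0^2+1)*sinh(t) = sinh(t + asinh(x0)); sample its
# first significant digits and compare with Benford's law. x0 = 1 is arbitrary.

def log10_x(t, x0):
    z = t + math.asinh(x0)            # x(t) = sinh(z)
    if z > 30:                        # sinh(z) ~ e^z/2, avoids overflow
        return (z - math.log(2)) / math.log(10)
    return math.log10(math.sinh(z))

x0, T, n = 1.0, 2000.0, 200000
counts = [0] * 10
for k in range(1, n + 1):
    frac = log10_x(T * k / n, x0) % 1.0
    counts[min(int(10 ** frac), 9)] += 1   # first significant digit of x(t)

freq = [counts[d] / n for d in range(1, 10)]
benford = [math.log10(1 + 1/d) for d in range(1, 10)]
err = max(abs(f - b) for f, b in zip(freq, benford))
print("max deviation from Benford:", round(err, 4))
```

Since log₁₀ x(t) ≈ t·log₁₀ e + const for large t, uniform time sampling equidistributes the mantissa, and the deviation shrinks as T grows.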

Remark. If, in the setting of Theorem 6.58, lim_{x→+∞} F(x)/x equals 0 or +∞, then the solutions of (6.21) may or may not be Benford. For the case lim_{x→+∞} F(x)/x = 0, for instance, this can be seen from Examples 6.57 (where every solution is Benford) and 6.56 (where none is).

Now consider the second relevant scenario for the IVP (6.21) above. That is, assume F(0) = 0 and F(x) < 0 for all x > 0. As might be expected, with regard to Benford's law this case is, in a sense, reciprocal to the scenario studied earlier. For instance, if |F(x)| does not become small too fast as x → 0, then all solutions of (6.21) with x₀ close to 0 are Benford. More precisely, the following continuous-time analogue of Corollary 6.18 holds. Note that the result is a simple corollary of Theorem 6.58, via taking reciprocals.

Theorem 6.60. Let F : R → R be C¹ with F(0) = 0. If F′(0) < 0 then, for every x₀ ≠ 0 sufficiently close to 0, the solution of (6.21) is Benford.

Proof. Let x₀ ≠ 0 be so close to zero that xF(x) < 0 for all 0 < |x| ≤ |x₀|. Assume without loss of generality that x₀ > 0. (The case x₀ < 0 is analogous.) Hence F(x) < 0 for every x ∈ (0, x₀). Since

(d/dt)(x⁻¹) = −ẋ/x² = −F(x)/x²,

the function x⁻¹ is a solution of (6.21) with F(x) and x₀ replaced by, respectively, F̃(x) = −x²F(x⁻¹) and x₀⁻¹. Note that F̃ is C¹, with F̃(x) > 0 for all sufficiently large x, and

lim_{x→+∞} F̃(x)/x = −lim_{x→+∞} xF(x⁻¹) = −lim_{x→0} F(x)/x = −F′(0) > 0.

Hence by Theorem 6.58, x⁻¹ is Benford, and so is x, by Theorem 4.4. □



Example 6.61. (i) Let F(x) = −x/(x² + 1). By Theorem 6.60, every solution of (6.21) with x₀ ≠ 0 is Benford.

(ii) Consider the smooth function F(x) = −sin(πx). Clearly x(t) ≡ k is an equilibrium for every integer k. Since F′(k) = (−1)^{k+1}π, this equilibrium is attracting if k is even, and repelling (i.e., attracting for t → −∞) if k is odd. Note that if x is a solution of ẋ = F(x) then so is x − 2k, since F(x + 2) = F(x) for all x. It follows from Theorem 6.60 that x − 2k is Benford whenever x is a solution of (6.21) with 2k − 1 < x₀ < 2k + 1. z


If F(0) = 0 and F(x) < 0 for all x > 0, yet |F(x)| decreases rapidly as x → 0, then this may prevent the solutions of (6.21) from being Benford. A case in point is F(x) = −x³, where no solution is Benford, as seen in Example 6.54(iii). The general observation is as follows. (Note that the smoothness assumption on F is stronger here than anywhere else in this section.)

Theorem 6.62. Let F : R → R be C² with xF(x) ≤ 0 for all x in a neighborhood of 0. If F′(0) = 0 then, for every x₀ sufficiently close to 0, the solution of (6.21) is not Benford.

Proof. Let x₀ ≠ 0 be so close to 0 that xF(x) ≤ 0 for all 0 ≤ |x| ≤ |x₀|. Again, assume without loss of generality that x₀ > 0. If F(x*) = 0 for some 0 < x* ≤ x₀ then lim_{t→+∞} x(t) ≥ x* > 0, and clearly x is not Benford. Thus it only remains to consider the case where F(x) < 0 for all 0 < x ≤ x₀. By assumption, there exists a continuous function G : R → R with G(x) > 0 for all 0 < x ≤ x₀ such that F(x) = −x²G(x). Let G₀ = max_{0≤x≤x₀} G(x) > 0, and consider the function y = −log x. Then lim_{t→+∞} y(t) = +∞, and

ẏ = −log e · ẋ/x = log e · xG(x) = log e · 10^{−y} G(10^{−y}) ≤ G₀ log e · 10^{−y},

which in turn yields, for every t ≥ 0,

10^{y(t)} − 10^{y(0)} = ∫₀^t (10^{y(τ)} ẏ(τ)/log e) dτ ≤ G₀ t,

and consequently lim sup_{t→+∞} y(t)/log t ≤ 1. Thus by Proposition 4.8(ii), y is not u.d. mod 1, i.e., x is not Benford. □

Example 6.63. For the smooth function F(x) = −πx + sin(πx), clearly F(0) = F′(0) = 0, and xF(x) < 0 for all x ≠ 0. By Theorem 6.62, no solution of (6.21) is Benford. z

In Theorem 6.62, the requirement that F be C² can be weakened somewhat [15, Thm. 6.7]. Very similarly to its discrete-time counterpart (Corollary 6.9; cf. also Example 6.11), however, the conclusion may fail if F is merely C¹.

Example 6.64. For the C¹-function F with F(0) = 0 and

F(x) = −x/√((log|x|)² + 1), x ≠ 0,

xF(x) < 0 for all x ≠ 0, and F′(0) = 0. Moreover, fix any x₀ ≠ 0 and let y = −log|x|. Then

ẏ = −log e · ẋ/x = log e/√(y² + 1),

from which it is straightforward to deduce that the function η(t) = y(t+1) − y(t) is decreasing with lim_{t→+∞} η(t) = 0 and lim_{t→+∞} tη(t) = +∞. Hence y is u.d. mod 1 by [93, Thm. I.9.4], and x is Benford. Thus the conclusion of Theorem 6.62 may fail if F is only C¹. z
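A numerical illustration (not part of the text) of the last example: integrating ẏ = log e/√(y² + 1) with a simple Euler scheme (step size and time horizon chosen arbitrarily here) exhibits the claimed behavior of η(t) = y(t+1) − y(t).

```python
import math

# Numerical illustration (not from the text) of Example 6.64: Euler integration
# of dy/dt = log10(e)/sqrt(y^2+1). The quantity eta(t) = y(t+1) - y(t) should
# decrease to 0 while t*eta(t) grows, the hypotheses of [93, Thm. I.9.4].

log_e = math.log10(math.e)
dt = 0.01
y = 0.0
samples = {}                       # y at integer times t = k/100
for k in range(300101):
    if k % 100 == 0:
        samples[k // 100] = y
    y += dt * log_e / math.sqrt(y * y + 1.0)

ts = [100, 1000, 3000]
eta = [samples[t + 1] - samples[t] for t in ts]
t_eta = [t * e for t, e in zip(ts, eta)]
print([round(e, 4) for e in eta], [round(v, 1) for v in t_eta])
```

Since y(t) grows roughly like √(2t log e), one sees η(t) ≈ log e/y(t) → 0 while tη(t) → +∞, consistent with the printed values.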


Finally, it should be mentioned that at present little seems to be known about the Benford property for solutions of partial differential equations or more general functional equations such as delay or integro-differential equations. Quite likely, it will be hard to decide in any generality whether many, or even most, solutions of such systems exhibit the Benford property in one form or another. Only one simple but fundamental example of a partial differential equation is discussed briefly here, namely, the so-called one-dimensional heat (or diffusion) equation

∂u/∂t = ∂²u/∂x²,  (6.24)

a linear second-order equation for u = u(t, x). Physically, (6.24) describes, for example, the diffusion over time of heat in a homogeneous one-dimensional medium. Without further conditions, (6.24) has many solutions, of which u(t, x) = cx² + 2ct, with any constant c ≠ 0, is Benford neither in t ("time") nor in x ("space"), by Example 4.9(ii), whereas by Example 4.9(i) and Theorem 4.10, the solution u(t, x) = e^{−c²t} sin(cx) is Benford (or identically zero) in t but not in x, and u(t, x) = e^{c²t + cx} is Benford in both t and x. Usually, to specify a unique solution an equation like (6.24) has to be supplemented with initial and/or boundary conditions.

Example 6.65. (i) A prototypical example of an Initial-Boundary Value Problem (IBVP) consists of (6.24) together with

u(0, x) = u₀(x) for all 0 < x < 1,
u(t, 0) = u(t, 1) = 0 for all t > 0.  (6.25)

Physically, the conditions (6.25) may be interpreted as both ends of the medium, at x = 0 and x = 1, being kept at a reference temperature u = 0 while the initial distribution of heat is given by the function u₀ : [0, 1] → R. It turns out that, under very mild assumptions on u₀, the IBVP consisting of (6.24) and (6.25) has a unique solution which, for any t > 0, can be written as a Fourier series,

u(t, x) = Σ_{n=1}^∞ u_n e^{−π²n²t} sin(πnx),

where u_n = 2∫₀¹ u₀(s) sin(πns) ds. From this it is easy to see that, for every fixed 0 ≤ x ≤ 1, the function u = u(t, x) either vanishes identically or else is Benford (in time), by Theorem 4.12.
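The Fourier-series solution can be sketched computationally; the initial condition u₀(x) = x(1 − x) below is an illustrative assumption, not taken from the text.

```python
import math

# Sketch of the Fourier-series solution above, with the illustrative (assumed,
# not from the text) initial condition u0(x) = x(1-x) on [0,1].

def fourier_coeff(n, m=2000):
    # u_n = 2 * int_0^1 u0(s) sin(pi*n*s) ds, via the midpoint rule
    return 2.0 * sum((k + 0.5)/m * (1 - (k + 0.5)/m)
                     * math.sin(math.pi * n * (k + 0.5)/m)
                     for k in range(m)) / m

U = [fourier_coeff(n) for n in range(1, 21)]

def u(t, x):
    # truncated series u(t,x) = sum_n u_n exp(-pi^2 n^2 t) sin(pi n x)
    return sum(U[n-1] * math.exp(-math.pi**2 * n**2 * t) * math.sin(math.pi*n*x)
               for n in range(1, 21))

# Boundary values vanish, and for fixed x the n = 1 mode dominates, so
# log10|u(t,x)| is asymptotically linear in t with slope -pi^2 * log10(e);
# this exponential decay is what makes u(., x) Benford in time.
slope = math.log10(abs(u(2.0, 0.3))) - math.log10(abs(u(1.0, 0.3)))
print(round(slope, 3), round(-math.pi**2 * math.log10(math.e), 3))
```

The two printed numbers agree, reflecting that higher modes decay far faster than the fundamental one.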


(ii) Another possible set of initial and boundary data is

u(0, x) = u₀(x) for all x > 0,
u(t, 0) = 0 for all t > 0,  (6.26)

corresponding to a semi-infinite one-dimensional medium kept at zero temperature at its left end x = 0, with an initial heat distribution given by the (integrable) function u₀ : [0, +∞) → R. Again, (6.24) together with (6.26) has a unique solution, given by

u(t, x) = (1/(2√(πt))) ∫₀^{+∞} u₀(y) (e^{−(x−y)²/(4t)} − e^{−(x+y)²/(4t)}) dy for all t > 0.

Assuming ∫₀^{+∞} y|u₀(y)| dy < +∞, it is not hard to see that, for every x ≥ 0,

lim_{t→+∞} t^{3/2} u(t, x) = (x/(2√π)) ∫₀^{+∞} y u₀(y) dy,

and hence, for any fixed x ≥ 0, the function u is not Benford in time, except possibly in the case of ∫₀^{+∞} y u₀(y) dy = 0. On the other hand, if, for example, u₀(x) = xe^{−x}, then a short calculation confirms that, for every t > 0,

lim_{x→+∞} (e^x/x) u(t, x) = e^t,

showing that u is Benford in space. Similarly, if u₀(x) = 1_{[0,1)}(x) then

lim_{x→+∞} x e^{(x−1)²/(4t)} u(t, x) = √(t/π)

for every t > 0, and again u is Benford in space. z

Chapter Seven

Multi-dimensional Linear Processes

For many applications, models based solely on the one-dimensional processes studied in the previous chapter are often too simple, and have to be replaced with or complemented by more sophisticated multi-dimensional models. The purpose of this chapter is to study Benford's law in the simplest deterministic multi-dimensional processes, namely, linear processes in discrete and continuous time. Despite their simplicity, these systems provide important models for many areas of science. Through far-reaching generalizations of results from earlier chapters, they will be shown to very often conform to Benford's law in that their dynamics is an abundant source of Benford sequences and functions. As in the previous chapter, the properties of continuous-time systems (i.e., differential equations) are analogous to those of discrete-time systems, and the chapter focuses on the latter in all but its last section. Again, recall throughout that by Theorem 4.2 a sequence or function is Benford if and only if its (decimal) logarithm is uniformly distributed modulo one.

7.1 LINEAR PROCESSES, OBSERVABLES, AND DIFFERENCE EQUATIONS

Recall the perhaps simplest example of a Benford sequence, first encountered in Chapter 4, namely, the sequence

(a^n) = (a, a², a³, . . .),

(7.1)

where a is any real number with log|a| irrational. It is natural to ask what happens if the number a in (7.1) is replaced by a real d × d-matrix A with d ≥ 2. For instance, is it possible for the entries in, say, the upper-left corner (or at any other fixed position) in the sequence of matrices (A^n) = (A, A², A³, . . .) to be Benford? (Here and throughout, the powers of A are denoted by A, A², A³, etc., and A⁰ is understood to equal I_d, the d × d identity matrix.) For a concrete example, consider the 2 × 2-matrix

A = [ 1  1 ; 1  0 ],  (7.2)


for which it is easy to verify that

A^n = [ F_{n+1}  F_n ; F_n  F_{n−1} ], n ≥ 2,  (7.3)

where (F_n) is again the sequence of Fibonacci numbers. With [A]_{jk} denoting the entry of A at position (j, k), i.e., in the jth row and kth column, all four sequences ([A^n]_{jk}) with j, k ∈ {1, 2} are Benford. In fact, by means of the explicit formula for F_n from Example 4.18, it is not hard to see that every positive linear combination of the entries of (A^n) is also Benford. More formally, given any positive numbers h₁₁, h₁₂, h₂₁, h₂₂, the sequence (x_n) with

x_n = h₁₁[A^n]₁₁ + h₁₂[A^n]₁₂ + h₂₁[A^n]₂₁ + h₂₂[A^n]₂₂, n ∈ N,  (7.4)

is Benford. Still more generally, except for the trivial case x_n ≡ 0, which results, for example, from the choice h₁₁ = −2, h₁₂ = h₂₁ = 1, h₂₂ = 2, the sequence (x_n) turns out to be Benford even if the coefficients h_{jk} in (7.4) are arbitrary real numbers.

Next consider the 2 × 2-matrix

B = [ −4  −3 ; 6  5 ].  (7.5)

Note that, unlike the matrix A above, B has negative as well as positive entries. Again, it is easy to check that

B^n = [ −2^n + 2(−1)^n   −2^n + (−1)^n ; 2^{n+1} − 2(−1)^n   2^{n+1} − (−1)^n ], n ∈ N,

and as before, all sequences ([B^n]_{jk}) with j, k ∈ {1, 2} are Benford; see Theorem 4.16. However, the example

2[B^n]₁₁ + [B^n]₂₂ = 3(−1)^n, n ∈ N,

shows that not every analogue of (7.4) is Benford or trivial (identically zero).

Linear observables

The above observations can be formalized in a simple and effective way. To see how to do this, recall that the sequence (x_n) = (a^n) is uniquely defined via the recursion x_n = ax_{n−1} for n ≥ 2, together with x₁ = a. Similarly, (A^n) is the unique solution of the (matrix) recursion X_n = AX_{n−1} with X₁ = A. Thus one may think of (A^n) as a dynamical process in the (d²-dimensional) phase space R^{d×d}, perhaps providing a simple model for some real-world process. From a physicist's or engineer's point of view it may not be desirable or even possible to observe or record the entire sequence of matrices (A^n), especially if d is very


large. Rather, what matters is the behavior of certain sequences of numbers distilled from (A^n). To formalize this, call any function h : R^{d×d} → R an observable (on R^{d×d}). This notion is extremely flexible. For example,

h(A) = Σ_{j,k=1}^d [A]_{jk}² and h(A) = max{|λ| : λ is an eigenvalue of A}  (7.6)

are two basic examples of (continuous) observables. With this terminology (which is motivated by similar usage in quantum mechanics and ergodic theory [34, 69]), what really matters from an applied scientist's point of view is the behavior of h(A^n) for specific observables h that are relevant to the system or process being described by (A^n). In the context of linear processes, a special role is naturally played by linear observables, i.e., by observables h on R^{d×d} satisfying h(A + B) = h(A) + h(B) and h(aA) = ah(A) for all A, B ∈ R^{d×d} and all a ∈ R. Neither of the two observables in (7.6) is linear. On the other hand, the observable h(A) = [A]_{jk} is linear for all j, k ∈ {1, 2, . . . , d}. In fact, given any linear observable h on R^{d×d}, there exists a unique matrix [h_{jk}] ∈ R^{d×d} such that h(A) = Σ_{j,k=1}^d h_{jk}[A]_{jk} for all A ∈ R^{d×d}. Henceforth, for convenience, denote by L_d the set of all linear observables on R^{d×d}. With this, the d² linear observables [ · ]_{jk} form a basis of the linear space L_d. Thus (7.4) represents every possible sequence of the form (h(A^n)) with h ∈ L₂. With A from (7.2), the sequence (h(A^n)) is Benford for every nonnegative observable h ≠ 0 on R^{2×2} (see Section 7.2 for the formal definition of nonnegative observables), and in fact for every h ∈ L₂ unless h(I₂) = h(A) = 0, in which case h(A^n) ≡ 0. On the other hand, with B from (7.5) and the linear observable h = 2[ · ]₁₁ + [ · ]₂₂, the sequence (h(B^n)) is neither Benford nor identically zero. Two of the main theorems of this chapter allow the reader to easily draw similar conclusions for arbitrary nonnegative and for general real d × d-matrices A (Theorems 7.3 and 7.21, respectively). They provide necessary and sufficient conditions for (h(A^n)) to be, respectively, Benford for every nonnegative linear observable h on R^{d×d}, and Benford or trivial (that is, zero for all n ≥ d) for every h ∈ L_d.
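The contrast between the two matrices above admits a direct computational check; the following sketch (an illustration, not part of the text) confirms empirically that the Fibonacci entries of A^n have Benford-like leading digits, while the observable h = 2[ · ]₁₁ + [ · ]₂₂ applied to B^n merely alternates.

```python
import math

# Sketch: leading digits of F_n = [A^n]_12 for A from (7.2) versus Benford's
# law, and the non-Benford observable 2[B^n]_11 + [B^n]_22 for B from (7.5).

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Fibonacci numbers via exact integer recursion: fib[k] = F_{k+1}
fib = [1, 1]
for _ in range(998):
    fib.append(fib[-1] + fib[-2])
counts = [0] * 10
for f in fib:
    counts[int(str(f)[0])] += 1
err = max(abs(counts[d] / 1000 - math.log10(1 + 1/d)) for d in range(1, 10))
assert err < 0.01      # first digits of F_1..F_1000 lie close to Benford

# For B from (7.5): 2[B^n]_11 + [B^n]_22 = 3(-1)^n, bounded, hence not Benford
B = [[-4, -3], [6, 5]]
P = B
for n in range(1, 41):
    assert 2 * P[0][0] + P[1][1] == 3 * (-1) ** n
    P = matmul(P, B)
print("checks passed")
```

Exact integer arithmetic makes both checks free of rounding issues.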
These conditions generalize, and in the case d = 1 reduce to, the fact that for (7.1) to be Benford (or identically zero) it is necessary and sufficient that log|a| be irrational (or a = 0).

Linear difference equations

In applied sciences as well as in mathematics, linear processes often present themselves in the form of (autonomous) linear difference or differential equations. These important models are directly amenable to the results for sequences (h(A^n)) outlined above, or to their continuous-time analogues; the reader is referred to Sections 7.5 and 7.6 for all pertinent details. A simple but also quite prominent example is the second-order linear difference equation (or two-step recursion)

x_n = x_{n−1} + x_{n−2}, n ≥ 3.  (7.7)


This difference equation can be rewritten in matrix-vector form as

[ x_n ; x_{n−1} ] = [ 1  1 ; 1  0 ] [ x_{n−1} ; x_{n−2} ], n ≥ 3.

With the (invertible) matrix A from (7.2) this leads to

[ x_{n+1} ; x_n ] = A [ x_n ; x_{n−1} ] = . . . = A^{n−1} [ x₂ ; x₁ ] = A^n [ x₁ ; x₂ − x₁ ], n ≥ 1,

which in turn shows that every sequence (x_n) satisfying (7.7) is of the form (h(A^n)), with the observable h ∈ L₂ determined by the initial values x₁, x₂, namely, h = x₁[ · ]₂₁ + (x₂ − x₁)[ · ]₂₂. Thus any result concerning the Benford property of sequences (h(A^n)) in general leads directly to a corresponding result for difference equations; see, e.g., Theorems 7.39 and 7.41. In particular, it turns out that every solution (x_n) of (7.7) is Benford, except, of course, for the trivial case x₁ = x₂ = 0; see Example 7.42 below. For another simple example, recall from Example 6.2(i) that every solution (x_n) of

x_n = −2x_{n−1}, n ≥ 2,  (7.8)

is Benford unless x₁ = 0. It is natural to ask whether this clear-cut situation persists if (7.8) is modified to, for instance,

x_n = −2x_{n−1} − 3x_{n−2}, n ≥ 3.  (7.9)

Theorem 7.41 below, one of the chapter's main results on difference equations, reduces this question to a familiar open problem in number theory; see Example 7.44(iii). Again, this result relies heavily on a careful analysis of sequences (h(A^n)) for any A ∈ R^{d×d} and any h ∈ L_d. Thus the latter analysis, to be carried out in Sections 7.2 and 7.3, constitutes the core of this chapter.
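The identification of a solution of (7.7) with a linear observable applied to (A^n) can be checked directly; the initial values in the sketch below are arbitrary illustrative choices, not from the text.

```python
# Sketch: a solution of the two-step recursion (7.7) coincides with the linear
# observable h = x1*[.]_21 + (x2-x1)*[.]_22 applied to A^n, as derived above.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 1], [1, 0]]
x1, x2 = 2, 7                      # arbitrary (illustrative) initial values
xs = [x1, x2]
for _ in range(30):
    xs.append(xs[-1] + xs[-2])     # x_n = x_{n-1} + x_{n-2}

P = A
for n in range(1, 31):
    h = x1 * P[1][0] + (x2 - x1) * P[1][1]   # h(A^n)
    assert h == xs[n - 1]                    # equals x_n
    P = matmul(P, A)
print("h(A^n) = x_n for n = 1..30")
```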

In the study of linear observables and powers of matrices in the subsequent sections, standard linear algebra notions regarding complex numbers, matrices, and vectors are employed. Specifically, for every z ∈ C, the numbers z̄, ℜz, ℑz, and |z| are the complex conjugate, real part, imaginary part, and absolute value (modulus) of z, respectively. Let S be the unit circle (circle of radius 1) in C, i.e., S = {z ∈ C : |z| = 1}. The argument arg z of z ≠ 0 is the unique number in (−π, π] for which z = |z|e^{i arg z}. For any set Z ⊂ C and number w ∈ C, define w + Z = {w + z : z ∈ Z} and wZ = {wz : z ∈ Z}. Thus, for instance, w + S = {z ∈ C : |z − w| = 1} and wS = {z ∈ C : |z| = |w|} for every w ∈ C. Given any set Z ⊂ C, denote by span_Q Z the smallest subspace of C (over Q) containing Z; equivalently, if Z ≠ ∅ then span_Q Z is the set of all rational linear combinations of elements of Z, i.e.,

span_Q Z = {ρ₁z₁ + ρ₂z₂ + . . . + ρ_n z_n : n ∈ N, ρ₁, ρ₂, . . . , ρ_n ∈ Q, z₁, z₂, . . . , z_n ∈ Z};

note that span_Q ∅ = {0}. With this terminology, recall that the numbers z₁, z₂, . . . , z_n ∈ C are Q-independent (or rationally independent) if the space


span_Q{z₁, z₂, . . . , z_n} is n-dimensional, or, equivalently, if Σ_{k=1}^n p_k z_k = 0 with integers p₁, p₂, . . . , p_n implies that p₁ = p₂ = . . . = p_n = 0. Throughout, d is a fixed but usually unspecified positive integer. For every u ∈ R^d, the number |u| ≥ 0 is the Euclidean norm of u, i.e., |u| = √(Σ_{j=1}^d u_j²). A vector u ∈ R^d is a unit vector if |u| = 1. For every A ∈ R^{d×d}, the transpose of A is A⊤, thus [A⊤]_{jk} = [A]_{kj} for all j, k ∈ {1, 2, . . . , d}. The spectrum of A, i.e., the set of its eigenvalues, is denoted by σ(A). Hence σ(A) ⊂ C is non-empty, contains at most d numbers, and is symmetric with respect to the real axis, i.e., all non-real elements of σ(A) occur in complex-conjugate pairs. The number ρ(A) = max{|λ| : λ ∈ σ(A)} ≥ 0 is referred to as the spectral radius of A. Note that ρ(A) > 0 unless A is nilpotent, i.e., unless A^N = 0 (zero matrix) for some N ∈ N; in the latter case A^d = 0 as well. For every A ∈ R^{d×d}, the number |A| is the (spectral) norm of A induced by | · |, i.e., |A| = max{|Au| : u ∈ R^d, |u| = 1}. It is well known that |A| = √(ρ(A⊤A)) ≥ ρ(A) = lim_{n→∞} |A^n|^{1/n}; see [79, Sec. 5.6].

7.2 NONNEGATIVE MATRICES

A real d × d-matrix A is nonnegative (or positive), in symbols A ≥ 0 (or A > 0), if [A]jk ≥ 0 (or [A]jk > 0) for all j, k ∈ {1, 2, . . . , d}. Nonnegative matrices play an important role in many areas of mathematics, including game theory, combinatorics, optimization, operations research, and economics. The Benford properties of sequences (An ) with A ∈ Rd×d are much simpler for A ≥ 0 than for the general case, that is, for A having both positive and negative entries. As the reader will learn through the present section, with regard to Benford’s law, powers of nonnegative matrices behave much like one-dimensional sequences, and hence provide a natural bridge between the latter and the general multidimensional linear processes to be studied in Section 7.3. The main results of this section, Theorems 7.3 and 7.11 below, make use of well-known facts regarding nonnegative matrices. These classical facts, due to O. Perron and F. G. Frobenius, are stated here for the reader’s convenience in a form targeted at their subsequent use. (For proofs as well as comprehensive overall accounts of the theory and applications of nonnegative matrices, the reader may wish to consult [6], [25], or [79, Ch. 8].) Recall that a nonnegative matrix A is irreducible if for every j, k ∈ {1, 2, . . . , d} there exists N ∈ N with [AN ]jk > 0, and A is primitive if AN > 0 for some N ∈ N. Clearly, A ≥ 0 is primitive whenever positive, and irreducible whenever primitive; as simple examples show, neither of the converses holds in general if d ≥ 2. Proposition 7.1. Let A ∈ Rd×d be nonnegative. Then:

(i) The spectral radius ρ(A) ≥ 0 is an eigenvalue of A; (ii) If A is irreducible then ρ(A) > 0, and the eigenvalue ρ(A) is (algebraically) simple, i.e., a simple root of the characteristic polynomial of A.


Proposition 7.2. Let A ∈ R^{d×d} be nonnegative. Then the following are equivalent:

(i) A is primitive;

(ii) A^{d²−2d+2} > 0;

(iii) An is irreducible for every n ∈ N;

(iv) A is irreducible, and |λ| < ρ(A) for every eigenvalue λ 6= ρ(A) of A;

(v) A is irreducible, and the limit limn→∞ An /ρ(A)n exists and is positive.
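Criterion (ii) of Proposition 7.2 turns primitivity into a finite, effective test. A minimal sketch (an illustration, not part of the text):

```python
# Sketch of Proposition 7.2(ii) as an effective test: a nonnegative A is
# primitive iff A^(d^2 - 2d + 2) > 0 entrywise.

def matmul(X, Y):
    d = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]

def matpow(X, n):
    # binary exponentiation with exact integer arithmetic
    d = len(X)
    R = [[int(i == j) for j in range(d)] for i in range(d)]
    while n:
        if n & 1:
            R = matmul(R, X)
        X = matmul(X, X)
        n >>= 1
    return R

def is_primitive(A):
    d = len(A)
    P = matpow(A, d * d - 2 * d + 2)
    return all(P[i][j] > 0 for i in range(d) for j in range(d))

A = [[1, 1], [1, 0]]        # primitive: A^2 > 0
B = [[0, 2], [2, 0]]        # irreducible but not primitive
C = [[2, 0], [0, 1]]        # reducible
print(is_primitive(A), is_primitive(B), is_primitive(C))
```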

Recall from Theorem 4.16 that (a^n) with a ≥ 0 is Benford if and only if log a is irrational. The following theorem, the first main result of this section, generalizes this simple fact to arbitrary finite dimension. For a concise statement, call h ∈ L_d nonnegative if h(A) ≥ 0 for every nonnegative A ∈ R^{d×d}. Equivalently, h is nonnegative precisely if [h_{jk}] ≥ 0, where h = Σ_{j,k=1}^d h_{jk}[ · ]_{jk}. For example, every observable [ · ]_{jk} is nonnegative.

Theorem 7.3. Let A ∈ R^{d×d} be nonnegative. Then the following are equivalent:

(i) ([A^n]_{jk}) is Benford for all j, k ∈ {1, 2, . . . , d};

(ii) (h(A^n)) is Benford for every nonnegative linear observable h ≠ 0 on R^{d×d};

(iii) A is primitive, and log ρ(A) is irrational;

(iv) A^{d²−2d+2} > 0, and log ρ(A) is irrational.

Proof. Assume first that (i) holds. If, for some N ∈ N, the nonnegative matrix A^N were reducible (i.e., not irreducible), then there would exist numbers j, k ∈ {1, 2, . . . , d} with [A^{Nn}]_{jk} = 0 for all n ∈ N, so ([A^n]_{jk}) would not be Benford. As this contradicts (i), A^n is irreducible for every n ∈ N, and hence A is primitive by Proposition 7.2(iii), and ρ(A) > 0. Proposition 7.2(v) shows that Q := lim_{n→∞} A^n/ρ(A)^n exists and is positive. It follows easily that Q² = Q and QA = AQ = ρ(A)Q, so

A^n = ρ(A)^n Q + B^n, n ∈ N,  (7.10)

where B = A − ρ(A)Q. Note that lim_{n→∞} B^n/ρ(A)^n = 0, and that consequently lim_{n→∞} h(B^n/ρ(A)^n) = 0 for every h ∈ L_d. (All linear observables on R^{d×d} are continuous.) Moreover, if h ≠ 0 is nonnegative then h(Q) > 0. For every nonnegative linear observable h ≠ 0 on R^{d×d}, therefore,

h(A^n)/ρ(A)^n = h(Q) + h(B^n/ρ(A)^n) → h(Q) > 0 as n → ∞.  (7.11)

By Theorem 4.16, (h(A^n)) is Benford only if log ρ(A) is irrational. Hence (iii) follows from (i).


Next assume that (iii) holds. As seen above, this implies (7.10) and (7.11) for every nonnegative linear observable h ≠ 0 on R^{d×d}. Consequently, (h(A^n)) is Benford, again by Theorem 4.16, and (ii) holds. Clearly (ii) implies (i); simply let h = [ · ]_{jk} for any j, k ∈ {1, 2, . . . , d}. The statements (i), (ii), (iii), therefore, all are equivalent. Since (iii)⇔(iv) also, by Proposition 7.2, the proof is complete. □

Example 7.4. (i) For the nonnegative matrix A from (7.2), A² = [ 2 1 ; 1 1 ] is positive, and ρ(A) = φ, where φ = (1 + √5)/2 as usual. Since log φ is irrational, (h(A^n)) is Benford for every nonnegative linear observable h ≠ 0 on R^{2×2}. In particular, choosing h = [ · ]₁₂ shows once again that ([A^n]₁₂) = (F_n) is Benford. On the other hand, the linear observable h = [ · ]₁₁ − [ · ]₁₂ is not nonnegative, and hence Theorem 7.3 cannot be used to decide whether (h(A^n)) = (0, 1, 1, 2, 3, . . .) is Benford; Theorem 7.21 below shows that indeed it is.

(ii) The matrix B = [ 2 0 ; 0 1 ] is nonnegative but not irreducible, let alone primitive. Even though log ρ(B) = log 2 is irrational, by Theorem 7.3 there exist j, k ∈ {1, 2} for which ([B^n]_{jk}) is not Benford. In fact, except for j = k = 1, each sequence ([B^n]_{jk}) is constant and hence not Benford. z

Example 7.5. (i) For the (symmetric) nonnegative matrix

A = [ 1 1 0 ; 1 0 1 ; 0 1 1 ] ∈ R^{3×3},

A² > 0 and ρ(A) = 2. Hence Theorem 7.3 shows that (h(A^n)) is Benford for every nonnegative linear observable h ≠ 0 on R^{3×3}.

(ii) Consider the positive 3 × 3-matrix

B = (1/10) [ 6 3 1 ; 3 4 3 ; 1 3 6 ].

Notice that the entries in each row of B sum to 1, so B is a so-called (row-)stochastic matrix. (Due to its being symmetric, B is also column-stochastic.) It is not hard to see that ρ(B) = 1 for every (row- or column-)stochastic matrix B. By Theorem 7.3, (h(B^n)) is not Benford for some nonnegative linear observable h ≠ 0 on R^{3×3}. In fact, the explicit formula

B^n = (1/3) [ 1 1 1 ; 1 1 1 ; 1 1 1 ] + 2^{−(n+1)} [ 1 0 −1 ; 0 0 0 ; −1 0 1 ] + (1/6)·10^{−n} [ 1 −2 1 ; −2 4 −2 ; 1 −2 1 ]

shows that lim_{n→∞} h(B^n) exists and is positive for every such observable, so clearly (h(B^n)) is never Benford. z
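The spectral decomposition in Example 7.5(ii) can be confirmed exactly with rational arithmetic; the sketch below (an illustration, not part of the text) checks it for the first twenty powers.

```python
from fractions import Fraction as F

# Sketch: exact check of the decomposition of the stochastic matrix B from
# Example 7.5(ii): B^n = (1/3) J + 2^-(n+1) M + (1/6) 10^-n K.

B = [[F(6, 10), F(3, 10), F(1, 10)],
     [F(3, 10), F(4, 10), F(3, 10)],
     [F(1, 10), F(3, 10), F(6, 10)]]
J = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
M = [[1, 0, -1], [0, 0, 0], [-1, 0, 1]]
K = [[1, -2, 1], [-2, 4, -2], [1, -2, 1]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

P = B
for n in range(1, 21):
    closed = [[F(1, 3) * J[i][j] + F(1, 2 ** (n + 1)) * M[i][j]
               + F(1, 6 * 10 ** n) * K[i][j]
               for j in range(3)] for i in range(3)]
    assert P == closed
    P = matmul(P, B)
print("decomposition verified for n = 1..20")
```

The three terms correspond to the eigenvalues 1, 1/2, and 1/10 of B; every nonnegative observable picks up the positive (1/3)J part in the limit.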


From (7.11) it is evident that if A is primitive and log ρ(A) is rational, then (h(A^n)) is not Benford for any nonnegative h ∈ L_d, as seen in Example 7.5(ii) above. It can be shown that primitivity of A is actually not needed for this conclusion: If A ≥ 0 is irreducible and log ρ(A) is rational, then (h(A^n)) is not Benford for any nonnegative h ∈ L_d. (Still, (h(A^n)) may be Benford for some h ∈ L_d, but no such h can be nonnegative; see Example 7.7 below.) On the other hand, if A is reducible, or if A is irreducible but not primitive, and log ρ(A) is irrational, then the behavior of (h(A^n)) may be less uniform.

Example 7.6. (i) For every a > 0, the nonnegative matrix A = [ a 0 ; 0 10 ] is reducible. Suppose log a is irrational. Depending on the choice of h ∈ L₂, the sequence (h(A^n)) may be Benford (e.g., for h = [ · ]₁₁), non-Benford but not trivial (e.g., for h = [ · ]₂₂), or trivial (e.g., for h = [ · ]₂₁). This non-uniformity of behavior exists regardless of whether log ρ(A) is irrational (if a > 10) or is rational (if a ≤ 10).

(ii) The nonnegative matrix B = [ 0 2 ; 2 0 ], with log ρ(B) = log 2 irrational, is irreducible. However, since B is not primitive, a sequence (h(B^n)) may or may not be Benford. For instance, ([B^n]₁₁ + [B^n]₁₂) = (2^n) is Benford whereas (2[B^n]₁₁) = (2^n + (−2)^n) is not. z

Let A be a primitive matrix. When log ρ(A) is irrational, the sequence (h(A^n)) may nevertheless fail to be Benford for some nontrivial h ∈ L_d. On the other hand, even when log ρ(A) is rational, (h(A^n)) may be Benford. The following example illustrates both situations. Note that, as a consequence of Theorem 7.3, h cannot be nonnegative in either case.

Example 7.7. Let h = [ · ]₁₁ − [ · ]₁₂ ∈ L₂. Clearly, h is not nonnegative. The matrix A = [ 5 15 ; 15 5 ] is positive, log ρ(A) = 1 + log 2 is irrational, and yet (h(A^n)) = ((−10)^n) is not Benford. On the other hand, B = [ 6 4 ; 4 6 ] is also positive, with log ρ(B) = log 10 = 1 rational. Nevertheless, (h(B^n)) = (2^n) is Benford. z

As demonstrated by the next example, the conclusion of Theorem 7.3 may fail if even a single entry of A is negative.

Example 7.8. For A = [ 1 1 ; −1 3 ], it is clear from Theorem 4.16 and

A^n = [ 2^n − n2^{n−1}   n2^{n−1} ; −n2^{n−1}   2^n + n2^{n−1} ], n ∈ N,

that ([A^n]_{jk}) is Benford for every j, k ∈ {1, 2}, whereas [A^n]₁₂ + [A^n]₂₁ ≡ 0, and A^n is not nonnegative for any n ∈ N. Thus the implications (i)⇒(ii), (i)⇒(iii) and (i)⇒(iv) of Theorem 7.3 all fail in this case. z
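The two claims of Example 7.7 rest on the eigenvector (1, −1), shared by A and B with eigenvalues −10 and 2, respectively; a minimal sketch (an illustration, not part of the text) verifies them exactly.

```python
# Sketch: exact check of Example 7.7, where h = [.]_11 - [.]_12 gives
# h(A^n) = (-10)^n (not Benford) and h(B^n) = 2^n (Benford).

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[5, 15], [15, 5]]
B = [[6, 4], [4, 6]]
PA, PB = A, B
for n in range(1, 31):
    assert PA[0][0] - PA[0][1] == (-10) ** n   # h(A^n)
    assert PB[0][0] - PB[0][1] == 2 ** n       # h(B^n)
    PA, PB = matmul(PA, A), matmul(PB, B)
print("h(A^n) = (-10)^n and h(B^n) = 2^n for n = 1..30")
```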


In addition to linear observables, the Benford property of (h(A^n)) may also be of interest for certain nonlinear observables h on R^{d×d}, notably for h(A) = |A| and h(A) = |Au|, where u ∈ R^d \ {0} is a fixed vector. To conveniently formulate a result regarding these observables, call u ∈ R^d semi-definite if u_j u_k ≥ 0 for all j, k ∈ {1, 2, . . . , d}.

Theorem 7.9. Let A ∈ R^{d×d} be nonnegative. If A satisfies (i)–(iv) in Theorem 7.3 then:

(i) (|A^n|) is Benford;

(ii) (|A^n u|) is Benford for every semi-definite u ∈ R^d \ {0}.

Proof. Both assertions follow immediately from (7.10): Since |Q| > 0,

|A^n|/ρ(A)^n = |Q + B^n/ρ(A)^n| → |Q| > 0 as n → ∞,  (7.12)

and since log ρ(A) is irrational, (|A^n|) is Benford. Similarly, note that Qu ≠ 0 whenever u ≠ 0 is semi-definite. Hence

|A^n u|/ρ(A)^n = |Qu + B^n u/ρ(A)^n| → |Qu| > 0 as n → ∞,  (7.13)

and (|A^n u|) is Benford as well. □

Except for the trivial case d = 1, the converse of Theorem 7.9 is not true in general: Even if (|A^n|) and (|A^n u|) for every semi-definite u ∈ R^d \ {0} are Benford, A may not satisfy any of the conditions (i)–(iv) in Theorem 7.3, even if A is irreducible.

Example 7.10. For the irreducible matrix A = [ 0 2 ; 2 0 ], the sequences (|A^n|) = (2^n) and (|A^n u|) = (2^n|u|) are Benford, yet none of the conditions (i)–(iv) in Theorem 7.3 holds for A; see Example 7.6(ii). z

As evidenced by Example 7.10, properties (i) and (ii) in Theorem 7.9 are generally weaker than properties (i)–(iv) in Theorem 7.3, even if A is irreducible. If, however, A is assumed to be primitive, then all those properties turn out to be equivalent. Informally put, the following theorem shows that as far as Benford's law is concerned, for a primitive matrix A, the sequence (A^n) behaves just like the one-dimensional sequence (ρ(A)^n).

Theorem 7.11. Let A ∈ R^{d×d} be nonnegative. If A is primitive then the following are equivalent:

(i) log ρ(A) is irrational;

(ii) ([A^n]_{jk}) is Benford for all j, k ∈ {1, 2, . . . , d};

(iii) (h(A^n)) is Benford for every nonnegative linear observable h ≠ 0 on R^{d×d};


(iv) (|A^n|) is Benford;

(v) (|A^n u|) is Benford for every semi-definite u ∈ R^d \ {0}.

Proof. Using Theorem 4.16, equivalences (i)⇔(iii), (i)⇔(iv), and (i)⇔(v) follow immediately from (7.11), (7.12), and (7.13), respectively. Moreover, (7.11) with h = [ · ]jk for any j, k ∈ {1, 2, . . . , d} shows that (ii)⇒(i), and since clearly (iii)⇒(ii), the proof is complete.  Remark. Similarly to Theorem 7.3, if A is primitive and log ρ(A) is rational,   then h(An ) and (|An u|) are not Benford for any nonnegative linear observable h on Rd×d and any semi-def nite u ∈ Rd , respectively. Applying Theorems 7.3 and 7.11 is especially easy when A ∈ Rd×d is a nonnegative integer (or rational) matrix, i.e., [A]jk is a nonnegative integer (or rational number) for all j, k ∈ {1, 2, . . . , d}. Using Proposition 7.2(ii), it is straightforward to check whether A is primitive. If it is, deciding whether log ρ(A) is irrational does not require explicitly determining the number ρ(A), let alone the entire set σ(A). The next example illustrates this; for more details, the interested reader is referred to [16, Ex. 3.14]. Example 7.12. The nonnegative integer matrix   0 1 0 A= 0 0 1  6 1 0

is not positive, and neither are A^2, A^3, A^4. However,

    A^5 = [  6   1   6 ]
          [ 36  12   1 ]  >  0 ;
          [  6  37  12 ]

hence A is primitive. (In this example, the exponent d^2 − 2d + 2 of Proposition 7.2(ii) is smallest possible, but often smaller exponents suffice; see [79, Sec. 8.5].) Moreover, det A = 6 and therefore ρ(A) ≥ 6^{1/3} > 1. To decide whether log ρ(A) is rational, suppose ρ(A) = 10^{p/q} for some relatively prime p ∈ Z and q ∈ N. Then p ≥ 1 since ρ(A) > 1, and 10^p is an eigenvalue of A^q. Hence 10^p divides det A^q = 6^q. This, however, is impossible for p ≥ 1. It follows that log ρ(A) is irrational (in fact, transcendental), and by Theorem 7.11, (h(A^n)) is Benford for every nonnegative linear observable h ≠ 0 on R^{3×3}. Note that in order for this argument to work it is not at all necessary to know that ρ(A) = 2. z

Remarks. (i) Under the assumption that A ∈ R^{d×d} satisfies A^N > 0 for some N ∈ N, properties (i)–(v) in Theorem 7.11 remain equivalent even if A is not nonnegative; see [16, Thm. 3.2].

(ii) Theorem 7.11(iii) can be replaced by the stronger condition that (h(A^n)) is Benford for every nonnegative polynomial observable h on R^{d×d} that satisfies h(0) = 0 and is not constant.


MULTI-DIMENSIONAL LINEAR PROCESSES

(iii) Since (7.12) and (7.13) also follow from (7.10) if | · | is replaced by any other norm on R^{d×d} and R^d, respectively, Theorems 7.9(i, ii) and 7.11(iv, v) remain valid for arbitrary norms as well.
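The primitivity check and the digit frequencies predicted by Theorem 7.11 for the matrix of Example 7.12 are easy to confirm numerically. The following sketch (ours, not part of the original text; it uses exact integer arithmetic, and the helper names are arbitrary) verifies that A^5 > 0 and tallies the first significant digits of h(A^n) for the nonnegative observable h = [·]_12 + [·]_13:

```python
from math import log10

def matmul(X, Y):
    """Multiply two square matrices of Python ints (exact arithmetic)."""
    d = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]

A = [[0, 1, 0],
     [0, 0, 1],
     [6, 1, 0]]

# A^2, A^3, A^4 still contain zero entries, but A^5 > 0, so A is primitive.
P = A
for n in range(2, 6):
    P = matmul(P, A)
    print(f"A^{n} > 0: {all(e > 0 for row in P for e in row)}")

# First-digit frequencies of h(A^n) for h = [.]_12 + [.]_13, which by
# Theorem 7.11 should approach Benford's law as N grows.
digits, N, Q = [0] * 10, 1000, A
for _ in range(N):
    digits[int(str(Q[0][1] + Q[0][2])[0])] += 1
    Q = matmul(Q, A)
for d in range(1, 10):
    print(f"digit {d}: observed {digits[d]/N:.3f}, Benford {log10(1 + 1/d):.3f}")
```

Already at N = 1000 the observed frequency of leading digit 1 is within a few tenths of a percent of log 2 ≈ 0.301.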

7.3 GENERAL MATRICES

This section studies the Benford properties of sequences (h(A^n)), where A is an arbitrary real d × d-matrix and h is any linear observable on R^{d×d}. To compare this completely general situation to the more special one considered in the previous section (where A was assumed to be nonnegative), observe that while the assumption of nonnegativity yields elegant results with short proofs (via classical facts about nonnegative matrices), it also naturally limits their scope. For instance, Theorems 7.3 and 7.11 are inconclusive whenever h ∈ L_d fails to be nonnegative. To illustrate how a more general setting may be relevant even if one is primarily interested in nonnegative matrices A and observables h, recall from Proposition 7.2 that if A is primitive then the matrix

    Q = lim_{n→∞} A^n / ρ(A)^n                                          (7.14)

exists and is positive. Often one is interested in the (Benford) properties of the sequences (A^{n+1} − ρ(A)A^n) and (A^n − ρ(A)^n Q), both of which in some sense measure the speed of convergence in (7.14). With a view towards Theorems 7.3 and 7.11, note that, given any h ∈ L_d, the sequence (h(A^{n+1} − ρ(A)A^n)) is actually of the form (g(A^n)) with g ∈ L_d defined as

    g(C) = h(C(A − ρ(A)I_d)),   C ∈ R^{d×d}.                            (7.15)

Similarly, (h(A^n − ρ(A)^n Q)) equals (h(B^n)) with B = A − ρ(A)Q. In general, however, neither the linear observable g in (7.15) nor the matrix B will be nonnegative, even if h and A are.

Example 7.13. The nonnegative matrix

    A = 1/2 [ 1  1 ]
            [ 2  0 ]

has spectral radius ρ(A) = 1 and is primitive since A^2 > 0; it is easy to see that

    Q = lim_{n→∞} A^n = 1/3 [ 2  1 ]      and      B = A − Q = 1/6 [ −1   1 ]
                            [ 2  1 ]                               [  2  −2 ] .

In accordance with Theorem 7.11, the sequence (h(A^n)) is not Benford for any nonnegative linear observable h on R^{2×2}. (In fact, (7.10) with B^n = (−1/2)^{n−1} B shows that (h(A^n)), for any h ∈ L_2, is not Benford unless h(B) ≠ 0 and h(Q) = 0; the latter is impossible if h is nonnegative.) While A is nonnegative (in fact, row-stochastic) and Q is positive, clearly the matrix B is neither. Also, with the nonnegative h = [·]_11, for instance, the linear observable g in (7.15)


is g = −(1/2)[·]_11 + [·]_12 and hence fails to be nonnegative. Nevertheless, a short calculation yields

    A^n − Q = (−1/2)^{n−1} B   and   A^{n+1} − A^n = 3(−1/2)^n B,   n ∈ N.

From this it is clear that, given any h ∈ L_2, both sequences (h(A^n − Q)) and (h(A^{n+1} − A^n)) are either Benford or identically zero, depending on whether h(B) ≠ 0 or h(B) = 0. Theorem 7.32 below explains this phenomenon in general. z

Recall from Theorem 7.11 that, given any primitive nonnegative d × d-matrix A, the sequence (h(A^n)) is Benford for all nonnegative linear observables h ≠ 0 on R^{d×d} (or for none) precisely if log ρ(A) is irrational (or rational). Such a clear-cut all-or-nothing situation cannot be expected when arbitrary h ∈ L_d are considered, simply because (h(A^n)) may be trivial (identically zero) irrespective of any specific properties of A. The most that can be expected in general, and hence the natural analogue of the setting of Theorem 7.11, is that every nontrivial sequence (h(A^n)) is Benford. By illustrating this point, the next two examples also motivate the phrasing of the main results, Theorem 7.21 and Proposition 7.31 below.

Example 7.14. For the positive matrix A from (7.2) and any h ∈ L_2, it follows from (7.3) that

    h(A^n) = F_n h(A) + F_{n−1} h(I_2),   n ≥ 2,

which in turn shows, via Theorem 4.16 and Example 4.18, that (h(A^n)) is Benford unless h(A) = h(I_2) = 0; in the latter case clearly h(A^n) ≡ 0. Thus, for instance, h_0(A^n) ≡ 0 for h_0 = [·]_12 − [·]_21. Observe that, similarly, h_0(B^n) ≡ 0 for the matrix B in Example 7.7. Unlike in the case of A, however, for arbitrary h ∈ L_2 not every sequence (h(B^n)) is Benford since, for example, (2[B^n]_11) = (10^n + 2^n), which is not Benford by Theorem 4.16; on the other hand, ([B^n]_11 − [B^n]_12) = (2^n) is Benford. In summary, while the sequence (h(A^n)) is Benford or trivial for every h ∈ L_2, this is clearly not the case for (h(B^n)). z

Example 7.15. Consider the nonnegative 4 × 4-matrix

    A = [ 1  1  1  1 ]
        [ 1  1  1  0 ]
        [ 1  1  0  0 ]
        [ 2  2  4  2 ] ,

for which it is readily confirmed that A^2 > 0 and ρ(A) = 4. By Theorem 7.11, (h(A^n)) is Benford for every nonnegative linear observable h ≠ 0 on R^{4×4}. As in the previous example, this property largely persists for arbitrary h ∈ L_4: The


explicit formula

    A^n = 4^{n−3} A^3 = 4^{n−3} [ 22  22  22  11 ]
                                [ 10  10  10   5 ]   ,   n ≥ 3,      (7.16)
                                [  8   8   8   4 ]
                                [ 48  48  48  24 ]

shows that, given any h ∈ L_4, the sequence (h(A^n)) is Benford precisely if h(A^3) ≠ 0. Note that, unlike in the previous example, (h(A^n)) may fail to be Benford and yet may not vanish identically. For instance, this happens for h = 5[·]_31 − 4[·]_21, where (h(A^n)) = (1, −2, 0, 0, 0, . . .). However, (7.16) guarantees that in this case h(A^n) = 0 for all n ≥ 3. Next consider the nonnegative 4 × 4-matrix

    B = [  1   1   1   1 ]
        [  1   1   1   0 ]
        [  1   1   0   0 ]
        [ 14  14  16   8 ] .

Again, B^2 > 0, but now ρ(B) = 10. In fact, as before, B^n = 10^{n−3} B^3 for every n ≥ 3, and hence (h(B^n)) is not Benford for any h ∈ L_4. z
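Formula (7.16) and the behavior of the observable h = 5[·]_31 − 4[·]_21 can be double-checked with a few lines of exact integer arithmetic (a sketch of ours; the helper functions are not from the text):

```python
def matmul(X, Y):
    """Multiply two square matrices of Python ints (exact arithmetic)."""
    d = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]

def matpow(M, n):
    P = M
    for _ in range(n - 1):
        P = matmul(P, M)
    return P

A = [[1, 1, 1, 1],
     [1, 1, 1, 0],
     [1, 1, 0, 0],
     [2, 2, 4, 2]]

A3 = matpow(A, 3)
print(A3[0])                      # first row of A^3: [22, 22, 22, 11]

# verify A^n = 4^(n-3) * A^3 for n = 3..9, confirming (7.16)
for n in range(3, 10):
    An, scale = matpow(A, n), 4 ** (n - 3)
    assert all(An[i][j] == scale * A3[i][j] for i in range(4) for j in range(4))

# the observable h = 5[.]_31 - 4[.]_21 yields (1, -2, 0, 0, ...)
h = [5 * matpow(A, n)[2][0] - 4 * matpow(A, n)[1][0] for n in range(1, 7)]
print(h)                          # [1, -2, 0, 0, 0, 0]
```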

The main goal of this section is to characterize, for arbitrary A ∈ R^{d×d}, the situation encountered for the matrices A in the two examples above: For every h ∈ L_d, the sequence (h(A^n)) is either Benford or else vanishes identically from some n onward. For convenience, from now on any sequence (h(A^n)) with h(A^n) = 0 for all n ≥ d is referred to as terminating. To provide the reader with some intuition as to which features of A may affect the Benford property of (h(A^n)), first a few simple examples are discussed.

Example 7.16. (i) Consider the matrix

    A = [ 1  −1 ]
        [ 1   0 ] .

From the explicit formula

    A^n = cos(πn/3) I_2 + (sin(πn/3)/√3)(2A − I_2),   n ∈ N,

it is clear that, given any h ∈ L_2, the sequence (h(A^n)) is 6-periodic, i.e., h(A^{n+6}) = h(A^n) for all n ∈ N. For no choice of h ∈ L_2, therefore, is (h(A^n)) Benford. The oscillatory behavior of (h(A^n)) corresponds to the fact that the eigenvalues of A are λ = e^{±πı/3} and hence lie on the unit circle S.

(ii) For the matrix B in Example 7.6(ii),

    B^n = 2^{n−2}(B + 2I_2) − (−2)^{n−2}(B − 2I_2),   n ∈ N,

and so for any h ∈ L_2 the sequence (h(B^n)) is unbounded, provided that

    h(I_2) ≠ 0   or   h(B) ≠ 0.                                         (7.17)

Even if h satisfies (7.17), however, (h(B^n)) may not be Benford, as the examples h = [·]_11 and h = [·]_12 show, for which h(B^n) = 0 for all odd and all even n ∈ N, respectively. This failure of (h(B^n)), for every h ∈ L_2, to be either Benford or trivial is caused by B's having two eigenvalues with the same modulus but opposite signs, namely, λ = ±2.

(iii) Let γ = cos(π log 2) ≈ 0.5851 and consider the matrix

    C = [ 4γ  −4 ]
        [  1   0 ] .

As in (i) and (ii), an explicit formula for C^n is easily derived, namely,

    C^n = 2^n cos(πn log 2) I_2 + 2^n (sin(πn log 2) / (2√(1 − γ^2))) (C − 2γI_2),   n ∈ N.

Although somewhat oscillatory, the sequence (h(C^n)) is unbounded for most h ∈ L_2. As will be shown now, however, it is not Benford. While the argument is essentially the same for every h ∈ L_2, for convenience assume specifically that h(I_2) = 0 and h(C) = 2√(1 − γ^2), which in turn yields

    log |h(C^n)| = log(2^n |sin(πn log 2)|) = n log 2 + log |sin(πn log 2)|,   n ∈ N.

Therefore, with the (Borel measurable) map f : [0, 1) → [0, 1) defined as

    f(s) = ⟨s + log |sin(πs)|⟩,   0 ≤ s < 1,

⟨log |h(C^n)|⟩ = f(⟨n log 2⟩) for all n ∈ N. Recall from Example 4.7(i) that (n log 2) is u.d. mod 1, and hence the sequence (⟨log |h(C^n)|⟩) is distributed according to the probability measure λ_{0,1} ∘ f^{−1}. Consequently, (h(C^n)) is Benford if and only if λ_{0,1} ∘ f^{−1} is equal to λ_{0,1}. The latter, however, is not the case. While this is clear intuitively, an easy way to see it formally is to observe that f is piecewise smooth and has a unique local maximum s_0 ∈ (0, 1). (Specifically, s_0 = 1 − (1/π) arctan(π log e) ≈ 0.7013.) Thus if λ_{0,1} ∘ f^{−1} = λ_{0,1} then for all sufficiently small ε > 0,

    (f(s_0) − f(s_0 − ε))/ε = λ_{0,1}([f(s_0 − ε), f(s_0)))/ε
                            = λ_{0,1} ∘ f^{−1}([f(s_0 − ε), f(s_0)))/ε
                            ≥ λ_{0,1}([s_0 − ε, s_0))/ε = 1,

which is impossible because f′(s_0) = 0; see Figure 7.1. Hence (h(C^n)) is not Benford. The reason for this can be seen in the fact that while the number log |λ| = log 2 is irrational for the eigenvalues λ = 2e^{±πı log 2} of C, there clearly is a rational dependence between the two real numbers log |λ| and (1/(2π)) arg λ, namely, log |λ| ∓ 2 · (1/(2π)) arg λ = 0. Notice also that if γ were chosen as, say, γ = cos(π log 3) ≈ 0.07181 or γ = cos(π log 201) ≈ 0.5796, then no such rational dependence would exist, and Theorem 7.21 below shows that in both these cases (h(C^n)) would indeed be Benford for all h ∈ L_2 unless h(C^n) ≡ 0; see Figure 7.1. z
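The contrast illustrated in Figure 7.1 is easy to reproduce numerically. The sketch below (ours; the sample size and tolerance are arbitrary choices) works with log10 |x_n| directly, so 2^n never overflows:

```python
from math import sin, pi, log10

def max_benford_deviation(delta, N=100_000):
    """Max deviation from Benford of the leading digits of 2^n sin(pi n delta)."""
    counts = [0] * 10
    for n in range(1, N + 1):
        s = abs(sin(pi * n * delta))
        if s == 0.0:
            continue                     # |x_n| = 0 carries no leading digit
        mantissa = (n * log10(2) + log10(s)) % 1.0
        counts[int(10 ** mantissa)] += 1
    total = sum(counts)
    return max(abs(counts[d] / total - log10(1 + 1 / d)) for d in range(1, 10))

devs = {name: max_benford_deviation(delta)
        for name, delta in (("log 2", log10(2)), ("log 3", log10(3)))}
for name, dev in devs.items():
    print(f"delta = {name}: max deviation from Benford = {dev:.4f}")
```

For δ = log 3 the deviation shrinks toward 0, whereas for δ = log 2 it stabilizes at a positive value, in line with the λ_{0,1} ∘ f^{−1} ≠ λ_{0,1} argument above.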


Figure 7.1: The sequence (2^n sin(πnδ)) is not Benford for δ = log 2 (top left), but is Benford for δ = log 3 and δ = log 201 (bottom). The former corresponds to the fact that the map f(s) = ⟨s + log |sin(πs)|⟩ (top right) does not preserve λ_{0,1}; see Example 7.16(iii).

The above examples indicate that, from the perspective of Benford's law, the main difficulty when dealing with multi-dimensional systems is their potential for cyclic behavior, either of the orbits themselves or of their significands. (In the case of primitive nonnegative matrices, as seen in the previous section, cyclicality does not occur or, more correctly, remains hidden.) To clarify this difficulty precisely, the following terminology is useful. Recall that, given any set Z ⊂ C, span_Q Z denotes the smallest linear subspace of C (over Q) containing Z.

Definition 7.17. A non-empty set Z ⊂ C with |z| = r for some r > 0 and all z ∈ Z, i.e., Z ⊂ rS, is nonresonant if the associated set

    ∆_Z := {1 + (arg z − arg w)/(2π) : z, w ∈ Z} ⊂ R

satisfies the following two conditions:

(i) ∆_Z ∩ Q = {1};

(ii) log r ∉ span_Q ∆_Z.

An arbitrary set Z ⊂ C is nonresonant if, for every r > 0, the set Z ∩ rS is either nonresonant or empty; otherwise Z is resonant.


Note that the set ∆_Z automatically satisfies 1 ∈ ∆_Z ⊂ (0, 2) and is symmetric with respect to the point 1. The empty set ∅ and the singleton {0} are nonresonant. Also, if Z is nonresonant then so is every W ⊂ Z. On the other hand, Z is certainly resonant if either #(Z ∩ rS ∩ R) = 2 for some r > 0, in which case (i) is violated, or Z ∩ S ≠ ∅, which causes (ii) to fail. Finally, it is easily checked that if Z ⊂ C is nonresonant then so is Z^n := {z^n : z ∈ Z} for every n ∈ N. The converse is not true in general, as can be seen from the resonant set Z = {−2, 2}, for which Z^2 = {4} is nonresonant.

Example 7.18. The singleton {z} is nonresonant if and only if either z = 0 or log |z| ∉ Q. Similarly, the set {z, z̄} with z ∈ C \ R is nonresonant if and only if 1, log |z|, and (1/(2π)) arg z are Q-independent. z
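Conditions (i) and (ii) can be probed numerically, though only in one direction: floating-point arithmetic can exhibit a resonance witnessed by rationals of small denominator, but it can never prove nonresonance (that requires exact number theory, as Example 7.27 below makes plain). A rough sketch, with all names and bounds our own:

```python
from fractions import Fraction
from math import atan2, log10, pi, isclose

def looks_rational(x, max_den=50, tol=1e-9):
    """Does x agree (to tol) with a fraction of denominator <= max_den?"""
    return isclose(x, float(Fraction(x).limit_denominator(max_den)), abs_tol=tol)

def resonance_witness(Z, max_den=50):
    """Return a reason the finite set Z looks resonant, or None if no
    small-denominator witness is found (which proves nothing)."""
    for z in Z:
        r = abs(z)
        if r == 0:
            continue
        shell = [w for w in Z if isclose(abs(w), r)]
        delta = [1 + (atan2(z1.imag, z1.real) - atan2(z2.imag, z2.real)) / (2 * pi)
                 for z1 in shell for z2 in shell]
        # condition (i) fails if Delta_Z contains a rational point other than 1
        if any(looks_rational(t, max_den) and not isclose(t, 1) for t in delta):
            return f"(i) fails on the shell |z| = {r:.4g}"
        # condition (ii) is only spot-checked: log r rational certainly violates it
        if looks_rational(log10(r), max_den):
            return f"log r looks rational on the shell |z| = {r:.4g}"
    return None

print(resonance_witness([2, -2]))    # sigma(B) of Example 7.20: a witness is found
print(resonance_witness([complex(-1, 2 ** 0.5), complex(-1, -2 ** 0.5)]))  # None
```

The second call returns None precisely because, as Example 7.27 explains, no small-denominator relation is known for that spectrum.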

As one learns in linear algebra, the asymptotic behavior of (A^n) is completely determined by the eigenvalues of A, together with the corresponding (generalized) eigenvectors. As far as Benford's law is concerned, the key question turns out to be whether or not the set σ(A) is nonresonant. Note that log ρ(A) is irrational whenever σ(A) is nonresonant (and A is not nilpotent), but the converse is not true in general.

Example 7.19. For the matrix A from (7.2), σ(A) = {−φ^{−1}, φ} is nonresonant. On the other hand, the spectrum of the matrix B from (7.5) is σ(B) = {−1, 2}, and hence resonant. z

Example 7.20. The 2 × 2-matrices considered in Example 7.16,

    A = [ 1  −1 ]       B = [ 0  2 ]       C = [ 4γ  −4 ]
        [ 1   0 ] ,         [ 2  0 ] ,         [  1   0 ] ,

all have resonant spectrum. Indeed, σ(A) = {e^{±πı/3}}, and ∆_σ(A) = {2/3, 1, 4/3} contains rational points other than 1, which violates (i) in Definition 7.17. In addition, log |e^{±πı/3}| = 0, and so (ii) is also violated. Similarly, σ(B) = {±2}; hence ∆_σ(B) = {1/2, 1, 3/2}, and (i) fails, whereas log 2 ∉ Q = span_Q ∆_σ(B), i.e., (ii) holds. (Note, however, that σ(B^2) = {4} is nonresonant, and so is σ(B^{2n}) for all n ∈ N.) Finally, with σ(C) = {2e^{±πı log 2}}, the set ∆_σ(C) = {1, 1 ± log 2} satisfies (i), but log 2 ∈ span_Q {1, log 2} = span_Q ∆_σ(C), violating (ii). z

The following theorem is the main result of this section. Like Theorems 7.3 and 7.11 above, but without any further assumptions on A, it extends to arbitrary dimension the simple fact that for the sequence (a^n) with a ∈ R to be either Benford or trivial, it is necessary and sufficient that log |a| be irrational or a = 0. By Example 7.18, the latter is equivalent to the singleton {a} being nonresonant.

Theorem 7.21 ([17]). For every A ∈ R^{d×d} the following are equivalent:

(i) The set σ(A) is nonresonant;

(ii) For every h ∈ L_d the sequence (h(A^n)) is Benford or terminating.


A proof of Theorem 7.21 (and also of Theorem 7.28 below) will be given here only under an additional assumption on the matrix A. While this assumption, (7.18) below, holds for most matrices, it clearly fails for some, and the interested reader may want to consult [17] for a complete proof not making use of (7.18). The argument presented here relies on the Benford properties of sequences of a particular form, stated below for ease of reference; with Theorem 4.2 and Proposition 4.6, assertion (i) is an immediate consequence of [11, Cor. 2.6], and (ii) is analogous to [16, Lem. 2.6]. Note that (i) contains the “if” part of Theorem 4.16 as a special case.

Proposition 7.22. Let a > 0, b ∈ R, and let (ε_n) be a sequence in R with lim_{n→∞} ε_n = 0. Suppose θ ∈ R is irrational, and suppose the 1-periodic function f : [0, +∞) → R is continuous, with f(t) ≠ 0 for almost all t ≥ 0. Then, for the sequence (x_n) defined by

    x_n = a^n n^b (f(nθ) + ε_n),   n ∈ N,

the following hold:

(i) If log a ∉ span_Q {1, θ} then (x_n) is Benford;

(ii) If log a ∈ span_Q {1, θ} and f is differentiable with f(t) = 0 for some t ≥ 0, then (x_n) is not Benford.

As indicated above, a proof of Theorem 7.21 is given here only under an additional assumption. Concretely, it will be assumed that

    #(σ(A) ∩ rS) ≤ 2   for all r > 0,                                   (7.18)

i.e., for every r > 0 the matrix A has at most two eigenvalues of modulus r, which may take the form of a real pair −r, r, or a non-real pair λ, λ̄ with |λ| = r. Note that (7.18) holds for most A ∈ R^{d×d} in that the matrices A not satisfying (7.18) form a nowhere dense nullset in R^{d×d}. (If d < 3 then clearly (7.18) holds for all A ∈ R^{d×d}.) For convenience, let

    σ^+(A) = {λ ∈ σ(A) : ℑλ ≥ 0} \ {0},

and recall from the Jordan Normal Form Theorem (e.g., [79, Sec. 3.2]) that A^n can be written in the form

    A^n = ℜ(Σ_{λ∈σ^+(A)} P_λ(n) λ^n),   n ≥ d,                          (7.19)

where for every λ ∈ σ^+(A), P_λ is a (possibly non-real) matrix-valued polynomial of degree at most d − 1, i.e., for all j, k ∈ {1, 2, . . . , d} the entry [P_λ]_jk is a complex polynomial in n of degree at most d − 1. Moreover, P_λ is real whenever λ ∈ R. The representation (7.19) will be used repeatedly in what follows.


Proof of Theorem 7.21. If σ^+(A) = ∅ then A is nilpotent, σ(A) = {0} is nonresonant, and every sequence (h(A^n)) is identically zero for n ≥ d. Thus (i) is equivalent to (ii) trivially in this case. From now on, assume that σ^+(A) is not empty. Given any h ∈ L_d, it follows from (7.19) that

    h(A^n) = ℜ(Σ_{λ∈σ^+(A)} h(P_λ(n)) λ^n) = ℜ(Σ_{λ∈σ^+(A)} p_λ(n) λ^n),   n ≥ d,     (7.20)

where p_λ = h(P_λ) := h(ℜP_λ) + ıh(ℑP_λ) is a (possibly non-real) polynomial in n of degree at most d − 1.

To establish the asserted equivalence, assume first that σ(A) is nonresonant and, given any h ∈ L_d, that p_λ ≠ 0 for some λ ∈ σ^+(A). (Otherwise h(A^n) = 0 for all n ≥ d.) Let

    r = max{|λ| : λ ∈ σ^+(A), p_λ ≠ 0} > 0.

Recall from (7.18) that σ(A) ∩ rS contains at most two elements. Note also that r and −r cannot both be eigenvalues of A, since otherwise σ(A) would be resonant. Hence either exactly one of the two numbers r, −r is an eigenvalue of A, and log r is irrational, or else σ(A) ∩ rS = {re^{±πıϑ}} with the appropriate irrational 0 < ϑ < 1, and log r ∉ span_Q {1, 1 ± ϑ} = span_Q {1, ϑ}. In the former case, assume without loss of generality that r is an eigenvalue. (The case of −r being an eigenvalue is completely analogous.) Recall that |λ| < r for every other eigenvalue λ of A with p_λ ≠ 0. Denote by k ∈ {0, 1, . . . , d − 1} the degree of the polynomial p_r, and let c = lim_{n→∞} p_r(n)/n^k. Note that c is non-zero and real. From (7.20), it follows that

    h(A^n) = p_r(n) r^n + ℜ(Σ_{λ∈σ^+(A): |λ|<r} p_λ(n) λ^n) = r^n n^k (c + ε_n),   n ≥ d,

with the appropriate real sequence (ε_n) converging to 0. Together with Propositions 4.3(i) and 4.6(ii, iii), this shows that (log |h(A^n)|) is u.d. mod 1, since log r is irrational, and hence (h(A^n)) is Benford. In the latter case, (7.20) similarly yields h(A^n) = r^n n^k (f(nϑ/2) + ε_n) for all n ≥ d, with the appropriate 1-periodic continuous function f vanishing at no more than countably many points, and Proposition 7.22(i) with a = r, b = k, and θ = ϑ/2 shows that (h(A^n)) is Benford. Thus (i)⇒(ii).

To establish the reverse implication, assume now that σ(A) is resonant, i.e., for some r_0 > 0 the set Z = σ(A) ∩ r_0 S violates (i) or (ii) in Definition 7.17. Suppose first that ∆_Z ∩ Q ≠ {1}. By (7.18), either both r_0 and −r_0 are eigenvalues of A, or else Z = {r_0 e^{±πıρ}} with some rational 0 < ρ < 1. In the former case, pick unit eigenvectors u, v ∈ R^d of A corresponding to r_0 and −r_0, respectively, and let h_0 = Σ_{j,k=1}^d (u_j − v_j)(u_k + v_k)[·]_jk ∈ L_d. Then

    h_0(A^n) = (u − v)^⊤ A^n (u + v) = r_0^n (1 − u^⊤v)(1 − (−1)^n),   n ∈ N,

so h_0(A^n) = 0 for all even n, but h_0(A^n) > 0 for all odd n, so (h_0(A^n)) is neither Benford nor terminating. In the latter case of non-real eigenvalues, there exist linearly independent unit vectors u, v ∈ R^d such that, for all n ∈ N,

    A^n u = r_0^n cos(πnρ)u − r_0^n sin(πnρ)v,   A^n v = r_0^n sin(πnρ)u + r_0^n cos(πnρ)v.   (7.21)

Hence with h_0 = Σ_{j,k=1}^d (u_j − v_j)(u_k + v_k)[·]_jk ∈ L_d as above,

    h_0(A^n) = r_0^n (u − v)^⊤((cos(πnρ) + sin(πnρ))u + (cos(πnρ) − sin(πnρ))v) = 2(1 − u^⊤v) r_0^n sin(πnρ),

and again h_0(A^n) = 0 periodically but not identically, since 0 < ρ < 1 is rational. Thus (h_0(A^n)) is neither Benford nor terminating.

It remains to consider the case where ∆_Z ∩ Q = {1} for every Z = σ(A) ∩ rS and r > 0, but log r_0 ∈ span_Q ∆_Z for some r_0 > 0. Again, it is helpful to distinguish two cases: Either σ(A) ∩ r_0 S ⊂ R or σ(A) ∩ r_0 S ⊂ C \ R. In the former case, exactly one of the two numbers r_0 and −r_0 is an eigenvalue of A. The argument for −r_0 being analogous, assume without loss of generality that σ(A) ∩ r_0 S = {r_0}. Then ∆_Z = {1} and hence log r_0 is rational. In particular, taking h_0 = Σ_{j,k=1}^d u_j u_k [·]_jk ∈ L_d, where u ∈ R^d is any unit eigenvector of A corresponding to the eigenvalue r_0, yields h_0(A^n) = r_0^n, and thus (h_0(A^n)) is neither Benford nor terminating. In the latter case, i.e., for Z = σ(A) ∩ r_0 S = {r_0 e^{±πıϑ}} with some irrational 0 < ϑ < 1, again pick linearly independent unit vectors u, v ∈ R^d such that (7.21) holds for all n ∈ N with ρ replaced by ϑ. With h_0 = Σ_{j,k=1}^d (u_j + v_j)(u_k + v_k)[·]_jk, it follows that

    h_0(A^n) = 2r_0^n (1 + u^⊤v) cos(πnϑ),   n ∈ N.

Recall that log r_0 ∈ span_Q {1, ϑ}, by assumption. Hence Proposition 7.22(ii) with a = r_0, b = 0, θ = ϑ/2, f(t) = 2(1 + u^⊤v) cos(2πt), and ε_n ≡ 0 shows that the sequence (h_0(A^n)) is not Benford. Clearly, it is not terminating either. In


summary, (ii) fails whenever σ(A) is resonant. Thus (ii)⇒(i), and the proof is complete.

Remark. Weaker forms of the implication (i)⇒(ii) in Theorem 7.21, or special cases thereof, can be traced back at least to [138, 142] and may also be found in [11, 20, 87, 110]. Especially in the early literature, the corresponding results often apply only to the special situation of linear difference equations; cf. Section 7.5. The reverse implication (ii)⇒(i) seems to have been addressed previously only for d < 4; see [20, Thm. 5.37].

The following examples review matrices encountered earlier in the light of Theorem 7.21. Note that if A ∈ R^{d×d} is invertible then (7.19) holds for all n ∈ N, and hence “terminating” in Theorem 7.21(ii) can be replaced by “identically zero” in this case; see Proposition 7.31 below.

Example 7.23. (i) As seen in Example 7.19, for the matrix A from (7.2) the set σ(A) is nonresonant; note also that A is invertible. Thus for every h ∈ L_2 the sequence (h(A^n)) is Benford unless h(I_2) = h(A) = 0; in the latter case h(A^n) ≡ 0. Choosing specifically h = [·]_21 yields h(A) = 1 ≠ 0 and hence shows again that ([A^n]_21) = (F_n) is Benford. Similarly, h = −[·]_11 + 3[·]_12 yields h(A) = 2 ≠ 0, and so (h(A^n)) = (2, 1, 3, 4, 7, . . .), traditionally referred to as the sequence of Lucas numbers [117, A000032], is Benford as well. Note that the latter conclusion could not have been reached using Theorem 7.3 because h is not nonnegative.

(ii) For the (invertible) matrix B from (7.5), the set σ(B) is resonant. By Theorem 7.21, there exists h ∈ L_2 for which (h(B^n)) is neither Benford nor identically zero. One example of such an h has already been seen in Section 7.1, namely, h = 2[·]_11 + [·]_22, for which h(B^n) = 3(−1)^n for all n ∈ N. z

Example 7.24. (i) The spectrum of the (invertible) matrix A in Example 7.5(i), σ(A) = {−1, 1, 2}, is resonant. For some h ∈ L_3, therefore, (h(A^n)) is neither Benford nor identically zero.
A simple example is h = [·]_11 − [·]_13, for which h(A^n) ≡ 1. Recall from Example 7.5(i), however, that (h(A^n)) is Benford whenever h ≠ 0 is nonnegative.

(ii) Since the (invertible) matrix B in Example 7.5(ii) is stochastic, σ(B) is resonant. This in turn corroborates the observation made there that (h(B^n)) is neither Benford nor identically zero whenever h ∈ L_3 \ {0} is nonnegative. Recall from Example 7.5(ii) that

    lim_{n→∞} B^n = 1/3 [ 1  1  1 ]
                        [ 1  1  1 ]  =: Q .
                        [ 1  1  1 ]

The question whether sequences (h(B^n − Q)) and (h(B^{n+1} − B^n)) can be Benford is addressed by Theorem 7.32 below. z
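A quick numerical sanity check of Example 7.23(i) (a sketch of ours; the sample size and threshold are arbitrary): both the Fibonacci and the Lucas numbers hug the Benford distribution closely.

```python
from math import log10

def recurrence(a, b, N):
    """First N terms of x_{n+1} = x_n + x_{n-1}, computed exactly."""
    out = [a, b]
    while len(out) < N:
        out.append(out[-1] + out[-2])
    return out

def max_benford_deviation(seq):
    counts = [0] * 10
    for x in seq:
        counts[int(str(abs(x))[0])] += 1
    return max(abs(counts[d] / len(seq) - log10(1 + 1 / d)) for d in range(1, 10))

N = 3000
devs = {"Fibonacci": max_benford_deviation(recurrence(1, 1, N)),
        "Lucas":     max_benford_deviation(recurrence(2, 1, N))}
for name, dev in devs.items():
    print(f"{name}: max deviation from Benford = {dev:.4f}")
```

Python's arbitrary-precision integers make this exact even though F_3000 has over 600 decimal digits.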


Example 7.25. The spectra of the (invertible) matrices A and B in Example 7.7, σ(A) = {−10, 20} and σ(B) = {2, 10}, are both resonant. However, with

    A^n = (2/3) · 20^{n−1}(A + 10I_2) + (1/3) · (−10)^{n−1}(A − 20I_2),   n ∈ N,

it is clear from Theorem 4.16 that (h(A^n)) is Benford whenever h(A + 10I_2) ≠ 0. For most h ∈ L_2, therefore, (h(A^n)) is Benford. On the other hand,

    B^n = (5/4) · 10^{n−1}(B − 2I_2) − 2^{n−3}(B − 10I_2),   n ∈ N.

Thus the sequence (h(B^n)) can be Benford only if h(B − 2I_2) = 0, and hence, for most h ∈ L_2, is neither Benford nor identically zero. z

Example 7.26. Theorem 7.3 does not apply to the (invertible) matrix A in Example 7.8. Since σ(A) = {2} is nonresonant, by Theorem 7.21 the sequence (h(A^n)) is Benford for every h ∈ L_2 unless h(I_2) = h(A) = 0; in the latter case it is identically zero. z

While Theorem 7.21 neatly characterizes the Benford property of (h(A^n)) in terms of σ(A), it should be noted that deciding whether σ(A) is resonant or not can be a challenge. The next example illustrates the basic difficulty.

Example 7.27. The (invertible, nonnegative, integer) matrix A in Example 7.12 has spectrum σ(A) = {−1 ± ı√2, 2}. Since log 2 is irrational, σ(A) is nonresonant precisely if {−1 ± ı√2} = σ(A) ∩ √3 S is. Note that

    span_Q ∆_{σ(A)∩√3 S} = span_Q {1, (1/π) arctan √2},

so σ(A) is nonresonant if and only if log 3 ∉ span_Q {1, (1/π) arctan √2}. Using standard number theory tools, it is not hard to check that log 3 and (1/π) arctan √2 are both irrational (in fact, transcendental). However, it seems to be unknown whether the numbers 1, log 3, and (1/π) arctan √2 are Q-independent. (The Q-independence of these three numbers would follow easily from a prominent but as yet unresolved conjecture of S. Schanuel [158].) In other words, it is unknown whether σ(A) is resonant or not. The reader may want to compare this delicate situation to the ease with which it was ascertained, in Example 7.12, that (h(A^n)) is Benford for every nonnegative observable h ≠ 0 on R^{3×3}. Thus, for instance, ([A^n]_12 + [A^n]_13) = (1, 1, 1, 7, 7, 13, 49, . . .) is Benford. On the other hand,

    Q = lim_{n→∞} A^n / 2^n = 1/11 [  3  2  1 ]
                                   [  6  4  2 ]
                                   [ 12  8  4 ] ,

and hence, as just explained, it is not known whether the integer sequence (−[A^n]_11 + [A^n]_12 + [A^n]_13) = (1, 1, −5, 7, 1, −23, 43, . . .)


is Benford; cf. [117, A087455]. Experimental evidence suggests that it is; see Figure 7.2. Also note that even if σ(A) is resonant, the sequence (h(A^n)) is nevertheless Benford whenever h(Q) ≠ 0. For most h ∈ L_3, therefore, (h(A^n)) is Benford regardless of whether or not σ(A) turns out to be resonant. z
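The experiment behind Figure 7.2 is easily repeated (a sketch of ours with exact integer arithmetic; the sample size and observable labels are our choices):

```python
from math import log10

def matmul(X, Y):
    """Multiply two square matrices of Python ints (exact arithmetic)."""
    d = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]

A = [[0, 1, 0], [0, 0, 1], [6, 1, 0]]
observables = {
    "[.]12 + [.]13":          lambda M: M[0][1] + M[0][2],
    "-[.]11 + [.]12 + [.]13": lambda M: -M[0][0] + M[0][1] + M[0][2],
}

N, P = 2000, A
counts = {name: [0] * 10 for name in observables}
for _ in range(N):
    for name, h in observables.items():
        v = h(P)
        if v != 0:                      # zero values carry no leading digit
            counts[name][int(str(abs(v))[0])] += 1
    P = matmul(P, A)

devs = {}
for name, c in counts.items():
    total = sum(c)
    devs[name] = max(abs(c[d] / total - log10(1 + 1 / d)) for d in range(1, 10))
    print(f"h = {name}: max deviation from Benford = {devs[name]:.4f}")
```

For both observables the deviations are already small at N = 2000, consistent with the (proved, respectively conjectured) Benford behavior discussed above.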


Figure 7.2: Relative frequencies of the first significant digit for the first N terms of (h(A^n)), with A as in Examples 7.12 and 7.27: For h = [·]_12 + [·]_13, the sequence is Benford; the empirical data suggest that for h = −[·]_11 + [·]_12 + [·]_13 this is the case also.

As with nonnegative matrices in the previous section, the Benford property of (h(A^n)) may also be of interest for the nonlinear observables h(A) = |A| and h(A) = |Au| on R^{d×d}.

Theorem 7.28 ([17]). Let A ∈ R^{d×d}. If σ(A) is nonresonant then:

(i) The sequence (|A^n|) is Benford, or A^d = 0;

(ii) For every u ∈ R^d the sequence (|A^n u|) is Benford or terminating.

Proof. As in the case of Theorem 7.21 above, the argument presented here makes use of the additional assumption (7.18); see [17] for a complete proof without this assumption. To prove (i), deduce from (7.19) that either A^d = 0 or else, with the appropriate r > 0, Z ⊂ σ^+(A) ∩ rS, and k ∈ {0, 1, . . . , d − 1},

    A^n = r^n n^k (ℜ(Σ_{λ∈Z} C_λ e^{ın arg λ}) + B_n),   n ≥ d,


where each C_λ ≠ 0 is a (possibly non-real) d × d-matrix, and (B_n) is a sequence in R^{d×d} for which (n|B_n|) is bounded. By (7.18), the set σ(A) ∩ rS contains at most two elements; it is also nonresonant, by assumption. Hence either the set σ(A) ∩ rS contains exactly one of the two numbers ±r and log r is irrational, or else σ(A) ∩ rS equals {re^{±πıϑ}} with some irrational 0 < ϑ < 1 for which log r ∉ span_Q {1, ϑ}.

Assume first that σ(A) ∩ rS equals {r} or {−r}. Then, with λ = r or λ = −r, the matrix C_λ is real, i.e., C_λ ∈ R^{d×d}, and

    log |A^n| = n log r + k log n + log |C_λ + (−1)^{n arg λ/π} B_n|,   n ∈ N.

Together with Propositions 4.3(i) and 4.6(ii, iii), this shows that (log |A^n|) is u.d. mod 1, since log r is irrational. Hence (|A^n|) is Benford.

Assume now that σ(A) ∩ rS = {re^{±πıϑ}}. Then, with λ = re^{πıϑ},

    |A^n| = r^n n^k |ℜ(C_λ e^{πınϑ}) + B_n| = r^n n^k (f(nϑ/2) + ε_n),   n ∈ N,

with the 1-periodic continuous function f(t) = |ℜ(C_λ e^{2πıt})| and the (real) sequence (ε_n) given by

    ε_n = |ℜ(C_λ e^{πınϑ}) + B_n| − |ℜ(C_λ e^{πınϑ})|,   n ∈ N.

Note that f(t) > 0 for all but (at most) countably many t ≥ 0. Also |ε_n| ≤ |B_n|, so lim_{n→∞} ε_n = 0. Consequently, Proposition 7.22(i) with a = r, b = k, and θ = ϑ/2 shows that (|A^n|) is Benford. The completely analogous proof of (ii) is left to the reader.

Remark. As is the case with Theorems 7.9 and 7.11, Theorem 7.28(i) and (ii) both hold with | · | replaced by any other norm on R^{d×d} and R^d, respectively. Moreover, the conclusions remain valid under the weaker assumption that σ(A^N) is nonresonant for some N ∈ N; see [17].

Example 7.29. (i) The matrix A from (7.2) has nonresonant spectrum σ(A). By Theorem 7.28, (|A^n|) is Benford. This is also clear since |A^n| ≡ φ^n.

(ii) For the matrix B from (7.5), σ(B) is resonant, so Theorem 7.28 does not apply. Still, it is readily checked that lim_{n→∞} |B^n|/2^n = √10 ≠ 0, so (|B^n|) is Benford nevertheless, by Theorem 4.16. z
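The identity |A^n| = φ^n in Example 7.29(i) is easy to confirm by hand-computation: A^n = [[F_{n−1}, F_n], [F_n, F_{n+1}]] is symmetric, so its spectral norm is its largest eigenvalue modulus. A small sketch (ours; pure Python, no dependencies):

```python
from math import sqrt

def fib_power(n):
    """A^n for A = [[0, 1], [1, 1]], i.e., [[F_{n-1}, F_n], [F_n, F_{n+1}]]."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return [[b - a, a], [a, b]]

def spectral_norm_2x2_sym(M):
    """Largest absolute eigenvalue of a symmetric 2x2 matrix [[p, q], [q, r]]."""
    (p, q), (_, r) = M
    s = sqrt((p - r) ** 2 + 4 * q * q)
    return max(abs(p + r + s), abs(p + r - s)) / 2

phi = (1 + sqrt(5)) / 2
for n in (1, 2, 5, 10, 20):
    print(n, spectral_norm_2x2_sym(fib_power(n)) / phi ** n)   # ratios ~ 1
```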

Example 7.29(ii) above indicates that, in contrast to Theorem 7.11, the converse of Theorem 7.28 is not true in general: Even if σ(A) is resonant, (|A^n|) may nevertheless be Benford or terminating, and so may (|A^n u|) for every u ∈ R^d. In fact, as the following example illustrates, it is impossible to characterize properties (i) and (ii) in Theorem 7.28 solely in terms of σ(A) (except, of course, in the trivial case d = 1).

Example 7.30. Consider the (invertible) 2 × 2-matrix

    A = 10^π [ cos(π^2)  −sin(π^2) ]
             [ sin(π^2)   cos(π^2) ] ,

for which σ(A) = {10^π e^{±ıπ^2}} is resonant because

    π = log 10^π ∈ span_Q ∆_σ(A) = span_Q {1, π}.

Nevertheless, (|A^n|) = (10^{πn}) is Benford, and so is (|A^n u|) = (10^{πn}|u|) for every u ∈ R^2 \ {0}. This again shows that the nonresonance assumption in Theorem 7.28 is not necessary for the conclusion. Next consider the (invertible) matrix

    B = (10^π/√3) [ √3 cos(π^2)   −3 sin(π^2) ]
                  [    sin(π^2)   √3 cos(π^2) ] ,

so σ(B) = {10^π e^{±ıπ^2}} = σ(A). Thus, as far as their spectra are concerned, the matrices A and B are indistinguishable. (In fact, A and B are similar.) In particular, σ(B) is resonant as well. From

    B^n = (10^{πn}/√3) [ √3 cos(π^2 n)   −3 sin(π^2 n) ]
                       [    sin(π^2 n)   √3 cos(π^2 n) ] ,   n ∈ N,

a straightforward calculation yields

    |B^n| = (10^{πn}/√3) √(4 − cos(2π^2 n) + |sin(π^2 n)| √(14 − 2 cos(2π^2 n))),   n ∈ N,

which in turn shows that ⟨log |B^n|⟩ = f(⟨nπ⟩) for all n ∈ N, with the smooth function f : [0, 1) → [0, 1) given by

    f(s) = ⟨s − (1/2) log 3 + (1/2) log(4 − cos(2πs) + sin(πs) √(14 − 2 cos(2πs)))⟩.

Recall from Example 4.7(i) that (nπ) is u.d. mod 1. Since f is a smooth bijection of [0, 1) with non-constant derivative, it follows that (f(⟨πn⟩)) is not u.d. mod 1, simply because λ_{0,1} ∘ f^{−1} ≠ λ_{0,1}. Thus (|B^n|) is neither Benford nor terminating. In a similar manner, it can be shown that (|B^n u|) is neither Benford nor terminating for any u ∈ R^2 \ {0}. z

As noted earlier, if A ∈ R^{d×d} is invertible then (7.19) holds for all n ∈ N; Theorems 7.21 and 7.28 therefore have the following corollary.

Proposition 7.31. Let A ∈ R^{d×d} be invertible. If σ(A) is nonresonant then:

(i) For every h ∈ L_d the sequence (h(A^n)) is Benford or identically zero;

(ii) The sequence (|A^n|) is Benford;

(iii) For every u ∈ R^d \ {0} the sequence (|A^n u|) is Benford.
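Proposition 7.31(iii) invites a quick experiment: draw a matrix at random and watch the leading digits of |A^n u|. The sketch below is ours (the seed, dimension, and tolerance are arbitrary choices, and the run illustrates rather than proves anything — in particular, nonresonance of the sampled spectrum is an assumption that holds with probability one but is not verified by the code). It tracks log10 |A^n u| incrementally to avoid over- and underflow:

```python
import random
from math import log10, sqrt

random.seed(1)
d = 3
A = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
v = [1.0, 0.0, 0.0]                       # u = e_1

counts, logmag, N = [0] * 10, 0.0, 50_000
for _ in range(N):
    v = [sum(A[i][j] * v[j] for j in range(d)) for i in range(d)]
    norm = sqrt(sum(x * x for x in v))
    logmag += log10(norm)                 # accumulated log10 |A^n u|
    v = [x / norm for x in v]             # renormalize to avoid over-/underflow
    counts[int(10 ** (logmag % 1.0))] += 1

dev = max(abs(counts[dig] / N - log10(1 + 1 / dig)) for dig in range(1, 10))
print(f"max deviation from Benford = {dev:.4f}")
```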


Remark. By Proposition 7.31, for any invertible A ∈ R^{d×d}, nonresonance of σ(A) guarantees an abundance of Benford sequences of the form (h(A^n)). Most d × d-matrices are invertible with nonresonant spectrum, from a topological as well as a measure-theoretic perspective. To put this more formally, let

    G_d = {A ∈ R^{d×d} : A is invertible and σ(A) is nonresonant}.

Thus, for example, G_1 = {[a] : a ∈ R \ {0}, log |a| ∉ Q}. While the complement of G_d is dense in R^{d×d}, it is a topologically small set: R^{d×d} \ G_d is of first category, i.e., a countable union of nowhere dense sets. A (topologically) typical (“generic”) d × d-matrix therefore belongs to G_d. Also, R^{d×d} \ G_d is a (Lebesgue) nullset. Thus if A is an R^{d×d}-valued random variable, that is, a random matrix, whose distribution is a.c. with respect to the d^2-dimensional Lebesgue measure on R^{d×d}, then P(A ∈ G_d) = 1, i.e., with probability one A is invertible and σ(A) is nonresonant.

Cancellations of resonance

If A is a real d × d-matrix and log ρ(A) is rational then, for most linear observables h on R^{d×d}, the sequence (h(A^n)) is not Benford. Even in this situation, however, it is quite possible for the sequence (h(A^{n+1} − ρ(A)A^n)) to be Benford. This phenomenon has already been observed in Example 7.13 for the (row-stochastic) matrix

    A = 1/2 [ 1  1 ]
            [ 2  0 ] ,

where log ρ(A) = log 1 = 0 is rational, and yet

    A^{n+1} − A^n = (−1/2)^n (A − I_2),   n ∈ N.

Hence every sequence (h(A^{n+1} − A^n)) is Benford or identically zero. The remainder of the present section studies this “cancellation of resonance” scenario and demonstrates how it can be easily understood by utilizing the results from above. The scenario was first described in, and is of particular interest for, the case of stochastic matrices [22]; this important special case is the subject of the next section. However, as explained here, “cancellation of resonance” may occur much more generally, namely, whenever A has a simple dominant eigenvalue.
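The cancellation just described is easy to verify numerically for the matrix of Example 7.13 (a sketch of ours; the digit count works with log10 to dodge underflow of 2^{−n}):

```python
from math import log10

def matmul(X, Y):
    """Multiply two square matrices (entries here are exact dyadic floats)."""
    d = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]

A = [[0.5, 0.5], [1.0, 0.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]

# check the identity A^(n+1) - A^n = (-1/2)^n (A - I2) for n = 1..7
P = A
for n in range(1, 8):
    Pn = matmul(P, A)
    assert all(abs(Pn[i][j] - P[i][j] - (-0.5) ** n * (A[i][j] - I2[i][j])) < 1e-12
               for i in range(2) for j in range(2))
    P = Pn
print("identity verified for n = 1..7")

# [A^(n+1) - A^n]_11 = (-1/2)^n * (-1/2), so |.| = 2^(-(n+1)); its leading
# digits follow Benford's law because log 2 is irrational.
counts, N = [0] * 10, 10_000
for n in range(1, N + 1):
    counts[int(10 ** ((-(n + 1) * log10(2)) % 1.0))] += 1
dev = max(abs(counts[d] / N - log10(1 + 1 / d)) for d in range(1, 10))
print(f"max deviation from Benford = {dev:.4f}")
```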
Assume that the real d×d-matrix A has a dominant eigenvalue λ0 that is (algebraically) simple, i.e., |λ| < |λ0| for every λ ∈ σ(A) \ {λ0}, and λ0 is a simple root of the characteristic polynomial of A. Note that λ0 is necessarily a real number, and ρ(A) = |λ0|. For λ0 ≠ 0, it is not hard to see that the limit

Q_A = lim_{n→∞} A^n / λ0^n    (7.22)

exists. Moreover, Q_A A = A Q_A = λ0 Q_A and Q_A² = Q_A. In fact, Q_A is simply the spectral projection associated with λ0 and can also be represented in the form

Q_A = u v⊤ / (u⊤ v),    (7.23)


where u, v are eigenvectors of A and A⊤, respectively, corresponding to the eigenvalue λ0. A simple dominant eigenvalue is often observed in practice and occurs, for instance, whenever A is nonnegative and primitive; see Proposition 7.2. (In this case, Q_A is in fact positive.) But it also occurs for matrices such as

[1 −1; −1 1]  and  [2 1; 0 1],

which are, respectively, not nonnegative, and nonnegative but not primitive. Now consider the sequences (A^{n+1} − λ0 A^n) and (A^n − λ0^n Q_A), both of which in some sense measure the speed of convergence in (7.22) and therefore are often of interest in their own right. With the help of Theorem 7.21, the Benford properties of these sequences are easy to understand.

Theorem 7.32. Let A ∈ R^{d×d} have a dominant eigenvalue λ0 that is algebraically simple, and let Q_A be the associated projection (7.23). Then the following are equivalent:

(i) The set σ(A) \ {λ0} is nonresonant;

(ii) For every h ∈ L_d the sequence (h(A^{n+1} − λ0 A^n)) is Benford or terminating;

(iii) For every h ∈ L_d the sequence (h(A^n − λ0^n Q_A)) is Benford or terminating.

Proof. As the theorem clearly holds for d = 1, henceforth assume d ≥ 2 and thus λ0 ≠ 0. As in the proof of Theorem 7.3, let B = A − λ0 Q_A and observe that AB = BA and Q_A B = 0 = B Q_A, so

A^n = λ0^n Q_A + B^n,  n ∈ N.    (7.24)

It will first be shown that σ(B) = (σ(A) \ {λ0}) ∪ {0}. To this end, note that for every λ ∈ σ(A) \ {λ0} and w ∈ R^d \ {0} with (A² − 2ℜλ A + |λ|² I_d)^d w = 0 (i.e., w is a generalized eigenvector of A corresponding to the eigenvalue λ ≠ λ0 or λ̄ ≠ λ0),

0 = v⊤ (A² − 2ℜλ A + |λ|² I_d)^d w = (((A⊤)² − 2ℜλ A⊤ + |λ|² I_d)^d v)⊤ w = (λ0 − λ)^d (λ0 − λ̄)^d v⊤ w.

Thus v⊤w = 0, which in turn implies Q_A w = 0, and hence A^n w = B^n w for all n ∈ N, by (7.24). Consequently, λ ∈ σ(B). Also, Au = λ0 u = λ0 Q_A u and therefore Bu = 0. Thus 0 ∈ σ(B), and hence (σ(A) \ {λ0}) ∪ {0} ⊂ σ(B). Moreover, if Bw = λ0 w for some w ∈ R^d then (7.24) yields

Q_A w = lim_{n→∞} A^n w / λ0^n = Q_A w + w;

hence w = 0, which shows that λ0 ∉ σ(B). Conversely, if λ ∈ σ(B) \ {0} then (B² − 2ℜλ B + |λ|² I_d)w = 0 for some w ∈ R^d \ {0}, which in turn implies


0 = Q_A (B² − 2ℜλ B + |λ|² I_d) w = |λ|² Q_A w, and hence Q_A w = 0. It follows that (A² − 2ℜλ A + |λ|² I_d) w = 0 as well, i.e., λ ∈ σ(A). In summary, therefore, σ(B) = (σ(A) \ {λ0}) ∪ {0}, as claimed, and σ(B) is nonresonant if and only if σ(A) \ {λ0} is nonresonant. Deduce from (7.24) that

A^{n+1} − λ0 A^n = B^n (B − λ0 I_d),  A^n − λ0^n Q_A = B^n,  n ∈ N.

Since B − λ0 I_d is invertible, the equivalence of (i), (ii), and (iii) now follows immediately from Theorem 7.21. □

Example 7.33. (i) The (positive) matrix A = [6 4; 4 6], first encountered in Example 7.7, has the simple dominant eigenvalue λ0 = 10. Thus Theorem 7.32 applies, with

Q_A = lim_{n→∞} A^n / 10^n = (1/2)[1 1; 1 1].

Since the set σ(A) \ {10} = {2} is nonresonant, every sequence (h(A^{n+1} − 10A^n)) and (h(A^n − 10^n Q_A)) with h ∈ L_2 is Benford or terminating (in fact, identically zero). This can also be seen directly from

A^{n+1} − 10A^n = 2^n (A − 10I_2),  A^n − 10^n Q_A = −2^{n−3}(A − 10I_2),  n ∈ N.

(ii) For the (nonnegative) matrix B = [19 20; 1 0], λ0 = 20 is a simple dominant eigenvalue, and Q_B = (1/21)[20 20; 1 1]. However, σ(B) \ {20} = {−1} is resonant, and hence some (in fact, most) sequences (h(B^{n+1} − 20B^n)) and (h(B^n − 20^n Q_B)) are neither Benford nor terminating. Again, this can be confirmed by an explicit calculation, which yields

B^{n+1} − 20B^n = (−1)^n (B − 20I_2),  B^n − 20^n Q_B = (−1)^n (1/21)(20I_2 − B),  n ∈ N.

(iii) The matrix C = [6 −8; 4 −6] does not have a dominant eigenvalue, since σ(C) = {±2}. Hence Theorem 7.32 does not apply, and the limit lim_{n→∞} C^n/2^n does not exist. Note that even in this case, however, for every h ∈ L_2, the sequence (h(C^{n+1} − 2C^n)) is Benford or identically zero, because

C^{n+1} − 2C^n = (−2)^n (C − 2I_2),  n ∈ N.

(iv) Consider the (nonnegative) 3×3-matrix

D = [3 1 0; 0 3 0; 0 0 2].


Clearly, λ0 = 3 is a dominant eigenvalue, and σ(D) \ {3} = {2} is nonresonant. However,

D^n = [3^n n·3^{n−1} 0; 0 3^n 0; 0 0 2^n],  n ∈ N,

so lim_{n→∞} D^n/3^n does not exist. The reason for this can be seen in the fact that the eigenvalue λ0, although dominant, is not simple. Thus Theorem 7.32 does not apply. Nevertheless, (h(D^{n+1} − 3D^n)) is Benford or identically zero for every h ∈ L_3. z
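In Example 7.33(i), every nonzero entry of A^{n+1} − 10A^n has absolute value 4·2^n, so its significant digits are those of 2^n. A quick empirical tally (an illustrative sketch, not part of the text) compares the observed first-digit frequencies with the Benford probabilities log10(1 + 1/d):

```python
import math

def leading_digit(m):
    # first significant digit of a positive integer
    return int(str(m)[0])

# In Example 7.33(i), |[A^{n+1} - 10 A^n]_{jk}| = 4 * 2^n; tally the first
# digits of this sequence and compare with Benford's law.
N = 10000
counts = [0] * 10
val = 4
for _ in range(N):
    val *= 2
    counts[leading_digit(val)] += 1

for d in range(1, 10):
    print(d, counts[d] / N, round(math.log10(1 + 1 / d), 5))
```

The agreement is close because (n log 2) mod 1 is uniformly distributed, log 2 being irrational.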

Remark. Close inspection of the proof of Theorem 7.32 shows that the assumption of algebraic simplicity for λ0 can be relaxed somewhat: The equivalences in Theorem 7.32 remain unchanged if the dominant eigenvalue λ0 is merely assumed to be semi-simple, meaning that its algebraic and geometric multiplicities coincide or, equivalently, that A − λ0 I_d and (A − λ0 I_d)² have the same rank. For instance, the eigenvalue λ0 = 3 of D in Example 7.33(iv) is not semi-simple.

7.4

AN APPLICATION TO MARKOV CHAINS

This brief section demonstrates how the results from earlier in the chapter can fruitfully be applied to the matrices of transition probabilities associated with finite-state Markov chains. Markov chains constitute the simplest, most fundamental class of stochastic processes, with widespread applications throughout science [116]. Recall that a (time-homogeneous) d-state Markov chain is a discrete-time Markov process (X_n) on d states s_1, s_2, …, s_d such that P(X_{n+1} = s_k | X_n = s_j) is independent of n, for all j, k ∈ {1, 2, …, d}. Thus

p_jk = P(X_{n+1} = s_k | X_n = s_j),  j, k ∈ {1, 2, …, d},    (7.25)

defines a matrix P = [p_jk] ∈ R^{d×d}, the matrix of one-step transition probabilities of (X_n). Clearly, P is nonnegative and row-stochastic. Moreover, (7.25) implies that the N-step transition probabilities of (X_n) are simply given by the entries of P^N, that is, for all n ∈ N,

P(X_{n+N} = s_k | X_n = s_j) = [P^N]_{jk},  j, k ∈ {1, 2, …, d}.

Thus the long-term behavior of the stochastic process (X_n) is governed by the sequence of stochastic matrices (P^n). A d-state Markov chain (X_n) is irreducible if, for every j, k ∈ {1, 2, …, d}, there exists N ∈ N such that P(X_N = s_k | X_1 = s_j) > 0. Note that this is equivalent to the associated matrix P being irreducible. Also, (X_n) is aperiodic if, for every j ∈ {1, 2, …, d}, the greatest common divisor of the numbers in the set {n ≥ 2 : P(X_n = s_j | X_1 = s_j) > 0} equals 1; in this case, for convenience, call P aperiodic also. It is well known (and easy to check) that (X_n) is irreducible and aperiodic if and only if the associated matrix P of one-step transition probabilities is primitive.


Consider now an irreducible and aperiodic d-state Markov chain (X_n) with associated matrix P. Recall from Proposition 7.2 that P* = lim_{n→∞} P^n exists and is positive. In fact, since P* has rank one and is row-stochastic, all rows of P* are identical. The following observation is an immediate consequence of Theorem 7.11.

Proposition 7.34. Let P ∈ R^{d×d} be the matrix of one-step transition probabilities of an irreducible and aperiodic d-state Markov chain (X_n). Then, for each j, k ∈ {1, 2, …, d}, the sequences ([P^n]_{jk}) and (P(X_n = s_j)) are not Benford.

A common problem in many Markov chain models is to estimate the matrix P* = lim_{n→∞} P^n through numerical simulations. In this context, the sequences (P^{n+1} − P^n) and (P^n − P*) are of special interest, as they both in some sense measure the speed of convergence P^n → P*. As the following corollary of Theorem 7.32 shows, often these sequences are Benford. (Recall that every irreducible and aperiodic stochastic matrix has λ0 = 1 as a simple dominant eigenvalue.)

Proposition 7.35. [22, Thm. 12] Let P ∈ R^{d×d} be the matrix of one-step transition probabilities of an irreducible and aperiodic d-state Markov chain, and P* = lim_{n→∞} P^n. If the set σ(P) \ {1} is nonresonant then, for each j, k ∈ {1, 2, …, d}, the sequences ([P^{n+1} − P^n]_{jk}) and ([P^n − P*]_{jk}) are Benford or terminating.

Example 7.36. (i) The stochastic matrix P = (1/2)[1 1; 2 0], first encountered in Example 7.13, is irreducible and aperiodic, and since σ(P) \ {1} = {−1/2} is nonresonant, Proposition 7.35 applies. In Example 7.13, it was seen that (h(P^{n+1} − P^n)) and (h(P^n − P*)), with P* = (1/3)[2 1; 2 1], are Benford or identically zero for every h ∈ L_2, depending on whether h(I_2 − P) ≠ 0 or h(I_2 − P) = 0.

(ii) For the (irreducible and aperiodic) stochastic matrix

P = (1/5)[3 2 0; 4 0 1; 0 3 2],

σ(P) = {±1/√5, 1}. Since σ(P) \ {1} = {±1/√5} is resonant, Proposition 7.35 does not apply. A straightforward calculation confirms that

P* = (1/10)[6 3 1; 6 3 1; 6 3 1],

and, for instance, that [P^n − P*]_{11} = 5^{−(n+2)/2}(1 + (−1)^n) for all n ∈ N; hence ([P^n − P*]_{11}) is neither Benford nor terminating. z
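The explicit formula in Example 7.36(ii) can be confirmed numerically; the sketch below (illustrative only) iterates P with exact rational arithmetic and checks [P^n − P*]_{11} = 5^{−(n+2)/2}(1 + (−1)^n):

```python
from fractions import Fraction as F

def matmul(X, Y):
    # product of two square matrices of Fractions
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Transition matrix of Example 7.36(ii): P = (1/5)[3 2 0; 4 0 1; 0 3 2]
P = [[F(3, 5), F(2, 5), F(0)],
     [F(4, 5), F(0), F(1, 5)],
     [F(0), F(3, 5), F(2, 5)]]
Pstar_11 = F(6, 10)  # the (1,1) entry of P* = lim P^n

# [P^n - P*]_{11} should be 0 for odd n and 2 * 5^{-(n/2+1)} for even n,
# so the sequence is neither Benford nor terminating.
Pn = P
checks = []
for n in range(1, 15):
    diff = Pn[0][0] - Pstar_11
    expected = 0 if n % 2 == 1 else 2 * F(1, 5) ** (n // 2 + 1)
    checks.append(diff == expected)
    Pn = matmul(Pn, P)

print(all(checks))
```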


It is not hard to see that for d = 2 the converse of Proposition 7.35 is also true: If the sequences ([P^{n+1} − P^n]_{jk}) and ([P^n − P*]_{jk}) are Benford or terminating for all j, k ∈ {1, 2}, then σ(P) \ {1} is nonresonant. For d ≥ 3 the analogous statement is no longer true. In fact, as the next example shows, for d ≥ 3 the property that the sequences ([P^{n+1} − P^n]_{jk}) and ([P^n − P*]_{jk}) are Benford or terminating for every j, k ∈ {1, 2, …, d} cannot be characterized in terms of σ(P) alone.

Example 7.37. Consider the (irreducible and aperiodic) stochastic matrix

P = (1/30)[14 11 5; 11 14 5; 5 5 20],

for which σ(P) \ {1} = {1/10, 1/2} is resonant. Hence Proposition 7.35 does not apply. However, from

P^n = (1/3)[1 1 1; 1 1 1; 1 1 1] + (1/3)·2^{−(n+1)}[1 1 −2; 1 1 −2; −2 −2 4] + (1/2)·10^{−n}[1 −1 0; −1 1 0; 0 0 0],

it is straightforward to deduce that ([P^{n+1} − P^n]_{jk}) and ([P^n − P*]_{jk}) are Benford for all j, k ∈ {1, 2, 3}. This shows that the converse of Proposition 7.35 does not hold in general if d ≥ 3. Next consider the (irreducible and aperiodic) stochastic matrix

Q = (1/10)[6 3 1; 3 4 3; 1 3 6].

Since σ(Q) = {1/10, 1/2, 1} = σ(P), the two matrices P and Q are indistinguishable with regard to their spectrum. (In fact, P and Q are similar.) Moreover, as P and Q are both symmetric, Q* = P*. As was essentially seen in Example 7.5(ii), for instance, the sequences ([Q^{n+1} − Q^n]_{22}) and ([Q^n − Q*]_{22}) are neither Benford nor terminating. z

The hypotheses in Proposition 7.35 are satisfied by most stochastic matrices. To put this more formally, denote by P_d the family of all (row-)stochastic d×d-matrices, that is,

P_d = {P ∈ R^{d×d} : P ≥ 0, Σ_{k=1}^d [P]_{jk} = 1 for all j = 1, 2, …, d}.

The set P_d is a compact and convex subset of R^{d×d}; for example, P_1 = {[1]} and

P_2 = {[s 1−s; 1−t t] : 0 ≤ s, t ≤ 1}.


Note that P_d can be identified with a d-fold copy of the standard (d−1)-simplex, that is, P_d ≃ {u ∈ R^d : u ≥ 0, Σ_{j=1}^d u_j = 1}^d, and hence P_d carries the (normalized) d(d−1)-dimensional Lebesgue measure Leb. Now let

H_d = {P ∈ P_d : P is irreducible and aperiodic, and σ(P) \ {1} is nonresonant},

which is exactly the family of stochastic matrices covered by Proposition 7.35. For instance, H_1 = {[1]} = P_1 and

H_2 = {[s 1−s; 1−t t] : 0 ≤ s, t < 1, s + t = 1 or log|s + t − 1| ∉ Q},

and in both cases H_d makes up most of P_d. More formally, it can be shown that, for every d ∈ N, the complement of H_d in P_d is a set of first category and has Lebesgue measure zero. Thus if P is a P_d-valued random variable, i.e., a random stochastic matrix, whose distribution is absolutely continuous (with respect to Leb, which means that P(P ∈ C) = 0 whenever C ⊂ P_d and Leb(C) = 0), then P(P ∈ H_d) = 1. Together with Proposition 7.35, this implies the following.

Proposition 7.38. [22, Thm. 17] If the random stochastic matrix P has an absolutely continuous distribution then, with probability one, P is irreducible and aperiodic, and the sequences ([P^{n+1} − P^n]_{jk}) and ([P^n − P*]_{jk}) are Benford or terminating for each j, k ∈ {1, 2, …, d}.

Note, for instance, that a random stochastic matrix has an absolutely continuous distribution whenever its d rows are chosen independently according to the same density on the standard (d−1)-simplex. While the above generic properties are very similar to the corresponding results for arbitrary matrices (see the Remark on p. 159), they do not follow directly from the latter. In fact, they are somewhat harder to prove because they assert (topological as well as measure-theoretic) prevalence of H_d within the space P_d, which, as a subset of R^{d×d}, is itself a nowhere dense nullset. The interested reader may want to consult [22] for details.

7.5

LINEAR DIFFERENCE EQUATIONS

In applied science as well as in mathematics, linear processes often emerge in the form of linear difference equations. Despite their simplicity, these equations form a sufficiently wide class of systems to provide fundamental models for disciplines as diverse as mechanical engineering, population dynamics, and signal processing [41, 55]. This section studies the Benford property for solutions of linear difference equations. The main results (Theorems 7.39 and 7.41) are direct applications of results presented earlier in this chapter. Consider the (autonomous) linear difference equation (or recursion)

x_n = a_1 x_{n−1} + a_2 x_{n−2} + … + a_d x_{n−d},  n ≥ d + 1,    (7.26)


where, here as throughout, d is a fixed positive integer, referred to as the order of (7.26), and a_1, a_2, …, a_d are real numbers, with a_d ≠ 0. Thus, (7.8) is a first-order equation with a_1 = −2, whereas (7.7) and (7.9) are second-order equations with a_1 = a_2 = 1 and a_1 = −2, a_2 = −3, respectively. Once the initial values x_1, x_2, …, x_d are specified, (7.26) uniquely defines a sequence of real numbers (x_n), referred to as a solution of (7.26). Under what conditions is (x_n) Benford? In order to make this question amenable to results obtained earlier in this chapter, rewrite (7.26) by inserting d − 1 redundant rows:

(x_n, x_{n−1}, x_{n−2}, …, x_{n−d+1})⊤ = [a_1 a_2 … a_{d−1} a_d; 1 0 … 0 0; 0 1 … 0 0; ⋮ ⋱ ⋮; 0 … 0 1 0] (x_{n−1}, x_{n−2}, x_{n−3}, …, x_{n−d})⊤,  n ≥ d + 1.

Hence with the d×d-matrix

A = [a_1 a_2 … a_{d−1} a_d; 1 0 … 0 0; 0 1 … 0 0; ⋮ ⋱ ⋮; 0 … 0 1 0],    (7.27)

for every n ∈ N, the number x_n is simply the last (bottom) entry of

(x_{n+d−1}, …, x_{n+1}, x_n)⊤ = A (x_{n+d−2}, …, x_n, x_{n−1})⊤ = … = A^{n−1} (x_d, …, x_2, x_1)⊤.

Recall that a_d ≠ 0 and hence that A is invertible; specifically,

A^{−1} (x_d, x_{d−1}, …, x_1)⊤ = (x_{d−1}, …, x_1, (x_d − a_1 x_{d−1} − … − a_{d−1} x_1)/a_d)⊤.    (7.28)

With the linear observable

h = x_{d−1} [·]_{d1} + … + x_1 [·]_{d,d−1} + ((x_d − a_1 x_{d−1} − … − a_{d−1} x_1)/a_d) [·]_{dd}    (7.29)

on R^{d×d}, therefore, (x_n) = (h(A^n)), and the Benford property of any solution (x_n) of (7.26) can indeed be studied by applying the results from earlier sections. To do this effectively, the following simple observations are useful.


First note that the matrix A in (7.27) is nonnegative if and only if a_j ≥ 0 for all j ∈ {1, 2, …, d}. Since a_d ≠ 0, A is irreducible in this case. Next, recall from Sections 7.2 and 7.3 that the Benford behavior of (h(A^n)) for any h ∈ L_d is determined by properties of σ(A). With A given by (7.27),

det(A − z I_d) = (−1)^d (z^d − a_1 z^{d−1} − … − a_{d−1} z − a_d),    (7.30)

which shows that σ(A) = {z ∈ C : z^d = a_1 z^{d−1} + … + a_{d−1} z + a_d}. Finally, it is important to note that not only can every solution (x_n) of (7.26) be written in the form (h(A^n)) with the appropriate h ∈ L_d, but also, conversely, the sequence (h(A^n)) with A from (7.27) solves (7.26) for every h ∈ L_d. Indeed, by (7.30) and the Cayley–Hamilton Theorem, A^d = a_1 A^{d−1} + … + a_{d−1} A + a_d I_d, and hence, for every h ∈ L_d,

h(A^n) = h(A^{n−d} A^d) = h(A^{n−d}(a_1 A^{d−1} + … + a_{d−1} A + a_d I_d)) = a_1 h(A^{n−1}) + … + a_{d−1} h(A^{n−d+1}) + a_d h(A^{n−d}),  n ≥ d + 1.
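The companion-matrix form (7.27) translates directly into code; the sketch below (illustrative, with a function name of our choosing) computes x_n as the bottom entry of A^{n−1}(x_d, …, x_1)⊤:

```python
def solve_recursion(coeffs, init, n):
    """x_n for x_m = a_1 x_{m-1} + ... + a_d x_{m-d}; coeffs = (a_1, ..., a_d),
    init = (x_1, ..., x_d).  Uses the companion matrix (7.27): x_n is the
    bottom entry of A^{n-1} (x_d, ..., x_1)^T."""
    d = len(coeffs)
    A = [[0] * d for _ in range(d)]
    A[0] = list(coeffs)          # first row: a_1, ..., a_d
    for i in range(1, d):
        A[i][i - 1] = 1          # subdiagonal of ones
    v = list(reversed(init))     # state vector (x_d, ..., x_1)^T
    for _ in range(n - 1):
        v = [sum(A[i][j] * v[j] for j in range(d)) for i in range(d)]
    return v[-1]

# Fibonacci (7.7): a_1 = a_2 = 1, x_1 = x_2 = 1
fib = [solve_recursion((1, 1), (1, 1), n) for n in range(1, 11)]
print(fib)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```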

With these observations, the Benford property for solutions (x_n) of (7.26) is easily analyzed. First consider the case where the coefficients a_1, a_2, …, a_d are all positive. In this case, Theorem 7.11 has the following simple corollary.

Theorem 7.39. Let (x_n) be a solution of (7.26) with a_1, a_2, …, a_d > 0. Assume that the initial values x_1, x_2, …, x_d are all nonnegative and at least one of them is positive. Then (x_n) is Benford if and only if log ζ is irrational, where z = ζ is the root of z^d = a_1 z^{d−1} + … + a_{d−1} z + a_d with the largest real part.

Proof. Since a_1, a_2, …, a_d > 0, the matrix A associated with (7.26) via (7.27) is primitive, and ρ(A) = ζ > 0. Also, deduce from (7.28) that

x_{n+1} = x_d [A^n]_{d1} + … + x_2 [A^n]_{d,d−1} + x_1 [A^n]_{dd} =: g(A^n),  n ∈ N,

and note that, unlike h in (7.29), the linear observable g is nonnegative if x_1, x_2, …, x_d ≥ 0. Clearly, (x_{n+1}) is Benford if and only if (x_n) is. To prove the theorem, assume first that log ζ is irrational. Then the sequence (x_{n+1}) = (g(A^n)) is Benford by Theorem 7.11. Conversely, if log ζ is rational then (7.11) and Theorem 4.16 show that (x_{n+1}) is not Benford. □

Example 7.40. (i) For the Fibonacci recursion (7.7), the root of z² = z + 1 with the largest real part is ζ = ϕ = (1 + √5)/2. Since log ϕ is irrational, by Theorem 7.39 every solution of (7.7) with x_1 x_2 ≥ 0 is Benford, except for the trivial case x_n ≡ 0. (For x_1 ≤ 0 simply note that (−x_n) is a solution of (7.7) as well.) Theorem 7.39 does not apply if x_1 x_2 < 0. As was essentially seen in Example 7.23, (x_n) is Benford in this case too. This also follows easily from Theorem 7.41 below.


(ii) Consider the third-order difference equation

x_n = x_{n−2} + x_{n−3},  n ≥ 4,    (7.31)

which resembles the Fibonacci recursion. Since a_1 = 0, Theorem 7.39 does not apply. However, close inspection of the proof of that theorem shows that its conclusion still holds, because the matrix associated with (7.31),

A = [0 1 1; 1 0 0; 0 1 0],

is primitive, since A^5 > 0. Moreover, det A = 1, and essentially the same argument as that in Example 7.12 shows that log ζ = log ρ(A) is irrational. Hence every solution (x_n) of (7.31) with x_1, x_2, x_3 ≥ 0 is Benford, unless x_1 = x_2 = x_3 = 0, in which case x_n ≡ 0. z

Remark. In Theorem 7.39 it is enough to assume that a_1, a_2, …, a_d ≥ 0 and a_j a_d > 0 for some j ∈ {1, 2, …, d−1}, with j relatively prime to d; see Example 7.40(ii) above, where d = 3 and a_2 a_3 = 1.

For the remainder of this section, consider (7.26) with arbitrary, i.e., not necessarily positive, coefficients a_1, a_2, …, a_d. (Recall that a_d ≠ 0 throughout.) In this case, the appropriate tools are provided by Theorem 7.21 and Proposition 7.31.

Theorem 7.41. For the difference equation (7.26) with a_1, a_2, …, a_d ∈ R and a_d ≠ 0, the following are equivalent:

(i) The set {z ∈ C : z^d = a_1 z^{d−1} + … + a_{d−1} z + a_d} is nonresonant;

(ii) Every solution (xn ) of (7.26) is Benford unless xn ≡ 0.

Proof. Let A be the matrix associated with (7.26) via (7.27), and recall that A is invertible. If σ(A) = {z ∈ C : z^d = a_1 z^{d−1} + … + a_{d−1} z + a_d} is nonresonant then, for every h ∈ L_d, the sequence (h(A^n)) is Benford or identically zero, by Proposition 7.31. Choosing h as in (7.29), therefore, shows that (x_n) is either Benford or identically zero. Conversely, if σ(A) is resonant then, by Theorem 7.21, there exists h_0 ∈ L_d for which (h_0(A^n)) is neither Benford nor terminating. Recall that (h_0(A^n)) solves (7.26), so setting x_j = h_0(A^j) for all j ∈ {1, 2, …, d} yields a solution that is neither Benford nor terminating (let alone identically zero). □

Example 7.42. The set {z ∈ C : z² = z + 1} = {−ϕ^{−1}, ϕ} associated with the Fibonacci recursion (7.7) is nonresonant. This once again shows that, except for x_n ≡ 0, every solution of the latter is Benford. For instance, since (F_n) is Benford, so is the sequence (F_n²), by Theorem 4.4. This can also be seen directly by noticing that (F_n² + (2/5)(−1)^n) solves

x_n = 3x_{n−1} − x_{n−2},

n ≥ 3,


for which the associated set {z ∈ C : z² = 3z − 1} = {ϕ^{−2}, ϕ²} is nonresonant. z

Example 7.43. Let a_1, a_2 be integers with a_2 > 0, and consider the second-order equation

x_n = a_1 x_{n−1} + a_2 x_{n−2},  n ≥ 3.    (7.32)

When a_1 = 0, clearly the set {z ∈ C : z² = a_1 z + a_2} = {±√a_2} is resonant. For a_1 ≠ 0, however, it consists of two real numbers with different absolute values, and in this case it is resonant only if one of these numbers is of the form ±10^k for some integer k ≥ 0. It follows that every solution of (7.32), except for x_n ≡ 0, is Benford if and only if a_1 ≠ 0 and

|10^{2k} − a_2| ≠ |a_1| · 10^k  for all k ∈ {0, 1, …, ⌊log a_2⌋}.    (7.33)

For the Fibonacci recursion (7.7), for instance, condition (7.33) reduces to the obviously correct inequalities 1 ≠ 0 and |1 − 1| ≠ 1. For another simple example consider

x_n = −2x_{n−1} + 2x_{n−2},  n ≥ 3,

for which (7.33) reads −2 ≠ 0 and |1 − 2| ≠ 2, and consequently every nontrivial solution is Benford. On the other hand, for

x_n = −2x_{n−1} + 3x_{n−2},  n ≥ 3,

(7.33) fails since |1 − 3| = |−2|, and x_n ≡ 1 is a solution that is neither Benford nor identically zero. z

Similarly to the main results in Section 7.3, though it is definitive in theory, Theorem 7.41 may not always be easy to use in practice. Part (iii) of the following example illustrates that practical difficulties may arise in applying Theorem 7.41 even if d = 2 and a_1, a_2 are both integers.

Example 7.44. Let 0 < a < √3 be a real number and consider the second-order difference equation

x_n = −2a x_{n−1} − 3x_{n−2},  n ≥ 3,    (7.34)

for which Z = {z ∈ C : z² + 2az + 3 = 0} = {−a ± ı√(3 − a²)}. Consider the Benford property for solutions (x_n) of (7.34) for three specific values of the parameter a; Figure 7.3 shows data corresponding to x_1 = x_2 = 1.

(i) Let a = √3 cos(π/√11) ≈ 1.011. In this case, Z = {√3 e^{±πı(1−1/√11)}}, and ∆Z = {1/√11, 1, 2 − 1/√11}. Since log √3 ∉ span_Q ∆Z = span_Q{1, 1/√11}, the set Z is nonresonant, and every solution of (7.34), except for x_n ≡ 0, is Benford.

(ii) Let a = √3 cos(π·(5/8)·log 3) ≈ 1.025. Here span_Q ∆Z = span_Q{1, log 3}, and since the latter clearly contains log 3, the set Z is resonant. It follows that no solution of (7.34) is Benford in this case.

(iii) Finally, let a = 1, in which case (7.34) takes the innocent-looking form (7.9). Since a_2 = −3 < 0, the easy-to-check condition (7.33) does not apply.


Moreover, Z = {−1 ± ı√2}, and as already mentioned in Example 7.27, it is unknown whether the set Z is nonresonant. If it were, then every solution of (7.9) except for x_n ≡ 0 would be Benford; otherwise, none would. It may not have escaped the reader that the solution (x_n) of (7.34) with x_1 = x_2 = 1, i.e., (x_n) = (1, 1, −5, 7, 1, −23, 43, …), is nothing other than the (integer) sequence (−[A^n]_{11} + [A^n]_{12} + [A^n]_{13}) considered in Example 7.27. Thus experimental evidence seems to suggest that Z is nonresonant; see Figure 7.3. z

Figure 7.3: For different values of the parameter a, the solutions (x_n) of (7.34) may or may not be Benford; see Example 7.44 and Figure 7.2.
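For second-order equations with integer coefficients, condition (7.33) from Example 7.43 is easy to test mechanically; a sketch (the function name is ours):

```python
def all_nontrivial_solutions_benford(a1, a2):
    """Condition (7.33): with a1, a2 integers and a2 > 0, every solution of
    x_n = a1*x_{n-1} + a2*x_{n-2} other than x_n = 0 is Benford iff
    a1 != 0 and |10^(2k) - a2| != |a1|*10^k for k = 0, ..., floor(log10 a2)."""
    if a1 == 0:
        return False
    # len(str(a2)) - 1 equals floor(log10 a2) exactly for positive integers
    for k in range(len(str(a2))):
        if abs(10 ** (2 * k) - a2) == abs(a1) * 10 ** k:
            return False
    return True

print(all_nontrivial_solutions_benford(1, 1))    # Fibonacci (7.7): True
print(all_nontrivial_solutions_benford(-2, 2))   # True
print(all_nontrivial_solutions_benford(-2, 3))   # False, since |1 - 3| = |-2|
```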

7.6

LINEAR DIFFERENTIAL EQUATIONS

Just like their discrete-time counterparts, linear differential equations are important for both theoretical and practical purposes. In the theory of differential equations, they constitute the simplest and most fundamental class of systems, and for many applications they provide basic models of real-world processes [77]. As this section demonstrates, they also form a rich source of Benford functions.

Example 7.45. With x_0, v_0 ∈ R, consider initial value problems (IVPs) for three simple linear second-order differential equations. Throughout, x = x(t) denotes a smooth function, and ẋ, ẍ, etc. are, respectively, the first, second, etc. derivative of x with respect to t.


(i) The unique solution x = x(t) of

ẍ − x = 0,  x(0) = x_0, ẋ(0) = v_0,

is x(t) = (1/2)(x_0 + v_0)e^t + (1/2)(x_0 − v_0)e^{−t}, which by Example 4.9(i) and Theorem 4.12(ii) is a Benford function unless x_0 = v_0 = 0.

(ii) The unique solution of

ẍ + x = 0,  x(0) = x_0, ẋ(0) = v_0,

is x(t) = x_0 cos t + v_0 sin t and hence is not Benford for any x_0, v_0, by Theorem 4.10.

(iii) The unique solution of the IVP

ẍ + 2ẋ + 3x = 0,  x(0) = x_0, ẋ(0) = v_0,

is

x(t) = x_0 e^{−t} cos(t√2) + ((x_0 + v_0)/√2) e^{−t} sin(t√2) = e^{−t} ℜ(c e^{ıt√2}),

with c = x_0 − ı(x_0 + v_0)/√2. If x does not vanish identically, i.e., if c ≠ 0, fix any δ > 0 and note that x(nδ) = a^n f(nδ/(π√2)) for all n ∈ N, with a = e^{−δ} and f(t) = ℜ(c e^{2πıt}). Since the real number π log e is transcendental [158],

δ/(π√2) ∉ Q  and  δ log e ∉ span_Q{1, δ/(π√2)}

for all but countably many δ > 0, and in that case Proposition 7.22(i) with θ = δ/(π√2), b = 0, and ε_n ≡ 0 shows that the sequence (x(nδ)) is Benford. By Proposition 4.8, therefore, the function x is Benford as well. z
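The sampling in Example 7.45 can be illustrated for case (i) with x_0 = v_0 = 1, where the solution is simply x(t) = e^t. A sketch (illustrative only; computing first digits via log10 avoids floating-point overflow for large t):

```python
import math

# Example 7.45(i) with x0 = v0 = 1 gives x(t) = e^t.  The first significant
# digit of e^t is floor(10^{(t log10 e) mod 1}); tally over t = n * delta.
N = 100000
delta = 0.1
theta = delta * math.log10(math.e)
counts = [0] * 10
for n in range(1, N + 1):
    frac = (n * theta) % 1.0
    counts[int(10 ** frac)] += 1

for d in range(1, 10):
    print(d, counts[d] / N, round(math.log10(1 + 1 / d), 5))
```

The observed frequencies approach log10(1 + 1/d), in line with x being a Benford function.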

Using matrix-vector notation, all three IVPs above can be written in the form

(d/dt)(ẋ, x)⊤ = A (ẋ, x)⊤,  (ẋ, x)⊤|_{t=0} = (v_0, x_0)⊤,    (7.35)

with the 2×2-matrix A given, respectively, by

[0 1; 1 0],  [0 −1; 1 0],  [−2 −3; 1 0].

The (unique) solution of (7.35) is

(ẋ, x)⊤ = e^{tA} (v_0, x_0)⊤,    (7.36)

with the matrix exponential e^{tA} = Σ_{k=0}^∞ (t^k/k!) A^k. Recall that the d×d-matrix e^{tA} is well-defined for all t ∈ R and any A ∈ R^{d×d}, and (d^k/dt^k) e^{tA} = e^{tA} A^k for k = 0, 1, …. From (7.36) it is clear that

x(t) = v_0 [e^{tA}]_{21} + x_0 [e^{tA}]_{22},  t ≥ 0.
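The matrix exponential in (7.36) can be approximated by truncating its defining series; the sketch below (illustrative only, not a production algorithm) recovers [e^{tA}]_{21} = sinh t and [e^{tA}]_{22} = cosh t for the matrix A of IVP (i):

```python
import math

def expm2(A, t, terms=40):
    """Truncated series e^{tA} = sum_k (tA)^k / k! for a 2x2 matrix A
    (a sketch; adequate for moderate |t|)."""
    S = [[1.0, 0.0], [0.0, 1.0]]       # running sum, the k = 0 term
    term = [[1.0, 0.0], [0.0, 1.0]]    # current term (tA)^k / k!
    for k in range(1, terms):
        term = [[sum(term[i][m] * A[m][j] for m in range(2)) * t / k
                 for j in range(2)] for i in range(2)]
        S = [[S[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return S

# IVP (i): A = [0 1; 1 0], so x(t) = v0 [e^{tA}]_{21} + x0 [e^{tA}]_{22}
#        = v0 sinh t + x0 cosh t = (1/2)(x0+v0) e^t + (1/2)(x0-v0) e^{-t}
A = [[0.0, 1.0], [1.0, 0.0]]
E = expm2(A, 1.0)
print(abs(E[1][0] - math.sinh(1.0)) < 1e-12,
      abs(E[1][1] - math.cosh(1.0)) < 1e-12)
```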


In other words, x(t) = h(e^{tA}) with h ∈ L_2 determined by the initial values x_0, v_0, namely, h = v_0 [·]_{21} + x_0 [·]_{22}. In complete analogy to the discrete-time context of previous sections, this suggests that the following question be studied: Given A ∈ R^{d×d}, is the function h(e^{tA}) either Benford or identically zero for every h ∈ L_d? The main result of this section, Theorem 7.49 below, provides a necessary and sufficient condition for this to be the case. To formulate the result, the following tailor-made terminology is convenient; here, e^Z = {e^z : z ∈ Z} for every Z ⊂ C.

Definition 7.46. A set Z ⊂ C is exponentially nonresonant if the set e^{tZ} is nonresonant for some t ∈ R; otherwise Z is exponentially resonant.

Example 7.47. The empty set is exponentially nonresonant, and the singleton {z} is exponentially nonresonant if and only if ℜz ≠ 0. Similarly, the set {z, z̄} with z ∈ C \ R is exponentially nonresonant if and only if π log e · ℜz/ℑz ∉ Q. z

In deciding whether a given set Z ⊂ C is exponentially nonresonant, the following proposition is often useful; its routine verification is left to the reader. Recall that a set is countable if it is either finite (possibly empty) or countably infinite.

Proposition 7.48. Assume Z ⊂ C is symmetric with respect to the real axis, i.e., Z = {z̄ : z ∈ Z}, and is countable. Then the following are equivalent:

(i) Z is exponentially nonresonant;

(ii) For every z ∈ Z,

ℜz ∉ span_Q {ℑw/(π log e) : w ∈ Z, ℜw = ℜz}.    (7.37)

Moreover, if (i) and (ii) hold then the set {t ∈ R : e^{tZ} is resonant} is countable.

Remark. The symmetry and countability assumptions are essential in Proposition 7.48. If Z is not symmetric with respect to the real axis then the implication (i)⇒(ii) may fail, as is seen for Z = {1 + ıπ log e}, which is exponentially nonresonant yet does not satisfy (ii). Conversely, if Z is uncountable then (ii)⇒(i) may fail. To see this, take Z = R \ {0}, which satisfies (ii), and yet e^{tZ} is resonant for every t ∈ R. Recall, however, that if Z = σ(A) for some A ∈ R^{d×d}, then clearly Z is both symmetric with respect to the real axis and countable (in fact, Z is finite).

Section 7.3 has demonstrated how the Benford property of sequences (h(A^n)) with h ∈ L_d depends on whether the spectrum σ(A) of the matrix A is nonresonant. The reader may suspect that it is the exponential nonresonance of σ(A) that guarantees the Benford property for functions h(e^{tA}), and indeed this is the case. The following theorem is an analogue of Theorem 7.21 in the continuous-time setting. It extends to arbitrary dimension the simple fact that


for f(t) = e^{ta} with a ∈ R to be Benford, it is necessary and sufficient that a ≠ 0 or, equivalently, that the set {a} be exponentially nonresonant. As in the case of Theorem 7.21, a proof of Theorem 7.49 below is given here only under an additional assumption on the matrix A, and the reader is referred to [14] for a complete proof not making use of that assumption.

Theorem 7.49 ([14]). For every A ∈ R^{d×d} the following are equivalent:

(i) The set σ(A) is exponentially nonresonant;

(ii) For every h ∈ L_d the function h(e^{tA}) is Benford or identically zero.

Proof. Assume first that σ(A) is exponentially nonresonant and, given any h ∈ L_d, define g : [0, +∞) → R as g(t) = h(e^{tA}). By Proposition 7.48, the set σ(e^{δA}) = e^{δσ(A)} is nonresonant for all but countably many δ > 0. In that case, Theorem 7.21 implies that (g(nδ)) = (h(e^{nδA})) is either Benford or identically zero. (Recall that e^{δA} is invertible.) With inf ∅ := +∞, let

δ_0 = inf{δ > 0 : g(nδ) = 0 for all n ∈ N} ≥ 0.

If δ_0 = 0 then there exists a strictly decreasing sequence (δ_n) with δ_n ց 0 and g(δ_n) = 0 for every n. Since g is real-analytic, it follows that g(t) ≡ 0. If, on the other hand, δ_0 > 0 (possibly δ_0 = +∞) then (log|g(nδ)|) is u.d. mod 1 for almost all 0 < δ < δ_0, and Proposition 4.8(i) shows that log|g(t)| is u.d. mod 1. In other words, the function g is Benford. Thus (i)⇒(ii).

As indicated above, similarly to the discrete-time setting of Section 7.3, the reverse implication (ii)⇒(i) will be proved here only under an additional assumption which, however, is satisfied by most A ∈ R^{d×d}. Specifically, in analogy to (7.18), assume that

#(σ(A) ∩ (x + ıR)) ≤ 2  for all x ∈ R,    (7.38)

i.e., for every x ∈ R the matrix A has at most two eigenvalues λ with ℜλ = x, and hence either σ(A) ∩ (x + ıR) = {x} or else σ(A) ∩ (x + ıR) = {x + ıy, x − ıy} with some y > 0. Thus, to establish the implication (ii)⇒(i), assume that σ(A) is exponentially resonant. By (7.37), there exists λ ∈ σ(A) such that

ℜλ ∈ span_Q {ℑµ/(π log e) : µ ∈ σ(A), ℜµ = ℜλ}.    (7.39)

Letting λ = x + ıy for convenience, assume first that the set σ(A) ∩ (x + ıR) is a singleton. Thus λ = x ∈ R, and in fact λ = 0, by (7.39). Let u ∈ R^d be any unit eigenvector of A corresponding to the eigenvalue λ = 0. Then e^{tA}u = u for all t ≥ 0, and, with h_0 = Σ_{j,k=1}^d u_j u_k [·]_{jk}, the function g(t) = h_0(e^{tA}) ≡ 1 is neither Benford nor zero. If, on the other hand, σ(A) ∩ (x + ıR) = {x + ıy, x − ıy} with y > 0, then there exist linearly independent unit vectors u, v ∈ R^d with

Au = x u − y v,  Av = y u + x v,


and consequently

e^{tA}u = e^{xt}(cos(yt)u − sin(yt)v),  e^{tA}v = e^{xt}(sin(yt)u + cos(yt)v).

Letting h_0 = Σ_{j,k=1}^d (u_j + v_j)(u_k + v_k)[·]_{jk} yields

g(t) = h_0(e^{tA}) = (u + v)⊤ e^{tA} (u + v) = 2(1 + u⊤v) e^{xt} cos(yt),  t ≥ 0.

Recall that 1 + u⊤v > 0 by the Cauchy–Schwarz Inequality, so g is not identically zero. Recall from (7.39) that qxπ log e = py with the appropriate p ∈ Z and q ∈ N. Fix δ > 0 such that yδ/π is irrational, and observe that for all n ∈ N,

g(nδ)^q = e^{qxnδ} 2^q (1 + u⊤v)^q cos(ynδ)^q = a^n f(nθ),

with a = e^{qxδ}, θ = (1/2)yδ/π, and f(t) = 2^q (1 + u⊤v)^q cos(2πt)^q. Since by assumption log a = qxδ log e = 2pθ ∈ span_Q{1, θ} for every δ > 0, Proposition 7.22(ii) with b = 0 and ε_n ≡ 0 shows that the sequence (g(nδ)^q) is not Benford, and neither is (g(nδ)). In fact, as in [16, Lem. 2.6], it is not hard to see that the sequence (log|g(nδ)^q|) is distributed modulo one according to an a.c. probability measure P ≠ λ_{0,1} on [0, 1), which is independent of δ as long as yδ/π is irrational. This means that there exists a non-zero integer k such that

0 ≠ P̂(k) = lim_{N→∞} (1/N) Σ_{n=1}^N e^{2πık log|g(nδ)^q|}

for almost all δ > 0. Thus, writing log|g(nδ)^q| as g_n(δ) for convenience,

P̂(k) = lim_{N→∞} (1/N) Σ_{n=1}^N ∫_0^1 e^{2πık g_n(δ)} dδ
     = lim_{N→∞} (1/N) Σ_{n=1}^N (1/n) ∫_0^n e^{2πık g_1(t)} dt    (7.40)
     = lim_{N→∞} (1/N) Σ_{n=1}^N (1/n) Σ_{i=1}^n ∫_{i−1}^i e^{2πık g_1(t)} dt,

where the first equality follows from the Dominated Convergence Theorem. Next, recall from [71, Thm. 92] that if lim_{N→∞} N^{−1} Σ_{n=1}^N n^{−1} Σ_{i=1}^n z_i exists for a bounded sequence (z_n) in C, then lim_{N→∞} N^{−1} Σ_{n=1}^N z_n also exists and has the same value. Specifically, since the sequence (z_n) with z_n = ∫_{n−1}^n e^{2πık g_1(t)} dt is clearly bounded, (7.40) implies that

0 ≠ P̂(k) = lim_{N→∞} (1/N) Σ_{n=1}^N ∫_{n−1}^n e^{2πık g_1(t)} dt = lim_{N→∞} (1/N) ∫_0^N e^{2πık g_1(t)} dt,

and by the continuous-time version of Weyl's criterion [93, Thm. I.9.2], the function g_1 = log|g^q| is not u.d. mod 1. Thus the function g^q is not Benford, and neither is g, by Theorem 4.4. In summary, if σ(A) is exponentially resonant then (ii) fails. In other words, (ii)⇒(i), and the proof is complete. □


Remarks. (i) It follows from (7.37) that σ(A) is exponentially resonant whenever ℜλ = 0 for some λ ∈ σ(A), i.e., whenever A has an eigenvalue on the imaginary axis. Moreover, if σ(A) ⊂ ℝ, that is, if all eigenvalues of A are real, then the converse is also true, i.e., σ(A) is exponentially resonant only if 0 ∈ σ(A).

(ii) As in the discrete-time setting of Section 7.3, it is not hard to check that the set

    {A ∈ ℝ^{d×d} : σ(A) is exponentially resonant}

is of first category and has Lebesgue measure zero. For most d×d matrices A, therefore, the function h(e^{tA}) is either Benford or identically zero for every h ∈ L_d.

Example 7.50. This example reviews Example 7.45 in the light of Theorem 7.49. In each case, the latter is consistent with the observations made in that example.

(i) The matrix A = [0 1; 1 0] has eigenvalues λ = ±1; hence σ(A) is exponentially nonresonant.

(ii) For B = [0 −1; 1 0], the spectrum σ(B) = {±ı} is exponentially resonant.

(iii) The eigenvalues of C = [−2 −3; 1 0] are λ = −1 ± ı√2. Since π log e is transcendental, −1 ∉ span_ℚ{√2/(π log e)}, and so σ(C) is exponentially nonresonant. (By contrast, recall from Example 7.27 that it is apparently unknown whether σ(C) is nonresonant.) ♦

Example 7.51. The eigenvalues of A = [1 1; 1 0] from (7.2) are both real and non-zero. Hence σ(A) is exponentially nonresonant, and every function h(e^{tA}) is either Benford or zero. This is also evident from the explicit formula

    e^{tA} = (e^{tϕ}/(ϕ + 2))(I_2 + ϕA) + (e^{−t/ϕ}/(ϕ + 2))((1 + ϕ)I_2 − ϕA),

which shows that h(e^{tA}) ≡ 0 if and only if h(I_2) = h(A) = 0. ♦
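Exponential nonresonance in Example 7.51 can be probed numerically. The sketch below (an illustration, not from the book; the observable, time horizon, and grid size are arbitrary choices) evaluates h(e^{tA}) for A = [1 1; 1 0], with h reading off the (1,1) entry, and compares first-digit frequencies along a fine time grid with the Benford probabilities log(1 + 1/d):

```python
import numpy as np

# Illustration of Example 7.51 (a sketch, not from the book): for
# A = [[1, 1], [1, 0]] the spectrum {phi, -1/phi} is exponentially
# nonresonant, so t -> h(e^{tA}) should be a Benford function for any
# nontrivial h; here h reads off the (1,1) entry of e^{tA}.
A = np.array([[1.0, 1.0], [1.0, 0.0]])
w, V = np.linalg.eig(A)                  # A = V diag(w) V^{-1}
Vi = np.linalg.inv(V)

t = np.linspace(0.0, 300.0, 200001)[1:]  # sample times in (0, 300]
# [e^{tA}]_{11} for all t at once: sum_j V[0,j] * exp(t*w[j]) * Vi[j,0]
x = np.abs(((V[0, :] * Vi[:, 0]) @ np.exp(np.outer(w, t))).real)

digits = np.floor(10.0 ** np.mod(np.log10(x), 1.0)).astype(int)
freq = np.bincount(digits, minlength=10)[1:10] / len(t)
benford = np.log10(1.0 + 1.0 / np.arange(1, 10))
print(np.max(np.abs(freq - benford)))    # small, consistent with Theorem 7.49
```

The maximal deviation shrinks as the time horizon grows, in line with Theorem 7.49.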

For the nonlinear observables h(A) = |A| and h(A) = |Au|, the following analogue of Theorem 7.28 holds.

Theorem 7.52. Let A ∈ ℝ^{d×d}. If σ(A) is exponentially nonresonant then:

(i) The function |e^{tA}| is Benford;

(ii) For every u ∈ ℝ^d \ {0} the function |e^{tA}u| is Benford.

Proof. Since σ(A) is exponentially nonresonant, by Proposition 7.48 the set σ(e^{δA}) is nonresonant for all but countably many δ > 0, and e^{δA} is invertible whenever δ > 0. By Theorem 7.28(i), the sequence (|e^{nδA}|) is Benford in this case, and Proposition 4.8(i) shows that the function |e^{tA}| is Benford as well.


Similarly, for every u ∈ ℝ^d, the sequence (|e^{nδA}u|) is Benford or identically zero, and the latter can happen only if u = 0. For u ≠ 0, therefore, (|e^{nδA}u|) is Benford for almost all δ > 0, and, again by Proposition 4.8(i), the function |e^{tA}u| is Benford. □

Example 7.53. (i) For the matrix A = [0 1; 1 0] in Example 7.45, the set σ(A) is exponentially nonresonant; hence Theorem 7.52 applies, showing that the functions |e^{tA}| and |e^{tA}u| with u ∈ ℝ² \ {0} are Benford. This can also be seen directly by means of Example 4.9(i) and Theorem 4.12(ii), since e^{tA} = I_2 cosh t + A sinh t, so |e^{tA}| = e^t, and, for any u ∈ ℝ²,

    |e^{tA}u| = √(½(u_1 + u_2)² e^{2t} + ½(u_1 − u_2)² e^{−2t}).

(ii) The spectrum of the matrix B = [0 −1; 1 0] is exponentially resonant, and e^{tB} = I_2 cos t + B sin t, so |e^{tB}| ≡ 1 is not Benford. Neither is |e^{tB}u| ≡ |u| for any u ∈ ℝ²; see Figure 7.4.

(iii) As seen in Example 7.50, the spectrum of C = [−2 −3; 1 0] is exponentially nonresonant. By Theorem 7.52, |e^{tC}| is Benford, and so is |e^{tC}u| for every u ∈ ℝ² \ {0}. As in (i), this can also easily be confirmed via a direct calculation, since e^{tC} = I_2 e^{−t} cos(t√2) + (1/√2)(C + I_2) e^{−t} sin(t√2). ♦

Remarks. (i) As in the discrete-time setting, |·| in Theorem 7.52 can be replaced by any other norm on ℝ^{d×d} and ℝ^d, respectively.

(ii) Except for the trivial case d = 1, the converse of Theorem 7.52 does not hold in general: Even if σ(A) is exponentially resonant, the function |e^{tA}| may be Benford, and so may |e^{tA}u| for every u ∈ ℝ^d \ {0}. In fact, continuous-time variants of Example 7.30 show that properties (i) and (ii) in Theorem 7.52 cannot be characterized through properties of σ(A) or σ(e^{tA}) alone.

Recall from Examples 7.27 and 7.44 that it may be difficult in practice to decide whether σ(A) is nonresonant for a given A ∈ ℝ^{d×d}, even if d = 2 and all entries of A are integers. In many cases of practical interest, the situation regarding exponential nonresonance is much simpler.

Lemma 7.54. Assume that all entries of A ∈ ℝ^{d×d} are rational numbers. Then σ(A) is exponentially nonresonant if and only if A has no eigenvalues on the imaginary axis.

Proof. The "only if" part is immediate because eigenvalues of A on the imaginary axis always render σ(A) exponentially resonant. For the "if" part, assume A has no eigenvalues on the imaginary axis, and suppose σ(A) is exponentially resonant. Then, for some λ ∈ σ(A),

    (π log e) ℜλ ∈ span_ℚ{ℑµ : µ ∈ σ(A), ℜµ = ℜλ}.


Figure 7.4: With B, C as in Example 7.50, no function h(e^{tB}) is Benford, but every nontrivial function h(e^{tC}) is; see Example 7.53.
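The dichotomy shown in Figure 7.4 is easy to reproduce numerically. The following sketch (illustrative only; the seed, time horizon, and sample sizes are arbitrary choices) evaluates the spectral norms |e^{tB}| and |e^{tC}| at random times via an eigendecomposition:

```python
import numpy as np

# Numerical companion to Figure 7.4 (illustrative sketch, not from the
# book): |e^{tB}| is constant since B generates rotations, while sigma(C)
# is exponentially nonresonant, so |e^{tC}| should be a Benford function.
B = np.array([[0.0, -1.0], [1.0, 0.0]])
C = np.array([[-2.0, -3.0], [1.0, 0.0]])

def expm_batch(M, ts):
    """e^{t M} for each t in ts, via eigendecomposition of the 2x2 M."""
    w, V = np.linalg.eig(M.astype(complex))
    Vi = np.linalg.inv(V)
    return np.einsum('ij,tj,jk->tik', V, np.exp(np.outer(ts, w)), Vi).real

rng = np.random.default_rng(1)
t = rng.uniform(0.0, 500.0, 20000)

normB = np.linalg.svd(expm_batch(B, t[:200]), compute_uv=False)[:, 0]
print(np.allclose(normB, 1.0))           # |e^{tB}| = 1 for all t: not Benford

normC = np.linalg.svd(expm_batch(C, t), compute_uv=False)[:, 0]
digits = np.floor(10.0 ** np.mod(np.log10(normC), 1.0)).astype(int)
freq = np.bincount(digits, minlength=10)[1:10] / len(t)
benford = np.log10(1.0 + 1.0 / np.arange(1, 10))
print(np.max(np.abs(freq - benford)))    # small: first digits follow Benford
```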

Since ℜλ ≠ 0, this means that π log e is a rational linear combination of the algebraic numbers ℑµ/ℜλ. This, however, is impossible because π log e is transcendental. Thus σ(A) is exponentially nonresonant. □

Remark. As the above proof shows, Lemma 7.54 remains valid if "rational" is replaced by "algebraic" (over ℚ). Thus, for instance, the spectrum of [√2 √3; √5 √7] is exponentially nonresonant.

Recall from Section 7.5 that the Benford property for solutions of linear difference equations follows easily from results on sequences (h(A^n)). Similarly, Theorem 7.49 can be used to study the Benford property for solutions of scalar linear differential equations of any order. More concretely, consider the linear differential equation

    x^{(d)} + a_1 x^{(d−1)} + ⋯ + a_{d−1} ẋ + a_d x = 0,   (7.41)


where, as usual, d is a positive integer, referred to as the order of (7.41), and a_1, a_2, ..., a_d are real numbers with a_d ≠ 0. (For instance, d = 2 for all three differential equations in Example 7.45.) Given any u ∈ ℝ^d, there exists a unique solution x = x(t) of (7.41) with x^{(j−1)}(0) = u_j for all j ∈ {1, 2, ..., d}; e.g., see [77]. Using Theorem 7.49, it is straightforward to decide whether all nontrivial solutions of (7.41) are Benford.

Theorem 7.55. For the differential equation (7.41) with a_1, a_2, ..., a_d ∈ ℝ and a_d ≠ 0 the following are equivalent:

(i) The set {z ∈ ℂ : z^d + a_1 z^{d−1} + ⋯ + a_{d−1} z + a_d = 0} is exponentially nonresonant;

(ii) Every solution x = x(t) of (7.41) is Benford unless x(t) ≡ 0.

Proof. In order to make (7.41) amenable to Theorem 7.49, rewrite it as the first-order system

    d/dt (x^{(d−1)}, x^{(d−2)}, x^{(d−3)}, ..., x)^⊤ = A (x^{(d−1)}, x^{(d−2)}, x^{(d−3)}, ..., x)^⊤.

With the d×d matrix A given by

    A = [ −a_1  −a_2  ⋯  −a_{d−1}  −a_d ]
        [   1     0   ⋯     0        0  ]
        [   0     1   ⋱     ⋮        ⋮  ]
        [   ⋮          ⋱     0        0  ]
        [   0     ⋯    0     1        0  ],

therefore, any solution x = x(t) of (7.41) can be represented as

    x(t) = x^{(d−1)}(0)[e^{tA}]_{d1} + x^{(d−2)}(0)[e^{tA}]_{d2} + ⋯ + ẋ(0)[e^{tA}]_{d,d−1} + x(0)[e^{tA}]_{dd},

which is of the form x(t) = h(e^{tA}) with the appropriate h ∈ L_d. Moreover, note that

    det(A − zI_d) = (−1)^d (z^d + a_1 z^{d−1} + ⋯ + a_{d−1} z + a_d),

and, consequently, σ(A) = {z ∈ ℂ : z^d + a_1 z^{d−1} + ⋯ + a_{d−1} z + a_d = 0}. Finally, observe that, as in the discrete-time setting, the function f(t) = h(e^{tA}) solves (7.41) for every h ∈ L_d because f^{(k)}(t) = h(e^{tA} A^k) for k = 0, 1, 2, ..., and so

    f^{(d)}(t) = h(e^{tA} A^d) = h(e^{tA}(−a_1 A^{d−1} − a_2 A^{d−2} − ⋯ − a_{d−1} A − a_d I_d))
               = −a_1 h(e^{tA} A^{d−1}) − a_2 h(e^{tA} A^{d−2}) − ⋯ − a_{d−1} h(e^{tA} A) − a_d h(e^{tA})
               = −a_1 f^{(d−1)}(t) − a_2 f^{(d−2)}(t) − ⋯ − a_{d−1} ḟ(t) − a_d f(t).


With these preparations, the asserted equivalence is now easily established. If {z ∈ ℂ : z^d + a_1 z^{d−1} + ⋯ + a_{d−1} z + a_d = 0} = σ(A) is exponentially nonresonant then h(e^{tA}) is Benford or trivial for every h ∈ L_d, by Theorem 7.49. In particular, x = x(t) either is Benford, or else x(0) = ẋ(0) = ⋯ = x^{(d−1)}(0) = 0, in which case x(t) ≡ 0. Conversely, if {z ∈ ℂ : z^d + a_1 z^{d−1} + ⋯ + a_{d−1} z + a_d = 0} is exponentially resonant then, again by Theorem 7.49, there exists h_0 ∈ L_d such that the function h_0(e^{tA}) is neither Benford nor identically zero, and the same is true for the solution x(t) = h_0(e^{tA}) of (7.41). □

Example 7.56. (i) Let a_1, a_2 ∈ ℝ be rational (or at least algebraic). By Lemma 7.54 and Theorem 7.55, every solution of ẍ + a_1 ẋ + a_2 x = 0, except x(t) ≡ 0, is Benford if and only if {z ∈ ℂ : z² + a_1 z + a_2 = 0} does not intersect the imaginary axis. This is readily confirmed to be equivalent to

    a_1 a_2 ≠ 0 or a_2 < 0.   (7.42)

In Example 7.45(i), a_2 = −1, whereas in Example 7.45(iii), a_1 a_2 = 6. In both cases, therefore, (7.42) holds and shows again that all nontrivial solutions of the respective IVPs are Benford. On the other hand, for Example 7.45(ii), a_1 = 0 and a_2 = 1, so (7.42) fails. This is consistent with the fact that no solution of the corresponding IVP is Benford.

(ii) Similarly, if a_1, a_2, a_3 ∈ ℝ are rational (or algebraic) and a_3 ≠ 0 then, except for x(t) ≡ 0, every solution of x^{(3)} + a_1 ẍ + a_2 ẋ + a_3 x = 0 is Benford if and only if

    a_1 a_2 ≠ a_3 or a_2 ≤ 0.   (7.43)

Thus every nontrivial solution of x^{(3)} + x = 0 or x^{(3)} + 2ẍ + 3ẋ + 4x = 0, for instance, is Benford. On the other hand, (7.43) fails for x^{(3)} + ẍ + ẋ + x = 0; for example, x(t) = cos t is a nontrivial solution that is not Benford, by Theorem 4.10. ♦

Example 7.57. For every a, b ∈ ℝ with b ≠ 0, the functions x(t) = e^{at} cos(bt) and x(t) = e^{at} sin(bt) both solve ẍ − 2aẋ + (a² + b²)x = 0. By (7.42), if a and b are rational (or algebraic) and a ≠ 0 then x is Benford. In fact, from (7.37) and Theorem 7.55 it is clear that x is Benford if and only if the number aπ log e/b is irrational; this is automatically the case whenever a and b are non-zero and rational (or algebraic). ♦
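Criterion (7.42) can be illustrated with the solutions from Example 7.57. The sketch below (not from the book; the parameters a = b = 1 and the sampling scheme are arbitrary choices) compares first-digit statistics of e^t cos t, which should be Benford, with those of cos t, which should not:

```python
import numpy as np

# Sketch for Example 7.57 (illustrative, not from the book): with a = b = 1,
# x(t) = e^t cos t solves x'' - 2x' + 2x = 0 and is a Benford function,
# whereas a = 0 gives x(t) = cos t, which is not Benford.
rng = np.random.default_rng(2)
t = rng.uniform(0.0, 500.0, 100000)

def digit_freq(x):
    x = np.abs(np.asarray(x, dtype=float))
    x = x[x > 0.0]                       # guard against exact zeros
    d = np.floor(10.0 ** np.mod(np.log10(x), 1.0)).astype(int)
    return np.bincount(d, minlength=10)[1:10] / len(x)

benford = np.log10(1.0 + 1.0 / np.arange(1, 10))
dev_benford = np.max(np.abs(digit_freq(np.exp(t) * np.cos(t)) - benford))
freq_cos = digit_freq(np.cos(t))
print(dev_benford)                       # small
print(np.round(freq_cos, 3))             # mass piles up on digits 1 and 9
```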

Chapter Eight

Real-valued Random Processes

Benford's law arises naturally in a variety of stochastic settings, including products of independent random variables, mixtures of random samples from different distributions, and iterations of random maps. This chapter provides the reader with concepts and tools to analyze significant digits and significands for these basic random processes. Perhaps not surprisingly, Benford's law also arises in many other important fields of stochastics, such as geometric Brownian motion, random matrices, and Bayesian models, and the present chapter may serve as a preparation for specialized literature on these advanced topics [18, 56, 96, 108, 143, 144]. Throughout the chapter, recall that by Theorem 4.2 a random variable X is Benford if and only if log|X| is uniformly distributed modulo one.

8.1 CONVERGENCE OF RANDOM VARIABLES TO BENFORD'S LAW

The analysis of sequences of random variables, notably the special case of products and sums of independent, identically distributed (i.i.d.) random variables, constitutes an important topic in classical probability theory. Within this context, the present section studies general scenarios that lead to Benford's law emerging as an "attracting" distribution. The results nicely complement the observations made in previous chapters for deterministic processes.

Recall from Chapter 3 that a (real-valued) random variable X is Benford by definition if P(S(X) ≤ t) = log t for all t ∈ [1, 10). Also, recall that a sequence (X_n) = (X_1, X_2, ...) of random variables converges in distribution to a random variable X, symbolically X_n →D X, if lim_{n→∞} P(X_n ≤ x) = P(X ≤ x) for every x ∈ ℝ for which P(X = x) = 0. Another important concept in limit theorems in probability theory is almost sure convergence. Specifically, the sequence (X_n) converges to X almost surely (a.s.), in symbols X_n →a.s. X, if P(lim_{n→∞} X_n = X) = 1. It is easy to check that X_n →a.s. X implies X_n →D X. The reverse implication does not hold in general, as is evident from any i.i.d. sequence (X_1, X_2, ...) for which X_1 is not constant: In this case, all X_n have the same distribution, so trivially X_n →D X_1, but P(lim_{n→∞} X_n exists) = 0. The following definition adapts these general notions to the context of Benford's law.


Definition 8.1. Let (X_n) = (X_1, X_2, ...) be a sequence of real-valued random variables.

(i) The sequence (X_n) converges in distribution to Benford's law if

    lim_{n→∞} P(S(X_n) ≤ t) = log t for all t ∈ [1, 10),

or, equivalently, if S(X_n) →D S(X), where X is a Benford random variable.

(ii) The sequence (X_n) is Benford with probability one if

    P((X_n) is a Benford sequence) = 1.

Note that the distribution function of Benford's law, F_B(t) = log t, is continuous. Consequently, if (X_n) converges in distribution to Benford's law then |F_{S(X_n)}(t) − log t| → 0 uniformly on [1, 10). The following example shows that the two notions of conformance to Benford's law appearing in Definition 8.1 are generally unrelated.

Example 8.2. (i) Let X be a Benford random variable and, for every n ∈ ℕ, define X_n = min{S(X), 10 − n^{−1}}. Clearly, (X_n) converges in distribution to Benford's law. To see this formally, notice that S(X_n) = X_n and so, for all t ∈ [1, 10) and n ∈ ℕ,

    |F_{S(X_n)}(t) − log t| = 0 if 1 ≤ t < 10 − n^{−1}, and = 1 − log t if 10 − n^{−1} ≤ t < 10;

in either case |F_{S(X_n)}(t) − log t| ≤ 1 − log(10 − n^{−1}) → 0 as n → ∞.

Denote by m_n = med X_n the (unique) median of X_n. Then P(X_n > m_n) = ½ = m_n^{−α_n}, so α_n = log 2 / log m_n. Thus α_n → 0 if and only if m_n → +∞, and the conclusion follows from Theorem 3.11, with the monotonicity of F_{S(X_α)}(t) → log t as α → 0 being used to establish the "only if" part. □

For another illustration of the above terminology, recall the characterization of periodic Benford functions provided by Theorem 4.10.

Example 8.4. Let the random variable X be uniform on (0, 1) and, for every n ∈ ℕ, let X_n = nX. Clearly, the sequence (X_n) does not converge in distribution or almost surely. However, if f is continuous and periodic with period p > 0, then f(X_n) →D f(U), where U is uniform on (0, p). In this case, as already seen in Theorem 4.10, the random variable f(U) is Benford if and only if f is a Benford function. ♦

8.2 POWERS, PRODUCTS, AND SUMS OF RANDOM VARIABLES

Recall from Theorem 4.16 the perhaps simplest example of a Benford sequence, namely, the sequence (a^n) = (a, a², a³, ...), which is Benford if and only if log|a| is irrational. This example can naturally be interpreted in probabilistic terms: The (deterministic) sequence (a^n) is, with probability one, the same as the (random) sequence (X_1, X_1X_2, X_1X_2X_3, ...), where the random variables X_j all have the same distribution, namely, P(X_j = a) = 1. Thus a natural random generalization of (a^n) is provided by the class of sequences of random variables

    (∏_{j=1}^n X_j) = (X_1, X_1X_2, ...),   X_1, X_2, ... identically distributed.   (8.1)

Random sequences of the form (8.1) can exhibit a wide variety of different behaviors. This section studies the Benford properties of (8.1) in two important special cases:


(i) The factors X_j are identical (and hence dependent in extremis), that is, with probability one, X_j = X_1 for every j;

(ii) The factors X_j are independent, i.e., X_1, X_2, ... are i.i.d. random variables.

Keep in mind that both scenarios contain the deterministic sequence (a^n) as a special case.

Powers of a random variable

If all factors X_1, X_2, ... are identical then the random sequence (8.1) simply takes the form (X^n), where X is any real-valued random variable. For such sequences, it is straightforward to characterize conformance to Benford's law in the sense of Definition 8.1.

Theorem 8.5. Let X be a random variable. Then (X^n) converges in distribution to Benford's law if and only if lim_{n→∞} E[e^{2πın log|X|}] = 0.

Proof. Define a sequence (Y_n) as Y_n = log S(X^n) = ⟨n log|X|⟩. Note that (X^n) converges in distribution to Benford's law if and only if Y_n →D U(0, 1), and recall from Lemma 4.20(v) that the latter is equivalent to lim_{n→∞} P̂_{Y_n}(k) = 0 for every k ∈ ℤ \ {0}. Thus, if lim_{n→∞} E[e^{2πın log|X|}] = 0 then, for every k > 0,

    P̂_{Y_n}(k) = E[e^{2πıkY_n}] = E[e^{2πıkn log|X|}] → 0 as n → ∞.

Since Y_n is real-valued, P̂_{Y_n}(−k) is the complex conjugate of P̂_{Y_n}(k); hence Y_n →D U(0, 1).

Conversely, if lim sup_{n→∞} |E[e^{2πın log|X|}]| > 0 then there exist δ > 0 and an increasing sequence (n_j) of positive integers such that |E[e^{2πın_j log|X|}]| ≥ δ for all j. But then, for every j ∈ ℕ,

    |P̂_{Y_{n_j}}(1)| = |E[e^{2πıY_{n_j}}]| = |E[e^{2πın_j log|X|}]| ≥ δ,

showing that (Y_n) does not converge in distribution to U(0, 1), and hence (X^n) does not converge in distribution to Benford's law. □

In fact, using Fourier analysis tools as in Lemma 4.20, together with Theorem 4.2 and the observation that

    P̂_{⟨nY⟩}(k) = P̂_{⟨Y⟩}(nk) for all n ∈ ℕ, k ∈ ℤ,

it is clear from Theorem 8.5 that (X^n) converges in distribution to Benford's law if and only if Y = log|X| satisfies P̂_{⟨Y⟩}(k) → 0 as |k| → ∞, i.e., precisely if P_{⟨Y⟩} is a so-called Rajchman probability. As Theorem 8.8 below implies, therefore, if a probability on [0, 1) is a.c. then it is Rajchman, but the converse is not true. Also, every Rajchman probability is continuous, and again the converse is not true; see [101]. Thus there are continuous random variables whose powers do not converge in distribution to Benford's law; see Example 8.9(ii) below.


Theorem 8.6. Let X be a random variable. Then the following are equivalent:

(i) The sequence (X^n) is Benford with probability one;

(ii) P(log|X| ∈ ℚ) = 0, i.e., log|X| is irrational with probability one;

(iii) P(S(X^m) = 1) = 0 for every m ∈ ℕ.

Proof. Denote by (Ω, A, P) the (abstract) probability space on which X is defined. By Theorem 4.16, for every ω ∈ Ω, the sequence

    (X(ω)^n) = (X(ω), X(ω)², X(ω)³, ...)

is Benford if and only if log|X(ω)| is irrational, which shows (i)⇔(ii). To see the equivalence of (ii) and (iii), simply note that, for every m ∈ ℕ and t ∈ [1, 10),

    {ω ∈ Ω : S(X(ω)^m) = t} = {ω ∈ Ω : log|X(ω)| ∈ (log t)/m + (1/m)ℤ},   (8.2)

and hence sup_{m∈ℕ} P(S(X^m) = 1) = P(log|X| ∈ ℚ). □

Example 8.2(ii) above shows that it is possible for a sequence (X^n) to be Benford with probability one yet not converge in distribution to Benford's law. The reverse implication, however, always holds.

Corollary 8.7. Let X be a random variable. If the sequence (X^n) converges in distribution to Benford's law then it is Benford with probability one.

Proof. Suppose that P((X^n) is a Benford sequence) < 1. By Theorem 8.6 there exist p ∈ ℤ and q ∈ ℕ such that P(|X| = 10^{p/q}) > 0. But then

    F_{S(X^{nq})}(1) = P(S(X^{nq}) ≤ 1) ≥ P(|X| = 10^{p/q}) > 0 for every n ∈ ℕ,

showing that the sequence (F_{S(X^n)}(1)) does not converge to 0 = log 1, and hence (X^n) does not converge in distribution to Benford's law. □

Many random variables encountered in practice have a density, and in this case Theorems 8.5 and 8.6 both apply.

Theorem 8.8. If the random variable X has a density then the sequence (X^n) converges in distribution to Benford's law and is Benford with probability one.

Proof. If X has a density then so has Y := ⟨log|X|⟩. Since the density of Y is integrable, the Riemann–Lebesgue Lemma implies that lim_{n→∞} E[e^{2πınY}] = 0, so by Theorem 8.5, (X^n) converges in distribution to Benford's law. The second assertion follows from Corollary 8.7. □
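Theorems 8.6 and 8.8 can be illustrated with a short simulation. In the sketch below (illustrative; the seed and sequence length are arbitrary choices), a single X ~ U(0, 1) is drawn, and the first digits of X^n are obtained from ⟨n log X⟩, which sidesteps the underflow of X^n itself:

```python
import numpy as np

# Sketch for Theorems 8.6/8.8 (illustrative): when X has a density,
# log10|X| is irrational with probability one, so the single realized
# sequence (X^n) is Benford.  First digits of X^n are read off from
# n*log10(X) mod 1, which avoids the underflow of X^n itself.
rng = np.random.default_rng(7)
x = rng.uniform(0.0, 1.0)                # one draw of X ~ U(0,1)
n = np.arange(1, 1000001)
mantissa = np.mod(n * np.log10(x), 1.0)  # log10 of the significand S(x^n)
digits = np.floor(10.0 ** mantissa).astype(int)
freq = np.bincount(digits, minlength=10)[1:10] / len(n)
benford = np.log10(1.0 + 1.0 / np.arange(1, 10))
print(np.max(np.abs(freq - benford)))    # small: (x^n) is a Benford sequence
```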


Example 8.9. (i) Let X be uniform on (0, 1). Then, for every n ∈ ℕ,

    F_{S(X^n)}(t) = Σ_{k=1}^∞ P(10^{−k/n} ≤ X ≤ t^{1/n} 10^{−k/n}) = (t^{1/n} − 1)/(10^{1/n} − 1),

and the proof of Theorem 3.11 shows that F_{S(X^n)}(t) increases to log t monotonically as n → ∞. In fact, a straightforward calculation yields that

    n(F_{S(X^n)}(t) − log t) → −log t(1 − log t)/(2 log e) as n → ∞, uniformly on [1, 10).   (8.3)

Thus F_{S(X^n)} converges to F_B at the (polynomial) rate (n^{−1}); see Figure 8.1.

(ii) Let Y be uniformly distributed on the classical Cantor middle-thirds set. In more probabilistic terms, Y = 2 Σ_{j=1}^∞ 3^{−j} Y_j, where the Y_j are i.i.d. with P(Y_1 = 0) = P(Y_1 = 1) = ½. Let X be the continuous random variable defined by X = 10^Y. Then X is not Benford since Y is not U(0, 1). But since ⟨Y⟩ = ⟨3^n Y⟩ for all n ∈ ℕ, the distribution of S(X) is the same as the distribution of S(X³), and of S(X^{3^n}) for all n ∈ ℕ. Thus even though X is continuous, (X^n) does not converge in distribution to Benford's law. Recall from Theorem 5.13 that the only continuous positive random variables X which have identically distributed significands S(X^n) for all n ∈ ℕ are the Benford random variables. ♦

Note that if X has an atom, i.e., P(X = x) = δ for some x ∈ ℝ and δ > 0, then (X^n) cannot converge in distribution to Benford's law, since for every n ∈ ℕ the probability distribution P_{S(X^n)} has an atom of size at least δ, so either F_{S(X^n)} does not converge as n → ∞, or else it converges to a discontinuous, i.e., non-Benford, limit. Thus if (X^n) converges in distribution to Benford's law then X is continuous, and

    |F_{S(X^n)}(t) − log t| → 0 as n → ∞, uniformly on [1, 10).   (8.4)

The rate of convergence in (8.4) can often be inferred from a Fourier series representation for F_{S(X^n)}: With

    φ_k := P̂_{⟨log|X|⟩}(k) = E[e^{2πık log|X|}],   k ∈ ℤ,

it follows directly from [139, eq. (3)] that, for every n ∈ ℕ,

    F_{S(X^n)}(t) − log t = (2/π) Σ_{k=1}^∞ (sin(πk log t)/k) ℜ(φ_{nk} e^{−πık log t}),   t ∈ [1, 10).   (8.5)

The next example illustrates the usefulness of (8.5) for three concrete a.c. distributions.

Example 8.10. (i) Let X = U(0, 1) as in Example 8.9(i) above. Then

    φ_k = ∫_0^1 e^{2πık log x} dx = ∫_{−∞}^0 e^{2πıky} (10^y/log e) dy = 1/(1 + ı2πk log e),   k ∈ ℤ,


Figure 8.1: If the random variable X is distributed uniformly on (0, 1) then the distribution of S(X^n) converges to Benford's law as n → ∞; see Example 8.9(i).

and (8.5) immediately yields the estimate, for every n ∈ ℕ,

    n max_{t∈[1,10)} |F_{S(X^n)}(t) − log t| ≤ (2n/π) Σ_{k=1}^∞ |φ_{nk}|/k
        = (2n/π) Σ_{k=1}^∞ 1/(k √(1 + 4π²n²k²(log e)²))
        < (1/(π² log e)) Σ_{k=1}^∞ 1/k² = 1/(6 log e),

which is consistent with (8.3) and nearly best possible, as the latter implies that

    lim_{n→∞} n max_{t∈[1,10)} |F_{S(X^n)}(t) − log t| = 1/(8 log e).

(ii) Let X > 0 have the density

    f_X(x) = (log e)/(πx) · 1/(1 + (log x)²),   x > 0;

in other words, the random variable log X is standard Cauchy-distributed. In this case,

    φ_k = ∫_0^{+∞} e^{2πık log x} (log e)/(πx) · 1/(1 + (log x)²) dx = (1/π) ∫_{−∞}^{+∞} e^{2πıky}/(1 + y²) dy = e^{−2π|k|},   k ∈ ℤ,


and so (8.5) takes the form

    F_{S(X^n)}(t) − log t = (1/π) Σ_{k=1}^∞ (e^{−2πnk}/k) sin(2πk log t),   t ∈ [1, 10).

From this it is clear that

    e^{2πn}(F_{S(X^n)}(t) − log t) → sin(2π log t)/π as n → ∞, uniformly on [1, 10),   (8.6)

and hence convergence in (8.4) occurs at the (exponential) rate (e^{−2πn}).

(iii) Let X > 0 be log-normal distributed with parameters µ = 0 and σ² = 1, i.e., log X is normal with mean 0 and variance (log e)². Then

    φ_k = (1/√(2π(log e)²)) ∫_{−∞}^{+∞} e^{2πıky} e^{−(y/log e)²/2} dy = e^{−2π²k²(log e)²},   k ∈ ℤ,

and (8.5) now takes the form

    F_{S(X^n)}(t) − log t = (1/π) Σ_{k=1}^∞ (e^{−2π²(log e)²n²k²}/k) sin(2πk log t),

which in turn shows that, uniformly on [1, 10),

    lim_{n→∞} e^{2π²(log e)²n²}(F_{S(X^n)}(t) − log t) = sin(2π log t)/π.

For a log-normal random variable X, therefore, convergence in (8.4) occurs at the (super-exponential) rate (e^{−γn²}), with γ = 2π²(log e)² ≈ 3.723. ♦
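The closed form from Example 8.9(i) and the limit from Example 8.10(i) are easy to check numerically. The sketch below (illustrative; the grid and the choice of n values are arbitrary) tabulates the scaled sup-distance for several n:

```python
import numpy as np

# Sketch for Examples 8.9(i) and 8.10(i) (illustrative): for X ~ U(0,1)
# the distribution function of S(X^n) has the closed form
# (t^{1/n} - 1)/(10^{1/n} - 1), and by (8.3) the scaled sup-distance
# n * sup_t |F_{S(X^n)}(t) - log t| tends to 1/(8 log e).
t = np.linspace(1.0, 10.0, 100001)
log_t = np.log10(t)
scaled = []
for n in (1, 10, 100, 1000):
    F = (t ** (1.0 / n) - 1.0) / (10.0 ** (1.0 / n) - 1.0)
    scaled.append(n * np.max(np.abs(F - log_t)))
    print(n, scaled[-1])
limit = 1.0 / (8.0 * np.log10(np.e))     # = 0.2878...
print(limit)
```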

From the proofs of Theorems 8.5 and 8.6, as well as Theorem 8.8, it is clear that if X has a density then the sequence (X^{2^n}), for instance, converges in distribution to Benford's law and is Benford with probability one. Note that (X^{2^n}) is not of the form (8.1). However, with X_n := 10^{2^n − 1} X^{2^n}, clearly X_n = 10 X_{n−1}². Thus the map f(x) = 10x², first studied in Example 6.3 in a purely deterministic context, makes a reappearance in the present stochastic setting. This suggests that several results of Chapter 6 may be extended to the stochastic context. While this topic will not be pursued formally here, the following example gives an informal illustration.

Example 8.11. Let (X_n) be a sequence of random variables where X_0 has a density, and

    X_n = X_{n−1}² + 1,   n ∈ ℕ.

In other words, X_n = f(X_{n−1}) with the map f(x) = x² + 1 from Example 6.25. As in the proof of Theorem 6.23, it follows that

    log X_n − 2^n Y = log f^n(X_0) − 2^n Y → 0 a.s. as n → ∞,

with a uniquely defined random variable Y = sb(log|X_0|). Close inspection of the function sb shows that Y has a density as well. Hence the sequence (X_n) converges in distribution to Benford's law and is Benford with probability one. ♦
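Example 8.11 can be simulated directly. The sketch below (illustrative; the choice X_0 ~ U(1, 2), 40 iterations, and the switch-over point are arbitrary) tracks log10 X_n modulo one, so the doubly exponential growth never overflows:

```python
import numpy as np

# Sketch for Example 8.11 (illustrative): iterate X_n = X_{n-1}^2 + 1 from
# an X_0 with a density and inspect first digits of X_40 across many runs.
# After a few exact steps the "+1" is negligible relative to X_{n-1}^2, so
# log10 X_n essentially doubles each step; reducing mod 1 avoids overflow.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 2.0, 100000)        # X_0 with a density
for _ in range(5):
    x = x * x + 1.0                      # exact iteration while X_n is small
y = np.log10(x)                          # log10 X_5
for _ in range(35):
    y = np.mod(2.0 * y, 1.0)             # log10 X_n mod 1, as X_n ~ X_{n-1}^2
digits = np.floor(10.0 ** y).astype(int)
freq = np.bincount(digits, minlength=10)[1:10] / len(x)
benford = np.log10(1.0 + 1.0 / np.arange(1, 10))
print(np.max(np.abs(freq - benford)))    # small: X_40 is close to Benford
```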


Products of independent random variables

When turning to (8.1) with independent factors X_1, X_2, ..., it is worth noting that in the case of independence, Benford's law already plays a very special role for finite products (as opposed to sequences of products). Informally put, if one factor in a product of independent factors is Benford, then the whole product is exactly Benford, regardless of the other factors.

Theorem 8.12. Let X and Y be independent random variables, and assume that P(XY = 0) = 0. Then:

(i) If X is Benford then so is XY;

(ii) If S(X) and S(XY) have the same distribution then either log S(Y) is rational with probability one, or X is Benford.

Proof. Conclusion (i) follows immediately from Theorems 4.2 and 4.21. To see (ii), note first that log S(XY) = ⟨log S(X) + log S(Y)⟩, and, since the random variables X_0 := log S(X) and Y_0 := log S(Y) are independent, Lemma 4.20(iv) implies that

    P̂_{log S(XY)} = P̂_{⟨X_0+Y_0⟩} = P̂_{X_0} · P̂_{Y_0}.   (8.7)

If S(X) and S(XY) have the same distribution, (8.7) yields

    P̂_{X_0}(k)(1 − P̂_{Y_0}(k)) = 0 for all k ∈ ℤ.

If P̂_{Y_0}(k) ≠ 1 for all non-zero k, then P̂_{X_0} = λ̂_{0,1} by Lemma 4.20(vi), i.e., X is Benford. Otherwise, P̂_{Y_0}(k_0) = 1 for some integer k_0 ≠ 0, i.e., E[e^{2πık_0 Y_0}] = 1. It follows that P_{Y_0}((1/k_0)ℤ) = 1; hence k_0 Y_0 = k_0 log S(Y) is an integer with probability one, so log S(Y) is rational with probability one. □

Example 8.13. Let V, W be i.i.d. U(0, 1) random variables. Then X := 10^V and Y := W are independent and, by Theorem 8.12(i), XY is Benford since X is Benford by Example 3.6(i), even though Y is not. If, on the other hand, X := 10^V and Y := 10^{1−V} then X and Y are both Benford, yet XY is not. Hence the independence of X and Y is crucial in Theorem 8.12(i). It is essential in assertion (ii) as well, as can be seen by letting X equal either 10^{√2−1} or 10^{2−√2} with probability ½ each, and choosing Y = X^{−2}. Then S(X) has the same distribution as S(XY) = S(X^{−1}), but neither is X Benford nor is log S(Y) rational with probability one. ♦

Corollary 8.14. Let X be a random variable with P(X = 0) = 0, and let a ∈ ℝ be such that log|a| is irrational. If the significant digits of X and aX are identically distributed then X is Benford.

Remark. The conclusion of Corollary 8.14 fails under the weaker assumption that only the first significant digits of X and aX are identically distributed; see Example 5.9.
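Theorem 8.12(i) and the dependent counterexample of Example 8.13 can both be seen in simulation. In the sketch below (illustrative; Y ~ Exp(1) is an arbitrary non-Benford choice), the independent product XY matches Benford's law closely, while the dependent product X·10^{1−V} ≡ 10 does not:

```python
import numpy as np

# Sketch for Theorem 8.12 / Example 8.13 (illustrative): X = 10^V with
# V ~ U(0,1) is Benford; multiplying by an independent Y keeps the product
# exactly Benford, while the dependent choice Y = 10^{1-V} destroys it.
rng = np.random.default_rng(3)

def digit_freq(z):
    z = np.abs(np.asarray(z, dtype=float))
    z = z[z > 0.0]                       # guard against exact zeros
    d = np.floor(10.0 ** np.mod(np.log10(z), 1.0)).astype(int)
    return np.bincount(d, minlength=10)[1:10] / len(z)

benford = np.log10(1.0 + 1.0 / np.arange(1, 10))
V = rng.uniform(size=200000)
X = 10.0 ** V                            # Benford, cf. Example 3.6(i)
Y = rng.exponential(size=200000)         # independent of X, not Benford

dev_indep = np.max(np.abs(digit_freq(X * Y) - benford))
dev_dep = np.max(np.abs(digit_freq(X * 10.0 ** (1.0 - V)) - benford))
print(dev_indep)                         # small: XY is Benford
print(dev_dep)                           # large: X * 10^{1-V} = 10 a.s.
```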


With the role of Benford's law in finite products clarified, next consider sequences of products of independent, identically distributed random variables, that is, consider (8.1) with any i.i.d. sequence (X_n). As with sequences of powers of a random variable above, by employing the Fourier-coefficient tools established in Chapter 4, it is straightforward to decide whether or not the sequence of products converges in distribution to Benford's law, or is Benford with probability one.

Theorem 8.15. Let X_1, X_2, ... be i.i.d. random variables with P(X_1 = 0) = 0. Then the following are equivalent:

(i) The sequence (∏_{j=1}^n X_j) converges in distribution to Benford's law;

(ii) P(log|X_1| ∈ x + (1/m)ℤ) < 1 for every x ∈ ℝ and m ∈ ℕ;

(iii) P(S(X_1^m) = t) < 1 for every t ∈ [1, 10) and m ∈ ℕ.

Proof. Letting Y_n = log|X_n| for each n ∈ ℕ, the equivalence of (i) and (ii) follows immediately from Theorems 4.2 and 4.22. The equivalence of (ii) and (iii) follows from (8.2). □

The almost sure analogue of Theorem 8.15 for sequences of products of i.i.d. random variables is as follows.

Theorem 8.16. Let X_1, X_2, ... be i.i.d. random variables with P(X_1 = 0) = 0. Then the following are equivalent:

(i) The sequence (∏_{j=1}^n X_j) is Benford with probability one;

(ii) P(log|X_1| ∈ (1/m)ℤ) < 1 for every m ∈ ℕ;

(iii) P(S(X_1^m) = 1) < 1 for every m ∈ ℕ.

Proof. Letting Y_n = log|X_n| for n ∈ ℕ, the equivalence of (i) and (ii) follows immediately from Theorems 4.2 and 4.24. The equivalence of (ii) and (iii) follows as in (8.2), with t = 1. □

Example 8.17. As in Example 8.2(ii), let P(X_n = 2) = 1 for all n. Then X_1, X_2, ... are i.i.d. random variables and satisfy condition (ii) of Theorem 8.16, so (∏_{j=1}^n X_j) is Benford with probability one. On the other hand, condition (ii) of Theorem 8.15 fails and, as already seen in Example 8.2, the sequence (∏_{j=1}^n X_j) = (2^n) does not converge in distribution to Benford's law. ♦

As the previous example shows, a sequence (∏_{j=1}^n X_j) with i.i.d. factors may be Benford with probability one yet may fail to converge in distribution to Benford's law. As in the case of powers (Corollary 8.7), however, the reverse implication always holds.

Corollary 8.18. Let X_1, X_2, ... be i.i.d. random variables. If the sequence (∏_{j=1}^n X_j) converges in distribution to Benford's law then it is Benford with probability one.


Proof. Simply note that Theorem 8.15(iii) implies Theorem 8.16(iii). □

Observe that Theorem 8.15(ii) and Theorem 8.16(ii) hold unless the random variable X_1 is discrete, i.e., unless there exists a countable (possibly finite) set C ⊂ ℝ with P(X_1 ∈ C) = 1. Hence Theorems 8.15 and 8.16 together imply the following useful result (cf. Theorem 8.8 and [132]).

Corollary 8.19. Let X_1, X_2, ... be i.i.d. random variables. If X_1 is not discrete then the sequence (∏_{j=1}^n X_j) converges in distribution to Benford's law and is Benford with probability one.

Note, however, that even if X_1 is discrete, condition (ii) of Theorem 8.15 may hold.

Example 8.20. Let X_1 > 0 be a random variable with P(X_1 = 2^j) = 2^{−j} for every j ∈ ℕ. Clearly, X_1 is discrete. However, log 2 is irrational; hence, given any x ∈ ℝ and m ∈ ℕ, the two sets {j log 2 : j ∈ ℕ} and x + (1/m)ℤ have at most one element in common, and P(log X_1 ∈ x + (1/m)ℤ) ≤ ½. Thus Theorem 8.15(ii) applies, and the sequence (∏_{j=1}^n X_j) converges in distribution to Benford's law. By Corollary 8.18, it is also Benford with probability one. ♦

Another immediate consequence of Theorems 8.6Q and 8.16 is that the almost n sure Benford properties of the sequences (X1n ) and j=1 Xj are related. Corollary 8.21. Let X1 , X2 , . . . be i.i.d. random variables. QnIf the sequence  (X1n ) is Benford with probability one then so is the sequence j=1 Xj . Proof. Simply note that P(log |X1 | ∈ Q) = 0 implies Theorem 8.16(ii). 

The converse of Corollary 8.21 does not hold in general, as the following example shows.

Example 8.22. Let 0 < p < 1 and let X1, X2, … be i.i.d. positive random variables with the common distribution function

F_{X1}(x) = p·x/∛10 if 0 ≤ x < ∛10, and F_{X1}(x) = 1 if x ≥ ∛10,

that is, X1 is uniformly distributed on [0, ∛10) with probability p, and otherwise (i.e., with probability 1 − p) is equal to ∛10. Since

E[e^{6πın log X1}] = (1 + ı6π(1 − p)n log e)/(1 + ı6πn log e) → 1 − p ≠ 0 as n → ∞,

Theorem 8.5 shows that the sequence (X1^n) does not converge in distribution to Benford's law. Since P(log X1 = 1/3) = 1 − p > 0, it is not Benford with probability one either. On the other hand, X1 is not discrete, and hence the sequence (∏_{j=1}^n Xj) converges in distribution to Benford's law and is Benford with probability one. Note that the latter fact is not related to the actual behavior of the sequence (∏_{j=1}^n Xj): From

E[log X1] = 1/3 − p log e = (p0 − p) log e, with p0 = (1/3)(log e)^{−1} = 0.7675,

it follows that ∏_{j=1}^n Xj → +∞ a.s. for p < p0, and ∏_{j=1}^n Xj → 0 a.s. whenever p > p0. If p = p0 then, with probability one, the sequence (∏_{j=1}^n Xj) does not converge but attains both arbitrarily large and arbitrarily small positive values.

Finally, note that a particularly important special case of Corollary 8.19 occurs whenever X1 has a density (cf. Theorem 8.8). In this case, P(X1 ∈ C) = 0 for every countable set C ⊂ R, so X1 is clearly not discrete.

Corollary 8.23. If the i.i.d. random variables X1, X2, … have a density, then the sequence (∏_{j=1}^n Xj) converges in distribution to Benford's law and is Benford with probability one.

Figure 8.2 illustrates the dependencies between the various Benford properties of random sequences (8.1) for the two scenarios considered so far.

Example 8.24. Let X1, X2, … be i.i.d. U(0,1) random variables. By Corollary 8.23, the sequence (∏_{j=1}^n Xj) converges in distribution to Benford's law (a fact apparently first recorded in [2]) and is Benford with probability one. As already seen in Example 3.10(i), F_{S(X1)}(t) = (1/9)(t − 1). For n ≥ 2, the distribution function of S(∏_{j=1}^n Xj) can also be computed explicitly, for instance by observing that −log X1 is gamma-distributed and using the addition property of the gamma distribution. Specifically, it can be shown that

F_{S(∏_{j=1}^n Xj)}(t) = c_{n,1}·(t − 1) + t·Σ_{k=2}^n c_{n,k}·(−ln t)^{k−1}, t ∈ [1, 10),   (8.8)

with appropriate positive real numbers c_{n,1}, c_{n,2}, …, c_{n,n}; for example,

c_{1,1} = 1/9;
c_{2,1} = 1/9 + (10/9²)(log e)^{−1}, c_{2,2} = 1/9;
c_{3,1} = 1/9 + (10/9²)(log e)^{−1} + (55/9³)(log e)^{−2}, c_{3,2} = 1/9 + (10/9²)(log e)^{−1}, c_{3,3} = 1/18.

As can be seen in Figure 8.3, the distribution of S(X1X2X3) is already quite close to Benford's law. Observe, however, that the precise rate of convergence in |F_{S(∏_{j=1}^n Xj)}(t) − log t| → 0 is not easily recognizable from (8.8). In the next example, it will be identified by means of Fourier coefficients.

Let X1, X2, … be i.i.d. random variables with common density fX. Recall that, since FB(t) = log t is continuous,

|F_{S(∏_{j=1}^n Xj)}(t) − log t| → 0 uniformly on [1, 10) as n → ∞.   (8.9)
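The closed form (8.8) for small n can be checked numerically. The sketch below assumes the coefficients c_{n,k} listed in Example 8.24; as an independent cross-check for n = 3, it integrates the S-density obtained from the product density (ln z)²/2 of X1X2X3 on (0,1):

```python
import math

LN10 = math.log(10)   # (log e)^{-1} in the book's base-10 notation

# Coefficients c_{n,k} of (8.8) for n = 1, 2, 3, as listed in Example 8.24
c = {
    1: [1 / 9],
    2: [1 / 9 + (10 / 81) * LN10, 1 / 9],
    3: [1 / 9 + (10 / 81) * LN10 + (55 / 729) * LN10 ** 2,
        1 / 9 + (10 / 81) * LN10,
        1 / 18],
}

def F_significand_product(n, t):
    # Distribution function (8.8) of S(X1*...*Xn), Xj i.i.d. U(0,1)
    cn = c[n]
    tail = sum(cn[k - 1] * (-math.log(t)) ** (k - 1) for k in range(2, n + 1))
    return cn[0] * (t - 1) + t * tail

def F_direct_n3(t, steps=20_000):
    # For n = 3 the S-density on [1,10) works out to
    # (55/729)ln^2(10) - (10/81)ln(10)ln(s) + ln^2(s)/18; integrate it (midpoint rule).
    total, h = 0.0, (t - 1) / steps
    for i in range(steps):
        s = 1 + (i + 0.5) * h
        total += h * ((55 / 729) * LN10 ** 2
                      - (10 / 81) * LN10 * math.log(s)
                      + math.log(s) ** 2 / 18)
    return total
```

In particular F_significand_product(n, t) vanishes at t = 1 and equals 1 as t → 10 for each n, as a distribution function on [1, 10) must.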


Figure 8.2: Implication diagram relating the Benford properties of X1, (X1^n), and (∏_{j=1}^n Xj) for i.i.d. random variables X1, X2, …; among the properties shown are "X1 has a density," "X1 is continuous," "X1 is not discrete," "lim_{n→∞} E[e^{2πın log |X1|}] = 0," "P(log |X1| ∈ x + (1/m)Z) < 1," "X1 is Benford," the convergence in distribution of (X1^n) and of (∏_{j=1}^n Xj) to Benford's law, and their being Benford with probability one.

… if P(X = 0) = δ > 0, then P(nX = 0) = δ for every n ∈ N, from which it is clear that (i) and (ii) hold in this case also. Property (iii) follows immediately from Example 4.7(ii) because P({ω : (nX(ω)) is a Benford sequence}) = P(∅) = 0. □

Finally, consider (8.11) with independent terms X1, X2, …. Here the random variables X1, X1 + X2, … all have different distributions unless X1 = 0 with probability one. Given the discussion above, therefore, the assertions of the following analogue of Theorems 8.15 and 8.16 for sums may not come as a surprise.

Theorem 8.30. Let X1, X2, … be i.i.d. random variables with finite variance, i.e., E[X1²] < +∞. Then:

(i) No subsequence of (Σ_{j=1}^n Xj) converges in distribution to Benford's law;

(ii) With probability one, (Σ_{j=1}^n Xj) is not a Benford sequence.

Proof. Assume first that E[X1] ≠ 0. By the Strong Law of Large Numbers, (1/n)·Σ_{j=1}^n Xj converges a.s., and hence also in distribution, to the constant E[X1]. Since P(Σ_{j=1}^n Xj = 0) → 0 as n → ∞ and

log S(Σ_{j=1}^n Xj) = ⟨log |Σ_{j=1}^n Xj|⟩ = ⟨log |(1/n)·Σ_{j=1}^n Xj| + log n⟩

whenever Σ_{j=1}^n Xj ≠ 0, any subsequence of (S((1/n)·Σ_{j=1}^n Xj)) either does not converge in distribution at all or converges to a constant; in neither case, therefore, is the limit a Benford random variable. Since |Σ_{j=1}^n Xj| → +∞ with probability one, it follows from

log |Σ_{j=1}^n Xj| − log n = log |(1/n)·Σ_{j=1}^n Xj| → log |E[X1]| a.s.,

together with Propositions 4.3(i) and 4.6(iii), that with probability one the sequence (Σ_{j=1}^n Xj) is not Benford.

It remains to consider the case E[X1] = 0. Without loss of generality, it can be assumed that E[X1²] = 1. By the Central Limit Theorem, (1/√n)·Σ_{j=1}^n Xj


converges in distribution to the standard normal distribution. Thus for every sufficiently large n, and up to a rotation (i.e., an addition mod 1) of [0, 1), the distribution of ⟨log |Σ_{j=1}^n Xj|⟩ is arbitrarily close to the distribution of the random variable Y := ⟨log |Z|⟩, where Z is standard normal. Intuitively, it is clear that P_Y ≠ λ_{0,1}, i.e., Y is not uniform on [0, 1). To see this more formally, note that

F_Y(s) = 2·Σ_{k∈Z} (Φ(10^{s+k}) − Φ(10^k)), 0 ≤ s < 1,   (8.12)

with Φ (= F_Z) denoting the standard normal distribution function; see Example 3.10(v). Thus

|F_Y(s) − s| ≥ F_Y(s) − s > 2(Φ(10^s) − Φ(1)) − s =: R(s), 0 ≤ s < 1,

and since R is concave on [0, 1) with R(0) = 0 but R′(0) > 0,

max_{0≤s<1} |F_Y(s) − s| ≥ sup_{0≤s<1} R(s) > 0,

showing that indeed P_Y ≠ λ_{0,1}; hence the sequence (Σ_{j=1}^n Xj) does not converge in distribution to Benford's law, and neither does any subsequence.

The verification of (ii) in the case E[X1] = 0 uses an almost sure version of the Central Limit Theorem; see [94]. With the random variables Xn defined on some (abstract) probability space (Ω, A, P), let

Ω1 = {ω ∈ Ω : (Σ_{j=1}^n Xj(ω)) is a Benford sequence}.

By Theorem 4.2 and Proposition 4.6(iii), the sequence (xn(ω)) with

xn(ω) = log |(1/√n)·Σ_{j=1}^n Xj(ω)|, n ∈ N,

is u.d. mod 1 for all ω ∈ Ω1. For every interval [a, b) ⊂ [0, 1), therefore,

(1/ln N)·Σ_{n=1}^N (1/n)·1_{[a,b)}(⟨xn(ω)⟩) → b − a as N → ∞.

(Recall the remarks on logarithmic averaging in Section 3.1.) However, as a consequence of [94, Thm. 2], for every [a, b) ⊂ [0, 1),

(1/ln N)·Σ_{n=1}^N (1/n)·1_{[a,b)}(⟨xn⟩) → F_Y(b) − F_Y(a) a.s.,

with F_Y given by (8.12). As shown above, F_Y(s) ≢ s, and therefore P(Ω1) = 0. In other words, P((Σ_{j=1}^n Xj) is a Benford sequence) = 0. □
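The failure described in Theorem 8.30 is visible in simulation. The following sketch (helper names hypothetical) tracks the first digits along one path of partial sums of i.i.d. Exp(1) steps, which have finite variance and positive mean:

```python
import math
import random

def first_digit(x):
    return int(10 ** (math.log10(x) % 1.0))

def digit_freqs_of_sum_path(n_terms=100_000, seed=7):
    # One path of the partial sums X1+...+Xn for i.i.d. Exp(1) steps
    # (finite variance, E[X1] = 1 > 0).
    rng = random.Random(seed)
    counts = [0] * 10
    s = 0.0
    for _ in range(n_terms):
        s += rng.expovariate(1.0)
        counts[first_digit(s)] += 1
    return [c / n_terms for c in counts]

freqs = digit_freqs_of_sum_path()
# The sums grow like n, so their first digits mimic those of 1, 2, ..., N
# (roughly 1/9 each at N = 10^5), far from Benford's log10(1 + 1/d).
```

The observed frequency of the leading digit 1 stays near 1/9 rather than approaching log 2 = 0.301, in line with the theorem.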

Example 8.31. (i) Let X1, X2, … be i.i.d. with P(X1 = 0) = P(X1 = 2) = 1/2. Then E[X1] = 1, E[X1²] = 2, and by Theorem 8.30(i) neither (Σ_{j=1}^n Xj) nor


any of its subsequences converges in distribution to Benford's law. Note that (1/2)·Σ_{j=1}^n Xj is binomial with parameters n and 1/2, i.e., for all n ∈ N,

P(Σ_{j=1}^n Xj = 2k) = 2^{−n}·(n choose k), k = 0, 1, …, n.

The Law of the Iterated Logarithm [36, Sec. 10.2] asserts that

Σ_{j=1}^n Xj = n + Yn·√(n ln ln n) for all n ≥ 3,   (8.13)

where the random sequence (Yn) is bounded; in fact, lim sup_{n→∞} |Yn| = √2 a.s. Thus by Example 4.7(ii), Theorem 4.12(i), and (8.13) it is clear that, with probability one, the sequence (Σ_{j=1}^n Xj) is not Benford.

(ii) If the i.i.d. random variables X1, X2, … are standard normal, then so is (1/√n)·Σ_{j=1}^n Xj for every n ∈ N. Hence the distribution function of S(Σ_{j=1}^n Xj) simply equals

F_{S(Σ_{j=1}^n Xj)}(t) = 2·Σ_{k∈Z} (Φ(t·10^k/√n) − Φ(10^k/√n)), t ∈ [1, 10).

By Theorem 8.30, the sequence (Σ_{j=1}^n Xj) does not converge in distribution to Benford's law (not even a subsequence does) and is, with probability one, not Benford. Note that even though Σ_{j=1}^n Xj in a sense may be closer to Benford's law than, say, the uniform random variable U(0, n), as seen in Theorem 3.13, the value of max_{t∈[1,10)} |F_{S(Σ_{j=1}^n Xj)}(t) − log t| does not change much with growing n and does not go to zero as n → ∞; see Figure 8.5.

As the final example in this section illustrates, the conclusions of Theorem 8.30 may hold even in cases where X1 does not have a finite first moment, let alone a finite second moment. The authors conjecture that the hypothesis of finite variance in Theorem 8.30 can be weakened considerably; perhaps no random walk on the real line at all has Benford paths (in distribution or with probability one).

Example 8.32. Let X1, X2, … be i.i.d. random variables with density

f_{X1}(x) = (1/√(2π))·e^{−1/(2x)}·x^{−3/2} if x > 0, and f_{X1}(x) = 0 if x ≤ 0,

i.e., the distribution of X1 is a one-sided stable law with characteristic exponent 1/2. Note that E[X1^p] is finite if p < 1/2, and E[X1^p] = +∞ if p ≥ 1/2. Thus E[X1] is infinite, and Theorem 8.30 does not apply. However, it is well known (and not hard to check) that for every n ∈ N, the random variable n^{−2}·Σ_{j=1}^n Xj has the


same distribution as X1. The latter is not Benford because, for all 1 ≤ t < 10,

f_{S(X1)}(t)/f_B(t) = (t/log e)·Σ_{k∈Z} f_{X1}(t·10^k)·10^k = (1/(log e·√(2πt)))·Σ_{k∈Z} e^{−10^{−k}/(2t)}·10^{−k/2}
≥ (1/(log e·√(2πt)))·(e^{−5/t}·√10 + Σ_{n=0}^∞ e^{−10^{−n}/(2t)}·10^{−n/2})
≥ (1/log e)·√(5/(πt))·(e^{−5/t} + 1/(√10 − 1) − (5/t)/(10√10 − 1)) =: R(t);

in particular, R(5) = 1.036. Thus δ := (1/2)·sup_{1≤t<10} (R(t) − 1) > 0, and consequently sup_{1≤t<10} …

Figure 8.5: The sum of any number of independent standard normal random variables is never close to being Benford: for n = 1, 5, 10, and 20, the maximal deviation ∆∞ of F_{S(Σ_{j=1}^n Xj)}(t) from log t is 6.05, 9.73, 5.70, and 9.59 (in percent), respectively; see Example 8.31(ii).


the random variable X2 is exp(1)-distributed with probability one, whereas the unconditional probability that X2 is exp(1)-distributed is only 1/2.

If (X1, X2, …) is a sequence of P-random m-samples for some m and some r.p.m. P, then the Xn are a.s. identically distributed, their common distribution being the average (expected) distribution of P (see Proposition 8.38 below), but they are not in general independent (see Example 8.36). On the other hand, given (P1, P2, …), the Xn are a.s. independent, but clearly they are not in general identically distributed.

Although sequences of P-random m-samples have a fairly simple structure, they do not fit into any of the familiar categories of sequences of random variables. For example, they are generally not independent, exchangeable, Markov, martingale, or stationary sequences.

Example 8.37. Assume that the r.p.m. P is, with equal probability, the Dirac measure δ1 concentrated at 1 and the probability measure (1/2)(δ1 + δ2), respectively, i.e., P(P = δ1) = P(P = (1/2)(δ1 + δ2)) = 1/2. Let (X1, X2, …) be a sequence of P-random 3-samples. Then the random variables X1, X2, … are:

not independent, since

P(X2 = 2) = 1/4 ≠ 1/2 = P(X2 = 2 | X1 = 2);

not exchangeable, since

P((X1, X2, X3, X4) = (1, 1, 1, 2)) = 9/64 ≠ 3/64 = P((X1, X2, X3, X4) = (2, 1, 1, 1));

not Markov, since

P(X3 = 1 | X1 = X2 = 1) = 9/10 ≠ 5/6 = P(X3 = 1 | X2 = 1);

not martingale, since

E[X2 | X1 = 2] = 3/2 ≠ 5/4 = E[X2];

not stationary, since

P((X1, X2, X3) = (1, 1, 1)) = 9/16 ≠ 15/32 = P((X2, X3, X4) = (1, 1, 1)).

Recall that, given an r.p.m. P and any Borel set B, the quantity P(B) is a random variable with values between 0 and 1. The following property of the expectation of P(B), as a function of B, is easy to check.

Proposition 8.38. Let P be an r.p.m. Then EP, defined by

(EP)(B) = E[P(B)] = ∫_Ω P(ω)(B) dP(ω) for all B ∈ B+,

is a probability measure on (R+, B+).


Example 8.39. (i) Let P be the r.p.m. of Example 8.33(i). Then EP is the Borel probability measure with density

f_{EP}(x) = 1/2 + (1/2)e^{−x} if 0 < x < 1, and f_{EP}(x) = (1/2)e^{−x} if x ≥ 1;

that is, f_{EP}(x) = (1/2)·1_{[0,1)}(x) + (1/2)e^{−x}, x > 0.

(ii) Let P be the r.p.m. of Example 8.33(ii), that is, P(ω) is uniform on (0, X(ω)), where X is exp(1)-distributed. In this case, EP is also a.c., with density f_{EP}(x) = ∫_x^{+∞} t^{−1}e^{−t} dt for x > 0.

(iii) Even if P is a.c. only with probability zero, it is possible for EP to be a.c. For a simple example, let P = (1/4)δ_X + (3/4)δ_{3X}, where again X is exp(1)-distributed. Then P(P is purely atomic) = 1, yet EP is a.c. with density f_{EP}(x) = (1/2)e^{−2x/3}·cosh(x/3) for x > 0.

Passing from an r.p.m. P to its expectation EP can be regarded as an averaging step, and since

|F_{EP∘S^{−1}}(t) − log t| = |∫_Ω (F_{P(ω)∘S^{−1}}(t) − log t) dP(ω)| ≤ sup_{ω∈Ω} |F_{P(ω)∘S^{−1}}(t) − log t|, t ∈ [1, 10),

it is clear that EP is at least as close to being Benford as some P(ω). In fact, as the next example demonstrates, EP may be much closer to being Benford than any P(ω). Part (ii) of the example shows that it is even possible for EP to be exactly Benford in cases where every individual probability measure P(ω) is distinctly non-Benford.

Example 8.40. (i) Consider the random variable XT = U(0, T), where T ∈ [1, 10) is itself a random variable with density

fT(t) = (2/81)(10 − t), 1 ≤ t < 10.

Recall from Theorem 3.13 that max_{t∈[1,10)} |F_{S(XT)}(t) − log t| ≥ 0.1344 for every value of T. On the other hand, for every x > 0,

F_{EXT}(x) = P(EXT ≤ x) = ∫_1^{10} P(Xt ≤ x)·fT(t) dt = ∫_1^{10} min{x/t, 1}·fT(t) dt
= ((20/81)(log e)^{−1} − 2/9)·x if 0 < x < 1,
= (1/81)x² − (20/81)x ln x + (20/81)(log e)^{−1}·x − 19/81 if 1 ≤ x < 10,
= 1 if x ≥ 10,

which in turn yields, for 1 ≤ t < 10,

F_{S(EXT)}(t) = Σ_{k=0}^∞ (F_{EXT}(t·10^{−k}) − F_{EXT}(10^{−k}))
= (1/81)(t² − 20t ln t + ((200/9)(log e)^{−1} − 2)·t + 1 − (200/9)(log e)^{−1}).


Numerically, max_{t∈[1,10)} |F_{S(EXT)}(t) − log t| = 0.03847, which is significantly smaller than the corresponding value for each individual XT; see Figure 8.7.

(ii) For an even more extreme example, consider the map g: [1, 10] → [1, 10] given by

g(x) = (1/2)(27 log x − x + 3).

Since g(1) = 1, g(10) = 10, and g′(x) > 0 for all x, the map g is in fact a homeomorphism of [1, 10], and so is h := g^{−1}. Moreover, g(x) > x, and hence h(x) < x, for all 1 < x < 10. With this, let T be a uniform random variable on (1, 10), i.e., T = U(1, 10), and let XT be the discrete random variable with the two possible outcomes T and h(T) < T, attained with probabilities 1/3 and 2/3, respectively, that is,

P(XT = T) = 1/3, P(XT = h(T)) = 2/3.

Clearly, no individual random variable XT is Benford since, for every T, it has an atom of size 2/3, and so sup_{t∈[1,10)} |F_{S(XT)}(t) − log t| ≥ 1/3. On the other hand, for every t ∈ [1, 10),

P(EXT ≤ t) = (1/3)·P(T ≤ t) + (2/3)·P(h(T) ≤ t) = (1/27)(t − 1) + (2/27)(g(t) − 1) = log t,

which shows that EXT is exactly Benford; see Figure 8.8.

The next lemma shows that the limiting proportion of times that a sequence of P-random m-samples falls into a (Borel) set B is, with probability one, the average P-value of the set B, i.e., the limiting proportion equals EP(B). Note that this is not simply a direct corollary of the classical Strong Law of Large Numbers, since the random variables in the sequence are not necessarily independent; see Examples 8.36 and 8.37.

Lemma 8.41. Let P be an r.p.m., and let (X1, X2, …) be a sequence of P-random m-samples for some m ∈ N. Then, for every B ∈ B+,

#{1 ≤ n ≤ N : Xn ∈ B}/N → EP(B) a.s. as N → ∞.

Proof. Fix B ∈ B+ and j ∈ N, and let Yj = #{1 ≤ i ≤ m : X_{(j−1)m+i} ∈ B}. It is clear that

lim_{N→∞} #{1 ≤ n ≤ N : Xn ∈ B}/N = (1/m)·lim_{n→∞} (1/n)·Σ_{j=1}^n Yj,   (8.14)

whenever the limit on the right exists. By Definition 8.35(i), given Pj, the random variable Yj is binomially distributed with parameters m and Pj(B); so, with probability one,

E[Yj] = E[E[Yj | Pj]] = E[m·Pj(B)] = m·EP(B),   (8.15)

since Pj has the same distribution as P. By Definition 8.35(ii), the Yj are independent. They are also uniformly bounded, since 0 ≤ Yj ≤ m for all j, and


hence Σ_{j=1}^∞ E[Yj²]/j² < +∞. Moreover, by (8.15), all Yj have the same mean value m·EP(B). Thus, by [36, Cor. 5.1],

(1/n)·Σ_{j=1}^n Yj → m·EP(B) a.s. as n → ∞,   (8.16)

and the conclusion follows from (8.14) and (8.16). □

Figure 8.7: Although every value XT of a random probability measure may be distinctly non-Benford, the distribution of EXT may be much closer to Benford's law; see Example 8.40(i) and also Figure 3.8. Here XT = U(0, T) with fT(t) = (2/81)(10 − t)·1_{[1,10)}(t); for the four values T1 < T2 < T3 < T4 shown, ∆∞ = 19.42, 26.87, 18.99, and 16.15, respectively, whereas ∆∞ = 3.84 for EXT. Recall that Theorem 3.13 implies ∆∞ ≥ 13.44 for every XT.
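Lemma 8.41 can be illustrated with the r.p.m. of Example 8.37 (function names below are hypothetical): the overall relative frequency of the outcome 2 approaches EP({2}) = 1/4 even though the samples are not independent:

```python
import random

def p_random_3_samples(n_measures=50_000, seed=11):
    # P-random 3-samples from the r.p.m. of Example 8.37:
    # P = delta_1 with probability 1/2, P = (delta_1 + delta_2)/2 otherwise.
    rng = random.Random(seed)
    xs = []
    for _ in range(n_measures):
        mixed = rng.random() < 0.5          # was P_j = (delta_1 + delta_2)/2 ?
        for _ in range(3):                  # sample the realized P_j exactly m = 3 times
            xs.append(2 if (mixed and rng.random() < 0.5) else 1)
    return xs

xs = p_random_3_samples()
freq_2 = xs.count(2) / len(xs)   # Lemma 8.41: tends to EP({2}) = 1/4
```

Note that within each block of three the samples share the same realized measure, so they are dependent; the lemma nevertheless applies because distinct blocks are independent.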

The assumption that each Pj is sampled exactly m times is not essential: The above argument can easily be modified to show that the same conclusion holds if the jth r.p.m. is sampled Mj times, where (M1, M2, …) is a sequence of independent, uniformly bounded N-valued random variables that are also independent of the rest of the process.

The stage is now set to give a statistical limit law (Theorem 8.44 below) that is the central-limit-like theorem for significant digits mentioned above. Roughly speaking, this law says that if probability distributions are selected at random, and random samples are then taken from each of these distributions in such a way that the overall process is scale- or base-neutral, then the significant-digit frequencies of the combined sample will converge to the logarithmic distribution. This theorem may help explain and predict the appearance of Benford's law in


significant digits in mixtures of tabulated data such as the combined data from Benford's tables (as well as his individual dataset of numbers gleaned from newspapers), and the appearance of Benford's law in experiments designed to estimate the distribution of significant digits of all numbers on the World Wide Web; see Figure 1.6 and [97].

Figure 8.8: Every random variable XT (with T = U(1, 10)) has only two possible outcomes, namely, T and h(T) < T, which occur with probabilities 1/3 and 2/3, respectively; for three values T1 < T2 < T3, ∆∞ = 69.89, 34.04, and 77.67, and indeed ∆∞ ≥ 33.33 for every XT. Yet the random variable EXT is Benford (∆∞ = 0); see Example 8.40(ii).

In order to draw any conclusions concerning Benford's law for the process of sampling from different distributions, clearly there must be some restriction on the underlying r.p.m. that generates the sampling procedure. Otherwise, if the r.p.m. is, say, U(0, 1) with probability one, then any resulting sequence of P-random m-samples will be i.i.d. U(0, 1), and hence a.s. not Benford, by Example 3.10(i). Similarly, it can easily be checked that sequences of P-random m-samples from the r.p.m.s in Example 8.33(i) and (ii) will not generate Benford sequences. A natural assumption to make concerning an r.p.m. in this context is that on average the r.p.m. is unbiased (i.e., invariant) with respect to changes in scale or base. (Analogous assumptions and conclusions concerning sum-invariance are left to the interested reader.) Recall that S denotes the significand σ-algebra (Definition 2.9).

Definition 8.42. An r.p.m. P has scale-unbiased significant (decimal) digits if, for every significand event A, i.e., for every A ∈ S, the expected value of P(A) is the same as the expected value of P(aA) for every a > 0, that is, if

EP(aA) = EP(A) for all a > 0 and A ∈ S,

or, equivalently, if the Borel probability measure EP has scale-invariant significant digits.
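The scale-invariance property underlying Definition 8.42 is easy to observe empirically: rescaling an exactly Benford sample leaves the first-digit frequencies essentially unchanged, whereas rescaling a uniform sample does not. A sketch (helper names hypothetical):

```python
import math
import random

def first_digit(x):
    return int(10 ** (math.log10(x) % 1.0))

def digit1_freq(xs):
    return sum(1 for x in xs if first_digit(x) == 1) / len(xs)

rng = random.Random(5)
# 10^U with U = U(0,1) is exactly Benford; 1 - random() keeps values in (0, 1]
benford_sample = [10 ** rng.random() for _ in range(100_000)]
uniform_sample = [1.0 - rng.random() for _ in range(100_000)]

f_b = digit1_freq(benford_sample)                       # close to log10(2)
f_b_scaled = digit1_freq([3 * x for x in benford_sample])
f_u = digit1_freq(uniform_sample)                       # close to 1/9
f_u_scaled = digit1_freq([3 * x for x in uniform_sample])
```

Multiplying the Benford sample by 3 (or any a > 0) barely moves the digit-1 frequency, while the same rescaling shifts the uniform sample's digit-1 frequency from about 0.11 toward 0.37.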


Similarly, P has base-unbiased significant (decimal) digits if, for every A ∈ S, the expected value of P(A) is the same as the expected value of P(A^{1/n}) for every n ∈ N, that is, if

EP(A^{1/n}) = EP(A) for all A ∈ S and n ∈ N,

i.e., if EP has base-invariant significant digits.

An immediate consequence of Theorems 5.3 and 5.13 is as follows.

Proposition 8.43. For every r.p.m. P, the following are equivalent:

(i) P has scale-unbiased significant digits;

(ii) P∘S^{−1}({1}) = 0 with probability one or, equivalently, EP({10^k : k ∈ Z}) = 0, and P has base-unbiased significant digits;

(iii) EP(A) = B(A) for all A ∈ S, i.e., EP is Benford.

Random probability measures with scale- or base-unbiased significant digits are easy to construct mathematically; see Example 8.46 below. In real-life examples, however, scale- or base-unbiased significant digits should not be taken for granted. For instance, picking beverage-producing companies in Europe at random and looking at the metric volumes of samples of m products from each company is not likely to produce data with scale-unbiased significant digits, since the volumes in this case are probably closely related to liters. Conversion of the data to another unit such as gallons would likely yield a radically different set of first-digit frequencies. On the other hand, if species of mammals in Europe are selected at random and their metric volumes sampled, it seems more likely that this process is unrelated to the choice of human units.

The question of base-unbiasedness of significant digits is most interesting when the units in question are universally agreed upon, such as the numbers of things, as opposed to sizes. For example, picking cities at random and looking at the number of leaves of m-samples of trees from those cities is certainly less base-dependent than looking at the number of fingers of m-samples of people from those cities.

As will be seen in the next theorem, scale- or base-unbiasedness of an r.p.m. implies that sequences of P-random samples are Benford a.s. A crucial point in the definition of an r.p.m. P with scale- or base-unbiased significant digits is that individual realizations of P are not required to have scale- or base-invariant significant digits. In fact, it is often the case (see Benford's original data in [9] and Example 8.46 below) that a.s. none of the random probabilities has either scale- or base-invariant significant digits, and it is only on average that the sampling process does not favor one scale or base over another. Recall that P∘S^{−1}({1}) = 0 is the event {ω ∈ Ω : P(ω)({S = 1}) = 0}.

Theorem 8.44 ([74]). Let P be an r.p.m., and assume that P either has scale-unbiased significant digits or has base-unbiased significant digits, and that P∘S^{−1}({1}) = 0 with probability one. Then, for all m ∈ N, every sequence (X1, X2, …) of P-random m-samples is Benford with probability one, that is, for all t ∈ [1, 10),

#{1 ≤ n ≤ N : S(Xn) ≤ t}/N → log t a.s. as N → ∞.

Proof. Assume first that P has scale-unbiased significant digits, i.e., the probability measure EP has scale-invariant significant digits. By Theorem 5.3, EP is Benford. Consequently, Lemma 8.41 implies that for every sequence (X1, X2, …) of P-random m-samples and every t ∈ [1, 10),

#{1 ≤ n ≤ N : S(Xn) ≤ t}/N = #{1 ≤ n ≤ N : Xn ∈ ∪_{k∈Z} 10^k[1, t]}/N → EP(∪_{k∈Z} 10^k[1, t]) = log t a.s. as N → ∞.

Assume in turn that P∘S^{−1}({1}) = 0 with probability one, and that P has base-unbiased significant digits. Then

EP(S^{−1}({1})) = ∫_Ω P∘S^{−1}(ω)({1}) dP(ω) = 0.

Hence q = 0 holds in (5.6) with P replaced by EP, proving that EP is Benford, and the remaining argument is the same as before. □

Corollary 8.45. If an r.p.m. P has scale-unbiased significant digits, then for every m ∈ N, every sequence (X1, X2, …) of P-random m-samples, and every d ∈ {1, 2, …, 9},

#{1 ≤ n ≤ N : D1(Xn) = d}/N → log(1 + d^{−1}) a.s. as N → ∞.

A main point of Theorem 8.44 is that there are many natural sampling procedures that lead to the same logarithmic distribution. This may help explain how the different empirical evidence of Newcomb, Benford, Knuth, and Nigrini all led to the same law. It may also help explain why samples of numbers from newspaper front pages [9] and the World Wide Web [97] (see Figure 1.3) or extensive accounting data [112] often tend toward Benford's law, since in each of these cases, various distributions are being sampled in a presumably unbiased way. In a newspaper, perhaps the first article contains statistics about population growth, the second article about stock prices, the third about forest acreage. None of these individual distributions itself may be unbiased, but the mixture may well be.

Justification of the hypothesis of scale- or base-unbiasedness of significant digits in practice is akin to justification of the hypothesis of independence (and identical distribution) when applying the Strong Law of Large Numbers or the Central Limit Theorem to real-life processes: Neither hypothesis can be formally


proved, yet in many real-life sampling procedures they appear to be reasonable assumptions.

Many standard constructions of an r.p.m. automatically have scale- and base-unbiased significant digits, and thus satisfy Benford's law in the sense of Theorem 8.44.

Example 8.46. Recall the classical Dubins–Freedman construction of an r.p.m. P described in Example 8.34. It follows from Theorem 4.2 and [51, Lem. 9.28] that EP is Benford, so P has scale- and base-unbiased significant digits. Note, however, that with probability one P will not have scale- or base-invariant significant digits. It is only on average that these properties hold but, as demonstrated by Theorem 8.44, this is enough to guarantee that random sampling from P will generate Benford sequences a.s.

In the Dubins–Freedman construction, the fact that FP(10^{1/2}), FP(10^{1/4}), FP(10^{3/4}), etc. are chosen uniformly from the appropriate intervals is not crucial: If Q is any probability measure on (0, 1), and the values of FP(10^{1/2}), etc. are chosen independently according to an appropriately scaled version of Q, then, for the r.p.m. thus generated, EP will still be Benford, provided that Q is symmetric with respect to 1/2; see [51, Thm. 9.29]. As a matter of fact, real-world processes often exhibit this symmetry in a natural way: Many data may be equally apt to be recorded using certain units or their reciprocals, e.g., in miles per gallon vs. gallons per mile, or Benford's "candles per watt" vs. "watts per candle." This suggests that the distribution of log S should be symmetric with respect to 1/2.

Data having scale- or base-unbiased significant digits may be produced in many ways other than through random samples. If such data are combined with unbiased random m-samples, then the result will again conform to Benford's law in the sense of Theorem 8.44. (Presumably, this is what Benford did when combining mathematical tables with numerical data from newspapers.)
For example, consider the sequence (2^n), which may be thought of as the result of a periodic sampling from a (deterministic) geometric process. Since (2^n) is Benford, any mixture of this sequence with a sequence of unbiased random m-samples will again be Benford.

Finally, it is important to note that many r.p.m.s and sampling processes do not conform to Benford's law, and hence necessarily are scale- and base-biased.

Example 8.47. (i) Let P be the constant r.p.m. P ≡ δ1. Since EP = δ1 has base-invariant significant digits, P has base-unbiased significant digits. Nevertheless, for every sequence (X1, X2, …) of P-random m-samples, the sequence of first significant digits is constant, namely D1(Xn) ≡ 1.

Similarly, if P = λ_{0,1} with probability one, then EP = λ_{0,1} does not have scale- or base-invariant significant digits. Consequently, every sequence of P-random m-samples is an i.i.d. U(0, 1)-sequence and hence not Benford, by Example 3.10(i).

(ii) The r.p.m.s considered in Example 8.33 do not have scale- or base-unbiased significant digits, simply because EP is not Benford.


(iii) As another variant of the classical construction in [51], consider the following way of generating an r.p.m. on [1, 10): First let X_{1/2} be uniform on [1, 10), and set FP(X_{1/2}) = 1/2. Next let X_{1/4} and X_{3/4} be independent and uniform on [1, X_{1/2}) and [X_{1/2}, 10), respectively, and set FP(X_{1/4}) = 1/4 and FP(X_{3/4}) = 3/4, etc. It follows from [51, Thm. 9.21] that

F_{EP}(t) = (2/π)·arcsin(√(log t)), 1 ≤ t < 10,

and hence EP is not Benford. By Proposition 8.43, therefore, the r.p.m. P has scale- and base-biased significant digits.

Remark. The authors do not know of any results pertaining to the rate of convergence in Theorem 8.44 based, say, on m and properties of P; such results may be of practical use as well (see Sections 10.1–10.3).

8.4 RANDOM MAPS

The purpose of this brief section is to illustrate and prove one simple basic theorem that combines the deterministic aspects of Benford's law studied in Chapter 6 with the stochastic considerations of the present chapter. Specifically, it is shown how applying randomly selected deterministic maps successively may generate Benford sequences with probability one. Random maps constitute a wide and intensely studied field, and for stronger results than the ones discussed here the interested reader is referred to [10].

Recall that for any (deterministic or random) sequence (f1, f2, …) of maps taking R or a subset of R into itself, and for every x0 ∈ R, (f^n(x0)) denotes the sequence (f1(x0), f2(f1(x0)), …). The following example illustrates several of the key Benford ideas regarding random maps.

Example 8.48. Let f: R+ → R+ be the map f(x) = √x. The sequence (f^n(x0)) is not Benford for any x0 > 0 because f^n(x0) = x0^{2^{−n}} → 1 as n → ∞. More generally, consider the randomized map

f(x) = √x with probability p, and f(x) = x³ with probability 1 − p,

and assume that, at each step, the iteration of f is independent of the entire past process. If p = 1, this is simply the constant map f(x) = √x, and hence for every x0 ∈ R+ the sequence (f^n(x0)) is not Benford. On the other hand, if p = 0, then Theorem 6.23 and Corollary 6.28 imply that, for almost every x0 ∈ R+, the sequence (f^n(x0)) is Benford. It is plausible to expect that the latter situation persists for small p > 0. As the following theorem shows, this is indeed the case even in situations where the map √x occurs more than half the time: If

p < log 3/(log 2 + log 3) = 0.6131,   (8.17)


then, for a.e. x0 ∈ R+, the (random) sequence (f^n(x0)) is Benford with probability one.

Theorem 8.49 ([10]). Let W1, W2, … be i.i.d. positive random variables, and assume that log W1 has finite variance, i.e., E[(log W1)²] < +∞. For the sequence (fn) of random maps given by fn(x) = x^{Wn} and a.e. x0 ∈ R+, the sequence (f^n(x0)) is Benford with probability one or zero, depending on whether E[log W1] > 0 or E[log W1] ≤ 0.

Proof. For every x ∈ R+ and n ∈ N,

log(fn ∘ … ∘ f1(x)) = (∏_{j=1}^n Wj)·log x = 10^{Bn}·log x,

where Bn = Σ_{j=1}^n log Wj. Assume first that E[log W1] > 0. By the Strong Law of Large Numbers, Bn/n → E[log W1] a.s. as n → ∞, and it can be deduced from Proposition 4.14 that, with probability one, the sequence (10^{Bn} y) is u.d. mod 1 for a.e. y ∈ R. Since log maps the family of (Lebesgue) nullsets into itself, with probability one, (f^n(x0)) is Benford for a.e. x0 ∈ R+. More formally, with (Ω, A, P) denoting the underlying (abstract) probability space, there exists Ω1 ∈ A with P(Ω1) = 1 such that for every ω ∈ Ω1 the sequence (f^n(x0)) is Benford for all x0 ∈ R+\B_ω, where B_ω ∈ B with λ(B_ω) = 0. Denote by N ⊂ R+ × Ω the set of all (x0, ω) for which (f^n(x0)) is not Benford, and let

N_x = {ω ∈ Ω : (x, ω) ∈ N}, x ∈ R+;  N^ω = {x ∈ R+ : (x, ω) ∈ N}, ω ∈ Ω.

Then N_x ∈ A and N^ω ∈ B for all x ∈ R+ and ω ∈ Ω, respectively, and λ(N^ω) = 0 for all ω ∈ Ω1. By Fubini's Theorem,

0 = ∫_Ω λ(N^ω) dP(ω) = ∫_{R+×Ω} 1_N d(λ × P) = ∫_{R+} P(N_x) dλ(x),

showing that P(N_x) = 0 for a.e. x ∈ R+. Thus P((f^n(x0)) is Benford) = 1 for a.e. x0 ∈ R+.

Next assume that E[log W1] < 0. In this case, fn ∘ … ∘ f1(x) → 1 a.s. as n → ∞ for every x > 0, and hence (f^n(x)) is not Benford. (Note, however, that (fn ∘ … ∘ f1(x) − 1) may be Benford.)

Finally, it remains to consider the case E[log W1] = 0. It follows from the Law of the Iterated Logarithm that, for every t ∈ R+,

lim sup_{N→∞} #{1 ≤ n ≤ N : Bn ≤ t}/N ≥ 1/2 with probability one.

Clearly, this implies P((f^n(x0)) is Benford) = 0 for every x0 ∈ R+. □



Example 8.50. (i) For the random map in Example 8.48,

    P(W1 = 1/2) = p = 1 − P(W1 = 3),

and the condition E[log W1] = −p log 2 + (1 − p) log 3 > 0 is equivalent to (8.17). Note that E[log W1] > 0 is not generally equivalent to the equally plausible condition E[W1] > 1; in the present example, the latter simply reads p < 4/5.

(ii) Let (f1, f2, . . .) be the sequence of random maps fn(x) = x^{10^{2n+Zn}}, where Z1, Z2, . . . are i.i.d. standard Cauchy random variables. Note that Theorem 8.49 does not apply since E[(log W1)²] = E[(2 + Z1)²] = +∞. Nevertheless, it follows from Bn = n(n + 1) + Σ_{j=1}^n Zj and [36, Thm. 5.2.2] that Bn/n² → 1 a.s. as n → ∞. The latter is enough to deduce from Proposition 4.14 that, with probability one, (10^{Bn} y) is u.d. mod 1 for a.e. y ∈ R. The same argument as that in the above proof shows that P((f^n(x0)) is Benford) = 1 for a.e. x0 ∈ R+. Thus the conclusions of Theorem 8.49 may hold under weaker assumptions. Statements in the spirit of Theorem 8.49 are true also for random maps more general than monomials [10]. z
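The two cutoffs contrasted in part (i) are easy to check numerically (an informal aid, not part of the book; the helper names `e_log_w` and `e_w` are hypothetical): E[log W1] > 0 fails precisely at p = log 3/(log 2 + log 3) ≈ 0.6131, whereas E[W1] = p/2 + 3(1 − p) > 1 fails only at p = 4/5.

```python
import math

def e_log_w(p):
    # E[log W1] with W1 = 1/2 (prob. p) or W1 = 3 (prob. 1 - p).
    return -p * math.log(2) + (1 - p) * math.log(3)

def e_w(p):
    # E[W1] for the same two-point distribution.
    return p * 0.5 + (1 - p) * 3.0

p_log = math.log(3) / (math.log(2) + math.log(3))  # zero of e_log_w, the bound in (8.17)
p_mean = 4 / 5                                     # solves e_w(p) = 1

print(round(p_log, 4))                         # ≈ 0.6131
print(e_log_w(0.61) > 0, e_log_w(0.62) > 0)    # True, False: sign flips at p_log
print(e_w(0.79) > 1, e_w(0.81) > 1)            # True, False: sign flips at 4/5
```

So for p between 0.6131... and 0.8 the map x ↦ x^{W} is expanding on average (E[W1] > 1) and yet its orbits are almost surely not Benford, since E[log W1] ≤ 0.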

Chapter Nine

Finitely Additive Probability and Benford's Law

Benford's law, as was seen in earlier chapters, arises naturally in a wide variety of standard mathematical contexts. It is the purpose of this short chapter to illustrate that Benford's law is robust even with regard to basic underlying mathematical hypotheses. Recall that throughout the earlier chapters of this book, Benford's law has been studied exclusively in the standard countably additive measure-theoretic setting of probability, where a probability measure P is a real-valued function defined on a σ-algebra A of subsets of a set Ω satisfying the following three basic axioms:

    P(A) ≥ 0 for all A ∈ A;    (9.1)

    P(Ω) = 1;    (9.2)

    P(⋃_{n∈N} An) = Σ_{n∈N} P(An) for all disjoint A1, A2, . . . ∈ A.    (9.3)

Working within this countably additive framework has several important implications for the theory of Benford's law. First, the σ-algebra A in (9.1) usually does not include all subsets A of Ω, so there are sets C ⊂ R whose Benford probability B(C) does not exist; see Definition 3.7. Second, there are no countably additive Benford probability distributions on the positive integers N: If each n ∈ N had probability zero then the total probability of N would be zero, whereas if some integer n had strictly positive probability, that would contradict the basic equation (1.2), which implies that the probability of a number having exactly the same first m significant digits as n goes to zero as m → ∞; more formally,

    P({n}) ≤ P({k ∈ N : Dj(k) = Dj(n) for j = 1, . . . , m}) → 0 as m → ∞,

contradicting P({n}) > 0. There are even no countably additive Benford probability measures on the significand σ-algebra SN on N defined in Example 2.14, as was shown prior to Example 3.9. Similarly, there are no such Benford probability measures on the power σ-algebra (i.e., the set of all subsets) of T+, the set of positive terminating decimals (10-adic numbers) defined by

    T+ = {t ∈ R+ : 10^n t ∈ N for some n ∈ N}.
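The displayed bound can be made concrete numerically (an informal sketch, not from the book; the function name is hypothetical): under any Benford distribution, the probability that a number shares the first m significant digits of a fixed n is log10 of a ratio that shrinks rapidly in m, so it cannot stay above any fixed P({n}) > 0.

```python
import math

def benford_prob_same_digits(significand, m):
    """Benford probability that a number's first m significant digits agree
    with those of a number whose significand is given (1 <= significand < 10)."""
    t = math.floor(significand * 10 ** (m - 1)) / 10 ** (m - 1)  # truncate to m digits
    return math.log10((t + 10 ** (1 - m)) / t)

s = 2.718281  # significand of a fixed n (leading digits 2, 7, 1, 8, ...)
probs = [benford_prob_same_digits(s, m) for m in range(1, 7)]
print(probs)  # strictly decreasing toward 0
```

For m = 1 this is just log10(3/2) ≈ 0.176, and each additional matched digit divides the interval of admissible significands by 10, driving the probability to zero.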

Raimi wrote that T+ seemed "particularly appropriate as the model for tabular data" [127, p. 530], since "actual tables of constants are generally written (or approximated) as decimal fractions" [126, p. 347]. Thus, if one views the tables of constants in Benford's original data or the "stock of tabular data in the world's libraries" [128, p. 217] as a sample from an underlying universal distribution on all of T+, then there is no countably additive probability explanation for this model. Generalizing probabilities to the finitely additive framework, on the other hand, guarantees the existence of Benford probability distributions that are defined on all subsets of positive numbers. Similarly, there are finitely additive Benford probability distributions defined on the class of all subsets of N, and on all subsets of T+. It is perhaps reassuring to know that such Benford probability distributions do exist in some natural sense, even though they may not exist in the theory of countably additive probability.

This chapter records, without proof, several of the main theorems pertinent to a theory of Benford's law in the context of finitely additive probability, and points the interested reader to further references on that theory in the literature. The chapter may be read independently of the rest of the book, or, of course, be skipped entirely.

9.1 FINITELY ADDITIVE PROBABILITIES

A finitely additive probability measure P* differs from the traditional probability measure P defined in (9.1)–(9.3) in two important respects. First, P* is defined on all subsets of Ω, and second (the finitely additive part), the fundamental conditions (9.1) and (9.2) are again required, but (9.3) is replaced by its much simpler special case

    P*(A ∪ B) = P*(A) + P*(B) if A ∩ B = ∅, A, B ⊂ Ω.    (9.3′)

From a philosophical standpoint, finitely additive probability measures have sometimes been viewed as preferable to countably additive ones since (9.3′) is simpler and more natural than (9.3). L. Dubins and L. Savage reported that "De Finetti always insisted that countable additivity is not an integral part of the probability concept but is rather in the nature of a regularity hypothesis," and they themselves viewed "countably additive measures much as one views analytic functions — as a particularly important case" [52, p. 10]. Moreover, since it is impossible to verify (9.3) in a finite number of operations, it is impossible to distinguish between a finitely additive and a countably additive probability measure in practice [125].

There are, of course, both mathematical advantages and disadvantages to replacing (9.3) with (9.3′). One practical advantage of using finitely additive probabilities is that they are defined on all subsets of Ω, thus eliminating tedious, cumbersome, and often ultimately unnecessary measurability technicalities typically encountered in the countably additive theory. On the other hand, without (9.3), many of the most basic analytic tools in traditional probability — tools repeatedly used in the proofs in the rest of this book — are not available. For example, the fundamental continuity theorem of probability,

    P(⋃_{n∈N} An) = lim_{N→∞} P(⋃_{n=1}^N An),

may fail for finitely additive probability measures, so many basic limiting arguments may fail as well.

Recall that in standard (i.e., countably additive) measure theory, a measure on a σ-algebra A on a set Ω is often constructed as follows: First a countably additive nonnegative set function (also referred to as a pre-measure) is defined only on an algebra F on Ω. (A non-empty family of subsets of Ω is an algebra if it is closed under complements and finite unions; note that this notion is reminiscent of, but weaker than that of, a σ-algebra, because the latter, by definition, is closed even under countable unions.) This pre-measure is then extended to a bona fide measure on A = σ(F), the σ-algebra on Ω generated by F. Under mild conditions, this procedure yields a unique measure on σ(F), but this σ-algebra does not generally include all subsets of Ω; e.g., see [27]. A classic example is the extension of the notion of length from finite intervals to B, the family of all Borel subsets of R, but not to all subsets of R. Finitely additive measures defined on an algebra of sets F, on the other hand, extend to finitely additive measures defined on all subsets of Ω, and this allows strong conclusions regarding Benford's law. Note that in the following key result due to A. Horn and A. Tarski, recorded here for ease of reference, the extension need not be unique, in contrast to the countably additive context.

Proposition 9.1. [81, Thm. 1.22] Let Ω ≠ ∅ be a set and F an algebra on Ω. Every finitely additive probability measure defined on F may be extended to a finitely additive probability measure on all subsets of Ω.

The next three examples illustrate the power and generality of Proposition 9.1, and contrast it with the countably additive extension theory. These ideas help illustrate some of the key differences between finitely and countably additive probability measures, and will be applied directly to Benford's law in the next section.
In analogy to their countably additive counterparts, finitely additive probability measures on Ω ⊂ R are denoted by P*, Q*, etc. throughout.

Example 9.2. There exists a finitely additive probability measure P* defined on all subsets of [0, 1] that agrees with Lebesgue measure λ on all Borel sets, that is, P*([a, b]) = λ([a, b]) = b − a for all intervals [a, b] ⊂ [0, 1] and, more generally, P*(B) = λ(B) for all Borel subsets B of [0, 1]. To see this, simply note that Lebesgue measure λ on [0, 1] is a finitely (in fact, countably) additive probability measure defined on the algebra (in fact, σ-algebra) B[0, 1] on [0, 1]. Hence by Proposition 9.1, there is a finitely additive probability measure P*, defined on all subsets of [0, 1], that agrees with λ on all Borel subsets of [0, 1]. (In fact, it can be shown that P* may even be chosen to be countably additive — provided that large (measurable) cardinals exist [83].) z

Example 9.3. (Attributed in [52] to B. De Finetti.) There exists a finitely additive probability measure Q* defined on all subsets of [0, 1] ∩ Q, the set of rational numbers in [0, 1], such that

    Q*([a, b] ∩ Q) = b − a for all 0 ≤ a ≤ b ≤ 1.    (9.4)

To see this, apply Proposition 9.1 to extend the finitely additive probability measure Q* satisfying (9.4), originally defined only on the algebra of finite (disjoint) unions of intervals [a, b] ∩ Q with [a, b] ⊂ [0, 1]. In contrast to Example 9.2, however, it clearly is impossible to make Q* countably additive, because [0, 1] ∩ Q is countable and by (9.4) each singleton must have measure zero. z

Recall the notion of (natural) density ρ of sets C ⊂ N appearing in Sections 3.1 and 5.1, defined as ρ(C) = lim_{N→∞} #{n ≤ N : n ∈ C}/N whenever this limit exists. The next example, not an immediate consequence of Proposition 9.1, relates this concept of density to finitely additive probability measures.

Example 9.4 ([84]). There is a finitely additive probability measure P* defined on all subsets of N that agrees with density on all subsets of N that have density, that is, P*(A) = ρ(A) for all A ⊂ N for which ρ(A) exists. In particular, P*(A) = 0 for every finite set A of positive integers, and every arithmetic progression {l + m, l + 2m, . . .} = {l + nm : n ∈ N} with l ∈ N0 and m ∈ N has P*-probability m^{−1}. z

The theory of finitely additive measures is a well-established subfield of functional analysis, since there is a natural one-to-one correspondence between finitely additive measures and certain linear functionals; for a detailed treatise of this material, the reader is referred to Chapter III of the classic text [53].
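As a concrete illustration of the density measure in Example 9.4 (an informal aid, not part of the book's development; the helper name is hypothetical), the sketch below computes empirical densities for an arithmetic progression and for a finite set: the progression's density approaches 1/m, while every finite set has density 0 — which is also why such a measure cannot be countably additive, since N is a countable union of singletons of measure zero.

```python
# Empirical natural density: rho(C) ≈ #{n <= N : n in C} / N for large N.
def empirical_density(predicate, N):
    return sum(1 for n in range(1, N + 1) if predicate(n)) / N

N = 100_000

# The arithmetic progression {7 + 3n : n in N} = {10, 13, 16, ...} has density 1/3 ...
d_prog = empirical_density(lambda n: n > 7 and (n - 7) % 3 == 0, N)

# ... while any finite set, e.g. {1, ..., 1000}, has density 0 in the limit.
d_finite = empirical_density(lambda n: n <= 1000, N)

print(d_prog, d_finite)  # first value close to 1/3; second shrinks as N grows
```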

9.2 FINITELY ADDITIVE BENFORD PROBABILITIES

From a historical standpoint, it is in the framework of finitely additive probability measures that the first rigorous mathematical results on Benford's law appeared. In their work on Benford's law, R. Bumby and E. Ellentuck wrote that among the conditions (9.1)–(9.3), "countable additivity [(9.3)] seems least necessary," and they replaced it in their theorems with the finite additivity hypothesis (9.3′) [30, p. 34]. Similarly, Raimi wrote "There is no reason a priori why all things subjectively regarded as 'probabilities' should be countably additive, though there are of course many mathematical conveniences," and he concluded that finite additivity is a "closer model to reality" [126, pp. 344–5]. Moreover, finitely additive probabilities also help overcome some of the "shortcomings" of countably additive probabilities with respect to Benford's law. For example, the countably additive scale-invariance characterization (Theorem 5.3) holds only for a proper subset of the Borel σ-algebra (namely, the significand σ-algebra of Definition 2.9), and that subset does not even include intervals of real numbers. As Knuth pointed out, there is no scale-invariant countably additive probability measure on the Borel subsets of the positive real numbers,


i.e., on B+, since existence of such a measure would imply the existence of a unique smallest median, and rescaling would change that median [90]. Thus hypothesizing the existence of a scale-invariant countably additive probability measure on B+, or even on B (as in [124]), leads to a vacuous conclusion, in particular with regard to a Benford distribution.

In this section, certain aspects of finitely additive probability theory summarized in the previous section will be applied to Benford's law. Even without the standard assumption of countable additivity, direct analogues of some of the most important and basic conclusions hold in the finitely additive setting as well. As described below, analogues of the fundamental uniform distribution characterization (Theorem 4.2) also hold in the finitely additive framework, as does the analogue of the countably additive conclusion that scale-invariance implies Benford's law (Theorem 5.3).

Recall the definitions of the significand function S and the significand σ-algebra S, Definitions 2.3 and 2.9, respectively. A natural finitely additive generalization of the countably additive notion of a Benford probability measure (Definition 3.4) is as follows.

Definition 9.5. A finitely additive probability measure P* on a non-empty set E ⊂ R+ is Benford if it is defined on all subsets of E, and satisfies

    P*({x ∈ E : S(x) ≤ t}) = log t for all t ∈ [1, 10).

Note that if P* is Benford then in particular

    P*({x ∈ E : D1(x) = d}) = log(d + 1) − log d for all d ∈ {1, 2, . . . , 9}.
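As a quick numerical sanity check (an informal aid, not from the book), the first-digit probabilities implied by Definition 9.5 can be computed directly; they sum to 1, and the digit 1 receives the familiar mass log 2 ≈ 0.301.

```python
import math

# First-digit probabilities under any Benford measure:
# P*(D1 = d) = log10(d + 1) - log10(d) = log10(1 + 1/d).
probs = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

print(probs[1])           # probability of leading digit 1, equals log10(2)
print(sum(probs.values()))  # total mass over d = 1..9, telescopes to log10(10) = 1
```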

Recall from Section 3.3 that there are no countably additive Benford probability measures defined on all subsets of the positive integers N or on the set of positive terminating decimals T+. Nor are there countably additive probability measures on B+ that are scale-invariant. In contrast to this fact, however, there exist finitely additive Benford probability measures defined on all subsets of N and T+, respectively, as well as finitely additive Benford probability measures defined on all subsets of the significand range [1, 10).

Example 9.6. There is a finitely additive Benford probability measure defined on all subsets of the positive integers N. To see this, let P* be the finitely additive probability measure on the algebra of finite (disjoint) unions of subsets of N of the form {n ∈ N : S(n) ∈ [a, b)}, defined by

    P*({n ∈ N : S(n) ∈ [a, b)}) = log b − log a for all 1 ≤ a ≤ b < 10,

and then, via Proposition 9.1, extend P* to a finitely additive Benford probability measure defined on all subsets of N. z

There also exist finitely additive Benford probability measures defined on all subsets of the positive terminating decimals T+. One such example, of course, is to extend P* of Example 9.6 to all subsets of T+ by setting P*({t}) = 0 for all t ∈ T+ \ N. Another, which is quite different from this one, is the following.

Example 9.7. Consider the finitely additive probability measure Q* on the algebra of finite unions of sets of T+ of the form ⋃_{k∈Z} 10^k [a, b) ∩ T+, defined by

    Q*(⋃_{k∈Z} 10^k [a, b) ∩ T+) = log b − log a for all 1 ≤ a ≤ b < 10,

and extend Q* to the power set of T+ by means of Proposition 9.1. z

Recall from Example 3.6(i) the canonical countably additive Benford probability measure P0 with density function t^{−1} log e on [1, 10), that is,

    P0(B) = ∫_B t^{−1} log e dt for all B ∈ B[1, 10).    (9.5)
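The density in (9.5) can be checked numerically (an informal aid, not part of the book's development; the function name is hypothetical): since the antiderivative of t^{−1} log10 e is log10 t, the measure of [1, t) is log10 t, and the total mass on [1, 10) is 1.

```python
import math

def P0(a, b, steps=100_000):
    """Numerically integrate the Benford density t^(-1) * log10(e) over [a, b]
    with a simple midpoint rule."""
    h = (b - a) / steps
    return sum(math.log10(math.e) / (a + (i + 0.5) * h) for i in range(steps)) * h

print(P0(1, 2))   # close to log10(2) ≈ 0.3010, the Benford mass of digit 1
print(P0(1, 10))  # total mass, close to 1
```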

Countably additive measure theory does not extend absolutely continuous probability distributions (i.e., distributions with densities) to all subsets of R, and so in particular does not extend (9.5) to all subsets of [1, 10). Using finitely additive probabilities, on the other hand, allows extension of the canonical Benford probability measure P0 in (9.5) to all subsets of [1, 10), as the next example shows.

Example 9.8. There is a finitely additive Benford probability measure P* on [1, 10) that extends (9.5) to all subsets of [1, 10), that is, P*(B) = P0(B) for all Borel subsets B of [1, 10). To see this, simply apply Proposition 9.1 to extend the countably additive P0 in (9.5) to a finitely additive P* defined on all subsets of [1, 10). z

The next result is a finitely additive analogue due to T. Jech of both the uniform distribution characterization (Theorem 4.2) of Benford's law and the corresponding characterization by means of Fourier coefficients, obtained via Theorem 4.2 and Lemma 4.20(vi). The hypothesis that the probability is defined on all subsets of E can easily be weakened via Proposition 9.1. Note that, as explained in [30, 52], if a finitely additive probability measure P* is defined on all subsets of E, then the linear and order-preserving nature of the integral uniquely determines the value of ∫ f dP* for all bounded functions f : E → R; also recall that ⟨x⟩ = x − ⌊x⌋.

Theorem 9.9 ([83]). Let P* be a finitely additive probability measure defined on all subsets of a set E ⊂ R+. Then the following are equivalent:

(i) P* is Benford;

(ii) ∫_E f(⟨log x⟩) dP*(x) = ∫_0^1 f(s) ds for every Riemann-integrable function f : [0, 1] → R;

(iii) ∫_E e^{2πık log x} dP*(x) = 0 for every integer k ≠ 0.
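For the canonical measure P0 of (9.5), condition (iii) of Theorem 9.9 can be verified numerically (an informal sketch, not part of the book's argument; the function name is hypothetical): substituting u = log x turns ∫ e^{2πik log x} dP0(x) into ∫_0^1 e^{2πiku} du, which vanishes for every integer k ≠ 0.

```python
import cmath

def fourier_coeff(k, steps=100_000):
    """Midpoint-rule approximation of the integral of exp(2*pi*i*k*u) over
    [0, 1], which equals the integral of exp(2*pi*i*k*log10(x)) against the
    canonical Benford measure P0 after the substitution u = log10(x)."""
    h = 1.0 / steps
    return sum(cmath.exp(2j * cmath.pi * k * (i + 0.5) * h) for i in range(steps)) * h

for k in (1, 2, 3):
    print(k, abs(fourier_coeff(k)))  # all magnitudes close to 0
```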

The next theorem contains a finitely additive analogue of Theorem 5.3, the characterization of Benford's law as the unique scale-invariant countably additive probability measure defined on the significand σ-algebra. Although there are no scale- or translation-invariant countably additive probability measures on Ω = N, T+, or R+, there are finitely additive probability measures on all three sets that are both scale- and translation-invariant, and all of those are necessarily Benford. Though defined on all subsets, the finitely additive version is, on the other hand, not unique.

Theorem 9.10 ([126, 127]). There exist finitely additive probability measures P* defined on all subsets of R+ such that, for all A ⊂ R+ and all a > 0, both

    P*(aA) = P*(A)    (scale-invariance)    (9.6)

and

    P*(a + A) = P*(A)    (translation-invariance).    (9.7)

Moreover, every finitely additive probability P* satisfying (9.6) is Benford.

In addition to (9.6) and (9.7) it may also be required in Theorem 9.10 that P*((0, t)) = 0 for all t ∈ R+, thus "avoiding the philosophically awkward [midpoint] of all physical constants since the resulting measure is concentrated in the neighborhood of infinity [or zero]" [127, p. 530]. The analogous conclusion that scale-invariance of a finitely additive probability measure on N implies that the measure is Benford is given in [30]. The scale- and translation-invariance conditions (9.6) and (9.7) in Theorem 9.10 can be replaced (see [38] or [83]) by the single condition

    P*(2A ∪ (1 + 2A)) = P*(A) for all A ⊂ N.

As observed by Raimi, in view of Theorem 9.10, the additional translation-invariance requirement (9.7) together with scale-invariance (9.6) "means that an affine change of scale, as from Fahrenheit to Celsius, will preserve Benford's law" [127, p. 530]. R. Scozzafava [145] provides an additional finitely additive explanation of Benford's law based on a conditional probability concept ("non-conglomerability") of De Finetti.

Although the finitely additive Benford theorems in this chapter have conclusions identical in spirit to those in the countably additive theory (e.g., that scale-invariance implies Benford's law), their proofs are quite different. The proof of Theorem 9.10, for example, is a "fairly technical exercise in the theory of amenable semigroups" [126, p. 346], and the existence of scale-invariant finitely additive probability measures is "guaranteed by the Markov–Kakutani fixed point theorem" [30, p. 40]. As noted by Raimi, this aspect of finitely additive probability theory also relies on results in the theories of divergent sequences, invariant measures, and the Stone–Čech compactification [126, p. 349].
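The scale-invariance property (9.6) and its Benford consequence can be illustrated with a small deterministic experiment (an informal sketch, not from the book; the helper names are hypothetical): for a Benford-distributed sample, rescaling every value by an arbitrary factor leaves the first-digit frequencies essentially unchanged.

```python
import math

def first_digit(x):
    # Leading significant digit of x > 0: floor of 10 to the fractional
    # part of log10(x).
    return int(10 ** (math.log10(x) % 1))

# A deterministic Benford-distributed sample: significands 10^u for a
# uniform grid of u in [0, 1).
sample = [10 ** ((i + 0.5) / 10_000) for i in range(10_000)]

def digit_freqs(xs):
    counts = [0] * 10
    for x in xs:
        counts[first_digit(x)] += 1
    return [c / len(xs) for c in counts]

f_orig = digit_freqs(sample)
f_scaled = digit_freqs([math.pi * x for x in sample])  # rescale by an arbitrary factor

# Both frequencies of digit 1 should be close to log10(2) ≈ 0.301.
print(f_orig[1], f_scaled[1])
```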

Chapter Ten

Applications of Benford's Law

The purpose of this chapter is to present a brief overview of practical applications of Benford's law, thereby complementing the theory developed in previous chapters. While the interplay between theory and applications of Benford's law has generally proved very fruitful in both directions, several applications in particular have played a prominent role in the development of the theory. For example, in the use of Benford's law to help detect tax fraud, the discovery by Nigrini that many tables of data are approximately sum-invariant [112] inspired the sum-invariance characterization of the law (Theorem 5.18). The question "Do dynamical systems follow Benford's law?", raised in [154] and subsequently addressed in [151], led directly to the discovery of Theorems 6.13 and 6.23, and indirectly to most of the results in Chapters 6 and 7. Observations about the prevalence and implications of Benford's law in scientific computing [90] inspired Theorem 6.35 as well as Propositions 7.34 and 7.35. Empirical evidence of Benford's law in the stock market [98] helped motivate the study of products of random variables in the context of Benford's law (Theorems 8.15 and 8.16). In his original 1938 paper [9], Benford combined data from radically different datasets, and this led to the discovery of the mixing-of-distributions theorem (Theorem 8.44). Conversely, it is hoped that the new theoretical results presented in earlier chapters, notably those relevant to (deterministic or random) dynamical processes, will serve as catalysts for further applications.

While the main goal of this book has been to develop a solid and comprehensive theoretical basis for the appearance of Benford's law in a broad range of mathematical settings, it is the empirical ubiquity of this logarithmic distribution of significant digits that has captured the interest and imagination of a surprisingly wide audience. From natural science to medicine, from social science to economics, from computer science to theology, Benford's law, even in its most basic form, provides a simple analytical tool that invites anyone with numerical tables to look for this easily recognizable pattern in their own data. A glance at the online database [24] quickly reveals the magnitude, breadth, and recent growth of interest in both applications and theory of the significant-digit phenomenon.

In contrast to the rest of the book, this chapter is necessarily expository and informal. It has been organized into a handful of ad hoc categories, which the authors hope will help illuminate the main ideas. None of the conclusions of the experiments or data presented here have been scrutinized or verified by


the authors of this book, since the intent here is not to promote or critique any specific application. Rather the goal is to offer a representative cross-section of the related scientific literature, in the hopes that this might continue to facilitate research in both the theory and practical applications of Benford's law. For useful guidelines on conducting and applying significant-digit analyses in general, the reader may also wish to consult [66].

10.1 FRAUD DETECTION

The most well-known and widespread application of Benford's law is in the field of forensic auditing, in particular, in the statistical detection of accounting fraud, where "fraud" means both data fabrication (inventing data) and data falsification (manipulating and/or altering data values). The underlying idea is this: If certain classes of valid financial datasets have been observed to closely follow a Benford distribution, then fabricated or falsified datasets can sometimes be identified simply by comparing their leading digits against the logarithmic distribution. The main impetus for this line of research was the discovery by Nigrini [113, 114] that certain verified IRS tax data follow Benford's law very closely, but fraudulent data often do not. It is widely accepted that authentic data are difficult to fabricate [75]. Thus, standard goodness-of-fit tests, such as a chi-squared test for first and/or second significant digits, or the Ones-scaling test (Example 5.9), provide simple "red flag" tests for fraud. Whether given data are close to Benford's law or are not close proves nothing, but if true data approximately follow Benford's law, then a poor fit raises the level of suspicion, at which time independent (non-Benford) tests or monitoring may be applied. For cases where specific deviations from true data might be expected, such as in tax returns when data are falsified in order to lower tax liabilities, specialized one-sided goodness-of-fit tests have also been developed [114]. The first success of this method was reported by the Brooklyn District Attorney's office in New York, where the chief financial investigator used an elementary goodness-of-fit test to the first-digit law (1.1) to identify and successfully prosecute seven companies for theft [26]. Similar tests are now routinely employed by many American state revenue services, as well as by the federal IRS and many foreign government tax agencies [114].
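A minimal sketch of such a first-digit chi-squared screen (illustrative only; the function name and the 900-record toy counts are hypothetical, and real audits use the refinements cited above): the statistic is compared with the 5% critical value for 8 degrees of freedom, approximately 15.51.

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]  # P(D1 = d), d = 1..9

def chi_squared_first_digit(counts):
    """Chi-squared statistic comparing observed first-digit counts
    (index 0 holds digit 1) with the Benford expectation."""
    n = sum(counts)
    return sum((obs - n * p) ** 2 / (n * p) for obs, p in zip(counts, BENFORD))

# Toy data: a Benford-conforming ledger vs. a fabricated one with
# uniformly chosen leading digits (both hypothetical, roughly 900 records).
benford_counts = [round(900 * p) for p in BENFORD]
uniform_counts = [100] * 9

CRITICAL_5PCT_DF8 = 15.51  # chi-squared critical value, df = 8, alpha = 0.05

print(chi_squared_first_digit(benford_counts) < CRITICAL_5PCT_DF8)  # True: no red flag
print(chi_squared_first_digit(uniform_counts) > CRITICAL_5PCT_DF8)  # True: red flag
```

As the text stresses, a red flag here proves nothing by itself; it only indicates where independent, non-Benford scrutiny may be warranted.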
The detection of fraud in macroeconomic data can also benefit from similar testing. R. Abrantes-Metz et al. studied data from the Libor (London Interbank Offered Rate), and applied goodness-of-fit to Benford's law as a statistical test to identify specific industries and banks with competition issues such as price-fixing, rate manipulation, or collusion [1]. Similarly, T. Michalski and G. Stoltz examined financial data from the International Monetary Fund, and used Benford's law to "find evidence supporting the hypothesis that countries at times misreport their economic data strategically" [107, p. 591].


Due to the simplicity and effectiveness of goodness-of-fit tests based on Benford's law, investigators from a wide variety of domains are now routinely including them in their arsenal of fraud detection techniques. In fact, workshops on Benford's law are appearing in annual fraud conferences such as that of the Association of Certified Fraud Examiners [120]. In the field of medicine, applications range from assessing clinical trial data for new drugs [32] to identifying fraudulent scientific publications in the field of anesthesiology [72]. Political scientists use modified goodness-of-fit tests to study the validity of voting results. W. Mebane found that while low-level vote counts rarely have first digits that satisfy Benford's law, they often have second digits that are a close fit, and using this observation, he applied chi-squared statistical tests to challenge the validity of the 2009 Presidential election results in Iran [105]. Similar analyses have also been performed for voting data from Argentina [33], Germany [28], and Venezuela [153], but the reader should note that there is some debate about the validity of applying tests based on Benford's law to election results [43, 106, 149].

The growing field of image and computer graphics forensics uses Benford's law to detect fraud in visual information processing. E. Del Acebo and M. Sbert report that light intensities in certain classes of natural images and in synthetic images generated using "physically realistic" methods both follow Benford's law closely, whereas other types of images fail to do so [44]. D. Fu et al. developed a "novel statistical model based on Benford's law" for analysis of JPEG images, and observed that this tool is effective for detection of compression and double compression of the images [61]. J.-M. Jolion examined the frequencies of digits in both the gradients and the Laplace transforms of "non particular" images (i.e., images that are not pure noise, fine texture, or synthetic), and compared them to those of artificial images; the former obey Benford's law, whereas the latter do not [85]. F. Pérez-González et al. applied goodness-of-fit tests to the discrete cosine transforms of images to help determine whether certain natural digital images contain hidden messages [122], and B. Xu et al. used a similar technique to help separate computer-generated or "tampered" images from photographic images. They concluded that tests based on Benford's law are comparable to other state-of-the-art methods, but have considerably lower dimension and much lower computational complexity [160].

10.2 DETECTION OF NATURAL PHENOMENA

Just as goodness-of-fit to Benford's law is used to help detect fraud in tax and other human-generated data, it is also being used to detect changes in natural phenomena. The general approach here is essentially the same as that found in the case of fraud — if the significant digits of a certain process in nature exhibit a Benford distribution when the process is in one particular state, but do not when the process is in a different state, then simple goodness-of-fit tests can help identify when changes in the state of the process occur.


Recent studies in earthquake science illustrate this idea. M. Sambridge et al. found that the set of values of the ground displacements before an earthquake, when small shifts are due to background noise, do not closely follow Benford’s law, but goodness-of-f t to Benford’s law increases sharply at the beginning of an earthquake. Observing that the “onset of an earthquake can clearly be manifested in the digit distribution alone” [135, p. 4], they reported detection of a local earthquake near Canberra using f rst signif cant digit information alone, without the necessity of seeing the complete details of the seismogram. They also concluded that goodness-of-f t to Benford’s law may prove useful for comparing the characteristics of earthquakes between regions, or over time [136]. G. Sottili et al. subsequently studied data from approximately 17,000 seismic events in Italy from 2006–2012, and also reported close conformity to Benford’s law for recurrence times of consecutive seismic events [152]. Reporting a similar phenomenon at the quantum level, A. Sen and U. Sen found that the relative frequencies of the signif cant digits are different before and after quantum phase transitions, and concluded that goodness-of-f t to Benford’s law can be used effectively to detect phase transitions in quantum manybody problems, adding that use of Benford’s law tests thus also offers new ways to tackle problems at the interface of quantum information science [147]. In the f eld of medical science, B. Horn et al. applied digital analysis to recorded electroencephalogram signals and found that different states of anesthesia can be detected by goodness-of-f t to Benford’s law [80], and more recently, M. Kreuzer et al. similarly noted that “Benford’s law can be successfully used to detect . . . signal modulations in electrophysiological recordings” [92, p. 183]. Digit analysis applied to certain datasets in interventional radiology by S. Cournane et al. 
was found to be potentially useful for monitoring and identifying system output changes [40]. Goodness-of-fit to Benford's law has also been used to help detect data that originate from several different natural sources, as opposed to data that originate from a single source. R. Seaman analyzed field errors in geopotential analysis and forecasting data in Australia, and associated closeness of fit to Benford's law with the extent of mixing of data from different sources in the underlying distribution of those errors. He reported that goodness-of-fit tests usefully corroborated other methods designed to determine when background field errors are composed of a mixture of different distributions, rather than a single distribution [146].

10.3 DIAGNOSTICS AND DESIGN

In situations where Benford's law is expected or is known to arise, goodness-of-fit tests can also be used to evaluate the predicted outcomes of a mathematical model or the quality of a detection instrument's output. For example, since the 1990, 2000, and 2010 census statistics of populations of all the counties in the United States follow Benford's law fairly closely (see Figure 1.3 and also [102, 113]), a simple diagnostic test to evaluate a given mathematical model's prediction of future populations is the following: enter current values as input, and then check to see how closely the output of that model agrees with Benford's law. H. Varian, for instance, proposed Benford's law as a criterion to help evaluate the “reasonableness” of a forecasting model for land usage in the San Francisco Bay area, and tested the idea using simulation. As with the goodness-of-fit tests described above for detecting fraud, this “Benford-in, Benford-out” principle provides a red flag test; as Varian put it, “if the input data did obey Benford's law while the output didn't . . . well, that would make one a trifle uneasy about actually using that output” [156, p. 65]. More recently, C. Tolle et al. found empirical evidence of Benford's law in certain gas phase and condensed phase experimental molecular dynamics, and concluded that the law can provide a useful diagnostic for selecting dynamical models, since “if the data follow Benford's law, then the model dynamical system should do so as well” [154, p. 331]. Goodness-of-fit tests to Benford's law can also contribute to the explicit design of algorithms or mathematical models. J. Dickinson used this approach to assess the appropriateness of various algorithms employed in business simulation games, noting that their design should generate output close to real-world business data, which he found were a good fit to Benford's law [48]. In another design application, M. Chou et al. examined the problem of assigning payoffs in fixed-odds numbers games, where the prizes and odds of winning are known at the time of placing the wager. Their objective was to determine how a game operator should set the sales limit, and empirical evidence showed that players have a tendency to bet on small numbers that are closely related to events around them.
Viewing these numbers as being drawn from many different datasets (birth dates, addresses, etc.), they then used the connection between mixing datasets and Benford's law to propose guidelines for setting appropriate sales limits in those games [35]. Benford's law is also making an appearance as a diagnostic tool for quality control of datasets produced by measurement instruments and other devices. P. Manoochehrnia et al., appealing to the scale-invariance and mixing-of-distributions characterizations of Benford's law (Theorems 5.13 and 8.44, respectively), applied significant-digit analysis to the evaluation of the detection efficiency of lightning location networks. Studying cloud-to-ground flashes detected in Switzerland from 1997 to 2007, they found the data “in very good agreement with Benford's law,” but found the data for detection of low-energy lightning strikes (absolute peak currents less than 5 kA) in significantly poorer agreement. Past methods of performance evaluation for cloud-to-ground flashes, which included erection of instrument towers and artificially triggered lightning, were assumed to be of higher quality than the detection of the low-energy strikes, and thus closeness to Benford's law was associated with accuracy of detection. They concluded that goodness-of-fit to Benford's law could be used to evaluate detection efficiencies, and suggested subsequent test applications for analyzing lightning data over regions close to the boundaries of lightning location networks, where the efficiency of the network is also expected to decrease significantly [103]. M. Orita et al. demonstrated that certain classes of drug discovery data follow Benford's law, and they “propose a protocol based on Benford's law which will provide researchers with an efficient method for data quality assessment” [118, 119]. Docampo et al. propose goodness-of-fit to Benford tests as a quality control for aerobiological datasets [49], and in the field of public health, A. Idrovo et al. found that a significant-digit analysis of the A(H1N1) influenza pandemic proved a useful tool for evaluating the performance of public health surveillance systems [82].

10.4 COMPUTATIONS AND COMPUTER SCIENCE

The appearance of Benford's law in real-life scientific computations is now widely accepted as an empirical fact (e.g., see [7, 57, 62, 70, 78, 90, 141]), and theoretical support for this is found in its pervasiveness in large classes of deterministic and random processes (see Chapters 6, 7, and 8). Thus, from both theoretical and practical standpoints, Benford's law plays an important role in the analysis of scientific computations using digital computers. In floating-point arithmetic, for example, Benford's law can be used to help analyze various types of inherent computational errors. Even using the IEEE Standard “Unbiased” Rounding, calculations invariably contain round-off errors, and the magnitudes of possible errors may not be equally likely in practice. As pointed out by Knuth, an assumption of uniformly distributed fraction parts in calculations using floating-point arithmetic tends to underestimate the average relative round-off error in cases where the actual statistical distribution of fraction parts is skewed toward smaller leading significant digits [90]. And Benford's law has exactly this property: the leading digits tend to be small. In using an algorithm to estimate an unknown value of some desired parameter, a rule for stopping the algorithm must be specified, such as “stop when n = 10^6” or “stop when the difference between successive approximations is less than 10^{-6}.” Every stopping rule will invariably result in some unknown round-off error, and if its true statistical distribution follows Benford's law, but the error is assumed instead to be uniformly distributed, this can lead to a considerable underestimate of the round-off error. In particular, if X denotes the absolute round-off error, and Y denotes the significand of the approximation at the time of stopping, then the relative error is X/Y. Assuming that X and Y are independent, the average (i.e., expected) relative error is simply E[X]E[1/Y].
If the significand Y is naïvely assumed to be uniformly distributed on [1, 10), then the average relative error is $\int_1^{10} \frac{1}{9}\, t^{-1}\, dt \approx 0.2558$ times the expected absolute error E[X], whereas if the true distribution of Y is actually Benford, then the true average relative error is $\int_1^{10} t^{-2} \log e \, dt \approx 0.3908$ times the expected absolute error. Thus, in numerical algorithms based on Newton's method, or in numerical solutions of the many classes of deterministic or random processes where Benford's law is known to appear, ignoring the fact that Y is Benford leads to an average underestimation of the relative error by more than one third. A second type of error in scientific calculations using digital computers is overflow (or underflow), which occurs when the running calculations exceed the largest (or smallest, in absolute value) floating-point number allowed by the computer. Feldstein and Turner found that “under the assumption of the logarithmic distribution of numbers [i.e., Benford's Law], floating-point addition and subtraction can result in overflow or underflow with alarming frequency” [57, p. 241], and they suggested that special attention be given to overflow and underflow errors in any computer algorithm whose output is expected to follow a Benford distribution, such as the estimation of roots by means of Newton's method; see Theorem 6.35. To help address this issue, they proposed “a long word format which will reduce the risks [of overflow and underflow errors] to acceptable levels” [57, p. 241]. Because of its prevalence in scientific calculations, Benford's law may also have implications for the design of computer hardware and software. For example, nearly half a century ago, Hamming provided a number of applications, including the theoretical design problem of where to place the decimal point in order to minimize the number of normalization shifts in the calculation of products, and he showed that a Benford distribution of significands is balanced in the sense that it yields equal probability of requiring a shift to the right or to the left.
He also applied his empirically observed Benford distribution of significands in computations to estimate the maximum relative and average relative representation errors (in order to obtain the mean and variance of the estimated propagation errors) and to optimize a library routine for minimizing running time [70]. P. Schatte, similarly hypothesizing that significands tend toward the Benford distribution after a long series of computations, addressed the question of finding the best base b = 2^n for digital computers. Under the assumption of a Benford distribution of significant digits, he found that from the standpoint of effective storage use (“effektive Bitnutzung”), b = 2^3 = 8 is optimal, and, under the same Benford hypothesis, he showed how the speed of floating-point multiplication and division can be determined [137, 141]. More recently, B. Ravikumar studied the problem of computing the leading significant digits of integral powers of positive integers, and used the theory of Benford's law, especially the uniform distribution characterization (Theorem 4.2), to prove that certain natural formal languages are not unambiguous and context-free [129]. Also in the field of software development, Jolion suggested an application to entropy-based image coding, which he found to be efficient and useful for data transmission [85], and E. Goldoni et al. developed a source coding technique for wireless sensor networks based on Benford's law [65]. Though outside the scope of this book, a survey of the impact of these findings on the evolution of digital computer hardware and software would be of interest to both theoreticians and practitioners.
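Two quantitative claims in this section are easy to verify numerically: the factors 0.2558 and 0.3908 relating the average relative round-off error to the expected absolute error, and Hamming's observation that a Benford distribution of significands makes normalization shifts in multiplication equally likely in either direction. A minimal sketch (the Monte Carlo sample size is arbitrary):

```python
import math
import random

# Expected reciprocal significand E[1/Y] under the two competing assumptions.
# Uniform on [1, 10): E[1/Y] = integral of (1/9) * t**-1 dt = ln(10)/9
e_inv_uniform = math.log(10) / 9
# Benford, with significand density 1/(t * ln 10) on [1, 10):
# E[1/Y] = integral of t**-2 * log10(e) dt = (1 - 1/10) / ln(10)
e_inv_benford = (1 - 1 / 10) / math.log(10)
print(f"{e_inv_uniform:.4f}")  # 0.2558
print(f"{e_inv_benford:.4f}")  # 0.3909 (0.3908... before rounding)
# Assuming uniformity understates the average relative error by about a third:
print(f"{1 - e_inv_uniform / e_inv_benford:.3f}")  # 0.345

# Hamming's balance property: if S is Benford, log10(S) is uniform on [0, 1).
# Multiplying two significands in [1, 10) needs a normalization shift exactly
# when the product is >= 10; under Benford this happens with probability 1/2.
random.seed(1)
N = 200_000
shifts = sum(10 ** random.random() * 10 ** random.random() >= 10
             for _ in range(N))
print(round(shifts / N, 2))  # ≈ 0.5 (Monte Carlo)
```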

10.5 PEDAGOGICAL TOOL

The increasing adoption of Benford's law techniques by auditors, scientists, and other researchers has made analysis of significant digits an attractive subject for educators, and it is now being incorporated into curricula at several academic levels, both as a concrete statistical tool and as a general instructional tool. In a workshop for the Australian Association of Mathematics Teachers, P. Turner described lessons on Benford's law for high school students [155], and D. Aldous and T. Phan used Benford's law to teach university undergraduate students the conceptually difficult task of testing proposed explanations of statistical phenomena [3]. A. Gelman and D. Nolan found that concepts related to Benford's law help college students learn about statistical sampling procedures precisely because these concepts are both simple to state and counterintuitive [64]. M. Linville observed that class assignments requiring students to create fictitious data proved very effective after the instructor had detected the students' fabrications using goodness-of-fit to Benford's law. He found that their curiosity was then satisfied by a discussion of digital analysis and Benford's law, and the usefulness of the techniques was “abundantly clear to the students without the necessity of instructor emphasis” [99, p. 56]: students were motivated to learn about Benford's law, digital analysis, and goodness-of-fit after seeing just how effective it was in ferreting out their own fabrications. In addition to statistics classroom use, Benford's law material is also being employed as an educational tool in other areas of mathematics. K. Ross's article on first digits of squares and cubes included a section on “suggested undergraduate research” on Benford's law [133]. R. Nillsen used the ideas underlying the uniform distribution and dynamical systems aspects of Benford's law in his attempt to “bridge a gap between undergraduate teaching and the research level in mathematics” [115]; and S. Wagon utilized computer graphic demonstrations of Benford's law to illustrate the effectiveness and versatility of Mathematica in the classroom as well as in research [157].

List of Symbols

N, N0, Z   set of positive integers, nonnegative integers, integers
Q, R+, R, C   set of rational, positive real, real, complex numbers
[a, b)   (half-open) interval [a, b) = {x ∈ R : a ≤ x < b} with a < b
D1, D2, etc.   first, second, etc. significant decimal digit
Dm^(b)   mth significant digit base b, b ≥ 2
log x   logarithm base 10 of x ∈ R+
logb x   logarithm base b of x ∈ R+
ln x   natural logarithm (base e) of x ∈ R+
∆   deviation (in percent) from the first-digit law; ∆ = 100 · max_{1≤d≤9} |Prob(D1 = d) − log(1 + d^{-1})|
#A   cardinality (number of elements) of finite set A
U(a, b)   random variable uniformly distributed on (a, b) with a < b
S   (decimal) significand function
⌊x⌋   largest integer not larger than x ∈ R
⟨x⟩   fractional part of x ∈ R; ⟨x⟩ = x − ⌊x⌋
A   σ-algebra on some non-empty set Ω
σ(E)   σ-algebra generated by the collection E of subsets of Ω
B   Borel σ-algebra on R or parts thereof
σ(f)   σ-algebra generated by the function f : Ω → R
S   significand σ-algebra; S = R+ ∩ σ(S)
(Ω, A, P)   probability space
A^c   complement of A in the set Ω; A^c = {ω ∈ Ω : ω ∉ A}
A\B   set of elements in A but not in B; A\B = A ∩ B^c
A∆B   symmetric difference of A and B; A∆B = (A\B) ∪ (B\A)
λ   Lebesgue measure on (R, B) or parts thereof
λ_{a,b}   normalized Lebesgue measure (uniform distribution) on ([a, b), B[a, b))
δ_ω   Dirac probability measure concentrated at ω ∈ Ω
ρ(C)   (natural) density of C ⊂ N
1_A   indicator function of the set A
(Fn)   sequence of Fibonacci numbers; (Fn) = (1, 1, 2, 3, 5, …)
(pn)   sequence of prime numbers; (pn) = (2, 3, 5, 7, 11, …)
i.i.d.   independent, identically distributed
X, Y, …   (real-valued) random variables Ω → R
E[X]   expected (or mean) value of the random variable X
var X   variance of the random variable X with E[|X|] < +∞; var X = E[(X − E[X])²]
P   probability measure on (R, B), possibly random
P_X   distribution of the random variable X
F_P, F_X   distribution function of P, X
P̂(k)   kth Fourier coefficient of a probability P on B[0, 1); P̂(k) = ∫₀¹ e^{2πıks} dP(s), k ∈ Z
∆∞   deviation (100 × sup-norm) from Benford's law; ∆∞ = 100 · sup_{1≤t<10} …
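A few of the entries above can be made concrete in code. The short sketch below (not from the book; the function names are invented) illustrates the significand function S, the first significant digit D1, and the deviation ∆, directly from the definitions listed:

```python
import math

def significand(x: float) -> float:
    """Decimal significand S(x) in [1, 10), via S(x) = 10**<log10 |x|>."""
    y = math.log10(abs(x))
    return 10 ** (y - math.floor(y))  # <y> = y - floor(y), the fractional part

def first_digit(x: float) -> int:
    """First significant decimal digit D1(x)."""
    return int(significand(x))

def delta(freqs: dict) -> float:
    """Deviation (in percent) from the first-digit law:
    Delta = 100 * max_{1<=d<=9} |Prob(D1 = d) - log10(1 + 1/d)|."""
    return 100 * max(abs(freqs.get(d, 0.0) - math.log10(1 + 1 / d))
                     for d in range(1, 10))

print(round(significand(2015.0), 6))  # 2.015
print(first_digit(0.0314))            # 3
# Exact Benford frequencies have zero deviation:
benford_freqs = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
print(delta(benford_freqs))           # 0.0
```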