Information Theoretic Analysis of Watermarking Systems


Information Theoretic Analysis of Watermarking Systems

by

Aaron Seth Cohen

S.B., Electrical Engineering and Computer Science, Massachusetts Institute of Technology (1997)
M.Eng., Electrical Engineering and Computer Science, Massachusetts Institute of Technology (1997)

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY, September 2001

© Massachusetts Institute of Technology 2001. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, August 31, 2001

Certified by: Amos Lapidoth, Associate Professor of Electrical Engineering, Thesis Supervisor

Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students

Information Theoretic Analysis of Watermarking Systems
by Aaron Seth Cohen

Submitted to the Department of Electrical Engineering and Computer Science on August 31, 2001, in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Abstract

Watermarking models a copyright protection mechanism where an original data sequence is modified before distribution to the public in order to embed some extra information. The embedding should be transparent (i.e., the modified data should be similar to the original data) and robust (i.e., the information should be recoverable even if the data is modified further). In this thesis, we describe the information-theoretic capacity of such a system as a function of the statistics of the data to be watermarked and the desired level of transparency and robustness. That is, we view watermarking from a communication perspective and describe the maximum bit-rate that can be reliably transmitted from encoder to decoder. We make the conservative assumption that there is a malicious attacker who knows how the watermarking system works and who attempts to design a forgery that is similar to the original data but that does not contain the watermark. Conversely, the watermarking system must meet its performance criteria for any feasible attacker and would like to force the attacker to effectively destroy the data in order to remove the watermark. Watermarking can thus be viewed as a dynamic game between these two players, who are trying to minimize and maximize, respectively, the amount of information that can be reliably embedded.

We compute the capacity for several scenarios, focusing largely on Gaussian data and a squared difference similarity measure. In contrast to many suggested watermarking techniques that view the original data as interference, we find that the capacity increases with the uncertainty in the original data. Indeed, we find that out of all distributions with the same variance, a Gaussian distribution on the original data results in the highest capacity. Furthermore, for Gaussian data, the capacity increases with its variance. One surprising result is that with Gaussian data the capacity does not increase if the original data can be used to decode the watermark. This is reminiscent of a similar model, Costa's "writing on dirty paper", in which the attacker simply adds independent Gaussian noise. Unlike with a more sophisticated attacker, we show that the capacity does not change for Costa's model if the original data is not Gaussian.

Thesis Supervisor: Amos Lapidoth
Title: Associate Professor of Electrical Engineering

Acknowledgments

First, I would like to thank my advisor, Prof. Amos Lapidoth, for all of his advice and encouragement. Even from long distance, he has graciously helped me overcome the many hurdles I have encountered throughout the writing of this thesis. I would also like to thank the members of my thesis committee, Prof. Bob Gallager and Prof. Greg Wornell, for their insightful comments. The many comments by Neri Merhav and Anelia Somekh-Baruch have also been extremely helpful.

My brief visit to the ISI in Zurich was quite pleasant, thanks mainly to the friendly and accommodating students there. I would specifically like to thank Renate Agotai for taking care of all of the important details and Ibrahim Abou Faycal for being a great traveling companion.

The intellectual environment created by the faculty, staff and students of LIDS has made it a wonderful place to pursue a graduate education. I would especially like to thank my many office-mates in 35-303 including Randy Berry, Anand Ganti, Hisham Kassab, Thierry Klein, Emre Koksal, Mike Neely, Edmund Yeh and Won Yoon; the office was always a good place for getting feedback about research or (more typically) discussing such important topics as sports or the stock market.

Of course, I could not have done any of this without my parents, Michael and Jackie, who have always encouraged me to vigorously pursue my goals. I would like to thank them and the rest of my family for the support they have given me over the years.

My final and most important acknowledgment goes to my lovely wife Dina Mayzlin. She has motivated and inspired me, and she has made the many years we have spent together at MIT worthwhile.

This research was supported in part by an NSF Graduate Fellowship.

Contents

1 Introduction
   1.1 Outline of Thesis
   1.2 Notation
2 Watermarking Model and Results
   2.1 Precise Definition of Watermarking
   2.2 Capacity Results for Watermarking
      2.2.1 Scalar Gaussian Watermarking Game
      2.2.2 Additive Attack Watermarking Game
      2.2.3 Average Distortion Constraints
      2.2.4 Vector Gaussian Watermarking Game
      2.2.5 Discrete Alphabets, No Covertext
      2.2.6 Binary Watermarking Game
   2.3 Prior Work on Watermarking
      2.3.1 Practical Approaches to Watermarking
      2.3.2 Information-Theoretic Watermarking
      2.3.3 Similar Models: Steganography and Fingerprinting
      2.3.4 Communication Games
   2.4 Assumptions in Watermarking Model
      2.4.1 Is Capacity Meaningful?
      2.4.2 Randomization for Encoder/Decoder
      2.4.3 Randomization for Attacker - Deterministic is Sufficient
      2.4.4 Distortion Constraints
      2.4.5 Statistics of Covertext
   2.5 Uncertainty in the Watermarking Model
      2.5.1 Types of State Generators
      2.5.2 Communication with Side Information
      2.5.3 Arbitrarily Varying Channels
      2.5.4 Extended Writing on Dirty Paper
3 Mutual Information Games
   3.1 Definition and Main Result
   3.2 Proof of Mutual Information Game Result
      3.2.1 Optimal Attack Channel
      3.2.2 Optimal Watermarking Channel
      3.2.3 Analysis
   3.3 Game Theoretic Interpretation
   3.4 Other Mutual Information Games
4 The Scalar Gaussian Watermarking Game
   4.1 Deterministic Attacks
      4.1.1 Deterministic Additive Attack
      4.1.2 Deterministic General Attack
   4.2 Achievability for Private Version
      4.2.1 Coding Strategy
      4.2.2 Analysis of Probability of Error
   4.3 Achievability for Public Version
      4.3.1 Coding Strategy
      4.3.2 Probability of Error
      4.3.3 Distribution of Chosen Auxiliary Codeword
      4.3.4 Analysis for Additive Attack Watermarking Game
      4.3.5 Analysis for General Watermarking Game
   4.4 Spherically Uniform Covertext is Sufficient
   4.5 Converse for Squared Error Distortion
      4.5.1 Attacker
      4.5.2 Analysis of Distortion
      4.5.3 Analysis of Probability of Error
      4.5.4 Discussion: The Ergodicity Assumption
5 The Vector Gaussian Watermarking Game
   5.1 Diagonal Covariance is Sufficient
   5.2 Definitions
   5.3 Outline of Proof
   5.4 Achievability for the Private Version
   5.5 Achievability for the Public Version
      5.5.1 Codebook Generation
      5.5.2 Encoding
      5.5.3 Decoding
      5.5.4 Probability of Error
   5.6 Optimization Results
      5.6.1 Proof of Lemma 5.3
      5.6.2 Proof of (5.17)
   5.7 The Optimal Attack and Lossy Compression
      5.7.1 Compression Attack
      5.7.2 Designing for a Compression Attack
6 Watermarking with Discrete Alphabets
   6.1 No Covertext
      6.1.1 Definitions
      6.1.2 Achievability
      6.1.3 Converse
   6.2 Binary Covertext
      6.2.1 Private Version
      6.2.2 Public Version
7 Conclusions
   7.1 Future Research
      7.1.1 Gaussian Sources with Memory
      7.1.2 Discrete Memoryless Covertext
      7.1.3 Deterministic Code Capacity for Public Version
      7.1.4 Multiple Rate Requirements
A Definitions for Gaussian Covertext
B Technical Proofs
   B.1 Proof of Theorem 2.3
   B.2 Proof of Lemma 2.1
   B.3 Proof of Lemma 3.1
   B.4 Proof of Lemma 3.2
   B.5 Proof of Lemma 3.3
   B.6 Proof of Lemma 4.3
   B.7 Proof of Lemma 4.6
   B.8 Proof of Lemma 4.8
   B.9 Proof of Lemma 4.10
   B.10 Proof of Lemma 4.12
   B.11 Proof of Lemma 4.13
   B.12 Proof of Lemma 5.6
   B.13 Proof of Lemma 5.10
Bibliography

List of Figures

1-1 A diagram of watermarking. The dashed line is used in the private version, but not in the public version.
2-1 Scalar Gaussian watermarking capacity versus $\sigma_u^2$ with $D_1 = 1$ and $D_2 = 4$. The dashed line is the capacity expression from [MO99, MO00].
2-2 Binary watermarking capacity (private and public versions) versus $D_1$ with $D_2 = 0.15$.
2-3 Watermarking model with state sequences.
2-4 Communication with side information at the encoder.
2-5 Gaussian arbitrarily varying channel: U is an IID Gaussian sequence and s is an arbitrary power-constrained sequence.
2-6 Writing on dirty paper. U and Z are independent IID Gaussian sequences.
4-1 Example codebook for public version. Dashed vectors are in bin 1 and dotted vectors are in bin 2.
4-2 Example encoding for public version with w = 1 (bin with dashed vectors).
4-3 Example decoding for public version.
5-1 Comparison of watermarking system parameters for the optimal attack and the compression attack.
A-1 Example plots of $C^*(D_1, D_2, \sigma^2)$ and $A^*(D_1, D_2, \sigma^2)$ for different parameter values.

Chapter 1

Introduction

Watermarking can model situations where source sequences need to be copyright-protected before distribution to the public. The copyright needs to be embedded in the distributed version so that no adversary with access to the distributed version will be able to produce a forgery that resembles the original source sequence and yet does not contain the embedded message. The watermarking process should, of course, introduce limited distortion so as to guarantee that the distributed sequence closely resembles the original source sequence. The original source sequence can be any type of data, such as a still image, audio or video, that can be modified slightly and still maintain its inherent qualities.

Watermarking research has exploded over the past several years. For example, see [KP00b, LSL00, PAK99, SKT98] and their extensive references. This interest has stemmed from the ease by which data can now be reproduced and transmitted around the world, which has increased the demand for copyright protection. Furthermore, ordinary encryption is not sufficient since, in order to be enjoyed by the public, the data must be accessed at some point. Thus, there is a need to embed information directly in the distributed data, which is precisely what a watermarking system does. Much of the work on watermarking has focused on designing ad-hoc systems and testing them in specific scenarios. Relatively little work has been done in assessing the fundamental performance trade-offs of watermarking systems. In this thesis, we seek to describe these performance trade-offs.

The main requirements for a watermarking system are transparency and robustness. For copyright protection, transparency means that the distributed data should be "similar" to the original data, while robustness means that the embedded information should be recoverable from any forgery that is "similar" to the distributed data. Another way of thinking about robustness is that only by destroying the data could a pirate remove the copyright. We formalize these requirements by specifying a distortion measure and claiming that two data sequences are "similar" if the distortion between them is less than some threshold. The threshold for transparency is in general different from the threshold for robustness. In this thesis, we view watermarking as a communication system and we seek to find the trade-off between transparency, robustness, and the amount of information that can be successfully embedded. In particular, we find the information-theoretic capacity of a watermarking system depending on the transparency and robustness thresholds and the statistical properties of the data to be watermarked.

For a general watermarking system, the data distributed to the public and the data that is used to recover the embedded information will be different. This difference might be caused by a variety of signal processing techniques, e.g., photocopying or cropping an image and recording or filtering an audio clip. Instead of making any assumptions on the type of signal processing that will take place, we make the conservative assumption that there is a malicious attacker whose sole intent is to disrupt the information flow. For example, a pirate might wish to remove copyright information in order to make illegal copies. In order to please his customers, this pirate would also want the illegal copies to be of the highest quality possible. Conversely, the watermarking system wishes to ensure that the act of removing the embedded information causes the data to be unusable to such a pirate. We can thus think of watermarking as a game between two players: a communication system (encoder and decoder) and an attacker. The players are trying to respectively maximize and minimize the amount of information that can be embedded.

We assume throughout the thesis that the attacker can only use one watermarked version of the original data sequence to create a forgery. That is, we assume one of the following two things about the system. The first possible assumption is that only one watermark will be embedded in each original data sequence. Such an assumption is plausible if the watermark only contains copyright information. The second possible assumption is that even though many different versions exist, an attacker can only access one of them. This assumption is reasonable if the cost of obtaining multiple copies is prohibitively high. A system that has to deal with attackers with different watermarked copies is usually called a fingerprinting system, although the terms "watermarking" and "fingerprinting" are sometimes used interchangeably. See Section 2.3.3 for more about fingerprinting.

[Figure 1-1: A diagram of watermarking. The dashed line is used in the private version, but not in the public version.]

We consider two versions of watermarking. In the private version, the decoder can use both the forgery and the original data sequence to recover the embedded information. In the public version, the decoder must recover the embedded information from the forgery alone. The private version is sometimes called non-oblivious or non-blind watermarking, and the public version is sometimes called oblivious or blind watermarking. The private version is easier to analyze and is applicable, for example, when a copyright system is centralized. The public version is more difficult to analyze, but it is applicable in a much wider context. Surprisingly, we find that when the original data is Gaussian, then the capacity is the same for both versions. However, the capacity-achieving technique is more complex for the public version.

Our model of the watermarking game is illustrated in Figure 1-1 and can be described briefly as follows. A more thorough mathematical model is given below in Section 2.1. The first player consists of the encoder and decoder, who share a secret key $\Theta_1$ that allows them to implement a randomized strategy. The attacker is the second player, and it is assumed to have full knowledge of the first player's strategy. We now discuss the functions of each of the encoder, the attacker and the decoder. The encoder takes the original data sequence U (which we will call the "covertext") and the watermark W and produces the "stegotext" X for distribution to the public. The encoder must ensure that the covertext and the stegotext are similar according to the given distortion measure. The attacker produces a forgery Y from the stegotext, and he must also ensure that the forgery and the stegotext are similar according to the given distortion measure. Finally, the decoder uses the forgery (in the public version) or both the forgery and the covertext (in the private version) in order to produce an estimate of the watermark $\hat{W}$. Although the encoder, attacker and decoder act in that order, it is important to remember that the encoder and decoder are designed first, and then the attacker is designed with knowledge of how the encoder and decoder work, but not with knowledge of the realizations of their inputs.

Although we have and will use copyright protection as the main watermarking application, a modified watermarking model can be used to describe several other scenarios. For example, the covertext could be a signal from an existing transmission technique (e.g., FM radio) and the watermark could be supplemental digital information [CS99]. The stegotext produced by the encoder would be the signal that is actually transmitted. Since the transmitted signal is required to be similar to the original signal, existing receivers will still work, while newer (i.e., more expensive) receivers will be able to decode the supplemental information as well. For this example, instead of an active attacker that arbitrarily modifies the stegotext, it is more reasonable to say that the received signal is simply the transmitted signal plus independent ambient noise. This modified watermarking model can also be used to analyze a broadcast channel (i.e., one transmitter, many receivers) [CS01]. In this case, the transmitter can use its knowledge of the signal it is sending to one user to design the signal it is simultaneously sending to another user. The modified watermarking model with Gaussian covertext and Gaussian ambient noise is also known as "writing on dirty paper" [Cos83]; see Section 2.5.4 for more on this model, including two extensions.

We conclude the introduction by considering an example watermarking system. Let's say that the rock band The LIzarDS has created a new hit song (this corresponds to our covertext U). Instead of directly releasing the song to the public, the band submits it to a watermarking system. This system takes the original song and the watermark (e.g., song title, artist's name, etc.) and produces a version that will be distributed to the public (this is our stegotext X). To respect the band's artistic integrity, the distributed version should be similar to the original version (hence, our transparency requirement). Whenever the song is played on the radio, the watermarking system could decode the watermark and ensure that the proper royalty is paid to the artist. The system could also block copying over the Internet based on the contents of the watermark, as the music industry would like to happen. Finally, the watermarking system would like to be able to recover the information in the watermark even if the distributed song has been altered, but is still essentially the same (hence, our robustness requirement). Note that the watermarking system is primarily interested in the information embedded in the watermark and is only indirectly interested in the song itself through the transparency and robustness requirements. In other words, the part of the watermarking system that listens to the song is only required to extract the watermark and is not required to improve the quality of the song.

1.1 Outline of Thesis

This thesis is organized as follows. In Chapter 2, we give a precise definition of watermarking and give our results on the capacity of a watermarking system for several scenarios. We also compare our watermarking model to other watermarking research and to two well-studied information-theoretic problems: communication with side information and the arbitrarily varying channel. We conclude this chapter with two extensions of a communication with side information problem, Costa's writing on dirty paper, which is similar to our watermarking model. In Chapter 3, we define and solve two mutual information games which are related to the private and public versions of watermarking. Chapters 4, 5 and 6 are devoted to proving the main watermarking capacity results. In Chapter 7, we give some conclusions and some directions for future work.

1.2 Notation

We use script letters, e.g., $\mathcal{U}$ and $\mathcal{X}$, to denote sets. The $n$-th Cartesian product of a set (e.g., $\mathcal{U} \times \cdots \times \mathcal{U}$) is written $\mathcal{U}^n$. Random variables and random vectors are written in upper case, while their realizations are written in lower case. Unless otherwise stated, the use of bold refers to a vector of length $n$, for example $\mathbf{U} = (U_1, \ldots, U_n)$ (random) or $\mathbf{u} = (u_1, \ldots, u_n)$ (deterministic).

For real vectors, we use $\|\cdot\|$ and $\langle\cdot,\cdot\rangle$ to denote the Euclidean norm and inner product, respectively. That is, for any $\boldsymbol{\xi}, \boldsymbol{\phi} \in \mathbb{R}^n$, $\langle\boldsymbol{\xi},\boldsymbol{\phi}\rangle = \sum_{i=1}^n \xi_i \phi_i$, and $\|\boldsymbol{\xi}\| = \sqrt{\langle\boldsymbol{\xi},\boldsymbol{\xi}\rangle}$. If $\langle\boldsymbol{\xi},\boldsymbol{\phi}\rangle = 0$, then we say that $\boldsymbol{\xi}$ and $\boldsymbol{\phi}$ are orthogonal and write $\boldsymbol{\xi} \perp \boldsymbol{\phi}$. We denote by $\boldsymbol{\phi}^\perp$ the linear sub-space of all vectors that are orthogonal to $\boldsymbol{\phi}$. If $\boldsymbol{\phi} \neq \mathbf{0}$, then $\boldsymbol{\xi}|_{\boldsymbol{\phi}}$ denotes the projection of $\boldsymbol{\xi}$ onto $\boldsymbol{\phi}$, i.e.,

$$\boldsymbol{\xi}|_{\boldsymbol{\phi}} = \frac{\langle\boldsymbol{\xi},\boldsymbol{\phi}\rangle}{\|\boldsymbol{\phi}\|^2}\,\boldsymbol{\phi}.$$

Similarly, $\boldsymbol{\xi}|_{\boldsymbol{\phi}^\perp}$ denotes the projection of $\boldsymbol{\xi}$ onto the subspace orthogonal to $\boldsymbol{\phi}$, i.e.,

$$\boldsymbol{\xi}|_{\boldsymbol{\phi}^\perp} = \boldsymbol{\xi} - \boldsymbol{\xi}|_{\boldsymbol{\phi}}.$$

We use $P$ to denote a generic probability measure on the appropriate Borel $\sigma$-algebra. For example, $P_{\mathbf{U}}(\cdot)$ is the distribution of $\mathbf{U}$ on the Borel $\sigma$-algebra of subsets of $\mathcal{U}^n$. Similarly, $P_{X|U}$ denotes the conditional distribution of $X$ given $U$, and $f_{X|U}(x|u)$ denotes the conditional density, when it exists.
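The projection identities above are easy to sanity-check numerically. The following sketch (an illustration added here, not part of the thesis; the helper names are ad hoc) computes the projection of a vector onto another vector and onto its orthogonal complement, then verifies that the two components are orthogonal and sum back to the original vector.

```python
def dot(a, b):
    """Inner product <a, b> of two real vectors."""
    return sum(x * y for x, y in zip(a, b))

def project(xi, phi):
    """Projection of xi onto phi: (<xi, phi> / ||phi||^2) * phi."""
    c = dot(xi, phi) / dot(phi, phi)
    return [c * p for p in phi]

xi = [3.0, 1.0, 2.0]
phi = [1.0, 0.0, 1.0]

xi_par = project(xi, phi)                      # xi restricted to the span of phi
xi_perp = [a - b for a, b in zip(xi, xi_par)]  # xi minus its projection

print(xi_par)             # [2.5, 0.0, 2.5]
print(dot(xi_perp, phi))  # 0.0, i.e. xi_perp lies in the subspace orthogonal to phi
```

Since the orthogonal component is defined as the residual $\boldsymbol{\xi} - \boldsymbol{\xi}|_{\boldsymbol{\phi}}$, the two printed checks together confirm the orthogonal decomposition defined above.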

Chapter 2

Watermarking Model and Results

This chapter is organized as follows. In Section 2.1, we give precise definitions of watermarking and its information-theoretic capacity. In Section 2.2, we summarize our main results on the watermarking capacity for six different scenarios. In Section 2.3, we compare our model and results to prior work that has been done on watermarking. In Section 2.4, some of the assumptions we have made in our watermarking model are discussed, with an emphasis on which assumptions can be dropped and which need improvement. In Section 2.5, we show that watermarking can be thought of as a combination of two well-studied information-theoretic problems: communication with side information and the arbitrarily varying channel. In Section 2.5.4, we consider a specific communication with side information model, Costa's writing on dirty paper, and describe two extensions to this model.

2.1 Precise Definition of Watermarking

We now give a more detailed description of our watermarking model. Recall that this model is illustrated in Figure 1-1 above. Prior to the use of the watermarking system, a secret key (random variable) $\Theta_1$ is generated and revealed to the encoder and decoder. Independently of the secret key $\Theta_1$, a source subsequently emits a blocklength-$n$ covertext sequence $\mathbf{U} \in \mathcal{U}^n$ according to the law $P_{\mathbf{U}}$, where $\{P_{\mathbf{U}}\}$ is a collection of probability laws indexed by the blocklength $n$. Independently of the covertext $\mathbf{U}$ and of the secret key $\Theta_1$, a copyright message $W$ is drawn uniformly over the set $\mathcal{W}_n = \{1, \ldots, \lfloor 2^{nR} \rfloor\}$, where $R$ is the rate of the system. Using the secret key, the encoder maps the covertext and message to the stegotext $\mathbf{X}$.

For every blocklength $n$, the encoder thus consists of a measurable function $f_n$ that maps realizations of the covertext $\mathbf{u}$, the message $w$, and the secret key $\theta_1$ into the set $\mathcal{X}^n$, i.e.,

$$f_n : (\mathbf{u}, w, \theta_1) \mapsto \mathbf{x} \in \mathcal{X}^n.$$

The random vector $\mathbf{X}$ is the result of applying the encoder to the covertext $\mathbf{U}$, the message $W$, and the secret key $\Theta_1$, i.e., $\mathbf{X} = f_n(\mathbf{U}, W, \Theta_1)$. The distortion introduced by the encoder is measured by

$$d_1(\mathbf{u}, \mathbf{x}) = \frac{1}{n} \sum_{i=1}^n d_1(u_i, x_i),$$

where $d_1 : \mathcal{U} \times \mathcal{X} \to \mathbb{R}^+$ is a given nonnegative function. We require that the encoder satisfy

$$d_1(\mathbf{U}, \mathbf{X}) \le D_1, \quad \text{a.s.}, \tag{2.1}$$

where $D_1 > 0$ is a given constant called the encoder distortion level, and a.s. stands for "almost surely", i.e., with probability 1. We will also consider an average distortion constraint on the encoder; see Section 2.2.3.
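To make the constraint concrete, consider the squared difference distortion $d_1(u, x) = (u - x)^2$, the similarity measure used for the Gaussian results in this thesis. The almost-sure constraint (2.1) then says that every realized stegotext block must keep the average squared difference from the covertext below $D_1$. A minimal sketch (the helper names are ours, not the thesis's):

```python
def d1_block(u, x):
    """Blockwise distortion d1(u, x) = (1/n) * sum_i (u_i - x_i)^2, i.e. the
    average per-letter squared difference between covertext and stegotext."""
    assert len(u) == len(x)
    return sum((a - b) ** 2 for a, b in zip(u, x)) / len(u)

u = [0.0, 1.0, -1.0, 2.0]     # a length-4 covertext block
x = [v + 0.5 for v in u]      # stegotext: covertext plus a small perturbation

print(d1_block(u, x))         # 0.25
print(d1_block(u, x) <= 1.0)  # True: constraint (2.1) holds for D1 = 1
```

The attacker's constraint (2.3) below has exactly the same form, with $(x_i, y_i)$ in place of $(u_i, x_i)$ and level $D_2$.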

Independently of the covertext $\mathbf{U}$, the message $W$, and the secret key $\Theta_1$, the attacker generates an attack key (random variable) $\Theta_2$. For every $n > 0$, the attacker consists of a measurable function $g_n$ that maps realizations of the stegotext $\mathbf{x}$ and the attack key $\theta_2$ into the set $\mathcal{Y}^n$, i.e.,

$$g_n : (\mathbf{x}, \theta_2) \mapsto \mathbf{y} \in \mathcal{Y}^n. \tag{2.2}$$

The forgery $\mathbf{Y}$ is a random vector that is the result of applying the attacker to the stegotext $\mathbf{X}$ and the attacker's source of randomness $\Theta_2$, i.e., $\mathbf{Y} = g_n(\mathbf{X}, \Theta_2)$. The distortion introduced by the attacker is measured by

$$d_2(\mathbf{x}, \mathbf{y}) = \frac{1}{n} \sum_{i=1}^n d_2(x_i, y_i),$$

where $d_2 : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}^+$ is a given nonnegative function. The attacker is required to satisfy

$$d_2(\mathbf{X}, \mathbf{Y}) \le D_2, \quad \text{a.s.}, \tag{2.3}$$

where $D_2 > 0$ is a given constant called the attacker distortion level. We will also consider an average distortion constraint on the attacker; see Section 2.2.3.

In the public version of watermarking, the decoder attempts to recover the copyright message based only on realizations of the secret key $\theta_1$ and the forgery $\mathbf{y}$. In this version the decoder is a measurable mapping

$$\phi_n : (\mathbf{y}, \theta_1) \mapsto \hat{w} \in \mathcal{W}_n \quad \text{(public version)}.$$

In the private version, however, the decoder also has access to the covertext. In this case the decoder is a measurable mapping

$$\phi_n : (\mathbf{y}, \mathbf{u}, \theta_1) \mapsto \hat{w} \in \mathcal{W}_n \quad \text{(private version)}.$$

The estimate of the message $\hat{W}$ is a random variable that is the result of applying the decoder to the forgery $\mathbf{Y}$, the covertext $\mathbf{U}$ (in the private version), and the same source of randomness used by the encoder, $\Theta_1$. That is, $\hat{W} = \phi_n(\mathbf{Y}, \mathbf{U}, \Theta_1)$ in the private version, and $\hat{W} = \phi_n(\mathbf{Y}, \Theta_1)$ in the public version. The realizations of the covertext $\mathbf{u}$, message $w$, and sources of randomness $(\theta_1, \theta_2)$ determine whether the decoder errs in decoding the copyright message, i.e., if the estimate of the message $\hat{w}$ differs from the original message $w$. We write this error indicator function (for the private version) as

$$e(\mathbf{u}, w, \theta_1, \theta_2, f_n, g_n, \phi_n) = \begin{cases} 1 & \text{if } \phi_n\bigl(g_n(f_n(\mathbf{u}, w, \theta_1), \theta_2), \mathbf{u}, \theta_1\bigr) \neq w, \\ 0 & \text{otherwise}, \end{cases}$$

where the expression for the publi version is the same, ex ept that the de oder mapping n does not take the overtext u as an argument. We onsider the probability of error averaged over the overtext, message and both sour es of randomness as a fun tional of the 21

mappings fn, gn, and φn. This is written as

    Pe(fn, gn, φn) = E_{U, W, Θ1, Θ2}[ e(U, W, Θ1, Θ2; fn, gn, φn) ]
                   = Pr( Ŵ ≠ W ),

where the subscripts on the right hand side (RHS) of the first equality indicate that the expectation is taken with respect to the four random variables U, W, Θ1, and Θ2.

We adopt a conservative approach to watermarking and assume that once the watermarking system is employed, its details (namely the encoder mapping fn, the distributions, but not the realizations, of the covertext U and of the secret key Θ1, and the decoder mapping φn) are made public. The attacker can be malevolently designed accordingly. The watermarking game is thus played so that the encoder and decoder are designed prior to the design of the attacker. This, for example, precludes the decoder from using the maximum-likelihood decoding rule, which requires knowledge of the law PY|W and thus, indirectly, knowledge of the attack strategy.

We thus say that a rate R is achievable if there exists a sequence {(fn, φn)} of allowable rate-R encoder and decoder pairs such that for any sequence {gn} of allowable attackers the average probability of error Pe(fn, gn, φn) tends to zero as n tends to infinity. The coding capacity of watermarking is the supremum of all achievable rates. It depends on five parameters: the encoder distortion function d1(·,·) and level D1, the attacker distortion function d2(·,·) and level D2, and the covertext distribution {PU}. The distortion functions will be clear from context, and thus we write the generic coding capacity of watermarking as Cpriv(D1, D2, {PU}) and Cpub(D1, D2, {PU}) for the private and public versions, respectively.

2.2 Capacity Results for Watermarking

In this section, we describe the capacity of watermarking under various assumptions on the covertext distribution, distortion constraints, and attacker capabilities. We find the capacity for the standard watermarking model of Section 2.1 when the covertext distribution is IID scalar Gaussian (Section 2.2.1), IID vector Gaussian (Section 2.2.4) and IID Bernoulli(1/2) (Section 2.2.6). We deviate from the standard model by considering an attacker that only has to meet the distortion constraint in expectation (Section 2.2.3) and an attacker that can only inject additive noise (Section 2.2.2). Finally, we find a general formula for the capacity when no covertext is present (Section 2.2.5). The detailed proofs of all of these results can be found in later chapters; we present a proof sketch and a reference to the detailed proof following each result.

2.2.1 Scalar Gaussian Watermarking Game

We now consider a watermarking system where all of the alphabets are the real line (i.e., U = X = Y = R) and where the distortion measures for both the encoder and the attacker are squared error, i.e., d1(u, x) = (x − u)² and d2(x, y) = (y − x)². Of particular interest is the case in which the covertext U is a sequence of independent and identically distributed (IID) random variables of law N(0, σu²), i.e., zero-mean Gaussian. We refer to the scalar Gaussian watermarking (SGWM) game when the distortion constraints and covertext distribution are as specified above.

Surprisingly, we find that the capacity of the SGWM game is the same for the private and public versions. Furthermore, we show that for all stationary and ergodic covertext distributions, the capacity of the watermarking game is upper bounded by the capacity of the SGWM game.

To state our results on the capacity of the SGWM game we need to define the interval

    A(D1, D2, σu²) = { A : max{ D2, (σu − √D1)² } ≤ A ≤ (σu + √D1)² },   (2.4)

and the mappings

    s(A; D1, D2, σu²) = (D1/D2) (1 − D2/A) (1 − (A − σu² − D1)² / (4 D1 σu²)),   (2.5)

and¹

    C*(D1, D2, σu²) = max_{A ∈ A(D1, D2, σu²)} (1/2) log(1 + s(A; D1, D2, σu²))  if A(D1, D2, σu²) ≠ ∅,
    and C*(D1, D2, σu²) = 0 otherwise.   (2.6)

¹Unless otherwise specified, all logarithms in this thesis are base-2 logarithms.

Note that a closed-form solution for (2.6) can be found by setting the derivative with respect to A to zero. This yields a cubic equation in A that can be solved analytically; see Lemma A.1. Further note that C*(D1, D2, σu²) is zero only if D2 ≥ σu² + D1 + 2σu√D1.
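Since (2.5) and (2.6) are explicit, C* can also be evaluated numerically. The sketch below (a grid search over A in place of the cubic of Lemma A.1; the grid resolution is an arbitrary choice) illustrates the definition and the zero-capacity threshold just noted.

```python
import math

def capacity_sgwm(D1, D2, sigma2, grid=20000):
    """Evaluate C*(D1, D2, sigma_u^2) of (2.6) by a grid search over A in (2.4)."""
    lo = max(D2, (math.sqrt(sigma2) - math.sqrt(D1)) ** 2)
    hi = (math.sqrt(sigma2) + math.sqrt(D1)) ** 2
    if lo > hi:                      # the interval A(D1, D2, sigma_u^2) is empty
        return 0.0
    best = 0.0
    for i in range(grid + 1):
        A = lo + (hi - lo) * i / grid
        s = (D1 / D2) * (1 - D2 / A) * (1 - (A - sigma2 - D1) ** 2 / (4 * D1 * sigma2))
        best = max(best, 0.5 * math.log2(1 + s))
    return best

# At D2 = sigma_u^2 + D1 + 2*sigma_u*sqrt(D1) the capacity hits zero:
print(capacity_sgwm(1.0, 4.0, 1.0))           # → 0.0
print(round(capacity_sgwm(1.0, 4.0, 10.0), 3))
```

The parameters D1, D2 > 0 and σu² > 0 are assumed; the helper name is ours, not the thesis's notation.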

The following theorem demonstrates that if the covertext has power σu², then the coding capacity of the private and public watermarking games cannot exceed C*(D1, D2, σu²). Furthermore, if the covertext U is an IID zero-mean Gaussian sequence with power σu², then the coding capacities of the private and public versions are equal, and they coincide with this upper bound.

Theorem 2.1. For the watermarking game with real alphabets and squared error distortion measures, if {PU} defines an ergodic covertext U such that E[Uk⁴] < ∞ and E[Uk²] ≤ σu², then

    Cpub(D1, D2, {PU}) ≤ Cpriv(D1, D2, {PU})   (2.7)
                       ≤ C*(D1, D2, σu²).      (2.8)

Equality is achieved in both (2.7) and (2.8) if U is an IID Gaussian sequence with mean zero and variance σu², i.e., if PU = (N(0, σu²))^n for all n.

This theorem shows that, of all ergodic covertexts with a given power, the IID zero-mean Gaussian covertext has the largest watermarking capacity. Although the covertext can be thought of as additive noise in a communication with side information situation (see Section 2.5.2), this result differs from the usual "Gaussian is the worst-case additive noise" idea; see e.g., [CT91, Lap96]. The basic reason that a Gaussian covertext is the best case is that the encoder is able to transmit the watermark using the uncertainty of the covertext, and a Gaussian distribution has the most uncertainty (i.e., highest entropy) out of all distributions with the same second moment.

As an example, consider an IID covertext in which each sample Uk is either −σu or +σu with probability 1/2 each, so that E[Uk²] = σu². If D1 = D2 ≪ σu², then C*(D1, D2, σu²) ≈ 1/2 bits/symbol, but a watermarking system could not reliably transmit at nearly this rate with this covertext. To see this, let us further consider an attacker that creates the forgery by quantizing each stegotext sample Xk to the nearest of −σu or +σu. Even in the private version, the only way the encoder could send information is by changing Uk by at least σu, and the encoder can do this for only a small percentage of the samples since D1 ≪ σu². Indeed, using the results of Section 2.2.6 on the binary watermarking game, we see that the largest achievable rate for this fixed attacker is² Hb(D1/σu²) bits/symbol, which is smaller than 1/2 bits/symbol for D1/σu² < 0.11, i.e., the regime of interest. Note that the capacity for this scenario is even smaller since we have only considered a known attacker.

We also find that the capacity of the SGWM game is increasing in σu²; see Figure 2-1. Thus, again we see that the greater the uncertainty in the covertext, the more bits the watermarking system can hide in it. Another interesting aspect of this theorem is that, as in the "writing on dirty paper" model (see Section 2.5.4 below and [Cos83]), the capacity of the SGWM game is unaffected by the presence or absence of side-information (covertext) at the receiver. See [Cov99] for some comments on the role of receiver side-information, particularly in card games.

Moulin and O'Sullivan [MO99, MO00] give a capacity for the SGWM game that is strictly smaller than C*(D1, D2, σu²). In particular, they claim that the capacity is given by (1/2) log(1 + s(A; D1, D2, σu²)) when A is fixed to σu² + D1 instead of optimized over A(D1, D2, σu²) as in (2.6), while the optimal A is strictly larger than σu² + D1; see Lemma A.1. This difference is particularly noticeable when σu² + D1 < D2 < σu² + D1 + 2σu√D1, since C*(D1, D2, σu²) > 0 in this range while the capacity given in [MO99, MO00] is zero. An example of the two capacity expressions is plotted in Figure 2-1. Both capacity expressions are bounded above by (1/2) log(1 + D1/D2) and approach this bound as σu² increases. Note that the watermarking game here is defined differently than in [MO99, MO00], but we believe that the capacity of the SGWM game should be the same for both models. Indeed, the general capacity expression in [MO99, MO00] is similar to our mutual information game (see Chapter 3), and we find that the value of the mutual information game for a Gaussian covertext is also C*(D1, D2, σu²) (see Theorem 3.1).

Theorem 2.1 is proved in Chapter 4 in two steps: a proof of achievability for Gaussian covertexts and a converse for general covertexts. Although achievability for the public version implies achievability for the private version, we give separate proofs for the private version (Section 4.2) and the public version (Section 4.3). We have chosen to include both proofs because the coding technique for the private version has a far lower complexity (than the coding technique for the public version) and may give some insight into the design of practical watermarking systems for such scenarios. We now provide a sketch of the proof.

²We use Hb(·) to denote the binary entropy, i.e., Hb(p) = −p log p − (1 − p) log(1 − p).
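The threshold quoted above can be checked directly; a quick numeric sanity check (not part of any proof):

```python
import math

def Hb(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Hb(D1/sigma_u^2) crosses 1/2 bit near D1/sigma_u^2 = 0.11:
print(round(Hb(0.11), 4))   # just below 0.5
print(round(Hb(0.12), 4))   # above 0.5
```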

Figure 2-1: Scalar Gaussian watermarking capacity (in bits per symbol) versus σu², with D1 = 1 and D2 = 4. The dashed line is the capacity expression from [MO99, MO00].
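The gap between the two curves in Figure 2-1 can be reproduced numerically. The sketch below (assuming the expressions (2.4) to (2.6) above, with a grid search in place of the closed form) compares the optimized choice of A with the fixed choice A = σu² + D1 used in [MO99, MO00].

```python
import math

def s_val(A, D1, D2, sigma2):
    """s(A; D1, D2, sigma_u^2) as in (2.5)."""
    return (D1 / D2) * (1 - D2 / A) * (1 - (A - sigma2 - D1) ** 2 / (4 * D1 * sigma2))

def c_star(D1, D2, sigma2, grid=10000):
    """Capacity with A optimized over the interval (2.4), as in (2.6)."""
    lo = max(D2, (math.sqrt(sigma2) - math.sqrt(D1)) ** 2)
    hi = (math.sqrt(sigma2) + math.sqrt(D1)) ** 2
    if lo > hi:
        return 0.0
    return max(0.5 * math.log2(1 + s_val(lo + (hi - lo) * i / grid, D1, D2, sigma2))
               for i in range(grid + 1))

def c_fixed_A(D1, D2, sigma2):
    """The same expression with A fixed to sigma_u^2 + D1 (the choice in [MO99, MO00])."""
    lo = max(D2, (math.sqrt(sigma2) - math.sqrt(D1)) ** 2)
    hi = (math.sqrt(sigma2) + math.sqrt(D1)) ** 2
    A = sigma2 + D1
    if not (lo <= A <= hi):
        return 0.0
    return 0.5 * math.log2(1 + s_val(A, D1, D2, sigma2))

# D1 = 1, D2 = 4, as in Figure 2-1; for sigma_u^2 = 2 the fixed-A expression is zero
# while the optimized one is strictly positive.
for sigma2 in (2.0, 5.0, 10.0):
    print(sigma2, round(c_star(1.0, 4.0, sigma2), 4), round(c_fixed_A(1.0, 4.0, sigma2), 4))
```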

Achievability

We now argue that for a Gaussian covertext all rates less than C*(D1, D2, σu²) are achievable in the public version; see Section 4.3 for a full proof. This will also demonstrate that all such rates are achievable in the private version as well. The parameter A corresponds to the desired power in the stegotext, i.e., our coding strategy will have n⁻¹‖X‖² ≈ A. We now describe a coding strategy that depends on A (and the given parameters D1, D2 and σu²) and can achieve all rates up to (1/2) log(1 + s(A; D1, D2, σu²)). Hence, all rates less than C*(D1, D2, σu²) are achievable with the appropriate choice of A.

The coding strategy is motivated by the works of Marton [Mar79], Gel'fand and Pinsker [GP80], Heegard and El Gamal [HEG83], and Costa [Cos83]. The encoder/decoder pair use their common source of randomness to generate a codebook consisting of 2^{nR1} IID codewords that are partitioned into 2^{nR} bins of size 2^{nR0} each (hence, R = R1 − R0). Each codeword is uniformly distributed on an n-sphere with radius depending on A. Given the covertext u and the watermark w, the encoder finds the codeword in bin w that is closest (in Euclidean distance) to u. Let vw(u) be the chosen codeword. The encoder then forms the stegotext as a linear combination of the chosen codeword and the covertext,

    x = α vw(u) + (1 − α) u,

where α is a constant that depends on A. The distortion constraint will be met with high probability if R0 is large enough. The decoder finds the closest codeword (out of all 2^{nR1} codewords) to the forgery, and estimates the watermark as the bin of this closest codeword. If R1 is small enough, then the probability of error can be made arbitrarily small. The two constraints on R0 and R1 combine to give the desired bound on the overall rate R.

Converse

We now argue that no rates larger than C*(D1, D2, σu²) are achievable in the private version for any ergodic covertext distribution with power at most σu²; see Section 4.5 for a full proof. The main idea is to show, using a Fano-type inequality, that in order for the probability of error to tend to zero, a mutual information term must be greater than the watermarking rate. The mutual information term of interest is roughly I(X; Y | U), which is related to the capacity with side information at the encoder and decoder; see Section 2.5.2. A consequence of this proof is that these rates are not achievable even if the decoder knew the statistical properties of the attacker.

The basic attacker that guarantees that the mutual information will be small is based on the Gaussian rate distortion forward channel. That is, such an attacker computes A (i.e., the power in the stegotext) and implements the channel that minimizes the mutual information between the stegotext and the forgery subject to a distortion constraint, assuming that the stegotext were an IID sequence of mean-zero variance-A Gaussian random variables. The method that the attacker uses to compute A is critical. If A is the average power of the stegotext (averaged over all sources of randomness), then the mutual information will be small but the attacker's a.s. distortion constraint might not be met. If A is the power of the realization of the stegotext, then the a.s. distortion constraint will be met but the encoder could potentially use A to transmit extra information. A strategy that avoids both of these problems is to compute A by quantizing the power of the realization of the stegotext to one of finitely many values. This attacker will both meet the distortion constraint (if the quantization points are dense enough) and prevent the encoder from transmitting extra information.

2.2.2 Additive Attack Watermarking Game

In this section, we describe a variation of the watermarking game for real alphabets and squared error distortions, which we call the additive attack watermarking game. (When it is necessary to distinguish the two models, we will refer to the original model of Section 2.1 as the general watermarking game.) The study of this model will show that it is suboptimal for the attacker to produce the forgery by combining the stegotext with a power-limited jamming sequence generated independently of the stegotext. Similarly to Costa's writing on dirty paper result (see Section 2.5.4 and [Cos83]), we will show that if the covertext U is IID Gaussian then the capacities of the private and public versions are the same and are given by (1/2) log(1 + D1/D2). This result can thus be viewed as an extension of Costa's result to arbitrarily varying noises; see Section 2.5.4 for more discussion of this extension.

In the additive attack watermarking game the attacker is more restricted than in the general game. Rather than allowing general attacks of the form (2.2), we restrict the attacker to mappings of the form

    gn(x, θ2) = x + g̃n(θ2)   (2.9)

for some mapping g̃n. In particular, the jamming sequence

    Ỹ = g̃n(Θ2)   (2.10)

is produced independently of the stegotext X, and must satisfy the distortion constraint

    n⁻¹ ‖Ỹ‖² ≤ D2, a.s.   (2.11)

The capacity of the additive attack watermarking game is defined similarly to the capacity of the general game and is written as Cpriv^AA(D1, D2, {PU}) and Cpub^AA(D1, D2, {PU}) for the private and public versions, respectively. Our main result in this section is to describe these capacities.

Theorem 2.2. For any covertext distribution {PU},

    Cpub^AA(D1, D2, {PU}) ≤ Cpriv^AA(D1, D2, {PU})   (2.12)
                           = (1/2) log(1 + D1/D2).   (2.13)

Equality is achieved in (2.12) if U is an IID Gaussian sequence.

We first sketch the converse for both versions. An IID mean-zero, variance-D2 Gaussian sequence Ỹ does not satisfy (2.11). However, for any δ > 0, an IID mean-zero, variance-(D2 − δ) Gaussian sequence Ỹ satisfies n⁻¹‖Ỹ‖² ≤ D2 with arbitrarily large probability for sufficiently large blocklength n. Since the capacity here cannot exceed the capacity when U is absent, the capacity results on an additive white noise Gaussian channel imply that the capacity of either version is at most (1/2) log(1 + D1/D2).

We now argue that the capacity of the private version is as in the theorem. When the sequence U is known to the decoder, the results of [Lap96] can be used to show that all rates less than (1/2) log(1 + D1/D2) are achievable using Gaussian codebooks and nearest neighbor decoding. This establishes the validity of (2.13).

To complete the proof of this theorem, we must show that (1/2) log(1 + D1/D2) is achievable in the public version of the game with IID Gaussian covertext. We present a coding strategy and demonstrate that all such rates are achievable in Section 4.3.

Since any allowable additive attacker is also an allowable general attacker, the capacity of the additive attack watermarking game provides an upper bound to the capacity of the general watermarking game. However, comparing Theorems 2.1 and 2.2, we see that for an IID Gaussian covertext this bound is loose. Thus, for such covertexts, it is suboptimal for the attacker in the general watermarking game to take the form (2.9). See Section 2.5.4 for more discussion on the additive attack watermarking game.

2.2.3 Average Distortion Constraints

In this section, we show that if the almost sure distortion constraints are replaced with average distortion constraints, then the capacity is typically zero. That is, we replace the a.s. constraints (2.1) and (2.3) on the encoder and attacker, respectively, with

    E[ d1(U, X) ] ≤ D1,   (2.14)

and

    E[ d2(X, Y) ] ≤ D2,   (2.15)

where the expectations are with respect to all relevant random quantities. In particular, we have the following theorem.

Theorem 2.3. For the watermarking game with real alphabets and squared error distortion, if the covertext U satisfies

    liminf_{n→∞} E[ n⁻¹ ‖U‖² ] < ∞,   (2.16)

then the capacities of the private and public versions under the average distortion constraints (2.14) and (2.15) are both zero.
VGWM (D ; D ; S ) xed, while the blo klength n is allowed to be arbitrarily large. We use Cpriv 1 2 u

VGWM (D ; D ; S ) to denote the apa ity of the VGWM game for the private and and Cpub 1 2 u

publi versions, respe tively. Theorem 2.4.

For the ve tor Gaussian watermarking game,

VGWM VGWM (D1 ; D2 ; Su ) = Cpriv (D1 ; D2 ; Su ) Cpub

=

X m

(2.17)

 2 D1 0 : e D1 D1 D2 0 : e D2 D2 j =1 C (D1j ; D2j ; j ); (2.18)

max t

min t

2 ) are the eigenvalues of S and where C  (D1 ; D2 ; 2 ) is de ned in (2.6), (12 ; : : : ; m u

e is

the m-ve tor ontaining all 1's.

This theorem is proved in detail in Chapter 5, but we now brie y des ribe the oding strategy that a hieves the desired rates for the publi version. The ovarian e matrix Ku

an be diagonalized using an orthogonal transformation that does not a e t the distortion. Thus, we an assume that Ku is diagonal so that U onsists of m omponents, ea h a length-n sequen e of IID zero-mean Gaussian random variables with respe tive varian es 2 ) =  2 . After hoosing m-dimensional ve tors D , D ~ 2 and A, the en oder (12 ; : : : ; m 1 en odes omponent j using the s alar en oder for the SGWM game (see the dis ussion ~ 2j , and u2 = j2 . after Theorem 2.1 and Chapter 4) based on A = Aj , D1 = D1j , D2 = D ~ 2 a ts as an estimate of the amount of distortion the atta ker will pla e Thus, the ve tor D

in ea h omponent. Every atta ker is asso iated with a feasible D2 (not ne essarily equal ~ 2 ), where D2j des ribes the amount of distortion the atta ker in i ts on omponent j . to D

~ 2 by the en oder, the atta ker will hoose D2 = D ~2 However, for the optimal hoi e of D

in order to minimize the a hievable rates. This allows us to des ribe the a hievable rates using the simple form of (2.18). We now dis uss some aspe ts of this theorem, fo using on the di eren es and similarities between SGWM and VGWM. One major similarity is that in both ases the publi and private versions have the same apa ity. One major di eren e between the two games is that in the ve tor version an atta ker based on the Gaussian rate distortion solution is no longer optimal, i.e., it does not ne essarily prevent rates larger than apa ity from being a hievable. A rate-distortion based atta ker al ulates the se ond order statisti s of the stegotext, and designs the atta k to minimize (subje t to an average distortion onstraint) 31

the mutual information between the stegotext and the forgery, assuming that the stegotext was Gaussian. In the SGWM game, this atta ker does not ne essarily meet the almost sure distortion onstraint, but it does prevent rates higher than apa ity from being a hievable. However, in the ve tor version, if su h an atta ker is used, then rates stri tly larger than

apa ity an be a hieved. See Se tion 5.7 for more detail. The di eren e is that an optimal atta ker does not distribute his distortion to the di erent omponents of the stegotext using the familiar water lling algorithm (see e.g., [CT91℄). However, having hosen the

orre t distortion distribution, a parallel on atenation of optimal atta kers for the SGWM game (and hen e a parallel on atenation of s alar Gaussian rate distortion solutions) does prevent rates larger than apa ity from being a hievable. We also note that the order in whi h the watermarking game is played remains riti al in the ve tor version. In parti ular, the max and min in (2.18) annot be swit hed. We highlight the signi an e of this observation by restri ting the en oder and atta ker to parallel on atenations of optimal s alar strategies based on some ve tors There is no single ve tor

D1 and D2.

D2 that the atta ker ould pi k to ensure that no rates higher

than the apa ity are a hieved. Instead, the atta ker must use his advantage of playing se ond (i.e., his knowledge of the en oder's strategy) in order to a

omplish this goal. This di ers from the ve tor Gaussian arbitrarily varying hannel [HN88℄ where the atta ker (resp. en oder) an hoose a distortion distribution to ensure that no rates more than (resp. all rates up to) the apa ity an be a hieved. 2.2.5

Dis rete Alphabets, No Covertext

In this se tion, we examine an extreme watermarking s enario in whi h there is no overtext to hide the message in. In this situation, the atta ker an dire tly modify (subje t to a distortion onstraint) the odeword produ ed by the en oder. This an be viewed as an extension of [CN88a℄, whi h found the random oding apa ity of an arbitrarily varying

hannel (AVC) with onstrained inputs and states (see Se tion 2.5.3 for more on the AVC). The primary di eren e is that in [CN88a℄ the inputs and states are hosen independently of ea h other, while here the states of the hannel are hosen as a fun tion of the input sequen e. Before stating the main result of this se tion, we rst give our assumptions and some notation. We assume that the alphabets

X and Y are nite. Sin e there is no overtext, 32

the distortion onstraint (2.1) is repla ed by d1

:

X 7!

n

1

P

n i=1

(

d1 Xi

)  D1 a.s. for some fun tion

R+ . The distortion onstraint (2.3) imposed on the atta ker remains the same.

The la k of overtext also means that there is no distin tion between the private and publi versions, and thus we write the apa ity for this s enario as

C

NoCov

(D1 ; D2 ). For any

distributions P and P j , we write I X Y X (X ; Y ) to be the mutual information between random variables X and Y under joint distribution P P j . X

P

Y X

P

j

X

Y X

Theorem 2.5. When there is no overtext and dis rete alphabets, the apa ity of the watermarking game is given by

C

NoCov

(D1 ; D2 ) =

max

X :EPX [d1 (X )℄D1

P

min

Y jX :EPX PY jX [d2 (X;Y )℄D2

P

Y jX (X ; Y ):

(2.19)

IPX P

The proof of this Theorem an be found in Se tion 6.1; we now brie y sket h the arguments behind the proof. A hievability

For a xed n, the en oder hooses a distribution satis ed and

n  PX x

odewords f

1; : : : ;

X

( ) is an integer for every

X nR 2

su h that the onstraint in (2.19) is

PX

x 2 X

. The en oder then generates 2

nR

IID

, with ea h odeword uniformly distributed over all n-sequen es

g

whose empiri al distribution is given by P . Given the odebook and the watermark w, the transmitted sequen e is simply

X

x

w

. Note that

1

n

d1

x

(

w

)=

[ ( )℄

EPX d1 X

 D1

, and thus

the distortion onstraint is satis ed. The de oder uses the maximum mutual information (MMI) de oding rule. That is, the estimate of the watermark is given by

x

^ = arg max I (

w

 2nR

1

w0

w0

^

y)

;

x y) is the mutual information between random variables and when they have the joint empiri al distribution of x and y. The probability of error only depends on the atta ker through the onditional empiri al distribution of y given x. Using te hniques where I (

^

X

Y

from [CK81℄, we an show that the probability of error goes to zero as long as the rate

x y) for the orre t watermark . Finally, the onditional empiri al distribution of y given x must satisfy the onstraint in (2.19) in order for the atta ker to

R

is less than I (

w

^

w

meet his distortion onstraint, and thus the en oder an guarantee that the s ore of the 33

orre t odeword I (xw ^ y) is arbitrarily lose to C NoCov (D1 ; D2 ) by making the blo klength large enough.

n

Converse

The atta ker nds the minimizing PY jX in (2.19) for the empiri al distribution of the trans-

mitted sequen e x. He then implements a memoryless hannel based on this PY jX . The ~ 2 < D2 is used distortion onstraint will be met with high probability as long as any D

instead of

in (2.19). A Fano-type inequality an be used to show that no rates higher ~ 2 ) are a hievable for this atta ker. The onverse follows by ontinuity of than C NoCov (D1 ; D D2

(2.19) in D2 . 2.2.6

Binary Watermarking Game

In this se tion, we onsider the watermarking game binary alphabets, i.e.,

U=X

f0 1g. Further, we assume that the overtext U is an IID sequen e of Bernoulli(1 ;

=

Y=

2) random

=

variables, i.e., Pr(Ui = 0) = Pr(Ui = 1) = 1=2. We use Hamming distortion onstraints for

both en oder and de oder, i.e., d1 (u; x) = n

h (u  x) and d2 (x; y) = n

1w

h (x  y). We

1w

BinWM (D ; D ) and C BinWM (D ; D ) for the private write the apa ity in this s enario as Cpriv 1 2 1 2 pub

and publi versions, respe tively. Theorem 2.6. For the binary watermarking game with

BinWM

Cpriv

0  D1  1=2

(D1 ; D2 ) = Hb (D1 D2 )

H

and

0  D2  1=2,

b (D2 );

(2.20)

and

BinWM Cpub (D1 ; D2 )

where D1 p

log p



D2

(1

= D1 (1

D2

) log(1

p .

p

= max

) + (1

D g1

2 1

BinWM (D ; D ) 1 2 priv

> C

g

)

b

H

D1 D2 and H

D1 g





b (D2 )

H

(2.21)

;

b () is the binary entropy,

b (p) =

i.e., H

)

See gure 2-2 for an example plot of C

 

BinWM (D ; D ) 1 2 pub

BinWM (D ; D ) 1 2 priv

C

BinWM (D ; D ). Note that and Cpub 1 2

for 0 < D1 < 1=2 and 0 < D2 < 1=2. Thus, unlike the

Gaussian watermarking games, the apa ity of the private version an ex eed the apa ity 34

0.4

0.3

0.2

0.1

0 0

)=

g

=

0.1

8 > < 1

> : 1

2

1

2

0.2

2

1

D

0.3

if D1 < 1 otherwise

0.4

2 Hb (D2 )

;

1

0.5

2 Hb (D2 ) ; :

. Further, we an rewrite (2.21) as
D   Hb (1 < 1

b(D2 ) H

2 Hb (D2 )  1=2 and thus g (

> : Hb (D1 )

D1

with

(2.22)

(2.23)

. Indeed, we prove the onverse part of this theorem by xing the atta ker to be su h a

35

brief sket h of the a hievability proofs below.

onverse and the a hievability parts of the theorem an be found in Se tion 6.2. We give a

and Pinsker's work [GP80℄ on hannels with side information. The detailed proof of the

hannel and omputing the resulting apa ity using an extension (Lemma 2.1) of Gel'fand

D2

ity when the atta ker is xed to be a binary symmetri hannel with rossover probability

Note that Barron, Chen and Wornell [BCW00℄ found identi al expressions for the apa -

BinWM D1 ; D2 Cpub

where 1

of the publi version. Also note that the maximizing g in (2.21) is given by

Figure 2-2: Binary watermarking apa ity (private and publi versions) versus = 0:15. D2

Bits per symbol
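The expressions above are easy to evaluate numerically. The sketch below checks the closed form (2.23) against a direct maximization of (2.21); the helper names are ours, and D1 = 0.1, D2 = 0.15 are arbitrary sample values.

```python
import math

def Hb(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def c_priv(D1, D2):
    """Private capacity (2.20): Hb(D1 * D2) - Hb(D2), with * the binary convolution."""
    conv = D1 * (1 - D2) + (1 - D1) * D2
    return Hb(conv) - Hb(D2)

def c_pub_grid(D1, D2, steps=20000):
    """Public capacity by maximizing (2.21) over g in [2*D1, 1]."""
    best = 0.0
    for i in range(steps + 1):
        g = 2 * D1 + (1 - 2 * D1) * i / steps
        if g > 0:
            best = max(best, g * (Hb(D1 / g) - Hb(D2)))
    return best

def c_pub_closed(D1, D2):
    """Closed form (2.23), with the maximizing g from (2.22)."""
    t0 = 1 - 2 ** (-Hb(D2))
    if D1 < t0:
        return D1 * (Hb(t0) - Hb(D2)) / t0
    return Hb(D1) - Hb(D2)

D1, D2 = 0.1, 0.15
print(round(c_priv(D1, D2), 4), round(c_pub_grid(D1, D2), 4), round(c_pub_closed(D1, D2), 4))
```

The grid maximum and the closed form agree, and the private capacity strictly exceeds the public one, as the theorem asserts.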

Achievability for Private Version

The encoder and the decoder can use their combined knowledge of the covertext U to provide secrecy about a transmitted sequence chosen independently of U. To see this, let a codeword X̃ = fn(W, Θ1) be chosen independently of U (but depending on the watermark W and the secret key Θ1). The encoder will form the stegotext as X = U ⊕ X̃, and thus the distortion constraint on the encoder becomes n⁻¹ wh(X̃) ≤ D1 a.s. Furthermore, U is an IID sequence of Bernoulli(1/2) random variables, and thus X and X̃ are independent. Thus, any rate achievable for the AVC with constrained inputs and states is achievable here; see Section 2.5.3 and [CN88a]. In particular, all rates less than Cpriv^BinWM(D1, D2) are achievable.

orresponds to a bin of 2nR0 odewords. Ea h odeword is a length- IID sequen e of Bernoulli(1 2) random variables. Given the watermark and the overtext u, the en oder nds the odeword in bin that agrees with the overtext at the sele ted indi es as losely as possible. The en oder then reates the stegotext by repla ing the sele ted positions of the overtext with the losest odeword. The distortion onstraint will be satis ed if g

ng

;:::;n

ng

w

;:::;

ng

=

w

w



R0 > g

 1

b

H



D1 g



(2.24)

:

The de oder nds the odeword losest to the forgery at the sele ted indi es, and estimates the watermark as the bin of this odeword. Let y~ = y  x be the di eren e between the forgery and the stegotext. The probability of error only depends on the atta ker through the Hamming weight of y~ , whi h an be at most 2. With high probability, the Hamming weight of y~ at the sele ted positions will not greatly ex eed 2 . This observation allows us to show that the probability of error tends to zero as long as nD

ngD

R

+

R0 < g

 (1 36

H

b (D2 )) :

(2.25)

The ombination of (2.24) and (2.25) ompletes the proof.
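Combining (2.24) and (2.25), the best rate for a given g is g(1 − Hb(D2)) − g(1 − Hb(D1/g)), which is exactly the objective of (2.21). A small numeric sketch (sample values D1 = 0.1, D2 = 0.15; helper names are ours):

```python
import math

def Hb(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def max_rate(D1, D2, g):
    """Largest R allowed by (2.24) and (2.25) for time-sharing parameter g."""
    R0 = g * (1 - Hb(D1 / g))        # (2.24): bins large enough to meet the distortion constraint
    return g * (1 - Hb(D2)) - R0     # (2.25): total codebook rate minus the bin rate

D1, D2 = 0.1, 0.15
best = max(max_rate(D1, D2, 2 * D1 + (1 - 2 * D1) * i / 2000) for i in range(2001))
print(round(best, 4))                # the optimized rate of (2.21)
```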

2.3

Prior Work on Watermarking

In this se tion, we dis uss some of the related literature and ompare it to the results presented above. We rst brie y des ribe some te hniques that have been proposed, and we then give an overview of the information-theoreti work that has been done. 2.3.1

Pra ti al Approa hes to Watermarking

The simplest watermarking systems onvey information by modifying the least signi ant parts of the overtext data, e.g., hanging the low-order bits in a digital representation of an image. These systems are transparent, but they are easily orrupted. For example, lossy ompression will remove the least signi ant portions of the data or an atta ker might repla e low-order bits with random bits without greatly a e ting the quality of the data. It was re ognized [CKLS97℄ that in order to a hieve robustness, information must be embedded in signi ant portions of the data. Thus, for a given desired watermarking rate, there is a non-trivial trade-o between robustness and transparen y. The most widely studied lass of watermarking systems onsist of \spread spe trum" te hniques, introdu ed in [CKLS97℄. In these systems, a noise-like sequen e is added to the

overtext at the en oder and a orrelation dete tor is used at the de oder. The watermark (i.e., the added sequen e) is often s aled to a hieve the desired robustness or transparen y requirement, but otherwise the watermark is independent of the overtext. The watermark

an be added either dire tly or in transform domains like Dis rete Cosine Transform (DCT) [CKLS97℄, Fourier-Mellon transform [ORP97℄ or wavelets [XA98℄. One important feature of su h systems is that when the overtext is not available at the de oder (i.e., the publi version), then the overtext a ts as interferen e in the de oding pro ess. Thus, as the variability in the original data in reases, the amount of information that an be embedded in this manner de reases. However, we have seen that the apa ity for our watermarking model an in rease as the variability of the overtext in reases, e.g., for the SGWM game. Thus, forming the stegotext by linearly ombining the overtext and a signal that only depends on the watermark is suboptimal. One new watermarking method that does not su er from the problem of interferen e 37

from the overtext is Quantization Index Modulation (QIM), introdu ed by Chen and Wornell [Che00, CW01℄. In QIM, a quantizer is used for the overtext depending on the value of the watermark. By varying the number and oarseness of the quantizers, one an trade o between transparen y, robustness and data rate. Some of the watermarking te hniques des ribed in this thesis are similar to distortion ompensated QIM, in whi h the stegotext is a linear ombination of the overtext and the quantized version of the overtext (where again the quantizer depends on the value of the watermark). For example, in the publi version of the SGWM game, the stegotext is a linear ombination of the overtext and a

codeword selected from the bin associated with the watermark; see the discussion after Theorem 2.1 and Section 4.3. The process of selecting the codeword is similar to quantization since the chosen codeword is the one closest to the covertext. In [Che00, CW01], it was shown that distortion compensated QIM achieves the capacity for situations with a known attacker. Here, we show that a similar technique also achieves the capacity for an unknown and arbitrary attacker.
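The QIM idea can be sketched with two interleaved scalar uniform quantizers, one per watermark bit. The following minimal example is an illustration only (the step size delta is an arbitrary assumption, and no distortion compensation is applied), not a transcription of the schemes in [Che00, CW01]:

```python
def qim_embed(x, bit, delta=1.0):
    # Quantize the covertext sample with the quantizer indexed by the bit:
    # bit 0 uses the lattice delta*Z, bit 1 the shifted lattice delta*Z + delta/2.
    offset = 0.0 if bit == 0 else delta / 2.0
    return round((x - offset) / delta) * delta + offset

def qim_decode(y, delta=1.0):
    # Decode to the bit whose quantizer has a reconstruction point nearest y.
    return min((0, 1), key=lambda b: abs(qim_embed(y, b, delta) - y))

stego = qim_embed(3.30, bit=1)           # lands on the shifted lattice (3.5)
assert qim_decode(stego + 0.2) == 1      # robust to perturbations below delta/4
assert qim_decode(qim_embed(3.30, 0) + 0.2) == 0
```

Decoding maps the (possibly perturbed) sample to the nearest reconstruction point over both quantizers, so the embedded bit survives any perturbation smaller than delta/4; distortion compensated QIM would instead output a convex combination of the covertext sample and its quantized value.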

2.3.2  Information-Theoretic Watermarking

The basic information theoretic model of watermarking was introduced by O'Sullivan, Moulin and Ettinger [MO99, MO00, OME98]. They investigated the capacity of a model that is similar to that described above but with several important differences. First, they assume a maximum likelihood decoder, which requires the decoder to be cognizant of the attack strategy. In contrast, we require that one encoder/decoder pair be robust against any potential attack. Second, they focus exclusively on average distortion constraints, while we compare the average and almost sure constraints. In fact, we find that average distortion constraints typically result in a capacity of zero. Finally, despite our stricter requirements, we have seen that our capacity with a Gaussian covertext is larger than that given in [MO99, MO00]; see Figure 2-1 for a comparison of the two capacities. Mittelholzer [Mit99] independently introduced a similar model for watermarking. Still others [BBDRP99, BI99, LC01, LM00, RA98, SPR98, SC96] have investigated the capacity of watermarking systems, but only for specific encoding schemes or types of attacks. The most similar model to ours has been recently proposed by Somekh-Baruch and Merhav [SBM01a, SBM01b]. In their model, the probability that the distortion introduced by the encoder or the attacker is greater than some threshold must decay to zero exponentially, i.e.,

    Pr{d2(X, Y) > D2 | X = x} ≤ e^(-nλ)

for some λ and for all x ∈ X^n, and similarly for the encoder. This type of constraint is equivalent to our a.s. constraints when λ = ∞. In [SBM01a], they find a general expression (that does not depend on λ) for the coding

capacity of the private version for finite alphabets. This result supersedes our result of Theorem 2.5 on the capacity of the watermarking game with no covertext and finite alphabets. Their capacity expression is similar to the mutual information game of [MO99, OME98]. We also see that for a scalar Gaussian covertext, the capacity is the same as the value of a related mutual information game; compare Theorems 2.1 and 3.1. Besides capacity, several other information theoretic quantities have begun to be addressed for watermarking. Merhav [Mer00] and Somekh-Baruch and Merhav [SBM01a, SBM01b] have studied error exponents (i.e., how the probability of error decreases to zero as the blocklength increases for rates less than capacity) for a similar watermarking model, but with slightly different distortion constraints; see above. Also, Steinberg and Merhav [SM01] have investigated the identification capacity of a watermarking system with a fixed attack channel. In identification, questions of the form "Was watermark w sent?" need to be answered reliably instead of the usual "Which watermark was sent?". This more lenient requirement results in a doubly exponential growth in the number of watermarks; see also [AD89]. Furthermore, questions of this form might be what needs to be answered in some copyright protection applications. Finally, Karakos and Papamarcou [KP00a] have studied the trade-off between quantization and watermarking rate for data that needs to be both watermarked and compressed.

2.3.3  Similar Models: Steganography and Fingerprinting

In this section, we consider some models that are similar to watermarking and that have also generated recent interest. In steganography, the objective is to embed information so that an adversary cannot decide whether or not information has been embedded. This differs from our watermarking model since we assume that the attacker knows that information has been embedded, but has only limited means to remove it. For more on steganography see e.g., [AP98, Cac98, KP00b]. In fingerprinting, the embedded information is used to identify one of many users as opposed to a single owner. That is, the same covertext is given to different users with different watermarks. Thus, collusive attacks are possible on a fingerprinting system, while they are not possible on a watermarking system. In a collusive attack, many users contribute their distinct fingerprinted copies in order to create a better forgery. Several researchers [BBK01, BS98, CFNP00, CEZ00, SEG00] have studied the number of fingerprinted objects that a system can distribute under various conditions. The research on fingerprinting has focused largely on combinatorial lower bounds on the number of possible fingerprints, while there has been less work on information-theoretic upper bounds.

2.3.4  Communication Games

We have seen that watermarking can be viewed as a communication game. At a low level, the encoder and decoder are playing a game against the attacker in which they are trying to communicate over a channel where the encoder's input sequence can be changed arbitrarily (subject to a distortion constraint), while the attacker is trying to prevent such reliable communication. This is similar to the arbitrarily varying channel (AVC), in which the encoder and decoder have to be designed to reliably send a message over a channel with many possible states, in which the channel state can change arbitrarily (as opposed to stochastically). At a higher level, the encoder and decoder are trying to maximize the set of achievable rates while the attacker tries to minimize this set. This is similar to many mutual information games, in which a communicator and a jammer try to maximize and minimize, respectively, a mutual information expression. The solution to a mutual information game can sometimes be used to describe the maximum achievable rate for a communication system. The AVC and a mutual information game are discussed in more detail in Section 2.5.3 and Chapter 3, respectively.

We now consider a sample of other communication games that have been investigated. In one game [Bas83, BW85], a power-constrained transmitter tries to send a sequence of Gaussian random variables to a receiver with minimum mean-squared error, while a jammer (with some knowledge of the transmitter's input) attempts to maximize the error. In another game [MSP00], a transmitter can choose which slots in a slotted communication channel to transmit in and the jammer can choose which slots to jam. Both transmitter and jammer are constrained by a dissipative energy model, so that if power P_n (which can be either zero or some fixed value) is used in slot n, then Σ_{n=0}^{m} δ^(m-n) P_n ≤ P_max for all m, where δ and P_max are given constants. In a final game [GH99, SV00], a transmitter tries to use the timing of packets to send information over a network (as in "Bits through Queues" [AV96]), while a jamming network provider attempts to minimize the information rate subject to a constraint that he must deliver the packets in a timely fashion.

2.4

Assumptions in Watermarking Model

In this section, we review some of the assumptions made in the watermarking model. In Section 2.4.1, we briefly discuss whether capacity is a good measure for a watermarking system. We then discuss randomization, and in particular when it is not necessary, for the encoder/decoder (Section 2.4.2) and for the attacker (Section 2.4.3). In Section 2.4.4, we discuss the distortion constraints that we impose in the watermarking model. In Section 2.4.5, we discuss the covertext distributions that we have chosen to study.

2.4.1  Is Capacity Meaningful?

In Section 2.2, we described the watermarking capacity for many scenarios, but we have not addressed whether the capacity of a watermarking system is a meaningful concept; we now discuss this issue. In order for the asymptotic analysis in the definition of capacity to be meaningful, there should be effectively limitless covertext data and unending watermark information to embed. This might not always be the case for a copyright protection application, since there would usually be a fixed length covertext and one of a fixed number of messages to embed. However, in many instances the data to be watermarked is quite long (e.g., a movie or an album), and the asymptotic regime can be safely assumed. Furthermore, there are other applications, such as hybrid digital/analog transmission and closed captioning, in which the above assumptions are met more generally. In any case, we think that the capacity achieving scheme should shed light on how to design a good watermarking system even for a non-asymptotic situation.

2.4.2  Randomization for Encoder/Decoder

There is a difference between the randomized coding used here and Shannon's classical random coding argument (see, for example, [CT91, Chap. 8.7]). In the latter, codebooks are chosen from an ensemble according to some probability law, and it is shown that the ensemble-averaged probability of error is small, thus demonstrating the existence of at least one codebook from the ensemble for which the probability of error is small. For the watermarking game, on the other hand, randomization is not a proof technique that shows the existence of a good codebook, but a defining feature of the encoding. For example, the randomization at the encoder prevents the attacker from knowing the particular mapping used for each message; the attacker only knows the strategy used for generating the codewords. See [LN98] for more on this subject.

Nevertheless, in the private version of the watermarking game, common randomness is typically not needed between the encoder and the decoder, and deterministic codes suffice. For example, consider an IID Gaussian covertext. Part of the covertext, to which both the encoder and the decoder have access, can be used instead of the secret key Θ1. Indeed, the encoder could set x_1 = 0 and use the random variable U_1 as the common random experiment. The extra distortion incurred by this policy can be made arbitrarily small by making n sufficiently large. Since U_1 is a real-valued random variable with a density, it is sufficient to provide the necessary randomization. Even if the covertext does not have a density, a similar technique can be used to show that a secret key is not necessary in the private version, as long as the number of samples from the covertext used for randomization does not asymptotically affect the distortion. Indeed, Ahlswede [Ahl78] has shown that only O(n^2) codebooks are necessary to achieve randomization in many situations (for any two functions f(n) and g(n), we write f(n) = O(g(n)) if f(n)/g(n) is bounded for all n). Thus, only O(log n) random bits available to both the encoder and decoder are needed to specify which codebook to use. Thus, if the covertext is a discrete memoryless source, then O(log n) samples from the covertext (which is known to both the encoder and decoder in the private version) can be used to specify the codebook. In order to prevent the attacker from learning anything about the codebook, the encoder should make the stegotext samples independent of the covertext samples that are used to specify the codebook, which results in some extra distortion. However, if the distortion constraint is bounded, then the extra distortion that is needed to implement this scheme is O(log n / n), which can be made negligible by making the blocklength n large enough.

2.4.3  Randomization for Attacker - Deterministic is Sufficient

We allow the attacker to implement a randomized strategy. However, to prove achievability in the watermarking game, we can without loss of generality limit the attacker to deterministic attacks. That is, it is sufficient to show that the average probability of error (averaged over the side information, secret key and message) is small for all attacker mappings

    y = g_n(x),    (2.26)

instead of the more general g_n(x, θ2). With an attacker of this form, the distortion constraint (2.3) can be rewritten as d2(X, g_n(X)) ≤ D2, almost surely. Indeed, we can evaluate the average probability of error (averaged over everything including the attack key Θ2) by first conditioning on the attack key Θ2. Thus, if the average probability of error given every attacker mapping of the form (2.26) is small, then the average probability of error for any general attacker mapping of the form (2.2) is also small. This idea is similar to the argument (which we outlined above in Section 2.4.2) that deterministic codebooks are sufficient for a fixed channel.

2.4.4



Distortion Constraints

Admittedly, the technique we have used to decide whether two data sequences are "similar" has some flaws. However, the simplicity of our technique allows us to derive closed form solutions that hopefully will give some intuition for more realistic scenarios. To review, we say that data sequences x and y are similar if (1/n) Σ_{i=1}^{n} d(x_i, y_i) ≤ D for some non-negative function d(·,·) and some threshold D. The first potential problem is that y could be a shifted or rotated version of x and thus very "similar" to x. However, our distortion measure would not recognize the similarity. This will affect our watermarking performance since we only require decoding from forgeries that are similar according to our distortion measure. One way to overcome this problem is to watermark in a domain (e.g., Fourier) that is relatively robust to such transformations [LWB+01, ORP97]. Another way to overcome this problem is for the encoder to use some of its available distortion to introduce a synchronization signal that the decoder can use to align the samples of the covertext and the forgery [PP99]. The second potential problem is that there might not be a pointwise function d(·,·) so that our distortion measure corresponds to perceptual distortion. Much work has been devoted to developing models of human perception to design good data compression schemes; see e.g., [JJS93, MS74] and references therein. It is clear that the squared difference distortion measure that we have mainly used does not directly correspond to human perceptual distortion, but our

distortion measure is tractable and provides a decent first approximation. We would like to integrate some more knowledge of the human perceptual system into our watermarking model in the future.

We require that the attacker satisfy a distortion constraint between the stegotext X and the forgery Y. This is plausible because the attacker observes the stegotext and thus he knows exactly what forgeries are allowed. Since one basic purpose of this constraint is to ensure that the forgery is similar to the covertext U, some have suggested that the attacker's constraint be between U and Y [Mou01]. However, with this alternative constraint, if the amount of distortion the attacker can add is small (but non-zero), then the watermarking system can send unlimited information, which seems unreasonable. On the other hand, for the SGWM game (with our original constraint) we saw that the capacity is zero only if D2 > σ_u^2 + D1 + 2σ_u√D1, while if D2 > σ_u^2, then the attacker could set the forgery to zero, resulting in no positive achievable rates and a distortion between U and Y of approximately σ_u^2 < D2. Thus, the capacity under our constraint is potentially too large for large attacker distortion levels, while the capacity under the alternative constraint is potentially too large for small attacker distortion levels.
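The similarity criterion of this section is simple to state in code. The sketch below uses the squared difference distortion and also illustrates the first potential problem discussed above: a cyclic shift of a sequence, which may be perceptually almost identical to it, can nevertheless have large pointwise distortion (the sequences and threshold here are illustrative assumptions):

```python
def distortion(x, y):
    # Average squared difference distortion between equal-length sequences.
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)

def similar(x, y, threshold):
    # The similarity test of this section: (1/n) * sum_i d(x_i, y_i) <= D.
    return distortion(x, y) <= threshold

x = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
shifted = x[1:] + x[:1]                # cyclic shift: perceptually close to x

assert similar(x, x, 0.01)
assert not similar(x, shifted, 0.5)    # the pointwise measure misses the similarity
```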

2.4.5  Statistics of Covertext

In our study of watermarking, we have largely focused on Gaussian covertext distributions. Such a distribution might arise in transform domains where each sample is a weighted average of many samples from the original domain, in which case one would expect the central limit theorem to play a role. Indeed, some studies [BBPR98, JF95, Mul93] have found that the discrete cosine transform (DCT) coefficients for natural images are well modeled as generalized Gaussian random variables, which include the standard Gaussian distribution as a special case. (The generalized Gaussian density is defined by f_X(x) = (γ η(γ) / (2σ Γ(1/γ))) exp(-(η(γ)|x/σ|)^γ), where η(γ) = √(Γ(3/γ)/Γ(1/γ)), Γ(·) is the usual gamma function and γ is the so-called shape parameter. The generalized Gaussian is equivalent to the standard Gaussian when γ = 2.) While a Gaussian model is reasonable for many types of sources that might need to be watermarked, there are other sources that require watermarking that cannot be so modeled; examples include VLSI designs [Oli99] and road maps [KZ00]. A shortcoming of the Gaussian assumption is that the data we are interested in will be stored on a computer, and hence the distribution could only be a quantized approximation of a Gaussian distribution. If the quantization is too coarse, then the Gaussian assumption would not be reasonable. For example, one-bit quantization of every covertext sample would lead to the binary watermarking game, which we have seen to have much smaller capacity than the scalar Gaussian watermarking game. However, we are more likely to be interested in high fidelity storage of the data, and the Gaussian approximation is more reasonable in this case.
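The generalized Gaussian density quoted above is easy to check numerically. The sketch below implements it with only the standard library and verifies that the shape parameter γ = 2 recovers the usual Gaussian density (the test points are arbitrary):

```python
import math

def gen_gaussian_pdf(x, sigma, gamma):
    # Generalized Gaussian density with standard deviation sigma and shape gamma.
    eta = math.sqrt(math.gamma(3.0 / gamma) / math.gamma(1.0 / gamma))
    coef = gamma * eta / (2.0 * sigma * math.gamma(1.0 / gamma))
    return coef * math.exp(-((eta * abs(x) / sigma) ** gamma))

def std_gaussian_pdf(x, sigma):
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

# Shape parameter gamma = 2 recovers the standard Gaussian density.
assert abs(gen_gaussian_pdf(0.7, 1.5, 2.0) - std_gaussian_pdf(0.7, 1.5)) < 1e-12
```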

2.5

Uncertainty in the Watermarking Model

In watermarking, an encoder/decoder pair has to deal with two sources of uncertainty, the covertext and the attacker. In our model, the covertext is generated stochastically from some known distribution while the attacker can take on any form subject to a distortion constraint. In Section 2.5.1, we formalize the differences between these two types of uncertainty into stochastically generated states and arbitrarily generated states. We then consider two models: one that contains only stochastically generated states (communication with side information, Section 2.5.2) and one that contains only arbitrarily generated states (the arbitrarily varying channel, Section 2.5.3). In Section 2.5.4, we consider an instance of communication with side information, Costa's "writing on dirty paper" model [Cos83], and describe two extensions to this model.

2.5.1

Types of State Generators

In order to discuss the types of state generators, we consider a communication channel whose transition probability depends on a state s. That is, given the value of the current state s and the current input x, the output of the channel is a random variable Y with distribution P_{Y|X,S}(·|x, s), where we assume throughout that P_{Y|X,S} is known. Furthermore, given the state sequence s and the input sequence x, the output sequence Y is generated in a memoryless fashion, so that

    P(Y | x, s) = Π_{i=1}^{n} P_{Y|X,S}(Y_i | x_i, s_i).    (2.27)
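The memoryless law (2.27) can be sketched for a toy binary state-dependent channel (the crossover probabilities below are illustrative assumptions): the probability of an output sequence is just the product of per-symbol transition probabilities.

```python
import itertools

def p_symbol(y, x, s):
    # Toy P(y | x, s): flip the input with probability 0.1 when the state is 0
    # and with probability 0.4 when the state is 1 (assumed values).
    flip = 0.1 if s == 0 else 0.4
    return flip if y != x else 1.0 - flip

def p_sequence(y, x, s):
    # The memoryless law (2.27): a product of per-symbol transition probabilities.
    prob = 1.0
    for yi, xi, si in zip(y, x, s):
        prob *= p_symbol(yi, xi, si)
    return prob

# Sanity check: for fixed x and s, the probabilities over all outputs sum to 1.
x, s = (0, 1, 1), (0, 0, 1)
total = sum(p_sequence(y, x, s) for y in itertools.product((0, 1), repeat=3))
assert abs(total - 1.0) < 1e-12
```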

Figure 2-3: Watermarking model with state sequences.

We would like to describe the coding capacity for such a channel. That is, we would like to answer the usual question, "For which rates R can we reliably communicate nR bits using the channel n times?" In general, reliable communication means that the probability of error can be made as small as desired by making the blocklength n large enough. The definition of probability of error that we use affects the capacity and depends on how the state sequence is generated. Unless stated otherwise, we focus on probability of error averaged (as opposed to maximized) over all possible bit sequences (we make the usual assumption that all bit sequences are equally likely) and sequence-wise probability of error (as opposed to bit-wise). We will also assume that the encoder and decoder share a source of randomness and that the probability of error is averaged over this source of randomness as well. We now consider two possible methods for generating the state sequence:

1. The state sequence S could be generated stochastically from some known distribution P_S (usually independently of the other sources of randomness). In this case, we will be interested in the probability of error averaged over the possible values of the state sequence.

2. The state sequence s could be generated arbitrarily, possibly subject to some constraint. In this case, we will want to ensure that the probability of error can be made small for every possible state sequence s.

Restatement of Watermarking Model

We can think of our watermarking model as having two state sequences, one generated stochastically and one generated arbitrarily. This idea is depicted in Figure 2-3 (for the public version only). Here, the stochastically generated state U is the covertext and the arbitrarily generated state S describes the mapping between the stegotext X and the forgery Y. (This arbitrarily generated state is actually an arbitrary mapping s(X) that we write as the random vector S.) For example, if X and Y are real random vectors, then S could be the difference between Y and X, and P_{Y|X,S}(·|x, s) is the unit mass on x + s. This form is particularly useful when the attacker's distortion function can be written d2(x, y) = d2(y - x). In this case, the attacker's distortion constraint becomes a constraint solely on the sequence S. Note that the attacker knows the stegotext X, and thus the arbitrary state sequence S is actually a mapping from X^n into Y^n. Thus, the encoder/decoder pair wishes to make the average probability of error small for every possible attacker mapping, where the probability of error is averaged over all sources of randomness including the covertext. Both the encoder and the arbitrary state sequence S are subject to distortion constraints. Thus, although the stochastically generated state sequence U does not directly affect the channel, it does indirectly affect the channel through the constraint on the encoder's output.

2.5.2  Communication with Side Information

We now consider a model with only stochastically generated states, like the covertext in the watermarking game. When known at the encoder or decoder, the state sequence is called side information and thus this model is referred to as communication with side information. An example where the side information is known at the encoder only is depicted in Figure 2-4. All of the models in this section assume that the stochastic state sequence is generated in an IID manner according to a known distribution P_U.

Shannon [Sha58] first studied this problem under the assumption that the encoder must be causal with respect to the side information. That is, the ith channel input x_i can be a function of only the message and the channel states up to and including time i. Gel'fand and Pinsker [GP80] later found the capacity assuming (as we do in the watermarking game) that the encoder has non-causal access to the side information. That is, the channel input vector x ∈ X^n can be a function of the message and the channel state vector u ∈ U^n. A causal encoder makes practical sense in many real time applications, but a non-causal encoder also makes sense in other situations, such as watermarking or storing information on a partially defective hard drive. Heegard and El Gamal [HEG83] considered a generalization of [GP80] where the state sequence can be described non-causally to both the encoder and decoder, but only using rates R_e and R_d, respectively.

Figure 2-4: Communication with side information at the encoder.

Capacity Results

We now give the capacity of the channel with side information in two scenarios: when the state sequence U is known non-causally to the encoder only, and when the state sequence U is known non-causally to both the encoder and decoder. As in the watermarking game, we will refer to these scenarios as the public and private versions, respectively. Note that these results are proved assuming that the sets X, U and Y are finite. For the private version with non-causal side information, the capacity is given by [Wol78, HEG83]

    C_priv^NCSI = max_{P_{X|U}} I(X; Y | U),    (2.28)

where the mutual information is evaluated with respect to the joint distribution P_{U,X,Y} = P_U P_{X|U} P_{Y|X,U}. Recall that P_U and P_{Y|X,U} are given.

For the public version with non-causal side information, the capacity is given by [GP80, HEG83]

    C_pub^NCSI = max_{P_{V|U}, f: V×U → X} [I(V; Y) - I(V; U)],    (2.29)

where V is an auxiliary random variable with alphabet size |V| ≤ |X| + |U| - 1, and the mutual informations are evaluated with respect to the joint distribution

    P_{U,V,X,Y}(u, v, x, y) = P_U(u) P_{V|U}(v|u) P_{Y|X,U}(y|x, u)  if x = f(v, u),
                            = 0                                     otherwise.    (2.30)

The achievability of this capacity is proved using a random binning argument that we will also use to prove the watermarking capacity result. Note that the capacity with causal side information is similar [Sha58], except that P_{V|U} is replaced by P_V in (2.29) and (2.30). The capacity with non-causal side information can be strictly greater than the capacity with causal side information. Thus, we would not expect the results on watermarking to directly carry over to a causal situation.

Fixed Attack Watermarking

One potential attack strategy in the watermarking game is a memoryless channel based on some conditional distribution P^attack_{Y|X}. Of course, the attacker should choose this distribution so that the distortion constraint is met either with high probability or in expectation. Assuming such an attack strategy is used and known to both the encoder and decoder, an extension of (2.28) or (2.29) can be used to describe the achievable rates for this scenario. In the following lemma, we describe the capacity of the public version with non-causal side information when the encoder is required to meet a distortion constraint between the side information and the channel input, which can be used to describe the watermarking capacity with a fixed attack channel.

Lemma 2.1.

apa ity with a xed atta k hannel. Lemma 2.1.

For the ommuni ation with side information model with nite alphabets, if

the side information is available non- ausally to the en oder only and the en oder is required to satisfy 1X n

n

i

=1

(

d1 ui ; xi

)  D1 ; a.s.;

49

(2.31)

for some non-negative fun tion NCSI

Cpub

d1

(; ). Then, the apa ity is given by

(D1 ) =

P

max

V jU ; f :VU7!X ;

E [d1 (U;X )℄

where

V

( ;Y )

I V

( ; U );

(2.32)

I V



D1

is an auxiliary random variable with nite alphabet, and the mutual informations

are evaluated with respe t to the joint distribution (2.30). The proof of this lemma an be found in Appendix B.2. The a hievability part and most of the onverse part of the proof follow dire tly from the proof of Gel'fand and Pinsker [GP80℄. One tri ky part involves showing that the onditional distribution

PX

j

V;U

is de-

terministi (i.e., only takes values of 0 and 1). We will use this lemma to simplify the evaluation of the publi version of the binary watermarking game; see Se tion 6.2.2. An atta ker in the watermarking game annot implement a general hannel based on

both the input and the state since the attacker does not directly know the state sequence U (i.e., the covertext). However, this result can be used to analyze fixed attack watermarking by substituting P_{Y|X,U}(y|x, u) = P^attack_{Y|X}(y|x) for all u ∈ U. This analysis inspires the mutual information games that we will describe in Chapter 3. In short, the mutual information game will further modify (2.32) and the analogous result for the private version by adding a minimization over feasible attack "channels" P^attack_{Y|X}, where feasible means that the distortion constraint is met in expectation. It is not clear that the solution to the mutual information game describes the capacity of the watermarking game. This is partly because a decoder for communication with side information uses knowledge about the channel's conditional distribution, while in the watermarking game, the attacker can choose any feasible attack channel after the decoder has been deployed.

2.5.3  Arbitrarily Varying Channels

We now turn our attention to states that can be generated arbitrarily. That is, there is no probability distribution on the state sequences, and any performance guarantees have to be valid for every possible state sequence. In the watermarking game, the attacker (under the a.s. distortion constraint) can produce an arbitrary sequence (subject to the distortion constraint) in its attempt to confuse the encoder and decoder. The basic arbitrarily varying channel (AVC) was introduced in [BBT60] and has a single arbitrarily generated state sequence s that determines the conditional distribution

Figure 2-5: Gaussian arbitrarily varying channel: U is an IID Gaussian sequence and s is an arbitrary power constrained sequence.

of the channel as in (2.27). Unlike the usual communication scenario (e.g., a memoryless channel), the capacity depends on whether average or maximum probability of error is used and on whether there is a common source of randomness available to the encoder and decoder. Many variations of the AVC have been studied; see e.g., [CK81, LN98] for extensive references. Unlike the watermarking game, the state sequence and the input to the channel are usually assumed to be chosen independently. However, see [Ahl86] for analysis of the AVC when the state sequence is known to the encoder and [AW69] for analysis of the AVC when the input sequence is known to the state selector. Csiszar and Narayan [CN88a, CN88b] considered an instance of the AVC that has particular relevance to the watermarking game, in which the input sequence x and the state sequence s must satisfy respective constraints. The capacity results depend on whether the constraints are enforced almost surely or in expectation, as is also the case for the watermarking game (compare Theorems 2.1 and 2.3).

respe tive onstraints. The apa ity results depend on whether the onstraints are enfor ed almost surely or in expe tation, as is also the ase for the watermarking game ( ompare Theorems 2.1 and 2.3).

The Gaussian Arbitrarily Varying Channel

The Gaussian arbitrarily varying channel (GAVC), introduced by Hughes and Narayan [HN87], is a particular AVC with constrained inputs and states that is related to the Gaussian watermarking game. In the GAVC (illustrated in Figure 2-5), the input and state sequences must both satisfy power constraints, and the channel is given by

    Y = X + s + Z,

where Z is an IID sequence of N(0, σ^2) random variables, s is an arbitrary sequence (subject to (1/n)‖s‖^2 ≤ D2), and the input X is similarly power limited to D1. Hughes and Narayan [HN87] found that the capacity of the GAVC (when a source of common randomness is available to the encoder and decoder) is given by

    C^GAVC(D1, D2, σ^2) = (1/2) log(1 + D1 / (D2 + σ^2)).    (2.33)
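Expression (2.33) is simple to evaluate; a small sketch (in nats, using the natural logarithm):

```python
import math

def c_gavc(d1, d2, sigma2):
    # GAVC capacity (2.33): 0.5 * log(1 + D1 / (D2 + sigma^2)), in nats.
    return 0.5 * math.log(1.0 + d1 / (d2 + sigma2))

# Same value as an AWGN channel whose noise power is the jammer power plus the
# ambient noise power; the capacity falls as either D2 or sigma^2 grows.
assert c_gavc(1.0, 1.0, 0.5) > c_gavc(1.0, 2.0, 0.5) > c_gavc(1.0, 2.0, 1.0)
```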

Note that this is the same capacity that would result if s were replaced by an IID sequence of N(0, D2) random variables. Further note that if the a.s. power constraints are replaced by expected power constraints, then the capacity of the GAVC is zero, although the ε-capacity (the supremum of all rates such that the probability of error is at most ε) is positive and increasing with ε.

Let us now consider an alternate description of the GAVC in order to highlight the similarities with the watermarking game with an IID Gaussian covertext. The GAVC can be obtained from the watermarking game by slightly modifying the capabilities of both the encoder and the attacker, as we now outline. First, the encoder must be of the form X = U + X̃, where X̃ is independent of U (but not independent of the watermark W). Second, the attacker must form the attack sequence s independently of X. Thus, the overall channel is given by Y = X̃ + s + U, where X̃ is a power limited sequence depending on the message, s is a power limited arbitrary sequence, and U is an IID sequence of Gaussian random variables independent of X̃ and s. Although both the encoder and attacker are less powerful in the GAVC than in the watermarking game, the effect does not cancel out. Indeed, the capacity of the GAVC decreases with the variance of U while the watermarking capacity increases; compare (2.6) and (2.33). Finally, note that the additive attack watermarking game of Section 2.2.2 with an IID Gaussian covertext is a combination of the GAVC and the scalar Gaussian watermarking game. In particular, this game uses the encoder from the watermarking game and the attacker from the GAVC. In this compromise between the two models, the capacity does not depend on the variance of U; see Theorem 2.2.

Extended Writing on Dirty Paper

A special case of communication with side information (see Section 2.5.2) is Costa's writing on dirty paper [Cos83], which is depicted in Figure 2-6. In this model, all of the sets X, Y, U and Z are the real line. Further, the encoder knows the state sequence U non-causally, and its output X = x(W, U) must satisfy a power constraint, i.e., n^{-1}‖X‖² ≤ D_1 a.s.

⁸The ε-capacity is the supremum of all rates such that the probability of error is at most ε.

Figure 2-6: Writing on dirty paper. The encoder maps (W, U) to X, the channel output is Y = X + U + Z, and the decoder produces Ŵ. U and Z are independent IID Gaussian sequences.

Finally, the output of the channel is given by

    Y = X + U + Z,    (2.34)

where both U and Z are independent IID sequences of zero-mean Gaussian random variables of variances σ_u² and D_2, respectively. We will call U the covertext and Z the jamming sequence. Costa's main result is that the capacity is the same whether or not the covertext U is known to the decoder. When U is known to the decoder, the channel effectively becomes Y = X + Z, i.e., the classical power limited Gaussian channel. Thus, the capacity is given by (1/2) log(1 + D_1/D_2), which does not depend on σ_u². Others [Che00, YSJ+01] have extended this result to when U and Z are independent non-white (i.e., colored) Gaussian processes.

In this section, we describe two further extensions of Costa's result. First, when U has any (power limited) distribution and Z is an independent colored Gaussian process, we show that the capacity with non-causal side information (the random vector U) at the encoder is the same as the capacity with side information at both the encoder and decoder. A similar result was given simultaneously by Erez, Shamai, and Zamir [ESZ00]. Second, we show that the additive attack watermarking game with Gaussian covertext (see Section 2.2.2) is an extension of Costa's result to arbitrarily varying noise.

Extension 1: Any Distribution on Covertext, Colored Gaussian Jamming Sequence

We first generalize Costa's result to the case where the side information U is an IID sequence of random variables with some arbitrary (but known) distribution, while the noise sequence Z is still an IID sequence of mean-zero, variance-D_2 Gaussian random variables. We will then be able to further generalize to the above assumptions.

Recall that the maximum of I(V; Y) − I(U; V) (see (2.32)) is the capacity for a channel with non-causal side information at the encoder only. Although this result was only proved for finite alphabets, it is straightforward to extend the achievability part to infinite alphabets, which is all that we will need. Indeed, we will specify the joint distribution of an auxiliary random variable V, the input X and the side information U such that I(V; Y) − I(U; V) equals the capacity when U is not present at all, which also acts as an upper bound on the capacity for writing on dirty paper.

We now specify the necessary joint distribution. Let X be a zero-mean Gaussian random variable of variance D_1 that is independent of U; this clearly satisfies E[X²] ≤ D_1. Also, let the auxiliary random variable V = αU + X, where α = D_1/(D_1 + D_2). (As in (2.32), we could have first generated V conditioned on U and then generated X as a function of V and U.)

The preceding steps replicate Costa's original proof. At this point, he calculated I(V; Y) − I(U; V) to be (1/2) log(1 + D_1/D_2), assuming that both U and Z are Gaussian random variables. This is sufficient to prove the original result, since all rates less than this are achievable and the capacity cannot exceed the capacity without U, which is also given by this expression. We shall assume that only Z is Gaussian, but we shall obtain the same result. With our choice of the auxiliary random variable V,

    V − α(X + U + Z) = X − α(X + Z),    (2.35)

and with our choice of the random variables, X − α(X + Z) and X + Z are uncorrelated and hence, being zero-mean jointly Gaussian, also independent⁹. Furthermore, the random variables X − α(X + Z) and X + U + Z are independent since U is independent of (X, Z).

⁹Another way to view the choice of α is that α(X + Z) is the minimum mean squared error (MMSE) estimate of X given X + Z.

Consequently,

    h(V | X + U + Z) = h(V − α(X + U + Z) | X + U + Z)
                     = h(X − α(X + Z))
                     = h(X − α(X + Z) | X + Z)
                     = h(X | X + Z),    (2.36)

where all of the differential entropies exist since X and Z are independent Gaussian random variables, and the second and third equalities follow by (2.35) and the independence discussed above. Also, the independence of U and X implies that

    h(V | U) = h(αU + X | U) = h(X | U) = h(X).    (2.37)

We can now compute that

    I(V; X + U + Z) − I(V; U) = h(V) − h(V | X + U + Z) − h(V) + h(V | U)
                              = I(X; X + Z)
                              = (1/2) log(1 + D_1/D_2),

where the first equality follows by the definition of mutual information; the second equality follows from (2.36) and (2.37); and the last equality holds because X and Z are independent Gaussian random variables of variances D_1 and D_2, respectively.

Let us now consider general independent random processes U and Z as the known and unknown, respectively, additive noise components in the writing on dirty paper model. Also, let the random process X* have the capacity achieving distribution for a channel with additive noise Z (i.e., P_{X*} = arg max_{P_X} I(X; X + Z), where the maximum is over distributions that satisfy any required constraints). The preceding arguments can be repeated as long as there exists a linear¹⁰ function α(·) such that X* − α(X* + Z) is independent of Z. (We also need U to be power limited so that all of the differential entropies are finite.)

¹⁰The linearity of α(·) is needed in (2.35).

That is, for this random process X* and this linear function α(·), if V = α(U) + X*, then I(V; Y) − I(U; V) = I(X*; X* + Z), which is the capacity without U by our choice of X*.

We can thus show that Costa's result can be extended to any (power-limited) distribution on U and a colored Gaussian distribution on Z. This follows since the capacity achieving X* associated with Z is also Gaussian (with variances given by the water-filling algorithm) [CT91]. Furthermore, for any two independent Gaussian (and hence jointly Gaussian) processes, we can find a linear function α(·) that satisfies the above independence property.

We can also use an interleaving argument to show that if Costa's result holds for any power-limited IID law on U, then it should also hold for any power-limited ergodic law. Furthermore, by diagonalizing the problem and reducing it to a set of parallel scalar channels whose noise component (the component that is known to neither encoder nor decoder) is IID [HM88, Lap96], it should be clear that it suffices to prove (as we have done above) this result for the case where Z is IID.
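For the special case where U is also Gaussian, the identity I(V; Y) − I(V; U) = (1/2) log(1 + D_1/D_2) can be checked mechanically from the joint covariances, using I(A; B) = (1/2) log(σ_A² σ_B² / det Cov(A, B)) for jointly Gaussian scalars. The following sketch is our own check, not part of the thesis; only α = D_1/(D_1 + D_2) comes from the text.

```python
import math

def gaussian_mi(var_a, var_b, cov_ab):
    """I(A;B) in nats for jointly Gaussian scalars A and B."""
    det = var_a * var_b - cov_ab ** 2
    return 0.5 * math.log(var_a * var_b / det)

D1, D2, var_u = 1.3, 0.7, 2.5           # arbitrary test values
alpha = D1 / (D1 + D2)                   # Costa's choice of alpha

# X ~ N(0,D1), U ~ N(0,var_u), Z ~ N(0,D2) independent;
# V = alpha*U + X and Y = X + U + Z.
var_v = alpha ** 2 * var_u + D1
var_y = D1 + var_u + D2
cov_vy = alpha * var_u + D1              # E[V Y]
cov_vu = alpha * var_u                   # E[V U]

lhs = gaussian_mi(var_v, var_y, cov_vy) - gaussian_mi(var_v, var_u, cov_vu)
rhs = 0.5 * math.log(1.0 + D1 / D2)
print(lhs, rhs)  # the two agree for any choice of var_u
```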

Extension 2: IID Gaussian Covertext, Arbitrary Jamming Sequence

For the additive attack watermarking game with IID Gaussian covertext, we have shown that the capacity is the same for both the private and public versions; see Section 2.2.2. This provides an extension of Costa's writing on dirty paper result to the case where the jamming is an arbitrarily varying power-limited sequence. Note that the stegotext X in the watermarking game corresponds to U + X here.

When the covertext U is IID Gaussian, the additive attack watermarking game is similar to Costa's writing on dirty paper. In particular, the former model differs from the latter only in two respects. First, the jamming sequence distribution is arbitrary (subject to (2.11)) instead of being an IID Gaussian sequence. Second, the jamming sequence distribution is unknown to the encoder and decoder. Nevertheless, the two models give the same capacity, thus demonstrating that the most malevolent additive attack for the watermarking game is an IID Gaussian one.

Chapter 3

Mutual Information Games

In this chapter, we consider two mutual information games that are motivated by the capacity results of Wolfowitz [Wol78] and Gel'fand and Pinsker [GP80] on communication with side information discussed in Section 2.5.2. We define the private mutual information game based on the capacity of a communication channel with side information non-causally available to both the encoder and decoder; see (2.28). Similarly, we define the public mutual information game based on the capacity of a communication channel with side information non-causally available to only the encoder; see (2.29). Mutual information games have been considered in the context of watermarking previously by Moulin and O'Sullivan [OME98, MO99, MO00]. We focus on squared error distortion and IID Gaussian sources, and the resulting solution provides insight into how to approach the scalar Gaussian watermarking (SGWM) game.

The remainder of this chapter is organized as follows. In Section 3.1, we precisely define our mutual information games and give our main result on the value of the games. In Section 3.2, we sketch the proof of the main result using three main lemmas; the proofs of these lemmas can be found in Appendix B. In Section 3.3, we give a game theoretic interpretation of the mutual information games. In Section 3.4, we discuss some other mutual information games that have been previously considered.

3.1 Definition and Main Result

Given a covertext distribution P_U, a conditional law P_{X|U} ("watermarking channel") and a conditional law P_{Y|X} ("attack channel"), we can compute the conditional mutual information

    I_{P_U P_{X|U} P_{Y|X}}(X; Y | U) = D(P_{U,X,Y} ‖ P_U P_{X|U} P_{Y|U}),

where D(·‖·) is the Kullback-Leibler distance, which is defined for any probability measures P and Q as

    D(P‖Q) = ∫ log(dP/dQ) dP   if P ≪ Q,

and D(P‖Q) = ∞ otherwise. Here, dP/dQ is the Radon-Nikodym derivative of P with respect to Q, and P ≪ Q means that P is absolutely continuous with respect to Q. If P and Q have densities f_P and f_Q, then D(P‖Q) = E_P[log(f_P/f_Q)]. We can similarly compute other mutual information quantities.
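As a concrete instance of the density form D(P‖Q) = E_P[log(f_P/f_Q)], the sketch below (ours, for illustration only) compares the well-known closed-form Kullback-Leibler distance between two scalar Gaussians with a brute-force numerical integration of the defining integral.

```python
import math

def kl_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form D(P||Q) in nats for scalar Gaussians P and Q."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def kl_numeric(mu_p, var_p, mu_q, var_q, lo=-30.0, hi=30.0, steps=100000):
    """Midpoint-rule approximation of the integral of f_P log(f_P/f_Q)."""
    def pdf(x, mu, var):
        return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        fp = pdf(x, mu_p, var_p)
        if fp > 0.0:
            total += fp * math.log(fp / pdf(x, mu_q, var_q)) * dx
    return total

print(kl_gauss(0.0, 1.0, 1.0, 2.0), kl_numeric(0.0, 1.0, 1.0, 2.0))
```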

Like the watermarking game, the mutual information game is a game played between two players in which the second player (attacker) has full knowledge of the strategy of the first player (encoder). The main difference between the two games is that the strategies in the mutual information game are conditional distributions instead of mappings, and the payoff function is mutual information, which may or may not have an operational significance in terms of achievable rates.

We first describe the private mutual information game. For every n, the encoder chooses a watermarking channel P_{X|U} that satisfies the average distortion constraint (2.14), and the attacker then chooses an attack channel P_{Y|X} that satisfies the average distortion constraint (2.15). The quantity that the encoder wishes to maximize and that the attacker wishes to minimize is

    I_priv(P_U, P_{X|U}, P_{Y|X}) = (1/n) I_{P_U P_{X|U} P_{Y|X}}(X; Y | U),    (3.1)

which is the mutual information term in (2.28). The value of the private mutual information game is thus

    C^MI_priv(D_1, D_2, {P_U}) = lim inf_{n→∞}  sup_{P_{X|U} ∈ D_1(D_1, P_U)}  inf_{P_{Y|X} ∈ D_2(D_2, P_U, P_{X|U})}  I_priv(P_U, P_{X|U}, P_{Y|X}),    (3.2)

where

    D_1(D_1, P_U) = { P_{X|U} : E_{P_U P_{X|U}}[d_1(U, X)] ≤ D_1 },    (3.3)

and

    D_2(D_2, P_U, P_{X|U}) = { P_{Y|X} : E_{P_U P_{X|U} P_{Y|X}}[d_2(X, Y)] ≤ D_2 }.    (3.4)

Note that the choice of P_{X|U} influences the set of distributions from which P_{Y|X} can be chosen. Thus, this is not a standard static zero-sum game; it is better described as a dynamic two-stage zero-sum game of complete and perfect information.

We next describe the public mutual information game. We first define an auxiliary random vector V that depends on the random vectors U and X. The watermarking channel is expanded to include not only the conditional distribution P_{X|U} but also the conditional distribution P_{V|U,X}. Given the random vector X, the random vector Y is independent of both U and V, so that the joint distribution of the random vectors U, X, V and Y is the product of the laws P_U, P_{X|U}, P_{V|U,X}, and P_{Y|U,X,V} = P_{Y|X}. In the public version, the mutual information term from (2.29) is n^{-1}(I(V; Y) − I(V; U)), which is written more explicitly as

    I_pub(P_U, P_{X|U}, P_{V|U,X}, P_{Y|X}) = (1/n) ( I_{P_U P_{X|U} P_{V|U,X} P_{Y|X}}(V; Y) − I_{P_U P_{X|U} P_{V|U,X}}(V; U) ).    (3.5)

The value of the public mutual information game is thus

    C^MI_pub(D_1, D_2, {P_U}) = lim inf_{n→∞}  sup_{P_{X|U} ∈ D_1(D_1, P_U), P_{V|U,X}}  inf_{P_{Y|X} ∈ D_2(D_2, P_U, P_{X|U})}  I_pub(P_U, P_{X|U}, P_{V|U,X}, P_{Y|X}).    (3.6)

Note that the supremum is over a slightly more general set than (2.32), since we have not shown (as we did for finite alphabets in Lemma 2.1) that the maximizing joint distribution on the random vectors U, X and V makes X a deterministic function of U and V.

In the following theorem, whi h is proved in Se tion 3.2, we show that the apa ity of 59

the SGWM game C  (D1 ; D2 ; u2 ) is an upper bound on the values of the mutual information games for real alphabets and squared error distortions. Moreover, for IID Gaussian

overtexts, this upper bound is tight. Theorem 3.1.

For real alphabets and squared error distortions MI (D1 ; D2 ; fPU g) Cpub

 

where 2u is de ned by

2u = lim inf n!1

MI Cpriv (D1 ; D2 ; fPU g)

(3.7)

C  (D1 ; D2 ;  2u );

(3.8)

X n

1 E [U 2 ℄ n i=1 PU i

(3.9)

and is assumed nite. Equality is a hieved in both (3.7) and (3.8) if the overtext is zero-mean IID Gaussian. Re all the de nition of C  (D1 ; D2 ; u2 ) given in (2.6). This de nition and some other relevant de nitions used in this hapter are summarized in Appendix A. 3.2

Proof of Mutual Information Game Result

In this section, we sketch a proof of Theorem 3.1. The upper bound on the values of the games is based on a family of attack channels that will be described in Section 3.2.1. The equality for IID zero-mean Gaussian covertexts is based on the watermarking channels that will be described in Section 3.2.2. In Section 3.2.3, we will show that the proposed attack channels prove the upper bound (3.8) and that for IID zero-mean Gaussian covertexts, the proposed watermarking channels guarantee the claimed equality.

3.2.1 Optimal Attack Channel

The attack channel we propose does not depend on the version of the game, and is described next. Since the attacker is assumed to be cognizant of the covertext distribution P_U and of the watermarking channel P_{X|U}, it can compute

    A_n = E_{P_U P_{X|U}}[n^{-1} ‖X‖²].    (3.10)

It then bases its attack channel on A_n and on its allowed distortion D_2 as follows. If A_n ≤ D_2, then the attacker can guarantee zero mutual information by setting the forgery Y deterministically to zero without violating the distortion constraint. We shall thus focus on the case A_n > D_2. For this case the proposed attack channel is memoryless, and we proceed to describe its marginal.

For any A > D_2, let the conditional distribution P^A_{Y|X} have the density¹

    f^A_{Y|X}(y|x) = N(y; c(A, D_2) · x, c(A, D_2) · D_2),

where c(A, D_2) = 1 − D_2/A (also defined in (A.4)). Equivalently, under P^A_{Y|X} the random variable Y is distributed as c(A, D_2)X + S_2, where S_2 is a zero-mean variance-c(A, D_2)D_2 Gaussian random variable independent of X. The conditional distribution P^A_{Y|X} is thus equivalent to the Gaussian rate distortion forward channel [CT91] for a variance-A Gaussian source and an allowable distortion D_2.

For blocklength n and A_n > D_2, the proposed attacker P_{Y|X} is

    P_{Y|X} = (P^{A_n}_{Y|X})^n;

that is, P_{Y|X} has a product form with marginal P^{A_n}_{Y|X}, where A_n is given in (3.10). Notice that by (3.10) and the structure of the attack channel,

    E_{P_U P_{X|U} (P^{A_n}_{Y|X})^n}[n^{-1} ‖Y − X‖²] = (c(A_n, D_2) − 1)² A_n + c(A_n, D_2) D_2 = D_2.

Thus the attack channel (P^{A_n}_{Y|X})^n satisfies the distortion constraint. Compare this attack channel with the attacker (defined in Section 4.5.1) used in the proof of the converse of the SGWM game.
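The distortion calculation for the proposed attack channel reduces to a one-line algebraic identity. The short sketch below (our own naming) checks it and shows how a single channel use would be sampled.

```python
import math, random

def c(A, d2):
    """Scaling factor c(A, D2) = 1 - D2/A of the attack channel (A > D2)."""
    return 1.0 - d2 / A

def attack(x, A, d2, rng):
    """One use of the memoryless attack channel: Y = c*X + S2,
    where S2 ~ N(0, c*D2) is independent of X."""
    cc = c(A, d2)
    return cc * x + rng.gauss(0.0, math.sqrt(cc * d2))

A, d2 = 4.0, 1.0
cc = c(A, d2)
# E[(Y - X)^2] = (c - 1)^2 * A + c * D2, which simplifies to exactly D2:
distortion = (cc - 1.0) ** 2 * A + cc * d2
print(distortion)  # equals D2
```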

3.2.2 Optimal Watermarking Channel

In this section we focus on IID zero-mean variance-σ_u² Gaussian covertexts and describe watermarking channels that will demonstrate that for such covertexts (3.7) and (3.8) both hold with equality. The watermarking channels are memoryless, and it thus suffices to describe their marginals. The proposed watermarking channels depend on the version of the game, on (σ_u², D_1, D_2), and on a parameter A ∈ A(D_1, D_2, σ_u²), where A(D_1, D_2, σ_u²) is defined in (A.7). The choice of A is at the watermarker's discretion. Later, of course, we shall optimize over this choice.

¹We use N(x; μ, σ²) to denote the density at x of a Gaussian distribution of mean μ and variance σ².

Private Version:

For any A ∈ A(D_1, D_2, σ_u²), let the conditional distribution P^A_{X|U} be Gaussian with mean b_1(A, D_1, σ_u²)U and variance b_2(A, D_1, σ_u²), i.e., have the density

    f^A_{X|U}(x|u) = N(x; b_1(A, D_1, σ_u²)u, b_2(A, D_1, σ_u²)),

where b_1(A, D_1, σ_u²) = (A + σ_u² − D_1)/(2σ_u²) and b_2(A, D_1, σ_u²) = D_1 − (A − σ_u² − D_1)²/(4σ_u²) (also defined in (A.2) and (A.3)). Equivalently, under P^A_{X|U} the random variable X is distributed as b_1(A, D_1, σ_u²)U + S_1, where S_1 is a zero-mean variance-b_2(A, D_1, σ_u²) Gaussian random variable that is independent of U. For IID zero-mean variance-σ_u² Gaussian covertexts we have

    E_{P_U (P^A_{X|U})^n}[n^{-1} ‖X − U‖²] = (b_1(A, D_1, σ_u²) − 1)² σ_u² + b_2(A, D_1, σ_u²) = D_1.

Thus for this covertext distribution (and, in fact, for any covertext distribution with variance σ_u²), the watermarking channel (P^A_{X|U})^n satisfies the distortion constraint. Furthermore,

    E_{P_U (P^A_{X|U})^n}[n^{-1} ‖X‖²] = A,

which gives an interpretation of the parameter A as the power in the stegotext induced by the covertext and the watermarking channel. Compare this watermarking channel with the achievability scheme for the private SGWM game given in Section 4.2.1.

Public Version:

For the public game, the conditional distribution of the random vector V given the random vectors U and X is also needed. The optimal such distribution turns out to be deterministic and memoryless. In particular, for A as above, let the distribution P^A_{V|U,X} be described by

    V = (α(A, D_1, D_2, σ_u²) − 1) U + X,

where α(A, D_1, D_2, σ_u²) = 1 − b_1(A, D_1, σ_u²)/(1 + s(A, D_1, D_2, σ_u²)) and s(A, D_1, D_2, σ_u²) = c(A, D_2) b_2(A, D_1, σ_u²)/D_2 (also defined in (A.6) and (A.5)). Finally, the blocklength-n conditional law is the product (P^A_{V|U,X})^n. Compare this expanded watermarking channel with the achievability scheme for the public SGWM game given in Section 4.3.1.
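The parameters b_1, b_2, c and s of the proposed watermarking and attack channels satisfy identities used repeatedly in the sequel: the embedding distortion is exactly D_1, the stegotext power is exactly A, and the resulting rate is (1/2) log(1 + s). The sketch below (our own function names) verifies the first two and evaluates the third; the particular choice A = σ_u² + D_1 is assumed admissible here for illustration and is not the optimizing value.

```python
import math

def b1(A, d1, var_u): return (A + var_u - d1) / (2.0 * var_u)
def b2(A, d1, var_u): return d1 - (A - var_u - d1) ** 2 / (4.0 * var_u)
def c(A, d2):         return 1.0 - d2 / A
def s(A, d1, d2, var_u): return c(A, d2) * b2(A, d1, var_u) / d2

var_u, d1, d2 = 1.0, 0.3, 0.2
A = var_u + d1          # assumed-admissible stegotext power (not the optimizer)

# Under X = b1*U + S1 with Var(S1) = b2 and U ~ N(0, var_u):
embed_dist = (b1(A, d1, var_u) - 1.0) ** 2 * var_u + b2(A, d1, var_u)  # = D1
stego_pow  = b1(A, d1, var_u) ** 2 * var_u + b2(A, d1, var_u)          # = A
rate = 0.5 * math.log(1.0 + s(A, d1, d2, var_u))
print(embed_dist, stego_pow, rate)
```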

3.2.3 Analysis

In this section, we state three lemmas, which together prove Theorem 3.1. Lemma 3.1 (proved in Appendix B.3) demonstrates the intuitive fact that the value of the public version of the mutual information game cannot exceed the value of the private version. Lemma 3.2 (proved in Appendix B.4) shows that, by using the attack channel proposed in Section 3.2.1, the attacker can guarantee that the value of the private mutual information game does not exceed C*(D_1, D_2, σ̄_u²), where σ̄_u² is defined in (3.9). Lemma 3.3 (proved in Appendix B.5) shows that by watermarking an IID zero-mean variance-σ_u² Gaussian source using the channel proposed in Section 3.2.2 with the appropriate choice of A, the encoder can guarantee a value for the public mutual information game of at least C*(D_1, D_2, σ_u²).

Lemma 3.1. For any n > 0 and any covertext distribution P_U,

    sup_{P_{X|U} ∈ D_1(D_1, P_U), P_{V|U,X}}  inf_{P_{Y|X} ∈ D_2(D_2, P_U, P_{X|U})}  I_pub(P_U, P_{X|U}, P_{V|U,X}, P_{Y|X})
        ≤ sup_{P_{X|U} ∈ D_1(D_1, P_U)}  inf_{P_{Y|X} ∈ D_2(D_2, P_U, P_{X|U})}  I_priv(P_U, P_{X|U}, P_{Y|X}).

Since this lemma holds for every n, it implies (3.7).

Lemma 3.2. For any n > 0, any covertext distribution P_U, any watermarking channel P_{X|U}, and any fixed distortion D_2 satisfying A_n > D_2,

    I_priv(P_U, P_{X|U}, (P^{A_n}_{Y|X})^n) ≤ I_priv((P^G_{σ̄²_{u,n}})^n, (P^{A_n}_{X|U})^n, (P^{A_n}_{Y|X})^n)
                                           = (1/2) log(1 + s(A_n, D_{1,n}, D_2, σ̄²_{u,n})),    (3.11)

where

    σ̄²_{u,n} = E_{P_U}[n^{-1} ‖U‖²],    (3.12)
    D_{1,n} = E_{P_U P_{X|U}}[n^{-1} ‖X − U‖²],    (3.13)
    A_n = E_{P_U P_{X|U}}[n^{-1} ‖X‖²],    (3.14)

P^G_{σ̄²_{u,n}} denotes a zero-mean Gaussian distribution of variance σ̄²_{u,n}, P^{A_n}_{X|U} is the watermarking channel described in Section 3.2.2 for the parameters σ̄²_{u,n}, D_{1,n} and A_n, and P^{A_n}_{Y|X} is the attack channel described in Section 3.2.1 for the parameters D_2 and A_n.

This lemma proves (3.8). To see this, note that for any ε > 0 and any integer n_0 there exists some n > n_0 such that

    σ̄²_{u,n} < σ̄_u² + ε,    (3.15)

where σ̄_u² is defined in (3.9) and σ̄²_{u,n} is defined in (3.12). Also, since the watermarking channel must satisfy the distortion constraint (i.e., P_{X|U} ∈ D_1(D_1, P_U)),

    D_{1,n} ≤ D_1,    (3.16)

where D_{1,n} is defined in (3.13). If A_n defined in (3.14) is less than D_2, then the attack channel that sets the forgery deterministically to zero is allowable, and the resulting mutual information is zero. Thus, (3.8) is satisfied in this case. We thus focus on the case when A_n > D_2. We also note that A_n ≤ (σ̄_{u,n} + √D_{1,n})² by the triangle inequality, so that A_n ∈ A(D_{1,n}, D_2, σ̄²_{u,n}). By the definition of C*(·, ·, ·) (A.8), it follows that the right hand side (RHS) of (3.11) is at most C*(D_{1,n}, D_2, σ̄²_{u,n}). This in turn is upper bounded by C*(D_1, D_2, σ̄_u² + ε) in view of (3.15) and (3.16), because C*(·, ·, ·) is non-decreasing in D_1 and σ_u² (see Appendix A). Finally, since ε > 0 is arbitrary and C*(·, ·, ·) is continuous in its arguments, it follows that the attacker (P^{A_n}_{Y|X})^n guarantees that C^MI_priv(D_1, D_2, {P_U}) is upper bounded by C*(D_1, D_2, σ̄_u²).

u

This lemma also shows that for an IID Gaussian overtext, if the memoryless atta k

X





hannel (P j ) is used, then, of all watermarking hannels that satisfy E n 1 k k2 = A, mutual information is maximized by the memoryless watermarking hannel (P j ) of n

A

Y X

A

n

X U

Se tion 3.2.2. Consider an IID zero-mean varian e-2 Gaussian overtext (denoted (P ) ) G

Lemma 3.3.

u

and xed distortions

D1

and

D2 .

If the atta k hannel PY jX satis es 

E(P G P A

n U X U n PY j X j )

then for all  Ipub

A

2 A(D ; D ;  1

2

2

u

G n

A

n

A

X U

1

kY

X k  D ; 2

),

(P ) ; (P j ) ; (P j U

n

U

)

n

V U;X

; PY jX





 Ipub

(P ) ; (P j ) ; (P j G n U

A

X U

n

A

V U;X

 1 = log 1 + s(A; D1 ; D2 ; 2 ) : 2

) ; (P j ) n

A

n



Y X

u

Here,

A PX jU

and

PVAjU;X

parameters

2 , D1 u

parameters

D2

and

are the watermarking hannels des ribed in Se tion 3.2.2 for the A

and

PYAjX

is the atta k hannel des ribed in Se tion 3.2.1 for the

and A.

This lemma implies that for a zero-mean variance-σ_u² IID Gaussian covertext, the value of the public mutual information game is lower bounded by C*(D_1, D_2, σ_u²). Indeed, the encoder can use the watermarking channels defined by (P^{A*}_{X|U})^n and (P^{A*}_{V|U,X})^n, where A* achieves the maximum in the definition of C*. Since for any covertext distribution (and in particular for an IID Gaussian covertext) the value of the private version is at least as high as the value of the public version (Lemma 3.1), it follows from the above that, for an IID Gaussian covertext, C* is also a lower bound on the value of the private Gaussian mutual information game.

The combination of Lemmas 3.1, 3.2 and 3.3 shows that for a zero-mean IID Gaussian covertext of variance σ_u², the value of both the private and public Gaussian mutual information games is exactly C*(D_1, D_2, σ_u²).

Lemma 3.3 also shows that when the covertext is zero-mean IID Gaussian and the memoryless watermarking channels (P^A_{X|U})^n and (P^A_{V|U,X})^n are used, then to minimize the mutual information the attacker should use the memoryless attack channel (P^A_{Y|X})^n.

3.3 Game Theoretic Interpretation

Game Theoreti Interpretation

In this se tion, we look at the the private mutual information game, de ned in (3.2), with IID zero-mean varian e-2 Gaussian overtext from game theoreti perspe tive. Re all that u

the en oder is trying to maximize Ipriv and the atta ker is trying to minimize Ipriv . In game theoreti terminology (see e.g. [Gib92℄), this is a zero-sum game with Ipriv as the payo to the rst player (en oder) and

Ipriv as the pay-o to the se ond player (atta ker).

Spe i ally, this mutual information game is a dynami zero-sum game of omplete and perfe t information. In parti ular, the game is not stati , and thus we need to onsider an atta ker strategy of lists of responses to every possible watermarking hannel. We will show that a subgame-perfe t Nash equilibrium gives the value of the game, where we use the term \value of the game" to denote the highest possible pay-o to the rst player. We will also illustrate a mistake that ould be made when omputing the value of the game. We rst rederive the value of the game using this game theoreti interpretation. For a dynami game, a strategy spa e for ea h player is spe i ed by listing a feasible a tion for ea h possible ontingen y in the game. Sin e the en oder plays rst, his strategy spa e

D1

is simply the set of feasible watermarking hannels, i.e.,

D1 ; (P )

G n U



de ned in (3.3).

However, the atta ker plays se ond and thus his strategy spa e onsists of all mappings of the form : PX jU where

D2

7! PY jX 2 D2



8PX jU 2 D1

D2 ; (P ) ; PX jU ; G n U

D1 ; (P )

G n U



;

(3.17)



D2 ; (P ) ; PX jU is de ned in (3.4). That is, for every possible strategy PX jU the en oder might use, the atta ker must hoose a feasible response (PX jU ). G U

n

An en oder strategy PX jU and an atta ker strategy Ipriv (P ) ; PX jU ; G n U





(PX jU )

 Ipriv 66



) form a Nash equilibrium if

(

(P ) ; PX jU ; G n U





(PX jU ) ;

(3.18)

for every P_{X|U} ∈ D_1(D_1, (P^G_U)^n), and

    I_priv((P^G_U)^n, P*_{X|U}, ψ*(P*_{X|U})) ≥ I_priv((P^G_U)^n, P*_{X|U}, ψ(P*_{X|U})),    (3.19)

for every mapping ψ(·) of the form (3.17). That is, given that the attacker will use ψ*(·), the encoder maximizes its pay-off by using P*_{X|U}. Conversely, given that the encoder will use P*_{X|U}, the attacker maximizes its pay-off (minimizes the encoder's pay-off) by using ψ*(·).

An encoder strategy P*_{X|U} and an attacker strategy ψ*(·) form a subgame-perfect Nash equilibrium if they form a Nash equilibrium and if additionally

    I_priv((P^G_U)^n, P_{X|U}, ψ*(P_{X|U})) ≤ I_priv((P^G_U)^n, P_{X|U}, P_{Y|X})

for all P_{X|U} ∈ D_1(D_1, (P^G_U)^n) and for all P_{Y|X} ∈ D_2(D_2, (P^G_U)^n, P_{X|U}). That is, the attacker must choose the best response to any possible encoder strategy, and not just to one encoder strategy as in the regular Nash equilibrium. The value of the game is given by evaluating the mutual information at any subgame-perfect Nash equilibrium (there is not necessarily a unique equilibrium). The value of the game is thus I_priv((P^G_U)^n, P*_{X|U}, ψ*(P*_{X|U})).

Using this terminology, we see that Lemma 3.2 and Lemma 3.3 imply that there exists a subgame-perfect Nash equilibrium of the form

    ( (P^{A*}_{X|U})^n, ψ*(·) ),

where P^{A*}_{X|U} is defined above in Section 3.2.2, A* achieves the maximum in (A.8), and ψ*((P^A_{X|U})^n) = (P^A_{Y|X})^n for every A ∈ A(D_1, D_2, σ_u²), where P^A_{Y|X} is defined in Section 3.2.1. The value of the game is thus C*(D_1, D_2, σ_u²).

Using the above concepts, we now discuss the value of this game that was given in [MO99, MO00]. For A_0 = σ_u² + D_1,

    I_priv((P^G_U)^n, P_{X|U}, (P^{A_0}_{Y|X})^n) ≤ I_priv((P^G_U)^n, (P^{A_0}_{X|U})^n, (P^{A_0}_{Y|X})^n),    (3.20)

for every P_{X|U} ∈ D_1(D_1, (P^G_U)^n), and

    I_priv((P^G_U)^n, (P^{A_0}_{X|U})^n, (P^{A_0}_{Y|X})^n) ≤ I_priv((P^G_U)^n, (P^{A_0}_{X|U})^n, P_{Y|X}),    (3.21)

for every P_{Y|X} ∈ D_2(D_2, (P^G_U)^n, (P^{A_0}_{X|U})^n). Thus, it would seem that if ψ_0(P_{X|U}) = (P^{A_0}_{Y|X})^n for all P_{X|U}, then the pair ((P^{A_0}_{X|U})^n, ψ_0(·)) form a Nash equilibrium according to the definitions (3.18) and (3.19). The value of the game given in [MO99, MO00] is the mutual information evaluated with this pair. However, this attack strategy is not valid, since (P^{A_0}_{Y|X})^n ∉ D_2(D_2, (P^G_U)^n, P_{X|U}) for some P_{X|U}, and in particular for any P_{X|U} with n^{-1} E[‖X‖²] > A_0. Indeed, the optimal encoder strategy (P^{A*}_{X|U})^n has n^{-1} E[‖X‖²] = A* and A* > A_0 (see Lemma A.1). Thus, the expression on the RHS of (3.20) is strictly less than C*(D_1, D_2, σ_u²); see Figure 2-1 for a comparison of the two expressions.

3.4 Other Mutual Information Games

Zero-sum games in which one player tries to maximize some mutual information expression while the other player tries to minimize the same mutual information have also been investigated in [BMM85, SM88, Yan93]. As in the watermarking game, typically the first player is a communicator and the second player is a jammer. Assuming maximum-likelihood decoding, the mutual information between the input and output of a channel gives the rate at which reliable communication can take place. However, the decoder in the watermarking game is not necessarily performing maximum-likelihood decoding, and thus the mutual information games do not necessarily describe the capacity.

Most of the research in this area has focused on the channel Y = X + Z, where X is the input specified by the first player, Z is the noise specified by the second player, and X and Z are independent. For this game, the mutual information expression of interest is I(X; Y). If X and Z are both power-constrained in expectation (i.e., E[X²] ≤ P and E[Z²] ≤ N), then zero-mean Gaussian distributions for both X and Z form a saddlepoint in mutual information [Bla57]. That is, if X* ~ N(0, P) and Z* ~ N(0, N), then

    I(X; X + Z*) ≤ I(X*; X* + Z*) ≤ I(X*; X* + Z),    (3.22)

for any feasible random variables X and Z. In our mutual information game with Gaussian covertext and power-constraints, the optimal strategies are also (conditionally) Gaussian. However, the one-dimensional solution to our mutual information game does not form a saddlepoint. Another result that is reflected in our mutual information game is that even if a player is allowed to choose random vectors instead of random variables, he will choose the random vector to consist of independent and identically distributed (IID) random variables [SM88]. Thus, it is sufficient to consider the one-dimensional mutual information game for the additive channel discussed above.
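The right-hand inequality in (3.22) says that, for the Gaussian input X*, Gaussian noise is the worst feasible noise. As an illustration (our own numerical sketch, not from the thesis), we compare I(X*; X* + Z) for uniform noise Z of variance N against the saddle value (1/2) log(1 + P/N); the uniform noise yields strictly more mutual information.

```python
import math

def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def mi_gaussian_input_uniform_noise(P, N, lo=-40.0, hi=40.0, steps=80000):
    """I(X*; X* + Z) with X* ~ N(0,P) and Z ~ Uniform[-a,a], a = sqrt(3N),
    computed as h(X* + Z) - h(Z) by numerical integration.  The density of
    X* + Z is (Phi((y+a)/sigma) - Phi((y-a)/sigma)) / (2a)."""
    a = math.sqrt(3.0 * N)
    sigma = math.sqrt(P)
    dx = (hi - lo) / steps
    h_y = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * dx
        f = (normal_cdf((y + a) / sigma) - normal_cdf((y - a) / sigma)) / (2 * a)
        if f > 0.0:
            h_y -= f * math.log(f) * dx
    return h_y - math.log(2.0 * a)

P, N = 1.0, 1.0
val = mi_gaussian_input_uniform_noise(P, N)
saddle = 0.5 * math.log(1.0 + P / N)     # I(X*; X* + Z*)
print(val, saddle)  # the first exceeds the second
```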


Chapter 4

The Scalar Gaussian Watermarking Game

This chapter is devoted to proving Theorem 2.1, which describes the capacity of the scalar Gaussian watermarking (SGWM) game and gives an upper bound on the capacity for a general ergodic covertext. In the SGWM game, the covertext is an IID sequence of zero-mean variance-σ_u² random variables, and the distortion is measured using the squared difference. The proof of this theorem is divided into two main parts, achievability and converse.

The achievability part of the proof (Sections 4.2 and 4.3) consists of showing that all rates less than C*(D_1, D_2, σ_u²) are achievable for the SGWM game for the private and public versions, respectively. In Section 4.3, we also show that all rates less than (1/2) log(1 + D_1/D_2) are achievable for the public version of the additive attack watermarking game with Gaussian covertext, which completes the proof of Theorem 2.2. To assist in these arguments, we describe the allowable attacks in Section 4.1. We also show in Section 4.4 that it is sufficient to consider covertexts that are uniformly distributed on the n-sphere S^n(0, √(nσ_u²)). In the converse part in Section 4.5, we show that no rates higher than C*(D_1, D_2, σ_u²) are achievable in the SGWM game. In fact, we show that no such rates are achievable for any ergodic covertext distribution with second moment at most σ_u².

In this chapter, we will use uniform distributions on the n-dimensional sphere as an approximation for an IID Gaussian distribution. We denote the n-dimensional sphere centered

at ξ ∈ R^n with radius r ≥ 0 by S^n(ξ, r), i.e.,

    S^n(ξ, r) = { x ∈ R^n : ‖x − ξ‖ = r }.

For any vector ω ∈ S^n(0, 1) and any angle 0 ≤ θ ≤ π, we let C(ω, θ) ⊆ S^n(0, 1) denote the spherical cap centered at ω with half-angle θ,

    C(ω, θ) = { x ∈ S^n(0, 1) : ⟨x, ω⟩ > cos θ }.

The surface area of this spherical cap in R^n depends only on the angle θ, and is denoted by C_n(θ). Note that C_n(π) is the surface area of the unit n-sphere.

Note that many of the other de nitions used in this hapter are summarized in Ap-

pendix A. Most importantly, re all that if A(D1 ; D2 ; u2 ) is non-empty, then C  (D1 ; D2 ; u2 ) =

 1 log 1 + s(A; D1 ; D2 ; u2 ) ; A2A(D1 ;D2 ;u2 ) 2

max

(4.1)

where A(D1 ; D2 ; u2 ) and s(A; D1 ; D2 ; u2 ) are de ned in Appendix A.

4.1 Deterministic Attacks

In Section 2.4.3, we argued that deterministic attacks are sufficient to analyze achievability for the watermarking game. In this section, we describe in more detail a deterministic additive attack (Section 4.1.1) and a deterministic general attack (Section 4.1.2).

4.1.1 Deterministic Additive Attack

For the additive attack watermarking game with real alphabets and squared error distortion, a deterministic attacker takes on a particularly simple form. Indeed, combining the forms (2.26) and (2.9), we see that the attacker can be written as

g_n(x) = x + \tilde{y}   (4.2)

for some sequence \tilde{y} that satisfies

n^{-1} \|\tilde{y}\|^2 \leq D_2.   (4.3)

4.1.2 Deterministic General Attack

For the general watermarking game with real alphabets and squared error distortion, a deterministic attack g_n(x) can be decomposed into its projection onto the stegotext x and its projection onto x^\perp. That is, we can write

g_n(x) = \gamma_1(x) x + \gamma_2(x),   (4.4)

for some \gamma_1 : \mathbb{R}^n \mapsto \mathbb{R} and some \gamma_2 : \mathbb{R}^n \mapsto \mathbb{R}^n, where \langle \gamma_2(x), x \rangle = 0. Defining

\gamma_3(x) = n^{-1} \|\gamma_2(x)\|^2,   (4.5)

we can rewrite the attacker's distortion constraint (2.3) in terms of \gamma_1(X), X, and \gamma_3(X) as

(\gamma_1(X) - 1)^2 \, n^{-1}\|X\|^2 + \gamma_3(X) \leq D_2, a.s.,

and consequently,

\frac{\gamma_3(x)}{\gamma_1^2(x)} \leq \frac{D_2}{c(n^{-1}\|x\|^2, D_2)}   (4.6)

for almost all x such that n^{-1}\|x\|^2 > D_2, where c(A, D_2) = 1 - D_2/A (also defined in (A.4)).
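The decomposition (4.4)-(4.5) is easy to check numerically. The sketch below is our own illustration (the names `gamma1`, `gamma2`, `gamma3` mirror \gamma_1, \gamma_2, \gamma_3 above, and the example attack is arbitrary): it splits an attack output y into its projection onto x plus an orthogonal remainder, and verifies the exact distortion identity that leads to the constraint before (4.6).

```python
import random

def decompose_attack(x, y):
    """Split an attack output y into gamma1*x + gamma2 with <gamma2, x> = 0,
    returning (gamma1, gamma2, gamma3) as in (4.4)-(4.5)."""
    n = len(x)
    xx = sum(xi * xi for xi in x)
    gamma1 = sum(xi * yi for xi, yi in zip(x, y)) / xx   # projection coefficient onto x
    gamma2 = [yi - gamma1 * xi for xi, yi in zip(x, y)]  # component orthogonal to x
    gamma3 = sum(g * g for g in gamma2) / n              # normalized power of gamma2
    return gamma1, gamma2, gamma3

random.seed(0)
n = 1000
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.8 * xi + random.gauss(0, 0.5) for xi in x]  # an arbitrary attack: scale plus noise

gamma1, gamma2, gamma3 = decompose_attack(x, y)

# The attacker's distortion splits exactly into a radial part and an orthogonal part,
# since y - x = (gamma1 - 1) x + gamma2 with gamma2 orthogonal to x:
dist = sum((yi - xi) ** 2 for xi, yi in zip(x, y)) / n
split = (gamma1 - 1) ** 2 * sum(xi * xi for xi in x) / n + gamma3
print(abs(dist - split) < 1e-9)
```

The identity is exact (not asymptotic), which is why the distortion constraint can be rewritten pointwise in terms of \gamma_1(X) and \gamma_3(X).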

4.2 Achievability for Private Version

In this section, we show that for the private version of the watermarking game all rates up to C^*(D_1, D_2, \sigma_u^2) are achievable when the covertext U is uniformly distributed on the n-sphere S^n(0, \sqrt{n\sigma_u^2}). This result is extended to IID Gaussian covertexts in Section 4.4.

4.2.1 Coding Strategy

The coding strategy for the private version of the watermarking game is motivated by the solution to the corresponding mutual information game; see Theorem 3.1 and its proof in Section 3.2.

Constants

The encoder and decoder choose some \delta > 0 and a value of A \in A(D_1, D_2, \sigma_u^2), where the interval A(D_1, D_2, \sigma_u^2) is defined in (A.7). We assume throughout that the above interval is non-empty, because otherwise the claimed coding capacity is zero, and there is no need for a coding theorem. Let the rate R of the coding strategy be

R = \frac{1}{2} \log\bigl( 1 + s(A; D_1, D_2, \sigma_u^2) \bigr) - \delta,   (4.7)

where s(A; D_1, D_2, \sigma_u^2) is defined in (A.5). Note that if the chosen A achieves the maximum in (4.1), then the RHS of (4.7) is R = C^*(D_1, D_2, \sigma_u^2) - \delta; we show in Lemma A.1 that such an A can be chosen. We will show that for any \delta > 0, and for U that is uniformly distributed over the n-sphere S^n(0, \sqrt{n\sigma_u^2}), the rate R is achievable.

The encoder and decoder also compute the constants \alpha = \alpha(A, D_1, \sigma_u^2), b_1 = b_1(A, D_1, \sigma_u^2) and b_2 = b_2(A, D_1, \sigma_u^2), which are all defined in Appendix A. Recall that \alpha = (A - \sigma_u^2 - D_1)/2, b_1 = 1 + \alpha/\sigma_u^2 and b_2 = D_1 - \alpha^2/\sigma_u^2.

Encoder and Decoder

The encoder and decoder use their common randomness \Theta_1 to generate 2^{nR} independent random vectors \{C_1, \ldots, C_{2^{nR}}\}, where each random vector C_i is uniformly distributed on the n-sphere S^n(0, 1). Given a covertext U = u, a message W = w, and the vector C_w = c_w, let c_w(u) be the projection of c_w onto the subspace orthogonal to u, but scaled so that

n^{-1} \|c_w(u)\|^2 = b_2.   (4.8)

That is,

c_w(u) = \sqrt{n b_2} \, \frac{c_w|_{u^\perp}}{\| c_w|_{u^\perp} \|}.   (4.9)

Note that

\langle c_w(u), u \rangle = 0.   (4.10)

Encoder: Using the covertext u, the message w, and the source of common randomness \Theta_1, the encoder creates the stegotext x as

x = f_n(u, w, \Theta_1) = b_1 u + c_w(u).   (4.11)

By (4.10) and the definitions of the constants b_1 and b_2 (A.2), (A.3), it follows that

n^{-1} \|x - u\|^2 = (b_1 - 1)^2 \sigma_u^2 + b_2 = D_1,

thus demonstrating that the encoder satisfies the distortion constraint (2.1). We can further calculate that

n^{-1} \|x\|^2 = A,   (4.12)

which demonstrates the operational significance of the constant A as the power of the stegotext.

Decoder: The decoder uses a modified nearest-neighbor decoding rule. It projects the forgery y onto u^\perp to create y|_{u^\perp} and produces the message \hat{w} that, among all messages \tilde{w}, minimizes the distance between y|_{u^\perp} and c_{\tilde{w}}(u). The decoder's output \hat{w} = \phi_n(y, u, \Theta_1) is thus given as

\hat{w} = \phi_n(y, u, \Theta_1) = \arg\min_{1 \leq \tilde{w} \leq 2^{nR}} \| y|_{u^\perp} - c_{\tilde{w}}(u) \|^2   (4.13)
        = \arg\max_{1 \leq \tilde{w} \leq 2^{nR}} \langle y|_{u^\perp}, c_{\tilde{w}}(u) \rangle,   (4.14)

where the last equality follows by noting that n^{-1} \|c_{\tilde{w}}(u)\|^2 = b_2 irrespective of \tilde{w}; see (4.8).

If more than one message achieves the minimum in (4.13), then an error is declared. Note that \hat{w} of (4.13) is with probability one unique.
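A minimal numerical sketch of the private-version encoder (4.9)-(4.12); the helper names and example parameter values are our own, not the thesis's. It draws a covertext exactly on the sphere S^n(0, \sqrt{n\sigma_u^2}), forms c_w(u) by projecting a unit-sphere codeword onto u^\perp and rescaling to power b_2, and checks the distortion identity (b_1 - 1)^2 \sigma_u^2 + b_2 = D_1 and the stegotext power (4.12).

```python
import math
import random

random.seed(1)
n = 4000
sigma_u2, D1, A = 1.0, 0.5, 2.0          # assumed example values, with A in the allowed interval
alpha = (A - sigma_u2 - D1) / 2.0        # alpha = (A - sigma_u^2 - D1) / 2
b1 = 1.0 + alpha / sigma_u2              # b1 = 1 + alpha / sigma_u^2
b2 = D1 - alpha ** 2 / sigma_u2          # b2 = D1 - alpha^2 / sigma_u^2

def scale_to_sphere(v, power):
    """Scale v so that (1/n) * ||v||^2 = power exactly."""
    c = math.sqrt(len(v) * power / sum(x * x for x in v))
    return [c * x for x in v]

u = scale_to_sphere([random.gauss(0, 1) for _ in range(n)], sigma_u2)
c_raw = scale_to_sphere([random.gauss(0, 1) for _ in range(n)], 1.0 / n)  # codeword on S^n(0, 1)

# Project the codeword onto u-perp and rescale to power b2, as in (4.9).
uu = sum(x * x for x in u)
coef = sum(a * b for a, b in zip(c_raw, u)) / uu
c_perp = [a - coef * b for a, b in zip(c_raw, u)]
c_w = scale_to_sphere(c_perp, b2)

x = [b1 * ui + ci for ui, ci in zip(u, c_w)]   # stegotext, (4.11)

dist = sum((xi - ui) ** 2 for xi, ui in zip(x, u)) / n
power = sum(xi * xi for xi in x) / n
print(round(dist, 6), round(power, 6))  # matches D1 and A, per (4.11)-(4.12)
```

Because u lies exactly on its sphere and c_w(u) is exactly orthogonal to u, the encoder distortion and the stegotext power are deterministic, not merely concentrated, which is what makes the constant A meaningful sample-path by sample-path.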

4.2.2 Analysis of Probability of Error

We now proceed to analyze our proposed encoding and decoding scheme. To this end we shall find it convenient to define the random variables

Z_1 = n^{-1} \| Y|_{U^\perp} \|^2,   (4.15)

Z_2 = n^{-1} \langle Y|_{U^\perp}, C_W(U) \rangle,   (4.16)

and the mapping

\rho_1(z_1, z_2) = \frac{z_2}{\sqrt{b_2 z_1}},

which will be shown to capture the effect of the attacker on the decoder's performance. Note that |\rho_1(Z_1, Z_2)| \leq 1, which follows from (4.8), (4.15), and (4.16) using the Cauchy-Schwarz inequality.

By the definition of the decoder (4.14) and of the random variable Z_2 (4.16) it follows that a decoding error occurs if, and only if, there exists a message w' \neq W such that

n^{-1} \langle Y|_{U^\perp}, C_{w'}(U) \rangle \geq n^{-1} \langle Y|_{U^\perp}, C_W(U) \rangle = Z_2.

Equivalently, an error occurs if, and only if, there exists some w' \neq W such that

\Bigl\langle \frac{Y|_{U^\perp}}{\sqrt{n Z_1}}, \frac{C_{w'}(U)}{\sqrt{n b_2}} \Bigr\rangle \geq \frac{Z_2}{\sqrt{b_2 Z_1}} = \rho_1(Z_1, Z_2).   (4.17)

If a random vector S is uniformly distributed on an n-dimensional sphere, and if another vector T is independent of it and also takes value in that n-sphere, then, by symmetry, the inner product \langle S, T \rangle has a distribution that does not depend on the distribution of T. We

next use this observation to analyze the left hand side (LHS) of (4.17). Conditional on the covertext U = u and for any message w' \neq W, the random vector C_{w'}(u)/\sqrt{n b_2} is uniformly distributed over S^n(0, 1) \cap u^\perp (i.e., all unit vectors that are orthogonal to u) and is independent of the random vector Y|_{u^\perp}/\sqrt{n Z_1}, which also takes value on S^n(0, 1) \cap u^\perp. Since S^n(0, 1) \cap u^\perp is isometric to S^{n-1}(0, 1),(*) it follows from the above observation that the distribution of the random variable on the LHS of (4.17) does not depend on the distribution of Y|_{u^\perp}/\sqrt{n Z_1}. Consequently, for any w' \neq W,

Pr\Bigl( \Bigl\langle \frac{Y|_{U^\perp}}{\sqrt{n Z_1}}, \frac{C_{w'}(U)}{\sqrt{n b_2}} \Bigr\rangle \geq \rho_1(z_1, z_2) \Bigm| Z_1 = z_1, Z_2 = z_2, U = u \Bigr) = \frac{C_{n-1}(\arccos \rho_1(z_1, z_2))}{C_{n-1}(\pi)},   (4.18)

where recall that C_{n-1}(\theta) is the surface area of a spherical cap of half-angle \theta on an (n-1)-dimensional unit sphere.

To continue the analysis of the probability of a decoding error, we note that conditional on U = u, the random vectors \{C_{w'}(u) : w' \neq W\} are independent of each other. Thus, the probability of correct decoding is given by the product of the probabilities that each of these 2^{nR} - 1 vectors did not cause an error. Since the probability of error for each individual vector is given in (4.18), we can write the conditional probability of error for this coding strategy as

Pr(error | Z_1 = z_1, Z_2 = z_2, U = u) = Pr(error | Z_1 = z_1, Z_2 = z_2)
  = 1 - \Bigl( 1 - \frac{C_{n-1}(\arccos \rho_1(z_1, z_2))}{C_{n-1}(\pi)} \Bigr)^{2^{nR} - 1}.   (4.19)

We now find an upper bound on the average of the RHS of (4.19) over the random variables Z_1 and Z_2. The function Pr(error | Z_1 = z_1, Z_2 = z_2) is a monotonically non-increasing function of \rho_1(z_1, z_2) and is upper bounded by one. Consequently, for any real number \mu we have

Pr(error) \leq Pr(error | \rho_1(Z_1, Z_2) = \mu) + Pr\bigl( \rho_1(Z_1, Z_2) < \mu \bigr).   (4.20)

(*) To see this, it is sufficient to consider u = (1, 0, \ldots, 0). In this case, u' \in S^n(0, 1) \cap u^\perp if u'_1 = 0 and \sum_{i=2}^n (u'_i)^2 = 1.

We will show that the RHS of (4.20) is small when \mu = \mu_1^* - \epsilon_1, where

\mu_1^* = \sqrt{ \frac{b_2 c}{b_2 c + D_2} },   (4.21)

c = c(A, D_2) = 1 - D_2/A (see (A.4)) and \epsilon_1 is a small positive number to be specified later. We analyze the first term on the RHS of (4.20) in Lemma 4.2 and the second term in Lemma 4.3.

In order to do so, we recall that Shannon [Sha59] derived bounds on the ratio of the surface areas of spherical caps that asymptotically yield

\lim_{n\to\infty} \frac{1}{n} \log \frac{C_n(\arccos\nu)}{C_n(\pi)} = \log\sin(\arccos\nu) = \frac{1}{2} \log(1 - \nu^2),   (4.22)

for every 0 < \nu < 1; see also [Wyn67].
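The asymptotics (4.22) can be checked directly. For S uniform on the unit n-sphere, the cap ratio C_n(\arccos\nu)/C_n(\pi) equals Pr(\langle S, e \rangle \geq \nu), and the first coordinate of S has density proportional to (1 - t^2)^{(n-3)/2}. The sketch below is our own illustration (function names and step counts are arbitrary): it computes the ratio by numerical integration and compares its normalized exponent to \frac{1}{2}\log_2(1 - \nu^2).

```python
import math

def cap_ratio(n, nu, steps=20000):
    """C_n(arccos(nu)) / C_n(pi): the probability that the first coordinate of a
    uniform point on the unit n-sphere is at least nu, via the marginal density
    proportional to (1 - t^2)^((n - 3) / 2)."""
    def density(t):
        return (1.0 - t * t) ** ((n - 3) / 2.0)
    def integrate(lo, hi):
        h = (hi - lo) / steps
        return h * sum(density(lo + (i + 0.5) * h) for i in range(steps))
    return integrate(nu, 1.0) / integrate(-1.0, 1.0)

nu = 0.6
for n in (50, 200, 800):
    exponent = math.log2(cap_ratio(n, nu)) / n
    print(n, round(exponent, 4))
# The printed exponents approach (1/2) * log2(1 - nu^2):
print(round(0.5 * math.log2(1 - nu ** 2), 4))
```

The convergence is slow (the correction is of order (log n)/n), which is consistent with (4.22) being a statement about the exponent only.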

We shall also need the following lemma.

Lemma 4.1. Let f : \mathbb{R} \mapsto (0, 1] be such that the limit \lim_{t\to\infty} \frac{1}{t} \log f(t) exists and is negative, and set

\zeta_1 = - \lim_{t\to\infty} \frac{1}{t} \log f(t) > 0.   (4.23)

Then

\lim_{t\to\infty} \bigl( 1 - f(t) \bigr)^{2^{\zeta_2 t}} = 1 if \zeta_1 > \zeta_2, and 0 if \zeta_1 < \zeta_2.

Proof. First, recall the well known fact that

\lim_{t\to\infty} \bigl( 1 - 2^{-\zeta_1 t} \bigr)^{2^{\zeta_2 t}} = 1 if \zeta_1 > \zeta_2, e^{-1} if \zeta_1 = \zeta_2, and 0 if \zeta_1 < \zeta_2.   (4.24)

Fix \epsilon > 0. Let us consider the case where \zeta_1 > \zeta_2. There exists a t_1 such that -\frac{1}{t}\log f(t) > (\zeta_1 + \zeta_2)/2 for all t > t_1, since \zeta_1 > \zeta_2 and by (4.23). There also exists a t_2 such that

\bigl( 1 - 2^{-t(\zeta_1 + \zeta_2)/2} \bigr)^{2^{\zeta_2 t}} > 1 - \epsilon

for all t > t_2, since (\zeta_1 + \zeta_2)/2 > \zeta_2 and by (4.24). Thus, we can write that (1 - f(t))^{2^{\zeta_2 t}} > 1 - \epsilon for all t > \max\{t_1, t_2\}. The claim follows in this case since (1 - f(t))^{2^{\zeta_2 t}} \leq 1. The claim follows in the case \zeta_1 < \zeta_2 by similar logic.

Lemma 4.2. For any \epsilon > 0, there exists some \epsilon_1 > 0 and some integer n_1 > 0, such that for all n > n_1

1 - \Bigl( 1 - \frac{C_{n-1}(\arccos(\mu_1^* - \epsilon_1))}{C_{n-1}(\pi)} \Bigr)^{2^{nR} - 1} < \epsilon.

Proof. With the definitions of \mu_1^* (4.21) and R (4.7) we have

\frac{1}{2} \log \frac{1}{1 - (\mu_1^*)^2} = R + \delta,

and consequently there must exist some \epsilon_1 > 0 such that

R < \frac{1}{2} \log \frac{1}{1 - (\mu_1^* - \epsilon_1)^2}.   (4.25)

By the result on the asymptotic area of spherical caps (4.22) and by the inequality (4.25), it follows from Lemma 4.1 that there exists a positive integer n_1 such that for all n > n_1

\Bigl( 1 - \frac{C_{n-1}(\arccos(\mu_1^* - \epsilon_1))}{C_{n-1}(\pi)} \Bigr)^{2^{nR}} > 1 - \epsilon,
;

and the laim follows by noting that the LHS annot de rease when the exponent 2nR is repla ed by 2nR 1. Our a hievability proof will thus be omplete on e we demonstrate that the se ond term on the RHS of (4.20) onverges to zero for  = 1 1. This is demonstrated in the following lemma, whi h is proved in Appendix B.6 and whi h on ludes the a hievability proof for the private version of the SGWM game. Lemma 4.3. For any

>

0

and

1 >

0

, there exists an integer

79

n2 >

0

su h that for all

n > n2 Pr 1 (Z1 ; Z2 ) < 1

4.3



1 < :

A hievability for Publi Version

D1 ) are a hievable In this se tion, we show that all rates up to C  (D1 ; D2 ; u2 ) and 12 log(1 + D 2 for the publi version of the general watermarking game and for the additive atta k water-

marking game, respe tively, when the overtext U is uniformly distributed on the n-sphere p

S n(0; nu2 ). We extend these results to IID Gaussian overtexts in Se tion 4.4.

4.3.1

Coding Strategy

The oding strategies for the publi versions of both the additive atta k and the general watermarking games are motivated by the works of Marton [Mar79℄, Gel'fand and Pinsker [GP80℄, Heegard and El Gamal [HEG83℄, and Costa [Cos83℄. For both models, we x a Æ > 0. In the following subse tions, we de ne the set of on-

stants f ; v2 ; R0 ; R1 ; Rg separately for ea h model. Using these onstants we then des ribe the en oder and de oder used for both models. Thus, while the onstants have di erent values for the two models, in terms of these onstants the proposed oding s hemes are identi al.

Constants for the Additive Atta k Watermarking Game For the additive atta k watermarking game, we de ne the set of onstants as

D1

; D1 + D2 v2 = D1 + 2 u2 ;   1 D1 u2 R0 = log 1 + + Æ; 2 (D1 + D2 )2   D1 D1 u2 1 + Æ; R1 = log 1 + 2 D2 D2 (D1 + D2 ) =

80

(4.26) (4.27) (4.28) (4.29)

and R = R1 R0 =

1 log 1 + D 2 D



2Æ:

1 2

(4.30)
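The constants (4.26)-(4.30) can be tabulated directly; the following sketch (the function name and the example parameter values are our own) checks that they are internally consistent, i.e., that R_1 - R_0 collapses to \frac{1}{2}\log(1 + D_1/D_2) - 2\delta as claimed in (4.30).

```python
import math

def additive_constants(D1, D2, sigma_u2, delta):
    """Constants (4.26)-(4.30) for the additive attack watermarking game."""
    beta = D1 / (D1 + D2)
    sigma_v2 = D1 + beta ** 2 * sigma_u2
    R0 = 0.5 * math.log2(1 + D1 * sigma_u2 / (D1 + D2) ** 2) + delta
    R1 = 0.5 * math.log2(1 + D1 / D2 + D1 * sigma_u2 / (D2 * (D1 + D2))) - delta
    return beta, sigma_v2, R0, R1, R1 - R0

beta, sigma_v2, R0, R1, R = additive_constants(D1=0.5, D2=0.5, sigma_u2=1.0, delta=0.01)
target = 0.5 * math.log2(1 + 0.5 / 0.5) - 2 * 0.01
print(abs(R - target) < 1e-12)  # R = (1/2) log(1 + D1/D2) - 2*delta, as in (4.30)
```

The cancellation works for any positive D_1, D_2, \sigma_u^2, since the argument of the logarithm in (4.29) factors as (1 + D_1/D_2)(1 + D_1\sigma_u^2/(D_1 + D_2)^2).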

Constants for the General Watermarking Game

The choice of the constants for the general watermarking game is inspired by the solution to the public Gaussian mutual information game; see Theorem 3.1 and its derivation in Section 3.2. The encoder and decoder choose a free parameter A \in A(D_1, D_2, \sigma_u^2), where the interval A(D_1, D_2, \sigma_u^2) is defined in (A.7). We assume throughout that the above interval is non-empty, because otherwise the coding capacity is zero, and there is no need for a coding theorem.

First, let \alpha = \alpha(A, D_1, \sigma_u^2), b_1 = b_1(A, D_1, \sigma_u^2), b_2 = b_2(A, D_1, \sigma_u^2), c = c(A, D_2) and \beta = \beta(A, D_1, D_2, \sigma_u^2) as defined in Appendix A. In particular, recall that \beta = 1 - b_1 D_2 / (c b_2 + D_2). We can then define the other constants as

\sigma_v^2 = \beta^2 \sigma_u^2 + 2\beta\alpha + D_1,   (4.31)

R_0 = \frac{1}{2} \log\Bigl( 1 + \frac{(\beta\sigma_u^2 + \alpha)^2}{D_1 \sigma_u^2 - \alpha^2} \Bigr) + \delta,   (4.32)

R_1 = \frac{1}{2} \log\Bigl( 1 + \frac{c A b_2}{D_2 (D_2 + c b_2)} \Bigr) - \delta,   (4.33)

and

R = R_1 - R_0 = \frac{1}{2} \log\bigl( 1 + s(A; D_1, D_2, \sigma_u^2) \bigr) - 2\delta,   (4.34)

where s(A; D_1, D_2, \sigma_u^2) is defined in (A.5). If A is chosen to maximize (4.34) as in (4.1), then R = C^*(D_1, D_2, \sigma_u^2) - 2\delta; we show in Lemma A.1 that such an A can be chosen.

Encoder and Decoder

The encoder and decoder use their source of common randomness \Theta_1 to create a codebook of 2^{n(R + R_0)} IID auxiliary codewords as follows. They generate 2^{nR_1} = 2^{n(R + R_0)} random vectors \{V_{j,k}\}, where 1 \leq j \leq 2^{nR} and 1 \leq k \leq 2^{nR_0}, and each random vector V_{j,k} is uniformly distributed on the n-sphere S^n(0, \sqrt{n\sigma_v^2}). Thus, the codebook consists of 2^{nR} bins (indexed by j), each containing 2^{nR_0} auxiliary codewords. In Figure 4-1, we give an example codebook with

[Figure 4-1: Example codebook for public version. Dashed vectors are in bin 1 and dotted vectors are in bin 2.]

n = 2, R_0 = 1 and R = 1/2. Instead of being selected randomly, the codewords in this example have been placed regularly on the 2-sphere (i.e., circle).

Encoder: Given the message w and the covertext u, the encoder looks in bin w and chooses the auxiliary codeword closest (in Euclidean distance) to the covertext. The output of the encoder x is then created as a linear combination of the covertext and the chosen auxiliary codeword.

Mathematically, the encoder behaves as follows. Given the message w, the covertext u, and the codebook \{v_{j,k}\}, let the chosen index for message w be

k^*(u, w) = \arg\max_{1 \leq k \leq 2^{nR_0}} \langle u, v_{w,k} \rangle,   (4.35)

which is unique with probability one. Further, let the chosen auxiliary codeword for message w be

v_w(u) = v_{w, k^*(u, w)}.   (4.36)

The encoder creates its output x as

x = v_w(u) + (1 - \beta) u.   (4.37)

The example of Figure 4-1 is continued in Figure 4-2, where the encoding procedure is illustrated.

[Figure 4-2: Example encoding for public version with w = 1 (bin with dashed vectors).]

Decoder: The decoder finds the auxiliary codeword that, among all the 2^{nR_1} sequences in the codebook, is closest to the received sequence y. He then declares the estimate of the message to be the bin to which this auxiliary codeword belongs. Mathematically, given the received sequence y and the codebook \{v_{j,k}\}, the estimate is given by

\hat{w} = \arg\min_{1 \leq \tilde{w} \leq 2^{nR}} \min_{1 \leq k \leq 2^{nR_0}} \| y - v_{\tilde{w},k} \|^2   (4.38)
       = \arg\max_{1 \leq \tilde{w} \leq 2^{nR}} \max_{1 \leq k \leq 2^{nR_0}} \langle y, v_{\tilde{w},k} \rangle,   (4.39)

where the last equality follows by noting that n^{-1} \|v_{\tilde{w},k}\|^2 = \sigma_v^2 irrespective of \tilde{w} and k. Note that \hat{w} of (4.38) is with probability one unique.

The example is completed in Figure 4-3 with an illustration of the decoding process. In this example, the decoder successfully recovered the value of the watermark.

[Figure 4-3: Example decoding for public version.]
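The binning scheme (4.35)-(4.39) in miniature. This is a toy sketch with our own small parameters (tiny n, a handful of codewords per bin), not the asymptotic construction; with no attack (y = x) the decoder recovers the transmitted bin.

```python
import math
import random

random.seed(2)
n, num_bins, bin_size = 64, 4, 8            # toy sizes; the thesis uses 2^{nR} bins of 2^{nR0} codewords
sigma_u2, sigma_v2, beta = 1.0, 1.2, 0.5    # illustrative constants only

def on_sphere(power):
    v = [random.gauss(0, 1) for _ in range(n)]
    c = math.sqrt(n * power / sum(x * x for x in v))
    return [c * x for x in v]

codebook = [[on_sphere(sigma_v2) for _ in range(bin_size)] for _ in range(num_bins)]

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

def encode(u, w):
    v = max(codebook[w], key=lambda c: inner(u, c))          # (4.35)-(4.36)
    return [vi + (1 - beta) * ui for vi, ui in zip(v, u)]    # (4.37)

def decode(y):
    # (4.38)-(4.39): report the bin containing the codeword with maximal <y, v>
    return max(range(num_bins), key=lambda j: max(inner(y, c) for c in codebook[j]))

u = on_sphere(sigma_u2)
w = 3
x = encode(u, w)
print(decode(x) == w)  # with no attack (y = x), the correct bin is recovered
```

The point of the bin structure is visible even at this scale: the encoder spends the 2^{nR_0} codewords per bin to find one well aligned with u, so the stegotext stays close to the covertext while still pointing at the message's bin.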

4.3.2 Probability of Error

In this section, we derive the conditional probability of error in the above coding strategy. We first define the random variables on which we will condition. Let the random variable Z be the maximum (normalized) inner product achieved in (4.35),

Z = n^{-1} \langle U, V_W(U) \rangle.   (4.40)

Next, let the random variable Z_3 be the normalized power in the sequence Y,

Z_3 = n^{-1} \|Y\|^2.   (4.41)

Next, let the random variable Z_4 be the normalized inner product between the sequence \tilde{Y}, which is defined by

\tilde{Y} = Y - X,   (4.42)

and the auxiliary codeword V_W(U),

Z_4 = n^{-1} \langle \tilde{Y}, V_W(U) \rangle.   (4.43)

Finally, let us define a mapping \rho_2(z, z_3, z_4) as

\rho_2(z, z_3, z_4) = \frac{\sigma_v^2 + (1 - \beta) z + z_4}{\sigma_v \sqrt{z_3}}.   (4.44)

By the definition of the decoder (4.39), it follows that a decoding error occurs if, and only if, there exists a message w' \neq W and an index k' such that

n^{-1} \langle Y, V_{w',k'} \rangle \geq n^{-1} \langle Y, V_W(U) \rangle
  = n^{-1} \langle X, V_W(U) \rangle + n^{-1} \langle \tilde{Y}, V_W(U) \rangle
  = \sigma_v^2 + (1 - \beta) Z + Z_4,

where the first equality follows by the definition of \tilde{Y} (4.42) and the second equality follows by the definitions of the encoder (4.37) and the random variables Z and Z_4. Note that we

do not need to consider the case where the decoder makes a mistake in the same bin, since this does not result in an error.

Equivalently, an error occurs if, and only if, there exists a message w' \neq W and an index k' such that

\Bigl\langle \frac{Y}{\sqrt{n Z_3}}, \frac{V_{w',k'}}{\sqrt{n \sigma_v^2}} \Bigr\rangle \geq \frac{\sigma_v^2 + (1 - \beta) Z + Z_4}{\sigma_v \sqrt{Z_3}} = \rho_2(Z, Z_3, Z_4).   (4.45)

The random vector V_{w',k'}/\sqrt{n\sigma_v^2} is uniformly distributed on the unit n-sphere S^n(0, 1) and is independent of Y, Z, Z_3, and Z_4. Indeed, the encoder does not examine the auxiliary codewords in bins other than in the one corresponding to the message W. The random vector Y/\sqrt{n Z_3} also takes value on the unit n-sphere S^n(0, 1), and thus, by symmetry (see Section 4.2.2), the distribution of the LHS of (4.45) does not depend on the distribution of Y. In particular, for any w' \neq W,

Pr\Bigl( \Bigl\langle \frac{Y}{\sqrt{n z_3}}, \frac{V_{w',k'}}{\sqrt{n \sigma_v^2}} \Bigr\rangle \geq \rho_2(z, z_3, z_4) \Bigm| Z = z, Z_3 = z_3, Z_4 = z_4 \Bigr) = \frac{C_n(\arccos \rho_2(z, z_3, z_4))}{C_n(\pi)}.   (4.46)

w 0 ;k 0

nR

Pr(errorjZ = z; Z3 = z3 ; Z4 = z4) = 1

nR

1

 !2nR1 2nR0

C

n

ar

os 2 (z; z3 ; z4 ) C () n

:

(4.47)

The expression Pr(errorjZ = z; Z3 = z3; Z4 = z4 ) is a monotoni ally non-in reasing fun tion of 2(z; z3 ; z4 ) and is upper-bounded by 1. Consequently, as in Se tion 4.2.2, 



Pr(error)  Pr errorj 2 (Z; Z3 ; Z4 ) =  + Pr 2 (Z; Z3 ; Z4 ) <  ;

(4.48)

for any real number . For both games under onsideration, we will show that, by hoosing a suÆ iently large blo klength n, the RHS of (4.48) an be made arbitrarily small when 85

 =  (R + Æ)

2 .

1

Here  (R1 + Æ) =



1 2

2(R1 +Æ )

1=2

(4.49)

;

2 is a small number to be spe i ed later, and the onstant R1 is de ned in (4.29) and (4.33)

for the additive atta k and general watermarking games, respe tively. We now analyze the rst term on the RHS of (4.48) for both games simultaneously. The analysis of the se ond term is performed separately for the additive atta k watermarking game in Lemma 4.8 and for the general watermarking game in Lemma 4.10. Lemma 4.4.

For any  > 0, there exists some 2 > 0 and some integer n1 > 0 su h that

for all n > n1 0

1



Cn

1



ar

os  (R + Æ) Cn ()

2

1

 12nR1

nR0

2

A

< ;

where R1 is de ned a

ording to either (4.29) or (4.33). Proof.

Rewriting (4.49) as 1 log 2 1

demonstrates the existen e of some  1 log 2 1

!

1

2

 (R1 + Æ) 2

>0

= R + Æ; 1

su h that

1  (R + Æ ) 1

!

2

2

(4.50)

> R1 ;

be ause in both (4.29) and (4.33) the rate R satis es 0 <  (R + Æ) < 1. By the result on the asymptoti area of spheri al aps (4.22) and by the inequality (4.50), it follows by Lemma 4.1 that there exists a positive integer n su h that for all n > n 1

1

1

0

1



Cn



ar

os  (R + Æ) Cn () 1

1

2

 12nR1 A

>1

;

and the laim follows by noting that the LHS annot de rease when the exponent 2nR1 is repla ed by 2nR1 2nR0 . 86

4.3.3 Distribution of Chosen Auxiliary Codeword

To continue with the performance analysis, we shall need the distribution of the chosen auxiliary codeword V_W(U) (defined in (4.36)), both unconditionally and conditioned on the random vector X and the random variable Z (defined in (4.37) and (4.40), respectively).

Lemma 4.5. The random vector V_W(U) defined in (4.36) is uniformly distributed over the n-sphere S^n(0, \sqrt{n\sigma_v^2}).

Proof. By the symmetry of the encoding process it is apparent that V_W(U) is independent of the message W. Assume then without loss of generality that W = 1. Since all the auxiliary random vectors \{V_{1,k}\} in bin 1 take value in the n-sphere S^n(0, \sqrt{n\sigma_v^2}), it follows that the chosen auxiliary codeword must take value in the same n-sphere. Finally, since the joint distribution of \{V_{1,k}\} is invariant under any unitary transformation, as is the distribution of U, and since U and \{V_{1,k}\} are independent, it follows that the unconditional distribution of V_W(U) is as stated above. In other words, the fact that V_W(U) achieves the maximum inner product with U does not tell us anything about the direction of V_W(U).

Lemma 4.6. Given X = x and Z = z, the random vector V_W(U) is uniformly distributed over the set

\mathcal{V}(x, z) = \bigl\{ a_1 x + v : v \in S^n(0, \sqrt{n a_2}) \cap x^\perp \bigr\},   (4.51)

where

a_1 = \frac{\sigma_v^2 + (1 - \beta) z}{n^{-1} \|x\|^2}

and

a_2 = \frac{(1 - \beta)^2 (\sigma_u^2 \sigma_v^2 - z^2)}{n^{-1} \|x\|^2}.   (4.52)

The proof of this lemma can be found in Appendix B.7.
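The coefficients in (4.52) are forced by geometry: projecting v = x - (1 - \beta)u onto x must give a_1 x, and the orthogonal remainder must carry exactly the leftover power a_2. A quick numerical check (our own sketch, with illustrative constants and exact sphere normalization assumed):

```python
import math
import random

random.seed(3)
n = 5000
sigma_u2, sigma_v2, beta = 1.0, 1.3, 0.4  # illustrative constants only

def on_sphere(power):
    v = [random.gauss(0, 1) for _ in range(n)]
    c = math.sqrt(n * power / sum(t * t for t in v))
    return [c * t for t in v]

u = on_sphere(sigma_u2)
v = on_sphere(sigma_v2)                                 # stand-in for the chosen codeword V_W(U)
x = [vi + (1 - beta) * ui for vi, ui in zip(v, u)]      # encoder rule (4.37)

z = sum(ui * vi for ui, vi in zip(u, v)) / n            # Z of (4.40)
x_pow = sum(xi * xi for xi in x) / n                    # n^{-1} ||x||^2

a1 = (sigma_v2 + (1 - beta) * z) / x_pow                # as in (4.52)
a2 = (1 - beta) ** 2 * (sigma_u2 * sigma_v2 - z * z) / x_pow

resid = [vi - a1 * xi for vi, xi in zip(v, x)]          # v - a1*x should be orthogonal to x
print(abs(sum(r * xi for r, xi in zip(resid, x))) / n < 1e-9,
      abs(sum(r * r for r in resid) / n - a2) < 1e-9)
```

Both checks hold exactly (up to floating point) because n^{-1}\|x\|^2 = \sigma_v^2 + 2(1 - \beta)z + (1 - \beta)^2\sigma_u^2 whenever u and v lie exactly on their spheres.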
4.3.4 Analysis for Additive Attack Watermarking Game

Using the encoder and decoder described above, we now show that for any positive \delta the rate R defined in (4.30) is achievable for the additive attack watermarking game (when the covertext U is uniformly distributed on the n-sphere S^n(0, \sqrt{n\sigma_u^2})). That is, the probabilities that the encoder does not meet the distortion constraint and that a decoding error occurs can both be made arbitrarily small by choosing some finite blocklength n. In order to prove these facts, we first show in the following subsection that the random variable Z takes value close to \beta\sigma_u^2 with high probability.

A Law of Large Numbers for Z

In this section, we state and prove a claim that describes the behavior of the random variable Z defined in (4.40). This claim will be used to show that encoder and decoder behave properly (i.e., meeting the distortion constraint and recovering the correct message) with arbitrarily high probability.

Lemma 4.7. If the constants defined for the additive attack watermarking game are used to design the sequence of encoders of Section 4.3.1, then

\lim_{n\to\infty} Pr\bigl( Z \geq \beta \sigma_u^2 \bigr) = 1.

Proof. Let V be uniformly distributed on S^n(0, \sqrt{n\sigma_v^2}), independent of U. Then

Pr( Z \geq \beta \sigma_u^2 ) = 1 - Pr\Bigl( \max_{1 \leq k \leq 2^{nR_0}} n^{-1} \langle U, V_{W,k} \rangle < \beta \sigma_u^2 \Bigr)
  = 1 - \Bigl( 1 - Pr\bigl( n^{-1} \langle U, V \rangle \geq \beta \sigma_u^2 \bigr) \Bigr)^{2^{nR_0}},   (4.53)

where the first equality follows by the definition of Z (4.40) and of V_{W,k}, and the second equality follows because \{V_{W,k}\}_{k=1}^{2^{nR_0}} are IID and also independent of U. The RHS of (4.53) can be further simplified using

Pr\bigl( n^{-1} \langle U, V \rangle \geq \beta \sigma_u^2 \bigr) = Pr\Bigl( \Bigl\langle \frac{U}{\sqrt{n\sigma_u^2}}, \frac{V}{\sqrt{n\sigma_v^2}} \Bigr\rangle \geq \frac{\beta \sigma_u}{\sigma_v} \Bigr) = \frac{C_n(\arccos(\beta\sigma_u/\sigma_v))}{C_n(\pi)},   (4.54)

which follows since both normalized random vectors are uniformly distributed on S^n(0, 1) and they are independent of each other. By (4.22) we obtain

\lim_{n\to\infty} \frac{1}{n} \log \frac{C_n(\arccos(\beta\sigma_u/\sigma_v))}{C_n(\pi)} = \frac{1}{2} \log\Bigl( 1 - \frac{\beta^2 \sigma_u^2}{\sigma_v^2} \Bigr)   (4.55)
  = -(R_0 - \delta),   (4.56)

where the second equality follows by the definitions of \beta (4.26), \sigma_v^2 (4.27), and R_0 (4.28). Combining Lemma 4.1 with (4.53), (4.54), and (4.56) concludes the proof.

The Encoding Distortion Constraint
We now show that the encoder's distortion constraint is met with arbitrarily high probability. By Lemma 4.7, it is sufficient to show that Z \geq \beta\sigma_u^2 implies n^{-1}\|X - U\|^2 \leq D_1, which we proceed to prove. By the definitions of X and Z (see (4.37) and (4.40)),

n^{-1} \|X - U\|^2 = \sigma_v^2 - 2\beta Z + \beta^2 \sigma_u^2.   (4.57)

Since \beta is positive (4.26), the RHS of (4.57) is decreasing in Z. Consequently, the condition Z \geq \beta\sigma_u^2 implies

n^{-1} \|X - U\|^2 \leq \sigma_v^2 - \beta^2 \sigma_u^2 = D_1,

where the last equality follows from (4.27).
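The monotonicity argument around (4.57) amounts to one line of algebra; a hedged numeric check (the example values are arbitrary, and the function name is ours):

```python
D1, D2, sigma_u2 = 0.5, 0.4, 1.0
beta = D1 / (D1 + D2)                     # (4.26)
sigma_v2 = D1 + beta ** 2 * sigma_u2      # (4.27)

def encoder_distortion(z):
    """RHS of (4.57): n^{-1} ||X - U||^2 as a function of Z."""
    return sigma_v2 - 2 * beta * z + beta ** 2 * sigma_u2

# Decreasing in z (beta > 0), and exactly D1 at the threshold z = beta * sigma_u^2:
print(abs(encoder_distortion(beta * sigma_u2) - D1) < 1e-12,
      encoder_distortion(beta * sigma_u2 + 0.1) < D1)
```

So the law of large numbers for Z (Lemma 4.7) is exactly the statement that the encoder's distortion falls below D_1 with probability approaching one.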
The Decoding Error

We now show that the second term on the RHS of (4.48) is vanishing in n when \mu = \mu^*(R_1 + \delta) - \epsilon_2. Here R_1 and \mu^*(R_1 + \delta) are defined in (4.29) and (4.49) respectively, and \epsilon_2 > 0 is specified in Lemma 4.4. The combination of this fact with Lemma 4.4 will show that, as the blocklength n tends to infinity, the probability of decoding error approaches zero. The following lemma is proved in Appendix B.8.

Lemma 4.8. If the constants defined for the additive attack watermarking game are used to design the sequence of encoders of Section 4.3.1, then for any \epsilon > 0 and \epsilon_2 > 0, there exists an integer n_2 > 0 such that for all n > n_2 and for all the deterministic attacks of Section 4.1.1

Pr\bigl( \rho_2(Z, Z_3, Z_4) < \mu^*(R_1 + \delta) - \epsilon_2 \bigr) < \epsilon.

4.3.5 Analysis for General Watermarking Game
We return to the public version of the general watermarking game to demonstrate that the encoder and decoder for the general watermarking game (defined in Section 4.3.1) guarantee that the rate R of (4.34) is achievable, for any \delta > 0. That is, we show that both the probability that the encoding distortion constraint is not met and the probability of a decoding error are vanishing in the blocklength n. We first show in the following subsection that the random variable Z concentrates around \beta\sigma_u^2 + \alpha.

A Law of Large Numbers for Z

In this section, we prove a law of large numbers for the random variable Z = n^{-1}\langle U, V_W(U) \rangle, which is defined in (4.40), and which corresponds to the normalized inner product between the source sequence U and the chosen auxiliary codeword V_W(U). This law will be useful for the later analysis of the probability of exceeding the allowed encoder distortion and the probability of a decoding error.

Lemma 4.9.

For every \delta > 0 used to define the encoder for the general watermarking game (see equations (4.32), (4.33), (4.34) and Section 4.3.1), there exists \kappa(\delta) > 0 such that

\lim_{n\to\infty} Pr\bigl( \beta\sigma_u^2 + \alpha \leq Z \leq \beta\sigma_u^2 + \alpha + \kappa(\delta) \bigr) = 1,

and

\lim_{\delta \downarrow 0} \kappa(\delta) = 0.

Proof. The proof that Pr(Z \geq \beta\sigma_u^2 + \alpha) \to 1 is almost identical to the proof of Lemma 4.7. One need only replace \beta\sigma_u^2 with \beta\sigma_u^2 + \alpha and use the definitions of the constants for the general watermarking game as opposed to the constants for the additive attack watermarking game; see Section 4.3.1.

To complete the proof of the present claim, we now choose \kappa(\delta) > 0 such that

\frac{1}{2} \log\Bigl( 1 - \Bigl( \frac{\beta\sigma_u^2 + \alpha + \kappa(\delta)}{\sigma_u \sigma_v} \Bigr)^2 \Bigr) < -R_0.   (4.58)

This can be done because the LHS of (4.58) equates to -(R_0 - \delta) when \kappa(\delta) is set to zero (in analogy to the equality between (4.55) and (4.56)), and because \log(1 - x^2) is continuous and decreasing in x, for 0 < x < 1. Using Lemma 4.1, we see that Pr(Z > \beta\sigma_u^2 + \alpha + \kappa(\delta)) \to 0. Finally, we can choose \kappa(\delta) \to 0 as \delta \to 0 by the continuity of \log(1 - x^2).

The Encoding Distortion Constraint

onstant need not be non-negative. To address this ase we note that for < 0, whenever the inequality u2 +   Z  u2 +  + (Æ) holds we also have n 1kX U k2  D1 2 (Æ). Thus, if we design our system for some D~ 1 < D1 instead of D1 as the en oder's distortion

onstraint, then by hoosing Æ suÆ iently enough and n suÆ iently large, Lemma 4.9 will guarantee that the en oder will meet the D1 distortion onstraint with arbitrarily high probability. The desired a hievability result an be demonstrated by letting D~ 1 approa h D1 , be ause C  (D1 ; D2 ; u2 ) is ontinuous in D1 . The De oding Error

In this se tion, we show that the se ond term on the RHS of (4.48) is vanishing in n when  =  (R1 + Æ) 2 . Here R1 and  (R1 + Æ) are de ned in (4.33) and (4.49) and 2 is spe i ed in Lemma 4.4. The ombination of this fa t with Lemma 4.4 will show that the probability of de oding error approa hes zero, as the blo klength n tends to in nity. We state the desired result in the following lemma, whi h is proved in Appendix B.9. Lemma 4.10.

If the onstants de ned for the general watermarking game are used to de-

sign the sequen e of en oders of Se tion 4.3.1, then for any  > 0 and 2 > 0, there exists

91

an integer

n2 > 0

su h that for all

n > n2

and for all atta kers of Se tion 4.1.2

Pr 2 (Z; Z3 ; Z4 ) <  (R1 + Æ)



2 < :

4.4 Spheri ally Uniform Covertext is SuÆ ient We have shown in the early se tions of this hapter that if the overtext distributed on the n-sphere S n(0;

p

U

is uniformly

nu2 ), then the oding apa ity of both the private and

publi versions of the watermarking games are lower bounded by C  (D1 ; D2 ; u2 ). We have

also shown that for su h overtexts, the oding apa ity of the additive atta k watermarking

1 game is at least 12 log(1+ D D2 ). In this se tion, we extend these results to zero-mean varian eu2 IID Gaussian overtexts.

We rst transform the IID Gaussian sequen e

formly distributed on the n-sphere S n (0;

p

U into a random ve tor U 0 that is uni-

nu2 ). To this end we set 1

SU = n

kU k2 ;

whi h onverges to u2 in probability, and let

U0 =

s

u2 SU

U;

whi h is well de ned with probability 1, and whi h is uniformly distributed on S n (0;

p

nu2 ).

We will onsider all the models simultaneously, but we will state our assumptions on the rate of ea h of the models separately:

General watermarking Assume that 0 < R < C (D1 ; D2 ; u2 ).

By the de nition of C 

(2.6), there exists some A0 2 A(D1 ; D2 ; u2 ) su h that R < 21 log(1 + s(A0 ; D1 ; D2 ; u2 )).

Sin e s(A0 ; D1 ; D2 ; u2 ) is ontinuous in D1 , there exists some D10 < D1 su h that R < 21 log(1 + s(A0 ; D10 ; D2 ; u2 )).

Additive atta k watermarking Assume that 0 < R < 12 log(1 + DD ). Then, there exists 1 a D10 < D1 su h that R < 21 log(1 + D D2 ).

1 2

0

Let X 0 be the output of the en oders as designed for the overtext U 0 and the parameters

A0 and D10 in Se tions 4.2.1 and 4.3.1. Let 0 be the orresponding de oder. Consider now 92

an en oder for the overtext

U that produ es the stegotext X a

ording to the rule 8 >

:

u

1

kx0 uk2  D1

otherwise

:

With this hoi e of x, the distortion between u and x is less than D1 almost surely, so that the en oding distortion onstraint (2.1) is met.

X = X 0 with arbitrarily high probability. Indeed, the distortion between the random ve tors X 0 and U is given by We next laim that for a suÆ iently large blo klength,

1 kX 0 n

U k2 = n1 kX 0 U 0 + U 0 U k2  n1 kX 0 U 0 k2 + n1 kU 0 U k2 + n2 kX 0 U 0k  kU 0 U k q  D10 + n1 kU 0 U k2 + D10 n2 kU 0 U k;

and 1 0 kU n

U k2 =

p

SU

p 2 u2

approa hes, by the weak law of large numbers, zero in probability. In the above, the rst inequality follows from the triangle inequality, and the se ond be ause the en oders of Se tions 4.2.1 and 4.3.1 satisfy the en oder distortion onstraint n almost surely. Sin e D10 < D1 , our laim that

1

kX 0 U 0k2  D10

lim Pr(X = X 0 ) = 1

n!1

(4.59)

is proved. ^ be the output of the de oder 0 , and onsider now any xed deterministi atta k. Let W The probability of error an be written as ^ = Pr(W 6 W ) = Pr(W^ =6 W; X = X 0) + Pr(W^ =6 W; X =6 X 0)

 Pr(W^ 6= W; X = X 0) + Pr(X 6= X 0); where the se ond term on the RHS of the above onverges to zero (uniformly over all the 93

deterministi atta kers) by (4.59), and the rst term approa hes zero by the a hievability results for overtexts that are uniformly distributed over the n-sphere. To larify the latter argument onsider, for example, the publi watermarking game with an additive atta ker as in (4.2). We would then argue that Pr(W^ 6= W; X = X 0) = Pr 0(X + y~ ; 1 ) 6= W; X = X 0  = Pr 0(X 0 + y~ ; 1 ) 6= W; X = X 0   Pr 0(X 0 + y~ ; 1 ) 6= W ; 

whi h onverges to zero by the a hievability result on overtexts that are uniformly distributed on the n-sphere.
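The fallback rule above (keep the designed stegotext only when it is close enough to the actual covertext) is easy to render in code. The sketch below is our own illustration, not the thesis's implementation; `sphere_project` stands in for forming $U'$ on the sphere of radius $\sqrt{n\sigma_u^2}$, and the perturbation standing in for the designed encoder is hypothetical.

```python
import math
import random

def fallback_encode(u, x_prime, D1):
    """Fallback rule: keep the designed stegotext x' only when it is
    within per-sample squared distance D1 of the covertext u."""
    n = len(u)
    dist = sum((a - b) ** 2 for a, b in zip(x_prime, u)) / n
    return list(x_prime) if dist <= D1 else list(u)

def sphere_project(u, sigma2):
    """U': scale u onto the sphere of radius sqrt(n * sigma2)."""
    n = len(u)
    norm = math.sqrt(sum(x * x for x in u))
    scale = math.sqrt(n * sigma2) / norm
    return [scale * x for x in u]

rng = random.Random(0)
n, sigma2, D1 = 10_000, 1.0, 0.1
u = [rng.gauss(0.0, 1.0) for _ in range(n)]
u_proj = sphere_project(u, sigma2)
# Stand-in for the designed encoder: perturb U' by at most D1' < D1 per sample.
D1p = 0.9 * D1
x_prime = [a + math.sqrt(D1p) * rng.choice((-1.0, 1.0)) for a in u_proj]
x = fallback_encode(u, x_prime, D1)
dist = sum((a - b) ** 2 for a, b in zip(x, u)) / n
# The encoding constraint (2.1) holds by construction, whichever branch fired.
print(x == x_prime, round(dist, 4))
```

By the law of large numbers, $n^{-1}\|U'-U\|^2$ is small for large $n$, so the fallback rarely fires, mirroring (4.59).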

4.5

Converse for Squared Error Distortion

In this section, we prove the converse part of Theorem 2.1 for the watermarking game with real alphabets and squared error distortions. That is, we show that if the covertext distribution $\{P_U\}$ is ergodic with finite fourth moment and $\mathbb{E}[U_k^2] \le \sigma_u^2$, then the capacity of the private version of the watermarking game is at most $C^*(D_1, D_2, \sigma_u^2)$. In particular, for any fixed $R > C^*(D_1, D_2, \sigma_u^2)$ and any sequence of rate-$R$ encoders that satisfy the distortion constraint (2.1), we will propose a sequence of attackers $\{g_n\}$ that satisfy the distortion constraint (2.3) and that guarantee that, irrespective of the decoding rule, the probability of error will be bounded away from zero. Thus, even if the sequence of decoders were designed with full knowledge of this sequence of attackers, no rate above $C^*(D_1, D_2, \sigma_u^2)$ would be achievable.

The remainder of this section is organized as follows. In Section 4.5.1, we describe the proposed sequence of attackers. In Section 4.5.2, we study the distortion they introduce, and in Section 4.5.3 we show that, for the appropriate rates, these attack strategies guarantee a probability of error that is bounded away from zero. We conclude with a discussion of the necessity of the ergodicity assumption in Section 4.5.4.

4.5.1

Attacker

Intuitive Definition

We seek to provide some motivation for the proposed attack strategy by first describing two simple attacks that fail to give the desired converse. We then combine aspects of these simple strategies to form the attack strategy that we will use to prove the converse. The upcoming discussion will utilize the correspondence between the encoder and attacker (mappings) $(f_n, g_n)$ and the watermarking and attack channels (conditional laws) $(P_{X|U}, P_{Y|X})$ that they induce for given fixed laws on $W$, $\{P_U\}$, $\theta_1$, and $\theta_2$. One way to prove the converse is to show, using a Fano-type inequality, that in order for the probability of error to tend to zero, a mutual information term similar to $I_{\mathrm{priv}}$ of (3.1) (evaluated with respect to the induced channels) must be greater than the watermarking rate. Thus, one would expect that the optimal attack channels of Section 3.2.1 for the mutual information games could be used to design good attacker mappings for the watermarking game.

The first simple attack strategy corresponds to the optimal attack channel $(P^A_{Y|X})^n$ of Section 3.2.1, where $A$ is the average power in the stegotext based on the encoder, i.e., $A = \mathbb{E}[n^{-1}\|X\|^2]$. Since the encoder must satisfy the distortion constraint (2.1) (and thus the corresponding watermarking channel $P_{X|U}$ must be in $\mathcal{D}_1(D_1, P_U)$), the results of Section 3.2.3 show that this attacker guarantees that the mutual information is at most $C^*(D_1, D_2, \sigma_u^2)$. The problem with this attack strategy is that since it is based on the average power in the stegotext, there is no guarantee that the attacker's distortion constraint (2.3) will be met with probability one.

The second simple attack strategy corresponds to the optimal attack channel $(P^a_{Y|X})^n$, where $a$ is the power in the realization (sample-path) of the stegotext, i.e., $a = n^{-1}\|x\|^2$. The results of Section 3.2.3 again give the appropriate upper bound on the mutual information conditioned on the value of $a$. Furthermore, if a distortion level $\tilde D_2$ slightly smaller than the actual distortion level $D_2$ is used to design this attacker, then the distortion constraint will be met with high probability. The problem with this attack strategy is that the decoder can fairly accurately determine the value of $a$ from the forgery. Thus, the encoder and decoder could potentially use the power of the stegotext to send extra information, so that the total rate might be higher than $C^*(D_1, D_2, \sigma_u^2)$.

The attack strategy that we use to prove the converse combines aspects of the two simple strategies described above. To form this attacker, we partition the possible values of $n^{-1}\|x\|^2$ into a finite number of intervals, $\mathcal{A}_1, \ldots, \mathcal{A}_m$, and compute the average power in the stegotext conditioned on each interval, i.e., $a_k = \mathbb{E}\bigl[ n^{-1}\|X\|^2 \bigm| n^{-1}\|X\|^2 \in \mathcal{A}_k \bigr]$. We then use the optimal attack channel $(P^{a_k}_{Y|X})^n$ whenever the actual power of the stegotext lies in the interval $\mathcal{A}_k$. Unlike the first simple strategy, the distortion constraint can be guaranteed by making the intervals small enough. Unlike the second simple strategy, the encoder and decoder cannot use the power of the stegotext to transmit extra information because there are only finitely many intervals. These arguments will be made more precise in the upcoming sections.

Precise Definition

Let $R$ be a fixed rate that is strictly larger than $C^*(D_1, D_2, \sigma_u^2)$. For any rate-$R$ sequence of encoders and decoders, the attacker described below will guarantee some non-vanishing probability of error. By the continuity of $C^*(D_1, D_2, \sigma_u^2)$ in $D_2$, it follows that there exists some $0 < \tilde\delta < D_2$ such that $R > C^*(D_1, D_2 - \tilde\delta, \sigma_u^2)$. Let

\[ \tilde D_2 = D_2 - \tilde\delta, \tag{4.60} \]

for some such $\tilde\delta$. The attacker partitions the interval $\bigl[\tilde D_2,\, (2\sigma_u + \sqrt{D_1})^2\bigr]$ sufficiently finely into $m$ sub-intervals $\mathcal{A}_1, \ldots, \mathcal{A}_m$, so that for each sub-interval $\mathcal{A}_k$,

\[ \tilde D_2 \left( 1 + \frac{\tilde D_2}{A} \left( \frac{A'}{A} - 1 \right) \right) \le D_2 - \frac{\tilde\delta}{2}, \qquad \text{for all } A, A' \in \mathcal{A}_k. \tag{4.61} \]

Such a partition exists because this interval is finite, because it does not include zero ($\tilde D_2 > 0$), and because the constant $\tilde\delta$ is positive. We define the mapping $k$ from $\mathbb{R}^n$ to $\{0, \ldots, m\}$ as

\[ k(x) = \begin{cases} l & \text{if } n^{-1}\|x\|^2 \in \mathcal{A}_l \\ 0 & \text{if no such } l \text{ exists.} \end{cases} \tag{4.62} \]

This mapping will determine how the stegotext $x$ will be attacked. Notice that it takes on a finite number of values. We also define the random variable $K = k(X)$.

Using his knowledge of the distribution of the covertext and the encoder mapping, the attacker computes

\[ a_k = \mathbb{E}\bigl[ n^{-1}\|X\|^2 \bigm| K = k \bigr], \qquad \forall\, 0 \le k \le m. \tag{4.63} \]

Note that $a_k \in \mathcal{A}_k$ for $k \ne 0$ since $\mathcal{A}_k$ is an interval (and hence convex) and since the event $K = k$ corresponds to the event $n^{-1}\|X\|^2 \in \mathcal{A}_k$. The attacker also computes

\[ \sigma_k^2 = \mathbb{E}\bigl[ n^{-1}\|U\|^2 \bigm| K = k \bigr], \qquad \forall\, 0 \le k \le m. \tag{4.64} \]

Using only the source of randomness $\theta_2$, the attacker generates a random vector $V$ as a sequence of IID zero-mean variance-$\tilde D_2$ Gaussian random variables. Recall that we assume that the random variable $\theta_2$ and the random vector $X$ are independent, and thus the random vectors $V$ and $X$ are also independent.

We now describe an attacker $\tilde g_n$ that does not necessarily meet the distortion constraint. For this attacker, the forgery is computed as

\[ \tilde g_n(x, \theta_2) = \begin{cases} \alpha(a_{k(x)}, \tilde D_2)\, x + \alpha^{1/2}(a_{k(x)}, \tilde D_2)\, v(\theta_2) & \text{if } k(x) > 0 \\ \sqrt{n \tilde D_2}\; v(\theta_2)/\|v(\theta_2)\| & \text{otherwise,} \end{cases} \tag{4.65} \]

where $\alpha(A, D_2) = 1 - D_2/A$ (also see (A.4)). Conditionally on $X = x$ satisfying $k(x) \ge 1$, the random vector $Y = \tilde g_n(x, \theta_2)$ under this attacker is thus distributed as $\alpha(a_{k(x)}, \tilde D_2)\, x + \alpha^{1/2}(a_{k(x)}, \tilde D_2)\, V$. Note that if $K = k > 0$, the resulting conditional distribution $P_{Y|X}$ is the same as the optimal attack channel of the mutual information game corresponding to $a_k$ and $\tilde D_2$; see Section 3.2.1.

Finally, our proposed attacker uses $\tilde g_n$ if the distortion constraint is met and sets $y = x$ if the distortion constraint is not met. That is,

\[ g_n(x, \theta_2) = \begin{cases} \tilde g_n(x, \theta_2) & \text{if } n^{-1}\|\tilde g_n(x, \theta_2) - x\|^2 \le D_2 \\ x & \text{otherwise.} \end{cases} \tag{4.66} \]

The attacker $g_n$ thus satisfies the distortion constraint with probability one. Note that if instead of $a_k$ being calculated as in (4.63) it was chosen arbitrarily from $\mathcal{A}_k$, then the upcoming proof would still be valid (provided that each $\mathcal{A}_k$ is small enough). The resulting attacker is independent of the encoder and decoder and guarantees that no rates greater than $C^*(D_1, D_2, \sigma_u^2)$ are achievable.

4.5.2

Analysis of Distortion

The attackers $\{\tilde g_n\}$ do not, in general, satisfy the distortion constraint (2.3). But in this section we show that, as the blocklength tends to infinity, the probability that the distortion they introduce exceeds $D_2$ tends to zero. In the terminology of (4.66) we shall thus show that

\[ \lim_{n\to\infty} \Pr\bigl( g_n(X, \theta_2) = \tilde g_n(X, \theta_2) \bigr) = 1. \tag{4.67} \]

Once this is shown, for the purposes of proving the converse, it will suffice to show that, for the appropriate rates, the attackers $\{\tilde g_n\}$ guarantee a non-vanishing probability of error. To see this, fix any $R > C^*(D_1, D_2, \sigma_u^2)$ and fix some encoder sequence $\{f_n\}$ and a corresponding decoder sequence $\{\phi_n\}$. Let $\tilde D_2$ be chosen as in (4.60) so that $R > C^*(D_1, \tilde D_2, \sigma_u^2)$ and consider the attacker (4.65). Assume that we have managed to prove that the attackers $\{\tilde g_n\}$ of (4.65) guarantee a non-vanishing probability of error. In this case (4.67) will guarantee that the probability of error must also be bounded away from zero in the presence of the attacker $g_n$. Since $\{g_n\}$ do satisfy the distortion constraint, this will conclude the proof of the converse.

We now turn to the proof of (4.67). In order to summarize the distortion introduced by the attacker, we define the following random variables,

\[ \psi_1(k) = \alpha(a_k, \tilde D_2) \bigl( n^{-1}\|V\|^2 - \tilde D_2 \bigr), \qquad k = 1, \ldots, m, \tag{4.68} \]

and

\[ \psi_2(k) = 2 \bigl( \alpha(a_k, \tilde D_2) - 1 \bigr)\, \alpha^{1/2}(a_k, \tilde D_2)\; n^{-1}\langle X, V \rangle, \qquad k = 1, \ldots, m. \tag{4.69} \]

Note that for any $1 \le k \le m$, the random variables $\psi_1(k)$ and $\psi_2(k)$ converge to zero in probability, because $V$ is a sequence of IID $\mathcal{N}(0, \tilde D_2)$ random variables independent of $X$, and because $0 < \alpha(a_k, \tilde D_2) < 1$ for all $1 \le k \le m$.

The probability of exceeding the allowed distortion can be written as

\[ \Pr\Bigl( n^{-1} \|\tilde g_n(X, \theta_2) - X\|^2 > D_2 \Bigr) = \sum_{l=0}^{m} \Pr\Bigl( n^{-1} \|\tilde g_n(X, \theta_2) - X\|^2 > D_2,\; K = l \Bigr). \]

We shall next show that each of the terms in the above sum converges to zero.

We begin with the first term, namely $l = 0$. The event $K = 0$ corresponds to either $n^{-1}\|X\|^2 \le \tilde D_2$ or $n^{-1}\|X\|^2 > (2\sigma_u + \sqrt{D_1})^2$. In the former case,

\[ \frac{1}{n}\|Y - X\|^2 = \frac{1}{n}\Bigl\| \sqrt{n \tilde D_2}\, V/\|V\| - X \Bigr\|^2 \le \Bigl( \sqrt{\tilde D_2} + \sqrt{n^{-1}\|X\|^2} \Bigr)^2 \le \Bigl( \sqrt{\tilde D_2} + \sqrt{\tilde D_2} \Bigr)^2, \]

where the inequality follows by the triangle inequality and since $n^{-1}\|X\|^2 \le \tilde D_2$ here. Thus,

\[ \Pr\Bigl( n^{-1}\|\tilde g_n(X, \theta_2) - X\|^2 > D_2,\; K = 0 \Bigr) = \Pr\Bigl( n^{-1}\|X\|^2 > (2\sigma_u + \sqrt{D_1})^2 \Bigr) \le \Pr\Bigl( n^{-1}\|U\|^2 > 4\sigma_u^2 \Bigr), \]

which converges to zero by the ergodicity of the covertext.

To study the limiting behavior of the rest of the terms, fix some $1 \le l \le m$. If $k(x) = l$ then

\[ \frac{1}{n}\|\tilde g_n(x, \theta_2) - x\|^2 = \frac{1}{n}\Bigl\| \bigl( \alpha(a_l, \tilde D_2) - 1 \bigr) x + \alpha^{1/2}(a_l, \tilde D_2)\, V \Bigr\|^2 = \tilde D_2 \left( 1 + \frac{\tilde D_2}{a_l} \left( \frac{n^{-1}\|x\|^2}{a_l} - 1 \right) \right) + \psi_1(l) + \psi_2(l) \le D_2 - \frac{\tilde\delta}{2} + \psi_1(l) + \psi_2(l), \]

where the second equality follows by the definitions of $\alpha$, $\psi_1(l)$, and $\psi_2(l)$ (see (A.4), (4.68) and (4.69)), and the inequality follows by (4.61) since both $n^{-1}\|x\|^2$ and $a_l$ are in the set $\mathcal{A}_l$. Thus,

\[ \Pr\Bigl( n^{-1}\|\tilde g_n(X, \theta_2) - X\|^2 > D_2,\; K = l \Bigr) \le \Pr\Bigl( \psi_1(l) + \psi_2(l) \ge \tilde\delta/2,\; K = l \Bigr) \le \Pr\Bigl( \psi_1(l) + \psi_2(l) \ge \tilde\delta/2 \Bigr), \]

which converges to zero because both $\psi_1(l)$ and $\psi_2(l)$ converge to zero in probability.

4.5.3

whi h onverges to zero be ause both 1(l) and 2(l) onverge to zero in probability. 4.5.3

Analysis of Probability of Error

In this se tion, we show that whenever the watermarking rate R ex eeds C (D1 ; D2 ; 2 ), the sequen e of atta kers fg g de ned in (4.65) prevents the probability of error from de aying to zero. In the previous se tion, we have shown that for blo klength n large enough g (X ; 2 ) = g (X ; 2 ) with arbitrarily high probability. The ombination of these two fa ts will show that the probability of error is also prevented from de aying to zero by the sequen e of atta kers fg g de ned in (4.66). This analysis is arried out in a series of laims. In Lemma 4.11 we use a Fano-type inequality to show that an a hievable rate annot ex eed some limit of mutual informations. In Lemma 4.12, we upper bound these mutual informations by simpler expe tations, and in Lemma 4.13 we nally show that, in the limit, these expe tations do not ex eed C  (D1 ; D2 ;  2 ). u

n

n

n

n

u

Lemma 4.11. For any sequen e of en oders, atta kers, and de oders

orresponding sequen e of onditional distributions as

n

!1

;

, if

n

(

)g ; ) ! 0

; gn ; n

Pe fn ; gn

with

n

, then

R

Proof.

f(PX jU 1 ; PY jX )g

f(f

1I  liminf !1 n

U P1 PX U 1 PY X (X ; Y jU ; 1 ):

P

n

j

;

j

Utilizing Fano's inequality and the data pro essing theorem, nR

= =

( j U ; 1 ) H (W j U ; 1 ; Y ) + I (W ; Y j U ; 1 )  1 + nRP (f ; g ;  ) + I (X ; Y j U ; 1); H W

e

n

n

100

n

(4.70)

where the rst equality follows sin e is independent of (U 1) and uniformly distributed over f1 2nR g, and the inequality follows by the data pro essing theorem and by Fano's inequality. Dividing by and taking the lim inf, yields the desired result. W

;

;:::;

n

The mutual information term on the RHS of (4.70) is a little cumbersome to manipulate, and we next exploit the fact that $K$ takes on at most $m + 1$ possible values to prove that $\frac{1}{n} I(X; Y \mid K, U, \Theta_1)$ has the same limiting behavior as $\frac{1}{n} I(X; Y \mid U, \Theta_1)$, i.e., that

\[ \lim_{n\to\infty} \left( \frac{1}{n} I(X; Y \mid K, U, \Theta_1) - \frac{1}{n} I(X; Y \mid U, \Theta_1) \right) = 0. \tag{4.71} \]

To prove (4.71) write

\[ I(X; Y \mid K, U, \Theta_1) = h(Y \mid K, U, \Theta_1) - h(Y \mid X, K, U, \Theta_1) = h(Y \mid K, U, \Theta_1) - h(Y \mid X, U, \Theta_1) = I(X; Y \mid U, \Theta_1) - I(K; Y \mid U, \Theta_1), \]

where all differential entropies exist for the attacker $\tilde g_n$, and the second equality follows since $K$ is a function of $X$ (4.62). Thus, the mutual information on the RHS of (4.70) can be written as

\[ I(X; Y \mid U, \Theta_1) = I(X; Y \mid K, U, \Theta_1) + I(K; Y \mid U, \Theta_1). \tag{4.72} \]

Since $K$ takes on at most $m + 1$ different values, it follows that

\[ 0 \le I(K; Y \mid U, \Theta_1) \le H(K) \le \log(m + 1), \]

and thus, since $m$ is fixed and does not grow with the blocklength,

\[ \lim_{n\to\infty} \frac{1}{n} I(K; Y \mid U, \Theta_1) = 0. \tag{4.73} \]

Equation (4.71) now follows from (4.73) and (4.72).

It now follows from Lemma 4.11 and from (4.71) that in order to prove that the rate $R$ is not achievable, it suffices to show that

\[ R > \liminf_{n\to\infty} \frac{1}{n} I(X; Y \mid K, U, \Theta_1). \]

(2.1), if the atta ker g  of (4.65) with orresponding atta k hannel PY jX is used, then

n

1I

n

PU P1 PXjU 1 PY jX (X ; Y jK; U ; 1 ) ;

m X

 Pr(K = k) 12 log 1 + s(ak ; D1 ; D~ 2 ; k ) k=1 h i ~ 2 ; K ) : (4.74) EK C  (D1 ; D







To pro eed with the proof of the onverse we would now like to upper bound the RHS of (4.74). Sin e the fun tion C (D1 ; D2 ; u2 ) is not ne essarily on ave in u2 , we annot use Jensen's inequality. However, C (D1 ; D2 ; u2 ) is in reasing in u2 and is upper bounded by 1=2 log(1+ D1 =D2 ) for all u2 . Thus, we will omplete the proof by showing in the following lemma that if k is larger than u2 , albeit by a small onstant, then Pr(K = k) must be vanishing. The proof of this lemma an be found in Appendix B.11. Lemma 4.13.





For any ergodi overtext distribution PU with E U 4
,

0

()

P2. For every  > , n > n0  , and event

( )

E,



if E n 1 k

U

2

k jE



> u2

+ 5, then Pr( ) < E

Æ ; n .

With the aid of Lemma 4.13 we an now upper bound the RHS of (4.74). Spe i ally, we next show that for any ergodi stegotext PU of nite fourth moment and of se ond moment u2 , if R > C (D1 ; D~ 2 ; u2 ) and the atta ker gn of (4.65) is used, then f

lim sup EK n!1

g

h i C  D1 ; D2 ; K  C  D1 ; D2 ; u2 :

(

~

)

(

~

)

(4.75)

To see this, let Æ(; n) and n0() be the mappings of Lemma 4.13 orresponding to the 102

stegotext

PU g

f

. For any  > 0, let us de ne the set K

 () = fk : k > 2 + 5g:

u





By the de nition of k (4.64), it is lear that E n 1 U 2 K  () > u2 + 5. Thus, by the Lemma 4.13, Pr(K  ()) < Æ(; n). Sin e C (D1 ; D2 ; u2 ) is non-de reasing in u2 and is upper bounded by 12 log(1 + DD21 ), k

k j

2 K

2 K

h

i

( ~ )    = Pr (K =  ()) E CK K =  () +Pr (K   ()) E CK K ~ 2 ; u2 + 5) + Æ(; n) 1 log 1 + D1 ; C  (D1 ; D ~ 2

EK C  D1 ; D2 ; K 2 K

2 K



2 K



2 K

 ()

D2

where CK = C (D1 ; D~ 2 ; K ). Sin e this is true for every suÆ iently large n and sin e Æ(; n) approa hes zero as n tends to in nity, lim sup EK n!1

h

~

C  D1 ; D2 ; K

(

i

)



~

C  D1 ; D2 ; u2

(

+ 5):

Furthermore, sin e this is true for every  > 0 and sin e C (D1 ; D2 ; u2 ) is ontinuous in u2 , (4.75) follows. We now have all of the ne essary ingredients to prove that if the rate R ex eeds  prevents the probability of error from C  (D1 ; D2 ; u2 ), then the sequen e of atta kers gn de aying to zero. Indeed, let D~ 2 be hosen as in (4.60) so that R > C (D1 ; D~ 2 ; u2 ) and

onsider the atta ker gn of (4.65). Then g

f

( ~ ) h i lim sup EK C (D1 ; D~ 2 ; K ) n!1 lim sup 1 I (X ; Y K; U ;  )

R > C  D1 ; D2 ; u2 



n!1

j

n

= lim sup n1 I (X ; Y U ; 1 );

1

j

n!1

and the probability of error must be bounded away from zero by Lemma 4.11. Here the rst inequality is justi ed by the hoi e of D~ 2 (4.60), the se ond inequality by (4.75), the third inequality by (4.74), and the nal equality by (4.71). 103

4.5.4

Dis ussion: The Ergodi ity Assumption

We have proved that the IID zero-mean Gaussian overtext is easiest to watermark among all ergodi overtexts of nite fourth moment and of a given se ond moment. That is, we have 

shown that for any overtext satisfying these onditions, no rate above C  D1 ; D2 ; E U 2



i

is a hievable. An inspe tion of the proof, however, reveals that full ergodi ity is not required, and it suÆ es that the overtext law fPU g be stationary and satisfy a se ond-moment ergodi ity 1

assumption, i.e., that the varian e of n

Pn i

2 =1 Ui

approa h zero, as n tends to in nity; see

Appendix B.11. This ondition an sometimes be further relaxed if the pro ess has an ergodi de omposition (see e.g. [Gra88℄). We illustrate this point with a simple example of a overtext that has two ergodi modes. Let Z take on the values zero and one equiprobably, and assume that onditional on

Z the overtext fU g is IID zero-mean Gaussian with varian e 2 0 , if Z = 0, and with i

varian e

2

1,

u;

u;

if Z = 1. Assume that

is stationary with

  E U 2 = (2 k

0

u;

2

u;

0


0, and let

D  1

g



+

(6.9)

;

and R

=g

 D  1

Hb

g

Hb (D2 )



2

:

(6.10)

The en oder/de oder pair generates 2n(R+R0 ) IID ve tors, ea h a length-ng IID sequen e of Bernoulli(1=2) random variables. This odebook thus onsists of 2n(R+R0 ) random ve tors

fV (w; k); 1  w  2nR ; 1  k  2nR0 g: The en oder/de oder pair also sele ts ng indi es uniformly out of all subsets of f1; : : : ; ng of size ng, say P = fP (1); : : : ; P (ng)g. For a length-n ve tor u and a size-ng position set p, we write ujp to mean the length-ng ve tor at the points p, i.e., ujp = (up ; : : : ; up ng ). Given the overtext u, the watermark w, the

odebook fvg and the indi es p, the en oder sele ts the value (1)

k



(

)

= arg min wh (ujp  v(w; k)) :

k2nR0

1

140

(6.11)

The en oder then reates the stegotext as 8 >

:u i

(6.12)

:

otherwise

In other words, the en oder nds the odeword that best mat hes the overtext at the sele ted points and then repla es the overtext with the odeword at those points. At the other end, the de oder nds the odeword losest to the forgery

y at the sele ted points.

That is, he estimates the watermark as ^ = arg min minnR wh 1w 2nR 1k2 0

w

0

 yjp  v( 0 ) w ;k

(6.13)

:

The main fa t that we will use for the remainder of the proof is the following. For 2mR IID random ve tors fX 01 ; : : : ; X 02mR g where ea h X 0i is an IID sequen e of m Bernoulli(1=2) random variables, let 0

0

P

bin

(m; D; R0 ) = Pr



min

1i2mR0

m

1

0 wh (X i )  D

 :

(6.14)

Then, for any 0  D  1=2, 8 > 1 lim P bin (m; D; R0 ) = m!1 > :0 if R0 < 1

= ng,

m

X ) = h (U jP  V ( w

W; k

D

)) sin e

:

b (D)

(6.15)

H

h (U  X )  D1 with arbitrarily high probability, = D1 =g, and R0 = R0 =g. To see this, note that wh (U 

To show that the en oder satis es n we apply (6.15) with

b (D)

H

1w

U and X an only di er at points in P . We an now

write that Pr



min

1k2nR0

= Pr =

P



bin

n

1

h (U jP

w

min

1k2ng(R0 =g)

V(

(ng)

W; k

1

w

))  D1



h (V (W; k)) 

D1



g

(ng; D1 =g; R0 =g);

where the rst equality follows sin e

U jP  V ( 141

(6.16) W; k

) is independent of

U and thus the

U jP  V (W; k) does not depend on the realization of u (in parti ular, it is the same when u = 0). Finally, the RHS of (6.16) goes to zero by distribution of the Hamming weight of

the de nition R0 (6.9) and by (6.15). To analyze the probability of an in orre t de ision being made by the de oder, we rst note that the probability of error depends on the atta k only in the amount of distortion that is introdu ed. This follows from the randomized onstru tion of the en oder and de oder. Thus, it is suÆ ient to analyze the probability of error aused by an atta ker

X  y~ for some deterministi sequen e y~ with w (~y) = bnD2 . For example, we ould let y~ be 1 for 1  i  bnD2 and 0 otherwise. Thus, we an laim that  Pr n 1 w (Y jP  V (W; k ))  g(D2 + Æ) tends to one for any Æ > 0 and for the orre t

of the form

Y

=

h

i

h

watermark W . Conditioning on this event, the probability that an in orre t watermark will be sele ted by the de oder is given by for

Æ

P bin (ng; D2

suÆ iently small by the de nitions of

R0

+ Æ; (R + R0 )=g), whi h tends to zero

(6.9) and

R

(6.10) and by (6.15). Thus,

the overall probability of error an be made as small as desired by hoosing a large enough blo klength. To on lude the a hievability proof, we note that 2D1 arbitrarily hosen. Thus, any

BinWM (D ; D ) R < Cpub 1 2

g

1 and

 >

0 an be

is a hievable.

Converse In this se tion, we prove that no rates higher than

BinWM (D ; D ) Cpub 1 2

are a hievable for the

binary watermarking game. We do so, as in the private version, by xing the atta ker to ~ 2 < D2 . For this atta ker, the be a binary symmetri hannel with rossover probability D distortion onstraint will be met with arbitrarily high probability for blo klength

n

large

enough. We will further show, using the results of Lemma 2.29, that the apa ity with this BinWM (D ; D ~ 2 ). The onverse is ompleted by noting that this xed atta ker is given by Cpub 1 ~ 2. expression is ontinuous in D

The remainder of this se tion will be devoted to evaluating the apa ity of the following hannel with side information. The side information ve tor Bernoulli(1=2) random variables. The hannel is given by

PY jX;U

8 >

:~

D2

142

~2 D

if y = x if y 6= x

:

U

is a sequen e of IID

Note that the hannel does not depend on the side information. Instead, the side information restri ts the possible inputs sin e the input sequen e must be within distan e D1 of the side information, i.e.,

1

n

wh

(U ; x(W; U ))



D1

a.s.. We have shown in Lemma 2.29 that the

apa ity of this hannel is given by (

C D1

)=

max

V jU ; f :VU7!X ; E [wh (U;X )℄D1

P

( ;Y )

( ; U );

I V

(6.17)

I V

where V is an auxiliary random variable with nite alphabet, and the mutual informations are evaluated with respe t to the joint distribution

PU;V ;X;Y

(u; v; x; y

8> < )= >:0

( )

PU u PV

j (vju)P j U

Y X;U

(yjx; u) if x = f (v; u)

:

otherwise

In order to evaluate (6.17), let us set V = fv0 ; v1 ; v2 g, whi h we will see to be suÆ ient.

Re all that U = X = Y = f0; 1g. Without loss of generality, we an x the fun tion f to be

(

f v; u

8 > > >

> > :

u

if v = v0 if v = v1 : if v = v2

The only other possibility for f would be to set f (v; u) = u  1 for some v, whi h turns out to be suboptimal. We now only need to optimize over

j in order to evaluate C (D1 ). The distortion onstraint requires that (P j (v0 j1) + P j (v1 j0))=2 = D1 , sin e these are the only ases where u and x = f (u; v) di er. In order to simplify the optimization, we V U

also require that

PV

(v 2 ) = 1

g

for some 2D1

PV

U

V U

  1. g

We later hoose the best

g

as in

BinWM the de nition of Cpub (D1 ; D2 ). We now laim that under these onstraints, the optimal

distribution is given by

8 >> >< ( j )= >> >:1

if (u; v) = (0; v1 ) or (1; v0 )

D1

P



j

V U

v u

g

D1

if (u; v) = (0; v0 ) or (1; v1 ) :

g

if v = v2 143

Under this distribution, I (V ; Y )

I (V ; U )

= g  (Hb (D1 =g)

~ 2 )). Hb (D

Thus, the establish-

ment of this laim will omplete the proof of the onverse. In order to bound I (V ; Y )

I (V ; U )

for a generi distribution that satis es the above

onstraints, we will use the following al ulation

j

j

PV (v0 )H (U V = v0 ) + PV (v1 )H (U V = v1 )    PV (v1 )  PV (v0 ) = g Hb PU jV (1 v0 ) + Hb PU jV (0 v1 ) g g   PV (v0 )PU jV (1 v0 ) + PV (v1 )PU jV (0 v1 ) gHb g   D1 = gHb ; g

j

j

j



j

(6.18)

where re all that g = PV (v0 )+ PV (v1 ) and the inequality follows by the on avity of entropy. We an thus bound I (U ; V )

=

H (U )



1

j

H (U V )   D1 gHb g

(1

g )Hb ( );

(6.19)

where  = PU jV (0jv2 ) and the inequality follows by (6.18). We an also bound I (V ; Y )

=

H (Y )



1

j

H (Y V )

gHb (D2 )

where the inequality follows sin e

Y

(1

g )Hb

1

D2

+  (2D2

1)



(6.20)

;

is a binary random variable. Combining (6.19) and

(6.20), we see that I (V ; Y )

 

I (U ; V )    D1 g Hb g    D1 g Hb g





Hb (D2 )

+ (1



Hb (D2 )

;

g)







Hb ( )

Hb

1

D2

+  (2D2

1)



(6.21)

where the se ond inequality follows by maximizing over



(the maximum is a hieved at

= 1=2). The bound (6.21) is a hieved with equality when PV jU is used. This establishes BinWM (D ; D ~ 2 ), whi h ompletes the proof of the onverse. that C (D1 ) = Cpub 1



144

Chapter 7

Con lusions In this thesis, we have de ned the information theoreti apa ity of a watermarking system, and we have found this apa ity in many s enarios. We now omment on some of their interesting aspe ts of these ndings. We on lude in Se tion 7.1.1 with some ideas for future resear h. We have formalized a watermarking model in whi h a mali ious atta ker attempts to prevent reliable transmission of the watermark. We assume that this atta ker knows the workings of both the en oder and de oder (but not a se ret key shared by the en oder and de oder). We also assume that any forgery reated by the atta ker is only useful to him if the distortion between the forgery and stegotext is less than some threshold. Thus, we only

onsider atta kers that meet this distortion onstraint with probability one; we show that the apa ity is zero when the onstraint is enfor ed in expe tation (see Se tion 2.2.3). These assumptions require the watermarking system (both en oder and de oder) to be designed rst so that they are robust against any feasible atta ker. When the overtext has a Gaussian distribution, we have shown that the apa ity is the same in the private and publi versions; see Theorems 2.1 and 2.4. This surprising result demonstrates that the apa ity does not in rease if the de oder an use the overtext. Costa's \writing on dirty paper" [Cos83℄ has this same feature; we gave two extensions of his result in Se tion 2.5.4. Although the apa ity is the same for both versions, the apa ity a hieving oding s heme for the publi version is mu h more omplex than the s heme for the private version; ompare Se tions 4.2 and 4.3 for the SGWM game and Se tions 5.4 and 5.5 for the VGWM game. As one might expe t, the two versions of watermarking 145

do not always yield the same apa ity. For example, in the binary watermarking game of Se tion 2.2.6, there is a di eren e between the two versions. When the overtext is an IID sequen e of Gaussian random variables, we have shown that an optimal lossy ompression atta k prevents rates greater than apa ity from being a hievable. This property would allow designers to test the robustness of their watermarking systems against existing te hnology. Unfortunately, this property does not hold in general. Indeed, for an IID ve tor Gaussian, the ompression atta k is not optimal, and designing for robustness against su h an atta k yields a qualitatively di erent watermarking system; see Se tion 5.7 for more dis ussion. We have seen that the watermarking apa ity in reases with the un ertainty in the

overtext. Indeed, for the SGWM game, the apa ity is in reasing in the varian e of the

overtext; see Figure 2-1. Furthermore, with squared error distortion measures and a xed

overtext varian e, the overtext distribution with the largest apa ity is the Gaussian distribution, whi h also yields the highest entropy for out of all distributions with the same varian e. Intuitively, if the un ertainty in the overtext is large, then the en oder an hide more information in the stegotext sin e the atta ker learns little about the overtext from observing the stegotext. If the atta ker does not take advantage of its knowledge of the stegotext, then this property is not as strong. For example, if the atta ker an only add an arbitrary sequen e (see Se tion 2.2.2 on the additive atta k watermarking game) or an independent random sequen e (see Se tion 2.5.4 on extended writing on dirty paper), then the amount of un ertainty in the overtext has little bearing on the apa ity. In all ases, the watermarking system's knowledge of the overtext should be used to its advantage. It is suboptimal to ignore the en oder's knowledge of the overtext, as some systems do by forming the stegotext by adding the overtext and a sequen e that depends only on the watermark. One te hni al result that might be of general interest is Lemma B.7. There, we onsider the mutual information between a Gaussian random variable and some other random variable, with the se ond order moments xed. We show that this mutual information is maximized if the other random variable is jointly Gaussian with the rst one. 146

7.1

Future Resear h

In this se tion, we o er some dire tions for future resear h that expand on the themes we have presented in the thesis. 7.1.1

Gaussian Sour es with Memory

We would like to nd the apa ity of the watermarking game for squared error distortion and a stationary Gaussian overtext. That is, let us assume that the overtext

U

is a

stationary Gaussian pro ess with ovarian e E[Uj Uk ℄ = tjj kj. We also assume that the

overtext has a nite memory m0 so that tm = 0 for m > m0 . We believe that we an use the results on the ve tor Gaussian watermarking game (see Se tion 2.2.4) to des ribe the apa ity for this overtext distribution. Indeed, for any m, the ve tors U 0j = (Uj (m+m0 )+1 ; : : : ; Uj (m+m0 )+m ), for j = 0; 1; : : : form an IID sequen e of

Gaussian random ve tors with ovarian e matrix T (m) . Here, T (m) is the m  m matrix

(m) = t . We will write the set of eigenvalues of T (m) as f(m) ; 1  k  mg. with Tjk jj kj k The en oder/de oder ould use the oding strategy for the ve tor Gaussian sour e with the

additional restri tion of making the stegotext independent of the overtext at the indi es not used in forming fU 0 g. For example, the en oder ould set xj (m+m0 )+k = 0 for m < k  m + m0 . This restri tion is needed so that the atta ker annot gain any knowledge of

the overtext samples used for en oding the watermark. This restri tion uses some of the en oder's available distortion, but this extra distortion an be made negligible by taking m large enough. Thus, we onje ture that any rate less than the following limit should be a hievable: lim max m!1 D 1 2Dm (m1 )

  min C  D1k ; D2k ; k(m) ; D2 2Dm (m2 )

(7.1)

where the term inside the limit is the normalized apa ity of the ve tor Gaussian water-

marking game with ovarian e T (m) , en oder distortion level mD1 and atta ker distortion level mD2 . Furthermore, we believe that there exist atta kers that guarantee that no rates

larger than (7.1) are a hievable. Su h an atta ker would assume that the overtext is a blo kwise IID sequen e of Gaussian random ve tors. We would also like to simplify the limit (7.1) into a more meaningful expression. We

an use the fa t that the ovarian e matri es T (m) are Toeplitz matri es, and thus we an 147

des ribe the limiting behavior of their eigenvalues (see e.g., [Gra00, GS58℄). This is similar to the approa h that Gallager [Gal68, Se . 8.5℄ takes in des ribing the apa ity of an additive Gaussian noise hannel. 7.1.2

Dis rete Memoryless Covertext

We would like to study the capacity of the public version of a watermarking game with a general discrete memoryless covertext and general distortion constraints. One conjecture is that the general capacity is given by the related mutual information games. In the private version, this is the solution that Moulin and O'Sullivan [MO99, OME98] derive with a maximum-likelihood decoder and average distortion constraints. Furthermore, Somekh-Baruch and Merhav [SBM01a] have recently shown that for the private version with finite alphabets, the private mutual information game also gives the capacity for a fixed decoder and almost sure distortion constraints. In the public version, all of the watermarking capacities that we described in Section 2.2 have coincided with values of the related mutual information games. However, no one has yet given a proof for the general public version.

7.1.3  Deterministic Code Capacity for Public Version

We would like to find the capacity when no secret key is available to the encoder and decoder. We have addressed this for the private version (see Sections 2.4.2 and 6.2), where we have found that the capacity without a secret key is typically the same as with a secret key. However, this result hinges on the fact that the encoder and decoder both have access to the covertext and they essentially use part of its randomness as a secret key. Thus, these arguments do not work in the public version, i.e., when the decoder does not know the covertext. We call the capacity when no secret key is available (and thus the attacker knows the exact encoding and decoding mappings) the deterministic code capacity.

We first show that the deterministic code capacity is in general smaller than the random code capacity. For squared error distortion, if D2 > 4D1, then the attacker can make the forgery into any possible output from the encoder. This implies that the attacker can make the decoder have any possible output as well. Thus, no positive rate is achievable in this regime. Recall, however, that the capacity of the Gaussian watermarking game with randomized codes is positive in this regime for σ_u² large enough. Thus, the deterministic code capacity does not equal the randomized code capacity for the public version.

Additive Attack Watermarking

We now discuss the deterministic code capacity for the additive attack watermarking game with IID Gaussian covertext and squared error distortions. This scenario is similar to the Gaussian arbitrarily varying channel (GAVC), except here the encoder can base his transmitted sequence on the entire Gaussian noise sequence. See Section 2.5.3 for more on the GAVC. Csiszár and Narayan [CN91] studied the deterministic code capacity of the GAVC and found that it is given by

    C_GAVC^Det(D1, D2, σ²) = (1/2) log( 1 + D1/(D2 + σ²) )   if D1 > D2,
    C_GAVC^Det(D1, D2, σ²) = 0                               otherwise.      (7.2)

In other words, the capacity is either the random code capacity or zero, depending on the allowed distortion levels. In particular, the capacity is not continuous in the parameters. We believe that, unlike the GAVC, the deterministic code capacity for the additive attack watermarking game is continuous in the parameters. Further, we believe that there exist values of the parameters such that the deterministic code capacity is non-zero yet strictly less than the random code capacity. While this is not possible for AVCs without constraints, Csiszár and Narayan [CN88b] showed that this is possible for AVCs with input and state constraints.

Our argument for the above claims is briefly as follows. For D2 small enough, we believe that we can construct deterministic codes which achieve the random code capacity for the additive attack watermarking game, namely (1/2) log(1 + D1/D2). Such a code would be similar to the random code of Section 4.3.1, and could be analyzed using techniques from [CN91]. One difference from [CN91] is that we would have to guarantee that each bin has a codeword that has a small inner product with the covertext. For any coding strategy of this form, the critical distortion level D2 will be determined by the energy in the covertext which is in the direction of the correct codeword. We believe that by increasing the number of codewords in each bin, we can increase this energy at the expense of overall rate. Thus, the achievable rates for this coding strategy should decrease continuously to zero as D2 increases, instead of discontinuously as in the GAVC. Besides analyzing such a coding strategy, we also need a converse to show that no higher rates are achievable.
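The piecewise capacity (7.2) is easy to tabulate numerically. The sketch below (function and variable names are ours, not the thesis's) makes the discontinuity at D1 = D2 visible.

```python
import math

def c_det_gavc(d1, d2, sigma2):
    """Deterministic-code capacity of the GAVC as given in (7.2):
    the random-code value if d1 > d2, and zero otherwise."""
    if d1 > d2:
        return 0.5 * math.log(1.0 + d1 / (d2 + sigma2))
    return 0.0

# The discontinuity in the parameters: just above d1 = d2 the capacity
# jumps to the full random-code value instead of growing from zero.
below = c_det_gavc(0.999, 1.0, 4.0)   # d1 < d2 -> capacity 0
above = c_det_gavc(1.001, 1.0, 4.0)   # d1 > d2 -> about 0.5*log(1.2)
```

The jump from `below` to `above` under an infinitesimal change of D1 is exactly the discontinuity that the additive attack watermarking game is conjectured not to exhibit.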

7.1.4  Multiple Rate Requirements

We now consider a watermarking model where the amount of information that can be recovered depends on the distortion introduced by the attacker. For example, let there be two independent watermarks, W_h of R_h bits and W_l of R_l bits, and two attacker distortion levels D_{2,h} > D_{2,l}. The encoder will produce the stegotext as a function of the covertext and both watermarks such that the distortion between the stegotext and covertext is less than D1. If the distortion between the forgery and stegotext is D_{2,h}, then the decoder should be able to recover W_h. However, if the distortion between the forgery and stegotext is D_{2,l}, then the decoder should be able to recover both W_h and W_l. The main question is what rate pairs are achievable for given values of D_{2,h} and D_{2,l}. (Or conversely, what distortion pairs are allowable for given values of R_h and R_l.) This problem can be thought of as a broadcast channel with degraded message set, see e.g., [Cov75, KM77]. However, the broadcast channel here is arbitrarily varying as in [Jah81].

Let us consider this example for an IID Gaussian covertext (zero-mean, variance-σ²) and squared error distortion. Using the results of Theorem 2.1, we can say that both (R_h, R_l) = (C*(D1, D_{2,h}, σ²), 0) and (R_h, R_l) = (0, C*(D1, D_{2,l}, σ²)) are achievable. However, it is not immediately clear that any linear combination of these rate pairs is achievable using the usual time-sharing argument. Indeed, it seems that in order to effectively time-share against the attacker, both codes will have to use the same stegotext power (i.e., the same value of A). The optimal value of A depends on the attacker's distortion level, and hence for any common value of A that the two codes choose, at least one of the codes will be transmitting below capacity. On the positive side, for any value of A and any 0 ≤ λ ≤ 1, the following rate pairs are achievable:

    (R_h, R_l) = ( λ · (1/2) log(1 + s(A, D1, D_{2,h}, σ²)),  (1 − λ) · (1/2) log(1 + s(A, D1, D_{2,l}, σ²)) ).   (7.3)

This follows since the encoder and decoder can randomly decide on λn locations for the high distortion code, with the remaining positions for the low distortion code. Since the two codes are using the same stegotext power, the attacker cannot focus his distortion on either code. The question remains as to how much better, if any, we can do than this simple time-sharing strategy.
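The rate pairs of (7.3) can be traced out numerically for a common stegotext power A. This is a sketch under our own naming (the parameter values are illustrative, not from the thesis):

```python
import math

def rate_pair(lam, A, d1, d2_h, d2_l, s2):
    """Time-sharing rate pair of (7.3): a fraction lam of the positions carries
    the high-distortion code, the rest the low-distortion code, both using the
    same stegotext power A."""
    def s(d2):  # s(A, D1, D2, sigma^2) of (A.5)
        delta = 0.5 * (A - s2 - d1)
        return (1.0 - d2 / A) * (d1 - delta * delta / s2) / d2
    r_h = lam * 0.5 * math.log(1.0 + s(d2_h))
    r_l = (1.0 - lam) * 0.5 * math.log(1.0 + s(d2_l))
    return r_h, r_l

# Endpoints: lam = 1 sends only the robust watermark, lam = 0 only the fragile one.
A, d1, d2_h, d2_l, s2 = 6.5, 1.0, 4.0, 2.0, 4.0
full_h, _ = rate_pair(1.0, A, d1, d2_h, d2_l, s2)
_, full_l = rate_pair(0.0, A, d1, d2_h, d2_l, s2)
```

Sweeping `lam` between 0 and 1 traces the straight line between the two endpoints, which is exactly the time-sharing region that the open question asks whether one can beat.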

Appendix A

Definitions for Gaussian Covertext

In this section, we present many of the definitions that are used with Gaussian covertexts (i.e., Chapters 3, 4, and 5). We also discuss some of the basic properties of some of the mappings.

We now summarize the definitions that are used for all of the chapters with Gaussian covertexts. Recall that

    δ(A, D1, σ²) = (1/2)(A − σ² − D1),   (A.1)

    b1(A, D1, σ²) = 1 + δ(A, D1, σ²)/σ²,   (A.2)

    b2 = b2(A, D1, σ²) = D1 − δ²(A, D1, σ²)/σ²,   (A.3)

    c(A, D2) = 1 − D2/A,   (A.4)

    s(A, D1, D2, σ²) = c(A, D2) b2(A, D1, σ²) / D2,   (A.5)

    α(A, D1, D2, σ²) = 1 − b1(A, D1, σ²)/(1 + s(A, D1, D2, σ²)),   (A.6)

    A(D1, D2, σ²) = { A : max{ D2, (σ − √D1)² } ≤ A ≤ (σ + √D1)² },   (A.7)

and, finally,

    C*(D1, D2, σ²) = max_{A ∈ A(D1, D2, σ²)} (1/2) log( 1 + s(A, D1, D2, σ²) )   if A(D1, D2, σ²) ≠ ∅,
    C*(D1, D2, σ²) = 0                                                            otherwise.           (A.8)
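The capacity expression (A.8) can be evaluated numerically from these definitions. The sketch below uses our own helper names and a fine grid search over A in place of the exact maximization; it reproduces, for example, that the capacity vanishes when the interval (A.7) collapses.

```python
import math

def delta(A, d1, s2):          # (A.1)
    return 0.5 * (A - s2 - d1)

def b2(A, d1, s2):             # (A.3)
    return d1 - delta(A, d1, s2) ** 2 / s2

def c(A, d2):                  # (A.4)
    return 1.0 - d2 / A

def s(A, d1, d2, s2):          # (A.5)
    return c(A, d2) * b2(A, d1, s2) / d2

def a_interval(d1, d2, s2):    # (A.7)
    lo = max(d2, (math.sqrt(s2) - math.sqrt(d1)) ** 2)
    hi = (math.sqrt(s2) + math.sqrt(d1)) ** 2
    return (lo, hi) if lo <= hi else None

def c_star(d1, d2, s2, grid=20000):   # (A.8), grid search over A
    iv = a_interval(d1, d2, s2)
    if iv is None:
        return 0.0
    lo, hi = iv
    best = 0.0
    for i in range(grid + 1):
        A = lo + (hi - lo) * i / grid
        best = max(best, 0.5 * math.log(1.0 + s(A, d1, d2, s2)))
    return best
```

On the interior of the interval (A.7) both c(A, D2) and b2(A, D1, σ²) are non-negative, so the logarithm's argument never drops below one and the search is well defined.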

[Figure A-1 appears here: six panels plotting C*(D1, D2, σ²) (top row) and A*(D1, D2, σ²) (bottom row), each against one of D1, D2, and σ² with the other two parameters fixed; the panel titles indicate the fixed values D1 = 1, D2 = 9, σ² = 4.]

Figure A-1: Example plots of C*(D1, D2, σ²) and A*(D1, D2, σ²) for different parameter values.

The function C*(D1, D2, σ²) is the capacity of the scalar Gaussian watermarking game (Theorem 2.1) and the value of the Gaussian mutual information game (Theorem 3.1); it also plays a critical role in the capacity of the vector Gaussian watermarking game (Theorem 2.4). In Figure A-1, we have plotted C*(D1, D2, σ²) against each of its three arguments. We have also plotted the maximizing A in (A.8) along with the lower and upper limits in the definition of A(D1, D2, σ²). We see that C*(D1, D2, σ²) is non-decreasing in D1 and σ² and non-increasing in D2. Further, note that C*(D1, D2, σ²) is neither convex nor concave in D1 and σ² (there are points of inflection at D1 ≈ 8.2 and σ² ≈ 6.9 in the first and third plots of Figure A-1). However, C*(D1, D2, σ²) is convex in D2 (this follows since (1/2) log(1 + s(A, D1, D2, σ²)) is convex in D2).

In the following lemma, we describe the A that achieves the maximum in the definition of C*(D1, D2, σ²).

Lemma A.1. If A(D1, D2, σ²) is non-empty, then the maximum in (A.8) is achieved by the unique A* with max{σ² + D1, D2} < A* < (σ + √D1)² such that p(A*, D1, D2, σ²) = 0, where

    p(A, D1, D2, σ²) = A³ − (D1 + σ² + D2/2) A² + (D2/2)(σ² − D1)².

Proof. First note that, independent of D2, it is sufficient to consider only A such that σ² + D1 ≤ A ≤ (σ + √D1)². This follows since for any (σ − √D1)² ≤ A ≤ σ² + D1 there exists a σ² + D1 ≤ A′ ≤ (σ + √D1)² such that s(A′, D1, D2, σ²) ≥ s(A, D1, D2, σ²), independently of D2. Namely, let A′ = 2(σ² + D1) − A.

Since log(x) is monotonically increasing in x, the maximizing A in (A.8) is also the A ∈ A(D1, D2, σ²) that maximizes the product c(A, D2) b2(A, D1, σ²). We can calculate that

    ∂/∂A [ c(A, D2) b2(A, D1, σ²) ] = − p(A, D1, D2, σ²) / (2A²σ²),

and

    ∂²/∂A² [ c(A, D2) b2(A, D1, σ²) ] = − (1/(2σ²)) ( 1 − D2(σ² − D1)²/A³ ).   (A.9)

Since A ≥ D2 and A ≥ σ² + D1 > |σ² − D1|, the RHS of (A.9) is negative, and hence c(A, D2) b2(A, D1, σ²) is strictly concave in A. Thus, there can be at most one local extremum in A(D1, D2, σ²), which would also be the maximizing value. There is exactly one local extremum since there exists an A* ∈ A(D1, D2, σ²) such that p(A*, D1, D2, σ²) = 0. This follows since p(A, D1, D2, σ²) is continuous in A; since

    p(σ² + D1, D1, D2, σ²) = −2 D1 D2 σ² < 0;

since, if σ² + D1 < D2 < (σ + √D1)², then

    p(D2, D1, D2, σ²) = D2 ( D2²/2 − (σ² + D1) D2 + ((σ² + D1)² − 4σ²D1)/2 ) < 0,

where the inequality follows by the above assumption; and since, if D2 < (σ + √D1)², then

    p( (σ + √D1)², D1, D2, σ² ) = 2σ√D1 (σ + √D1)² ( (σ + √D1)² − D2 ) > 0,

where the inequality follows from the above assumption. Thus, p changes sign on the stated interval and, by continuity, has a root there.
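Lemma A.1 lends itself to a numerical cross-check: the root of the cubic p in the stated interval should coincide with a brute-force maximization of the product c(A, D2) b2(A, D1, σ²). A sketch with our own helper names:

```python
import math

def cb2(A, d1, d2, s2):
    """The product c(A, D2) * b2(A, D1, sigma^2) maximized in Lemma A.1."""
    delta = 0.5 * (A - s2 - d1)
    return (1.0 - d2 / A) * (d1 - delta * delta / s2)

def p(A, d1, d2, s2):
    """The cubic of Lemma A.1."""
    return A ** 3 - (d1 + s2 + d2 / 2.0) * A ** 2 + (d2 / 2.0) * (s2 - d1) ** 2

def root_of_p(d1, d2, s2):
    """Bisection on (max{s2 + d1, d2}, (sqrt(s2) + sqrt(d1))^2), where p changes sign."""
    lo = max(s2 + d1, d2)
    hi = (math.sqrt(s2) + math.sqrt(d1)) ** 2
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if p(mid, d1, d2, s2) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

d1, d2, s2 = 1.0, 4.0, 4.0
a_star = root_of_p(d1, d2, s2)

# Brute-force argmax of cb2 over the same interval.
lo = max(s2 + d1, d2)
hi = (math.sqrt(s2) + math.sqrt(d1)) ** 2
a_grid, best = lo, cb2(lo, d1, d2, s2)
for i in range(1, 100001):
    A = lo + (hi - lo) * i / 100000
    val = cb2(A, d1, d2, s2)
    if val > best:
        a_grid, best = A, val
```

The test below also confirms the boundary value p(σ² + D1) = −2 D1 D2 σ² used in the proof.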



Appendix B

Technical Proofs

In this appendix, we prove many of the technical claims that have been given throughout the thesis. In each section, we repeat the statement of the theorem or lemma to be proved, followed by the proof. A reference to the theorem or lemma being proved is given in the title of each section as well as in parentheses at the beginning of each restatement.

B.1  Proof of Theorem 2.3

Theorem B.1 (2.3). For the watermarking game with real alphabets and squared error distortion, if the covertext U satisfies lim inf_{n→∞} E[n⁻¹‖U‖²] < ∞ and the attacker is subject only to an average (rather than almost sure) distortion constraint, then no positive rate is achievable.

Proof. Let σ̃² = lim inf_{n→∞} E[n⁻¹‖U‖²], and let ã_n = E[n⁻¹‖X‖²] denote the average power of the stegotext at blocklength n. By the encoder's distortion constraint and the triangle inequality,

    √(ã_n) ≤ √(E[n⁻¹‖U‖²]) + √(D1).   (B.1)

By the definition of the limit inferior, for any ε > 0 and any integer n0 > 0 there exists some n > n0 such that

    ã_n ≤ Amax,   (B.2)

where Amax = (σ̃ + ε + √D1)². Let the attack key θ2 take on the value 0 with probability p, and take on the value 1 with probability 1 − p, where

    p = min{ D2/Amax, 1 }.   (B.3)

For the blocklength n, consider now the attacker

    g̃_n(x, θ2) = θ2 x   (B.4)

that with probability p produces the all-zero forgery, and with probability 1 − p does not alter the stegotext at all. Irrespective of the rate (as long as 2^{nR} > 1) and of the version of the game, this attacker guarantees a probability of error of at least p/2. It remains to check that g̃_n(x, θ2) satisfies the average distortion constraint. Indeed, the average distortion introduced by g̃_n is given by

    n⁻¹ E[ ‖X − g̃_n(X, θ2)‖² ] = p · n⁻¹ E[ ‖X‖² ] ≤ p Amax ≤ D2,
where the equality follows from (B.4), the subsequent inequality by (B.1) and (B.2), and the last inequality by (B.3). B.2

Proof of Lemma 2.1

Lemma B.1 (2.1).

For the ommuni ation with side information model with nite alpha-

bets, if the side information is available non- ausally to the en oder only and the en oder is required to satisfy

1 n

Pn

i=1

(

d1 ui ; xi

)  D , a.s., for some non-negative fun tion d (; ). 1

1

Then, the apa ity is given by NCSI

Cpub

(D ) = 1

P

max

V jU ; f :VU7!X ;



E [d1 (U;X )℄

where

V

( ; )

I V Y

( ; )

I V U ;

(B.5)

D1

is an auxiliary random variable with nite alphabet, and the mutual informations

are evaluated with respe t to the joint distribution (2.30).

156

The a hievability part follows dire tly from the proof of Gel'fand and Pinsker [GP80℄. ~ 1 < D1 . We simply hoose a P j and a fun tion f : V  U 7! X su h that E [d1 (U; X )℄  D

Proof.

V U

We then use the same oding strategy as in [GP80℄. The distortion between the side ~ 1 . By hoosing n large information and the transmitted sequen e will be approximately D enough, we an ensure that this distortion ex eed D1 with arbitrarily small probability. The NCSI (D1 ) is ontinuous in D1 . Furthermore, a hievability proof is ompleted by noting that Cpub

it is non-de reasing and onvex in

D1

; see [BCW00℄. Combining the onverse of Gel'fand

and Pinsker [GP80℄ with the usual onverse for hannels with input onstraints (see e.g., [CT91℄[Se t. 10.2℄), we an show that no rates greater than max

( ;Y )

( ; U)

I V

PV jU ; PX jV;U

(B.6)

I V



E [d1 (U;X )℄

D1

are a hievable. Thus, we only need to show that (B.6) is equal to the RHS of (B.5), the proposed apa ity expression. Gel'fand and Pinsker [GP80℄ showed this equivalen e without the distortion onstraint. However, their proof does not arry through to this ase. Their basi idea is that I (V ; Y ) Thus, a general PX

j

V ;U

PX

j

V ;U

( ; U ) is onvex in

I V

PX

j

V;U

for all other distributions xed.

, whi h an be written as a onvex ombination of deterministi

's, will always be dominated by a deterministi

PX

j

V ;U

. However, a general

PX

j

V ;U

that satis es the distortion onstraint might not be a onvex ombination of deterministi PX

j

V ;U

's that also satisfy the distortion onstraint.

We now prove that (B.6) is equal to the RHS of (B.5). We make the assumption that

X, U

and

Y are nite.

V is nite, whi h Gel'fand and Pinsker show

We also assume that

to be suÆ ient (this does not hange with the distortion onstraint). Without loss of generality, there exists some

x

and s. If no su h

fun tions f1 ; : : : ; f : U n

PX

v0

7! X

j

V ;U

v0

2V

su h that 0

j

< PX

V ;U

(xjv0 ; u)


< j~ ) = > :1

( j~ ) if v~ 2 Vnfv0 g

PX jV;U x v ; u

(

~ ;U x v ; u X jV

and

8 > < (~j ) = > :

PV

P~

V jU

if v~ = v

fx=fi (u)g

jU

v u

i PV

(~v ju)

i

2V

if v~ 2 Vnfv0 g

(v0 ju) if v~ = v

jU

0

0

i

2V

(B.8)

;

0

(B.9)

:

0

We now ompare the original joint distribution between V , U , X and Y with the new joint distribution V~ , U , X and Y . We laim that I (V~ ; Y ) I (V~ ; U )  I (V ; Y ) I (V ; U ) and the expe ted distortion is the same under both distributions, whi h will omplete the proof of the laim. This follows sin e we an repeat this pro ess until there is no su h v0 . We rst note that the joint distribution on U , X , and Y is the same under both distributions. That is,

X

PY

jX;U

(yjx; u)P

~ ;U X jV

(xjv~; u)P ~ (~vju)P (u) U

V jU

~ v ~2V

=

X

(

j

)

(

j

)

PY jX;U y x; u PX jV;U x v; u PV

jU

(vju)P (u): U

v 2V

In parti ular, both

( ) and

)℄ are una e ted by the hange in distribution. Se ond, we onsider a joint distribution between V , V~ and U de ned by H Y

P~

V jV;U

[ (

E d U; X

8 >

: 1 v v; u

fv ~=v g

i

fv =v0 g

if v~ 2 Vnfv0 g if v~ = v

0

i

2V

;

0

whi h is onsistent with both joint distributions. Under this distribution, the random variables U , V and V~ form a Markov hain. Thus, by the data pro essing inequality, ~ ; U ). We nally note that I (V ; U )  I (V PY

jV

(yjv0 ) =

X n

i P

~ Y jV

(yjv ); 0

i

i=1

whi h follows by the de nitions (B.7), (B.8) and (B.9). We an thus show, using the 158

on avity of entropy, that

j  H (Y jV ) and thus I (V~ ; Y )  I (V ; Y ) (sin e H (Y ) is

~) H (Y V

the same for both distributions). These three observations nish the proof of the laim.

B.3

Proof of Lemma 3.1

Lemma B.2 (3.1).

sup

X U 2D1 (D1 ;PU ) PV U X

P

j

j

P

For any

n>

0

and any overtext distribution

inf

Y X 2D2 (D2 ;PU ;PX U ) j

j

PU ,

Ipub (PU ; PX jU ; PV jU ;X ; PY jX )



;

sup

inf

X U 2D1 (D1 ;PU ) PY X 2D2 (D2 ;PU ;PX U )

P

j

j

j

Ipriv (PU ; PX jU ; PY jX ):

We rst show following Chen [Che00℄ that for arbitrary distributions PU , PX jU , PV jU X , and PY jX the mutual information terms satisfy Ipriv  Ipub . All of the below

Proof. ;

mutual information terms are evaluated in terms of these distributions. We will assume that

Ipriv

is nite, sin e otherwise the laim is trivial. We an write that Ipriv (PU ; PX jU ; PY jX )

X ; Y jU ) I (V ; Y jU )  I (V ; U ; Y ) I (V ; U )  I (V ; Y ) I (V ; U )

=

n

1

I(



n

1

=

n

1



n

1

=

Ipub (PU ; PX jU ; PV jU ;X ; PY jX )

where (B.10) follows by the data pro essing inequality (see e.g. [CT91℄) be ause

(B.10) (B.11)

V

and Y

are onditionally independent given (X ; U ); and where (B.11) follows by the hain rule for mutual informations. We next show that the values of the mutual information games also behave as desired.

 and P  Fix n and  > 0 and let PX jU V jU X be distributions that are within  of the supremum ;

159

in (3.6). Thus, sup inf

XU Y X

P

j

P

j



Ipriv (PU ; PX jU ; PY jX )

 

inf

P

YX j

inf

P

YX j



Ipriv (PU ; PX jU ; PY jX )



sup

P



Ipub (PU ; PX jU ; PV jU ;X ; PY jX )

inf

X U ;PV U X PY X j

j

j

;

Ipub (PU ; PX jU ; PV jU ;X ; PY jX )

;

where the se ond inequality follows by the pre eding paragraph and the nal inequality

 and P  follows by our hoi e of PX jU V jU X . The lemma follows sin e as small as desired. ;

B.4

0 an be hosen

 >

Proof of Lemma 3.2

Lemma B.3 (3.2).

For any n > 0, any overtext distribution PU , any watermarking han-

nel PX jU , and any xed distortion 

D2 > An

Ipriv PU ; PX jU ; (PY jnX )n A





 Ipriv

(P ) ; (P j ) ; (P j ) An

G n U

n



Y X

X U

1 = log 1 + s(A ; D1 2 n



An

n

;n



2

; D2 ; u;n ) ;







where 2 = E U n 1 kU k2 ; D1 = E U X U [n 1 kX U k2 ℄; A = E U X U n 1 kX k2 ; P denotes a zero-mean Gaussian distribution of varian e  2 ; P j is the watermarking ;n

P

u;n

P

P

n

j

P

P

j

An

G U

u;n

hannel des ribed in Se tion 3.2.2 for the parameters

X U

2 , D u;n 1;n

atta k hannel des ribed in Se tion 3.2.1 for the parameters

D2

and

and

An ;

and

A

PY jnX

is the

An .

Proof. The proof is organized as follows. In Lemma B.4, we show that a Gaussian covertext distribution and a jointly Gaussian watermarking channel maximize the mutual information term of interest. Using this result and some basic mutual information manipulations, we then complete the proof.

Lemma B.4. Let P_{U,X} be an arbitrary distribution with covariance matrix K_{UX}, and let P*_{U,X} be a jointly Gaussian distribution of covariance matrix K*_{UX} = K_{UX}. Then,

    I_priv(P_U, P_{X|U}, P_{Y|X}^A) ≤ I_priv(P*_U, P*_{X|U}, P_{Y|X}^A),

where P_{Y|X}^A is defined in Section 3.2.1 and A > D2 is arbitrary.

Proof.

Recall that under the attack channel P_{Y|X}^A, the random variables Y and X are related by Y = cX + S2, where c = c(A, D2) (defined in (A.4)) and S2 is a zero-mean, variance-cD2 Gaussian random variable independent of X. Thus,

    h_{P_U P_{X|U} P_{Y|X}^A}(Y|X) = h(S2) = h_{P*_U P*_{X|U} P_{Y|X}^A}(Y|X),   (B.12)

where these and the below differential entropies exist by the structure of the attack channel under consideration.

Let γU be the linear minimum mean squared-error estimator of Y given U. Note that γ depends on second-order statistics only, so that its value under P*_{U,X} is the same as under P_{U,X}. Thus,

    h_{P_U P_{X|U} P_{Y|X}^A}(Y|U) = h_{P_U P_{X|U} P_{Y|X}^A}(Y − γU | U)
      ≤ h_{P_U P_{X|U} P_{Y|X}^A}(Y − γU)
      ≤ (1/2) log( 2πe E_{P_U P_{X|U} P_{Y|X}^A}[(Y − γU)²] )
      = (1/2) log( 2πe E_{P*_U P*_{X|U} P_{Y|X}^A}[(Y − γU)²] )
      = h_{P*_U P*_{X|U} P_{Y|X}^A}(Y|U),   (B.13)

where the first inequality follows since conditioning reduces entropy, the second inequality follows since a Gaussian distribution maximizes entropy subject to a second moment constraint, and (B.13) follows since under P*_U, P*_{X|U} and P_{Y|X}^A the random variables U and Y are jointly Gaussian, and hence Y − γU is Gaussian and independent of U.

Combining (B.12) and (B.13) with the definition of I_priv (see (3.1)) completes the proof of Lemma B.4.

To continue with the proof of Lemma 3.2, if under P*_U and P*_{X|U} the random variables U and X are zero-mean and jointly Gaussian, then

    I_priv(P*_U, P*_{X|U}, P_{Y|X}^A) = (1/2) log( 1 + c(A, D2) b2(E[X²], E[(X − U)²], E[U²]) / D2 ),   (B.14)

where b2(·, ·, ·) is defined in (A.3) and A > D2. Note that b2, and hence the whole expression (B.14), is concave in the triple (E[U²], E[(X − U)²], E[X²]), as can be verified by checking that the Hessian is non-positive definite. We can now compute that

    I_priv( P_U, P_{X|U}, (P_{Y|X}^{A_n})^n )
      ≤ (1/n) Σ_{i=1}^n I_priv( P_{U_i}, P_{X_i|U_i}, P_{Y|X}^{A_n} )
      ≤ (1/n) Σ_{i=1}^n (1/2) log( 1 + c(A_n, D2) b2(E[X_i²], E[(X_i − U_i)²], E[U_i²]) / D2 )
      ≤ (1/2) log( 1 + c(A_n, D2) b2(A_n, D_{1,n}, σ_{u,n}²) / D2 )
      = (1/2) log( 1 + s(A_n, D_{1,n}, D2, σ_{u,n}²) ),

where the first inequality follows by the chain rule and since conditioning reduces entropy, the second inequality follows by Lemma B.4 and by (B.14), the third inequality follows by the above discussed concavity of (B.14), and the final equality follows by the definition of s(·, ·, ·, ·) (A.5). We obtain equality in each of the above inequalities when P_U = (P_U^G)^n and P_{X|U} = (P_{X|U}^{A_n})^n. This completes the proof of Lemma 3.2.
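The attack channel used above admits a quick consistency check: with Y = cX + S2, c = c(A, D2), and Var(S2) = c(A, D2) D2, the distortion E[(Y − X)²] = (1 − c)²A + cD2 comes out to exactly D2 whenever E[X²] = A. A sketch (variable names ours):

```python
def attack_distortion(A, d2):
    """E[(Y - X)^2] for Y = c*X + S2 with c = 1 - d2/A and Var(S2) = c*d2,
    when E[X^2] = A (the attack channel of Section 3.2.1)."""
    c = 1.0 - d2 / A
    return (1.0 - c) ** 2 * A + c * d2

checks = [attack_distortion(A, d2) for A, d2 in [(6.5, 4.0), (9.0, 2.0), (100.0, 7.0)]]
```

Algebraically, (d2/A)²·A + (1 − d2/A)·d2 = d2²/A + d2 − d2²/A = d2, so the attacker spends its full distortion budget for every stegotext power A > D2.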

B.5

Proof of Lemma 3.3

Lemma B.5 (3.3). Consider an IID zero-mean, variance-σ_u² Gaussian covertext (denoted (P_U^G)^n) and fixed distortions D1 and D2. If P_{Y|X} satisfies E_{(P_U^G P_{X|U}^A)^n P_{Y|X}}[ n⁻¹‖Y − X‖² ] ≤ D2, then for all A ∈ A(D1, D2, σ_u²),

    I_pub( (P_U^G)^n, (P_{X|U}^A)^n, (P_{V|U,X}^A)^n, P_{Y|X} )
      ≥ I_pub( (P_U^G)^n, (P_{X|U}^A)^n, (P_{V|U,X}^A)^n, (P_{Y|X}^A)^n )
      = (1/2) log( 1 + s(A, D1, D2, σ_u²) ).   (B.15)

Here, P_{X|U}^A and P_{V|U,X}^A are the watermarking channels described in Section 3.2.2 for the parameters σ_u², D1 and A, and P_{Y|X}^A is the attack channel described in Section 3.2.1 for the parameters D2 and A.

Proof. For every A ∈ A(D1, D2, σ_u²), consider the one-dimensional optimization based on the watermarking channel described in Section 3.2.2:

    M(D2, A) = inf_{P_{Y|X} ∈ D2(D2, P_U^G, P_{X|U}^A)} I_pub( P_U^G, P_{X|U}^A, P_{V|U,X}^A, P_{Y|X} ).   (B.16)

In Lemma B.6, we derive some properties of M(D2, A), which we subsequently use to show that M(D2, A) is a lower bound on the LHS of (B.15). In Lemma B.7, we show that when computing M(D2, A) we only need to consider attack channels that make the random variables Y and V jointly Gaussian. We finally use this claim to compute M(D2, A).

Lemma B.6. For a fixed A, the function M(D2, A) defined in (B.16) is convex and non-increasing in D2.

Proof. The function M(D2, A) is non-increasing in D2 since increasing D2 only enlarges the feasible set D2(D2, P_U^G, P_{X|U}^A). To show that M(·, A) is convex in D2, we first note that

    I_pub( P_U, P_{X|U}, P_{V|U,X}, P_{Y|X} ) = I(V; Y) − I(V; U)

is convex in P_{Y|X}. Indeed, I(V; U) does not depend on P_{Y|X}, and I(V; Y) is convex in P_{Y|V} and hence also convex in P_{Y|X}, since the random variables V, X and Y form a Markov chain.

Given the parameters A, D_r, D_s, and ε > 0, let the attack channels P_{Y|X}^r ∈ D2(D_r, P_U^G, P_{X|U}^A) and P_{Y|X}^s ∈ D2(D_s, P_U^G, P_{X|U}^A) be such that

    I_pub( P_U^G, P_{X|U}^A, P_{V|U,X}^A, P_{Y|X}^r ) ≤ M(D_r, A) + ε,   (B.17)

and

    I_pub( P_U^G, P_{X|U}^A, P_{V|U,X}^A, P_{Y|X}^s ) ≤ M(D_s, A) + ε.   (B.18)

For any 0 ≤ λ ≤ 1, let P_{Y|X}^λ = λ P_{Y|X}^r + λ̄ P_{Y|X}^s, where λ̄ = 1 − λ. We complete the proof with

    M( λD_r + λ̄D_s, A ) ≤ I_pub( P_U^G, P_{X|U}^A, P_{V|U,X}^A, P_{Y|X}^λ )
      ≤ λ I_pub( P_U^G, P_{X|U}^A, P_{V|U,X}^A, P_{Y|X}^r ) + λ̄ I_pub( P_U^G, P_{X|U}^A, P_{V|U,X}^A, P_{Y|X}^s )
      ≤ λ M(D_r, A) + λ̄ M(D_s, A) + ε,

where the first inequality follows since E_{P_U^G P_{X|U}^A P_{Y|X}^λ}[(X − Y)²] ≤ λD_r + λ̄D_s, the second inequality follows by the convexity of I_pub(P_U, P_{X|U}, P_{V|U,X}, ·), and the final inequality follows by (B.17) and (B.18). The claim follows since ε is an arbitrary positive number.

We continue with the proof of Lemma 3.3 by demonstrating that M(D2, A) is a lower bound on the LHS of (B.15). Indeed, if

    P_{Y|X} ∈ D2( D2, (P_U^G)^n, (P_{X|U}^A)^n ),   (B.19)

then

    I_pub( (P_U^G)^n, (P_{X|U}^A)^n, (P_{V|U,X}^A)^n, P_{Y|X} )
      ≥ (1/n) Σ_{i=1}^n I_pub( P_U^G, P_{X|U}^A, P_{V|U,X}^A, P_{Y_i|X_i} )
      ≥ (1/n) Σ_{i=1}^n M( E_{P_U^G P_{X|U}^A P_{Y_i|X_i}}[(Y_i − X_i)²], A )
      ≥ M( E_{(P_U^G P_{X|U}^A)^n P_{Y|X}}[ n⁻¹‖Y − X‖² ], A )
      ≥ M(D2, A),

where the first inequality follows since the watermarking channel is memoryless, by the chain rule, and by the fact that conditioning reduces entropy; the second inequality follows by the definition of M(·, ·); and the final two inequalities follow by Lemma B.6 and by (B.19), so that the expected distortion is less than D2.

To complete the proof of Lemma 3.3, we show that a minimum in the definition of M(D2, A) is achieved by the distribution P_{Y|X}^A of Section 3.2.1. To do so, we first show in Lemma B.7 that we only need to consider conditional distributions P_{Y|X} under which V and Y are jointly Gaussian. A similar lemma was given in a preliminary version of [SVZ98] and in [MS01], but neither proof is as general as the one below.

Lemma B.7. Let V and Z be jointly Gaussian random variables with covariance matrix K_{VZ}. Let Y be another (not necessarily Gaussian) random variable related to V via the covariance matrix K_{VY}. If K_{VY} = K_{VZ}, then I(V; Y) ≥ I(V; Z).

Proof. It suffices to prove the claim when all random variables are zero mean. If I(V; Y) is infinite then there is nothing to prove. Thus, we only consider the case where

    I(V; Y) < ∞.   (B.20)

For the fixed covariance matrix K = K_{VY} = K_{VZ}, let the linear minimum mean squared-error estimator of V given Y be γY. Note that the constant γ is determined uniquely by the correlation matrix K, and thus γZ is also the linear minimum mean squared-error estimator of V given Z. Since the random variables V and Z are jointly Gaussian, this is also the minimum mean squared-error estimator, and furthermore V − γZ is independent of Z. If the conditional density f_{V|Y} exists, then

    I(V; Y) = h(V) − h(V|Y)   (B.21)
      ≥ h(V) − h(V − γY)   (B.22)
      ≥ h(V) − (1/2) log( 2πe E[(V − γY)²] )   (B.23)
      = h(V) − (1/2) log( 2πe E[(V − γZ)²] )   (B.24)
      = I(V; Z)   (B.25)
      = (1/2) log( E[V²] E[Z²] / |K_{VZ}| ),   (B.26)

and the claim is proved. Here, (B.21) follows since we have assumed that a conditional density exists; (B.22) follows since conditioning reduces entropy; (B.23) follows since a Gaussian maximizes differential entropy subject to a second moment constraint; (B.24) follows since K_{VY} = K_{VZ} and hence all second order moments are the same; (B.25) follows since V − γZ is both Gaussian and independent of Z; and (B.26) follows since V and Z are zero-mean jointly Gaussian random variables.

By (B.20), the conditional density f_{V|Y} exists if Y takes on a countable number of values. This follows since (B.20) implies P_{V,Y} ≪ P_V × P_Y, i.e., the joint distribution is absolutely continuous with respect to the product of the marginals. In particular, P_{V|Y}(·|y) ≪ P_V for every y such that P_Y(y) > 0. Furthermore, V is Gaussian and hence P_V ≪ λ, where λ is the Lebesgue measure. Thus, P_{V|Y}(·|y) ≪ λ for every y such that P_Y(y) > 0, and hence the conditional density exists.

To conclude the proof of the claim, we now consider the case where Y does not necessarily take on a countable number of values and I(V; Y) < ∞. This case follows using an approximation argument. For any Δ > 0, let q_Δ : R → {…, −2Δ, −Δ, 0, Δ, 2Δ, …} be a uniform quantizer with cell size Δ, i.e., q_Δ(x) maps x to the closest integer multiple of Δ. Let Y_Δ = q_Δ(Y). By the data processing inequality,

    I(V; Y) ≥ I(V; Y_Δ).   (B.27)

The random variable Y_Δ takes on a countable number of values, and by (B.20) and (B.27), I(V; Y_Δ) < ∞. Thus, the conditional density f_{V|Y_Δ} exists and

    I(V; Y_Δ) ≥ (1/2) log( E[V²] E[Y_Δ²] / |K_{V,Y_Δ}| ).   (B.28)

Since |Y_Δ − Y| ≤ Δ/2, it follows that E[Y_Δ²] → E[Y²] and |K_{V,Y_Δ}| → |K_{VY}| as Δ ↓ 0. Since (B.27) and (B.28) hold for all Δ > 0, the claim follows by letting Δ approach zero.

To continue with the evaluation of M(D2, A), we note that since under the distributions P_U^G, P_{X|U}^A, and P_{V|U,X}^A the random variable V has a Gaussian distribution, the above claim allows us to assume throughout the rest of the proof that the attack channel P_{Y|X} makes the random variables V and Y jointly Gaussian. Recall that the random variables V, X, and Y form a Markov chain. Thus, if we let Y = β1 X + S1, where S1 is a Gaussian random variable independent of X with variance β2² ≥ 0, then we can generate all possible correlation matrices K_{VY} by varying the parameters β1 and β2². Since the mutual information I(V; Y) only depends on the correlation matrix K_{VY}, we can compute the quantity M(D2, A) by only considering such attack channels.

Let P_{Y|X}^{β1,β2} be the attack channel such that the random variable Y is distributed as β1 X + S1, where S1 is a random variable independent of X, which is Gaussian of zero mean and variance β2². Under this distribution,

    E_{P_U^G P_{X|U}^A P_{Y|X}^{β1,β2}}[ (X − Y)² ] = (1 − β1)² A + β2².

We require that P_{Y|X}^{β1,β2} ∈ D2(D2, P_U^G, P_{X|U}^A), and thus

    (1 − β1)² A + β2² ≤ D2,   (B.29)

where the distortion budget is met with equality by β1 = c(A, D2) and β2² = c(A, D2) D2, and where c(·, ·) is defined in (A.4). Thus, if α = α(A, D1, D2, σ_u²), δ = δ(A, D1, σ_u²) and b1 = b1(A, D1, σ_u²) (see Appendix A), then a direct evaluation of I_pub under these distributions yields

    I_pub( P_U^G, P_{X|U}^A, P_{V|U,X}^A, P_{Y|X}^{β1,β2} ) ≥ (1/2) log( 1 + s(A, D1, D2, σ_u²) ),

where the inequality follows by the relevant definitions and by (B.29), with equality achieved when β1 = c(A, D2) and β2² = c(A, D2) D2.

The combination of all of the above arguments shows that Lemma 3.3 is valid. Indeed, the choice of the memoryless watermarking channels (P_{X|U}^A)^n and (P_{V|U,X}^A)^n guarantees a mutual information of at least (1/2) log(1 + s(A, D1, D2, σ_u²)). Furthermore, when these watermarking channels are used, the memoryless attack channel (P_{Y|X}^A)^n is optimal.

B.6

Proof of Lemma 4.3

Lemma B.8 (4.3). For any ε > 0 and δ1 > 0, there exists an integer n2 > 0 such that for all n > n2, Pr( ρ1(Z1, Z2) < 1 − δ1 ) < ε.

Proof. Recall that the attacker has the form given in Section 4.1.2 and that the random variables Z1 and Z2 are defined in (4.15) and (4.16). Thus,

    Z1 = n⁻¹ ‖ Y|_{U⊥} ‖²
       = n⁻¹ ‖ (γ1(X) X + γ2(X))|_{U⊥} ‖²
       = n⁻¹ ‖ γ1(X) C_W(U) + γ2(X)|_{U⊥} ‖²
       ≤ γ1²(X) b2 + γ3(X) + 2 γ1(X) n⁻¹ ⟨ γ2(X)|_{U⊥}, C_W(U) ⟩
       = γ1²(X) b2 + γ3(X) + 2 γ1(X) n⁻¹ ⟨ γ2(X), C_W(U) ⟩,   (B.30)

where the first equality follows from the definition of Z1 (4.15); the second equality from the representation of the forgery as in (4.4); the third equality from the structure of the encoder (4.11); the subsequent inequality from (4.8), the bound ‖γ2(X)|_{U⊥}‖² ≤ ‖γ2(X)‖² and (4.5); and the final equality because C_W(U) ∈ U⊥ (4.9). Similarly, we can show that

    Z2 = γ1(X) b2 + n⁻¹ ⟨ γ2(X), C_W(U) ⟩.   (B.31)

We now argue that the sequence of random variables n⁻¹⟨γ2(X), C_W(U)⟩ approaches zero in probability as n tends to infinity, uniformly over all attackers. First, note that given the stegotext X = x, the random vector C_W(U) is distributed like (b2/A) x + C, where C is uniformly distributed on S^n(0, √(n b3)) ∩ x⊥ and b3 = b2(A − b2)/A. Thus, for any ε′ > 0,

    Pr( n⁻¹ ⟨γ2(X), C_W(U)⟩ > ε′ | X = x )
      = Pr( ⟨ γ2(x)/√(n γ3(x)), C/√(n b3) ⟩ > ε′/√(γ3(x) b3) )
      ≤ Pr( ⟨ γ2(x)/√(n γ3(x)), C/√(n b3) ⟩ > ε′/√(D2 b3) )
      = C_{n−1}( arccos( ε′/√(D2 b3) ) ) / C_{n−1}(π).

Here the first equality follows by the conditional distribution of C_W(U) and the fact that ⟨γ2(x), x⟩ = 0; the subsequent inequality follows from γ3(x) ≤ D2 (see (4.5)); and the final equality follows since C/√(n b3) is uniformly distributed on S^n(0, 1) ∩ x⊥ and since γ2(x)/√(n γ3(x)) also takes value in this set. (Here C_{n−1}(θ) denotes the area of the spherical cap of half-angle θ.) Since the resulting upper bound, which tends to zero, does not depend on x, it must also hold for the unconditional probability.

Combining this fact with (B.30) and (B.31), we see that for any ε2 > 0 there exists some n2 such that

    Pr( Z1 ≤ γ1²(X) b2 + γ3(X) + ε2  and  Z2 ≥ γ1(X) b2 − ε2 ) ≥ 1 − ε   (B.32)

for all n > n2. Since ρ1(z1, z2) is non-increasing in z1 and non-decreasing in z2, it follows that (B.32) implies

    Pr( ρ1(Z1, Z2) ≥ (γ1(X) b2 − ε2) / √( b2 (γ1²(X) b2 + γ3(X) + ε2) ) ) ≥ 1 − ε.   (B.33)

Since n⁻¹‖X‖² = A (4.12), it follows from (4.6) that with probability one

    γ1(X) b2 / √( b2 (γ1²(X) b2 + γ3(X)) ) = √( b2 / (b2 + γ3(X)/γ1²(X)) ) ≥ 1 − δ1.   (B.34)

Thus, we can choose ε2 small enough (and the corresponding n2 large enough) so that (B.32) will imply via (B.33) and (B.34) that Pr( ρ1(Z1, Z2) < 1 − δ1 ) < ε for all n > n2.

Thus, we an hoose 2 small enough (and the orresponding n2 large enough) so that (B.32) will imply via (B.33) and (B.34) that Pr ( 1 (Z1 ; Z2 )  1
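The uniform convergence above ultimately rests on a concentration-of-measure fact: a uniformly chosen direction on a high-dimensional sphere is nearly orthogonal to any fixed vector, which is what the ratio of cap areas $C_{n-1}(\arccos(\cdot))/C_{n-1}(\pi)$ quantifies. The following Monte Carlo sketch (not part of the thesis; the dimension, trial count, and seed are arbitrary illustrative choices) shows the effect:

```python
import math
import random

def random_unit_vector(n, rng):
    """Uniform direction on the sphere S^{n-1}, obtained by normalizing an IID Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

rng = random.Random(0)
n, trials = 1000, 200
# Inner product of each random direction with the fixed direction e_1;
# for a uniform direction this is typically on the order of 1/sqrt(n), here about 0.03.
worst = max(abs(random_unit_vector(n, rng)[0]) for _ in range(trials))
print(worst)
```

Even the largest of the 200 sampled inner products stays far below any fixed threshold, mirroring the vanishing cap-area bound.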

B.7

Proof of Lemma 4.6

Lemma B.9 (4.6). Given $X = x$ and $Z = z$, the random vector $V^W(U)$ is uniformly distributed over the set

$$\mathcal{V}(x, z) = \Bigl\{a_1 x + v \,:\, v \in S^n\bigl(0, \sqrt{n a_2}\bigr) \cap x^\perp\Bigr\}, \quad\text{where}\quad a_1 = \frac{\sigma_v^2 + (1-\alpha)z}{n^{-1}\|x\|^2} \quad\text{and}\quad a_2 = \frac{(1-\alpha)^2\bigl(\sigma_u^2\sigma_v^2 - z^2\bigr)}{n^{-1}\|x\|^2}.$$

Proof. Conditional on the covertext $U = u$ and on $Z = z$, the auxiliary codeword $V^W(U)$ is uniformly distributed over the set

$$\mathcal{V}'(u, z) = \bigl\{v \,:\, n^{-1}\|v\|^2 = \sigma_v^2 \ \text{and}\ n^{-1}\langle v, u\rangle = z\bigr\},$$

as follows by the definition of $Z$ (4.40) and the distribution of the codebook $\{V_{j,k}\}$. Using the deterministic relation (4.37) we can now relate the appropriate conditional densities as

$$f_{V^W(U)|X, Z}\bigl(v \mid X = x, Z = z\bigr) = f_{V^W(U)|U, Z}\Bigl(v \Bigm| U = \frac{x - v}{1 - \alpha},\ Z = z\Bigr).$$

The proof will be concluded once we demonstrate that, irrespective of $z$, it holds that $v \in \mathcal{V}(x, z)$ if, and only if, $v \in \mathcal{V}'\bigl((x - v)/(1 - \alpha),\ z\bigr)$. Indeed, if $v \in \mathcal{V}(x, z)$, then we can calculate that

$$n^{-1}\|v\|^2 = a_1^2\,n^{-1}\|x\|^2 + a_2 = \sigma_v^2,$$

using the fact that

$$n^{-1}\|x\|^2 = \sigma_v^2 + 2(1-\alpha)z + (1-\alpha)^2\sigma_u^2. \qquad (B.35)$$

Furthermore,

$$n^{-1}\Bigl\langle v, \frac{x - v}{1 - \alpha}\Bigr\rangle = \frac{\sigma_v^2 + (1-\alpha)z - \sigma_v^2}{1 - \alpha} = z,$$

and thus $v \in \mathcal{V}'\bigl((x - v)/(1 - \alpha),\ z\bigr)$. Conversely, if $v \in \mathcal{V}'\bigl((x - v)/(1 - \alpha),\ z\bigr)$, then

$$n^{-1}\Bigl\langle v, \frac{x - v}{1 - \alpha}\Bigr\rangle = \frac{n^{-1}\langle v, x\rangle - \sigma_v^2}{1 - \alpha} = z,$$

and hence $v|_x = a_1 x$. Furthermore,

$$n^{-1}\bigl\|v|_{x^\perp}\bigr\|^2 = n^{-1}\|v\|^2 - n^{-1}\bigl\|v|_x\bigr\|^2 = \sigma_v^2 - a_1^2\,n^{-1}\|x\|^2 = a_2,$$

where we have again used (B.35), and thus $v \in \mathcal{V}(x, z)$.
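The scalar identities behind this proof are easy to sanity-check numerically. The sketch below is illustrative only: the values of $\sigma_u^2$, $\sigma_v^2$, $\alpha$ and $z$ are arbitrary (subject to $z^2 \le \sigma_u^2\sigma_v^2$), and the vectors are represented by a two-dimensional configuration carrying the same per-letter norms and inner products. It verifies (B.35) and the formulas for $a_1$ and $a_2$:

```python
import math

# Arbitrary example parameters; z must satisfy z^2 <= su2 * sv2 (Cauchy-Schwarz).
su2, sv2, alpha, z = 4.0, 1.5, 0.6, 0.9

# 2-D stand-ins for u and v realizing n^{-1}||u||^2 = su2, n^{-1}||v||^2 = sv2, n^{-1}<v,u> = z.
u = (math.sqrt(su2), 0.0)
v = (z / math.sqrt(su2), math.sqrt(sv2 - z**2 / su2))
x = (v[0] + (1 - alpha) * u[0], v[1] + (1 - alpha) * u[1])

norm_x2 = x[0]**2 + x[1]**2
b35 = sv2 + 2 * (1 - alpha) * z + (1 - alpha)**2 * su2   # RHS of (B.35)

a1 = (sv2 + (1 - alpha) * z) / norm_x2
a2 = (1 - alpha)**2 * (su2 * sv2 - z**2) / norm_x2

# Projection coefficient of v onto x, and squared norm of the component orthogonal to x.
proj = (v[0] * x[0] + v[1] * x[1]) / norm_x2
orth2 = (v[0] - proj * x[0])**2 + (v[1] - proj * x[1])**2

ok_b35 = math.isclose(norm_x2, b35)
ok_a1 = math.isclose(proj, a1)
ok_a2 = math.isclose(orth2, a2)
print(ok_b35, ok_a1, ok_a2)  # → True True True
```

The check confirms that the projection of $v$ onto $x$ has coefficient $a_1$ and that the orthogonal component has squared norm $a_2$, exactly as the lemma asserts.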

B.8

Proof of Lemma 4.8

Lemma B.10 (4.8). If the constants defined for the additive attack watermarking game are used to design the sequence of encoders of Section 4.3.1, then for any $\epsilon > 0$ and $\epsilon_2 > 0$, there exists an integer $n_2 > 0$ such that for all $n > n_2$ and for all the deterministic attacks of Section 4.1.1, $\Pr\bigl(\rho_2(Z, Z_3, Z_4) < \rho^*(R_1 + \delta) - \epsilon_2\bigr) < \epsilon$.

Proof. Recall that a deterministic attacker of Section 4.1.1 is specified by a vector $\tilde y$ satisfying (4.3). Fix some $\epsilon_3 > 0$ (to be chosen later) and choose $n_2$ large enough to ensure

$$\Pr(E_1 \cap E_2 \cap E_3) \ge 1 - \epsilon, \qquad \forall n > n_2, \qquad (B.36)$$

where the events $E_1$, $E_2$, and $E_3$ are defined by

$$E_1 = \bigl\{\bigl|2n^{-1}\langle X, \tilde y\rangle\bigr| \le \epsilon_3\bigr\}, \qquad E_2 = \bigl\{\bigl|n^{-1}\langle V^W(U), \tilde y\rangle\bigr| \le \epsilon_3\bigr\}, \qquad E_3 = \bigl\{Z \ge \alpha\sigma_u^2 - \epsilon_3\bigr\}.$$

Note that whenever $\epsilon_3 > 0$, such an $n_2$ can always be found by the union of events bound, because the probability of the complement of each of the events is vanishing uniformly in $\tilde y$, for all $\tilde y$ satisfying (4.3). Indeed, $E_1$ and $E_2$ have vanishing complement probabilities because both $U$ and $V^W(U)$ are uniformly distributed on $n$-spheres (see Lemma 4.5) and since $X = V + (1 - \alpha)U$, and $E_3$ has vanishing complement probability by Lemma 4.7.

Event $E_1$ guarantees that

$$Z_3 = n^{-1}\|X\|^2 + 2n^{-1}\langle X, \tilde y\rangle + n^{-1}\|\tilde y\|^2 \le \sigma_v^2 + 2(1-\alpha)Z + (1-\alpha)^2\sigma_u^2 + \epsilon_3 + D_2, \qquad (B.37)$$

where the equality follows by the definition of $Z_3$ (4.41) and the form of the attacker given in Section 4.1.1, and where the inequality follows by (B.35), (4.3), and the inequality defining $E_1$. From the definition of $Z_4$ (4.43) it follows that $E_2$ guarantees that $Z_4 \ge -\epsilon_3$. Consequently, the intersection $E_1 \cap E_2$ guarantees that

$$\rho_2(Z, Z_3, Z_4) \ge \frac{\sigma_v^2 + (1-\alpha)Z - \epsilon_3}{\sigma_v\sqrt{\sigma_v^2 + 2(1-\alpha)Z + (1-\alpha)^2\sigma_u^2 + \epsilon_3 + D_2}}. \qquad (B.38)$$

For any $\epsilon_3 > 0$, the RHS of (B.38) is monotonically increasing in $Z$, so that the intersection $E_1 \cap E_2 \cap E_3$ implies

$$\rho_2(Z, Z_3, Z_4) \ge \frac{\sigma_v^2 + (1-\alpha)(\alpha\sigma_u^2 - \epsilon_3) - \epsilon_3}{\sigma_v\sqrt{\sigma_v^2 + 2(1-\alpha)(\alpha\sigma_u^2 - \epsilon_3) + (1-\alpha)^2\sigma_u^2 + \epsilon_3 + D_2}}. \qquad (B.39)$$

Recalling the definitions in Section 4.3.1 and the definition of $\rho^*(R_1 + \delta)$ (4.49), one can show using some algebra that for $\epsilon_3 = 0$, the RHS of (B.39) equals $\rho^*(R_1 + \delta)$. Since the RHS of (B.39) is continuous in $\epsilon_3$, we can choose some $\epsilon_3 > 0$ small enough (and the resulting $n_2$ large enough) so that the intersection $E_1 \cap E_2 \cap E_3$ will guarantee that $\rho_2(Z, Z_3, Z_4) \ge \rho^*(R_1 + \delta) - \epsilon_2$. The claim thus follows from (B.36).
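The monotonicity claim for the RHS of (B.38) can be checked numerically. In the sketch below (illustrative only; the constants are arbitrary positive values with $0 < \alpha < 1$), the bound is evaluated on a grid of $Z$ values and verified to be non-decreasing:

```python
import math

def rhs_b38(Z, sv2, su2, alpha, eps3, D2):
    """RHS of (B.38). For Z >= 0 the derivative is positive because the constant
    term inside the square root exceeds the constant term in the numerator."""
    num = sv2 + (1 - alpha) * Z - eps3
    den = math.sqrt(sv2) * math.sqrt(sv2 + 2 * (1 - alpha) * Z
                                     + (1 - alpha)**2 * su2 + eps3 + D2)
    return num / den

# Arbitrary example constants.
sv2, su2, alpha, eps3, D2 = 1.5, 4.0, 0.6, 0.05, 0.8
grid = [i * 0.01 for i in range(1000)]
values = [rhs_b38(Z, sv2, su2, alpha, eps3, D2) for Z in grid]
is_monotone = all(a <= b for a, b in zip(values, values[1:]))
print(is_monotone)  # → True
```

Writing the bound as $(c + bZ)/\kappa\sqrt{d + 2bZ}$ with $b = 1-\alpha$, the derivative has the sign of $d - c + bZ$, which is positive here since $d - c = (1-\alpha)^2\sigma_u^2 + 2\epsilon_3 + D_2 > 0$.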

B.9

Proof of Lemma 4.10

Lemma B.11 (4.10). If the constants defined for the general watermarking game are used to design the sequence of encoders of Section 4.3.1, then for any $\epsilon > 0$ and $\epsilon_2 > 0$, there exists an integer $n_2 > 0$ such that for all $n > n_2$ and for all attackers of Section 4.1.2, $\Pr\bigl(\rho_2(Z, Z_3, Z_4) < \rho^*(R_1 + \delta) - \epsilon_2\bigr) < \epsilon$.

Proof. In order to prove the desired result, we need the following technical claim.

Lemma B.12. As $n$ tends to infinity, the sequence of random variables $n^{-1}\langle \gamma_2(X), V^W(U)\rangle$ approaches zero in probability uniformly over all the attackers of Section 4.1.2.

Proof. Conditional on $X = x$ and $Z = z$, the random vector $V^W(U)$ is by Lemma 4.6 distributed like $a_1 x + V$, where $V$ is uniformly distributed on $S^n(0, \sqrt{n a_2}) \cap x^\perp$, and $a_2$ defined in (4.52) depends on $z$. Consequently, for any $\epsilon' > 0$,

$$\Pr\bigl(n^{-1}|\langle \gamma_2(x), V^W(U)\rangle| > \epsilon' \bigm| X = x, Z = z\bigr) = \Pr\Bigl(\bigl|\bigl\langle \gamma_2(x)/\sqrt{n\gamma_3(x)},\ V/\sqrt{n a_2}\bigr\rangle\bigr| > \epsilon'/\sqrt{\gamma_3(x)\,a_2}\Bigr) \le \Pr\Bigl(\bigl|\bigl\langle \gamma_2(x)/\sqrt{n\gamma_3(x)},\ V/\sqrt{n a_2}\bigr\rangle\bigr| > \epsilon'/\sqrt{D_2\sigma_v^2}\Bigr) = \frac{2\,C_{n-1}\bigl(\arccos(\epsilon'/\sqrt{D_2\sigma_v^2})\bigr)}{C_{n-1}(\pi)}.$$

Here, the first equality follows by Lemma 4.6 and the fact that $\gamma_2(x) \in x^\perp$, the subsequent inequality follows from $\gamma_3(x) \le D_2$ and $a_2 \le \sigma_v^2$ (see (4.5) and (4.52)), and the final equality follows since $V/\sqrt{n a_2}$ is uniformly distributed on $S^n(0, 1) \cap x^\perp$ and since $\gamma_2(x)/\sqrt{n\gamma_3(x)}$ also takes value in this set. Since the resulting upper bound, which tends to zero, does not depend on $x$ or $z$, it must also hold for the unconditional probability.

We now proceed to prove Lemma 4.10. Choose $n_2$ large enough to ensure that

$$\Pr(E_4 \cap E_5) \ge 1 - \epsilon, \qquad \forall n > n_2,$$

where $E_4 = \{Z \ge \alpha\sigma_u^2 + \epsilon\}$ and $E_5 = \bigl\{n^{-1}\langle \gamma_2(X), V^W(U)\rangle \ge -\epsilon_2\sigma_v(\sqrt{A} - \sqrt{D_2})\bigr\}$. Such an $n_2$ can be found by the union of events bound since the complements of both $E_4$ and $E_5$ have vanishing probabilities by Lemmas 4.9 and B.12, respectively. For the deterministic attacker of Section 4.1.2, we can express the random variables $Z_3$ and $Z_4$ of (4.41) and (4.43) as

$$Z_3 = \gamma_1^2(X)\,n^{-1}\|X\|^2 + \gamma_3(X) \qquad\text{and}\qquad Z_4 = \bigl(\gamma_1(X) - 1\bigr)\bigl(\sigma_v^2 + (1-\alpha)Z\bigr) + n^{-1}\langle \gamma_2(X), V^W(U)\rangle.$$

Substituting these expressions in $\rho_2(Z, Z_3, Z_4)$ of (4.44) yields

$$\rho_2(Z, Z_3, Z_4) = \frac{\sigma_v^2 + (1-\alpha)Z + (\gamma_1(X) - 1)\bigl(\sigma_v^2 + (1-\alpha)Z\bigr) + n^{-1}\langle \gamma_2(X), V^W(U)\rangle}{\sqrt{\bigl(\gamma_1^2(X)\,n^{-1}\|X\|^2 + \gamma_3(X)\bigr)\sigma_v^2}} = \frac{\sigma_v^2 + (1-\alpha)Z}{\sigma_v\sqrt{n^{-1}\|X\|^2 + \gamma_3(X)/\gamma_1^2(X)}} + \frac{n^{-1}\langle \gamma_2(X), V^W(U)\rangle}{\sqrt{Z_3\,\sigma_v^2}}. \qquad (B.40)$$

We conclude the proof by showing that the intersection $E_4 \cap E_5$ implies that (B.40) exceeds $\rho^*(R_1 + \delta) - \epsilon_2$.

We first focus on the second term on the RHS of (B.40). Using the expression (B.35) and the definitions of Section 4.3.1, we see that event $E_4$ implies that $n^{-1}\|X\|^2$ is at least $A$. When this is true, the distortion constraint (2.3) and the triangle inequality imply that $Z_3$ is at least $(\sqrt{A} - \sqrt{D_2})^2$. Thus, the intersection $E_4 \cap E_5$ guarantees that the second term of (B.40) is at least $-\epsilon_2$.

We now turn to the first term on the RHS of (B.40), which using (B.35) can be rewritten as

$$\frac{\sigma_v^2 + (1-\alpha)Z}{\sigma_v\sqrt{\sigma_v^2 + 2(1-\alpha)Z + (1-\alpha)^2\sigma_u^2 + \gamma_3(X)/\gamma_1^2(X)}},$$

and which, using (4.6) and the fact that $E_4$ implies that $n^{-1}\|X\|^2$ is at least $A$, can be lower bounded by

$$\frac{\sigma_v^2 + (1-\alpha)Z}{\sigma_v\sqrt{\sigma_v^2 + 2(1-\alpha)Z + (1-\alpha)^2\sigma_u^2 + D_2}}.$$

Since $\alpha < 1$ (see (A.6)), the above term is increasing in $Z$. Substituting $Z = \alpha\sigma_u^2 + \epsilon$ into this term yields $\rho^*(R_1 + \delta)$, as can be verified using the definitions of $R_1$ (4.33) and $\rho^*(\cdot)$ (4.49), which yield

$$\rho^*(R_1 + \delta) = \frac{\sigma_v^2 + (1-\alpha)(\alpha\sigma_u^2 + \epsilon)}{\sqrt{A\,\sigma_v^2}}.$$

The event $E_4$ thus implies that the first term on the RHS of (B.40) is at least $\rho^*(R_1 + \delta)$.
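The reduction from vector quantities to scalar bounds above relies on the triangle inequality: if $n^{-1}\|x\|^2 \ge A$ and $n^{-1}\|y - x\|^2 \le D_2 < A$, then $n^{-1}\|y\|^2 \ge (\sqrt{A} - \sqrt{D_2})^2$. A quick randomized check of this fact (illustrative only; the dimension, constants and seed are arbitrary choices):

```python
import math
import random

def per_letter_norm(v):
    return sum(x * x for x in v) / len(v)

rng = random.Random(1)
n, A, D2 = 64, 4.0, 1.0
lower = (math.sqrt(A) - math.sqrt(D2))**2

ok = True
for _ in range(200):
    # x scaled so that n^{-1}||x||^2 is exactly A.
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    scale = math.sqrt(A / per_letter_norm(x))
    x = [scale * xi for xi in x]
    # Perturbation with per-letter energy at most D2.
    e = [rng.gauss(0.0, 1.0) for _ in range(n)]
    escale = math.sqrt(rng.uniform(0.0, D2) / per_letter_norm(e))
    y = [xi + escale * ei for xi, ei in zip(x, e)]
    ok = ok and per_letter_norm(y) >= lower - 1e-12
print(ok)  # → True
```

Every perturbed vector respects the lower bound, since $\|y\| \ge \|x\| - \|y - x\|$ holds deterministically.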
B.10

Proof of Lemma 4.12

Lemma B.13 (4.12). For any encoder with corresponding watermarking channel $P_{X|U}$ satisfying (2.1), if the attacker $g_n$ of (4.65) with corresponding attack channel $P_{Y|X}$ is used, then

$$\frac{1}{n}\,I_{P_U P_{\Theta_1} P_{X|U,\Theta_1} P_{Y|X}}\bigl(X; Y \mid K, U, \Theta_1\bigr) \le \sum_{k=1}^{m} \Pr(K = k)\cdot\tfrac{1}{2}\log\bigl(1 + s(a_k, D_1, \tilde D_2, \sigma_k^2)\bigr) \qquad (B.41)$$
$$\le \mathrm{E}_K\bigl[C(D_1, \tilde D_2, \sigma_K^2)\bigr]. \qquad (B.42)$$

Proof. To simplify the proof of this lemma, we will use the following notation:

$$c^{(k)} = c(a_k, \tilde D_2), \qquad (B.43)$$
$$b_1^{(k)} = b_1(a_k, D_1, \sigma_k^2), \qquad (B.44)$$
$$b_2^{(k)} = b_2(a_k, D_1, \sigma_k^2), \qquad (B.45)$$

where the functions $c(\cdot, \cdot)$, $b_1(\cdot, \cdot, \cdot)$, and $b_2(\cdot, \cdot, \cdot)$ are defined in Appendix A. We shall need the following technical claim.

Lemma B.14. If the encoder satisfies the a.s. distortion constraint (2.1), then

$$\mathrm{E}\Bigl[n^{-1}\bigl\|g_n(X, \Theta_2) - c^{(k)} b_1^{(k)} U\bigr\|^2 \Bigm| K = k\Bigr] \le \bigl(c^{(k)}\bigr)^2 b_2^{(k)} + c^{(k)}\tilde D_2,$$

for all $k \ge 1$ such that $\Pr(K = k) > 0$.

Proof. Recall that the attacker $g_n$ defined in (4.65) produces an IID sequence of $\mathcal{N}(0, \tilde D_2)$ random variables $V$ that is independent of $(X, U)$. Furthermore, since $K$ is a function of $X$, the random vector $V$ is also independent of $X$ and $U$ given $K$. Thus, for all $k \ge 1$ with $\Pr(K = k) > 0$,

$$\mathrm{E}\Bigl[n^{-1}\bigl\|g_n(X, \Theta_2) - c^{(k)} b_1^{(k)} U\bigr\|^2 \Bigm| K = k\Bigr] = \bigl(c^{(k)}\bigr)^2\,\mathrm{E}\Bigl[n^{-1}\bigl\|X - b_1^{(k)} U\bigr\|^2 \Bigm| K = k\Bigr] + c^{(k)}\,\mathrm{E}\bigl[n^{-1}\|V\|^2 \bigm| K = k\bigr]$$
$$= \bigl(c^{(k)}\bigr)^2\,\mathrm{E}\Bigl[n^{-1}\|X\|^2 - 2b_1^{(k)} n^{-1}\langle X, U\rangle + \bigl(b_1^{(k)}\bigr)^2 n^{-1}\|U\|^2 \Bigm| K = k\Bigr] + c^{(k)}\tilde D_2$$
$$= \bigl(c^{(k)}\bigr)^2\Bigl(a_k - 2b_1^{(k)}\,\mathrm{E}\bigl[n^{-1}\langle X, U\rangle \bigm| K = k\bigr] + \bigl(b_1^{(k)}\bigr)^2\sigma_k^2\Bigr) + c^{(k)}\tilde D_2,$$

where the final equality follows by the definitions of $a_k$ and $\sigma_k^2$ (see (4.63) and (4.64)). The proof will be concluded once we show

$$n^{-1}\,\mathrm{E}\bigl[\langle X, U\rangle \bigm| K = k\bigr] \ge \tfrac{1}{2}\bigl(a_k + \sigma_k^2 - D_1\bigr), \qquad (B.46)$$

because

$$a_k - b_1^{(k)}\bigl(a_k + \sigma_k^2 - D_1\bigr) + \bigl(b_1^{(k)}\bigr)^2\sigma_k^2 = b_2^{(k)},$$

by (B.44) and (B.45). We verify (B.46) by noting that for every $k \ge 1$ such that $\Pr(K = k) > 0$,

$$D_1 \ge \mathrm{E}\bigl[n^{-1}\|X - U\|^2 \bigm| K = k\bigr] = \mathrm{E}\bigl[n^{-1}\|X\|^2 - 2n^{-1}\langle X, U\rangle + n^{-1}\|U\|^2 \bigm| K = k\bigr] = a_k - \mathrm{E}\bigl[2n^{-1}\langle X, U\rangle \bigm| K = k\bigr] + \sigma_k^2,$$

where the inequality follows since $n^{-1}\|X - U\|^2 \le D_1$ almost surely, so that the expectation given any event with positive probability must also be at most $D_1$.

We can now write the mutual information term of interest as

$$I(X; Y \mid K, U, \Theta_1) = \sum_{k=0}^{m} \Pr(K = k)\cdot I(X; Y \mid K = k, U, \Theta_1) = \sum_{k=1}^{m} \Pr(K = k)\cdot\bigl(h(Y \mid K = k, U, \Theta_1) - h(Y \mid X, K = k, U, \Theta_1)\bigr), \qquad (B.47)$$

since by the structure of the attack channel all of the above differential entropies exist for all $k \ge 1$, and since when $k = 0$ the above mutual information is zero. To prove (B.41) we shall next verify that

$$I(X; Y \mid K = k, U, \Theta_1) = h(Y \mid K = k, U, \Theta_1) - h(Y \mid X, K = k, U, \Theta_1) \qquad (B.48)$$

is upper bounded by $\frac{n}{2}\log\bigl(1 + s(a_k, D_1, \tilde D_2, \sigma_k^2)\bigr)$, for all $k \ge 1$ satisfying $\Pr(K = k) > 0$.

We can upper bound the first term on the RHS of (B.48) as

$$h(Y \mid K = k, U, \Theta_1) = h\bigl(g_n(X, \Theta_2) \bigm| K = k, U, \Theta_1\bigr) = h\bigl(g_n(X, \Theta_2) - c^{(k)} b_1^{(k)} U \bigm| K = k, U, \Theta_1\bigr) \le h\bigl(g_n(X, \Theta_2) - c^{(k)} b_1^{(k)} U \bigm| K = k\bigr) \le \frac{n}{2}\log\Bigl(2\pi e\,\mathrm{E}\Bigl[n^{-1}\bigl\|g_n(X, \Theta_2) - c^{(k)} b_1^{(k)} U\bigr\|^2 \Bigm| K = k\Bigr]\Bigr) \le \frac{n}{2}\log\Bigl(2\pi e\bigl((c^{(k)})^2 b_2^{(k)} + c^{(k)}\tilde D_2\bigr)\Bigr), \qquad (B.49)$$

where the first inequality follows since conditioning reduces entropy, the second inequality follows since a Gaussian has the highest entropy subject to a second moment constraint, and (B.49) follows by Lemma B.14. We can write the second term on the RHS of (B.48) as

$$h(Y \mid X, K = k, U, \Theta_1) = h\bigl(\sqrt{c^{(k)}}\,V \bigm| K = k\bigr) = \frac{n}{2}\log\bigl(2\pi e\,c^{(k)}\tilde D_2\bigr), \qquad (B.50)$$

for all $k \ge 1$, where (B.50) follows since $V$ is an IID sequence of $\mathcal{N}(0, \tilde D_2)$ random variables independent of $(X, U, \Theta_1)$ and hence independent of $K$. Combining (B.47), (B.49), and (B.50) and observing that $s(a_k, D_1, \tilde D_2, \sigma_k^2) = c^{(k)} b_2^{(k)}/\tilde D_2$ proves (B.41). Finally, (B.42) follows from (B.41) by the definition of $C(D_1, D_2, \sigma_u^2)$ (A.8).
(Y jX ; K = k; U ;  ) = h k V K   = n2 log 2e k D~ ;

h

1

( )

( )

(B.50)

2

for all k  1, where (B.50) follows sin e V is an IID sequen e of N(0; D~ ) random variables independent of (X ; U;  ) and hen e independent of K . Combining (B.47), (B.49), and (B.50) and observing that s(ak ; D ; D~ ; k ) = k b k =D~ , proves (B.41). Finally, (B.42) follows from (B.41) by the de nition of C (D ; D ; u ) (A.8). 2

1

1

( ) ( ) 2

2

1

B.11

2

2

2

Proof of Lemma 4.13

  Lemma B.15 (4.13). For any ergodi overtext distribution PU with E Uk4 < and  2 E Uk u2 , there exists mappings Æ ; n and n0  su h that both the properties P1 and

1

( )

() P2 stated below hold, where P1 is \For every  > 0, limn!1 Æ (; n) = 0." and P2 is \For   every  > 0, n > n (), and event E , if E n kU k jE > u + 5, then Pr(E) < Æ (; n)." 

1

0

Proof.

is:

2

2

First, note that the ontrapositive (and hen e equivalent) statement of property P2

P2a. For every  > 0, n > n (), and event E, if Pr(E)  Æ(; n), then E 0

177

 n

1



kU k jE  2

2 u

+ 5.

Let us de ne 1X n

SU 2 ;n

=

n

i

2

(B.51)

Ui ;

=1

and mU 2

Sin e

U

is stationary,

mU 2

re all the assumption that

=E

 2 Ui : 

does not depend on i and mU 2

E SU 2 ;n



 2 .

=

mU 2

for all n. Further

u

We rst prove the laim assuming that S

has a density for all n, and return later to

U 2 ;n

the ase when it does not. Fix  > 0, and hoose Var(S

U 2 ;n

n0 ()

)  2 =2;

su h that

8n > n0():

(B.52)

This an be done sin e U is ergodi with nite fourth moment, and hen e S in mean square to

mU 2 .

Next, hoose

fs g su h that for all n > n0()

U 2 ;n

is onverging

n

Pr(S

U 2 ;n

s

n

)=

Var(S

U 2 ;n

)

2

(B.53)

;

and

s m



mU 2

U2

n

+ :

(B.54)

Su h an s exists for all appropriate n by the intermediate value theorem of al ulus be ause n

our assumption that

SU 2 ;n

has a density guarantees that Pr(S

U 2 ;n

and be ause Pr

SU 2 ;n

m

U2

+



 Var(S2 2 U

;n

)

;

and Pr

SU 2 ;n

m

U2





178

Var(S



1



Var(S

U 2 ;n

2

U 2 ;n

2

)

;

)

 ) is ontinuous in ,

whi h follow from Chebyshev's inequality and (B.52). From (B.53) it follows that the hoi e Æ (; n)

= Pr(SU 2 ;n  sn );

(B.55)

guarantees Property P1, be ause Var(SU 2 ;n ) approa hes zero. We now show that with this hoi e of Æ(; n), Property P2a is also satis ed. Let the event

E satisfy Pr(E )  Æ(; n) so that by (B.55),

Pr(E )  Pr(SU 2 ;n  sn )

(B.56)

Then, 

E SU 2 ;n

jE



Z

= =

 

0

1

Pr(SU 2 ;n  tjE ) dt

1 sn 1 Pr(SU 2 ;n  t; E ) dt Pr(SU 2 ;n  t; E ) dt + Pr(E ) 0 sn  Z s Z 1 n 1 Pr(E ) dt + Pr(SU 2 ;n  t) dt Pr(E ) 0 sn Z 1 1 sn + Pr(SU 2 ;n  t) dt; Pr(SU 2 ;n  sn ) sn Z



Z

where the rst equality follows sin e SU 2 ;n is a non-negative random variable and the nal inequality follows by (B.56). Furthermore, for n > n0 (), Z

1

sn

Pr(SU 2 ;n  t) dt =

Z sn +2 sn

Pr(SU 2 ;n  t) dt + Z

1

Z

1

sn

+2

Pr(SU 2 ;n  t) dt

Var(SU 2 ;n ) mU 2 )2 sn +2 (t Var(SU 2 ;n) = 2 Pr(SU 2 ;n  sn ) + sn + 2 mU 2  2 Pr(SU 2;n  sn) + Var(SU 2;n) ;



2 Pr(SU 2 ;n  sn ) +

dt



where the rst inequality follows sin e Pr(SU 2 ;n

 t) is non-in reasing in t and by Cheby-

shev's inequality, and the nal inequality is valid by (B.54). Therefore, 

E SU 2 ;n

Var(SU 2 ;n) jE  sn + 2 +  Pr(  mU 2 + 4; S 2 s ) 

U ;n

179

n

where the nal inequality follows by (B.53) and (B.54). This on ludes the proof in the

ase where S 2 has a density. We now return to the ase when S 2 does not ne essarily have a density. Fix  > 0, and let Z = U 2 +  , for all k  1, where 1 ; 2 ; : : : is an IID sequen e of exponential random variables with mean  independent of U . Sin e U is ergodi , Z is also ergodi . P Furthermore, S = n 1 =1 Z has a density, and thus the above results hold for S . In parti ular, we an hoose fs g and n0 () su h that Pr(S  s ) ! 0 and su h that Pr(E )  Pr(S  s ) and n > n0 () imply that U

;n

U

k

;n

k

k

n

Z;n

Z;n

k

k

n

Z;n

Z;n

n

E [SZ;n

jE ℄ 

mZ

= U

+ 5:

Z;n

;n

a.s. and thus E



SU 2 ;n



jE  E [S jE ℄ Z;n

Proof of Lemma 5.6

Lemma B.16 (5.6).

u x y n kx b uk ,

+ 4

mU 2

We omplete the proof by noting that S 2  S for any event E with non-zero probability. B.12

n

, and

There exists a positive fun tion

, and the s alars

1

b2 < Æ , n

2

1

D2 1

and

Æ

n

satisfy

< Æ,

hu; y yjx i

f (A; D1 ;  2 ) su h  2 < Æ , n

kuk n ky xk  D 1

2

1

2

2

that if the 1

hu; x

< A,

and

n-ve tors

b1 Æ


= b1 (A; D1 ; 2 ), b2 = b2 (A; D1 ; 2 ) and = (A; D2 ).

Consider the following hain of equalities and inequalities.

2( b2 + D2 ) =

1

b2 D2 b2 D2



n

n

1

n

1



1

ky

uk

2

2 ( b2 +D2 )

kyk

2

1+



kyk

+2 1+

b1

2

1+



b2

b1 n

D2

n

n

b2 D2

b2 D2

ky

2 D2



n

1

n

1



xk

2

kxk + 2



b21 n

1

kuk + 2n 2

kxk + b n kuk 2

2 1

1

2

hu; xi  n hy; xi 2b n hyj ? ; ui x kxk

1 1

1

1

1

2

180

1

 1





y; 1 + D x

b2 2



b1

u



=



b2 (1 + ) A b2 n 1

n

ky xk

2

D2

> >

1

+

+ b21 n 1 1

2b1 n

kuk

2



+2 1

b1 n 1

A

n

hyjx? ; ui

b2 =A)(1 + b1 )2

+ b1 (1 + b1 + b21 ) Æ (1 + b1 )2

A

hu; xi  n 1hy; xi 1 kxk2

b2

  2 2

(1 + b1 ) + b1 + 2b1 + 2 1

(1 + b1 )2 + b21 + 2b1

 b2 (1 + ) Æ A b2 (1 + ) Æ A

2 (1

kxk

2

A

 1

n

hu; xi  n 1hy; xi 1 kxk2

b1 n 1

b2

n

hy ; x i

! ;

and thus, n 1

ky

k

b1 u 2

2 ( b2 + D2 )  1 >

Æ

A

n 1



ky

k

x 2

2 D2

1 6 b2 + 2b1 + (1 + b1 )2 + 1 2b2 2b2 b2



1

b2



A

(1 + b1 )

2



+ b1 (1 + b1 + b21 )

:

The rst equality is simply an expansion of the terms of interest. The se ond equality

y = yjx + yjx? and yjx = (hu; xi=kxk2 )x. The third equality uses the de nition = 1 D2 =A and the relation hx; y i = (kxk2 + kyk2 kx yk2 )=2. The rst uses the relations

inequality uses (5.29), (5.28), (5.25), the fa t that

jn 1 kxk2

j

A < Æ (1 + b1 )2

(B.57)

(derived from (5.25), (5.26) and (5.27)), and the relation 

b2 (1 + ) A



A + b21  2

The se ond inequality uses (B.57), the fa t that jn

1

b2

hu; xi

= 0: (A + 2

j

D1 )=2 < Æ (1+ b1 + b21 )

(derived from (5.25), (5.26) and (5.27)), and the relation 1

b2 A

b1 (A +  2

2A

The nal inequality uses (5.30), the fa t that jn

D1 )

= 0:

hx; yij < 3A (derived from (5.29), (5.30), (B.57) using Cau hy-S hwartz), and the fa ts that  1 and b2 + D2  b2 .

181

1

B.13

Proof of Lemma 5.10

Lemma B.17 (5.10). There exists a positive function $\tilde f(A, D_1, \tilde D_2, \sigma^2)$ such that if the $n$-vectors $s$, $x$, and $y$, and the scalars $D_2$ and $\delta$ satisfy $\bigl|n^{-1}\|s\|^2 - \sigma_v^2\bigr| < \delta$, $\bigl|n^{-1}\|x\|^2 - A\bigr| < \delta$, $\bigl|n^{-1}\langle s, x\rangle - \bigl(A + (\alpha - 1)b_1\sigma^2\bigr)\bigr| < \delta$, $\bigl|n^{-1}\langle s, y - y|_x\rangle\bigr| < \delta$, $n^{-1}\|y - x\|^2 \le D_2 < A$, and $\delta < A/2$, then

$$\frac{n^{-1}\|y\|^2}{2(A - D_2)} - \frac{n^{-1}\|y - \lambda_1 s\|^2}{2\lambda_2^2} > -\delta\,\tilde f(A, D_1, \tilde D_2, \sigma^2),$$

where all of the parameters are computed with respect to $A$, $D_1$, $\tilde D_2$, $D_2$ and $\sigma^2$, i.e., $\alpha = \alpha(A, D_1, \tilde D_2, \sigma^2)$, $b_1 = b_1(A, D_1, \sigma^2)$, $\sigma_v = \sigma_v(A, D_1, \tilde D_2, \sigma^2)$, $\lambda_1 = \lambda_1(A, D_1, \tilde D_2, D_2, \sigma^2)$, and $\lambda_2 = \lambda_2(A, D_1, \tilde D_2, D_2, \sigma^2)$.

Proof. First, we compute that

$$n^{-1}\|y\|^2 = n^{-1}\|x - y\|^2 - n^{-1}\|x\|^2 + 2n^{-1}\langle x, y\rangle < D_2 - A + \delta + 2n^{-1}\langle x, y\rangle, \qquad (B.58)$$

which follows by (5.54) and (5.51). Second, we compute that

$$n^{-1}\langle y, s\rangle = n^{-1}\langle y|_x, s\rangle + n^{-1}\langle y|_{x^\perp}, s\rangle > \frac{n^{-1}\langle x, y\rangle\;n^{-1}\langle x, s\rangle}{n^{-1}\|x\|^2} - \delta, \qquad (B.59)$$

which follows by (5.53) and the relation $y|_x = \bigl(\langle x, y\rangle/\|x\|^2\bigr)x$. Next, using (5.52), (5.51) and (5.55), together with the bound $|n^{-1}\langle x, y\rangle| < 3A$ (obtained from (5.54), (5.51) and (5.55) via the Cauchy-Schwarz inequality), one bounds the error incurred when the cross term on the RHS of (B.59) is replaced by its nominal value; collecting the resulting estimates, labelled (B.60) and (B.61), the total error is at most $6\delta(\lambda_1\sigma_v + A - D_2)$.

Expanding $n^{-1}\|y - \lambda_1 s\|^2 = n^{-1}\|y\|^2 - 2\lambda_1 n^{-1}\langle y, s\rangle + \lambda_1^2 n^{-1}\|s\|^2$ and combining (5.50), (B.58), (B.59) and (B.61) then yields, after collecting terms,

$$2(A - D_2)\lambda_2^2\left(\frac{n^{-1}\|y\|^2}{2(A - D_2)} - \frac{n^{-1}\|y - \lambda_1 s\|^2}{2\lambda_2^2}\right) > -\delta\Bigl(\lambda_1^2\sigma_v A + 2\lambda_1(A - D_2) + \lambda_1^2(A - D_2)\Bigr). \qquad (B.62)$$

Dividing (B.62) by $2(A - D_2)\lambda_2^2$ gives the desired result, since $\lambda_1$ and $\lambda_2$ essentially only depend on $D_2$ through an $A - D_2$ term; see (5.42) and (5.43).

184

Bibliography [AD89℄

Rudolf Ahlswede and Gunter Due k. Identi ation via hannels. Inform. Theory

[Ahl78℄

IEEE Trans.

, 35(1):15{29, January 1989.

Rudolf Ahlswede. Elimination of orrelation in random odes for arbitrarily varying hannels.

, 44:159{175,

Z. Wahrs heinli hkeitstheorie verw. Gebiete

1978. [Ahl86℄

Rudolf Ahlswede. Arbitrarily varying hannels with states sequen e known to the sender.

[AP98℄

IEEE Trans. Inform. Theory

, 32(5):621{629, September 1986.

Ross J. Anderson and Fabien A. Petit olas. On the limits of steganography. , 16(4):463{473, May 1998.

IEEE Jour. on Sel. Areas in Comm.

[AV96℄

Venkat Anantharam and Sergio Verdu. Bits through queues. Inform. Theory

[AW69℄

IEEE Trans.

, 42(1):4{18, January 1996.

Rudolf Ahlswede and Ja ob Wolfowitz. Correlated de oding for hannels with arbitrarily varying hannel probability fun tions.

Information and Control

,

14(5):457{473, May 1969. [Bas83℄

Tamer Basar. The Gaussian test hannel with an intelligent jammer.

IEEE

, 29(1):152{157, January 1983.

Trans. Inform. Theory

[BBDRP99℄ M. Barni, F. Bartolini, A. De Rosa, and A. Piva. Capa ity of the watermark

hannel: How many bits an be hidden within a digital image.

Pro . SPIE

, (3657):437{448, 1999.

Se urity and Watermarking of Multimedia Contents

[BBK01℄

Alexander Barg, G. R. Blakley, and G. Kabatiansky. Good digital ngerprint185

ing odes. In Pro . of the Inter. Symposium on Info. Theory, Washington, DC, 2001. [BBPR98℄

M. Barni, F. Bartolini, A. Piva, and F. Riga

i. Statisti al modelling of fullframe DCT oeÆ ients. In Pro eedings en e (EUSIPCO 98),

[BBT60℄

of European Signal Pro essing Confer-

volume 3, pages 1513{1516, Rhodes, Gree e, 1998.

David Bla kwell, Leo Breiman, and A. J. Thomasian. The apa ity of ertain hannel lasses under random oding.

Annals of Mathemati al Statisti s,

31(3):558{567, September 1960. [BCW00℄

Ri hard J. Barron, Brian Chen, and Gregory W. Wornell. The duality between information embedding and sour e oding with side information and some appli ations. Preprint, January 2000.

[BI99℄

Markus Breitba h and Hideki Imai. On hannel apa ity and modulation of watermarks in digital still images. In Finan ial

Cryptography,

number 1648 in

Le ture Notes in Computer S ien e, pages 125{139, 1999. [Bla57℄

Nelson M. Bla hman. Communi ation as a game. In IRE tion Re ord,

[BMM85℄

WESCON Conven-

number 2, pages 61{66, San Fran is o, CA, 1957.

J. Martin Borden, David M. Mason, and Robert J. M Elie e. Some information theoreti saddlepoints. SIAM Journal on Control and Optimization, 23(1):129{ 143, January 1985.

[BS98℄

Dan Boneh and James Shaw. Collusion-se ure ngerprinting for digital data. IEEE Trans. Inform. Theory,

[BW85℄

44(5):1897{1905, September 1998.

Tamer Basar and Ying-Wah Wu. A omplete hara terization of minimax and maximin en oder-de oder poli ies for ommuni ation hannels with in omplete statisti al des ription. IEEE Trans. Inform. Theory, 31(4):482{489, July 1985.

[Ca 98℄

Christian Ca hin. An information-theoreti model for steganography. In Pro . of the Inter. Workshop on Info. Hiding,

Computer S ien e, pages 306{318, 1998. 186

number 1525 in Le ture Notes in

[CEZ00℄

Gerard Cohen, Sylvia En heva, and Gilles Zemor. Copyright prote tion for digital data.

[CFNP00℄

, 4(5):158{160, May 2000.

IEEE Communi ation Letters

Benny Chor, Amos Fiat, Moni Naor, and Benny Pinkas. Tra ing traitors. IEEE Trans. Inform. Theory

[Che00℄

Brian Chen.

, 46(3):893{910, May 2000.

Design and Analysis of Digital Watermarking, Information Em-

. PhD thesis, MIT, Cambridge, MA, 2000.

bedding, and Data Hiding Systems

[CK81℄

Imre Csiszar and Janos Korner.

Information Theory:

Coding Theorems for

. Akademiai Kiado, Budapest, 1981.

Dis rete Memoryless Systems

[CKLS97℄

Ingemar J. Cox, Joe Kilian, F. Thomson Leighton, and Talal Shamoon. Se ure spread spe trum watermarking for multimedia.

,

IEEE Trans. Image Pro .

6(12):1673{1687, De ember 1997. [CN88a℄

Imre Csiszar and Prakash Narayan. Arbitrarily varying hannels with onstrained inputs and states.

IEEE Trans. Inform. Theory

, 34(1):27{34, January

1988. [CN88b℄

Imre Csiszar and Prakash Narayan. The apa ity of the arbitrarily varying hannel revisited: Positivity, onstraints.

IEEE Trans. Inform. Theory

,

34(2):181{193, Mar h 1988. [CN91℄

Imre Csiszar and Prakash Narayan. Capa ity of the Gaussian arbitrarily varying hannel.

[Cos83℄

, 37(1):18{26, January 1991.

IEEE Trans. Inform. Theory

Max H. M. Costa. Writing on dirty paper.

IEEE Trans. Inform. Theory

,

29(3):439{441, May 1983. [Cov75℄

Thomas M. Cover. An a hievable rate region for the broad ast hannel.

IEEE

, 21(4):399{404, July 1975.

Trans. Inform. Theory

[Cov99℄

Thomas M. Cover. Con i t between state information and intended information. In Information

[CS99℄

, page 21, Mestovo, Gree e, 1999.

Theory Workshop

Brian Chen and Carl-Erik W. Sundberg. Broad asting data in the FM band by means of adaptive ontiguous band insertion and pre an elling te hniques. 187

In 827, 1999.

Pro eeding of the International Conferen e on Communi ations

[CS01℄

, pages 823{

Giuseppe Caire and Shlomo Shamai. On a hievable rates in a multi-a

ess Gaussian broad ast hannel. In , page 147, Washington, DC, 2001. Pro . of the Inter. Symposium on Info. Theory

[CT91℄

Thomas M. Cover and Joy A. Thomas. Wiley & Sons, New York, 1991.

[CW01℄

Brian Chen and Gregory W. Wornell. Quantization index modulation: A lass of provably good methods for digital watermarking and information embedding. , 47(4):1423{1443, May 2001.

Elements of Information Theory

. John

IEEE Trans. Inform. Theory

[ESZ00℄

Uri Erez, Shlomo Shamai, and Ram Zamir. Capa ity and latti e-strategies for

an elling known interferen e. In , August 2000. Pro eedings of the Cornell Summer Workshop

on Information Theory

[FKK01℄

Chuhong Fei, Deepa Kundu, and Raymond Kwong. The hoi e of watermark domain in the presen e of ompression. In , pages 79{84, 2001. Pro eedings of the International

Conferen e on Information Te hnology: Coding and Computing

[Gal68℄

Robert G. Gallager. Wiley & Sons, New York, 1968.

. John

[GH99℄

James R. Giles and Bru e Hajek. The jamming game for timing hannels. In , page 35, Mestovo, Gree e, 1999.

Information Theory and Reliable Communi ation

Information Theory Workshop

[Gib92℄

Robert Gibbons. Press, Prin eton, NJ, 1992.

. Prin eton University

[GP80℄

S. I. Gel'fand and M. S. Pinsker. Coding for hannel with random parameters. , 9(1):19{31, 1980.

Game Theory for Applied E onomists

Problems of Control and Information Theory

[Gra88℄

Robert M. Gray. Spring-Verlag, New York, 1988.

Probability, Random Pro esses, and Ergodi Properties

188

.

[Gra00℄

Robert M. Gray. Toeplitz and ir ulant matri es: A review, 2000. Available at http://ee.stanford.edu/~gray/toeplitz.html.

[GS58℄

Ulf Grenander and Gabor Szeg o.

Toeplitz Forms and Their Appli ations

. Uni-

versity of California Press, 1958. [HEG83℄

Chris Heegard and Abbas A. El Gamal. On the apa ity of omputer memory with defe ts.

[HM88℄

, 29(5):731{739, September 1983.

IEEE Trans. Inform. Theory

Walter Hirt and James L. Massey. Capa ity of the dis rete-time Gaussian hannel with intersymbol interferen e.

IEEE Trans. Inform. Theory

, 34(3):380{388,

1988. [HN87℄

Brian Hughes and Prakash Narayan. Gaussian arbitrarily varying hannels. , 33(2):267{284, Mar h 1987.

IEEE Trans. Inform. Theory

[HN88℄

Brian Hughes and Prakash Narayan. The apa ity of a ve tor Gaussian arbitrarily varying hannel. IEEE Trans. Inform. Theory, 34(5):995{1003, September 1988.

[Jah81℄

Johann-Heinri h Jahn. Coding of arbitrarily varying multiuser hannels. Trans. Inform. Theory

[JF95℄

IEEE

, 27(2):212{226, Mar h 1981.

Rajan L. Joshi and Thomas R. Fis her. Comparison of generalized Gaussian and Lapla ian modeling in DCT image oding. IEEE Signal Pro essing Letters, 2(5):81{82, May 1995.

[JJS93℄

Nikil Jayant, James Johnston, and Robert Safranek. Signal ompression based on models of human per eption.

, 81(10):1385{1422,

Pro eedings of the IEEE

O tober 1993. [KM77℄

J anos Korner and Katalin Marton. Images of a set via two hannels and their role in multi-user ommuni ation.

IEEE Trans. Inform. Theory

, 23(6):751{

761, November 1977. [KP00a℄

Damianos Karakos and Adrian Papamar ou. Relationship between quantization and distribution rates of digitally watermarked data. In Pro . Symposium on Info. Theory

, page 47, Sorrento, Italy, 2000. 189

of the Inter.

[KP00b℄

Stefan Katzenbeisser and Fabien A. P. Petit olas, editors. Te hniques for Steganography and Digital Watermarking

Information Hiding

. Computer Se urity

Series. Arthouse Te h, Boston, 2000. [KZ00℄

Sanjeev Khanna and Fran is Zane. Watermarking maps: Hiding information in stru tured data. In Pro eedings of the 11th Annual ACM-SIAM Symposium on Dis rete Algorithms

[Lap96℄

Amos Lapidoth. Nearest neighbor de oding for additive non-Gaussian noise

hannels.

[LC01℄

, San Fran is o, CA, 2000.

, 42(5):1520{1529, September 1996.

IEEE Trans. Inform. Theory

Ching-Yung Lin and Shih-Fu Chang. Zero-error information hiding apa ity of digital images. In Pro .

, Thesaloniki,

of the Inter. Conf. on Image Pro essing

Gree e, O tober 2001. [LM00℄

Steven H. Low and Ni holas F. Maxem huk. Capa ity of text marking hannel. IEEE Signal Pro essing Letters

[LN98℄

Amos Lapidoth and Prakash Narayan. Reliable ommuni ation under hannel un ertainty.

[LSL00℄

, 7(12):345{347, De ember 2000.

IEEE Trans. Inform. Theory

, 44(6):2148{2177, O tober 1998.

Gerhard C. Langelaar, Iwan Setyawan, and Reginald L. Lagendijk. Watermarking digital image and video data: A state-of-the-art overview. Signal Pro essing Magazine

[LWB+ 01℄

IEEE

, 17(5):20{46, September 2000.

Ching-Yung Lin, Min Wu, Je rey A. Bloom, Matt L. Miller, Ingemar Cox, and Yui Man Lui. Rotation, s ale, and translation resilient publi watermarking for images.

[Mar79℄

Katalin Marton. A oding theorem for the dis rete memoryless broad ast

hannel.

[Mer00℄

, 10(5):767{782, May 2001.

IEEE Trans. Image Pro .

IEEE Trans. Inform. Theory

Neri Merhav. On random oding error exponents of watermarking systems. IEEE Trans. Inform. Theory

[Mit99℄

, 25(3):306{311, May 1979.

, 46(2):420{430, Mar h 2000.

Thomas Mittelholzer. An information-theoreti approa h to steganography and watermarking. In

, number

Pro . of the Inter. Workshop on Info. Hiding

1768 in Le ture Notes in Computer S ien e, 1999. 190

[MO99℄

Pierre

Moulin

and

Joseph

A.

O'Sullivan.

theoreti analysis of information hiding.

Information-

Preprint,

available at

http://www.ifp.uiu .edu/~moulin/paper.html, 1999.

[MO00℄

Pierre Moulin and Joseph A. O'Sullivan. Information-theoreti analysis of information hiding. In Pro . of the Inter. Symposium on Info. Theory, page 19, Sorrento, Italy, 2000.

[Mou01]

Pierre Moulin. The role of information theory in watermarking and its application to image watermarking. Signal Processing, 81(6):1121–1139, June 2001.

[MS74]

James L. Mannos and David J. Sakrison. The effects of a visual fidelity criterion on the encoding of images. IEEE Trans. Inform. Theory, 20(4):525–536, July 1974.

[MS01]

Partha P. Mitra and Jason B. Stark. Nonlinear limits to the information capacity of optical fibre communications. Nature, 411:1027–1030, June 2001.

[MSP00]

Ranjan K. Mallik, Robert A. Scholtz, and George P. Papavassilopoulos. Analysis of an on-off jamming situation as a dynamic game. IEEE Trans. Comm., 48(8):1360–1373, August 2000.

[Mul93]

F. Muller. Distribution of two-dimensional DCT coefficients of natural images. Electronics Letters, 29(22):1935–1936, October 1993.

[Oli99]

Arlindo L. Oliveira. Robust techniques for watermarking sequential circuit designs. In Proceedings of the Design Automation Conference, pages 837–842, New Orleans, LA, 1999.

[OME98]

Joseph A. O'Sullivan, Pierre Moulin, and J. Mark Ettinger. Information-theoretic analysis of steganography. In Proc. of the Inter. Symposium on Info. Theory, page 297, Cambridge, MA, 1998.

[ORP97]

Joseph J. K. Ó Ruanaidh and Thierry Pun. Rotation, scale and translation invariant digital image watermarking. In Proc. of the Inter. Conf. on Image Processing, pages 536–539, 1997.

[PAK99]

Fabien A. P. Petitcolas, Ross J. Anderson, and Markus G. Kuhn. Information hiding – a survey. Proceedings of the IEEE, 87(7):1062–1078, July 1999.

[PP99]

Shelby Pereira and Thierry Pun. Fast robust template matching for affine resistant image watermarks. In Proc. of the Inter. Workshop on Info. Hiding, number 1768 in Lecture Notes in Computer Science, pages 199–210, 1999.

[RA98]

Mahalingam Ramkumar and Ali N. Akansu. Theoretical capacity measures for data hiding in compressed images. In Proceedings of SPIE, Symposium on Voice, Video and Data Communication, volume 3528, pages 482–492, Boston, MA, November 1998.

[SBM01a]

Anelia Somekh-Baruch and Neri Merhav. On the error exponent and capacity games of private watermarking systems. Preprint, available at http://tiger.technion.ac.il/users/merhav/, 2001.

[SBM01b]

Anelia Somekh-Baruch and Neri Merhav. On the error exponent and capacity games of private watermarking systems. In Proc. of the Inter. Symposium on Info. Theory, Washington, DC, 2001.

[SC96]

Joshua R. Smith and Barrett O. Comiskey. Modulation and information hiding in images. In Proc. of the Inter. Workshop on Info. Hiding, number 1174 in Lecture Notes in Computer Science, pages 207–226, 1996.

[SEG00]

Jonathan K. Su, Joachim J. Eggers, and Bernd Girod. Capacity of digital watermarks subjected to an optimal collusion attack. In European Signal Processing Conference, 2000.

[Sha58]

Claude E. Shannon. Channels with side information at the transmitter. IBM Journal of Research and Development, 2:289–293, October 1958.

[Sha59]

Claude E. Shannon. Probability of error for optimal codes in a Gaussian channel. The Bell System Technical Journal, 38(3):611–656, May 1959.

[SKT98]

Mitchell D. Swanson, Mei Kobayashi, and Ahmed H. Tewfik. Multimedia data-embedding and watermarking technology. Proceedings of the IEEE, 86(6):1064–1087, June 1998.

[SM88]

Wayne E. Stark and Robert J. McEliece. On the capacity of channels with block memory. IEEE Trans. Inform. Theory, 34(2):322–324, March 1988.

[SM01]

Yossef Steinberg and Neri Merhav. Identification in the presence of side information with application to watermarking. IEEE Trans. Inform. Theory, 47(4):1410–1422, May 2001.

[SPR98]

Sergio D. Servetto, Christine I. Podilchuk, and Kannan Ramchandran. Capacity issues in digital image watermarking. In Proc. of the Inter. Conf. on Image Processing, 1998.

[SV00]

Rajesh Sundaresan and Sergio Verdu. Robust decoding for timing channels. IEEE Trans. Inform. Theory, 46(2):405–419, March 2000.

[SVZ98]

Shlomo Shamai, Sergio Verdu, and Ram Zamir. Systematic lossy source/channel coding. IEEE Trans. Inform. Theory, 44(2):564–579, March 1998.

[SW70]

Josef Stoer and Christoph Witzgall. Convexity and Optimization in Finite Dimensions I. Springer-Verlag, 1970.

[Wol78]

Jacob Wolfowitz. Coding Theorems in Information Theory. Springer-Verlag, third edition, 1978.

[Wyn67]

Aaron D. Wyner. Random packings and coverings of the unit n-sphere. The Bell System Technical Journal, 46(9):2111–2118, November 1967.

[XA98]

Liehua Xie and Gonzalo R. Arce. Joint wavelet compression and authentication watermarking. In Proc. of the Inter. Conf. on Image Processing, pages 427–431, 1998.

[Yan93]

Kenjiro Yanagi. Optimal mutual information for coders and jammers in mismatched communication channels. SIAM Journal of Control and Optimization, 31(1):41–51, January 1993.

[YSJ+01]

Wei Yu, Arak Sutivong, David Julian, Thomas M. Cover, and Mung Chiang. Writing on colored paper. In Proc. of the Inter. Symposium on Info. Theory, Washington, DC, 2001.