Information Hiding: Steganography and Watermarking - Attacks and Countermeasures [1 ed.] 9780792372042, 0792372042, 1558607145

Information Hiding: Steganography and Watermarking -- Attacks and Countermeasures deals with information hiding. With th


English Pages 12 Year 2000


today



Watermarking Multi-Content Aggregates

Radu Sion ([email protected])
Computer Sciences and CERIAS, Purdue University
http://www.cs.purdue.edu/homes/sion

ver. 2.12, April 02, 2002

Keywords: Digital Watermarking, Steganography, Security, Copyright Protection, Databases

Sion, Atallah, Prabhakar

Watermarking content aggregates

overview

• general introduction (approx. 10 slides)
  • what is the problem ?
  • what is a solution ?
  • who cares anyway ?
• folklore: traditional watermarking (approx. 10 slides)
• our research (approx. 30 slides)

introduction

issues

• affirm creation rights: resiliently embed information within the object (and its copies !), allowing identification of the actual copyright owner in a court of law
• inline annotation: encode (not necessarily hide) information in the object
• identify agreement violators (“bad people”): hide and persist information in each sold copy of the object, allowing identification of the initial buyer of that particular copy (“fingerprinting”)
• unobtrusive communication: use an ‘innocent looking’ message to hide a secret (covert channel)
• etc.

solution

Watermarking deploys information hiding techniques with the aim of solving the previously outlined issues.

I.e. hiding a certain mark (e.g. “radu is the author of this novel”) in the object itself (e.g. the novel text) is hoped to hold up in court as evidence for copyright purposes at a later dispute time. Important issue: “attack survivability”.

market: got money ?

Buyers of watermarking technology include any party that produces and/or sells valuable content and then distributes it through untrusted channels, especially when the content allows for valuable derivatives, in which case the watermarking technology also has to protect the derivatives.

information hiding

Classification of Information Hiding (according to Petitcolas et al.):

• Information Hiding
  • Covert Channels
  • Steganography
    • Linguistic
    • Technical
  • Anonymity
  • Copyright Marking
    • Robust
      • Fingerprints
      • Watermarks
        • Imperceptible
        • Perceptible
    • Fragile

Fundamental difference: Watermarking vs. Steganography

watermark embedding

[Diagram: Stego Object, Key and Watermark → Watermarking → Marked Object]

watermark detection

[Diagram: Marked Object, Key, Watermark and Original Stego Object → Watermark Extraction → Yes/No (confidence level)]

3 layer visible watermark (IBM)

attacks

• detect and remove (“subtractive”)
  • know how & Key
  • statistics & Key
• perturb (transform, segment etc.)
  • know approximately how
  • statistics
• add new watermark (“additive”)
  • claim ownership based on new watermark
• combine stego object copies (“collusion”)
  • used to avoid fingerprints
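The collusion attack listed above can be illustrated with a toy example. This is only a sketch under loud assumptions: the fingerprint is a hypothetical LSB flip at buyer-specific positions, the positions of different buyers do not overlap, and the colluders hold at least three copies. The names `fingerprint` and `collude` are illustrative and not taken from the slides.

```python
from statistics import median_low

def fingerprint(samples, positions):
    """Toy fingerprint (assumption): flip the least significant bit at buyer-specific positions."""
    marked = list(samples)
    for pos in positions:
        marked[pos] ^= 1
    return marked

def collude(copies):
    """Collusion sketch: per-sample median over several fingerprinted copies.

    Wherever a sample was altered in only a minority of the copies,
    the median falls back to the unaltered value.
    """
    return [median_low(values) for values in zip(*copies)]

original = [10, 20, 30, 40, 50, 60]
copies = [
    fingerprint(original, [0, 1]),  # buyer 1
    fingerprint(original, [2, 3]),  # buyer 2
    fingerprint(original, [4, 5]),  # buyer 3
]
cleaned = collude(copies)  # no buyer-specific alteration survives here
```

Real fingerprinting codes are designed so that such combinations still leave traces of at least one colluder; the toy scheme above has no such property.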

folklore: digital watermark types

• multimedia:
  • images
  • audio
  • video
• non-media:
  • text
  • software/runnable code
  • numeric sets
  • structures

folklore: images

• visible
• LS (least significant) bits
• LS with secret Key
• LS with secret Key and pixel suitability test (e.g. luminosity var.)
• adding redundancy
• embedding according to compression scheme if known (GIF - palette games)
• embed in frequency domain (JPEG) by altering the DCT coefficients
• masking of human eye
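A minimal sketch of the “LS with secret Key” idea, assuming the image is a flat list of integer pixel values and that the key seeds a pseudo-random choice of pixel positions. The function names are hypothetical, and Python's `random.Random` stands in for a proper keyed generator; it is not cryptographically secure.

```python
import random

def _positions(key: str, n_samples: int, n_bits: int) -> list[int]:
    """Pick n_bits distinct sample positions, determined only by the secret key."""
    return random.Random(key).sample(range(n_samples), n_bits)

def embed_lsb(samples: list[int], bits: list[int], key: str) -> list[int]:
    """Overwrite the least significant bit of key-selected samples with the mark bits."""
    marked = list(samples)
    for pos, bit in zip(_positions(key, len(samples), len(bits)), bits):
        marked[pos] = (marked[pos] & ~1) | bit
    return marked

def extract_lsb(samples: list[int], n_bits: int, key: str) -> list[int]:
    """Read the mark back from the same key-selected positions."""
    return [samples[pos] & 1 for pos in _positions(key, len(samples), n_bits)]

pixels = [120, 121, 200, 33, 90, 17, 64, 255, 0, 42]
mark = [1, 0, 1, 1]
marked = embed_lsb(pixels, mark, "secret")
recovered = extract_lsb(marked, len(mark), "secret")  # equals mark
```

Each pixel changes by at most 1, which is why the mark is imperceptible, and also why any attacker who re-randomizes the low bits destroys it.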

folklore: audio

• LS (least significant) bits of samples
• LS with secret Key
• LS with secret Key and sample suitability test (e.g. noise ratio variance)
• adding redundancy
• masking of human auditory system (sound interference - low level/strong level, close frequencies)
• “echo hiding” schemes
• statistical embedding (relies on large-sets theory, e.g. 1 bit in every 1.2 s time slice [1], change pdf of subsets selected using Key)

folklore: video

• LS (least significant) bits of samples
• LS with secret Key
• LS with secret Key and sample suitability test (e.g. noise ratio variance)
• adding redundancy
• per frame apply image watermarking
• human visual-temporal perception limitations (30fps-24fps)
• encoding scheme dependent watermarking (MPEG - I-frames, B-frames)
• captioning (annotation vs. watermark)

folklore: text/language

• “text” vs. “language”
• synonyms, rearranging text (vs. canonical form), distances between key words, variation of distributions of letters between key words, number of words per class (e.g. verbs, substantives)
• syntax/semantic tree surgeries
• semantic watermarking
• “stego Turing test”: “can computer watermark NL automatically ?”

folklore: code/software

• code: register allocation/use
• code: order of push/pop of registers
• code: hidden values in low/high order bytes
• algorithms: runtime structures (number -> graph -> structure at runtime)
• code: obfuscation/runtime tamperproofing
• code/algorithm: inherent part of behavior (e.g. “easter egg” - code activated after unusual input)
• code: “guarding”

folklore: media watermarking specifics

Bandwidth comes from exploiting the limitations of the Human Sensorial System and the associated media noise channels.

What about future attackers in A.D. 3000++ ?

(need fundamental theoretic encoding power warranties)

our research: non-media watermarking

• Generalize:
  • Define/formalize more general model (no FFT ! ;)
  • Develop generic techniques for watermarking
  • Define model elements assessment metrics
• Non-media:
  • Develop generic model (above) variations for watermarking structured content
  • Amplify power of domain-specific marking methods
  • Structures (e.g. numbers, documents, MLs, text)

why ?

• Outsourcing of commercial data
• (X/HT)ML: SOAP, web content
• Software meta-descriptions
• B2B interactions
• Stock data sharing
• Customer data buying patterns
• Financial analysis data
• It’s fun !

buzz: XML

[Diagram labels: Application Design & Implementation; Business Model; Stock Market Trends Data; Web Page; etc. (any data with structure); Content: "what"; DTD: "how"; XML Description; Interoperability]

issues: model

• Usability
• Domains
• Change in usability
• Vicinities

• Watermark
• Algorithm
• Attack
• Power
• Domain Desiderata
• Information Theory of Structures

issues: usability

Idea: the same object put to different uses (“usability domains”) has a different value for each of the uses (“usability”) and associated permissible distortion bounds (“allowable change in usability”). (e.g. the same picture containing different objects of differing interest for different people)

[Diagram: usability domain containing object O, the usability vicinity of O bounded by ∆u max, and a nearby object O']
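The usability idea can be sketched in code under one modeling assumption that is not in the slides: each usability domain is a pair of a usability function and a maximum permissible change ∆u max. The names `Domain` and `within_vicinity` are hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence

# Assumption: a usability domain is (usability function, max permissible change in usability).
Domain = tuple[Callable[[Sequence[float]], float], float]

def within_vicinity(original: Sequence[float],
                    candidate: Sequence[float],
                    domains: Sequence[Domain]) -> bool:
    """True if the candidate stays inside the usability vicinity of the original in every domain."""
    return all(abs(u(candidate) - u(original)) <= du_max for u, du_max in domains)

# Two illustrative domains: one user cares about the mean, another about the maximum.
domains = [(mean, 0.5), (max, 1.0)]
original = [10.0, 20.0, 30.0]

ok = within_vicinity(original, [10.2, 19.9, 30.5], domains)       # small changes in both domains
too_far = within_vicinity(original, [10.0, 20.0, 35.0], domains)  # both bounds are exceeded
```

A watermarking algorithm would call such a check on every candidate alteration and keep only those for which it holds.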

issues: generic challenge

The main challenge in watermarking lies in keeping the watermarked object within close vicinity of the original object in all considered usability domains, while maximizing the power metric level of the application.

(i.e. given a set of usability domains and associated permissible changes in usability)

issues: key pre-commitment

[Diagram: key k; (O, w) → wm → O' → det → (O, w)]

Given a data domain D, an object O in D and a watermarking algorithm wm, is there any way to find a key k that will yield a desired mark w in the unmarked O ? In other words, for the given domain and algorithm class:

“can we torture the data until it confesses ?”
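The “torture the data” question can be made concrete with a toy detector. This is a sketch under assumptions: the detector reads least significant bits at key-selected positions (a hypothetical scheme, not the one from the slides), keys are small integers, and `random.Random` stands in for a keyed generator.

```python
import random

def detect(samples, n_bits, key):
    """Toy detector (assumption): read LSBs at positions chosen by the key."""
    positions = random.Random(key).sample(range(len(samples)), n_bits)
    return [samples[p] & 1 for p in positions]

def torture(samples, desired, max_tries=100_000):
    """Search for a key under which the UNMARKED object appears to carry `desired`."""
    for key in range(max_tries):
        if detect(samples, len(desired), key) == desired:
            return key
    return None

unmarked = list(range(100, 164))      # never watermarked
desired = [1, 0, 1, 1, 0, 0, 1, 0]    # mark the attacker wants to "find"
forged_key = torture(unmarked, desired)
```

For an n-bit mark the search needs on the order of 2^n keys, so short marks confess quickly. Requiring the key to be fixed (committed to) before detection, or using much longer marks, removes this freedom.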

hypertext


aggregates: challenges

• Lack of inherent structural noise
• New transform domain (content and structural)
• New data types
• Many different data types
• Structured vs. non-structured
• Isolate general model from data domain specifics

aggregates: initial ideas

• Structure -> what about “any” structure
• Value in structure and content
• Node/items labeling (TCL)
• Attacks -> tolerant labeling
• Resilience -> partitioning
• Semantic partitioning
• Primitive watermark: noise injection
• Resilience -> hierarchical watermarking

aggregates: challenging properties

[Diagram: Structured Data splits into Content and Structure.
Content (variable change tolerance) → Content Watermarking (variable bandwidth, higher resilience) → Modified Content.
Structure (low change tolerance) → Structure Watermarking (low bandwidth, high fragility) → Modified Structure.
Modified Content and Modified Structure together form the Watermarked Structured Data.]

aggregates: attacks

• A1: node elimination (subtractive)
• A2: inter-node relation elimination
• A3: value preserving partitioning
• A4: node content altering
• A5: addition of fake nodes
• etc !

aggregates: primitive labeling

Labels are location and content aware: they depend on both the topology and the content of a node (“angry hashes”).

aggregates: “angry/content hashes”

angry hash (content) = a function of the content (specific to it) that tolerates “minor” (in terms of usability) changes to the content. (e.g. the longest number of most significant bits for a set of integers s.t. the resulting hash values are maximally distinct)
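One possible reading of the most-significant-bits example, sketched for non-negative integers: drop as many low-order bits as possible while all items of the set still hash to distinct values. Both function names and this interpretation are assumptions; the slides do not fix an algorithm.

```python
def angry_hash(value: int, dropped_bits: int) -> int:
    """Keep only the most significant bits by discarding the lowest `dropped_bits`."""
    return value >> dropped_bits

def max_dropped_bits(values: list[int]) -> int:
    """Largest number of low bits that can be dropped while all hashes stay distinct.

    Assumes a non-empty list of non-negative integers.
    """
    limit = max(v.bit_length() for v in values)
    dropped = 0
    while dropped < limit and len({angry_hash(v, dropped + 1) for v in values}) == len(values):
        dropped += 1
    return dropped

values = [1000, 2050, 3100, 4200]
d = max_dropped_bits(values)                               # 10 for this set
same = angry_hash(1005, d) == angry_hash(1000, d)          # a minor change keeps the hash
```

The tolerance is not uniform: a value sitting just below a power-of-two boundary can change its hash after a tiny alteration, so a practical scheme would have to account for such boundary items.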

aggregates: tolerant canonical labeling (TCL)

Composite labels of collection items are formed from sets (or confidence intervals) of the individual labels that result from successive training (e.g. original graph surgery) and labeling sessions. Each labeling session is self-adjusting according to history.

[Diagram: collection data (C) and alteration constraints feed the collection labeling training scenarios; training/surgery yields C', C'', C'''; primitive labeling of each yields L', L'', L''', which are combined into the Composite Label]

aggregates: mark amplification

By applying a weak mark to secret subsets of the original collection, the overall power of the marking scheme is effectively amplified.

[Diagram: the watermarking algorithm selects collection subsets from the collection and applies a weak mark to each subset]
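A minimal sketch of mark amplification on a collection of integers. The assumptions are loud ones: the weak mark is a single least significant bit per item, the secret subset is chosen by an HMAC over each item's high-order bits (so that embedding does not change membership), and detection is a majority vote. None of these choices comes from the slides, and all names are hypothetical.

```python
import hashlib
import hmac

def in_secret_subset(item: int, key: bytes, shift: int = 4) -> bool:
    """Keyed membership test that depends only on the item's most significant bits."""
    label = str(item >> shift).encode()
    digest = hmac.new(key, label, hashlib.sha256).digest()
    return digest[0] % 2 == 0  # roughly half of the labels are selected

def embed_bit(collection, bit, key):
    """Apply the weak mark (one LSB) to every item of the secret subset."""
    return [(x & ~1) | bit if in_secret_subset(x, key) else x for x in collection]

def detect_bit(collection, key):
    """Majority vote over the secret subset; returns (bit, confidence)."""
    votes = [x & 1 for x in collection if in_secret_subset(x, key)]
    if not votes:
        return None, 0.0
    ones = sum(votes)
    bit = 1 if 2 * ones >= len(votes) else 0
    return bit, max(ones, len(votes) - ones) / len(votes)

collection = list(range(1000, 1400, 7))
marked = embed_bit(collection, 1, b"secret key")
bit, confidence = detect_bit(marked, b"secret key")
```

Each single-item mark is trivially destroyed, but an attacker who does not know the key has to alter a large share of the collection to flip the vote, which is the sense in which the subset selection amplifies the weak mark.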

aggregates: hierarchical watermarking

[Diagram: graph with numbered nodes and lettered groups (A, B, C, J, K); layout not recoverable]

numeric sets

Problem: Given a set of numbers N, a set of local and global allowable distortion bounds T, and a set of keys K, determine the watermarked version N’ of N such that all bounds in T are satisfied and N’ features enough watermark power.

Question: how much is “enough” ?

Idea: use/alter global numeric properties (within distortion limits T) as bandwidth channels (e.g. confidence intervals), together with secret subset selection.

numeric sets: ideas

• Labeling
  • Normalized distance from mean
  • Most important bits of item
• Weak mark
  • Confidence intervals violators
  • Normalized distance from mean
• Amplification: keyed subset selection

DBMS: challenges

• Numeric Set
• Semantics
• Structure
• New transforms
• Views and data mining
• Preservation of relational model
• Preservation of consistency
• Numeric vs. alphanumeric vs. binary
• Attribute semantics awareness
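The “normalized distance from mean” labeling idea listed for numeric sets has a property worth showing: it does not change when the whole set is scaled by a positive factor and shifted. A small sketch, assuming the population standard deviation as the normalizer (the slides do not say which one is meant) and a set whose items are not all equal:

```python
from statistics import mean, pstdev

def normalized_distances(values):
    """Distance of each item from the mean, in units of the standard deviation."""
    mu, sigma = mean(values), pstdev(values)  # assumes sigma > 0
    return [(v - mu) / sigma for v in values]

values = [10, 12, 15, 20, 30]
scaled = [3 * v + 7 for v in values]  # positive linear change applied to every item

labels_before = normalized_distances(values)
labels_after = normalized_distances(scaled)  # equal to labels_before up to rounding error
```

A label derived from these distances therefore survives such linear changes, although adding or removing items shifts the mean and the deviation, and with them every label.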

DBMS: challenges

[Diagram: the DB splits into Content (Core-content, Meta-content; labels: high change tolerance, medium change tolerance) and Structure (Core-structure, Hidden structure, e.g. statistical props, association rules; labels: low change tolerance, variable change tolerance).
Content → Content Watermarking (variable bandwidth, higher resilience).
Structure → Structure Watermarking (lower bandwidth, high fragility).
Result: Watermarked DB with Content (Core-content, Meta-content) and Structure (Core-structure, Hidden structure).]

DBMS: required properties (details)

• Resilience to:
  • New consistent data (provide bounds)
  • Transforms: proj/join/sel/linear changes
  • Simple row swapping
• Minimize quantitative change (nr. of tuples)
• Should be detectable from most data views (amount of data needed to detect watermark)
• Detect watermark without original data
• Maintain relation chains (e.g. foreign keys)
• Maintain semantics (e.g. 20yrs. vs. 21yrs)
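Resilience to simple row swapping can be obtained by working on a canonical ordering of the rows rather than on their physical order. A minimal sketch, assuming rows are tuples that can be sorted and that a row label is a keyed HMAC of the row content; both choices and both function names are illustrative only.

```python
import hashlib
import hmac

def canonical_rows(rows):
    """Canonical form by sorting: every permutation of the rows maps to the same sequence."""
    return sorted(rows)

def row_labels(rows, key: bytes):
    """Keyed labels computed over the canonical order, independent of the physical row order."""
    return [
        hmac.new(key, repr(row).encode(), hashlib.sha256).hexdigest()[:8]
        for row in canonical_rows(rows)
    ]

table = [(3, "carol"), (1, "alice"), (2, "bob")]
swapped = [table[1], table[2], table[0]]  # attacker reorders the rows

same = row_labels(table, b"key") == row_labels(swapped, b"key")  # True
```

Because these labels hash the full row content they are not tolerant to content changes; combining the canonical order with a tolerant content hash would be the natural next step.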

DBMS: initial ideas

[Flow diagram: Translation to canonical form (sorting/tree expansion); Semantics check (type descriptors); determine available bandwidth; Propagate permissible modification bounds (error bounds, permissible structural surgery boundaries); Embed watermark within available bandwidth; Alter general statistics and confidence intervals. Annotation: (modifiable objects & change tolerance levels)]

generalizing: “collections”

• “Collection” = items and structure/patterns
• Keyed Tolerant Canonical Labeling (TCL)
• Item Content Hashing (“angry hashes”)
• Weak Watermarks
• Power Amplification by secret subset selection

“Space Odyssey” (HAL == IBM ?)

does this still have value ?

refs

[1] Ingemar Cox, Matthew Miller, and Jeffrey Bloom, “Digital Watermarking”, Morgan Kaufmann, Oct. 2001, ISBN 1558607145
[2] Stefan Katzenbeisser and Fabien Petitcolas (editors), “Information Hiding Techniques for Steganography and Digital Watermarking”, Artech House, Jan. 2000, ISBN 1580530354
[3] Neil Johnson, Zoran Duric, and Sushil Jajodia, “Information Hiding: Steganography and Watermarking - Attacks and Countermeasures”, Kluwer Academic, Feb. 2001, ISBN 0792372042
[4] Elizabeth Ferril, Matthew Moyer, “A Survey of Digital Watermarking”, February 25, 1999
[5] Petitcolas, Anderson, Kuhn, “Attacks on Copyright Marking Systems”, David Aucsmith, Ed., Second workshop on information hiding, in vol. 1525 of Lecture Notes in Computer Science, Portland, Oregon, 1998, pp. 218-238
[6] Anderson, Petitcolas, “On the Limits of Steganography”, IEEE Journal of Selected Areas in Communications, 16(4):474-481, May 1998, Special Issue on Copyright & Privacy Protection
[7] Petitcolas, Anderson, Kuhn, “Information Hiding - A Survey”, Proc. of the IEEE, special issue on protection of multimedia content, 87(7):1062-1078, July 1999
[8] Palsberg, Krishnaswamy, Kwon, Ma, Shao, Zhang, “Experience with Software Watermarking”, CERIAS and Dept. of Computer Sciences, Purdue, 2000
[9] M. Atallah et al., “Natural Language Watermarking: Design, analysis and proof-of-concept implementation”, Proc. of 4th International Information Hiding Workshop, April 2001, Springer Verlag