Radu Sion
Watermarking Multi-Content Aggregates
Computer Sciences and CERIAS Purdue University
http://www.cs.purdue.edu/homes/sion
ver. 2.12, April 02, 2002
Keywords: Digital Watermarking, Steganography, Security, Copyright Protection, Databases
Sion, Atallah, Prabhakar
Watermarking content aggregates
overview
• general introduction (approx. 10 slides)
  • what is the problem?
  • what is a solution?
  • who cares anyway?
• folklore: traditional watermarking (approx. 10 slides)
• our research (approx. 30 slides)
issues
• affirm creation rights: resiliently embed information within the object (and its copies!) allowing identification of the actual copyright owner in a court of law
• inline annotation: encode (not necessarily hide) information in the object
• identify agreement violators (“bad people”): hide and persist information in each sold copy of the object, allowing identification of the initial buyer of that particular copy (“fingerprinting”)
• unobtrusive communication: use an “innocent-looking” message to hide a secret (covert channel)
• etc.

solution
Watermarking deploys information hiding techniques with the aim of solving the issues outlined above. For example, hiding a mark (e.g. “radu is the author of this novel”) in the object itself (e.g. the text of the novel) is hoped to hold up in court as evidence of copyright in a later dispute; an important issue is “attack survivability”.
market: got money?
Buyers of watermarking technology include any party that produces and/or sells valuable content and then distributes it through untrusted channels, especially when the content allows for valuable derivatives, in which case the watermarking technology also has to protect the derivatives.

information hiding
Classification of Information Hiding (according to Petitcolas et al.):
• Information Hiding
  • Covert Channels
  • Steganography
    • Linguistic
    • Technical
  • Anonymity
  • Copyright Marking
    • Robust
      • Fingerprints
      • Watermarks
        • Imperceptible
        • Perceptible
    • Fragile
Fundamental difference: Watermarking vs. Steganography
watermark embedding
[Figure: the original (stego) object, the watermark and a key are input to the watermarking algorithm, which outputs the marked object.]

watermark detection
[Figure: the marked object, the watermark and the key are input to watermark extraction, which outputs Yes/No (with a confidence level).]

3-layer visible watermark (IBM)
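Detection above ends in a Yes/No answer with a confidence level. One common way to quantify that confidence, assumed here rather than taken from the slides, is the probability that an unmarked object would agree with at least as many of the n mark bits purely by chance, with each bit matching with probability 1/2. A minimal sketch; `alpha` is a hypothetical false-positive budget:

```python
from math import comb


def false_positive_probability(matches: int, n: int) -> float:
    """Chance that an unmarked object agrees with >= `matches` of `n` mark bits.

    Assumes each extracted bit from unmarked data matches with probability 1/2,
    independently of the others (a binomial tail).
    """
    if not 0 <= matches <= n:
        raise ValueError("matches must lie between 0 and n")
    tail = sum(comb(n, k) for k in range(matches, n + 1))
    return tail / 2 ** n


def detect(extracted: list[int], watermark: list[int], alpha: float = 1e-3):
    """Return (yes/no decision, confidence) for an extracted bit string.

    `alpha` is a hypothetical false-positive budget chosen by the detector.
    """
    if len(extracted) != len(watermark):
        raise ValueError("bit strings must have the same length")
    matches = sum(a == b for a, b in zip(extracted, watermark))
    p = false_positive_probability(matches, len(watermark))
    return p < alpha, 1.0 - p
```

For a 32-bit mark, 30 matching bits give a chance-agreement probability of 529 / 2^32 (about 1.2e-7), so the sketch answers yes; 20 matching bits are well within what chance alone produces.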
attacks
• detect and remove (“subtractive”)
  • know how & Key
  • statistics & Key
• perturb (transform, segment, etc.)
  • know approximately how
  • statistics
• add new watermark (“additive”)
  • claim ownership based on the new watermark
• combine stego object copies (“collusion”)
  • used to avoid fingerprints
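Collusion exploits the fact that copies sold to different buyers differ only where their fingerprints differ. A minimal sketch, assuming each copy is a list of integer values and each fingerprint is a small per-buyer change at a few positions; the `choose` strategy is a hypothetical choice made by the colluders:

```python
def differing_positions(copies: list[list[int]]) -> list[int]:
    """Indices where at least two fingerprinted copies disagree."""
    return [i for i, vals in enumerate(zip(*copies)) if len(set(vals)) > 1]


def collude(copies: list[list[int]], choose) -> list[int]:
    """Forge a copy: keep agreeing positions, let `choose` pick the rest.

    `choose(values)` receives every copy's value at a disagreeing position
    and returns the value the forged copy should keep there.
    """
    forged = list(copies[0])
    for i in differing_positions(copies):
        forged[i] = choose([c[i] for c in copies])
    return forged
```

With copies `[10, 21, 30, 41, 50]` and `[10, 21, 30, 40, 51]`, the comparison exposes positions 3 and 4 only; the change at position 1 is identical in both copies, so it stays invisible to this attack and survives in the forged copy.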
folklore: digital watermark types
• multimedia:
  • images
  • audio
  • video
• non-media:
  • text
  • software/runnable code
  • numeric sets
  • structures

folklore: images
• visible
• LS (least significant) bits
• LS with secret Key
• LS with secret Key and pixel suitability test (e.g. luminosity variance)
• adding redundancy
• embedding according to the compression scheme, if known (GIF: palette games)
• embedding in the frequency domain (JPEG) by altering the DCT coefficients
• masking properties of the human eye
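The “LS with secret Key” bullet can be sketched as follows, assuming a grayscale image flattened into a list of 0-255 pixel values and an integer key used as a seed. Python's `random` module is not cryptographically secure; it stands in here for a keyed pseudo-random selection, which a real scheme would derive from a cryptographic keyed function:

```python
import random


def _positions(key: int, size: int, n_bits: int) -> list[int]:
    """Secret, key-dependent choice of which pixels carry watermark bits."""
    rng = random.Random(key)  # the key seeds the (non-cryptographic) selection
    return rng.sample(range(size), n_bits)


def embed_lsb(pixels: list[int], bits: list[int], key: int) -> list[int]:
    """Write one watermark bit into the least significant bit of each chosen pixel."""
    marked = list(pixels)
    for pos, bit in zip(_positions(key, len(pixels), len(bits)), bits):
        marked[pos] = (marked[pos] & ~1) | bit
    return marked


def extract_lsb(pixels: list[int], n_bits: int, key: int) -> list[int]:
    """Read the bits back from the same key-selected positions."""
    return [pixels[pos] & 1 for pos in _positions(key, len(pixels), n_bits)]
```

Each pixel changes by at most one intensity level, which is why plain LSB marks are invisible but also trivially destroyed by re-quantization; the suitability tests and redundancy in the bullets above address exactly that weakness.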
folklore: audio
• LS (least significant) bits of samples
• LS with secret Key
• LS with secret Key and sample suitability test (e.g. noise ratio variance)
• adding redundancy
• masking properties of the human auditory system (sound interference: low-level vs. strong-level sounds, close frequencies)
• “echo hiding” schemes
• statistical embedding (relies on the statistics of large sets, e.g. 1 bit in every 1.2 s timeslice [1]; change the pdf of subsets selected using the Key)

folklore: video
• LS (least significant) bits of samples
• LS with secret Key
• LS with secret Key and sample suitability test (e.g. noise ratio variance)
• adding redundancy
• per-frame image watermarking
• limitations of human visual-temporal perception (30 fps vs. 24 fps)
• encoding-scheme-dependent watermarking (MPEG: I-frames, B-frames)
• captioning (annotation vs. watermark)
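The “statistical embedding” bullet can be illustrated with a patchwork-style scheme, a well-known instance of the idea and not necessarily the method behind [1]: the key selects two disjoint subsets of samples, one is raised and the other lowered by a small `delta`, and detection compares the subset means. The parameters `n` and `delta` are illustrative assumptions:

```python
import random
from statistics import fmean


def _patches(key: int, size: int, n: int) -> tuple[list[int], list[int]]:
    """Two disjoint, key-selected index sets A and B of n samples each."""
    idx = random.Random(key).sample(range(size), 2 * n)
    return idx[:n], idx[n:]


def embed_patchwork(samples: list[float], key: int, n: int, delta: float) -> list[float]:
    """Raise the samples in A by delta and lower those in B by delta."""
    a, b = _patches(key, len(samples), n)
    marked = list(samples)
    for i in a:
        marked[i] += delta
    for i in b:
        marked[i] -= delta
    return marked


def patchwork_statistic(samples: list[float], key: int, n: int) -> float:
    """mean(A) - mean(B): close to 0 for unmarked data, about 2*delta if marked."""
    a, b = _patches(key, len(samples), n)
    return fmean(samples[i] for i in a) - fmean(samples[i] for i in b)
```

The per-sample change is tiny, but averaging over thousands of key-selected samples shifts the statistic by exactly 2·delta while its chance fluctuation shrinks roughly as 1/sqrt(n); this is the “large sets” effect the bullet relies on.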
folklore: text/language
• “text” vs. “language”
• synonyms, rearranging text (vs. canonical form), distances between key words, variation of letter distributions between key words, number of words per class (e.g. verbs, nouns)
• syntax/semantic tree surgeries
• semantic watermarking
• “stego Turing test”: “can a computer watermark natural language automatically?”

folklore: code/software
• code: register allocation/use
• code: order of push/pop of registers
• code: hidden values in low/high order bytes
• algorithms: runtime structures (number -> graph -> structure at runtime)
• code: obfuscation/runtime tamperproofing
• code/algorithm: inherent part of behavior (e.g. “easter egg”: code activated after unusual input)
• code: “guarding”
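A minimal sketch of the synonym idea, assuming a tiny hypothetical synonym dictionary in which the position of the chosen word inside its class encodes one bit. A keyed HMAC pad hides which choice means what, and each watermark bit is repeated over several word slots and recovered by majority vote:

```python
import hashlib
import hmac

# Hypothetical synonym classes; index 0 or 1 inside a pair is the encoded choice.
SYNONYMS = [("big", "large"), ("quick", "fast"), ("begin", "start"), ("help", "assist")]
LOOKUP = {w: (cls, pos) for cls, pair in enumerate(SYNONYMS) for pos, w in enumerate(pair)}


def _pad(key: bytes, slot: int) -> int:
    """One secret pseudo-random bit per synonym slot (keyed HMAC)."""
    return hmac.new(key, slot.to_bytes(4, "big"), hashlib.sha256).digest()[0] & 1


def embed_text(words: list[str], bits: list[int], key: bytes) -> list[str]:
    """Replace each synonym with the class member that encodes its slot's bit."""
    out, slot = [], 0
    for w in words:
        if w in LOOKUP:
            cls, _ = LOOKUP[w]
            choice = bits[slot % len(bits)] ^ _pad(key, slot)
            out.append(SYNONYMS[cls][choice])
            slot += 1
        else:
            out.append(w)
    return out


def extract_text(words: list[str], n_bits: int, key: bytes) -> list[int]:
    """Majority vote per watermark bit over all slots that carry it."""
    votes = [[0, 0] for _ in range(n_bits)]
    slot = 0
    for w in words:
        if w in LOOKUP:
            _, pos = LOOKUP[w]
            votes[slot % n_bits][pos ^ _pad(key, slot)] += 1
            slot += 1
    return [int(v[1] > v[0]) for v in votes]
```

Because slots are counted in reading order, inserting or deleting a synonym shifts every later slot; that fragility is one reason the slide lists canonical forms and syntax/semantic tree surgeries as alternatives.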
folklore: media watermarking specifics
Bandwidth comes from exploiting limitations of the Human Sensory System and the associated media noise channels.
What about future attackers in A.D. 3000++?
(need fundamental theoretical guarantees of encoding power)

our research: non-media watermarking
• Generalize:
  • define/formalize a more general model (no FFT! ;)
  • develop generic watermarking techniques
  • define assessment metrics for the model elements
• Non-media:
  • develop variations of the generic model (above) for watermarking structured content
  • amplify the power of domain-specific marking methods
  • structures (e.g. numbers, documents, MLs, text)
why?
• outsourcing of commercial data
• (X/HT)ML: SOAP, web content
• software meta-descriptions
• B2B interactions
• stock data sharing
• customer buying-pattern data
• financial analysis data
• it’s fun!

buzz: XML Application Design & Implementation
[Figure: a business model’s data (stock market trends data, web pages, etc.: any data with structure) is captured as an XML description that combines the content (“what”) with a DTD (“how”), enabling interoperability.]
issues: model
• Usability
• Domains
• Change in usability
• Vicinities
• Watermark
• Algorithm
• Attack
• Power
• Domain Desiderata
• Information Theory of Structures

issues: usability
Idea: the same object put to different uses (“usability domains”) has a different value for each of these uses (“usability”) and associated permissible distortion bounds (“allowable change in usability”), e.g. the same picture containing different objects of differing interest to different people.
[Figure: the usability vicinity of an object O within a usability domain: the objects O' whose change in usability relative to O is at most ∆u_max.]
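One way to make the usability vicinity concrete, assuming each usability domain can be modeled as a numeric usability function with its own bound ∆u_max, is to accept a marked object O' only if |u(O') - u(O)| ≤ ∆u_max holds in every domain. The domain names and functions below are hypothetical:

```python
from statistics import fmean
from typing import Callable, Sequence

Usability = Callable[[Sequence[float]], float]


def in_vicinity(
    original: Sequence[float],
    marked: Sequence[float],
    domains: dict[str, tuple[Usability, float]],
) -> bool:
    """True if |u(O') - u(O)| <= du_max holds in every usability domain."""
    return all(
        abs(u(marked) - u(original)) <= du_max for u, du_max in domains.values()
    )


# Hypothetical usability domains for a small numeric data set.
DOMAINS = {
    "average analysis": (fmean, 0.5),
    "peak detection": (max, 1.0),
}
```

With O = [10.0, 12.0, 14.0], the candidate [10.2, 11.9, 14.4] stays in the vicinity (mean moves by about 0.17, peak by 0.4), while [10.0, 12.0, 16.0] leaves it because its mean moves by about 0.67.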
issues: generic challenge
The main challenge in watermarking lies in keeping the watermarked object within close vicinity of the original object in all considered usability domains (i.e. given a set of usability domains and associated permissible changes in usability), while maximizing the power metric of the application.

issues: key pre-commitment
[Figure: embedding O' = wm(O, w, k); detection det(O', k) = w.]
Given a data domain D, an object O in D and a watermarking algorithm wm, is there any way to find a key k that will yield a desired mark w in the unmarked O, i.e. det(O, k) = w? In other words, for the given domain and algorithm class: “can we torture the data until it confesses?”
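The risk behind this question can be shown with a deliberately weak toy detector that reads only a few key-selected least significant bits: an attacker simply tries keys until the unmarked data “confesses” a chosen mark. Both functions are hypothetical illustrations, not a scheme from the slides:

```python
import random


def detect_bits(data: list[int], key: int, n_bits: int) -> list[int]:
    """Toy detector: read LSBs at key-selected positions (illustrative only)."""
    positions = random.Random(key).sample(range(len(data)), n_bits)
    return [data[p] & 1 for p in positions]


def torture(data: list[int], wanted: list[int], max_keys: int = 1 << 16):
    """Search for a key under which unmarked `data` appears to carry `wanted`."""
    for key in range(max_keys):
        if detect_bits(data, key, len(wanted)) == wanted:
            return key
    return None
```

For an 8-bit mark the search succeeds after roughly 2^8 = 256 keys on average; in general an n-bit check costs about 2^n tries, so a detector that inspects few bits offers no real evidence unless the key is fixed before the data is examined.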
hypertext
aggregates: initial ideas
• Structure -> what about “any” structure?
• Value in structure and content
• Node/item labeling (TCL)
• Attacks -> tolerant labeling
• Resilience -> partitioning
• Semantic partitioning
• Primitive watermark: noise injection
• Resilience -> hierarchical watermarking

aggregates: challenges
• Lack of inherent structural noise
• New transform domain (content and structural)
• New data types
• Many different data types
• Structured vs. non-structured
• Isolate the general model from data domain specifics
aggregates: challenging properties
[Figure: structured data is split into content and structure, which are watermarked separately (content watermarking, structure watermarking) into watermarked structured data with modified content and modified structure; the annotations contrast low change tolerance (low bandwidth, high fragility) with variable change tolerance (variable bandwidth, higher resilience).]

aggregates: attacks
• A1: node elimination (subtractive)
• A2: inter-node relation elimination
• A3: value-preserving partitioning
• A4: node content altering
• A5: addition of fake nodes
• etc.!
aggregates: primitive labeling
Labels are location- and content-aware: they depend on both the topology and the content of a node (“angry hashes”).

aggregates: “angry/content hashes”
angry hash(content) = a function of the content (specific to it) that tolerates “minor” changes to the content, where “minor” is measured in terms of usability (e.g. for a set of integers, the most significant bits, with the number of bits chosen so that the resulting hash values are maximally distinct).
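A minimal sketch of both ideas, under one reading of the integer example: keep the fewest most significant bits that still separate the items as well as the full values do, then build each node's label from its content hash plus its children's labels. Fixed-width unsigned integers, SHA-256 and sorted children (so sibling order does not matter) are all assumptions:

```python
import hashlib


def msb_bits_needed(values: list[int], width: int = 32) -> int:
    """Fewest most significant bits that keep the values maximally distinct.

    Keeping only high-order bits tolerates noise in the low-order bits, but
    enough of them must be kept to tell the items apart.
    """
    target = len(set(values))
    for b in range(1, width + 1):
        if len({v >> (width - b) for v in values}) == target:
            return b
    return width


def angry_hash(value: int, b: int, width: int = 32) -> int:
    """Content hash that tolerates small changes not carrying into the top b bits."""
    return value >> (width - b)


def node_label(content: int, child_labels: list[str], b: int, width: int = 32) -> str:
    """Label depending on both the node's content and its subtree (topology).

    Child labels are sorted, so reordering siblings does not change the label.
    """
    h = hashlib.sha256(str(angry_hash(content, b, width)).encode())
    for child in sorted(child_labels):
        h.update(b"|" + child.encode())
    return h.hexdigest()[:16]
```

For the 16-bit values 1000, 5000 and 9000, four high-order bits already separate all three, so changing 1000 to 1050 leaves its hash, and therefore every ancestor label, unchanged.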
aggregates: tolerant canonical labeling (TCL)
Composite labels of collection items are formed from sets (or confidence intervals) of individual labels obtained after successive training (e.g. surgery on the original graph) and labeling sessions. Each labeling session adjusts itself according to the history of previous sessions.
[Figure: collection data (C) and alteration constraints drive collection labeling training scenarios; training/surgery produces variants C', C'', C''', whose primitive labelings L', L'', L''' are combined into a composite label.]

aggregates: mark amplification
By applying a weak mark to secret subsets of the original collection, the overall power of the marking scheme is effectively amplified.
[Figure: collection -> secret collection subsets -> weak mark -> watermarking algorithm]
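Amplification can be sketched with integer items under these assumptions: an item's label is every bit except its least significant one, a keyed HMAC over the label decides whether the item is in a secret subset and which watermark bit it carries, and the weak mark is that item's LSB. Any single item is trivially flipped, but majority voting over each subset recovers the mark even after a third of all items are altered:

```python
import hashlib
import hmac


def _assigned_bit(key: bytes, label: int, n_bits: int, e: int):
    """Keyed secret subset selection.

    Returns the index of the watermark bit this item carries, or None when the
    item is not selected (about one item in `e` is selected).
    """
    digest = hmac.new(key, label.to_bytes(8, "big", signed=True), hashlib.sha256).digest()
    r = int.from_bytes(digest[:8], "big")
    if r % e:
        return None
    return (r // e) % n_bits


def embed_amplified(values: list[int], bits: list[int], key: bytes, e: int = 2) -> list[int]:
    """Weak mark: set the least significant bit of each selected item.

    The label is value >> 1 (every bit except the LSB), so marking an item
    never changes its label or its subset membership.
    """
    marked = []
    for v in values:
        i = _assigned_bit(key, v >> 1, len(bits), e)
        marked.append(v if i is None else (v & ~1) | bits[i])
    return marked


def detect_amplified(values: list[int], n_bits: int, key: bytes, e: int = 2) -> list[int]:
    """Recover each bit by majority vote over its whole secret subset."""
    votes = [[0, 0] for _ in range(n_bits)]
    for v in values:
        i = _assigned_bit(key, v >> 1, n_bits, e)
        if i is not None:
            votes[i][v & 1] += 1
    return [int(ones > zeros) for zeros, ones in votes]
```

The subset parameter `e` trades distortion for power: a smaller `e` touches more items but gives each bit more votes.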
aggregates: hierarchical watermarking
[Figure: hierarchical watermarking over a tree of numbered, labeled nodes.]

numeric sets
Problem: Given a set of numbers N, a set of local and global allowable distortion bounds T, and a set of keys K, determine the watermarked version N' of N such that all bounds in T are satisfied and N' carries enough watermark power.
Question: how much is “enough”?
Idea: use/alter global numeric properties (within the distortion limits T) as bandwidth channels (e.g. confidence intervals), together with secret subset selection.
numeric sets: ideas
• Numeric set
  • Semantics
  • Structure
• Labeling
  • Normalized distance from mean
  • Most important bits of item
• Weak mark
  • Confidence interval violators
  • Normalized distance from mean
• Amplification: keyed subset selection

DBMS: challenges
• New transforms
• Views and data mining
• Preservation of the relational model
• Preservation of consistency
• Numeric vs. alphanumeric vs. binary
• Attribute semantics awareness
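Labeling by normalized distance from the mean has a useful property: (x - mean) / stdev does not change when every value undergoes the same linear change x -> a·x + b with a > 0. A minimal sketch, assuming population statistics and a hypothetical quantization step of 1/q standard deviations:

```python
from statistics import fmean, pstdev


def distance_labels(values: list[float], q: int = 4) -> list[int]:
    """Label items by their quantized normalized distance from the mean.

    (x - mean) / stdev does not change under x -> a * x + b with a > 0, so the
    labels survive such linear changes; q sets the step (1/q of a stdev).
    """
    mu, sigma = fmean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("need at least two distinct values")
    return [round((x - mu) / sigma * q) for x in values]
```

For [10, 20, 30, 40, 50] the labels are [-6, -3, 0, 3, 6], both after rescaling to 3x + 7 and after nudging 20 to 20.01. Items that fall close to a rounding boundary can still flip, which is one motivation for the confidence-interval style composite labels of TCL.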
DBMS: challenges
[Figure: DB content (core content, meta-content) and structure (core structure, hidden structure, e.g. statistical properties, association rules) are watermarked separately (content watermarking, structure watermarking) into a watermarked DB with the same components; the annotations range from high and medium change tolerance (variable bandwidth, higher resilience) to low change tolerance (lower bandwidth, high fragility).]

DBMS: required properties (details)
• Resilience to:
  • new consistent data (provide bounds)
  • transforms: projection/join/selection/linear changes
  • simple row swapping
• Minimize quantitative change (number of tuples)
• Should be detectable from most data views (amount of data needed to detect the watermark)
• Detect the watermark without the original data
• Maintain relation chains (e.g. foreign keys)
• Maintain semantics (e.g. 20 yrs. vs. 21 yrs.)
DBMS: initial ideas
• Translation to canonical form (sorting/tree expansion)
• Semantics check (type descriptors)
• Determine available bandwidth (modifiable objects & change tolerance levels)
• Propagate permissible modification bounds (error bounds, permissible structural surgery boundaries)
• Embed watermark within available bandwidth
• Alter general statistics and confidence intervals

generalizing: “collections”
• “Collection” = items and structure/patterns
• Keyed Tolerant Canonical Labeling (TCL)
• Item Content Hashing (“angry hashes”)
• Weak Watermarks
• Power Amplification by secret subset selection
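Translation to canonical form (sorting/tree expansion) is what makes later steps immune to reordering such as simple row swapping. A minimal sketch for nested dicts and lists with string keys; treating every list as an unordered collection is an assumption that suits the rows of a relation but not ordered content:

```python
import json


def canonical(obj) -> str:
    """Translate nested dicts/lists into an order-independent canonical string.

    Dict keys are sorted; list items are canonicalized and then sorted, so that
    reordering rows or sibling elements yields the same form.
    """
    if isinstance(obj, dict):
        return "{" + ",".join(
            json.dumps(k) + ":" + canonical(v) for k, v in sorted(obj.items())
        ) + "}"
    if isinstance(obj, list):
        return "[" + ",".join(sorted(canonical(x) for x in obj)) + "]"
    return json.dumps(obj)
```

Two tables holding the same rows in a different order, with columns listed in a different order, map to the same string, so labels computed from the canonical form are unaffected by row swapping.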
“Space Odyssey” (HAL == IBM?)
does this still have value?

refs
[1] Ingemar Cox, Matthew Miller, Jeffrey Bloom, “Digital Watermarking”, Morgan Kaufmann, Oct. 2001, ISBN 1558607145.
[2] Stefan Katzenbeisser, Fabien Petitcolas (editors), “Information Hiding Techniques for Steganography and Digital Watermarking”, Artech House, Jan. 2000, ISBN 1580530354.
[3] Neil Johnson, Zoran Duric, Sushil Jajodia, “Information Hiding: Steganography and Watermarking - Attacks and Countermeasures”, Kluwer Academic, Feb. 2001, ISBN 0792372042.
[4] Elizabeth Ferril, Matthew Moyer, “A Survey of Digital Watermarking”, Feb. 25, 1999.
[5] Petitcolas, Anderson, Kuhn, “Attacks on Copyright Marking Systems”, in David Aucsmith (ed.), Second Workshop on Information Hiding, Lecture Notes in Computer Science vol. 1525, Portland, Oregon, 1998, pp. 218-238.
[6] Anderson, Petitcolas, “On the Limits of Steganography”, IEEE Journal on Selected Areas in Communications, 16(4):474-481, May 1998, Special Issue on Copyright & Privacy Protection.
[7] Petitcolas, Anderson, Kuhn, “Information Hiding - A Survey”, Proc. of the IEEE, special issue on protection of multimedia content, 87(7):1062-1078, July 1999.
[8] Palsberg, Krishnaswamy, Kwon, Ma, Shao, Zhang, “Experience with Software Watermarking”, CERIAS and Dept. of Computer Sciences, Purdue, 2000.
[9] M. Atallah et al., “Natural Language Watermarking: Design, Analysis and Proof-of-Concept Implementation”, Proc. of the 4th International Information Hiding Workshop, April 2001, Springer Verlag.