Practical Data Privacy (Final Release)
9781098129460
Between major privacy regulations like the GDPR and CCPA and expensive and notorious data breaches, there has never been
English
Pages 344
Year 2023
Table of contents:
Foreword
Preface
What Is Data Privacy?
Who Should Read This Book
Privacy Engineering
Why I Wrote This Book
Navigating This Book
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
Acknowledgments
1. Data Governance and Simple Privacy Approaches
Data Governance: What Is It?
Identifying Sensitive Data
Identifying PII
Documenting Data for Use
Basic Data Documentation
Finding and Documenting Unknown Data
Tracking Data Lineage
Data Version Control
Basic Privacy: Pseudonymization for Privacy by Design
Summary
2. Anonymization
What Is Anonymization?
Defining Differential Privacy
Understanding Epsilon: What Is Privacy Loss?
What Differential Privacy Guarantees, and What It Doesn’t
Understanding Differential Privacy
Differential Privacy in Practice: Anonymizing the US Census
Differential Privacy with the Laplace Mechanism
Differential Privacy with Laplace: A Naive Attempt
Sensitivity and Error
Privacy Budgets and Composition
Exploring Other Mechanisms: Gaussian Noise for Differential Privacy
Comparing Laplace and Gaussian Noise
Real-World Differential Privacy: Debiasing Noisy Results
Sensitivity and Privacy Units
What About k-Anonymity?
Summary
3. Building Privacy into Data Pipelines
How to Build Privacy into Data Pipelines
Design Appropriate Privacy Measures
Meet Users Where They Are
Engineer Privacy In
Test and Verify
Engineering Privacy and Data Governance into Pipelines
An Example Data Sharing Workflow
Adding Provenance and Consent Information to Collection
Using Differential Privacy Libraries in Pipelines
Collecting Data Anonymously
Apple’s Differentially Private Data Collection
Why Chrome’s Original Differential Privacy Collection Died
Working with Data Engineering Team and Leadership
Share Responsibility
Create Workflows with Documentation and Privacy
Privacy as a Core Value Proposition
Summary
4. Privacy Attacks
Privacy Attacks: Analyzing Common Attack Vectors
Netflix Prize Attack
Linkage Attacks
Singling Out Attacks
Strava Heat Map Attack
Membership Inference Attack
Inferring Sensitive Attributes
Other Model Leakage Attacks: Memorization
Model-Stealing Attacks
Attacks Against Privacy Protocols
Data Security
Access Control
Data Loss Prevention
Extra Security Controls
Threat Modeling and Incident Response
Probabilistic Reasoning About Attacks
An Average Attacker
Measuring Risk, Assessing Threats
Data Security Mitigations
Applying Web Security Basics
Protecting Training Data and Models
Staying Informed: Learning About New Attacks
Summary
5. Privacy-Aware Machine Learning and Data Science
Using Privacy-Preserving Techniques in Machine Learning
Privacy-Preserving Techniques in a Typical Data Science or ML Workflow
Privacy-Preserving Machine Learning in the Wild
Differentially Private Stochastic Gradient Descent
Open Source Libraries for PPML
Engineering Differentially Private Features
Applying Simpler Methods
Documenting Your Machine Learning
Other Ways of Protecting Privacy in Machine Learning
Architecting Privacy in Data and Machine Learning Projects
Understanding Your Data Privacy Needs
Monitoring Privacy
Summary
6. Federated Learning and Data Science
Distributed Data
Why Use Distributed Data?
How Does Distributed Data Analysis Work?
Privacy-Charging Distributed Data with Differential Privacy
Federated Learning
Federated Learning: A Brief History
Why, When, and How to Use Federated Learning
Architecting Federated Systems
Example Deployment
Security Threats
Use Cases
Deploying Federated Libraries and Tools
Open Source Federated Libraries
Flower: Unified OSS for Federated Learning Libraries
A Federated Data Science Future Outlook
Summary
7. Encrypted Computation
What Is Encrypted Computation?
When to Use Encrypted Computation
Privacy Versus Secrecy
Threat Modeling
Types of Encrypted Computation
Secure Multiparty Computation
Homomorphic Encryption
Real-World Encrypted Computation
Private Set Intersection
Private Join and Compute
Secure Aggregation
Encrypted Machine Learning
Getting Started with PSI and Moose
Imagining a World with Secure Data Sharing
Summary
8. Navigating the Legal Side of Privacy
GDPR: An Overview
Fundamental Data Rights Under GDPR
Data Controller Versus Data Processor
Applying Privacy-Enhancing Technologies for GDPR
GDPR’s Data Protection Impact Assessment: Agile and Iterative Risk Assessments
Right to an Explanation: Interpretability and Privacy
California Consumer Privacy Act (CCPA)
Applying PETs for CCPA
Other Regulations: HIPAA, LGPD, PIPL, and More!
Internal Policies and Contracts
Reading Privacy Policies and Terms of Service
Reading Data Processing Agreements
Reading Policies, Guidelines, and Contracts
Working with Legal Professionals
Adhering to Contractual Agreements and Contract Law
Interpreting Data Protection Regulations
Asking for Help and Advice
Working Together on Shared Definitions and Ideas
Providing Technical Guidance
Data Governance 2.0
What Is Federated Governance?
Supporting a Culture of Experimentation
Documentation That Works, Platforms with PETs
Summary
9. Privacy and Practicality Considerations
Getting Practical: Managing Privacy and Security Risk
Evaluating and Managing Privacy Risk
Embracing Uncertainty While Planning for the Future
Practical Privacy Technology: Use-Case Analysis
Federated Marketing: Guiding Marketing Campaigns with Privacy Built In
Public-Private Partnerships: Sharing Data for Public Health
Anonymized Machine Learning: Looking for GDPR Compliance in Iterative Training Settings
Business-to-Business Application: Hands-Off Data
Step-by-Step: How to Integrate and Automate Privacy in ML
Iterative Discovery
Documenting Privacy Requirements
Evaluating and Combining Approaches
Shifting to Automation
Making Privacy Normal
Embracing the Future: Working with Research Libraries and Teams
Working with External Researchers
Investing in Internal Research
Summary
10. Frequently Asked Questions (and Their Answers!)
Encrypted Computation and Confidential Computing
Is Secure Computation Quantum-Safe?
Can I Use Enclaves to Solve Data Privacy or Data Secrecy Problems?
What If I Need to Protect the Privacy of the Client or User Who Sends the Database Query or Request?
Do Clean Rooms or Remote Data Analysis/Access Solve My Privacy Problem?
I Want to Provide Perfect Privacy or Perfect Secrecy. Is That Possible?
How Do I Determine That an Encrypted Computation Is Secure Enough?
If I Want to Use Encrypted Computation, How Do I Manage Key Rotation?
What Is Google’s Privacy Sandbox? Does It Use Encrypted Computation?
Data Governance and Protection Mechanisms
Why Isn’t k-Anonymity Enough?
I Don’t Think Differential Privacy Works for My Use Case. What Do I Do?
Can I Use Synthetic Data to Solve Privacy Problems?
How Should Data Be Shared Ethically or What Are Alternatives to Selling Data?
How Can I Find All the Private Information That I Need to Protect?
I Dropped the Personal Identifiers, so the Data Is Safe Now, Right?
How Do I Reason About Data I Released in the Past?
I’m Working on a BI Dashboard or Visualization. How Do I Make It Privacy-Friendly?
Who Makes Privacy Engineering Decisions? How Do I Fit Privacy Engineering into My Organization?
What Skills or Background Do I Need to Become a Privacy Engineer?
Why Didn’t You Mention (Insert Technology or Company Here)? How Do I Learn More? Help!
GDPR and Data Protection Regulations
Do I Really Need to Use Differential Privacy to Remove Data from GDPR/CPRA/LGPD/etc. Requirements?
I Heard That I Can Use Personal Data Under GDPR for Legitimate Interest. Is That Correct?
I Want to Comply with Schrems II and Transatlantic Data Flows. What Are Possible Solutions?
Personal Choices and Social Privacy
What Email Provider, Browser, and Application Should I Use if I Care About My Privacy?
My Friend Has an Automated Home or Phone Assistant. I Don’t Want It Listening to Me. What Should I Do?
I Gave Up on Privacy a Long Time Ago. I Have Nothing to Hide. Why Should I Change?
Can I Just Sell My Own Data to Companies?
I Like Personalized Ads. Why Don’t You?
Is (Fill in the Blank) Listening to Me? What Should I Do About It?
Summary
11. Go Forth and Engineer Privacy!
Surveillance Capitalism and Data Science
Gig Workers and Surveillance at Work
Surveillance for “Security”
Luxury Surveillance
Vast Data Collection and Society
Machine Learning as Data Laundering
Disinformation and Misinformation
Fighting Back
Researching, Documenting, Hacking, Learning
Collectivizing Data
Regulation Fining Back
Supporting Community Work
Privacy Champions
Your Privacy-Aware Multitool
Building Trustworthy Machine Learning Systems
Privacy by Design
Privacy and Power
Tschüss
Index
About the Author