Streaming Data Mesh (Final Release) 9781098130725, 9781098135973

Data lakes and warehouses have become increasingly fragile, costly, and difficult to maintain as data gets bigger and mo

Language: English. Pages: 223. Year: 2023.

Table of contents:
Preface
Who Should Read This Book
Why We Wrote This Book
Navigating This Book
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
Hubert
Stephen
1. Data Mesh Introduction
Data Divide
Data Mesh Pillars
Data Ownership
Data as a Product
Federated Computational Data Governance
Self-Service Data Platform
Data Mesh Diagram
Other Similar Architectural Patterns
Data Fabric
Data Gateways and Data Services
Data Democratization
Data Virtualization
Focusing on Implementation
Apache Kafka
AsyncAPI
2. Streaming Data Mesh Introduction
The Streaming Advantage
Streaming Enables Real-Time Use Cases
Streaming Enables Data Optimization Advantages
Reverse ETL
The Kappa Architecture
Lambda Architecture Introduction
Kappa Architecture Introduction
Summary
3. Domain Ownership
Identifying Domains
Discernible Domains
Geographic Regions
Hybrid Architecture
Multicloud
Avoiding Ambiguous Domains
Domain-Driven Design
Domain Model
Domain Logic
Bounded Context
The Ubiquitous Language
Data Mesh Domain Roles
Data Product Engineer
Data Product Owner or Data Steward
Streaming Data Mesh Tools and Platforms to Consider
Domain Charge-Backs
Summary
4. Streaming Data Products
Defining Data Product Requirements
Identifying Data Product Derivatives
Derivatives from Other Domains
Ingesting Data Product Derivatives with Kafka Connect
Consumability
Synchronous Data Sources
Asynchronous Data Sources and Change Data Capture
Debezium Connectors
Transforming Data Derivatives to Data Products
Data Standardization
Protecting Sensitive Information
SQL
Extract, Transform, and Load
Publishing Data Products with AsyncAPI
Registering the Streaming Data Product
Building an AsyncAPI YAML Document
Assigning Data Tags
Versioning
Monitoring
Summary
5. Federated Computational Data Governance
Data Governance in a Streaming Data Mesh
Data Lineage Graph
Streaming Data Catalog to Organize Data Products
Metadata
Schemas
Lineage
Security
Scalability
Generating the Data Product Page from AsyncAPI
Apicurio Registry
Access Workflow
Centralized Versus Decentralized
Centralized Engineers
Decentralized (Domain) Engineers
Summary
6. Self-Service Data Infrastructure
Streaming Data Mesh CLI
Resource-Related Commands
Cluster-Related Commands
Topic-Related Commands
The domain Commands
The connect Commands
The streaming Commands
Publishing a Streaming Data Product
Data Governance-Related Services
Security Services
Standards Services
Lineage Services
SaaS Services and APIs
Summary
7. Architecting a Streaming Data Mesh
Infrastructure
Two Architecture Solutions
Dedicated Infrastructure
Multitenant Infrastructure
Streaming Data Mesh Central Architecture
The Domain Agent (aka Sidecar)
Data Plane
Control Plane
Summary
8. Building a Decentralized Data Team
The Traditional Data Warehouse Structure
Introducing the Decentralized Team Structure
Empowering People
Working Processes
Fostering Collaboration
Data-Driven Automation
New Roles in Data Domains
New Roles in the Data Plane
New Roles in Data Science and Business Intelligence
9. Feature Stores
Separating Data Engineering from Data Science
Online and Offline Data Stores
Apache Feast Introduction
Summary
10. Streaming Data Mesh in Practice
Streaming Data Mesh Example
Deploying an On-Premises Streaming Data Mesh
Installing a Connector
Deploying Clickstream Connector and Auto-Creating Tables
Deploying the Debezium Postgres CDC Connector
Enrichment of Streaming Data
Publishing the Data Product
Consuming Streaming Data Products
Fully Managed SaaS Services
Summary and Considerations
Index
About the Authors
