Fundamentals of Analytics Engineering
9781837636457
Gain a holistic understanding of the analytics engineering lifecycle by integrating principles from both data analysis a
English
Pages 455
Year 2024
Table of contents:
Fundamentals of Analytics Engineering
Foreword
Contributors
About the authors
About the reviewers
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Share Your Thoughts
Download a free PDF copy of this book
Prologue
Part 1: Introduction to Analytics Engineering
1
What Is Analytics Engineering?
Introducing analytics engineering
Defining analytics engineering
Why do we need analytics engineering?
A supermarket analogy
The shift from ETL to ELT
The difference between analytics engineers, data analysts, and data engineers
Summary
2
The Modern Data Stack
Understanding a Modern Data Stack
Explaining three key differentiators versus legacy stacks
Lowering technical barriers with a SQL-first approach
Improving infrastructure efficiency with cloud-native systems
Simplifying implementation and maintenance with managed and modular solutions
Discussing the advantages and disadvantages of the MDS
Summary
Part 2: Building Data Pipelines
3
Data Ingestion
Digging into the problem of moving data between two systems
The source of all problems
Understanding the eight essential steps of a data ingestion pipeline
Trigger
Connection
State management
Data extraction
Transformations
Validation and data quality
Loading
Archiving and retention
Managing the quality and scalability of data ingestion pipelines – the three key topics
Scalability and resilience
Monitoring, logging, and alerting
Governance
Working with data ingestion – an example pipeline
Summary
4
Data Warehousing
Uncovering the evolution of data warehousing
The problem with transactional databases
The history of data warehouses
Moving to the cloud
Benefits of cloud versus on-premises data warehouses
Cloud data warehouse users – no one size fits all
Building blocks of a cloud data warehouse
Compute
Knowing the market leaders in cloud data warehousing
Amazon Redshift
Google BigQuery
Snowflake
Databricks
Use case – choosing the right cloud data warehouse
Managed versus self-hosted data warehouses
Summary
5
Data Modeling
The importance of data models
Completeness
Enforcement of business rules
Minimizing redundancy
Data reusability
Stability and flexibility
Elegance
Communication
Integration
Potential trade-offs
The elephant in the room – performance
Designing your data model
Data modeling techniques
Bill Inmon and relational modeling
Ralph Kimball and dimensional modeling
Daniel Linstedt and Data Vault
Comparison of the different data models
Choosing a data model
Summary
6
Transforming Data
Transforming data – the foundation of analytics work
A key step in the data value chain
Challenges in transforming data
Design choices
Where to apply transformations
Specify your data model
Layering transformations
Data transformation best practices
Readability and reusability first, optimization second
Modularity
Other best practices
An example of writing modular code
Tools that facilitate data transformations
Types of transformation tools
Considerations
Summary
7
Serving Data
Exposing data using dashboarding and BI tools
Dashboards
Spreadsheets
Programming environments
Low-code tools
Reverse ETL
Valuable
Usable
Sensible
Serving data – four key topics
Self-serving analytics and report factories
Interactive and static reports
Actionable and vanity metrics
Reusability and bespoke processes
Summary
Part 3: Hands-On Guide to Building a Data Platform
8
Hands-On Analytics Engineering
Technical requirements
Understanding the Stroopwafelshop use case
Business objectives, metrics, and KPIs
Looking at the data
The thing about spreadsheets
What about BI tools?
The tooling
Preparing Google Cloud
ELT using Airbyte Cloud
Loading the Stroopwafelshop data using Airbyte Cloud
Modeling data using dbt Cloud
The shortcomings of conventional analytics
The role of dbt in analytics engineering
Setting up dbt Cloud
Data marts
Additional dbt features
Visualizing data with Tableau
Why Tableau?
Selecting the KPIs
First visualization
Creating measures
Creating the store growth dashboard
What’s next?
Summary
Part 4: DataOps
9
Data Quality and Observability
Understanding the problem of data quality at the source, in transformations, and in data governance
Data quality issues in source systems
Data quality issues in data infrastructure and data pipelines
How data governance impacts data quality
Finding solutions to data quality issues – observability, data catalogs, and semantic layers
Using observability to improve your data quality
The benefits of data catalogs for data quality
Improving data quality with a semantic layer
Summary
10
Writing Code in a Team
Identifying the responsibilities of team members
Tracking tasks and issues
Tools for issue and task tracking
Clear task definition
Categorization and tagging
Managing versions with version control
Working with Git
Git branching
Development workflow for analytics engineers
Working with coding standards
PEP8
ANSI
Linters
Pre-commit hooks
Reviewing code
Pull requests – The four eyes principle
Continuous integration/continuous deployment
Documenting code
Documenting code in dbt
Code comments
READMEs
Documentation on getting started
Conceptual documentation
Working with containers
Refactoring and technical debt
Summary
11
Automating Workflows
Introducing DataOps
Orchestrating data pipelines
Designing an automated workflow – considerations
dbt Cloud
Airflow
Continuous integration
Integration
Continuous
Handling integration issues
Automating testing with a CI pipeline
Continuous deployment
The CD pipeline
Slim CI/CD
Configuring CI/CD in dbt Cloud
Continuous delivery
Continuous delivery versus continuous deployment
Summary
Part 5: Data Strategy
12
Driving Business Adoption
Defining analytics translation
The analytics value chain
Scoping analytics use cases
Identifying stakeholders
Ideating analytics use cases
Prioritizing use cases
Ensuring business adoption
Working incrementally
Gathering feedback
Knowing when to stop developing
Communicating your results
Documenting business logic
Summary
13
Data Governance
Understanding data governance
The objective of data governance
Applying data governance in analytics engineering
Defining data ownership
Data quality and integrity
Managing data assets
Training, enablement, and best practices
Data definitions
Addressing critical areas for seamless data governance
Resistance to change and adoption
Engaging stakeholders and fostering collaboration
Establishing a data governance roadmap
Summary
14
Epilogue
Reviewing the fundamental insights – what you’ve learned so far
Making your career future-proof – how to take it further
Tip #1 – keep learning and developing your skills
Tip #2 – network and engage with the community
Tip #3 – showcase your work and build a portfolio
Closing remarks
Index
Other Books You May Enjoy
Packt is searching for authors like you
Share Your Thoughts
Download a free PDF copy of this book