Table of contents:

Preface
    Conventions Used in This Book
    Using Code Examples
    O’Reilly Online Learning
    How to Contact Us
    Acknowledgments

I. Data Analytics Techniques

1. So What? Creating Value with Data Science
    What Is Value?
    What: Understanding the Business
    So What: The Gist of Value Creation in DS
    Now What: Be a Go-Getter
    Measuring Value
    Key Takeaways
    Further Reading

2. Metrics Design
    Desirable Properties That Metrics Should Have
    Measurable
    Actionable
    Relevance
    Timeliness
    Metrics Decomposition
    Funnel Analytics
    Stock-Flow Decompositions
    P×Q-Type Decompositions
    Example: Another Revenue Decomposition
    Example: Marketplaces
    Key Takeaways
    Further Reading

3. Growth Decompositions: Understanding Tailwinds and Headwinds
    Why Growth Decompositions?
    Additive Decomposition
    Example
    Interpretation and Use Cases
    Multiplicative Decomposition
    Example
    Interpretation
    Mix-Rate Decompositions
    Example
    Interpretation
    Mathematical Derivations
    Additive Decomposition
    Multiplicative Decomposition
    Mix-Rate Decomposition
    Key Takeaways
    Further Reading

4. 2×2 Designs
    The Case for Simplification
    What’s a 2×2 Design?
    Example: Test a Model and a New Feature
    Example: Understanding User Behavior
    Example: Credit Origination and Acceptance
    Example: Prioritizing Your Workflow
    Key Takeaways
    Further Reading

5. Building Business Cases
    Some Principles to Construct Business Cases
    Example: Proactive Retention Strategy
    Fraud Prevention
    Purchasing External Datasets
    Working on a Data Science Project
    Key Takeaways
    Further Reading

6. What’s in a Lift?
    Lifts Defined
    Example: Classifier Model
    Self-Selection and Survivorship Biases
    Other Use Cases for Lifts
    Key Takeaways
    Further Reading

7. Narratives
    What’s in a Narrative: Telling a Story with Your Data
    Clear and to the Point
    Credible
    Memorable
    Actionable
    Building a Narrative
    Science as Storytelling
    What, So What, and Now What?
    What?
    So what?
    Now what?
    The Last Mile
    Writing TL;DRs
    Tips to Write Memorable TL;DRs
    Example: Writing a TL;DR for This Chapter
    Delivering Powerful Elevator Pitches
    Presenting Your Narrative
    Key Takeaways
    Further Reading

8. Datavis: Choosing the Right Plot to Deliver a Message
    Some Useful and Not-So-Used Data Visualizations
    Bar Versus Line Plots
    Slopegraphs
    Waterfall Charts
    Scatterplot Smoothers
    Plotting Distributions
    General Recommendations
    Find the Right Datavis for Your Message
    Choose Your Colors Wisely
    Different Dimensions in a Plot
    Aim for a Large Enough Data-Ink Ratio
    Customization Versus Semiautomation
    Get the Font Size Right from the Beginning
    Interactive or Not
    Stay Simple
    Start by Explaining the Plot
    Key Takeaways
    Further Reading

II. Machine Learning

9. Simulation and Bootstrapping
    Basics of Simulation
    Simulating a Linear Model and Linear Regression
    What Are Partial Dependence Plots?
    Omitted Variable Bias
    Simulating Classification Problems
    Latent Variable Models
    Comparing Different Algorithms
    Bootstrapping
    Key Takeaways
    Further Reading

10. Linear Regression: Going Back to Basics
    What’s in a Coefficient?
    The Frisch-Waugh-Lovell Theorem
    Why Should You Care About FWL?
    Confounders
    Additional Variables
    The Central Role of Variance in ML
    Key Takeaways
    Further Reading

11. Data Leakage
    What Is Data Leakage?
    Outcome Is Also a Feature
    A Function of the Outcome Is Itself a Feature
    Bad Controls
    Mislabeling of a Timestamp
    Multiple Datasets with Sloppy Time Aggregations
    Leakage of Other Information
    Detecting Data Leakage
    Complete Separation
    Windowing Methodology
    Choosing the Length of the Windows
    The Training Stage Mirrors the Scoring Stage
    Implementing the Windowing Methodology
    I Have Leakage: Now What?
    Key Takeaways
    Further Reading

12. Productionizing Models
    What Does “Production Ready” Mean?
    Batch Scores (Offline)
    Real-Time Model Objects
    Data and Model Drift
    Essential Steps in any Production Pipeline
    Get and Transform Data
    Validate Data
    Training and Scoring Stages
    Validate Model and Scores
    Deploy Model and Scores
    Key Takeaways
    Further Reading

13. Storytelling in Machine Learning
    A Holistic View of Storytelling in ML
    Ex Ante and Interim Storytelling
    Creating Hypotheses
    Predicting human behavior
    Predicting system behavior
    Predicting downstream metrics
    Feature Engineering
    Ex Post Storytelling: Opening the Black Box
    Interpretability-Performance Trade-Off
    Linear Regression: Setting a Benchmark
    Feature Importance
    Heatmaps
    Partial Dependence Plots
    Accumulated Local Effects
    Key Takeaways
    Further Reading

14. From Prediction to Decisions
    Dissecting Decision Making
    Simple Decision Rules by Smart Thresholding
    Precision and Recall
    Example: Lead Generation
    Confusion Matrix Optimization
    Key Takeaways
    Further Reading

15. Incrementality: The Holy Grail of Data Science?
    Defining Incrementality
    Causal Reasoning to Improve Prediction
    Causal Reasoning as a Differentiator
    Improved Decision Making
    Confounders and Colliders
    Selection Bias
    Unconfoundedness Assumption
    Breaking Selection Bias: Randomization
    Matching
    Machine Learning and Causal Inference
    Open Source Codebases
    Double Machine Learning
    Key Takeaways
    Further Reading

16. A/B Tests
    What Is an A/B Test?
    Decision Criterion
    Minimum Detectable Effects
    Choosing the Statistical Power, Level, and P
    Estimating the Variance of the Outcome
    Simulations
    Example: Conversion Rates
    Setting the MDE
    Hypotheses Backlog
    Metric
    Hypothesis
    Ranking
    Governance of Experiments
    Key Takeaways
    Further Reading

17. Large Language Models and the Practice of Data Science
    The Current State of AI
    What Do Data Scientists Do?
    Evolving the Data Scientist’s Job Description
    Case Study: A/B Testing
    Case Study: Data Cleansing
    Case Study: Machine Learning
    LLMs and This Book
    Key Takeaways
    Further Reading

Index