Table of Contents
About the Author
About the Technical Reviewer
Introduction
Chapter 1: The Importance of Anomalies and Anomaly Detection Defining Anomalies Outlier Noise vs. Anomalies Diagnosing an Example What If We’re Wrong? Anomalies in the Wild Finance Medicine Sports Analytics A $23 Million Mistake A Persistent Anomaly Web Analytics And Many More Classes of Anomaly Detection Statistical Anomaly Detection Clustering Anomaly Detection Model-Based Anomaly Detection Building an Anomaly Detector Key Goals How Do Humans Handle Anomalies? Known Unknowns Conclusion
Chapter 2: Humans Are Pattern Matchers A Primer on the Gestalt School Key Findings of the Gestalt School Emergence Reification Invariance Multistability Principles Implied in the Key Findings Meaningfulness Conciseness Closure Similarity Good Continuation Figure and Ground Proximity Connectedness Common Region Symmetry Common Fate Synchrony Helping People Find Anomalies Use Color As a Signal Limit Nonmeaningful Information Enable “Connecting the Dots” Conclusion
Chapter 3: Formalizing Anomaly Detection The Importance of Formalization “I’ll Know It When I See It” Isn’t Enough Human Fallibility Marginal Outliers The Limits of Visualization The First Formal Tool: Univariate Analysis Distributions and Histograms The Normal Distribution Mean, Variance, and Standard Deviation Additional Distributions Log-Normal Uniform Cauchy Robustness and the Mean The Susceptibility of Outliers The Median and “Robust” Statistics Beyond the Median: Calculating Percentiles Control Charts Conclusion
Chapter 4: Laying Out the Framework Tools of the Trade Choosing a Programming Language Making Plumbing Choices Reducing Architectural Variables Developing an Initial Framework Battlespace Preparation Framing the API Input and Output Signatures Defining a Common Signature Defining an Outlier Sensitivity and Fraction of Anomalies Single Solution Combined Arms Framing the Solution Containerizing the Solution
Conclusion
Chapter 5: Building a Test Suite Tools of the Trade Unit Test Library Integration Testing Writing Testable Code Keep Methods Separated Emphasize Use Cases Functional or Clean: Your Choice Creating the Initial Tests Unit Tests Integration Tests Conclusion
Chapter 6: Implementing the First Methods A Motivating Example Ensembling As a Technique Sequential Ensembling Independent Ensembling Choosing Between Sequential and Independent Ensembling Implementing the First Checks Standard Deviations from the Mean Median Absolute Deviations from the Median Distance from the Interquartile Range Completing the run_tests() Function Building a Scoreboard Weighting Results Determining Outliers Updating Tests Updating Unit Tests Updating Integration Tests Conclusion
Chapter 7: Extending the Ensemble Adding New Tests Checking for Normality Approaching Normality A Framework for New Tests Grubbs’ Test for Outliers Generalized ESD Test for Outliers Dixon’s Q Test Calling the Tests Updating Tests Updating Unit Tests Updating Integration Tests Multi-peaked Data A Hidden Assumption The Solution: A Sneak Peek Conclusion
Chapter 8: Visualize the Results Building a Plan What Do We Want to Show? How Do We Want to Show It? Developing a Visualization App Getting Started with Streamlit Building the Initial Screen Displaying Results and Details Conclusion
Chapter 9: Clustering and Anomalies What Is Clustering? Common Cluster Terminology K-Means Clustering K-Nearest Neighbors When Clustering Makes Sense Gaussian Mixture Modeling Implementing a Univariate Version Updating Tests Common Problems with Clusters Choosing the Correct Number of Clusters Clustering Is Nondeterministic Alternative Approaches Tree-Based Approaches The Problem with Trees Conclusion
Chapter 10: Connectivity-Based Outlier Factor (COF) Distance or Density?
Local Outlier Factor Connectivity-Based Outlier Factor Introducing Multivariate Support Laying the Groundwork Implementing COF Test and Website Updates Unit Test Updates Integration Test Updates Website Updates Conclusion
Chapter 11: Local Correlation Integral (LOCI) Local Correlation Integral Discovering the Neighborhood Multi-granularity Deviation Factor (MDEF) Multivariate Algorithm Ensembles Ensemble Types COF Combinations Incorporating LOCI Test and Website Updates Unit Test Updates Website Updates Conclusion
Chapter 12: Copula-Based Outlier Detection (COPOD) Copula-Based Outlier Detection What’s a Copula? Intuition Behind COPOD Implementing COPOD Test and Website Updates Unit Test Updates Integration Test Updates Website Updates Conclusion
Chapter 13: Time and Anomalies What Is Time Series? Time Series Changes Our Thinking Autocorrelation Smooth Movement The Nature of Change Data Requirements Time Series Modeling (Weighted) Moving Average Exponential Smoothing Autoregressive Models What Constitutes an Outlier? Local Outlier Behavioral Changes over Time Local Non-outlier in a Global Change Differences from Peer Groups Common Classes of Technique Conclusion
Chapter 14: Change Point Detection What Is Change Point Detection? Benefits of Change Point Detection Change Point Detection with ruptures Dynamic Programming PELT Implementing Change Point Detection Test and Website Updates Unit Tests Integration Tests Website Updates Avenues of Further Improvement Conclusion
Chapter 15: An Introduction to Multi-series Anomaly Detection What Is Multi-series Time Series? Key Aspects of Multi-series Time Series What Needs to Change? What’s the Difference? Leading and Lagging Factors Available Processes Cross-Euclidean Distance Cross-Correlation Coefficient SameTrend (STREND) Common Problems Conclusion
Chapter 16: Standard Deviation of Differences (DIFFSTD) What Is DIFFSTD?
Calculating DIFFSTD Key Assumptions Writing DIFFSTD Series Processing Segmentation Comparing the Norm Determining Outliers Test and Website Updates Unit Tests Integration Tests Website Updates Conclusion
Chapter 17: Symbolic Aggregate Approximation (SAX) What Is SAX? Motifs and Discords Subsequences and Matches Discretizing the Data Implementing SAX Segmentation and Blocking Making SAX Multi-series Scoring Outliers Test and Website Updates Unit and Integration Tests Website Updates Conclusion
Chapter 18: Configuring Azure Cognitive Services Anomaly Detector Gathering Market Intelligence Amazon Web Services: SageMaker Microsoft Azure: Cognitive Services Google Cloud: AI Services Configuring Azure Cognitive Services Set Up an Account Using the Demo Application Conclusion
Chapter 19: Performing a Bake-Off Preparing the Comparison Supervised vs. Unsupervised Learning Choosing Datasets Scoring Results Performing the Bake-Off Accessing Cognitive Services via Python Accessing Our API via Python Dataset Comparisons Lessons Learned Making a Better Anomaly Detector Increasing Robustness Extending the Ensembles Training Parameter Values Conclusion
Appendix
Index