Table of contents:

Cover
Praise for The Shape of Data
Title Page
Copyright
Dedication
About the Authors
Foreword
Acknowledgments

Introduction
    Who Is This Book For?
    About This Book
    Downloading and Installing R
    Installing R Packages
    Getting Help with R
    Support for Python Users
    Summary

Chapter 1: The Geometric Structure of Data
    Machine Learning Categories
    Supervised Learning
    Unsupervised Learning
    Matching Algorithms and Other Machine Learning
    Structured Data
    The Geometry of Dummy Variables
    The Geometry of Numerical Spreadsheets
    The Geometry of Supervised Learning
    Unstructured Data
    Network Data
    Image Data
    Text Data
    Summary

Chapter 2: The Geometric Structure of Networks
    The Basics of Network Theory
    Directed Networks
    Networks in R
    Paths and Distance in a Network
    Network Centrality Metrics
    The Degree of a Vertex
    The Closeness of a Vertex
    The Betweenness of a Vertex
    Eigenvector Centrality
    PageRank Centrality
    Katz Centrality
    Hub and Authority
    Measuring Centrality in an Example Social Network
    Additional Quantities of a Network
    The Diversity of a Vertex
    Triadic Closure
    The Efficiency and Eccentricity of a Vertex
    Forman–Ricci Curvature
    Global Network Metrics
    The Interconnectivity of a Network
    Spreading Processes on a Network
    Spectral Measures of a Network
    Network Models for Real-World Behavior
    Erdös–Renyi Graphs
    Scale-Free Graphs
    Watts–Strogatz Graphs
    Summary

Chapter 3: Network Analysis
    Using Network Data for Supervised Learning
    Making Predictions with Social Media Network Metrics
    Predicting Network Links in Social Media
    Using Network Data for Unsupervised Learning
    Applying Clustering to the Social Media Dataset
    Community Mining in a Network
    Comparing Networks
    Analyzing Spread Through Networks
    Tracking Disease Spread Between Towns
    Tracking Disease Spread Between Windsurfers
    Disrupting Communication and Disease Spread
    Summary

Chapter 4: Network Filtration
    Graph Filtration
    From Graphs to Simplicial Complexes
    Examples of Betti Numbers
    The Euler Characteristic
    Persistent Homology
    Comparison of Networks with Persistent Homology
    Summary

Chapter 5: Geometry in Data Science
    Common Distance Metrics
    Simulating a Small Dataset
    Using Norm-Based Distance Metrics
    Comparing Diagrams, Shapes, and Probability Distributions
    K-Nearest Neighbors with Metric Geometry
    Manifold Learning
    Using Multidimensional Scaling
    Extending Multidimensional Scaling with Isomap
    Capturing Local Properties with Locally Linear Embedding
    Visualizing with t-Distributed Stochastic Neighbor Embedding
    Fractals
    Summary

Chapter 6: Newer Applications of Geometry in Machine Learning
    Working with Nonlinear Spaces
    Introducing dgLARS
    Predicting Depression with dgLARS
    Predicting Credit Default with dgLARS
    Applying Discrete Exterior Derivatives
    Nonlinear Algebra in Machine Learning Algorithms
    Comparing Choice Rankings with HodgeRank
    Summary

Chapter 7: Tools for Topological Data Analysis
    Finding Distinctive Groups with Unique Behavior
    Validating Measurement Tools
    Using the Mapper Algorithm for Subgroup Mining
    Stepping Through the Mapper Algorithm
    Using TDAmapper to Find Cluster Structures in Data
    Summary

Chapter 8: Homotopy Algorithms
    Introducing Homotopy
    Introducing Homotopy-Based Regression
    Comparing Results on a Sample Dataset
    Summary

Chapter 9: Final Project: Analyzing Text Data
    Building a Natural Language Processing Pipeline
    The Project: Analyzing Language in Poetry
    Tokenizing Text Data
    Tagging Parts of Speech
    Normalizing Vectors
    Analyzing the Poem Dataset in R
    Using Topology-Based NLP Tools
    Summary

Chapter 10: Multicore and Quantum Computing
    Multicore Approaches to Topological Data Analysis
    Quantum Computing Approaches
    Using the Qubit-Based Model
    Using the Qumodes-Based Model
    Using Quantum Network Algorithms
    Speeding Up Algorithms with Quantum Computing
    Using Image Classifiers on Quantum Computers
    Summary

References
    Chapter 1
    Chapter 2
    Chapter 3
    Chapter 4
    Chapter 5
    Chapter 6
    Chapter 7
    Chapter 8
    Chapter 9
    Chapter 10
    Chapter 6 Datasets
    Chapter 9 Dataset Poems

Index