The Shape of Data: Geometry-Based Machine Learning and Data Analysis in R
ISBN 9781718503090, 9781718503083

Whether you’re a mathematician, seasoned data scientist, or marketing professional, you’ll find The Shape of Data to be

English, 264 pages, 2023

Table of contents:
Cover
Praise for The Shape of Data
Title Page
Copyright
Dedication
About the Authors
Foreword
Acknowledgments
Introduction
Who Is This Book For?
About This Book
Downloading and Installing R
Installing R Packages
Getting Help with R
Support for Python Users
Summary
Chapter 1: The Geometric Structure of Data
Machine Learning Categories
Supervised Learning
Unsupervised Learning
Matching Algorithms and Other Machine Learning
Structured Data
The Geometry of Dummy Variables
The Geometry of Numerical Spreadsheets
The Geometry of Supervised Learning
Unstructured Data
Network Data
Image Data
Text Data
Summary
Chapter 2: The Geometric Structure of Networks
The Basics of Network Theory
Directed Networks
Networks in R
Paths and Distance in a Network
Network Centrality Metrics
The Degree of a Vertex
The Closeness of a Vertex
The Betweenness of a Vertex
Eigenvector Centrality
PageRank Centrality
Katz Centrality
Hub and Authority
Measuring Centrality in an Example Social Network
Additional Quantities of a Network
The Diversity of a Vertex
Triadic Closure
The Efficiency and Eccentricity of a Vertex
Forman–Ricci Curvature
Global Network Metrics
The Interconnectivity of a Network
Spreading Processes on a Network
Spectral Measures of a Network
Network Models for Real-World Behavior
Erdős–Rényi Graphs
Scale-Free Graphs
Watts–Strogatz Graphs
Summary
Chapter 3: Network Analysis
Using Network Data for Supervised Learning
Making Predictions with Social Media Network Metrics
Predicting Network Links in Social Media
Using Network Data for Unsupervised Learning
Applying Clustering to the Social Media Dataset
Community Mining in a Network
Comparing Networks
Analyzing Spread Through Networks
Tracking Disease Spread Between Towns
Tracking Disease Spread Between Windsurfers
Disrupting Communication and Disease Spread
Summary
Chapter 4: Network Filtration
Graph Filtration
From Graphs to Simplicial Complexes
Examples of Betti Numbers
The Euler Characteristic
Persistent Homology
Comparison of Networks with Persistent Homology
Summary
Chapter 5: Geometry in Data Science
Common Distance Metrics
Simulating a Small Dataset
Using Norm-Based Distance Metrics
Comparing Diagrams, Shapes, and Probability Distributions
K-Nearest Neighbors with Metric Geometry
Manifold Learning
Using Multidimensional Scaling
Extending Multidimensional Scaling with Isomap
Capturing Local Properties with Locally Linear Embedding
Visualizing with t-Distributed Stochastic Neighbor Embedding
Fractals
Summary
Chapter 6: Newer Applications of Geometry in Machine Learning
Working with Nonlinear Spaces
Introducing dgLARS
Predicting Depression with dgLARS
Predicting Credit Default with dgLARS
Applying Discrete Exterior Derivatives
Nonlinear Algebra in Machine Learning Algorithms
Comparing Choice Rankings with HodgeRank
Summary
Chapter 7: Tools for Topological Data Analysis
Finding Distinctive Groups with Unique Behavior
Validating Measurement Tools
Using the Mapper Algorithm for Subgroup Mining
Stepping Through the Mapper Algorithm
Using TDAmapper to Find Cluster Structures in Data
Summary
Chapter 8: Homotopy Algorithms
Introducing Homotopy
Introducing Homotopy-Based Regression
Comparing Results on a Sample Dataset
Summary
Chapter 9: Final Project: Analyzing Text Data
Building a Natural Language Processing Pipeline
The Project: Analyzing Language in Poetry
Tokenizing Text Data
Tagging Parts of Speech
Normalizing Vectors
Analyzing the Poem Dataset in R
Using Topology-Based NLP Tools
Summary
Chapter 10: Multicore and Quantum Computing
Multicore Approaches to Topological Data Analysis
Quantum Computing Approaches
Using the Qubit-Based Model
Using the Qumodes-Based Model
Using Quantum Network Algorithms
Speeding Up Algorithms with Quantum Computing
Using Image Classifiers on Quantum Computers
Summary
References
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 6 Datasets
Chapter 9 Dataset Poems
Index