Table of contents: Distributed Machine Learning Patterns

Copyright
contents
front matter
    preface
    acknowledgments
    about this book
        Who should read this book?
        How this book is organized: A roadmap
        About the code
        liveBook discussion forum
    about the author
    about the cover illustration

Part 1 Basic concepts and background

1 Introduction to distributed machine learning systems
    1.1 Large-scale machine learning
        1.1.1 The growing scale
        1.1.2 What can we do?
    1.2 Distributed systems
        1.2.1 What is a distributed system?
        1.2.2 The complexity and patterns
    1.3 Distributed machine learning systems
        1.3.1 What is a distributed machine learning system?
        1.3.2 Are there similar patterns?
        1.3.3 When should we use a distributed machine learning system?
        1.3.4 When should we not use a distributed machine learning system?
    1.4 What we will learn in this book
    Summary

Part 2 Patterns of distributed machine learning systems

2 Data ingestion patterns
    2.1 What is data ingestion?
    2.2 The Fashion-MNIST dataset
    2.3 Batching pattern
        2.3.1 The problem: Performing expensive operations on the Fashion-MNIST dataset with limited memory
        2.3.2 The solution
        2.3.3 Discussion
        2.3.4 Exercises
    2.4 Sharding pattern: Splitting extremely large datasets among multiple machines
        2.4.1 The problem
        2.4.2 The solution
        2.4.3 Discussion
        2.4.4 Exercises
    2.5 Caching pattern
        2.5.1 The problem: Re-accessing previously used data for efficient multi-epoch model training
        2.5.2 The solution
        2.5.3 Discussion
        2.5.4 Exercises
    2.6 Answers to exercises
        Section 2.3.4
        Section 2.4.4
        Section 2.5.4
    Summary

3 Distributed training patterns
    3.1 What is distributed training?
    3.2 Parameter server pattern: Tagging entities in 8 million YouTube videos
        3.2.1 The problem
        3.2.2 The solution
        3.2.3 Discussion
        3.2.4 Exercises
    3.3 Collective communication pattern
        3.3.1 The problem: Improving performance when parameter servers become a bottleneck
        3.3.2 The solution
        3.3.3 Discussion
        3.3.4 Exercises
    3.4 Elasticity and fault-tolerance pattern
        3.4.1 The problem: Handling unexpected failures when training with limited computational resources
        3.4.2 The solution
        3.4.3 Discussion
        3.4.4 Exercises
    3.5 Answers to exercises
        Section 3.2.4
        Section 3.3.4
        Section 3.4.4
    Summary

4 Model serving patterns
    4.1 What is model serving?
    4.2 Replicated services pattern: Handling the growing number of serving requests
        4.2.1 The problem
        4.2.2 The solution
        4.2.3 Discussion
        4.2.4 Exercises
    4.3 Sharded services pattern
        4.3.1 The problem: Processing large model serving requests with high-resolution videos
        4.3.2 The solution
        4.3.3 Discussion
        4.3.4 Exercises
    4.4 Event-driven processing pattern
        4.4.1 The problem: Responding to model serving requests based on events
        4.4.2 The solution
        4.4.3 Discussion
        4.4.4 Exercises
    4.5 Answers to exercises
        Section 4.2
        Section 4.3
        Section 4.4
    Summary

5 Workflow patterns
    5.1 What is a workflow?
    5.2 Fan-in and fan-out patterns: Composing complex machine learning workflows
        5.2.1 The problem
        5.2.2 The solution
        5.2.3 Discussion
        5.2.4 Exercises
    5.3 Synchronous and asynchronous patterns: Accelerating workflows with concurrency
        5.3.1 The problem
        5.3.2 The solution
        5.3.3 Discussion
        5.3.4 Exercises
    5.4 Step memoization pattern: Skipping redundant workloads via memoized steps
        5.4.1 The problem
        5.4.2 The solution
        5.4.3 Discussion
        5.4.4 Exercises
    5.5 Answers to exercises
        Section 5.2
        Section 5.3
        Section 5.4
    Summary

6 Operation patterns
    6.1 What are operations in machine learning systems?
    6.2 Scheduling patterns: Assigning resources effectively in a shared cluster
        6.2.1 The problem
        6.2.2 The solution
        6.2.3 Discussion
        6.2.4 Exercises
    6.3 Metadata pattern: Handling failures appropriately to minimize the negative effect on users
        6.3.1 The problem
        6.3.2 The solution
        6.3.3 Discussion
        6.3.4 Exercises
    6.4 Answers to exercises
        Section 6.2
        Section 6.3
    Summary

Part 3 Building a distributed machine learning workflow

7 Project overview and system architecture
    7.1 Project overview
        7.1.1 Project background
        7.1.2 System components
    7.2 Data ingestion
        7.2.1 The problem
        7.2.2 The solution
        7.2.3 Exercises
    7.3 Model training
        7.3.1 The problem
        7.3.2 The solution
        7.3.3 Exercises
    7.4 Model serving
        7.4.1 The problem
        7.4.2 The solution
        7.4.3 Exercises
    7.5 End-to-end workflow
        7.5.1 The problems
        7.5.2 The solutions
        7.5.3 Exercises
    7.6 Answers to exercises
        Section 7.2
        Section 7.3
        Section 7.4
        Section 7.5
    Summary

8 Overview of relevant technologies
    8.1 TensorFlow: The machine learning framework
        8.1.1 The basics
        8.1.2 Exercises
    8.2 Kubernetes: The distributed container orchestration system
        8.2.1 The basics
        8.2.2 Exercises
    8.3 Kubeflow: Machine learning workloads on Kubernetes
        8.3.1 The basics
        8.3.2 Exercises
    8.4 Argo Workflows: Container-native workflow engine
        8.4.1 The basics
        8.4.2 Exercises
    8.5 Answers to exercises
        Section 8.1
        Section 8.2
        Section 8.3
        Section 8.4
    Summary

9 A complete implementation
    9.1 Data ingestion
        9.1.1 Single-node data pipeline
        9.1.2 Distributed data pipeline
    9.2 Model training
        9.2.1 Model definition and single-node training
        9.2.2 Distributed model training
        9.2.3 Model selection
    9.3 Model serving
        9.3.1 Single-server model inference
        9.3.2 Replicated model servers
    9.4 The end-to-end workflow
        9.4.1 Sequential steps
        9.4.2 Step memoization
    Summary

index