Distributed Machine Learning with PySpark: Migrating Effortlessly from Pandas and Scikit-Learn 9781484297506, 9781484297513

Migrate from Pandas and Scikit-Learn to PySpark to handle large datasets and reduce data processing time.

English, 500 pages, 2023

Table of contents:
Table of Contents
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
Chapter 1: An Easy Transition
PySpark and Pandas Integration
Similarity in Syntax
Loading Data
Selecting Columns
Aggregating Data
Filtering Data
Joining Data
Saving Data
Modeling Steps
Pipelines
Summary
Chapter 2: Selecting Algorithms
The Dataset
Selecting Algorithms with Cross-Validation
Scikit-Learn
PySpark
Bringing It All Together
Scikit-Learn
PySpark
Summary
Chapter 3: Multiple Linear Regression with Pandas, Scikit-Learn, and PySpark
The Dataset
Multiple Linear Regression
Multiple Linear Regression with Scikit-Learn
Multiple Linear Regression with PySpark
Summary
Chapter 4: Decision Tree Regression with Pandas, Scikit-Learn, and PySpark
The Dataset
Decision Tree Regression
Decision Tree Regression with Scikit-Learn
The Modeling Steps
Decision Tree Regression with PySpark
The Modeling Steps
Bringing It All Together
Scikit-Learn
PySpark
Summary
Chapter 5: Random Forest Regression with Pandas, Scikit-Learn, and PySpark
The Dataset
Random Forest Regression
Random Forest with Scikit-Learn
Random Forest with PySpark
Bringing It All Together
Scikit-Learn
PySpark
Summary
Chapter 6: Gradient-Boosted Tree Regression with Pandas, Scikit-Learn, and PySpark
The Dataset
Gradient-Boosted Tree (GBT) Regression
GBT with Scikit-Learn
GBT with PySpark
Bringing It All Together
Scikit-Learn
PySpark
Summary
Chapter 7: Logistic Regression with Pandas, Scikit-Learn, and PySpark
The Dataset
Logistic Regression
Logistic Regression with Scikit-Learn
Logistic Regression with PySpark
Putting It All Together
Scikit-Learn
PySpark
Summary
Chapter 8: Decision Tree Classification with Pandas, Scikit-Learn, and PySpark
The Dataset
Decision Tree Classification
Scikit-Learn and PySpark Similarities
Decision Tree Classification with Scikit-Learn
Decision Tree Classification with PySpark
Bringing It All Together
Scikit-Learn
PySpark
Summary
Chapter 9: Random Forest Classification with Scikit-Learn and PySpark
Random Forest Classification
Scikit-Learn and PySpark Similarities for Random Forests
Random Forests with Scikit-Learn
Random Forests with PySpark
Bringing It All Together
Scikit-Learn
PySpark
Summary
Chapter 10: Support Vector Machine Classification with Pandas, Scikit-Learn, and PySpark
The Dataset
Support Vector Machine Classification
Linear SVM with Scikit-Learn
Linear SVM with PySpark
Bringing It All Together
Scikit-Learn
PySpark
Summary
Chapter 11: Naive Bayes Classification with Pandas, Scikit-Learn, and PySpark
The Dataset
Naive Bayes Classification
Naive Bayes with Scikit-Learn
Naive Bayes with PySpark
Bringing It All Together
Scikit-Learn
PySpark
Summary
Chapter 12: Neural Network Classification with Pandas, Scikit-Learn, and PySpark
The Dataset
MLP Classification
MLP Classification with Scikit-Learn
MLP Classification with PySpark
Bringing It All Together
Scikit-Learn
PySpark
Summary
Chapter 13: Recommender Systems with Pandas, Surprise, and PySpark
The Dataset
Building a Recommender System
Recommender System with Surprise
Recommender System with PySpark
Bringing It All Together
Surprise
PySpark
Summary
Chapter 14: Natural Language Processing with Pandas, Scikit-Learn, and PySpark
The Dataset
Cleaning, Tokenization, and Vectorization
Naive Bayes Classification
Naive Bayes with Scikit-Learn
Naive Bayes with PySpark
Bringing It All Together
Scikit-Learn
PySpark
Summary
Chapter 15: k-Means Clustering with Pandas, Scikit-Learn, and PySpark
The Dataset
Machine Learning with k-Means
k-Means Clustering with Scikit-Learn
k-Means Clustering with PySpark
Bringing It All Together
Scikit-Learn
PySpark
Summary
Chapter 16: Hyperparameter Tuning with Scikit-Learn and PySpark
Examples of Hyperparameters
Tuning the Parameters of a Random Forest
Hyperparameter Tuning in Scikit-Learn
Hyperparameter Tuning in PySpark
Bringing It All Together
Scikit-Learn
PySpark
Summary
Chapter 17: Pipelines with Scikit-Learn and PySpark
The Significance of Pipelines
Pipelines with Scikit-Learn
Pipelines with PySpark
Bringing It All Together
Scikit-Learn
PySpark
Summary
Chapter 18: Deploying Models in Production with Scikit-Learn and PySpark
Steps in Model Deployment
Deploying a Multilayer Perceptron (MLP)
Deployment with Scikit-Learn
PySpark
Bringing It All Together
Scikit-Learn
PySpark
Summary
Index
