Table of contents

front matter
  preface
  acknowledgments
  about this book
  about the author
  about the cover illustration

Part 1  The basics of ensembles

1 Ensemble methods: Hype or hallelujah?
  1.1 Ensemble methods: The wisdom of the crowds
  1.2 Why you should care about ensemble learning
  1.3 Fit vs. complexity in individual models
      Regression with decision trees
      Regression with support vector machines
  1.4 Our first ensemble
  1.5 Terminology and taxonomy for ensemble methods

Part 2  Essential ensemble methods

2 Homogeneous parallel ensembles: Bagging and random forests
  2.1 Parallel ensembles
  2.2 Bagging: Bootstrap aggregating
      Intuition: Resampling and model aggregation
      Implementing bagging
      Bagging with scikit-learn
      Faster training with parallelization
  2.3 Random forests
      Randomized decision trees
      Random forests with scikit-learn
      Feature importances
  2.4 More homogeneous parallel ensembles
      Pasting
      Random subspaces and random patches
      Extra Trees
  2.5 Case study: Breast cancer diagnosis
      Loading and preprocessing
      Bagging, random forests, and Extra Trees
      Feature importances with random forests

3 Heterogeneous parallel ensembles: Combining strong learners
  3.1 Base estimators for heterogeneous ensembles
      Fitting base estimators
      Individual predictions of base estimators
  3.2 Combining predictions by weighting
      Majority vote
      Accuracy weighting
      Entropy weighting
      Dempster-Shafer combination
  3.3 Combining predictions by meta-learning
      Stacking
      Stacking with cross validation
  3.4 Case study: Sentiment analysis
      Preprocessing
      Dimensionality reduction
      Blending classifiers

4 Sequential ensembles: Adaptive boosting
  4.1 Sequential ensembles of weak learners
  4.2 AdaBoost: Adaptive boosting
      Intuition: Learning with weighted examples
      Implementing AdaBoost
      AdaBoost with scikit-learn
  4.3 AdaBoost in practice
      Learning rate
      Early stopping and pruning
  4.4 Case study: Handwritten digit classification
      Dimensionality reduction with t-SNE
      Boosting
  4.5 LogitBoost: Boosting with the logistic loss
      Logistic vs. exponential loss functions
      Regression as a weak learning algorithm for classification
      Implementing LogitBoost

5 Sequential ensembles: Gradient boosting
  5.1 Gradient descent for minimization
      Gradient descent with an illustrative example
      Gradient descent over loss functions for training
  5.2 Gradient boosting: Gradient descent + boosting
      Intuition: Learning with residuals
      Implementing gradient boosting
      Gradient boosting with scikit-learn
      Histogram-based gradient boosting
  5.3 LightGBM: A framework for gradient boosting
      What makes LightGBM “light”?
      Gradient boosting with LightGBM
  5.4 LightGBM in practice
      Learning rate
      Early stopping
      Custom loss functions
  5.5 Case study: Document retrieval
      The LETOR data set
      Document retrieval with LightGBM

6 Sequential ensembles: Newton boosting
  6.1 Newton’s method for minimization
      Newton’s method with an illustrative example
      Newton’s descent over loss functions for training
  6.2 Newton boosting: Newton’s method + boosting
      Intuition: Learning with weighted residuals
      Intuition: Learning with regularized loss functions
      Implementing Newton boosting
  6.3 XGBoost: A framework for Newton boosting
      What makes XGBoost “extreme”?
      Newton boosting with XGBoost
  6.4 XGBoost in practice
      Learning rate
      Early stopping
  6.5 Case study redux: Document retrieval
      The LETOR data set
      Document retrieval with XGBoost

Part 3  Ensembles in the wild: Adapting ensemble methods to your data

7 Learning with continuous and count labels
  7.1 A brief review of regression
      Linear regression for continuous labels
      Poisson regression for count labels
      Logistic regression for classification labels
      Generalized linear models
      Nonlinear regression
  7.2 Parallel ensembles for regression
      Random forests and Extra Trees
      Combining regression models
      Stacking regression models
  7.3 Sequential ensembles for regression
      Loss and likelihood functions for regression
      Gradient boosting with LightGBM and XGBoost
  7.4 Case study: Demand forecasting
      The UCI Bike Sharing data set
      GLMs and stacking
      Random forest and Extra Trees
      XGBoost and LightGBM

8 Learning with categorical features
  8.1 Encoding categorical features
      Types of categorical features
      Ordinal and one-hot encoding
      Encoding with target statistics
      The category_encoders package
  8.2 CatBoost: A framework for ordered boosting
      Ordered target statistics and ordered boosting
      Oblivious decision trees
      CatBoost in practice
  8.3 Case study: Income prediction
      Adult Data Set
      Creating preprocessing and modeling pipelines
      Category encoding and ensembling
      Ordered encoding and boosting with CatBoost
  8.4 Encoding high-cardinality string features

9 Explaining your ensembles
  9.1 What is interpretability?
      Black-box vs. glass-box models
      Decision trees (and decision rules)
      Generalized linear models
  9.2 Case study: Data-driven marketing
      Bank Marketing data set
      Training ensembles
      Feature importances in tree ensembles
  9.3 Black-box methods for global explainability
      Permutation feature importance
      Partial dependence plots
      Global surrogate models
  9.4 Black-box methods for local explainability
      Local surrogate models with LIME
      Local interpretability with SHAP
  9.5 Glass-box ensembles: Training for interpretability
      Explainable boosting machines
      EBMs in practice

epilogue
  E.1 Further reading
      Practical ensemble methods
      Theory and foundations of ensemble methods
  E.2 A few more advanced topics
      Ensemble methods for statistical relational learning
      Ensemble methods for deep learning
  E.3 Thank you!

index