Table of contents

front matter
  preface
  acknowledgments
  about this book
  about the author
  about the cover illustration

Part 1  The basics of ensembles

1 Ensemble methods: Hype or hallelujah?
  1.1 Ensemble methods: The wisdom of the crowds
  1.2 Why you should care about ensemble learning
  1.3 Fit vs. complexity in individual models
      Regression with decision trees
      Regression with support vector machines
  1.4 Our first ensemble
  1.5 Terminology and taxonomy for ensemble methods

Part 2  Essential ensemble methods

2 Homogeneous parallel ensembles: Bagging and random forests
  2.1 Parallel ensembles
  2.2 Bagging: Bootstrap aggregating
      Intuition: Resampling and model aggregation
      Implementing bagging
      Bagging with scikit-learn
      Faster training with parallelization
  2.3 Random forests
      Randomized decision trees
      Random forests with scikit-learn
      Feature importances
  2.4 More homogeneous parallel ensembles
      Pasting
      Random subspaces and random patches
      Extra Trees
  2.5 Case study: Breast cancer diagnosis
      Loading and preprocessing
      Bagging, random forests, and Extra Trees
      Feature importances with random forests

3 Heterogeneous parallel ensembles: Combining strong learners
  3.1 Base estimators for heterogeneous ensembles
      Fitting base estimators
      Individual predictions of base estimators
  3.2 Combining predictions by weighting
      Majority vote
      Accuracy weighting
      Entropy weighting
      Dempster-Shafer combination
  3.3 Combining predictions by meta-learning
      Stacking
      Stacking with cross validation
  3.4 Case study: Sentiment analysis
      Preprocessing
      Dimensionality reduction
      Blending classifiers

4 Sequential ensembles: Adaptive boosting
  4.1 Sequential ensembles of weak learners
  4.2 AdaBoost: Adaptive boosting
      Intuition: Learning with weighted examples
      Implementing AdaBoost
      AdaBoost with scikit-learn
  4.3 AdaBoost in practice
      Learning rate
      Early stopping and pruning
  4.4 Case study: Handwritten digit classification
      Dimensionality reduction with t-SNE
      Boosting
  4.5 LogitBoost: Boosting with the logistic loss
      Logistic vs. exponential loss functions
      Regression as a weak learning algorithm for classification
      Implementing LogitBoost

5 Sequential ensembles: Gradient boosting
  5.1 Gradient descent for minimization
      Gradient descent with an illustrative example
      Gradient descent over loss functions for training
  5.2 Gradient boosting: Gradient descent + boosting
      Intuition: Learning with residuals
      Implementing gradient boosting
      Gradient boosting with scikit-learn
      Histogram-based gradient boosting
  5.3 LightGBM: A framework for gradient boosting
      What makes LightGBM “light”?
      Gradient boosting with LightGBM
  5.4 LightGBM in practice
      Learning rate
      Early stopping
      Custom loss functions
  5.5 Case study: Document retrieval
      The LETOR data set
      Document retrieval with LightGBM

6 Sequential ensembles: Newton boosting
  6.1 Newton’s method for minimization
      Newton’s method with an illustrative example
      Newton’s descent over loss functions for training
  6.2 Newton boosting: Newton’s method + boosting
      Intuition: Learning with weighted residuals
      Intuition: Learning with regularized loss functions
      Implementing Newton boosting
  6.3 XGBoost: A framework for Newton boosting
      What makes XGBoost “extreme”?
      Newton boosting with XGBoost
  6.4 XGBoost in practice
      Learning rate
      Early stopping
  6.5 Case study redux: Document retrieval
      The LETOR data set
      Document retrieval with XGBoost

Part 3  Ensembles in the wild: Adapting ensemble methods to your data

7 Learning with continuous and count labels
  7.1 A brief review of regression
      Linear regression for continuous labels
      Poisson regression for count labels
      Logistic regression for classification labels
      Generalized linear models
      Nonlinear regression
  7.2 Parallel ensembles for regression
      Random forests and Extra Trees
      Combining regression models
      Stacking regression models
  7.3 Sequential ensembles for regression
      Loss and likelihood functions for regression
      Gradient boosting with LightGBM and XGBoost
  7.4 Case study: Demand forecasting
      The UCI Bike Sharing data set
      GLMs and stacking
      Random forest and Extra Trees
      XGBoost and LightGBM

8 Learning with categorical features
  8.1 Encoding categorical features
      Types of categorical features
      Ordinal and one-hot encoding
      Encoding with target statistics
      The category_encoders package
  8.2 CatBoost: A framework for ordered boosting
      Ordered target statistics and ordered boosting
      Oblivious decision trees
      CatBoost in practice
  8.3 Case study: Income prediction
      Adult Data Set
      Creating preprocessing and modeling pipelines
      Category encoding and ensembling
      Ordered encoding and boosting with CatBoost
  8.4 Encoding high-cardinality string features

9 Explaining your ensembles
  9.1 What is interpretability?
      Black-box vs. glass-box models
      Decision trees (and decision rules)
      Generalized linear models
  9.2 Case study: Data-driven marketing
      Bank Marketing data set
      Training ensembles
      Feature importances in tree ensembles
  9.3 Black-box methods for global explainability
      Permutation feature importance
      Partial dependence plots
      Global surrogate models
  9.4 Black-box methods for local explainability
      Local surrogate models with LIME
      Local interpretability with SHAP
  9.5 Glass-box ensembles: Training for interpretability
      Explainable boosting machines
      EBMs in practice

epilogue
  E.1 Further reading
      Practical ensemble methods
      Theory and foundations of ensemble methods
  E.2 A few more advanced topics
      Ensemble methods for statistical relational learning
      Ensemble methods for deep learning
  E.3 Thank you!

index