Statistics and Machine Learning Toolbox™ User's Guide
R2023b
How to Contact MathWorks

Latest news:          www.mathworks.com
Sales and services:   www.mathworks.com/sales_and_services
User community:       www.mathworks.com/matlabcentral
Technical support:    www.mathworks.com/support/contact_us
Phone:                508-647-7000
The MathWorks, Inc.
1 Apple Hill Drive
Natick, MA 01760-2098

Statistics and Machine Learning Toolbox™ User's Guide
© COPYRIGHT 1993–2023 by The MathWorks, Inc.

The software described in this document is furnished under a license agreement. The software may be used or copied only under the terms of the license agreement. No part of this manual may be photocopied or reproduced in any form without prior written consent from The MathWorks, Inc.

FEDERAL ACQUISITION: This provision applies to all acquisitions of the Program and Documentation by, for, or through the federal government of the United States. By accepting delivery of the Program or Documentation, the government hereby agrees that this software or documentation qualifies as commercial computer software or commercial computer software documentation as such terms are used or defined in FAR 12.212, DFARS Part 227.72, and DFARS 252.227-7014. Accordingly, the terms and conditions of this Agreement and only those rights specified in this Agreement, shall pertain to and govern the use, modification, reproduction, release, performance, display, and disclosure of the Program and Documentation by the federal government (or other entity acquiring for or through the federal government) and shall supersede any conflicting contractual terms or conditions. If this License fails to meet the government's needs or is inconsistent in any respect with federal procurement law, the government agrees to return the Program and Documentation, unused, to The MathWorks, Inc.
Trademarks

MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand names may be trademarks or registered trademarks of their respective holders.

Patents

MathWorks products are protected by one or more U.S. patents. Please see www.mathworks.com/patents for more information.
Revision History

September 1993    First printing     Version 1.0
March 1996        Second printing    Version 2.0
January 1997      Third printing     Version 2.11
November 2000     Fourth printing    Revised for Version 3.0 (Release 12)
May 2001          Fifth printing     Minor revisions
July 2002         Sixth printing     Revised for Version 4.0 (Release 13)
February 2003     Online only        Revised for Version 4.1 (Release 13.0.1)
June 2004         Seventh printing   Revised for Version 5.0 (Release 14)
October 2004      Online only        Revised for Version 5.0.1 (Release 14SP1)
March 2005        Online only        Revised for Version 5.0.2 (Release 14SP2)
September 2005    Online only        Revised for Version 5.1 (Release 14SP3)
March 2006        Online only        Revised for Version 5.2 (Release 2006a)
September 2006    Online only        Revised for Version 5.3 (Release 2006b)
March 2007        Eighth printing    Revised for Version 6.0 (Release 2007a)
September 2007    Ninth printing     Revised for Version 6.1 (Release 2007b)
March 2008        Online only        Revised for Version 6.2 (Release 2008a)
October 2008      Online only        Revised for Version 7.0 (Release 2008b)
March 2009        Online only        Revised for Version 7.1 (Release 2009a)
September 2009    Online only        Revised for Version 7.2 (Release 2009b)
March 2010        Online only        Revised for Version 7.3 (Release 2010a)
September 2010    Online only        Revised for Version 7.4 (Release 2010b)
April 2011        Online only        Revised for Version 7.5 (Release 2011a)
September 2011    Online only        Revised for Version 7.6 (Release 2011b)
March 2012        Online only        Revised for Version 8.0 (Release 2012a)
September 2012    Online only        Revised for Version 8.1 (Release 2012b)
March 2013        Online only        Revised for Version 8.2 (Release 2013a)
September 2013    Online only        Revised for Version 8.3 (Release 2013b)
March 2014        Online only        Revised for Version 9.0 (Release 2014a)
October 2014      Online only        Revised for Version 9.1 (Release 2014b)
March 2015        Online only        Revised for Version 10.0 (Release 2015a)
September 2015    Online only        Revised for Version 10.1 (Release 2015b)
March 2016        Online only        Revised for Version 10.2 (Release 2016a)
September 2016    Online only        Revised for Version 11 (Release 2016b)
March 2017        Online only        Revised for Version 11.1 (Release 2017a)
September 2017    Online only        Revised for Version 11.2 (Release 2017b)
March 2018        Online only        Revised for Version 11.3 (Release 2018a)
September 2018    Online only        Revised for Version 11.4 (Release 2018b)
March 2019        Online only        Revised for Version 11.5 (Release 2019a)
September 2019    Online only        Revised for Version 11.6 (Release 2019b)
March 2020        Online only        Revised for Version 11.7 (Release 2020a)
September 2020    Online only        Revised for Version 12.0 (Release 2020b)
March 2021        Online only        Revised for Version 12.1 (Release 2021a)
September 2021    Online only        Revised for Version 12.2 (Release 2021b)
March 2022        Online only        Revised for Version 12.3 (Release 2022a)
September 2022    Online only        Revised for Version 12.4 (Release 2022b)
March 2023        Online only        Revised for Version 12.5 (Release 2023a)
September 2023    Online only        Revised for Version 23.2 (R2023b)
Contents

1  Getting Started

    Statistics and Machine Learning Toolbox Product Description
    Supported Data Types

2  Organizing Data

    Test Differences Between Category Means
    Grouping Variables
        What Are Grouping Variables?
        Group Definition
        Analysis Using Grouping Variables
        Missing Group Values
    Dummy Variables
        What Are Dummy Variables?
        Creating Dummy Variables
    Linear Regression with Categorical Covariates

3  Descriptive Statistics

    Measures of Central Tendency
        Measures of Central Tendency
    Measures of Dispersion
        Compare Measures of Dispersion
    Exploratory Analysis of Data
    Resampling Statistics
        Bootstrap Resampling
        Jackknife Resampling
        Parallel Computing Support for Resampling Methods

4  Statistical Visualization

    Create Scatter Plots Using Grouped Data
    Compare Grouped Data Using Box Plots
    Distribution Plots
        Normal Probability Plots
        Probability Plots
        Quantile-Quantile Plots
        Cumulative Distribution Plots
    Visualizing Multivariate Data

5  Probability Distributions

    Working with Probability Distributions
        Probability Distribution Objects
        Apps and Interactive User Interfaces
        Distribution-Specific Functions and Generic Distribution Functions
    Supported Distributions
        Continuous Distributions (Data)
        Continuous Distributions (Statistics)
        Discrete Distributions
        Multivariate Distributions
        Nonparametric Distributions
        Flexible Distribution Families
    Maximum Likelihood Estimation
    Negative Loglikelihood Functions
        Find MLEs Using Negative Loglikelihood Function
    Random Number Generation
    Nonparametric and Empirical Probability Distributions
        Overview
        Kernel Distribution
        Empirical Cumulative Distribution Function
        Piecewise Linear Distribution
        Pareto Tails
        Triangular Distribution
    Fit Kernel Distribution Object to Data
    Fit Kernel Distribution Using ksdensity
    Fit Distributions to Grouped Data Using ksdensity
    Fit a Nonparametric Distribution with Pareto Tails
    Generate Random Numbers Using the Triangular Distribution
    Model Data Using the Distribution Fitter App
        Explore Probability Distributions Interactively
        Create and Manage Data Sets
        Create a New Fit
        Display Results
        Manage Fits
        Evaluate Fits
        Exclude Data
        Save and Load Sessions
        Generate a File to Fit and Plot Distributions
    Fit a Distribution Using the Distribution Fitter App
        Step 1: Load Sample Data
        Step 2: Import Data
        Step 3: Create a New Fit
        Step 4: Create and Manage Additional Fits
    Define Custom Distributions Using the Distribution Fitter App
        Open the Distribution Fitter App
        Define Custom Distribution
        Import Custom Distribution
    Explore the Random Number Generation UI
    Compare Multiple Distribution Fits
    Fit Probability Distribution Objects to Grouped Data
    Three-Parameter Weibull Distribution
    Multinomial Probability Distribution Objects
    Multinomial Probability Distribution Functions
    Generate Random Numbers Using Uniform Distribution Inversion
    Represent Cauchy Distribution Using t Location-Scale
    Generate Cauchy Random Numbers Using Student's t
    Generate Correlated Data Using Rank Correlation
    Create Gaussian Mixture Model
    Fit Gaussian Mixture Model to Data
    Simulate Data from Gaussian Mixture Model
    Copulas: Generate Correlated Samples
        Determining Dependence Between Simulation Inputs
        Constructing Dependent Bivariate Distributions
        Using Rank Correlation Coefficients
        Using Bivariate Copulas
        Higher Dimension Copulas
        Archimedean Copulas
        Simulating Dependent Multivariate Data Using Copulas
        Fitting Copulas to Data
    Simulating Dependent Random Variables Using Copulas
    Fit Custom Distributions
    Avoid Numerical Issues When Fitting Custom Distributions
    Nonparametric Estimates of Cumulative Distribution Functions and Their Inverses
    Modelling Tail Data with the Generalized Pareto Distribution
    Modelling Data with the Generalized Extreme Value Distribution
    Curve Fitting and Distribution Fitting
    Fitting a Univariate Distribution Using Cumulative Probabilities

6  Gaussian Processes

    Gaussian Process Regression Models
        Compare Prediction Intervals of GPR Models
    Kernel (Covariance) Function Options
    Exact GPR Method
        Parameter Estimation
        Prediction
        Computational Complexity of Exact Parameter Estimation and Prediction
    Subset of Data Approximation for GPR Models
    Subset of Regressors Approximation for GPR Models
        Approximating the Kernel Function
        Parameter Estimation
        Prediction
        Predictive Variance Problem
    Fully Independent Conditional Approximation for GPR Models
        Approximating the Kernel Function
        Parameter Estimation
        Prediction
    Block Coordinate Descent Approximation for GPR Models
        Fit GPR Models Using BCD Approximation
    Predict Battery State of Charge Using Machine Learning

7  Random Number Generation

    Generating Pseudorandom Numbers
        Common Pseudorandom Number Generation Methods
    Representing Sampling Distributions Using Markov Chain Samplers
        Using the Metropolis-Hastings Algorithm
        Using Slice Sampling
        Using Hamiltonian Monte Carlo
    Generating Quasi-Random Numbers
        Quasi-Random Sequences
        Quasi-Random Point Sets
        Quasi-Random Streams
    Generating Data Using Flexible Families of Distributions
    Bayesian Linear Regression Using Hamiltonian Monte Carlo
    Bayesian Analysis for a Logistic Regression Model

8  Hypothesis Tests

    Hypothesis Test Terminology
    Hypothesis Test Assumptions
    Hypothesis Testing
    Available Hypothesis Tests
    Selecting a Sample Size

9  Analysis of Variance

    One-Way ANOVA
        Introduction to One-Way ANOVA
        Prepare Data for One-Way ANOVA
        Perform One-Way ANOVA
        Mathematical Details
    Two-Way ANOVA
        Introduction to Two-Way ANOVA
        Prepare Data for Balanced Two-Way ANOVA
        Perform Two-Way ANOVA
        Mathematical Details
    Multiple Comparisons
        Multiple Comparisons Using One-Way ANOVA
        Multiple Comparisons for Three-Way ANOVA
        Multiple Comparison Procedures
    N-Way ANOVA
        Introduction to N-Way ANOVA
        Prepare Data for N-Way ANOVA
        Perform N-Way ANOVA
    ANOVA with Random Effects
    Other ANOVA Models
    Analysis of Covariance
        Introduction to Analysis of Covariance
        Analysis of Covariance Tool
        Confidence Bounds
        Multiple Comparisons
    Nonparametric Methods
        Introduction to Nonparametric Methods
        Kruskal-Wallis Test
        Friedman's Test
    Perform Multivariate Analysis of Variance (MANOVA)
        Introduction to MANOVA
        ANOVA with Multiple Responses
    Model Specification for Repeated Measures Models
        Wilkinson Notation
    Compound Symmetry Assumption and Epsilon Corrections
    Mauchly's Test of Sphericity
    Multivariate Analysis of Variance for Repeated Measures

10  Bayesian Optimization

    Bayesian Optimization Algorithm
        Algorithm Outline
        Gaussian Process Regression for Fitting the Model
        Acquisition Function Types
        Acquisition Function Maximization
    Parallel Bayesian Optimization
        Optimize in Parallel
        Parallel Bayesian Algorithm
        Settings for Best Parallel Performance
        Differences in Parallel Bayesian Optimization Output
    Bayesian Optimization Plot Functions
        Built-In Plot Functions
        Custom Plot Function Syntax
        Create a Custom Plot Function
    Bayesian Optimization Output Functions
        What Is a Bayesian Optimization Output Function?
        Built-In Output Functions
        Custom Output Functions
        Bayesian Optimization Output Function
    Bayesian Optimization Workflow
        What Is Bayesian Optimization?
        Ways to Perform Bayesian Optimization
        Bayesian Optimization Using a Fit Function
        Bayesian Optimization Using bayesopt
        Bayesian Optimization Characteristics
        Parameters Available for Fit Functions
        Hyperparameter Optimization Options for Fit Functions
    Variables for a Bayesian Optimization
        Syntax for Creating Optimization Variables
        Variables for Optimization Examples
    Bayesian Optimization Objective Functions
        Objective Function Syntax
        Objective Function Example
        Objective Function Errors
    Constraints in Bayesian Optimization
        Bounds
        Deterministic Constraints — XConstraintFcn
        Conditional Constraints — ConditionalVariableFcn
        Coupled Constraints
        Bayesian Optimization with Coupled Constraints
    Optimize Cross-Validated Classifier Using bayesopt
    Optimize Classifier Fit Using Bayesian Optimization
    Optimize a Boosted Regression Ensemble

11  Parametric Regression Analysis

    Choose a Regression Function
        Update Legacy Code with New Fitting Methods
    What Is a Linear Regression Model?
    Linear Regression
        Prepare Data
        Choose a Fitting Method
        Choose a Model or Range of Models
        Fit Model to Data
        Examine Quality and Adjust Fitted Model
        Predict or Simulate Responses to New Data
        Share Fitted Models
    Linear Regression Workflow
    Regression Using Dataset Arrays
    Linear Regression Using Tables
    Linear Regression with Interaction Effects
    Interpret Linear Regression Results
    Cook's Distance
        Purpose
        Definition
        How To
        Determine Outliers Using Cook's Distance
    Coefficient Standard Errors and Confidence Intervals
        Coefficient Covariance and Standard Errors
        Coefficient Confidence Intervals
    Coefficient of Determination (R-Squared)
        Purpose
        Definition
        How To
        Display Coefficient of Determination
    Delete-1 Statistics
        Delete-1 Change in Covariance (CovRatio)
        Delete-1 Scaled Difference in Coefficient Estimates (Dfbetas)
        Delete-1 Scaled Change in Fitted Values (Dffits)
        Delete-1 Variance (S2_i)
    Durbin-Watson Test
        Purpose
        Definition
        How To
        Test for Autocorrelation Among Residuals
    F-statistic and t-statistic
        F-statistic
        Assess Fit of Model Using F-statistic
        t-statistic
        Assess Significance of Regression Coefficients Using t-statistic
    Hat Matrix and Leverage
        Hat Matrix
        Leverage
        Determine High Leverage Observations
    Residuals
        Purpose
        Definition
        How To
        Assess Model Assumptions Using Residuals
    Summary of Output and Diagnostic Statistics
    Wilkinson Notation
        Overview
        Formula Specification
        Linear Model Examples
        Linear Mixed-Effects Model Examples
        Generalized Linear Model Examples
        Generalized Linear Mixed-Effects Model Examples
        Repeated Measures Model Examples
    Stepwise Regression
        Stepwise Regression to Select Appropriate Models
        Compare Large and Small Stepwise Models
    Reduce Outlier Effects Using Robust Regression
        Why Use Robust Regression?
        Iteratively Reweighted Least Squares
        Compare Results of Standard and Robust Least-Squares Fit
        Steps for Iteratively Reweighted Least Squares
    Ridge Regression
        Introduction to Ridge Regression
        Ridge Regression
    Lasso and Elastic Net
        What Are Lasso and Elastic Net?
        Lasso and Elastic Net Details
        References
    Wide Data via Lasso and Parallel Computing
    Lasso Regularization
    Lasso and Elastic Net with Cross Validation
    Partial Least Squares
        Introduction to Partial Least Squares
        Perform Partial Least-Squares Regression
    Linear Mixed-Effects Models
    Prepare Data for Linear Mixed-Effects Models
        Tables and Dataset Arrays
        Design Matrices
        Relation of Matrix Form to Tables and Dataset Arrays
    Relationship Between Formula and Design Matrices
        Formula
        Design Matrices for Fixed and Random Effects
        Grouping Variables
    Estimating Parameters in Linear Mixed-Effects Models
        Maximum Likelihood (ML)
        Restricted Maximum Likelihood (REML)
    Linear Mixed-Effects Model Workflow
    Fit Mixed-Effects Spline Regression
    Train Linear Regression Model
    Analyze Time Series Data
    Partial Least Squares Regression and Principal Components Regression
    Accelerate Linear Model Fitting on GPU

12  Generalized Linear Models

    Multinomial Models for Nominal Responses
    Multinomial Models for Ordinal Responses
    Multinomial Models for Hierarchical Responses
    Generalized Linear Models
        What Are Generalized Linear Models?
        Prepare Data
        Choose Generalized Linear Model and Link Function
        Choose Fitting Method and Model
        Fit Model to Data
        Examine Quality and Adjust the Fitted Model
        Predict or Simulate Responses to New Data
        Share Fitted Models
    Generalized Linear Model Workflow
    Lasso Regularization of Generalized Linear Models
        What is Generalized Linear Model Lasso Regularization?
        Generalized Linear Model Lasso and Elastic Net
        References
    Regularize Poisson Regression
    Regularize Logistic Regression
    Regularize Wide Data in Parallel
    Generalized Linear Mixed-Effects Models
        What Are Generalized Linear Mixed-Effects Models?
        GLME Model Equations
        Prepare Data for Model Fitting
        Choose a Distribution Type for the Model
        Choose a Link Function for the Model
        Specify the Model Formula
        Display the Model
        Work with the Model
    Fit a Generalized Linear Mixed-Effects Model
    Fitting Data with Generalized Linear Models
    Train Generalized Additive Model for Binary Classification
    Train Generalized Additive Model for Regression

13  Nonlinear Regression

    Nonlinear Regression
        What Are Parametric Nonlinear Regression Models?
        Prepare Data
        Represent the Nonlinear Model
        Choose Initial Vector beta0
        Fit Nonlinear Model to Data
        Examine Quality and Adjust the Fitted Nonlinear Model
        Predict or Simulate Responses Using a Nonlinear Model
    Nonlinear Regression Workflow
    Mixed-Effects Models
        Introduction to Mixed-Effects Models
        Mixed-Effects Model Hierarchy
        Specifying Mixed-Effects Models
        Specifying Covariate Models
        Choosing nlmefit or nlmefitsa
        Using Output Functions with Mixed-Effects Models
    Examining Residuals for Model Verification
    Mixed-Effects Models Using nlmefit and nlmefitsa
    Weighted Nonlinear Regression
    Pitfalls in Fitting Nonlinear Models by Transforming to Linearity
    Nonlinear Logistic Regression

14  Time Series Forecasting

    Manually Perform Time Series Forecasting Using Ensembles of Boosted Regression Trees
    Perform Time Series Direct Forecasting with directforecaster

15  Survival Analysis

    What Is Survival Analysis?
        Introduction
        Censoring
        Data
        Survivor Function
        Hazard Function
    Kaplan-Meier Method
    Hazard and Survivor Functions for Different Groups
    Survivor Functions for Two Groups
    Cox Proportional Hazards Model
        Introduction
        Hazard Ratio
        Extension of Cox Proportional Hazards Model
        Partial Likelihood Function
        Partial Likelihood Function for Tied Events
        Frequency or Weights of Observations
    Cox Proportional Hazards Model for Censored Data
    Cox Proportional Hazards Model with Time-Dependent Covariates
    Cox Proportional Hazards Model Object
    Analyzing Survival or Reliability Data

16  Multivariate Methods

    Multivariate Linear Regression
        Introduction to Multivariate Methods
        Multivariate Linear Regression Model
        Solving Multivariate Regression Problems
    Estimation of Multivariate Regression Models
        Least Squares Estimation
        Maximum Likelihood Estimation
        Missing Response Data
    Set Up Multivariate Regression Problems
        Response Matrix
        Design Matrices
        Common Multivariate Regression Problems
    Multivariate General Linear Model
    Fixed Effects Panel Model with Concurrent Correlation
    Longitudinal Analysis
    Multidimensional Scaling
    Nonclassical and Nonmetric Multidimensional Scaling
        Nonclassical Multidimensional Scaling
        Nonmetric Multidimensional Scaling
    Classical Multidimensional Scaling
    Compare Handwritten Shapes Using Procrustes Analysis
    Introduction to Feature Selection
        Feature Selection Algorithms
        Feature Selection Functions
    Sequential Feature Selection
        Introduction to Sequential Feature Selection
        Select Subset of Features with Comparative Predictive Power
    Nonnegative Matrix Factorization
    Perform Nonnegative Matrix Factorization
    Principal Component Analysis (PCA)
    Analyze Quality of Life in U.S. Cities Using PCA
    Factor Analysis
    Analyze Stock Prices Using Factor Analysis
    Robust Feature Selection Using NCA for Regression
    Neighborhood Component Analysis (NCA) Feature Selection
        NCA Feature Selection for Classification
        NCA Feature Selection for Regression
        Impact of Standardization
        Choosing the Regularization Parameter Value
    t-SNE
        What Is t-SNE?
        t-SNE Algorithm
        Barnes-Hut Variation of t-SNE
        Characteristics of t-SNE
    t-SNE Output Function
        t-SNE Output Function Description
        tsne optimValues Structure
        t-SNE Custom Output Function
    Visualize High-Dimensional Data Using t-SNE
    tsne Settings
    Feature Extraction
        What Is Feature Extraction?
        Sparse Filtering Algorithm
        Reconstruction ICA Algorithm
    Feature Extraction Workflow
    Extract Mixed Signals
    Select Features for Classifying High-Dimensional Data
    Perform Factor Analysis on Exam Grades
    Classical Multidimensional Scaling Applied to Nonspatial Distances
    Nonclassical Multidimensional Scaling
    Fitting an Orthogonal Regression Using Principal Components Analysis
    Tune Regularization Parameter to Detect Features Using NCA for Classification

17  Cluster Analysis

    Choose Cluster Analysis Method
        Clustering Methods
        Comparison of Clustering Methods
    Hierarchical Clustering
        Introduction to Hierarchical Clustering
        Algorithm Description
        Similarity Measures
        Linkages
        Dendrograms
        Verify the Cluster Tree
        Create Clusters
    DBSCAN
        Introduction to DBSCAN
        Algorithm Description
        Determine Values for DBSCAN Parameters
    Partition Data Using Spectral Clustering
        Introduction to Spectral Clustering
        Algorithm Description
        Estimate Number of Clusters and Perform Spectral Clustering
    k-Means Clustering
        Introduction to k-Means Clustering
        Compare k-Means Clustering Solutions
    Cluster Using Gaussian Mixture Model
        How Gaussian Mixture Models Cluster Data
        Fit GMM with Different Covariance Options and Initial Conditions
        When to Regularize
        Model Fit Statistics
    Cluster Gaussian Mixture Data Using Hard Clustering
    Cluster Gaussian Mixture Data Using Soft Clustering
    Tune Gaussian Mixture Models
    Cluster Evaluation
    Cluster Analysis
    Anomaly Detection with Isolation Forest
        Introduction to Isolation Forest
        Parameters for Isolation Forests
        Anomaly Scores
        Anomaly Indicators
        Detect Outliers and Plot Contours of Anomaly Scores
        Examine NumObservationsPerLearner for Small Data
    Unsupervised Anomaly Detection
        Outlier Detection
        Novelty Detection
    Model-Specific Anomaly Detection
        Detect Outliers After Training Random Forest
        Detect Outliers After Training Discriminant Analysis Classifier

18  Parametric Classification

    Parametric Classification
    ROC Curve and Performance Metrics
        Introduction to ROC Curve
        Performance Curve with MATLAB
        ROC Curve for Multiclass Classification
        Performance Metrics
        Classification Scores and Thresholds
        Pointwise Confidence Intervals
    Performance Curves by perfcurve
        Input Scores and Labels for perfcurve
        Computation of Performance Metrics
        Multiclass Classification Problems
        Confidence Intervals
        Observation Weights
    Classification

19  Nonparametric Supervised Learning

Supervised Learning Workflow and Algorithms
    What Is Supervised Learning?
    Steps in Supervised Learning
    Characteristics of Classification Algorithms
    Misclassification Cost Matrix, Prior Probabilities, and Observation Weights
Visualize Decision Surfaces of Different Classifiers
Classification Using Nearest Neighbors
    Pairwise Distance Metrics
    k-Nearest Neighbor Search and Radius Search
    Classify Query Data
    Find Nearest Neighbors Using a Custom Distance Metric
    K-Nearest Neighbor Classification for Supervised Learning
    Construct KNN Classifier
    Examine Quality of KNN Classifier
    Predict Classification Using KNN Classifier
    Modify KNN Classifier
Framework for Ensemble Learning
    Prepare the Predictor Data
    Prepare the Response Data
    Choose an Applicable Ensemble Aggregation Method
    Set the Number of Ensemble Members
    Prepare the Weak Learners
    Call fitcensemble or fitrensemble
Ensemble Algorithms
    Bootstrap Aggregation (Bagging) and Random Forest
    Random Subspace
    Boosting Algorithms
Train Classification Ensemble
Train Regression Ensemble
Select Predictors for Random Forests
Test Ensemble Quality
Ensemble Regularization
    Regularize a Regression Ensemble
Classification with Imbalanced Data
Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles
    Train Ensemble With Unequal Classification Costs
Surrogate Splits
LPBoost and TotalBoost for Small Ensembles
Tune RobustBoost
Random Subspace Classification
Train Classification Ensemble in Parallel
Bootstrap Aggregation (Bagging) of Regression Trees Using TreeBagger
Bootstrap Aggregation (Bagging) of Classification Trees Using TreeBagger
Detect Outliers Using Quantile Regression
Conditional Quantile Estimation Using Kernel Smoothing
Tune Random Forest Using Quantile Error and Bayesian Optimization
Assess Neural Network Classifier Performance
Assess Regression Neural Network Performance
Automated Feature Engineering for Classification
    Interpret Linear Model with Generated Features
    Generate New Features to Improve Bagged Ensemble Accuracy
Automated Feature Engineering for Regression
    Interpret Linear Model with Generated Features
    Generate New Features to Improve Bagged Ensemble Performance
Moving Towards Automating Model Selection Using Bayesian Optimization
Automated Classifier Selection with Bayesian and ASHA Optimization
Automated Regression Model Selection with Bayesian and ASHA Optimization
Credit Rating by Bagging Decision Trees
Combine Heterogeneous Models into Stacked Ensemble
Label Data Using Semi-Supervised Learning Techniques
Bibliography

20  Decision Trees

Decision Trees
    Train Classification Tree
    Train Regression Tree
View Decision Tree
Growing Decision Trees
Prediction Using Classification and Regression Trees
Predict Out-of-Sample Responses of Subtrees
Improving Classification Trees and Regression Trees
    Examining Resubstitution Error
    Cross Validation
    Choose Split Predictor Selection Technique
    Control Depth or “Leafiness”
    Pruning
Splitting Categorical Predictors in Classification Trees
    Challenges in Splitting Multilevel Predictors
    Algorithms for Categorical Predictor Split
    Inspect Data with Multilevel Categorical Predictors

21  Discriminant Analysis

Discriminant Analysis Classification
    Create Discriminant Analysis Classifiers
Creating Discriminant Analysis Model
    Weighted Observations
Prediction Using Discriminant Analysis Models
    Posterior Probability
    Prior Probability
    Cost
Create and Visualize Discriminant Analysis Classifier
Improving Discriminant Analysis Models
    Deal with Singular Data
    Choose a Discriminant Type
    Examine the Resubstitution Error and Confusion Matrix
    Cross Validation
    Change Costs and Priors
Regularize Discriminant Analysis Classifier
Examine the Gaussian Mixture Assumption
    Bartlett Test of Equal Covariance Matrices for Linear Discriminant Analysis
    Q-Q Plot
    Mardia Kurtosis Test of Multivariate Normality

22  Naive Bayes

Naive Bayes Classification
    Supported Distributions
Plot Posterior Classification Probabilities

23  Classification Learner

Machine Learning in MATLAB
    What Is Machine Learning?
    Selecting the Right Algorithm
    Train Classification Models in Classification Learner App
    Train Regression Models in Regression Learner App
    Train Neural Networks for Deep Learning
Train Classification Models in Classification Learner App
    Automated Classifier Training
    Manual Classifier Training
    Parallel Classifier Training
    Compare and Improve Classification Models
Select Data for Classification or Open Saved App Session
    Select Data from Workspace
    Import Data from File
    Example Data for Classification
    Choose Validation Scheme
    (Optional) Reserve Data for Testing
    Save and Open App Session
Choose Classifier Options
    Choose Classifier Type
    Decision Trees
    Discriminant Analysis
    Logistic Regression Classifiers
    Naive Bayes Classifiers
    Support Vector Machines
    Efficiently Trained Linear Classifiers
    Nearest Neighbor Classifiers
    Kernel Approximation Classifiers
    Ensemble Classifiers
    Neural Network Classifiers
Feature Selection and Feature Transformation Using Classification Learner App
    Investigate Features in the Scatter Plot
    Select Features to Include
    Transform Features with PCA in Classification Learner
    Investigate Features in the Parallel Coordinates Plot
Misclassification Costs in Classification Learner App
    Specify Misclassification Costs
    Assess Model Performance
    Misclassification Costs in Exported Model and Generated Code
Hyperparameter Optimization in Classification Learner App
    Select Hyperparameters to Optimize
    Optimization Options
    Minimum Classification Error Plot
    Optimization Results
Visualize and Assess Classifier Performance in Classification Learner
    Check Performance in the Models Pane
    View Model Metrics in Summary Tab and Models Pane
    Compare Model Information and Results in Table View
    Plot Classifier Results
    Check Performance Per Class in the Confusion Matrix
    Check ROC Curve
    Compare Model Plots by Changing Layout
    Evaluate Test Set Model Performance
Export Plots in Classification Learner App
Export Classification Model to Predict New Data
    Export the Model to the Workspace to Make Predictions for New Data
    Make Predictions for New Data Using Exported Model
    Generate MATLAB Code to Train the Model with New Data
    Generate C Code for Prediction
    Deploy Predictions Using MATLAB Compiler
    Export Model for Deployment to MATLAB Production Server
Train Decision Trees Using Classification Learner App
Train Discriminant Analysis Classifiers Using Classification Learner App
Train Binary GLM Logistic Regression Classifier Using Classification Learner App
Train Support Vector Machines Using Classification Learner App
Train Nearest Neighbor Classifiers Using Classification Learner App
Train Kernel Approximation Classifiers Using Classification Learner App
Train Ensemble Classifiers Using Classification Learner App
Train Naive Bayes Classifiers Using Classification Learner App
Train Neural Network Classifiers Using Classification Learner App
Train and Compare Classifiers Using Misclassification Costs in Classification Learner App
Train Classifier Using Hyperparameter Optimization in Classification Learner App
Check Classifier Performance Using Test Set in Classification Learner App
Explain Model Predictions for Classifiers Trained in Classification Learner App
    Explain Local Model Predictions Using LIME Values
    Explain Local Model Predictions Using Shapley Values
    Interpret Model Using Partial Dependence Plots
Use Partial Dependence Plots to Interpret Classifiers Trained in Classification Learner App
Deploy Model Trained in Classification Learner to MATLAB Production Server
    Choose Trained Model to Deploy
    Export Model for Deployment
    (Optional) Simulate Model Deployment
    Package Code
Build Condition Model for Industrial Machinery and Manufacturing Processes
    Load Data
    Import Data into App and Partition Data
    Train Models Using All Features
    Assess Model Performance
    Export Model to the Workspace and Save App Session
    Check Model Size
    Resume App Session
    Select Features Using Feature Ranking
    Investigate Important Features in Scatter Plot
    Further Experimentation
    Assess Model Accuracy on Test Set
    Export Final Model
Export Model from Classification Learner to Experiment Manager
    Export Classification Model
    Select Hyperparameters
    (Optional) Customize Experiment
    Run Experiment
Tune Classification Model Using Experiment Manager
    Load and Partition Data
    Train Models in Classification Learner
    Assess Best Model Performance
    Export Model to Experiment Manager
    Run Experiment with Default Hyperparameters
    Adjust Hyperparameters and Hyperparameter Values
    Specify Training Data
    Customize Confusion Matrix
    Export and Use Final Model

24  Regression Learner

Train Regression Models in Regression Learner App
    Automated Regression Model Training
    Manual Regression Model Training
    Parallel Regression Model Training
    Compare and Improve Regression Models
Select Data for Regression or Open Saved App Session
    Select Data from Workspace
    Import Data from File
    Example Data for Regression
    Choose Validation Scheme
    (Optional) Reserve Data for Testing
    Save and Open App Session
Choose Regression Model Options
    Choose Regression Model Type
    Linear Regression Models
    Regression Trees
    Support Vector Machines
    Efficiently Trained Linear Regression Models
    Gaussian Process Regression Models
    Kernel Approximation Models
    Ensembles of Trees
    Neural Networks
Feature Selection and Feature Transformation Using Regression Learner App
    Investigate Features in the Response Plot
    Select Features to Include
    Transform Features with PCA in Regression Learner
Hyperparameter Optimization in Regression Learner App
    Select Hyperparameters to Optimize
    Optimization Options
    Minimum MSE Plot
    Optimization Results
Visualize and Assess Model Performance in Regression Learner
    Check Performance in Models Pane
    View Model Metrics in Summary Tab and Models Pane
    Compare Model Information and Results in Table View
    Explore Data and Results in Response Plot
    Plot Predicted vs. Actual Response
    Evaluate Model Using Residuals Plot
    Compare Model Plots by Changing Layout
    Evaluate Test Set Model Performance
Export Plots in Regression Learner App
Export Regression Model to Predict New Data
    Export Model to Workspace
    Make Predictions for New Data Using Exported Model
    Generate MATLAB Code to Train Model with New Data
    Generate C Code for Prediction
    Deploy Predictions Using MATLAB Compiler
    Export Model for Deployment to MATLAB Production Server
Train Regression Trees Using Regression Learner App
Compare Linear Regression Models Using Regression Learner App
Train Regression Neural Networks Using Regression Learner App
Train Kernel Approximation Model Using Regression Learner App
Train Regression Model Using Hyperparameter Optimization in Regression Learner App
Check Model Performance Using Test Set in Regression Learner App
Explain Model Predictions for Regression Models Trained in Regression Learner App
    Explain Local Model Predictions Using LIME Values
    Explain Local Model Predictions Using Shapley Values
    Interpret Model Using Partial Dependence Plots
Use Partial Dependence Plots to Interpret Regression Models Trained in Regression Learner App
Deploy Model Trained in Regression Learner to MATLAB Production Server
    Choose Trained Model to Deploy
    Export Model for Deployment
    (Optional) Simulate Model Deployment
    Package Code
Export Model from Regression Learner to Experiment Manager
    Export Regression Model
    Select Hyperparameters
    (Optional) Customize Experiment
    Run Experiment
Tune Regression Model Using Experiment Manager
    Load and Partition Data
    Train Models in Regression Learner
    Assess Best Model Performance
    Export Model to Experiment Manager
    Run Experiment with Default Hyperparameters
    Adjust Hyperparameters and Hyperparameter Values
    Specify Training Data
    Add Residuals Plot
    Export and Use Final Model

25  Support Vector Machines

Support Vector Machines for Binary Classification
    Understanding Support Vector Machines
    Using Support Vector Machines
    Train SVM Classifiers Using a Gaussian Kernel
    Train SVM Classifier Using Custom Kernel
    Optimize Classifier Fit Using Bayesian Optimization
    Plot Posterior Probability Regions for SVM Classification Models
    Analyze Images Using Linear Support Vector Machines
Understanding Support Vector Machine Regression
    Mathematical Formulation of SVM Regression
    Solving the SVM Regression Optimization Problem

26  Fairness

Introduction to Fairness in Binary Classification
    Reduce Statistical Parity Difference Using Fairness Weights
    Reduce Disparate Impact of Predictions

27  Interpretability

Interpret Machine Learning Models
    Features for Model Interpretation
    Interpret Classification Model
    Interpret Regression Model
Shapley Values for Machine Learning Model
    What Is a Shapley Value?
    Shapley Value in Statistics and Machine Learning Toolbox
    Algorithms
    Specify Computation Algorithm
    Computational Cost
    Reduce Computational Cost

28  Incremental Learning

Incremental Learning Overview
    What Is Incremental Learning?
    Incremental Learning with MATLAB
Incremental Anomaly Detection Overview
    What Is Incremental Anomaly Detection?
    Incremental Anomaly Detection with MATLAB
Configure Incremental Learning Model
    Call Object Directly
    Convert Traditionally Trained Model
Configure Model for Incremental Anomaly Detection
    Call Object Directly
    Convert Traditionally Trained Model
Implement Incremental Learning for Regression Using Succinct Workflow
Implement Incremental Learning for Classification Using Succinct Workflow
Implement Incremental Learning for Regression Using Flexible Workflow
Implement Incremental Learning for Classification Using Flexible Workflow
Initialize Incremental Learning Model from SVM Regression Model Trained in Regression Learner
Initialize Incremental Learning Model from Logistic Regression Model Trained in Classification Learner
Perform Conditional Training During Incremental Learning
Perform Text Classification Incrementally
Incremental Learning with Naive Bayes and Heterogeneous Data
Monitor Equipment State of Health Using Drift-Aware Learning
Monitor Equipment State of Health Using Drift-Aware Learning on the Cloud

29  Markov Models

Markov Chains
Hidden Markov Models (HMM)
    Introduction to Hidden Markov Models (HMM)
    Analyzing Hidden Markov Models

30  Design of Experiments

Design of Experiments
Full Factorial Designs
    Multilevel Designs
    Two-Level Designs
Fractional Factorial Designs
    Introduction to Fractional Factorial Designs
    Plackett-Burman Designs
    General Fractional Designs
Response Surface Designs
    Introduction to Response Surface Designs
    Central Composite Designs
    Box-Behnken Designs
D-Optimal Designs
    Introduction to D-Optimal Designs
    Generate D-Optimal Designs
    Augment D-Optimal Designs
    Specify Fixed Covariate Factors
    Specify Categorical Factors
    Specify Candidate Sets
Improve an Engine Cooling Fan Using Design for Six Sigma Techniques

31  Statistical Process Control

Control Charts
Capability Studies

32  Tall Arrays

Logistic Regression with Tall Arrays
Bayesian Optimization with Tall Arrays
Statistics and Machine Learning with Big Data Using Tall Arrays

33  Parallel Statistics

Quick Start Parallel Computing for Statistics and Machine Learning Toolbox
    Parallel Statistics and Machine Learning Toolbox Functionality
    How to Compute in Parallel
Use Parallel Processing for Regression TreeBagger Workflow
Concepts of Parallel Computing in Statistics and Machine Learning Toolbox
    Subtleties in Parallel Computing
    Vocabulary for Parallel Computation
When to Run Statistical Functions in Parallel
    Why Run in Parallel?
    Factors Affecting Speed
    Factors Affecting Results
Analyze and Model Data on GPU
Working with parfor
    How Statistical Functions Use parfor
    Characteristics of parfor
Reproducibility in Parallel Statistical Computations
    Issues and Considerations in Reproducing Parallel Computations
    Running Reproducible Parallel Computations
    Parallel Statistical Computation Using Random Numbers
Implement Jackknife Using Parallel Computing
Implement Cross-Validation Using Parallel Computing
    Simple Parallel Cross Validation
    Reproducible Parallel Cross Validation
Implement Bootstrap Using Parallel Computing
    Bootstrap in Serial and Parallel
    Reproducible Parallel Bootstrap

34  Code Generation

Introduction to Code Generation
    Code Generation Workflows
    Code Generation Applications
General Code Generation Workflow
    Define Entry-Point Function
    Generate Code
    Verify Generated Code
Code Generation for Prediction of Machine Learning Model at Command Line
Code Generation for Incremental Learning
Code Generation for Nearest Neighbor Searcher
Code Generation for Prediction of Machine Learning Model Using MATLAB Coder App
Code Generation and Classification Learner App
    Load Sample Data
    Enable PCA
    Train Models
    Export Model to Workspace
    Generate C Code for Prediction
Deploy Neural Network Regression Model to FPGA/ASIC Platform
Predict Class Labels Using MATLAB Function Block
Specify Variable-Size Arguments for Code Generation
Create Dummy Variables for Categorical Predictors and Generate C/C++ Code
System Objects for Classification and Code Generation
Predict Class Labels Using Stateflow
Human Activity Recognition Simulink Model for Smartphone Deployment
Human Activity Recognition Simulink Model for Fixed-Point Deployment
Code Generation for Prediction and Update Using Coder Configurer
Code Generation for Probability Distribution Objects
Fixed-Point Code Generation for Prediction of SVM
Generate Code to Classify Data in Table
Code Generation for Image Classification
Predict Class Labels Using ClassificationSVM Predict Block
Predict Responses Using RegressionSVM Predict Block
Predict Class Labels Using ClassificationTree Predict Block
Predict Responses Using RegressionTree Predict Block
Predict Class Labels Using ClassificationEnsemble Predict Block
Predict Responses Using RegressionEnsemble Predict Block
Predict Class Labels Using ClassificationNeuralNetwork Predict Block
Predict Responses Using RegressionNeuralNetwork Predict Block
Predict Responses Using RegressionGP Predict Block
Predict Class Labels Using ClassificationKNN Predict Block
Predict Class Labels Using ClassificationLinear Predict Block
Predict Responses Using RegressionLinear Predict Block
Predict Class Labels Using ClassificationECOC Predict Block
Predict Class Labels Using ClassificationNaiveBayes Predict Block
Code Generation for Binary GLM Logistic Regression Model Trained in Classification Learner
Code Generation for Anomaly Detection
Compress Machine Learning Model for Memory-Limited Hardware
Verify and Validate Machine Learning Models Using Model-Based Design
Find Nearest Neighbors Using KNN Search Block
Perform Incremental Learning Using IncrementalRegressionLinear Fit and Predict Blocks
Perform Incremental Learning Using IncrementalClassificationLinear Fit and Predict Blocks
Perform Incremental Learning and Track Performance Metrics Using Update Metrics Block

35  Functions

A  Sample Data Sets

Sample Data Sets

B  Probability Distributions

Bernoulli Distribution
    Overview
    Parameters
    Probability Density Function
    Cumulative Distribution Function
    Descriptive Statistics
    Examples
    Related Distributions
Beta Distribution
    Overview
    Parameters
    Probability Density Function
    Cumulative Distribution Function
    Examples
    Related Distributions
Binomial Distribution
    Overview
    Parameters
    Probability Density Function
    Cumulative Distribution Function
    Descriptive Statistics
    Example
    Related Distributions
Birnbaum-Saunders Distribution
    Definition
    Background
    Parameters
Burr Type XII Distribution
    Definition
    Background
    Parameters
    Fit a Burr Distribution and Draw the cdf
    Compare Lognormal and Burr Distribution pdfs
    Burr pdf for Various Parameters
    Survival and Hazard Functions of Burr Distribution
    Divergence of Parameter Estimates
Chi-Square Distribution
    Overview
    Parameters
    Probability Density Function
    Cumulative Distribution Function
    Inverse Cumulative Distribution Function
    Descriptive Statistics
    Examples
    Related Distributions
Exponential Distribution
    Overview
    Parameters
    Probability Density Function
    Cumulative Distribution Function
    Inverse Cumulative Distribution Function
    Hazard Function
    Examples
    Related Distributions
Extreme Value Distribution
    Definition
    Background
    Parameters
    Examples
F Distribution
    Definition
    Background
    Examples
Gamma Distribution
    Overview
    Parameters
    Probability Density Function
    Cumulative Distribution Function
    Inverse Cumulative Distribution Function
    Descriptive Statistics
    Examples
    Related Distributions
Generalized Extreme Value Distribution
    Definition
    Background
    Parameters
    Examples
Generalized Pareto Distribution
    Definition
    Background
    Parameters
    Examples
Geometric Distribution
    Overview
    Parameters
    Probability Density Function
    Cumulative Distribution Function
    Descriptive Statistics
    Hazard Function
    Examples
    Related Distributions
Half-Normal Distribution
    Overview
    Parameters
    Probability Density Function
    Cumulative Distribution Function
    Descriptive Statistics
    Relationship to Other Distributions
Hypergeometric Distribution
    Definition
    Background
    Examples
Inverse Gaussian Distribution
    Definition
    Background
    Parameters
Inverse Wishart Distribution
    Definition
    Background
    Example
Kernel Distribution
    Overview
    Kernel Density Estimator
    Kernel Smoothing Function
    Bandwidth
Logistic Distribution
    Overview
    Parameters
    Probability Density Function
    Relationship to Other Distributions
Loglogistic Distribution
    Overview
    Parameters
    Probability Density Function
    Relationship to Other Distributions
Lognormal Distribution
    Overview
    Parameters
    Probability Density Function
    Cumulative Distribution Function
    Examples
    Related Distributions
Loguniform Distribution
    Overview
    Parameters
    Probability Density Function
    Cumulative Distribution Function
    Descriptive Statistics
    Examples
    Related Distributions
xxxviii
Contents
Multinomial Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parameter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Probability Density Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Descriptive Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Relationship to Other Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . .
B-102 B-102 B-102 B-102 B-102 B-103
Multivariate Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Probability Density Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
B-104 B-104 B-104 B-104 B-105 B-105
Multivariate t Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
B-110 B-110 B-110 B-110
Nakagami Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
B-114 B-114 B-114 B-114
Negative Binomial Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
B-115 B-115 B-115 B-115 B-117
Noncentral Chi-Square Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
B-119 B-119 B-119 B-119
Noncentral F Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
B-121 B-121 B-121 B-121
Noncentral t Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
B-123 B-123 B-123 B-123
Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Probability Density Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Related Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
B-125 B-125 B-125 B-126 B-126 B-127 B-134
Pearson Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Probability Density Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
B-137 B-137 B-137 B-138 B-139 B-140 B-140
Piecewise Linear Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Relationship to Other Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . .
B-145 B-145 B-145 B-145 B-145
Poisson Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Probability Density Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Related Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
B-146 B-146 B-146 B-146 B-147 B-147 B-150
Rayleigh Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
B-152 B-152 B-152 B-152 B-152
Rician Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
B-154 B-154 B-154 B-154
Stable Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Probability Density Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Descriptive Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Relationship to Other Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . .
B-156 B-156 B-156 B-157 B-159 B-161 B-162
xxxix
xl
Contents
Student's t Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Probability Density Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Inverse Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . Descriptive Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Related Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
B-165 B-165 B-165 B-165 B-166 B-166 B-166 B-166 B-170
t Location-Scale Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Probability Density Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Descriptive Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Relationship to Other Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . .
B-172 B-172 B-172 B-172 B-173 B-173 B-173
Triangular Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Probability Density Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
B-174 B-174 B-174 B-174 B-175 B-175
Uniform Distribution (Continuous) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Probability Density Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Descriptive Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Random Number Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Related Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
B-179 B-179 B-179 B-180 B-180 B-180 B-180 B-180 B-183
Uniform Distribution (Discrete) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
B-184 B-184 B-184 B-184
Weibull Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Probability Density Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Inverse Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . Hazard Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Related Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
B-186 B-186 B-186 B-187 B-187 B-187 B-188 B-188 B-191
Wishart Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Probability Density Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
B-193 B-193 B-193 B-193 B-193
C
Bibliography Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
C-2
1 Getting Started
• “Statistics and Machine Learning Toolbox Product Description” on page 1-2
• “Supported Data Types” on page 1-3
Statistics and Machine Learning Toolbox Product Description
Analyze and model data using statistics and machine learning
Statistics and Machine Learning Toolbox provides functions and apps to describe, analyze, and model data. You can use descriptive statistics, visualizations, and clustering for exploratory data analysis, fit probability distributions to data, generate random numbers for Monte Carlo simulations, and perform hypothesis tests. Regression and classification algorithms let you draw inferences from data and build predictive models either interactively, using the Classification and Regression Learner apps, or programmatically, using AutoML.
For multidimensional data analysis and feature extraction, the toolbox provides principal component analysis (PCA), regularization, dimensionality reduction, and feature selection methods that let you identify variables with the best predictive power.
The toolbox provides supervised, semi-supervised and unsupervised machine learning algorithms, including support vector machines (SVMs), boosted decision trees, k-means, and other clustering methods. You can apply interpretability techniques such as partial dependence plots and LIME, and automatically generate C/C++ code for embedded deployment. Many toolbox algorithms can be used on data sets that are too big to be stored in memory.
Supported Data Types
Statistics and Machine Learning Toolbox supports the following data types for input arguments:
• Numeric scalars, vectors, matrices, or arrays having single- or double-precision entries. These data forms have data type single or double. Examples include response variables, predictor variables, and numeric values.
• Cell arrays of character vectors; character, string, logical, or categorical arrays; or numeric vectors for categorical variables representing grouping data. These data forms have data types cell (specifically cellstr), char, string, logical, categorical, and single or double, respectively. An example is an array of class labels in machine learning.
• You can also use nominal or ordinal arrays for categorical data. However, the nominal and ordinal data types are not recommended. To work with nominal or ordinal categorical data, use the categorical data type instead.
• You can use signed or unsigned integers, e.g., int8 or uint8. However:
    • Estimation functions might not support signed or unsigned integer data types for nongrouping data.
    • If you recast a single or double numeric vector containing NaN values to a signed or unsigned integer, then the software converts the NaN elements to 0.
• Some functions support tabular arrays for heterogeneous data (for details, see “Tables”). The table data type contains variables of any of the data types previously listed. An example is mixed categorical and numerical predictor data for regression analysis.
• For some functions, you can also use dataset arrays for heterogeneous data. However, the dataset data type is not recommended. To work with heterogeneous data, use the table data type if the estimation function supports it.
    • Functions that do not support the table data type support sample data of type single or double, e.g., matrices.
• Some functions accept gpuArray input arguments so that they execute on the GPU. For the full list of Statistics and Machine Learning Toolbox functions that accept GPU arrays, see Function List (GPU Arrays).
• Some functions accept tall array input arguments to work with large data sets. For the full list of Statistics and Machine Learning Toolbox functions that accept tall arrays, see Function List (Tall Arrays).
• Some functions accept sparse matrices, i.e., matrix A such that issparse(A) returns 1. For functions that do not accept sparse matrices, recast the data to a full matrix by using full.
Statistics and Machine Learning Toolbox does not support the following data types:
• Complex numbers.
• Custom numeric data types, e.g., a variable that is double precision and an object.
• Signed or unsigned numeric integers for nongrouping data, e.g., uint8 and int16.
Note If you specify data of an unsupported type, then the software might return an error or unexpected results.
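For instance, the following short session (a minimal sketch, not part of the original documentation) illustrates two of the behaviors listed above: recasting a double vector that contains NaN values to an integer type converts the NaN elements to 0, and a sparse matrix can be recast to a full matrix with full before calling a function that does not accept sparse input.

% Recasting a double vector that contains NaN to int8 converts NaN to 0.
x = [1.5 NaN 3];
xInt = int8(x)        % xInt is [2 0 3]; the NaN element becomes 0

% Recast a sparse matrix to a full matrix before passing it to a function
% that does not accept sparse input.
A = sparse(eye(3));
issparse(A)           % returns logical 1
B = full(A);          % B is a full matrix of type double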
2 Organizing Data
• “Test Differences Between Category Means” on page 2-2
• “Grouping Variables” on page 2-11
• “Dummy Variables” on page 2-13
• “Linear Regression with Categorical Covariates” on page 2-17
Test Differences Between Category Means
This example shows how to test for significant differences between category (group) means using a t-test, two-way ANOVA (analysis of variance), and ANOCOVA (analysis of covariance) analysis. Determine if the expected miles per gallon for a car depends on the decade in which it was manufactured or the location where it was manufactured.
Load Sample Data
load carsmall
unique(Model_Year)
ans = 3×1
    70
    76
    82
The variable MPG has miles per gallon measurements on a sample of 100 cars. The variables Model_Year and Origin contain the model year and country of origin for each car. The first factor of interest is the decade of manufacture. There are three manufacturing years in the data.
Create Factor for Decade of Manufacture
Create a categorical array named Decade by merging the observations from years 70 and 76 into a category labeled 1970s, and putting the observations from 82 into a category labeled 1980s.
Decade = discretize(Model_Year,[70 77 82], ...
    "categorical",["1970s","1980s"]);
categories(Decade)
ans = 2x1 cell
    {'1970s'}
    {'1980s'}
Plot Data Grouped by Category
Draw a box plot of miles per gallon, grouped by the decade of manufacture.
boxplot(MPG,Decade)
title("Miles per Gallon, Grouped by Decade of Manufacture")
The box plot suggests that miles per gallon is higher in cars manufactured during the 1980s compared to the 1970s.
Compute Summary Statistics
Compute the mean and variance of miles per gallon for each decade.
[xbar,s2,grp] = grpstats(MPG,Decade,["mean","var","gname"])
xbar = 2×1
   19.7857
   31.7097
s2 = 2×1
   35.1429
   29.0796
grp = 2x1 cell
    {'1970s'}
    {'1980s'}
This output shows that the mean miles per gallon in the 1980s was approximately 31.71, compared to 19.79 in the 1970s. The variances in the two groups are similar.
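If you prefer to keep the data in a table, you can compute the same group statistics with the table syntax of grpstats. The following sketch is not part of the original example; it assumes the MPG and Decade variables created above.

% Collect the response and the grouping variable in a table, then compute
% the group means and variances of MPG for each decade.
tbl = table(MPG,Decade);
statsByDecade = grpstats(tbl,"Decade",["mean","var"],"DataVars","MPG")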
Conduct Two-Sample t-Test for Equal Group Means
Conduct a two-sample t-test, assuming equal variances, to test for a significant difference between the group means. The hypothesis is
    H0: μ70 = μ80
    HA: μ70 ≠ μ80.
MPG70 = MPG(Decade=="1970s");
MPG80 = MPG(Decade=="1980s");
[h,p] = ttest2(MPG70,MPG80)
h = 1
p = 3.4809e-15
The logical value 1 indicates the null hypothesis is rejected at the default 0.05 significance level. The p-value for the test is very small. There is sufficient evidence that the mean miles per gallon in the 1980s differs from the mean miles per gallon in the 1970s.
Create Factor for Location of Manufacture
The second factor of interest is the location of manufacture. First, convert Origin to a categorical array.
Location = categorical(cellstr(Origin));
tabulate(Location)
      Value    Count    Percent
     France        4      4.00%
    Germany        9      9.00%
      Italy        1      1.00%
      Japan       15     15.00%
     Sweden        2      2.00%
        USA       69     69.00%
There are six different countries of manufacture. The European countries have relatively few observations.
Merge Categories
Combine the categories France, Germany, Italy, and Sweden into a new category named Europe.
Location = mergecats(Location, ...
    ["France","Germany","Italy","Sweden"],"Europe");
tabulate(Location)
     Value    Count    Percent
    Europe       16     16.00%
     Japan       15     15.00%
       USA       69     69.00%
Compute Summary Statistics
Compute the mean miles per gallon, grouped by the location of manufacture.
[meanMPG,locationGroup] = grpstats(MPG,Location,["mean","gname"])
meanMPG = 3×1
   26.6667
   31.8000
   21.1328
locationGroup = 3x1 cell
    {'Europe'}
    {'Japan' }
    {'USA'   }
This result shows that average miles per gallon is lowest for the sample of cars manufactured in the U.S.
Conduct Two-Way ANOVA
Conduct a two-way ANOVA to test for differences in expected miles per gallon between factor levels for Decade and Location. The statistical model is
    MPG_ij = μ + α_i + β_j + ε_ij,    i = 1, 2; j = 1, 2, 3,
where MPG_ij is the response, miles per gallon, for cars made in decade i at location j. The treatment effects for the first factor, decade of manufacture, are the α_i terms (constrained to sum to zero). The treatment effects for the second factor, location of manufacture, are the β_j terms (constrained to sum to zero). The ε_ij are uncorrelated, normally distributed noise terms.
The hypotheses to test are equality of decade effects,
    H0: α_1 = α_2 = 0
    HA: at least one α_i ≠ 0,
and equality of location effects,
    H0: β_1 = β_2 = β_3 = 0
    HA: at least one β_j ≠ 0.
You can conduct a multiple-factor ANOVA using anovan.
anovan(MPG,{Decade,Location}, ...
    "Varnames",["Decade","Location"]);
This output shows the results of the two-way ANOVA. The p-value for testing the equality of decade effects is 2.88503e-18, so the null hypothesis is rejected at the 0.05 significance level. The p-value for testing the equality of location effects is 7.40416e-10, so this null hypothesis is also rejected.
Conduct ANOCOVA Analysis
A potential confounder in this analysis is car weight. Cars with greater weight are expected to have lower gas mileage. Include the variable Weight as a continuous covariate in the ANOVA; that is, conduct an ANOCOVA analysis. Assuming parallel lines, the statistical model is
    MPG_ijk = μ + α_i + β_j + γ·Weight_ijk + ε_ijk,    i = 1, 2; j = 1, 2, 3; k = 1, ..., 100.
The difference between this model and the two-way ANOVA model is the inclusion of the continuous predictor Weight_ijk, the weight for the kth car, which was made in the ith decade and in the jth location. The slope parameter is γ.
Add the continuous covariate as a third group in the second anovan input argument. Use the Continuous name-value argument to specify that Weight (the third group) is continuous.
anovan(MPG,{Decade,Location,Weight},"Continuous",3, ...
    "Varnames",["Decade","Location","Weight"]);
This output shows that when car weight is considered, there is insufficient evidence of a manufacturing location effect (p-value = 0.1044).
Use Interactive Tool
You can use the interactive aoctool to explore this result. This command opens three dialog boxes.
aoctool(Weight,MPG,Location);
In the ANOCOVA Prediction Plot dialog box, select the Separate Means model.
This output shows that when you do not include Weight in the model, there are fairly large differences in the expected miles per gallon among the three manufacturing locations. Note that here the model does not adjust for the decade of manufacturing. Now, select the Parallel Lines model.
When you include Weight in the model, the difference in expected miles per gallon among the three manufacturing locations is much smaller.
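You can check this conclusion programmatically as well. The following sketch is not part of the original example; it fits a parallel-lines model with fitlm using the MPG, Weight, and Location variables created earlier, and inspects the location coefficients, which are adjusted for weight.

% Fit a parallel-lines (ANOCOVA-style) model: one common slope for Weight
% and separate intercepts for each manufacturing location.
tblCars = table(MPG,Weight,Location);
mdl = fitlm(tblCars,'MPG ~ Weight + Location');
% The Location_* coefficients are intercept shifts relative to the
% reference category (the first category of Location).
disp(mdl.Coefficients)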
See Also
categorical | boxplot | grpstats | ttest2 | anovan | aoctool
Related Examples
• “Linear Regression with Categorical Covariates” on page 2-17
• “Grouping Variables” on page 2-11
Grouping Variables
In this section...
“What Are Grouping Variables?” on page 2-11
“Group Definition” on page 2-11
“Analysis Using Grouping Variables” on page 2-12
“Missing Group Values” on page 2-12
What Are Grouping Variables?
Grouping variables are utility variables used to group, or categorize, observations. Grouping variables are useful for summarizing or visualizing data by group. A grouping variable can be any of these data types:
• Numeric vector
• Logical vector
• Character array
• String array
• Cell array of character vectors
• Categorical vector
A grouping variable must have the same number of observations (rows) as the table, dataset array, or numeric array you are grouping. Observations that have the same grouping variable value belong to the same group. For example, the following variables comprise the same groups. Each grouping variable divides five observations into two groups. The first group contains the first and fourth observations. The other three observations are in the second group.
Data Type                          Grouping Variable
Numeric vector                     [1 2 2 1 2]
Logical vector                     [0 1 1 0 1]
String array                       ["Male","Female","Female","Male","Female"]
Cell array of character vectors    {'Male','Female','Female','Male','Female'}
Categorical vector                 Male Female Female Male Female
Use grouping variables with labels to give each group a meaningful name. A categorical vector is an efficient and flexible choice of grouping variable.
Group Definition
Typically, there are as many groups as unique values in the grouping variable. However, categorical vectors can have levels that are not represented in the data. The groups and the order of the groups depend on the data type of the grouping variable. Suppose G is a grouping variable.
• If G is a numeric or logical vector, then the groups correspond to the distinct values in G, in the sorted order of the unique values.
• If G is a character array, string array, or cell array of character vectors, then the groups correspond to the distinct elements in G, in the order of their first appearance.
• If G is a categorical vector, then the groups correspond to the unique category levels in G, in the order returned by categories.
Some functions, such as grpstats, accept multiple grouping variables specified as a cell array of grouping variables, for example, {G1,G2,G3}. In this case, the groups are defined by the unique combinations of values in the grouping variables. The order is decided first by the order of the first grouping variable, then by the order of the second grouping variable, and so on.
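As a minimal sketch of how multiple grouping variables combine (this code is not in the original text), the following commands group the carsmall data by both model year and country of origin; each group is one combination of the two variables that occurs in the data.

load carsmall
year = categorical(Model_Year);            % first grouping variable
country = categorical(cellstr(Origin));    % second grouping variable
% Mean mileage for each (year, country) combination, with group names.
[meanMPG,grpNames] = grpstats(MPG,{year,country},["mean","gname"]);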
Analysis Using Grouping Variables
This table lists common tasks you might want to perform using grouping variables.
Grouping Task                                                 Function Accepting Grouping Variable
Draw side-by-side box plots for data in different groups.    boxplot
Draw a scatter plot with markers colored by group.           gscatter
Draw a scatter plot matrix with markers colored by group.    gplotmatrix
Compute summary statistics by group.                         grpstats
Test for differences between group means.                    anovan
Create an index vector from a grouping variable.             grp2idx
Missing Group Values
Grouping variables can have missing values provided you include a valid indicator.
Grouping Variable Data Type        Missing Value Indicator
Numeric vector                     NaN
Logical vector                     (Cannot be missing)
Character array                    Row of spaces
String array                       <missing> or ""
Cell array of character vectors    ''
Categorical vector                 <undefined>
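The following minimal sketch (not from the original text) shows the effect of a missing group value with a numeric grouping variable: the observation whose group is NaN is not assigned to any group, so grouped calculations omit it.

x = [1 2 3 4 5];
g = [1 1 2 2 NaN];            % the last observation has a missing group value
groupMeans = grpstats(x,g)    % returns the means of groups 1 and 2 only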
See Also
categorical
Dummy Variables
In this section...
“What Are Dummy Variables?” on page 2-13
“Creating Dummy Variables” on page 2-14
This topic provides an introduction to dummy variables, describes how the software creates them for classification and regression problems, and shows how you can create dummy variables by using the dummyvar function.
What Are Dummy Variables?
When you perform classification and regression analysis, you often need to include both continuous (quantitative) and categorical (qualitative) predictor variables. A categorical variable must not be included as a numeric array. Numeric arrays have both order and magnitude. A categorical variable can have order (for example, an ordinal variable), but it does not have magnitude. Using a numeric array implies a known “distance” between the categories. The appropriate way to include categorical predictors is as dummy variables.
To define dummy variables, use indicator variables that have the values 0 and 1. The software chooses one of four schemes to define dummy variables based on the type of analysis, as described in the next sections. For example, suppose you have a categorical variable with three categories: Cool, Cooler, and Coolest.
Full Dummy Variables
Represent the categorical variable with three categories using three dummy variables, one variable for each category. X0 is a dummy variable that has the value 1 for Cool, and 0 otherwise. X1 is a dummy variable that has the value 1 for Cooler, and 0 otherwise. X2 is a dummy variable that has the value 1 for Coolest, and 0 otherwise.
Dummy Variables with Reference Group
Represent the categorical variable with three categories using two dummy variables with a reference group.
You can distinguish Cool, Cooler, and Coolest using only X1 and X2, without X0. Observations for Cool have 0s for both dummy variables. The category represented by all 0s is the reference group.
Dummy Variables for Ordered Categorical Variable
Assume the mathematical ordering of the categories is Cool < Cooler < Coolest. This coding scheme uses 1 and –1 values, and uses more 1s for higher categories, to indicate the ordering. X1 is a dummy variable that has the value 1 for Cooler and Coolest, and –1 for Cool. X2 is a dummy variable that has the value 1 for Coolest, and –1 otherwise.
You can indicate that a categorical variable has mathematical ordering by using the 'Ordinal' name-value pair argument of the categorical function.
Dummy Variables Created with Effects Coding
Effects coding uses 1, 0, and –1 to create dummy variables. Instead of using 0 values to represent a reference group, as in “Dummy Variables with Reference Group” on page 2-13, effects coding uses –1 to represent the last category.
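As a quick illustration of the full dummy variable scheme (a sketch that is not part of the original text), you can generate the three indicator columns for the Cool, Cooler, and Coolest categories with the dummyvar function described later in this topic.

temp = categorical({'Cool';'Coolest';'Cooler';'Cool'});
D = dummyvar(temp)
% Each row of D contains a single 1 in the column of its category. The
% columns follow the order returned by categories(temp): Cool, Cooler, Coolest.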
Creating Dummy Variables
Automatic Creation of Dummy Variables
Statistics and Machine Learning Toolbox offers several classification and regression fitting functions that accept categorical predictors. Some fitting functions create dummy variables to handle categorical predictors.
The following is the default behavior of the fitting functions in identifying categorical predictors.
• If the predictor data is in a table, the functions assume that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. The fitting functions that use decision trees assume ordered categorical vectors to be continuous variables.
• If the predictor data is a matrix, the functions assume all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the 'CategoricalPredictors' or 'CategoricalVars' name-value pair argument.
The fitting functions handle the identified categorical predictors as follows:
• fitckernel, fitclinear, fitcnet, fitcsvm, fitrgp, fitrkernel, fitrlinear, fitrnet, and fitrsvm use two different schemes to create dummy variables, depending on whether a categorical variable is unordered or ordered.
    • For an unordered categorical variable, the functions use “Full Dummy Variables” on page 2-13.
    • For an ordered categorical variable, the functions use “Dummy Variables for Ordered Categorical Variable” on page 2-14.
• Parametric regression fitting functions such as fitlm, fitglm, and fitcox use “Dummy Variables with Reference Group” on page 2-13. When the functions include the dummy variables, the estimated coefficients of the dummy variables are relative to the reference group. For an example, see “Linear Regression with Categorical Predictor” on page 35-2621.
• fitlme, fitlmematrix, and fitglme allow you to specify the scheme for creating dummy variables by using the 'DummyVarCoding' name-value pair argument. The functions support three schemes: “Full Dummy Variables” on page 2-13 ('DummyVarCoding','full'), “Dummy Variables with Reference Group” on page 2-13 ('DummyVarCoding','reference'), and “Dummy Variables Created with Effects Coding” on page 2-14 ('DummyVarCoding','effects'). Note that these functions do not offer a name-value pair argument for specifying categorical variables.
• fitrm uses “Dummy Variables Created with Effects Coding” on page 2-14.
• Other fitting functions that accept categorical predictors use algorithms that can handle categorical predictors without creating dummy variables.
Manual Creation of Dummy Variables
This example shows how to create your own dummy variable design matrix by using the dummyvar function. This function accepts grouping variables and returns a matrix containing zeros and ones, whose columns are dummy variables for the grouping variables.
Create a column vector of categorical data specifying gender.
gender = categorical({'Male';'Female';'Female';'Male';'Female'});
Create dummy variables for gender.
dv = dummyvar(gender)
dv = 5×2
     0     1
     1     0
     1     0
     0     1
     1     0
dv has five rows corresponding to the number of rows in gender and two columns for the unique groups, Female and Male. Column order corresponds to the order of the levels in gender. For categorical arrays, the default order is ascending alphabetical. You can check the order by using the categories function.
categories(gender)
ans = 2x1 cell
    {'Female'}
    {'Male'  }
To use the dummy variables in a regression model, you must either delete a column (to create a reference group) or fit a regression model with no intercept term. For the gender example, you need only one dummy variable to represent two genders. Notice what happens if you add an intercept term to the complete design matrix dv.
X = [ones(5,1) dv]
X = 5×3
     1     0     1
     1     1     0
     1     1     0
     1     0     1
     1     1     0
rank(X)
ans = 2
The design matrix with an intercept term is not of full rank and is not invertible. Because of this linear dependence, use only c – 1 indicator variables to represent a categorical variable with c categories in a regression model with an intercept term.
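For example, the following sketch (the response values are invented for illustration) deletes the first column of dv so that Female becomes the reference group, and then fits a model that includes an intercept term.

y = [3.1; 4.0; 3.8; 2.9; 4.2];    % hypothetical response values
Xref = [ones(5,1) dv(:,2)];       % intercept plus the Male indicator only
b = Xref\y                        % b(1) estimates the Female mean; b(2) is the Male effect relative to Female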
See Also
dummyvar | categorical
Related Examples
• “Linear Regression with Categorical Covariates” on page 2-17
• “Test Differences Between Category Means” on page 2-2
Linear Regression with Categorical Covariates
This example shows how to perform a regression with categorical covariates using categorical arrays and fitlm.
Load sample data.
load carsmall
The variable MPG contains measurements on the miles per gallon of 100 sample cars. The model year of each car is in the variable Model_Year, and Weight contains the weight of each car.
Plot grouped data.
Draw a scatter plot of MPG against Weight, grouped by model year.
figure()
gscatter(Weight,MPG,Model_Year,'bgr','x.o')
title('MPG vs. Weight, Grouped by Model Year')
The grouping variable, Model_Year, has three unique values, 70, 76, and 82, corresponding to model years 1970, 1976, and 1982.
Create table and categorical array.
Create a table that contains the variables MPG, Weight, and Model_Year. Convert the variable Model_Year to a categorical array.
cars = table(MPG,Weight,Model_Year);
cars.Model_Year = categorical(cars.Model_Year);
Fit a regression model.
Fit a regression model using fitlm with MPG as the dependent variable, and Weight and Model_Year as the independent variables. Because Model_Year is a categorical covariate with three levels, it should enter the model as two indicator variables.
The scatter plot suggests that the slope of MPG against Weight might differ for each model year. To assess this, include weight-year interaction terms. The proposed model is
    E(MPG) = β0 + β1·Weight + β2·I[1976] + β3·I[1982] + β4·Weight·I[1976] + β5·Weight·I[1982],
where I[1976] and I[1982] are dummy variables indicating the model years 1976 and 1982, respectively. I[1976] takes the value 1 if model year is 1976 and takes the value 0 if it is not. I[1982] takes the value 1 if model year is 1982 and takes the value 0 if it is not. In this model, 1970 is the reference year.
fit = fitlm(cars,'MPG~Weight*Model_Year')
fit =
Linear regression model:
    MPG ~ 1 + Weight*Model_Year
Estimated Coefficients:
                              Estimate          SE          tStat        pValue
    (Intercept)                 37.399        2.1466        17.423    2.8607e-30
    Weight                  -0.0058437    0.00061765       -9.4612    4.6077e-15
    Model_Year_76               4.6903        2.8538        1.6435       0.10384
    Model_Year_82               21.051         4.157        5.0641    2.2364e-06
    Weight:Model_Year_76   -0.00082009    0.00085468      -0.95953       0.33992
    Weight:Model_Year_82    -0.0050551     0.0015636       -3.2329     0.0017256
Number of observations: 94, Error degrees of freedom: 88
Root Mean Squared Error: 2.79
R-squared: 0.886, Adjusted R-Squared: 0.88
F-statistic vs. constant model: 137, p-value = 5.79e-40
The regression output shows:
• fitlm recognizes Model_Year as a categorical variable, and constructs the required indicator (dummy) variables. By default, the first level, 70, is the reference group (use reordercats to change the reference group).
• The model specification, MPG~Weight*Model_Year, specifies the first-order terms for Weight and Model_Year, and all interactions.
• The model R² = 0.886, meaning the variation in miles per gallon is reduced by 88.6% when you consider weight, model year, and their interactions.
• The fitted model is
    MPG = 37.4 − 0.006·Weight + 4.7·I[1976] + 21.1·I[1982] − 0.0008·Weight·I[1976] − 0.005·Weight·I[1982].
Thus, the estimated regression equations for the model years are as follows.
Model Year    Predicted MPG Against Weight
1970          MPG = 37.4 − 0.006·Weight
1976          MPG = (37.4 + 4.7) − (0.006 + 0.0008)·Weight
1982          MPG = (37.4 + 21.1) − (0.006 + 0.005)·Weight
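You can also assemble these equations programmatically from the fitted model. The following sketch is not part of the original example; it reads the estimates from the Coefficients table of fit, using the coefficient order shown in the output above.

b = fit.Coefficients.Estimate;    % [b0; b1; b2; b3; b4; b5], in the order shown above
intercept1982 = b(1) + b(4);      % (Intercept) plus Model_Year_82
slope1982 = b(2) + b(6);          % Weight plus Weight:Model_Year_82
% intercept1982 and slope1982 reproduce the 1982 line in the preceding table.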
The relationship between MPG and Weight has an increasingly negative slope as the model year increases.
Plot fitted regression lines.
Plot the data and fitted regression lines.
w = linspace(min(Weight),max(Weight));
figure()
gscatter(Weight,MPG,Model_Year,'bgr','x.o')
line(w,feval(fit,w,'70'),'Color','b','LineWidth',2)
line(w,feval(fit,w,'76'),'Color','g','LineWidth',2)
line(w,feval(fit,w,'82'),'Color','r','LineWidth',2)
title('Fitted Regression Lines by Model Year')
Test for different slopes.
Test for significant differences between the slopes. This is equivalent to testing the hypothesis
    H0: β4 = β5 = 0
    HA: βi ≠ 0 for at least one i.
anova(fit)
ans =
                          SumSq    DF    MeanSq       F        pValue
    Weight               2050.2     1    2050.2    263.87    3.2055e-28
    Model_Year           807.69     2    403.84    51.976    1.2494e-15
    Weight:Model_Year    81.219     2    40.609    5.2266     0.0071637
    Error                683.74    88    7.7698
This output shows that the p-value for the test is 0.0072 (from the interaction row, Weight:Model_Year), so the null hypothesis is rejected at the 0.05 significance level. The value of the test statistic is 5.2266. The numerator degrees of freedom for the test is 2, which is the number of coefficients in the null hypothesis. There is sufficient evidence that the slopes are not equal for all three model years.
See Also
fitlm | categorical | reordercats | anova
Related Examples
• “Test Differences Between Category Means” on page 2-2
• “Linear Regression” on page 11-9
• “Linear Regression Workflow” on page 11-35
• “Interpret Linear Regression Results” on page 11-52
More About
• “Grouping Variables” on page 2-11
• “Dummy Variables” on page 2-13
3 Descriptive Statistics
• “Measures of Central Tendency” on page 3-2
• “Measures of Dispersion” on page 3-4
• “Exploratory Analysis of Data” on page 3-6
• “Resampling Statistics” on page 3-10
Measures of Central Tendency
Measures of central tendency locate a distribution of data along an appropriate scale. The following table lists the functions that calculate the measures of central tendency.
Function Name    Description
geomean          Geometric mean
harmmean         Harmonic mean
mean             Arithmetic average
median           50th percentile
mode             Most frequent value
trimmean         Trimmed mean
The average is a simple and popular estimate of location. If the data sample comes from a normal distribution, then the sample mean is also optimal (the minimum variance unbiased estimator, or MVUE, of µ). Unfortunately, outliers, data entry errors, or glitches exist in almost all real data. The sample mean is sensitive to these problems. One bad data value can move the average away from the center of the rest of the data by an arbitrarily large distance.
The median and trimmed mean are two measures that are resistant (robust) to outliers. The median is the 50th percentile of the sample, which will only change slightly if you add a large perturbation to any value. The idea behind the trimmed mean is to ignore a small percentage of the highest and lowest values of a sample when determining the center of the sample.
The geometric mean and harmonic mean, like the average, are not robust to outliers. They are useful when the sample is lognormally distributed or heavily skewed.
Measures of Central Tendency
This example shows how to compute and compare measures of location for sample data that contains one outlier.
Generate sample data that contains one outlier.
x = [ones(1,6),100]
x = 1×7
     1     1     1     1     1     1   100
Compute the geometric mean, harmonic mean, mean, median, and trimmed mean for the sample data.
locate = [geomean(x) harmmean(x) mean(x) median(x) ...
    trimmean(x,25)]
locate = 1×5
    1.9307    1.1647   15.1429    1.0000    1.0000
The mean (mean) is far from any data value because of the influence of the outlier. The geometric mean (geomean) and the harmonic mean (harmmean) are influenced by the outlier, but not as significantly. The median (median) and trimmed mean (trimmean) ignore the outlier value and describe the location of the rest of the data values.
See Also
Related Examples
• “Exploratory Analysis of Data” on page 3-6
Measures of Dispersion
The purpose of measures of dispersion is to find out how spread out the data values are on the number line. Another term for these statistics is measures of spread. The table gives the function names and descriptions.
Function Name    Description
iqr              Interquartile range
mad              Mean absolute deviation
moment           Central moment of all orders
range            Range
std              Standard deviation
var              Variance
The range (the difference between the maximum and minimum values) is the simplest measure of spread. But if there is an outlier in the data, it will be the minimum or maximum value. Thus, the range is not robust to outliers. The standard deviation and the variance are popular measures of spread that are optimal for normally distributed samples. The sample variance is the minimum variance unbiased estimator (MVUE) of the normal parameter σ2. The standard deviation is the square root of the variance and has the desirable property of being in the same units as the data. That is, if the data is in meters, the standard deviation is in meters as well. The variance is in meters2, which is more difficult to interpret. Neither the standard deviation nor the variance is robust to outliers. A data value that is separate from the body of the data can increase the value of the statistics by an arbitrarily large amount. The mean absolute deviation (MAD) is also sensitive to outliers. But the MAD does not move quite as much as the standard deviation or variance in response to bad data. The interquartile range (IQR) is the difference between the 75th and 25th percentile of the data. Since only the middle 50% of the data affects this measure, it is robust to outliers.
Compare Measures of Dispersion
This example shows how to compute and compare measures of dispersion for sample data that contains one outlier.
Generate sample data that contains one outlier value.
x = [ones(1,6),100]
x = 1×7
     1     1     1     1     1     1   100
Compute the interquartile range, mean absolute deviation, range, and standard deviation of the sample data.
stats = [iqr(x),mad(x),range(x),std(x)]
stats = 1×4
         0   24.2449   99.0000   37.4185
The interquartile range (iqr) is the difference between the 75th and 25th percentile of the sample data, and is robust to outliers. The range (range) is the difference between the maximum and minimum values in the data, and is strongly influenced by the presence of an outlier. Both the mean absolute deviation (mad) and the standard deviation (std) are sensitive to outliers. However, the mean absolute deviation is less sensitive than the standard deviation.
See Also
Related Examples
• “Exploratory Analysis of Data” on page 3-6
Exploratory Analysis of Data
This example shows how to explore the distribution of data using descriptive statistics.
Generate sample data
Generate a vector containing randomly-generated sample data.
rng default % For reproducibility
x = [normrnd(4,1,1,100),normrnd(6,0.5,1,200)];
Plot a histogram
Plot a histogram of the sample data with a normal density fit. This provides a visual comparison of the sample data and a normal distribution fitted to the data.
histfit(x)
The distribution of the data appears to be left skewed. A normal distribution does not look like a good fit for this sample data.
Obtain a normal probability plot
Obtain a normal probability plot. This plot provides another way to visually compare the sample data to a normal distribution fitted to the data.
probplot('normal',x)
The probability plot also shows the deviation of data from normality.
Create a box plot
Create a box plot to visualize the statistics.
boxplot(x)
The box plot shows the 0.25, 0.5, and 0.75 quantiles. The long lower tail and plus signs show the lack of symmetry in the sample data values.
Compute descriptive statistics
Compute the mean and median of the data.
y = [mean(x),median(x)]
y = 1×2
    5.3438    5.6872
The mean and median values seem close to each other, but a mean smaller than the median usually indicates that the data is left skewed. Compute the skewness and kurtosis of the data.
y = [skewness(x),kurtosis(x)]
y = 1×2
   -1.0417    3.5895
A negative skewness value means the data is left skewed. The data has a larger peakedness than a normal distribution because the kurtosis value is greater than 3.
Compute z-scores
Identify possible outliers by computing the z-scores and finding the values that are greater than 3 or less than -3.
Z = zscore(x);
find(abs(Z)>3);
Based on the z-scores, the 3rd and 35th observations might be outliers.
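As a follow-up sketch (not in the original example), you can exclude the flagged observations and recompute the shape statistics to see how strongly the possible outliers influence them.

xClean = x(abs(Z) <= 3);                      % drop observations with |z-score| > 3
yClean = [skewness(xClean),kurtosis(xClean)]  % recompute skewness and kurtosis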
See Also
boxplot | histfit | kurtosis | mean | median | prctile | quantile | skewness
More About
• “Compare Grouped Data Using Box Plots” on page 4-4
• “Measures of Central Tendency” on page 3-2
• “Measures of Dispersion” on page 3-4
Resampling Statistics
In this section...
“Bootstrap Resampling” on page 3-10
“Jackknife Resampling” on page 3-12
“Parallel Computing Support for Resampling Methods” on page 3-13
Bootstrap Resampling
The bootstrap procedure involves choosing random samples with replacement from a data set and analyzing each sample the same way. Sampling with replacement means that each observation is selected separately at random from the original dataset. So a particular data point from the original data set could appear multiple times in a given bootstrap sample. The number of elements in each bootstrap sample equals the number of elements in the original data set. The range of sample estimates you obtain enables you to establish the uncertainty of the quantity you are estimating.
This example from Efron and Tibshirani compares Law School Admission Test (LSAT) scores and subsequent law school grade point average (GPA) for a sample of 15 law schools.
load lawdata
plot(lsat,gpa,'+')
lsline
The least-squares fit line indicates that higher LSAT scores go with higher law school GPAs. But how certain is this conclusion? The plot provides some intuition, but nothing quantitative. You can calculate the correlation coefficient of the variables using the corr function.
rhohat = corr(lsat,gpa)
rhohat = 0.7764
Now you have a number describing the positive connection between LSAT and GPA; though it may seem large, you still do not know if it is statistically significant. Using the bootstrp function you can resample the lsat and gpa vectors as many times as you like and consider the variation in the resulting correlation coefficients.
rng default % For reproducibility
rhos1000 = bootstrp(1000,'corr',lsat,gpa);
This resamples the lsat and gpa vectors 1000 times and computes the corr function on each sample. You can then plot the result in a histogram.
histogram(rhos1000,30,'FaceColor',[.8 .8 1])
Nearly all the estimates lie on the interval [0.4 1.0]. It is often desirable to construct a confidence interval for a parameter estimate in statistical inferences. Using the bootci function, you can use bootstrapping to obtain a confidence interval for the lsat and gpa data.
ci = bootci(5000,@corr,lsat,gpa)
ci = 2×1
    0.3319
    0.9427
Therefore, a 95% confidence interval for the correlation coefficient between LSAT and GPA is [0.33 0.94]. This is strong quantitative evidence that LSAT and subsequent GPA are positively correlated. Moreover, this evidence does not require any strong assumptions about the probability distribution of the correlation coefficient. Although the bootci function computes the bias corrected and accelerated (BCa) interval as the default type, it is also able to compute various other types of bootstrap confidence intervals, such as the studentized bootstrap confidence interval.
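For example, the following call (a sketch, not part of the original example) requests a simple percentile bootstrap interval instead of the default BCa interval.

% 'per' selects the percentile bootstrap confidence interval.
ciPer = bootci(5000,{@corr,lsat,gpa},'Type','per')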
Jackknife Resampling
Similar to the bootstrap is the jackknife, which uses resampling to estimate the bias of a sample statistic. Sometimes it is also used to estimate standard error of the sample statistic. The jackknife is implemented by the Statistics and Machine Learning Toolbox™ function jackknife.
The jackknife resamples systematically, rather than at random as the bootstrap does. For a sample with n points, the jackknife computes sample statistics on n separate samples of size n-1. Each sample is the original data with a single observation omitted.
In the bootstrap example, you measured the uncertainty in estimating the correlation coefficient. You can use the jackknife to estimate the bias, which is the tendency of the sample correlation to overestimate or underestimate the true, unknown correlation. First compute the sample correlation on the data.
load lawdata
rhohat = corr(lsat,gpa)
rhohat = 0.7764
Next compute the correlations for jackknife samples, and compute their mean.
rng default; % For reproducibility
jackrho = jackknife(@corr,lsat,gpa);
meanrho = mean(jackrho)
meanrho = 0.7759
Now compute an estimate of the bias.
n = length(lsat);
biasrho = (n-1) * (meanrho-rhohat)
biasrho = -0.0065
The sample correlation probably underestimates the true correlation by about this amount.
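A minimal sketch (not in the original text) of how you might use these quantities: subtract the bias estimate from the sample statistic, and compute the usual jackknife estimate of the standard error.

rhoCorrected = rhohat - biasrho;                         % bias-corrected correlation estimate
sejack = sqrt((n-1)/n * sum((jackrho - meanrho).^2))     % jackknife standard error estimate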
Parallel Computing Support for Resampling Methods
For information on computing resampling statistics in parallel, see Parallel Computing Toolbox™.
4 Statistical Visualization
• “Create Scatter Plots Using Grouped Data” on page 4-2
• “Compare Grouped Data Using Box Plots” on page 4-4
• “Distribution Plots” on page 4-7
• “Visualizing Multivariate Data” on page 4-17
Create Scatter Plots Using Grouped Data
This example shows how to create scatter plots using grouped sample data.
A scatter plot is a simple plot of one variable against another. The MATLAB® functions plot and scatter produce scatter plots. The MATLAB function plotmatrix can produce a matrix of such plots showing the relationship between several pairs of variables.
Statistics and Machine Learning Toolbox™ functions gscatter and gplotmatrix produce grouped versions of these plots. These functions are useful for determining whether the values of two variables or the relationship between those variables is the same in each group. These functions use different plotting symbols to indicate group membership. You can use gname to label points on the plots with a text label or an observation number.
Suppose you want to examine the weight and mileage of cars from three different model years.
load carsmall
gscatter(Weight,MPG,Model_Year,'bgr','xos')
This shows that not only is there a strong relationship between the weight of a car and its mileage, but also that newer cars tend to be lighter and have better gas mileage than older cars. The default arguments for gscatter produce a scatter plot with the different groups shown with the same symbol but different colors. The last two arguments above specify the colors and symbols for the three groups: 'bgr' uses blue, green, and red, and 'xos' uses cross, circle, and square markers.
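For example, a minimal sketch of interactive labeling (Model is the variable of car names loaded from carsmall):
gname(Model)   % click points in the scatter plot to label them; press Enter or Escape to stop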
The carsmall data set contains other variables that describe different aspects of cars. You can examine several of them in a single display by creating a grouped plot matrix. xvars = [Weight Displacement Horsepower]; yvars = [MPG Acceleration]; gplotmatrix(xvars,yvars,Model_Year,'bgr','xos')
The upper right subplot displays MPG against Horsepower, and shows that over the years the horsepower of the cars has decreased but the gas mileage has improved. The gplotmatrix function can also graph all pairs from a single list of variables, along with histograms for each variable. See “Perform Multivariate Analysis of Variance (MANOVA)” on page 9-52.
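For instance, a brief sketch of the single-list form with histograms on the diagonal (the variable selection here is arbitrary):
vars = [MPG Acceleration Weight Horsepower];
names = {'MPG','Acceleration','Weight','Horsepower'};
gplotmatrix(vars,[],Model_Year,'bgr','xos',[],'on','hist',names)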
See Also gscatter | gplotmatrix | gname
More About
• “Grouping Variables” on page 2-11
Compare Grouped Data Using Box Plots This example shows how to compare two groups of data by creating a notched box plot. Notches display the variability of the median between samples. The width of a notch is computed so that boxes whose notches do not overlap have different medians at the 5% significance level. The significance level is based on a normal distribution assumption, but comparisons of medians are reasonably robust for other distributions. Comparing box plot medians is like a visual hypothesis test, analogous to the t test used for means. For more information on the different features of a box plot, see “Box Plot” on page 35-300. Load the fisheriris data set. The data set contains length and width measurements from the sepals and petals of three species of iris flowers. Store the sepal width data for the setosa irises as s1, and the sepal width data for the versicolor irises as s2. load fisheriris s1 = meas(1:50,2); s2 = meas(51:100,2);
Create a notched box plot using the sample data, and label each box with the name of the iris species it represents. boxplot([s1 s2],'Notch','on', ... 'Labels',{'setosa','versicolor'})
The notches of the two boxes do not overlap, which indicates that the median sepal widths of the setosa and versicolor irises are significantly different at the 5% significance level. Neither the red median line in the setosa box nor the red median line in the versicolor box appears to be centered inside its box, which indicates that each sample is slightly skewed. Additionally, the setosa data contains one outlier value, while the versicolor data does not contain any outliers. Instead of using the boxplot function, you can use the boxchart MATLAB® function to create box plots. Recreate the previous plot by using the boxchart function rather than boxplot. speciesName = categorical(species(1:100)); sepalWidth = meas(1:100,2); b = boxchart(speciesName,sepalWidth,'Notch','on');
Each notch created by boxchart is a tapered, shaded region around the median line. The shading helps to better identify the notches. One advantage of using boxchart is that the function creates a BoxChart object, whose properties you can change easily by using dot notation. For example, you can alter the style of the whiskers by specifying the WhiskerLineStyle property of the object b. b.WhiskerLineStyle = '--';
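A couple of further property tweaks, as a small sketch (these property names are taken from the BoxChart object):
b.BoxFaceColor = [0.3 0.6 0.9];  % change the box fill color
b.MarkerStyle = '+';             % change the outlier marker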
For more information on the advantages of using boxchart, see “Alternative Functionality” on page 35-302.
See Also boxplot | iqr | median | boxchart
More About
• “Exploratory Analysis of Data” on page 3-6
• “Measures of Central Tendency” on page 3-2
• “Measures of Dispersion” on page 3-4
• “Distribution Plots” on page 4-7
Distribution Plots In this section... “Normal Probability Plots” on page 4-7 “Probability Plots” on page 4-9 “Quantile-Quantile Plots” on page 4-11 “Cumulative Distribution Plots” on page 4-13 Distribution plots visually assess the distribution of sample data by comparing the empirical distribution of the data with the theoretical values expected from a specified distribution. Use distribution plots in addition to more formal hypothesis tests to determine whether the sample data comes from a specified distribution. To learn about hypothesis tests, see “Hypothesis Testing” on page 8-5. Statistics and Machine Learning Toolbox offers several distribution plot options: • “Normal Probability Plots” on page 4-7 — Use normplot to assess whether sample data comes from a normal distribution. Use probplot to create “Probability Plots” on page 4-9 for distributions other than normal, or to explore the distribution of censored data. Use plot to plot a probability plot for a probability distribution object. • “Quantile-Quantile Plots” on page 4-11 — Use qqplot to assess whether two sets of sample data come from the same distribution family. This plot is robust with respect to differences in location and scale. • “Cumulative Distribution Plots” on page 4-13 — Use cdfplot or ecdf to display the empirical cumulative distribution function (cdf) of the sample data for visual comparison to the theoretical cdf of a specified distribution. Use plot to plot a cumulative distribution function for a probability distribution object.
Normal Probability Plots Use normal probability plots to assess whether data comes from a normal distribution. Many statistical procedures make the assumption that an underlying distribution is normal. Normal probability plots can provide some assurance to justify this assumption or provide a warning of problems with the assumption. An analysis of normality typically combines normal probability plots with hypothesis tests for normality. This example generates a data sample of 25 random numbers from a normal distribution with mean 10 and standard deviation 1, and creates a normal probability plot of the data. rng('default'); % For reproducibility x = normrnd(10,1,[25,1]); normplot(x)
The plus signs plot the empirical probability versus the data value for each point in the data. A solid line connects the 25th and 75th percentiles in the data, and a dashed line extends it to the ends of the data. The y-axis values are probabilities from zero to one, but the scale is not linear. The distance between tick marks on the y-axis matches the distance between the quantiles of a normal distribution. The quantiles are close together near the median (50th percentile) and stretch out symmetrically as you move away from the median. In a normal probability plot, if all the data points fall near the line, an assumption of normality is reasonable. Otherwise, an assumption of normality is not justified. For example, the following generates a data sample of 100 random numbers from an exponential distribution with mean 10, and creates a normal probability plot of the data. x = exprnd(10,100,1); normplot(x)
The plot is strong evidence that the underlying distribution is not normal.
Probability Plots A probability plot, like the normal probability plot, is just an empirical cdf plot scaled to a particular distribution. The y-axis values are probabilities from zero to one, but the scale is not linear. The distance between tick marks is the distance between quantiles of the distribution. In the plot, a line is drawn between the first and third quartiles in the data. If the data falls near the line, it is reasonable to choose the distribution as a model for the data. A distribution analysis typically combines probability plots with hypothesis tests for a particular distribution. Create Weibull Probability Plot Generate sample data and create a probability plot. Generate sample data. The sample x1 contains 500 random numbers from a Weibull distribution with scale parameter A = 3 and shape parameter B = 3. The sample x2 contains 500 random numbers from a Rayleigh distribution with scale parameter B = 3. rng('default'); % For reproducibility x1 = wblrnd(3,3,[500,1]); x2 = raylrnd(3,[500,1]);
Create a probability plot to assess whether the data in x1 and x2 comes from a Weibull distribution.
figure probplot('weibull',[x1 x2]) legend('Weibull Sample','Rayleigh Sample','Location','best')
The probability plot shows that the data in x1 comes from a Weibull distribution, while the data in x2 does not. Alternatively, you can use wblplot to create a Weibull probability plot. Create Gamma Probability Plot Generate random data from a gamma distribution with shape parameter 9 and scale parameter 2. rng("default") %set the random seed for reproducibility gammadata = gamrnd(9,2,100,1);
Fit gamma and logistic distributions to the data and store the results in GammaDistribution and LogisticDistribution objects. gammapd = fitdist(gammadata,"Gamma"); logisticpd = fitdist(gammadata,"Logistic");
Compare the distributions fit to the data with probability plots. tiledlayout(1,2) nexttile
plot(logisticpd,'PlotType',"probability") title("Logistic Distribution") nexttile plot(gammapd,'PlotType',"probability") title("Gamma Distribution")
The probability plots show that the gamma distribution is the better fit to the data.
Quantile-Quantile Plots Use quantile-quantile (q-q) plots to determine whether two samples come from the same distribution family. Q-Q plots are scatter plots of quantiles computed from each sample, with a line drawn between the first and third quartiles. If the data falls near the line, it is reasonable to assume that the two samples come from the same distribution. The method is robust with respect to changes in the location and scale of either distribution. Create a quantile-quantile plot by using the qqplot function. The following example generates two data samples containing random numbers from Poisson distributions with different parameter values, and creates a quantile-quantile plot. The data in x is from a Poisson distribution with mean 10, and the data in y is from a Poisson distribution with mean 5.
x = poissrnd(10,[50,1]); y = poissrnd(5,[100,1]); qqplot(x,y)
Even though the parameters and sample sizes are different, the approximate linear relationship suggests that the two samples may come from the same distribution family. As with normal probability plots, hypothesis tests can provide additional justification for such an assumption. For statistical procedures that depend on the two samples coming from the same distribution, however, a linear quantile-quantile plot is often sufficient. The following example shows what happens when the underlying distributions are not the same. Here, x contains 100 random numbers generated from a normal distribution with mean 5 and standard deviation 1, while y contains 100 random numbers generated from a Weibull distribution with a scale parameter of 2 and a shape parameter of 0.5. x = normrnd(5,1,[100,1]); y = wblrnd(2,0.5,[100,1]); qqplot(x,y)
The plots indicate that these samples clearly are not from the same distribution family.
Cumulative Distribution Plots An empirical cumulative distribution function (cdf) plot shows the proportion of data less than or equal to each x value, as a function of x. The scale on the y-axis is linear; in particular, it is not scaled to any particular distribution. Empirical cdf plots are used to compare data cdfs to cdfs for particular distributions. To create an empirical cdf plot, use the cdfplot function or the ecdf function. Compare Empirical cdf to Theoretical cdf Plot the empirical cdf of a sample data set and compare it to the theoretical cdf of the underlying distribution of the sample data set. In practice, a theoretical cdf can be unknown. Generate a random sample data set from the extreme value distribution with a location parameter of 0 and a scale parameter of 3. rng('default') % For reproducibility y = evrnd(0,3,100,1);
Plot the empirical cdf of the sample data set and the theoretical cdf on the same figure.
cdfplot(y) hold on x = linspace(min(y),max(y)); plot(x,evcdf(x,0,3)) legend('Empirical CDF','Theoretical CDF','Location','best') hold off
The plot shows the similarity between the empirical cdf and the theoretical cdf. Alternatively, you can use the ecdf function. The ecdf function also plots the 95% confidence intervals estimated by using Greenwood's Formula. For details, see “Algorithms” on page 35-1739.
ecdf(y,'Bounds','on') hold on plot(x,evcdf(x,0,3)) grid on title('Empirical CDF') legend('Empirical CDF','Lower Confidence Bound','Upper Confidence Bound','Theoretical CDF','Location','best') hold off
Plot Binomial Distribution cdf Create a binomial distribution with 10 trials and a 0.5 probability of success for each trial. binomialpd = makedist("Binomial",10,0.5) binomialpd = BinomialDistribution Binomial distribution N = 10 p = 0.5
Plot a cdf for the binomial distribution. plot(binomialpd,'PlotType',"cdf")
See Also normplot | qqplot | cdfplot | ecdf | probplot | wblplot
More About
• “Compare Grouped Data Using Box Plots” on page 4-4
• “Hypothesis Testing” on page 8-5
Visualizing Multivariate Data This example shows how to visualize multivariate data using various statistical plots. Many statistical analyses involve only two variables: a predictor variable and a response variable. Such data are easy to visualize using 2D scatter plots, bivariate histograms, boxplots, etc. It's also possible to visualize trivariate data with 3D scatter plots, or 2D scatter plots with a third variable encoded with, for example, color. However, many datasets involve a larger number of variables, making direct visualization more difficult. This example explores some of the ways to visualize high-dimensional data in MATLAB®, using Statistics and Machine Learning Toolbox™. In this example, we'll use the carbig dataset, a dataset that contains various measured variables for about 400 automobiles from the 1970s and 1980s. We'll illustrate multivariate visualization using the values for fuel efficiency (in miles per gallon, MPG), acceleration (time from 0 to 60 mph in seconds), engine displacement (in cubic inches), weight, and horsepower. We'll use the number of cylinders to group observations. load carbig X = [MPG,Acceleration,Displacement,Weight,Horsepower]; varNames = {'MPG'; 'Acceleration'; 'Displacement'; 'Weight'; 'Horsepower'};
Scatter Plot Matrices Viewing slices through lower dimensional subspaces is one way to partially work around the limitation of two or three dimensions. For example, we can use the gplotmatrix function to display an array of all the bivariate scatter plots between our five variables, along with a univariate histogram for each variable. figure gplotmatrix(X,[],Cylinders,['c' 'b' 'm' 'g' 'r'],[],[],false); text([.08 .24 .43 .66 .83], repmat(-.1,1,5), varNames, 'FontSize',8); text(repmat(-.12,1,5), [.86 .62 .41 .25 .02], varNames, 'FontSize',8, 'Rotation',90);
The points in each scatter plot are color-coded by the number of cylinders: blue for 4 cylinders, green for 6, and red for 8. There is also a handful of 5 cylinder cars, and rotary-engined cars are listed as having 3 cylinders. This array of plots makes it easy to pick out patterns in the relationships between pairs of variables. However, there may be important patterns in higher dimensions, and those are not easy to recognize in this plot. Parallel Coordinates Plots The scatter plot matrix only displays bivariate relationships. However, there are other alternatives that display all the variables together, allowing you to investigate higher-dimensional relationships among variables. The most straight-forward multivariate plot is the parallel coordinates plot. In this plot, the coordinate axes are all laid out horizontally, instead of using orthogonal axes as in the usual Cartesian graph. Each observation is represented in the plot as a series of connected line segments. For example, we can make a plot of all the cars with 4, 6, or 8 cylinders, and color observations by group. Cyl468 = ismember(Cylinders,[4 6 8]); parallelcoords(X(Cyl468,:), 'group',Cylinders(Cyl468), ... 'standardize','on', 'labels',varNames)
The horizontal direction in this plot represents the coordinate axes, and the vertical direction represents the data. Each observation consists of measurements on five variables, and each measurement is represented as the height at which the corresponding line crosses each coordinate axis. Because the five variables have widely different ranges, this plot was made with standardized values, where each variable has been standardized to have zero mean and unit variance. With the color coding, the graph shows, for example, that 8 cylinder cars typically have low values for MPG and acceleration, and high values for displacement, weight, and horsepower. Even with color coding by group, a parallel coordinates plot with a large number of observations can be difficult to read. We can also make a parallel coordinates plot where only the median and quartiles (25% and 75% points) for each group are shown. This makes the typical differences and similarities among groups easier to distinguish. On the other hand, it may be the outliers for each group that are most interesting, and this plot does not show them at all. parallelcoords(X(Cyl468,:), 'group',Cylinders(Cyl468), ... 'standardize','on', 'labels',varNames, 'quantile',.25)
Andrews Plots Another similar type of multivariate visualization is the Andrews plot. This plot represents each observation as a smooth function over the interval [0,1]. andrewsplot(X(Cyl468,:), 'group',Cylinders(Cyl468), 'standardize','on')
Each function is a Fourier series, with coefficients equal to the corresponding observation's values. In this example, the series has five terms: a constant, two sine terms with periods 1 and 1/2, and two similar cosine terms. Effects on the functions' shapes due to the three leading terms are the most apparent in an Andrews plot, so patterns in the first three variables tend to be the ones most easily recognized. There's a distinct difference between groups at t = 0, indicating that the first variable, MPG, is one of the distinguishing features between 4, 6, and 8 cylinder cars. More interesting is the difference between the three groups at around t = 1/3. Plugging this value into the formula for the Andrews plot functions, we get a set of coefficients that define a linear combination of the variables that distinguishes between groups. t1 = 1/3; [1/sqrt(2) sin(2*pi*t1) cos(2*pi*t1) sin(4*pi*t1) cos(4*pi*t1)]
ans = 0.7071    0.8660   -0.5000   -0.8660   -0.5000
From these coefficients, we can see that one way to distinguish 4 cylinder cars from 8 cylinder cars is that the former have higher values of MPG and acceleration, and lower values of displacement, horsepower, and particularly weight, while the latter have the opposite. That's the same conclusion we drew from the parallel coordinates plot.
Glyph Plots Another way to visualize multivariate data is to use "glyphs" to represent the dimensions. The function glyphplot supports two types of glyphs: stars, and Chernoff faces. For example, here is a star plot of the first 9 models in the car data. Each spoke in a star represents one variable, and the spoke length is proportional to the value of that variable for that observation. h = glyphplot(X(1:9,:), 'glyph','star', 'varLabels',varNames, 'obslabels',Model(1:9,:)); set(h(:,3),'FontSize',8);
In a live MATLAB figure window, this plot would allow interactive exploration of the data values, using data cursors. For example, clicking on the right-hand point of the star for the Ford Torino would show that it has an MPG value of 17. Glyph Plots and Multidimensional Scaling Plotting stars on a grid, with no particular order, can lead to a figure that is confusing, because adjacent stars can end up quite different-looking. Thus, there may be no smooth pattern for the eye to catch. It's often useful to combine multidimensional scaling (MDS) with a glyph plot. To illustrate, we'll first select all cars from 1977, and use the zscore function to standardize each of the five variables to have zero mean and unit variance. Then we'll compute the Euclidean distances among those standardized observations as a measure of dissimilarity. This choice might be too simplistic in a real application, but serves here for purposes of illustration. models77 = find((Model_Year==77)); dissimilarity = pdist(zscore(X(models77,:)));
Finally, we use mdscale to create a set of locations in two dimensions whose interpoint distances approximate the dissimilarities among the original high-dimensional data, and plot the glyphs using those locations. The distances in this 2D plot may only roughly reproduce the data, but for this type of plot, that's good enough. Y = mdscale(dissimilarity,2); glyphplot(X(models77,:), 'glyph','star', 'centers',Y, ... 'varLabels',varNames, 'obslabels',Model(models77,:), 'radius',.5); title('1977 Model Year');
In this plot, we've used MDS as a dimension reduction method to create a 2D plot. Normally that would mean a loss of information, but by plotting the glyphs, we have incorporated all of the high-dimensional information in the data. The purpose of using MDS is to impose some regularity on the variation in the data, so that patterns among the glyphs are easier to see. Just as with the previous plot, interactive exploration would be possible in a live figure window. Another type of glyph is the Chernoff face. This glyph encodes the data values for each observation into facial features, such as the size of the face, the shape of the face, position of the eyes, etc. glyphplot(X(models77,:), 'glyph','face', 'centers',Y, ... 'varLabels',varNames, 'obslabels',Model(models77,:)); title('1977 Model Year');
Here, the two most apparent features, face size and relative forehead/jaw size, encode MPG and acceleration, while the forehead and jaw shape encode displacement and weight. Width between eyes encodes horsepower. It's notable that there are few faces with wide foreheads and narrow jaws, or vice-versa, indicating positive linear correlation between the variables displacement and weight. That's also what we saw in the scatter plot matrix. The correspondence of features to variables determines what relationships are easiest to see, and glyphplot allows the choice to be changed easily. close
5 Probability Distributions
• “Working with Probability Distributions” on page 5-3
• “Supported Distributions” on page 5-16
• “Maximum Likelihood Estimation” on page 5-23
• “Negative Loglikelihood Functions” on page 5-25
• “Random Number Generation” on page 5-28
• “Nonparametric and Empirical Probability Distributions” on page 5-31
• “Fit Kernel Distribution Object to Data” on page 5-37
• “Fit Kernel Distribution Using ksdensity” on page 5-40
• “Fit Distributions to Grouped Data Using ksdensity” on page 5-42
• “Fit a Nonparametric Distribution with Pareto Tails” on page 5-44
• “Generate Random Numbers Using the Triangular Distribution” on page 5-48
• “Model Data Using the Distribution Fitter App” on page 5-52
• “Fit a Distribution Using the Distribution Fitter App” on page 5-72
• “Define Custom Distributions Using the Distribution Fitter App” on page 5-82
• “Explore the Random Number Generation UI” on page 5-86
• “Compare Multiple Distribution Fits” on page 5-88
• “Fit Probability Distribution Objects to Grouped Data” on page 5-93
• “Three-Parameter Weibull Distribution” on page 5-96
• “Multinomial Probability Distribution Objects” on page 5-103
• “Multinomial Probability Distribution Functions” on page 5-106
• “Generate Random Numbers Using Uniform Distribution Inversion” on page 5-109
• “Represent Cauchy Distribution Using t Location-Scale” on page 5-112
• “Generate Cauchy Random Numbers Using Student's t” on page 5-115
• “Generate Correlated Data Using Rank Correlation” on page 5-116
• “Create Gaussian Mixture Model” on page 5-120
• “Fit Gaussian Mixture Model to Data” on page 5-123
• “Simulate Data from Gaussian Mixture Model” on page 5-127
• “Copulas: Generate Correlated Samples” on page 5-129
• “Simulating Dependent Random Variables Using Copulas” on page 5-155
• “Fit Custom Distributions” on page 5-173
• “Avoid Numerical Issues When Fitting Custom Distributions” on page 5-186
• “Nonparametric Estimates of Cumulative Distribution Functions and Their Inverses” on page 5-192
• “Modelling Tail Data with the Generalized Pareto Distribution” on page 5-207
• “Modelling Data with the Generalized Extreme Value Distribution” on page 5-215
• “Curve Fitting and Distribution Fitting” on page 5-226
• “Fitting a Univariate Distribution Using Cumulative Probabilities” on page 5-234
Working with Probability Distributions In this section... “Probability Distribution Objects” on page 5-3 “Apps and Interactive User Interfaces” on page 5-6 “Distribution-Specific Functions and Generic Distribution Functions” on page 5-10 Probability distributions are theoretical distributions based on assumptions about a source population. The distributions assign probability to the event that a random variable has a specific, discrete value, or falls within a specified range of continuous values. Statistics and Machine Learning Toolbox offers several ways to work with probability distributions. • “Probability Distribution Objects” on page 5-3 — Create a probability distribution object by fitting a probability distribution to sample data or by specifying parameter values. Then, use object functions to evaluate the distribution, generate random numbers, and so on. • “Apps and Interactive User Interfaces” on page 5-6 — Interactively fit and explore probability distributions by using the Distribution Fitter app, Probability Distribution Function user interface, and random number generation tool (randtool) • “Distribution-Specific Functions and Generic Distribution Functions” on page 5-10 — These functions are useful for generating random numbers, computing summary statistics inside a loop or script, and passing a cdf or pdf as a function handle to another function. You can also use these functions to perform computations on arrays of parameter values rather than a single set of parameters. • Use distribution-specific functions, such as normpdf and normcdf, with specified distribution parameters. • Use generic distribution functions (cdf, icdf, pdf, and random) with a specified distribution name and parameters. For a list of distributions supported by Statistics and Machine Learning Toolbox, see “Supported Distributions” on page 5-16.
Probability Distribution Objects Probability distribution objects allow you to fit a probability distribution to sample data, or define a distribution by specifying parameter values. You can then perform a variety of analyses on the distribution object. Create Probability Distribution Objects Estimate probability distribution parameters from sample data by fitting a probability distribution object to the data using fitdist. You can fit a single specified parametric or nonparametric distribution to the sample data. You can also fit multiple distributions of the same type to the sample data based on grouping variables. For most distributions, fitdist uses maximum likelihood estimation (MLE) to estimate the distribution parameters from the sample data. For more information and additional syntax options, see fitdist. Alternatively, you can create a probability distribution object with specified parameter values using makedist.
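For example, a minimal sketch of the makedist route (the parameter values here are arbitrary):
pd = makedist('Normal','mu',75,'sigma',10);  % define the distribution directly
r = random(pd,5,1);                          % draw five random values from it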
Work with Probability Distribution Objects Once you create a probability distribution object, you can use object functions to: • Compute confidence intervals for the distribution parameters (paramci). • Compute summary statistics, including mean (mean), median (median), interquartile range (iqr), variance (var), and standard deviation (std). • Evaluate the probability density function (pdf). • Evaluate the cumulative distribution function (cdf) or the inverse cumulative distribution function (icdf). • Compute the negative loglikelihood (negloglik) and profile likelihood function (proflik) for the distribution. • Generate random numbers from the distribution (random). • Truncate the distribution to specified lower and upper limits (truncate). • Plot the probability density function, cumulative distribution, or probability plot (plot) Save a Probability Distribution Object To save your probability distribution object to a .MAT file: • In the toolbar, click Save Workspace. This option saves all of the variables in your workspace, including any probability distribution objects. • In the workspace browser, right-click the probability distribution object and select Save as. This option saves only the selected probability distribution object, not the other variables in your workspace. Alternatively, you can save a probability distribution object directly from the command line by using the save function. save enables you to choose a file name and specify the probability distribution object you want to save. If you do not specify an object (or other variable), MATLAB® saves all of the variables in your workspace, including any probability distribution objects, to the specified file name. For more information and additional syntax options, see save. Analyze Distribution Using Probability Distribution Objects This example shows how to use probability distribution objects to perform a multistep analysis on a fitted distribution. The analysis illustrates how to: • Fit a probability distribution to sample data that contains exam grades of 120 students by using fitdist. • Compute the mean of the exam grades by using mean. • Plot a histogram of the exam grade data, overlaid with a plot of the pdf of the fitted distribution, by using plot and pdf. • Compute the boundary for the top 10 percent of student grades by using icdf. • Save the fitted probability distribution object by using save. Load the sample data. load examgrades
The sample data contains a 120-by-5 matrix of exam grades. The exams are scored on a scale of 0 to 100. Create a vector containing the first column of exam grade data. x = grades(:,1);
Fit a normal distribution to the sample data by using fitdist to create a probability distribution object. pd = fitdist(x,'Normal') pd = NormalDistribution Normal distribution mu = 75.0083 [73.4321, 76.5846] sigma = 8.7202 [7.7391, 9.9884]
fitdist returns a probability distribution object, pd, of the type NormalDistribution. This object contains the estimated parameter values, mu and sigma, for the fitted normal distribution. The intervals next to the parameter estimates are the 95% confidence intervals for the distribution parameters. Compute the mean of the students' exam grades using the fitted distribution object, pd. m = mean(pd) m = 75.0083
The mean of the exam grades is equal to the mu parameter estimated by fitdist. Plot a histogram of the exam grades. Overlay a plot of the fitted pdf to visually compare the fitted normal distribution with the actual exam grades. x_pdf = [1:0.1:100]; y = pdf(pd,x_pdf); figure histogram(x,'Normalization','pdf') line(x_pdf,y)
The pdf of the fitted distribution follows the same shape as the histogram of the exam grades. Determine the boundary for the upper 10 percent of student exam grades by using the inverse cumulative distribution function (icdf). This boundary is equivalent to the value at which the cdf of the probability distribution is equal to 0.9. In other words, 90 percent of the exam grades are less than or equal to the boundary value. A = icdf(pd,0.9) A = 86.1837
Based on the fitted distribution, 10 percent of students received an exam grade greater than 86.1837. Equivalently, 90 percent of students received an exam grade less than or equal to 86.1837. Save the fitted probability distribution, pd, as a file named myobject.mat. save('myobject.mat','pd')
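As a brief follow-up sketch, you can reload the saved object in a later session and reuse it:
S = load('myobject.mat','pd');   % S.pd is the saved NormalDistribution object
icdf(S.pd,0.9)                   % returns the same boundary, 86.1837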
Apps and Interactive User Interfaces Apps and user interfaces provide an interactive approach to working with parametric and nonparametric probability distributions. • Use the Distribution Fitter app to interactively fit a distribution to sample data, and export a probability distribution object to the workspace.
• Use the Probability Distribution Function user interface to visually explore the effect on the pdf and cdf of changing the distribution parameter values. • Use the Random Number Generation user interface (randtool) to interactively generate random numbers from a probability distribution with specified parameter values and export them to the workspace. Distribution Fitter App The Distribution Fitter app allows you to interactively fit a probability distribution to your data. You can display different types of plots, compute confidence bounds, and evaluate the fit of the data. You can also exclude data from the fit. You can save the data, and export the fit to your workspace as a probability distribution object to perform further analysis. Load the Distribution Fitter app from the Apps tab, or by entering distributionFitter in the command window. For more information, see “Model Data Using the Distribution Fitter App” on page 5-52.
Probability Distribution Function Tool The Probability Distribution Function user interface visually explores probability distributions. You can load the Probability Distribution Function user interface by entering disttool in the command window.
Random Number Generation Tool The Random Number Generation user interface generates random data from a specified distribution and exports the results to your workspace. You can use this tool to explore the effects of changing parameters and sample size on the distributions. The Random Number Generation user interface allows you to set parameter values for the distribution and change their lower and upper bounds; draw another sample from the same distribution, using the same size and parameters; and export the current random sample to your workspace for use in further analysis. A dialog box enables you to provide a name for the sample.
Distribution-Specific Functions and Generic Distribution Functions Using distribution-specific functions and generic distribution functions is useful for generating random numbers, computing summary statistics inside a loop or script, and passing a cdf or pdf as a function handle to another function. You can also use these functions to perform computations on arrays of parameter values rather than a single set of parameters.
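For instance, the two approaches compute the same quantity; a minimal sketch (the parameter values are arbitrary):
p1 = normcdf(80,75,8.7)        % distribution-specific function
p2 = cdf('Normal',80,75,8.7)   % generic function with a distribution name
The two calls return the same probability.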
• Distribution-specific functions — Some of the supported distributions have distribution-specific functions. These functions use the following abbreviations, as in normpdf, normcdf, norminv, normstat, normfit, normlike, and normrnd: • pdf — Probability density functions • cdf — Cumulative distribution functions • inv — Inverse cumulative distribution functions • stat — Distribution statistics functions • fit — Distribution Fitter functions • like — Negative loglikelihood functions • rnd — Random number generators • Generic distribution functions — Use cdf, icdf, mle, pdf, and random with a specified distribution name and parameters. • cdf — Cumulative distribution function • icdf — Inverse cumulative distribution function • mle — Distribution fitting function • pdf — Probability density function • random — Random number generating function Analyze Distribution Using Distribution-Specific Functions This example shows how to use distribution-specific functions to perform a multistep analysis on a fitted distribution. The analysis illustrates how to: • Fit a probability distribution to sample data that contains exam grades of 120 students by using normfit. • Plot a histogram of the exam grade data, overlaid with a plot of the pdf of the fitted distribution, by using plot and normpdf. • Compute the boundary for the top 10 percent of student grades by using norminv. • Save the estimated distribution parameters by using save. You can perform the same analysis using a probability distribution object. See “Analyze Distribution Using Probability Distribution Objects” on page 5-4. Load the sample data. load examgrades
The sample data contains a 120-by-5 matrix of exam grades. The exams are scored on a scale of 0 to 100. Create a vector containing the first column of exam grade data. x = grades(:,1);
Fit a normal distribution to the sample data by using normfit.
[mu,sigma,muCI,sigmaCI] = normfit(x) mu = 75.0083 sigma = 8.7202 muCI = 2×1 73.4321 76.5846 sigmaCI = 2×1 7.7391 9.9884
The normfit function returns the estimates of normal distribution parameters and the 95% confidence intervals for the parameter estimates. Plot a histogram of the exam grades. Overlay a plot of the fitted pdf to visually compare the fitted normal distribution with the actual exam grades. x_pdf = [1:0.1:100]; y = normpdf(x_pdf,mu,sigma); figure histogram(x,'Normalization','pdf') line(x_pdf,y)
The pdf of the fitted distribution follows the same shape as the histogram of the exam grades. Determine the boundary for the upper 10 percent of student exam grades by using the normal inverse cumulative distribution function. This boundary is equivalent to the value at which the cdf of the probability distribution is equal to 0.9. In other words, 90 percent of the exam grades are less than or equal to the boundary value. A = norminv(0.9,mu,sigma) A = 86.1837
Based on the fitted distribution, 10 percent of students received an exam grade greater than 86.1837. Equivalently, 90 percent of students received an exam grade less than or equal to 86.1837. Save the estimated distribution parameters as a file named myparameter.mat. save('myparameter.mat','mu','sigma')
Use Probability Distribution Functions as Function Handle This example shows how to use the probability distribution function normcdf as a function handle in the chi-square goodness of fit test (chi2gof). This example tests the null hypothesis that the sample data contained in the input vector, x, comes from a normal distribution with parameters µ and σ equal to the mean (mean) and standard deviation (std) of the sample data, respectively.
rng('default') % For reproducibility x = normrnd(50,5,100,1); h = chi2gof(x,'cdf',{@normcdf,mean(x),std(x)}) h = 0
The returned result h = 0 indicates that chi2gof does not reject the null hypothesis at the default 5% significance level. This next example illustrates how to use probability distribution functions as a function handle in the slice sampler (slicesample). The example uses normpdf to generate a random sample of 2,000 values from a standard normal distribution, and plots a histogram of the resulting values. rng('default') % For reproducibility x = slicesample(1,2000,'pdf',@normpdf,'thin',5,'burnin',1000); histogram(x)
The histogram shows that, when using normpdf, the resulting random sample has a standard normal distribution. If you pass the probability distribution function for the exponential distribution pdf (exppdf) as a function handle instead of normpdf, then slicesample generates the 2,000 random samples from an exponential distribution with a default parameter value of µ equal to 1. rng('default') % For reproducibility x = slicesample(1,2000,'pdf',@exppdf,'thin',5,'burnin',1000); histogram(x)
The histogram shows that the resulting random sample when using exppdf has an exponential distribution.
See Also fitdist | makedist | randtool | Distribution Fitter | Probability Distribution Function
More About
• “Multinomial Probability Distribution Objects” on page 5-103
• “Multinomial Probability Distribution Functions” on page 5-106
• “Fit Kernel Distribution Object to Data” on page 5-37
• “Generate Random Numbers Using the Triangular Distribution” on page 5-48
• “Supported Distributions” on page 5-16
Supported Distributions In this section... “Continuous Distributions (Data)” on page 5-16 “Continuous Distributions (Statistics)” on page 5-19 “Discrete Distributions” on page 5-20 “Multivariate Distributions” on page 5-21 “Nonparametric Distributions” on page 5-22 “Flexible Distribution Families” on page 5-22 Statistics and Machine Learning Toolbox supports various probability distributions, including parametric, nonparametric, continuous, and discrete distributions. The following tables list the supported probability distributions and supported ways to work with each distribution. For more information, see “Working with Probability Distributions” on page 5-3. For a custom probability distribution, use a custom distribution template to create a probability object and then use the Distribution Fitter app or probability object functions. For details, see “Define Custom Distributions Using the Distribution Fitter App” on page 5-82. You can also define a custom distribution using a function handle and use the mle function to find maximum likelihood estimates. For an example, see “Fit Custom Distributions” on page 5-173.
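As a minimal sketch of the function-handle route (the pdf, data, and starting value below are illustrative only):
custpdf = @(x,lambda) lambda*exp(-lambda*x);   % a simple custom pdf
data = exprnd(2,100,1);                        % illustrative sample
phat = mle(data,'pdf',custpdf,'Start',1,'LowerBound',0)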
Continuous Distributions (Data)
Distribution | Distribution Object | Apps and Interactive UIs | Distribution-Specific Functions | Generic Functions
Beta on page B-6 | BetaDistribution | Distribution Fitter, Probability Distribution Function, randtool | betapdf, betacdf, betainv, betastat, betafit, betalike, betarnd | pdf, cdf, icdf, random, mle
Birnbaum-Saunders on page B-18 | BirnbaumSaundersDistribution | Distribution Fitter | — | pdf, cdf, icdf, random, mle
Burr Type XII on page B-19 | BurrDistribution | Distribution Fitter, Probability Distribution Function, randtool | — | pdf, cdf, icdf, random, mle
Exponential on page B-34 | ExponentialDistribution | Distribution Fitter, Probability Distribution Function, randtool | exppdf, expcdf, expinv, expstat, expfit, explike, exprnd | pdf, cdf, icdf, random, mle
Extreme value on page B-41 | ExtremeValueDistribution | Distribution Fitter, Probability Distribution Function, randtool | evpdf, evcdf, evinv, evstat, evfit, evlike, evrnd | pdf, cdf, icdf, random, mle
Gamma on page B-48 | GammaDistribution | Distribution Fitter, Probability Distribution Function, randtool | gampdf, gamcdf, gaminv, gamstat, gamfit, gamlike, gamrnd, randg | pdf, cdf, icdf, random, mle
Generalized extreme value on page B-56 | GeneralizedExtremeValueDistribution | Distribution Fitter, Probability Distribution Function, randtool | gevpdf, gevcdf, gevinv, gevstat, gevfit, gevlike, gevrnd | pdf, cdf, icdf, random, mle
Generalized Pareto on page B-60 | GeneralizedParetoDistribution | Distribution Fitter, Probability Distribution Function, randtool | gppdf, gpcdf, gpinv, gpstat, gpfit, gplike, gprnd | pdf, cdf, icdf, random, mle
Half-Normal on page B-69 | HalfNormalDistribution | Distribution Fitter, Probability Distribution Function, randtool | — | pdf, cdf, icdf, random, mle
Inverse Gaussian on page B-76 | InverseGaussianDistribution | Distribution Fitter | — | pdf, cdf, icdf, random, mle
Logistic on page B-86 | LogisticDistribution | Distribution Fitter | — | pdf, cdf, icdf, random, mle
Loglogistic on page B-87 | LoglogisticDistribution | Distribution Fitter | — | pdf, cdf, icdf, random, mle
Lognormal on page B-89 | LognormalDistribution | Distribution Fitter, Probability Distribution Function, randtool | lognpdf, logncdf, logninv, lognstat, lognfit, lognlike, lognrnd | pdf, cdf, icdf, random, mle
Loguniform on page B-97 | LoguniformDistribution | — | — | pdf, cdf, icdf, random
Nakagami on page B-114 | NakagamiDistribution | Distribution Fitter | — | pdf, cdf, icdf, random, mle
Normal (Gaussian) on page B-125 | NormalDistribution | Distribution Fitter, Probability Distribution Function, randtool | normpdf, normcdf, norminv, normstat, normfit, normlike, normrnd | pdf, cdf, icdf, random, mle
Piecewise linear on page B-145 | PiecewiseLinearDistribution | — | — | —
Rayleigh on page B-152 | RayleighDistribution | Distribution Fitter, Probability Distribution Function, randtool | raylpdf, raylcdf, raylinv, raylstat, raylfit, raylrnd | pdf, cdf, icdf, random, mle
Rician on page B-154 | RicianDistribution | Distribution Fitter | — | pdf, cdf, icdf, random, mle
Stable on page B-156 | StableDistribution | Distribution Fitter | — | pdf, cdf, icdf, random, mle
Triangular on page B-174 | TriangularDistribution | — | — | —
Uniform (continuous) on page B-179 | UniformDistribution | Probability Distribution Function, randtool | unifpdf, unifcdf, unifinv, unifstat, unifit, unifrnd | pdf, cdf, icdf, random, mle
Weibull on page B-186 | WeibullDistribution | Distribution Fitter, Probability Distribution Function, randtool | wblpdf, wblcdf, wblinv, wblstat, wblfit, wbllike, wblrnd | pdf, cdf, icdf, random, mle
Continuous Distributions (Statistics)
Distribution | Distribution Object | Apps and Interactive UIs | Distribution-Specific Functions | Generic Functions
Chi-square on page B-29 | — | Probability Distribution Function, randtool | chi2pdf, chi2cdf, chi2inv, chi2stat, chi2rnd | pdf, cdf, icdf, random
F on page B-46 | — | Probability Distribution Function, randtool | fpdf, fcdf, finv, fstat, frnd | pdf, cdf, icdf, random
Noncentral chi-square on page B-119 | — | Probability Distribution Function, randtool | ncx2pdf, ncx2cdf, ncx2inv, ncx2stat, ncx2rnd | pdf, cdf, icdf, random
Noncentral F on page B-121 | — | Probability Distribution Function, randtool | ncfpdf, ncfcdf, ncfinv, ncfstat, ncfrnd | pdf, cdf, icdf, random
Noncentral t on page B-123 | — | Probability Distribution Function, randtool | nctpdf, nctcdf, nctinv, nctstat, nctrnd | pdf, cdf, icdf, random
Student's t on page B-165 | — | Probability Distribution Function, randtool | tpdf, tcdf, tinv, tstat, trnd | pdf, cdf, icdf, random
t location-scale on page B-172 | tLocationScaleDistribution | Distribution Fitter | — | pdf, cdf, icdf, random, mle
Discrete Distributions
Distribution | Distribution Objects | Apps and Interactive UIs | Distribution-Specific Functions | Generic Functions
Binomial on page B-10 | BinomialDistribution | Distribution Fitter, Probability Distribution Function, randtool | binopdf, binocdf, binoinv, binostat, binofit, binornd | pdf, cdf, icdf, random, mle
Bernoulli on page B-2 | — | — | — | mle
Geometric on page B-64 | — | Probability Distribution Function, randtool | geopdf, geocdf, geoinv, geostat, mle, geornd | pdf, cdf, icdf, random, mle
Hypergeometric on page B-74 | — | Probability Distribution Function, randtool | hygepdf, hygecdf, hygeinv, hygestat, hygernd | pdf, cdf, icdf, random, mle
Multinomial on page B-102 | MultinomialDistribution | — | mnpdf, mnrnd | —
Negative binomial on page B-115 | NegativeBinomialDistribution | Distribution Fitter, Probability Distribution Function, randtool | nbinpdf, nbincdf, nbininv, nbinstat, nbinfit, nbinrnd | pdf, cdf, icdf, random, mle
Poisson on page B-146 | PoissonDistribution | Distribution Fitter, Probability Distribution Function, randtool | poisspdf, poisscdf, poissinv, poisstat, poissfit, poissrnd | pdf, cdf, icdf, random, mle
Uniform (discrete) on page B-184 | — | Probability Distribution Function, randtool | unidpdf, unidcdf, unidinv, unidstat, unidrnd | pdf, cdf, icdf, random, mle
Multivariate Distributions
Distribution | Distribution Object | Distribution-Specific Functions
Copula on page 5-129 (Gaussian copula, t copula, Clayton copula, Frank copula, Gumbel copula) | — | copulapdf, copulacdf, copulaparam, copulastat, copulafit, copularnd
Gaussian Mixture | gmdistribution | fitgmdist, pdf, cdf, random
Inverse Wishart on page B-77 | — | iwishrnd
Multivariate normal on page B-104 | — | mvnpdf, mvncdf, mvnrnd
Multivariate t on page B-110 | — | mvtpdf, mvtcdf, mvtrnd
Wishart on page B-193 | — | wishrnd
Nonparametric Distributions
Distribution | Distribution Objects | Apps and Interactive UIs | Distribution-Specific Functions
Kernel on page B-79 | KernelDistribution | Distribution Fitter | ksdensity
Pareto tails | paretotails | — | —
Flexible Distribution Families
Distribution | Distribution-Specific Functions | Generic Functions
Pearson system on page 7-20 | pearspdf, pearscdf, pearsrnd | pdf, cdf, random
Johnson system on page 7-20 | johnsrnd | —
See Also
More About
• “Working with Probability Distributions” on page 5-3
• “Nonparametric and Empirical Probability Distributions” on page 5-31
Maximum Likelihood Estimation The mle function computes maximum likelihood estimates (MLEs) for a distribution specified by its name and for a custom distribution specified by its probability density function (pdf), log pdf, or negative log likelihood function. For some distributions, MLEs can be given in closed form and computed directly. For other distributions, a search for the maximum likelihood must be employed. The search can be controlled with an options input argument, created using the statset function. For efficient searches, it is important to choose a reasonable distribution model and set appropriate convergence tolerances. MLEs can be biased, especially for small samples. As sample size increases, however, MLEs become unbiased minimum variance estimators with approximate normal distributions. This is used to compute confidence bounds for the estimates. For example, consider the following distribution of means from repeated random samples of an exponential distribution: mu = 1; % Population parameter n = 1e3; % Sample size ns = 1e4; % Number of samples rng('default') % For reproducibility samples = exprnd(mu,n,ns); % Population samples means = mean(samples); % Sample means
The Central Limit Theorem says that the means will be approximately normally distributed, regardless of the distribution of the data in the samples. The mle function can be used to find the normal distribution that best fits the means: [phat,pci] = mle(means)
phat = 1×2
    1.0000    0.0315
pci = 2×2
    0.9994    0.0311
    1.0006    0.0319
phat(1) and phat(2) are the MLEs for the mean and standard deviation. pci(:,1) and pci(:,2) are the corresponding 95% confidence intervals. Visualize the distribution of sample means together with the fitted normal distribution. numbins = 50; histogram(means,numbins,'Normalization','pdf') hold on x = min(means):0.001:max(means); y = normpdf(x,phat(1),phat(2)); plot(x,y,'r','LineWidth',2)
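As noted above, the search can be controlled through a statset options structure; a brief sketch (the option values and the gamma refit are illustrative):
opt = statset('MaxIter',500,'MaxFunEvals',1000);
phatGamma = mle(means,'distribution','Gamma','Options',opt)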
See Also mle | histogram
Related Examples
• “Fit Custom Distributions” on page 5-173
• “Avoid Numerical Issues When Fitting Custom Distributions” on page 5-186
More About
• “Working with Probability Distributions” on page 5-3
• “Supported Distributions” on page 5-16
Negative Loglikelihood Functions Negative loglikelihood functions for supported Statistics and Machine Learning Toolbox distributions all end with like, as in explike. Each function represents a parametric family of distributions. Input arguments are lists of parameter values specifying a particular member of the distribution family followed by an array of data. Functions return the negative loglikelihood of the parameters, given the data. To find maximum likelihood estimates (MLEs), you can use a negative loglikelihood function as an objective function of the optimization problem and solve it by using the MATLAB function fminsearch or functions in Optimization Toolbox™ and Global Optimization Toolbox. These functions allow you to choose a search algorithm and exercise low-level control over algorithm execution. By contrast, the mle function and the distribution fitting functions that end with fit, such as normfit and gamfit, use preset algorithms with options limited to those set by the statset function. You can specify a parametric family of distributions by using the probability density function (pdf) f(x|θ), where x represents an outcome of a random variable and θ represents the distribution parameters. When you view f(x|θ) as a function of θ for a fixed x, the function f(x|θ) is the likelihood of parameters θ for a single outcome x. The likelihood of parameters θ for an independent and identically distributed random sample data set X is:
L(θ) = ∏_{x ∈ X} f(x|θ).
Given X, MLEs maximize L(θ) over all possible θ. Numerical algorithms find MLEs that (equivalently) maximize the loglikelihood function, log(L(θ)). The logarithm transforms the product of potentially small likelihoods into a sum of logs, which is easier to distinguish from 0 in computation. For convenience, Statistics and Machine Learning Toolbox negative loglikelihood functions return the negative of this sum because optimization algorithms typically search for minima rather than maxima.
Find MLEs Using Negative Loglikelihood Function This example shows how to find MLEs by using the gamlike and fminsearch functions. Use the gamrnd function to generate a random sample from a specific “Gamma Distribution” on page B-48. rng default; % For reproducibility a = [1,2]; X = gamrnd(a(1),a(2),1e3,1);
Visualize the likelihood surface in the neighborhood of a given X by using the gamlike function.
mesh = 50;
delta = 0.5;
a1 = linspace(a(1)-delta,a(1)+delta,mesh);
a2 = linspace(a(2)-delta,a(2)+delta,mesh);
logL = zeros(mesh); % Preallocate memory
for i = 1:mesh
    for j = 1:mesh
        logL(i,j) = gamlike([a1(i),a2(j)],X);
    end
end
[A1,A2] = meshgrid(a1,a2); surfc(A1,A2,logL)
Search for the minimum of the likelihood surface by using the fminsearch function. LL = @(u)gamlike([u(1),u(2)],X); % Likelihood given X MLES = fminsearch(LL,[1,2])
MLES = 1×2
    0.9980    2.0172
Compare MLES to the estimates returned by the gamfit function. ahat = gamfit(X)
ahat = 1×2
    0.9980    2.0172
The difference of each parameter between MLES and ahat is less than 1e-4. Add the MLEs to the surface plot.
hold on plot3(MLES(1),MLES(2),LL(MLES),'ro','MarkerSize',5,'MarkerFaceColor','r') view([-60 40]) % Rotate to show the minimum
See Also negloglik | statset | fminsearch | surfc
More About
• “Working with Probability Distributions” on page 5-3
• “Supported Distributions” on page 5-16
Random Number Generation Statistics and Machine Learning Toolbox supports the generation of random numbers from various distributions. Each random number generator (RNG) represents a parametric family of distributions. RNGs return random numbers from the specified distribution in an array of the specified dimensions. Other random number generation functions which do not support specific distributions include: • cvpartition • datasample • hmmgenerate • lhsdesign • lhsnorm • mhsample • random • randsample • slicesample • hmcSampler RNGs in Statistics and Machine Learning Toolbox software depend on the default random number stream of MATLAB via the rand and randn functions. Each RNG uses one of the techniques discussed in “Common Pseudorandom Number Generation Methods” on page 7-2 to generate random numbers from a given distribution. By controlling the default random number stream and its state, you can control how the RNGs in Statistics and Machine Learning Toolbox software generate random values. For example, to reproduce the same sequence of values from an RNG, you can save and restore the default stream's state, or reset the default stream. For details on managing the default random number stream, see “Managing the Global Stream Using RandStream”. MATLAB initializes the default random number stream to the same state each time it starts up. Thus, RNGs in Statistics and Machine Learning Toolbox software will generate the same sequence of values for each MATLAB session unless you modify that state at startup. One simple way to do that is to add commands to startup.m such as rng shuffle
that initialize the default random number stream to a different state for each session. The following table lists the supported distributions and their respective random number generation functions.
Distribution — Random Number Generation Function
Beta on page B-6 — betarnd, random, randtool
Binomial on page B-10 — binornd, random, randtool
Birnbaum-Saunders on page B-18 — random
Burr Type XII on page B-19 — random, randtool
Chi-square on page B-29 — chi2rnd, random, randtool
Clayton copula on page 5-129 — copularnd
Exponential on page B-34 — exprnd, random, randtool
Extreme value on page B-41 — evrnd, random, randtool
F on page B-46 — frnd, random, randtool
Frank copula on page 5-129 — copularnd
Gamma on page B-48 — gamrnd, randg, random, randtool
Gaussian copula on page 5-129 — copularnd
Gaussian Mixture — random
Generalized extreme value on page B-56 — gevrnd, random, randtool
Generalized Pareto on page B-60 — gprnd, random, randtool
Geometric on page B-64 — geornd, random, randtool
Gumbel copula on page 5-129 — copularnd
Half-Normal on page B-69 — random, randtool
Hypergeometric on page B-74 — hygernd, random, randtool
Inverse Gaussian on page B-76 — random
Inverse Wishart on page B-77 — iwishrnd
Johnson system on page 7-20 — johnsrnd
Kernel on page B-79 — random
Logistic on page B-86 — random
Loglogistic on page B-87 — random
Lognormal on page B-89 — lognrnd, random, randtool
Multinomial on page B-102 — mnrnd
Multivariate normal on page B-104 — mvnrnd
Multivariate t on page B-110 — mvtrnd
Nakagami on page B-114 — random
Negative binomial on page B-115 — nbinrnd, random, randtool
Noncentral chi-square on page B-119 — ncx2rnd, random, randtool
Noncentral F on page B-121 — ncfrnd, random, randtool
Noncentral t on page B-123 — nctrnd, random, randtool
Normal (Gaussian) on page B-125 — normrnd, randn, random, randtool
Pareto — random
Pearson system on page 7-20 — pearsrnd
Piecewise on page B-145 — random
Poisson on page B-146 — poissrnd, random, randtool
Rayleigh on page B-152 — raylrnd, random, randtool
Rician on page B-154 — random
Stable on page B-156 — random
Student's t on page B-165 — trnd, random, randtool
t copula on page 5-129 — copularnd
t location-scale on page B-172 — random
Triangular on page B-174 — random
Uniform (continuous) on page B-179 — unifrnd, rand, random
Uniform (discrete) on page B-184 — unidrnd, random, randtool
Weibull on page B-186 — wblrnd, random
Wishart on page B-193 — wishrnd
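As a brief illustration of the stream control described before the table (a sketch with illustrative values), saving and restoring the state of the default stream reproduces the same sample:
s = RandStream.getGlobalStream;   % default (global) random number stream
savedState = s.State;             % save the current state
r1 = exprnd(3,5,1);               % draw a sample
s.State = savedState;             % restore the saved state
r2 = exprnd(3,5,1);               % same values as r1
isequal(r1,r2)                    % returns logical 1 (true)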
See Also
More About
• “Generate Random Numbers Using the Triangular Distribution” on page 5-48
• “Generate Random Numbers Using Uniform Distribution Inversion” on page 5-109
• “Generating Pseudorandom Numbers” on page 7-2
• “Generating Quasi-Random Numbers” on page 7-12
• “Working with Probability Distributions” on page 5-3
• “Supported Distributions” on page 5-16
Nonparametric and Empirical Probability Distributions
In this section...
“Overview” on page 5-31
“Kernel Distribution” on page 5-31
“Empirical Cumulative Distribution Function” on page 5-32
“Piecewise Linear Distribution” on page 5-33
“Pareto Tails” on page 5-34
“Triangular Distribution” on page 5-35
Overview
In some situations, you cannot accurately describe a data sample using a parametric distribution. Instead, the probability density function (pdf) or cumulative distribution function (cdf) must be estimated from the data. Statistics and Machine Learning Toolbox provides several options for estimating the pdf or cdf from sample data.
Kernel Distribution
A kernel distribution on page B-79 produces a nonparametric probability density estimate that adapts itself to the data, rather than selecting a density with a particular parametric form and estimating the parameters. This distribution is defined by a kernel density estimator, a smoothing function that determines the shape of the curve used to generate the pdf, and a bandwidth value that controls the smoothness of the resulting density curve.
Similar to a histogram, the kernel distribution builds a function to represent the probability distribution using the sample data. But unlike a histogram, which places the values into discrete bins, a kernel distribution sums the component smoothing functions for each data value to produce a smooth, continuous probability curve. The following plot shows a visual comparison of a histogram and a kernel distribution generated from the same sample data.
A histogram represents the probability distribution by establishing bins and placing each data value in the appropriate bin. Because of this bin count approach, the histogram produces a discrete probability density function. This might be unsuitable for certain applications, such as generating random numbers from a fitted distribution. Alternatively, the kernel distribution builds the probability density function (pdf) by creating an individual probability density curve for each data value, then summing the smooth curves. This approach creates one smooth, continuous probability density function for the data set. For more general information about kernel distributions, see “Kernel Distribution” on page B-79. For information on how to work with a kernel distribution, see Using KernelDistribution Objects and ksdensity.
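The following sketch overlays a kernel density estimate on a histogram scaled as a pdf; the sample data here is illustrative, not from a specific example in this guide:
rng default                          % for reproducibility
x = [randn(100,1); 4 + randn(50,1)]; % illustrative bimodal sample
histogram(x,'Normalization','pdf')   % histogram scaled so bar areas sum to 1
hold on
pd = fitdist(x,'Kernel');            % kernel distribution object
xi = linspace(min(x)-2,max(x)+2,200);
plot(xi,pdf(pd,xi),'LineWidth',2)    % smooth, continuous kernel density curve
hold off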
Empirical Cumulative Distribution Function
An empirical cumulative distribution function (ecdf) estimates the cdf of a random variable by assigning equal probability to each observation in a sample. Because of this approach, the ecdf is a discrete cumulative distribution function that creates an exact match between the ecdf and the distribution of the sample data.
The following plot shows a visual comparison of the ecdf of 20 random numbers generated from a standard normal distribution, and the theoretical cdf of a standard normal distribution. The circles indicate the value of the ecdf calculated at each sample data point. The dashed line that passes through each circle visually represents the ecdf, although the ecdf is not a continuous function. The solid line shows the theoretical cdf of the standard normal distribution from which the random numbers in the sample data were drawn.
The ecdf is similar in shape to the theoretical cdf, although it is not an exact match. Instead, the ecdf is an exact match to the sample data. The ecdf is a discrete function, and is not smooth, especially in the tails where data might be sparse. You can smooth the distribution with Pareto tails on page 5-34, using the paretotails function. For more information and additional syntax options, see ecdf. To construct a continuous function based on cdf values computed from sample data, see “Piecewise Linear Distribution” on page 5-33.
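A minimal sketch of this comparison, using 20 standard normal random numbers (the values drawn are illustrative):
rng default                          % for reproducibility
y = randn(20,1);                     % sample from a standard normal distribution
[f,x] = ecdf(y);                     % ecdf values at each sample point
stairs(x,f)                          % plot the ecdf as a step function
hold on
xg = -3:0.01:3;
plot(xg,normcdf(xg),'LineWidth',2)   % theoretical standard normal cdf
hold off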
Piecewise Linear Distribution
A piecewise linear distribution on page B-145 estimates an overall cdf for the sample data by computing the cdf value at each individual point, and then linearly connecting these values to form a continuous curve.
The following plot shows the cdf for a piecewise linear distribution based on a sample of hospital patients’ weight measurements. The circles represent each individual data point (weight measurement). The black line that passes through each data point represents the piecewise linear distribution cdf for the sample data.
A piecewise linear distribution linearly connects the cdf values calculated at each sample data point to form a continuous curve. By contrast, an empirical cumulative distribution function on page 5-32 constructed using the ecdf function produces a discrete cdf. For example, random numbers generated from the ecdf can only include x values contained in the original sample data. Random numbers generated from a piecewise linear distribution can include any x value between the lower and upper boundaries of the sample data. Because the piecewise linear distribution cdf is constructed from the values contained in the sample data, the resulting curve is often not smooth, especially in the tails where data might be sparse. You can smooth the distribution with Pareto tails on page 5-34, using the paretotails function. For information on how to work with a piecewise linear distribution, see Using PiecewiseLinearDistribution Objects.
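For example, a minimal sketch of creating a piecewise linear distribution from a small set of cdf values (the x and Fx values here are illustrative assumptions, not data from this guide):
x  = [1 2 3 4];                  % data values, in increasing order
Fx = [0 0.25 0.75 1];            % cdf values at each x, rising from 0 to 1
pd = makedist('PiecewiseLinear','x',x,'Fx',Fx);
r = random(pd,5,1)               % random values fall between 1 and 4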
Pareto Tails
Pareto tails use a piecewise approach to improve the fit of a nonparametric cdf by smoothing the tails of the distribution. You can fit a kernel distribution on page 5-31, empirical cdf on page 5-32, or a user-defined estimator to the middle data values, then fit generalized Pareto distribution on page B-60 curves to the tails. This technique is especially useful when the sample data is sparse in the tails.
The following plot shows the empirical cdf (ecdf) of a data sample containing 20 random numbers. The solid line represents the ecdf, and the dashed line represents the empirical cdf with Pareto tails fit to the lower and upper 10 percent of the data. The circles denote the boundaries for the lower and upper 10 percent of the data.
Fitting Pareto tails to the lower and upper 10 percent of the sample data makes the cdf smoother in the tails, where the data is sparse. For more information on working with Pareto tails, see paretotails.
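A minimal sketch of this technique (the sample data is illustrative):
rng default                      % for reproducibility
y = randn(100,1);                % illustrative sample
pfit = paretotails(y,0.1,0.9);   % GPD tails below the 10th and above the 90th percentiles
cdf(pfit,[-3 0 3])               % evaluate the smoothed cdf at several points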
Triangular Distribution
A “Triangular Distribution” on page B-174 provides a simplistic representation of the probability distribution when limited sample data is available. This continuous distribution is parameterized by a lower limit, peak location, and upper limit. These points are linearly connected to estimate the pdf of the sample data. You can use the mean, median, or mode of the data as the peak location.
The following plot shows the triangular distribution pdf of a random sample of 10 integers from 0 to 5. The lower limit is the smallest integer in the sample data, and the upper limit is the largest integer. The peak for this plot is at the mode, or most frequently-occurring value, in the sample data.
Business applications such as simulation and project management sometimes use a triangular distribution to create models when limited sample data exists. For more information, see “Triangular Distribution” on page B-174.
See Also ecdf | ksdensity | paretotails
More About
• “Kernel Distribution” on page B-79
• “Piecewise Linear Distribution” on page B-145
• “Triangular Distribution” on page B-174
• “Generalized Pareto Distribution” on page B-60
• “Fit a Nonparametric Distribution with Pareto Tails” on page 5-44
Fit Kernel Distribution Object to Data
This example shows how to fit a kernel probability distribution object to sample data.
Step 1. Load sample data.
Load the sample data.
load carsmall;
This data contains miles per gallon (MPG) measurements for different makes and models of cars, grouped by country of origin (Origin), model year (Year), and other vehicle characteristics.
Step 2. Fit a kernel distribution object.
Use fitdist to fit a kernel probability distribution object to the miles per gallon (MPG) data for all makes of cars.
pd = fitdist(MPG,'Kernel')
pd =
  KernelDistribution
    Kernel = normal
    Bandwidth = 4.11428
    Support = unbounded
This creates a prob.KernelDistribution object. By default, fitdist uses a normal kernel smoothing function and chooses an optimal bandwidth for estimating normal densities, unless you specify otherwise. You can access information about the fit and perform further calculations using the related object functions.
Step 3. Compute descriptive statistics.
Compute the mean, median, and standard deviation of the fitted kernel distribution.
m = mean(pd)
m = 23.7181
med = median(pd)
med = 23.4841
s = std(pd)
s = 8.9896
Step 4. Compute and plot the pdf.
Compute and plot the pdf of the fitted kernel distribution.
figure
x = 0:1:60;
y = pdf(pd,x);
plot(x,y,'LineWidth',2)
title('Miles per Gallon')
xlabel('MPG')
The plot shows the pdf of the kernel distribution fit to the MPG data across all makes of cars. The distribution is smooth and fairly symmetrical, although it is slightly skewed with a heavier right tail.
Step 5. Generate random numbers.
Generate a vector of random numbers from the fitted kernel distribution.
rng('default') % For reproducibility
r = random(pd,1000,1);
figure
hist(r);
set(get(gca,'Children'),'FaceColor',[.8 .8 1]);
hold on
y = y*5000; % Scale pdf to overlay on histogram
plot(x,y,'LineWidth',2)
title('Random Numbers Generated From Distribution')
hold off
The histogram has a similar shape to the pdf plot because the random numbers are generated from the nonparametric kernel distribution fit to the sample data.
See Also fitdist | ksdensity | KernelDistribution
More About
• “Kernel Distribution” on page B-79
• “Fit Kernel Distribution Using ksdensity” on page 5-40
• “Working with Probability Distributions” on page 5-3
• “Supported Distributions” on page 5-16
Fit Kernel Distribution Using ksdensity
This example shows how to generate a kernel probability density estimate from sample data using the ksdensity function.
Step 1. Load sample data.
Load the sample data.
load carsmall;
This data contains miles per gallon (MPG) measurements for different makes and models of cars, grouped by country of origin (Origin), model year (Year), and other vehicle characteristics.
Step 2. Generate a kernel probability density estimate.
Use ksdensity to generate a kernel probability density estimate for the miles per gallon (MPG) data.
[f,xi] = ksdensity(MPG);
By default, ksdensity uses a normal kernel smoothing function and chooses an optimal bandwidth for estimating normal densities, unless you specify otherwise.
Step 3. Plot the kernel probability density estimate.
Plot the kernel probability density estimate to visualize the MPG distribution.
plot(xi,f,'LineWidth',2)
title('Miles per Gallon')
xlabel('MPG')
The plot shows the pdf of the kernel distribution fit to the MPG data across all makes of cars. The distribution is smooth and fairly symmetrical, although it is slightly skewed with a heavier right tail.
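To see how the bandwidth affects the estimate, you can, for example, pass a smaller value using the 'Bandwidth' name-value argument (the value 1 here is an illustrative assumption):
[f1,xi1] = ksdensity(MPG,'Bandwidth',1);  % narrower smoothing window than the default
hold on
plot(xi1,f1,'--','LineWidth',2)           % reveals more local features, at the cost of smoothness
legend('Default bandwidth','Bandwidth = 1')
hold off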
See Also ksdensity | fitdist | KernelDistribution
More About
• “Kernel Distribution” on page B-79
• “Fit Kernel Distribution Object to Data” on page 5-37
Fit Distributions to Grouped Data Using ksdensity
This example shows how to fit kernel distributions to grouped sample data using the ksdensity function.
Step 1. Load sample data.
Load the sample data.
load carsmall
The data contains miles per gallon (MPG) measurements for different makes and models of cars, grouped by country of origin (Origin), model year (Model_Year), and other vehicle characteristics.
Step 2. Group sample data by origin.
Group the MPG data by origin (Origin) for cars made in the USA, Japan, and Germany.
Origin = categorical(cellstr(Origin));
MPG_USA = MPG(Origin=='USA');
MPG_Japan = MPG(Origin=='Japan');
MPG_Germany = MPG(Origin=='Germany');
Step 3. Compute and plot the pdf.
Compute and plot the pdf for each group.
[fi,xi] = ksdensity(MPG_USA);
plot(xi,fi,'r-')
hold on
[fj,xj] = ksdensity(MPG_Japan);
plot(xj,fj,'b-.')
[fk,xk] = ksdensity(MPG_Germany);
plot(xk,fk,'k:')
legend('USA','Japan','Germany')
title('MPG by Origin')
xlabel('MPG')
hold off
The plot shows how miles per gallon (MPG) performance differs by country of origin (Origin). Using this data, the USA has the widest distribution, and its peak is at the lowest MPG value of the three origins. Japan has the most regular distribution with a slightly heavier left tail, and its peak is at the highest MPG value of the three origins. The peak for Germany is between the USA and Japan, and the second bump near 44 miles per gallon suggests that there might be multiple modes in the data.
See Also ksdensity | fitdist | KernelDistribution
More About
• “Kernel Distribution” on page B-79
• “Grouping Variables” on page 2-11
• “Fit Kernel Distribution Using ksdensity” on page 5-40
• “Fit Probability Distribution Objects to Grouped Data” on page 5-93
Fit a Nonparametric Distribution with Pareto Tails
This example shows how to fit a nonparametric probability distribution to sample data using Pareto tails to smooth the distribution in the tails.
Step 1. Generate sample data.
Generate sample data that contains more outliers than expected from a standard normal distribution.
rng('default') % For reproducibility
left_tail = -exprnd(1,10,1);
right_tail = exprnd(5,10,1);
center = randn(80,1);
data = [left_tail;center;right_tail];
The data contains 80% of its values from a standard normal distribution, 10% from an exponential distribution with a mean of 5, and 10% from an exponential distribution with a mean of -1. Compared to a standard normal distribution, the exponential values are more likely to be outliers, especially in the upper tail.
Step 2. Fit probability distributions to the data.
Fit a normal distribution and a t location-scale distribution to the data, and plot for a visual comparison.
probplot(data);
hold on
p = fitdist(data,'tlocationscale');
h = plot(gca,p,'PlotType',"probability");
set(h,'color','r','linestyle','-');
title('Probability Plot')
legend('Normal','Data','t location-scale','Location','SE')
hold off
Both distributions appear to fit reasonably well in the center, but neither the normal distribution nor the t location-scale distribution fits the tails very well.
Step 3. Generate an empirical distribution.
To obtain a better fit, use ecdf to generate an empirical cdf based on the sample data.
figure
ecdf(data)
The empirical distribution provides a perfect fit, but the outliers make the tails very discrete. Random samples generated from this distribution using the inversion method might include, for example, values near 4.33 and 9.25, but no values in between.
Step 4. Fit a distribution using Pareto tails.
Use paretotails to generate an empirical cdf for the middle 80% of the data and fit generalized Pareto distributions to the lower and upper 10%.
pfit = paretotails(data,0.1,0.9)
pfit =
Piecewise distribution with 3 segments
     -Inf < x < -1.24623   (0 < p < 0.1): lower tail, GPD(-0.334156,0.798745)
 -1.24623 < x < 1.48551    (0.1 < p < 0.9): interpolated empirical cdf
  1.48551 < x < Inf        (0.9 < p < 1): upper tail, GPD(1.23681,0.581868)
To obtain a better fit, paretotails fits a distribution by piecing together an ecdf or kernel distribution in the center of the sample, and smooth generalized Pareto distributions (GPDs) in the tails. Use paretotails to create a paretotails probability distribution object. You can access information about the fit and perform further calculations on the object using the object functions of the paretotails object. For example, you can evaluate the cdf or generate random numbers from the distribution.
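For example, a brief sketch of working with the fitted pfit object (output values not shown):
r = random(pfit,5,1);        % random numbers from the piecewise distribution
p = cdf(pfit,[-2 0 2]);      % cdf values at several points
q90 = icdf(pfit,0.9);        % 90th percentile, at the start of the upper tail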
Step 5. Compute and plot the cdf.
Compute and plot the cdf of the fitted paretotails distribution.
x = -4:0.01:10;
plot(x,cdf(pfit,x))
The paretotails cdf closely fits the data but is smoother in the tails than the ecdf generated in Step 3.
See Also fitdist | paretotails | ecdf
More About
• “Nonparametric and Empirical Probability Distributions” on page 5-31
Generate Random Numbers Using the Triangular Distribution
This example shows how to create a triangular probability distribution object based on sample data, and generate random numbers for use in a simulation.
Step 1. Input sample data.
Input the data vector time, which contains the observed length of time (in seconds) that 10 different cars stopped at a highway tollbooth.
time = [6 14 8 7 16 8 23 6 7 15];
The data shows that, while most cars stopped for 6 to 16 seconds, one outlier stopped for 23 seconds.
Step 2. Estimate distribution parameters.
Estimate the triangular distribution parameters from the sample data.
lower = min(time);
peak = median(time);
upper = max(time);
A triangular distribution provides a simplistic representation of the probability distribution when sample data is limited. Estimate the lower and upper boundaries of the distribution by finding the minimum and maximum values of the sample data. For the peak parameter, the median might provide a better estimate of the mode than the mean, since the data includes an outlier.
Step 3. Create a probability distribution object.
Create a triangular probability distribution object using the estimated parameter values.
pd = makedist('Triangular','A',lower,'B',peak,'C',upper)
pd =
  TriangularDistribution
    A = 6, B = 8, C = 23
Compute and plot the pdf of the triangular distribution.
x = 0:.1:230;
y = pdf(pd,x);
plot(x,y)
title('Time Spent at Tollbooth')
xlabel('Time (seconds)')
xlim([0 30])
The plot shows that this triangular distribution is skewed to the right. However, since the estimated peak value is the sample median, the distribution should be symmetrical about the peak. Because of its skew, this model might, for example, generate random numbers that seem unusually high when compared to the initial sample data.
Step 4. Generate random numbers.
Generate random numbers from this distribution to simulate future traffic flow through the tollbooth.
rng('default'); % For reproducibility
r = random(pd,10,1)
r = 10×1
   16.1265
   18.0987
    8.0796
   18.3001
   13.3176
    7.8211
    9.4360
   12.2508
   19.7082
   20.0078
The returned values in r are the time in seconds that the next 10 simulated cars spend at the tollbooth. These values seem high compared to the values in the original data vector time because the outlier skewed the distribution to the right. Using the second-highest value as the upper limit parameter might mitigate the effects of the outlier and generate a set of random numbers more similar to the initial sample data.
Step 5. Revise estimated parameters.
Estimate the upper boundary of the distribution using the second largest value in the sample data.
sort_time = sort(time,'descend');
secondLargest = sort_time(2);
Step 6. Create a new distribution object and plot the pdf.
Create a new triangular probability distribution object using the revised estimated parameters, and plot its pdf.
figure
pd2 = makedist('Triangular','A',lower,'B',peak,'C',secondLargest);
y2 = pdf(pd2,x);
plot(x,y2,'LineWidth',2)
title('Time Spent at Tollbooth')
xlabel('Time (seconds)')
xlim([0 30])
The plot shows that this triangular distribution is still slightly skewed to the right. However, it is much more symmetrical about the peak than the distribution that used the maximum sample data value to estimate the upper limit.
Step 7. Generate new random numbers.
Generate new random numbers from the revised distribution.
rng('default'); % For reproducibility
r2 = random(pd2,10,1)
r2 = 10×1
   12.1501
   13.2547
    7.5937
   13.3675
   10.5768
    7.3967
    8.4026
    9.9792
   14.1562
   14.3240
These new values more closely resemble those in the original data vector time. They are also closer to the sample median than the random numbers generated by the distribution that used the outlier to estimate its upper limit. This example does not remove the outlier from the sample data when computing the median. Other options for parameter estimation include removing outliers from the sample data altogether, or using the mean or mode of the sample data as the peak value.
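For example, a sketch of one such alternative, using the sample mean as the peak (an illustrative variant, not part of the worked example above):
peakMean = mean(time);        % sample mean, 11 seconds, as an alternative peak estimate
pd3 = makedist('Triangular','A',lower,'B',peakMean,'C',secondLargest);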
See Also pdf | random | makedist
More About
• “Triangular Distribution” on page B-174
• “Random Number Generation” on page 5-28
• “Generate Random Numbers Using Uniform Distribution Inversion” on page 5-109
Model Data Using the Distribution Fitter App
In this section...
“Explore Probability Distributions Interactively” on page 5-52
“Create and Manage Data Sets” on page 5-53
“Create a New Fit” on page 5-56
“Display Results” on page 5-60
“Manage Fits” on page 5-61
“Evaluate Fits” on page 5-63
“Exclude Data” on page 5-65
“Save and Load Sessions” on page 5-69
“Generate a File to Fit and Plot Distributions” on page 5-69
The Distribution Fitter app provides a visual, interactive approach to fitting univariate distributions to data.
Explore Probability Distributions Interactively
You can use the Distribution Fitter app to interactively fit probability distributions to data imported from the MATLAB workspace. You can choose from 22 built-in probability distributions, or create your own custom distribution. The app displays the fitted distribution over plots of the empirical distributions, including pdf, cdf, probability plots, and survivor functions. You can export the fit data, including fitted parameter values, to the workspace for further analysis.
Distribution Fitter App Workflow
To fit a probability distribution to your sample data:
1. On the MATLAB Toolstrip, click the Apps tab. In the Math, Statistics and Optimization group, open the Distribution Fitter app. Alternatively, at the command prompt, enter distributionFitter.
2. Import your sample data, or create a data vector directly in the app. You can also manage your data sets and choose which one to fit. See “Create and Manage Data Sets” on page 5-53.
3. Create a new fit for your data. See “Create a New Fit” on page 5-56.
4. Display the results of the fit. You can choose to display the density (pdf), cumulative probability (cdf), quantile (inverse cdf), probability plot (choose one of several distributions), survivor function, and cumulative hazard. See “Display Results” on page 5-60.
5. You can create additional fits, and manage multiple fits from within the app. See “Manage Fits” on page 5-61.
6. Evaluate probability functions for the fit. You can choose to evaluate the density (pdf), cumulative probability (cdf), quantile (inverse cdf), survivor function, and cumulative hazard. See “Evaluate Fits” on page 5-63.
7. Improve the fit by excluding certain data. You can specify bounds for the data to exclude, or you can exclude data graphically using a plot of the values in the sample data. See “Exclude Data” on page 5-65.
8. Save your current Distribution Fitter app session so you can open it later. See “Save and Load Sessions” on page 5-69.
Create and Manage Data Sets
To open the Data dialog box, click the Data button in the Distribution Fitter app.
Import Data
Create a data set by importing a vector from the MATLAB workspace using the Import workspace vectors options.
• Data — In the Data field, the drop-down list contains the names of all matrices and vectors, other than 1-by-1 matrices (scalars) in the MATLAB workspace. Select the array containing the data that you want to fit. The actual data you import must be a vector. If you select a matrix in the Data field, the first column of the matrix is imported by default. To select a different column or row of the matrix, click Select Column or Row. The matrix appears in the Select Column or Row dialog box. You can select a row or column by highlighting it. Alternatively, you can enter any valid MATLAB expression in the Data field. When you select a vector in the Data field, a histogram of the data appears in the Data preview pane.
• Censoring — If some of the points in the data set are censored, enter a Boolean vector of the same size as the data vector, specifying the censored entries of the data. A 1 in the censoring vector specifies that the corresponding entry of the data vector is censored. A 0 specifies that the entry is not censored. If you enter a matrix, you can select a column or row by clicking Select Column or Row. If you do not have censored data, leave the Censoring field blank.
• Frequency — Enter a vector of positive integers of the same size as the data vector to specify the frequency of the corresponding entries of the data vector. For example, a value of 7 in the 15th entry of frequency vector specifies that there are 7 data points corresponding to the value in the 15th entry of the data vector. If all entries of the data vector have frequency 1, leave the Frequency field blank.
• Data set name — Enter a name for the data set that you import from the workspace, such as My data.
After you have entered the information in the preceding fields, click Create Data Set to create the data set My data.
Manage Data Sets
View and manage the data sets that you create using the Manage data sets pane. When you create a data set, its name appears in the Data set list. The following figure shows the Manage data sets pane after creating the data set My data.
For each data set in the Data set list, you can:
• Select the Plot check box to display a plot of the data in the main Distribution Fitter app window. When you create a new data set, Plot is selected by default. Clearing the Plot check box removes the data from the plot in the main window. You can specify the type of plot displayed in the Display type field in the main window.
• If Plot is selected, you can also select Conf bounds to display confidence interval bounds for the plot in the main window. These bounds are pointwise confidence bounds around the empirical estimates of these functions. The bounds are displayed only when you set Display Type in the main window to one of the following:
• Cumulative probability (CDF)
• Survivor function
• Cumulative hazard
The Distribution Fitter app cannot display confidence bounds on density (PDF), quantile (inverse CDF), or probability plots. Clearing the Conf bounds check box removes the confidence bounds from the plot in the main window.
When you select a data set from the list, you can access the following buttons:
• View — Display the data in a table in a new window.
• Set Bin Rules — Defines the histogram bins used in a density (PDF) plot.
• Rename — Rename the data set.
• Delete — Delete the data set.
Set Bin Rules
To set bin rules for the histogram of a data set, click Set Bin Rules to open the Set Bin Rules dialog box.
You can select from the following rules:
• Freedman-Diaconis rule — Algorithm that chooses bin widths and locations automatically, based on the sample size and the spread of the data. This rule, which is the default, is suitable for many kinds of data. • Scott rule — Algorithm intended for data that are approximately normal. The algorithm chooses bin widths and locations automatically. • Number of bins — Enter the number of bins. All bins have equal widths. • Bins centered on integers — Specifies bins centered on integers. • Bin width — Enter the width of each bin. If you select this option, you can also select: • Automatic bin placement — Place the edges of the bins at integer multiples of the Bin width. • Bin boundary at — Enter a scalar to specify the boundaries of the bins. The boundary of each bin is equal to this scalar plus an integer multiple of the Bin width. You can also: • Apply to all existing data sets — Apply the rule to all data sets. Otherwise, the rule is applied only to the data set currently selected in the Data dialog box. • Save as default — Apply the current rule to any new data sets that you create. You can set default bin width rules by selecting Set Default Bin Rules from the Tools menu in the main window.
Create a New Fit
Click the New Fit button at the top of the main window to open the New Fit dialog box. If you created the data set My data, it appears in the Data field.
Field Name — Description
Fit Name — Enter a name for the fit.
Data — Select the data set to which you want to fit a distribution from the drop-down list.
Distribution — Select the type of distribution to fit from the Distribution drop-down list. Only the distributions that apply to the values of the selected data set appear in the Distribution field. For example, when the data include values that are zero or negative, positive distributions are not displayed. You can specify either a parametric or a nonparametric distribution. When you select a parametric distribution from the drop-down list, a description of its parameters appears. Distribution Fitter estimates these parameters to fit the distribution to the data set. If you select the binomial distribution or the generalized extreme value distribution, you must specify a fixed value for one of the parameters. The pane contains a text field into which you can specify that parameter. When you select Nonparametric fit, options for the fit appear in the pane, as described in “Further Options for Nonparametric Fits” on page 5-59.
Exclusion rule — Specify a rule to exclude some data. Create an exclusion rule by clicking Exclude in the Distribution Fitter app. For more information, see “Exclude Data” on page 5-65.
Apply the New Fit
Click Apply to fit the distribution. For a parametric fit, the Results pane displays the values of the estimated parameters. For a nonparametric fit, the Results pane displays information about the fit. When you click Apply, the Distribution Fitter app displays a plot of the distribution and the corresponding data.
Note When you click Apply, the title of the dialog box changes to Edit Fit. You can now make changes to the fit you just created and click Apply again to save them. After closing the Edit Fit dialog box, you can reopen it from the Fit Manager dialog box at any time to edit the fit.
After applying the fit, you can save the information to the workspace using probability distribution objects by clicking Save to workspace.
Available Distributions
All of the distributions available in the Distribution Fitter app are supported elsewhere in Statistics and Machine Learning Toolbox software. You can use the fitdist function to fit any of the distributions supported by the app. Many distributions also have dedicated fitting functions. These functions compute the majority of the fits in the Distribution Fitter app, and are referenced in the following list. Other fits are computed using functions internal to the Distribution Fitter app.
Not all of the distributions listed are available for all data sets. The Distribution Fitter app determines the extent of the data (nonnegative, unit interval, and so on) and displays appropriate distributions in the Distribution drop-down list. Distribution data ranges are given parenthetically in the following list.
• Beta on page B-6 (unit interval values) distribution, fit using the function betafit. • Binomial on page B-10 (nonnegative integer values) distribution, fit using the function binopdf. • Birnbaum-Saunders on page B-18 (positive values) distribution. • Burr Type XII on page B-19 (positive values) distribution. • Exponential on page B-34 (nonnegative values) distribution, fit using the function expfit. • Extreme value on page B-41 (all values) distribution, fit using the function evfit. • Gamma on page B-48 (positive values) distribution, fit using the function gamfit. • Generalized extreme value on page B-56 (all values) distribution, fit using the function gevfit. • Generalized Pareto on page B-60 (all values) distribution, fit using the function gpfit. • Inverse Gaussian on page B-76 (positive values) distribution. • Logistic on page B-86 (all values) distribution. • Loglogistic on page B-87 (positive values) distribution. • Lognormal on page B-89 (positive values) distribution, fit using the function lognfit. • Nakagami on page B-114 (positive values) distribution. • Negative binomial on page B-115 (nonnegative integer values) distribution, fit using the function nbinpdf. • Nonparametric on page B-79 (all values) distribution, fit using the function ksdensity. • Normal on page B-125 (all values) distribution, fit using the function normfit. • Poisson on page B-146 (nonnegative integer values) distribution, fit using the function poisspdf. • Rayleigh on page B-152 (positive values) distribution using the function raylfit. • Rician on page B-154 (positive values) distribution. • t location-scale on page B-172 (all values) distribution. • Weibull on page B-186 (positive values) distribution using the function wblfit. Further Options for Nonparametric Fits When you select Non-parametric in the Distribution field, a set of options appears in the Nonparametric pane, as shown in the following figure.
The options for nonparametric distributions are:
• Kernel — Type of kernel function to use:
• Normal
• Box
• Triangle
• Epanechnikov
• Bandwidth — The bandwidth of the kernel smoothing window. Select Auto for a default value that is optimal for estimating normal densities. After you click Apply, this value appears in the Results pane. Select Specify and enter a smaller value to reveal features such as multiple modes or a larger value to make the fit smoother.
• Domain — The allowed x-values for the density:
• Unbounded — The density extends over the whole real line.
• Positive — The density is restricted to positive values.
• Specify — Enter lower and upper bounds for the domain of the density.
When you select Positive or Specify, the nonparametric fit has zero probability outside the specified domain.
Display Results
The Distribution Fitter app window displays plots of:
• The data sets for which you select Plot in the Data dialog box.
• The fits for which you select Plot in the Fit Manager dialog box.
• Confidence bounds for the data sets for which you select Conf bounds in the Data dialog box, and for the fits for which you select Conf bounds in the Fit Manager dialog box.
The following fields are available.
Display Type
Specify the type of plot to display using the Display Type field in the main app window. Each type corresponds to a probability function, for example, a probability density function. You can choose from the following display types:
• Density (PDF) — Display a probability density function (PDF) plot for the fitted distribution. The main window displays data sets using a probability histogram, in which the height of each rectangle is the fraction of data points that lie in the bin divided by the width of the bin. This makes the sum of the areas of the rectangles equal to 1.
• Cumulative probability (CDF) — Display a cumulative probability plot of the data. The main window displays data sets using a cumulative probability step function. The height of each step is the cumulative sum of the heights of the rectangles in the probability histogram.
• Quantile (inverse CDF) — Display a quantile (inverse CDF) plot.
• Probability plot — Display a probability plot of the data. Specify the type of distribution used to construct the probability plot in the Distribution field. This field is only available when you select Probability plot. The choices for the distribution are Exponential, Extreme Value, Half Normal, Log-Logistic, Logistic, Lognormal, Normal, Rayleigh, and Weibull. You can also create a probability plot against a parametric fit that you create in the New Fit dialog box. When you create these fits, they are added at the bottom of the Distribution drop-down list.
• Survivor function — Display survivor function plot of the data.
• Cumulative hazard — Display cumulative hazard plot of the data.
Note If the plotted data includes 0 or negative values, some distributions are unavailable.
Confidence Bounds
You can display confidence bounds for data sets and fits when you set Display Type to Cumulative probability (CDF), Survivor function, Cumulative hazard, or, for fits only, Quantile (inverse CDF).
• To display bounds for a data set, select Conf bounds next to the data set in the Manage data sets pane of the Data dialog box.
• To display bounds for a fit, select Conf bounds next to the fit in the Fit Manager dialog box. Confidence bounds are not available for all fit types.
To set the confidence level for the bounds, select Confidence Level from the View menu in the main window and choose from the options.
Manage Fits
Click the Manage Fits button to open the Fit Manager dialog box.
The Table of fits displays a list of the fits that you create, with the following options:
• Plot — Displays a plot of the fit in the main window of the Distribution Fitter app. When you create a new fit, Plot is selected by default. Clearing the Plot check box removes the fit from the plot in the main window.
• Conf bounds — If you select Plot, you can also select Conf bounds to display confidence bounds in the plot. The bounds are displayed when you set Display type in the main window to one of the following:
• Cumulative probability (CDF)
• Quantile (inverse CDF)
• Survivor function
• Cumulative hazard
The Distribution Fitter app cannot display confidence bounds on density (PDF) or probability plots. Bounds are not supported for nonparametric fits and some parametric fits. Clearing the Conf bounds check box removes the confidence intervals from the plot in the main window.
When you select a fit in the Table of fits, the following buttons are enabled below the table:
• New Fit — Open a New Fit window.
• Copy — Create a copy of the selected fit.
• Edit — Open an Edit Fit dialog box, to edit the fit. Note: You can edit only the currently selected fit in the Edit Fit dialog box. To edit a different fit, select it in the Table of fits and click Edit to open another Edit Fit dialog box.
• Save to workspace — Save the selected fit as a distribution object.
• Delete — Delete the selected fit.
Evaluate Fits
Use the Evaluate dialog box to evaluate your fitted distribution at any data points you choose. To open the dialog box, click the Evaluate button.
In the Evaluate dialog box, choose from the following items:
• Fit pane — Display the names of existing fits. Select one or more fits that you want to evaluate. Using your platform specific functionality, you can select multiple fits.
• Function — Select the type of probability function that you want to evaluate for the fit. The available functions are:
• Density (PDF) — Computes a probability density function.
• Cumulative probability (CDF) — Computes a cumulative probability function.
• Quantile (inverse CDF) — Computes a quantile (inverse CDF) function.
• Survivor function — Computes a survivor function.
• Cumulative hazard — Computes a cumulative hazard function.
• Hazard rate — Computes the hazard rate.
• At x = — Enter a vector of points or the name of a workspace variable containing a vector of points at which you want to evaluate the distribution function. If you change Function to Quantile (inverse CDF), the field name changes to At p =, and you enter a vector of probability values.
• Compute confidence bounds — Select this box to compute confidence bounds for the selected fits. The check box is enabled only if you set Function to Cumulative probability (CDF), Quantile (inverse CDF), Survivor function, or Cumulative hazard. The Distribution Fitter app cannot compute confidence bounds for nonparametric fits and for some parametric fits. In these cases, it returns NaN for the bounds.
• Level — Set the level for the confidence bounds.
• Plot function — Select this box to display a plot of the distribution function, evaluated at the points you enter in the At x = field, in a new window.
Note The settings for Compute confidence bounds, Level, and Plot function do not affect the plots that are displayed in the main window of the Distribution Fitter app. The settings apply only to plots you create by clicking Plot function in the Evaluate window.
To apply these evaluation settings to the selected fit, click Apply. The following figure shows the results of evaluating the cumulative distribution function for the fit My fit, at the points in the vector 5:4:45.
The columns of the table to the right of the Fit pane display the following values: • X — The entries of the vector that you enter in At x = field. • F(X) — The corresponding values of the CDF at the entries of X. • LB — The lower bounds for the confidence interval, if you select Compute confidence bounds. • UB — The upper bounds for the confidence interval, if you select Compute confidence bounds. To save the data displayed in the table to a matrix in the MATLAB workspace, click Export to Workspace.
Exclude Data
To exclude values from the fit, open the Exclude window by clicking the Exclude button. In the Exclude window, you can create rules for excluding specified data values. When you create a new fit in the New Fit window, you can use these rules to exclude data from the fit.
To create an exclusion rule:
1. Exclusion Rule Name — Enter a name for the exclusion rule.
2. Exclude Sections — Specify bounds for the excluded data:
• In the Lower limit: exclude data drop-down list, select <= or < and enter a scalar value in the field to the right. Depending on which operator you select, the app excludes from the fit any data values that are less than or equal to the scalar value, or less than the scalar value, respectively.
• In the Upper limit: exclude data drop-down list, select >= or > and enter a scalar value in the field to the right. Depending on which operator you select, the app excludes from the fit any data values that are greater than or equal to the scalar value, or greater than the scalar value, respectively.
OR
Click the Exclude Graphically button to define the exclusion rule by displaying a plot of the values in a data set and selecting the bounds for the excluded data. For example, if you created the data set My data as described in Create and Manage Data Sets, select it from the Select data drop-down list, and then click the Exclude Graphically button. The app displays the values in My data in a new window.
To set a lower limit for the boundary of the excluded region, click Add Lower Limit. The app displays a vertical line on the left side of the plot window. Move the line to the point where you want the lower limit, as shown in the following figure.
Move the vertical line to change the value displayed in the Lower limit: exclude data field in the Exclude window.
The value displayed corresponds to the x-coordinate of the vertical line. Similarly, you can set the upper limit for the boundary of the excluded region by clicking Add Upper Limit, and then moving the vertical line that appears at the right side of the plot window. After setting the lower and upper limits, click Close and return to the Exclude window.
3. Create Exclusion Rule — Once you have set the lower and upper limits for the boundary of the excluded data, click Create Exclusion Rule to create the new rule. The name of the new rule appears in the Existing exclusion rules pane.
Selecting an exclusion rule in the Existing exclusion rules pane enables the following buttons:
• Copy — Creates a copy of the rule, which you can then modify. To save the modified rule under a different name, click Create Exclusion Rule.
• View — Opens a new window in which you can see the data points excluded by the rule. The following figure shows a typical example.
The shaded areas in the plot graphically display which data points are excluded. The table to the right lists all data points. The shaded rows indicate excluded points. • Rename — Rename the rule. • Delete — Delete the rule. After you define an exclusion rule, you can use it when you fit a distribution to your data. The rule does not exclude points from the display of the data set.
Save and Load Sessions Save your work in the current session, and then load it in a subsequent session, so that you can continue working where you left off. Save a Session To save the current session, from the File menu in the main window, select Save Session. A dialog box opens and prompts you to enter a file name, for example my_session.dfit. Click Save to save the following items created in the current session: • Data sets • Fits • Exclusion rules • Plot settings • Bin width rules Load a Session To load a previously saved session, from the File menu in the main window, select Load Session. Enter the name of a previously saved session. Click Open to restore the information from the saved session to the current session.
Generate a File to Fit and Plot Distributions
Use the Generate Code option in the File menu to create a file that:
• Fits the distributions in the current session to any data vector in the MATLAB workspace.
• Plots the data and the fits.
After you end the current session, you can use the file to create plots in a standard MATLAB figure window, without reopening the Distribution Fitter app.
As an example, if you created the fit described in “Create a New Fit” on page 5-56, do the following steps:
1. From the File menu, select Generate Code.
2. In the MATLAB Editor window, choose File > Save as. Save the file as normal_fit.m in a folder on the MATLAB path.
You can then apply the function normal_fit to any vector of data in the MATLAB workspace. For example, the following commands:
new_data = normrnd(4.1, 12.5, 100, 1);
newfit = normal_fit(new_data)
legend('New Data', 'My fit')
generate newfit, a fitted normal distribution of the data. The commands also generate a plot of the data and the fit.
newfit =
  NormalDistribution
  Normal distribution
       mu = 5.63857   [2.7555, 8.52163]
    sigma = 14.53     [12.7574, 16.8791]
Note By default, the file labels the data in the legend using the same name as the data set in the Distribution Fitter app. You can change the label using the legend command, as illustrated by the preceding example.
See Also Distribution Fitter
More About
• “Fit a Distribution Using the Distribution Fitter App” on page 5-72
• “Define Custom Distributions Using the Distribution Fitter App” on page 5-82
Fit a Distribution Using the Distribution Fitter App
In this section...
“Step 1: Load Sample Data” on page 5-72
“Step 2: Import Data” on page 5-72
“Step 3: Create a New Fit” on page 5-74
“Step 4: Create and Manage Additional Fits” on page 5-77
This example shows how you can use the Distribution Fitter app to interactively fit a probability distribution to data.
Step 1: Load Sample Data
Load the sample data.
load carsmall
Step 2: Import Data
Open the Distribution Fitter tool.
distributionFitter
To import the vector MPG into the Distribution Fitter app, click the Data button. The Data dialog box opens.
The Data field displays all numeric arrays in the MATLAB workspace. From the drop-down list, select MPG. A histogram of the selected data appears in the Data preview pane. In the Data set name field, type a name for the data set, such as MPG data, and click Create Data Set. The main window of the Distribution Fitter app now displays a larger version of the histogram in the Data preview pane.
Step 3: Create a New Fit
To fit a distribution to the data, in the main window of the Distribution Fitter app, click New Fit. To fit a normal distribution to MPG data:
1. In the Fit name field, enter a name for the fit, such as My fit.
2. From the drop-down list in the Data field, select MPG data.
3. Confirm that Normal is selected from the drop-down menu in the Distribution field.
4. Click Apply.
The Results pane displays the mean and standard deviation of the normal distribution that best fits MPG data.
The Distribution Fitter app main window displays a plot of the normal distribution with this mean and standard deviation.
Based on the plot, a normal distribution does not appear to provide a good fit for the MPG data. To obtain a better evaluation, select Probability plot from the Display type drop-down list. Confirm that the Distribution drop-down list is set to Normal. The main window displays the following figure.
The normal probability plot shows that the data deviates from normal, especially in the tails.
Step 4: Create and Manage Additional Fits
The MPG data pdf indicates that the data has two peaks. Try fitting a nonparametric kernel distribution to obtain a better fit for this data.
1. Click Manage Fits. In the dialog box, click New Fit.
2. In the Fit name field, enter a name for the fit, such as Kernel fit.
3. From the drop-down list in the Data field, select MPG data.
4. From the drop-down list in the Distribution field, select Non-parametric. This enables several options in the Non-parametric pane, including Kernel, Bandwidth, and Domain. For now, accept the default value to apply a normal kernel shape and automatically determine the kernel bandwidth (using Auto). For more information about nonparametric kernel distributions, see “Kernel Distribution” on page B-79.
5. Click Apply.
The Results pane displays the kernel type, bandwidth, and domain of the nonparametric distribution fit to MPG data.
The main window displays plots of the original MPG data with the normal distribution and nonparametric kernel distribution overlaid. To visually compare these two fits, select Density (PDF) from the Display type drop-down list.
To include only the nonparametric kernel fit line (Kernel fit) on the plot, click Manage Fits. In the Table of fits pane, locate the row for the normal distribution fit (My fit) and clear the box in the Plot column.
See Also Distribution Fitter
More About
• “Model Data Using the Distribution Fitter App” on page 5-52
• “Define Custom Distributions Using the Distribution Fitter App” on page 5-82
Define Custom Distributions Using the Distribution Fitter App
You can define a probability object for a custom distribution and use the Distribution Fitter app or fitdist to fit distributions not supported by Statistics and Machine Learning Toolbox. You can also use a custom probability object as an input argument of probability object functions, such as pdf, cdf, icdf, and random, to evaluate the distribution, generate random numbers, and so on.
Open the Distribution Fitter App
• MATLAB Toolstrip: On the Apps tab, under Math, Statistics and Optimization, click the app icon.
• MATLAB command prompt: Enter distributionFitter.
distributionFitter
Define Custom Distribution
To define a custom distribution using the app, select File > Define Custom Distributions. A file template opens in the MATLAB Editor. You then edit this file so that it creates a probability object for the distribution you want. The template includes sample code that defines a probability object for the Laplace distribution. Follow the instructions in the template to define your own custom distribution.
To save your custom probability object, create a directory named +prob on your path. Save the file in this directory using a name that matches your distribution name. For example, save the template as LaplaceDistribution.m, and then import the custom distribution as described in the next section.
Import Custom Distribution
To import a custom distribution using the app, select File > Import Custom Distributions. The Imported Distributions dialog box opens, in which you can select the file that defines the distribution. For example, if you create the file LaplaceDistribution.m, as described in the preceding section, the list in the dialog box includes Laplace followed by an asterisk, indicating the file is new or modified and available for fitting.
Alternatively, you can use the makedist function to reset the list of distributions so that you do not need to select File > Import Custom Distributions in the app.
makedist -reset
This command resets the list of distributions by searching the path for files contained in a package named prob and implementing classes derived from ProbabilityDistribution. If you open the app after resetting the list, the distribution list in the app includes the custom distribution that you defined. Once you import a custom distribution using the Distribution Fitter app or reset the list by using makedist, you can use the custom distribution in the app and in the Command Window. The Distribution field of the New Fit dialog box, available from the Distribution Fitter app, contains the new custom distribution. In the Command Window, you can create the custom probability distribution object by using makedist and fit a data set to the custom distribution by using fitdist. Then, you 5-84
can use probability object functions, such as pdf, cdf, icdf, and random, to evaluate the distribution, generate random numbers, and so on.
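For concreteness, here is a minimal command-line sketch, assuming you saved the template's Laplace example as +prob/LaplaceDistribution.m on your path and that it registers under the distribution name 'Laplace' (both are assumptions; your file and distribution names may differ):
makedist -reset                 % pick up custom distributions in the +prob package
x = normrnd(0,1,100,1);         % some sample data to fit
pd = fitdist(x,'Laplace');      % fit the custom Laplace distribution
y = pdf(pd,-3:0.1:3);           % evaluate the fitted pdf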
See Also Distribution Fitter | makedist
More About
• “Model Data Using the Distribution Fitter App” on page 5-52
• “Fit a Distribution Using the Distribution Fitter App” on page 5-72
Explore the Random Number Generation UI The Random Number Generation user interface (UI) generates random samples from specified probability distributions, and displays the samples as histograms. Use the interface to explore the effects of changing parameters and sample size on the distributions. Run the user interface by typing randtool at the command line.
Start by selecting a distribution, then enter the desired sample size. You can also:
• Use the controls at the bottom of the window to set parameter values for the distribution and to change their upper and lower bounds.
• Draw another sample from the same distribution, with the same size and parameters.
• Export the current sample to your workspace. A dialog box enables you to provide a name for the sample.
Compare Multiple Distribution Fits This example shows how to fit multiple probability distribution objects to the same set of sample data, and obtain a visual comparison of how well each distribution fits the data. Step 1. Load sample data. Load the sample data. load carsmall
This data contains miles per gallon (MPG) measurements for different makes and models of cars, grouped by country of origin (Origin), model year (Model_Year), and other vehicle characteristics. Step 2. Create a categorical array. Transform Origin into a categorical array and remove the Italian car from the sample data. Because there is only one Italian car, fitdist cannot fit any distribution other than a kernel distribution to that group. Origin = categorical(cellstr(Origin)); MPG2 = MPG(Origin~='Italy'); Origin2 = Origin(Origin~='Italy'); Origin2 = removecats(Origin2,'Italy');
Step 3. Fit multiple distributions by group. Use fitdist to fit Weibull, normal, logistic, and kernel distributions to each country of origin group in the MPG data. [WeiByOrig,Country] = fitdist(MPG2,'weibull','by',Origin2); [NormByOrig,Country] = fitdist(MPG2,'normal','by',Origin2); [LogByOrig,Country] = fitdist(MPG2,'logistic','by',Origin2); [KerByOrig,Country] = fitdist(MPG2,'kernel','by',Origin2); WeiByOrig WeiByOrig=1×5 cell array {1x1 prob.WeibullDistribution}
{1x1 prob.WeibullDistribution}
{1x1 prob.WeibullDistribu
Country Country = 5x1 cell {'France' } {'Germany'} {'Japan' } {'Sweden' } {'USA' }
Each country group now has four distribution objects associated with it. For example, the cell array WeiByOrig contains five Weibull distribution objects, one for each country represented in the sample data. Likewise, the cell array NormByOrig contains five normal distribution objects, and so on. Each object contains properties that hold information about the data, distribution, and parameters. The array Country lists the country of origin for each group in the same order as the distribution objects are stored in the cell arrays.
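For example (an added illustration, not one of the original steps), you can read off the estimated parameters for the France group, which is in position 1 of each cell array; A and B are the scale and shape properties of a WeibullDistribution object:
WeiFrance = WeiByOrig{1};
[WeiFrance.A WeiFrance.B]   % estimated Weibull scale and shape for the France group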
Step 4. Compute the pdf for each distribution. Extract the four probability distribution objects for USA and compute the pdf for each distribution. As shown in Step 3, USA is in position 5 in each cell array. WeiUSA = WeiByOrig{5}; NormUSA = NormByOrig{5}; LogUSA = LogByOrig{5}; KerUSA = KerByOrig{5}; x = 0:1:50; pdf_Wei = pdf(WeiUSA,x); pdf_Norm = pdf(NormUSA,x); pdf_Log = pdf(LogUSA,x); pdf_Ker = pdf(KerUSA,x);
Step 5. Plot the pdf for each distribution. Plot the pdf for each distribution fit to the USA data, superimposed on a histogram of the sample data. Normalize the histogram for easier display. Create a histogram of the USA sample data. data = MPG(Origin2=='USA'); figure histogram(data,10,'Normalization','pdf','FaceColor',[1,0.8,0]);
Plot the pdf of each fitted distribution. line(x,pdf_Wei,'LineStyle','-','Color','r') line(x,pdf_Norm,'LineStyle','-.','Color','b') line(x,pdf_Log,'LineStyle','--','Color','g') line(x,pdf_Ker,'LineStyle',':','Color','k') legend('Data','Weibull','Normal','Logistic','Kernel','Location','Best') title('MPG for Cars from USA') xlabel('MPG')
Superimposing the pdf plots over a histogram of the sample data provides a visual comparison of how well each type of distribution fits the data. Only the nonparametric kernel distribution KerUSA comes close to revealing the two modes in the original data. Step 6. Further group USA data by year. To investigate the two modes revealed in Step 5, group the MPG data by both country of origin (Origin) and model year (Model_Year), and use fitdist to fit kernel distributions to each group. [KerByYearOrig,Names] = fitdist(MPG,'Kernel','By',{Origin Model_Year});
Each unique combination of origin and model year now has a kernel distribution object associated with it. Names Names = 14x1 cell {'France...' } {'France...' } {'Germany...'} {'Germany...'} {'Germany...'} {'Italy...' } {'Japan...' } {'Japan...' } {'Japan...' } {'Sweden...' }
{'Sweden...' }
{'USA...'    }
{'USA...'    }
{'USA...'    }
Plot the three probability distributions for each USA model year, which are in positions 12, 13, and 14 in the cell array KerByYearOrig. figure hold on for i = 12 : 14 plot(x,pdf(KerByYearOrig{i},x)) end legend('1970','1976','1982') title('MPG in USA Cars by Model Year') xlabel('MPG') hold off
When further grouped by model year, the pdf plots reveal two distinct peaks in the MPG data for cars made in the USA — one for the model year 1970 and one for the model year 1982. This explains why the histogram for the combined USA miles per gallon data shows two peaks instead of one.
See Also fitdist | pdf | histogram
More About
• “Grouping Variables” on page 2-11
• “Fit Probability Distribution Objects to Grouped Data” on page 5-93
• “Working with Probability Distributions” on page 5-3
• “Supported Distributions” on page 5-16
Fit Probability Distribution Objects to Grouped Data This example shows how to fit probability distribution objects to grouped sample data, and create a plot to visually compare the pdf of each group. Step 1. Load sample data. Load the sample data. load carsmall;
The data contains miles per gallon (MPG) measurements for different makes and models of cars, grouped by country of origin (Origin), model year (Model_Year), and other vehicle characteristics. Step 2. Create a categorical array. Transform Origin into a categorical array. Origin = categorical(cellstr(Origin));
Step 3. Fit kernel distributions to each group. Use fitdist to fit kernel distributions to each country of origin group in the MPG data. [KerByOrig,Country] = fitdist(MPG,'Kernel','by',Origin) KerByOrig=1×6 cell array {1x1 prob.KernelDistribution}
{1x1 prob.KernelDistribution}
{1x1 prob.KernelDistributio
Country = 6x1 cell {'France' } {'Germany'} {'Italy' } {'Japan' } {'Sweden' } {'USA' }
The cell array KerByOrig contains six kernel distribution objects, one for each country represented in the sample data. Each object contains properties that hold information about the data, the distribution, and the parameters. The array Country lists the country of origin for each group in the same order as the distribution objects are stored in KerByOrig. Step 4. Compute the pdf for each group. Extract the probability distribution objects for Germany, Japan, and USA. Use the positions of each country in KerByOrig shown in Step 3, which indicates that Germany is the second country, Japan is the fourth country, and USA is the sixth country. Compute the pdf for each group. Germany = KerByOrig{2}; Japan = KerByOrig{4}; USA = KerByOrig{6}; x = 0:1:50;
USA_pdf = pdf(USA,x); Japan_pdf = pdf(Japan,x); Germany_pdf = pdf(Germany,x);
Step 5. Plot the pdf for each group. Plot the pdf for each group on the same figure. plot(x,USA_pdf,'r-') hold on plot(x,Japan_pdf,'b-.') plot(x,Germany_pdf,'k:') legend({'USA','Japan','Germany'},'Location','NW') title('MPG by Country of Origin') xlabel('MPG')
The resulting plot shows how miles per gallon (MPG) performance differs by country of origin (Origin). Using this data, the USA has the widest distribution, and its peak is at the lowest MPG value of the three origins. Japan has the most regular distribution with a slightly heavier left tail, and its peak is at the highest MPG value of the three origins. The peak for Germany is between the USA and Japan, and the second bump near 44 miles per gallon suggests that there might be multiple modes in the data.
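As an added numerical check (a sketch, not part of the original example), you can compare the groups by evaluating the median of each fitted kernel object directly:
groupMedians = cellfun(@(d) median(d), KerByOrig)   % median MPG implied by each fitted kernel distribution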
See Also pdf | fitdist
More About
• “Kernel Distribution” on page B-79
• “Grouping Variables” on page 2-11
• “Fit Distributions to Grouped Data Using ksdensity” on page 5-42
• “Working with Probability Distributions” on page 5-3
• “Supported Distributions” on page 5-16
Three-Parameter Weibull Distribution Statistics and Machine Learning Toolbox™ uses a two-parameter “Weibull Distribution” on page B-186 with a scale parameter a and a shape parameter b in the probability distribution object WeibullDistribution and distribution-specific functions such as wblpdf and wblcdf. The Weibull distribution can take a third parameter. The three-parameter Weibull distribution adds a location parameter that is zero in the two-parameter case. If X has a two-parameter Weibull distribution, then Y = X + c has a three-parameter Weibull distribution with the added location parameter c. The probability density function (pdf) of the three-parameter Weibull distribution becomes

f(x \mid a,b,c) = \begin{cases} \dfrac{b}{a}\left(\dfrac{x-c}{a}\right)^{b-1}\exp\left(-\left(\dfrac{x-c}{a}\right)^{b}\right) & \text{if } x > c,\\ 0 & \text{if } x \le c, \end{cases}
where a and b are positive values, and c is a real value. If the shape parameter b is less than 1, the probability density of the Weibull distribution approaches infinity as x approaches c. The maximum of the likelihood function is infinite. The software might find satisfactory estimates in some cases, but the global maximum is degenerate when b < 1. This example shows how to find the maximum likelihood estimates (MLEs) for the three-parameter Weibull distribution by using a custom-defined pdf and the mle function. Also, the example explains how to avoid the problem of a pdf approaching infinity when b < 1. Load Data Load the carsmall data set, which contains measurements of cars made in the 1970s and early 1980s. load carsmall
This example uses car weight measurements in the Weight variable. Fit Two-Parameter Weibull Distribution First, fit a two-parameter Weibull distribution to Weight. pd = fitdist(Weight,'Weibull') pd = WeibullDistribution Weibull distribution A = 3321.64 [3157.65, 3494.15] B = 4.10083 [3.52497, 4.77076]
Plot the fit with a histogram. figure histogram(Weight,8,'Normalization','pdf') hold on x = linspace(0,6000);
plot(x,pdf(pd,x),'LineWidth',2) hold off
The fitted distribution plot does not match the histogram well. The histogram shows no samples in the region where Weight < 1500. Fit a Weibull distribution again after subtracting 1500 from Weight. pd = fitdist(Weight-1500,'Weibull') pd = WeibullDistribution Weibull distribution A = 1711.75 [1543.58, 1898.23] B = 1.99963 [1.70954, 2.33895] figure histogram(Weight-1500,8,'Normalization','pdf') hold on plot(x,pdf(pd,x),'LineWidth',2) hold off
The fitted distribution plot matches the histogram better. Instead of specifying an arbitrary value for the distribution limit, you can define a custom function for a three-parameter Weibull distribution and estimate the limit (location parameter c). Define Custom pdf for Three-Parameter Weibull Distribution Define a probability density function for a three-parameter Weibull distribution. f_def = @(x,a,b,c) (x>c).*(b/a).*(((x-c)/a).^(b-1)).*exp(-((x-c)/a).^b);
Alternatively, you can use the wblpdf function to define the three-parameter Weibull distribution. f = @(x,a,b,c) wblpdf(x-c,a,b);
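As a quick sanity check (added here for illustration; the parameter values are arbitrary), the two definitions should agree wherever x > c, because wblpdf(x-c,a,b) is zero when x - c <= 0:
xs = [1600 2000 3000];                                 % test points above the assumed location c = 1500
max(abs(f_def(xs,1700,2,1500) - f(xs,1700,2,1500)))    % expected to be (numerically) zero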
Fit Three-Parameter Weibull Distribution Find the MLEs for the three parameters by using the mle function. You must also specify the initial parameter values (Start name-value argument) for the custom distribution. try mle(Weight,'pdf',f,'Start',[1700 2 1500]) catch ME disp(ME) end MException with properties:
identifier: 'stats:mle:NonpositivePdfVal'
   message: 'Custom probability function returned negative or zero values.'
     cause: {}
     stack: [12x1 struct]
Correction: []
mle returns an error because the custom function returns nonpositive values. This error is a typical problem when you fit a lower limit of a distribution, or fit a distribution with a region that has zero probability density. mle tries some parameter values that have zero density, and then fails to estimate parameters. In the previous function call, mle tries values of c that are higher than the minimum value of Weight, which leads to a zero density for some points, and returns the error. To avoid this problem, you can turn off the option that checks for invalid function values and specify the parameter bounds when you call the mle function. Display the default options for the iterative estimation process of the mle function. statset('mlecustom') ans = struct with fields: Display: 'off' MaxFunEvals: 400 MaxIter: 200 TolBnd: 1.0000e-06 TolFun: 1.0000e-06 TolTypeFun: [] TolX: 1.0000e-06 TolTypeX: [] GradObj: 'off' Jacobian: [] DerivStep: 6.0555e-06 FunValCheck: 'on' Robust: [] RobustWgtFun: [] WgtFun: [] Tune: [] UseParallel: [] UseSubstreams: [] Streams: {} OutputFcn: []
Override that default, using an options structure created with the statset function. Specify the FunValCheck field value as 'off' to turn off the validity check for the custom function values. opt = statset('FunValCheck','off');
Find the MLEs of the three parameters again. Specify the iterative process option (Options name-value argument) and parameter bounds (LowerBound and UpperBound name-value arguments). The scale and shape parameters must be positive, and the location parameter must be smaller than the minimum of the sample data. params = mle(Weight,'pdf',f,'Start',[1700 2 1500],'Options',opt, ... 'LowerBound',[0 0 -Inf],'UpperBound',[Inf Inf min(Weight)]) params = 1×3
10^3 ×
    1.3874    0.0015    1.7581
Plot the fit with a histogram. figure histogram(Weight,8,'Normalization','pdf') hold on plot(x,f(x,params(1),params(2),params(3)),'LineWidth',2) hold off
The fitted distribution plot matches the histogram well. Fit Three-Parameter Weibull Distribution for b < 1 If the shape parameter b is less than 1, the pdf of the Weibull distribution approaches infinity near the lower limit c (location parameter). You can avoid this problem by specifying interval-censored data, if appropriate. Load the cities data set. The data includes ratings for nine different indicators of the quality of life in 329 US cities: climate, housing, health, crime, transportation, education, arts, recreation, and economics. For each indicator, a higher rating is better. load cities
Find the MLEs for the seventh indicator (arts).
Y = ratings(:,7); params1 = mle(Y,'pdf',f,'Start',[median(Y) 1 0],'Options',opt) Warning: Maximum likelihood estimation did not converge.
Iteration limit exceeded.
params1 = 1×3
10^3 ×
    2.7584    0.0008    0.0520
The warning message indicates that the estimation does not converge. Modify the estimation options, and find the MLEs again. Increase the maximum number of iterations (MaxIter) and the maximum number of objective function evaluations (MaxFunEvals). opt.MaxIter = 1e3; opt.MaxFunEvals = 1e3; params2 = mle(Y,'pdf',f,'Start',params1,'Options',opt) Warning: Maximum likelihood estimation did not converge.
Function evaluation limit exceeded.
params2 = 1×3
10^3 ×
    2.7407    0.0008    0.0520
The iteration still does not converge because the pdf approaches infinity near the lower limit. Assume that the indicators in Y are the values rounded to the nearest integer. Then, you can treat values in Y as interval-censored observations. An observation y in Y indicates that the actual rating is between y–0.5 and y+0.5. Create a matrix in which each row represents the interval surrounding each integer in Y. intervalY = [Y-0.5, Y+0.5];
Find the MLEs again using intervalY. To fit a custom distribution to a censored data set, you must pass both the pdf and cdf to the mle function. F = @(x,a,b,c) wblcdf(x-c,a,b); params = mle(intervalY,'pdf',f,'cdf',F,'Start',params2,'Options',opt) params = 1×3
10^3 ×
    2.7949    0.0008    0.0515
The function finds the MLEs without any convergence issues. This fit is based on fitting probabilities to intervals, so it does not encounter the problem of a density approaching infinity at a single point. You can use this approach only when converting data to an interval-censored version is appropriate. Plot the results. figure histogram(Y,'Normalization','pdf') hold on x = linspace(0,max(Y));
plot(x,f(x,params(1),params(2),params(3)),'LineWidth',2) hold off
The fitted distribution plot matches the histogram well.
See Also WeibullDistribution | mle | wblcdf | wblpdf
Related Examples
• “Weibull Distribution” on page B-186
Multinomial Probability Distribution Objects This example shows how to generate random numbers, compute and plot the pdf, and compute descriptive statistics of a multinomial distribution using probability distribution objects. Step 1. Define the distribution parameters. Create a vector p containing the probability of each outcome. Outcome 1 has a probability of 1/2, outcome 2 has a probability of 1/3, and outcome 3 has a probability of 1/6. The number of trials n in each experiment is 5, and the number of repetitions reps of the experiment is 8. p = [1/2 1/3 1/6]; n = 5; reps = 8;
Step 2. Create a multinomial probability distribution object. Create a multinomial probability distribution object using the specified value p for the Probabilities parameter. pd = makedist('Multinomial','Probabilities',p) pd = MultinomialDistribution Probabilities: 0.5000 0.3333
0.1667
Step 3. Generate one random number. Generate one random number from the multinomial distribution, which is the outcome of a single trial. rng('default') % For reproducibility r = random(pd) r = 2
This trial resulted in outcome 2. Step 4. Generate a matrix of random numbers. You can also generate a matrix of random numbers from the multinomial distribution, which reports the results of multiple experiments that each contain multiple trials. Generate a matrix that contains the outcomes of an experiment with n = 5 trials and reps = 8 repetitions. r = random(pd,reps,n) r = 8×5
     3     3     3     2     1
     1     1     2     2     1
     3     3     3     1     2
     2     3     2     2     2
     1     1     1     1     1
     1     2     3     2     3
     2     1     3     1     1
     3     1     2     1     1
Each element in the resulting matrix is the outcome of one trial. The columns correspond to the five trials in each experiment, and the rows correspond to the eight experiments. For example, in the first experiment (corresponding to the first row), one of the five trials resulted in outcome 1, one of the five trials resulted in outcome 2, and three of the five trials resulted in outcome 3. Step 5. Compute and plot the pdf. Compute the pdf of the distribution. x = 1:3; y = pdf(pd,x); bar(x,y) xlabel('Outcome') ylabel('Probability Mass') title('Trinomial Distribution')
The plot shows the probability mass for each of the k possible outcomes. For this distribution, the pdf value for any x other than 1, 2, or 3 is 0. Step 6. Compute descriptive statistics. Compute the mean, median, and standard deviation of the distribution.
m = mean(pd) m = 1.6667 med = median(pd) med = 1 s = std(pd) s = 0.7454
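These values can be checked directly against the outcome probabilities (an added verification, not one of the original steps); the mean is the probability-weighted sum of the outcomes, and the standard deviation follows from the corresponding variance:
k = 1:3;
m_check = sum(k.*p)                          % 1.6667, matches mean(pd)
s_check = sqrt(sum((k - m_check).^2.*p))     % 0.7454, matches std(pd)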
See Also

More About
• “Multinomial Probability Distribution Functions” on page 5-106
• “Working with Probability Distributions” on page 5-3
• “Supported Distributions” on page 5-16
Multinomial Probability Distribution Functions This example shows how to generate random numbers and compute and plot the pdf of a multinomial distribution using probability distribution functions. Step 1. Define the distribution parameters. Create a vector p containing the probability of each outcome. Outcome 1 has a probability of 1/2, outcome 2 has a probability of 1/3, and outcome 3 has a probability of 1/6. The number of trials in each experiment n is 5, and the number of repetitions of the experiment reps is 8. p = [1/2 1/3 1/6]; n = 5; reps = 8;
Step 2. Generate one random number. Generate one random number from the multinomial distribution, which is the outcome of a single trial. rng('default') % For reproducibility r = mnrnd(1,p,1) r = 1×3
     0     1     0
The returned vector r contains three elements, which show the counts for each possible outcome. This single trial resulted in outcome 2. Step 3. Generate a matrix of random numbers. You can also generate a matrix of random numbers from the multinomial distribution, which reports the results of multiple experiments that each contain multiple trials. Generate a matrix that contains the outcomes of an experiment with n = 5 trials and reps = 8 repetitions. r = mnrnd(n,p,reps) r = 8×3
     1     1     3
     3     2     0
     1     1     3
     0     4     1
     5     0     0
     1     2     2
     3     1     1
     3     1     1
Each row in the resulting matrix contains counts for each of the k multinomial bins. For example, in the first experiment (corresponding to the first row), one of the five trials resulted in outcome 1, one of the five trials resulted in outcome 2, and three of the five trials resulted in outcome 3.
Step 4. Compute the pdf. Since multinomial functions work with bin counts, create a multidimensional array of all possible outcome combinations, and compute the pdf using mnpdf. count1 = 1:n; count2 = 1:n; [x1,x2] = meshgrid(count1,count2); x3 = n-(x1+x2); y = mnpdf([x1(:),x2(:),x3(:)],repmat(p,(n)^2,1));
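For a single count vector, you can also evaluate mnpdf directly (an added illustration using the counts from the first experiment above):
y1 = mnpdf([1 1 3],p)   % probability of one 1, one 2, and three 3s in five trials, approximately 0.0154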
Step 5. Plot the pdf. Create a 3-D bar graph to visualize the pdf for each combination of outcome frequencies. y = reshape(y,n,n); bar3(y) set(gca,'XTickLabel',1:n); set(gca,'YTickLabel',1:n); xlabel('x_1 Frequency') ylabel('x_2 Frequency') zlabel('Probability Mass')
The plot shows the probability mass for each possible combination of outcomes. It does not show x3, which is determined by the constraint x1 + x2 + x3 = n.
See Also

More About
• “Multinomial Probability Distribution Objects” on page 5-103
• “Working with Probability Distributions” on page 5-3
• “Supported Distributions” on page 5-16
Generate Random Numbers Using Uniform Distribution Inversion This example shows how to generate random numbers using the uniform distribution inversion method. This is useful for distributions when it is possible to compute the inverse cumulative distribution function, but there is no support for sampling from the distribution directly. Step 1. Generate random numbers from the standard uniform distribution. Use rand to generate 1000 random numbers from the uniform distribution on the interval (0,1). rng('default') % For reproducibility u = rand(1000,1);
The inversion method relies on the principle that continuous cumulative distribution functions (cdfs) range uniformly over the open interval (0,1). If u is a uniform random number on (0,1), then x = F−1(u) generates a random number x from any continuous distribution with the specified cdf F. Step 2. Generate random numbers from the Weibull distribution. Use the inverse cumulative distribution function to generate the random numbers from a Weibull distribution with parameters A = 1 and B = 1 that correspond to the probabilities in u. Plot the results. x = wblinv(u,1,1); histogram(x,20);
The histogram shows that the random numbers generated using the Weibull inverse cdf function wblinv have a Weibull distribution. Step 3. Generate random numbers from the standard normal distribution. The same values in u can generate random numbers from any distribution, for example the standard normal, by following the same procedure using the inverse cdf of the desired distribution. figure x_norm = norminv(u,1,1); histogram(x_norm,20)
The histogram shows that, by using the standard normal inverse cdf norminv, the random numbers generated from u now have a standard normal distribution.
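The same idea extends to any distribution with an available inverse cdf. For instance (an additional illustration, not part of the original example), pushing the same values in u through the exponential inverse cdf expinv produces exponentially distributed values:
figure
x_exp = expinv(u,2);    % exponential random numbers with mean 2
histogram(x_exp,20)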
See Also wblinv | norminv | rand | hist
More About
• “Uniform Distribution (Continuous)” on page B-179
• “Weibull Distribution” on page B-186
• “Normal Distribution” on page B-125
• “Random Number Generation” on page 5-28
• “Generate Random Numbers Using the Triangular Distribution” on page 5-48
• “Generating Pseudorandom Numbers” on page 7-2
Represent Cauchy Distribution Using t Location-Scale This example shows how to use the t location-scale probability distribution object to work with a Cauchy distribution with nonstandard parameter values. Step 1. Create a probability distribution object. Create a t location-scale probability distribution object with degrees of freedom nu = 1. Specify mu = 3 to set the location parameter equal to 3, and sigma = 1 to set the scale parameter equal to 1. pd = makedist('tLocationScale','mu',3,'sigma',1,'nu',1) pd = tLocationScaleDistribution t Location-Scale distribution mu = 3 sigma = 1 nu = 1
Step 2. Compute descriptive statistics. Use object functions to compute descriptive statistics for the Cauchy distribution. med = median(pd) med = 3 r = iqr(pd) r = 2 m = mean(pd) m = NaN s = std(pd) s = Inf
The median of the Cauchy distribution is equal to its location parameter, and the interquartile range is equal to two times its scale parameter. Its mean and standard deviation are undefined. Step 3. Compute and plot the pdf. Compute and plot the pdf of the Cauchy distribution. x = -20:1:20; y = pdf(pd,x); plot(x,y,'LineWidth',2)
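As a rough added check (illustrative only, using the coarse grid above), the computed density values should integrate to approximately 1; the result falls slightly short of 1 because the heavy Cauchy tails extend beyond the plotted range:
trapz(x,y)   % approximately 0.97 over the interval [-20,20]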
The peak of the pdf is centered at the location parameter mu = 3. Step 4. Generate a vector of Cauchy random numbers. Generate a column vector containing 10 random numbers from the Cauchy distribution using the random function for the t location-scale probability distribution object. rng('default'); % For reproducibility r = random(pd,10,1) r = 10×1 3.2678 4.6547 2.0604 4.7322 3.1810 1.6649 1.8471 4.2466 5.4647 8.8874
Step 5. Generate a matrix of Cauchy random numbers. Generate a 5-by-5 matrix of Cauchy random numbers.
r = random(pd,5,5)
r = 5×5
    2.2867    2.9692   -1.7003    5.5949    1.9806
    2.7421    2.7180    3.2210    2.4233    3.1394
    3.5966    3.9806    1.0182    6.4180    5.1367
    5.4791   15.6472    0.7558    2.8908    5.9031
    1.6863    4.0985    2.9934   13.9506    4.8792
See Also makedist
More About
• “t Location-Scale Distribution” on page B-172
• “Generate Cauchy Random Numbers Using Student's t” on page 5-115
Generate Cauchy Random Numbers Using Student's t This example shows how to use the Student's t distribution to generate random numbers from a standard Cauchy distribution. Step 1. Generate a vector of random numbers. Generate a column vector containing 10 random numbers from a standard Cauchy distribution, which has a location parameter mu = 0 and scale parameter sigma = 1. Use trnd with degrees of freedom V = 1. rng('default'); % For reproducibility r = trnd(1,10,1) r = 10×1 0.2678 1.6547 -0.9396 1.7322 0.1810 -1.3351 -1.1529 1.2466 2.4647 5.8874
Step 2. Generate a matrix of random numbers. Generate a 5-by-5 matrix of random numbers from a standard Cauchy distribution. r = trnd(1,5,5)
r = 5×5
   -0.7133   -0.0308   -4.7003    2.5949   -1.0194
   -0.2579   -0.2820    0.2210   -0.5767    0.1394
    0.5966    0.9806   -1.9818    3.4180    2.1367
    2.4791   12.6472   -2.2442   -0.1092    2.9031
   -1.3137    1.0985   -0.0066   10.9506    1.8792
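To obtain a Cauchy distribution with a nonstandard location and scale from the same generator (an added note connecting this example to the previous one), shift and scale the standard Cauchy values; for example, for location 3 and scale 1:
r_shifted = 3 + 1*trnd(1,5,5);   % Cauchy random numbers with location 3 and scale 1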
See Also trnd
More About
• “Student's t Distribution” on page B-165
• “Represent Cauchy Distribution Using t Location-Scale” on page 5-112
Generate Correlated Data Using Rank Correlation This example shows how to use a copula and rank correlation to generate correlated data from probability distributions that do not have an inverse cdf function available, such as the Pearson flexible distribution family. Step 1. Generate Pearson random numbers. Generate 1000 random numbers from two different Pearson distributions, using the pearsrnd function. The first distribution has the parameter values mu equal to 0, sigma equal to 1, skew equal to -1, and kurtosis equal to 4. The second distribution has the parameter values mu equal to 0, sigma equal to 1, skew equal to 0.75, and kurtosis equal to 3. rng default % For reproducibility p1 = pearsrnd(0,1,-1,4,1000,1); p2 = pearsrnd(0,1,0.75,3,1000,1);
At this stage, p1 and p2 are independent samples from their respective Pearson distributions, and are uncorrelated. Step 2. Plot the Pearson random numbers. Create a scatterhist plot to visualize the Pearson random numbers. figure scatterhist(p1,p2)
The histograms show the marginal distributions for p1 and p2. The scatterplot shows the joint distribution for p1 and p2. The lack of pattern to the scatterplot shows that p1 and p2 are independent. Step 3. Generate random numbers using a Gaussian copula. Use copularnd to generate 1000 correlated random numbers with a correlation coefficient equal to –0.8, using a Gaussian copula. Create a scatterhist plot to visualize the random numbers generated from the copula. u = copularnd('Gaussian',-0.8,1000); figure scatterhist(u(:,1),u(:,2))
The histograms show that the data in each column of the copula have a marginal uniform distribution. The scatterplot shows that the data in the two columns are negatively correlated. Step 4. Sort the copula random numbers. Using Spearman's rank correlation, transform the two independent Pearson samples into correlated data. Use the sort function to sort the copula random numbers from smallest to largest, and to return a vector of indices describing the rearranged order of the numbers. [s1,i1] = sort(u(:,1)); [s2,i2] = sort(u(:,2));
s1 and s2 contain the numbers from the first and second columns of the copula, u, sorted in order from smallest to largest. i1 and i2 are index vectors that describe the rearranged order of the elements into s1 and s2. For example, if the first value in the sorted vector s1 is the third value in the original unsorted vector, then the first value in the index vector i1 is 3. Step 5. Transform the Pearson samples using Spearman's rank correlation. Create two vectors of zeros, x1 and x2, that are the same size as the sorted copula vectors, s1 and s2. Sort the values in p1 and p2 from smallest to largest. Place the values into x1 and x2, in the same order as the indices i1 and i2 generated by sorting the copula random numbers. x1 = zeros(size(s1)); x2 = zeros(size(s2)); x1(i1) = sort(p1); x2(i2) = sort(p2);
Step 6. Plot the correlated Pearson random numbers. Create a scatterhist plot to visualize the correlated Pearson data. figure scatterhist(x1,x2)
The histograms show the marginal Pearson distributions for each column of data. The scatterplot shows the joint distribution of p1 and p2, and indicates that the data are now negatively correlated.
Step 7. Confirm Spearman rank correlation coefficient values. Confirm that the Spearman rank correlation coefficient is the same for the copula random numbers and the correlated Pearson random numbers. copula_corr = corr(u,'Type','spearman') copula_corr = 2×2 1.0000 -0.7858
-0.7858 1.0000
pearson_corr = corr([x1,x2],'Type','spearman') pearson_corr = 2×2 1.0000 -0.7858
-0.7858 1.0000
The Spearman rank correlation is the same for the copula and the Pearson random numbers.
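By contrast (an added comparison, not part of the original steps), the linear (Pearson) correlation is generally not preserved by this kind of nonlinear transformation; computing it for both data sets typically gives values that differ from each other and from -0.8:
copula_linear = corr(u)           % linear correlation of the copula data
pearson_linear = corr([x1,x2])    % linear correlation of the transformed Pearson data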
See Also copularnd | corr | sort
More About
• “Copulas: Generate Correlated Samples” on page 5-129
Create Gaussian Mixture Model This example shows how to create a known, or fully specified, Gaussian mixture model (GMM) object using gmdistribution and by specifying component means, covariances, and mixture proportions. To create a GMM object by fitting data to a GMM, see “Fit Gaussian Mixture Model to Data” on page 5-123. Specify the component means, covariances, and mixing proportions for a two-component mixture of bivariate Gaussian distributions. mu = [1 2;-3 -5]; % Means sigma = cat(3,[2 0;0 .5],[1 0;0 1]); % Covariances p = ones(1,2)/2; % Mixing proportions
The rows of mu correspond to the component mean vectors, and the pages of sigma, sigma(:,:,J), correspond to the component covariance matrices. Create a GMM object using gmdistribution. gm = gmdistribution(mu,sigma,p);
Display the properties of the GMM. properties(gm) Properties for class gmdistribution: NumVariables DistributionName NumComponents ComponentProportion SharedCovariance NumIterations RegularizationValue NegativeLogLikelihood CovarianceType mu Sigma AIC BIC Converged ProbabilityTolerance
For a description of the properties, see gmdistribution. To access the value of a property, use dot notation. For example, access the number of variables of each GMM component. dimension = gm.NumVariables dimension = 2
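Other properties work the same way (an added illustration); for example, you can read back the mixing proportions and component means that were used to define the model:
proportions = gm.ComponentProportion   % [0.5 0.5]
means = gm.mu                          % the two component mean vectors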
Visualize the probability density function (pdf) of the GMM using pdf and the MATLAB® function fsurf. gmPDF = @(x,y) arrayfun(@(x0,y0) pdf(gm,[x0 y0]),x,y); fsurf(gmPDF,[-10 10]) title('Probability Density Function of GMM');
Visualize the cumulative distribution function (cdf) of the GMM using cdf and fsurf. gmCDF = @(x,y) arrayfun(@(x0,y0) cdf(gm,[x0 y0]),x,y); fsurf(gmCDF,[-10 10]) title('Cumulative Distribution Function of GMM');
See Also fitgmdist | gmdistribution
More About
• “Fit Gaussian Mixture Model to Data” on page 5-123
• “Simulate Data from Gaussian Mixture Model” on page 5-127
• “Cluster Using Gaussian Mixture Model” on page 17-39
Fit Gaussian Mixture Model to Data This example shows how to simulate data from a multivariate normal distribution, and then fit a Gaussian mixture model (GMM) to the data using fitgmdist. To create a known, or fully specified, GMM object, see “Create Gaussian Mixture Model” on page 5-120. fitgmdist requires a matrix of data and the number of components in the GMM. To create a useful GMM, you must choose k carefully. Too few components fails to model the data accurately (i.e., underfitting to the data). Too many components leads to an over-fit model with singular covariance matrices. Simulate data from a mixture of two bivariate Gaussian distributions using mvnrnd. mu1 = [1 2]; sigma1 = [2 0; 0 .5]; mu2 = [-3 -5]; sigma2 = [1 0; 0 1]; rng(1); % For reproducibility X = [mvnrnd(mu1,sigma1,1000); mvnrnd(mu2,sigma2,1000)];
Plot the simulated data. scatter(X(:,1),X(:,2),10,'.') % Scatter plot with points of size 10 title('Simulated Data')
Fit a two-component GMM. Use the 'Options' name-value pair argument to display the final output of the fitting algorithm. options = statset('Display','final'); gm = fitgmdist(X,2,'Options',options) 5 iterations, log-likelihood = -7105.71 gm = Gaussian mixture distribution with 2 components in 2 dimensions Component 1: Mixing proportion: 0.500000 Mean: -3.0377 -4.9859 Component 2: Mixing proportion: 0.500000 Mean: 0.9812 2.0563
Plot the pdf of the fitted GMM. gmPDF = @(x,y) arrayfun(@(x0,y0) pdf(gm,[x0 y0]),x,y); hold on h = fcontour(gmPDF,[-8 6]); title('Simulated Data and Contour lines of pdf');
Display the estimates for means, covariances, and mixture proportions.
ComponentMeans = gm.mu
ComponentMeans = 2×2
   -3.0377   -4.9859
    0.9812    2.0563
ComponentCovariances = gm.Sigma ComponentCovariances = ComponentCovariances(:,:,1) = 1.0132 0.0482
0.0482 0.9796
ComponentCovariances(:,:,2) = 1.9919 0.0127
0.0127 0.5533
MixtureProportions = gm.ComponentProportion
MixtureProportions = 1×2
    0.5000    0.5000
Fit four models to the data, each with an increasing number of components, and compare the Akaike Information Criterion (AIC) values. AIC = zeros(1,4); gm = cell(1,4); for k = 1:4 gm{k} = fitgmdist(X,k); AIC(k)= gm{k}.AIC; end
Display the number of components that minimizes the AIC value. [minAIC,numComponents] = min(AIC); numComponents numComponents = 2
The two-component model has the smallest AIC value. Display the two-component GMM. gm2 = gm{numComponents} gm2 = Gaussian mixture distribution with 2 components in 2 dimensions Component 1: Mixing proportion: 0.500000 Mean: -3.0377 -4.9859
Component 2: Mixing proportion: 0.500000 Mean: 0.9812 2.0563
Both the AIC and Bayesian information criteria (BIC) are likelihood-based measures of model fit that include a penalty for complexity (specifically, the number of parameters). You can use them to determine an appropriate number of components for a model when the number of components is unspecified.
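A minimal sketch of the same comparison using BIC (added here for illustration; the models in gm were already fit in the loop above):
BIC = cellfun(@(g) g.BIC, gm);           % BIC for the one- through four-component models
[minBIC,numComponentsBIC] = min(BIC)     % typically also selects two components for this data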
See Also fitgmdist | gmdistribution | mvnrnd | random
More About
• “Create Gaussian Mixture Model” on page 5-120
• “Simulate Data from Gaussian Mixture Model” on page 5-127
• “Cluster Using Gaussian Mixture Model” on page 17-39
Simulate Data from Gaussian Mixture Model This example shows how to simulate data from a Gaussian mixture model (GMM) using a fully specified gmdistribution object and the random function. Create a known, two-component GMM object. mu = [1 2;-3 -5]; sigma = cat(3,[2 0;0 .5],[1 0;0 1]); p = ones(1,2)/2; gm = gmdistribution(mu,sigma,p);
Plot the contour of the pdf of the GMM. gmPDF = @(x,y) arrayfun(@(x0,y0) pdf(gm,[x0 y0]),x,y); fcontour(gmPDF,[-10 10]); title('Contour lines of pdf');
Generate 1000 random variates from the GMM. rng('default') % For reproducibility X = random(gm,1000);
Plot the variates with the pdf contours.
hold on scatter(X(:,1),X(:,2),10,'.') % Scatter plot with points of size 10 title('Contour lines of pdf and Simulated Data')
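As a further illustration (not part of the original example), you can assign each simulated point to its most likely component with the cluster object function and tabulate the result:
idx = cluster(gm,X);   % component index (1 or 2) for each simulated observation
tabulate(idx)          % roughly half of the points fall in each component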
See Also fitgmdist | gmdistribution | mvnrnd | random
More About
• “Create Gaussian Mixture Model” on page 5-120
• “Fit Gaussian Mixture Model to Data” on page 5-123
• “Cluster Using Gaussian Mixture Model” on page 17-39
Copulas: Generate Correlated Samples In this section... “Determining Dependence Between Simulation Inputs” on page 5-129 “Constructing Dependent Bivariate Distributions” on page 5-132 “Using Rank Correlation Coefficients” on page 5-136 “Using Bivariate Copulas” on page 5-138 “Higher Dimension Copulas” on page 5-145 “Archimedean Copulas” on page 5-146 “Simulating Dependent Multivariate Data Using Copulas” on page 5-147 “Fitting Copulas to Data” on page 5-151 Copulas are functions that describe dependencies among variables, and provide a way to create distributions that model correlated multivariate data. Using a copula, you can construct a multivariate distribution by specifying marginal univariate distributions, and then choose a copula to provide a correlation structure between variables. Bivariate distributions, as well as distributions in higher dimensions, are possible.
Determining Dependence Between Simulation Inputs One of the design decisions for a Monte Carlo simulation is a choice of probability distributions for the random inputs. Selecting a distribution for each individual variable is often straightforward, but deciding what dependencies should exist between the inputs may not be. Ideally, input data to a simulation should reflect what you know about dependence among the real quantities you are modeling. However, there may be little or no information on which to base any dependence in the simulation. In such cases, it is useful to experiment with different possibilities in order to determine the model's sensitivity. It can be difficult to generate random inputs with dependence when they have distributions that are not from a standard multivariate distribution. Further, some of the standard multivariate distributions can model only limited types of dependence. It is always possible to make the inputs independent, and while that is a simple choice, it is not always sensible and can lead to the wrong conclusions. For example, a Monte-Carlo simulation of financial risk could have two random inputs that represent different sources of insurance losses. You could model these inputs as lognormal random variables. A reasonable question to ask is how dependence between these two inputs affects the results of the simulation. Indeed, you might know from real data that the same random conditions affect both sources; ignoring that in the simulation could lead to the wrong conclusions. Generate and Exponentiate Normal Random Variables The lognrnd function simulates independent lognormal random variables. In the following example, the mvnrnd function generates n pairs of independent normal random variables, and then exponentiates them. Notice that the covariance matrix used here is diagonal. n = 1000; sigma = .5; SigmaInd = sigma.^2 .* [1 0; 0 1]
SigmaInd = 2×2 0.2500 0
0 0.2500
rng('default'); % For reproducibility ZInd = mvnrnd([0 0],SigmaInd,n); XInd = exp(ZInd); plot(XInd(:,1),XInd(:,2),'.') axis([0 5 0 5]) axis equal xlabel('X1') ylabel('X2')
Dependent bivariate lognormal random variables are also easy to generate using a covariance matrix with nonzero off-diagonal terms. rho = .7; SigmaDep = sigma.^2 .* [1 rho; rho 1]
SigmaDep = 2×2
    0.2500    0.1750
    0.1750    0.2500
ZDep = mvnrnd([0 0],SigmaDep,n); XDep = exp(ZDep);
A second scatter plot demonstrates the difference between these two bivariate distributions. plot(XDep(:,1),XDep(:,2),'.') axis([0 5 0 5]) axis equal xlabel('X1') ylabel('X2')
It is clear that there is a tendency in the second data set for large values of X1 to be associated with large values of X2, and similarly for small values. The correlation parameter ρ of the underlying bivariate normal determines this dependence. The conclusions drawn from the simulation could well depend on whether you generate X1 and X2 with dependence. The bivariate lognormal distribution is a simple solution in this case; it easily generalizes to higher dimensions in cases where the marginal distributions are different lognormals. Other multivariate distributions also exist. For example, the multivariate t and the Dirichlet distributions simulate dependent t and beta random variables, respectively. But the list of simple multivariate distributions is not long, and they only apply in cases where the marginals are all in the same family (or even the exact same distributions). This can be a serious limitation in many situations. 5-131
Constructing Dependent Bivariate Distributions Although the construction discussed in the previous section creates a bivariate lognormal that is simple, it serves to illustrate a method that is more generally applicable.
1. Generate pairs of values from a bivariate normal distribution. There is statistical dependence between these two variables, and each has a normal marginal distribution.
2. Apply a transformation (the exponential function) separately to each variable, changing the marginal distributions into lognormals. The transformed variables still have a statistical dependence.
If a suitable transformation can be found, this method can be generalized to create dependent bivariate random vectors with other marginal distributions. In fact, a general method of constructing such a transformation does exist, although it is not as simple as exponentiation alone. By definition, applying the normal cumulative distribution function (cdf), denoted here by Φ, to a standard normal random variable results in a random variable that is uniform on the interval [0,1]. To see this, if Z has a standard normal distribution, then the cdf of U = Φ(Z) is

\Pr\{U \le u\} = \Pr\{\Phi(Z) \le u\} = \Pr\{Z \le \Phi^{-1}(u)\} = u
and that is the cdf of a Unif(0,1) random variable. Histograms of some simulated normal and transformed values demonstrate that fact: n = 1000; rng default % for reproducibility z = normrnd(0,1,n,1); % generate standard normal data histogram(z,-3.75:.5:3.75,'FaceColor',[.8 .8 1]) % plot the histogram of data xlim([-4 4]) title('1000 Simulated N(0,1) Random Values') xlabel('Z') ylabel('Frequency')
u = normcdf(z); % compute the cdf values of the sample data
figure histogram(u,.05:.1:.95,'FaceColor',[.8 .8 1]) % plot the histogram of the cdf values title('1000 Simulated N(0,1) Values Transformed to Unif(0,1)') xlabel('U') ylabel('Frequency')
Borrowing from the theory of univariate random number generation, applying the inverse cdf of any distribution, F, to a Unif(0,1) random variable results in a random variable whose distribution is exactly F (see “Inversion Methods” on page 7-3). The proof is essentially the opposite of the preceding proof for the forward case. Another histogram illustrates the transformation to a gamma distribution: x = gaminv(u,2,1); % transform to gamma values figure histogram(x,.25:.5:9.75,'FaceColor',[.8 .8 1]) % plot the histogram of gamma values title('1000 Simulated N(0,1) Values Transformed to Gamma(2,1)') xlabel('X') ylabel('Frequency')
You can apply this two-step transformation to each variable of a standard bivariate normal, creating dependent random variables with arbitrary marginal distributions. Because the transformation works on each component separately, the two resulting random variables need not even have the same marginal distributions. The transformation is defined as:

Z = [Z_1, Z_2] \sim N\!\left([0, 0], \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}\right)
U = [\Phi(Z_1), \Phi(Z_2)]
X = [G_1(U_1), G_2(U_2)]

where G1 and G2 are inverse cdfs of two possibly different distributions. For example, the following generates random vectors from a bivariate distribution with t5 and Gamma(2,1) marginals:

n = 1000; rho = .7;
Z = mvnrnd([0 0],[1 rho; rho 1],n);
U = normcdf(Z);
X = [gaminv(U(:,1),2,1) tinv(U(:,2),5)];
% draw the scatter plot of data with histograms figure scatterhist(X(:,1),X(:,2),'Direction','out')
This plot has histograms alongside a scatter plot to show both the marginal distributions, and the dependence.
Using Rank Correlation Coefficients The correlation parameter, ρ, of the underlying bivariate normal determines the dependence between X1 and X2 in this construction. However, the linear correlation of X1 and X2 is not ρ. For example, in the original lognormal case, a closed form for that correlation is:

\operatorname{cor}(X_1, X_2) = \frac{e^{\rho\sigma^2} - 1}{e^{\sigma^2} - 1}
which is strictly less than ρ, unless ρ is exactly 1. In more general cases such as the Gamma/t construction, the linear correlation between X1 and X2 is difficult or impossible to express in terms of ρ, but simulations show that the same effect happens. That is because the linear correlation coefficient expresses the linear dependence between random variables, and when nonlinear transformations are applied to those random variables, linear correlation is not preserved. Instead, a rank correlation coefficient, such as Kendall's τ or Spearman's ρ, is more appropriate. Roughly speaking, these rank correlations measure the degree to which large or small values of one random variable associate with large or small values of another. However, unlike the linear 5-136
correlation coefficient, they measure the association only in terms of ranks. As a consequence, the rank correlation is preserved under any monotonic transformation. In particular, the transformation method just described preserves the rank correlation. Therefore, knowing the rank correlation of the bivariate normal Z exactly determines the rank correlation of the final transformed random variables, X. While the linear correlation coefficient, ρ, is still needed to parameterize the underlying bivariate normal, Kendall's τ or Spearman's ρ are more useful in describing the dependence between random variables, because they are invariant to the choice of marginal distribution. For the bivariate normal, there is a simple one-to-one mapping between Kendall's τ or Spearman's ρ, and the linear correlation coefficient ρ:

\tau = \frac{2}{\pi}\arcsin(\rho) \quad \text{or} \quad \rho = \sin\!\left(\frac{\tau\pi}{2}\right)
\rho_s = \frac{6}{\pi}\arcsin\!\left(\frac{\rho}{2}\right) \quad \text{or} \quad \rho = 2\sin\!\left(\frac{\rho_s\pi}{6}\right)
The following plot shows the relationship. rho = -1:.01:1; tau = 2.*asin(rho)./pi; rho_s = 6.*asin(rho./2)./pi; plot(rho,tau,'b-','LineWidth',2) hold on plot(rho,rho_s,'g-','LineWidth',2) plot([-1 1],[-1 1],'k:','LineWidth',2) axis([-1 1 -1 1]) xlabel('rho') ylabel('Rank correlation coefficient') legend('Kendall''s {\it\tau}', ... 'Spearman''s {\it\rho_s}', ... 'location','NW')
Thus, it is easy to create the desired rank correlation between X1 and X2, regardless of their marginal distributions, by choosing the correct ρ parameter value for the linear correlation between Z1 and Z2. For the multivariate normal distribution, Spearman's rank correlation is almost identical to the linear correlation. However, this is not true once you transform to the final random variables.
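For example (an added sketch), to target a Kendall's τ of 0.5 with a Gaussian copula, you can compute the required linear correlation either from the formula above or with copulaparam:
tau = 0.5;
rho = sin(tau*pi/2)                                     % 0.7071
rho2 = copulaparam('Gaussian',tau,'type','kendall')     % same value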
Using Bivariate Copulas The first step of the construction described in the previous section defines what is known as a bivariate Gaussian copula. A copula is a multivariate probability distribution, where each random variable has a uniform marginal distribution on the unit interval [0,1]. These variables may be completely independent, deterministically related (e.g., U2 = U1), or anything in between. Because of the possibility for dependence among variables, you can use a copula to construct a new multivariate distribution for dependent variables. By transforming each of the variables in the copula separately using the inversion method, possibly using different cdfs, the resulting distribution can have arbitrary marginal distributions. Such multivariate distributions are often useful in simulations, when you know that the different random inputs are not independent of each other. Statistics and Machine Learning Toolbox™ functions compute: • Probability density functions (copulapdf) and the cumulative distribution functions (copulacdf) for Gaussian copulas 5-138
• Rank correlations from linear correlations (copulastat) and vice versa (copulaparam)
• Random vectors (copularnd)
• Parameters for copulas fit to data (copulafit)
For example, use the copularnd function to create scatter plots of random values from a bivariate Gaussian copula for various levels of ρ, to illustrate the range of different dependence structures. The family of bivariate Gaussian copulas is parameterized by the linear correlation matrix:

P = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}
U1 and U2 approach linear dependence as ρ approaches ±1, and approach complete independence as ρ approaches zero: n = 500; rng('default') % for reproducibility U = copularnd('Gaussian',[1 .8; .8 1],n); subplot(2,2,1) plot(U(:,1),U(:,2),'.') title('{\it\rho} = 0.8') xlabel('U1') ylabel('U2') U = copularnd('Gaussian',[1 .1; .1 1],n); subplot(2,2,2) plot(U(:,1),U(:,2),'.') title('{\it\rho} = 0.1') xlabel('U1') ylabel('U2') U = copularnd('Gaussian',[1 -.1; -.1 1],n); subplot(2,2,3) plot(U(:,1),U(:,2),'.') title('{\it\rho} = -0.1') xlabel('U1') ylabel('U2') U = copularnd('Gaussian',[1 -.8; -.8 1],n); subplot(2,2,4) plot(U(:,1),U(:,2),'.') title('{\it\rho} = -0.8') xlabel('U1') ylabel('U2')
The dependence between U1 and U2 is completely separate from the marginal distributions of X1 = G(U1) and X2 = G(U2). X1 and X2 can be given any marginal distributions, and still have the same rank correlation. This is one of the main appeals of copulas—they allow this separate specification of dependence and marginal distribution. You can also compute the pdf (copulapdf) and the cdf (copulacdf) for a copula. For example, these plots show the pdf and cdf for ρ = .8: u1 = linspace(1e-3,1-1e-3,50); u2 = linspace(1e-3,1-1e-3,50); [U1,U2] = meshgrid(u1,u2); Rho = [1 .8; .8 1]; f = copulapdf('t',[U1(:) U2(:)],Rho,5); f = reshape(f,size(U1)); figure surf(u1,u2,log(f),'FaceColor','interp','EdgeColor','none') view([-15,20]) xlabel('U1') ylabel('U2') zlabel('Probability Density')
u1 = linspace(1e-3,1-1e-3,50); u2 = linspace(1e-3,1-1e-3,50); [U1,U2] = meshgrid(u1,u2); F = copulacdf('t',[U1(:) U2(:)],Rho,5); F = reshape(F,size(U1)); figure() surf(u1,u2,F,'FaceColor','interp','EdgeColor','none') view([-15,20]) xlabel('U1') ylabel('U2') zlabel('Cumulative Probability')
A different family of copulas can be constructed by starting from a bivariate t distribution and transforming using the corresponding t cdf. The bivariate t distribution is parameterized with P, the linear correlation matrix, and ν, the degrees of freedom. Thus, for example, you can speak of a t1 or a t5 copula, based on the multivariate t with one and five degrees of freedom, respectively. Just as for Gaussian copulas, Statistics and Machine Learning Toolbox functions for t copulas compute: • Probability density functions (copulapdf) and the cumulative distribution functions (copulacdf) for t copulas • Rank correlations from linear correlations (copulastat) and vice versa (copulaparam) • Random vectors (copularnd) • Parameters for copulas fit to data (copulafit) For example, use the copularnd function to create scatter plots of random values from a bivariate t1 copula for various levels of ρ, to illustrate the range of different dependence structures: n = 500; nu = 1; rng('default') % for reproducibility U = copularnd('t',[1 .8; .8 1],nu,n); subplot(2,2,1) plot(U(:,1),U(:,2),'.') title('{\it\rho} = 0.8')
xlabel('U1') ylabel('U2') U = copularnd('t',[1 .1; .1 1],nu,n); subplot(2,2,2) plot(U(:,1),U(:,2),'.') title('{\it\rho} = 0.1') xlabel('U1') ylabel('U2') U = copularnd('t',[1 -.1; -.1 1],nu,n); subplot(2,2,3) plot(U(:,1),U(:,2),'.') title('{\it\rho} = -0.1') xlabel('U1') ylabel('U2') U = copularnd('t',[1 -.8; -.8 1],nu, n); subplot(2,2,4) plot(U(:,1),U(:,2),'.') title('{\it\rho} = -0.8') xlabel('U1') ylabel('U2')
A t copula has uniform marginal distributions for U1 and U2, just as a Gaussian copula does. The rank correlation τ or ρs between components in a t copula is also the same function of ρ as for a Gaussian. However, as these plots demonstrate, a t1 copula differs quite a bit from a Gaussian copula, even 5-143
when their components have the same rank correlation. The difference is in their dependence structure. Not surprisingly, as the degrees of freedom parameter ν is made larger, a tν copula approaches the corresponding Gaussian copula. As with a Gaussian copula, any marginal distributions can be imposed over a t copula. For example, using a t copula with 1 degree of freedom, you can again generate random vectors from a bivariate distribution with Gamma(2,1) and t5 marginals using copularnd: n = 1000; rho = .7; nu = 1; rng('default') % for reproducibility U = copularnd('t',[1 rho; rho 1],nu,n); X = [gaminv(U(:,1),2,1) tinv(U(:,2),5)]; figure scatterhist(X(:,1),X(:,2),'Direction','out')
Compared to the bivariate Gamma/t distribution constructed earlier, which was based on a Gaussian copula, the distribution constructed here, based on a t1 copula, has the same marginal distributions and the same rank correlation between variables but a very different dependence structure. This illustrates the fact that multivariate distributions are not uniquely defined by their marginal distributions, or by their correlations. The choice of a particular copula in an application may be based on actual observed data, or different copulas may be used as a way of determining the sensitivity of simulation results to the input distribution. 5-144
Higher Dimension Copulas The Gaussian and t copulas are known as elliptical copulas. It is easy to generalize elliptical copulas to a higher number of dimensions. For example, simulate data from a trivariate distribution with Gamma(2,1), Beta(2,2), and t5 marginals using a Gaussian copula and copularnd, as follows: n = 1000; Rho = [1 .4 .2; .4 1 -.8; .2 -.8 1]; rng('default') % for reproducibility U = copularnd('Gaussian',Rho,n); X = [gaminv(U(:,1),2,1) betainv(U(:,2),2,2) tinv(U(:,3),5)];
Plot the data. subplot(1,1,1) plot3(X(:,1),X(:,2),X(:,3),'.') grid on view([-55, 15]) xlabel('X1') ylabel('X2') zlabel('X3')
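For elliptical copulas the rank correlations have closed forms in terms of ρ; for a Gaussian copula, Spearman's ρs is (6/π)·asin(ρ/2). The following sketch is an addition to this example and compares those theoretical values with the sample rank correlations of X:

% Sketch: theoretical Spearman's rho for a Gaussian copula vs. sample values.
rhoSTheoretical = 6.*asin(Rho./2)./pi
rhoSSample = corr(X,'type','Spearman')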
Notice that the relationship between the linear correlation parameter ρ and, for example, Kendall's τ, holds for each entry in the correlation matrix P used here. You can verify that the sample rank correlations of the data are approximately equal to the theoretical values:
tauTheoretical = 2.*asin(Rho)./pi

tauTheoretical = 3×3

    1.0000    0.2620    0.1282
    0.2620    1.0000   -0.5903
    0.1282   -0.5903    1.0000

tauSample = corr(X,'type','Kendall')

tauSample = 3×3

    1.0000    0.2581    0.1414
    0.2581    1.0000   -0.5790
    0.1414   -0.5790    1.0000
Archimedean Copulas Statistics and Machine Learning Toolbox™ functions are available for three bivariate Archimedean copula families: • Clayton copulas • Frank copulas • Gumbel copulas These are one-parameter families that are defined directly in terms of their cdfs, rather than being defined constructively using a standard multivariate distribution. To compare these three Archimedean copulas to the Gaussian and t bivariate copulas, first use the copulastat function to find the rank correlation for a Gaussian or t copula with linear correlation parameter of 0.8, and then use the copulaparam function to find the Clayton copula parameter that corresponds to that rank correlation: tau = copulastat('Gaussian',.8 ,'type','kendall') tau = 0.5903 alpha = copulaparam('Clayton',tau,'type','kendall') alpha = 2.8820
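For the Clayton family the relationship between Kendall's τ and the copula parameter has a simple closed form, τ = α/(α + 2), so the copulaparam result above can be verified by hand. This small check is an addition to the example:

% Sketch: invert tau = alpha/(alpha+2) and recover tau from alpha.
alphaCheck = 2*tau/(1 - tau)            % approximately 2.88, matching copulaparam
tauCheck   = alphaCheck/(alphaCheck + 2) % recovers tau = 0.5903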
Finally, plot a random sample from the Clayton copula with copularnd. Repeat the same procedure for the Frank and Gumbel copulas: n = 500; U = copularnd('Clayton',alpha,n); subplot(3,1,1) plot(U(:,1),U(:,2),'.'); title(['Clayton Copula, {\it\alpha} = ',sprintf('%0.2f',alpha)]) xlabel('U1') ylabel('U2')
alpha = copulaparam('Frank',tau,'type','kendall'); U = copularnd('Frank',alpha,n); subplot(3,1,2) plot(U(:,1),U(:,2),'.') title(['Frank Copula, {\it\alpha} = ',sprintf('%0.2f',alpha)]) xlabel('U1') ylabel('U2') alpha = copulaparam('Gumbel',tau,'type','kendall'); U = copularnd('Gumbel',alpha,n); subplot(3,1,3) plot(U(:,1),U(:,2),'.') title(['Gumbel Copula, {\it\alpha} = ',sprintf('%0.2f',alpha)]) xlabel('U1') ylabel('U2')
Simulating Dependent Multivariate Data Using Copulas To simulate dependent multivariate data using a copula, you must specify each of the following:
• The copula family (and any shape parameters)
• The rank correlations among variables
• Marginal distributions for each variable
Suppose you have return data for two stocks and want to run a Monte Carlo simulation with inputs that follow the same distributions as the data: load stockreturns nobs = size(stocks,1); subplot(2,1,1) histogram(stocks(:,1),10,'FaceColor',[.8 .8 1]) xlim([-3.5 3.5]) xlabel('X1') ylabel('Frequency') subplot(2,1,2) histogram(stocks(:,2),10,'FaceColor',[.8 .8 1]) xlim([-3.5 3.5]) xlabel('X2') ylabel('Frequency')
You could fit a parametric model separately to each dataset, and use those estimates as the marginal distributions. However, a parametric model may not be sufficiently flexible. Instead, you can use a nonparametric model to transform to the marginal distributions. All that is needed is a way to compute the inverse cdf for the nonparametric model. The simplest nonparametric model is the empirical cdf, as computed by the ecdf function. For a discrete marginal distribution, this is appropriate. However, for a continuous distribution, use a model that is smoother than the step function computed by ecdf. One way to do that is to estimate the empirical cdf and interpolate between the midpoints of the steps with a piecewise linear function.
Another way is to use kernel smoothing with ksdensity. For example, compare the empirical cdf to a kernel smoothed cdf estimate for the first variable: [Fi,xi] = ecdf(stocks(:,1)); figure() stairs(xi,Fi,'b','LineWidth',2) hold on Fi_sm = ksdensity(stocks(:,1),xi,'function','cdf','width',.15); plot(xi,Fi_sm,'r-','LineWidth',1.5) xlabel('X1') ylabel('Cumulative Probability') legend('Empirical','Smoothed','Location','NW') grid on
For the simulation, experiment with different copulas and correlations. Here, you will use a bivariate t copula with a fairly small degrees of freedom parameter. For the correlation parameter, you can compute the rank correlation of the data. nu = 5; tau = corr(stocks(:,1),stocks(:,2),'type','kendall') tau = 0.5180
Find the corresponding linear correlation parameter for the t copula using copulaparam.
rho = copulaparam('t', tau, nu, 'type','kendall') rho = 0.7268
Next, use copularnd to generate random values from the t copula and transform using the nonparametric inverse cdfs. The ksdensity function allows you to make a kernel estimate of distribution and evaluate the inverse cdf at the copula points all in one step: n = 1000; U = copularnd('t',[1 rho; rho 1],nu,n); X1 = ksdensity(stocks(:,1),U(:,1),... 'function','icdf','width',.15); X2 = ksdensity(stocks(:,2),U(:,2),... 'function','icdf','width',.15);
Alternatively, when you have a large amount of data or need to simulate more than one set of values, it may be more efficient to compute the inverse cdf over a grid of values in the interval (0,1) and use interpolation to evaluate it at the copula points: p = linspace(0.00001,0.99999,1000); G1 = ksdensity(stocks(:,1),p,'function','icdf','width',0.15); X1 = interp1(p,G1,U(:,1),'spline'); G2 = ksdensity(stocks(:,2),p,'function','icdf','width',0.15); X2 = interp1(p,G2,U(:,2),'spline'); scatterhist(X1,X2,'Direction','out')
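As a quick sanity check on the simulation (an addition to this example), the rank correlation of the simulated values should be close to the value estimated from the original stock returns:

% Sketch: Kendall's tau of the simulated data, compared to tau = 0.5180 above.
tauSim = corr(X1,X2,'type','kendall')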
The marginal histograms of the simulated data are a smoothed version of the histograms for the original data. The amount of smoothing is controlled by the bandwidth input to ksdensity.
Fitting Copulas to Data This example shows how to use copulafit to calibrate copulas with data. To generate data Xsim with a distribution "just like" (in terms of marginal distributions and correlations) the distribution of data in the matrix X, you need to: (1) fit marginal distributions to the columns of X; (2) use appropriate cdf functions to transform X to U, so that U has values between 0 and 1; (3) use copulafit to fit a copula to U; (4) generate new data Usim from the copula; and (5) use appropriate inverse cdf functions to transform Usim to Xsim. Load and plot the simulated stock return data. load stockreturns x = stocks(:,1); y = stocks(:,2); scatterhist(x,y,'Direction','out')
Transform the data to the copula scale (unit square) using a kernel estimator of the cumulative distribution function. u = ksdensity(x,x,'function','cdf'); v = ksdensity(y,y,'function','cdf');
scatterhist(u,v,'Direction','out') xlabel('u') ylabel('v')
Fit a t copula.
[Rho,nu] = copulafit('t',[u v],'Method','ApproximateML')

Rho = 2×2

    1.0000    0.7220
    0.7220    1.0000

nu = 3.4516e+06
Generate a random sample from the t copula. r = copularnd('t',Rho,nu,1000); u1 = r(:,1); v1 = r(:,2); scatterhist(u1,v1,'Direction','out') xlabel('u') ylabel('v') set(get(gca,'children'),'marker','.')
Transform the random sample back to the original scale of the data. x1 = ksdensity(x,u1,'function','icdf'); y1 = ksdensity(y,v1,'function','icdf'); scatterhist(x1,y1,'Direction','out') set(get(gca,'children'),'marker','.')
As the example illustrates, copulas integrate naturally with other distribution fitting functions.
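The whole workflow can be collected in one place. The following sketch is an illustration added here, not part of the original example; the helper name simulateLikeData is hypothetical.

function Xsim = simulateLikeData(X,nsim)
% Simulate nsim rows whose marginal distributions and rank correlations
% mimic those of the data matrix X, using a t copula and kernel marginals.
    [n,d] = size(X);
    U = zeros(n,d);
    for j = 1:d
        % Transform each column to the copula (unit-interval) scale with a
        % kernel estimate of its cdf.
        U(:,j) = ksdensity(X(:,j),X(:,j),'function','cdf');
    end
    % Fit a t copula to the transformed data.
    [Rho,nu] = copulafit('t',U,'Method','ApproximateML');
    % Generate new points on the copula scale and map them back through the
    % kernel inverse cdf of each original column.
    Usim = copularnd('t',Rho,nu,nsim);
    Xsim = zeros(nsim,d);
    for j = 1:d
        Xsim(:,j) = ksdensity(X(:,j),Usim(:,j),'function','icdf');
    end
end

For example, Xsim = simulateLikeData(stocks(:,1:2),1000) would reproduce the two-stock simulation above in a single call.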
See Also copulacdf | copulafit | copulaparam | copulapdf | copulastat | copularnd
Related Examples
• “Generate Correlated Data Using Rank Correlation” on page 5-116
• “Simulating Dependent Random Variables Using Copulas” on page 5-155
Simulating Dependent Random Variables Using Copulas This example shows how to use copulas to generate data from multivariate distributions when there are complicated relationships among the variables, or when the individual variables are from different distributions. MATLAB® is an ideal tool for running simulations that incorporate random inputs or noise. Statistics and Machine Learning Toolbox™ provides functions to create sequences of random data according to many common univariate distributions. The Toolbox also includes a few functions to generate random data from multivariate distributions, such as the multivariate normal and multivariate t. However, there is no built-in way to generate multivariate distributions for all marginal distributions, or in cases where the individual variables are from different distributions. Recently, copulas have become popular in simulation models. Copulas are functions that describe dependencies among variables, and provide a way to create distributions to model correlated multivariate data. Using a copula, a data analyst can construct a multivariate distribution by specifying marginal univariate distributions, and choosing a particular copula to provide a correlation structure between variables. Bivariate distributions, as well as distributions in higher dimensions, are possible. In this example, we discuss how to use copulas to generate dependent multivariate random data in MATLAB, using Statistics and Machine Learning Toolbox. Dependence Between Simulation Inputs One of the design decisions for a Monte-Carlo simulation is a choice of probability distributions for the random inputs. Selecting a distribution for each individual variable is often straightforward, but deciding what dependencies should exist between the inputs may not be. Ideally, input data to a simulation should reflect what is known about dependence among the real quantities being modelled. However, there may be little or no information on which to base any dependence in the simulation, and in such cases, it is a good idea to experiment with different possibilities, in order to determine the model's sensitivity. However, it can be difficult to actually generate random inputs with dependence when they have distributions that are not from a standard multivariate distribution. Further, some of the standard multivariate distributions can model only very limited types of dependence. It's always possible to make the inputs independent, and while that is a simple choice, it's not always sensible and can lead to the wrong conclusions. For example, a Monte-Carlo simulation of financial risk might have random inputs that represent different sources of insurance losses. These inputs might be modeled as lognormal random variables. A reasonable question to ask is how dependence between these two inputs affects the results of the simulation. Indeed, it might be known from real data that the same random conditions affect both sources, and ignoring that in the simulation could lead to the wrong conclusions. Simulation of independent lognormal random variables is trivial. The simplest way would be to use the lognrnd function. Here, we'll use the mvnrnd function to generate n pairs of independent normal random variables, and then exponentiate them. Notice that the covariance matrix used here is diagonal, i.e., independence between the columns of Z. n = 1000; sigma = .5; SigmaInd = sigma.^2 .* [1 0; 0 1] SigmaInd =
0.2500 0
0 0.2500
ZInd = mvnrnd([0 0], SigmaInd, n); XInd = exp(ZInd); plot(XInd(:,1),XInd(:,2),'.'); axis equal; axis([0 5 0 5]); xlabel('X1'); ylabel('X2');
Dependent bivariate lognormal r.v.'s are also easy to generate, using a covariance matrix with nonzero off-diagonal terms. rho = .7; SigmaDep = sigma.^2 .* [1 rho; rho 1] SigmaDep = 0.2500 0.1750
0.1750 0.2500
ZDep = mvnrnd([0 0], SigmaDep, n); XDep = exp(ZDep);
A second scatter plot illustrates the difference between these two bivariate distributions. plot(XDep(:,1),XDep(:,2),'.'); axis equal; axis([0 5 0 5]); xlabel('X1'); ylabel('X2');
It's clear that there is more of a tendency in the second dataset for large values of X1 to be associated with large values of X2, and similarly for small values. This dependence is determined by the correlation parameter, rho, of the underlying bivariate normal. The conclusions drawn from the simulation could well depend on whether X1 and X2 were generated with dependence or not. The bivariate lognormal distribution is a simple solution in this case, and of course easily generalizes to higher dimensions and cases where the marginal distributions are different lognormals. Other multivariate distributions also exist, for example, the multivariate t and the Dirichlet distributions are used to simulate dependent t and beta random variables, respectively. But the list of simple multivariate distributions is not long, and they only apply in cases where the marginals are all in the same family (or even the exact same distributions). This can be a real limitation in many situations.
A More General Method for Constructing Dependent Bivariate Distributions
Although the above construction that creates a bivariate lognormal is simple, it serves to illustrate a method which is more generally applicable. First, we generate pairs of values from a bivariate normal distribution. There is statistical dependence between these two variables, and each has a normal marginal distribution. Next, a transformation (the exponential function) is applied separately to each
variable, changing the marginal distributions into lognormals. The transformed variables still have a statistical dependence. If a suitable transformation could be found, this method could be generalized to create dependent bivariate random vectors with other marginal distributions. In fact, a general method of constructing such a transformation does exist, although not as simple as just exponentiation. By definition, applying the normal CDF (denoted here by PHI) to a standard normal random variable results in a r.v. that is uniform on the interval [0, 1]. To see this, if Z has a standard normal distribution, then the CDF of U = PHI(Z) is Pr{U <= u} = Pr{PHI(Z) <= u} = Pr{Z <= PHI^(-1)(u)} = PHI(PHI^(-1)(u)) = u, which is the CDF of a Unif(0,1) random variable.

Modelling Tail Data with the Generalized Pareto Distribution

The generalized Pareto (GP) distribution is parameterized with a shape parameter, k, and a scale parameter, sigma. When k < 0, the GP has zero probability above an upper limit of -(sigma/k). When k >= 0, the GP has no upper limit. Also, the GP is often used in conjunction with a third, threshold parameter that shifts the lower limit away from zero. We will not need that generality here. The GP distribution is a generalization of both the exponential distribution (k = 0) and the Pareto distribution (k > 0). The GP includes those two distributions in a larger family so that a continuous range of shapes is possible.
Simulating Exceedance Data
The GP distribution can be defined constructively in terms of exceedances. Starting with a probability distribution whose right tail drops off to zero, such as the normal, we can sample random values independently from that distribution. If we fix a threshold value, throw out all the values that are below the threshold, and subtract the threshold off of the values that are not thrown out, the result is known as exceedances. The distribution of the exceedances is approximately a GP. Similarly, we can set a threshold in the left tail of a distribution, and ignore all values above that threshold. The threshold must be far enough out in the tail of the original distribution for the approximation to be reasonable. The original distribution determines the shape parameter, k, of the resulting GP distribution. Distributions whose tails fall off as a polynomial, such as Student's t, lead to a positive shape parameter. Distributions whose tails decrease exponentially, such as the normal, correspond to a zero shape parameter. Distributions with finite tails, such as the beta, correspond to a negative shape parameter.
Real-world applications for the GP distribution include modelling extremes of stock market returns, and modelling extreme floods. For this example, we'll use simulated data, generated from a Student's t distribution with 5 degrees of freedom. We'll take the largest 5% of 2000 observations from the t distribution, and then subtract off the 95% quantile to get exceedances. rng(3,'twister'); x = trnd(5,2000,1); q = quantile(x,.95); y = x(x>q) - q; n = numel(y) n = 100
Fitting the Distribution Using Maximum Likelihood The GP distribution is defined for 0 < sigma, and -Inf < k < Inf. However, interpretation of the results of maximum likelihood estimation is problematic when k < -1/2. Fortunately, those cases correspond to fitting tails from distributions like the beta or triangular, and so will not present a problem here.
paramEsts = gpfit(y);
kHat = paramEsts(1)      % Tail index parameter
sigmaHat = paramEsts(2)  % Scale parameter

kHat = 0.0987
sigmaHat = 0.7156
As might be expected, since the simulated data were generated using a t distribution, the estimate of k is positive. Checking the Fit Visually To visually assess how good the fit is, we'll plot a scaled histogram of the tail data, overlaid with the density function of the GP that we've estimated. The histogram is scaled so that the bar heights times their width sum to 1. bins = 0:.25:7; h = bar(bins,histc(y,bins)/(length(y)*.25),'histc'); h.FaceColor = [.9 .9 .9]; ygrid = linspace(0,1.1*max(y),100); line(ygrid,gppdf(ygrid,kHat,sigmaHat)); xlim([0,6]); xlabel('Exceedance'); ylabel('Probability Density');
We've used a fairly small bin width, so there is a good deal of noise in the histogram. Even so, the fitted density follows the shape of the data, and so the GP model seems to be a good choice. We can also compare the empirical CDF to the fitted CDF. [F,yi] = ecdf(y); plot(yi,gpcdf(yi,kHat,sigmaHat),'-'); hold on; stairs(yi,F,'r'); hold off; legend('Fitted Generalized Pareto CDF','Empirical CDF','location','southeast');
Computing Standard Errors for the Parameter Estimates To quantify the precision of the estimates, we'll use standard errors computed from the asymptotic covariance matrix of the maximum likelihood estimators. The function gplike computes, as its second output, a numerical approximation to that covariance matrix. Alternatively, we could have called gpfit with two output arguments, and it would have returned confidence intervals for the parameters. [nll,acov] = gplike(paramEsts, y); stdErr = sqrt(diag(acov)) stdErr = 0.1158 0.1093
These standard errors indicate that the relative precision of the estimate for k is quite a bit lower than that for sigma -- its standard error is on the order of the estimate itself. Shape parameters are often difficult to estimate. It's important to keep in mind that computation of these standard errors assumed that the GP model is correct, and that we have enough data for the asymptotic approximation to the covariance matrix to hold.
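If the normal approximation were taken at face value, the standard errors would translate directly into large-sample intervals. The following Wald-type intervals are a sketch added here for illustration; the example itself relies on gpfit's confidence intervals later on.

% Sketch: approximate 95% intervals from the asymptotic standard errors,
% assuming near-normality of the maximum likelihood estimators.
kWaldCI     = kHat     + [-1 1]*1.96*stdErr(1)
sigmaWaldCI = sigmaHat + [-1 1]*1.96*stdErr(2)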
Checking the Asymptotic Normality Assumption Interpretation of the standard errors usually involves assuming that, if the same fit could be repeated many times on data that came from the same source, the maximum likelihood estimates of the parameters would approximately follow a normal distribution. For example, confidence intervals are often based this assumption. However, that normal approximation may or may not be a good one. To assess how good it is in this example, we can use a bootstrap simulation. We will generate 1000 replicate datasets by resampling from the data, fit a GP distribution to each one, and save all the replicate estimates. replEsts = bootstrp(1000,@gpfit,y);
As a rough check on the sampling distribution of the parameter estimators, we can look at histograms of the bootstrap replicates. subplot(2,1,1); hist(replEsts(:,1)); title('Bootstrap estimates of k'); subplot(2,1,2); hist(replEsts(:,2)); title('Bootstrap estimates of sigma');
Using a Parameter Transformation The histogram of the bootstrap estimates for k appears to be only a little asymmetric, while that for the estimates of sigma definitely appears skewed to the right. A common remedy for that skewness is
to estimate the parameter and its standard error on the log scale, where a normal approximation may be more reasonable. A Q-Q plot is a better way to assess normality than a histogram, because nonnormality shows up as points that do not approximately follow a straight line. Let's check that to see if the log transform for sigma is appropriate. subplot(1,2,1); qqplot(replEsts(:,1)); title('Bootstrap estimates of k'); subplot(1,2,2); qqplot(log(replEsts(:,2))); title('Bootstrap estimates of log(sigma)');
The bootstrap estimates for k and log(sigma) appear acceptably close to normality. A Q-Q plot for the estimates of sigma, on the unlogged scale, would confirm the skewness that we've already seen in the histogram. Thus, it would be more reasonable to construct a confidence interval for sigma by first computing one for log(sigma) under the assumption of normality, and then exponentiating to transform that interval back to the original scale for sigma. In fact, that's exactly what the function gpfit does behind the scenes.
[paramEsts,paramCI] = gpfit(y);
kHat
kCI = paramCI(:,1)

kHat = 0.0987
kCI =
   -0.1283
    0.3258

sigmaHat
sigmaCI = paramCI(:,2)

sigmaHat = 0.7156
sigmaCI =
    0.5305
    0.9654
Notice that while the 95% confidence interval for k is symmetric about the maximum likelihood estimate, the confidence interval for sigma is not. That's because it was created by transforming a symmetric CI for log(sigma).
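To make that construction concrete, here is a small sketch using the delta-method standard error computed earlier. It is an illustration of the idea, not necessarily the exact computation gpfit performs internally.

% Sketch: build a symmetric CI for log(sigma), then exponentiate.
% Delta method: se(log(sigmaHat)) is approximately se(sigmaHat)/sigmaHat.
seLogSigma     = stdErr(2)/sigmaHat;
logSigmaCI     = log(sigmaHat) + [-1 1]*1.96*seLogSigma;
sigmaCIFromLog = exp(logSigmaCI)   % asymmetric about sigmaHat, like gpfit's CI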
Modelling Data with the Generalized Extreme Value Distribution This example shows how to fit the generalized extreme value distribution using maximum likelihood estimation. The extreme value distribution is used to model the largest or smallest value from a group or block of data. Three types of extreme value distributions are common, each as the limiting case for different types of underlying distributions. For example, the type I extreme value is the limit distribution of the maximum (or minimum) of a block of normally distributed data, as the block size becomes large. In this example, we will illustrate how to fit such data using a single distribution that includes all three types of extreme value distributions as special case, and investigate likelihood-based confidence intervals for quantiles of the fitted distribution. The Generalized Extreme Value Distribution The Generalized Extreme Value (GEV) distribution unites the type I, type II, and type III extreme value distributions into a single family, to allow a continuous range of possible shapes. It is parameterized with location and scale parameters, mu and sigma, and a shape parameter, k. When k < 0, the GEV is equivalent to the type III extreme value. When k > 0, the GEV is equivalent to the type II. In the limit as k approaches 0, the GEV becomes the type I. x = linspace(-3,6,1000); plot(x,gevpdf(x,-.5,1,0),'-', x,gevpdf(x,0,1,0),'-', x,gevpdf(x,.5,1,0),'-'); xlabel('(x-mu) / sigma'); ylabel('Probability Density'); legend({'k < 0, Type III' 'k = 0, Type I' 'k > 0, Type II'});
Notice that for k < 0 or k > 0, the density has zero probability above or below, respectively, the upper or lower bound -(1/k). In the limit as k approaches 0, the GEV is unbounded. This can be summarized as the constraint that 1+k*(y-mu)/sigma must be positive. Simulating Block Maximum Data The GEV can be defined constructively as the limiting distribution of block maxima (or minima). That is, if you generate a large number of independent random values from a single probability distribution, and take their maximum value, the distribution of that maximum is approximately a GEV. The original distribution determines the shape parameter, k, of the resulting GEV distribution. Distributions whose tails fall off as a polynomial, such as Student's t, lead to a positive shape parameter. Distributions whose tails decrease exponentially, such as the normal, correspond to a zero shape parameter. Distributions with finite tails, such as the beta, correspond to a negative shape parameter. Real applications for the GEV might include modelling the largest return for a stock during each month. Here, we will simulate data by taking the maximum of 25 values from a Student's t distribution with two degrees of freedom. The simulated data will include 75 random block maximum values. rng(0,'twister'); y = max(trnd(2,25,75),[],1);
Fitting the Distribution by Maximum Likelihood The function gevfit returns both maximum likelihood parameter estimates, and (by default) 95% confidence intervals.
[paramEsts,paramCIs] = gevfit(y);
kMLE = paramEsts(1)      % Shape parameter
sigmaMLE = paramEsts(2)  % Scale parameter
muMLE = paramEsts(3)     % Location parameter

kMLE = 0.4901
sigmaMLE = 1.4856
muMLE = 2.9710

kCI = paramCIs(:,1)
sigmaCI = paramCIs(:,2)
muCI = paramCIs(:,3)

kCI =
    0.2020
    0.7782
sigmaCI =
    1.1431
    1.9307
muCI =
    2.5599
    3.3821
Notice that the 95% confidence interval for k does not include the value zero. The type I extreme value distribution is apparently not a good model for these data. That makes sense, because the underlying distribution for the simulation had much heavier tails than a normal, and the type II extreme value distribution is theoretically the correct one as the block size becomes large. As an alternative to confidence intervals, we can also compute an approximation to the asymptotic covariance matrix of the parameter estimates, and from that extract the parameter standard errors.
[nll,acov] = gevlike(paramEsts,y); paramSEs = sqrt(diag(acov)) paramSEs = 0.1470 0.1986 0.2097
Checking the Fit Visually To visually assess how good the fit is, we'll look at plots of the fitted probability density function (PDF) and cumulative distribution function (CDF). The support of the GEV depends on the parameter values. In this case, the estimate for k is positive, so the fitted distribution has zero probability below a lower bound. lowerBnd = muMLE-sigmaMLE./kMLE;
First, we'll plot a scaled histogram of the data, overlaid with the PDF for the fitted GEV model. This histogram is scaled so that the bar heights times their width sum to 1, to make it comparable to the PDF. ymax = 1.1*max(y); bins = floor(lowerBnd):ceil(ymax); h = bar(bins,histc(y,bins)/length(y),'histc'); h.FaceColor = [.9 .9 .9]; ygrid = linspace(lowerBnd,ymax,100); line(ygrid,gevpdf(ygrid,kMLE,sigmaMLE,muMLE)); xlabel('Block Maximum'); ylabel('Probability Density'); xlim([lowerBnd ymax]);
We can also compare the fit to the data in terms of cumulative probability, by overlaying the empirical CDF and the fitted CDF. [F,yi] = ecdf(y); plot(ygrid,gevcdf(ygrid,kMLE,sigmaMLE,muMLE),'-'); hold on; stairs(yi,F,'r'); hold off; xlabel('Block Maximum'); ylabel('Cumulative Probability'); legend('Fitted Generalized Extreme Value CDF','Empirical CDF','location','southeast'); xlim([lowerBnd ymax]);
Estimating Quantiles of the Model While the parameter estimates may be important by themselves, a quantile of the fitted GEV model is often the quantity of interest in analyzing block maxima data. For example, the return level Rm is defined as the block maximum value expected to be exceeded only once in m blocks. That is just the (1-1/m)'th quantile. We can plug the maximum likelihood parameter estimates into the inverse CDF to estimate Rm for m=10. R10MLE = gevinv(1-1./10,kMLE,sigmaMLE,muMLE) R10MLE = 9.0724
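The same calculation extends to other return periods. The following sketch, added here for illustration, evaluates the fitted quantile for several values of m:

% Sketch: return levels for several block counts m, each computed as the
% (1-1/m) quantile of the fitted GEV distribution.
m  = [10 20 50 100];
Rm = gevinv(1-1./m,kMLE,sigmaMLE,muMLE)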
We could compute confidence limits for R10 using asymptotic approximations, but those may not be valid. Instead, we will use a likelihood-based method to compute confidence limits. This method often produces more accurate results than one based on the estimated covariance matrix of the parameter estimates. Given any set of values for the parameters mu, sigma, and k, we can compute a log-likelihood -- for example, the MLEs are the parameter values that maximize the GEV log-likelihood. As the parameter values move away from the MLEs, their log-likelihood typically becomes significantly less than the maximum. If we look at the set of parameter values that produce a log-likelihood larger than a specified critical value, this is a complicated region in the parameter space. However, for a suitable
critical value, it is a confidence region for the model parameters. The region contains parameter values that are "compatible with the data". The critical value that determines the region is based on a chi-square approximation, and we'll use 95% as our confidence level. (Note that we will actually work with the negative of the log-likelihood.) nllCritVal = gevlike([kMLE,sigmaMLE,muMLE],y) + .5*chi2inv(.95,1) nllCritVal = 170.3044
For any set of parameter values mu, sigma, and k, we can compute R10. Therefore, we can find the smallest R10 value achieved within the critical region of the parameter space where the negative loglikelihood is larger than the critical value. That smallest value is the lower likelihood-based confidence limit for R10. This is difficult to visualize in all three parameter dimensions, but as a thought experiment, we can fix the shape parameter, k, we can see how the procedure would work over the two remaining parameters, sigma and mu. sigmaGrid = linspace(.8, 2.25, 110); muGrid = linspace(2.4, 3.6); nllGrid = zeros(length(sigmaGrid),length(muGrid)); R10Grid = zeros(length(sigmaGrid),length(muGrid)); for i = 1:size(nllGrid,1) for j = 1:size(nllGrid,2) nllGrid(i,j) = gevlike([kMLE,sigmaGrid(i),muGrid(j)],y); R10Grid(i,j) = gevinv(1-1./10,kMLE,sigmaGrid(i),muGrid(j)); end end nllGrid(nllGrid>gevlike([kMLE,sigmaMLE,muMLE],y)+6) = NaN; contour(muGrid,sigmaGrid,R10Grid,6.14:.64:12.14,'LineColor','r'); hold on contour(muGrid,sigmaGrid,R10Grid,[7.42 11.26],'LineWidth',2,'LineColor','r'); contour(muGrid,sigmaGrid,nllGrid,[168.7 169.1 169.6 170.3:1:173.3],'LineColor','b'); contour(muGrid,sigmaGrid,nllGrid,[nllCritVal nllCritVal],'LineWidth',2,'LineColor','b'); hold off axis([2.4 3.6 .8 2.25]); xlabel('mu'); ylabel('sigma');
The blue contours represent the log-likelihood surface, and the bold blue contour is the boundary of the critical region. The red contours represent the surface for R10 -- larger values are to the top right, lower to the bottom left. The contours are straight lines because for fixed k, Rm is a linear function of sigma and mu. The bold red contours are the lowest and highest values of R10 that fall within the critical region. In the full three dimensional parameter space, the log-likelihood contours would be ellipsoidal, and the R10 contours would be surfaces. Finding the lower confidence limit for R10 is an optimization problem with nonlinear inequality constraints, and so we will use the function fmincon from the Optimization Toolbox™. We need to find the smallest R10 value, and therefore the objective to be minimized is R10 itself, equal to the inverse CDF evaluated for p=1-1/m. We'll create a wrapper function that computes Rm specifically for m=10. CIobjfun = @(params) gevinv(1-1./10,params(1),params(2),params(3));
To perform the constrained optimization, we'll also need a function that defines the constraint, that is, that the negative log-likelihood be less than the critical value. The constraint function should return positive values when the constraint is violated. We'll create an anonymous function, using the simulated data and the critical log-likelihood value. It also returns an empty value because we're not using any equality constraints here. CIconfun = @(params) deal(gevlike(params,y) - nllCritVal, []);
Finally, we call fmincon, using the active-set algorithm to perform the constrained optimization. opts = optimset('Algorithm','active-set', 'Display','notify', 'MaxFunEvals',500, ... 'RelLineSrchBnd',.1, 'RelLineSrchBndDuration',Inf);
[params,R10Lower,flag,output] = ... fmincon(CIobjfun,paramEsts,[],[],[],[],[],[],CIconfun,opts); Feasible point with lower objective function value found.
To find the upper likelihood confidence limit for R10, we simply reverse the sign on the objective function to find the largest R10 value in the critical region, and call fmincon a second time.
CIobjfun = @(params) -gevinv(1-1./10,params(1),params(2),params(3));
[params,R10Upper,flag,output] = ...
    fmincon(CIobjfun,paramEsts,[],[],[],[],[],[],CIconfun,opts);
R10Upper = -R10Upper;
R10CI = [R10Lower, R10Upper]

R10CI = 7.0841   13.4452
plot(ygrid,gevcdf(ygrid,kMLE,sigmaMLE,muMLE),'-'); hold on; stairs(yi,F,'r'); plot(R10CI([1 1 1 1 2 2 2 2]), [.88 .92 NaN .9 .9 NaN .88 .92],'k-') hold off; xlabel('Block Maximum'); ylabel('Cumulative Probability'); legend('Fitted Generalized Extreme Value CDF','Empirical CDF', ... 'R_{10} 95% CI','location','southeast'); xlim([lowerBnd ymax]);
Likelihood Profile for a Quantile Sometimes just an interval does not give enough information about the quantity being estimated, and a profile likelihood is needed instead. To find the log-likelihood profile for R10, we will fix a possible value for R10, and then maximize the GEV log-likelihood, with the parameters constrained so that they are consistent with that current value of R10. This is a nonlinear equality constraint. If we do that over a range of R10 values, we get a likelihood profile. As with the likelihood-based confidence interval, we can think about what this procedure would be if we fixed k and worked over the two remaining parameters, sigma and mu. Each red contour line in the contour plot shown earlier represents a fixed value of R10; the profile likelihood optimization consists of stepping along a single R10 contour line to find the highest log-likelihood (blue) contour. For this example, we'll compute a profile likelihood for R10 over the values that were included in the likelihood confidence interval. R10grid = linspace(R10CI(1)-.05*diff(R10CI), R10CI(2)+.05*diff(R10CI), 51);
The objective function for the profile likelihood optimization is simply the log-likelihood, using the simulated data. PLobjfun = @(params) gevlike(params,y);
To use fmincon, we'll need a function that returns non-zero values when the constraint is violated, that is, when the parameters are not consistent with the current value of R10. For each value of R10, we'll create an anonymous function for the particular value of R10 under consideration. It also returns an empty value because we're not using any inequality constraints here.
Finally, we'll call fmincon at each value of R10, to find the corresponding constrained maximum of the log-likelihood. We'll start near the maximum likelihood estimate of R10, and work out in both directions. Lprof = nan(size(R10grid)); params = paramEsts; [dum,peak] = min(abs(R10grid-R10MLE)); for i = peak:1:length(R10grid) PLconfun = ... @(params) deal([], gevinv(1-1./10,params(1),params(2),params(3)) - R10grid(i)); [params,Lprof(i),flag,output] = ... fmincon(PLobjfun,params,[],[],[],[],[],[],PLconfun,opts); end params = paramEsts; for i = peak-1:-1:1 PLconfun = ... @(params) deal([], gevinv(1-1./10,params(1),params(2),params(3)) - R10grid(i)); [params,Lprof(i),flag,output] = ... fmincon(PLobjfun,params,[],[],[],[],[],[],PLconfun,opts); end plot(R10grid,-Lprof,'-', R10MLE,-gevlike(paramEsts,y),'ro', ... [R10grid(1), R10grid(end)],[-nllCritVal,-nllCritVal],'k--'); xlabel('R_{10}'); ylabel('Log-Likelihood'); legend('Profile likelihood','MLE','95% Conf. Limit');
Curve Fitting and Distribution Fitting This example shows how to perform curve fitting and distribution fitting, and discusses when each method is appropriate. Choose Between Curve Fitting and Distribution Fitting Curve fitting and distribution fitting are different types of data analysis.
• Use curve fitting when you want to model a response variable as a function of a predictor variable.
• Use distribution fitting when you want to model the probability distribution of a single variable.
Curve Fitting In the following experimental data, the predictor variable is time, the time after the ingestion of a drug. The response variable is conc, the concentration of the drug in the bloodstream. Assume that only the response data conc is affected by experimental error.
time = [ 0.1   0.1   0.3   0.3   1.3   1.7   2.1   2.6   3.9   3.9 ...
         5.1   5.6   6.2   6.4   7.7   8.1   8.2   8.9   9.0   9.5 ...
         9.6  10.2  10.3  10.8  11.2  11.2  11.2  11.7  12.1  12.3 ...
        12.3  13.1  13.2  13.4  13.7  14.0  14.3  15.4  16.1  16.1 ...
        16.4  16.4  16.7  16.7  17.5  17.6  18.1  18.5  19.3  19.7]';
conc = [0.01  0.08  0.13  0.16  0.55  0.90  1.11  1.62  1.79  1.59 ...
        1.83  1.68  2.09  2.17  2.66  2.08  2.26  1.65  1.70  2.39 ...
        2.08  2.02  1.65  1.96  1.91  1.30  1.62  1.57  1.32  1.56 ...
        1.36  1.05  1.29  1.32  1.20  1.10  0.88  0.63  0.69  0.69 ...
        0.49  0.53  0.42  0.48  0.41  0.27  0.36  0.33  0.17  0.20]';
Suppose you want to model blood concentration as a function of time. Plot conc against time. plot(time,conc,'o'); xlabel('Time'); ylabel('Blood Concentration');
Assume that conc follows a two-parameter Weibull curve as a function of time. A Weibull curve has the form and parameters

y = c (x/a)^(b−1) e^(−(x/a)^b),
where a is a horizontal scaling, b is a shape parameter, and c is a vertical scaling. Fit the Weibull model using nonlinear least squares. modelFun = @(p,x) p(3) .* (x./p(1)).^(p(2)-1) .* exp(-(x./p(1)).^p(2)); startingVals = [10 2 5]; nlModel = fitnlm(time,conc,modelFun,startingVals);
Plot the Weibull curve onto the data. xgrid = linspace(0,20,100)'; line(xgrid,predict(nlModel,xgrid),'Color','r');
The fitted Weibull model is problematic. fitnlm assumes the experimental errors are additive and come from a symmetric distribution with constant variance. However, the scatter plot shows that the error variance is proportional to the height of the curve. Furthermore, the additive, symmetric errors imply that a negative blood concentration measurement is possible. A more realistic assumption is that multiplicative errors are symmetric on the log scale. Under that assumption, fit a Weibull curve to the data by taking the log of both sides. Use nonlinear least squares to fit the curve:

log(y) = log(c) + (b − 1) log(x/a) − (x/a)^b.

nlModel2 = fitnlm(time,log(conc),@(p,x) log(modelFun(p,x)),startingVals);
Add the new curve to the existing plot. line(xgrid,exp(predict(nlModel2,xgrid)),'Color',[0 .5 0],'LineStyle','--'); legend({'Raw Data','Additive Errors Model','Multiplicative Errors Model'});
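Before moving on, a quick residual check on the log scale helps confirm the constant-variance assumption behind the multiplicative errors model. This sketch is an addition to the example:

% Sketch: residuals of the log-scale fit plotted against fitted values;
% a roughly constant spread supports the multiplicative errors assumption.
figure
plot(predict(nlModel2,time), nlModel2.Residuals.Raw, 'o')
yline(0)
xlabel('Fitted log-concentration')
ylabel('Residual (log scale)')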
The model object nlModel2 contains estimates of precision. A best practice is to check the model's goodness of fit. For example, make residual plots on the log scale to check the assumption of constant variance for the multiplicative errors. In this example, using the multiplicative errors model has little effect on the model predictions. For an example where the type of model has more of an impact, see “Pitfalls in Fitting Nonlinear Models by Transforming to Linearity” on page 13-53.
Functions for Curve Fitting
• Statistics and Machine Learning Toolbox™ includes these functions for fitting models: fitnlm for nonlinear least-squares models, fitglm for generalized linear models, fitrgp for Gaussian process regression models, and fitrsvm for support vector machine regression models.
• Curve Fitting Toolbox™ provides command line and graphical tools that simplify tasks in curve fitting. For example, the toolbox provides automatic choice of starting coefficient values for various models, as well as robust and nonparametric fitting methods.
• Optimization Toolbox™ has functions for performing complicated types of curve fitting analyses, such as analyzing models with constraints on the coefficients.
• The MATLAB® function polyfit fits polynomial models, and the MATLAB function fminsearch is useful in other kinds of curve fitting.
Distribution Fitting
Suppose you want to model the distribution of electrical component lifetimes. The variable life measures the time to failure for 50 identical electrical components.
life = [ 6.2  8.1  9.8  1.2  3.1  6.4 18.3  3.9  7.3  8.2 ...
        16.1  8.8 11.3  4.3 13.6  3.5 15.9  6.5  8.3  2.3 ...
        16.3  5.9  5.1  2.9 14.5 11.4  8.7  3.4  5.7  6.5 ...
        19.0 16.1 10.8 14.8  5.2  9.3  3.0  8.5  4.0 10.4 ...
        12.2 12.8  6.7  4.6  5.3 12.4 12.1  0.9  9.9  7.9]';
Visualize the data with a histogram. binWidth = 2; lastVal = ceil(max(life)); binEdges = 0:binWidth:lastVal+1; h = histogram(life,binEdges); xlabel('Time to Failure'); ylabel('Frequency'); ylim([0 10]);
Because lifetime data often follows a Weibull distribution, one approach might be to use the Weibull curve from the previous curve fitting example to fit the histogram. To try this approach, convert the histogram to a set of points (x,y), where x is a bin center and y is a bin height, and then fit a curve to those points. counts = histcounts(life,binEdges); binCtrs = binEdges(1:end-1) + binWidth/2; h.FaceColor = [.9 .9 .9]; hold on plot(binCtrs,counts,'o'); hold off
Fitting a curve to a histogram, however, is problematic and usually not recommended.
1. The process violates basic assumptions of least-squares fitting. The bin counts are nonnegative, implying that measurement errors cannot be symmetric. Also, the bin counts have different variability in the tails than in the center of the distribution. Finally, the bin counts have a fixed sum, implying that they are not independent measurements.
2. If you fit a Weibull curve to the bar heights, you have to constrain the curve because the histogram is a scaled version of an empirical probability density function (pdf).
3. For continuous data, fitting a curve to a histogram rather than data discards information.
4. The bar heights in the histogram are dependent on the choice of bin edges and bin widths.
For many parametric distributions, maximum likelihood is a better way to estimate parameters because it avoids these problems. The Weibull pdf has almost the same form as the Weibull curve:

y = (b/a)(x/a)^(b−1) e^(−(x/a)^b).
However, b/a replaces the scale parameter c because the function must integrate to 1. To fit a Weibull distribution to the data using maximum likelihood, use fitdist and specify 'Weibull' as the distribution name. Unlike least squares, maximum likelihood finds a Weibull pdf that best matches the scaled histogram without minimizing the sum of the squared differences between the pdf and the bar heights. pd = fitdist(life,'Weibull');
Plot a scaled histogram of the data and superimpose the fitted pdf.
h = histogram(life,binEdges,'Normalization','pdf','FaceColor',[.9 .9 .9]); xlabel('Time to Failure'); ylabel('Probability Density'); ylim([0 0.1]); xgrid = linspace(0,20,100)'; pdfEst = pdf(pd,xgrid); line(xgrid,pdfEst)
A best practice is to check the model's goodness of fit. Although fitting a curve to a histogram is usually not recommended, the process is appropriate in some cases. For an example, see “Fit Custom Distributions” on page 5-173. Functions for Distribution Fitting • Statistics and Machine Learning Toolbox™ includes the function fitdist for fitting probability distribution objects to data. It also includes dedicated fitting functions (such as wblfit) for fitting parametric distributions using maximum likelihood, the function mle for fitting custom distributions without dedicated fitting functions, and the function ksdensity for fitting nonparametric distribution models to data. • Statistics and Machine Learning Toolbox additionally provides the Distribution Fitter app, which simplifies many tasks in distribution fitting, such as generating visualizations and diagnostic plots. • Functions in Optimization Toolbox™ enable you to fit complicated distributions, including those with constraints on the parameters.
• The MATLAB® function fminsearch provides maximum likelihood distribution fitting.
See Also fitnlm | fitglm | fitrgp | fitrsvm | polyfit | fminsearch | fitdist | mle | ksdensity | Distribution Fitter
More About •
“Supported Distributions” on page 5-16
Fitting a Univariate Distribution Using Cumulative Probabilities This example shows how to fit univariate distributions using least squares estimates of the cumulative distribution functions. This is a generally-applicable method that can be useful in cases when maximum likelihood fails, for instance some models that include a threshold parameter. The most common method for fitting a univariate distribution to data is maximum likelihood. But maximum likelihood does not work in all cases, and other estimation methods, such as the Method of Moments, are sometimes needed. When applicable, maximum likelihood is probably the better choice of methods, because it is often more efficient. But the method described here provides another tool that can be used when needed. Fitting an Exponential Distribution Using Least Squares The term "least squares" is most commonly used in the context of fitting a regression line or surface to model a response variable as a function of one or more predictor variables. The method described here is a very different application of least squares: univariate distribution fitting, with only a single variable. To begin, first simulate some sample data. We'll use an exponential distribution to generate the data. For the purposes of this example, as in practice, we'll assume that the data are not known to have come from a particular model. rng(37,'twister'); n = 100; x = exprnd(2,n,1);
Next, compute the empirical cumulative distribution function (ECDF) of the data. This is simply a step function with a jump in cumulative probability, p, of 1/n at each data point, x. x = sort(x); p = ((1:n)-0.5)' ./ n; stairs(x,p,'k-'); xlabel('x'); ylabel('Cumulative probability (p)');
We'll fit an exponential distribution to these data. One way to do that is to find the exponential distribution whose cumulative distribution function (CDF) best approximates (in a sense to be explained below) the ECDF of the data. The exponential CDF is p = Pr{X <= x} = 1 − exp(−x/mu).
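One simple version of that idea can be sketched directly with fminsearch. This sketch is an addition made here for illustration, and it simply chooses mu to minimize the squared differences between the ECDF values p and the exponential CDF evaluated at the data x:

% Sketch: least-squares fit of the exponential mean on the CDF scale.
ssq  = @(mu) sum((p - expcdf(x,mu)).^2);   % objective: squared CDF mismatch
muLS = fminsearch(ssq, mean(x))            % start from the sample mean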
See Also pca | pcacov | pcares | ppca | boxplot | biplot
More About • 16-74
“Principal Component Analysis (PCA)” on page 16-65
Factor Analysis
Factor Analysis Multivariate data often includes a large number of measured variables, and sometimes those variables overlap, in the sense that groups of them might be dependent. For example, in a decathlon, each athlete competes in 10 events, but several of them can be thought of as speed events, while others can be thought of as strength events, etc. Thus, you can think of a competitor's 10 event scores as largely dependent on a smaller set of three or four types of athletic ability. Factor analysis is a way to fit a model to multivariate data to estimate just this sort of interdependence. In a factor analysis model, the measured variables depend on a smaller number of unobserved (latent) factors. Because each factor might affect several variables in common, they are known as common factors. Each variable is assumed to be dependent on a linear combination of the common factors, and the coefficients are known as loadings. Each measured variable also includes a component due to independent random variability, known as specific variance because it is specific to one variable. Specifically, factor analysis assumes that the covariance matrix of your data is of the form
∑x = ΛΛΤ + Ψ where Λ is the matrix of loadings, and the elements of the diagonal matrix Ψ are the specific variances. The function factoran fits the Factor Analysis model using maximum likelihood.
See Also factoran
Related Examples •
“Analyze Stock Prices Using Factor Analysis” on page 16-76
Analyze Stock Prices Using Factor Analysis This example shows how to analyze if companies within the same sector experience similar week-toweek changes in stock price. Factor Loadings Load the sample data. load stockreturns
Suppose that over the course of 100 weeks, the percent change in stock prices for ten companies has been recorded. Of the ten companies, the first four can be classified as primarily technology, the next three as financial, and the last three as retail. It seems reasonable that the stock prices for companies that are in the same sector might vary together as economic conditions change. Factor analysis can provide quantitative evidence. First specify a model fit with three common factors. By default, factoran computes rotated estimates of the loadings to try and make their interpretation simpler. But in this example, specify an unrotated solution. [Loadings,specificVar,T,stats] = factoran(stocks,3,'rotate','none');
The first two factoran output arguments are the estimated loadings and the estimated specific variances. Each row of the loadings matrix represents one of the ten stocks, and each column corresponds to a common factor. With unrotated estimates, interpretation of the factors in this fit is difficult because most of the stocks contain fairly large coefficients for two or more factors. Loadings

Loadings = 10×3

    0.8885    0.2367   -0.2354
    0.7126    0.3862    0.0034
    0.3351    0.2784   -0.0211
    0.3088    0.1113   -0.1905
    0.6277   -0.6643    0.1478
    0.4726   -0.6383    0.0133
    0.1133   -0.5416    0.0322
    0.6403    0.1669    0.4960
    0.2363    0.5293    0.5770
    0.1105    0.1680    0.5524
Factor rotation helps to simplify the structure in the Loadings matrix, to make it easier to assign meaningful interpretations to the factors. From the estimated specific variances, you can see that the model indicates that a particular stock price varies quite a lot beyond the variation due to the common factors. Display estimated specific variances. specificVar specificVar = 10×1
0.0991 0.3431 0.8097 0.8559 0.1429 0.3691 0.6928 0.3162 0.3311 0.6544
A specific variance of 1 would indicate that there is no common factor component in that variable, while a specific variance of 0 would indicate that the variable is entirely determined by common factors. These data seem to fall somewhere in between. Display the p-value. stats.p ans = 0.8144
The p-value returned in the stats structure fails to reject the null hypothesis of three common factors, suggesting that this model provides a satisfactory explanation of the covariation in these data. Fit a model with two common factors to determine whether fewer than three factors can provide an acceptable fit. [Loadings2,specificVar2,T2,stats2] = factoran(stocks, 2,'rotate','none');
Display the p-value. stats2.p ans = 3.5610e-06
The p-value for this second fit is highly significant, and rejects the hypothesis of two factors, indicating that the simpler model is not sufficient to explain the pattern in these data. Factor Rotation As the results illustrate, the estimated loadings from an unrotated factor analysis fit can have a complicated structure. The goal of factor rotation is to find a parameterization in which each variable has only a small number of large loadings. That is, each variable is affected by a small number of factors, preferably only one. This can often make it easier to interpret what the factors represent. If you think of each row of the loadings matrix as coordinates of a point in M-dimensional space, then each factor corresponds to a coordinate axis. Factor rotation is equivalent to rotating those axes and computing new loadings in the rotated coordinate system. There are various ways to do this. Some methods leave the axes orthogonal, while others are oblique methods that change the angles between them. For this example, you can rotate the estimated loadings by using the promax criterion, a common oblique method. [LoadingsPM,specVarPM] = factoran(stocks,3,'rotate','promax'); LoadingsPM
LoadingsPM = 10×3

    0.9452    0.1214   -0.0617
    0.7064   -0.0178    0.2058
    0.3885   -0.0994    0.0975
    0.4162   -0.0148   -0.1298
    0.1021    0.9019    0.0768
    0.0873    0.7709   -0.0821
   -0.1616    0.5320   -0.0888
    0.2169    0.2844    0.6635
    0.0016   -0.1881    0.7849
   -0.2289    0.0636    0.6475
Promax rotation creates a simpler structure in the loadings, one in which most of the stocks have a large loading on only one factor. To see this structure more clearly, you can use the biplot function to plot each stock using its factor loadings as coordinates. biplot(LoadingsPM,'varlabels',num2str((1:10)')); axis square view(155,27);
This plot shows that promax has rotated the factor loadings to a simpler structure. Each stock depends primarily on only one factor, and it is possible to describe each factor in terms of the stocks that it affects. Based on which companies are near which axes, you could reasonably conclude that the first factor axis represents the financial sector, the second retail, and the third technology. The original conjecture, that stocks vary primarily within sector, is apparently supported by the data. 16-78
Factor Scores Sometimes, it is useful to be able to classify an observation based on its factor scores. For example, if you accepted the three-factor model and the interpretation of the rotated factors, you might want to categorize each week in terms of how favorable it was for each of the three stock sectors, based on the data from the 10 observed stocks. Because the data in this example are the raw stock price changes, and not just their correlation matrix, you can have factoran return estimates of the value of each of the three rotated common factors for each week. You can then plot the estimated scores to see how the different stock sectors were affected during each week. [LoadingsPM,specVarPM,TPM,stats,F] = factoran(stocks, 3,'rotate','promax'); plot3(F(:,1),F(:,2),F(:,3),'b.') line([-4 4 NaN 0 0 NaN 0 0], [0 0 NaN -4 4 NaN 0 0],[0 0 NaN 0 0 NaN -4 4], 'Color','black') xlabel('Financial Sector') ylabel('Retail Sector') zlabel('Technology Sector') grid on axis square view(-22.5, 8)
Oblique rotation often creates factors that are correlated. This plot shows some evidence of correlation between the first and third factors, and you can investigate further by computing the estimated factor correlation matrix. inv(TPM'*TPM);
Visualize the Results You can use the biplot function to help visualize both the factor loadings for each variable and the factor scores for each observation in a single plot. For example, the following command plots the results from the factor analysis on the stock data and labels each of the 10 stocks. biplot(LoadingsPM,'scores',F,'varlabels',num2str((1:10)')) xlabel('Financial Sector') ylabel('Retail Sector') zlabel('Technology Sector') axis square view(155,27)
In this case, the factor analysis includes three factors, and so the biplot is three-dimensional. Each of the 10 stocks is represented in this plot by a vector, and the direction and length of the vector indicates how each stock depends on the underlying factors. For example, you have seen that after promax rotation, the first four stocks have positive loadings on the first factor, and unimportant loadings on the other two factors. That first factor, interpreted as a financial sector effect, is represented in this biplot as one of the horizontal axes. The dependence of those four stocks on that factor corresponds to the four vectors directed approximately along that axis. Similarly, the dependence of stocks 5, 6, and 7 primarily on the second factor, interpreted as a retail sector effect, is represented by vectors directed approximately along that axis. Each of the 100 observations is represented in this plot by a point, and their locations indicate the score of each observation for the three factors. For example, points near the top of this plot have the
highest scores for the technology sector factor. The points are scaled to fit within the unit square, so only their relative locations can be determined from the plot. You can use the Data Cursor tool from the Tools menu in the figure window to identify the items in this plot. By clicking a stock (vector), you can read off that stock's loadings for each factor. By clicking an observation (point), you can read off that observation's scores for each factor.
Robust Feature Selection Using NCA for Regression
Perform feature selection that is robust to outliers using a custom robust loss function in NCA.
Generate data with outliers
Generate sample data for regression where the response depends on three of the predictors, namely predictors 4, 7, and 13.
rng(123,'twister') % For reproducibility
n = 200;
X = randn(n,20);
y = cos(X(:,7)) + sin(X(:,4).*X(:,13)) + 0.1*randn(n,1);
Add outliers to data. numoutliers = 25; outlieridx = floor(linspace(10,90,numoutliers)); y(outlieridx) = 5*randn(numoutliers,1);
Plot the data. figure plot(y)
Use non-robust loss function
The performance of the feature selection algorithm depends heavily on the value of the regularization parameter. A good practice is to tune the regularization parameter for the best value to use in feature selection. Tune the regularization parameter using five-fold cross-validation. Use the mean squared error (MSE):
MSE = (1/n) ∑_{i=1}^{n} (yi − ŷi)²
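For example, with hypothetical observed and predicted responses, the MSE is just the average squared difference (the numbers here are arbitrary illustrations):
ytest = [1.0; 2.5; 0.3];           % hypothetical observed responses
ypred = [1.2; 2.0; 0.1];           % hypothetical predicted responses
mseval = mean((ytest - ypred).^2)  % returns 0.1100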
First, partition the data into five folds. In each fold, the software uses 4/5th of the data for training and 1/5th of the data for validation (testing). cvp = cvpartition(length(y),'kfold',5); numtestsets = cvp.NumTestSets;
Compute the lambda values to test for and create an array to store the loss values. lambdavals = linspace(0,3,50)*std(y)/length(y); lossvals = zeros(length(lambdavals),numtestsets);
Perform NCA and compute the loss for each λ value and each fold. for i = 1:length(lambdavals) for k = 1:numtestsets Xtrain = X(cvp.training(k),:); ytrain = y(cvp.training(k),:); Xtest = X(cvp.test(k),:); ytest = y(cvp.test(k),:); nca = fsrnca(Xtrain,ytrain,'FitMethod','exact', ... 'Solver','lbfgs','Verbose',0,'Lambda',lambdavals(i), ... 'LossFunction','mse'); lossvals(i,k) = loss(nca,Xtest,ytest,'LossFunction','mse'); end end
Plot the mean loss corresponding to each lambda value. figure meanloss = mean(lossvals,2); plot(lambdavals,meanloss,'ro-') xlabel('Lambda') ylabel('Loss (MSE)') grid on
Find the λ value that produces the minimum average loss. [~,idx] = min(mean(lossvals,2)); bestlambda = lambdavals(idx) bestlambda = 0.0231
Perform feature selection using the best λ value and MSE. nca = fsrnca(X,y,'FitMethod','exact','Solver','lbfgs', ... 'Verbose',1,'Lambda',bestlambda,'LossFunction','mse'); o Solver = LBFGS, HessianHistorySize = 15, LineSearchMethod = weakwolfe
|================================================================================================ | ITER | FUN VALUE | NORM GRAD | NORM STEP | CURV | GAMMA | ALPHA | ACC |================================================================================================ | 0 | 6.414642e+00 | 8.430e-01 | 0.000e+00 | | 7.117e-01 | 0.000e+00 | Y | 1 | 6.066100e+00 | 9.952e-01 | 1.264e+00 | OK | 3.741e-01 | 1.000e+00 | Y | 2 | 5.498221e+00 | 4.267e-01 | 4.250e-01 | OK | 4.016e-01 | 1.000e+00 | Y | 3 | 5.108548e+00 | 3.933e-01 | 8.564e-01 | OK | 3.599e-01 | 1.000e+00 | Y | 4 | 4.808456e+00 | 2.505e-01 | 9.352e-01 | OK | 8.798e-01 | 1.000e+00 | Y | 5 | 4.677382e+00 | 2.085e-01 | 6.014e-01 | OK | 1.052e+00 | 1.000e+00 | Y | 6 | 4.487789e+00 | 4.726e-01 | 7.374e-01 | OK | 5.593e-01 | 1.000e+00 | Y | 7 | 4.310099e+00 | 2.484e-01 | 4.253e-01 | OK | 3.367e-01 | 1.000e+00 | Y | 8 | 4.258539e+00 | 3.629e-01 | 4.521e-01 | OK | 4.705e-01 | 5.000e-01 | Y | 9 | 4.175345e+00 | 1.972e-01 | 2.608e-01 | OK | 4.018e-01 | 1.000e+00 | Y | 10 | 4.122340e+00 | 9.169e-02 | 2.947e-01 | OK | 3.487e-01 | 1.000e+00 | Y
| 11 | 4.095525e+00 | 9.798e-02 | 2.529e-01 | OK | 1.188e+00 | 1.000e+00 | Y
| 12 | 4.059690e+00 | 1.584e-01 | 5.213e-01 | OK | 9.930e-01 | 1.000e+00 | Y
| 13 | 4.029208e+00 | 7.411e-02 | 2.076e-01 | OK | 4.886e-01 | 1.000e+00 | Y
| 14 | 4.016358e+00 | 1.068e-01 | 2.696e-01 | OK | 6.919e-01 | 1.000e+00 | Y
| 15 | 4.004521e+00 | 5.434e-02 | 1.136e-01 | OK | 5.647e-01 | 1.000e+00 | Y
| 16 | 3.986929e+00 | 6.158e-02 | 2.993e-01 | OK | 1.353e+00 | 1.000e+00 | Y
| 17 | 3.976342e+00 | 4.966e-02 | 2.213e-01 | OK | 7.668e-01 | 1.000e+00 | Y
| 18 | 3.966646e+00 | 5.458e-02 | 2.529e-01 | OK | 1.988e+00 | 1.000e+00 | Y
| 19 | 3.959586e+00 | 1.046e-01 | 4.169e-01 | OK | 1.858e+00 | 1.000e+00 | Y
|================================================================================================ | ITER | FUN VALUE | NORM GRAD | NORM STEP | CURV | GAMMA | ALPHA | ACC |================================================================================================ | 20 | 3.953759e+00 | 8.248e-02 | 2.892e-01 | OK | 1.040e+00 | 1.000e+00 | Y | 21 | 3.945475e+00 | 3.119e-02 | 1.698e-01 | OK | 1.095e+00 | 1.000e+00 | Y | 22 | 3.941567e+00 | 2.350e-02 | 1.293e-01 | OK | 1.117e+00 | 1.000e+00 | Y | 23 | 3.939468e+00 | 1.296e-02 | 1.805e-01 | OK | 2.287e+00 | 1.000e+00 | Y | 24 | 3.938662e+00 | 8.591e-03 | 5.955e-02 | OK | 1.553e+00 | 1.000e+00 | Y | 25 | 3.938239e+00 | 6.421e-03 | 5.334e-02 | OK | 1.102e+00 | 1.000e+00 | Y | 26 | 3.938013e+00 | 5.449e-03 | 6.773e-02 | OK | 2.085e+00 | 1.000e+00 | Y | 27 | 3.937896e+00 | 6.226e-03 | 3.368e-02 | OK | 7.541e-01 | 1.000e+00 | Y | 28 | 3.937820e+00 | 2.497e-03 | 2.397e-02 | OK | 7.940e-01 | 1.000e+00 | Y | 29 | 3.937791e+00 | 2.004e-03 | 1.339e-02 | OK | 1.863e+00 | 1.000e+00 | Y | 30 | 3.937784e+00 | 2.448e-03 | 1.265e-02 | OK | 9.667e-01 | 1.000e+00 | Y | 31 | 3.937778e+00 | 6.973e-04 | 2.906e-03 | OK | 4.672e-01 | 1.000e+00 | Y | 32 | 3.937778e+00 | 3.038e-04 | 9.502e-04 | OK | 1.060e+00 | 1.000e+00 | Y | 33 | 3.937777e+00 | 2.327e-04 | 1.069e-03 | OK | 1.597e+00 | 1.000e+00 | Y | 34 | 3.937777e+00 | 1.959e-04 | 1.537e-03 | OK | 4.026e+00 | 1.000e+00 | Y | 35 | 3.937777e+00 | 1.162e-04 | 1.464e-03 | OK | 3.418e+00 | 1.000e+00 | Y | 36 | 3.937777e+00 | 8.353e-05 | 3.660e-04 | OK | 7.304e-01 | 5.000e-01 | Y | 37 | 3.937777e+00 | 1.412e-05 | 1.412e-04 | OK | 7.842e-01 | 1.000e+00 | Y | 38 | 3.937777e+00 | 1.277e-05 | 3.808e-05 | OK | 1.021e+00 | 1.000e+00 | Y | 39 | 3.937777e+00 | 8.614e-06 | 3.698e-05 | OK | 2.561e+00 | 1.000e+00 | Y
|================================================================================================ | ITER | FUN VALUE | NORM GRAD | NORM STEP | CURV | GAMMA | ALPHA | ACC |================================================================================================ | 40 | 3.937777e+00 | 3.159e-06 | 5.299e-05 | OK | 4.331e+00 | 1.000e+00 | Y | 41 | 3.937777e+00 | 2.657e-06 | 1.080e-05 | OK | 7.038e-01 | 5.000e-01 | Y | 42 | 3.937777e+00 | 7.054e-07 | 7.036e-06 | OK | 9.519e-01 | 1.000e+00 | Y Infinity norm of the final gradient = 7.054e-07 Two norm of the final step = 7.036e-06, TolX = 1.000e-06 Relative infinity norm of the final gradient = 7.054e-07, TolFun = 1.000e-06 EXIT: Local minimum found.
Plot selected features. figure plot(nca.FeatureWeights,'ro') grid on xlabel('Feature index') ylabel('Feature weight')
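To list which predictors the model treats as relevant, one option is to threshold the learned weights. In this sketch, the cutoff of 5% of the largest weight is an arbitrary choice:
tol = 0.05*max(nca.FeatureWeights);      % arbitrary threshold
selidx = find(nca.FeatureWeights > tol)  % indices of selected features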
Predict the response values using the nca model and plot the fitted (predicted) response values and the actual response values. figure fitted = predict(nca,X); plot(y,'r.') hold on plot(fitted,'b-') xlabel('index') ylabel('Fitted values')
fsrnca tries to fit every point in the data, including the outliers. As a result, it assigns nonzero weights to many features besides predictors 4, 7, and 13.
Use built-in robust loss function
Repeat the same process of tuning the regularization parameter, this time using the built-in ϵ-insensitive loss function:
l(yi, yj) = max(0, |yi − yj| − ϵ)
The ϵ-insensitive loss function is more robust to outliers than mean squared error.
lambdavals = linspace(0,3,50)*std(y)/length(y);
cvp = cvpartition(length(y),'kfold',5);
numtestsets = cvp.NumTestSets;
lossvals = zeros(length(lambdavals),numtestsets);
for i = 1:length(lambdavals)
    for k = 1:numtestsets
        Xtrain = X(cvp.training(k),:);
        ytrain = y(cvp.training(k),:);
        Xtest = X(cvp.test(k),:);
        ytest = y(cvp.test(k),:);
        nca = fsrnca(Xtrain,ytrain,'FitMethod','exact', ...
            'Solver','sgd','Verbose',0,'Lambda',lambdavals(i), ...
            'LossFunction','epsiloninsensitive','Epsilon',0.8);
        lossvals(i,k) = loss(nca,Xtest,ytest,'LossFunction','mse');
    end
end
The ϵ value to use depends on the data, and the best value can also be determined using cross-validation. However, choosing the ϵ value is beyond the scope of this example; the choice of ϵ here mainly illustrates the robustness of the method.
Plot the mean loss corresponding to each lambda value.
figure
meanloss = mean(lossvals,2);
plot(lambdavals,meanloss,'ro-')
xlabel('Lambda')
ylabel('Loss (MSE)')
grid on
Find the lambda value that produces the minimum average loss. [~,idx] = min(mean(lossvals,2)); bestlambda = lambdavals(idx) bestlambda = 0.0187
Fit neighborhood component analysis model using ϵ-insensitive loss function and best lambda value.
nca = fsrnca(X,y,'FitMethod','exact','Solver','sgd', ... 'Lambda',bestlambda,'LossFunction','epsiloninsensitive','Epsilon',0.8);
Plot selected features. figure plot(nca.FeatureWeights,'ro') grid on xlabel('Feature index') ylabel('Feature weight')
Plot fitted values. figure fitted = predict(nca,X); plot(y,'r.') hold on plot(fitted,'b-') xlabel('index') ylabel('Fitted values')
The ϵ-insensitive loss appears more robust to outliers: it identifies fewer features than MSE as relevant. However, the fit shows that it is still affected by some of the outliers.
Use custom robust loss function
Define a custom loss function that is robust to outliers to use in feature selection for regression:
f(yi, yj) = 1 − exp(−|yi − yj|)
customlossFcn = @(yi,yj) 1 - exp(-abs(yi-yj'));
Tune the regularization parameter using the custom-defined robust loss function. lambdavals = linspace(0,3,50)*std(y)/length(y); cvp = cvpartition(length(y),'kfold',5); numtestsets = cvp.NumTestSets; lossvals = zeros(length(lambdavals),numtestsets); for i = 1:length(lambdavals) for k = 1:numtestsets Xtrain = X(cvp.training(k),:); ytrain = y(cvp.training(k),:); Xtest = X(cvp.test(k),:); ytest = y(cvp.test(k),:); nca = fsrnca(Xtrain,ytrain,'FitMethod','exact', ...
'Solver','lbfgs','Verbose',0,'Lambda',lambdavals(i), ... 'LossFunction',customlossFcn); lossvals(i,k) = loss(nca,Xtest,ytest,'LossFunction','mse'); end end
Plot the mean loss corresponding to each lambda value. figure meanloss = mean(lossvals,2); plot(lambdavals,meanloss,'ro-') xlabel('Lambda') ylabel('Loss (MSE)') grid on
Find the λ value that produces the minimum average loss. [~,idx] = min(mean(lossvals,2)); bestlambda = lambdavals(idx) bestlambda = 0.0165
Perform feature selection using the custom robust loss function and best λ value. nca = fsrnca(X,y,'FitMethod','exact','Solver','lbfgs', ... 'Verbose',1,'Lambda',bestlambda,'LossFunction',customlossFcn);
o Solver = LBFGS, HessianHistorySize = 15, LineSearchMethod = weakwolfe
|================================================================================================ | ITER | FUN VALUE | NORM GRAD | NORM STEP | CURV | GAMMA | ALPHA | ACC |================================================================================================ | 0 | 8.610073e-01 | 4.921e-02 | 0.000e+00 | | 1.219e+01 | 0.000e+00 | Y | 1 | 6.582278e-01 | 2.328e-02 | 1.820e+00 | OK | 2.177e+01 | 1.000e+00 | Y | 2 | 5.706490e-01 | 2.241e-02 | 2.360e+00 | OK | 2.541e+01 | 1.000e+00 | Y | 3 | 5.677090e-01 | 2.666e-02 | 7.583e-01 | OK | 1.092e+01 | 1.000e+00 | Y | 4 | 5.620806e-01 | 5.524e-03 | 3.335e-01 | OK | 9.973e+00 | 1.000e+00 | Y | 5 | 5.616054e-01 | 1.428e-03 | 1.025e-01 | OK | 1.736e+01 | 1.000e+00 | Y | 6 | 5.614779e-01 | 4.446e-04 | 8.350e-02 | OK | 2.507e+01 | 1.000e+00 | Y | 7 | 5.614653e-01 | 4.118e-04 | 2.466e-02 | OK | 2.105e+01 | 1.000e+00 | Y | 8 | 5.614620e-01 | 1.307e-04 | 1.373e-02 | OK | 2.002e+01 | 1.000e+00 | Y | 9 | 5.614615e-01 | 9.318e-05 | 4.128e-03 | OK | 3.683e+01 | 1.000e+00 | Y | 10 | 5.614611e-01 | 4.579e-05 | 8.785e-03 | OK | 6.170e+01 | 1.000e+00 | Y | 11 | 5.614610e-01 | 1.232e-05 | 1.582e-03 | OK | 2.000e+01 | 5.000e-01 | Y | 12 | 5.614610e-01 | 3.174e-06 | 4.742e-04 | OK | 2.510e+01 | 1.000e+00 | Y | 13 | 5.614610e-01 | 7.896e-07 | 1.683e-04 | OK | 2.959e+01 | 1.000e+00 | Y Infinity norm of the final gradient = 7.896e-07 Two norm of the final step = 1.683e-04, TolX = 1.000e-06 Relative infinity norm of the final gradient = 7.896e-07, TolFun = 1.000e-06 EXIT: Local minimum found.
Plot selected features. figure plot(nca.FeatureWeights,'ro') grid on xlabel('Feature index') ylabel('Feature weight')
Plot fitted values. figure fitted = predict(nca,X); plot(y,'r.') hold on plot(fitted,'b-') xlabel('index') ylabel('Fitted values')
In this case, the loss is not affected by the outliers, and the results are based on most of the observation values. fsrnca detects predictors 4, 7, and 13 as relevant features and does not select any other features.
Why does the loss function choice affect the results?
First, compute the loss functions for a series of values for the difference between two observations.
deltay = linspace(-10,10,1000)';
Compute custom loss function values. customlossvals = customlossFcn(deltay,0);
Compute epsilon insensitive loss function and values. epsinsensitive = @(yi,yj,E) max(0,abs(yi-yj')-E); epsinsenvals = epsinsensitive(deltay,0,0.5);
Compute MSE loss function and values. mse = @(yi,yj) (yi-yj').^2; msevals = mse(deltay,0);
Now, plot the loss functions to see their difference and why they affect the results in the way they do. figure plot(deltay,customlossvals,'g-',deltay,epsinsenvals,'b-',deltay,msevals,'r-')
xlabel('(yi - yj)') ylabel('loss(yi,yj)') legend('customloss','epsiloninsensitive','mse') ylim([0 20])
As the difference between two response values increases, MSE increases quadratically, which makes it very sensitive to outliers. As fsrnca tries to minimize this loss, it ends up identifying more features as relevant. The epsilon-insensitive loss is more resistant to outliers than MSE, but eventually it starts to increase linearly as the difference between two observations increases. The robust loss function approaches 1 as the difference between two observations increases, and it stays at that value even though the difference keeps increasing. Of the three, it is the most robust to outliers.
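As a quick numerical check using the loss handles defined above (the value 5 is an arbitrary illustrative difference between two responses):
customlossFcn(5,0)       % about 0.9933, bounded near 1
epsinsensitive(5,0,0.5)  % 4.5, grows linearly beyond epsilon
mse(5,0)                 % 25, grows quadratically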
See Also fsrnca | FeatureSelectionNCARegression | refit | predict | loss
More About
• "Neighborhood Component Analysis (NCA) Feature Selection" on page 16-96
• "Introduction to Feature Selection" on page 16-46
Neighborhood Component Analysis (NCA) Feature Selection
In this section...
"NCA Feature Selection for Classification" on page 16-96
"NCA Feature Selection for Regression" on page 16-98
"Impact of Standardization" on page 16-99
"Choosing the Regularization Parameter Value" on page 16-99
Neighborhood component analysis (NCA) is a non-parametric method for selecting features with the goal of maximizing prediction accuracy of regression and classification algorithms. The Statistics and Machine Learning Toolbox functions fscnca and fsrnca perform NCA feature selection with regularization to learn feature weights for minimization of an objective function that measures the average leave-one-out classification or regression loss over the training data.
NCA Feature Selection for Classification
Consider a multi-class classification problem with a training set containing n observations:
S = {(xi, yi), i = 1, 2, …, n},
where xi ∈ ℝp are the feature vectors, yi ∈ {1, 2, …, c} are the class labels, and c is the number of classes. The aim is to learn a classifier f : ℝp → {1, 2, …, c} that accepts a feature vector and makes a prediction f(x) for the true label y of x.
Consider a randomized classifier that:
• Randomly picks a point, Ref(x), from S as the 'reference point' for x
• Labels x using the label of the reference point Ref(x).
This scheme is similar to that of a 1-NN classifier, where the reference point is chosen to be the nearest neighbor of the new point x. In NCA, the reference point is chosen randomly and all points in S have some probability of being selected as the reference point. The probability P(Ref(x) = xj | S) that point xj is picked from S as the reference point for x is higher if xj is closer to x as measured by the distance function dw, where
dw(xi, xj) = ∑_{r=1}^{p} wr² |xir − xjr|,
and wr are the feature weights. Assume that P(Ref(x) = xj | S) ∝ k(dw(x, xj)), where k is some kernel or similarity function that assumes large values when dw(x, xj) is small. Suppose it is
k(z) = exp(−z/σ),
as suggested in [1]. The reference point for x is chosen from S, so the sum of P(Ref(x) = xj | S) over all j must equal 1. Therefore, it is possible to write
P(Ref(x) = xj | S) = k(dw(x, xj)) / ∑_{j=1}^{n} k(dw(x, xj)).
Now consider the leave-one-out application of this randomized classifier, that is, predicting the label of xi using the data in S−i, the training set S excluding the point (xi, yi). The probability that point xj is picked as the reference point for xi is
pij = P(Ref(xi) = xj | S−i) = k(dw(xi, xj)) / ∑_{j=1, j≠i}^{n} k(dw(xi, xj)).
The average leave-one-out probability of correct classification is the probability pi that the randomized classifier correctly classifies observation i using S−i:
pi = ∑_{j=1, j≠i}^{n} P(Ref(xi) = xj | S−i) I(yi = yj) = ∑_{j=1, j≠i}^{n} pij yij,
where
yij = I(yi = yj) = 1 if yi = yj, and 0 otherwise.
The average leave-one-out probability of correct classification using the randomized classifier can be written as
F(w) = (1/n) ∑_{i=1}^{n} pi.
The right-hand side of F(w) depends on the weight vector w. The goal of neighborhood component analysis is to maximize F(w) with respect to w. fscnca uses the regularized objective function as introduced in [1]:
F(w) = (1/n) ∑_{i=1}^{n} pi − λ ∑_{r=1}^{p} wr²
     = (1/n) ∑_{i=1}^{n} [ ∑_{j=1, j≠i}^{n} pij yij − λ ∑_{r=1}^{p} wr² ]
     = (1/n) ∑_{i=1}^{n} Fi(w),
where Fi(w) denotes the bracketed term and λ is the regularization parameter. The regularization term drives many of the weights in w to 0. After choosing the kernel parameter σ in pij as 1, finding the weight vector w can be expressed as the following minimization problem for a given λ:
ŵ = argmin_w f(w) = argmin_w (1/n) ∑_{i=1}^{n} fi(w),
where f(w) = −F(w) and fi(w) = −Fi(w). Note that
(1/n) ∑_{i=1}^{n} ∑_{j=1, j≠i}^{n} pij = 1,
and the argument of the minimum does not change if you add a constant to an objective function. Therefore, you can rewrite the objective function by adding the constant 1:
ŵ = argmin_w { 1 + f(w) }
  = argmin_w { (1/n) ∑_{i=1}^{n} ∑_{j=1, j≠i}^{n} pij − (1/n) ∑_{i=1}^{n} ∑_{j=1, j≠i}^{n} pij yij + λ ∑_{r=1}^{p} wr² }
  = argmin_w { (1/n) ∑_{i=1}^{n} ∑_{j=1, j≠i}^{n} pij (1 − yij) + λ ∑_{r=1}^{p} wr² }
  = argmin_w { (1/n) ∑_{i=1}^{n} ∑_{j=1, j≠i}^{n} pij l(yi, yj) + λ ∑_{r=1}^{p} wr² },
where the loss function is defined as
l(yi, yj) = 1 if yi ≠ yj, and 0 otherwise.
The argument of the minimum is the weight vector that minimizes the classification error. You can specify a custom loss function using the LossFunction name-value pair argument in the call to fscnca.
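For instance, a minimal sketch (using the Fisher iris data and an arbitrary Lambda value) that fits an NCA classification model with the default classification-error loss and displays the learned feature weights:
load fisheriris
rng default                                        % for reproducibility
mdl = fscnca(meas,species,'Solver','lbfgs','Lambda',0.01);
mdl.FeatureWeights                                 % one weight per predictor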
NCA Feature Selection for Regression
The fsrnca function performs NCA feature selection modified for regression. Given n observations
S = {(xi, yi), i = 1, 2, …, n},
the only difference from the classification problem is that the response values yi ∈ ℝ are continuous. In this case, the aim is to predict the response y given the training set S.
Consider a randomized regression model that:
• Randomly picks a point Ref(x) from S as the 'reference point' for x
• Sets the response value at x equal to the response value of the reference point Ref(x).
Again, the probability P(Ref(x) = xj | S) that point xj is picked from S as the reference point for x is
P(Ref(x) = xj | S) = k(dw(x, xj)) / ∑_{j=1}^{n} k(dw(x, xj)).
Now consider the leave-one-out application of this randomized regression model, that is, predicting the response for xi using the data in S−i, the training set S excluding the point (xi, yi). The probability that point xj is picked as the reference point for xi is
pij = P(Ref(xi) = xj | S−i) = k(dw(xi, xj)) / ∑_{j=1, j≠i}^{n} k(dw(xi, xj)).
Let ŷi be the response value the randomized regression model predicts, and let yi be the actual response for xi. Let l: ℝ² → ℝ be a loss function that measures the disagreement between ŷi and yi. Then, the average value of l(yi, ŷi) is
li = E( l(yi, ŷi) | S−i ) = ∑_{j=1, j≠i}^{n} pij l(yi, yj).
After adding the regularization term, the objective function for minimization is
f(w) = (1/n) ∑_{i=1}^{n} li + λ ∑_{r=1}^{p} wr².
The default loss function l(yi, yj) for NCA for regression is mean absolute deviation, but you can specify other loss functions, including a custom one, using the LossFunction name-value pair argument in the call to fsrnca.
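As a small sketch (with made-up data and an arbitrary Lambda value), the following call makes the default mean absolute deviation loss explicit:
rng(1)                                             % for reproducibility
X = randn(100,10);                                 % hypothetical predictors
y = X(:,3) + 0.1*randn(100,1);                     % response depends on predictor 3
mdl = fsrnca(X,y,'LossFunction','mad','Lambda',0.02);
mdl.FeatureWeights                                 % weight for predictor 3 should dominate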
Impact of Standardization
The regularization term drives the weights of irrelevant predictors to zero. In the objective functions for NCA for classification or regression, there is only one regularization parameter λ for all weights. This fact requires the magnitudes of the weights to be comparable to each other. When the feature vectors xi in S are in different scales, this might result in weights that are in different scales and not meaningful. To avoid this situation, standardize the predictors to have zero mean and unit standard deviation before applying NCA. You can standardize the predictors using the 'Standardize',true name-value pair argument in the call to fscnca or fsrnca.
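A minimal sketch (hypothetical data with deliberately mismatched scales and an arbitrary Lambda value) showing the name-value pair:
rng(2)                                             % for reproducibility
X = [1000*randn(100,1), 0.001*randn(100,1), randn(100,3)];  % very different scales
y = 2*X(:,1)/1000 + 0.1*randn(100,1);              % hypothetical response
mdl = fsrnca(X,y,'Standardize',true,'Lambda',0.05);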
Choosing the Regularization Parameter Value
It is usually necessary to select a value of the regularization parameter by calculating the accuracy of the randomized NCA classifier or regression model on an independent test set. If you use cross-validation instead of a single test set, select the λ value that minimizes the average loss across the cross-validation folds. For examples, see "Tune Regularization Parameter to Detect Features Using NCA for Classification" on page 16-197 and "Tune Regularization Parameter in NCA for Regression" on page 35-3195.
References
[1] Yang, W., K. Wang, W. Zuo. "Neighborhood Component Feature Selection for High-Dimensional Data." Journal of Computers. Vol. 7, Number 1, January, 2012.
See Also fscnca | fsrnca | FeatureSelectionNCAClassification | FeatureSelectionNCARegression
More About
• "Robust Feature Selection Using NCA for Regression" on page 16-82
• "Tune Regularization Parameter to Detect Features Using NCA for Classification" on page 16-197
• "Introduction to Feature Selection" on page 16-46
t-SNE
In this section...
"What Is t-SNE?" on page 16-101
"t-SNE Algorithm" on page 16-101
"Barnes-Hut Variation of t-SNE" on page 16-104
"Characteristics of t-SNE" on page 16-104
What Is t-SNE?
t-SNE (tsne) is an algorithm for dimensionality reduction that is well-suited to visualizing high-dimensional data. The name stands for t-distributed Stochastic Neighbor Embedding. The idea is to embed high-dimensional points in low dimensions in a way that respects similarities between points. Nearby points in the high-dimensional space correspond to nearby embedded low-dimensional points, and distant points in high-dimensional space correspond to distant embedded low-dimensional points. (Generally, it is impossible to match distances exactly between high-dimensional and low-dimensional spaces.)
The tsne function creates a set of low-dimensional points from high-dimensional data. Typically, you visualize the low-dimensional points to see natural clusters in the original high-dimensional data.
The algorithm takes the following general steps to embed the data in low dimensions.
1. Calculate the pairwise distances between the high-dimensional points.
2. Create a standard deviation σi for each high-dimensional point i so that the perplexity of each point is at a predetermined level. For the definition of perplexity, see "Compute Distances, Gaussian Variances, and Similarities" on page 16-102.
3. Calculate the similarity matrix. This is the joint probability distribution of X, defined by "Equation 16-1".
4. Create an initial set of low-dimensional points.
5. Iteratively update the low-dimensional points to minimize the Kullback-Leibler divergence between a Gaussian distribution in the high-dimensional space and a t distribution in the low-dimensional space. This optimization procedure is the most time-consuming part of the algorithm.
See van der Maaten and Hinton [1].
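For example, a minimal call (using the Fisher iris data) embeds the four-dimensional measurements in two dimensions and colors the result by species:
load fisheriris
rng default                       % for reproducibility
Y = tsne(meas);                   % 150-by-2 embedding
gscatter(Y(:,1),Y(:,2),species)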
t-SNE Algorithm
The basic t-SNE algorithm performs the following steps.
• "Prepare Data" on page 16-102
• "Compute Distances, Gaussian Variances, and Similarities" on page 16-102
• "Initialize the Embedding and Divergence" on page 16-103
• "Gradient Descent of Kullback-Leibler Divergence" on page 16-103
Prepare Data
tsne first removes each row of the input data X that contains any NaN values. Then, if the Standardize name-value pair is true, tsne centers X by subtracting the mean of each column, and scales X by dividing its columns by their standard deviations.
The original authors van der Maaten and Hinton [1] recommend reducing the original data X to a lower-dimensional version using "Principal Component Analysis (PCA)" on page 16-65. You can set the tsne NumPCAComponents name-value pair to the number of dimensions you like, perhaps 50. To exercise more control over this step, preprocess the data using the pca function.
Compute Distances, Gaussian Variances, and Similarities
After the preprocessing, tsne calculates the distance d(xi,xj) between each pair of points xi and xj in X. You can choose various distance metrics using the Distance name-value pair. By default, tsne uses the standard Euclidean metric. tsne uses the square of the distance metric in its subsequent calculations.
Then for each row i of X, tsne calculates a standard deviation σi so that the perplexity of row i is equal to the Perplexity name-value pair. The perplexity is defined in terms of a model Gaussian distribution as follows. As van der Maaten and Hinton [1] describe, "The similarity of data point xj to data point xi is the conditional probability, p_{j|i}, that xi would pick xj as its neighbor if neighbors were picked in proportion to their probability density under a Gaussian centered at xi. For nearby data points, p_{j|i} is relatively high, whereas for widely separated data points, p_{j|i} will be almost infinitesimal (for reasonable values of the variance of the Gaussian, σi)."
Define the conditional probability of j given i as
p_{j|i} = exp(−d(xi,xj)² / (2σi²)) / ∑_{k≠i} exp(−d(xi,xk)² / (2σi²))
p_{i|i} = 0.
Then define the joint probability pij by symmetrizing the conditional probabilities:
pij = (p_{j|i} + p_{i|j}) / (2N),     (16-1)
where N is the number of rows of X.
The distributions still do not have their standard deviations σi defined in terms of the Perplexity name-value pair. Let Pi represent the conditional probability distribution over all other data points given data point xi. The perplexity of the distribution is
perplexity(Pi) = 2^H(Pi),
where H(Pi) is the Shannon entropy of Pi:
H(Pi) = −∑_{j} p_{j|i} log2(p_{j|i}).
The perplexity measures the effective number of neighbors of point i. tsne performs a binary search over the σi to achieve a fixed perplexity for each point i.
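As a quick check of this definition (an arbitrary illustration), the perplexity of a uniform conditional distribution over k neighbors equals k:
p = ones(1,8)/8;                  % uniform distribution over 8 neighbors
perp = 2^(-sum(p.*log2(p)))       % returns 8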
Initialize the Embedding and Divergence
To embed the points in X into a low-dimensional space, tsne performs an optimization. tsne attempts to minimize the Kullback-Leibler divergence between the model Gaussian distribution of the points in X and a Student t distribution of points Y in the low-dimensional space.
The minimization procedure begins with an initial set of points Y. tsne creates the points by default as random Gaussian-distributed points. You can also create these points yourself and include them in the 'InitialY' name-value pair for tsne. tsne then calculates the similarities between each pair of points in Y.
The probability model qij of the distribution of the distances between points yi and yj is
qij = (1 + ‖yi − yj‖²)^(−1) / ∑_{k} ∑_{l≠k} (1 + ‖yk − yl‖²)^(−1)
qii = 0.
Using this definition and the model of distances in X given by "Equation 16-1", the Kullback-Leibler divergence between the joint distribution P and Q is
KL(P‖Q) = ∑_{j} ∑_{i≠j} pij log(pij / qij).
For consequences of this definition, see "Helpful Nonlinear Distortion" on page 16-104.
Gradient Descent of Kullback-Leibler Divergence
To minimize the Kullback-Leibler divergence, the 'exact' algorithm uses a modified gradient descent procedure. The gradient with respect to the points in Y of the divergence is
∂KL(P‖Q)/∂yi = 4 ∑_{j≠i} Z (pij − qij) qij (yi − yj),
where the normalization term
Z = ∑_{k} ∑_{l≠k} (1 + ‖yk − yl‖²)^(−1).
The modified gradient descent algorithm uses a few tuning parameters to attempt to reach a good local minimum.
• 'Exaggeration' — During the first 99 gradient descent steps, tsne multiplies the probabilities pij from "Equation 16-1" by the exaggeration value. This step tends to create more space between clusters in the output Y.
• 'LearnRate' — tsne uses adaptive learning to improve the convergence of the gradient descent iterations. The descent algorithm has iterative steps that are a linear combination of the previous step in the descent and the current gradient. 'LearnRate' is a multiplier of the current gradient for the linear combination. For details, see Jacobs [3].
Barnes-Hut Variation of t-SNE
To speed the t-SNE algorithm and to cut down on its memory usage, tsne offers an approximate optimization scheme. The Barnes-Hut algorithm groups nearby points together to lower the complexity and memory usage of the t-SNE optimization step. The Barnes-Hut algorithm is an approximate optimizer, not an exact optimizer. There is a nonnegative tuning parameter Theta that effects a tradeoff between speed and accuracy. Larger values of 'Theta' give faster but less accurate optimization results. The algorithm is relatively insensitive to 'Theta' values in the range (0.2,0.8).
The Barnes-Hut algorithm groups nearby points in the low-dimensional space, and performs an approximate gradient descent based on these groups. The idea, originally used in astrophysics, is that the gradient is similar for nearby points, so the computations can be simplified. See van der Maaten [2].
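A minimal sketch (with made-up data; the Theta and Perplexity values here are arbitrary choices) of selecting the Barnes-Hut algorithm:
rng default                                        % for reproducibility
X = randn(5000,10);                                % hypothetical data
Y = tsne(X,'Algorithm','barneshut','Theta',0.6,'Perplexity',50);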
Characteristics of t-SNE
• "Cannot Use Embedding to Classify New Data" on page 16-104
• "Performance Depends on Data Sizes and Algorithm" on page 16-104
• "Helpful Nonlinear Distortion" on page 16-104
Cannot Use Embedding to Classify New Data
Because t-SNE often separates data clusters well, it can seem that t-SNE can classify new data points. However, t-SNE cannot classify new points. The t-SNE embedding is a nonlinear map that is data-dependent. To embed a new point in the low-dimensional space, you cannot use the previous embedding as a map. Instead, run the entire algorithm again.
Performance Depends on Data Sizes and Algorithm
t-SNE can take a good deal of time to process data. If you have N data points in D dimensions that you want to map to Y dimensions, then
• Exact t-SNE takes of order D*N² operations.
• Barnes-Hut t-SNE takes of order D*N*log(N)*exp(dimension(Y)) operations.
So for large data sets, where N is greater than 1000 or so, and where the embedding dimension Y is 2 or 3, the Barnes-Hut algorithm can be faster than the exact algorithm.
Helpful Nonlinear Distortion
T-SNE maps high-dimensional distances to distorted low-dimensional analogues. Because of the fatter tail of the Student t distribution in the low-dimensional space, tsne often moves close points closer together, and moves far points farther apart than in the high-dimensional space, as illustrated in the following figure. The figure shows both Gaussian and Student t distributions at the points where the densities are at 0.25 and 0.025. The Gaussian density relates to high-dimensional distances, and the t density relates to low-dimensional distances. The t density corresponds to close points being closer, and far points being farther, compared to the Gaussian density.
t = linspace(0,5);
y1 = normpdf(t,0,1);
y2 = tpdf(t,1);
plot(t,y1,'k',t,y2,'r')
hold on
x1 = fzero(@(x)normpdf(x,0,1)-0.25,[0,2]);
x2 = fzero(@(x)tpdf(x,1)-0.25,[0,2]);
z1 = fzero(@(x)normpdf(x,0,1)-0.025,[0,5]);
z2 = fzero(@(x)tpdf(x,1)-0.025,[0,5]);
plot([0,x1],[0.25,0.25],'k-.')
plot([0,z2],[0.025,0.025],'k-.')
plot([x1,x1],[0,0.25],'g-',[x2,x2],[0,0.25],'g-')
plot([z1,z1],[0,0.025],'g-',[z2,z2],[0,0.025],'g-')
text(1.1,.25,'Close points are closer in low-D')
text(2.4,.05,'Far points are farther in low-D')
legend('Gaussian(0,1)','Student t (df = 1)')
xlabel('x')
ylabel('Density')
title('Density of Gaussian(0,1) and Student t (df = 1)')
hold off
This distortion is helpful when it applies. It does not apply in cases such as when the Gaussian variance is high, which lowers the Gaussian peak and flattens the distribution. In such a case, tsne can move close points farther apart than in the original space. To achieve a helpful distortion, • Set the 'Verbose' name-value pair to 2. • Adjust the 'Perplexity' name-value pair so the reported range of variances is not too far from 1, and the mean variance is near 1.
If you can achieve this range of variances, then the diagram applies, and the tsne distortion is helpful. For effective ways to tune tsne, see Wattenberg, Viégas and Johnson [4].
References
[1] van der Maaten, Laurens, and Geoffrey Hinton. "Visualizing Data using t-SNE." J. Machine Learning Research 9, 2008, pp. 2579–2605.
[2] van der Maaten, Laurens. Barnes-Hut-SNE. arXiv:1301.3342 [cs.LG], 2013.
[3] Jacobs, Robert A. "Increased rates of convergence through learning rate adaptation." Neural Networks 1.4, 1988, pp. 295–307.
[4] Wattenberg, Martin, Fernanda Viégas, and Ian Johnson. "How to Use t-SNE Effectively." Distill, 2016. Available at How to Use t-SNE Effectively.
See Also
Related Examples
• "Visualize High-Dimensional Data Using t-SNE" on page 16-110
• "t-SNE Output Function" on page 16-107
• "tsne Settings" on page 16-114
More About
• "Principal Component Analysis (PCA)" on page 16-65
• "Classical Multidimensional Scaling" on page 16-39
• "Factor Analysis" on page 16-75
t-SNE Output Function In this section... “t-SNE Output Function Description” on page 16-107 “tsne optimValues Structure” on page 16-107 “t-SNE Custom Output Function” on page 16-108
t-SNE Output Function Description A tsne output function is a function that runs after every NumPrint optimization iterations of the tSNE algorithm. An output function can create plots, or log data to a file or to a workspace variable. The function cannot change the progress of the algorithm, but can halt the iterations. Set output functions using the Options name-value pair argument to the tsne function. Set Options to a structure created using statset or struct. Set the 'OutputFcn' field of the Options structure to a function handle or cell array of function handles. For example, to set an output function named outfun.m, use the following commands. opts = statset('OutputFcn',@outfun); Y = tsne(X,'Options',opts);
Write an output function using the following syntax. function stop = outfun(optimValues,state) stop = false; % do not stop by default switch state case 'init' % Set up plots or open files case 'iter' % Draw plots or update variables case 'done' % Clean up plots or files end
tsne passes the state and optimValues variables to your function. state takes on the values 'init', 'iter', or 'done' as shown in the code snippet.
tsne optimValues Structure
The optimValues structure contains the following fields.
'iteration': Iteration number
'fval': Kullback-Leibler divergence, modified by exaggeration during the first 99 iterations
'grad': Gradient of the Kullback-Leibler divergence, modified by exaggeration during the first 99 iterations
'Exaggeration': Value of the exaggeration parameter in use in the current iteration
'Y': Current embedding
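As a smaller sketch before the full logging example below, an output function can simply halt the iterations once the divergence drops below a threshold (the value 1.5 here is an arbitrary choice):
function stop = stopAtThreshold(optimValues,state)
% Stop the t-SNE iterations early when the (possibly exaggerated)
% Kullback-Leibler divergence is small.
stop = false;
if strcmp(state,'iter') && optimValues.fval < 1.5   % arbitrary threshold
    stop = true;
end
end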
t-SNE Custom Output Function This example shows how to use an output function in tsne. Custom Output Function The following code is an output function that performs these tasks: • Keep a history of the Kullback-Leibler divergence and the norm of its gradient in a workspace variable. • Plot the solution and the history as the iterations proceed. • Display a Stop button on the plot to stop the iterations early without losing any information. The output function has an extra input variable, species, that enables its plots to show the correct classification of the data. For information on including extra parameters such as species in a function, see “Parameterizing Functions”. function stop = KLLogging(optimValues,state,species) persistent h kllog iters stopnow switch state case 'init' stopnow = false; kllog = []; iters = []; h = figure; c = uicontrol('Style','pushbutton','String','Stop','Position', ... [10 10 50 20],'Callback',@stopme); case 'iter' kllog = [kllog; optimValues.fval,log(norm(optimValues.grad))]; assignin('base','history',kllog) iters = [iters; optimValues.iteration]; if length(iters) > 1 figure(h) subplot(2,1,2) plot(iters,kllog); xlabel('Iterations') ylabel('Loss and Gradient') legend('Divergence','log(norm(gradient))') title('Divergence and log(norm(gradient))') subplot(2,1,1) gscatter(optimValues.Y(:,1),optimValues.Y(:,2),species) title('Embedding') drawnow end case 'done' % Nothing here end stop = stopnow; function stopme(~,~) stopnow = true; end end
Use the Custom Output Function Plot the Fisher iris data, a 4-D data set, in two dimensions using tsne. There is a drop in the Divergence value at iteration 100 because the divergence is scaled by the exaggeration value for earlier iterations. The embedding remains largely unchanged for the last several hundred iterations, so you can save time by clicking the Stop button during the iterations. load fisheriris rng default % for reproducibility opts = statset('OutputFcn',@(optimValues,state) KLLogging(optimValues,state,species)); Y = tsne(meas,'Options',opts,'Algorithm','exact');
See Also
Related Examples
• "Visualize High-Dimensional Data Using t-SNE" on page 16-110
• "tsne Settings" on page 16-114
Visualize High-Dimensional Data Using t-SNE This example shows how to visualize the humanactivity data, which consists of acceleration data collected from smartphones during various activities. tsne reduces the dimension of the data from 60 original dimensions to two or three. tsne creates a nonlinear transformation whose purpose is to enable grouping of points with similar characteristics. Ideally, the tsne result shows clean separation of the 60-dimensional data points into groups. Load and Examine Data Load the humanactivity data, which is available when you run this example. load humanactivity
View a description of the data. Description
Description = 29×1 string " === Human Activity Data === " " " " The humanactivity data set contains 24,075 observations of five different " " physical human activities: Sitting, Standing, Walking, Running, and " " Dancing. Each observation has 60 features extracted from acceleration " " data measured by smartphone accelerometer sensors. The data set contains " " the following variables: " " " " * actid - Response vector containing the activity IDs in integers: 1, 2, " " 3, 4, and 5 representing Sitting, Standing, Walking, Running, and " " Dancing, respectively " " * actnames - Activity names corresponding to the integer activity IDs " " * feat - Feature matrix of 60 features for 24,075 observations " " * featlabels - Labels of the 60 features " " " " The Sensor HAR (human activity recognition) App [1] was used to create " " the humanactivity data set. When measuring the raw acceleration data with " " this app, a person placed a smartphone in a pocket so that the smartphone " " was upside down and the screen faced toward the person. The software then " " calibrated the measured raw data accordingly and extracted the 60 " " features from the calibrated data. For details about the calibration and " " feature extraction, see [2] and [3], respectively. " " " " [1] El Helou, A. Sensor HAR recognition App. MathWorks File Exchange " " http://www.mathworks.com/matlabcentral/fileexchange/54138-sensor-har-recognition-app " " [2] STMicroelectronics, AN4508 Application note. “Parameters and " " calibration of a low-g 3-axis accelerometer.” 2014. " " [3] El Helou, A. Sensor Data Analytics. MathWorks File Exchange " " https://www.mathworks.com/matlabcentral/fileexchange/54139-sensor-data-analytics--french-we
The data set is organized by activity type. To better represent a random set of data, shuffle the rows. n = numel(actid); % Number of data points rng default % For reproducibility idx = randsample(n,n); % Shuffle X = feat(idx,:); % Shuffled data actid = actid(idx); % Shuffled labels
Associate the activities with the labels in actid. activities = ["Sitting";"Standing";"Walking";"Running";"Dancing"]; activity = activities(actid);
Reduce Dimension of Data to Two Obtain two-dimensional analogues of the data clusters using t-SNE. To save time on this relatively large data set, use the Barnes-Hut variant of the t-SNE algorithm. rng default % For reproducibility Y = tsne(X,Algorithm="barneshut");
Display the result, colored with the correct labels. figure numGroups = length(unique(actid)); clr = hsv(numGroups); gscatter(Y(:,1),Y(:,2),activity,clr)
t-SNE creates clusters of points based solely on their relative similarities. The clusters are not very well separated in this view. Increase Perplexity To obtain better separation between data clusters, try setting the Perplexity parameter to 300. rng default % for reproducibility Y = tsne(X,Algorithm="barneshut",Perplexity=300);
figure gscatter(Y(:,1),Y(:,2),activity,clr)
With the current settings, most of the clusters look better separated and structured. The sitting cluster comes in a few pieces, but these pieces are well-defined. The standing cluster is in two nearly circular pieces with very little data (colors) mixed in from other clusters. The walking cluster is one piece with a small admixture of colors from other activities. The dancing and running data are not separated from each other, but are mainly separated from the other data. This lack of separation means running and dancing are not easily distinguishable; perhaps this result is not surprising. Reduce Dimension of Data to Three t-SNE can also reduce the data to three dimensions. Set the tsne 'NumDimensions' argument to 3. rng default % for fair comparison Y3 = tsne(X,Algorithm="barneshut",Perplexity=300,NumDimensions=3); figure scatter3(Y3(:,1),Y3(:,2),Y3(:,3),15,clr(actid,:),'filled'); view(61,51)
The clusters seem pretty well separated, with the exception of running and dancing. By rotating the 3-D plot, you can see that running and dancing are more easily distinguished in 3-D than in 2-D.
See Also
Related Examples
• "tsne Settings" on page 16-114
More About
• "t-SNE" on page 16-101
tsne Settings This example shows the effects of various tsne settings. Load and Examine Data Load the humanactivity data, which is available when you run this example. load humanactivity
View a description of the data. Description
Description = 29×1 string " === Human Activity Data === " " " " The humanactivity data set contains 24,075 observations of five different " " physical human activities: Sitting, Standing, Walking, Running, and " " Dancing. Each observation has 60 features extracted from acceleration " " data measured by smartphone accelerometer sensors. The data set contains " " the following variables: " " " " * actid - Response vector containing the activity IDs in integers: 1, 2, " " 3, 4, and 5 representing Sitting, Standing, Walking, Running, and " " Dancing, respectively " " * actnames - Activity names corresponding to the integer activity IDs " " * feat - Feature matrix of 60 features for 24,075 observations " " * featlabels - Labels of the 60 features " " " " The Sensor HAR (human activity recognition) App [1] was used to create " " the humanactivity data set. When measuring the raw acceleration data with " " this app, a person placed a smartphone in a pocket so that the smartphone " " was upside down and the screen faced toward the person. The software then " " calibrated the measured raw data accordingly and extracted the 60 " " features from the calibrated data. For details about the calibration and " " feature extraction, see [2] and [3], respectively. " " " " [1] El Helou, A. Sensor HAR recognition App. MathWorks File Exchange " " http://www.mathworks.com/matlabcentral/fileexchange/54138-sensor-har-recognition-app " " [2] STMicroelectronics, AN4508 Application note. “Parameters and " " calibration of a low-g 3-axis accelerometer.” 2014. " " [3] El Helou, A. Sensor Data Analytics. MathWorks File Exchange " " https://www.mathworks.com/matlabcentral/fileexchange/54139-sensor-data-analytics--french-we
The data set is organized by activity type. To better represent a random set of data, shuffle the rows. n = numel(actid); % Number of data points rng default % For reproducibility idx = randsample(n,n); % Shuffle X = feat(idx,:); % Shuffled data actid = actid(idx); % Shuffled labels
Associate the activities with the labels in actid. 16-114
tsne Settings
activities = ["Sitting";"Standing";"Walking";"Running";"Dancing"]; activity = activities(actid);
Process Data Using t-SNE Obtain two-dimensional analogs of the data clusters using t-SNE. rng default % For reproducibility Y = tsne(X); figure numGroups = length(unique(actid)); clr = hsv(numGroups); gscatter(Y(:,1),Y(:,2),activity,clr) title('Default Figure')
t-SNE creates a figure with relatively few data points that seem misplaced. However, the clusters are not very well separated. Perplexity Try altering the perplexity setting to see the effect on the figure. rng default % For fair comparison Y300 = tsne(X,Perplexity=300); figure gscatter(Y300(:,1),Y300(:,2),activity,clr) title('Perplexity 300')
rng default % For fair comparison Y4 = tsne(X,Perplexity=4); figure gscatter(Y4(:,1),Y4(:,2),activity,clr) title('Perplexity 4')
Setting the perplexity to 300 gives a figure that has better-separated clusters than the original figure. Setting the perplexity to 4 gives a figure without well separated clusters. For the remainder of this example, use a perplexity of 300. Exaggeration Try altering the exaggeration setting to see the effect on the figure. rng default % For fair comparison YEX20 = tsne(X,Perplexity=300,Exaggeration=20); figure gscatter(YEX20(:,1),YEX20(:,2),activity,clr) title('Exaggeration 20')
rng default % For fair comparison YEx15 = tsne(X,Perplexity=300,Exaggeration=1.5); figure gscatter(YEx15(:,1),YEx15(:,2),activity,clr) title('Exaggeration 1.5')
While the exaggeration setting has an effect on the figure, it is not clear whether any nondefault setting gives a better picture than the default setting. The figure with an exaggeration of 20 is similar to the default figure, except the clusters are not quite as well separated. In general, a larger exaggeration creates more empty space between embedded clusters. An exaggeration of 1.5 gives a figure similar to the default exaggeration. Exaggerating the values in the joint distribution of X makes the values in the joint distribution of Y smaller. This makes it much easier for the embedded points to move relative to one another. Learning Rate Try altering the learning rate setting to see the effect on the figure. rng default % For fair comparison YL5 = tsne(X,Perplexity=300,LearnRate=5); figure gscatter(YL5(:,1),YL5(:,2),activity,clr) title('Learning Rate 5')
rng default % For fair comparison YL2000 = tsne(X,Perplexity=300,LearnRate=2000); figure gscatter(YL2000(:,1),YL2000(:,2),activity,clr) title('Learning Rate 2000')
The figure with a learning rate of 5 has several clusters that split into two or more pieces. This shows that if the learning rate is too small, the minimization process can get stuck in a bad local minimum. A learning rate of 2000 gives a figure similar to the figure with no setting of the learning rate.
Initial Behavior with Various Settings
Large learning rates or large exaggeration values can lead to undesirable initial behavior. To see this, set large values of these parameters and set NumPrint and Verbose to 1 to show all the iterations. Stop the iterations after 10, as the goal of this experiment is simply to look at the initial behavior. Begin by setting the exaggeration to 5000.
rng default % For fair comparison
opts = statset(MaxIter=10);
YEX5000 = tsne(X,Perplexity=300,Exaggeration=5000,...
    NumPrint=1,Verbose=1,Options=opts);
|==============================================|
| ITER | KL DIVERGENCE | NORM GRAD USING |
| | FUN VALUE USING | EXAGGERATED DIST|
| | EXAGGERATED DIST| OF X |
| | OF X | |
|==============================================|
| 1 | 6.388137e+04 | 6.483115e-04 |
| 2 | 6.388775e+04 | 5.267770e-01 |
| 3 | 7.131506e+04 | 5.754291e-02 |
| 4 | 7.234772e+04 | 6.705418e-02 |
| 5 | 7.409144e+04 | 9.278330e-02 |
| 6 | 7.484659e+04 | 1.022587e-01 |
| 7 | 7.445701e+04 | 9.934864e-02 |
| 8 | 7.391345e+04 | 9.633570e-02 |
| 9 | 7.315999e+04 | 1.027610e-01 |
| 10 | 7.265936e+04 | 1.033174e-01 |
The Kullback-Leibler divergence increases during the first few iterations, and the norm of the gradient increases as well. To see the final result of the embedding, allow the algorithm to run to completion using the default stopping criteria. rng default % For fair comparison YEX5000 = tsne(X,Perplexity=300,Exaggeration=5000); figure gscatter(YEX5000(:,1),YEX5000(:,2),activity,clr) title('Exaggeration 5000')
This exaggeration value does not give a clean separation into clusters. Show the initial behavior when the learning rate is 1,000,000. rng default % For fair comparison YL1000k = tsne(X,Perplexity=300,LearnRate=1e6,... NumPrint=1,Verbose=1,Options=opts); |==============================================| | ITER | KL DIVERGENCE | NORM GRAD USING |
| | FUN VALUE USING | EXAGGERATED DIST| | | EXAGGERATED DIST| OF X | | | OF X | | |==============================================| | 1 | 2.258150e+01 | 4.412730e-07 | | 2 | 2.259045e+01 | 4.857725e-04 | | 3 | 2.945552e+01 | 3.210405e-05 | | 4 | 2.976546e+01 | 4.337510e-05 | | 5 | 2.976928e+01 | 4.626810e-05 | | 6 | 2.969205e+01 | 3.907617e-05 | | 7 | 2.963695e+01 | 4.943976e-05 | | 8 | 2.960336e+01 | 4.572338e-05 | | 9 | 2.956194e+01 | 6.208571e-05 | | 10 | 2.952132e+01 | 5.253798e-05 |
Again, the Kullback-Leibler divergence increases during the first few iterations, and the norm of the gradient increases as well. To see the final result of the embedding, allow the algorithm to run to completion using the default stopping criteria. rng default % For fair comparison YL1000k = tsne(X,Perplexity=300,LearnRate=1e6); figure gscatter(YL1000k(:,1),YL1000k(:,2),activity,clr) title('Learning Rate 1,000,000')
The learning rate is far too large, and gives no useful embedding.
Conclusion
tsne with default settings does a good job of embedding the high-dimensional initial data into two-dimensional points that have well-defined clusters. Increasing the perplexity gives a better-looking embedding with this data. The effects of algorithm settings are difficult to predict. Sometimes they can improve the clustering, but typically the default settings seem good. While speed is not part of this investigation, settings can affect the speed of the algorithm. In particular, the default Barnes-Hut algorithm is notably faster on this data.
See Also
Related Examples
• "Visualize High-Dimensional Data Using t-SNE" on page 16-110
• "t-SNE Output Function" on page 16-107
More About
• "t-SNE" on page 16-101
External Websites
• How to Use t-SNE Effectively
Feature Extraction
In this section...
"What Is Feature Extraction?" on page 16-125
"Sparse Filtering Algorithm" on page 16-125
"Reconstruction ICA Algorithm" on page 16-127
What Is Feature Extraction?
Feature extraction is a set of methods that map input features to new output features. Many feature extraction methods use unsupervised learning to extract features. Unlike some feature extraction methods such as PCA and NNMF, the methods described in this section can increase dimensionality (and decrease dimensionality). Internally, the methods involve optimizing nonlinear objective functions. For details, see "Sparse Filtering Algorithm" on page 16-125 or "Reconstruction ICA Algorithm" on page 16-127.
One typical use of feature extraction is finding features in images. Using these features can lead to improved classification accuracy. For an example, see "Feature Extraction Workflow" on page 16-130. Another typical use is extracting individual signals from superpositions, which is often termed blind source separation. For an example, see "Extract Mixed Signals" on page 16-150.
There are two feature extraction functions: rica and sparsefilt. Associated with these functions are the objects that they create: ReconstructionICA and SparseFiltering.
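For instance, a minimal sketch (with made-up data; the number of features and the iteration limit are arbitrary choices):
rng default                                        % for reproducibility
X = randn(200,20);                                 % hypothetical input data
Mdl = sparsefilt(X,5,'IterationLimit',50);         % learn 5 output features
F = transform(Mdl,X);                              % 200-by-5 extracted features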
Sparse Filtering Algorithm
The sparse filtering algorithm begins with a data matrix X that has n rows and p columns. Each row represents one observation and each column represents one measurement. The columns are also called the features or predictors. The algorithm then takes either an initial random p-by-q weight matrix W or uses the weight matrix passed in the InitialTransformWeights name-value pair. q is the requested number of features that sparsefilt computes.
The algorithm attempts to minimize the "Sparse Filtering Objective Function" on page 16-126 by using a standard limited memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) quasi-Newton optimizer. See Nocedal and Wright [2]. This optimizer takes up to IterationLimit iterations. It stops iterating earlier when it takes a step whose norm is less than StepTolerance, or when it computes that the norm of the gradient at the current point is less than GradientTolerance times a scalar τ, where
τ = max(1, min(|f|, ‖g0‖∞)).
|f| is the norm of the objective function, and ‖g0‖∞ is the infinity norm of the initial gradient.
The objective function attempts to simultaneously obtain few nonzero features for each data point, and for each resulting feature to have nearly equal weight. To understand how the objective function attempts to achieve these goals, see Ngiam, Koh, Chen, Bhaskar, and Ng [1]. Frequently, you obtain good features by setting a relatively small value of IterationLimit, from as low as 5 to a few hundred. Allowing the optimizer to continue can result in overtraining, where the extracted features do not generalize well to new data.
After constructing a SparseFiltering object, use the transform method to map input data to the new output features.

Sparse Filtering Objective Function

To compute an objective function, the sparse filtering algorithm uses the following steps. The objective function depends on the n-by-p data matrix X and a weight matrix W that the optimizer varies. The weight matrix W has dimensions p-by-q, where p is the number of original features and q is the number of requested features.

1. Compute the n-by-q matrix X*W. Apply the approximate absolute value function φ(u) = sqrt(u^2 + 10^{-8}) to each element of X*W to obtain the matrix F. φ is a smooth nonnegative symmetric function that closely approximates the absolute value function.

2. Normalize the columns of F by the approximate L2 norm. In other words, define the normalized matrix F̃(i,j) by

   ‖F(j)‖ = sqrt( Σ_{i=1}^{n} F(i,j)^2 + 10^{-8} )

   F̃(i,j) = F(i,j) / ‖F(j)‖.

3. Normalize the rows of F̃(i,j) by the approximate L2 norm. In other words, define the normalized matrix F̂(i,j) by

   ‖F̃(i)‖ = sqrt( Σ_{j=1}^{q} F̃(i,j)^2 + 10^{-8} )

   F̂(i,j) = F̃(i,j) / ‖F̃(i)‖.

   The matrix F̂ is the matrix of converted features in X. Once sparsefilt finds the weights W that minimize the objective function h (see below), which the function stores in the output object Mdl in the Mdl.TransformWeights property, the transform function can follow the same transformation steps to convert new data to output features.

4. Compute the objective function h(W) as the 1-norm of the matrix F̂(i,j), meaning the sum of all the elements in the matrix (which are nonnegative by construction):

   h(W) = Σ_{j=1}^{q} Σ_{i=1}^{n} F̂(i,j).

5. If you set the Lambda name-value pair to a strictly positive value, sparsefilt uses the following modified objective function:

   h(W) = Σ_{j=1}^{q} Σ_{i=1}^{n} F̂(i,j) + λ Σ_{j=1}^{q} w_j^T w_j.

Here, w_j is the jth column of the matrix W and λ is the value of Lambda. The effect of this term is to shrink the weights W. If you plot the columns of W as images, with positive Lambda these images appear smooth compared to the same images with zero Lambda.
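The five steps above translate almost line for line into MATLAB. The following is an illustrative sketch of the objective computation, not the internal implementation of sparsefilt; X, W, and lambda are assumed to be a data matrix, a p-by-q weight matrix, and a nonnegative scalar already in the workspace.

% Illustrative sketch of the sparse filtering objective h(W).
F = sqrt((X*W).^2 + 1e-8);            % step 1: smooth approximate absolute value of X*W
F = F ./ sqrt(sum(F.^2,1) + 1e-8);    % step 2: normalize each column by its approximate L2 norm
F = F ./ sqrt(sum(F.^2,2) + 1e-8);    % step 3: normalize each row by its approximate L2 norm
h = sum(F(:));                        % step 4: sum of all (nonnegative) normalized features
h = h + lambda*sum(sum(W.^2));        % step 5: weight-shrinkage term when Lambda > 0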
Reconstruction ICA Algorithm

The Reconstruction Independent Component Analysis (RICA) algorithm is based on minimizing an objective function. The algorithm maps input data to output features.

The ICA source model is the following. Each observation x is generated by a random vector s according to

x = μ + As.

• x is a column vector of length p.
• μ is a column vector of length p representing a constant term.
• s is a column vector of length q whose elements are zero mean, unit variance random variables that are statistically independent of each other.
• A is a mixing matrix of size p-by-q.

You can use this model in rica to estimate A from observations of x. See “Extract Mixed Signals” on page 16-150.

The RICA algorithm begins with a data matrix X that has n rows and p columns consisting of the observations x_i:

X = [x_1^T; x_2^T; … ; x_n^T].

Each row represents one observation and each column represents one measurement. The columns are also called the features or predictors. The algorithm then takes either an initial random p-by-q weight matrix W or uses the weight matrix passed in the InitialTransformWeights name-value pair. q is the requested number of features that rica computes. The weight matrix W is composed of columns w_i of size p-by-1:

W = [w_1 w_2 … w_q].

The algorithm attempts to minimize the “Reconstruction ICA Objective Function” on page 16-128 by using a standard limited memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) quasi-Newton optimizer. See Nocedal and Wright [2]. This optimizer takes up to IterationLimit iterations. It stops iterating when it takes a step whose norm is less than StepTolerance, or when it computes that the norm of the gradient at the current point is less than GradientTolerance times a scalar τ, where

τ = max(1, min(|f|, ‖g₀‖_∞)).

|f| is the norm of the objective function, and ‖g₀‖_∞ is the infinity norm of the initial gradient.
The objective function attempts to obtain a nearly orthonormal weight matrix that minimizes the sum of elements of g(XW), where g is a function (described below) that is applied elementwise to XW. To understand how the objective function attempts to achieve these goals, see Le, Karpenko, Ngiam, and Ng [3].
After constructing a ReconstructionICA object, use the transform method to map input data to the new output features.

Reconstruction ICA Objective Function

The objective function uses a contrast function, which you specify by using the ContrastFcn name-value pair. The contrast function is a smooth convex function that is similar to an absolute value. By default, the contrast function is g(x) = (1/2) log(cosh(2x)). For other available contrast functions, see ContrastFcn.

For an n-by-p data matrix X and q output features, with a regularization parameter λ as the value of the Lambda name-value pair, the objective function in terms of the p-by-q matrix W is

h = (λ/n) Σ_{i=1}^{n} ‖W W^T x_i − x_i‖_2^2 + (1/n) Σ_{i=1}^{n} Σ_{j=1}^{q} σ_j g(w_j^T x_i).

The σ_j are known constants that are ±1. When σ_j = +1, minimizing the objective function h encourages the histogram of w_j^T x_i to be sharply peaked at 0 (super-Gaussian). When σ_j = −1, minimizing the objective function h encourages the histogram of w_j^T x_i to be flatter near 0 (sub-Gaussian). Specify the σ_j values using the rica NonGaussianityIndicator name-value pair.

The objective function h can have a spurious minimum of zero when λ is zero. Therefore, rica minimizes h over W that are normalized to 1. In other words, each column w_j of W is defined in terms of a column vector v_j by

w_j = v_j / sqrt(v_j^T v_j + 10^{-8}).
rica minimizes over the vj. The resulting minimal matrix W provides the transformation from input data X to output features XW.
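As with sparse filtering, the objective can be written compactly in MATLAB. The sketch below is illustrative only and is not the internal rica code; X, W, lambda, and sigma (a q-by-1 vector of ±1 values, the NonGaussianityIndicator) are assumed to exist in the workspace, and g is the default contrast function.

% Illustrative sketch of the RICA objective h for given X, W, lambda, sigma.
g = @(t) 0.5*log(cosh(2*t));                    % default contrast function
n = size(X,1);
R = X*(W*W') - X;                               % row i equals (W*W'*x_i - x_i)'
recon    = (lambda/n)*sum(sum(R.^2));           % reconstruction penalty
contrast = (1/n)*sum(sum(g(X*W) .* sigma'));    % sum over i and j of sigma_j*g(w_j'*x_i)
h = recon + contrast;

% During optimization, each column of W is kept normalized:
% w_j = v_j / sqrt(v_j'*v_j + 1e-8) for an unconstrained vector v_j.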
References

[1] Ngiam, Jiquan, Zhenghao Chen, Sonia A. Bhaskar, Pang W. Koh, and Andrew Y. Ng. “Sparse Filtering.” Advances in Neural Information Processing Systems. Vol. 24, 2011, pp. 1125–1133. https://papers.nips.cc/paper/4334-sparse-filtering.pdf.

[2] Nocedal, J. and S. J. Wright. Numerical Optimization, Second Edition. Springer Series in Operations Research, Springer Verlag, 2006.

[3] Le, Quoc V., Alexandre Karpenko, Jiquan Ngiam, and Andrew Y. Ng. “ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning.” Advances in Neural Information Processing Systems. Vol. 24, 2011, pp. 1017–1025. https://papers.nips.cc/paper/4467-ica-with-reconstruction-cost-for-efficient-overcomplete-feature-learning.pdf.
See Also rica | sparsefilt | ReconstructionICA | SparseFiltering

Related Examples
• “Feature Extraction Workflow” on page 16-130
• “Extract Mixed Signals” on page 16-150
Feature Extraction Workflow This example shows a complete workflow for feature extraction. The example uses the humanactivity data set, which has 60 predictors and tens of thousands of data samples. Load and Examine Data Load the humanactivity data, which is available when you run this example. load humanactivity
View a description of the data. Description
Description = 29×1 string " === Human Activity Data === " " " " The humanactivity data set contains 24,075 observations of five different " " physical human activities: Sitting, Standing, Walking, Running, and " " Dancing. Each observation has 60 features extracted from acceleration " " data measured by smartphone accelerometer sensors. The data set contains " " the following variables: " " " " * actid - Response vector containing the activity IDs in integers: 1, 2, " " 3, 4, and 5 representing Sitting, Standing, Walking, Running, and " " Dancing, respectively " " * actnames - Activity names corresponding to the integer activity IDs " " * feat - Feature matrix of 60 features for 24,075 observations " " * featlabels - Labels of the 60 features " " " " The Sensor HAR (human activity recognition) App [1] was used to create " " the humanactivity data set. When measuring the raw acceleration data with " " this app, a person placed a smartphone in a pocket so that the smartphone " " was upside down and the screen faced toward the person. The software then " " calibrated the measured raw data accordingly and extracted the 60 " " features from the calibrated data. For details about the calibration and " " feature extraction, see [2] and [3], respectively. " " " " [1] El Helou, A. Sensor HAR recognition App. MathWorks File Exchange " " http://www.mathworks.com/matlabcentral/fileexchange/54138-sensor-har-recognition-app " " [2] STMicroelectronics, AN4508 Application note. “Parameters and " " calibration of a low-g 3-axis accelerometer.” 2014. " " [3] El Helou, A. Sensor Data Analytics. MathWorks File Exchange " " https://www.mathworks.com/matlabcentral/fileexchange/54139-sensor-data-analytics--french-we
The data set is organized by activity type. To better represent a random set of data, shuffle the rows. n = numel(actid); % Number of data points rng(1) % For reproducibility idx = randsample(n,n); % Shuffle X = feat(idx,:); % The corresponding labels are in actid(idx) Labels = actid(idx);
View the activities and corresponding labels.
tbl = table(["1";"2";"3";"4";"5"],...
    ["Sitting";"Standing";"Walking";"Running";"Dancing"],...
    'VariableNames',{'Label' 'Activity'});
disp(tbl)

    Label     Activity
    _____    __________
    "1"      "Sitting"
    "2"      "Standing"
    "3"      "Walking"
    "4"      "Running"
    "5"      "Dancing"
Set up the data for cross-validation. Use cvpartition to create training and validation sets from the data. c = cvpartition(n,"HoldOut",0.1); idxtrain = training(c); Xtrain = X(idxtrain,:); LabelTrain = Labels(idxtrain); idxtest = test(c); Xtest = X(idxtest,:); LabelTest = Labels(idxtest);
Choose New Feature Dimensions There are several considerations in choosing the number of features to extract: • More features use more memory and computational time. • Fewer features can produce a poor classifier. To begin, choose 5 features. Later you will see the effects of using more features. q = 5;
Extract Features There are two feature extraction functions, sparsefilt and rica. Begin with the sparsefilt function. Set the number of iterations to 10 so that the extraction does not take too long. Typically, you get good results by running the sparsefilt algorithm for a few iterations to a few hundred iterations. Running the algorithm for too many iterations can lead to decreased classification accuracy, a type of overfitting problem. Use sparsefilt to obtain the sparse filtering model while using 10 iterations. tic Mdl = sparsefilt(Xtrain,q,'IterationLimit',10); Warning: Solver LBFGS was not able to converge to a solution. toc Elapsed time is 0.116473 seconds.
sparsefilt warns that the internal LBFGS optimizer did not converge. The optimizer did not converge, at least in part because you set the iteration limit to 10. Nevertheless, you can use the result to train a classifier.
Create Classifier Transform the original data into the new feature representation. NewX = transform(Mdl,Xtrain);
Train a linear classifier based on the transformed data and the correct classification labels in LabelTrain. The accuracy of the learned model is sensitive to the fitcecoc regularization parameter Lambda. Try to find the best value for Lambda by using the OptimizeHyperparameters name-value pair. Be aware that this optimization takes time. If you have a Parallel Computing Toolbox™ license, use parallel computing for faster execution. If you don't have a parallel license, remove the UseParallel calls before running this script. t = templateLinear('Solver','lbfgs'); options = struct('UseParallel',true); tic Cmdl = fitcecoc(NewX,LabelTrain,Learners=t, ... OptimizeHyperparameters="auto",... HyperparameterOptimizationOptions=options);
Copying objective function to workers... Done copying objective function to workers. |================================================================================================ | Iter | Active | Eval | Objective | Objective | BestSoFar | BestSoFar | Coding | | workers | result | | runtime | (observed) | (estim.) | |================================================================================================ | 1 | 6 | Best | 0.24372 | 4.6976 | 0.24372 | 0.24372 | onevsall | 2 | 6 | Accept | 0.56743 | 6.3649 | 0.24372 | 0.40007 | onevsone | 3 | 6 | Best | 0.047905 | 5.6082 | 0.047905 | 0.18708 | onevsall | 4 | 5 | Best | 0.044259 | 5.9896 | 0.044259 | 0.044417 | onevsone | 5 | 5 | Accept | 0.27949 | 5.8778 | 0.044259 | 0.044417 | onevsall | 6 | 6 | Accept | 0.27072 | 1.4726 | 0.044259 | 0.046079 | onevsall | 7 | 6 | Accept | 0.044951 | 2.6811 | 0.044259 | 0.045078 | onevsone | 8 | 6 | Accept | 0.047074 | 2.4291 | 0.044259 | 0.043721 | onevsall | 9 | 5 | Accept | 0.048782 | 2.2619 | 0.044259 | 0.045714 | onevsall | 10 | 5 | Accept | 0.74128 | 1.3176 | 0.044259 | 0.045714 | onevsone | 11 | 4 | Accept | 0.047951 | 9.4874 | 0.044259 | 0.039012 | onevsall | 12 | 4 | Accept | 0.044259 | 2.4651 | 0.044259 | 0.039012 | onevsone | 13 | 6 | Accept | 0.047305 | 2.1181 | 0.044259 | 0.04016 | onevsall | 14 | 6 | Accept | 0.13762 | 1.563 | 0.044259 | 0.039295 | onevsall | 15 | 6 | Best | 0.044213 | 2.2286 | 0.044213 | 0.040603 | onevsone | 16 | 6 | Best | 0.044028 | 2.1986 | 0.044028 | 0.042724 | onevsone | 17 | 6 | Accept | 0.047397 | 2.2898 | 0.044028 | 0.042701 | onevsall | 18 | 6 | Best | 0.043982 | 2.2335 | 0.043982 | 0.04234 | onevsone | 19 | 6 | Accept | 0.044074 | 4.8905 | 0.043982 | 0.0426 | onevsone | 20 | 5 | Accept | 0.049889 | 4.6319 | 0.043982 | 0.042666 | onevsall |================================================================================================ | Iter | Active | Eval | Objective | Objective | BestSoFar | BestSoFar | Coding | | workers | result | | runtime | (observed) | (estim.) | |================================================================================================ | 21 | 5 | Accept | 0.044074 | 2.3749 | 0.043982 | 0.042666 | onevsone | 22 | 6 | Accept | 0.047397 | 5.9207 | 0.043982 | 0.042595 | onevsall | 23 | 6 | Accept | 0.044074 | 2.0958 | 0.043982 | 0.042706 | onevsone | 24 | 6 | Accept | 0.74128 | 1.3176 | 0.043982 | 0.042866 | onevsall | 25 | 5 | Accept | 0.044074 | 2.3518 | 0.043982 | 0.043111 | onevsone | 26 | 5 | Accept | 0.04412 | 2.2526 | 0.043982 | 0.043111 | onevsone | 27 | 6 | Accept | 0.044397 | 4.4049 | 0.043982 | 0.04312 | onevsone | 28 | 6 | Accept | 0.33889 | 1.6083 | 0.043982 | 0.043246 | onevsone
| 29 | 6 | Accept | 0.048689 | 5.0386 | 0.043982 | 0.043249 | onevsall
| 30 | 5 | Accept | 0.047397 | 6.426 | 0.043982 | 0.043243 | onevsall
| 31 | 5 | Accept | 0.1882 | 1.4843 | 0.043982 | 0.043243 | onevsall

__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 31
Total elapsed time: 23.9334 seconds
Total objective function evaluation time: 108.0825

Best observed feasible point:
    Coding        Lambda      Learner
    ________    __________    _______
    onevsone    4.8906e-08      svm

Observed objective function value = 0.043982
Estimated objective function value = 0.043668
Function evaluation time = 2.2335

Best estimated feasible point (according to models):
    Coding        Lambda      Learner
    ________    __________    _______
    onevsone    1.6131e-07      svm

Estimated objective function value = 0.043243
Estimated function evaluation time = 2.4171

toc

Elapsed time is 25.690360 seconds.
Evaluate Classifier Check the error of the classifier when applied to test data. TestX = transform(Mdl,Xtest); Loss = loss(Cmdl,TestX,LabelTest) Loss = 0.0489
Did this transformation result in a better classifier than one trained on the original data? Create a classifier based on the original training data and evaluate its loss. tic Omdl = fitcecoc(Xtrain,LabelTrain,Learners=t, ... OptimizeHyperparameters="auto",... HyperparameterOptimizationOptions=options); Copying objective function to workers...
Warning: Files that have already been attached are being ignored. To see which files are attached
Done copying objective function to workers. |================================================================================================ | Iter | Active | Eval | Objective | Objective | BestSoFar | BestSoFar | Coding | | workers | result | | runtime | (observed) | (estim.) | |================================================================================================ | 1 | 6 | Best | 0.035259 | 5.5107 | 0.035259 | 0.035259 | onevsone | 2 | 6 | Best | 0.021829 | 8.7406 | 0.021829 | 0.022507 | onevsone | 3 | 6 | Accept | 0.03729 | 13.605 | 0.021829 | 0.021838 | onevsall | 4 | 6 | Accept | 0.022383 | 8.5819 | 0.021829 | 0.021836 | onevsone | 5 | 6 | Accept | 0.024045 | 7.7692 | 0.021829 | 0.02255 | onevsone | 6 | 6 | Accept | 0.040198 | 8.308 | 0.021829 | 0.022615 | onevsall | 7 | 6 | Best | 0.021553 | 7.829 | 0.021553 | 0.0221 | onevsone | 8 | 6 | Accept | 0.021829 | 7.6416 | 0.021553 | 0.021892 | onevsone | 9 | 6 | Best | 0.021506 | 9.0909 | 0.021506 | 0.021558 | onevsone | 10 | 6 | Best | 0.019937 | 33.993 | 0.019937 | 0.019992 | onevsone | 11 | 6 | Accept | 0.021091 | 8.4908 | 0.019937 | 0.019984 | onevsone | 12 | 6 | Accept | 0.019937 | 46.814 | 0.019937 | 0.019961 | onevsone | 13 | 6 | Accept | 0.022152 | 6.9556 | 0.019937 | 0.019965 | onevsone | 14 | 6 | Accept | 0.039275 | 7.372 | 0.019937 | 0.019897 | onevsone | 15 | 6 | Best | 0.019799 | 53.215 | 0.019799 | 0.019695 | onevsone | 16 | 6 | Accept | 0.022568 | 8.594 | 0.019799 | 0.019662 | onevsone | 17 | 6 | Accept | 0.020906 | 20.329 | 0.019799 | 0.019877 | onevsone | 18 | 6 | Accept | 0.023814 | 43.531 | 0.019799 | 0.019875 | onevsall | 19 | 6 | Accept | 0.020353 | 45.355 | 0.019799 | 0.020057 | onevsone | 20 | 6 | Accept | 0.020214 | 46.906 | 0.019799 | 0.020048 | onevsone |================================================================================================ | Iter | Active | Eval | Objective | Objective | BestSoFar | BestSoFar | Coding | | workers | result | | runtime | (observed) | (estim.) | |================================================================================================ | 21 | 6 | Best | 0.019614 | 51.847 | 0.019614 | 0.02 | onevsone | 22 | 6 | Accept | 0.020168 | 49.186 | 0.019614 | 0.01991 | onevsone
| 23 | 6 | Accept | 0.022845 | 130.99 | 0.019614 | 0.019909 | onevsall
| 24 | 6 | Accept | 0.024691 | 45.677 | 0.019614 | 0.01991 | onevsall
| 25 | 6 | Accept | 0.020629 | 46.99 | 0.019614 | 0.019901 | onevsone
| 26 | 6 | Accept | 0.02026 | 47.407 | 0.019614 | 0.019899 | onevsone
| 27 | 6 | Accept | 0.020168 | 48.879 | 0.019614 | 0.019912 | onevsone
| 28 | 6 | Accept | 0.02026 | 46.764 | 0.019614 | 0.019909 | onevsone
| 29 | 6 | Accept | 0.02003 | 46.455 | 0.019614 | 0.019916 | onevsone
| 30 | 6 | Accept | 0.024229 | 8.5617 | 0.019614 | 0.019905 | onevsone

__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 190.1928 seconds
Total objective function evaluation time: 921.392

Best observed feasible point:
    Coding        Lambda      Learner
    ________    __________    ________
    onevsone    1.5863e-06    logistic

Observed objective function value = 0.019614
Estimated objective function value = 0.019832
Function evaluation time = 51.8468

Best estimated feasible point (according to models):
    Coding       Lambda      Learner
    ________    _________    ________
    onevsone    3.644e-07    logistic

Estimated objective function value = 0.019905
Estimated function evaluation time = 51.7579

toc

Elapsed time is 195.893143 seconds.

Losso = loss(Omdl,Xtest,LabelTest)

Losso = 0.0177
The classifier based on sparse filtering has a higher loss than the classifier based on the original data. However, the classifier uses only 5 features rather than the 60 features in the original data, and is much faster to create. Try to make a better sparse filtering classifier by increasing q from 5 to 20, which is still less than the 60 features in the original data. q = 20; Mdl2 = sparsefilt(Xtrain,q,'IterationLimit',10); Warning: Solver LBFGS was not able to converge to a solution. NewX = transform(Mdl2,Xtrain); TestX = transform(Mdl2,Xtest); tic Cmdl = fitcecoc(NewX,LabelTrain,Learners=t, ... OptimizeHyperparameters="auto",... HyperparameterOptimizationOptions=options); Copying objective function to workers...
Warning: Files that have already been attached are being ignored. To see which files are attached
Done copying objective function to workers. |================================================================================================ | Iter | Active | Eval | Objective | Objective | BestSoFar | BestSoFar | Coding | | workers | result | | runtime | (observed) | (estim.) | |================================================================================================ | 1 | 6 | Best | 0.12147 | 1.2383 | 0.12147 | 0.12147 | onevsall | 2 | 6 | Accept | 0.74128 | 1.4354 | 0.12147 | 0.43121 | onevsall | 3 | 6 | Accept | 0.74128 | 1.5455 | 0.12147 | 0.53465 | onevsone | 4 | 6 | Best | 0.036136 | 3.915 | 0.036136 | 0.082195 | onevsall | 5 | 6 | Best | 0.032352 | 5.1021 | 0.032352 | 0.032396 | onevsone | 6 | 6 | Accept | 0.040151 | 3.7664 | 0.032352 | 0.032389 | onevsone | 7 | 6 | Accept | 0.045459 | 2.0148 | 0.032352 | 0.032413 | onevsall | 8 | 6 | Accept | 0.032352 | 8.7656 | 0.032352 | 0.032422 | onevsall | 9 | 6 | Accept | 0.74128 | 1.6885 | 0.032352 | 0.0324 | onevsone | 10 | 6 | Best | 0.030183 | 5.2918 | 0.030183 | 0.03016 | onevsone | 11 | 6 | Accept | 0.032583 | 11.351 | 0.030183 | 0.03017 | onevsone | 12 | 6 | Accept | 0.038213 | 2.6803 | 0.030183 | 0.030243 | onevsone | 13 | 6 | Accept | 0.032075 | 8.8183 | 0.030183 | 0.030252 | onevsall | 14 | 6 | Accept | 0.039321 | 1.7803 | 0.030183 | 0.030264 | onevsall | 15 | 6 | Accept | 0.035259 | 2.9937 | 0.030183 | 0.030191 | onevsone | 16 | 5 | Accept | 0.040521 | 5.4636 | 0.030183 | 0.030185 | onevsall | 17 | 5 | Accept | 0.036275 | 5.0559 | 0.030183 | 0.030185 | onevsone
| 18 | 6 | Accept | 0.13827 | 0.90957 | 0.030183 | 0.030183 | onevsall | 19 | 6 | Accept | 0.030598 | 4.5358 | 0.030183 | 0.030251 | onevsone | 20 | 6 | Accept | 0.058289 | 2.5512 | 0.030183 | 0.030239 | onevsall |================================================================================================ | Iter | Active | Eval | Objective | Objective | BestSoFar | BestSoFar | Coding | | workers | result | | runtime | (observed) | (estim.) | |================================================================================================ | 21 | 6 | Accept | 0.10919 | 2.3477 | 0.030183 | 0.030234 | onevsone | 22 | 6 | Accept | 0.031475 | 8.4684 | 0.030183 | 0.030235 | onevsall | 23 | 6 | Accept | 0.045274 | 3.9578 | 0.030183 | 0.030226 | onevsall | 24 | 6 | Accept | 0.045782 | 1.9671 | 0.030183 | 0.030202 | onevsone | 25 | 6 | Accept | 0.031244 | 20.475 | 0.030183 | 0.030201 | onevsone | 26 | 6 | Best | 0.029952 | 4.7633 | 0.029952 | 0.029927 | onevsone | 27 | 6 | Best | 0.029814 | 4.8472 | 0.029814 | 0.029915 | onevsone | 28 | 6 | Accept | 0.036229 | 9.8727 | 0.029814 | 0.029915 | onevsall | 29 | 6 | Accept | 0.033413 | 3.7575 | 0.029814 | 0.029921 | onevsone | 30 | 6 | Accept | 0.033967 | 22.019 | 0.029814 | 0.029923 | onevsall
__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 39.9471 seconds
Total objective function evaluation time: 163.3793

Best observed feasible point:
    Coding        Lambda      Learner
    ________    __________    _______
    onevsone    4.6362e-10      svm

Observed objective function value = 0.029814
Estimated objective function value = 0.030009
Function evaluation time = 4.8472

Best estimated feasible point (according to models):
    Coding        Lambda      Learner
    ________    __________    _______
    onevsone    6.8351e-10      svm

Estimated objective function value = 0.029923
Estimated function evaluation time = 4.94

toc

Elapsed time is 41.581666 seconds.

Loss2 = loss(Cmdl,TestX,LabelTest)

Loss2 = 0.0320
This time the classification loss is lower than for the 5 feature classifier, but is still higher than the loss for the original data classifier. Again, software takes less time to create the classifier for 20 predictors than the classifier for the full data. Try RICA Try the other feature extraction function, rica. Extract 20 features, create a classifier, and examine its loss on the test data. Use more iterations for the rica function, because rica can perform better with more iterations than sparsefilt uses. Often prior to feature extraction, you "prewhiten" the input data as a data preprocessing step. The prewhitening step includes two transforms, decorrelation and standardization, which make the predictors have zero mean and identity covariance. rica supports only the standardization transform. You use the Standardize name-value pair argument to make the predictors have zero mean and unit variance. Alternatively, you can transform images for contrast normalization individually by applying the zscore transformation before calling sparsefilt or rica. Mdl3 = rica(Xtrain,q,'IterationLimit',400,'Standardize',true); Warning: Solver LBFGS was not able to converge to a solution. NewX = transform(Mdl3,Xtrain); TestX = transform(Mdl3,Xtest); tic Cmdl = fitcecoc(NewX,LabelTrain,Learners=t, ... OptimizeHyperparameters="auto",... HyperparameterOptimizationOptions=options); Copying objective function to workers...
Warning: Files that have already been attached are being ignored. To see which files are attached
Done copying objective function to workers. |================================================================================================
| Iter | Active | Eval | Objective | Objective | BestSoFar | BestSoFar | Coding | | workers | result | | runtime | (observed) | (estim.) | |================================================================================================ | 1 | 6 | Best | 0.048689 | 1.7274 | 0.048689 | 0.048689 | onevsone | 2 | 6 | Accept | 0.062442 | 2.2582 | 0.048689 | 0.055562 | onevsone | 3 | 6 | Best | 0.032075 | 2.5078 | 0.032075 | 0.047435 | onevsone | 4 | 4 | Accept | 0.035998 | 3.1521 | 0.025522 | 0.040067 | onevsall | 5 | 4 | Best | 0.025522 | 3.3093 | 0.025522 | 0.040067 | onevsone | 6 | 4 | Accept | 0.035675 | 3.1199 | 0.025522 | 0.040067 | onevsall | 7 | 6 | Accept | 0.077164 | 1.6918 | 0.025522 | 0.025637 | onevsone | 8 | 6 | Accept | 0.026445 | 3.2975 | 0.025522 | 0.031634 | onevsone | 9 | 6 | Accept | 0.025568 | 3.4501 | 0.025522 | 0.027235 | onevsone | 10 | 6 | Accept | 0.025614 | 3.42 | 0.025522 | 0.025585 | onevsone | 11 | 6 | Accept | 0.025706 | 3.2351 | 0.025522 | 0.02559 | onevsone | 12 | 6 | Accept | 0.027783 | 2.7219 | 0.025522 | 0.02559 | onevsone | 13 | 6 | Accept | 0.025614 | 3.1778 | 0.025522 | 0.02559 | onevsone | 14 | 6 | Accept | 0.025568 | 3.0303 | 0.025522 | 0.02559 | onevsone | 15 | 6 | Accept | 0.027968 | 3.7776 | 0.025522 | 0.02559 | onevsone | 16 | 6 | Accept | 0.035398 | 2.7932 | 0.025522 | 0.02559 | onevsall | 17 | 6 | Accept | 0.052151 | 1.5232 | 0.025522 | 0.02559 | onevsall | 18 | 6 | Accept | 0.026214 | 11.715 | 0.025522 | 0.02559 | onevsone | 19 | 6 | Accept | 0.16042 | 1.847 | 0.025522 | 0.025488 | onevsall | 20 | 6 | Accept | 0.033783 | 12.694 | 0.025522 | 0.025591 | onevsall |================================================================================================ | Iter | Active | Eval | Objective | Objective | BestSoFar | BestSoFar | Coding | | workers | result | | runtime | (observed) | (estim.) | |================================================================================================ | 21 | 6 | Best | 0.025475 | 4.0126 | 0.025475 | 0.02559 | onevsone | 22 | 6 | Accept | 0.038351 | 4.3279 | 0.025475 | 0.02559 | onevsall | 23 | 6 | Accept | 0.12239 | 1.0595 | 0.025475 | 0.025591 | onevsall | 24 | 6 | Accept | 0.026214 | 6.1461 | 0.025475 | 0.025591 | onevsone | 25 | 6 | Accept | 0.037613 | 2.0165 | 0.025475 | 0.025591 | onevsall | 26 | 6 | Accept | 0.034936 | 2.4126 | 0.025475 | 0.02559 | onevsone | 27 | 6 | Accept | 0.034336 | 6.0477 | 0.025475 | 0.025591 | onevsall | 28 | 6 | Accept | 0.033967 | 10.413 | 0.025475 | 0.025591 | onevsall | 29 | 6 | Accept | 0.02626 | 8.3859 | 0.025475 | 0.02559 | onevsone | 30 | 6 | Accept | 0.035582 | 2.8227 | 0.025475 | 0.02559 | onevsall
__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 27.1224 seconds
Total objective function evaluation time: 122.0946

Best observed feasible point:
    Coding        Lambda      Learner
    ________    __________    _______
    onevsone    3.7723e-06      svm

Observed objective function value = 0.025475
Estimated objective function value = 0.025475
Function evaluation time = 4.0126

Best estimated feasible point (according to models):
    Coding       Lambda     Learner
    ________    _________    _______
    onevsone    4.677e-10      svm

Estimated objective function value = 0.02559
Estimated function evaluation time = 3.4172

toc

Elapsed time is 28.228099 seconds.

Loss3 = loss(Cmdl,TestX,LabelTest)

Loss3 = 0.0275
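As noted above, instead of the 'Standardize' option you could standardize the predictors yourself before calling rica or sparsefilt. The following is a minimal sketch, not part of the original example; it uses the training-set statistics to scale both partitions.

% Hypothetical alternative to 'Standardize': z-score the predictors manually.
[XtrainZ,mu,sigma] = zscore(Xtrain);           % zero mean, unit variance per column
XtestZ = (Xtest - mu)./sigma;                  % apply the same shift and scale to the test set
MdlZ = rica(XtrainZ,q,'IterationLimit',400);   % 'Standardize' no longer needed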
The rica-based classifier has similar test loss as the 20-feature sparse filtering classifier. The classifier is relatively fast to create. Try More Features The feature extraction functions have few tuning parameters. One parameter that can affect results is the number of requested features. See how well classifiers work when based on 100 features, rather than the 20 features previously tried, or the 60 features in the original data. Using more features than appear in the original data is called "overcomplete" learning. Conversely, using fewer features is called "undercomplete" learning. Overcomplete learning can lead to increased classification accuracy, while undercomplete learning can save memory and time. q = 100; Mdl4 = sparsefilt(Xtrain,q,'IterationLimit',10); Warning: Solver LBFGS was not able to converge to a solution. NewX = transform(Mdl4,Xtrain); TestX = transform(Mdl4,Xtest); tic Cmdl = fitcecoc(NewX,LabelTrain,Learners=t, ... OptimizeHyperparameters="auto",... HyperparameterOptimizationOptions=options); Copying objective function to workers...
Warning: Files that have already been attached are being ignored. To see which files are attached
Done copying objective function to workers. |================================================================================================ | Iter | Active | Eval | Objective | Objective | BestSoFar | BestSoFar | Coding | | workers | result | | runtime | (observed) | (estim.) | |================================================================================================ | 1 | 6 | Best | 0.039413 | 7.4097 | 0.039413 | 0.039413 | onevsone | 2 | 6 | Accept | 0.056258 | 8.2736 | 0.039413 | 0.047834 | onevsone | 3 | 6 | Accept | 0.050535 | 9.9138 | 0.039413 | 0.048735 | onevsall | 4 | 6 | Best | 0.033967 | 13.569 | 0.033967 | 0.034082 | onevsone | 5 | 6 | Accept | 0.039413 | 7.9397 | 0.033967 | 0.033968 | onevsone | 6 | 6 | Accept | 0.034059 | 13.408 | 0.033967 | 0.033968 | onevsone | 7 | 6 | Best | 0.033598 | 12.795 | 0.033598 | 0.033599 | onevsone | 8 | 6 | Best | 0.031752 | 32.351 | 0.031752 | 0.031753 | onevsall | 9 | 6 | Accept | 0.088841 | 4.3567 | 0.031752 | 0.031753 | onevsone | 10 | 6 | Accept | 0.032121 | 9.7001 | 0.031752 | 0.031753 | onevsone | 11 | 6 | Accept | 0.74128 | 4.2755 | 0.031752 | 0.03176 | onevsone | 12 | 6 | Accept | 0.74128 | 3.1437 | 0.031752 | 0.031767 | onevsone | 13 | 6 | Accept | 0.37322 | 2.9349 | 0.031752 | 0.03177 | onevsall | 14 | 6 | Best | 0.028891 | 20.557 | 0.028891 | 0.028907 | onevsone | 15 | 6 | Best | 0.027875 | 16.755 | 0.027875 | 0.027886 | onevsone | 16 | 6 | Best | 0.027644 | 47.389 | 0.027644 | 0.02765 | onevsall | 17 | 6 | Accept | 0.18322 | 4.166 | 0.027644 | 0.027651 | onevsall | 18 | 6 | Best | 0.019614 | 45.707 | 0.019614 | 0.019626 | onevsone | 19 | 6 | Accept | 0.18363 | 4.9399 | 0.019614 | 0.019626 | onevsall | 20 | 6 | Accept | 0.055058 | 9.7251 | 0.019614 | 0.019626 | onevsall
|================================================================================================ | Iter | Active | Eval | Objective | Objective | BestSoFar | BestSoFar | Coding | | workers | result | | runtime | (observed) | (estim.) | |================================================================================================ | 21 | 6 | Accept | 0.035306 | 14.675 | 0.019614 | 0.019625 | onevsall | 22 | 6 | Best | 0.019522 | 28.211 | 0.019522 | 0.019531 | onevsone | 23 | 6 | Accept | 0.038582 | 17.001 | 0.019522 | 0.01953 | onevsall | 24 | 6 | Accept | 0.022245 | 32.142 | 0.019522 | 0.019529 | onevsone | 25 | 6 | Accept | 0.023399 | 31.652 | 0.019522 | 0.01953 | onevsone | 26 | 6 | Accept | 0.020168 | 31.645 | 0.019522 | 0.019535 | onevsone | 27 | 6 | Accept | 0.022937 | 112.17 | 0.019522 | 0.019535 | onevsall | 28 | 6 | Accept | 0.019614 | 27.955 | 0.019522 | 0.01955 | onevsone | 29 | 6 | Accept | 0.020122 | 44.919 | 0.019522 | 0.019549 | onevsone | 30 | 6 | Accept | 0.020722 | 38.788 | 0.019522 | 0.019547 | onevsone
__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 149.9449 seconds
Total objective function evaluation time: 658.4665

Best observed feasible point:
    Coding        Lambda      Learner
    ________    __________    _______
    onevsone    7.1633e-10      svm

Observed objective function value = 0.019522
Estimated objective function value = 0.019563
Function evaluation time = 28.2111

Best estimated feasible point (according to models):
    Coding        Lambda      Learner
    ________    __________    _______
    onevsone    2.0284e-09      svm

Estimated objective function value = 0.019547
Estimated function evaluation time = 29.3667

toc

Elapsed time is 153.432841 seconds.

Loss4 = loss(Cmdl,TestX,LabelTest)

Loss4 = 0.0239
The classifier based on overcomplete sparse filtering with 100 extracted features has low test loss. Mdl5 = rica(Xtrain,q,'IterationLimit',400,'Standardize',true); Warning: Solver LBFGS was not able to converge to a solution. NewX = transform(Mdl5,Xtrain); TestX = transform(Mdl5,Xtest); tic Cmdl = fitcecoc(NewX,LabelTrain,Learners=t, ... OptimizeHyperparameters="auto",... HyperparameterOptimizationOptions=options); Copying objective function to workers...
Warning: Files that have already been attached are being ignored. To see which files are attached
Done copying objective function to workers. |================================================================================================ | Iter | Active | Eval | Objective | Objective | BestSoFar | BestSoFar | Coding | | workers | result | | runtime | (observed) | (estim.) | |================================================================================================ | 1 | 6 | Best | 0.030875 | 5.8653 | 0.030875 | 0.030875 | onevsone | 2 | 6 | Best | 0.019845 | 6.5249 | 0.019845 | 0.025358 | onevsone | 3 | 6 | Accept | 0.03489 | 4.6883 | 0.019845 | 0.020385 | onevsone | 4 | 6 | Accept | 0.020629 | 15 | 0.019845 | 0.019912 | onevsall | 5 | 6 | Best | 0.015691 | 16.21 | 0.015691 | 0.015748 | onevsone | 6 | 6 | Best | 0.015645 | 11.998 | 0.015645 | 0.015699 | onevsone | 7 | 6 | Best | 0.015368 | 18.798 | 0.015368 | 0.015369 | onevsone | 8 | 6 | Accept | 0.019476 | 6.5533 | 0.015368 | 0.015369 | onevsone | 9 | 6 | Accept | 0.022106 | 19.792 | 0.015368 | 0.015369 | onevsall | 10 | 6 | Accept | 0.11496 | 4.0322 | 0.015368 | 0.01537 | onevsone | 11 | 6 | Accept | 0.021322 | 6.2828 | 0.015368 | 0.015369 | onevsone | 12 | 6 | Accept | 0.084502 | 4.1352 | 0.015368 | 0.015369 | onevsone | 13 | 6 | Accept | 0.015599 | 11.929 | 0.015368 | 0.01537 | onevsone | 14 | 6 | Accept | 0.018876 | 14.777 | 0.015368 | 0.01537 | onevsone | 15 | 6 | Best | 0.015138 | 16.286 | 0.015138 | 0.015138 | onevsone | 16 | 6 | Accept | 0.022799 | 10.342 | 0.015138 | 0.015138 | onevsone
| 17 | 6 | Accept | 0.015322 | 18.918 | 0.015138 | 0.015139 | onevsone | 18 | 6 | Accept | 0.04712 | 4.7692 | 0.015138 | 0.015139 | onevsall | 19 | 6 | Accept | 0.10246 | 4.0277 | 0.015138 | 0.015139 | onevsall | 20 | 6 | Accept | 0.15276 | 4.0699 | 0.015138 | 0.01514 | onevsall |================================================================================================ | Iter | Active | Eval | Objective | Objective | BestSoFar | BestSoFar | Coding | | workers | result | | runtime | (observed) | (estim.) | |================================================================================================ | 21 | 6 | Best | 0.014953 | 26.102 | 0.014953 | 0.014953 | onevsone | 22 | 6 | Accept | 0.028291 | 7.1206 | 0.014953 | 0.014953 | onevsall | 23 | 6 | Accept | 0.015876 | 19.702 | 0.014953 | 0.014953 | onevsone | 24 | 6 | Accept | 0.015091 | 23.087 | 0.014953 | 0.014953 | onevsone | 25 | 6 | Accept | 0.016845 | 44.235 | 0.014953 | 0.014953 | onevsall | 26 | 6 | Accept | 0.017076 | 54.261 | 0.014953 | 0.014953 | onevsall | 27 | 6 | Accept | 0.016799 | 45.904 | 0.014953 | 0.014953 | onevsall | 28 | 6 | Best | 0.014768 | 27.573 | 0.014768 | 0.014768 | onevsone | 29 | 6 | Accept | 0.018599 | 33.435 | 0.014768 | 0.014768 | onevsall | 30 | 6 | Accept | 0.017584 | 27.061 | 0.014768 | 0.014768 | onevsall
__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 113.968 seconds
Total objective function evaluation time: 513.4789

Best observed feasible point:
    Coding        Lambda      Learner
    ________    __________    ________
    onevsone    1.7625e-08    logistic

Observed objective function value = 0.014768
Estimated objective function value = 0.014768
Function evaluation time = 27.5729

Best estimated feasible point (according to models):
    Coding        Lambda      Learner
    ________    __________    ________
    onevsone    1.7625e-08    logistic

Estimated objective function value = 0.014768
Estimated function evaluation time = 27.1423

toc

Elapsed time is 116.497520 seconds.

Loss5 = loss(Cmdl,TestX,LabelTest)

Loss5 = 0.0158
The classifier based on RICA with 100 extracted features has similar test loss to the classifier based on sparse filtering and 100 features, and takes less than half the time to create as the classifier trained on the original data. Optimize Hyperparameters by Using bayesopt Feature extraction functions have these tuning parameters: • Iteration limit • Function, either rica or sparsefilt • Parameter Lambda • Number of learned features q • Coding, either onevsone or onevsall The fitcecoc regularization parameter also affects the accuracy of the learned classifier. Include that parameter in the list of hyperparameters as well. To search among the available parameters effectively, try bayesopt. Use the objective function in the supporting file filterica.m, which includes parameters passed from the workspace. To remove sources of variation, fix an initial transform weight matrix. W = randn(1e4,1e3);
Create hyperparameters for the objective function. iterlim = optimizableVariable('iterlim',[5,500],'Type','integer'); lambda = optimizableVariable('lambda',[0,10]); solver = optimizableVariable('solver',{'r','s'},'Type','categorical'); qvar = optimizableVariable('q',[5,100],'Type','integer'); lambdareg = optimizableVariable('lambdareg',[1e-6,1],'Transform','log');
coding = optimizableVariable('coding',{'o','a'},'Type','categorical'); vars = [iterlim,lambda,solver,qvar,lambdareg,coding];
Run the optimization without the warnings that occur when the internal optimizations do not run to completion. Run for 60 iterations instead of the default 30 to give the optimization a better chance of locating a good value. warning('off','stats:classreg:learning:fsutils:Solver:LBFGSUnableToConverge'); tic results = bayesopt(@(x) filterica(x,Xtrain,Xtest,LabelTrain,LabelTest,W),vars, ... 'UseParallel',true,'MaxObjectiveEvaluations',60);
Copying objective function to workers... Done copying objective function to workers. |================================================================================================ | Iter | Active | Eval | Objective | Objective | BestSoFar | BestSoFar | iterlim | | workers | result | | runtime | (observed) | (estim.) | |================================================================================================ | 1 | 6 | Best | 0.52943 | 4.912 | 0.52943 | 0.52943 | 70 | 2 | 6 | Best | 0.048927 | 7.0904 | 0.048927 | 0.086902 | 202 | 3 | 6 | Best | 0.027201 | 12.647 | 0.027201 | 0.027592 | 190 | 4 | 6 | Accept | 0.035378 | 8.6101 | 0.027201 | 0.027731 | 54 | 5 | 6 | Accept | 0.25536 | 30.032 | 0.027201 | 0.027466 | 178 | 6 | 6 | Accept | 0.35333 | 32.47 | 0.027201 | 0.027443 | 175 | 7 | 6 | Accept | 0.12827 | 27.117 | 0.027201 | 0.027393 | 325 | 8 | 6 | Accept | 0.34755 | 33.96 | 0.027201 | 0.027396 | 266 | 9 | 6 | Accept | 0.078705 | 2.0607 | 0.027201 | 0.027294 | 53 | 10 | 6 | Best | 0.026029 | 6.6583 | 0.026029 | 0.026189 | 70 | 11 | 6 | Accept | 0.040287 | 0.99791 | 0.026029 | 0.026218 | 9 | 12 | 6 | Accept | 0.055786 | 1.2549 | 0.026029 | 0.026247 | 38 | 13 | 6 | Best | 0.018814 | 8.233 | 0.018814 | 0.018829 | 25 | 14 | 6 | Accept | 0.53812 | 0.49606 | 0.018814 | 0.018875 | 11 | 15 | 6 | Best | 0.016874 | 47.355 | 0.016874 | 0.016839 | 212 | 16 | 6 | Accept | 0.52264 | 17.397 | 0.016874 | 0.016855 | 394 | 17 | 6 | Accept | 0.020328 | 3.7038 | 0.016874 | 0.016822 | 8 | 18 | 6 | Best | 0.015555 | 7.6923 | 0.015555 | 0.0084954 | 9 | 19 | 6 | Accept | 0.035237 | 17.017 | 0.015555 | 0.0086345 | 331 | 20 | 6 | Accept | 0.055538 | 0.73762 | 0.015555 | 0.011825 | 11 |================================================================================================ | Iter | Active | Eval | Objective | Objective | BestSoFar | BestSoFar | iterlim | | workers | result | | runtime | (observed) | (estim.) | |================================================================================================ | 21 | 6 | Accept | 0.037108 | 1.0492 | 0.015555 | 0.010948 | 20 | 22 | 6 | Accept | 0.041709 | 1.9456 | 0.015555 | 0.015403 | 28 | 23 | 6 | Accept | 0.018454 | 13.93 | 0.015555 | 0.010483 | 22 | 24 | 6 | Accept | 0.028586 | 11.282 | 0.015555 | 0.01139 | 35 | 25 | 6 | Accept | 0.039866 | 2.4192 | 0.015555 | 0.010809 | 28 | 26 | 6 | Accept | 0.88918 | 49.28 | 0.015555 | 0.013345 | 429 | 27 | 6 | Accept | 0.10786 | 0.53144 | 0.015555 | 0.013969 | 5 | 28 | 6 | Accept | 0.017484 | 56.302 | 0.015555 | 0.013962 | 474 | 29 | 6 | Accept | 0.38735 | 35.182 | 0.015555 | 0.013918 | 497 | 30 | 6 | Accept | 0.044439 | 13.26 | 0.015555 | 0.014003 | 497 | 31 | 6 | Accept | 0.033871 | 1.1249 | 0.015555 | 0.013967 | 18 | 32 | 6 | Accept | 0.74128 | 1.6253 | 0.015555 | 0.013465 | 38 | 33 | 6 | Accept | 0.32977 | 19.047 | 0.015555 | 0.013292 | 497 | 34 | 6 | Accept | 0.13391 | 0.52899 | 0.015555 | 0.013399 | 8 | 35 | 6 | Accept | 0.19883 | 0.65282 | 0.015555 | 0.013247 | 13 | 36 | 6 | Accept | 0.047703 | 0.61038 | 0.015555 | 0.013195 | 7
| 37 | 6 | Accept | 0.2922 | 20.739 | 0.015555 | 0.013686 | 498 | 38 | 6 | Accept | 0.03113 | 11.814 | 0.015555 | 0.013804 | 287 | 39 | 6 | Accept | 0.05179 | 82.825 | 0.015555 | 0.014658 | 327 | 40 | 6 | Accept | 0.08023 | 0.89041 | 0.015555 | 0.014749 | 28 |================================================================================================ | Iter | Active | Eval | Objective | Objective | BestSoFar | BestSoFar | iterlim | | workers | result | | runtime | (observed) | (estim.) | |================================================================================================ | 41 | 6 | Accept | 0.026725 | 10.287 | 0.015555 | 0.01503 | 13 | 42 | 6 | Accept | 0.054087 | 0.87185 | 0.015555 | 0.014909 | 13 | 43 | 6 | Accept | 0.061574 | 0.80035 | 0.015555 | 0.015371 | 18 | 44 | 6 | Accept | 0.053178 | 1.1839 | 0.015555 | 0.015997 | 20 | 45 | 6 | Accept | 0.03253 | 1.5798 | 0.015555 | 0.01643 | 29 | 46 | 6 | Accept | 0.019437 | 4.5849 | 0.015555 | 0.015791 | 12 | 47 | 6 | Accept | 0.059445 | 14.142 | 0.015555 | 0.015878 | 439 | 48 | 6 | Accept | 0.047275 | 11.728 | 0.015555 | 0.016694 | 488 | 49 | 6 | Accept | 0.040995 | 4.1608 | 0.015555 | 0.015764 | 292 | 50 | 6 | Accept | 0.084049 | 0.84595 | 0.015555 | 0.016101 | 17 | 51 | 6 | Accept | 0.01608 | 9.7893 | 0.015555 | 0.013937 | 27 | 52 | 6 | Accept | 0.046612 | 0.75998 | 0.015555 | 0.013516 | 6 | 53 | 6 | Accept | 0.026378 | 3.9473 | 0.015555 | 0.013503 | 5 | 54 | 6 | Accept | 0.01652 | 5.525 | 0.015555 | 0.013616 | 6 | 55 | 6 | Accept | 0.023626 | 4.3491 | 0.015555 | 0.013606 | 6 | 56 | 6 | Accept | 0.035756 | 1.5968 | 0.015555 | 0.013244 | 11 | 57 | 6 | Accept | 0.031252 | 9.0684 | 0.015555 | 0.013039 | 331 | 58 | 6 | Accept | 0.072441 | 12.75 | 0.015555 | 0.013042 | 492 | 59 | 6 | Accept | 0.02269 | 126.16 | 0.015555 | 0.013115 | 491 | 60 | 6 | Accept | 0.039891 | 11.467 | 0.015555 | 0.01292 | 497
__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 60 reached.
Total function evaluations: 60
Total elapsed time: 210.4583 seconds
Total objective function evaluation time: 831.0775

Best observed feasible point:
    iterlim    lambda     solver    q     lambdareg     coding
    _______    _______    ______    __    __________    ______
       9       0.74158      r       97    0.00043623      o

Observed objective function value = 0.015555
Estimated objective function value = 0.019291
Function evaluation time = 7.6923

Best estimated feasible point (according to models):
    iterlim    lambda    solver    q     lambdareg    coding
    _______    ______    ______    __    _________    ______
      27       8.5051      r       98    0.0014041      o

Estimated objective function value = 0.01292
Estimated function evaluation time = 8.4066

toc

Elapsed time is 211.892362 seconds.

warning('on','stats:classreg:learning:fsutils:Solver:LBFGSUnableToConverge');
The resulting classifier has similar loss (the "Observed objective function value") compared to the classifier using rica for 100 features trained for 400 iterations. To use this classifier, retrieve the best classification model found by bayesopt. t = templateLinear('Lambda',results.XAtMinObjective.lambda,'Solver','lbfgs'); if results.XAtMinObjective.coding == "o" Cmdl = fitcecoc(NewX,LabelTrain,Learners=t,Coding='onevsone'); else Cmdl = fitcecoc(NewX,LabelTrain,Learners=t,Coding='onevsall'); end
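To score new data with this classifier, you also need the feature extractor that corresponds to the best point, because NewX above was produced by the 100-feature RICA model. The following is a hedged sketch; it assumes that filterica.m maps solver 'r' to rica and 's' to sparsefilt and seeds the optimization with the leading block of the fixed matrix W. That matches the variable definitions above but is an assumption about that supporting file.

best = results.XAtMinObjective;
p = size(Xtrain,2);
if best.solver == "r"
    MdlBest = rica(Xtrain,best.q,'IterationLimit',best.iterlim,'Lambda',best.lambda, ...
        'InitialTransformWeights',W(1:p,1:best.q));
else
    MdlBest = sparsefilt(Xtrain,best.q,'IterationLimit',best.iterlim,'Lambda',best.lambda, ...
        'InitialTransformWeights',W(1:p,1:best.q));
end
NewX = transform(MdlBest,Xtrain);   % features to pass to fitcecoc as above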
See Also rica | sparsefilt | ReconstructionICA | SparseFiltering
Related Examples
• “Extract Mixed Signals” on page 16-150

More About
• “Feature Extraction” on page 16-125
Extract Mixed Signals This example shows how to use rica to disentangle mixed audio signals. You can use rica to perform independent component analysis (ICA) when prewhitening is included as a preprocessing step. The ICA model is x = μ + As . Here, x is a p-by-1 vector of mixed signals, μ is a p-by-1 vector of offset values, A is a p-by- q mixing matrix, and s is a q-by-1 vector of original signals. Suppose first that A is a square matrix. If you know μ and A, you can recover an original signal s from the data x: s = A−1(x − μ) . Using the rica function, you can perform this recovery even without knowing the mixing matrix A or the mean μ. Given a set of several observations x(1), x(2), ..., rica extracts the original signals s(1), s(2), .... Load Data Load a set of six audio files, which ship with MATLAB®. Trim each file to 10,000 samples. files = {'chirp.mat' 'gong.mat' 'handel.mat' 'laughter.mat' 'splat.mat' 'train.mat'}; S = zeros(10000,6); for i = 1:6 test = load(files{i}); y = test.y(1:10000,1); S(:,i) = y; end
Mix Signals Mix the signals together by using a random mixing matrix and add a random offset. rng('default') % For reproducibility mixdata = S*randn(6) + randn(1,6);
To listen to the original sounds, execute this code: for i = 1:6 disp(i); sound(S(:,i)); pause; end
To listen to the mixed sounds, execute this code: for i = 1:6 disp(i); sound(mixdata(:,i));
pause; end
Plot the signals. figure tiledlayout(2,6) for i = 1:6 nexttile(i) plot(S(:,i)) title(['Sound ',num2str(i)]) nexttile(i+6) plot(mixdata(:,i)) title(['Mix ',num2str(i)]) end
The original signals have clear structure. The mixed signals have much less structure.

Prewhiten Mixed Signals

To separate the signals effectively, "prewhiten" the signals by using the prewhiten function that appears at the end of this example. This function transforms mixdata so that it has zero mean and identity covariance.

The idea is the following. If s is a zero-mean source with statistically independent components, then

E(s) = 0,  E(s s^T) = I.

Then the mean and covariance of x are

E(x) = μ,  Cov(x) = A A^T = C.

Suppose that you know μ and C. In practice, you would estimate these quantities from the sample mean and covariance of the columns of x. You can solve for s in terms of x by

s = A^{-1}(x − μ) = (A^T A)^{-1} A^T (x − μ).

The latter equation holds even when A is not a square invertible matrix.

Suppose that U is a p-by-q matrix of left eigenvectors of the positive semidefinite matrix C, and Σ is the q-by-q matrix of eigenvalues. Then

C = U Σ U^T,  U^T U = I.

Then

A A^T = U Σ U^T.

There are many mixing matrices A that satisfy this last equation. If W is a q-by-q orthonormal matrix, then

W^T W = W W^T = I,  A = U Σ^{1/2} W.

Substituting into the equation for s, s = W^T x̃, where

x̃ = Σ^{-1/2} U^T (x − μ).

x̃ is the prewhitened data. rica computes the unknown matrix W under the assumption that the components of s are as independent as possible.

mixdata = prewhiten(mixdata);
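If you want to confirm the effect of prewhitening (an optional check, not part of the original example), the transformed data should have approximately zero mean and identity covariance:

mean(mixdata)    % each entry is close to 0
cov(mixdata)     % close to the 6-by-6 identity matrix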
Separate All Signals A super-Gaussian source has a sharp peak near zero, as the histogram of sound 1 shows. figure histogram(S(:,1))
Perform Reconstruction ICA while asking for six features. Indicate that each source is super-Gaussian. q = 6; Mdl = rica(mixdata,q,'NonGaussianityIndicator',ones(6,1));
Extract the features. If the unmixing procedure is successful, the features are proportional to the original signals. unmixed = transform(Mdl,mixdata);
Compare Unmixed Signals To Original Signals Plot the original and unmixed signals. figure tiledlayout(2,6) for i = 1:6 nexttile(i) plot(S(:,i)) title(['Sound ',num2str(i)]) nexttile(i+6) plot(unmixed(:,i)) title(['Unmix ',num2str(i)]) end
The order of the unmixed signals is different than the original order. Reorder the columns so that the unmixed signals match the corresponding original signals. Scale the unmixed signals to have the same norms as the corresponding original signals. (rica cannot identify the scale of the original signals because any scale can lead to the same signal mixture.) unmixed = unmixed(:,[2,5,4,6,3,1]); for i = 1:6 unmixed(:,i) = unmixed(:,i)/norm(unmixed(:,i))*norm(S(:,i)); end
Plot the original and unmixed signals. figure tiledlayout(2,6) for i = 1:6 nexttile(i) plot(S(:,i)) ylim([-1,1]) title(['Sound ',num2str(i)]) nexttile(i+6) plot(unmixed(:,i)) ylim([-1,1]) title(['Unmix ',num2str(i)]) end
The unmixed signals look similar to the original signals. To listen to the unmixed sounds, execute this code. for i = 1:6 disp(i); sound(unmixed(:,i)); pause; end
Here is the code for the prewhiten function. function Z = prewhiten(X) % X = N-by-P matrix for N observations and P predictors % Z = N-by-P prewhitened matrix % 1. Size of X. [N,P] = size(X); assert(N >= P); % 2. SVD of covariance of X. We could also use svd(X) to proceed but N % can be large and so we sacrifice some accuracy for speed. [U,Sig] = svd(cov(X)); Sig = diag(Sig); Sig = Sig(:)'; % 3. Figure out which values of Sig are non-zero. tol = eps(class(X)); idx = (Sig > max(Sig)*tol);
assert(~all(idx == 0)); % 4. Get the non-zero elements of Sig and corresponding columns of U. Sig = Sig(idx); U = U(:,idx); % 5. Compute prewhitened data. mu = mean(X,1); Z = X-mu; Z = (Z*U)./sqrt(Sig); end
See Also rica | sparsefilt | ReconstructionICA | SparseFiltering
Related Examples
• “Feature Extraction Workflow” on page 16-130

More About
• “Feature Extraction” on page 16-125
Select Features for Classifying High-Dimensional Data

This example shows how to select features for classifying high-dimensional data. More specifically, it shows how to perform sequential feature selection, which is one of the most popular feature selection algorithms. It also shows how to use holdout and cross-validation to evaluate the performance of the selected features.

Reducing the number of features (dimensionality) is important in statistical learning. For many data sets with a large number of features and a limited number of observations, such as bioinformatics data, usually many features are not useful for producing a desired learning result and the limited observations may lead the learning algorithm to overfit to the noise. Reducing features can also save storage and computation time and increase comprehensibility.

There are two main approaches to reducing features: feature selection and feature transformation. Feature selection algorithms select a subset of features from the original feature set; feature transformation methods transform data from the original high-dimensional feature space to a new space with reduced dimensionality.

Loading the Data

Serum proteomic pattern diagnostics can be used to differentiate observations from patients with and without disease. Profile patterns are generated using surface-enhanced laser desorption and ionization (SELDI) protein mass spectrometry. These features are ion intensity levels at specific mass/charge values.

This example uses the high-resolution ovarian cancer data set that was generated using the WCX2 protein array. After some pre-processing steps, similar to those shown in the Bioinformatics Toolbox™ example “Preprocessing Raw Mass Spectrometry Data” (Bioinformatics Toolbox), the data set has two variables obs and grp. The obs variable consists of 216 observations with 4000 features. Each element in grp defines the group to which the corresponding row of obs belongs.

load ovariancancer;
whos

  Name        Size           Bytes    Class     Attributes

  grp       216x1            25056    cell
  obs       216x4000       3456000    single
Dividing Data Into a Training Set and a Test Set Some of the functions used in this example call MATLAB® built-in random number generation functions. To duplicate the exact results shown in this example, execute the command below to set the random number generator to a known state. Otherwise, your results may differ. rng(8000,'twister');
The performance on the training data (resubstitution performance) is not a good estimate for a model's performance on an independent test set. Resubstitution performance will usually be overoptimistic. To predict the performance of a selected model, you need to assess its performance on another data set that was not used to build the model. Here, we use cvpartition to divide data into a training set of size 160 and a test set of size 56. Both the training set and the test set have roughly the same group proportions as in grp. We select features using the training data and judge the performance of the selected features on the test data. This is often called holdout validation. Another simple and widely-used method for evaluating and selecting a model is cross-validation, which will be illustrated later in this example.

holdoutCVP = cvpartition(grp,'holdout',56)

holdoutCVP =
Hold-out cross validation partition
   NumObservations: 216
       NumTestSets: 1
         TrainSize: 160
          TestSize: 56
          IsCustom: 0

dataTrain = obs(holdoutCVP.training,:);
grpTrain = grp(holdoutCVP.training);
The Problem of Classifying Data Using All the Features Without first reducing the number of features, some classification algorithms would fail on the data set used in this example, since the number of features is much larger than the number of observations. In this example, we use Quadratic Discriminant Analysis (QDA) as the classification algorithm. If we apply QDA on the data using all the features, as shown in the following, we will get an error because there are not enough samples in each group to estimate a covariance matrix. try yhat = classify(obs(test(holdoutCVP),:), dataTrain, grpTrain,'quadratic'); catch ME display(ME.message); end The covariance matrix of each group in TRAINING must be positive definite.
Selecting Features Using a Simple Filter Approach Our goal is to reduce the dimension of the data by finding a small set of important features which can give good classification performance. Feature selection algorithms can be roughly grouped into two categories: filter methods and wrapper methods. Filter methods rely on general characteristics of the data to evaluate and to select the feature subsets without involving the chosen learning algorithm (QDA in this example). Wrapper methods use the performance of the chosen learning algorithm to evaluate each candidate feature subset. Wrapper methods search for features better fit for the chosen learning algorithm, but they can be significantly slower than filter methods if the learning algorithm takes a long time to run. The concepts of "filters" and "wrappers" are described in John G. Kohavi R. (1997) "Wrappers for feature subset selection", Artificial Intelligence, Vol.97, No.1-2, pp.272-324. This example shows one instance of a filter method and one instance of a wrapper method. Filters are usually used as a pre-processing step since they are simple and fast. A widely-used filter method for bioinformatics data is to apply a univariate criterion separately on each feature, assuming that there is no interaction between features. For example, we might apply the t-test on each feature and compare p-value (or the absolute values of t-statistics) for each feature as a measure of how effective it is at separating groups. dataTrainG1 = dataTrain(grp2idx(grpTrain)==1,:); dataTrainG2 = dataTrain(grp2idx(grpTrain)==2,:); [h,p,ci,stat] = ttest2(dataTrainG1,dataTrainG2,'Vartype','unequal');
In order to get a general idea of how well-separated the two groups are by each feature, we plot the empirical cumulative distribution function (CDF) of the p-values: ecdf(p); xlabel('P value'); ylabel('CDF value')
There are about 35% of features having p-values close to zero and over 50% of features having p-values smaller than 0.05, meaning there are more than 2500 features among the original 5000 features that have strong discrimination power. One can sort these features according to their p-values (or the absolute values of the t-statistic) and select some features from the sorted list. However, it is usually difficult to decide how many features are needed unless one has some domain knowledge or the maximum number of features that can be considered has been dictated in advance based on outside constraints. One quick way to decide the number of needed features is to plot the MCE (misclassification error, i.e., the number of misclassified observations divided by the number of observations) on the test set as a function of the number of features. Since there are only 160 observations in the training set, the largest number of features for applying QDA is limited; otherwise, there may not be enough samples in each group to estimate a covariance matrix. Actually, for the data used in this example, the holdout partition and the sizes of two groups dictate that the largest allowable number of features for applying QDA is about 70. Now we compute MCE for various numbers of features between 5 and 70 and show the plot of MCE as a function of the number of features. In order to reasonably estimate the performance of the selected model, it is important to use the 160 training samples to fit the QDA model and compute the MCE on the 56 test observations (blue circular marks in the following plot).
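As a tiny illustration of the MCE defined above, with hypothetical labels that are not from this data set:
yhatExample = {'A';'B';'A';'A'};      % hypothetical predicted labels
ytrueExample = {'A';'A';'A';'B'};     % hypothetical true labels
mceExample = sum(~strcmp(yhatExample,ytrueExample))/numel(ytrueExample)   % 2 of 4 misclassified, so 0.5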
To illustrate why resubstitution error is not a good error estimate of the test error, we also show the resubstitution MCE using red triangular marks. [~,featureIdxSortbyP] = sort(p,2); % sort the features testMCE = zeros(1,14); resubMCE = zeros(1,14); nfs = 5:5:70; classf = @(xtrain,ytrain,xtest,ytest) ... sum(~strcmp(ytest,classify(xtest,xtrain,ytrain,'quadratic'))); resubCVP = cvpartition(length(grp),'resubstitution') resubCVP = Resubstitution (no partition of data) NumObservations: 216 NumTestSets: 1 TrainSize: 216 TestSize: 216 IsCustom: 0 for i = 1:14 fs = featureIdxSortbyP(1:nfs(i)); testMCE(i) = crossval(classf,obs(:,fs),grp,'partition',holdoutCVP)... /holdoutCVP.TestSize; resubMCE(i) = crossval(classf,obs(:,fs),grp,'partition',resubCVP)/... resubCVP.TestSize; end plot(nfs, testMCE,'o',nfs,resubMCE,'r^'); xlabel('Number of Features'); ylabel('MCE'); legend({'MCE on the test set' 'Resubstitution MCE'},'location','NW'); title('Simple Filter Feature Selection Method');
For convenience, classf is defined as an anonymous function. It fits QDA on the given training set and returns the number of misclassified samples for the given test set. If you were developing your own classification algorithm, you might want to put it in a separate file, as follows:
%  function err = classf(xtrain,ytrain,xtest,ytest)
%  yfit = classify(xtest,xtrain,ytrain,'quadratic');
%  err = sum(~strcmp(ytest,yfit));
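With such a classf.m file on the MATLAB path, the crossval calls inside the loop above could use the file-based function instead of the anonymous one. A sketch (the example itself keeps the anonymous function):
testMCE(i) = crossval(@classf,obs(:,fs),grp,'partition',holdoutCVP)/holdoutCVP.TestSize;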
The resubstitution MCE is over-optimistic. It consistently decreases when more features are used and drops to zero when more than 60 features are used. However, if the test error increases while the resubstitution error still decreases, then overfitting may have occurred. This simple filter feature selection method gets the smallest MCE on the test set when 15 features are used. The plot shows overfitting begins to occur when 20 or more features are used. The smallest MCE on the test set is 12.5%: testMCE(3) ans = 0.1250
These are the first 15 features that achieve the minimum MCE: featureIdxSortbyP(1:15) ans = 1×15 2814
2813 2721 2720 2452 2645 2644 2642
Applying Sequential Feature Selection The above feature selection algorithm does not consider interaction between features; besides, features selected from the list based on their individual ranking may also contain redundant information, so that not all the features are needed. For example, the linear correlation coefficient between the first selected feature (column 2814) and the second selected feature (column 2813) is almost 0.95. corr(dataTrain(:,featureIdxSortbyP(1)),dataTrain(:,featureIdxSortbyP(2))) ans = single 0.9447
This kind of simple feature selection procedure is usually used as a pre-processing step since it is fast. More advanced feature selection algorithms improve the performance. Sequential feature selection is one of the most widely used techniques. It selects a subset of features by sequentially adding (forward search) or removing (backward search) until certain stopping conditions are satisfied. In this example, we use forward sequential feature selection in a wrapper fashion to find important features. More specifically, since the typical goal of classification is to minimize the MCE, the feature selection procedure performs a sequential search using the MCE of the learning algorithm QDA on each candidate feature subset as the performance indicator for that subset. The training set is used to select the features and to fit the QDA model, and the test set is used to evaluate the performance of the finally selected features. During the feature selection procedure, to evaluate and to compare the performance of each candidate feature subset, we apply stratified 10-fold cross-validation to the training set. We will illustrate later why applying cross-validation to the training set is important. First we generate a stratified 10-fold partition for the training set: tenfoldCVP = cvpartition(grpTrain,'kfold',10) tenfoldCVP = K-fold cross validation partition NumObservations: 160 NumTestSets: 10
TrainSize: 144 144 144 144 144 144 144 144 144 144
TestSize: 16 16 16 16 16 16 16 16 16 16
IsCustom: 0
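As a quick sanity check that is not part of the original example, the stratification can be verified by comparing the class proportions in the full training set with those in one fold's training portion:
tabulate(grpTrain)                            % class proportions in the training set
tabulate(grpTrain(tenfoldCVP.training(1)))    % proportions in the first fold's training portion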
Then we use the filter results from the previous section as a pre-processing step to select features. For instance, we select 150 features here: fs1 = featureIdxSortbyP(1:150);
We apply forward sequential feature selection on these 150 features. The function sequentialfs provides a simple way (the default option) to decide how many features are needed. It stops when the first local minimum of the cross-validation MCE is found. fsLocal = sequentialfs(classf,dataTrain(:,fs1),grpTrain,'cv',tenfoldCVP);
The selected features are the following: fs1(fsLocal) ans = 1×3
2337 864 3288
To evaluate the performance of the selected model with these three features, we compute the MCE on the 56 test samples. testMCELocal = crossval(classf,obs(:,fs1(fsLocal)),grp,'partition',... holdoutCVP)/holdoutCVP.TestSize testMCELocal = 0.0714
With only three features being selected, the MCE is only a little over half of the smallest MCE using the simple filter feature selection method. The algorithm may have stopped prematurely. Sometimes a smaller MCE is achievable by looking for the minimum of the cross-validation MCE over a reasonable range of number of features. For instance, we draw the plot of the cross-validation MCE as a function of the number of features for up to 50 features. [fsCVfor50,historyCV] = sequentialfs(classf,dataTrain(:,fs1),grpTrain,... 'cv',tenfoldCVP,'Nf',50); plot(historyCV.Crit,'o'); xlabel('Number of Features'); ylabel('CV MCE'); title('Forward Sequential Feature Selection with cross-validation');
The cross-validation MCE reaches the minimum value when 10 features are used and this curve stays flat over the range from 10 features to 35 features. Also, the curve goes up when more than 35 features are used, which means overfitting occurs there. It is usually preferable to have fewer features, so here we pick 10 features: fsCVfor10 = fs1(historyCV.In(10,:)) fsCVfor10 = 1×10 2814
2721 2720 2452 2650 2731 2337 2658 864 3288
To show these 10 features in the order in which they are selected in the sequential forward procedure, we find the row in which they first become true in the historyCV output: [orderlist,ignore] = find( [historyCV.In(1,:); diff(historyCV.In(1:10,:) )]' ); fs1(orderlist) ans = 1×10 2337
864 3288 2721 2814 2658 2452 2731 2650 2720
To evaluate these 10 features, we compute their MCE for QDA on the test set. We get the smallest MCE value so far: testMCECVfor10 = crossval(classf,obs(:,fsCVfor10),grp,'partition',... holdoutCVP)/holdoutCVP.TestSize testMCECVfor10 = 0.0357
It is interesting to look at the plot of resubstitution MCE values on the training set (i.e., without performing cross-validation during the feature selection procedure) as a function of the number of features: [fsResubfor50,historyResub] = sequentialfs(classf,dataTrain(:,fs1),... grpTrain,'cv','resubstitution','Nf',50); plot(1:50, historyCV.Crit,'bo',1:50, historyResub.Crit,'r^'); xlabel('Number of Features'); ylabel('MCE'); legend({'10-fold CV MCE' 'Resubstitution MCE'},'location','NE');
Again, the resubstitution MCE values are overly optimistic here. Most are smaller than the cross-validation MCE values, and the resubstitution MCE goes to zero when 16 features are used. We can compute the MCE value of these 16 features on the test set to see their real performance: fsResubfor16 = fs1(historyResub.In(16,:)); testMCEResubfor16 = crossval(classf,obs(:,fsResubfor16),grp,'partition',... holdoutCVP)/holdoutCVP.TestSize testMCEResubfor16 = 0.0714
testMCEResubfor16, the performance of these 16 features (chosen by resubstitution during the feature selection procedure) on the test set, is about double that for testMCECVfor10, the performance of the 10 features (chosen by 10-fold cross-validation during the feature selection procedure) on the test set. It again indicates that the resubstitution error generally is not a good performance estimate for evaluating and selecting features. We may want to avoid using resubstitution error, not only during the final evaluation step, but also during the feature selection procedure.
See Also sequentialfs
More About
• “Introduction to Feature Selection” on page 16-46
• “Sequential Feature Selection” on page 16-58
Perform Factor Analysis on Exam Grades
Perform Factor Analysis on Exam Grades This example shows how to perform factor analysis using Statistics and Machine Learning Toolbox™. Multivariate data often include a large number of measured variables, and sometimes those variables "overlap" in the sense that groups of them may be dependent. For example, in a decathlon, each athlete competes in 10 events, but several of them can be thought of as "speed" events, while others can be thought of as "strength" events, etc. Thus, a competitor's 10 event scores might be thought of as largely dependent on a smaller set of 3 or 4 types of athletic ability. Factor analysis is a way to fit a model to multivariate data to estimate just this sort of interdependence. The Factor Analysis Model In the factor analysis model, the measured variables depend on a smaller number of unobserved (latent) factors. Because each factor may affect several variables in common, they are known as "common factors". Each variable is assumed to depend on a linear combination of the common factors, and the coefficients are known as loadings. Each measured variable also includes a component due to independent random variability, known as "specific variance" because it is specific to one variable. Specifically, factor analysis assumes that the covariance matrix of your data is of the form SigmaX = Lambda*Lambda' + Psi
where Lambda is the matrix of loadings, and the elements of the diagonal matrix Psi are the specific variances. The function factoran fits the factor analysis model using maximum likelihood. Example: Finding Common Factors Affecting Exam Grades 120 students have each taken five exams, the first two covering mathematics, the next two on literature, and a comprehensive fifth exam. It seems reasonable that the five grades for a given student ought to be related. Some students are good at both subjects, some are good at only one, etc. The goal of this analysis is to determine if there is quantitative evidence that the students' grades on the five different exams are largely determined by only two types of ability. First load the data, then call factoran and request a model fit with a single common factor. load examgrades [Loadings1,specVar1,T,stats] = factoran(grades,1);
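As a side check that is not part of the original example, the model-implied correlation matrix Lambda*Lambda' + Psi can be rebuilt from these outputs and compared with the sample correlations (factoran fits the standardized data, so corr(grades) is the matrix being modeled):
SigmaModel = Loadings1*Loadings1' + diag(specVar1);    % model-implied correlation matrix
SigmaSample = corr(grades);                            % sample correlation matrix
maxAbsDiff = max(abs(SigmaModel(:) - SigmaSample(:)))  % should be small for an adequate fit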
factoran's first two return arguments are the estimated loadings and the estimated specific variances. From the estimated loadings, you can see that the one common factor in this model puts large positive weight on all five variables, but most weight on the fifth, comprehensive exam. Loadings1 Loadings1 = 0.6021 0.6686 0.7704 0.7204
0.9153
One interpretation of this fit is that a student might be thought of in terms of their "overall ability", for which the comprehensive exam would be the best available measurement. A student's grade on a more subject-specific test would depend on their overall ability, but also on whether or not the student was strong in that area. This would explain the lower loadings for the first four exams. From the estimated specific variances, you can see that the model indicates that a particular student's grade on a particular test varies quite a lot beyond the variation due to the common factor. specVar1 specVar1 = 0.6375 0.5530 0.4065 0.4810 0.1623
A specific variance of 1 would indicate that there is no common factor component in that variable, while a specific variance of 0 would indicate that the variable is entirely determined by common factors. These exam grades seem to fall somewhere in between, although there is the least amount of specific variation for the comprehensive exam. This is consistent with the interpretation given above of the single common factor in this model. The p-value returned in the stats structure rejects the null hypothesis of a single common factor, so we refit the model. stats.p ans = 0.0332
Next, use two common factors to try and better explain the exam scores. With more than one factor, you could rotate the estimated loadings to try and make their interpretation simpler, but for the moment, ask for an unrotated solution. [Loadings2,specVar2,T,stats] = factoran(grades,2,'rotate','none');
From the estimated loadings, you can see that the first unrotated factor puts approximately equal weight on all five variables, while the second factor contrasts the first two variables with the second two. Loadings2 Loadings2 =
0.6289  0.3485
0.6992  0.3287
0.7785 -0.2069
0.7246 -0.2070
0.8963 -0.0473
You might interpret these factors as "overall ability" and "quantitative vs. qualitative ability", extending the interpretation of the one-factor fit made earlier. A plot of the variables, where each loading is a coordinate along the corresponding factor's axis, illustrates this interpretation graphically. The first two exams have a positive loading on the second factor, suggesting that they depend on "quantitative" ability, while the second two exams apparently depend on the opposite. The fifth exam has only a small loading on this second factor. biplot(Loadings2, 'varlabels',num2str((1:5)')); title('Unrotated Solution'); xlabel('Latent Factor 1'); ylabel('Latent Factor 2');
From the estimated specific variances, you can see that this two-factor model indicates somewhat less variation beyond that due to the common factors than the one-factor model did. Again, the least amount of specific variance occurs for the fifth exam. specVar2 specVar2 = 0.4829 0.4031
0.3512 0.4321 0.1944
The stats structure shows that there is only a single degree of freedom in this two-factor model. stats.dfe ans = 1
With only five measured variables, you cannot fit a model with more than two factors. Factor Analysis from a Covariance/Correlation Matrix You made the fits above using the raw test scores, but sometimes you might only have a sample covariance matrix that summarizes your data. factoran accepts either a covariance or correlation matrix, using the 'Xtype' parameter, and gives an identical result to that from the raw data. Sigma = cov(grades); [LoadingsCov,specVarCov] = ... factoran(Sigma,2,'Xtype','cov','rotate','none'); LoadingsCov LoadingsCov =
0.6289  0.3485
0.6992  0.3287
0.7785 -0.2069
0.7246 -0.2070
0.8963 -0.0473
Factor Rotation Sometimes, the estimated loadings from a factor analysis model can give a large weight on several factors for some of the measured variables, making it difficult to interpret what those factors represent. The goal of factor rotation is to find a solution for which each variable has only a small number of large loadings, i.e., is affected by a small number of factors, preferably only one. If you think of each row of the loadings matrix as coordinates of a point in M-dimensional space, then each factor corresponds to a coordinate axis. Factor rotation is equivalent to rotating those axes, and computing new loadings in the rotated coordinate system. There are various ways to do this. Some methods leave the axes orthogonal, while others are oblique methods that change the angles between them. Varimax is one common criterion for orthogonal rotation. factoran performs varimax rotation by default, so you do not need to ask for it explicitly. [LoadingsVM,specVarVM,rotationVM] = factoran(grades,2);
A quick check of the varimax rotation matrix returned by factoran confirms that it is orthogonal. Varimax, in effect, rotates the factor axes in the figure above, but keeps them at right angles.
rotationVM'*rotationVM ans = 1.0000 0.0000
0.0000 1.0000
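Another consequence of a rigid (orthogonal) rotation, shown here as a small check that is not part of the original example, is that each variable's communality, the sum of its squared loadings, is unchanged, so the varimax fit and the earlier unrotated fit should agree row by row:
[sum(LoadingsVM.^2,2) sum(Loadings2.^2,2)]   % the two columns should match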
A biplot of the five variables on the rotated factors shows the effect of varimax rotation. biplot(LoadingsVM, 'varlabels',num2str((1:5)')); title('Varimax Solution'); xlabel('Latent Factor 1'); ylabel('Latent Factor 2');
Varimax has rigidly rotated the axes in an attempt to make all of the loadings close to zero or one. The first two exams are closest to the second factor axis, while the third and fourth are closest to the first axis and the fifth exam is at an intermediate position. These two rotated factors can probably be best interpreted as "quantitative ability" and "qualitative ability". However, because none of the variables are near a factor axis, the biplot shows that orthogonal rotation has not succeeded in providing a simple set of factors. Because the orthogonal rotation was not entirely satisfactory, you can try using promax, a common oblique rotation criterion. [LoadingsPM,specVarPM,rotationPM] = ... factoran(grades,2,'rotate','promax');
A check on the promax rotation matrix returned by factoran shows that it is not orthogonal. Promax, in effect, rotates the factor axes in the first figure separately, allowing them to have an oblique angle between them. rotationPM'*rotationPM ans = 1.9405 -1.3509
-1.3509 1.9405
A biplot of the variables on the new rotated factors shows the effect of promax rotation. biplot(LoadingsPM, 'varlabels',num2str((1:5)')); title('Promax Solution'); xlabel('Latent Factor 1'); ylabel('Latent Factor 2');
Promax has performed a non-rigid rotation of the axes, and has done a much better job than varimax at creating a "simple structure". The first two exams are close to the second factor axis, while the third and fourth are close to the first axis, and the fifth exam is in an intermediate position. This makes an interpretation of these rotated factors as "quantitative ability" and "qualitative ability" more precise. Instead of plotting the variables on the different sets of rotated axes, it's possible to overlay the rotated axes on an unrotated biplot to get a better idea of how the rotated and unrotated solutions are related.
h1 = biplot(Loadings2, 'varlabels',num2str((1:5)')); xlabel('Latent Factor 1'); ylabel('Latent Factor 2'); hold on invRotVM = inv(rotationVM); h2 = line([-invRotVM(1,1) invRotVM(1,1) NaN -invRotVM(2,1) invRotVM(2,1)], ... [-invRotVM(1,2) invRotVM(1,2) NaN -invRotVM(2,2) invRotVM(2,2)],'Color',[1 0 0]); invRotPM = inv(rotationPM); h3 = line([-invRotPM(1,1) invRotPM(1,1) NaN -invRotPM(2,1) invRotPM(2,1)], ... [-invRotPM(1,2) invRotPM(1,2) NaN -invRotPM(2,2) invRotPM(2,2)],'Color',[0 1 0]); hold off axis square lgndHandles = [h1(1) h1(end) h2 h3]; lgndLabels = {'Variables','Unrotated Axes','Varimax Rotated Axes','Promax Rotated Axes'}; legend(lgndHandles, lgndLabels, 'location','northeast', 'fontname','arial narrow');
Predicting Factor Scores Sometimes, it is useful to be able to classify an observation based on its factor scores. For example, if you accepted the two-factor model and the interpretation of the promax rotated factors, you might want to predict how well a student would do on a mathematics exam in the future. Since the data are the raw exam grades, and not just their covariance matrix, we can have factoran return predictions of the value of each of the two rotated common factors for each student. [Loadings,specVar,rotation,stats,preds] = ... factoran(grades,2,'rotate','promax','maxit',200); biplot(Loadings, 'varlabels',num2str((1:5)'), 'Scores',preds);
title('Predicted Factor Scores for Promax Solution'); xlabel('Ability In Literature'); ylabel('Ability In Mathematics');
This plot shows the model fit in terms of both the original variables (vectors) and the predicted scores for each observation (points). The fit suggests that, while some students do well in one subject but not the other (second and fourth quadrants), most students do either well or poorly in both mathematics and literature (first and third quadrants). You can confirm this by looking at the estimated correlation matrix of the two factors. inv(rotation'*rotation) ans = 1.0000 0.6962
0.6962 1.0000
A Comparison of Factor Analysis and Principal Components Analysis There is a good deal of overlap in terminology and goals between Principal Components Analysis (PCA) and Factor Analysis (FA). Much of the literature on the two methods does not distinguish between them, and some algorithms for fitting the FA model involve PCA. Both are dimension-reduction techniques, in the sense that they can be used to replace a large set of observed variables with a smaller set of new variables. They also often give similar results. However, the two methods are different in their goals and in their underlying models. Roughly speaking, you should use PCA when you simply need to summarize or approximate your data using fewer dimensions (to visualize it,
for example), and you should use FA when you need an explanatory model for the correlations among your data.
Classical Multidimensional Scaling Applied to Nonspatial Distances This example shows how to perform classical multidimensional scaling using the cmdscale function in Statistics and Machine Learning Toolbox™. Classical multidimensional scaling, also known as Principal Coordinates Analysis, takes a matrix of interpoint distances, and creates a configuration of points. Ideally, those points can be constructed in two or three dimensions, and the Euclidean distances between them approximately reproduce the original distance matrix. Thus, a scatter plot of the points provides a visual representation of the original distances. This example illustrates applications of multidimensional scaling to dissimilarity measures other than spatial distance, and shows how to construct a configuration of points to visualize those dissimilarities. This example describes classical multidimensional scaling. The mdscale function performs nonclassical MDS, which is sometimes more flexible than the classical method. Nonclassical MDS is described in the “Nonclassical Multidimensional Scaling” on page 16-184 example. Reconstructing Spatial Locations from Nonspatial Distances Suppose you have measured the genetic "distance", or dissimilarity, between a number of local subpopulations of a single species of animal. You also know their geographic locations, and would like to know how closely their genetic and spatial distances correspond. If they do, that is evidence that interbreeding between the subpopulations is affected by their geographic locations. Below are the spatial locations of the subpopulations, and the upper-triangle of the matrix of genetic distances, in the same vector format produced by pdist. X = [39.1 40.7 41.5 39.2 38.7 41.7 40.1 39.2
18.7; 21.2; 21.5; 21.8; 20.6; 20.1; 22.1; 21.6];
D = [4.69 6.79 3.50 3.11 2.10 2.27 2.65 3.78 4.53 1.98
4.46 2.36 2.83 4.35 3.80
5.57 1.99 2.44 2.07 3.31 4.35
3.00 ... 1.74 ... 3.79 ... 0.53 ... 1.47 ... 3.82 ... 2.57];
Although this vector format for D is space-efficient, it's often easier to see the distance relationships if you reformat the distances to a square matrix. squareform(D) ans = 8×8 0 4.6900
16-176
4.6900 0
6.7900 2.1000
3.5000 2.2700
3.1100 2.6500
4.4600 2.3600
5.5700 1.9900
3.0000 1.7400
Classical Multidimensional Scaling Applied to Nonspatial Distances
6.7900 3.5000 3.1100 4.4600 5.5700 3.0000
2.1000 2.2700 2.6500 2.3600 1.9900 1.7400
0 3.7800 4.5300 2.8300 2.4400 3.7900
3.7800 0 1.9800 4.3500 2.0700 0.5300
4.5300 1.9800 0 3.8000 3.3100 1.4700
2.8300 4.3500 3.8000 0 4.3500 3.8200
2.4400 2.0700 3.3100 4.3500 0 2.5700
3.7900 0.5300 1.4700 3.8200 2.5700 0
cmdscale recognizes either of the two formats. [Y,eigvals] = cmdscale(D);
cmdscale's first output, Y, is a matrix of points created to have interpoint distances that reproduce the distances in D. With eight species, the points (rows of Y) could have as many as eight dimensions (columns of Y). Visualization of the genetic distances depends on using points in only two or three dimensions. Fortunately, cmdscale's second output, eigvals, is a set of sorted eigenvalues whose relative magnitudes indicate how many dimensions you can safely use. If only the first two or three eigenvalues are large, then only those coordinates of the points in Y are needed to accurately reproduce D. If more than three eigenvalues are large, then it is not possible to find a good lowdimensional configuration of points, and it will not be easy to visualize the distances. [eigvals eigvals/max(abs(eigvals))] ans = 8×2 29.0371 13.5746 2.0987 0.7418 0.3403 0.0000 -0.4542 -3.1755
1.0000 0.4675 0.0723 0.0255 0.0117 0.0000 -0.0156 -0.1094
Notice that there are only two large positive eigenvalues, so the configuration of points created by cmdscale can be plotted in two dimensions. The two negative eigenvalues indicate that the genetic distances are not Euclidean, that is, no configuration of points can reproduce D exactly. Fortunately, the negative eigenvalues are small relative to the largest positive ones, and the reduction to the first two columns of Y should be fairly accurate. You can check this by looking at the error in the distances between the two-dimensional configuration and the original distances. maxrelerr = max(abs(D - pdist(Y(:,1:2)))) / max(D) maxrelerr = 0.1335
Now you can compare the "genetic locations" created by cmdscale to the actual geographic locations. Because the configuration returned by cmdscale is unique only up to translation, rotation, and reflection, the genetic locations probably won't match the geographic locations. They will also have the wrong scale. But you can use the procrustes command to match up the two sets of points best in the least squares sense. [D,Z] = procrustes(X,Y(:,1:2)); plot(X(:,1),X(:,2),'bo',Z(:,1),Z(:,2),'rd'); labels = num2str((1:8)'); text(X(:,1)+.05,X(:,2),labels,'Color','b'); text(Z(:,1)+.05,Z(:,2),labels,'Color','r');
16-177
16
Multivariate Methods
xlabel('Distance East of Reference Point (Km)'); ylabel('Distance North of Reference Point (Km)'); legend({'Spatial Locations','Constructed Genetic Locations'},'Location','SE');
This plot shows the best match of the reconstructed points in the same coordinates as the actual spatial locations. Apparently, the genetic distances do have a close link to the spatial distances between the subpopulations. Visualizing a Correlation Matrix Using Multidimensional Scaling Suppose you have computed the following correlation matrix for a set of 10 variables. It's obvious that these variables are all positively correlated, and that there are some very strong pairwise correlations. But with this many variables, it's not easy to get a good feel for the relationships among all 10. Rho = ... [1 0.3906 0.3746 0.3318 0.4141 0.4279 0.4216 0.4703 0.4362 0.2066
16-178
0.3906 1 0.3200 0.3629 0.2211 0.9520 0.9811 0.9052 0.4567 0
0.3746 0.3200 1 0.8993 0.7999 0.3589 0.3460 0.3333 0.8639 0.6527
0.3318 0.3629 0.8993 1 0.7125 0.3959 0.3663 0.3394 0.8719 0.5726
0.4141 0.2211 0.7999 0.7125 1 0.2374 0.2079 0.2335 0.7050 0.7469
0.4279 0.9520 0.3589 0.3959 0.2374 1 0.9657 0.9363 0.4791 0.0254
0.4216 0.9811 0.3460 0.3663 0.2079 0.9657 1 0.9123 0.4554 0.0011
0.4703 0.9052 0.3333 0.3394 0.2335 0.9363 0.9123 1 0.4418 0.0099
0.4362 0.4567 0.8639 0.8719 0.7050 0.4791 0.4554 0.4418 1 0.5272
0.2066; 0 ; 0.6527; 0.5726; 0.7469; 0.0254; 0.0011; 0.0099; 0.5272; 1 ];
Classical Multidimensional Scaling Applied to Nonspatial Distances
Multidimensional scaling is often thought of as a way to (re)construct points using only pairwise distances. But it can also be used with dissimilarity measures that are more general than distance, to spatially visualize things that are not "points in space" in the usual sense. The variables described by Rho are an example, and you can use cmdscale to plot a visual representation of their interdependencies. Correlation actually measures similarity, but it is easy to transform it to a measure of dissimilarity. Because all the correlations here are positive, you can simply use D = 1 - Rho;
although other choices might also make sense. If Rho contained negative correlations, you would have to decide whether, for example, a correlation of -1 indicated more or less of a dissimilarity than a correlation of 0, and choose a transformation accordingly. It's important to decide whether visualization of the information in the correlation matrix is even possible, that is, whether the number of dimensions can be reduced from ten down to two or three. The eigenvalues returned by cmdscale give you a way to decide. In this case, a scree plot of those eigenvalues indicates that two dimensions are enough to represent the variables. (Notice that some of the eigenvalues in the plot below are negative, but small relative to the first two.) [Y,eigvals] = cmdscale(D); plot(1:length(eigvals),eigvals,'o-'); yline(0,':') axis([1,length(eigvals),min(eigvals),max(eigvals)*1.1]); xlabel('Eigenvalue number'); ylabel('Eigenvalue');
16-179
16
Multivariate Methods
In a more independent set of variables, more dimensions might be needed. If more than three variables are needed, the visualization isn't all that useful. A 2-D plot of the configuration returned by cmdscale indicates that there are two subsets of variables that are most closely correlated among themselves, plus a single variable that is more or less on its own. One of the clusters is tight, while the other is relatively loose. labels = {' 1',' 2',' 3',' 4',' 5',' 6',' 7',' 8',' 9',' 10'}; plot(Y(:,1),Y(:,2),'bx'); axis(max(max(abs(Y))) * [-1.1,1.1,-1.1,1.1]); axis('square'); text(Y(:,1),Y(:,2),labels,'HorizontalAlignment','left'); xline(0,'Color',[.7 .7 .7]) yline(0,'Color',[.7 .7 .7])
On the other hand, the results from cmdscale for the following correlation matrix indicate a much different structure: there are no real groups among the variables. Rather, there is a kind of "circular" dependency, where each variable has a pair of "closest neighbors" but is less well correlated with the remaining variables. Rho = ... [1 0.7946 0.1760 0.2560 0.7818 0.4496 0.2732
16-180
0.7946 1 0.1626 0.4227 0.5674 0.6183 0.4004
0.1760 0.1626 1 0.2644 0.1864 0.1859 0.4330
0.2560 0.4227 0.2644 1 0.1017 0.7426 0.8340
0.7818 0.5674 0.1864 0.1017 1 0.2733 0.1484
0.4496 0.6183 0.1859 0.7426 0.2733 1 0.6303
0.2732 0.4004 0.4330 0.8340 0.1484 0.6303 1
0.3995 0.2283 0.4656 0 0.4890 0.0648 0.1444
0.5305 0.3495 0.3947 0.0499 0.6138 0.1035 0.1357
0.2827; 0.2777; 0.8057; 0.4853; 0.2025; 0.3242; 0.6291;
Classical Multidimensional Scaling Applied to Nonspatial Distances
0.3995 0.5305 0.2827
0.2283 0.3495 0.2777
0.4656 0.3947 0.8057
0 0.0499 0.4853
0.4890 0.6138 0.2025
0.0648 0.1035 0.3242
0.1444 0.1357 0.6291
1 0.8599 0.3948
0.8599 1 0.3100
0.3948; 0.3100; 1 ];
[Y,eigvals] = cmdscale(1-Rho); [eigvals eigvals./max(abs(eigvals))] ans = 10×2 1.1416 0.7742 0.0335 0.0280 0.0239 0.0075 0.0046 -0.0000 -0.0151 -0.0472
1.0000 0.6782 0.0294 0.0245 0.0210 0.0066 0.0040 -0.0000 -0.0132 -0.0413
plot(Y(:,1),Y(:,2),'bx'); axis(max(max(abs(Y))) * [-1.1,1.1,-1.1,1.1]); axis('square'); text(Y(:,1),Y(:,2),labels,'HorizontalAlignment','left'); xline(0,'Color',[.7 .7 .7]) yline(0,'Color',[.7 .7 .7])
16-181
16
Multivariate Methods
A Comparison of Principal Components Analysis and Classical Multidimensional Scaling Multidimensional scaling is most often used to visualize data when only their distances or dissimilarities are available. However, when the original data are available, multidimensional scaling can also be used as a dimension reduction method, by reducing the data to a distance matrix, creating a new configuration of points using cmdscale, and retaining only the first few dimensions of those points. This application of multidimensional scaling is much like Principal Components Analysis, and in fact, when you call cmdscale using the Euclidean distances between the points, the results are identical to PCA, up to a change in sign. n = 10; m = 5; X = randn(n,m); D = pdist(X,'Euclidean'); [Y,eigvals] = cmdscale(D); [PC,Score,latent] = pca(X); Y Y = 10×5 -1.4505 2.6140 -2.2399 -0.4956 0.1004 -2.5996 -1.5565 0.4656 2.3961 2.7660
1.6602 -1.0513 -1.6699 0.2265 -2.3659 1.0635 0.4215 -0.6250 2.6933 -0.3529
0.8106 -1.1962 -0.7881 1.2682 1.2672 -0.8532 -0.0931 -0.7608 -0.2020 0.5474
0.5834 0.7221 -0.6659 -0.5123 0.4837 0.1392 0.2863 -0.3233 -0.2572 -0.4560
0.5952 -0.2299 0.0398 -0.5702 -0.2888 -0.1216 0.0299 0.2786 -0.4374 0.7044
1.6602 -1.0513 -1.6699 0.2265 -2.3659 1.0635 0.4215 -0.6250 2.6933 -0.3529
-0.8106 1.1962 0.7881 -1.2682 -1.2672 0.8532 0.0931 0.7608 0.2020 -0.5474
-0.5834 -0.7221 0.6659 0.5123 -0.4837 -0.1392 -0.2863 0.3233 0.2572 0.4560
-0.5952 0.2299 -0.0398 0.5702 0.2888 0.1216 -0.0299 -0.2786 0.4374 -0.7044
Score Score = 10×5 -1.4505 2.6140 -2.2399 -0.4956 0.1004 -2.5996 -1.5565 0.4656 2.3961 2.7660
Even the nonzero eigenvalues are identical up to a scale factor. [eigvals(1:m) (n-1)*latent] ans = 5×2 36.9993 21.3766 7.5792
16-182
36.9993 21.3766 7.5792
Classical Multidimensional Scaling Applied to Nonspatial Distances
2.2815 1.5981
2.2815 1.5981
16-183
16
Multivariate Methods
Nonclassical Multidimensional Scaling This example shows how to visualize dissimilarity data using nonclassical forms of multidimensional scaling (MDS). Dissimilarity data arises when we have some set of objects, and instead of measuring the characteristics of each object, we can only measure how similar or dissimilar each pair of objects is. For example, instead of knowing the latitude and longitude of a set of cities, we may only know their inter-city distances. However, MDS also works with dissimilarities that are more abstract than physical distance. For example, we may have asked consumers to rate how similar they find several brands of peanut butter. The typical goal of MDS is to create a configuration of points in one, two, or three dimensions, whose inter-point distances are "close" to the original dissimilarities. The different forms of MDS use different criteria to define "close". These points represent the set of objects, and so a plot of the points can be used as a visual representation of their dissimilarities. Some applications of "classical" MDS are described in the “Classical Multidimensional Scaling Applied to Nonspatial Distances” on page 16-176 example. Rothkopf's Morse Code Dataset To demonstrate MDS, we'll use data collected in an experiment to investigate perception of Morse code (Rothkopf, E.Z., J. Exper. Psych., 53(2):94-101). Subjects in the study listened to two Morse code signals (audible sequences of one or more "dots" and "dashes", representing the 36 alphanumeric characters) played in succession, and were asked whether the signals were the same or different. The subjects did not know Morse code. The dissimilarity between two different characters is the frequency with which those characters were correctly distinguished. The 36x36 matrix of dissimilarities is stored as a 630-element vector containing the subdiagonal elements of the matrix. You can use the function squareform to transform between the vector format and the full matrix form. Here are the first 5 letters and their dissimilarities, reconstructed in matrix form. load morse morseChars(1:5,:) ans = 5x2 cell {'A'} {'.-' } {'B'} {'-...'} {'C'} {'-.-.'} {'D'} {'-..' } {'E'} {'.' } dissMatrix = squareform(dissimilarities); dissMatrix(1:5,1:5) ans = 5×5 0 167 169 159
16-184
167 0 96 79
169 96 0 141
159 79 141 0
180 163 166 172
Nonclassical Multidimensional Scaling
180
163
166
172
0
In these data, larger values indicate that more experimental subjects were able to distinguish the two signals, and so the signals were more dissimilar. Metric Scaling Metric MDS creates a configuration of points such that their interpoint distances approximate the original dissimilarities. One measure of the goodness of fit of that approximation is known as the "stress", and that's what we'll use initially. To compute the configuration, we provide the mdscale function with the dissimilarity data, the number of dimensions in which we want to create the points (two), and the name of the goodness-of-fit criterion we are using. Y1 = mdscale(dissimilarities, 2, 'criterion','metricstress'); size(Y1) ans = 1×2 36
2
mdscale returns a set of points in, for this example, two dimensions. We could plot them, but before using this solution (i.e. the configuration) to visualize the data, we'll make some plots to help check whether the interpoint distances from this solution recreate the original dissimilarities. The Shepard Plot The Shepard plot is a scatterplot of the interpoint distances (there are n(n-1)/2 of them) vs. the original dissimilarities. This can help determine goodness of fit of the MDS solution. If the fit is poor, then visualization could be misleading, because large (small) distances between points might not correspond to large (small) dissimilarities in the data. In the Shepard plot, a narrow scatter around a 1:1 line indicates a good fit of the distances to the dissimilarities, while a large scatter or a nonlinear pattern indicates a lack of fit. distances1 = pdist(Y1); plot(dissimilarities,distances1,'bo', [0 200],[0 200],'k--'); xlabel('Dissimilarities') ylabel('Distances')
16-185
16
Multivariate Methods
This plot indicates that this metric solution in two dimensions is probably not appropriate, because it shows both a nonlinear pattern and a large scatter. The former implies that many of the largest dissimilarities would tend to be somewhat exaggerated in the visualization, while moderate and small dissimilarities would tend to be understated. The latter implies that distance in the visualization would generally be a poor reflection of dissimilarity. In particular, a good fraction of the large dissimilarities would be badly understated. Comparing Metric Criteria We could try using a third dimension to improve the fidelity of the visualization, because with more degrees of freedom, the fit should improve. We can also try a different criterion. Two other popular metric criteria are known as Sammon Mapping and squared stress ("sstress"). Each leads to a different solution, and one or the other might be more useful in visualizing the original dissimilarities. Y2 = mdscale(dissimilarities,2, 'criterion','sammon'); distances2 = pdist(Y2); Y3 = mdscale(dissimilarities,2, 'criterion','metricsstress'); distances3 = pdist(Y3);
A Shepard plot shows the differences in the three solutions so far. plot(dissimilarities,distances1,'bo', ... dissimilarities,distances2,'r+', ... dissimilarities,distances3,'g^', ... [0 200],[0 200],'k--'); xlabel('Dissimilarities')
16-186
Nonclassical Multidimensional Scaling
ylabel('Distances') legend({'Stress', 'Sammon Mapping', 'Squared Stress'}, 'Location','NorthWest');
Notice that at the largest dissimilarity values, the scatter for the squared stress criterion tends to be closer to the 1:1 line than for the other two criteria. Thus, for these data, squared stress is somewhat better at preserving the largest dissimilarities, although it badly understates some of those. At smaller dissimilarity values, the scatter for the Sammon Mapping criterion tends to be somewhat closer to the 1:1 line than for the other two criteria. Thus, Sammon Mapping is a little better at preserving small dissimilarities. Stress is somewhere in between. All three criteria show a certain amount of nonlinearity, indicating that metric scaling may not be suitable. However, the choice of criterion depends on the goal of the visualization. Nonmetric Scaling Nonmetric scaling is a second form of MDS that has a slightly less ambitious goal than metric scaling. Instead of attempting to create a configuration of points for which the pairwise distances approximate the original dissimilarities, nonmetric MDS attempts only to approximate the ranks of the dissimilarities. Another way of saying this is that nonmetric MDS creates a configuration of points whose interpoint distances approximate a monotonic transformation of the original dissimilarities. The practical use of such a construction is that large interpoint distances correspond to large dissimilarities, and small interpoint distances to small dissimilarities. This is often sufficient to convey the relationships among the items or categories being studied. First, we'll create a configuration of points in 2D. Nonmetric scaling with Kruskal's nonmetric stress criterion is the default for mdscale. 16-187
16
Multivariate Methods
[Y,stress,disparities] = mdscale(dissimilarities,2); stress stress = 0.1800
The second output of mdscale is the value of the criterion being used, as a measure of how well the solution recreates the dissimilarities. Smaller values indicate a better fit. The stress for this configuration, about 18%, is considered poor to fair for the nonmetric stress criterion. The ranges of acceptable criterion values differ for the different criteria. The third output of mdscale is a vector of what are known as disparities. These are simply the monotonic transformation of the dissimilarities. They will be used in a nonmetric scaling Shepard plot below. Visualizing the Dissimilarity Data Although this fit is not as good as we would like, the 2D representation is easiest to visualize. We can plot each signal's dots and dashes to help see why the subjects perceive differences among the characters. The orientation and scale of this configuration is completely arbitrary, so no axis labels or values have been shown. plot(Y(:,1),Y(:,2),'.', 'Marker','none'); text(Y(:,1),Y(:,2),char(morseChars(:,2)), 'Color','b', ... 'FontSize',12,'FontWeight','bold', 'HorizontalAlignment','center'); h_gca = gca; h_gca.XTickLabel = []; h_gca.YTickLabel = []; title('Nonmetric MDS solution for Rothkopf''s Morse code data');
16-188
Nonclassical Multidimensional Scaling
This reconstruction indicates that the characters can be described in terms of two axes: roughly speaking, the northwest/southeast direction discriminates signal length, while the southwest/ northeast direction discriminates dots from dashes. The two characters with the shortest signals, 'E' and 'T', are somewhat out of position in that interpretation. The Nonmetric Shepard Plot In nonmetric scaling, it is customary to show the disparities as well as the distances in a Shepard plot. This provides a check on how well the distances recreate the disparities, as well as how nonlinear the monotonic transformation from dissimilarities to disparities is. distances = pdist(Y); [dum,ord] = sortrows([disparities(:) dissimilarities(:)]); plot(dissimilarities,distances,'bo', ... dissimilarities(ord),disparities(ord),'r.-'); xlabel('Dissimilarities') ylabel('Distances/Disparities') legend({'Distances' 'Disparities'}, 'Location','NorthWest');
This plot shows how the distances in nonmetric scaling approximate the disparities (the scatter of blue circles about the red line), and the disparities reflect the ranks of the dissimilarities (the red line is nonlinear but increasing). Comparing this plot to the Shepard plot from metric scaling shows the difference in the two methods. Nonmetric scaling attempts to recreate not the original dissimilarities, but rather a nonlinear transformation of them (the disparities). In doing that, nonmetric scaling has made a trade-off: the nonmetric distances recreate the disparities better than the metric distances recreated the dissimilarities -- the scatter in this plot is 16-189
16
Multivariate Methods
smaller that in the metric plot. However, the disparities are quite nonlinear as a function of the dissimilarities. Thus, while we can be more certain that with the nonmetric solution, small distances in the visualization correspond to small dissimilarities in the data, it's important to remember that absolute distances between points in that visualization should not be taken too literally -- only relative distances. Nonmetric Scaling in 3D Because the stress in the 2D construction was somewhat high, we can try a 3D configuration. [Y,stress,disparities] = mdscale(dissimilarities,3); stress stress = 0.1189
This stress value is quite a bit lower, indicating a better fit. We can plot the configuration in 3 dimensions. A live MATLAB® figure can be rotated interactively; here we will settle for looking from two different angles. plot3(Y(:,1),Y(:,2),Y(:,3),'.', 'Marker','none'); text(Y(:,1),Y(:,2),Y(:,3),char(morseChars(:,2)), 'Color','b', ... 'FontSize',12,'FontWeight','bold', 'HorizontalAlignment','center'); set(gca,'XTickLabel',[], 'YTickLabel',[], 'ZTickLabel',[]); title('Nonmetric MDS solution for Rothkopf''s Morse code data'); view(59,18); grid on
16-190
Nonclassical Multidimensional Scaling
From this angle, we can see that the characters with one- and two-symbol signals are well-separated from the characters with longer signals, and from each other, because they are the easiest to distinguish. If we rotate the view to a different perspective, we can see that the longer characters can, as in the 2D configuration, roughly be described in terms of the number of symbols and the number of dots or dashes. (From this second angle, some of the shorter characters spuriously appear to be interspersed with the longer ones.) view(-9,8);
This 3D configuration reconstructs the distances more accurately than the 2D configuration, however, the message is essentially the same: the subjects perceive the signals primarily in terms of how many symbols they contain, and how many dots vs. dashes. In practice, the 2D configuration might be perfectly acceptable.
16-191
16
Multivariate Methods
Fitting an Orthogonal Regression Using Principal Components Analysis This example shows how to use Principal Components Analysis (PCA) to fit a linear regression. PCA minimizes the perpendicular distances from the data to the fitted model. This is the linear case of what is known as Orthogonal Regression or Total Least Squares, and is appropriate when there is no natural distinction between predictor and response variables, or when all variables are measured with error. This is in contrast to the usual regression assumption that predictor variables are measured exactly, and only the response variable has an error component. For example, given two data vectors x and y, you can fit a line that minimizes the perpendicular distances from each of the points (x(i), y(i)) to the line. More generally, with p observed variables, you can fit an r-dimensional hyperplane in p-dimensional space (r < p). The choice of r is equivalent to choosing the number of components to retain in PCA. It may be based on prediction error, or it may simply be a pragmatic choice to reduce data to a manageable number of dimensions. In this example, we fit a plane and a line through some data on three observed variables. It's easy to do the same thing for any number of variables, and for any dimension of model, although visualizing a fit in higher dimensions would obviously not be straightforward. Fitting a Plane to 3-D Data First, we generate some trivariate normal data for the example. Two of the variables are fairly strongly correlated. rng(5,'twister'); X = mvnrnd([0 0 0], [1 .2 .7; .2 1 0; .7 0 1],50); plot3(X(:,1),X(:,2),X(:,3),'bo'); grid on; maxlim = max(abs(X(:)))*1.1; axis([-maxlim maxlim -maxlim maxlim -maxlim maxlim]); axis square view(-9,12);
16-192
Fitting an Orthogonal Regression Using Principal Components Analysis
Next, we fit a plane to the data using PCA. The coefficients for the first two principal components define vectors that form a basis for the plane. The third PC is orthogonal to the first two, and its coefficients define the normal vector of the plane. [coeff,score,roots] = pca(X); basis = coeff(:,1:2) basis = 3×2 0.6774 0.2193 0.7022
-0.0790 0.9707 -0.2269
normal = coeff(:,3) normal = 3×1 0.7314 -0.0982 -0.6749
That's all there is to the fit. But let's look closer at the results, and plot the fit along with the data. Because the first two components explain as much of the variance in the data as is possible with two dimensions, the plane is the best 2-D linear approximation to the data. Equivalently, the third component explains the least amount of variation in the data, and it is the error term in the 16-193
16
Multivariate Methods
regression. The latent roots (or eigenvalues) from the PCA define the amount of explained variance for each component. pctExplained = roots' ./ sum(roots) pctExplained = 1×3 0.6226
0.2976
0.0798
The first two coordinates of the principal component scores give the projection of each point onto the plane, in the coordinate system of the plane. To get the coordinates of the fitted points in terms of the original coordinate system, we multiply each PC coefficient vector by the corresponding score, and add back in the mean of the data. The residuals are simply the original data minus the fitted points. [n,p] = size(X); meanX = mean(X,1); Xfit = repmat(meanX,n,1) + score(:,1:2)*coeff(:,1:2)'; residuals = X - Xfit;
The equation of the fitted plane, satisfied by each of the fitted points in Xfit, is ([x1 x2 x3] meanX)*normal = 0. The plane passes through the point meanX, and its perpendicular distance to the origin is meanX*normal. The perpendicular distance from each point in X to the plane, i.e., the norm of the residuals, is the dot product of each centered point with the normal to the plane. The fitted plane minimizes the sum of the squared errors. error = abs((X - repmat(meanX,n,1))*normal); sse = sum(error.^2) sse = 15.5142
To visualize the fit, we can plot the plane, the original data, and their projection to the plane. [xgrid,ygrid] = meshgrid(linspace(min(X(:,1)),max(X(:,1)),5), ... linspace(min(X(:,2)),max(X(:,2)),5)); zgrid = (1/normal(3)) .* (meanX*normal - (xgrid.*normal(1) + ygrid.*normal(2))); h = mesh(xgrid,ygrid,zgrid,'EdgeColor',[0 0 0],'FaceAlpha',0); hold on above = (X-repmat(meanX,n,1))*normal < 0; below = ~above; nabove = sum(above); X1 = [X(above,1) Xfit(above,1) nan*ones(nabove,1)]; X2 = [X(above,2) Xfit(above,2) nan*ones(nabove,1)]; X3 = [X(above,3) Xfit(above,3) nan*ones(nabove,1)]; plot3(X1',X2',X3','-', X(above,1),X(above,2),X(above,3),'o', 'Color',[0 .7 0]); nbelow = sum(below); X1 = [X(below,1) Xfit(below,1) nan*ones(nbelow,1)]; X2 = [X(below,2) Xfit(below,2) nan*ones(nbelow,1)]; X3 = [X(below,3) Xfit(below,3) nan*ones(nbelow,1)]; plot3(X1',X2',X3','-', X(below,1),X(below,2),X(below,3),'o', 'Color',[1 0 0]); hold off maxlim = max(abs(X(:)))*1.1; axis([-maxlim maxlim -maxlim maxlim -maxlim maxlim]); axis square view(-9,12);
16-194
Fitting an Orthogonal Regression Using Principal Components Analysis
Green points are above the plane, red points are below. Fitting a Line to 3-D Data Fitting a straight line to the data is even simpler, and because of the nesting property of PCA, we can use the components that have already been computed. The direction vector that defines the line is given by the coefficients for the first principal component. The second and third PCs are orthogonal to the first, and their coefficients define directions that are perpendicular to the line. The simplest equation to describe the line is meanX + t*dirVect, where t parameterizes the position along the line. dirVect = coeff(:,1) dirVect = 3×1 0.6774 0.2193 0.7022
The first coordinate of the principal component scores gives the projection of each point onto the line. As with the 2-D fit, the PC coefficient vectors multiplied by the scores the gives the fitted points in the original coordinate system. Xfit1 = repmat(meanX,n,1) + score(:,1)*coeff(:,1)';
Plot the line, the original data, and their projection to the line. 16-195
16
Multivariate Methods
t = [min(score(:,1))-.2, max(score(:,1))+.2]; endpts = [meanX + t(1)*dirVect'; meanX + t(2)*dirVect']; plot3(endpts(:,1),endpts(:,2),endpts(:,3),'k-'); X1 = [X(:,1) Xfit1(:,1) nan*ones(n,1)]; X2 = [X(:,2) Xfit1(:,2) nan*ones(n,1)]; X3 = [X(:,3) Xfit1(:,3) nan*ones(n,1)]; hold on plot3(X1',X2',X3','b-', X(:,1),X(:,2),X(:,3),'bo'); hold off maxlim = max(abs(X(:)))*1.1; axis([-maxlim maxlim -maxlim maxlim -maxlim maxlim]); axis square view(-9,12); grid on
While it appears that many of the projections in this plot are not perpendicular to the line, that's just because we're plotting 3-D data in two dimensions. In a live MATLAB® figure window, you could interactively rotate the plot to different perspectives to verify that the projections are indeed perpendicular, and to get a better feel for how the line fits the data.
16-196
Tune Regularization Parameter to Detect Features Using NCA for Classification
Tune Regularization Parameter to Detect Features Using NCA for Classification This example shows how to tune the regularization parameter in fscnca using cross-validation. Tuning the regularization parameter helps to correctly detect the relevant features in the data. Load the sample data. load('twodimclassdata.mat')
This dataset is simulated using the scheme described in [1]. This is a two-class classification problem in two dimensions. Data from the first class are drawn from two bivariate normal distributions N(μ1, Σ) or N(μ2, Σ) with equal probability, where μ1 = [ − 0 . 75, − 1 . 5], μ2 = [0 . 75, 1 . 5] and Σ = I2. Similarly, data from the second class are drawn from two bivariate normal distributions N(μ3, Σ) or N(μ4, Σ) with equal probability, where μ3 = [1 . 5, − 1 . 5], μ4 = [ − 1 . 5, 1 . 5] and Σ = I2. The normal distribution parameters used to create this data set results in tighter clusters in data than the data used in [1]. Create a scatter plot of the data grouped by the class. figure gscatter(X(:,1),X(:,2),y) xlabel('x1') ylabel('x2')
16-197
16
Multivariate Methods
Add 100 irrelevant features to X . First generate data from a Normal distribution with a mean of 0 and a variance of 20. n = size(X,1); rng('default') XwithBadFeatures = [X,randn(n,100)*sqrt(20)];
Normalize the data so that all points are between 0 and 1. XwithBadFeatures = (XwithBadFeatures-min(XwithBadFeatures,[],1))./range(XwithBadFeatures,1); X = XwithBadFeatures;
Fit an NCA model to the data using the default Lambda (regularization parameter, λ) value. Use the LBFGS solver and display the convergence information. ncaMdl = fscnca(X,y,'FitMethod','exact','Verbose',1, ... 'Solver','lbfgs'); o Solver = LBFGS, HessianHistorySize = 15, LineSearchMethod = weakwolfe
|================================================================================================ | ITER | FUN VALUE | NORM GRAD | NORM STEP | CURV | GAMMA | ALPHA | ACC |================================================================================================ | 0 | 9.519258e-03 | 1.494e-02 | 0.000e+00 | | 4.015e+01 | 0.000e+00 | Y | 1 | -3.093574e-01 | 7.186e-03 | 4.018e+00 | OK | 8.956e+01 | 1.000e+00 | Y | 2 | -4.809455e-01 | 4.444e-03 | 7.123e+00 | OK | 9.943e+01 | 1.000e+00 | Y | 3 | -4.938877e-01 | 3.544e-03 | 1.464e+00 | OK | 9.366e+01 | 1.000e+00 | Y | 4 | -4.964759e-01 | 2.901e-03 | 6.084e-01 | OK | 1.554e+02 | 1.000e+00 | Y | 5 | -4.972077e-01 | 1.323e-03 | 6.129e-01 | OK | 1.195e+02 | 5.000e-01 | Y | 6 | -4.974743e-01 | 1.569e-04 | 2.155e-01 | OK | 1.003e+02 | 1.000e+00 | Y | 7 | -4.974868e-01 | 3.844e-05 | 4.161e-02 | OK | 9.835e+01 | 1.000e+00 | Y | 8 | -4.974874e-01 | 1.417e-05 | 1.073e-02 | OK | 1.043e+02 | 1.000e+00 | Y | 9 | -4.974874e-01 | 4.893e-06 | 1.781e-03 | OK | 1.530e+02 | 1.000e+00 | Y | 10 | -4.974874e-01 | 9.404e-08 | 8.947e-04 | OK | 1.670e+02 | 1.000e+00 | Y Infinity norm of the final gradient = 9.404e-08 Two norm of the final step = 8.947e-04, TolX = 1.000e-06 Relative infinity norm of the final gradient = 9.404e-08, TolFun = 1.000e-06 EXIT: Local minimum found.
Plot the feature weights. The weights of the irrelevant features should be very close to zero. figure semilogx(ncaMdl.FeatureWeights,'ro') xlabel('Feature index') ylabel('Feature weight') grid on
All the weights are very close to zero. This indicates that the value of λ used in training the model is too large. When λ → ∞, all feature weights approach zero. Hence, it is important to tune the regularization parameter in most cases to detect the relevant features.

Use five-fold cross-validation to tune λ for feature selection using fscnca. Tuning λ means finding the λ value that produces the minimum classification loss. Here are the steps for tuning λ using cross-validation:

1. First partition the data into five folds. For each fold, cvpartition assigns 4/5 of the data as a training set, and 1/5 of the data as a test set.
cvp          = cvpartition(y,'kfold',5);
numtestsets  = cvp.NumTestSets;
lambdavalues = linspace(0,2,20)/length(y);
lossvalues   = zeros(length(lambdavalues),numtestsets);
2. Train the neighborhood component analysis (NCA) model for each λ value using the training set in each fold.
3. Compute the classification loss for the corresponding test set in the fold using the NCA model. Record the loss value.
4. Repeat this for all folds and all λ values.
for i = 1:length(lambdavalues)
    for k = 1:numtestsets
        % Extract the training set from the partition object
        Xtrain = X(cvp.training(k),:);
        ytrain = y(cvp.training(k),:);

        % Extract the test set from the partition object
        Xtest = X(cvp.test(k),:);
        ytest = y(cvp.test(k),:);

        % Train an NCA model for classification using the training set
        ncaMdl = fscnca(Xtrain,ytrain,'FitMethod','exact', ...
            'Solver','lbfgs','Lambda',lambdavalues(i));

        % Compute the classification loss for the test set using the NCA model
        lossvalues(i,k) = loss(ncaMdl,Xtest,ytest, ...
            'LossFunction','quadratic');
    end
end
Plot the average loss values of the folds versus the λ values. figure plot(lambdavalues,mean(lossvalues,2),'ro-') xlabel('Lambda values') ylabel('Loss values') grid on
Find the λ value that corresponds to the minimum average loss. [~,idx] = min(mean(lossvalues,2)); % Find the index bestlambda = lambdavalues(idx) % Find the best lambda value bestlambda = 0.0037
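Optionally, you can mark the selected value on the loss curve from the previous plot. This short addition (not part of the original example) assumes that figure is still current:
hold on
plot(bestlambda,min(mean(lossvalues,2)),'bs','MarkerSize',10)
legend('Mean loss over folds','Selected \lambda','Location','best')
hold off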
Fit the NCA model to all of the data using the best λ value. Use the LBFGS solver and display the convergence information. ncaMdl = fscnca(X,y,'FitMethod','exact','Verbose',1, ... 'Solver','lbfgs','Lambda',bestlambda); o Solver = LBFGS, HessianHistorySize = 15, LineSearchMethod = weakwolfe
|================================================================================================ | ITER | FUN VALUE | NORM GRAD | NORM STEP | CURV | GAMMA | ALPHA | ACC |================================================================================================ | 0 | -1.246913e-01 | 1.231e-02 | 0.000e+00 | | 4.873e+01 | 0.000e+00 | Y | 1 | -3.411330e-01 | 5.717e-03 | 3.618e+00 | OK | 1.068e+02 | 1.000e+00 | Y | 2 | -5.226111e-01 | 3.763e-02 | 8.252e+00 | OK | 7.825e+01 | 1.000e+00 | Y | 3 | -5.817731e-01 | 8.496e-03 | 2.340e+00 | OK | 5.591e+01 | 5.000e-01 | Y | 4 | -6.132632e-01 | 6.863e-03 | 2.526e+00 | OK | 8.228e+01 | 1.000e+00 | Y | 5 | -6.135264e-01 | 9.373e-03 | 7.341e-01 | OK | 3.244e+01 | 1.000e+00 | Y | 6 | -6.147894e-01 | 1.182e-03 | 2.933e-01 | OK | 2.447e+01 | 1.000e+00 | Y | 7 | -6.148714e-01 | 6.392e-04 | 6.688e-02 | OK | 3.195e+01 | 1.000e+00 | Y | 8 | -6.149524e-01 | 6.521e-04 | 9.934e-02 | OK | 1.236e+02 | 1.000e+00 | Y | 9 | -6.149972e-01 | 1.154e-04 | 1.191e-01 | OK | 1.171e+02 | 1.000e+00 | Y | 10 | -6.149990e-01 | 2.922e-05 | 1.983e-02 | OK | 7.365e+01 | 1.000e+00 | Y | 11 | -6.149993e-01 | 1.556e-05 | 8.354e-03 | OK | 1.288e+02 | 1.000e+00 | Y | 12 | -6.149994e-01 | 1.147e-05 | 7.256e-03 | OK | 2.332e+02 | 1.000e+00 | Y | 13 | -6.149995e-01 | 1.040e-05 | 6.781e-03 | OK | 2.287e+02 | 1.000e+00 | Y | 14 | -6.149996e-01 | 9.015e-06 | 6.265e-03 | OK | 9.974e+01 | 1.000e+00 | Y | 15 | -6.149996e-01 | 7.763e-06 | 5.206e-03 | OK | 2.919e+02 | 1.000e+00 | Y | 16 | -6.149997e-01 | 8.374e-06 | 1.679e-02 | OK | 6.878e+02 | 1.000e+00 | Y | 17 | -6.149997e-01 | 9.387e-06 | 9.542e-03 | OK | 1.284e+02 | 5.000e-01 | Y | 18 | -6.149997e-01 | 3.250e-06 | 5.114e-03 | OK | 1.225e+02 | 1.000e+00 | Y | 19 | -6.149997e-01 | 1.574e-06 | 1.275e-03 | OK | 1.808e+02 | 1.000e+00 | Y
|================================================================================================ | ITER | FUN VALUE | NORM GRAD | NORM STEP | CURV | GAMMA | ALPHA | ACC |================================================================================================ | 20 | -6.149997e-01 | 5.764e-07 | 6.765e-04 | OK | 2.905e+02 | 1.000e+00 | Y Infinity norm of the final gradient = 5.764e-07 Two norm of the final step = 6.765e-04, TolX = 1.000e-06 Relative infinity norm of the final gradient = 5.764e-07, TolFun = 1.000e-06 EXIT: Local minimum found.
Plot the feature weights. figure semilogx(ncaMdl.FeatureWeights,'ro') xlabel('Feature index') ylabel('Feature weight') grid on
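As an aside that is not part of the original example, you can also convert the fitted weights into an explicit feature subset. The 0.02 cutoff below is an arbitrary illustrative threshold, chosen only to separate the near-zero weights from the clearly nonzero ones:
selectedFeatures = find(ncaMdl.FeatureWeights > 0.02)
For this data set, the result should be the first two features.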
fscnca correctly figures out that the first two features are relevant and the rest are not. Note that the first two features are not individually informative but when taken together result in an accurate classification model.
References
[1] Yang, W., K. Wang, and W. Zuo. "Neighborhood Component Feature Selection for High-Dimensional Data." Journal of Computers. Vol. 7, Number 1, January 2012.
See Also FeatureSelectionNCAClassification | fscnca | refit | predict | loss
More About
• “Neighborhood Component Analysis (NCA) Feature Selection” on page 16-96
• “Introduction to Feature Selection” on page 16-46
17 Cluster Analysis
• “Choose Cluster Analysis Method” on page 17-2
• “Hierarchical Clustering” on page 17-6
• “DBSCAN” on page 17-19
• “Partition Data Using Spectral Clustering” on page 17-26
• “k-Means Clustering” on page 17-33
• “Cluster Using Gaussian Mixture Model” on page 17-39
• “Cluster Gaussian Mixture Data Using Hard Clustering” on page 17-46
• “Cluster Gaussian Mixture Data Using Soft Clustering” on page 17-52
• “Tune Gaussian Mixture Models” on page 17-57
• “Cluster Evaluation” on page 17-63
• “Cluster Analysis” on page 17-66
• “Anomaly Detection with Isolation Forest” on page 17-81
• “Unsupervised Anomaly Detection” on page 17-91
• “Model-Specific Anomaly Detection” on page 17-111
Choose Cluster Analysis Method This topic provides a brief overview of the available clustering methods in Statistics and Machine Learning Toolbox.
Clustering Methods Cluster analysis, also called segmentation analysis or taxonomy analysis, is a common unsupervised learning method. Unsupervised learning is used to draw inferences from data sets consisting of input data without labeled responses. For example, you can use cluster analysis for exploratory data analysis to find hidden patterns or groupings in unlabeled data. Cluster analysis creates groups, or clusters, of data. Objects that belong to the same cluster are similar to one another and distinct from objects that belong to different clusters. To quantify "similar" and "distinct," you can use a dissimilarity measure (or distance metric on page 19-14) that is specific to the domain of your application and your data set. Also, depending on your application, you might consider scaling (or standardizing) the variables in your data to give them equal importance during clustering. Statistics and Machine Learning Toolbox provides functionality for these clustering methods: • “Hierarchical Clustering” on page 17-2 • “k-Means and k-Medoids Clustering” on page 17-2 • “Density-Based Spatial Clustering of Applications with Noise (DBSCAN)” on page 17-3 • “Gaussian Mixture Model” on page 17-3 • “k-Nearest Neighbor Search and Radius Search” on page 17-3 • “Spectral Clustering” on page 17-3 Hierarchical Clustering Hierarchical clustering groups data over a variety of scales by creating a cluster tree, or dendrogram. The tree is not a single set of clusters, but rather a multilevel hierarchy, where clusters at one level combine to form clusters at the next level. This multilevel hierarchy allows you to choose the level, or scale, of clustering that is most appropriate for your application. Hierarchical clustering assigns every point in your data to a cluster. Use clusterdata to perform hierarchical clustering on input data. clusterdata incorporates the pdist, linkage, and cluster functions, which you can use separately for more detailed analysis. The dendrogram function plots the cluster tree. For more information, see “Introduction to Hierarchical Clustering” on page 17-6. k-Means and k-Medoids Clustering k-means clustering and k-medoids clustering partition data into k mutually exclusive clusters. These clustering methods require that you specify the number of clusters k. Both k-means and k-medoids clustering assign every point in your data to a cluster; however, unlike hierarchical clustering, these methods operate on actual observations (rather than dissimilarity measures), and create a single level of clusters. Therefore, k-means or k-medoids clustering is often more suitable than hierarchical clustering for large amounts of data. 17-2
Use kmeans and kmedoids to implement k-means clustering and k-medoids clustering, respectively. For more information, see Introduction to k-Means Clustering on page 17-33 and k-Medoids Clustering on page 35-4483. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) DBSCAN is a density-based algorithm that identifies arbitrarily shaped clusters and outliers (noise) in data. During clustering, DBSCAN identifies points that do not belong to any cluster, which makes this method useful for density-based outlier detection. Unlike k-means and k-medoids clustering, DBSCAN does not require prior knowledge of the number of clusters. Use dbscan to perform clustering on an input data matrix or on pairwise distances between observations. For more information, see “Introduction to DBSCAN” on page 17-19. Gaussian Mixture Model A Gaussian mixture model (GMM) forms clusters as a mixture of multivariate normal density components. For a given observation, the GMM assigns posterior probabilities to each component density (or cluster). The posterior probabilities indicate that the observation has some probability of belonging to each cluster. A GMM can perform hard clustering by selecting the component that maximizes the posterior probability as the assigned cluster for the observation. You can also use a GMM to perform soft, or fuzzy, clustering by assigning the observation to multiple clusters based on the scores or posterior probabilities of the observation for the clusters. A GMM can be a more appropriate method than k-means clustering when clusters have different sizes and different correlation structures within them. Use fitgmdist to fit a gmdistribution object to your data. You can also use gmdistribution to create a GMM object by specifying the distribution parameters. When you have a fitted GMM, you can cluster query data by using the cluster function. For more information, see “Cluster Using Gaussian Mixture Model” on page 17-39. k-Nearest Neighbor Search and Radius Search k-nearest neighbor search finds the k closest points in your data to a query point or set of query points. In contrast, radius search finds all points in your data that are within a specified distance from a query point or set of query points. The results of these methods depend on the distance metric on page 19-14 that you specify. Use the knnsearch function to find k-nearest neighbors or the rangesearch function to find all neighbors within a specified distance of your input data. You can also create a searcher object using a training data set, and pass the object and query data sets to the object functions (knnsearch and rangesearch). For more information, see “Classification Using Nearest Neighbors” on page 19-14. Spectral Clustering Spectral clustering is a graph-based algorithm for finding k arbitrarily shaped clusters in data. The technique involves representing the data in a low dimension. In the low dimension, clusters in the data are more widely separated, enabling you to use algorithms such as k-means or k-medoids clustering. This low dimension is based on eigenvectors of a Laplacian matrix. A Laplacian matrix is one way of representing a similarity graph that models the local neighborhood relationships between data points as an undirected graph. Use spectralcluster to perform spectral clustering on an input data matrix or on a similarity matrix of a similarity graph. spectralcluster requires that you specify the number of clusters. 17-3
However, the algorithm for spectral clustering also provides a way to estimate the number of clusters in your data. For more information, see “Partition Data Using Spectral Clustering” on page 17-26.
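The call patterns for these methods are similar. The following minimal sketch, using toy data and arbitrary parameter values rather than anything from this documentation, shows how each function is invoked:
rng("default")
X = [randn(50,2); randn(50,2)+4];        % two well-separated toy groups

Thier     = clusterdata(X,"Maxclust",2); % hierarchical clustering
Tkmeans   = kmeans(X,2);                 % k-means clustering, k specified
Tdbscan   = dbscan(X,1,5);               % DBSCAN, epsilon = 1 and minpts = 5
Tspectral = spectralcluster(X,2);        % spectral clustering, k specified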
Comparison of Clustering Methods This table compares the features of available clustering methods in Statistics and Machine Learning Toolbox.
“Hierarchical Clustering” on page 17-6
• Basis of algorithm: Distance between objects
• Input to algorithm: Pairwise distances between observations
• Requires specified number of clusters: No
• Cluster shapes identified: Arbitrarily shaped clusters, depending on the specified 'Linkage' algorithm
• Useful for outlier detection: No

“k-Means Clustering” on page 17-33 and k-Medoids Clustering on page 35-4483
• Basis of algorithm: Distance between objects and centroids
• Input to algorithm: Actual observations
• Requires specified number of clusters: Yes
• Cluster shapes identified: Spheroidal clusters with equal diagonal covariance
• Useful for outlier detection: No

Density-Based Spatial Clustering of Applications with Noise (“DBSCAN” on page 17-19)
• Basis of algorithm: Density of regions in the data
• Input to algorithm: Actual observations or pairwise distances between observations
• Requires specified number of clusters: No
• Cluster shapes identified: Arbitrarily shaped clusters
• Useful for outlier detection: Yes

“Gaussian Mixture Models”
• Basis of algorithm: Mixture of Gaussian distributions
• Input to algorithm: Actual observations
• Requires specified number of clusters: Yes
• Cluster shapes identified: Spheroidal clusters with different covariance structures
• Useful for outlier detection: No

Nearest Neighbors
• Basis of algorithm: Distance between objects
• Input to algorithm: Actual observations
• Requires specified number of clusters: No
• Cluster shapes identified: Arbitrarily shaped clusters
• Useful for outlier detection: Yes, depending on the specified number of neighbors

Spectral Clustering (“Partition Data Using Spectral Clustering” on page 17-26)
• Basis of algorithm: Graph representing connections between data points
• Input to algorithm: Actual observations or similarity matrix
• Requires specified number of clusters: Yes, but the algorithm also provides a way to estimate the number of clusters
• Cluster shapes identified: Arbitrarily shaped clusters
• Useful for outlier detection: No
See Also
More About
• “Hierarchical Clustering” on page 17-6
• “k-Means Clustering” on page 17-33
• “DBSCAN” on page 17-19
• “Cluster Using Gaussian Mixture Model” on page 17-39
• “Partition Data Using Spectral Clustering” on page 17-26
Hierarchical Clustering In this section... “Introduction to Hierarchical Clustering” on page 17-6 “Algorithm Description” on page 17-6 “Similarity Measures” on page 17-7 “Linkages” on page 17-8 “Dendrograms” on page 17-9 “Verify the Cluster Tree” on page 17-10 “Create Clusters” on page 17-15
Introduction to Hierarchical Clustering Hierarchical clustering groups data over a variety of scales by creating a cluster tree or dendrogram. The tree is not a single set of clusters, but rather a multilevel hierarchy, where clusters at one level are joined as clusters at the next level. This allows you to decide the level or scale of clustering that is most appropriate for your application. The function clusterdata supports agglomerative clustering and performs all of the necessary steps for you. It incorporates the pdist, linkage, and cluster functions, which you can use separately for more detailed analysis. The dendrogram function plots the cluster tree.
Algorithm Description To perform agglomerative hierarchical cluster analysis on a data set using Statistics and Machine Learning Toolbox functions, follow this procedure:
1. Find the similarity or dissimilarity between every pair of objects in the data set. In this step, you calculate the distance between objects using the pdist function. The pdist function supports many different ways to compute this measurement. See “Similarity Measures” on page 17-7 for more information.
2. Group the objects into a binary, hierarchical cluster tree. In this step, you link pairs of objects that are in close proximity using the linkage function. The linkage function uses the distance information generated in step 1 to determine the proximity of objects to each other. As objects are paired into binary clusters, the newly formed clusters are grouped into larger clusters until a hierarchical tree is formed. See “Linkages” on page 17-8 for more information.
3. Determine where to cut the hierarchical tree into clusters. In this step, you use the cluster function to prune branches off the bottom of the hierarchical tree, and assign all the objects below each cut to a single cluster. This creates a partition of the data. The cluster function can create these clusters by detecting natural groupings in the hierarchical tree or by cutting off the hierarchical tree at an arbitrary point.
The following sections provide more information about each of these steps. Note The function clusterdata performs all of the necessary steps for you. You do not need to execute the pdist, linkage, or cluster functions separately.
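For reference, here is a compact sketch of the three-step workflow on hypothetical data, together with the equivalent one-step clusterdata call:
X = rand(20,3);                      % hypothetical data
Y = pdist(X);                        % step 1: pairwise distances
Z = linkage(Y);                      % step 2: build the hierarchical cluster tree
T = cluster(Z,"maxclust",3);         % step 3: cut the tree into three clusters
T2 = clusterdata(X,"Maxclust",3);    % one-step equivalent with default settings
With the default distance and linkage settings, T and T2 should contain the same cluster assignments.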
Similarity Measures You use the pdist function to calculate the distance between every pair of objects in a data set. For a data set made up of m objects, there are m*(m – 1)/2 pairs in the data set. The result of this computation is commonly known as a distance or dissimilarity matrix. There are many ways to calculate this distance information. By default, the pdist function calculates the Euclidean distance between objects; however, you can specify one of several other options. See pdist for more information. Note You can optionally normalize the values in the data set before calculating the distance information. In a real world data set, variables can be measured against different scales. For example, one variable can measure Intelligence Quotient (IQ) test scores and another variable can measure head circumference. These discrepancies can distort the proximity calculations. Using the zscore function, you can convert all the values in the data set to use the same proportional scale. See zscore for more information. For example, consider a data set, X, made up of five objects where each object is a set of x,y coordinates. • Object 1: 1, 2 • Object 2: 2.5, 4.5 • Object 3: 2, 2 • Object 4: 4, 1.5 • Object 5: 4, 2.5 You can define this data set as a matrix rng("default") % For reproducibility X = [1 2; 2.5 4.5; 2 2; 4 1.5; ... 4 2.5];
and pass it to pdist. The pdist function calculates the distance between object 1 and object 2, object 1 and object 3, and so on until the distances between all the pairs have been calculated. The following figure plots these objects in a graph. The Euclidean distance between object 2 and object 3 is shown to illustrate one interpretation of distance.
Distance Information The pdist function returns this distance information in a vector, Y, where each element contains the distance between a pair of objects.
Y = pdist(X)
Y =
  Columns 1 through 6
    2.9155    1.0000    3.0414    3.0414    2.5495    3.3541
  Columns 7 through 10
    2.5000    2.0616    2.0616    1.0000
To make it easier to see the relationship between the distance information generated by pdist and the objects in the original data set, you can reformat the distance vector into a matrix using the squareform function. In this matrix, element i,j corresponds to the distance between object i and object j in the original data set. In the following example, element 1,1 represents the distance between object 1 and itself (which is zero). Element 1,2 represents the distance between object 1 and object 2, and so on.
squareform(Y)
ans =
         0    2.9155    1.0000    3.0414    3.0414
    2.9155         0    2.5495    3.3541    2.5000
    1.0000    2.5495         0    2.0616    2.0616
    3.0414    3.3541    2.0616         0    1.0000
    3.0414    2.5000    2.0616    1.0000         0
Linkages Once the proximity between objects in the data set has been computed, you can determine how objects in the data set should be grouped into clusters, using the linkage function. The linkage function takes the distance information generated by pdist and links pairs of objects that are close together into binary clusters (clusters made up of two objects). The linkage function then links these newly formed clusters to each other and to other objects to create bigger clusters until all the objects in the original data set are linked together in a hierarchical tree. For example, given the distance vector Y generated by pdist from the sample data set of x- and y-coordinates, the linkage function generates a hierarchical cluster tree, returning the linkage information in a matrix, Z.
Z = linkage(Y)
Z =
    4.0000    5.0000    1.0000
    1.0000    3.0000    1.0000
    6.0000    7.0000    2.0616
    2.0000    8.0000    2.5000
In this output, each row identifies a link between objects or clusters. The first two columns identify the objects that have been linked. The third column contains the distance between these objects. For the sample data set of x- and y-coordinates, the linkage function begins by grouping objects 4 and 5, which have the closest proximity (distance value = 1.0000). The linkage function continues by grouping objects 1 and 3, which also have a distance value of 1.0000. The third row indicates that the linkage function grouped objects 6 and 7. If the original sample data set contained only five objects, what are objects 6 and 7? Object 6 is the newly formed binary cluster created by the grouping of objects 4 and 5. When the linkage function groups two objects into a new cluster, it must assign the cluster a unique index value, starting with the value m + 1, where m is the number of objects in the original data set. (Values 1 through m are already used by the original data set.) Similarly, object 7 is the cluster formed by grouping objects 1 and 3. linkage uses distances to determine the order in which it clusters objects. The distance vector Y contains the distances between the original objects 1 through 5. But linkage must also be able to determine distances involving clusters that it creates, such as objects 6 and 7. By default, linkage uses a method known as single linkage. However, there are a number of different methods available. See the linkage reference page for more information. As the final cluster, the linkage function grouped object 8, the newly formed cluster made up of objects 6 and 7, with object 2 from the original data set. The following figure graphically illustrates the way linkage groups the objects into a hierarchy of clusters.
Dendrograms The hierarchical, binary cluster tree created by the linkage function is most easily understood when viewed graphically. The function dendrogram plots the tree as follows. dendrogram(Z)
In the figure, the numbers along the horizontal axis represent the indices of the objects in the original data set. The links between objects are represented as upside-down U-shaped lines. The height of the U indicates the distance between the objects. For example, the link representing the cluster containing objects 1 and 3 has a height of 1. The link representing the cluster that groups object 2 together with objects 1, 3, 4, and 5, (which are already clustered as object 8) has a height of 2.5. The height represents the distance linkage computes between objects 2 and 8. For more information about creating a dendrogram diagram, see the dendrogram reference page.
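dendrogram also accepts several name-value arguments. For example, this small sketch labels the leaves with custom names and draws the tree horizontally:
labels = {'obj1','obj2','obj3','obj4','obj5'};
dendrogram(Z,'Labels',labels,'Orientation','left')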
Verify the Cluster Tree After linking the objects in a data set into a hierarchical cluster tree, you might want to verify that the distances (that is, heights) in the tree reflect the original distances accurately. In addition, you might want to investigate natural divisions that exist among links between objects. Statistics and Machine Learning Toolbox functions are available for both of these tasks, as described in the following sections.
• “Verify Dissimilarity” on page 17-10
• “Verify Consistency” on page 17-11
Verify Dissimilarity In a hierarchical cluster tree, any two objects in the original data set are eventually linked together at some level. The height of the link represents the distance between the two clusters that contain those two objects. This height is known as the cophenetic distance between the two objects. One way to
measure how well the cluster tree generated by the linkage function reflects your data is to compare the cophenetic distances with the original distance data generated by the pdist function. If the clustering is valid, the linking of objects in the cluster tree should have a strong correlation with the distances between objects in the distance vector. The cophenet function compares these two sets of values and computes their correlation, returning a value called the cophenetic correlation coefficient. The closer the value of the cophenetic correlation coefficient is to 1, the more accurately the clustering solution reflects your data. You can use the cophenetic correlation coefficient to compare the results of clustering the same data set using different distance calculation methods or clustering algorithms. For example, you can use the cophenet function to evaluate the clusters created for the sample data set. c = cophenet(Z,Y) c = 0.8615
Z is the matrix output by the linkage function and Y is the distance vector output by the pdist function. Execute pdist again on the same data set, this time specifying the city block metric. After running the linkage function on this new pdist output using the average linkage method, call cophenet to evaluate the clustering solution. Y = pdist(X,"cityblock"); Z = linkage(Y,"average"); c = cophenet(Z,Y) c = 0.9047
The cophenetic correlation coefficient shows that using a different distance and linkage method creates a tree that represents the original distances slightly better. Verify Consistency One way to determine the natural cluster divisions in a data set is to compare the height of each link in a cluster tree with the heights of neighboring links below it in the tree. A link that is approximately the same height as the links below it indicates that there are no distinct divisions between the objects joined at this level of the hierarchy. These links are said to exhibit a high level of consistency, because the distance between the objects being joined is approximately the same as the distances between the objects they contain. On the other hand, a link whose height differs noticeably from the height of the links below it indicates that the objects joined at this level in the cluster tree are much farther apart from each other than their components were when they were joined. This link is said to be inconsistent with the links below it. In cluster analysis, inconsistent links can indicate the border of a natural division in a data set. The cluster function uses a quantitative measure of inconsistency to determine where to partition your data set into clusters. 17-11
The following dendrogram illustrates inconsistent links. Note how the objects in the dendrogram fall into two groups that are connected by links at a much higher level in the tree. These links are inconsistent when compared with the links below them in the hierarchy.
The relative consistency of each link in a hierarchical cluster tree can be quantified and expressed as the inconsistency coefficient. This value compares the height of a link in a cluster hierarchy with the average height of links below it. Links that join distinct clusters have a high inconsistency coefficient; links that join indistinct clusters have a low inconsistency coefficient. To generate a listing of the inconsistency coefficient for each link in the cluster tree, use the inconsistent function. By default, the inconsistent function compares each link in the cluster hierarchy with adjacent links that are less than two levels below it in the cluster hierarchy. This is called the depth of the comparison. You can also specify other depths. The objects at the bottom of the cluster tree, called leaf nodes, that have no further objects below them, have an inconsistency coefficient of zero. Clusters that join two leaves also have a zero inconsistency coefficient. For example, you can use the inconsistent function to calculate the inconsistency values for the links created by the linkage function in “Linkages” on page 17-8. First, recompute the distance and linkage values using the default settings. 17-12
Y = pdist(X); Z = linkage(Y);
Next, use inconsistent to calculate the inconsistency values.
I = inconsistent(Z)
I =
    1.0000         0    1.0000         0
    1.0000         0    1.0000         0
    1.3539    0.6129    3.0000    1.1547
    2.2808    0.3100    2.0000    0.7071
The inconsistent function returns data about the links in an (m-1)-by-4 matrix, whose columns are described in the following table.
Column 1: Mean of the heights of all the links included in the calculation
Column 2: Standard deviation of all the links included in the calculation
Column 3: Number of links included in the calculation
Column 4: Inconsistency coefficient
In the sample output, the first row represents the link between objects 4 and 5. This cluster is assigned the index 6 by the linkage function. Because both 4 and 5 are leaf nodes, the inconsistency coefficient for the cluster is zero. The second row represents the link between objects 1 and 3, both of which are also leaf nodes. This cluster is assigned the index 7 by the linkage function. The third row evaluates the link that connects these two clusters, objects 6 and 7. (This new cluster is assigned index 8 in the linkage output). Column 3 indicates that three links are considered in the calculation: the link itself and the two links directly below it in the hierarchy. Column 1 represents the mean of the heights of these links. The inconsistent function uses the height information output by the linkage function to calculate the mean. Column 2 represents the standard deviation between the links. The last column contains the inconsistency value for these links, 1.1547. It is the difference between the current link height and the mean, normalized by the standard deviation. (2.0616 - 1.3539) / 0.6129 ans = 1.1547
The following figure illustrates the links and heights included in this calculation.
Note In the preceding figure, the lower limit on the y-axis is set to 0 to show the heights of the links. To set the lower limit to 0, select Axes Properties from the Edit menu, click the Y Axis tab, and enter 0 in the field immediately to the right of Y Limits. Row 4 in the output matrix describes the link between object 8 and object 2. Column 3 indicates that two links are included in this calculation: the link itself and the link directly below it in the hierarchy. The inconsistency coefficient for this link is 0.7071. The following figure illustrates the links and heights included in this calculation.
Create Clusters After you create the hierarchical tree of binary clusters, you can prune the tree to partition your data into clusters using the cluster function. The cluster function lets you create clusters in two ways, as discussed in the following sections: • “Find Natural Divisions in Data” on page 17-15 • “Specify Arbitrary Clusters” on page 17-16 Find Natural Divisions in Data The hierarchical cluster tree may naturally divide the data into distinct, well-separated clusters. This can be particularly evident in a dendrogram diagram created from data where groups of objects are densely packed in certain areas and not in others. The inconsistency coefficient of the links in the cluster tree can identify these divisions where the similarities between objects change abruptly. (See “Verify the Cluster Tree” on page 17-10 for more information about the inconsistency coefficient.) You can use this value to determine where the cluster function creates cluster boundaries. For example, if you use the cluster function to group the sample data set into clusters, specifying an inconsistency coefficient threshold of 1.2 as the value of the cutoff argument, the cluster function groups all the objects in the sample data set into one cluster. In this case, none of the links in the cluster hierarchy had an inconsistency coefficient greater than 1.2. T = cluster(Z,"cutoff",1.2)
T = 1 1 1 1 1
The cluster function outputs a vector, T, that is the same size as the original data set. Each element in this vector contains the number of the cluster into which the corresponding object from the original data set was placed. If you lower the inconsistency coefficient threshold to 0.8, the cluster function divides the sample data set into three separate clusters. T = cluster(Z,"cutoff",0.8) T = 1 2 1 3 3
This output indicates that objects 1 and 3 are in one cluster, objects 4 and 5 are in another cluster, and object 2 is in its own cluster. When clusters are formed in this way, the cutoff value is applied to the inconsistency coefficient. These clusters may, but do not necessarily, correspond to a horizontal slice across the dendrogram at a certain height. If you want clusters corresponding to a horizontal slice of the dendrogram, you can either use the criterion option to specify that the cutoff should be based on distance rather than inconsistency, or you can specify the number of clusters directly as described in the following section. Specify Arbitrary Clusters Instead of letting the cluster function create clusters determined by the natural divisions in the data set, you can specify the number of clusters you want created. For example, you can specify that you want the cluster function to partition the sample data set into two clusters. In this case, the cluster function creates one cluster containing objects 1, 3, 4, and 5 and another cluster containing object 2. T = cluster(Z,"maxclust",2) T = 2 1 2 2 2
To help you visualize how the cluster function determines these clusters, the following figure shows the dendrogram of the hierarchical cluster tree. The horizontal dashed line intersects two lines of the dendrogram, corresponding to setting maxclust to 2. These two lines partition the objects into two
clusters: the objects below the left-hand line, namely 1, 3, 4, and 5, belong to one cluster, while the object below the right-hand line, namely 2, belongs to the other cluster.
On the other hand, if you set maxclust to 3, the cluster function groups objects 4 and 5 in one cluster, objects 1 and 3 in a second cluster, and object 2 in a third cluster. The following command illustrates this. T = cluster(Z,"maxclust",3) T = 1 3 1 2 2
This time, the cluster function cuts off the hierarchy at a lower point, corresponding to the horizontal line that intersects three lines of the dendrogram in the following figure.
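As noted earlier, you can also base the cutoff on distance (height) rather than on the inconsistency coefficient. The following sketch applies an illustrative distance cutoff of 1.5 to the same tree:
T = cluster(Z,"cutoff",1.5,"Criterion","distance")
For this tree, whose link heights are 1, 1, 2.0616, and 2.5, a distance cutoff of 1.5 should again produce three clusters: objects 4 and 5, objects 1 and 3, and object 2.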
See Also
More About
• “Choose Cluster Analysis Method” on page 17-2
DBSCAN
DBSCAN In this section... “Introduction to DBSCAN” on page 17-19 “Algorithm Description” on page 17-19 “Determine Values for DBSCAN Parameters” on page 17-20
Introduction to DBSCAN Density-Based Spatial Clustering of Applications with Noise (DBSCAN) identifies arbitrarily shaped clusters and noise (outliers) in data. The Statistics and Machine Learning Toolbox function dbscan performs clustering on an input data matrix or on pairwise distances between observations. dbscan returns the cluster indices and a vector indicating the observations that are core points (points inside clusters). Unlike k-means clustering, the DBSCAN algorithm does not require prior knowledge of the number of clusters, and clusters are not necessarily spheroidal. DBSCAN is also useful for density-based outlier detection, because it identifies points that do not belong to any cluster. For a point to be assigned to a cluster, it must satisfy the condition that its epsilon neighborhood (epsilon) contains at least a minimum number of neighbors (minpts). Or, the point can lie within the epsilon neighborhood of another point that satisfies the epsilon and minpts conditions. The DBSCAN algorithm identifies three kinds of points:
• Core point — A point in a cluster that has at least minpts neighbors in its epsilon neighborhood
• Border point — A point in a cluster that has fewer than minpts neighbors in its epsilon neighborhood
• Noise point — An outlier that does not belong to any cluster
DBSCAN works with a wide range of distance metrics, and you can define a custom distance metric for your particular application. The choice of a distance metric determines the shape of the neighborhood.
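For example, the metric is simply an argument of the dbscan call. This minimal sketch, using assumed toy data and arbitrary parameter values, clusters the same points with two different metrics:
rng("default")
X = randn(200,2);
labelsEuclidean = dbscan(X,0.5,5);                        % default Euclidean metric
labelsCityblock = dbscan(X,0.5,5,"Distance","cityblock"); % same call with the city block metric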
Algorithm Description For specified values of the epsilon neighborhood epsilon and the minimum number of neighbors minpts required for a core point, the dbscan function implements DBSCAN as follows:
1. From the input data set X, select the first unlabeled observation x1 as the current point, and initialize the first cluster label C to 1.
2. Find the set of points within the epsilon neighborhood epsilon of the current point. These points are the neighbors.
   a. If the number of neighbors is less than minpts, then label the current point as a noise point (or an outlier). Go to step 4.
      Note dbscan can reassign noise points to clusters if the noise points later satisfy the constraints set by epsilon and minpts from some other point in X. This process of reassigning points happens for border points of a cluster.
   b. Otherwise, label the current point as a core point belonging to cluster C.
3. Iterate over each neighbor (new current point) and repeat step 2 until no new neighbors are found that can be labeled as belonging to the current cluster C.
4. Select the next unlabeled point in X as the current point, and increase the cluster count by 1.
5. Repeat steps 2–4 until all points in X are labeled.
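A minimal call that exercises these steps on toy data (the data and parameter values are assumed for illustration) looks like this; noise points are returned with the label -1:
rng("default")
X = [randn(100,2); randn(100,2)+6; 10 -10];   % two groups plus one stray point
labels = dbscan(X,1,10);
tabulate(labels)                              % cluster sizes, including the -1 noise label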
Determine Values for DBSCAN Parameters This example shows how to select values for the epsilon and minpts parameters of dbscan. The data set is a Lidar scan, stored as a collection of 3-D points, that contains the coordinates of objects surrounding a vehicle. Load, preprocess, and visualize the data set. Load the x, y, z coordinates of the objects. load('lidar_subset.mat') X = lidar_subset;
To highlight the environment around the vehicle, set the region of interest to span 20 meters to the left and right of the vehicle, 20 meters in front and back of the vehicle, and the area above the surface of the road. xBound = 20; % in meters yBound = 20; % in meters zLowerBound = 0; % in meters
Crop the data to contain only points within the specified region.
indices = X(:,1) <= xBound & X(:,1) >= -xBound ...
    & X(:,2) <= yBound & X(:,2) >= -yBound ...
    & X(:,3) > zLowerBound;
X = X(indices,:);
Visualize the data as a 2-D scatter plot. Annotate the plot to highlight the vehicle. scatter(X(:,1),X(:,2),'.'); annotation('ellipse',[0.48 0.48 .1 .1],'Color','red')
The center of the set of points (circled in red) contains the roof and hood of the vehicle. All other points are obstacles. Select a value for minpts. To select a value for minpts, consider a value greater than or equal to one plus the number of dimensions of the input data [1]. For example, for an n-by-p matrix X, set the value of 'minpts' greater than or equal to p+1. For the given data set, specify a minpts value greater than or equal to 4, specifically the value 50. minpts = 50; % Minimum number of neighbors for a core point
Select a value for epsilon. One strategy for estimating a value for epsilon is to generate a k-distance graph for the input data X. For each point in X, find the distance to the kth nearest point, and plot sorted points against this distance. The graph contains a knee. The distance that corresponds to the knee is generally a good choice for epsilon, because it is the region where points start tailing off into outlier (noise) territory [1]. Before plotting the k-distance graph, first find the minpts smallest pairwise distances for observations in X, in ascending order. kD = pdist2(X,X,'euc','Smallest',minpts);
Plot the k-distance graph.
plot(sort(kD(end,:))); title('k-distance graph') xlabel('Points sorted with 50th nearest distances') ylabel('50th nearest distances') grid
The knee appears to be around 2; therefore, set the value of epsilon to 2. epsilon = 2;
Cluster using dbscan. Use dbscan with the values of minpts and epsilon that were determined in the previous steps. labels = dbscan(X,epsilon,minpts);
Visualize the clustering and annotate the figure to highlight specific clusters. numGroups = length(unique(labels)); gscatter(X(:,1),X(:,2),labels,hsv(numGroups)); title('epsilon = 2 and minpts = 50') grid annotation('ellipse',[0.54 0.41 .07 .07],'Color','red') annotation('ellipse',[0.53 0.85 .07 .07],'Color','blue') annotation('ellipse',[0.39 0.85 .07 .07],'Color','black')
dbscan identifies 11 clusters and a set of noise points. The algorithm also identifies the vehicle at the center of the set of points as a distinct cluster. dbscan identifies some distinct clusters, such as the cluster circled in black (and centered around (– 6,18)) and the cluster circled in blue (and centered around (2.5,18)). The function also assigns the group of points circled in red (and centered around (3,–4)) to the same cluster (group 7) as the group of points in the southeast quadrant of the plot. The expectation is that these groups should be in separate clusters. Use a smaller value for epsilon to split up large clusters and further partition the points. epsilon2 = 1; labels2 = dbscan(X,epsilon2,minpts);
Visualize the clustering and annotate the figure to highlight specific clusters. numGroups2 = length(unique(labels2)); gscatter(X(:,1),X(:,2),labels2,hsv(numGroups2)); title('epsilon = 1 and minpts = 50') grid annotation('ellipse',[0.54 0.41 .07 .07],'Color','red') annotation('ellipse',[0.53 0.85 .07 .07],'Color','blue') annotation('ellipse',[0.39 0.85 .07 .07],'Color','black')
By using a smaller epsilon value, dbscan is able to assign the group of points circled in red to a distinct cluster (group 13). However, some clusters that dbscan correctly identified before are now split between cluster points and outliers. For example, see cluster group 2 (circled in black) and cluster group 3 (circled in blue). The correct epsilon value is somewhere between 1 and 2. Use an epsilon value of 1.55 to cluster the data. epsilon3 = 1.55; labels3 = dbscan(X,epsilon3,minpts);
Visualize the clustering and annotate the figure to highlight specific clusters. numGroups3 = length(unique(labels3)); gscatter(X(:,1),X(:,2),labels3,hsv(numGroups3)); title('epsilon = 1.55 and minpts = 50') grid annotation('ellipse',[0.54 0.41 .07 .07],'Color','red') annotation('ellipse',[0.53 0.85 .07 .07],'Color','blue') annotation('ellipse',[0.39 0.85 .07 .07],'Color','black')
dbscan does a better job of identifying the clusters when epsilon is set to 1.55. For example, the function identifies the distinct clusters circled in red, black, and blue (with centers around (3,–4), (– 6,18), and (2.5,18), respectively).
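If you already have pairwise distances, you can pass them to dbscan directly instead of the raw coordinates. This hedged variation on the example above should reproduce the same labels because it uses the same Euclidean distances:
D = pdist2(X,X);                                          % pairwise Euclidean distances
labelsD = dbscan(D,epsilon3,minpts,"Distance","precomputed");
isequal(labelsD,labels3)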
References [1] Ester, M., H.-P. Kriegel, J. Sander, and X. Xiaowei. “A density-based algorithm for discovering clusters in large spatial databases with noise.” In Proceedings of the Second International Conference on Knowledge Discovery in Databases and Data Mining, 226-231. Portland, OR: AAAI Press, 1996.
See Also
More About
• “Choose Cluster Analysis Method” on page 17-2
Partition Data Using Spectral Clustering This topic provides an introduction to spectral clustering and an example that estimates the number of clusters and performs spectral clustering.
Introduction to Spectral Clustering Spectral clustering is a graph-based algorithm for partitioning data points, or observations, into k clusters. The Statistics and Machine Learning Toolbox function spectralcluster performs clustering on an input data matrix or on a similarity matrix on page 35-7807 of a similarity graph derived from the data. spectralcluster returns the cluster indices, a matrix containing k eigenvectors of the Laplacian matrix on page 35-7808, and a vector of eigenvalues corresponding to the eigenvectors. spectralcluster requires you to specify the number of clusters k. However, you can verify that your estimate for k is correct by using one of these methods: • Count the number of zero eigenvalues of the Laplacian matrix. The multiplicity of the zero eigenvalues is an indicator of the number of clusters in your data. • Find the number of connected components in your similarity matrix by using the MATLAB function conncomp.
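The following short sketch illustrates the second check on assumed toy data, with an assumed similarity threshold of 0.5: build a similarity matrix, drop weak connections, and count the connected components with conncomp:
rng("default")
X = [randn(20,2)*0.5+3; randn(20,2)*0.5; randn(20,2)*0.5-3];
S = exp(-squareform(pdist(X)).^2);   % Gaussian similarity matrix
S(S < 0.5) = 0;                      % keep only strong similarities
G = graph(S,"omitselfloops");
numComponents = max(conncomp(G))     % estimate of the number of clusters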
Algorithm Description Spectral clustering is a graph-based algorithm for finding k arbitrarily shaped clusters in data. The technique involves representing the data in a low dimension. In the low dimension, clusters in the data are more widely separated, enabling you to use algorithms such as k-means or k-medoids clustering. This low dimension is based on the eigenvectors corresponding to the k smallest eigenvalues of a Laplacian matrix. A Laplacian matrix is one way of representing a similarity graph that models the local neighborhood relationships between data points as an undirected graph. The spectral clustering algorithm derives a similarity matrix of a similarity graph from your data, finds the Laplacian matrix, and uses the Laplacian matrix to find k eigenvectors for splitting the similarity graph into k partitions. You can use spectral clustering when you know the number of clusters, but the algorithm also provides a way to estimate the number of clusters in your data. By default, the algorithm for spectralcluster computes the normalized random-walk Laplacian matrix using the method described by Shi-Malik [1]. spectralcluster also supports the unnormalized Laplacian matrix and the normalized symmetric Laplacian matrix which uses the Ng-Jordan-Weiss method [2]. The spectralcluster function implements clustering as follows:
1. For each data point in X, define a local neighborhood using either the radius search method or nearest neighbor method, as specified by the 'SimilarityGraph' name-value pair argument (see “Similarity Graph” on page 35-7807). Then, find the pairwise distances Disti,j for all points i and j in the neighborhood.
2. Convert the distances to similarity measures using the kernel transformation Si,j = exp(−(Disti,j/σ)²). The matrix S is the similarity matrix on page 35-7807, and σ is the scale factor for the kernel, as specified using the 'KernelScale' name-value pair argument.
3. Calculate the unnormalized Laplacian matrix on page 35-7808 L, the normalized random-walk Laplacian matrix Lrw, or the normalized symmetric Laplacian matrix Ls, depending on the value of the 'LaplacianNormalization' name-value pair argument.
4. Create a matrix V ∈ ℝn×k containing columns v1, …, vk, where the columns are the k eigenvectors that correspond to the k smallest eigenvalues of the Laplacian matrix. If using Ls, normalize each row of V to have unit length.
5. Treating each row of V as a point, cluster the n points using k-means clustering (default) or k-medoids clustering, as specified by the 'ClusterMethod' name-value pair argument.
6. Assign the original points in X to the same clusters as their corresponding rows in V.
Estimate Number of Clusters and Perform Spectral Clustering This example demonstrates two approaches for performing spectral clustering. • The first approach estimates the number of clusters using the eigenvalues of the Laplacian matrix and performs spectral clustering on the data set. • The second approach estimates the number of clusters using the similarity graph and performs spectral clustering on the similarity matrix. Generate Sample Data Randomly generate a sample data set with three well-separated clusters, each containing 20 points. rng('default'); % For reproducibility n = 20; X = [randn(n,2)*0.5+3; randn(n,2)*0.5; randn(n,2)*0.5-3];
Perform Spectral Clustering on Data Estimate the number of clusters in the data by using the eigenvalues of the Laplacian matrix, and perform spectral clustering on the data set. Compute the five smallest eigenvalues (in magnitude) of the Laplacian matrix by using the spectralcluster function. By default, the function uses the normalized random-walk Laplacian matrix. [~,V_temp,D_temp] = spectralcluster(X,5) V_temp = 60×5 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 ⋮
0.2236 0.2236 0.2236 0.2236 0.2236 0.2236 0.2236 0.2236 0.2236 0.2236
-0.0000 -0.0000 -0.0000 -0.0000 -0.0000 -0.0000 -0.0000 -0.0000 -0.0000 -0.0000
-0.1534 0.3093 -0.2225 -0.1776 -0.1331 -0.2176 -0.1967 0.0088 0.2844 0.3275
-0.0000 0.0000 0.0000 -0.0000 0.0000 -0.0000 0.0000 0.0000 0.0000 -0.0000
D_temp = 5×1
-0.0000 -0.0000 -0.0000 0.0876 0.1653
Only the first three eigenvalues are approximately zero. The number of zero eigenvalues is a good indicator of the number of connected components in a similarity graph and, therefore, is a good estimate of the number of clusters in your data. So, k=3 is a good estimate of the number of clusters in X. Perform spectral clustering on observations by using the spectralcluster function. Specify k=3 clusters. k = 3; [idx1,V,D] = spectralcluster(X,k) idx1 = 60×1 1 1 1 1 1 1 1 1 1 1
⋮
V = 60×3 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 ⋮
-0.2236 -0.2236 -0.2236 -0.2236 -0.2236 -0.2236 -0.2236 -0.2236 -0.2236 -0.2236
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
D = 3×1 10-16 × -0.1031 -0.1601 -0.3754
Elements of D correspond to the three smallest eigenvalues of the Laplacian matrix. The columns of V contain the eigenvectors corresponding to the eigenvalues in D. For well-separated clusters, the
eigenvectors are indicator vectors. The eigenvectors have values of zero (or close to zero) for points that do not belong to a particular cluster, and nonzero values for points that belong to a particular cluster. Visualize the result of clustering. gscatter(X(:,1),X(:,2),idx1);
The spectralcluster function correctly identifies the three clusters in the data set. Instead of using the spectralcluster function again, you can pass V_temp to the kmeans function to cluster the data points. idx2 = kmeans(V_temp(:,1:3),3); gscatter(X(:,1),X(:,2),idx2);
The order of cluster assignments in idx1 and idx2 is different even though the data points are clustered in the same way. Perform Spectral Clustering on Similarity Matrix Estimate the number of clusters using the similarity graph and perform spectral clustering on the similarity matrix. Find the distance between each pair of observations in X by using the pdist and squareform functions with the default Euclidean distance metric. dist_temp = pdist(X); dist = squareform(dist_temp);
Construct the similarity matrix from the pairwise distance and confirm that the similarity matrix is symmetric. S = exp(-dist.^2); issymmetric(S) ans = logical 1
Limit the similarity values to 0.5 so that the similarity graph connects only points whose pairwise distances are smaller than the search radius. 17-30
S_eps = S; S_eps(S_eps=threshold(1) & P(:,1) score_threshold2); figure t = tiledlayout(3,1); nexttile plot(s(ind1,:)') title(join(["Observations with average score < " score_threshold1])) nexttile plot(s(ind2,:)') title(join(["Observations with average score in [" ... score_threshold1 " " score_threshold2 "]"])) nexttile plot(s(ind3,:)') title(join(["Observations with average score > " score_threshold2])) xlabel(t,"Number of Observations for Each Tree") ylabel(t,"Anomaly Score")
The anomaly score decreases as the sample size increases for the observations whose average score values are less than 0.5. For the observations whose average score values are greater than 0.55, the anomaly score increases as the sample size increases and then the score converges roughly when the sample size reaches 50. Detect anomalies in training observations by using isolation forest models with the sample sizes 50 and 100. Specify the fraction of anomalies in the training observations as 0.05. [f1,tf1,scores1] = iforest(meas,NumObservationsPerLearner=50, ... ContaminationFraction=0.05); [f2,tf2,scores2] = iforest(meas,NumObservationsPerLearner=100, ... ContaminationFraction=0.05);
Display the observation indexes of the anomalies. find(tf1) ans = 7×1 14 42 110 118 119 123 132
find(tf2) ans = 7×1 14 15 16 110 118 119 132
The two isolation forest models have five anomalies in common. Visualize Anomalies For the isolation forest model with the sample size 50, visually compare observation values between normal points and anomalies. Create a matrix of grouped histograms and grouped scatter plots for each combination of variables by using the gplotmatrix function. tf1 = categorical(tf1,[0 1],["Normal Points" "Anomalies"]); predictorNames = ["Sepal Length" "Sepal Width" ... "Petal Length" "Petal Width"]; gplotmatrix(meas,[],tf1,"kr",".x",[],[],[],predictorNames)
For high-dimensional data, you can visualize data by using only the important features. You can also visualize data after reducing the dimension by using t-SNE (t-Distributed Stochastic Neighbor Embedding). Visualize observation values using the two most important features selected by the fsulaplacian function.
idx = fsulaplacian(meas)
idx = 1×4

     3     4     1     2
gscatter(meas(:,idx(1)),meas(:,idx(2)),tf1,"kr",".x",[],"on", ... predictorNames(idx(1)),predictorNames(idx(2)))
Visualize observation values after reducing the dimension by using the tsne function. Y = tsne(meas); gscatter(Y(:,1),Y(:,2),tf1,"kr",".x")
References [1] Liu, F. T., K. M. Ting, and Z. Zhou. "Isolation Forest," 2008 Eighth IEEE International Conference on Data Mining. Pisa, Italy, 2008, pp. 413-422.
See Also iforest | IsolationForest | isanomaly
Related Examples
• “Unsupervised Anomaly Detection” on page 17-91
Unsupervised Anomaly Detection
Unsupervised Anomaly Detection This topic introduces the unsupervised anomaly detection features for multivariate sample data available in Statistics and Machine Learning Toolbox, and describes the workflows of the features for outlier detection on page 17-91 (detecting anomalies in training data) and novelty detection on page 17-102 (detecting anomalies in new data with uncontaminated training data). For unlabeled multivariate sample data, you can detect anomalies by using isolation forest, robust random cut forest, local outlier factor, one-class support vector machine (SVM), and Mahalanobis distance. These methods detect outliers either by training a model or by learning parameters. For novelty detection, you train a model or learn parameters with uncontaminated training data (data with no outliers) and detect anomalies in new data by using the trained model or learned parameters. • Isolation forest — The “Isolation Forest” on page 35-4161 algorithm detects anomalies by isolating them from normal points using an ensemble of isolation trees. Detect outliers by using the iforest function, and detect novelties by using the object function isanomaly. • Random robust cut forest — The “Robust Random Cut Forest” on page 35-7572 algorithm classifies a point as a normal point or an anomaly based on the changes in model complexity introduced by the point. Similar to the isolation forest algorithm, the robust random cut forest algorithm builds an ensemble of trees. The two algorithms differ in how they choose a split variable in trees and how they define anomaly scores. Detect outliers by using the rrcforest function, and detect novelties by using the object function isanomaly. • Local outlier factor — The “Local Outlier Factor” on page 35-4732 (LOF) algorithm detects anomalies based on the relative density of an observation with respect to the surrounding neighborhood. Detect outliers by using the lof function, and detect novelties by using the object function isanomaly. • One-class support vector machine (SVM) — One-class SVM on page 35-5672, or unsupervised SVM, tries to separate data from the origin in the transformed high-dimensional predictor space. Detect outliers by using the ocsvm function, and detect novelties by using the object function isanomaly. • “Mahalanobis Distance” on page 35-7482 — If sample data follows a multivariate normal distribution, then the squared Mahalanobis distances from samples to the distribution follow a chisquare distribution. Therefore, you can use the distances to detect anomalies based on the critical values of the chi-square distribution. For outlier detection, use the robustcov function to compute robust Mahalanobis distances. For novelty detection, you can compute Mahalanobis distances by using the robustcov and pdist2 functions. To detect anomalies when performing incremental learning, see incrementalRobustRandomCutForest, incrementalOneClassSVM, and “Incremental Anomaly Detection Overview” on page 28-9.
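The novelty-detection workflow follows the same pattern for each of these detectors: fit a model on clean training data, and then score new observations with isanomaly. The following minimal sketch uses simulated data and an isolation forest only as an illustration:
rng("default")
Xtrain = randn(1000,4);                 % assumed uncontaminated training data
Xnew   = [randn(5,4); 6 6 6 6];         % new observations; the last one is an obvious outlier
forest = iforest(Xtrain);               % train the detector
[tfNew,scoresNew] = isanomaly(forest,Xnew);
find(tfNew)                             % indices of detected novelties in Xnew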
Outlier Detection This example illustrates the workflows of the five unsupervised anomaly detection methods (isolation forest, robust random cut forest, local outlier factor, one-class SVM, and Mahalanobis distance) for outlier detection. Load Data Load the humanactivity data set, which contains the variables feat and actid. The variable feat contains the predictor data matrix of 60 features for 24,075 observations, and the response variable
actid contains the activity IDs for the observations as integers. This example uses the feat variable for anomaly detection. load humanactivity
Find the size of the variable feat. [N,D] = size(feat) N = 24075 D = 60
Assume that the fraction of outliers in the data is 0.05. contaminationFraction = 0.05;
Isolation Forest Detect outliers by using the iforest function. Train an isolation forest model by using the iforest function. Specify the fraction of outliers (ContaminationFraction) as 0.05. rng("default") % For reproducibility [forest,tf_forest,s_forest] = iforest(feat, ... ContaminationFraction=contaminationFraction);
forest is an IsolationForest object. iforest also returns the anomaly indicators (tf_forest) and anomaly scores (s_forest) for the data (feat). iforest determines the score threshold value (forest.ScoreThreshold) so that the function detects the specified fraction of observations as outliers. Plot a histogram of the score values. Create a vertical line at the score threshold corresponding to the specified fraction. figure histogram(s_forest,Normalization="probability") xline(forest.ScoreThreshold,"k-", ... join(["Threshold =" forest.ScoreThreshold])) title("Histogram of Anomaly Scores for Isolation Forest")
Check the fraction of detected anomalies in the data. OF_forest = sum(tf_forest)/N OF_forest = 0.0496
The outlier fraction can be smaller than the specified fraction (0.05) when the scores have tied values at the threshold. Robust Random Cut Forest Detect outliers by using the rrcforest function. Train a robust random cut forest model by using the rrcforest function. Specify the fraction of outliers (ContaminationFraction) as 0.05, and specify StandardizeData as true to standardize the input data. rng("default") % For reproducibility [rforest,tf_rforest,s_rforest] = rrcforest(feat, ... ContaminationFraction=contaminationFraction,StandardizeData=true);
rforest is a RobustRandomCutForest object. rrcforest also returns the anomaly indicators (tf_rforest) and anomaly scores (s_rforest) for the data (feat). rrcforest determines the score threshold value (rforest.ScoreThreshold) so that the function detects the specified fraction of observations as outliers. Plot a histogram of the score values. Create a vertical line at the score threshold corresponding to the specified fraction. 17-93
figure histogram(s_rforest,Normalization="probability") xline(rforest.ScoreThreshold,"k-", ... join(["Threshold =" rforest.ScoreThreshold])) title("Histogram of Anomaly Scores for Robust Random Cut Forest")
Check the fraction of detected anomalies in the data. OF_rforest = sum(tf_rforest)/N OF_rforest = 0.0500
Local Outlier Factor Detect outliers by using the lof function. Train a local outlier factor model by using the lof function. Specify the fraction of outliers (ContaminationFraction) as 0.05, 500 nearest neighbors, and the Mahalanobis distance. [LOFObj,tf_lof,s_lof] = lof(feat, ... ContaminationFraction=contaminationFraction, ... NumNeighbors=500,Distance="mahalanobis");
LOFObj is a LocalOutlierFactor object. lof also returns the anomaly indicators (tf_lof) and anomaly scores (s_lof) for the data (feat). lof determines the score threshold value (LOFObj.ScoreThreshold) so that the function detects the specified fraction of observations as outliers. 17-94
Plot a histogram of the score values. Create a vertical line at the score threshold corresponding to the specified fraction. figure histogram(s_lof,Normalization="probability") xline(LOFObj.ScoreThreshold,"k-", ... join(["Threshold =" LOFObj.ScoreThreshold])) title("Histogram of Anomaly Scores for Local Outlier Factor")
Check the fraction of detected anomalies in the data. OF_lof = sum(tf_lof)/N OF_lof = 0.0500
One-Class SVM Detect outliers by using the ocsvm function. Train a one-class SVM model by using the ocsvm function. Specify the fraction of outliers (ContaminationFraction) as 0.05. In addition, set KernelScale to "auto" to let the function select an appropriate kernel scale parameter using a heuristic procedure, and specify StandardizeData as true to standardize the input data. [Mdl,tf_OCSVM,s_OCSVM] = ocsvm(feat, ... ContaminationFraction=contaminationFraction, ... KernelScale="auto",StandardizeData=true);
Mdl is a OneClassSVM object. ocsvm also returns the anomaly indicators (tf_OCSVM) and anomaly scores (s_OCSVM) for the data (feat). ocsvm determines the score threshold value (Mdl.ScoreThreshold) so that the function detects the specified fraction of observations as outliers. Plot a histogram of the score values. Create a vertical line at the score threshold corresponding to the specified fraction. figure histogram(s_OCSVM,Normalization="probability") xline(Mdl.ScoreThreshold,"k-", ... join(["Threshold =" Mdl.ScoreThreshold])) title("Histogram of Anomaly Scores for One-Class SVM")
Check the fraction of detected anomalies in the data. OF_OCSVM = sum(tf_OCSVM)/N OF_OCSVM = 0.0500
Mahalanobis Distance Use the robustcov function to compute robust Mahalanobis distances and robust estimates for the mean and covariance of the data. Compute the Mahalanobis distance from feat to the distribution of feat by using the robustcov function. Specify the fraction of outliers (OutlierFraction) as 0.05. robustcov minimizes the covariance determinant over 95% of the observations. 17-96
[sigma,mu,s_robustcov,tf_robustcov_default] = robustcov(feat, ... OutlierFraction=contaminationFraction);
robustcov finds the robust covariance matrix estimate (sigma) and robust mean estimate (mu), which are less sensitive to outliers than the estimates from the cov and mean functions. The robustcov function also computes the Mahalanobis distances (s_robustcov) and the outlier indicators (tf_robustcov_default). By default, the function assumes that the data set follows a multivariate normal distribution, and identifies 2.5% of input observations as outliers based on the critical values of the chi-square distribution. If the data set satisfies the normality assumption, then the squared Mahalanobis distance follows a chi-square distribution with D degrees of freedom, where D is the dimension of the data. In that case, you can find a new threshold by using the chi2inv function to detect the specified fraction of observations as outliers. s_robustcov_threshold = sqrt(chi2inv(1-contaminationFraction,D)); tf_robustcov = s_robustcov > s_robustcov_threshold;
Create a distance-distance plot (DD plot) to check the multivariate normality of the data. figure d_classical = pdist2(feat,mean(feat),"mahalanobis"); gscatter(d_classical,s_robustcov,tf_robustcov,"kr",".x") xline(s_robustcov_threshold,"k-") yline(s_robustcov_threshold,"k-", ... join(["Threshold = " s_robustcov_threshold])); l = refline([1 0]); l.Color = "k"; xlabel("Mahalanobis Distance") ylabel("Robust Distance") legend("Normal Points","Outliers",Location="northwest") title("Distance-Distance Plot")
Zoom in the axes to see the normal points. xlim([0 10]) ylim([0 10])
If a data set follows a multivariate normal distribution, then data points cluster tightly around the 45-degree reference line. The DD plot indicates that the data set does not follow a multivariate normal distribution. Because the data set does not satisfy the normality assumption, use the quantile of the distance values for the cumulative probability (1 - contaminationFraction) to find a threshold. s_robustcov_threshold = quantile(s_robustcov,1-contaminationFraction);
Obtain the anomaly indicators for feat using the new threshold s_robustcov_threshold. tf_robustcov = s_robustcov > s_robustcov_threshold;
Check the fraction of detected anomalies in the data. OF_robustcov = sum(tf_robustcov)/N OF_robustcov = 0.0500
Compare Detected Outliers To visualize the detected outliers, reduce the data dimension by using the tsne function. rng("default") % For reproducibility T = tsne(feat,Standardize=true,Perplexity=100,Exaggeration=20);
Plot the normal points and outliers in the reduced dimension. Compare the results of the five methods: the isolation forest algorithm, robust random cut forest algorithm, local outlier factor algorithm, one-class SVM model, and robust Mahalanobis distance from robustcov. figure tiledlayout(2,3) nexttile gscatter(T(:,1),T(:,2),tf_forest,"kr",[],[],"off") title("Isolation Forest") nexttile gscatter(T(:,1),T(:,2),tf_rforest,"kr",[],[],"off") title("Robust Random Cut Forest") nexttile(4) gscatter(T(:,1),T(:,2),tf_lof,"kr",[],[],"off") title("Local Outlier Factor") nexttile(5) gscatter(T(:,1),T(:,2),tf_OCSVM,"kr",[],[],"off") title("One-Class SVM") nexttile(6) gscatter(T(:,1),T(:,2),tf_robustcov,"kr",[],[],"off") title("Robust Mahalanobis Distance") l = legend("Normal Points","Outliers"); l.Layout.Tile = 3;
The outliers identified by the five methods are located near each other in the reduced dimension.
You can also visualize observation values using the two most important features selected by the fsulaplacian function. idx = fsulaplacian(feat); figure t = tiledlayout(2,3); nexttile gscatter(feat(:,idx(1)),feat(:,idx(2)),tf_forest,"kr",[],[],"off") title("Isolation Forest") nexttile gscatter(feat(:,idx(1)),feat(:,idx(2)),tf_rforest,"kr",[],[],"off") title("Robust Random Cut Forest") nexttile(4) gscatter(feat(:,idx(1)),feat(:,idx(2)),tf_lof,"kr",[],[],"off") title("Local Outlier Factor") nexttile(5) gscatter(feat(:,idx(1)),feat(:,idx(2)),tf_OCSVM,"kr",[],[],"off") title("One-Class SVM") nexttile(6) gscatter(feat(:,idx(1)),feat(:,idx(2)),tf_robustcov,"kr",[],[],"off") title("Mahalanobis Distance") l = legend("Normal Points","Outliers"); l.Layout.Tile = 3; xlabel(t,join(["Column" idx(1)])) ylabel(t,join(["Column" idx(2)]))
Novelty Detection This example illustrates the workflows of the five unsupervised anomaly detection methods (isolation forest, robust random cut forest, local outlier factor, one-class SVM, and Mahalanobis distance) for novelty detection. Load Data Load the humanactivity data set, which contains the variables feat and actid. The variable feat contains the predictor data matrix of 60 features for 24,075 observations, and the response variable actid contains the activity IDs for the observations as integers. This example uses the feat variable for anomaly detection. load humanactivity
Partition the data into training and test sets by using the cvpartition function. Use 50% of the observations as training data and 50% of the observations as test data for novelty detection. rng("default") % For reproducibility c = cvpartition(actid,Holdout=0.50); trainingIndices = training(c); % Indices for the training set testIndices = test(c); % Indices for the test set XTrain = feat(trainingIndices,:); XTest = feat(testIndices,:);
Assume that the training data is not contaminated (no outliers). 17-102
Find the size of the training and test sets. [N,D] = size(XTrain) N = 12038 D = 60 NTest = size(XTest,1) NTest = 12037
Isolation Forest Detect novelties using the object function isanomaly after training an isolation forest model by using the iforest function. Train an isolation forest model. [forest,tf_forest,s_forest] = iforest(XTrain);
forest is an IsolationForest object. iforest also returns the anomaly indicators (tf_forest) and anomaly scores (s_forest) for the training data (XTrain). By default, iforest treats all training observations as normal observations, and sets the score threshold (forest.ScoreThreshold) to the maximum score value. Use the trained isolation forest model and the object function isanomaly to find novelties in XTest. The isanomaly function identifies observations with scores above the threshold (forest.ScoreThreshold) as novelties. [tfTest_forest,sTest_forest] = isanomaly(forest,XTest);
The isanomaly function returns the anomaly indicators (tfTest_forest) and anomaly scores (sTest_forest) for the test data. Plot histograms of the score values. Create a vertical line at the score threshold. figure histogram(s_forest,Normalization="probability") hold on histogram(sTest_forest,Normalization="probability") xline(forest.ScoreThreshold,"k-", ... join(["Threshold =" forest.ScoreThreshold])) legend("Training data","Test data",Location="southeast") title("Histograms of Anomaly Scores for Isolation Forest") hold off
The anomaly score distribution of the test data is similar to that of the training data, so isanomaly detects a small number of anomalies in the test data. Check the fraction of detected anomalies in the test data. NF_forest = sum(tfTest_forest)/NTest NF_forest = 8.3077e-05
Display the observation index of the anomalies in the test data. idx_forest = find(tfTest_forest) idx_forest = 3422
Robust Random Cut Forest Detect novelties using the object function isanomaly after training a robust random cut forest model by using the rrcforest function. Train a robust random cut forest model. Specify StandardizeData as true to standardize the input data. [rforest,tf_rforest,s_rforest] = rrcforest(XTrain,StandardizeData=true);
rforest is a RobustRandomCutForest object. rrcforest also returns the anomaly indicators (tf_rforest) and anomaly scores (s_rforest) for the training data (XTrain). By default,
rrcforest treats all training observations as normal observations, and sets the score threshold (rforest.ScoreThreshold) to the maximum score value. Use the trained robust random cut forest model and the object function isanomaly to find novelties in XTest. The isanomaly function identifies observations with scores above the threshold (rforest.ScoreThreshold) as novelties. [tfTest_rforest,sTest_rforest] = isanomaly(rforest,XTest);
The isanomaly function returns the anomaly indicators (tfTest_rforest) and anomaly scores (sTest_rforest) for the test data. Plot histograms of the score values. Create a vertical line at the score threshold. figure histogram(s_rforest,Normalization="probability") hold on histogram(sTest_rforest,Normalization="probability") xline(rforest.ScoreThreshold,"k-", ... join(["Threshold =" rforest.ScoreThreshold])) legend("Training data","Test data",Location="southeast") title("Histograms of Anomaly Scores for Robust Random Cut Forest") hold off
The anomaly score distribution of the test data is similar to that of the training data, so isanomaly detects a small number of anomalies in the test data. Check the fraction of detected anomalies in the test data. 17-105
NF_rforest = sum(tfTest_rforest)/NTest NF_rforest = 0
The anomaly score distribution of the test data is similar to that of the training data, so isanomaly does not detect any anomalies in the test data. Local Outlier Factor Detect novelties using the object function isanomaly after training a local outlier factor model by using the lof function. Train a local outlier factor model. [LOFObj,tf_lof,s_lof] = lof(XTrain);
LOFObj is a LocalOutlierFactor object. lof returns the anomaly indicators (tf_lof) and anomaly scores (s_lof) for the training data (XTrain). By default, lof treats all training observations as normal observations, and sets the score threshold (LOFObj.ScoreThreshold) to the maximum score value. Use the trained local outlier factor model and the object function isanomaly to find novelties in XTest. The isanomaly function identifies observations with scores above the threshold (LOFObj.ScoreThreshold) as novelties. [tfTest_lof,sTest_lof] = isanomaly(LOFObj,XTest);
The isanomaly function returns the anomaly indicators (tfTest_lof) and anomaly scores (sTest_lof) for the test data. Plot histograms of the score values. Create a vertical line at the score threshold. figure histogram(s_lof,Normalization="probability") hold on histogram(sTest_lof,Normalization="probability") xline(LOFObj.ScoreThreshold,"k-", ... join(["Threshold =" LOFObj.ScoreThreshold])) legend("Training data","Test data",Location="southeast") title("Histograms of Anomaly Scores for Local Outlier Factor") hold off
The anomaly score distribution of the test data is similar to that of the training data, so isanomaly detects a small number of anomalies in the test data. Check the fraction of detected anomalies in the test data. NF_lof = sum(tfTest_lof)/NTest NF_lof = 8.3077e-05
Display the observation index of the anomalies in the test data. idx_lof = find(tfTest_lof) idx_lof = 8704
One-Class SVM Detect novelties using the object function isanomaly after training a one-class SVM model by using the ocsvm function. Train a one-class SVM model. Set KernelScale to "auto" to let the function select an appropriate kernel scale parameter using a heuristic procedure, and specify StandardizeData as true to standardize the input data. [Mdl,tf_OCSVM,s_OCSVM] = ocsvm(XTrain, ... KernelScale="auto",StandardizeData=true);
Mdl is a OneClassSVM object. ocsvm returns the anomaly indicators (tf_OCSVM) and anomaly scores (s_OCSVM) for the training data (XTrain). By default, ocsvm treats all training observations as
normal observations, and sets the score threshold (Mdl.ScoreThreshold) to the maximum score value. Use the trained one-class SVM model and the object function isanomaly to find novelties in the test data (XTest). The isanomaly function identifies observations with scores above the threshold (Mdl.ScoreThreshold) as novelties. [tfTest_OCSVM,sTest_OCSVM] = isanomaly(Mdl,XTest);
The isanomaly function returns the anomaly indicators (tfTest_OCSVM) and anomaly scores (sTest_OCSVM) for the test data. Plot histograms of the score values. Create a vertical line at the score threshold. figure histogram(s_OCSVM,Normalization="probability") hold on histogram(sTest_OCSVM,Normalization="probability") xline(Mdl.ScoreThreshold,"k-", ... join(["Threshold =" Mdl.ScoreThreshold])) legend("Training data","Test data",Location="southeast") title("Histograms of Anomaly Scores for One-Class SVM") hold off
Check the fraction of detected anomalies in the test data. NF_OCSVM = sum(tfTest_OCSVM)/NTest
NF_OCSVM = 1.6615e-04
Display the observation index of the anomalies in the test data. idx_OCSVM = find(tfTest_OCSVM) idx_OCSVM = 2×1 3560 8316
Mahalanobis Distance Use the robustcov function to compute Mahalanobis distances of training data, and use the pdist2 function to compute Mahalanobis distances of test data. Compute the Mahalanobis distance from XTrain to the distribution of XTrain by using the robustcov function. Specify the fraction of outliers (OutlierFraction) as 0. [sigma,mu,s_mahal] = robustcov(XTrain,OutlierFraction=0);
robustcov also returns the estimates of covariance matrix (sigma) and mean (mu), which you can use to compute distances of test data. Use the maximum value of s_mahal as the score threshold for novelty detection. s_mahal_threshold = max(s_mahal);
Compute the Mahalanobis distance from XTest to the distribution of XTrain by using the pdist2 function. sTest_mahal = pdist2(XTest,mu,"mahalanobis",sigma);
Obtain the anomaly indicators for XTest. tfTest_mahal = sTest_mahal > s_mahal_threshold;
Plot histograms of the score values. figure histogram(s_mahal,Normalization="probability"); hold on histogram(sTest_mahal,Normalization="probability"); xline(s_mahal_threshold,"k-", ... join(["Threshold =" s_mahal_threshold])) legend("Training data","Test Data",Location="southeast") title("Histograms of Mahalanobis Distances") hold off
Check the fraction of detected anomalies in the test data. NF_mahal = sum(tfTest_mahal)/NTest NF_mahal = 8.3077e-05
Display the observation index of the anomalies in the test data. idx_mahal = find(tfTest_mahal) idx_mahal = 3654
See Also iforest | isanomaly (IsolationForest) | rrcforest | isanomaly (RobustRandomCutForest) | lof | isanomaly (LocalOutlierFactor) | ocsvm | isanomaly (OneClassSVM) | robustcov | pdist2
Related Examples
• “Anomaly Detection with Isolation Forest” on page 17-81
• “Model-Specific Anomaly Detection” on page 17-111
• “Anomaly Detection in Industrial Machinery Using Three-Axis Vibration Data” (Predictive Maintenance Toolbox)
Model-Specific Anomaly Detection Statistics and Machine Learning Toolbox provides model-specific anomaly detection features that you can apply after training a classification, regression, or clustering model. For example, you can detect anomalies by using these object functions: • Proximity matrix — outlierMeasure for random forest (CompactTreeBagger) • Mahalanobis distance — mahal for discriminant analysis classifier (ClassificationDiscriminant) and mahal for Gaussian mixture model (gmdistribution) • Unconditional probability density — logp for discriminant analysis classifier (ClassificationDiscriminant), logp for naive Bayes classifier (ClassificationNaiveBayes), and logp for naive Bayes classifier for incremental learning (incrementalClassificationNaiveBayes) For details, see the function reference pages.
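For example, the following minimal sketch (not one of the examples below) flags anomalies using the Mahalanobis distance to a fitted Gaussian mixture model. Here X is a hypothetical numeric predictor matrix, and the two-component mixture and 95th-percentile cutoff are arbitrary illustrative choices.
gm = fitgmdist(X,2,RegularizationValue=0.01); % Fit a two-component Gaussian mixture model
d2 = mahal(gm,X);                             % Squared Mahalanobis distance to each component
s = min(d2,[],2);                             % Distance to the nearest component
tf = isoutlier(s,Percentiles=[0 95]);         % Flag roughly the top 5% of distances as anomalies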
Detect Outliers After Training Random Forest Train a random forest classifier by using the TreeBagger function, and detect outliers in the training data by using the object function outlierMeasure. Train Random Forest Classifier Load the ionosphere data set, which contains radar return qualities (Y) and predictor data (X) for 34 variables. Radar returns are either of good quality ('g') or bad quality ('b'). load ionosphere
Train a random forest classifier. Store the out-of-bag information for predictor importance estimation. rng("default") % For reproducibility Mdl_TB = TreeBagger(100,X,Y,Method="classification", ... OOBPredictorImportance="on");
Mdl_TB is a TreeBagger model object for classification. TreeBagger stores predictor importance estimates in the property OOBPermutedPredictorDeltaError. Detect Outliers Using Proximity Detect outliers in the training data by using the outlierMeasure function. The function computes outlier measures based on the average squared proximity between one observation and the other observations in the trained random forest. CMdl_TB = compact(Mdl_TB); s_proximity = outlierMeasure(CMdl_TB,X,Labels=Y);
A high value of the outlier measure indicates that the observation is an outlier. Find the threshold corresponding to the 95th percentile and identify outliers by using the isoutlier function. [TF,~,U] = isoutlier(s_proximity,Percentiles=[0 95]);
Plot a histogram of the outlier measures. Create a vertical line at the outlier threshold. 17-111
histogram(s_proximity) xline(U,"r-",join(["Threshold =" U])) title("Histogram of Outlier Measures")
Visualize observation values using the two most important features selected by the predictor importance estimates in the property OOBPermutedPredictorDeltaError. [~,idx] = sort(Mdl_TB.OOBPermutedPredictorDeltaError,'descend'); TF_c = categorical(TF,[0 1],["Normal Points" "Anomalies"]); gscatter(X(:,idx(1)),X(:,idx(2)),TF_c,"kr",".x",[],"on", ... Mdl_TB.PredictorNames(idx(1)),Mdl_TB.PredictorNames(idx(2)))
Train the classifier again without outliers, compact the retrained model, and plot the histogram of the outlier measures. Mdl_TB = TreeBagger(100,X(~TF,:),Y(~TF),Method="classification"); CMdl_TB = compact(Mdl_TB); s_proximity = outlierMeasure(CMdl_TB,X(~TF,:),Labels=Y(~TF)); histogram(s_proximity) title("Histogram of Outlier Measures After Removing Outliers")
Detect Outliers After Training Discriminant Analysis Classifier Train a discriminant analysis model by using the fitcdiscr function, and detect outliers in the training data by using the object functions logp and mahal. Train Discriminant Analysis Model Load Fisher's iris data set. The matrix meas contains flower measurements for 150 different flowers. The variable species lists the species for each flower. load fisheriris
Train a discriminant analysis model using the entire data set. Mdl = fitcdiscr(meas,species,PredictorNames= ... ["Sepal Length" "Sepal Width" "Petal Length" "Petal Width"]);
Mdl is a ClassificationDiscriminant model. Detect Outliers Using Log Unconditional Probability Density Compute the log unconditional probability densities of the training data. s_logp = logp(Mdl,meas);
A low density value indicates that the corresponding observation is an outlier. 17-114
Determine the lower density threshold for outliers by using the isoutlier function. [~,L_logp] = isoutlier(s_logp);
Identify outliers by using the threshold. TF_logp = s_logp < L_logp;
Plot a histogram of the density values. Create a vertical line at the outlier threshold. figure histogram(s_logp) xline(L_logp,"r-",join(["Threshold =" L_logp])) title("Histogram of Log Unconditional Probability Densities ")
To compare observation values between normal points and anomalies, create a matrix of grouped histograms and grouped scatter plots for each combination of variables by using the gplotmatrix function. TF_logp_c = categorical(TF_logp,[0 1],["Normal Points" "Anomalies"]); gplotmatrix(meas,[],TF_logp_c,"kr",".x",[],[],[],Mdl.PredictorNames)
Detect Outliers Using Mahalanobis Distance Find the squared Mahalanobis distances from the training data to the class means of true labels. s_mahal = mahal(Mdl,meas,ClassLabels=species);
A large distance value indicates that the corresponding observation is an outlier. Determine the threshold corresponding to the 95th percentile and identify outliers by using the isoutlier function. [TF_mahal,~,U_mahal] = isoutlier(s_mahal,Percentiles=[0 95]);
Plot a histogram of the distances. Create a vertical line at the outlier threshold. figure histogram(s_mahal) xline(U_mahal,"-r",join(["Threshold =" U_mahal])) title("Histogram of Mahalanobis Distances")
Compare the observation values between normal points and anomalies by using the gplotmatrix function. TF_mahal_c = categorical(TF_mahal,[0 1],["Normal Points" "Anomalies"]); gplotmatrix(meas,[],TF_mahal_c,"kr",".x",[],[],[],Mdl.PredictorNames)
See Also outlierMeasure | mahal (ClassificationDiscriminant) | mahal (gmdistribution) | logp (ClassificationDiscriminant) | logp (ClassificationNaiveBayes) | logp (incrementalClassificationNaiveBayes) | isoutlier
Related Examples •
“Unsupervised Anomaly Detection” on page 17-91
18 Parametric Classification
• “Parametric Classification” on page 18-2
• “ROC Curve and Performance Metrics” on page 18-3
• “Performance Curves by perfcurve” on page 18-19
• “Classification” on page 18-24
Parametric Classification Models of data with a categorical response are called classifiers. A classifier is built from training data, for which classifications are known. The classifier assigns new test data to one of the categorical levels of the response. Parametric methods, like “Discriminant Analysis Classification” on page 21-2, fit a parametric model to the training data and interpolate to classify test data. Nonparametric methods, like classification and regression trees, use other means to determine classifications.
See Also fitcdiscr | fitcnb
Related Examples
• “Discriminant Analysis Classification” on page 21-2
• “Naive Bayes Classification” on page 22-2
ROC Curve and Performance Metrics In this section... “Introduction to ROC Curve” on page 18-3 “Performance Curve with MATLAB” on page 18-4 “ROC Curve for Multiclass Classification” on page 18-9 “Performance Metrics” on page 18-11 “Classification Scores and Thresholds” on page 18-13 “Pointwise Confidence Intervals” on page 18-17 This topic describes the performance metrics for classification, including the receiver operating characteristic (ROC) curve and the area under a ROC curve (AUC), and introduces the Statistics and Machine Learning Toolbox object rocmetrics, which you can use to compute performance metrics for binary and multiclass classification problems.
Introduction to ROC Curve After training a classification model, such as ClassificationNaiveBayes or ClassificationEnsemble, you can examine the performance of the algorithm on a specific test data set. A common approach is to compute a gross measure of performance, such as quadratic loss or accuracy, averaged over the entire test data set. You can inspect the classifier performance more closely by plotting a ROC curve and computing performance metrics. For example, you can find the threshold that maximizes the classification accuracy, or assess how the classifier performs in the regions of high sensitivity and high specificity. Receiver Operating Characteristic (ROC) Curve A ROC curve shows the true positive rate (TPR, or sensitivity) versus the false positive rate (FPR, or 1-specificity) for different thresholds of classification scores. Each point on a ROC curve corresponds to a pair of TPR and FPR values for a specific threshold value. You can find different pairs of TPR and FPR values by varying the threshold value, and then create a ROC curve using the pairs. For a multiclass classification problem, you can use the one-versus-all on page 18-9 coding design and find a ROC curve for each class. The one-versus-all coding design treats a multiclass classification problem as a set of binary classification problems, and assumes one class as positive and the rest as negative in each binary problem. A binary classifier typically classifies an observation into a class that yields a larger score, which corresponds to a positive adjusted score on page 18-14 for a one-versus-all binary classification problem. That is, a classifier typically uses 0 as a threshold and determines whether an observation is positive or negative. For example, if an adjusted score for an observation is 0.2, then the classifier with a threshold value of 0 assigns the observation to the positive class. You can find a pair of TPR and FPR values by applying the threshold value to all observations, and use the pair as a single point on a ROC curve. Now, assume you use a new threshold value of 0.25. Then, the classifier with a threshold value of 0.25 assigns the observation with an adjusted score of 0.2 to the negative class. By applying the new threshold to all observations, you can find a new pair of TPR and FPR values and have a new point on the a ROC curve. By repeating this process for various threshold values, you find pairs of TPR and FPR values and create a ROC curve using the pairs. 18-3
Area Under ROC Curve (AUC) The area under a ROC curve (AUC) corresponds to the integral of a ROC curve (TPR values) with respect to FPR from FPR = 0 to FPR = 1. The AUC provides an aggregate performance measure across all possible thresholds. The AUC values are in the range 0 to 1, and larger AUC values indicate better classifier performance. • A perfect classifier always correctly assigns positive class observations to the positive class and has a true positive rate of 1 for any threshold values. Therefore, the line passing through [0,0], [0,1], and [1,1] represents the perfect classifier, and the AUC value is 1. • A random classifier returns random score values and has the same values for the false positive rate and true positive rate for any threshold values. Therefore, the ROC curve for the random classifier lies on the diagonal line, and the AUC value is 0.5.
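The following minimal sketch shows how these quantities arise, computing TPR and FPR over all thresholds and approximating the AUC by trapezoidal integration. The variables labels (a logical column vector, true for the positive class) and scores (a numeric column vector of classification scores for the positive class) are hypothetical inputs; the rocmetrics object described next performs these computations for you.
thresholds = sort(unique(scores),"descend");  % Candidate thresholds
P = sum(labels);  N = sum(~labels);           % Positive and negative counts
TPR = zeros(numel(thresholds),1);
FPR = zeros(numel(thresholds),1);
for i = 1:numel(thresholds)
    predicted = scores >= thresholds(i);      % Classify at this threshold
    TPR(i) = sum(predicted & labels)/P;       % True positive rate
    FPR(i) = sum(predicted & ~labels)/N;      % False positive rate
end
AUC = trapz([0; FPR; 1],[0; TPR; 1]);         % Area under the ROC curve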
Performance Curve with MATLAB You can compute a ROC curve and other performance curves by creating a rocmetrics object. The rocmetrics object supports both binary and multiclass classification problems and provides the following object functions: • plot — Plot ROC or other classifier performance curves. plot returns a ROCCurve graphics object for each curve. You can modify the properties of the objects to control the appearance of each curve. For details, see ROCCurve Properties. • average — Compute performance metrics for an average ROC curve for multiclass problems. • addMetrics — Compute additional classification performance metrics. You can also compute the confidence intervals of performance curves by providing cross-validated inputs or by bootstrapping the input data. After training a classifier, use a performance curve to evaluate the classifier performance on test data. Various measures such as mean squared error, classification error, or exponential loss can summarize the predictive power of a classifier in a single number. However, a performance curve offers more information because it lets you explore the classifier performance across a range of thresholds on the classification scores. Plot ROC Curve for Binary Classifier Compute the performance metrics (FPR and TPR) for a binary classification problem by creating a rocmetrics object, and plot a ROC curve by using plot function. Load the ionosphere data set. This data set has 34 predictors (X) and 351 binary responses (Y) for radar returns, either bad ('b') or good ('g'). load ionosphere
Partition the data into training and test sets. Use approximately 80% of the observations to train a support vector machine (SVM) model, and 20% of the observations to test the performance of the trained model on new data. Partition the data using cvpartition. rng("default") % For reproducibility of the partition c = cvpartition(Y,Holdout=0.20);
18-4
trainingIndices = training(c); % Indices for the training set testIndices = test(c); % Indices for the test set XTrain = X(trainingIndices,:); YTrain = Y(trainingIndices); XTest = X(testIndices,:); YTest = Y(testIndices);
Train an SVM classification model. Mdl = fitcsvm(XTrain,YTrain);
Compute the classification scores for the test set. [~,Scores] = predict(Mdl,XTest); size(Scores)
ans = 1×2
    70     2
The output Scores is a matrix of size 70-by-2. The column order of Scores follows the class order in Mdl. Display the class order stored in Mdl.ClassNames. Mdl.ClassNames ans = 2x1 cell {'b'} {'g'}
Create a rocmetrics object by using the true labels in YTest and the classification scores in Scores. Specify the column order of Scores using Mdl.ClassNames. rocObj = rocmetrics(YTest,Scores,Mdl.ClassNames);
rocObj is a rocmetrics object that stores the AUC values and performance metrics for each class in the AUC and Metrics properties. Display the AUC property. rocObj.AUC
ans = 1×2
    0.8587    0.8587
For a binary classification problem, the AUC values are equal to each other. The table in Metrics contains the performance metric values for both classes, vertically concatenated according to the class order. Find the rows for the first class in the table, and display the first eight rows. idx = strcmp(rocObj.Metrics.ClassName,Mdl.ClassNames(1)); head(rocObj.Metrics(idx,:))
    ClassName    Threshold    FalsePositiveRate    TruePositiveRate
    _________    _________    _________________    ________________
     {'b'}        15.544              0                      0
     {'b'}        15.544              0                   0.04
     {'b'}        15.104              0                   0.08
     {'b'}        11.424              0                   0.16
     {'b'}        10.078              0                    0.2
     {'b'}        9.9721              0                   0.24
     {'b'}        9.9401              0                   0.28
     {'b'}        9.0326              0                   0.32
Plot the ROC curve for each class by using the plot function. plot(rocObj)
For each class, the plot function plots a ROC curve and displays a filled circle marker at the model operating point. The legend displays the class name and AUC value for each curve. Note that you do not need to examine ROC curves for both classes in a binary classification problem. The two ROC curves are symmetric, and the AUC values are identical. A TPR of one class is a true negative rate (TNR) of the other class, and TNR is 1-FPR. Therefore, a plot of TPR versus FPR for one class is the same as a plot of 1-FPR versus 1-TPR for the other class. Plot the ROC curve for the first class only by specifying the ClassNames name-value argument. plot(rocObj,ClassNames=Mdl.ClassNames(1))
18-6
Plot ROC Curves for Multiclass Classifier Compute the performance metrics (FPR and TPR) for a multiclass classification problem by creating a rocmetrics object, and plot a ROC curve for each class by using the plot function. Specify the AverageROCType name-value argument of plot to create the average ROC curve for the multiclass problem. Load the fisheriris data set. The matrix meas contains flower measurements for 150 different flowers. The vector species lists the species for each flower. species contains three distinct flower names. load fisheriris
Train a classification tree that classifies observations into one of the three labels. Cross-validate the model using 10-fold cross-validation. rng("default") % For reproducibility Mdl = fitctree(meas,species,Crossval="on");
Compute the classification scores for validation-fold observations. [~,Scores] = kfoldPredict(Mdl); size(Scores)
ans = 1×2
   150     3
The output Scores is a matrix of size 150-by-3. The column order of Scores follows the class order in Mdl. Display the class order stored in Mdl.ClassNames. Mdl.ClassNames ans = 3x1 cell {'setosa' } {'versicolor'} {'virginica' }
Create a rocmetrics object by using the true labels in species and the classification scores in Scores. Specify the column order of Scores using Mdl.ClassNames. rocObj = rocmetrics(species,Scores,Mdl.ClassNames);
rocObj is a rocmetrics object that stores the AUC values and performance metrics for each class in the AUC and Metrics properties. Display the AUC property. rocObj.AUC
ans = 1×3
    1.0000    0.9636    0.9636
The table in Metrics contains the performance metric values for all three classes, vertically concatenated according to the class order. Find and display the rows for the second class in the table. idx = strcmp(rocObj.Metrics.ClassName,Mdl.ClassNames(2)); rocObj.Metrics(idx,:)
ans=13×4 table
      ClassName       Threshold    FalsePositiveRate    TruePositiveRate
    ______________    _________    _________________    ________________
    {'versicolor'}           1            0                      0
    {'versicolor'}           1         0.01                    0.7
    {'versicolor'}     0.95455         0.02                    0.8
    {'versicolor'}     0.91304         0.03                    0.9
    {'versicolor'}        -0.2         0.04                    0.9
    {'versicolor'}    -0.33333         0.06                    0.9
    {'versicolor'}        -0.6         0.08                    0.9
    {'versicolor'}    -0.86957         0.12                   0.92
    {'versicolor'}    -0.91111         0.16                   0.96
    {'versicolor'}    -0.95122         0.31                   0.96
    {'versicolor'}    -0.95238         0.38                   0.98
    {'versicolor'}    -0.95349         0.44                   0.98
    {'versicolor'}          -1            1                      1
Plot the ROC curve for each class. Specify AverageROCType="micro" to compute the performance metrics for the average ROC curve using the micro-averaging method. plot(rocObj,AverageROCType="micro")
18-8
The filled circle markers indicate the model operating points. The legend displays the class name and AUC value for each curve.
ROC Curve for Multiclass Classification For a multiclass classifier, the rocmetrics function computes the performance metrics of a one-versus-all ROC curve for each class, and the average function computes the metrics for an average of the ROC curves. You can use the plot function to plot a ROC curve for each class and the average ROC curve. One-Versus-All (OVA) Coding Design The one-versus-all (OVA) coding design reduces a multiclass classification problem to a set of binary classification problems. In this coding design, each binary classification treats one class as positive and the rest of the classes as negative. rocmetrics uses the OVA coding design for multiclass classification and evaluates the performance on each class by using the binary classification that the class is positive. For example, the OVA coding design for three classes formulates three binary classifications:
               Binary 1    Binary 2    Binary 3
    Class 1        1          -1          -1
    Class 2       -1           1          -1
    Class 3       -1          -1           1
Each row corresponds to a class, and each column corresponds to a binary classification problem. The first binary classification assumes that class 1 is a positive class and the rest of the classes are negative. rocmetrics evaluates the performance on the first class by using the first binary classification problem. rocmetrics applies the OVA coding design to a binary classification problem as well if you specify classification scores as a two-column matrix. rocmetrics formulates two one-versus-all binary classification problems each of which treats one class as a positive class and the other class as a negative class, and rocmetrics finds two ROC curves. You can use one of them to evaluate the binary classification problem.
Average of Performance Metrics You can compute metrics for an average ROC curve by using the average function. Alternatively, you can use the plot function to compute the metrics and plot the average ROC curve. For examples, see “Find Average ROC Curve” on page 35-133 (example for average) and “Plot Average ROC Curve for Multiclass Classifier” on page 35-6053 (example for plot). average and plot support three algorithms for computing the average false positive rate (FPR) and average true positive rate (TPR) to find the average ROC curve:
• Micro-averaging — The software combines all one-versus-all on page 18-9 binary classification problems into one binary classification problem and computes the average performance metrics as follows:
1. Convert the values in the Labels property of a rocmetrics object to logical values where logical 1 (true) indicates a positive class for each binary problem.
2. Stack the converted vectors of labels, one vector from each binary problem, into a single vector.
3. Convert the matrix that contains the adjusted values on page 18-14 of the classification scores (the Scores property) into a vector by stacking the columns of the matrix.
4. Compute the components of the confusion matrix on page 18-11 for the combined binary problem for each threshold (each distinct value of adjusted scores). A confusion matrix contains the number of instances for true positive (TP), false negative (FN), false positive (FP), and true negative (TN).
5. Compute the average FPR and TPR based on the components of the confusion matrix.
• Macro-averaging — The software computes the average values for FPR and TPR by averaging the values of all one-versus-all binary classification problems. The software uses three metrics (threshold, FPR, and TPR) to compute the average values as follows:
1. Determine a fixed metric. If you specify FixedMetric of rocmetrics as "FalsePositiveRate" or "TruePositiveRate", then the function holds the specified metric fixed. Otherwise, the function holds the threshold values fixed.
2. Find all distinct values in the Metrics property for the fixed metric.
3. Find the corresponding values for the other two metrics for each binary problem.
4. Average the FPR and TPR values of all binary problems.
• Weighted macro-averaging — The software computes the weighted average values for FPR and TPR using the macro-averaging algorithm and using the prior class probabilities (the Prior property) as weights.
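As a minimal usage sketch, assuming rocObj is a rocmetrics object for a multiclass problem such as the one created in the earlier example:
[FPR,TPR,Thresholds,AUC] = average(rocObj,"macro"); % Macro-averaged ROC metrics and AUC
plot(rocObj,AverageROCType="micro")                 % Per-class ROC curves plus the micro-average curve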
Performance Metrics The rocmetrics object supports these built-in performance metrics:
• Number of true positives (TP)
• Number of false negatives (FN)
• Number of false positives (FP)
• Number of true negatives (TN)
• Sum of TP and FP
• Rate of positive predictions (RPP)
• Rate of negative predictions (RNP)
• Accuracy
• True positive rate (TPR), recall, or sensitivity
• False negative rate (FNR), or miss rate
• False positive rate (FPR), fallout, or 1-specificity
• True negative rate (TNR), or specificity
• Positive predictive value (PPV), or precision
• Negative predictive value (NPV)
• Expected cost
rocmetrics also supports a custom metric specified as a function handle. For details, see the AdditionalMetrics name-value argument of the rocmetrics function. rocmetrics computes performance metric values for various thresholds for each one-versus-all on page 18-9 binary classification problem using a confusion matrix, scale vector, and misclassification cost matrix. Each performance metric is a function of a confusion matrix and scale vector. The expected cost is also a function of the misclassification cost matrix, as is a custom metric.
• Confusion matrix — A confusion matrix contains the number of instances for true positive (TP), false negative (FN), false positive (FP), and true negative (TN). rocmetrics computes confusion matrices for various threshold values for each binary problem.
• Scale vector — A scale vector is defined by the prior class probabilities and the number of classes in true labels. rocmetrics finds the probabilities and number of classes for each binary problem from the prior class probabilities specified by the Prior name-value argument and the true labels specified by the Labels input argument.
• Misclassification cost matrix — rocmetrics converts the misclassification cost matrix specified by the Cost name-value argument to the values for each binary problem.
By default, rocmetrics uses all distinct adjusted score on page 18-14 values as threshold values for each binary problem. For more details on threshold values, see “Thresholds, Fixed Metric, and Fixed Metric Values” on page 18-15.
Confusion Matrix A confusion matrix is defined as
$$\begin{pmatrix} \mathrm{TP} & \mathrm{FN} \\ \mathrm{FP} & \mathrm{TN} \end{pmatrix},$$
where
• P stands for "positive".
• N stands for "negative".
• T stands for "true".
• F stands for "false".
For example, the first row of the confusion matrix defines how the classifier identifies instances of the positive class: TP is the count of correctly identified positive instances, and FN is the count of positive instances misidentified as negative. rocmetrics computes confusion matrices for various threshold values for each one-versus-all binary classification. The one-versus-all binary classification model classifies an observation into a positive class if the score for the observation is greater than or equal to the threshold value.
Prior Class Probabilities By default, rocmetrics uses empirical probabilities, which are class frequencies in the true labels. rocmetrics normalizes the 1-by-K prior probability vector π to a 1-by-2 vector for each one-versus-all binary classification, where K is the number of classes. The prior probabilities for the kth binary classification, in which the positive class is the kth class, are $(\pi_k, 1-\pi_k)$, where $\pi_k$ is the prior probability for class k in the multiclass problem.
Scale Vector rocmetrics defines a scale vector $s_k$ of size 2-by-1 for each one-versus-all binary classification problem:
$$s_k = \frac{1}{\pi_k N + (1-\pi_k) P} \begin{bmatrix} \pi_k N \\ (1-\pi_k) P \end{bmatrix},$$
where P and N represent the total instances of positive class and negative class, respectively. That is, P is the sum of TP and FN, and N is the sum of FP and TN. $s_k(1)$ (first element of $s_k$) and $s_k(2)$ (second element of $s_k$) are the scales for the positive class (kth class) and negative class (the rest), respectively. rocmetrics applies the scale values as multiplicative factors to the counts from the corresponding class. That is, the function multiplies counts from the positive class by $s_k(1)$ and counts from the negative class by $s_k(2)$. For example, to compute the positive predictive value (PPV = TP/(TP+FP)) for the kth binary problem, rocmetrics scales PPV as follows:
$$\mathrm{PPV} = \frac{s_k(1)\cdot \mathrm{TP}}{s_k(1)\cdot \mathrm{TP} + s_k(2)\cdot \mathrm{FP}}.$$
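As a minimal numeric sketch of these definitions (the values are hypothetical, not taken from any example in this topic), suppose class k has prior probability 0.2 and the binary problem has 100 positive and 400 negative instances:
pk = 0.2;  P = 100;  N = 400;            % Hypothetical prior and class counts
sk = [pk*N; (1-pk)*P]/(pk*N + (1-pk)*P); % Scale vector, here [0.5; 0.5]
TP = 60;  FP = 30;                       % Hypothetical confusion matrix counts
PPV = sk(1)*TP/(sk(1)*TP + sk(2)*FP);    % Scaled positive predictive value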
Misclassification Cost Matrix By default, rocmetrics uses a K-by-K cost matrix C, where C(i,j) = 1 if i ~= j, and C(i,j) = 0 if i = j. C(i,j) is the cost of classifying a point into class j if its true class is i (that is, the rows correspond to the true class and the columns correspond to the predicted class). rocmetrics normalizes the K-by-K cost matrix C to a 2-by-2 matrix for each one-versus-all binary classification:
$$C_k = \begin{bmatrix} 0 & \mathrm{cost}_k(N|P) \\ \mathrm{cost}_k(P|N) & 0 \end{bmatrix}.$$
$C_k$ is the cost matrix for the kth binary classification in which the positive class is the kth class, where $\mathrm{cost}_k(N|P)$ is the cost of misclassifying a positive class as a negative class, and $\mathrm{cost}_k(P|N)$ is the cost of misclassifying a negative class as a positive class. For class k, let $\pi_k^+$ and $\pi_k^-$ be K-by-1 vectors with the following values:
$$\pi_{ki}^{+} = \begin{cases} \pi_i & \text{if } k = i, \\ 0 & \text{otherwise,} \end{cases} \qquad \pi_{ki}^{-} = \begin{cases} 0 & \text{if } k = i, \\ \pi_i & \text{otherwise,} \end{cases}$$
where $\pi_{ki}^{+}$ and $\pi_{ki}^{-}$ are the ith elements of $\pi_k^+$ and $\pi_k^-$, respectively. The cost of classifying a positive-class (class k) observation into the negative class (the rest) is $\mathrm{cost}_k(N|P) = (\pi_k^+)' C \pi_k^-$. Similarly, the cost of classifying a negative-class observation into the positive class is $\mathrm{cost}_k(P|N) = (\pi_k^-)' C \pi_k^+$.
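A minimal numeric sketch of these conversions follows, with a hypothetical prior vector and the default cost matrix:
prior = [0.5 0.3 0.2];               % Hypothetical prior class probabilities
C = ones(3) - eye(3);                % Default cost matrix: C(i,j) = 1 for i ~= j
k = 1;                               % Binary problem in which class 1 is positive
pip = zeros(3,1); pip(k) = prior(k); % pi_k^+
pim = prior(:);   pim(k) = 0;        % pi_k^-
costNP = pip'*C*pim;                 % Cost of misclassifying class k as negative (0.25 here)
costPN = pim'*C*pip;                 % Cost of misclassifying the rest as class k (0.25 here)
Ck = [0 costNP; costPN 0];           % Normalized 2-by-2 cost matrix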
Classification Scores and Thresholds The rocmetrics function determines threshold values from the input classification scores or the FixedMetricValues name-value argument. Classification Score Input for rocmetrics rocmetrics accepts classification scores (Scores) in a matrix of size n-by-K or a vector of length n, where n is the number of observations and K is the number of classes. For cross-validated data, Scores can be a cell array of vectors or a cell array of matrices.
• Matrix of size n-by-K — Specify Scores using the second output argument of the predict function of a classification model object (such as predict of ClassificationTree). Each row of the output contains classification scores for an observation for all classes, and the column order of the output matches the class order in the ClassNames property of the classification model object. You can specify Scores as a matrix for both binary classification and multiclass classification problems. If you use a matrix format, rocmetrics adjusts the classification scores for each class relative to the scores for the rest of the classes. Specifically, the adjusted score for a class given an observation is the difference between the score for the class and the maximum value of the scores for the rest of the classes. For more details, see “Adjusted Scores for Multiclass Classification Problem” on page 18-14.
• Vector of length n — Specify Scores using a vector when you have classification scores for one class only. A vector input is also suitable when you want to use a different type of adjusted scores for a multiclass problem. As an example, consider a problem with three classes, A, B, and C. If you want to compute a performance curve for separating classes A and B, with C ignored, you need to address the ambiguity in selecting A over B. You can use the score ratio s(A)/s(B) or score difference s(A)–s(B) and pass the vector to rocmetrics; this approach can depend on the nature of the scores and their normalization.
You can use rocmetrics with any classifier or any function that returns a numeric score for an instance of input data. • A high score returned by a classifier for a given instance and class signifies that the instance is likely from the respective class. • A low score signifies that the instance is not likely from the respective class. For some classifiers, you can interpret the score as the posterior probability of observing an instance of a class given an observation. An example of such a score is the fraction of observations for a certain class in a leaf of a decision tree. In this case, scores fall into the range from 0 to 1, and scores from all classes add up to 1. Other functions can return scores ranging between minus and plus infinity, without any obvious mapping from the score to the posterior class probability. rocmetrics does not impose any requirements on the input score range. Because of this lack of normalization, you can use rocmetrics to process scores returned by any classification, regression, or fit functions. rocmetrics does not make any assumptions about the nature of input scores. rocmetrics is intended for use with classifiers that return scores, not those that return only predicted classes. Consider a classifier that returns only classification labels, 0 or 1, for data with two classes. In this case, the performance curve reduces to a single point because the software can split classified instances into positive and negative categories in one way only. Adjusted Scores for Multiclass Classification Problem For each class, rocmetrics adjusts the classification scores (input argument Scores of rocmetrics) relative to the scores for the rest of the classes if you specify Scores as a matrix. Specifically, the adjusted score for a class given an observation is the difference between the score for the class and the maximum value of the scores for the rest of the classes. For example, if you have [s1,s2,s3] in a row of Scores for a classification problem with three classes, the adjusted score values are [s1-max(s2,s3),s2-max(s1,s3),s3-max(s1,s2)]. rocmetrics computes the performance metrics using the adjusted score values for each class. For a binary classification problem, you can specify Scores as a two-column matrix or a column vector. Using a two-column matrix is a simpler option because the predict function of a classification object returns classification scores as a matrix, which you can pass to rocmetrics. If you pass scores in a two-column matrix, rocmetrics adjusts scores in the same way that it adjusts scores for multiclass classification, and it computes performance metrics for both classes. You can use the metric values for one of the two classes to evaluate the binary classification problem. The metric values for a class returned by rocmetrics when you pass a two-column matrix are equivalent to the metric values returned by rocmetrics when you specify classification scores for the class as a column vector. Model Operating Point The model operating point represents the FPR and TPR corresponding to the typical threshold value. The typical threshold value depends on the input format of the Scores argument (classification scores) specified when you create a rocmetrics object: • If you specify Scores as a matrix, rocmetrics assumes that the values in Scores are the scores for a multiclass classification problem and uses adjusted score on page 18-14 values. 
A multiclass classification model classifies an observation into a class that yields the largest score, which corresponds to a nonnegative score in the adjusted scores. Therefore, the threshold value is 0. 18-14
• If you specify Scores as a column vector, rocmetrics assumes that the values in Scores are posterior probabilities of the class specified in ClassNames. A binary classification model classifies an observation into a class that yields a higher posterior probability, that is, a posterior probability greater than 0.5. Therefore, the threshold value is 0.5. For a binary classification problem, you can specify Scores as a two-column matrix or a column vector. However, if the classification scores are not posterior probabilities, you must specify Scores as a matrix. A binary classifier classifies an observation into a class that yields a larger score, which is equivalent to a class that yields a nonnegative adjusted score. Therefore, if you specify Scores as a matrix for a binary classifier, rocmetrics can find a correct model operating point using the same scheme that it applies to a multiclass classifier. If you specify classification scores that are not posterior probabilities as a vector, rocmetrics cannot identify a correct model operating point because it always uses 0.5 as a threshold for the model operating point. The plot function displays a filled circle marker at the model operating point for each ROC curve (see ShowModelOperatingPoint). The function chooses a point corresponding to the typical threshold value. If the curve does not have a data point for the typical threshold value, the function finds a point that has the smallest threshold value greater than the typical threshold. The point on the curve indicates identical performance to the performance of the typical threshold value. For an example, see “Find Model Operating Point and Optimal Operating Point” on page 35-7544. Thresholds, Fixed Metric, and Fixed Metric Values rocmetrics finds the ROC curves and other metric values that correspond to the fixed values (FixedMetricValues name-value argument) of the fixed metric (FixedMetric name-value argument), and stores the values in the Metrics property as a table. The default FixedMetric value is "Thresholds", and the default FixedMetricValues value is "all". For each class, rocmetrics uses all distinct adjusted score on page 18-14 values as threshold values, computes the components of the confusion matrix on page 18-11 for each threshold value, and then computes performance metrics using the confusion matrix components. If you use the default FixedMetricValues value ("all"), specifying a nondefault FixedMetric value does not change the software behavior unless you specify to compute confidence intervals. If rocmetrics computes confidence intervals, then it holds FixedMetric fixed at FixedMetricValues and computes confidence intervals for other metrics. For more details, see “Pointwise Confidence Intervals” on page 18-17. If you specify a nondefault value for FixedMetricValues, rocmetrics finds the threshold values corresponding to the specified fixed metric values (FixedMetricValues for FixedMetric) and computes other performance metric values using the threshold values. • If you set the UseNearestNeighbor name-value argument to false, then rocmetrics uses the exact threshold values corresponding to the specified fixed metric values. • If you set UseNearestNeighbor to true, then among the adjusted scores, rocmetrics finds a value that is the nearest to the threshold value corresponding to each specified fixed metric value. 
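As a minimal sketch (with hypothetical labels, scores, and classNames inputs), you can request metrics at fixed false positive rates instead of at all score thresholds:
rocObj = rocmetrics(labels,scores,classNames, ...
    FixedMetric="FalsePositiveRate",FixedMetricValues=0:0.25:1);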
The Metrics property includes an additional threshold value that replicates the largest threshold value for each class so that a ROC curve starts from the origin (0,0). The additional threshold value represents the reject-all threshold, for which TP = FP = 0 (no positive instances, that is, zero true positive instances and zero false positive instances).
Another special threshold in Metrics is the accept-all threshold, which is the smallest threshold value for which TN = FN = 0 (no negative instances, that is, zero true negative instances and zero false negative instances).

Note that the positive predictive value (PPV = TP/(TP+FP)) is NaN for the reject-all threshold, and the negative predictive value (NPV = TN/(TN+FN)) is NaN for the accept-all threshold.

NaN Score Values

rocmetrics processes NaN values in the classification score input (Scores) in one of two ways:

• If you specify NaNFlag="omitnan" (default), then rocmetrics discards rows with NaN scores.
• If you specify NaNFlag="includenan", then rocmetrics adds the instances of NaN scores to false classification counts in the respective class for each one-versus-all binary classification. That is, for any threshold, the software counts instances with NaN scores from the positive class as false negative (FN), and counts instances with NaN scores from the negative class as false positive (FP). The software computes the metrics corresponding to a threshold of 1 by setting the number of true positive (TP) instances to zero and setting the number of true negative (TN) instances to the total count minus the NaN count in the negative class.

Consider an example with two rows in the positive class and two rows in the negative class, each pair having a NaN score:

True Class Label    Classification Score
Negative            0.2
Negative            NaN
Positive            0.7
Positive            NaN
If you discard rows with NaN scores (NaNFlag="omitnan"), then as the score threshold varies, rocmetrics computes performance metrics as shown in the following table. For example, a threshold of 0.5 corresponds to the middle row where rocmetrics classifies rows 1 and 3 correctly and omits rows 2 and 4.

Threshold   TP   FN   FP   TN
1           0    1    0    1
0.5         1    0    0    1
0           1    0    1    0
If you add rows with NaN scores to the false category in their respective classes (NaNFlag="includenan"), rocmetrics computes performance metrics as shown in the following table. For example, a threshold of 0.5 corresponds to the middle row where rocmetrics counts rows 2 and 4 as incorrectly classified. Notice that only the FN and FP columns differ between these two tables.

Threshold   TP   FN   FP   TN
1           0    2    1    1
0.5         1    1    1    1
0           1    1    2    0
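A minimal sketch reproducing this comparison; the labels and scores below are simply the four hypothetical rows from the tables above, with the positive-class scores supplied as a column vector:

labels = ["Negative";"Negative";"Positive";"Positive"];
scores = [0.2; NaN; 0.7; NaN];
rocOmit    = rocmetrics(labels,scores,"Positive",NaNFlag="omitnan");
rocInclude = rocmetrics(labels,scores,"Positive",NaNFlag="includenan");
rocOmit.Metrics      % table matching the omitnan case
rocInclude.Metrics   % table matching the includenan case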
Pointwise Confidence Intervals

rocmetrics computes pointwise confidence intervals for the performance metrics, including the AUC values and score thresholds, by using either bootstrap samples or cross-validated data. The object stores the values in the Metrics and AUC properties.

• Bootstrap — To compute confidence intervals using bootstrapping, set the NumBootstraps name-value argument to a positive integer. rocmetrics generates NumBootstraps bootstrap samples. The function creates each bootstrap sample by randomly selecting n out of the n rows of input data with replacement. For an example, see “Compute Confidence Intervals Using Bootstrapping” on page 35-7539.
• Cross-validation — To compute confidence intervals using cross-validation, specify cross-validated data for true class labels (Labels), classification scores (Scores), and observation weights (Weights) using cell arrays. rocmetrics treats elements in the cell arrays as cross-validation folds. For an example, see “Compute Confidence Intervals with Cross-Validated Input Data” on page 35-7541.

You cannot specify both options. If you specify a custom metric in AdditionalMetrics, you must use bootstrap to compute confidence intervals. rocmetrics does not support cross-validation for a custom metric.

rocmetrics holds FixedMetric (threshold, FPR, TPR, or a metric specified in AdditionalMetrics) fixed at FixedMetricValues and computes the confidence intervals on AUC and other metrics for the points corresponding to the values in FixedMetricValues.

• Threshold averaging (TA) (when FixedMetric is "Thresholds" (default)) — rocmetrics estimates confidence intervals for performance metrics at fixed threshold values. The function takes samples at the fixed thresholds and averages the corresponding metric values.
• Vertical averaging (VA) (when FixedMetric is a performance metric) — rocmetrics estimates confidence intervals for thresholds and other performance metrics at the fixed metric values. The function takes samples at the fixed metric values and averages the corresponding threshold and metric values.

The function estimates confidence intervals for the AUC value only when FixedMetric is "Thresholds", "FalsePositiveRate", or "TruePositiveRate".
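As a sketch of the bootstrap option, assuming the fisheriris data and a tree model trained only for illustration:

load fisheriris
Mdl = fitctree(meas,species);
[~,Scores] = resubPredict(Mdl);
rng(0)  % for reproducibility of the bootstrap samples
rocObj = rocmetrics(species,Scores,Mdl.ClassNames,NumBootstraps=100);
rocObj.AUC   % AUC estimates with pointwise confidence bounds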
References

[1] Fawcett, T. “ROC Graphs: Notes and Practical Considerations for Researchers”, Machine Learning 31, no. 1 (2004): 1–38.

[2] Zweig, M., and G. Campbell. “Receiver-Operating Characteristic (ROC) Plots: A Fundamental Evaluation Tool in Clinical Medicine.” Clinical Chemistry 39, no. 4 (1993): 561–577.

[3] Davis, J., and M. Goadrich. “The Relationship Between Precision-Recall and ROC Curves.” Proceedings of ICML ’06, 2006, pp. 233–240.

[4] Moskowitz, C. S., and M. S. Pepe. “Quantifying and Comparing the Predictive Accuracy of Continuous Prognostic Factors for Binary Outcomes.” Biostatistics 5, no. 1 (2004): 113–27.

[5] Huang, Y., M. S. Pepe, and Z. Feng. “Evaluating the Predictiveness of a Continuous Marker.” U. Washington Biostatistics Paper Series, 2006, 250–61.
[6] Briggs, W. M., and R. Zaretzki. “The Skill Plot: A Graphical Technique for Evaluating Continuous Diagnostic Tests.” Biometrics 64, no. 1 (2008): 250–256.

[7] Bettinger, R. “Cost-Sensitive Classifier Selection Using the ROC Convex Hull Method.” SAS Institute, 2003.
See Also
rocmetrics | addMetrics | average | plot | ROCCurve Properties
Performance Curves by perfcurve

The perfcurve function computes a receiver operating characteristic (ROC) curve and other performance curves. You can use this function to evaluate classifier performance on test data after you train a classifier.

Alternatively, you can compute performance metrics for a ROC curve and other performance curves by creating a rocmetrics object. rocmetrics supports both binary and multiclass classification problems, and provides object functions to plot a ROC curve (plot), compute an average ROC curve for multiclass problems (average), and compute additional metrics after creating an object (addMetrics). For more details, see “ROC Curve and Performance Metrics” on page 18-3.
Input Scores and Labels for perfcurve

You can use perfcurve with any classifier or, more broadly, with any function that returns a numeric score for an instance of input data. By convention adopted here,

• A high score returned by a classifier for any given instance signifies that the instance is likely from the positive class.
• A low score signifies that the instance is likely from the negative classes.

For some classifiers, you can interpret the score as the posterior probability of observing an instance of the positive class at point X. An example of such a score is the fraction of positive observations in a leaf of a decision tree. In this case, scores fall into the range from 0 to 1 and scores from positive and negative classes add up to unity. Other methods can return scores ranging between minus and plus infinity, without any obvious mapping from the score to the posterior class probability.

perfcurve does not impose any requirements on the input score range. Because of this lack of normalization, you can use perfcurve to process scores returned by any classification, regression, or fit method. perfcurve does not make any assumptions about the nature of input scores or relationships between the scores for different classes. As an example, consider a problem with three classes, A, B, and C, and assume that the scores returned by some classifier for two instances are as follows:

              A     B     C
Instance 1    0.4   0.5   0.1
Instance 2    0.4   0.1   0.5
If you want to compute a performance curve for separation of classes A and B, with C ignored, you need to address the ambiguity in selecting A over B. You could opt to use the score ratio, s(A)/s(B), or score difference, s(A)-s(B); this choice could depend on the nature of these scores and their normalization. perfcurve always takes one score per instance. If you only supply scores for class A, perfcurve does not distinguish between observations 1 and 2. The performance curve in this case may not be optimal.

perfcurve is intended for use with classifiers that return scores, not those that return only predicted classes. As a counter-example, consider a decision tree that returns only hard classification labels, 0 or 1, for data with two classes. In this case, the performance curve reduces to a single point because classified instances can be split into positive and negative categories in one way only.

For input, perfcurve takes true class labels for some data and scores assigned by a classifier to these data. By default, this utility computes a Receiver Operating Characteristic (ROC) curve and
returns values of 1–specificity, or false positive rate, for X and sensitivity, or true positive rate, for Y. You can choose other criteria for X and Y by selecting one out of several provided criteria or specifying an arbitrary criterion through an anonymous function. You can display the computed performance curve using plot(X,Y).
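A minimal usage sketch, assuming a fitted binary classifier that returns posterior scores; here a logistic regression on the ionosphere data set shipped with the toolbox serves only as an example:

load ionosphere                     % X: predictors, Y: labels 'b' or 'g'
mdl = fitglm(X,strcmp(Y,'g'),'Distribution','binomial');
scores = predict(mdl,X);            % posterior probability of the positive class 'g'
[Xroc,Yroc,T,AUC] = perfcurve(Y,scores,'g');
plot(Xroc,Yroc)
xlabel('False positive rate'); ylabel('True positive rate')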
Computation of Performance Metrics

perfcurve can compute values for various criteria to plot either on the x- or the y-axis. All such criteria are described by a 2-by-2 confusion matrix, a 2-by-2 cost matrix, and a 2-by-1 vector of scales applied to class counts.

Confusion Matrix

The confusion matrix, C, is defined as

    C = [ TP  FN
          FP  TN ]

where

• P stands for "positive".
• N stands for "negative".
• T stands for "true".
• F stands for "false".

For example, the first row of the confusion matrix defines how the classifier identifies instances of the positive class: C(1,1) is the count of correctly identified positive instances and C(1,2) is the count of positive instances misidentified as negative.

Misclassification Cost Matrix

The cost matrix defines the cost of misclassification for each category:

    Cost = [ Cost(P|P)  Cost(N|P)
             Cost(P|N)  Cost(N|N) ]

where Cost(I|J) is the cost of assigning an instance of class J to class I. Usually Cost(I|J) = 0 for I = J. For flexibility, perfcurve allows you to specify nonzero costs for correct classification as well.

Scale Vector

The two scales include prior information about class probabilities. perfcurve computes these scales by taking scale(P) = prior(P)*N and scale(N) = prior(N)*P and normalizing the sum scale(P) + scale(N) to 1. P = TP + FN and N = TN + FP are the total instance counts in the positive and negative class, respectively. The function then applies the scales as multiplicative factors to the counts from the corresponding class: perfcurve multiplies counts from the positive class by scale(P) and counts from the negative class by scale(N).

Consider, for example, computation of positive predictive value, PPV = TP/(TP+FP). TP counts come from the positive class and FP counts come from the negative class. Therefore, you need to scale TP by scale(P) and FP by scale(N), and the modified formula for PPV with prior probabilities taken into account is now:

    PPV = scale(P)*TP / (scale(P)*TP + scale(N)*FP)

If all scores in the data are above a certain threshold, perfcurve classifies all instances as 'positive'. This means that TP is the total number of instances in the positive class and FP is the total number of instances in the negative class. In this case, PPV is simply given by the prior:

    PPV = prior(P) / (prior(P) + prior(N))
The perfcurve function returns two vectors, X and Y, of performance measures. Each measure is some function of confusion, cost, and scale values. You can request specific measures by name or provide a function handle to compute a custom measure. The function you provide should take confusion, cost, and scale as its three inputs and return a vector of output values.

Thresholds

The criterion for X must be a monotone function of the positive classification count, or equivalently, threshold for the supplied scores. If perfcurve cannot perform a one-to-one mapping between values of the X criterion and score thresholds, it exits with an error message.

By default, perfcurve computes values of the X and Y criteria for all possible score thresholds. Alternatively, it can compute a reduced number of specific X values supplied as an input argument. In either case, for M requested values, perfcurve computes M+1 values for X and Y. The first value out of these M+1 values is special. perfcurve computes it by setting the TP instance count to zero and setting TN to the total count in the negative class. This value corresponds to the 'reject all' threshold. On a standard ROC curve, this translates into an extra point placed at (0,0).

NaN Score Values

If there are NaN values among input scores, perfcurve can process them in either of two ways:

• It can discard rows with NaN scores.
• It can add them to false classification counts in the respective class. That is, for any threshold, instances with NaN scores from the positive class are counted as false negative (FN), and instances with NaN scores from the negative class are counted as false positive (FP). In this case, the first value of X or Y is computed by setting TP to zero and setting TN to the total count minus the NaN count in the negative class.

For illustration, consider an example with two rows in the positive and two rows in the negative class, each pair having a NaN score:

Class       Score
Negative    0.2
Negative    NaN
Positive    0.7
Positive    NaN
If you discard rows with NaN scores, then as the score cutoff varies, perfcurve computes performance measures as in the following table. For example, a cutoff of 0.5 corresponds to the middle row where rows 1 and 3 are classified correctly, and rows 2 and 4 are omitted.

TP   FN   FP   TN
0    1    0    1
1    0    0    1
1    0    1    0
If you add rows with NaN scores to the false category in their respective classes, perfcurve computes performance measures as in the following table. For example, a cutoff of 0.5 corresponds to the middle row where now rows 2 and 4 are counted as incorrectly classified. Notice that only the FN and FP columns differ between these two tables.

TP   FN   FP   TN
0    2    1    1
1    1    1    1
1    1    2    0
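A small sketch of the two behaviors using the four hypothetical rows above; the 'ProcessNaN' option name and its 'addtofalse' value are stated here from the perfcurve reference page rather than from this section, so treat them as an assumption to verify:

labels = {'Negative';'Negative';'Positive';'Positive'};
scores = [0.2; NaN; 0.7; NaN];
% Default behavior: rows with NaN scores are ignored
[Xi,Yi] = perfcurve(labels,scores,'Positive');
% Count NaN-score rows as misclassified in their own class
[Xa,Ya] = perfcurve(labels,scores,'Positive','ProcessNaN','addtofalse');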
Multiclass Classification Problems

For data with three or more classes, perfcurve takes one positive class and a list of negative classes for input. The function computes the X and Y values using counts in the positive class to estimate TP and FN, and using counts in all negative classes to estimate TN and FP. perfcurve can optionally compute Y values for each negative class separately and, in addition to Y, return a matrix of size M-by-C, where M is the number of elements in X or Y and C is the number of negative classes. You can use this functionality to monitor components of the negative class contribution. For example, you can plot TP counts on the X-axis and FP counts on the Y-axis. In this case, the returned matrix shows how the FP component is split across negative classes.
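A sketch for a three-class problem (fisheriris), treating 'versicolor' as the positive class; the discriminant model and resubstitution scores are illustrative assumptions, not part of a specific example in this guide:

load fisheriris
Mdl = fitcdiscr(meas,species);
[~,Scores] = resubPredict(Mdl);
posScores = Scores(:,strcmp(Mdl.ClassNames,'versicolor'));  % one score per instance
[Xroc,Yroc,~,AUC] = perfcurve(species,posScores,'versicolor', ...
    'NegClass',{'setosa','virginica'});   % the two remaining species form the negative class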
Confidence Intervals

You can also use perfcurve to estimate confidence intervals. perfcurve computes confidence bounds using either cross-validation or bootstrap. If you supply cell arrays for labels and scores, perfcurve uses cross-validation and treats elements in the cell arrays as cross-validation folds. If you set the input parameter NBoot to a positive integer, perfcurve generates NBoot bootstrap replicas to compute pointwise confidence bounds.

perfcurve estimates the confidence bounds using one of two methods:

• Vertical averaging (VA) — estimate confidence bounds on Y and T at fixed values of X. Use the XVals input parameter to use this method for computing confidence bounds.
• Threshold averaging (TA) — estimate confidence bounds for X and Y at fixed thresholds for the positive class score. Use the TVals input parameter to use this method for computing confidence bounds.
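A hedged sketch of bootstrap confidence bounds with vertical averaging, reusing the species labels and posScores variable from the previous sketch; the column layout of the outputs (estimate followed by lower and upper bounds) is an assumption to confirm against the perfcurve reference page:

rng(0)   % reproducible bootstrap samples
[Xroc,Yroc,T,AUC] = perfcurve(species,posScores,'versicolor', ...
    'NBoot',500,'XVals',0:0.05:1);   % vertical averaging at fixed FPR values
% Yroc(:,1): mean TPR; Yroc(:,2:3): lower and upper confidence bounds
errorbar(Xroc,Yroc(:,1),Yroc(:,1)-Yroc(:,2),Yroc(:,3)-Yroc(:,1))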
Observation Weights

To use observation weights instead of observation counts, you can use the 'Weights' parameter in your call to perfcurve. When you use this parameter, to compute X, Y, and T or to compute confidence bounds by cross-validation, perfcurve uses your supplied observation weights instead of observation counts. To compute confidence bounds by bootstrap, perfcurve samples N out of N with replacement using your weights as multinomial sampling probabilities.
References

[1] Fawcett, T. “ROC Graphs: Notes and Practical Considerations for Researchers”, Machine Learning 31, no. 1 (2004): 1–38.

[2] Zweig, M., and G. Campbell. “Receiver-Operating Characteristic (ROC) Plots: A Fundamental Evaluation Tool in Clinical Medicine.” Clinical Chemistry 39, no. 4 (1993): 561–577.
See Also
rocmetrics | addMetrics | average | plot | perfcurve | confusionchart
More About
• “ROC Curve and Performance Metrics” on page 18-3
Classification

This example shows how to perform classification using discriminant analysis, naive Bayes classifiers, and decision trees. Suppose you have a data set containing observations with measurements on different variables (called predictors) and their known class labels. If you obtain predictor values for new observations, could you determine to which classes those observations probably belong? This is the problem of classification.

Fisher's Iris Data

Fisher's iris data consists of measurements on the sepal length, sepal width, petal length, and petal width for 150 iris specimens. There are 50 specimens from each of three species. Load the data and see how the sepal measurements differ between species. You can use the two columns containing sepal measurements.

load fisheriris
f = figure;
gscatter(meas(:,1), meas(:,2), species,'rgb','osd');
xlabel('Sepal length');
ylabel('Sepal width');
N = size(meas,1);
Suppose you measure a sepal and petal from an iris, and you need to determine its species on the basis of those measurements. One approach to solving this problem is known as discriminant analysis.

Linear and Quadratic Discriminant Analysis

The fitcdiscr function can perform classification using different types of discriminant analysis. First classify the data using the default linear discriminant analysis (LDA).

lda = fitcdiscr(meas(:,1:2),species);
ldaClass = resubPredict(lda);
The observations with known class labels are usually called the training data. Now compute the resubstitution error, which is the misclassification error (the proportion of misclassified observations) on the training set.

ldaResubErr = resubLoss(lda)

ldaResubErr = 0.2000
You can also compute the confusion matrix on the training set. A confusion matrix contains information about known class labels and predicted class labels. Generally speaking, the (i,j) element in the confusion matrix is the number of samples whose known class label is class i and whose predicted class is j. The diagonal elements represent correctly classified observations.

figure
ldaResubCM = confusionchart(species,ldaClass);
Of the 150 training observations, 20% or 30 observations are misclassified by the linear discriminant function. You can see which ones they are by drawing X through the misclassified points.

figure(f)
bad = ~strcmp(ldaClass,species);
hold on;
plot(meas(bad,1), meas(bad,2), 'kx');
hold off;
The function has separated the plane into regions divided by lines, and assigned different regions to different species. One way to visualize these regions is to create a grid of (x,y) values and apply the classification function to that grid.

[x,y] = meshgrid(4:.1:8,2:.1:4.5);
x = x(:);
y = y(:);
j = classify([x y],meas(:,1:2),species);
gscatter(x,y,j,'grb','sod')
For some data sets, the regions for the various classes are not well separated by lines. When that is the case, linear discriminant analysis is not appropriate. Instead, you can try quadratic discriminant analysis (QDA) for this data.

Compute the resubstitution error for quadratic discriminant analysis.

qda = fitcdiscr(meas(:,1:2),species,'DiscrimType','quadratic');
qdaResubErr = resubLoss(qda)

qdaResubErr = 0.2000
You have computed the resubstitution error. Usually people are more interested in the test error (also referred to as generalization error), which is the expected prediction error on an independent set. In fact, the resubstitution error will likely underestimate the test error.

In this case you don't have another labeled data set, but you can simulate one by doing cross-validation. A stratified 10-fold cross-validation is a popular choice for estimating the test error on classification algorithms. It randomly divides the training set into 10 disjoint subsets. Each subset has roughly equal size and roughly the same class proportions as in the training set. Remove one subset, train the classification model using the other nine subsets, and use the trained model to classify the removed subset. You could repeat this by removing each of the ten subsets one at a time.

Because cross-validation randomly divides data, its outcome depends on the initial random seed. To reproduce the exact results in this example, execute the following command:

rng(0,'twister');
First use cvpartition to generate 10 disjoint stratified subsets.

cp = cvpartition(species,'KFold',10)

cp =
K-fold cross validation partition
   NumObservations: 150
       NumTestSets: 10
         TrainSize: 135  135  135  135  135  135  135  135  135  135
          TestSize: 15  15  15  15  15  15  15  15  15  15
          IsCustom: 0
The crossval and kfoldLoss methods can estimate the misclassification error for both LDA and QDA using the given data partition cp.

Estimate the true test error for LDA using 10-fold stratified cross-validation.

cvlda = crossval(lda,'CVPartition',cp);
ldaCVErr = kfoldLoss(cvlda)

ldaCVErr = 0.2000
The LDA cross-validation error has the same value as the LDA resubstitution error on this data.

Estimate the true test error for QDA using 10-fold stratified cross-validation.

cvqda = crossval(qda,'CVPartition',cp);
qdaCVErr = kfoldLoss(cvqda)

qdaCVErr = 0.2200
QDA has a slightly larger cross-validation error than LDA. It shows that a simpler model may get comparable, or better, performance than a more complicated model.

Naive Bayes Classifiers

The fitcdiscr function has two other types, 'DiagLinear' and 'DiagQuadratic'. They are similar to 'linear' and 'quadratic', but with diagonal covariance matrix estimates. These diagonal choices are specific examples of a naive Bayes classifier, because they assume the variables are conditionally independent given the class label. Naive Bayes classifiers are among the most popular classifiers. While the assumption of class-conditional independence between variables is not true in general, naive Bayes classifiers have been found to work well in practice on many data sets.

The fitcnb function can be used to create a more general type of naive Bayes classifier.

First model each variable in each class using a Gaussian distribution. You can compute the resubstitution error and the cross-validation error.

nbGau = fitcnb(meas(:,1:2), species);
nbGauResubErr = resubLoss(nbGau)

nbGauResubErr = 0.2200

nbGauCV = crossval(nbGau, 'CVPartition',cp);
nbGauCVErr = kfoldLoss(nbGauCV)

nbGauCVErr = 0.2200
labels = predict(nbGau, [x y]);
gscatter(x,y,labels,'grb','sod')
So far you have assumed the variables from each class have a multivariate normal distribution. Often that is a reasonable assumption, but sometimes you may not be willing to make that assumption or you may see clearly that it is not valid. Now try to model each variable in each class using a kernel density estimation, which is a more flexible nonparametric technique. Here we set the kernel to box.

nbKD = fitcnb(meas(:,1:2), species, 'DistributionNames','kernel', 'Kernel','box');
nbKDResubErr = resubLoss(nbKD)

nbKDResubErr = 0.2067

nbKDCV = crossval(nbKD, 'CVPartition',cp);
nbKDCVErr = kfoldLoss(nbKDCV)

nbKDCVErr = 0.2133

labels = predict(nbKD, [x y]);
gscatter(x,y,labels,'rgb','osd')
For this data set, the naive Bayes classifier with kernel density estimation gets smaller resubstitution error and cross-validation error than the naive Bayes classifier with a Gaussian distribution.

Decision Tree

Another classification algorithm is based on a decision tree. A decision tree is a set of simple rules, such as "if the sepal length is less than 5.45, classify the specimen as setosa." Decision trees are also nonparametric because they do not require any assumptions about the distribution of the variables in each class.

The fitctree function creates a decision tree. Create a decision tree for the iris data and see how well it classifies the irises into species.

t = fitctree(meas(:,1:2), species,'PredictorNames',{'SL' 'SW' });
It's interesting to see how the decision tree method divides the plane. Use the same technique as above to visualize the regions assigned to each species.

[grpname,node] = predict(t,[x y]);
gscatter(x,y,grpname,'grb','sod')
Another way to visualize the decision tree is to draw a diagram of the decision rule and class assignments.

view(t,'Mode','graph');
This cluttered-looking tree uses a series of rules of the form "SL < 5.45" to classify each specimen into one of 19 terminal nodes. To determine the species assignment for an observation, start at the top node and apply the rule. If the point satisfies the rule you take the left path, and if not you take the right path. Ultimately you reach a terminal node that assigns the observation to one of the three species.

Compute the resubstitution error and the cross-validation error for the decision tree.

dtResubErr = resubLoss(t)

dtResubErr = 0.1333

cvt = crossval(t,'CVPartition',cp);
dtCVErr = kfoldLoss(cvt)

dtCVErr = 0.3000
For the decision tree algorithm, the cross-validation error estimate is significantly larger than the resubstitution error. This shows that the generated tree overfits the training set. In other words, this is a tree that classifies the original training set well, but the structure of the tree is sensitive to this
particular training set so that its performance on new data is likely to degrade. It is often possible to find a simpler tree that performs better than a more complex tree on new data.

Try pruning the tree. First compute the resubstitution error for various subsets of the original tree. Then compute the cross-validation error for these sub-trees. A graph shows that the resubstitution error is overly optimistic. It always decreases as the tree size grows, but beyond a certain point, increasing the tree size increases the cross-validation error rate.

resubcost = resubLoss(t,'Subtrees','all');
[cost,secost,ntermnodes,bestlevel] = cvloss(t,'Subtrees','all');
plot(ntermnodes,cost,'b-', ntermnodes,resubcost,'r--')
figure(gcf);
xlabel('Number of terminal nodes');
ylabel('Cost (misclassification error)')
legend('Cross-validation','Resubstitution')
Which tree should you choose? A simple rule would be to choose the tree with the smallest cross-validation error. While this may be satisfactory, you might prefer to use a simpler tree if it is roughly as good as a more complex tree. For this example, take the simplest tree that is within one standard error of the minimum. That's the default rule used by the cvloss method of ClassificationTree.

You can show this on the graph by computing a cutoff value that is equal to the minimum cost plus one standard error. The "best" level computed by the cvloss method is the smallest tree under this cutoff. (Note that bestlevel=0 corresponds to the unpruned tree, so you have to add 1 to use it as an index into the vector outputs from cvloss.)
[mincost,minloc] = min(cost);
cutoff = mincost + secost(minloc);
hold on
plot([0 20], [cutoff cutoff], 'k:')
plot(ntermnodes(bestlevel+1), cost(bestlevel+1), 'mo')
legend('Cross-validation','Resubstitution','Min + 1 std. err.','Best choice')
hold off
Finally, you can look at the pruned tree and compute the estimated misclassification error for it.

pt = prune(t,'Level',bestlevel);
view(pt,'Mode','graph')
cost(bestlevel+1)

ans = 0.2467
Conclusions

This example shows how to perform classification in MATLAB® using Statistics and Machine Learning Toolbox™ functions. This example is not meant to be an ideal analysis of the Fisher iris data. In fact, using the petal measurements instead of, or in addition to, the sepal measurements may lead to better classification. Also, this example is not meant to compare the strengths and weaknesses of different classification algorithms. You may find it instructive to perform the analysis on other data sets and compare different algorithms. There are also Toolbox functions that implement other classification algorithms. For instance, you can use TreeBagger to perform bootstrap aggregation for an ensemble of decision trees, as described in the example “Bootstrap Aggregation (Bagging) of Classification Trees Using TreeBagger” on page 19-126.
19 Nonparametric Supervised Learning

• “Supervised Learning Workflow and Algorithms” on page 19-2
• “Visualize Decision Surfaces of Different Classifiers” on page 19-11
• “Classification Using Nearest Neighbors” on page 19-14
• “Framework for Ensemble Learning” on page 19-34
• “Ensemble Algorithms” on page 19-42
• “Train Classification Ensemble” on page 19-57
• “Train Regression Ensemble” on page 19-60
• “Select Predictors for Random Forests” on page 19-63
• “Test Ensemble Quality” on page 19-69
• “Ensemble Regularization” on page 19-73
• “Classification with Imbalanced Data” on page 19-82
• “Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles” on page 19-87
• “Surrogate Splits” on page 19-93
• “LPBoost and TotalBoost for Small Ensembles” on page 19-98
• “Tune RobustBoost” on page 19-103
• “Random Subspace Classification” on page 19-106
• “Train Classification Ensemble in Parallel” on page 19-111
• “Bootstrap Aggregation (Bagging) of Regression Trees Using TreeBagger” on page 19-115
• “Bootstrap Aggregation (Bagging) of Classification Trees Using TreeBagger” on page 19-126
• “Detect Outliers Using Quantile Regression” on page 19-139
• “Conditional Quantile Estimation Using Kernel Smoothing” on page 19-143
• “Tune Random Forest Using Quantile Error and Bayesian Optimization” on page 19-146
• “Assess Neural Network Classifier Performance” on page 19-151
• “Assess Regression Neural Network Performance” on page 19-158
• “Automated Feature Engineering for Classification” on page 19-164
• “Automated Feature Engineering for Regression” on page 19-171
• “Moving Towards Automating Model Selection Using Bayesian Optimization” on page 19-177
• “Automated Classifier Selection with Bayesian and ASHA Optimization” on page 19-185
• “Automated Regression Model Selection with Bayesian and ASHA Optimization” on page 19-204
• “Credit Rating by Bagging Decision Trees” on page 19-225
• “Combine Heterogeneous Models into Stacked Ensemble” on page 19-241
• “Label Data Using Semi-Supervised Learning Techniques” on page 19-248
• “Bibliography” on page 19-254
Supervised Learning Workflow and Algorithms

In this section...
“What Is Supervised Learning?” on page 19-2
“Steps in Supervised Learning” on page 19-3
“Characteristics of Classification Algorithms” on page 19-6
“Misclassification Cost Matrix, Prior Probabilities, and Observation Weights” on page 19-8
What Is Supervised Learning?

The aim of supervised machine learning is to build a model that makes predictions based on evidence in the presence of uncertainty. As adaptive algorithms identify patterns in data, a computer "learns" from the observations. When exposed to more observations, the computer improves its predictive performance.

Specifically, a supervised learning algorithm takes a known set of input data and known responses to the data (output), and trains a model to generate reasonable predictions for the response to new data.
For example, suppose you want to predict whether someone will have a heart attack within a year. You have a set of data on previous patients, including age, weight, height, blood pressure, etc. You know whether the previous patients had heart attacks within a year of their measurements. So, the problem is combining all the existing data into a model that can predict whether a new person will have a heart attack within a year.

You can think of the entire set of input data as a heterogeneous matrix. Rows of the matrix are called observations, examples, or instances, and each contain a set of measurements for a subject (patients in the example). Columns of the matrix are called predictors, attributes, or features, and each are variables representing a measurement taken on every subject (age, weight, height, etc. in the example). You can think of the response data as a column vector where each row contains the output of the corresponding observation in the input data (whether the patient had a heart attack). To fit or train a supervised learning model, choose an appropriate algorithm, and then pass the input and response data to it.

Supervised learning splits into two broad categories: classification and regression.

• In classification, the goal is to assign a class (or label) from a finite set of classes to an observation. That is, responses are categorical variables. Applications include spam filters, advertisement recommendation systems, and image and speech recognition. Predicting whether a patient will have a heart attack within a year is a classification problem, and the possible classes
are true and false. Classification algorithms usually apply to nominal response values. However, some algorithms can accommodate ordinal classes (see fitcecoc).
• In regression, the goal is to predict a continuous measurement for an observation. That is, the response variables are real numbers. Applications include forecasting stock prices, energy consumption, or disease incidence.

Statistics and Machine Learning Toolbox supervised learning functionalities comprise a streamlined, object framework. You can efficiently train a variety of algorithms, combine models into an ensemble, assess model performances, cross-validate, and predict responses for new data.
Steps in Supervised Learning

While there are many Statistics and Machine Learning Toolbox algorithms for supervised learning, most use the same basic workflow for obtaining a predictor model. (Detailed instruction on the steps for ensemble learning is in “Framework for Ensemble Learning” on page 19-34.) The steps for supervised learning are:

1. “Prepare Data” on page 19-3
2. “Choose an Algorithm” on page 19-4
3. “Fit a Model” on page 19-4
4. “Choose a Validation Method” on page 19-4
5. “Examine Fit and Update Until Satisfied” on page 19-5
6. “Use Fitted Model for Predictions” on page 19-6

Prepare Data

All supervised learning methods start with an input data matrix, usually called X here. Each row of X represents one observation. Each column of X represents one variable, or predictor. Represent missing entries with NaN values in X. Statistics and Machine Learning Toolbox supervised learning algorithms can handle NaN values, either by ignoring them or by ignoring any row with a NaN value.

You can use various data types for response data Y. Each element in Y represents the response to the corresponding row of X. Observations with missing Y data are ignored.

• For regression, Y must be a numeric vector with the same number of elements as the number of rows of X.
• For classification, Y can be any of these data types. This table also contains the method of including missing entries.

Data Type                         Missing Entry
Numeric vector                    NaN
Categorical vector                <undefined>
Character array                   Row of spaces
String array                      <missing> or ""
Cell array of character vectors   ''
Logical vector                    (Cannot represent)
Choose an Algorithm

There are tradeoffs between several characteristics of algorithms, such as:

• Speed of training
• Memory usage
• Predictive accuracy on new data
• Transparency or interpretability, meaning how easily you can understand the reasons an algorithm makes its predictions

Details of the algorithms appear in “Characteristics of Classification Algorithms” on page 19-6. More detail about ensemble algorithms is in “Choose an Applicable Ensemble Aggregation Method” on page 19-35.

Fit a Model

The fitting function you use depends on the algorithm you choose. In the following list, each algorithm is followed by its fitting function for classification and its fitting function for regression.

• Decision trees: fitctree / fitrtree
• Discriminant analysis: fitcdiscr / Not applicable
• Ensembles (for example, Random Forests [1]): fitcensemble, TreeBagger / fitrensemble, TreeBagger
• Gaussian kernel model: fitckernel (SVM and logistic regression learners) / fitrkernel (SVM and least-squares regression learners)
• Gaussian process regression (GPR): Not applicable / fitrgp
• Generalized additive model (GAM): fitcgam / fitrgam
• k-nearest neighbors: fitcknn / Not applicable
• Linear model: fitclinear (SVM and logistic regression) / fitrlinear (SVM and least-squares regression)
• Multiclass, error-correcting output codes (ECOC) model for SVM or other classifiers: fitcecoc / Not applicable
• Naive Bayes model: fitcnb / Not applicable
• Neural network model: fitcnet / fitrnet
• Support vector machines (SVM): fitcsvm / fitrsvm
For a comparison of these algorithms, see “Characteristics of Classification Algorithms” on page 19-6.

Choose a Validation Method

The three main methods to examine the accuracy of the resulting fitted model are:

• Examine the resubstitution error. For examples, see:
  • “Classification Tree Resubstitution Error” on page 20-13
  • “Cross Validate a Regression Tree” on page 20-14
  • “Test Ensemble Quality” on page 19-69
  • “Example: Resubstitution Error of a Discriminant Analysis Classifier” on page 21-16

• Examine the cross-validation error. For examples, see:

  • “Cross Validate a Regression Tree” on page 20-14
  • “Test Ensemble Quality” on page 19-69
  • “Estimate Generalization Error of Boosting Ensemble” on page 35-2256
  • “Cross Validating a Discriminant Analysis Classifier” on page 21-17

• Examine the out-of-bag error for bagged decision trees. For examples, see:

  • “Test Ensemble Quality” on page 19-69
  • “Bootstrap Aggregation (Bagging) of Regression Trees Using TreeBagger” on page 19-115
  • “Bootstrap Aggregation (Bagging) of Classification Trees Using TreeBagger” on page 19-126

Examine Fit and Update Until Satisfied

After validating the model, you might want to change it for better accuracy, better speed, or to use less memory.

• Change fitting parameters to try to get a more accurate model. For examples, see:

  • “Tune RobustBoost” on page 19-103
  • “Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles” on page 19-87
  • “Improving Discriminant Analysis Models” on page 21-15

• Change fitting parameters to try to get a smaller model. This sometimes gives a model with more accuracy. For examples, see:

  • “Select Appropriate Tree Depth” on page 20-16
  • “Prune a Classification Tree” on page 20-20
  • “Surrogate Splits” on page 19-93
  • “Ensemble Regularization” on page 19-73
  • “Bootstrap Aggregation (Bagging) of Regression Trees Using TreeBagger” on page 19-115
  • “Bootstrap Aggregation (Bagging) of Classification Trees Using TreeBagger” on page 19-126

• Try a different algorithm. For applicable choices, see:

  • “Characteristics of Classification Algorithms” on page 19-6
  • “Choose an Applicable Ensemble Aggregation Method” on page 19-35

When satisfied with a model of some types, you can trim it using the appropriate compact function (compact for classification trees, compact for regression trees, compact for discriminant analysis, compact for naive Bayes, compact for SVM, compact for ECOC models, compact for classification ensembles, and compact for regression ensembles). compact removes training data and other properties not required for prediction, e.g., pruning information for decision trees, from the model to reduce memory consumption. Because kNN classification models require all of the training data to predict labels, you cannot reduce the size of a ClassificationKNN model.
Use Fitted Model for Predictions

To predict classification or regression response for most fitted models, use the predict method:

Ypredicted = predict(obj,Xnew)

• obj is the fitted model or fitted compact model.
• Xnew is the new input data.
• Ypredicted is the predicted response, either classification or regression.
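For instance, a hedged sketch of this workflow with a classification tree; the new observation values here are made up purely for illustration:

load fisheriris
obj = fitctree(meas,species);      % fit a classification model
Xnew = [5.8 2.9 4.3 1.3];          % one new observation (hypothetical measurements)
Ypredicted = predict(obj,Xnew)     % predicted class label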
Characteristics of Classification Algorithms

This table shows typical characteristics of the various supervised learning algorithms. The characteristics in any particular case can vary from the listed ones. Use the table as a guide for your initial choice of algorithms. Decide on the tradeoff you want in speed, memory usage, flexibility, and interpretability.

Tip Try a decision tree or discriminant first, because these classifiers are fast and easy to interpret. If the models are not accurate enough predicting the response, try other classifiers with higher flexibility. To control flexibility, see the details for each classifier type. To avoid overfitting, look for a model of lower flexibility that provides sufficient accuracy.

In the following list, each classifier is described by its multiclass support, categorical predictor support, prediction speed, memory usage, and interpretability (the speed and memory terms are defined after the list).

• “Decision Trees” on page 20-2 — fitctree. Multiclass support: Yes. Categorical predictor support: Yes. Prediction speed: Fast. Memory usage: Small. Interpretability: Easy.
• Discriminant analysis on page 21-2 — fitcdiscr. Multiclass support: Yes. Categorical predictor support: No. Prediction speed: Fast. Memory usage: Small for linear, large for quadratic. Interpretability: Easy.
• SVM on page 25-2 — fitcsvm. Multiclass support: No; combine multiple binary SVM classifiers using fitcecoc. Categorical predictor support: Yes. Prediction speed: Medium for linear, slow for others. Memory usage: Medium for linear; all others medium for multiclass, large for binary. Interpretability: Easy for linear SVM, hard for all other kernel types.
• Naive Bayes on page 22-2 — fitcnb. Multiclass support: Yes. Categorical predictor support: Yes. Prediction speed: Medium for simple distributions, slow for kernel distributions or high-dimensional data. Memory usage: Small for simple distributions, medium for kernel distributions or high-dimensional data. Interpretability: Easy.
• Nearest neighbor on page 19-14 — fitcknn. Multiclass support: Yes. Categorical predictor support: Yes. Prediction speed: Slow for cubic, medium for others. Memory usage: Medium. Interpretability: Hard.
• Ensembles on page 19-34 — fitcensemble and fitrensemble. Multiclass support: Yes. Categorical predictor support: Yes. Prediction speed: Fast to medium depending on choice of algorithm. Memory usage: Low to high depending on choice of algorithm. Interpretability: Hard.
The results in this table are based on an analysis of many data sets. The data sets in the study have up to 7000 observations, 80 predictors, and 50 classes. This list defines the speed and memory terms used above.

Speed:
• Fast — 0.01 second
• Medium — 1 second
• Slow — 100 seconds

Memory:
• Small — 1MB
• Medium — 4MB
• Large — 100MB

Note These characteristics provide a general guide. Your results depend on your data and the speed of your machine.

Categorical Predictor Support

This list describes the data-type support of predictors for each classifier: the support when all predictors are numeric, when all predictors are categorical, and when some are categorical and some numeric.

• Decision Trees. All numeric: Yes. All categorical: Yes. Some categorical, some numeric: Yes.
• Discriminant Analysis. All numeric: Yes. All categorical: No. Some categorical, some numeric: No.
• SVM. All numeric: Yes. All categorical: Yes. Some categorical, some numeric: Yes.
• Naive Bayes. All numeric: Yes. All categorical: Yes. Some categorical, some numeric: Yes.
• Nearest Neighbor. All numeric: Euclidean distance only. All categorical: Hamming distance only. Some categorical, some numeric: No.
• Ensembles. All numeric: Yes. All categorical: Yes, except subspace ensembles of discriminant analysis classifiers. Some categorical, some numeric: Yes, except subspace ensembles.
Misclassification Cost Matrix, Prior Probabilities, and Observation Weights

When you train a classification model, you can specify the misclassification cost matrix, prior probabilities, and observation weights by using the Cost, Prior, and Weights name-value arguments, respectively. Classification learning algorithms use the specified values for cost-sensitive learning and evaluation.

Specify Cost, Prior, and Weights Name-Value Arguments

Suppose that you specify Cost as C, Prior as p, and Weights as w. The values C, p, and w have the forms

    C = (cij),  p = [p1 p2 ⋯ pK],  w = [w1 w2 ⋯ wn]′.

• C is a K-by-K numeric matrix, where K is the number of classes. cij = C(i,j) is the cost of classifying an observation into class j when its true class is i.
• wj is the observation weight for observation j, and n is the number of observations.
• p is a 1-by-K numeric vector, where pk is the prior probability of the class k. If you specify Prior as "empirical", then the software sets pk to the sum of observation weights for the observations in class k:

    pk = ∑_{∀ j ∈ Class k} wj.
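As an illustrative sketch, not taken from a specific example in this guide, you can pass these arguments to a fitting function such as fitctree; the cost values, prior, and weights below are assumptions chosen only to show the syntax:

load fisheriris
C = [0 1 1; 5 0 1; 1 1 0];           % penalize misclassifying the second class most heavily
p = [1/3 1/3 1/3];                    % uniform prior probabilities
w = ones(150,1);                      % unit observation weights
Mdl = fitctree(meas,species,'ClassNames',{'setosa','versicolor','virginica'}, ...
    'Cost',C,'Prior',p,'Weights',w);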
Cost, Prior, and W Properties of Classification Model

The software stores the user-specified cost matrix (C) in the Cost property as is, and stores the prior probabilities and observation weights in the Prior and W properties, respectively, after normalization.

A classification model trained by the fitcdiscr, fitcgam, fitcknn, fitcnb, or fitcnet function uses the Cost property for prediction, but the functions do not use Cost for training. Therefore, the Cost property of the model is not read-only; you can change the property value by using dot notation after creating the trained model. For models that use Cost for training, the property is read-only.

The software normalizes the prior probabilities to sum to 1 and normalizes observation weights to sum up to the value of the prior probability in the respective class:

    pk = pk / ∑_{k=1}^{K} pk,

    wj = ( wj / ∑_{∀ j ∈ Class k} wj ) · pk.
Cost-Sensitive Learning

These classification models support cost-sensitive learning:

• Classification decision tree, trained by fitctree
• Classification ensemble, trained by fitcensemble or TreeBagger
• Gaussian kernel classification with SVM and logistic regression learners, trained by fitckernel
• Multiclass, error-correcting output codes (ECOC) model, trained by fitcecoc
• Linear classification for SVM and logistic regression, trained by fitclinear
• SVM classification, trained by fitcsvm

The fitting functions use the misclassification cost matrix specified by the Cost name-value argument for model training. Approaches to cost-sensitive learning vary from one classifier to another.

• fitcecoc converts the specified cost matrix and prior probability values for multiclass classification into the values for binary classification for each binary learner. For more information, see “Prior Probabilities and Misclassification Cost” on page 35-2246.
• fitctree applies average cost correction for growing a tree.
• fitcensemble, TreeBagger, fitckernel, fitclinear, and fitcsvm adjust prior probabilities and observation weights for the specified cost matrix.
• fitcensemble and TreeBagger generate in-bag samples by oversampling classes with large misclassification costs and undersampling classes with small misclassification costs. Consequently, out-of-bag samples have fewer observations from classes with large misclassification costs and more observations from classes with small misclassification costs. If you train a classification ensemble using a small data set and a highly skewed cost matrix, then the number of out-of-bag observations per class might be very low. Therefore, the estimated out-of-bag error can have a large variance and might be difficult to interpret. The same phenomenon can occur for classes with large prior probabilities.

Adjust Prior Probabilities and Observation Weights for Misclassification Cost Matrix
For model training, the fitcensemble, TreeBagger, fitckernel, fitclinear, and fitcsvm functions update the class prior probabilities p to p* and the observation weights w to w* to incorporate the penalties described in the cost matrix C. For a binary classification model, the software completes these steps:

1. Update p to incorporate the cost matrix C.

       p̃1 = p1·c12,  p̃2 = p2·c21.

2. Normalize p̃ so that the updated prior probabilities sum to 1.

       p* = (1 / (p̃1 + p̃2)) · p̃.

3. Remove observations from the training data corresponding to classes with zero prior probability.

4. Normalize the observation weights wj to sum up to the updated prior probability of the class to which the observation belongs. That is, the normalized weight for observation j in class k is

       w*j = ( wj / ∑_{∀ j ∈ Class k} wj ) · p*k.

5. Remove observations that have zero weight.
If you have three or more classes for an ensemble model, trained by fitcensemble or TreeBagger, the software also adjusts prior probabilities for the misclassification cost matrix. This conversion is more complex. First, the software attempts to solve a matrix equation described in Zhou and Liu [2]. If the software fails to find a solution, it applies the “average cost” adjustment described in Breiman et al. [3]. For more information, see Zadrozny et al. [4].

Cost-Sensitive Evaluation

You can account for the cost imbalance in classification models and data sets by conducting a cost-sensitive analysis:

• Perform a cost-sensitive test by using the compareHoldout or testcholdout function. Both functions statistically compare the predictive performance of two classification models by including a cost matrix in the analysis. For details, see “Cost-Sensitive Testing” on page 35-1201.
• Compare observed misclassification costs, returned by the object functions loss, resubLoss, and kfoldLoss of classification models. Specify the LossFun name-value argument as "classifcost". The functions return a weighted average misclassification cost of the input data, training data, and data for cross-validation, respectively. For details, see the object function reference page of any classification model object. For example, see “Classification Loss” on page 35-4908.

For an example of cost-sensitive evaluation, see “Conduct Cost-Sensitive Comparison of Two Classification Models” on page 35-1191.
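A brief sketch of the second bullet, reusing the cost-sensitive tree Mdl from the earlier sketch (an assumed variable, not from a documented example):

% Weighted average misclassification cost on the training data
trainCost = resubLoss(Mdl,'LossFun','classifcost')
% Compare with the default classification error
trainErr = resubLoss(Mdl)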
References

[1] Breiman, L. "Random Forests." Machine Learning 45, 2001, pp. 5–32.

[2] Zhou, Z.-H., and X.-Y. Liu. “On Multi-Class Cost-Sensitive Learning.” Computational Intelligence. Vol. 26, Issue 3, 2010, pp. 232–257. CiteSeerX.

[3] Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Boca Raton, FL: Chapman & Hall, 1984.

[4] Zadrozny, B., J. Langford, and N. Abe. “Cost-Sensitive Learning by Cost-Proportionate Example Weighting.” Third IEEE International Conference on Data Mining, 435–442. 2003.
Visualize Decision Surfaces of Different Classifiers

This example shows how to plot the decision surface of different classification algorithms.

Load Fisher's iris data set.

load fisheriris
X = meas(:,1:2);
y = categorical(species);
labels = categories(y);

X is a numeric matrix that contains two sepal measurements for 150 irises. Y is a cell array of character vectors that contains the corresponding iris species.

Visualize the data using a scatter plot. Group the variables by iris species.

gscatter(X(:,1),X(:,2),species,'rgb','osd');
xlabel('Sepal length');
ylabel('Sepal width');
Train four different classifiers and store the models in a cell array.

classifier_name = {'Naive Bayes','Discriminant Analysis','Classification Tree','Nearest Neighbor'};

Train a naive Bayes model.

classifier{1} = fitcnb(X,y);
Train a discriminant analysis classifier.

classifier{2} = fitcdiscr(X,y);

Train a classification decision tree.

classifier{3} = fitctree(X,y);

Train a k-nearest neighbor classifier.

classifier{4} = fitcknn(X,y);

Create a grid of points spanning the entire space within some bounds of the actual data values.

x1range = min(X(:,1)):.01:max(X(:,1));
x2range = min(X(:,2)):.01:max(X(:,2));
[xx1, xx2] = meshgrid(x1range,x2range);
XGrid = [xx1(:) xx2(:)];

Predict the iris species of each observation in XGrid using all classifiers. Plot scatter plots of the results.

for i = 1:numel(classifier)
    predictedspecies = predict(classifier{i},XGrid);
    subplot(2,2,i);
    gscatter(xx1(:), xx2(:), predictedspecies,'rgb');
    title(classifier_name{i})
    legend off, axis tight
end
legend(labels,'Location',[0.35,0.01,0.35,0.05],'Orientation','Horizontal')
Each classification algorithm generates different decision making rules. A decision surface can help you visualize these rules.
See Also
Functions
fitcnb | fitcdiscr | fitctree | fitcknn

Related Examples
• “Plot Posterior Classification Probabilities” on page 22-5
Classification Using Nearest Neighbors

In this section...
“Pairwise Distance Metrics” on page 19-14
“k-Nearest Neighbor Search and Radius Search” on page 19-16
“Classify Query Data” on page 19-21
“Find Nearest Neighbors Using a Custom Distance Metric” on page 19-27
“K-Nearest Neighbor Classification for Supervised Learning” on page 19-30
“Construct KNN Classifier” on page 19-31
“Examine Quality of KNN Classifier” on page 19-31
“Predict Classification Using KNN Classifier” on page 19-32
“Modify KNN Classifier” on page 19-32
Pairwise Distance Metrics

Categorizing query points based on their distance to points in a training data set can be a simple yet effective way of classifying new points. You can use various metrics to determine the distance, described next. Use pdist2 to find the distance between a set of data and query points.

Distance Metrics

Given an mx-by-n data matrix X, which is treated as mx (1-by-n) row vectors x1, x2, ..., xmx, and an my-by-n data matrix Y, which is treated as my (1-by-n) row vectors y1, y2, ..., ymy, the various distances between the vectors xs and yt are defined as follows:

• Euclidean distance

  dst² = (xs − yt)(xs − yt)′.

  The Euclidean distance is a special case of the Minkowski distance, where p = 2. Specify Euclidean distance by setting the Distance parameter to 'euclidean'.

• Standardized Euclidean distance

  dst² = (xs − yt)V⁻¹(xs − yt)′,

  where V is the n-by-n diagonal matrix whose jth diagonal element is (S(j))², where S is a vector of scaling factors for each dimension. Specify standardized Euclidean distance by setting the Distance parameter to 'seuclidean'.

• Fast Euclidean distance is the same as Euclidean distance, computed by using an alternative algorithm that saves time when the number of predictors is at least 10. In some cases, this faster algorithm can reduce accuracy. Does not support sparse data. See “Fast Euclidean Distance Algorithm” on page 35-5945. Specify fast Euclidean distance by setting the Distance parameter to 'fasteuclidean'.

• Fast standardized Euclidean distance is the same as standardized Euclidean distance, computed by using an alternative algorithm that saves time when the number of predictors is at least 10. In some cases, this faster algorithm can reduce accuracy. Does not support sparse data. See “Fast Euclidean Distance Algorithm” on page 35-5945. Specify fast standardized Euclidean distance by setting the Distance parameter to 'fastseuclidean'.

• Mahalanobis distance

  dst² = (xs − yt)C⁻¹(xs − yt)′,

  where C is the covariance matrix. Specify Mahalanobis distance by setting the Distance parameter to 'mahalanobis'.

• City block distance

  dst = ∑_{j=1}^{n} |xsj − ytj|.

  The city block distance is a special case of the Minkowski distance, where p = 1. Specify city block distance by setting the Distance parameter to 'cityblock'.

• Minkowski distance

  dst = ( ∑_{j=1}^{n} |xsj − ytj|^p )^(1/p).

  For the special case of p = 1, the Minkowski distance gives the city block distance. For the special case of p = 2, the Minkowski distance gives the Euclidean distance. For the special case of p = ∞, the Minkowski distance gives the Chebychev distance. Specify Minkowski distance by setting the Distance parameter to 'minkowski'.

• Chebychev distance

  dst = max_j |xsj − ytj|.

  The Chebychev distance is a special case of the Minkowski distance, where p = ∞. Specify Chebychev distance by setting the Distance parameter to 'chebychev'.

• Cosine distance

  dst = 1 − (xs yt′) / ( √(xs xs′) √(yt yt′) ).

  Specify cosine distance by setting the Distance parameter to 'cosine'.

• Correlation distance

  dst = 1 − ( (xs − x̄s)(yt − ȳt)′ ) / ( √((xs − x̄s)(xs − x̄s)′) √((yt − ȳt)(yt − ȳt)′) ),

  where

  x̄s = (1/n) ∑_j xsj  and  ȳt = (1/n) ∑_j ytj.

  Specify correlation distance by setting the Distance parameter to 'correlation'.

• Hamming distance is the percentage of coordinates that differ:

  dst = ( #(xsj ≠ ytj) / n ).

  Specify Hamming distance by setting the Distance parameter to 'hamming'.

• Jaccard distance is one minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ:

  dst = #[ (xsj ≠ ytj) ∩ ((xsj ≠ 0) ∪ (ytj ≠ 0)) ] / #[ (xsj ≠ 0) ∪ (ytj ≠ 0) ].

  Specify Jaccard distance by setting the Distance parameter to 'jaccard'.

• Spearman distance is one minus the sample Spearman's rank correlation between observations (treated as sequences of values):

  dst = 1 − ( (rs − r̄s)(rt − r̄t)′ ) / ( √((rs − r̄s)(rs − r̄s)′) √((rt − r̄t)(rt − r̄t)′) ),

  where

  • rsj is the rank of xsj taken over x1j, x2j, ..., xmx,j, as computed by tiedrank.
  • rtj is the rank of ytj taken over y1j, y2j, ..., ymy,j, as computed by tiedrank.
  • rs and rt are the coordinate-wise rank vectors of xs and yt, that is, rs = (rs1, rs2, ..., rsn) and rt = (rt1, rt2, ..., rtn).
  • r̄s = (1/n) ∑_j rsj = (n + 1)/2.
  • r̄t = (1/n) ∑_j rtj = (n + 1)/2.

  Specify Spearman distance by setting the Distance parameter to 'spearman'.
k-Nearest Neighbor Search and Radius Search

Given a set X of n points and a distance function, k-nearest neighbor (kNN) search lets you find the k closest points in X to a query point or set of points Y. The kNN search technique and kNN-based algorithms are widely used as benchmark learning rules. The relative simplicity of the kNN search technique makes it easy to compare the results from other classification techniques to kNN results. The technique has been used in various areas such as:

• bioinformatics
• image processing and data compression
• document retrieval
• computer vision
• multimedia database
• marketing data analysis

You can use kNN search for other machine learning algorithms, such as:

• kNN classification
• local weighted regression
• missing data imputation and interpolation
• density estimation

You can also use kNN search with many distance-based learning functions, such as K-means clustering.

In contrast, for a positive real value r, rangesearch finds all points in X that are within a distance r of each point in Y. This fixed-radius search is closely related to kNN search, as it supports the same distance metrics and search classes, and uses the same search algorithms.

k-Nearest Neighbor Search Using Exhaustive Search

When your input data meets any of the following criteria, knnsearch uses the exhaustive search method by default to find the k-nearest neighbors:

• The number of columns of X is more than 10.
• X is sparse.
• The distance metric is either:
  • 'seuclidean'
  • 'mahalanobis'
  • 'cosine'
  • 'correlation'
  • 'spearman'
  • 'hamming'
  • 'jaccard'
  • A custom distance function

knnsearch also uses the exhaustive search method if your search object is an ExhaustiveSearcher model object. The exhaustive search method finds the distance from each query point to every point in X, ranks them in ascending order, and returns the k points with the smallest distances. For example, this diagram shows the k = 3 nearest neighbors.
k-Nearest Neighbor Search Using a Kd-Tree

When your input data meets all of the following criteria, knnsearch creates a Kd-tree by default to find the k-nearest neighbors:

• The number of columns of X is less than 10.
• X is not sparse.
• The distance metric is either:
  • 'euclidean' (default)
  • 'cityblock'
  • 'minkowski'
  • 'chebychev'

knnsearch also uses a Kd-tree if your search object is a KDTreeSearcher model object. Kd-trees divide your data into nodes with at most BucketSize (default is 50) points per node, based on coordinates (as opposed to categories). The following diagrams illustrate this concept using patch objects to color code the different "buckets."
When you want to find the k-nearest neighbors to a given query point, knnsearch does the following:

1  Determines the node to which the query point belongs. In the following example, the query point (32,90) belongs to Node 4.
2  Finds the closest k points within that node and their distances to the query point. In the following example, the points in red circles are equidistant from the query point, and are the closest points to the query point within Node 4.
3  Chooses all other nodes having any area that is within the same distance, in any direction, from the query point to the kth closest point. In this example, only Node 3 overlaps the solid black circle centered at the query point with radius equal to the distance to the closest points within Node 4.
4  Searches nodes within that range for any points closer to the query point. In the following example, the point in a red square is slightly closer to the query point than those within Node 4.
Using a Kd-tree for large data sets with fewer than 10 dimensions (columns) can be much more efficient than using the exhaustive search method, as knnsearch needs to calculate only a subset of the distances. To maximize the efficiency of Kd-trees, use a KDTreeSearcher model.

What Are Search Model Objects?

Basically, model objects are a convenient way of storing information. Related models have the same properties with values and types relevant to a specified search method. In addition to storing information within models, you can perform certain actions on models.

You can efficiently perform a k-nearest neighbors search on your search model using knnsearch. Or, you can search for all neighbors within a specified radius using your search model and rangesearch. In addition, there are generic knnsearch and rangesearch functions that search without creating or using a model.

To determine which type of model and search method is best for your data, consider the following:

• Does your data have many columns, say more than 10? The ExhaustiveSearcher model may perform better.
• Is your data sparse? Use the ExhaustiveSearcher model.
• Do you want to use one of these distance metrics to find the nearest neighbors? Use the ExhaustiveSearcher model.
  • 'seuclidean'
  • 'mahalanobis'
  • 'cosine'
  • 'correlation'
  • 'spearman'
  • 'hamming'
  • 'jaccard'
  • A custom distance function
• Is your data set huge (but with fewer than 10 columns)? Use the KDTreeSearcher model.
• Are you searching for the nearest neighbors for a large number of query points? Use the KDTreeSearcher model.
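For instance, the following sketch contrasts the two search models on randomly generated data and queries them with knnsearch and rangesearch. The variable names and parameter values are illustrative only, not recommendations.

rng(0)                          % for reproducibility
X = randn(1000,3);              % training points
Y = randn(5,3);                 % query points

% Exhaustive search model, required for metrics such as 'mahalanobis'
ExMdl = ExhaustiveSearcher(X,'Distance','mahalanobis');
idxEx = knnsearch(ExMdl,Y,'K',5);     % 5 nearest neighbors of each query point

% Kd-tree search model, efficient for low-dimensional data with supported metrics
KdMdl = KDTreeSearcher(X,'Distance','euclidean');
idxKd = rangesearch(KdMdl,Y,1.5);     % all neighbors within radius 1.5 (cell array)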
Classify Query Data

This example shows how to classify query data by:

1  Growing a Kd-tree.
2  Conducting a k nearest neighbor search using the grown tree.
3  Assigning each query point the class with the highest representation among their respective nearest neighbors.
Classify a new point based on the last two columns of the Fisher iris data. Using only the last two columns makes it easier to plot.

load fisheriris
x = meas(:,3:4);
gscatter(x(:,1),x(:,2),species)
legend('Location','best')
Plot the new point.

newpoint = [5 1.45];
line(newpoint(1),newpoint(2),'marker','x','color','k',...
    'markersize',10,'linewidth',2)
Prepare a Kd-tree neighbor searcher model.

Mdl = KDTreeSearcher(x)

Mdl = 
  KDTreeSearcher with properties:

       BucketSize: 50
         Distance: 'euclidean'
    DistParameter: []
                X: [150x2 double]

Mdl is a KDTreeSearcher model. By default, the distance metric it uses to search for neighbors is Euclidean distance.

Find the 10 sample points closest to the new point.

[n,d] = knnsearch(Mdl,newpoint,'k',10);
line(x(n,1),x(n,2),'color',[.5 .5 .5],'marker','o',...
    'linestyle','none','markersize',10)
It appears that knnsearch has found only the nearest eight neighbors. In fact, this particular dataset contains duplicate values.

x(n,:)

ans = 10×2

    5.0000    1.5000
    4.9000    1.5000
    4.9000    1.5000
    5.1000    1.5000
    5.1000    1.6000
    4.8000    1.4000
    5.0000    1.7000
    4.7000    1.4000
    4.7000    1.4000
    4.7000    1.5000

Make the axes equal so the calculated distances correspond to the apparent distances on the plot, and zoom in to see the neighbors better.

xlim([4.5 5.5]);
ylim([1 2]);
axis square
Find the species of the 10 neighbors.

tabulate(species(n))

       Value    Count   Percent
   virginica        2    20.00%
  versicolor        8    80.00%

Using a rule based on the majority vote of the 10 nearest neighbors, you can classify this new point as a versicolor.

Visually identify the neighbors by drawing a circle around the group of them. Define the center and diameter of a circle, based on the location of the new point.

ctr = newpoint - d(end);
diameter = 2*d(end);
% Draw a circle around the 10 nearest neighbors.
h = rectangle('position',[ctr,diameter,diameter],...
    'curvature',[1 1]);
h.LineStyle = ':';
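To apply the same majority-vote rule programmatically rather than by reading the tabulate output, one possible sketch, reusing the index vector n and the species labels from above, is:

nearestClasses = categorical(species(n));   % classes of the 10 nearest neighbors
predictedClass = mode(nearestClasses)       % most common class among the neighbors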
Using the same dataset, find the 10 nearest neighbors to three new points.

figure
newpoint2 = [5 1.45;6 2;2.75 .75];
gscatter(x(:,1),x(:,2),species)
legend('location','best')
[n2,d2] = knnsearch(Mdl,newpoint2,'k',10);
line(x(n2,1),x(n2,2),'color',[.5 .5 .5],'marker','o',...
    'linestyle','none','markersize',10)
line(newpoint2(:,1),newpoint2(:,2),'marker','x','color','k',...
    'markersize',10,'linewidth',2,'linestyle','none')
Find the species of the 10 nearest neighbors for each new point.

tabulate(species(n2(1,:)))

       Value    Count   Percent
   virginica        2    20.00%
  versicolor        8    80.00%

tabulate(species(n2(2,:)))

       Value    Count   Percent
   virginica       10   100.00%

tabulate(species(n2(3,:)))

       Value    Count   Percent
  versicolor        7    70.00%
      setosa        3    30.00%
For more examples using the knnsearch methods and functions, see the individual reference pages.
Find Nearest Neighbors Using a Custom Distance Metric

This example shows how to find the indices of the three nearest observations in X to each observation in Y with respect to the chi-square distance. This distance metric is used in correspondence analysis, particularly in ecological applications.
Randomly generate normally distributed data into two matrices. The number of rows can vary, but the number of columns must be equal. This example uses 2-D data for plotting.

rng(1) % For reproducibility
X = randn(50,2);
Y = randn(4,2);
h = zeros(3,1);
figure
h(1) = plot(X(:,1),X(:,2),'bx');
hold on
h(2) = plot(Y(:,1),Y(:,2),'rs','MarkerSize',10);
title('Heterogeneous Data')
The rows of X and Y correspond to observations, and the columns are, in general, dimensions (for example, predictors).

The chi-square distance between J-dimensional points x and z is

$\chi(x,z) = \sqrt{\sum_{j=1}^{J} w_j \left(x_j - z_j\right)^2}$,

where $w_j$ is the weight associated with dimension j.

Choose weights for each dimension, and specify the chi-square distance function. The distance function must:
• Take as input arguments one row of X, e.g., x, and the matrix Z.
• Compare x to each row of Z.
• Return a vector D of length nz, where nz is the number of rows of Z. Each element of D is the distance between the observation corresponding to x and the observations corresponding to each row of Z.

w = [0.4; 0.6];
chiSqrDist = @(x,Z)sqrt(((x-Z).^2)*w);

This example uses arbitrary weights for illustration.

Find the indices of the three nearest observations in X to each observation in Y.

k = 3;
[Idx,D] = knnsearch(X,Y,'Distance',chiSqrDist,'k',k);

Idx and D are 4-by-3 matrices.

• Idx(j,1) is the row index of the closest observation in X to observation j of Y, and D(j,1) is their distance.
• Idx(j,2) is the row index of the next closest observation in X to observation j of Y, and D(j,2) is their distance.
• And so on.

Identify the nearest observations in the plot.

for j = 1:k
    h(3) = plot(X(Idx(:,j),1),X(Idx(:,j),2),'ko','MarkerSize',10);
end
legend(h,{'\texttt{X}','\texttt{Y}','Nearest Neighbor'},'Interpreter','latex')
title('Heterogeneous Data and Nearest Neighbors')
hold off
Several observations of Y share nearest neighbors.

Verify that the chi-square distance metric is equivalent to the Euclidean distance metric, but with an optional scaling parameter.

[IdxE,DE] = knnsearch(X,Y,'Distance','seuclidean','k',k, ...
    'Scale',1./(sqrt(w)));
AreDiffIdx = sum(sum(Idx ~= IdxE))

AreDiffIdx = 0

AreDiffDist = sum(sum(abs(D - DE) > eps))

AreDiffDist = 0
The indices and distances between the two implementations of three nearest neighbors are practically equivalent.
K-Nearest Neighbor Classification for Supervised Learning

The ClassificationKNN classification model lets you:

• "Construct KNN Classifier" on page 19-31
• "Examine Quality of KNN Classifier" on page 19-31
• "Predict Classification Using KNN Classifier" on page 19-32
• "Modify KNN Classifier" on page 19-32

Prepare your data for classification according to the procedure in "Steps in Supervised Learning" on page 19-3. Then, construct the classifier using fitcknn.
Construct KNN Classifier

This example shows how to construct a k-nearest neighbor classifier for the Fisher iris data.

Load the Fisher iris data.

load fisheriris
X = meas;    % Use all data for fitting
Y = species; % Response data

Construct the classifier using fitcknn.

Mdl = fitcknn(X,Y)

Mdl = 
  ClassificationKNN
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'setosa'  'versicolor'  'virginica'}
           ScoreTransform: 'none'
          NumObservations: 150
                 Distance: 'euclidean'
             NumNeighbors: 1

A default k-nearest neighbor classifier uses a single nearest neighbor only. Often, a classifier is more robust with more neighbors than that.

Change the neighborhood size of Mdl to 4, meaning that Mdl classifies using the four nearest neighbors.

Mdl.NumNeighbors = 4;
Examine Quality of KNN Classifier

This example shows how to examine the quality of a k-nearest neighbor classifier using resubstitution and cross validation.

Construct a KNN classifier for the Fisher iris data as in "Construct KNN Classifier" on page 19-31.

load fisheriris
X = meas;
Y = species;
rng(10); % For reproducibility
Mdl = fitcknn(X,Y,'NumNeighbors',4);

Examine the resubstitution loss, which, by default, is the fraction of misclassifications from the predictions of Mdl. (For nondefault cost, weights, or priors, see loss.)

rloss = resubLoss(Mdl)

rloss = 0.0400

The classifier predicts incorrectly for 4% of the training data.

Construct a cross-validated classifier from the model.

CVMdl = crossval(Mdl);

Examine the cross-validation loss, which is the average loss of each cross-validation model when predicting on data that is not used for training.

kloss = kfoldLoss(CVMdl)

kloss = 0.0333

The cross-validated classification accuracy resembles the resubstitution accuracy. Therefore, you can expect Mdl to misclassify approximately 4% of new data, assuming that the new data has about the same distribution as the training data.
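Beyond scalar loss values, you can also check which classes are being confused. One hedged sketch, continuing from the model above, uses resubstitution predictions and a confusion matrix:

predictedY = resubPredict(Mdl);   % resubstitution predictions for the training data
C = confusionmat(Y,predictedY)    % rows correspond to true classes, columns to predictions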
Predict Classification Using KNN Classifier

This example shows how to predict classification for a k-nearest neighbor classifier.

Construct a KNN classifier for the Fisher iris data as in "Construct KNN Classifier" on page 19-31.

load fisheriris
X = meas;
Y = species;
Mdl = fitcknn(X,Y,'NumNeighbors',4);

Predict the classification of an average flower.

flwr = mean(X); % an average flower
flwrClass = predict(Mdl,flwr)

flwrClass = 1x1 cell array
    {'versicolor'}
Modify KNN Classifier

This example shows how to modify a k-nearest neighbor classifier.

Construct a KNN classifier for the Fisher iris data as in "Construct KNN Classifier" on page 19-31.

load fisheriris
X = meas;
Y = species;
Mdl = fitcknn(X,Y,'NumNeighbors',4);

Modify the model to use the three nearest neighbors, rather than the default one nearest neighbor.

Mdl.NumNeighbors = 3;

Compare the resubstitution predictions and cross-validation loss with the new number of neighbors.

loss = resubLoss(Mdl)

loss = 0.0400

rng(10); % For reproducibility
CVMdl = crossval(Mdl,'KFold',5);
kloss = kfoldLoss(CVMdl)

kloss = 0.0333

In this case, the model with three neighbors has the same cross-validated loss as the model with four neighbors (see "Examine Quality of KNN Classifier" on page 19-31).

Modify the model to use cosine distance instead of the default, and examine the loss. To use cosine distance, you must recreate the model using the exhaustive search method.

CMdl = fitcknn(X,Y,'NSMethod','exhaustive','Distance','cosine');
CMdl.NumNeighbors = 3;
closs = resubLoss(CMdl)

closs = 0.0200

The classifier now has lower resubstitution error than before.

Check the quality of a cross-validated version of the new model.

CVCMdl = crossval(CMdl);
kcloss = kfoldLoss(CVCMdl)

kcloss = 0.0200
CVCMdl has a better cross-validated loss than CVMdl. However, in general, improving the resubstitution error does not necessarily produce a model with better test-sample predictions.
See Also

fitcknn | ClassificationKNN | ExhaustiveSearcher | KDTreeSearcher
Framework for Ensemble Learning

Using various methods, you can meld results from many weak learners into one high-quality ensemble predictor. These methods closely follow the same syntax, so you can try different methods with minor changes in your commands.

You can create an ensemble for classification by using fitcensemble or for regression by using fitrensemble.

To train an ensemble for classification using fitcensemble, use this syntax.

ens = fitcensemble(X,Y,Name,Value)

• X is the matrix of data. Each row contains one observation, and each column contains one predictor variable.
• Y is the vector of responses, with the same number of observations as the rows in X.
• Name,Value specify additional options using one or more name-value pair arguments. For example, you can specify the ensemble aggregation method with the 'Method' argument, the number of ensemble learning cycles with the 'NumLearningCycles' argument, and the type of weak learners with the 'Learners' argument. For a complete list of name-value pair arguments, see the fitcensemble function page.

This figure shows the information you need to create a classification ensemble.

Similarly, you can train an ensemble for regression by using fitrensemble, which follows the same syntax as fitcensemble. For details on the input arguments and name-value pair arguments, see the fitrensemble function page.

For all classification or nonlinear regression problems, follow these steps to create an ensemble:

1. "Prepare the Predictor Data" on page 19-35
2. "Prepare the Response Data" on page 19-35
3. "Choose an Applicable Ensemble Aggregation Method" on page 19-35
4. "Set the Number of Ensemble Members" on page 19-38
5. "Prepare the Weak Learners" on page 19-38
6. "Call fitcensemble or fitrensemble" on page 19-40
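For orientation before the step-by-step discussion, a minimal end-to-end sketch might look like the following. It uses the ionosphere sample data set; the method, learner settings, and number of cycles are illustrative choices, not recommendations.

load ionosphere                       % sample data: X (predictors), Y (class labels)
t = templateTree('MaxNumSplits',10);  % weak learner template
ens = fitcensemble(X,Y,'Method','AdaBoostM1', ...
    'NumLearningCycles',100,'Learners',t);
cvloss = kfoldLoss(crossval(ens))     % 10-fold cross-validated classification error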
Prepare the Predictor Data

All supervised learning methods start with predictor data, usually called X in this documentation. X can be stored in a matrix or a table. Each row of X represents one observation, and each column of X represents one variable or predictor.
Prepare the Response Data

You can use a wide variety of data types for the response data.

• For regression ensembles, Y must be a numeric vector with the same number of elements as the number of rows of X.
• For classification ensembles, Y can be a numeric vector, categorical vector, character array, string array, cell array of character vectors, or logical vector.

For example, suppose your response data consists of three observations in the following order: true, false, true. You could express Y as:

• [1;0;1] (numeric vector)
• categorical({'true','false','true'}) (categorical vector)
• [true;false;true] (logical vector)
• ['true ';'false';'true '] (character array, padded with spaces so each row has the same length)
• ["true","false","true"] (string array)
• {'true','false','true'} (cell array of character vectors)

Use whichever data type is most convenient. Because you cannot represent missing values with logical entries, do not use logical entries when you have missing values in Y.

fitcensemble and fitrensemble ignore missing values in Y when creating an ensemble. This table contains the method of including missing entries.

Data Type                        | Missing Entry
Numeric vector                   | NaN
Categorical vector               | <undefined>
Character array                  | Row of spaces
String array                     | <missing> or ""
Cell array of character vectors  | ''
Logical vector                   | (not possible to represent)
Choose an Applicable Ensemble Aggregation Method

To create classification and regression ensembles with fitcensemble and fitrensemble, respectively, choose appropriate algorithms from this list.

• For classification with two classes:
  • 'AdaBoostM1'
  • 'LogitBoost'
  • 'GentleBoost'
  • 'RobustBoost' (requires Optimization Toolbox)
  • 'LPBoost' (requires Optimization Toolbox)
  • 'TotalBoost' (requires Optimization Toolbox)
  • 'RUSBoost'
  • 'Subspace'
  • 'Bag'
• For classification with three or more classes:
  • 'AdaBoostM2'
  • 'LPBoost' (requires Optimization Toolbox)
  • 'TotalBoost' (requires Optimization Toolbox)
  • 'RUSBoost'
  • 'Subspace'
  • 'Bag'
• For regression:
  • 'LSBoost'
  • 'Bag'

For descriptions of the various algorithms, see "Ensemble Algorithms" on page 19-42. See "Suggestions for Choosing an Appropriate Ensemble Algorithm" on page 19-37.

This table lists characteristics of the various algorithms. In the table titles:

• Imbalance — Good for imbalanced data (one class has many more observations than the other)
• Stop — Algorithm self-terminates
• Sparse — Requires fewer weak learners than other ensemble algorithms

Algorithm   | Regression | Binary Classification | Multiclass Classification | Class Imbalance | Stop | Sparse
Bag         | ×          | ×                     | ×                         |                 |      |
AdaBoostM1  |            | ×                     |                           |                 |      |
AdaBoostM2  |            |                       | ×                         |                 |      |
LogitBoost  |            | ×                     |                           |                 |      |
GentleBoost |            | ×                     |                           |                 |      |
RobustBoost |            | ×                     |                           |                 |      |
LPBoost     |            | ×                     | ×                         |                 | ×    | ×
TotalBoost  |            | ×                     | ×                         |                 | ×    | ×
RUSBoost    |            | ×                     | ×                         | ×               |      |
LSBoost     | ×          |                       |                           |                 |      |
Subspace    |            | ×                     | ×                         |                 |      |
RobustBoost, LPBoost, and TotalBoost require an Optimization Toolbox license. Try TotalBoost before LPBoost, as TotalBoost can be more robust.

Suggestions for Choosing an Appropriate Ensemble Algorithm

• Regression — Your choices are LSBoost or Bag. See "General Characteristics of Ensemble Algorithms" on page 19-37 for the main differences between boosting and bagging.
• Binary Classification — Try AdaBoostM1 first, with these modifications:

  Data Characteristic                                   | Recommended Algorithm
  Many predictors                                       | Subspace
  Skewed data (many more observations of one class)     | RUSBoost
  Label noise (some training data has the wrong class)  | RobustBoost
  Many observations                                     | Avoid LPBoost and TotalBoost

• Multiclass Classification — Try AdaBoostM2 first, with these modifications:

  Data Characteristic                                   | Recommended Algorithm
  Many predictors                                       | Subspace
  Skewed data (many more observations of one class)     | RUSBoost
  Many observations                                     | Avoid LPBoost and TotalBoost
For details of the algorithms, see "Ensemble Algorithms" on page 19-42.

General Characteristics of Ensemble Algorithms

• Boost algorithms generally use very shallow trees. This construction uses relatively little time or memory. However, for effective predictions, boosted trees might need more ensemble members than bagged trees. Therefore it is not always clear which class of algorithms is superior.
• Bag generally constructs deep trees. This construction is both time consuming and memory-intensive. This also leads to relatively slow predictions.
• Bag can estimate the generalization error without additional cross validation. See oobLoss.
• Except for Subspace, all boosting and bagging algorithms are based on decision tree learners (see "Decision Trees" on page 20-2). Subspace can use either discriminant analysis (page 21-2) or k-nearest neighbor (page 19-14) learners.

For details of the characteristics of individual ensemble members, see "Characteristics of Classification Algorithms" on page 19-6.
Set the Number of Ensemble Members

Choosing the size of an ensemble involves balancing speed and accuracy.

• Larger ensembles take longer to train and to generate predictions.
• Some ensemble algorithms can become overtrained (inaccurate) when too large.

To set an appropriate size, consider starting with several dozen to several hundred members in an ensemble, training the ensemble, and then checking the ensemble quality, as in "Test Ensemble Quality" on page 19-69. If it appears that you need more members, add them using the resume method (classification) or the resume method (regression). Repeat until adding more members does not improve ensemble quality.

Tip For classification, the LPBoost and TotalBoost algorithms are self-terminating, meaning you do not have to investigate the appropriate ensemble size. Try setting NumLearningCycles to 500. The algorithms usually terminate with fewer members.
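As a hedged sketch of this grow-and-check workflow, where X and Y are placeholders for your predictor matrix and response:

ens = fitcensemble(X,Y,'Method','LogitBoost','NumLearningCycles',100);
resubLoss(ens)        % check ensemble quality with the current number of members
ens = resume(ens,50); % if quality is not yet adequate, train 50 additional members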
Prepare the Weak Learners

Currently the weak learner types are:

• 'Discriminant' (recommended for Subspace ensemble)
• 'KNN' (only for Subspace ensemble)
• 'Tree' (for any ensemble except Subspace)

There are two ways to set the weak learner type in an ensemble.

• To create an ensemble with default weak learner options, specify the value of the 'Learners' name-value pair argument as the character vector or string scalar of the weak learner name. For example:

  ens = fitcensemble(X,Y,'Method','Subspace', ...
      'NumLearningCycles',50,'Learners','KNN');
  % or
  ens = fitrensemble(X,Y,'Method','Bag', ...
      'NumLearningCycles',50,'Learners','Tree');

• To create an ensemble with nondefault weak learner options, create a nondefault weak learner using the appropriate template method. For example, if you have missing data, and want to use classification trees with surrogate splits for better accuracy:

  templ = templateTree('Surrogate','all');
  ens = fitcensemble(X,Y,'Method','AdaBoostM2', ...
      'NumLearningCycles',50,'Learners',templ);

To grow trees with leaves containing a number of observations that is at least 10% of the sample size:

  templ = templateTree('MinLeafSize',size(X,1)/10);
  ens = fitcensemble(X,Y,'Method','AdaBoostM2', ...
      'NumLearningCycles',50,'Learners',templ);
Alternatively, choose the maximal number of splits per tree:

  templ = templateTree('MaxNumSplits',4);
  ens = fitcensemble(X,Y,'Method','AdaBoostM2', ...
      'NumLearningCycles',50,'Learners',templ);
You can also use nondefault weak learners in fitrensemble.

While you can give fitcensemble and fitrensemble a cell array of learner templates, the most common usage is to give just one weak learner template.

For examples using a template, see "Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles" on page 19-87 and "Surrogate Splits" on page 19-93.

Decision trees can handle NaN values in X. Such values are called "missing". If you have some missing values in a row of X, a decision tree finds optimal splits using nonmissing values only. If an entire row consists of NaN, fitcensemble and fitrensemble ignore that row. If you have data with a large fraction of missing values in X, use surrogate decision splits. For examples of surrogate splits, see "Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles" on page 19-87 and "Surrogate Splits" on page 19-93.

Common Settings for Tree Weak Learners

• The depth of a weak learner tree makes a difference for training time, memory usage, and predictive accuracy. You control the depth with these parameters:
  • MaxNumSplits — The maximal number of branch node splits is MaxNumSplits per tree. Set large values of MaxNumSplits to get deep trees. The default for bagging is size(X,1) - 1. The default for boosting is 1.
  • MinLeafSize — Each leaf has at least MinLeafSize observations. Set small values of MinLeafSize to get deep trees. The default is 1 for classification and 5 for regression.
  • MinParentSize — Each branch node in the tree has at least MinParentSize observations. Set small values of MinParentSize to get deep trees. The default is 2 for classification and 10 for regression.

  If you supply both MinParentSize and MinLeafSize, the learner uses the setting that gives larger leaves (shallower trees):

  MinParent = max(MinParent,2*MinLeaf)

  If you additionally supply MaxNumSplits, then the software splits a tree until one of the three splitting criteria is satisfied.

• Surrogate — Grow decision trees with surrogate splits when Surrogate is 'on'. Use surrogate splits when your data has missing values.

  Note Surrogate splits cause slower training and use more memory.

• PredictorSelection — fitcensemble, fitrensemble, and TreeBagger grow trees using the standard CART algorithm [11] by default. If the predictor variables are heterogeneous or there are predictors having many levels and others having few levels, then standard CART tends to select predictors having many levels as split predictors. For split-predictor selection that is robust to the number of levels that the predictors have, consider specifying 'curvature' or 'interaction-curvature'. These specifications conduct chi-square tests of association between each predictor and the response or each pair of predictors and the response, respectively. The predictor that yields the minimal p-value is the split predictor for a particular node. For more details, see "Choose Split Predictor Selection Technique" on page 20-14.

  Note When boosting decision trees, selecting split predictors using the curvature or interaction tests is not recommended.
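For instance, one hedged sketch of a bagged ensemble that uses the curvature test for split-predictor selection, here applied to the Fisher iris sample data for illustration, is:

load fisheriris
t = templateTree('PredictorSelection','curvature'); % less biased toward predictors with many levels
ens = fitcensemble(meas,species,'Method','Bag','Learners',t);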
Call fitcensemble or fitrensemble

The syntaxes for fitcensemble and fitrensemble are identical. For fitrensemble, the syntax is:

ens = fitrensemble(X,Y,Name,Value)

• X is the matrix of data. Each row contains one observation, and each column contains one predictor variable.
• Y is the responses, with the same number of observations as rows in X.
• Name,Value specify additional options using one or more name-value pair arguments. For example, you can specify the ensemble aggregation method with the 'Method' argument, the number of ensemble learning cycles with the 'NumLearningCycles' argument, and the type of weak learners with the 'Learners' argument. For a complete list of name-value pair arguments, see the fitrensemble function page.

The result of fitrensemble and fitcensemble is an ensemble object, suitable for making predictions on new data. For a basic example of creating a regression ensemble, see "Train Regression Ensemble" on page 19-60. For a basic example of creating a classification ensemble, see "Train Classification Ensemble" on page 19-57.

Where to Set Name-Value Pairs

There are several name-value pairs you can pass to fitcensemble or fitrensemble, and several that apply to the weak learners (templateDiscriminant, templateKNN, and templateTree). To determine whether a name-value pair argument belongs to the ensemble or to the weak learner:

• Use template name-value pairs to control the characteristics of the weak learners.
• Use fitcensemble or fitrensemble name-value pair arguments to control the ensemble as a whole, either for algorithms or for structure.

For example, for an ensemble of boosted classification trees with each tree deeper than the default, set the templateTree name-value pair arguments MinLeafSize and MinParentSize to smaller values than the defaults, or set MaxNumSplits to a larger value than the default. The trees are then leafier (deeper).

To name the predictors in a classification ensemble (part of the structure of the ensemble), use the PredictorNames name-value pair in fitcensemble.
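A hedged sketch of both points, with deeper trees set through the template and ensemble-wide options set in fitcensemble: X and Y are placeholders, and prednames is a hypothetical cell array of predictor names matching the columns of X.

t = templateTree('MinLeafSize',1,'MinParentSize',2,'MaxNumSplits',50); % deeper trees
ens = fitcensemble(X,Y,'Method','LogitBoost','Learners',t, ...
    'NumLearningCycles',200,'PredictorNames',prednames);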
See Also

fitcensemble | fitrensemble | oobLoss | resume | templateDiscriminant | templateKNN | templateTree
Related Examples

• "Train Classification Ensemble" on page 19-57
• "Train Regression Ensemble" on page 19-60
• "Ensemble Algorithms" on page 19-42
• "Decision Trees" on page 20-2
• "Choose Split Predictor Selection Technique" on page 20-14
Ensemble Algorithms

In this section...
"Bootstrap Aggregation (Bagging) and Random Forest" on page 19-45
"Random Subspace" on page 19-48
"Boosting Algorithms" on page 19-49

This topic provides descriptions of ensemble learning algorithms supported by Statistics and Machine Learning Toolbox, including bagging, random subspace, and various boosting algorithms. You can specify the algorithm by using the 'Method' name-value pair argument of fitcensemble, fitrensemble, or templateEnsemble.

Use fitcensemble or fitrensemble to create an ensemble of learners for classification or regression, respectively. Use templateEnsemble to create an ensemble learner template, and pass the template to fitcecoc to specify ensemble binary learners for ECOC multiclass learning.

For bootstrap aggregation (bagging) and random forest, you can use TreeBagger as well.
Value of 'Method': 'Bag'
Algorithm: "Bootstrap Aggregation (Bagging) and Random Forest" on page 19-45 ([1], [2], [3])
Supported Problems: Binary and multiclass classification, regression
Examples:
• "Select Predictors for Random Forests" on page 19-63
• "Test Ensemble Quality" on page 19-69
• "Surrogate Splits" on page 19-93
• "Bootstrap Aggregation (Bagging) of Classification Trees Using TreeBagger" on page 19-126
• "Bootstrap Aggregation (Bagging) of Regression Trees Using TreeBagger" on page 19-115
• "Tune Random Forest Using Quantile Error and Bayesian Optimization" on page 19-146
• "Detect Outliers Using Quantile Regression" on page 19-139
• "Conditional Quantile Estimation Using Kernel Smoothing" on page 19-143

Value of 'Method': 'Subspace'
Algorithm: "Random Subspace" on page 19-48 ([9])
Supported Problems: Binary and multiclass classification
Examples: "Random Subspace Classification" on page 19-106

Value of 'Method': 'AdaBoostM1'
Algorithm: "Adaptive Boosting for Binary Classification" on page 19-49 ([5], [6], [7], [11])
Supported Problems: Binary classification
Examples:
• "Create an Ensemble Template for ECOC Multiclass Learning" on page 35-7954
• "Conduct Cost-Sensitive Comparison of Two Classification Models" on page 35-1191

Value of 'Method': 'AdaBoostM2'
Algorithm: "Adaptive Boosting for Multiclass Classification" on page 19-50 ([5])
Supported Problems: Multiclass classification
Examples: "Predict Class Labels Using Classification Ensemble" on page 35-6372

Value of 'Method': 'GentleBoost'
Algorithm: "Gentle Adaptive Boosting" on page 19-51 ([7])
Supported Problems: Binary classification
Examples:
• "Speed Up Training ECOC Classifiers Using Binning and Parallel Computing" on page 35-7955
• "Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles" on page 19-87

Value of 'Method': 'LogitBoost'
Algorithm: "Adaptive Logistic Regression" on page 19-51 ([7])
Supported Problems: Binary classification
Examples:
• "Train Classification Ensemble" on page 19-57
• "Speed Up Training by Binning Numeric Predictor Values" on page 35-2254

Value of 'Method': 'LPBoost'
Algorithm: "Linear Programming Boosting" on page 19-52 ([13])
Supported Problems: Binary and multiclass classification
Examples: "LPBoost and TotalBoost for Small Ensembles" on page 19-98

Value of 'Method': 'LSBoost'
Algorithm: "Least-Squares Boosting" on page 19-53 ([2], [8])
Supported Problems: Regression
Examples:
• "Ensemble Regularization" on page 19-73
• "Optimize a Boosted Regression Ensemble" on page 10-66
• "Train Regression Ensemble" on page 19-60

Value of 'Method': 'RobustBoost'
Algorithm: "Robust Boosting" on page 19-53 ([4])
Supported Problems: Binary classification
Examples: "Tune RobustBoost" on page 19-103

Value of 'Method': 'RUSBoost'
Algorithm: "Random Undersampling Boosting" on page 19-54 ([12])
Supported Problems: Binary and multiclass classification
Examples: "Classification with Imbalanced Data" on page 19-82

Value of 'Method': 'TotalBoost'
Algorithm: "Totally Corrective Boosting" on page 19-54 ([13])
Supported Problems: Binary and multiclass classification
Examples: "LPBoost and TotalBoost for Small Ensembles" on page 19-98
To learn about how to choose an appropriate algorithm, see “Choose an Applicable Ensemble Aggregation Method” on page 19-35. Note that usage of some algorithms, such as LPBoost, TotalBoost, and RobustBoost, requires Optimization Toolbox.
Bootstrap Aggregation (Bagging) and Random Forest

Statistics and Machine Learning Toolbox offers three objects for bagging and random forest:

• ClassificationBaggedEnsemble object created by the fitcensemble function for classification
• RegressionBaggedEnsemble object created by the fitrensemble function for regression
• TreeBagger object created by the TreeBagger function for classification and regression

For details about the differences between TreeBagger and bagged ensembles (ClassificationBaggedEnsemble and RegressionBaggedEnsemble), see "Comparison of TreeBagger and Bagged Ensembles" on page 19-47.

Bootstrap aggregation (bagging) is a type of ensemble learning. To bag a weak learner such as a decision tree on a data set, generate many bootstrap replicas of the data set and grow decision trees on the replicas. Obtain each bootstrap replica by randomly selecting N out of N observations with replacement, where N is the data set size. In addition, every tree in the ensemble can randomly select predictors for each decision split, a technique called random forest [2] known to improve the accuracy of bagged trees. By default, the number of predictors to select at random for each split is equal to the square root of the number of predictors for classification, and one third of the number of predictors for regression. After training a model, you can find the predicted response of a trained ensemble for new data by using the predict function. predict takes an average over predictions from individual trees.

By default, the minimum number of observations per leaf for bagged trees is set to 1 for classification and 5 for regression. Trees grown with the default leaf size are usually very deep. These settings are close to optimal for the predictive power of an ensemble. Often you can grow trees with larger leaves without losing predictive power. Doing so reduces training and prediction time, as well as memory usage for the trained ensemble. You can control the minimum number of observations per leaf by using the 'MinLeafSize' name-value pair argument of templateTree or TreeBagger. Note that you use the templateTree function to specify the options of tree learners when you create a bagged ensemble by using fitcensemble or fitrensemble.

Several features of bagged decision trees make them a unique algorithm. Drawing N out of N observations with replacement omits 37% of observations, on average, for each decision tree. These omitted observations are called "out-of-bag" observations. TreeBagger and bagged ensembles (ClassificationBaggedEnsemble and RegressionBaggedEnsemble) have properties and object functions, whose names start with oob, that use out-of-bag observations.

• Use the oobPredict function to estimate predictive power and feature importance. For each observation, oobPredict estimates the out-of-bag prediction by averaging predictions from all trees in the ensemble for which the observation is out of bag.
• Estimate the average out-of-bag error by using oobError (for TreeBagger) or oobLoss (for bagged ensembles). These functions compare the out-of-bag predicted responses against the observed responses for all observations used for training. The out-of-bag average is an unbiased estimator of the true ensemble error.
• Obtain out-of-bag estimates of feature importance by using the OOBPermutedPredictorDeltaError property (for TreeBagger) or oobPermutedPredictorImportance property (for bagged ensembles). The software randomly permutes out-of-bag data across one variable or column at a time and estimates the increase in the out-of-bag error due to this permutation. The larger the increase, the more important the feature.

Therefore, you do not need to supply test data for bagged ensembles because you can obtain reliable estimates of predictive power and feature importance in the process of training.

TreeBagger also offers the proximity matrix in the Proximity property. Every time two observations land on the same leaf of a tree, their proximity increases by 1. For normalization, sum these proximities over all trees in the ensemble and divide by the number of trees. The resulting matrix is symmetric with diagonal elements equal to 1 and off-diagonal elements ranging from 0 to 1. You can use this matrix to find outlier observations and discover clusters in the data through multidimensional scaling.

For examples using bagging, see:

• "Select Predictors for Random Forests" on page 19-63
• "Test Ensemble Quality" on page 19-69
• "Surrogate Splits" on page 19-93
• "Bootstrap Aggregation (Bagging) of Classification Trees Using TreeBagger" on page 19-126
• "Bootstrap Aggregation (Bagging) of Regression Trees Using TreeBagger" on page 19-115
• "Tune Random Forest Using Quantile Error and Bayesian Optimization" on page 19-146
• "Detect Outliers Using Quantile Regression" on page 19-139
• "Conditional Quantile Estimation Using Kernel Smoothing" on page 19-143
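As a hedged sketch of this out-of-bag workflow for a bagged classification ensemble, using the ionosphere sample data for illustration:

load ionosphere                               % sample data: X (predictors), Y (labels)
Mdl = fitcensemble(X,Y,'Method','Bag');       % bagged classification ensemble
oobErr = oobLoss(Mdl)                         % out-of-bag estimate of the classification error
imp = oobPermutedPredictorImportance(Mdl);    % out-of-bag permutation predictor importance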
Comparison of TreeBagger and Bagged Ensembles

TreeBagger and bagged ensembles (ClassificationBaggedEnsemble and RegressionBaggedEnsemble) share most functionalities, but not all. Additionally, some functionalities have different names.

TreeBagger features not in bagged ensembles

Feature                                                                     | TreeBagger Property                                                            | TreeBagger Method
Computation of proximity matrix                                             | Proximity                                                                      | fillprox, mdsprox
Computation of outliers                                                     | OutlierMeasure                                                                 | N/A
Out-of-bag estimates of predictor importance using classification margins  | OOBPermutedPredictorDeltaMeanMargin and OOBPermutedPredictorCountRaiseMargin  | N/A
Merging two ensembles trained separately                                    | N/A                                                                            | append
Quantile regression                                                         | N/A                                                                            | quantilePredict, quantileError, oobQuantilePredict, oobQuantileError
Tall array support for creating ensemble                                    | N/A                                                                            | For details, see "Tall Arrays" on page 35-8169.

When you estimate the proximity matrix and outliers of a TreeBagger model using fillprox, MATLAB must fit an n-by-n matrix in memory, where n is the number of observations. Therefore, if n is moderate to large, avoid estimating the proximity matrix and outliers.

Bagged ensemble features not in TreeBagger

Feature                                          | Description
Hyperparameter optimization                      | Use the 'OptimizeHyperparameters' name-value pair argument.
Binning numeric predictors to speed up training  | Use the 'NumBins' name-value pair argument.
Code generation for predict                      | After training a model, you can generate C/C++ code that predicts labels for new data. Generating C/C++ code requires MATLAB Coder™. For details, see "Introduction to Code Generation" on page 34-3.
Different names for TreeBagger and bagged ensembles

Feature                                                       | TreeBagger                               | Bagged Ensembles
Split criterion contributions for each predictor              | DeltaCriterionDecisionSplit property     | First output of predictorImportance (classification) or predictorImportance (regression)
Predictor associations                                        | SurrogateAssociation property            | Second output of predictorImportance (classification) or predictorImportance (regression)
Out-of-bag estimates of predictor importance                  | OOBPermutedPredictorDeltaError property  | Output of oobPermutedPredictorImportance (classification) or oobPermutedPredictorImportance (regression)
Error (misclassification probability or mean-squared error)   | error and oobError methods               | loss and oobLoss methods (classification) or loss and oobLoss methods (regression)
Training additional trees and adding them to ensemble         | growTrees method                         | resume method (classification) or resume method (regression)
Mean classification margin per tree                           | meanMargin and oobMeanMargin methods     | edge and oobEdge methods (classification)
In addition, two important differences exist when you train a model and predict responses:

• If you pass a misclassification cost matrix to TreeBagger, it passes the matrix along to the trees. If you pass a misclassification cost matrix to fitcensemble, it uses the matrix to adjust the class prior probabilities. fitcensemble then passes the adjusted prior probabilities and the default cost matrix to the trees. The default cost matrix is ones(K)–eye(K) for K classes.
• Unlike the loss and edge methods in ClassificationBaggedEnsemble, the TreeBagger error and meanMargin methods do not normalize input observation weights of the prior probabilities in the respective class.
Random Subspace

Use random subspace ensembles (Subspace) to improve the accuracy of discriminant analysis (ClassificationDiscriminant) or k-nearest neighbor (ClassificationKNN) classifiers. Subspace ensembles also have the advantage of using less memory than ensembles with all predictors, and can handle missing values (NaNs).

The basic random subspace algorithm uses these parameters.

• m is the number of dimensions (variables) to sample in each learner. Set m using the NPredToSample name-value pair.
• d is the number of dimensions in the data, which is the number of columns (predictors) in the data matrix X.
• n is the number of learners in the ensemble. Set n using the NLearn input.

The basic random subspace algorithm performs the following steps:

1  Choose without replacement a random set of m predictors from the d possible values.
2  Train a weak learner using just the m chosen predictors.
3  Repeat steps 1 and 2 until there are n weak learners.
4  Predict by taking an average of the score prediction of the weak learners, and classify the category with the highest average score.
You can choose to create a weak learner for every possible set of m predictors from the d dimensions. To do so, set n, the number of learners, to 'AllPredictorCombinations'. In this case, there are nchoosek(size(X,2),NPredToSample) weak learners in the ensemble. fitcensemble downweights predictors after choosing them for a learner, so subsequent learners have a lower chance of using a predictor that was previously used. This weighting tends to make predictors more evenly distributed among learners than in uniform weighting. For examples using Subspace, see “Random Subspace Classification” on page 19-106.
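One hedged sketch of a random subspace ensemble of k-nearest neighbor learners, here using the Fisher iris sample data with illustrative parameter values:

load fisheriris
ens = fitcensemble(meas,species,'Method','Subspace','Learners','knn', ...
    'NumLearningCycles',60,'NPredToSample',2);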
Boosting Algorithms

Adaptive Boosting for Binary Classification

Adaptive boosting named AdaBoostM1 is a very popular boosting algorithm for binary classification. The algorithm trains learners sequentially. For every learner with index t, AdaBoostM1 computes the weighted classification error

$\varepsilon_t = \sum_{n=1}^{N} d_n^t \, I\left(y_n \neq h_t(x_n)\right)$,

where

• $x_n$ is a vector of predictor values for observation n.
• $y_n$ is the true class label.
• $h_t$ is the prediction of learner (hypothesis) with index t.
• $I$ is the indicator function.
• $d_n^t$ is the weight of observation n at step t.

AdaBoostM1 then increases weights for observations misclassified by learner t and reduces weights for observations correctly classified by learner t. The next learner t + 1 is then trained on the data with updated weights $d_n^{t+1}$.

After training finishes, AdaBoostM1 computes prediction for new data using

$f(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$,

where

$\alpha_t = \frac{1}{2}\log\frac{1 - \varepsilon_t}{\varepsilon_t}$

are weights of the weak hypotheses in the ensemble.

Training by AdaBoostM1 can be viewed as stagewise minimization of the exponential loss

$\sum_{n=1}^{N} w_n \exp\left(-y_n f(x_n)\right)$,

where

• $y_n \in \{-1,+1\}$ is the true class label.
• $w_n$ are observation weights normalized to add up to 1.
• $f(x_n) \in (-\infty,+\infty)$ is the predicted classification score.

The observation weights $w_n$ are the original observation weights you passed to fitcensemble.

The second output from the predict method of an AdaBoostM1 classification ensemble is an N-by-2 matrix of classification scores for the two classes and N observations. The second column in this matrix is always equal to minus the first column. The predict method returns two scores to be consistent with multiclass models, though this is redundant because the second column is always the negative of the first.

Most often AdaBoostM1 is used with decision stumps (default) or shallow trees. If boosted stumps give poor performance, try setting the minimal parent node size to one quarter of the training data.

By default, the learning rate for boosting algorithms is 1. If you set the learning rate to a lower number, the ensemble learns at a slower rate, but can converge to a better solution. 0.1 is a popular choice for the learning rate. Learning at a rate less than 1 is often called "shrinkage".

For examples using AdaBoostM1, see "Conduct Cost-Sensitive Comparison of Two Classification Models" on page 35-1191 and "Create an Ensemble Template for ECOC Multiclass Learning" on page 35-7954.
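The following hedged sketch combines the two preceding suggestions, boosting decision stumps with a learning rate below 1; X and Y are placeholders for a binary classification data set.

t = templateTree('MaxNumSplits',1);              % decision stumps
ens = fitcensemble(X,Y,'Method','AdaBoostM1','Learners',t, ...
    'NumLearningCycles',300,'LearnRate',0.1);    % learning rate below 1 ("shrinkage")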
Adaptive Boosting for Multiclass Classification

Adaptive boosting named AdaBoostM2 is an extension of AdaBoostM1 for multiple classes. Instead of weighted classification error, AdaBoostM2 uses weighted pseudo-loss for N observations and K classes:

$\varepsilon_t = \frac{1}{2}\sum_{n=1}^{N}\sum_{k \neq y_n} d_{n,k}^t \left(1 - h_t(x_n, y_n) + h_t(x_n, k)\right)$,

where

• $h_t(x_n,k)$ is the confidence of prediction by learner at step t into class k ranging from 0 (not at all confident) to 1 (highly confident).
• $d_{n,k}^t$ are observation weights at step t for class k.
• $y_n$ is the true class label taking one of the K values.
• The second sum is over all classes other than the true class $y_n$.
Interpreting the pseudo-loss is harder than classification error, but the idea is the same. Pseudo-loss can be used as a measure of the classification accuracy from any learner in an ensemble. Pseudo-loss typically exhibits the same behavior as a weighted classification error for AdaBoostM1: the first few learners in a boosted ensemble give low pseudo-loss values. After the first few training steps, the ensemble begins to learn at a slower pace, and the pseudo-loss value approaches 0.5 from below.

For an example using AdaBoostM2, see "Predict Class Labels Using Classification Ensemble" on page 35-6372.

Gentle Adaptive Boosting

Gentle adaptive boosting (GentleBoost, also known as Gentle AdaBoost) combines features of AdaBoostM1 and LogitBoost. Like AdaBoostM1, GentleBoost minimizes the exponential loss. But its numeric optimization is set up differently. Like LogitBoost, every weak learner fits a regression model to response values $y_n \in \{-1,+1\}$.

fitcensemble computes and stores the mean-squared error in the FitInfo property of the ensemble object. The mean-squared error is

$\sum_{n=1}^{N} d_n^t \left(y_n - h_t(x_n)\right)^2$,

where

• $d_n^t$ are observation weights at step t (the weights add up to 1).
• $h_t(x_n)$ are predictions of the regression model $h_t$ fitted to response values $y_n$.

As the strength of individual learners weakens, the weighted mean-squared error approaches 1.

For examples using GentleBoost, see "Speed Up Training ECOC Classifiers Using Binning and Parallel Computing" on page 35-7955 and "Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles" on page 19-87.

Adaptive Logistic Regression

Adaptive logistic regression (LogitBoost) is another popular algorithm for binary classification. LogitBoost works similarly to AdaBoostM1, except it minimizes binomial deviance

$\sum_{n=1}^{N} w_n \log\left(1 + \exp\left(-2 y_n f(x_n)\right)\right)$,

where

• $y_n \in \{-1,+1\}$ is the true class label.
• $w_n$ are observation weights normalized to add up to 1.
• $f(x_n) \in (-\infty,+\infty)$ is the predicted classification score.

Binomial deviance assigns less weight to badly misclassified observations (observations with large negative values of $y_n f(x_n)$). LogitBoost can give better average accuracy than AdaBoostM1 for data with poorly separable classes.

Learner t in a LogitBoost ensemble fits a regression model to response values
$\tilde{y}_n = \frac{y_n^{*} - p_t(x_n)}{p_t(x_n)\left(1 - p_t(x_n)\right)}$,

where

• $y_n^{*} \in \{0,+1\}$ are relabeled classes (0 instead of –1).
• $p_t(x_n)$ is the current ensemble estimate of the probability for observation $x_n$ to be of class 1.

fitcensemble computes and stores the mean-squared error in the FitInfo property of the ensemble object. The mean-squared error is

$\sum_{n=1}^{N} d_n^t \left(\tilde{y}_n - h_t(x_n)\right)^2$,

where

• $d_n^t$ are observation weights at step t (the weights add up to 1).
• $h_t(x_n)$ are predictions of the regression model $h_t$ fitted to response values $\tilde{y}_n$.

Values $\tilde{y}_n$ can range from –∞ to +∞, so the mean-squared error does not have well-defined bounds.

For examples using LogitBoost, see "Train Classification Ensemble" on page 19-57 and "Speed Up Training by Binning Numeric Predictor Values" on page 35-2254.

Linear Programming Boosting

Linear programming boosting (LPBoost), like TotalBoost, performs multiclass classification by attempting to maximize the minimal margin in the training set. This attempt uses optimization algorithms, namely linear programming for LPBoost. So you need an Optimization Toolbox license to use LPBoost or TotalBoost.

The margin of a classification is the difference between the predicted soft classification score for the true class, and the largest score for the false classes. For trees, the score of a classification of a leaf node is the posterior probability of the classification at that node. The posterior probability of the classification at a node is the number of training sequences that lead to that node with the classification, divided by the number of training sequences that lead to that node. For more information, see "More About" on page 35-5191 in margin.

Why maximize the minimal margin? For one thing, the generalization error (the error on new data) is the probability of obtaining a negative margin. Schapire and Singer [10] establish this inequality on the probability of obtaining a negative margin:

$P_{\mathrm{test}}\left(m \leq 0\right) \leq P_{\mathrm{train}}\left(m \leq \theta\right) + O\!\left(\frac{1}{\sqrt{N}}\sqrt{\frac{V\log^2(N/V)}{\theta^2} + \log(1/\delta)}\right).$

Here m is the margin, θ is any positive number, V is the Vapnik-Chervonenkis dimension of the classifier space, N is the size of the training set, and δ is a small positive number. The inequality holds with probability 1–δ over many i.i.d. training and test sets. This inequality says: To obtain a low generalization error, minimize the number of observations below margin θ in the training set.

LPBoost iteratively maximizes the minimal margin through a sequence of linear programming problems. Equivalently, by duality, LPBoost minimizes the maximal edge, where edge is the weighted
mean margin (see "More About" on page 35-1790). At each iteration, there are more constraints in the problem. So, for large problems, the optimization problem becomes increasingly constrained, and slow to solve.

LPBoost typically creates ensembles with many learners having weights that are orders of magnitude smaller than those of other learners. Therefore, to better enable you to remove the unimportant ensemble members, the compact method reorders the members of an LPBoost ensemble from largest weight to smallest, so you can easily remove the least important members of the ensemble using the removeLearners method.

For an example using LPBoost, see "LPBoost and TotalBoost for Small Ensembles" on page 19-98.

Least-Squares Boosting

Least-squares boosting (LSBoost) fits regression ensembles. At every step, the ensemble fits a new learner to the difference between the observed response and the aggregated prediction of all learners grown previously. The ensemble fits to minimize mean-squared error.

You can use LSBoost with shrinkage by passing in the LearnRate parameter. By default this parameter is set to 1, and the ensemble learns at the maximal speed. If you set LearnRate to a value from 0 to 1, the ensemble fits every new learner to $y_n - \eta f(x_n)$, where

• $y_n$ is the observed response.
• $f(x_n)$ is the aggregated prediction from all weak learners grown so far for observation $x_n$.
• $\eta$ is the learning rate.

For examples using LSBoost, see "Train Regression Ensemble" on page 19-60, "Optimize a Boosted Regression Ensemble" on page 10-66, and "Ensemble Regularization" on page 19-73.

Robust Boosting

Boosting algorithms such as AdaBoostM1 and LogitBoost increase weights for misclassified observations at every boosting step. These weights can become very large. If this happens, the boosting algorithm sometimes concentrates on a few misclassified observations and neglects the majority of training data. Consequently the average classification accuracy suffers. In this situation, you can try using robust boosting (RobustBoost). This algorithm does not assign almost the entire data weight to badly misclassified observations. It can produce better average classification accuracy. You need an Optimization Toolbox license to use RobustBoost.

Unlike AdaBoostM1 and LogitBoost, RobustBoost does not minimize a specific loss function. Instead, it maximizes the number of observations with the classification margin above a certain threshold.

RobustBoost trains based on time evolution. The algorithm starts at t = 0. At every step, RobustBoost solves an optimization problem to find a positive step in time Δt and a corresponding positive change in the average margin for training data Δm. RobustBoost stops training and exits if at least one of these three conditions is true:

• Time t reaches 1.
• RobustBoost cannot find a solution to the optimization problem with positive updates Δt and Δm.
• RobustBoost grows as many learners as you requested.

Results from RobustBoost can be usable for any termination condition. Estimate the classification accuracy by cross validation or by using an independent test set.
To get better classification accuracy from RobustBoost, you can adjust three parameters in fitcensemble: RobustErrorGoal, RobustMaxMargin, and RobustMarginSigma. Start by varying values for RobustErrorGoal from 0 to 1. The maximal allowed value for RobustErrorGoal depends on the two other parameters. If you pass a value that is too high, fitcensemble produces an error message showing the allowed range for RobustErrorGoal. For an example using RobustBoost, see “Tune RobustBoost” on page 19-103.
Random Undersampling Boosting Random undersampling boosting (RUSBoost) is especially effective at classifying imbalanced data, meaning some class in the training data has many fewer members than another. RUS stands for Random Under Sampling. The algorithm takes N, the number of members in the class with the fewest members in the training data, as the basic unit for sampling. Classes with more members are under sampled by taking only N observations of every class. In other words, if there are K classes, then, for each weak learner in the ensemble, RUSBoost takes a subset of the data with N observations from each of the K classes. The boosting procedure follows the procedure in “Adaptive Boosting for Multiclass Classification” on page 19-50 for reweighting and constructing the ensemble. When you construct a RUSBoost ensemble, there is an optional name-value pair called RatioToSmallest. Give a vector of K values, each value representing the multiple of N to sample for the associated class. For example, if the smallest class has N = 100 members, then RatioToSmallest = [2,3,4] means each weak learner has 200 members in class 1, 300 in class 2, and 400 in class 3. If RatioToSmallest leads to a value that is larger than the number of members in a particular class, then RUSBoost samples the members with replacement. Otherwise, RUSBoost samples the members without replacement. For an example using RUSBoost, see “Classification with Imbalanced Data” on page 19-82.
Totally Corrective Boosting Totally corrective boosting (TotalBoost), like linear programming boost (LPBoost), performs multiclass classification by attempting to maximize the minimal margin in the training set. This attempt uses optimization algorithms, namely quadratic programming for TotalBoost. So you need an Optimization Toolbox license to use LPBoost or TotalBoost. The margin of a classification is the difference between the predicted soft classification score for the true class, and the largest score for the false classes. For trees, the score of a classification of a leaf node is the posterior probability of the classification at that node. The posterior probability of the classification at a node is the number of training sequences that lead to that node with the classification, divided by the number of training sequences that lead to that node. For more information, see “More About” on page 35-5191 in margin. Why maximize the minimal margin? For one thing, the generalization error (the error on new data) is the probability of obtaining a negative margin. Schapire and Singer [10] establish this inequality on the probability of obtaining a negative margin:
Ptest(m ≤ 0) ≤ Ptrain(m ≤ θ) + O( √( (V·log²(N/V)/θ² + log(1/δ)) / N ) )
Here m is the margin, θ is any positive number, V is the Vapnik-Chervonenkis dimension of the classifier space, N is the size of the training set, and δ is a small positive number. The inequality holds with probability 1–δ over many i.i.d. training and test sets. This inequality says: To obtain a low generalization error, minimize the number of observations below margin θ in the training set. 19-54
TotalBoost minimizes a proxy of the Kullback-Leibler divergence between the current weight distribution and the initial weight distribution, subject to the constraint that the edge (the weighted margin) is below a certain value. The proxy is a quadratic expansion of the divergence:
D(W, W0) = Σn=1..N log( W(n)/W0(n) ) ≈ Σn=1..N ( 1 + (W(n)/W0(n))·Δ + (1/(2W(n)))·Δ² ),
where Δ is the difference between W(n), the weights at the current and next iteration, and W0, the initial weight distribution, which is uniform. This optimization formulation keeps weights from becoming zero. At each iteration, there are more constraints in the problem. So, for large problems, the optimization problem becomes increasingly constrained, and slow to solve. TotalBoost typically creates ensembles with many learners having weights that are orders of magnitude smaller than those of other learners. Therefore, to better enable you to remove the unimportant ensemble members, the compact method reorders the members of a TotalBoost ensemble from largest weight to smallest. Therefore, you can easily remove the least important members of the ensemble using the removeLearners method. For an example using TotalBoost, see “LPBoost and TotalBoost for Small Ensembles” on page 19-98.
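As a rough sketch of the compact-and-prune workflow just described (not part of the original example; it assumes a predictor matrix X, a label vector Y, and an Optimization Toolbox license, and the cutoff of 50 members is an arbitrary illustration):
ens = fitcensemble(X,Y,'Method','TotalBoost','NumLearningCycles',300);
cens = compact(ens);                        % members reordered from largest weight to smallest
plot(cens.TrainedWeights)                   % inspect where the weights become negligible
cens = removeLearners(cens,50:cens.NTrained);   % drop the members with negligible weight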
References [1] Breiman, L. "Bagging Predictors." Machine Learning 26, 1996, pp. 123–140. [2] Breiman, L. "Random Forests." Machine Learning 45, 2001, pp. 5–32. [3] Breiman, L. https://www.stat.berkeley.edu/~breiman/RandomForests/ [4] Freund, Y. "A more robust boosting algorithm." arXiv:0905.2138v1, 2009. [5] Freund, Y. and R. E. Schapire. "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting." J. of Computer and System Sciences, Vol. 55, 1997, pp. 119–139. [6] Friedman, J. "Greedy function approximation: A gradient boosting machine." Annals of Statistics, Vol. 29, No. 5, 2001, pp. 1189–1232. [7] Friedman, J., T. Hastie, and R. Tibshirani. "Additive logistic regression: A statistical view of boosting." Annals of Statistics, Vol. 28, No. 2, 2000, pp. 337–407. [8] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, second edition. New York: Springer, 2008. [9] Ho, T. K. "The random subspace method for constructing decision forests." IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 8, 1998, pp. 832–844. [10] Schapire, R., and Y. Singer. "Improved boosting algorithms using confidence-rated predictions." Machine Learning, Vol. 37, No. 3, 1999, pp. 297–336. [11] Schapire, R. E. et al. "Boosting the margin: A new explanation for the effectiveness of voting methods." Annals of Statistics, Vol. 26, No. 5, 1998, pp. 1651–1686. 19-55
[12] Seiffert, C., T. Khoshgoftaar, J. Hulse, and A. Napolitano. "RUSBoost: Improving classification performance when training data is skewed." 19th International Conference on Pattern Recognition, 2008, pp. 1–4. [13] Warmuth, M., J. Liao, and G. Ratsch. "Totally corrective boosting algorithms that maximize the margin." Proc. 23rd Int’l. Conf. on Machine Learning, ACM, New York, 2006, pp. 1001–1008.
See Also fitcensemble | fitrensemble | TreeBagger | ClassificationBaggedEnsemble | RegressionBaggedEnsemble | CompactClassificationEnsemble | CompactRegressionEnsemble | ClassificationKNN | ClassificationDiscriminant | RegressionEnsemble | ClassificationEnsemble | ClassificationPartitionedEnsemble | RegressionPartitionedEnsemble
Related Examples
• “Framework for Ensemble Learning” on page 19-34
• “Tune RobustBoost” on page 19-103
• “Surrogate Splits” on page 19-93
• “Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles” on page 19-87
• “LPBoost and TotalBoost for Small Ensembles” on page 19-98
• “Random Subspace Classification” on page 19-106
Train Classification Ensemble This example shows how to create a classification tree ensemble for the ionosphere data set, and use it to predict the classification of a radar return with average measurements. Load the ionosphere data set. load ionosphere
Train a classification ensemble. For binary classification problems, fitcensemble aggregates 100 classification trees using LogitBoost.
Mdl = fitcensemble(X,Y)
Mdl = 
  ClassificationEnsemble
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b'  'g'}
           ScoreTransform: 'none'
          NumObservations: 351
               NumTrained: 100
                   Method: 'LogitBoost'
             LearnerNames: {'Tree'}
     ReasonForTermination: 'Terminated normally after completing the requested number of training cycles.'
                  FitInfo: [100x1 double]
       FitInfoDescription: {2x1 cell}
Mdl is a ClassificationEnsemble model. Plot a graph of the first trained classification tree in the ensemble. view(Mdl.Trained{1}.CompactRegressionLearner,'Mode','graph');
By default, fitcensemble grows shallow trees for boosting algorithms. You can alter the tree depth by passing a tree template object to fitcensemble. For more details, see templateTree. Predict the quality of a radar return with average predictor measurements. label = predict(Mdl,mean(X)) label = 1x1 cell array {'g'}
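A minimal sketch of the tree-template approach mentioned above (the MaxNumSplits value of 5 is an arbitrary illustration, not a recommendation):
t = templateTree('MaxNumSplits',5);                         % allow deeper trees than the boosting default
MdlDeeper = fitcensemble(X,Y,'Method','LogitBoost','Learners',t);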
See Also fitcensemble | predict
Related Examples
• “Train Regression Ensemble” on page 19-60
• “Select Predictors for Random Forests” on page 19-63
• “Decision Trees” on page 20-2
• “Ensemble Algorithms” on page 19-42
• “Framework for Ensemble Learning” on page 19-34
Train Regression Ensemble This example shows how to create a regression ensemble to predict mileage of cars based on their horsepower and weight, trained on the carsmall data. Load the carsmall data set. load carsmall
Prepare the predictor data. X = [Horsepower Weight];
The response data is MPG. The only available boosted regression ensemble type is LSBoost. For this example, arbitrarily choose an ensemble of 100 trees, and use the default tree options. Train an ensemble of regression trees.
Mdl = fitrensemble(X,MPG,'Method','LSBoost','NumLearningCycles',100)
Mdl = 
  RegressionEnsemble
             ResponseName: 'Y'
    CategoricalPredictors: []
        ResponseTransform: 'none'
          NumObservations: 94
               NumTrained: 100
                   Method: 'LSBoost'
             LearnerNames: {'Tree'}
     ReasonForTermination: 'Terminated normally after completing the requested number of training cycles.'
                  FitInfo: [100x1 double]
       FitInfoDescription: {2x1 cell}
           Regularization: []
Plot a graph of the first trained regression tree in the ensemble. view(Mdl.Trained{1},'Mode','graph');
By default, fitrensemble grows shallow trees for LSBoost. Predict the mileage of a car with 150 horsepower weighing 2750 lbs. mileage = predict(Mdl,[150 2750]) mileage = 23.6713
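To apply the shrinkage described in “Ensemble Algorithms” on page 19-42, pass the LearnRate name-value argument. A sketch using this example's data (the value 0.1 is an arbitrary choice):
MdlShrink = fitrensemble(X,MPG,'Method','LSBoost', ...
    'NumLearningCycles',100,'LearnRate',0.1);
mileageShrink = predict(MdlShrink,[150 2750]);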
See Also fitrensemble | predict
Related Examples
• “Train Classification Ensemble” on page 19-57
• “Select Predictors for Random Forests” on page 19-63
• “Decision Trees” on page 20-2
• “Ensemble Algorithms” on page 19-42
• “Framework for Ensemble Learning” on page 19-34
Select Predictors for Random Forests This example shows how to choose the appropriate split predictor selection technique for your data set when growing a random forest of regression trees. This example also shows how to decide which predictors are most important to include in the training data. Load and Preprocess Data Load the carbig data set. Consider a model that predicts the fuel economy of a car given its number of cylinders, engine displacement, horsepower, weight, acceleration, model year, and country of origin. Consider Cylinders, Model_Year, and Origin as categorical variables. load carbig Cylinders = categorical(Cylinders); Model_Year = categorical(Model_Year); Origin = categorical(cellstr(Origin)); X = table(Cylinders,Displacement,Horsepower,Weight,Acceleration,Model_Year,Origin);
Determine Levels in Predictors The standard CART algorithm tends to split predictors with many unique values (levels), e.g., continuous variables, over those with fewer levels, e.g., categorical variables. If your data is heterogeneous, or your predictor variables vary greatly in their number of levels, then consider using the curvature or interaction tests for split-predictor selection instead of standard CART. For each predictor, determine the number of levels in the data. One way to do this is to define an anonymous function that:
1 Converts all variables to the categorical data type using categorical
2 Determines all unique categories while ignoring missing values using categories
3 Counts the categories using numel
Then, apply the function to each variable using varfun. countLevels = @(x)numel(categories(categorical(x))); numLevels = varfun(countLevels,X,'OutputFormat','uniform');
Compare the number of levels among the predictor variables. figure bar(numLevels) title('Number of Levels Among Predictors') xlabel('Predictor variable') ylabel('Number of levels') h = gca; h.XTickLabel = X.Properties.VariableNames(1:end-1); h.XTickLabelRotation = 45; h.TickLabelInterpreter = 'none';
The continuous variables have many more levels than the categorical variables. Because the number of levels among the predictors varies so much, using standard CART to select split predictors at each node of the trees in a random forest can yield inaccurate predictor importance estimates. In this case, use the curvature test or interaction test. Specify the algorithm by using the 'PredictorSelection' name-value pair argument. For more details, see “Choose Split Predictor Selection Technique” on page 20-14. Train Bagged Ensemble of Regression Trees Train a bagged ensemble of 200 regression trees to estimate predictor importance values. Define a tree learner using these name-value pair arguments: • 'NumVariablesToSample','all' — Use all predictor variables at each node to ensure that each tree uses all predictor variables. • 'PredictorSelection','interaction-curvature' — Specify usage of the interaction test to select split predictors. • 'Surrogate','on' — Specify usage of surrogate splits to increase accuracy because the data set includes missing values. t = templateTree('NumVariablesToSample','all',... 'PredictorSelection','interaction-curvature','Surrogate','on'); rng(1); % For reproducibility Mdl = fitrensemble(X,MPG,'Method','Bag','NumLearningCycles',200, ... 'Learners',t);
Mdl is a RegressionBaggedEnsemble model. 19-64
Estimate the model R2 using out-of-bag predictions. yHat = oobPredict(Mdl); R2 = corr(Mdl.Y,yHat)^2 R2 = 0.8744
Mdl explains 87% of the variability around the mean. Predictor Importance Estimation Estimate predictor importance values by permuting out-of-bag observations among the trees. impOOB = oobPermutedPredictorImportance(Mdl);
impOOB is a 1-by-7 vector of predictor importance estimates corresponding to the predictors in Mdl.PredictorNames. The estimates are not biased toward predictors containing many levels. Compare the predictor importance estimates. figure bar(impOOB) title('Unbiased Predictor Importance Estimates') xlabel('Predictor variable') ylabel('Importance') h = gca; h.XTickLabel = Mdl.PredictorNames; h.XTickLabelRotation = 45; h.TickLabelInterpreter = 'none';
Greater importance estimates indicate more important predictors. The bar graph suggests that Model_Year is the most important predictor, followed by Cylinders and Weight. The Model_Year and Cylinders variables have only 13 and 5 distinct levels, respectively, whereas the Weight variable has over 300 levels. Compare predictor importance estimates by permuting out-of-bag observations and those estimates obtained by summing gains in the mean squared error due to splits on each predictor. Also, obtain predictor association measures estimated by surrogate splits. [impGain,predAssociation] = predictorImportance(Mdl); figure plot(1:numel(Mdl.PredictorNames),[impOOB' impGain']) title('Predictor Importance Estimation Comparison') xlabel('Predictor variable') ylabel('Importance') h = gca; h.XTickLabel = Mdl.PredictorNames; h.XTickLabelRotation = 45; h.TickLabelInterpreter = 'none'; legend('OOB permuted','MSE improvement') grid on
According to the values of impGain, the variables Displacement, Horsepower, and Weight appear to be equally important.
predAssociation is a 7-by-7 matrix of predictor association measures. Rows and columns correspond to the predictors in Mdl.PredictorNames. The “Predictive Measure of Association” on page 35-6657 is a value that indicates the similarity between decision rules that split observations. The best surrogate decision split yields the maximum predictive measure of association. You can infer the strength of the relationship between pairs of predictors using the elements of predAssociation. Larger values indicate more highly correlated pairs of predictors. figure imagesc(predAssociation) title('Predictor Association Estimates') colorbar h = gca; h.XTickLabel = Mdl.PredictorNames; h.XTickLabelRotation = 45; h.TickLabelInterpreter = 'none'; h.YTickLabel = Mdl.PredictorNames;
predAssociation(1,2) ans = 0.6871
The largest association is between Cylinders and Displacement, but the value is not high enough to indicate a strong relationship between the two predictors. Grow Random Forest Using Reduced Predictor Set Because prediction time increases with the number of predictors in random forests, a good practice is to create a model using as few predictors as possible. 19-67
Grow a random forest of 200 regression trees using the best two predictors only. The default 'NumVariablesToSample' value of templateTree is one third of the number of predictors for regression, so fitrensemble uses the random forest algorithm. t = templateTree('PredictorSelection','interaction-curvature','Surrogate','on', ... 'Reproducible',true); % For reproducibility of random predictor selections MdlReduced = fitrensemble(X(:,{'Model_Year' 'Weight'}),MPG,'Method','Bag', ... 'NumLearningCycles',200,'Learners',t);
Compute the R2 of the reduced model. yHatReduced = oobPredict(MdlReduced); r2Reduced = corr(Mdl.Y,yHatReduced)^2 r2Reduced = 0.8653
The R2 for the reduced model is close to the R2 of the full model. This result suggests that the reduced model is sufficient for prediction.
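For instance, you can use the reduced model to predict the fuel economy of a hypothetical car (the model year 76 and 3000-lb weight below are made-up values for illustration):
newCar = table(categorical(76),3000,'VariableNames',{'Model_Year','Weight'});
predictedMPG = predict(MdlReduced,newCar);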
See Also templateTree | fitrensemble | oobPredict | oobPermutedPredictorImportance | predictorImportance | corr
Related Examples
• “Improving Classification Trees and Regression Trees” on page 20-13
• “Bootstrap Aggregation (Bagging) of Regression Trees Using TreeBagger” on page 19-115
• “Surrogate Splits” on page 19-93
• “Introduction to Feature Selection” on page 16-46
• “Interpret Machine Learning Models” on page 27-2
Test Ensemble Quality You cannot evaluate the predictive quality of an ensemble based on its performance on training data. Ensembles tend to "overtrain," meaning they produce overly optimistic estimates of their predictive power. This means the result of resubLoss for classification (resubLoss for regression) usually indicates lower error than you get on new data. To obtain a better idea of the quality of an ensemble, use one of these methods: • Evaluate the ensemble on an independent test set (useful when you have a lot of training data). • Evaluate the ensemble by cross validation (useful when you don't have a lot of training data). • Evaluate the ensemble on out-of-bag data (useful when you create a bagged ensemble with fitcensemble or fitrensemble). This example uses a bagged ensemble so it can use all three methods of evaluating ensemble quality. Generate an artificial dataset with 20 predictors. Each entry is a random number from 0 to 1. The initial classification is Y = 1 if X1 + X2 + X3 + X4 + X5 > 2 . 5 and Y = 0 otherwise. rng(1,'twister') % For reproducibility X = rand(2000,20); Y = sum(X(:,1:5),2) > 2.5;
In addition, to add noise to the results, randomly switch 10% of the classifications. idx = randsample(2000,200); Y(idx) = ~Y(idx);
Independent Test Set Create independent training and test sets of data. Use 70% of the data for a training set by calling cvpartition using the holdout option. cvpart = cvpartition(Y,'holdout',0.3); Xtrain = X(training(cvpart),:); Ytrain = Y(training(cvpart),:); Xtest = X(test(cvpart),:); Ytest = Y(test(cvpart),:);
Create a bagged classification ensemble of 200 trees from the training data. t = templateTree('Reproducible',true); % For reproducibility of random predictor selections bag = fitcensemble(Xtrain,Ytrain,'Method','Bag','NumLearningCycles',200,'Learners',t)
bag = ClassificationBaggedEnsemble ResponseName: 'Y' CategoricalPredictors: [] ClassNames: [0 1] ScoreTransform: 'none' NumObservations: 1400 NumTrained: 200 Method: 'Bag' LearnerNames: {'Tree'} ReasonForTermination: 'Terminated normally after completing the requested number of training
                 FitInfo: []
      FitInfoDescription: 'None'
               FResample: 1
                 Replace: 1
        UseObsForLearner: [1400x200 logical]
Plot the loss (misclassification) of the test data as a function of the number of trained trees in the ensemble. figure plot(loss(bag,Xtest,Ytest,'mode','cumulative')) xlabel('Number of trees') ylabel('Test classification error')
Cross Validation Generate a five-fold cross-validated bagged ensemble.
cv = fitcensemble(X,Y,'Method','Bag','NumLearningCycles',200,'Kfold',5,'Learners',t)
cv = 
  ClassificationPartitionedEnsemble
    CrossValidatedModel: 'Bag'
         PredictorNames: {'x1'  'x2'  'x3'  'x4'  'x5'  'x6'  'x7'  'x8'  'x9'  'x10'  'x11'  'x1...}
           ResponseName: 'Y'
        NumObservations: 2000
                  KFold: 5
              Partition: [1x1 cvpartition]
      NumTrainedPerFold: [200 200 200 200 200]
             ClassNames: [0 1]
         ScoreTransform: 'none'
Examine the cross-validation loss as a function of the number of trees in the ensemble. figure plot(loss(bag,Xtest,Ytest,'mode','cumulative')) hold on plot(kfoldLoss(cv,'mode','cumulative'),'r.') hold off xlabel('Number of trees') ylabel('Classification error') legend('Test','Cross-validation','Location','NE')
Cross validating gives comparable estimates to those of the independent set. Out-of-Bag Estimates Generate the loss curve for out-of-bag estimates, and plot it along with the other curves. figure plot(loss(bag,Xtest,Ytest,'mode','cumulative')) hold on
plot(kfoldLoss(cv,'mode','cumulative'),'r.') plot(oobLoss(bag,'mode','cumulative'),'k--') hold off xlabel('Number of trees') ylabel('Classification error') legend('Test','Cross-validation','Out of bag','Location','NE')
The out-of-bag estimates are again comparable to those of the other methods.
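As a quick numerical check (a sketch using the variables defined in this example), you can compare the error estimates at the full ensemble size from all three methods:
finalErr = [loss(bag,Xtest,Ytest) kfoldLoss(cv) oobLoss(bag)]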
See Also fitcensemble | fitrensemble | resubLoss | cvpartition | oobLoss | kfoldLoss | loss
Related Examples
• “Framework for Ensemble Learning” on page 19-34
• “Ensemble Algorithms” on page 19-42
• “Bootstrap Aggregation (Bagging) of Classification Trees Using TreeBagger” on page 19-126
Ensemble Regularization Regularization is a process of choosing fewer weak learners for an ensemble in a way that does not diminish predictive performance. Currently you can regularize regression ensembles. (You can also regularize a discriminant analysis classifier in a non-ensemble context; see “Regularize Discriminant Analysis Classifier” on page 21-21.) The regularize method finds an optimal set of learner weights αt that minimize
Σn=1..N wn·g( Σt=1..T αt·ht(xn), yn ) + λ·Σt=1..T αt.
Here
• λ ≥ 0 is a parameter you provide, called the lasso parameter.
• ht is a weak learner in the ensemble trained on N observations with predictors xn, responses yn, and weights wn.
• g(f,y) = (f – y)² is the squared error.
The ensemble is regularized on the same (xn,yn,wn) data used for training, so
Σn=1..N wn·g( Σt=1..T αt·ht(xn), yn )
is the ensemble resubstitution error. The error is measured by mean squared error (MSE). If you use λ = 0, regularize finds the weak learner weights by minimizing the resubstitution MSE. Ensembles tend to overtrain. In other words, the resubstitution error is typically smaller than the true generalization error. By making the resubstitution error even smaller, you are likely to make the ensemble accuracy worse instead of improving it. On the other hand, positive values of λ push the magnitude of the αt coefficients to 0. This often improves the generalization error. Of course, if you choose λ too large, all the optimal coefficients are 0, and the ensemble does not have any accuracy. Usually you can find an optimal range for λ in which the accuracy of the regularized ensemble is better or comparable to that of the full ensemble without regularization. A nice feature of lasso regularization is its ability to drive the optimized coefficients precisely to 0. If a learner's weight αt is 0, this learner can be excluded from the regularized ensemble. In the end, you get an ensemble with improved accuracy and fewer learners.
Regularize a Regression Ensemble This example uses data for predicting the insurance risk of a car based on its many attributes. Load the imports-85 data into the MATLAB® workspace. load imports-85;
Look at a description of the data to find the categorical variables and predictor names. Description
Description = 9x79 char array '1985 Auto Imports Database from the UCI repository ' 'http://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.names' 'Variables have been reordered to place variables with numeric values (referred ' 'to as "continuous" on the UCI site) to the left and categorical values to the ' 'right. Specifically, variables 1:16 are: symboling, normalized-losses, ' 'wheel-base, length, width, height, curb-weight, engine-size, bore, stroke, ' 'compression-ratio, horsepower, peak-rpm, city-mpg, highway-mpg, and price. ' 'Variables 17:26 are: make, fuel-type, aspiration, num-of-doors, body-style, ' 'drive-wheels, engine-location, engine-type, num-of-cylinders, and fuel-system. '
The objective of this process is to predict the "symboling," the first variable in the data, from the other predictors. "symboling" is an integer from -3 (good insurance risk) to 3 (poor insurance risk). You could use a classification ensemble to predict this risk instead of a regression ensemble. When you have a choice between regression and classification, you should try regression first. Prepare the data for ensemble fitting. Y = X(:,1); X(:,1) = []; VarNames = {'normalized-losses' 'wheel-base' 'length' 'width' 'height' ... 'curb-weight' 'engine-size' 'bore' 'stroke' 'compression-ratio' ... 'horsepower' 'peak-rpm' 'city-mpg' 'highway-mpg' 'price' 'make' ... 'fuel-type' 'aspiration' 'num-of-doors' 'body-style' 'drive-wheels' ... 'engine-location' 'engine-type' 'num-of-cylinders' 'fuel-system'}; catidx = 16:25; % indices of categorical predictors
Create a regression ensemble from the data using 300 trees.
ls = fitrensemble(X,Y,'Method','LSBoost','NumLearningCycles',300, ...
    'LearnRate',0.1,'PredictorNames',VarNames, ...
    'ResponseName','Symboling','CategoricalPredictors',catidx)
ls = 
  RegressionEnsemble
           PredictorNames: {1x25 cell}
             ResponseName: 'Symboling'
    CategoricalPredictors: [16 17 18 19 20 21 22 23 24 25]
        ResponseTransform: 'none'
          NumObservations: 205
               NumTrained: 300
                   Method: 'LSBoost'
             LearnerNames: {'Tree'}
     ReasonForTermination: 'Terminated normally after completing the requested number of training cycles.'
                  FitInfo: [300x1 double]
       FitInfoDescription: {2x1 cell}
           Regularization: []
The final line, Regularization, is empty ([]). To regularize the ensemble, you have to use the regularize method. cv = crossval(ls,'KFold',5); figure; plot(kfoldLoss(cv,'Mode','Cumulative')); xlabel('Number of trees');
ylabel('Cross-validated MSE'); ylim([0.2,2])
It appears you might obtain satisfactory performance from a smaller ensemble, perhaps one containing from 50 to 100 trees. Call the regularize method to try to find trees that you can remove from the ensemble. By default, regularize examines 10 values of the lasso (Lambda) parameter spaced exponentially.
ls = regularize(ls)
ls = 
  RegressionEnsemble
           PredictorNames: {1x25 cell}
             ResponseName: 'Symboling'
    CategoricalPredictors: [16 17 18 19 20 21 22 23 24 25]
        ResponseTransform: 'none'
          NumObservations: 205
               NumTrained: 300
                   Method: 'LSBoost'
             LearnerNames: {'Tree'}
     ReasonForTermination: 'Terminated normally after completing the requested number of training cycles.'
                  FitInfo: [300x1 double]
       FitInfoDescription: {2x1 cell}
           Regularization: [1x1 struct]
The Regularization property is no longer empty. Plot the resubstitution mean-squared error (MSE) and number of learners with nonzero weights against the lasso parameter. Separately plot the value at Lambda = 0. Use a logarithmic scale because the values of Lambda are exponentially spaced. figure; semilogx(ls.Regularization.Lambda,ls.Regularization.ResubstitutionMSE, ... 'bx-','Markersize',10); line([1e-3 1e-3],[ls.Regularization.ResubstitutionMSE(1) ... ls.Regularization.ResubstitutionMSE(1)],... 'Marker','x','Markersize',10,'Color','b'); r0 = resubLoss(ls); line([ls.Regularization.Lambda(2) ls.Regularization.Lambda(end)],... [r0 r0],'Color','r','LineStyle','--'); xlabel('Lambda'); ylabel('Resubstitution MSE'); annotation('textbox',[0.5 0.22 0.5 0.05],'String','unregularized ensemble', ... 'Color','r','FontSize',14,'LineStyle','none');
figure; loglog(ls.Regularization.Lambda,sum(ls.Regularization.TrainedWeights>0,1)); line([1e-3 1e-3],... [sum(ls.Regularization.TrainedWeights(:,1)>0) ... sum(ls.Regularization.TrainedWeights(:,1)>0)],... 'marker','x','markersize',10,'color','b'); line([ls.Regularization.Lambda(2) ls.Regularization.Lambda(end)],...
[ls.NTrained ls.NTrained],... 'color','r','LineStyle','--'); xlabel('Lambda'); ylabel('Number of learners'); annotation('textbox',[0.3 0.8 0.5 0.05],'String','unregularized ensemble',... 'color','r','FontSize',14,'LineStyle','none');
The resubstitution MSE values are likely to be overly optimistic. To obtain more reliable estimates of the error associated with various values of Lambda, cross validate the ensemble using cvshrink. Plot the resulting cross-validation loss (MSE) and number of learners against Lambda. rng(0,'Twister') % for reproducibility [mse,nlearn] = cvshrink(ls,'Lambda',ls.Regularization.Lambda,'KFold',5); Warning: Some folds do not have any trained weak learners. figure; semilogx(ls.Regularization.Lambda,ls.Regularization.ResubstitutionMSE, ... 'bx-','Markersize',10); hold on; semilogx(ls.Regularization.Lambda,mse,'ro-','Markersize',10); hold off; xlabel('Lambda'); ylabel('Mean squared error'); legend('resubstitution','cross-validation','Location','NW'); line([1e-3 1e-3],[ls.Regularization.ResubstitutionMSE(1) ... ls.Regularization.ResubstitutionMSE(1)],...
'Marker','x','Markersize',10,'Color','b','HandleVisibility','off'); line([1e-3 1e-3],[mse(1) mse(1)],'Marker','o',... 'Markersize',10,'Color','r','LineStyle','--','HandleVisibility','off');
figure; loglog(ls.Regularization.Lambda,sum(ls.Regularization.TrainedWeights>0,1)); hold; Current plot held loglog(ls.Regularization.Lambda,nlearn,'r--'); hold off; xlabel('Lambda'); ylabel('Number of learners'); legend('resubstitution','cross-validation','Location','NE'); line([1e-3 1e-3],... [sum(ls.Regularization.TrainedWeights(:,1)>0) ... sum(ls.Regularization.TrainedWeights(:,1)>0)],... 'Marker','x','Markersize',10,'Color','b','HandleVisibility','off'); line([1e-3 1e-3],[nlearn(1) nlearn(1)],'marker','o',... 'Markersize',10,'Color','r','LineStyle','--','HandleVisibility','off');
Examining the cross-validated error shows that the cross-validation MSE is almost flat for Lambda up to a bit over 1e-2. Examine ls.Regularization.Lambda to find the highest value that gives MSE in the flat region (up to a bit over 1e-2).
jj = 1:length(ls.Regularization.Lambda);
[jj;ls.Regularization.Lambda]
ans = 2×10
    1.0000    2.0000    3.0000    4.0000    5.0000    6.0000    7.0000    8.0000    9.0000   10.0000
         0    0.0019    0.0045    0.0107    0.0254    0.0602    0.1428    0.3387    0.8033    1.9
Element 5 of ls.Regularization.Lambda has value 0.0254, the largest in the flat range. Reduce the ensemble size using the shrink method. shrink returns a compact ensemble with no training data. The generalization error for the new compact ensemble was already estimated by cross validation in mse(5). cmp = shrink(ls,'weightcolumn',5) cmp = CompactRegressionEnsemble PredictorNames: {1x25 cell} ResponseName: 'Symboling' CategoricalPredictors: [16 17 18 19 20 21 22 23 24 25]
ResponseTransform: 'none' NumTrained: 8
The number of trees in the new ensemble has notably reduced from the 300 in ls. Compare the sizes of the ensembles. sz(1) = whos('cmp'); sz(2) = whos('ls'); [sz(1).bytes sz(2).bytes] ans = 1×2 92730
3278143
The size of the reduced ensemble is a fraction of the size of the original. Note that your ensemble sizes can vary depending on your operating system. Compare the MSE of the reduced ensemble to that of the original ensemble. figure; plot(kfoldLoss(cv,'mode','cumulative')); hold on plot(cmp.NTrained,mse(5),'ro','MarkerSize',10); xlabel('Number of trees'); ylabel('Cross-validated MSE'); legend('unregularized ensemble','regularized ensemble',... 'Location','NE'); hold off
The reduced ensemble gives low loss while using many fewer trees.
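The regularized compact ensemble predicts in the usual way; for example, predicting the symboling of the first five cars in this data set:
predictedSymboling = predict(cmp,X(1:5,:))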
See Also fitrensemble | regularize | kfoldLoss | cvshrink | shrink | resubLoss | crossval
Related Examples
• “Regularize Discriminant Analysis Classifier” on page 21-21
• “Framework for Ensemble Learning” on page 19-34
• “Ensemble Algorithms” on page 19-42
Classification with Imbalanced Data This example shows how to perform classification when one class has many more observations than another. You use the RUSBoost algorithm first, because it is designed to handle this case. Another way to handle imbalanced data is to use the name-value pair arguments 'Prior' or 'Cost'. For details, see “Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles” on page 19-87. This example uses the "Cover type" data from the UCI machine learning archive, described in https:// archive.ics.uci.edu/ml/datasets/Covertype. The data classifies types of forest (ground cover), based on predictors such as elevation, soil type, and distance to water. The data has over 500,000 observations and over 50 predictors, so training and using a classifier is time consuming. Blackard and Dean [1] describe a neural net classification of this data. They quote a 70.6% classification accuracy. RUSBoost obtains over 81% classification accuracy. Obtain the data Import the data into your workspace. Extract the last data column into a variable named Y. gunzip('https://archive.ics.uci.edu/ml/machine-learning-databases/covtype/covtype.data.gz') load covtype.data Y = covtype(:,end); covtype(:,end) = [];
Examine the response data
tabulate(Y)
  Value     Count    Percent
      1    211840     36.46%
      2    283301     48.76%
      3     35754      6.15%
      4      2747      0.47%
      5      9493      1.63%
      6     17367      2.99%
      7     20510      3.53%
There are hundreds of thousands of data points. Those of class 4 are less than 0.5% of the total. This imbalance indicates that RUSBoost is an appropriate algorithm. Partition the data for quality assessment Use half the data to fit a classifier, and half to examine the quality of the resulting classifier.
rng(10,'twister') % For reproducibility
part = cvpartition(Y,'Holdout',0.5);
istrain = training(part); % Data for fitting
istest = test(part); % Data for quality assessment
tabulate(Y(istrain))
  Value     Count    Percent
      1    105919     36.46%
      2    141651     48.76%
      3     17877      6.15%
      4      1374      0.47%
      5      4747      1.63%
      6      8684      2.99%
      7     10254      3.53%
Create the ensemble Use deep trees for higher ensemble accuracy. To do so, set the trees to have maximal number of decision splits of N, where N is the number of observations in the training sample. Set LearnRate to 0.1 in order to achieve higher accuracy as well. The data is large, and, with deep trees, creating the ensemble is time consuming. N = sum(istrain); % Number of observations in the training sample t = templateTree('MaxNumSplits',N); tic rusTree = fitcensemble(covtype(istrain,:),Y(istrain),'Method','RUSBoost', ... 'NumLearningCycles',1000,'Learners',t,'LearnRate',0.1,'nprint',100); Training RUSBoost... Grown weak learners: Grown weak learners: Grown weak learners: Grown weak learners: Grown weak learners: Grown weak learners: Grown weak learners: Grown weak learners: Grown weak learners: Grown weak learners:
100 200 300 400 500 600 700 800 900 1000
toc Elapsed time is 242.836734 seconds.
Inspect the classification error Plot the classification error against the number of members in the ensemble. figure; tic plot(loss(rusTree,covtype(istest,:),Y(istest),'mode','cumulative')); toc Elapsed time is 164.470086 seconds. grid on; xlabel('Number of trees'); ylabel('Test classification error');
The ensemble achieves a classification error of under 20% using 116 or more trees. For 500 or more trees, the classification error decreases at a slower rate. Examine the confusion matrix for each class as a percentage of the true class. tic Yfit = predict(rusTree,covtype(istest,:)); toc Elapsed time is 132.353489 seconds. confusionchart(Y(istest),Yfit,'Normalization','row-normalized','RowSummary','row-normalized')
All classes except class 2 have over 90% classification accuracy. But class 2 makes up close to half the data, so the overall accuracy is not that high. Compact the ensemble The ensemble is large. Remove the data using the compact method.
cmpctRus = compact(rusTree);
sz(1) = whos('rusTree');
sz(2) = whos('cmpctRus');
[sz(1).bytes sz(2).bytes]
ans = 1×2
10^9 ×
    1.6579    0.9423
The compacted ensemble is about half the size of the original. Remove half the trees from cmpctRus. This action is likely to have minimal effect on the predictive performance, based on the observation that 500 out of 1000 trees give nearly optimal accuracy. cmpctRus = removeLearners(cmpctRus,[500:1000]); sz(3) = whos('cmpctRus'); sz(3).bytes
ans = 452868660
The reduced compact ensemble takes about a quarter of the memory of the full ensemble. Its overall loss rate is under 19%: L = loss(cmpctRus,covtype(istest,:),Y(istest)) L = 0.1833
The predictive accuracy on new data might differ, because the ensemble accuracy might be biased. The bias arises because the same data used for assessing the ensemble was used for reducing the ensemble size. To obtain an unbiased estimate of requisite ensemble size, you should use cross validation. However, that procedure is time consuming.
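That cross-validation could look like the following sketch (time consuming for this data set; the five folds are an arbitrary choice):
cvRus = crossval(rusTree,'KFold',5);            % refits the ensemble on each training fold
plot(kfoldLoss(cvRus,'Mode','Cumulative'))       % unbiased error as a function of ensemble size
xlabel('Number of trees')
ylabel('Cross-validated classification error')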
References [1] Blackard, J. A. and D. J. Dean. "Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables". Computers and Electronics in Agriculture Vol. 24, Issue 3, 1999, pp. 131–151.
See Also fitcensemble | tabulate | cvpartition | training | test | templateTree | loss | predict | compact | removeLearners | confusionchart
Related Examples
• “Surrogate Splits” on page 19-93
• “Ensemble Algorithms” on page 19-42
• “Test Ensemble Quality” on page 19-69
• “Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles” on page 19-87
• “LPBoost and TotalBoost for Small Ensembles” on page 19-98
• “Tune RobustBoost” on page 19-103
Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles In many applications, you might prefer to treat classes in your data asymmetrically. For example, the data might have many more observations of one class than any other. Or misclassifying observations of one class has more severe consequences than misclassifying observations of another class. In such situations, you can either use the RUSBoost algorithm (specify 'Method' as 'RUSBoost') or use the name-value pair argument 'Prior' or 'Cost' of fitcensemble. If some classes are underrepresented or overrepresented in your training set, use either the 'Prior' name-value pair argument or the RUSBoost algorithm. For example, suppose you obtain your training data by simulation. Because simulating class A is more expensive than simulating class B, you choose to generate fewer observations of class A and more observations of class B. The expectation, however, is that class A and class B are mixed in a different proportion in real (nonsimulated) situations. In this case, use 'Prior' to set prior probabilities for class A and B approximately to the values you expect to observe in a real situation. The fitcensemble function normalizes prior probabilities to make them add up to 1. Multiplying all prior probabilities by the same positive factor does not affect the result of classification. Another way to handle imbalanced data is to use the RUSBoost algorithm ('Method','RUSBoost'). You do not need to adjust the prior probabilities when using this algorithm. For details, see “Random Undersampling Boosting” on page 19-54 and “Classification with Imbalanced Data” on page 19-82. If classes are adequately represented in the training data but you want to treat them asymmetrically, use the 'Cost' name-value pair argument. Suppose you want to classify benign and malignant tumors in cancer patients. Failure to identify a malignant tumor (false negative) has far more severe consequences than misidentifying benign as malignant (false positive). You should assign high cost to misidentifying malignant as benign and low cost to misidentifying benign as malignant. You must pass misclassification costs as a square matrix with nonnegative elements. Element C(i,j) of this matrix is the cost of classifying an observation into class j if the true class is i. The diagonal elements C(i,i) of the cost matrix must be 0. For the previous example, you can choose malignant tumor to be class 1 and benign tumor to be class 2. Then you can set the cost matrix to 0 c 10 where c > 1 is the cost of misidentifying a malignant tumor as benign. Costs are relative—multiplying all costs by the same positive factor does not affect the result of classification. If you have only two classes, fitcensemble adjusts their prior probabilities using P i = Ci jPifor class i = 1,2 and j ≠ i. Pi are prior probabilities either passed into fitcensemble or computed from class frequencies in the training data, and P i are adjusted prior probabilities. fitcensemble uses the adjusted probabilities for training its weak learners and does not use the cost matrix. Manipulating the cost matrix is thus equivalent to manipulating the prior probabilities. If you have three or more classes, fitcensemble also converts input costs into adjusted prior probabilities. This conversion is more complex. First, fitcensemble attempts to solve a matrix equation described in Zhou and Liu [1]. 
If it fails to find a solution, fitcensemble applies the “average cost” adjustment described in Breiman et al. [2]. For more information, see Zadrozny, Langford, and Abe [3].
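A minimal sketch of both options (it assumes a predictor matrix X and a two-class label vector Y; the prior values and the cost of 5 are illustrative only):
% Prior probabilities, in the order of the sorted class names:
mdlPrior = fitcensemble(X,Y,'Method','AdaBoostM1','Prior',[0.6 0.4]);
% Misclassification costs: classifying a true class 1 observation into class 2 costs 5:
mdlCost = fitcensemble(X,Y,'Method','AdaBoostM1','Cost',[0 5; 1 0]);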
Train Ensemble With Unequal Classification Costs This example shows how to train an ensemble of classification trees with unequal classification costs. This example uses data on patients with hepatitis to see if they live or die as a result of the disease. The data set is described at UCI Machine Learning Data Repository. Read the hepatitis data set from the UCI repository as a character array. Then convert the result to a cell array of character vectors using textscan. Specify a cell array of character vectors containing the variable names. options = weboptions('ContentType','text'); hepatitis = textscan(webread(['http://archive.ics.uci.edu/ml/' ... 'machine-learning-databases/hepatitis/hepatitis.data'],options),... '%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f','Delimiter',',',... 'EndOfLine','\n','TreatAsEmpty','?'); size(hepatitis) ans = 1×2 1
20
VarNames = {'dieOrLive' 'age' 'sex' 'steroid' 'antivirals' 'fatigue' ... 'malaise' 'anorexia' 'liverBig' 'liverFirm' 'spleen' ... 'spiders' 'ascites' 'varices' 'bilirubin' 'alkPhosphate' 'sgot' ... 'albumin' 'protime' 'histology'};
hepatitis is a 1-by-20 cell array of character vectors. The cells correspond to the response (liveOrDie) and 19 heterogeneous predictors. Specify a numeric matrix containing the predictors and a cell vector containing 'Die' and 'Live', which are response categories. The response contains two values: 1 indicates that a patient died, and 2 indicates that a patient lived. Specify a cell array of character vectors for the response using the response categories. The first variable in hepatitis contains the response. X = cell2mat(hepatitis(2:end)); ClassNames = {'Die' 'Live'}; Y = ClassNames(hepatitis{:,1});
X is a numeric matrix containing the 19 predictors. Y is a cell array of character vectors containing the response. Inspect the data for missing values. figure barh(sum(isnan(X),1)/size(X,1)) h = gca; h.YTick = 1:numel(VarNames) - 1; h.YTickLabel = VarNames(2:end); ylabel('Predictor') xlabel('Fraction of missing values')
Most predictors have missing values, and one has nearly 45% of the missing values. Therefore, use decision trees with surrogate splits for better accuracy. Because the data set is small, training time with surrogate splits should be tolerable. Create a classification tree template that uses surrogate splits. rng(0,'twister') % For reproducibility t = templateTree('surrogate','all');
Examine the data or the description of the data to see which predictors are categorical. X(1:5,:) ans = 5×19 30.0000 50.0000 78.0000 31.0000 34.0000
2.0000 1.0000 1.0000 1.0000 1.0000
1.0000 1.0000 2.0000 NaN 2.0000
2.0000 2.0000 2.0000 1.0000 2.0000
2.0000 1.0000 1.0000 2.0000 2.0000
2.0000 2.0000 2.0000 2.0000 2.0000
2.0000 2.0000 2.0000 2.0000 2.0000
1.0000 1.0000 2.0000 2.0000 2.0000
2.0000 2.0000 2.0000 2.0000 2.0000
It appears that predictors 2 through 13 are categorical, as well as predictor 19. You can confirm this inference using the data set description at UCI Machine Learning Data Repository. List the categorical variables. catIdx = [2:13,19];
Create a cross-validated ensemble using 50 learners and the GentleBoost algorithm. Ensemble = fitcensemble(X,Y,'Method','GentleBoost', ... 'NumLearningCycles',50,'Learners',t,'PredictorNames',VarNames(2:end), ... 'LearnRate',0.1,'CategoricalPredictors',catIdx,'KFold',5);
Inspect the confusion matrix to see which patients the ensemble predicts correctly. [yFit,sFit] = kfoldPredict(Ensemble); confusionchart(Y,yFit)
Of the 123 patients who live, the ensemble predicts correctly that 116 will live. But for the 32 patients who die of hepatitis, the ensemble only predicts correctly that about half will die of hepatitis. There are two types of error in the predictions of the ensemble:
• Predicting that the patient lives, but the patient dies
• Predicting that the patient dies, but the patient lives
Suppose you believe that the first error is five times worse than the second. Create a new classification cost matrix that reflects this belief. cost.ClassNames = ClassNames; cost.ClassificationCosts = [0 5; 1 0];
Create a new cross-validated ensemble using cost as the misclassification cost, and inspect the resulting confusion matrix. 19-90
EnsembleCost = fitcensemble(X,Y,'Method','GentleBoost', ... 'NumLearningCycles',50,'Learners',t,'PredictorNames',VarNames(2:end), ... 'LearnRate',0.1,'CategoricalPredictors',catIdx,'KFold',5,'Cost',cost); [yFitCost,sFitCost] = kfoldPredict(EnsembleCost); confusionchart(Y,yFitCost)
As expected, the new ensemble does a better job classifying the patients who die. Somewhat surprisingly, the new ensemble also does a better job classifying the patients who live, though the result is not statistically significantly better. The results of the cross validation are random, so this result is simply a statistical fluctuation. The result seems to indicate that the classification of patients who live is not very sensitive to the cost.
References [1] Zhou, Z.-H., and X.-Y. Liu. “On Multi-Class Cost-Sensitive Learning.” Computational Intelligence. Vol. 26, Issue 3, 2010, pp. 232–257 CiteSeerX. [2] Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Boca Raton, FL: Chapman & Hall, 1984. [3] Zadrozny, B., J. Langford, and N. Abe. “Cost-Sensitive Learning by Cost-Proportionate Example Weighting.” Third IEEE International Conference on Data Mining, 435–442. 2003.
See Also fitcensemble | templateTree | kfoldLoss | kfoldPredict | confusionchart 19-91
Related Examples
• “Surrogate Splits” on page 19-93
• “Ensemble Algorithms” on page 19-42
• “Test Ensemble Quality” on page 19-69
• “Classification with Imbalanced Data” on page 19-82
• “LPBoost and TotalBoost for Small Ensembles” on page 19-98
• “Tune RobustBoost” on page 19-103
• “Misclassification Cost Matrix, Prior Probabilities, and Observation Weights” on page 19-8
Surrogate Splits When the value of the optimal split predictor for an observation is missing, if you specify to use surrogate splits, the software sends the observation to the left or right child node using the best surrogate predictor. When you have missing data, trees and ensembles of trees with surrogate splits give better predictions. This example shows how to improve the accuracy of predictions for data with missing values by using decision trees with surrogate splits. Load Sample Data Load the ionosphere data set. load ionosphere
Partition the data set into training and test sets. Hold out 30% of the data for testing. rng('default') % For reproducibility cv = cvpartition(Y,'Holdout',0.3);
Identify the training and testing data. Xtrain = X(training(cv),:); Ytrain = Y(training(cv)); Xtest = X(test(cv),:); Ytest = Y(test(cv));
Suppose half of the values in the test set are missing. Set half of the values in the test set to NaN. Xtest(rand(size(Xtest))>0.5) = NaN;
Train Random Forest Train a random forest of 150 classification trees without surrogate splits. templ = templateTree('Reproducible',true); % For reproducibility of random predictor selections Mdl = fitcensemble(Xtrain,Ytrain,'Method','Bag','NumLearningCycles',150,'Learners',templ);
Create a decision tree template that uses surrogate splits. A tree using surrogate splits does not discard the entire observation when it includes missing data in some predictors. templS = templateTree('Surrogate','On','Reproducible',true);
Train a random forest using the template templS. Mdls = fitcensemble(Xtrain,Ytrain,'Method','Bag','NumLearningCycles',150,'Learners',templS);
Test Accuracy Test the accuracy of predictions with and without surrogate splits. Predict responses and create confusion matrix charts using both approaches. Ytest_pred = predict(Mdl,Xtest); figure cm = confusionchart(Ytest,Ytest_pred); cm.Title = 'Model Without Surrogates';
Ytest_preds = predict(Mdls,Xtest); figure cms = confusionchart(Ytest,Ytest_preds); cms.Title = 'Model with Surrogates';
All off-diagonal elements on the confusion matrix represent misclassified data. A good classifier yields a confusion matrix that looks dominantly diagonal. In this case, the classification error is lower for the model trained with surrogate splits. Estimate cumulative classification errors. Specify 'Mode','Cumulative' when estimating classification errors by using the loss function. The loss function returns a vector in which element J indicates the error using the first J learners. figure plot(loss(Mdl,Xtest,Ytest,'Mode','Cumulative')) hold on plot(loss(Mdls,Xtest,Ytest,'Mode','Cumulative'),'r--') legend('Trees without surrogate splits','Trees with surrogate splits') xlabel('Number of trees') ylabel('Test classification error')
The error value decreases as the number of trees increases, which indicates good performance. The classification error is lower for the model trained with surrogate splits. Check the statistical significance of the difference in results by using compareHoldout. This function uses the McNemar test.
[~,p] = compareHoldout(Mdls,Mdl,Xtest,Xtest,Ytest,'Alternative','greater')
p = 0.0384
The low p-value indicates that the ensemble with surrogate splits is better in a statistically significant manner. Estimate Predictor Importance Predictor importance estimates can vary depending on whether or not a tree uses surrogate splits. Estimate predictor importance measures by permuting out-of-bag observations. Then, find the five most important predictors.
imp = oobPermutedPredictorImportance(Mdl);
[~,ind] = maxk(imp,5)
ind = 1×5
     5     3    27     8    14
imps = oobPermutedPredictorImportance(Mdls);
[~,inds] = maxk(imps,5)
inds = 1×5
     3     5     8    27     7
After estimating predictor importance, you can exclude unimportant predictors and train a model again. Eliminating unimportant predictors saves time and memory for predictions, and makes predictions easier to understand. If the training data includes many predictors and you want to analyze predictor importance, then specify 'NumVariablesToSample' of the templateTree function as 'all' for the tree learners of the ensemble. Otherwise, the software might not select some predictors, underestimating their importance. For an example, see “Select Predictors for Random Forests” on page 19-63.
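A sketch combining both recommendations for this example's training data (the ensemble size of 150 matches the earlier calls and is otherwise arbitrary):
tAll = templateTree('NumVariablesToSample','all','Surrogate','on','Reproducible',true);
MdlAll = fitcensemble(Xtrain,Ytrain,'Method','Bag','NumLearningCycles',150,'Learners',tAll);
impAll = oobPermutedPredictorImportance(MdlAll);   % importance with no predictors left unsampled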
See Also compareHoldout | fitcensemble | fitrensemble
Related Examples
• “Ensemble Algorithms” on page 19-42
• “Test Ensemble Quality” on page 19-69
• “Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles” on page 19-87
• “Classification with Imbalanced Data” on page 19-82
• “LPBoost and TotalBoost for Small Ensembles” on page 19-98
• “Tune RobustBoost” on page 19-103
LPBoost and TotalBoost for Small Ensembles This example shows how to obtain the benefits of the LPBoost and TotalBoost algorithms. These algorithms share two beneficial characteristics: • They are self-terminating, which means you do not have to figure out how many members to include. • They produce ensembles with some very small weights, enabling you to safely remove ensemble members. Load the data Load the ionosphere data set. load ionosphere
Create the classification ensembles Create ensembles for classifying the ionosphere data using the LPBoost, TotalBoost, and, for comparison, AdaBoostM1 algorithms. It is hard to know how many members to include in an ensemble. For LPBoost and TotalBoost, try using 500. For comparison, also use 500 for AdaBoostM1. The default weak learners for boosting methods are decision trees with the MaxNumSplits property set to 10. These trees tend to fit better than tree stumps (with 1 maximum split) and may overfit more. Therefore, to prevent overfitting, use tree stumps as weak learners for the ensembles. rng('default') % For reproducibility T = 500; treeStump = templateTree('MaxNumSplits',1); adaStump = fitcensemble(X,Y,'Method','AdaBoostM1','NumLearningCycles',T,'Learners',treeStump); totalStump = fitcensemble(X,Y,'Method','TotalBoost','NumLearningCycles',T,'Learners',treeStump); lpStump = fitcensemble(X,Y,'Method','LPBoost','NumLearningCycles',T,'Learners',treeStump); figure plot(resubLoss(adaStump,'Mode','Cumulative')); hold on plot(resubLoss(totalStump,'Mode','Cumulative'),'r'); plot(resubLoss(lpStump,'Mode','Cumulative'),'g'); hold off xlabel('Number of stumps'); ylabel('Training error'); legend('AdaBoost','TotalBoost','LPBoost','Location','NE');
19-98
All three algorithms achieve perfect prediction on the training data after a while. Examine the number of members in all three ensembles.
[adaStump.NTrained totalStump.NTrained lpStump.NTrained]
ans = 1×3
   500    52    79
AdaBoostM1 trained all 500 members. The other two algorithms stopped training early. Cross validate the ensembles Cross validate the ensembles to better determine ensemble accuracy. cvlp = crossval(lpStump,'KFold',5); cvtotal = crossval(totalStump,'KFold',5); cvada = crossval(adaStump,'KFold',5); figure plot(kfoldLoss(cvada,'Mode','Cumulative')); hold on plot(kfoldLoss(cvtotal,'Mode','Cumulative'),'r'); plot(kfoldLoss(cvlp,'Mode','Cumulative'),'g'); hold off xlabel('Ensemble size');
ylabel('Cross-validated error'); legend('AdaBoost','TotalBoost','LPBoost','Location','NE');
The results show that each boosting algorithm achieves a loss of 10% or lower with 50 ensemble members. Compact and remove ensemble members To reduce the ensemble sizes, compact them, and then use removeLearners. The question is, how many learners should you remove? The cross-validated loss curves give you one measure. For another, examine the learner weights for LPBoost and TotalBoost after compacting. cada = compact(adaStump); clp = compact(lpStump); ctotal = compact(totalStump); figure subplot(2,1,1) plot(clp.TrainedWeights) title('LPBoost weights') subplot(2,1,2) plot(ctotal.TrainedWeights) title('TotalBoost weights')
Both LPBoost and TotalBoost show clear points where the ensemble member weights become negligible. Remove the unimportant ensemble members. cada = removeLearners(cada,150:cada.NTrained); clp = removeLearners(clp,60:clp.NTrained); ctotal = removeLearners(ctotal,40:ctotal.NTrained);
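Instead of reading the cutoffs off the plot, you can pick them programmatically from the trained weights. The following sketch is not part of the original example; it drops every learner whose weight falls below a tolerance, and the value 1e-3 is an arbitrary assumption for illustration.
tol = 1e-3;                                % assumed tolerance, not prescribed by the example
clpAuto = compact(lpStump);                % fresh compact copies of the boosted ensembles
ctotalAuto = compact(totalStump);
clpAuto = removeLearners(clpAuto,find(clpAuto.TrainedWeights < tol));
ctotalAuto = removeLearners(ctotalAuto,find(ctotalAuto.TrainedWeights < tol));
[clpAuto.NTrained ctotalAuto.NTrained]     % compare with the manually chosen cutoffs above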
Check that removing these learners does not affect ensemble accuracy on the training data. [loss(cada,X,Y) loss(clp,X,Y) loss(ctotal,X,Y)] ans = 1×3
     0     0     0
Check the resulting compact ensemble sizes. s(1) = whos('cada'); s(2) = whos('clp'); s(3) = whos('ctotal'); s.bytes ans = 616194 ans = 246170
ans = 163950
The sizes of the compact ensembles are approximately proportional to the number of members in each.
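As a quick check of that proportionality, divide each size by the number of retained learners. This snippet is illustrative only; the exact byte counts depend on your platform and release.
bytesPerLearner = [s.bytes] ./ [cada.NTrained clp.NTrained ctotal.NTrained]
% The three values are roughly equal, so storage scales with the number of members.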
See Also fitcensemble | resubLoss | crossval | kfoldLoss | compact | loss | removeLearners
Related Examples
•
“Surrogate Splits” on page 19-93
•
“Ensemble Algorithms” on page 19-42
•
“Test Ensemble Quality” on page 19-69
•
“Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles” on page 19-87
•
“Classification with Imbalanced Data” on page 19-82
•
“Tune RobustBoost” on page 19-103
Tune RobustBoost The RobustBoost algorithm can make good classification predictions even when the training data has noise. However, the default RobustBoost parameters can produce an ensemble that does not predict well. This example shows one way of tuning the parameters for better predictive accuracy. Generate data with label noise. This example has twenty uniform random numbers per observation, and classifies the observation as 1 if the sum of the first five numbers exceeds 2.5 (so is larger than average), and 0 otherwise: rng(0,'twister') % for reproducibility Xtrain = rand(2000,20); Ytrain = sum(Xtrain(:,1:5),2) > 2.5;
To add noise, randomly switch 10% of the classifications: idx = randsample(2000,200); Ytrain(idx) = ~Ytrain(idx);
Create an ensemble with AdaBoostM1 for comparison purposes: ada = fitcensemble(Xtrain,Ytrain,'Method','AdaBoostM1', ... 'NumLearningCycles',300,'Learners','Tree','LearnRate',0.1);
Create an ensemble with RobustBoost. Because the data has 10% incorrect classification, perhaps an error goal of 15% is reasonable. rb1 = fitcensemble(Xtrain,Ytrain,'Method','RobustBoost', ... 'NumLearningCycles',300,'Learners','Tree','RobustErrorGoal',0.15, ... 'RobustMaxMargin',1);
Note that if you set the error goal to a high enough value, then the software returns an error. Create an ensemble with very optimistic error goal, 0.01: rb2 = fitcensemble(Xtrain,Ytrain,'Method','RobustBoost', ... 'NumLearningCycles',300,'Learners','Tree','RobustErrorGoal',0.01);
Compare the resubstitution error of the three ensembles: figure plot(resubLoss(rb1,'Mode','Cumulative')); hold on plot(resubLoss(rb2,'Mode','Cumulative'),'r--'); plot(resubLoss(ada,'Mode','Cumulative'),'g.'); hold off; xlabel('Number of trees'); ylabel('Resubstitution error'); legend('ErrorGoal=0.15','ErrorGoal=0.01',... 'AdaBoostM1','Location','NE');
All the RobustBoost curves show lower resubstitution error than the AdaBoostM1 curve. The error goal of 0.01 curve shows the lowest resubstitution error over most of the range. Xtest = rand(2000,20); Ytest = sum(Xtest(:,1:5),2) > 2.5; idx = randsample(2000,200); Ytest(idx) = ~Ytest(idx); figure; plot(loss(rb1,Xtest,Ytest,'Mode','Cumulative')); hold on plot(loss(rb2,Xtest,Ytest,'Mode','Cumulative'),'r--'); plot(loss(ada,Xtest,Ytest,'Mode','Cumulative'),'g.'); hold off; xlabel('Number of trees'); ylabel('Test error'); legend('ErrorGoal=0.15','ErrorGoal=0.01',... 'AdaBoostM1','Location','NE');
The error curve for error goal 0.15 is lowest (best) in the plotted range. AdaBoostM1 has higher error than the curve for error goal 0.15. The curve for the too-optimistic error goal 0.01 remains substantially higher (worse) than the other algorithms for most of the plotted range.
See Also fitcensemble | resubLoss | loss
Related Examples •
“Surrogate Splits” on page 19-93
•
“Ensemble Algorithms” on page 19-42
•
“Test Ensemble Quality” on page 19-69
•
“Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles” on page 19-87
•
“Classification with Imbalanced Data” on page 19-82
•
“LPBoost and TotalBoost for Small Ensembles” on page 19-98
Random Subspace Classification This example shows how to use a random subspace ensemble to increase the accuracy of classification. It also shows how to use cross validation to determine good parameters for both the weak learner template and the ensemble. Load the data Load the ionosphere data. This data has 351 binary responses to 34 predictors. load ionosphere; [N,D] = size(X) N = 351 D = 34 resp = unique(Y) resp = 2x1 cell {'b'} {'g'}
Choose the number of nearest neighbors Find a good choice for k, the number of nearest neighbors in the classifier, by cross validation. Choose the number of neighbors approximately evenly spaced on a logarithmic scale. rng(8000,'twister') % for reproducibility K = round(logspace(0,log10(N),10)); % number of neighbors cvloss = zeros(numel(K),1); for k=1:numel(K) knn = fitcknn(X,Y,... 'NumNeighbors',K(k),'CrossVal','On'); cvloss(k) = kfoldLoss(knn); end figure; % Plot the accuracy versus k semilogx(K,cvloss); xlabel('Number of nearest neighbors'); ylabel('10 fold classification error'); title('KNN classification');
The lowest cross-validation error occurs for k = 2. Create the ensembles Create ensembles for 2-nearest neighbor classification with various numbers of dimensions, and examine the cross-validated loss of the resulting ensembles. This step takes a long time. To keep track of the progress, print a message as each dimension finishes. NPredToSample = round(linspace(1,D,10)); % linear spacing of dimensions cvloss = zeros(numel(NPredToSample),1); learner = templateKNN('NumNeighbors',2); for npred=1:numel(NPredToSample) subspace = fitcensemble(X,Y,'Method','Subspace','Learners',learner, ... 'NPredToSample',NPredToSample(npred),'CrossVal','On'); cvloss(npred) = kfoldLoss(subspace); fprintf('Random Subspace %i done.\n',npred); end
Random Subspace 1 done.
Random Subspace 2 done.
Random Subspace 3 done.
Random Subspace 4 done.
Random Subspace 5 done.
Random Subspace 6 done.
Random Subspace 7 done.
Random Subspace 8 done. Random Subspace 9 done. Random Subspace 10 done. figure; % plot the accuracy versus dimension plot(NPredToSample,cvloss); xlabel('Number of predictors selected at random'); ylabel('10 fold classification error'); title('KNN classification with Random Subspace');
The ensembles that use five and eight predictors per learner have the lowest cross-validated error. The error rate for these ensembles is about 0.06, while the other ensembles have cross-validated error rates that are approximately 0.1 or more. Find a good ensemble size Find the smallest number of learners in the ensemble that still give good classification. ens = fitcensemble(X,Y,'Method','Subspace','Learners',learner, ... 'NPredToSample',5,'CrossVal','on'); figure; % Plot the accuracy versus number in ensemble plot(kfoldLoss(ens,'Mode','Cumulative')) xlabel('Number of learners in ensemble'); ylabel('10 fold classification error'); title('KNN classification with Random Subspace');
There seems to be no advantage in an ensemble with more than 50 or so learners. It is possible that 25 learners give good predictions. Create a final ensemble Construct a final ensemble with 50 learners. Compact the ensemble and see if the compacted version saves an appreciable amount of memory. ens = fitcensemble(X,Y,'Method','Subspace','NumLearningCycles',50,... 'Learners',learner,'NPredToSample',5); cens = compact(ens); s1 = whos('ens'); s2 = whos('cens'); [s1.bytes s2.bytes] % si.bytes = size in bytes ans = 1×2
     1757468     1527439
The compact ensemble is about 10% smaller than the full ensemble. Both give the same predictions.
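As a sanity check, which is not part of the original example, you can confirm that compacting does not change the predictions on the training data.
isequal(predict(ens,X),predict(cens,X))   % expected to return logical 1 (true)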
See Also fitcknn | fitcensemble | kfoldLoss | templateKNN | compact
Related Examples
•
“Framework for Ensemble Learning” on page 19-34
•
“Ensemble Algorithms” on page 19-42
•
“Train Classification Ensemble” on page 19-57
•
“Test Ensemble Quality” on page 19-69
Train Classification Ensemble in Parallel This example shows how to train a classification ensemble in parallel. The model has ten red and ten green base locations, and red and green populations that are normally distributed and centered at the base locations. The objective is to classify points based on their locations. These classifications are ambiguous because some base locations are near the locations of the other color. Create and plot ten base locations of each color. rng default % For reproducibility grnpop = mvnrnd([1,0],eye(2),10); redpop = mvnrnd([0,1],eye(2),10); plot(grnpop(:,1),grnpop(:,2),'go') hold on plot(redpop(:,1),redpop(:,2),'ro') hold off
Create 40,000 points of each color centered on random base points. N = 40000; redpts = zeros(N,2);grnpts = redpts; for i = 1:N grnpts(i,:) = mvnrnd(grnpop(randi(10),:),eye(2)*0.02); redpts(i,:) = mvnrnd(redpop(randi(10),:),eye(2)*0.02); end figure
plot(grnpts(:,1),grnpts(:,2),'go') hold on plot(redpts(:,1),redpts(:,2),'ro') hold off
cdata = [grnpts;redpts]; grp = ones(2*N,1); % Green label 1, red label -1 grp(N+1:2*N) = -1;
Fit a bagged classification ensemble to the data. For comparison with parallel training, fit the ensemble in serial and return the training time. tic mdl = fitcensemble(cdata,grp,'Method','Bag'); stime = toc stime = 12.4671
Evaluate the out-of-bag loss for the fitted model. myerr = oobLoss(mdl) myerr = 0.0572
Create a bagged classification model in parallel, using a reproducible tree template and parallel substreams. You can create a parallel pool on a cluster or a parallel pool of thread workers on your
local machine. To choose the appropriate parallel environment, see “Choose Between Thread-Based and Process-Based Environments” (Parallel Computing Toolbox). parpool
Starting parallel pool (parpool) using the 'local' profile ... Connected to the parallel pool (number of workers: 8).
ans =
 ProcessPool with properties:
            Connected: true
           NumWorkers: 8
                 Busy: false
              Cluster: local
        AttachedFiles: {}
    AutoAddClientPath: true
            FileStore: [1x1 parallel.FileStore]
           ValueStore: [1x1 parallel.ValueStore]
          IdleTimeout: 30 minutes (30 minutes remaining)
          SpmdEnabled: true
s = RandStream('mrg32k3a'); options = statset("UseParallel",true,"UseSubstreams",true,"Streams",s); t = templateTree("Reproducible",true); tic mdl2 = fitcensemble(cdata,grp,'Method','Bag','Learners',t,'Options',options); ptime = toc ptime = 5.9234
On this six-core system, the training process in parallel is faster. speedup = stime/ptime speedup = 2.1047
Evaluate the out-of-bag loss for this model. myerr2 = oobLoss(mdl2) myerr2 = 0.0577
The error rate is similar to the rate of the first model. To demonstrate the reproducibility of the model, reset the random number stream and fit the model again. reset(s); tic mdl2 = fitcensemble(cdata,grp,'Method','Bag','Learners',t,'Options',options); toc Elapsed time is 3.446164 seconds.
Check that the loss is the same as the previous loss. myerr2 = oobLoss(mdl2)
myerr2 = 0.0577
See Also fitcensemble | fitrensemble
Related Examples
•
“Classification Ensembles”
•
“Regression Tree Ensembles”
Bootstrap Aggregation (Bagging) of Regression Trees Using TreeBagger Statistics and Machine Learning Toolbox™ offers two objects that support bootstrap aggregation (bagging) of regression trees: TreeBagger created by using TreeBagger and RegressionBaggedEnsemble created by using fitrensemble. See “Comparison of TreeBagger and Bagged Ensembles” on page 19-47 for differences between TreeBagger and RegressionBaggedEnsemble. This example shows the workflow for regression using the features in TreeBagger only. Use a database of 1985 car imports with 205 observations, 25 predictors, and 1 response, which is insurance risk rating, or "symboling." The first 15 variables are numeric and the last 10 are categorical. The symboling index takes integer values from -3 to 3. Load the data set and split it into predictor and response arrays. load imports-85 Y = X(:,1); X = X(:,2:end); isCategorical = [zeros(15,1); ones(size(X,2)-15,1)]; % Categorical variable flag
Because bagging uses randomized data drawings, its exact outcome depends on the initial random seed. To reproduce the results in this example, use the random stream settings. rng(1945,'twister')
Finding the Optimal Leaf Size For regression, the general rule is to the set leaf size to 5 and select one third of the input features for decision splits at random. In the following step, verify the optimal leaf size by comparing mean squared errors obtained by regression for various leaf sizes. oobError computes MSE versus the number of grown trees. You must set OOBPred to 'On' to obtain out-of-bag predictions later. leaf = [5 10 20 50 100]; col = 'rbcmy'; figure hold on for i=1:length(leaf) b = TreeBagger(50,X,Y,'Method','regression', ... 'OOBPrediction','On', ... 'CategoricalPredictors',find(isCategorical == 1), ... 'MinLeafSize',leaf(i)); plot(oobError(b),col(i)) end xlabel('Number of Grown Trees') ylabel('Mean Squared Error') legend({'5' '10' '20' '50' '100'},'Location','NorthEast') hold off
19-115
19
Nonparametric Supervised Learning
The red curve (leaf size 5) yields the lowest MSE values. Estimating Feature Importance In practical applications, you typically grow ensembles with hundreds of trees. For example, the previous code block uses 50 trees for faster processing. Now that you have estimated the optimal leaf size, grow a larger ensemble with 100 trees and use it to estimate feature importance. b = TreeBagger(100,X,Y,'Method','regression', ... 'OOBPredictorImportance','On', ... 'CategoricalPredictors',find(isCategorical == 1), ... 'MinLeafSize',5);
Inspect the error curve again to make sure nothing went wrong during training. figure plot(oobError(b)) xlabel('Number of Grown Trees') ylabel('Out-of-Bag Mean Squared Error')
19-116
Bootstrap Aggregation (Bagging) of Regression Trees Using TreeBagger
Prediction ability should depend more on important features than unimportant features. You can use this idea to measure feature importance. For each feature, permute the values of this feature across every observation in the data set and measure how much worse the MSE becomes after the permutation. You can repeat this for each feature. Plot the increase in MSE due to permuting out-of-bag observations across each input variable. The OOBPermutedPredictorDeltaError array stores the increase in MSE averaged over all trees in the ensemble and divided by the standard deviation taken over the trees, for each variable. The larger this value, the more important the variable. Imposing an arbitrary cutoff at 0.7, you can select the four most important features. figure bar(b.OOBPermutedPredictorDeltaError) xlabel('Feature Number') ylabel('Out-of-Bag Feature Importance')
19-117
19
Nonparametric Supervised Learning
idxvar = find(b.OOBPermutedPredictorDeltaError>0.7) idxvar = 1×4 1
2
16
19
idxCategorical = find(isCategorical(idxvar)==1);
The OOBIndices property of TreeBagger tracks which observations are out of bag for what trees. Using this property, you can monitor the fraction of observations in the training data that are in bag for all trees. The curve starts at approximately 2/3, which is the fraction of unique observations selected by one bootstrap replica, and goes down to 0 at approximately 10 trees. finbag = zeros(1,b.NTrees); for t=1:b.NTrees finbag(t) = sum(all(~b.OOBIndices(:,1:t),2)); end finbag = finbag/size(X,1); figure plot(finbag) xlabel('Number of Grown Trees') ylabel('Fraction of In-Bag Observations')
19-118
Bootstrap Aggregation (Bagging) of Regression Trees Using TreeBagger
Growing Trees on a Reduced Set of Features Using just the four most powerful features, determine if it is possible to obtain a similar predictive power. To begin, grow 100 trees on these features only. The first two of the four selected features are numeric and the last two are categorical. b5v = TreeBagger(100,X(:,idxvar),Y, ... 'Method','regression','OOBPredictorImportance','On', ... 'CategoricalPredictors',idxCategorical,'MinLeafSize',5); figure plot(oobError(b5v)) xlabel('Number of Grown Trees') ylabel('Out-of-Bag Mean Squared Error')
19-119
19
Nonparametric Supervised Learning
figure bar(b5v.OOBPermutedPredictorDeltaError) xlabel('Feature Index') ylabel('Out-of-Bag Feature Importance')
19-120
Bootstrap Aggregation (Bagging) of Regression Trees Using TreeBagger
These four most powerful features give the same MSE as the full set, and the ensemble trained on the reduced set ranks these features similarly to each other. If you remove features 1 and 2 from the reduced set, then the predictive power of the algorithm might not decrease significantly. Finding Outliers To find outliers in the training data, compute the proximity matrix using fillProximities. b5v = fillProximities(b5v);
The method normalizes this measure by subtracting the mean outlier measure for the entire sample. Then it takes the magnitude of this difference and divides the result by the median absolute deviation for the entire sample. figure histogram(b5v.OutlierMeasure) xlabel('Outlier Measure') ylabel('Number of Observations')
19-121
19
Nonparametric Supervised Learning
Discovering Clusters in the Data By applying multidimensional scaling to the computed matrix of proximities, you can inspect the structure of the input data and look for possible clusters of observations. The mdsProx method returns scaled coordinates and eigenvalues for the computed proximity matrix. If you run it with the Colors name-value-pair argument, then this method creates a scatter plot of two scaled coordinates. figure [~,e] = mdsProx(b5v,'Colors','K'); xlabel('First Scaled Coordinate') ylabel('Second Scaled Coordinate')
19-122
Bootstrap Aggregation (Bagging) of Regression Trees Using TreeBagger
Assess the relative importance of the scaled axes by plotting the first 20 eigenvalues. figure bar(e(1:20)) xlabel('Scaled Coordinate Index') ylabel('Eigenvalue')
19-123
19
Nonparametric Supervised Learning
Saving the Ensemble Configuration for Future Use To use the trained ensemble for predicting the response on unseen data, store the ensemble to disk and retrieve it later. If you do not want to compute predictions for out-of-bag data or reuse training data in any other way, there is no need to store the ensemble object itself. Saving the compact version of the ensemble is enough in this case. Extract the compact object from the ensemble. c = compact(b5v) c = CompactTreeBagger Ensemble with 100 bagged decision trees: Method: regression NumPredictors: 4
You can save the resulting CompactTreeBagger model in a *.mat file.
See Also TreeBagger | compact | oobError | mdsprox | fillprox | fitrensemble
Related Examples
19-124
•
“Bootstrap Aggregation (Bagging) of Classification Trees Using TreeBagger” on page 19-126
•
“Comparison of TreeBagger and Bagged Ensembles” on page 19-47
Bootstrap Aggregation (Bagging) of Regression Trees Using TreeBagger
•
“Use Parallel Processing for Regression TreeBagger Workflow” on page 33-4
Bootstrap Aggregation (Bagging) of Classification Trees Using TreeBagger Statistics and Machine Learning Toolbox™ offers two objects that support bootstrap aggregation (bagging) of classification trees: TreeBagger created by using TreeBagger and ClassificationBaggedEnsemble created by using fitcensemble. See “Comparison of TreeBagger and Bagged Ensembles” on page 19-47 for differences between TreeBagger and ClassificationBaggedEnsemble. This example shows the workflow for classification using the features in TreeBagger only. Use ionosphere data with 351 observations and 34 real-valued predictors. The response variable is categorical with two levels: • 'g' represents good radar returns. • 'b' represents bad radar returns. The goal is to predict good or bad returns using a set of 34 measurements. Fix the initial random seed, grow 50 trees, inspect how the ensemble error changes with accumulation of trees, and estimate feature importance. For classification, it is best to set the minimal leaf size to 1 and select the square root of the total number of features for each decision split at random. These settings are defaults for TreeBagger used for classification. load ionosphere rng(1945,'twister') b = TreeBagger(50,X,Y,'OOBPredictorImportance','On'); figure plot(oobError(b)) xlabel('Number of Grown Trees') ylabel('Out-of-Bag Classification Error')
The method trains ensembles with few trees on observations that are in bag for all trees. For such observations, it is impossible to compute the true out-of-bag prediction, and TreeBagger returns the most probable class for classification and the sample mean for regression. You can change the default value returned for in-bag observations using the DefaultYfit property. If you set the default value to an empty character vector for classification, the method excludes in-bag observations from computation of the out-of-bag error. In this case, the curve is more variable when the number of trees is small, either because some observations are never out of bag (and are therefore excluded) or because their predictions are based on few trees. b.DefaultYfit = ''; figure plot(oobError(b)) xlabel('Number of Grown Trees') ylabel('Out-of-Bag Error Excluding In-Bag Observations')
The OOBIndices property of TreeBagger tracks which observations are out of bag for what trees. Using this property, you can monitor the fraction of observations in the training data that are in bag for all trees. The curve starts at approximately 2/3, which is the fraction of unique observations selected by one bootstrap replica, and goes down to 0 at approximately 10 trees. finbag = zeros(1,b.NumTrees); for t=1:b.NTrees finbag(t) = sum(all(~b.OOBIndices(:,1:t),2)); end finbag = finbag / size(X,1); figure plot(finbag) xlabel('Number of Grown Trees') ylabel('Fraction of In-Bag Observations')
Estimate feature importance. figure bar(b.OOBPermutedPredictorDeltaError) xlabel('Feature Index') ylabel('Out-of-Bag Feature Importance')
Select the features yielding an importance measure greater than 0.75. This threshold is chosen arbitrarily. idxvar = find(b.OOBPermutedPredictorDeltaError>0.75) idxvar = 1×5
     3     5     7     8    27
Having selected the most important features, grow a larger ensemble on the reduced feature set. Save time by not permuting out-of-bag observations to obtain new estimates of feature importance for the reduced feature set (set OOBPredictorImportance to 'off'). You would still be interested in obtaining out-of-bag estimates of classification error (set OOBPrediction to 'on'). b5v = TreeBagger(100,X(:,idxvar),Y,'OOBPredictorImportance','off','OOBPrediction','on'); figure plot(oobError(b5v)) xlabel('Number of Grown Trees') ylabel('Out-of-Bag Classification Error')
For classification ensembles, in addition to classification error (fraction of misclassified observations), you can also monitor the average classification margin. For each observation, the margin is defined as the difference between the score for the true class and the maximal score for other classes predicted by this tree. The cumulative classification margin uses the scores averaged over all trees and the mean cumulative classification margin is the cumulative margin averaged over all observations. The oobMeanMargin method with the 'mode' argument set to 'cumulative' (default) shows how the mean cumulative margin changes as the ensemble grows: every new element in the returned array represents the cumulative margin obtained by including a new tree in the ensemble. If training is successful, you would expect to see a gradual increase in the mean classification margin. The method trains ensembles with few trees on observations that are in bag for all trees. For such observations, it is impossible to compute the true out-of-bag prediction, and TreeBagger returns the most probable class for classification and the sample mean for regression. For decision trees, a classification score is the probability of observing an instance of this class in this tree leaf. For example, if the leaf of a grown decision tree has five 'good' and three 'bad' training observations in it, the scores returned by this decision tree for any observation fallen on this leaf are 5/8 for the 'good' class and 3/8 for the 'bad' class. These probabilities are called 'scores' for consistency with other classifiers that might not have an obvious interpretation for numeric values of returned predictions. figure plot(oobMeanMargin(b5v)); xlabel('Number of Grown Trees') ylabel('Out-of-Bag Mean Classification Margin')
Compute the matrix of proximities and examine the distribution of outlier measures. Unlike regression, outlier measures for classification ensembles are computed within each class separately. b5v = fillProximities(b5v); figure histogram(b5v.OutlierMeasure) xlabel('Outlier Measure') ylabel('Number of Observations')
Find the class of the extreme outliers. extremeOutliers = b5v.Y(b5v.OutlierMeasure>40) extremeOutliers = 6x1 cell {'g'} {'g'} {'g'} {'g'} {'g'} {'g'} percentGood = 100*sum(strcmp(extremeOutliers,'g'))/numel(extremeOutliers) percentGood = 100
All of the extreme outliers are labeled 'good'. As for regression, you can plot scaled coordinates, displaying the two classes in different colors using the 'Colors' name-value pair argument of mdsProx. This argument takes a character vector in which every character represents a color. The software does not rank class names. Therefore, it is best practice to determine the position of the classes in the ClassNames property of the ensemble. gPosition = find(strcmp('g',b5v.ClassNames)) gPosition = 2
The 'bad' class is first and the 'good' class is second. Display scaled coordinates using red for the 'bad' class and blue for the 'good' class observations. figure [s,e] = mdsProx(b5v,'Colors','rb'); xlabel('First Scaled Coordinate') ylabel('Second Scaled Coordinate')
Plot the first 20 eigenvalues obtained by scaling. The first eigenvalue clearly dominates and the first scaled coordinate is most important. figure bar(e(1:20)) xlabel('Scaled Coordinate Index') ylabel('Eigenvalue')
Another way of exploring the performance of a classification ensemble is to plot its receiver operating characteristic (ROC) curve or another performance curve suitable for the current problem. Obtain predictions for out-of-bag observations. For a classification ensemble, the oobPredict method returns a cell array of classification labels as the first output argument and a numeric array of scores as the second output argument. The returned array of scores has two columns, one for each class. In this case, the first column is for the 'bad' class and the second column is for the 'good' class. One column in the score matrix is redundant because the scores represent class probabilities in tree leaves and by definition add up to 1. [Yfit,Sfit] = oobPredict(b5v);
Use rocmetrics to compute a performance curve. By default, rocmetrics computes true positive rates and false positive rates for a ROC curve. rocObj = rocmetrics(b5v.Y,Sfit(:,gPosition),'g');
Plot the ROC curve for the 'good' class by using the plot function of rocmetrics. plot(rocObj)
Instead of the standard ROC curve, you might want to plot, for example, ensemble accuracy versus threshold on the score for the 'good' class. Compute accuracy by using the addMetrics function of rocmetrics. Accuracy is the fraction of correctly classified observations, or equivalently, 1 minus the classification error. rocObj = addMetrics(rocObj,'Accuracy');
Create a plot of ensemble accuracy versus threshold. thre = rocObj.Metrics.Threshold; accu = rocObj.Metrics.Accuracy; plot(thre,accu) xlabel('Threshold for ''good'' Returns') ylabel('Classification Accuracy')
The curve shows a flat region indicating that any threshold from 0.2 to 0.6 is a reasonable choice. By default, a classification model assigns classification labels using 0.5 as the boundary between the two classes. You can find exactly what accuracy this corresponds to. [~,idx] = min(abs(thre-0.5)); accu(idx) ans = 0.9316
Find the maximal accuracy. [maxaccu,iaccu] = max(accu) maxaccu = 0.9345 iaccu = 99
The maximal accuracy is a little higher than the default one. The optimal threshold is therefore: thre(iaccu) ans = 0.5278
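To use the tuned threshold instead of the default 0.5, you can assign labels directly from the scores for the 'good' class. This sketch is not part of the original example; it reuses the out-of-bag scores in Sfit.
tunedLabels = repmat({'b'},size(Sfit,1),1);            % start with everything labeled 'bad'
tunedLabels(Sfit(:,gPosition) >= thre(iaccu)) = {'g'}; % relabel observations above the threshold
mean(strcmp(tunedLabels,b5v.Y))                        % accuracy with the tuned threshold (about 0.93)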
See Also TreeBagger | compact | oobError | mdsprox | oobMeanMargin | oobPredict | perfcurve | fitcensemble
Related Examples
•
“Bootstrap Aggregation (Bagging) of Regression Trees Using TreeBagger” on page 19-115
•
“Comparison of TreeBagger and Bagged Ensembles” on page 19-47
•
“Use Parallel Processing for Regression TreeBagger Workflow” on page 33-4
Detect Outliers Using Quantile Regression This example shows how to detect outliers using quantile random forest. Quantile random forest can detect outliers with respect to the conditional distribution of Y given X . However, this method cannot detect outliers in the predictor data. For outlier detection in the predictor data using a bag of decision trees, see the OutlierMeasure property of a TreeBagger model. An outlier is an observation that is located far enough from most of the other observations in a data set and can be considered anomalous. Causes of outlying observations include inherent variability or measurement error. Outliers significantly affect estimates and inference, so it is important to detect them and decide whether to remove them or consider a robust analysis. To demonstrate outlier detection, this example: 1
Generates data from a nonlinear model with heteroscedasticity and simulates a few outliers.
2
Grows a quantile random forest of regression trees.
3
Estimates conditional quartiles (Q1, Q2, and Q3) and the interquartile range (IQR) within the ranges of the predictor variables.
4
Compares the observations to the fences, which are the quantities F1 = Q1 − 1 . 5IQR and F2 = Q3 + 1 . 5IQR. Any observation that is less than F1 or greater than F2 is an outlier.
Generate Data Generate 500 observations from the model yt = 10 + 3t + tsin(2t) + εt .
t is uniformly distributed between 0 and 4π, and εt ∼ N(0, t + 0 . 01). Store the data in a table. n = 500; rng('default'); % For reproducibility t = randsample(linspace(0,4*pi,1e6),n,true)'; epsilon = randn(n,1).*sqrt((t+0.01)); y = 10 + 3*t + t.*sin(2*t) + epsilon; Tbl = table(t,y);
Move five observations in a random vertical direction by 90% of the value of the response. numOut = 5; [~,idx] = datasample(Tbl,numOut); Tbl.y(idx) = Tbl.y(idx) + randsample([-1 1],numOut,true)'.*(0.9*Tbl.y(idx));
Draw a scatter plot of the data and identify the outliers. figure; plot(Tbl.t,Tbl.y,'.'); hold on plot(Tbl.t(idx),Tbl.y(idx),'*'); axis tight; ylabel('y'); xlabel('t'); title('Scatter Plot of Data'); legend('Data','Simulated outliers','Location','NorthWest');
19-139
19
Nonparametric Supervised Learning
Grow Quantile Random Forest Grow a bag of 200 regression trees using TreeBagger. Mdl = TreeBagger(200,Tbl,'y','Method','regression');
Mdl is a TreeBagger ensemble. Predict Conditional Quartiles and Interquartile Ranges Using quantile regression, estimate the conditional quartiles of 50 equally spaced values within the range of t. tau = [0.25 0.5 0.75]; predT = linspace(0,4*pi,50)'; quartiles = quantilePredict(Mdl,predT,'Quantile',tau);
quartiles is a 500-by-3 matrix of conditional quartiles. Rows correspond to the observations in t, and columns correspond to the probabilities in tau. On the scatter plot of the data, plot the conditional mean and median responses. meanY = predict(Mdl,predT); plot(predT,[quartiles(:,2) meanY],'LineWidth',2); legend('Data','Simulated outliers','Median response','Mean response',... 'Location','NorthWest'); hold off;
19-140
Detect Outliers Using Quantile Regression
Although the conditional mean and median curves are close, the simulated outliers can affect the mean curve. Compute the conditional IQR, F1, and F2. iqr = quartiles(:,3) - quartiles(:,1); k = 1.5; f1 = quartiles(:,1) - k*iqr; f2 = quartiles(:,3) + k*iqr;
k = 1.5 means that all observations less than f1 or greater than f2 are considered outliers, but this threshold does not disambiguate from extreme outliers. A k of 3 identifies extreme outliers. Compare Observations to Fences Plot the observations and the fences. figure; plot(Tbl.t,Tbl.y,'.'); hold on plot(Tbl.t(idx),Tbl.y(idx),'*'); plot(predT,[f1 f2]); legend('Data','Simulated outliers','F_1','F_2','Location','NorthWest'); axis tight title('Outlier Detection Using Quantile Regression') hold off
19-141
19
Nonparametric Supervised Learning
All simulated outliers fall outside [F1, F2], and some observations are outside this interval as well.
See Also Classes TreeBagger Functions predict | quantilePredict
Related Examples
19-142
•
“Conditional Quantile Estimation Using Kernel Smoothing” on page 19-143
•
“Tune Random Forest Using Quantile Error and Bayesian Optimization” on page 19-146
Conditional Quantile Estimation Using Kernel Smoothing
Conditional Quantile Estimation Using Kernel Smoothing This example shows how to estimate conditional quantiles of a response given predictor data using quantile random forest and by estimating the conditional distribution function of the response using kernel smoothing. For quantile-estimation speed, quantilePredict, oobQuantilePredict, quantileError, and oobQuantileError use linear interpolation to predict quantiles in the conditional distribution of the response. However, you can obtain response weights, which comprise the distribution function, and then pass them to ksdensity to possibly gain accuracy at the cost of computation speed. Generate 2000 observations from the model yt = 0 . 5 + t + εt .
t is uniformly distributed between 0 and 1, and εt ∼ N(0, t²/2 + 0.01). Store the data in a table. n = 2000; rng('default'); % For reproducibility t = randsample(linspace(0,1,1e2),n,true)'; epsilon = randn(n,1).*sqrt(t.^2/2 + 0.01); y = 0.5 + t + epsilon; Tbl = table(t,y);
Train an ensemble of bagged regression trees using the entire data set. Specify 200 weak learners and save the out-of-bag indices. rng('default'); % For reproducibility Mdl = TreeBagger(200,Tbl,'y','Method','regression',... 'OOBPrediction','on');
Mdl is a TreeBagger ensemble. Predict out-of-bag, conditional 0.05 and 0.95 quantiles (90% confidence intervals) for all trainingsample observations using oobQuantilePredict, that is, by interpolation. Request response weights. Record the execution time. tau = [0.05 0.95]; tic [quantInterp,yw] = oobQuantilePredict(Mdl,'Quantile',tau); timeInterp = toc;
quantInterp is a 2000-by-2 matrix of predicted quantiles; rows correspond to the observations in Mdl.X and columns correspond to the quantile probabilities in tau. yw is a 2000-by-2000 sparse matrix of response weights; rows correspond to training-sample observations and columns correspond to the observations in Mdl.X. Response weights are independent of tau. Predict out-of-bag, conditional 0.05 and 0.95 quantiles using kernel smoothing and record the execution time. n = numel(Tbl.y); quantKS = zeros(n,numel(tau)); % Preallocation
tic for j = 1:n quantKS(j,:) = ksdensity(Tbl.y,tau,'Function','icdf','Weights',yw(:,j)); end timeKS = toc;
quantKS is commensurate with quantInterp. Evaluate the ratio of execution times between kernel smoothing estimation and interpolation. timeKS/timeInterp ans = 5.0376
It takes much more time to execute kernel smoothing than interpolation. This ratio is dependent on the memory of your machine, so your results will vary. Plot the data with both sets of predicted quantiles. [sT,idx] = sort(t); figure; h1 = plot(t,y,'.'); hold on h2 = plot(sT,quantInterp(idx,:),'b'); h3 = plot(sT,quantKS(idx,:),'r'); legend([h1 h2(1) h3(1)],'Data','Interpolation','Kernel Smoothing'); title('Quantile Estimates') hold off
Both sets of estimated quantiles agree fairly well. However, the quantile intervals from interpolation appear slightly tighter for smaller values of t than the ones from kernel smoothing.
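To quantify that visual impression, you can compare the two sets of estimates directly. This is an illustrative check, not part of the original example.
maxAbsDiff  = max(abs(quantInterp(:) - quantKS(:)))    % largest disagreement between the methods
meanAbsDiff = mean(abs(quantInterp(:) - quantKS(:)))   % typical disagreement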
See Also oobQuantilePredict | TreeBagger | ksdensity
Related Examples •
“Detect Outliers Using Quantile Regression” on page 19-139
•
“Tune Random Forest Using Quantile Error and Bayesian Optimization” on page 19-146
Tune Random Forest Using Quantile Error and Bayesian Optimization This example shows how to implement Bayesian optimization to tune the hyperparameters of a random forest of regression trees using quantile error. Tuning a model using quantile error, rather than mean squared error, is appropriate if you plan to use the model to predict conditional quantiles rather than conditional means. Load and Preprocess Data Load the carsmall data set. Consider a model that predicts the median fuel economy of a car given its acceleration, number of cylinders, engine displacement, horsepower, manufacturer, model year, and weight. Consider Cylinders, Mfg, and Model_Year as categorical variables. load carsmall Cylinders = categorical(Cylinders); Mfg = categorical(cellstr(Mfg)); Model_Year = categorical(Model_Year); X = table(Acceleration,Cylinders,Displacement,Horsepower,Mfg,... Model_Year,Weight,MPG); rng('default'); % For reproducibility
Specify Tuning Parameters Consider tuning: • The complexity (depth) of the trees in the forest. Deep trees tend to over-fit, but shallow trees tend to underfit. Therefore, specify that the minimum number of observations per leaf be at most 20. • When growing the trees, the number of predictors to sample at each node. Specify sampling from 1 through all of the predictors. bayesopt, the function that implements Bayesian optimization, requires you to pass these specifications as optimizableVariable objects. maxMinLS = 20; minLS = optimizableVariable('minLS',[1,maxMinLS],'Type','integer'); numPTS = optimizableVariable('numPTS',[1,size(X,2)-1],'Type','integer'); hyperparametersRF = [minLS; numPTS];
hyperparametersRF is a 2-by-1 array of OptimizableVariable objects. You should also consider tuning the number of trees in the ensemble. bayesopt tends to choose random forests containing many trees because ensembles with more learners are more accurate. If available computation resources is a consideration, and you prefer ensembles with as fewer trees, then consider tuning the number of trees separately from the other parameters or penalizing models containing many learners. Define Objective Function Define an objective function for the Bayesian optimization algorithm to optimize. The function should: • Accept the parameters to tune as an input. 19-146
• Train a random forest using TreeBagger. In the TreeBagger call, specify the parameters to tune and specify returning the out-of-bag indices. • Estimate the out-of-bag quantile error based on the median. • Return the out-of-bag quantile error. function oobErr = oobErrRF(params,X) %oobErrRF Trains random forest and estimates out-of-bag quantile error % oobErr trains a random forest of 300 regression trees using the % predictor data in X and the parameter specification in params, and then % returns the out-of-bag quantile error based on the median. X is a table % and params is an array of OptimizableVariable objects corresponding to % the minimum leaf size and number of predictors to sample at each node. randomForest = TreeBagger(300,X,'MPG','Method','regression',... 'OOBPrediction','on','MinLeafSize',params.minLS,... 'NumPredictorstoSample',params.numPTS); oobErr = oobQuantileError(randomForest); end
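If you do decide to tune the ensemble size as well, as noted earlier, one possible setup is to add a third optimizableVariable and pass it through to TreeBagger in the objective function instead of the fixed 300 trees. This is an assumption for illustration, not part of this example, and the range of tree counts is arbitrary.
% Hypothetical extension: also optimize the number of trees.
numTrees = optimizableVariable('numTrees',[50,500],'Type','integer');
hyperparametersRF3 = [minLS; numPTS; numTrees];
% The objective function would then call
% TreeBagger(params.numTrees,X,'MPG','Method','regression',...)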
Minimize Objective Using Bayesian Optimization Find the model achieving the minimal, penalized, out-of-bag quantile error with respect to tree complexity and number of predictors to sample at each node using Bayesian optimization. Specify the expected improvement plus function as the acquisition function and suppress printing the optimization information. results = bayesopt(@(params)oobErrRF(params,X),hyperparametersRF,... 'AcquisitionFunctionName','expected-improvement-plus','Verbose',0);
results is a BayesianOptimization object containing, among other things, the minimum of the objective function and the optimized hyperparameter values. Display the observed minimum of the objective function and the optimized hyperparameter values. bestOOBErr = results.MinObjective bestHyperparameters = results.XAtMinObjective bestOOBErr = 1.0890 bestHyperparameters = 1×2 table minLS _____ 7
numPTS ______ 7
Train Model Using Optimized Hyperparameters Train a random forest using the entire data set and the optimized hyperparameter values. 19-149
Mdl = TreeBagger(300,X,'MPG','Method','regression',... 'MinLeafSize',bestHyperparameters.minLS,... 'NumPredictorstoSample',bestHyperparameters.numPTS);
Mdl is a TreeBagger object optimized for median prediction. You can predict the median fuel economy given predictor data by passing Mdl and the new data to quantilePredict.
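For example, the following sketch predicts the median MPG for a few rows of the original table. It reuses training rows purely to illustrate the call; in practice you would pass measurements for new cars.
Xnew = X(1:3,Mdl.PredictorNames);           % predictor columns only
medianMPG = quantilePredict(Mdl,Xnew,'Quantile',0.5)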
See Also oobQuantileError | TreeBagger | bayesopt | optimizableVariable
Related Examples
•
“Detect Outliers Using Quantile Regression” on page 19-139
•
“Conditional Quantile Estimation Using Kernel Smoothing” on page 19-143
Assess Neural Network Classifier Performance
Assess Neural Network Classifier Performance Create a feedforward neural network classifier with fully connected layers using fitcnet. Use validation data for early stopping of the training process to prevent overfitting the model. Then, use the object functions of the classifier to assess the performance of the model on test data. Load and Preprocess Sample Data This example uses the 1994 census data stored in census1994.mat. The data set consists of demographic information from the US Census Bureau that you can use to predict whether an individual makes over $50,000 per year. Load the sample data census1994, which contains the training data adultdata and the test data adulttest. Preview the first few rows of the training data set. load census1994 head(adultdata) age ___
    age       workClass         fnlwgt      education    education_num        marital_status
    ___    ________________    __________    _________    _____________    _____________________

    39     State-gov                77516    Bachelors         13           Never-married
    50     Self-emp-not-inc         83311    Bachelors         13           Married-civ-spouse
    38     Private             2.1565e+05    HS-grad            9           Divorced
    53     Private             2.3472e+05    11th               7           Married-civ-spouse
    28     Private             3.3841e+05    Bachelors         13           Married-civ-spouse
    37     Private             2.8458e+05    Masters           14           Married-civ-spouse
    49     Private             1.6019e+05    9th                5           Married-spouse-absent
    52     Self-emp-not-inc    2.0964e+05    HS-grad            9           Married-civ-spouse
Each row contains the demographic information for one adult. The last column, salary, shows whether a person has a salary less than or equal to $50,000 per year or greater than $50,000 per year. Delete the rows of adultdata and adulttest in which the tables have missing values. adultdata = rmmissing(adultdata); adulttest = rmmissing(adulttest);
Combine the education_num and education variables in both the training and test data to create a single ordered categorical variable that shows the highest level of education a person has achieved. edOrder = unique(adultdata.education_num,"stable"); edCats = unique(adultdata.education,"stable"); [~,edIdx] = sort(edOrder); adultdata.education = categorical(adultdata.education, ... edCats(edIdx),"Ordinal",true); adultdata.education_num = []; adulttest.education = categorical(adulttest.education, ... edCats(edIdx),"Ordinal",true); adulttest.education_num = [];
Partition Training Data Split the training data further using a stratified holdout partition. Create a separate validation data set to stop the model training process early. Reserve approximately 30% of the observations for the validation data set and use the rest of the observations to train the neural network classifier. rng("default") % For reproducibility of the partition c = cvpartition(adultdata.salary,"Holdout",0.30); trainingIndices = training(c); validationIndices = test(c); tblTrain = adultdata(trainingIndices,:); tblValidation = adultdata(validationIndices,:);
Train Neural Network Train a neural network classifier by using the training set. Specify the salary column of tblTrain as the response and the fnlwgt column as the observation weights, and standardize the numeric predictors. Evaluate the model at each iteration by using the validation set. Specify to display the training information at each iteration by using the Verbose name-value argument. By default, the training process ends early if the validation cross-entropy loss is greater than or equal to the minimum validation cross-entropy loss computed so far, six times in a row. To change the number of times the validation loss is allowed to be greater than or equal to the minimum, specify the ValidationPatience name-value argument. Mdl = fitcnet(tblTrain,"salary","Weights","fnlwgt", ... "Standardize",true,"ValidationData",tblValidation, ... "Verbose",1); |==========================================================================================| | Iteration | Train Loss | Gradient | Step | Iteration | Validation | Validation | | | | | | Time (sec) | Loss | Checks | |==========================================================================================| | 1| 0.326435| 0.105391| 1.174862| 0.030042| 0.325292| 0| | 2| 0.275413| 0.024249| 0.259219| 0.045507| 0.275310| 0| | 3| 0.258430| 0.027390| 0.173985| 0.033822| 0.258820| 0| | 4| 0.218429| 0.024172| 0.617121| 0.049046| 0.220265| 0| | 5| 0.194545| 0.022570| 0.717853| 0.037449| 0.197881| 0| | 6| 0.187702| 0.030800| 0.706053| 0.016282| 0.192706| 0| | 7| 0.182328| 0.016970| 0.175624| 0.015541| 0.187243| 0| | 8| 0.180458| 0.007389| 0.241016| 0.018755| 0.184689| 0| | 9| 0.179364| 0.007194| 0.112335| 0.018093| 0.183928| 0| | 10| 0.175531| 0.008233| 0.271539| 0.017187| 0.180789| 0| |==========================================================================================| | Iteration | Train Loss | Gradient | Step | Iteration | Validation | Validation | | | | | | Time (sec) | Loss | Checks | |==========================================================================================| | 11| 0.167236| 0.014633| 0.941927| 0.017251| 0.172918| 0| | 12| 0.164107| 0.007069| 0.186935| 0.014576| 0.169584| 0| | 13| 0.162421| 0.005973| 0.226712| 0.016246| 0.167040| 0| | 14| 0.161055| 0.004590| 0.142162| 0.028831| 0.165982| 0| | 15| 0.159318| 0.007807| 0.438498| 0.023335| 0.164524| 0| | 16| 0.158856| 0.003321| 0.054253| 0.016124| 0.164177| 0| | 17| 0.158481| 0.004336| 0.125983| 0.016123| 0.163746| 0| | 18| 0.158042| 0.004697| 0.160583| 0.016539| 0.163042| 0| | 19| 0.157412| 0.007637| 0.304204| 0.015829| 0.162194| 0| | 20| 0.156931| 0.003145| 0.182916| 0.034118| 0.161804| 0| |==========================================================================================|
| Iteration | Train Loss | Gradient | Step | Iteration | Validation | Validation | | | | | | Time (sec) | Loss | Checks | |==========================================================================================| | 21| 0.156666| 0.003791| 0.089101| 0.014508| 0.161714| 0| | 22| 0.156457| 0.003157| 0.039609| 0.014297| 0.161592| 0| | 23| 0.156210| 0.002608| 0.081463| 0.014550| 0.161511| 0| | 24| 0.155981| 0.003497| 0.088109| 0.014762| 0.161557| 1| | 25| 0.155520| 0.004131| 0.181666| 0.015088| 0.161433| 0| | 26| 0.154899| 0.002309| 0.327281| 0.016675| 0.161065| 0| | 27| 0.154703| 0.001210| 0.055537| 0.015632| 0.160733| 0| | 28| 0.154503| 0.002407| 0.089433| 0.015465| 0.160449| 0| | 29| 0.154304| 0.003212| 0.118986| 0.022733| 0.160163| 0| | 30| 0.154026| 0.002823| 0.183600| 0.021131| 0.159885| 0| |==========================================================================================| | Iteration | Train Loss | Gradient | Step | Iteration | Validation | Validation | | | | | | Time (sec) | Loss | Checks | |==========================================================================================| | 31| 0.153738| 0.004477| 0.405824| 0.026134| 0.159378| 0| | 32| 0.153538| 0.003659| 0.065795| 0.015355| 0.159333| 0| | 33| 0.153491| 0.001184| 0.017043| 0.014550| 0.159377| 1| | 34| 0.153460| 0.000988| 0.017456| 0.015024| 0.159446| 2| | 35| 0.153420| 0.002433| 0.032119| 0.015506| 0.159463| 3| | 36| 0.153329| 0.003517| 0.058506| 0.016548| 0.159478| 4| | 37| 0.153181| 0.002436| 0.116169| 0.016689| 0.159453| 5| | 38| 0.153025| 0.001577| 0.177446| 0.016400| 0.159377| 6| |==========================================================================================|
Use the information inside the TrainingHistory property of the object Mdl to check the iteration that corresponds to the minimum validation cross-entropy loss. The final returned model Mdl is the model trained at this iteration. iteration = Mdl.TrainingHistory.Iteration; valLosses = Mdl.TrainingHistory.ValidationLoss; [~,minIdx] = min(valLosses); iteration(minIdx) ans = 32
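You can also visualize the whole training history. The sketch below assumes the TrainingHistory table stores the training loss in a TrainingLoss variable; it is an illustration, not part of the original example.
trainLosses = Mdl.TrainingHistory.TrainingLoss;   % assumed variable name
plot(iteration,trainLosses,iteration,valLosses)
xline(iteration(minIdx),"--")                     % iteration with the minimum validation loss
xlabel("Iteration")
ylabel("Cross-Entropy Loss")
legend(["Training","Validation","Best iteration"],"Location","northeast")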
Evaluate Test Set Performance Evaluate the performance of the trained classifier Mdl on the test set adulttest by using the predict, loss, margin, and edge object functions. Find the predicted labels and classification scores for the observations in the test set. [labels,Scores] = predict(Mdl,adulttest);
Create a confusion matrix from the test set results. The diagonal elements indicate the number of correctly classified instances of a given class. The off-diagonal elements are instances of misclassified observations. confusionchart(adulttest.salary,labels)
Compute the test set classification accuracy. error = loss(Mdl,adulttest,"salary"); accuracy = (1-error)*100 accuracy = 85.0172
The neural network classifier correctly classifies approximately 85% of the test set observations. Compute the test set classification margins for the trained neural network. Display a histogram of the margins. The classification margins are the difference between the classification score for the true class and the classification score for the false class. Because neural network classifiers return scores that are posterior probabilities, classification margins close to 1 indicate confident classifications and negative margin values indicate misclassifications. m = margin(Mdl,adulttest,"salary"); histogram(m)
Use the classification edge, or mean of the classification margins, to assess the overall performance of the classifier. meanMargin = edge(Mdl,adulttest,"salary") meanMargin = 0.5943
Alternatively, compute the weighted classification edge by using observation weights. weightedMeanMargin = edge(Mdl,adulttest,"salary", ... "Weight","fnlwgt") weightedMeanMargin = 0.6045
Visualize the predicted labels and classification scores using scatter plots, in which each point corresponds to an observation. Use the predicted labels to set the color of the points, and use the maximum scores to set the transparency of the points. Points with less transparency are labeled with greater confidence. First, find the maximum classification score for each test set observation. maxScores = max(Scores,[],2);
Create a scatter plot comparing maximum scores across the number of work hours per week and level of education. Because the education variable is categorical, randomly jitter (or space out) the points along the y-dimension. 19-155
Change the colormap so that maximum scores corresponding to salaries that are less than or equal to $50,000 per year appear as blue, and maximum scores corresponding to salaries greater than $50,000 per year appear as red. scatter(adulttest.hours_per_week,adulttest.education,[],labels, ... "filled","MarkerFaceAlpha","flat","AlphaData",maxScores, ... "YJitter","rand"); xlabel("Number of Work Hours Per Week") ylabel("Education") Mdl.ClassNames
ans = 2x1 categorical
     <=50K
     >50K
colors = lines(2)
colors = 2×3
         0    0.4470    0.7410
    0.8500    0.3250    0.0980
colormap(colors);
The colors in the scatter plot indicate that, in general, the neural network predicts that people with lower levels of education (12th grade or below) have salaries less than or equal to $50,000 per year. 19-156
The transparency of some of the points in the lower right of the plot indicates that the model is less confident in this prediction for people who work many hours per week (60 hours or more).
See Also fitcnet | margin | edge | loss | predict | ClassificationNeuralNetwork | confusionchart | scatter
Assess Regression Neural Network Performance Create a feedforward regression neural network model with fully connected layers using fitrnet. Use validation data for early stopping of the training process to prevent overfitting the model. Then, use the object functions of the model to assess its performance on test data. Load Sample Data Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s. load carbig
Convert the Origin variable to a categorical variable. Then create a table containing the predictor variables Acceleration, Displacement, and so on, as well as the response variable MPG. Each row contains the measurements for a single car. Delete the rows of the table in which the table has missing values. Origin = categorical(cellstr(Origin)); Tbl = table(Acceleration,Displacement,Horsepower, ... Model_Year,Origin,Weight,MPG); Tbl = rmmissing(Tbl);
Partition Data Split the data into training, validation, and test sets. First, reserve approximately one third of the observations for the test set. Then, split the remaining data in half to create the training and validation sets. rng("default") % For reproducibility of the data partitions cvp1 = cvpartition(size(Tbl,1),"Holdout",1/3); testTbl = Tbl(test(cvp1),:); remainingTbl = Tbl(training(cvp1),:); cvp2 = cvpartition(size(remainingTbl,1),"Holdout",1/2); validationTbl = remainingTbl(test(cvp2),:); trainTbl = remainingTbl(training(cvp2),:);
Train Neural Network Train a regression neural network model by using the training set. Specify the MPG column of tblTrain as the response variable, and standardize the numeric predictors. Evaluate the model at each iteration by using the validation set. Specify to display the training information at each iteration by using the Verbose name-value argument. By default, the training process ends early if the validation loss is greater than or equal to the minimum validation loss computed so far, six times in a row. To change the number of times the validation loss is allowed to be greater than or equal to the minimum, specify the ValidationPatience name-value argument. Mdl = fitrnet(trainTbl,"MPG","Standardize",true, ... "ValidationData",validationTbl, ... "Verbose",1); |==========================================================================================| | Iteration | Train Loss | Gradient | Step | Iteration | Validation | Validation | | | | | | Time (sec) | Loss | Checks | |==========================================================================================|
| 1| 102.962345| 46.853164| 6.700877| 0.018954| 115.730384| 0| | 2| 55.403995| 22.171181| 1.811805| 0.007201| 53.086379| 0| | 3| 37.588848| 11.135231| 0.782861| 0.001630| 38.580002| 0| | 4| 29.713458| 8.379231| 0.392009| 0.000831| 31.021379| 0| | 5| 17.523851| 9.958164| 2.137584| 0.000517| 17.594863| 0| | 6| 12.700624| 2.957771| 0.744551| 0.000525| 14.209019| 0| | 7| 11.841152| 1.907378| 0.201770| 0.000731| 13.159899| 0| | 8| 10.162988| 2.542555| 0.576907| 0.000826| 11.352490| 0| | 9| 8.889095| 2.779980| 0.615716| 0.001016| 10.446334| 0| | 10| 7.670335| 2.400272| 0.648711| 0.000732| 10.424337| 0| |==========================================================================================| | Iteration | Train Loss | Gradient | Step | Iteration | Validation | Validation | | | | | | Time (sec) | Loss | Checks | |==========================================================================================| | 11| 7.416274| 0.505111| 0.214707| 0.001889| 10.522517| 1| | 12| 7.338923| 0.880655| 0.119085| 0.001179| 10.648031| 2| | 13| 7.149407| 1.784821| 0.277908| 0.000783| 10.800952| 3| | 14| 6.866385| 1.904480| 0.472190| 0.000896| 10.839202| 4| | 15| 6.815575| 3.339285| 0.943063| 0.000752| 10.031692| 0| | 16| 6.428137| 0.684771| 0.133729| 0.001144| 9.867819| 0| | 17| 6.363299| 0.456606| 0.125363| 0.001151| 9.720076| 0| | 18| 6.289887| 0.742923| 0.152290| 0.001046| 9.576588| 0| | 19| 6.215407| 0.964684| 0.183503| 0.000832| 9.422910| 0| | 20| 6.078333| 2.124971| 0.566948| 0.000733| 9.599573| 1| |==========================================================================================| | Iteration | Train Loss | Gradient | Step | Iteration | Validation | Validation | | | | | | Time (sec) | Loss | Checks | |==========================================================================================| | 21| 5.947923| 1.217291| 0.583867| 0.000666| 9.618400| 2| | 22| 5.855505| 0.671774| 0.285123| 0.000849| 9.734680| 3| | 23| 5.831802| 1.882061| 0.657368| 0.000827| 10.365968| 4| | 24| 5.713261| 1.004072| 0.134719| 0.000543| 10.314258| 5| | 25| 5.520766| 0.967032| 0.290156| 0.000441| 10.177322| 6| |==========================================================================================|
Use the information inside the TrainingHistory property of the object Mdl to check the iteration that corresponds to the minimum validation mean squared error (MSE). The final returned model Mdl is the model trained at this iteration.

iteration = Mdl.TrainingHistory.Iteration;
valLosses = Mdl.TrainingHistory.ValidationLoss;
[~,minIdx] = min(valLosses);
iteration(minIdx)

ans = 19
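To see where the early stopping occurs, you can also plot the per-iteration losses stored in the training history. The following minimal sketch is not part of the original example; it assumes that the TrainingHistory table records the training loss in a variable named TrainingLoss, alongside the ValidationLoss variable used above.

trainLosses = Mdl.TrainingHistory.TrainingLoss; % per-iteration training loss (assumed variable name)
plot(iteration,trainLosses,"-o")
hold on
plot(iteration,valLosses,"-o")
xline(iteration(minIdx),"--") % iteration with the minimum validation loss
hold off
xlabel("Iteration")
ylabel("Mean Squared Error")
legend(["Training Loss","Validation Loss","Best Iteration"])

In the verbose display above, the validation loss reaches its minimum at iteration 19 and training stops once the validation loss has failed to improve six times in a row.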
Evaluate Test Set Performance

Evaluate the performance of the trained model Mdl on the test set testTbl by using the loss and predict object functions. Compute the test set mean squared error (MSE). Smaller MSE values indicate better performance.

mse = loss(Mdl,testTbl,"MPG")

mse = 7.4101
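If you prefer an error measure in the same units as the response, you can take the square root of the test MSE. This quick derived check is not part of the original example.

rmse = sqrt(mse) % root-mean-squared error in miles per gallon (approximately 2.72)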
Compare the predicted test set response values to the true response values. Plot the predicted miles per gallon (MPG) along the vertical axis and the true MPG along the horizontal axis. Points on the reference line indicate correct predictions. A good model produces predictions that are scattered near the line.

predictedY = predict(Mdl,testTbl);
plot(testTbl.MPG,predictedY,".")
hold on
plot(testTbl.MPG,testTbl.MPG)
hold off
xlabel("True Miles Per Gallon (MPG)")
ylabel("Predicted Miles Per Gallon (MPG)")
Use box plots to compare the distribution of predicted and true MPG values by country of origin. Create the box plots by using the boxchart function. Each box plot displays the median, the lower and upper quartiles, any outliers (computed using the interquartile range), and the minimum and maximum values that are not outliers. In particular, the line inside each box is the sample median, and the circular markers indicate outliers.

For each country of origin, compare the red box plot (showing the distribution of predicted MPG values) to the blue box plot (showing the distribution of true MPG values). Similar distributions for the predicted and true MPG values indicate good predictions.

boxchart(testTbl.Origin,testTbl.MPG)
hold on
boxchart(testTbl.Origin,predictedY)
hold off
legend(["True MPG","Predicted MPG"])
xlabel("Country of Origin")
ylabel("Miles Per Gallon (MPG)")
For most countries, the predicted and true MPG values have similar distributions. Some discrepancies are possibly due to the small number of cars in the training and test sets. Compare the range of MPG values for cars in the training and test sets.

trainSummary = grpstats(trainTbl(:,["MPG","Origin"]),"Origin", ...
    "range")

trainSummary=6×3 table
               Origin     GroupCount    range_MPG
               _______    __________    _________
    France     France          2            1.2
    Germany    Germany        12           23.4
    Italy      Italy           1              0
    Japan      Japan          26           26.6
    Sweden     Sweden          4              8
    USA        USA            86             27

testSummary = grpstats(testTbl(:,["MPG","Origin"]),"Origin", ...
    "range")

testSummary=6×3 table
               Origin     GroupCount    range_MPG
               _______    __________    _________
    France     France          4           19.8
    Germany    Germany        13           20.3
    Italy      Italy           4           11.3
    Japan      Japan          26           25.6
    Sweden     Sweden          1              0
    USA        USA            82             29
For countries like France, Italy, and Sweden, which have few cars in the training and test sets, the range of the MPG values varies significantly in both sets.

Plot the test set residuals. A good model usually has residuals scattered roughly symmetrically around 0. Clear patterns in the residuals are a sign that you can improve your model.

residuals = testTbl.MPG - predictedY;
plot(testTbl.MPG,residuals,".")
hold on
yline(0)
hold off
xlabel("True Miles Per Gallon (MPG)")
ylabel("MPG Residuals")
The plot suggests that the residuals are well distributed. You can obtain more information about the observations with the greatest residuals, in terms of absolute value.

[~,residualIdx] = sort(residuals,"descend", ...
    "ComparisonMethod","abs");
residuals(residualIdx)

ans = 130×1

   -8.8469
    8.4427
    8.0493
    7.8996
   -6.2220
    5.8589
    5.7007
   -5.6733
   -5.4545
    5.1899
      ⋮
Display the three observations with the greatest residuals, that is, with magnitudes greater than 8.

testTbl(residualIdx(1:3),:)

ans=3×7 table
    Acceleration    Displacement    Horsepower    Model_Year    Origin    Weight    MPG
    ____________    ____________    __________    __________    ______    ______    ____
        17.6             91             68            82        Japan      1970       31
        11.4            168            132            80        Japan      2910     32.7
        13.8             91             67            80        Japan      1850     44.6
See Also fitrnet | loss | predict | RegressionNeuralNetwork | boxchart
Automated Feature Engineering for Classification

The gencfeatures function enables you to automate the feature engineering process in the context of a machine learning workflow. Before passing tabular training data to a classifier, you can create new features from the predictors in the data by using gencfeatures. Use the returned data to train the classifier.

Generate new features based on your machine learning workflow.

• To generate features for an interpretable binary classifier, use the default TargetLearner value of "linear" in the call to gencfeatures. You can then use the returned data to train a binary linear classifier. For an example, see “Interpret Linear Model with Generated Features” on page 19-164.
• To generate features that can lead to better model accuracy, specify TargetLearner="bag" or TargetLearner="gaussian-svm" in the call to gencfeatures. You can then use the returned data to train a bagged ensemble classifier or a binary support vector machine (SVM) classifier with a Gaussian kernel, respectively. For an example, see “Generate New Features to Improve Bagged Ensemble Accuracy” on page 19-167.

To better understand the generated features, use the describe function of the FeatureTransformer object. To apply the same training set feature transformations to a test or validation set, use the transform function of the FeatureTransformer object.
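The overall workflow follows a generate, describe, transform pattern. The following minimal sketch is not part of the original examples; it assumes a training table trainTbl and a test table testTbl (hypothetical names) with a two-class response variable Y.

[T,newTrainTbl] = gencfeatures(trainTbl,"Y",20); % generate 20 features for the default linear learner
describe(T)                                      % inspect how each feature is constructed
newTestTbl = transform(T,testTbl);               % apply the same transformations to the test set
Mdl = fitclinear(newTrainTbl,"Y");               % train a binary linear classifier on the new features
testAccuracy = 1 - loss(Mdl,newTestTbl,"Y")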
Interpret Linear Model with Generated Features

Use automated feature engineering to generate new features. Train a linear classifier using the generated features. Interpret the relationship between the generated features and the trained model.

Load the patients data set. Create a table from a subset of the variables. Display the first few rows of the table.

load patients
Tbl = table(Age,Diastolic,Gender,Height,SelfAssessedHealthStatus, ...
    Systolic,Weight,Smoker);
head(Tbl)

    Age    Diastolic      Gender      Height    SelfAssessedHealthStatus    Systolic    Weight
    ___    _________    __________    ______    ________________________    ________    ______
    38        93        {'Male'  }      71          {'Excellent'}             124         176
    43        77        {'Male'  }      69          {'Fair'     }             109         163
    38        83        {'Female'}      64          {'Good'     }             125         131
    40        75        {'Female'}      67          {'Fair'     }             117         133
    49        80        {'Female'}      64          {'Good'     }             122         119
    46        70        {'Female'}      68          {'Good'     }             121         142
    33        88        {'Female'}      64          {'Good'     }             130         142
    40        82        {'Male'  }      68          {'Good'     }             115         180

(The Smoker column of the display is cut off in this preview.)

Generate 10 new features from the variables in Tbl. Specify the Smoker variable as the response. By default, gencfeatures assumes that the new features will be used to train a binary linear classifier.

rng("default") % For reproducibility
[T,NewTbl] = gencfeatures(Tbl,"Smoker",10)
T = 
  FeatureTransformer with properties:

                         Type: 'classification'
                TargetLearner: 'linear'
        NumEngineeredFeatures: 10
          NumOriginalFeatures: 0
             TotalNumFeatures: 10

NewTbl is a 100×11 table of the generated features and the Smoker response. The generated variables include zsc(Systolic.^2), eb8(Diastolic), q8(Systolic), eb8(Systolic), and q8(Diastolic). (The table preview is not reproduced here.)
T is a FeatureTransformer object that can be used to transform new data, and NewTbl contains the new features generated from the Tbl data.

To better understand the generated features, use the describe object function of the FeatureTransformer object. For example, inspect the first two generated features.

describe(T,1:2)

                           Type         IsOriginal    InputVariables               Transformations
                        ___________     __________    ______________    _________________________________________
    zsc(Systolic.^2)    Numeric           false          Systolic       power( ,2)
                                                                        Standardization with z-score ...
    eb8(Diastolic)      Categorical       false          Diastolic      Equal-width binning (number of bins = 8)
The first feature in NewTbl is a numeric variable, created by first squaring the values of the Systolic variable and then converting the results to z-scores. The second feature in NewTbl is a categorical variable, created by binning the values of the Diastolic variable into 8 bins of equal width.

Use the generated features to fit a linear classifier without any regularization.

Mdl = fitclinear(NewTbl,"Smoker",Lambda=0);
Plot the coefficients of the predictors used to train Mdl. Note that fitclinear expands categorical predictors before fitting a model.

p = length(Mdl.Beta);
[sortedCoefs,expandedIndex] = sort(Mdl.Beta,ComparisonMethod="abs");
sortedExpandedPreds = Mdl.ExpandedPredictorNames(expandedIndex);
bar(sortedCoefs,Horizontal="on")
yticks(1:2:p)
yticklabels(sortedExpandedPreds(1:2:end))
xlabel("Coefficient")
ylabel("Expanded Predictors")
title("Coefficients for Expanded Predictors")
Identify the predictors whose coefficients have larger absolute values.

bigCoefs = abs(sortedCoefs) >= 4;
flip(sortedExpandedPreds(bigCoefs))

ans = 1x7 cell
    {'zsc(Systolic.^2)'}    {'eb8(Systolic) >= 5'}    {'q8(Diastolic) >= 3'}    {'eb8(Diastolic) ...

(The display of the remaining cell entries is cut off.)
You can use partial dependence plots to analyze the categorical features whose levels have large coefficients in terms of absolute value. For example, inspect the partial dependence plot for the q8(Diastolic) variable, whose levels q8(Diastolic) >= 3 and q8(Diastolic) >= 6 have coefficients with large absolute values. These two levels correspond to noticeable changes in the predicted scores.
plotPartialDependence(Mdl,"q8(Diastolic)",Mdl.ClassNames,NewTbl);
Generate New Features to Improve Bagged Ensemble Accuracy

Use gencfeatures to engineer new features before training a bagged ensemble classifier. Before making predictions on new data, apply the same feature transformations to the new data set. Compare the test set performance of the ensemble that uses the engineered features to the test set performance of the ensemble that uses the original features.

Read the sample file CreditRating_Historical.dat into a table. The predictor data consists of financial ratios and industry sector information for a list of corporate customers. The response variable consists of credit ratings assigned by a rating agency. Preview the first few rows of the data set.

creditrating = readtable("CreditRating_Historical.dat");
head(creditrating)

     ID      WC_TA     RE_TA     EBIT_TA    MVE_BVTD    S_TA     Industry    Rating
    _____    ______    ______    _______    ________    _____    ________    _______
    62394     0.013     0.104     0.036      0.447      0.142       3        {'BB' }
    48608     0.232     0.335     0.062      1.969      0.281       8        {'A'  }
    42444     0.311     0.367     0.074      1.935      0.366       1        {'A'  }
    48631     0.194     0.263     0.062      1.017      0.228       4        {'BBB'}
    43768     0.121     0.413     0.057      3.647      0.466      12        {'AAA'}
    39255    -0.117    -0.799      0.01      0.179      0.082       4        {'CCC'}
    62236     0.087     0.158     0.049      0.816      0.324       2        {'BBB'}
    39354     0.005     0.181     0.034      2.597      0.388       7        {'AA' }
Because each value in the ID variable is a unique customer ID, that is, length(unique(creditrating.ID)) is equal to the number of observations in creditrating, the ID variable is a poor predictor. Remove the ID variable from the table, and convert the Industry variable to a categorical variable.

creditrating = removevars(creditrating,"ID");
creditrating.Industry = categorical(creditrating.Industry);

Convert the Rating response variable to an ordinal categorical variable.

creditrating.Rating = categorical(creditrating.Rating, ...
    ["AAA","AA","A","BBB","BB","B","CCC"],Ordinal=true);

Partition the data into training and test sets. Use approximately 75% of the observations as training data, and 25% of the observations as test data. Partition the data using cvpartition.

rng("default") % For reproducibility of the partition
c = cvpartition(creditrating.Rating,Holdout=0.25);
trainingIndices = training(c); % Indices for the training set
testIndices = test(c);         % Indices for the test set
creditTrain = creditrating(trainingIndices,:);
creditTest = creditrating(testIndices,:);
Use the training data to generate 40 new features to fit a bagged ensemble. By default, the 40 features include original features that can be used as predictors by a bagged ensemble.

[T,newCreditTrain] = gencfeatures(creditTrain,"Rating",40, ...
    TargetLearner="bag");
T

T = 
  FeatureTransformer with properties:

                         Type: 'classification'
                TargetLearner: 'bag'
        NumEngineeredFeatures: 34
          NumOriginalFeatures: 6
             TotalNumFeatures: 40
Create newCreditTest by applying the transformations stored in the object T to the test data.

newCreditTest = transform(T,creditTest);

Compare the test set performances of a bagged ensemble trained on the original features and a bagged ensemble trained on the new features.

Train a bagged ensemble using the original training set creditTrain. Compute the accuracy of the model on the original test set creditTest. Visualize the results using a confusion matrix.

originalMdl = fitcensemble(creditTrain,"Rating",Method="Bag");
originalTestAccuracy = 1 - loss(originalMdl,creditTest, ...
    "Rating",LossFun="classiferror")
originalTestAccuracy = 0.7542

predictedTestLabels = predict(originalMdl,creditTest);
confusionchart(creditTest.Rating,predictedTestLabels);
Train a bagged ensemble using the transformed training set newCreditTrain. Compute the accuracy of the model on the transformed test set newCreditTest. Visualize the results using a confusion matrix.

newMdl = fitcensemble(newCreditTrain,"Rating",Method="Bag");
newTestAccuracy = 1 - loss(newMdl,newCreditTest, ...
    "Rating",LossFun="classiferror")

newTestAccuracy = 0.7461

newPredictedTestLabels = predict(newMdl,newCreditTest);
confusionchart(newCreditTest.Rating,newPredictedTestLabels)
Based on these test set accuracies (0.7461 versus 0.7542), the bagged ensemble trained on the transformed data performs comparably to, and in this run slightly worse than, the bagged ensemble trained on the original data.
See Also gencfeatures | FeatureTransformer | describe | transform | fitclinear | fitcensemble | fitcsvm | plotPartialDependence | genrfeatures
Automated Feature Engineering for Regression

The genrfeatures function enables you to automate the feature engineering process in the context of a machine learning workflow. Before passing tabular training data to a regression model, you can create new features from the predictors in the data by using genrfeatures. Use the returned data to train the model.

Generate new features based on your machine learning workflow.

• To generate features for an interpretable regression model, use the default TargetLearner value of "linear" in the call to genrfeatures. You can then use the returned data to train a linear regression model. For an example, see “Interpret Linear Model with Generated Features” on page 19-171.
• To generate features that can lead to better model prediction, specify TargetLearner="bag" or TargetLearner="gaussian-svm" in the call to genrfeatures. You can then use the returned data to train a bagged ensemble regression model or a support vector machine (SVM) regression model with a Gaussian kernel, respectively. For an example, see “Generate New Features to Improve Bagged Ensemble Performance” on page 19-174.

To better understand the generated features, use the describe function of the FeatureTransformer object. To apply the same training set feature transformations to a test or validation set, use the transform function of the FeatureTransformer object.
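The command-line workflow mirrors the classification case. The following minimal sketch is not part of the original examples; it assumes a training table trainTbl and a test table testTbl (hypothetical names) with a numeric response variable Y, and targets a Gaussian-kernel SVM regression model.

[T,newTrainTbl] = genrfeatures(trainTbl,"Y",25,TargetLearner="gaussian-svm");
describe(T)                          % inspect how each feature is constructed
newTestTbl = transform(T,testTbl);   % apply the same transformations to the test set
Mdl = fitrsvm(newTrainTbl,"Y", ...
    KernelFunction="gaussian",Standardize=true);
testMSE = loss(Mdl,newTestTbl,"Y")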
Interpret Linear Model with Generated Features

Use automated feature engineering to generate new features. Train a linear regression model using the generated features. Interpret the relationship between the generated features and the trained model.

Load the patients data set. Create a table from a subset of the variables. Display the first few rows of the table.

load patients
Tbl = table(Age,Diastolic,Gender,Height,SelfAssessedHealthStatus, ...
    Smoker,Weight,Systolic);
head(Tbl)

    Age    Diastolic      Gender      Height    SelfAssessedHealthStatus    Smoker    Weight
    ___    _________    __________    ______    ________________________    ______    ______
    38        93        {'Male'  }      71          {'Excellent'}           true       176
    43        77        {'Male'  }      69          {'Fair'     }           false      163
    38        83        {'Female'}      64          {'Good'     }           false      131
    40        75        {'Female'}      67          {'Fair'     }           false      133
    49        80        {'Female'}      64          {'Good'     }           false      119
    46        70        {'Female'}      68          {'Good'     }           false      142
    33        88        {'Female'}      64          {'Good'     }           true       142
    40        82        {'Male'  }      68          {'Good'     }           false      180

(The Systolic column of the display is cut off in this preview.)

Generate 10 new features from the variables in Tbl. Specify the Systolic variable as the response. By default, genrfeatures assumes that the new features will be used to train a linear regression model.
rng("default") % For reproducibility
[T,NewTbl] = genrfeatures(Tbl,"Systolic",10)

T = 
  FeatureTransformer with properties:

                         Type: 'regression'
                TargetLearner: 'linear'
        NumEngineeredFeatures: 10
          NumOriginalFeatures: 0
             TotalNumFeatures: 10

NewTbl is a 100×11 table of the generated features and the Systolic response. The generated variables include zsc(d(Smoker)), q8(Age), eb8(Age), zsc(sin(Height)), zsc(kmd8), and q6(Height). (The table preview is not reproduced here.)
T is a FeatureTransformer object that can be used to transform new data, and NewTbl contains the new features generated from the Tbl data.

To better understand the generated features, use the describe object function of the FeatureTransformer object. For example, inspect the first two generated features.

describe(T,1:2)

                          Type         IsOriginal    InputVariables                Transformations
                       ___________     __________    ______________    ___________________________________________
    zsc(d(Smoker))     Numeric           false          Smoker         Variable of type double converted ...
                                                                       Standardization with z-score
    q8(Age)            Categorical       false          Age            Equiprobable binning (number of bins = 8)
The first feature in NewTbl is a numeric variable, created by first converting the values of the Smoker variable to a numeric variable of type double and then transforming the results to z-scores. The second feature in NewTbl is a categorical variable, created by binning the values of the Age variable into 8 equiprobable bins.

Use the generated features to fit a linear regression model without any regularization.

Mdl = fitrlinear(NewTbl,"Systolic",Lambda=0);
Plot the coefficients of the predictors used to train Mdl. Note that fitrlinear expands categorical predictors before fitting a model.

p = length(Mdl.Beta);
[sortedCoefs,expandedIndex] = sort(Mdl.Beta,ComparisonMethod="abs");
sortedExpandedPreds = Mdl.ExpandedPredictorNames(expandedIndex);
bar(sortedCoefs,Horizontal="on")
yticks(1:2:p)
yticklabels(sortedExpandedPreds(1:2:end))
xlabel("Coefficient")
ylabel("Expanded Predictors")
title("Coefficients for Expanded Predictors")
Identify the predictors whose coefficients have larger absolute values.

bigCoefs = abs(sortedCoefs) >= 4;
flip(sortedExpandedPreds(bigCoefs))

ans = 1x6 cell
    {'eb8(Diastolic) >= 5'}    {'zsc(d(Smoker))'}    {'q8(Age) >= 2'}    {'q10(Weight) >= 9'}    ...

(The display of the remaining cell entries is cut off.)
You can use partial dependence plots to analyze the categorical features whose levels have large coefficients in terms of absolute value. For example, inspect the partial dependence plot for the eb8(Diastolic) variable, whose levels eb8(Diastolic) >= 5 and eb8(Diastolic) >= 6 have coefficients with large absolute values. These two levels correspond to noticeable changes in the predicted Systolic values.
plotPartialDependence(Mdl,"eb8(Diastolic)",NewTbl);
Generate New Features to Improve Bagged Ensemble Performance

Use genrfeatures to engineer new features before training a bagged ensemble regression model. Before making predictions on new data, apply the same feature transformations to the new data set. Compare the test set performance of the ensemble that uses the engineered features to the test set performance of the ensemble that uses the original features.

Read power outage data into the workspace as a table. Remove observations with missing values, and display the first few rows of the table.

outages = readtable("outages.csv");
Tbl = rmmissing(outages);
head(Tbl)

       Region              OutageTime          Loss     Customers       RestorationTime
    _____________    ____________________    ______    __________    ____________________
    {'SouthWest'}    01-Feb-2002 12:18:00    458.98    1.8202e+06    07-Feb-2002 16:50:00
    {'SouthEast'}    07-Feb-2003 21:15:00     289.4    1.4294e+05    17-Feb-2003 08:14:00
    {'West'     }    06-Apr-2004 05:44:00    434.81    3.4037e+05    06-Apr-2004 06:10:00
    {'MidWest'  }    16-Mar-2002 06:18:00    186.44    2.1275e+05    18-Mar-2002 23:23:00
    {'West'     }    18-Jun-2003 02:49:00         0             0    18-Jun-2003 10:54:00
    {'NorthEast'}    16-Jul-2003 16:23:00    239.93         49434    17-Jul-2003 01:12:00
    {'MidWest'  }    27-Sep-2004 11:09:00    286.72         66104    27-Sep-2004 16:37:00
    {'SouthEast'}    05-Sep-2004 17:48:00    73.387         36073    05-Sep-2004 20:46:00

(The Cause column of the display is cut off in this preview.)
Some of the variables, such as OutageTime and RestorationTime, have data types that are not supported by regression model training functions like fitrensemble.

Partition the data into training and test sets. Use approximately 70% of the observations as training data, and 30% of the observations as test data. Partition the data using cvpartition.

rng("default") % For reproducibility of the partition
c = cvpartition(size(Tbl,1),Holdout=0.30);
TrainTbl = Tbl(training(c),:);
TestTbl = Tbl(test(c),:);
Use the training data to generate 30 new features to fit a bagged ensemble. By default, the 30 features include original features that can be used as predictors by a bagged ensemble.

[Transformer,NewTrainTbl] = genrfeatures(TrainTbl,"Loss",30, ...
    TargetLearner="bag");
Transformer

Transformer = 
  FeatureTransformer with properties:

                         Type: 'regression'
                TargetLearner: 'bag'
        NumEngineeredFeatures: 27
          NumOriginalFeatures: 3
             TotalNumFeatures: 30
Create NewTestTbl by applying the transformations stored in the object Transformer to the test data.

NewTestTbl = transform(Transformer,TestTbl);
Train a bagged ensemble using the original training set TrainTbl, and compute the mean squared error (MSE) of the model on the original test set TestTbl. Specify only the three predictor variables that can be used by fitrensemble (Region, Customers, and Cause), and omit the two datetime predictor variables (OutageTime and RestorationTime). Then, train a bagged ensemble using the transformed training set NewTrainTbl, and compute the MSE of the model on the transformed test set NewTestTbl.

originalMdl = fitrensemble(TrainTbl,"Loss ~ Region + Customers + Cause", ...
    Method="bag");
originalTestMSE = loss(originalMdl,TestTbl)

originalTestMSE = 1.8999e+06

newMdl = fitrensemble(NewTrainTbl,"Loss",Method="bag");
newTestMSE = loss(newMdl,NewTestTbl)

newTestMSE = 1.8617e+06
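To quantify the difference between the two models, you can compute the relative reduction in test set MSE. This quick derived check is not part of the original example.

relativeImprovement = (originalTestMSE - newTestMSE)/originalTestMSE
% Approximately 0.02, that is, about a 2% reduction in test set MSE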
newTestMSE is less than originalTestMSE, which suggests that the bagged ensemble trained on the transformed data performs slightly better than the bagged ensemble trained on the original data.

Compare the predicted test set response values to the true response values for both models. Plot the log of the predicted response along the vertical axis and the log of the true response (Loss) along the horizontal axis. Points on the reference line indicate correct predictions. A good model produces predictions that are scattered near the line.

predictedTestY = predict(originalMdl,TestTbl);
newPredictedTestY = predict(newMdl,NewTestTbl);
plot(log(TestTbl.Loss),log(predictedTestY),".")
hold on
plot(log(TestTbl.Loss),log(newPredictedTestY),".")
plot(log(TestTbl.Loss),log(TestTbl.Loss))
hold off
xlabel("log(True Response)")
ylabel("log(Predicted Response)")
legend(["Original Model Results","New Model Results","Reference Line"], ...
    Location="southeast")
xlim([-1 10])
ylim([-1 10])
See Also genrfeatures | FeatureTransformer | describe | transform | fitrlinear | fitrensemble | fitrsvm | plotPartialDependence | gencfeatures
Moving Towards Automating Model Selection Using Bayesian Optimization

This example shows how to build multiple classification models for a given training data set, optimize their hyperparameters using Bayesian optimization, and select the model that performs the best on a test data set.

Training several models and tuning their hyperparameters can often take days or weeks. Creating a script to develop and compare multiple models automatically can be much faster. You can also use Bayesian optimization to speed up the process. Instead of training each model with different sets of hyperparameters, you select a few different models and tune their default hyperparameters using Bayesian optimization. Bayesian optimization finds an optimal set of hyperparameters for a given model by minimizing the objective function of the model. This optimization algorithm strategically selects new hyperparameters in each iteration and typically arrives at the optimal set of hyperparameters more quickly than a simple grid search. You can use the script in this example to train several classification models using Bayesian optimization for a given training data set and identify the model that performs best on a test data set.

Alternatively, to choose a classification model automatically across a selection of classifier types and hyperparameter values, use fitcauto. For an example, see “Automated Classifier Selection with Bayesian and ASHA Optimization” on page 19-185.

Load Sample Data

This example uses the 1994 census data stored in census1994.mat. The data set consists of demographic data from the US Census Bureau to predict whether an individual makes over $50,000 per year. The classification task is to fit a model that predicts the salary category of people given their age, working class, education level, marital status, race, and so on.

Load the sample data census1994 and display the variables in the data set.

load census1994
whos

  Name               Size         Bytes     Class    Attributes

  Description       20x74            2960   char
  adultdata      32561x15         1872566   table
  adulttest      16281x15          944466   table
census1994 contains the training data set adultdata and the test data set adulttest. For this example, to reduce the running time, subsample 5000 training and test observations each, from the original tables adultdata and adulttest, by using the datasample function. (You can skip this step if you want to use the complete data sets.)

NumSamples = 5000;
s = RandStream('mlfg6331_64'); % For reproducibility
adultdata = datasample(s,adultdata,NumSamples,'Replace',false);
adulttest = datasample(s,adulttest,NumSamples,'Replace',false);
Preview the first few rows of the training data set.

head(adultdata)

    age     workClass       fnlwgt       education      education_num       marital_status
    ___    ___________    __________    ____________    _____________    __________________
    39     Private          4.91e+05    Bachelors            13          Never-married
    25     Private        2.2022e+05    11th                  7          Never-married
    24     Private        2.2761e+05    10th                  6          Divorced
    51     Private        1.7329e+05    HS-grad               9          Divorced
    54     Private        2.8029e+05    Some-college         10          Married-civ-spouse
    53     Federal-gov         39643    HS-grad               9          Widowed
    52     Private             81859    HS-grad               9          Married-civ-spouse
    37     Private        1.2429e+05    Some-college         10          Married-civ-spouse

(The remaining columns of the display are cut off in this preview.)
Each row represents the attributes of one adult, such as age, education, and occupation. The last column salary shows whether a person has a salary less than or equal to $50,000 per year or greater than $50,000 per year.

Understand Data and Choose Classification Models

Statistics and Machine Learning Toolbox™ provides several options for classification, including classification trees, discriminant analysis, naive Bayes, nearest neighbors, support vector machines (SVMs), and classification ensembles. For the complete list of algorithms, see “Classification”.

Before choosing the algorithms to use for your problem, inspect your data set. The census data has several noteworthy characteristics:

• The data is tabular and contains both numeric and categorical variables.
• The data contains missing values.
• The response variable (salary) has two classes (binary classification).

Without making any assumptions or using prior knowledge of algorithms that you expect to work well on your data, you simply train all the algorithms that support tabular data and binary classification. Error-correcting output codes (ECOC) models are used for data with more than two classes. Discriminant analysis and nearest neighbor algorithms do not analyze data that contains both numeric and categorical variables. Therefore, the algorithms appropriate for this example are SVMs, a decision tree, an ensemble of decision trees, and a naive Bayes model. Some of these models, like decision tree and naive Bayes models, are better at handling data with missing values; that is, they return non-NaN predicted scores for observations with missing values.

Build Models and Tune Hyperparameters

To speed up the process, customize the hyperparameter optimization options. Specify 'ShowPlots' as false and 'Verbose' as 0 to disable plot and message displays, respectively. Also, specify 'UseParallel' as true to run Bayesian optimization in parallel, which requires Parallel Computing Toolbox™. Due to the nonreproducibility of parallel timing, parallel Bayesian optimization does not necessarily yield reproducible results.

hypopts = struct('ShowPlots',false,'Verbose',0,'UseParallel',true);
Start a parallel pool.

poolobj = gcp;

Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 8 workers.
You can fit the training data set and tune parameters easily by calling each fitting function and setting its 'OptimizeHyperparameters' name-value pair argument to 'auto'. Create the classification models.

% SVMs: SVM with polynomial kernel & SVM with Gaussian kernel
mdls{1} = fitcsvm(adultdata,'salary','KernelFunction','polynomial','Standardize','on', ...
    'OptimizeHyperparameters','auto','HyperparameterOptimizationOptions',hypopts);
mdls{2} = fitcsvm(adultdata,'salary','KernelFunction','gaussian','Standardize','on', ...
    'OptimizeHyperparameters','auto','HyperparameterOptimizationOptions',hypopts);
% Decision tree
mdls{3} = fitctree(adultdata,'salary', ...
    'OptimizeHyperparameters','auto','HyperparameterOptimizationOptions',hypopts);
% Ensemble of decision trees
mdls{4} = fitcensemble(adultdata,'salary','Learners','tree', ...
    'OptimizeHyperparameters','auto','HyperparameterOptimizationOptions',hypopts);
% Naive Bayes
mdls{5} = fitcnb(adultdata,'salary', ...
    'OptimizeHyperparameters','auto','HyperparameterOptimizationOptions',hypopts);
Plot Minimum Objective Curves

Extract the Bayesian optimization results from each model and plot the minimum observed value of the objective function for each model over every iteration of the hyperparameter optimization. The objective function value corresponds to the misclassification rate measured by five-fold cross-validation using the training data set. The plot compares the performance of each model.

figure
hold on
N = length(mdls);
for i = 1:N
    mdl = mdls{i};
    results = mdls{i}.HyperparameterOptimizationResults;
    plot(results.ObjectiveMinimumTrace,'Marker','o','MarkerSize',5);
end
names = {'SVM-Polynomial','SVM-Gaussian','Decision Tree','Ensemble-Trees','Naive Bayes'};
legend(names,'Location','northeast')
title('Bayesian Optimization')
xlabel('Number of Iterations')
ylabel('Minimum Objective Value')
Using Bayesian optimization to find better hyperparameter sets improves the performance of models over several iterations. In this case, the plot indicates that the ensemble of decision trees has the best prediction accuracy for the data. This model performs well consistently over several iterations and different sets of Bayesian optimization hyperparameters.

Check Performance with Test Set

Check the classifier performance with the test data set by using the confusion matrix and the receiver operating characteristic (ROC) curve.

Find the predicted labels and the score values of the test data set.

label = cell(N,1);
score = cell(N,1);
for i = 1:N
    [label{i},score{i}] = predict(mdls{i},adulttest);
end
Confusion Matrix

Obtain the most likely class for each test observation by using the predict function of each model. Then compute the confusion matrix with the predicted classes and the known (true) classes of the test data set by using the confusionchart function.

figure
c = cell(N,1);
for i = 1:N
    subplot(2,3,i)
    c{i} = confusionchart(adulttest.salary,label{i});
    title(names{i})
end
The diagonal elements indicate the number of correctly classified instances of a given class. The off-diagonal elements are instances of misclassified observations.

ROC Curve

Inspect the classifier performance more closely by plotting a ROC curve for each classifier and computing the area under the ROC curve (AUC). A ROC curve shows the true positive rate versus the false positive rate for different thresholds of classification scores. For a perfect classifier, whose true positive rate is always 1 regardless of the threshold, AUC = 1. For a binary classifier that randomly assigns observations to classes, AUC = 0.5. A large AUC value (close to 1) indicates good classifier performance.

Compute the metrics for a ROC curve and find the AUC value by creating a rocmetrics object for each classifier, and plot the ROC curves.
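The ROC code itself is not preserved in this copy of the example. A minimal sketch for one of the trained models, using the labels and scores computed above, could look like the following; repeat it for each entry of mdls to compare the classifiers.

rocObj = rocmetrics(adulttest.salary,score{1},mdls{1}.ClassNames); % ROC metrics for the first model
plot(rocObj)  % plot the ROC curves
rocObj.AUC    % area under the curve for each class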
4  In the New Session from Workspace dialog box, under Data Set Variable, select a table or matrix from the list of workspace variables. If you select a matrix, choose whether to use rows or columns for observations by clicking the option buttons.
5  Under Response, observe the default response variable. The app tries to select a suitable response variable from the data set variable and treats all other variables as predictors. If you want to use a different response variable, you can:
   • Use the list to select another variable from the data set variable.
   • Select a separate workspace variable by clicking the From workspace option button and then selecting a variable from the list.
6  Under Predictors, add or remove predictors using the check boxes. Add or remove all predictors by clicking Add All or Remove All. You can also add or remove multiple predictors by selecting them in the table, and then clicking Add N or Remove N, where N is the number of selected predictors. The Add All and Remove All buttons change to Add N and Remove N when you select multiple predictors.
7  To accept the default validation scheme and continue, click Start Session. The default validation option is 5-fold cross-validation, which protects against overfitting.

Tip  If you have a large data set you might want to switch to holdout validation. To learn more, see “Choose Validation Scheme” on page 23-19.
Note If you prefer loading data into the app directly from the command line, you can specify the predictor data, response variable, and validation type to use in Classification Learner in the command line call to classificationLearner. For more information, see Classification Learner. For next steps, see “Train Classification Models in Classification Learner App” on page 23-10.
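For example, the following call (a minimal sketch using the Fisher iris sample data shipped with the product) opens the app with the data and response preselected.

fishertable = readtable('fisheriris.csv');      % sample data set
classificationLearner(fishertable,'Species')    % open Classification Learner with this data

You can also pass a validation option in the same call; see the classificationLearner reference page for the supported name-value arguments.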
Import Data from File

1  On the Learn tab, in the File section, select New Session > From File.
2  Select a file type in the list, such as spreadsheets, text files, or comma separated values (.csv) files, or select All Files to browse for other file types such as .dat.
Example Data for Classification

To get started using Classification Learner, try the following example data sets.

Fisher Iris
  Size: Number of predictors: 4. Number of observations: 150. Number of classes: 3. Response: Species.
  Description: Measurements from three species of iris. Try to classify the species. For a step-by-step example, see “Train Decision Trees Using Classification Learner App” on page 23-93.
  Create a table from the .csv file:
    fishertable = readtable('fisheriris.csv');

Credit Rating
  Size: Number of predictors: 6. Number of observations: 3932. Number of classes: 7. Response: Rating.
  Description: Financial ratios and industry sectors information for a list of corporate customers. The response variable consists of credit ratings (AAA, AA, A, BBB, BB, B, CCC) assigned by a rating agency.
  Create a table from the CreditRating_Historical.dat file:
    creditrating = readtable('CreditRating_Historical.dat');

Cars
  Size: Number of predictors: 7. Number of observations: 100. Number of classes: 7. Response: Origin.
  Description: Measurements of cars, in 1970, 1976, and 1982. Try to classify the country of origin.
  Create a table from variables in the carsmall.mat file:
    load carsmall
    cartable = table(Acceleration, Cylinders, Displacement, ...
        Horsepower, Model_Year, MPG, Weight, Origin);

Arrhythmia
  Size: Number of predictors: 279. Number of observations: 452. Number of classes: 16. Response: Class (Y).
  Description: Patient information and response variables that indicate the presence and absence of cardiac arrhythmia. Misclassifying a patient as "normal" has more severe consequences than false positives classified as “has arrhythmia.”
  Create a table from the .mat file:
    load arrhythmia
    Arrhythmia = array2table(X);
    Arrhythmia.Class = categorical(Y);

Ovarian Cancer
  Size: Number of predictors: 4000. Number of observations: 216. Number of classes: 2. Response: Group.
  Description: Ovarian cancer data generated using the WCX2 protein array. Includes 95 controls and 121 ovarian cancers.
  Create a table from the .mat file:
    load ovariancancer
    ovariancancer = array2table(obs);
    ovariancancer.Group = categorical(grp);

Ionosphere
  Size: Number of predictors: 34. Number of observations: 351. Number of classes: 2. Response: Group (Y).
  Description: Signals from a phased array of 16 high-frequency antennas. Good (“g”) returned radar signals are those showing evidence of some type of structure in the ionosphere. Bad (“b”) signals are those that pass through the ionosphere.
  Create a table from the .mat file:
    load ionosphere
    ionosphere = array2table(X);
    ionosphere.Group = Y;
Choose Validation Scheme
Choose a validation method to examine the predictive accuracy of the fitted models. Validation estimates model performance on new data compared to the training data, and helps you choose the best model. Validation protects against overfitting. Choose a validation scheme before training any models, so that you can compare all the models in your session using the same validation scheme.

Tip  Try the default validation scheme and click Start Session to continue. The default option is 5-fold cross-validation, which protects against overfitting. If you have a large data set and training models takes too long using cross-validation, reimport your data and try the faster holdout validation instead.

The following options assume that no data is reserved for testing, which is true by default.

• Cross-Validation: Select a number of folds (or divisions) to partition the data set. If you choose k folds, then the app:
Partitions the data into k disjoint sets or folds
2
For each validation fold:
3
a
Trains a model using the training-fold observations (observations not in the validation fold)
b
Assesses model performance using validation-fold data
Calculates the average validation error over all folds
This method gives a good estimate of the predictive accuracy of the final model trained with all the data. It requires multiple fits but makes efficient use of all the data, so it is recommended for small data sets. • Holdout Validation: Select a percentage of the data to use as a validation set. The app trains a model on the training set and assesses its performance with the validation set. The model used for validation is based on only a portion of the data, so Holdout Validation is recommended only for large data sets. The final model is trained with the full data set. • Resubstitution Validation: No protection against overfitting. The app uses all of the data for training and computes the error rate on the same data. Without any separate validation data, you get an unrealistic estimate of the model’s performance on new data. That is, the training sample accuracy is likely to be unrealistically high, and the predictive accuracy is likely to be lower. To help you avoid overfitting to the training data, choose another validation scheme instead. Note The validation scheme only affects the way that Classification Learner computes validation metrics. The final model is always trained using the full data set, excluding any data reserved for testing. All the classification models you train after selecting data use the same validation scheme that you select in this dialog box. You can compare all the models in your session using the same validation scheme. To change the validation selection and train new models, you can select data again, but you lose any trained models. The app warns you that importing data starts a new session. Save any trained models you want to keep to the workspace, and then import the data. 23-20
Select Data for Classification or Open Saved App Session
For next steps training models, see “Train Classification Models in Classification Learner App” on page 23-10.
(Optional) Reserve Data for Testing When you import data into Classification Learner, you can specify to reserve a percentage of the data for testing. In the Test section of the New Session dialog box, click the check box to set aside a test data set. Specify the percentage of the imported data to use as a test set. If you prefer, you can still choose to import a separate test data set after starting an app session. You can use the test set to evaluate the performance of a trained model. In particular, you can check whether the validation metrics provide good estimates for the model performance on new data. For more information, see “Evaluate Test Set Model Performance” on page 23-79. For an example, see “Train Classifier Using Hyperparameter Optimization in Classification Learner App” on page 23-150. Note The app does not use test data for model training. Models exported from the app are trained on the full training and validation data, excluding any data reserved for testing.
Save and Open App Session In Classification Learner, you can save the current app session and open a previously saved app session. • To save the current app session, click Save in the File section of the Learn tab. When you first save the current session, you must specify the session file name and the file location. The Save Session option saves the current session, and the Save Session As option saves the current session to a new file. The Save Compact Session As option saves a compact version of the current app session, resulting in a smaller file size for the saved session. Note that the Save Compact Session As option permanently deletes the training data from all trained models in the current session. • To open a saved app session, click Open in the File section. In the Select File to Open dialog box, select the saved session you want to open.
See Also Related Examples •
“Train Classification Models in Classification Learner App” on page 23-10
•
“Choose Classifier Options” on page 23-22
•
“Feature Selection and Feature Transformation Using Classification Learner App” on page 2344
•
“Visualize and Assess Classifier Performance in Classification Learner” on page 23-70
•
“Export Classification Model to Predict New Data” on page 23-86
•
“Train Decision Trees Using Classification Learner App” on page 23-93
23-21
23
Classification Learner
Choose Classifier Options In this section... “Choose Classifier Type” on page 23-22 “Decision Trees” on page 23-26 “Discriminant Analysis” on page 23-28 “Logistic Regression Classifiers” on page 23-29 “Naive Bayes Classifiers” on page 23-30 “Support Vector Machines” on page 23-31 “Efficiently Trained Linear Classifiers” on page 23-33 “Nearest Neighbor Classifiers” on page 23-35 “Kernel Approximation Classifiers” on page 23-37 “Ensemble Classifiers” on page 23-38 “Neural Network Classifiers” on page 23-41
Choose Classifier Type You can use Classification Learner to automatically train a selection of different classification models on your data. Use automated training to quickly try a selection of model types, then explore promising models interactively. To get started, try these options first:
Get Started Classifier Options
Description
All Quick-To-Train
Try this option first. The app will train all the model types available for your data set that are typically fast to fit.
All Linear
Try this option if you expect linear boundaries between the classes in your data. This option fits only linear SVM, efficient linear SVM, efficient logistic regression, and linear discriminant models. The option also fits a binary GLM logistic regression model for binary class data.
All
Use this option to train all available nonoptimizable model types. Trains every type regardless of any prior trained models. Can be time-consuming.
See “Automated Classifier Training” on page 23-10. If you want to explore classifiers one at a time, or you already know what classifier type you want, you can select individual models or train a group of the same type. To see all available classifier 23-22
Choose Classifier Options
options, on the Learn tab, click the arrow in the Models section to expand the list of classifiers. The nonoptimizable model options in the Models gallery are preset starting points with different settings, suitable for a range of different classification problems. To use optimizable model options and tune model hyperparameters automatically, see “Hyperparameter Optimization in Classification Learner App” on page 23-56. For help choosing the best classifier type for your problem, see the table showing typical characteristics of different supervised learning algorithms and the MATLAB function called by each one for binary or multiclass data. Use the table as a guide for your final choice of algorithms. Decide on the tradeoff you want in speed, flexibility, and interpretability. The best classifier type depends on your data. Tip To avoid overfitting, look for a model of lower flexibility that provides sufficient accuracy. For example, look for simple models such as decision trees and discriminants that are fast and easy to interpret. If the models are not accurate enough predicting the response, choose other classifiers with higher flexibility, such as ensembles. To control flexibility, see the details for each classifier type.
23-23
23
Classification Learner
Characteristics of Classifier Types Classifier
Interpretability
Function
“Decision Trees” on page 23-26
Easy
fitctree
“Discriminant Analysis” on page Easy 23-28
fitcdiscr
“Logistic Regression Classifiers” Easy on page 23-29
fitglm (for binary GLM), fitclinear, fitcecoc (for multiclass)
“Naive Bayes Classifiers” on page 23-30
Easy
fitcnb
“Support Vector Machines” on page 23-31
Easy for linear SVM Hard for all other kernel types
fitcsvm, fitcecoc (for multiclass)
“Efficiently Trained Linear Classifiers” on page 23-33
Easy
fitclinear, fitcecoc (for multiclass)
“Nearest Neighbor Classifiers” on page 23-35
Hard
fitcknn
“Kernel Approximation Classifiers” on page 23-37
Hard
fitckernel, fitcecoc (for multiclass)
“Ensemble Classifiers” on page 23-38
Hard
fitcensemble
“Neural Network Classifiers” on Hard page 23-41
fitcnet
To read a description of each classifier in Classification Learner, switch to the details view.
23-24
Choose Classifier Options
Tip After you choose a classifier type (for example, decision trees), try training using each of the classifiers. The nonoptimizable options in the Models gallery are starting points with different settings. Try them all to see which option produces the best model with your data. For workflow instructions, see “Train Classification Models in Classification Learner App” on page 2310. Categorical Predictor Support In Classification Learner, the Models gallery shows as available the classifier types that support your selected data. Classifier
All predictors numeric
All predictors categorical
Some categorical, some numeric
Decision Trees
Yes
Yes
Yes
Discriminant Analysis
Yes
No
No
Logistic Regression
Yes
Yes
Yes
Naive Bayes
Yes
Yes
Yes
SVM
Yes
Yes
Yes
Efficiently Trained Linear
Yes
Yes
Yes
Nearest Neighbor
Euclidean distance only Hamming distance only No
Kernel Approximation
Yes
Yes
Yes
23-25
23
Classification Learner
Classifier
All predictors numeric
All predictors categorical
Some categorical, some numeric
Ensembles
Yes
Yes, except Subspace Discriminant
Yes, except any Subspace
Neural Networks
Yes
Yes
Yes
Decision Trees Decision trees are easy to interpret, fast for fitting and prediction, and low on memory usage, but they can have low predictive accuracy. Try to grow simpler trees to prevent overfitting. Control the depth with the Maximum number of splits setting. Tip Model flexibility increases with the Maximum number of splits setting. Classifier Type
Interpretability
Model Flexibility
Coarse Tree
Easy
Low Few leaves to make coarse distinctions between classes (maximum number of splits is 4).
Medium Tree
Easy
Medium Medium number of leaves for finer distinctions between classes (maximum number of splits is 20).
Fine Tree
Easy
High Many leaves to make many fine distinctions between classes (maximum number of splits is 100).
Tip In the Models gallery, click All Trees to try each of the nonoptimizable decision tree options. Train them all to see which settings produce the best model with your data. Select the best model in the Models pane. To try to improve your model, try feature selection, and then try changing some advanced options. You train classification trees to predict responses to data. To predict a response, follow the decisions in the tree from the root (beginning) node down to a leaf node. The leaf node contains the response. Statistics and Machine Learning Toolbox trees are binary. Each step in a prediction involves checking the value of one predictor (variable). For example, here is a simple classification tree:
23-26
Choose Classifier Options
This tree predicts classifications based on two predictors, x1 and x2. To predict, start at the top node. At each decision, check the values of the predictors to decide which branch to follow. When the branches reach a leaf node, the data is classified either as type 0 or 1. You can visualize your decision tree model by exporting the model from the app, and then entering: view(trainedModel.ClassificationTree,"Mode","graph")
The figure shows an example fine tree trained with the fisheriris data.
Tip For an example, see “Train Decision Trees Using Classification Learner App” on page 23-93.
23-27
23
Classification Learner
Tree Model Hyperparameter Options Classification trees in Classification Learner use the fitctree function. You can set these options: • Maximum number of splits Specify the maximum number of splits or branch points to control the depth of your tree. When you grow a decision tree, consider its simplicity and predictive power. To change the number of splits, click the buttons or enter a positive integer value in the Maximum number of splits box. • A fine tree with many leaves is usually highly accurate on the training data. However, the tree might not show comparable accuracy on an independent test set. A leafy tree tends to overtrain, and its validation accuracy is often far lower than its training (or resubstitution) accuracy. • In contrast, a coarse tree does not attain high training accuracy. But a coarse tree can be more robust in that its training accuracy can approach that of a representative test set. Also, a coarse tree is easy to interpret. • Split criterion Specify the split criterion measure for deciding when to split nodes. Try each of the three settings to see if they improve the model with your data. Split criterion options are Gini's diversity index, Twoing rule, or Maximum deviance reduction (also known as cross entropy). The classification tree tries to optimize to pure nodes containing only one class. Gini's diversity index (the default) and the deviance criterion measure node impurity. The twoing rule is a different measure for deciding how to split a node, where maximizing the twoing rule expression increases node purity. For details of these split criteria, see ClassificationTree “More About” on page 35-819. • Surrogate decision splits — Only for missing data. Specify surrogate use for decision splits. If you have data with missing values, use surrogate splits to improve the accuracy of predictions. When you set Surrogate decision splits to On, the classification tree finds at most 10 surrogate splits at each branch node. To change the number, click the buttons or enter a positive integer value in the Maximum surrogates per node box. When you set Surrogate decision splits to Find All, the classification tree finds all surrogate splits at each branch node. The Find All setting can use considerable time and memory. Alternatively, you can let the app choose some of these model options automatically by using hyperparameter optimization. See “Hyperparameter Optimization in Classification Learner App” on page 23-56.
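At the command line, these settings map onto name-value arguments of fitctree. The following minimal sketch is illustrative only and assumes a predictor table tbl with response variable Y (hypothetical names).

Mdl = fitctree(tbl,'Y', ...
    'MaxNumSplits',20, ...           % maximum number of splits (controls tree depth)
    'SplitCriterion','deviance', ... % 'gdi' (default), 'twoing', or 'deviance'
    'Surrogate','on');               % find surrogate splits for data with missing values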
Discriminant Analysis
23-28
Choose Classifier Options
Discriminant analysis is a popular first classification algorithm to try because it is fast, accurate and easy to interpret. Discriminant analysis is good for wide datasets. Discriminant analysis assumes that different classes generate data based on different Gaussian distributions. To train a classifier, the fitting function estimates the parameters of a Gaussian distribution for each class. Classifier Type
Interpretability
Model Flexibility
Linear Discriminant
Easy
Low Creates linear boundaries between classes.
Quadratic Discriminant
Easy
Low Creates nonlinear boundaries between classes (ellipse, parabola or hyperbola).
Discriminant Model Hyperparameter Options Discriminant analysis in Classification Learner uses the fitcdiscr function. For both linear and quadratic discriminants, you can change the Covariance structure option. If you have predictors with zero variance or if any of the covariance matrices of your predictors are singular, training can fail using the default, Full covariance structure. If training fails, select the Diagonal covariance structure instead. Alternatively, you can let the app choose some of these model options automatically by using hyperparameter optimization. See “Hyperparameter Optimization in Classification Learner App” on page 23-56.
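At the command line, the covariance structure corresponds to the DiscrimType name-value argument of fitcdiscr. The following minimal sketch is illustrative only and assumes a predictor table tbl with response variable Y (hypothetical names).

Mdl = fitcdiscr(tbl,'Y','DiscrimType','diagLinear'); % diagonal covariance; use 'linear' or 'quadratic' for a full covariance structure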
Logistic Regression Classifiers Logistic regression is a popular classification algorithm to try because it is easy to interpret. These classifiers model the class probabilities as a function of the linear combination of predictors. Classifier Type
Interpretability
Binary GLM Logistic Regression Easy was Logistic Regression
Efficient Logistic Regression
Easy
Model Flexibility Low You cannot change any parameters to control model flexibility. Medium You can change the hyperparameter settings for solver, lambda, and beta tolerance.
Binary GLM logistic regression in Classification Learner uses the fitglm function. Efficient logistic regression uses the fitclinear function for binary class data, and fitcecoc for multiclass data. fitclinear and fitcecoc use techniques that reduce the training computation time at the cost of 23-29
23
Classification Learner
some accuracy. When training on data with many predictors or many observations, consider using efficient logistic regression. Logistic Regression Hyperparameter Options You cannot set any hyperparameter options for the binary GLM classifier. For efficient logistic regression, you can set these options: • Solver Specify the objective function minimization technique. The Auto setting uses the BFGS solver. If the model has more than 100 predictors, the Auto setting uses the SGD solver. Note that the Dual SGD solver setting is not available for the efficient logistic regression classifier. For more information on solvers, see “Solver” on page 35-0 . • Regularization strength (lambda) Specify the regularization strength parameter. The Auto setting sets lambda equal to 1/n, where n is the number of observations in the training sample (or the number of in-fold observations, if you use cross-validation). For more information on the regularization strength, see “Lambda” on page 35-0 . • Beta tolerance Specify the relative tolerance on the linear coefficients and bias term, which affects when optimization terminates. The default value is 0.0001. For more information on the beta tolerance, see “BetaTolerance” on page 35-0 .
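At the command line, an efficient logistic regression model corresponds to fitclinear with a logistic learner. The following minimal sketch is illustrative only and assumes a predictor table tbl with a two-class response variable Y (hypothetical names); the parameter values shown are arbitrary examples.

Mdl = fitclinear(tbl,'Y', ...
    'Learner','logistic', ... % logistic regression rather than the default SVM learner
    'Lambda',1e-4, ...        % regularization strength
    'BetaTolerance',1e-4);    % relative tolerance on the linear coefficients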
Naive Bayes Classifiers

Naive Bayes classifiers are easy to interpret and useful for multiclass classification. The naive Bayes algorithm leverages Bayes' theorem and makes the assumption that predictors are conditionally independent, given the class. Use these classifiers if this independence assumption is valid for predictors in your data. However, the algorithm still appears to work well when the independence assumption is not valid. For kernel naive Bayes classifiers, you can control the kernel smoother type with the Kernel type setting, and control the kernel smoothing density support with the Support setting.

• Gaussian Naive Bayes — Interpretability: Easy. Model flexibility: Low. You cannot change any parameters to control model flexibility.
• Kernel Naive Bayes — Interpretability: Easy. Model flexibility: Medium. You can change settings for Kernel type and Support to control how the classifier models predictor distributions.
Naive Bayes in Classification Learner uses the fitcnb function.

Naive Bayes Model Hyperparameter Options

For kernel naive Bayes classifiers, you can set these options:

• Kernel type — Specify the kernel smoother type. Try setting each of these options to see if they improve the model with your data. Kernel type options are Gaussian, Box, Epanechnikov, or Triangle.
• Support — Specify the kernel smoothing density support. Try setting each of these options to see if they improve the model with your data. Support options are Unbounded (all real values) or Positive (all positive real values).
• Standardize data — Specify whether to standardize the numeric predictors. If predictors have widely different scales, standardizing can improve the fit.

Alternatively, you can let the app choose some of these model options automatically by using hyperparameter optimization. See “Hyperparameter Optimization in Classification Learner App” on page 23-56.

For next steps training models, see “Train Classification Models in Classification Learner App” on page 23-10.
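For reference, a minimal command-line sketch of a kernel naive Bayes classifier with fitcnb; X and Y are hypothetical predictor and response variables:

nbMdl = fitcnb(X,Y, ...
    'DistributionNames','kernel', ...  % kernel naive Bayes
    'Kernel','box', ...                % kernel smoother type
    'Support','positive');             % kernel smoothing density support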
Support Vector Machines

In Classification Learner, you can train SVMs when your data has two or more classes.

• Linear SVM — Interpretability: Easy. Model flexibility: Low. Makes a simple linear separation between classes.
• Quadratic SVM — Interpretability: Hard. Model flexibility: Medium.
• Cubic SVM — Interpretability: Hard. Model flexibility: Medium.
• Fine Gaussian SVM — Interpretability: Hard. Model flexibility: High — decreases with kernel scale setting. Makes finely detailed distinctions between classes, with kernel scale set to sqrt(P)/4.
• Medium Gaussian SVM — Interpretability: Hard. Model flexibility: Medium. Medium distinctions, with kernel scale set to sqrt(P).
• Coarse Gaussian SVM — Interpretability: Hard. Model flexibility: Low. Makes coarse distinctions between classes, with kernel scale set to sqrt(P)*4, where P is the number of predictors.
Tip Try training each of the nonoptimizable support vector machine options in the Models gallery. Train them all to see which settings produce the best model with your data. Select the best model in the Models pane. To try to improve your model, try feature selection, and then try changing some advanced options.

An SVM classifies data by finding the best hyperplane that separates data points of one class from those of the other class. The best hyperplane for an SVM means the one with the largest margin between the two classes. Margin means the maximal width of the slab parallel to the hyperplane that has no interior data points. The support vectors are the data points that are closest to the separating hyperplane; these points are on the boundary of the slab. The following figure illustrates these definitions, with + indicating data points of type 1, and – indicating data points of type –1.
SVMs can also use a soft margin, meaning a hyperplane that separates many, but not all data points. For an example, see “Train Support Vector Machines Using Classification Learner App” on page 23-111.

SVM Model Hyperparameter Options

If you have exactly two classes, Classification Learner uses the fitcsvm function to train the classifier. If you have more than two classes, the app uses the fitcecoc function to reduce the multiclass classification problem to a set of binary classification subproblems, with one SVM learner for each subproblem. To examine the code for the binary and multiclass classifier types, you can generate code from your trained classifiers in the app. You can set these options in the app:
• Kernel function — Specify the Kernel function to compute the classifier.
• Linear kernel, easiest to interpret
• Gaussian or Radial Basis Function (RBF) kernel
• Quadratic
• Cubic
• Box constraint level — Specify the box constraint to keep the allowable values of the Lagrange multipliers in a box, a bounded region. To tune your SVM classifier, try increasing the box constraint level. Click the buttons or enter a positive scalar value in the Box constraint level box. Increasing the box constraint level can decrease the number of support vectors, but also can increase training time. The Box Constraint parameter is the soft-margin penalty known as C in the primal equations, and is a hard “box” constraint in the dual equations.
• Kernel scale mode — Specify manual kernel scaling if desired. When you set Kernel scale mode to Auto, then the software uses a heuristic procedure to select the scale value. The heuristic procedure uses subsampling. Therefore, to reproduce results, set a random number seed using rng before training the classifier. When you set Kernel scale mode to Manual, you can specify a value. Click the buttons or enter a positive scalar value in the Manual kernel scale box. The software divides all elements of the predictor matrix by the value of the kernel scale. Then, the software applies the appropriate kernel norm to compute the Gram matrix.
Tip Model flexibility decreases with the kernel scale setting.
• Multiclass coding — Only for data with 3 or more classes. This method reduces the multiclass classification problem to a set of binary classification subproblems, with one SVM learner for each subproblem. One-vs-One trains one learner for each pair of classes. It learns to distinguish one class from the other. One-vs-All trains one learner for each class. It learns to distinguish one class from all others.
• Standardize data — Specify whether to scale each coordinate distance. If predictors have widely different scales, standardizing can improve the fit.

Alternatively, you can let the app choose some of these model options automatically by using hyperparameter optimization. See “Hyperparameter Optimization in Classification Learner App” on page 23-56.
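For reference, a hedged command-line sketch of the corresponding fits; X and Y are hypothetical. For two classes the app uses fitcsvm, and for three or more classes it wraps an SVM template in fitcecoc:

% Binary SVM with a Gaussian kernel and automatic (heuristic) kernel scale
rng(1)                                % for reproducibility of the kernel scale heuristic
svmMdl = fitcsvm(X,Y, ...
    'KernelFunction','gaussian', ...
    'BoxConstraint',1, ...
    'KernelScale','auto', ...
    'Standardize',true);

% Multiclass: one SVM learner per binary subproblem
t = templateSVM('KernelFunction','gaussian','KernelScale','auto','Standardize',true);
ecocMdl = fitcecoc(X,Y,'Learners',t,'Coding','onevsone');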
Efficiently Trained Linear Classifiers
The efficiently trained linear classifiers use techniques that reduce the training computation time at the cost of some accuracy. The available efficiently trained models are logistic regression and support vector machines (SVM). When training on data with many predictors or many observations, consider using efficiently trained linear classifiers instead of the existing binary GLM logistic regression or linear SVM preset models.

• Efficient Logistic Regression — Interpretability: Easy. Model flexibility: Medium. You can change the hyperparameter settings for solver, lambda, and beta tolerance.
• Efficient Linear SVM — Interpretability: Easy. Model flexibility: Medium. You can change the hyperparameter settings for solver, lambda, and beta tolerance.
Hyperparameter Options for Efficiently Trained Linear Classifiers

The efficiently trained linear classifiers use the fitclinear function for binary class data, and the fitcecoc function for multiclass data. You can set the following options:

• Solver — Specify the objective function minimization technique. The Auto setting uses the BFGS solver. If the model has more than 100 predictors, the Auto setting uses the SGD solver for efficient logistic regression, and the Dual SGD solver for efficient linear SVM. Note that the Dual SGD solver setting is not available for the efficient logistic regression classifier. For more information on solvers, see Solver.
• Regularization — Specify the complexity penalty type, either a lasso (L1) penalty or a ridge (L2) penalty. Depending on the other hyperparameter values, the available regularization options are Lasso, Ridge, and Auto. If you set this option to Auto, the software selects a lasso penalty when the model uses a SpaRSA solver, and a ridge penalty otherwise. For more information, see Regularization.
• Regularization strength (lambda) — Specify the regularization strength parameter. The Auto setting sets lambda equal to 1/n, where n is the number of observations in the training sample (or the number of in-fold observations, if you use cross-validation). For more information on the regularization strength, see Lambda.
• Beta tolerance — Specify the relative tolerance on the linear coefficients and bias term, which affects when optimization terminates. The default value is 0.0001. For more information on the beta tolerance, see BetaTolerance.
• Multiclass coding — Specify the method for reducing the multiclass problem to a set of binary subproblems, with one linear learner for each subproblem. This value is applicable only for data with more than two classes.
• One-vs-One trains one learner for each pair of classes. This method learns to distinguish one class from the other.
• One-vs-All trains one learner for each class. This method learns to distinguish one class from all others.
For more information, see Coding.

Alternatively, you can let the app choose some of these model options automatically by using hyperparameter optimization. See “Hyperparameter Optimization in Classification Learner App” on page 23-56.
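For reference, a hedged sketch of an efficiently trained multiclass linear SVM at the command line, using a linear learner template with fitcecoc; X and Y are hypothetical:

t = templateLinear('Learner','svm','Regularization','ridge','Lambda','auto');  % auto lambda is 1/n
effMdl = fitcecoc(X,Y,'Learners',t,'Coding','onevsall');                       % one learner per class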
Nearest Neighbor Classifiers

Nearest neighbor classifiers typically have good predictive accuracy in low dimensions, but might not in high dimensions. They have high memory usage, and are not easy to interpret.

Tip Model flexibility decreases with the Number of neighbors setting.

• Fine KNN — Interpretability: Hard. Finely detailed distinctions between classes. The number of neighbors is set to 1.
• Medium KNN — Interpretability: Hard. Medium distinctions between classes. The number of neighbors is set to 10.
• Coarse KNN — Interpretability: Hard. Coarse distinctions between classes. The number of neighbors is set to 100.
• Cosine KNN — Interpretability: Hard. Medium distinctions between classes, using a Cosine distance metric. The number of neighbors is set to 10.
• Cubic KNN — Interpretability: Hard. Medium distinctions between classes, using a cubic distance metric. The number of neighbors is set to 10.
• Weighted KNN — Interpretability: Hard. Medium distinctions between classes, using a distance weight. The number of neighbors is set to 10.
Tip Try training each of the nonoptimizable nearest neighbor options in the Models gallery. Train them all to see which settings produce the best model with your data. Select the best model in the Models pane. To try to improve your model, try feature selection, and then (optionally) try changing some advanced options.
What is k-Nearest Neighbor classification?

Categorizing query points based on their distance to points (or neighbors) in a training dataset can be a simple yet effective way of classifying new points. You can use various metrics to determine the distance. Given a set X of n points and a distance function, k-nearest neighbor (kNN) search lets you find the k closest points in X to a query point or set of points. kNN-based algorithms are widely used as benchmark machine learning rules.
For an example, see “Train Nearest Neighbor Classifiers Using Classification Learner App” on page 23-115.

KNN Model Hyperparameter Options

Nearest Neighbor classifiers in Classification Learner use the fitcknn function. You can set these options:

• Number of neighbors — Specify the number of nearest neighbors to find for classifying each point when predicting. Specify a fine (low number) or coarse classifier (high number) by changing the number of neighbors. For example, a fine KNN uses one neighbor, and a coarse KNN uses 100. Many neighbors can be time consuming to fit.
• Distance metric — You can use various metrics to determine the distance to points. For definitions, see the class ClassificationKNN.
• Distance weight — Specify the distance weighting function. You can choose Equal (no weights), Inverse (weight is 1/distance), or Squared Inverse (weight is 1/distance^2).
• Standardize data — Specify whether to scale each coordinate distance. If predictors have widely different scales, standardizing can improve the fit.

Alternatively, you can let the app choose some of these model options automatically by using hyperparameter optimization. See “Hyperparameter Optimization in Classification Learner App” on page 23-56.
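For reference, a minimal command-line sketch with fitcknn, roughly matching a weighted medium KNN preset; X and Y are hypothetical:

knnMdl = fitcknn(X,Y, ...
    'NumNeighbors',10, ...
    'Distance','euclidean', ...
    'DistanceWeight','squaredinverse', ...  % weight is 1/distance^2
    'Standardize',true);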
Kernel Approximation Classifiers

In Classification Learner, you can use kernel approximation classifiers to perform nonlinear classification of data with many observations. For large in-memory data, kernel classifiers tend to train and predict faster than SVM classifiers with Gaussian kernels.

The Gaussian kernel classification models map predictors in a low-dimensional space into a high-dimensional space, and then fit a linear model to the transformed predictors in the high-dimensional space. Choose between fitting an SVM linear model and fitting a logistic regression linear model in the expanded space.
Tip In the Models gallery, click All Kernels to try each of the preset kernel approximation options and see which settings produce the best model with your data. Select the best model in the Models pane, and try to improve that model by using feature selection and changing some advanced options.

• SVM Kernel — Interpretability: Hard. Model flexibility: Medium — increases as the Kernel scale setting decreases.
• Logistic Regression Kernel — Interpretability: Hard. Model flexibility: Medium — increases as the Kernel scale setting decreases.
For an example, see “Train Kernel Approximation Classifiers Using Classification Learner App” on page 23-119.

Kernel Model Hyperparameter Options

If you have exactly two classes, Classification Learner uses the fitckernel function to train kernel classifiers. If you have more than two classes, the app uses the fitcecoc function to reduce the multiclass classification problem to a set of binary classification subproblems, with one kernel learner for each subproblem. You can set these options:
• Learner — Specify the linear classification model type to fit in the expanded space, either SVM or Logistic Regression. SVM kernel classifiers use a hinge loss function during model fitting, whereas logistic regression kernel classifiers use a deviance (logistic) loss.
• Number of expansion dimensions — Specify the number of dimensions in the expanded space.
• When you set this option to Auto, the software sets the number of dimensions to 2.^ceil(min(log2(p)+5,15)), where p is the number of predictors.
• When you set this option to Manual, you can specify a value by clicking the buttons or entering a positive scalar value in the box.
• Regularization strength (Lambda) — Specify the ridge (L2) regularization penalty term. When you use an SVM learner, the box constraint C and the regularization term strength λ are related by C = 1/(λn), where n is the number of observations.
• When you set this option to Auto, the software sets the regularization strength to 1/n, where n is the number of observations.
• When you set this option to Manual, you can specify a value by clicking the buttons or entering a positive scalar value in the box.
• Kernel scale — Specify the kernel scaling. The software uses this value to obtain a random basis for the random feature expansion. For more details, see “Random Feature Expansion” on page 35-8744.
• When you set this option to Auto, the software uses a heuristic procedure to select the scale value. The heuristic procedure uses subsampling. Therefore, to reproduce results, set a random number seed using rng before training the classifier.
• When you set this option to Manual, you can specify a value by clicking the buttons or entering a positive scalar value in the box.
• Multiclass coding — Specify the method for reducing the multiclass problem to a set of binary subproblems, with one kernel learner for each subproblem. This value is applicable only for data with more than two classes.
• One-vs-One trains one learner for each pair of classes. This method learns to distinguish one class from the other.
• One-vs-All trains one learner for each class. This method learns to distinguish one class from all others.
• Standardize data — Specify whether to standardize the numeric predictors. If predictors have widely different scales, standardizing can improve the fit.
• Iteration limit — Specify the maximum number of training iterations.

Alternatively, you can let the app choose some of these model options automatically by using hyperparameter optimization. See “Hyperparameter Optimization in Classification Learner App” on page 23-56.
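For reference, a hedged command-line sketch of a binary SVM kernel approximation model using fitckernel; X and Y are hypothetical, and fitckernel expects a numeric predictor matrix:

rng(1)                                    % the kernel scale heuristic uses subsampling
kernMdl = fitckernel(X,Y, ...
    'Learner','svm', ...
    'NumExpansionDimensions','auto', ...  % 2.^ceil(min(log2(p)+5,15)) dimensions
    'KernelScale','auto', ...
    'Lambda','auto');                     % regularization strength of 1/n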
Ensemble Classifiers

Ensemble classifiers meld results from many weak learners into one high-quality ensemble model. Qualities depend on the choice of algorithm.

Tip Model flexibility increases with the Number of learners setting.
All ensemble classifiers tend to be slow to fit because they often need many learners.

• Boosted Trees — Interpretability: Hard. Ensemble method: AdaBoost, with Decision Tree learners. Model flexibility: Medium to high — increases with Number of learners or Maximum number of splits setting. Tip: Boosted trees can usually do better than bagged, but might require parameter tuning and more learners.
• Bagged Trees — Interpretability: Hard. Ensemble method: Random forest (Bag), with Decision Tree learners. Model flexibility: High — increases with Number of learners setting. Tip: Try this classifier first.
• Subspace Discriminant — Interpretability: Hard. Ensemble method: Subspace, with Discriminant learners. Model flexibility: Medium — increases with Number of learners setting. Good for many predictors.
• Subspace KNN — Interpretability: Hard. Ensemble method: Subspace, with Nearest Neighbor learners. Model flexibility: Medium — increases with Number of learners setting. Good for many predictors.
• RUSBoost Trees — Interpretability: Hard. Ensemble method: RUSBoost, with Decision Tree learners. Model flexibility: Medium — increases with Number of learners or Maximum number of splits setting. Good for skewed data (with many more observations of 1 class).
• GentleBoost or LogitBoost — not available in the Model Type gallery; if you have 2 class data, select manually. Interpretability: Hard. Ensemble method: GentleBoost or LogitBoost, with Decision Tree learners (choose Boosted Trees and change to GentleBoost method). Model flexibility: Medium — increases with Number of learners or Maximum number of splits setting. For binary classification only.
Bagged trees use Breiman's 'random forest' algorithm. For reference, see Breiman, L. Random Forests. Machine Learning 45, pp. 5–32, 2001.

Tips
• Try bagged trees first. Boosted trees can usually do better but might require searching many parameter values, which is time-consuming.
• Try training each of the nonoptimizable ensemble classifier options in the Models gallery. Train them all to see which settings produce the best model with your data. Select the best model in the Models pane. To try to improve your model, try feature selection, PCA, and then (optionally) try changing some advanced options.
• For boosting ensemble methods, you can get fine detail with either deeper trees or larger numbers of shallow trees. As with single tree classifiers, deep trees can cause overfitting. You need to experiment to choose the best tree depth for the trees in the ensemble, in order to trade off data fit with tree complexity. Use the Number of learners and Maximum number of splits settings.

For an example, see “Train Ensemble Classifiers Using Classification Learner App” on page 23-124.

Ensemble Model Hyperparameter Options

Ensemble classifiers in Classification Learner use the fitcensemble function. You can set these options:

• For help choosing Ensemble method and Learner type, see the Ensemble table. Try the presets first.
• Maximum number of splits — For boosting ensemble methods, specify the maximum number of splits or branch points to control the depth of your tree learners. Many branches tend to overfit, and simpler trees can be more robust and easy to interpret. Experiment to choose the best tree depth for the trees in the ensemble.
• Number of learners — Try changing the number of learners to see if you can improve the model. Many learners can produce high accuracy, but can be time consuming to fit. Start with a few dozen learners, and then inspect the performance. An ensemble with good predictive power can need a few hundred learners.
• Learning rate — Specify the learning rate for shrinkage. If you set the learning rate to less than 1, the ensemble requires more learning iterations but often achieves better accuracy. 0.1 is a popular choice.
• Subspace dimension — For subspace ensembles, specify the number of predictors to sample in each learner. The app chooses a random subset of the predictors for each learner. The subsets chosen by different learners are independent.
• Number of predictors to sample — Specify the number of predictors to select at random for each split in the tree learners.
• When you set this option to Select All, the software uses all available predictors.
• When you set this option to Set Limit, you can specify a value by clicking the buttons or entering a positive integer value in the box.

Alternatively, you can let the app choose some of these model options automatically by using hyperparameter optimization. See “Hyperparameter Optimization in Classification Learner App” on page 23-56.
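For reference, a hedged command-line sketch with fitcensemble and a tree template; X and Y are hypothetical, and the boosted example assumes a two-class response:

% Bagged trees (random forest style ensemble)
t = templateTree('MaxNumSplits',20);
bagMdl = fitcensemble(X,Y,'Method','Bag','NumLearningCycles',100,'Learners',t);

% Boosted trees for binary data, with a shrinkage learning rate of 0.1
boostMdl = fitcensemble(X,Y,'Method','GentleBoost', ...
    'NumLearningCycles',100,'LearnRate',0.1,'Learners',t);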
Neural Network Classifiers

Neural network models typically have good predictive accuracy and can be used for multiclass classification; however, they are not easy to interpret. Model flexibility increases with the size and number of fully connected layers in the neural network.
Tip In the Models gallery, click All Neural Networks to try each of the preset neural network options and see which settings produce the best model with your data. Select the best model in the Models pane, and try to improve that model by using feature selection and changing some advanced options.

• Narrow Neural Network — Interpretability: Hard. Model flexibility: Medium — increases with the First layer size setting.
• Medium Neural Network — Interpretability: Hard. Model flexibility: Medium — increases with the First layer size setting.
• Wide Neural Network — Interpretability: Hard. Model flexibility: Medium — increases with the First layer size setting.
• Bilayered Neural Network — Interpretability: Hard. Model flexibility: High — increases with the First layer size and Second layer size settings.
• Trilayered Neural Network — Interpretability: Hard. Model flexibility: High — increases with the First layer size, Second layer size, and Third layer size settings.
Each model is a feedforward, fully connected neural network for classification. The first fully connected layer of the neural network has a connection from the network input (predictor data), and each subsequent layer has a connection from the previous layer. Each fully connected layer multiplies the input by a weight matrix and then adds a bias vector. An activation function follows each fully connected layer. The final fully connected layer and the subsequent softmax activation function produce the network's output, namely classification scores (posterior probabilities) and predicted labels. For more information, see “Neural Network Structure” on page 35-2456.

For an example, see “Train Neural Network Classifiers Using Classification Learner App” on page 23-138.

Neural Network Model Hyperparameter Options

Neural network classifiers in Classification Learner use the fitcnet function. You can set these options:

• Number of fully connected layers — Specify the number of fully connected layers in the neural network, excluding the final fully connected layer for classification. You can choose a maximum of three fully connected layers.
• First layer size, Second layer size, and Third layer size — Specify the size of each fully connected layer, excluding the final fully connected layer. If you choose to create a neural network with multiple fully connected layers, consider specifying layers with decreasing sizes.
• Activation — Specify the activation function for all fully connected layers, excluding the final fully connected layer. The activation function for the last fully connected layer is always softmax. Choose from the following activation functions: ReLU, Tanh, None, and Sigmoid.
• Iteration limit — Specify the maximum number of training iterations.
• Regularization strength (Lambda) — Specify the ridge (L2) regularization penalty term.
• Standardize data — Specify whether to standardize the numeric predictors. If predictors have widely different scales, standardizing can improve the fit. Standardizing the data is highly recommended.

Alternatively, you can let the app choose some of these model options automatically by using hyperparameter optimization. See “Hyperparameter Optimization in Classification Learner App” on page 23-56.
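For reference, a minimal command-line sketch with fitcnet, roughly corresponding to a bilayered network preset; X and Y are hypothetical:

netMdl = fitcnet(X,Y, ...
    'LayerSizes',[25 10], ...    % two fully connected layers before the output layer
    'Activations','relu', ...
    'Lambda',0.0001, ...         % ridge (L2) regularization strength
    'IterationLimit',1000, ...
    'Standardize',true);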
See Also

Related Examples
• “Train Classification Models in Classification Learner App” on page 23-10
• “Select Data for Classification or Open Saved App Session” on page 23-17
• “Feature Selection and Feature Transformation Using Classification Learner App” on page 23-44
• “Visualize and Assess Classifier Performance in Classification Learner” on page 23-70
• “Export Classification Model to Predict New Data” on page 23-86
• “Train Decision Trees Using Classification Learner App” on page 23-93
Feature Selection and Feature Transformation Using Classification Learner App

In this section...
“Investigate Features in the Scatter Plot” on page 23-44
“Select Features to Include” on page 23-46
“Transform Features with PCA in Classification Learner” on page 23-48
“Investigate Features in the Parallel Coordinates Plot” on page 23-48
Investigate Features in the Scatter Plot

In Classification Learner, try to identify predictors that separate classes well by plotting different pairs of predictors on the scatter plot. The plot can help you investigate features to include or exclude. You can visualize training data and misclassified points on the scatter plot.

Before you train a classifier, the scatter plot shows the data. If you have trained a classifier, the scatter plot shows model prediction results. Switch to plotting only the data by selecting Data in the Plot controls.

• Choose features to plot using the X and Y lists under Predictors.
• Look for predictors that separate classes well. For example, plotting the fisheriris data, you can see that sepal length and sepal width separate one of the classes well (setosa). You need to plot other predictors to see if you can separate the other two classes.
• Show or hide specific classes using the check boxes under Show.
• Change the stacking order of the plotted classes by selecting a class under Classes and then clicking Move to Front.
• Investigate finer details by zooming in and out and panning across the plot. To enable zooming or panning, hover the mouse over the scatter plot and click the corresponding button on the toolbar that appears above the top right of the plot.
• If you identify predictors that are not useful for separating out classes, then try using Feature Selection to remove them and train classifiers including only the most useful predictors. See “Select Features to Include” on page 23-46.

After you train a classifier, the scatter plot shows model prediction results. You can show or hide correct or incorrect results and visualize the results by class. See “Plot Classifier Results” on page 23-73.

You can export the scatter plots you create in the app to figures. See “Export Plots in Classification Learner App” on page 23-81.
Select Features to Include

In Classification Learner, you can specify different features (or predictors) to include in the model. See if you can improve models by removing features with low predictive power. If data collection is expensive or difficult, you might prefer a model that performs satisfactorily without some predictors. You can determine which important predictors to include by using different feature ranking algorithms. After you select a feature ranking algorithm, the app displays a plot of the sorted feature importance scores, where larger scores (including Infs) indicate greater feature importance. The app also displays the ranked features and their scores in a table.

To use feature ranking algorithms in Classification Learner, click Feature Selection in the Options section of the Learn tab. The app opens a Default Feature Selection tab, where you can choose a feature ranking algorithm.

• MRMR — Supported data type: Categorical and continuous features. Rank features sequentially using the “Minimum Redundancy Maximum Relevance (MRMR) Algorithm” on page 35-3172. For more information, see fscmrmr.
• Chi2 — Supported data type: Categorical and continuous features. Examine whether each predictor variable is independent of the response variable by using individual chi-square tests, and then rank features using the p-values of the chi-square test statistics. Scores correspond to –log(p). For more information, see fscchi2.
• ReliefF — Supported data type: Either all categorical or all continuous features. Rank features using the “ReliefF” on page 35-7157 algorithm with 10 nearest neighbors. This algorithm works best for estimating feature importance for distance-based supervised models that use pairwise distances between observations to predict the response. For more information, see relieff.
• ANOVA — Supported data type: Categorical and continuous features. Perform one-way analysis of variance for each predictor variable, grouped by class, and then rank features using the p-values. For each predictor variable, the app tests the hypothesis that the predictor values grouped by the response classes are drawn from populations with the same mean against the alternative hypothesis that the population means are not all the same. Scores correspond to –log(p). For more information, see anova1.
• Kruskal Wallis — Supported data type: Categorical and continuous features. Rank features using the p-values returned by the “Kruskal-Wallis Test” on page 35-4518. For each predictor variable, the app tests the hypothesis that the predictor values grouped by the response classes are drawn from populations with the same median against the alternative hypothesis that the population medians are not all the same. Scores correspond to –log(p). For more information, see kruskalwallis.
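For reference, these ranking functions can also be called directly. A hedged sketch with fscmrmr on a hypothetical predictor table tbl whose response variable is named Y:

[idx,scores] = fscmrmr(tbl,'Y');   % rank predictors by MRMR importance
bar(scores(idx))                   % sorted importance scores, similar to the app's plot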
Choose between selecting the highest ranked features and selecting individual features.

• Choose Select highest ranked features to avoid bias in validation metrics. For example, if you use a cross-validation scheme, then for each training fold, the app performs feature selection before training a model. Different folds can select different predictors as the highest ranked features.
• Choose Select individual features to include specific features in model training. If you use a cross-validation scheme, then the app uses the same features across all training folds.

When you are done selecting features, click Save and Apply. Your selections affect all draft models in the Models pane and will be applied to new draft models that you create using the gallery in the Models section of the Learn tab.

To select features for a single draft model, open and edit the model summary. Click the model in the Models pane, and then click the model Summary tab (if necessary). The Summary tab includes an editable Feature Selection section.
After you train a model, the Feature Selection section of the model Summary tab lists the features used to train the full model (that is, the model trained using training and validation data). To learn more about how Classification Learner applies feature selection to your data, generate code for your trained classifier. For more information, see “Generate MATLAB Code to Train the Model with New Data” on page 23-87. For an example using feature selection, see “Train Decision Trees Using Classification Learner App” on page 23-93.
Transform Features with PCA in Classification Learner

Use principal component analysis (PCA) to reduce the dimensionality of the predictor space. Reducing the dimensionality can create classification models in Classification Learner that help prevent overfitting. PCA linearly transforms predictors in order to remove redundant dimensions, and generates a new set of variables called principal components.

1 On the Learn tab, in the Options section, select PCA.
2 In the Default PCA Options dialog box, select the Enable PCA check box, and then click Save and Apply. The app applies the changes to all existing draft models in the Models pane and to new draft models that you create using the gallery in the Models section of the Learn tab.
3 When you next train a model using the Train All button, the pca function transforms your selected features before training the classifier.
4 By default, PCA keeps only the components that explain 95% of the variance. In the Default PCA Options dialog box, you can change the percentage of variance to explain by selecting the Explained variance value. A higher value risks overfitting, while a lower value risks removing useful dimensions.
5 If you want to limit the number of PCA components manually, select Specify number of components in the Component reduction criterion list. Select the Number of numeric components value. The number of components cannot be larger than the number of numeric predictors. PCA is not applied to categorical predictors.
You can check PCA options for trained models in the PCA section of the Summary tab. Click a trained model in the Models pane, and then click the model Summary tab (if necessary). For example:

PCA is keeping enough components to explain 95% variance.
After training, 2 components were kept.
Explained variance per component (in order): 92.5%, 5.3%, 1.7%, 0.5%
Check the explained variance percentages to decide whether to change the number of components. To learn more about how Classification Learner applies PCA to your data, generate code for your trained classifier. For more information on PCA, see the pca function.
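For reference, a hedged sketch of the equivalent command-line workflow with the pca function on a hypothetical numeric predictor matrix X, keeping enough components to explain 95% of the variance:

[coeff,score,~,~,explained] = pca(X);         % pca centers X by default
numComp = find(cumsum(explained) >= 95, 1);   % components needed to explain 95% variance
Xreduced = score(:,1:numComp);                % transformed predictors to train on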
Investigate Features in the Parallel Coordinates Plot

To investigate features to include or exclude, use the parallel coordinates plot. You can visualize high-dimensional data on a single plot to see 2-D patterns. The plot can help you understand relationships between features and identify useful predictors for separating classes. You can visualize training data and misclassified points on the parallel coordinates plot. When you plot classifier results, misclassified points have dashed lines.

1 On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Parallel Coordinates in the Validation Results group.
2 On the plot, drag the X tick labels to reorder the predictors. Changing the order can help you identify predictors that separate classes well.
3 To specify which predictors to plot, use the Predictors check boxes. A good practice is to plot a few predictors at a time. If your data has many predictors, the plot shows the first 10 predictors by default.
4 If the predictors have significantly different scales, scale the data for easier visualization. Try different options in the Scaling list:
• None displays raw data along coordinate rulers that have the same minimum and maximum limits.
• Range displays raw data along coordinate rulers that have independent minimum and maximum limits.
• Z-Score displays z-scores (with a mean of 0 and a standard deviation of 1) along each coordinate ruler.
• Zero Mean displays data centered to have a mean of 0 along each coordinate ruler.
• Unit Variance displays values scaled by standard deviation along each coordinate ruler.
• L2 Norm displays 2-norm values along each coordinate ruler.
5 If you identify predictors that are not useful for separating out classes, use Feature Selection to remove them and train classifiers including only the most useful predictors. See “Select Features to Include” on page 23-46.
The plot of the fisheriris data shows the petal length and petal width features separate the classes best.
For more information, see parallelplot. You can export the parallel coordinates plots you create in the app to figures. See “Export Plots in Classification Learner App” on page 23-81.
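For reference, a hedged sketch that reproduces a grouped parallel coordinates plot at the command line with parallelplot, using the fisheriris data mentioned above:

load fisheriris                                  % loads meas (150-by-4) and species
tbl = array2table(meas,'VariableNames',{'SepalLength','SepalWidth','PetalLength','PetalWidth'});
tbl.Species = species;
parallelplot(tbl,'GroupVariable','Species')      % one line per observation, colored by class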
See Also

Related Examples
• “Train Classification Models in Classification Learner App” on page 23-10
• “Select Data for Classification or Open Saved App Session” on page 23-17
• “Choose Classifier Options” on page 23-22
• “Visualize and Assess Classifier Performance in Classification Learner” on page 23-70
• “Export Plots in Classification Learner App” on page 23-81
• “Generate MATLAB Code to Train the Model with New Data” on page 23-87
Misclassification Costs in Classification Learner App

In this section...
“Specify Misclassification Costs” on page 23-51
“Assess Model Performance” on page 23-54
“Misclassification Costs in Exported Model and Generated Code” on page 23-55

By default, the Classification Learner app creates models that assign the same penalty to all misclassifications during training. For a given observation, the app assigns a penalty of 0 if the observation is classified correctly and a penalty of 1 if the observation is classified incorrectly. In some cases, this assignment is inappropriate. For example, suppose you want to classify patients as either healthy or sick. The cost of misclassifying a sick person as healthy might be five times the cost of misclassifying a healthy person as sick. For cases where you know the cost of misclassifying observations of one class into another, and the costs vary across the classes, specify the misclassification costs before training your models.

Note Custom misclassification costs are not supported for logistic regression models.
Specify Misclassification Costs

In the Classification Learner app, in the Options section of the Learn tab, select Costs. The app opens a dialog box that shows the default misclassification costs (cost matrix) as a table with row and column labels determined by the classes in the response variable. The rows of the table correspond to the true classes, and the columns correspond to the predicted classes. You can interpret the cost matrix in this way: the entry in row i and column j is the cost of misclassifying ith class observations into the jth class. The diagonal entries of the cost matrix must be 0, and the off-diagonal entries must be nonnegative real numbers.

You can specify your own misclassification costs in two ways: by entering values directly into the table in the dialog box or by importing a workspace variable that contains the cost values.
Note A scaled version of the cost matrix gives the same classification results (for example, confusion matrix and accuracy), but with a different total misclassification cost. That is, if CostMat is the misclassification cost matrix and a is a positive, real scalar, then a model trained with the cost matrix a*CostMat has the same confusion matrix as that model trained with CostMat.

Enter Costs Directly in Dialog Box

In the misclassification costs dialog box, double-click an entry in the table that you want to edit. Delete the value and type the correct misclassification cost for the entry. When you are done editing the table, click Save and Apply to save your changes. The changes apply to all existing draft models and to any new draft models you create using the Models gallery on the Learn tab.

Import Workspace Variable Containing Costs

In the misclassification costs dialog box, click Import from Workspace. The app opens a dialog box for importing costs from a variable in the MATLAB workspace.
From the Cost variable list, select the cost matrix or structure that contains the misclassification costs.

• Cost matrix — The matrix must contain the misclassification costs. The diagonal entries must be 0, and the off-diagonal entries must be nonnegative real numbers. By default, the app uses the class order shown in the previous misclassification costs dialog box to interpret the cost matrix values. To specify the order of the classes in the cost matrix, create a separate workspace variable containing the class names in the correct order. In the import dialog box, select the appropriate variable from the Class order in cost variable list. The workspace variable containing the class names must be a categorical vector, logical vector, numeric vector, string array, or cell array of character vectors. The class names must match (in spelling and capitalization) the class names in the response variable.
• Structure — The structure must contain the fields ClassificationCosts and ClassNames with these specifications:
• ClassificationCosts — Matrix that contains misclassification costs.
• ClassNames — Names of the classes. The order of the classes in ClassNames determines the order of the rows and columns of ClassificationCosts. The variable ClassNames must be a categorical vector, logical vector, numeric vector, string array, or cell array of character vectors. The class names must match (in spelling and capitalization) the class names in the response variable.

After specifying the cost variable and the class order in the cost variable, click Import. The app updates the table in the misclassification costs dialog box.

After you specify a cost matrix that differs from the default, the app updates the Summary tab of existing draft models. In the Summary pane, the app displays a Misclassification Costs: Custom section. For models that use the default misclassification costs, the app displays a Misclassification Costs: Default section.
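For reference, a hedged sketch of creating such a workspace structure for the healthy/sick example mentioned earlier; the class names are illustrative only:

costs = struct;
costs.ClassificationCosts = [0 1; 5 0];   % element (2,1): misclassifying sick as healthy costs 5
costs.ClassNames = {'healthy';'sick'};    % determines the row and column order of the cost matrix
% Import this structure in the app, or pass it directly to a fit function, for example:
% mdl = fitctree(tbl,'Response','Cost',costs);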
You can click Misclassification Costs: Custom to expand the section and view the table of misclassification costs.
Assess Model Performance

After specifying misclassification costs, you can train and tune your models as usual. However, using custom misclassification costs can change how you assess the performance of a model. For example, instead of choosing the model with the best accuracy, choose a model that has good accuracy and a low total misclassification cost. The total misclassification cost for a model is sum(CostMat.*ConfusionMat,"all"), where CostMat is the misclassification cost matrix and ConfusionMat is the confusion matrix for the model. The confusion matrix shows how the model classifies observations in each class. See “Check Performance Per Class in the Confusion Matrix” on page 23-74.

To inspect the total misclassification cost of a trained model, select the model in the Models pane. Click the Open selected model summary button in the upper right of the Models pane. In the Summary tab, look at the Training Results section. The total misclassification cost is listed below the accuracy of the model.
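For reference, a hedged sketch of the same computation at the command line, assuming a trained model mdl, hypothetical validation data Xval and Yval, and a cost matrix costMat whose class order matches classOrder:

[confMat,classOrder] = confusionmat(Yval,predict(mdl,Xval));  % rows: true class, columns: predicted
totalCost = sum(costMat .* confMat,'all')                     % total misclassification cost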
Misclassification Costs in Exported Model and Generated Code

After you train a model with custom misclassification costs and export it from the app, you can find the custom costs inside the exported model. For example, if you export a tree model as a structure named trainedModel, you can use the following code to access the cost matrix and the order of the classes in the matrix.

trainedModel.ClassificationTree.Cost
trainedModel.ClassificationTree.ClassNames
When you generate MATLAB code for a model trained with custom misclassification costs, the generated code includes a cost matrix that is passed to the training function through the Cost name-value argument.
See Also

Related Examples
• “Train and Compare Classifiers Using Misclassification Costs in Classification Learner App” on page 23-142
• “Train Classification Models in Classification Learner App” on page 23-10
• “Select Data for Classification or Open Saved App Session” on page 23-17
• “Feature Selection and Feature Transformation Using Classification Learner App” on page 23-44
• “Choose Classifier Options” on page 23-22
• “Visualize and Assess Classifier Performance in Classification Learner” on page 23-70
• “Export Classification Model to Predict New Data” on page 23-86
Hyperparameter Optimization in Classification Learner App

In this section...
“Select Hyperparameters to Optimize” on page 23-56
“Optimization Options” on page 23-63
“Minimum Classification Error Plot” on page 23-65
“Optimization Results” on page 23-67

After you choose a particular type of model to train, for example a decision tree or a support vector machine (SVM), you can tune your model by selecting different advanced options. For example, you can change the maximum number of splits for a decision tree or the box constraint of an SVM. Some of these options are internal parameters of the model, or hyperparameters, that can strongly affect its performance. Instead of manually selecting these options, you can use hyperparameter optimization within the Classification Learner app to automate the selection of hyperparameter values. For a given model type, the app tries different combinations of hyperparameter values by using an optimization scheme that seeks to minimize the model classification error, and returns a model with the optimized hyperparameters. You can use the resulting model as you would any other trained model.

Note Because hyperparameter optimization can lead to an overfitted model, the recommended approach is to create a separate test set before importing your data into the Classification Learner app. After you train your optimizable model, you can see how it performs on your test set. For an example, see “Train Classifier Using Hyperparameter Optimization in Classification Learner App” on page 23-150.

To perform hyperparameter optimization in Classification Learner, follow these steps:

1 Choose a model type and decide which hyperparameters to optimize. See “Select Hyperparameters to Optimize” on page 23-56. Note Hyperparameter optimization is not supported for binary GLM logistic regression models.
2 (Optional) Specify how the optimization is performed. For more information, see “Optimization Options” on page 23-63.
3 Train your model. Use the “Minimum Classification Error Plot” on page 23-65 to track the optimization results.
4 Inspect your trained model. See “Optimization Results” on page 23-67.
Select Hyperparameters to Optimize

In the Classification Learner app, in the Models section of the Learn tab, click the arrow to open the gallery. The gallery includes optimizable models that you can train using hyperparameter optimization. After you select an optimizable model, you can choose which of its hyperparameters you want to optimize. In the model Summary tab, in the Model Hyperparameters section, select Optimize check boxes for the hyperparameters that you want to optimize. Under Values, specify the fixed values for the hyperparameters that you do not want to optimize or that are not optimizable.
This table describes the hyperparameters that you can optimize for each type of model and the search range of each hyperparameter. It also includes the additional hyperparameters for which you can specify fixed values.

Optimizable Tree
• Optimizable hyperparameters:
• Maximum number of splits — The software searches among integers log-scaled in the range [1,max(2,n–1)], where n is the number of observations.
• Split criterion — The software searches among Gini's diversity index, Twoing rule, and Maximum deviance reduction.
• Additional hyperparameters: Surrogate decision splits, Maximum surrogates per node.
• Notes: For more information, see “Tree Model Hyperparameter Options” on page 23-28.

Optimizable Discriminant
• Optimizable hyperparameters:
• Discriminant type — The software searches among Linear, Quadratic, Diagonal Linear, and Diagonal Quadratic.
• Notes: The Discriminant type optimizable hyperparameter combines the preset model types (Linear Discriminant and Quadratic Discriminant) with the Covariance structure advanced option of the preset models. For more information, see “Discriminant Model Hyperparameter Options” on page 23-29.

Optimizable Naive Bayes
• Optimizable hyperparameters:
• Distribution names — The software searches between Gaussian and Kernel.
• Kernel type — The software searches among Gaussian, Box, Epanechnikov, and Triangle.
• Standardize data — The software searches between Yes and No.
• Additional hyperparameters: Support.
• Notes: The Gaussian value of the Distribution names optimizable hyperparameter specifies a Gaussian Naive Bayes model. Similarly, the Kernel Distribution names value specifies a Kernel Naive Bayes model. For more information, see “Naive Bayes Model Hyperparameter Options” on page 23-31.

Optimizable SVM
• Optimizable hyperparameters:
• Kernel function — The software searches among Gaussian, Linear, Quadratic, and Cubic.
• Box constraint level — The software searches among positive values log-scaled in the range [0.001,1000].
• Kernel scale — The software searches among positive values log-scaled in the range [0.001,1000].
• Multiclass coding — The software searches between One-vs-One and One-vs-All.
• Standardize data — The software searches between Yes and No.
• Notes: The Kernel scale optimizable hyperparameter combines the Kernel scale mode and Manual kernel scale advanced options of the preset SVM models. You can optimize the Kernel scale optimizable hyperparameter only when the Kernel function value is Gaussian. Unless you specify a value for Kernel scale by clearing the Optimize check box, the app uses the Manual value of 1 by default when the Kernel function has a value other than Gaussian. For more information, see “SVM Model Hyperparameter Options” on page 23-32.

Optimizable Efficient Linear
• Optimizable hyperparameters:
• Learner — The software searches between SVM and Logistic regression.
• Regularization — The software searches between Ridge and Lasso.
• Regularization strength (Lambda) — The software searches among positive values log-scaled in the range [0.00001/n,100000/n], where n is the number of observations.
• Multiclass coding — The software searches between One-vs-One and One-vs-All.
• Additional hyperparameters: Solver, Relative coefficient tolerance (Beta tolerance).
• Notes: For more information, see “Hyperparameter Options for Efficiently Trained Linear Classifiers” on page 23-34.

Optimizable KNN
• Optimizable hyperparameters:
• Number of neighbors — The software searches among integers log-scaled in the range [1,max(2,round(n/2))], where n is the number of observations.
• Distance metric — The software searches among Euclidean, City block, Chebyshev, Minkowski (cubic), Mahalanobis, Cosine, Correlation, Spearman, Hamming, and Jaccard.
• Distance weight — The software searches among Equal, Inverse, and Squared inverse.
• Standardize data — The software searches between Yes and No.
• Notes: For more information, see “KNN Model Hyperparameter Options” on page 23-36.

Optimizable Kernel
• Optimizable hyperparameters:
• Learner — The software searches between SVM and Logistic Regression.
• Number of expansion dimensions — The software searches among positive integers log-scaled in the range [100,10000].
• Regularization strength (Lambda) — The software searches among positive values log-scaled in the range [0.001/n,1000/n], where n is the number of observations.
• Kernel scale — The software searches among positive values log-scaled in the range [0.001,1000].
• Multiclass coding — The software searches between One-vs-One and One-vs-All.
• Standardize data — The software searches between Yes and No.
• Additional hyperparameters: Iteration limit.
• Notes: For more information, see “Kernel Model Hyperparameter Options” on page 23-37.

Optimizable Ensemble
• Optimizable hyperparameters:
• Ensemble method — The software searches among AdaBoost, RUSBoost, LogitBoost, GentleBoost, and Bag.
• Maximum number of splits — The software searches among integers log-scaled in the range [1,max(2,n–1)], where n is the number of observations.
• Number of learners — The software searches among integers log-scaled in the range [10,500].
• Learning rate — The software searches among real values log-scaled in the range [0.001,1].
• Number of predictors to sample — The software searches among integers in the range [1,max(2,p)], where p is the number of predictor variables.
• Additional hyperparameters: Learner type.
• Notes: The AdaBoost, LogitBoost, and GentleBoost values of the Ensemble method optimizable hyperparameter specify a Boosted Trees model. Similarly, the RUSBoost Ensemble method value specifies an RUSBoosted Trees model, and the Bag Ensemble method value specifies a Bagged Trees model. The LogitBoost and GentleBoost values are available only for binary classification. You can optimize the Number of predictors to sample optimizable hyperparameter only when the Ensemble method value is Bag. Unless you specify a value for Number of predictors to sample by clearing the Optimize check box, the app uses the default value of Select All when the Ensemble method has a value other than Bag. For more information, see “Ensemble Model Hyperparameter Options” on page 23-40.

Optimizable Neural Network
• Optimizable hyperparameters:
• Number of fully connected layers — The software searches among 1, 2, and 3 fully connected layers.
• First layer size — The software searches among integers log-scaled in the range [1,300].
• Second layer size — The software searches among integers log-scaled in the range [1,300].
• Third layer size — The software searches among integers log-scaled in the range [1,300].
• Activation — The software searches among ReLU, Tanh, None, and Sigmoid.
• Regularization strength (Lambda) — The software searches among real values log-scaled in the range [0.00001/n,100000/n], where n is the number of observations.
• Standardize data — The software searches between Yes and No.
• Additional hyperparameters: Iteration limit.
• Notes: For more information, see “Neural Network Model Hyperparameter Options” on page 23-42.
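For reference, this kind of search can also be run at the command line. A hedged sketch with fitctree on hypothetical X and Y, optimizing two of the tree hyperparameters listed above:

rng(1)   % Bayesian optimization is stochastic; set a seed for reproducibility
optTree = fitctree(X,Y, ...
    'OptimizeHyperparameters',{'MaxNumSplits','SplitCriterion'});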
Optimization Options
By default, the Classification Learner app performs hyperparameter tuning by using Bayesian optimization. The goal of Bayesian optimization, and optimization in general, is to find a point that minimizes an objective function. In the context of hyperparameter tuning in the app, a point is a set of hyperparameter values, and the objective function is the loss function, or the classification error. For more information on the basics of Bayesian optimization, see “Bayesian Optimization Workflow” on page 10-25. You can specify how the hyperparameter tuning is performed. For example, you can change the optimization method to grid search or limit the training time. On the Learn tab, in the Options section, click Optimizer. The app opens a dialog box in which you can select optimization options. After making your selections, click Save and Apply. Your selections affect all draft optimizable models in the Models pane and will be applied to new optimizable models that you create using the gallery in the Models section of the Learn tab. To specify optimization options for a single optimizable model, open and edit the model summary before training the model. Click the model in the Models pane. The model Summary tab includes an editable Optimizer section. This table describes the available optimization options and their default values. Option
Description
Optimizer
The optimizer values are: • Bayesopt (default) – Use Bayesian optimization. Internally, the app calls the bayesopt function. • Grid search – Use grid search with the number of values per dimension determined by the Number of grid divisions value. The app searches in a random order, using uniform sampling without replacement from the grid. • Random search – Search at random among points, where the number of points corresponds to the Iterations value.
Acquisition function
When the app performs Bayesian optimization for hyperparameter tuning, it uses the acquisition function to determine the next set of hyperparameter values to try. The acquisition function values are:
• Expected improvement per second plus (default)
• Expected improvement
• Expected improvement plus
• Expected improvement per second
• Lower confidence bound
• Probability of improvement
For details on how these acquisition functions work in the context of Bayesian optimization, see “Acquisition Function Types” on page 10-3.
Iterations
Each iteration corresponds to a combination of hyperparameter values that the app tries. When you use Bayesian optimization or random search, specify a positive integer that sets the number of iterations. The default value is 30. When you use grid search, the app ignores the Iterations value and evaluates the loss at every point in the entire grid. You can set a training time limit to stop the optimization process prematurely.
Training time limit
To set a training time limit, select this option and set the Maximum training time in seconds option. By default, the app does not have a training time limit.
Maximum training time in seconds
Set the training time limit in seconds as a positive real number. The default value is 300. The run time can exceed the training time limit because this limit does not interrupt an iteration evaluation.
Number of grid divisions
When you use grid search, set a positive integer as the number of values the app tries for each numeric hyperparameter. The app ignores this value for categorical hyperparameters. The default value is 10.
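These dialog options correspond roughly to the HyperparameterOptimizationOptions structure accepted by the command-line fit functions. The following sketch is illustrative only; tbl and "Y" are placeholder names, and the tree learner is used purely as an example model type.

% Sketch only: command-line settings analogous to the app's Optimizer dialog.
opts = struct( ...
    "Optimizer","bayesopt", ...                 % Bayesian optimization (default)
    "AcquisitionFunctionName","expected-improvement-per-second-plus", ...
    "MaxObjectiveEvaluations",30, ...           % Iterations
    "MaxTime",300);                             % Maximum training time in seconds
Mdl = fitctree(tbl,"Y", ...
    "OptimizeHyperparameters","auto", ...
    "HyperparameterOptimizationOptions",opts);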
Minimum Classification Error Plot
After specifying which model hyperparameters to optimize and setting any additional optimization options (optional), train your optimizable model. On the Learn tab, in the Train section, click Train All and select Train Selected. The app creates a Minimum Classification Error Plot that it updates as the optimization runs.
The minimum classification error plot displays the following information:

• Estimated minimum classification error – Each light blue point corresponds to an estimate of the minimum classification error computed by the optimization process when considering all the sets of hyperparameter values tried so far, including the current iteration. The estimate is based on an upper confidence interval of the current classification error objective model, as mentioned in the Bestpoint hyperparameters description.

  If you use grid search or random search to perform hyperparameter optimization, the app does not display these light blue points.

• Observed minimum classification error – Each dark blue point corresponds to the observed minimum classification error computed so far by the optimization process. For example, at the third iteration, the dark blue point corresponds to the minimum of the classification error observed in the first, second, and third iterations.

• Bestpoint hyperparameters – The red square indicates the iteration that corresponds to the optimized hyperparameters. You can find the values of the optimized hyperparameters listed in the upper right of the plot under Optimization Results.

  The optimized hyperparameters do not always provide the observed minimum classification error. When the app performs hyperparameter tuning by using Bayesian optimization (see “Optimization Options” on page 23-63 for a brief introduction), it chooses the set of hyperparameter values that minimizes an upper confidence interval of the classification error objective model, rather than the set that minimizes the classification error. For more information, see the "Criterion","min-visited-upper-confidence-interval" name-value argument of bestPoint.

• Minimum error hyperparameters – The yellow point indicates the iteration that corresponds to the hyperparameters that yield the observed minimum classification error. For more information, see the "Criterion","min-observed" name-value argument of bestPoint.

  If you use grid search to perform hyperparameter optimization, the Bestpoint hyperparameters and the Minimum error hyperparameters are the same.

Missing points in the plot correspond to NaN minimum classification error values.
Optimization Results

When the app finishes tuning model hyperparameters, it returns a model trained with the optimized hyperparameter values (Bestpoint hyperparameters). The model metrics, displayed plots, and exported model correspond to this trained model with fixed hyperparameter values.

To inspect the optimization results of a trained optimizable model, select the model in the Models pane and look at the model Summary tab.
The model Summary tab includes these sections:
• Training Results – Shows the performance of the optimizable model. See “View Model Metrics in Summary Tab and Models Pane” on page 23-71.
• Model Hyperparameters – Displays the type of optimizable model and lists any fixed hyperparameter values.
• Optimized Hyperparameters – Lists the values of the optimized hyperparameters.
• Hyperparameter Search Range – Displays the search ranges for the optimized hyperparameters.
• Optimizer – Shows the selected optimizer options.

When you perform hyperparameter tuning using Bayesian optimization and you export the resulting trained optimizable model to the workspace as a structure, the structure includes a BayesianOptimization object in the HyperParameterOptimizationResult field. The object contains the results of the optimization performed in the app.

When you generate MATLAB code from a trained optimizable model, the generated code uses the fixed and optimized hyperparameter values of the model to train on new data. The generated code does not include the optimization process. For information on how to perform Bayesian optimization when you use a fit function, see “Bayesian Optimization Using a Fit Function” on page 10-26.
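As a minimal sketch of how you might inspect these results at the command line (assuming you exported the optimizable model with the default name trainedModel):

% Sketch only: examine the optimization results stored in the exported structure.
results = trainedModel.HyperParameterOptimizationResult;   % BayesianOptimization object
bestPoint(results,"Criterion","min-visited-upper-confidence-interval")   % Bestpoint hyperparameters
bestPoint(results,"Criterion","min-observed")                            % Minimum error hyperparameters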
See Also

Related Examples
• “Train Classifier Using Hyperparameter Optimization in Classification Learner App” on page 23-150
• “Bayesian Optimization Workflow” on page 10-25
• “Train Classification Models in Classification Learner App” on page 23-10
• “Select Data for Classification or Open Saved App Session” on page 23-17
• “Choose Classifier Options” on page 23-22
• “Visualize and Assess Classifier Performance in Classification Learner” on page 23-70
• “Export Classification Model to Predict New Data” on page 23-86
Visualize and Assess Classifier Performance in Classification Learner

In this section...
“Check Performance in the Models Pane” on page 23-70
“View Model Metrics in Summary Tab and Models Pane” on page 23-71
“Compare Model Information and Results in Table View” on page 23-72
“Plot Classifier Results” on page 23-73
“Check Performance Per Class in the Confusion Matrix” on page 23-74
“Check ROC Curve” on page 23-76
“Compare Model Plots by Changing Layout” on page 23-78
“Evaluate Test Set Model Performance” on page 23-79

After training classifiers in Classification Learner, you can compare models based on accuracy values, visualize results by plotting class predictions, and check performance using the confusion matrix and ROC curve.

• If you use k-fold cross-validation, then the app computes the accuracy values using the observations in the k validation folds and reports the average cross-validation error. It also makes predictions on the observations in these validation folds and computes the confusion matrix and ROC curve based on these predictions.

  Note When you import data into the app, if you accept the defaults, the app automatically uses cross-validation. To learn more, see “Choose Validation Scheme” on page 23-19.

• If you use holdout validation, the app computes the accuracy values using the observations in the validation fold and makes predictions on these observations. The app also computes the confusion matrix and ROC curve based on these predictions.
• If you use resubstitution validation, the score is the resubstitution accuracy based on all the training data, and the predictions are resubstitution predictions.
Check Performance in the Models Pane

After training a model in Classification Learner, check the Models pane to see which model has the best overall accuracy in percent. The best Accuracy (Validation) score is highlighted in a box. This score is the validation accuracy. The validation accuracy score estimates a model's performance on new data compared to the training data. Use the score to help you choose the best model.

• For cross-validation, the score is the accuracy on all observations not set aside for testing, counting each observation when it was in a holdout (validation) fold.
• For holdout validation, the score is the accuracy on the held-out observations.
• For resubstitution validation, the score is the resubstitution accuracy against all the training data observations.

The best overall score might not be the best model for your goal. A model with a slightly lower overall accuracy might be the best classifier for your goal. For example, false positives in a particular class might be important to you. You might want to exclude some predictors where data collection is expensive or difficult. To find out how the classifier performed in each class, examine the confusion matrix.
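For orientation only, the cross-validation score is the kind of quantity you could compute yourself with a cross-validated model. The following minimal sketch assumes a predictor table tbl with a response variable named Y (placeholder names) and uses a tree learner purely as an example:

% Sketch only: a 5-fold cross-validation accuracy analogous to the app's
% Accuracy (Validation) score, under default misclassification costs.
cvMdl = fitctree(tbl,"Y","CrossVal","on","KFold",5);
validationAccuracy = 1 - kfoldLoss(cvMdl)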
View Model Metrics in Summary Tab and Models Pane

You can view model metrics in the model Summary tab and the Models pane, and use the metrics to assess and compare models. Alternatively, you can use the Results Table tab to compare models. For more information, see “Compare Model Information and Results in Table View” on page 23-72.

The Training Results metrics are calculated on the validation set. The Test Results metrics, if displayed, are calculated on an imported test set. For more information, see “Evaluate Test Set Model Performance” on page 23-79.
Model Metrics

Metric: Accuracy
Description: Percentage of observations that are correctly classified
Tip: Look for larger accuracy values.

Metric: Total cost
Description: Total misclassification cost
Tip: Look for smaller total cost values. Make sure the accuracy value is still large.

Metric: Prediction speed
Description: Estimated prediction speed for new data, based on the prediction times for the validation data sets
Tip: Background processes inside and outside the app can affect this estimate, so train models under similar conditions for better comparisons.

Metric: Training time
Description: Time spent training the model
Tip: Background processes inside and outside the app can affect this estimate, so train models under similar conditions for better comparisons.

Metric: Model size (Compact)
Description: Size of the model if exported as a compact model (that is, without training data)
Tip: Look for model size values that fit the memory requirements of target hardware applications.
You can sort models in the Models pane according to accuracy or total cost. To select a metric for model sorting, use the Sort by list at the top of the Models pane. Not all metrics are available for sorting in the Models pane. You can sort models by other metrics in the Results Table (see “Compare Model Information and Results in Table View” on page 23-72). You can also delete unwanted models listed in the Models pane. Select the model you want to delete and click the Delete selected model button in the upper right of the pane or right-click the model and select Delete. You cannot delete the last remaining model in the Models pane.
Compare Model Information and Results in Table View

Rather than using the Summary tab or the Models pane to compare model metrics, you can use a table of results. On the Learn tab, in the Plots and Results section, click Results Table. In the Results Table tab, you can sort models by their training and test results, as well as by their options (such as model type, selected features, PCA, and so on). For example, to sort models by validation accuracy, click the sorting arrows in the Accuracy (Validation) column header. A down arrow indicates that models are sorted from highest accuracy to lowest accuracy.

To view more table column options, click the "Select columns to display" button at the top right of the table. In the Select Columns to Display dialog box, check the boxes for the columns you want to display in the results table. Newly selected columns are appended to the table on the right.
Within the results table, you can manually drag and drop the table columns so that they appear in your preferred order. You can mark some models as favorites by using the Favorite column. The app keeps the selection of favorite models consistent between the results table and the Models pane. Unlike other columns, the Favorite and Model Number columns cannot be removed from the table.
To remove a row from the table, right-click any entry within the row and click Hide row (or Hide selected row(s) if the row is highlighted). To remove consecutive rows, click any entry within the first row you want to remove, press Shift, and click any entry within the last row you want to remove. Then, right-click one of the highlighted entries and click Hide selected row(s). To restore all removed rows, right-click any entry in the table and click Show all rows. The restored rows are appended to the bottom of the table. To export the information in the table, use one of the export buttons at the top right of the table. Choose between exporting the table to the workspace or to a file. The exported table includes only the displayed rows and columns.
Plot Classifier Results

Use a scatter plot to examine the classifier results. To view the scatter plot for a model, select the model in the Models pane. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Scatter in the Validation Results group. After you train a classifier, the scatter plot switches from displaying the data to showing model predictions. If you are using holdout or cross-validation, then these predictions are the predictions on the held-out (validation) observations. In other words, the software obtains each prediction by using a model that was trained without the corresponding observation.

To investigate your results, use the controls on the right. You can:
• Choose whether to plot model predictions or the data alone.
• Show or hide correct or incorrect results using the check boxes under Model predictions.
• Choose features to plot using the X and Y lists under Predictors.
• Visualize results by class by showing or hiding specific classes using the check boxes under Show.
• Change the stacking order of the plotted classes by selecting a class under Classes and then clicking Move to Front.
• Zoom in and out, or pan across the plot. To enable zooming or panning, place the mouse over the scatter plot and click the corresponding button on the toolbar that appears above the top right of the plot.
See also “Investigate Features in the Scatter Plot” on page 23-44. To export the scatter plots you create in the app to figures, see “Export Plots in Classification Learner App” on page 23-81.
Check Performance Per Class in the Confusion Matrix

Use the confusion matrix plot to understand how the currently selected classifier performed in each class. After you train a classification model, the app automatically opens the confusion matrix for that model. If you train an "All" model, the app opens the confusion matrix for the first model only. To view the confusion matrix for another model, select the model in the Models pane. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Confusion Matrix (Validation) in the Validation Results group. The confusion matrix helps you identify the areas where the classifier performed poorly.
When you open the plot, the rows show the true class, and the columns show the predicted class. If you are using holdout or cross-validation, then the confusion matrix is calculated using the predictions on the held-out (validation) observations. The diagonal cells show where the true class and predicted class match. If these diagonal cells are blue, the classifier has classified observations of this true class correctly.

The default view shows the number of observations in each cell. To see how the classifier performed per class, under Plot, select the True Positive Rates (TPR), False Negative Rates (FNR) option. The TPR is the proportion of correctly classified observations per true class. The FNR is the proportion of incorrectly classified observations per true class. The plot shows summaries per true class in the last two columns on the right.

Tip Look for areas where the classifier performed poorly by examining cells off the diagonal that display high percentages and are orange. The higher the percentage, the darker the hue of the cell color. In these orange cells, the true class and the predicted class do not match. The data points are misclassified.
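If you want to reproduce a similar per-class view outside the app, one possible sketch reuses the assumed cross-validated model cvMdl and table tbl from the earlier sketch in this topic (these names are assumptions, not app output):

% Sketch only: validation confusion chart with per-class row summaries.
predictedY = kfoldPredict(cvMdl);         % validation-fold predictions
cm = confusionchart(tbl.Y,predictedY);
cm.RowSummary = "row-normalized";         % TPR/FNR-style summaries per true class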
In this example, which uses the carbig data set, the fifth row from the top shows all cars with the true class Japan. The columns show the predicted classes. Of the cars from Japan, 77.2% are correctly classified, so 77.2% is the true positive rate for correctly classified points in this class, shown in the blue cell in the TPR column. The other cars in the Japan row are misclassified: 5.1% of the cars are incorrectly classified as from Germany, 5.1% are classified as from Sweden, and 12.7% are classified as from the USA. The false negative rate for incorrectly classified points in this class is 22.8%, shown in the orange cell in the FNR column. If you want to see numbers of observations (cars, in this example) instead of percentages, under Plot, select Number of observations. If false positives are important in your classification problem, plot results per predicted class (instead of true class) to investigate false discovery rates. To see results per predicted class, under Plot, select the Positive Predictive Values (PPV), False Discovery Rates (FDR) option. The PPV is the proportion of correctly classified observations per predicted class. The FDR is the proportion of incorrectly classified observations per predicted class. With this option selected, the confusion matrix now includes summary rows below the table. Positive predictive values are shown in blue for the correctly predicted points in each class, and false discovery rates are shown in orange for the incorrectly predicted points in each class. If you decide there are too many misclassified points in the classes of interest, try changing classifier settings or feature selection to search for a better model. To export the confusion matrix plots you create in the app to figures, see “Export Plots in Classification Learner App” on page 23-81.
Check ROC Curve

View a receiver operating characteristic (ROC) curve after training a model. In the Plots and Results section, click the arrow to open the gallery, and then click ROC Curve (Validation) in the Validation Results group. The app creates a ROC curve by using the rocmetrics function.
The ROC curve shows the true positive rate (TPR) versus the false positive rate (FPR) for different thresholds of classification scores, computed by the currently selected classifier. The Model Operating Point shows the false positive rate and true positive rate corresponding to the threshold used by the classifier to classify an observation. For example, a false positive rate of 0.4 indicates that the classifier incorrectly assigns 40% of the negative class observations to the positive class. A true positive rate of 0.9 indicates that the classifier correctly assigns 90% of the positive class observations to the positive class.

The AUC (area under the curve) value corresponds to the integral of a ROC curve (TPR values) with respect to FPR from FPR = 0 to FPR = 1. The AUC value is a measure of the overall quality of the classifier. The AUC values are in the range 0 to 1, and larger AUC values indicate better classifier performance.

Compare classes and trained models to see if they perform differently in the ROC curve. You can create a ROC curve for a specific class using the Show check boxes under Plot. However, you do not need to examine ROC curves for both classes in a binary classification problem. The two ROC curves are symmetric, and the AUC values are identical. A TPR of one class is a true negative rate (TNR) of the other class, and TNR is 1–FPR. Therefore, a plot of TPR versus FPR for one class is the same as a plot of 1–FPR versus 1–TPR for the other class.

For a multiclass classifier, the app formulates a set of one-versus-all binary classification problems to have one binary problem for each class, and finds a ROC curve for each class using the corresponding binary problem. Each binary problem assumes that one class is positive and the rest are negative.
The model operating point on the plot shows the performance of the classifier for each class in its one-versus-all binary problem.
For more information, see rocmetrics and “ROC Curve and Performance Metrics” on page 18-3. To export the ROC curve plots you create in the app to figures, see “Export Plots in Classification Learner App” on page 23-81.
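A minimal command-line sketch of the same computation, again reusing the assumed cvMdl and tbl from the earlier sketches in this topic:

% Sketch only: one-versus-all ROC curves and AUC values from validation scores.
[~,scores] = kfoldPredict(cvMdl);                   % class scores per observation
rocObj = rocmetrics(tbl.Y,scores,cvMdl.ClassNames);
plot(rocObj)                                        % one curve per class
rocObj.AUC                                          % area under each curve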
Compare Model Plots by Changing Layout

Visualize the results of models trained in Classification Learner by using the plot options in the Plots and Results section of the Learn tab. You can rearrange the layout of the plots to compare results across multiple models: use the options in the Layout button, drag and drop plots, or select the options provided by the Document Actions button located to the right of the model plot tabs.

For example, after training two models in Classification Learner, display a plot for each model and change the plot layout to compare the plots by using one of these procedures:
• In the Plots and Results section, click Layout and select Compare models.
• Click the second model tab name, and then drag and drop the second model tab to the right.
• Click the Document Actions button located to the far right of the model plot tabs. Select the Tile All option and specify a 1-by-2 layout.
Note that you can click the Hide plot options button at the top right of the plots to make more room for the plots.
Evaluate Test Set Model Performance

After training a model in Classification Learner, you can evaluate the model performance on a test set in the app. This process allows you to check whether the validation accuracy provides a good estimate for the model performance on new data.

1. Import a test data set into Classification Learner. Alternatively, reserve some data for testing when importing data into the app (see “(Optional) Reserve Data for Testing” on page 23-21).
   • If the test data set is in the MATLAB workspace, then in the Data section on the Test tab, click Test Data and select From Workspace.
   • If the test data set is in a file, then in the Data section, click Test Data and select From File. Select a file type in the list, such as a spreadsheet, text file, or comma-separated values (.csv) file, or select All Files to browse for other file types such as .dat.

   In the Import Test Data dialog box, select the test data set from the Test Data Set Variable list. The test set must have the same variables as the predictors imported for training and validation. The unique values in the test response variable must be a subset of the classes in the full response variable.

2. Compute the test set metrics.
   • To compute test metrics for a single model, select the trained model in the Models pane. On the Test tab, in the Test section, click Test Selected.
   • To compute test metrics for all trained models, click Test All in the Test section.

   The app computes the test set performance of each model trained on the full data set, including training and validation data (but excluding test data).

3. Compare the validation accuracy with the test accuracy. In the model Summary tab, the app displays the validation metrics and test metrics in the Training Results section and Test Results section, respectively. You can check if the validation accuracy gives a good estimate for the test accuracy.

   You can also visualize the test results using plots.
   • Display a confusion matrix. In the Plots and Results section on the Test tab, click Confusion Matrix (Test).
   • Display a ROC curve. In the Plots and Results section, click ROC Curve (Test).
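If you also want to check the test accuracy outside the app, a minimal sketch, assuming you exported the model as trainedModel and have a test table testTbl whose response variable is named Y (all placeholder names):

% Sketch only: test-set accuracy from an exported model.
yfitTest = trainedModel.predictFcn(testTbl);
testAccuracy = mean(string(yfitTest) == string(testTbl.Y))   % assumes labels convert cleanly to string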
For an example, see “Check Classifier Performance Using Test Set in Classification Learner App” on page 23-158. For an example that uses test set metrics in a hyperparameter optimization workflow, see “Train Classifier Using Hyperparameter Optimization in Classification Learner App” on page 23-150.
See Also

Related Examples
• “Train Classification Models in Classification Learner App” on page 23-10
• “Select Data for Classification or Open Saved App Session” on page 23-17
• “Choose Classifier Options” on page 23-22
• “Feature Selection and Feature Transformation Using Classification Learner App” on page 23-44
• “Export Plots in Classification Learner App” on page 23-81
• “Export Classification Model to Predict New Data” on page 23-86
• “Train Decision Trees Using Classification Learner App” on page 23-93
Export Plots in Classification Learner App

After you create plots interactively in the Classification Learner app, you can export your app plots to MATLAB figures. You can then copy, save, or customize the new figures. Choose among the available plots: scatter plot on page 23-44, parallel coordinates plot on page 23-48, confusion matrix on page 23-74, ROC curve on page 23-76, minimum classification error plot on page 23-65, LIME explanations plot on page 23-163, Shapley explanations plot on page 23-167, and partial dependence plot on page 23-171.

• Before exporting a plot, make sure the plot in the app displays the same data that you want in the new figure.
• On the Learn, Test, or Explain tab, in the Export section, click Export Plot to Figure. The app creates a figure from the selected plot.
• The new figure might not have the same interactivity options as the plot in the Classification Learner app. For example, data tips for the exported scatter plot show only X,Y values for the selected point, not the detailed information displayed in the app.
• Additionally, the figure might have a different axes toolbar than the one in the app plot. For plots in Classification Learner, an axes toolbar appears above the top right of the plot. The buttons available on the toolbar depend on the contents of the plot. The toolbar can include buttons to export the plot as an image, add data tips, pan or zoom the data, and restore the view.
• Copy, save, or customize the new figure, which is displayed in the figure window.
  • To copy the figure, select Edit > Copy Figure. For more information, see “Copy Figure to Clipboard from Edit Menu”.
  • To save the figure, select File > Save As. Alternatively, you can follow the workflow described in “Customize Figure Before Saving”.
  • To customize the figure, click the Edit Plot button on the figure toolbar. Right-click the section of the plot that you want to edit. You can change the listed properties, which might include Color, Font, Line Style, and other properties. Or, you can use the Property Inspector to change the figure properties.
As an example, export a scatter plot in the app to a figure, customize the figure, and save the modified figure.

1. In the MATLAB Command Window, read the sample file fisheriris.csv into a table.

   fishertable = readtable("fisheriris.csv");

2. Click the Apps tab.
3. In the Apps section, click the arrow to open the gallery. Under Machine Learning and Deep Learning, click Classification Learner.
4. On the Learn tab, in the File section, click New Session.
5. In the New Session from Workspace dialog box, select the table fishertable from the Data Set Variable list.
6. Click Start Session. Classification Learner creates a scatter plot of the data by default.
7. Change the predictors in the scatter plot to PetalLength and PetalWidth.
8. On the Learn tab, in the Export section, click Export Plot to Figure.
9. In the new figure, click the Edit Plot button on the figure toolbar. Right-click the points in the plot corresponding to the versicolor irises. In the context menu, select Color.
10. In the Color dialog box, select a new color and click OK.
11. To save the figure, select File > Save As. Specify the saved file location, name, and type.
See Also

Related Examples
• “Feature Selection and Feature Transformation Using Classification Learner App” on page 23-44
• “Visualize and Assess Classifier Performance in Classification Learner” on page 23-70
• “Export Classification Model to Predict New Data” on page 23-86
Export Classification Model to Predict New Data

In this section...
“Export the Model to the Workspace to Make Predictions for New Data” on page 23-86
“Make Predictions for New Data Using Exported Model” on page 23-86
“Generate MATLAB Code to Train the Model with New Data” on page 23-87
“Generate C Code for Prediction” on page 23-88
“Deploy Predictions Using MATLAB Compiler” on page 23-91
“Export Model for Deployment to MATLAB Production Server” on page 23-91
Export the Model to the Workspace to Make Predictions for New Data

After you create classification models interactively in Classification Learner, you can export your best model to the workspace. You can then use the trained model to make predictions using new data.

Note The final model Classification Learner exports is always trained using the full data set, excluding any data reserved for testing. The validation scheme that you use only affects the way that the app computes validation metrics. You can use the validation metrics and various plots that visualize results to pick the best model for your classification problem.

To export a model to the MATLAB workspace:

1. In Classification Learner, select the model you want to export in the Models pane.

   You can typically export a full or compact version of the trained model to the workspace as a structure containing a classification object, such as ClassificationTree.

2. On the Learn tab, click Export, click Export Model and select Export Model.

   To exclude the training data and export a compact model, clear the check box in the Export Classification Model dialog box. Note that the check box is disabled if the model does not have training data or if the training data cannot be excluded from the model. You can still use a compact model for making predictions on new data. Some models, such as kernel approximation, never store training data. Other models, such as nearest neighbor and binary GLM logistic regression, always store training data.

3. In the Export Classification Model dialog box, edit the name of the exported variable, if necessary, and then click OK. The default name of the exported model, trainedModel, increments every time you export (for example, trainedModel1) to avoid overwriting previously exported classifiers.

   The new variable (for example, trainedModel) appears in the workspace. The app displays information about the exported model in the Command Window. Read the message to learn how to make predictions with new data.
Make Predictions for New Data Using Exported Model

After you export a model to the workspace from Classification Learner, or run the code generated from the app, you get a trainedModel structure that you can use to make predictions using new data. The structure contains a classification object and a function for prediction. The structure allows you to make predictions for models that include principal component analysis (PCA).

1. To use the exported classifier to make predictions for new data T, use the form:

   [yfit,scores] = C.predictFcn(T)

   C is the name of your variable (for example, trainedModel). An exported model trained using the binary GLM logistic regression preset does not include class scores. For an exported binary GLM logistic classifier, use the form:

   yfit = C.predictFcn(T)

   Supply the data T with the same format and data type as the training data used in the app (table or matrix).
   • If you supply a table, ensure it contains the same predictor names as your training data. The predictFcn function ignores additional variables in tables. Variable formats and types must match the original training data.
   • If you supply a matrix, it must contain the same predictor columns or rows as your training data, in the same order and format. Do not include a response variable, any variables that you did not import in the app, or other unused variables.

   The output yfit contains a class prediction for each data point. The output scores contains the class scores returned by the trained model. scores is an n-by-k array, where n is the number of data points and k is the number of classes in the trained model.

2. Examine the fields of the exported structure. For help making predictions, enter:

   C.HowToPredict
You can also extract the classification object from the exported structure for further analysis (for example, trainedModel.ClassificationSVM, trainedModel.ClassificationTree, and so on, depending on your model type). Be aware that if you used feature transformation such as PCA in the app, you will need to take account of this transformation by using the information in the PCA fields of the structure.
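For example, if you used PCA in the app, one possible sketch of applying the stored transformation before calling the underlying classification object directly is shown below. Xnew is a placeholder for a numeric predictor matrix, and the ClassificationSVM field name is only an example; use the field that matches your model type.

% Sketch only: apply the exported PCA transformation before predicting
% with the underlying classification object.
Z = bsxfun(@minus,Xnew,trainedModel.PCACenters)*trainedModel.PCACoefficients;
yfit = predict(trainedModel.ClassificationSVM,Z);   % field name depends on model type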
Generate MATLAB Code to Train the Model with New Data

After you create classification models interactively in Classification Learner, you can generate MATLAB code for your best model. You can then use the code to train the model with new data.

Generate MATLAB code to:
• Train on huge data sets. Explore models in the app trained on a subset of your data, then generate code to train a selected model on a larger data set.
• Create scripts for training models without needing to learn syntax of the different functions.
• Examine the code to learn how to train classifiers programmatically.
• Modify the code for further analysis, for example to set options that you cannot change in the app.
• Repeat your analysis on different data and automate training.

1. In Classification Learner, in the Models pane, select the model you want to generate code for.
2. On the Learn tab, in the Export section, click Generate Function.

   The app generates code from your session and displays the file in the MATLAB Editor. The file includes the predictors and response, the classifier training methods, and validation methods. Save the file.

3. To retrain your classifier model, call the function from the command line with your original data or new data as the input argument or arguments. New data must have the same shape as the original data.

   Copy the first line of the generated code, excluding the word function, and edit the trainingData input argument to reflect the variable name of your training data or new data. Similarly, edit the responseData input argument (if applicable).

   For example, to retrain a classifier trained with the fishertable data set, enter:

   [trainedModel,validationAccuracy] = trainClassifier(fishertable)

   The generated code returns a trainedModel structure that contains the same fields as the structure you create when you export a classifier from Classification Learner to the workspace.

4. If you want to automate training the same classifier with new data, or learn how to programmatically train classifiers, examine the generated code. The code shows you how to:
   • Process the data into the right shape
   • Train a classifier and specify all the classifier options
   • Perform cross-validation
   • Compute validation accuracy
   • Compute validation predictions and scores

   Note If you generate MATLAB code from a trained optimizable model, the generated code does not include the optimization process.
Generate C Code for Prediction

If you train one of the models in this table using Classification Learner, you can generate C code for prediction.
Model Type and Underlying Model Object:
• Decision Tree: ClassificationTree or CompactClassificationTree
• Discriminant Analysis: ClassificationDiscriminant or CompactClassificationDiscriminant
• Naive Bayes Classifier: ClassificationNaiveBayes or CompactClassificationNaiveBayes
• Support Vector Machine: ClassificationSVM (binary), CompactClassificationSVM (binary), ClassificationECOC (multiclass), or CompactClassificationECOC (multiclass)
• Efficiently Trained Linear Classifier: ClassificationLinear (binary), ClassificationECOC (multiclass), or CompactClassificationECOC (multiclass)
• Nearest Neighbor Classifier: ClassificationKNN
• Kernel Approximation: ClassificationKernel, ClassificationECOC (multiclass), or CompactClassificationECOC (multiclass)
• Ensemble Classifier: ClassificationEnsemble, CompactClassificationEnsemble, or ClassificationBaggedEnsemble
• Neural Network: ClassificationNeuralNetwork or CompactClassificationNeuralNetwork
Note You can generate C code for prediction using the binary GLM logistic regression model. However, because the underlying model for binary GLM logistic regression is a GeneralizedLinearModel object, this process requires you to add extra lines of code in the prediction entry-point function to convert numeric predictions to class predictions. For an example, see “Code Generation for Binary GLM Logistic Regression Model Trained in Classification Learner” on page 34-193.

C code generation requires:
• MATLAB Coder license
• Appropriate model (binary or multiclass)

1. For example, train an SVM model in Classification Learner, and then export the model to the workspace.

   Find the underlying classification model object in the exported structure. Examine the fields of the structure to find the model object, for example, C.ClassificationSVM, where C is the name of your structure.

   The underlying model object depends on what type of SVM you trained (binary or multiclass) and whether you exported a full or compact model. The model object can be ClassificationSVM, CompactClassificationSVM, ClassificationECOC, or CompactClassificationECOC.

2. Use the function saveLearnerForCoder to prepare the model for code generation: saveLearnerForCoder(Mdl,filename). For example:

   saveLearnerForCoder(C.ClassificationSVM,'mySVM')

3. Create a function that loads the saved model and makes predictions on new data. For example:

   function label = classifyX (X) %#codegen
   %CLASSIFYX Classify using SVM Model
   %  CLASSIFYX classifies the measurements in X
   %  using the SVM model in the file mySVM.mat, and then
   %  returns class labels in label.
   CompactMdl = loadLearnerForCoder('mySVM');
   label = predict(CompactMdl,X);
   end

4. Generate a MEX function from your function. For example:

   codegen classifyX.m -args {data}

   The %#codegen compilation directive indicates that the MATLAB code is intended for code generation. To ensure that the MEX function can use the same input, specify the data in the workspace as arguments to the function using the -args option. Specify data as a matrix containing only the predictor columns used to train the model.

5. Use the MEX function to make predictions. For example:

   labels = classifyX_mex(data);
If you used feature selection or PCA feature transformation in the app, then you need to take additional steps. If you used manual feature selection, supply the same columns in X. The X argument is the input to your function.

If you used PCA in the app, use the information in the PCA fields of the exported structure to take account of this transformation. It does not matter whether you imported a table or a matrix into the app, as long as X contains the matrix columns in the same order. Before generating code, follow these steps:

1. Save the PCACenters and PCACoefficients fields of the trained classifier structure, C, to file using the following command:

   save('pcaInfo.mat','-struct','C','PCACenters','PCACoefficients');

2. In your function file, include additional lines to perform the PCA transformation. Create a function that loads the saved model, performs PCA, and makes predictions on new data. For example:

   function label = classifyX (X) %#codegen
   %CLASSIFYX Classify using SVM Model
   %  CLASSIFYX classifies the measurements in X
   %  using the SVM model in the file mySVM.mat,
   %  and then returns class labels in label.
   %  If you used manual feature selection in the app, ensure that X
   %  contains only the columns you included in the model.

   CompactMdl = loadLearnerForCoder('mySVM');
   pcaInfo = coder.load('pcaInfo.mat','PCACenters','PCACoefficients');
   PCACenters = pcaInfo.PCACenters;
   PCACoefficients = pcaInfo.PCACoefficients;

   % Performs PCA transformation
   pcaTransformedX = bsxfun(@minus,X,PCACenters)*PCACoefficients;

   [label,scores] = predict(CompactMdl,pcaTransformedX);
   end
For a more detailed example, see “Code Generation and Classification Learner App” on page 34-32. For more information on the C code generation workflow and limitations, see “Code Generation”.
Deploy Predictions Using MATLAB Compiler

After you export a model to the workspace from Classification Learner, you can deploy it using MATLAB Compiler.

Suppose you export the trained model to MATLAB Workspace based on the instructions in “Export Model to Workspace” on page 24-65, with the name trainedModel. To deploy predictions, follow these steps.

• Save the trainedModel structure in a .mat file.

  save mymodel trainedModel

• Write the code to be compiled. This code must load the trained model and use it to make a prediction. It must also have a pragma, so the compiler recognizes that Statistics and Machine Learning Toolbox code is needed in the compiled application. This pragma can be any model training function used in Classification Learner (for example, fitctree).

  function ypred = mypredict(tbl)
  %#function fitctree
  load('mymodel.mat');
  ypred = trainedModel.predictFcn(tbl);
  end

• Compile as a standalone application.

  mcc -m mypredict.m
Export Model for Deployment to MATLAB Production Server

After you train a model in Classification Learner, you can export the model for deployment to MATLAB Production Server (requires MATLAB Compiler SDK™).

• Select the trained model in the Models pane. On the Learn tab, click Export, click Export Model and select Export Model for Deployment.
• In the Select Project File for Model Deployment dialog box, select a location and name for your project file.
• In the autogenerated predictFunction.m file, inspect and amend the code as needed.
• Use the Production Server Compiler app to package your model and prediction function. You can simulate the model deployment to MATLAB Production Server by clicking the Test Client button in the Test section of the Compiler tab, and then package your code by clicking the Package button in the Package section.

For an example, see “Deploy Model Trained in Classification Learner to MATLAB Production Server” on page 23-185. For more information, see “Create Deployable Archive for MATLAB Production Server” (MATLAB Production Server).
See Also

Functions
fitctree | fitcdiscr | fitglm | fitclinear | fitcecoc | fitcsvm | fitcknn | fitckernel | fitcensemble | fitcnet

Classes
ClassificationTree | CompactClassificationTree | ClassificationDiscriminant | CompactClassificationDiscriminant | ClassificationSVM | CompactClassificationSVM | ClassificationLinear | ClassificationECOC | CompactClassificationECOC | ClassificationKNN | ClassificationNaiveBayes | CompactClassificationNaiveBayes | ClassificationKernel | ClassificationEnsemble | ClassificationBaggedEnsemble | CompactClassificationEnsemble | ClassificationNeuralNetwork | CompactClassificationNeuralNetwork | GeneralizedLinearModel

Related Examples
• “Train Classification Models in Classification Learner App” on page 23-10
Train Decision Trees Using Classification Learner App

This example shows how to create and compare various classification trees using Classification Learner, and export trained models to the workspace to make predictions for new data.

You can train classification trees to predict responses to data. To predict a response, follow the decisions in the tree from the root (beginning) node down to a leaf node. The leaf node contains the response. Statistics and Machine Learning Toolbox trees are binary. Each step in a prediction involves checking the value of one predictor (variable). For example, here is a simple classification tree:
This tree predicts classifications based on two predictors, x1 and x2. To predict, start at the top node. At each decision, check the values of the predictors to decide which branch to follow. When the branches reach a leaf node, the data is classified either as type 0 or 1.

1. In MATLAB, load the fisheriris data set and create a table of measurement predictors (or features) using variables from the data set to use for a classification.

   fishertable = readtable("fisheriris.csv");

2. On the Apps tab, in the Machine Learning and Deep Learning group, click Classification Learner.
3. On the Learn tab, in the File section, click New Session > From Workspace.
4. In the New Session from Workspace dialog box, select the table fishertable from the Data Set Variable list (if necessary).

   Observe that the app has selected response and predictor variables based on their data type. Petal and sepal length and width are predictors, and species is the response that you want to classify. For this example, do not change the selections.
5. To accept the default validation scheme and continue, click Start Session. The default validation option is cross-validation, to protect against overfitting. Classification Learner creates a scatter plot of the data.
6. Use the scatter plot to investigate which variables are useful for predicting the response. To visualize the distribution of species and measurements, select different variables in the X and Y lists under Predictors to the right of the plot. Observe which variables separate the species colors most clearly.

   Observe that the setosa species (blue points) is easy to separate from the other two species with all four predictors. The versicolor and virginica species are much closer together in all predictor measurements, and overlap especially when you plot sepal length and width. setosa is easier to predict than the other two species.
7. Train fine, medium, and coarse trees simultaneously. The Models pane already contains a fine tree model. Add medium and coarse tree models to the list of draft models. On the Learn tab, in the Models section, click the arrow to open the gallery. In the Decision Trees group, click Medium Tree. The app creates a draft medium tree in the Models pane. Reopen the model gallery and click Coarse Tree in the Decision Trees group. The app creates a draft coarse tree in the Models pane.

   In the Train section, click Train All and select Train All. The app trains the three tree models.

   Note
   • If you have Parallel Computing Toolbox, then the Use Parallel button is selected by default. After you click Train All and select Train All or Train Selected, the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can continue to interact with the app while models train in parallel.
   • If you do not have Parallel Computing Toolbox, then the Use Background Training check box in the Train All menu is selected by default. After you select an option to train models, the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background.
Note Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example.

In the Models pane, each model has a validation accuracy score that indicates the percentage of correctly predicted responses. The app highlights the highest Accuracy (Validation) value (or values) by outlining it in a box.

8. Click a model to view the results, which are displayed in the Summary tab. To open this tab, click the Open selected model summary button in the upper right of the Models pane.
9. For each model, examine the scatter plot. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Scatter in the Validation Results group. An X indicates misclassified points.

   For all three models, the blue points (setosa species) are all correctly classified, but some of the other two species are misclassified. Under Plot, switch between the Data and Model Predictions options. Observe the color of the incorrect (X) points. Alternatively, while plotting model predictions, to view only the incorrect points, clear the Correct check box.

10. To try to improve the models, include different features during model training. See if you can improve the model by removing features with low predictive power. On the Learn tab, in the Options section, click Feature Selection.
In the Default Feature Selection tab, you can select different feature ranking algorithms to determine the most important features. After you select a feature ranking algorithm, the app displays a plot of the sorted feature importance scores, where larger scores (including Infs) indicate greater feature importance. The table shows the ranked features and their scores. In this example, the Chi2, ReliefF, ANOVA, and Kruskal Wallis feature ranking algorithms all identify the petal measurements as the most important features. Under Feature Ranking Algorithm, click Chi2.
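At the command line, the Chi2 ranking shown in the app corresponds roughly to the fscchi2 function. The following minimal sketch assumes the fishertable created in step 1 and the standard fisheriris.csv variable names:

% Sketch only: rank the four predictors by chi-square scores.
predictorNames = ["SepalLength" "SepalWidth" "PetalLength" "PetalWidth"];
[idx,scores] = fscchi2(fishertable(:,predictorNames),fishertable.Species);
predictorNames(idx)   % predictors ordered from most to least important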
Under Feature Selection, use the default option of selecting the highest ranked features to avoid bias in the validation metrics. Specify to keep 2 of the 4 features for model training. Click Save and Apply. The app applies the feature selection changes to new models created using the Models gallery.

11. Train new tree models using the reduced set of features. On the Learn tab, in the Models section, click the arrow to open the gallery. In the Decision Trees group, click All Tree. In the Train section, click Train All and select Train All or Train Selected.
The models trained using only two measurements perform comparably to the models containing all predictors. The models predict no better using all the measurements compared to only the two measurements. If data collection is expensive or difficult, you might prefer a model that performs satisfactorily without some predictors.

12. Note the last model in the Models pane, a Coarse Tree model trained using only 2 of 4 predictors. The app displays how many predictors are excluded. To check which predictors are included, click the model in the Models pane, and note the check boxes in the expanded Feature Selection section of the model Summary tab.

    Note If you use a cross-validation scheme and choose to perform feature selection using the Select highest ranked features option, then for each training fold, the app performs feature selection before training a model. Different folds can select different predictors as the highest ranked features. The table on the Default Feature Selection tab shows the list of predictors used by the full model, trained on the training and validation data.

13. Train new tree models using another subset of measurements. On the Learn tab, in the Options section, click Feature Selection. In the Default Feature Selection tab, click MRMR under Feature Ranking Algorithm. Under Feature Selection, specify to keep 3 of the 4 features for model training. Click Save and Apply.

    On the Learn tab, in the Models section, click the arrow to open the gallery. In the Decision Trees group, click All Tree. In the Train section, click Train All and select Train All or Train Selected.

    The models trained using only 3 of 4 predictors do not perform as well as the other trained models.

14. Choose a best model among those of similar accuracy by examining the performance in each class. For example, select the coarse tree that includes 2 of 4 predictors. Inspect the accuracy of the predictions in each class. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Confusion Matrix (Validation) in the Validation Results group. Use this plot to understand how the currently selected classifier performed in each class. View the matrix of true class and predicted class results.

    Look for areas where the classifier performed poorly by examining cells off the diagonal that display high numbers and are red. In these red cells, the true class and the predicted class do not match. The data points are misclassified.
In this figure, examine the third cell in the middle row. In this cell, true class is versicolor, but the model misclassified the points as virginica. For this model, the cell shows 2 misclassified (your results can vary). To view percentages instead of numbers of observations, select the True Positive Rates option under Plot controls.

You can use this information to help you choose the best model for your goal. If false positives in this class are very important to your classification problem, then choose the best model at predicting this class. If false positives in this class are not very important, and models with fewer predictors do better in other classes, then choose a model to trade off some overall accuracy to exclude some predictors and make future data collection easier.

15. Compare the confusion matrix for each model in the Models pane. Check the Feature Selection section of the model Summary tab to see which predictors are included in each model. In this example, the coarse tree that includes 2 of 4 predictors performs as well as the coarse tree with all predictors. That is, both models provide the same validation accuracy and have the same confusion matrix.

16. To further investigate features to include or exclude, use the parallel coordinates plot. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Parallel Coordinates in the Validation Results group. You can see that petal length and petal width are the features that separate the classes best.
17. To learn about model hyperparameter settings, choose a model in the Models pane and expand the Model Hyperparameters section in the model Summary tab. Compare the coarse and medium tree models, and note the differences in the model hyperparameters. In particular, the Maximum number of splits setting is 4 for coarse trees and 20 for medium trees. This setting controls the tree depth.

    To try to improve the coarse tree model further, change the Maximum number of splits setting. First, click the model in the Models pane. Right-click the model and select Duplicate. In the Summary tab, change the Maximum number of splits value. Then, in the Train section of the Learn tab, click Train All and select Train Selected.

18. Click on the best trained model in the Models pane. To export this model to the workspace, on the Learn tab, click Export, click Export Model and select Export Model. In the Export Classification Model dialog box, click OK to accept the default variable name trainedModel. Look in the command window to see information about the results.

19. To visualize your decision tree model, enter:

    view(trainedModel.ClassificationTree,"Mode","graph")
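The Maximum number of splits setting that you edit in step 17 corresponds to the MaxNumSplits name-value argument of fitctree. A minimal sketch, assuming the fishertable from step 1 with a response variable named Species and an arbitrary example value of 10 splits:

% Sketch only: train a tree with a custom maximum number of splits.
customTree = fitctree(fishertable,"Species","MaxNumSplits",10);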
20. You can use the exported classifier to make predictions on new data. For example, to make predictions for the fishertable data in your workspace, enter:

    [yfit,scores] = trainedModel.predictFcn(fishertable)

    The output yfit contains a class prediction for each data point. The output scores contains the class scores returned by the trained model. scores is an n-by-k array, where n is the number of data points and k is the number of classes in the trained model.

21. If you want to automate training the same classifier with new data, or learn how to programmatically train classifiers, you can generate code from the app. To generate code for the best trained model, on the Learn tab, in the Export section, click Generate Function. The app generates code from your model and displays the file in the MATLAB Editor. To learn more, see “Generate MATLAB Code to Train the Model with New Data” on page 23-87.

This example uses Fisher's 1936 iris data. The iris data contains measurements of flowers: the petal length, petal width, sepal length, and sepal width for specimens from three species. Train a classifier to predict the species based on the predictor measurements.

Use the same workflow to evaluate and compare the other classifier types you can train in Classification Learner.
To try all the nonoptimizable classifier model presets available for your data set:
1 On the Learn tab, in the Models section, click the arrow to open the gallery of classification models.
2 In the Get Started group, click All.
3 In the Train section, click Train All and select Train All.
To learn about other classifier types, see “Train Classification Models in Classification Learner App” on page 23-10.
See Also
Related Examples
• “Train Classification Models in Classification Learner App” on page 23-10
• “Select Data for Classification or Open Saved App Session” on page 23-17
• “Choose Classifier Options” on page 23-22
• “Feature Selection and Feature Transformation Using Classification Learner App” on page 23-44
• “Visualize and Assess Classifier Performance in Classification Learner” on page 23-70
• “Export Classification Model to Predict New Data” on page 23-86
Train Discriminant Analysis Classifiers Using Classification Learner App
This example shows how to construct discriminant analysis classifiers in the Classification Learner app, using the fisheriris data set. You can use discriminant analysis with two or more classes in Classification Learner.
1
In MATLAB, load the fisheriris data set.
fishertable = readtable("fisheriris.csv");
2
On the Apps tab, in the Machine Learning and Deep Learning group, click Classification Learner.
3
On the Learn tab, in the File section, click New Session > From Workspace.
In the New Session from Workspace dialog box, select the table fishertable from the Data Set Variable list (if necessary). Observe that the app has selected response and predictor variables based on their data type. Petal and sepal length and width are predictors, and species is the response that you want to classify. For this example, do not change the selections. 4
Click Start Session. Classification Learner creates a scatter plot of the data.
5
Use the scatter plot to visualize which variables are useful for predicting the response. Select different variables in the X- and Y-axis controls. Observe which variables separate the classes most clearly.
6
Train two discriminant analysis classifiers (one linear and one quadratic). On the Learn tab, in the Models section, click the arrow to expand the list of classifiers, and under Discriminant Analysis, click All Discriminants. Then, in the Train section, click Train All and select Train All. Note • If you have Parallel Computing Toolbox, then the Use Parallel button is selected by default. After you click Train All and select Train All or Train Selected, the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can continue to interact with the app while models train in parallel. • If you do not have Parallel Computing Toolbox, then the Use Background Training check box in the Train All menu is selected by default. After you select an option to train models, the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background.
Classification Learner trains one of each discriminant option in the gallery, as well as the default fine tree model. In the Models pane, the app outlines in a box the Accuracy (Validation) score
of the best model (or models). Classification Learner also displays a validation confusion matrix for the first discriminant model (Linear Discriminant).
Note Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example. 7
To view the results for a model, select the model in the Models pane, and inspect the Summary tab. To open this tab, click the Open selected model summary button in the upper right of the Models pane. The Summary tab displays the Training Results metrics, calculated on the validation set.
8
Select the second discriminant model (Quadratic Discriminant) in the Models pane, and inspect the accuracy of the predictions in each class. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Confusion Matrix (Validation) in the Validation Results group. View the matrix of true class and predicted class results.
9
Compare the results for the two discriminant models. For information on the strengths of different model types, see “Discriminant Analysis” on page 23-28.
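For reference, you can fit comparable linear and quadratic discriminant models at the command line with fitcdiscr. The following is a minimal, hedged sketch, assuming the fishertable table from step 1; it is not guaranteed to use the app's exact settings.
% Minimal sketch: linear and quadratic discriminant analysis at the command line.
rng("default")                                            % for reproducible cross-validation folds
linDA  = fitcdiscr(fishertable,"Species","DiscrimType","linear");
quadDA = fitcdiscr(fishertable,"Species","DiscrimType","quadratic");
linAccuracy  = 1 - kfoldLoss(crossval(linDA,"KFold",5))
quadAccuracy = 1 - kfoldLoss(crossval(quadDA,"KFold",5))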
10 Choose the best model in the Models pane (the best score is highlighted in a box). To improve
the model, try including different features in the model. See if you can improve the model by removing features with low predictive power. First, duplicate the best model. Right-click the model and select Duplicate. 11 Investigate features to include or exclude using one of these methods.
• Use the parallel coordinates plot. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and click Parallel Coordinates in the Validation Results group. Keep predictors that separate classes well.
In the model Summary tab, you can specify the predictors to use during training. Click Feature Selection to expand the section, and specify predictors to remove from the model. • Use a feature ranking algorithm. On the Learn tab, in the Options section, click Feature Selection. In the Default Feature Selection tab, specify the feature ranking algorithm you want to use, and the number of features to keep among the highest ranked features. The bar graph can help you decide how many features to use. Click Save and Apply to save your changes. The new feature selection is applied to the existing draft model in the Models pane and will be applied to new draft models that you create using the gallery in the Models section of the Learn tab. 12 Train the model. On the Learn tab, in the Train section, click Train All and select Train
Selected to train the model using the new options. Compare results among the classifiers in the Models pane. 13 Choose the best model in the Models pane. To try to improve the model further, try changing its
hyperparameters. First, duplicate the model by clicking Duplicate in the Models section. Then, try changing a hyperparameter setting in the model Summary tab. Train the new model by clicking Train All and selecting Train Selected in the Train section. For information on settings, see “Discriminant Analysis” on page 23-28. 14 You can export a full or compact version of the trained model to the workspace. On the
Classification Learner tab, click Export, click Export Model and select Export Model. To exclude the training data and export a compact model, clear the check box in the Export Classification Model dialog box. You can still use the compact model for making predictions on new data. In the dialog box, click OK to accept the default variable name. 15 To examine the code for training this classifier, click Generate Function in the Export section.
Use the same workflow to evaluate and compare the other classifier types you can train in Classification Learner. To try all the nonoptimizable classifier model presets available for your data set:
1 On the Learn tab, in the Models section, click the arrow to open the gallery of classification models.
2 In the Get Started group, click All.
3 In the Train section, click Train All and select Train All.
To learn about other classifier types, see “Train Classification Models in Classification Learner App” on page 23-10.
See Also
Related Examples
• “Train Classification Models in Classification Learner App” on page 23-10
• “Select Data for Classification or Open Saved App Session” on page 23-17
• “Choose Classifier Options” on page 23-22
• “Feature Selection and Feature Transformation Using Classification Learner App” on page 23-44
• “Visualize and Assess Classifier Performance in Classification Learner” on page 23-70
• “Export Classification Model to Predict New Data” on page 23-86
• “Train Decision Trees Using Classification Learner App” on page 23-93
Train Binary GLM Logistic Regression Classifier Using Classification Learner App
This example shows how to train a binary GLM logistic regression classifier in the Classification Learner app using the ionosphere data set which contains two classes. In the ionosphere data, the response variable is categorical with two levels: g represents good radar returns, and b represents bad radar returns.
1
In MATLAB, load the ionosphere data set and define some variables from the data set to use for a classification.
load ionosphere
ionosphere = array2table(X);
ionosphere.Group = Y;
Alternatively, you can load the ionosphere data set and keep the X and Y data as separate variables. 2
On the Apps tab, in the Machine Learning and Deep Learning group, click Classification Learner.
3
On the Learn tab, in the File section, click New Session > From Workspace.
In the New Session from Workspace dialog box, select the table ionosphere from the Data Set Variable list. Observe that the app has selected Group for the response variable, and the rest as predictors. Group has two levels. Alternatively, if you kept your predictor data X and response variable Y as two separate variables, you can first select the matrix X from the Data Set Variable list. Then, under Response, click the From workspace option button and select Y from the list. The Y variable is the same as the Group variable. 4
Click Start Session. Classification Learner creates a scatter plot of the data.
5
Use the scatter plot to visualize which variables are useful for predicting the response. Select different variables in the X- and Y-axis controls. Observe which variables separate the class colors most clearly.
6
Train the binary GLM logistic regression classifier. On the Learn tab, in the Models section, click the Show more arrow to display the gallery of classifiers. Under Logistic Regression Classifiers, click Binary GLM Logistic Regression. Then, in the Train section, click Train All and select Train All. Note • If you have Parallel Computing Toolbox, then the Use Parallel button is selected by default. After you click Train All and select Train All or Train Selected, the app opens a parallel
pool of workers. During this time, you cannot interact with the software. After the pool opens, you can continue to interact with the app while models train in parallel. • If you do not have Parallel Computing Toolbox, then the Use Background Training check box in the Train All menu is selected by default. After you select an option to train models, the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background.
Classification Learner trains the model as well as the default fine tree model. The app outlines in a box the Accuracy (Validation) score of the best model. Classification Learner also displays a validation confusion matrix for the logistic regression model.
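For reference at the command line, a binary GLM logistic regression corresponds to a binomial generalized linear model fit with fitglm. The following is a minimal, hedged sketch, assuming the X and Y variables from load ionosphere in step 1; it reports resubstitution (training) accuracy rather than the app's cross-validated accuracy, and it is not the app's exact implementation.
% Minimal sketch of a binary logistic regression fit at the command line.
load ionosphere                                           % X (predictors) and Y (labels)
y = strcmp(Y,"g");                                        % logical response: true = good return
mdl = fitglm(X,y,"linear","Distribution","binomial");     % binomial GLM (logistic regression)
phat = predict(mdl,X);                                    % fitted probability of a good return
resubAccuracy = mean((phat > 0.5) == y)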
Note Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example. 7
To view the results for the model, inspect the Summary tab. To open this tab, click the Open selected model summary button in the upper right of the Models pane. The Summary tab displays the Training Results metrics, calculated on the validation set.
8
Examine the scatter plot for the trained model. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Scatter in the Validation Results group. Try plotting different predictors. Misclassified points are shown as an X.
9
Inspect the accuracy of the predictions in each class. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Confusion Matrix (Validation) in the Validation Results group. View the matrix of true class and predicted class results.
10 Choose the best model in the Models pane (the best score is highlighted in a box). To improve
the model, try including different features in the model. See if you can improve the model by removing features with low predictive power. First, duplicate the best model. Right-click the model and select Duplicate.
11 Investigate features to include or exclude using one of these methods.
• Use the parallel coordinates plot. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Parallel Coordinates in the Validation Results group. Keep predictors that separate classes well. In the model Summary tab, you can specify the predictors to use during training. Click Feature Selection to expand the section, and specify predictors to remove from the model. • Use a feature ranking algorithm. On the Learn tab, in the Options section, click Feature Selection. In the Default Feature Selection tab, specify the feature ranking algorithm you want to use, and the number of features to keep among the highest ranked features. The bar graph can help you decide how many features to use. Click Save and Apply to save your changes. The new feature selection is applied to the existing draft model in the Models pane and will be applied to new draft models that you create using the gallery in the Models section of the Learn tab. 12 Train the model. On the Learn tab, in the Train section, click Train All and select Train
Selected to train the model using the new options. Compare results among the classifiers in the Models pane. 13 You can export a full version of the trained model to the workspace. On the Learn tab, click
Export, click Export Model and select Export Model. In the Export Classification Model dialog box, the check box to include the training data is selected and disabled, because binary GLM logistic regression models always store training data. Click OK in the dialog box to accept the default variable name. 14 To examine the code for training this classifier, click Generate Function.
Use the same workflow to evaluate and compare the other classifier types you can train in Classification Learner. To try all the nonoptimizable classifier model presets available for your data set:
1 On the Learn tab, in the Models section, click the arrow to open the gallery of classification models.
2 In the Get Started group, click All.
3 In the Train section, click Train All and select Train All.
To learn about other classifier types, see “Train Classification Models in Classification Learner App” on page 23-10.
See Also
Related Examples
• “Train Classification Models in Classification Learner App” on page 23-10
• “Select Data for Classification or Open Saved App Session” on page 23-17
• “Choose Classifier Options” on page 23-22
• “Logistic Regression Classifiers” on page 23-29
• “Feature Selection and Feature Transformation Using Classification Learner App” on page 23-44
• “Visualize and Assess Classifier Performance in Classification Learner” on page 23-70
• “Export Classification Model to Predict New Data” on page 23-86
• “Train Decision Trees Using Classification Learner App” on page 23-93
Train Support Vector Machines Using Classification Learner App
This example shows how to construct support vector machine (SVM) classifiers in the Classification Learner app, using the ionosphere data set that contains two classes. You can use a support vector machine (SVM) with two or more classes in Classification Learner. An SVM classifies data by finding the best hyperplane that separates all data points of one class from those of another class. In the ionosphere data, the response variable is categorical with two levels: g represents good radar returns, and b represents bad radar returns.
1
In MATLAB, load the ionosphere data set and define some variables from the data set to use for a classification.
load ionosphere
ionosphere = array2table(X);
ionosphere.Group = Y;
Alternatively, you can load the ionosphere data set and keep the X and Y data as separate variables. 2
On the Apps tab, in the Machine Learning and Deep Learning group, click Classification Learner.
3
On the Learn tab, in the File section, click New Session > From Workspace.
In the New Session from Workspace dialog box, select the table ionosphere from the Data Set Variable list. Observe that the app has selected response and predictor variables based on their data type. The response variable Group has two levels. All the other variables are predictors. Alternatively, if you kept your predictor data X and response variable Y as two separate variables, you can first select the matrix X from the Data Set Variable list. Then, under Response, click the From workspace option button and select Y from the list. The Y variable is the same as the Group variable. 4
Click Start Session. Classification Learner creates a scatter plot of the data.
5
Use the scatter plot to visualize which variables are useful for predicting the response. Select different variables in the X- and Y-axis controls. Observe which variables separate the class colors most clearly.
6
Train a selection of SVM models. On the Learn tab, in the Models section, click the arrow to expand the list of classifiers, and under Support Vector Machines, click All SVMs. Then, in the Train section, click Train All and select Train All. Note • If you have Parallel Computing Toolbox, then the Use Parallel button is selected by default. After you click Train All and select Train All or Train Selected, the app opens a parallel
pool of workers. During this time, you cannot interact with the software. After the pool opens, you can continue to interact with the app while models train in parallel. • If you do not have Parallel Computing Toolbox, then the Use Background Training check box in the Train All menu is selected by default. After you select an option to train models, the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background.
Classification Learner trains one of each SVM option in the gallery, as well as the default fine tree model. In the Models pane, the app outlines in a box the Accuracy (Validation) score of the best model. Classification Learner also displays a validation confusion matrix for the first SVM model (Linear SVM).
Note Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example.
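For reference, a linear SVM comparable to the app's Linear SVM preset can be trained at the command line with fitcsvm. The following is a minimal, hedged sketch, assuming the X and Y variables from load ionosphere in step 1; the settings are illustrative and may not match the app preset exactly.
% Minimal sketch: a linear SVM with 5-fold cross-validation, as in the app's default scheme.
rng("default")                                            % for reproducible cross-validation folds
svmMdl = fitcsvm(X,Y,"KernelFunction","linear","Standardize",true);
cvMdl  = crossval(svmMdl,"KFold",5);
validationAccuracy = 1 - kfoldLoss(cvMdl)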
7
To view the results for a model, select the model in the Models pane, and inspect the Summary tab. To open this tab, click the Open selected model summary button in the upper right of the Models pane. The Summary tab displays the Training Results metrics, calculated on the validation set.
8
For the selected model, inspect the accuracy of the predictions in each class. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Confusion Matrix (Validation) in the Validation Results group. View the matrix of true class and predicted class results.
9
For each remaining model, select the model in the Models pane, open the validation confusion matrix, and then compare the results across the models.
10 Choose the best model in the Models pane (the best score is highlighted in a box). To improve
the model, try including different features in the model. See if you can improve the model by removing features with low predictive power. First, duplicate the best model by right-clicking the model and selecting Duplicate. 11 Investigate features to include or exclude using one of these methods.
• Use the parallel coordinates plot. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Parallel Coordinates in the Validation Results group. Keep predictors that separate classes well. In the model Summary tab, you can specify the predictors to use during training. Click Feature Selection to expand the section, and specify predictors to remove from the model. • Use a feature ranking algorithm. On the Learn tab, in the Options section, click Feature Selection. In the Default Feature Selection tab, specify the feature ranking algorithm you want to use, and the number of features to keep among the highest ranked features. The bar graph can help you decide how many features to use. Click Save and Apply to save your changes. The new feature selection is applied to the existing draft model in the Models pane and will be applied to new draft models that you create using the gallery in the Models section of the Learn tab. 12 Train the model. On the Learn tab, in the Train section, click Train All and select Train
Selected to train the model using the new options. Compare results among the classifiers in the Models pane. 13 Choose the best model in the Models pane. To try to improve the model further, try changing its
hyperparameters. First, duplicate the best model by right-clicking the model and selecting Duplicate. Then, try changing a hyperparameter setting in the model Summary tab. Train the new model by clicking Train All and selecting Train Selected in the Train section. For information on settings, see “Support Vector Machines” on page 23-31. 14 You can export a full or compact version of the trained model to the workspace. On the
Classification Learner tab, click Export, click Export Model and select Export Model. To exclude the training data and export a compact model, clear the check box in the Export Classification Model dialog box. You can still use the compact model for making predictions on new data. In the dialog box, click OK to accept the default variable name. 15 To examine the code for training this classifier, click Generate Function. For SVM models, see
also “Generate C Code for Prediction” on page 23-88.
Use the same workflow to evaluate and compare the other classifier types you can train in Classification Learner. To try all the nonoptimizable classifier model presets available for your data set:
1 On the Learn tab, in the Models section, click the arrow to open the gallery of classification models.
2 In the Get Started group, click All.
3 In the Train section, click Train All and select Train All.
To learn about other classifier types, see “Train Classification Models in Classification Learner App” on page 23-10.
See Also
Related Examples
• “Train Classification Models in Classification Learner App” on page 23-10
• “Select Data for Classification or Open Saved App Session” on page 23-17
• “Choose Classifier Options” on page 23-22
• “Feature Selection and Feature Transformation Using Classification Learner App” on page 23-44
• “Visualize and Assess Classifier Performance in Classification Learner” on page 23-70
• “Export Classification Model to Predict New Data” on page 23-86
• “Generate C Code for Prediction” on page 23-88
• “Train Decision Trees Using Classification Learner App” on page 23-93
Train Nearest Neighbor Classifiers Using Classification Learner App
This example shows how to construct nearest neighbor classifiers in the Classification Learner app.
1
In MATLAB, load the fisheriris data set and define some variables from the data set to use for a classification.
fishertable = readtable("fisheriris.csv");
2
On the Apps tab, in the Machine Learning and Deep Learning group, click Classification Learner.
3
On the Learn tab, in the File section, click New Session > From Workspace.
In the New Session from Workspace dialog box, select the table fishertable from the Data Set Variable list (if necessary). Observe that the app has selected response and predictor variables based on their data type. Petal and sepal length and width are predictors, and species is the response that you want to classify. For this example, do not change the selections. 4
Click Start Session. The app creates a scatter plot of the data.
5
Use the scatter plot to investigate which variables are useful for predicting the response. To visualize the distribution of species and measurements, select different options in the X- and Y-axis controls. Observe which variables separate the species colors most clearly.
6
To train a selection of nearest neighbor models, on the Learn tab, in the Models section, click the arrow to expand the list of classifiers, and under Nearest Neighbor Classifiers, click All KNNs. Then, in the Train section, click Train All and select Train All. Note • If you have Parallel Computing Toolbox, then the Use Parallel button is selected by default. After you click Train All and select Train All or Train Selected, the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can continue to interact with the app while models train in parallel. • If you do not have Parallel Computing Toolbox, then the Use Background Training check box in the Train All menu is selected by default. After you select an option to train models, the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background.
Classification Learner trains one of each nearest neighbor classification option in the gallery, as well as the default fine tree model. The app outlines in a box the Accuracy (Validation) score of the best model. Classification Learner also displays a validation confusion matrix for the first KNN model (Fine KNN).
Note Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example. 7
To view the results for a model, select the model in the Models pane, and inspect the Summary tab. To open this tab, click the Open selected model summary button in the upper right of the Models pane. The Summary tab displays the Training Results metrics, calculated on the validation set.
8
For the selected model, inspect the accuracy of the predictions in each class. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Confusion Matrix (Validation) in the Validation Results group. View the matrix of true class and predicted class results.
9
For each remaining model, select the model in the Models pane, open the validation confusion matrix, and then compare the results across the models.
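For reference, you can compare several nearest neighbor models at the command line with fitcknn. The following is a minimal, hedged sketch, assuming the fishertable table from step 1; the neighbor counts 1, 10, and 100 roughly mirror the fine, medium, and coarse KNN presets but are not guaranteed to match the app's settings.
% Minimal sketch: cross-validated accuracy for a few numbers of neighbors.
rng("default")                                            % for reproducible cross-validation folds
for k = [1 10 100]                                        % roughly fine, medium, and coarse KNN
    mdl = fitcknn(fishertable,"Species","NumNeighbors",k,"Standardize",true);
    cv  = crossval(mdl,"KFold",5);
    fprintf("NumNeighbors = %3d, validation accuracy = %.3f\n",k,1-kfoldLoss(cv))
end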
10 Choose the best model in the Models pane (the best score is highlighted in a box). To improve
the model, try including different features in the model. See if you can improve the model by removing features with low predictive power. First, duplicate the best model by right-clicking the model and selecting Duplicate. 11 Investigate features to include or exclude using one of these methods.
• Use the parallel coordinates plot. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Parallel Coordinates in the Validation Results group. Keep predictors that separate classes well. In the model Summary tab, you can specify the predictors to use during training. Click Feature Selection to expand the section, and specify predictors to remove from the model. • Use a feature ranking algorithm. On the Learn tab, in the Options section, click Feature Selection. In the Default Feature Selection tab, specify the feature ranking algorithm you
want to use, and the number of features to keep among the highest ranked features. The bar graph can help you decide how many features to use. Click Save and Apply to save your changes. The new feature selection is applied to the existing draft model in the Models pane and will be applied to new draft models that you create using the gallery in the Models section of the Learn tab. 12 Train the model. On the Learn tab, in the Train section, click Train All and select Train
Selected to train the model using the new options. Compare results among the classifiers in the Models pane. 13 Choose the best model in the Models pane. To try to improve the model further, try changing its
hyperparameters. First, duplicate the best model by right-clicking the model and selecting Duplicate. Then, try changing a hyperparameter setting in the model Summary tab. Train the new model by clicking Train All and selecting Train Selected in the Train section. For information on settings and the strengths of different nearest neighbor model types, see “Nearest Neighbor Classifiers” on page 23-35. 14 You can export a full version of the trained model to the workspace. On the Learn tab, click
Export, click Export Model and select Export Model. In the Export Classification Model dialog box, the check box to include the training data is selected and disabled, because nearest neighbor models always store training data. Click OK in the dialog box to accept the default variable name. 15 To examine the code for training this classifier, click Generate Function.
Use the same workflow to evaluate and compare the other classifier types you can train in Classification Learner. To try all the nonoptimizable classifier model presets available for your data set:
1 On the Learn tab, in the Models section, click the arrow to open the gallery of classification models.
2 In the Get Started group, click All.
3 In the Train section, click Train All and select Train All.
To learn about other classifier types, see “Train Classification Models in Classification Learner App” on page 23-10.
See Also
Related Examples
• “Train Classification Models in Classification Learner App” on page 23-10
• “Select Data for Classification or Open Saved App Session” on page 23-17
• “Choose Classifier Options” on page 23-22
• “Feature Selection and Feature Transformation Using Classification Learner App” on page 23-44
• “Visualize and Assess Classifier Performance in Classification Learner” on page 23-70
• “Export Classification Model to Predict New Data” on page 23-86
• “Train Decision Trees Using Classification Learner App” on page 23-93
Train Kernel Approximation Classifiers Using Classification Learner App
This example shows how to create and compare kernel approximation classifiers in the Classification Learner app, and export trained models to the workspace to make predictions for new data. You can use kernel approximation classifiers to perform nonlinear classification of data with many observations. For large in-memory data, kernel classifiers tend to train and predict faster than SVM classifiers with Gaussian kernels.
1
In the MATLAB Command Window, load the humanactivity data set, and create a table from the variables in the data set to use for classification. The data set contains 24,075 observations of five physical human activities: sitting, standing, walking, running, and dancing. Each observation has 60 features extracted from acceleration data measured by smartphone accelerometer sensors.
load humanactivity
Tbl = array2table(feat);
Tbl.Properties.VariableNames = featlabels';
activity = categorical(actid,1:5,actnames);
Tbl.Activity = activity;
Alternatively, you can load the humanactivity data set, create the categorical activity response variable, and keep the feat and activity data as separate variables. 2
Click the Apps tab, and then click the Show more arrow on the right to open the apps gallery. In the Machine Learning and Deep Learning group, click Classification Learner.
3
On the Learn tab, in the File section, click New Session and select From Workspace.
4
In the New Session from Workspace dialog box, select the table Tbl from the Data Set Variable list. Note that the app selects response and predictor variables based on their data types. In particular, the app selects Activity as the response variable because it is the only categorical variable. For this example, do not change the selections. Alternatively, if you keep the predictor data feat and response variable activity as two separate variables, you can first select the matrix feat from the Data Set Variable list. Then, under Response, click the From workspace option button and select activity from the list.
5
To accept the default validation scheme and continue, click Start Session. The default validation option is 5-fold cross-validation, to protect against overfitting. Classification Learner creates a scatter plot of the data.
6
Use the scatter plot to investigate which variables are useful for predicting the response. Select different options in the X and Y lists under Predictors to visualize the distribution of activities and measurements. Note which variables separate the activities (colors) most clearly.
7
Create a selection of kernel approximation models. On the Learn tab, in the Models section, click the arrow to open the gallery. In the Kernel Approximation Classifiers group, click All Kernels.
8
In the Train section, click Train All and select Train All. Note • If you have Parallel Computing Toolbox, then the Use Parallel button is selected by default. After you click Train All and select Train All or Train Selected, the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can continue to interact with the app while models train in parallel. • If you do not have Parallel Computing Toolbox, then the Use Background Training check box in the Train All menu is selected by default. After you select an option to train models, the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background.
Classification Learner trains one of each kernel approximation option in the gallery, as well as the default fine tree model. In the Models pane, the app outlines the Accuracy (Validation) score of the best model. Classification Learner also displays a validation confusion matrix for the first kernel model (SVM Kernel).
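For reference, a multiclass kernel approximation model similar to the SVM Kernel preset can be trained at the command line by combining templateKernel with fitcecoc (the code that the app generates for this data set uses fitcecoc, as described later in this example). The following is a minimal, hedged sketch, assuming the Tbl table from step 1; the settings are illustrative, and cross-validation on this data set can take several minutes.
% Minimal sketch: a multiclass kernel approximation model at the command line.
rng("default")                                            % for reproducible cross-validation folds
kernelLearner = templateKernel("Learner","svm");          % SVM learner with kernel approximation
mdl = fitcecoc(Tbl,"Activity","Learners",kernelLearner);
cv  = crossval(mdl,"KFold",5);                            % this step can take a while
validationAccuracy = 1 - kfoldLoss(cv)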
Note Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example.
9 To view the results for a model, double-click the model in the Models pane, and inspect the model Summary tab. The Summary tab displays the Training Results metrics, calculated on the validation set.
10 Select the second kernel model (Logistic Regression Kernel) in the Models pane, and inspect
the accuracy of the predictions in each class using a validation confusion matrix. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Confusion Matrix (Validation) in the Validation Results group. View the matrix of true class and predicted class results. 11 Compare the confusion matrices for the two kernel models side-by-side. First, close the plot and
summary tabs for Model 1. On the Learn tab, in the Plots and Results section, click the Layout button and select Compare models. In the top right of each plot, click the Hide plot options button
to make more room for the plot.
To return to the original layout, you can click the Layout button and select Single model (Default). 12 Choose the best kernel model in the Models pane (the best overall score is highlighted in the
Accuracy (Validation) box). See if you can improve the model by removing features with low predictive power. First, duplicate the best kernel model by right-clicking the model and selecting Duplicate. 13 Investigate features to include or exclude using one of these methods.
• Use the parallel coordinates plot. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Parallel Coordinates in the Validation Results group. Keep predictors that separate classes well. In the model Summary tab, you can specify the predictors to use during training. Click Feature Selection to expand the section, and specify predictors to remove from the model. • Use a feature ranking algorithm. On the Learn tab, in the Options section, click Feature Selection. In the Default Feature Selection tab, specify the feature ranking algorithm you want to use, and the number of features to keep among the highest ranked features. The bar graph can help you decide how many features to use. Click Save and Apply to save your changes. The new feature selection is applied to the existing draft model in the Models pane and will be applied to new draft models that you create using the gallery in the Models section of the Learn tab. 14 Train the model. On the Learn tab, in the Train section, click Train All and select Train
Selected to train the model using the new options. Compare results among the classifiers in the Models pane. 15 Choose the best kernel model in the Models pane. To try to improve the model further, change
its hyperparameters. First, duplicate the model by right-clicking the model and selecting Duplicate. Then, try changing some hyperparameter settings in the model Summary tab. Train the new model by clicking Train All and selecting Train Selected in the Train section. To learn more about kernel model settings, see “Kernel Approximation Classifiers” on page 23-37. 16 You can export a compact version of the trained model to the workspace. On the Learn tab, click
Export, click Export Model and select Export Model. In the Export Classification Model dialog box, the check box to include the training data is disabled because kernel approximation models do not store training data. In the dialog box, click OK to accept the default variable name. 17 To examine the code for training this classifier, click Generate Function in the Export section.
Because the data set used to train this classifier has more than two classes, the generated code uses the fitcecoc function rather than fitckernel. Tip Use the same workflow to evaluate and compare the other classifier types you can train in Classification Learner. To train all the nonoptimizable classifier model presets available for your data set:
1 On the Learn tab, in the Models section, click the arrow to open the gallery of classification models.
2 In the Get Started group, click All.
3 In the Train section, click Train All and select Train All.
To learn about other classifier types, see “Train Classification Models in Classification Learner App” on page 23-10.
See Also
Related Examples
• “Train Classification Models in Classification Learner App” on page 23-10
• “Select Data for Classification or Open Saved App Session” on page 23-17
• “Choose Classifier Options” on page 23-22
• “Feature Selection and Feature Transformation Using Classification Learner App” on page 23-44
• “Visualize and Assess Classifier Performance in Classification Learner” on page 23-70
• “Export Classification Model to Predict New Data” on page 23-86
• “Train Decision Trees Using Classification Learner App” on page 23-93
Train Ensemble Classifiers Using Classification Learner App
This example shows how to construct ensembles of classifiers in the Classification Learner app. Ensemble classifiers meld results from many weak learners into one high-quality ensemble predictor. Qualities depend on the choice of algorithm, but ensemble classifiers tend to be slow to fit because they often need many learners.
1
In MATLAB, load the fisheriris data set and define some variables from the data set to use for a classification.
fishertable = readtable("fisheriris.csv");
2
On the Apps tab, in the Machine Learning and Deep Learning group, click Classification Learner.
3
On the Learn tab, in the File section, click New Session > From Workspace.
In the New Session from Workspace dialog box, select the table fishertable from the Data Set Variable list (if necessary). Observe that the app has selected response and predictor variables based on their data type. Petal and sepal length and width are predictors. Species is the response that you want to classify. For this example, do not change the selections. 4
Click Start Session. Classification Learner creates a scatter plot of the data.
5
Use the scatter plot to investigate which variables are useful for predicting the response. Select different variables in the X- and Y-axis controls to visualize the distribution of species and measurements. Observe which variables separate the species colors most clearly.
6
Train a selection of ensemble models. On the Learn tab, in the Models section, click the arrow to expand the list of classifiers, and under Ensemble Classifiers, click All Ensembles. Then, in the Train section, click Train All and select Train All. Note • If you have Parallel Computing Toolbox, then the Use Parallel button is selected by default. After you click Train All and select Train All or Train Selected, the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can continue to interact with the app while models train in parallel. • If you do not have Parallel Computing Toolbox, then the Use Background Training check box in the Train All menu is selected by default. After you select an option to train models, the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background.
Classification Learner trains one of each ensemble classification option in the gallery, as well as the default fine tree model. In the Models pane, the app outlines in a box the Accuracy (Validation) score of the best model. Classification Learner also displays a validation confusion matrix for the first ensemble model (Boosted Trees).
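For reference, a boosted tree ensemble broadly similar to the Boosted Trees preset can be trained at the command line with fitcensemble. The following is a minimal, hedged sketch, assuming the fishertable table from step 1; the method, learner, and number of learning cycles are illustrative choices, not the app's exact preset values.
% Minimal sketch: a multiclass boosted tree ensemble at the command line.
rng("default")                                            % for reproducible cross-validation folds
treeLearner = templateTree("MaxNumSplits",20);            % shallow trees as weak learners
ens = fitcensemble(fishertable,"Species","Method","AdaBoostM2", ...
    "Learners",treeLearner,"NumLearningCycles",30);
validationAccuracy = 1 - kfoldLoss(crossval(ens,"KFold",5))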
7
Select a model in the Models pane to view the results. For example, select the Subspace Discriminant model (model 2.3). Inspect the model Summary tab, which displays the Training Results metrics, calculated on the validation set.
8
Examine the scatter plot for the trained model. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Scatter in the Validation Results group. Misclassified points are shown as an X.
Note Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example. 9
Inspect the accuracy of the predictions in each class. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Confusion Matrix (Validation) in the Validation Results group. View the matrix of true class and predicted class results.
10 For each remaining model, select the model in the Models pane, open the validation confusion
matrix, and then compare the results across the models. 11 Choose the best model (the best score is highlighted in the Accuracy (Validation) box). To
improve the model, try including different features in the model. See if you can improve the model by removing features with low predictive power. First, duplicate the model by right-clicking the model and selecting Duplicate. 12 Investigate features to include or exclude using one of these methods.
• Use the parallel coordinates plot. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Parallel Coordinates in the Validation Results group. Keep predictors that separate classes well.
In the model Summary tab, you can specify the predictors to use during training. Click Feature Selection to expand the section, and specify predictors to remove from the model. • Use a feature ranking algorithm. On the Learn tab, in the Options section, click Feature Selection. In the Default Feature Selection tab, specify the feature ranking algorithm you want to use, and the number of features to keep among the highest ranked features. The bar graph can help you decide how many features to use. Click Save and Apply to save your changes. The new feature selection is applied to the existing draft model in the Models pane and will be applied to new draft models that you create using the gallery in the Models section of the Learn tab. 13 Train the model. On the Learn tab, in the Train section, click Train All and select Train
Selected to train the model using the new options. Compare results among the classifiers in the Models pane. 14 Choose the best model in the Models pane. To try to improve the model further, try changing its
hyperparameters. First, duplicate the model by right-clicking the model and selecting Duplicate. Then, try changing a hyperparameter setting in the model Summary tab. Train the new model by clicking Train All and selecting Train Selected in the Train section. For information on the settings to try and the strengths of different ensemble model types, see “Ensemble Classifiers” on page 23-38. 15 You can export a full or compact version of the trained model to the workspace. On the
Classification Learner tab, click Export, click Export Model and select Export Model. To exclude the training data and export a compact model, clear the check box in the Export Classification Model dialog box. You can still use the compact model for making predictions on new data. In the dialog box, click OK to accept the default variable name. 16 To examine the code for training this classifier, click Generate Function.
Use the same workflow to evaluate and compare the other classifier types you can train in Classification Learner. To try all the nonoptimizable classifier model presets available for your data set:
1 On the Learn tab, in the Models section, click the arrow to open the gallery of classification models.
2 In the Get Started group, click All.
3 In the Train section, click Train All and select Train All.
To learn about other classifier types, see “Train Classification Models in Classification Learner App” on page 23-10.
See Also
Related Examples
• “Train Classification Models in Classification Learner App” on page 23-10
• “Select Data for Classification or Open Saved App Session” on page 23-17
• “Choose Classifier Options” on page 23-22
• “Feature Selection and Feature Transformation Using Classification Learner App” on page 23-44
• “Visualize and Assess Classifier Performance in Classification Learner” on page 23-70
• “Export Classification Model to Predict New Data” on page 23-86
• “Train Decision Trees Using Classification Learner App” on page 23-93
Train Naive Bayes Classifiers Using Classification Learner App
This example shows how to create and compare different naive Bayes classifiers using the Classification Learner app, and export trained models to the workspace to make predictions for new data. Naive Bayes classifiers leverage Bayes' theorem and make the assumption that predictors are independent of one another within each class. However, the classifiers appear to work well even when the independence assumption is not valid. You can use naive Bayes with two or more classes in Classification Learner. The app allows you to train a Gaussian naive Bayes model or a kernel naive Bayes model individually or simultaneously. This table lists the available naive Bayes models in Classification Learner and the probability distributions used by each model to fit predictors.
Model: Gaussian naive Bayes
Numerical predictor: Gaussian distribution (or normal distribution)
Categorical predictor: multivariate multinomial distribution
Model: Kernel naive Bayes
Numerical predictor: Kernel distribution. You can specify the kernel type and support. Classification Learner automatically determines the kernel width using the underlying fitcnb function.
Categorical predictor: multivariate multinomial distribution
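For reference, the two model types in the table correspond to fitcnb calls at the command line. The following is a minimal, hedged sketch, assuming the fishertable table created in step 1 below; the triangle kernel and positive support mirror the settings used later in this example, and the defaults may differ from the app presets.
% Minimal sketch: Gaussian and kernel naive Bayes at the command line.
rng("default")                                            % for reproducible cross-validation folds
gaussNB  = fitcnb(fishertable,"Species");                 % Gaussian (normal) distribution per predictor
kernelNB = fitcnb(fishertable,"Species","DistributionNames","kernel", ...
    "Kernel","triangle","Support","positive");
gaussAccuracy  = 1 - kfoldLoss(crossval(gaussNB,"KFold",5))
kernelAccuracy = 1 - kfoldLoss(crossval(kernelNB,"KFold",5))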
This example uses Fisher's iris data set, which contains measurements of flowers (petal length, petal width, sepal length, and sepal width) for specimens from three species. Train naive Bayes classifiers to predict the species based on the predictor measurements. 1
In the MATLAB Command Window, load the Fisher iris data set and create a table of measurement predictors (or features) using variables from the data set. fishertable = readtable("fisheriris.csv");
2
Click the Apps tab, and then click the arrow at the right of the Apps section to open the apps gallery. In the Machine Learning and Deep Learning group, click Classification Learner.
3
On the Learn tab, in the File section, select New Session > From Workspace.
4
In the New Session from Workspace dialog box, select the table fishertable from the Data Set Variable list (if necessary). As shown in the dialog box, the app selects the response and predictor variables based on their data type. Petal and sepal length and width are predictors, and species is the response that you want to classify. For this example, do not change the selections.
5
To accept the default validation scheme and continue, click Start Session. The default validation option is cross-validation, to protect against overfitting. Classification Learner creates a scatter plot of the data.
6
Use the scatter plot to investigate which variables are useful for predicting the response. Select different options on the X and Y lists under Predictors to visualize the distribution of species and measurements. Observe which variables separate the species colors most clearly. The setosa species (blue points) is easy to separate from the other two species with all four predictors. The versicolor and virginica species are much closer together in all predictor measurements and overlap, especially when you plot sepal length and width. setosa is easier to predict than the other two species.
7
Create a naive Bayes model. On the Learn tab, in the Models section, click the arrow to open the gallery. In the Naive Bayes Classifiers group, click Gaussian Naive Bayes. Note that the Model Hyperparameters section of the model Summary tab contains no hyperparameter options.
8
In the Train section, click Train All and select Train Selected. Note
• If you have Parallel Computing Toolbox, then the Use Parallel button is selected by default. After you click Train All and select Train All or Train Selected, the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can continue to interact with the app while models train in parallel. • If you do not have Parallel Computing Toolbox, then the Use Background Training check box in the Train All menu is selected by default. After you select an option to train models, the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background.
The app creates a Gaussian naive Bayes model, and plots a validation confusion matrix. The app displays the Gaussian Naive Bayes model in the Models pane. Check the model validation accuracy in the Accuracy (Validation) box. The value shows that the model performs well. For the Gaussian Naive Bayes model, by default, the app models the distribution of numerical predictors using the Gaussian distribution, and models the distribution of categorical predictors using the multivariate multinomial distribution (MVMN).
Note Validation introduces some randomness into the results. Your model validation results might vary from the results shown in this example. 9
Examine the scatter plot for the trained model. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Scatter in the Validation Results group. An X indicates a misclassified point. The blue points (setosa species) are all correctly classified, but the other two species have misclassified points. Under Plot, switch between the
Data and Model predictions options. Observe the color of the incorrect (X) points. Or, to view only the incorrect points, clear the Correct check box. 10 Train a kernel naive Bayes model for comparison. On the Learn tab, in the Models gallery, click
Kernel Naive Bayes. The app displays a draft kernel naive Bayes model in the Models pane. In the model Summary tab, under Model Hyperparameters, select Triangle from the Kernel type list, select Positive from the Support list, and select No from the Standardize data list.
In the Train section, click Train All and select Train Selected to train the new model.
The Models pane displays the model validation accuracy for the new kernel naive Bayes model. Its validation accuracy is better than the validation accuracy of the Gaussian naive Bayes model. The app highlights the Accuracy (Validation) value of the best model (or models) by outlining it in a box. 11 In the Models pane, click each model to view and compare the results. To view the results for a
model, inspect the model Summary tab. The Summary tab displays the Training Results metrics, calculated on the validation set. 12 Train a Gaussian naive Bayes model and a kernel naive Bayes model simultaneously. On the
Learn tab, in the Models gallery, click All Naive Bayes. In the Train section, click Train All and select Train Selected. The app trains one of each naive Bayes model type and highlights the Accuracy (Validation) value of the best model or models. Classification Learner displays a validation confusion matrix for the first model (model 4.1).
13 In the Models pane, click a model to view the results. For example, select model 2. Examine the
scatter plot for the trained model. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Scatter in the Validation Results group. Try plotting different predictors. Misclassified points appear as an X. 14 Inspect the accuracy of the predictions in each class. On the Learn tab, in the Plots and
Results section, click the arrow to open the gallery, and then click Confusion Matrix (Validation) in the Validation Results group. The app displays a matrix of true class and predicted class results. 15 In the Models pane, click the other trained models and compare their results. 16 To try to improve the models, include different features during model training. See if you can
improve the models by removing features with low predictive power. On the Learn tab, in the Options section, click Feature Selection. In the Default Feature Selection tab, you can select different feature ranking algorithms to determine the most important features. After you select a feature ranking algorithm, the app displays a plot of the sorted feature importance scores, where larger scores (including Infs) indicate greater feature importance. The table shows the ranked features and their scores. In this example, use one-way ANOVA to rank the features. Under Feature Ranking Algorithm, click ANOVA.
Under Feature Selection, use the default option of selecting the highest ranked features to avoid bias in the validation metrics. Specify to keep 2 of the 4 features for model training. Click Save and Apply. The app applies the feature selection changes to new models created using the Models gallery. 17 Train new naive Bayes models using the reduced set of features. On the Learn tab, in the
Models gallery, click All Naive Bayes. In the Train section, click Train All and select Train Selected. In this example, the two models trained using a reduced set of features perform better than the models trained using all the predictors. If data collection is expensive or difficult, you might prefer a model that performs satisfactorily without some predictors. 18 To determine which predictors are included, click a model in the Models pane, and note the
check boxes in the expanded Feature Selection section of the model Summary tab. For example, model 5.1 contains only the petal measurements.
Note If you use a cross-validation scheme and choose to perform feature selection using the Select highest ranked features option, then for each training fold, the app performs feature selection before training a model. Different folds can select different predictors as the highest ranked features. The table on the Default Feature Selection tab shows the list of predictors used by the full model, trained on the training and validation data. 19 To further investigate features to include or exclude, use the parallel coordinates plot. On the
Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Parallel Coordinates in the Validation Results group. 20 In the Models pane, click the model with the highest Accuracy (Validation) value. To try to
improve the model further, change its hyperparameters (if possible). First, duplicate the model by right-clicking the model and selecting Duplicate. Then, try changing hyperparameter settings in the model Summary tab. Recall that hyperparameter options are available only for some models. Train the new model by clicking Train All and selecting Train Selected in the Train section. 21 Export the trained model to the workspace. On the Learn tab, in the Export section, click
Export Model and select Export Model. In the Export Classification Model dialog box, click OK to accept the default variable name. 22 Examine the code for training this classifier. In the Export section, click Generate Function.
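The exported variable is a structure whose predictFcn field makes predictions on new data in the same table format used for training. The following lines are a minimal command-line sketch, not app-generated code; trainedModel is the default variable name suggested by the export dialog box, and fishertable stands in for the table used to train the model.
% Minimal sketch: predict with a model exported from Classification Learner.
% trainedModel and fishertable are assumed names (see the note above).
yfit = trainedModel.predictFcn(fishertable);
% Resubstitution accuracy on the training table, for illustration only.
accuracy = mean(string(yfit) == string(fishertable.Species))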
Use the same workflow to evaluate and compare the other classifier types you can train in Classification Learner. To try all the nonoptimizable classifier model presets available for your data set: 1
On the Learn tab, in the Models section, click the arrow to open the gallery of models.
2
In the Get Started group, click All.
3
In the Train section, click Train All and select Train All.
For information about other classifier types, see “Train Classification Models in Classification Learner App” on page 23-10.
See Also Related Examples
•
“Train Classification Models in Classification Learner App” on page 23-10
•
“Select Data for Classification or Open Saved App Session” on page 23-17
•
“Choose Classifier Options” on page 23-22
•
“Naive Bayes Classification” on page 22-2
•
“Feature Selection and Feature Transformation Using Classification Learner App” on page 23-44
•
“Visualize and Assess Classifier Performance in Classification Learner” on page 23-70
•
“Export Classification Model to Predict New Data” on page 23-86
Train Neural Network Classifiers Using Classification Learner App
This example shows how to create and compare neural network classifiers in the Classification Learner app, and export trained models to the workspace to make predictions for new data. 1
In the MATLAB Command Window, load the fisheriris data set, and create a table from the variables in the data set to use for classification. fishertable = readtable("fisheriris.csv");
2
Click the Apps tab, and then click the Show more arrow on the right to open the apps gallery. In the Machine Learning and Deep Learning group, click Classification Learner.
3
On the Learn tab, in the File section, click New Session and select From Workspace.
4
In the New Session from Workspace dialog box, select the table fishertable from the Data Set Variable list (if necessary). Observe that the app has selected response and predictor variables based on their data types. Petal and sepal length and width are predictors, and species is the response that you want to classify. For this example, do not change the selections.
5
To accept the default validation scheme and continue, click Start Session. The default validation option is 5-fold cross-validation, to protect against overfitting. Classification Learner creates a scatter plot of the data.
6
Use the scatter plot to investigate which variables are useful for predicting the response. Select different options in the X and Y lists under Predictors to visualize the distribution of species and measurements. Note which variables separate the species colors most clearly.
7
Create a selection of neural network models. On the Learn tab, in the Models section, click the arrow to open the gallery. In the Neural Network Classifiers group, click All Neural Networks.
8
In the Train section, click Train All and select Train All. Note • If you have Parallel Computing Toolbox, then the Use Parallel button is selected by default. After you click Train All and select Train All or Train Selected, the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can continue to interact with the app while models train in parallel. • If you do not have Parallel Computing Toolbox, then the Use Background Training check box in the Train All menu is selected by default. After you select an option to train models, the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background.
Classification Learner trains one of each neural network classification option in the gallery, as well as the default fine tree model. In the Models pane, the app outlines the Accuracy
(Validation) score of the best model. Classification Learner also displays a validation confusion matrix for the first neural network model (Narrow Neural Network). 9
Select a model in the Models pane to view the results. For example, double-click the Narrow Neural Network model (model 2.1). Inspect the model Summary tab, which displays the Training Results metrics, calculated on the validation set.
10 Examine the scatter plot for the trained model. On the Learn tab, in the Plots and Results
section, click the arrow to open the gallery, and then click Scatter in the Validation Results group. Correctly classified points are marked with an O, and incorrectly classified points are marked with an X.
Note Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example. 11 Inspect the accuracy of the predictions in each class. On the Learn tab, in the Plots and
Results section, click the arrow to open the gallery, and then click Confusion Matrix (Validation) in the Validation Results group. View the matrix of true class and predicted class results. 12 For each remaining model, select the model in the Models pane, open the validation confusion
matrix, and then compare the results across the models. 13 Choose the best model in the Models pane (the best score is highlighted in the Accuracy
(Validation) box). See if you can improve the model by removing features with low predictive power. First, duplicate the best model. On the Learn tab, in the Models section, click Duplicate. 14 Investigate features to include or exclude using one of these methods.
• Use the parallel coordinates plot. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Parallel Coordinates in the Validation Results group. Keep predictors that separate classes well. In the model Summary tab, you can specify the predictors to use during training. Click Feature Selection to expand the section, and specify predictors to remove from the model. • Use a feature ranking algorithm. On the Learn tab, in the Options section, click Feature Selection. In the Default Feature Selection tab, specify the feature ranking algorithm you want to use, and the number of features to keep among the highest ranked features. The bar graph can help you decide how many features to use. Click Save and Apply to save your changes. The new feature selection is applied to the existing draft model in the Models pane and will be applied to new draft models that you create using the gallery in the Models section of the Learn tab. 15 Train the model. On the Learn tab, in the Train section, click Train All and select Train
Selected to train the model using the new options. Compare results among the classifiers in the Models pane. 16 Choose the best model in the Models pane. To try to improve the model further, change its
hyperparameters. First, duplicate the model by clicking Duplicate in the Models section. Then, try changing hyperparameter settings, like the sizes of the fully connected layers or the regularization strength, in the model Summary tab. Train the new model by clicking Train All and selecting Train Selected in the Train section. To learn more about neural network model settings, see “Neural Network Classifiers” on page 23-41. 17 You can export a full or compact version of the trained model to the workspace. On the
Classification Learner tab, click Export, click Export Model and select Export Model. To exclude the training data and export a compact model, clear the check box in the Export Classification Model dialog box. You can still use the compact model for making predictions on new data. In the dialog box, click OK to accept the default variable name. 18 To examine the code for training this classifier, click Generate Function in the Export section.
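The function generated in step 18 shows exactly how the app trains the selected model. As a separate, hypothetical illustration (not the app-generated code), the hyperparameters mentioned in step 16 map onto fitcnet name-value arguments roughly as follows; fishertable is assumed to be the table from step 1, and the layer sizes and regularization value are placeholders.
% Hypothetical command-line counterpart to tuning a neural network classifier.
% LayerSizes sets the fully connected layer sizes, and Lambda sets the
% regularization strength.
mdl = fitcnet(fishertable,"Species", ...
    "LayerSizes",[10 10], ...
    "Lambda",1e-4, ...
    "Standardize",true);
% Estimate validation accuracy with 5-fold cross-validation, mirroring the
% app's default validation scheme.
cvmdl = crossval(mdl,"KFold",5);
validationAccuracy = 1 - kfoldLoss(cvmdl)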
Tip Use the same workflow to evaluate and compare the other classifier types you can train in Classification Learner. To train all the nonoptimizable classifier model presets available for your data set:
1
On the Learn tab, in the Models section, click the arrow to open the gallery of models.
2
In the Get Started group, click All.
3
In the Train section, click Train All and select Train All.
To learn about other classifier types, see “Train Classification Models in Classification Learner App” on page 23-10.
See Also Related Examples •
“Train Classification Models in Classification Learner App” on page 23-10
•
“Select Data for Classification or Open Saved App Session” on page 23-17
•
“Choose Classifier Options” on page 23-22
•
“Feature Selection and Feature Transformation Using Classification Learner App” on page 23-44
•
“Visualize and Assess Classifier Performance in Classification Learner” on page 23-70
•
“Export Classification Model to Predict New Data” on page 23-86
•
“Train Decision Trees Using Classification Learner App” on page 23-93
Train and Compare Classifiers Using Misclassification Costs in Classification Learner App
This example shows how to create and compare classifiers that use specified misclassification costs in the Classification Learner app. Specify the misclassification costs before training, and use the accuracy and total misclassification cost results to compare the trained models. 1
In the MATLAB Command Window, read the sample file CreditRating_Historical.dat into a table. The predictor data consists of financial ratios and industry sector information for a list of corporate customers. The response variable consists of credit ratings assigned by a rating agency. Combine all the A ratings into one rating. Do the same for the B and C ratings, so that the response variable has three distinct ratings. Among the three ratings, A is considered the best and C the worst. creditrating = readtable("CreditRating_Historical.dat"); Rating = categorical(creditrating.Rating); Rating = mergecats(Rating,["AAA","AA","A"],"A"); Rating = mergecats(Rating,["BBB","BB","B"],"B"); Rating = mergecats(Rating,["CCC","CC","C"],"C"); creditrating.Rating = Rating;
2
Assume these are the costs associated with misclassifying the credit ratings of customers.

                        Customer Predicted Rating
Customer True Rating    A         B         C
A                       $0        $100      $200
B                       $500      $0        $100
C                       $1000     $500      $0
For example, the cost of misclassifying a C rating customer as an A rating customer is $1000. The costs indicate that classifying a customer with bad credit as a customer with good credit is more costly than classifying a customer with good credit as a customer with bad credit. Create a matrix variable that contains the misclassification costs. Create another variable that specifies the class names and their order in the matrix variable. ClassificationCosts = [0 100 200; 500 0 100; 1000 500 0]; ClassNames = categorical(["A","B","C"]);
Tip Alternatively, you can specify misclassification costs directly inside the Classification Learner app. See “Specify Misclassification Costs” on page 23-51 for more information.
3
Open Classification Learner. Click the Apps tab, and then click the arrow at the right of the Apps section to open the apps gallery. In the Machine Learning and Deep Learning group, click Classification Learner.
4
On the Learn tab, in the File section, select New Session > From Workspace.
5
In the New Session from Workspace dialog box, select the table creditrating from the Data Set Variable list. As shown in the dialog box, the app selects the response and predictor variables based on their data type. The default response variable is the Rating variable. The default validation option is cross-validation, to protect against overfitting. For this example, do not change the default settings.
6
To accept the default settings, click Start Session.
7
Specify the misclassification costs. On the Learn tab, in the Options section, click Costs. The app opens a dialog box showing the default misclassification costs. In the dialog box, click Import from Workspace.
In the import dialog box, select ClassificationCosts as the cost variable and ClassNames as the class order in the cost variable. Click Import.
The app updates the values in the misclassification costs dialog box. Click Save and Apply to save your changes. The new misclassification costs are applied to the existing draft model in the Models pane and will be applied to new draft models that you create using the gallery in the Models section of the Learn tab.
8
Train fine, medium, and coarse trees simultaneously. The Models pane already contains a fine tree model. Add medium and coarse tree models to the list of draft models. On the Learn tab, in the Models section, click the arrow to open the gallery. In the Decision Trees group, click Medium Tree. The app creates a draft medium tree and adds it to the Models pane. Reopen the model gallery and click Coarse Tree in the Decision Trees group. The app creates a draft coarse tree and adds it to the Models pane. In the Train section, click Train All and select Train All. The app trains the three tree models. Note • If you have Parallel Computing Toolbox, then the Use Parallel button is selected by default. After you click Train All and select Train All or Train Selected, the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can continue to interact with the app while models train in parallel. • If you do not have Parallel Computing Toolbox, then the Use Background Training check box in the Train All menu is selected by default. After you select an option to train models, the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background.
Note Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example. In the Models pane, each model has a validation accuracy score that indicates the percentage of correctly predicted responses. The app highlights the highest Accuracy (Validation) score by outlining it in a box. 9
Click a model to view the results, which are displayed in the Summary tab. To open this tab, click the Open selected model summary button in the upper right of the Models pane.
10 Inspect the accuracy of the predictions in each class. On the Learn tab, in the Plots and
Results section, click the arrow to open the gallery, and then click Confusion Matrix (Validation) in the Validation Results group. The app displays a matrix of true class and predicted class results for the selected model (in this case, for the medium tree).
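An approximation of this validation confusion matrix can also be produced at the command line. The following sketch is illustrative only (it is not the app's internal code); it cross-validates a tree trained with the cost matrix from step 2, and the MaxNumSplits value of 20 is assumed to roughly match the app's Medium Tree preset.
% Illustrative sketch: validation confusion chart for a cost-sensitive tree.
cvTree = fitctree(creditrating,"Rating", ...
    "Cost",ClassificationCosts,"ClassNames",ClassNames, ...
    "MaxNumSplits",20, ...     % assumed to approximate the Medium Tree preset
    "CrossVal","on","KFold",5);
predictedRating = kfoldPredict(cvTree);
confusionchart(creditrating.Rating,predictedRating);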
11 You can also plot results per predicted class to investigate false discovery rates. Under Plot,
select the Positive Predictive Values (PPV) False Discovery Rates (FDR) option. In the confusion matrix for the medium tree, the entries below the diagonal have small percentage values. These values indicate that the model tries to avoid assigning a credit rating that is higher than the true rating for a customer.
12 Compare the total misclassification costs of the tree models. To inspect the total misclassification
cost of a model, select the model in the Models pane, and then view the Training Results section of the Summary tab. For example, the medium tree has these results.
Alternatively, you can sort the models based on the total misclassification cost. In the Models pane, open the Sort by list and select Total Cost (Validation).
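You can compute the same kind of total cost by hand from validation predictions. The following sketch continues the cross-validated tree sketched after step 10 and simply sums the per-observation costs defined by the cost matrix; the app's reported value may differ in its details.
% Sketch: total misclassification cost of the cross-validated predictions.
[~,trueIdx] = ismember(creditrating.Rating,ClassNames);
[~,predIdx] = ismember(predictedRating,ClassNames);
totalCost = sum(ClassificationCosts(sub2ind(size(ClassificationCosts), ...
    trueIdx,predIdx)))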
In general, choose a model that has high accuracy and low total misclassification cost. In this example, the medium tree has the highest validation accuracy value and the lowest total misclassification cost of the three models. You can perform feature selection and transformation or tune your model just as you do in the workflow without misclassification costs. However, always check the total misclassification cost of your model when assessing its performance. For information on how to find misclassification costs in the exported model and exported code, see “Misclassification Costs in Exported Model and Generated Code” on page 23-55.
See Also Related Examples •
“Misclassification Costs in Classification Learner App” on page 23-51
•
“Train Classification Models in Classification Learner App” on page 23-10
Train Classifier Using Hyperparameter Optimization in Classification Learner App
This example shows how to tune hyperparameters of a classification support vector machine (SVM) model by using hyperparameter optimization in the Classification Learner app. Compare the test set performance of the trained optimizable SVM to that of the best-performing preset SVM model. 1
In the MATLAB Command Window, load the ionosphere data set, and create a table containing the data. load ionosphere tbl = array2table(X); tbl.Y = Y;
2
Open Classification Learner. Click the Apps tab, and then click the arrow at the right of the Apps section to open the apps gallery. In the Machine Learning and Deep Learning group, click Classification Learner.
3
On the Learn tab, in the File section, select New Session > From Workspace.
4
In the New Session from Workspace dialog box, select tbl from the Data Set Variable list. The app selects the response and predictor variables. The default response variable is Y. The default validation option is 5-fold cross-validation, to protect against overfitting. In the Test section, click the check box to set aside a test data set. Specify to use 15 percent of the imported data as a test set.
5
To accept the options and continue, click Start Session.
6
Train all preset SVM models. On the Learn tab, in the Models section, click the arrow to open the gallery. In the Support Vector Machines group, click All SVMs. In the Train section, click Train All and select Train All. The app trains one of each SVM model type, as well as the default fine tree model, and displays the models in the Models pane. Note • If you have Parallel Computing Toolbox, then the Use Parallel button is selected by default. After you click Train All and select Train All or Train Selected, the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can continue to interact with the app while models train in parallel. • If you do not have Parallel Computing Toolbox, then the Use Background Training check box in the Train All menu is selected by default. After you select an option to train models, the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background.
The app displays a validation confusion matrix for the first SVM model (model 2.1). Blue values indicate correct classifications, and red values indicate incorrect classifications. The Models pane on the left shows the validation accuracy for each model. Note Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example.
7
Select an optimizable SVM model to train. On the Learn tab, in the Models section, click the arrow to open the gallery. In the Support Vector Machines group, click Optimizable SVM.
8
Select the model hyperparameters to optimize. In the Summary tab, you can select Optimize check boxes for the hyperparameters that you want to optimize. By default, all the check boxes for the available hyperparameters are selected. For this example, clear the Optimize check boxes for Kernel function and Standardize data. By default, the app disables the Optimize check box for Kernel scale whenever the kernel function has a fixed value other than Gaussian. Select a Gaussian kernel function, and select the Optimize check box for Kernel scale.
9
Train the optimizable model. In the Train section of the Learn tab, click Train All and select Train Selected.
10 The app displays a Minimum Classification Error Plot as it runs the optimization process. At
each iteration, the app tries a different combination of hyperparameter values and updates the plot with the minimum validation classification error observed up to that iteration, indicated in dark blue. When the app completes the optimization process, it selects the set of optimized hyperparameters, indicated by a red square. For more information, see “Minimum Classification Error Plot” on page 23-65. The app lists the optimized hyperparameters in both the Optimization Results section to the right of the plot and the Optimizable SVM Model Hyperparameters section of the model Summary tab.
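Comparable Bayesian optimization is available at the command line. The following is a rough, hypothetical analogue of the optimizable SVM (it is not the code that the app generates); it tunes the box constraint and kernel scale of a Gaussian-kernel SVM on the tbl table from step 1.
% Rough command-line analogue of the app's optimizable Gaussian SVM.
rng("default")   % set the random seed; results can still vary
optMdl = fitcsvm(tbl,"Y", ...
    "KernelFunction","gaussian", ...
    "OptimizeHyperparameters",{'BoxConstraint','KernelScale'}, ...
    "HyperparameterOptimizationOptions",struct( ...
        'Optimizer','bayesopt', ...
        'MaxObjectiveEvaluations',30, ...
        'KFold',5, ...
        'ShowPlots',true));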
Note In general, the optimization results are not reproducible. 11 Compare the trained preset SVM models to the trained optimizable model. In the Models pane,
the app highlights the highest Accuracy (Validation) by outlining it in a box. In this example, the trained optimizable SVM model outperforms the six preset models. A trained optimizable model does not always have a higher accuracy than the trained preset models. If a trained optimizable model does not perform well, you can try to get better results by running the optimization for longer. On the Learn tab, in the Options section, click Optimizer. In the dialog box, increase the Iterations value. For example, you can double-click the default value of 30 and enter a value of 60. Then click Save and Apply. The options will be applied to future optimizable models created using the Models gallery. 12 Because hyperparameter tuning often leads to overfitted models, check the performance of the
optimizable SVM model on a test set and compare it to the performance of the best preset SVM model. Use the data you reserved for testing when you imported data into the app. First, in the Models pane, click the star icons next to the Medium Gaussian SVM model and the Optimizable SVM model. 13 For each model, select the model in the Models pane. In the Test section of the Test tab, click
Test Selected. The app computes the test set performance of the model trained on the rest of the data, namely the training and validation data. 14 Sort the models based on the test set accuracy. In the Models pane, open the Sort by list and
select Accuracy (Test).
In this example, the trained optimizable model still outperforms the trained preset model on the test set data. However, neither model has a test accuracy as high as its validation accuracy.
15 Visually compare the test set performance of the models. For each of the starred models, select
the model in the Models pane. On the Test tab, in the Plots and Results section, click Confusion Matrix (Test). 16 Rearrange the layout of the plots to better compare them. First, close the plot and summary tabs for all models except Model 2.5 and Model 3. Then, in the Plots and Results section, click the Layout button and select Compare models. Click the Hide plot options button at the top right of the plots to make more room for the plots.
To return to the original layout, you can click the Layout button and select Single model (Default).
See Also Related Examples
•
“Hyperparameter Optimization in Classification Learner App” on page 23-56
•
“Train Classification Models in Classification Learner App” on page 23-10
•
“Select Data for Classification or Open Saved App Session” on page 23-17
•
“Choose Classifier Options” on page 23-22
•
“Visualize and Assess Classifier Performance in Classification Learner” on page 23-70
•
“Export Classification Model to Predict New Data” on page 23-86
•
“Bayesian Optimization Workflow” on page 10-25
Check Classifier Performance Using Test Set in Classification Learner App
This example shows how to train multiple models in Classification Learner, and determine the best-performing models based on their validation accuracy. Check the test accuracy for the best-performing models trained on the full data set, including training and validation data. 1
In the MATLAB Command Window, load the ionosphere data set, and create a table containing the data. Separate the table into training and test sets. load ionosphere tbl = array2table(X); tbl.Y = Y; rng("default") % For reproducibility of the data split partition = cvpartition(Y,"Holdout",0.15); idxTrain = training(partition); % Indices for the training set tblTrain = tbl(idxTrain,:); tblTest = tbl(~idxTrain,:);
Alternatively, you can create a test set later on when you import data into the app. For more information, see “(Optional) Reserve Data for Testing” on page 23-21. 2
Open Classification Learner. Click the Apps tab, and then click the arrow at the right of the Apps section to open the apps gallery. In the Machine Learning and Deep Learning group, click Classification Learner.
3
On the Learn tab, in the File section, click New Session and select From Workspace.
4
In the New Session from Workspace dialog box, select the tblTrain table from the Data Set Variable list. As shown in the dialog box, the app selects the response and predictor variables. The default response variable is Y. To protect against overfitting, the default validation option is 5-fold cross-validation. For this example, do not change the default settings.
5
To accept the default options and continue, click Start Session.
6
Train all preset models. On the Learn tab, in the Models section, click the arrow to open the gallery. In the Get Started group, click All. In the Train section, click Train All and select Train All. The app trains one of each preset model type, along with the default fine tree model, and displays the models in the Models pane. Note • If you have Parallel Computing Toolbox, then the Use Parallel button is selected by default. After you click Train All and select Train All or Train Selected, the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can continue to interact with the app while models train in parallel. • If you do not have Parallel Computing Toolbox, then the Use Background Training check box in the Train All menu is selected by default. After you select an option to train models, the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background.
7
Sort the trained models based on the validation accuracy. In the Models pane, open the Sort by list and select Accuracy (Validation).
8
In the Models pane, click the star icons next to the three models with the highest validation accuracy. The app highlights the highest validation accuracy by outlining it in a box. In this example, the trained SVM Kernel model has the highest validation accuracy.
The app displays a validation confusion matrix for the second fine tree model (model 2.1). Blue values indicate correct classifications, and red values indicate incorrect classifications. The Models pane on the left shows the validation accuracy for each model. Note Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example. 9
Check the test set performance of the best-performing models. Begin by importing test data into the app. On the Test tab, in the Data section, click Test Data and select From Workspace.
10 In the Import Test Data dialog box, select the tblTest table from the Test Data Set Variable
list. As shown in the dialog box, the app identifies the response and predictor variables.
11 Click Import. 12 Compute the accuracy of the best preset models on the tblTest data. For convenience, compute
the test set accuracy for all models at once. On the Test tab, in the Test section, click Test All. The app computes the test set performance of the model trained on the full data set, including training and validation data. 13 Sort the models based on the test set accuracy. In the Models pane, open the Sort by list and
select Accuracy (Test). The app still outlines the metric for the model with the highest validation accuracy, despite displaying the test accuracy. 14 Visually check the test set performance of the models. For each starred model, select the model
in the Models pane. On the Test tab, in the Plots and Results section, click Confusion Matrix (Test). 15 Rearrange the layout of the plots to better compare them. First, close the summary and plot tabs
for Model 1 and Model 2.1. Then, click the Document Actions button located to the far right of the model plot tabs. Select the Tile All option and specify a 1-by-3 layout. Click the Hide plot options button
at the top right of the plots to make more room for the plots.
In this example, the trained Medium Gaussian SVM model is the best-performing model on the test set data.
To return to the original layout, you can click the Layout button in the Plots and Results section and select Single model (Default). 16 Compare the validation and test accuracy for the trained SVM Kernel model. In the Models
pane, double-click the model. In the model Summary tab, compare the Accuracy (Validation) value under Training Results to the Accuracy (Test) value under Test Results. In this example, the validation accuracy is higher than the test accuracy, which indicates that the validation accuracy is perhaps overestimating the performance of this model.
See Also Related Examples
•
“Visualize and Assess Classifier Performance in Classification Learner” on page 23-70
•
“Export Classification Model to Predict New Data” on page 23-86
•
“Train Classifier Using Hyperparameter Optimization in Classification Learner App” on page 23-150
Explain Model Predictions for Classifiers Trained in Classification Learner App
Understanding how some machine learning models make predictions can be difficult. Interpretability tools help reveal how predictors contribute (or do not contribute) to predictions. You can also use these tools to validate whether a model uses the correct evidence for its predictions, and find model biases that are not immediately apparent. Classification Learner provides functionality for two levels of model interpretation: local and global.

Local interpretation
Objective: Explain a prediction for a single query point.
Use Case: Identify important predictors for an individual prediction. Examine a counterintuitive prediction.
App Functionality: Use LIME or Shapley values for a specified query point. See “Explain Local Model Predictions Using LIME Values” on page 23-163 or “Explain Local Model Predictions Using Shapley Values” on page 23-167.

Global interpretation
Objective: Explain how a trained model makes predictions for the entire data set.
Use Case: Demonstrate how a trained model works. Compare different models.
App Functionality: Use partial dependence plots for the predictors of interest. See “Interpret Model Using Partial Dependence Plots” on page 23-171.
Explain Local Model Predictions Using LIME Values Use LIME (local interpretable model-agnostic explanations) to interpret a prediction for a query point by fitting a simple interpretable model for the query point. The simple model acts as an approximation for the trained model and explains model predictions around the query point. The simple model can be a linear model or a decision tree model. You can use the estimated coefficients of a linear model or the estimated predictor importance of a decision tree model to explain the contribution of individual predictors to the prediction for the query point. After you train a model in Classification Learner, select the model in the Models pane. On the Explain tab, in the Local Explanations section, click LIME. The app opens a new tab. In the left plot or table, select a query point. In the right plot or table, the app displays the LIME values corresponding to the query point. The app uses the lime function to compute the LIME values. When computing LIME values, the app uses the final model, trained on the full data set (including training and validation data, but excluding test data). Note Classification Learner does not support LIME explanations for models trained after applying feature selection or principal component analysis (PCA).
Select Query Point To select a query point, you can use various controls. • To the right of the LIME plots, under Data, choose whether to select a query point from the Training set data or Test set data. The training set refers to the data used to train the final model and includes all the observations that are not reserved for testing. • Above the left plot, under Select Query Point, choose whether to select a query point from a plot (Plot) or a table (Table). If using a plot, click a point in the plot to designate the associated observation as the query point. If using a table, click a row in the table to select the associated observation as the query point. Alternatively, select a query point using the index of the observation in the selected data set. To the right of the LIME plots, under Query Point, enter the observation index. • To make selecting a query point from a plot easier, you can change the plot display by using the controls below the left plot. You can select the x-axis and y-axis variables and choose the values to display (such as correctly classified observations and incorrectly classified observations).
• After selecting a query point, you can expand the LIME Explanations display by hiding the Select Query Point display. To the right of the LIME plots, under Data, clear the Show query points check box. Plot LIME Explanations Given a query point, view its LIME values by using the LIME Explanations display. Choose whether to view the results using a bar graph (Plot) or a table (Table). The table includes the predictor values at the query point. The meaning of the LIME values depends on the type of LIME model used. To the right of the LIME plots, in the Simple Model section under LIME Options, specify the type of simple model to use for approximating the behavior of the trained model. • If you use a Linear simple model, the LIME values correspond to the coefficient values of the simple model. The bar graph shows the coefficients, sorted by their absolute values. For each categorical predictor, the software creates one less dummy variable than the number of categories, and the bar graph displays only the most important dummy variable. You can check the coefficients of the other dummy variables using the SimpleModel property of the exported results object. For more information, see “Export LIME Results” on page 23-167. • If you use a Tree simple model, the LIME values correspond to the estimated predictor importance values of the simple model. The bar graph shows the predictor importance values, sorted by their absolute values. The bar graph shows LIME values only for the subset of predictors included in the simple model. Below the display of the LIME explanations, the app shows the query point predictions for the trained model (for example, Model 1 prediction) and the simple model (for example, LIME model prediction). If the two predictions are not the same, the simple model is not a good approximation of the trained model at the query point. You can change the simple model so that it better matches the trained model at the query point by adjusting LIME options. Adjust LIME Options To adjust LIME options, you can use various controls to the right of the LIME plots, under LIME Options. Under Simple Model, you can set these options: • Simple model — Specify the type of simple model to use for approximating the behavior of the trained model. Choose between a linear model, which uses fitclinear, and a decision tree, which uses fitctree. For more information, see SimpleModelType. In Classification Learner, linear simple models use a BetaTolerance value of 0.00000001. • Max num predictors — Specify the maximum number of predictors to use for training the simple model. For a linear simple model, this value indicates the number of predictors to include in the model, not counting expanded categorical predictors. For a tree simple model, this value indicates the maximum number of decision splits (or branch nodes) in the tree, which might cause the model to include fewer predictors than the specified maximum. For more information, see numImportantPredictors. • Kernel width — Specify the width of the kernel function used to fit the simple model. Smaller kernel widths create LIME models that focus on data samples near the query point. For more information, see KernelWidth. Under Synthetic Predictor Data, you can set these options: 23-165
• Num data samples — Specify the number of synthetic data samples to generate for training the simple model. For more information, see NumSyntheticData. • Data locality — Specify the locality of the data to use for synthetic data generation. A Global locality uses all observations in the training set, and a Local locality uses the k-nearest neighbors of the query point. (Recall that the training set contains the data used to train the final model and includes all the observations that are not reserved for testing.) For more information, see DataLocality. • Num neighbors — Specify the number of k-nearest neighbors for the query point. This option is valid only when the data locality is Local. For more information, see NumNeighbors. For more information on the LIME algorithm and how synthetic data is used, see “LIME” on page 354652. Perform What-If Analysis After computing the LIME results for a query point, you can perform what-if analysis and compare the LIME results for the original query point to the results for a custom query point. For example, you can see whether the important predictors change when the query point predictor values deviate slightly from their original values. To the right of the LIME plots, under Query Point, select What-if analysis. The app creates a table that shows the predictor values for the original query point and a custom query point. Manually specify the predictor values of the custom query point by editing the Custom Value table entries. To better see the table entries, you can increase the width of the plot options panel by using the plus button + at the top of the panel. After you specify a custom query point, the app updates the display of the LIME results. • The query point plot shows the original query point as a black circle and the custom query point as a green square. • The LIME explanations bar graph shows the LIME values for the original and custom query points, and differentiates the two sets of bars by using different colors and edge styles. • The LIME explanations table includes the LIME and predictor values for both query points. • Below the display of the LIME explanations, you can find the trained model and simple model predictions for both query points. Ensure that the two predictions for the custom query point are the same. Otherwise, the simple model is not a good approximation of the trained model at the custom query point.
Export LIME Results After computing LIME values, you can export your results by using any of the following options in the Export section on the Explain tab. • To export the LIME explanations bar graph to a figure, click Export Plot to Figure. • To export the LIME explanations table to the workspace, click Export Results and select Export Results Table. • To export the query point model explainer object to the workspace, click Export Results and select Export Results Object. If you specify a custom query point by using what-if analysis, the model explainer object corresponds to the custom query point. For more information on the explainer object, see lime.
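If you train a model at the command line (or retrain the app's final model there), you can compute LIME explanations programmatically with the same lime function. The sketch below is illustrative only: it fits a naive Bayes model directly to the Fisher iris table, picks an arbitrary query point (row 10), and requests a simple model with three important predictors.
% Minimal sketch of command-line LIME values (not app-generated code).
fishertable = readtable("fisheriris.csv");
mdl = fitcnb(fishertable,"Species");              % full model that stores its training data
queryPoint = fishertable(10,mdl.PredictorNames);  % predictors of one observation
explainer = lime(mdl);                            % create the explainer
explainer = fit(explainer,queryPoint,3);          % fit a simple model with 3 predictors
plot(explainer)                                   % bar graph of LIME values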
Explain Local Model Predictions Using Shapley Values Use the Shapley value of a predictor for a query point to explain the deviation of the query point prediction from the average prediction, due to the predictor. For classification models, predictions are
class scores. For a query point, the sum of the Shapley values for all predictors corresponds to the total deviation of the prediction from the average. After you train a model in Classification Learner, select the model in the Models pane. On the Explain tab, in the Local Explanations section, click Local Shapley. The app opens a new tab. In the left plot or table, select a query point. In the right plot or table, the app displays the Shapley values corresponding to the query point. The app uses the shapley function to compute the Shapley values. When computing Shapley values, the app uses the final model, trained on the full data set (including training and validation data, but excluding test data). Note Classification Learner does not support Shapley explanations for binary GLM logistic regression models or models trained after applying feature selection or PCA.
Select Query Point To select a query point, you can use various controls.
• To the right of the Shapley plots, under Data, choose whether to select a query point from the Training set data or Test set data. The training set refers to the data used to train the final model and includes all the observations that are not reserved for testing. • Above the left plot, under Select Query Point, choose whether to select a query point from a plot (Plot) or a table (Table). If using a plot, click a point in the plot to designate the associated observation as the query point. If using a table, click a row in the table to select the associated observation as the query point. Alternatively, select a query point using the index of the observation in the selected data set. To the right of the Shapley plots, under Query Point, enter the observation index. • To make selecting a query point from a plot easier, you can change the plot display by using the controls below the left plot. You can select the x-axis and y-axis variables and choose the values to display (such as correctly classified observations and incorrectly classified observations). • After selecting a query point, you can expand the Shapley Explanations display by hiding the Select Query Point display. To the right of the Shapley plots, under Data, clear the Show query points check box. Plot Shapley Explanations Given a query point, view the Shapley values for its predicted class by using the Shapley Explanations display. Each Shapley value explains the deviation of the score for the query point from the average score for the predicted class, due to the corresponding predictor. Choose whether to view the results using a bar graph (Plot) or a table (Table). • By default, the horizontal bar graph shows the predicted class Shapley values for all predictors, sorted by their absolute values. To see the Shapley values for other classes, check the corresponding Show boxes under Shapley Plot to the right of the Shapley plots. The bar color in the bar graph indicates the class. • The table includes the predictor values at the query point along with the Shapley values for every class. Below the display of the Shapley explanations, the app shows the query point predicted score and the average model predicted score for the predicted class. The sum of the Shapley values equals the difference between the two predictions. If the trained model includes many predictors, you can choose to display only the most important predictors in the bar graph. To the right of the Shapley plots, under Shapley Plot, specify the number of important predictors to show in the Shapley Explanations bar graph. The app displays the specified number of Shapley values with the largest absolute value. Adjust Shapley Options To adjust Shapley options, you can use various controls to the right of the Shapley plots. Under Shapley Options, you can set these options: • Num data samples — Specify the number of observations sampled from the training set to use for Shapley value computations. (Recall that the training set contains the data used to train the final model and includes all the observations that are not reserved for testing.) If the value equals the number of observations in the training set, the app uses every observation in the data set. When the training set has over 1000 observations, the Shapley value computations can be slow. For faster computations, consider using a smaller number of data samples. 23-169
• Method — Specify the algorithm to use when computing Shapley values. The Interventional option computes Shapley values with an interventional value function. The app uses the Kernel SHAP, Linear SHAP, or Tree SHAP algorithm, depending on the trained model type and other specified options. The Conditional option uses the extension to the Kernel SHAP algorithm with a conditional value function. For more information, see Method. • Max num subsets mode — Allow the app to choose the maximum number of predictor subsets automatically, or specify a value manually. You can check the number of predictor subsets used by querying the NumSubsets property of the exported results object. For more information, see “Export Shapley Results” on page 23-171. • Manual max num subsets — When you set Max num subsets mode to Manual, specify the maximum number of predictor subsets to use for Shapley value computations. This option is valid only when the app uses the Kernel SHAP algorithm or the extension to the Kernel SHAP algorithm. For more information, see MaxNumSubsets. For more information on the algorithms used to compute Shapley values, see “Shapley Values for Machine Learning Model” on page 27-18. Perform What-If Analysis After computing the Shapley results for a query point, you can perform what-if analysis and compare the Shapley results for the original query point to the results for a custom query point. For example, you can see whether the important predictors change when the query point predictor values deviate slightly from their original values. To the right of the Shapley plots, under Query Point, select What-if analysis. The app creates a table that shows the predictor values for the original query point and a custom query point. Manually specify the predictor values of the custom query point by editing the Custom Value table entries. To better see the table entries, you can increase the width of the plot options panel by using the plus button + at the top of the panel. After you specify a custom query point, the app updates the display of the Shapley results. • The query point plot shows the original query point as a black circle and the custom query point as a green square. • The Shapley explanations bar graph shows the Shapley values for the original and custom query points, and differentiates the two sets of bars by using different colors and edge styles. • The Shapley explanations table includes the Shapley and predictor values for both query points. • Below the display of the Shapley explanations, you can find the model predictions for both query points.
Export Shapley Results After computing Shapley values, you can export your results by using any of the following options in the Export section on the Explain tab. • To export the Shapley explanations bar graph to a figure, click Export Plot to Figure. • To export the Shapley explanations table to the workspace, click Export Results and select Export Results Table. • To export the query point model explainer object to the workspace, click Export Results and select Export Results Object. If you specify a custom query point by using what-if analysis, the model explainer object corresponds to the custom query point. For more information on the explainer object, see shapley.
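Shapley values have a similar programmatic workflow through the shapley function. Continuing the command-line sketch from the LIME section (same mdl and queryPoint, both assumed names):
% Minimal sketch of command-line Shapley values (not app-generated code).
explainer = shapley(mdl);                 % uses the data stored in the model
explainer = fit(explainer,queryPoint);    % Shapley values for one query point
plot(explainer)                           % bar graph for the predicted class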
Interpret Model Using Partial Dependence Plots Partial dependence plots (PDPs) allow you to visualize the marginal effect of each predictor on the predicted scores of a trained classification model. After you train a model in Classification Learner,
you can view a partial dependence plot for the model. On the Explain tab, in the Global Explanations section, click Partial Dependence. When computing partial dependence values, the app uses the final model, trained on the full data set (including training and validation data, but excluding test data). To investigate your results, use the controls on the right. • Under Data, choose whether to plot results using Training set data or Test set data. The training set refers to the data used to train the final model and includes all the observations that are not reserved for testing. • Under Feature, choose the feature to plot using the X list. The x-axis tick marks in the plot correspond to the unique predictor values in the selected data set. If you use PCA to train a model, you can select principal components from the X list. • Visualize the predicted scores by class. Each line in the plot corresponds to the average predicted scores across the predictor values for a specific class. Show or hide a plotted line by checking or clearing the corresponding Show box under Classes. Make a plotted line thicker by clicking the corresponding Class name under Classes. • Zoom in and out, or pan across the plot. To enable zooming or panning, place the mouse over the PDP and click the corresponding button on the toolbar that appears above the top right of the plot.
For an example, see “Use Partial Dependence Plots to Interpret Classifiers Trained in Classification Learner App” on page 23-175. For more information on partial dependence plots, see plotPartialDependence. To export PDPs you create in the app to figures, see “Export Plots in Classification Learner App” on page 23-81.
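At the command line, plotPartialDependence produces a comparable plot. A minimal sketch, again reusing the naive Bayes model from the earlier sketches (the predictor name PetalLength is assumed to match the imported table):
% Minimal sketch: partial dependence of class scores on one predictor.
plotPartialDependence(mdl,"PetalLength",mdl.ClassNames)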
See Also lime | shapley | plotPartialDependence | partialDependence
Related Examples •
“Use Partial Dependence Plots to Interpret Classifiers Trained in Classification Learner App” on page 23-175
•
“Interpret Machine Learning Models” on page 27-2
•
“Export Plots in Classification Learner App” on page 23-81
•
“Export Classification Model to Predict New Data” on page 23-86
Use Partial Dependence Plots to Interpret Classifiers Trained in Classification Learner App
For trained classification models, partial dependence plots (PDPs) show the relationship between a predictor and the predicted class scores. The partial dependence on the selected predictor is defined by the averaged prediction obtained by marginalizing out the effect of the other predictors. This example shows how to train classification models in the Classification Learner app and interpret the best-performing models using PDPs. You can use PDP results to confirm that models use features as expected, or to remove unhelpful features from model training. 1
In the MATLAB Command Window, load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s. load carbig
2
Categorize the cars based on whether they were made in the USA. Origin = categorical(cellstr(Origin)); Origin = mergecats(Origin,["France","Japan","Germany", ... "Sweden","Italy","England"],"NotUSA");
3
Create a table containing the predictor variables Acceleration, Displacement, and so on, as well as the response variable Origin. cars = table(Acceleration,Displacement,Horsepower, ... Model_Year,MPG,Weight,Origin);
4
Remove rows of cars where the table has missing values. cars = rmmissing(cars);
5
Open Classification Learner. Click the Apps tab, and then click the arrow at the right of the Apps section to open the apps gallery. In the Machine Learning and Deep Learning group, click Classification Learner.
6
On the Learn tab, in the File section, click New Session and select From Workspace.
7
In the New Session from Workspace dialog box, select the cars table from the Data Set Variable list. The app selects the response and predictor variables. The default response variable is Origin. The default validation option is 5-fold cross-validation, to protect against overfitting.
In the Test section, click the check box to set aside a test data set. Specify 15 percent of the imported data as a test set.
To accept the options and continue, click Start Session.
9
Train all preset models. On the Learn tab, in the Models section, click the arrow to open the gallery. In the Get Started group, click All. In the Train section, click Train All and select Train All. The app trains one of each preset model type, along with the default fine tree model, and displays the models in the Models pane. Note • If you have Parallel Computing Toolbox, then the Use Parallel button is selected by default. After you click Train All and select Train All or Train Selected, the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can continue to interact with the app while models train in parallel. 23-175
• If you do not have Parallel Computing Toolbox, then the Use Background Training check box in the Train All menu is selected by default. After you select an option to train models, the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background. 10 Sort the trained models based on the validation accuracy. In the Models pane, open the Sort by
list and select Accuracy (Validation). 11 In the Models pane, click the star icon next to the model with the highest validation accuracy.
The app highlights the highest validation accuracy by outlining it in a box. In this example, the trained Bagged Trees model has the highest validation accuracy.
Note Validation introduces some randomness into the results. Your model validation results might vary from the results shown in this example. 12 For the starred model, you can check the model performance by using various plots (for example,
scatter plots, confusion matrices, and ROC curves). In the Models pane, select the model. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery. Then, click any of the buttons in the Validation Results group to open the corresponding plot. After opening multiple plots, you can change the layout of the plots by using the Document Actions button located to the far right of the model plot tabs. For example, click the button, select the Sub-Tile option, and specify a layout. For more information on how to use and display validation plots, see “Visualize and Assess Classifier Performance in Classification Learner” on page 23-70.
To return to the original layout, you can click the Layout button in the Plots and Results section and select Single model (Default). 13 For the starred model, see how the model features relate to the model predictions by using
partial dependence plots (PDPs). On the Explain tab, in the Global Explanations section, click Partial Dependence. The PDP allows you to visualize the marginal effect of each predictor on the predicted scores of the trained model. To compute the partial dependence values, the app uses the model trained on the 85% of observations in cars not reserved for testing. 14 Examine the relationship between the model predictors and model scores on the training data
(that is, 85% of the observations in cars). Under Data, select Training set. Look for features that seem to contribute to model predictions. For example, under Feature, select Displacement.
The blue plotted line represents the averaged partial relationship between the Displacement feature and the NotUSA predicted scores. The red plotted line represents the averaged partial relationship between the Displacement feature and the USA predicted scores. The tick marks along the x-axis indicate the unique Displacement values in the training data set. According to this model (Model 2.24), the probability of a car originating in the USA tends to increase as its engine displacement increases. In particular, the probability of a car originating outside of the USA drops to almost 0 when the engine displacement is greater than 200. Notice, however, that few cars have a displacement value greater than 200. Note In general, consider the distribution of values when interpreting partial dependence plots. Results tend to be more reliable in intervals where you have sufficient observations whose predictor values are spread evenly. 15 You can tune your best-performing model by removing predictors that do not seem to contribute
to model predictions. For example, in the partial dependence plot for the starred model, select Model_Year under Feature. 23-178
The predicted scores do not seem to vary greatly as the model year increases. This result does not necessarily imply that the predictor is an unimportant feature. Because the Model_Year variable is discrete, the x-axis tick marks cannot fully reflect the distribution of the predictor values; that is, the values might be sparsely or unevenly distributed across the range of model years. Although you cannot determine that Model_Year is an unimportant feature, you might expect the model year to have limited influence on the car origin. Therefore, you can try removing the Model_Year predictor. In general, you do not need to remove predictors that contribute to predictions as expected. 16 For this example, remove the Model_Year predictor from the best-performing model. For the
starred model, create a copy of the model. Right-click the model in the Models pane, and select Duplicate. Then, in the model Summary tab, expand the Feature Selection section, and clear the Select check box for the Model_Year feature.
17 Train the new model. In the Train section of the Learn tab, click Train All and select Train
Selected. 18 In the Models pane, click the star icon next to the new model. To group the starred models
together, open the Sort by list and select Favorites. 19 For each starred model, compute the accuracy of the model on the test set. First, select the
model in the Models pane. Then, on the Test tab, in the Test section, click Test Selected. 20 Compare the validation and test accuracy results for the starred models by using a table. On the
Test tab, in the Plots and Results section, click Results Table. In the Results Table tab, click the "Select columns to display" button at the top right of the table.
In the Select Columns to Display dialog box, check the Select box for the Preset column, and clear the Select check boxes for the Total Cost (Validation) and Total Cost (Test) columns. Click OK.
In this example, the original Bagged Trees model (Model 2.24) outperforms the other starred model in terms of validation and test accuracy. 21 For the best-performing model, look at the PDPs on the test data set. Ensure that the partial
relationships meet expectations. For this example, compare the training set and test set PDPs for the Acceleration feature and the Model 2.24 predicted scores. In the Partial Dependence Plot tab, under Feature, select Acceleration. Under Data, select Training set and then select Test set to see each plot.
The PDPs are similar for the training and test data sets. For lower acceleration values, the predicted scores remain fairly consistent. The scores begin to change noticeably at an acceleration value of approximately 19.5. The test data set does not appear to include many observations with acceleration values above 20; therefore, comparing predictions for that range of values is not possible. If you are satisfied with the best-performing model, you can export the trained model to the workspace. For more information, see “Export the Model to the Workspace to Make Predictions for New Data” on page 23-86. You can also export any of the partial dependence plots you create in Classification Learner. For more information, see “Export Plots in Classification Learner App” on page 23-81.
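For reference, a rough command-line analogue of this workflow is sketched below. It assumes the cars table built in steps 1 through 4 above, trains on the full table rather than the 85% training split the app uses, and the ensemble settings are illustrative rather than the exact model the app trained.
% Sketch: train a bagged tree ensemble on cars and plot partial dependence of
% both class scores on the Displacement predictor.
mdlBag = fitcensemble(cars,"Origin","Method","Bag");
plotPartialDependence(mdlBag,"Displacement",mdlBag.ClassNames)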
See Also plotPartialDependence | partialDependence
Related Examples
•
“Explain Model Predictions for Classifiers Trained in Classification Learner App” on page 23-163
•
“Interpret Machine Learning Models” on page 27-2
•
“Export Plots in Classification Learner App” on page 23-81
•
“Export Classification Model to Predict New Data” on page 23-86
Deploy Model Trained in Classification Learner to MATLAB Production Server This example shows how to train a model in Classification Learner and export it for deployment to MATLAB Production Server. This workflow requires MATLAB Compiler SDK.
Choose Trained Model to Deploy 1
In the Command Window, load the patients data set, and create a table from a subset of the variables in the data set. Each row in patientTbl corresponds to a patient, and each column corresponds to a diagnostic variable. load patients patientTbl = table(Age,Diastolic,Gender,Height, ... SelfAssessedHealthStatus,Systolic,Weight,Smoker);
2
Convert the SelfAssessedHealthStatus variable to an ordinal categorical predictor. patientTbl.SelfAssessedHealthStatus = categorical(patientTbl.SelfAssessedHealthStatus, ... ["Poor","Fair","Good","Excellent"],"Ordinal",true);
3
From the Command Window, open the Classification Learner app. Populate the New Session from Arguments dialog box with the predictor data in patientTbl and the response variable Smoker. classificationLearner(patientTbl,"Smoker")
The default validation option is 5-fold cross-validation, to protect against overfitting. For this example, do not change the default validation setting. 4
To accept the selections in the New Session from Arguments dialog box and continue, click Start Session.
5
Train all preset models. On the Learn tab, in the Models section, click the arrow to open the gallery. In the Get Started group, click All. In the Train section, click Train All and select Train All. The app trains all preset models, along with the default fine tree model, and displays the models in the Models pane. Note • If you have Parallel Computing Toolbox, then the Use Parallel button is selected by default. After you click Train All and select Train All or Train Selected, the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can continue to interact with the app while models train in parallel. • If you do not have Parallel Computing Toolbox, then the Use Background Training check box in the Train All menu is selected by default. After you select an option to train models, the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background.
The app displays a confusion matrix for the second fine tree model (model 2.1). Blue values indicate correct classifications, and red values indicate incorrect classifications. The Models pane on the left shows the validation accuracy for each model.
6
Sort the models based on the validation accuracy. In the Models pane, open the Sort by list and select Accuracy (Validation). The app outlines the metric for the model (or models) with the highest validation accuracy.
7
Select the model in the Models pane with the highest validation accuracy.
Export Model for Deployment 1
Export the selected model for deployment to MATLAB Production Server. On the Learn tab, click Export, click Export Model and select Export Model for Deployment.
2
In the Select Project File for Model Deployment dialog box, select a location and name for your project file. For this example, use the default project name ClassificationLearnerDeployedModel.prj. Click Save. The software opens the Production Server Compiler app and the autogenerated predictFunction.m file. In the Compiler tab of the Production Server Compiler app, the Exported Functions section includes the files modelInformation.m and predictFunction.m. The section Additional files required for your archive to run includes the files processInputData.m and TrainedClassificationModel.mat.
3
Update the code in the files processInputData.m and predictFunction.m to include preprocessing steps performed before you imported data in Classification Learner. Open the processInputData.m file from the ClassificationLearnerDeployedModel_resources folder, and change the code to include the conversion of the SelfAssessedHealthStatus variable to an ordinal categorical predictor. function processedData = processInputData(T) T.SelfAssessedHealthStatus = categorical(T.SelfAssessedHealthStatus, ... ["Poor","Fair","Good","Excellent"],"Ordinal",true); processedData = T; end
4
In the predictFunction.m file, uncomment the following lines of code so that the predictFunction function calls the processInputData function. processedData = processInputData(T); T = processedData;
5
Edit the predictFunction.m code so that the function returns two outputs, labels and scores, instead of the single output result. Update the function signature in the first line of code. function [labels,scores] = predictFunction(varargin)
Then, update the result = model.predictFcn(T); line of code to include the two output arguments. [labels,scores] = model.predictFcn(T);
Also update the commented-out description of the predictFunction function to include descriptions of the new output arguments. labels contains the predicted labels returned by the trained model, and scores contains the predicted scores returned by the trained model. 6
Close the files predictFunction.m and processInputData.m.
(Optional) Simulate Model Deployment Before packaging your code for deployment to MATLAB Production Server, you can simulate the model deployment using a MATLAB client. Completing this process requires opening another instance of MATLAB. For an example that shows how to use a sample Java® client for sending data to a MATLAB function deployed on the server, see “Evaluate Deployed Machine Learning Models Using Java Client” (MATLAB Production Server). 1
In the Production Server Compiler app, click the Test Client button in the Test section on the Compiler tab.
2
On the Test tab, in the Server Actions section, click the Start button. Note the address listed in the Server Address pane, which in this example is http://localhost:9910/DeployedClassificationModel.
3
Open a new instance of MATLAB. In the new MATLAB instance, the Production Server Compiler app automatically opens. Close this instance of the app.
4
In the Command Window of the new MATLAB instance, load the predictor and response data. Ensure that the data has the same format as the training data used in Classification Learner. load patients patientTbl = table(Age,Diastolic,Gender,Height, ... SelfAssessedHealthStatus,Systolic,Weight,Smoker); patientTbl.SelfAssessedHealthStatus = categorical(patientTbl.SelfAssessedHealthStatus, ... ["Poor","Fair","Good","Excellent"],"Ordinal",true);
5
Prepare the data to send it to MATLAB Production Server. You must convert categorical variables and tables to cell arrays and structures, respectively, before sending them to MATLAB Production Server. Because SelfAssessedHealthStatus is a categorical variable and patientTbl is a table, process the input data further before sending it. inputTbl = patientTbl; columnNames = patientTbl.Properties.VariableNames; for i=1:length(columnNames) if iscategorical(patientTbl.(columnNames{i})) inputTbl.(columnNames{i}) = cellstr(patientTbl.(columnNames{i})); end end inputData = table2struct(inputTbl);
6
Send the input data to MATLAB Production Server. Use the server address displayed in the Production Server Compiler app. jsonData = mps.json.encoderequest({inputData},"Nargout",2); URL = "http://localhost:9910/DeployedClassificationModel/predictFunction"; options = weboptions("MediaType","application/json","Timeout",30); response = webwrite(URL,jsonData,options);
In the original MATLAB instance, in the opened Production Server Compiler app, the MATLAB Execution Requests pane under the Test tab shows a successful request between the server and the MATLAB client. 7
In the Command Window of the new MATLAB instance, extract the predicted labels and scores from the response variable. Check that the predicted values are correct. 23-187
labels = response.lhs{1}; scores = response.lhs{2}; 8
In the original MATLAB instance, in the Production Server Compiler app, click Stop in the Server Actions section on the Test tab. In the Close section, click Close Test.
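As an optional sanity check on the predictions extracted in step 7, you can compare them against the known Smoker values. This is a sketch, not part of the shipped workflow; it assumes the decoded labels come back as a numeric or logical vector in the same row order as patientTbl.
% Fraction of deployed predictions that match the known labels.
% Assumes labels is a numeric or logical vector aligned with the rows of patientTbl.
deployedAccuracy = mean(double(labels) == double(patientTbl.Smoker))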
Package Code 1
Use the Production Server Compiler app to package your model and prediction function. On the Compiler tab, in the Package section, click the Package button.
2
In the Package dialog box, verify that the option Open output folder when process completes is selected. After the deployment process finishes, examine the generated output. • for_redistribution — Folder containing the DeployedClassificationModel.ctf file • for_testing — Folder containing the raw generated files required to create the installer • PackagingLog.html — Log file generated by MATLAB Compiler SDK
See Also Related Examples
•
“Visualize and Assess Classifier Performance in Classification Learner” on page 23-70
•
“Export Classification Model to Predict New Data” on page 23-86
•
“Create Deployable Archive for MATLAB Production Server” (MATLAB Production Server)
•
“Evaluate Deployed Machine Learning Models Using Java Client” (MATLAB Production Server)
•
“Execute Deployed MATLAB Functions” (MATLAB Production Server)
Build Condition Model for Industrial Machinery and Manufacturing Processes This example builds a condition model for sensor data collected from an industrial manufacturing machine. Use the Classification Learner App to build a binary classification model that determines the condition of the machine as either "after maintenance" or "before maintenance." Train the model using the data collected both immediately before and after a scheduled maintenance. Assume that the data collected after scheduled maintenance represents normal observations, and the data collected before maintenance represents anomalies. You can use the trained model to monitor incoming observations and determine whether a new maintenance cycle is necessary by detecting anomalies in the observations. The classification workflow in this example includes these steps: 1
Load data into the MATLAB workspace.
2
Import data into the Classification Learner app and reserve a percentage of the data for testing.
3
Train binary classification models that can detect anomalies in sensor data. Use all features in the data set.
4
Assess model performance using the model accuracy on the validation data.
5
Interrupt the app session to explore aspects of model deployment, including whether the model can fit on the target hardware within the resources designated for the classification task.
6
Resume the app session to build new models with reduced size. To reduce model size, train the models after selecting features using feature ranking.
7
Select a final model and observe its accuracy on the test set.
8
Export the final model for deployment on the target hardware.
Load Data This example uses a data set that contains 12 features extracted from three-axis vibration measurements of an industrial machine. Execute the following commands to download and extract the data set file. url = "https://ssd.mathworks.com/supportfiles/predmaint/" + ... "anomalyDetection3axisVibration/v1/vibrationData.zip"; outfilename = websave("vibrationData.zip",url); unzip(outfilename)
Load the featureAll table in the FeatureEntire.mat file. load("FeatureEntire.mat")
The table contains 17,642 observations for 13 variables (one categorical response variable and 12 predictor variables). Shorten the predictor variable names by removing the redundant phrase ("_stats/Col1_"). for i = 2:13 featureAll.Properties.VariableNames(i) = ... erase(featureAll.Properties.VariableNames(i),"_stats/Col1_"); end
Preview the first eight rows of the table. 23-189
head(featureAll)

ans=8×13 table
     label     ch1CrestFactor    ch1Kurtosis    ch1RMS    ch1Std     ch2Mean      ch2RMS     ch2Sk
    ______     ______________    ___________    ______    ______    __________    _______    _____
    Before         2.3683           1.927       2.2225    2.2225    -0.015149     0.62512     4.2
    Before         2.402            1.9206      2.1807    2.1803    -0.018269     0.56773     3.9
    Before         2.4157           1.9523      2.1789    2.1788    -0.0063652    0.45646     2.8
    Before         2.4595           1.8205      2.14      2.1401     0.0017307    0.41418     2.0
    Before         2.2502           1.8609      2.3391    2.339     -0.0081829    0.3694      3.3
    Before         2.4211           2.2479      2.1286    2.1285     0.011139     0.36638     1.8
    Before         3.3111           4.0304      1.5896    1.5896    -0.0080759    0.47218     2.1
    Before         2.2655           2.0656      2.3233    2.3233    -0.0049447    0.37829     2.4

(The display is truncated at the page margin; the table contains 13 variables in total.)
The values in the first column are the labels of observations, Before or After, which indicate whether each observation is collected immediately before or after a scheduled maintenance, respectively. The remaining columns contain 12 features extracted from the vibration measurements using the Diagnostic Feature Designer app in Predictive Maintenance Toolbox™. For more information about the extracted features, see “Anomaly Detection in Industrial Machinery Using Three-Axis Vibration Data” (Predictive Maintenance Toolbox).
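Before importing the data, you can also check how the observations split between the two conditions. This short snippet assumes featureAll is loaded as above.
% Count the Before and After observations in the label variable.
summary(featureAll.label)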
Import Data into App and Partition Data Import the featureAll table into the Classification Learner app, and set aside 10% of the data as a test set.
1
On the Apps tab, click the Show more arrow to display the apps gallery. In the Machine Learning and Deep Learning group, click Classification Learner.
2
On the Learn tab, in the File section, click New Session > From Workspace.
3
In the New Session from Workspace dialog box, select the table featureAll from the Data Set Variable list. The app selects the response (label) and predictor variables (12 features) based on their data types.
4
In the Test section, click the check box to set aside a test data set. Specify to use 10% of the imported data as a test set. The featureAll table contains 17,642 samples, so setting aside 10% yields 1764 samples in the test set and 15,878 samples in the training set.
5
To accept the default validation scheme and continue, click Start Session. The default validation option is 5-fold cross-validation, to protect against overfitting.
Alternatively, you can open the Classification Learner app from the MATLAB Command Window by entering classificationLearner. You can specify the predictor data, response variable, and percentage of the data for testing. classificationLearner(featureAll,"label",TestDataFraction=0.1)
Train Models Using All Features First, train models using all 12 features in the data set. The Models pane already contains a draft for a fine tree model. You can add a variety of draft models to the Models pane by selecting them from the Models gallery, and then train all models. 1
On the Learn tab, in the Models section, click the Show more arrow to open the gallery.
2
Select three models: 23-191
• Bagged trees — In the Ensemble Classifiers group, click Bagged Trees. • Fine Gaussian support vector machine (SVM) — In the Support Vector Machines group, click Fine Gaussian SVM. • Bilayered neural network — In the Neural Network Classifiers group, click Bilayered Neural Network. The app includes the draft models in the Models pane.
For more information on each classifier option, see “Choose Classifier Options” on page 23-22. 3
In the Train section of the Learn tab, click Train All and select Train All. The app trains the four models using all 12 features. Note • If you have Parallel Computing Toolbox, then the Use Parallel button is selected by default. After you click Train All and select Train All or Train Selected, the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can continue to interact with the app while models train in parallel. • If you do not have Parallel Computing Toolbox, then the Use Background Training check box in the Train All menu is selected by default. After you select an option to train models, the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background.
Assess Model Performance You can compare trained models based on multiple characteristics. For example, you can assess the model accuracy, model size (which affects memory or disk storage needs), computational costs associated with training and testing the model, and model interpretability. Compare the four trained models based on the model accuracy measured on validation data. In the Models pane, each model has a validation accuracy score that indicates the percentage of correctly predicted responses. 23-192
Note Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example. 1
Sort the trained models based on the validation accuracy. In the Models pane, click the Sort by arrow and select Accuracy (Validation).
Although three of the models share the same percentage for validation accuracy, when viewed with three significant digits, the ensemble model achieves the highest accuracy by a small margin. The app highlights the ensemble model by outlining its accuracy score, and the model appears first when the models are sorted by validation accuracy. 2
To better understand the results, rearrange the layout of the plots so you can compare the confusion matrices for the four models. Click the Document Actions button located to the far right of the model plot tabs. Select the Tile All option and specify a 2-by-2 layout.
In the top right of each plot, click the Hide plot options button to make more room for the plot.
Compared to the other models, the ensemble model (Model 2) has fewer off-diagonal cells corresponding to incorrectly classified observations. For more details on assessing model performance, see “Visualize and Assess Classifier Performance in Classification Learner” on page 23-70.
Export Model to the Workspace and Save App Session Export the best model to the workspace and check the model size.
1
In the Models pane, click the ensemble model to select it.
2
On the Learn tab, click Export, click Export Model and select Export Model. Exclude the training data from the exported model by clearing the check box in the Export Classification Model dialog box. You can still use the compact model for making predictions on new data.
Note The final model exported by Classification Learner is always trained using the full data set, excluding any data reserved for testing. The validation scheme that you use only affects the way the app computes validation metrics. 3
In the Export Classification Model dialog box, edit the name of the exported variable, if you want, and then click OK. The default name for the exported model, trainedModel, increments every time you export (for example, trainedModel1), to avoid overwriting existing exported models. The new variable trainedModel appears in the workspace.
4
Save and close the current app session. Click Save in the File section of the Learn tab. Specify the session file name and location, and then close the app.
Check Model Size
Check the exported model size by using the whos function in the Command Window.
mdl = trainedModel.ClassificationEnsemble;
whos mdl
  Name      Size            Bytes  Class                                                       Attributes

  mdl       1x1            315622  classreg.learning.classif.CompactClassificationEnsemble
Assume that you want to deploy the model on a programmable logic controller (PLC) with limited memory, and the ensemble model with all 12 features does not fit within the resources designated on the PLC.
Resume App Session In Classification Learner, open the previously saved app session. Click Open in the File section. In the Select File to Open dialog box, select the saved session.
Select Features Using Feature Ranking One approach to reducing model size is to reduce the number of features in a model using feature ranking and selection. Build new models with a reduced set of features and assess the model accuracy. 1
Create a copy of each trained model. After selecting a model in the Models pane, either click the Duplicate selected model button in the upper right of the Models pane, or right-click the model and select Duplicate.
2
To use feature ranking algorithms in Classification Learner, click Feature Selection in the Options section of the Learn tab. The app opens a Default Feature Selection tab.
3
In the Default Feature Selection tab, click MRMR under Feature Ranking Algorithm. The app displays a bar graph of the sorted feature importance scores, where larger scores (including Infs) indicate greater feature importance. The table on the right shows the ranked features and their scores.
4
Under Feature Selection, use the default option of selecting the highest ranked features to avoid bias in the validation metrics. Specify to keep 3 features for model training.
The feature ranking results are based on the full data set, including the training and validation data but not the test data. The app uses the highest ranked features to train the full model (that is, the model trained on the full data set). For each training fold, the app performs feature selection before training the model. Different folds can choose different predictors as the highest ranked features.
5
Click Save and Apply. The app applies the feature selection changes to the new draft models in the Models pane. Note that the draft models use 3/12 features (3 features out of 12).
6
In the Train section, click Train All and select Train All. The app trains all new draft models using three features.
The models trained using only three features perform comparably to the models trained on all features. This result indicates that a model based on only the top three features can achieve similar accuracy as a model based on all features. Among all the trained models, the best performing model is the neural network model with three features. For more details on feature selection, see “Feature Selection and Feature Transformation Using Classification Learner App” on page 23-44.
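A command-line sketch of the same idea follows. It assumes featureAll is loaded as above, ranks the predictors on the entire table (so the ranking can differ slightly from the app, which excludes the test set), and the neural network settings are illustrative rather than the exact model the app trained.
% Rank all 12 predictors with MRMR, keep the top 3, and cross-validate a small
% neural network trained on those features only.
predictorNames = featureAll.Properties.VariableNames(2:end);
idx = fscmrmr(featureAll,"label");
topThree = predictorNames(idx(1:3))
mdlNN = fitcnet(featureAll(:,[{'label'} topThree]),"label","Standardize",true);
cvAccuracy = 1 - kfoldLoss(crossval(mdlNN,"KFold",5))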
Investigate Important Features in Scatter Plot Examine the scatter plot for the best performing model using the top two features, ch3THD and ch3SINAD. The plot should show strong class separation, given the high observed model accuracies. 1
In the Models pane, select the best performing model (Model 7, neural network model).
2
On the Learn tab, in the Plots and Results section, click the Show more arrow to open the gallery, and then click Scatter in the Validation Results group.
3
Choose the two most important predictors, ch3THD and ch3SINAD, using the X and Y lists under Predictors. The app creates a scatter plot of the two selected predictors, grouped by the model predictions. Because you are using cross-validation, these predictions are on the validation observations. In other words, the software obtains each prediction by using a model that was trained without the corresponding observation.
The plot shows the strong separation between the Before and After categories for the two features. For more information, see “Investigate Features in the Scatter Plot” on page 23-44 and “Plot Classifier Results” on page 23-73.
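A quick command-line analogue of this scatter plot is sketched below; it assumes featureAll is loaded and groups the two features by the true condition label rather than by the model predictions.
% Scatter plot of the two top-ranked features, colored by the true label.
gscatter(featureAll.ch3THD,featureAll.ch3SINAD,featureAll.label)
xlabel("ch3THD")
ylabel("ch3SINAD")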
Further Experimentation To choose a final model, you can explore further on these aspects, if necessary: 23-198
• Model accuracy — To achieve better accuracy, you can explore additional model types (for example, try All in the Models gallery), further feature selection, or hyperparameter tuning. For example, before training, select a draft model in the Models pane, and then click the model Summary tab. You can specify the classifier hyperparameter options in the Model Hyperparameters section. The tab also includes Feature Selection and PCA sections with options you can set.
If you want to automatically tune hyperparameters of a specific model type, you can select the corresponding Optimizable model in the Models gallery and perform hyperparameter optimization. For more information, see “Hyperparameter Optimization in Classification Learner App” on page 23-56. • Computational complexity — You can find the training time and prediction speed in the Summary tab for a trained model. For example, see the Summary tab of the best performing model (Model 7, neural network model).
Assume that you decide to use the neural network model trained with the top three features based on the validation accuracy and computational complexity.
Assess Model Accuracy on Test Set You can use the test set accuracy as an estimate of the model accuracy on unseen data. Assess the neural network model using the test set. 1
In the Models pane, select the neural network model.
2
On the Test tab, in the Test section, click Test Selected. The app computes the test set performance of the model trained on the full data set. As expected, the model achieves similar accuracy on the test data (99.8%) compared to the validation accuracy.
3
Display the confusion matrix of the test set. In the Plots and Results section on the Test tab, click Confusion Matrix (Test).
For more details, see “Evaluate Test Set Model Performance” on page 23-79.
Export Final Model Export the final model to the workspace and check the model size. 1
In Classification Learner, select the neural network model in the Models pane.
2
On the Learn tab, click Export, click Export Model and select Export Model.
3
In the Export Classification Model dialog box, edit the name of the exported variable, if necessary, and then click OK. The default name is trainedModel1. 23-203
4
Check the model size by using the whos function.
mdl_final = trainedModel1.ClassificationNeuralNetwork;
whos mdl_final
  Name           Size            Bytes  Class                                                             Attributes

  mdl_final      1x1              7842  classreg.learning.classif.CompactClassificationNeuralNetwork
The size of the final model (mdl_final from trainedModel1) is smaller than the size of the ensemble model (mdl from trainedModel). For information about potential next steps of generating code for prediction or deploying predictions, see “Export Classification Model to Predict New Data” on page 23-86.
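One possible next step, sketched here, is the code generation route: save the compact model to disk so that a code generation compatible entry-point function can reload it. The file name below is illustrative.
% Sketch: save the compact model for later use with loadLearnerForCoder.
saveLearnerForCoder(mdl_final,"finalConditionModel");
% Inside an entry-point function you would then call, for example:
%   mdl = loadLearnerForCoder("finalConditionModel");
%   labels = predict(mdl,X);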
See Also Related Examples
•
“Train Classification Models in Classification Learner App” on page 23-10
•
“Select Data for Classification or Open Saved App Session” on page 23-17
•
“Choose Classifier Options” on page 23-22
•
“Feature Selection and Feature Transformation Using Classification Learner App” on page 23-44
•
“Visualize and Assess Classifier Performance in Classification Learner” on page 23-70
•
“Export Classification Model to Predict New Data” on page 23-86
•
“Train Decision Trees Using Classification Learner App” on page 23-93
Export Model from Classification Learner to Experiment Manager After training a classification model in Classification Learner, you can export the model to Experiment Manager to perform multiple experiments. By default, Experiment Manager uses Bayesian optimization to tune the model in a process similar to training optimizable models in Classification Learner. (For more information, see “Hyperparameter Optimization in Classification Learner App” on page 23-56.) Consider exporting a model to Experiment Manager when you want to do any of the following: • Adjust hyperparameter search ranges during hyperparameter tuning. • Change the training data. • Adjust the preprocessing steps that precede model fitting. • Tune hyperparameters using a different metric. For a workflow example, see “Tune Classification Model Using Experiment Manager” on page 23-212. Note that if you have a Statistics and Machine Learning Toolbox license, you do not need a Deep Learning Toolbox license to use the Experiment Manager app.
Export Classification Model To create an Experiment Manager experiment from a model trained in Classification Learner, select the model in the Models pane. On the Learn tab, in the Export section, click Export Model and select Create Experiment. Note This option is not supported for binary GLM logistic regression models. In the Create Experiment dialog box, modify the filenames or accept the default values.
The app exports the following files to Experiment Manager: 23-205
• Training function — This function trains a classification model using the model hyperparameters specified in the Experiment Manager app and records the resulting metrics and visualizations. For each trial of the experiment, the app calls the training function with a new combination of hyperparameter values, selected from the hyperparameter search ranges specified in the app. The app saves the returned trained model, which you can export to the MATLAB workspace after training is complete.
• Training data set — This .mat file contains the full data set used in Classification Learner (including training and validation data, but excluding test data). Depending on how you imported the data into Classification Learner, the data set is contained in either a table named dataTable or two separate variables named predictorMatrix and responseData.
• Conditional constraints function — For some models, a conditional constraints function is required to tune model hyperparameters using Bayesian optimization. Conditional constraints enforce one of these conditions:
• When some hyperparameters have certain values, other hyperparameters are set to given values.
• When some hyperparameters have certain values, other hyperparameters are set to NaN values (for numeric hyperparameters) or <undefined> values (for categorical hyperparameters).
For more information, see “Conditional Constraints — ConditionalVariableFcn” on page 10-40.
• Deterministic constraints function — For some models, a deterministic constraints function is required to tune model hyperparameters using Bayesian optimization. A deterministic constraints function returns a true value when a point in the hyperparameter search space is feasible (that is, the problem is valid or well defined at this point) and a false value otherwise. For more information, see “Deterministic Constraints — XConstraintFcn” on page 10-39.
After you click Create Experiment, the app opens Experiment Manager. The Experiment Manager app then opens a dialog box in which you can choose to use a new or existing project for your experiment.
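For orientation, a conditional constraints function generally has the shape sketched below. This is a hand-written illustration, not the file that the app generates; the hyperparameter names correspond to an SVM-style search space and are assumptions.
% Sketch of a conditional constraints function for Bayesian optimization.
% X is a table of hyperparameter values, one row per point to evaluate.
function X = exampleConditionalVariableFcn(X)
    % When the kernel is not polynomial, PolynomialOrder is irrelevant,
    % so set it to NaN (a categorical hyperparameter would be set to <undefined>).
    notPoly = X.KernelFunction ~= "polynomial";
    X.PolynomialOrder(notPoly) = NaN;
end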
Select Hyperparameters In Experiment Manager, use different hyperparameters and hyperparameter search ranges to tune your model. On the tab for your experiment, in the Hyperparameters section, click Add to add a hyperparameter to the model tuning process. In the table, double-click an entry to adjust its value.
When you use the default Bayesian optimization strategy for model tuning, specify these properties of the hyperparameters used in the experiment:
• Name — Enter a valid hyperparameter name.
• Range — For a real- or integer-valued hyperparameter, enter a two-element vector that gives the lower bound and upper bound of the hyperparameter. For a categorical hyperparameter, enter an array of strings or a cell array of character vectors that lists the possible values of the hyperparameter.
• Type — Select real for a real-valued hyperparameter, integer for an integer-valued hyperparameter, or categorical for a categorical hyperparameter.
• Transform — Select none to use no transform or log to use a logarithmic transform. When you select log, the hyperparameter values must be positive. With this setting, the Bayesian optimization algorithm models the hyperparameter on a logarithmic scale.
The following list provides, for each model type, the fitting function and the hyperparameters you can tune in the app.
• Tree (fitctree): MaxNumSplits, MinLeafSize, SplitCriterion. For more information, see OptimizeHyperparameters.
• Discriminant (fitcdiscr): Delta, DiscrimType, Gamma. For more information, see OptimizeHyperparameters.
• Naive Bayes (fitcnb): DistributionNames, Kernel, Standardize, Width. For more information, see OptimizeHyperparameters.
• SVM (fitcsvm for two classes, fitcecoc for three or more classes): BoxConstraint, Coding (for three or more classes only), KernelFunction, KernelScale, PolynomialOrder, Standardize. For more information, see OptimizeHyperparameters (for fitcsvm hyperparameters) and OptimizeHyperparameters (for fitcecoc hyperparameters).
• Efficient Linear (fitclinear for two classes, fitcecoc for three or more classes): Coding (for three or more classes only), Lambda, Learner, Regularization. For more information, see OptimizeHyperparameters (for fitclinear hyperparameters) and OptimizeHyperparameters (for fitcecoc hyperparameters).
• KNN (fitcknn): Distance, DistanceWeight, Exponent, NumNeighbors, Standardize. For more information, see OptimizeHyperparameters.
• Kernel (fitckernel for two classes, fitcecoc for three or more classes): Coding (for three or more classes only), KernelScale, Lambda, Learner, NumExpansionDimensions, Standardize. For more information, see OptimizeHyperparameters (for fitckernel hyperparameters) and OptimizeHyperparameters (for fitcecoc hyperparameters).
• Ensemble (fitcensemble): LearnRate, MaxNumSplits, Method, MinLeafSize, NumLearningCycles, NumVariablesToSample, SplitCriterion. For more information, see OptimizeHyperparameters.
• Neural Network (fitcnet): Activations, Lambda, LayerBiasesInitializer, LayerSizes, LayerWeightsInitializer, Standardize. For more information, see OptimizeHyperparameters.
In the MATLAB Command Window, you can use the hyperparameters function to get more information about the hyperparameters available for your model and their default search ranges. Specify the fitting function, the training predictor data, and the training response variable in the call to the hyperparameters function.
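For example, the following sketch lists the tunable hyperparameters and their default search ranges for a tree model; the fisheriris data here is illustrative, and you would pass your own predictors and response instead.
% List hyperparameter names and default search ranges for fitctree on sample data.
load fisheriris
params = hyperparameters('fitctree',meas,species);
for i = 1:numel(params)
    fprintf("%s: [%s]\n",params(i).Name,strjoin(string(params(i).Range),", "));
end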
(Optional) Customize Experiment In your experiment, you can change more than the model hyperparameters. In most cases, experiment customization requires editing the training function file before running the experiment. For example, to change the training data set, preprocessing steps, returned metrics, or generated visualizations, you must update the training function file. To edit the training function, click Edit in the Training Function section on the experiment tab. For an example that includes experiment customization, see “Tune Classification Model Using Experiment Manager” on page 23-212. Some experiment customization steps do not require editing the training function file. For example, you can change the strategy for model tuning, adjust the Bayesian optimization options, or change the metric used to perform Bayesian optimization. Change Strategy for Model Tuning Instead of using Bayesian optimization to search for the best hyperparameter values, you can sweep through a range of hyperparameter values. On the tab for your experiment, in the Hyperparameters section, set Strategy to Exhaustive Sweep. In the hyperparameter table, enter the names and values of the hyperparameters to use in the experiment. Hyperparameter values must be scalars or vectors with numeric, logical, or string values, or cell arrays of character vectors. For example, these are valid hyperparameter specifications, depending on your model: • 0.01 • 0.01:0.01:0.05 • [0.01 0.02 0.04 0.08] • ["Bag","AdaBoostM2","RUSBoost"] • {'gaussian','linear','polynomial'} 23-209
When you run the experiment, Experiment Manager trains a model using every combination of the hyperparameter values specified in the table.
Adjust Bayesian Optimization Options
When you use Bayesian optimization, you can specify the duration of your experiment. On the tab for your experiment, in the Hyperparameters section, ensure that Strategy is set to Bayesian Optimization. In the Bayesian Optimization Options section, enter the maximum time in seconds and the maximum number of trials to run. Note that the actual run time and number of trials in your experiment can exceed these settings because Experiment Manager checks these options only when a trial finishes executing.
You can also specify the acquisition function for the Bayesian optimization algorithm. In the Bayesian Optimization Options section, click Advanced Options. Select an acquisition function from the Acquisition Function Name list. The default value for this option is expected-improvement-plus. For more information, see “Acquisition Function Types” on page 10-3. Note that if you edit the training function file so that a new deterministic constraints function or conditional constraints function is required, you can specify the new function names in the Advanced Options section.
Change Metric Used to Perform Bayesian Optimization
By default, the app uses Bayesian optimization to try to find the combination of hyperparameter values that maximizes the validation accuracy. You can specify to minimize the validation total cost instead. On the tab for your experiment, in the Metrics section, specify to optimize the ValidationTotalCost value. Set the Direction to Minimize. Note that if you edit the training function file to return another metric, you can specify it in the Metrics section. Ensure that the Direction is appropriate for the given metric.
Run Experiment
When you are ready to run your experiment, you can run it either sequentially or in parallel.
• If you have Parallel Computing Toolbox, Experiment Manager can perform computations in parallel. On the Experiment Manager tab, in the Execution section, select Simultaneous from the Mode list. Note Parallel computations with a thread pool are not supported in Experiment Manager.
• Otherwise, use the default Mode option of Sequential.
On the Experiment Manager tab, in the Run section, click Run.
See Also Apps Experiment Manager | Classification Learner
Related Examples •
“Tune Classification Model Using Experiment Manager” on page 23-212
•
“Hyperparameter Optimization in Classification Learner App” on page 23-56
•
“Manage Experiments” (Deep Learning Toolbox)
Tune Classification Model Using Experiment Manager
This example shows how to use Experiment Manager to optimize a machine learning classifier. The goal is to create a classifier for the CreditRating_Historical data set that has minimal cross-validation loss. Begin by using the Classification Learner app to train all available classification models on the training data. Then, improve the best model by exporting it to Experiment Manager. In Experiment Manager, use the default settings to minimize the cross-validation loss (that is, maximize the cross-validation accuracy). Investigate options that help improve the loss, and perform more detailed experiments. For example, fix some hyperparameters at their best values, add useful hyperparameters to the model tuning process, adjust hyperparameter search ranges, adjust the training data, and customize the visualizations. The final result is a classifier with better test set accuracy. For more information on when to export models from Classification Learner to Experiment Manager, see “Export Model from Classification Learner to Experiment Manager” on page 23-205.
Load and Partition Data 1
In the MATLAB Command Window, read the sample file CreditRating_Historical.dat into a table. The predictor data contains financial ratios and industry sector information for a list of corporate customers. The response variable contains credit ratings assigned by a rating agency. creditrating = readtable("CreditRating_Historical.dat");
The goal is to create a classification model that predicts a customer's rating, based on the customer's information. 2
Because each value in the ID variable is a unique customer ID, that is, length(unique(creditrating.ID)) is equal to the number of observations in creditrating, the ID variable is a poor predictor. Remove the ID variable from the table, and convert the Industry variable to a categorical variable. creditrating = removevars(creditrating,"ID"); creditrating.Industry = categorical(creditrating.Industry);
3
Convert the response variable Rating to a categorical variable and specify the order of the categories. creditrating.Rating = categorical(creditrating.Rating, ... ["AAA","AA","A","BBB","BB","B","CCC"]);
4
Partition the data into two sets. Use approximately 80% of the observations for model training in Classification Learner, and reserve 20% of the observations for a final test set. Use cvpartition to partition the data. rng("default") % For reproducibility c = cvpartition(creditrating.Rating,"Holdout",0.2); trainingIndices = training(c); testIndices = test(c); creditTrain = creditrating(trainingIndices,:); creditTest = creditrating(testIndices,:);
Train Models in Classification Learner 1
If you have Parallel Computing Toolbox, the Classification Learner app can train models in parallel. Training models in parallel is typically faster than training models in series. If you do not have Parallel Computing Toolbox, skip to the next step. Before opening the app, start a parallel pool of process workers by using the parpool function. parpool("Processes")
By starting a parallel pool of process workers rather than thread workers, you ensure that Experiment Manager can use the same parallel pool later. Note Parallel computations with a thread pool are not supported in Experiment Manager. 2
Open Classification Learner. Click the Apps tab, and then click the arrow at the right of the Apps section to open the apps gallery. In the Machine Learning and Deep Learning group, click Classification Learner.
3
On the Learn tab, in the File section, click New Session and select From Workspace.
4
In the New Session from Workspace dialog box, select the creditTrain table from the Data Set Variable list. The app selects the response and predictor variables. The default response variable is Rating. The default validation option is 5-fold cross-validation, to protect against overfitting. In the Test section, click the check box to set aside a test data set. Specify 15 percent of the imported data as a test set.
5
To accept the options and continue, click Start Session.
6
To obtain the best classifier, train all preset models. On the Learn tab, in the Models section, click the arrow to open the gallery. In the Get Started group, click All. In the Train section, click Train All and select Train All. The app trains one of each preset model type, along with the default fine tree model, and displays the models in the Models pane.
7
To find the best result, sort the trained models based on the validation accuracy. In the Models pane, open the Sort by list and select Accuracy (Validation).
Note Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example.
Assess Best Model Performance 1
For the model with the greatest validation accuracy, inspect the accuracy of the predictions in each class. Select the efficient linear SVM model in the Models pane. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Confusion Matrix (Validation) in the Validation Results group. View the matrix of true class and predicted class results. Blue values indicate correct classifications, and red values indicate incorrect classifications.
Overall, the model performs well. In particular, most of the misclassifications have a predicted value that is only one category away from the true value. 23-214
2
See how the classifier performed per class. Under Plot, select the True Positive Rates (TPR), False Negative Rates (FNR) option. The TPR is the proportion of correctly classified observations per true class. The FNR is the proportion of incorrectly classified observations per true class.
The model correctly classifies almost 94% of the observations with a true rating of AAA, but has difficulty classifying observations with a true rating of B. 3
Check the test set performance of the model. On the Test tab, in the Test section, click Test Selected. The app computes the test set performance of the model trained on the full data set, including training and validation data.
4
Compare the validation and test accuracy for the model. On the model Summary tab, compare the Accuracy (Validation) value under Training Results to the Accuracy (Test) value under Test Results. In this example, the two values are similar.
Export Model to Experiment Manager 1
To try to improve the classification accuracy of the model, export it to Experiment Manager. On the Learn tab, in the Export section, click Export Model and select Create Experiment. The Create Experiment dialog box opens.
Because the Rating response variable has multiple classes, the efficient linear SVM model is a multiclass ECOC model, trained using the fitcecoc function (with linear binary learners). 2
In the Create Experiment dialog box, click Create Experiment. The app opens Experiment Manager and a new dialog box.
3
In the dialog box, choose a new or existing project for your experiment. For this example, create a new project, and specify TrainEfficientModelProject as the filename in the Specify Project Folder Name dialog box.
Run Experiment with Default Hyperparameters 1
Run the experiment either sequentially or in parallel. Note • If you have Parallel Computing Toolbox, save time by running the experiment in parallel. On the Experiment Manager tab, in the Execution section, select Simultaneous from the Mode list. • Otherwise, use the default Mode option of Sequential.
On the Experiment Manager tab, in the Run section, click Run. Experiment Manager opens a new tab that displays the results of the experiment. At each trial, the app trains a model with a different combination of hyperparameter values, as specified in the Hyperparameters table in the Experiment1 tab. 2
After the app runs the experiment, check the results. In the table of results, click the arrow for the ValidationAccuracy column and select Sort in Descending Order.
Notice that the models with the greatest validation accuracy all have the same Coding value, onevsone. 3
Check the confusion matrix for the model with the greatest accuracy. On the Experiment Manager tab, in the Review Results section, click Confusion Matrix (Validation). In the Visualizations pane, the app displays the confusion matrix for the model.
For this model, all misclassifications have a predicted value that is only one category away from the true value.
Adjust Hyperparameters and Hyperparameter Values
1
The one-versus-one coding design seems best for this data set. To try to obtain a better classifier, fix the Coding hyperparameter value as onevsone and then rerun the experiment. Click the Experiment1 tab. In the Hyperparameters table, select the row for the Coding hyperparameter. Then click Delete.
2
To specify the coding design value, open the training function file. In the Training Function section, click Edit. The app opens the Experiment1_training1.mlx file.
3
In the file, search for the lines of code that use the fitcecoc function. This function is used to create multiclass linear classifiers. Specify the coding design value as a name-value argument. In this case, adjust the two calls to fitcecoc by adding 'Coding','onevsone' as follows. classificationLinear = fitcecoc(predictors, response, ... 'Learners', template, ecocParamsNameValuePairs{:}, ... 'ClassNames', classNames, 'Coding', 'onevsone'); classificationLinear = fitcecoc(trainingPredictors, ... trainingResponse, 'Learners', template, ... ecocParamsNameValuePairs{:}, 'ClassNames', classNames, ... 'Coding', 'onevsone');
Save the code changes, and close the file. 4
On the Experiment Manager tab, in the Run section, click Run.
5
To further vary the models evaluated during the experiment, add the regularization hyperparameter to the model tuning process. On the Experiment1 tab, in the Hyperparameters section, click Add. Edit the row entries so that the hyperparameter name is Regularization, the range is ["lasso","ridge"], and the type is categorical.
For more information on the hyperparameters you can tune for your model, see “Export Model from Classification Learner to Experiment Manager” on page 23-205.
6. On the Experiment Manager tab, in the Run section, click Run.
7. Adjust the range of values for the regularization term (lambda). On the Experiment1 tab, in the Hyperparameters table, change the Lambda range so that the upper bound is 3.7383e-02.
8. On the Experiment Manager tab, in the Run section, click Run.
Specify Training Data
1. Before running the experiment again, specify to use all the observations in creditTrain. Because you reserved some observations for testing when you imported the training data into Classification Learner, all experiments so far have used only 85% of the observations in the creditTrain data set. Save the creditTrain data set as the file fullTrainingData.mat in the TrainEfficientModelProject folder, which contains the experiment files. To do so, right-click the creditTrain variable name in the MATLAB workspace, and click Save As. In the dialog box, specify the filename and location, and then click Save.
2. On the Experiment1 tab, in the Training Function section, click Edit.
3. In the Experiment1_training1.mlx file, search for the load command. Specify to use the full creditTrain data set for model training by adjusting the code as follows.
% Load training data
fileData = load("fullTrainingData.mat");
trainingData = fileData.creditTrain;
4. On the Experiment1 tab, in the Description section, change the number of observations to 3146, which is the number of rows in the creditTrain table.
5. On the Experiment Manager tab, in the Run section, click Run.
6. Instead of using all predictors, you can use a subset of the predictors to train and tune your model. In this case, omit the Industry variable from the model training process. On the Experiment1 tab, in the Training Function section, click Edit. In the Experiment1_training1.mlx file, search for the lines of code that specify the variables predictorNames and isCategoricalPredictor. Remove references to the Industry variable by adjusting the code as follows.
predictorNames = {'WC_TA', 'RE_TA', 'EBIT_TA', 'MVE_BVTD', 'S_TA'};
isCategoricalPredictor = [false, false, false, false, false];
7. On the Experiment1 tab, in the Description section, change the number of predictors to 5.
8. On the Experiment Manager tab, in the Run section, click Run.
Customize Confusion Matrix
1. You can customize the visualization returned by Experiment Manager at each trial. In this case, customize the validation confusion matrix so that it displays the true positive rates and false negative rates. On the Experiment1 tab, in the Training Function section, click Edit.
2. In the Experiment1_training1.mlx file, search for the confusionchart function. This function creates the validation confusion matrix for each trained model. Specify to display the number of correctly and incorrectly classified observations for each true class as percentages of the number of observations of the corresponding true class. Adjust the code as follows.
cm = confusionchart(response, validationPredictions, ...
    'RowSummary', 'row-normalized');
3. On the Experiment Manager tab, in the Run section, click Run.
4. In the table of results, click the arrow for the ValidationAccuracy column and select Sort in Descending Order.
5. Check the confusion matrix for the model with the greatest accuracy. On the Experiment Manager tab, in the Review Results section, click Confusion Matrix (Validation). In the Visualizations pane, the app displays the confusion matrix for the model.
Like the best-performing model trained in Classification Learner, this model has difficulty classifying observations with a true rating of B. However, this model is better at classifying observations with a true rating of CCC.
Export and Use Final Model
1. You can export a model trained in Experiment Manager to the MATLAB workspace. Select the best-performing model from the most recently run experiment. On the Experiment Manager tab, in the Export section, click Export and select Training Output.
2. In the Export dialog box, change the workspace variable name to finalLinearModel and click OK. The new variable appears in your workspace.
3. Use the exported finalLinearModel structure to make predictions using new data. You can use the structure in the same way that you use any trained model exported from the Classification Learner app. For more information, see “Make Predictions for New Data Using Exported Model” on page 23-86. In this case, predict labels for the test data in creditTest.
testLabels = finalLinearModel.predictFcn(creditTest);
4. Create a confusion matrix using the true test data response and the predicted labels.
cm = confusionchart(creditTest.Rating,testLabels, ...
    "RowSummary","row-normalized");
5. Compute the model test set accuracy using the values in the confusion matrix.
testAccuracy = sum(diag(cm.NormalizedValues))/ ...
    sum(cm.NormalizedValues,"all")
0.8015
The test set accuracy for this tuned model (80.2%) is greater than the test set accuracy for the efficient linear SVM classifier in Classification Learner (76.4%). However, keep in mind that the tuned model uses observations in creditTest as test data and the Classification Learner model uses a subset of the observations in creditTrain as test data.
See Also
Apps
Experiment Manager | Classification Learner
Functions
fitclinear | fitcecoc
Related Examples
• “Export Model from Classification Learner to Experiment Manager” on page 23-205
• “Export Classification Model to Predict New Data” on page 23-86
• “Manage Experiments” (Deep Learning Toolbox)
24 Regression Learner
• “Train Regression Models in Regression Learner App” on page 24-2
• “Select Data for Regression or Open Saved App Session” on page 24-8
• “Choose Regression Model Options” on page 24-13
• “Feature Selection and Feature Transformation Using Regression Learner App” on page 24-31
• “Hyperparameter Optimization in Regression Learner App” on page 24-36
• “Visualize and Assess Model Performance in Regression Learner” on page 24-50
• “Export Plots in Regression Learner App” on page 24-61
• “Export Regression Model to Predict New Data” on page 24-65
• “Train Regression Trees Using Regression Learner App” on page 24-71
• “Compare Linear Regression Models Using Regression Learner App” on page 24-82
• “Train Regression Neural Networks Using Regression Learner App” on page 24-88
• “Train Kernel Approximation Model Using Regression Learner App” on page 24-95
• “Train Regression Model Using Hyperparameter Optimization in Regression Learner App” on page 24-103
• “Check Model Performance Using Test Set in Regression Learner App” on page 24-109
• “Explain Model Predictions for Regression Models Trained in Regression Learner App” on page 24-114
• “Use Partial Dependence Plots to Interpret Regression Models Trained in Regression Learner App” on page 24-125
• “Deploy Model Trained in Regression Learner to MATLAB Production Server” on page 24-137
• “Export Model from Regression Learner to Experiment Manager” on page 24-141
• “Tune Regression Model Using Experiment Manager” on page 24-147
Train Regression Models in Regression Learner App
In this section...
“Automated Regression Model Training” on page 24-2
“Manual Regression Model Training” on page 24-4
“Parallel Regression Model Training” on page 24-5
“Compare and Improve Regression Models” on page 24-5
You can use Regression Learner to train regression models including linear regression models, regression trees, Gaussian process regression models, support vector machines, kernel approximation, ensembles of regression trees, and neural network regression models. In addition to training models, you can explore your data, select features, specify validation schemes, and evaluate results. You can export a model to the workspace to use the model with new data or generate MATLAB code to learn about programmatic regression.
Training a model in Regression Learner consists of two parts:
• Validated Model: Train a model with a validation scheme. By default, the app protects against overfitting by applying cross-validation. Alternatively, you can choose holdout validation. The validated model is visible in the app.
• Full Model: Train a model on full data, excluding test data. The app trains this model simultaneously with the validated model. However, the model trained on full data is not visible in the app. When you choose a regression model to export to the workspace, Regression Learner exports the full model.
Note The app does not use test data for model training. Models exported from the app are trained on the full data, excluding any data reserved for testing.
The app displays the results of the validated model. Diagnostic measures, such as model accuracy, and plots, such as a response plot or residuals plot, reflect the validated model results. You can automatically train one or more regression models, compare validation results, and choose the best model that works for your regression problem. When you choose a model to export to the workspace, Regression Learner exports the full model. Because Regression Learner creates a model object of the full model during training, you experience no lag time when you export the model. You can use the exported model to make predictions on new data.
To get started by training a selection of model types, see “Automated Regression Model Training” on page 24-2. If you already know which regression model you want to train, see “Manual Regression Model Training” on page 24-4.
Automated Regression Model Training
You can use Regression Learner to automatically train a selection of different regression models on your data.
• Get started by automatically training multiple models simultaneously. You can quickly try a selection of models, and then explore promising models interactively.
• If you already know what model type you want, then you can train individual models instead. See “Manual Regression Model Training” on page 24-4.
1. On the Apps tab, in the Machine Learning and Deep Learning group, click Regression Learner to open the Regression Learner app.
2. On the Learn tab, in the File section, click New Session and select data from the workspace or from a file. Specify a response variable and variables to use as predictors. Alternatively, click Open to open a previously saved app session. See “Select Data for Regression or Open Saved App Session” on page 24-8.
3. In the Models section, select All Quick-To-Train. This option trains all the model presets that are fast to fit.
4. In the Train section, click Train All and select Train All.
Note If you have Parallel Computing Toolbox, the app trains the models in parallel by default. See “Parallel Regression Model Training” on page 24-5.
A selection of model types appears in the Models pane. When the models finish training, the best RMSE (Validation) score is outlined in a box.
5. Click models in the Models pane and open the corresponding plots to explore the results.
For the next steps, see “Manual Regression Model Training” on page 24-4 or “Compare and Improve Regression Models” on page 24-5.
6. To try all the nonoptimizable model presets available, click All in the Models section of the Learn tab.
7. In the Train section, click Train All and select Train Selected.
Manual Regression Model Training
To explore individual model types, you can train models one at a time or as a group.
1. Choose a model type. On the Learn tab, in the Models section, click a model type. To see all available model options, click the arrow in the Models section to expand the list of regression models. The nonoptimizable model options in the gallery are preset starting points with different settings, suitable for a range of different regression problems. To read descriptions of the models, switch to the details view.
For more information on each option, see “Choose Regression Model Options” on page 24-13.
2. After selecting a model, you can train the model. In the Train section, click Train All and select Train Selected. Repeat the process to explore different models. Alternatively, you can create several draft models and then train the models as a group. In the Train section, click Train All and select Train All.
Tip Select regression trees first. If your trained models do not predict the response accurately enough, then try other models with higher flexibility. To avoid overfitting, look for a less flexible model that provides sufficient accuracy.
3. If you want to try all nonoptimizable models of the same or different types, then select one of the All options in the Models gallery. Alternatively, if you want to automatically tune hyperparameters of a specific model type, select the corresponding Optimizable model and perform hyperparameter optimization. For more information, see “Hyperparameter Optimization in Regression Learner App” on page 24-36.
For next steps, see “Compare and Improve Regression Models” on page 24-5.
Parallel Regression Model Training
You can train models in parallel using Regression Learner if you have Parallel Computing Toolbox. Parallel training allows you to train multiple models simultaneously and continue working. To control parallel training, toggle the Use Parallel button in the Train section of the Learn tab. To train draft models in parallel, ensure the button is toggled on before clicking Train All. The Use Parallel button is available only if you have Parallel Computing Toolbox.
The Use Parallel button is on by default. The first time you click Train All and select Train All or Train Selected, a dialog box is displayed while the app opens a parallel pool of workers. After the pool opens, you can train multiple models at once. When models are training in parallel, progress indicators appear on each training and queued model in the Models pane. If you want, you can cancel individual models. During training, you can examine results and plots from models, and initiate training of more models.
If you have Parallel Computing Toolbox, then parallel training is available for nonoptimizable models in Regression Learner, and you do not need to set the UseParallel option of the statset function.
Note Even if you do not have Parallel Computing Toolbox, you can keep the app responsive during model training. Before training draft models, on the Learn tab, in the Train section, click Train All and ensure the Use Background Training check box is selected. Then, select the Train All option. A dialog box is displayed while the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background.
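If you train at the command line rather than in the app, parallel training is requested through a statset options structure. The following is a minimal hedged sketch under that assumption; the bagged ensemble, the carbig variables, and the predictor choice are illustrative only and are not settings the app itself uses.
% Request parallel ensemble training at the command line
% (requires Parallel Computing Toolbox). Rows with missing values are dropped first.
load carbig
data = rmmissing([Acceleration Cylinders Displacement Horsepower Weight MPG]);
X = data(:,1:5);                      % example predictors
Y = data(:,6);                        % example response (MPG)
opts = statset('UseParallel',true);   % parallel computation options
mdl = fitrensemble(X,Y,'Method','Bag','Options',opts);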
Compare and Improve Regression Models
1. Examine the RMSE (Validation) score reported in the Models pane for each model. Click models in the Models pane and open the corresponding plots to explore the results. Compare model performance by inspecting results in the plots. You can rearrange the layout of the plots to compare results across multiple models: use the options in the Layout button, drag and drop plots, or select the options provided by the Document Actions button located to the right of the model plot tabs. Additionally, you can compare the models by using the Sort by options in the Models pane. Delete any unwanted model by selecting the model and clicking the Delete selected model button in the upper right of the pane or right-clicking the model and selecting Delete. See “Visualize and Assess Model Performance in Regression Learner” on page 24-50.
2. Select the best model in the Models pane and then try including and excluding different features in the model. First, create a copy of the model. After selecting the model, either click the Duplicate selected model button in the upper right of the Models pane or right-click the model and select Duplicate. Then, click Feature Selection in the Options section of the Learn tab. Use the available feature ranking algorithms to select features. Try the response plot to help you identify features to remove. See if you can improve the model by removing features with low predictive power. Specify predictors to include in the model, and train new models using the new options. Compare results among the models in the Models pane. You also can try transforming features with PCA to reduce dimensionality. Click PCA in the Options section of the Learn tab. See “Feature Selection and Feature Transformation Using Regression Learner App” on page 24-31.
3. To try to improve the model further, you can duplicate it, change the hyperparameter options in the Model Hyperparameters section of the model Summary tab, and then train the model using the new options. To learn how to control model flexibility, see “Choose Regression Model Options” on page 24-13. For information on how to tune model hyperparameters automatically, see “Hyperparameter Optimization in Regression Learner App” on page 24-36. If feature selection, PCA, or new hyperparameter values improve your model, try training All model types with the new settings. See if another model type does better with the new settings.
Tip To avoid overfitting, look for a less flexible model that provides sufficient accuracy. For example, look for simple models, such as regression trees that are fast and easy to interpret. If your models are not accurate enough, then try other models with higher flexibility, such as ensembles. To learn about the model flexibility, see “Choose Regression Model Options” on page 24-13. This figure shows the app with a Models pane containing various regression model types.
For a step-by-step example comparing different regression models, see “Train Regression Trees Using Regression Learner App” on page 24-71. Next, you can generate code to train the model with different data or export trained models to the workspace to make predictions using new data. See “Export Regression Model to Predict New Data” on page 24-65.
See Also
Related Examples
• “Select Data for Regression or Open Saved App Session” on page 24-8
• “Choose Regression Model Options” on page 24-13
• “Feature Selection and Feature Transformation Using Regression Learner App” on page 24-31
• “Visualize and Assess Model Performance in Regression Learner” on page 24-50
• “Export Regression Model to Predict New Data” on page 24-65
• “Train Regression Trees Using Regression Learner App” on page 24-71
Select Data for Regression or Open Saved App Session
In this section...
“Select Data from Workspace” on page 24-8
“Import Data from File” on page 24-9
“Example Data for Regression” on page 24-9
“Choose Validation Scheme” on page 24-10
“(Optional) Reserve Data for Testing” on page 24-11
“Save and Open App Session” on page 24-11
When you first launch the Regression Learner app, you can choose to import data or to open a previously saved app session. To import data, see “Select Data from Workspace” on page 24-8 and “Import Data from File” on page 24-9. To open a saved session, see “Save and Open App Session” on page 24-11.
Select Data from Workspace
Tip In Regression Learner, tables are the easiest way to work with your data, because they can contain numeric and label data. Use the Import Tool to bring your data into the MATLAB workspace as a table, or use the table functions to create a table from workspace variables. See “Tables”.
1. Load your data into the MATLAB workspace. Predictor variables can be numeric, categorical, string, or logical vectors, cell arrays of character vectors, or character arrays. The response variable must be a floating-point vector (single or double precision). Combine the predictor data into one variable, either a table or a matrix. You can additionally combine your predictor data and response variable, or you can keep them separate.
For example data sets, see “Example Data for Regression” on page 24-9.
2. On the Apps tab, click Regression Learner to open the app.
3. On the Learn tab, in the File section, click New Session > From Workspace.
4. In the New Session from Workspace dialog box, under Data Set Variable, select a table or matrix from the workspace variables. If you select a matrix, choose whether to use rows or columns for observations by clicking the option buttons.
5. Under Response, observe the default response variable. The app tries to select a suitable response variable from the data set variable and treats all other variables as predictors. If you want to use a different response variable, you can:
• Use the list to select another variable from the data set variable.
• Select a separate workspace variable by clicking the From workspace option button and then selecting a variable from the list.
6. Under Predictors, add or remove predictors using the check boxes. Add or remove all predictors by clicking Add All or Remove All. You can also add or remove multiple predictors by selecting them in the table, and then clicking Add N or Remove N, where N is the number of selected predictors. The Add All and Remove All buttons change to Add N and Remove N when you select multiple predictors.
7. Click Start Session to accept the default validation scheme and continue. The default validation option is 5-fold cross-validation, which protects against overfitting.
Tip If you have a large data set, you might want to switch to holdout validation. To learn more, see “Choose Validation Scheme” on page 24-10.
Note If you prefer loading data into the app directly from the command line, you can specify the predictor data, response variable, and validation type to use in Regression Learner in the command line call to regressionLearner. For more information, see Regression Learner. For next steps, see “Train Regression Models in Regression Learner App” on page 24-2.
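For illustration, a sketch of such a command-line call follows. It assumes the regressionLearner(Tbl,ResponseVarName) syntax described on the regressionLearner reference page and reuses the Cars example table; check that reference page for the arguments available in your release.
% Open Regression Learner preloaded with a predictor table and response name.
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
regressionLearner(cartable,'MPG')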
Import Data from File
1. On the Learn tab, in the File section, select New Session > From File.
2. Select a file type in the list, such as spreadsheets, text files, or comma-separated values (.csv) files, or select All Files to browse for other file types such as .dat.
Example Data for Regression
To get started using Regression Learner, try these example data sets.
Cars
Size: Number of predictors: 7. Number of observations: 406. Response: MPG (miles per gallon)
Description: Data on different car models, 1970–1982. Predict the fuel economy (in miles per gallon), or one of the other characteristics. For a step-by-step example, see “Train Regression Trees Using Regression Learner App” on page 24-71.
Create a table from variables in the carbig data set.
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
Hospital
Size: Number of predictors: 5. Number of observations: 100. Response: Diastolic
Description: Simulated hospital data. Predict the diastolic blood pressure of patients.
Create a table from variables in the patients data set.
load patients
hospitaltable = table(Gender,Age,Weight,Smoker,Systolic, ...
    Diastolic);
Choose Validation Scheme
Choose a validation method to examine the predictive accuracy of the fitted models. Validation estimates model performance on new data, and helps you choose the best model. Validation protects against overfitting. A model that is too flexible and suffers from overfitting has a worse validation accuracy. Choose a validation scheme before training any models so that you can compare all the models in your session using the same validation scheme.
Tip Try the default validation scheme and click Start Session to continue. The default option is 5-fold cross-validation, which protects against overfitting. If you have a large data set and training the models takes too long using cross-validation, reimport your data and try the faster holdout validation instead.
Assume that no data is reserved for testing, which is true by default.
• Cross-Validation: Select the number of folds (or divisions) to partition the data set. If you choose k folds, then the app:
  1. Partitions the data into k disjoint sets or folds.
  2. For each validation fold:
     a. Trains a model using the training-fold observations (observations not in the validation fold).
     b. Assesses model performance using validation-fold data.
  3. Calculates the average validation error over all folds.
  This method gives a good estimate of the predictive accuracy of the final model trained using the full data set. The method requires multiple fits, but makes efficient use of all the data, so it works well for small data sets. (A hedged programmatic sketch of the same idea appears after this list.)
• Holdout Validation: Select a percentage of the data to use as a validation set. The app trains a model on the training set and assesses its performance with the validation set. The model used for validation is based on only a portion of the data, so holdout validation is appropriate only for large data sets. The final model is trained using the full data set.
• Resubstitution Validation: No protection against overfitting. The app uses all the data for training and computes the error rate on the same data. Without any separate validation data, you get an unrealistic estimate of the model’s performance on new data. That is, the training sample accuracy is likely to be unrealistically high, and the predictive accuracy is likely to be lower. To help you avoid overfitting to the training data, choose another validation scheme instead.
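As a rough illustration of what k-fold cross-validation measures, the following hedged sketch cross-validates a regression tree on the Cars example data at the command line. The tree model, the numeric predictor subset, and the five folds are assumptions for the example; this is not the code the app runs internally.
% 5-fold cross-validation of a regression tree, summarized as average MSE.
load carbig
cartable = rmmissing(table(Acceleration,Displacement,Horsepower,Weight,MPG));
cvMdl = fitrtree(cartable,'MPG','CrossVal','on','KFold',5);  % five validation folds
avgMSE = kfoldLoss(cvMdl)   % average validation mean squared error over the folds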
Note The validation scheme only affects the way that Regression Learner computes validation metrics. The final model is always trained using the full data set, excluding any data reserved for testing. All the models you train after selecting data use the same validation scheme that you select in this dialog box. You can compare all the models in your session using the same validation scheme. To change the validation selection and train new models, you can select data again, but you lose any trained models. The app warns you that importing data starts a new session. Save any trained models you want to keep to the workspace, and then import the data. For next steps training models, see “Train Regression Models in Regression Learner App” on page 24-2.
(Optional) Reserve Data for Testing
When you import data into Regression Learner, you can specify to reserve a percentage of the data for testing. In the Test section of the New Session dialog box, click the check box to set aside a test data set. Specify the percentage of the imported data to use as a test set. If you prefer, you can still choose to import a separate test data set after starting an app session.
You can use the test set to evaluate the performance of a trained model. In particular, you can check whether the validation metrics provide good estimates for the model performance on new data. For more information, see “Evaluate Test Set Model Performance” on page 24-59. For an example, see “Train Regression Model Using Hyperparameter Optimization in Regression Learner App” on page 24-103.
Note The app does not use test data for model training. Models exported from the app are trained on the full training and validation data, excluding any data reserved for testing.
Save and Open App Session
In Regression Learner, you can save the current app session and open a previously saved app session.
• To save the current app session, click Save in the File section of the Learn tab. When you first save the current session, you must specify the session file name and the file location. The Save Session option saves the current session, and the Save Session As option saves the current session to a new file. The Save Compact Session As option saves a compact version of the current app session, resulting in a smaller file size for the saved session. Note that the Save Compact Session As option permanently deletes the training data from all trained models in the current session.
• To open a saved app session, click Open in the File section. In the Select File to Open dialog box, select the saved session you want to open.
See Also
Related Examples
• “Train Regression Models in Regression Learner App” on page 24-2
• “Choose Regression Model Options” on page 24-13
• “Feature Selection and Feature Transformation Using Regression Learner App” on page 24-31
• “Visualize and Assess Model Performance in Regression Learner” on page 24-50
• “Export Regression Model to Predict New Data” on page 24-65
• “Train Regression Trees Using Regression Learner App” on page 24-71
Choose Regression Model Options
In this section...
“Choose Regression Model Type” on page 24-13
“Linear Regression Models” on page 24-15
“Regression Trees” on page 24-17
“Support Vector Machines” on page 24-19
“Efficiently Trained Linear Regression Models” on page 24-21
“Gaussian Process Regression Models” on page 24-23
“Kernel Approximation Models” on page 24-25
“Ensembles of Trees” on page 24-27
“Neural Networks” on page 24-28
Choose Regression Model Type
You can use the Regression Learner app to automatically train a selection of different models on your data. Use automated training to quickly try a selection of model types, and then explore promising models interactively. To get started, try these options first:
Get Started Regression Model Options
• All Quick-To-Train — Try the All Quick-To-Train option first. The app trains all model types that are typically quick to train.
• All — Use the All option to train all available nonoptimizable model types. Trains every type regardless of any prior trained models. Can be time-consuming.
To learn more about automated model training, see “Automated Regression Model Training” on page 24-2. If you want to explore models one at a time, or if you already know what model type you want, you can select individual models or train a group of the same type. To see all available regression model options, on the Learn tab, click the arrow in the Models section to expand the list of regression models. The nonoptimizable model options in the gallery are preset starting points with different settings, suitable for a range of different regression problems. To use optimizable model options and tune model hyperparameters automatically, see “Hyperparameter Optimization in Regression Learner App” on page 24-36. For help choosing the best model type for your problem, see the tables showing typical characteristics of different regression model types. Decide on the tradeoff you want in speed, flexibility, and interpretability. The best model type depends on your data.
Tip To avoid overfitting, look for a less flexible model that provides sufficient accuracy. For example, look for simple models such as regression trees that are fast and easy to interpret. If the models are not accurate enough predicting the response, choose other models with higher flexibility, such as ensembles. To control flexibility, see the details for each model type.
Characteristics of Regression Model Types
• “Linear Regression Models” on page 24-15 — Interpretability: Easy
• “Regression Trees” on page 24-17 — Interpretability: Easy
• “Support Vector Machines” on page 24-19 — Interpretability: Easy for linear SVMs. Hard for other kernels.
• “Efficiently Trained Linear Regression Models” on page 24-21 — Interpretability: Easy
• “Gaussian Process Regression Models” on page 24-23 — Interpretability: Hard
• “Kernel Approximation Models” on page 24-25 — Interpretability: Hard
• “Ensembles of Trees” on page 24-27 — Interpretability: Hard
• “Neural Networks” on page 24-28 — Interpretability: Hard
To read a description of each model in Regression Learner, switch to the details view in the list of all model presets.
Tip The nonoptimizable models in the Models gallery are preset starting points with different settings. After you choose a model type, such as regression trees, try training all the nonoptimizable presets to see which one produces the best model with your data. For workflow instructions, see “Train Regression Models in Regression Learner App” on page 24-2.
Categorical Predictor Support
In Regression Learner, all model types support categorical predictors.
Tip If you have categorical predictors with many unique values, training linear models with interaction or quadratic terms and stepwise linear models can use a lot of memory. If the model fails to train, try removing these categorical predictors.
Linear Regression Models
Linear regression models have predictors that are linear in the model parameters, are easy to interpret, and are fast for making predictions. These characteristics make linear regression models popular models to try first. However, the highly constrained form of these models means that they often have low predictive accuracy. After fitting a linear regression model, try creating more flexible models, such as regression trees, and compare the results.
Tip In the Models gallery, click All Linear to try each of the linear regression options and see which settings produce the best model with your data. Select the best model in the Models pane and try to improve that model by using feature selection and changing some advanced options.
• Linear — Interpretability: Easy. Model flexibility: Very low
• Interactions Linear — Interpretability: Easy. Model flexibility: Medium
• Robust Linear — Interpretability: Easy. Model flexibility: Very low. Less sensitive to outliers, but can be slow to train.
• Stepwise Linear — Interpretability: Easy. Model flexibility: Medium
For a workflow example, see “Train Regression Trees Using Regression Learner App” on page 24-71.
Linear Regression Model Hyperparameter Options
Regression Learner uses the fitlm function to train Linear, Interactions Linear, and Robust Linear models. The app uses the stepwiselm function to train Stepwise Linear models.
For Linear, Interactions Linear, and Robust Linear models you can set these options:
• Terms — Specify which terms to use in the linear model. You can choose from:
  • Linear. A constant term and linear terms in the predictors
  • Interactions. A constant term, linear terms, and interaction terms between the predictors
  • Pure Quadratic. A constant term, linear terms, and terms that are purely quadratic in each of the predictors
  • Quadratic. A constant term, linear terms, and quadratic terms (including interactions)
• Robust option — Specify whether to use a robust objective function and make your model less sensitive to outliers. With this option, the fitting method automatically assigns lower weights to data points that are more likely to be outliers.
Stepwise linear regression starts with an initial model and systematically adds and removes terms to the model based on the explanatory power of these incrementally larger and smaller models. For Stepwise Linear models, you can set these options:
• Initial terms — Specify the terms that are included in the initial model of the stepwise procedure. You can choose from Constant, Linear, Interactions, Pure Quadratic, and Quadratic.
• Upper bound on terms — Specify the highest order of the terms that the stepwise procedure can add to the model. You can choose from Linear, Interactions, Pure Quadratic, and Quadratic.
• Maximum number of steps — Specify the maximum number of different linear models that can be tried in the stepwise procedure. To speed up training, try reducing the maximum number of steps. Selecting a small maximum number of steps decreases your chances of finding a good model.
Tip If you have categorical predictors with many unique values, training linear models with interaction or quadratic terms and stepwise linear models can use a lot of memory. If the model fails to train, try removing these categorical predictors.
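As a rough command-line counterpart, the sketch below fits interactions, robust, and stepwise linear models with fitlm and stepwiselm. The predictor subset and name-value settings are illustrative assumptions based on the carbig example data, not the app's exact configuration.
% Linear regression variants fit programmatically for comparison.
load carbig
cartable = rmmissing(table(Acceleration,Horsepower,Weight,MPG));
interMdl = fitlm(cartable,'interactions','ResponseVar','MPG');   % Interactions Linear
robMdl   = fitlm(cartable,'linear','ResponseVar','MPG', ...
    'RobustOpts','on');                                          % Robust Linear
stepMdl  = stepwiselm(cartable,'constant','ResponseVar','MPG', ...
    'Upper','quadratic');                                        % Stepwise Linear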
Regression Trees
Regression trees are easy to interpret, fast for fitting and prediction, and low on memory usage. Try to grow smaller trees with fewer larger leaves to prevent overfitting. Control the leaf size with the Minimum leaf size setting.
Tip In the Models gallery, click All Trees to try each of the nonoptimizable regression tree options and see which settings produce the best model with your data. Select the best model in the Models pane, and try to improve that model by using feature selection and changing some advanced options.
• Fine Tree — Interpretability: Easy. Model flexibility: High. Many small leaves for a highly flexible response function (Minimum leaf size is 4.)
• Medium Tree — Interpretability: Easy. Model flexibility: Medium. Medium-sized leaves for a less flexible response function (Minimum leaf size is 12.)
• Coarse Tree — Interpretability: Easy. Model flexibility: Low. Few large leaves for a coarse response function (Minimum leaf size is 36.)
To predict a response of a regression tree, follow the tree from the root (beginning) node down to a leaf node. The leaf node contains the value of the response. Statistics and Machine Learning Toolbox trees are binary. Each step in a prediction involves checking the value of one predictor variable. For example, here is a simple regression tree
This tree predicts the response based on two predictors, x1 and x2. To make a prediction, start at the top node. At each node, check the values of the predictors to decide which branch to follow. When the branches reach a leaf node, the response is set to the value corresponding to that node.
You can visualize your regression tree model by exporting the model from the app, and then entering:
view(trainedModel.RegressionTree,"Mode","graph")
For a workflow example, see “Train Regression Trees Using Regression Learner App” on page 24-71.
Regression Tree Model Hyperparameter Options
The Regression Learner app uses the fitrtree function to train regression trees. You can set these options:
• Minimum leaf size — Specify the minimum number of training samples used to calculate the response of each leaf node. When you grow a regression tree, consider its simplicity and predictive power. To change the minimum leaf size, click the buttons or enter a positive integer value in the Minimum leaf size box.
  • A fine tree with many small leaves is usually highly accurate on the training data. However, the tree might not show comparable accuracy on an independent test set. A very leafy tree tends to overfit, and its validation accuracy is often far lower than its training (or resubstitution) accuracy.
  • In contrast, a coarse tree with fewer large leaves does not attain high training accuracy. But a coarse tree can be more robust in that its training accuracy can be near that of a representative test set.
Tip Decrease the Minimum leaf size to create a more flexible model.
• Surrogate decision splits — For missing data only. Specify surrogate use for decision splits. If you have data with missing values, use surrogate splits to improve the accuracy of predictions.
When you set Surrogate decision splits to On, the regression tree finds at most 10 surrogate splits at each branch node. To change the number of surrogate splits, click the buttons or enter a positive integer value in the Maximum surrogates per node box. When you set Surrogate decision splits to Find All, the regression tree finds all surrogate splits at each branch node. The Find All setting can use considerable time and memory.
Alternatively, you can let the app choose some of these model options automatically by using hyperparameter optimization. See “Hyperparameter Optimization in Regression Learner App” on page 24-36.
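For reference, a hedged command-line sketch of these options with fitrtree follows; the leaf size of 12, the surrogate setting, and the carbig predictors are illustrative assumptions rather than values the app requires.
% Regression tree with a minimum leaf size and surrogate splits.
load carbig
cartable = rmmissing(table(Acceleration,Horsepower,Weight,MPG));
treeMdl = fitrtree(cartable,'MPG', ...
    'MinLeafSize',12, ...    % corresponds to the Minimum leaf size setting
    'Surrogate','on');       % corresponds to Surrogate decision splits = On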
Support Vector Machines
You can train regression support vector machines (SVMs) in Regression Learner. Linear SVMs are easy to interpret, but can have low predictive accuracy. Nonlinear SVMs are more difficult to interpret, but can be more accurate.
Tip In the Models gallery, click All SVMs to try each of the nonoptimizable SVM options and see which settings produce the best model with your data. Select the best model in the Models pane, and try to improve that model by using feature selection and changing some advanced options.
• Linear SVM — Interpretability: Easy. Model flexibility: Low
• Quadratic SVM — Interpretability: Hard. Model flexibility: Medium
• Cubic SVM — Interpretability: Hard. Model flexibility: Medium
• Fine Gaussian SVM — Interpretability: Hard. Model flexibility: High. Allows rapid variations in the response function. Kernel scale is set to sqrt(P)/4, where P is the number of predictors.
• Medium Gaussian SVM — Interpretability: Hard. Model flexibility: Medium. Gives a less flexible response function. Kernel scale is set to sqrt(P).
• Coarse Gaussian SVM — Interpretability: Hard. Model flexibility: Low. Gives a rigid response function. Kernel scale is set to sqrt(P)*4.
Statistics and Machine Learning Toolbox implements linear epsilon-insensitive SVM regression. This SVM ignores prediction errors that are less than some fixed number ε. The support vectors are the data points that have errors larger than ε. The function the SVM uses to predict new values depends only on the support vectors. To learn more about SVM regression, see Understanding Support Vector Machine Regression on page 25-31.
For a workflow example, see “Train Regression Trees Using Regression Learner App” on page 24-71.
SVM Model Hyperparameter Options
Regression Learner uses the fitrsvm function to train SVM regression models. You can set these options in the app:
• Kernel function — The kernel function determines the nonlinear transformation applied to the data before the SVM is trained. You can choose from:
  • Gaussian or Radial Basis Function (RBF) kernel
  • Linear kernel, easiest to interpret
  • Quadratic kernel
  • Cubic kernel
• Box constraint mode — The box constraint controls the penalty imposed on observations with large residuals. A larger box constraint gives a more flexible model. A smaller value gives a more rigid model, less sensitive to overfitting. When Box constraint mode is set to Auto, the app uses a heuristic procedure to select the box constraint.
Try to fine-tune your model by specifying the box constraint manually. Set Box constraint mode to Manual and specify a value. Change the value by clicking the arrows or entering a positive scalar value in the Manual box constraint box. The app automatically preselects a reasonable value for you. Try to increase or decrease this value slightly and see if this improves your model.
Tip Increase the box constraint value to create a more flexible model.
• Epsilon mode — Prediction errors that are smaller than the epsilon (ε) value are ignored and treated as equal to zero. A smaller epsilon value gives a more flexible model.
When Epsilon mode is set to Auto, the app uses a heuristic procedure to select the epsilon value.
Try to fine-tune your model by specifying the epsilon value manually. Set Epsilon mode to Manual and specify a value. Change the value by clicking the arrows or entering a positive scalar value in the Manual epsilon box. The app automatically preselects a reasonable value for you. Try to increase or decrease this value slightly and see if this improves your model.
Tip Decrease the epsilon value to create a more flexible model.
• Kernel scale mode — The kernel scale controls the scale of the predictors on which the kernel varies significantly. A smaller kernel scale gives a more flexible model. When Kernel scale mode is set to Auto, the app uses a heuristic procedure to select the kernel scale.
Try to fine-tune your model by specifying the kernel scale manually. Set Kernel scale mode to Manual and specify a value. Change the value by clicking the arrows or entering a positive scalar value in the Manual kernel scale box. The app automatically preselects a reasonable value for you. Try to increase or decrease this value slightly and see if this improves your model.
Tip Decrease the kernel scale value to create a more flexible model.
• Standardize data — Standardizing the predictors transforms them so that they have mean 0 and standard deviation 1. Standardizing removes the dependence on arbitrary scales in the predictors and generally improves performance.
Alternatively, you can let the app choose some of these model options automatically by using hyperparameter optimization. See “Hyperparameter Optimization in Regression Learner App” on page 24-36.
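For reference, a hedged command-line sketch of these options with fitrsvm follows; the Gaussian kernel and the manually chosen box constraint and epsilon values are illustrative assumptions rather than the app's defaults.
% Gaussian SVM regression with explicit kernel, box constraint, and epsilon.
load carbig
cartable = rmmissing(table(Acceleration,Horsepower,Weight,MPG));
svmMdl = fitrsvm(cartable,'MPG', ...
    'KernelFunction','gaussian', ...
    'BoxConstraint',1,'Epsilon',0.5, ...
    'KernelScale','auto','Standardize',true);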
Efficiently Trained Linear Regression Models
The efficiently trained linear regression models use techniques that reduce the training computation time at the cost of some accuracy. The available efficiently trained models are linear least-squares models and linear support vector machines (SVMs). When training on data with many predictors or many observations, consider using efficiently trained linear regression models instead of the existing linear or linear SVM preset models.
Tip In the Models gallery, click All Efficiently Trained Linear Models to try each of the preset efficient linear model options and see which settings produce the best model with your data. Select the best model in the Models pane, and try to improve that model by using feature selection and changing some advanced options.
• Efficient Linear Least Squares — Interpretability: Easy. Model flexibility: Medium — increases as the Beta tolerance setting decreases
• Efficient Linear SVM — Interpretability: Easy. Model flexibility: Medium — increases as the Beta tolerance setting decreases
For an example, see “Compare Linear Regression Models Using Regression Learner App” on page 24-82.
Efficiently Trained Linear Model Hyperparameter Options
Regression Learner uses the fitrlinear function to create efficiently trained linear regression models. You can set the following options:
• Learner — Specify the learner type for the efficient linear regression model, either SVM or Least squares. SVM models use an epsilon-insensitive loss during model fitting, whereas least-squares models use a mean squared error (MSE). For more information, see Learner.
• Solver — Specify the objective function minimization technique to use for training. Depending on your data and the other hyperparameter values, the available solver options are SGD, ASGD, Dual SGD, BFGS, LBFGS, SpaRSA, and Auto. When you set this option to Auto, the software selects:
  • BFGS when the data contains 100 or fewer predictor variables and the model uses a ridge penalty
  • SpaRSA when the data contains 100 or fewer predictor variables and the model uses a lasso penalty
  • Dual SGD when the data contains more than 100 predictor variables and the model uses an SVM learner with a ridge penalty
  • SGD otherwise
For more information, see Solver.
• Regularization — Specify the complexity penalty type, either a lasso (L1) penalty or a ridge (L2) penalty. Depending on the other hyperparameter values, the available regularization options are Lasso, Ridge, and Auto. When you set this option to Auto, the software selects:
  • Lasso when the model uses a SpaRSA solver
  • Ridge otherwise
For more information, see Regularization.
• Regularization strength (Lambda) — Specify lambda, the regularization strength.
  • When you set this option to Auto, the software sets the regularization strength to 1/n, where n is the number of observations.
  • When you set this option to Manual, you can specify a value by clicking the arrows or entering a positive scalar value in the box.
For more information, see Lambda.
• Beta tolerance (Beta tolerance) — Specify the beta tolerance, which is the relative tolerance on the linear coefficients and bias term (intercept). The beta tolerance affects when the training process ends. If the software converges too quickly to a model that performs poorly, you can decrease the beta tolerance to try to improve the fit. The default value is 0.0001. For more information, see BetaTolerance.
• Epsilon — Specify half the width of the epsilon-insensitive band. This option is available when Learner is SVM.
  • When you set this option to Auto, the software determines the value of Epsilon as iqr(Y)/13.49, which is an estimate of a tenth of the standard deviation using the interquartile range of the response variable Y. If iqr(Y) is equal to zero, then the software sets the value to 0.1.
  • When you set this option to Manual, you can specify a value by clicking the arrows or entering a positive scalar value in the box.
For more information, see Epsilon.
Alternatively, you can let the app choose some of these model options automatically by using hyperparameter optimization. See “Hyperparameter Optimization in Regression Learner App” on page 24-36.
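For reference, a hedged command-line sketch with fitrlinear follows. fitrlinear expects the predictors and response as separate arrays, and the learner, solver, regularization, lambda, and epsilon values below are illustrative assumptions rather than the app's settings.
% Efficiently trained linear SVM regression with a ridge penalty.
load carbig
data = rmmissing([Acceleration Horsepower Weight MPG]);
X = data(:,1:3);   % example predictors
Y = data(:,4);     % example response (MPG)
effMdl = fitrlinear(X,Y,'Learner','svm','Regularization','ridge', ...
    'Lambda',1e-4,'Solver','dual','Epsilon',0.3);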
Gaussian Process Regression Models
You can train Gaussian process regression (GPR) models in Regression Learner. GPR models are often highly accurate, but can be difficult to interpret.
Tip In the Models gallery, click All GPR Models to try each of the nonoptimizable GPR model options and see which settings produce the best model with your data. Select the best model in the Models pane, and try to improve that model by using feature selection and changing some advanced options.
• Rational Quadratic — Interpretability: Hard. Model flexibility: Automatic
• Squared Exponential — Interpretability: Hard. Model flexibility: Automatic
• Matern 5/2 — Interpretability: Hard. Model flexibility: Automatic
• Exponential — Interpretability: Hard. Model flexibility: Automatic
In Gaussian process regression, the response is modeled using a probability distribution over a space of functions. The flexibility of the presets in the Models gallery is automatically chosen to give a small training error and, simultaneously, protection against overfitting. To learn more about Gaussian process regression, see Gaussian Process Regression Models on page 6-2.
For a workflow example, see “Train Regression Trees Using Regression Learner App” on page 24-71.
Gaussian Process Regression Model Hyperparameter Options
Regression Learner uses the fitrgp function to train GPR models. You can set these options in the app:
• Basis function — The basis function specifies the form of the prior mean function of the Gaussian process regression model. You can choose from Zero, Constant, and Linear. Try to choose a different basis function and see if this improves your model.
• Kernel function — The kernel function determines the correlation in the response as a function of the distance between the predictor values. You can choose from Rational Quadratic, Squared Exponential, Matern 5/2, Matern 3/2, and Exponential. To learn more about kernel functions, see Kernel (Covariance) Function Options on page 6-6.
• Use isotropic kernel — If you use an isotropic kernel, the correlation length scales are the same for all the predictors. With a nonisotropic kernel, each predictor variable has its own separate correlation length scale. Using a nonisotropic kernel can improve the accuracy of your model, but can make the model slow to fit. To learn more about nonisotropic kernels, see Kernel (Covariance) Function Options on page 6-6.
• Kernel mode — You can manually specify initial values of the kernel parameters Kernel scale and Signal standard deviation. The signal standard deviation is the prior standard deviation of the response values. By default the app locally optimizes the kernel parameters starting from the initial values. To use fixed kernel parameters, set Optimize numeric parameters to No.
When Kernel scale mode is set to Auto, the app uses a heuristic procedure to select the initial kernel parameters. If you set Kernel scale mode to Manual, you can specify the initial values. Click the buttons or enter a positive scalar value in the Kernel scale box and the Signal standard deviation box. If you set Use isotropic kernel to No, you cannot set initial kernel parameters manually.
• Sigma mode — You can specify manually the initial value of the observation noise standard deviation Sigma. By default the app optimizes the observation noise standard deviation, starting from the initial value. To use fixed kernel parameters, clear the Optimize numeric parameters check box in the advanced options. When Sigma mode is set to Auto, the app uses a heuristic procedure to select the initial observation noise standard deviation. If you set Sigma mode to Manual, you can specify the initial values. Click the buttons or enter a positive scalar value in the Sigma box.
• Standardize data — Standardizing the predictors transforms them so that they have mean 0 and standard deviation 1. Standardizing removes the dependence on arbitrary scales in the predictors and generally improves performance.
• Optimize numeric parameters — With this option, the app automatically optimizes numeric parameters of the GPR model. The optimized parameters are the coefficients of the Basis function, the kernel parameters Kernel scale and Signal standard deviation, and the observation noise standard deviation Sigma.
Alternatively, you can let the app choose some of these model options automatically by using hyperparameter optimization. See “Hyperparameter Optimization in Regression Learner App” on page 24-36.
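For reference, a hedged command-line sketch with fitrgp follows; the Matern 5/2 kernel, constant basis function, and standardization shown here are illustrative assumptions.
% GPR model with an explicit kernel and basis function.
load carbig
cartable = rmmissing(table(Acceleration,Horsepower,Weight,MPG));
gprMdl = fitrgp(cartable,'MPG', ...
    'KernelFunction','matern52', ...
    'BasisFunction','constant', ...
    'Standardize',true);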
Kernel Approximation Models
In Regression Learner, you can use kernel approximation models to perform nonlinear regression of data with many observations. For large in-memory data, kernel approximation models tend to train and predict faster than SVM models with Gaussian kernels.
Gaussian kernel regression models map predictors in a low-dimensional space into a high-dimensional space, and then fit a linear model to the transformed predictors in the high-dimensional space. Choose between fitting an SVM linear model and fitting a least-squares linear model in the expanded space.
Tip In the Models gallery, click All Kernels to try each of the preset kernel approximation options and see which settings produce the best model with your data. Select the best model in the Models pane, and try to improve that model by using feature selection and changing some advanced options.
• SVM Kernel — Interpretability: Hard. Model flexibility: Medium — increases as the Kernel scale setting decreases
• Least Squares Kernel Regression — Interpretability: Hard. Model flexibility: Medium — increases as the Kernel scale setting decreases
For an example, see “Train Kernel Approximation Model Using Regression Learner App” on page 24-95.
Kernel Model Hyperparameter Options
Regression Learner uses the fitrkernel function to train kernel approximation regression models. You can set these options on the Summary tab for the selected model:
• Learner — Specify the linear regression model type to fit in the expanded space, either SVM or Least Squares Kernel. SVM models use an epsilon-insensitive loss during model fitting, whereas least-squares models use a mean squared error (MSE).
• Number of expansion dimensions — Specify the number of dimensions in the expanded space.
  • When you set this option to Auto, the software sets the number of dimensions to 2.^ceil(min(log2(p)+5,15)), where p is the number of predictors.
  • When you set this option to Manual, you can specify a value by clicking the arrows or entering a positive scalar value in the box.
• Regularization strength (Lambda) — Specify the ridge (L2) regularization penalty term. When you use an SVM learner, the box constraint C and the regularization term strength λ are related by C = 1/(λn), where n is the number of observations.
  • When you set this option to Auto, the software sets the regularization strength to 1/n, where n is the number of observations.
  • When you set this option to Manual, you can specify a value by clicking the arrows or entering a positive scalar value in the box.
• Kernel scale — Specify the kernel scaling. The software uses this value to obtain a random basis for the random feature expansion. For more details, see “Random Feature Expansion” on page 358828.
  • When you set this option to Auto, the software uses a heuristic procedure to select the scale value. The heuristic procedure uses subsampling. Therefore, to reproduce results, set a random number seed using rng before training the regression model.
  • When you set this option to Manual, you can specify a value by clicking the arrows or entering a positive scalar value in the box.
• Epsilon — Specify half the width of the epsilon-insensitive band. This option is available when Learner is SVM.
  • When you set this option to Auto, the software determines the value of Epsilon as iqr(Y)/13.49, which is an estimate of a tenth of the standard deviation using the interquartile range of the response variable Y. If iqr(Y) is equal to zero, then the software sets the value to 0.1.
  • When you set this option to Manual, you can specify a value by clicking the arrows or entering a positive scalar value in the box.
• Standardize data — Specify whether to standardize the numeric predictors. If predictors have widely different scales, standardizing can improve the fit.
• Iteration limit — Specify the maximum number of training iterations.
Alternatively, you can let the app choose some of these model options automatically by using hyperparameter optimization. See “Hyperparameter Optimization in Regression Learner App” on page 24-36.
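For reference, a hedged command-line sketch with fitrkernel follows; fitrkernel takes a numeric predictor matrix, and the learner, expansion dimension, lambda, and epsilon values below are illustrative assumptions.
% Kernel approximation regression (SVM learner in the expanded space).
load carbig
data = rmmissing([Acceleration Horsepower Weight MPG]);
X = data(:,1:3);
Y = data(:,4);
krnMdl = fitrkernel(X,Y,'Learner','svm', ...
    'NumExpansionDimensions',1024, ...   % dimensions of the expanded space
    'Lambda',1/numel(Y), ...             % regularization strength 1/n
    'KernelScale','auto','Epsilon',0.5);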
Ensembles of Trees

You can train ensembles of regression trees in Regression Learner. Ensemble models combine results from many weak learners into one high-quality ensemble model.
Tip In the Models gallery, click All Ensembles to try each of the nonoptimizable ensemble options and see which settings produce the best model with your data. Select the best model in the Models pane, and try to improve that model by using feature selection and changing some advanced options.

Regression Model Type | Interpretability | Ensemble Method | Model Flexibility
Boosted Trees | Hard | Least-squares boosting (LSBoost) with regression tree learners. | Medium to high
Bagged Trees | Hard | Bootstrap aggregating, or bagging, with regression tree learners. | High
For a workflow example, see “Train Regression Trees Using Regression Learner App” on page 24-71.

Ensemble Model Hyperparameter Options

Regression Learner uses the fitrensemble function to train ensemble models. You can set these options:
• Minimum leaf size — Specify the minimum number of training samples used to calculate the response of each leaf node. When you grow a regression tree, consider its simplicity and predictive power. To change the
minimum leaf size, click the buttons or enter a positive integer value in the Minimum leaf size box. • A fine tree with many small leaves is usually highly accurate on the training data. However, the tree might not show comparable accuracy on an independent test set. A very leafy tree tends to overfit, and its validation accuracy is often far lower than its training (or resubstitution) accuracy. • In contrast, a coarse tree with fewer large leaves does not attain high training accuracy. But a coarse tree can be more robust in that its training accuracy can be near that of a representative test set. Tip Decrease the Minimum leaf size to create a more flexible model. • Number of learners Try changing the number of learners to see if you can improve the model. Many learners can produce high accuracy, but can be time consuming to fit. Tip Increase the Number of learners to create a more flexible model. • Learning rate For boosted trees, specify the learning rate for shrinkage. If you set the learning rate to less than 1, the ensemble requires more learning iterations but often achieves better accuracy. 0.1 is a popular initial choice. • Number of predictors to sample Specify the number of predictors to select at random for each split in the tree learners. • When you set this option to Select All, the software uses all available predictors. • When you set this option to Set Limit, you can specify a value by clicking the buttons or entering a positive integer value in the box. Alternatively, you can let the app choose some of these model options automatically by using hyperparameter optimization. See “Hyperparameter Optimization in Regression Learner App” on page 24-36.
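For comparison at the command line, the sketch below trains boosted and bagged tree ensembles with fitrensemble, using options that correspond to Minimum leaf size, Number of learners, Learning rate, and Number of predictors to sample. The data preparation and the chosen values are assumptions for illustration, not recommended settings.

load carbig
tbl = rmmissing(table(Acceleration,Displacement,Horsepower,Weight,MPG));
tTree = templateTree(MinLeafSize=8);             % minimum leaf size for each tree learner
boostedMdl = fitrensemble(tbl,"MPG", ...
    Method="LSBoost", ...                        % least-squares boosting
    Learners=tTree, ...
    NumLearningCycles=30, ...                    % number of learners
    LearnRate=0.1);                              % learning rate for shrinkage
baggedMdl = fitrensemble(tbl,"MPG", ...
    Method="Bag", ...                            % bootstrap aggregation (bagging)
    Learners=templateTree(MinLeafSize=8,NumVariablesToSample=2), ...  % predictors sampled per split
    NumLearningCycles=30);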
Neural Networks

Neural network models typically have good predictive accuracy; however, they are not easy to interpret. Model flexibility increases with the size and number of fully connected layers in the neural network.
Tip In the Models gallery, click All Neural Networks to try each of the preset neural network options and see which settings produce the best model with your data. Select the best model in the Models pane, and try to improve that model by using feature selection and changing some advanced options.
Regression Model Type | Interpretability | Model Flexibility
Narrow Neural Network | Hard | Medium — increases with the First layer size setting
Medium Neural Network | Hard | Medium — increases with the First layer size setting
Wide Neural Network | Hard | Medium — increases with the First layer size setting
Bilayered Neural Network | Hard | High — increases with the First layer size and Second layer size settings
Trilayered Neural Network | Hard | High — increases with the First layer size, Second layer size, and Third layer size settings
Each model is a feedforward, fully connected neural network for regression. The first fully connected layer of the neural network has a connection from the network input (predictor data), and each subsequent layer has a connection from the previous layer. Each fully connected layer multiplies the input by a weight matrix and then adds a bias vector. An activation function follows each fully connected layer, excluding the last. The final fully connected layer produces the network's output, namely predicted response values. For more information, see “Neural Network Structure” on page 35-2913.
For an example, see “Train Regression Neural Networks Using Regression Learner App” on page 24-88.

Neural Network Model Hyperparameter Options

Regression Learner uses the fitrnet function to train neural network models. You can set these options:
• Number of fully connected layers — Specify the number of fully connected layers in the neural network, excluding the final fully connected layer for regression. You can choose a maximum of three fully connected layers.
• First layer size, Second layer size, and Third layer size — Specify the size of each fully connected layer, excluding the final fully connected layer. If you choose to create a neural network with multiple fully connected layers, consider specifying layers with decreasing sizes.
• Activation — Specify the activation function for all fully connected layers, excluding the final fully connected layer. Choose from the following activation functions: ReLU, Tanh, None, and Sigmoid.
• Iteration limit — Specify the maximum number of training iterations.
• Regularization strength (Lambda) — Specify the ridge (L2) regularization penalty term.
• Standardize data — Specify whether to standardize the numeric predictors. If predictors have widely different scales, standardizing can improve the fit. Standardizing the data is highly recommended.
Alternatively, you can let the app choose some of these model options automatically by using hyperparameter optimization. See “Hyperparameter Optimization in Regression Learner App” on page 24-36.
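A minimal command-line sketch with fitrnet follows, using option names that correspond to the app settings above (layer sizes, activation, regularization strength, standardization, and iteration limit). The data set and the specific values are assumptions for illustration.

load carbig
tbl = rmmissing(table(Acceleration,Displacement,Horsepower,Weight,MPG));
Mdl = fitrnet(tbl,"MPG", ...
    LayerSizes=[25 10], ...          % two fully connected layers with decreasing sizes
    Activations="relu", ...          % activation after each fully connected layer (except the last)
    Lambda=1e-4, ...                 % ridge (L2) regularization strength
    Standardize=true, ...            % standardize numeric predictors
    IterationLimit=1000);            % maximum training iterations
validationRMSE = sqrt(kfoldLoss(crossval(Mdl,KFold=5)))   % 5-fold cross-validated RMSE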
See Also Related Examples
• “Train Regression Models in Regression Learner App” on page 24-2
• “Select Data for Regression or Open Saved App Session” on page 24-8
• “Feature Selection and Feature Transformation Using Regression Learner App” on page 24-31
• “Visualize and Assess Model Performance in Regression Learner” on page 24-50
• “Export Regression Model to Predict New Data” on page 24-65
• “Train Regression Trees Using Regression Learner App” on page 24-71
Feature Selection and Feature Transformation Using Regression Learner App

In this section...
“Investigate Features in the Response Plot” on page 24-31
“Select Features to Include” on page 24-32
“Transform Features with PCA in Regression Learner” on page 24-34
Investigate Features in the Response Plot

In Regression Learner, use the response plot to try to identify predictors that are useful for predicting the response. To visualize the relation between different predictors and the response, under X-axis, select different variables in the X list.
Before you train a regression model, the response plot shows the training data. If you have trained a regression model, then the response plot also shows the model predictions.
Observe which variables are associated most clearly with the response. When you plot the carbig data set, the predictor Horsepower shows a clear negative association with the response.
Look for features that do not seem to have any association with the response and use Feature Selection to remove those features from the set of used predictors. See “Select Features to Include” on page 24-32.
You can export the response plots you create in the app to figures. See “Export Plots in Regression Learner App” on page 24-61.
Select Features to Include

In Regression Learner, you can specify different features (or predictors) to include in the model. See if you can improve models by removing features with low predictive power. If data collection is expensive or difficult, you might prefer a model that performs satisfactorily with fewer predictors.
You can determine which important predictors to include by using different feature ranking algorithms. After you select a feature ranking algorithm, the app displays a plot of the sorted feature importance scores, where larger scores (including Infs) indicate greater feature importance. The app also displays the ranked features and their scores in a table.
To use feature ranking algorithms in Regression Learner, click Feature Selection in the Options section of the Learn tab. The app opens a Default Feature Selection tab, where you can choose a feature ranking algorithm.
Feature Ranking Algorithm | Supported Data Type | Description
MRMR | Categorical and continuous features | Rank features sequentially using the “Minimum Redundancy Maximum Relevance (MRMR) Algorithm” on page 35-3234. For more information, see fsrmrmr.
F Test | Categorical and continuous features | Examine the importance of each predictor individually using an F-test, and then rank features using the p-values of the F-test statistics. Each F-test tests the hypothesis that the response values grouped by predictor variable values are drawn from populations with the same mean against the alternative hypothesis that the population means are not all the same. Scores correspond to –log(p). For more information, see fsrftest.
RReliefF | Either all categorical or all continuous features | Rank features using the “RReliefF” on page 35-7158 algorithm with 10 nearest neighbors. This algorithm works best for estimating feature importance for distance-based supervised models that use pairwise distances between observations to predict the response. For more information, see relieff.
Choose between selecting the highest ranked features and selecting individual features.
• Choose Select highest ranked features to avoid bias in validation metrics. For example, if you use a cross-validation scheme, then for each training fold, the app performs feature selection before training a model. Different folds can select different predictors as the highest ranked features.
• Choose Select individual features to include specific features in model training. If you use a cross-validation scheme, then the app uses the same features across all training folds.
When you are done selecting features, click Save and Apply. Your selections affect all draft models in the Models pane and will be applied to new draft models that you create using the gallery in the Models section of the Learn tab.
To select features for a single draft model, open and edit the model summary. Click the model in the Models pane, and then click the model Summary tab (if necessary). The Summary tab includes an editable Feature Selection section. After you train a model, the Feature Selection section of the model Summary tab lists the features used to train the full model (that is, the model trained using training and validation data). To learn more about how Regression Learner applies feature selection to your data, generate code for your trained regression model. For more information, see “Generate MATLAB Code to Train Model with New Data” on page 24-66. For an example using feature selection, see “Train Regression Trees Using Regression Learner App” on page 24-71.
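If you want to reproduce the feature ranking outside the app, a sketch along the following lines is possible. The table, the response name, and the use of rmmissing are assumptions made for illustration.

load carbig
tbl = rmmissing(table(Acceleration,Cylinders,Displacement,Horsepower, ...
    Model_Year,Weight,MPG));
[idxMRMR,scoresMRMR] = fsrmrmr(tbl,"MPG");       % MRMR ranking of the predictors
[idxF,scoresF] = fsrftest(tbl,"MPG");            % F-test ranking; scores are -log(p)
predictorNames = setdiff(tbl.Properties.VariableNames,"MPG","stable");
predictorNames(idxMRMR)                          % predictors from most to least important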
Transform Features with PCA in Regression Learner

Use principal component analysis (PCA) to reduce the dimensionality of the predictor space. Reducing the dimensionality can create regression models in Regression Learner that help prevent overfitting. PCA linearly transforms predictors to remove redundant dimensions, and generates a new set of variables called principal components.
1  On the Learn tab, in the Options section, select PCA.
2  In the Default PCA Options dialog box, select the Enable PCA check box, and then click Save and Apply. The app applies the changes to all existing draft models in the Models pane and to new draft models that you create using the gallery in the Models section of the Learn tab.
3  When you next train a model using the Train All button, the pca function transforms your selected features before training the model.
4  By default, PCA keeps only the components that explain 95% of the variance. In the Default PCA Options dialog box, you can change the percentage of variance to explain by selecting the Explained variance value. A higher value risks overfitting, while a lower value risks removing useful dimensions.
5  If you want to limit the number of PCA components manually, select Specify number of components in the Component reduction criterion list. Select the Number of numeric components value. The number of components cannot be larger than the number of numeric predictors. PCA is not applied to categorical predictors.
You can check PCA options for trained models in the PCA section of the Summary tab. Click a trained model in the Models pane, and then click the model Summary tab (if necessary). For example:
PCA is keeping enough components to explain 95% variance.
After training, 2 components were kept.
Explained variance per component (in order): 92.5%, 5.3%, 1.7%, 0.5%
Check the explained variance percentages to decide whether to change the number of components. To learn more about how Regression Learner applies PCA to your data, generate code for your trained regression model. For more information on PCA, see the pca function.
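The sketch below shows an equivalent command-line transformation with the pca function, keeping enough components to explain 95% of the variance (the app's default). The predictor matrix and the threshold are assumptions for illustration.

load carbig
X = rmmissing([Acceleration Displacement Horsepower Weight]);   % numeric predictors only
[coeff,score,~,~,explained] = pca(X);              % principal component analysis
numComponents = find(cumsum(explained) >= 95,1);   % smallest number explaining >= 95% variance
Xreduced = score(:,1:numComponents);               % transformed predictors to use for training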
See Also Related Examples
• “Train Regression Models in Regression Learner App” on page 24-2
• “Select Data for Regression or Open Saved App Session” on page 24-8
• “Choose Regression Model Options” on page 24-13
• “Visualize and Assess Model Performance in Regression Learner” on page 24-50
• “Export Plots in Regression Learner App” on page 24-61
• “Export Regression Model to Predict New Data” on page 24-65
• “Train Regression Trees Using Regression Learner App” on page 24-71
Hyperparameter Optimization in Regression Learner App

In this section...
“Select Hyperparameters to Optimize” on page 24-36
“Optimization Options” on page 24-44
“Minimum MSE Plot” on page 24-46
“Optimization Results” on page 24-48

After you choose a particular type of model to train, for example a decision tree or a support vector machine (SVM), you can tune your model by selecting different advanced options. For example, you can change the minimum leaf size of a decision tree or the box constraint of an SVM. Some of these options are internal parameters of the model, or hyperparameters, that can strongly affect its performance. Instead of manually selecting these options, you can use hyperparameter optimization within the Regression Learner app to automate the selection of hyperparameter values. For a given model type, the app tries different combinations of hyperparameter values by using an optimization scheme that seeks to minimize the model mean squared error (MSE), and returns a model with the optimized hyperparameters. You can use the resulting model as you would any other trained model.
Note Because hyperparameter optimization can lead to an overfitted model, the recommended approach is to create a separate test set before importing your data into the Regression Learner app. After you train your optimizable model, you can see how it performs on your test set. For an example, see “Train Regression Model Using Hyperparameter Optimization in Regression Learner App” on page 24-103.
To perform hyperparameter optimization in Regression Learner, follow these steps:
1  Choose a model type and decide which hyperparameters to optimize. See “Select Hyperparameters to Optimize” on page 24-36.
   Note Hyperparameter optimization is not supported for the models in “Linear Regression Models” on page 24-15.
2  (Optional) Specify how the optimization is performed. For more information, see “Optimization Options” on page 24-44.
3  Train your model. Use the “Minimum MSE Plot” on page 24-46 to track the optimization results.
4  Inspect your trained model. See “Optimization Results” on page 24-48.
Select Hyperparameters to Optimize

In the Regression Learner app, in the Models section of the Learn tab, click the arrow to open the gallery. The gallery includes optimizable models that you can train using hyperparameter optimization.
After you select an optimizable model, you can choose which of its hyperparameters you want to optimize. In the model Summary tab, in the Model Hyperparameters section, select Optimize check boxes for the hyperparameters that you want to optimize. Under Values, specify the fixed values for the hyperparameters that you do not want to optimize or that are not optimizable.
This table describes the hyperparameters that you can optimize for each type of model and the search range of each hyperparameter. It also includes the additional hyperparameters for which you can specify fixed values. For each model below, the optimizable hyperparameters, any additional hyperparameters, and the notes are listed in turn.

Optimizable Tree
Optimizable hyperparameters:
• Minimum leaf size — The software searches among integers log-scaled in the range [1,max(2,floor(n/2))], where n is the number of observations.
Additional hyperparameters:
• Surrogate decision splits
• Maximum surrogates per node
Notes: For more information, see “Regression Tree Model Hyperparameter Options” on page 24-18.
Optimizable SVM
Optimizable hyperparameters:
• Kernel function — The software searches among Gaussian, Linear, Quadratic, and Cubic.
• Box constraint — The software searches among positive values log-scaled in the range [0.001,1000].
• Kernel scale — The software searches among positive values log-scaled in the range [0.001,1000].
• Epsilon — The software searches among positive values log-scaled in the range [0.001,100]*iqr(Y)/1.349, where Y is the response variable.
• Standardize data — The software searches between Yes and No.
Notes:
• The Box constraint optimizable hyperparameter combines the Box constraint mode and Manual box constraint advanced options of the preset SVM models.
• The Kernel scale optimizable hyperparameter combines the Kernel scale mode and Manual kernel scale advanced options of the preset SVM models.
• You can optimize the Kernel scale optimizable hyperparameter only when the Kernel function value is Gaussian. Unless you specify a value for Kernel scale by clearing the Optimize check box, the app uses the Manual value of 1 by default when the Kernel function has a value other than Gaussian.
• The Epsilon optimizable hyperparameter combines the Epsilon mode and Manual epsilon advanced options of the preset SVM models.
For more information, see “SVM Model Hyperparameter Options” on page 24-20.

Optimizable Efficient Linear
Optimizable hyperparameters:
• Learner — The software searches between SVM and Least squares.
• Regularization — The software searches between Ridge and Lasso.
• Regularization strength (Lambda) — The software searches among positive values log-scaled in the range [0.00001/n,100000/n], where n is the number of observations.
Additional hyperparameters:
• Solver
• Relative coefficient tolerance (Beta tolerance)
• Epsilon
Notes: For more information, see “Efficiently Trained Linear Model Hyperparameter Options” on page 24-22.
Optimizable GPR
Optimizable hyperparameters:
• Basis function — The software searches among Zero, Constant, and Linear.
• Kernel function — The software searches among: Nonisotropic Rational Quadratic, Isotropic Rational Quadratic, Nonisotropic Squared Exponential, Isotropic Squared Exponential, Nonisotropic Matern 5/2, Isotropic Matern 5/2, Nonisotropic Matern 3/2, Isotropic Matern 3/2, Nonisotropic Exponential, and Isotropic Exponential.
• Kernel scale — The software searches among positive values log-scaled in the range [0.001,1000].
• Sigma — The software searches among positive values log-scaled in the range [0.0001,max(0.001,10*std(Y))], where Y is the response variable.
• Standardize data — The software searches between Yes and No.
Additional hyperparameters:
• Signal standard deviation
• Optimize numeric parameters
Notes:
• The Kernel function optimizable hyperparameter combines the Kernel function and Use isotropic kernel advanced options of the preset Gaussian process models.
• The Kernel scale optimizable hyperparameter combines the Kernel mode and Kernel scale advanced options of the preset Gaussian process models.
• The Sigma optimizable hyperparameter combines the Sigma mode and Sigma advanced options of the preset Gaussian process models.
• When you optimize the Kernel scale of isotropic kernel functions, only the kernel scale is optimized, not the signal standard deviation. You can either specify a Signal standard deviation value or use its default value. You cannot optimize the Kernel scale of nonisotropic kernel functions.
For more information, see “Gaussian Process Regression Model Hyperparameter Options” on page 24-24.
Optimizable Kernel
Optimizable hyperparameters:
• Learner — The software searches between SVM and Least Squares Kernel.
• Number of expansion dimensions — The software searches among positive integers log-scaled in the range [100,10000].
• Regularization strength (Lambda) — The software searches among positive values log-scaled in the range [0.001/n,1000/n], where n is the number of observations.
• Kernel scale — The software searches among positive values log-scaled in the range [0.001,1000].
• Epsilon — The software searches among positive values log-scaled in the range [0.001,100]*iqr(Y)/1.349, where Y is the response variable.
• Standardize data — The software searches between Yes and No.
Additional hyperparameters:
• Iteration limit
Notes: For more information, see “Kernel Model Hyperparameter Options” on page 24-26.
Optimizable Ensemble
Optimizable hyperparameters:
• Ensemble method — The software searches among Bag and LSBoost.
• Minimum leaf size — The software searches among integers log-scaled in the range [1,max(2,floor(n/2))], where n is the number of observations.
• Number of learners — The software searches among integers log-scaled in the range [10,500].
• Learning rate — The software searches among real values log-scaled in the range [0.001,1].
• Number of predictors to sample — The software searches among integers in the range [1,max(2,p)], where p is the number of predictor variables.
Notes: The Bag value of the Ensemble method optimizable hyperparameter specifies a Bagged Trees model. Similarly, the LSBoost Ensemble method value specifies a Boosted Trees model. For more information, see “Ensemble Model Hyperparameter Options” on page 24-27.
Optimizable Neural Network
Optimizable hyperparameters:
• Number of fully connected layers — The software searches among 1, 2, and 3 fully connected layers.
• First layer size — The software searches among integers log-scaled in the range [1,300].
• Second layer size — The software searches among integers log-scaled in the range [1,300].
• Third layer size — The software searches among integers log-scaled in the range [1,300].
• Activation — The software searches among ReLU, Tanh, None, and Sigmoid.
• Regularization strength (Lambda) — The software searches among real values log-scaled in the range [0.00001/n,100000/n], where n is the number of observations.
• Standardize data — The software searches between Yes and No.
Additional hyperparameters:
• Iteration limit
Notes: For more information, see “Neural Network Model Hyperparameter Options” on page 24-29.

Optimization Options
By default, the Regression Learner app performs hyperparameter tuning by using Bayesian optimization. The goal of Bayesian optimization, and optimization in general, is to find a point that minimizes an objective function. In the context of hyperparameter tuning in the app, a point is a set of hyperparameter values, and the objective function is the loss function, or the mean squared error (MSE). For more information on the basics of Bayesian optimization, see “Bayesian Optimization Workflow” on page 10-25.
You can specify how the hyperparameter tuning is performed. For example, you can change the optimization method to grid search or limit the training time. On the Learn tab, in the Options section, click Optimizer. The app opens a dialog box in which you can select optimization options. After making your selections, click Save and Apply. Your selections affect all draft optimizable models in the Models pane and will be applied to new optimizable models that you create using the gallery in the Models section of the Learn tab.
To specify optimization options for a single optimizable model, open and edit the model summary before training the model. Click the model in the Models pane. The model Summary tab includes an editable Optimizer section.
This table describes the available optimization options and their default values.

Optimizer — The optimizer values are:
• Bayesopt (default) – Use Bayesian optimization. Internally, the app calls the bayesopt function.
• Grid search – Use grid search with the number of values per dimension determined by the Number of grid divisions value. The app searches in a random order, using uniform sampling without replacement from the grid.
• Random search – Search at random among points, where the number of points corresponds to the Iterations value.
Acquisition function — When the app performs Bayesian optimization for hyperparameter tuning, it uses the acquisition function to determine the next set of hyperparameter values to try. The acquisition function values are:
• Expected improvement per second plus (default)
• Expected improvement
• Expected improvement plus
• Expected improvement per second
• Lower confidence bound
• Probability of improvement
For details on how these acquisition functions work in the context of Bayesian optimization, see “Acquisition Function Types” on page 10-3.
Iterations — Each iteration corresponds to a combination of hyperparameter values that the app tries. When you use Bayesian optimization or random search, specify a positive integer that sets the number of iterations. The default value is 30. When you use grid search, the app ignores the Iterations value and evaluates the loss at every point in the entire grid.
Training time limit — You can set a training time limit to stop the optimization process prematurely. To set a training time limit, select this option and set the Maximum training time in seconds option. By default, the app does not have a training time limit.
Maximum training time in seconds — Set the training time limit in seconds as a positive real number. The default value is 300. The run time can exceed the training time limit because this limit does not interrupt an iteration evaluation.
Number of grid divisions — When you use grid search, set a positive integer as the number of values the app tries for each numeric hyperparameter. The app ignores this value for categorical hyperparameters. The default value is 10.
Minimum MSE Plot
After specifying which model hyperparameters to optimize and setting any additional optimization options (optional), train your optimizable model. On the Learn tab, in the Train section, click Train All and select Train Selected. The app creates a Minimum MSE Plot that it updates as the optimization runs.
The minimum mean squared error (MSE) plot displays the following information:
• Estimated minimum MSE – Each light blue point corresponds to an estimate of the minimum MSE computed by the optimization process when considering all the sets of hyperparameter values tried so far, including the current iteration. The estimate is based on an upper confidence interval of the current MSE objective model, as mentioned in the Bestpoint hyperparameters description. If you use grid search or random search to perform hyperparameter optimization, the app does not display these light blue points.
• Observed minimum MSE – Each dark blue point corresponds to the observed minimum MSE computed so far by the optimization process. For example, at the third iteration, the blue point corresponds to the minimum of the MSE observed in the first, second, and third iterations.
• Bestpoint hyperparameters – The red square indicates the iteration that corresponds to the optimized hyperparameters. You can find the values of the optimized hyperparameters listed in the upper right of the plot under Optimization Results.
The optimized hyperparameters do not always provide the observed minimum MSE. When the app performs hyperparameter tuning by using Bayesian optimization (see “Optimization Options” on page 24-44 for a brief introduction), it chooses the set of hyperparameter values that minimizes an upper confidence interval of the MSE objective model, rather than the set that minimizes the MSE. For more information, see the "Criterion","min-visited-upper-confidence-interval" name-value argument of bestPoint.
• Minimum error hyperparameters – The yellow point indicates the iteration that corresponds to the hyperparameters that yield the observed minimum MSE. For more information, see the "Criterion","min-observed" name-value argument of bestPoint.
If you use grid search to perform hyperparameter optimization, the Bestpoint hyperparameters and the Minimum error hyperparameters are the same. Missing points in the plot correspond to NaN minimum MSE values.
Optimization Results

When the app finishes tuning model hyperparameters, it returns a model trained with the optimized hyperparameter values (Bestpoint hyperparameters). The model metrics, displayed plots, and exported model correspond to this trained model with fixed hyperparameter values.
To inspect the optimization results of a trained optimizable model, select the model in the Models pane and look at the model Summary tab.
The Summary tab includes these sections:
• Training Results – Shows the performance of the optimizable model. See “View Model Metrics in Summary Tab and Models Pane” on page 24-51.
• Model Hyperparameters – Displays the type of optimizable model and lists any fixed hyperparameter values
• Optimized Hyperparameters – Lists the values of the optimized hyperparameters
• Hyperparameter Search Range – Displays the search ranges for the optimized hyperparameters
• Optimizer – Shows the selected optimizer options
When you perform hyperparameter tuning using Bayesian optimization and you export a trained optimizable model to the workspace as a structure, the structure includes a BayesianOptimization object in the HyperParameterOptimizationResult field. The object contains the results of the optimization performed in the app.
When you generate MATLAB code from a trained optimizable model, the generated code uses the fixed and optimized hyperparameter values of the model to train on new data. The generated code does not include the optimization process. For information on how to perform Bayesian optimization when you use a fit function, see “Bayesian Optimization Using a Fit Function” on page 10-26.
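As a hedged illustration of the same workflow at the command line, the sketch below optimizes a regression tree with a fit function and then queries the resulting BayesianOptimization object with bestPoint. The data set, the choice of fitrtree, and the iteration count are assumptions for this example.

load carbig
tbl = rmmissing(table(Acceleration,Displacement,Horsepower,Weight,MPG));
rng("default")                           % Bayesian optimization uses random sampling
Mdl = fitrtree(tbl,"MPG", ...
    OptimizeHyperparameters="auto", ...
    HyperparameterOptimizationOptions=struct("MaxObjectiveEvaluations",30));
results = Mdl.HyperparameterOptimizationResults;            % BayesianOptimization object
bestPoint(results,Criterion="min-visited-upper-confidence-interval")  % Bestpoint hyperparameters
bestPoint(results,Criterion="min-observed")                 % Minimum error hyperparameters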
See Also Related Examples
• “Train Regression Model Using Hyperparameter Optimization in Regression Learner App” on page 24-103
• “Bayesian Optimization Workflow” on page 10-25
• “Train Regression Models in Regression Learner App” on page 24-2
• “Select Data for Regression or Open Saved App Session” on page 24-8
• “Choose Regression Model Options” on page 24-13
• “Visualize and Assess Model Performance in Regression Learner” on page 24-50
• “Export Regression Model to Predict New Data” on page 24-65
Visualize and Assess Model Performance in Regression Learner

In this section...
“Check Performance in Models Pane” on page 24-50
“View Model Metrics in Summary Tab and Models Pane” on page 24-51
“Compare Model Information and Results in Table View” on page 24-52
“Explore Data and Results in Response Plot” on page 24-54
“Plot Predicted vs. Actual Response” on page 24-56
“Evaluate Model Using Residuals Plot” on page 24-57
“Compare Model Plots by Changing Layout” on page 24-59
“Evaluate Test Set Model Performance” on page 24-59

After training regression models in Regression Learner, you can compare models based on model metrics, visualize results in a response plot or by plotting the actual versus predicted response, and evaluate models using the residual plot.
• If you use k-fold cross-validation, then the app computes the model metrics using the observations in the k validation folds and reports the average values. It makes predictions on the observations in the validation folds and the plots show these predictions. It also computes the residuals on the observations in the validation folds.
  Note When you import data into the app, if you accept the defaults, the app automatically uses cross-validation. To learn more, see “Choose Validation Scheme” on page 23-19.
• If you use holdout validation, the app computes the model metrics using the observations in the validation fold and makes predictions on these observations. The app uses these predictions in the plots and also computes the residuals based on the predictions.
• If you use resubstitution validation, the values are resubstitution model metrics based on all the training data, and the predictions are resubstitution predictions.
Check Performance in Models Pane

After training a model in Regression Learner, check the Models pane to see which model has the best overall score. The best RMSE (Validation) is highlighted in a box. This score is the root mean squared error (RMSE) on the validation set. The score estimates the performance of the trained model on new data. Use the score to help you choose the best model.
• For cross-validation, the score is the RMSE on all observations not set aside for testing, counting each observation when it was in a holdout (validation) fold.
• For holdout validation, the score is the RMSE on the held-out observations.
• For resubstitution validation, the score is the resubstitution RMSE on all the training data.
The best overall score might not be the best model for your goal. Sometimes a model with a slightly lower overall score is the better model for your goal. You want to avoid overfitting, and you might want to exclude some predictors where data collection is expensive or difficult.
View Model Metrics in Summary Tab and Models Pane

You can view model metrics in the model Summary tab and the Models pane, and use these metrics to assess and compare models. Alternatively, you can use the Results Table tab to compare models. For more information, see “Compare Model Information and Results in Table View” on page 24-52.
The Training Results metrics are calculated on the validation set. The Test Results metrics, if displayed, are calculated on an imported test set. For more information, see “Evaluate Test Set Model Performance” on page 24-59.
Model Metrics

Statistic | Description | Tip
RMSE | Root mean squared error. The RMSE is always positive and its units match the units of your response. | Look for smaller values of the RMSE.
R-Squared | Coefficient of determination. R-squared is always smaller than 1 and usually larger than 0. It compares the trained model with the model where the response is constant and equals the mean of the training response. If your model is worse than this constant model, then R-Squared is negative. | Look for an R-Squared close to 1.
MSE | Mean squared error. The MSE is the square of the RMSE. | Look for smaller values of the MSE.
MAE | Mean absolute error. The MAE is always positive and similar to the RMSE, but less sensitive to outliers. | Look for smaller values of the MAE.
Prediction speed | Estimated prediction speed for new data, based on the prediction times for the validation data sets. | Background processes inside and outside the app can affect this estimate, so train models under similar conditions for better comparisons.
Training time | Time spent training the model. | Background processes inside and outside the app can affect this estimate, so train models under similar conditions for better comparisons.
Model size (Compact) | Size of the model if exported as a compact model (that is, without training data). | Look for model size values that fit the memory requirements of target hardware applications.
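For reference, the error statistics in this table can be computed directly from a vector of true responses and a vector of predictions; yTrue and yPred in the sketch below are assumed variables, not app outputs.

% yTrue and yPred are assumed column vectors of true and predicted responses.
res  = yTrue - yPred;                                   % residuals
MSE  = mean(res.^2);                                    % mean squared error
RMSE = sqrt(MSE);                                       % root mean squared error
MAE  = mean(abs(res));                                  % mean absolute error
R2   = 1 - sum(res.^2)/sum((yTrue - mean(yTrue)).^2);   % coefficient of determination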
You can sort models in the Models pane based on different model metrics. To select a metric for sorting, use the Sort by list at the top of the Models pane. Not all metrics are available for model sorting in the Models pane. You can sort models by other metrics in the Results Table (see “Compare Model Information and Results in Table View” on page 24-52). You can also delete unwanted models listed in the Models pane. Select the model you want to delete and click the Delete selected model button in the upper right of the pane or right-click the model and select Delete. You cannot delete the last remaining model in the Models pane.
Compare Model Information and Results in Table View

Rather than using the Summary tab or the Models pane to compare model metrics, you can use a table of results. On the Learn tab, in the Plots and Results section, click Results Table. In the
Results Table tab, you can sort models by their training and test results, as well as by their options (such as model type, selected features, PCA, and so on). For example, to sort models by root mean squared error, click the sorting arrows in the RMSE (Validation) column header. An up arrow indicates that models are sorted from lowest RMSE to highest RMSE. To view more table column options, click the "Select columns to display" button at the top right of the table. In the Select Columns to Display dialog box, check the boxes for the columns you want to display in the results table. Newly selected columns are appended to the table on the right.
Within the results table, you can manually drag and drop the table columns so that they appear in your preferred order. You can mark some models as favorites by using the Favorite column. The app keeps the selection of favorite models consistent between the results table and the Models pane. Unlike other columns, the Favorite and Model Number columns cannot be removed from the table. To remove a row from the table, right-click any entry within the row and click Hide row (or Hide selected row(s) if the row is highlighted). To remove consecutive rows, click any entry within the first row you want to remove, press Shift, and click any entry within the last row you want to remove. Then, right-click one of the highlighted entries and click Hide selected row(s). To restore all removed rows, right-click any entry in the table and click Show all rows. The restored rows are appended to the bottom of the table.
To export the information in the table, use one of the export buttons at the top right of the table. Choose between exporting the table to the workspace or to a file. The exported table includes only the displayed rows and columns.
Explore Data and Results in Response Plot

View the regression model results by using the response plot, which displays the predicted response versus the record number. After you train a regression model, the app automatically opens the response plot for that model. If you train an "All" model, the app opens the response plot for the first model only. To view the response plot for another model, select the model in the Models pane. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Response in the Validation Results group.
If you are using holdout or cross-validation, then the predicted response values are the predictions on the held-out (validation) observations. In other words, the software obtains each prediction by using a model that was trained without the corresponding observation.
To investigate your results, use the controls on the right. You can:
• Plot predicted and/or true responses. Use the check boxes under Plot to make your selection.
• Show prediction errors, drawn as vertical lines between the predicted and true responses, by selecting the Errors check box.
• Choose the variable to plot on the x-axis under X-axis. You can choose the record number or one of your predictor variables.
• Plot the response as markers, or as a box plot under Style. You can select Box plot only when the variable on the x-axis has few unique values. A box plot displays the typical values of the response and any possible outliers. The central mark indicates the median, and the bottom and top edges of the box are the 25th and 75th percentiles, respectively. Vertical lines, called whiskers, extend from the boxes to the most extreme data points that are not considered outliers. The outliers are plotted individually using the "o" symbol. For more information about box plots, see boxchart.
To export the response plots you create in the app to figures, see “Export Plots in Regression Learner App” on page 24-61.
Plot Predicted vs. Actual Response

Use the Predicted vs. Actual plot to check model performance. Use this plot to understand how well the regression model makes predictions for different response values.
To view the Predicted vs. Actual plot after training a model, click the arrow in the Plots and Results section to open the gallery, and then click Predicted vs. Actual (Validation) in the Validation Results group.
When you open the plot, the predicted response of your model is plotted against the actual, true response. A perfect regression model has a predicted response equal to the true response, so all the points lie on a diagonal line. The vertical distance from the line to any point is the error of the prediction for that point. A good model has small errors, which means the predictions are scattered near the line.
Usually a good model has points scattered roughly symmetrically around the diagonal line. If you can see any clear patterns in the plot, it is likely that you can improve your model. Try training a different model type or making your current model type more flexible by duplicating the model and using the Model Hyperparameters options in the model Summary tab. If you are unable to improve your model, it is possible that you need more data, or that you are missing an important predictor. To export the Predicted vs. Actual plots you create in the app to figures, see “Export Plots in Regression Learner App” on page 24-61.
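If you want to recreate a similar plot outside the app, a minimal sketch follows; yTrue and yPred are assumed vectors of validation responses and predictions.

figure
scatter(yTrue,yPred,".")                   % one point per observation
hold on
lims = [min(yTrue) max(yTrue)];
plot(lims,lims,"--")                       % perfect-prediction diagonal
hold off
xlabel("True response")
ylabel("Predicted response")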
Evaluate Model Using Residuals Plot

Use the residuals plot to check model performance. To view the residuals plot after training a model, click the arrow in the Plots and Results section to open the gallery, and then click Residuals (Validation) in the Validation Results group.
The residuals plot displays the difference between the predicted and true responses. Choose the variable to plot on the x-axis under X-axis. Choose the true response, predicted response, record number, or one of the predictors.
Usually a good model has residuals scattered roughly symmetrically around 0. If you can see any clear patterns in the residuals, it is likely that you can improve your model. Look for these patterns:
• Residuals are not symmetrically distributed around 0.
• Residuals change significantly in size from left to right in the plot.
• Outliers occur, that is, residuals that are much larger than the rest of the residuals.
• A clear, nonlinear pattern appears in the residuals.
Try training a different model type, or making your current model type more flexible by duplicating the model and using the Model Hyperparameters options in the model Summary tab. If you are unable to improve your model, it is possible that you need more data, or that you are missing an important predictor.
To export the residuals plots you create in the app to figures, see “Export Plots in Regression Learner App” on page 24-61.
Compare Model Plots by Changing Layout

Visualize the results of models trained in Regression Learner by using the plot options in the Plots and Results section of the Learn tab. You can rearrange the layout of the plots to compare results across multiple models: use the options in the Layout button, drag and drop plots, or select the options provided by the Document Actions button located to the right of the model plot tabs.
For example, after training two models in Regression Learner, display a plot for each model and change the plot layout to compare the plots by using one of these procedures:
• In the Plots and Results section, click Layout and select Compare models.
• Click the second model tab name, and then drag and drop the second model tab to the right.
• Click the Document Actions button located to the far right of the model plot tabs. Select the Tile All option and specify a 1-by-2 layout.
Note that you can click the Hide plot options button at the top right of the plots to make more room for the plots.
Evaluate Test Set Model Performance

After training a model in Regression Learner, you can evaluate the model performance on a test set in the app. This process allows you to check whether the validation metrics provide good estimates for the model performance on new data.
1  Import a test data set into Regression Learner. Alternatively, reserve some data for testing when importing data into the app (see “(Optional) Reserve Data for Testing” on page 24-11).
   • If the test data set is in the MATLAB workspace, then in the Data section on the Test tab, click Test Data and select From Workspace.
24
Regression Learner
• If the test data set is in a file, then in the Data section, click Test Data and select From File. Select a file type in the list, such as a spreadsheet, text file, or comma-separated values (.csv) file, or select All Files to browse for other file types such as .dat. In the Import Test Data dialog box, select the test data set from the Test Data Set Variable list. The test set must have the same variables as the predictors imported for training and validation. 2
Compute the test set metrics. • To compute test metrics for a single model, select the trained model in the Models pane. On the Test tab, in the Test section, click Test Selected. • To compute test metrics for all trained models, click Test All in the Test section. The app computes the test set performance of each model trained on the full data set, including training and validation data (but excluding test data).
3
Compare the validation metrics with the test metrics. In the model Summary tab, the app displays the validation metrics and test metrics in the Training Results section and Test Results section, respectively. You can check if the validation metrics give good estimates for the test metrics. You can also visualize the test results using plots. • Display a predicted vs. actual plot. In the Plots and Results section on the Test tab, click Predicted vs. Actual (Test). • Display a residuals plot. In the Plots and Results section, click Residuals (Test).
For an example, see “Check Model Performance Using Test Set in Regression Learner App” on page 24-109. For an example that uses test set metrics in a hyperparameter optimization workflow, see “Train Regression Model Using Hyperparameter Optimization in Regression Learner App” on page 24-103.
See Also Related Examples
• “Train Regression Models in Regression Learner App” on page 24-2
• “Select Data for Regression or Open Saved App Session” on page 24-8
• “Choose Regression Model Options” on page 24-13
• “Feature Selection and Feature Transformation Using Regression Learner App” on page 24-31
• “Export Plots in Regression Learner App” on page 24-61
• “Export Regression Model to Predict New Data” on page 24-65
• “Train Regression Trees Using Regression Learner App” on page 24-71
Export Plots in Regression Learner App

After you create plots interactively in the Regression Learner app, you can export your app plots to MATLAB figures. You can then copy, save, or customize the new figures. Choose among the available plots: response plot on page 24-31, Predicted vs. Actual plot on page 24-56, residuals plot on page 24-57, minimum MSE plot on page 24-46, LIME explanations plot on page 24-114, Shapley explanations plot on page 24-118, and partial dependence plot on page 24-122.
• Before exporting a plot, make sure the plot in the app displays the same data that you want in the new figure.
• On the Learn, Test, or Explain tab, in the Export section, click Export Plot to Figure. The app creates a figure from the selected plot.
  • The new figure might not have the same interactivity options as the plot in the Regression Learner app.
  • Additionally, the figure might have a different axes toolbar than the one in the app plot. For plots in Regression Learner, an axes toolbar appears above the top right of the plot. The buttons available on the toolbar depend on the contents of the plot. The toolbar can include buttons to export the plot as an image, add data tips, pan or zoom the data, and restore the view.
• Copy, save, or customize the new figure, which is displayed in the figure window.
  • To copy the figure, select Edit > Copy Figure. For more information, see “Copy Figure to Clipboard from Edit Menu”.
  • To save the figure, select File > Save As. Alternatively, you can follow the workflow described in “Customize Figure Before Saving”.
  • To customize the figure, click the Edit Plot button on the figure toolbar. Right-click the section of the plot that you want to edit. You can change the listed properties, which might include Color, Font, Line Style, and other properties. Or, you can use the Property Inspector to change the figure properties.
As an example, export a response plot in the app to a figure, customize the figure, and save the modified figure.
1  In the MATLAB Command Window, load the carbig data set.
   load carbig
   cartable = table(Acceleration,Cylinders,Displacement, ...
       Horsepower,Model_Year,Weight,Origin,MPG);
2  Click the Apps tab.
3  In the Apps section, click the arrow to open the gallery. Under Machine Learning and Deep Learning, click Regression Learner.
4  On the Learn tab, in the File section, click New Session > From Workspace.
5  In the New Session from Workspace dialog box, select the table cartable from the Data Set Variable list.
6  Click Start Session. Regression Learner creates a response plot of the data by default.
7  Change the x-axis data in the response plot to Weight.
8  On the Learn tab, in the Export section, click Export Plot to Figure.
9  In the new figure, click the Edit Plot button on the figure toolbar. Right-click the points in the plot. In the context menu, select Marker > square.
10 To save the figure, select File > Save As. Specify the saved file location, name, and type.
See Also Related Examples
• “Feature Selection and Feature Transformation Using Regression Learner App” on page 24-31
• “Visualize and Assess Model Performance in Regression Learner” on page 24-50
• “Export Regression Model to Predict New Data” on page 24-65
Export Regression Model to Predict New Data

In this section...
“Export Model to Workspace” on page 24-65
“Make Predictions for New Data Using Exported Model” on page 24-65
“Generate MATLAB Code to Train Model with New Data” on page 24-66
“Generate C Code for Prediction” on page 24-67
“Deploy Predictions Using MATLAB Compiler” on page 24-69
“Export Model for Deployment to MATLAB Production Server” on page 24-69
Export Model to Workspace

After you create regression models interactively in the Regression Learner app, you can export your best model to the workspace. Then you can use that trained model to make predictions using new data.
Note The final model Regression Learner exports is always trained using the full data set, excluding any data reserved for testing. The validation scheme that you use only affects the way that the app computes validation metrics. You can use the validation metrics and various plots that visualize results to pick the best model for your regression problem.
To export a model to the MATLAB workspace:
1  In the app, select the model you want to export in the Models pane. You can typically export a full or compact version of the trained model to the workspace as a structure containing a regression object, such as RegressionTree.
2  On the Learn tab, click Export, click Export Model and select Export Model. To exclude the training data and export a compact model, clear the check box in the Export Regression Model dialog box. You can still use the compact model for making predictions on new data. Note that the check box is disabled if the model does not have training data or if the training data cannot be excluded from the model. Some models, such as kernel approximation and efficiently trained linear models, never store training data.
3  In the Export Regression Model dialog box, edit the name of the exported variable, if necessary, and then click OK. The default name of the exported model, trainedModel, increments every time you export (for example, trainedModel1) to avoid overwriting previously exported models.
The new variable (for example, trainedModel) appears in the workspace. The app displays information about the exported model in the Command Window. Read the message to learn how to make predictions with new data.
Make Predictions for New Data Using Exported Model

After you export a model to the workspace from Regression Learner, or run the code generated from the app, you get a trainedModel structure that you can use to make predictions using new data.
The structure contains a model object and a function for prediction. The structure enables you to make predictions for models that include principal component analysis (PCA).
1  Use the exported model to make predictions for new data, T:
   yfit = trainedModel.predictFcn(T)
   where trainedModel is the name of your exported variable.
   Supply the data T with the same format and data type as the training data used in the app (table or matrix).
   • If you supply a table, then ensure that it contains the same predictor names as your training data. The predictFcn ignores additional variables in tables. Variable formats and types must match the original training data.
   • If you supply a matrix, it must contain the same predictor columns or rows as your training data, in the same order and format. Do not include a response variable, any variables that you did not import in the app, or other unused variables.
   The output yfit contains a prediction for each data point.
2  Examine the fields of the exported structure. For help making predictions, enter:
   trainedModel.HowToPredict
You also can extract the model object from the exported structure for further analysis. If you use feature transformation such as PCA in the app, you must take into account this transformation by using the information in the PCA fields of the structure.
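For example, a sketch of this workflow with the carbig data might look like the following. The table construction mirrors the training table used earlier in this chapter, and trainedModel is assumed to be a model you exported from the app after training on those same predictors.

load carbig
Tnew = table(Acceleration,Cylinders,Displacement,Horsepower, ...
    Model_Year,Weight,Origin);                  % same predictor names as the assumed training data
yfit = trainedModel.predictFcn(Tnew);           % predicted responses for the new data
trainedModel.HowToPredict                       % display the usage help stored in the structure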
Generate MATLAB Code to Train Model with New Data After you create regression models interactively in the Regression Learner app, you can generate MATLAB code for your best model. Then you can use the code to train the model with new data. Generate MATLAB code to: • Train on huge data sets. Explore models in the app trained on a subset of your data, and then generate code to train a selected model on a larger data set. • Create scripts for training models without needing to learn syntax of the different functions. • Examine the code to learn how to train models programmatically. • Modify the code for further analysis, for example to set options that you cannot change in the app. • Repeat your analysis on different data and automate training. To generate code and use it to train a model with new data: 1
In the app, from the Models pane, select the model you want to generate code for.
2
On the Learn tab, in the Export section, click Generate Function. The app generates code from your session and displays the file in the MATLAB Editor. The file includes the predictors and response, the model training methods, and the validation methods. Save the file.
3
To retrain your model, call the function from the command line with your original data or new data as the input argument or arguments. New data must have the same shape as the original data.
Copy the first line of the generated code, excluding the word function, and edit the trainingData input argument to reflect the variable name of your training data or new data. Similarly, edit the responseData input argument (if applicable). For example, to retrain a regression model trained with the cartable data set, enter:
[trainedModel,validationRMSE] = trainRegressionModel(cartable)
The generated code returns a trainedModel structure that contains the same fields as the structure you create when you export a model from Regression Learner to the workspace. If you want to automate training the same model with new data, or learn how to programmatically train models, examine the generated code. The code shows you how to:
• Process the data into the right shape.
• Train a model and specify all the model options.
• Perform cross-validation.
• Compute statistics.
• Compute validation predictions.
Note If you generate MATLAB code from a trained optimizable model, the generated code does not include the optimization process.
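For instance, here is a minimal sketch of retraining with the generated function and then predicting with the retrained model. It assumes the generated function is named trainRegressionModel and that newTable is a table with the same variables as the original training data (both names are assumptions for illustration):
[retrainedModel,validationRMSE] = trainRegressionModel(newTable); % retrain on new data
yfitNew = retrainedModel.predictFcn(newTable);                    % predict with the retrained model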
Generate C Code for Prediction
If you train one of the models in this table using Regression Learner, you can generate C code for prediction.
Model Type                               Underlying Model Object
Linear Regression                        LinearModel or CompactLinearModel
Regression Tree                          RegressionTree or CompactRegressionTree
Support Vector Machine                   RegressionSVM or CompactRegressionSVM
Efficiently Trained Linear Regression    RegressionLinear
Gaussian Process Regression              RegressionGP or CompactRegressionGP
Kernel Approximation Regression          RegressionKernel
Ensemble                                 RegressionEnsemble, CompactRegressionEnsemble, or RegressionBaggedEnsemble
Neural Network                           RegressionNeuralNetwork or CompactRegressionNeuralNetwork
C code generation requires:
• MATLAB Coder license
• Appropriate model
1
For example, train a tree model in Regression Learner, and then export the model to the workspace. Find the underlying regression model object in the exported structure. Examine the fields of the structure to find the model object, for example, S.RegressionTree, where S is the name of your structure. The underlying model object depends on whether you exported a compact model (i.e., you excluded the training data). The model object can be a RegressionTree or CompactRegressionTree object.
2
Use the function saveLearnerForCoder to prepare the model for code generation: saveLearnerForCoder(Mdl,filename). For example: saveLearnerForCoder(S.RegressionTree,'myTree')
3
Create a function that loads the saved model and makes predictions on new data. For example:
function yfit = predictY(X) %#codegen
%PREDICTY Predict responses using tree model
% PREDICTY uses the measurements in X
% and the tree model in the file myTree.mat, and then
% returns predicted responses in yfit.
CompactMdl = loadLearnerForCoder('myTree');
yfit = predict(CompactMdl,X);
end
4
Generate a MEX function from your function. For example: codegen predictY.m -args {data}
The %#codegen compilation directive indicates that the MATLAB code is intended for code generation. To ensure that the MEX function can use the same input, specify the data in the workspace as arguments to the function using the -args option. Specify data as a matrix containing only the predictor columns used to train the model. 5
Use the MEX function to make predictions. For example: yfit = predictY_mex(data);
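As an optional sanity check, you can compare the MEX output with the prediction from the original MATLAB function. This is a minimal sketch, assuming data is the same predictor matrix you passed to the -args option:
yfitMatlab = predictY(data);
yfitMex = predictY_mex(data);
% The two results should agree to within numerical precision.
assert(max(abs(yfitMatlab - yfitMex)) < 1e-10)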
If you used feature selection or PCA feature transformation in the app, then you need to take additional steps. If you used manual feature selection, supply the same columns in X. The X argument is the input to your function. If you used PCA in the app, use the information in the PCA fields of the exported structure to take account of this transformation. It does not matter whether you imported a table or a matrix into the app, as long as X contains the matrix columns in the same order. Before generating code, follow these steps: 1
Save the PCACenters and PCACoefficients fields of the trained regression structure, S, to file using the following command: save('pcaInfo.mat','-struct','S','PCACenters','PCACoefficients');
2
In your function file, include additional lines to perform the PCA transformation. Create a function that loads the saved model, performs PCA, and makes predictions on new data. For example:
function yfit = predictY(X) %#codegen
%PREDICTY Predict responses using tree model
% PREDICTY uses the measurements in X
% and the tree model in the file myTree.mat,
% and then returns predicted responses in yfit.
% If you used manual feature selection in the app, ensure that X
% contains only the columns you included in the model.
CompactMdl = loadLearnerForCoder('myTree');
pcaInfo = coder.load('pcaInfo.mat','PCACenters','PCACoefficients');
PCACenters = pcaInfo.PCACenters;
PCACoefficients = pcaInfo.PCACoefficients;
% Performs PCA transformation
pcaTransformedX = bsxfun(@minus,X,PCACenters)*PCACoefficients;
yfit = predict(CompactMdl,pcaTransformedX);
end
For more information on the C code generation workflow and limitations, see “Code Generation”. For examples, see saveLearnerForCoder and loadLearnerForCoder.
Deploy Predictions Using MATLAB Compiler
After you export a model to the workspace from Regression Learner, you can deploy it using MATLAB Compiler. Suppose you export the trained model to the MATLAB workspace based on the instructions in “Export Model to Workspace” on page 24-65, with the name trainedModel. To deploy predictions, follow these steps.
• Save the trainedModel structure in a .mat file.
save mymodel trainedModel
• Write the code to be compiled. This code must load the trained model and use it to make a prediction. It must also have a pragma, so the compiler recognizes that Statistics and Machine Learning Toolbox code is needed in the compiled application. This pragma can be any model training function used in Regression Learner (for example, fitrtree).
function ypred = mypredict(tbl)
%#function fitrtree
load('mymodel.mat');
ypred = trainedModel.predictFcn(tbl);
end
• Compile as a standalone application. mcc -m mypredict.m
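You can also call the function directly in MATLAB to confirm that it runs; a minimal sketch, where tbl is assumed to be a table containing the same predictors used to train the model:
ypred = mypredict(tbl); % tbl is a hypothetical table with the training predictors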
Export Model for Deployment to MATLAB Production Server
After you train a model in Regression Learner, you can export the model for deployment to MATLAB Production Server (requires MATLAB Compiler SDK).
• Select the trained model in the Models pane. On the Learn tab, click Export, click Export Model and select Export Model for Deployment.
• In the Select Project File for Model Deployment dialog box, select a location and name for your project file.
• In the autogenerated predictFunction.m file, inspect and amend the code as needed.
• Use the Production Server Compiler app to package your model and prediction function. You can simulate the model deployment to MATLAB Production Server by clicking the Test Client button in the Test section of the Compiler tab, and then package your code by clicking the Package button in the Package section.
For an example, see “Deploy Model Trained in Regression Learner to MATLAB Production Server” on page 24-137. For more information, see “Create Deployable Archive for MATLAB Production Server” (MATLAB Production Server).
See Also
Functions
fitrtree | fitlm | stepwiselm | fitrsvm | fitrlinear | fitrgp | fitrkernel | fitrensemble | fitrnet
Classes
RegressionTree | CompactRegressionTree | LinearModel | CompactLinearModel | RegressionSVM | CompactRegressionSVM | RegressionLinear | RegressionGP | CompactRegressionGP | RegressionKernel | RegressionEnsemble | CompactRegressionEnsemble | RegressionNeuralNetwork | CompactRegressionNeuralNetwork
Related Examples
• “Train Regression Models in Regression Learner App” on page 24-2
• “Select Data for Regression or Open Saved App Session” on page 24-8
• “Choose Regression Model Options” on page 24-13
• “Feature Selection and Feature Transformation Using Regression Learner App” on page 24-31
• “Visualize and Assess Model Performance in Regression Learner” on page 24-50
• “Train Regression Trees Using Regression Learner App” on page 24-71
Train Regression Trees Using Regression Learner App
This example shows how to create and compare various regression trees using the Regression Learner app, and export trained models to the workspace to make predictions for new data.
You can train regression trees to predict responses to given input data. To predict the response of a regression tree, follow the tree from the root (beginning) node down to a leaf node. At each node, decide which branch to follow using the rule associated with that node. Continue until you arrive at a leaf node. The predicted response is the value associated with that leaf node.
Statistics and Machine Learning Toolbox trees are binary. Each step in a prediction involves checking the value of one predictor variable. For example, here is a simple regression tree:
This tree predicts the response based on two predictors, x1 and x2. To predict, start at the top node. At each node, check the values of the predictors to decide which branch to follow. When the branches reach a leaf node, the response is set to the value corresponding to that node.
This example uses the carbig data set. This data set contains characteristics of different car models produced from 1970 through 1982, including:
• Acceleration
• Number of cylinders
• Engine displacement
• Engine power (Horsepower)
• Model year
• Weight
• Country of origin
• Miles per gallon (MPG)
Train regression trees to predict the fuel economy in miles per gallon of a car model, given the other variables as inputs. 1
In MATLAB, load the carbig data set and create a table containing the different variables:
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
2
On the Apps tab, in the Machine Learning and Deep Learning group, click Regression Learner.
3
On the Learn tab, in the File section, select New Session > From Workspace.
4
Under Data Set Variable in the New Session from Workspace dialog box, select cartable from the list of tables and matrices in your workspace. Observe that the app has preselected response and predictor variables. MPG is chosen as the response, and all the other variables as predictors. For this example, do not change the selections.
5
To accept the default validation scheme and continue, click Start Session. The default validation option is cross-validation, to protect against overfitting. Regression Learner creates a plot of the response with the record number on the x-axis.
6
Use the response plot to investigate which variables are useful for predicting the response. To visualize the relation between different predictors and the response, select different variables in the X list under X-axis to the right of the plot. Observe which variables are correlated most clearly with the response. Displacement, Horsepower, and Weight all have a clearly visible impact on the response and all show a negative association with the response.
7
Select the variable Origin under X-axis. A box plot is automatically displayed. A box plot shows the typical values of the response and any possible outliers. The box plot is useful when plotting markers results in many points overlapping. To show a box plot when the variable on the x-axis has few unique values, under Style, select Box plot.
8
Train a selection of regression trees. The Models pane already contains a fine tree model. Add medium and coarse tree models to the list of draft models. On the Learn tab, in the Models section, click the arrow to open the gallery. In the Regression Trees group, click Medium Tree. The app creates a draft medium tree in the Models pane. Reopen the model gallery and click Coarse Tree in the Regression Trees group. The app creates a draft coarse tree in the Models pane. In the Train section, click Train All and select Train All. The app trains the three tree models and plots both the true training response and the predicted response for each model. Note • If you have Parallel Computing Toolbox, then the Use Parallel button is selected by default. After you click Train All and select Train All or Train Selected, the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can continue to interact with the app while models train in parallel. • If you do not have Parallel Computing Toolbox, then the Use Background Training check box in the Train All menu is selected by default. After you select an option to train models,
the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background.
Note Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example. 9
In the Models pane, check the RMSE (Validation) (validation root mean squared error) of the models. The best score is highlighted in a box. The Fine Tree and the Medium Tree have similar RMSEs, while the Coarse Tree is less accurate.
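The RMSE (Validation) value that the app reports is the square root of the cross-validated mean squared error. For reference, here is a minimal programmatic sketch of the same quantity for a single tree, assuming the cartable data from this example and default tree settings (the app's preset trees use different minimum leaf sizes, so the exact value differs):
cvTree = fitrtree(cartable,"MPG","CrossVal","on","KFold",5); % 5-fold cross-validated regression tree
rmseValidation = sqrt(kfoldLoss(cvTree)); % kfoldLoss returns the cross-validated MSE for regression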
10 Choose a model in the Models pane to view the results of that model. For example, select the
Medium Tree model (model 2). In the Response Plot tab, under X-axis, select Horsepower and examine the response plot. Both the true and predicted responses are now plotted. Show the prediction errors, drawn as vertical lines between the predicted and true responses, by selecting the Errors check box. 11 See more details on the currently selected model in the model's Summary tab. To open this tab,
click the Open selected model summary button in the upper right of the Models pane. Check and compare additional model characteristics, such as R-squared (coefficient of determination), MAE (mean absolute error), and prediction speed. To learn more, see “View Model Metrics in Summary Tab and Models Pane” on page 24-51. In the Summary tab, you also can find details on the currently selected model type, such as the hyperparameters used for training the model.
12 Plot the predicted response versus true response. On the Learn tab, in the Plots and Results
section, click the arrow to open the gallery, and then click Predicted vs. Actual (Validation) in the Validation Results group. Use this plot to understand how well the regression model makes predictions for different response values.
A perfect regression model has predicted response equal to true response, so all the points lie on a diagonal line. The vertical distance from the line to any point is the error of the prediction for that point. A good model has small errors, so the predictions are scattered near the line. Usually a good model has points scattered roughly symmetrically around the diagonal line. If you can see any clear patterns in the plot, it is likely that you can improve your model. 13 Select the other models in the Models pane, open the predicted versus actual plot for each of the
models, and then compare the results. Rearrange the layout of the plots to better compare the plots. Click the Document Actions button located to the far right of the model plot tabs. Select the Tile All option and specify a 1-by-3 layout. Click the Hide plot options button at the top right of the plots to make more room for the plots.
To return to the original layout, you can click the Layout button in the Plots and Results section and select Single model (Default). 14 In the Models gallery, select All Trees in the Regression Trees group. To try to improve the
tree models, include different features in the models. See if you can improve the model by removing features with low predictive power. On the Learn tab, in the Options section, click Feature Selection. In the Default Feature Selection tab, you can select different feature ranking algorithms to determine the most important features. After you select a feature ranking algorithm, the app displays a plot of the sorted feature importance scores, where larger scores (including Infs) indicate greater feature importance. The table shows the ranked features and their scores. In this example, both the MRMR and F Test feature ranking algorithms rank the acceleration and country of origin predictors the lowest. The app disables the RReliefF option because the predictors include a mix of numeric and categorical variables. 24-77
Under Feature Ranking Algorithm, click F Test. Under Feature Selection, use the default option of selecting the highest ranked features to avoid bias in the validation metrics. Specify to keep 4 of the 7 features for model training.
Click Save and Apply. The app applies the feature selection changes to the current draft model and any new models created using the Models gallery. 15 Train the tree models using the reduced set of features. On the Learn tab, in the Train section,
click Train All and select Train All or Train Selected. 16 Observe the new models in the Models pane. These models are the same regression trees as
before, but trained using only 4 of 7 predictors. The app displays how many predictors are used. To check which predictors are used, click a model in the Models pane, and note the check boxes in the expanded Feature Selection section of the model Summary tab. Note If you use a cross-validation scheme and choose to perform feature selection using the Select highest ranked features option, then for each training fold, the app performs feature selection before training a model. Different folds can select different predictors as the highest
ranked features. The table on the Default Feature Selection tab shows the list of predictors used by the full model, trained on the training and validation data. The models with the three features removed do not perform as well as the models using all predictors. In general, if data collection is expensive or difficult, you might prefer a model that performs satisfactorily without some predictors. 17 Train the three regression tree presets using only Horsepower as a predictor. In the Models
gallery, select All Trees in the Regression Trees group. In the model Summary tab, expand the Feature Selection section. Choose the Select individual features option, and clear the check boxes for all features except Horsepower. On the Learn tab, in the Train section, click Train All and select Train Selected. Using only the engine power as a predictor results in models with lower accuracy. However, the models perform well given that they are using only a single predictor. With this simple onedimensional predictor space, the coarse tree now performs as well as the medium and fine trees. 18 Select the best model in the Models pane and view the residuals plot. On the Learn tab, in the
Plots and Results section, click the arrow to open the gallery, and then click Residuals (Validation) in the Validation Results group. The residuals plot displays the difference between the predicted and true responses. To display the residuals as a line graph, in the Style section, choose Lines. Under X-axis, select the variable to plot on the x-axis. Choose the true response, predicted response, record number, or one of the predictors.
Usually a good model has residuals scattered roughly symmetrically around 0. If you can see any clear patterns in the residuals, it is likely that you can improve your model. 19 To learn about model hyperparameter settings, choose the best model in the Models pane and
expand the Model Hyperparameters section in the model Summary tab. Compare the coarse, medium, and fine tree models, and note the differences in the model hyperparameters. In particular, the Minimum leaf size setting is 36 for coarse trees, 12 for medium trees, and 4 for fine trees. This setting controls the size of the tree leaves, and through that the size and depth of the regression tree. To try to improve the best model (the medium tree trained using all predictors), change the Minimum leaf size setting. First, click the model in the Models pane. Right-click the model and select Duplicate. In the Summary tab, change the Minimum leaf size value to 8. Then, in the Train section of the Learn tab, click Train All and select Train Selected. To learn more about regression tree settings, see “Regression Trees” on page 24-17. 20 You can export a full or compact version of the selected model to the workspace. On the Learn
tab, click Export, click Export Model and select Export Model. To exclude the training data and export a compact model, clear the check box in the Export Regression Model dialog box. You can still use the compact model for making predictions on new data. In the dialog box, click OK to accept the default variable name trainedModel. 24-80
The Command Window displays information about the results. 21 Use the exported model to make predictions on new data. For example, to make predictions for
the cartable data in your workspace, enter: yfit = trainedModel.predictFcn(cartable)
The output yfit contains the predicted response for each data point. 22 If you want to automate training the same model with new data or learn how to programmatically
train regression models, you can generate code from the app. To generate code for the best trained model, on the Learn tab, in the Export section, click Generate Function. The app generates code from your model and displays the file in the MATLAB Editor. To learn more, see “Generate MATLAB Code to Train Model with New Data” on page 24-66. Tip Use the same workflow as in this example to evaluate and compare the other regression model types you can train in Regression Learner. Train all the nonoptimizable regression model presets available: 1
On the Learn tab, in the Models section, click the arrow to open the gallery of regression models.
2
In the Get Started group, click All.
3
In the Train section, click Train All and select Train All.
To learn about other regression model types, see “Train Regression Models in Regression Learner App” on page 24-2.
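If you want to reproduce trees like these outside the app, here is a minimal sketch using the Minimum leaf size values described above; treat the exact correspondence to the app presets as an assumption:
coarseTree = fitrtree(cartable,"MPG","MinLeafSize",36); % coarse tree preset
mediumTree = fitrtree(cartable,"MPG","MinLeafSize",12); % medium tree preset
fineTree   = fitrtree(cartable,"MPG","MinLeafSize",4);  % fine tree preset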
See Also
Related Examples
• “Train Regression Models in Regression Learner App” on page 24-2
• “Select Data for Regression or Open Saved App Session” on page 24-8
• “Choose Regression Model Options” on page 24-13
• “Feature Selection and Feature Transformation Using Regression Learner App” on page 24-31
• “Visualize and Assess Model Performance in Regression Learner” on page 24-50
• “Export Regression Model to Predict New Data” on page 24-65
Compare Linear Regression Models Using Regression Learner App
This example shows how to compare a linear regression model and an efficiently trained linear regression model using the Regression Learner app. Efficiently trained linear regression models are useful for performing linear regression with many observations and many predictors. For large in-memory data, efficient linear regression models that use fitrlinear tend to train and predict faster than linear regression models that use fitlm.
Export the efficient linear regression model to the workspace and inspect its properties, such as its size and linear coefficients. Then, use the model to make predictions on new data.
Note that you can use efficient linear regression models with smaller data sets. If necessary, adjust the relative coefficient tolerance (beta tolerance) to improve the fit. The default value is sometimes too large for the app to converge to a good model. For more information, see “Efficiently Trained Linear Model Hyperparameter Options” on page 24-22.
1
In the MATLAB Command Window, simulate 10,000 observations from the model y = x100 + 2*x200 + e, where x100 and x200 are the 100th and 200th columns of the matrix X = [x1, …, x1000], a 10,000-by-1000 matrix with 10% nonzero standard normal elements, and e is a vector of random normal errors with mean 0 and standard deviation 0.3.
rng("default") % For reproducibility
X = full(sprandn(10000,1000,0.1));
y = X(:,100) + 2*X(:,200) + 0.3*randn(10000,1);
2
Open the Regression Learner app. regressionLearner
3
On the Learn tab, in the File section, click New Session and select From Workspace.
4
In the New Session from Workspace dialog box, select the matrix X from the Data Set Variable list. Then, under Response, click the From workspace option button and select y from the list. To accept the default validation scheme and continue, click Start Session. The default validation option is 5-fold cross-validation, to protect against overfitting. The app creates a plot of the response with the record number on the x-axis.
5
Create a selection of linear models. On the Learn tab, in the Models section, click the arrow to open the gallery. In the Linear Regression Models group, click Linear. Reopen the gallery and click Efficient Linear Least Squares in the Efficiently Trained Linear Regression Models group.
6
In the Models pane, delete the draft fine tree model by right-clicking it and selecting Delete.
7
On the Learn tab, in the Train section, click Train All and select Train All. Note • If you have Parallel Computing Toolbox, then the Use Parallel button is selected by default. After you click Train All and select Train All or Train Selected, the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can continue to interact with the app while models train in parallel. • If you do not have Parallel Computing Toolbox, then the Use Background Training check box in the Train All menu is selected by default. After you select an option to train models,
the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background.
Regression Learner trains the two linear models. In the Models pane, the app outlines the RMSE (Validation) (root mean squared error) of the best model. 8
Compare the two models. On the Learn tab, in the Plots and Results section, click Layout and select Compare models. Click the Summary tab for each model.
Note Validation introduces some randomness into the results. Your model validation results might vary from the results shown in this example. The validation RMSE for the linear regression model (Model 2) is better than the validation RMSE of the efficient linear model (Model 3). However, the training time for the efficient linear model is significantly smaller than the training time for the linear regression model. Also, the estimated model size of the efficient linear model is significantly smaller than the size of the linear regression model. 24-83
9
For each model, plot the predicted response versus the true response. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Predicted vs. Actual (Validation) in the Validation Results group. Use this plot to determine how well the regression model makes predictions for different response values. Click the Hide plot options button at the top right of the plots to make more room for the plots.
A perfect regression model has predicted responses equal to the true responses, so all the points lie on a diagonal line. The vertical distance from the line to any point is the error of the prediction for that point. A good model has small errors, so the predictions are scattered near the line. Typically, a good model has points scattered roughly symmetrically around the diagonal line. In this example, both models perform well. 10 For each model, view the residuals plot. On the Learn tab, in the Plots and Results section,
click the arrow to open the gallery, and then click Residuals (Validation) in the Validation Results group. The residuals plot displays the difference between the predicted and true responses.
Click the Hide plot options button at the top right of the plots to make more room for the plots.
Typically, a good model has residuals scattered roughly symmetrically around 0. If you can see any clear patterns in the residuals, you can most likely improve your model. In this example, the models have similar residual distributions. 11 Because the efficient linear model performs similarly to the linear regression model, export a
compact version of the efficiently trained linear regression model to the workspace. On the Learn tab, in the Export section, click Export Model and select Export Model. In the Export Regression Model dialog box, the check box to include the training data is disabled because efficient linear models do not store training data. In the dialog box, click OK to accept the default variable name. 12 In the MATLAB workspace, extract the RegressionLinear model from the trainedModel
structure. Inspect the size of the trained model Mdl.
Mdl = trainedModel.RegressionEfficientLinear;
whos Mdl
  Name      Size             Bytes  Class               Attributes
  Mdl       1x1             159411  RegressionLinear
Note that you can extract the model from the exported structure because Regression Learner did not use a feature transformation or feature selection technique to train the model.
13 Plot the linear coefficients from the efficient linear model.
coefficients = Mdl.Beta;
plot(coefficients,".")
xlabel("Predictor")
ylabel("Coefficient")
The coefficient for the 100th predictor is approximately 1, the coefficient for the 200th predictor is approximately 2, and the remaining coefficients are close to 0. These values match the coefficients of the model used to generate the simulated training data. 14 Use the model to make predictions on new data. For example, create a 50-by-1000 matrix with
10% nonzero standard normal elements. You can use either the predictFcn function of the trainedModel structure or the predict object function of the Mdl object to predict the response for the new data. These two methods are equivalent because Regression Learner did not use a feature transformation or feature selection technique to train the model.
XTest = full(sprandn(50,1000,0.1));
predictedY1 = trainedModel.predictFcn(XTest);
predictedY2 = predict(Mdl,XTest);
isequal(predictedY1,predictedY2)
ans =
  logical
   1
If the exported trainedModel contains PCA or feature selection information, use the predictFcn function of the structure to predict on new data.
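For reference, a rough programmatic counterpart of the efficiently trained linear model in this example is sketched below; the learner type and default settings are assumptions, and the app's configuration can differ:
MdlSketch = fitrlinear(X,y,"Learner","leastsquares"); % efficient least-squares linear regression
yhatTest = predict(MdlSketch,XTest);                  % predict on the new data from step 14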
See Also
Related Examples
• “Train Regression Models in Regression Learner App” on page 24-2
• “Select Data for Regression or Open Saved App Session” on page 24-8
• “Choose Regression Model Options” on page 24-13
• “Visualize and Assess Model Performance in Regression Learner” on page 24-50
• “Export Regression Model to Predict New Data” on page 24-65
• “Train Regression Trees Using Regression Learner App” on page 24-71
Train Regression Neural Networks Using Regression Learner App
This example shows how to create and compare various regression neural network models using the Regression Learner app, and export trained models to the workspace to make predictions for new data.
1
In the MATLAB Command Window, load the carbig data set, and create a table containing the different variables.
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
2
Click the Apps tab, and then click the Show more arrow on the right to open the apps gallery. In the Machine Learning and Deep Learning group, click Regression Learner.
3
On the Learn tab, in the File section, click New Session and select From Workspace.
4
In the New Session from Workspace dialog box, select the table cartable from the Data Set Variable list. As shown in the dialog box, the app selects MPG as the response and the other variables as predictors. For this example, do not change the selections.
5
To accept the default validation scheme and continue, click Start Session. The default validation option is 5-fold cross-validation, to protect against overfitting. Regression Learner creates a plot of the response with the record number on the x-axis.
6
Use the response plot to investigate which variables are useful for predicting the response. To visualize the relation between different predictors and the response, select different variables in the X list under X-axis to the right of the plot. Observe which variables are correlated most clearly with the response.
7
Create a selection of neural network models. On the Learn tab, in the Models section, click the arrow to open the gallery. In the Neural Networks group, click All Neural Networks.
8
In the Train section, click Train All and select Train All. Note • If you have Parallel Computing Toolbox, then the Use Parallel button is selected by default. After you click Train All and select Train All or Train Selected, the app opens a parallel
pool of workers. During this time, you cannot interact with the software. After the pool opens, you can continue to interact with the app while models train in parallel. • If you do not have Parallel Computing Toolbox, then the Use Background Training check box in the Train All menu is selected by default. After you select an option to train models, the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background.
Regression Learner trains one of each neural network option in the gallery, as well as the default fine tree model. In the Models pane, the app outlines the RMSE (Validation) (root mean squared error) of the best model. 9
Select a model in the Models pane to view the results. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Response in the Validation Results group. Examine the response plot for the trained model. True responses are blue, and predicted responses are yellow.
Note Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example. 10 Under X-axis, select Horsepower and examine the response plot. Both the true and predicted
responses are now plotted. Show the prediction errors, drawn as vertical lines between the predicted and true responses, by selecting the Errors check box under Plot to the right of the plot. 11 For more information on the currently selected model, consult the Summary tab. Check and
compare additional model characteristics, such as R-squared (coefficient of determination), MAE
(mean absolute error), and prediction speed. To learn more, see “View Model Metrics in Summary Tab and Models Pane” on page 24-51. In the Summary tab, you can also find details on the currently selected model type, such as options used for training the model. 12 Plot the predicted response versus the true response. On the Learn tab, in the Plots and
Results section, click the arrow to open the gallery, and then click Predicted vs. Actual (Validation) in the Validation Results group. Use this plot to determine how well the regression model makes predictions for different response values.
A perfect regression model has predicted responses equal to the true responses, so all the points lie on a diagonal line. The vertical distance from the line to any point is the error of the prediction for that point. A good model has small errors, so the predictions are scattered near the line. Typically, a good model has points scattered roughly symmetrically around the diagonal line. If you can see any clear patterns in the plot, you can most likely improve your model. 13 For each remaining model, select the model in the Models pane, open the predicted versus
actual plot, and then compare the results across the models. For more information, see “Compare Model Plots by Changing Layout” on page 24-59.
14 To try to improve the models, include different features. In the Models gallery, select All Neural
Networks again. See if you can improve the models by removing features with low predictive power. In the Summary tab, click Feature Selection to expand the section. In the Feature Selection section, clear the check boxes for Acceleration and Cylinders to exclude them from the predictors. You can use the response plot to see that these variables are not highly correlated with the response variable. In the Train section, click Train All and select Train All or Train Selected to train the neural network models using the new set of features. 15 Observe the new models in the Models pane. These models are the same neural network models
as before, but trained using only five of the seven predictors. For each model, the app displays how many predictors are used. To check which predictors are used, click a model in the Models pane and consult the Feature Selection section of the Summary tab. The models with the two features removed perform comparably to the models with all predictors. The models predict no better using all the predictors compared to using only a subset of them. If data collection is expensive or difficult, you might prefer a model that performs satisfactorily without some predictors. 16 Select the model in the Models pane with the lowest validation RMSE (best model), and view the
residuals plot. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Residuals (Validation) in the Validation Results group. The residuals plot displays the difference between the predicted and true responses. To display the residuals as a line graph select Lines under Style. Under X-axis, select the variable to plot on the x-axis. Choose the true response, predicted response, record number, or one of the predictors.
Typically, a good model has residuals scattered roughly symmetrically around 0. If you can see any clear patterns in the residuals, you can most likely improve your model. 17 You can try to further improve the best model in the Models pane by changing its
hyperparameters. First, duplicate the model. Right-click the model and select Duplicate. Then, in the Summary tab of the duplicated model, try changing some of the hyperparameter settings, like the sizes of the fully connected layers or the regularization strength. Train the new model by clicking Train All and selecting Train Selected. To learn more about neural network model settings, see “Neural Networks” on page 24-28. 18 You can export a full or compact version of the selected model to the workspace. On the Learn
tab, click Export, click Export Model and select Export Model. To exclude the training data and export a compact model, clear the check box in the Export Regression Model dialog box. You can still use the compact model for making predictions on new data. In the dialog box, click OK to accept the default variable name.
19 To examine the code for training this model, click Generate Function in the Export section.
Tip Use the same workflow to evaluate and compare the other regression model types you can train in Regression Learner. To train all the nonoptimizable regression model presets available for your data set: 1
On the Learn tab, in the Models section, click the arrow to open the gallery of regression models.
2
In the Get Started group, click All.
3
In the Train section, click Train All and select Train All.
To learn about other regression model types, see “Train Regression Models in Regression Learner App” on page 24-2.
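For reference, here is a minimal programmatic sketch of training a regression neural network similar to the ones in this example; the layer sizes and regularization strength shown are assumptions, not the app presets:
nnetMdl = fitrnet(cartable,"MPG","LayerSizes",[25 10],"Lambda",1e-4,"Standardize",true);
yfitNN = predict(nnetMdl,cartable); % predict on the training table for illustration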
See Also
Related Examples
• “Train Regression Models in Regression Learner App” on page 24-2
• “Select Data for Regression or Open Saved App Session” on page 24-8
• “Choose Regression Model Options” on page 24-13
• “Feature Selection and Feature Transformation Using Regression Learner App” on page 24-31
• “Visualize and Assess Model Performance in Regression Learner” on page 24-50
• “Export Regression Model to Predict New Data” on page 24-65
Train Kernel Approximation Model Using Regression Learner App
This example shows how to create and compare various kernel approximation regression models using the Regression Learner app, and export trained models to the workspace to make predictions for new data. Kernel approximation models are typically useful for performing nonlinear regression with many observations. For large in-memory data, kernel approximation models tend to train and predict faster than SVM models with Gaussian kernels.
1
In the MATLAB Command Window, load the carbig data set, and create a table containing the different variables.
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
2
Click the Apps tab, and then click the Show more arrow on the right to open the apps gallery. In the Machine Learning and Deep Learning group, click Regression Learner.
3
On the Learn tab, in the File section, click New Session and select From Workspace.
4
In the New Session from Workspace dialog box, select the table cartable from the Data Set Variable list. As shown in the dialog box, the app selects MPG as the response and the other variables as predictors. For this example, do not change the selections.
5
To accept the default validation scheme and continue, click Start Session. The default validation option is 5-fold cross-validation, to protect against overfitting. Regression Learner creates a plot of the response with the record number on the x-axis.
6
Use the response plot to investigate which variables are useful for predicting the response. To visualize the relation between different predictors and the response, select different variables in the X list under X-axis to the right of the plot. Observe which variables are correlated most clearly with the response.
7
Create a selection of kernel approximation models. On the Learn tab, in the Models section, click the arrow to open the gallery. In the Kernel Approximation Regression Models group, click All Kernels.
8
In the Train section, click Train All and select Train All. Note
• If you have Parallel Computing Toolbox, then the Use Parallel button is selected by default. After you click Train All and select Train All or Train Selected, the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can continue to interact with the app while models train in parallel. • If you do not have Parallel Computing Toolbox, then the Use Background Training check box in the Train All menu is selected by default. After you select an option to train models, the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background.
Regression Learner trains one of each kernel approximation option in the gallery, as well as the default fine tree model. In the Models pane, the app outlines the RMSE (Validation) (root mean squared error) of the best model. 9
Select a model in the Models pane to view the results. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Response in the Validation Results group. Examine the response plot for the trained model. True responses are blue, and predicted responses are yellow.
Note Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example. 10 Under X-axis, select Horsepower and examine the response plot. Both the true and predicted
responses are now plotted. Show the prediction errors, drawn as vertical lines between the predicted and true responses, by selecting the Errors check box under Plot to the right of the plot. 11 For more information on the currently selected model, consult the Summary tab. Check and
compare additional model characteristics, such as R-squared (coefficient of determination), MAE (mean absolute error), and prediction speed. To learn more, see “View Model Metrics in Summary Tab and Models Pane” on page 24-51. In the Summary tab, you can also find details on the currently selected model type, such as options used for training the model. 12 Plot the predicted response versus the true response. On the Learn tab, in the Plots and
Results section, click the arrow to open the gallery, and then click Predicted vs. Actual
(Validation) in the Validation Results group. Use this plot to determine how well the regression model makes predictions for different response values.
A perfect regression model has predicted responses equal to the true responses, so all the points lie on a diagonal line. The vertical distance from the line to any point is the error of the prediction for that point. A good model has small errors, so the predictions are scattered near the line. Typically, a good model has points scattered roughly symmetrically around the diagonal line. If you can see any clear patterns in the plot, you can most likely improve your model. 13 For each remaining model, select the model in the Models pane, open the predicted versus
actual plot, and then compare the results across the models. For more information, see “Compare Model Plots by Changing Layout” on page 24-59. 14 To try to improve the models, include different features. In the Models gallery, select All
Kernels again. See if you can improve the models by removing features with low predictive power. In the Summary tab, click Feature Selection to expand the section.
In the Feature Selection section, clear the check boxes for Acceleration and Cylinders to exclude them from the predictors. The response plot shows that these variables are not highly correlated with the response variable. In the Train section, click Train All and select Train All or Train Selected to train the kernel approximation models using the new set of features. 15 Observe the new models in the Models pane. These models are the same kernel approximation
models as before, but trained using only five of the seven predictors. For each model, the app displays how many predictors are used. To check which predictors are used, click a model in the Models pane and consult the Feature Selection section of the Summary tab. The models with the two features removed perform comparably to the models with all predictors. The models predict no better using all the predictors compared to using only a subset of them. If data collection is expensive or difficult, you might prefer a model that performs satisfactorily without some predictors. 16 Select the model in the Models pane with the lowest validation RMSE (best model), and view the
residuals plot. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Residuals (Validation) in the Validation Results group. The residuals plot displays the difference between the predicted and true responses. To display the residuals as a line graph select Lines under Style. Under X-axis, select the variable to plot on the x-axis. Choose the true response, predicted response, record number, or one of the predictors.
Typically, a good model has residuals scattered roughly symmetrically around 0. If you can see any clear patterns in the residuals, you can most likely improve your model. 17 You can try to further improve the best model in the Models pane by changing its
hyperparameters. First, duplicate the model. Right-click the model and select Duplicate. Then, in the Summary tab of the duplicated model, try changing some of the hyperparameter settings, like the kernel scale parameter or the regularization strength. Train the new model by clicking Train All and selecting Train Selected. To learn more about kernel approximation model settings, see “Kernel Approximation Models” on page 24-25. 18 You can export a compact version of the trained model to the workspace. On the Learn tab, click
Export, click Export Model and select Export Model. In the Export Regression Model dialog box, the check box to include the training data is disabled because kernel approximation models do not store training data. In the dialog box, click OK to accept the default variable name. 24-101
19 To examine the code for training this model, click Generate Function in the Export section.
Tip Use the same workflow to evaluate and compare the other regression model types you can train in Regression Learner. To train all the nonoptimizable regression model presets available for your data set: 1
On the Learn tab, in the Models section, click the arrow to open the gallery of regression models.
2
In the Get Started group, click All.
3
In the Train section, click Train All and select Train All.
To learn about other regression model types, see “Train Regression Models in Regression Learner App” on page 24-2.
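For reference, here is a minimal programmatic sketch of a kernel approximation regression model like the ones in this example. The sketch uses a numeric predictor matrix with only the numeric predictors, and the kernel scale and regularization values are assumptions:
ok = all(~isnan([Displacement Horsepower Model_Year Weight MPG]),2); % keep complete rows only
Xnum = [Displacement(ok) Horsepower(ok) Model_Year(ok) Weight(ok)];
kernelMdl = fitrkernel(Xnum,MPG(ok),"KernelScale","auto","Lambda",1e-3);
yfitKernel = predict(kernelMdl,Xnum);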
See Also
fitrkernel | predict
Related Examples
• “Train Regression Models in Regression Learner App” on page 24-2
• “Select Data for Regression or Open Saved App Session” on page 24-8
• “Choose Regression Model Options” on page 24-13
• “Feature Selection and Feature Transformation Using Regression Learner App” on page 24-31
• “Visualize and Assess Model Performance in Regression Learner” on page 24-50
• “Export Regression Model to Predict New Data” on page 24-65
Train Regression Model Using Hyperparameter Optimization in Regression Learner App
This example shows how to tune hyperparameters of a regression ensemble by using hyperparameter optimization in the Regression Learner app. Compare the test set performance of the trained optimizable ensemble to that of the best-performing preset ensemble model.
1
In the MATLAB Command Window, load the carbig data set, and create a table containing most of the variables.
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
2
Open Regression Learner. Click the Apps tab, and then click the arrow at the right of the Apps section to open the apps gallery. In the Machine Learning and Deep Learning group, click Regression Learner.
3
On the Learn tab, in the File section, select New Session > From Workspace.
4
In the New Session from Workspace dialog box, select cartable from the Data Set Variable list. The app selects the response and predictor variables. The default response variable is MPG. The default validation option is 5-fold cross-validation, to protect against overfitting. In the Test section, click the check box to set aside a test data set. Specify to use 15 percent of the imported data as a test set.
5
To accept the options and continue, click Start Session.
6
Train all preset ensemble models. On the Learn tab, in the Models section, click the arrow to open the gallery. In the Ensembles of Trees group, click All Ensembles. In the Train section, click Train All and select Train All. The app trains one of each ensemble model type, as well as the default fine tree model, and displays the models in the Models pane. Note • If you have Parallel Computing Toolbox, then the Use Parallel button is selected by default. After you click Train All and select Train All or Train Selected, the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can continue to interact with the app while models train in parallel. • If you do not have Parallel Computing Toolbox, then the Use Background Training check box in the Train All menu is selected by default. After you select an option to train models, the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background.
The app displays a response plot for the first ensemble model (model 2.1). Blue points are true values, and yellow points are predicted values. The Models pane on the left shows the validation RMSE for each model. Note Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example. 7
Select an optimizable ensemble model to train. On the Learn tab, in the Models section, click the arrow to open the gallery. In the Ensembles of Trees group, click Optimizable Ensemble.
8
Select the model hyperparameters to optimize. In the Summary tab, you can select Optimize check boxes for the hyperparameters that you want to optimize. By default, all the check boxes are selected. For this example, accept the default selections.
9
Train the optimizable model. In the Train section of the Learn tab, click Train All and select Train Selected.
10 The app displays a Minimum MSE Plot as it runs the optimization process. At each iteration,
the app tries a different combination of hyperparameter values and updates the plot with the minimum validation mean squared error (MSE) observed up to that iteration, indicated in dark blue. When the app completes the optimization process, it selects the set of optimized hyperparameters, indicated by a red square. For more information, see “Minimum MSE Plot” on page 24-46. The app lists the optimized hyperparameters in both the Optimization Results section to the right of the plot and the Optimizable Ensemble Model Hyperparameters section of the model Summary tab.
Note In general, the optimization results are not reproducible. 11 Compare the trained preset ensemble models to the trained optimizable model. In the Models
pane, the app highlights the lowest RMSE (Validation) (validation root mean squared error) by outlining it in a box. In this example, the trained optimizable ensemble outperforms the two preset models. A trained optimizable model does not always have a lower RMSE than the trained preset models. If a trained optimizable model does not perform well, you can try to get better results by running the optimization for longer. On the Learn tab, in the Options section, click Optimizer. In the dialog box, increase the Iterations value. For example, you can double-click the default value of 30 and enter a value of 60. Then click Save and Apply. The options will be applied to future optimizable models created using the Models gallery. 12 Because hyperparameter tuning often leads to overfitted models, check the performance of the
optimizable ensemble model on a test set and compare it to the performance of the best preset ensemble model. Use the data you reserved for testing when you imported data into the app. First, in the Models pane, click the star icons next to the Bagged Trees model and the Optimizable Ensemble model. 13 For each model, select the model in the Models pane. In the Test section of the Test tab, click
Test Selected. The app computes the test set performance of the model trained on the rest of the data, namely the training and validation data.
14 Sort the models based on the test set RMSE. In the Models pane, open the Sort by list and
select RMSE (Test). In this example, the trained optimizable ensemble still outperforms the trained preset model on the test set data. More importantly, the test set RMSE is comparable to the validation RMSE for the optimizable model.
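For reference, here is a minimal sketch of the programmatic counterpart of an optimizable ensemble; the number of evaluations mirrors the app's default of 30 iterations, and the remaining settings are assumptions:
optEns = fitrensemble(cartable,"MPG","OptimizeHyperparameters","auto", ...
    "HyperparameterOptimizationOptions",struct("MaxObjectiveEvaluations",30,"ShowPlots",false));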
See Also
Related Examples
• “Hyperparameter Optimization in Regression Learner App” on page 24-36
• “Train Regression Models in Regression Learner App” on page 24-2
• “Select Data for Regression or Open Saved App Session” on page 24-8
• “Choose Regression Model Options” on page 24-13
• “Visualize and Assess Model Performance in Regression Learner” on page 24-50
• “Export Regression Model to Predict New Data” on page 24-65
• “Bayesian Optimization Workflow” on page 10-25
Check Model Performance Using Test Set in Regression Learner App
This example shows how to train multiple models in Regression Learner, and determine the best-performing models based on their validation metrics. Check the test metrics for the best-performing models trained on the full data set, including training and validation data.
1
In the MATLAB Command Window, load the carbig data set, and create a table containing most of the variables. Separate the table into training and test sets.
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
rng("default") % For reproducibility of the data split
n = length(MPG);
partition = cvpartition(n,"Holdout",0.15);
idxTrain = training(partition); % Indices for the training set
cartableTrain = cartable(idxTrain,:);
cartableTest = cartable(~idxTrain,:);
Alternatively, you can create a test set later on when you import data into the app. For more information, see “(Optional) Reserve Data for Testing” on page 24-11. 2
Open Regression Learner. Click the Apps tab, and then click the arrow at the right of the Apps section to open the apps gallery. In the Machine Learning and Deep Learning group, click Regression Learner.
3
On the Learn tab, in the File section, click New Session and select From Workspace.
4
In the New Session from Workspace dialog box, select the cartableTrain table from the Data Set Variable list. As shown in the dialog box, the app selects the response and predictor variables. The default response variable is MPG. To protect against overfitting, the default validation option is 5-fold cross-validation. For this example, do not change the default settings.
5
To accept the default options and continue, click Start Session.
6
Train all preset models. On the Learn tab, in the Models section, click the arrow to open the gallery. In the Get Started group, click All. In the Train section, click Train All and select Train All. The app trains one of each preset model type, along with the default fine tree model, and displays the models in the Models pane. Note • If you have Parallel Computing Toolbox, then the Use Parallel button is selected by default. After you click Train All and select Train All or Train Selected, the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can continue to interact with the app while models train in parallel. • If you do not have Parallel Computing Toolbox, then the Use Background Training check box in the Train All menu is selected by default. After you select an option to train models, the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background.
7
Sort the trained models based on the validation root mean squared error (RMSE). In the Models pane, open the Sort by list and select RMSE (Validation).
8
In the Models pane, click the star icons next to the three models with the lowest validation RMSE. The app highlights the lowest validation RMSE by outlining it in a box. In this example, the trained Exponential GPR model has the lowest validation RMSE.
The app displays a response plot for the linear regression model (model 2.1). Blue points are true values, and yellow points are predicted values. The Models pane on the left shows the validation RMSE for each model. Note Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example. 9
Check the test set performance of the best-performing models. Begin by importing test data into the app. On the Test tab, in the Data section, click Test Data and select From Workspace.
10 In the Import Test Data dialog box, select the cartableTest table from the Test Data Set
Variable list. As shown in the dialog box, the app identifies the response and predictor variables.
11 Click Import. 12 Compute the RMSE of the best preset models on the cartableTest data. For convenience,
compute the test set RMSE for all models at once. On the Test tab, in the Test section, click Test All. The app computes the test set performance of the model trained on the full data set, including training and validation data. 13 Sort the models based on the test set RMSE. In the Models pane, open the Sort by list and
select RMSE (Test). The app still outlines the metric for the model with the lowest validation RMSE, despite displaying the test RMSE. 14 Visually check the test set performance of the models. For each starred model, select the model
in the Models pane. On the Test tab, in the Plots and Results section, click Predicted vs. Actual (Test). 15 Rearrange the layout of the plots to better compare them. First, close the summary and plot tabs
for Model 1 and Model 2.1. Then, click the Document Actions button located to the far right of the model plot tabs. Select the Tile All option and specify a 1-by-3 layout. Click the Hide plot options button
at the top right of the plots to make more room for the plots.
In this example, the three starred models perform similarly on the test set data.
To return to the original layout, you can click the Layout button in the Plots and Results section and select Single model (Default). 16 Compare the validation and test RMSE for the trained Exponential GPR model. In the Models
pane, double-click the model. In the model Summary tab, compare the RMSE (Validation) value under Training Results to the RMSE (Test) value under Test Results. In this example, the validation RMSE is lower than the test RMSE, which indicates that the validation RMSE might be overestimating the performance of this model.
See Also Related Examples •
“Visualize and Assess Model Performance in Regression Learner” on page 24-50
•
“Export Regression Model to Predict New Data” on page 24-65
•
“Train Regression Model Using Hyperparameter Optimization in Regression Learner App” on page 24-103
Explain Model Predictions for Regression Models Trained in Regression Learner App
Understanding how some machine learning models make predictions can be difficult. Interpretability tools help reveal how predictors contribute (or do not contribute) to predictions. You can also use these tools to validate whether a model uses the correct evidence for its predictions, and find model biases that are not immediately apparent. Regression Learner provides functionality for two levels of model interpretation: local and global.
Local interpretation
• Objective — Explain a prediction for a single query point.
• Use Case — Identify important predictors for an individual prediction. Examine a counterintuitive prediction.
• App Functionality — Use LIME or Shapley values for a specified query point. See “Explain Local Model Predictions Using LIME Values” on page 24-114 or “Explain Local Model Predictions Using Shapley Values” on page 24-118.
Global interpretation
• Objective — Explain how a trained model makes predictions for the entire data set.
• Use Case — Demonstrate how a trained model works. Compare different models.
• App Functionality — Use partial dependence plots for the predictors of interest. See “Interpret Model Using Partial Dependence Plots” on page 24-122.
Explain Local Model Predictions Using LIME Values
Use LIME (local interpretable model-agnostic explanations) to interpret a prediction for a query point by fitting a simple interpretable model for the query point. The simple model acts as an approximation for the trained model and explains model predictions around the query point. The simple model can be a linear model or a decision tree model. You can use the estimated coefficients of a linear model or the estimated predictor importance of a decision tree model to explain the contribution of individual predictors to the prediction for the query point. After you train a model in Regression Learner, select the model in the Models pane. On the Explain tab, in the Local Explanations section, click LIME. The app opens a new tab. In the left plot or table, select a query point. In the right plot or table, the app displays the LIME values corresponding to the query point. The app uses the lime function to compute the LIME values. When computing LIME values, the app uses the final model, trained on the full data set (including training and validation data, but excluding test data).
Note Regression Learner does not support LIME explanations for models trained after applying feature selection or principal component analysis (PCA).
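If you prefer to work at the command line, the following is a minimal sketch of a comparable workflow using the lime function directly. The variable names tbl, mdl, queryPoint, and explainer are illustrative only, and the model here is a regression tree trained with fitrtree rather than a model exported from the app.
load carbig
tbl = rmmissing(table(Acceleration,Horsepower,Weight,Model_Year,MPG));
mdl = fitrtree(tbl,"MPG");                % regression model to explain
queryPoint = tbl(1,mdl.PredictorNames);   % observation to explain
explainer = lime(mdl);                    % create a LIME explainer from the trained model
explainer = fit(explainer,queryPoint,3);  % fit a simple model with up to 3 important predictors
plot(explainer)                           % bar graph of LIME values, similar to the app display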
Select Query Point To select a query point, you can use various controls. • To the right of the LIME plots, under Data, choose whether to select a query point from the Training set data or Test set data. The training set refers to the data used to train the final model and includes all the observations that are not reserved for testing. • Above the left plot, under Select Query Point, choose whether to select a query point from a plot (Plot) or a table (Table). If using a plot, click a point in the plot to designate the associated observation as the query point. If using a table, click a row in the table to select the associated observation as the query point. Alternatively, select a query point using the index of the observation in the selected data set. To the right of the LIME plots, under Query Point, enter the observation index. • To make selecting a query point from a plot easier, you can change the plot display by using the controls below the left plot. You can specify the plot type, select the x-axis and y-axis variables, and choose the values to display (such as true responses, predicted responses, and errors).
• After selecting a query point, you can expand the LIME Explanations display by hiding the Select Query Point display. To the right of the LIME plots, under Data, clear the Show query points check box.
Plot LIME Explanations
Given a query point, view its LIME values by using the LIME Explanations display. Choose whether to view the results using a bar graph (Plot) or a table (Table). The table includes the predictor values at the query point. The meaning of the LIME values depends on the type of LIME model used. To the right of the LIME plots, in the Simple Model section under LIME Options, specify the type of simple model to use for approximating the behavior of the trained model.
• If you use a Linear simple model, the LIME values correspond to the coefficient values of the simple model. The bar graph shows the coefficients, sorted by their absolute values. For each categorical predictor, the software creates one less dummy variable than the number of categories, and the bar graph displays only the most important dummy variable. You can check the coefficients of the other dummy variables using the SimpleModel property of the exported results object. For more information, see “Export LIME Results” on page 24-118.
• If you use a Tree simple model, the LIME values correspond to the estimated predictor importance values of the simple model. The bar graph shows the predictor importance values, sorted by their absolute values. The bar graph shows LIME values only for the subset of predictors included in the simple model.
Below the display of the LIME explanations, the app shows the query point predictions for the trained model (for example, Model 1 prediction) and the simple model (for example, LIME model prediction). If the two predictions are not close, the simple model is not a good approximation of the trained model at the query point. You can change the simple model so that it better matches the trained model at the query point by adjusting LIME options.
Adjust LIME Options
To adjust LIME options, you can use various controls to the right of the LIME plots, under LIME Options. Under Simple Model, you can set these options:
• Simple model — Specify the type of simple model to use for approximating the behavior of the trained model. Choose between a linear model, which uses fitrlinear, and a decision tree, which uses fitrtree. For more information, see SimpleModelType. In Regression Learner, linear simple models use a BetaTolerance value of 0.00000001.
• Max num predictors — Specify the maximum number of predictors to use for training the simple model. For a linear simple model, this value indicates the number of predictors to include in the model, not counting expanded categorical predictors. For a tree simple model, this value indicates the maximum number of decision splits (or branch nodes) in the tree, which might cause the model to include fewer predictors than the specified maximum. For more information, see numImportantPredictors.
• Kernel width — Specify the width of the kernel function used to fit the simple model. Smaller kernel widths create LIME models that focus on data samples near the query point. For more information, see KernelWidth.
Under Synthetic Predictor Data, you can set these options:
• Num data samples — Specify the number of synthetic data samples to generate for training the simple model. For more information, see NumSyntheticData. • Data locality — Specify the locality of the data to use for synthetic data generation. A Global locality uses all observations in the training set, and a Local locality uses the k-nearest neighbors of the query point. (Recall that the training set contains the data used to train the final model and includes all the observations that are not reserved for testing.) For more information, see DataLocality. • Num neighbors — Specify the number of k-nearest neighbors for the query point. This option is valid only when the data locality is Local. For more information, see NumNeighbors. For more information on the LIME algorithm and how synthetic data is used, see “LIME” on page 354652. Perform What-If Analysis After computing the LIME results for a query point, you can perform what-if analysis and compare the LIME results for the original query point to the results for a custom query point. For example, you can see whether the important predictors change when the query point predictor values deviate slightly from their original values. To the right of the LIME plots, under Query Point, select What-if analysis. The app creates a table that shows the predictor values for the original query point and a custom query point. Manually specify the predictor values of the custom query point by editing the Custom Value table entries. To better see the table entries, you can increase the width of the plot options panel by using the plus button + at the top of the panel. After you specify a custom query point, the app updates the display of the LIME results. • The query point plot shows the original query point as a black circle and the custom query point as a green square. • The LIME explanations bar graph shows the LIME values for the original and custom query points, and differentiates the two sets of bars by using different colors and edge styles. • The LIME explanations table includes the LIME and predictor values for both query points. • Below the display of the LIME explanations, you can find the trained model and simple model predictions for both query points. Ensure that the two predictions for the custom query point are close. Otherwise, the simple model is not a good approximation of the trained model at the custom query point.
Export LIME Results After computing LIME values, you can export your results by using any of the following options in the Export section on the Explain tab. • To export the LIME explanations bar graph to a figure, click Export Plot to Figure. • To export the LIME explanations table to the workspace, click Export Results and select Export Results Table. • To export the query point model explainer object to the workspace, click Export Results and select Export Results Object. If you specify a custom query point by using what-if analysis, the model explainer object corresponds to the custom query point. For more information on the explainer object, see lime.
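As a small illustration, suppose you export the results object to the workspace under the name limeResults (the actual name is whatever you enter in the export dialog box). You can then inspect the fitted simple model directly, for example to check the coefficients of all dummy variables mentioned above.
limeResults.SimpleModel        % the fitted linear or tree simple model
% For a Linear simple model, the coefficients of the expanded predictors
% (including every dummy variable) are stored in the Beta property:
% limeResults.SimpleModel.Beta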
Explain Local Model Predictions Using Shapley Values
Use the Shapley value of a predictor for a query point to explain the deviation of the query point prediction from the average prediction, due to the predictor. For regression models, predictions are
response values. For a query point, the sum of the Shapley values for all predictors corresponds to the total deviation of the prediction from the average. After you train a model in Regression Learner, select the model in the Models pane. On the Explain tab, in the Local Explanations section, click Local Shapley. The app opens a new tab. In the left plot or table, select a query point. In the right plot or table, the app displays the Shapley values corresponding to the query point. The app uses the shapley function to compute the Shapley values. When computing Shapley values, the app uses the final model, trained on the full data set (including training and validation data, but excluding test data). Note Regression Learner does not support Shapley explanations for models trained after applying feature selection or PCA.
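As with LIME, you can reproduce a comparable computation at the command line with the shapley function. The following is a minimal sketch; the names tbl, mdl, queryPoint, and explainer are illustrative, and the model is a regression tree trained with fitrtree rather than a model exported from the app.
load carbig
tbl = rmmissing(table(Acceleration,Horsepower,Weight,Model_Year,MPG));
mdl = fitrtree(tbl,"MPG");                % regression model to explain
queryPoint = tbl(1,mdl.PredictorNames);   % observation to explain
explainer = shapley(mdl);                 % create a Shapley explainer from the trained model
explainer = fit(explainer,queryPoint);    % compute Shapley values at the query point
plot(explainer)                           % bar graph of Shapley values
explainer.ShapleyValues                   % table of predictors and their Shapley values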
Select Query Point To select a query point, you can use various controls.
• To the right of the Shapley plots, under Data, choose whether to select a query point from the Training set data or Test set data. The training set refers to the data used to train the final model and includes all the observations that are not reserved for testing. • Above the left plot, under Select Query Point, choose whether to select a query point from a plot (Plot) or a table (Table). If using a plot, click a point in the plot to designate the associated observation as the query point. If using a table, click a row in the table to select the associated observation as the query point. Alternatively, select a query point using the index of the observation in the selected data set. To the right of the Shapley plots, under Query Point, enter the observation index. • To make selecting a query point from a plot easier, you can change the plot display by using the controls below the left plot. You can specify the plot type, select the x-axis and y-axis variables, and choose the values to display (such as true responses, predicted responses, and errors). • After selecting a query point, you can expand the Shapley Explanations display by hiding the Select Query Point display. To the right of the Shapley plots, under Data, clear the Show query points check box. Plot Shapley Explanations Given a query point, view its Shapley values by using the Shapley Explanations display. Each Shapley value explains the deviation of the prediction for the query point from the average prediction, due to the corresponding predictor. Choose whether to view the results using a bar graph (Plot) or a table (Table). The horizontal bar graph shows the Shapley values for all predictors, sorted by their absolute values. The table includes the predictor values at the query point along with the Shapley values. Below the display of the Shapley explanations, the app shows the query point prediction and the average model prediction. The sum of the Shapley values equals the difference between the two predictions. If the trained model includes many predictors, you can choose to display only the most important predictors in the bar graph. To the right of the Shapley plots, under Shapley Plot, specify the number of important predictors to show in the Shapley Explanations bar graph. The app displays the specified number of Shapley values with the largest absolute value. Adjust Shapley Options To adjust Shapley options, you can use various controls to the right of the Shapley plots. Under Shapley Options, you can set these options: • Num data samples — Specify the number of observations sampled from the training set to use for Shapley value computations. (Recall that the training set contains the data used to train the final model and includes all the observations that are not reserved for testing.) If the value equals the number of observations in the training set, the app uses every observation in the data set. When the training set has over 1000 observations, the Shapley value computations can be slow. For faster computations, consider using a smaller number of data samples. • Method — Specify the algorithm to use when computing Shapley values. The Interventional option computes Shapley values with an interventional value function. The app uses the Kernel SHAP, Linear SHAP, or Tree SHAP algorithm, depending on the trained model type and other specified options. The Conditional option uses the extension to the Kernel SHAP algorithm with a conditional value function. For more information, see Method. 
• Max num subsets mode — Allow the app to choose the maximum number of predictor subsets automatically, or specify a value manually. You can check the number of predictor subsets used by
querying the NumSubsets property of the exported results object. For more information, see “Export Shapley Results” on page 24-122. • Manual max num subsets — When you set Max num subsets mode to Manual, specify the maximum number of predictor subsets to use for Shapley value computations. This option is valid only when the app uses the Kernel SHAP algorithm or the extension to the Kernel SHAP algorithm. For more information, see MaxNumSubsets. For more information on the algorithms used to compute Shapley values, see “Shapley Values for Machine Learning Model” on page 27-18. Perform What-If Analysis After computing the Shapley results for a query point, you can perform what-if analysis and compare the Shapley results for the original query point to the results for a custom query point. For example, you can see whether the important predictors change when the query point predictor values deviate slightly from their original values. To the right of the Shapley plots, under Query Point, select What-if analysis. The app creates a table that shows the predictor values for the original query point and a custom query point. Manually specify the predictor values of the custom query point by editing the Custom Value table entries. To better see the table entries, you can increase the width of the plot options panel by using the plus button + at the top of the panel. After you specify a custom query point, the app updates the display of the Shapley results. • The query point plot shows the original query point as a black circle and the custom query point as a green square. • The Shapley explanations bar graph shows the Shapley values for the original and custom query points, and differentiates the two sets of bars by using different colors and edge styles. • The Shapley explanations table includes the Shapley and predictor values for both query points. • Below the display of the Shapley explanations, you can find the model predictions for both query points. For easy comparison, the app lists the average model prediction twice, once below each query point prediction.
Export Shapley Results After computing Shapley values, you can export your results by using any of the following options in the Export section on the Explain tab. • To export the Shapley explanations bar graph to a figure, click Export Plot to Figure. • To export the Shapley explanations table to the workspace, click Export Results and select Export Results Table. • To export the query point model explainer object to the workspace, click Export Results and select Export Results Object. If you specify a custom query point by using what-if analysis, the model explainer object corresponds to the custom query point. For more information on the explainer object, see shapley.
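As a brief illustration, suppose you export the explainer object to the workspace under the name shapleyResults (the actual name depends on what you enter in the export dialog box). You can then query its properties programmatically, for example to check the number of predictor subsets mentioned earlier.
shapleyResults.ShapleyValues   % table of predictors and their Shapley values
shapleyResults.NumSubsets      % number of predictor subsets used in the computation
shapleyResults.QueryPoint      % query point that the values correspond to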
Interpret Model Using Partial Dependence Plots
Partial dependence plots (PDPs) allow you to visualize the marginal effect of each predictor on the predicted response of a trained regression model. After you train a model in Regression Learner, you
can view a partial dependence plot for the model. On the Explain tab, in the Global Explanations section, click Partial Dependence. When computing partial dependence values, the app uses the final model, trained on the full data set (including training and validation data, but excluding test data). To investigate your results, use the controls on the right. • Under Data, choose whether to plot results using Training set data or Test set data. The training set refers to the data used to train the final model and includes all the observations that are not reserved for testing. • Under Feature, choose the feature to plot using the X list. The plotted line corresponds to the average predicted response across the predictor values. The x-axis tick marks in the plot correspond to the unique predictor values in the selected data set. If you use PCA to train a model, you can select principal components from the X list. • Zoom in and out, or pan across the plot. To enable zooming or panning, place the mouse over the PDP and click the corresponding button on the toolbar that appears above the top right of the plot.
For an example, see “Use Partial Dependence Plots to Interpret Regression Models Trained in Regression Learner App” on page 24-125. For more information on partial dependence plots, see plotPartialDependence. To export PDPs you create in the app to figures, see “Export Plots in Regression Learner App” on page 24-61.
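If you train a model at the command line (or export one from the app), you can create the same kind of plot programmatically with plotPartialDependence, or compute the underlying values with partialDependence. The sketch below uses a regression tree trained directly on a few of the carbig variables; the variable names are illustrative only.
load carbig
tbl = rmmissing(table(Acceleration,Horsepower,Weight,Model_Year,MPG));
mdl = fitrtree(tbl,"MPG");
plotPartialDependence(mdl,"Weight")        % PDP of the predicted MPG against Weight
[pd,x] = partialDependence(mdl,"Weight");  % partial dependence values without a plot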
See Also lime | shapley | plotPartialDependence | partialDependence
Related Examples
•
“Use Partial Dependence Plots to Interpret Regression Models Trained in Regression Learner App” on page 24-125
•
“Interpret Machine Learning Models” on page 27-2
•
“Export Plots in Regression Learner App” on page 24-61
•
“Export Regression Model to Predict New Data” on page 24-65
Use Partial Dependence Plots to Interpret Regression Models Trained in Regression Learner App
For trained regression models, partial dependence plots (PDPs) show the relationship between a predictor and the predicted response. The partial dependence on the selected predictor is defined by the averaged prediction obtained by marginalizing out the effect of the other predictors. This example shows how to train regression models in the Regression Learner app and interpret the best-performing models using PDPs. You can use PDP results to confirm that models use features as expected, or to remove unhelpful features from model training.
1
In the MATLAB Command Window, load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s.
load carbig
2
Categorize the cars based on whether they were made in the USA.
Origin = categorical(cellstr(Origin));
Origin = mergecats(Origin,["France","Japan","Germany", ...
    "Sweden","Italy","England"],"NotUSA");
3
Create a table containing the predictor variables Acceleration, Displacement, and so on, as well as the response variable MPG.
cars = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);
4
Remove rows of cars where the table has missing values.
cars = rmmissing(cars);
5
Open Regression Learner. Click the Apps tab, and then click the arrow at the right of the Apps section to open the apps gallery. In the Machine Learning and Deep Learning group, click Regression Learner.
6
On the Learn tab, in the File section, click New Session and select From Workspace.
7
In the New Session from Workspace dialog box, select the cars table from the Data Set Variable list. The app selects the response and predictor variables. The default response variable is MPG. The default validation option is 5-fold cross-validation, to protect against overfitting.
In the Test section, click the check box to set aside a test data set. Specify 15 percent of the imported data as a test set. 8
To accept the options and continue, click Start Session.
9
Train all preset models. On the Learn tab, in the Models section, click the arrow to open the gallery. In the Get Started group, click All. In the Train section, click Train All and select Train All. The app trains one of each preset model type, along with the default fine tree model, and displays the models in the Models pane. Note • If you have Parallel Computing Toolbox, then the Use Parallel button is selected by default. After you click Train All and select Train All or Train Selected, the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can continue to interact with the app while models train in parallel.
• If you do not have Parallel Computing Toolbox, then the Use Background Training check box in the Train All menu is selected by default. After you select an option to train models, the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background. 10 Sort the trained models based on the validation root mean squared error (RMSE). In the Models
pane, open the Sort by list and select RMSE (Validation). 11 In the Models pane, click the star icon next to the model with the lowest validation RMSE
values. The app highlights the lowest validation RMSE by outlining it in a box. In this example, the trained Matern 5/2 GPR model has the lowest validation RMSE.
Note Validation introduces some randomness into the results. Your model validation results might vary from the results shown in this example. 12 For the starred model, you can check the model performance by using various plots (for example,
response, Predicted vs. Actual, and residuals plots). In the Models pane, select the model. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery. Then, click any of the buttons in the Validation Results group to open the corresponding plot. After opening multiple plots, you can change the layout of the plots by using the Document Actions button located to the far right of the model plot tabs. For example, click the button, select the Sub-Tile option, and specify a layout. For more information on how to use and display validation plots, see “Visualize and Assess Model Performance in Regression Learner” on page 24-50.
To return to the original layout, you can click the Layout button in the Plots and Results section and select Single model (Default). 13 For the starred model, see how the model features relate to the model predictions by using
partial dependence plots (PDPs). On the Explain tab, in the Global Explanations section, click Partial Dependence. The PDP allows you to visualize the marginal effect of each predictor on the predicted response of the trained model. To compute the partial dependence values, the app uses the model trained on the 85% of observations in cars not reserved for testing. 14 Examine the relationship between the model predictors and model predictions on the training
data (that is, 85% of the observations in cars). Under Data, select Training set. Look for features that seem to contribute to model predictions. For example, under Feature, select Weight.
The blue plotted line represents the averaged partial relationship between the Weight feature and the predicted MPG response. The tick marks along the x-axis indicate the unique Weight values in the training data set. According to this model (Model 2.19), the MPG (miles per gallon) value tends to decrease as the car weight increases. Note In general, consider the distribution of values when interpreting partial dependence plots. Results tend to be more reliable in intervals where you have sufficient observations whose predictor values are spread evenly. 15 You can tune your best-performing model by removing predictors that do not seem to contribute
to model predictions. A PDP where the predicted response remains constant across all predictor values can indicate a poor predictor. In this example, none of the predictors have a PDP where the plotted line is flat. However, two predictors, Displacement and Horsepower, show a similar relationship to the model predicted response as the Weight predictor. Under Feature, first select Displacement and then select Horsepower.
16 Remove the Displacement and Horsepower predictors from the best-performing model. Create
a copy of the starred model. Right-click the model in the Models pane, and select Duplicate. Then, in the model Summary tab, expand the Feature Selection section, and clear the Select check boxes for the Displacement and Horsepower features.
17 Train the new model. In the Train section of the Learn tab, click Train All and select Train
Selected. 18 In the Models pane, click the star icon next to the new model. To group the starred models
together, open the Sort by list and select Favorites.
The model trained with fewer features, Model 3, performs slightly better than the model trained with all features, Model 2.19. 19 For each starred model, compute the RMSE of the model on the test set. First, select the model
in the Models pane. Then, on the Test tab, in the Test section, click Test Selected. 20 Compare the validation and test RMSE results for the starred models by using a table. On the
Test tab, in the Plots and Results section, click Results Table. In the Results Table tab, click the "Select columns to display" button at the top right of the table. In the Select Columns to Display dialog box, check the Select box for the Preset column, and clear the Select check boxes for the MSE (Validation), RSquared (Validation), MAE (Validation), MSE (Test), RSquared (Test), and MAE (Test) columns. Click OK.
In this example, both of the starred models perform well on the test set. 21 For the best-performing model, look at the PDPs on the test data set. Ensure that the partial
relationships meet expectations. For this example, because the model trained on fewer features still performs well on the test set, select this model (Model 3). Compare the training set and test set PDPs for the Acceleration feature and the Model 3 predicted response. In the Partial Dependence Plot tab, under Feature, select Acceleration. Under Data, select Training set and then select Test set to see each plot.
The PDPs have similar trends for the training and test data sets. However, the predicted response values vary slightly between the plots. This discrepancy might be due to a difference in the distribution of training set observations and test set observations. If you are satisfied with the best-performing model, you can export the trained model to the workspace. For more information, see “Export Model to Workspace” on page 24-65. You can also export any of the partial dependence plots you create in Regression Learner. For more information, see “Export Plots in Regression Learner App” on page 24-61.
See Also plotPartialDependence | partialDependence
Related Examples •
“Explain Model Predictions for Regression Models Trained in Regression Learner App” on page 24-114
•
“Interpret Machine Learning Models” on page 27-2
•
“Export Plots in Regression Learner App” on page 24-61
•
“Export Regression Model to Predict New Data” on page 24-65
Deploy Model Trained in Regression Learner to MATLAB Production Server
This example shows how to train a model in Regression Learner and export it for deployment to MATLAB Production Server. This workflow requires MATLAB Compiler SDK.
Choose Trained Model to Deploy
1
In the Command Window, simulate 100 observations from a regression model with four predictor variables. Create a random matrix X, whose rows correspond to observations and whose columns correspond to predictor variables. Add missing values to the matrix by randomly setting approximately 2% of the values in each column as NaNs. Create a response variable y from the variables in X.
rng("default")
numRows = 100;
numCols = 4;
X = rand(numRows,numCols);
randIdx = randi(numRows*numCols,floor(0.02*numRows)*numCols,1);
X(randIdx) = NaN;
y = 10*X(:,1) + 5*X(:,2) + 3*X(:,3) + 7*X(:,4) + 0.1*randn(numRows,1);
2
From the Command Window, open the Regression Learner app. Populate the New Session from Arguments dialog box with the predictor matrix X and the response variable y. regressionLearner(X,y)
The default validation option is 5-fold cross-validation, to protect against overfitting. For this example, do not change the default validation setting. 3
To accept the selections in the New Session from Arguments dialog box and continue, click Start Session.
4
Train all preset models. On the Learn tab, in the Models section, click the arrow to open the gallery. In the Get Started group, click All. In the Train section, click Train All and select Train All. The app trains all preset models, along with the default fine tree model, and displays the models in the Models pane. Note • If you have Parallel Computing Toolbox, then the Use Parallel button is selected by default. After you click Train All and select Train All or Train Selected, the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can continue to interact with the app while models train in parallel. • If you do not have Parallel Computing Toolbox, then the Use Background Training check box in the Train All menu is selected by default. After you select an option to train models, the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background.
The app displays a response plot for the linear regression model (model 2.1). Blue points are true values, and yellow points are predicted values. The Models pane on the left shows the validation RMSE (root mean squared error) for each model. 5
Sort the models based on the validation RMSE. In the Models pane, open the Sort by list and select RMSE (Validation). The app outlines the metric for the model (or models) with the lowest validation RMSE.
6
Select the model in the Models pane with the lowest validation RMSE.
Export Model for Deployment
1
Export the selected model for deployment to MATLAB Production Server. On the Learn tab, click Export, click Export Model and select Export Model for Deployment.
2
In the Select Project File for Model Deployment dialog box, select a location and name for your project file. For this example, use the default project name RegressionLearnerDeployedModel.prj. Click Save. The software opens the Production Server Compiler app and the autogenerated predictFunction.m file.
3
In the Compiler tab of the Production Server Compiler app, the Exported Functions section includes the files modelInformation.m and predictFunction.m. The section Additional files required for your archive to run includes the files processInputData.m and TrainedRegressionModel.mat. For an example where you must update the code in some of these files to include preprocessing steps, see “Deploy Model Trained in Classification Learner to MATLAB Production Server” on page 23-185. For this example, inspect the predictFunction.m code and close the file.
(Optional) Simulate Model Deployment
Before packaging your code for deployment to MATLAB Production Server, you can simulate the model deployment using a MATLAB client. Completing this process requires opening another instance of MATLAB. For an example that shows how to use a sample Java client for sending data to a MATLAB function deployed on the server, see “Evaluate Deployed Machine Learning Models Using Java Client” (MATLAB Production Server).
1
In the Production Server Compiler app, click the Test Client button in the Test section on the Compiler tab.
2
On the Test tab, in the Server Actions section, click the Start button. Note the address listed in the Server Address pane, which in this example is http://localhost:9910/DeployedRegressionModel.
3
Open a new instance of MATLAB. In the new MATLAB instance, the Production Server Compiler app automatically opens. Close this instance of the app.
4
In the Command Window of the new MATLAB instance, load predictor data that has the same format as the training data used in Regression Learner.
rng("default")
numRows = 100;
numCols = 4;
X = rand(numRows,numCols);
randIdx = randi(numRows*numCols,floor(0.02*numRows)*numCols,1);
X(randIdx) = NaN;
5
Send the data to MATLAB Production Server. Use the server address displayed in the Production Server Compiler app. Because X is a numeric matrix, the argument does not require further processing before being sent to MATLAB Production Server. You must convert categorical variables and tables to cell arrays and structures, respectively, before sending them to MATLAB Production Server. For an example, see “Deploy Model Trained in Classification Learner to MATLAB Production Server” on page 23-185.
jsonData = mps.json.encoderequest({X},"Nargout",1, ...
    "OutputFormat","large");
URL = "http://localhost:9910/DeployedRegressionModel/predictFunction";
options = weboptions("MediaType","application/json","Timeout",30);
response = webwrite(URL,jsonData,options);
In the original MATLAB instance, in the opened Production Server Compiler app, the MATLAB Execution Requests pane under the Test tab shows a successful request between the server and the MATLAB client. 6
In the Command Window of the new MATLAB instance, extract the predicted responses from the response variable. Convert the predicted responses to a numeric vector, and check that the values are correct.
cellResults = response.lhs.mwdata;
numericResults = arrayfun(@str2double,string(cellResults));
Note that the data type of response.lhs.mwdata changes depending on the presence of NaN values. For example, response.lhs.mwdata is a numeric vector when the predicted responses do not include NaN values. 7
In the original MATLAB instance, in the Production Server Compiler app, click Stop in the Server Actions section on the Test tab. In the Close section, click Close Test.
Package Code
1
Use the Production Server Compiler app to package your model and prediction function. On the Compiler tab, in the Package section, click the Package button.
2
In the Package dialog box, verify that the option Open output folder when process completes is selected. After the deployment process finishes, examine the generated output. • for_redistribution — Folder containing the DeployedRegressionModel.ctf file • for_testing — Folder containing the raw generated files required to create the installer • PackagingLog.html — Log file generated by MATLAB Compiler SDK
See Also Related Examples
•
“Visualize and Assess Model Performance in Regression Learner” on page 24-50
•
“Export Regression Model to Predict New Data” on page 24-65
•
“Create Deployable Archive for MATLAB Production Server” (MATLAB Production Server)
•
“Evaluate Deployed Machine Learning Models Using Java Client” (MATLAB Production Server)
•
“Execute Deployed MATLAB Functions” (MATLAB Production Server)
Export Model from Regression Learner to Experiment Manager
After training a regression model in Regression Learner, you can export the model to Experiment Manager to perform multiple experiments. By default, Experiment Manager uses Bayesian optimization to tune the model in a process similar to training optimizable models in Regression Learner. (For more information, see “Hyperparameter Optimization in Regression Learner App” on page 24-36.) Consider exporting a model to Experiment Manager when you want to do any of the following:
• Adjust hyperparameter search ranges during hyperparameter tuning.
• Change the training data.
• Adjust the preprocessing steps that precede model fitting.
• Tune hyperparameters using a different metric.
For a workflow example, see “Tune Regression Model Using Experiment Manager” on page 24-147. Note that if you have a Statistics and Machine Learning Toolbox license, you do not need a Deep Learning Toolbox license to use the Experiment Manager app.
Export Regression Model
To create an Experiment Manager experiment from a model trained in Regression Learner, select the model in the Models pane. On the Learn tab, in the Export section, click Export Model and select Create Experiment. Note This option is not supported for linear regression models. In the Create Experiment dialog box, modify the filenames or accept the default values.
The app exports the following files to Experiment Manager:
• Training function — This function trains a regression model using the model hyperparameters specified in the Experiment Manager app and records the resulting metrics and visualizations. For each trial of the experiment, the app calls the training function with a new combination of hyperparameter values, selected from the hyperparameter search ranges specified in the app. The app saves the returned trained model, which you can export to the MATLAB workspace after training is complete.
• Training data set — This .mat file contains the full data set used in Regression Learner (including training and validation data, but excluding test data). Depending on how you imported the data into Regression Learner, the data set is contained in either a table named dataTable or two separate variables named predictorMatrix and responseData.
• Conditional constraints function — For some models, a conditional constraints function is required to tune model hyperparameters using Bayesian optimization. Conditional constraints enforce one of these conditions:
• When some hyperparameters have certain values, other hyperparameters are set to given values.
• When some hyperparameters have certain values, other hyperparameters are set to NaN values (for numeric hyperparameters) or <undefined> values (for categorical hyperparameters).
For more information, see “Conditional Constraints — ConditionalVariableFcn” on page 10-40. A brief sketch of such a function appears after this list.
• Deterministic constraints function — For some models, a deterministic constraints function is required to tune model hyperparameters using Bayesian optimization. A deterministic constraints function returns a true value when a point in the hyperparameter search space is feasible (that is, the problem is valid or well defined at this point) and a false value otherwise. For more information, see “Deterministic Constraints — XConstraintFcn” on page 10-39.
After you click Create Experiment, the app opens Experiment Manager. The Experiment Manager app then opens a dialog box in which you can choose to use a new or existing project for your experiment.
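The app autogenerates the actual constraint files, but as a rough illustration, a conditional constraints function for an ensemble tuned with fitrensemble might look like the following sketch (the function name is hypothetical). It sets LearnRate to NaN whenever the Method hyperparameter is "Bag", because the learning rate applies only to boosted ensembles.
function XTable = exampleConditionalConstraints(XTable)
% XTable is a table of candidate hyperparameter values, one row per point.
% Boosting-only hyperparameters do not apply when Method is "Bag".
isBag = XTable.Method == "Bag";
XTable.LearnRate(isBag) = NaN;
end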
Select Hyperparameters
In Experiment Manager, use different hyperparameters and hyperparameter search ranges to tune your model. On the tab for your experiment, in the Hyperparameters section, click Add to add a hyperparameter to the model tuning process. In the table, double-click an entry to adjust its value.
When you use the default Bayesian optimization strategy for model tuning, specify these properties of the hyperparameters used in the experiment: • Name — Enter a valid hyperparameter name. • Range — For a real- or integer-valued hyperparameter, enter a two-element vector that gives the lower bound and upper bound of the hyperparameter. For a categorical hyperparameter, enter an array of strings or a cell array of character vectors that lists the possible values of the hyperparameter. • Type — Select real for a real-valued hyperparameter, integer for an integer-valued hyperparameter, or categorical for a categorical hyperparameter. • Transform — Select none to use no transform or log to use a logarithmic transform. When you select log, the hyperparameter values must be positive. With this setting, the Bayesian optimization algorithm models the hyperparameter on a logarithmic scale. The following table provides information on the hyperparameters you can tune in the app for each model type. Model Type
Fitting Function
Hyperparameters
Tree
fitrtree
MaxNumSplits, MinLeafSize For more information, see OptimizeHyperparameters.
Gaussian Process Regression
fitrgp
BasisFunction, KernelFunction, KernelScale (KernelParameters), Sigma, Standardize For more information, see OptimizeHyperparameters.
Model Type
Fitting Function
Hyperparameters
SVM
fitrsvm
BoxConstraint, Epsilon, KernelFunction, KernelScale, PolynomialOrder, Standardize For more information, see OptimizeHyperparameters.
Efficient Linear
fitrlinear
Lambda, Learner, Regularization For more information, see OptimizeHyperparameters.
Kernel
fitrkernel
Epsilon, KernelScale, Lambda, Learner, NumExpansionDimensions, Standardize For more information, see OptimizeHyperparameters.
Ensemble
fitrensemble
LearnRate, MaxNumSplits, Method, MinLeafSize, NumLearningCycles, NumVariablesToSample For more information, see OptimizeHyperparameters.
Neural Network
fitrnet
Activations, Lambda, LayerBiasesInitializer, LayerSizes, LayerWeightsInitializer, Standardize For more information, see OptimizeHyperparameters.
In the MATLAB Command Window, you can use the hyperparameters function to get more information about the hyperparameters available for your model and their default search ranges. Specify the fitting function, the training predictor data, and the training response variable in the call to the hyperparameters function.
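For example, the following sketch lists the tunable variables and their default search ranges for a regression tree, using an illustrative numeric matrix built from the carbig variables; substitute the fitting function and training data that match your exported experiment.
load carbig
data = rmmissing([Acceleration Horsepower Weight Model_Year MPG]); % remove rows with NaNs
params = hyperparameters("fitrtree",data(:,1:4),data(:,5));        % optimizableVariable array
for ii = 1:numel(params)
    disp(params(ii))   % name, default search range, type, and transform of each variable
end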
(Optional) Customize Experiment
In your experiment, you can change more than the model hyperparameters. In most cases, experiment customization requires editing the training function file before running the experiment. For example, to change the training data set, preprocessing steps, returned metrics, or generated visualizations, you must update the training function file. To edit the training function, click Edit in the Training Function section on the experiment tab. For an example that includes experiment customization, see “Tune Regression Model Using Experiment Manager” on page 24-147.
Some experiment customization steps do not require editing the training function file. For example, you can change the strategy for model tuning, adjust the Bayesian optimization options, or change the metric used to perform Bayesian optimization.
Change Strategy for Model Tuning
Instead of using Bayesian optimization to search for the best hyperparameter values, you can sweep through a range of hyperparameter values. On the tab for your experiment, in the Hyperparameters section, set Strategy to Exhaustive Sweep. In the hyperparameter table, enter the names and values of the hyperparameters to use in the experiment. Hyperparameter values must be scalars or vectors with numeric, logical, or string values, or cell arrays of character vectors. For example, these are valid hyperparameter specifications, depending on your model:
• 0.01
• 0.01:0.01:0.05
• [0.01 0.02 0.04 0.08]
• ["Bag","LSBoost"]
• {'gaussian','linear','polynomial'}
When you run the experiment, Experiment Manager trains a model using every combination of the hyperparameter values specified in the table.
Adjust Bayesian Optimization Options
When you use Bayesian optimization, you can specify the duration of your experiment. On the tab for your experiment, in the Hyperparameters section, ensure that Strategy is set to Bayesian Optimization. In the Bayesian Optimization Options section, enter the maximum time in seconds and the maximum number of trials to run. Note that the actual run time and number of trials in your experiment can exceed these settings because Experiment Manager checks these options only when a trial finishes executing. You can also specify the acquisition function for the Bayesian optimization algorithm. In the Bayesian Optimization Options section, click Advanced Options. Select an acquisition function from the Acquisition Function Name list. The default value for this option is expected-improvement-plus. For more information, see “Acquisition Function Types” on page 10-3. Note that if you edit the training function file so that a new deterministic constraints function or conditional constraints function is required, you can specify the new function names in the Advanced Options section.
Change Metric Used to Perform Bayesian Optimization
By default, the app uses Bayesian optimization to try to find the combination of hyperparameter values that minimizes the validation RMSE. You can specify to maximize the validation R-squared value instead. On the tab for your experiment, in the Metrics section, specify to optimize the ValidationRSquared value. Set the Direction to Maximize. Note that if you edit the training function file to return another metric, you can specify it in the Metrics section. Ensure that the Direction is appropriate for the given metric.
Run Experiment
When you are ready to run your experiment, you can run it either sequentially or in parallel.
• If you have Parallel Computing Toolbox, Experiment Manager can perform computations in parallel. On the Experiment Manager tab, in the Execution section, select Simultaneous from the Mode list. Note Parallel computations with a thread pool are not supported in Experiment Manager. • Otherwise, use the default Mode option of Sequential. On the Experiment Manager tab, in the Run section, click Run.
See Also Apps Experiment Manager | Regression Learner
Related Examples
•
“Tune Regression Model Using Experiment Manager” on page 24-147
•
“Hyperparameter Optimization in Regression Learner App” on page 24-36
•
“Manage Experiments” (Deep Learning Toolbox)
Tune Regression Model Using Experiment Manager
This example shows how to use Experiment Manager to optimize a machine learning regression model. The goal is to create a regression model for the carbig data set that has minimal cross-validation loss. Begin by using the Regression Learner app to train all available regression models on the training data. Then, improve the best model by exporting it to Experiment Manager. In Experiment Manager, use the default settings to minimize the cross-validation loss (that is, minimize the cross-validation root mean squared error). Investigate options that help improve the loss, and perform more detailed experiments. For example, fix some hyperparameters at their best values, add useful hyperparameters to the model tuning process, adjust hyperparameter search ranges, adjust the training data, and customize the visualizations. The final result is a model with better test set performance. For more information on when to export models from Regression Learner to Experiment Manager, see “Export Model from Regression Learner to Experiment Manager” on page 24-141.
Load and Partition Data
1
In the MATLAB Command Window, load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s.
load carbig
The goal is to create a regression model that predicts the miles per gallon value for a car, based on the car's other measurements. 2
Categorize the cars based on whether they were made in the USA. Origin = categorical(cellstr(Origin)); Origin = mergecats(Origin,["France","Japan","Germany", ... "Sweden","Italy","England"],"NotUSA");
3
Create a table containing the predictor variables Acceleration, Displacement, and so on, as well as the response variable MPG (miles per gallon). cars = table(Acceleration,Displacement,Horsepower, ... Model_Year,Origin,Weight,MPG);
4
Remove rows of cars where the table has missing values. cars = rmmissing(cars);
5
Partition the data into two sets. Use approximately 80% of the observations for model training in Regression Learner, and reserve 20% of the observations for a final test set. Use cvpartition to partition the data. rng("default") % For reproducibility c = cvpartition(height(cars),"Holdout",0.2); trainingIndices = training(c); testIndices = test(c); carsTrain = cars(trainingIndices,:); carsTest = cars(testIndices,:);
Train Models in Regression Learner
1
If you have Parallel Computing Toolbox, the Regression Learner app can train models in parallel. Training models in parallel is typically faster than training models in series. If you do not have Parallel Computing Toolbox, skip to the next step. Before opening the app, start a parallel pool of process workers by using the parpool function. parpool("Processes")
By starting a parallel pool of process workers rather than thread workers, you ensure that Experiment Manager can use the same parallel pool later. Note Parallel computations with a thread pool are not supported in Experiment Manager. 2
Open Regression Learner. Click the Apps tab, and then click the arrow at the right of the Apps section to open the apps gallery. In the Machine Learning and Deep Learning group, click Regression Learner.
3
On the Learn tab, in the File section, click New Session and select From Workspace.
4
In the New Session from Workspace dialog box, select the carsTrain table from the Data Set Variable list. The app selects the response and predictor variables. The default response variable is MPG.
5
In the Validation section, specify to use 3-fold cross-validation rather than the default 5-fold cross-validation.
6
In the Test section, click the check box to set aside a test data set. Specify 25 percent of the imported data as a test set.
7
To accept the options and continue, click Start Session.
8
Visually inspect the predictors in the open Response Plot. In the X-axis section, select each predictor from the X list. Note that some of the predictors, such as Displacement, Horsepower, and Weight, display similar trends.
9
Before training models, use principal component analysis (PCA) to reduce the dimensionality of the predictor space. PCA linearly transforms the numeric predictors to remove redundant dimensions. On the Learn tab, in the Options section, click PCA. In the Default PCA Options dialog box, click the check box to enable PCA. Select Specify number of components as the component reduction criterion, and specify 4 as the number of numeric components. Click Save and Apply.
10 To obtain the best model, train all preset models. On the Learn tab, in the Models section, click the arrow to open the gallery. In the Get Started group, click All. In the Train section, click Train All and select Train All. The app trains one of each preset model type, along with the default fine tree model, and displays the models in the Models pane.
11 To find the best result, sort the trained models based on the validation root mean squared error (RMSE). In the Models pane, open the Sort by list and select RMSE (Validation).
Note Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example.
Assess Best Model Performance
1
For the model with the lowest RMSE, plot the predicted response versus the true response to see how well the regression model makes predictions for different response values. Select the Matern 5/2 GPR model in the Models pane. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Predicted vs. Actual (Validation) in the Validation Results group.
Overall, the GPR (Gaussian process regression) model performs well. Most predictions are near the diagonal line. 2
View the residuals plot. On the Learn tab, in the Plots and Results section, click the arrow to open the gallery, and then click Residuals (Validation) in the Validation Results group. The residuals plot displays the difference between the true and predicted responses.
The residuals are scattered roughly symmetrically around 0. 3
Check the test set performance of the model. On the Test tab, in the Test section, click Test Selected. The app computes the test set performance of the model trained on the full data set, including training and validation data.
4
Compare the validation and test RMSE for the model. On the model Summary tab, compare the RMSE (Validation) value under Training Results to the RMSE (Test) value under Test Results. In this example, the validation RMSE overestimates the performance of the model on the test set.
Export Model to Experiment Manager
1
To try to improve the predictive performance of the model, export it to Experiment Manager. On the Learn tab, in the Export section, click Export Model and select Create Experiment. The Create Experiment dialog box opens.
2
In the Create Experiment dialog box, click Create Experiment. The app opens Experiment Manager and a new dialog box.
3
In the dialog box, choose a new or existing project for your experiment. For this example, create a new project, and specify TrainGPRModelProject as the filename in the Specify Project Folder Name dialog box.
Run Experiment with Default Hyperparameters
1
Run the experiment either sequentially or in parallel. Note • If you have Parallel Computing Toolbox, save time by running the experiment in parallel. On the Experiment Manager tab, in the Execution section, select Simultaneous from the Mode list. • Otherwise, use the default Mode option of Sequential.
On the Experiment Manager tab, in the Run section, click Run. Experiment Manager opens a new tab that displays the results of the experiment. At each trial, the app trains a model with a different combination of hyperparameter values, as specified in the Hyperparameters table in the Experiment1 tab. 2
After the app runs the experiment, check the results. In the table of results, click the arrow for the ValidationRMSE column and select Sort in Ascending Order.
Notice that the app tunes the Sigma and Standardize hyperparameters by default. 3
Check the predicted vs. actual plot for the model with the lowest RMSE. On the Experiment Manager tab, in the Review Results section, click Predicted vs. Actual (Validation). In the Visualizations pane, the app displays the plot for the model. To better see the plot, drag the Visualizations pane below the Experiment Browser pane.
For this model, the predicted values are close to the true response values. However, the model tends to underestimate the true response for values between 40 and 50 miles per gallon.
Adjust Hyperparameters and Hyperparameter Values
1
Standardizing the numeric predictors before training seems best for this data set. To try to obtain a better model, specify the Standardize hyperparameter value as true and then rerun the experiment. Click the Experiment1 tab. In the Hyperparameters table, select the row for the Standardize hyperparameter. Then click Delete.
2
Open the training function file. In the Training Function section, click Edit. The app opens the Experiment1_training1.mlx file.
3
In the file, search for the lines of code that use the fitrgp function. This function is used to create GPR models. Standardize the predictor data by using a name-value argument. In this case, adjust the four calls to fitrgp by adding 'Standardize',true as follows.

regressionGP = fitrgp(predictors, response, ...
    paramNameValuePairs{:}, 'KernelParameters', kernelParameters, ...
    'Standardize', true);

regressionGP = fitrgp(predictors, response, ...
    paramNameValuePairs{:}, 'Standardize', true);

regressionGP = fitrgp(trainingPredictors, trainingResponse, ...
    paramNameValuePairs{:}, 'KernelParameters', kernelParameters, ...
    'Standardize', true);

regressionGP = fitrgp(trainingPredictors, trainingResponse, ...
    paramNameValuePairs{:}, 'Standardize', true);
Save the code changes, and close the file. 4
On the Experiment Manager tab, in the Run section, click Run.
5
To further vary the models evaluated during the experiment, add hyperparameters to the model tuning process. In the MATLAB Command Window, use the hyperparameters function to see which hyperparameters you can tune for your model. Specify the training data set to see the default hyperparameter ranges. Enter the following code.

load("trainingDataTable1.mat")
info = hyperparameters("fitrgp",dataTable,"MPG");
for i = 1:length(info)
    disp(i);disp(info(i))
end

1
  optimizableVariable with properties:
         Name: 'Sigma'
        Range: [1.0000e-04 78.9730]
         Type: 'real'
    Transform: 'log'
     Optimize: 1

2
  optimizableVariable with properties:
         Name: 'BasisFunction'
        Range: {'constant'  'none'  'linear'  'pureQuadratic'}
         Type: 'categorical'
    Transform: 'none'
     Optimize: 0

3
  optimizableVariable with properties:
         Name: 'KernelFunction'
        Range: {1×10 cell}
         Type: 'categorical'
    Transform: 'none'
     Optimize: 0

4
  optimizableVariable with properties:
         Name: 'KernelScale'
        Range: [1.0000e-03 1000]
         Type: 'real'
    Transform: 'log'
     Optimize: 0

5
  optimizableVariable with properties:
         Name: 'Standardize'
        Range: {'true'  'false'}
         Type: 'categorical'
    Transform: 'none'
     Optimize: 1
6 Add the BasisFunction, KernelFunction, and KernelScale hyperparameters in Experiment Manager. For each hyperparameter, on the Experiment1 tab, in the Hyperparameters section, click Add. Edit the row entries to match the output of the hyperparameters function. In particular, specify the BasisFunction range as ["constant","none","linear"] and the KernelFunction range as ["ardexponential","ardmatern32","ardmatern52","ardrationalquadratic","ardsquaredexponential","exponential","matern32","matern52","rationalquadratic","squaredexponential"]. Because the training data set includes a categorical predictor, omit the pureQuadratic value from the list of basis functions.
For more information on the hyperparameters you can tune for your model, see “Export Model from Regression Learner to Experiment Manager” on page 24-141. 7
For better results when tuning several hyperparameters, increase the number of trials. On the Experiment1 tab, in the Bayesian Optimization Options section, specify 60 as the maximum number of trials.
8
On the Experiment Manager tab, in the Run section, click Run.
Specify Training Data
1
Before running the experiment again, specify to use all the observations in carsTrain. Because you reserved some observations for testing when you imported the training data into Regression Learner, all experiments so far have used only 75% of the observations in the carsTrain data set. Save the carsTrain data set as the file fullTrainingData.mat in the TrainGPRModelProject folder, which contains the experiment files. To do so, right-click the carsTrain variable name in the MATLAB workspace, and click Save As. In the dialog box, specify the filename and location, and then click Save.
2
On the Experiment1 tab, in the Training Function section, click Edit.
3
In the Experiment1_training1.mlx file, search for the load command. Specify to use the full carsTrain data set for model training by adjusting the code as follows. % Load training data fileData = load("fullTrainingData.mat"); trainingData = fileData.carsTrain;
4
On the Experiment1 tab, in the Description section, change the number of observations to 314, which is the number of rows in the carsTrain table.
5
On the Experiment Manager tab, in the Run section, click Run.
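As a side note to step 1 of this procedure, the same MAT-file can be created from the command line instead of with Save As. This is a small optional sketch, added here for illustration; run it with the TrainGPRModelProject folder as the current folder so the file sits with the experiment files.

% Command-line alternative to right-click > Save As in step 1.
% Saves carsTrain to fullTrainingData.mat in the current folder.
save("fullTrainingData.mat","carsTrain")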
Add Residuals Plot
1
You can add visualizations for Experiment Manager to return at each trial. In this case, specify to create a residuals plot. On the Experiment1 tab, in the Training Function section, click Edit.
2
In the Experiment1_training1.mlx file, search for the plot function. The surrounding code creates the validation predicted vs. actual plot for each trained model. Enter the following code to create a residuals plot. Ensure that the residuals plot code is within the trainRegressionModel function definition.

% Create validation residuals plot
residuals = validationResponse - validationPredictions;
f = figure("Name","Residuals (Validation)");
resAxes = axes(f);
hold(resAxes,"on")
plot(resAxes,validationResponse,residuals,"ko", ...
    "MarkerFaceColor","#D95319")
yline(resAxes,0)
xlabel(resAxes,"True response")
ylabel(resAxes,"Residuals (MPG)")
title(resAxes,"Predictions: GPR")
3
On the Experiment Manager tab, in the Run section, click Run.
4
In the table of results, click the arrow for the ValidationRMSE column and select Sort in Ascending Order.
5
Check the predicted vs. actual plot and the residuals plot for the model with the lowest RMSE. On the Experiment Manager tab, in the Review Results section, click Predicted vs. Actual (Validation). In the Visualizations pane, the app displays the plot for the model.
6
On the Experiment Manager tab, in the Review Results section, click Residuals (Validation). In the Visualizations pane, the app displays the plot for the model.
Both plots indicate that the model generally performs well.
Export and Use Final Model
1
You can export a model trained in Experiment Manager to the MATLAB workspace. Select the best-performing model from the most recently run experiment. On the Experiment Manager tab, in the Export section, click Export and select Training Output.
2
In the Export dialog box, change the workspace variable name to finalGPRModel and click OK. The new variable appears in your workspace.
3
Use the exported finalGPRModel structure to make predictions using new data. You can use the structure in the same way that you use any trained model exported from the Regression Learner app. For more information, see “Make Predictions for New Data Using Exported Model” on page 24-65. In this case, predict the response values for the test data in carsTest. testPredictedY = finalGPRModel.predictFcn(carsTest);
4
Compute the test RMSE using the predicted response values. testRSME = sqrt((1/length(testPredictedY))* ... sum((carsTest.MPG - testPredictedY).^2)) testRSME = 2.6647
The test RMSE is close to the validation RMSE computed in Experiment Manager (2.6894). Also, the test RMSE for this tuned model is smaller than the test RMSE for the Matern 5/2 GPR model in Regression Learner (3.0267). However, keep in mind that the tuned model uses observations in carsTest as test data and the Regression Learner model uses a subset of the observations in carsTrain as test data. 5
Create a predicted vs. actual plot and a residuals plot using the true test data response and the predicted response. figure line([min(carsTest.MPG) max(carsTest.MPG)], ... [min(carsTest.MPG) max(carsTest.MPG)], ..., "Color","black","LineWidth",2); hold on plot(carsTest.MPG,testPredictedY,"ko", ... "MarkerFaceColor","#0072BD"); hold off xlabel("True response") ylabel("Predicted response")
figure residuals = carsTest.MPG - testPredictedY; plot(carsTest.MPG,residuals,"ko", ... "MarkerFaceColor","#D95319") hold on yline(0) hold off xlabel("True response") ylabel("Residuals (MPG)")
Both plots indicate that the model performs well on the test set.
See Also Apps Experiment Manager | Regression Learner Functions fitrgp
Related Examples
• “Export Model from Regression Learner to Experiment Manager” on page 24-141
• “Export Regression Model to Predict New Data” on page 24-65
• “Manage Experiments” (Deep Learning Toolbox)
25 Support Vector Machines

• “Support Vector Machines for Binary Classification” on page 25-2
• “Understanding Support Vector Machine Regression” on page 25-31
Support Vector Machines for Binary Classification

In this section...
“Understanding Support Vector Machines” on page 25-2
“Using Support Vector Machines” on page 25-6
“Train SVM Classifiers Using a Gaussian Kernel” on page 25-8
“Train SVM Classifier Using Custom Kernel” on page 25-11
“Optimize Classifier Fit Using Bayesian Optimization” on page 25-15
“Plot Posterior Probability Regions for SVM Classification Models” on page 25-24
“Analyze Images Using Linear Support Vector Machines” on page 25-26
Understanding Support Vector Machines
• “Separable Data” on page 25-2
• “Nonseparable Data” on page 25-4
• “Nonlinear Transformation with Kernels” on page 25-5

Separable Data

You can use a support vector machine (SVM) when your data has exactly two classes. An SVM classifies data by finding the best hyperplane that separates all data points of one class from those of the other class. The best hyperplane for an SVM means the one with the largest margin between the two classes. Margin means the maximal width of the slab parallel to the hyperplane that has no interior data points.

The support vectors are the data points that are closest to the separating hyperplane; these points are on the boundary of the slab. The following figure illustrates these definitions, with + indicating data points of type 1, and – indicating data points of type –1.
Mathematical Formulation: Primal
This discussion follows Hastie, Tibshirani, and Friedman [1] and Christianini and Shawe-Taylor [2].
The data for training is a set of points (vectors) $x_j$ along with their categories $y_j$. For some dimension $d$, the $x_j \in R^d$, and the $y_j = \pm 1$. The equation of a hyperplane is

$$f(x) = x'\beta + b = 0$$

where $\beta \in R^d$ and $b$ is a real number.

The following problem defines the best separating hyperplane (i.e., the decision boundary). Find $\beta$ and $b$ that minimize $\|\beta\|$ such that for all data points $(x_j, y_j)$,

$$y_j f(x_j) \ge 1 .$$

The support vectors are the $x_j$ on the boundary, those for which $y_j f(x_j) = 1$.

For mathematical convenience, the problem is usually given as the equivalent problem of minimizing $\tfrac{1}{2}\|\beta\|^2$. This is a quadratic programming problem. The optimal solution $(\hat{\beta}, \hat{b})$ enables classification of a vector $z$ as follows:

$$\operatorname{class}(z) = \operatorname{sign}\big(z'\hat{\beta} + \hat{b}\big) = \operatorname{sign}\big(\hat{f}(z)\big).$$

$\hat{f}(z)$ is the classification score and represents the distance $z$ is from the decision boundary.

Mathematical Formulation: Dual
It is computationally simpler to solve the dual quadratic programming problem. To obtain the dual, take positive Lagrange multipliers $\alpha_j$ multiplied by each constraint, and subtract from the objective function:

$$L_P = \frac{1}{2}\beta'\beta - \sum_j \alpha_j \big( y_j (x_j'\beta + b) - 1 \big),$$

where you look for a stationary point of $L_P$ over $\beta$ and $b$. Setting the gradient of $L_P$ to 0, you get

$$\beta = \sum_j \alpha_j y_j x_j , \qquad 0 = \sum_j \alpha_j y_j . \qquad (25\text{-}1)$$

Substituting into $L_P$, you get the dual $L_D$:

$$L_D = \sum_j \alpha_j - \frac{1}{2}\sum_j \sum_k \alpha_j \alpha_k y_j y_k x_j' x_k,$$

which you maximize over $\alpha_j \ge 0$. In general, many $\alpha_j$ are 0 at the maximum. The nonzero $\alpha_j$ in the solution to the dual problem define the hyperplane, as seen in “Equation 25-1”, which gives $\beta$ as the sum of $\alpha_j y_j x_j$. The data points $x_j$ corresponding to nonzero $\alpha_j$ are the support vectors.

The derivative of $L_D$ with respect to a nonzero $\alpha_j$ is 0 at an optimum. This gives $y_j f(x_j) - 1 = 0$. In particular, this gives the value of $b$ at the solution, by taking any $j$ with nonzero $\alpha_j$.
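For concreteness, the value of $b$ follows from that condition. Because $y_j = \pm 1$ for a support vector with nonzero $\alpha_j$, a short derivation (added here for clarity) is:

$$y_j\,(x_j'\beta + b) = 1 \;\Longrightarrow\; x_j'\beta + b = \frac{1}{y_j} = y_j \;\Longrightarrow\; b = y_j - x_j'\beta .$$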
The dual is a standard quadratic programming problem. For example, the Optimization Toolbox quadprog solver solves this type of problem.

Nonseparable Data

Your data might not allow for a separating hyperplane. In that case, SVM can use a soft margin, meaning a hyperplane that separates many, but not all data points.

There are two standard formulations of soft margins. Both involve adding slack variables $\xi_j$ and a penalty parameter $C$.

• The L1-norm problem is:

$$\min_{\beta,\, b,\, \xi} \left( \frac{1}{2}\beta'\beta + C \sum_j \xi_j \right)$$

such that

$$y_j f(x_j) \ge 1 - \xi_j , \qquad \xi_j \ge 0 .$$

The L1-norm refers to using $\xi_j$ as slack variables instead of their squares. The three solver options SMO, ISDA, and L1QP of fitcsvm minimize the L1-norm problem.

• The L2-norm problem is:

$$\min_{\beta,\, b,\, \xi} \left( \frac{1}{2}\beta'\beta + C \sum_j \xi_j^2 \right)$$

subject to the same constraints.

In these formulations, you can see that increasing $C$ places more weight on the slack variables $\xi_j$, meaning the optimization attempts to make a stricter separation between classes. Equivalently, reducing $C$ towards 0 makes misclassification less important.

Mathematical Formulation: Dual
For easier calculations, consider the L1 dual problem to this soft-margin formulation. Using Lagrange multipliers $\mu_j$, the function to minimize for the L1-norm problem is:

$$L_P = \frac{1}{2}\beta'\beta + C \sum_j \xi_j - \sum_j \alpha_j \big( y_j f(x_j) - (1 - \xi_j) \big) - \sum_j \mu_j \xi_j,$$

where you look for a stationary point of $L_P$ over $\beta$, $b$, and positive $\xi_j$. Setting the gradient of $L_P$ to 0, you get

$$\beta = \sum_j \alpha_j y_j x_j , \qquad \sum_j \alpha_j y_j = 0 , \qquad \alpha_j = C - \mu_j , \qquad \alpha_j,\, \mu_j,\, \xi_j \ge 0 .$$

These equations lead directly to the dual formulation:

$$\max_{\alpha} \; \sum_j \alpha_j - \frac{1}{2}\sum_j \sum_k \alpha_j \alpha_k y_j y_k x_j' x_k$$

subject to the constraints

$$\sum_j y_j \alpha_j = 0 , \qquad 0 \le \alpha_j \le C .$$

The final set of inequalities, $0 \le \alpha_j \le C$, shows why $C$ is sometimes called a box constraint. $C$ keeps the allowable values of the Lagrange multipliers $\alpha_j$ in a “box”, a bounded region.

The gradient equation for $b$ gives the solution $b$ in terms of the set of nonzero $\alpha_j$, which correspond to the support vectors.

You can write and solve the dual of the L2-norm problem in an analogous manner. For details, see Christianini and Shawe-Taylor [2], Chapter 6.

fitcsvm Implementation
Both dual soft-margin problems are quadratic programming problems. Internally, fitcsvm has several different algorithms for solving the problems.

• For one-class or binary classification, if you do not set a fraction of expected outliers in the data (see OutlierFraction), then the default solver is Sequential Minimal Optimization (SMO). SMO minimizes the one-norm problem by a series of two-point minimizations. During optimization, SMO respects the linear constraint $\sum_i \alpha_i y_i = 0$, and explicitly includes the bias term in the model. SMO is relatively fast. For more details on SMO, see [3].
• For binary classification, if you set a fraction of expected outliers in the data, then the default solver is the Iterative Single Data Algorithm. Like SMO, ISDA solves the one-norm problem. Unlike SMO, ISDA minimizes by a series of one-point minimizations, does not respect the linear constraint, and does not explicitly include the bias term in the model. For more details on ISDA, see [4].
• For one-class or binary classification, and if you have an Optimization Toolbox license, you can choose to use quadprog to solve the one-norm problem. quadprog uses a good deal of memory, but solves quadratic programs to a high degree of precision. For more details, see “Quadratic Programming Definition” (Optimization Toolbox).

Nonlinear Transformation with Kernels

Some binary classification problems do not have a simple hyperplane as a useful separating criterion. For those problems, there is a variant of the mathematical approach that retains nearly all the simplicity of an SVM separating hyperplane.

This approach uses these results from the theory of reproducing kernels:

• There is a class of functions $G(x_1,x_2)$ with the following property. There is a linear space $S$ and a function $\varphi$ mapping $x$ to $S$ such that

$$G(x_1,x_2) = \langle \varphi(x_1), \varphi(x_2) \rangle . \qquad (25\text{-}2)$$

The dot product takes place in the space $S$.
• This class of functions includes:

  • Polynomials: For some positive integer $p$,
    $$G(x_1,x_2) = (1 + x_1'x_2)^p . \qquad (25\text{-}3)$$
  • Radial basis function (Gaussian):
    $$G(x_1,x_2) = \exp\big(-\|x_1 - x_2\|^2\big) . \qquad (25\text{-}4)$$
  • Multilayer perceptron or sigmoid (neural network): For a positive number $p_1$ and a negative number $p_2$,
    $$G(x_1,x_2) = \tanh(p_1 x_1'x_2 + p_2) . \qquad (25\text{-}5)$$

Note
• Not every set of $p_1$ and $p_2$ yields a valid reproducing kernel.
• fitcsvm does not support the sigmoid kernel. Instead, you can define the sigmoid kernel and specify it by using the 'KernelFunction' name-value pair argument. For details, see “Train SVM Classifier Using Custom Kernel” on page 25-11.

The mathematical approach using kernels relies on the computational method of hyperplanes. All the calculations for hyperplane classification use nothing more than dot products. Therefore, nonlinear kernels can use identical calculations and solution algorithms, and obtain classifiers that are nonlinear. The resulting classifiers are hypersurfaces in some space $S$, but the space $S$ does not have to be identified or examined.
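As a brief illustration (a sketch added for clarity, not part of the original text), you can select the built-in kernels through the 'KernelFunction' name-value argument of fitcsvm; here X and Y stand for placeholder predictor and label data:

% Illustrative sketch: choosing built-in kernels in fitcsvm.
% X (predictor matrix) and Y (class labels) are placeholders for your own data.
MdlLinear = fitcsvm(X,Y);                                                     % Linear kernel (default for two-class learning)
MdlPoly   = fitcsvm(X,Y,'KernelFunction','polynomial','PolynomialOrder',3);  % Polynomial kernel
MdlGauss  = fitcsvm(X,Y,'KernelFunction','gaussian','KernelScale','auto');   % Gaussian (RBF) kernel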
Using Support Vector Machines

As with any supervised learning model, you first train a support vector machine, and then cross validate the classifier. Use the trained machine to classify (predict) new data. In addition, to obtain satisfactory predictive accuracy, you can use various SVM kernel functions, and you must tune the parameters of the kernel functions.

• “Training an SVM Classifier” on page 25-6
• “Classifying New Data with an SVM Classifier” on page 25-7
• “Tuning an SVM Classifier” on page 25-7

Training an SVM Classifier

Train, and optionally cross validate, an SVM classifier using fitcsvm. The most common syntax is:

SVMModel = fitcsvm(X,Y,'KernelFunction','rbf',...
    'Standardize',true,'ClassNames',{'negClass','posClass'});
The inputs are:
• X — Matrix of predictor data, where each row is one observation, and each column is one predictor.
• Y — Array of class labels with each row corresponding to the value of the corresponding row in X. Y can be a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors.
• KernelFunction — The default value is 'linear' for two-class learning, which separates the data by a hyperplane. The value 'gaussian' (or 'rbf') is the default for one-class learning, and specifies to use the Gaussian (or radial basis function) kernel. An important step to successfully train an SVM classifier is to choose an appropriate kernel function.
• Standardize — Flag indicating whether the software should standardize the predictors before training the classifier.
• ClassNames — Distinguishes between the negative and positive classes, or specifies which classes to include in the data. The negative class is the first element (or row of a character array), e.g., 'negClass', and the positive class is the second element (or row of a character array), e.g., 'posClass'. ClassNames must be the same data type as Y. It is good practice to specify the class names, especially if you are comparing the performance of different classifiers.

The resulting, trained model (SVMModel) contains the optimized parameters from the SVM algorithm, enabling you to classify new data. For more name-value pairs you can use to control the training, see the fitcsvm reference page.

Classifying New Data with an SVM Classifier

Classify new data using predict. The syntax for classifying new data using a trained SVM classifier (SVMModel) is:

[label,score] = predict(SVMModel,newX);
The resulting vector, label, represents the classification of each row in X. score is an n-by-2 matrix of soft scores. Each row corresponds to a row in X, which is a new observation. The first column contains the scores for the observations being classified in the negative class, and the second column contains the scores for the observations being classified in the positive class.

To estimate posterior probabilities rather than scores, first pass the trained SVM classifier (SVMModel) to fitPosterior, which fits a score-to-posterior-probability transformation function to the scores. The syntax is:

ScoreSVMModel = fitPosterior(SVMModel,X,Y);
The property ScoreTransform of the classifier ScoreSVMModel contains the optimal transformation function. Pass ScoreSVMModel to predict. Rather than returning the scores, the output argument score contains the posterior probabilities of an observation being classified in the negative (column 1 of score) or positive (column 2 of score) class.

Tuning an SVM Classifier

Use the 'OptimizeHyperparameters' name-value pair argument of fitcsvm to find parameter values that minimize the cross-validation loss. The eligible parameters are 'BoxConstraint', 'KernelFunction', 'KernelScale', 'PolynomialOrder', and 'Standardize'. For an example, see “Optimize Classifier Fit Using Bayesian Optimization” on page 25-15. Alternatively, you can use the bayesopt function, as shown in “Optimize Cross-Validated Classifier Using bayesopt” on page 10-46. The bayesopt function allows more flexibility to customize optimization. You can use the bayesopt function to optimize any parameters, including parameters that are not eligible to optimize when you use the fitcsvm function.
You can also try tuning parameters of your classifier manually according to this scheme:
1 Pass the data to fitcsvm, and set the name-value pair argument 'KernelScale','auto'. Suppose that the trained SVM model is called SVMModel. The software uses a heuristic procedure to select the kernel scale. The heuristic procedure uses subsampling. Therefore, to reproduce results, set a random number seed using rng before training the classifier.
2 Cross validate the classifier by passing it to crossval. By default, the software conducts 10-fold cross validation.
3 Pass the cross-validated SVM model to kfoldLoss to estimate and retain the classification error.
4 Retrain the SVM classifier, but adjust the 'KernelScale' and 'BoxConstraint' name-value pair arguments.
  • BoxConstraint — One strategy is to try a geometric sequence of the box constraint parameter. For example, take 11 values, from 1e-5 to 1e5 by a factor of 10. Increasing BoxConstraint might decrease the number of support vectors, but also might increase training time.
  • KernelScale — One strategy is to try a geometric sequence of the RBF sigma parameter scaled at the original kernel scale. Do this by:
    a Retrieving the original kernel scale, e.g., ks, using dot notation: ks = SVMModel.KernelParameters.Scale.
    b Using as new kernel scales factors of the original. For example, multiply ks by the 11 values 1e-5 to 1e5, increasing by a factor of 10.
Choose the model that yields the lowest classification error. You might want to further refine your parameters to obtain better accuracy. Start with your initial parameters and perform another cross-validation step, this time using a factor of 1.2. The sketch after this scheme illustrates one way to implement the search.
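The following code is a minimal sketch of this manual search, assuming a predictor matrix X and a label vector Y are already in the workspace; it is added here for illustration and is not part of the original documentation.

% Manual grid search over BoxConstraint and KernelScale (illustrative sketch).
rng(1)                                   % For a reproducible kernel scale heuristic
SVMModel = fitcsvm(X,Y,'KernelFunction','rbf','KernelScale','auto');
ks = SVMModel.KernelParameters.Scale;    % Heuristic kernel scale from step 1

boxVals = 10.^(-5:5);                    % 11 box constraint values, 1e-5 to 1e5
scaleFactors = 10.^(-5:5);               % 11 factors applied to ks
cvLoss = zeros(numel(boxVals),numel(scaleFactors));

for i = 1:numel(boxVals)
    for j = 1:numel(scaleFactors)
        Mdl = fitcsvm(X,Y,'KernelFunction','rbf', ...
            'BoxConstraint',boxVals(i),'KernelScale',ks*scaleFactors(j));
        CVMdl = crossval(Mdl);           % 10-fold cross-validation by default
        cvLoss(i,j) = kfoldLoss(CVMdl);  % Cross-validated classification error
    end
end

[minLoss,idx] = min(cvLoss(:));          % Choose the model with the lowest error
[iBest,jBest] = ind2sub(size(cvLoss),idx);
fprintf('Lowest loss %.4f at BoxConstraint = %g, KernelScale = %g\n', ...
    minLoss,boxVals(iBest),ks*scaleFactors(jBest))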
Train SVM Classifiers Using a Gaussian Kernel

This example shows how to generate a nonlinear classifier with Gaussian kernel function. First, generate one class of points inside the unit disk in two dimensions, and another class of points in the annulus from radius 1 to radius 2. Then, generate a classifier based on the data with the Gaussian radial basis function kernel. The default linear classifier is obviously unsuitable for this problem, since the model is circularly symmetric. Set the box constraint parameter to Inf to make a strict classification, meaning no misclassified training points. Other kernel functions might not work with this strict box constraint, since they might be unable to provide a strict classification. Even though the rbf classifier can separate the classes, the result can be overtrained.

Generate 100 points uniformly distributed in the unit disk. To do so, generate a radius r as the square root of a uniform random variable, generate an angle t uniformly in (0, 2π), and put the point at (r cos(t), r sin(t)).

rng(1); % For reproducibility
r = sqrt(rand(100,1)); % Radius
t = 2*pi*rand(100,1); % Angle
data1 = [r.*cos(t), r.*sin(t)]; % Points
Generate 100 points uniformly distributed in the annulus. The radius is again proportional to a square root, this time a square root of the uniform distribution from 1 through 4. r2 = sqrt(3*rand(100,1)+1); % Radius t2 = 2*pi*rand(100,1); % Angle data2 = [r2.*cos(t2), r2.*sin(t2)]; % points
Plot the points, and plot circles of radii 1 and 2 for comparison. figure; plot(data1(:,1),data1(:,2),'r.','MarkerSize',15) hold on plot(data2(:,1),data2(:,2),'b.','MarkerSize',15) ezpolar(@(x)1);ezpolar(@(x)2); axis equal hold off
Put the data in one matrix, and make a vector of classifications. data3 = [data1;data2]; theclass = ones(200,1); theclass(1:100) = -1;
Train an SVM classifier with KernelFunction set to 'rbf' and BoxConstraint set to Inf. Plot the decision boundary and flag the support vectors. %Train the SVM Classifier cl = fitcsvm(data3,theclass,'KernelFunction','rbf',...
'BoxConstraint',Inf,'ClassNames',[-1,1]); % Predict scores over the grid d = 0.02; [x1Grid,x2Grid] = meshgrid(min(data3(:,1)):d:max(data3(:,1)),... min(data3(:,2)):d:max(data3(:,2))); xGrid = [x1Grid(:),x2Grid(:)]; [~,scores] = predict(cl,xGrid); % Plot the data and the decision boundary figure; h(1:2) = gscatter(data3(:,1),data3(:,2),theclass,'rb','.'); hold on ezpolar(@(x)1); h(3) = plot(data3(cl.IsSupportVector,1),data3(cl.IsSupportVector,2),'ko'); contour(x1Grid,x2Grid,reshape(scores(:,2),size(x1Grid)),[0 0],'k'); legend(h,{'-1','+1','Support Vectors'}); axis equal hold off
fitcsvm generates a classifier that is close to a circle of radius 1. The difference is due to the random training data. Training with the default parameters makes a more nearly circular classification boundary, but one that misclassifies some training data. Also, the default value of BoxConstraint is 1, and, therefore, there are more support vectors.
cl2 = fitcsvm(data3,theclass,'KernelFunction','rbf'); [~,scores2] = predict(cl2,xGrid); figure; h(1:2) = gscatter(data3(:,1),data3(:,2),theclass,'rb','.'); hold on ezpolar(@(x)1); h(3) = plot(data3(cl2.IsSupportVector,1),data3(cl2.IsSupportVector,2),'ko'); contour(x1Grid,x2Grid,reshape(scores2(:,2),size(x1Grid)),[0 0],'k'); legend(h,{'-1','+1','Support Vectors'}); axis equal hold off
Train SVM Classifier Using Custom Kernel

This example shows how to use a custom kernel function, such as the sigmoid kernel, to train SVM classifiers, and adjust custom kernel function parameters.

Generate a random set of points within the unit circle. Label points in the first and third quadrants as belonging to the positive class, and those in the second and fourth quadrants in the negative class.

rng(1); % For reproducibility
n = 100; % Number of points per quadrant
r1 = sqrt(rand(2*n,1)); % Random radii
t1 = [pi/2*rand(n,1); (pi/2*rand(n,1)+pi)]; % Random angles for Q1 and Q3 X1 = [r1.*cos(t1) r1.*sin(t1)]; % Polar-to-Cartesian conversion r2 = sqrt(rand(2*n,1)); t2 = [pi/2*rand(n,1)+pi/2; (pi/2*rand(n,1)-pi/2)]; % Random angles for Q2 and Q4 X2 = [r2.*cos(t2) r2.*sin(t2)]; X = [X1; X2]; % Predictors Y = ones(4*n,1); Y(2*n + 1:end) = -1; % Labels
Plot the data. figure; gscatter(X(:,1),X(:,2),Y); title('Scatter Diagram of Simulated Data')
Write a function that accepts two matrices in the feature space as inputs, and transforms them into a Gram matrix using the sigmoid kernel. function G = mysigmoid(U,V) % Sigmoid kernel function with slope gamma and intercept c gamma = 1; c = -1; G = tanh(gamma*U*V' + c); end
Save this code as a file named mysigmoid on your MATLAB® path. Train an SVM classifier using the sigmoid kernel function. It is good practice to standardize the data. Mdl1 = fitcsvm(X,Y,'KernelFunction','mysigmoid','Standardize',true);
Mdl1 is a ClassificationSVM classifier containing the estimated parameters. Plot the data, and identify the support vectors and the decision boundary. % Compute the scores over a grid d = 0.02; % Step size of the grid [x1Grid,x2Grid] = meshgrid(min(X(:,1)):d:max(X(:,1)),... min(X(:,2)):d:max(X(:,2))); xGrid = [x1Grid(:),x2Grid(:)]; % The grid [~,scores1] = predict(Mdl1,xGrid); % The scores figure; h(1:2) = gscatter(X(:,1),X(:,2),Y); hold on h(3) = plot(X(Mdl1.IsSupportVector,1),... X(Mdl1.IsSupportVector,2),'ko','MarkerSize',10); % Support vectors contour(x1Grid,x2Grid,reshape(scores1(:,2),size(x1Grid)),[0 0],'k'); % Decision boundary title('Scatter Diagram with the Decision Boundary') legend({'-1','1','Support Vectors'},'Location','Best'); hold off
You can adjust the kernel parameters in an attempt to improve the shape of the decision boundary. This might also decrease the within-sample misclassification rate, but, you should first determine the out-of-sample misclassification rate. Determine the out-of-sample misclassification rate by using 10-fold cross validation. CVMdl1 = crossval(Mdl1); misclass1 = kfoldLoss(CVMdl1); misclass1 misclass1 = 0.1350
The out-of-sample misclassification rate is 13.5%.

Write another sigmoid function, but set gamma = 0.5;.

function G = mysigmoid2(U,V)
% Sigmoid kernel function with slope gamma and intercept c
gamma = 0.5;
c = -1;
G = tanh(gamma*U*V' + c);
end
Save this code as a file named mysigmoid2 on your MATLAB® path. Train another SVM classifier using the adjusted sigmoid kernel. Plot the data and the decision region, and determine the out-of-sample misclassification rate. Mdl2 = fitcsvm(X,Y,'KernelFunction','mysigmoid2','Standardize',true); [~,scores2] = predict(Mdl2,xGrid); figure; h(1:2) = gscatter(X(:,1),X(:,2),Y); hold on h(3) = plot(X(Mdl2.IsSupportVector,1),... X(Mdl2.IsSupportVector,2),'ko','MarkerSize',10); title('Scatter Diagram with the Decision Boundary') contour(x1Grid,x2Grid,reshape(scores2(:,2),size(x1Grid)),[0 0],'k'); legend({'-1','1','Support Vectors'},'Location','Best'); hold off CVMdl2 = crossval(Mdl2); misclass2 = kfoldLoss(CVMdl2); misclass2 misclass2 = 0.0450
After the sigmoid slope adjustment, the new decision boundary seems to provide a better within-sample fit, and the cross-validation rate contracts by more than 66%.
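To make the 66% figure concrete (a quick check added here, using the two misclassification rates reported above):

$$\frac{0.1350 - 0.0450}{0.1350} \approx 0.667,$$

that is, the cross-validated misclassification rate drops by roughly 66.7%.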
Optimize Classifier Fit Using Bayesian Optimization

This example shows how to optimize an SVM classification using the fitcsvm function and the OptimizeHyperparameters name-value argument.

Generate Data

The classification works on locations of points from a Gaussian mixture model. Hastie, Tibshirani, and Friedman describe the model in The Elements of Statistical Learning (2009), page 17. The model begins with generating 10 base points for a "green" class, distributed as 2-D independent normals with mean (1,0) and unit variance. It also generates 10 base points for a "red" class, distributed as 2-D independent normals with mean (0,1) and unit variance. For each class (green and red), generate 100 random points as follows:
1
Choose a base point m of the appropriate color uniformly at random.
2
Generate an independent random point with 2-D normal distribution with mean m and variance I/5, where I is the 2-by-2 identity matrix. In this example, use a variance I/50 to show the advantage of optimization more clearly.
Generate the 10 base points for each class.
rng('default') % For reproducibility grnpop = mvnrnd([1,0],eye(2),10); redpop = mvnrnd([0,1],eye(2),10);
View the base points. plot(grnpop(:,1),grnpop(:,2),'go') hold on plot(redpop(:,1),redpop(:,2),'ro') hold off
Since some red base points are close to green base points, it can be difficult to classify the data points based on location alone. Generate the 100 data points of each class. redpts = zeros(100,2); grnpts = redpts; for i = 1:100 grnpts(i,:) = mvnrnd(grnpop(randi(10),:),eye(2)*0.02); redpts(i,:) = mvnrnd(redpop(randi(10),:),eye(2)*0.02); end
View the data points. figure plot(grnpts(:,1),grnpts(:,2),'go') hold on
plot(redpts(:,1),redpts(:,2),'ro') hold off
Prepare Data for Classification Put the data into one matrix, and make a vector grp that labels the class of each point. 1 indicates the green class, and –1 indicates the red class. cdata = [grnpts;redpts]; grp = ones(200,1); grp(101:200) = -1;
Prepare Cross-Validation Set up a partition for cross-validation. c = cvpartition(200,'KFold',10);
This step is optional. If you specify a partition for the optimization, then you can compute an actual cross-validation loss for the returned model. Optimize Fit To find a good fit, meaning one with optimal hyperparameters that minimize the cross-validation loss, use Bayesian optimization. Specify a list of hyperparameters to optimize by using the OptimizeHyperparameters name-value argument, and specify optimization options by using the HyperparameterOptimizationOptions name-value argument. 25-17
Specify 'OptimizeHyperparameters' as 'auto'. The 'auto' option includes a typical set of hyperparameters to optimize. fitcsvm finds optimal values of BoxConstraint, KernelScale, and Standardize. Set the hyperparameter optimization options to use the cross-validation partition c and to choose the 'expected-improvement-plus' acquisition function for reproducibility. The default acquisition function depends on run time and, therefore, can give varying results. opts = struct('CVPartition',c,'AcquisitionFunctionName', ... 'expected-improvement-plus'); Mdl = fitcsvm(cdata,grp,'KernelFunction','rbf', ... 'OptimizeHyperparameters','auto','HyperparameterOptimizationOptions',opts)
|================================================================================================ | Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | BoxConstraint| KernelS | | result | | runtime | (observed) | (estim.) | | |================================================================================================ | 1 | Best | 0.195 | 0.36961 | 0.195 | 0.195 | 193.54 | 0.06 | 2 | Accept | 0.345 | 0.18896 | 0.195 | 0.20398 | 43.991 | 27 | 3 | Accept | 0.365 | 0.235 | 0.195 | 0.20784 | 0.0056595 | 0.04 | 4 | Accept | 0.61 | 0.22981 | 0.195 | 0.31714 | 49.333 | 0.001 | 5 | Best | 0.1 | 0.2008 | 0.1 | 0.10005 | 996.27 | 1. | 6 | Accept | 0.13 | 0.19225 | 0.1 | 0.10003 | 25.398 | 1. | 7 | Best | 0.085 | 0.20104 | 0.085 | 0.08521 | 930.3 | 0.6 | 8 | Accept | 0.35 | 0.18879 | 0.085 | 0.085172 | 0.012972 | 9 | 9 | Best | 0.075 | 0.24081 | 0.075 | 0.077959 | 871.26 | 0.4 | 10 | Accept | 0.08 | 0.18315 | 0.075 | 0.077975 | 974.28 | 0.4 | 11 | Accept | 0.235 | 0.28984 | 0.075 | 0.077907 | 920.57 | 6 | 12 | Accept | 0.305 | 0.17906 | 0.075 | 0.077922 | 0.0010077 | 1. | 13 | Best | 0.07 | 0.25248 | 0.07 | 0.073603 | 991.16 | 0.3 | 14 | Accept | 0.075 | 0.22592 | 0.07 | 0.073191 | 989.88 | 0.2 | 15 | Accept | 0.245 | 0.26752 | 0.07 | 0.073276 | 988.76 | 9. | 16 | Accept | 0.07 | 0.23304 | 0.07 | 0.071416 | 957.65 | 0.3 | 17 | Accept | 0.35 | 0.18969 | 0.07 | 0.071421 | 0.0010579 | 33 | 18 | Accept | 0.085 | 0.13592 | 0.07 | 0.071274 | 48.536 | 0.3 | 19 | Accept | 0.07 | 0.14931 | 0.07 | 0.070587 | 742.56 | 0.3 | 20 | Accept | 0.61 | 0.23082 | 0.07 | 0.070796 | 865.48 | 0.001 |================================================================================================ | Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | BoxConstraint| KernelS | | result | | runtime | (observed) | (estim.) | | |================================================================================================ | 21 | Accept | 0.1 | 0.21008 | 0.07 | 0.070715 | 970.87 | 0.1 | 22 | Accept | 0.095 | 0.21623 | 0.07 | 0.07087 | 914.88 | 0.4 | 23 | Accept | 0.07 | 0.19067 | 0.07 | 0.070473 | 982.01 | 0. | 24 | Accept | 0.51 | 0.20812 | 0.07 | 0.070515 | 0.0010005 | 0.01 | 25 | Accept | 0.345 | 0.18743 | 0.07 | 0.070533 | 0.0010063 | 97 | 26 | Accept | 0.315 | 0.20787 | 0.07 | 0.07057 | 947.71 | 15 | 27 | Accept | 0.35 | 0.1974 | 0.07 | 0.070605 | 0.0010028 | 4 | 28 | Accept | 0.61 | 0.20168 | 0.07 | 0.070598 | 0.0010405 | 0.001 | 29 | Accept | 0.555 | 0.20219 | 0.07 | 0.070173 | 993.56 | 0.01 | 30 | Accept | 0.07 | 0.20411 | 0.07 | 0.070158 | 965.73 | 0.2 __________________________________________________________ Optimization completed. MaxObjectiveEvaluations of 30 reached. Total function evaluations: 30 Total elapsed time: 23.9901 seconds Total objective function evaluation time: 6.4096 Best observed feasible point:
    BoxConstraint    KernelScale    Standardize
    _____________    ___________    ___________

       991.16          0.37801         false

Observed objective function value = 0.07
Estimated objective function value = 0.072292
Function evaluation time = 0.25248

Best estimated feasible point (according to models):
    BoxConstraint    KernelScale    Standardize
    _____________    ___________    ___________

       957.65          0.31271         false
Estimated objective function value = 0.070158 Estimated function evaluation time = 0.20996
Mdl = 
  ClassificationSVM
                         ResponseName: 'Y'
                CategoricalPredictors: []
                           ClassNames: [-1 1]
                       ScoreTransform: 'none'
                      NumObservations: 200
    HyperparameterOptimizationResults: [1x1 BayesianOptimization]
                                Alpha: [66x1 double]
                                 Bias: -0.0910
                     KernelParameters: [1x1 struct]
                       BoxConstraints: [200x1 double]
                      ConvergenceInfo: [1x1 struct]
                      IsSupportVector: [200x1 logical]
                               Solver: 'SMO'
fitcsvm returns a ClassificationSVM model object that uses the best estimated feasible point. The best estimated feasible point is the set of hyperparameters that minimizes the upper confidence bound of the cross-validation loss based on the underlying Gaussian process model of the Bayesian optimization process. The Bayesian optimization process internally maintains a Gaussian process model of the objective function. The objective function is the cross-validated misclassification rate for classification. For each iteration, the optimization process updates the Gaussian process model and uses the model to find a new set of hyperparameters. Each line of the iterative display shows the new set of hyperparameters and these column values: • Objective — Objective function value computed at the new set of hyperparameters. • Objective runtime — Objective function evaluation time. • Eval result — Result report, specified as Accept, Best, or Error. Accept indicates that the objective function returns a finite value, and Error indicates that the objective function returns a value that is not a finite real scalar. Best indicates that the objective function returns a finite value that is lower than previously computed objective function values. • BestSoFar(observed) — The minimum objective function value computed so far. This value is either the objective function value of the current iteration (if the Eval result value for the current iteration is Best) or the value of the previous Best iteration. • BestSoFar(estim.) — At each iteration, the software estimates the upper confidence bounds of the objective function values, using the updated Gaussian process model, at all the sets of hyperparameters tried so far. Then the software chooses the point with the minimum upper confidence bound. The BestSoFar(estim.) value is the objective function value returned by the predictObjective function at the minimum point. The plot below the iterative display shows the BestSoFar(observed) and BestSoFar(estim.) values in blue and green, respectively. The returned object Mdl uses the best estimated feasible point, that is, the set of hyperparameters that produces the BestSoFar(estim.) value in the final iteration based on the final Gaussian process model. You can obtain the best point from the HyperparameterOptimizationResults property or by using the bestPoint function. Mdl.HyperparameterOptimizationResults.XAtMinEstimatedObjective ans=1×3 table BoxConstraint _____________ 957.65
KernelScale ___________
Standardize ___________
0.31271
false
[x,CriterionValue,iteration] = bestPoint(Mdl.HyperparameterOptimizationResults)
x=1×3 table
    BoxConstraint    KernelScale    Standardize
    _____________    ___________    ___________

       957.65          0.31271         false

CriterionValue = 0.0724
iteration = 16
By default, the bestPoint function uses the 'min-visited-upper-confidence-interval' criterion. This criterion chooses the hyperparameters obtained from the 16th iteration as the best point. CriterionValue is the upper bound of the cross-validated loss computed by the final Gaussian process model. Compute the actual cross-validated loss by using the partition c. L_MinEstimated = kfoldLoss(fitcsvm(cdata,grp,'CVPartition',c, ... 'KernelFunction','rbf','BoxConstraint',x.BoxConstraint, ... 'KernelScale',x.KernelScale,'Standardize',x.Standardize=='true')) L_MinEstimated = 0.0700
The actual cross-validated loss is close to the estimated value. The Estimated objective function value is displayed below the plot of the optimization results.

You can also extract the best observed feasible point (that is, the last Best point in the iterative display) from the HyperparameterOptimizationResults property or by specifying Criterion as 'min-observed'.

Mdl.HyperparameterOptimizationResults.XAtMinObjective

ans=1×3 table
    BoxConstraint    KernelScale    Standardize
    _____________    ___________    ___________

       991.16          0.37801         false
[x_observed,CriterionValue_observed,iteration_observed] = ...
    bestPoint(Mdl.HyperparameterOptimizationResults,'Criterion','min-observed')

x_observed=1×3 table
    BoxConstraint    KernelScale    Standardize
    _____________    ___________    ___________

       991.16          0.37801         false

CriterionValue_observed = 0.0700
iteration_observed = 13
The 'min-observed' criterion chooses the hyperparameters obtained from the 13th iteration as the best point. CriterionValue_observed is the actual cross-validated loss computed using the selected hyperparameters. For more information, see the “Criterion” on page 35-0 name-value argument of bestPoint. Visualize the optimized classifier. 25-21
d = 0.02; [x1Grid,x2Grid] = meshgrid(min(cdata(:,1)):d:max(cdata(:,1)), ... min(cdata(:,2)):d:max(cdata(:,2))); xGrid = [x1Grid(:),x2Grid(:)]; [~,scores] = predict(Mdl,xGrid); figure h(1:2) = gscatter(cdata(:,1),cdata(:,2),grp,'rg','+*'); hold on h(3) = plot(cdata(Mdl.IsSupportVector,1), ... cdata(Mdl.IsSupportVector,2),'ko'); contour(x1Grid,x2Grid,reshape(scores(:,2),size(x1Grid)),[0 0],'k'); legend(h,{'-1','+1','Support Vectors'},'Location','Southeast');
Evaluate Accuracy on New Data Generate and classify new test data points. grnobj = gmdistribution(grnpop,.2*eye(2)); redobj = gmdistribution(redpop,.2*eye(2)); newData = random(grnobj,10); newData = [newData;random(redobj,10)]; grpData = ones(20,1); % green = 1 grpData(11:20) = -1; % red = -1 v = predict(Mdl,newData);
Compute the misclassification rates on the test data set. L_Test = loss(Mdl,newData,grpData) L_Test = 0.2000
Determine which new data points are classified correctly. Format the correctly classified points in red squares and the incorrectly classified points in black squares. h(4:5) = gscatter(newData(:,1),newData(:,2),v,'mc','**'); mydiff = (v == grpData); % Classified correctly for ii = mydiff % Plot red squares around correct pts h(6) = plot(newData(ii,1),newData(ii,2),'rs','MarkerSize',12); end for ii = not(mydiff) % Plot black squares around incorrect pts h(7) = plot(newData(ii,1),newData(ii,2),'ks','MarkerSize',12); end legend(h,{'-1 (training)','+1 (training)','Support Vectors', ... '-1 (classified)','+1 (classified)', ... 'Correctly Classified','Misclassified'}, ... 'Location','Southeast'); hold off
Plot Posterior Probability Regions for SVM Classification Models

This example shows how to predict posterior probabilities of SVM models over a grid of observations, and then plot the posterior probabilities over the grid. Plotting posterior probabilities exposes decision boundaries.

Load Fisher's iris data set. Train the classifier using the petal lengths and widths, and remove the virginica species from the data.

load fisheriris
classKeep = ~strcmp(species,'virginica');
X = meas(classKeep,3:4);
y = species(classKeep);
Train an SVM classifier using the data. It is good practice to specify the order of the classes. SVMModel = fitcsvm(X,y,'ClassNames',{'setosa','versicolor'});
Estimate the optimal score transformation function. rng(1); % For reproducibility [SVMModel,ScoreParameters] = fitPosterior(SVMModel);
Warning: Classes are perfectly separated. The optimal score-to-posterior transformation is a step function.

ScoreParameters

ScoreParameters = struct with fields:
                        Type: 'step'
                  LowerBound: -0.8431
                  UpperBound: 0.6897
    PositiveClassProbability: 0.5000
The optimal score transformation function is the step function because the classes are separable. The fields LowerBound and UpperBound of ScoreParameters indicate the lower and upper end points of the interval of scores corresponding to observations within the class-separating hyperplanes (the margin). No training observation falls within the margin. If a new score is in the interval, then the software assigns the corresponding observation a positive class posterior probability, i.e., the value in the PositiveClassProbability field of ScoreParameters. Define a grid of values in the observed predictor space. Predict the posterior probabilities for each instance in the grid. xMax = max(X); xMin = min(X); d = 0.01; [x1Grid,x2Grid] = meshgrid(xMin(1):d:xMax(1),xMin(2):d:xMax(2)); [~,PosteriorRegion] = predict(SVMModel,[x1Grid(:),x2Grid(:)]);
Plot the positive class posterior probability region and the training data. figure; contourf(x1Grid,x2Grid,... reshape(PosteriorRegion(:,2),size(x1Grid,1),size(x1Grid,2))); h = colorbar; h.Label.String = 'P({\it{versicolor}})'; h.YLabel.FontSize = 16; colormap jet; hold on gscatter(X(:,1),X(:,2),y,'mc','.x',[15,10]); sv = X(SVMModel.IsSupportVector,:); plot(sv(:,1),sv(:,2),'yo','MarkerSize',15,'LineWidth',2); axis tight hold off
In two-class learning, if the classes are separable, then there are three regions: one where observations have positive class posterior probability 0, one where it is 1, and the other where it is the positive class prior probability.
Analyze Images Using Linear Support Vector Machines

This example shows how to determine which quadrant of an image a shape occupies by training an error-correcting output codes (ECOC) model comprised of linear SVM binary learners. This example also illustrates the disk-space consumption of ECOC models that store support vectors, their labels, and the estimated α coefficients.

Create the Data Set

Randomly place a circle with radius five in a 50-by-50 image. Make 5000 images. Create a label for each image indicating the quadrant that the circle occupies. Quadrant 1 is in the upper right, quadrant 2 is in the upper left, quadrant 3 is in the lower left, and quadrant 4 is in the lower right. The predictors are the intensities of each pixel.

d = 50;           % Height and width of the images in pixels
n = 5e4;          % Sample size
X = zeros(n,d^2); % Predictor matrix preallocation
Y = zeros(n,1);   % Label preallocation
theta = 0:(1/d):(2*pi);
r = 5;   % Circle radius
rng(1);  % For reproducibility
for j = 1:n figmat = zeros(d); % Empty image c = datasample((r + 1):(d - r - 1),2); % Random circle center x = r*cos(theta) + c(1); % Make the circle y = r*sin(theta) + c(2); idx = sub2ind([d d],round(y),round(x)); % Convert to linear indexing figmat(idx) = 1; % Draw the circle X(j,:) = figmat(:); % Store the data Y(j) = (c(2) >= floor(d/2)) + 2*(c(2) < floor(d/2)) + ... (c(1) < floor(d/2)) + ... 2*((c(1) >= floor(d/2)) & (c(2) < floor(d/2))); % Determine the quadrant end
Plot an observation. figure imagesc(figmat) h = gca; h.YDir = 'normal'; title(sprintf('Quadrant %d',Y(end)))
Train the ECOC Model

Use a 25% holdout sample and specify the training and holdout sample indices.
p = 0.25; CVP = cvpartition(Y,'Holdout',p); % Cross-validation data partition isIdx = training(CVP); % Training sample indices oosIdx = test(CVP); % Test sample indices
Create an SVM template that specifies storing the support vectors of the binary learners. Pass it and the training data to fitcecoc to train the model. Determine the training sample classification error. t = templateSVM('SaveSupportVectors',true); MdlSV = fitcecoc(X(isIdx,:),Y(isIdx),'Learners',t); isLoss = resubLoss(MdlSV) isLoss = 0
MdlSV is a trained ClassificationECOC multiclass model. It stores the training data and the support vectors of each binary learner. For large data sets, such as those in image analysis, the model can consume a lot of memory. Determine the amount of disk space that the ECOC model consumes. infoMdlSV = whos('MdlSV'); mbMdlSV = infoMdlSV.bytes/1.049e6 mbMdlSV = 763.6163
The model consumes 763.6 MB. Improve Model Efficiency You can assess out-of-sample performance. You can also assess whether the model has been overfit with a compacted model that does not contain the support vectors, their related parameters, and the training data. Discard the support vectors and related parameters from the trained ECOC model. Then, discard the training data from the resulting model by using compact. Mdl = discardSupportVectors(MdlSV); CMdl = compact(Mdl); info = whos('Mdl','CMdl'); [bytesCMdl,bytesMdl] = info.bytes; memReduction = 1 - [bytesMdl bytesCMdl]/infoMdlSV.bytes memReduction = 1×2 0.0626
0.9996
In this case, discarding the support vectors reduces the memory consumption by about 6%. Compacting and discarding support vectors reduces the size by about 99.96%. An alternative way to manage support vectors is to reduce their numbers during training by specifying a larger box constraint, such as 100. Though SVM models that use fewer support vectors are more desirable and consume less memory, increasing the value of the box constraint tends to increase the training time. Remove MdlSV and Mdl from the workspace. clear Mdl MdlSV
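As a hedged sketch of the box-constraint alternative mentioned above, you can retrain the ECOC model with a larger box constraint and count the stored support vectors. Run this before clearing MdlSV if you want to compare against the original model; the value 100 is only illustrative.
% Fewer support vectors via a larger box constraint (training typically takes longer).
tBig = templateSVM('SaveSupportVectors',true,'BoxConstraint',100);
MdlBigC = fitcecoc(X(isIdx,:),Y(isIdx),'Learners',tBig);
nSV = sum(cellfun(@(lrn) size(lrn.SupportVectors,1),MdlBigC.BinaryLearners));
fprintf('Support vectors stored with BoxConstraint = 100: %d\n',nSV)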
Assess Holdout Sample Performance Calculate the classification error of the holdout sample. Plot a sample of the holdout sample predictions. oosLoss = loss(CMdl,X(oosIdx,:),Y(oosIdx)) oosLoss = 0 yHat = predict(CMdl,X(oosIdx,:)); nVec = 1:size(X,1); oosIdx = nVec(oosIdx); figure; for j = 1:9 subplot(3,3,j) imagesc(reshape(X(oosIdx(j),:),[d d])) h = gca; h.YDir = 'normal'; title(sprintf('Quadrant: %d',yHat(j))) end text(-1.33*d,4.5*d + 1,'Predictions','FontSize',17)
The model does not misclassify any holdout sample observations.
See Also fitcsvm | bayesopt | kfoldLoss
More About
• “Train Support Vector Machines Using Classification Learner App” on page 23-111
• “Optimize Cross-Validated Classifier Using bayesopt” on page 10-46
References [1] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, second edition. New York: Springer, 2008. [2] Christianini, N., and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, UK: Cambridge University Press, 2000. [3] Fan, R.-E., P.-H. Chen, and C.-J. Lin. “Working set selection using second order information for training support vector machines.” Journal of Machine Learning Research, Vol 6, 2005, pp. 1889–1918. [4] Kecman V., T. -M. Huang, and M. Vogt. “Iterative Single Data Algorithm for Training Kernel Machines from Huge Data Sets: Theory and Performance.” In Support Vector Machines: Theory and Applications. Edited by Lipo Wang, 255–274. Berlin: Springer-Verlag, 2005.
Understanding Support Vector Machine Regression In this section... “Mathematical Formulation of SVM Regression” on page 25-31 “Solving the SVM Regression Optimization Problem” on page 25-34
Mathematical Formulation of SVM Regression Overview Support vector machine (SVM) analysis is a popular machine learning tool for classification and regression, first identified by Vladimir Vapnik and his colleagues in 1992[5]. SVM regression is considered a nonparametric technique because it relies on kernel functions. Statistics and Machine Learning Toolbox implements linear epsilon-insensitive SVM (ε-SVM) regression, which is also known as L1 loss. In ε-SVM regression, the set of training data includes predictor variables and observed response values. The goal is to find a function f(x) that deviates from yn by a value no greater than ε for each training point x, and at the same time is as flat as possible. Linear SVM Regression: Primal Formula Suppose we have a set of training data where xn is a multivariate set of N observations with observed response values yn. To find the linear function f (x) = x′β + b, and ensure that it is as flat as possible, find f(x) with the minimal norm value (β′β). This is formulated as a convex optimization problem to minimize J β =
(1/2) β′β
subject to all residuals having a value less than ε; or, in equation form: ∀n: |yn − (xn′β + b)| ≤ ε. It is possible that no such function f(x) exists to satisfy these constraints for all points. To deal with otherwise infeasible constraints, introduce slack variables ξn and ξn* for each point. This approach is similar to the “soft margin” concept in SVM classification, because the slack variables allow regression errors to exist up to the value of ξn and ξn*, yet still satisfy the required conditions. Including slack variables leads to the objective function, also known as the primal formula [5]:
J(β) = (1/2) β′β + C ∑_{n=1}^{N} (ξn + ξn*),
subject to:
∀n: yn − (xn′β + b) ≤ ε + ξn
∀n: (xn′β + b) − yn ≤ ε + ξn*
∀n: ξn* ≥ 0
∀n: ξn ≥ 0.
The constant C is the box constraint, a positive numeric value that controls the penalty imposed on observations that lie outside the epsilon margin (ε) and helps to prevent overfitting (regularization). This value determines the trade-off between the flatness of f(x) and the amount up to which deviations larger than ε are tolerated. The linear ε-insensitive loss function ignores errors that are within ε distance of the observed value by treating them as equal to zero. The loss is measured based on the distance between observed value y and the ε boundary. This is formally described by
Lε = 0 if |y − f(x)| ≤ ε, and Lε = |y − f(x)| − ε otherwise.
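A minimal MATLAB sketch of this loss, with an illustrative ε and made-up residuals:
% Epsilon-insensitive loss for residuals r = y - f(x).
epsilon = 0.3;                    % Illustrative insensitivity margin
r = [-0.1 0.25 0.5 -0.9 1.4];     % Made-up residuals y - f(x)
Leps = max(0, abs(r) - epsilon)   % Zero inside the epsilon tube, linear outside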
Linear SVM Regression: Dual Formula The optimization problem previously described is computationally simpler to solve in its Lagrange dual formulation. The solution to the dual problem provides a lower bound to the solution of the primal (minimization) problem. The optimal values of the primal and dual problems need not be equal, and the difference is called the “duality gap.” But when the problem is convex and satisfies a constraint qualification condition, the value of the optimal solution to the primal problem is given by the solution of the dual problem. To obtain the dual formula, construct a Lagrangian function from the primal function by introducing nonnegative multipliers αn and α*n for each observation xn. This leads to the dual formula, where we minimize Lα =
(1/2) ∑_{i=1}^{N} ∑_{j=1}^{N} (αi − αi*)(αj − αj*) xi′xj + ε ∑_{i=1}^{N} (αi + αi*) + ∑_{i=1}^{N} yi (αi* − αi)
subject to the constraints
∑_{n=1}^{N} (αn − αn*) = 0
∀n: 0 ≤ αn ≤ C
∀n: 0 ≤ αn* ≤ C.
The β parameter can be completely described as a linear combination of the training observations using the equation
β = ∑_{n=1}^{N} (αn − αn*) xn.
The function used to predict new values depends only on the support vectors:
f(x) = ∑_{n=1}^{N} (αn − αn*) (xn′x) + b.   (25-6)
The Karush-Kuhn-Tucker (KKT) complementarity conditions are optimization constraints required to obtain optimal solutions. For linear SVM regression, these conditions are
∀n: αn (ε + ξn − yn + xn′β + b) = 0
∀n: αn* (ε + ξn* + yn − xn′β − b) = 0
∀n: ξn (C − αn) = 0
∀n: ξn* (C − αn*) = 0.
These conditions indicate that all observations strictly inside the epsilon tube have Lagrange multipliers αn = 0 and αn* = 0. If either αn or αn* is not zero, then the corresponding observation is called a support vector. The property Alpha of a trained SVM model stores the difference between the two Lagrange multipliers of support vectors, αn – αn*. The properties SupportVectors and Bias store xn and b, respectively.
Nonlinear SVM Regression: Primal Formula
Some regression problems cannot adequately be described using a linear model. In such a case, the Lagrange dual formulation allows the previously described technique to be extended to nonlinear functions. Obtain a nonlinear SVM regression model by replacing the dot product x1′x2 with a nonlinear kernel function G(x1,x2) = <φ(x1),φ(x2)>, where φ(x) is a transformation that maps x to a high-dimensional space. Statistics and Machine Learning Toolbox provides the following built-in positive semidefinite kernel functions.
Kernel Name             Kernel Function
Linear (dot product)    G(xj,xk) = xj′xk
Gaussian                G(xj,xk) = exp(−‖xj − xk‖²)
Polynomial              G(xj,xk) = (1 + xj′xk)^q, where q is in the set {2,3,...}
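As an illustration, the following sketch evaluates the Gaussian kernel from the table above for every pair of observations in a small made-up predictor matrix; the resulting pairwise matrix is the Gram matrix discussed next.
% Gaussian kernel evaluated for all observation pairs (illustrative data).
rng(0)
Xs = randn(6,3);               % 6 made-up observations, 3 predictors
D2 = pdist2(Xs,Xs).^2;         % Squared pairwise Euclidean distances
G = exp(-D2);                  % 6-by-6 matrix of kernel values
all(eig((G + G')/2) >= -1e-10) % Numerically positive semidefinite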
The Gram matrix is an n-by-n matrix that contains elements gi,j = G(xi,xj). Each element gi,j is equal to the inner product of the predictors as transformed by φ. However, we do not need to know φ, because we can use the kernel function to generate the Gram matrix directly. Using this method, nonlinear SVM finds the optimal function f(x) in the transformed predictor space.
Nonlinear SVM Regression: Dual Formula
The dual formula for nonlinear SVM regression replaces the inner product of the predictors (xi′xj) with the corresponding element of the Gram matrix (gi,j). Nonlinear SVM regression finds the coefficients that minimize
L(α) = (1/2) ∑_{i=1}^{N} ∑_{j=1}^{N} (αi − αi*)(αj − αj*) G(xi,xj) + ε ∑_{i=1}^{N} (αi + αi*) − ∑_{i=1}^{N} yi (αi − αi*)
subject to
∑_{n=1}^{N} (αn − αn*) = 0
∀n: 0 ≤ αn ≤ C
∀n: 0 ≤ αn* ≤ C.
The function used to predict new values is equal to
f(x) = ∑_{n=1}^{N} (αn − αn*) G(xn,x) + b.   (25-7)
The KKT complementarity conditions are
∀n: αn (ε + ξn − yn + f(xn)) = 0
∀n: αn* (ε + ξn* + yn − f(xn)) = 0
∀n: ξn (C − αn) = 0
∀n: ξn* (C − αn*) = 0.
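A hedged sketch of the prediction equation in practice: after training an SVM regression model with a linear kernel, you can reproduce its predictions from the stored Alpha, SupportVectors, and Bias properties. The carsmall data and predictor choice are illustrative only, and the kernel scale is left at its default of 1, so the two columns below should match.
% Reproduce f(x) = sum_n (alpha_n - alpha_n*) G(x_n,x) + b for a linear kernel.
load carsmall
data = rmmissing(table(Weight,Horsepower,MPG));
MdlLin = fitrsvm(data,'MPG');                 % Linear kernel, default settings
G = data{1:5,["Weight","Horsepower"]}*MdlLin.SupportVectors'; % Linear kernel values x_n'*x
fManual = G*MdlLin.Alpha + MdlLin.Bias;
[fManual predict(MdlLin,data(1:5,:))]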
Solving the SVM Regression Optimization Problem
Solver Algorithms
The minimization problem can be expressed in standard quadratic programming form and solved using common quadratic programming techniques. However, it can be computationally expensive to use quadratic programming algorithms, especially since the Gram matrix may be too large to be stored in memory. Using a decomposition method instead can speed up the computation and avoid running out of memory. Decomposition methods (also called chunking and working set methods) separate all observations into two disjoint sets: the working set and the remaining set. A decomposition method modifies only the elements in the working set in each iteration. Therefore, only some columns of the Gram matrix are needed in each iteration, which reduces the amount of storage needed for each iteration. Sequential minimal optimization (SMO) is the most popular approach for solving SVM problems [4]. SMO performs a series of two-point optimizations. In each iteration, a working set of two points is chosen based on a selection rule that uses second-order information. Then the Lagrange multipliers for this working set are solved analytically using the approach described in [2] and [1]. In SVM regression, the gradient vector ∇L for the active set is updated after each iteration. The decomposed equation for the gradient vector is
(∇L)n = ∑_{i=1}^{N} (αi − αi*) G(xi,xn) + ε − yn  for n ≤ N,
(∇L)n = −∑_{i=1}^{N} (αi − αi*) G(xi,xn) + ε + yn  for n > N.
Iterative single data algorithm (ISDA) updates one Lagrange multiplier with each iteration [3]. ISDA is often conducted without the bias term b by adding a small positive constant a to the kernel function. Dropping b drops the sum constraint
∑_{n=1}^{N} (αn − αn*) = 0
in the dual equation. This allows us to update one Lagrange multiplier in each iteration, which makes it easier than SMO to remove outliers. ISDA selects the worst KKT violator among all the αn and αn* values as the working set to be updated. Convergence Criteria Each of these solver algorithms iteratively computes until the specified convergence criterion is met. There are several options for convergence criteria: • Feasibility gap — The feasibility gap is expressed as Δ=
(J(β) + L(α)) / (J(β) + 1),
where J(β) is the primal objective and L(α) is the dual objective. After each iteration, the software evaluates the feasibility gap. If the feasibility gap is less than the value specified by GapTolerance, then the algorithm met the convergence criterion and the software returns a solution. • Gradient difference — After each iteration, the software evaluates the gradient vector, ∇L. If the difference in gradient vector values for the current iteration and the previous iteration is less than the value specified by DeltaGradientTolerance, then the algorithm met the convergence criterion and the software returns a solution. • Largest KKT violation — After each iteration, the software evaluates the KKT violation for all the αn and αn* values. If the largest violation is less than the value specified by KKTTolerance, then the algorithm met the convergence criterion and the software returns a solution.
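These convergence criteria correspond to name-value arguments of fitrsvm. A hedged sketch (the carsmall data and tolerance values are illustrative only):
% Select the solver and its convergence tolerance when training an SVM regression model.
load carsmall
cars = rmmissing(table(Horsepower,Weight,MPG));
MdlSMO = fitrsvm(cars,'MPG','KernelFunction','gaussian','Standardize',true, ...
    'Solver','SMO','GapTolerance',1e-3);               % Stop on the feasibility gap
MdlISDA = fitrsvm(cars,'MPG','KernelFunction','gaussian','Standardize',true, ...
    'Solver','ISDA','KKTTolerance',1e-3);              % Stop on the largest KKT violation
[MdlSMO.ConvergenceInfo.Converged MdlISDA.ConvergenceInfo.Converged]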
References [1] Fan, R.E. , P.H. Chen, and C.J. Lin. "A Study on SMO-Type Decomposition Methods for Support Vector Machines." IEEE Transactions on Neural Networks, Vol. 17:893–908, 2006. [2] Fan, R.E. , P.H. Chen, and C.J. Lin. "Working Set Selection Using Second Order Information for Training Support Vector Machines." The Journal of Machine Learning Research, Vol. 6:1871– 1918, 2005. [3] Huang, T.M., V. Kecman, and I. Kopriva. Kernel Based Algorithms for Mining Huge Data Sets: Supervised, Semi-Supervised, and Unsupervised Learning. Springer, New York, 2006. [4] Platt, J. Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines. Technical Report MSR-TR-98–14, 1999. [5] Vapnik, V. The Nature of Statistical Learning Theory. Springer, New York, 1995.
See Also RegressionSVM | fitrsvm | predict | resubPredict
Related Examples
• “Train Linear Support Vector Machine Regression Model” on page 35-3017
• “Train Support Vector Machine Regression Model” on page 35-3019
• “Cross-Validate SVM Regression Model” on page 35-3020
• “Optimize SVM Regression” on page 35-3022
26 Fairness
Introduction to Fairness in Binary Classification The functions fairnessMetrics, fairnessWeights, disparateImpactRemover, and fairnessThresholder in Statistics and Machine Learning Toolbox allow you to detect and mitigate societal bias in binary classification. First, use fairnessMetrics to evaluate the fairness of a data set or classification model with bias and group metrics. Then, use fairnessWeights to reweight observations, disparateImpactRemover to remove the disparate impact of a sensitive attribute, or fairnessThresholder to optimize the classification threshold. • fairnessMetrics — The fairnessMetrics function computes fairness metrics (bias metrics and group metrics) for a data set or binary classification model with respect to sensitive attributes. The data-level evaluation examines binary, true labels of the data. The model-level evaluation examines the predicted labels returned by one or more binary classification models, using both true labels and predicted labels. You can use the metrics to determine if your data or models contain bias toward a group within each sensitive attribute. • fairnessWeights — The fairnessWeights function computes fairness weights with respect to a sensitive attribute and the response variable. For every combination of a group in the sensitive attribute and a class label in the response variable, the software computes a weight value. The function then assigns each observation its corresponding weight. The returned weights introduce fairness across the sensitive attribute groups. Pass the weights to an appropriate training function, such as fitcsvm, using the Weights name-value argument. • disparateImpactRemover — The disparateImpactRemover function tries to remove the disparate impact of a sensitive attribute on model predictions by using the sensitive attribute to transform the continuous predictors in the data set. The function returns the transformed data set and a disparateImpactRemover object that contains the transformation. Pass the transformed data set to an appropriate training function, such as fitcsvm, and pass the object to the transform object function to apply the transformation to a new data set, such as a test data set. • fairnessThresholder — The fairnessThresholder function searches for an optimal score threshold to maximize accuracy while satisfying fairness bounds. For observations in the critical region below the optimal threshold, the function adjusts the labels so that the fairness constraints hold for the reference and nonreference groups in the sensitive attribute. After you create a fairnessThresholder object, you can use the predict and loss object functions on new data to predict fairness labels and calculate the classification loss, respectively.
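The following is a minimal end-to-end sketch of the reweighting workflow described above. The table Tbl, sensitive attribute "Group", and binary response "Label" are hypothetical placeholders, not variables from a shipped data set.
% Sketch: evaluate fairness, reweight the observations, retrain, and re-evaluate.
evaluatorBefore = fairnessMetrics(Tbl,"Label",SensitiveAttributeNames="Group");
report(evaluatorBefore,BiasMetrics="StatisticalParityDifference")
w = fairnessWeights(Tbl,"Group","Label");     % One fairness weight per observation
Mdl = fitcsvm(Tbl,"Label",Weights=w);         % Pass the weights to a training function
evaluatorAfter = fairnessMetrics(Tbl,"Label", ...
    SensitiveAttributeNames="Group",Predictions=predict(Mdl,Tbl));
report(evaluatorAfter,BiasMetrics="StatisticalParityDifference")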
Reduce Statistical Parity Difference Using Fairness Weights Train a neural network model, and compute the statistical parity difference (SPD) for each group in the sensitive attribute. To reduce the SPD values, compute fairness weights, and retrain the neural network model. Read the sample file CreditRating_Historical.dat into a table. The predictor data consists of financial ratios and industry sector information for a list of corporate customers. The response variable consists of credit ratings assigned by a rating agency. creditrating = readtable("CreditRating_Historical.dat");
Because each value in the ID variable is a unique customer ID—that is, length(unique(creditrating.ID)) is equal to the number of observations in creditrating— the ID variable is a poor predictor. Remove the ID variable from the table, and convert the Industry variable to a categorical variable. 26-2
creditrating.ID = []; creditrating.Industry = categorical(creditrating.Industry);
In the Rating response variable, combine the AAA, AA, A, and BBB ratings into a category of "good" ratings, and the BB, B, and CCC ratings into a category of "poor" ratings. Rating = categorical(creditrating.Rating); Rating = mergecats(Rating,["AAA","AA","A","BBB"],"good"); Rating = mergecats(Rating,["BB","B","CCC"],"poor"); creditrating.Rating = Rating;
Train a neural network model on the creditrating data. For better results, standardize the predictors before fitting the model. Use the trained model to predict labels for the training data set. rng("default") % For reproducibility netMdl = fitcnet(creditrating,"Rating",Standardize=true); netPredictions = predict(netMdl,creditrating);
Compute fairness metrics with respect to the Industry sensitive attribute by using the model predictions. In particular, find the statistical parity difference (SPD) for each group in Industry.
netEvaluator = fairnessMetrics(creditrating,"Rating", ...
    SensitiveAttributeNames="Industry",Predictions=netPredictions);
report(netEvaluator,BiasMetrics="StatisticalParityDifference")
ans=12×4 table
    ModelNames    SensitiveAttributeNames    Groups    StatisticalParityDifference
    Model1        Industry                      1              0.071446
    Model1        Industry                      2              0.093203
    Model1        Industry                      3              0
    Model1        Industry                      4              0.097601
    Model1        Industry                      5              0.058292
    Model1        Industry                      6             -0.00032414
    Model1        Industry                      7              0.026536
    Model1        Industry                      8              0.090712
    Model1        Industry                      9              0.16957
    Model1        Industry                     10              0.1567
    Model1        Industry                     11              0.022251
    Model1        Industry                     12              0.035036
To better understand the distribution of SPD values, plot the values using a box plot. spdValues = netEvaluator.BiasMetrics.StatisticalParityDifference; boxchart(spdValues) ylabel("Statistical Parity Difference") title("Distribution of Statistical Parity Differences")
The median SPD value is around 0.06, which is higher than the value 0 of a fair model. Compute fairness weights, and refit a neural network model using the weights. As before, standardize the predictors. Then, predict labels for the training data by using the new model. weights = fairnessWeights(creditrating,"Industry","Rating"); rng("default") % For reproducibility newNetMdl = fitcnet(creditrating,"Rating",Weights=weights, ... Standardize=true); newNetPredictions = predict(newNetMdl,creditrating);
Compute the new SPD values.
newNetEvaluator = fairnessMetrics(creditrating,"Rating", ...
    SensitiveAttributeNames="Industry",Predictions=newNetPredictions);
report(newNetEvaluator,BiasMetrics="StatisticalParityDifference")
ans=12×4 table
    ModelNames    SensitiveAttributeNames    Groups    StatisticalParityDifference
    Model1        Industry                      1              0.042932
    Model1        Industry                      2              0.058633
    Model1        Industry                      3              0
    Model1        Industry                      4              0.059221
    Model1        Industry                      5              0.032651
    Model1        Industry                      6             -0.010995
    Model1        Industry                      7              0.013594
    Model1        Industry                      8              0.065071
    Model1        Industry                      9              0.11039
    Model1        Industry                     10              0.12179
    Model1        Industry                     11              0.013276
    Model1        Industry                     12              0.0093945
Display the two distributions of SPD values. The left box plot shows the SPD values computed using the original model. The right box plot shows the SPD values computed using the new model trained with fairness weights. spdValuesUpdated = newNetEvaluator.BiasMetrics.StatisticalParityDifference; boxchart([spdValues spdValuesUpdated]) xticklabels(["Without Weights","With Weights"]) ylabel("Statistical Parity Difference") title("Distribution of Statistical Parity Differences")
The new SPD values have a median around 0.04, which is closer to 0 than the previous median of 0.06. The maximum value of the new SPD values, which is around 0.11, is also closer to 0 than the previous maximum value, which is around 0.16.
Reduce Disparate Impact of Predictions
Train a binary classifier, classify test data using the model, and compute the disparate impact for each group in the sensitive attribute. To reduce the disparate impact values, use disparateImpactRemover, and then retrain the binary classifier. Transform the test data set, reclassify the observations, and compute the disparate impact values. Load the sample data census1994, which contains the training data adultdata and the test data adulttest. The data sets consist of demographic information from the US Census Bureau that can be used to predict whether an individual makes over $50,000 per year. Preview the first few rows of the training data set.
load census1994
head(adultdata)
    age    workClass           fnlwgt        education    education_num    marital_status
    39     State-gov           77516         Bachelors         13          Never-married
    50     Self-emp-not-inc    83311         Bachelors         13          Married-civ-spouse
    38     Private             2.1565e+05    HS-grad            9          Divorced
    53     Private             2.3472e+05    11th               7          Married-civ-spouse
    28     Private             3.3841e+05    Bachelors         13          Married-civ-spouse
    37     Private             2.8458e+05    Masters           14          Married-civ-spouse
    49     Private             1.6019e+05    9th                5          Married-spouse-absent
    52     Self-emp-not-inc    2.0964e+05    HS-grad            9          Married-civ-spouse
Each row contains the demographic information for one adult. The last column salary shows whether a person has a salary less than or equal to $50,000 per year or greater than $50,000 per year. Remove observations from adultdata and adulttest that contain missing values. adultdata = rmmissing(adultdata); adulttest = rmmissing(adulttest);
Specify the continuous numeric predictors to use for model training. predictors = ["age","education_num","capital_gain","capital_loss", ... "hours_per_week"];
Train an ensemble classifier using the training set adultdata. Specify salary as the response variable and fnlwgt as the observation weights. Because the training set is imbalanced, use the RUSBoost algorithm. After training the model, predict the salary (class label) of the observations in the test set adulttest. rng("default") % For reproducibility mdl = fitcensemble(adultdata,"salary",Weights="fnlwgt", ... PredictorNames=predictors,Method="RUSBoost"); labels = predict(mdl,adulttest);
Transform the training set predictors by using the race sensitive attribute. [remover,newadultdata] = disparateImpactRemover(adultdata, ... "race",PredictorNames=predictors); remover remover = disparateImpactRemover with properties:
        RepairFraction: 1
        PredictorNames: {'age'  'education_num'  'capital_gain'  'capital_loss'  'hours_per_week'}
    SensitiveAttribute: 'race'
remover is a disparateImpactRemover object, which contains the transformation of the remover.PredictorNames predictors with respect to the remover.SensitiveAttribute variable. Apply the same transformation stored in remover to the test set predictors. Note: You must transform both the training and test data sets before passing them to a classifier. newadulttest = transform(remover,adulttest, ... PredictorNames=predictors);
Train the same type of ensemble classifier as mdl, but use the transformed predictor data. As before, predict the salary (class label) of the observations in the test set adulttest. rng("default") % For reproducibility newMdl = fitcensemble(newadultdata,"salary",Weights="fnlwgt", ... PredictorNames=predictors,Method="RUSBoost"); newLabels = predict(newMdl,newadulttest);
Compare the disparate impact values for the predictions made by the original model (mdl) and the predictions made by the model trained with the transformed data (newMdl). For each group in the sensitive attribute, the disparate impact value is the proportion of predictions in that group with a positive class value (pg+) divided by the proportion of predictions in the reference group with a positive class value (pr+). An ideal classifier makes predictions where, for each group, pg+ is close to pr+ (that is, where the disparate impact value is close to 1). Compute the disparate impact values for the mdl predictions and the newMdl predictions by using fairnessMetrics. Include the observation weights. You can use the report object function to display bias metrics, such as disparate impact, that are stored in the evaluator object.
evaluator = fairnessMetrics(adulttest,"salary", ...
    SensitiveAttributeNames="race",Predictions=[labels,newLabels], ...
    Weights="fnlwgt",ModelNames=["Original Model","New Model"]);
evaluator.PositiveClass
ans = categorical
     >50K
evaluator.ReferenceGroup
ans = 'White'
report(evaluator,BiasMetrics="DisparateImpact")
ans=5×5 table
    Metrics            SensitiveAttributeNames    Groups                Original Model    New Model
    DisparateImpact    race                       Amer-Indian-Eskimo        0.41702         0.9280
    DisparateImpact    race                       Asian-Pac-Islander        1.719           0.969
    DisparateImpact    race                       Black                     0.60571         0.6662
    DisparateImpact    race                       Other                     0.66958
    DisparateImpact    race                       White                     1
For the mdl predictions, several of the disparate impact values are below the industry standard of 0.8, and one value is above 1.25. These values indicate bias in the predictions with respect to the positive class >50K and the sensitive attribute race. The disparate impact values for the newMdl predictions are closer to 1 than the disparate impact values for the mdl predictions. One value is still below 0.8. Visually compare the disparate impact values by using the bar graph returned by the plot object function. plot(evaluator,"DisparateImpact")
The disparateImpactRemover function seems to have improved the model predictions on the test set with respect to the disparate impact metric. Check whether the transformed predictors negatively affect the accuracy of the model predictions. Compute the accuracy of the test set predictions for the two models mdl and newMdl. accuracy = 1-loss(mdl,adulttest,"salary") accuracy = 0.8024 newAccuracy = 1-loss(newMdl,newadulttest,"salary")
0.8603
newAccuracy = 0.7955
The model trained using the transformed predictors (newMdl) achieves similar test set accuracy compared to the model trained with the original predictors (mdl).
See Also fairnessMetrics | fairnessWeights | disparateImpactRemover | transform | fairnessThresholder | loss | predict
27 Interpretability • “Interpret Machine Learning Models” on page 27-2 • “Shapley Values for Machine Learning Model” on page 27-18
Interpret Machine Learning Models This topic introduces Statistics and Machine Learning Toolbox features for model interpretation and shows how to interpret a machine learning model (classification and regression). A machine learning model is often referred to as a "black box" model because it can be difficult to understand how the model makes predictions. Interpretability tools help you overcome this aspect of machine learning algorithms and reveal how predictors contribute (or do not contribute) to predictions. Also, you can validate whether the model uses the correct evidence for its predictions, and find model biases that are not immediately apparent.
Features for Model Interpretation Use lime, shapley, and plotPartialDependence to explain the contribution of individual predictors to the predictions of a trained classification or regression model. • lime — Local interpretable model-agnostic explanations (LIME [1]) interpret a prediction for a query point by fitting a simple interpretable model for the query point. The simple model acts as an approximation for the trained model and explains model predictions around the query point. The simple model can be either a linear model or a decision tree model. You can use the estimated coefficients of a linear model or the estimated predictor importance of a decision tree model to explain the contribution of individual predictors to the prediction for the query point. For more details, see “LIME” on page 35-4652. • shapley — The Shapley value ([2], [3], and [4]) of a predictor for a query point explains the deviation of the prediction (response for regression or class scores for classification) for the query point from the average prediction, due to the predictor. For a query point, the sum of the Shapley values for all features corresponds to the total deviation of the prediction from the average. For more details, see “Shapley Values for Machine Learning Model” on page 27-18. • plotPartialDependence and partialDependence — A partial dependence plot (PDP [5]) shows the relationships between a predictor (or a pair of predictors) and the prediction (response for regression or class scores for classification) in the trained model. The partial dependence on the selected predictor is defined by the averaged prediction obtained by marginalizing out the effect of the other variables. Therefore, the partial dependence is a function of the selected predictor that shows the average effect of the selected predictor over the data set. You can also create a set of individual conditional expectation (ICE [6]) plots for each observation, showing the effect of the selected predictor on a single observation. For more details, see “More About” on page 35-6172 on the plotPartialDependence reference page. Some machine learning models support embedded type feature selection, where the model learns predictor importance as part of the model learning process. You can use the estimated predictor importance to explain model predictions. For example: • Train an ensemble (ClassificationBaggedEnsemble or RegressionBaggedEnsemble) of bagged decision trees (for example, random forest) and use the predictorImportance and oobPermutedPredictorImportance functions. • Train a linear model with lasso regularization, which shrinks the coefficients of the least important predictors. Then use the estimated coefficients as measures for predictor importance. For example, use fitclinear or fitrlinear and specify the 'Regularization' name-value argument as 'lasso'. For a list of machine learning models that support embedded type feature selection, see “Embedded Type Feature Selection” on page 16-51. 27-2
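A hedged sketch of the lasso route described above, using the ionosphere data purely as an illustration (the Lambda value is arbitrary):
% Embedded feature selection with a lasso-regularized linear classifier.
load ionosphere                                  % 34 predictors, binary response
MdlLasso = fitclinear(X,Y,'Regularization','lasso','Lambda',0.02);
importantPredictors = find(MdlLasso.Beta ~= 0)   % Predictors with nonzero coefficients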
Use Statistics and Machine Learning Toolbox features for three levels of model interpretation: local, cohort, and global.
Local interpretation — Objective: Explain a prediction for a single query point. Use cases: Identify important predictors for an individual prediction; examine a counterintuitive prediction. Toolbox feature: Use lime and shapley for a specified query point.
Cohort interpretation — Objective: Explain how a trained model makes predictions for a subset of the entire data set. Use case: Validate predictions for a particular group of samples. Toolbox features: Use lime and shapley for multiple query points (after creating a lime or shapley object, you can call the object function fit multiple times to interpret predictions for other query points); pass a subset of data when you call lime, shapley, and plotPartialDependence (the features interpret the trained model using the specified subset instead of the entire training data set).
Global interpretation — Objective: Explain how a trained model makes predictions for the entire data set. Use cases: Demonstrate how a trained model works; compare different models. Toolbox features: Use plotPartialDependence to create PDPs and ICE plots for the predictors of interest; find important predictors from a trained model that supports “Embedded Type Feature Selection” on page 16-51.
Interpret Classification Model This example trains an ensemble of bagged decision trees using the random forest algorithm, and interprets the trained model using interpretability features. Use the object functions (oobPermutedPredictorImportance and predictorImportance) of the trained model to find important predictors in the model. Also, use lime and shapley to interpret the predictions for specified query points. Then use plotPartialDependence to create a plot that shows the relationships between an important predictor and predicted classification scores. Train Classification Ensemble Model Load the CreditRating_Historical data set. The data set contains customer IDs and their financial ratios, industry labels, and credit ratings. tbl = readtable('CreditRating_Historical.dat');
Display the first three rows of the table.
head(tbl,3)
    ID       WC_TA    RE_TA    EBIT_TA    MVE_BVTD    S_TA     Industry    Rating
    62394    0.013    0.104    0.036      0.447       0.142    3           {'BB'}
    48608    0.232    0.335    0.062      1.969       0.281    8           {'A' }
    42444    0.311    0.367    0.074      1.935       0.366    1           {'A' }
Create a table of predictor variables by removing the columns containing customer IDs and ratings from tbl. tblX = removevars(tbl,["ID","Rating"]);
Train an ensemble of bagged decision trees by using the fitcensemble function and specifying the ensemble aggregation method as random forest ('Bag'). For reproducibility of the random forest algorithm, specify the 'Reproducible' name-value argument as true for tree learners. Also, specify the class names to set the order of the classes in the trained model. rng('default') % For reproducibility t = templateTree('Reproducible',true); blackbox = fitcensemble(tblX,tbl.Rating, ... 'Method','Bag','Learners',t, ... 'CategoricalPredictors','Industry', ... 'ClassNames',{'AAA' 'AA' 'A' 'BBB' 'BB' 'B' 'CCC'});
blackbox is a ClassificationBaggedEnsemble model. Use Model-Specific Interpretability Features ClassificationBaggedEnsemble supports two object functions, oobPermutedPredictorImportance and predictorImportance, which find important predictors in the trained model. Estimate out-of-bag predictor importance by using the oobPermutedPredictorImportance function. The function randomly permutes out-of-bag data across one predictor at a time, and estimates the increase in the out-of-bag error due to this permutation. The larger the increase, the more important the feature. Imp1 = oobPermutedPredictorImportance(blackbox);
Estimate predictor importance by using the predictorImportance function. The function estimates predictor importance by summing changes in the node risk due to splits on each predictor and dividing the sum by the number of branch nodes. Imp2 = predictorImportance(blackbox);
Create a table containing the predictor importance estimates, and use the table to create horizontal bar graphs. To display an existing underscore in any predictor name, change the TickLabelInterpreter value of the axes to 'none'. table_Imp = table(Imp1',Imp2', ... 'VariableNames',{'Out-of-Bag Permuted Predictor Importance','Predictor Importance'}, ... 'RowNames',blackbox.PredictorNames); tiledlayout(1,2) ax1 = nexttile; table_Imp1 = sortrows(table_Imp,'Out-of-Bag Permuted Predictor Importance');
barh(categorical(table_Imp1.Row,table_Imp1.Row),table_Imp1.('Out-of-Bag Permuted Predictor Importance'))
xlabel('Out-of-Bag Permuted Predictor Importance')
ylabel('Predictor')
ax2 = nexttile;
table_Imp2 = sortrows(table_Imp,'Predictor Importance');
barh(categorical(table_Imp2.Row,table_Imp2.Row),table_Imp2.('Predictor Importance'))
xlabel('Predictor Importance')
ax1.TickLabelInterpreter = 'none';
ax2.TickLabelInterpreter = 'none';
Both object functions identify MVE_BVTD and RE_TA as the two most important predictors.
Specify Query Point
Find the observations whose Rating is 'AAA' and choose four query points among them.
rng('default')
tblX_AAA = tblX(strcmp(tbl.Rating,'AAA'),:);
queryPoint = datasample(tblX_AAA,4,'Replace',false)
queryPoint=4×6 table
    WC_TA    RE_TA    EBIT_TA    MVE_BVTD    S_TA     Industry
    0.283    0.715    0.069       9.612      1.066       11
    0.603    0.891    0.117       7.851      0.591        6
    0.212    0.486    0.057       3.986      0.679        2
    0.273    0.491    0.071       3.287      0.465        5
Use LIME with Linear Simple Models Explain the predictions for the query points using lime with linear simple models. lime generates a synthetic data set and fits a simple model to the synthetic data set. Create a lime object using tblX_AAA so that lime generates a synthetic data set using only the observations whose Rating is 'AAA', not the entire data set. explainer_lime = lime(blackbox,tblX_AAA);
The default value of “DataLocality” on page 35-0 for lime is 'global', which implies that, by default, lime generates a global synthetic data set and uses it for any query points. lime uses different observation weights so that weight values are more focused on the observations near the query point. Therefore, you can interpret each simple model as an approximation of the trained model for a specific query point. Fit simple models for the four query points by using the object function fit. Specify the third input (the number of important predictors to use in the simple model) as 6 to use all six predictors.
explainer_lime1 = fit(explainer_lime,queryPoint(1,:),6);
explainer_lime2 = fit(explainer_lime,queryPoint(2,:),6);
explainer_lime3 = fit(explainer_lime,queryPoint(3,:),6);
explainer_lime4 = fit(explainer_lime,queryPoint(4,:),6);
Plot the coefficients of the simple models by using the object function plot. tiledlayout(2,2) nexttile plot(explainer_lime1) nexttile plot(explainer_lime2) nexttile plot(explainer_lime3) nexttile plot(explainer_lime4)
All simple models identify EBIT_TA, MVE_BVTD, RE_TA, and WC_TA as the four most important predictors. The positive coefficients for the predictors suggest that increasing the predictor values leads to an increase in the predicted scores in the simple models. For a categorical predictor, the plot function displays only the most important dummy variable of the categorical predictor. Therefore, each bar graph displays a different dummy variable. Compute Shapley Values The Shapley value of a predictor for a query point explains the deviation of the predicted score for the query point from the average score, due to the predictor. Create a shapley object using tblX_AAA so that shapley computes the expected contribution based on the samples for 'AAA'. explainer_shapley = shapley(blackbox,tblX_AAA);
Compute the Shapley values for the query points by using the object function fit.
explainer_shapley1 = fit(explainer_shapley,queryPoint(1,:));
explainer_shapley2 = fit(explainer_shapley,queryPoint(2,:));
explainer_shapley3 = fit(explainer_shapley,queryPoint(3,:));
explainer_shapley4 = fit(explainer_shapley,queryPoint(4,:));
Plot the Shapley values by using the object function plot. tiledlayout(2,2) nexttile plot(explainer_shapley1)
nexttile plot(explainer_shapley2) nexttile plot(explainer_shapley3) nexttile plot(explainer_shapley4)
MVE_BVTD is the most important predictor for all the query points. The Shapley values of MVE_BVTD are positive for the first three query points. The MVE_BVTD variable values are about 9.6, 7.9, 4.0, and 3.3 for the query points. According to the Shapley values for the four query points, a large MVE_BVTD value leads to an increase in the predicted score, and a small MVE_BVTD value leads to a decrease in the predicted scores compared to the average. Create Partial Dependence Plot (PDP) A PDP shows the averaged relationships between the predictor and the predicted score in the trained model. Create PDPs for RE_TA and MVE_BVTD, which the other interpretability tools identify as important predictors. Pass tblX_AAA to plotPartialDependence so that the function computes the expectation of the predicted scores using only the samples for 'AAA'.
figure
plotPartialDependence(blackbox,'RE_TA','AAA',tblX_AAA)
plotPartialDependence(blackbox,'MVE_BVTD','AAA',tblX_AAA)
The minor ticks in the x-axis represent the unique values of the predictor in tblX_AAA. The plot for MVE_BVTD shows that the predicted score is large when the MVE_BVTD value is small. The score value decreases as the MVE_BVTD value increases until it reaches about 5, and then the score value stays unchanged as the MVE_BVTD value increases. The dependency on MVE_BVTD in the subset tblX_AAA identified by plotPartialDependence is not consistent with the local contributions of MVE_BVTD at the four query points identified by lime and shapley.
Interpret Regression Model The model interpretation workflow for a regression problem is similar to the workflow for a classification problem, as demonstrated in the example “Interpret Classification Model” on page 27-3. This example trains a Gaussian process regression (GPR) model and interprets the trained model using interpretability features. Use a kernel parameter of the GPR model to estimate predictor weights. Also, use lime and shapley to interpret the predictions for specified query points. Then use plotPartialDependence to create a plot that shows the relationships between an important predictor and predicted responses. Train GPR Model Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s. load carbig
Create a table containing the predictor variables Acceleration, Cylinders, and so on.
tbl = table(Acceleration,Cylinders,Displacement,Horsepower,Model_Year,Weight);
Train a GPR model of the response variable MPG by using the fitrgp function. Specify KernelFunction as 'ardsquaredexponential' to use the squared exponential kernel with a separate length scale per predictor. blackbox = fitrgp(tbl,MPG,'ResponseName','MPG','CategoricalPredictors',[2 5], ... 'KernelFunction','ardsquaredexponential');
blackbox is a RegressionGP model. Use Model-Specific Interpretability Features You can compute predictor weights (predictor importance) from the learned length scales of the kernel function used in the model. The length scales define how far apart a predictor can be for the response values to become uncorrelated. Find the normalized predictor weights by taking the exponential of the negative learned length scales. sigmaL = blackbox.KernelInformation.KernelParameters(1:end-1); % Learned length scales weights = exp(-sigmaL); % Predictor weights weights = weights/sum(weights); % Normalized predictor weights
Create a table containing the normalized predictor weights, and use the table to create horizontal bar graphs. To display an existing underscore in any predictor name, change the TickLabelInterpreter value of the axes to 'none'. tbl_weight = table(weights,'VariableNames',{'Predictor Weight'}, ... 'RowNames',blackbox.ExpandedPredictorNames); tbl_weight = sortrows(tbl_weight,'Predictor Weight'); b = barh(categorical(tbl_weight.Row,tbl_weight.Row),tbl_weight.('Predictor Weight')); b.Parent.TickLabelInterpreter = 'none'; xlabel('Predictor Weight') ylabel('Predictor')
The predictor weights indicate that multiple dummy variables for the categorical predictors Model_Year and Cylinders are important.
Specify Query Point
Find the observations whose MPG values are smaller than the 0.25 quantile of MPG. From the subset, choose four query points that do not include missing values.
rng('default') % For reproducibility
idx_subset = find(MPG < quantile(MPG,0.25));
tbl_subset = tbl(idx_subset,:);
queryPoint = datasample(rmmissing(tbl_subset),4,'Replace',false)
queryPoint=4×6 table
    Acceleration    Cylinders    Displacement    Horsepower    Model_Year    Weight
    13.2            8            318             150           76            3940
    14.9            8            302             130           77            4295
    14              8            360             215           70            4615
    13.7            8            318             145           77            4140
Use LIME with Tree Simple Models Explain the predictions for the query points using lime with decision tree simple models. lime generates a synthetic data set and fits a simple model to the synthetic data set.
Create a lime object using tbl_subset so that lime generates a synthetic data set using the subset instead of the entire data set. Specify SimpleModelType as 'tree' to use a decision tree simple model. explainer_lime = lime(blackbox,tbl_subset,'SimpleModelType','tree');
The default value of “DataLocality” on page 35-0 for lime is 'global', which implies that, by default, lime generates a global synthetic data set and uses it for any query points. lime uses different observation weights so that weight values are more focused on the observations near the query point. Therefore, you can interpret each simple model as an approximation of the trained model for a specific query point. Fit simple models for the four query points by using the object function fit. Specify the third input (the number of important predictors to use in the simple model) as 6. With this setting, the software specifies the maximum number of decision splits (or branch nodes) as 6 so that the fitted decision tree uses at most all predictors.
explainer_lime1 = fit(explainer_lime,queryPoint(1,:),6);
explainer_lime2 = fit(explainer_lime,queryPoint(2,:),6);
explainer_lime3 = fit(explainer_lime,queryPoint(3,:),6);
explainer_lime4 = fit(explainer_lime,queryPoint(4,:),6);
Plot the predictor importance by using the object function plot. tiledlayout(2,2) nexttile plot(explainer_lime1) nexttile plot(explainer_lime2) nexttile plot(explainer_lime3) nexttile plot(explainer_lime4)
All simple models identify Displacement, Model_Year, and Weight as important predictors. Compute Shapley Values The Shapley value of a predictor for a query point explains the deviation of the predicted response for the query point from the average response, due to the predictor. Create a shapley object for the model blackbox using tbl_subset so that shapley computes the expected contribution based on the observations in tbl_subset. explainer_shapley = shapley(blackbox,tbl_subset);
Compute the Shapley values for the query points by using the object function fit.
explainer_shapley1 = fit(explainer_shapley,queryPoint(1,:));
explainer_shapley2 = fit(explainer_shapley,queryPoint(2,:));
explainer_shapley3 = fit(explainer_shapley,queryPoint(3,:));
explainer_shapley4 = fit(explainer_shapley,queryPoint(4,:));
Plot the Shapley values by using the object function plot. tiledlayout(2,2) nexttile plot(explainer_shapley1) nexttile plot(explainer_shapley2) nexttile plot(explainer_shapley3)
nexttile plot(explainer_shapley4)
Model_Year is the most important predictor for the first, second, and fourth query points, and the Shapley values of Model_Year are positive for the three query points. The Model_Year variable value is 76 or 77 for these three points, and the value for the third query point is 70. According to the Shapley values for the four query points, a small Model_Year value leads to a decrease in the predicted response, and a large Model_Year value leads to an increase in the predicted response compared to the average. Create Partial Dependence Plot (PDP) A PDP plot shows the averaged relationships between the predictor and the predicted response in the trained model. Create a PDP for Model_Year, which the other interpretability tools identify as an important predictor. Pass tbl_subset to plotPartialDependence so that the function computes the expectation of the predicted responses using only the samples in tbl_subset. figure plotPartialDependence(blackbox,'Model_Year',tbl_subset)
The plot shows the same trend identified by the Shapley values for the four query points. The predicted response (MPG) value increases as the Model_Year value increases.
References [1] Ribeiro, Marco Tulio, S. Singh, and C. Guestrin. "'Why Should I Trust You?': Explaining the Predictions of Any Classifier." In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–44. San Francisco, California: ACM, 2016. [2] Lundberg, Scott M., and S. Lee. "A Unified Approach to Interpreting Model Predictions." Advances in Neural Information Processing Systems 30 (2017): 4765–774. [3] Aas, Kjersti, Martin Jullum, and Anders Løland. "Explaining Individual Predictions When Features Are Dependent: More Accurate Approximations to Shapley Values." Artificial Intelligence 298 (September 2021). [4] Lundberg, Scott M., G. Erion, H. Chen, et al. "From Local Explanations to Global Understanding with Explainable AI for Trees." Nature Machine Intelligence 2 (January 2020): 56–67. [5] Friedman, Jerome. H. “Greedy Function Approximation: A Gradient Boosting Machine.” The Annals of Statistics 29, no. 5 (2001): 1189-1232. [6] Goldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. “Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” Journal of Computational and Graphical Statistics 24, no. 1 (January 2, 2015): 44–65. 27-16
See Also lime | shapley | plotPartialDependence
Related Examples
• “Shapley Values for Machine Learning Model” on page 27-18
• “Introduction to Feature Selection” on page 16-46
• “Interpret Deep Network Predictions on Tabular Data Using LIME” (Deep Learning Toolbox)
• Discover Interpretability Features
• Model Interpretability in MATLAB
• Lowering Barriers to AI Adoption with AutoML and Interpretability
Shapley Values for Machine Learning Model This topic defines Shapley values, describes two available algorithms in the Statistics and Machine Learning Toolbox feature that computes Shapley values, provides examples for each, and shows how to reduce the computational cost.
What Is a Shapley Value? In game theory, the Shapley value of a player is the average marginal contribution of the player in a cooperative game. That is, Shapley values are fair allocations, to individual players, of the total gain generated from a cooperative game. In the context of machine learning prediction, the Shapley value of a feature for a query point explains the contribution of the feature to a prediction (the response for regression or the score of each class for classification) at the specified query point. The Shapley value corresponds to the deviation of the prediction for the query point from the average prediction, due to the feature. For each query point, the sum of the Shapley values for all features corresponds to the total deviation of the prediction from the average. The Shapley value of the ith feature for the query point x is defined by the value function v: φi(vx) =
(1/M) ∑_{S ⊆ ℳ\{i}} [ vx(S ∪ {i}) − vx(S) ] · |S|! (M − |S| − 1)! / (M − 1)!   (27-1)
• M is the number of all features. • ℳ is the set of all features. • |S| is the cardinality of the set S, or the number of elements in the set S. • vx(S) is the value function of the features in a set S for the query point x. The value of the function indicates the expected contribution of the features in S to the prediction for the query point x.
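To make Equation 27-1 concrete, the following sketch enumerates all subsets for a three-feature problem and computes the Shapley values directly. The value function here is a made-up additive scoring function, not a toolbox call, so the result simply recovers each feature's own contribution.
% Brute-force Shapley values for M = 3 features (illustrative value function).
M = 3;
x = [2 -1 0.5];                       % Made-up query point
beta = [1.5 0.7 -2];                  % Made-up coefficients
v = @(S) sum(beta(S).*x(S));          % Value of a coalition S (additive, for illustration)
phi = zeros(1,M);
for i = 1:M
    others = setdiff(1:M,i);
    for b = 0:2^numel(others)-1       % Enumerate every subset S of the remaining features
        S = others(bitget(b,1:numel(others)) == 1);
        w = factorial(numel(S))*factorial(M-numel(S)-1)/factorial(M);
        phi(i) = phi(i) + w*(v([S i]) - v(S));
    end
end
phi                                   % Equals beta.*x for this additive value function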
Shapley Value in Statistics and Machine Learning Toolbox You can compute Shapley values for a machine learning model by using a shapley object. Use the values to interpret the contributions of individual features in the model to the prediction for a query point. You can compute Shapley values in two ways: • Create a shapley object for a machine learning model with a specified query point by using the shapley function. The function computes the Shapley values of all features in the model for the query point. • Create a shapley object for a machine learning model by using the shapley function, and then compute the Shapley values for a specified query point by using the fit function.
Algorithms shapley offers two types of algorithms: interventional, which uses interventional distributions for the value function, and conditional, which uses conditional distributions for the value function. You can specify the algorithm type to use by setting the Method name-value argument of the shapley function or the fit function. The difference between the two types of algorithms is the definition of the value function. Both types define the value function so that the sum of the Shapley values of a query point over all features corresponds to the total deviation of the prediction for the query point from the average. 27-18
∑_{i=1}^{M} φi(vx) = f(x) − E[f(x)].
Therefore, the value function vx(S) must correspond to the expected contribution of the features in S to the prediction (f) for the query point x. The algorithms compute the expected contribution by using artificial samples created from the specified data (X). You must provide X through the machine learning model input or a separate data input argument when you create a shapley object. In the artificial samples, the values for the features in S come from the query point. For the rest of the features (features in Sc, the complement of S), an interventional algorithm generates samples using interventional distributions, whereas a conditional algorithm generates samples using conditional distributions. Interventional Algorithms By default, shapley uses one of these interventional algorithms: Kernel SHAP [1], Linear SHAP [1], or Tree SHAP [2]. Computing exact Shapley values can be computationally expensive if shapley uses all possible subsets S. Therefore, shapley estimates the Shapley values by limiting the maximum number of subsets to use for the Kernel SHAP algorithm. For more details, see “Computational Cost” on page 27-24. For linear models and tree-based models, shapley offers the Linear SHAP and Tree SHAP algorithms, respectively. These algorithms are computationally less expensive and compute exact Shapley values. The algorithms return the same Shapley values that the Kernel SHAP algorithm returns when using all possible subsets. The Linear SHAP, Tree SHAP, and Kernel SHAP algorithms differ in these ways: • The Linear SHAP and Tree SHAP algorithms ignore the ResponseTransform property (for regression) and the ScoreTransform property (for classification) of the machine learning model. That is, the algorithms compute Shapley values based on raw responses or raw scores without applying response transformation or score transformation, respectively. The Kernel SHAP algorithm uses transformed values if the model specifies transformation in the ResponseTransform or ScoreTransform property. • The Kernel SHAP and Tree SHAP algorithms can use observations with missing values. The Linear SHAP algorithm cannot handle observations with missing values for any model. shapley selects an algorithm based on the machine learning model type and other specified options: • Linear SHAP algorithm for these linear models: • RegressionLinear and ClassificationLinear • RegressionSVM, CompactRegressionSVM, ClassificationSVM, and CompactClassificationSVM models that use a linear kernel function • Tree SHAP algorithm for these tree models and ensemble models with tree learners: • Tree models — RegressionTree, CompactRegressionTree, ClassificationTree, and CompactClassificationTree • Ensemble models with tree learners — RegressionEnsemble, RegressionBaggedEnsemble, CompactRegressionEnsemble, ClassificationEnsemble, CompactClassificationEnsemble, and ClassificationBaggedEnsemble models that use tree learners 27-19
To use the Tree SHAP algorithm, you must specify the Method name-value argument (ensemble aggregation method) as 'Bag', 'AdaBoostM2', 'GentleBoost', 'LogitBoost', or 'RUSBoost' when you train a classification ensemble model.
• Kernel SHAP algorithm for all other model types and for these cases:
  • For the tree models and ensembles of trees previously listed, the software might use Kernel SHAP instead of Tree SHAP if the models use surrogate splits (Surrogate) for prediction, and observations in the input predictor data or values in the query point contain missing values. In earlier releases, for tree models and ensemble models with tree learners, the software always used Kernel SHAP instead of Tree SHAP when observations in the input predictor data or values in the query point contained missing values.
  • If you specify the MaxNumSubsets name-value argument (maximum number of predictor subsets to use for Shapley value computation) of shapley or fit, the software uses Kernel SHAP.
  • In some cases, Kernel SHAP can be computationally less expensive than Tree SHAP. For example, Kernel SHAP can be more efficient if a model contains a deep tree for low-dimensional data. The software heuristically selects an efficient algorithm.

An interventional algorithm defines the value function of the features in S at the query point x as the expected prediction with respect to the interventional distribution D, which is the joint distribution of the features in Sc:

vx(S) = ED[f(xS, XSc)].

xS is the query point value for the features in S, and XSc are the features in Sc.

To evaluate the value function vx(S) at the query point x, with the assumption that the features are not highly correlated, shapley uses the values in the data X as samples of the interventional distribution D for the features in Sc:

vx(S) = ED[f(xS, XSc)] ≈ (1/N) ∑ j=1..N f(xS, (XSc)j).

N is the number of observations, and (XSc)j contains the values of the features in Sc for the jth observation.

For example, suppose you have three features in X and four observations: (x11,x12,x13), (x21,x22,x23), (x31,x32,x33), and (x41,x42,x43). Assume that S includes the first feature, and Sc includes the rest. In this case, the value function of the first feature evaluated at the query point (x41,x42,x43) is

vx(S) = (1/4)[ f(x41,x12,x13) + f(x41,x22,x23) + f(x41,x32,x33) + f(x41,x42,x43) ].
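To make the averaging concrete, here is a minimal sketch of how this interventional approximation can be evaluated for one coalition S. The function name valueFunctionInterventional, the function handle predictFcn, and the logical index inS are illustrative assumptions and are not part of the shapley interface.

function v = valueFunctionInterventional(predictFcn,X,queryPoint,inS)
% Approximate vx(S): replace the features in S with the query point
% values in every row of X (artificial samples) and average the
% model predictions over the N observations.
Xart = X;
Xart(:,inS) = repmat(queryPoint(inS),size(X,1),1);
v = mean(predictFcn(Xart));
end

For a regression model Mdl, predictFcn could be @(Z) predict(Mdl,Z); calling the sketch with inS marking only the first feature reproduces the four-term average shown above for the three-feature example.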
An interventional algorithm is computationally less expensive than a conditional algorithm and supports ordered categorical predictors. However, an interventional algorithm requires the feature independence assumption and uses out-of-distribution samples [4]. The artificial samples created with a mix of the query point and the data X can contain unrealistic observations. For example, (x41,x12,x13) might be a sample that does not occur in the full joint distribution of the three features.

Conditional Algorithm

Specify the Method name-value argument as 'conditional' to use the extension to the Kernel SHAP algorithm [3], which is a conditional algorithm.
A conditional algorithm defines the value function of the features in S at the query point x using the conditional distribution of XSc, given that XS has the query point values:

vx(S) = E[ f(xS, XSc) | XS = xS ].

To evaluate the value function vx(S) at the query point x, shapley uses nearest neighbors of the query point, which correspond to 10% of the observations in the data X. This approach uses more realistic samples than an interventional algorithm and does not require the feature independence assumption. However, a conditional algorithm is computationally more expensive, does not support ordered categorical predictors, and cannot handle NaNs in continuous features. Also, the algorithm might assign a nonzero Shapley value to a dummy feature, which does not contribute to the prediction, if the dummy feature is correlated with an important feature [4].
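The following is a simplified sketch of the idea behind conditional sampling, using the nearest 10% of the observations; it is only an illustration of how conditional sampling differs from interventional sampling, not the actual algorithm of [3]. The names valueFunctionConditional, predictFcn, and inS are assumptions for this sketch.

function v = valueFunctionConditional(predictFcn,X,queryPoint,inS)
% Approximate vx(S) by averaging predictions over samples restricted
% to the nearest neighbors of the query point (10% of the rows in X).
k = ceil(0.1*size(X,1));                       % 10% of the observations
idx = knnsearch(X,queryPoint,"K",k);           % nearest neighbors of x
Xart = X(idx,:);
Xart(:,inS) = repmat(queryPoint(inS),k,1);     % features in S take query values
v = mean(predictFcn(Xart));
end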
Specify Computation Algorithm

This example trains a linear classification model and computes Shapley values using an interventional algorithm ('Method','interventional') and then a conditional algorithm ('Method','conditional').

Train Linear Classification Model

Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g').

load ionosphere
Train a linear classification model. For better accuracy of linear coefficients, specify the objective function minimization technique ('Solver' name-value argument) as the limited-memory Broyden-Fletcher-Goldfarb-Shanno quasi-Newton algorithm ('lbfgs').

Mdl = fitclinear(X,Y,'Solver','lbfgs')

Mdl = 
  ClassificationLinear
      ResponseName: 'Y'
        ClassNames: {'b'  'g'}
    ScoreTransform: 'none'
              Beta: [34x1 double]
              Bias: -3.7100
            Lambda: 0.0028
           Learner: 'svm'
Compute Shapley Values Using Interventional Algorithm

Compute the Shapley values for the first observation using the Linear SHAP algorithm, which is an interventional algorithm. You do not have to specify the Method name-value argument because 'interventional' is the default.

queryPoint = X(1,:);
explainer1 = shapley(Mdl,X,'QueryPoint',queryPoint);
For a classification model, shapley computes Shapley values using the predicted class score for each class. Plot the Shapley values for the predicted class by using the plot function.
plot(explainer1)
The horizontal bar graph shows the Shapley values for the 10 most important variables, sorted by their absolute values. Each value explains the deviation of the score for the query point from the average score of the predicted class, due to the corresponding variable.

For a linear model, shapley assumes features are independent from one another and computes the Shapley values from the estimated coefficients (Mdl.Beta) [1]. Compute the Shapley values for the positive class (the second class in Mdl.ClassNames, 'g') directly from the estimated coefficients.

linearSHAPValues = (Mdl.Beta'.*(queryPoint-mean(X)))';
Create a table containing the Shapley values computed from shapley and the values from the coefficients.

t = table(explainer1.ShapleyValues.Predictor,explainer1.ShapleyValues.g,linearSHAPValues, ...
    'VariableNames',{'Predictor','Values from shapley','Values from coefficients'})

t=34×3 table
    Predictor    Values from shapley    Values from coefficients
    _________    ___________________    ________________________
      "x1"              0.28789                  0.28789
      "x2"                    0                        0
      "x3"              0.20822                  0.20822
      "x4"             -0.01998                 -0.01998
      "x5"              0.20872                  0.20872
      "x6"            -0.076991                -0.076991
      "x7"              0.19188                  0.19188
      "x8"             -0.64386                 -0.64386
      "x9"              0.42348                  0.42348
      "x10"           -0.030049                -0.030049
      "x11"            -0.23132                 -0.23132
      "x12"              0.1422                   0.1422
      "x13"           -0.045973                -0.045973
      "x14"            -0.29022                 -0.29022
      "x15"             0.21051                  0.21051
      "x16"             0.13382                  0.13382
      ⋮
Compute Shapley Values Using Conditional Algorithm

Compute the Shapley values for the first observation using the extension to the Kernel SHAP algorithm, which is a conditional algorithm.

explainer2 = shapley(Mdl,X,'QueryPoint',queryPoint,'Method','conditional');

Plot the Shapley values.

plot(explainer2)
The two algorithms identify different sets for the 10 most important variables. Only two variables, x8 and x22, are included in both sets.
Computational Cost

The computational cost for Shapley values increases if the number of observations or features is large.

Large Number of Observations

Computing the value function (v) can be computationally expensive if you have a large number of observations, for example, more than 1000. For faster computation, use a smaller sample of the observations when you create a shapley object, or run in parallel by specifying UseParallel as true when you compute the values using the shapley or fit function. The UseParallel option is available when shapley uses the Tree SHAP algorithm for an ensemble of trees, the Kernel SHAP algorithm, or the extension to the Kernel SHAP algorithm. Computing in parallel requires Parallel Computing Toolbox.

Large Number of Features

Computing the summand in “Equation 27-1” for all possible subsets S can be computationally expensive when M (the number of features) is large for the Kernel SHAP algorithm or the extension to the Kernel SHAP algorithm. The total number of subsets to consider is 2^M. Instead of computing the summand for all subsets, you can specify the maximum number of subsets by using the MaxNumSubsets name-value argument. shapley chooses subsets to use based on their weight values. The weight of a subset is proportional to 1/(denominator of the summand), which corresponds to 1 over the binomial coefficient (M-1 choose |S|). Therefore, a subset with a high or low value of cardinality has a large weight value. shapley includes the subsets with the highest weight first, and then includes the other subsets in descending order based on their weight values.
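The following short sketch illustrates the weighting just described. The number of features M and the resulting ordering are illustrative; the actual subset selection inside shapley is not exposed.

M = 10;                                        % number of features (illustrative)
card = 1:M-1;                                  % possible cardinalities of S
w = 1./arrayfun(@(s) nchoosek(M-1,s),card);    % weight proportional to 1/(M-1 choose |S|)
[~,order] = sort(w,"descend");
disp(card(order))                              % smallest and largest cardinalities come first

Because the weights are largest for the smallest and largest cardinalities, limiting MaxNumSubsets keeps the coalitions that carry the most weight in the Shapley value formula.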
Reduce Computational Cost

This example shows how to reduce the computational cost of Shapley values when you have a large number of both observations and features.

Load the sample data set NYCHousing2015.

load NYCHousing2015
The data set includes 91,446 observations of 10 variables with information on the sales of properties in New York City in 2015. This example uses these variables to analyze the sale prices (SALEPRICE).

Preprocess the data set. Convert the datetime array (SALEDATE) to the month numbers.

NYCHousing2015.SALEDATE = month(NYCHousing2015.SALEDATE);

Train a neural network regression model.

Mdl = fitrnet(NYCHousing2015,'SALEPRICE','Standardize',true);
Compute the Shapley values of all predictor variables for the first observation. Measure the time required for the computation by using tic and toc.
tic
explainer1 = shapley(Mdl,'QueryPoint',NYCHousing2015(1,:));

Warning: Computation can be slow because the predictor data has over 1000 observations. Use a sma

toc

Elapsed time is 150.637868 seconds.
As the warning message indicates, the computation can be slow because the predictor data has over 1000 observations. shapley provides several options to reduce the computational cost when you have a large number of observations or features:
• Large number of observations — Use a smaller sample of the training data and run in parallel by specifying UseParallel as true.
• Large number of features — Specify the MaxNumSubsets name-value argument to limit the number of subsets included in the computation.

Start a parallel pool.

parpool;

Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 6 workers.
Compute the Shapley values again using a smaller sample of the training data and the parallel computing option. Also, specify the maximum number of subsets as 2^5.

NumSamples = 5e2;
Tbl = datasample(NYCHousing2015,NumSamples,'Replace',false);
tic
explainer2 = shapley(Mdl,Tbl,'QueryPoint',NYCHousing2015(1,:), ...
    'UseParallel',true,'MaxNumSubsets',2^5);
toc

Elapsed time is 0.844226 seconds.
Specifying the additional options reduces the computation time.
References

[1] Lundberg, Scott M., and S. Lee. "A Unified Approach to Interpreting Model Predictions." Advances in Neural Information Processing Systems 30 (2017): 4765–774.

[2] Lundberg, Scott M., G. Erion, H. Chen, et al. "From Local Explanations to Global Understanding with Explainable AI for Trees." Nature Machine Intelligence 2 (January 2020): 56–67.

[3] Aas, Kjersti, Martin Jullum, and Anders Løland. "Explaining Individual Predictions When Features Are Dependent: More Accurate Approximations to Shapley Values." Artificial Intelligence 298 (September 2021).

[4] Kumar, I. Elizabeth, Suresh Venkatasubramanian, Carlos Scheidegger, and Sorelle Friedler. "Problems with Shapley-Value-Based Explanations as Feature Importance Measures." Proceedings of the 37th International Conference on Machine Learning 119 (July 2020): 5491–500.
See Also shapley | fit | plot
Related Examples
• “Interpret Machine Learning Models” on page 27-2
• Discover Interpretability Features
• Model Interpretability in MATLAB
• Lowering Barriers to AI Adoption with AutoML and Interpretability
28 Incremental Learning • “Incremental Learning Overview” on page 28-2 • “Incremental Anomaly Detection Overview” on page 28-9 • “Configure Incremental Learning Model” on page 28-14 • “Configure Model for Incremental Anomaly Detection” on page 28-24 • “Implement Incremental Learning for Regression Using Succinct Workflow” on page 28-27 • “Implement Incremental Learning for Classification Using Succinct Workflow” on page 28-30 • “Implement Incremental Learning for Regression Using Flexible Workflow” on page 28-33 • “Implement Incremental Learning for Classification Using Flexible Workflow” on page 28-37 • “Initialize Incremental Learning Model from SVM Regression Model Trained in Regression Learner” on page 28-41 • “Initialize Incremental Learning Model from Logistic Regression Model Trained in Classification Learner” on page 28-48 • “Perform Conditional Training During Incremental Learning” on page 28-53 • “Perform Text Classification Incrementally” on page 28-57 • “Incremental Learning with Naive Bayes and Heterogeneous Data” on page 28-60 • “Monitor Equipment State of Health Using Drift-Aware Learning” on page 28-67 • “Monitor Equipment State of Health Using Drift-Aware Learning on the Cloud” on page 28-72
Incremental Learning Overview In this section... “What Is Incremental Learning?” on page 28-2 “Incremental Learning with MATLAB” on page 28-3
What Is Incremental Learning? Incremental learning, or online learning, is a branch of machine learning that involves processing incoming data from a data stream—continuously and in real time—possibly given little to no knowledge of the distribution of the predictor variables, sample size, aspects of the prediction or objective function (including adequate tuning parameter values), and whether the observations have labels. Another type of incremental learning involves training a model and detecting anomalies in an incoming data stream (see “Incremental Anomaly Detection Overview” on page 28-9). Incremental learning algorithms are flexible, efficient, and adaptive. The following characteristics distinguish incremental learning from traditional machine learning: • An incremental model is fit to data quickly and efficiently, which means it can adapt, in real time, to changes (or drifts) in the data distribution. • Because observation labels can be missing when corresponding predictor data is available, the algorithm must be able to generate predictions from the latest version of the model quickly, and defer training the model. • Little information might be known about the population before incremental learning starts. Therefore, the algorithm can be run with a cold start. For example, for classification problems, the class names might not be known until after the model processes observations. When enough information is known before learning begins (for example, you have good estimates of linear model coefficients), you can specify such information to provide the model with a warm start. • Because observations can arrive in a stream, the sample size is likely unknown and possibly large, which makes data storage inefficient or impossible. Therefore, the algorithm must process observations when they are available and before the system discards them. This incremental learning characteristic makes hyperparameter tuning difficult or impossible. In traditional machine learning, a batch of labeled data is available to perform cross-validation to estimate the generalization error and tune hyperparameters, infer the predictor variable distribution, and fit the model. However, the resulting model must be retrained from the beginning if underlying distributions drift or the model degrades. Although performing cross-validation to tune hyperparameters is difficult in an incremental learning environment, incremental learning methods are flexible because they can adapt to distribution drift in real time, with predictive accuracy approaching that of a traditionally trained model as the model trains more data. Suppose an incremental model is prepared to generate predictions and have its predictive performance measured. Given incoming chunks of observations, an incremental learning scheme processes data in real time and in any of the following ways, but usually in the specified order:
1 Evaluate model: Track the predictive performance of the model when true labels are available, either on the incoming data only, over a sliding window of observations, or over the entire history of the model used for incremental learning.
2 Detect drift: Check for structural breaks or distribution drift. For example, determine whether the distribution of any predictor variable has sufficiently changed.
3 Train model: Update the model by training it on the incoming observations, when true labels are available or when the current model has sufficiently degraded.
4 Generate predictions: Predict labels from the latest model.
This procedure is a special case of incremental learning, in which all incoming chunks are treated as test (holdout) sets. The procedure is called interleaved test-then-train or prequential evaluation [1]. If insufficient information exists for an incremental model to generate predictions, or you do not want to track the predictive performance of the model because it has not been trained enough, you can include an optional initial step to find adequate values for hyperparameters, for models that support one (estimation period), or an initial training period before model evaluation (metrics warm-up period). As an example of an incremental learning problem, consider a smart thermostat that automatically sets a temperature given the ambient temperature, relative humidity, time of day, and other measurements, and can learn the user's indoor temperature preferences. Suppose the manufacturer prepared the device by embedding a known model that describes the average person's preferences given the measurements. After installation, the device collects data every minute, and adjusts the temperature to its presets. The thermostat adjusts the embedded model, or retrains itself, based on the user's actions or inactions with the device. This cycle can continue indefinitely. If the thermostat has limited disk space to store historical data, it needs to retrain itself in real time. If the manufacturer did not prepare the device with a known model, the device retrains itself more often.
Incremental Learning with MATLAB

Statistics and Machine Learning Toolbox functionalities enable you to implement incremental learning for classification or regression. Like other Statistics and Machine Learning Toolbox machine learning functionalities, the entry point into incremental learning is an incremental learning object, which you pass to functions with data to implement incremental learning. Unlike other machine learning functions, data is not required to create an incremental learning object. However, the incremental learning object specifies how to process incoming data, such as when to fit the model, measure performance metrics, or perform both actions, in addition to the parametric form of the model and problem-specific options.

Incremental Learning Model Objects

This table contains the available entry-point model objects for incremental learning with their supported machine learning objective, model type, and information required to create the model object.

incrementalClassificationECOC
  Objective: Multiclass classification
  Model Type: Error-correcting output codes (ECOC) model with binary learners
  Required Information: Maximum number of classes expected in the data during incremental learning, or the names of all expected classes

incrementalClassificationKernel
  Objective: Binary classification
  Model Type: Linear support vector machine (SVM) and logistic regression with Gaussian kernels
  Required Information: None

incrementalClassificationLinear
  Objective: Binary classification
  Model Type: Linear SVM and logistic regression
  Required Information: None

incrementalClassificationNaiveBayes
  Objective: Multiclass classification
  Model Type: Naive Bayes with normal, multinomial, or multivariate multinomial predictor conditional distributions
  Required Information: Maximum number of classes expected in the data during incremental learning, or the names of all expected classes

incrementalRegressionKernel
  Objective: Regression
  Model Type: Least-squares and linear SVM regression with Gaussian kernels
  Required Information: None

incrementalRegressionLinear
  Objective: Regression
  Model Type: Least-squares and linear SVM regression
  Required Information: None
Properties of an incremental learning model object specify:
• Data characteristics, such as the number of predictor variables NumPredictors and their first and second moments
• Model characteristics, such as, for linear models, the learner type Learner, linear coefficients Beta, and intercept Bias
• Training options, such as, for linear models, the objective solver Solver and solver-specific hyperparameters such as the ridge penalty Lambda for standard and average stochastic gradient descent (SGD and ASGD)
• Model performance evaluation characteristics and options, such as whether the model is warm IsWarm, which performance metrics to track Metrics, and the latest values of the performance metrics

Unlike when working with other machine learning model objects, you can create an incremental learning model by directly calling the object and specifying property values of options using name-value arguments; you do not need to fit a model to data to create one. This feature is convenient when you have little information about the data or model before training it. Depending on your specifications, the software can enforce estimation and metrics warm-up periods, during which incremental fitting functions infer data characteristics and then train the model for performance evaluation. By default, for linear models, the software solves the objective function using the adaptive scale-invariant solver, which does not require tuning and is insensitive to the predictor variable scales [2].

Alternatively, you can convert a traditionally trained model to a model for incremental learning by using the incrementalLearner function. For example, incrementalLearner converts a trained linear classification model of type ClassificationLinear to an incrementalClassificationLinear object. This table lists the convertible models and their conversion functions.
• ClassificationECOC and CompactClassificationECOC: incrementalLearner converts the model to incrementalClassificationECOC
• ClassificationKernel: incrementalLearner converts the model to incrementalClassificationKernel
• ClassificationSVM and CompactClassificationSVM: incrementalLearner converts the model to incrementalClassificationLinear
• ClassificationLinear: incrementalLearner converts the model to incrementalClassificationLinear
• ClassificationNaiveBayes: incrementalLearner converts the model to incrementalClassificationNaiveBayes
• RegressionKernel: incrementalLearner converts the model to incrementalRegressionKernel
• RegressionSVM and CompactRegressionSVM: incrementalLearner converts the model to incrementalRegressionLinear
• RegressionLinear: incrementalLearner converts the model to incrementalRegressionLinear
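For example, a conversion might look like the following minimal sketch; the fisheriris data set and the fitcnb call are purely illustrative and are not part of the table above.

load fisheriris
Mdl = fitcnb(meas,species);              % traditionally trained naive Bayes classifier
IncrementalMdl = incrementalLearner(Mdl) % incrementalClassificationNaiveBayes object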
By default, the software considers converted models to be prepared for all aspects of incremental learning (converted models are warm). incrementalLearner carries over data characteristics (such as class names), fitted parameters, and options available for incremental learning from the traditionally trained model being converted. For example: • For naive Bayes classification, incrementalLearner carries over all class names in the data expected during incremental learning, and the fitted moments of the conditional predictor distributions (DistributionParameters). • For linear models, if the objective solver of the traditionally trained model is SGD, incrementalLearner sets the incremental learning solver to SGD. For more details, see the output argument description of each incrementalLearner function page. Incremental Learning Functions The incremental learning model object specifies all aspects of the incremental learning algorithm, from training and model evaluation preparation through training and model evaluation. To implement incremental learning, you pass the configured incremental learning model to an incremental fitting function or model evaluation function. You can find the list of supported incremental learning functions in the Object Functions section of each incremental learning model object page. Statistics and Machine Learning Toolbox incremental learning functions offer two workflows that are well suited for prequential learning. For simplicity, the following workflow descriptions assume that the model is prepared to evaluate the model performance (in other words, the model is warm). • Flexible workflow — When a data chunk is available: 1
Compute cumulative and window model performance metrics by passing the data and current model to the updateMetrics function. The data is treated as test (holdout) data because the model has not been trained on it yet. updateMetrics overwrites the model performance stored in the model with the new values.
2
Optionally detect distribution drift or whether the model has degraded.
3
Train the model by passing the incoming data chunk and current model to the fit function. The fit function uses the specified solver to fit the model to the incoming data chunk, and overwrites the current coefficients and bias with the new estimates.
The flexible workflow enables you to perform custom model and data quality assessments before deciding whether to train the model. All steps are optional, but call updateMetrics before fit when you plan to call both functions. • Succinct workflow — When a data chunk is available, supply the incoming chunk and a configured incremental model to the updateMetricsAndFit function. updateMetricsAndFit calls updateMetrics immediately followed by fit. The succinct workflow enables you to implement incremental learning with prequential evaluation easily when you plan to track the model performance and train the model on all incoming data chunks. Once you create an incremental model object and choose a workflow to use, write a loop that implements incremental learning: 1
Read a chunk of observations from a data stream, when the chunk is available.
2
Implement the flexible or succinct workflow. To perform incremental learning properly, overwrite the input model with the output model. For example (a complete loop that uses the succinct workflow appears after this list):
• Flexible workflow
  Mdl = updateMetrics(Mdl,X,Y);
  %
  % Insert optional code
  %
  Mdl = fit(Mdl,X,Y);
• Succinct workflow
  Mdl = updateMetricsAndFit(Mdl,X,Y);
The model tracks its performance on incoming data incrementally using metrics measured since the beginning of training (cumulative) and over a specified window of consecutive observations (window). However, you can optionally compute the model loss on the incoming chunk, and then pass the incoming chunk and current model to the loss function. loss returns the scalar loss; it does not adjust the model. Model configurations determine whether incremental learning functions train or evaluate model performance during each iteration. Configurations can change as the functions process data. For more details, see “Incremental Learning Periods” on page 28-6. 3
Optionally: • Generate predictions by passing the chunk and latest model to predict. • If the model was fit to data, compute the resubstitution loss by passing the chunk and latest model to loss. • For naive Bayes classification models, the logp function enables you to detect outliers in realtime. The function returns the log unconditional probability density of the predictor variables at each observation in the chunk.
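As a concrete illustration, a complete prequential loop using the succinct workflow might look like the following sketch. The simulated stream (Xstream and Ystream), the chunk size, and the tracked metric name are illustrative assumptions and not part of the toolbox interface; "ClassificationError" is the default metric row name for this classification model.

Mdl = incrementalClassificationLinear();            % default incremental model
numObsPerChunk = 50;                                % chunk size (illustrative)
numChunks = floor(size(Xstream,1)/numObsPerChunk);
ce = array2table(zeros(numChunks,2),"VariableNames",["Cumulative","Window"]);
for j = 1:numChunks
    idx = (j-1)*numObsPerChunk + (1:numObsPerChunk);            % next chunk
    Mdl = updateMetricsAndFit(Mdl,Xstream(idx,:),Ystream(idx)); % test-then-train
    ce{j,:} = Mdl.Metrics{"ClassificationError",:};             % cumulative and window metrics
end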
Incremental Learning Periods Given incoming chunks of data, the actions performed by incremental learning functions depend on the current configuration or state of the model. This figure shows the periods (consecutive groups of observations) during which incremental learning functions perform particular actions. 28-6
This table describes the actions performed by incremental learning functions during each period.
Period: Estimation
Associated Model Properties: EstimationPeriod (applies to linear classification, kernel classification, linear regression, and kernel regression models only)
Size (Number of Observations): n1
Actions:
When required, fitting functions choose values for hyperparameters based on estimation period observations. Actions can include the following: • Estimate the predictor moments Mu and Sigma for data standardization (applies to linear and kernel models only). • Adjust the learning rate LearnRate for SGD solvers according to the learning rate schedule LearnRateSchedule. • Estimate the SVM regression parameter ε Epsilon. • Store information buffers required for estimation. • Update corresponding properties at the end of the period. For more details, see the Algorithms section of each object and incrementalLearner function page.
Period: Metrics Warm-up
Associated Model Properties: MetricsWarmupPeriod
Size (Number of Observations): n2 – n1
Actions:
When the property IsWarm is false, fitting functions perform the following actions: • Fit the model to the incoming chunk of data. • Update corresponding model properties, such as Beta or DistributionParameters, after fitting the model. • At the end of the period, the model is warm (the IsWarm property becomes true).
Period: Performance Evaluation j
Associated Model Properties: Metrics and MetricsWindowSize
Size (Number of Observations): m
Actions:
• At the start of Performance Evaluation Period 1, functions begin to track cumulative Cumulative and window Window metrics. Window is a vector of NaNs throughout this period. • Functions overwrite Cumulative metrics with the updated cumulative metric at each iteration. At the end of each Performance Evaluation Period, functions compute and overwrite Window metrics based on the last m observations. • Functions store information buffers required for computing model performance.
References [1] Bifet, Albert, Ricard Gavaldá, Geoffrey Holmes, and Bernhard Pfahringer. Machine Learning for Data Streams with Practical Example in MOA. Cambridge, MA: The MIT Press, 2007. [2] Kempka, Michał, Wojciech Kotłowski, and Manfred K. Warmuth. "Adaptive Scale-Invariant Online Algorithms for Learning Linear Models." Preprint, submitted February 10, 2019. https:// arxiv.org/abs/1902.07528.
See Also Objects incrementalClassificationLinear | incrementalRegressionLinear | incrementalClassificationNaiveBayes
More About
• “Configure Incremental Learning Model” on page 28-14
• “Implement Incremental Learning for Classification Using Succinct Workflow” on page 28-30
• “Implement Incremental Learning for Classification Using Flexible Workflow” on page 28-37
Incremental Anomaly Detection Overview In this section... “What Is Incremental Anomaly Detection?” on page 28-9 “Incremental Anomaly Detection with MATLAB” on page 28-9
What Is Incremental Anomaly Detection? Incremental anomaly detection is a branch of machine learning that involves processing incoming data from a data stream—continuously and in real time—and computing anomaly scores, possibly given little to no knowledge of the distribution of the predictor variables or sample size. Any observations above a score threshold are detected as anomalies. Incremental learning algorithms are flexible, efficient, and adaptive. The following characteristics distinguish incremental learning from traditional machine learning: • An incremental model is fit to data quickly and efficiently, which means it can adapt to changes (or drifts) in the data distribution, in real time. • Little information might be known about the population before incremental learning starts. Therefore, the algorithm can run with a cold start. For example, the anomaly contamination fraction and score threshold might not be known until after the model processes observations. When enough information is known before learning begins, you can specify such information to provide the model with a warm start. • Because observations can arrive in a stream, the sample size is likely unknown and possibly large, which makes data storage inefficient or impossible. Therefore, the algorithm must process observations when they are available and before the system discards them. This incremental learning characteristic makes hyperparameter tuning difficult or impossible. Suppose an incremental model is prepared to compute scores and detect anomalies. Given incoming chunks of observations, the incremental learning algorithm processes data in real time and does the following: 1
Detect anomalies — Identify observations with scores above the current score threshold as anomalies.
2
Train model — Update the model by training it on the incoming observations, computing scores, and updating the score threshold.
If insufficient information exists for an incremental model to generate predictions, or you do not want to track the predictive performance of the model because it has not been trained enough, you can include an optional initial step to find adequate values for hyperparameters (estimation period), or an initial training period before returning scores and identifying anomalies (score warm-up period).
Incremental Anomaly Detection with MATLAB Statistics and Machine Learning Toolbox functionalities enable you to implement incremental anomaly detection on streaming data. As with other machine learning functionalities, the entry point into incremental anomaly detection is an incremental learning object, which you pass to functions with data to implement incremental anomaly detection. Unlike other machine learning functions, incrementalRobustRandomCutForest and incrementalOneClassSVM do not require data to create an incremental learning object. However, the incremental learning object specifies how to 28-9
process incoming data, such as whether to standardize the predictor data and when to compute scores and identify anomalies. The object also specifies the parametric form of the model and problem-specific options.

Incremental Learning Model Objects

This table describes the available entry-point model objects for incremental anomaly detection.

incrementalRobustRandomCutForest
  Model Type: Robust random cut forest
  Characteristics: Supports categorical predictors

incrementalOneClassSVM
  Model Type: One-class support vector machine (SVM)
  Characteristics: Does not support categorical predictors
Properties of an incremental learning model object specify: • Data characteristics, such as the number of predictor variables NumPredictors and their first and second moments • Model characteristics, such as the number of trees and the number of training observations in each tree (for robust random cut forest models) • Training options, such as the objective solver Solver and solver-specific hyperparameters including the ridge penalty Lambda for standard and average stochastic gradient descent (for oneclass SVM models) Unlike when working with other machine learning model objects, you can create an incremental learning model by calling the object directly and specifying property values using name-value arguments. You do not need to fit a model to data to create an incremental learning model. This feature is convenient when you have little information about the data or model before training it. Depending on your specifications, the software can enforce estimation and score warm-up periods, during which incremental fitting functions infer data characteristics and then train the model for anomaly detection. By default, for one-class SVM models, the software solves the objective function using the adaptive scale-invariant solver, which does not require tuning and is insensitive to the predictor variable scales [3]. Alternatively, you can convert a traditionally trained model to a model for incremental learning by using the incrementalLearner function. For example, incrementalLearner converts a trained robust random cut forest model of type RobustRandomCutForest to an incrementalRobustRandomCutForest object. This table lists the convertible models and their conversion functions. Traditionally Trained Convertible Model Object
Conversion Function
Model Object for Incremental Anomaly Detection
RobustRandomCutForest
incrementalLearner
incrementalRobustRandomC utForest
OneClassSVM
incrementalLearner
incrementalOneClassSVM
By default, the software considers converted models to be prepared for all aspects of incremental learning (converted models are warm). The incrementalLearner function transfers data characteristics (such as predictor names), the score threshold, and options available for incremental anomaly detection from the traditionally trained model being converted. For example: 28-10
• For robust random cut forest models, incrementalLearner transfers all predictor names in the data expected during incremental learning, as well as the list of categorical predictors. • For one-class SVM models, if the objective solver of the traditionally trained model is SGD, incrementalLearner sets the incremental learning solver to SGD. For more details, see the output argument descriptions on each incrementalLearner function page. Incremental Anomaly Detection Functions The incremental learning model object specifies all aspects of the incremental learning algorithm, from training and anomaly detection preparation through training and anomaly detection. To implement incremental anomaly detection, you pass the configured incremental learning model to an incremental fitting function (fit) or anomaly detection function (isanomaly). You can find the list of supported incremental learning functions in the Object Functions section of each incremental learning model object page. Statistics and Machine Learning Toolbox incremental learning functions offer a workflow that is well suited for anomaly detection. For simplicity, the following workflow description assumes that the model is prepared to evaluate the model performance (in other words, the model is warm). After you create an incremental learning model object, write a loop that implements incremental learning: 1
Read a chunk of observations from a data stream, when the chunk is available.
2
Overwrite the input model with the output model to perform incremental learning properly. For example: [isanom,scores] = isanomaly(IncrementalMdl,X); IncrementalMdl = fit(IncrementalMdl,X);
isanomaly calculates scores and identifies observations in the incoming data chunk with scores higher than the current score threshold. The fit function trains the model on the incoming data chunk and updates the score threshold. You can specify to check for anomalies and update the model in either order. Incremental Learning Periods Given incoming chunks of data, the actions performed by incremental learning functions depend on the current configuration or state of the model. This table describes the actions performed by incremental learning functions during each period.
Period: Estimation
Associated Model Properties: EstimationPeriod
Actions:
When required, fitting functions choose values for hyperparameters based on estimation period observations. Actions can include the following: • Estimate the predictor means Mu and standard deviations Sigma for data standardization. • Adjust the learning rate LearnRate for SGD solvers according to the learning rate schedule LearnRateSchedule (applies to one-class SVM models only). • Store information buffers required for estimation. • Update corresponding properties at the end of the period. For more details, see the Algorithms section of each object and incrementalLearner function page.
Period: Score Warm-up
Associated Model Properties: ScoreWarmupPeriod
Actions:
When the property IsWarm is false, fitting functions perform the following actions: • Fit the model to the incoming chunk of data. • Update corresponding model properties and the score threshold after fitting the model. • Return all scores as NaN and anomaly values as false. • At the end of the period, the model is warm (the IsWarm property becomes true).
28-12
References [1] Bartos, Matthew D., A. Mullapudi, and S. C. Troutman. "rrcf: Implementation of the Robust Random Cut Forest Algorithm for Anomaly Detection on Streams." Journal of Open Source Software 4, no. 35 (2019): 1336. [2] Guha, Sudipto, N. Mishra, G. Roy, and O. Schrijvers. "Robust Random Cut Forest Based Anomaly Detection on Streams," Proceedings of The 33rd International Conference on Machine Learning 48 (June 2016): 2712–21. [3] Kempka, Michał, Wojciech Kotłowski, and Manfred K. Warmuth. "Adaptive Scale-Invariant Online Algorithms for Learning Linear Models." Preprint, submitted February 10, 2019. https:// arxiv.org/abs/1902.07528.
See Also Objects incrementalRobustRandomCutForest | incrementalOneClassSVM
More About
• “Configure Model for Incremental Anomaly Detection” on page 28-24
• “Unsupervised Anomaly Detection” on page 17-91
Configure Incremental Learning Model An incremental learning model object fully specifies how functions implement incremental fitting and model performance evaluation. To configure (or prepare) an incremental learning model, create one by calling the object directly, or by converting a traditionally trained model to one of the objects. The following table lists the available model types, model objects for incremental learning, and conversion functions. Objective
Model Type
Model Object for Conversion Function Incremental Learning
Binary classification
Linear support vector incrementalClassif incrementalLearner machine (SVM) and icationKernel converts a kernel logistic regression with classification model Gaussian kernels (ClassificationKern el). Linear SVM and logistic incrementalClassif incrementalLearner regression icationLinear converts a linear SVM model (ClassificationSVM or CompactClassificat ionSVM). incrementalLearner converts a linear classification model (ClassificationLine ar).
Multiclass classification Error-correcting output incrementalClassif incrementalLearner codes (ECOC) model icationECOC converts an ECOC with binary learners model (ClassificationECOC or CompactClassificat ionECOC) with binary learners. Naive Bayes with incrementalClassif incrementalLearner normal, multinomial, or icationNaiveBayes converts a full naive multivariate Bayes classification multinomial predictor model conditional distributions (ClassificationNaiv eBayes). Regression
28-14
Least-squares and linear SVM regression with Gaussian kernels
incrementalRegress incrementalLearner ionKernel converts a kernel regression model (RegressionKernel).
Objective
Model Type
Model Object for Conversion Function Incremental Learning
Least-squares and linear SVM regression
incrementalRegress incrementalLearner ionLinear converts a linear SVM regression model (RegressionSVM or CompactRegressionS VM). incrementalLearner converts a linear regression model (RegressionLinear).
The approach you choose to create an incremental model depends on the information you have and your preferences. • Call object: Create an incremental model to your specifications by calling the object directly. This approach is flexible, enabling you to specify most options to suit your preferences, and the resulting model provides reasonable default values. For more details, see “Call Object Directly” on page 28-16. • Convert model: Convert a traditionally trained model to an incremental learner to initialize a model for incremental learning by using the incrementalLearner function. The function passes information that the traditionally trained model learned from the data. To convert a traditionally trained model, you must have a set of labeled data to which you can fit a model. When you use incrementalLearner, you can specify all performance evaluation options and only those training, model, and data options that are unknown during conversion. For more details, see “Convert Traditionally Trained Model” on page 28-20. Regardless of the approach you use, consider these configurations: • Model performance evaluation settings, such as the performance metrics to measure. For details, see “Model Options and Data Properties” on page 28-16. • For ECOC models: • Binary learners • Coding design matrix for the binary learners. • For kernel models: • Model type, such as SVM • Objective function solver, such as standard stochastic gradient descent (SGD) • Hyperparameters for random feature expansion, such as the kernel scale parameter and number of dimensions of expanded space • For linear models: • Model type, such as SVM • Coefficient initial values • Objective function solver, such as standard stochastic gradient descent (SGD) • Solver hyperparameter values, such as the learning rate of SGD solvers 28-15
• For naive Bayes models, the conditional distribution of the predictor variables. In a data set, you can specify that real-valued predictors are normally distributed and that categorical predictors (where levels are numeric scalars) are multivariate multinomial. For a bag-of-tokens model, where each predictor is a count, you can specify that all predictors are jointly multinomial.
Call Object Directly Unlike when working with other machine learning model objects, you can create an incremental learning model by calling the corresponding object directly, with little knowledge about the data. For example, the following code creates a default incremental model for linear regression and a naive Bayes classification model for a data stream containing 5 classes. MdlLR = incrementalRegressionLinear(); MdlNB = incrementalClassificationNaiveBayes(MaxNumClasses=5)
• For linear and kernel models, the only information required to create a model directly is the machine learning problem, either classification or regression. An estimation period might also be required, depending on your specifications. • For naive Bayes and ECOC classification models, you must specify the maximum number of classes or all class names expected in the data during incremental learning. If you have information about the data to specify, or you want to configure model options or performance evaluation settings, use name-value arguments when you call the object. (All model properties are read-only; you cannot adjust them using dot notation.) For example, the following pseudocode creates an incremental logistic regression model for binary classification, initializes the linear model coefficients Beta and bias Bias (obtained from prior knowledge of the problem), and sets the performance metrics warm-up period to 500 observations. Mdl = incrementalClassificationLinear(Learner="logistic", ... Beta=beta,Bias=bias,MetricsWarmupPeriod=500);
The following tables briefly describe notable options for the major aspects of incremental learning. For more details on all options, see the Properties section of each incremental model object page. Model Options and Data Properties This table contains notable model options and data characteristics. Model Type
Model Options and Data Properties
Description
Classification
ClassNames
For classification, the expected class names in the observation labels
ECOC classification
BinaryLearners*
Binary learners
CodingMatrix*
Class assignment codes
CodingName*
Coding design name
KernelScale
Kernel scale parameter that the software uses for random feature expansion
Learner
Model type, such as linear SVM, logistic regression, or least-squares regression
Kernel classification or regression
Model Type
Model Options and Data Properties
Description
NumExpansionDimens Number of dimensions of expanded space ions Linear classification or regression
Naive Bayes classification
Beta
Linear coefficients that also serve as initial values for incremental fitting
Bias
Model intercept that also serve as an initial value for incremental fitting
Learner
Model type, such as linear SVM, logistic regression, or least-squares regression
Cost
Misclassification cost matrix
*You can specify the BinaryLearners property by using the Learners name-value argument, and specify the CodingMatrix and CodingName properties by using the Coding name-value argument. Set the other properties by using name-value argument syntax with the arguments of the same name when you call the object. For example, incrementalClassificationKernel(Learner="logistic") sets the Learner property to "logistic". Training and Solver Options and Properties This table contains notable training and solver options and properties. Model Type
Training and Solver Options and Properties
Description
Kernel classification or regression
EstimationPeriod
Pretraining estimation period
Solver
Objective function optimization algorithm
Standardize
Flag to standardize predictor data
Mu**
Predictor variable means
Sigma**
Predictor variable standard deviations
EstimationPeriod
Pretraining estimation period
Solver
Objective function optimization algorithm
Standardize
Flag to standardize predictor data
Lambda
Ridge penalty, a model hyperparameter that requires tuning for SGD optimization
BatchSize
Mini-batch size, an SGD hyperparameter
LearnRate
Learning rate, an SGD hyperparameter
Linear classification or regression
Model Type
Naive Bayes classification
Training and Solver Options and Properties
Description
Mu**
Predictor variable means
Sigma**
Predictor variable standard deviations
DistributionParameters**
Learned distribution parameters. • For each predictor with conditionally normal distributions given a class, the fitted, weighted mean and standard deviation. • For conditionally joint multinomial predictors given a class, relative frequencies of the levels the predictors represent. • For each conditionally multivariate multinomial given a class, a vector of relative frequencies of the levels of a predictor.
**You cannot specify the Mu, Sigma, and DistributionParameters properties, whereas you can set the other properties by using name-value argument syntax when you call the object. • Mu and Sigma (linear and kernel models) — When you set Standardize=true and specify a positive estimation period, and the properties are empty, incremental fitting functions estimate means and standard deviations using the estimation period observations. For more details, see “Standardize Data” on page 35-3775. • DistributionParameters (naive Bayes classification models) — The property must be fitted to data, by fit, or updateMetricsAndFit. For linear classification and regression models: • The estimation period, specified by the number of observations in EstimationPeriod, occurs before training begins (see Incremental Learning Periods on page 28-6). During the estimation period, the incremental fitting function fit or updateMetricsAndFit computes quantities required for training when they are unknown. For example, if you set Standardize=true, incremental learning functions require predictor means and standard deviations to standardize the predictor data. Consequently, the incremental model requires a positive estimation period (the default is 1000). • The default solver is the adaptive scale-invariant solver "scale-invariant" [2], which is hyperparameter free and insensitive to the predictor variable scales; therefore, predictor data standardization is not required. You can specify standard or average SGD instead, "sgd" or "asgd". However, SGD is sensitive to predictor variable scales and requires hyperparameter tuning, which can be difficult or impossible to do during incremental learning. If you plan to use an SGD solver, complete these steps:
1 Obtain labeled data.
2 Traditionally train a linear classification or regression model by calling fitclinear or fitrlinear, respectively. Specify the SGD solver you plan to use for incremental learning, cross-validate to determine an appropriate set of hyperparameters, and standardize the predictor data.
3 Train the model on the entire sample using the specified hyperparameter set.
4 Convert the resulting model to an incremental learner by using incrementalLearner, as in the sketch that follows.
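For a linear regression model, these steps might look like the following minimal sketch. The predictor data X and response Y are assumed to be available, and the hyperparameter values stand in for values chosen by cross-validation.

Mdl = fitrlinear(X,Y,"Learner","leastsquares","Solver","sgd", ...
    "Lambda",1e-4,"BatchSize",10,"LearnRate",0.01);  % placeholder hyperparameters from cross-validation
IncrementalMdl = incrementalLearner(Mdl);            % SGD solver and hyperparameters carry over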
Performance Evaluation Options and Properties

Performance evaluation properties and options enable you to configure how and when model performance is measured by the incremental learning function updateMetrics or updateMetricsAndFit. Regardless of the options you choose, first familiarize yourself with the incremental learning periods on page 28-6. This table contains all performance evaluation options and properties.

Metrics: Specify the list of performance metrics or loss functions to measure incrementally by using the Metrics name-value argument. The Metrics property stores a table of tracked cumulative and window metrics.
MetricsWarmupPeriod: Number of observations to which the incremental model must be fit before it tracks performance metrics.
MetricsWindowSize: Number of observations to use to compute window performance metrics.
IsWarm***: Flag indicating whether the model is warm (measures performance metrics).
***You cannot specify the IsWarm property, whereas you can set the other properties by using name-value argument syntax when you call the object.

The metrics specified by the Metrics name-value argument form a table stored in the Metrics property of the model. For example, if you specify Metrics=["Metric1","Metric2"] when you create an incremental model Mdl, the Metrics property is

>> Mdl.Metrics

ans =
  2×2 table
               Cumulative    Window
               __________    ______
    Metric1       NaN          NaN
    Metric2       NaN          NaN
Specify a positive metrics warm-up period when you believe the model is of low quality and needs to be trained before the function updateMetrics or updateMetricsAndFit tracks performance
metrics in the Metrics property. In this case, the IsWarm property is false, and you must pass the incoming data and model to the incremental fitting function fit or updateMetricsAndFit. When the incremental fitting function processes enough data to satisfy the estimation period (for linear and kernel models) and the metrics warm-up period, the IsWarm property becomes true, and you can measure the model performance on incoming data and optionally train the model. For naive Bayes and ECOC classification models, incremental fitting functions must additionally fit the model to all expected classes to become warm. When the model is warm, updateMetrics or updateMetricsAndFit tracks all specified metrics cumulatively (from the start of the evaluation) and within a window of observations specified by the MetricsWindowSize property. Cumulative metrics reflect the model performance over the entire incremental learning history; after Performance Evaluation Period 1 starts, cumulative metrics are independent of the evaluation period. Window metrics reflect the model performance only over the specified window size for each performance evaluation period.
Convert Traditionally Trained Model incrementalLearner enables you to initialize an incremental model using information learned from a traditionally trained model. The converted model can generate predictions and it is warm, which means that incremental learning functions can measure model performance metrics from the start of the data stream. In other words, estimation and performance metrics warm-up periods are not required for incremental learning. To convert a traditionally trained model to an incremental learner, pass the model and any options specified by name-value arguments to incrementalLearner. For example, the following pseudocode initializes an incremental classification model by using all information that a linear SVM model for binary classification has learned from a batch of data. Mdl = fitcsvm(X,Y); IncrementalMdl = incrementalLearner(Mdl,Name=Value);
IncrementalMdl is an incrementalClassificationLinear model object prepared for incremental binary classification. Ease of incremental model creation and initialization is offset by decreased flexibility. The software assumes that fitted parameters, hyperparameter values, and data characteristics learned during traditional training are appropriate for incremental learning. Therefore, you cannot set corresponding learned or tuned options when you call incrementalLearner. This table lists notable read-only properties of IncrementalMdl that the incrementalLearner function transfers from Mdl or infers from other values. For more details, see the output argument description of each incrementalLearner function page.
Model Type
Property
Description
All
NumPredictors
Number of predictor variables. For models that dummy-code categorical predictor variables, NumPredictors is numel(Mdl.ExpandedPredictorNames), and predictor variables expected during incremental learning correspond to the names. For more details, see “Dummy Variables” on page 2-13.
Model Type
Property
Description
Classification
ClassNames
All class labels expected during incremental learning
Prior
Prior class distribution
ScoreTransform
A function to apply to classification scores. For example, if you configure an SVM model to compute posterior class probabilities, ScoreTransform (containing the score-toposterior-probability function learned from the data) is transferred.
Epsilon
For an SVM learner, half the width of the epsilon-insensitive band
ResponseTransform
A function to apply to predicted responses
BinaryLearners
Trained binary learners, a cell array of model objects
CodingMatrix
Class assignment codes for the binary learners
CodingName
Coding design name
KernelScale
Kernel scale parameter
Learner
Linear model type
Mu
Predictor variable means
Regression
ECOC classification
Kernel classification or regression
NumExpansionDimensi Number of dimensions of expanded space, a ons positive integer Linear classification or regression
Sigma
Predictor variable standard deviations
Beta
Linear model coefficients
Bias
Model intercept
Learner
Linear model type
Mu
For an SVM model object, the predictor variable means
Sigma
For an SVM model object, the predictor variable standard deviations
Model Type
Property
Description
Naive Bayes classification
DistributionNames
Conditional distribution of the predictor variables given the class, having either of the following values: • A NumPredictors length cell vector with entries "normal", when the corresponding predictor is normal, or "mvmn", when the corresponding predictor is multivariate multinomial. • "mn", when all predictor variables compose a multinomial distribution. If you convert a naive Bayes classification model containing at least one predictor with a kernel distribution, incrementalLearner issues an error.
DistributionParamet Fitted distribution parameters of each ers conditional predictor distribution given each class, a NumPredictors-by-K cell matrix. CategoricalPredicto Numeric vector of indices of categorical rs predictors CategoricalLevels
Multivariate multinomial predictor levels, a cell vector of length NumPredictors
Note:
• The NumTrainingObservations property of IncrementalMdl does not include the observations used to train Mdl. It only includes the observations used for incremental learning when you call fit or updateMetricsAndFit.
• If you specify Standardize=true when you train Mdl, IncrementalMdl is configured to standardize predictors during incremental learning by default.
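As a hedged illustration of the NumPredictors behavior described above, the following sketch assumes a table Tbl with a binary response Y and at least one categorical predictor; the variable names are hypothetical:

Mdl = fitcsvm(Tbl,"Y");                    % traditional training dummy-codes categorical predictors
numel(Mdl.ExpandedPredictorNames)          % expanded (dummy-coded) predictor count
IncrementalMdl = incrementalLearner(Mdl);
IncrementalMdl.NumPredictors               % matches the expanded count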
The following conditions apply when you convert a linear classification or regression model (ClassificationLinear and RegressionLinear, respectively):

• Incremental fitting functions support ridge (L2) regularization only.
• Incremental fitting functions support the specification of only one regularization value. Therefore, if you specify a regularization path (vector of regularization values) when you call fitclinear or fitrlinear, choose the model associated with one penalty by passing it to selectModels (see the sketch after this list).
• If you solve the objective function by using standard or average SGD ("sgd" or "asgd" for the Solver name-value argument), these conditions apply when you call incrementalLearner:
  • incrementalLearner transfers the solver used to optimize Mdl to IncrementalMdl.
  • You can specify the adaptive scale-invariant solver "scale-invariant" instead, but you cannot specify a different SGD solver.
  • If you do not specify the adaptive scale-invariant solver, incrementalLearner transfers model and solver hyperparameter values to the incremental model object, such as the learning rate LearnRate, mini-batch size BatchSize, and ridge penalty Lambda. You cannot modify the transferred properties.
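A minimal sketch of the regularization-path case, assuming predictor and response data X and Y; the three Lambda values and the choice of the second model are hypothetical:

Mdl = fitclinear(X,Y,Lambda=[1e-4 1e-2 1]);   % path of three regularization values
MdlOne = selectModels(Mdl,2);                 % keep only the model for Lambda = 1e-2
IncrementalMdl = incrementalLearner(MdlOne);  % convert the single selected model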
Call Object After Training Model

If you require more flexibility when you create an incremental model, you can call the object directly on page 28-16 and initialize the model by individually setting learned information using name-value arguments. The following pseudocode shows two examples:

• Initialize an incremental classification model from the coefficients and class names learned by fitting a linear SVM model for binary classification to a batch of data Xc and Yc.

Mdl = fitcsvm(Xc,Yc);
IncrementalMdl = incrementalClassificationLinear( ...
    Beta=Mdl.Beta,Bias=Mdl.Bias,ClassNames=Mdl.ClassNames);
• Initialize an incremental regression model from the coefficients learned by fitting a linear model to a batch of data Xr and Yr.

Mdl = fitlm(Xr,Yr);
bias = Mdl.Coefficients.Estimate(1);
beta = Mdl.Coefficients.Estimate(2:end);
IncrementalMdl = incrementalRegressionLinear( ...
    Learner="leastsquares",Bias=bias,Beta=beta);
References

[1] Bifet, Albert, Ricard Gavaldá, Geoffrey Holmes, and Bernhard Pfahringer. Machine Learning for Data Streams with Practical Example in MOA. Cambridge, MA: The MIT Press, 2007.

[2] Kempka, Michał, Wojciech Kotłowski, and Manfred K. Warmuth. "Adaptive Scale-Invariant Online Algorithms for Learning Linear Models." Preprint, submitted February 10, 2019. https://arxiv.org/abs/1902.07528.
See Also

Objects
incrementalRobustRandomCutForest | incrementalOneClassSVM

More About

• “Incremental Anomaly Detection Overview” on page 28-9
• “Unsupervised Anomaly Detection” on page 17-91
Configure Model for Incremental Anomaly Detection

An incremental learning model object fully specifies how functions implement incremental fitting and anomaly detection. To configure (or prepare) an incremental learning model, create one by calling the object directly, or by converting a traditionally trained model to an incremental learner model object. The following list gives the available model types, model objects for incremental anomaly detection, and conversion functions.

• Robust random cut forest: model object incrementalRobustRandomCutForest. incrementalLearner converts a robust random cut forest model (RobustRandomCutForest).
• One-class support vector machine (SVM): model object incrementalOneClassSVM. incrementalLearner converts a one-class SVM model (OneClassSVM).
The approach you use to create an incremental model depends on the information you have and your preferences.

• Call object — Create an incremental model to your specifications by calling the object directly. This approach is flexible, enabling you to specify most options to suit your preferences, and the resulting model provides reasonable default values. For more details, see “Call Object Directly” on page 28-24.
• Convert model — Convert a traditionally trained model to an incremental learner to initialize a model for incremental learning by using the incrementalLearner function. The function passes information that the traditionally trained model learned from the data. When you use incrementalLearner, you can specify anomaly detection options and only those training, model, and data options that are unknown during conversion. For more details, see “Convert Traditionally Trained Model” on page 28-25.
Call Object Directly

Unlike when working with other machine learning model objects, you can create an incremental anomaly detection model by calling the corresponding object directly, with little knowledge about the data. For example, the following code creates a default incremental model for anomaly detection using the robust random cut forest algorithm, and a one-class SVM incremental anomaly detection model for a data stream containing 5 predictors.

MdlRRCF = incrementalRobustRandomCutForest;
MdlOCSVM = incrementalOneClassSVM(NumPredictors=5);

If you have information about the data to specify, or you want to configure model options or anomaly detection settings, use name-value argument syntax when you call the object. (All model properties are read-only; you cannot modify them using dot notation.) For example, the following pseudocode creates a robust random cut forest model for incremental anomaly detection, specifies that the first, second, and fourth predictors are categorical, and sets the score warm-up period to 500 observations.
Mdl = incrementalRobustRandomCutForest(CategoricalPredictors=[1 2 4], ... ScoreWarmupPeriod=500);
For more details on all options, see the Properties section of each incremental model object page.
Convert Traditionally Trained Model

incrementalLearner enables you to initialize an incremental anomaly detection model using information learned from a traditionally trained model. The converted model can calculate scores and identify anomalies. The converted model is also warm, which means estimation and performance metrics warm-up periods are not required for incremental learning.

To convert a traditionally trained model to an incremental learner, pass the model and any options specified by name-value arguments to incrementalLearner. For example, the following pseudocode initializes an incremental one-class SVM model by using all information that a one-class SVM model for anomaly detection learned from a batch of data.

Mdl = ocsvm(X);
IncrementalMdl = incrementalLearner(Mdl,Name=Value);
IncrementalMdl is an incremental learner model object associated with the machine learning objective. Ease of incremental model creation and initialization is offset by decreased flexibility. The software assumes that fitted parameters, hyperparameter values, and data characteristics learned during traditional training are appropriate for incremental learning. Therefore, you cannot set corresponding learned or tuned options when you call incrementalLearner.

This table lists notable read-only properties of IncrementalMdl that the incrementalLearner function transfers from Mdl or infers from other values. For more details, see the output argument descriptions on each incrementalLearner function page.

All model types:
• ContaminationFraction: Fraction of anomalies in training data
• Mu: Predictor variable means
• PredictorNames: Predictor variable names
• ScoreThreshold: Threshold score for anomalies
• Sigma: Predictor variable standard deviations

Robust random cut forest:
• CategoricalPredictors: Indices of categorical predictors
• NumLearners: Number of robust random cut trees
• NumObservationsPerLearner: Number of observations for each robust random cut tree

One-class SVM:
• KernelScale: Kernel scale parameter for random feature expansion
• Lambda: Ridge (L2) regularization term strength
• NumExpansionDimensions: Number of dimensions of the expanded space
Note: If you specify StandardizeData=true when you train Mdl, IncrementalMdl is configured to standardize predictors during incremental learning by default.

The following conditions apply to the one-class SVM model only:

• The incremental fitting function supports ridge (L2) regularization only.
• If you solve the objective function by using standard or average SGD ("sgd" or "asgd" for the Solver name-value argument), these conditions apply when you call incrementalLearner:
  • incrementalLearner transfers the solver used to optimize Mdl to IncrementalMdl.
  • You can specify the adaptive scale-invariant solver "scale-invariant" instead, but you cannot specify a different SGD solver.
  • If you do not specify the adaptive scale-invariant solver, incrementalLearner transfers model and solver hyperparameter values to the incremental model object, such as the learning rate LearnRate, mini-batch size BatchSize, and ridge penalty Lambda. You cannot modify the transferred properties.
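As a hedged sketch of the conversion described above, assuming training data X and an illustrative contamination fraction (both hypothetical), you can inspect some of the transferred properties after conversion:

Mdl = ocsvm(X,ContaminationFraction=0.05);   % traditionally trained one-class SVM
IncrementalMdl = incrementalLearner(Mdl);
IncrementalMdl.ScoreThreshold                % anomaly score threshold inferred from training
IncrementalMdl.ContaminationFraction         % expected fraction of anomalies in the stream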
References

[1] Bifet, Albert, Ricard Gavaldá, Geoffrey Holmes, and Bernhard Pfahringer. Machine Learning for Data Streams with Practical Example in MOA. Cambridge, MA: The MIT Press, 2007.

[2] Kempka, Michał, Wojciech Kotłowski, and Manfred K. Warmuth. "Adaptive Scale-Invariant Online Algorithms for Learning Linear Models." Preprint, submitted February 10, 2019. https://arxiv.org/abs/1902.07528.
See Also

Objects
incrementalRobustRandomCutForest | incrementalOneClassSVM

More About

• “Incremental Anomaly Detection Overview” on page 28-9
• “Unsupervised Anomaly Detection” on page 17-91
Implement Incremental Learning for Regression Using Succinct Workflow

This example shows how to use the succinct workflow to implement incremental learning for linear regression with prequential evaluation. Specifically, this example does the following:

1. Create a default incremental learning model for linear regression.
2. Simulate a data stream using a for loop, which feeds small chunks of observations to the incremental learning algorithm.
3. For each chunk, use updateMetricsAndFit to measure the model performance given the incoming data, and then fit the model to that data.
Create Default Model Object

Create a default incremental learning model for linear regression.

Mdl = incrementalRegressionLinear()

Mdl = 
  incrementalRegressionLinear

               IsWarm: 0
              Metrics: [1x2 table]
    ResponseTransform: 'none'
                 Beta: [0x1 double]
                 Bias: 0
              Learner: 'svm'

Mdl.EstimationPeriod

ans = 1000

Mdl is an incrementalRegressionLinear model object. All its properties are read-only. Mdl must be fit to data before you can use it to perform any other operations. The software sets the estimation period to 1000 because half the width of the epsilon-insensitive band Epsilon is unknown. You can set Epsilon to a positive floating-point scalar by using the 'Epsilon' name-value pair argument. This action results in a default estimation period of 0.

Load Data

Load the robot arm data set.

load robotarm

For details on the data set, enter Description at the command line.

Implement Incremental Learning

Use the succinct workflow to update model performance metrics and fit the incremental model to the training data by calling the updateMetricsAndFit function. At each iteration:
• Process 50 observations to simulate a data stream.
• Overwrite the previous incremental model with a new one fitted to the incoming observations.
• Store the cumulative metrics, window metrics, and the first coefficient β1 to see how they evolve during incremental learning.

% Preallocation
n = numel(ytrain);
numObsPerChunk = 50;
nchunk = floor(n/numObsPerChunk);
ei = array2table(zeros(nchunk,2),'VariableNames',["Cumulative" "Window"]);
beta1 = zeros(nchunk,1);

% Incremental fitting
for j = 1:nchunk
    ibegin = min(n,numObsPerChunk*(j-1) + 1);
    iend = min(n,numObsPerChunk*j);
    idx = ibegin:iend;
    Mdl = updateMetricsAndFit(Mdl,Xtrain(idx,:),ytrain(idx));
    ei{j,:} = Mdl.Metrics{"EpsilonInsensitiveLoss",:};
    beta1(j + 1) = Mdl.Beta(1);
end

Mdl is an incrementalRegressionLinear model object trained on all the data in the stream. During incremental learning and after the model is warmed up, updateMetricsAndFit checks the performance of the model on the incoming observations, and then fits the model to those observations.

Inspect Model Evolution

To see how the performance metrics and β1 evolve during training, plot them on separate tiles.

t = tiledlayout(2,1);
nexttile
plot(beta1)
ylabel('\beta_1')
xlim([0 nchunk])
xline(Mdl.EstimationPeriod/numObsPerChunk,'r-.')
nexttile
h = plot(ei.Variables);
xlim([0 nchunk])
ylabel('Epsilon Insensitive Loss')
xline(Mdl.EstimationPeriod/numObsPerChunk,'r-.')
xline((Mdl.EstimationPeriod + Mdl.MetricsWarmupPeriod)/numObsPerChunk,'g-.')
legend(h,ei.Properties.VariableNames)
xlabel(t,'Iteration')
The plot suggests that updateMetricsAndFit does the following:

• After the estimation period (first 20 iterations), fit β1 during all incremental learning iterations.
• Compute the performance metrics after the metrics warm-up period only.
• Compute the cumulative metrics during each iteration.
• Compute the window metrics after processing 200 observations (4 iterations).
See Also

Objects
incrementalRegressionLinear

Functions
updateMetricsAndFit

More About

• “Configure Incremental Learning Model” on page 28-14
• “Implement Incremental Learning for Regression Using Flexible Workflow” on page 28-33
Implement Incremental Learning for Classification Using Succinct Workflow

This example shows how to use the succinct workflow to implement incremental learning for binary classification with prequential evaluation. Specifically, this example does the following:

1. Create a default incremental learning model for binary classification.
2. Simulate a data stream using a for loop, which feeds small chunks of observations to the incremental learning algorithm.
3. For each chunk, use updateMetricsAndFit to measure the model performance given the incoming data, and then fit the model to that data.
Although this example treats the application as a binary classification problem, you can implement multiclass incremental learning using an object for a multiclass problem by following this same workflow.

Create Default Model Object

Create a default incremental learning model for binary classification.

Mdl = incrementalClassificationLinear()

Mdl = 
  incrementalClassificationLinear

            IsWarm: 0
           Metrics: [1x2 table]
        ClassNames: [1x0 double]
    ScoreTransform: 'none'
              Beta: [0x1 double]
              Bias: 0
           Learner: 'svm'

Mdl is an incrementalClassificationLinear model object. All its properties are read-only. Mdl must be fit to data before you can use it to perform any other operations.

Load and Preprocess Data

Load the human activity data set. Randomly shuffle the data.

load humanactivity
n = numel(actid);
rng(1) % For reproducibility
idx = randsample(n,n);
X = feat(idx,:);
Y = actid(idx);

For details on the data set, enter Description at the command line. Responses can be one of five classes: Sitting, Standing, Walking, Running, or Dancing. Dichotomize the response by identifying whether the subject is moving (actid > 2).
Y = Y > 2;
Implement Incremental Learning

Use the succinct workflow to update model performance metrics and fit the incremental model to the training data by calling the updateMetricsAndFit function. At each iteration:

• Process 50 observations to simulate a data stream.
• Overwrite the previous incremental model with a new one fitted to the incoming observations.
• Store the cumulative metrics, the window metrics, and the first coefficient β1 to see how they evolve during incremental learning.

% Preallocation
numObsPerChunk = 50;
nchunk = floor(n/numObsPerChunk);
ce = array2table(zeros(nchunk,2),'VariableNames',["Cumulative" "Window"]);
beta1 = zeros(nchunk,1);

% Incremental fitting
for j = 1:nchunk
    ibegin = min(n,numObsPerChunk*(j-1) + 1);
    iend = min(n,numObsPerChunk*j);
    idx = ibegin:iend;
    Mdl = updateMetricsAndFit(Mdl,X(idx,:),Y(idx));
    ce{j,:} = Mdl.Metrics{"ClassificationError",:};
    beta1(j + 1) = Mdl.Beta(1);
end

Mdl is an incrementalClassificationLinear model object trained on all the data in the stream. During incremental learning and after the model is warmed up, updateMetricsAndFit checks the performance of the model on the incoming observations, and then fits the model to those observations.

Inspect Model Evolution

To see how the performance metrics and β1 evolve during training, plot them on separate tiles.

t = tiledlayout(2,1);
nexttile
plot(beta1)
ylabel('\beta_1')
xlim([0 nchunk])
nexttile
h = plot(ce.Variables);
xlim([0 nchunk])
ylabel('Classification Error')
xline(Mdl.MetricsWarmupPeriod/numObsPerChunk,'g-.')
legend(h,ce.Properties.VariableNames)
xlabel(t,'Iteration')
The plot suggests that updateMetricsAndFit does the following:

• Fit β1 during all incremental learning iterations.
• Compute the performance metrics after the metrics warm-up period only.
• Compute the cumulative metrics during each iteration.
• Compute the window metrics after processing 200 observations (4 iterations).
See Also

Objects
incrementalClassificationLinear

Functions
updateMetricsAndFit

More About

• “Configure Incremental Learning Model” on page 28-14
• “Implement Incremental Learning for Classification Using Flexible Workflow” on page 28-37
Implement Incremental Learning for Regression Using Flexible Workflow

This example shows how to use the flexible workflow to implement incremental learning for linear regression with prequential evaluation. A traditionally trained model initializes the incremental model. Specifically, this example does the following:

1. Train a linear regression model on a subset of data.
2. Convert the traditionally trained model to an incremental learning model for linear regression.
3. Simulate a data stream using a for loop, which feeds small chunks of observations to the incremental learning algorithm.
4. For each chunk, use updateMetrics to measure the model performance given the incoming data, and then use fit to fit the model to that data.
Load and Preprocess Data

Load the 2015 NYC housing data set, and shuffle the data. For more details on the data, see NYC Open Data.

load NYCHousing2015
rng(1) % For reproducibility
n = size(NYCHousing2015,1);
idxshuff = randsample(n,n);
NYCHousing2015 = NYCHousing2015(idxshuff,:);

Suppose that the data collected from Manhattan (BOROUGH = 1) was collected using a new method that doubles its quality. Create a weight variable that attributes 2 to observations collected from Manhattan, and 1 to all other observations.

NYCHousing2015.W = ones(n,1) + (NYCHousing2015.BOROUGH == 1);

Extract the response variable SALEPRICE from the table. For numerical stability, scale SALEPRICE by 1e6.

Y = NYCHousing2015.SALEPRICE/1e6;
NYCHousing2015.SALEPRICE = [];

Create dummy variable matrices from the categorical predictors.

catvars = ["BOROUGH" "BUILDINGCLASSCATEGORY" "NEIGHBORHOOD"];
dumvarstbl = varfun(@(x)dummyvar(categorical(x)),NYCHousing2015,...
    'InputVariables',catvars);
dumvarmat = table2array(dumvarstbl);
NYCHousing2015(:,catvars) = [];

Treat all other numeric variables in the table as linear predictors of sales price. Concatenate the matrix of dummy variables to the rest of the predictor data. Transpose the data.

idxnum = varfun(@isnumeric,NYCHousing2015,'OutputFormat','uniform');
X = [dumvarmat NYCHousing2015{:,idxnum}]';
Train Linear Regression Model

Fit a linear regression model to a random sample of half the data. Specify that observations are oriented along the columns of the data.

idxtt = randsample([true false],n,true);
TTMdl = fitrlinear(X(:,idxtt),Y(idxtt),'ObservationsIn','columns')

TTMdl = 
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [313x1 double]
                 Bias: 0.1889
               Lambda: 2.1977e-05
              Learner: 'svm'

TTMdl is a RegressionLinear model object representing a traditionally trained linear regression model.

Convert Trained Model

Convert the traditionally trained linear regression model to a linear regression model for incremental learning.

IncrementalMdl = incrementalLearner(TTMdl)

IncrementalMdl = 
  incrementalRegressionLinear
               IsWarm: 1
              Metrics: [1x2 table]
    ResponseTransform: 'none'
                 Beta: [313x1 double]
                 Bias: 0.1889
              Learner: 'svm'

Implement Incremental Learning

Use the flexible workflow to update model performance metrics and fit the incremental model to the training data by calling the updateMetrics and fit functions separately. Simulate a data stream by processing 500 observations at a time. At each iteration:
1. Call updateMetrics to update the cumulative and window epsilon insensitive loss of the model given the incoming chunk of observations. Overwrite the previous incremental model to update the losses in the Metrics property. Note that the function does not fit the model to the chunk of data—the chunk is "new" data for the model. Specify that observations are oriented along the columns of the data.
2. Call fit to fit the model to the incoming chunk of observations. Overwrite the previous incremental model to update the model parameters. Specify that observations are oriented along the columns of the data.
3. Store the losses and last estimated coefficient β313.
% Preallocation
numObsPerChunk = 500;
nchunk = floor(n/numObsPerChunk);
ei = array2table(zeros(nchunk,2),'VariableNames',["Cumulative" "Window"]);
beta313 = zeros(nchunk,1);

% Incremental fitting
for j = 1:nchunk
    ibegin = min(n,numObsPerChunk*(j-1) + 1);
    iend = min(n,numObsPerChunk*j);
    idx = ibegin:iend;
    IncrementalMdl = updateMetrics(IncrementalMdl,X(:,idx),Y(idx),'ObservationsIn','columns');
    ei{j,:} = IncrementalMdl.Metrics{"EpsilonInsensitiveLoss",:};
    IncrementalMdl = fit(IncrementalMdl,X(:,idx),Y(idx),'ObservationsIn','columns');
    beta313(j) = IncrementalMdl.Beta(end);
end

IncrementalMdl is an incrementalRegressionLinear model object trained on all the data in the stream. Alternatively, you can use updateMetricsAndFit to update performance metrics of the model given a new chunk of data, and then fit the model to the data.

Inspect Model Evolution

Plot a trace plot of the performance metrics and estimated coefficient β313.

t = tiledlayout(2,1);
nexttile
h = plot(ei.Variables);
xlim([0 nchunk])
ylabel('Epsilon Insensitive Loss')
legend(h,ei.Properties.VariableNames)
nexttile
plot(beta313)
ylabel('\beta_{313}')
xlim([0 nchunk])
xlabel(t,'Iteration')
The cumulative loss gradually changes with each iteration (chunk of 500 observations), whereas the window loss jumps. Because the metrics window is 200 by default, updateMetrics measures the performance based on the latest 200 observations in each 500 observation chunk. β313 changes abruptly at first and then just slightly as fit processes chunks of observations.
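If you want window metrics to cover a full 500-observation chunk instead, one hedged option is to set the window size when you convert the model; the value 500 here is illustrative:

IncrementalMdl = incrementalLearner(TTMdl,MetricsWindowSize=500);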
See Also

Objects
incrementalRegressionLinear

Functions
fit | updateMetrics

More About

• “Configure Incremental Learning Model” on page 28-14
• “Implement Incremental Learning for Regression Using Succinct Workflow” on page 28-27
Implement Incremental Learning for Classification Using Flexible Workflow

This example shows how to use the flexible workflow to implement incremental learning for binary classification with prequential evaluation. A traditionally trained model initializes the incremental model. Specifically, this example does the following:

1. Train a linear model for binary classification on a subset of data.
2. Convert the traditionally trained model to an incremental learning model for binary classification.
3. Simulate a data stream using a for loop, which feeds small chunks of observations to the incremental learning algorithm.
4. For each chunk, use updateMetrics to measure the model performance given the incoming data, and then use fit to fit the model to that data.
Although this example treats the application as a binary classification problem, you can implement multiclass incremental learning using an object for a multiclass problem by following this same workflow.

Load and Preprocess Data

Load the human activity data set. Randomly shuffle the data. Orient the observations of the predictor data in columns.

load humanactivity
rng(1) % For reproducibility
n = numel(actid);
idx = randsample(n,n);
X = feat(idx,:)';
Y = actid(idx);

For details on the data set, enter Description at the command line. Responses can be one of five classes: Sitting, Standing, Walking, Running, or Dancing. Dichotomize the response by identifying whether the subject is moving (actid > 2).

Y = Y > 2;

Train Linear Model for Binary Classification

Fit a linear model for binary classification to a random sample of half the data. Specify that the observations are oriented along the columns of the data.

idxtt = randsample([true false],n,true);
TTMdl = fitclinear(X(:,idxtt),Y(idxtt),'ObservationsIn','columns')

TTMdl = 
  ClassificationLinear
         ResponseName: 'Y'
           ClassNames: [0 1]
       ScoreTransform: 'none'
                 Beta: [60x1 double]
                 Bias: -0.3005
               Lambda: 8.2967e-05
              Learner: 'svm'
TTMdl is a ClassificationLinear model object representing a traditionally trained linear model for binary classification.

Convert Trained Model

Convert the traditionally trained classification model to a binary classification linear model for incremental learning.

IncrementalMdl = incrementalLearner(TTMdl)

IncrementalMdl = 
  incrementalClassificationLinear
            IsWarm: 1
           Metrics: [1x2 table]
        ClassNames: [0 1]
    ScoreTransform: 'none'
              Beta: [60x1 double]
              Bias: -0.3005
           Learner: 'svm'

Implement Incremental Learning

Use the flexible workflow to update model performance metrics and fit the incremental model to the training data by calling the updateMetrics and fit functions separately. Simulate a data stream by processing 50 observations at a time. At each iteration:
1. Call updateMetrics to update the cumulative and window classification error of the model given the incoming chunk of observations. Overwrite the previous incremental model to update the losses in the Metrics property. Note that the function does not fit the model to the chunk of data—the chunk is "new" data for the model. Specify that the observations are oriented in columns.
2. Call fit to fit the model to the incoming chunk of observations. Overwrite the previous incremental model to update the model parameters. Specify that the observations are oriented in columns.
3. Store the classification error and first estimated coefficient β1.

% Preallocation
idxil = ~idxtt;
nil = sum(idxil);
numObsPerChunk = 50;
nchunk = floor(nil/numObsPerChunk);
ce = array2table(zeros(nchunk,2),'VariableNames',["Cumulative" "Window"]);
beta1 = [IncrementalMdl.Beta(1); zeros(nchunk,1)];
Xil = X(:,idxil);
Yil = Y(idxil);

% Incremental fitting
for j = 1:nchunk
    ibegin = min(nil,numObsPerChunk*(j-1) + 1);
    iend = min(nil,numObsPerChunk*j);
    idx = ibegin:iend;
    IncrementalMdl = updateMetrics(IncrementalMdl,Xil(:,idx),Yil(idx),...
        'ObservationsIn','columns');
    ce{j,:} = IncrementalMdl.Metrics{"ClassificationError",:};
    IncrementalMdl = fit(IncrementalMdl,Xil(:,idx),Yil(idx),'ObservationsIn','columns');
    beta1(j + 1) = IncrementalMdl.Beta(1);
end
IncrementalMdl is an incrementalClassificationLinear model object trained on all the data in the stream. Alternatively, you can use updateMetricsAndFit to update performance metrics of the model given a new chunk of data, and then fit the model to the data.

Inspect Model Evolution

Plot a trace plot of the performance metrics and estimated coefficient β1.

t = tiledlayout(2,1);
nexttile
h = plot(ce.Variables);
xlim([0 nchunk])
ylabel('Classification Error')
legend(h,ce.Properties.VariableNames)
nexttile
plot(beta1)
ylabel('\beta_1')
xlim([0 nchunk])
xlabel(t,'Iteration')
The cumulative loss is stable and decreases gradually, whereas the window loss jumps.
β1 changes abruptly at first, and then gradually levels off as fit processes more chunks of observations.
See Also

Objects
incrementalClassificationLinear

Functions
fit | updateMetrics

More About

• “Configure Incremental Learning Model” on page 28-14
• “Implement Incremental Learning for Classification Using Succinct Workflow” on page 28-30
Initialize Incremental Learning Model from SVM Regression Model Trained in Regression Learner

This example shows how to tune and train a linear SVM regression model using the Regression Learner app. Then, at the command line, initialize and train an incremental model for linear SVM regression using the information gained from training in the app.

Load and Preprocess Data

Load the 2015 NYC housing data set, and shuffle the data. For more details on the data, see NYC Open Data.

load NYCHousing2015
rng(1); % For reproducibility
n = size(NYCHousing2015,1);
idxshuff = randsample(n,n);
NYCHousing2015 = NYCHousing2015(idxshuff,:);

For numerical stability, scale SALEPRICE by 1e6.

NYCHousing2015.SALEPRICE = NYCHousing2015.SALEPRICE/1e6;

Consider training a linear SVM regression model to about 1% of the data, and reserving the remaining data for incremental learning. Regression Learner supports categorical variables. However, models for incremental learning require dummy-coded categorical variables. Because the BUILDINGCLASSCATEGORY and NEIGHBORHOOD variables contain many levels (some with low representation), the probability that a partition does not have all categories is high. Therefore, dummy-code all categorical variables. Concatenate the matrix of dummy variables to the rest of the numeric variables.

catvars = ["BOROUGH" "BUILDINGCLASSCATEGORY" "NEIGHBORHOOD"];
dumvars = splitvars(varfun(@(x)dummyvar(categorical(x)),NYCHousing2015, ...
    'InputVariables',catvars));
NYCHousing2015(:,catvars) = [];
idxnum = varfun(@isnumeric,NYCHousing2015,'OutputFormat','uniform');
NYCHousing2015 = [dumvars NYCHousing2015(:,idxnum)];

Randomly partition the data into 1% and 99% subsets by calling cvpartition and specifying a holdout (test) sample proportion of 0.99. Create tables for the 1% and 99% partitions.

cvp = cvpartition(n,'HoldOut',0.99);
idxtt = cvp.training;
idxil = cvp.test;
NYCHousing2015tt = NYCHousing2015(idxtt,:);
NYCHousing2015il = NYCHousing2015(idxil,:);

Tune and Train Model Using Regression Learner

Open Regression Learner by entering regressionLearner at the command line.

regressionLearner
Alternatively, on the Apps tab, click the Show more arrow to open the apps gallery. Under Machine Learning and Deep Learning, click the app icon.
Choose the training data set and variables.

1. On the Regression Learner tab, in the File section, select New Session, and then select From Workspace.
2. In the New Session from Workspace dialog box, under Data Set Variable, select the data set NYCHousing2015tt.
3. Under Response, ensure the response variable SALEPRICE is selected.
4. Click Start Session.
The app implements 5-fold cross-validation by default. Train a linear SVM regression model. Tune only the Epsilon hyperparameter by using Bayesian optimization.
1. On the Regression Learner tab, in the Models section, click the Show more arrow to open the apps gallery. In the Support Vector Machines section, click Optimizable SVM.
2. On the model Summary tab, in the Model Hyperparameters section:
   a. Deselect the Optimize boxes for all available options except Epsilon.
   b. Set the value of Kernel scale to Manual and 1.
   c. Set the value of Standardize data to No.
3. On the Regression Learner tab, in the Train section, click Train All and select Train Selected.
The app shows a plot of the generalization minimum MSE of the model as optimization progresses. The app can take some time to optimize the algorithm.
Export the trained, optimized linear SVM regression model.

1. On the Regression Learner tab, in the Export section, select Export Model, and select Export Model.
2. In the Export Model dialog box, click OK.

The app passes the trained model, among other variables, in the structure array trainedModel to the workspace. Close Regression Learner.

Convert Exported Model to Incremental Model

At the command line, extract the trained SVM regression model from trainedModel.

Mdl = trainedModel.RegressionSVM;
Convert the model to an incremental model.

IncrementalMdl = incrementalLearner(Mdl)
IncrementalMdl.Epsilon

IncrementalMdl = 
  incrementalRegressionLinear

               IsWarm: 1
              Metrics: [1×2 table]
    ResponseTransform: 'none'
                 Beta: [312×1 double]
                 Bias: 12.3802
              Learner: 'svm'

  Properties, Methods

ans = 5.4536

IncrementalMdl is an incrementalRegressionLinear model object for incremental learning using a linear SVM regression model. incrementalLearner initializes IncrementalMdl using the coefficients and the optimized value of the Epsilon hyperparameter learned from Mdl. Therefore, you can predict responses by passing IncrementalMdl and data to predict. Also, the IsWarm property is true, which means that the incremental learning functions measure the model performance from the start of incremental learning.

Implement Incremental Learning

Because incremental learning functions accept floating-point matrices only, create matrices for the predictor and response data.

Xil = NYCHousing2015il{:,1:(end-1)};
Yil = NYCHousing2015il{:,end};
Perform incremental learning on the 99% data partition by using the updateMetricsAndFit function. Simulate a data stream by processing 500 observations at a time. At each iteration:

1. Call updateMetricsAndFit to update the cumulative and window epsilon insensitive loss of the model given the incoming chunk of observations. Overwrite the previous incremental model to update the losses in the Metrics property.
2. Store the losses and last estimated coefficient β313.

% Preallocation
nil = sum(idxil);
numObsPerChunk = 500;
nchunk = floor(nil/numObsPerChunk);
ei = array2table(zeros(nchunk,2),'VariableNames',["Cumulative" "Window"]);
beta313 = [IncrementalMdl.Beta(end); zeros(nchunk,1)];

% Incremental learning
for j = 1:nchunk
    ibegin = min(nil,numObsPerChunk*(j-1) + 1);
    iend = min(nil,numObsPerChunk*j);
    idx = ibegin:iend;
    IncrementalMdl = updateMetricsAndFit(IncrementalMdl,Xil(idx,:),Yil(idx));
    ei{j,:} = IncrementalMdl.Metrics{"EpsilonInsensitiveLoss",:};
    beta313(j + 1) = IncrementalMdl.Beta(end);
end
IncrementalMdl is an incrementalRegressionLinear model object trained on all the data in the stream.

Plot a trace plot of the performance metrics and estimated coefficient β313.

figure
subplot(2,1,1)
h = plot(ei.Variables);
xlim([0 nchunk])
ylabel('Epsilon Insensitive Loss')
legend(h,ei.Properties.VariableNames)
subplot(2,1,2)
plot(beta313)
ylabel('\beta_{313}')
xlim([0 nchunk])
xlabel('Iteration')
The cumulative loss gradually changes with each iteration (chunk of 500 observations), whereas the window loss jumps. Because the metrics window is 200 by default, updateMetricsAndFit measures the performance based on the latest 200 observations in each 500 observation chunk.
β313 changes abruptly and then levels off as updateMetricsAndFit processes chunks of observations.
See Also

Apps
Regression Learner

Objects
incrementalRegressionLinear

Functions
updateMetricsAndFit | predict

More About

• “Configure Incremental Learning Model” on page 28-14
• “Implement Incremental Learning for Regression Using Flexible Workflow” on page 28-33
Initialize Incremental Learning Model from Logistic Regression Model Trained in Classification Learner

This example shows how to train a logistic regression model using the Classification Learner app. Then, at the command line, initialize and train an incremental model for binary classification using the information gained from training in the app.

Load and Preprocess Data

Load the human activity data set. Randomly shuffle the data.

load humanactivity
rng(1); % For reproducibility
n = numel(actid);
idx = randsample(n,n);
X = feat(idx,:);
actid = actid(idx);

For details on the data set, enter Description at the command line. Responses can be one of five classes: Sitting, Standing, Walking, Running, or Dancing. Dichotomize the response by creating a categorical array that identifies whether the subject is moving (actid > 2).

moveidx = actid > 2;
Y = repmat("NotMoving",n,1);
Y(moveidx) = "Moving";
Y = categorical(Y);

Consider training a logistic regression model to about 1% of the data, and reserving the remaining data for incremental learning. Randomly partition the data into 1% and 99% subsets by calling cvpartition and specifying a holdout (test) sample proportion of 0.99. Create variables for the 1% and 99% partitions.

cvp = cvpartition(n,'HoldOut',0.99);
idxtt = cvp.training;
idxil = cvp.test;
Xtt = X(idxtt,:);
Xil = X(idxil,:);
Ytt = Y(idxtt);
Yil = Y(idxil);

Train Model Using Classification Learner

Open Classification Learner by entering classificationLearner at the command line.

classificationLearner
Alternatively, on the Apps tab, click the Show more arrow to open the apps gallery. Under Machine Learning and Deep Learning, click the app icon.

Choose the training data set and variables.

1. On the Classification Learner tab, in the File section, select New Session > From Workspace.
2. In the New Session from Workspace dialog box, under Data Set Variable, select the predictor variable Xtt.
3. Under Response, click From workspace; note that Ytt is selected automatically.
4. Under Validation Scheme, select Resubstitution Validation.
5. Click Start Session.

Train a logistic regression model.

1. On the Classification Learner tab, in the Models section, click the Show more arrow to open the gallery of models. In the Logistic Regression Classifiers section, click Logistic Regression.
2. On the Classification Learner tab, in the Train section, click Train All and select Train Selected. After training the model, the app displays a confusion matrix.
The confusion matrix suggests that the model classifies in-sample observations well.

Export the trained logistic regression model.

1. On the Classification Learner tab, in the Export section, select Export Model > Export Model.
2. In the Export Model dialog box, click OK.

The app passes the trained model, among other variables, in the structure array trainedModel to the workspace. Close Classification Learner.

Initialize Incremental Model Using Exported Model

At the command line, extract the trained logistic regression model and the class names from trainedModel. The model is a GeneralizedLinearModel object. Because class names must match the data type of the response variable, convert the stored value to categorical.

Mdl = trainedModel.GeneralizedLinearModel;
ClassNames = categorical(trainedModel.ClassNames);

Extract the intercept and the coefficients from the model. The intercept is the first coefficient.

Bias = Mdl.Coefficients.Estimate(1);
Beta = Mdl.Coefficients.Estimate(2:end);

You cannot convert a GeneralizedLinearModel object to an incremental model directly. However, you can initialize an incremental model for binary classification by passing information learned from the app, such as estimated coefficients and class names.

Create an incremental model for binary classification directly. Specify the learner, intercept, coefficient estimates, and class names learned from Classification Learner. Because good initial values of coefficients exist and all class names are known, specify a metrics warm-up period of length 0.

IncrementalMdl = incrementalClassificationLinear('Learner','logistic', ...
    'Beta',Beta,'Bias',Bias,'ClassNames',ClassNames, ...
    'MetricsWarmupPeriod',0)

IncrementalMdl = 
  incrementalClassificationLinear

            IsWarm: 0
           Metrics: [1×2 table]
        ClassNames: [Moving NotMoving]
    ScoreTransform: 'logit'
              Beta: [60×1 double]
              Bias: -471.7873
           Learner: 'logistic'

  Properties, Methods

IncrementalMdl is an incrementalClassificationLinear model object for incremental learning using a logistic regression model. Because coefficients and all class names are specified, you can predict responses by passing IncrementalMdl and data to predict.
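For example, the following hedged sketch labels a small chunk of the reserved observations before any incremental fitting; the choice of ten observations is illustrative:

labels = predict(IncrementalMdl,Xil(1:10,:));   % predicted class labels for a small chunk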
Implement Incremental Learning

Perform incremental learning on the 99% data partition by using the updateMetricsAndFit function. Simulate a data stream by processing 50 observations at a time. At each iteration:

1. Call updateMetricsAndFit to update the cumulative and window classification error of the model given the incoming chunk of observations. Overwrite the previous incremental model to update the losses in the Metrics property.
2. Store the losses and the estimated coefficient β14.

% Preallocation
nil = sum(idxil);
numObsPerChunk = 50;
nchunk = floor(nil/numObsPerChunk);
ce = array2table(zeros(nchunk,2),'VariableNames',["Cumulative" "Window"]);
beta14 = [IncrementalMdl.Beta(14); zeros(nchunk,1)];

% Incremental learning
for j = 1:nchunk
    ibegin = min(nil,numObsPerChunk*(j-1) + 1);
    iend = min(nil,numObsPerChunk*j);
    idx = ibegin:iend;
    IncrementalMdl = updateMetricsAndFit(IncrementalMdl,Xil(idx,:),Yil(idx));
    ce{j,:} = IncrementalMdl.Metrics{"ClassificationError",:};
    beta14(j + 1) = IncrementalMdl.Beta(14);
end

IncrementalMdl is an incrementalClassificationLinear model object trained on all the data in the stream.

Plot a trace plot of the performance metrics and β14.

figure;
subplot(2,1,1)
h = plot(ce.Variables);
xlim([0 nchunk]);
ylabel('Classification Error')
legend(h,ce.Properties.VariableNames)
subplot(2,1,2)
plot(beta14)
ylabel('\beta_{14}')
xlim([0 nchunk]);
xlabel('Iteration')
The cumulative loss gradually changes with each iteration (chunk of 50 observations), whereas the window loss jumps. Because the metrics window is 200 by default, updateMetricsAndFit measures the performance every four iterations. β14 adapts to the data as updateMetricsAndFit processes chunks of observations.
See Also

Apps
Classification Learner

Objects
incrementalClassificationLinear

Functions
updateMetricsAndFit | predict

More About

• “Configure Incremental Learning Model” on page 28-14
• “Implement Incremental Learning for Classification Using Flexible Workflow” on page 28-37
Perform Conditional Training During Incremental Learning

This example shows how to train a naive Bayes multiclass classification model for incremental learning only when the model performance is unsatisfactory.

The flexible incremental learning workflow enables you to train an incremental model on an incoming batch of data only when training is necessary (see “What Is Incremental Learning?” on page 28-2). For example, if the performance metrics of a model are satisfactory, then, to increase efficiency, you can skip training on incoming batches until the metrics become unsatisfactory.

Load Data

Load the human activity data set. Randomly shuffle the data.

load humanactivity
n = numel(actid);
rng(1) % For reproducibility
idx = randsample(n,n);
X = feat(idx,:);
Y = actid(idx);
For details on the data set, enter Description at the command line.

Train Naive Bayes Classification Model

Configure a naive Bayes classification model for incremental learning by setting:

• The maximum number of expected classes to 5
• The tracked performance metric to the misclassification error rate, which also includes minimal cost
• The metrics window size to 1000
• The metrics warm-up period to 50

initobs = 50;
Mdl = incrementalClassificationNaiveBayes('MaxNumClasses',5,'MetricsWindowSize',1000,...
    'Metrics','classiferror','MetricsWarmupPeriod',initobs);
Fit the configured model to the first 50 observations.

Mdl = fit(Mdl,X(1:initobs,:),Y(1:initobs))

Mdl = 
  incrementalClassificationNaiveBayes

                    IsWarm: 1
                   Metrics: [2x2 table]
                ClassNames: [1 2 3 4 5]
            ScoreTransform: 'none'
         DistributionNames: {1x60 cell}
    DistributionParameters: {5x60 cell}

haveTrainedAllClasses = numel(unique(Y(1:initobs))) == 5
haveTrainedAllClasses = logical
   1

Mdl is an incrementalClassificationNaiveBayes model object. The model is warm (IsWarm is 1) because all the following conditions apply:

• The initial training data contains all expected classes (haveTrainedAllClasses is true).
• Mdl is fit to Mdl.MetricsWarmupPeriod observations.

Therefore, the model is prepared to generate predictions, and incremental learning functions measure performance metrics within the model.

Perform Incremental Learning with Conditional Training

Suppose that you want to train the model only when the most recent 1000 observations have a misclassification error greater than 5%. Perform incremental learning, with conditional training, by following this procedure for each iteration:
1. Simulate a data stream by processing a chunk of 100 observations at a time.
2. Update the model performance by passing the model and current chunk of data to updateMetrics. Overwrite the input model with the output model.
3. Store the misclassification error rate and the mean of the first predictor in the second class μ21 to see how they evolve during training.
4. Fit the model to the chunk of data only when the misclassification error rate is greater than 0.05. Overwrite the input model with the output model when training occurs.
5. Track when fit trains the model.

% Preallocation
numObsPerChunk = 100;
nchunk = floor((n - initobs)/numObsPerChunk);
mu21 = zeros(nchunk,1);
ce = array2table(nan(nchunk,2),'VariableNames',["Cumulative" "Window"]);
trained = false(nchunk,1);

% Incremental fitting
for j = 1:nchunk
    ibegin = min(n,numObsPerChunk*(j-1) + 1 + initobs);
    iend = min(n,numObsPerChunk*j + initobs);
    idx = ibegin:iend;
    Mdl = updateMetrics(Mdl,X(idx,:),Y(idx));
    ce{j,:} = Mdl.Metrics{"ClassificationError",:};
    if ce{j,"Window"} > 0.05
        Mdl = fit(Mdl,X(idx,:),Y(idx));
        trained(j) = true;
    end
    mu21(j) = Mdl.DistributionParameters{2,1}(1);
end
Mdl is an incrementalClassificationNaiveBayes model object trained on all the data in the stream. 28-54
Perform Conditional Training During Incremental Learning
To see how the model performance and μ21 evolve during training, plot them on separate tiles. Identify periods during which the model is trained. t = tiledlayout(2,1); nexttile plot(mu21) hold on plot(find(trained),mu21(trained),'r.') ylabel('\mu_{21}') legend('\mu_{21}','Training occurs','Location','best') hold off nexttile plot(ce.Variables) ylabel('Misclassification Error Rate') legend(ce.Properties.VariableNames,'Location','best') xlabel(t,'Iteration')
The trace plot of μ21 shows periods of constant values, during which the model performance within the previous observation window is at most 0.05.
See Also Objects incrementalClassificationNaiveBayes
28-55
28
Incremental Learning
Functions predict | fit | updateMetrics
More About
28-56
•
“Configure Incremental Learning Model” on page 28-14
•
“Implement Incremental Learning for Classification Using Flexible Workflow” on page 28-37
•
“Perform Text Classification Incrementally” on page 28-57
Perform Text Classification Incrementally
Perform Text Classification Incrementally This example shows how to incrementally train a model to classify documents based on word frequencies in the documents; a bag-of-words model. Load the NLP data set, which contains a sparse matrix of word frequencies X computed from MathWorks® documentation. Labels Y are the toolbox documentation to which the page belongs. load nlpdata
For more details on the data set, such as the dictionary and corpus, enter Description. The observations are arranged by label. Because the incremental learning software does not start computing performance metrics until it processes all labels at least once, shuffle the data set. [n,p] = size(X) n = 31572 p = 34023 rng(1); shflidx = randperm(n); X = X(shflidx,:); Y = Y(shflidx);
Determine the number of classes in the data. cats = categories(Y); maxNumClasses = numel(cats);
Create a naive Bayes incremental learner. Specify the number of classes, a metrics warmup period of 0, and a metrics window size of 1000. Because predictor j is the word frequency of word j in the dictionary, specify that the predictors are conditionally, jointly multinomial, given the class. Mdl = incrementalClassificationNaiveBayes(MaxNumClasses=maxNumClasses,... MetricsWarmupPeriod=0,MetricsWindowSize=1000,DistributionNames='mn');
Mdl is an incrementalClassificationNaiveBayes object. Mdl is a cold model because it has not processed any observations; it represents a template for training.

Measure the model performance and fit the incremental model to the training data by using the updateMetricsAndFit function. Simulate a data stream by processing chunks of 1000 observations at a time. At each iteration:

1. Process 1000 observations.
2. Overwrite the previous incremental model with a new one fitted to the incoming observations.
3. Store the current minimal cost.
This stage can take several minutes to run. numObsPerChunk = 1000; nchunks = floor(n/numObsPerChunk); mc = array2table(zeros(nchunks,2),'VariableNames',["Cumulative" "Window"]);
for j = 1:nchunks ibegin = min(n,numObsPerChunk*(j-1) + 1); iend = min(n,numObsPerChunk*j); idx = ibegin:iend; XChunk = full(X(idx,:)); Mdl = updateMetricsAndFit(Mdl,XChunk,Y(idx)); mc{j,:} = Mdl.Metrics{"MinimalCost",:}; end
Mdl is an incrementalClassificationNaiveBayes model object trained on all the data in the stream. During incremental learning and after the model is warmed up, updateMetricsAndFit checks the performance of the model on the incoming chunk of observations, and then fits the model to those observations. Plot the minimal cost to see how it evolved during training. figure plot(mc.Variables) ylabel('Minimal Cost') legend(mc.Properties.VariableNames) xlabel('Iteration')
The cumulative minimal cost smoothly decreases and settles near 0.16, while the minimal cost computed for the chunk jumps between 0.14 and 0.18.
28-58
Perform Text Classification Incrementally
See Also Objects incrementalClassificationNaiveBayes Functions predict | fit | updateMetrics
More About •
“Configure Incremental Learning Model” on page 28-14
•
“Implement Incremental Learning for Classification Using Flexible Workflow” on page 28-37
•
“Incremental Learning with Naive Bayes and Heterogeneous Data” on page 28-60
Incremental Learning with Naive Bayes and Heterogeneous Data

This example shows how to prepare heterogeneous predictor data, containing real-valued and categorical measurements, for incremental learning using a naive Bayes classifier.

Naive Bayes classifiers for incremental learning support only numeric predictor data sets, but they can adapt to unseen categorical levels during training. If your data is heterogeneous and contained in a table, you must preprocess before performing incremental learning by following this general procedure:

1. Create a running hash map for each categorical variable by using containers.Map MATLAB® objects (see the short sketch after this list). The hash map assigns a string to a unique numeric value, and it can easily adapt to new levels. Although you can create a cold hash map, this example assumes the first 50 observations from the data are available for populating a hash map and warming up the model.
2. Consistently concatenate all real-valued measurements with the numeric categorical levels.
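A minimal sketch of the hash-map idea, with hypothetical level names chosen only for illustration:

m = containers.Map({'Private','Federal-gov'},[1 2]);  % initial levels mapped to integers
m('Self-emp') = m.Count + 1;                          % adapt to a previously unseen level
m('Self-emp')                                         % returns 3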
Load and Preprocess Data Load the 1994 US Census data set. The learning objective is to predict a US citizen's salary (salary, either 50K) from several heterogeneous measurements on the citizen. load census1994.mat
The training data is in the table adultdata. For details on the data set, enter Description. Remove all observations containing at least one missing value from the data. adultdata = adultdata(~any(ismissing(adultdata),2),:); [n,p] = size(adultdata); p = p - 1; % Number of predictor variables
Suppose only the first 50 observations are currently available. n0 = 50; sample0 = adultdata(1:n0,:);
Create Initial Hash Maps Identify all categorical variables in the data, and determine their levels. catpredidx = table2array(varfun(@iscategorical,adultdata(:,1:(end-1)))); numcatpreds = sum(catpredidx); lvlstmp = varfun(@unique,adultdata(:,catpredidx),OutputFormat="cell"); lvls0 = cell(1,p); lvls0(catpredidx) = lvlstmp;
For each categorical variable, create an initial hash map that assigns an integer, from 1 to the number of corresponding levels, to each level. Store all hash maps in a cell vector. catmaps = cell(1,p); J = find(catpredidx); for j = J numlvls = numel(lvls0{j});
28-60
Incremental Learning with Naive Bayes and Heterogeneous Data
catmaps{j} = containers.Map(cellstr(lvls0{j}),1:numlvls); end example1 = catmaps{find(catpredidx,1)} example1 = Map with properties: Count: 7 KeyType: char ValueType: double val = example1('Private') val = 3
catmaps is a numcatpreds-by-1 cell vector of containers.Map objects, each representing a hash map for the corresponding categorical variable. For example, the first hash map assigns 3 to the level 'Private'. Represent Categorical Variables as Numeric The supporting, local function processPredictorData has the following characteristics: • Accept a table containing categorical and numeric variables, and the current cell vector of hash maps for each categorical variable. • Return a matrix of homogenous, numeric predictor data with categorical variables replaced by numeric variables. The function replaces string-based levels with positive integers. • Return an updated cell vector of hash maps when the input data contains variables with levels unknown to the current hash map. Represent the categorical data in the initial sample as numeric by using processPredictorData. [X0,catmaps] = processPredictorData(sample0(:,1:(end-1)),catmaps); y0 = adultdata.salary(1:n0);
Fit Naive Bayes Model to Initial Sample Fit a naive Bayes model to the initial sample. Identify the categorical variables. Mdl = fitcnb(X0,y0,CategoricalPredictors=catpredidx);
Mdl is a ClassificationNaiveBayes model. Prepare Naive Bayes Model for Incremental Learning Covert the traditionally trained naive Bayes model to an incremental learner. Specify that the incremental model should base window metrics on 2000 observations. IncrementalMdl = incrementalLearner(Mdl,MetricsWindowSize=2000);
IncrementalMdl is a warmed incrementalClassificationNaiveBayes object prepared for incremental learning. incrementalLearner initializes the parameters of the conditional distributions of the predictor variables with the values learned from the initial sample.
28-61
28
Incremental Learning
Perform Incremental Learning
Measure the model performance and fit the incremental model to the training data by using the updateMetricsAndFit function. Simulate a data stream by processing chunks of 100 observations at a time. At each iteration:
1 Process the predictor data and update the hash maps in the incoming 100 observations by using processPredictorData.
2 Fit a naive Bayes model to the processed data.
3 Overwrite the previous incremental model with a new one fitted to the incoming observations.
4 Store the current minimal cost and the learned conditional probability of selecting a female US citizen given each salary level.
numObsPerChunk = 100; nchunks = floor(n/numObsPerChunk); mc = array2table(zeros(nchunks,2),'VariableNames',["Cumulative" "Window"]); catdistms = zeros(nchunks,2); sexidx = string(adultdata.Properties.VariableNames) == "sex"; fidx = string(keys(catmaps{sexidx(1:end-1)})) == "Female"; for j = 1:nchunks ibegin = min(n,numObsPerChunk*(j-1) + 1 + n0); iend = min(n,numObsPerChunk*j + n0); idx = ibegin:iend; [XChunk,catmaps] = processPredictorData(adultdata(idx,1:(end-1)),catmaps); IncrementalMdl = updateMetricsAndFit(IncrementalMdl,XChunk,adultdata.salary(idx)); mc{j,:} = IncrementalMdl.Metrics{"MinimalCost",:}; catdistms(j,1) = IncrementalMdl.DistributionParameters{1,sexidx}(fidx); catdistms(j,2) = IncrementalMdl.DistributionParameters{2,sexidx}(fidx); end
IncrementalMdl is an incrementalClassificationNaiveBayes object incrementally fit to the entire stream. During incremental learning, updateMetricsAndFit checks the performance of the model on the incoming chunk of observations, and then fits the model to those observations. Plot the cumulative and window minimal cost computed during incremental learning. figure plot(mc.Variables) ylabel('Minimal Cost') legend(mc.Properties.VariableNames) xlabel('Iteration')
The cumulative loss gradually changes with each iteration (chunk of 100 observations), whereas the window loss jumps. Because the metrics window is 2000, updateMetricsAndFit measures the performance every 20 iterations. Plot the running probability of selecting a female within each salary level.
figure plot(catdistms) ylabel('P(Female|Salary=y)') legend(sprintf("y=%s",IncrementalMdl.ClassNames(1)),sprintf("y=%s",IncrementalMdl.ClassNames(2))) xlabel('Iteration')
The fitted probabilities gradually settle during incremental learning. Compare Performance on Test Data Fit a naive Bayes classifier to the entire training data set. MdlTT = fitcnb(adultdata,"salary");
MdlTT is a traditionally trained ClassificationNaiveBayes object. Compute the minimal cost of the traditionally trained model on the test data adulttest. adulttest = adulttest(~any(ismissing(adulttest),2),:); % Remove missing values mctt = loss(MdlTT,adulttest) mctt = 0.1773
Process the predictors of the test data by using processPredictorData, and then compute the minimal cost of the incremental learning model on the test data. XTest = processPredictorData(adulttest(:,1:(end-1)),catmaps); ilmc = loss(IncrementalMdl,XTest,adulttest.salary) ilmc = 0.1657
The minimal costs of the incremental model and the traditionally trained model are nearly the same.
Supporting Functions
function [Pred,maps] = processPredictorData(tbl,maps)
% PROCESSPREDICTORDATA Process heterogeneous data to homogeneous numeric
% data
%
% Input arguments:
%   tbl:  A table of raw input data
%   maps: A cell vector of containers.Map hash maps. Cells correspond to
%         categorical variables in tbl.
%
% Output arguments:
%   Pred: A numeric matrix of data with the same dimensions as tbl. Numeric
%         variables in tbl are assigned to the corresponding column of Pred;
%         categorical variables in tbl are processed and placed in the
%         corresponding column of Pred.
catidx = varfun(@iscategorical,tbl,OutputFormat="uniform");
numidx = ~catidx;
numcats = sum(catidx);
p = numcats + sum(numidx);
currlvlstmp = varfun(@unique,tbl(:,catidx),OutputFormat="cell");
currlvls0 = cell(1,p);
currlvls0(catidx) = currlvlstmp;
currlvlstmp = cellfun(@categories,currlvls0(catidx),UniformOutput=false);
currlvls = cell(1,p);
currlvls(catidx) = currlvlstmp;
Pred = zeros(size(tbl));
Pred(:,numidx) = tbl{:,numidx};
J = find(catidx);
for j = J
    hasNewlvl = ~isKey(maps{j},currlvls{j});
    if any(hasNewlvl)
        newcats = currlvls{j}(hasNewlvl);
        numnewcats = sum(hasNewlvl);
        g = maps{j}.Count; % number of levels already in the hash map
        for h = 1:numnewcats
            g = g + 1;
            maps{j}(newcats{h}) = g;
        end
    end
    conv2cell = cellstr(tbl{:,j});
    Pred(:,j) = cell2mat(values(maps{j},conv2cell));
end
end
See Also Objects incrementalClassificationNaiveBayes | ClassificationNaiveBayes | containers.Map Functions loss | fit | updateMetrics
More About
• “Configure Incremental Learning Model” on page 28-14
• “Implement Incremental Learning for Classification Using Flexible Workflow” on page 28-37
• “Perform Text Classification Incrementally” on page 28-57
Monitor Equipment State of Health Using Drift-Aware Learning
Monitor Equipment State of Health Using Drift-Aware Learning This example shows how to automate the process of monitoring the state of health for a cooling system using an incremental drift-aware learning model and Streaming Data Framework for MATLAB® Production Server™. Machinery manufacturing facilities use cooling systems to regulate the temperature of operating equipment. Monitoring the state of these cooling systems requires significant effort from maintenance engineers. One way to reduce this effort is to automate the monitoring process by using machine learning. This example provides two apps to automate the process: uploadApp, which uploads data containing sensor readings from a cooling system; and dashboard, which you use to monitor the cooling system. Upload Sensor Readings You use the app uploadApp, designed using App Designer, to mimic the process of uploading sensor data to the local Kafka® server. The app uploads batches of unlabeled sensor readings along with periodic, human-annotated labeled sensor readings. The sensors capture the following readings: • Voltage — Voltage measurement across the cooling system • Temperature — Temperature of the cooling system • Fan speed — The revolutions per minute (RPM) of the fan used in the cooling system The cooling system has three states: • Normal • High load • Fan issue To use the uploadApp app, you must have a Kafka server running locally at port 9092 with two topics, unlabeled and labeled. For more details on uploading data using a Kafka server, see writetimetable (MATLAB Production Server) and “Streaming Data Framework for MATLAB Production Server Basics” (MATLAB Production Server). Open the uploadApp app. uploadApp
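For reference, the upload that the app performs amounts to writing timetables of sensor readings to the two Kafka topics. The following is a minimal sketch rather than the app code itself; it assumes a broker on localhost:9092 and that the MAT-files contain timetables named unlabeledData and labeledData:
unlabeledKS = kafkaStream("localhost",9092,"unlabeled"); % connect to the unlabeled topic
labeledKS = kafkaStream("localhost",9092,"labeled"); % connect to the labeled topic
load unlabeledData.mat % assumed to contain the timetable unlabeledData
load labeledData.mat % assumed to contain the timetable labeledData
writetimetable(unlabeledKS,unlabeledData) % upload unlabeled sensor readings
writetimetable(labeledKS,labeledData) % upload labeled sensor readings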
Click Start Streaming. The uploadApp app uploads 60,000 unlabeled sensor readings and 6000 labeled sensor readings, stored in the files unlabeledData.mat and labeledData.mat, respectively. Each sensor reading is sampled every 0.1 second. The Upload Status light turns red while the app uploads the unlabeled and labeled data. When the upload is complete, the status light turns green. The Kafka server now contains data that can be accessed by anyone responsible for maintaining a cooling system. Monitor Cooling System You use the dashboard app, also created using App Designer, to monitor the state of health of the cooling system. The dashboard app performs the following functions: 1
Predict the state of health of the cooling system using the sensor readings.
2
Monitor the data for drift using the human-annotated labels for the sensor readings.
The dashboard app has three tabs: • Main Dashboard — Provides information on the current state of the cooling system • Sensor Reading — Provides information about the sensor readings from the previous two minutes • Model Performance — Contains plots that highlight how the loss and the drift status of the learner vary over time Open the dashboard app. dashboard
On the Main Dashboard tab, click Load Model to load the incremental drift-aware learning model (incrementalDriftAwareLearner) stored in the file warmupModel.mat. The model is a warmed-up model that uses an incremental naive Bayes classification model as a base learner. After the app loads the drift-aware model, the app is ready to start streaming the data uploaded to the Kafka server. You can initiate the streaming process by setting Streaming Options to On. Turning on Streaming Options causes the app to take the following actions (a simplified sketch of these steps appears after the list):
1 Read 1200 observations of unlabeled data at a time from the Kafka server. Each sensor reading is sampled every 0.1 second, so 1200 observations correspond to two minutes of data. The app uses a kafkaStream (MATLAB Production Server) object to read the uploaded sensor readings from the Kafka server by accessing the unlabeled topic. The app also stores the readings in a timetable.
2 Predict labels for the unlabeled sensor readings. The app predicts the state of the cooling system over the two-minute duration (1200 observations) by using the model loaded from warmupModel.mat and the sensor readings stored in the timetable.
3 Update the cooling system status. The app computes the mode of the predicted labels (that is, the most frequently predicted label) in the two-minute duration and uses the mode to update the Cooling System Status value. The status light can be green, blue, or red to indicate that the cooling system is normal, under high load, or having a fan issue, respectively. The app updates the Number of Sensor Readings value to reflect the number of unlabeled sensor readings downloaded from the server.
4 Update the sensor readings with the most recent values. The app plots the most recently downloaded batch (1200 observations) of sensor data on the Sensor Reading tab.
5 Read and check 100 observations of labeled data at a time. The app uses labeled data to monitor drift in the data distribution. If labeled data is available, the app downloads 100 observations at a time. The app uses a kafkaStream (MATLAB Production Server) object to read the labeled sensor readings from the Kafka server by accessing the labeled topic.
6 Check for drift and compute the misclassification loss over the 100 observations. The drift-aware model uses the 100 labeled sensor readings to determine whether the data is undergoing a drift. The app computes the misclassification loss by averaging the loss values returned by the perObservationLoss function for the 100 observations.
7 Update the drift status and loss plots. The app updates the plots on the Model Performance tab to show the status of the model (stable, warning, or drift) and the misclassification loss over the last batch of labeled data.
8 Update the model status. The Model Status light turns green if the drift-aware model is warm (that is, the IsWarm property of the model is true), and turns red if the model is not warm (that is, the IsWarm property is false). If the model detects a drift, the app overwrites the model with a new one. The model can return reliable labels after it has enough observations to become warm. For details, see the IsWarm property and the topic “Incremental Drift-Aware Learning” on page 35-3861.
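The app logic itself is not shown in this example, but the steps above map onto incremental learning functions roughly as follows. This is a simplified sketch rather than the app's actual code: the stream objects unlabeledKS and labeledKS, the sensor variable names, and the label variable name State are assumptions, and readtimetable is assumed to be the framework function used to read a chunk of events from a kafkaStream object.
unlabeledTT = readtimetable(unlabeledKS); % step 1: next 1200 unlabeled readings (assumed read function)
X = unlabeledTT{:,["Voltage" "Temperature" "FanSpeed"]}; % assumed sensor variable names
labels = predict(driftAwareMdl,X); % step 2: predict the cooling system state
systemStatus = mode(labels); % step 3: most frequently predicted label
labeledTT = readtimetable(labeledKS); % step 5: next 100 labeled readings
XL = labeledTT{:,["Voltage" "Temperature" "FanSpeed"]};
yL = labeledTT.State; % assumed label variable name
driftAwareMdl = updateMetricsAndFit(driftAwareMdl,XL,yL); % step 6: check for drift and update the model
chunkLoss = mean(perObservationLoss(driftAwareMdl,XL,yL)); % average misclassification loss over the chunk
driftStatus = driftAwareMdl.DriftStatus; % step 7: drift status of the model
modelIsWarm = driftAwareMdl.IsWarm; % step 8: model readiness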
View the Sensor Reading tab to inspect the sensor readings of the data received in the previous two minutes.
View the Model Performance tab to monitor the drift in the data.
See Also kafkaStream | incrementalDriftAwareLearner | perObservationLoss
Related Examples •
“Incremental Learning Overview” on page 28-2
•
“Streaming Data Framework for MATLAB Production Server Basics” (MATLAB Production Server)
Monitor Equipment State of Health Using Drift-Aware Learning on the Cloud This example describes the set up necessary to run the deployed version of the “Monitor Equipment State of Health Using Drift-Aware Learning” on page 28-67 example on the cloud. The topic shows how to automate the process of monitoring the state of health for a cooling system using an incremental drift-aware learning model using the infrastructure in the next figure. This example requires Statistics and Machine Learning Toolbox™, MATLAB® Compiler SDK™, MATLAB Production Server™, and MATLAB Web App Server™.
The following figure shows the architecture for drift-aware learning. The architecture involves two analytics functions, processUnlabeledData.ctf and processLabeledData.ctf, that are deployed on the MATLAB Production Server (MPS). The processUnlabeledData.ctf archive loads the model, uses it to predict the health of the equipment, and writes the sensor readings and predictions to the output topic processedUnlabeled. The processLabeledData.ctf archive reads the labeled data from the input topic labeled and uses the labeled data to check the model for drift. It then outputs the labeled data, along with the drift diagnostic information, into the output topic processedLabeled. A dashboard deployed on the Web App Server then reads the processed data. The dashboard enables you to visually inspect the sensor readings and model performance. A Redis server enables the updated model to be shared between the labeled and unlabeled CTFs.
Upload Sensor Readings Start Kafka server from a dockerized container. To upload sensor readings, you must first create the Kafka topics: unlabeled, labeled, processedUnlabeled, and processedLabeled. You must have a Kafka server running at kafka.host.com at port number 9092 for this example. For more details on uploading data using a Kafka server, see “Streaming Data Framework for MATLAB Production Server Basics” (MATLAB Production Server). Upload readings via uploadApp. Similar to the “Monitor Equipment State of Health Using Drift-Aware Learning” on page 28-67 example, upload the unlabeled and labeled data (provided in UnlabeledData.mat and LabeledData.mat) to the Kafka server using the unlabeled and labeled topics via the uploadApp2, also provided as a supporting file. The uploadApp2 deployable archive runs on the Web App Server. To learn more about how to set up a deployable archive on Web App Server, refer to the instructions in “Create Web App” (MATLAB Compiler).
After deploying the uploadApp2, click Start Streaming. The app uploads 60,000 unlabeled sensor readings and 6000 labeled sensor readings, stored in the files unlabeledData.mat and labeledData.mat, respectively. Each sensor reading is sampled every 0.1 second.
The Upload Status light turns red while the app uploads the unlabeled and labeled data. When the upload is complete, the status light turns green. The Kafka server now contains data that can be accessed by anyone responsible for maintaining a cooling system. Monitor Cooling System To start monitoring the cooling system, first you must create the deployable archives processUnlabeledData.ctf and processLabeledData.ctf. To deploy these two archives, you must have the MATLAB Production Server running. Start with processUnlabeledData.ctf first since it loads the drift-aware learning model into the Redis cache. The model is a warmed-up incremental drift-aware learner with incremental Naive Bayes classification model as the base learner. Create deployable archive processUnlabeledData.ctf and move it to MPS. Package processUnlabeledData.ctf into a deployable archive by running the below script in the supporting file createCTFUnlabeled.m. kafkaHost = "kafka.host.com"; kafkaPort = 9092;
inputKS = kafkaStream(kafkaHost,kafkaPort,"unlabeled","Rows",1200,"RequestTimeout", 60,'Timestamp outputKS = kafkaStream(kafkaHost,kafkaPort,"processedUnlabeled","Rows",1200,"RequestTimeout",60);
archive = streamingDataCompiler("processUnlabeledData",inputKS,outputKS,StateStore="LocalRedis",I
To learn more about these steps, see “Package Streaming Analytic Function” (MATLAB Production Server) and “Deploy Streaming Analytic Function to Server” (MATLAB Production Server). Start Kafka connector. Start Kafka connector at port 1234 for processUnlabeledData.ctf. Windows
powershell -executionPolicy bypass -File kafka-connector-start.ps1 -out out.log -err error.log -c
Linux chmod +x kafka-connector-start.sh
!./kafka-connector-start.sh -out out.log -err error.log -c collector.properties -k kafka.properti
Starting the Kafka connector initializes processUnlabeledData.ctf, that is, uploads the warm incremental drift-aware learner model and the processUnlabeledData deployable archive listens for data in the input topic unlabeled. Create deployable archive processLabeledData and move it to MPS. Repeat the above steps to create and deploy processLabeledData.ctf using the supporting file createCTFLabeled.m and start the Kafka connector at port 1235 for the processLabeledData.ctf. The processLabeledData archive begins to read the labeled data and updates the model parameters in the event of a drift in the data distribution. The Redis cache allows for the model to be shared between the two deployable archives.
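For reference, a sketch of what createCTFLabeled.m might contain, mirroring createCTFUnlabeled.m above; the chunk size and the remaining option values are assumptions:
kafkaHost = "kafka.host.com";
kafkaPort = 9092;
inputKS = kafkaStream(kafkaHost,kafkaPort,"labeled","Rows",100,"RequestTimeout",60);
outputKS = kafkaStream(kafkaHost,kafkaPort,"processedLabeled","Rows",100,"RequestTimeout",60);
archive = streamingDataCompiler("processLabeledData",inputKS,outputKS,StateStore="LocalRedis");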
Open the dashboard app hosted on Web App Server. The output topics processedUnlabeled and processedLabeled contain the processed data. Once both processUnlabeledData.ctf and processLabeledData.ctf are running, the processedUnlabeled topic contains the sensor readings along with the model predictions, whereas the processedLabeled topic contains the sensor readings along with the model drift diagnostic information. The dashboard2 app reads the processed data from the processedUnlabeled and processedLabeled topics. The app runs on the Web App Server; deploy the dashboard2 app by following the instructions in “Create Web App” (MATLAB Compiler).
Once deployed, the dashboard2 app is ready to start streaming the data uploaded to the Kafka server. You can initiate the streaming process by setting Streaming Options to On. View the Sensor Reading tab to inspect the sensor readings of the data received in the previous two minutes.
View the Model Performance tab to monitor the drift in the data.
See Also incrementalDriftAwareLearner | perObservationLoss
Related Examples •
“Incremental Learning Overview” on page 28-2
•
“Streaming Data Framework for MATLAB Production Server Basics” (MATLAB Production Server)
•
“Monitor Equipment State of Health Using Drift-Aware Learning” on page 28-67
29 Markov Models • “Markov Chains” on page 29-2 • “Hidden Markov Models (HMM)” on page 29-4
29
Markov Chains Markov processes are examples of stochastic processes—processes that generate random sequences of outcomes or states according to certain probabilities. Markov processes are distinguished by being memoryless—their next state depends only on their current state, not on the history that led them there. Models of Markov processes are used in a wide variety of applications, from daily stock prices to the positions of genes in a chromosome. A Markov model is given visual representation with a state diagram, such as the one below.
The rectangles in the diagram represent the possible states of the process you are trying to model, and the arrows represent transitions between states. The label on each arrow represents the probability of that transition. At each step of the process, the model may generate an output, or emission, depending on which state it is in, and then make a transition to another state. An important characteristic of Markov models is that the next state depends only on the current state, and not on the history of transitions that lead to the current state. For example, for a sequence of coin tosses the two states are heads and tails. The most recent coin toss determines the current state of the model and each subsequent toss determines the transition to the next state. If the coin is fair, the transition probabilities are all 1/2. The emission might simply be the current state. In more complicated models, random processes at each state will generate emissions. You could, for example, roll a die to determine the emission at any step. Markov chains are mathematical descriptions of Markov models with a discrete set of states. Markov chains are characterized by: • A set of states {1, 2, ..., M} • An M-by-M transition matrix T whose i,j entry is the probability of a transition from state i to state j. The sum of the entries in each row of T must be 1, because this is the sum of the probabilities of making a transition from a given state to each of the other states. • A set of possible outputs, or emissions, {s1, s2, ... , sN}. By default, the set of emissions is {1, 2, ... , N}, where N is the number of possible emissions, but you can choose a different set of numbers or symbols. • An M-by-N emission matrix E whose i,k entry gives the probability of emitting symbol sk given that the model is in state i. 29-2
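For instance, for the fair-coin example above, where the emission is simply the current state, you could write the two matrices directly:
T = [0.5 0.5; 0.5 0.5]; % fair coin: each transition has probability 1/2
E = [1 0; 0 1]; % each state always emits its own symbol (1 = heads, 2 = tails)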
Markov chains begin in an initial state i0 at step 0. The chain then transitions to state i1 with probability T(1,i1), and emits an output sk1 with probability E(i1,k1). Consequently, the probability of observing the sequence of states i1 i2 ... ir and the sequence of emissions sk1 sk2 ... skr in the first r steps is

T(1,i1) E(i1,k1) T(i1,i2) E(i2,k2) ··· T(ir−1,ir) E(ir,kr)
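As a small check of this formula, you can multiply the corresponding entries directly. The state and emission sequences below are arbitrary illustrations using the T and E written out above for the coin-toss chain:
states = [1 2 1]; % i1, i2, i3
emis = [1 2 1]; % k1, k2, k3 (with E above, each state emits itself)
p = T(1,states(1))*E(states(1),emis(1));
for r = 2:numel(states)
    p = p*T(states(r-1),states(r))*E(states(r),emis(r));
end
p % 0.5^3 = 0.125 for this chain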
See Also Related Examples •
“Hidden Markov Models (HMM)” on page 29-4
Hidden Markov Models (HMM) In this section... “Introduction to Hidden Markov Models (HMM)” on page 29-4 “Analyzing Hidden Markov Models” on page 29-5
Introduction to Hidden Markov Models (HMM) A hidden Markov model (HMM) is one in which you observe a sequence of emissions, but do not know the sequence of states the model went through to generate the emissions. Analyses of hidden Markov models seek to recover the sequence of states from the observed data. As an example, consider a Markov model with two states and six possible emissions. The model uses: • A red die, having six sides, labeled 1 through 6. • A green die, having twelve sides, five of which are labeled 2 through 6, while the remaining seven sides are labeled 1. • A weighted red coin, for which the probability of heads is .9 and the probability of tails is .1. • A weighted green coin, for which the probability of heads is .95 and the probability of tails is .05. The model creates a sequence of numbers from the set {1, 2, 3, 4, 5, 6} with the following rules: • Begin by rolling the red die and writing down the number that comes up, which is the emission. • Toss the red coin and do one of the following: • If the result is heads, roll the red die and write down the result. • If the result is tails, roll the green die and write down the result. • At each subsequent step, you flip the coin that has the same color as the die you rolled in the previous step. If the coin comes up heads, roll the same die as in the previous step. If the coin comes up tails, switch to the other die. The state diagram for this model has two states, as shown in the following figure.
You determine the emission from a state by rolling the die with the same color as the state. You determine the transition to the next state by flipping the coin with the same color as the state. The transition matrix is:

T = [ 0.90   0.10
      0.05   0.95 ]

The emissions matrix is:

E = [ 1/6    1/6    1/6    1/6    1/6    1/6
      7/12   1/12   1/12   1/12   1/12   1/12 ]
The model is not hidden because you know the sequence of states from the colors of the coins and dice. Suppose, however, that someone else is generating the emissions without showing you the dice or the coins. All you see is the sequence of emissions. If you start seeing more 1s than other numbers, you might suspect that the model is in the green state, but you cannot be sure because you cannot see the color of the die being rolled. Hidden Markov models raise the following questions: • Given a sequence of emissions, what is the most likely state path? • Given a sequence of emissions, how can you estimate transition and emission probabilities of the model? • What is the forward probability that the model generates a given sequence? • What is the posterior probability that the model is in a particular state at any point in the sequence?
Analyzing Hidden Markov Models • “Generating a Test Sequence” on page 29-6 • “Estimating the State Sequence” on page 29-6 • “Estimating Transition and Emission Matrices” on page 29-6 • “Estimating Posterior State Probabilities” on page 29-8 • “Changing the Initial State Distribution” on page 29-8 Statistics and Machine Learning Toolbox functions related to hidden Markov models are: • hmmgenerate — Generates a sequence of states and emissions from a Markov model • hmmestimate — Calculates maximum likelihood estimates of transition and emission probabilities from a sequence of emissions and a known sequence of states • hmmtrain — Calculates maximum likelihood estimates of transition and emission probabilities from a sequence of emissions • hmmviterbi — Calculates the most probable state path for a hidden Markov model • hmmdecode — Calculates the posterior state probabilities of a sequence of emissions This section shows how to use these functions to analyze hidden Markov models. 29-5
Generating a Test Sequence The following commands create the transition and emission matrices for the model described in the “Introduction to Hidden Markov Models (HMM)” on page 29-4: TRANS = [.9 .1; .05 .95]; EMIS = [1/6, 1/6, 1/6, 1/6, 1/6, 1/6;... 7/12, 1/12, 1/12, 1/12, 1/12, 1/12];
To generate a random sequence of states and emissions from the model, use hmmgenerate: [seq,states] = hmmgenerate(1000,TRANS,EMIS);
The output seq is the sequence of emissions and the output states is the sequence of states. hmmgenerate begins in state 1 at step 0, makes the transition to state i1 at step 1, and returns i1 as the first entry in states. To change the initial state, see “Changing the Initial State Distribution” on page 29-8. Estimating the State Sequence Given the transition and emission matrices TRANS and EMIS, the function hmmviterbi uses the Viterbi algorithm to compute the most likely sequence of states the model would go through to generate a given sequence seq of emissions: likelystates = hmmviterbi(seq, TRANS, EMIS);
likelystates is a sequence the same length as seq. To test the accuracy of hmmviterbi, compute the percentage of the actual sequence states that agrees with the sequence likelystates. sum(states==likelystates)/1000 ans = 0.8200
In this case, the most likely sequence of states agrees with the random sequence 82% of the time. Estimating Transition and Emission Matrices • “Using hmmestimate” on page 29-6 • “Using hmmtrain” on page 29-7 The functions hmmestimate and hmmtrain estimate the transition and emission matrices TRANS and EMIS given a sequence seq of emissions. Using hmmestimate
The function hmmestimate requires that you know the sequence of states states that the model went through to generate seq. The following takes the emission and state sequences and returns estimates of the transition and emission matrices: [TRANS_EST, EMIS_EST] = hmmestimate(seq, states)
TRANS_EST =
    0.8989    0.1011
    0.0585    0.9415

EMIS_EST =
    0.1721    0.1721    0.1749    0.1612    0.1803    0.1393
    0.5836    0.0741    0.0804    0.0789    0.0726    0.1104
You can compare the outputs with the original transition and emission matrices, TRANS and EMIS:

TRANS
TRANS =
    0.9000    0.1000
    0.0500    0.9500

EMIS
EMIS =
    0.1667    0.1667    0.1667    0.1667    0.1667    0.1667
    0.5833    0.0833    0.0833    0.0833    0.0833    0.0833
Using hmmtrain
If you do not know the sequence of states states, but you have initial guesses for TRANS and EMIS, you can still estimate TRANS and EMIS using hmmtrain. Suppose you have the following initial guesses for TRANS and EMIS. TRANS_GUESS = [.85 .15; .1 .9]; EMIS_GUESS = [.17 .16 .17 .16 .17 .17;.6 .08 .08 .08 .08 .08];
You estimate TRANS and EMIS as follows:

[TRANS_EST2, EMIS_EST2] = hmmtrain(seq, TRANS_GUESS, EMIS_GUESS)

TRANS_EST2 =
    0.2286    0.7714
    0.0032    0.9968

EMIS_EST2 =
    0.1436    0.2348    0.1837    0.1963    0.2350    0.0066
    0.4355    0.1089    0.1144    0.1082    0.1109    0.1220
hmmtrain uses an iterative algorithm that alters the matrices TRANS_GUESS and EMIS_GUESS so that at each step the adjusted matrices are more likely to generate the observed sequence, seq. The algorithm halts when the matrices in two successive iterations are within a small tolerance of each other. If the algorithm fails to reach this tolerance within a maximum number of iterations, whose default value is 100, the algorithm halts. In this case, hmmtrain returns the last values of TRANS_EST and EMIS_EST and issues a warning that the tolerance was not reached. If the algorithm fails to reach the desired tolerance, increase the default value of the maximum number of iterations with the command: hmmtrain(seq,TRANS_GUESS,EMIS_GUESS,'maxiterations',maxiter)
where maxiter is the maximum number of steps the algorithm executes.
Change the default value of the tolerance with the command: hmmtrain(seq, TRANS_GUESS, EMIS_GUESS, 'tolerance', tol)
where tol is the desired value of the tolerance. Increasing the value of tol makes the algorithm halt sooner, but the results are less accurate. Two factors reduce the reliability of the output matrices of hmmtrain: • The algorithm converges to a local maximum that does not represent the true transition and emission matrices. If you suspect this, use different initial guesses for the matrices TRANS_EST and EMIS_EST. • The sequence seq may be too short to properly train the matrices. If you suspect this, use a longer sequence for seq. Estimating Posterior State Probabilities The posterior state probabilities of an emission sequence seq are the conditional probabilities that the model is in a particular state when it generates a symbol in seq, given that seq is emitted. You compute the posterior state probabilities with hmmdecode: PSTATES = hmmdecode(seq,TRANS,EMIS)
The output PSTATES is an M-by-L matrix, where M is the number of states and L is the length of seq. PSTATES(i,j) is the conditional probability that the model is in state i when it generates the jth symbol of seq, given that seq is emitted. hmmdecode begins with the model in state 1 at step 0, prior to the first emission. PSTATES(i,1) is the probability that the model is in state i at the following step 1. To change the initial state, see “Changing the Initial State Distribution” on page 29-8. To return the logarithm of the probability of the sequence seq, use the second output argument of hmmdecode: [PSTATES,logpseq] = hmmdecode(seq,TRANS,EMIS)
The probability of a sequence tends to 0 as the length of the sequence increases, and the probability of a sufficiently long sequence becomes less than the smallest positive number your computer can represent. hmmdecode returns the logarithm of the probability to avoid this problem. Changing the Initial State Distribution By default, Statistics and Machine Learning Toolbox hidden Markov model functions begin in state 1. In other words, the distribution of initial states has all of its probability mass concentrated at state 1. To assign a different distribution of probabilities, p = [p1, p2, ..., pM], to the M initial states, do the following: 1
Create an (M+1)-by-(M+1) augmented transition matrix T̂ of the following form:

    T̂ = [ 0  p
          0  T ]

where T is the true transition matrix. The first column of T̂ contains M+1 zeros. p must sum to 1.
2 Create an (M+1)-by-N augmented emission matrix Ê that has the following form:

    Ê = [ 0
          E ]

where the first row is a 1-by-N row of zeros.
If the transition and emission matrices are TRANS and EMIS, respectively, you create the augmented matrices with the following commands: TRANS_HAT = [0 p; zeros(size(TRANS,1),1) TRANS]; EMIS_HAT = [zeros(1,size(EMIS,2)); EMIS];
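For example, to decode a sequence with a uniform initial state distribution for the two-state model above, you could use the augmented matrices as follows. This is a sketch; the first row of the returned probabilities corresponds to the artificial initial state and is dropped here:
p = [0.5 0.5]; % uniform initial distribution over the two states
TRANS_HAT = [0 p; zeros(size(TRANS,1),1) TRANS];
EMIS_HAT = [zeros(1,size(EMIS,2)); EMIS];
PSTATES_HAT = hmmdecode(seq,TRANS_HAT,EMIS_HAT);
PSTATES = PSTATES_HAT(2:end,:); % rows 2 through M+1 correspond to the original states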
See Also hmmdecode | hmmestimate | hmmgenerate | hmmtrain | hmmviterbi
More About •
“Markov Chains” on page 29-2
29-9
30 Design of Experiments • “Design of Experiments” on page 30-2 • “Full Factorial Designs” on page 30-3 • “Fractional Factorial Designs” on page 30-5 • “Response Surface Designs” on page 30-8 • “D-Optimal Designs” on page 30-12 • “Improve an Engine Cooling Fan Using Design for Six Sigma Techniques” on page 30-19
30
Design of Experiments
Design of Experiments Passive data collection leads to a number of problems in statistical modeling. Observed changes in a response variable may be correlated with, but not caused by, observed changes in individual factors (process variables). Simultaneous changes in multiple factors may produce interactions that are difficult to separate into individual effects. Observations may be dependent, while a model of the data considers them to be independent. Designed experiments address these problems. In a designed experiment, the data-producing process is actively manipulated to improve the quality of information and to eliminate redundant data. A common goal of all experimental designs is to collect data as parsimoniously as possible while providing sufficient information to accurately estimate model parameters. For example, a simple model of a response y in an experiment with two controlled factors x1 and x2 might look like this: y = β0 + β1x1 + β2x2 + β3x1x2 + ε Here ε includes both experimental error and the effects of any uncontrolled factors in the experiment. The terms β1x1 and β2x2 are main effects and the term β3x1x2 is a two-way interaction effect. A designed experiment would systematically manipulate x1 and x2 while measuring y, with the objective of accurately estimating β0, β1, β2, and β3.
30-2
Full Factorial Designs
Full Factorial Designs In this section... “Multilevel Designs” on page 30-3 “Two-Level Designs” on page 30-3
Multilevel Designs To systematically vary experimental factors, assign each factor a discrete set of levels. Full factorial designs measure response variables using every treatment (combination of the factor levels). A full factorial design for n factors with N1, ..., Nn levels requires N1 × ... × Nn experimental runs—one for each treatment. While advantageous for separating individual effects, full factorial designs can make large demands on data collection. As an example, suppose a machine shop has three machines and four operators. If the same operator always uses the same machine, it is impossible to determine if a machine or an operator is the cause of variation in production. By allowing every operator to use every machine, effects are separated. A full factorial list of treatments is generated by the function fullfact: dFF = fullfact([3,4]) dFF = 1 1 2 1 3 1 1 2 2 2 3 2 1 3 2 3 3 3 1 4 2 4 3 4
Each of the 3×4 = 12 rows of dFF represent one machine/operator combination.
Two-Level Designs Many experiments can be conducted with two-level factors, using two-level designs. For example, suppose the machine shop in the previous example always keeps the same operator on the same machine, but wants to measure production effects that depend on the composition of the day and night shifts. The function ff2n generates a full factorial list of treatments: dFF2 = ff2n(4) dFF2 = 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1
0 0 1 1 0 0 1 1
0 1 0 1 0 1 0 1
30-3
30
Design of Experiments
1 1 1 1 1 1 1 1
0 0 0 0 1 1 1 1
0 0 1 1 0 0 1 1
0 1 0 1 0 1 0 1
Each of the 24 = 16 rows of dFF2 represent one schedule of operators for the day (0) and night (1) shifts.
30-4
Fractional Factorial Designs
Fractional Factorial Designs In this section... “Introduction to Fractional Factorial Designs” on page 30-5 “Plackett-Burman Designs” on page 30-5 “General Fractional Designs” on page 30-5
Introduction to Fractional Factorial Designs Two-level designs are sufficient for evaluating many production processes. Factor levels of ±1 can indicate categorical factors, normalized factor extremes, or simply “up” and “down” from current factor settings. Experimenters evaluating process changes are interested primarily in the factor directions that lead to process improvement. For experiments with many factors, two-level full factorial designs can lead to large amounts of data. For example, a two-level full factorial design with 10 factors requires 210 = 1024 runs. Often, however, individual factors or their interactions have no distinguishable effects on a response. This is especially true of higher order interactions. As a result, a well-designed experiment can use fewer runs for estimating model parameters. Fractional factorial designs use a fraction of the runs required by full factorial designs. A subset of experimental treatments is selected based on an evaluation (or assumption) of which factors and interactions have the most significant effects. Once this selection is made, the experimental design must separate these effects. In particular, significant effects should not be confounded, that is, the measurement of one should not depend on the measurement of another.
Plackett-Burman Designs Plackett-Burman designs are used when only main effects are considered significant. Two-level Plackett-Burman designs require a number of experimental runs that are a multiple of 4 rather than a power of 2. The function hadamard generates these designs: dPB = hadamard(8) dPB = 1 1 1 1 -1 1 1 1 -1 1 -1 -1 1 1 1 1 -1 1 1 1 -1 1 -1 -1
1 -1 -1 1 1 -1 -1 1
1 1 1 1 -1 -1 -1 -1
1 -1 1 -1 -1 1 -1 1
1 1 -1 -1 -1 -1 1 1
1 -1 -1 1 -1 1 1 -1
Binary factor levels are indicated by ±1. The design is for eight runs (the rows of dPB) manipulating seven two-level factors (the last seven columns of dPB). The number of runs is a fraction 8/27 = 0.0625 of the runs required by a full factorial design. Economy is achieved at the expense of confounding main effects with any two-way interactions.
General Fractional Designs At the cost of a larger fractional design, you can specify which interactions you wish to consider significant. A design of resolution R is one in which no n-factor interaction is confounded with any 30-5
30
Design of Experiments
other effect containing less than R – n factors. Thus, a resolution III design does not confound main effects with one another but may confound them with two-way interactions (as in “Plackett-Burman Designs” on page 30-5), while a resolution IV design does not confound either main effects or twoway interactions but may confound two-way interactions with each other. Specify general fractional factorial designs using a full factorial design for a selected subset of basic factors and generators for the remaining factors. Generators are products of the basic factors, giving the levels for the remaining factors. Use the function fracfact to generate these designs: dfF = fracfact('a b c d bcd dfF = -1 -1 -1 -1 -1 -1 -1 1 -1 -1 1 -1 -1 -1 1 1 -1 1 -1 -1 -1 1 -1 1 -1 1 1 -1 -1 1 1 1 1 -1 -1 -1 1 -1 -1 1 1 -1 1 -1 1 -1 1 1 1 1 -1 -1 1 1 -1 1 1 1 1 -1 1 1 1 1
acd') -1 1 1 -1 1 -1 -1 1 -1 1 1 -1 1 -1 -1 1
-1 1 1 -1 -1 1 1 -1 1 -1 -1 1 1 -1 -1 1
This is a six-factor design in which four two-level basic factors (a, b, c, and d in the first four columns of dfF) are measured in every combination of levels, while the two remaining factors (in the last three columns of dfF) are measured only at levels defined by the generators bcd and acd, respectively. Levels in the generated columns are products of corresponding levels in the columns that make up the generator. The challenge of creating a fractional factorial design is to choose basic factors and generators so that the design achieves a specified resolution in a specified number of runs. Use the function fracfactgen to find appropriate generators: generators = fracfactgen('a b c d e f',4,4) generators = 'a' 'b' 'c' 'd' 'bcd' 'acd'
These are generators for a six-factor design with factors a through f, using 24 = 16 runs to achieve resolution IV. The fracfactgen function uses an efficient search algorithm to find generators that meet the requirements. An optional output from fracfact displays the confounding pattern of the design: [dfF,confounding] = fracfact(generators); confounding confounding = 'Term' 'Generator' 'Confounding'
30-6
Fractional Factorial Designs
'X1' 'X2' 'X3' 'X4' 'X5' 'X6' 'X1*X2' 'X1*X3' 'X1*X4' 'X1*X5' 'X1*X6' 'X2*X3' 'X2*X4' 'X2*X5' 'X2*X6' 'X3*X4' 'X3*X5' 'X3*X6' 'X4*X5' 'X4*X6' 'X5*X6'
'a' 'b' 'c' 'd' 'bcd' 'acd' 'ab' 'ac' 'ad' 'abcd' 'cd' 'bc' 'bd' 'cd' 'abcd' 'cd' 'bd' 'ad' 'bc' 'ac' 'ab'
'X1' 'X2' 'X3' 'X4' 'X5' 'X6' 'X1*X2 'X1*X3 'X1*X4 'X1*X5 'X1*X6 'X2*X3 'X2*X4 'X1*X6 'X1*X5 'X1*X6 'X2*X4 'X1*X4 'X2*X3 'X1*X3 'X1*X2
+ + + + + + + + + + + + + + +
X5*X6' X4*X6' X3*X6' X2*X6' X2*X5 + X3*X4' X4*X5' X3*X5' X2*X5 + X3*X4' X2*X6' X2*X5 + X3*X4' X3*X5' X3*X6' X4*X5' X4*X6' X5*X6'
The confounding pattern shows that main effects are effectively separated by the design, but two-way interactions are confounded with various other two-way interactions.
30-7
30
Design of Experiments
Response Surface Designs In this section... “Introduction to Response Surface Designs” on page 30-8 “Central Composite Designs” on page 30-8 “Box-Behnken Designs” on page 30-10
Introduction to Response Surface Designs Quadratic response surfaces are simple models that provide a maximum or minimum without making additional assumptions about the form of the response. Quadratic models can be calibrated using full factorial designs with three or more levels for each factor, but these designs generally require more runs than necessary to accurately estimate model parameters. This section discusses designs for calibrating quadratic models that are much more efficient, using three or five levels for each factor, but not using all combinations of levels.
Central Composite Designs Central composite designs (CCDs), also known as Box-Wilson designs, are appropriate for calibrating full quadratic models. There are three types of CCDs—circumscribed, inscribed, and faced—pictured below:
30-8
Response Surface Designs
Each design consists of a factorial design (the corners of a cube) together with center and star points that allow for estimation of second-order effects. For a full quadratic model with n factors, CCDs have enough design points to estimate the (n+2)(n+1)/2 coefficients. The type of CCD used (the position of the factorial and star points) is determined by the number of factors and by the desired properties of the design. The following table summarizes some important properties. A design is rotatable if the prediction variance depends only on the distance of the design point from the center of the design.
30-9
30
Design of Experiments
Design
Rotatable
Factor Levels
Uses Points Outside ±1
Accuracy of Estimates
Circumscribed (CCC)
Yes
5
Yes
Good over entire design space
Inscribed (CCI)
Yes
5
No
Good over central subset of design space
Faced (CCF)
No
3
No
Fair over entire design space; poor for pure quadratic coefficients
Generate CCDs with the function ccdesign: dCC = ccdesign(3,'type','circumscribed') dCC = -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 1.0000 -1.0000 1.0000 -1.0000 -1.0000 1.0000 1.0000 1.0000 -1.0000 -1.0000 1.0000 -1.0000 1.0000 1.0000 1.0000 -1.0000 1.0000 1.0000 1.0000 -1.6818 0 0 1.6818 0 0 0 -1.6818 0 0 1.6818 0 0 0 -1.6818 0 0 1.6818 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
The repeated center point runs allow for a more uniform estimate of the prediction variance over the entire design space.
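As an illustration (not part of the original text), once responses are measured at these design points, you can calibrate the full quadratic model with fitlm; the response values below are hypothetical:
y = 10*rand(size(dCC,1),1); % hypothetical responses at the design points in dCC
mdl = fitlm(dCC,y,"quadratic") % fits the full quadratic response surface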
Box-Behnken Designs Like the designs described in “Central Composite Designs” on page 30-8, Box-Behnken designs are used to calibrate full quadratic models. Box-Behnken designs are rotatable and, for a small number of factors (four or less), require fewer runs than CCDs. By avoiding the corners of the design space, they allow experimenters to work around extreme factor combinations. Like an inscribed CCD, however, extremes are then poorly estimated. The geometry of a Box-Behnken design is pictured in the following figure.
30-10
Response Surface Designs
Design points are at the midpoints of edges of the design space and at the center, and do not contain an embedded factorial design. Generate Box-Behnken designs with the function bbdesign: dBB = bbdesign(3) dBB = -1 -1 0 -1 1 0 1 -1 0 1 1 0 -1 0 -1 -1 0 1 1 0 -1 1 0 1 0 -1 -1 0 -1 1 0 1 -1 0 1 1 0 0 0 0 0 0 0 0 0
Again, the repeated center point runs allow for a more uniform estimate of the prediction variance over the entire design space.
30-11
30
Design of Experiments
D-Optimal Designs In this section... “Introduction to D-Optimal Designs” on page 30-12 “Generate D-Optimal Designs” on page 30-13 “Augment D-Optimal Designs” on page 30-14 “Specify Fixed Covariate Factors” on page 30-15 “Specify Categorical Factors” on page 30-16 “Specify Candidate Sets” on page 30-16
Introduction to D-Optimal Designs Traditional experimental designs (“Full Factorial Designs” on page 30-3, “Fractional Factorial Designs” on page 30-5, and “Response Surface Designs” on page 30-8) are appropriate for calibrating linear models in experimental settings where factors are relatively unconstrained in the region of interest. In some cases, however, models are necessarily nonlinear. In other cases, certain treatments (combinations of factor levels) may be expensive or infeasible to measure. D-optimal designs are model-specific designs that address these limitations of traditional designs. A D-optimal design is generated by an iterative search algorithm and seeks to minimize the covariance of the parameter estimates for a specified model. This is equivalent to maximizing the determinant D = |XTX|, where X is the design matrix of model terms (the columns) evaluated at specific treatments in the design space (the rows). Unlike traditional designs, D-optimal designs do not require orthogonal design matrices, and as a result, parameter estimates may be correlated. Parameter estimates may also be locally, but not globally, D-optimal. There are several Statistics and Machine Learning Toolbox functions for generating D-optimal designs: Function
Description
candexch
Uses a row-exchange algorithm to generate a D-optimal design with a specified number of runs for a specified model and a specified candidate set. This is the second component of the algorithm used by rowexch.
candgen
Generates a candidate set for a specified model. This is the first component of the algorithm used by rowexch.
cordexch
Uses a coordinate-exchange algorithm to generate a D-optimal design with a specified number of runs for a specified model.
daugment
Uses a coordinate-exchange algorithm to augment an existing D-optimal design with additional runs to estimate additional model terms.
dcovary
Uses a coordinate-exchange algorithm to generate a D-optimal design with fixed covariate factors.
rowexch
Uses a row-exchange algorithm to generate a D-optimal design with a specified number of runs for a specified model. The algorithm calls candgen and then candexch. (Call candexch separately to specify a candidate set.)
The following sections explain how to use these functions to generate D-optimal designs. 30-12
D-Optimal Designs
Note The function rsmdemo generates simulated data for experimental settings specified by either the user or by a D-optimal design generated by cordexch. It uses the rstool interface to visualize response surface models fit to the data, and it uses the nlintool interface to visualize a nonlinear model fit to the data.
Generate D-Optimal Designs Two Statistics and Machine Learning Toolbox algorithms generate D-optimal designs: • The cordexch function uses a coordinate-exchange algorithm • The rowexch function uses a row-exchange algorithm Both cordexch and rowexch use iterative search algorithms. They operate by incrementally changing an initial design matrix X to increase D = |XTX| at each step. In both algorithms, there is randomness built into the selection of the initial design and into the choice of the incremental changes. As a result, both algorithms may return locally, but not globally, D-optimal designs. Run each algorithm multiple times and select the best result for your final design. Both functions have a 'tries' parameter that automates this repetition and comparison. At each step, the row-exchange algorithm exchanges an entire row of X with a row from a design matrix C evaluated at a candidate set of feasible treatments. The rowexch function automatically generates a C appropriate for a specified model, operating in two steps by calling the candgen and candexch functions in sequence. Provide your own C by calling candexch directly. In either case, if C is large, its static presence in memory can affect computation. The coordinate-exchange algorithm, by contrast, does not use a candidate set. (Or rather, the candidate set is the entire design space.) At each step, the coordinate-exchange algorithm exchanges a single element of X with a new element evaluated at a neighboring point in design space. The absence of a candidate set reduces demands on memory, but the smaller scale of the search means that the coordinate-exchange algorithm is more likely to become trapped in a local minimum than the row-exchange algorithm. For example, suppose you want a design to estimate the parameters in the following three-factor, seven-term interaction model: y = β0 + β1x 1+ β2x 2+ β3x 3+ β12x1x 2+ β13x1x 3+ β23x2x 3+ ε Use cordexch to generate a D-optimal design with seven runs: nfactors = 3; nruns = 7; [dCE,X] = cordexch(nfactors,nruns,'interaction','tries',10) dCE = -1 1 1 -1 -1 -1 1 1 1 -1 1 -1 1 -1 1 1 -1 -1 -1 -1 1 X = 1 -1 1 1 -1 -1 1 1 -1 -1 -1 1 1 1 1 1 1 1 1 1 1
30-13
30
Design of Experiments
1 1 1 1
-1 1 1 -1
1 -1 -1 -1
-1 1 -1 1
-1 -1 -1 1
1 1 -1 -1
-1 -1 1 -1
Columns of the design matrix X are the model terms evaluated at each row of the design dCE. The terms appear in order from left to right: 1
Constant term
2
Linear terms (1, 2, 3)
3
Interaction terms (12, 13, 23)
Use X in a linear regression model fit to response data measured at the design points in dCE. Use rowexch in a similar fashion to generate an equivalent design: [dRE,X] = rowexch(nfactors,nruns,'interaction','tries',10) dRE = -1 -1 1 1 -1 1 1 -1 -1 1 1 1 -1 -1 -1 -1 1 -1 -1 1 1 X = 1 -1 -1 1 1 -1 -1 1 1 -1 1 -1 1 -1 1 1 -1 -1 -1 -1 1 1 1 1 1 1 1 1 1 -1 -1 -1 1 1 1 1 -1 1 -1 -1 1 -1 1 -1 1 1 -1 -1 1
Augment D-Optimal Designs In practice, you may want to add runs to a completed experiment to learn more about a process and estimate additional model coefficients. The daugment function uses a coordinate-exchange algorithm to augment an existing D-optimal design. For example, the following eight-run design is adequate for estimating main effects in a four-factor model: dCEmain = cordexch(4,8) dCEmain = 1 -1 -1 1 -1 -1 1 1 -1 1 -1 1 1 1 1 -1 1 1 1 1 -1 1 -1 -1 1 -1 -1 -1 -1 -1 1 -1
To estimate the six interaction terms in the model, augment the design with eight additional runs: 30-14
D-Optimal Designs
dCEinteraction = daugment(dCEmain,8,'interaction') dCEinteraction = 1 -1 -1 1 -1 -1 1 1 -1 1 -1 1 1 1 1 -1 1 1 1 1 -1 1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 1 1 1 -1 -1 -1 -1 1 -1 1 -1 1 1 -1 1 -1 1 1 -1 1 1 -1 -1 1 -1 1 1 1 1 1 -1
The augmented design is full factorial, with the original eight runs in the first eight rows. The 'start' parameter of the candexch function provides the same functionality as daugment, but uses a row exchange algorithm rather than a coordinate-exchange algorithm.
Specify Fixed Covariate Factors In many experimental settings, certain factors and their covariates are constrained to a fixed set of levels or combinations of levels. These cannot be varied when searching for an optimal design. The dcovary function allows you to specify fixed covariate factors in the coordinate exchange algorithm. For example, suppose you want a design to estimate the parameters in a three-factor linear additive model, with eight runs that necessarily occur at different times. If the process experiences temporal linear drift, you may want to include the run time as a variable in the model. Produce the design as follows: time = linspace(-1,1,8)'; [dCV,X] = dcovary(3,time,'linear') dCV = -1.0000 1.0000 1.0000 -1.0000 1.0000 -1.0000 -1.0000 -0.7143 -1.0000 -1.0000 -1.0000 -0.4286 1.0000 -1.0000 1.0000 -0.1429 1.0000 1.0000 -1.0000 0.1429 -1.0000 1.0000 -1.0000 0.4286 1.0000 1.0000 1.0000 0.7143 -1.0000 -1.0000 1.0000 1.0000 X = 1.0000 -1.0000 1.0000 1.0000 1.0000 1.0000 -1.0000 -1.0000 1.0000 -1.0000 -1.0000 -1.0000 1.0000 1.0000 -1.0000 1.0000 1.0000 1.0000 1.0000 -1.0000 1.0000 -1.0000 1.0000 -1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 -1.0000 -1.0000 1.0000
-1.0000 -0.7143 -0.4286 -0.1429 0.1429 0.4286 0.7143 1.0000
30-15
30
Design of Experiments
The column vector time is a fixed factor, normalized to values between ±1. The number of rows in the fixed factor specifies the number of runs in the design. The resulting design dCV gives factor settings for the three controlled model factors at each time.
Specify Categorical Factors Categorical factors take values in a discrete set of levels. Both cordexch and rowexch have a 'categorical' parameter that allows you to specify the indices of categorical factors and a 'levels' parameter that allows you to specify a number of levels for each factor. For example, the following eight-run design is for a linear additive model with five factors in which the final factor is categorical with three levels: dCEcat = cordexch(5,8,'linear','categorical',5,'levels',3) dCEcat = -1 -1 1 1 2 -1 -1 -1 -1 3 1 1 1 1 3 1 1 -1 -1 2 1 -1 -1 1 3 -1 1 -1 1 1 -1 1 1 -1 3 1 -1 1 -1 1
Specify Candidate Sets The row-exchange algorithm exchanges rows of an initial design matrix X with rows from a design matrix C evaluated at a candidate set of feasible treatments. The rowexch function automatically generates a C appropriate for a specified model, operating in two steps by calling the candgen and candexch functions in sequence. Provide your own C by calling candexch directly. For example, the following uses rowexch to generate a five-run design for a two-factor pure quadratic model using a candidate set that is produced internally: dRE1 = rowexch(2,5,'purequadratic','tries',10) dRE1 = -1 1 0 0 1 -1 1 0 1 1
The same thing can be done using candgen and candexch in sequence: [dC,C] = candgen(2,'purequadratic') % Candidate set, C dC = -1 -1 0 -1 1 -1 -1 0 0 0 1 0 -1 1 0 1 1 1
30-16
D-Optimal Designs
C = 1 -1 -1 1 1 1 0 -1 0 1 1 1 -1 1 1 1 -1 0 1 0 1 0 0 0 0 1 1 0 1 0 1 -1 1 1 1 1 0 1 0 1 1 1 1 1 1 treatments = candexch(C,5,'tries',10) % D-opt subset treatments = 2 1 7 3 4 dRE2 = dC(treatments,:) % Display design dRE2 = 0 -1 -1 -1 -1 1 1 -1 -1 0
You can replace C in this example with a design matrix evaluated at your own candidate set. For example, suppose your experiment is constrained so that the two factors cannot have extreme settings simultaneously. The following produces a restricted candidate set: constraint = sum(abs(dC),2) < 2; % Feasible treatments my_dC = dC(constraint,:) my_dC = 0 -1 -1 0 0 0 1 0 0 1
Use the x2fx function to convert the candidate set to a design matrix: my_C = x2fx(my_dC,'purequadratic') my_C = 1 0 -1 0 1 1 -1 0 1 0 1 0 0 0 0 1 1 0 1 0 1 0 1 0 1
Find the required design in the same manner: my_treatments = candexch(my_C,5,'tries',10) % D-opt subset my_treatments = 2 4 5 1 3 my_dRE = my_dC(my_treatments,:) % Display design
30-17
30
Design of Experiments
my_dRE = -1 1 0 0 0
30-18
0 0 1 -1 0
Improve an Engine Cooling Fan Using Design for Six Sigma Techniques
Improve an Engine Cooling Fan Using Design for Six Sigma Techniques This example shows how to improve the performance of an engine cooling fan through a Design for Six Sigma approach using Define, Measure, Analyze, Improve, and Control (DMAIC). The initial fan does not circulate enough air through the radiator to keep the engine cool during difficult conditions. First the example shows how to design an experiment to investigate the effect of three performance factors: fan distance from the radiator, blade-tip clearance, and blade pitch angle. It then shows how to estimate optimum values for each factor, resulting in a design that produces airflows beyond the goal of 875 ft3 per minute using test data. Finally it shows how to use simulations to verify that the new design produces airflow according to the specifications in more than 99.999% of the fans manufactured. Define the Problem This example addresses an engine cooling fan design that is unable to pull enough air through the radiator to keep the engine cool during difficult conditions, such as stop-and-go traffic or hot weather. Suppose you estimate that you need airflow of at least 875 ft3/min to keep the engine cool during difficult conditions. You need to evaluate the current design and develop an alternative design that can achieve the target airflow. Assess Cooling Fan Performance Load the sample data, which is available when you run this example. load("OriginalFan.mat")
The data consists of 10,000 measurements (historical production data) of the existing cooling fan performance. Plot the data to analyze the current fan's performance. plot(originalfan) xlabel("Observation") ylabel("Max Airflow (ft^3/min)") title("Historical Production Data")
The data is centered around 842 ft3/min and most values fall within the range of about 8 ft3/min. The plot does not tell much about the underlying distribution of data, however. Plot the histogram and fit a normal distribution to the data. figure histfit(originalfan) % Plot histogram with normal distribution fit format shortg xlabel("Airflow (ft^3/min)") ylabel("Frequency (counts)") title("Airflow Histogram")
pd = fitdist(originalfan,"normal") % Fit normal distribution to data pd = NormalDistribution Normal distribution mu = 841.652 [841.616, 841.689] sigma = 1.8768 [1.85114, 1.90318]
fitdist fits a normal distribution to data and estimates the parameters from data. The estimate for the mean airflow speed is 841.652 ft3/min, and the 95% confidence interval for the mean airflow speed is (841.616, 841.689). This estimate makes it clear that the current fan is not close to the required 875 ft3/min. You need to improve the fan design to achieve the target airflow.

Determine Factors That Affect Fan Performance
Evaluate the factors that affect cooling fan performance using design of experiments (DOE). The response is the cooling fan airflow rate (ft3/min). Suppose that the factors that you can modify and control are:
• Distance from radiator
• Pitch angle
• Blade tip clearance
In general, fluid systems have nonlinear behavior. Therefore, use a response surface design to estimate any nonlinear interactions among the factors. Generate the experimental runs for a Box-Behnken design in coded (normalized) variables [-1, 0, +1]; see “Box-Behnken Designs” on page 30-10.
CodedValue = bbdesign(3)

CodedValue = 15×3
    -1    -1     0
    -1     1     0
     1    -1     0
     1     1     0
    -1     0    -1
    -1     0     1
     1     0    -1
     1     0     1
     0    -1    -1
     0    -1     1
      ⋮
The first column is for the distance from radiator, the second column is for the pitch angle, and the third column is for the blade tip clearance. Suppose you want to test the effects of the variables at the following minimum and maximum values.
Distance from radiator: 1 to 1.5 inches
Pitch angle: 15 to 35 degrees
Blade tip clearance: 1 to 2 inches
Randomize the order of the runs, convert the coded design values to real-world units, and perform the experiment in the order specified.

runorder = randperm(15);       % Random permutation of the runs
bounds = [1 1.5;15 35;1 2];    % Min and max values for each factor
RealValue = zeros(size(CodedValue));
for i = 1:size(CodedValue,2) % Convert coded values to real-world units
    zmax = max(CodedValue(:,i));
    zmin = min(CodedValue(:,i));
    RealValue(:,i) = interp1([zmin zmax],bounds(i,:),CodedValue(:,i));
end
Suppose that at the end of the experiments, you collect the following response values in the variable TestResult. TestResult = [837 864 829 856 880 879 872 874 834 833 860 859 874 876 875]';
Save the design values and the response in a table. Expmt = table(runorder', CodedValue(:,1), CodedValue(:,2), CodedValue(:,3), ... TestResult,'VariableNames',{'RunNumber','D','P','C','Airflow'});
Display the design values and the response.

disp(Expmt)

    RunNumber    D     P     C     Airflow
    _________    __    __    __    _______
        6        -1    -1     0      837
        3        -1     1     0      864
       11         1    -1     0      829
        7         1     1     0      856
       14        -1     0    -1      880
        8        -1     0     1      879
        5         1     0    -1      872
       15         1     0     1      874
        1         0    -1    -1      834
        2         0    -1     1      833
        4         0     1    -1      860
       13         0     1     1      859
        9         0     0     0      874
       10         0     0     0      876
       12         0     0     0      875
D stands for Distance, P stands for Pitch, and C stands for Clearance. Based on the experimental test results, the airflow rate is sensitive to the changing factor values. Also, four experimental runs meet or exceed the target airflow rate of 875 ft3/min (runs 2, 4, 12, and 14). However, it is not clear which, if any, of these runs is the optimal one. In addition, it is not obvious how robust the design is to variation in the factors. Create a model based on the current experimental data and use the model to estimate the optimal factor settings.

Improve the Cooling Fan Performance
The Box-Behnken design enables you to test for nonlinear (quadratic) effects. The form of the quadratic model is:

AF = β0 + β1*Distance + β2*Pitch + β3*Clearance + β4*Distance*Pitch + β5*Distance*Clearance + β6*Pitch*Clearance + β7*Distance^2 + β8*Pitch^2 + β9*Clearance^2,

where AF is the airflow rate and βi is the coefficient for term i. Estimate the coefficients of this model using the fitlm function.

mdl = fitlm(Expmt,"Airflow~D*P*C-D:P:C+D^2+P^2+C^2");
Display the magnitudes of the coefficients (for normalized values) in a bar chart. figure h = bar(mdl.Coefficients.Estimate(2:10)); set(h,"facecolor",[0.8 0.8 0.9]) legend("Coefficient") set(gcf,"units","normalized","position",[0.05 0.4 0.35 0.4]) set(gca,"xticklabel",mdl.CoefficientNames(2:10)) ylabel("Airflow (ft^3/min)") xlabel("Normalized Coefficient") title("Quadratic Model Coefficients")
The bar chart shows that Pitch and Pitch^2 are dominant factors. You can look at the relationship between multiple input variables and one output variable by generating a response surface plot. Use plotSlice to generate response surface plots for the model mdl interactively.

plotSlice(mdl)
The plot shows the nonlinear relationship of airflow with pitch. Move the blue dashed lines around and see the effect the different factors have on airflow. Although you can use plotSlice to determine the optimum factor settings, you can also use Optimization Toolbox™ to automate the task.
Optimize Factor Settings Find the optimal factor settings using the “Problem-Based Optimization Workflow” (Optimization Toolbox). First, define an optimization problem for maximization. prob = optimproblem("ObjectiveSense","max");
Write the objective function using the x2fx function to convert the predictor matrix to a design matrix. Multiply the result by the model coefficient estimates. fun = @(x) x2fx(x,"quadratic")*mdl.Coefficients.Estimate;
Create an optimization variable named factors that is bounded between –1 and 1, and has three components, which represent the three factors. factors = optimvar("factors",1,3,LowerBound=-1,UpperBound=1);
Convert the objective function to an optimization expression in factors by using the fcn2optimexpr (Optimization Toolbox) function. objective = fcn2optimexpr(fun,factors);
Place the objective function expression into the problem prob. prob.Objective = objective;
Set the initial point to be the center of the design of the experimental test matrix, meaning the vector [0 0 0]. For the problem-based approach, the initial point must be a structure with the variable name as the name field. x0 = struct("factors",[0 0 0]);
Find the optimal design. [sol,fval] = solve(prob,x0); Solving problem using fmincon. Feasible point with lower objective function value found. Local minimum found that satisfies the constraints. Optimization completed because the objective function is non-decreasing in feasible directions, to within the value of the optimality tolerance, and constraints are satisfied to within the value of the constraint tolerance.
Convert the results to real-world units. maxloc = (sol.factors + 1)'; maxloc = bounds(:,1)+maxloc .* ((bounds(:,2) - bounds(:,1))/2); fprintf("Optimal Values:\n" + ... "Distance Pitch Clearance Airflow\n" + ... " %g %g %g %g\n",maxloc',fval);
Optimal Values:
Distance      Pitch    Clearance    Airflow
       1    27.2747            1    882.257
The optimization result suggests placing the new fan one inch from the radiator, with pitch angle 27.3, and with a one-inch clearance between the tips of the fan blades and the shroud. Because pitch angle has such a significant effect on airflow, perform additional analysis to verify that a 27.3 degree pitch angle is optimal. load("AirflowData.mat") tbl = table(pitch,airflow); mdl2 = fitlm(tbl,"airflow~pitch^2"); mdl2.Rsquared.Ordinary ans = 0.99632
The results show that a quadratic model explains the effect of pitch on the airflow well. Plot the pitch angle against airflow and impose the fitted model. figure plot(pitch,airflow,".r") hold on ylim([840 885]) line(pitch,mdl2.Fitted,"color","b") title("Fitted Model and Data") xlabel("Pitch angle (degrees)") ylabel("Airflow (ft^3/min)") legend("Test data","Quadratic model","Location","se") hold off
Find the pitch value that corresponds to the maximum airflow. pitch(find(airflow==max(airflow))) ans = 27
The additional analysis confirms that a 27.3 degree pitch angle is optimal. The improved cooling fan design meets the airflow requirements. You also have a model that approximates the fan performance well based on the factors you can modify in the design. Ensure that the fan performance is robust to variability in manufacturing and installation by performing a sensitivity analysis.

Sensitivity Analysis
Suppose that, based on historical experience, the manufacturing uncertainty is as follows.

table(["Distance from radiator";"Blade pitch angle";"Blade tip clearance"],...
    ["1.00 +/- 0.05 inch";"27.3 +/- 0.25 degrees";"1.00 +/- 0.125 inch"],...
    ["1.00 +/- 0.20 inch";"0.227 +/- 0.028 degrees";"-1.00 +/- 0.25 inch"],...
    'VariableNames',{'Factor' 'Real Values' 'Coded Values'})

ans=3×3 table
             Factor                   Real Values                  Coded Values
    ________________________    _______________________    _________________________

    "Distance from radiator"    "1.00 +/- 0.05 inch"       "1.00 +/- 0.20 inch"
    "Blade pitch angle"         "27.3 +/- 0.25 degrees"    "0.227 +/- 0.028 degrees"
    "Blade tip clearance"       "1.00 +/- 0.125 inch"      "-1.00 +/- 0.25 inch"
Verify that these variations in factors will allow you to maintain a robust design around the target airflow. The philosophy of Six Sigma targets a defect rate of no more than 3.4 per 1,000,000 fans. That is, the fans must hit the 875 ft3/min target 99.999% of the time. You can verify the design using Monte Carlo simulation. Generate 10,000 random numbers for three factors with the specified tolerance. Include a noise variable that is proportional to the noise in the fitted model, mdl (that is, the RMS error of the model). Because the model coefficients are in coded variables, you must generate dist, pitch, and clearance using the coded definition.

dist = random("normal",sol.factors(1),0.20,[10000 1]);
pitch = random("normal",sol.factors(2),0.028,[10000 1]);
clearance = random("normal",sol.factors(3),0.25,[10000 1]);
noise = random("normal",0,mdl2.RMSE,[10000 1]);
Calculate airflow for 10,000 random factor combinations using the model. simfactor = [dist pitch clearance]; X = x2fx(simfactor,"quadratic");
Add noise to the model (the variation in the data that the model did not account for). simflow = X*mdl.Coefficients.Estimate+noise;
Evaluate the variation in the model's predicted airflow using a histogram. To estimate the mean and standard deviation, fit a normal distribution to data. pd = fitdist(simflow,"normal"); figure histfit(simflow) hold on text(pd.mu+2,300,["Mean: " num2str(round(pd.mu))]) text(pd.mu+2,280,["Standard deviation: " num2str(round(pd.sigma))]) hold off xlabel("Airflow (ft^3/min)") ylabel("Frequency") title("Monte Carlo Simulation Results") hold off
The results look promising. The average airflow is 882 ft3/min and appears to be better than 875 ft3/min for most of the data. Determine the probability that the airflow is at 875 ft3/min or below.

format long
pfail = cdf(pd,875)
pfail = 1.454076873131737e-07
pass = (1-pfail)*100
pass = 99.999985459231269
The design appears to achieve at least 875 ft3/min of airflow 99.999% of the time. Use the simulation results to estimate the process capability.

S = capability(simflow,[875.0 890])

S = struct with fields:
       mu: 8.822983078353724e+02
    sigma: 1.422865134786495
        P: 0.999999823569868
       Pl: 1.454076873131737e-07
       Pu: 3.102244433021162e-08
       Cp: 1.757018243598422
      Cpl: 1.709768001886212
      Cpu: 1.804268485310632
      Cpk: 1.709768001886212
pass = (1-S.Pl)*100 pass = 99.999985459231269
The Cp value is 1.75. A process is considered high quality when Cp is greater than or equal to 1.6. The Cpk is similar to the Cp value, which indicates that the process is centered. Now implement this design. Monitor the process to verify the design and to ensure that the cooling fan delivers high-quality performance.

Control Manufacturing of the Improved Cooling Fan
You can monitor and evaluate the manufacturing and installation process of the new fan using control charts. Evaluate the first 30 days of production of the new cooling fan. Initially, five cooling fans per day were produced. First, load the sample data from the new process.

load("spcdata.mat")
Plot the X-bar and S charts. figure controlchart(spcflow,"chart",{'xbar','s'}) % Reshape the data into daily sets xlabel("Day")
According to the results, the manufacturing process is in statistical control, as indicated by the absence of violations of control limits or nonrandom patterns in the data over time. You can also run a capability analysis on the data to evaluate the process. [row,col] = size(spcflow); S2 = capability(reshape(spcflow,row*col,1),[875.0 890]) S2 = struct with fields: mu: 8.821061141685465e+02 sigma: 1.423887508874697 P: 0.999999684316149 Pl: 3.008932155898586e-07 Pu: 1.479063578225176e-08 Cp: 1.755756676295137 Cpl: 1.663547652525458 Cpu: 1.847965700064817 Cpk: 1.663547652525458 pass = (1-S.Pl)*100 pass = 99.999985459231269
The Cp value of 1.755 is very similar to the value of 1.76 estimated by the Monte Carlo simulation. The Cpk value of 1.66 is smaller than the Cp value. However, only a Cpk value less than 1.33, which indicates that the process shifted
significantly toward one of the process limits, is a concern. The process is well within the limits and it achieves the target airflow (875 ft3/min) more than 99.999% of the time.
See Also bbdesign | fitlm | x2fx | solve | controlchart | capability
Related Examples
• “Box-Behnken Designs” on page 30-10
• “Problem-Based Optimization Workflow” (Optimization Toolbox)
31  Statistical Process Control
• “Control Charts” on page 31-2
• “Capability Studies” on page 31-4
Control Charts A control chart displays measurements of process samples over time. The measurements are plotted together with user-defined specification limits and process-defined control limits. The process can then be compared with its specifications—to see if it is in control or out of control. The chart is just a monitoring tool. Control activity might occur if the chart indicates an undesirable, systematic change in the process. The control chart is used to discover the variation, so that the process can be adjusted to reduce it. Control charts are created with the controlchart function. Any of the following chart types may be specified: • Xbar or mean • Standard deviation • Range • Exponentially weighted moving average • Individual observation • Moving range of individual observations • Moving average of individual observations • Proportion defective • Number of defectives • Defects per unit • Count of defects Control rules are specified with the controlrules function. The following example illustrates how to use Western Electric rules to mark out of control measurements on an Xbar chart. First load the sample data. load parts
Construct the Xbar control chart using the Western Electric 2 rule (2 of 3 points at least 2 standard errors above the center line) to mark the out of control measurements. st = controlchart(runout,'rules','we2');
For a better understanding of the Western Electric 2 rule, calculate and plot the 2 standard errors line on the chart. x = st.mean; cl = st.mu; se = st.sigma./sqrt(st.n); hold on plot(cl+2*se,'m')
Identify the measurements that violate the control rule.

R = controlrules('we2',x,cl,se);
I = find(R)

I = 6×1
    21
    23
    24
    25
    26
    27
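The other chart types listed earlier in this section follow the same calling pattern. As a minimal sketch (not part of the original example), the following plots an exponentially weighted moving average chart of the same runout data by passing the chart type name 'ewma' to the 'charttype' argument.

controlchart(runout,'charttype','ewma')  % EWMA chart of the runout data
xlabel('Sample number')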
See Also controlchart | controlrules
Related Examples
• “Capability Studies” on page 31-4
Capability Studies
Before going into production, many manufacturers run a capability study to determine if their process will run within specifications enough of the time. Capability indices produced by such a study are used to estimate expected percentages of defective parts. Capability studies are conducted with the capability function. The following capability indices are produced:
• mu — Sample mean
• sigma — Sample standard deviation
• P — Estimated probability of being within the lower (L) and upper (U) specification limits
• Pl — Estimated probability of being below L
• Pu — Estimated probability of being above U
• Cp — (U-L)/(6*sigma)
• Cpl — (mu-L)./(3.*sigma)
• Cpu — (U-mu)./(3.*sigma)
• Cpk — min(Cpl,Cpu)
As an example, simulate a sample from a process with a mean of 3 and a standard deviation of 0.005:

rng default; % For reproducibility
data = normrnd(3,0.005,100,1);
Compute capability indices if the process has an upper specification limit of 3.01 and a lower specification limit of 2.99:

S = capability(data,[2.99 3.01])

S = struct with fields:
       mu: 3.0006
    sigma: 0.0058
        P: 0.9129
       Pl: 0.0339
       Pu: 0.0532
       Cp: 0.5735
      Cpl: 0.6088
      Cpu: 0.5382
      Cpk: 0.5382
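As a quick check (a sketch, assuming S is the structure computed above), you can reproduce Cp and Cpk directly from the formulas listed at the start of this section:

L = 2.99; U = 3.01;                                              % Specification limits
Cp_check  = (U - L)/(6*S.sigma)                                  % Should match S.Cp
Cpk_check = min((S.mu - L)/(3*S.sigma),(U - S.mu)/(3*S.sigma))   % Should match S.Cpk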
Visualize the specification and process widths: capaplot(data,[2.99 3.01]); grid on
See Also capability
Related Examples
• “Control Charts” on page 31-2
32  Tall Arrays
• “Logistic Regression with Tall Arrays” on page 32-2
• “Bayesian Optimization with Tall Arrays” on page 32-9
• “Statistics and Machine Learning with Big Data Using Tall Arrays” on page 32-23
Logistic Regression with Tall Arrays This example shows how to use logistic regression and other techniques to perform data analysis on tall arrays. Tall arrays represent data that is too large to fit into computer memory. Define Execution Environment When you perform calculations on tall arrays, MATLAB® uses either a parallel pool (default if you have Parallel Computing Toolbox™) or the local MATLAB session. To run the example using the local MATLAB session when you have Parallel Computing Toolbox, change the global execution environment by using the mapreducer function. mapreducer(0)
Get Data into MATLAB
Create a datastore that references the folder location with the data. The data can be contained in a single file, a collection of files, or an entire folder. Treat 'NA' values as missing data so that datastore replaces them with NaN values. Select a subset of the variables to work with, and include the name of the airline (UniqueCarrier) as a categorical variable. Create a tall table on top of the datastore.

ds = datastore('airlinesmall.csv');
ds.TreatAsMissing = 'NA';
ds.SelectedVariableNames = {'DayOfWeek','UniqueCarrier',...
    'ArrDelay','DepDelay','Distance'};
ds.SelectedFormats{2} = '%C';
tt = tall(ds);
tt.DayOfWeek = categorical(tt.DayOfWeek,1:7,...
    {'Sun','Mon','Tues','Wed','Thu','Fri','Sat'},'Ordinal',true)

tt =

  Mx5 tall table

    DayOfWeek    UniqueCarrier    ArrDelay    DepDelay    Distance
    _________    _____________    ________    ________    ________
        ?              ?             ?           ?           ?
        ?              ?             ?           ?           ?
        ?              ?             ?           ?           ?
        :              :             :           :           :
        :              :             :           :           :
Late Flights
Determine the flights that are late by 20 minutes or more by defining a logical variable that is true for a late flight. Add this variable to the tall table of data, noting that it is not yet evaluated. A preview of this variable includes the first few rows.

tt.LateFlight = tt.ArrDelay>=20

tt =

  Mx6 tall table

    DayOfWeek    UniqueCarrier    ArrDelay    DepDelay    Distance    LateFlight
    _________    _____________    ________    ________    ________    __________
        ?              ?             ?           ?           ?            ?
        ?              ?             ?           ?           ?            ?
        ?              ?             ?           ?           ?            ?
        :              :             :           :           :            :
        :              :             :           :           :            :
Calculate the mean of LateFlight to determine the overall proportion of late flights. Use gather to trigger evaluation of the tall array and bring the result into memory. m = mean(tt.LateFlight) m = tall double ? m = gather(m) Evaluating tall expression using the Local MATLAB Session: - Pass 1 of 2: Completed in 1.2 sec - Pass 2 of 2: Completed in 1.2 sec Evaluation completed in 3.2 sec m = 0.1580
Late Flights by Carrier
Examine whether certain types of flights tend to be late. First, check to see if certain carriers are more likely to have late flights.

tt.LateFlight = double(tt.LateFlight);
late_by_carrier = gather(grpstats(tt,'UniqueCarrier','mean','DataVar','LateFlight'))

Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 2.9 sec
Evaluation completed in 3.7 sec

late_by_carrier=29×4 table
    GroupLabel    UniqueCarrier    GroupCount    mean_LateFlight
    __________    _____________    __________    _______________

    {'9E'    }        9E               521           0.13436
    {'AA'    }        AA             14930           0.16236
    {'AQ'    }        AQ               154          0.051948
    {'AS'    }        AS              2910           0.16014
    {'B6'    }        B6               806           0.23821
    {'CO'    }        CO              8138           0.16319
    {'DH'    }        DH               696           0.17672
    {'DL'    }        DL             16578           0.15261
    {'EA'    }        EA               920           0.15217
    {'EV'    }        EV              1699           0.21248
    {'F9'    }        F9               335           0.18209
    {'FL'    }        FL              1263           0.19952
    {'HA'    }        HA               273          0.047619
    {'HP'    }        HP              3660           0.13907
    {'ML (1)'}        ML (1)            69          0.043478
    {'MQ'    }        MQ              3962           0.18778
      ⋮
Carriers B6 and EV have higher proportions of late flights. Carriers AQ, ML(1), and HA have relatively few flights, but lower proportions of them are late.

Late Flights by Day of Week
Next, check to see if different days of the week tend to have later flights.

late_by_day = gather(grpstats(tt,'DayOfWeek','mean','DataVar','LateFlight'))

Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 1.3 sec
Evaluation completed in 1.5 sec

late_by_day=7×4 table
    GroupLabel    DayOfWeek    GroupCount    mean_LateFlight
    __________    _________    __________    _______________

    {'Fri' }        Fri          15839           0.12899
    {'Mon' }        Mon          18077           0.14234
    {'Sat' }        Sat          16958           0.15603
    {'Sun' }        Sun          18019           0.15117
    {'Thu' }        Thu          18227           0.18418
    {'Tues'}        Tues         18163           0.15526
    {'Wed' }        Wed          18240           0.18399
Wednesdays and Thursdays have the highest proportion of late flights, while Fridays have the lowest proportion. Late Flights by Distance Check to see if longer or shorter flights tend to be late. First, look at the density of the flight distance for flights that are late, and compare that with flights that are on time. ksdensity(tt.Distance(tt.LateFlight==1)) Evaluating tall expression using the Local MATLAB Session: - Pass 1 of 2: Completed in 0.93 sec - Pass 2 of 2: Completed in 0.91 sec Evaluation completed in 2.3 sec hold on ksdensity(tt.Distance(tt.LateFlight==0)) Evaluating tall expression using the Local MATLAB Session: - Pass 1 of 2: Completed in 0.74 sec - Pass 2 of 2: Completed in 0.96 sec Evaluation completed in 1.9 sec hold off legend('Late','On time')
Flight distance does not make a dramatic difference in whether a flight is early or late. However, the density appears to be slightly higher for on-time flights at distances of about 400 miles. The density is also higher for late flights at distances of about 2000 miles. Calculate some simple descriptive statistics for the late and on-time flights.

late_by_distance = gather(grpstats(tt,'LateFlight',{'mean' 'std'},'DataVar','Distance'))

Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 1.1 sec
Evaluation completed in 1.4 sec

late_by_distance=2×5 table
    GroupLabel    LateFlight    GroupCount    mean_Distance    std_Distance
    __________    __________    __________    _____________    ____________

      {'0'}           0          1.04e+05         693.14          544.75
      {'1'}           1             19519         750.24          574.12
Late flights are about 60 miles longer on average, although this value makes up only a small portion of the standard deviation of the distance values. Logistic Regression Model Build a model for the probability of a late flight, using both continuous variables (such as Distance) and categorical variables (such as DayOfWeek) to predict the probabilities. This model can help to determine if the previous results observed for each predictor individually also hold true when you consider them together.
glm = fitglm(tt,'LateFlight~Distance+DayOfWeek','Distribution','binomial')

Iteration [1]:   0% completed
Iteration [1]: 100% completed
Iteration [2]:   0% completed
Iteration [2]: 100% completed
Iteration [3]:   0% completed
Iteration [3]: 100% completed
Iteration [4]:   0% completed
Iteration [4]: 100% completed
Iteration [5]:   0% completed
Iteration [5]: 100% completed
glm =
Compact generalized linear regression model:
    logit(LateFlight) ~ 1 + DayOfWeek + Distance
    Distribution = Binomial

Estimated Coefficients:
                        Estimate         SE          tStat       pValue
                       __________    __________    _______    __________

    (Intercept)            -1.855      0.023052    -80.469             0
    DayOfWeek_Mon       -0.072603      0.029798    -2.4365       0.01483
    DayOfWeek_Tues       0.026909      0.029239    0.92029       0.35742
    DayOfWeek_Wed          0.2359      0.028276      8.343    7.2452e-17
    DayOfWeek_Thu         0.23569      0.028282     8.3338    7.8286e-17
    DayOfWeek_Fri        -0.19285      0.031583     -6.106    1.0213e-09
    DayOfWeek_Sat        0.033542      0.029702     1.1293       0.25879
    Distance           0.00018373    1.3507e-05     13.602    3.8741e-42
123319 observations, 123311 error degrees of freedom Dispersion: 1 Chi^2-statistic vs. constant model: 504, p-value = 8.74e-105
The model confirms that the previously observed conclusions hold true here as well: • The Wednesday and Thursday coefficients are positive, indicating a higher probability of a late flight on those days. The Friday coefficient is negative, indicating a lower probability. • The Distance coefficient is positive, indicating that longer flights have a higher probability of being late. All of these coefficients have very small p-values. This is common with data sets that have many observations, since one can reliably estimate small effects with large amounts of data. In fact, the uncertainty in the model is larger than the uncertainty in the estimates for the parameters in the model. Prediction with Model Predict the probability of a late flight for each day of the week, and for distances ranging from 0 to 3000 miles. Create a table to hold the predictor values by indexing the first 100 rows in the original table tt. x = gather(tt(1:100,{'Distance' 'DayOfWeek'}));
Evaluating tall expression using the Local MATLAB Session: - Pass 1 of 1: Completed in 0.25 sec Evaluation completed in 0.47 sec x.Distance = linspace(0,3000)'; x.DayOfWeek(:) = 'Sun'; plot(x.Distance,predict(glm,x)); days = {'Sun' 'Mon' 'Tues' 'Wed' 'Thu' 'Fri' 'Sat'}; hold on for j=2:length(days) x.DayOfWeek(:) = days{j}; plot(x.Distance,predict(glm,x)); end legend(days)
According to this model, a Wednesday or Thursday flight of 500 miles has the same probability of being late, about 18%, as a Friday flight of about 3000 miles. Since these probabilities are all much less than 50%, the model is unlikely to predict that any given flight will be late using this information. Investigate the model more by focusing on the flights for which the model predicts a probability of 20% or more of being late, and compare that to the actual results. C = gather(crosstab(tt.LateFlight,predict(glm,tt)>.20))
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 1.3 sec
Evaluation completed in 1.4 sec

C = 2×2

       99613        4391
       18394        1125
Among the flights predicted to have a 20% or higher probability of being late, about 20% were late (1125/(1125 + 4391)). Among the remainder, less than 16% were late (18394/(18394 + 99613)).
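You can reproduce these percentages from the crosstab result. This is a quick check, assuming C is the 2-by-2 matrix computed above, with rows corresponding to the actual LateFlight values (0 and 1) and columns corresponding to predicted probabilities of at most 0.20 and above 0.20.

lateRateHighRisk = C(2,2)/(C(1,2) + C(2,2))   % about 0.20
lateRateLowRisk  = C(2,1)/(C(1,1) + C(2,1))   % about 0.16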
Bayesian Optimization with Tall Arrays This example shows how to use Bayesian optimization to select optimal parameters for training a kernel classifier by using the 'OptimizeHyperparameters' name-value argument. The sample data set airlinesmall.csv is a large data set that contains a tabular file of airline flight data. This example creates a tall table containing the data, and extracts class labels and predictor data from the tall table to run the optimization procedure. When you perform calculations on tall arrays, MATLAB® uses either a parallel pool (default if you have Parallel Computing Toolbox™) or the local MATLAB session. If you want to run the example using the local MATLAB session when you have Parallel Computing Toolbox, you can change the global execution environment by using the mapreducer function. Get Data into MATLAB® Create a datastore that references the folder location with the data. The data can be contained in a single file, a collection of files, or an entire folder. For folders that contain a collection of files, you can specify the entire folder location, or use the wildcard character, '*.csv', to include multiple files with the same file extension in the datastore. Select a subset of the variables to work with, and treat 'NA' values as missing data so that datastore replaces them with NaN values. Create a tall table that contains the data in the datastore. ds = datastore('airlinesmall.csv'); ds.SelectedVariableNames = {'Month','DayofMonth','DayOfWeek',... 'DepTime','ArrDelay','Distance','DepDelay'}; ds.TreatAsMissing = 'NA'; tt = tall(ds) % Tall table Starting parallel pool (parpool) using the 'Processes' profile ... Connected to parallel pool with 6 workers. tt = M×7 tall table Month _____ 10 10 10 10 10 10 10 10 : :
DayofMonth __________
DayOfWeek _________
21 26 23 23 22 28 8 10 : :
3 1 5 5 4 3 4 6 : :
DepTime _______ 642 1021 2055 1332 629 1446 928 859 : :
ArrDelay ________ 8 8 21 13 4 59 3 11 : :
Distance ________ 308 296 480 296 373 308 447 954 : :
DepDelay ________ 12 1 20 12 -1 63 -2 -1 : :
Prepare Class Labels and Predictor Data Determine the flights that are late by 10 minutes or more by defining a logical variable that is true for a late flight. This variable contains the class labels. A preview of this variable includes the first few rows. 32-9
32
Tall Arrays
Y = tt.DepDelay > 10 % Class labels Y = M×1 tall logical array 1 0 1 1 0 1 0 0 : :
Create a tall array for the predictor data. X = tt{:,1:end-1} % Predictor data X = M×6 tall double matrix 10 10 10 10 10 10 10 10 : :
21 26 23 23 22 28 8 10 : :
3 1 5 5 4 3 4 6 : :
642 1021 2055 1332 629 1446 928 859 : :
8 8 21 13 4 59 3 11 : :
308 296 480 296 373 308 447 954 : :
Remove rows in X and Y that contain missing data. R = rmmissing([X Y]); % Data with missing entries removed X = R(:,1:end-1); Y = R(:,end);
Perform Bayesian Optimization Using OptimizeHyperparameters Optimize hyperparameters using the 'OptimizeHyperparameters' name-value argument. Standardize the predictor variables. Z = zscore(X);
Find the optimal values for the 'KernelScale' and 'Lambda' name-value arguments that minimize the loss on the holdout validation set. By default, the software selects and reserves 20% of the data as validation data, and trains the model using the rest of the data. You can change the holdout fraction by using the 'HyperparameterOptimizationOptions' name-value argument. For reproducibility, use the 'expected-improvement-plus' acquisition function and set the seeds of the random number generators using rng and tallrng. The results can vary depending on the number of workers and the execution environment for the tall arrays. For details, see “Control Where Your Code Runs”. 32-10
Bayesian Optimization with Tall Arrays
rng('default') tallrng('default') Mdl = fitckernel(Z,Y,'Verbose',0,'OptimizeHyperparameters', ... {'KernelScale','Lambda'},'HyperparameterOptimizationOptions', ... struct('AcquisitionFunctionName','expected-improvement-plus'))
Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 2: Completed in 6.4 sec - Pass 2 of 2: Completed in 1.9 sec Evaluation completed in 9.8 sec Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 1.8 sec Evaluation completed in 1.9 sec |================================================================================================ | Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | KernelScale | La | | result | | runtime | (observed) | (estim.) | | |================================================================================================ | 1 | Best | 0.19672 | 83.403 | 0.19672 | 0.19672 | 1.2297 | 0.008 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.94 sec Evaluation completed in 1.1 sec | 2 | Accept | 0.19672 | 35.532 | 0.19672 | 0.19672 | 0.039643 | 2.5756 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.97 sec Evaluation completed in 1.1 sec | 3 | Accept | 0.19672 | 35.184 | 0.19672 | 0.19672 | 0.02562 | 1.2555 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.83 sec Evaluation completed in 0.93 sec | 4 | Accept | 0.19672 | 34.375 | 0.19672 | 0.19672 | 92.644 | 1.2056 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.86 sec Evaluation completed in 0.96 sec | 5 | Best | 0.11469 | 55.874 | 0.11469 | 0.12698 | 11.173 | 0.0002 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.86 sec Evaluation completed in 0.97 sec | 6 | Best | 0.11365 | 50.865 | 0.11365 | 0.11373 | 10.609 | 0.0002 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.92 sec Evaluation completed in 1 sec | 7 | Accept | 0.19672 | 34.5 | 0.11365 | 0.11373 | 0.0059498 | 0.0004 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.86 sec Evaluation completed in 0.95 sec | 8 | Accept | 0.12122 | 56.119 | 0.11365 | 0.11371 | 11.44 | 0.0004 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.96 sec Evaluation completed in 1.1 sec | 9 | Best | 0.10417 | 28.379 | 0.10417 | 0.10417 | 8.0424 | 6.7998 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.82 sec Evaluation completed in 0.92 sec | 10 | Accept | 0.10433 | 27.657 | 0.10417 | 0.10417 | 9.6694 | 1.4948 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.9 sec Evaluation completed in 0.99 sec | 11 | Best | 0.10409 | 27.999 | 0.10409 | 0.10411 | 6.2099 | 6.1093
32-11
32
Tall Arrays
Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.84 sec Evaluation completed in 0.93 sec | 12 | Best | 0.10383 | 29.942 | 0.10383 | 0.10404 | 5.6767 | 7.6134 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.84 sec Evaluation completed in 0.93 sec | 13 | Accept | 0.10408 | 30.076 | 0.10383 | 0.10365 | 8.1769 | 8.5993 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.87 sec Evaluation completed in 0.96 sec | 14 | Accept | 0.10404 | 28.263 | 0.10383 | 0.10361 | 7.6191 | 6.4079 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.88 sec Evaluation completed in 0.98 sec | 15 | Best | 0.10351 | 28.352 | 0.10351 | 0.10362 | 4.2987 | 9.2645 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.86 sec Evaluation completed in 0.95 sec | 16 | Accept | 0.10404 | 29.934 | 0.10351 | 0.10362 | 4.8747 | 1.7838 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.92 sec Evaluation completed in 1 sec | 17 | Accept | 0.10657 | 57.566 | 0.10351 | 0.10357 | 4.8239 | 0.0001 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.85 sec Evaluation completed in 0.94 sec | 18 | Best | 0.10299 | 28.692 | 0.10299 | 0.10358 | 3.5555 | 2.7165 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.89 sec Evaluation completed in 0.99 sec | 19 | Accept | 0.10366 | 28.388 | 0.10299 | 0.10324 | 3.8035 | 1.3542 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.86 sec Evaluation completed in 0.95 sec | 20 | Accept | 0.10337 | 28.643 | 0.10299 | 0.10323 | 3.806 | 1.8101 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.88 sec Evaluation completed in 0.97 sec |================================================================================================ | Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | KernelScale | La | | result | | runtime | (observed) | (estim.) | | |================================================================================================ | 21 | Accept | 0.10345 | 28.972 | 0.10299 | 0.10322 | 3.3655 | 9.082 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.86 sec Evaluation completed in 0.95 sec | 22 | Accept | 0.19672 | 36.972 | 0.10299 | 0.10322 | 999.62 | 1.2609 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.86 sec Evaluation completed in 0.96 sec | 23 | Accept | 0.10315 | 28.377 | 0.10299 | 0.10306 | 3.6716 | 1.2445 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.92 sec Evaluation completed in 1 sec | 24 | Accept | 0.19672 | 34.383 | 0.10299 | 0.10306 | 0.0010004 | 2.6214 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.92 sec
32-12
Bayesian Optimization with Tall Arrays
Evaluation completed in 1 sec | 25 | Accept | 0.19672 | Evaluating tall expression using the - Pass 1 of 1: Completed in 0.84 sec Evaluation completed in 0.95 sec | 26 | Accept | 0.19672 | Evaluating tall expression using the - Pass 1 of 1: Completed in 0.93 sec Evaluation completed in 1 sec | 27 | Accept | 0.19672 | Evaluating tall expression using the - Pass 1 of 1: Completed in 1.2 sec Evaluation completed in 1.4 sec | 28 | Accept | 0.19672 | Evaluating tall expression using the - Pass 1 of 1: Completed in 0.88 sec Evaluation completed in 0.97 sec | 29 | Accept | 0.10354 | Evaluating tall expression using the - Pass 1 of 1: Completed in 0.84 sec Evaluation completed in 0.92 sec | 30 | Accept | 0.10405 |
33.973 | 0.10299 | 0.10306 | Parallel Pool 'Processes':
0.21865 |
0.002
36.951 | 0.10299 | 0.10306 | Parallel Pool 'Processes':
299.92 |
0.003
34.099 | 0.10299 | 0.10306 | Parallel Pool 'Processes':
0.002436 |
0.004
36.694 | 0.10299 | 0.10305 | Parallel Pool 'Processes':
0.50559 |
3.3667
30.402 | 0.10299 | 0.10313 | Parallel Pool 'Processes':
3.7754 |
9.5626
27.686 |
8.9864 |
2.3136
0.10299 |
0.10315 |
__________________________________________________________ Optimization completed. MaxObjectiveEvaluations of 30 reached. Total function evaluations: 30 Total elapsed time: 1102.4884 seconds
32-13
32
Tall Arrays
32-14
Bayesian Optimization with Tall Arrays
Total objective function evaluation time: 1088.2513

Best observed feasible point:
    KernelScale      Lambda
    ___________    __________

      3.5555       2.7165e-06

Observed objective function value = 0.10299
Estimated objective function value = 0.10332
Function evaluation time = 28.6921

Best estimated feasible point (according to models):
    KernelScale      Lambda
    ___________    __________

      3.6716       1.2445e-08
Estimated objective function value = 0.10315
Estimated function evaluation time = 29.1903

Mdl =
  ClassificationKernel
            PredictorNames: {'x1'  'x2'  'x3'  'x4'  'x5'  'x6'}
              ResponseName: 'Y'
                ClassNames: [0 1]
                   Learner: 'svm'
    NumExpansionDimensions: 256
               KernelScale: 3.6716
                    Lambda: 1.2445e-08
             BoxConstraint: 665.9442

  Properties, Methods
Perform Bayesian Optimization by Using bayesopt Alternatively, you can use the bayesopt function to find the optimal values of hyperparameters. Split the data set into training and test sets. Specify a 1/3 holdout sample for the test set. rng('default') % For reproducibility tallrng('default') % For reproducibility Partition = cvpartition(Y,'Holdout',1/3); trainingInds = training(Partition); % Indices for the training set testInds = test(Partition); % Indices for the test set
Extract training and testing data and standardize the predictor data. Ytrain = Y(trainingInds); % Training class labels Xtrain = X(trainingInds,:); [Ztrain,mu,stddev] = zscore(Xtrain); % Standardized training data Ytest = Y(testInds); % Testing class labels Xtest = X(testInds,:); Ztest = (Xtest-mu)./stddev; % Standardized test data
Define the variables sigma and lambda to find the optimal values for the 'KernelScale' and 'Lambda' name-value arguments. Use optimizableVariable and specify a wide range for the variables because optimal values are unknown. Apply logarithmic transformation to the variables to search for the optimal values on a log scale. N = gather(numel(Ytrain)); % Evaluate the length of the tall training array in memory Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.85 sec Evaluation completed in 0.99 sec sigma = optimizableVariable('sigma',[1e-3,1e3],'Transform','log'); lambda = optimizableVariable('lambda',[(1e-3)/N, (1e3)/N],'Transform','log');
Create the objective function for Bayesian optimization. The objective function takes in a table that contains the variables sigma and lambda, and then computes the classification loss value for the binary Gaussian kernel classification model trained using the fitckernel function. Set 'Verbose',0 within fitckernel to suppress the iterative display of diagnostic information. 32-16
Bayesian Optimization with Tall Arrays
minfn = @(z)gather(loss(fitckernel(Ztrain,Ytrain, ... 'KernelScale',z.sigma,'Lambda',z.lambda,'Verbose',0), ... Ztest,Ytest));
Optimize the parameters [sigma,lambda] of the kernel classification model with respect to the classification loss by using bayesopt. By default, bayesopt displays iterative information about the optimization at the command line. For reproducibility, set the AcquisitionFunctionName option to 'expected-improvement-plus'. The default acquisition function depends on run time and, therefore, can give varying results. results = bayesopt(minfn,[sigma,lambda], ... 'AcquisitionFunctionName','expected-improvement-plus')
Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 1.1 sec Evaluation completed in 1.2 sec |================================================================================================ | Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | sigma | la | | result | | runtime | (observed) | (estim.) | | |================================================================================================ | 1 | Best | 0.19651 | 55.774 | 0.19651 | 0.19651 | 1.2297 | 0.01 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 1.1 sec Evaluation completed in 1.2 sec | 2 | Accept | 0.19651 | 77.257 | 0.19651 | 0.19651 | 0.039643 | 3.8633 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 1.1 sec Evaluation completed in 1.2 sec | 3 | Accept | 0.19651 | 54.632 | 0.19651 | 0.19651 | 0.02562 | 1.8832 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.99 sec Evaluation completed in 1.1 sec | 4 | Accept | 0.19651 | 30.84 | 0.19651 | 0.19651 | 92.644 | 1.8084 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 1 sec Evaluation completed in 1.1 sec | 5 | Accept | 0.19651 | 31.653 | 0.19651 | 0.19651 | 978.95 | 0.0001 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 1.1 sec Evaluation completed in 1.2 sec | 6 | Accept | 0.19651 | 62.207 | 0.19651 | 0.19651 | 0.0089609 | 0.005 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 1.1 sec Evaluation completed in 1.2 sec | 7 | Accept | 0.19651 | 68.693 | 0.19651 | 0.19651 | 0.0010015 | 1.4474 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 1.1 sec Evaluation completed in 1.2 sec | 8 | Accept | 0.19651 | 53.736 | 0.19651 | 0.19651 | 0.27475 | 0.004 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 1.1 sec Evaluation completed in 1.2 sec | 9 | Accept | 0.19651 | 58.473 | 0.19651 | 0.19651 | 0.81326 | 1.0753 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 1.1 sec Evaluation completed in 1.2 sec | 10 | Accept | 0.19651 | 68.878 | 0.19651 | 0.19651 | 0.0040507 | 0.0001 Evaluating tall expression using the Parallel Pool 'Processes':
32-17
32
Tall Arrays
- Pass 1 of 1: Completed in 0.99 sec Evaluation completed in 1.1 sec | 11 | Accept | 0.19651 | 31.761 | 0.19651 | 0.19651 | 980.38 | 1.362 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 0.98 sec Evaluation completed in 1.1 sec | 12 | Accept | 0.19651 | 31.332 | 0.19651 | 0.19651 | 968.03 | 0.01 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 1.2 sec Evaluation completed in 1.3 sec | 13 | Accept | 0.19651 | 60.755 | 0.19651 | 0.19651 | 0.41617 | 1.6704 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 1 sec Evaluation completed in 1.1 sec | 14 | Best | 0.10059 | 25.589 | 0.10059 | 0.1006 | 2.9545 | 2.4479 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 1.1 sec Evaluation completed in 1.3 sec | 15 | Accept | 0.10098 | 24.933 | 0.10059 | 0.1006 | 5.3367 | 2.7906 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 2.1 sec Evaluation completed in 2.3 sec | 16 | Accept | 0.10101 | 30.013 | 0.10059 | 0.1006 | 4.233 | 1.4951 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 1.1 sec Evaluation completed in 1.2 sec | 17 | Best | 0.10049 | 28.908 | 0.10049 | 0.10013 | 4.0225 | 2.3847 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 1.1 sec Evaluation completed in 1.1 sec | 18 | Accept | 0.10076 | 26.625 | 0.10049 | 0.10032 | 3.7144 | 1.9977 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 1.2 sec Evaluation completed in 1.3 sec | 19 | Accept | 0.10061 | 29.056 | 0.10049 | 0.10025 | 3.5125 | 4.2084 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 1.1 sec Evaluation completed in 1.2 sec | 20 | Accept | 0.10056 | 26.932 | 0.10049 | 0.10029 | 3.7269 | 2.7754 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 1 sec Evaluation completed in 1.1 sec |================================================================================================ | Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | sigma | la | | result | | runtime | (observed) | (estim.) | | |================================================================================================ | 21 | Accept | 0.10089 | 25.84 | 0.10049 | 0.10044 | 3.8681 | 2.9799 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 1 sec Evaluation completed in 1.1 sec | 22 | Accept | 0.10101 | 25.461 | 0.10049 | 0.10052 | 6.1914 | 8.6976 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 1.1 sec Evaluation completed in 1.2 sec | 23 | Accept | 0.10161 | 27.882 | 0.10049 | 0.10053 | 5.1566 | 5.2959 Evaluating tall expression using the Parallel Pool 'Processes': - Pass 1 of 1: Completed in 1 sec Evaluation completed in 1.1 sec
32-18
Bayesian Optimization with Tall Arrays
| 24 | Accept | 0.1028 | Evaluating tall expression using the - Pass 1 of 1: Completed in 0.99 sec Evaluation completed in 1.1 sec | 25 | Accept | 0.11158 | Evaluating tall expression using the - Pass 1 of 1: Completed in 0.95 sec Evaluation completed in 1 sec | 26 | Best | 0.10014 | Evaluating tall expression using the - Pass 1 of 1: Completed in 1.1 sec Evaluation completed in 1.2 sec | 27 | Accept | 0.19651 | Evaluating tall expression using the - Pass 1 of 1: Completed in 1.1 sec Evaluation completed in 1.2 sec | 28 | Accept | 0.10103 | Evaluating tall expression using the - Pass 1 of 1: Completed in 1 sec Evaluation completed in 1.1 sec | 29 | Accept | 0.19651 | Evaluating tall expression using the - Pass 1 of 1: Completed in 1.1 sec Evaluation completed in 1.2 sec | 30 | Accept | 0.19651 |
61.428 | 0.10049 | 0.10053 | Parallel Pool 'Processes':
3.8952 |
0.0001
50.54 | 0.10049 | 0.10053 | Parallel Pool 'Processes':
12.25 |
0.0001
25.922 | 0.10014 | 0.10042 | Parallel Pool 'Processes':
3.5501 |
2.292
70.888 | 0.10014 | 0.10042 | Parallel Pool 'Processes':
0.0010185 |
1.3606
24.309 | 0.10014 | 0.10053 | Parallel Pool 'Processes':
3.4712 |
2.1357
32.212 | 0.10014 | 0.10053 | Parallel Pool 'Processes':
980.28 |
1.8241
0.0010035 |
0.01
66.958 |
0.10014 |
0.10053 |
__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 1361.76 seconds
Total objective function evaluation time: 1289.4879

Best observed feasible point:
    sigma      lambda
    ______    _________

    3.5501    2.292e-05

Observed objective function value = 0.10014
Estimated objective function value = 0.10053
Function evaluation time = 25.9216

Best estimated feasible point (according to models):
    sigma      lambda
    ______    _________

    3.5501    2.292e-05
Estimated objective function value = 0.10053 Estimated function evaluation time = 26.7715
results =
  BayesianOptimization with properties:

                      ObjectiveFcn: @(z)gather(loss(fitckernel(Ztrain,Ytrain,'KernelScale',z.sigm
              VariableDescriptions: [1×2 optimizableVariable]
                           Options: [1×1 struct]
                      MinObjective: 0.1001
                   XAtMinObjective: [1×2 table]
             MinEstimatedObjective: 0.1005
          XAtMinEstimatedObjective: [1×2 table]
           NumObjectiveEvaluations: 30
                  TotalElapsedTime: 1.3618e+03
                         NextPoint: [1×2 table]
                            XTrace: [30×2 table]
                    ObjectiveTrace: [30×1 double]
                  ConstraintsTrace: []
                     UserDataTrace: {30×1 cell}
      ObjectiveEvaluationTimeTrace: [30×1 double]
                IterationTimeTrace: [30×1 double]
                        ErrorTrace: [30×1 double]
                  FeasibilityTrace: [30×1 logical]
       FeasibilityProbabilityTrace: [30×1 double]
               IndexOfMinimumTrace: [30×1 double]
             ObjectiveMinimumTrace: [30×1 double]
    EstimatedObjectiveMinimumTrace: [30×1 double]
Return the best feasible point in the Bayesian model results by using the bestPoint function. Use the default criterion min-visited-upper-confidence-interval, which determines the best feasible point as the visited point that minimizes an upper confidence interval on the objective function value.

zbest = bestPoint(results)

zbest=1×2 table
    sigma      lambda
    ______    _________

    3.5501    2.292e-05
The table zbest contains the optimal estimated values for the 'KernelScale' and 'Lambda' name-value arguments. You can specify these values when training a new optimized kernel classifier by using Mdl = fitckernel(Ztrain,Ytrain,'KernelScale',zbest.sigma,'Lambda',zbest.lambda)
For tall arrays, the optimization procedure can take a long time. If the data set is too large to run the optimization procedure, you can try to optimize the parameters by using only partial data. Use the datasample function and specify 'Replace','false' to sample data without replacement.
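A sketch of that approach, assuming X and Y are the tall predictor data and class labels defined earlier in this example; the subset size of 10,000 rows is an arbitrary choice for illustration.

Rsub = datasample([X Y],10000,'Replace',false);  % Sample rows without replacement
Xsub = Rsub(:,1:end-1);                          % Predictor subset
Ysub = Rsub(:,end);                              % Class label subset
Zsub = zscore(Xsub);                             % Standardize, as before
MdlSub = fitckernel(Zsub,Ysub,'Verbose',0, ...
    'OptimizeHyperparameters',{'KernelScale','Lambda'});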
See Also bayesopt | bestPoint | cvpartition | datastore | fitckernel | gather | loss | optimizableVariable | tall
Statistics and Machine Learning with Big Data Using Tall Arrays This example shows how to perform statistical analysis and machine learning on out-of-memory data with MATLAB® and Statistics and Machine Learning Toolbox™. Tall arrays and tables are designed for working with out-of-memory data. This type of data consists of a very large number of rows (observations) compared to a smaller number of columns (variables). Instead of writing specialized code that takes into account the huge size of the data, such as with MapReduce, you can use tall arrays to work with large data sets in a manner similar to in-memory MATLAB arrays. The fundamental difference is that tall arrays typically remain unevaluated until you request that the calculations be performed. When you perform calculations on tall arrays, MATLAB® uses either a parallel pool (default if you have Parallel Computing Toolbox™) or the local MATLAB session. To run the example using the local MATLAB session when you have Parallel Computing Toolbox, change the global execution environment by using the mapreducer function. mapreducer(0)
This example works with a subset of data on a single computer to develop a linear regression model, and then it scales up to analyze all of the data set. You can scale up this analysis even further to:
• Work with data that cannot be read into memory
• Work with data distributed across clusters using MATLAB Parallel Server™
• Integrate with big data systems like Hadoop® and Spark®

Introduction to Machine Learning with Tall Arrays
Several unsupervised and supervised learning algorithms in Statistics and Machine Learning Toolbox are available to work with tall arrays to perform data mining and predictive modeling with out-of-memory data. These algorithms are appropriate for out-of-memory data and can include slight variations from the in-memory algorithms. Capabilities include:
• k-Means clustering
• Linear regression
• Generalized linear regression
• Logistic regression
• Discriminant analysis

The machine learning workflow for out-of-memory data in MATLAB is similar to in-memory data:
1  Preprocess
2  Explore
3  Develop model
4  Validate model
5  Scale up to larger data
This example follows a similar structure in developing a predictive model for airline delays. The data includes a large file of airline flight information from 1987 through 2008. The example goal is to predict the departure delay based on a number of variables. Details on the fundamental aspects of tall arrays are included in the example “Analyze Big Data in MATLAB Using Tall Arrays”. This example extends the analysis to include machine learning with tall arrays.

Create Tall Table of Airline Data
A datastore is a repository for collections of data that are too large to fit in memory. You can create a datastore from a number of different file formats as the first step to create a tall array from an external data source. Create a datastore for the sample file airlinesmall.csv. Select the variables of interest, treat 'NA' values as missing data, and generate a preview table of the data.

ds = datastore('airlinesmall.csv');
ds.SelectedVariableNames = {'Year','Month','DayofMonth','DayOfWeek',...
    'DepTime','ArrDelay','DepDelay','Distance'};
ds.TreatAsMissing = 'NA';
pre = preview(ds)

pre=8×8 table
    Year    Month    DayofMonth    DayOfWeek    DepTime    ArrDelay    DepDelay    Distance
    ____    _____    __________    _________    _______    ________    ________    ________

    1987     10          21            3          642          8          12         308
    1987     10          26            1         1021          8           1         296
    1987     10          23            5         2055         21          20         480
    1987     10          23            5         1332         13          12         296
    1987     10          22            4          629          4          -1         373
    1987     10          28            3         1446         59          63         308
    1987     10           8            4          928          3          -2         447
    1987     10          10            6          859         11          -1         954
Create a tall table backed by the datastore to facilitate working with the data. The underlying data type of a tall array depends on the type of datastore. In this case, the datastore is tabular text and returns a tall table. The display includes a preview of the data, with indication that the size is unknown.

tt = tall(ds)

tt =

  Mx8 tall table

    Year    Month    DayofMonth    DayOfWeek    DepTime    ArrDelay    DepDelay    Distance
    ____    _____    __________    _________    _______    ________    ________    ________

    1987     10          21            3          642          8          12         308
    1987     10          26            1         1021          8           1         296
    1987     10          23            5         2055         21          20         480
    1987     10          23            5         1332         13          12         296
    1987     10          22            4          629          4          -1         373
    1987     10          28            3         1446         59          63         308
    1987     10           8            4          928          3          -2         447
    1987     10          10            6          859         11          -1         954
      :       :           :            :           :           :           :           :
      :       :           :            :           :           :           :           :
Preprocess Data
This example aims to explore the time of day and day of week in more detail. Convert the day of week to categorical data with labels and determine the hour of day from the numeric departure time variable.

tt.DayOfWeek = categorical(tt.DayOfWeek,1:7,{'Sun','Mon','Tues',...
    'Wed','Thu','Fri','Sat'});
tt.Hr = discretize(tt.DepTime,0:100:2400,0:23)

tt =

  Mx9 tall table

    Year    Month    DayofMonth    DayOfWeek    DepTime    ArrDelay    DepDelay    Distance    Hr
    ____    _____    __________    _________    _______    ________    ________    ________    __

    1987     10          21          Tues         642          8          12         308        6
    1987     10          26          Sun         1021          8           1         296       10
    1987     10          23          Thu         2055         21          20         480       20
    1987     10          23          Thu         1332         13          12         296       13
    1987     10          22          Wed          629          4          -1         373        6
    1987     10          28          Tues        1446         59          63         308       14
    1987     10           8          Wed          928          3          -2         447        9
    1987     10          10          Fri          859         11          -1         954        8
      :       :           :           :            :           :           :           :        :
      :       :           :           :            :           :           :           :        :
Include only years after 2000 and ignore rows with missing data. Identify data of interest by logical condition.

idx = tt.Year >= 2000 & ...
    ~any(ismissing(tt),2);
tt = tt(idx,:);
Explore Data by Group

A number of exploratory functions are available for tall arrays. For example, the grpstats function calculates grouped statistics of tall arrays. Explore the data by determining the centrality and spread of the data with summary statistics grouped by day of week. Also, explore the correlation between the departure delay and arrival delay.

g = grpstats(tt(:,{'ArrDelay','DepDelay','DayOfWeek'}),'DayOfWeek',...
    {'mean','std','skewness','kurtosis'})

g =

  Mx11 tall table

    GroupLabel    DayOfWeek    GroupCount    mean_ArrDelay    std_ArrDelay    skewness_ArrDelay
    __________    _________    __________    _____________    ____________    _________________
        ?             ?            ?               ?                ?                  ?
        ?             ?            ?               ?                ?                  ?
        ?             ?            ?               ?                ?                  ?
        :             :            :               :                :                  :
        :             :            :               :                :                  :

C = corr(tt.DepDelay,tt.ArrDelay)

C =

  MxNx... tall array

    ?    ?    ?    ...
    ?    ?    ?    ...
    ?    ?    ?    ...
    :    :    :
    :    :    :
These commands produce more tall arrays. The commands are not executed until you explicitly gather the results into the workspace. The gather command triggers execution and attempts to minimize the number of passes required through the data to perform the calculations. gather requires that the resulting variables fit into memory.

[statsByDay,C] = gather(g,C)

Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 3.6 sec
Evaluation completed in 4.9 sec

statsByDay=7×11 table
    GroupLabel    DayOfWeek    GroupCount    mean_ArrDelay    std_ArrDelay    skewness_ArrDelay
    __________    _________    __________    _____________    ____________    _________________
    {'Fri' }         Fri          7339           4.1512           32.1              7.082
    {'Mon' }         Mon          8443           5.2487          32.453             4.5811
    {'Sat' }         Sat          8045            7.132          33.108             3.6457
    {'Sun' }         Sun          8570           7.7515          36.003             5.7943
    {'Thu' }         Thu          8601           10.053           36.18             4.1381
    {'Tues'}         Tues         8381           6.4786          32.322             4.374
    {'Wed' }         Wed          8489           9.3324          37.406             5.1638

C = 0.8966
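As a quick follow-up (not part of the original example), you can sort the gathered, in-memory summary table to rank the days by mean arrival delay; the variable name mean_ArrDelay matches the grpstats output above.

sortrows(statsByDay,'mean_ArrDelay','descend')   % rank days by mean arrival delay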
The variables containing the results are now in-memory variables in the Workspace. Based on these calculations, variation occurs in the data and there is correlation between the delays that you can investigate further.

Explore the effect of day of week and hour of day and gain additional statistical information such as the standard error of the mean and the 95% confidence interval for the mean. You can pass the entire tall table and specify which variables to perform calculations on.

byDayHr = grpstats(tt,{'Hr','DayOfWeek'},...
    {'mean','sem','meanci'},'DataVar','DepDelay');
byDayHr = gather(byDayHr);

Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 2.5 sec
Evaluation completed in 3.1 sec
Due to the data partitioning of the tall array, the output might be unordered. Rearrange the data in memory for further exploration.

x = unstack(byDayHr(:,{'Hr','DayOfWeek','mean_DepDelay'}),...
    'mean_DepDelay','DayOfWeek');
x = sortrows(x)

x=24×8 table
    Hr      Sun         Mon         Tues        Wed        Thu        Fri        Sat
    __    _______    ________    ________    _______    _______    _______    _______
     0     38.519      71.914      39.656     34.667         90     25.536     65.579
     1     45.846      27.875        93.6     125.23     52.765     38.091     29.182
     2        NaN          39         102        NaN      78.25       -1.5        NaN
     3        NaN         NaN         NaN        NaN     -377.5       53.5        NaN
     4         -7     -6.2857          -7    -7.3333      -10.5         -5        NaN
     5    -2.2409     -3.7099     -4.0146    -3.9565    -3.5897    -3.5766    -4.1474
     6        0.4     -1.8909     -1.9802    -1.8304    -1.3578    0.84161    -2.2537
     7     3.4173    -0.47222    -0.18893    0.71546       0.08      1.069    -1.3221
     8     2.3759      1.4054      1.6745     2.2345     2.9668     1.6727    0.88213
     9     2.5325      1.6805      2.7656      2.683     5.6138     3.4838     2.5011
    10       6.37      5.2868      3.6822     7.5773     5.3372     6.9391     4.9979
    11     6.9946      4.9165      5.5639     5.5936     7.0435     4.8989     5.2839
    12      5.673      5.1193      5.7081     7.9178     7.5269     8.0625     7.4686
    13     8.0879      7.1017      5.0857     8.8082     8.2878     8.0675     6.2107
    14     9.5164      5.8343       7.416     9.5954     8.6667     6.0677      8.444
    15     8.1257      4.8802      7.4726     9.8674     10.235      7.167     8.6219
    ⋮
Visualize Data in Tall Arrays

Currently, you can visualize tall array data using histogram, histogram2, binScatterPlot, and ksdensity. The visualizations all trigger execution, similar to calling the gather function.

Use binScatterPlot to examine the relationship between the Hr and DepDelay variables.

binScatterPlot(tt.Hr,tt.DepDelay,'Gamma',0.25)

Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 0.97 sec
Evaluation completed in 1.3 sec
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 1.1 sec
Evaluation completed in 1.2 sec

ylim([0 500])
xlabel('Time of Day')
ylabel('Delay (Minutes)')
As noted in the output display, the visualizations often take two passes through the data: one to perform the binning, and one to perform the binned calculation and produce the visualization.

Split Data into Training and Validation Sets

To develop a machine learning model, it is useful to reserve part of the data to train and develop the model and another part of the data to test the model. A number of ways exist for you to split the data into training and validation sets.

Use datasample to obtain a random sampling of the data. Then use cvpartition to partition the data into test and training sets. To obtain nonstratified partitions, set a uniform grouping variable by multiplying the data samples by zero.

For reproducibility, set the seed of the random number generator using tallrng. The results can vary depending on the number of workers and the execution environment for the tall arrays. For details, see “Control Where Your Code Runs”.

tallrng('default')
data = datasample(tt,25000,'Replace',false);
groups = 0*data.DepDelay;
y = cvpartition(groups,'HoldOut',1/3);
dataTrain = data(training(y),:);
dataTest = data(test(y),:);
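If you want to confirm how many observations land in each partition, one option (a minimal sketch, not part of the original example) is to gather the tall heights in a single pass:

[nTrain,nTest] = gather(height(dataTrain),height(dataTest))   % evaluates both tall expressions together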
Fit Supervised Learning Model

Build a model to predict the departure delay based on several variables. The linear regression model function fitlm behaves similarly to the in-memory function. However, calculations with tall arrays result in a CompactLinearModel, which is more efficient for large data sets. Model fitting triggers execution because it is an iterative process.

model = fitlm(dataTrain,'ResponseVar','DepDelay')

Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 2: Completed in 0.96 sec
- Pass 2 of 2: Completed in 3.4 sec
Evaluation completed in 5.1 sec

model =
Compact linear regression model:
    DepDelay ~ 1 + Year + Month + DayofMonth + DayOfWeek + DepTime + ArrDelay + Distance + Hr

Estimated Coefficients:
                       Estimate         SE          tStat       pValue
                      __________    __________    ________    __________
    (Intercept)           30.715        75.873     0.40482       0.68562
    Year                -0.01585      0.037853    -0.41872       0.67543
    Month                0.03009      0.028097      1.0709       0.28421
    DayofMonth        -0.0094266      0.010903    -0.86457       0.38729
    DayOfWeek_Mon       -0.36333       0.35527     -1.0227       0.30648
    DayOfWeek_Tues       -0.2858       0.35245    -0.81091       0.41743
    DayOfWeek_Wed       -0.56082       0.35309     -1.5883       0.11224
    DayOfWeek_Thu       -0.25295       0.35239    -0.71782       0.47288
    DayOfWeek_Fri        0.91768       0.36625      2.5056       0.012234
    DayOfWeek_Sat        0.45668       0.35785      1.2762       0.20191
    DepTime            -0.011551     0.0053851      -2.145       0.031964
    ArrDelay              0.8081      0.002875      281.08              0
    Distance           0.0012881    0.00016887      7.6281    2.5106e-14
    Hr                    1.4058       0.53785      2.6138     0.0089613

Number of observations: 16667, Error degrees of freedom: 16653
Root Mean Squared Error: 12.4
R-squared: 0.834,  Adjusted R-Squared: 0.833
F-statistic vs. constant model: 6.41e+03, p-value = 0
Predict and Validate the Model

The display indicates fit information, as well as coefficients and associated coefficient statistics.

The model variable contains information about the fitted model as properties, which you can access using dot notation. Alternatively, double click the variable in the Workspace to explore the properties interactively.

model.Rsquared

ans = struct with fields:
    Ordinary: 0.8335
    Adjusted: 0.8334
Predict new values based on the model, calculate the residuals, and visualize using a histogram. The predict function predicts new values for both tall and in-memory data.

pred = predict(model,dataTest);
err = pred - dataTest.DepDelay;
figure
histogram(err,'BinLimits',[-100 100],'Normalization','pdf')

Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 2: Completed in 1.7 sec
- Pass 2 of 2: Completed in 0.95 sec
Evaluation completed in 3 sec

title('Histogram of Residuals')
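To summarize the residuals with a single number, you could also compute the test-set root-mean-squared error. This is a minimal sketch, not part of the original example; the deferred tall computation runs when gather executes.

rmseTest = gather(sqrt(mean(err.^2)))   % triggers evaluation of the tall expression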
Assess and Adjust Model

Looking at the output p-values in the display, some variables might be unnecessary in the model. You can reduce the complexity of the model by removing these variables.

Examine the significance of the variables in the model more closely using anova.

a = anova(model)

a=9×5 table
                    SumSq         DF       MeanSq          F         pValue
                  __________    _____    __________    _______    __________
    Year               26.88        1         26.88    0.17533       0.67543
    Month             175.84        1        175.84     1.1469       0.28421
    DayofMonth         114.6        1         114.6    0.74749       0.38729
    DayOfWeek         3691.4        6        615.23     4.0129    0.00050851
    DepTime           705.42        1        705.42     4.6012      0.031964
    ArrDelay      1.2112e+07        1    1.2112e+07      79004             0
    Distance          8920.9        1        8920.9     58.188    2.5106e-14
    Hr                1047.5        1        1047.5     6.8321     0.0089613
    Error         2.5531e+06    16653        153.31
Based on the p-values, the variables Year, Month, and DayofMonth are not significant to this model, so you can remove them without negatively affecting the model quality.

To explore these model parameters further, use interactive visualizations such as plotSlice, plotInteraction, and plotEffects. For example, use plotEffects to examine the estimated effect that each predictor variable has on the departure delay.

plotEffects(model)
Based on these calculations, ArrDelay is the main effect in the model (it is highly correlated to DepDelay). The other effects are observable, but have much less impact. In addition, Hr was determined from DepTime, so only one of these variables is necessary to the model.

Reduce the number of variables to exclude all date components, and then fit a new model.

model2 = fitlm(dataTrain,'DepDelay ~ DepTime + ArrDelay + Distance')
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 1.2 sec
Evaluation completed in 1.4 sec

model2 =
Compact linear regression model:
    DepDelay ~ 1 + DepTime + ArrDelay + Distance

Estimated Coefficients:
                    Estimate         SE          tStat       pValue
                   _________    __________    _______    __________
    (Intercept)      -1.4646       0.31696    -4.6207    3.8538e-06
    DepTime        0.0025087    0.00020401     12.297    1.3333e-34
    ArrDelay         0.80767     0.0028712      281.3             0
    Distance       0.0012981    0.00016886     7.6875    1.5838e-14
Number of observations: 16667, Error degrees of freedom: 16663
Root Mean Squared Error: 12.4
R-squared: 0.833,  Adjusted R-Squared: 0.833
F-statistic vs. constant model: 2.77e+04, p-value = 0
Model Development

Even with the model simplified, it can be useful to further adjust the relationships between the variables and include specific interactions. To experiment further, repeat this workflow with smaller tall arrays. For performance while tuning the model, you can consider working with a small extraction of in-memory data before scaling up to the entire tall array.

In this example, you can use functionality like stepwise regression, which is suited for iterative, in-memory model development. After tuning the model, you can scale up to use tall arrays.

Gather a subset of the data into the workspace and use stepwiselm to iteratively develop the model in memory.

subset = gather(dataTest);

Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 0.97 sec
Evaluation completed in 1 sec

sModel = stepwiselm(subset,'ResponseVar','DepDelay')

1. Adding ArrDelay, FStat = 42200.3016, pValue = 0
2. Adding DepTime, FStat = 51.7918, pValue = 6.70647e-13
3. Adding DepTime:ArrDelay, FStat = 42.4982, pValue = 7.48624e-11
4. Adding Distance, FStat = 15.4303, pValue = 8.62963e-05
5. Adding ArrDelay:Distance, FStat = 231.9012, pValue = 1.135326e-51
6. Adding DayOfWeek, FStat = 3.4704, pValue = 0.0019917
7. Adding DayOfWeek:ArrDelay, FStat = 26.334, pValue = 3.16911e-31
8. Adding DayOfWeek:DepTime, FStat = 2.1732, pValue = 0.042528

sModel =
Linear regression model:
    DepDelay ~ 1 + DayOfWeek*DepTime + DayOfWeek*ArrDelay + DepTime*ArrDelay + ArrDelay*Distance

Estimated Coefficients:
                                 Estimate           SE           tStat       pValue
                               ___________     __________     ________    __________
    (Intercept)                     1.1799         1.0675       1.1053       0.26904
    DayOfWeek_Mon                  -2.1377         1.4298      -1.4951       0.13493
    DayOfWeek_Tues                 -4.2868         1.4683      -2.9196     0.0035137
    DayOfWeek_Wed                  -1.6233          1.476      -1.0998       0.27145
    DayOfWeek_Thu                 -0.74772         1.5226     -0.49109       0.62338
    DayOfWeek_Fri                  -1.7618         1.5079      -1.1683        0.2427
    DayOfWeek_Sat                  -2.1121         1.5214      -1.3882       0.16511
    DepTime                     7.5229e-05     0.00073613      0.10219        0.9186
    ArrDelay                        0.8671       0.013836       62.669             0
    Distance                     0.0015163     0.00023426       6.4728    1.0167e-10
    DayOfWeek_Mon:DepTime        0.0017633      0.0010106       1.7448      0.081056
    DayOfWeek_Tues:DepTime       0.0032578      0.0010331       3.1534     0.0016194
    DayOfWeek_Wed:DepTime       0.00097506       0.001044      0.93398       0.35034
    DayOfWeek_Thu:DepTime        0.0012517      0.0010694       1.1705       0.24184
    DayOfWeek_Fri:DepTime        0.0026464      0.0010711       2.4707      0.013504
    DayOfWeek_Sat:DepTime        0.0021477      0.0010646       2.0174      0.043689
    DayOfWeek_Mon:ArrDelay        -0.11023       0.014744      -7.4767     8.399e-14
    DayOfWeek_Tues:ArrDelay       -0.14589       0.014814      -9.8482    9.2943e-23
    DayOfWeek_Wed:ArrDelay       -0.041878       0.012849      -3.2593     0.0011215
    DayOfWeek_Thu:ArrDelay       -0.096741       0.013308      -7.2693    3.9414e-13
    DayOfWeek_Fri:ArrDelay       -0.077713       0.015462      -5.0259    5.1147e-07
    DayOfWeek_Sat:ArrDelay        -0.13669       0.014652       -9.329    1.3471e-20
    DepTime:ArrDelay            6.4148e-05     7.7372e-06       8.2909    1.3002e-16
    ArrDelay:Distance          -0.00010512     7.3888e-06      -14.227    2.1138e-45

Number of observations: 8333, Error degrees of freedom: 8309
Root Mean Squared Error: 12
R-squared: 0.845,  Adjusted R-Squared: 0.845
F-statistic vs. constant model: 1.97e+03, p-value = 0
The model that results from the stepwise fit includes interaction terms. Now try to fit a model for the tall data by using fitlm with the formula returned by stepwiselm.

model3 = fitlm(dataTrain,sModel.Formula)

Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 1.6 sec
Evaluation completed in 1.7 sec

model3 =
Compact linear regression model:
    DepDelay ~ 1 + DayOfWeek*DepTime + DayOfWeek*ArrDelay + DepTime*ArrDelay + ArrDelay*Distance

Estimated Coefficients:
                                 Estimate           SE           tStat       pValue
                               ___________     __________     ________    __________
    (Intercept)                   -0.31595        0.74499      -0.4241        0.6715
    DayOfWeek_Mon                 -0.64218         1.0473     -0.61316       0.53978
    DayOfWeek_Tues                -0.90163         1.0383     -0.86836       0.38521
    DayOfWeek_Wed                  -1.0798         1.0417      -1.0365       0.29997
    DayOfWeek_Thu                  -3.2765         1.0379       -3.157     0.0015967
    DayOfWeek_Fri                  0.44193         1.0813      0.40869       0.68277
    DayOfWeek_Sat                   1.1428         1.0777       1.0604       0.28899
    DepTime                      0.0014188     0.00051612       2.7489     0.0059853
    ArrDelay                       0.72526       0.011907       60.913             0
    Distance                     0.0014824     0.00017027       8.7059    3.4423e-18
    DayOfWeek_Mon:DepTime       0.00040994     0.00073548      0.55738       0.57728
    DayOfWeek_Tues:DepTime      0.00051826     0.00073645      0.70373       0.48161
    DayOfWeek_Wed:DepTime       0.00058426     0.00073695      0.79281        0.4279
    DayOfWeek_Thu:DepTime        0.0026229     0.00073649       3.5614    0.00036991
    DayOfWeek_Fri:DepTime        0.0002959     0.00077194      0.38332       0.70149
    DayOfWeek_Sat:DepTime      -0.00060921     0.00075776     -0.80396       0.42143
    DayOfWeek_Mon:ArrDelay       -0.034886       0.010435      -3.3432    0.00082993
    DayOfWeek_Tues:ArrDelay     -0.0073661       0.010113     -0.72837        0.4664
    DayOfWeek_Wed:ArrDelay       -0.028158      0.0099004      -2.8441     0.0044594
    DayOfWeek_Thu:ArrDelay       -0.061065       0.010381      -5.8821    4.1275e-09
    DayOfWeek_Fri:ArrDelay        0.052437       0.010927       4.7987    1.6111e-06
    DayOfWeek_Sat:ArrDelay        0.014205        0.01039       1.3671        0.1716
    DepTime:ArrDelay            7.2632e-05     5.3946e-06       13.464     4.196e-41
    ArrDelay:Distance          -2.4743e-05     4.6508e-06      -5.3203    1.0496e-07

Number of observations: 16667, Error degrees of freedom: 16643
Root Mean Squared Error: 12.3
R-squared: 0.837,  Adjusted R-Squared: 0.836
F-statistic vs. constant model: 3.7e+03, p-value = 0
You can repeat this process to continue to adjust the linear model. However, in this case, you should explore different types of regression that might be more appropriate for this data. For example, if you do not want to include the arrival delay, then this type of linear model is no longer appropriate. See “Logistic Regression with Tall Arrays” on page 32-2 for more information.

Scale to Spark

A key capability of tall arrays in MATLAB and Statistics and Machine Learning Toolbox is the connectivity to platforms such as Hadoop and Spark. You can even compile the code and run it on Spark using MATLAB Compiler™; a minimal configuration sketch follows the product list below. See “Extend Tall Arrays with Other Products” for more information about using these products:

• Database Toolbox™
• Parallel Computing Toolbox™
• MATLAB® Parallel Server™
• MATLAB Compiler™
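The sketch below outlines one way to point tall-array calculations at a Hadoop/Spark cluster. It is only a hedged outline: the installation folder, cluster configuration, and HDFS paths are placeholders, and the exact setup depends on your cluster and on MATLAB Parallel Server.

% Illustrative only -- folder names and HDFS paths are placeholders
setenv('HADOOP_HOME','/path/to/hadoop');
cluster = parallel.cluster.Hadoop;        % describes the Hadoop/Spark cluster
mapreducer(cluster);                      % tall-array calculations now run on the cluster
ds = datastore('hdfs:///data/airline/*.csv');
tt = tall(ds);                            % same tall-array code as on the desktop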
See Also

More About
• Function List (Tall Arrays)
33 Parallel Statistics
Quick Start Parallel Computing for Statistics and Machine Learning Toolbox

Note To use parallel computing, you must have a Parallel Computing Toolbox license.
Parallel Statistics and Machine Learning Toolbox Functionality You can use any of the Statistics and Machine Learning Toolbox functions with Parallel Computing Toolbox constructs such as parfor and spmd. However, some functions, such as those with interactive displays, can lose functionality in parallel. In particular, displays and interactive usage are not effective on workers (see “Vocabulary for Parallel Computation” on page 33-6). Additionally, some Statistics and Machine Learning Toolbox functions are enhanced to use parallel computing internally. For example, some model fitting functions perform hyperparameter optimization in parallel. For a complete list of Statistics and Machine Learning Toolbox functions that support parallel computing, see Function List (Automatic Parallel Support). For the usage notes and limitations of each function, see the Automatic Parallel Support section on the function reference page.
How to Compute in Parallel This section gives the simplest way to use the enhanced functions in parallel. For more advanced topics, including the issues of reproducibility and nested parfor loops, see the other topics in “Speed Up Statistical Computations”. For information on parallel statistical computing at the command line, enter help parallelstats
To have a function compute in parallel:

1. “Set Up a Parallel Environment” on page 33-2
2. “Set the UseParallel Option to true” on page 33-3
3. “Call the Function Using the Options Structure” on page 33-3

Set Up a Parallel Environment

To run a statistical computation in parallel, first set up a parallel environment.

Note Setting up a parallel environment can take several seconds.

For a multicore machine, enter the following at the MATLAB command line:

parpool(n)

n is the number of workers you want to use.

You can also run parallel code in MATLAB Online™. For details, see “Use Parallel Computing Toolbox with Cloud Center Cluster in MATLAB Online” (Parallel Computing Toolbox).
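Because starting a pool takes time, a common pattern is to reuse an existing pool when one is already open. This is a minimal sketch; the worker count of 4 is only an example.

p = gcp('nocreate');   % return the current pool without creating a new one
if isempty(p)
    p = parpool(4);    % start a pool with 4 workers
end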
Set the UseParallel Option to true

Create an options structure with the statset function. To run in parallel, set the UseParallel option to true:

paroptions = statset('UseParallel',true);

Call the Function Using the Options Structure

Call your function with syntax that uses the options structure. For example:

% Run crossval in parallel
cvMse = crossval('mse',x,y,'predfun',regf,'Options',paroptions);

% Run bootstrp in parallel
sts = bootstrp(100,@(x)[mean(x) std(x)],y,'Options',paroptions);

% Run TreeBagger in parallel
b = TreeBagger(50,meas,spec,'OOBPred','on','Options',paroptions);
For more complete examples of parallel statistical functions, see “Use Parallel Processing for Regression TreeBagger Workflow” on page 33-4, “Implement Jackknife Using Parallel Computing” on page 33-20, “Implement Cross-Validation Using Parallel Computing” on page 33-21, and “Implement Bootstrap Using Parallel Computing” on page 33-23.

After you have finished computing in parallel, close the parallel environment:

delete(mypool)
Tip To save time, keep the pool open if you expect to compute in parallel again soon.
Use Parallel Processing for Regression TreeBagger Workflow

This example shows you how to:

• Use an ensemble of bagged regression trees to estimate feature importance.
• Improve computation speed by using parallel computing.

The sample data is a database of 1985 car imports with 205 observations, 25 predictors, and 1 response, which is insurance risk rating, or "symboling." The first 15 variables are numeric and the last 10 are categorical. The symboling index takes integer values from -3 to 3.

Load the sample data and separate it into predictor and response arrays.

load imports-85;
Y = X(:,1);
X = X(:,2:end);
Set up the parallel environment to use the default number of workers. The computer that created this example has six cores.

mypool = parpool

Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).

mypool =

 ProcessPool with properties:

            Connected: true
           NumWorkers: 6
              Cluster: local
        AttachedFiles: {}
    AutoAddClientPath: true
          IdleTimeout: 30 minutes (30 minutes remaining)
          SpmdEnabled: true
Set the options to use parallel processing. paroptions = statset('UseParallel',true);
Estimate feature importance using leaf size 1 and 5000 trees in parallel. Time the function for comparison purposes.

tic
b = TreeBagger(5000,X,Y,'Method','r','OOBVarImp','on', ...
    'cat',16:25,'MinLeafSize',1,'Options',paroptions);
toc

Elapsed time is 9.873065 seconds.

Perform the same computation in serial for timing comparison.

tic
b = TreeBagger(5000,X,Y,'Method','r','OOBVarImp','on', ...
    'cat',16:25,'MinLeafSize',1);
toc
Elapsed time is 28.092654 seconds.
The results show that computing in parallel takes a fraction of the time it takes to compute serially. Note that the elapsed time can vary depending on your operating system.
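Since the goal of the example is feature importance, you might also plot the out-of-bag importance estimates from the fitted ensemble. This is a sketch rather than part of the original example; the property name shown is the one used by recent TreeBagger releases.

bar(b.OOBPermutedPredictorDeltaError)   % out-of-bag permuted predictor importance
xlabel('Predictor index')
ylabel('Importance estimate')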
See Also
parpool | statset | TreeBagger

Related Examples
• “Bootstrap Aggregation (Bagging) of Regression Trees Using TreeBagger” on page 19-115
• “Bootstrap Aggregation (Bagging) of Classification Trees Using TreeBagger” on page 19-126
• “Comparison of TreeBagger and Bagged Ensembles” on page 19-47
Concepts of Parallel Computing in Statistics and Machine Learning Toolbox

In this section...
“Subtleties in Parallel Computing” on page 33-6
“Vocabulary for Parallel Computation” on page 33-6

Subtleties in Parallel Computing

There are two main subtleties in parallel computations:

• Nested parallel evaluations (see “No Nested parfor Loops” on page 33-14). Only the outermost parfor loop runs in parallel, the others run serially.
• Reproducible results when using random numbers (see “Reproducibility in Parallel Statistical Computations” on page 33-16). How can you get exactly the same results when repeatedly running a parallel computation that uses random numbers?

Vocabulary for Parallel Computation

• worker — An independent MATLAB session that runs code distributed by the client.
• client — The MATLAB session with which you interact, and that distributes jobs to workers.
• parfor — A Parallel Computing Toolbox function that distributes independent code segments to workers (see “Working with parfor” on page 33-14).
• random stream — A pseudorandom number generator, and the sequence of values it generates. MATLAB implements random streams with the RandStream class.
• reproducible computation — A computation that can be exactly replicated, even in the presence of random numbers (see “Reproducibility in Parallel Statistical Computations” on page 33-16).
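A minimal sketch ties this vocabulary together: the client runs the script, and parfor distributes the independent loop iterations to the workers. The loop body here is only an illustration.

results = zeros(1,100);
parfor i = 1:100
    results(i) = mean(randn(1e4,1));   % each iteration is independent, so workers can run them in any order
end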
When to Run Statistical Functions in Parallel In this section... “Why Run in Parallel?” on page 33-7 “Factors Affecting Speed” on page 33-7 “Factors Affecting Results” on page 33-7
Why Run in Parallel? The main reason to run statistical computations in parallel is to gain speed, meaning to reduce the execution time of your program or functions. “Factors Affecting Speed” on page 33-7 discusses the main items affecting the speed of programs or functions. “Factors Affecting Results” on page 33-7 discusses details that can cause a parallel run to give different results than a serial run. Note Some Statistics and Machine Learning Toolbox functions have built-in parallel computing capabilities. See Quick Start Parallel Computing for Statistics and Machine Learning Toolbox on page 33-2. You can also use any Statistics and Machine Learning Toolbox functions with Parallel Computing Toolbox functions such as parfor loops. To decide when to call functions in parallel, consider the factors affecting speed and results.
Factors Affecting Speed

Some factors that can affect the speed of execution of parallel processing are:

• Parallel environment setup. It takes time to run parpool to begin computing in parallel. If your computation is fast, the setup time can exceed any time saved by computing in parallel.
• Parallel overhead. There is overhead in communication and coordination when running in parallel. If function evaluations are fast, this overhead could be an appreciable part of the total computation time. Thus, solving a problem in parallel can be slower than solving the problem serially. For an example, see Improving Optimization Performance with Parallel Computing in MATLAB Digest, March 2009.
• No nested parfor loops. This is described in “Working with parfor” on page 33-14. parfor does not work in parallel when called from within another parfor loop. If you have programmed your custom functions to take advantage of parallel processing, the limitation of no nested parfor loops can cause a parallel function to run slower than expected.
• When executing serially, parfor loops run slightly slower than for loops.
• Passing parameters. Parameters are automatically passed to worker sessions during the execution of parallel computations. If there are many parameters, or they take a large amount of memory, passing parameters can slow the execution of your computation.
• Contention for resources: network and computing. If the pool of workers has low bandwidth or high latency, parallel computation can be slow.

Factors Affecting Results

Some factors can affect results when using parallel processing. You might need to adjust your code to run in parallel, for example, you need independent loops and the workers must be able to access the variables. Some important factors are:
• Persistent or global variables. If any functions use persistent or global variables, these variables can take different values on different worker processors. The body of a parfor loop cannot contain global or persistent variable declarations.
• Accessing external files. The order of computations is not guaranteed during parallel processing, so external files can be accessed in unpredictable order, leading to unpredictable results. Furthermore, if multiple processors try to read an external file simultaneously, the file can become locked, leading to a read error, and halting function execution.
• Noncomputational functions, such as input, plot, and keyboard, can behave badly when used in your custom functions. Do not use these functions in a parfor loop, because they can cause a worker to become nonresponsive, since it is waiting for input.
• parfor does not allow break or return statements.
• The random numbers you use can affect the results of your computations. See “Reproducibility in Parallel Statistical Computations” on page 33-16.

For advice on converting for loops to use parfor, see “Parallel for-Loops (parfor)” (Parallel Computing Toolbox).
Analyze and Model Data on GPU

This example shows how to improve code performance by executing on a graphical processing unit (GPU). Execution on a GPU can improve performance if:

• Your code is computationally expensive, where computing time significantly exceeds the time spent transferring data to and from GPU memory.
• Your workflow uses functions with gpuArray (Parallel Computing Toolbox) support and large array inputs.

When writing code for a GPU, start with code that already performs well on a CPU. Vectorization is usually critical for achieving high performance on a GPU. Convert code to use functions that support GPU array arguments and transfer the input data to the GPU. For more information about MATLAB functions with GPU array inputs, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

Many functions in Statistics and Machine Learning Toolbox™ automatically execute on a GPU when you use GPU array input data. For example, you can create a probability distribution object on a GPU, where the output is a GPU array.

pd = fitdist(gpuArray(x),"Normal")

Using a GPU requires Parallel Computing Toolbox™ and a supported GPU device. For information about supported devices, see “GPU Computing Requirements” (Parallel Computing Toolbox). For the complete list of Statistics and Machine Learning Toolbox™ functions that accept GPU arrays, see “Functions” and then, in the left navigation bar, scroll to the Extended Capability section and select GPU Arrays.

Examine Properties of GPU

You can query and select your GPU device using the gpuDevice function. If you have multiple GPUs, you can examine the properties of all GPUs detected in your system by using the gpuDeviceTable function. Then, you can select a specific GPU for single-GPU execution by using its index (gpuDevice(index)).

D = gpuDevice

D =
  CUDADevice with properties:

                      Name: 'TITAN V'
                     Index: 1
         ComputeCapability: '7.0'
            SupportsDouble: 1
             DriverVersion: 11.2000
            ToolkitVersion: 11.2000
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152 (49.15 KB)
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [2.1475e+09 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 12652838912 (12.65 GB)
           AvailableMemory: 12096045056 (12.10 GB)
       MultiprocessorCount: 80
              ClockRateKHz: 1455000
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 0
          CanMapHostMemory: 1
           DeviceSupported: 1
           DeviceAvailable: 1
            DeviceSelected: 1
Execute Function on GPU

Explore a data distribution on a GPU using descriptive statistics.

Generate a data set of normally distributed random numbers on a GPU.

dist = randn(6e4,6e3,"gpuArray");

Determine whether dist is a GPU array.

TF = isgpuarray(dist)

TF = logical
   1

Execute a function with a GPU array input argument. For example, calculate the sample skewness for each column in dist. Because dist is a GPU array, the skewness function executes on the GPU and returns the result as a GPU array.

skew = skewness(dist);

Verify that the output skew is a GPU array.

TF = isgpuarray(skew)

TF = logical
   1

Evaluate Speedup of GPU Execution

Evaluate function execution time on the GPU and compare performance with execution on a CPU. Comparing the time taken to execute code on a CPU and a GPU can be useful in determining the appropriate execution environment. For example, if you want to compute descriptive statistics from sample data, considering the execution time and the data transfer time is important to evaluating the overall performance. If a function has GPU array support, as the number of observations increases, computation on the GPU generally improves compared to the CPU.

Measure the function run time in seconds by using the gputimeit (Parallel Computing Toolbox) function. gputimeit is preferable to timeit for functions that use a GPU, because it ensures operation completion and compensates for overhead.

skew = @() skewness(dist);
t = gputimeit(skew)

t = 0.2458
Evaluate the performance difference between the GPU and CPU by independently measuring the CPU execution time. In this case, execution of the code is faster on the GPU than on the CPU. The performance of code on a GPU is heavily dependent on the GPU used. For additional information about measuring and improving GPU performance, see “Measure and Improve GPU Performance” (Parallel Computing Toolbox).
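One way to obtain the CPU timing mentioned above is to copy the data back to host memory and time the same computation with timeit. This is a minimal sketch with illustrative variable names; no particular timing value is implied.

dist_cpu = gather(dist);             % copy the data into CPU memory
skew_cpu = @() skewness(dist_cpu);
t_cpu = timeit(skew_cpu)             % CPU execution time, to compare against t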
Single Precision on GPU

You can improve the performance of your code by calculating in single precision instead of double precision. Determine the execution time of the skewness function using an input argument of the dist data set in single precision.

dist_single = single(dist);
skew_single = @() skewness(dist_single);
t_single = gputimeit(skew_single)

t_single = 0.0503

In this case, execution of the code with single precision data is faster than execution with double precision data. The performance improvement is dependent on the GPU card and total number of cores. For more information about using single precision with a GPU, see “Measure and Improve GPU Performance” (Parallel Computing Toolbox).

Dimensionality Reduction and Model Fitting on GPU

Implement dimensionality reduction and classification workflows on a GPU. Functions such as pca and fitcensemble can be used together to train a machine learning model.

• The pca (principal component analysis) function reduces data dimensionality by replacing several correlated variables with a new set of variables that are linear combinations of the original variables.
• The fitcensemble function fits many classification learners to form an ensemble model that can make better predictions than a single learner.

Both functions are computationally intensive and can be significantly accelerated using a GPU.

For example, consider the humanactivity data set. The data set contains 24,075 observations of five physical human activities: sitting, standing, walking, running, and dancing. Each observation has 60 features extracted from acceleration data measured by smartphone accelerometer sensors. The data set contains the following variables:

• actid — Response vector containing the activity IDs in integers: 1, 2, 3, 4, and 5 representing sitting, standing, walking, running, and dancing, respectively
• actnames — Activity names corresponding to the integer activity IDs
• feat — Feature matrix of 60 features for 24,075 observations
• featlabels — Labels of the 60 features

load humanactivity
Use 90% of the observations to train a model that classifies the five types of human activities, and use 10% of the observations to validate the trained model. Specify a 10% holdout for the test set by using cvpartition.

Partition = cvpartition(actid,"Holdout",0.10);
trainingInds = training(Partition); % Indices for the training set
testInds = test(Partition); % Indices for the test set

Transfer the training and test data to the GPU.

XTrain = gpuArray(feat(trainingInds,:));
YTrain = gpuArray(actid(trainingInds));
XTest = gpuArray(feat(testInds,:));
YTest = gpuArray(actid(testInds));
Find the principal components for the training data set XTrain. [coeff,score,~,~,explained,mu] = pca(XTrain);
Find the number of components required to explain at least 99% of variability. idx = find(cumsum(explained)>99,1);
Determine the principal component scores that represent X in the principal component space. XTrainPCA = score(:,1:idx);
Fit an ensemble of learners for classification.

template = templateTree("MaxNumSplits",20,"Reproducible",true);
classificationEnsemble = fitcensemble(XTrainPCA,YTrain, ...
    "Method","AdaBoostM2", ...
    "NumLearningCycles",30, ...
    "Learners",template, ...
    "LearnRate",0.1, ...
    "ClassNames",[1; 2; 3; 4; 5]);
To use the trained model for the test set, you need to transform the test data set by using the PCA obtained from the training data set. XTestPCA = (XTest-mu)*coeff(:,1:idx);
Evaluate the accuracy of the trained classifier with the test data. classificationError = loss(classificationEnsemble,XTestPCA,YTest);
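To see where the remaining errors occur, you could also compare predicted and true labels on the test set. This sketch is not part of the original example, and it assumes the predictions and gathered labels fit in host memory.

predictedLabels = predict(classificationEnsemble,XTestPCA);
confusionchart(gather(YTest),gather(predictedLabels))   % gather converts any GPU arrays for plotting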
Transfer to Local Workspace

Transfer data or model properties from a GPU to the local workspace for use with a function that does not support GPU arrays. Transferring GPU arrays can be costly and is generally not necessary unless you need to use the results with functions that do not support GPU arrays, or use the results in another workspace where a GPU is unavailable.

The gather (Parallel Computing Toolbox) function transfers data from the GPU into the local workspace. Gather the dist data, and then confirm that the data is no longer a GPU array.
dist = gather(dist);
TF = isgpuarray(dist)

TF = logical
   0

The gather function transfers properties of a machine learning model from a GPU into the local workspace. Gather the classificationEnsemble model, and then confirm that the model properties that were previously GPU arrays, such as X, are no longer GPU arrays.

classificationEnsemble = gather(classificationEnsemble);
TF = isgpuarray(classificationEnsemble.X)

TF = logical
   0
See Also
gpuArray | gputimeit | gather

Related Examples
• “Measure and Improve GPU Performance” (Parallel Computing Toolbox)
• “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox)
Working with parfor In this section... “How Statistical Functions Use parfor” on page 33-14 “Characteristics of parfor” on page 33-14
How Statistical Functions Use parfor

parfor is a Parallel Computing Toolbox function similar to a for loop. Parallel statistical functions call parfor internally. parfor distributes computations to worker processors.

Characteristics of parfor

You might need to adjust your code to run in parallel, for example, you need independent loops and the workers must be able to access the variables. For advice on using parfor, see “Parallel for-Loops (parfor)” (Parallel Computing Toolbox).

No Nested parfor Loops

parfor does not work in parallel when called from within another parfor loop, or from an spmd block. Parallelization occurs only at the outermost level.
Suppose, for example, you want to apply jackknife to your function userfcn, which calls parfor, and you want to call jackknife in a loop. The following figure shows three cases:

1. The outermost loop is parfor. Only that loop runs in parallel.
2. The outermost parfor loop is in jackknife. Only jackknife runs in parallel.
3. The outermost parfor loop is in userfcn. userfcn uses parfor in parallel.
(Figure: When parfor Runs in Parallel)

For help converting nested loops to use parfor, see “Convert for-Loops Into parfor-Loops” (Parallel Computing Toolbox). See also Quick Start Parallel Computing for Statistics and Machine Learning Toolbox on page 33-2.
Reproducibility in Parallel Statistical Computations In this section... “Issues and Considerations in Reproducing Parallel Computations” on page 33-16 “Running Reproducible Parallel Computations” on page 33-16 “Parallel Statistical Computation Using Random Numbers” on page 33-17
Issues and Considerations in Reproducing Parallel Computations

A reproducible computation is one that gives the same results every time it runs. Reproducibility is important for:

• Debugging — To correct an anomalous result, you need to reproduce the result.
• Confidence — When you can reproduce results, you can investigate and understand them.
• Modifying existing code — When you change existing code, you want to ensure that you do not break anything.

Generally, you do not need to ensure reproducibility for your computation. Often, when you want reproducibility, the simplest technique is to run in serial instead of in parallel. In serial computation you can simply call the rng function as follows:

s = rng     % Obtain the current state of the random stream
% run the statistical function
rng(s)      % Reset the stream to the previous state
% run the statistical function again, obtain identical results
This section addresses the case when your function uses random numbers, and you want reproducible results in parallel. This section also addresses the case when you want the same results in parallel as in serial.
Running Reproducible Parallel Computations

To run a Statistics and Machine Learning Toolbox function reproducibly:

1. Set the UseSubstreams option to true using statset.
2. Set the Streams option to a type that supports substreams: 'mlfg6331_64' or 'mrg32k3a'. For information on these streams, see RandStream.list.
3. To compute in parallel, set the UseParallel option to true.
4. To fit an ensemble in parallel using fitcensemble or fitrensemble, create a tree template with the 'Reproducible' name-value pair set to true:

   t = templateTree('Reproducible',true);
   ens = fitcensemble(X,Y,'Method','bag','Learners',t,...
       'Options',options);

5. Call the function with the options structure.
6. To reproduce the computation, reset the stream, then call the function again.

To understand why this technique gives reproducibility, see “How Substreams Enable Reproducible Parallel Computations” on page 33-17.
For example, to use the 'mlfg6331_64' stream for reproducible computation:

1. Create an appropriate options structure:

   s = RandStream('mlfg6331_64');
   options = statset('UseParallel',true, ...
       'Streams',s,'UseSubstreams',true);

2. Run your parallel computation. For instructions, see Quick Start Parallel Computing for Statistics and Machine Learning Toolbox on page 33-2.
3. Reset the random stream:

   reset(s);

4. Rerun your parallel computation. You obtain identical results.
For examples of parallel computation run this reproducible way, see “Reproducible Parallel Bootstrap” on page 33-24 and “Train Classification Ensemble in Parallel” on page 19-111.
Parallel Statistical Computation Using Random Numbers

What Are Substreams?

A substream is a portion of a random stream that RandStream can access quickly. There is a number M such that for any positive integer k, RandStream can go to the kMth pseudorandom number in the stream. From that point, RandStream can generate the subsequent entries in the stream. Currently, RandStream has M = 2^72, about 5e21, or more.
The entries in different substreams have good statistical properties, similar to the properties of entries in a single stream: independence, and lack of k-way correlation at various lags. The substreams are so long that you can view the substreams as being independent streams, as in the following picture.
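A minimal sketch of positioning a stream at a particular substream (the substream index 3 is arbitrary):

s = RandStream('mlfg6331_64');   % a generator type that supports substreams
s.Substream = 3;                 % jump to the start of substream 3
r = rand(s,1,5);                 % subsequent values come from that substream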
Two RandStream stream types support substreams: 'mlfg6331_64' and 'mrg32k3a'.

How Substreams Enable Reproducible Parallel Computations

When MATLAB performs computations in parallel with parfor, each worker receives loop iterations in an unpredictable order. Therefore, you cannot predict which worker gets which iteration, so cannot determine the random numbers associated with each iteration.
Substreams allow MATLAB to tie each iteration to a particular sequence of random numbers. parfor gives each iteration an index. The iteration uses the index as the substream number. Since the random numbers are associated with the iterations, not with the workers, the entire computation is reproducible.

To obtain reproducible results, simply reset the stream, and all the substreams generate identical random numbers when called again. This method succeeds when all the workers use the same stream, and the stream supports substreams. This concludes the discussion of how the procedure in “Running Reproducible Parallel Computations” on page 33-16 gives reproducible parallel results.

Random Numbers on the Client or Workers

A few functions generate random numbers on the client before distributing them to parallel workers. The workers do not use random numbers, so operate purely deterministically. For these functions, you can run a parallel computation reproducibly using any random stream type. The functions that operate this way include:

• crossval
• plsregress
• sequentialfs

To obtain identical results, reset the random stream on the client, or the random stream you pass to the client. For example:

s = rng     % Obtain the current state of the random stream
% run the statistical function
rng(s)      % Reset the stream to the previous state
% run the statistical function again, obtain identical results

While this method enables you to run reproducibly in parallel, the results can differ from a serial computation. The reason for the difference is parfor loops run in reverse order from for loops. Therefore, a serial computation can generate random numbers in a different order than a parallel computation. For unequivocal reproducibility, use the technique in “Running Reproducible Parallel Computations” on page 33-16.

Distributing Streams Explicitly

For testing or comparison using particular random number algorithms, you must set the random number generators. How do you set these generators in parallel, or initialize streams on each worker in a particular way? Or you might want to run a computation using a different sequence of random numbers than any other you have run. How can you ensure the sequence you use is statistically independent?

Parallel Statistics and Machine Learning Toolbox functions allow you to set random streams on each worker explicitly. For information on creating multiple streams, enter help RandStream/create at the command line. To create four independent streams using the 'mrg32k3a' generator:

s = RandStream.create('mrg32k3a','NumStreams',4,...
    'CellOutput',true);
Pass these streams to a statistical function using the Streams option. For example:

parpool(4) % if you have at least 4 cores
s = RandStream.create('mrg32k3a','NumStreams',4,...
    'CellOutput',true); % create 4 independent streams
paroptions = statset('UseParallel',true,...
    'Streams',s); % set the 4 different streams
x = [randn(700,1); 4 + 2*randn(300,1)];
latt = -4:0.01:12;
myfun = @(X) ksdensity(X,latt);
pdfestimate = myfun(x);
B = bootstrp(200,myfun,x,'Options',paroptions);
This method of distributing streams gives each worker a different stream for the computation. However, it does not allow for a reproducible computation, because the workers perform the 200 bootstraps in an unpredictable order. If you want to perform a reproducible computation, use substreams as described in “Running Reproducible Parallel Computations” on page 33-16. If you set the UseSubstreams option to true, then set the Streams option to a single random stream of the type that supports substreams ('mlfg6331_64' or 'mrg32k3a'). This setting gives reproducible computations.
Implement Jackknife Using Parallel Computing

This example is from the jackknife function reference page, but runs in parallel.

Generate sample data of size 10000 from a normal distribution with mean 0 and standard deviation 5.

sigma = 5;
rng('default')
y = normrnd(0,sigma,10000,1);
Run jackknife in parallel to estimate the variance. To do this, use statset to create the options structure and set the UseParallel field to true. opts = statset('UseParallel',true); m = jackknife(@var,y,1,'Options',opts);
Compare the known bias formula with the jackknife bias estimate.

n = length(y);
bias = -sigma^2/n % Known bias formula
jbias = (n-1)*(mean(m)-var(y,1)) % jackknife bias estimate

Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).

bias = -0.0025
jbias = -0.0025

Compare how long it takes to compute in serial and in parallel.

tic;m = jackknife(@var,y,1);toc % Serial computation

Elapsed time is 1.638026 seconds.

tic;m = jackknife(@var,y,1,'Options',opts);toc % Parallel computation

Elapsed time is 0.507961 seconds.
jackknife does not use random numbers, so gives the same results every time, whether run in parallel or serial.
Implement Cross-Validation Using Parallel Computing In this section... “Simple Parallel Cross Validation” on page 33-21 “Reproducible Parallel Cross Validation” on page 33-21
Simple Parallel Cross Validation

In this example, use crossval to compute a cross-validation estimate of mean-squared error for a regression model. Run the computations in parallel.

mypool = parpool()

Starting parpool using the 'local' profile ... connected to 2 workers.

mypool =

  Pool with properties:

    AttachedFiles: {0x1 cell}
       NumWorkers: 2
      IdleTimeout: 30
          Cluster: [1x1 parallel.cluster.Local]
     RequestQueue: [1x1 parallel.RequestQueue]
      SpmdEnabled: 1
opts = statset('UseParallel',true);
load('fisheriris');
y = meas(:,1);
X = [ones(size(y,1),1),meas(:,2:4)];
regf=@(XTRAIN,ytrain,XTEST)(XTEST*regress(ytrain,XTRAIN));
cvMse = crossval('mse',X,y,'Predfun',regf,'Options',opts)

cvMse = 0.1028

This simple example is not a good candidate for parallel computation:

% How long to compute in serial?
tic;cvMse = crossval('mse',X,y,'Predfun',regf);toc
Elapsed time is 0.073438 seconds.

% How long to compute in parallel?
tic;cvMse = crossval('mse',X,y,'Predfun',regf,...
    'Options',opts);toc
Elapsed time is 0.289585 seconds.
Reproducible Parallel Cross Validation

To run crossval in parallel in a reproducible fashion, set the options and reset the random stream appropriately (see “Running Reproducible Parallel Computations” on page 33-16).
mypool = parpool()

Starting parpool using the 'local' profile ... connected to 2 workers.

mypool =

  Pool with properties:

    AttachedFiles: {0x1 cell}
       NumWorkers: 2
      IdleTimeout: 30
          Cluster: [1x1 parallel.cluster.Local]
     RequestQueue: [1x1 parallel.RequestQueue]
      SpmdEnabled: 1
s = RandStream('mlfg6331_64');
opts = statset('UseParallel',true,...
    'Streams',s,'UseSubstreams',true);
load('fisheriris');
y = meas(:,1);
X = [ones(size(y,1),1),meas(:,2:4)];
regf=@(XTRAIN,ytrain,XTEST)(XTEST*regress(ytrain,XTRAIN));
cvMse = crossval('mse',X,y,'Predfun',regf,'Options',opts)

cvMse = 0.1020

Reset the stream:

reset(s)
cvMse = crossval('mse',X,y,'Predfun',regf,'Options',opts)

cvMse = 0.1020
Implement Bootstrap Using Parallel Computing In this section... “Bootstrap in Serial and Parallel” on page 33-23 “Reproducible Parallel Bootstrap” on page 33-24
Bootstrap in Serial and Parallel

Here is an example timing a bootstrap in parallel versus in serial. The example generates data from a mixture of two Gaussians, constructs a nonparametric estimate of the resulting data, and uses a bootstrap to get a sense of the sampling variability.

1. Generate the data:

   % Generate a random sample of size 1000,
   % from a mixture of two Gaussian distributions
   x = [randn(700,1); 4 + 2*randn(300,1)];

2. Construct a nonparametric estimate of the density from the data:

   latt = -4:0.01:12;
   myfun = @(X) ksdensity(X,latt);
   pdfestimate = myfun(x);

3. Bootstrap the estimate to get a sense of its sampling variability. Run the bootstrap in serial for timing comparison.

   tic;B = bootstrp(200,myfun,x);toc
   Elapsed time is 10.878654 seconds.

4. Run the bootstrap in parallel for timing comparison:

   mypool = parpool()
   Starting parpool using the 'local' profile ... connected to 2 workers.

   mypool =

     Pool with properties:

       AttachedFiles: {0x1 cell}
          NumWorkers: 2
         IdleTimeout: 30
             Cluster: [1x1 parallel.cluster.Local]
        RequestQueue: [1x1 parallel.RequestQueue]
         SpmdEnabled: 1

   opt = statset('UseParallel',true);
   tic;B = bootstrp(200,myfun,x,'Options',opt);toc
   Elapsed time is 6.304077 seconds.
Computing in parallel is nearly twice as fast as computing in serial for this example.

Overlay the ksdensity density estimate with the 200 bootstrapped estimates obtained in the parallel bootstrap. You can get a sense of how to assess the accuracy of the density estimate from this plot.
hold on
for i=1:size(B,1)
    plot(latt,B(i,:),'c:')
end
plot(latt,pdfestimate);
xlabel('x');ylabel('Density estimate')
Reproducible Parallel Bootstrap

To run the example in parallel in a reproducible fashion, set the options appropriately (see “Running Reproducible Parallel Computations” on page 33-16). First set up the problem and parallel environment as in “Bootstrap in Serial and Parallel” on page 33-23. Then set the options to use substreams along with a stream that supports substreams.

s = RandStream('mlfg6331_64'); % has substreams
opts = statset('UseParallel',true,...
    'Streams',s,'UseSubstreams',true);
B2 = bootstrp(200,myfun,x,'Options',opts);
To rerun the bootstrap and get the same result:

reset(s) % set the stream to initial state
B3 = bootstrp(200,myfun,x,'Options',opts);
isequal(B2,B3) % check if same results

ans = 1
34 Code Generation • “Introduction to Code Generation” on page 34-3 • “General Code Generation Workflow” on page 34-6 • “Code Generation for Prediction of Machine Learning Model at Command Line” on page 34-10 • “Code Generation for Incremental Learning” on page 34-14 • “Code Generation for Nearest Neighbor Searcher” on page 34-20 • “Code Generation for Prediction of Machine Learning Model Using MATLAB Coder App” on page 34-23 • “Code Generation and Classification Learner App” on page 34-32 • “Deploy Neural Network Regression Model to FPGA/ASIC Platform” on page 34-40 • “Predict Class Labels Using MATLAB Function Block” on page 34-49 • “Specify Variable-Size Arguments for Code Generation” on page 34-54 • “Create Dummy Variables for Categorical Predictors and Generate C/C++ Code” on page 34-59 • “System Objects for Classification and Code Generation” on page 34-63 • “Predict Class Labels Using Stateflow” on page 34-71 • “Human Activity Recognition Simulink Model for Smartphone Deployment” on page 34-75 • “Human Activity Recognition Simulink Model for Fixed-Point Deployment” on page 34-84 • “Code Generation for Prediction and Update Using Coder Configurer” on page 34-90 • “Code Generation for Probability Distribution Objects” on page 34-92 • “Fixed-Point Code Generation for Prediction of SVM” on page 34-97 • “Generate Code to Classify Data in Table” on page 34-110 • “Code Generation for Image Classification” on page 34-113 • “Predict Class Labels Using ClassificationSVM Predict Block” on page 34-121 • “Predict Responses Using RegressionSVM Predict Block” on page 34-125 • “Predict Class Labels Using ClassificationTree Predict Block” on page 34-131 • “Predict Responses Using RegressionTree Predict Block” on page 34-137 • “Predict Class Labels Using ClassificationEnsemble Predict Block” on page 34-140 • “Predict Responses Using RegressionEnsemble Predict Block” on page 34-147 • “Predict Class Labels Using ClassificationNeuralNetwork Predict Block” on page 34-154 • “Predict Responses Using RegressionNeuralNetwork Predict Block” on page 34-158 • “Predict Responses Using RegressionGP Predict Block” on page 34-162 • “Predict Class Labels Using ClassificationKNN Predict Block” on page 34-168 • “Predict Class Labels Using ClassificationLinear Predict Block” on page 34-174 • “Predict Responses Using RegressionLinear Predict Block” on page 34-178 • “Predict Class Labels Using ClassificationECOC Predict Block” on page 34-182 • “Predict Class Labels Using ClassificationNaiveBayes Predict Block” on page 34-187
• “Code Generation for Binary GLM Logistic Regression Model Trained in Classification Learner” on page 34-193 • “Code Generation for Anomaly Detection” on page 34-196 • “Compress Machine Learning Model for Memory-Limited Hardware” on page 34-202 • “Verify and Validate Machine Learning Models Using Model-Based Design” on page 34-217 • “Find Nearest Neighbors Using KNN Search Block” on page 34-237 • “Perform Incremental Learning Using IncrementalRegressionLinear Fit and Predict Blocks” on page 34-241 • “Perform Incremental Learning Using IncrementalClassificationLinear Fit and Predict Blocks” on page 34-245 • “Perform Incremental Learning and Track Performance Metrics Using Update Metrics Block” on page 34-249
Introduction to Code Generation In this section... “Code Generation Workflows” on page 34-3 “Code Generation Applications” on page 34-5 MATLAB Coder generates readable and portable C and C++ code from Statistics and Machine Learning Toolbox functions that support code generation. You can integrate the generated code into your projects as source code, static libraries, or dynamic libraries. You can also use the generated code within the MATLAB environment to accelerate computationally intensive portions of your MATLAB code. Generating C/C++ code requires MATLAB Coder and has the following limitations: • You cannot call any function at the top level when generating code by using codegen. Instead, call the function within an entry-point function, and then generate code from the entry-point function. The entry-point function, also known as the top-level or primary function, is a function you define for code generation. All functions within the entry-point function must support code generation. • The MATLAB Coder limitations also apply to Statistics and Machine Learning Toolbox for code generation. For details, see “MATLAB Language Features Supported for C/C++ Code Generation” (MATLAB Coder). • Code generation in Statistics and Machine Learning Toolbox does not support sparse matrices. • For the code generation usage notes and limitations for each function, see the Code Generation section on the function reference page. For a list of Statistics and Machine Learning Toolbox functions that support code generation, see Function List (C/C++ Code Generation).
Code Generation Workflows

You can generate C/C++ code for the Statistics and Machine Learning Toolbox functions in several ways.

• General code generation workflow for functions that are not the object functions of machine learning models
Define an entry-point function that calls the function that supports code generation, generate C/C++ code for the entry-point function by using codegen, and then verify the generated code. The entry-point function, also known as the top-level or primary function, is a function you define for code generation. Because you cannot call any function at the top level using codegen, you must define an entry-point function. All functions within the entry-point function must support code generation. For details, see “General Code Generation Workflow” on page 34-6.

• Code generation workflow for the object function of a machine learning model (including predict, random, knnsearch, rangesearch, isanomaly, and incremental learning object functions)
Save a trained model by using saveLearnerForCoder, and define an entry-point function that loads the saved model by using loadLearnerForCoder and calls the object function. Then generate code for the entry-point function by using codegen, and verify the generated code. The input arguments of the entry-point function cannot be classification or regression model objects. Therefore, you need to work around this limitation by using saveLearnerForCoder and loadLearnerForCoder.

You can also generate single-precision C/C++ code for the prediction of machine learning models for classification and regression. For single-precision code generation, specify the name-value pair argument 'Datatype','single' as an additional input to the loadLearnerForCoder function. For details, see these examples:

• “Code Generation for Prediction of Machine Learning Model at Command Line” on page 34-10
• “Code Generation for Incremental Learning” on page 34-14
• “Code Generation for Prediction of Machine Learning Model Using MATLAB Coder App” on page 34-23
• “Code Generation for Nearest Neighbor Searcher” on page 34-20
• “Code Generation and Classification Learner App” on page 34-32
• “Specify Variable-Size Arguments for Code Generation” on page 34-54

You can also generate fixed-point C/C++ code for the prediction of a support vector machine (SVM) model, a decision tree model, and an ensemble of decision trees for classification and regression. This type of code generation requires Fixed-Point Designer™.
Fixed-point code generation requires an additional step that defines the fixed-point data types of the variables required for prediction. Create a fixed-point data type structure by using the data type function generated by generateLearnerDataTypeFcn, and use the structure as an input argument of loadLearnerForCoder in an entry-point function. You can also optimize the fixed-point data types before generating code. For details, see “Fixed-Point Code Generation for Prediction of SVM” on page 34-97.

• Code generation workflow for the predict and update functions of a tree model, an SVM model, a linear model, or a multiclass error-correcting output codes (ECOC) classification model using SVM or linear binary learners
After training a model, create a coder configurer by using learnerCoderConfigurer, generate code by using generateCode, and then verify the generated code. You can configure code generation options and specify the coder attributes of the model parameters using object properties. After you retrain the model with new data or settings, you can update model parameters in the generated C/C++ code without having to regenerate the code. This feature reduces the effort required to regenerate, redeploy, and reverify C/C++ code. For details, see “Code Generation for Prediction and Update Using Coder Configurer” on page 34-90.
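The following sketch outlines this workflow at the command line. The variable names and the choice of an SVM classifier are illustrative assumptions, not part of the linked example.

Mdl = fitcsvm(X,Y);                          % Train a model that has a coder configurer
configurer = learnerCoderConfigurer(Mdl,X);  % Create the configurer from the model and example predictor data
generateCode(configurer)                     % Generate C/C++ code for the predict and update functions
% After retraining with new data, validate the retrained parameters for the generated code:
retrainedMdl = fitcsvm(Xnew,Ynew);
params = validatedUpdateInputs(configurer,retrainedMdl);
% Pass params to the update function in the generated code to refresh the deployed model
% without regenerating the code.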
Code Generation Applications

To integrate the prediction of a machine learning model into Simulink®, use a MATLAB Function block (see the sketch after these lists) or the Simulink blocks in the Statistics and Machine Learning Toolbox library. For details, see these examples:

• “Predict Class Labels Using MATLAB Function Block” on page 34-49
• “Predict Responses Using RegressionSVM Predict Block” on page 34-125
• “Predict Class Labels Using ClassificationSVM Predict Block” on page 34-121

Code generation for the Statistics and Machine Learning Toolbox functions also works with other toolboxes such as System object™ and Stateflow®, as described in these examples:

• “System Objects for Classification and Code Generation” on page 34-63
• “Predict Class Labels Using Stateflow” on page 34-71

For more applications of code generation, see these examples:

• “Code Generation for Image Classification” on page 34-113
• “Human Activity Recognition Simulink Model for Smartphone Deployment” on page 34-75
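A MATLAB Function block body typically follows the same load-and-predict pattern as a command-line entry-point function, as in this sketch. The function name and the model file name SVMModel.mat are assumptions; a persistent variable avoids reloading the model at every simulation step.

function label = predictInFunctionBlock(x) %#codegen
% Illustrative MATLAB Function block body (assumed names).
persistent mdl
if isempty(mdl)
    mdl = loadLearnerForCoder('SVMModel');   % Load the saved model once
end
label = predict(mdl,x);                      % Predict the label for the current input
end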
See Also codegen | saveLearnerForCoder | loadLearnerForCoder | learnerCoderConfigurer | generateLearnerDataTypeFcn
Related Examples
• “Get Started with MATLAB Coder” (MATLAB Coder)
General Code Generation Workflow

The general code generation workflow for the Statistics and Machine Learning Toolbox functions that are not the object functions of machine learning models is the same as the workflow described in MATLAB Coder. For details, see “Get Started with MATLAB Coder” (MATLAB Coder). To learn how to generate code for the object functions of machine learning models, see “Introduction to Code Generation” on page 34-3. This example briefly explains the general code generation workflow as summarized in this flow chart:
Define Entry-Point Function

An entry-point function, also known as the top-level or primary function, is a function you define for code generation. Because you cannot call any function at the top level using codegen, you must define an entry-point function that calls code-generation-enabled functions, and generate C/C++ code for the entry-point function by using codegen. All functions within the entry-point function must support code generation.

Add the %#codegen compiler directive (or pragma) to the entry-point function after the function signature to indicate that you intend to generate code for the MATLAB algorithm. Adding this directive instructs the MATLAB Code Analyzer to help you diagnose and fix violations that would cause errors during code generation. See “Check Code with the Code Analyzer” (MATLAB Coder).

For example, to generate code that estimates the interquartile range of a data set using iqr, define this function.

function r = iqrCodeGen(x) %#codegen
%IQRCODEGEN Estimate interquartile range
%   iqrCodeGen returns the interquartile range of the data x,
%   a single- or double-precision vector.
r = iqr(x);
end
You can allow for optional input arguments by specifying varargin as an input argument. For details, see “Code Generation for Variable Length Argument Lists” (MATLAB Coder) and “Specify Variable-Size Arguments for Code Generation” on page 34-54.
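For example, a sketch of an entry-point function that forwards optional arguments to iqr might look like this. The function name is an assumption; iqr accepts an optional dimension argument that varargin can carry through.

function r = iqrCodeGenDim(x,varargin) %#codegen
% Illustrative variant (assumed name) that forwards optional arguments,
% such as a dimension, to iqr.
r = iqr(x,varargin{:});
end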
Generate Code

Set Up Compiler

To generate C/C++ code, you must have access to a compiler that is configured properly. MATLAB Coder locates and uses a supported, installed compiler. To view and change the default C compiler, enter:

mex -setup
For more details, see “Change Default Compiler”.
Generate Code Using codegen

After setting up your compiler, generate code for the entry-point function by using codegen or the MATLAB Coder app. To learn how to generate code using the MATLAB Coder app, see “Generate MEX Functions by Using the MATLAB Coder App” (MATLAB Coder). To generate code at the command line, use codegen.

Because C and C++ are statically typed languages, you must determine the properties of all variables in the entry-point function at compile time. Specify the data types and sizes of all inputs of the entry-point function when you call codegen by using the -args option.

• To specify the data type and exact input array size, pass a MATLAB expression that represents the set of values with a certain data type and array size. For example, to specify that the generated code from iqrCodeGen.m must accept a double-precision numeric column vector with 100 elements, enter:

testX = randn(100,1);
codegen iqrCodeGen -args {testX} -report
The -report flag generates a code generation report. See “Code Generation Reports” (MATLAB Coder).

• To specify that at least one of the dimensions can have any length, use the -args option with coder.typeof as follows.

-args {coder.typeof(example_value, size_vector, variable_dims)}
The values of example_value, size_vector, and variable_dims specify the properties of the input array that the generated code can accept.

• An input array has the same data type as the example values in example_value.
• size_vector is the array size of an input array if the corresponding variable_dims value is false.
• size_vector is the upper bound of the array size if the corresponding variable_dims value is true.
• variable_dims specifies whether each dimension of the array has a variable size or a fixed size. A value of true (logical 1) means that the corresponding dimension has a variable size; a value of false (logical 0) means that the corresponding dimension has a fixed size.

Specifying a variable-size input is convenient when you have data with an unknown number of observations at compile time. For example, to specify that the generated code from iqrCodeGen.m can accept a double-precision numeric column vector of any length, enter:

testX = coder.typeof(0,[Inf,1],[1,0]);
codegen iqrCodeGen -args {testX} -report
0 for the example_value value implies that the data type is double because double is the default numeric data type of MATLAB. [Inf,1] for the size_vector value and [1,0] for the variable_dims value imply that the size of the first dimension is variable and unbounded, and the size of the second dimension is fixed to be 1.

Note Specification of variable-size inputs can affect performance. For details, see “Control Memory Allocation for Variable-Size Arrays” (MATLAB Coder).
• To specify a character array, such as supported name-value pair arguments, specify the character array as a constant using coder.Constant. For example, suppose that 'Name' is a valid name-value pair argument for iqrCodeGen.m, and the corresponding value, value, is numeric. Then enter:

codegen iqrCodeGen -args {testX,coder.Constant('Name'),value} -report
For more details, see “Generate C Code at the Command Line” (MATLAB Coder) and “Specify Properties of Entry-Point Function Inputs” (MATLAB Coder).

Build Type

MATLAB Coder can generate code for these types:

• MEX (MATLAB Executable) function
• Standalone C/C++ code
• Standalone C/C++ code compiled to a static library
• Standalone C/C++ code compiled to a dynamically linked library
• Standalone C/C++ code compiled to an executable

You can specify the build type using the -config option of codegen. For more details on setting code generation options, see “Configure Build Settings” (MATLAB Coder).

By default, codegen generates a MEX function. A MEX function is a C/C++ program that is executable from MATLAB. You can use a MEX function to accelerate MATLAB algorithms and to test the generated code for functionality and run-time issues. For details, see “MATLAB Algorithm Acceleration” (MATLAB Coder) and “Why Test MEX Functions in MATLAB?” (MATLAB Coder).

Code Generation Report

You can use the -report flag to produce a code generation report. This report helps you debug code generation issues and view the generated C/C++ code. For details, see “Code Generation Reports” (MATLAB Coder).
Verify Generated Code

Test a MEX function to verify that the generated code provides the same functionality as the original MATLAB code. To perform this test, run the MEX function using the same inputs that you used to run the original MATLAB code, and then compare the results. Running the MEX function in MATLAB before generating standalone code also enables you to detect and fix run-time errors that are much harder to diagnose in the generated standalone code. For more details, see “Why Test MEX Functions in MATLAB?” (MATLAB Coder).

Pass some data to verify whether iqr, iqrCodeGen, and iqrCodeGen_mex return the same interquartile range.

testX = randn(100,1);
r = iqr(testX);
r_entrypoint = iqrCodeGen(testX);
r_mex = iqrCodeGen_mex(testX);
Compare the outputs by using isequal.

isequal(r,r_entrypoint,r_mex)
isequal returns logical 1 (true) if all the inputs are equal. You can also verify the MEX function using a test file and coder.runTest. For details, see “Testing Code Generated from MATLAB Code” (MATLAB Coder).
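For example, if you collect the verification commands in a test file (the file name iqrCodeGen_test.m is an assumption), a sketch of the coder.runTest call looks like this:

% Assumed contents of iqrCodeGen_test.m:
%   testX = randn(100,1);
%   assert(isequal(iqr(testX),iqrCodeGen(testX)))
coder.runTest('iqrCodeGen_test','iqrCodeGen')   % Run the test, substituting the MEX function for iqrCodeGen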
See Also codegen
More About
• “Introduction to Code Generation” on page 34-3
• “Code Generation for Probability Distribution Objects” on page 34-92
• Function List (C/C++ Code Generation)
Code Generation for Prediction of Machine Learning Model at Command Line

This example shows how to generate code for the prediction of classification and regression model objects at the command line. You can also generate code using the MATLAB® Coder™ app. See “Code Generation for Prediction of Machine Learning Model Using MATLAB Coder App” on page 34-23 for details.

Certain classification and regression model objects have a predict or random function that supports code generation. Prediction using these object functions requires a trained classification or regression model object, but the -args option of codegen (MATLAB Coder) does not accept these objects. Work around this limitation by using saveLearnerForCoder and loadLearnerForCoder as described in this example. This flow chart shows the code generation workflow for the object functions of classification and regression model objects.
After you train a model, save the trained model by using saveLearnerForCoder. Define an entry-point function that loads the saved model by using loadLearnerForCoder and calls the object function. Then generate code for the entry-point function by using codegen, and verify the generated code.

Train Classification Model

Train a classification model object equipped with a code-generation-enabled predict function. In this case, train a support vector machine (SVM) classification model.

load fisheriris
inds = ~strcmp(species,'setosa');
X = meas(inds,3:4);
Y = species(inds);
Mdl = fitcsvm(X,Y);
Mdl is a linear SVM model. The predictor coefficients in a linear SVM model provide enough information to predict labels for new observations. Removing the support vectors reduces memory usage in the generated code. Remove the support vectors from the linear SVM model by using the discardSupportVectors function.

Mdl = discardSupportVectors(Mdl);
This step can include data preprocessing, feature selection, and optimizing the model using cross-validation, for example.

Save Model Using saveLearnerForCoder

Save the classification model to the file SVMModel.mat by using saveLearnerForCoder.

saveLearnerForCoder(Mdl,'SVMModel');
saveLearnerForCoder saves the classification model to the MATLAB binary file SVMModel.mat as a structure array in the current folder.

Define Entry-Point Function

An entry-point function, also known as the top-level or primary function, is a function you define for code generation. Because you cannot call any function at the top level using codegen, you must define an entry-point function that calls code-generation-enabled functions, and generate C/C++ code for the entry-point function by using codegen. All functions within the entry-point function must support code generation.

Define an entry-point function that returns predicted labels for input predictor data. Within the function, load the trained classification model by using loadLearnerForCoder, and then pass the loaded model to predict. In this case, define the predictLabelsSVM function, which predicts labels using the SVM model Mdl.

function label = predictLabelsSVM(x) %#codegen
%PREDICTLABELSSVM Label new observations using trained SVM model Mdl
%   predictLabelsSVM predicts the vector of labels label using
%   the saved SVM model Mdl and the predictor data x.
Mdl = loadLearnerForCoder('SVMModel');
label = predict(Mdl,x);
end
Add the %#codegen compiler directive (or pragma) to the entry-point function after the function signature to indicate that you intend to generate code for the MATLAB algorithm. Adding this directive instructs the MATLAB Code Analyzer to help you diagnose and fix violations that would result in errors during code generation. See “Check Code with the Code Analyzer” (MATLAB Coder).

Note: If you click the button located in the upper-right section of this page and open this example in MATLAB®, then MATLAB® opens the example folder. This folder includes the entry-point function file.

Generate Code

Set Up Compiler

To generate C/C++ code, you must have access to a C/C++ compiler that is configured properly. MATLAB Coder locates and uses a supported, installed compiler. You can use mex -setup to view and change the default compiler. For more details, see “Change Default Compiler”.

Generate Code Using codegen

Generate code for the entry-point function using codegen (MATLAB Coder). Because C and C++ are statically typed languages, you must determine the properties of all variables in the entry-point function at compile time. Specify the data types and sizes of all inputs of the entry-point function when you call codegen by using the -args option. In this case, pass X as a value of the -args option to specify that the generated code must accept an input that has the same data type and array size as the training data X.

codegen predictLabelsSVM -args {X}

Code generation successful.
If the number of observations is unknown at compile time, you can also specify the input as variable-size by using coder.typeof (MATLAB Coder). For details, see “Specify Variable-Size Arguments for
Code Generation” on page 34-54 and “Specify Properties of Entry-Point Function Inputs” (MATLAB Coder).

Build Type

MATLAB Coder can generate code for the following build types:

• MEX (MATLAB Executable) function
• Standalone C/C++ code
• Standalone C/C++ code compiled to a static library
• Standalone C/C++ code compiled to a dynamically linked library
• Standalone C/C++ code compiled to an executable

You can specify the build type using the -config option of codegen (MATLAB Coder). For more details on setting code generation options, see the -config option of codegen (MATLAB Coder) and “Configure Build Settings” (MATLAB Coder).

By default, codegen generates a MEX function. A MEX function is a C/C++ program that is executable from MATLAB. You can use a MEX function to accelerate MATLAB algorithms and to test the generated code for functionality and run-time issues. For details, see “MATLAB Algorithm Acceleration” (MATLAB Coder) and “Why Test MEX Functions in MATLAB?” (MATLAB Coder).

Code Generation Report

You can use the -report flag to produce a code generation report. This report helps you debug code generation issues and view the generated C/C++ code. For details, see “Code Generation Reports” (MATLAB Coder).

Verify Generated Code

Test a MEX function to verify that the generated code provides the same functionality as the original MATLAB code. To perform this test, run the MEX function using the same inputs that you used to run the original MATLAB code, and then compare the results. Running the MEX function in MATLAB before generating standalone code also enables you to detect and fix run-time errors that are much harder to diagnose in the generated standalone code. For more details, see “Why Test MEX Functions in MATLAB?” (MATLAB Coder).

Pass some predictor data to verify whether predict, predictLabelsSVM, and the MEX function return the same labels.

labels1 = predict(Mdl,X);
labels2 = predictLabelsSVM(X);
labels3 = predictLabelsSVM_mex(X);
Compare the predicted labels by using isequal.

verifyMEX = isequal(labels1,labels2,labels3)

verifyMEX = logical
   1
isequal returns logical 1 (true), which means all the inputs are equal. The comparison confirms that the predict function, predictLabelsSVM function, and MEX function return the same labels.
See Also codegen | saveLearnerForCoder | loadLearnerForCoder | learnerCoderConfigurer
Related Examples
• “Introduction to Code Generation” on page 34-3
• “Code Generation for Prediction of Machine Learning Model Using MATLAB Coder App” on page 34-23
• “Code Generation for Prediction and Update Using Coder Configurer” on page 34-90
• “Code Generation and Classification Learner App” on page 34-32
• “Specify Variable-Size Arguments for Code Generation” on page 34-54
• Function List (C/C++ Code Generation)
Code Generation for Incremental Learning

This example shows how to generate code that implements incremental learning for binary linear classification. To motivate its purpose, consider training a wearable device tasked to determine whether the wearer is idle or moving, based on sensory features the device reads. The generated code performs the following tasks, as defined in entry-point functions:

1. Load a configured incremental learning model template created at the command line.
2. Track performance metrics on the incoming batch of data from a data stream. This example tracks misclassification rate and hinge loss.
3. Update the model by fitting the incremental model to the batch of data.
4. Predict labels for the batch of data.

This example generates code from the MATLAB® command line, but you can generate code using the MATLAB Coder™ app instead. For more details, see “Code Generation for Prediction of Machine Learning Model Using MATLAB Coder App” on page 34-23.

All incremental learning object functions for binary linear classification (and also linear regression) support code generation. To generate code for incremental learning, the object functions require an appropriately configured incremental learning model object, but the -args option of codegen (MATLAB Coder) does not accept these objects. To work around this limitation, use the saveLearnerForCoder and loadLearnerForCoder functions. This flow chart shows the code generation workflows for the incremental learning object functions for linear models.
The flow chart suggests two distinct but merging workflows.

• The workflow beginning with Train Model > Convert Model requires data, in which case you can optionally perform feature selection or optimize the model by performing cross-validation before generating code for incremental learning.
• The workflow beginning with Configure Model does not require data. Instead, you must manually configure an incremental learning model object.

For details on the differences between the workflows, and for help deciding which one to use, see “Configure Incremental Learning Model” on page 28-14.

Regardless of the workflow you choose, the resulting incremental learning model must have all the following qualities:
• The NumPredictors property reflects the number of predictors in the predictor data during incremental learning.
• For classification, the ClassNames property must contain all class names expected during incremental learning.

If you choose the Train Model > Convert Model workflow and you fit the model to data containing all known classes, the model is configured for code generation.

After you prepare an incremental learning model, save the model object by using saveLearnerForCoder. Then, define an entry-point function that loads the saved model by using loadLearnerForCoder, and that performs incremental learning by calling the object functions. Alternatively, you can define multiple entry-point functions that perform the stages of incremental learning separately (this example uses this workflow). However, this workflow requires special treatment when an updated model object is an input to another entry-point function. For example, you write the following three entry-point functions:

• A function that accepts the current model and a batch of data, calls updateMetrics, and returns a model with updated performance metrics.
• A function that accepts the updated model and the batch of data, calls fit, and returns a model with updated coefficients.
• A function that accepts the further updated model and the batch of predictor data, calls predict, and returns predicted labels.

Finally, generate code for the entry-point functions by using codegen, and verify the generated code.

Load and Preprocess Data

Load the human activity data set. Randomly shuffle the data.

load humanactivity
rng(1); % For reproducibility
n = numel(actid);
p = size(feat,2);
idx = randsample(n,n);
X = feat(idx,:);
actid = actid(idx);
For details on the data set, enter Description at the command line. Responses can be one of five classes: Sitting, Standing, Walking, Running, or Dancing. Dichotomize the response by identifying whether the subject is idle (actid <= 2).

classnames = categorical(["Idle" "NotIdle"]);
Y = repmat(classnames(1),n,1);
Y(actid > 2) = classnames(2);
Configure Incremental Learning Model

To generate code for incremental classification, you must appropriately configure a binary classification linear model for incremental learning, incrementalClassificationLinear.

Create a binary classification (SVM) model for incremental learning. Fully configure the model for code generation by specifying all expected class names and the number of predictor variables. Also,
specify tracking the misclassification rate and hinge loss. For reproducibility, this example turns off observation shuffling for the scale-invariant solver.

metrics = ["classiferror" "hinge"];
IncrementalMdl = incrementalClassificationLinear('ClassNames',classnames,'NumPredictors',p,...
    'Shuffle',false,'Metrics',metrics)

IncrementalMdl =
  incrementalClassificationLinear

            IsWarm: 0
           Metrics: [2x2 table]
        ClassNames: [Idle    NotIdle]
    ScoreTransform: 'none'
              Beta: [60x1 double]
              Bias: 0
           Learner: 'svm'
IncrementalMdl is an incrementalClassificationLinear model object configured for code generation. IncrementalMdl is cold (IncrementalMdl.IsWarm is 0) because it has not processed data—the coefficients are 0. Alternatively, because the data is available, you can fit an SVM model to the data by using either fitcsvm or fitclinear, and then convert the resulting model to an incremental learning model by passing the model to incrementalLearner. The resulting model is warm because it has processed data—its coefficients are likely non-zero.

Save Model Using saveLearnerForCoder

Save the incremental learning model to the file InitialMdl.mat by using saveLearnerForCoder.

saveLearnerForCoder(IncrementalMdl,'InitialMdl');
saveLearnerForCoder saves the incremental learning model to the MATLAB binary file InitialMdl.mat as structure arrays in the current folder.

Define Entry-Point Functions

An entry-point function, also known as the top-level or primary function, is a function you define for code generation. Because you cannot call any function at the top level using codegen, you must define an entry-point function that calls code-generation-enabled functions, and generate C/C++ code for the entry-point function by using codegen. All functions within the entry-point function must support code generation.

Define four separate entry-point functions in your current folder that perform the following actions:

• myInitialModelIncrLearn.m — Load the saved model by using loadLearnerForCoder, and return a model of the same form for code generation. This entry-point function facilitates the use of a model, returned by an entry-point function, as an input to another entry-point function.
• myUpdateMetricsIncrLearn.m — Measure the performance of the current model on an incoming batch of data, and store the performance metrics in the model. The function accepts the current model, and predictor and response data, and returns an updated model.
• myFitIncrLearn.m — Fit the current model to the incoming batch of data, and store the updated coefficients in the model. The function accepts the current model, and predictor and response data, and returns an updated model.
• myPredictIncrLearn.m — Predict labels for the incoming batch of data using the current model. The function accepts the current model and predictor data, and returns labels and class scores.

For more details on generating code for multiple entry-point functions, see “Generate Code for Multiple Entry-Point Functions” (MATLAB Coder).

Add the %#codegen compiler directive (or pragma) to the entry-point function after the function signature to indicate that you intend to generate code for the MATLAB algorithm. Adding this directive instructs the MATLAB Code Analyzer to help you diagnose and fix violations that would result in errors during code generation. See “Check Code with the Code Analyzer” (MATLAB Coder).

Alternatively, you can access the functions in mlr/examples/stats/main, where mlr is the value of matlabroot. Display the body of each function.

type myInitialModelIncrLearn.m

function incrementalModel = myInitialModelIncrLearn() %#codegen
% MYINITIALMODELINCRLEARN Load and return configured linear model for
% binary classification InitialMdl
incrementalModel = loadLearnerForCoder('InitialMdl');
end

type myUpdateMetricsIncrLearn.m

function incrementalModel = myUpdateMetricsIncrLearn(incrementalModel,X,Y) %#codegen
% MYUPDATEMETRICSINCRLEARN Measure model performance metrics on new data
incrementalModel = updateMetrics(incrementalModel,X,Y);
end

type myFitIncrLearn.m

function incrementalModel = myFitIncrLearn(incrementalModel,X,Y) %#codegen
% MYFITINCRLEARN Fit model to new data
incrementalModel = fit(incrementalModel,X,Y);
end

type myPredictIncrLearn.m

function [labels,scores] = myPredictIncrLearn(incrementalModel,X) %#codegen
% MYPREDICTINCRLEARN Predict labels and classification scores on new data
[labels,scores] = predict(incrementalModel,X);
end
Generate Code

Set Up Compiler

To generate C/C++ code, you must have access to a C/C++ compiler that is configured properly. MATLAB Coder locates and uses a supported, installed compiler. You can use mex -setup to view and change the default compiler. For more details, see “Change Default Compiler”.

Build Type

MATLAB Coder can generate code for the following build types:
• MEX (MATLAB Executable) function
• Standalone C/C++ code
• Standalone C/C++ code compiled to a static library
• Standalone C/C++ code compiled to a dynamically linked library
• Standalone C/C++ code compiled to an executable

You can specify the build type using the -config option of codegen (MATLAB Coder). For more details on setting code generation options, see the -config option of codegen (MATLAB Coder) and “Configure Build Settings” (MATLAB Coder).

By default, codegen generates a MEX function. A MEX function is a C/C++ program that is executable from MATLAB. You can use a MEX function to accelerate MATLAB algorithms and to test the generated code for functionality and run-time issues. For details, see “MATLAB Algorithm Acceleration” (MATLAB Coder) and “Why Test MEX Functions in MATLAB?” (MATLAB Coder).

Generate Code Using codegen

Because C and C++ are statically typed languages, you must specify the properties of all variables in the entry-point function at compile time. Specify the following:

• The data types of the data inputs of the entry-point functions by using coder.typeof (MATLAB Coder). Also, because the number of observations can vary from batch to batch, specify that the number of observations (first dimension) has variable size. For details, see “Specify Variable-Size Arguments for Code Generation” on page 34-54 and “Specify Properties of Entry-Point Function Inputs” (MATLAB Coder).
• Because several entry-point functions accept an incremental model object as an input, and operate on it, create a representation of the model object for code generation by using coder.OutputType (MATLAB Coder). For more details, see “Pass an Entry-Point Function Output as an Input” (MATLAB Coder).

predictorData = coder.typeof(X,[],[true false]);
responseData = coder.typeof(Y,[],true);
IncrMdlOutputType = coder.OutputType('myInitialModelIncrLearn');
Generate code for the entry-point functions using codegen (MATLAB Coder). For each entry-point function argument, use the -args flags to specify the coder representations of the variables. Specify the output MEX function name myIncrLearn_mex.

codegen -o myIncrLearn_mex ...
    myInitialModelIncrLearn ...
    myUpdateMetricsIncrLearn -args {IncrMdlOutputType,predictorData,responseData} ...
    myFitIncrLearn -args {IncrMdlOutputType,predictorData,responseData} ...
    myPredictIncrLearn -args {IncrMdlOutputType,predictorData} -report
Code generation successful: To view the report, open('codegen\mex\myIncrLearn_mex\html\report.mld
For help debugging code generation issues, view the generated C/C++ code by clicking View report (see “Code Generation Reports” (MATLAB Coder)).

Verify Generated Code

Test the MEX function to verify that the generated code provides the same functionality as the original MATLAB code. To perform this test, run the MEX function using the same inputs that you used to run the original MATLAB code, and then compare the results. Running the MEX function in
MATLAB before generating standalone code also enables you to detect and fix run-time errors that are much harder to diagnose in the generated standalone code. For more details, see “Why Test MEX Functions in MATLAB?” (MATLAB Coder).

Perform incremental learning by using the generated MEX functions and directly by using the object functions. Specify a batch.

% Preallocation
numObsPerChunk = 50;
nchunk = floor(n/numObsPerChunk);
ce = array2table(zeros(nchunk,2),'VariableNames',["Cumulative" "Window"]);
hinge = ce;
ceCG = ce;
hingeCG = ce;
IncrementalMdlCG = myIncrLearn_mex('myInitialModelIncrLearn');
scores = zeros(n,2);
scoresCG = zeros(n,2);

% Incremental fitting
for j = 1:nchunk
    ibegin = min(n,numObsPerChunk*(j-1) + 1);
    iend = min(n,numObsPerChunk*j);
    idx = ibegin:iend;
    IncrementalMdl = updateMetrics(IncrementalMdl,X(idx,:),Y(idx));
    ce{j,:} = IncrementalMdl.Metrics{"ClassificationError",:};
    hinge{j,:} = IncrementalMdl.Metrics{"HingeLoss",:};
    IncrementalMdlCG = myIncrLearn_mex('myUpdateMetricsIncrLearn',IncrementalMdlCG,...
        X(idx,:),Y(idx));
    ceCG{j,:} = IncrementalMdlCG.Metrics{"ClassificationError",:};
    hingeCG{j,:} = IncrementalMdlCG.Metrics{"HingeLoss",:};
    IncrementalMdl = fit(IncrementalMdl,X(idx,:),Y(idx));
    IncrementalMdlCG = myIncrLearn_mex('myFitIncrLearn',IncrementalMdlCG,X(idx,:),Y(idx));
    [~,scores(idx,:)] = predict(IncrementalMdl,X(idx,:));
    [~,scoresCG(idx,:)] = myIncrLearn_mex('myPredictIncrLearn',IncrementalMdlCG,X(idx,:));
end
Compare the cumulative metrics and scores for classifying Idle returned by the object functions and MEX functions.

idx = all(~isnan(ce.Variables),2);
areCEsEqual = norm(ce.Cumulative(idx) - ceCG.Cumulative(idx))

areCEsEqual = 8.9904e-18

idx = all(~isnan(hinge.Variables),2);
areHingeLossesEqual = norm(hinge.Cumulative(idx) - hingeCG.Cumulative(idx))

areHingeLossesEqual = 9.5220e-17

areScoresEqual = norm(scores(:,1) - scoresCG(:,1))

areScoresEqual = 8.7996e-13
The differences between the returned quantities are negligible.
Code Generation for Nearest Neighbor Searcher

The object functions knnsearch and rangesearch of the nearest neighbor searcher objects, ExhaustiveSearcher and KDTreeSearcher, support code generation. This example shows how to generate code for finding the nearest neighbor using an exhaustive searcher object at the command line. The example shows two different ways to generate code, depending on the way you use the object: load the object by using loadLearnerForCoder in an entry-point function, and pass a compile-time constant object to the generated code.

Train Exhaustive Nearest Neighbor Searcher

Load Fisher's iris data set.

load fisheriris
Remove five irises randomly from the predictor data to use as a query set.

rng('default'); % For reproducibility
n = size(meas,1); % Sample size
qIdx = randsample(n,5); % Indices of query data
X = meas(~ismember(1:n,qIdx),:);
Y = meas(qIdx,:);
Prepare an exhaustive nearest neighbor searcher using the training data. Specify the 'Distance' and 'P' name-value pair arguments to use the Minkowski distance with an exponent of 1 for finding the nearest neighbor.

Mdl = ExhaustiveSearcher(X,'Distance','minkowski','P',1);
Find the index of the training data (X) that is the nearest neighbor of each point in the query data (Y).

Idx = knnsearch(Mdl,Y);
Generate Code Using saveLearnerForCoder and loadLearnerForCoder

Generate code that loads an exhaustive searcher, takes query data as an input argument, and then finds the nearest neighbor. Save the exhaustive searcher to a file using saveLearnerForCoder.

saveLearnerForCoder(Mdl,'searcherModel')
saveLearnerForCoder saves the model to the MATLAB® binary file searcherModel.mat as a structure array in the current folder.

Define the entry-point function myknnsearch1 that takes query data as an input argument. Within the function, load the searcher object by using loadLearnerForCoder, and then pass the loaded model to knnsearch.

type myknnsearch1.m % Display contents of myknnsearch1.m file

function idx = myknnsearch1(Y) %#codegen
Mdl = loadLearnerForCoder('searcherModel');
idx = knnsearch(Mdl,Y);
end
Note: If you click the button located in the upper-right section of this page and open this example in MATLAB, then MATLAB opens the example folder. This folder includes the entry-point function files, myknnsearch1.m, myknnsearch2.m, and myknnsearch3.m.

Generate code for myknnsearch1 by using codegen (MATLAB Coder). Specify the data type and dimension of the input argument by using coder.typeof (MATLAB Coder) so that the generated code accepts a variable-size array.

codegen myknnsearch1 -args {coder.typeof(Y,[Inf,4],[1,0])}

Code generation successful.
For a more detailed code generation example that uses saveLearnerForCoder and loadLearnerForCoder, see “Code Generation for Prediction of Machine Learning Model at Command Line” on page 34-10. For more details about specifying variable-size arguments, see “Specify Variable-Size Arguments for Code Generation” on page 34-54.

Pass the query data (Y) to verify that myknnsearch1 and the MEX file return the same indices.

myIdx1 = myknnsearch1(Y);
myIdx1_mex = myknnsearch1_mex(Y);
Compare myIdx1 and myIdx1_mex by using isequal.

verifyMEX1 = isequal(Idx,myIdx1,myIdx1_mex)

verifyMEX1 = logical
   1
isequal returns logical 1 (true) if all the inputs are equal. This comparison confirms that myknnsearch1 and the MEX file return the same results.

Generate Code with Constant Folded Model Object

Nearest neighbor searcher objects can be an input argument of a function you define for code generation. The -args option of codegen (MATLAB Coder) accepts a compile-time constant searcher object.

Define the entry-point function myknnsearch2 that takes both an exhaustive searcher model and query data as input arguments instead of loading the model in the function.

type myknnsearch2.m % Display contents of myknnsearch2.m file

function idx = myknnsearch2(Mdl,Y) %#codegen
idx = knnsearch(Mdl,Y);
end
To generate code that takes the model object as well as the query data, designate the model object as a compile-time constant by using coder.Constant (MATLAB Coder) and include the constant folded model object in the -args value of codegen.

codegen myknnsearch2 -args {coder.Constant(Mdl),coder.typeof(Y,[Inf,4],[1,0])}

Code generation successful.
The code generation workflow with a constant folded model object follows the general code generation workflow. For details, see “General Code Generation Workflow” on page 34-6.
Verify that myknnsearch2 and the MEX file return the same results.

myIdx2 = myknnsearch2(Mdl,Y);
myIdx2_mex = myknnsearch2_mex(Mdl,Y);
verifyMEX2 = isequal(Idx,myIdx2,myIdx2_mex)

verifyMEX2 = logical
   1
Generate Code with Name-Value Pair Arguments

Define the entry-point function myknnsearch3 that takes a model object, query data, and name-value pair arguments. You can allow for optional name-value arguments by specifying varargin as an input argument. For details, see “Code Generation for Variable Length Argument Lists” (MATLAB Coder).

type myknnsearch3.m % Display contents of myknnsearch3.m file

function idx = myknnsearch3(Mdl,Y,varargin) %#codegen
idx = knnsearch(Mdl,Y,varargin{:});
end
To generate code that allows a user-defined exponent for the Minkowski distance, include {coder.Constant('P'),0} in the -args value of codegen. Use coder.Constant (MATLAB Coder) because the name of a name-value pair argument must be a compile-time constant.
codegen myknnsearch3 -args {coder.Constant(Mdl),coder.typeof(Y,[Inf,4],[1,0]),coder.Constant('P'),0}

Code generation successful.
Verify that myknnsearch3 and the MEX file return the same results.

newIdx = knnsearch(Mdl,Y,'P',2);
myIdx3 = myknnsearch3(Mdl,Y,'P',2);
myIdx3_mex = myknnsearch3_mex(Mdl,Y,'P',2);
verifyMEX3 = isequal(newIdx,myIdx3,myIdx3_mex)

verifyMEX3 = logical
   1
See Also codegen | saveLearnerForCoder | loadLearnerForCoder | knnsearch | rangesearch | ExhaustiveSearcher | KDTreeSearcher
Related Examples
• “Introduction to Code Generation” on page 34-3
• “General Code Generation Workflow” on page 34-6
• “Code Generation for Prediction of Machine Learning Model at Command Line” on page 34-10
• “Specify Variable-Size Arguments for Code Generation” on page 34-54
Code Generation for Prediction of Machine Learning Model Using MATLAB Coder App

This example shows how to generate C/C++ code for the prediction of classification and regression model objects by using the MATLAB® Coder™ app. You can also generate code at the command line using codegen (MATLAB Coder). See “Code Generation for Prediction of Machine Learning Model at Command Line” on page 34-10 for details.

Certain classification and regression model objects have a predict or random function that supports code generation. Prediction using these object functions requires a trained classification or regression model object, but an entry-point function for code generation cannot have these objects as input variables. Work around this limitation by using saveLearnerForCoder and loadLearnerForCoder as described in this example. This flow chart shows the code generation workflow for the object functions of classification and regression model objects.
In this example, you train a classification ensemble model using k-nearest-neighbor weak learners and save the trained model by using saveLearnerForCoder. Then, define an entry-point function that loads the saved model by using loadLearnerForCoder and calls the object function. Write a script to test the entry-point function. Finally, generate code by using the MATLAB Coder app and verify the generated code.

Train Classification Model

Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g').

load ionosphere
Train a classification ensemble model with k-nearest-neighbor weak learners by using the random subspace method. For details of classifications that use a random subspace ensemble, see “Random Subspace Classification” on page 19-106.

rng('default') % For reproducibility
learner = templateKNN('NumNeighbors',2);
Mdl = fitcensemble(X,Y,'Method','Subspace','NPredToSample',5, ...
    'Learners',learner,'NumLearningCycles',13);
Save Model Using saveLearnerForCoder

Save the trained ensemble model to a file named knnEnsemble.mat in your current folder.

saveLearnerForCoder(Mdl,'knnEnsemble')
saveLearnerForCoder makes the full classification model Mdl compact, and then saves it to the MATLAB binary file knnEnsemble.mat as a structure array in the current folder.
Define Entry-Point Function

An entry-point function, also known as the top-level or primary function, is a function you define for code generation. You must define an entry-point function that calls code-generation-enabled functions and generate C/C++ code from the entry-point function. All functions within the entry-point function must support code generation.

In a new file in your current folder, define an entry-point function named myknnEnsemblePredict that does the following:

• Accept input data (X), the file name of the saved model (fileName), and valid name-value pair arguments of the predict function (varargin).
• Load a trained ensemble model by using loadLearnerForCoder.
• Predict labels and corresponding scores from the loaded model.

You can allow for optional name-value arguments by specifying varargin as an input argument. For details, see “Code Generation for Variable Length Argument Lists” (MATLAB Coder).

type myknnEnsemblePredict.m % Display the contents of myknnEnsemblePredict.m file.

function [label,score] = myknnEnsemblePredict(X,fileName,varargin) %#codegen
CompactMdl = loadLearnerForCoder(fileName);
[label,score] = predict(CompactMdl,X,varargin{:});
end
Add the %#codegen compiler directive (or pragma) to the entry-point function after the function signature to indicate that you intend to generate code for the MATLAB algorithm. Adding this directive instructs the MATLAB Code Analyzer to help you diagnose and fix violations that would result in errors during code generation. See “Check Code with the Code Analyzer” (MATLAB Coder).

Note: If you click the button located in the upper-right section of this page and open this example in MATLAB, then MATLAB opens the example folder. This folder includes the entry-point function file (myknnEnsemblePredict.m) and the test file (test_myknnEnsemblePredict.m, described later on).

Set Up Compiler

To generate C/C++ code, you must have access to a C/C++ compiler that is configured properly. MATLAB Coder locates and uses a supported, installed compiler. You can use mex -setup to view and change the default compiler. For more details, see “Change Default Compiler”.

Create Test File

Write a test script that calls the myknnEnsemblePredict function. In the test script, specify the input arguments and name-value pair arguments that you use in the generated code. You use this test script to define input types automatically when generating code using the MATLAB Coder app.

In this example, create the test_myknnEnsemblePredict.m file in your current folder, as shown.

type test_myknnEnsemblePredict.m % Display the contents of test_myknnEnsemblePredict.m file.

%% Load Sample data
load ionosphere

%% Test myknnEnsemblePredict
[label,score] = myknnEnsemblePredict(X,'knnEnsemble','Learners',1:13);
For details, see “Automatically Define Input Types by Using the App” (MATLAB Coder).

Generate Code Using MATLAB Coder App

The MATLAB Coder app generates C or C++ code from MATLAB code. The workflow-based user interface steps you through the code generation process. The following steps describe a brief workflow of the MATLAB Coder App. For more details, see MATLAB Coder (MATLAB Coder) and “Generate C Code by Using the MATLAB Coder App” (MATLAB Coder).

1. Open the MATLAB Coder App and Select the Entry-Point Function File.

On the Apps tab, in the Apps section, click the Show more arrow to open the apps gallery. Under Code Generation, click MATLAB Coder. The app opens the Select Source Files page. Enter or select the name of the entry-point function, myknnEnsemblePredict.
Click Next to go to the Define Input Types page.

2. Define Input Types

Because C uses static typing, MATLAB Coder must determine the properties of all variables in the MATLAB files at compile time. Therefore, you need to specify the properties of the entry-point function inputs.
Enter or select the test script test_myknnEnsemblePredict and click Autodefine Input Types.
The MATLAB Coder app recognizes input types of the myknnEnsemblePredict function based on the test script. Modify the input types:

• X — The app infers that input X is double(351x34). The number of predictors must be fixed to be the same as the number of predictors in the trained model. However, you can have a different number of observations for prediction. If the number of observations is unknown, change double(351x34) to double(:351x34) or double(:infx34). The setting double(:351x34) allows the number of observations up to 351, and the setting double(:infx34) allows an unbounded number of observations. In this example, specify double(:infx34) by clicking 351 and selecting :inf.
• fileName — Click char, select Define Constant, and type the file name with single quotes, 'knnEnsemble'.
• varargin{1} — Names in name-value pair arguments must be compile-time constants. Click char, select Define Constant, and type 'Learners'.
• varargin{2} — To allow user-defined indices up to 13 weak learners in the generated code, change double(1x13) to double(1x:13).
Click Next to go to the Check for Run-Time Issues page. This optional step generates a MEX file, runs the MEX function, and reports issues. Click Next to go to the Generate Code page.

3. Generate C Code

Set Build type to MEX and click Generate. The app generates a MEX function, myknnEnsemblePredict_mex. A MEX function is a C/C++ program that is executable from MATLAB. You can use a MEX function to accelerate MATLAB algorithms and to test the generated code for functionality and run-time issues. For details, see “MATLAB Algorithm Acceleration” (MATLAB Coder) and “Why Test MEX Functions in MATLAB?” (MATLAB Coder).

Depending on the specified build type, MATLAB Coder generates a MEX function or standalone C/C++ code compiled to a static library, dynamic linked library, or executable. For details on setting a build type, see “Configure Build Settings” (MATLAB Coder).
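At the command line, the equivalent build-type choice is made with a code configuration object, as in this sketch. The 'lib' option (a static library) is shown only as an illustration; substitute the input types defined earlier for the placeholder in the comment.

cfg = coder.config('lib');   % 'mex' (the default), 'lib', 'dll', or 'exe'
% codegen -config cfg myknnEnsemblePredict -args {...input types defined above...} -nargout 2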
Click Next to go to the Finish Workflow page.

4. Review the Finish Workflow Page

The Finish Workflow page indicates that code generation succeeded. This page also provides a project summary and links to generated output.
Generate Code Using Script

You can convert a MATLAB Coder project to the equivalent script of MATLAB commands after you define input types. Then you run the script to generate code. For details, see “Convert MATLAB Coder Project to MATLAB Script” (MATLAB Coder).

On the MATLAB Coder app toolbar, click the Open action menu button. Select Convert to script, and then click Save. The app creates the file myknnEnsemblePredict_script.m, which reproduces the project in a configuration object and runs the codegen (MATLAB Coder) function.

Display the contents of the file myknnEnsemblePredict_script.m.

type myknnEnsemblePredict_script.m

% MYKNNENSEMBLEPREDICT_SCRIPT   Generate MEX-function myknnEnsemblePredict_mex
%  from myknnEnsemblePredict.
%
% Script generated from project 'myknnEnsemblePredict.prj' on 17-Nov-2017.
%
% See also CODER, CODER.CONFIG, CODER.TYPEOF, CODEGEN.

%% Create configuration object of class 'coder.MexCodeConfig'.
cfg = coder.config('mex');
cfg.GenerateReport = true;
cfg.ReportPotentialDifferences = false;

%% Define argument types for entry-point 'myknnEnsemblePredict'.
ARGS = cell(1,1);
ARGS{1} = cell(4,1);
ARGS{1}{1} = coder.typeof(0,[Inf 34],[1 0]);
ARGS{1}{2} = coder.Constant('knnEnsemble');
ARGS{1}{3} = coder.Constant('Learners');
ARGS{1}{4} = coder.typeof(0,[1 13],[0 1]);

%% Invoke MATLAB Coder.
codegen -config cfg myknnEnsemblePredict -args ARGS{1} -nargout 2
Run the script.

myknnEnsemblePredict_script
Code generation successful: To view the report, open('codegen\mex\myknnEnsemblePredict\html\repor
Verify Generated Code

Test a MEX function to verify that the generated code provides the same functionality as the original MATLAB code. To perform this test, run the MEX function using the same inputs that you used to run the original MATLAB code, and then compare the results. Running the MEX function in MATLAB before generating standalone code also enables you to detect and fix run-time errors that are much harder to diagnose in the generated standalone code. For more details, see “Why Test MEX Functions in MATLAB?” (MATLAB Coder).

Pass some predictor data to verify that myknnEnsemblePredict and the MEX function return the same results.

[label1,score1] = predict(Mdl,X,'Learners',1:10);
[label2,score2] = myknnEnsemblePredict(X,'knnEnsemble','Learners',1:10);
[label3,score3] = myknnEnsemblePredict_mex(X,'knnEnsemble','Learners',1:10);
Compare label1, label2, and label3 by using isequal.

isequal(label1,label2,label3)

ans = logical
   1
isequal returns logical 1 (true), which means all the inputs are equal. The score3 output from the MEX function might include round-off differences compared with the output from the predict function. In this case, compare score1 and score3, allowing a small tolerance.

find(abs(score1-score3) > 1e-12)

ans = 0x1 empty double column vector
find returns an empty vector if the element-wise absolute difference between score1 and score3 is not larger than the specified tolerance 1e-12. The comparisons confirm that myknnEnsemblePredict and the MEX function return the same results.
See Also codegen | saveLearnerForCoder | loadLearnerForCoder | learnerCoderConfigurer
More About
• “Introduction to Code Generation” on page 34-3
• “Code Generation for Prediction of Machine Learning Model at Command Line” on page 34-10
• “Code Generation for Prediction and Update Using Coder Configurer” on page 34-90
• “Code Generation and Classification Learner App” on page 34-32
• “Specify Variable-Size Arguments for Code Generation” on page 34-54
• “Generate C Code by Using the MATLAB Coder App” (MATLAB Coder)
• Function List (C/C++ Code Generation)
Code Generation and Classification Learner App

Classification Learner is well suited for choosing and training classification models interactively, but it does not generate C/C++ code that labels data based on a trained model. The Generate Function button in the Export section of the Classification Learner app generates MATLAB code for training a model but does not generate C/C++ code. This example shows how to generate C code from a function that predicts labels using an exported classification model. The example builds a model that predicts the credit rating of a business given various financial ratios, according to these steps:

1. Use the credit rating data set in the file CreditRating_Historical.dat, which is included with Statistics and Machine Learning Toolbox.
2. Reduce the data dimensionality using principal component analysis (PCA).
3. Train a set of models that support code generation for label prediction.
4. Export the model with the maximum 5-fold, cross-validated classification accuracy.
5. Generate C code from an entry-point function that transforms the new predictor data and then predicts corresponding labels using the exported model.
Load Sample Data

Load sample data and import the data into the Classification Learner app. Review the data using scatter plots and remove unnecessary predictors.

Use readtable to load the historical credit rating data set in the file CreditRating_Historical.dat into a table.

creditrating = readtable('CreditRating_Historical.dat');
On the Apps tab, click Classification Learner. In Classification Learner, on the Classification Learner tab, in the File section, click New Session and select From Workspace. In the New Session from Workspace dialog box, select the table creditrating. All variables, except the one identified as the response, are double-precision numeric vectors. Click Start Session to compare classification models based on the 5-fold, cross-validated classification accuracy. Classification Learner loads the data and plots a scatter plot of the variables WC_TA versus ID. Because identification numbers are not helpful to display in a plot, choose RE_TA for X under Predictors.
The scatter plot suggests that the two variables can separate the classes AAA, BBB, BB, and CCC fairly well. However, the observations corresponding to the remaining classes are mixed into these classes. Identification numbers are not helpful for prediction. Therefore, in the Options section of the Classification Learner tab, click Feature Selection. In the Default Feature Selection tab, clear the ID check box, and click Save and Apply. You can also remove unnecessary predictors from the beginning by using the check boxes in the New Session from Workspace dialog box. This example shows how to remove unused predictors for code generation when you have included all predictors.
Enable PCA Enable PCA to reduce the data dimensionality. In the Options section of the Classification Learner tab, click PCA. In the Default PCA Options dialog box, select Enable PCA, and click Save and Apply. This action applies PCA to the predictor
data, and then transforms the data before training the models. Classification Learner uses only components that collectively explain 95% of the variability.
Train Models Train a set of models that support code generation for label prediction. For a list of models in Classification Learner that support code generation, see “Generate C Code for Prediction” on page 23-88. Select the following classification models and options, which support code generation for label prediction, and then perform cross-validation (for more details, see “Introduction to Code Generation” on page 34-3). To select each model, in the Models section, click the Show more arrow, and then click the model.
Models and Options to Select | Description
Under Decision Trees, select All Trees | Classification trees of various complexities
Under Support Vector Machines, select All SVMs | SVMs of various complexities and using various kernels. Complex SVMs require time to fit.
Under Ensemble Classifiers, select Boosted Trees. In the model Summary tab, under Model Hyperparameters, reduce Maximum number of splits to 5 and increase Number of learners to 100. | Boosted ensemble of classification trees
Under Ensemble Classifiers, select Bagged Trees. In the model Summary tab, under Model Hyperparameters, reduce Maximum number of splits to 50 and increase Number of learners to 100. | Random forest of classification trees
After selecting the models and specifying any options, delete the default fine tree model (model 1). Right-click the model in the Models pane and select Delete. Then, in the Train section, click Train All and select Train All. After the app cross-validates each model type, the Models pane displays each model and its 5-fold, cross-validated classification accuracy, and highlights the model with the best accuracy.
Select the model that yields the maximum 5-fold, cross-validated classification accuracy, which is the error-correcting output codes (ECOC) model of Fine Gaussian SVM learners. With PCA enabled, Classification Learner uses two predictors out of six. In the Plot and Interpret section, click the arrow to open the gallery, and then click Confusion Matrix (Validation) in the Validation Results group.
The model does well distinguishing between A, B, and C classes. However, the model does not do as well distinguishing between particular levels within those groups, the lower B levels in particular.
Export Model to Workspace Export the model to the MATLAB Workspace and save the model using saveLearnerForCoder. On the Classification Learner tab, click Export, click Export Model, and select Export Model. In the Export Classification Model dialog box, clear the check box to exclude the training data from the exported model. Click OK to export the compact model. The structure trainedModel appears in the MATLAB Workspace. The field ClassificationSVM of trainedModel contains the compact model. At the command line, save the compact model to a file called ClassificationLearnerModel.mat in your current folder.
saveLearnerForCoder(trainedModel.ClassificationSVM,'ClassificationLearnerModel')
Generate C Code for Prediction Prediction using the object functions requires a trained model object, but the -args option of codegen does not accept such objects. Work around this limitation by using saveLearnerForCoder and loadLearnerForCoder. Save a trained model by using saveLearnerForCoder. Then, define an entry-point function that loads the saved model by using loadLearnerForCoder and calls the predict function. Finally, use codegen to generate code for the entry-point function. Preprocess Data Preprocess new data in the same way you preprocess the training data. To preprocess, you need the following three model parameters: • removeVars — Column vector of at most p elements identifying indices of variables to remove from the data, where p is the number of predictor variables in the raw data • pcaCenters — Row vector of exactly q PCA centers • pcaCoefficients — q-by-r matrix of PCA coefficients, where r is at most q Specify the indices of predictor variables that you removed while selecting data using Feature Selection in Classification Learner. Extract the PCA statistics from trainedModel. removeVars = 1; pcaCenters = trainedModel.PCACenters; pcaCoefficients = trainedModel.PCACoefficients;
Save the model parameters to a file named ModelParameters.mat in your current folder. save('ModelParameters.mat','removeVars','pcaCenters','pcaCoefficients');
Define Entry-Point Function An entry-point function is a function you define for code generation. Because you cannot call any function at the top level using codegen, you must define an entry-point function that calls code-generation-enabled functions, and then generate C/C++ code for the entry-point function by using codegen. In your current folder, define a function named mypredictCL.m that:
• Accepts a numeric matrix (X) of raw observations containing the same predictor variables as the ones passed into Classification Learner
• Loads the classification model in ClassificationLearnerModel.mat and the model parameters in ModelParameters.mat
• Removes the predictor variables corresponding to the indices in removeVars
• Transforms the remaining predictor data using the PCA centers (pcaCenters) and coefficients (pcaCoefficients) estimated by Classification Learner
• Returns predicted labels using the model
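This section does not list the contents of mypredictCL.m, so the following is only a minimal sketch of how such a function could be written from the bullet points above; the internal variable handling is an assumption, not the exact shipped example file.
function label = mypredictCL(X) %#codegen
%MYPREDICTCL Predict credit ratings from raw predictor data in X (sketch).
% Load the exported compact model and the saved preprocessing parameters.
CompactMdl = loadLearnerForCoder('ClassificationLearnerModel');
data = coder.load('ModelParameters');
% Remove the unused predictor variables (the ID column in this example).
keepvars = 1:size(X,2);
keepvars(data.removeVars) = [];
XwithoutID = X(:,keepvars);
% Transform the remaining predictors using the PCA statistics
% estimated by Classification Learner.
Xpca = bsxfun(@minus,XwithoutID,data.pcaCenters)*data.pcaCoefficients;
% Predict labels using the exported compact model.
label = predict(CompactMdl,Xpca);
end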
Generate Code Because C and C++ are statically typed languages, you must determine the properties of all variables in the entry-point function at compile time. Specify variable-size arguments using coder.typeof and generate code using the arguments. Create a double-precision matrix called x for code generation using coder.typeof. Specify that the number of rows of x is arbitrary, but that x must have p columns. p = size(creditrating,2) - 1; x = coder.typeof(0,[Inf,p],[1 0]);
For more details about specifying variable-size arguments, see “Specify Variable-Size Arguments for Code Generation” on page 34-54. Generate a MEX function from mypredictCL.m. Use the -args option to specify x as an argument. codegen mypredictCL -args x
codegen generates the MEX file mypredictCL_mex.mexw64 in your current folder. The file extension depends on your platform. Verify Generated Code Verify that the MEX function returns the expected labels. Remove the response variable from the original data set, and then randomly draw 15 observations. rng('default'); % For reproducibility m = 15; testsampleT = datasample(creditrating(:,1:(end - 1)),m);
Predict corresponding labels by using predictFcn in the classification model trained by Classification Learner. testLabels = trainedModel.predictFcn(testsampleT);
Convert the resulting table to a matrix. testsample = table2array(testsampleT);
The columns of testsample correspond to the columns of the predictor data loaded by Classification Learner. Pass the test data to mypredictCL. The function mypredictCL predicts corresponding labels by using predict and the classification model trained by Classification Learner. testLabelsPredict = mypredictCL(testsample);
Predict corresponding labels by using the generated MEX function mypredictCL_mex. testLabelsMEX = mypredictCL_mex(testsample);
Compare the sets of predictions. isequal(testLabels,testLabelsMEX,testLabelsPredict)
ans = logical 1
isequal returns logical 1 (true) if all the inputs are equal. predictFcn, mypredictCL, and the MEX function return the same values.
See Also loadLearnerForCoder | saveLearnerForCoder | coder.typeof | codegen | learnerCoderConfigurer
Related Examples
• “Classification Learner App”
• “Predict Responses Using RegressionSVM Predict Block” on page 34-125
• “Introduction to Code Generation” on page 34-3
• “Code Generation for Prediction of Machine Learning Model at Command Line” on page 34-10
• “Code Generation for Prediction of Machine Learning Model Using MATLAB Coder App” on page 34-23
• “Code Generation for Prediction and Update Using Coder Configurer” on page 34-90
• “Specify Variable-Size Arguments for Code Generation” on page 34-54
• “Apply PCA to New Data and Generate C/C++ Code” on page 35-5862
Deploy Neural Network Regression Model to FPGA/ASIC Platform This example shows how to train a neural network regression model, use the trained regression model in a Simulink® model that estimates the state of charge of a battery, and generate HDL code from the Simulink model for deployment to an FPGA/ASIC (Field-Programmable Gate Array / Application-Specific Integrated Circuit) platform. State of charge (SoC) is the level of charge of an electric battery relative to its capacity, measured as a percentage. SoC is critical for a vehicle's energy management system. You cannot measure SoC directly; therefore, you must estimate it. The SoC estimation must be accurate to ensure reliable and affordable electrified vehicles (xEV). However, because of the nonlinear temperature, health, and SoC-dependent behavior of Li-ion batteries, SoC estimation remains a significant challenge in automotive engineering. Traditional approaches to this problem, such as electrochemical models, usually require precise parameters and knowledge of the battery composition and physical response. In contrast, modeling SoC with neural networks is a data-driven approach that requires minimal knowledge of the battery and its nonlinear characteristics [1]. This example uses a neural network regression model to predict SoC from the battery's current, voltage, and temperature measurements [2]. The Simulink model in this example includes a plant simulation of the battery and a battery management system (BMS). The BMS monitors the battery state, manages the battery temperature, and ensures safe operation. For example, the BMS helps to avoid overcharging and overdischarging. From the battery sensors, the BMS collects information on the current, voltage, and temperature in a closed-loop system.
Train Regression Model at Command Line To begin, load the data set for this example. Then, train the regression model at the command line and evaluate the model performance. Load Data Set This example uses the batterySmall data set, which is a subset of the data set in [1]. The batterySmall data set contains two tables: trainDataSmall (training data set) and testDataSmall (test data set). Both the training and test data sets have a balanced representation of various temperature ranges. In both data sets, the observations are normalized.
Load the batterySmall data set. load batterysmall.mat
Display the first eight rows of the tables trainDataSmall and testDataSmall. head(trainDataSmall)
      V          I         Temp       V_avg      I_avg        Y
    _______    _______    _______    _______    _______    _______
    0.3855     0.75102    0.49157    0.38535    0.75102    0.20642
    0.38704    0.75102    0.85766    0.38571    0.75102    0.20642
    0.38709    0.75102    0.85824    0.38572    0.75102    0.20642
    0.38924    0.75102    0.85628    0.38658    0.75102    0.20642
    0.39174    0.75102    0.90589    0.38919    0.75102    0.20642
    0.39338    0.75102    0.90368    0.39149    0.75102    0.20642
    0.39508    0.75102    0.91501    0.3939     0.75102    0.20642
    0.39529    0.75102    0.91306    0.39417    0.75102    0.20642
head(testDataSmall)
      V          I          Temp        V_avg      I_avg        Y
    _______    _______    _________    _______    _______    _______
    0.68309    0.63084    0.0084771    0.8907     0.72464    0.99725
    0.74425    0.70388    0.01131      0.87824    0.7214     0.9958
    0.76465    0.69525    0.014143     0.85308    0.71765    0.9915
    0.91283    0.74702    0.014143     0.86508    0.72354    0.99175
    0.81809    0.71665    0.014143     0.86544    0.72384    0.98893
    0.63116    0.58937    0.016976     0.86426    0.72321    0.98804
    0.55114    0.5736     0.031141     0.7997     0.69442    0.96458
    0.90598    0.80312    0.033973     0.79805    0.69475    0.96337
Both tables contain variables of battery sensor data: voltage (V), current (I), temperature (Temp), average voltage (V_avg), and average current (I_avg). Both tables also contain the state of charge (SoC) variable, which is represented by Y. Train Regression Model Train a neural network regression model by using the fitrnet function on the training data set. Specify the sizes of the hidden, fully connected layers in the neural network model. nnetMdl = fitrnet(trainDataSmall,"Y",LayerSizes=[10,10]);
nnetMdl is a RegressionNeuralNetwork model. Evaluate Model Performance Cross-validate the trained model using 5-fold cross-validation, and estimate the cross-validated accuracy. partitionedModel = crossval(nnetMdl,KFold=5); validationAccuracy = 1-kfoldLoss(partitionedModel) validationAccuracy = 0.9993
Calculate the test set accuracy to evaluate how well the trained model generalizes.
testAccuracy = 1-loss(nnetMdl,testDataSmall,"Y") testAccuracy = 0.9994
The test set accuracy is larger than 99.9%, which confirms that the model does not overfit to the training set. Import Model to Simulink for Prediction This example provides the Simulink model slexFPGAPredictExample, which includes the RegressionNeuralNetwork Predict block, for estimating the battery SoC. The model also includes the measured SoC, so you can compare it to the estimated SoC.
Load Data The batterySmall data set contains the dataLarge structure with the input data (X) and the measured SoC (Y). Use the X data to create the input data to the slexFPGAPredictExample model. Create an input signal (input) in the form of an array for the Simulink model. The first column of the array contains the timeVector variable, which includes the points in time at which the observations enter the model. The other five columns of the array contain variables of battery measurements. timeVector = (0:length(dataLarge.X)-1)'; input = [timeVector,dataLarge.X]; measuredSOC = [timeVector dataLarge.Y];
Load the minimum and maximum values of the raw input data used for denormalizing input. minmaxData = load("MinMaxVectors"); X_MIN = minmaxData.X_MIN; X_MAX = minmaxData.X_MAX; stepSize = 10;
Simulate Simulink Model Open the Simulink model slexFPGAPredictExample. Simulate the model and export the simulation output to the workspace. open_system("slexFPGAPredictExample.slx") simOut = sim("slexFPGAPredictExample.slx");
Plot the simulated and measured values of the battery SoC. sim_ypred = simOut.yout.get("estim").Values.Data; plot(simOut.tout,sim_ypred) hold on plot(dataLarge.Y) hold off legend("Simulated SoC","Measured SoC",location="northwest")
Convert Simulink Model to Fixed-Point To deploy the Simulink model to FPGA or ASIC hardware with no floating-point support, you must convert the RegressionNeuralNetwork Predict block to fixed-point. You can convert the Neural Network subsystem to fixed-point by using the “Use the Fixed-Point Tool to Rescale a Fixed-Point Model” (Fixed-Point Designer). You can also specify the fixed-point values directly using the Data Type tab of the RegressionNeuralNetwork Predict block dialog box. For more details on how to convert to fixed-point, see “Human Activity Recognition Simulink Model for Fixed-Point Deployment” on page 34-84.
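For illustration only, a fixed-point data type of the kind you might enter on the Data Type tab can be constructed with Simulink's fixdt function; the word length and fraction length below are assumptions, not values taken from this example.
% Hypothetical fixed-point type: signed, 16-bit word length, 12-bit
% fraction length. Adjust these values to balance accuracy against
% hardware resources before entering the type in the block dialog box.
T = fixdt(1,16,12)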
Open the Simulink model slexFPGAPredictFixedPointExample, which is already converted to fixed-point. Simulate the fixed-point Simulink model and export the simulation output to the workspace. open_system("slexFPGAPredictFixedPointExample.slx") simOutFixedPoint = sim("slexFPGAPredictFixedPointExample.slx");
Compare the simulation results for the floating-point (soc_dl) and fixed-point (soc_fp) estimation of the battery SoC. soc_dl_sig = simOut.yout.getElement(1); soc_fp_sig = simOutFixedPoint.yout.getElement(1); soc_dl = soc_dl_sig.Values.Data; soc_fp = soc_fp_sig.Values.Data; max(abs(soc_dl-soc_fp)./soc_dl) ans = 0.0371
This result shows less than a 4% difference between floating-point and fixed-point values for the SoC estimation. Prepare Simulink Model for HDL Code Generation To prepare the RegressionNeuralNetwork Predict block for HDL code generation, open and run HDL Code Advisor. For more information, see “Check HDL Compatibility of Simulink Model Using HDL Code Advisor” (HDL Coder). Open HDL Code Advisor by right-clicking the neural network subsystem and selecting HDL Code > HDL Code Advisor. Alternatively, you can enter: open_system("slexFPGAPredictFixedPointExample.slx") hdlcodeadvisor("slexFPGAPredictFixedPointExample/Neural Network")
Updating Model Advisor cache... Model Advisor cache updated. For new customizations, to update the cache, use the Advisor.Manager
In HDL Code Advisor, the left pane lists the folders in the hierarchy. Each folder represents a group or category of related checks. Expand the folders to see the available checks in each group. Make sure that all the checks are selected in the left pane, and then click Run Selected Checks in the right pane. If HDL Code Advisor returns a failure or a warning, the corresponding folder is marked accordingly. Expand each group to view the checks that failed. To fix a failure, click Run This Check in the right pane. Then, click Modify Settings. Click Run This Check again after you apply the modified settings. Repeat this process for each failed check in the following list. Failed checks in the Industry standard checks group:
• Check clock, reset, and enable signals - This check verifies whether the clock, reset, and enable signals follow the recommended naming convention.
• Check package file names
• Check signal and port names
• Check top-level subsystem/port names
After you apply the suggested settings, run all checks again and make sure that they pass. Generate HDL Code This example provides the Simulink model slexFPGAPredictReadyExample, which is ready for HDL code generation. Open the Simulink model. open_system("slexFPGAPredictReadyExample.slx")
To generate HDL code for the neural network subsystem, right-click the subsystem and select HDL Code > Generate HDL for Subsystem. After the code generation is complete, a code generation report opens. The report contains the generated source files and various reports on the efficiency of the code.
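If you prefer to work from the command line, HDL Coder provides the makehdl function for generating HDL code for a subsystem. The call below is a sketch that assumes the subsystem keeps the Neural Network name used earlier in this example.
% Generate HDL code for the neural network subsystem from the command
% line (equivalent to HDL Code > Generate HDL for Subsystem).
makehdl("slexFPGAPredictReadyExample/Neural Network")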
Optimize Model for Efficient Resource Usage on Hardware Open the generated report High-level Resource Report. Note that the Simulink model uses a large number of multipliers and adders/subtractors, because of the matrix-vector operations flagged by HDL Code Advisor. To optimize resource usage, you can enable streaming for your model before generating HDL code. When streaming is enabled, the generated code saves chip area by multiplexing the data over a smaller number of hardware resources. That is, streaming allows some computations to share a hardware resource. The subsystems that can benefit from streaming are:
• neural network/RegressionNeuralNetwork Predict/getScore/hiddenLayers/hiddenLayer1
• neural network/RegressionNeuralNetwork Predict/getScore/hiddenLayers/hiddenLayer2
To enable streaming for these two subsystems, perform these steps for each subsystem (a programmatic alternative is sketched after this list):
1 Right-click the subsystem (hiddenLayer1 or hiddenLayer2) and select HDL Code > HDL Block Properties.
2 In the dialog box that opens, change the StreamingFactor option from 0 to 10, because each hidden layer contains 10 neurons.
3 Click OK.
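As a sketch of the programmatic route, the same block property can be set with HDL Coder's hdlset_param function; the subsystem paths below are assumptions based on the report paths listed above.
% Set the streaming factor on both hidden-layer subsystems
% (the same change as editing StreamingFactor in HDL Block Properties).
layers = ["hiddenLayer1" "hiddenLayer2"];
for k = 1:numel(layers)
    blk = "slexFPGAPredictReadyExample/neural network/" + ...
        "RegressionNeuralNetwork Predict/getScore/hiddenLayers/" + layers(k);
    hdlset_param(blk,"StreamingFactor",10);
end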
Generate HDL code again and note the reduced number of multipliers and adders/subtractors in the High-level Resource Report. To open the autogenerated version of the model that uses streaming in the generated report, open the Streaming and Sharing report and click the link to the autogenerated model under the link Generated model after the transformation. To see the changes made to the subsystem, navigate to:
/neural network/RegressionNeuralNetwork Predict/getScore/hiddenLayers/hiddenLayer1
To run the autogenerated model, you must extract the parameters of the neural network model that are stored in the mask workspace of the original Simulink model slexFPGAPredictExample. These parameters now need to be in the base workspace. blockName = "slexFPGAPredictReadyExample/neural network/RegressionNeuralNetwork Predict"; bmw = Simulink.Mask.get(blockName); mv = bmw.getWorkspaceVariables; learnerParams = mv(end).Value;
Deploy New Neural Network Model If you train a new neural network model with different settings (for example, different activation function, number of hidden layers, or size of hidden layers), follow the steps in this example from the start to deploy the new model. The HDL Coder optimization (prior to HDL code generation) might be different, depending on the new model architecture, target hardware, or other requirements.
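As an illustration of what "different settings" might look like, a hypothetical retraining call is sketched below; the layer sizes and activation function are placeholders, not recommendations from this example.
% Hypothetical retraining with a different architecture and activation
% function; after retraining, repeat the fixed-point conversion and HDL
% preparation steps above for the new model.
newNnetMdl = fitrnet(trainDataSmall,"Y",LayerSizes=[20 20],Activations="tanh");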
References
[1] Kollmeyer, Phillip, Carlos Vidal, Mina Naguib, and Michael Skells. "LG 18650HG2 Li-ion Battery Data and Example Deep Neural Network xEV SOC Estimator Script." Mendeley 3 (March 2020). https://doi.org/10.17632/CP3473X7XV.3.
[2] Vezzini, Andrea. "Lithium-Ion Battery Management." In Lithium-Ion Batteries, edited by Gianfranco Pistoia, 345-360. Elsevier, 2014. https://doi.org/10.1016/B978-0-444-59513-3.00015-7.
See Also RegressionNeuralNetwork Predict | fitrnet | RegressionNeuralNetwork
Related Examples
• “Predict Responses Using RegressionNeuralNetwork Predict Block” on page 34-158
• “Human Activity Recognition Simulink Model for Fixed-Point Deployment” on page 34-84
• “HDL Coder Checks in Model Advisor / HDL Code Advisor Overview” (HDL Coder)
• “Use the Fixed-Point Tool to Rescale a Fixed-Point Model” (Fixed-Point Designer)
Predict Class Labels Using MATLAB Function Block This example shows how to add a MATLAB® Function block to a Simulink® model for label prediction. The MATLAB Function block accepts streaming data and predicts the label and classification score using a trained support vector machine (SVM) classification model. For details on using the MATLAB Function block, see “Implement MATLAB Functions in Simulink with MATLAB Function Blocks” (Simulink). Train Classification Model This example uses the ionosphere data set, which contains radar-return qualities (Y) and predictor data (X). Radar returns are either of good quality ('g') or of bad quality ('b'). Load the ionosphere data set. Determine the sample size. load ionosphere n = numel(Y) n = 351
The MATLAB Function block cannot return cell arrays. Convert the response variable to a logical vector whose elements are 1 if the radar returns are good, and 0 otherwise. Y = strcmp(Y,'g');
Suppose that the radar returns are detected in sequence, and you have the first 300 observations, but you have not received the last 51 yet. Partition the data into present and future samples.
prsntX = X(1:300,:);
prsntY = Y(1:300);
ftrX = X(301:end,:);
ftrY = Y(301:end);
Train an SVM model using all presently available data. Specify predictor data standardization. Mdl = fitcsvm(prsntX,prsntY,'Standardize',true);
Mdl is a ClassificationSVM object, which is a linear SVM model. The predictor coefficients in a linear SVM model provide enough information to predict labels for new observations. Removing the support vectors reduces memory usage in the generated code. Remove the support vectors from the linear SVM model by using the discardSupportVectors function. Mdl = discardSupportVectors(Mdl);
Save Model Using saveLearnerForCoder At the command line, you can use Mdl to make predictions for new observations. However, you cannot use Mdl as an input argument in a function meant for code generation. Prepare Mdl to be loaded within the function using saveLearnerForCoder. saveLearnerForCoder(Mdl,'SVMIonosphere');
saveLearnerForCoder compacts Mdl, and then saves it in the MAT-file SVMIonosphere.mat.
Define MATLAB Function Define a MATLAB function named svmIonospherePredict.m that predicts whether a radar return is of good quality. The function must: • Include the code generation directive %#codegen somewhere in the function. • Accept radar-return predictor data. The data must be commensurate with X except for the number of rows. • Load SVMIonosphere.mat using loadLearnerForCoder. • Return predicted labels and classification scores for predicting the quality of the radar return as good (that is, the positive-class score). function [label,score] = svmIonospherePredict(X) %#codegen %svmIonospherePredict Predict radar-return quality using SVM model % svmIonospherePredict predicts labels and estimates classification % scores of the radar returns in the numeric matrix of predictor data X % using the compact SVM model in the file SVMIonosphere.mat. Rows of X % correspond to observations and columns to predictor variables. label % is the predicted label and score is the confidence measure for % classifying the radar-return quality as good. % % Copyright 2016 The MathWorks Inc. Mdl = loadLearnerForCoder('SVMIonosphere'); [label,bothscores] = predict(Mdl,X); score = bothscores(:,2); end
Note: If you click the button located in the upper-right section of this page and open this example in MATLAB, then MATLAB opens the example folder. This folder includes the entry-point function file. Create Simulink Model Create a Simulink model with the MATLAB Function block that dispatches to svmIonospherePredict.m. This example provides the Simulink model slexSVMIonospherePredictExample.slx. Open the Simulink model. SimMdlName = 'slexSVMIonospherePredictExample'; open_system(SimMdlName)
The figure displays the Simulink model. When the input node detects a radar return, it directs that observation into the MATLAB Function block that dispatches to svmIonospherePredict.m. After predicting the label and score, the model returns these values to the workspace and displays the values within the model one at a time. When you load slexSVMIonospherePredictExample.slx, MATLAB also loads the data set that it requires called radarReturnInput. However, this example shows how to construct the required data set. The model expects to receive input data as a structure array called radarReturnInput containing these fields: • time - The points in time at which the observations enter the model. In the example, the duration includes the integers from 0 though 50. The orientation must correspond to the observations in the predictor data. So, for this example, time must be a column vector. • signals - A 1-by-1 structure array describing the input data, and containing the fields values and dimensions. values is a matrix of predictor data. dimensions is the number of predictor variables. Create an appropriate structure array for future radar returns. radarReturnInput.time = (0:50)'; radarReturnInput.signals(1).values = ftrX; radarReturnInput.signals(1).dimensions = size(ftrX,2);
You can change the name from radarReturnInput, and then specify the new name in the model. However, Simulink expects the structure array to contain the described field names. Simulate the model using the data held out of training, that is, the data in radarReturnInput. sim(SimMdlName);
The figure shows the model after it processes all observations in radarReturnInput one at a time. The predicted label of X(351,:) is 1 and its positive-class score is 1.431. The variables tout, yout, and svmlogsout appear in the workspace. yout and svmlogsout are Simulink.SimulationData.Dataset objects containing the predicted labels and scores. For more details, see “Data Format for Logged Simulation Data” (Simulink). Extract the simulation data from the simulation log. labelsSL = svmlogsout.getElement(1).Values.Data; scoresSL = svmlogsout.getElement(2).Values.Data;
labelsSL is a 51-by-1 numeric vector of predicted labels. labelsSL(j) = 1 means that the SVM model predicts that radar return j in the future sample is of good quality, and 0 means otherwise. scoresSL is a 51-by-1 numeric vector of positive-class scores, that is, signed distances from the decision boundary. Positive scores correspond to predicted labels of 1, and negative scores correspond to predicted labels of 0. Predict labels and positive-class scores at the command line using predict. [labelCMD,scoresCMD] = predict(Mdl,ftrX); scoresCMD = scoresCMD(:,2);
labelCMD and scoresCMD are commensurate with labelsSL and scoresSL. Compare the future-sample, positive-class scores returned by slexSVMIonospherePredictExample to those returned by calling predict at the command line. err = sum((scoresCMD - scoresSL).^2); err < eps
ans = logical 1
The sum of squared deviations between the sets of scores is negligible. If you also have a Simulink Coder™ license, then you can generate C code from slexSVMIonospherePredictExample.slx in Simulink or from the command line using slbuild (Simulink). For more details, see “Generate C Code for a Model” (Simulink Coder).
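As a minimal sketch, assuming a Simulink Coder license is installed, the command-line build could look like this:
% Generate C code for the Simulink model (requires Simulink Coder).
slbuild('slexSVMIonospherePredictExample')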
See Also predict | loadLearnerForCoder | saveLearnerForCoder | slbuild | learnerCoderConfigurer
Related Examples
• “Predict Responses Using RegressionSVM Predict Block” on page 34-125
• “Predict Class Labels Using ClassificationSVM Predict Block” on page 34-121
• “Introduction to Code Generation” on page 34-3
• “Code Generation for Image Classification” on page 34-113
• “System Objects for Classification and Code Generation” on page 34-63
• “Predict Class Labels Using Stateflow” on page 34-71
• “Human Activity Recognition Simulink Model for Smartphone Deployment” on page 34-75
Specify Variable-Size Arguments for Code Generation This example shows how to specify variable-size input arguments when you generate code for the object functions of classification and regression model objects. Variable-size data is data whose size might change at run time. Specifying variable-size input arguments is convenient when you have data with an unknown size at compile time. This example also describes how to include name-value pair arguments in an entry-point function and how to specify them when generating code. For more detailed code generation workflow examples, see “Code Generation for Prediction of Machine Learning Model at Command Line” on page 34-10 and “Code Generation for Prediction of Machine Learning Model Using MATLAB Coder App” on page 34-23. Train Classification Model Load Fisher's iris data set. Convert the labels to a character matrix. load fisheriris species = char(species);
Train a classification tree using the entire data set. Mdl = fitctree(meas,species);
Mdl is a ClassificationTree model. Save Model Using saveLearnerForCoder Save the trained classification tree to a file named ClassTreeIris.mat in your current folder by using saveLearnerForCoder. MdlName = 'ClassTreeIris'; saveLearnerForCoder(Mdl,MdlName);
Define Entry-Point Function In your current folder, define an entry-point function named mypredictTree.m that does the following: • Accept measurements with columns corresponding to meas and accept valid name-value pair arguments. • Load a trained classification tree by using loadLearnerForCoder. • Predict labels and corresponding scores, node numbers, and class numbers from the loaded classification tree. You can allow for optional name-value pair arguments by specifying varargin as an input argument. For details, see “Code Generation for Variable Length Argument Lists” (MATLAB Coder). type mypredictTree.m
% Display contents of mypredictTree.m file
function [label,score,node,cnum] = mypredictTree(x,savedmdl,varargin) %#codegen %MYPREDICTTREE Predict iris species using classification tree % MYPREDICTTREE predicts iris species for the n observations in the % n-by-4 matrix x using the classification tree stored in the MAT-file % whose name is in savedmdl, and then returns the predictions in the % array label. Each row of x contains the lengths and widths of the petal
% and sepal of an iris (see the fisheriris data set). For other output % argument descriptions, see the predict reference page. CompactMdl = loadLearnerForCoder(savedmdl); [label,score,node,cnum] = predict(CompactMdl,x,varargin{:}); end
Note: If you click the button located in the upper-right section of this page and open this example in MATLAB®, then MATLAB® opens the example folder. This folder includes the entry-point function file. Generate Code Specify Variable-Size Arguments Because C and C++ are statically typed languages, you must determine the properties of all variables in an entry-point function at compile time using the -args option of codegen. Use coder.Constant (MATLAB Coder) to specify a compile-time constant input. coder.Constant(v)
coder.Constant(v) creates a coder.Constant type variable whose values are constant, the same as v, during code generation. Use coder.typeof (MATLAB Coder) to specify a variable-size input. coder.typeof(example_value, size_vector, variable_dims)
The values of example_value, size_vector, and variable_dims specify the properties of the input array that the generated code can accept. • An input array has the same data type as the example values in example_value. • size_vector is the array size of an input array if the corresponding variable_dims value is false. • size_vector is the upper bound of the array size if the corresponding variable_dims value is true. • variable_dims specifies whether each dimension of the array has a variable size or a fixed size. A value of true (logical 1) means that the corresponding dimension has a variable size; a value of false (logical 0) means that the corresponding dimension has a fixed size. The entry-point function mypredictTree accepts predictor data, the MAT-file name containing the trained model object, and optional name-value pair arguments. Suppose that you want to generate code that accepts a variable-size array for predictor data and the 'Subtrees' name-value pair argument with a variable-size vector for its value. Then you have four input arguments: predictor data, the MAT-file name, and the name and value of the 'Subtrees' name-value pair argument. Define a 4-by-1 cell array and assign each input argument type of the entry-point function to each cell. ARGS = cell(4,1);
For the first input, use coder.typeof to specify that the predictor data variable is double-precision with the same number of columns as the predictor data used in training the model, but that the number of observations (rows) is arbitrary.
p = numel(Mdl.PredictorNames); ARGS{1} = coder.typeof(0,[Inf,p],[1,0]);
0 for the example_value value implies that the data type is double because double is the default numeric data type of MATLAB. [Inf,p] for the size_vector value and [1,0] for the variable_dims value imply that the size of the first dimension is variable and unbounded, and the size of the second dimension is fixed to be p. The second input is the MAT-file name, which must be a compile-time constant. Use coder.Constant to specify the type of the second input. ARGS{2} = coder.Constant(MdlName);
The last two inputs are the name and value of the 'Subtrees' name-value pair argument. Names of name-value pair arguments must be compile-time constants. ARGS{3} = coder.Constant('Subtrees');
Use coder.typeof to specify that the value of 'Subtrees' is a double-precision row vector and that the upper bound of the row vector size is max(Mdl.PruneList). m = max(Mdl.PruneList); ARGS{4} = coder.typeof(0,[1,m],[0,1]);
Again, 0 for the example_value value implies that the data type is double because double is the default numeric data type of MATLAB. [1,m] for the size_vector value and [0,1] for the variable_dims value imply that the size of the first dimension is fixed to be 1, and the size of the second dimension is variable and its upper bound is m. Generate Code Using codegen Generate a MEX function from the entry-point function mypredictTree using the cell array ARGS, which includes input argument types for mypredictTree. Specify the input argument types using the -args option. Specify the number of output arguments in the generated entry-point function using the -nargout option. The generated code includes the specified number of output arguments in the order in which they occur in the entry-point function definition. codegen mypredictTree -args ARGS -nargout 2 Code generation successful.
codegen generates the MEX function mypredictTree_mex with a platform-dependent extension in your current folder. The predict function accepts single-precision values, double-precision values, and 'all' for the 'SubTrees' name-value pair argument. However, you can specify only double-precision values when you use the MEX function for prediction because the data type specified by ARGS{4} is double. Verify Generated Code Predict labels for a random selection of 15 values from the training data using the generated MEX function and the subtree at pruning level 1. Compare the labels from the MEX function with those predicted by predict. rng('default'); % For reproducibility Xnew = datasample(meas,15); [labelMEX,scoreMEX] = mypredictTree_mex(Xnew,MdlName,'Subtrees',1);
[labelPREDICT,scorePREDICT] = predict(Mdl,Xnew,'Subtrees',1); labelPREDICT labelPREDICT = 15x10 char array 'virginica ' 'virginica ' 'setosa ' 'virginica ' 'versicolor' 'setosa ' 'setosa ' 'versicolor' 'virginica ' 'virginica ' 'setosa ' 'virginica ' 'virginica ' 'versicolor' 'virginica ' labelMEX labelMEX = 15x1 cell {'virginica' } {'virginica' } {'setosa' } {'virginica' } {'versicolor'} {'setosa' } {'setosa' } {'versicolor'} {'virginica' } {'virginica' } {'setosa' } {'virginica' } {'virginica' } {'versicolor'} {'virginica' }
The predicted labels are the same as the MEX function labels except for the data type. When the response data type is char and codegen cannot determine that the value of Subtrees is a scalar, then the output from the generated code is a cell array of character vectors. For the comparison, you can convert labelPREDICT to a cell array and use isequal. cell_labelPREDICT = cellstr(labelPREDICT); verifyLabel = isequal(labelMEX,cell_labelPREDICT) verifyLabel = logical 1
isequal returns logical 1 (true), which means all the inputs are equal. Compare the second outputs as well. scoreMEX might include round-off differences compared with scorePREDICT. In this case, compare scoreMEX and scorePREDICT, allowing a small tolerance.
find(abs(scorePREDICT-scoreMEX) > 1e-8) ans = 0x1 empty double column vector
find returns an empty vector if the element-wise absolute difference between scorePREDICT and scoreMEX is not larger than the specified tolerance 1e-8. The comparison confirms that scorePREDICT and scoreMEX are equal within the tolerance 1e-8.
See Also codegen | coder.typeof | loadLearnerForCoder | coder.Constant | saveLearnerForCoder | learnerCoderConfigurer
Related Examples
• “Introduction to Code Generation” on page 34-3
• “Code Generation for Prediction of Machine Learning Model at Command Line” on page 34-10
• “Code Generation for Prediction of Machine Learning Model Using MATLAB Coder App” on page 34-23
• “Code Generation for Prediction and Update Using Coder Configurer” on page 34-90
• “Code Generation for Nearest Neighbor Searcher” on page 34-20
• “Code Generation and Classification Learner App” on page 34-32
Create Dummy Variables for Categorical Predictors and Generate C/C++ Code This example shows how to generate code for classifying data using a support vector machine (SVM) model. Train the model using numeric and encoded categorical predictors. Use dummyvar to convert categorical predictors to numeric dummy variables before fitting an SVM classifier. When passing new data to your trained model, you must preprocess the data in a similar manner. Alternatively, if a trained model identifies categorical predictors in the CategoricalPredictors property, then you do not need to create dummy variables manually to generate code. The software handles categorical predictors automatically. For an example, see “Generate Code to Classify Data in Table” on page 34-110. Preprocess Data and Train SVM Classifier Load the patients data set. Create a table using the Diastolic and Systolic numeric variables. Each row of the table corresponds to a different patient. load patients tbl = table(Diastolic,Systolic); head(tbl)
    Diastolic    Systolic
    _________    ________
       93          124
       77          109
       83          125
       75          117
       80          122
       70          121
       88          130
       82          115
Convert the Gender variable to a categorical variable. The order of the categories in categoricalGender is important because it determines the order of the columns in the predictor data. Use dummyvar to convert the categorical variable to a matrix of zeros and ones, where a 1 value in the (i,j)th entry indicates that the ith patient belongs to the jth category. categoricalGender = categorical(Gender); orderGender = categories(categoricalGender) orderGender = 2x1 cell {'Female'} {'Male' } dummyGender = dummyvar(categoricalGender);
Note: The resulting dummyGender matrix is rank deficient. Depending on the type of model you train, this rank deficiency can be problematic. For example, when training linear models, remove the first column of the dummy variables. Create a table that contains the dummy variable dummyGender with the corresponding variable headings. Combine this new table with tbl.
tblGender = array2table(dummyGender,'VariableNames',orderGender); tbl = [tbl tblGender]; head(tbl)
    Diastolic    Systolic    Female    Male
    _________    ________    ______    ____
       93          124         0        1
       77          109         0        1
       83          125         1        0
       75          117         1        0
       80          122         1        0
       70          121         1        0
       88          130         1        0
       82          115         0        1
Convert the SelfAssessedHealthStatus variable to a categorical variable. Note the order of the categories in categoricalHealth, and convert the variable to a numeric matrix using dummyvar. categoricalHealth = categorical(SelfAssessedHealthStatus); orderHealth = categories(categoricalHealth) orderHealth = 4x1 cell {'Excellent'} {'Fair' } {'Good' } {'Poor' } dummyHealth = dummyvar(categoricalHealth);
Create a table that contains dummyHealth with the corresponding variable headings. Combine this new table with tbl. tblHealth = array2table(dummyHealth,'VariableNames',orderHealth); tbl = [tbl tblHealth]; head(tbl)
    Diastolic    Systolic    Female    Male    Excellent    Fair    Good    Poor
    _________    ________    ______    ____    _________    ____    ____    ____
       93          124         0        1          1         0       0       0
       77          109         0        1          0         1       0       0
       83          125         1        0          0         0       1       0
       75          117         1        0          0         1       0       0
       80          122         1        0          0         0       1       0
       70          121         1        0          0         0       1       0
       88          130         1        0          0         0       1       0
       82          115         0        1          0         0       1       0
The third row of tbl, for example, corresponds to a patient with these characteristics: diastolic blood pressure of 83, systolic blood pressure of 125, female, and good self-assessed health status. Because all the values in tbl are numeric, you can convert the table to a matrix X. X = table2array(tbl);
Train an SVM classifier using X and a Gaussian kernel function with an automatic kernel scale. Specify the Smoker variable as the response. Y = Smoker; Mdl = fitcsvm(X,Y, ... 'KernelFunction','gaussian','KernelScale','auto');
Generate C/C++ Code Generate code that loads the SVM classifier, takes new predictor data as an input argument, and then classifies the new data. Save the SVM classifier to a file using saveLearnerForCoder. saveLearnerForCoder(Mdl,'SVMClassifier')
saveLearnerForCoder saves the classifier to the MATLAB® binary file SVMClassifier.mat as a structure array in the current folder. Define the entry-point function mySVMPredict, which takes new predictor data as an input argument. Within the function, load the SVM classifier by using loadLearnerForCoder, and then pass the loaded classifier to predict. function label = mySVMPredict(X) %#codegen Mdl = loadLearnerForCoder('SVMClassifier'); label = predict(Mdl,X); end
Generate code for mySVMPredict by using codegen. Specify the data type and dimensions of the new predictor data by using coder.typeof so that the generated code accepts a variable-size array. codegen mySVMPredict -args {coder.typeof(X,[Inf 8],[1 0])} Code generation successful.
Verify that mySVMPredict and the MEX file return the same results for the training data. label = predict(Mdl,X); mylabel = mySVMPredict(X); mylabel_mex = mySVMPredict_mex(X); verifyMEX = isequal(label,mylabel,mylabel_mex) verifyMEX = logical 1
Predict Labels for New Data To predict labels for new data, you must first preprocess the new data. If you run the generated code in the MATLAB environment, you can follow the preprocessing steps described in this section. If you deploy the generated code outside the MATLAB environment, the preprocessing steps can differ. In either case, you must ensure that the new data has the same columns as the training data X. In this example, take the third, fourth, and fifth patients in the patients data set. Preprocess the data for these patients so that the resulting numeric matrix matches the form of the training data. Convert the categorical variables to dummy variables. Because the new observations might not include values from all categories, you need to specify the same categories as the ones used during
training and maintain the same category order. In MATLAB, pass the ordered cell array of category names associated with the corresponding training data variable (in this example, orderGender for gender values and orderHealth for self-assessed health status values). newcategoricalGender = categorical(Gender(3:5),orderGender); newdummyGender = dummyvar(newcategoricalGender); newcategoricalHealth = categorical(SelfAssessedHealthStatus(3:5),orderHealth); newdummyHealth = dummyvar(newcategoricalHealth);
Combine all the new data into a numeric matrix. newX = [Diastolic(3:5) Systolic(3:5) newdummyGender newdummyHealth]
newX = 3×8
    83   125     1     0     0     0     1     0
    75   117     1     0     0     1     0     0
    80   122     1     0     0     0     1     0
Note that newX corresponds exactly to the third, fourth, and fifth rows of the matrix X. Verify that mySVMPredict and the MEX file return the same results for the new data. newlabel = predict(Mdl,newX); newmylabel = mySVMPredict(newX); newmylabel_mex = mySVMPredict_mex(newX); newverifyMEX = isequal(newlabel,newmylabel,newmylabel_mex) newverifyMEX = logical 1
See Also dummyvar | categorical | ClassificationSVM | codegen | coder.typeof | loadLearnerForCoder | coder.Constant | saveLearnerForCoder
Related Examples
• “Introduction to Code Generation” on page 34-3
• “Code Generation for Prediction of Machine Learning Model at Command Line” on page 34-10
• “Code Generation for Prediction of Machine Learning Model Using MATLAB Coder App” on page 34-23
• “Code Generation and Classification Learner App” on page 34-32
System Objects for Classification and Code Generation This example shows how to generate C code from a MATLAB® System object™ that classifies images of digits by using a trained classification model. This example also shows how to use the System object for classification in Simulink®. The benefit of using System objects over MATLAB functions is that System objects are more appropriate for processing large amounts of streaming data. For more details, see “What Are System Objects?” This example is based on “Code Generation for Image Classification” on page 34-113, which is an alternative workflow to “Digit Classification Using HOG Features” (Computer Vision Toolbox). Load Data Load the digitimages data set. load digitimages.mat
images is a 28-by-28-by-3000 array of uint16 integers. Each page is a raster image of a digit. Each element is a pixel intensity. Corresponding labels are in the 3000-by-1 numeric vector Y. For more details, enter Description at the command line. Store the number of observations and the number of predictor variables. Create a data partition that specifies to hold out 20% of the data. Extract training and test set indices from the data partition. rng(1); % For reproducibility n = size(images,3); p = numel(images(:,:,1)); cvp = cvpartition(n,'Holdout',0.20); idxTrn = training(cvp); idxTest = test(cvp);
Rescale Data Rescale the pixel intensities so that they range in the interval [0,1] within each image. Specifically, suppose p_ij is pixel intensity j within image i. For image i, rescale all of its pixel intensities by using this formula:
p_ij = (p_ij - min_j(p_ij)) / (max_j(p_ij) - min_j(p_ij))
X = double(images); for i = 1:n minX = min(min(X(:,:,i))); maxX = max(max(X(:,:,i))); X(:,:,i) = (X(:,:,i) - minX)/(maxX - minX); end
Reshape Data For code generation, the predictor data for training must be in a table of numeric variables or a numeric matrix.
Reshape the data to a matrix such that predictor variables correspond to columns and images correspond to rows. Because reshape takes elements column-wise, transpose its result. X = reshape(X,[p,n])';
Train and Optimize Classification Models Cross-validate an ECOC model of SVM binary learners and a random forest based on the training observations. Use 5-fold cross-validation. For the ECOC model, specify predictor standardization and optimize classification error over the ECOC coding design and the SVM box constraint. Explore all combinations of these values: • For the ECOC coding design, use one-versus-one and one-versus-all. • For the SVM box constraint, use three logarithmically spaced values from 0.1 to 100 each. For all models, store the 5-fold cross-validated misclassification rates. coding = {'onevsone' 'onevsall'}; boxconstraint = logspace(-1,2,3); cvLossECOC = nan(numel(coding),numel(boxconstraint)); % For preallocation for i = 1:numel(coding) for j = 1:numel(boxconstraint) t = templateSVM('BoxConstraint',boxconstraint(j),'Standardize',true); CVMdl = fitcecoc(X(idxTrn,:),Y(idxTrn),'Learners',t,'KFold',5,... 'Coding',coding{i}); cvLossECOC(i,j) = kfoldLoss(CVMdl); fprintf('cvLossECOC = %f for model using %s coding and box constraint=%f\n',... cvLossECOC(i,j),coding{i},boxconstraint(j)) end end
cvLossECOC = 0.058333 for model using onevsone coding and box constraint=0.100000
cvLossECOC = 0.057083 for model using onevsone coding and box constraint=3.162278
cvLossECOC = 0.050000 for model using onevsone coding and box constraint=100.000000
cvLossECOC = 0.120417 for model using onevsall coding and box constraint=0.100000
cvLossECOC = 0.121667 for model using onevsall coding and box constraint=3.162278
cvLossECOC = 0.127917 for model using onevsall coding and box constraint=100.000000
For the random forest, vary the maximum number of splits by using the values in the sequence {3^2, 3^3, ..., 3^m}, where m is such that 3^m is no greater than n - 1. To reproduce random predictor selections, specify 'Reproducible',true. n = size(X,1); m = floor(log(n - 1)/log(3)); maxNumSplits = 3.^(2:m); cvLossRF = nan(numel(maxNumSplits)); for i = 1:numel(maxNumSplits) t = templateTree('MaxNumSplits',maxNumSplits(i),'Reproducible',true); CVMdl = fitcensemble(X(idxTrn,:),Y(idxTrn),'Method','bag','Learners',t,... 'KFold',5); cvLossRF(i) = kfoldLoss(CVMdl); fprintf('cvLossRF = %f for model using %d as the maximum number of splits\n',... cvLossRF(i),maxNumSplits(i)) end
cvLossRF = 0.319167 for model using 9 as the maximum number of splits
cvLossRF = 0.192917 for model using 27 as the maximum number of splits
cvLossRF = 0.066250 for model using 81 as the maximum number of splits
cvLossRF = 0.015000 for model using 243 as the maximum number of splits
cvLossRF = 0.013333 for model using 729 as the maximum number of splits
cvLossRF = 0.009583 for model using 2187 as the maximum number of splits
For each algorithm, determine the hyperparameter indices that yield the minimal misclassification rates. minCVLossECOC = min(cvLossECOC(:)) minCVLossECOC = 0.0500 linIdx = find(cvLossECOC == minCVLossECOC,1); [bestI,bestJ] = ind2sub(size(cvLossECOC),linIdx); bestCoding = coding{bestI} bestCoding = 'onevsone' bestBoxConstraint = boxconstraint(bestJ) bestBoxConstraint = 100 minCVLossRF = min(cvLossRF(:)) minCVLossRF = 0.0096 linIdx = find(cvLossRF == minCVLossRF,1); [bestI,bestJ] = ind2sub(size(cvLossRF),linIdx); bestMNS = maxNumSplits(bestI) bestMNS = 2187
The random forest achieves a smaller cross-validated misclassification rate. Train an ECOC model and a random forest using the training data. Supply the optimal hyperparameter combinations. t = templateSVM('BoxConstraint',bestBoxConstraint,'Standardize',true); MdlECOC = fitcecoc(X(idxTrn,:),Y(idxTrn),'Learners',t,'Coding',bestCoding); t = templateTree('MaxNumSplits',bestMNS); MdlRF = fitcensemble(X(idxTrn,:),Y(idxTrn),'Method','bag','Learners',t);
Create a variable for the test sample images and use the trained models to predict test sample labels. testImages = X(idxTest,:); testLabelsECOC = predict(MdlECOC,testImages); testLabelsRF = predict(MdlRF,testImages);
Save Classification Model to Disk MdlECOC and MdlRF are predictive classification models, but you must prepare them for code generation. Save MdlECOC and MdlRF to your present working folder using saveLearnerForCoder. saveLearnerForCoder(MdlECOC,'DigitImagesECOC'); saveLearnerForCoder(MdlRF,'DigitImagesRF');
Create System Object for Prediction Create two System objects, one for the ECOC model and the other for the random forest, that:
• Load the previously saved trained model by using loadLearnerForCoder. • Make sequential predictions by the step method. • Enforce no size changes to the input data. • Enforce double-precision, scalar output. type ECOCClassifier.m % Display contents of ECOCClassifier.m file classdef ECOCClassifier < matlab.System % ECOCCLASSIFIER Predict image labels from trained ECOC model % % ECOCCLASSIFIER loads the trained ECOC model from % |'DigitImagesECOC.mat'|, and predicts labels for new observations % based on the trained model. The ECOC model in % |'DigitImagesECOC.mat'| was cross-validated using the training data % in the sample data |digitimages.mat|. properties(Access = private) CompactMdl % The compacted, trained ECOC model end methods(Access = protected) function setupImpl(obj) % Load ECOC model from file obj.CompactMdl = loadLearnerForCoder('DigitImagesECOC'); end function y = stepImpl(obj,u) y = predict(obj.CompactMdl,u); end function flag = isInputSizeMutableImpl(obj,index) % Return false if input size is not allowed to change while % system is running flag = false; end function dataout = getOutputDataTypeImpl(~) dataout = 'double'; end function sizeout = getOutputSizeImpl(~) sizeout = [1 1]; end end end type RFClassifier.m % Display contents of RFClassifier.m file classdef RFClassifier < matlab.System % RFCLASSIFIER Predict image labels from trained random forest % % RFCLASSIFIER loads the trained random forest from % |'DigitImagesRF.mat'|, and predicts labels for new observations based % on the trained model. The random forest in |'DigitImagesRF.mat'| % was cross-validated using the training data in the sample data % |digitimages.mat|.
properties(Access = private) CompactMdl % The compacted, trained random forest end methods(Access = protected) function setupImpl(obj) % Load random forest from file obj.CompactMdl = loadLearnerForCoder('DigitImagesRF'); end function y = stepImpl(obj,u) y = predict(obj.CompactMdl,u); end function flag = isInputSizeMutableImpl(obj,index) % Return false if input size is not allowed to change while % system is running flag = false; end function dataout = getOutputDataTypeImpl(~) dataout = 'double'; end function sizeout = getOutputSizeImpl(~) sizeout = [1 1]; end end end
Note: If you click the button located in the upper-right section of this page and open this example in MATLAB®, then MATLAB® opens the example folder. This folder includes the files used in this example. For System object basic requirements, see “Define Basic System Objects”. Define Prediction Functions for Code Generation Define two MATLAB functions called predictDigitECOCSO.m and predictDigitRFSO.m. The functions: • Include the code generation directive %#codegen. • Accept image data commensurate with X. • Predict labels using the ECOCClassifier and RFClassifier System objects, respectively. • Return predicted labels. type predictDigitECOCSO.m % Display contents of predictDigitECOCSO.m file function label = predictDigitECOCSO(X) %#codegen %PREDICTDIGITECOCSO Classify digit in image using ECOC Model System object % PREDICTDIGITECOCSO classifies the 28-by-28 images in the rows of X % using the compact ECOC model in the System object ECOCClassifier, and % then returns class labels in label. classifier = ECOCClassifier;
label = step(classifier,X);
end

type predictDigitRFSO.m % Display contents of predictDigitRFSO.m file

function label = predictDigitRFSO(X) %#codegen
%PREDICTDIGITRFSO Classify digit in image using RF Model System object
% PREDICTDIGITRFSO classifies the 28-by-28 images in the rows of X
% using the compact random forest in the System object RFClassifier, and
% then returns class labels in label.
classifier = RFClassifier;
label = step(classifier,X);
end
Compile MATLAB Function to MEX File Compile the prediction function that achieves better test-sample accuracy to a MEX file by using codegen. Specify the test set images by using the -args argument. if(minCVLossECOC 2). Y = Y > 2;
Select 10,000 observations as the training set, and select 10,000 observations to track the model performance metrics.
n = 10000;
Xtrain = X(1:n,:);
Ytrain = Y(1:n,:);
Xmetrics = X(n+1:2*n,:);
Ymetrics = Y(n+1:2*n,:);
Create Incremental Learner Model
Create an incremental linear model for binary classification. Specify that the data has 60 predictors and that the data type of the responses is logical. Also specify to standardize the data using an estimation period of 500 observations before the Update Metrics block outputs performance metrics. Create a workspace variable linearMdl to store the initial incremental learning model.
Mdl = incrementalClassificationLinear(ClassNames=[false,true], ...
    NumPredictors=60,Standardize=true,EstimationPeriod=500, ...
    MetricsWarmupPeriod=500);
linearMdl = Mdl;
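You can optionally read back the configuration from the workspace variable before using it in the Simulink model; this quick check only displays properties that the call above sets.
% Confirm the estimation period, metrics warm-up period, and number of predictors.
linearMdl.EstimationPeriod
linearMdl.MetricsWarmupPeriod
linearMdl.NumPredictors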
Create Input Data for Simulink
Simulate streaming data by dividing the training data into chunks of 50 observations. For each chunk, select a single observation as a test set to import into the IncrementalClassificationLinear Predict block.
numObsPerChunk = 50;
nchunk = floor(n/numObsPerChunk);
for j = 1:nchunk
    ibegin = min(n,numObsPerChunk*(j-1) + 1);
    iend = min(n,numObsPerChunk*j);
    idx = ibegin:iend;
    Xin(:,:,j) = Xtrain(idx,:);
    Yin(:,j) = Ytrain(idx);
    Xm_in(:,:,j) = Xmetrics(idx,:);
    Ym_in(:,j) = Ymetrics(idx);
    Xtest(1,:,j) = Xtrain(idx(1),:);
end
Convert the training, metrics, and test set chunks into time series objects.
t = 0:size(Xin,3)-1;
Xtrain_ts = timeseries(Xin,t,InterpretSingleRowDataAs3D=true);
Ytrain_ts = timeseries(Yin',t,InterpretSingleRowDataAs3D=true);
Xm_ts = timeseries(Xm_in,t,InterpretSingleRowDataAs3D=true);
Ym_ts = timeseries(Ym_in',t,InterpretSingleRowDataAs3D=true);
Xtest_ts = timeseries(Xtest,t,InterpretSingleRowDataAs3D=true);
Open Provided Simulink Model This example provides the Simulink model slexUpdateMetricsExample.slx, which includes the IncrementalClassificationLinear Fit, IncrementalClassificationLinear Predict, and Update Metrics blocks. The Simulink model is configured to use linearMdl as the initial model for incremental learning and classification. Open the Simulink model slexUpdateMetricsExample.slx. slName = "slexUpdateMetricsExample"; open_system(slName);
Simulate Model
Simulate the Simulink model to perform incremental learning, predict responses for the test set observations, and compute performance metrics. Export the simulation outputs to the workspace. You can use the Simulation Data Inspector (Simulink) to view the logged data of an Outport block.
simOut = sim(slName,"StopTime",num2str(numel(t)-1));

% Extract labels
yfit_sig = simOut.yout.getElement(1);
yfit_sl = squeeze(yfit_sig.Values.Data);

% Extract score values
scores_sig = simOut.yout.getElement(2);
scores_sl = squeeze(scores_sig.Values.Data);

% Extract beta values
beta_sig = simOut.yout.getElement(3);
beta_sl = squeeze(beta_sig.Values.Data);

% Extract bias values
bias_sig = simOut.yout.getElement(4);
bias_sl = squeeze(bias_sig.Values.Data);

% Extract IsWarm values
IsWarm_sig = simOut.yout.getElement(5);
IsWarm_sl = squeeze(IsWarm_sig.Values.Data);
% Extract metrics values
metrics_sig = simOut.yout.getElement(6);
metrics_sl = squeeze(metrics_sig.Values.Data);
At each iteration, the IncrementalClassificationLinear Fit block trains the model and updates the model parameters. The IncrementalClassificationLinear Predict block calculates the predicted label for the test set observation, and the Update Metrics block calculates the performance metrics.
Analyze Model During Training
To see how the model parameters and scores evolve during training, plot them on separate tiles.
figure
tiledlayout(3,1);
nexttile
plot(scores_sl(1,:),".")
ylabel("Scores")
xlabel("Iteration")
xlim([0 nchunk])
nexttile
plot(beta_sl(1,:),".-")
ylabel("\beta_1")
xlabel("Iteration")
xlim([0 nchunk])
nexttile
plot(bias_sl,".-")
ylabel("Bias")
xlabel("Iteration")
xlim([0 nchunk])
During the estimation period, the IncrementalClassificationLinear Fit block estimates hyperparameters but does not fit the initial model (see incrementalRegressionLinear). Therefore, the score output of the IncrementalClassificationLinear Predict block, model beta coefficients, and model bias all equal 0 during the first 10 iterations. At the end of the estimation period, the IncrementalClassificationLinear Fit block updates the model parameters and predicts labels. The score values vary between approximately –10 and 5. The first beta coefficient drops significantly during the first 30 iterations following the estimation period, and then varies between –0.14 and –0.07 thereafter. The bias (intercept) term fluctuates initially and then gradually approaches 0.04.
Plot the IsWarm status indicator value and cumulative performance metric on separate tiles.
figure
tiledlayout(2,1);
nexttile
plot(IsWarm_sl,".")
ylabel("IsWarm")
xlabel("Iteration")
xlim([0 nchunk])
nexttile
plot(metrics_sl(:,1),".")
ylabel("Cumulative Metric (classiferror)")
xlabel("Iteration")
xlim([0 nchunk])
ylim([0 0.005])
After the estimation period, the Update Metrics block fits observations during the metrics warm-up period but does not calculate performance metrics. After the metrics warm-up period, IsWarm = 1 and the block calculates the performance metrics. In the provided Simulink model, the model performance metric is classiferror (the classification error). The cumulative classification error is –1 during the first 21 iterations, 0 for the next 23 iterations, and then varies between 0.001 and 0.003.
See Also IncrementalClassificationLinear Fit | IncrementalClassificationLinear Predict | IncrementalRegressionLinear Fit | IncrementalRegressionLinear Predict | incrementalRegressionLinear | fit | predict | updateMetricsAndFit | updateMetrics
Related Examples
• Perform Incremental Learning Using IncrementalClassificationLinear Fit and Predict Blocks on page 34-245
• Perform Incremental Learning Using IncrementalRegressionLinear Fit and Predict Blocks on page 34-241
35 Functions
addedvarplot Create added variable plot using input data
Syntax addedvarplot(X,y,num,inmodel) addedvarplot(X,y,num,inmodel,stats) addedvarplot(ax, ___ )
Description
addedvarplot(X,y,num,inmodel) displays an added variable plot using the predictive terms in X, the response values in y, the added term in column num of X, and the model with current terms specified by inmodel. X is an n-by-p matrix of n observations of p predictive terms. y is a vector of n response values. num is a scalar index specifying the column of X with the term to be added. inmodel is a logical vector of p elements specifying the columns of X in the current model. By default, all elements of inmodel are false.
Note addedvarplot automatically includes a constant term in all models. Do not enter a column of 1s directly into X.
addedvarplot(X,y,num,inmodel,stats) uses the stats output from the stepwisefit function to improve the efficiency of repeated calls to addedvarplot. Otherwise, this syntax is equivalent to the previous syntax.
addedvarplot(ax, ___ ) creates the plot in the axes specified by ax instead of the current axes (gca). The option ax can precede any of the input argument combinations in the previous syntaxes. For more information on creating an Axes object, see axes and gca.
Added variable plots are used to determine the unique effect of adding a new term to a multilinear model. The plot shows the relationship between the part of the response unexplained by terms already in the model and the part of the new term unexplained by terms already in the model. The “unexplained” parts are measured by the residuals of the respective regressions. A scatter of the residuals from the two regressions forms the added variable plot. In addition to the scatter of residuals, the plot produced by addedvarplot shows 95% confidence intervals on predictions from the fitted line. The slope of the fitted line is the coefficient that the new term would have if it were added to the model with terms inmodel. For more details, see “Added Variable Plot” on page 35-6078. Added variable plots are sometimes known as partial regression leverage plots.
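As a sketch of the stats syntax (the call below assumes the hald data used in the example that follows, and that column 3 is not selected by stepwisefit), reusing the stepwisefit output avoids refitting the current model on every call:
load hald
[b,se,pval,finalmodel,stats] = stepwisefit(ingredients,heat,'display','off');
figure
addedvarplot(ingredients,heat,3,finalmodel,stats)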
Examples
Create Added Variable Plot
Load the data in hald.mat, which contains observations of the reaction to heat for various cement mixtures.
load hald
whos
  Name          Size       Bytes  Class     Attributes

  Description   22x58       2552  char
  hald          13x5         520  double
  heat          13x1         104  double
  ingredients   13x4         416  double
Create an added variable plot to investigate the effect of adding the third column of ingredients to a model that contains the first two columns. inmodel = [true true false false]; addedvarplot(ingredients,heat,3,inmodel)
The wide scatter plot and the low slope of the fitted line are evidence against the statistical significance of adding the third column to the model.
Alternative Functionality
You can create a linear regression model object LinearModel by using fitlm or stepwiselm and use the object function plotAdded to create an added variable plot. A LinearModel object provides the object properties and the object functions to investigate a fitted linear regression model. The object properties include information about coefficient estimates,
summary statistics, fitting method, and input data. Use the object functions to predict responses and to modify, evaluate, and visualize the linear regression model.
Version History Introduced before R2006a
See Also stepwisefit | stepwise | plotAdded
addK Package: clustering.evaluation Evaluate additional numbers of clusters
Syntax updatedEvaluation = addK(evaluation,klist)
Description updatedEvaluation = addK(evaluation,klist) returns a clustering evaluation object updatedEvaluation, which contains the evaluation data in the clustering evaluation object evaluation and additional evaluation data for the proposed number of clusters specified in klist.
Examples Evaluate Additional Numbers of Clusters Create a clustering evaluation object using evalclusters, and then use addK to evaluate additional numbers of clusters. Load the fisheriris data set. The data contains length and width measurements from the sepals and petals of three species of iris flowers. load fisheriris
Cluster the flower measurement data using kmeans, and use the Calinski-Harabasz criterion to evaluate proposed solutions for 1 to 5 clusters.
evaluation = evalclusters(meas,"kmeans","CalinskiHarabasz","KList",1:5)
evaluation = 
  CalinskiHarabaszEvaluation with properties:
    NumObservations: 150
         InspectedK: [1 2 3 4 5]
    CriterionValues: [NaN 513.9245 561.6278 530.4871 456.1279]
           OptimalK: 3
The clustering evaluation object evaluation contains data on each proposed clustering solution. The returned value of OptimalK indicates that the optimal solution is three clusters. Evaluate proposed solutions for 6 to 10 clusters using the same criterion. Add these evaluations to the original clustering evaluation object. evaluation = addK(evaluation,6:10) evaluation = CalinskiHarabaszEvaluation with properties:
    NumObservations: 150
         InspectedK: [1 2 3 4 5 6 7 8 9 10]
    CriterionValues: [NaN 513.9245 561.6278 530.4871 456.1279 469.5068 449.6410 435.8182 413.3837 …]
           OptimalK: 3
The updated values for InspectedK and CriterionValues show that evaluation now evaluates proposed solutions for 1 to 10 clusters. The OptimalK value is still 3, indicating that the optimal solution is still three clusters.
Input Arguments evaluation — Clustering evaluation data CalinskiHarabaszEvaluation object | DaviesBouldinEvaluation object | GapEvaluation object | SilhouetteEvaluation object Clustering evaluation data, specified as a CalinskiHarabaszEvaluation, DaviesBouldinEvaluation, GapEvaluation, or SilhouetteEvaluation clustering evaluation object. Create a clustering evaluation object by using evalclusters. klist — Additional number of clusters to evaluate positive integer vector Additional number of clusters to evaluate, specified as a positive integer vector. If any values in klist overlap with clustering solutions already evaluated in the evaluation object, then addK ignores the overlapping values. Data Types: single | double
Output Arguments updatedEvaluation — Updated clustering evaluation data CalinskiHarabaszEvaluation object | DaviesBouldinEvaluation object | GapEvaluation object | SilhouetteEvaluation object Updated clustering evaluation data, returned as a CalinskiHarabaszEvaluation, DaviesBouldinEvaluation, GapEvaluation, or SilhouetteEvaluation clustering evaluation object. updatedEvaluation contains data on the proposed clustering solutions included in evaluation and data on the additional proposed number of clusters specified in klist. For all clustering evaluation objects, addK updates the InspectedK and CriterionValues properties to include the proposed clustering solutions specified in klist and their corresponding criterion values. If the software finds a new optimal number of clusters and optimal clustering solution, then addK also updates the OptimalK and OptimalY properties. For certain clustering evaluation objects, addK updates these additional property values: • LogW, ExpectedLogW, StdLogW, and SE (for gap criterion evaluation objects) • ClusterSilhouettes (for silhouette criterion evaluation objects)
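As a usage sketch continuing the example above, the evaluations added by another call to addK are visible directly in these properties:
updatedEvaluation = addK(evaluation,11:12);   % evaluate two more proposed numbers of clusters
updatedEvaluation.InspectedK                  % now includes 11 and 12 at the end
updatedEvaluation.CriterionValues             % criterion values appear in the same order
updatedEvaluation.OptimalK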
Version History Introduced in R2014a
See Also evalclusters | CalinskiHarabaszEvaluation | DaviesBouldinEvaluation | GapEvaluation | SilhouetteEvaluation
addlevels (Not Recommended) Add levels to nominal or ordinal arrays Note The nominal and ordinal array data types are not recommended. To represent ordered and unordered discrete, nonnumeric data, use the “Categorical Arrays” data type instead.
Syntax B = addlevels(A,newlevels)
Description B = addlevels(A,newlevels) adds new levels specified by newlevels to the nominal or ordinal array A. addlevels adds the new levels at the end of the list of possible levels in A, but does not modify the value of any element. B does not contain elements at the new levels.
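The behavior is the same for ordinal arrays; in the following minimal sketch (with made-up size labels), the appended level becomes the highest level because addlevels places new levels at the end:
sizes = ordinal([1;2;3],{'small','medium','large'});   % levels: small < medium < large
sizes = addlevels(sizes,{'extra-large'});              % extra-large is now the highest level
getlevels(sizes)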
Examples
Add Levels to a Nominal Array
Add levels for additional species to Fisher's iris data.
Create a nominal array of the existing species in Fisher's iris data.
load fisheriris
species = nominal(species);
getlevels(species)
ans = 1x3 nominal
     setosa      versicolor      virginica 
Add two additional species.
species = addlevels(species,{'spuria','ruthenica'});
getlevels(species)
ans = 1x5 nominal
     setosa      versicolor      virginica      spuria      ruthenica 
Even though there are new levels, there are no elements in species that are in these new levels. sum(species=='spuria') ans = 0 sum(species=='ruthenica') ans = 0
Input Arguments A — Nominal or ordinal array nominal array | ordinal array Nominal or ordinal array, specified as a nominal or ordinal array object created with nominal or ordinal. newlevels — Levels to add string array | cell array of character vectors | 2-D character matrix Levels to add to the input nominal or ordinal array, specified as a string array, a cell array of character vectors, or a 2-D character matrix. Data Types: char | string | cell
Output Arguments B — Nominal or ordinal array nominal array | ordinal array Nominal or ordinal array, returned as a nominal or ordinal array object.
Version History Introduced in R2007a
See Also droplevels | mergelevels | reorderlevels | nominal | ordinal
addInteractions Add interaction terms to univariate generalized additive model (GAM)
Syntax UpdatedMdl = addInteractions(Mdl,Interactions) UpdatedMdl = addInteractions(Mdl,Interactions,Name,Value)
Description UpdatedMdl = addInteractions(Mdl,Interactions) returns an updated model UpdatedMdl by adding the interaction terms in Interactions to the univariate generalized additive model Mdl. The model Mdl must contain only linear terms for predictors. If you want to resume training for the existing terms in Mdl, use the resume function. UpdatedMdl = addInteractions(Mdl,Interactions,Name,Value) specifies additional options using one or more name-value arguments. For example, 'MaxPValue',0.05 specifies to include only the interaction terms whose p-values are not greater than 0.05.
Examples Train GAM with Interaction Terms Train a univariate GAM, which contains linear terms for predictors, and then add interaction terms to the trained model by using the addInteractions function. Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s. load carbig
Create a table that contains the predictor variables (Acceleration, Displacement, Horsepower, and Weight) and the response variable (MPG). tbl = table(Acceleration,Displacement,Horsepower,Weight,MPG);
Train a univariate GAM that contains linear terms for predictors in tbl. Mdl = fitrgam(tbl,'MPG');
Add the five most important interaction terms to the trained model. UpdatedMdl = addInteractions(Mdl,5);
Mdl is a univariate GAM, and UpdatedMdl is an updated GAM that contains all the terms in Mdl and five additional interaction terms. Display the interaction terms in UpdatedMdl. UpdatedMdl.Interactions ans = 5×2
     2     3
     1     2
     3     4
     1     4
     1     3
Each row of the Interactions property represents one interaction term and contains the column indexes of the predictor variables for the interaction term. You can use the Interactions property to check the interaction terms in the model and the order in which fitrgam adds them to the model.
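For instance, you can translate those column indexes into predictor names; a short sketch continuing this example:
% Each row of Interactions pairs two predictors; index into PredictorNames to see which.
UpdatedMdl.PredictorNames(UpdatedMdl.Interactions)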
Specify Options for Interaction Terms Train a univariate GAM, which contains linear terms for predictors, and then add interaction terms to the trained model by using the addInteractions function. Specify the 'MaxPValue' name-value argument to add interaction terms whose p-values are not greater than the 'MaxPValue' value. Load Fisher's iris data set. Create a table that contains observations for versicolor and virginica. load fisheriris inds = strcmp(species,'versicolor') | strcmp(species,'virginica'); Tbl = array2table(meas(inds,:),'VariableNames',["x1","x2","x3","x4"]); Tbl.Y = species(inds,:);
Train a univariate GAM that contains linear terms for predictors in Tbl. Mdl = fitcgam(Tbl,'Y');
Add important interaction terms to the trained model Mdl. Specify 'all' for the Interactions argument, and set the 'MaxPValue' name-value argument to 0.05. Among all available interaction terms, addInteractions identifies those whose p-values are not greater than the 'MaxPValue' value and adds them to the model. The default 'MaxPValue' is 1 so that the function adds all specified interaction terms to the model.
UpdatedMdl = addInteractions(Mdl,'all','MaxPValue',0.05);
UpdatedMdl.Interactions
ans = 5×2
     3     4
     2     4
     1     4
     2     3
     1     3
Mdl is a univariate GAM, and UpdatedMdl is an updated GAM that contains all the terms in Mdl and five additional interaction terms. UpdatedMdl includes five of the six available pairs of interaction terms.
Input Arguments Mdl — Generalized additive model ClassificationGAM model object | RegressionGAM model object Generalized additive model, specified as a ClassificationGAM or RegressionGAM model object. Interactions — Number of interaction terms or list of interaction terms 0 | nonnegative integer | logical matrix | 'all' Number or list of interaction terms to include in the candidate set S, specified as a nonnegative integer scalar, a logical matrix, or 'all'. • Number of interaction terms, specified as a nonnegative integer — S includes the specified number of important interaction terms, selected based on the p-values of the terms. • List of interaction terms, specified as a logical matrix — S includes the terms specified by a t-by-p logical matrix, where t is the number of interaction terms, and p is the number of predictors used to train the model. For example, logical([1 1 0; 0 1 1]) represents two pairs of interaction terms: a pair of the first and second predictors, and a pair of the second and third predictors. If addInteractions uses a subset of input variables as predictors, then the function indexes the predictors using only the subset. That is, the column indexes of the logical matrix do not count the response and observation weight variables. The indexes also do not count any variables not used by the function. • 'all' — S includes all possible pairs of interaction terms, which is p*(p – 1)/2 number of terms in total. Among the interaction terms in S, the addInteractions function identifies those whose p-values are not greater than the 'MaxPValue' value and uses them to build a set of interaction trees. Use the default value ('MaxPValue',1) to build interaction trees using all terms in S. Data Types: single | double | logical | char | string Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: addInteractions(Mdl,'all','MaxPValue',0.05,'Verbose',1,'NumPrints',10) specifies to include all available interaction terms whose p-values are not greater than 0.05 and to display diagnostic messages every 10 iterations. InitialLearnRateForInteractions — Learning rate of gradient boosting for interaction terms 1 (default) | numeric scalar in (0,1] Initial learning rate of gradient boosting for interaction terms, specified as a numeric scalar in the interval (0,1]. For each boosting iteration for interaction trees, addInteractions starts fitting with the initial learning rate. For classification, the function halves the learning rate until it finds a rate that improves the model fit. For regression, the function uses the initial rate throughout the training. 35-12
Training a model using a small learning rate requires more learning iterations, but often achieves better accuracy. For more details about gradient boosting, see “Gradient Boosting Algorithm” on page 35-15. Example: 'InitialLearnRateForInteractions',0.1 Data Types: single | double MaxNumSplitsPerInteraction — Maximum number of decision splits per interaction tree 4 (default) | positive integer scalar Maximum number of decision splits (or branch nodes) for each interaction tree (boosted tree for an interaction term), specified as a positive integer scalar. Example: 'MaxNumSplitsPerInteraction',5 Data Types: single | double MaxPValue — Maximum p-value for detecting interaction terms 1 (default) | numeric scalar in [0,1] Maximum p-value for detecting interaction terms, specified as a numeric scalar in the interval [0,1]. addInteractions first finds the candidate set S of interaction terms from the Interactions value. Then the function identifies the interaction terms whose p-values are not greater than the 'MaxPValue' value and uses them to build a set of interaction trees. The default value ('MaxPValue',1) builds interaction trees for all interaction terms in the candidate set S. For more details about detecting interaction terms, see “Interaction Term Detection” on page 35-16. Example: 'MaxPValue',0.05 Data Types: single | double NumPrint — Number of iterations between diagnostic message printouts Mdl.ModelParameters.NumPrint (default) | nonnegative integer scalar Number of iterations between diagnostic message printouts, specified as a nonnegative integer scalar. This argument is valid only when you specify 'Verbose' as 1. If you specify 'Verbose',1 and 'NumPrint',numPrint, then the software displays diagnostic messages every numPrint iterations in the Command Window. The default value is Mdl.ModelParameters.NumPrint, which is the NumPrint value that you specify when creating the GAM object Mdl. Example: 'NumPrint',500 Data Types: single | double NumTreesPerInteraction — Number of trees per interaction term 100 (default) | positive integer scalar Number of trees per interaction term, specified as a positive integer scalar. The 'NumTreesPerInteraction' value is equivalent to the number of gradient boosting iterations for the interaction terms for predictors. For each iteration, addInteractions adds a set of 35-13
interaction trees to the model, one tree for each interaction term. To learn about the gradient boosting algorithm, see “Gradient Boosting Algorithm” on page 35-15. You can determine whether the fitted model has the specified number of trees by viewing the diagnostic message displayed when 'Verbose' is 1 or 2, or by checking the ReasonForTermination property value of the model Mdl. Example: 'NumTreesPerInteraction',500 Data Types: single | double
Verbose — Verbosity level Mdl.ModelParameters.VerbosityLevel (default) | 0 | 1 | 2
Verbosity level, specified as 0, 1, or 2. The Verbose value controls the amount of information that the software displays in the Command Window. This table summarizes the available verbosity level options.
Value    Description
0        The software displays no information.
1        The software displays diagnostic messages every numPrint iterations, where numPrint is the 'NumPrint' value.
2        The software displays diagnostic messages at every iteration.
Each line of the diagnostic messages shows the information about each boosting iteration and includes the following columns:
• Type — Type of trained trees, 1D (predictor trees, or boosted trees for linear terms for predictors) or 2D (interaction trees, or boosted trees for interaction terms for predictors)
• NumTrees — Number of trees per linear term or interaction term that addInteractions added to the model so far
• Deviance — “Deviance” on page 35-15 of the model
• RelTol — Relative change of model predictions: (yk − yk−1)′(yk − yk−1)/(yk′yk), where yk is a column vector of model predictions at iteration k
• LearnRate — Learning rate used for the current iteration
The default value is Mdl.ModelParameters.VerbosityLevel, which is the Verbose value that you specify when creating the GAM object Mdl. Example: 'Verbose',1 Data Types: single | double
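A hedged sketch that combines several of these name-value arguments in one call (the values are illustrative only, not recommendations):
% Add up to 10 interaction terms, boost each with a smaller initial learning rate
% and more trees per term, and print a diagnostic message every 50 iterations.
UpdatedMdl = addInteractions(Mdl,10, ...
    InitialLearnRateForInteractions=0.1, ...
    NumTreesPerInteraction=300, ...
    MaxNumSplitsPerInteraction=5, ...
    Verbose=1,NumPrint=50);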
Output Arguments UpdatedMdl — Updated generalized additive model ClassificationGAM model object | RegressionGAM model object Updated generalized additive model, returned as a ClassificationGAM or RegressionGAM model object. UpdatedMdl has the same object type as the input model Mdl. To overwrite the input argument Mdl, assign the output of addInteractions to Mdl: 35-14
Mdl = addInteractions(Mdl,Interactions);
More About Deviance Deviance is a generalization of the residual sum of squares. It measures the goodness of fit compared to the saturated model. The deviance of a fitted model is twice the difference between the loglikelihoods of the model and the saturated model: -2(logL - logLs), where L and Ls are the likelihoods of the fitted model and the saturated model, respectively. The saturated model is the model with the maximum number of parameters that you can estimate. addInteractions uses the deviance to measure the goodness of model fit and finds a learning rate that reduces the deviance at each iteration. Specify 'Verbose' as 1 or 2 to display the deviance and learning rate in the Command Window.
Algorithms Gradient Boosting Algorithm addInteractions adds sets of interaction trees (boosted trees for interaction terms for predictors) to a univariate generalized additive model by using a gradient boosting algorithm (“Least-Squares Boosting” on page 19-53 for regression and “Adaptive Logistic Regression” on page 19-51 for classification). The algorithm iterates for at most 'NumTreesPerInteraction' times for interaction trees. For each boosting iteration, addInteractions builds a set of interaction trees with the initial learning rate 'InitialLearnRateForInteractions'. • When building a set of trees, the function trains one tree at a time. It fits a tree to the residual that is the difference between the response (observed response values for regression or scores of observed classes for classification) and the aggregated prediction from all trees grown previously. To control the boosting learning speed, the function shrinks the tree by the learning rate and then adds the tree to the model and updates the residual. • Updated model = current model + (learning rate)·(new tree) • Updated residual = current residual – (learning rate)·(response explained by new tree) • If adding the set of trees improves the model fit (that is, reduces the deviance of the fit by a value larger than the tolerance), then addInteractions moves to the next iteration. • Otherwise, for classification, addInteractions halves the learning rate and uses it to update the model and residual. The function continues to halve the learning rate until it finds a rate that improves the model fit. If the function cannot find such a learning rate for interaction trees, then it terminates the model fitting. For regression, if adding the set of trees does not improve the model fit with the initial learning rate, then the function terminates the model fitting. You can determine why training stopped by checking the ReasonForTermination property of the trained model. 35-15
Interaction Term Detection
For each pairwise interaction term xi*xj (specified by Interactions), the software performs an F-test to examine whether the term is statistically significant. To speed up the process, addInteractions bins numeric predictors into at most 8 equiprobable bins. The number of bins can be less than 8 if a predictor has fewer than 8 unique values. The F-test examines the null hypothesis that the bins created by xi and xj have equal responses versus the alternative that at least one bin has a different response value from the others. A small p-value indicates that differences are significant, which implies that the corresponding interaction term is significant and, therefore, including the term can improve the model fit. addInteractions builds a set of interaction trees using the terms whose p-values are not greater than the 'MaxPValue' value. You can use the default 'MaxPValue' value 1 to build interaction trees using all terms specified by Interactions. addInteractions adds interaction terms to the model in the order of importance based on the p-values. Use the Interactions property of the returned model to check the order of the interaction terms added to the model.
Version History Introduced in R2021a
See Also resume | RegressionGAM | ClassificationGAM Topics “Train Generalized Additive Model for Binary Classification” on page 12-77 “Train Generalized Additive Model for Regression” on page 12-86
addlistener Class: qrandstream Add listener for event
Syntax el = addlistener(hsource,'eventname',callback) el = addlistener(hsource,property,'eventname',callback)
Description el = addlistener(hsource,'eventname',callback) creates a listener for the event named eventname, the source of which is handle object hsource. If hsource is an array of source handles, the listener responds to the named event on any handle in the array. callback is a function handle that is invoked when the event is triggered. el = addlistener(hsource,property,'eventname',callback) adds a listener for a property event. eventname must be 'PreGet', 'PostGet', 'PreSet', or 'PostSet'. property must be either a property name or cell array of property names, or a meta.property or array of meta.property. The properties must belong to the class of hsource. If hsource is scalar, property can include dynamic properties. For all forms, addlistener returns an event.listener. To remove a listener, delete the object returned by addlistener. For example, delete(el) calls the handle class delete method to remove the listener and delete it from the workspace.
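A minimal usage sketch: the event used here is ObjectBeingDestroyed, which every handle object (including qrandstream) defines, and the callback simply prints a message.
q = qrandstream('halton',2);                      % quasi-random stream (a handle object)
el = addlistener(q,'ObjectBeingDestroyed', ...
    @(src,evt) disp('Stream is being deleted'));
delete(q)                                         % triggers the listener callback
delete(el)                                        % remove the listener object itself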
See Also delete | dynamicprops | event.listener | events | meta.property | notify | qrandstream | reset
addMetrics Compute additional classification performance metrics
Syntax UpdatedROCObj = addMetrics(rocObj,metrics)
Description rocmetrics computes the false positive rates (FPR), true positive rates (TPR), and additional metrics specified by the AdditionalMetrics name-value argument. After creating a rocmetrics object, you can compute additional classification performance metrics by using the addMetrics function. UpdatedROCObj = addMetrics(rocObj,metrics) computes additional classification performance metrics specified in metrics using the classification model information stored in the rocmetrics object rocObj. UpdatedROCObj contains all the information in rocObj plus additional performance metrics computed by addMetrics. The function attaches the additional computed metrics (metrics) as new variables in the table of the Metrics property. If you compute confidence intervals when you create rocObj, the addMetrics function computes the confidence intervals for the additional metrics. The new variables in the Metrics property contain a three-column matrix in which the first column corresponds to the metric values, and the second and third columns correspond to the lower and upper bounds, respectively.
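For example (a sketch with hypothetical inputs trueLabels, scores, and classNames), if you create the rocmetrics object with bootstrap confidence intervals, any metric added later also receives the three-column form:
% NumBootstraps turns on bootstrap confidence intervals; the added metric then
% has columns [value lower upper] in the Metrics table.
rocObj = rocmetrics(trueLabels,scores,classNames,NumBootstraps=100);
rocObj = addMetrics(rocObj,"Accuracy");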
Examples Compute Additional Metrics Compute the performance metrics (FPR, TPR, and expected cost) for a multiclass classification problem when you create a rocmetrics object. Compute additional metrics, the positive predictive value (PPV) and the negative predictive value (NPV), and add them to the object. Load the fisheriris data set. The matrix meas contains flower measurements for 150 different flowers. The vector species lists the species for each flower. species contains three distinct flower names. load fisheriris
Train a classification tree that classifies observations into one of the three labels. Cross-validate the model using 10-fold cross-validation. rng("default") % For reproducibility Mdl = fitctree(meas,species,Crossval="on");
Compute the classification scores for validation-fold observations.
[~,Scores] = kfoldPredict(Mdl);
size(Scores)
ans = 1×2
   150     3
Scores is a matrix of size 150-by-3. The column order of Scores follows the class order in Mdl. Display the class order stored in Mdl.ClassNames. Mdl.ClassNames ans = 3x1 cell {'setosa' } {'versicolor'} {'virginica' }
Create a rocmetrics object by using the true labels in species and the classification scores in Scores. Specify the column order of Scores using Mdl.ClassNames. By default, rocmetrics computes the FPR and TPR. Specify AdditionalMetrics="ExpectedCost" to compute the expected cost as well. rocObj = rocmetrics(species,Scores,Mdl.ClassNames, ... AdditionalMetrics="ExpectedCost");
The table in the Metrics property of rocObj contains performance metric values for all three classes, vertically concatenated according to the class order. Find and display the rows for the second class in the table.
idx = strcmp(rocObj.Metrics.ClassName,Mdl.ClassNames(2));
rocObj.Metrics(idx,:)
ans=13×5 table
      ClassName       Threshold    FalsePositiveRate    TruePositiveRate    ExpectedCost
    ______________    _________    _________________    ________________    ____________
    {'versicolor'}            1          0                     0              0.074074
    {'versicolor'}            1          0.01                  0.7            0.023704
    {'versicolor'}      0.95455          0.02                  0.8            0.017778
    {'versicolor'}      0.91304          0.03                  0.9            0.011852
    {'versicolor'}         -0.2          0.04                  0.9            0.013333
    {'versicolor'}     -0.33333          0.06                  0.9            0.016296
    {'versicolor'}         -0.6          0.08                  0.9            0.019259
    {'versicolor'}     -0.86957          0.12                  0.92           0.023704
    {'versicolor'}     -0.91111          0.16                  0.96           0.026667
    {'versicolor'}     -0.95122          0.31                  0.96           0.048889
    {'versicolor'}     -0.95238          0.38                  0.98           0.057778
    {'versicolor'}     -0.95349          0.44                  0.98           0.066667
    {'versicolor'}           -1          1                     1              0.14815
The table in Metrics contains the variables for the class names, threshold, false positive rate, true positive rate, and expected cost (the additional metric).
After creating a rocmetrics object, you can compute additional metrics using the classification model information stored in the object. Compute the PPV and NPV by using the addMetrics function. To overwrite the input argument rocObj, assign the output of addMetrics to the input. rocObj = addMetrics(rocObj,["PositivePredictiveValue","NegativePredictiveValue"]);
Display the Metrics property.
rocObj.Metrics(idx,:)
ans=13×7 table
      ClassName       Threshold    FalsePositiveRate    TruePositiveRate    ExpectedCost
    ______________    _________    _________________    ________________    ____________
    {'versicolor'}            1          0                     0              0.074074
    {'versicolor'}            1          0.01                  0.7            0.023704
    {'versicolor'}      0.95455          0.02                  0.8            0.017778
    {'versicolor'}      0.91304          0.03                  0.9            0.011852
    {'versicolor'}         -0.2          0.04                  0.9            0.013333
    {'versicolor'}     -0.33333          0.06                  0.9            0.016296
    {'versicolor'}         -0.6          0.08                  0.9            0.019259
    {'versicolor'}     -0.86957          0.12                  0.92           0.023704
    {'versicolor'}     -0.91111          0.16                  0.96           0.026667
    {'versicolor'}     -0.95122          0.31                  0.96           0.048889
    {'versicolor'}     -0.95238          0.38                  0.98           0.057778
    {'versicolor'}     -0.95349          0.44                  0.98           0.066667
    {'versicolor'}           -1          1                     1              0.14815
The table in Metrics now includes the PositivePredictiveValue and NegativePredictiveValue variables in the last two columns, in the order you specified. Note that the positive predictive value (PPV = TP/(TP+FP)) is NaN for the reject-all threshold (largest threshold), and the negative predictive value (NPV = TN/(TN+FN)) is NaN for the accept-all threshold (lowest threshold). TP, FP, TN, and FN represent the number of true positives, false positives, true negatives, and false negatives, respectively.
Input Arguments rocObj — Object evaluating classification performance rocmetrics object Object evaluating classification performance, specified as a rocmetrics object. metrics — Additional model performance metrics character vector | string array | function handle | cell array Additional model performance metrics to compute, specified as a character vector or string scalar of the built-in metric name, string array of names, function handle (@metricName), or cell array of names or function handles. A rocmetrics object always computes the false positive rates (FPR) and the true positive rates (TPR) to obtain a ROC curve. Therefore, you do not have to specify to compute FPR and TPR. • Built-in metrics — Specify one of the following built-in metric names by using a character vector or string scalar. You can specify more than one by using a string array. 35-20
Name                                           Description
"TruePositives" or "tp"                        Number of true positives (TP)
"FalseNegatives" or "fn"                       Number of false negatives (FN)
"FalsePositives" or "fp"                       Number of false positives (FP)
"TrueNegatives" or "tn"                        Number of true negatives (TN)
"SumOfTrueAndFalsePositives" or "tp+fp"        Sum of TP and FP
"RateOfPositivePredictions" or "rpp"           Rate of positive predictions (RPP), (TP+FP)/(TP+FN+FP+TN)
"RateOfNegativePredictions" or "rnp"           Rate of negative predictions (RNP), (TN+FN)/(TP+FN+FP+TN)
"Accuracy" or "accu"                           Accuracy, (TP+TN)/(TP+FN+FP+TN)
"FalseNegativeRate", "fnr", or "miss"          False negative rate (FNR), or miss rate, FN/(TP+FN)
"TrueNegativeRate", "tnr", or "spec"           True negative rate (TNR), or specificity, TN/(TN+FP)
"PositivePredictiveValue", "ppv", or "prec"    Positive predictive value (PPV), or precision, TP/(TP+FP)
"NegativePredictiveValue" or "npv"             Negative predictive value (NPV), TN/(TN+FN)
"ExpectedCost" or "ecost"                      Expected cost, (TP*cost(P|P)+FN*cost(N|P)+FP*cost(P|N)+TN*cost(N|N))/(TP+FN+FP+TN), where cost is a 2-by-2 misclassification cost matrix containing [0,cost(N|P);cost(P|N),0]. cost(N|P) is the cost of misclassifying a positive class (P) as a negative class (N), and cost(P|N) is the cost of misclassifying a negative class as a positive class. The software converts the K-by-K matrix specified by the Cost name-value argument of rocmetrics to a 2-by-2 matrix for each one-versus-all binary problem. For details, see “Misclassification Cost Matrix” on page 18-12.
The software computes the scale vector using the prior class probabilities (Prior) and the number of classes in Labels, and then scales the performance metrics according to this scale vector. For details, see “Performance Metrics” on page 18-11. • Custom metric — Specify a custom metric by using a function handle. A custom function that returns a performance metric must have this form: metric = customMetric(C,scale,cost)
• The output argument metric is a scalar value. • A custom metric is a function of the confusion matrix (C), scale vector (scale), and cost matrix (cost). The software finds these input values for each one-versus-all binary problem. For details, see “Performance Metrics” on page 18-11. • C is a 2-by-2 confusion matrix consisting of [TP,FN;FP,TN]. 35-21
• scale is a 2-by-1 scale vector. • cost is a 2-by-2 misclassification cost matrix. The software does not support cross-validation for a custom metric. Instead, you can specify to use bootstrap when you create a rocmetrics object. Note that the positive predictive value (PPV) is NaN for the reject-all threshold for which TP = FP = 0, and the negative predictive value (NPV) is NaN for the accept-all threshold for which TN = FN = 0. For more details, see “Thresholds, Fixed Metric, and Fixed Metric Values” on page 18-15. Example: ["Accuracy","PositivePredictiveValue"] Example: {"Accuracy",@m1,@m2} specifies the accuracy metric and the custom metrics m1 and m2 as additional metrics. addMetrics stores the custom metric values as variables named CustomMetric1 and CustomMetric2 in the Metrics property. Data Types: char | string | cell | function_handle
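For illustration of the custom-metric form metric = customMetric(C,scale,cost) described above, here is one possible function, a balanced accuracy metric. The function name and the choice to ignore scale and cost are specific to this sketch, not part of the toolbox.
function metric = balancedAccuracy(C,scale,cost) %#ok<INUSD>
% balancedAccuracy Average of the true positive and true negative rates.
% C is the 2-by-2 confusion matrix [TP,FN;FP,TN] supplied for each
% one-versus-all binary problem; scale and cost are required by the
% interface but unused here.
tpr = C(1,1)/(C(1,1) + C(1,2));   % TP/(TP+FN)
tnr = C(2,2)/(C(2,2) + C(2,1));   % TN/(TN+FP)
metric = (tpr + tnr)/2;
end
You could then pass the handle as addMetrics(rocObj,@balancedAccuracy); the result appears as the variable CustomMetric1 in the Metrics property.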
Output Arguments UpdatedROCObj — Object evaluating classification performance rocmetrics object Object evaluating classification performance, returned as a rocmetrics object. To overwrite the input argument rocObj, assign the output of addMetrics to rocObj: rocObj = addMetrics(rocObj,metrics);
Version History Introduced in R2022a
See Also rocmetrics | average | plot Topics “ROC Curve and Performance Metrics” on page 18-3
anova Class: GeneralizedLinearMixedModel Analysis of variance for generalized linear mixed-effects model
Syntax stats = anova(glme) stats = anova(glme,Name,Value)
Description stats = anova(glme) returns a table, stats, that contains the results of F-tests to determine if all coefficients representing each fixed-effects term in the generalized linear mixed-effects model glme are equal to 0. stats = anova(glme,Name,Value) returns a table, stats, using additional options specified by one or more Name,Value pair arguments. For example, you can specify the method used to compute the approximate denominator degrees of freedom for the F-tests.
Input Arguments
glme — Generalized linear mixed-effects model GeneralizedLinearMixedModel object
Generalized linear mixed-effects model, specified as a GeneralizedLinearMixedModel object. For properties and methods of this object, see GeneralizedLinearMixedModel.
Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
DFMethod — Method for computing approximate denominator degrees of freedom 'residual' (default) | 'none'
Method for computing approximate denominator degrees of freedom to use in the F-test, specified as the comma-separated pair consisting of 'DFMethod' and one of the following.
Value         Description
'residual'    The degrees of freedom are assumed to be constant and equal to n – p, where n is the number of observations and p is the number of fixed effects.
'none'        All degrees of freedom are set to infinity.
The denominator degrees of freedom for the F-statistic correspond to the column DF2 in the output structure stats. Example: 'DFMethod','none'
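For instance, this sketch requests the alternative method on a fitted model (glme here is any GeneralizedLinearMixedModel object):
% Compute the marginal F-tests with all denominator degrees of freedom set to infinity.
stats = anova(glme,'DFMethod','none');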
Output Arguments
stats — Results of F-tests for fixed-effects terms table
Results of F-tests for fixed-effects terms, returned as a table with one row for each fixed-effects term in glme and the following columns.
Column Name    Description
Term           Name of the fixed-effects term
FStat          F-statistic for the term
DF1            Numerator degrees of freedom for the F-statistic
DF2            Denominator degrees of freedom for the F-statistic
pValue         p-value for the term
Each fixed-effects term is a continuous variable, a grouping variable, or an interaction between two or more continuous or grouping variables. For each fixed-effects term, anova performs an F-test (marginal test) to determine if all coefficients representing the fixed-effects term are equal to 0. To perform tests for the type III hypothesis, when fitting the generalized linear mixed-effects model fitglme, you must use the 'effects' contrasts for the 'DummyVarCoding' name-value pair argument.
Examples F-Tests for Fixed Effects Load the sample data. load mfr
This simulated data is from a manufacturing company that operates 50 factories across the world, with each factory running a batch process to create a finished product. The company wants to decrease the number of defects in each batch, so it developed a new manufacturing process. To test the effectiveness of the new process, the company selected 20 of its factories at random to participate in an experiment: Ten factories implemented the new process, while the other ten continued to run the old process. In each of the 20 factories, the company ran five batches (for a total of 100 batches) and recorded the following data: • Flag to indicate whether the batch used the new process (newprocess) • Processing time for each batch, in hours (time) • Temperature of the batch, in degrees Celsius (temp) • Categorical variable indicating the supplier (A, B, or C) of the chemical used in the batch (supplier) 35-24
• Number of defects in the batch (defects)
The data also includes time_dev and temp_dev, which represent the absolute deviation of time and temperature, respectively, from the process standard of 3 hours at 20 degrees Celsius.
Fit a generalized linear mixed-effects model using newprocess, time_dev, temp_dev, and supplier as fixed-effects predictors. Include a random-effects term for intercept grouped by factory, to account for quality differences that might exist due to factory-specific variations. The response variable defects has a Poisson distribution, and the appropriate link function for this model is log. Use the Laplace fit method to estimate the coefficients. Specify the dummy variable encoding as 'effects', so the dummy variable coefficients sum to 0.
The number of defects can be modeled using a Poisson distribution
defectsij ~ Poisson(μij)
This corresponds to the generalized linear mixed-effects model
log μij = β0 + β1·newprocessij + β2·time_devij + β3·temp_devij + β4·supplier_Cij + β5·supplier_Bij + bi,
where
• defectsij is the number of defects observed in the batch produced by factory i during batch j.
• μij is the mean number of defects corresponding to factory i (where i = 1, 2, ..., 20) during batch j (where j = 1, 2, ..., 5).
• newprocessij, time_devij, and temp_devij are the measurements for each variable that correspond to factory i during batch j. For example, newprocessij indicates whether the batch produced by factory i during batch j used the new process.
• supplier_Cij and supplier_Bij are dummy variables that use effects (sum-to-zero) coding to indicate whether company C or B, respectively, supplied the process chemicals for the batch produced by factory i during batch j.
• bi ~ N(0, σb²) is a random-effects intercept for each factory i that accounts for factory-specific variation in quality.
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)',...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects')
glme = Generalized linear mixed-effects model fit by ML Model information: Number of observations Fixed effects coefficients Random effects coefficients Covariance parameters Distribution Link FitMethod
100 6 20 1 Poisson Log Laplace
Formula: defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1 | factory) Model fit statistics:
35-25
35
Functions
AIC 416.35
BIC 434.58
LogLikelihood -201.17
Fixed effects coefficients Name {'(Intercept)'} {'newprocess' } {'time_dev' } {'temp_dev' } {'supplier_C' } {'supplier_B' }
(95% CIs): Estimate 1.4689 -0.36766 -0.094521 -0.28317 -0.071868 0.071072
Deviance 402.35 SE 0.15988 0.17755 0.82849 0.9617 0.078024 0.07739
Random effects covariance parameters: Group: factory (20 Levels) Name1 Name2 {'(Intercept)'} {'(Intercept)'} Group: Error Name {'sqrt(Dispersion)'}
tStat 9.1875 -2.0708 -0.11409 -0.29444 -0.9211 0.91836
DF 94 94 94 94 94 94
Type {'std'}
pValue 9.8194e-15 0.041122 0.90941 0.76907 0.35936 0.36078
Estimate 0.31381
Estimate 1
Perform an F-test to determine if all fixed-effects coefficients are equal to 0. stats = anova(glme) stats = ANOVA MARGINAL TESTS: DFMETHOD = 'RESIDUAL' Term {'(Intercept)'} {'newprocess' } {'time_dev' } {'temp_dev' } {'supplier' }
FStat 84.41 4.2881 0.013016 0.086696 0.59212
DF1 1 1 1 1 2
DF2 94 94 94 94 94
pValue 9.8194e-15 0.041122 0.90941 0.76907 0.5552
The p-values for the intercept, newprocess, time_dev, and temp_dev are the same as in the coefficient table of the glme display. The small p-values for the intercept and newprocess indicate that these are significant predictors at the 5% significance level. The large p-values for time_dev and temp_dev indicate that these are not significant predictors at this level. The p-value of 0.5552 for supplier measures the combined significance for both coefficients representing the categorical variable supplier. This includes the dummy variables supplier_C and supplier_B as shown in the coefficient table of the glme display. The large p-value indicates that supplier is not a significant predictor at the 5% significance level.
Tips • For each fixed-effects term, anova performs an F-test (marginal test) to determine if all coefficients representing the fixed-effects term are equal to 0. When fitting a generalized linear mixed-effects (GLME) model using fitglme and one of the maximum likelihood fit methods ('Laplace' or 'ApproximateLaplace'):
35-26
• If you specify the 'CovarianceMethod' name-value pair argument as 'conditional', then the F-tests are conditional on the estimated covariance parameters.
• If you specify the 'CovarianceMethod' name-value pair as 'JointHessian', then the F-tests account for the uncertainty in estimation of covariance parameters.
When fitting a GLME model using fitglme and one of the pseudo likelihood fit methods ('MPL' or 'REMPL'), anova uses the fitted linear mixed effects model from the final pseudo likelihood iteration for inference on fixed effects.
See Also GeneralizedLinearMixedModel | fitglme | coefTest | coefCI | fixedEffects
addTerms Add terms to generalized linear regression model
Syntax NewMdl = addTerms(mdl,terms)
Description NewMdl = addTerms(mdl,terms) returns a generalized linear regression model fitted using the input data and settings in mdl with the terms terms added.
Examples
Add Terms to Generalized Linear Regression Model
Create a generalized linear regression model using one predictor, and then add another predictor.
Generate sample data using Poisson random numbers with two underlying predictors X(:,1) and X(:,2).
rng('default') % For reproducibility
rndvars = randn(100,2);
X = [2 + rndvars(:,1),rndvars(:,2)];
mu = exp(1 + X*[1;2]);
y = poissrnd(mu);
Create a generalized linear regression model of Poisson data. Include only the first predictor in the model.
mdl = fitglm(X,y,'y ~ x1','Distribution','poisson')
mdl = 
Generalized linear regression model:
    log(y) ~ 1 + x1
    Distribution = Poisson

Estimated Coefficients:
                   Estimate       SE          tStat     pValue
    (Intercept)      2.7784     0.014043      197.85      0
    x1               1.1732    0.0033653       348.6      0
100 observations, 98 error degrees of freedom Dispersion: 1 Chi^2-statistic vs. constant model: 1.25e+05, p-value = 0
Add the second predictor to the model. mdl1 = addTerms(mdl,'x2')
mdl1 = 
Generalized linear regression model:
    log(y) ~ 1 + x1 + x2
    Distribution = Poisson

Estimated Coefficients:
                   Estimate       SE          tStat     pValue
    (Intercept)      1.0405     0.022122      47.034      0
    x1               0.9968     0.003362      296.49      0
    x2                1.987    0.0063433      313.24      0
100 observations, 97 error degrees of freedom Dispersion: 1 Chi^2-statistic vs. constant model: 2.95e+05, p-value = 0
Input Arguments mdl — Generalized linear regression model GeneralizedLinearModel object Generalized linear regression model, specified as a GeneralizedLinearModel object created using fitglm or stepwiseglm. terms — Terms to add to regression model character vector or string scalar formula in Wilkinson notation | t-by-p terms matrix Terms to add to the regression model mdl, specified as one of the following: • Character vector or string scalar formula in “Wilkinson Notation” on page 35-30 representing one or more terms. The variable names in the formula must be valid MATLAB identifiers. • Terms matrix T of size t-by-p, where t is the number of terms and p is the number of predictor variables in mdl. The value of T(i,j) is the exponent of variable j in term i. For example, suppose mdl has three variables A, B, and C in that order. Each row of T represents one term: • [0 0 0] — Constant term or intercept • [0 1 0] — B; equivalently, A^0 * B^1 * C^0 • [1 0 1] — A*C • [2 0 0] — A^2 • [0 1 2] — B*(C^2) addTerms treats a group of indicator variables for a categorical predictor as a single variable. Therefore, you cannot specify an indicator variable to add to the model. If you specify a categorical predictor to add to the model, addTerms adds a group of indicator variables for the predictor in one step.
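As a sketch of the terms-matrix form, the following call is intended to be equivalent to addTerms(mdl,'x2') in the example above: the model has two predictor variables, so T has two columns, and the single row puts an exponent of 1 on the second predictor (check the column-to-predictor mapping for your own model).
T = [0 1];              % one term: x2
mdl2 = addTerms(mdl,T);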
Output Arguments NewMdl — Generalized linear regression model with additional terms GeneralizedLinearModel object Generalized linear regression model with additional terms, returned as a GeneralizedLinearModel object. NewMdl is a newly fitted model that uses the input data and settings in mdl with additional terms specified in terms. To overwrite the input argument mdl, assign the newly fitted model to mdl: mdl = addTerms(mdl,terms);
More About
Wilkinson Notation
Wilkinson notation describes the terms present in a model. The notation relates to the terms present in a model, not to the multipliers (coefficients) of those terms.
Wilkinson notation uses these symbols:
• + means include the next variable.
• – means do not include the next variable.
• : defines an interaction, which is a product of terms.
• * defines an interaction and all lower-order terms.
• ^ raises the predictor to a power, exactly as in * repeated, so ^ includes lower-order terms as well.
• () groups terms.
This table shows typical examples of Wilkinson notation.
Wilkinson Notation                       Terms in Standard Notation
1                                        Constant (intercept) term
x1^k, where k is a positive integer      x1, x1^2, ..., x1^k
x1 + x2                                  x1, x2
x1*x2                                    x1, x2, x1*x2
x1:x2                                    x1*x2 only
–x2                                      Do not include x2
x1*x2 + x3                               x1, x2, x3, x1*x2
x1 + x2 + x3 + x1:x2                     x1, x2, x3, x1*x2
x1*x2*x3 – x1:x2:x3                      x1, x2, x3, x1*x2, x1*x3, x2*x3
x1*(x2 + x3)                             x1, x2, x3, x1*x2, x1*x3
For more details, see “Wilkinson Notation” on page 11-93.
Algorithms • addTerms treats a categorical predictor as follows: • A model with a categorical predictor that has L levels (categories) includes L – 1 indicator variables. The model uses the first category as a reference level, so it does not include the indicator variable for the reference level. If the data type of the categorical predictor is categorical, then you can check the order of categories by using categories and reorder the categories by using reordercats to customize the reference level. For more details about creating indicator variables, see “Automatic Creation of Dummy Variables” on page 2-14. • addTerms treats the group of L – 1 indicator variables as a single variable. If you want to treat the indicator variables as distinct predictor variables, create indicator variables manually by using dummyvar. Then use the indicator variables, except the one corresponding to the reference level of the categorical variable, when you fit a model. For the categorical predictor X, if you specify all columns of dummyvar(X) and an intercept term as predictors, then the design matrix becomes rank deficient. • Interaction terms between a continuous predictor and a categorical predictor with L levels consist of the element-wise product of the L – 1 indicator variables with the continuous predictor. • Interaction terms between two categorical predictors with L and M levels consist of the (L – 1)*(M – 1) indicator variables to include all possible combinations of the two categorical predictor levels. • You cannot specify higher-order terms for a categorical predictor because the square of an indicator is equal to itself.
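A sketch of the manual indicator-variable approach mentioned above, using a small made-up categorical vector g with three levels:
g = categorical({'A';'B';'C';'B';'A'});
D = dummyvar(g);          % one indicator column per level
Dnoref = D(:,2:end);      % drop the reference level's column before fitting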
Alternative Functionality • Use stepwiseglm to specify terms in a starting model and continue improving the model until no single step of adding or removing a term is beneficial. • Use removeTerms to remove specific terms from a model. • Use step to optimally improve a model by adding or removing terms.
Version History Introduced in R2012a
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also GeneralizedLinearModel | removeTerms | step | stepwiseglm
Topics “Generalized Linear Models” on page 12-9
addTerms Add terms to linear regression model
Syntax NewMdl = addTerms(mdl,terms)
Description NewMdl = addTerms(mdl,terms) returns a linear regression model fitted using the input data and settings in mdl with the terms terms added.
Examples

Add Terms to Linear Regression Model

Create a linear regression model of the carsmall data set without any interactions, and then add an interaction term.

Load the carsmall data set and create a model of the MPG as a function of weight and model year.

load carsmall
tbl = table(MPG,Weight);
tbl.Year = categorical(Model_Year);
mdl = fitlm(tbl,'MPG ~ Year + Weight^2')

mdl = Linear regression model:
    MPG ~ 1 + Weight + Year + Weight^2

Estimated Coefficients:
                   Estimate      SE            tStat      pValue
    (Intercept)    54.206        4.7117        11.505     2.6648e-19
    Weight         -0.016404     0.0031249     -5.2493    1.0283e-06
    Year_76        2.0887        0.71491       2.9215     0.0044137
    Year_82        8.1864        0.81531       10.041     2.6364e-16
    Weight^2       1.5573e-06    4.9454e-07    3.149      0.0022303

Number of observations: 94, Error degrees of freedom: 89
Root Mean Squared Error: 2.78
R-squared: 0.885, Adjusted R-Squared: 0.88
F-statistic vs. constant model: 172, p-value = 5.52e-41
The model includes five terms, Intercept, Weight, Year_76, Year_82, and Weight^2, where Year_76 and Year_82 are indicator variables for the categorical variable Year that has three distinct values. Add an interaction term between the Year and Weight variables to mdl.
terms = 'Year*Weight';
NewMdl = addTerms(mdl,terms)

NewMdl = Linear regression model:
    MPG ~ 1 + Weight*Year + Weight^2

Estimated Coefficients:
                       Estimate       SE            tStat       pValue
    (Intercept)        48.045         6.779         7.0874      3.3967e-10
    Weight             -0.012624      0.0041455     -3.0454     0.0030751
    Year_76            2.7768         3.0538        0.90931     0.3657
    Year_82            16.416         4.9802        3.2962      0.0014196
    Weight:Year_76     -0.00020693    0.00092403    -0.22394    0.82333
    Weight:Year_82     -0.0032574     0.0018919     -1.7217     0.088673
    Weight^2           1.0121e-06     6.12e-07      1.6538      0.10177

Number of observations: 94, Error degrees of freedom: 87
Root Mean Squared Error: 2.76
R-squared: 0.89, Adjusted R-Squared: 0.882
F-statistic vs. constant model: 117, p-value = 1.88e-39
NewMdl includes two additional terms, Weight*Year_76 and Weight*Year_82.
Input Arguments

mdl — Linear regression model
LinearModel object

Linear regression model, specified as a LinearModel object created using fitlm or stepwiselm.

terms — Terms to add to regression model
character vector or string scalar formula in Wilkinson notation | t-by-p terms matrix

Terms to add to the regression model mdl, specified as one of the following:
• Character vector or string scalar formula in “Wilkinson Notation” on page 35-35 representing one or more terms. The variable names in the formula must be valid MATLAB identifiers.
• Terms matrix T of size t-by-p, where t is the number of terms and p is the number of predictor variables in mdl. The value of T(i,j) is the exponent of variable j in term i. For example, suppose mdl has three variables A, B, and C in that order. Each row of T represents one term:
  • [0 0 0] — Constant term or intercept
  • [0 1 0] — B; equivalently, A^0 * B^1 * C^0
  • [1 0 1] — A*C
  • [2 0 0] — A^2
  • [0 1 2] — B*(C^2)
addTerms treats a group of indicator variables for a categorical predictor as a single variable. Therefore, you cannot specify an indicator variable to add to the model. If you specify a categorical predictor to add to the model, addTerms adds a group of indicator variables for the predictor in one step. See “Modify Linear Regression Model Using step” on page 35-7841 for an example that describes how to create indicator variables manually and treat each one as a separate variable.
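For example, starting from a model of the carsmall data that contains only Weight, adding the categorical variable Year by name brings in its whole group of indicator variables in one step (a sketch that assumes the same carsmall variables as in the example above):

load carsmall
tbl = table(MPG,Weight);
tbl.Year = categorical(Model_Year);

mdl0 = fitlm(tbl,'MPG ~ Weight');   % Year is not yet in the model

% Adding 'Year' adds Year_76 and Year_82 together as one group;
% you cannot request only one of the indicator variables.
mdl1 = addTerms(mdl0,'Year');
mdl1.Formula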
Output Arguments NewMdl — Linear regression model with additional terms LinearModel object Linear regression model with additional terms, returned as a LinearModel object. NewMdl is a newly fitted model that uses the input data and settings in mdl with additional terms specified in terms. To overwrite the input argument mdl, assign the newly fitted model to mdl: mdl = addTerms(mdl,terms);
More About

Wilkinson Notation

Wilkinson notation describes the terms present in a model. The notation relates to the terms present in a model, not to the multipliers (coefficients) of those terms.

Wilkinson notation uses these symbols:
• + means include the next variable.
• – means do not include the next variable.
• : defines an interaction, which is a product of terms.
• * defines an interaction and all lower-order terms.
• ^ raises the predictor to a power, exactly as in * repeated, so ^ includes lower-order terms as well.
• () groups terms.

This table shows typical examples of Wilkinson notation.

Wilkinson Notation                     Terms in Standard Notation
1                                      Constant (intercept) term
x1^k, where k is a positive integer    x1, x1^2, ..., x1^k
x1 + x2                                x1, x2
x1*x2                                  x1, x2, x1*x2
x1:x2                                  x1*x2 only
–x2                                    Do not include x2
x1*x2 + x3                             x1, x2, x3, x1*x2
x1 + x2 + x3 + x1:x2                   x1, x2, x3, x1*x2
x1*x2*x3 – x1:x2:x3                    x1, x2, x3, x1*x2, x1*x3, x2*x3
x1*(x2 + x3)                           x1, x2, x3, x1*x2, x1*x3
For more details, see “Wilkinson Notation” on page 11-93.
Algorithms
• addTerms treats a categorical predictor as follows:
  • A model with a categorical predictor that has L levels (categories) includes L – 1 indicator variables. The model uses the first category as a reference level, so it does not include the indicator variable for the reference level. If the data type of the categorical predictor is categorical, then you can check the order of categories by using categories and reorder the categories by using reordercats to customize the reference level. For more details about creating indicator variables, see “Automatic Creation of Dummy Variables” on page 2-14.
  • addTerms treats the group of L – 1 indicator variables as a single variable. If you want to treat the indicator variables as distinct predictor variables, create indicator variables manually by using dummyvar, as sketched below. Then use the indicator variables, except the one corresponding to the reference level of the categorical variable, when you fit a model. For the categorical predictor X, if you specify all columns of dummyvar(X) and an intercept term as predictors, then the design matrix becomes rank deficient.
  • Interaction terms between a continuous predictor and a categorical predictor with L levels consist of the element-wise product of the L – 1 indicator variables with the continuous predictor.
  • Interaction terms between two categorical predictors with L and M levels consist of the (L – 1)*(M – 1) indicator variables to include all possible combinations of the two categorical predictor levels.
  • You cannot specify higher-order terms for a categorical predictor because the square of an indicator is equal to itself.
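As a sketch of the manual alternative mentioned above, the following code (using the carsmall data from the earlier example) creates the indicator variables with dummyvar, drops the reference-level column, and fits a model in which each indicator is a separate predictor:

% Create indicator variables for Model_Year manually
load carsmall
Year = categorical(Model_Year);          % categories are 70, 76, 82
dv = dummyvar(Year);                     % one column per category
tbl = table(MPG,Weight,dv(:,2),dv(:,3), ...
    'VariableNames',{'MPG','Weight','Year_76','Year_82'});

% Omit the reference-level column (70) so the design matrix stays full rank
mdl = fitlm(tbl,'MPG ~ Weight + Year_76 + Year_82');

% Because the indicators are now distinct variables, a single interaction
% such as Weight:Year_82 can be added on its own.
mdl2 = addTerms(mdl,'Weight:Year_82');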
Alternative Functionality • Use stepwiselm to specify terms in a starting model and continue improving the model until no single step of adding or removing a term is beneficial. • Use removeTerms to remove specific terms from a model. • Use step to optimally improve a model by adding or removing terms.
Version History Introduced in R2012a
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also LinearModel | removeTerms | step | stepwiselm Topics “Linear Regression Workflow” on page 11-35 “Interpret Linear Regression Results” on page 11-52 “Linear Regression” on page 11-9 “Stepwise Regression” on page 11-101
adtest Anderson-Darling test
Syntax h = adtest(x) h = adtest(x,Name,Value) [h,p] = adtest( ___ ) [h,p,adstat,cv] = adtest( ___ )
Description h = adtest(x) returns a test decision for the null hypothesis that the data in vector x is from a population with a normal distribution, using the Anderson-Darling test on page 35-42. The alternative hypothesis is that x is not from a population with a normal distribution. The result h is 1 if the test rejects the null hypothesis at the 5% significance level, or 0 otherwise. h = adtest(x,Name,Value) returns a test decision for the Anderson-Darling test with additional options specified by one or more name-value pair arguments. For example, you can specify a null distribution other than normal, or select an alternative method for calculating the p-value. [h,p] = adtest( ___ ) also returns the p-value, p, of the Anderson-Darling test, using any of the input arguments from the previous syntaxes. [h,p,adstat,cv] = adtest( ___ ) also returns the test statistic, adstat, and the critical value, cv, for the Anderson-Darling test.
Examples Anderson-Darling Test for a Normal Distribution Load the sample data. Create a vector containing the first column of the students' exam grades data. load examgrades x = grades(:,1);
Test the null hypothesis that the exam grades come from a normal distribution. You do not need to specify values for the population parameters. [h,p,adstat,cv] = adtest(x) h = logical 0 p = 0.1854 adstat = 0.5194 cv = 0.7470
The returned value of h = 0 indicates that adtest fails to reject the null hypothesis at the default 5% significance level.
Anderson-Darling Test for Extreme Value Distribution Load the sample data. Create a vector containing the first column of the students' exam grades data. load examgrades x = grades(:,1);
Test the null hypothesis that the exam grades come from an extreme value distribution. You do not need to specify values for the population parameters. [h,p] = adtest(x,'Distribution','ev') h = logical 0 p = 0.0714
The returned value of h = 0 indicates that adtest fails to reject the null hypothesis at the default 5% significance level.
Anderson-Darling Test Using Specified Probability Distribution Load the sample data. Create a vector containing the first column of the students' exam grades data. load examgrades x = grades(:,1);
Create a normal probability distribution object with mean mu = 75 and standard deviation sigma = 10. dist = makedist('normal','mu',75,'sigma',10) dist = NormalDistribution Normal distribution mu = 75 sigma = 10
Test the null hypothesis that x comes from the hypothesized normal distribution. [h,p] = adtest(x,'Distribution',dist) h = logical 0 p = 0.4687
The returned value of h = 0 indicates that adtest fails to reject the null hypothesis at the default 5% significance level.
Input Arguments x — Sample data vector Sample data, specified as a vector. Missing observations in x, indicated by NaN, are ignored. Data Types: single | double Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: 'Alpha',0.01,'MCTol',0.01 conducts the hypothesis test at the 1% significance level, and determines the p-value, p, using a Monte Carlo simulation with a maximum Monte Carlo standard error for p of 0.01. Distribution — Hypothesized distribution 'norm' (default) | 'exp' | 'ev' | 'logn' | 'weibull' | probability distribution object Hypothesized distribution of data vector x, specified as the comma-separated pair consisting of 'Distribution' and one of the following. 'norm'
Normal distribution
'exp'
Exponential distribution
'ev'
Extreme value distribution
'logn'
Lognormal distribution
'weibull'
Weibull distribution
In this case, you do not need to specify population parameters. Instead, adtest estimates the distribution parameters from the sample data and tests x against a composite hypothesis that it comes from the selected distribution family with parameters unspecified.

Alternatively, you can specify any continuous probability distribution object for the null distribution. In this case, you must specify all the distribution parameters, and adtest tests x against a simple hypothesis that it comes from the given distribution with its specified parameters.

Example: 'Distribution','exp'

Alpha — Significance level
0.05 (default) | scalar value in the range (0,1)

Significance level of the hypothesis test, specified as the comma-separated pair consisting of 'Alpha' and a scalar value in the range (0,1).

Example: 'Alpha',0.01
Data Types: single | double

MCTol — Maximum Monte Carlo standard error
positive scalar value

Maximum Monte Carlo standard error on page 35-43 for the p-value, p, specified as the comma-separated pair consisting of 'MCTol' and a positive scalar value. If you use MCTol, adtest determines p using a Monte Carlo simulation, and the name-value pair argument Asymptotic must have the value false.

Example: 'MCTol',0.01
Data Types: single | double

Asymptotic — Method for calculating p-value
false (default) | true

Method for calculating the p-value of the Anderson-Darling test, specified as the comma-separated pair consisting of 'Asymptotic' and either true or false. If you specify true, adtest estimates the p-value using the limiting distribution of the Anderson-Darling test statistic. If you specify false, adtest calculates the p-value based on an analytical formula. For sample sizes greater than 120, the limiting distribution estimate is likely to be more accurate than the small sample size approximation method.
• If you specify a distribution family with unknown parameters for the Distribution name-value pair, Asymptotic must be false.
• If you use MCTol to calculate the p-value using a Monte Carlo simulation, Asymptotic must be false.

Example: 'Asymptotic',true
Data Types: logical
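For example, this sketch (reusing the exam grades data from the earlier examples) requests a Monte Carlo p-value for the extreme value null distribution with a maximum Monte Carlo standard error of 0.01:

load examgrades
x = grades(:,1);

% Monte Carlo estimate of p; Asymptotic keeps its default value of false
[h,p] = adtest(x,'Distribution','ev','MCTol',0.01)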
Output Arguments

h — Hypothesis test result
1 | 0

Hypothesis test result, returned as a logical value.
• If h = 1, this indicates the rejection of the null hypothesis at the Alpha significance level.
• If h = 0, this indicates a failure to reject the null hypothesis at the Alpha significance level.

p — p-value
scalar value in the range [0,1]

p-value of the Anderson-Darling test, returned as a scalar value in the range [0,1]. p is the probability of observing a test statistic as extreme as, or more extreme than, the observed value under the null hypothesis. p is calculated using one of these methods:
• If the hypothesized distribution is a fully specified probability distribution object, adtest calculates p analytically. If 'Asymptotic' is true, adtest uses the asymptotic distribution of the test statistic. If you specify a value for 'MCTol', adtest uses a Monte Carlo simulation.
• If the hypothesized distribution is specified as a distribution family with unknown parameters, adtest retrieves the critical value from a table and uses inverse interpolation to determine the p-value. If you specify a value for 'MCTol', adtest uses a Monte Carlo simulation.

adstat — Test statistic
scalar value

Test statistic for the Anderson-Darling test, returned as a scalar value.
• If the hypothesized distribution is a fully specified probability distribution object, adtest computes adstat using specified parameters.
• If the hypothesized distribution is specified as a distribution family with unknown parameters, adtest computes adstat using parameters estimated from the sample data.

cv — Critical value
scalar value

Critical value for the Anderson-Darling test at the significance level Alpha, returned as a scalar value. adtest determines cv by interpolating into a table based on the specified Alpha significance level.
More About

Anderson-Darling Test

The Anderson-Darling test is commonly used to test whether a data sample comes from a normal distribution. However, it can be used to test for another hypothesized distribution, even if you do not fully specify the distribution parameters. Instead, the test estimates any unknown parameters from the data sample.

The test statistic belongs to the family of quadratic empirical distribution function statistics, which measure the distance between the hypothesized distribution, F(x), and the empirical cdf, Fn(x), as

n ∫_{−∞}^{∞} [Fn(x) − F(x)]² w(x) dF(x),

over the ordered sample values x1 < x2 < ... < xn, where w(x) is a weight function and n is the number of data points in the sample.

The weight function for the Anderson-Darling test is

w(x) = [F(x)(1 − F(x))]^{−1},

which places greater weight on the observations in the tails of the distribution, thus making the test more sensitive to outliers and better at detecting departure from normality in the tails of the distribution.

The Anderson-Darling test statistic is

An² = −n − ∑_{i=1}^{n} [(2i − 1)/n] [ln F(Xi) + ln(1 − F(X_{n+1−i}))],

where X1 < ... < Xn are the ordered sample data points and n is the number of data points in the sample.
In adtest, the decision to reject or not reject the null hypothesis is based on comparing the p-value for the hypothesis test with the specified significance level, not on comparing the test statistic with the critical value.

Monte Carlo Standard Error

The Monte Carlo standard error is the error due to simulating the p-value. The Monte Carlo standard error is calculated as

SE = √( p(1 − p) / mcreps ),

where p is the estimated p-value of the hypothesis test, and mcreps is the number of Monte Carlo replications performed. adtest chooses the number of Monte Carlo replications, mcreps, large enough to make the Monte Carlo standard error for p less than the value specified for MCTol.
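As a worked instance of this formula (with illustrative numbers, not output from adtest), an estimated p-value of 0.07 based on 10,000 Monte Carlo replications has a standard error of roughly 0.0026:

p = 0.07;                    % estimated p-value
mcreps = 10000;              % number of Monte Carlo replications
SE = sqrt(p*(1 - p)/mcreps)  % approximately 2.6e-3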
Version History Introduced in R2013a
See Also kstest | jbtest
andrewsplot Andrews plot
Syntax andrewsplot(X) andrewsplot(X,...,'Standardize',standopt) andrewsplot(X,...,'Quantile',alpha) andrewsplot(X,...,'Group',group) andrewsplot(X,...,'PropName',PropVal,...) andrewsplot(ax,X,...) h = andrewsplot(X,...)
Description

andrewsplot(X) creates an Andrews plot of the multivariate data in the matrix X. The rows of X correspond to observations, the columns to variables. Andrews plots represent each observation by a function f(t) of a continuous dummy variable t over the interval [0,1]. f(t) is defined for the ith observation in X as

f(t) = X(i,1)/√2 + X(i,2)sin(2πt) + X(i,3)cos(2πt) + …

andrewsplot treats NaN values in X as missing values and ignores the corresponding rows.

andrewsplot(X,...,'Standardize',standopt) creates an Andrews plot where standopt is one of the following:
• 'on' — scales each column of X to have mean 0 and standard deviation 1 before making the plot.
• 'PCA' — creates an Andrews plot from the principal component scores of X, in order of decreasing eigenvalue. (See pca.)
• 'PCAStd' — creates an Andrews plot using the standardized principal component scores. (See pca.)

andrewsplot(X,...,'Quantile',alpha) plots only the median and the alpha and (1 – alpha) quantiles of f(t) at each value of t. This is useful if X contains many observations.

andrewsplot(X,...,'Group',group) plots the data in different groups with different colors. Groups are defined by group, a numeric array containing a group index for each observation. group can also be a categorical array, character matrix, string array, or cell array of character vectors containing a group name for each observation.

andrewsplot(X,...,'PropName',PropVal,...) sets optional Line object properties to the specified values for all Line objects created by andrewsplot. (See Line Properties.)

andrewsplot(ax,X,...) uses the plot axes specified in ax, an Axes object. (See axes.) Specify ax as the first input argument followed by any of the input argument combinations in the previous syntaxes.

h = andrewsplot(X,...) returns a column vector of handles to the Line objects created by andrewsplot, one handle per row of X. If you use the 'Quantile' input parameter, h contains one
handle for each of the three Line objects created. If you use both the 'Quantile' and the 'Group' input parameters, h contains three handles for each group.
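The following sketch evaluates the Andrews function for the first observation of a small random matrix, following the series definition above (the data are simulated and the series is truncated after the fourth variable):

rng default
X = randn(5,4);                          % 5 observations, 4 variables
t = linspace(0,1,200);

% f(t) for observation 1: X(1,1)/sqrt(2) + X(1,2)sin(2*pi*t) + ...
f = X(1,1)/sqrt(2) + X(1,2)*sin(2*pi*t) + X(1,3)*cos(2*pi*t) ...
    + X(1,4)*sin(4*pi*t);
plot(t,f)
xlabel('t')
ylabel('f(t)')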
Examples Create Andrews Plot to Visualize Grouped Data This example shows how to create an Andrews plot to visualize grouped sample data. Load the sample data. load fisheriris
Create an Andrews plot, grouping the sample data by species. andrewsplot(meas,'group',species)
Create a second, simplified Andrews plot that only displays the median and quartiles of each group. andrewsplot(meas,'group',species,'quantile',.25)
Version History Introduced before R2006a
See Also parallelcoords | glyphplot Topics “Grouping Variables” on page 2-11
anova Analysis of variance for linear regression model
Syntax tbl = anova(mdl) tbl = anova(mdl,anovatype) tbl = anova(mdl,'component',sstype)
Description tbl = anova(mdl) returns a table with component ANOVA statistics. tbl = anova(mdl,anovatype) returns ANOVA statistics of the specified type anovatype. For example, specify anovatype as 'component'(default) to return a table with component ANOVA statistics, or specify anovatype as 'summary' to return a table with summary ANOVA statistics. tbl = anova(mdl,'component',sstype) computes component ANOVA statistics using the specified type of sum of squares.
Examples

Component ANOVA Table

Create a component ANOVA table from a linear regression model of the hospital data set.

Load the hospital data set and create a model of blood pressure as a function of age and gender.

load hospital
tbl = table(hospital.Age,hospital.Sex,hospital.BloodPressure(:,2), ...
    'VariableNames',{'Age','Sex','BloodPressure'});
tbl.Sex = categorical(tbl.Sex);
mdl = fitlm(tbl,'BloodPressure ~ Sex + Age^2')

mdl = Linear regression model:
    BloodPressure ~ 1 + Age + Sex + Age^2

Estimated Coefficients:
                   Estimate     SE          tStat       pValue
    (Intercept)    63.942       19.194      3.3314      0.0012275
    Age            0.90673      1.0442      0.86837     0.38736
    Sex_Male       3.0019       1.3765      2.1808      0.031643
    Age^2          -0.011275    0.013853    -0.81389    0.41772

Number of observations: 100, Error degrees of freedom: 96
Root Mean Squared Error: 6.83 R-squared: 0.0577, Adjusted R-Squared: 0.0283 F-statistic vs. constant model: 1.96, p-value = 0.125
Create an ANOVA table of the model.

tbl = anova(mdl)

tbl = 4×5 table
             SumSq     DF    MeanSq    F          pValue
    Age      18.705     1    18.705    0.40055    0.52831
    Sex      222.09     1    222.09    4.7558     0.031643
    Age^2    30.934     1    30.934    0.66242    0.41772
    Error    4483.1    96    46.699
The table displays the following columns for each term except the constant (intercept) term:
• SumSq — Sum of squares explained by the term.
• DF — Degrees of freedom. In this example, DF is 1 for each term in the model and n – p for the error term, where n is the number of observations and p is the number of coefficients (including the intercept) in the model. For example, the DF for the error term in this model is 100 – 4 = 96. If any variable in the model is a categorical variable, the DF for that variable is the number of indicator variables created for its categories (number of categories – 1).
• MeanSq — Mean square, defined by MeanSq = SumSq/DF. For example, the mean square of the error term, mean squared error (MSE), is 4.4831e+03/96 = 46.6991.
• F — F-statistic value to test the null hypothesis that the corresponding coefficient is zero, computed by F = MeanSq/MSE, where MSE is the mean squared error. When the null hypothesis is true, the F-statistic follows the F-distribution. The numerator degrees of freedom is the DF value for the corresponding term, and the denominator degrees of freedom is n – p. In this example, each F-statistic follows an F(1, 96)-distribution.
• pValue — p-value of the F-statistic value. For example, the p-value for Age is 0.5283, implying that Age is not significant at the 5% significance level given the other terms in the model.
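As a quick check of these definitions, the following sketch recomputes the F-statistic and p-value for the Age term directly from the sums of squares shown in the table (the numbers are copied from the table above):

MSE = 4483.1/96;                 % mean squared error from the Error row
F_Age = (18.705/1)/MSE           % MeanSq(Age)/MSE, approximately 0.40
p_Age = 1 - fcdf(F_Age,1,96)     % approximately 0.53, matching pValue for Age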
Summary ANOVA Table

Create a summary ANOVA table from a linear regression model of the hospital data set.

Load the hospital data set and create a model of blood pressure as a function of age and gender.

load hospital
tbl = table(hospital.Age,hospital.Sex,hospital.BloodPressure(:,2), ...
    'VariableNames',{'Age','Sex','BloodPressure'});
tbl.Sex = categorical(tbl.Sex);
mdl = fitlm(tbl,'BloodPressure ~ Sex + Age^2')

mdl = Linear regression model:
    BloodPressure ~ 1 + Age + Sex + Age^2

Estimated Coefficients:
                   Estimate     SE          tStat       pValue
    (Intercept)    63.942       19.194      3.3314      0.0012275
    Age            0.90673      1.0442      0.86837     0.38736
    Sex_Male       3.0019       1.3765      2.1808      0.031643
    Age^2          -0.011275    0.013853    -0.81389    0.41772

Number of observations: 100, Error degrees of freedom: 96
Root Mean Squared Error: 6.83
R-squared: 0.0577, Adjusted R-Squared: 0.0283
F-statistic vs. constant model: 1.96, p-value = 0.125
Create a summary ANOVA table of the model.

tbl = anova(mdl,'summary')

tbl = 7×5 table
                     SumSq     DF    MeanSq     F          pValue
    Total            4757.8    99    48.059
    Model            274.73     3    91.577     1.961      0.12501
    . Linear         243.8      2    121.9      2.6103     0.078726
    . Nonlinear      30.934     1    30.934     0.66242    0.41772
    Residual         4483.1    96    46.699
    . Lack of fit    1483.1    39    38.028     0.72253    0.85732
    . Pure error     3000      57    52.632
The table displays tests for groups of terms: Total, Model, and Residual.
• Total — This row shows the total sum of squares (SumSq), degrees of freedom (DF), and the mean squared error (MeanSq). Note that MeanSq = SumSq/DF.
• Model — This row includes SumSq, DF, MeanSq, F-statistic value (F), and p-value (pValue). Because this model includes a nonlinear term (Age^2), anova partitions the sum of squares (SumSq) of Model into two parts: SumSq explained by the linear terms (Age and Sex) and SumSq explained by the nonlinear term (Age^2). The corresponding F-statistic values are for testing the significance of the linear terms and the nonlinear term as separate groups. The nonlinear group consists of the Age^2 term only, so it has the same p-value as the Age^2 term in the “Component ANOVA Table” on page 35-47.
• Residual — This row includes SumSq, DF, MeanSq, F, and pValue. Because the data set includes replications, anova partitions the residual SumSq into the part for the replications (Pure error) and the rest (Lack of fit). To test the lack of fit, anova computes the F-statistic value by comparing the model residuals to the model-free variance estimate computed on the replications. The F-statistic value shows no evidence of lack of fit.
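The partitions described above can be checked directly from the table values, as in this small sketch (the numbers are copied from the summary table):

SumSqModel    = 243.8 + 30.934   % Linear + Nonlinear, approximately 274.73
SumSqResidual = 1483.1 + 3000    % Lack of fit + Pure error = 4483.1
SumSqTotal    = SumSqModel + SumSqResidual   % approximately 4757.8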
Linear Regression with Categorical Predictor

Fit a linear regression model that contains a categorical predictor. Reorder the categories of the categorical predictor to control the reference level in the model. Then, use anova to test the significance of the categorical variable.
Model with Categorical Predictor

Load the carsmall data set and create a linear regression model of MPG as a function of Model_Year. To treat the numeric vector Model_Year as a categorical variable, identify the predictor using the 'CategoricalVars' name-value pair argument.

load carsmall
mdl = fitlm(Model_Year,MPG,'CategoricalVars',1,'VarNames',{'Model_Year','MPG'})

mdl = Linear regression model:
    MPG ~ 1 + Model_Year

Estimated Coefficients:
                     Estimate    SE        tStat     pValue
    (Intercept)      17.69       1.0328    17.127    3.2371e-30
    Model_Year_76    3.8839      1.4059    2.7625    0.0069402
    Model_Year_82    14.02       1.4369    9.7571    8.2164e-16

Number of observations: 94, Error degrees of freedom: 91
Root Mean Squared Error: 5.56
R-squared: 0.531, Adjusted R-Squared: 0.521
F-statistic vs. constant model: 51.6, p-value = 1.07e-15
The model formula in the display, MPG ~ 1 + Model_Year, corresponds to

MPG = β0 + β1·I(Year = 76) + β2·I(Year = 82) + ϵ,

where I(Year = 76) and I(Year = 82) are indicator variables whose value is one if the value of Model_Year is 76 and 82, respectively. The Model_Year variable includes three distinct values, which you can check by using the unique function.

unique(Model_Year)

ans = 3×1
    70
    76
    82
fitlm chooses the smallest value in Model_Year as a reference level ('70') and creates two indicator variables I(Year = 76) and I(Year = 82). The model includes only two indicator variables because the design matrix becomes rank deficient if the model includes three indicator variables (one for each level) and an intercept term.

Model with Full Indicator Variables

You can interpret the model formula of mdl as a model that has three indicator variables without an intercept term:

y = β0·I(Year = 70) + (β0 + β1)·I(Year = 76) + (β0 + β2)·I(Year = 82) + ϵ.
Alternatively, you can create a model that has three indicator variables without an intercept term by manually creating indicator variables and specifying the model formula.

temp_Year = dummyvar(categorical(Model_Year));
Model_Year_70 = temp_Year(:,1);
Model_Year_76 = temp_Year(:,2);
Model_Year_82 = temp_Year(:,3);
tbl = table(Model_Year_70,Model_Year_76,Model_Year_82,MPG);
mdl = fitlm(tbl,'MPG ~ Model_Year_70 + Model_Year_76 + Model_Year_82 - 1')

mdl = Linear regression model:
    MPG ~ Model_Year_70 + Model_Year_76 + Model_Year_82

Estimated Coefficients:
                     Estimate    SE         tStat     pValue
    Model_Year_70    17.69       1.0328     17.127    3.2371e-30
    Model_Year_76    21.574      0.95387    22.617    4.0156e-39
    Model_Year_82    31.71       0.99896    31.743    5.2234e-51

Number of observations: 94, Error degrees of freedom: 91
Root Mean Squared Error: 5.56
Choose Reference Level in Model You can choose a reference level by modifying the order of categories in a categorical variable. First, create a categorical variable Year. Year = categorical(Model_Year);
Check the order of categories by using the categories function. categories(Year) ans = 3x1 cell {'70'} {'76'} {'82'}
If you use Year as a predictor variable, then fitlm chooses the first category '70' as a reference level. Reorder Year by using the reordercats function. Year_reordered = reordercats(Year,{'76','70','82'}); categories(Year_reordered) ans = 3x1 cell {'76'} {'70'} {'82'}
The first category of Year_reordered is '76'. Create a linear regression model of MPG as a function of Year_reordered. mdl2 = fitlm(Year_reordered,MPG,'VarNames',{'Model_Year','MPG'})
mdl2 = Linear regression model:
    MPG ~ 1 + Model_Year

Estimated Coefficients:
                     Estimate    SE         tStat      pValue
    (Intercept)      21.574      0.95387    22.617     4.0156e-39
    Model_Year_70    -3.8839     1.4059     -2.7625    0.0069402
    Model_Year_82    10.136      1.3812     7.3385     8.7634e-11
Number of observations: 94, Error degrees of freedom: 91 Root Mean Squared Error: 5.56 R-squared: 0.531, Adjusted R-Squared: 0.521 F-statistic vs. constant model: 51.6, p-value = 1.07e-15
mdl2 uses '76' as a reference level and includes two indicator variables I(Year = 70) and I(Year = 82).

Evaluate Categorical Predictor

The model display of mdl2 includes a p-value of each term to test whether or not the corresponding coefficient is equal to zero. Each p-value examines each indicator variable. To examine the categorical variable Model_Year as a group of indicator variables, use anova. Use the 'components' (default) option to return a component ANOVA table that includes ANOVA statistics for each variable in the model except the constant term.

anova(mdl2,'components')

ans = 2×5 table
                  SumSq     DF    MeanSq    F        pValue
    Model_Year    3190.1     2    1595.1    51.56    1.0694e-15
    Error         2815.2    91    30.936
The component ANOVA table includes the p-value of the Model_Year variable, which is smaller than the p-values of the indicator variables.
Input Arguments

mdl — Linear regression model object
LinearModel object | CompactLinearModel object

Linear regression model object, specified as a LinearModel object created by using fitlm or stepwiselm, or a CompactLinearModel object created by using compact.

anovatype — ANOVA type
'component' (default) | 'summary'

ANOVA type, specified as one of these values:
• 'component' — anova returns the table tbl with ANOVA statistics for each variable in the model except the constant term.
• 'summary' — anova returns the table tbl with summary ANOVA statistics for grouped variables and the model as a whole. For details, see the tbl output argument description. sstype — Sum of squares type 'h' (default) | 1 | 2 | 3 Sum of squares type for each term, specified as one of the values in this table. Value
Description
1
Type 1 sum of squares — Reduction in residual sum of squares obtained by adding the term to a fit that already includes the preceding terms
2
Type 2 sum of squares — Reduction in residual sum of squares obtained by adding the term to a model that contains all other terms
3
Type 3 sum of squares — Reduction in residual sum of squares obtained by adding the term to a model that contains all other terms, but with their effects constrained to obey the usual “sigma restrictions” that make models estimable
'h'
Hierarchical model — Similar to Type 2, but uses both continuous and categorical factors to determine the hierarchy of terms
The sum of squares for any term is determined by comparing two models. For a model containing main effects but no interactions, the value of sstype influences the computations on unbalanced data only.

Suppose you are fitting a model with two factors and their interaction, and the terms appear in the order A, B, AB. Let R(·) represent the residual sum of squares for the model. So, R(A, B, AB) is the residual sum of squares fitting the whole model, R(A) is the residual sum of squares fitting the main effect of A only, and R(1) is the residual sum of squares fitting the mean only. The three sum of squares types are as follows:

Term    Type 1 Sum of Squares     Type 2 Sum of Squares     Type 3 Sum of Squares
A       R(1) – R(A)               R(B) – R(A, B)            R(B, AB) – R(A, B, AB)
B       R(A) – R(A, B)            R(A) – R(A, B)            R(A, AB) – R(A, B, AB)
AB      R(A, B) – R(A, B, AB)     R(A, B) – R(A, B, AB)     R(A, B) – R(A, B, AB)
The models for Type 3 sum of squares have sigma restrictions imposed. This means, for example, that in fitting R(B, AB), the array of AB effects is constrained to sum to 0 over A for each value of B, and over B for each value of A.

For Type 3 sum of squares:
• If mdl is a CompactLinearModel object and the regression model is nonhierarchical, anova returns an error.
• If mdl is a LinearModel object and the regression model is nonhierarchical, anova refits the model using effects coding whenever it needs to compute a Type 3 sum of squares.
• If the regression model in mdl is hierarchical, anova computes the results without refitting the model. sstype applies only if anovatype is 'component'.
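For example, this sketch refits the blood pressure model from the earlier examples and requests component tables with Type 2 and Type 3 sums of squares (because this model is hierarchical, anova computes the Type 3 results without refitting):

load hospital
tbl = table(hospital.Age,hospital.Sex,hospital.BloodPressure(:,2), ...
    'VariableNames',{'Age','Sex','BloodPressure'});
tbl.Sex = categorical(tbl.Sex);
mdl = fitlm(tbl,'BloodPressure ~ Sex + Age^2');

tbl2 = anova(mdl,'component',2)   % Type 2 sums of squares
tbl3 = anova(mdl,'component',3)   % Type 3 sums of squares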
Output Arguments tbl — ANOVA summary statistics table table ANOVA summary statistics table, returned as a table. The contents of tbl depend on the ANOVA type specified in anovatype. • If anovatype is 'component', then tbl contains ANOVA statistics for each variable in the model except the constant (intercept) term. The table includes these columns for each variable: Column
Description
SumSq
Sum of squares explained by the term, computed depending on sstype
DF
Degrees of freedom • DF of a numeric variable is 1. • DF of a categorical variable is the number of indicator variables created for the category (number of categories – 1). Note that tbl contains one row for each categorical variable instead of one row for each indicator variable as in the model display. Use anova to test a categorical variable as a group of indicator variables. • DF of an error term is n – p, where n is the number of observations and p is the number of coefficients in the model.
MeanSq
Mean square, defined by MeanSq = SumSq/DF MeanSq for the error term is the mean squared error (MSE).
F
F-statistic value to test the null hypothesis that the corresponding coefficient is zero, computed by F = MeanSq/MSE. When the null hypothesis is true, the F-statistic follows the F-distribution. The numerator degrees of freedom is the DF value for the corresponding term, and the denominator degrees of freedom is n – p.
pValue
p-value of the F-statistic value
For an example, see “Component ANOVA Table” on page 35-47. • If anovatype is 'summary', then tbl contains summary statistics of grouped terms for each row. The table includes the same columns as 'component' and these rows:
Row
Description
Total
Total statistics • SumSq — Total sum of squares, which is the sum of the squared deviations of the response around its mean • DF — Sum of degrees of freedom of Model and Residual
Model
Statistics for the model as a whole • SumSq — Model sum of squares, which is the sum of the squared deviations of the fitted value around the response mean. • F and pValue — These values provide a test of whether the model as a whole fits significantly better than a degenerate model consisting of only a constant term. If mdl includes only linear terms, then anova does not decompose Model into Linear and NonLinear.
Linear
Statistics for linear terms • SumSq — Sum of squares for linear terms, which is the difference between the model sum of squares and the sum of squares for nonlinear terms. • F and pValue — These values provide a test of whether the model with only linear terms fits better than a degenerate model consisting of only a constant term. anova uses the mean squared error that is based on the full model to compute this F-value, so the F-value obtained by dropping the nonlinear terms and repeating the test is not the same as the value in this row.
Nonlinear
Statistics for nonlinear terms • SumSq — Sum of squares for nonlinear (higher-order or interaction) terms, which is the increase in the residual sum of squares obtained by keeping only the linear terms and dropping all nonlinear terms. • F and pValue — These values provide a test of whether the full model fits significantly better than a smaller model consisting of only the linear terms.
Residual
Statistics for residuals • SumSq — Residual sum of squares, which is the sum of the squared residual values • MeanSq — Mean squared error, used to compute the Fstatistic values for Model, Linear, and NonLinear If mdl is a full LinearModel object and the sample data contains replications (multiple observations sharing the same predictor values), then anova decomposes the residual sum of squares into a sum of squares for the replicated observations (Lack of fit) and the remaining sum of squares (Pure error).
Lack of fit
Lack-of-fit statistics • SumSq — Sum of squares due to lack of fit, which is the difference between the residual sum of squares and the replication sum of squares. • F and pValue — The F-statistic value is the ratio of lack-offit MeanSq to pure error MeanSq. The ratio provides a test of bias by measuring whether the variation of the residuals is larger than the variation of the replications. A low pvalue implies that adding additional terms to the model can improve the fit.
Pure error
Statistics for pure error • SumSq — Replication sum of squares, obtained by finding the sets of points with identical predictor values, computing the sum of squared deviations around the mean within each set, and pooling the computed values • MeanSq — Model-free pure error variance estimate of the response
For an example, see “Summary ANOVA Table” on page 35-48.
Alternative Functionality More complete ANOVA statistics are available in the anova1, anova2, and anovan functions.
Version History Introduced in R2012a
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also CompactLinearModel | LinearModel | coefCI | coefTest | dwtest Topics “F-statistic and t-statistic” on page 11-74 “Interpret Linear Regression Results” on page 11-52 “Linear Regression with Categorical Covariates” on page 2-17 “Linear Regression Workflow” on page 11-35 “Linear Regression” on page 11-9
anova Analysis of variance for linear mixed-effects model
Syntax stats = anova(lme) stats = anova(lme,Name,Value)
Description stats = anova(lme) returns the dataset array stats that includes the results of the F-tests for each fixed-effects term in the linear mixed-effects model lme. stats = anova(lme,Name,Value) also returns the dataset array stats with additional options specified by one or more Name,Value pair arguments.
Examples F-Tests for Fixed Effects Load the sample data. load('shift.mat')
The data shows the deviations from the target quality characteristic measured from the products that five operators manufacture during three shifts: morning, evening, and night. This is a randomized block design, where the operators are the blocks. The experiment is designed to study the impact of the time of shift on the performance. The performance measure is the deviation of the quality characteristics from the target value. This is simulated data. Shift and Operator are nominal variables. shift.Shift = nominal(shift.Shift); shift.Operator = nominal(shift.Operator);
Fit a linear mixed-effects model with a random intercept grouped by operator to assess if performance significantly differs according to the time of the shift. Use the restricted maximum likelihood method and 'effects' contrasts. 'effects' contrasts indicate that the coefficients sum to 0, and fitlme creates two contrast-coded variables in the fixed-effects design matrix, where

Shift_Evening = 0 if the shift is Morning, 1 if Evening, and −1 if Night,

and
Shift_Morning = 1 if the shift is Morning, 0 if Evening, and −1 if Night.

The model corresponds to

Morning shift:  QCDev_im = β0 + β2·Shift_Morning_i + b_0m + ε_im,   m = 1, 2, ..., 5,
Evening shift:  QCDev_im = β0 + β1·Shift_Evening_i + b_0m + ε_im,
Night shift:    QCDev_im = β0 − β1·Shift_Evening_i − β2·Shift_Morning_i + b_0m + ε_im,

where b ~ N(0, σb²) and ϵ ~ N(0, σ²).

lme = fitlme(shift,'QCDev ~ Shift + (1|Operator)',...
    'FitMethod','REML','DummyVarCoding','effects')

lme = Linear mixed-effects model fit by REML
Model information: Number of observations Fixed effects coefficients Random effects coefficients Covariance parameters
15 3 5 2
Formula: QCDev ~ 1 + Shift + (1 | Operator) Model fit statistics: AIC BIC 58.913 61.337
LogLikelihood -24.456
Deviance 48.913
Fixed effects coefficients (95% CIs): Name Estimate {'(Intercept)' } 3.6525 {'Shift_Evening'} -0.53293 {'Shift_Morning'} -0.91973
SE 0.94109 0.31206 0.31206
Random effects covariance parameters (95% CIs): Group: Operator (5 Levels) Name1 Name2 {'(Intercept)'} {'(Intercept)'} Group: Error Name {'Res Std'}
Estimate 0.85462
Lower 0.52357
tStat 3.8812 -1.7078 -2.9473
Type {'std'}
DF 12 12 12
pValue 0.0021832 0.11339 0.012206
Estimate 2.0457
Lower 1.6021 -1.2129 -1.5997
Lower 0.98207
Upper 1.395
Uppe 5 0.1 -0.2
Upper 4.2612

Perform an F-test to determine if all fixed-effects coefficients are 0.

anova(lme)

ans =
ANOVA MARGINAL TESTS: DFMETHOD = 'RESIDUAL'

    Term                 FStat     DF1    DF2    pValue
    {'(Intercept)'}      15.063     1     12     0.0021832
    {'Shift'      }      11.091     2     12     0.0018721
The p-value for the constant term, 0.0021832, is the same as in the coefficient table in the lme display. The p-value of 0.0018721 for Shift measures the combined significance for both coefficients representing Shift.
ANOVA for Fixed-Effects in LME Model Load the sample data. load('fertilizer.mat')
The dataset array includes data from a split-plot experiment, where soil is divided into three blocks based on the soil type: sandy, silty, and loamy. Each block is divided into five plots, where five types of tomato plants (cherry, heirloom, grape, vine, and plum) are randomly assigned to these plots. The tomato plants in the plots are then divided into subplots, where each subplot is treated by one of four fertilizers. This is simulated data. Store the data in a dataset array called ds, for practical purposes, and define Tomato, Soil, and Fertilizer as categorical variables. ds = fertilizer; ds.Tomato = nominal(ds.Tomato); ds.Soil = nominal(ds.Soil); ds.Fertilizer = nominal(ds.Fertilizer);
Fit a linear mixed-effects model, where Fertilizer and Tomato are the fixed-effects variables, and the mean yield varies by the block (soil type) and the plots within blocks (tomato types within soil types) independently. Use the 'effects' contrasts when fitting the data for the type III sum of squares. lme = fitlme(ds,'Yield ~ Fertilizer * Tomato + (1|Soil) + (1|Soil:Tomato)',... 'DummyVarCoding','effects') lme = Linear mixed-effects model fit by ML Model information: Number of observations Fixed effects coefficients Random effects coefficients Covariance parameters
60 20 18 3
Formula: Yield ~ 1 + Tomato*Fertilizer + (1 | Soil) + (1 | Soil:Tomato) Model fit statistics: AIC BIC 522.57 570.74
LogLikelihood -238.29
Fixed effects coefficients (95% CIs): Name {'(Intercept)' } {'Tomato_Cherry' }
Deviance 476.57 Estimate 104.6 1.4
SE 3.3008 5.9353
tStat 31.69 0.23588
DF 40 40
pValue 5.9086e-30 0.81473
{'Tomato_Grape' } {'Tomato_Heirloom' } {'Tomato_Plum' } {'Fertilizer_1' } {'Fertilizer_2' } {'Fertilizer_3' } {'Tomato_Cherry:Fertilizer_1' } {'Tomato_Grape:Fertilizer_1' } {'Tomato_Heirloom:Fertilizer_1'} {'Tomato_Plum:Fertilizer_1' } {'Tomato_Cherry:Fertilizer_2' } {'Tomato_Grape:Fertilizer_2' } {'Tomato_Heirloom:Fertilizer_2'} {'Tomato_Plum:Fertilizer_2' } {'Tomato_Cherry:Fertilizer_3' } {'Tomato_Grape:Fertilizer_3' } {'Tomato_Heirloom:Fertilizer_3'} {'Tomato_Plum:Fertilizer_3' }
-7.7667 -11.183 30.233 -28.267 -1.9333 10.733 -0.73333 -7.5667 5.1833 2.7667 7.6 -1.9 5.5167 -3.9 -6.0667 3.7667 3.1833 1.1
5.9353 5.9353 5.9353 2.3475 2.3475 2.3475 4.6951 4.6951 4.6951 4.6951 4.6951 4.6951 4.6951 4.6951 4.6951 4.6951 4.6951 4.6951
-1.3085 -1.8842 5.0938 -12.041 -0.82356 4.5722 -0.15619 -1.6116 1.104 0.58927 1.6187 -0.40468 1.175 -0.83066 -1.2921 0.80226 0.67802 0.23429
40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40
0.19816 0.066821 8.777e-06 7.0265e-15 0.41507 4.577e-05 0.87667 0.11491 0.27619 0.55899 0.11337 0.68787 0.24695 0.4111 0.20373 0.42714 0.50167 0.81596
Random effects covariance parameters (95% CIs): Group: Soil (3 Levels) Name1 Name2 {'(Intercept)'} {'(Intercept)'}
Type {'std'}
Estimate 2.5028
Lower 0.027711
Group: Soil:Tomato (15 Levels) Name1 Name2 {'(Intercept)'} {'(Intercept)'}
Type {'std'}
Estimate 10.225
Lower 6.1497
Group: Error Name {'Res Std'}
Estimate 10.499
Lower 8.5389
Upper 12.908
Perform an analysis of variance to test for the fixed-effects.

anova(lme)

ans =
ANOVA MARGINAL TESTS: DFMETHOD = 'RESIDUAL'

    Term                      FStat     DF1    DF2    pValue
    {'(Intercept)'      }     1004.2     1     40     5.9086e-30
    {'Tomato'           }     7.1663     4     40     0.00018935
    {'Fertilizer'       }     58.833     3     40     1.0024e-14
    {'Tomato:Fertilizer'}     1.4182    12     40     0.19804
The p-value for the constant term, 5.9086e-30, is the same as in the coefficient table in the lme display. The p-values of 0.00018935, 1.0024e-14, and 0.19804 for Tomato, Fertilizer, and Tomato:Fertilizer represent the combined significance for all tomato coefficients, fertilizer coefficients, and coefficients representing the interaction between the tomato and fertilizer, respectively. The p-value of 0.19804 indicates that the interaction between tomato and fertilizer is not significant.
Upper 226.05 Upper 17.001
Satterthwaite Approximation for Degrees of Freedom Load the sample data. load('weight.mat')
weight contains data from a longitudinal study, where 20 subjects are randomly assigned 4 exercise programs, and their weight loss is recorded over six 2-week time periods. This is simulated data. Store the data in a table. Define Subject and Program as categorical variables. tbl = table(InitialWeight,Program,Subject,Week,y); tbl.Subject = nominal(tbl.Subject); tbl.Program = nominal(tbl.Program);
Fit the model using the 'effects' contrasts. lme = fitlme(tbl,'y ~ InitialWeight + Program*Week + (Week|Subject)',... 'DummyVarCoding','effects') lme = Linear mixed-effects model fit by ML Model information: Number of observations Fixed effects coefficients Random effects coefficients Covariance parameters
120 9 40 4
Formula: y ~ 1 + InitialWeight + Program*Week + (1 + Week | Subject) Model fit statistics: AIC BIC -22.981 13.257
LogLikelihood 24.49
Fixed effects coefficients (95% CIs): Name Estimate {'(Intercept)' } 0.77122 {'InitialWeight' } 0.0031879 {'Program_A' } -0.11017 {'Program_B' } 0.25061 {'Program_C' } -0.14344 {'Week' } 0.19881 {'Program_A:Week'} -0.025607 {'Program_B:Week'} 0.013164 {'Program_C:Week'} 0.0049357
Deviance -48.981 SE 0.24309 0.0013814 0.080377 0.08045 0.080475 0.033727 0.058417 0.058417 0.058417
Random effects covariance parameters (95% CIs): Group: Subject (20 Levels) Name1 Name2 {'(Intercept)'} {'(Intercept)'} {'Week' } {'(Intercept)'} {'Week' } {'Week' } Group: Error Name {'Res Std'}
35-62
Estimate 0.10261
Lower 0.087882
tStat 3.1725 2.3078 -1.3707 3.1151 -1.7824 5.8946 -0.43835 0.22535 0.084492
Type {'std' } {'corr'} {'std' } Upper 0.11981
DF 111 111 111 111 111 111 111 111 111
Estimate 0.18407 0.66841 0.15033
pValue 0.0019549 0.022863 0.17323 0.0023402 0.077424 4.1099e-08 0.66198 0.82212 0.93282
Lower 0.12281 0.21076 0.11004
Lower 0.289 0.000450 -0.269 0.0911 -0.30 0.131 -0.141 -0.102 -0.110
Upper 0.27587 0.88573 0.20537
anova
The p-values 0.022863 and 4.1099e-08 indicate significant effects of the initial weights of the subjects and the time factor in the amount of weight lost. The weight loss of subjects who are in program B is significantly different relative to the weight loss of subjects that are in program A. The lower and upper limits of the covariance parameters for the random effects do not include zero, thus they are significant. Perform an F-test that all fixed-effects coefficients are zero. anova(lme) ans = ANOVA MARGINAL TESTS: DFMETHOD = 'RESIDUAL' Term {'(Intercept)' } {'InitialWeight'} {'Program' } {'Week' } {'Program:Week' }
FStat 10.065 5.326 3.6798 34.747 0.066648
DF1 1 1 3 1 3
DF2 111 111 111 111 111
pValue 0.0019549 0.022863 0.014286 4.1099e-08 0.97748
The p-values for the constant term, initial weight, and week are the same as in the coefficient table in the previous lme output display. The p-value of 0.014286 for Program represents the combined significance for all program coefficients. Similarly, the p-value for the interaction between program and week (Program:Week) measures the combined significance for all coefficients representing this interaction. Now, use the Satterthwaite method to compute the degrees of freedom. anova(lme,'DFMethod','satterthwaite') ans = ANOVA MARGINAL TESTS: DFMETHOD = 'SATTERTHWAITE' Term {'(Intercept)' } {'InitialWeight'} {'Program' } {'Week' } {'Program:Week' }
FStat 10.065 5.326 3.6798 34.747 0.066648
DF1 1 1 3 1 3
DF2 20.445 20 19.14 20 20
pValue 0.004695 0.031827 0.030233 9.1346e-06 0.97697
The Satterthwaite method produces smaller denominator degrees of freedom and slightly larger pvalues.
Input Arguments lme — Linear mixed-effects model LinearMixedModel object Linear mixed-effects model, specified as a LinearMixedModel object constructed using fitlme or fitlmematrix.
35-63
35
Functions
Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: stats = anova(lme,'DFMethod','satterthwaite') DFMethod — Method for computing approximate degrees of freedom 'residual' (default) | 'satterthwaite' | 'none' Method for computing approximate degrees of freedom to use in the F-test, specified as the commaseparated pair consisting of 'DFMethod' and one of the following. 'residual'
Default. The degrees of freedom are assumed to be constant and equal to n – p, where n is the number of observations and p is the number of fixed effects.
'satterthwaite'
Satterthwaite approximation.
'none'
All degrees of freedom are set to infinity.
For example, you can specify the Satterthwaite approximation as follows. Example: 'DFMethod','satterthwaite'
Output Arguments stats — Results of F-tests for fixed-effects terms dataset array Results of F-tests for fixed-effects terms, returned as a dataset array with the following columns. Term
Name of the fixed effects term
Fstat
F-statistic for the term
DF1
Numerator degrees of freedom for the F-statistic
DF2
Denominator degrees of freedom for the Fstatistic
pValue
p-value of the test for the term
There is one row for each fixed-effects term. Each term is a continuous variable, a grouping variable, or an interaction between two or more continuous or grouping variables. For each fixed-effects term, anova performs an F-test (marginal test) to determine if all coefficients representing the fixed-effects term are 0. To perform tests for the type III hypothesis, you must use the 'effects' contrasts while fitting the linear mixed-effects model.
Tips • For each fixed-effects term, anova performs an F-test (marginal test), that all coefficients representing the fixed-effects term are 0. To perform tests for type III hypotheses, you must set 35-64
anova
the 'DummyVarCoding' name-value pair argument to 'effects' contrasts while fitting your linear mixed-effects model.
Version History Introduced in R2013b
See Also LinearMixedModel | fitlme | fitlmematrix
35-65
35
Functions
anova Analysis of variance (ANOVA) results
Description An anova object contains the results of a one- on page 9-2, two- on page 9-11, or N-way ANOVA on page 9-28. Use the properties of an anova object to determine if the means in a set of response data differ with respect to the values (levels) of a factor or multiple factors. The object properties include information about the coefficient estimates, ANOVA model fit to the response data, and factors used to perform the analysis.
Creation Syntax aov aov aov aov aov aov
= = = = = =
anova(y) anova(factors,y) anova(tbl,y) anova(tbl,responseVarName) anova(tbl,formula) anova( ___ ,Name=Value)
Description aov = anova(y) performs a one-way ANOVA and returns the anova object aov for the response data in the matrix y. Each column of y is treated as a different factor value. aov = anova(factors,y) performs a one-, two-, or N-way ANOVA and returns an anova object for the response data in the vector y. The argument factors specifies the number of factors and their values. aov = anova(tbl,y) uses the variables in the table tbl as factors for the response data in the vector y. Each table variable corresponds to a factor. aov = anova(tbl,responseVarName) uses the variables in tbl as factors and response data. The responseVarName argument specifies which variable contains the response data. aov = anova(tbl,formula) specifies the ANOVA model in Wilkinson notation on page 11-93. The terms of formula use only the variable names in tbl. aov = anova( ___ ,Name=Value) specifies additional options using one or more name-value arguments. For example, you can specify which factors are categorical or random, and specify the sum of squares type. Input Arguments y — Response data matrix | numeric vector 35-66
anova
Response data, specified as a matrix or a numeric vector. • If y is a matrix, anova treats each column of y as a separate factor value in a one-way ANOVA. In this design, the function evaluates whether the population means of the columns are equal. Use this design when you want to perform a one-way ANOVA on data that is equally divided between each group (balanced ANOVA).
• If y is a numeric vector, you must also specify either the factors or tbl input argument. For a one-way ANOVA, factors is a cell array of character vectors or a vector in which each element represents the factor value of the corresponding element in y.
• For an N-way ANOVA, factors is a cell array of vectors in which each cell is treated as a separate factor. Alternatively, for an N-way ANOVA, you can provide a table tbl in which each variable is treated as a separate factor. Use this design when you want to perform a two- or N-way ANOVA, or when factor values correspond to different numbers of observations in y (unbalanced ANOVA). Note The anova function ignores NaN values, values, empty characters, and empty strings in y. If factors or tbl contains NaN or values, or empty characters or strings, the function ignores the corresponding observations in y. The ANOVA is balanced if each factor value has the same number of observations after the function disregards empty or NaN values. Otherwise, the function performs an unbalanced ANOVA. Data Types: single | double factors — factors and factor values numeric vector | logical vector | categorical vector | string vector | character vector | cell array of vectors Factors and factor values for the ANOVA, specified as a numeric, logical, categorical, string, or character vector, or a cell array of vectors. Factors and factor values are sometimes called grouping variables and group names, respectively. 35-67
For a one-way ANOVA, factors is a vector or cell array of character vectors in which each element represents the factor value of the observation in y at the same position. The anova function groups observations in y by their factor values during the ANOVA. The length of factors must be the same as the length of y.
For a two- or N-way ANOVA, factors is a cell array of vectors in which each cell corresponds to a different factor. Each vector contains the values of the corresponding factor and must have the same length as y. Factor values are associated with observations in y by their index. If factors contains NaN values, anova ignores the corresponding observations in y. For more information on factors, see “Grouping Variables” on page 2-11. Note If factors or tbl contains NaN values, values, empty characters, or empty strings, the anova function ignores the corresponding observations in y. The ANOVA is balanced if each factor value has the same number of observations after the function disregards empty or NaN values. Otherwise, the function performs an unbalanced ANOVA. Example: [1,2,1,3,1,...,3,1] Example: ["white","red","white",...,"black","red"] Example: school=["Springfield","Springfield","Springfield","Arlington","Springfield"," Arlington","Arlington"]; monthnumber=[6,12,1,9,4,6,2]; factors={school,monthnumber}; Data Types: single | double | logical | categorical | char | string | cell tbl — Factors, factor values, and response data table Factors, factor values, and response data, specified as a table. The variables of tbl can contain numeric, logical, categorical, character vector, or string elements, or cell arrays of characters. When you specify tbl, you must also specify the response data y, responseVarName, or formula. • If you specify the response data in y, the table variables represent only the factors for the ANOVA. A factor value in a variable of tbl corresponds to the observation in y at the same position. tbl must have the same number of rows as the length of y. If tbl contains NaN values, then anova ignores the corresponding observations in y. • If you do not specify y, you must indicate which variable in tbl contains the response data by using the responseVarName or formula input argument. You can also choose a subset of factors in tbl to use in the ANOVA by setting the name-value argument FactorNames. The anova function associates the values of the factor variables in tbl with the response data in the same row. 35-68
Note If factors or tbl contains NaN values, values, empty characters, or empty strings, the anova function ignores the corresponding observations in y. The ANOVA is balanced if each factor value has the same number of observations after the function disregards empty or NaN values. Otherwise, the function performs an unbalanced ANOVA. Example: mountain=table(altitude,temperature,soilpH); anova(mountain,"soilpH") Data Types: table responseVarName — Name of response data string scalar | character vector Name of the response data, specified as a string scalar or character vector. responseVarName indicates which variable in tbl contains the response data. When you specify responseVarName, you must also specify the tbl input argument. Example: "r" Data Types: char | string formula — ANOVA model string scalar | character vector ANOVA model, specified as a string scalar or a character vector in Wilkinson notation on page 11-93. anova supports the use of parentheses and commas to specify nested factors in formula. For example, you can specify that factor f1 is nested inside factor f2 by including the term f1(f2) in formula. To specify that f1 is nested inside two factors, f2 and f3, include the term f1(f2,f3). When you specify formula, you must also specify tbl. Example: "r ~ f1 + f2 + f3 + f1:f2:f3" Example: "MPG ~ Origin + Model(Origin)" Data Types: char | string Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Example: anova(factors,y,CategoricalFactors=[1 2],FactorNames=["school" "major" "age"],ResponseName="GPA") specifies the first two factors in factors as categorical, the factor names as "school", "major", and "age", and the name of the response variable as "GPA". CategoricalFactors — Factors to treat as categorical "all" (default) | numeric vector | logical vector | string vector | cell array of character vectors Factors to treat as categorical, specified as a numeric, logical, or string vector, or a cell array of character vectors. When CategoricalFactors is set to the default value "all", the anova function treats all factors as categorical. Specify CategoricalFactors as one of the following: • A numeric vector with indices between 1 and N, where N is the number of factor variables. The anova function treats factors with indices in CategoricalFactors as categorical. The index of a 35-69
factor is the order in which it appears in the columns of matrix y, the cells of factors, or the columns of tbl. • A logical vector of length N, where a true entry means that the corresponding factor is categorical. • A string vector or cell array of factor names. The factor names must match the names in tbl or FactorNames. Example: CategoricalFactors=["Location" "Smoker"] Example: CategoricalFactors=[1 3 4] Data Types: single | double | logical | char | string | cell FactorNames — Factor names string vector | cell array of character vectors Factor names, specified as a string vector or a cell array of character vectors. • If you specify tbl in the call to anova, FactorNames must be a subset of the table variables in tbl. anova uses only the factors specified in FactorNames. In this case, the default value of FactorNames is the collection of names of the factor variables in tbl. • If you specify the matrix y or factors in the call to anova, you can specify any names for FactorNames. In this case, the default value of FactorNames is ["Factor1","Factor2", …,"FactorN"], where N is the number of factors. When you specify formula, anova ignores FactorNames. Example: FactorNames=["time","latitude"] Data Types: char | string | cell ModelSpecification — Type of ANOVA model to fit "linear" (default) | "interactions" | "purequadratic" | "quadratic" | "polyIJK" | "full" | integer | string scalar | character vector | terms matrix Type of ANOVA model to fit, specified as one of the options in the following table or an integer, string scalar, character vector, or terms matrix. The default value for ModelSpecification is "linear".
Option
Terms Included in ANOVA Model
"linear" (default)
Main effect (linear) terms
"interactions"
Main effect and pairwise interaction terms
"purequadratic"
Main effects and squared main effects. All factors must be continuous to use this option. Set CategoricalFactors = [] to specify all factors as continuous.
"quadratic"
Main effects, squared main effects, and pairwise interaction terms. All factors must be continuous to use this option.
Option
Terms Included in ANOVA Model
"polyIJK"
Polynomial terms up to degree I for the first factor, degree J for the second factor, and so on. The degree of an interaction term cannot exceed the maximum exponent of a main term. You must specify a degree for each factor.
"full"
Main effect and all interaction terms
To include all main effects and interaction levels up to the kth level, set ModelSpecification equal to k. When ModelSpecification is an integer, the maximum level of an interaction term in the ANOVA model is the minimum between ModelSpecification and the number of factors. If you specify formula, anova ignores ModelSpecification.
You can also specify the terms of an ANOVA regression model using one of the following:
• Double or single terms matrix, T, with a column for each factor. Each term in the ANOVA model is a product corresponding to a row of T. The row elements are the exponents of their corresponding factors. For example, T(i,:) = [1 2 1] means that term i is (Factor1)(Factor2)²(Factor3). Because the anova function automatically includes a constant term in the ANOVA model, you do not need to include a row of zeros in the terms matrix.
• Character vector or string scalar formula in Wilkinson notation on page 11-93, representing one or more terms. anova supports the use of parentheses and commas to specify nested factors, as described in formula. The formula must use names contained in FactorNames, ResponseName, or table variable names if tbl is specified.
Example: ModelSpecification="poly3212"
Example: ModelSpecification=3
Example: ModelSpecification="r ~ c1*c2"
Example: ModelSpecification=[0 0 0;1 0 0;0 1 0;0 0 1]
Data Types: single | double
RandomFactors — Factors to treat as random
"all" | numeric vector | logical vector | string vector | cell array of character vectors
Factors to treat as random rather than fixed, specified as a numeric, logical, or string vector, or a cell array of character vectors. The anova function treats an interaction term as random if it contains at least one random factor. The default value is [], meaning all factors are fixed. To specify all factors as random, set RandomFactors to "all". Specify RandomFactors as one of the following:
• A numeric vector with indices between 1 and N, where N is the number of factor variables. The anova function treats factors with indices in RandomFactors as random. The index of a factor is the order in which it appears in the columns of matrix y, the cells of factors, or the columns of tbl.
• A logical vector of length N, where a true entry means that the corresponding factor is random.
• A string vector or cell array of factor names. The factor names must match the names in tbl or FactorNames.
Example: RandomFactors=[1]
Example: RandomFactors=[1 0 0]
Data Types: single | double | logical | char | string | cell
ResponseName — Name of response variable
string scalar | character vector
Name of the response variable, specified as a string scalar or a character vector. If you specify responseVarName or formula, anova ignores ResponseName.
Example: ResponseName="soilpH"
Data Types: char | string
SumOfSquaresType — Type of sum of squares
"three" (default) | "two" | "one" | "hierarchical"
Type of sum of squares used to perform the ANOVA, specified as "three", "two", "one", or "hierarchical". For a model containing main effects but no interactions, the value of SumOfSquaresType influences the computations on the unbalanced data only.
The sum of squares of a term (SSTerm) is defined as the reduction in the sum of squares error (SSE) obtained by adding the term to a model that excludes it. The formula for the sum of squares of a term Term has the form

$$SS_{\mathrm{Term}} \;=\; \underbrace{\sum_{i=1}^{n}\bigl(y_i - f_{\mathrm{excl}}(g_1,\ldots,g_N)\bigr)^2}_{SSE_{f_{\mathrm{excl}}}} \;-\; \underbrace{\sum_{i=1}^{n}\bigl(y_i - f_{\mathrm{incl}}(g_1,\ldots,g_N)\bigr)^2}_{SSE_{f_{\mathrm{incl}}}}$$

where n is the number of observations, yi are the response data, g1, ..., gN are the factors used to perform the ANOVA, f_excl is a model that excludes Term, and f_incl is a model that includes Term. Both f_excl and f_incl are specified by SumOfSquaresType. The variables SSE_f_excl and SSE_f_incl are the sum of squares errors for f_excl and f_incl, respectively.
You can specify f_excl and f_incl using one of the options for SumOfSquaresType described in the following table.
Option
Type of Sum of Squares
"three" (default)
f incl is the full ANOVA model specified in the property Formula. f excl is a model composed of all terms in f incl except Term. The model f excl has the same sigma-restricted coding as f incl. This type of sum of squares is known as Type III.
"two"
f excl is a model composed of all terms in the ANOVA model specified in the property Formula that do not contain Term. If Term is a continuous term, then powers of Term are treated as separate terms that do not contain Term. f incl is a model composed of Term and all the terms in f excl. This type of sum of squares is known as Type II.
Option
Type of Sum of Squares
"one"
f excl is a model composed of all the terms that precede Term in the ANOVA model specified in the property Formula. f incl is a model composed of Term and all the terms in f excl. This type of sum of squares is known as Type I.
"hierarchical"
f excl and f incl are defined as in Type II, except powers of Term are treated as terms that contain Term.
Example: SumOfSquaresType="hierarchical" Data Types: char | string
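The effect of this argument is easiest to see by fitting the same unbalanced data twice. The following is a minimal sketch with hypothetical inputs (f1 and f2 are factor vectors and y is a response vector, none of which appear elsewhere on this page); the stats object function listed below returns the resulting ANOVA table.

% Sketch with hypothetical data: f1, f2 are factor vectors, y is the response.
aovIII = anova({f1,f2},y);                          % default Type III sums of squares
aovI   = anova({f1,f2},y,SumOfSquaresType="one");   % sequential (Type I) sums of squares
stats(aovIII)   % ANOVA table under Type III
stats(aovI)     % ANOVA table under Type I; sums of squares can differ for unbalanced data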
Properties CategoricalFactors — Indices of categorical factors numeric vector This property is read-only. Indices of categorical factors, specified as a numeric vector. This property is set by the CategoricalFactors name-value argument. Data Types: double Coefficients — Fitted ANOVA model coefficients double vector This property is read-only. Fitted ANOVA model coefficients, specified as a double vector. The anova function expands each categorical factor into F dummy variables, where F is the number of values for the factor. Each dummy variable is fit with a different coefficient during the ANOVA. Continuous factors have coefficients that are constant across factor values. For example, let y be a set of response data and factor1 be a continuous factor. Let factor2 be a categorical factor with values value1, value2, and value3. The formula "y ~ 1 + factor1 + factor2" expands to "y ~ 1 + factor1 + (factor2==value1) + (factor2==value2) + (factor2==value3)" and anova fits the expanded formula with coefficients. Data Types: single | double ExpandedFactorNames — Names of coefficients string vector This property is read-only Names of coefficients, specified as a string vector of names. The anova function expands each categorical factor into F dummy variables, where F is the number of values for the factor. The vector ExpandedFactorNames contains the name of each dummy variable. For more information, see Coefficients. Data Types: string 35-73
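The dummy-variable expansion is easiest to see by pairing the Coefficients property with ExpandedFactorNames. A minimal sketch, reusing the patients data from the random-effects example later on this page:

load patients.mat
tbl = table(Age,Smoker,VariableNames=["Age" "SmokingStatus"]);
aov = anova(tbl,Systolic,CategoricalFactors=2);
aov.ExpandedFactorNames   % names of the expanded (dummy) terms
aov.Coefficients          % one fitted coefficient per expanded term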
FactorNames — Names of factors
string vector
This property is read-only.
Names of the factors used to fit the ANOVA model, specified as a string vector of names. This property is set by the tbl input argument or the FactorNames name-value argument.
Data Types: string
Factors — Names and values of factors
table
This property is read-only.
Names and values of the factors used to fit the ANOVA model, specified as a table. The names of the table variables are the factor names, and each variable contains the values of its corresponding factor. If the factors used to fit the model are not given as a table, anova converts them into a table with one column per factor. This property is set by one of the following:
• tbl input argument
• Matrix y input argument together with the FactorNames name-value argument
• Vector y input argument together with the factors input argument and the FactorNames name-value argument
Data Types: table
Formula — ANOVA model
LinearFormulaWithNesting object
This property is read-only.
ANOVA model, specified as a LinearFormulaWithNesting object. This property is set by the formula input argument or the ModelSpecification name-value argument.
Metrics — Model metrics
table
Model metrics, specified as a table. The table Metrics has these variables:
• MSE — Mean squared error.
• RMSE — Root mean squared error, which is the square root of MSE.
• SSE — Sum of squares of the error.
• SSR — Sum of squares regression.
• SST — Total sum of squares.
• RSquared — Coefficient of determination, also known as R².
• AdjustedRSquared — R² value, adjusted for the number of coefficients. This value is given by the formula

$$R_{adj}^2 = 1 - \frac{(n-1)SSE}{(n-p)SST}$$

where n is the number of observations, and p is the number of coefficients. A higher value for R² indicates a better fit for the ANOVA model.
Data Types: table NumObservations — Number of observations positive integer This property is read-only. Number of observations used to fit the ANOVA model, specified as a positive integer. Data Types: double RandomFactors — Indices of random factors numeric vector This property is read-only. Indices of random factors, specified as a numeric vector. This property is set by the RandomFactors name-value argument. Data Types: double Residuals — Residual values n-by-2 table This property is read-only. Residual values, specified as an n-by-2 table, where n is the number of observations. Residuals has two variables: • Raw contains the observed minus fitted values. • Pearson contains the raw residuals divided by the root mean squared error (RMSE). Data Types: table SumOfSquaresType — Type of sum of squares "three" (default) | "two" | "one" | "hierarchical" This property is read only. Type of sum of squares used when fitting the ANOVA model, specified as "three", "two", "one", or "hierarchical". This property is set by the SumOfSquaresType name-value argument. Data Types: string ResponseName — Name of response variable string scalar | character vector This property is read-only. Name of the response variable, specified as a string scalar or character vector. This property is set by the responseVarName input argument or the ResponseName name-value argument. Data Types: char | string Y — Response data numeric vector This property is read-only. 35-75
Response data used to fit the ANOVA model, specified as a numeric vector. This property is set by the y input argument, or the tbl input argument together with the responseVarName input argument. Data Types: single | double
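A minimal sketch that fits a one-way model (the popcorn example below) and reads back a few of the properties described in this section:

load popcorn.mat
aov = anova(popcorn);
aov.FactorNames          % default factor name, "Factor1"
aov.NumObservations      % 18 observations (6 rows x 3 columns)
aov.SumOfSquaresType     % "three" by default
aov.Metrics.RSquared     % coefficient of determination, about 0.92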
Object Functions boxchart groupmeans multcompare plotComparisons stats varianceComponent
Box chart (box plot) for analysis of variance (ANOVA) Mean response estimates for analysis of variance (ANOVA) Multiple comparison of means for analysis of variance (ANOVA) Interactive plot of multiple comparisons of means for analysis of variance (ANOVA) Analysis of variance (ANOVA) table Variance component estimates for analysis of variance (ANOVA)
Examples Perform One-Way ANOVA for Matrix Data Load popcorn yield data. load popcorn.mat
The columns of the 6-by-3 matrix popcorn contain popcorn yield observations in cups for three different brands. Perform a one-way ANOVA to test the null hypothesis that the popcorn yield is not affected by the brand of popcorn.

aov = anova(popcorn)

aov =
  1-way anova, constrained (Type III) sums of squares.

  Y ~ 1 + Factor1

               SumOfSquares    DF    MeanSquares     F         pValue
               ____________    __    ___________    ____    __________
    Factor1       15.75         2       7.875       18.9    7.9603e-05
    Error          6.25        15     0.41667
    Total            22        17

  Properties, Methods
aov is an anova object that contains the results of the one-way ANOVA. The Factor1 row of the ANOVA table shows statistics for the model term Factor1, and the Error row shows statistics for the entire model. The sum of squares and the degrees of freedom are given in the SumOfSquares and DF columns, respectively. The Total degrees of freedom is the total number of observations minus one, which is 18 – 1 = 17. The Factor1 degrees of freedom is the number of factor values minus one, which is 3 – 1 = 2. The Error degrees of freedom is the total degrees of freedom minus the Factor1 degrees of freedom, which is 17 – 2 = 15. 35-76
The mean squares, given in the MeanSquares column, are calculated with the formula SumOfSquares/DF. The F-statistic is the ratio of the mean squares, which is 7.875/0.41667 = 18.9. The F-statistic follows an F-distribution with degrees of freedom 2 and 15. The p-value is calculated using the cumulative distribution function (cdf). The p-value for the F-statistic is small enough that the null hypothesis can be rejected at the 0.01 significance level. Therefore, the brand of popcorn has a significant effect on the popcorn yield.
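The table values can be reproduced by hand; the following sketch uses fcdf to compute the upper tail probability described above.

F = 7.875/0.41667;       % ratio of the mean squares from the ANOVA table
p = 1 - fcdf(F,2,15)     % upper tail of the F-distribution with 2 and 15 degrees of freedom
% p is approximately 7.96e-05, matching the pValue column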
Perform Two-Way ANOVA for Vector Data Load popcorn yield data. load popcorn.mat
The columns of the 6-by-3 matrix popcorn contain popcorn yield observations in cups for the brands Gourmet, National, and Generic. The first three rows of the matrix correspond to popcorn that was popped with an oil popper, and the last three rows correspond to popcorn that was popped with an air popper. Create string vectors containing factor values for the brand and popper type. Use the function repmat to repeat copies of strings.
brand = [repmat("Gourmet",6,1);repmat("National",6,1);repmat("Generic",6,1)]; poppertype = [repmat("Air",3,1);repmat("Oil",3,1);repmat("Air",3,1);repmat("Oil",3,1);repmat("Air factors = {brand,poppertype};
Perform a two-way ANOVA to test the null hypothesis that the popcorn yield is not affected by the brand of popcorn or the type of popper.

aov = anova(factors,popcorn(:),FactorNames=["Brand" "PopperType"])

aov =
  2-way anova, constrained (Type III) sums of squares.

  Y ~ 1 + Brand + PopperType

                  SumOfSquares    DF    MeanSquares    F       pValue
                  ____________    __    ___________    ___    __________
    Brand            15.75         2       7.875       63          1e-07
    PopperType         4.5         1         4.5       36     3.2548e-05
    Error             1.75        14       0.125
    Total               22        17

  Properties, Methods
aov is an anova object containing the results of the two-way ANOVA. The small p-values indicate that both the brand and popper type have a statistically significant effect on the popcorn yield. Compute the mean response estimates to see which brand and popper type produce the most popcorn. groupmeans(aov,["Brand" "PopperType"])
ans=6×6 table
      Brand       PopperType    Mean      SE       MeanLower    MeanUpper
    __________    __________    ____    _______    _________    _________
    "Gourmet"       "Air"       5.75    0.16667      5.0329       6.4671
    "National"      "Air"       4.25    0.16667      3.5329       4.9671
    "Generic"       "Air"        3.5    0.16667      2.7829       4.2171
    "Gourmet"       "Oil"       6.75    0.16667      6.0329       7.4671
    "National"      "Oil"       5.25    0.16667      4.5329       5.9671
    "Generic"       "Oil"        4.5    0.16667      3.7829       5.2171
The table shows the mean response estimates with their standard error and 95% confidence bounds. The mean response estimates indicate that the Gourmet brand popped in an oil popper yields the most popcorn.
Perform Two-Way ANOVA with Random Effects Load the patient sample data. load patients.mat
Create a table of factors from the Age and Smoker variables. tbl = table(Age,Smoker,VariableNames=["Age" "SmokingStatus"]);
The factor SmokingStatus is a randomly sampled categorical factor, and Age is a continuous factor. Perform a two-way ANOVA to test the null hypothesis that systolic blood pressure is not affected by age or smoking status.

aov = anova(tbl,Systolic,CategoricalFactors=2,RandomFactors=2)

aov =
  2-way anova, constrained (Type III) sums of squares.

  Y ~ 1 + Age + SmokingStatus

                     SumOfSquares    DF    MeanSquares      F         pValue
                     ____________    __    ___________    ______    __________
    Age                 37.562        1      37.562       1.6577       0.20098
    SmokingStatus       2182.9        1      2182.9       96.337    3.3613e-16
    Error                 2198       97      22.659
    Total               4461.2       99

  Properties, Methods
aov is an anova object that contains the results of the two-way ANOVA. The p-value for Age is larger than 0.05. At the 95% confidence level, not enough evidence exists to reject the null hypothesis that age does not have a statistically significant effect on systolic blood pressure. SmokingStatus has a p-value smaller than 0.05, indicating that smoking status has a statistically significant effect on systolic blood pressure. 35-78
To investigate whether the variability of the random factor SmokingStatus has an effect on the SmokingStatus mean square, use the object functions varianceComponent and stats.

v = varianceComponent(aov)

v=2×3 table
                     VarianceComponent    VarianceComponentLower    VarianceComponentUpper
                     _________________    ______________________    ______________________
    SmokingStatus          48.31                  9.0308                     49707
    Error                 22.659                  17.425                     30.68

[~,ems] = stats(aov)

ems=3×5 table
                       Type               ExpectedMeanSquares                 MeanSquaresDenominator
                     ________    ___________________________________    ______________________
    Age              "fixed"     "5135.47*Q(Age)+V(Error)"                       22.659
    SmokingStatus    "random"    "44.7172*V(SmokingStatus)+V(Error)"             22.659
    Error            "random"    "V(Error)"
Inserting the VarianceComponent values into the SmokingStatus formula for ExpectedMeanSquares gives 44.7172*48.3098+22.6594 = 2.1829e+03. To see how much the variance component of SmokingStatus affects the expected mean squares, divide the SmokingStatus term of ExpectedMeanSquares by ExpectedMeanSquares to get 44.7172*48.3098/2.1829e+03 = 0.9896. This calculation shows that the SmokingStatus variance component contributes to almost 99% of the SmokingStatus expected mean squares.
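The same arithmetic, written out in MATLAB with the values quoted above:

vcSmoking = 48.3098;                   % variance component for SmokingStatus
vcError   = 22.6594;                   % variance component for Error
ems   = 44.7172*vcSmoking + vcError    % expected mean squares, about 2.1829e+03
share = 44.7172*vcSmoking/ems          % about 0.9896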
Perform ANOVA for Data in Table Load data of the results for five exams taken by 120 students. load examgrades.mat
Create a table with variables for the math, biology, history, literature, and multi-subject comprehensive exams.
subject = ["math" "biology" "history" "literature" "comprehensive"]; grades = table(grades(:,1),grades(:,2),grades(:,3),grades(:,4),grades(:,5),VariableNames=subject) grades=120×5 table math biology ____ _______ 65 61 81 88 69 89 55 84
77 74 80 76 77 93 64 83
history _______ 69 70 71 80 74 78 60 80
literature __________ 75 66 74 88 69 77 50 77
comprehensive _____________ 69 68 79 79 76 80 63 78
35-79
35
Functions
86 84 71 81 84 81 78 67 ⋮
75 82 70 88 78 77 66 74
81 86 73 80 80 81 90 73
87 92 81 79 74 83 84 76
79 85 79 83 80 79 75 72
Perform a four-way ANOVA for the continuous factors math, biology, history, and literature, and the response data comprehensive.

aov = anova(grades,"comprehensive",CategoricalFactors = [])

aov =
  N-way anova, constrained (Type III) sums of squares.

  comprehensive ~ 1 + math + biology + history + literature

                  SumOfSquares    DF     MeanSquares      F         pValue
                  ____________    ___    ___________    ______    __________
    math             58.973         1      58.973       6.1964      0.014231
    biology          100.35         1      100.35       10.544     0.0015275
    history          243.89         1      243.89       25.626    1.5901e-06
    literature       152.22         1      152.22       15.994    0.00011269
    Error            1094.5       115      9.5173
    Total              3291       119

  Properties, Methods

aov is an anova object that contains the results of the four-way ANOVA. The p-values of all factors are smaller than 0.05, indicating that each subject exam can be used to predict a student's grade on the comprehensive exam.

Display the estimated coefficients of the ANOVA model.

coef = aov.Coefficients

coef = 5×1

   21.9901
    0.0997
    0.1805
    0.2563
    0.1701
The coefficient corresponding to the history exam is the largest; therefore, history makes the largest contribution to the predicted value of comprehensive.
Compare Two anova Objects Created Using Table Load popcorn yield data. load popcorn.mat
The columns of the 6-by-3 matrix popcorn contain popcorn yield observations for the brands Gourmet, National, and Generic. The first three rows of the matrix correspond to popcorn that was popped with an oil popper, and the last three rows correspond to popcorn that was popped with an air popper. Create a table containing variables representing the brand, popper type, and popcorn yield by using the repmat and table functions.
brand = [repmat("Gourmet",6,1);repmat("National",6,1);repmat("Generic",6,1)]; poppertype = [repmat("air",3,1);repmat("oil",3,1);repmat("air",3,1);repmat("oil",3,1);repmat("air tbl = table(brand,poppertype,popcorn(:),VariableNames=["Brand" "PopperType" "PopcornYield"]);
Perform a two-way ANOVA to test the null hypothesis that the popcorn yield is the same across the three brands and the two popper types. Specify the ANOVA model formula using Wilkinson notation.

aovLinear = anova(tbl,"PopcornYield ~ Brand + PopperType")

aovLinear =
  2-way anova, constrained (Type III) sums of squares.

  PopcornYield ~ 1 + Brand + PopperType

                  SumOfSquares    DF    MeanSquares    F       pValue
                  ____________    __    ___________    ___    __________
    Brand            15.75         2       7.875       63          1e-07
    PopperType         4.5         1         4.5       36     3.2548e-05
    Error             1.75        14       0.125
    Total               22        17

  Properties, Methods
aovLinear is an anova object that contains the results of the two-way ANOVA. The ANOVA model for aovLinear is linear and does not include an interaction term. The small p-values indicate that both the brand and popper type have a significant effect on the popcorn yield.

To investigate whether the interaction between the brand and popper type has a significant effect on the popcorn yield, perform a two-way ANOVA with a model that contains the interaction term Brand:PopperType.

aovInteraction = anova(tbl,"PopcornYield ~ Brand + PopperType + Brand:PopperType")

aovInteraction =
  2-way anova, constrained (Type III) sums of squares.

  PopcornYield ~ 1 + Brand*PopperType

                        SumOfSquares    DF    MeanSquares     F        pValue
                        ____________    __    ___________    ____    __________
    Brand                  15.75         2       7.875       56.7     7.679e-07
    PopperType               4.5         1         4.5       32.4    0.00010037
    Brand:PopperType    0.083333         2    0.041667        0.3       0.74622
    Error                 1.6667        12     0.13889
    Total                     22        17
Properties, Methods
The ANOVA model for the anova object aovInteraction includes the interaction term Brand:PopperType. The p-value for the Brand:PopperType term is larger than 0.05. Therefore, not enough evidence exists to conclude that the brand and popper type have an interaction effect on the popcorn yield.

The Metrics property of an anova object provides statistics about the fit of the ANOVA model. To determine which model is a better fit for the response data, display the Metrics property of aovLinear and aovInteraction.

aovLinear.Metrics

ans=1×7 table
     MSE      RMSE       SSE     SSR      SST    RSquared    AdjustedRSquared
    _____    _______    ____    _____    ___    ________    ________________
    0.125    0.35355    1.75    20.25    22     0.92045          0.88731

aovInteraction.Metrics

ans=1×7 table
      MSE       RMSE       SSE       SSR      SST    RSquared    AdjustedRSquared
    _______    _______    ______    ______    ___    ________    ________________
    0.13889    0.37268    1.6667    20.333    22     0.92424          0.78535
The metrics tables show that the mean squared error (MSE) is slightly smaller for the linear model than for the interaction model. The adjusted R-squared value is higher for the linear model. Together, these metrics suggest that the linear model is a better fit for the popcorn data than the interaction model.
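A short sketch that pulls the two adjusted R-squared values side by side from the Metrics tables shown above:

[aovLinear.Metrics.AdjustedRSquared aovInteraction.Metrics.AdjustedRSquared]
% ans = 0.8873    0.7853, so the linear model has the higher adjusted R-squared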
Perform Nested Two-Way ANOVA Load the sample car data. load carbig.mat
The variable Model contains data for the car model, and the variable Origin contains data for the country in which the car is manufactured. Convert Model and Origin from character arrays with trailing whitespace to string vectors. 35-82
Model = strtrim(string(Model)); Origin = strtrim(string(Origin));
The variable MPG contains mileage data for the cars. Create a table containing data for the model, country of origin, and mileage of the cars manufactured in Japan and the United States.
idxJapanUSA = (Origin=="Japan"|Origin=="USA"); tbl = table(Model(idxJapanUSA),Origin(idxJapanUSA),MPG(idxJapanUSA),VariableNames=["Origin" "Mode
Japan and the United States each manufacture a unique set of models. Therefore, the factor Model is nested in the factor Origin. Perform a two-way, nested ANOVA to test the null hypothesis that the car mileage is the same between the models and countries of origin.

aov = anova(tbl,"MPG ~ Origin + Model(Origin)")

aov =
  2-way anova, constrained (Type III) sums of squares.

  MPG ~ 1 + Origin + Model(Origin)

                     SumOfSquares    DF     MeanSquares      F         pValue
                     ____________    ___    ___________    ______    __________
    Origin              18873        244      77.347       10.138    3.0582e-25
    Model(Origin)           0          0           0            0           NaN
    Error              633.26         83      7.6296
    Total               19506        327
Properties, Methods
The small p-values indicate that the null hypothesis can be rejected at the 99% confidence level. Enough evidence exists to conclude that the model of the car and the country of origin have a statistically significant effect on the car mileage.
Algorithms

ANOVA partitions the total variation in the response data into two components:

• Variation in the relationship between the factor data and the response data, as described by the ANOVA model. This variation is known as the sum of squares regression (SSR). The SSR is represented by the equation $\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$, where n is the number of observations in the sample, $\hat{y}_i$ is the predicted value of observation i, and $\bar{y}$ is the sample mean.
• Variation in the data due to the ANOVA model error term, known as the sum of squares error (SSE). The SSE is represented by the equation $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where $y_i$ is the value of observation i.

With the above partitioning, the total sum of squares (SST) is represented by

$$\underbrace{\sum_{i=1}^{n} (y_i - \bar{y})^2}_{SST} \;=\; \underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{SSR} \;+\; \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{SSE}$$
The anova function calculates the sum of squares of a term (SSTerm) in the ANOVA model by measuring the reduction in the SSE when the term is added to a comparison model. The comparison model is given by aov.SumOfSquaresType (see SumOfSquaresType for more information). ANOVA uses SSE and SSTerm to perform an F-test. For categorical main effects, the null hypothesis is that the term's coefficient is the same across all groups. For continuous and interaction terms, the null hypothesis is that the term's coefficient is zero. A zero coefficient means that the value of the term does not have an effect on the response data. The F-statistic is calculated as

$$F = \frac{SS_{\mathrm{Term}}/df_{\mathrm{Term}}}{SSE/df_{\mathrm{Error}}} = \frac{MS_{\mathrm{Term}}}{MS_{\mathrm{Error}}}$$

In the above formula, df_Term is the degrees of freedom of a term, df_Error is the degrees of freedom of the error, and MS_Term and MS_Error are the mean squares of the term and error, respectively.

The anova function displays a component ANOVA table with rows for the model terms and error. The columns of the ANOVA table are described as follows:

Column          Definition
SumOfSquares    Sum of squares
DF              Degrees of freedom
MeanSquares     Mean squares, which is the ratio SumOfSquares/DF
F               F-statistic, which is the source mean square to error mean square ratio
pValue          p-value, which is the probability that the F-statistic, as computed under the null hypothesis, can take a value larger than the computed test-statistic value. anova derives this probability from the cdf of the F-distribution.
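The partition SST = SSR + SSE can be checked directly on a fitted anova object through its Metrics property. A minimal sketch using the one-way popcorn fit from the first example (SSR = 15.75, SSE = 6.25):

load popcorn.mat
aov = anova(popcorn);
m = aov.Metrics;
[m.SSR + m.SSE, m.SST]   % both values equal 22 for the popcorn data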
Version History Introduced in R2022b
References [1] Wackerly, D. D., W. Mendenhall, III, and R. L. Scheaffer. Mathematical Statistics with Applications, 7th ed. Belmont, CA: Brooks/Cole, 2008. [2] Dunn, O. J., and V. A. Clark Hoboken. Applied Statistics: Analysis of Variance and Regression. NJ: John Wiley & Sons, Inc., 1974.
See Also anova | anovan | anova2 | anova1 | “N-Way ANOVA” on page 9-28 | “One-Way ANOVA” on page 9-2 | “Two-Way ANOVA” on page 9-11
anova1
anova1 One-way analysis of variance
Syntax

p = anova1(y)
p = anova1(y,group)
p = anova1(y,group,displayopt)
[p,tbl] = anova1( ___ )
[p,tbl,stats] = anova1( ___ )
Description p = anova1(y) performs one-way ANOVA on page 9-2 for the sample data y and returns the pvalue. anova1 treats each column of y as a separate group. The function tests the hypothesis that the samples in the columns of y are drawn from populations with the same mean against the alternative hypothesis that the population means are not all the same. The function also displays the box plot on page 35-95 for each group in y and the standard ANOVA table (tbl). p = anova1(y,group) performs one-way ANOVA for the sample data y, grouped by group. p = anova1(y,group,displayopt) enables the ANOVA table and box plot displays when displayopt is 'on' (default) and suppresses the displays when displayopt is 'off'. [p,tbl] = anova1( ___ ) returns the ANOVA table (including column and row labels) in the cell array tbl using any of the input argument combinations in the previous syntaxes. To copy a text version of the ANOVA table to the clipboard, select Edit > Copy Text from the ANOVA table figure. [p,tbl,stats] = anova1( ___ ) returns a structure, stats, which you can use to perform a multiple comparison test on page 9-19. A multiple comparison test enables you to determine which pairs of group means are significantly different. To perform this test, use multcompare, providing the stats structure as an input argument.
Examples

One-Way ANOVA

Create sample data matrix y with columns that are constants, plus random normal disturbances with mean 0 and standard deviation 1.

y = meshgrid(1:5);
rng default; % For reproducibility
y = y + normrnd(0,1,5,5)

y = 5×5

    1.5377    0.6923    1.6501    3.7950    5.6715
    2.8339    1.5664    6.0349    3.8759    3.7925
   -1.2588    2.3426    3.7254    5.4897    5.7172
    1.8622    5.5784    2.9369    5.4090    6.6302
    1.3188    4.7694    3.7147    5.4172    5.4889
Perform one-way ANOVA. p = anova1(y)
p = 0.0023
The ANOVA table shows the between-groups variation (Columns) and within-groups variation (Error). SS is the sum of squares, and df is the degrees of freedom. The total degrees of freedom is total number of observations minus one, which is 25 - 1 = 24. The between-groups degrees of freedom is number of groups minus one, which is 5 - 1 = 4. The within-groups degrees of freedom is total degrees of freedom minus the between groups degrees of freedom, which is 24 - 4 = 20. 35-86
MS is the mean squared error, which is SS/df for each source of variation. The F-statistic is the ratio of the mean squared errors (13.4309/2.2204). The p-value is the probability that the test statistic can take a value greater than the value of the computed test statistic, i.e., P(F > 6.05). The small p-value of 0.0023 indicates that differences between column means are significant.
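A sketch that reproduces the F-statistic and p-value reported in the ANOVA table by hand:

F = 13.4309/2.2204;      % between-groups MS divided by within-groups MS
p = 1 - fcdf(F,4,20)     % P(F > 6.05) with 4 and 20 degrees of freedom
% p is approximately 0.0023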
Compare Beam Strength Using One-Way ANOVA Input the sample data. strength = [82 86 79 83 84 85 86 87 74 82 ... 78 75 76 77 79 79 77 78 82 79]; alloy = {'st','st','st','st','st','st','st','st',... 'al1','al1','al1','al1','al1','al1',... 'al2','al2','al2','al2','al2','al2'};
The data are from a study of the strength of structural beams in Hogg (1987). The vector strength measures deflections of beams in thousandths of an inch under 3000 pounds of force. The vector alloy identifies each beam as steel ('st'), alloy 1 ('al1'), or alloy 2 ('al2'). Although alloy is sorted in this example, grouping variables do not need to be sorted. Test the null hypothesis that the steel beams are equal in strength to the beams made of the two more expensive alloys. Turn the figure display off and return the ANOVA results in a cell array. [p,tbl] = anova1(strength,alloy,'off') p = 1.5264e-04 tbl=4×6 cell array {'Source'} {'SS' } {'Groups'} {[184.8000]} {'Error' } {[102.0000]} {'Total' } {[286.8000]}
{'df'} {[ 2]} {[17]} {[19]}
{'MS' } {[ 92.4000]} {[ 6.0000]} {0x0 double}
{'F' } {[ 15.4000]} {0x0 double} {0x0 double}
{'Prob>F' } {[1.5264e-04]} {0x0 double } {0x0 double }
The total degrees of freedom is total number of observations minus one, which is 20 − 1 = 19. The between-groups degrees of freedom is number of groups minus one, which is 3 − 1 = 2. The withingroups degrees of freedom is total degrees of freedom minus the between groups degrees of freedom, which is 19 − 2 = 17. MS is the mean squared error, which is SS/df for each source of variation. The F-statistic is the ratio of the mean squared errors. The p-value is the probability that the test statistic can take a value greater than or equal to the value of the test statistic. The p-value of 1.5264e-04 suggests rejection of the null hypothesis. You can retrieve the values in the ANOVA table by indexing into the cell array. Save the F-statistic value and the p-value in the new variables Fstat and pvalue. Fstat = tbl{2,5} Fstat = 15.4000 pvalue = tbl{2,6} pvalue = 1.5264e-04
Multiple Comparisons for One-Way ANOVA Input the sample data. strength = [82 86 79 83 84 85 86 87 74 82 ... 78 75 76 77 79 79 77 78 82 79]; alloy = {'st','st','st','st','st','st','st','st',... 'al1','al1','al1','al1','al1','al1',... 'al2','al2','al2','al2','al2','al2'};
The data are from a study of the strength of structural beams in Hogg (1987). The vector strength measures deflections of beams in thousandths of an inch under 3000 pounds of force. The vector alloy identifies each beam as steel (st), alloy 1 (al1), or alloy 2 (al2). Although alloy is sorted in this example, grouping variables do not need to be sorted. Perform one-way ANOVA using anova1. Return the structure stats, which contains the statistics multcompare needs for performing “Multiple Comparisons” on page 9-19. [~,~,stats] = anova1(strength,alloy);
The small p-value of 0.0002 suggests that the strength of the beams is not the same. Perform a multiple comparison of the mean strength of the beams. [c,~,~,gnames] = multcompare(stats);
In the figure, the blue bar represents the comparison interval for mean material strength for steel. The red bars represent the comparison intervals for the mean material strength for alloy 1 and alloy 2. Neither of the red bars overlaps with the blue bar, which indicates that the mean material strength for steel is significantly different from that of alloy 1 and alloy 2. You can confirm the significant difference by clicking the bars that represent alloy 1 and 2.

Display the multiple comparison results and the corresponding group names in a table.

tbl = array2table(c,"VariableNames", ...
    ["Group A","Group B","Lower Limit","A-B","Upper Limit","P-value"]);
tbl.("Group A") = gnames(tbl.("Group A"));
tbl.("Group B") = gnames(tbl.("Group B"))

tbl=3×6 table
    Group A    Group B    Lower Limit    A-B    Upper Limit     P-value
    _______    _______    ___________    ___    ___________    __________
    {'st' }    {'al1'}       3.6064        7       10.394      0.00016831
    {'st' }    {'al2'}       1.6064        5       8.3936       0.0040464
    {'al1'}    {'al2'}       -5.628       -2        1.628         0.35601
The first two columns show the pair of groups that are compared. The fourth column shows the difference between the estimated group means. The third and fifth columns show the lower and upper limits for the 95% confidence intervals of the true difference of means. The sixth column shows the pvalue for a hypothesis that the true difference of means for the corresponding groups is equal to zero. 35-90
The first two rows show that both comparisons involving the first group (steel) have confidence intervals that do not include zero. Because the corresponding p-values (1.6831e-04 and 0.0040, respectively) are small, those differences are significant. The third row shows that the differences in strength between the two alloys is not significant. A 95% confidence interval for the difference is [-5.6,1.6], so you cannot reject the hypothesis that the true difference is zero. The corresponding p-value of 0.3560 in the sixth column confirms this result.
Input Arguments y — sample data vector | matrix Sample data, specified as a vector or matrix. • If y is a vector, you must specify the group input argument. Each element in group represents a group name of the corresponding element in y. The anova1 function treats the y values corresponding to the same value of group as part of the same group. Use this design when groups have different numbers of elements (unbalanced ANOVA).
• If y is a matrix and you do not specify group, then anova1 treats each column of y as a separate group. In this design, the function evaluates whether the population means of the columns are equal. Use this design when each group has the same number of elements (balanced ANOVA).
• If y is a matrix and you specify group, then each element in group represents a group name for the corresponding column in y. The anova1 function treats the columns that have the same group name as part of the same group.
35-91
35
Functions
Note anova1 ignores any NaN values in y. Also, if group contains empty or NaN values, anova1 ignores the corresponding observations in y. The anova1 function performs balanced ANOVA if each group has the same number of observations after the function disregards empty or NaN values. Otherwise, anova1 performs unbalanced ANOVA. Data Types: single | double group — Grouping variable numeric vector | logical vector | categorical vector | character array | string array | cell array of character vectors Grouping variable containing group names, specified as a numeric vector, logical vector, categorical vector, character array, string array, or cell array of character vectors. • If y is a vector, then each element in group represents a group name of the corresponding element in y. The anova1 function treats the y values corresponding to the same value of group as part of the same group.
N is the total number of observations. • If y is a matrix, then each element in group represents a group name for the corresponding column in y. The anova1 function treats the columns of y that have the same group name as part of the same group.
35-92
anova1
If you do not want to specify group names for the matrix sample data y, enter an empty array ([]) or omit this argument. In this case, anova1 treats each column of y as a separate group. If group contains empty or NaN values, anova1 ignores the corresponding observations in y. For more information on grouping variables, see “Grouping Variables” on page 2-11. Example: 'group',[1,2,1,3,1,...,3,1] when y is a vector with observations categorized into groups 1, 2, and 3 Example: 'group',{'white','red','white','black','red'} when y is a matrix with five columns categorized into groups red, white, and black Data Types: single | double | logical | categorical | char | string | cell displayopt — Indicator to display ANOVA table and box plot 'on' (default) | 'off' Indicator to display the ANOVA table and box plot, specified as 'on' or 'off'. When displayopt is 'off', anova1 returns the output arguments, only. It does not display the standard ANOVA table and box plot. Example: p = anova(x,group,'off')
Output Arguments p — p-value for the F-test scalar value p-value for the F-test, returned as a scalar value. p-value is the probability that the F-statistic can take a value larger than the computed test-statistic value. anova1 tests the null hypothesis that all group means are equal to each other against the alternative hypothesis that at least one group mean is different from the others. The function derives the p-value from the cdf of the F-distribution. A p-value that is smaller than the significance level indicates that at least one of the sample means is significantly different from the others. Common significance levels are 0.05 or 0.01. tbl — ANOVA table cell array ANOVA table, returned as a cell array. tbl has six columns. 35-93
35
Functions
Column
Definition
source
The source of the variability.
SS
The sum of squares due to each source.
df
The degrees of freedom associated with each source. Suppose N is the total number of observations and k is the number of groups. Then, N – k is the within-groups degrees of freedom (Error), k – 1 is the between-groups degrees of freedom (Columns), and N – 1 is the total degrees of freedom. N – 1 = (N – k) + (k – 1)
MS
The mean squares for each source, which is the ratio SS/df.
F
F-statistic, which is the ratio of the mean squares.
Prob>F
The p-value, which is the probability that the Fstatistic can take a value larger than the computed test-statistic value. anova1 derives this probability from the cdf of F-distribution.
The rows of the ANOVA table show the variability in the data that is divided by the source. Row
Definition
Groups
Variability due to the differences among the group means (variability between groups)
Error
Variability due to the differences between the data in each group and the group mean (variability within groups)
Total
Total variability
stats — Statistics for multiple comparison tests structure Statistics for multiple comparison tests on page 9-19, returned as a structure with the fields described in this table.
35-94
Field name
Definition
gnames
Names of the groups
n
Number of observations in each group
source
Source of the stats output
means
Estimated values of the means
df
Error (within-groups) degrees of freedom (N – k, where N is the total number of observations and k is the number of groups)
s
Square root of the mean squared error
anova1
More About Box Plot anova1 returns a box plot of the observations for each group in y. Box plots provide a visual comparison of the group location parameters. On each box, the central mark is the median (2nd quantile, q2) and the edges of the box are the 25th and 75th percentiles (1st and 3rd quantiles, q1 and q3, respectively). The whiskers extend to the most extreme data points that are not considered outliers. The outliers are plotted individually using the '+' symbol. The extremes of the whiskers correspond to q3 + 1.5 × (q3 – q1) and q1 – 1.5 × (q3 – q1). Box plots include notches for the comparison of the median values. Two medians are significantly different at the 5% significance level if their intervals, represented by notches, do not overlap. This test is different from the F-test that ANOVA performs; however, large differences in the center lines of the boxes correspond to a large F-statistic value and correspondingly a small p-value. The extremes of the notches correspond to q2 – 1.57(q3 – q1)/sqrt(n) and q2 + 1.57(q3 – q1)/sqrt(n), where n is the number of observations without any NaN values. In some cases, notches can extend outside the boxes.
For more information about box plots, see 'Whisker' and 'Notch' of boxplot.
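The whisker and notch formulas above can be evaluated directly for one group of data. A minimal sketch, assuming x is a hypothetical numeric vector of observations for a single group:

q = quantile(x,[0.25 0.50 0.75]);                          % q1, q2 (median), q3
iqrange = q(3) - q(1);
whiskerLimits = [q(1) - 1.5*iqrange, q(3) + 1.5*iqrange];  % extremes of the whiskers
n = sum(~isnan(x));                                        % observations without NaN values
notchLimits = [q(2) - 1.57*iqrange/sqrt(n), q(2) + 1.57*iqrange/sqrt(n)];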
Alternative Functionality Instead of using anova1, you can create an anova object by using the anova function. The anova function provides these advantages: 35-95
35
Functions
• The anova function allows you to specify the ANOVA model type, sum of squares type, and factors to treat as categorical. anova also supports table predictor and response input arguments. • In addition to the outputs returned by anova1, the properties of the anova object contain the following: • ANOVA model formula • Fitted ANOVA model coefficients • Residuals • Factors and response data • The anova object functions allow you to conduct further analysis after fitting the anova object. For example, you can create an interactive plot of multiple comparisons of means for the ANOVA, get the mean response estimates for each value of a factor, and calculate the variance component estimates.
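A minimal sketch of the two workflows on the same data (the popcorn matrix used in the examples above); the anova object exposes fit statistics that anova1 does not return:

load popcorn.mat
p = anova1(popcorn,[],'off');   % p-value only, no figure
aov = anova(popcorn);           % anova object
aov.Metrics                     % fit statistics such as R-squared, not returned by anova1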
Version History Introduced before R2006a
References [1] Hogg, R. V., and J. Ledolter. Engineering Statistics. New York: MacMillan, 1987.
See Also anova | anova2 | anovan | boxplot | multcompare Topics “Perform One-Way ANOVA” on page 9-4 “One-Way ANOVA” on page 9-2 “Multiple Comparisons” on page 9-19
anova2
anova2 Two-way analysis of variance
Syntax

p = anova2(y,reps)
p = anova2(y,reps,displayopt)
[p,tbl] = anova2( ___ )
[p,tbl,stats] = anova2( ___ )
Description anova2 performs two-way analysis of variance (ANOVA) with balanced designs. To perform two-way ANOVA with unbalanced designs, see anovan. p = anova2(y,reps) returns the p-values for a balanced two-way ANOVA for comparing the means of two or more columns and two or more rows of the observations in y. reps is the number of replicates for each combination of factor groups, which must be constant, indicating a balanced design. For unbalanced designs, use anovan. The anova2 function tests the main effects for column and row factors and their interaction effect. To test the interaction effect, reps must be greater than 1. anova2 also displays the standard ANOVA table. p = anova2(y,reps,displayopt) enables the ANOVA table display when displayopt is 'on' (default) and suppresses the display when displayopt is 'off'. [p,tbl] = anova2( ___ ) returns the ANOVA table (including column and row labels) in cell array tbl. To copy a text version of the ANOVA table to the clipboard, select Edit > Copy Text menu. [p,tbl,stats] = anova2( ___ ) returns a stats structure, which you can use to perform a multiple comparison test on page 9-19. A multiple comparison test enables you to determine which pairs of group means are significantly different. To perform this test, use multcompare, providing the stats structure as input.
Examples Two-Way ANOVA Load the sample data. load popcorn popcorn popcorn = 6×3 5.5000 5.5000 6.0000
4.5000 4.5000 4.0000
3.5000 4.0000 3.0000
35-97
35
Functions
6.5000 7.0000 7.0000
5.0000 5.5000 5.0000
4.0000 5.0000 4.5000
The data is from a study of popcorn brands and popper types (Hogg 1987). The columns of the matrix popcorn are brands, Gourmet, National, and Generic, respectively. The rows are popper types, oil and air. In the study, researchers popped a batch of each brand three times with each popper, that is, the number of replications is 3. The first three rows correspond to the oil popper, and the last three rows correspond to the air popper. The response values are the yield in cups of popped popcorn. Perform a two-way ANOVA. Save the ANOVA table in the cell array tbl for easy access to results. [p,tbl] = anova2(popcorn,3);
The column Prob>F shows the p-values for the three brands of popcorn (0.0000), the two popper types (0.0001), and the interaction between brand and popper type (0.7462). These values indicate that popcorn brand and popper type affect the yield of popcorn, but there is no evidence of an interaction effect of the two. Display the cell array containing the ANOVA table. tbl tbl=6×6 cell array {'Source' } {'Columns' } {'Rows' } {'Interaction'} {'Error' } {'Total' }
{'SS' } {[15.7500]} {[ 4.5000]} {[ 0.0833]} {[ 1.6667]} {[ 22]}
{'df'} {[ 2]} {[ 1]} {[ 2]} {[12]} {[17]}
{'MS' } {[ 7.8750]} {[ 4.5000]} {[ 0.0417]} {[ 0.1389]} {0x0 double}
{'F' } {[ 56.7000]} {[ 32.4000]} {[ 0.3000]} {0x0 double} {0x0 double}
Store the F-statistic for the factors and factor interaction in separate variables. Fbrands = tbl{2,5} Fbrands = 56.7000 Fpoppertype = tbl{3,5} Fpoppertype = 32.4000
35-98
{'Prob>F' } {[7.6790e-07]} {[1.0037e-04]} {[ 0.7462]} {0x0 double } {0x0 double }
anova2
Finteraction = tbl{4,5} Finteraction = 0.3000
Multiple Comparisons for Two-Way ANOVA Load the sample data. load popcorn popcorn popcorn = 6×3 5.5000 5.5000 6.0000 6.5000 7.0000 7.0000
4.5000 4.5000 4.0000 5.0000 5.5000 5.0000
3.5000 4.0000 3.0000 4.0000 5.0000 4.5000
The data is from a study of popcorn brands and popper types (Hogg 1987). The columns of the matrix popcorn are brands (Gourmet, National, and Generic). The rows are popper types oil and air. The first three rows correspond to the oil popper, and the last three rows correspond to the air popper. In the study, researchers popped a batch of each brand three times with each popper. The values are the yield in cups of popped popcorn. Perform a two-way ANOVA. Also compute the statistics that you need to perform a multiple comparison test on the main effects. [~,~,stats] = anova2(popcorn,3,"off") stats = struct with fields: source: 'anova2' sigmasq: 0.1389 colmeans: [6.2500 4.7500 4] coln: 6 rowmeans: [4.5000 5.5000] rown: 9 inter: 1 pval: 0.7462 df: 12
The stats structure includes • The mean squared error (sigmasq) • The estimates of the mean yield for each popcorn brand (colmeans) • The number of observations for each popcorn brand (coln) • The estimate of the mean yield for each popper type (rowmeans) • The number of observations for each popper type (rown) • The number of interactions (inter) 35-99
35
Functions
• The p-value that shows the significance level of the interaction term (pval) • The error degrees of freedom (df). Perform a multiple comparison test to see if the popcorn yield differs between pairs of popcorn brands (columns). c1 = multcompare(stats); Note: Your model includes an interaction term. A test of main effects can be difficult to interpret when the model includes interactions.
The figure shows the multiple comparisons of the means. By default, the group 1 mean is highlighted and the comparison interval is in blue. Because the comparison intervals for the other two groups do not intersect with the intervals for the group 1 mean, they are highlighted in red. This lack of intersection indicates that both means are different than group 1 mean. Select other group means to confirm that all group means are significantly different from each other. Display the multiple comparison results in a table. tbl1 = array2table(c1,"VariableNames", ... ["Group A","Group B","Lower Limit","A-B","Upper Limit","P-value"]) tbl1=3×6 table Group A Group B _______ _______ 1
35-100
2
Lower Limit ___________ 0.92597
A-B ____ 1.5
Upper Limit ___________
P-value __________
2.074
4.1188e-05
anova2
1 2
3 3
1.676 0.17597
2.25 0.75
2.824 1.324
6.1588e-07 0.011591
The first two columns of c1 show the groups that are compared. The fourth column shows the difference between the estimated group means. The third and fifth columns show the lower and upper limits for 95% confidence intervals for the true mean difference. The sixth column contains the pvalue for a hypothesis test that the corresponding mean difference is equal to zero. All p-values are very small, which indicates that the popcorn yield differs across all three brands. Perform a multiple comparison test to see the popcorn yield differs between the two popper types (rows). c2 = multcompare(stats,"Estimate","row"); Note: Your model includes an interaction term. A test of main effects can be difficult to interpret when the model includes interactions.
tbl2 = array2table(c2,"VariableNames", ...
    ["Group A","Group B","Lower Limit","A-B","Upper Limit","P-value"])
tbl2=1×6 table
    Group A    Group B    Lower Limit    A-B    Upper Limit     P-value
    _______    _______    ___________    ___    ___________    __________
       1          2         -1.3828      -1       -0.61722     0.00010037
The small p-value indicates that the popcorn yield differs between the two popper types (air and oil). The figure shows the same results. The disjoint comparison intervals indicate that the group means are significantly different from each other.
Input Arguments y — Sample data matrix Sample data, specified as a matrix. The columns correspond to groups of one factor, and the rows correspond to the groups of the other factor and the replications. Replications are the measurements or observations for each combination of groups (levels) of the row and column factor. For example, in the following data the row factor A has three levels, column factor B has two levels, and there are two replications (reps = 2). The subscripts indicate row, column, and replication, respectively.
         B=1     B=2
A=1     y111    y121
        y112    y122
A=2     y211    y221
        y212    y222
A=3     y311    y321
        y312    y322
Data Types: single | double
reps — Number of replications 1 (default) | an integer number Number of replications for each combination of groups, specified as an integer number. For example, the following data has two replications (reps = 2) for each group combination of row factor A and column factor B.
         B=1     B=2
A=1     y111    y121
        y112    y122
A=2     y211    y221
        y212    y222
A=3     y311    y321
        y312    y322
• When reps is 1 (default), anova2 returns two p-values in vector p:
• The p-value for the null hypothesis that all samples from factor B (i.e., all column samples in y) are drawn from the same population.
• The p-value for the null hypothesis that all samples from factor A (i.e., all row samples in y) are drawn from the same population.
• When reps is greater than 1, anova2 also returns the p-value for the null hypothesis that factors A and B have no interaction (i.e., the effects due to factors A and B are additive).
Example: p = anova2(y,3) specifies that each combination of groups (levels) has three replications. Data Types: single | double displayopt — Indicator to display the ANOVA table 'on' (default) | 'off' Indicator to display the ANOVA table as a figure, specified as 'on' or 'off'.
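For instance, the following minimal sketch (the yield values are made up for illustration) lays out a matrix with three row-factor groups, two column-factor groups, and two replications per cell, and reads back the three p-values:
% Hypothetical data: row factor A has 3 groups, column factor B has 2 groups,
% and reps = 2, so y has 3*2 = 6 rows.
y = [64 58; 66 59; ...   % A = 1, replications 1 and 2
     71 63; 70 61; ...   % A = 2
     68 60; 69 62];      % A = 3
p = anova2(y,2,'off');   % p(1): columns (B), p(2): rows (A), p(3): interaction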
Output Arguments p — p-value scalar value p-value for the F-test, returned as a scalar value. A small p-value indicates that the results are statistically significant. Common significance levels are 0.05 or 0.01. For example:
• A sufficiently small p-value for the null hypothesis for group means of row factor A suggests that at least one row-sample mean is significantly different from the other row-sample means; i.e., there is a main effect due to factor A.
• A sufficiently small p-value for the null hypothesis for group (level) means of column factor B suggests that at least one column-sample mean is significantly different from the other column-sample means; i.e., there is a main effect due to factor B.
• A sufficiently small p-value for combinations of groups (levels) of factors A and B suggests that there is an interaction between factors A and B.
tbl — ANOVA table cell array ANOVA table, returned as a cell array. tbl has six columns.
Column name    Definition
source         Source of the variability.
SS             Sum of squares due to each source.
df             Degrees of freedom associated with each source.
MS             Mean squares for each source, which is the ratio SS/df.
F              F-statistic, which is the ratio of the mean squares.
Prob>F         p-value, which is the probability that the F-statistic can take a value larger than the computed test-statistic value. anova2 derives this probability from the cdf of the F-distribution.
The rows of the ANOVA table show the variability in the data, divided by the source into three or four parts, depending on the value of reps.
Row            Definition
Columns        Variability due to the differences among the column means
Rows           Variability due to the differences among the row means
Interaction    Variability due to the interaction between rows and columns (if reps is greater than its default value of 1)
Error          Remaining variability not explained by any systematic source
Data Types: cell stats — Statistics for multiple comparison test structure Statistics for multiple comparisons tests on page 9-19, returned as a structure. Use multcompare to perform multiple comparison tests, supplying stats as an input argument. stats has nine fields.
Field       Definition
source      Source of the stats output
sigmasq     Mean squared error
colmeans    Estimated values of the column means
coln        Number of observations for each group in columns
rowmeans    Estimated values of the row means
rown        Number of observations for each group in rows
inter       Number of interactions
pval        p-value for the interaction term
df          Error degrees of freedom, (reps - 1)*r*c, where reps is the number of replications and r and c are the number of groups in the row and column factors, respectively.
Data Types: struct
Alternative Functionality Instead of using anova2, you can create an anova object by using the anova function. The anova function provides these advantages: • The anova function allows you to specify the ANOVA model type, sum of squares type, and factors to treat as categorical. anova also supports table predictor and response input arguments. • In addition to the outputs returned by anova2, the properties of the anova object contain the following: • ANOVA model formula • Fitted ANOVA model coefficients • Residuals
• Factors and response data • The anova object functions allow you to conduct further analysis after fitting the anova object. For example, you can create an interactive plot of multiple comparisons of means for the ANOVA, get the mean response estimates for each value of a factor, and calculate the variance component estimates.
Version History Introduced before R2006a
References [1] Hogg, R. V., and J. Ledolter. Engineering Statistics. New York: MacMillan, 1987.
See Also anova | anova1 | anovan | multcompare Topics “Perform Two-Way ANOVA” on page 9-13 “Two-Way ANOVA” on page 9-11 “Multiple Comparisons” on page 9-19
anovan N-way analysis of variance
Syntax p = anovan(y,group) p = anovan(y,group,Name,Value) [p,tbl] = anovan( ___ ) [p,tbl,stats] = anovan( ___ ) [p,tbl,stats,terms] = anovan( ___ )
Description p = anovan(y,group) returns a vector of p-values, one per term, for multiway (n-way) analysis of variance (ANOVA) for testing the effects of multiple factors on the mean of the vector y. anovan also displays a figure showing the standard ANOVA table. p = anovan(y,group,Name,Value) returns a vector of p-values for multiway (n-way) ANOVA using additional options specified by one or more Name,Value pair arguments. For example, you can specify which predictor variable is continuous, if any, or the type of sum of squares to use. [p,tbl] = anovan( ___ ) returns the ANOVA table (including factor labels) in cell array tbl for any of the input arguments specified in the previous syntaxes. Copy a text version of the ANOVA table to the clipboard by using the Copy Text item on the Edit menu. [p,tbl,stats] = anovan( ___ ) returns a stats structure that you can use to perform a multiple comparison test on page 9-19, which enables you to determine which pairs of group means are significantly different. You can perform such a test using the multcompare function by providing the stats structure as input. [p,tbl,stats,terms] = anovan( ___ ) returns the main and interaction terms used in the ANOVA computations in terms.
Examples Three-Way ANOVA Load the sample data. y = [52.7 57.5 45.9 44.5 53.0 57.0 45.9 44.0]'; g1 = [1 2 1 2 1 2 1 2]; g2 = {'hi';'hi';'lo';'lo';'hi';'hi';'lo';'lo'}; g3 = {'may';'may';'may';'may';'june';'june';'june';'june'};
y is the response vector and g1, g2, and g3 are the grouping variables (factors). Each factor has two levels, and every observation in y is identified by a combination of factor levels. For example,
observation y(1) is associated with level 1 of factor g1, level 'hi' of factor g2, and level 'may' of factor g3. Similarly, observation y(6) is associated with level 2 of factor g1, level 'hi' of factor g2, and level 'june' of factor g3. Test if the response is the same for all factor levels. p = anovan(y,{g1,g2,g3})
p = 3×1 0.4174 0.0028 0.9140
In the ANOVA table, X1, X2, and X3 correspond to the factors g1, g2, and g3, respectively. The p-value 0.4174 indicates that the mean responses for levels 1 and 2 of the factor g1 are not significantly different. Similarly, the p-value 0.914 indicates that the mean responses for levels 'may' and 'june' of the factor g3 are not significantly different. However, the p-value 0.0028 is small enough to conclude that the mean responses are significantly different for the two levels, 'hi' and 'lo', of the factor g2. By default, anovan computes p-values just for the three main effects. Test the two-factor interactions. This time specify the variable names. p = anovan(y,{g1 g2 g3},'model','interaction','varnames',{'g1','g2','g3'})
p = 6×1 0.0347 0.0048 0.2578 0.0158 0.1444 0.5000
The interaction terms are represented by g1*g2, g1*g3, and g2*g3 in the ANOVA table. The first three entries of p are the p-values for the main effects. The last three entries are the p-values for the two-way interactions. The p-value of 0.0158 indicates that the interaction between g1 and g2 is significant. The p-values of 0.1444 and 0.5 indicate that the corresponding interactions are not significant.
Two-Way ANOVA for Unbalanced Design Load the sample data. load carbig
The data has measurements on 406 cars. The variable org shows where the cars were made and when shows when in the year the cars were manufactured. Study how the mileage depends on when and where the cars were made. Also include the two-way interactions in the model. p = anovan(MPG,{org when},'model',2,'varnames',{'origin','mfg date'})
p = 3×1 0.0000 0.0000 0.3059
The 'model',2 name-value pair argument represents the two-way interactions. The p-value for the interaction term, 0.3059, is not small, indicating little evidence that the effect of the time of manufacture (mfg date) depends on where the car was made (origin). The main effects of origin and manufacturing date, however, are significant; both p-values are 0.
Multiple Comparisons for Three-Way ANOVA Load the sample data. y = [52.7 57.5 45.9 44.5 53.0 57.0 45.9 44.0]'; g1 = [1 2 1 2 1 2 1 2]; g2 = ["hi" "hi" "lo" "lo" "hi" "hi" "lo" "lo"]; g3 = ["may" "may" "may" "may" "june" "june" "june" "june"];
y is the response vector and g1, g2, and g3 are the grouping variables (factors). Each factor has two levels, and every observation in y is identified by a combination of factor levels. For example, observation y(1) is associated with level 1 of factor g1, level hi of factor g2, and level may of factor g3. Similarly, observation y(6) is associated with level 2 of factor g1, level hi of factor g2, and level june of factor g3. Test if the response is the same for all factor levels. Also compute the statistics required for multiple comparison tests. [~,~,stats] = anovan(y,{g1 g2 g3},"Model","interaction", ... "Varnames",["g1","g2","g3"]);
The p-value of 0.2578 indicates that the mean responses for levels may and june of factor g3 are not significantly different. The p-value of 0.0347 indicates that the mean responses for levels 1 and 2 of factor g1 are significantly different. Similarly, the p-value of 0.0048 indicates that the mean responses for levels hi and lo of factor g2 are significantly different. Perform a multiple comparison test to find out which groups of factors g1 and g2 are significantly different. [results,~,~,gnames] = multcompare(stats,"Dimension",[1 2]);
You can test the other groups by clicking on the corresponding comparison interval for the group. The bar you click on turns blue. The bars for the groups that are significantly different are red. The bars for the groups that are not significantly different are gray. For example, if you click on the comparison interval for the combination of level 1 of g1 and level lo of g2, the comparison interval for the combination of level 2 of g1 and level lo of g2 overlaps, and is therefore gray. Conversely, the other comparison intervals are red, indicating significant difference.
Display the multiple comparison results and the corresponding group names in a table.
tbl = array2table(results,"VariableNames", ...
    ["Group A","Group B","Lower Limit","A-B","Upper Limit","P-value"]);
tbl.("Group A")=gnames(tbl.("Group A"));
tbl.("Group B")=gnames(tbl.("Group B"))
tbl=6×6 table
        Group A           Group B        Lower Limit     A-B     Upper Limit     P-value
    ______________    ______________    ___________    _____    ___________    _________
    {'g1=1,g2=hi'}    {'g1=2,g2=hi'}      -6.8604       -4.4      -1.9396       0.027249
    {'g1=1,g2=hi'}    {'g1=1,g2=lo'}       4.4896       6.95       9.4104       0.016983
    {'g1=1,g2=hi'}    {'g1=2,g2=lo'}       6.1396        8.6        11.06       0.013586
    {'g1=2,g2=hi'}    {'g1=1,g2=lo'}       8.8896      11.35        13.81       0.010114
    {'g1=2,g2=hi'}    {'g1=2,g2=lo'}        10.54         13        15.46      0.0087375
    {'g1=1,g2=lo'}    {'g1=2,g2=lo'}      -0.8104       1.65       4.1104        0.07375
The multcompare function compares the combinations of groups (levels) of the two grouping variables, g1 and g2. For example, the first row of the matrix tests whether the combination of level 1 of g1 and level hi of g2 has the same mean response as the combination of level 2 of g1 and level hi of g2. The p-value corresponding to this test is 0.0272, which indicates that the mean responses are significantly different. You can also see this result in the figure. The blue bar shows the comparison interval for the mean response for the combination of level 1 of g1 and level hi of g2. The red bars are the comparison intervals for the mean response for other group combinations. None of the red bars overlap with the blue bar, which means the mean response for the combination of level 1 of g1 and level hi of g2 is significantly different from the mean response for other group combinations.
Input Arguments y — Sample data numeric vector Sample data, specified as a numeric vector. Data Types: single | double group — Grouping variables cell array Grouping variables, i.e., the factors and factor levels of the observations in y, specified as a cell array. Each of the cells in group contains a list of factor levels identifying the observations in y with respect to one of the factors. The list within each cell can be a categorical array, numeric vector, character matrix, string array, or single-column cell array of character vectors, and must have the same number of elements as y. For example:
y  = [  y1,    y2,    y3,    y4,    y5,  ⋯,   yN ]′
g1 = [   1,     3,     1,     2,     1,  ⋯,    2 ]
g2 = [ 'A',   'A',   'C',   'B',   'B',  ⋯,  'D' ]
g3 = { 'hi', 'mid', 'low', 'mid', 'hi',  ⋯, 'low' }
By default, anovan treats all grouping variables as fixed effects. For example, if in a study you want to investigate the effects of gender, school, and education method on the academic success of elementary school students, you can specify the grouping variables as follows.
Example: {'Gender','School','Method'}
Data Types: cell
Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
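As a concrete sketch of the mixed grouping-variable types listed above (the response values and group labels below are hypothetical):
% Hypothetical response and three two-level grouping variables of different types.
y  = [12.1 11.8 13.0 12.6 12.2 13.4]';
g1 = [1 2 1 2 1 2];                        % numeric vector
g2 = {'A';'B';'B';'A';'A';'B'};            % cell array of character vectors
g3 = ["hi" "hi" "lo" "lo" "hi" "lo"];      % string array
p  = anovan(y,{g1,g2,g3},'display','off'); % one p-value per main effect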
Example: 'alpha',0.01,'model','interaction','sstype',2 specifies anovan to compute the 99% confidence bounds and p-values for the main effects and two-way interactions using type II sum of squares.
alpha — Significance level 0.05 (default) | scalar value in the range 0 to 1 Significance level for confidence bounds, specified as the comma-separated pair consisting of 'alpha' and a scalar value in the range 0 to 1. For a value α, the confidence level is 100*(1–α)%. Example: 'alpha',0.01 corresponds to 99% confidence intervals Data Types: single | double
continuous — Indicator for continuous predictors vector of indices Indicator for continuous predictors, representing which grouping variables should be treated as continuous predictors rather than as categorical predictors, specified as the comma-separated pair consisting of 'continuous' and a vector of indices. For example, if there are three grouping variables and the second one is continuous, then you can specify it as follows. Example: 'continuous',[2] Data Types: single | double
display — Indicator to display ANOVA table 'on' (default) | 'off' Indicator to display the ANOVA table, specified as the comma-separated pair consisting of 'display' and 'on' or 'off'. When 'display' is 'off', anovan only returns the output arguments, and does not display the standard ANOVA table as a figure. Example: 'display','off'
model — Type of the model 'linear' (default) | 'interaction' | 'full' | integer value | terms matrix Type of the model, specified as the comma-separated pair consisting of 'model' and one of the following:
• 'linear' — The default 'linear' model computes only the p-values for the null hypotheses on the N main effects.
• 'interaction' — The 'interaction' model computes the p-values for null hypotheses on the N main effects and the N*(N – 1)/2 two-factor interactions.
• 'full' — The 'full' model computes the p-values for null hypotheses on the N main effects and interactions at all levels.
• An integer — For an integer value of k (k ≤ N) for model type, anovan computes all interaction levels through the kth level. For example, the value 3 means main effects plus two- and three-factor interactions. The values k = 1 and k = 2 are equivalent to the 'linear' and 'interaction' specifications, respectively. The value k = N is equivalent to the 'full' specification.
• Terms matrix — A matrix of term definitions having the same form as the input to the x2fx function. All entries must be 0 or 1 (no higher powers). For more precise control over the main and interaction terms that anovan computes, you can specify a matrix containing one row for each main or interaction term to include in the ANOVA model. Each row defines one term using a vector of N zeros and ones. The table below illustrates the coding for a 3-factor ANOVA for factors A, B, and C.
Matrix Row    ANOVA Term
[1 0 0]       Main term A
[0 1 0]       Main term B
[0 0 1]       Main term C
[1 1 0]       Interaction term AB
[1 0 1]       Interaction term AC
[0 1 1]       Interaction term BC
[1 1 1]       Interaction term ABC
For example, if there are three factors A, B, and C, and 'model',[0 1 0;0 0 1;0 1 1], then anovan tests for the main effects B and C, and the interaction effect BC, respectively. A simple way to generate the terms matrix is to modify the terms output, which codes the terms in the current model using the format described above. If anovan returns [0 1 0;0 0 1;0 1 1] for terms, for example, and there is no significant interaction BC, then you can recompute ANOVA on just the main effects B and C by specifying [0 1 0;0 0 1] for model. Example: 'model',[0 1 0;0 0 1;0 1 1] Example: 'model','interaction' Data Types: char | string | single | double
nested — Nesting relationships matrix of 0’s and 1’s Nesting relationships among the grouping variables, specified as the comma-separated pair consisting of 'nested' and a matrix M of 0’s and 1’s, i.e., M(i,j) = 1 if variable i is nested in variable j. You cannot specify nesting in a continuous variable. For example, if there are two grouping variables District and School, where School is nested in District, then you can express this relationship as follows. Example: 'nested',[0 0;1 0] Data Types: single | double
random — Indicator for random variables vector of indices Indicator for random variables, representing which grouping variables are random, specified as the comma-separated pair consisting of 'random' and a vector of indices. By default, anovan treats all grouping variables as fixed. anovan treats an interaction term as random if any of the variables in the interaction term is random.
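For example, a minimal sketch of the District/School nesting described above, with made-up data and School treated as a random effect (the variable values are illustrative only):
district = repelem([1;2],6);        % 2 districts, 6 observations each
school   = repelem([1;2;3;4],3);    % 4 schools; schools 1-2 in district 1, 3-4 in district 2
y        = randn(12,1) + district;  % hypothetical response
p = anovan(y,{district school},'nested',[0 0;1 0], ...
    'random',2,'varnames',{'District','School'},'display','off');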
Example: 'random',[3] Data Types: single | double
sstype — Type of sum of squares 3 (default) | 1 | 2 | 'h' Type of sum of squares, specified as the comma-separated pair consisting of 'sstype' and the following:
• 1 — Type I sum of squares. The reduction in residual sum of squares obtained by adding that term to a fit that already includes the terms listed before it.
• 2 — Type II sum of squares. The reduction in residual sum of squares obtained by adding that term to a model consisting of all other terms that do not contain the term in question.
• 3 — Type III sum of squares. The reduction in residual sum of squares obtained by adding that term to a model containing all other terms, but with their effects constrained to obey the usual “sigma restrictions” that make models estimable.
• 'h' — Hierarchical model. Similar to type 2, but with continuous as well as categorical factors used to determine the hierarchy of terms. The sum of squares for any term is determined by comparing two models.
For a model containing main effects but no interactions, the value of sstype influences the computations on unbalanced data only. Suppose you are fitting a model with two factors and their interaction, and the terms appear in the order A, B, AB. Let R(·) represent the residual sum of squares for the model. So, R(A, B, AB) is the residual sum of squares fitting the whole model, R(A) is the residual sum of squares fitting the main effect of A only, and R(1) is the residual sum of squares fitting the mean only. The three sum of squares types are as follows:
Term    Type 1 Sum of Squares    Type 2 Sum of Squares    Type 3 Sum of Squares
A       R(1) – R(A)              R(B) – R(A, B)           R(B, AB) – R(A, B, AB)
B       R(A) – R(A, B)           R(A) – R(A, B)           R(A, AB) – R(A, B, AB)
AB      R(A, B) – R(A, B, AB)    R(A, B) – R(A, B, AB)    R(A, B) – R(A, B, AB)
The models for Type 3 sum of squares have sigma restrictions imposed. This means, for example, that in fitting R(B, AB), the array of AB effects is constrained to sum to 0 over A for each value of B, and over B for each value of A. Example: 'sstype','h' Data Types: single | double | char | string
varnames — Names of grouping variables X1,X2,...,XN (default) | character matrix | string array | cell array of character vectors Names of grouping variables, specified as the comma-separated pair consisting of 'varnames' and a character matrix, a string array, or a cell array of character vectors. Example: 'varnames',{'Gender','City'} Data Types: char | string | cell
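To see how the choice of 'sstype' can matter on unbalanced data, a short sketch using the carbig data from the earlier example (the comparison itself is illustrative, not part of the function's output):
load carbig
p1 = anovan(MPG,{org when},'model','interaction','sstype',1,'display','off');
p3 = anovan(MPG,{org when},'model','interaction','sstype',3,'display','off');
[p1 p3]   % Type I and Type III p-values generally differ when the design is unbalanced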
Output Arguments p — p-values vector p-values, returned as a vector. Output vector p contains p-values for the null hypotheses on the N main effects and any interaction terms specified. Element p(1) contains the p-value for the null hypothesis that samples at all levels of factor A are drawn from the same population; element p(2) contains the p-value for the null hypothesis that samples at all levels of factor B are drawn from the same population; and so on. For example, if there are three factors A, B, and C, and 'model',[0 1 0;0 0 1;0 1 1], then the output vector p contains the p-values for the null hypotheses on the main effects B and C and the interaction effect BC, respectively. A sufficiently small p-value corresponding to a factor suggests that at least one group mean is significantly different from the other group means; that is, there is a main effect due to that factor. It is common to declare a result significant if the p-value is less than 0.05 or 0.01.
tbl — ANOVA table cell array ANOVA table, returned as a cell array. The ANOVA table has seven columns:
Column name    Definition
source         Source of the variability.
SS             Sum of squares due to each source.
df             Degrees of freedom associated with each source.
MS             Mean squares for each source, which is the ratio SS/df.
Singular?      Indication of whether the term is singular.
F              F-statistic, which is the ratio of the mean squares.
Prob>F         p-value, which is the probability that the F-statistic can take a value larger than a computed test-statistic value. anovan derives these probabilities from the cdf of the F-distribution.
The ANOVA table also contains the following columns if at least one of the grouping variables is specified as random using the name-value pair argument random:
Column name       Definition
Type              Type of each source; 'fixed' for a fixed effect or 'random' for a random effect.
Expected MS       Text representation of the expected value for the mean square. Q(source) represents a quadratic function of source and V(source) represents the variance of source.
MS denom          Denominator of the F-statistic.
d.f. denom        Degrees of freedom for the denominator of the F-statistic.
Denom. defn.      Text representation of the denominator of the F-statistic. MS(source) represents the mean square of source.
Var. est.         Variance component estimate.
Var. lower bnd    Lower bound of the 95% confidence interval for the variance component estimate.
Var. upper bnd    Upper bound of the 95% confidence interval for the variance component estimate.
stats — Statistics structure Statistics to use in a multiple comparison test on page 9-19 using the multcompare function, returned as a structure. anovan evaluates the hypothesis that the different groups (levels) of a factor (or more generally, a term) have the same effect, against the alternative that they do not all have the same effect. Sometimes it is preferable to perform a test to determine which pairs of levels are significantly different, and which are not. Use the multcompare function to perform such tests by supplying the stats structure as input. The stats structure contains the fields listed below, in addition to a number of other fields required for doing multiple comparisons using the multcompare function:
Field         Description
coeffs        Estimated coefficients
coeffnames    Name of term for each coefficient
vars          Matrix of grouping variable values for each term
resid         Residuals from the fitted model
The stats structure also contains the following fields if at least one of the grouping variables is specified as random using the name-value pair argument random:
Field      Description
ems        Expected mean squares
denom      Denominator definition
rtnames    Names of random terms
varest     Variance component estimates (one per random term)
varci      Confidence intervals for variance components
terms — Main and interaction terms matrix
Main and interaction terms, returned as a matrix. The terms are encoded in the output matrix terms using the same format described above for input model. When you specify model itself in this format, the matrix returned in terms is identical.
Alternative Functionality Instead of using anovan, you can create an anova object by using the anova function. The anova function provides these advantages: • The anova function allows you to specify the ANOVA model type, sum of squares type, and factors to treat as categorical. anova also supports table predictor and response input arguments. • In addition to the outputs returned by anovan, the properties of the anova object contain the following: • ANOVA model formula • Fitted ANOVA model coefficients • Residuals • Factors and response data • The anova object functions allow you to conduct further analysis after fitting the anova object. For example, you can create an interactive plot of multiple comparisons of means for the ANOVA, get the mean response estimates for each value of a factor, and calculate the variance component estimates.
Version History Introduced before R2006a
References [1] Dunn, O.J., and V.A. Clark. Applied Statistics: Analysis of Variance and Regression. New York: Wiley, 1974. [2] Goodnight, J.H., and F.M. Speed. Computing Expected Mean Squares. Cary, NC: SAS Institute, 1978. [3] Seber, G. A. F., and A. J. Lee. Linear Regression Analysis. 2nd ed. Hoboken, NJ: Wiley-Interscience, 2003.
See Also anova | anova1 | anova2 | multcompare | fitrm | ranova Topics “Perform N-Way ANOVA” on page 9-30 “ANOVA with Random Effects” on page 9-36 “Multiple Comparisons” on page 9-19 “N-Way ANOVA” on page 9-28
anova Analysis of variance for between-subject effects in a repeated measures model
Syntax anovatbl = anova(rm) anovatbl = anova(rm,'WithinModel',WM)
Description anovatbl = anova(rm) returns the analysis of variance results for the repeated measures model rm. anovatbl = anova(rm,'WithinModel',WM) returns the analysis of variance results it performs using the response or responses specified by the within-subject model WM.
Examples Analysis of Variance for Average Response Load the sample data. load fisheriris
The column vector species consists of iris flowers of three different species: setosa, versicolor, and virginica. The double matrix meas consists of four types of measurements on the flowers: the length and width of sepals and petals in centimeters, respectively. Store the data in a table array. t = table(species,meas(:,1),meas(:,2),meas(:,3),meas(:,4),... 'VariableNames',{'species','meas1','meas2','meas3','meas4'}); Meas = dataset([1 2 3 4]','VarNames',{'Measurements'});
Fit a repeated measures model where the measurements are the responses and the species is the predictor variable. rm = fitrm(t,'meas1-meas4~species','WithinDesign',Meas);
Perform analysis of variance. anova(rm) ans=3×7 table Within ________ Constant Constant Constant
Between ________
SumSq ______
DF ___
MeanSq _______
F ______
pValue ___________
constant species Error
7201.7 309.61 53.875
1 2 147
7201.7 154.8 0.36649
19650 422.39
2.0735e-158 1.1517e-61
35-119
35
Functions
There are 150 observations and 3 species. The degrees of freedom for species is 3 - 1 = 2, and for error it is 150 - 3 = 147. The small p-value of 1.1517e-61 indicates that the measurements differ significantly according to species.
Panel Data Load the sample panel data. load('panelData.mat');
The dataset array, panelData, contains yearly observations on eight cities for 6 years. The first variable, Growth, measures economic growth (the response variable). The second and third variables are city and year indicators, respectively. The last variable, Employ, measures employment (the predictor variable). This is simulated data. Store the data in a table array and define city as a nominal variable. t = table(panelData.Growth,panelData.City,panelData.Year,... 'VariableNames',{'Growth','City','Year'});
Convert the data in a proper format to do repeated measures analysis. t = unstack(t,'Growth','Year','NewDataVariableNames',... {'year1','year2','year3','year4','year5','year6'});
Add the mean employment level over the years as a predictor variable to the table t. t(:,8) = table(grpstats(panelData.Employ,panelData.City)); t.Properties.VariableNames{'Var8'} = 'meanEmploy';
Define the within-subjects variable. Year = [1 2 3 4 5 6]';
Fit a repeated measures model, where the growth figures over the 6 years are the responses and the mean employment is the predictor variable. rm = fitrm(t,'year1-year6 ~ meanEmploy','WithinDesign',Year);
Perform analysis of variance. anovatbl = anova(rm,'WithinModel',Year) anovatbl=3×7 table Within Between _________ __________ Contrast1 Contrast1 Contrast1
35-120
constant meanEmploy Error
SumSq __________
DF __
MeanSq __________
F ________
pValue _________
588.17 3.7064e+05 91675
1 1 6
588.17 3.7064e+05 15279
0.038495 24.258
0.85093 0.0026428
anova
Longitudinal Data Load the sample data. load('longitudinalData.mat');
The matrix Y contains response data for 16 individuals. The response is the blood level of a drug measured at five time points (time = 0, 2, 4, 6, and 8). Each row of Y corresponds to an individual, and each column corresponds to a time point. The first eight subjects are female, and the second eight subjects are male. This is simulated data. Define a variable that stores gender information. Gender = ['F' 'F' 'F' 'F' 'F' 'F' 'F' 'F' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M']';
Store the data in a proper table array format to do repeated measures analysis. t = table(Gender,Y(:,1),Y(:,2),Y(:,3),Y(:,4),Y(:,5),... 'VariableNames',{'Gender','t0','t2','t4','t6','t8'});
Define the within-subjects variable. Time = [0 2 4 6 8]';
Fit a repeated measures model, where blood levels are the responses and gender is the predictor variable. rm = fitrm(t,'t0-t8 ~ Gender','WithinDesign',Time);
Perform analysis of variance. anovatbl = anova(rm) anovatbl=3×7 table Within Between ________ ________ Constant Constant Constant
constant Gender Error
SumSq ______
DF __
MeanSq ______
F ______
pValue __________
54702 2251.7 709.6
1 1 14
54702 2251.7 50.685
1079.2 44.425
1.1897e-14 1.0693e-05
There are 2 genders and 16 observations, so the degrees of freedom for gender is (2 - 1) = 1 and for error it is (16 - 2)*(2 - 1) = 14. The small p-value of 1.0693e-05 indicates that there is a significant effect of gender on blood pressure. Repeat analysis of variance using orthogonal contrasts. anovatbl = anova(rm,'WithinModel','orthogonalcontrasts') anovatbl=15×7 table Within Between ________ ________ Constant Constant Constant Time
constant Gender Error constant
SumSq __________
DF __
MeanSq __________
F __________
pValue __________
54702 2251.7 709.6 310.83
1 1 14 1
54702 2251.7 50.685 310.83
1079.2 44.425
1.1897e-14 1.0693e-05
31.023
6.9065e-05
35-121
35
Functions
Time Time Time^2 Time^2 Time^2 Time^3 Time^3 Time^3 Time^4 Time^4 Time^4
Gender Error constant Gender Error constant Gender Error constant Gender Error
13.341 140.27 565.42 1.4076 80.039 2.6127 7.8853e-06 25.546 2.8404 2.9016 82.977
1 14 1 1 14 1 1 14 1 1 14
13.341 10.019 565.42 1.4076 5.7171 2.6127 7.8853e-06 1.8247 2.8404 2.9016 5.9269
1.3315
0.26785
98.901 0.24621
1.0003e-07 0.62746
1.4318 4.3214e-06
0.25134 0.99837
0.47924 0.48956
0.50009 0.49559
Input Arguments rm — Repeated measures model RepeatedMeasuresModel object Repeated measures model, returned as a RepeatedMeasuresModel object. For properties and methods of this object, see RepeatedMeasuresModel. WM — Within-subject model 'separatemeans' (default) | 'orthogonalcontrasts' | character vector or string scalar defining a model specification | r-by-nc matrix specifying nc contrasts Within-subject model, specified as one of the following: • 'separatemeans' — The response is the average of the repeated measures (average across the within-subject model). • 'orthogonalcontrasts' — This is valid when the within-subject model has a single numeric factor T. Responses are the average, the slope of centered T, and, in general, all orthogonal contrasts for a polynomial up to T^(p – 1), where p is the number of rows in the within-subject model. anova multiplies Y, the response you use in the repeated measures model rm by the orthogonal contrasts, and uses the columns of the resulting product matrix as the responses. anova computes the orthogonal contrasts for T using the Q factor of a QR factorization on page 35-123 of the Vandermonde matrix on page 35-123. • A character vector or string scalar that defines a model specification in the within-subject factors. Responses are defined by the terms in that model. anova multiplies Y, the response matrix you use in the repeated measures model rm by the terms of the model, and uses the columns of the result as the responses. For example, if there is a Time factor and 'Time' is the model specification, then anova uses two terms, the constant and the uncentered Time term. The default is '1' to perform on the average response. • An r-by-nc matrix, C, specifying nc contrasts among the r repeated measures. If Y represents the matrix of repeated measures you use in the repeated measures model rm, then the output tbl contains a separate analysis of variance for each column of Y*C. The anova table contains a separate univariate analysis of variance results for each response. Example: 'WithinModel','Time' 35-122
anova
Example: 'WithinModel','orthogonalcontrasts'
Output Arguments anovatbl — Results of analysis of variance table Results of analysis of variance for between-subject effects, returned as a table. This includes all terms on the between-subjects model and the following columns. Column Name
Definition
Within
Within-subject factors
Between
Between-subject factors
SumSq
Sum of squares
DF
Degrees of freedom
MeanSq
Mean squared error
F
F-statistic
pValue
p-value corresponding to the F-statistic
More About Vandermonde Matrix Vandermonde matrix is the matrix where columns are the powers of the vector a, that is, V(i,j) = a(i)(n — j) , where n is the length of a. QR Factorization QR factorization of an m-by-n matrix A is the factorization that matrix into the product A = Q*R, where R is an m-by-n upper triangular matrix and Q is an m-by-m unitary matrix.
Version History Introduced in R2014a
See Also ranova | manova | fitrm | qr | vander | RepeatedMeasuresModel Topics “Model Specification for Repeated Measures Models” on page 9-57
35-123
35
Functions
ansaribradley Ansari-Bradley test
Syntax h = ansaribradley(x,y) h = ansaribradley(x,y,Name,Value) [h,p] = ansaribradley( ___ ) [h,p,stats] = ansaribradley( ___ )
Description h = ansaribradley(x,y) returns a test decision for the null hypothesis that the data in vectors x and y comes from the same distribution, using the Ansari-Bradley test on page 35-127. The alternative hypothesis is that the data in x and y comes from distributions with the same median and shape but different dispersions (e.g., variances). The result h is 1 if the test rejects the null hypothesis at the 5% significance level, or 0 otherwise. h = ansaribradley(x,y,Name,Value) returns a test decision for the Ansari-Bradley test with additional options specified by one or more name-value pair arguments. For example, you can change the significance level, conduct a one-sided test, or use a normal approximation to calculate the value of the test statistic. [h,p] = ansaribradley( ___ ) also returns the p-value, p, of the test, using any of the input arguments in the previous syntaxes. [h,p,stats] = ansaribradley( ___ ) also returns the structure stats containing information about the test statistic.
Examples Ansari-Bradley Test for Equal Variances Load the sample data. Create data vectors of miles per gallon (MPG) measurements for the model years 1982 and 1976. load carsmall x = MPG(Model_Year==82); y = MPG(Model_Year==76);
Test the null hypothesis that the miles per gallon measured in cars from 1982 and 1976 have equal variances. [h,p,stats] = ansaribradley(x,y) h = 0 p = 0.8426 stats = struct with fields: W: 526.9000
35-124
ansaribradley
Wstar: 0.1986
The returned value of h = 0 indicates that ansaribradley does not reject the null hypothesis at the default 5% significance level.
Ansari-Bradley One-Sided Hypothesis Test Load the sample data. Create data vectors of miles per gallon (MPG) measurements for the model years 1982 and 1976. load carsmall x = MPG(Model_Year==82); y = MPG(Model_Year==76);
Test the null hypothesis that the miles per gallon measured in cars from 1982 and 1976 have equal variances, against the alternative hypothesis that the variance of cars from 1982 is greater than that of cars from 1976. [h,p,stats] = ansaribradley(x,y,'Tail','right') h = 0 p = 0.5787 stats = struct with fields: W: 526.9000 Wstar: 0.1986
The returned value of h = 0 indicates that ansaribradley does not reject the null hypothesis that the variance in miles per gallon is the same for the two model years, when the alternative is that the variance of cars from 1982 is greater than that of cars from 1976.
Input Arguments x — Sample data vector | matrix | multidimensional array Sample data, specified as a vector, matrix, or multidimensional array. • If x and y are specified as vectors, they do not need to be the same length. • If x and y are specified as matrices, they must have the same number of columns. ansaribradley performs separate tests along each column and returns a vector of results. • If x and y are specified as multidimensional arrays on page 35-128, ansaribradley works along the first nonsingleton dimension on page 35-128. x and y must have the same size along all remaining dimensions. Data Types: single | double y — Sample data vector | matrix | multidimensional array 35-125
Sample data, specified as a vector, matrix, or multidimensional array. • If x and y are specified as vectors, they do not need to be the same length. • If x and y are specified as matrices, they must have the same number of columns. ansaribradley performs separate tests along each column and returns a vector of results. • If x and y are specified as multidimensional arrays on page 35-128, ansaribradley works along the first nonsingleton dimension on page 35-128. x and y must have the same size along all remaining dimensions. Data Types: single | double Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: 'Tail','right','Alpha',0.01 specifies a right-tailed hypothesis test at the 1% significance level. Alpha — Significance level 0.05 (default) | scalar value in the range (0,1) Significance level of the hypothesis test, specified as the comma-separated pair consisting of 'Alpha' and a scalar value in the range (0,1). Example: 'Alpha',0.01 Data Types: single | double Dim — Dimension first nonsingleton dimension (default) | positive integer value Dimension of the input matrix along which to test the means, specified as the comma-separated pair consisting of 'Dim' and a positive integer value. For example, specifying 'Dim',1 tests the column means, while 'Dim',2 tests the row means. Example: 'Dim',2 Data Types: single | double Tail — Type of alternative hypothesis 'both' (default) | 'left' | 'right' Type of alternative hypothesis to evaluate, specified as the comma-separated pair consisting of 'Tail' and one of the following.
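For example, a small sketch with simulated data that runs one test per column of two matrices with the same number of columns (the data here is made up; both samples have median near zero, as the test assumes):
rng('default')
x = randn(20,3);                      % 20 observations, 3 variables
y = 2*randn(25,3);                    % larger dispersion, same 3 columns
[h,p] = ansaribradley(x,y,'Dim',1);   % h and p are 1-by-3, one test per column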
'both'     Test the alternative hypothesis that the dispersion parameters of x and y are not equal.
'right'    Test the alternative hypothesis that the dispersion parameter of x is greater than that of y.
'left'     Test the alternative hypothesis that the dispersion parameter of x is less than that of y.
Example: 'Tail','right'
Method — Computation method 'exact' | 'approximate' Computation method for the test statistic, specified as the comma-separated pair consisting of 'Method' and one of the following.
'exact'          Compute p using an exact calculation of the distribution of the test statistic W. This is the default if n, the total number of rows in x and y, is 25 or less. Note that n is computed before any NaN values (representing missing data) are removed.
'approximate'    Compute p using a normal approximation for the statistic W*. This is the default if n, the total number of rows in x and y, is greater than 25.
Example: 'Method','exact'
Output Arguments h — Hypothesis test result 1|0 Hypothesis test result, returned as 1 or 0. • If h = 1, this indicates the rejection of the null hypothesis at the Alpha significance level. • If h = 0, this indicates a failure to reject the null hypothesis at the Alpha significance level. p — p-value scalar value in the range [0,1] p-value of the test, returned as a scalar value in the range [0,1]. p is the probability of observing a test statistic as extreme as, or more extreme than, the observed value under the null hypothesis. Small values of p cast doubt on the validity of the null hypothesis. stats — Test statistics structure Test statistics for the Ansari-Bradley test, returned as a structure containing: • W — Value of the test statistic, which is the sum of the Ansari-Bradley ranks for the x sample. • Wstar — Approximate normal statistic W*.
More About Ansari-Bradley Test The Ansari-Bradley test is a nonparametric alternative to the two-sample F-test of equal variances. It does not require the assumption that x and y come from normal distributions. The dispersion of a distribution is generally measured by its variance or standard deviation, but the Ansari-Bradley test can be used with samples from distributions that do not have finite variances. This test requires that the samples have equal medians. Under that assumption, and if the distributions of the samples are continuous and identical, the test is independent of the distributions.
If the samples do not have the same medians, the results can be misleading. In that case, Ansari and Bradley recommend subtracting the median, but then the distribution of the resulting test under the null hypothesis is no longer independent of the common distribution of x and y. If you want to perform the tests with medians subtracted, you should subtract the medians from x and y before calling ansaribradley. Multidimensional Array A multidimensional array has more than two dimensions. For example, if x is a 1-by-3-by-4 array, then x is a three-dimensional array. First Nonsingleton Dimension The first nonsingleton dimension is the first dimension of an array whose size is not equal to 1. For example, if x is a 1-by-2-by-3-by-4 array, then the second dimension is the first nonsingleton dimension of x.
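For example, a sketch of that median adjustment using the carsmall samples from the examples above:
load carsmall
x = MPG(Model_Year==82);
y = MPG(Model_Year==76);
[h,p] = ansaribradley(x - median(x,'omitnan'), y - median(y,'omitnan'));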
Version History Introduced before R2006a
See Also vartest2 | vartestn | ttest2
aoctool Interactive analysis of covariance
Syntax aoctool(x,y,group) aoctool(x,y,group,alpha) aoctool(x,y,group,alpha,xname,yname,gname) aoctool(x,y,group,alpha,xname,yname,gname,displayopt) aoctool(x,y,group,alpha,xname,yname,gname,displayopt,model) h = aoctool(...) [h,atab,ctab] = aoctool(...) [h,atab,ctab,stats] = aoctool(...)
Description aoctool(x,y,group) fits a separate line to the column vectors, x and y, for each group defined by the values in the array group. group may be a categorical variable, numeric vector, character array, string array, or cell array of character vectors. These types of models are known as one-way analysis of covariance (ANOCOVA) models. The output consists of three figures: • An interactive graph of the data and prediction curves • An ANOVA table • A table of parameter estimates You can use the figures to change models and to test different parts of the model. More information about interactive use of the aoctool function appears in “Analysis of Covariance Tool” on page 9-42. aoctool(x,y,group,alpha) determines the confidence levels of the prediction intervals. The confidence level is 100(1-alpha)%. The default value of alpha is 0.05. aoctool(x,y,group,alpha,xname,yname,gname) specifies the name to use for the x, y, and g variables in the graph and tables. If you enter simple variable names for the x, y, and g arguments, the aoctool function uses those names. If you enter an expression for one of these arguments, you can specify a name to use in place of that expression by supplying these arguments. For example, if you enter m(:,2) as the x argument, you might choose to enter 'Col 2' as the xname argument. aoctool(x,y,group,alpha,xname,yname,gname,displayopt) enables the graph and table displays when displayopt is 'on' (default) and suppresses those displays when displayopt is 'off'. aoctool(x,y,group,alpha,xname,yname,gname,displayopt,model) specifies the initial model to fit. The value of model can be any of the following: • 'same mean' — Fit a single mean, ignoring grouping • 'separate means' — Fit a separate mean to each group • 'same line' — Fit a single line, ignoring grouping • 'parallel lines' — Fit a separate line to each group, but constrain the lines to be parallel 35-129
• 'separate lines' — Fit a separate line to each group, with no constraints h = aoctool(...) returns a vector of handles to the line objects in the plot. [h,atab,ctab] = aoctool(...) returns cell arrays containing the entries in ANOVA table (atab) and the table of coefficient estimates (ctab). (You can copy a text version of either table to the clipboard by using the Copy Text item on the Edit menu.) [h,atab,ctab,stats] = aoctool(...) returns a stats structure that you can use to perform a follow-up multiple comparison test. The ANOVA table output includes tests of the hypotheses that the slopes or intercepts are all the same, against a general alternative that they are not all the same. Sometimes it is preferable to perform a test to determine which pairs of values are significantly different, and which are not. You can use the multcompare function to perform such tests by supplying the stats structure as input. You can test either the slopes, the intercepts, or population marginal means (the heights of the curves at the mean x value).
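For example, a brief sketch of such a follow-up comparison of the group slopes, using the carsmall data from the example below:
load carsmall
[h,atab,ctab,stats] = aoctool(Weight,MPG,Model_Year,0.05,'','','','off');
slopeCompare = multcompare(stats,'Estimate','slope');   % use 'intercept' or 'pmm' for the other tests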
Examples This example illustrates how to fit different models non-interactively. After loading the smaller car data set and fitting a separate-slopes model, you can examine the coefficient estimates.
load carsmall
[h,a,c,s] = aoctool(Weight,MPG,Model_Year,0.05,...
'','','','off','separate lines');
c(:,1:2)
ans =
    'Term'         'Estimate'
    'Intercept'    [45.97983716833132]
    ' 70'          [-8.58050531454973]
    ' 76'          [-3.89017396094922]
    ' 82'          [12.47067927549897]
    'Slope'        [-0.00780212907455]
    ' 70'          [ 0.00195840368824]
    ' 76'          [ 0.00113831038418]
    ' 82'          [-0.00309671407243]
Roughly speaking, the lines relating MPG to Weight have an intercept close to 45.98 and a slope close to -0.0078. Each group's coefficients are offset from these values somewhat. For instance, the intercept for the cars made in 1970 is 45.98-8.58 = 37.40. Next, try a fit using parallel lines. (The ANOVA table shows that the parallel-lines fit is significantly worse than the separate-lines fit.)
[h,a,c,s] = aoctool(Weight,MPG,Model_Year,0.05,...
'','','','off','parallel lines');
c(:,1:2)
ans =
    'Term'         'Estimate'
    'Intercept'    [43.38984085130596]
    ' 70'          [-3.27948192983761]
    ' 76'          [-1.35036234809006]
    ' 82'          [ 4.62984427792768]
    'Slope'        [-0.00664751826198]
Again, there are different intercepts for each group, but this time the slopes are constrained to be the same.
Version History Introduced before R2006a
See Also anova1 | multcompare | polytool
append Append new trees to ensemble
Syntax B = append(B,other)
Description B = append(B,other) appends the trees from the other ensemble to those in B. This method checks for consistency of the X and Y properties of the two ensembles, as well as consistency of their compact objects and out-of-bag indices, before appending the trees. The output ensemble B takes training parameters such as FBoot, Prior, Cost, and others from the B input. There is no attempt to check if these training parameters are consistent between the two objects.
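For example, a minimal sketch assuming B and other are TreeBagger ensembles grown on the same training data:
load fisheriris
b1 = TreeBagger(10,meas,species);   % ensemble with 10 trees
b2 = TreeBagger(15,meas,species);   % 15 more trees on the same X and Y
b  = append(b1,b2);                 % b now contains 25 trees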
See Also combine
average Compute performance metrics for average receiver operating characteristic (ROC) curve in multiclass problem
Syntax [FPR,TPR,Thresholds,AUC] = average(rocObj,type)
Description [FPR,TPR,Thresholds,AUC] = average(rocObj,type) computes the averages of performance metrics stored in the rocmetrics object rocObj for a multiclass classification problem using the averaging method specified in type. The function returns the average false positive rate (FPR) and the average true positive rate (TPR) for each threshold value in Thresholds. The function also returns AUC, the area under the ROC curve composed of FPR and TPR.
Examples Find Average ROC Curve Compute the performance metrics for a multiclass classification problem by creating a rocmetrics object, and then compute the average values for the metrics by using the average function. Plot the average ROC curve using the outputs of average. Load the fisheriris data set. The matrix meas contains flower measurements for 150 different flowers. The vector species lists the species for each flower. species contains three distinct flower names. load fisheriris
Train a classification tree that classifies observations into one of the three labels. Cross-validate the model using 10-fold cross-validation. rng("default") % For reproducibility Mdl = fitctree(meas,species,Crossval="on");
Compute the classification scores for validation-fold observations.
[~,Scores] = kfoldPredict(Mdl);
size(Scores)
ans = 1×2
   150     3
The output Scores is a matrix of size 150-by-3. The column order of Scores follows the class order in Mdl, stored in Mdl.ClassNames. Create a rocmetrics object by using the true labels in species and the classification scores in Scores. Specify the column order of Scores using Mdl.ClassNames.
rocObj = rocmetrics(species,Scores,Mdl.ClassNames);
rocmetrics computes the FPR and TPR at different thresholds and finds the AUC value for each class. Compute the average performance metric values, including the FPR and TPR at different thresholds and the AUC value, using the macro-averaging method. [FPR,TPR,Thresholds,AUC] = average(rocObj,"macro");
Plot the average ROC curve and display the average AUC value. Include (0,0) so that the curve starts from the origin (0,0).
plot([0;FPR],[0;TPR])
xlabel("False Positive Rate")
ylabel("True Positive Rate")
title("Average ROC Curve")
hold on
plot([0,1],[0,1],"k--")
legend(join(["Macro-average (AUC =",AUC,")"]), ...
    Location="southeast")
axis padded
hold off
Alternatively, you can create the average ROC curve by using the plot function. Specify AverageROCType="macro" to compute the metrics for the average ROC curve using the macro-averaging method.
plot(rocObj,AverageROCType="macro",ClassNames=[])
Input Arguments rocObj — Object evaluating classification performance rocmetrics object Object evaluating classification performance, specified as a rocmetrics object. type — Averaging method "micro" | "macro" | "weighted" Averaging method, specified as "micro", "macro", or "weighted". • "micro" (micro-averaging) — average finds the average performance metrics by treating all one-versus-all on page 35-137 binary classification problems as one binary classification problem. The function computes the confusion matrix components for the combined binary classification problem, and then computes the average FPR and TPR using the values of the confusion matrix. • "macro" (macro-averaging) — average computes the average values for FPR and TPR by averaging the values of all one-versus-all binary classification problems. • "weighted" (weighted macro-averaging) — average computes the weighted average values for FPR and TPR using the macro-averaging method and using the prior class probabilities (the Prior property of rocObj) as weights. 35-135
The algorithm type determines the length of the vectors for the output arguments (FPR, TPR, and Thresholds). For more details, see “Average of Performance Metrics” on page 18-10. Data Types: char | string
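For example, a short sketch comparing the three averaging methods on the rocObj created in the example above:
[~,~,~,aucMacro]    = average(rocObj,"macro");
[~,~,~,aucWeighted] = average(rocObj,"weighted");
[~,~,~,aucMicro]    = average(rocObj,"micro");
[aucMacro aucWeighted aucMicro]   % one scalar AUC value per averaging method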
Output Arguments FPR — Average false positive rates numeric vector Average false positive rates, returned as a numeric vector. TPR — Average true positive rates numeric vector Average true positive rates, returned as a numeric vector. Thresholds — Thresholds on classification scores numeric vector Thresholds on classification scores at which the function finds each of the average performance metric values (FPR and TPR), returned as a vector. AUC — Area under average ROC curve numeric scalar Area under the average ROC curve on page 35-137 composed of FPR and TPR, returned as a numeric scalar.
More About Receiver Operating Characteristic (ROC) Curve A ROC curve shows the true positive rate versus the false positive rate for different thresholds of classification scores. The true positive rate and the false positive rate are defined as follows: • True positive rate (TPR), also known as recall or sensitivity — TP/(TP+FN), where TP is the number of true positives and FN is the number of false negatives • False positive rate (FPR), also known as fallout or 1-specificity — FP/(TN+FP), where FP is the number of false positives and TN is the number of true negatives Each point on a ROC curve corresponds to a pair of TPR and FPR values for a specific threshold value. You can find different pairs of TPR and FPR values by varying the threshold value, and then create a ROC curve using the pairs. For each class, rocmetrics uses all distinct adjusted score on page 35-137 values as threshold values to create a ROC curve. For a multiclass classification problem, rocmetrics formulates a set of one-versus-all on page 35137 binary classification problems to have one binary problem for each class, and finds a ROC curve for each class using the corresponding binary problem. Each binary problem assumes one class as positive and the rest as negative. For a binary classification problem, if you specify the classification scores as a matrix, rocmetrics formulates two one-versus-all binary classification problems. Each of these problems treats one class 35-136
as a positive class and the other class as a negative class, and rocmetrics finds two ROC curves. Use one of the curves to evaluate the binary classification problem. For more details, see “ROC Curve and Performance Metrics” on page 18-3. Area Under ROC Curve (AUC) The area under a ROC curve (AUC) corresponds to the integral of a ROC curve (TPR values) with respect to FPR from FPR = 0 to FPR = 1. The AUC provides an aggregate performance measure across all possible thresholds. The AUC values are in the range 0 to 1, and larger AUC values indicate better classifier performance. One-Versus-All (OVA) Coding Design The one-versus-all (OVA) coding design reduces a multiclass classification problem to a set of binary classification problems. In this coding design, each binary classification treats one class as positive and the rest of the classes as negative. rocmetrics uses the OVA coding design for multiclass classification and evaluates the performance on each class by using the binary classification in which that class is positive. For example, the OVA coding design for three classes formulates three binary classifications:
               Binary 1    Binary 2    Binary 3
    Class 1        1          −1          −1
    Class 2       −1           1          −1
    Class 3       −1          −1           1
Each row corresponds to a class, and each column corresponds to a binary classification problem. The first binary classification assumes that class 1 is a positive class and the rest of the classes are negative. rocmetrics evaluates the performance on the first class by using the first binary classification problem.
Algorithms Adjusted Scores for Multiclass Classification Problem For each class, rocmetrics adjusts the classification scores (input argument Scores of rocmetrics) relative to the scores for the rest of the classes if you specify Scores as a matrix. Specifically, the adjusted score for a class given an observation is the difference between the score for the class and the maximum value of the scores for the rest of the classes. For example, if you have [s1,s2,s3] in a row of Scores for a classification problem with three classes, the adjusted score values are [s1-max(s2,s3),s2-max(s1,s3),s3-max(s1,s2)]. rocmetrics computes the performance metrics using the adjusted score values for each class. For a binary classification problem, you can specify Scores as a two-column matrix or a column vector. Using a two-column matrix is a simpler option because the predict function of a classification object returns classification scores as a matrix, which you can pass to rocmetrics. If you pass scores in a two-column matrix, rocmetrics adjusts scores in the same way that it adjusts scores for multiclass classification, and it computes performance metrics for both classes. You can use the metric values for one of the two classes to evaluate the binary classification problem. The
metric values for a class returned by rocmetrics when you pass a two-column matrix are equivalent to the metric values returned by rocmetrics when you specify classification scores for the class as a column vector.
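The score adjustment described above is easy to reproduce directly. The sketch below uses a small made-up score matrix and subtracts from each class score the maximum of the remaining class scores, which is the quantity rocmetrics thresholds for each one-versus-all problem.

scores = [0.7 0.2 0.1; 0.3 0.4 0.3];          % two observations, three classes (made-up values)
K = size(scores,2);
adjusted = zeros(size(scores));
for k = 1:K
    rest = scores(:,[1:k-1 k+1:K]);           % scores for the other classes
    adjusted(:,k) = scores(:,k) - max(rest,[],2);
end
adjusted                                      % first row is [0.5 -0.5 -0.6]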
Alternative Functionality • You can use the plot function to create the average ROC curve. The function returns a ROCCurve object containing the XData, YData, Thresholds, and AUC properties, which correspond to the output arguments FPR, TPR, Thresholds, and AUC of the average function, respectively. For an example, see “Plot Average ROC Curve for Multiclass Classifier” on page 35-6053.
Version History Introduced in R2022a
References [1] Sebastiani, Fabrizio. "Machine Learning in Automated Text Categorization." ACM Computing Surveys 34, no. 1 (March 2002): 1–47.
See Also rocmetrics | addMetrics | plot Topics “ROC Curve and Performance Metrics” on page 18-3
barttest Bartlett’s test
Syntax ndim = barttest(x,alpha) [ndim,prob,chisquare] = barttest(x,alpha)
Description ndim = barttest(x,alpha) returns the number of dimensions necessary to explain the nonrandom variation in the data matrix x at the alpha significance level. [ndim,prob,chisquare] = barttest(x,alpha) also returns the significance values for the hypothesis tests prob, and the χ2 values associated with the tests chisquare.
Examples Determine Dimensions Needed to Explain Nonrandom Data Variation Generate a 20-by-6 matrix of random numbers from a multivariate normal distribution with mean mu = [0 0] and covariance sigma = [1 0.99; 0.99 1]. rng default % for reproducibility mu = [0 0]; sigma = [1 0.99; 0.99 1]; X = mvnrnd(mu,sigma,20); % columns 1 and 2 X(:,3:4) = mvnrnd(mu,sigma,20); % columns 3 and 4 X(:,5:6) = mvnrnd(mu,sigma,20); % columns 5 and 6
Determine the number of dimensions necessary to explain the nonrandom variation in data matrix X. Report the significance values for the hypothesis tests.

[ndim, prob] = barttest(X,0.05)

ndim = 3

prob = 5×1

    0.0000
    0.0000
    0.0000
    0.5148
    0.3370
The returned value of ndim indicates that three dimensions are necessary to explain the nonrandom variation in X.
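To also inspect the chi-square statistics behind these p-values, request the third output. A minimal sketch, reusing the matrix X from the example above; the Dimension column here is just a row label added for readability.

[ndim,prob,chisquare] = barttest(X,0.05);
table((1:numel(prob))',prob,chisquare,'VariableNames',{'Dimension','pValue','ChiSquare'})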
Input Arguments x — Input data matrix of scalar values Input data, specified as a matrix of scalar values. Data Types: single | double alpha — Significance level 0.05 (default) | scalar value in the range (0,1) Significance level of the hypothesis test, specified as a scalar value in the range (0,1). Example: 0.1 Data Types: single | double
Output Arguments ndim — Number of dimensions positive integer value Number of dimensions, returned as a positive integer value. The dimension is determined by a series of hypothesis tests. The test for ndim = 1 tests the hypothesis that the variances of the data values along each principal component are equal, the test for ndim = 2 tests the hypothesis that the variances along the second through last components are equal, and so on. The null hypothesis is that the number of dimensions is equal to the number of the largest unequal eigenvalues of the covariance matrix of x. prob — Significance value vector of scalar values in the range (0,1) Significance value for the hypothesis tests, returned as a vector of scalar values in the range (0,1). Each element in prob corresponds to an element of chisquare. chisquare — Test statistics vector of scalar values Test statistics for each dimension’s hypothesis test, returned as a vector of scalar values.
Version History Introduced before R2006a
barttest Bartlett's test for multivariate analysis of variance (MANOVA)
Syntax d = barttest(maov) d = barttest(maov,factor) d = barttest( ___ ,Alpha=alpha) [d,tbl] = barttest( ___ )
Description d = barttest(maov) returns the result of a Bartlett's test to determine the minimum number of dimensions needed by a linear space to contain the mean response vectors of the manova object maov. In other words, barttest calculates the number of linearly independent mean response vectors. This syntax is supported only when maov is a one-way manova object. d = barttest(maov,factor) specifies the factor barttest uses to group the response data. Use this syntax when maov is a two- or N-way manova object. d = barttest( ___ ,Alpha=alpha) specifies the confidence level barttest uses to return the minimum dimension, using any of the input argument combinations in previous syntaxes. The confidence level is (1 – alpha)*100. [d,tbl] = barttest( ___ ) additionally returns a table containing the Bartlett's test statistics.
Examples Perform Bartlett's Test for Two-Way MANOVA Load the carsmall data set. load carsmall
The variable Model_Year contains data for the year a car was manufactured, and the variable Cylinders contains data for the number of engine cylinders in the car. The Acceleration and Displacement variables contain data for car acceleration and displacement. Use the table function to create a table of factor values from the data in Model_Year and Cylinders. tbl = table(Model_Year,Cylinders,VariableNames=["Year" "Cylinders"]);
Create a matrix of response variables from Acceleration and Displacement. y = [Acceleration Displacement];
Perform a two-way MANOVA using the factor values in tbl and the response variables in y. maov = manova(tbl,y)
maov = 
  2-way manova

    Y1,Y2 ~ 1 + Year + Cylinders

     Source      DF    TestStatistic     Value        F       DFNumerator    DFDenominator     p
    _________    __    _____________    ________    ______    ___________    _____________    ___

    Year          2       pillai        0.084893    2.1056         4              190          0
    Cylinders     2       pillai         0.94174     42.27         4              190         2.5
    Error        95
    Total        99

  Properties, Methods
maov is a two-way manova object that contains the results of the two-way MANOVA. The output displays the formula for the MANOVA model and a MANOVA table. In the formula, the car acceleration and displacement are represented by the variables Y1 and Y2, respectively. The MANOVA table contains a small p-value corresponding to the Cylinders term in the MANOVA model. The small p-value indicates that, at the 95% confidence level, enough evidence exists to conclude that Cylinders has a statistically significant effect on the mean response vector. Year has a p-value larger than 0.05, which indicates that not enough evidence exists to conclude that Year has a statistically significant effect on the mean response vector at the 95% confidence level. Use the barttest function to determine the dimension of the space spanned by the mean response vectors corresponding to the factor Year. barttest(maov,"Year") ans = 0
The output shows that the mean response vectors corresponding to Year span a point, indicating that they are not statistically different from each other. This result is consistent with the large p-value for Year.
Get Statistics for Bartlett's Test Load the fisheriris data set. load fisheriris
The column vector species contains iris flowers of three different species: setosa, versicolor, and virginica. The matrix meas contains four types of measurements for the flower: the length and width of sepals and petals in centimeters. Perform a one-way MANOVA with species as the factor and the measurements in meas as the response variables. maov = manova(species,meas);
maov is a one-way manova object that contains the results of the one-way MANOVA.
Perform a Bartlett's test to determine the minimum number of dimensions needed by a linear space to contain the mean response vectors. maov has three mean response vectors corresponding to the three iris species.

[d,tbl] = barttest(maov)

d = 2

tbl=2×5 table
    Dimension    WilksLambda    ChiSquared    DF       pValue   
    _________    ___________    __________    __    ___________

        0          0.023439       546.12      8     8.8708e-113
        1           0.77797        36.53      3      5.7861e-08
Each row of the table output corresponds to a dimension checked by the Bartlett's test. The small p-values in the table provide strong evidence that the mean response vectors are not elements of a zero- or one-dimensional space. Three points are guaranteed to be elements of a two-dimensional space, so the Bartlett's test returns 2 as the number of dimensions.
Input Arguments maov — MANOVA results manova object MANOVA results, specified as a manova object. The properties of maov contain the response data and factor values used by barttest to perform the Bartlett's test. factor — Factor used to group response data string scalar | character array Factor used to group the response data, specified as a string scalar or character array. factor must be a name in maov.FactorNames. Example: "Factor2" Data Types: char | string alpha — Significance level 0.05 (default) | scalar value in the range (0,1) Significance level barttest uses to return d, specified as a scalar value in the range (0,1). The confidence level for d is (1 – alpha)*100. Example: Alpha=0.01 Data Types: single | double
Output Arguments d — Dimension nonnegative integer Dimension of the lowest dimensional linear space containing the response vectors of maov, returned as a nonnegative integer. d is the number of linearly independent mean response vectors for maov. 35-143
tbl — Bartlett's test statistics table Bartlett's test statistics, returned as a table. Each row of tbl corresponds to a dimension checked by barttest using the Wilks' lambda test statistic. tbl has the same number of rows as the minimum of the number of response variables and the number of values in factor minus 1. tbl has the following columns: • Dimension — Dimension checked by the Bartlett's test • WilksLambda — Value of the Wilks' lambda test statistic • ChiSquared — Value of the chi-square test statistic corresponding to the Wilks' lambda test statistic • DF — Degrees of freedom of the chi-square test statistic • pValue — p-value for the chi-square test statistic Data Types: table
Alternative Functionality The manova1 function returns the output of the barttest object function, and a subset of the manova object properties. manova1 is limited to one-way MANOVA.
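For comparison, here is a minimal sketch of the older one-way manova1 interface mentioned above, using the same iris data; the comment reflects the relationship stated in this section.

load fisheriris
[d,p,stats] = manova1(meas,species);   % d is the same dimension estimate returned by barttest
d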
Version History Introduced in R2023b
See Also manova | manova1 Topics “Multivariate Analysis of Variance for Repeated Measures” on page 9-62 “Perform Multivariate Analysis of Variance (MANOVA)” on page 9-52
BayesianOptimization Bayesian optimization results
Description A BayesianOptimization object contains the results of a Bayesian optimization. It is the output of bayesopt or a fit function that accepts the OptimizeHyperparameters name-value pair such as fitcdiscr. In addition, a BayesianOptimization object contains data for each iteration of bayesopt that can be accessed by a plot function or an output function.
Creation Create a BayesianOptimization object by using the bayesopt function or one of the following fit functions with the OptimizeHyperparameters name-value argument. • Classification fit functions: fitcdiscr, fitcecoc, fitcensemble, fitcgam, fitckernel, fitcknn, fitclinear, fitcnb, fitcnet, fitcsvm, fitctree • Regression fit functions: fitrensemble, fitrgam, fitrgp, fitrkernel, fitrlinear, fitrnet, fitrsvm, fitrtree
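For example, a minimal sketch of the fit-function route; the ionosphere data set and the options shown are illustrative choices for this sketch.

load ionosphere
rng default   % for reproducibility
Mdl = fitcknn(X,Y,"OptimizeHyperparameters","auto", ...
    "HyperparameterOptimizationOptions",struct("Verbose",0,"ShowPlots",false));
results = Mdl.HyperparameterOptimizationResults   % a BayesianOptimization object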
Properties Problem Definition Properties ObjectiveFcn — ObjectiveFcn argument used by bayesopt function handle This property is read-only. ObjectiveFcn argument used by bayesopt, specified as a function handle. • If you call bayesopt directly, ObjectiveFcn is the bayesopt objective function argument. • If you call a fit function containing the 'OptimizeHyperparameters' name-value pair argument, ObjectiveFcn is a function handle that returns the misclassification rate for classification or returns the logarithm of one plus the cross-validation loss for regression, measured by five-fold cross-validation. Data Types: function_handle VariableDescriptions — VariableDescriptions argument that bayesopt used vector of optimizableVariable objects This property is read-only. VariableDescriptions argument that bayesopt used, specified as a vector of optimizableVariable objects. 35-145
• If you called bayesopt directly, VariableDescriptions is the bayesopt variable description argument.
• If you called a fit function with the OptimizeHyperparameters name-value pair, VariableDescriptions is the vector of hyperparameters.
Options — Options that bayesopt used
structure
This property is read-only.
Options that bayesopt used, specified as a structure.
• If you called bayesopt directly, Options is the options used in bayesopt, which are the name-value pairs. See bayesopt “Input Arguments” on page 35-167.
• If you called a fit function with the OptimizeHyperparameters name-value pair, Options are the default bayesopt options, modified by the HyperparameterOptimizationOptions name-value pair.
Options is a read-only structure containing the following fields.
• AcquisitionFunctionName — Acquisition function name. See “Acquisition Function Types” on page 10-3.
• IsObjectiveDeterministic — true means the objective function is deterministic, false otherwise.
• ExplorationRatio — Used only when AcquisitionFunctionName is 'expected-improvement-plus' or 'expected-improvement-per-second-plus'. See “Plus” on page 10-5.
• MaxObjectiveEvaluations — Objective function evaluation limit.
• MaxTime — Time limit.
• XConstraintFcn — Deterministic constraints on variables. See “Deterministic Constraints — XConstraintFcn” on page 10-39.
• ConditionalVariableFcn — Conditional variable constraints. See “Conditional Constraints — ConditionalVariableFcn” on page 10-40.
• NumCoupledConstraints — Number of coupled constraints. See “Coupled Constraints” on page 10-41.
• CoupledConstraintTolerances — Coupled constraint tolerances. See “Coupled Constraints” on page 10-41.
• AreCoupledConstraintsDeterministic — Logical vector specifying whether each coupled constraint is deterministic.
• Verbose — Command-line display level.
• OutputFcn — Function called after each iteration. See “Bayesian Optimization Output Functions” on page 10-19.
• SaveVariableName — Variable name for the @assignInBase output function.
• SaveFileName — File name for the @saveToFile output function.
• PlotFcn — Plot function called after each iteration. See “Bayesian Optimization Plot Functions” on page 10-11.
• InitialX — Points where bayesopt evaluated the objective function.
• InitialObjective — Objective function values at InitialX.
• InitialConstraintViolations — Coupled constraint function values at InitialX.
• InitialErrorValues — Error values at InitialX.
• InitialObjectiveEvaluationTimes — Objective function evaluation times at InitialX.
• InitialIterationTimes — Time for each iteration, including objective function evaluation and other computations.
Data Types: struct Solution Properties MinObjective — Minimum observed value of objective function real scalar This property is read-only. Minimum observed value of objective function, specified as a real scalar. When there are coupled constraints or evaluation errors, this value is the minimum over all observed points that are feasible according to the final constraint and Error models. Data Types: double XAtMinObjective — Observed point with minimum objective function value 1-by-D table This property is read-only. Observed point with minimum objective function value, specified as a 1-by-D table, where D is the number of variables. Data Types: table MinEstimatedObjective — Estimated objective function value real scalar This property is read-only. Estimated objective function value at XAtMinEstimatedObjective, specified as a real scalar. MinEstimatedObjective is the mean value of the posterior distribution of the final objective model. The software estimates the MinEstimatedObjective value by passing XAtMinEstimatedObjective to the object function predictObjective. 35-147
Data Types: double XAtMinEstimatedObjective — Point with minimum upper confidence bound of objective function value 1-by-D table This property is read-only. Point with the minimum upper confidence bound of the objective function value among the visited points, specified as a 1-by-D table, where D is the number of variables. The software uses the final objective model to find the upper confidence bounds of the visited points. XAtMinEstimatedObjective is the same as the best point returned by the bestPoint function with the default criterion ('min-visited-upper-confidence-interval'). Data Types: table NumObjectiveEvaluations — Number of objective function evaluations positive integer This property is read-only. Number of objective function evaluations, specified as a positive integer. This includes the initial evaluations to form a posterior model as well as evaluation during the optimization iterations. Data Types: double TotalElapsedTime — Total elapsed time of optimization in seconds positive scalar This property is read-only. Total elapsed time of optimization in seconds, specified as a positive scalar. Data Types: double NextPoint — Next point to evaluate if optimization continues 1-by-D table This property is read-only. Next point to evaluate if optimization continues, specified as a 1-by-D table, where D is the number of variables. Data Types: table Trace Properties XTrace — Points where the objective function was evaluated T-by-D table This property is read-only. Points where the objective function was evaluated, specified as a T-by-D table, where T is the number of evaluation points and D is the number of variables. Data Types: table 35-148
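The MinEstimatedObjective and XAtMinEstimatedObjective properties described above are linked through the predictObjective object function. A minimal sketch, assuming results is an existing BayesianOptimization object:

xBest = results.XAtMinEstimatedObjective;
objAtBest = predictObjective(results,xBest)    % mean of the posterior of the final objective model
results.MinEstimatedObjective                  % the same value, stored on the object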
ObjectiveTrace — Objective function values column vector of length T This property is read-only. Objective function values, specified as a column vector of length T, where T is the number of evaluation points. ObjectiveTrace contains the history of objective function evaluations. Data Types: double ObjectiveEvaluationTimeTrace — Objective function evaluation times column vector of length T This property is read-only. Objective function evaluation times, specified as a column vector of length T, where T is the number of evaluation points. ObjectiveEvaluationTimeTrace includes the time in evaluating coupled constraints, because the objective function computes these constraints. Data Types: double IterationTimeTrace — Iteration times column vector of length T This property is read-only. Iteration times, specified as a column vector of length T, where T is the number of evaluation points. IterationTimeTrace includes both objective function evaluation time and other overhead. Data Types: double ConstraintsTrace — Coupled constraint values T-by-K array This property is read-only. Coupled constraint values, specified as a T-by-K array, where T is the number of evaluation points and K is the number of coupled constraints. Data Types: double ErrorTrace — Error indications column vector of length T of -1 or 1 entries This property is read-only. Error indications, specified as a column vector of length T of -1 or 1 entries, where T is the number of evaluation points. Each 1 entry indicates that the objective function errored or returned NaN on the corresponding point in XTrace. Each -1 entry indicates that the objective function value was computed. Data Types: double FeasibilityTrace — Feasibility indications logical column vector of length T This property is read-only. 35-149
Feasibility indications, specified as a logical column vector of length T, where T is the number of evaluation points. Each 1 entry indicates that the final constraint model predicts feasibility at the corresponding point in XTrace. Data Types: logical FeasibilityProbabilityTrace — Probability that evaluation point is feasible column vector of length T This property is read-only. Probability that evaluation point is feasible, specified as a column vector of length T, where T is the number of evaluation points. The probabilities come from the final constraint model, including the error constraint model, on the corresponding points in XTrace. Data Types: double IndexOfMinimumTrace — Which evaluation gave minimum feasible objective column vector of integer indices of length T This property is read-only. Which evaluation gave minimum feasible objective, specified as a column vector of integer indices of length T, where T is the number of evaluation points. Feasibility is determined with respect to the constraint models that existed at each iteration, including the error constraint model. Data Types: double ObjectiveMinimumTrace — Minimum observed objective column vector of length T This property is read-only. Minimum observed objective, specified as a column vector of length T, where T is the number of evaluation points. Data Types: double EstimatedObjectiveMinimumTrace — Estimated objective column vector of length T This property is read-only. Estimated objective, specified as a column vector of length T, where T is the number of evaluation points. The estimated objective at each iteration is determined with respect to the objective model at that iteration. At each iteration, the software uses the object function predictObjective to estimate the objective function value at the point with the minimum upper confidence bound of the objective function among the visited points. Data Types: double UserDataTrace — Auxiliary data from the objective function cell array of length T This property is read-only. 35-150
Auxiliary data from the objective function, specified as a cell array of length T, where T is the number of evaluation points. Each entry in the cell array is the UserData returned in the third output of the objective function. Data Types: cell
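The trace properties are convenient for plotting convergence after the fact. A minimal sketch, assuming results is a BayesianOptimization object such as the one created in the examples below:

iters = 1:results.NumObjectiveEvaluations;
plot(iters,results.ObjectiveMinimumTrace,iters,results.EstimatedObjectiveMinimumTrace)
xlabel("Function evaluation")
ylabel("Objective")
legend("Minimum observed objective","Estimated minimum objective")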
Object Functions
bestPoint — Best point in a Bayesian optimization according to a criterion
plot — Plot Bayesian optimization results
predictConstraints — Predict coupled constraint violations at a set of points
predictError — Predict error value at a set of points
predictObjective — Predict objective function at a set of points
predictObjectiveEvaluationTime — Predict objective function run times at a set of points
resume — Resume a Bayesian optimization
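For instance, resume continues a finished optimization without restarting it. A minimal sketch, assuming results is an existing BayesianOptimization object; see the resume reference page for how the evaluation limits are interpreted.

results2 = resume(results,"MaxObjectiveEvaluations",10);
results2.NumObjectiveEvaluations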
Examples Create a BayesianOptimization Object Using bayesopt This example shows how to create a BayesianOptimization object by using bayesopt to minimize cross-validation loss. Optimize hyperparameters of a KNN classifier for the ionosphere data, that is, find KNN hyperparameters that minimize the cross-validation loss. Have bayesopt minimize over the following hyperparameters: • Nearest-neighborhood sizes from 1 to 30 • Distance functions 'chebychev', 'euclidean', and 'minkowski'. For reproducibility, set the random seed, set the partition, and set the AcquisitionFunctionName option to 'expected-improvement-plus'. To suppress iterative display, set 'Verbose' to 0. Pass the partition c and fitting data X and Y to the objective function fun by creating fun as an anonymous function that incorporates this data. See “Parameterizing Functions”. load ionosphere rng default num = optimizableVariable('n',[1,30],'Type','integer'); dst = optimizableVariable('dst',{'chebychev','euclidean','minkowski'},'Type','categorical'); c = cvpartition(351,'Kfold',5); fun = @(x)kfoldLoss(fitcknn(X,Y,'CVPartition',c,'NumNeighbors',x.n,... 'Distance',char(x.dst),'NSMethod','exhaustive')); results = bayesopt(fun,[num,dst],'Verbose',0,... 'AcquisitionFunctionName','expected-improvement-plus')
results = 
  BayesianOptimization with properties:

                      ObjectiveFcn: @(x)kfoldLoss(fitcknn(X,Y,'CVPartition',c,'NumNeighbors',x.n,
              VariableDescriptions: [1x2 optimizableVariable]
                           Options: [1x1 struct]
                      MinObjective: 0.1197
                   XAtMinObjective: [1x2 table]
             MinEstimatedObjective: 0.1213
          XAtMinEstimatedObjective: [1x2 table]
           NumObjectiveEvaluations: 30
                  TotalElapsedTime: 49.8494
                         NextPoint: [1x2 table]
                            XTrace: [30x2 table]
                    ObjectiveTrace: [30x1 double]
                  ConstraintsTrace: []
                     UserDataTrace: {30x1 cell}
      ObjectiveEvaluationTimeTrace: [30x1 double]
                IterationTimeTrace: [30x1 double]
                        ErrorTrace: [30x1 double]
                  FeasibilityTrace: [30x1 logical]
       FeasibilityProbabilityTrace: [30x1 double]
               IndexOfMinimumTrace: [30x1 double]
             ObjectiveMinimumTrace: [30x1 double]
    EstimatedObjectiveMinimumTrace: [30x1 double]
Create a BayesianOptimization Object Using a Fit Function This example shows how to minimize the cross-validation loss in the ionosphere data using Bayesian optimization of an SVM classifier. Load the data. load ionosphere
Optimize the classification using the 'auto' parameters. rng default % For reproducibility Mdl = fitcsvm(X,Y,'OptimizeHyperparameters','auto')
|================================================================================================ | Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | BoxConstraint| KernelS | | result | | runtime | (observed) | (estim.) | | |================================================================================================ | 1 | Best | 0.35897 | 1.0784 | 0.35897 | 0.35897 | 3.8653 | 96 | 2 | Best | 0.13105 | 22.555 | 0.13105 | 0.15896 | 429.99 | 0. | 3 | Accept | 0.35897 | 0.21344 | 0.13105 | 0.13431 | 0.11801 | 8. | 4 | Accept | 0.1339 | 12.766 | 0.13105 | 0.13255 | 0.0010694 | 0.003 | 5 | Accept | 0.17949 | 24.938 | 0.13105 | 0.13109 | 973.65 | 0.1 | 6 | Accept | 0.35897 | 0.14708 | 0.13105 | 0.14734 | 0.0048533 | 95 | 7 | Accept | 0.1339 | 10.338 | 0.13105 | 0.13109 | 997.19 | 2. | 8 | Accept | 0.26781 | 25.882 | 0.13105 | 0.13109 | 4.7295 | 0.003 | 9 | Accept | 0.35897 | 0.16604 | 0.13105 | 0.13109 | 91.515 | 99 | 10 | Accept | 0.35897 | 0.16067 | 0.13105 | 0.13111 | 0.0011174 | 84 | 11 | Accept | 0.35897 | 0.17076 | 0.13105 | 0.14535 | 0.0098482 | 15 | 12 | Accept | 0.13105 | 0.16898 | 0.13105 | 0.13106 | 0.0010189 | 0.2 | 13 | Accept | 0.23077 | 0.18496 | 0.13105 | 0.13863 | 0.0010009 | 0.8 | 14 | Best | 0.12251 | 0.2545 | 0.12251 | 0.12044 | 0.0059501 | 0.08 | 15 | Accept | 0.12821 | 0.86239 | 0.12251 | 0.12203 | 0.052767 | 0.08 | 16 | Accept | 0.12251 | 0.21458 | 0.12251 | 0.12291 | 0.001082 | 0.05 | 17 | Accept | 0.12251 | 0.20668 | 0.12251 | 0.12238 | 0.001016 | 0.04 | 18 | Accept | 0.22792 | 0.16482 | 0.12251 | 0.12239 | 0.0011871 | 0.4 | 19 | Accept | 0.12821 | 0.3352 | 0.12251 | 0.12229 | 0.0010334 | 0.01 | 20 | Accept | 0.18803 | 26.259 | 0.12251 | 0.1223 | 986.83 | 0.001 |================================================================================================ | Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | BoxConstraint| KernelS | | result | | runtime | (observed) | (estim.) | | |================================================================================================ | 21 | Best | 0.11966 | 0.23315 | 0.11966 | 0.11969 | 0.066257 | 0.2 | 22 | Accept | 0.12821 | 4.6873 | 0.11966 | 0.11975 | 7.0089 | 0.3 | 23 | Accept | 0.13675 | 16.093 | 0.11966 | 0.11975 | 993.3 | 0.8 | 24 | Accept | 0.12251 | 0.23387 | 0.11966 | 0.11991 | 0.020223 | 0.1 | 25 | Accept | 0.13105 | 1.6546 | 0.11966 | 0.11994 | 0.014336 | 0.02 | 26 | Accept | 0.12251 | 0.16864 | 0.11966 | 0.11989 | 0.0010626 | 0.05 | 27 | Best | 0.11681 | 0.18406 | 0.11681 | 0.11683 | 0.001015 | 0.02 | 28 | Accept | 0.12536 | 0.20642 | 0.11681 | 0.11684 | 0.0051588 | 0.04 | 29 | Accept | 0.1339 | 0.28064 | 0.11681 | 0.11682 | 0.0010043 | 0.01 | 30 | Accept | 0.12821 | 0.24365 | 0.11681 | 0.11686 | 0.16741 | 0.1 __________________________________________________________ Optimization completed. MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 179.6355 seconds
Total objective function evaluation time: 151.0522

Best observed feasible point:
    BoxConstraint    KernelScale    Standardize
    _____________    ___________    ___________

       0.001015        0.026543        false   

Observed objective function value = 0.11681
Estimated objective function value = 0.11686
Function evaluation time = 0.18406

Best estimated feasible point (according to models):
    BoxConstraint    KernelScale    Standardize
    _____________    ___________    ___________

       0.001015        0.026543        false   

Estimated objective function value = 0.11686
Estimated function evaluation time = 0.1844
Mdl = ClassificationSVM ResponseName: 'Y' CategoricalPredictors: []
35-155
35
Functions
ClassNames: ScoreTransform: NumObservations: HyperparameterOptimizationResults: Alpha: Bias: KernelParameters: BoxConstraints: ConvergenceInfo: IsSupportVector: Solver:
{'b' 'g'} 'none' 351 [1x1 BayesianOptimization] [100x1 double] -4.5306 [1x1 struct] [351x1 double] [1x1 struct] [351x1 logical] 'SMO'
The fit achieved about 12% loss for the default 5-fold cross validation. Examine the BayesianOptimization object that is returned in the HyperparameterOptimizationResults property of the returned model. disp(Mdl.HyperparameterOptimizationResults) BayesianOptimization with properties: ObjectiveFcn: VariableDescriptions: Options: MinObjective: XAtMinObjective: MinEstimatedObjective: XAtMinEstimatedObjective: NumObjectiveEvaluations: TotalElapsedTime: NextPoint: XTrace: ObjectiveTrace: ConstraintsTrace: UserDataTrace: ObjectiveEvaluationTimeTrace: IterationTimeTrace: ErrorTrace: FeasibilityTrace: FeasibilityProbabilityTrace: IndexOfMinimumTrace: ObjectiveMinimumTrace: EstimatedObjectiveMinimumTrace:
@createObjFcn/inMemoryObjFcn [5x1 optimizableVariable] [1x1 struct] 0.1168 [1x3 table] 0.1169 [1x3 table] 30 179.6355 [1x3 table] [30x3 table] [30x1 double] [] {30x1 cell} [30x1 double] [30x1 double] [30x1 double] [30x1 logical] [30x1 double] [30x1 double] [30x1 double] [30x1 double]
Version History Introduced in R2016b
See Also bayesopt Topics “Bayesian Optimization Workflow” on page 10-25
bayesopt Select optimal machine learning hyperparameters using Bayesian optimization
Syntax results = bayesopt(fun,vars) results = bayesopt(fun,vars,Name,Value)
Description results = bayesopt(fun,vars) attempts to find values of vars that minimize fun(vars). Note To include extra parameters in an objective function, see “Parameterizing Functions”. results = bayesopt(fun,vars,Name,Value) modifies the optimization process according to the Name,Value arguments.
Examples Create a BayesianOptimization Object Using bayesopt This example shows how to create a BayesianOptimization object by using bayesopt to minimize cross-validation loss. Optimize hyperparameters of a KNN classifier for the ionosphere data, that is, find KNN hyperparameters that minimize the cross-validation loss. Have bayesopt minimize over the following hyperparameters: • Nearest-neighborhood sizes from 1 to 30 • Distance functions 'chebychev', 'euclidean', and 'minkowski'. For reproducibility, set the random seed, set the partition, and set the AcquisitionFunctionName option to 'expected-improvement-plus'. To suppress iterative display, set 'Verbose' to 0. Pass the partition c and fitting data X and Y to the objective function fun by creating fun as an anonymous function that incorporates this data. See “Parameterizing Functions”. load ionosphere rng default num = optimizableVariable('n',[1,30],'Type','integer'); dst = optimizableVariable('dst',{'chebychev','euclidean','minkowski'},'Type','categorical'); c = cvpartition(351,'Kfold',5); fun = @(x)kfoldLoss(fitcknn(X,Y,'CVPartition',c,'NumNeighbors',x.n,... 'Distance',char(x.dst),'NSMethod','exhaustive')); results = bayesopt(fun,[num,dst],'Verbose',0,... 'AcquisitionFunctionName','expected-improvement-plus')
results = 
  BayesianOptimization with properties:

                      ObjectiveFcn: @(x)kfoldLoss(fitcknn(X,Y,'CVPartition',c,'NumNeighbors',x.n,
              VariableDescriptions: [1x2 optimizableVariable]
                           Options: [1x1 struct]
                      MinObjective: 0.1197
                   XAtMinObjective: [1x2 table]
             MinEstimatedObjective: 0.1213
          XAtMinEstimatedObjective: [1x2 table]
           NumObjectiveEvaluations: 30
                  TotalElapsedTime: 49.8494
                         NextPoint: [1x2 table]
                            XTrace: [30x2 table]
                    ObjectiveTrace: [30x1 double]
                  ConstraintsTrace: []
                     UserDataTrace: {30x1 cell}
      ObjectiveEvaluationTimeTrace: [30x1 double]
                IterationTimeTrace: [30x1 double]
                        ErrorTrace: [30x1 double]
                  FeasibilityTrace: [30x1 logical]
       FeasibilityProbabilityTrace: [30x1 double]
               IndexOfMinimumTrace: [30x1 double]
             ObjectiveMinimumTrace: [30x1 double]
    EstimatedObjectiveMinimumTrace: [30x1 double]
Bayesian Optimization with Coupled Constraints A coupled constraint is one that can be evaluated only by evaluating the objective function. In this case, the objective function is the cross-validated loss of an SVM model. The coupled constraint is that the number of support vectors is no more than 100. The model details are in “Optimize CrossValidated Classifier Using bayesopt” on page 10-46. Create the data for classification. rng default grnpop = mvnrnd([1,0],eye(2),10); redpop = mvnrnd([0,1],eye(2),10); redpts = zeros(100,2); grnpts = redpts; for i = 1:100 grnpts(i,:) = mvnrnd(grnpop(randi(10),:),eye(2)*0.02); redpts(i,:) = mvnrnd(redpop(randi(10),:),eye(2)*0.02); end cdata = [grnpts;redpts]; grp = ones(200,1); grp(101:200) = -1; c = cvpartition(200,'KFold',10); sigma = optimizableVariable('sigma',[1e-5,1e5],'Transform','log'); box = optimizableVariable('box',[1e-5,1e5],'Transform','log');
The objective function is the cross-validation loss of the SVM model for partition c. The coupled constraint is the number of support vectors minus 100.5. This ensures that 100 support vectors give a negative constraint value, but 101 support vectors give a positive value. The model has 200 data points, so the coupled constraint values range from -99.5 (there is always at least one support vector) to 99.5. Positive values mean the constraint is not satisfied. function [objective,constraint] = mysvmfun(x,cdata,grp,c) SVMModel = fitcsvm(cdata,grp,'KernelFunction','rbf',... 'BoxConstraint',x.box,... 'KernelScale',x.sigma); cvModel = crossval(SVMModel,'CVPartition',c); objective = kfoldLoss(cvModel); constraint = sum(SVMModel.IsSupportVector)-100.5;
Pass the partition c and fitting data cdata and grp to the objective function fun by creating fun as an anonymous function that incorporates this data. See “Parameterizing Functions”. fun = @(x)mysvmfun(x,cdata,grp,c);
Set the NumCoupledConstraints to 1 so the optimizer knows that there is a coupled constraint. Set options to plot the constraint model. results = bayesopt(fun,[sigma,box],'IsObjectiveDeterministic',true,... 'NumCoupledConstraints',1,'PlotFcn',... {@plotMinObjective,@plotConstraintModels},... 'AcquisitionFunctionName','expected-improvement-plus','Verbose',0);
Most points lead to an infeasible number of support vectors.
Parallel Bayesian Optimization Improve the speed of a Bayesian optimization by using parallel objective function evaluation. Prepare variables and the objective function for Bayesian optimization. The objective function is the cross-validation error rate for the ionosphere data, a binary classification problem. Use fitcsvm as the classifier, with BoxConstraint and KernelScale as the parameters to optimize. load ionosphere box = optimizableVariable('box',[1e-4,1e3],'Transform','log'); kern = optimizableVariable('kern',[1e-4,1e3],'Transform','log'); vars = [box,kern]; fun = @(vars)kfoldLoss(fitcsvm(X,Y,'BoxConstraint',vars.box,'KernelScale',vars.kern,... 'Kfold',5));
Search for the parameters that give the lowest cross-validation error by using parallel Bayesian optimization. results = bayesopt(fun,vars,'UseParallel',true); Copying objective function to workers... Done copying objective function to workers.
|=================================================================================================|
| Iter | Active  | Eval    | Objective | Objective  | BestSoFar  | BestSoFar  |        box        |
|      | workers | result  |           | runtime    | (observed) | (estim.)   |                   |
|=================================================================================================|
|    1 |       2 | Accept  |    0.2735 |    0.56171 |    0.13105 |    0.13108 |         0.0002608 |
|    2 |       2 | Accept  |   0.35897 |     0.4062 |    0.13105 |    0.13108 |            3.6999 |
|    3 |       2 | Accept  |   0.13675 |    0.42727 |    0.13105 |    0.13108 |           0.33594 |
|    4 |       2 | Accept  |   0.35897 |     0.4453 |    0.13105 |    0.13108 |          0.014127 |
|    5 |       2 | Best    |   0.13105 |    0.45503 |    0.13105 |    0.13108 |           0.29713 |
|    6 |       6 | Accept  |   0.35897 |    0.16605 |    0.13105 |    0.13108 |            8.1878 |
|    7 |       5 | Best    |   0.11396 |    0.51146 |    0.11396 |    0.11395 |            8.7331 |
|    8 |       5 | Accept  |   0.14245 |    0.24943 |    0.11396 |    0.11395 |         0.0020774 |
|    9 |       6 | Best    |   0.10826 |     4.0711 |    0.10826 |    0.10827 |         0.0015925 |
|   10 |       6 | Accept  |   0.25641 |     16.265 |    0.10826 |    0.10829 |        0.00057357 |
|   11 |       6 | Accept  |    0.1339 |     15.581 |    0.10826 |    0.10829 |            1.4553 |
|   12 |       6 | Accept  |   0.16809 |     19.585 |    0.10826 |    0.10828 |           0.26919 |
|   13 |       6 | Accept  |   0.20513 |     18.637 |    0.10826 |    0.10828 |            369.59 |
|   14 |       6 | Accept  |   0.12536 |    0.11382 |    0.10826 |    0.10829 |            5.7059 |
|   15 |       6 | Accept  |   0.13675 |       2.63 |    0.10826 |    0.10828 |            984.19 |
|   16 |       6 | Accept  |   0.12821 |     2.0743 |    0.10826 |    0.11144 |         0.0063411 |
|   17 |       6 | Accept  |    0.1339 |     0.1939 |    0.10826 |    0.11302 |        0.00010225 |
|   18 |       6 | Accept  |   0.12821 |    0.20933 |    0.10826 |    0.11376 |            7.7447 |
|   19 |       4 | Accept  |   0.55556 |     17.564 |    0.10826 |    0.10828 |         0.0087593 |
|   20 |       4 | Accept  |    0.1396 |     16.473 |    0.10826 |    0.10828 |          0.054844 |
|   21 |       4 | Accept  |    0.1339 |    0.17127 |    0.10826 |    0.10828 |            9.2668 |
|   22 |       4 | Accept  |   0.12821 |   0.089065 |    0.10826 |    0.10828 |            12.265 |
|   23 |       4 | Accept  |   0.12536 |   0.073586 |    0.10826 |    0.10828 |            1.3355 |
|   24 |       4 | Accept  |   0.12821 |    0.08038 |    0.10826 |    0.10828 |            131.51 |
|   25 |       3 | Accept  |   0.11111 |     10.687 |    0.10826 |    0.10867 |            1.4795 |
|   26 |       3 | Accept  |   0.13675 |    0.18626 |    0.10826 |    0.10867 |            2.0513 |
|   27 |       6 | Accept  |   0.12821 |   0.078559 |    0.10826 |    0.10868 |            980.04 |
|   28 |       5 | Accept  |   0.33048 |   0.089844 |    0.10826 |    0.10843 |           0.41821 |
|   29 |       5 | Accept  |   0.16239 |    0.12688 |    0.10826 |    0.10843 |            172.39 |
|   30 |       5 | Accept  |   0.11966 |    0.14597 |    0.10826 |    0.10846 |            639.15 |
|=================================================================================================|

__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 48.2085 seconds.
Total objective function evaluation time: 128.3472

Best observed feasible point:
       box          kern   
    _________    _________

    0.0015925    0.0050225

Observed objective function value = 0.10826
Estimated objective function value = 0.10846
Function evaluation time = 4.0711

Best estimated feasible point (according to models):
       box          kern   
    _________    _________

    0.0015925    0.0050225

Estimated objective function value = 0.10846
Estimated function evaluation time = 2.8307
Return the best feasible point in the Bayesian model results by using the bestPoint function. Use the default criterion min-visited-upper-confidence-interval, which determines the best feasible point as the visited point that minimizes an upper confidence interval on the objective function value.

zbest = bestPoint(results)

zbest=1×2 table
       box          kern   
    _________    _________

    0.0015925    0.0050225
The table zbest contains the optimal estimated values for the 'BoxConstraint' and 'KernelScale' name-value pair arguments. Use these values to train a new optimized classifier. Mdl = fitcsvm(X,Y,'BoxConstraint',zbest.box,'KernelScale',zbest.kern);
Observe that the optimal parameters are in Mdl. Mdl.BoxConstraints(1) ans = 0.0016 Mdl.KernelParameters.Scale ans = 0.0050
Input Arguments fun — Objective function function handle | parallel.pool.Constant whose Value is a function handle Objective function, specified as a function handle or, when the UseParallel name-value pair is true, a parallel.pool.Constant whose Value is a function handle. Typically, fun returns a measure of loss (such as a misclassification error) for a machine learning model that has tunable hyperparameters to control its training. fun has these signatures: objective = fun(x) % or [objective,constraints] = fun(x) % or [objective,constraints,UserData] = fun(x)
fun accepts x, a 1-by-D table of variable values, and returns objective, a real scalar representing the objective function value fun(x). Optionally, fun also returns: • constraints, a real vector of coupled constraint violations. For a definition, see “Coupled Constraints” on page 35-175. constraint(j) > 0 means constraint j is violated. constraint(j) < 0 means constraint j is satisfied. • UserData, an entity of any type (such as a scalar, matrix, structure, or object). For an example of a custom plot function that uses UserData, see “Create a Custom Plot Function” on page 10-12.
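As a concrete, purely illustrative shape for fun, the sketch below defines a toy objective with one coupled constraint and some UserData. The variable names x1 and x2 are hypothetical and must match the optimizableVariable names you define.

vars = [optimizableVariable('x1',[-5,5]) optimizableVariable('x2',[-5,5])];
results = bayesopt(@toyObjFun,vars,'NumCoupledConstraints',1,'Verbose',0);

function [objective,constraints,UserData] = toyObjFun(x)
% x is a 1-by-D table of variable values
objective   = (x.x1 - 1)^2 + (x.x2 + 2)^2;   % value to minimize
constraints = x.x1 + x.x2 - 1;               % coupled constraint, feasible when <= 0
UserData    = struct('sumOfVars',x.x1 + x.x2);
end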
For details about using parallel.pool.Constant with bayesopt, see “Placing the Objective Function on Workers” on page 10-8. Example: @objfun Data Types: function_handle vars — Variable descriptions vector of optimizableVariable objects defining the hyperparameters to be tuned Variable descriptions, specified as a vector of optimizableVariable objects defining the hyperparameters to be tuned. Example: [X1,X2], where X1 and X2 are optimizableVariable objects Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: results = bayesopt(fun,vars,'AcquisitionFunctionName','expectedimprovement-plus') Algorithm Control
AcquisitionFunctionName — Function to choose next evaluation point 'expected-improvement-per-second-plus' (default) | 'expected-improvement' | 'expected-improvement-plus' | 'expected-improvement-per-second' | 'lowerconfidence-bound' | 'probability-of-improvement' Function to choose next evaluation point, specified as one of the listed choices. Acquisition functions whose names include per-second do not yield reproducible results because the optimization depends on the runtime of the objective function. Acquisition functions whose names include plus modify their behavior when they are overexploiting an area. For more details, see “Acquisition Function Types” on page 10-3. Example: 'AcquisitionFunctionName','expected-improvement-per-second' IsObjectiveDeterministic — Specify deterministic objective function false (default) | true Specify deterministic objective function, specified as false or true. If fun is stochastic (that is, fun(x) can return different values for the same x), then set IsObjectiveDeterministic to false. In this case, bayesopt estimates a noise level during optimization. Example: 'IsObjectiveDeterministic',true Data Types: logical ExplorationRatio — Propensity to explore 0.5 (default) | positive real Propensity to explore, specified as a positive real. Applies to the 'expected-improvement-plus' and 'expected-improvement-per-second-plus' acquisition functions. See “Plus” on page 10-5. 35-168
Example: 'ExplorationRatio',0.2 Data Types: double GPActiveSetSize — Fit Gaussian Process model to GPActiveSetSize or fewer points 300 (default) | positive integer Fit Gaussian Process model to GPActiveSetSize or fewer points, specified as a positive integer. When bayesopt has visited more than GPActiveSetSize points, subsequent iterations that use a GP model fit the model to GPActiveSetSize points. bayesopt chooses points uniformly at random without replacement among visited points. Using fewer points leads to faster GP model fitting, at the expense of possibly less accurate fitting. Example: 'GPActiveSetSize',80 Data Types: double UseParallel — Compute in parallel false (default) | true Compute in parallel, specified as false (do not compute in parallel) or true (compute in parallel). Computing in parallel requires Parallel Computing Toolbox. bayesopt performs parallel objective function evaluations concurrently on parallel workers. For algorithmic details, see “Parallel Bayesian Optimization” on page 10-7. Example: 'UseParallel',true Data Types: logical ParallelMethod — Imputation method for parallel worker objective function values 'clipped-model-prediction' (default) | 'model-prediction' | 'max-observed' | 'minobserved' Imputation method for parallel worker objective function values, specified as 'clipped-modelprediction', 'model-prediction', 'max-observed', or 'min-observed'. To generate a new point to evaluate, bayesopt fits a Gaussian process to all points, including the points being evaluated on workers. To fit the process, bayesopt imputes objective function values for the points that are currently on workers. ParallelMethod specifies the method used for imputation. • 'clipped-model-prediction' — Impute the maximum of these quantities: • Mean Gaussian process prediction at the point x • Minimum observed objective function among feasible points visited • Minimum model prediction among all feasible points • 'model-prediction' — Impute the mean Gaussian process prediction at the point x. • 'max-observed' — Impute the maximum observed objective function value among feasible points. • 'min-observed' — Impute the minimum observed objective function value among feasible points. Example: 'ParallelMethod','max-observed' MinWorkerUtilization — Tolerance on number of active parallel workers floor(0.8*Nworkers) (default) | positive integer 35-169
Tolerance on the number of active parallel workers, specified as a positive integer. After bayesopt assigns a point to evaluate, and before it computes a new point to assign, it checks whether fewer than MinWorkerUtilization workers are active. If so, bayesopt assigns random points within bounds to all available workers. Otherwise, bayesopt calculates the best point for one worker. bayesopt creates random points much faster than fitted points, so this behavior leads to higher utilization of workers, at the cost of possibly poorer points. For details, see “Parallel Bayesian Optimization” on page 10-7. Example: 'MinWorkerUtilization',3 Data Types: double Starting and Stopping
MaxObjectiveEvaluations — Objective function evaluation limit 30 (default) | positive integer Objective function evaluation limit, specified as a positive integer. Example: 'MaxObjectiveEvaluations',60 Data Types: double MaxTime — Time limit Inf (default) | positive real Time limit, specified as a positive real. The time limit is in seconds, as measured by tic and toc. Run time can exceed MaxTime because bayesopt does not interrupt function evaluations. Example: 'MaxTime',3600 Data Types: double NumSeedPoints — Number of initial evaluation points 4 (default) | positive integer Number of initial evaluation points, specified as a positive integer. bayesopt chooses these points randomly within the variable bounds, according to the setting of the Transform setting for each variable (uniform for 'none', logarithmically spaced for 'log'). Example: 'NumSeedPoints',10 Data Types: double Constraints
XConstraintFcn — Deterministic constraints on variables [] (default) | function handle Deterministic constraints on variables, specified as a function handle. For details, see “Deterministic Constraints — XConstraintFcn” on page 10-39. Example: 'XConstraintFcn',@xconstraint Data Types: function_handle ConditionalVariableFcn — Conditional variable constraints [] (default) | function handle 35-170
Conditional variable constraints, specified as a function handle. For details, see “Conditional Constraints — ConditionalVariableFcn” on page 10-40. Example: 'ConditionalVariableFcn',@condfun Data Types: function_handle NumCoupledConstraints — Number of coupled constraints 0 (default) | positive integer Number of coupled constraints, specified as a positive integer. For details, see “Coupled Constraints” on page 10-41. Note NumCoupledConstraints is required when you have coupled constraints. Example: 'NumCoupledConstraints',3 Data Types: double AreCoupledConstraintsDeterministic — Indication of whether coupled constraints are deterministic true for all coupled constraints (default) | logical vector Indication of whether coupled constraints are deterministic, specified as a logical vector of length NumCoupledConstraints. For details, see “Coupled Constraints” on page 10-41. Example: 'AreCoupledConstraintsDeterministic',[true,false,true] Data Types: logical Reports, Plots, and Halting
Verbose — Command-line display level 1 (default) | 0 | 2 Command-line display level, specified as 0, 1, or 2. • 0 — No command-line display. • 1 — At each iteration, display the iteration number, result report (see the next paragraph), objective function model, objective function evaluation time, best (lowest) observed objective function value, best (lowest) estimated objective function value, and the observed constraint values (if any). When optimizing in parallel, the display also includes a column showing the number of active workers, counted after assigning a job to the next worker. The result report for each iteration is one of the following: • Accept — The objective function returns a finite value, and all constraints are satisfied. • Best — Constraints are satisfied, and the objective function returns the lowest value among feasible points. • Error — The objective function returns a value that is not a finite real scalar. • Infeas — At least one constraint is violated.
• 2 — Same as 1, adding diagnostic information such as time to select the next point, model fitting time, indication that "plus" acquisition functions declare overexploiting, and parallel workers are being assigned to random points due to low parallel utilization. Example: 'Verbose',2 Data Types: double OutputFcn — Function called after each iteration {} (default) | function handle | cell array of function handles Function called after each iteration, specified as a function handle or cell array of function handles. An output function can halt the solver, and can perform arbitrary calculations, including creating variables or plotting. Specify several output functions using a cell array of function handles. There are two built-in output functions: • @assignInBase — Constructs a BayesianOptimization instance at each iteration and assigns it to a variable in the base workspace. Choose a variable name using the SaveVariableName name-value pair. • @saveToFile — Constructs a BayesianOptimization instance at each iteration and saves it to a file in the current folder. Choose a file name using the SaveFileName name-value pair. You can write your own output functions. For details, see “Bayesian Optimization Output Functions” on page 10-19. Example: 'OutputFcn',{@saveToFile @myOutputFunction} Data Types: cell | function_handle SaveFileName — File name for the @saveToFile output function 'BayesoptResults.mat' (default) | character vector | string scalar File name for the @saveToFile output function, specified as a character vector or string scalar. The file name can include a path, such as '../optimizations/September2.mat'. Example: 'SaveFileName','September2.mat' Data Types: char | string SaveVariableName — Variable name for the @assignInBase output function 'BayesoptResults' (default) | character vector | string scalar Variable name for the @assignInBase output function, specified as a character vector or string scalar. Example: 'SaveVariableName','September2Results' Data Types: char | string PlotFcn — Plot function called after each iteration {@plotObjectiveModel,@plotMinObjective} (default) | 'all' | function handle | cell array of function handles Plot function called after each iteration, specified as 'all', a function handle, or a cell array of function handles. A plot function can halt the solver, and can perform arbitrary calculations, including creating variables, in addition to plotting. 35-172
Specify no plot function as []. 'all' calls all built-in plot functions. Specify several plot functions using a cell array of function handles. The built-in plot functions appear in the following lists.
Model plots (apply when D ≤ 2):
• @plotAcquisitionFunction — Plot the acquisition function surface.
• @plotConstraintModels — Plot each constraint model surface. Negative values indicate feasible points. Also plot a P(feasible) surface. Also plot the error model, if it exists, which ranges from –1 to 1. Negative values mean that the model probably does not error, positive values mean that it probably does error. The model is: Plotted error = 2*Probability(error) – 1.
• @plotObjectiveEvaluationTimeModel — Plot the objective function evaluation time model surface.
• @plotObjectiveModel — Plot the fun model surface, the estimated location of the minimum, and the location of the next proposed point to evaluate. For one-dimensional problems, plot envelopes one credible interval above and below the mean function, and envelopes one noise standard deviation above and below the mean.
Trace plots (apply to all D):
• @plotObjective — Plot each observed function value versus the number of function evaluations.
• @plotObjectiveEvaluationTime — Plot each observed function evaluation run time versus the number of function evaluations.
• @plotMinObjective — Plot the minimum observed and estimated function values versus the number of function evaluations.
• @plotElapsedTime — Plot three curves: the total elapsed time of the optimization, the total function evaluation time, and the total modeling and point selection time, all versus the number of function evaluations.
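If the built-in options do not cover what you need, a custom plot function takes the same (results,state) arguments as an output function. The sketch below is illustrative only; the function name, the use of animatedline, and the listed state values are assumptions for this sketch rather than requirements.

function stop = myMinObjectivePlot(results,state)
% results is the current BayesianOptimization object; state is a stage
% indicator such as 'initial', 'iteration', or 'done'. Return true to stop early.
persistent hLine
stop = false;
switch state
    case 'initial'
        figure
        hLine = animatedline;
        xlabel('Function evaluations')
        ylabel('Minimum observed objective')
    case 'iteration'
        addpoints(hLine,results.NumObjectiveEvaluations,results.MinObjective)
        drawnow
end
end

Pass it to the solver as, for example, bayesopt(fun,vars,'PlotFcn',{@myMinObjectivePlot}).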
You can write your own plot functions. For details, see “Bayesian Optimization Plot Functions” on page 10-11. Note When there are coupled constraints, iterative display and plot functions can give counterintuitive results such as: • A minimum objective plot can increase. • The optimization can declare a problem infeasible even when it showed an earlier feasible point. The reason for this behavior is that the decision about whether a point is feasible can change as the optimization progresses. bayesopt determines feasibility with respect to its constraint model, and 35-173
this model changes as bayesopt evaluates points. So a “minimum objective” plot can increase when the minimal point is later deemed infeasible, and the iterative display can show a feasible point that is later deemed infeasible. Example: 'PlotFcn','all' Data Types: char | string | cell | function_handle Initialization
InitialX — Initial evaluation points
NumSeedPoints-by-D random initial points within bounds (default) | N-by-D table
Initial evaluation points, specified as an N-by-D table, where N is the number of evaluation points, and D is the number of variables.
Note If only InitialX is provided, it is interpreted as initial points to evaluate. The objective function is evaluated at InitialX. If any other initialization parameters are also provided, InitialX is interpreted as prior function evaluation data. The objective function is not evaluated. Any missing values are set to NaN.
Data Types: table
InitialObjective — Objective values corresponding to InitialX
[] (default) | length-N vector
Objective values corresponding to InitialX, specified as a length-N vector, where N is the number of evaluation points.
Example: 'InitialObjective',[17;-3;-12.5]
Data Types: double
InitialConstraintViolations — Constraint violations of coupled constraints
[] (default) | N-by-K matrix
Constraint violations of coupled constraints, specified as an N-by-K matrix, where N is the number of evaluation points and K is the number of coupled constraints. For details, see “Coupled Constraints” on page 10-41.
Data Types: double
InitialErrorValues — Errors for InitialX
[] (default) | length-N vector with entries -1 or 1
Errors for InitialX, specified as a length-N vector with entries -1 or 1, where N is the number of evaluation points. Specify -1 for no error, and 1 for an error.
Example: 'InitialErrorValues',[-1,-1,-1,-1,1]
Data Types: double
InitialUserData — Initial data corresponding to InitialX
[] (default) | length-N cell vector
Initial data corresponding to InitialX, specified as a length-N cell vector, where N is the number of evaluation points.
Example: 'InitialUserData',{2,3,-1}
Data Types: cell
InitialObjectiveEvaluationTimes — Evaluation times of objective function at InitialX
[] (default) | length-N vector
Evaluation times of objective function at InitialX, specified as a length-N vector, where N is the number of evaluation points. Time is measured in seconds.
Data Types: double
InitialIterationTimes — Times for the first N iterations
{} (default) | length-N vector
Times for the first N iterations, specified as a length-N vector, where N is the number of evaluation points. Time is measured in seconds.
Data Types: double
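As a sketch of how the initialization arguments combine, the call below warm-starts bayesopt from previously evaluated points. The table variable name and the numeric values are illustrative assumptions only, and the table variables must match the optimizableVariable definitions in vars:

% Hypothetical prior data: three evaluated points and their objective values.
Xprior = table([2;8;15],'VariableNames',{'x1'});   % assumes vars defines one variable named x1
Fprior = [0.41; 0.27; 0.33];
results = bayesopt(fun,vars, ...
    'InitialX',Xprior,'InitialObjective',Fprior);  % prior data, so these points are not re-evaluated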
Output Arguments results — Bayesian optimization results BayesianOptimization object Bayesian optimization results, returned as a BayesianOptimization object.
More About Coupled Constraints Coupled constraints are those constraints whose value comes from the objective function calculation. See “Coupled Constraints” on page 10-41.
Tips
• Bayesian optimization is not reproducible if one of these conditions exists:
  • You specify an acquisition function whose name includes per-second, such as 'expected-improvement-per-second'. The per-second modifier indicates that optimization depends on the run time of the objective function. For more details, see “Acquisition Function Types” on page 10-3.
  • You specify to run Bayesian optimization in parallel. Due to the nonreproducibility of parallel timing, parallel Bayesian optimization does not necessarily yield reproducible results. For more details, see “Parallel Bayesian Optimization” on page 10-7.
Version History Introduced in R2016b
Extended Capabilities Automatic Parallel Support Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™. To run in parallel, set the UseParallel name-value argument to true in the call to this function. For more general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).
See Also BayesianOptimization | bestPoint | optimizableVariable Topics “Optimize Cross-Validated Classifier Using bayesopt” on page 10-46 “Bayesian Optimization Algorithm” on page 10-2
bbdesign Box-Behnken design
Syntax dBB = bbdesign(n) [dBB,blocks] = bbdesign(n) [...] = bbdesign(n,param,val)
Description
dBB = bbdesign(n) generates a Box-Behnken design for n factors. n must be an integer 3 or larger. The output matrix dBB is m-by-n, where m is the number of runs in the design. Each row represents one run, with settings for all factors represented in the columns. Factor values are normalized so that the cube points take values between -1 and 1.
[dBB,blocks] = bbdesign(n) requests a blocked design. The output blocks is an m-by-1 vector of block numbers for each run. Blocks indicate runs that are to be measured under similar conditions to minimize the effect of inter-block differences on the parameter estimates.
[...] = bbdesign(n,param,val) specifies one or more optional parameter/value pairs for the design. The following are the valid parameter/value pairs.
'center'
Number of center points. Integer. The default depends on n.
'blocksize'
Maximum number of points per block. Integer. The default is Inf.
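For illustration, the two pairs can be combined in one call; the particular values here are arbitrary assumptions:

% Blocked 4-factor design with 3 center points and at most 9 runs per block.
[dBB,blocks] = bbdesign(4,'center',3,'blocksize',9);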
Examples

Create and Visualize 3-Factor Box-Behnken Design

Create a 3-factor Box-Behnken design.

dBB = bbdesign(3)

dBB = 15×3
    -1    -1     0
    -1     1     0
     1    -1     0
     1     1     0
    -1     0    -1
    -1     0     1
     1     0    -1
     1     0     1
     0    -1    -1
     0    -1     1
      ⋮

The center point is run 3 times to allow for a more uniform estimate of the prediction variance over the entire design space.

Visualize the design as follows:

plot3(dBB(:,1),dBB(:,2),dBB(:,3),'ro', ...
    'MarkerFaceColor','b')
X = [1 -1 -1 -1 1 -1 -1 -1 1 1 -1 -1; ...
    1 1 1 -1 1 1 1 -1 1 1 -1 -1];
Y = [-1 -1 1 -1 -1 -1 1 -1 1 -1 1 -1; ...
    1 -1 1 1 1 -1 1 1 1 -1 1 -1];
Z = [1 1 1 1 -1 -1 -1 -1 -1 -1 -1 -1; ...
    1 1 1 1 -1 -1 -1 -1 1 1 1 1];
line(X,Y,Z,'Color','b')
axis square equal
Version History Introduced before R2006a
See Also ccdesign
bestPoint Best point in a Bayesian optimization according to a criterion
Syntax x = bestPoint(results) x = bestPoint(results,Name,Value) [x,CriterionValue] = bestPoint( ___ ) [x,CriterionValue,iteration] = bestPoint( ___ )
Description
x = bestPoint(results) returns the best feasible point in the Bayesian model results according to the default criterion 'min-visited-upper-confidence-interval'.
x = bestPoint(results,Name,Value) modifies the best point using name-value pairs.
[x,CriterionValue] = bestPoint( ___ ), for any previous syntax, also returns the value of the criterion at x.
[x,CriterionValue,iteration] = bestPoint( ___ ) also returns the iteration number at which the best point was returned. Applies when the Criterion name-value pair is 'min-observed', 'min-visited-mean', or the default 'min-visited-upper-confidence-interval'.
Examples

Best Point of an Optimized KNN Classifier

This example shows how to obtain the best point of an optimized classifier. Optimize a KNN classifier for the ionosphere data, meaning find parameters that minimize the cross-validation loss. Minimize over nearest-neighborhood sizes from 1 to 30, and over the distance functions 'chebychev', 'euclidean', and 'minkowski'.

For reproducibility, set the random seed, and set the AcquisitionFunctionName option to 'expected-improvement-plus'.

load ionosphere
rng(11)
num = optimizableVariable('n',[1,30],'Type','integer');
dst = optimizableVariable('dst',{'chebychev','euclidean','minkowski'},'Type','categorical');
c = cvpartition(351,'Kfold',5);
fun = @(x)kfoldLoss(fitcknn(X,Y,'CVPartition',c,'NumNeighbors',x.n,...
    'Distance',char(x.dst),'NSMethod','exhaustive'));
results = bayesopt(fun,[num,dst],'Verbose',0,...
    'AcquisitionFunctionName','expected-improvement-plus');

Obtain the best point according to the default 'min-visited-upper-confidence-interval' criterion.

x = bestPoint(results)

x=1×2 table
    n       dst
    _    _________
    1    chebychev
The lowest estimated cross-validation loss occurs for one nearest neighbor and 'chebychev' distance. Careful examination of the objective function model plot shows a point with two nearest neighbors and 'chebychev' distance that has a lower objective function value. Find this point using a different criterion.

x = bestPoint(results,'Criterion','min-observed')

x=1×2 table
    n       dst
    _    _________
    2    chebychev

Also find the minimum observed objective function value, and the iteration number at which it was observed.

[x,CriterionValue,iteration] = bestPoint(results,'Criterion','min-observed')

x=1×2 table
    n       dst
    _    _________
    2    chebychev

CriterionValue = 0.1054
iteration = 21
Input Arguments results — Bayesian optimization results BayesianOptimization object Bayesian optimization results, specified as a BayesianOptimization object. Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: x = bestPoint(results,'Criterion','min-observed') Criterion — Best point criterion 'min-visited-upper-confidence-interval' (default) | 'min-observed' | 'min-mean' | 'min-upper-confidence-interval' | 'min-visited-mean' Best point criterion, specified as the comma-separated pair consisting of 'Criterion' and a criterion name. The names are case-insensitive, do not require - characters, and require only enough characters to make the name uniquely distinguishable. Criterion Name
Meaning
'min-observed'
x is the feasible point with minimum observed objective.
'min-mean'
x is the feasible point where the objective model mean is minimized.
'min-upper-confidence-interval'
x is the feasible point minimizing an upper confidence interval of the objective model. See alpha.
'min-visited-mean'
x is the feasible point where the objective model mean is minimized among the visited points.
'min-visited-upper-confidence-interval'
x is the feasible point minimizing an upper confidence interval of the objective model among the visited points. See alpha.
Example: 'Criterion','min-visited-mean'
alpha — Probability that modeled objective mean exceeds CriterionValue
0.01 (default) | scalar between 0 and 1
Probability that the modeled objective mean exceeds CriterionValue, specified as the comma-separated pair consisting of 'alpha' and a scalar between 0 and 1. alpha relates to the 'min-upper-confidence-interval' and 'min-visited-upper-confidence-interval' Criterion values. The definition for the upper confidence interval is the value Y where P(meanQ(fun(x)) > Y) = alpha, where fun is the objective function, and the mean is calculated with respect to the posterior distribution Q.
Example: 'alpha',0.05
Data Types: double
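For example, lowering alpha from its default makes the upper-confidence-interval criteria more conservative. A small sketch, assuming results is an existing BayesianOptimization object:

[x,CriterionValue] = bestPoint(results, ...
    'Criterion','min-upper-confidence-interval','alpha',0.05);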
Output Arguments x — Best point 1-by-D table Best point, returned as a 1-by-D table, where D is the number of variables. The meaning of “best” is with respect to Criterion. CriterionValue — Value of criterion real scalar Value of criterion, returned as a real scalar. The value depends on the setting of the Criterion name-value pair, which has a default value of 'min-visited-upper-confidence-interval'. Criterion Name
Meaning
'min-observed'
Minimum observed objective.
'min-mean'
Minimum of model mean.
'min-upper-confidence-interval'
Value Y satisfying the equation P(meanQ(fun(x)) > Y) = alpha.
'min-visited-mean'
Minimum of observed model mean.
'min-visited-upper-confidence-interval'
Value Y satisfying the equation P(meanQ(fun(x)) > Y) = alpha among observed points.
iteration — Iteration number at which best point was observed positive integer Iteration number at which best point was observed, returned as a positive integer. The best point is defined by CriterionValue.
Version History Introduced in R2016b
See Also BayesianOptimization | bayesopt
betacdf Beta cumulative distribution function
Syntax p = betacdf(x,a,b) p = betacdf(x,a,b,'upper')
Description
p = betacdf(x,a,b) returns the beta cdf at each of the values in x using the corresponding parameters in a and b. x, a, and b can be vectors, matrices, or multidimensional arrays that all have the same size. A scalar input is expanded to a constant array with the same dimensions as the other inputs. The parameters in a and b must all be positive, and the values in x must lie on the interval [0,1].
p = betacdf(x,a,b,'upper') returns the complement of the beta cdf at each of the values in x, using an algorithm that more accurately computes the extreme upper tail probabilities.
The beta cdf for a given value x and given pair of parameters a and b is

p = F(x \mid a,b) = \frac{1}{B(a,b)} \int_0^x t^{a-1} (1-t)^{b-1} \, dt

where B( · ) is the Beta function.
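Because this cdf is the regularized incomplete beta function, you can cross-check betacdf against betainc. A minimal sketch with arbitrary values:

x = 0.37; a = 2; b = 5;
p1 = betacdf(x,a,b);
p2 = betainc(x,a,b);   % regularized incomplete beta function
% p1 and p2 agree to within floating-point rounding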
Examples

Compute Beta Distribution cdf

Compute the cdf for a beta distribution with parameters a = 2 and b = 2.

x = 0.1:0.2:0.9;
a = 2;
b = 2;
p = betacdf(x,a,b)

p = 1×5
    0.0280    0.2160    0.5000    0.7840    0.9720

a = [1 2 3];
p = betacdf(0.5,a,a)

p = 1×3
    0.5000    0.5000    0.5000
Version History Introduced before R2006a
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also cdf | betapdf | betainv | betastat | betalike | betarnd | betafit Topics “Beta Distribution” on page B-6
betafit Beta parameter estimates
Syntax phat = betafit(data) [phat,pci] = betafit(data,alpha)
Description
phat = betafit(data) computes the maximum likelihood estimates of the beta distribution parameters a and b from the data in the vector data and returns a column vector containing the a and b estimates, where the beta cdf is given by

F(x \mid a,b) = \frac{1}{B(a,b)} \int_0^x t^{a-1} (1-t)^{b-1} \, dt

and B( · ) is the Beta function. The elements of data must lie in the open interval (0, 1), where the beta distribution is defined. However, it is sometimes also necessary to fit a beta distribution to data that include exact zeros or ones. For such data, the beta likelihood function is unbounded, and standard maximum likelihood estimation is not possible. In that case, betafit maximizes a modified likelihood that incorporates the zeros or ones by treating them as if they were values that have been left-censored at sqrt(realmin) or right-censored at 1-eps/2, respectively.
[phat,pci] = betafit(data,alpha) returns confidence intervals on the a and b parameters in the 2-by-2 matrix pci. The first column of the matrix contains the lower and upper confidence bounds for parameter a, and the second column contains the confidence bounds for parameter b. The optional input argument alpha is a value in the range [0, 1] specifying the width of the confidence intervals. By default, alpha is 0.05, which corresponds to 95% confidence intervals. The confidence intervals are based on a normal approximation for the distribution of the logs of the parameter estimates.
Examples
This example generates 100 beta distributed observations. The true a and b parameters are 4 and 3, respectively. Compare these to the values returned in p by the beta fit. Note that the columns of ci both bracket the true parameters.

data = betarnd(4,3,100,1);
[p,ci] = betafit(data,0.01)

p =
    5.5328    3.8097
ci =
    3.6538    2.6197
    8.3781    5.5402
Version History Introduced before R2006a
References [1] Hahn, Gerald J., and S. S. Shapiro. Statistical Models in Engineering. Hoboken, NJ: John Wiley & Sons, Inc., 1994, p. 95.
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also mle | betapdf | betainv | betastat | betalike | betarnd | betacdf Topics “Beta Distribution” on page B-6
betainv Beta inverse cumulative distribution function
Syntax X = betainv(P,A,B)
Description
X = betainv(P,A,B) computes the inverse of the beta cdf with parameters specified by A and B for the corresponding probabilities in P. P, A, and B can be vectors, matrices, or multidimensional arrays that are all the same size. A scalar input is expanded to a constant array with the same dimensions as the other inputs. The parameters in A and B must all be positive, and the values in P must lie on the interval [0, 1].
The inverse beta cdf for a given probability p and a given pair of parameters a and b is

x = F^{-1}(p \mid a,b) = \{ x : F(x \mid a,b) = p \}

where

p = F(x \mid a,b) = \frac{1}{B(a,b)} \int_0^x t^{a-1} (1-t)^{b-1} \, dt

and B( · ) is the Beta function. Each element of output X is the value whose cumulative probability under the beta cdf defined by the corresponding parameters in A and B is specified by the corresponding value in P.
Examples

p = [0.01 0.5 0.99];
x = betainv(p,10,5)

x =
    0.3726    0.6742    0.8981
According to this result, for a beta cdf with a = 10 and b = 5, a value less than or equal to 0.3726 occurs with probability 0.01. Similarly, values less than or equal to 0.6742 and 0.8981 occur with respective probabilities 0.5 and 0.99.
Algorithms The betainv function uses Newton's method with modifications to constrain steps to the allowable range for x, i.e., [0 1].
Version History Introduced before R2006a
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also icdf | betapdf | betafit | betastat | betalike | betarnd | betacdf Topics “Beta Distribution” on page B-6
betalike Beta negative log-likelihood
Syntax nlogL = betalike(params,data) [nlogL,AVAR] = betalike(params,data)
Description nlogL = betalike(params,data) returns the negative of the beta log-likelihood function for the beta parameters a and b specified in vector params and the observations specified in the column vector data. The elements of data must lie in the open interval (0, 1), where the beta distribution is defined. However, it is sometimes also necessary to fit a beta distribution to data that include exact zeros or ones. For such data, the beta likelihood function is unbounded, and standard maximum likelihood estimation is not possible. In that case, betalike computes a modified likelihood that incorporates the zeros or ones by treating them as if they were values that have been left-censored at sqrt(realmin) or right-censored at 1-eps/2, respectively. [nlogL,AVAR] = betalike(params,data) also returns AVAR, which is the asymptotic variancecovariance matrix of the parameter estimates if the values in params are the maximum likelihood estimates. AVAR is the inverse of Fisher's information matrix. The diagonal elements of AVAR are the asymptotic variances of their respective parameters. betalike is a utility function for maximum likelihood estimation of the beta distribution. The likelihood assumes that all the elements in the data sample are mutually independent. Since betalike returns the negative beta log-likelihood function, minimizing betalike using fminsearch is the same as maximizing the likelihood.
Examples
This example continues the betafit example, which calculates estimates of the beta parameters for some randomly generated beta distributed data.

r = betarnd(4,3,100,1);
[nlogl,AVAR] = betalike(betafit(r),r)

nlogl =
  -27.5996
AVAR =
    0.2783    0.1316
    0.1316    0.0867
Version History Introduced before R2006a
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also betapdf | betafit | betainv | betastat | betarnd | betacdf Topics “Beta Distribution” on page B-6
betapdf Beta probability density function
Syntax Y = betapdf(X,A,B)
Description
Y = betapdf(X,A,B) computes the beta pdf at each of the values in X using the corresponding parameters in A and B. X, A, and B can be vectors, matrices, or multidimensional arrays that all have the same size. A scalar input is expanded to a constant array with the same dimensions as the other inputs. The parameters in A and B must all be positive, and the values in X must lie on the interval [0, 1].
The beta probability density function for a given value x and given pair of parameters a and b is

y = f(x \mid a,b) = \frac{1}{B(a,b)} x^{a-1} (1-x)^{b-1} I_{[0,1]}(x)

where B( · ) is the Beta function. The uniform distribution on (0 1) is a degenerate case of the beta pdf where a = 1 and b = 1.
A likelihood function is the pdf viewed as a function of the parameters. Maximum likelihood estimators (MLEs) are the values of the parameters that maximize the likelihood function for a fixed value of x.
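The degenerate uniform case is easy to confirm numerically; a quick sketch:

xx = 0:0.25:1;
betapdf(xx,1,1)   % returns 1 at every point: the uniform density on [0,1]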
Examples

a = [0.5 1; 2 4]

a =
    0.5000    1.0000
    2.0000    4.0000

y = betapdf(0.5,a,a)

y =
    0.6366    1.0000
    1.5000    2.1875
Version History Introduced before R2006a
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also pdf | betafit | betainv | betastat | betalike | betarnd | betacdf Topics “Beta Distribution” on page B-6
betarnd Beta random numbers
Syntax R = betarnd(A,B) R = betarnd(A,B,m,n,...) R = betarnd(A,B,[m,n,...])
Description R = betarnd(A,B) generates random numbers from the beta distribution with parameters specified by A and B. A and B can be vectors, matrices, or multidimensional arrays that have the same size, which is also the size of R. A scalar input for A or B is expanded to a constant array with the same dimensions as the other input. R = betarnd(A,B,m,n,...) or R = betarnd(A,B,[m,n,...]) generates an m-by-n-by-... array containing random numbers from the beta distribution with parameters A and B. A and B can each be scalars or arrays of the same size as R.
Examples

a = [1 1;2 2];
b = [1 2;1 2];
r = betarnd(a,b)

r =
    0.6987    0.6139
    0.9102    0.8067

r = betarnd(10,10,[1 5])

r =
    0.5974    0.4777    0.5538    0.5465    0.6327

r = betarnd(4,2,2,3)

r =
    0.3943    0.6101    0.5768
    0.5990    0.2760    0.5474
Version History Introduced before R2006a
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations:
The generated code can return a different sequence of numbers than MATLAB if either of the following is true: • The output is nonscalar. • An input parameter is invalid for the distribution. For more information on code generation, see “Introduction to Code Generation” on page 34-3 and “General Code Generation Workflow” on page 34-6. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also random | betapdf | betafit | betainv | betastat | betalike | betacdf Topics “Beta Distribution” on page B-6
betastat Beta mean and variance
Syntax [M,V] = betastat(A,B)
Description
[M,V] = betastat(A,B), with A>0 and B>0, returns the mean of and variance for the beta distribution with parameters specified by A and B. A and B can be vectors, matrices, or multidimensional arrays that have the same size, which is also the size of M and V. A scalar input for A or B is expanded to a constant array with the same dimensions as the other input.
The mean of the beta distribution with parameters a and b is a/(a + b) and the variance is

\frac{ab}{(a+b)^{2}(a+b+1)}

Examples
If parameters a and b are equal, the mean is 1/2.

a = 1:6;
[m,v] = betastat(a,a)

m =
    0.5000    0.5000    0.5000    0.5000    0.5000    0.5000
v =
    0.0833    0.0500    0.0357    0.0278    0.0227    0.0192
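You can also check a single pair of parameters against the closed-form mean and variance; a small sketch with arbitrary values:

a = 2; b = 5;
[m,v] = betastat(a,b);
mCheck = a/(a+b);                    % 0.2857
vCheck = a*b/((a+b)^2*(a+b+1));      % 0.0255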
Version History Introduced before R2006a
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also betapdf | betafit | betainv | betalike | betarnd | betacdf
Topics “Beta Distribution” on page B-6
binocdf Binomial cumulative distribution function
Syntax y = binocdf(x,n,p) y = binocdf(x,n,p,'upper')
Description y = binocdf(x,n,p) computes a binomial cumulative distribution function at each of the values in x using the corresponding number of trials in n and the probability of success for each trial in p. x, n, and p can be vectors, matrices, or multidimensional arrays of the same size. Alternatively, one or more arguments can be scalars. The binocdf function expands scalar inputs to constant arrays with the same dimensions as the other inputs. y = binocdf(x,n,p,'upper') returns the complement of the binomial cumulative distribution function at each value in x, using an algorithm that computes the extreme upper tail probabilities more accurately than the default algorithm.
Examples Compute and Plot Binomial Cumulative Distribution Function Compute and plot the binomial cumulative distribution function for the specified range of integer values, number of trials, and probability of success for each trial. A baseball team plays 100 games in a season and has a 50-50 chance of winning each game. Find the probability of the team winning more than 55 games in a season. format long 1 - binocdf(55,100,0.5) ans = 0.135626512036917
Find the probability of the team winning between 50 and 55 games in a season. binocdf(55,100,0.5) - binocdf(49,100,0.5) ans = 0.404168106656672
Compute the probabilities of the team winning more than 55 games in a season if the chance of winning each game ranges from 10% to 90%. chance = 0.1:0.05:0.9; y = 1 - binocdf(55,100,chance);
Plot the results. scatter(chance,y) grid on
Compute Extreme Upper Tail Probabilities Compute the complement of the binomial cumulative distribution function with more accurate upper tail probabilities. A baseball team plays 100 games in a season and has a 50-50 chance of winning each game. Find the probability of the team winning more than 95 games in a season. format long 1 - binocdf(95,100,0.5) ans = 0
This result shows that the probability is so close to 1 (within eps) that subtracting it from 1 gives 0. To approximate the extreme upper tail probabilities better, compute the complement of the binomial cumulative distribution function directly instead of computing the difference. binocdf(95,100,0.5,'upper')
ans = 3.224844447881779e-24
Alternatively, use the binopdf function to find the probabilities of the team winning 96, 97, 98, 99, and 100 games in a season. Find the sum of these probabilities by using the sum function. sum(binopdf(96:100,100,0.5),'all') ans = 3.224844447881779e-24
Input Arguments x — Values at which to evaluate binomial cdf integer from interval [0 n] | array of integers from interval [0 n] Values at which to evaluate the binomial cdf, specified as an integer or an array of integers. All values of x must belong to the interval [0 n], where n is the number of trials. Example: [0 1 3 4] Data Types: single | double n — Number of trials positive integer | array of positive integers Number of trials, specified as a positive integer or an array of positive integers. Example: [10 20 50 100] Data Types: single | double p — Probability of success for each trial scalar value from interval [0 1] | array of scalar values from interval [0 1] Probability of success for each trial, specified as a scalar value or an array of scalar values. All values of p must belong to the interval [0 1]. Example: [0.01 0.1 0.5 0.7] Data Types: single | double
Output Arguments y — Binomial cdf values scalar value | array of scalar values Binomial cdf values, returned as a scalar value or an array of scalar values. Each element in y is the binomial cdf value of the distribution evaluated at the corresponding element in x. Data Types: single | double
More About
Binomial Cumulative Distribution Function
The binomial cumulative distribution function lets you obtain the probability of observing less than or equal to x successes in n trials, with the probability p of success on a single trial. The binomial cumulative distribution function for a given value x and a given pair of parameters n and p is

y = F(x \mid n,p) = \sum_{i=0}^{x} \binom{n}{i} p^{i} (1-p)^{(n-i)} I_{(0,1,...,n)}(i)

The resulting value y is the probability of observing up to x successes in n independent trials, where the probability of success in any given trial is p. The indicator function I_{(0,1,...,n)}(i) ensures that x only adopts values of 0,1,...,n.
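Equivalently, the cdf is the running sum of binopdf values, which you can verify directly; a short sketch with arbitrary values:

x = 3; n = 10; p = 0.2;
y1 = binocdf(x,n,p);
y2 = sum(binopdf(0:x,n,p));   % same value up to rounding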
Alternative Functionality • binocdf is a function specific to binomial distribution. Statistics and Machine Learning Toolbox also offers the generic function cdf, which supports various probability distributions. To use cdf, specify the probability distribution name and its parameters. Alternatively, create a BinomialDistribution probability distribution object and pass the object as an input argument. Note that the distribution-specific function binocdf is faster than the generic function cdf. • Use the Probability Distribution Function app to create an interactive plot of the cumulative distribution function (cdf) or probability density function (pdf) for a probability distribution.
Version History Introduced before R2006a
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also cdf | binopdf | binoinv | binostat | binofit | binornd | BinomialDistribution Topics “Binomial Distribution” on page B-10
binofit Binomial parameter estimates
Syntax phat = binofit(x,n) [phat,pci] = binofit(x,n) [phat,pci] = binofit(x,n,alpha)
Description phat = binofit(x,n) returns a maximum likelihood estimate of the probability of success in a given binomial trial based on the number of successes, x, observed in n independent trials. If x = (x(1), x(2), ... x(k)) is a vector, binofit returns a vector of the same size as x whose ith entry is the parameter estimate for x(i). All k estimates are independent of each other. If n = (n(1), n(2), ..., n(k)) is a vector of the same size as x, the binomial fit, binofit, returns a vector whose ith entry is the parameter estimate based on the number of successes x(i) in n(i) independent trials. A scalar value for x or n is expanded to the same size as the other input. [phat,pci] = binofit(x,n) returns the probability estimate, phat, and the 95% confidence intervals, pci. binofit uses the Clopper-Pearson method to calculate confidence intervals. [phat,pci] = binofit(x,n,alpha) returns the 100(1 - alpha)% confidence intervals. For example, alpha = 0.01 yields 99% confidence intervals. Note binofit behaves differently than other Statistics and Machine Learning Toolbox functions that compute parameter estimates, in that it returns independent estimates for each entry of x. By comparison, expfit returns a single parameter estimate based on all the entries of x. Unlike most other distribution fitting functions, the binofit function treats its input x vector as a collection of measurements from separate samples. If you want to treat x as a single sample and compute a single parameter estimate for it, you can use binofit(sum(x),sum(n)) when n is a vector, and binofit(sum(X),N*length(X)) when n is a scalar.
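For example, to pool a vector of counts into one estimate as the note describes, sum the successes and trials first; the counts below are made up for illustration:

x = [7 9 6];                         % successes in each sample
n = [20 25 15];                      % trials in each sample
phatEach = binofit(x,n);             % one estimate per sample
phatPooled = binofit(sum(x),sum(n)); % single pooled estimate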
Examples
This example generates a binomial sample of 100 elements, where the probability of success in a given trial is 0.6, and then estimates this probability from the outcomes in the sample.

r = binornd(100,0.6);
[phat,pci] = binofit(r,100)

phat =
    0.5800
pci =
    0.4771    0.6780
The 95% confidence interval, pci, contains the true value, 0.6.
Version History Introduced before R2006a
References [1] Johnson, N. L., S. Kotz, and A. W. Kemp. Univariate Discrete Distributions. Hoboken, NJ: Wiley-Interscience, 1993.
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also mle | binopdf | binocdf | binoinv | binostat | binornd Topics “Binomial Distribution” on page B-10
binoinv Binomial inverse cumulative distribution function
Syntax X = binoinv(Y,N,P)
Description X = binoinv(Y,N,P) returns the smallest integer X such that the binomial cdf evaluated at X is equal to or exceeds Y. You can think of Y as the probability of observing X successes in N independent trials where P is the probability of success in each trial. Each X is a positive integer less than or equal to N. Y, N, and P can be vectors, matrices, or multidimensional arrays that all have the same size. A scalar input is expanded to a constant array with the same dimensions as the other inputs. The parameters in N must be positive integers, and the values in both P and Y must lie on the interval [0 1].
Examples If a baseball team has a 50-50 chance of winning any game, what is a reasonable range of games this team might win over a season of 162 games? binoinv([0.05 0.95],162,0.5) ans = 71 91
This result means that in 90% of baseball seasons, a .500 team should win between 71 and 91 games.
Version History Introduced before R2006a
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also icdf | binopdf | binocdf | binofit | binostat | binornd
Topics “Binomial Distribution” on page B-10
binopdf Binomial probability density function
Syntax y = binopdf(x,n,p)
Description y = binopdf(x,n,p) computes the binomial probability density function at each of the values in x using the corresponding number of trials in n and probability of success for each trial in p. x, n, and p can be vectors, matrices, or multidimensional arrays of the same size. Alternatively, one or more arguments can be scalars. The binopdf function expands scalar inputs to constant arrays with the same dimensions as the other inputs.
Examples Compute and Plot Binomial Probability Density Function Compute and plot the binomial probability density function for the specified range of integer values, number of trials, and probability of success for each trial. In one day, a quality assurance inspector tests 200 circuit boards. 2% of the boards have defects. Compute the probability that the inspector will find no defective boards on any given day. binopdf(0,200,0.02) ans = 0.0176
Compute the binomial probability density function values at each value from 0 to 200. These values correspond to the probabilities that the inspector will find 0, 1, 2, ..., 200 defective boards on any given day. defects = 0:200; y = binopdf(defects,200,.02);
Plot the resulting binomial probability values. plot(defects,y)
Compute the most likely number of defective boards that the inspector finds in a day. [x,i] = max(y); defects(i) ans = 4
Input Arguments x — Values at which to evaluate binomial pdf integer from interval [0 n] | array of integers from interval [0 n] Values at which to evaluate the binomial pdf, specified as an integer or an array of integers. All values of x must belong to the interval [0 n], where n is the number of trials. Example: [0,1,3,4] Data Types: single | double n — Number of trials positive integer | array of positive integers Number of trials, specified as a positive integer or an array of positive integers. Example: [10,20,50,100] Data Types: single | double 35-209
p — Probability of success for each trial scalar value from interval [0 1] | array of scalar values from interval [0 1] Probability of success for each trial, specified as a scalar value or an array of scalar values. All values of p must belong to the interval [0 1]. Example: [0.01,0.1,0.5,0.7] Data Types: single | double
Output Arguments y — Binomial pdf values scalar value | array of scalar values Binomial pdf values, returned as a scalar value or array of scalar values. Each element in y is the binomial pdf value of the distribution evaluated at the corresponding element in x. Data Types: single | double
More About
Binomial Probability Density Function
The binomial probability density function lets you obtain the probability of observing exactly x successes in n trials, with the probability p of success on a single trial. The binomial probability density function for a given value x and given pair of parameters n and p is

y = f(x \mid n,p) = \binom{n}{x} p^{x} q^{(n-x)} I_{(0,1,...,n)}(x)

where q = 1 – p. The resulting value y is the probability of observing exactly x successes in n independent trials, where the probability of success in any given trial is p. The indicator function I_{(0,1,...,n)}(x) ensures that x only adopts values of 0, 1, ..., n.
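You can reproduce a single pdf value from this formula with nchoosek; a quick sketch with arbitrary values:

x = 3; n = 10; p = 0.2;
y1 = binopdf(x,n,p);
y2 = nchoosek(n,x)*p^x*(1-p)^(n-x);   % matches y1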
Alternative Functionality • binopdf is a function specific to binomial distribution. Statistics and Machine Learning Toolbox also offers the generic function pdf, which supports various probability distributions. To use pdf, specify the probability distribution name and its parameters. Alternatively, create a BinomialDistribution probability distribution object and pass the object as an input argument. Note that the distribution-specific function binopdf is faster than the generic function pdf. • Use the Probability Distribution Function app to create an interactive plot of the cumulative distribution function (cdf) or probability density function (pdf) for a probability distribution.
Version History Introduced before R2006a
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also pdf | binoinv | binocdf | binofit | binostat | binornd | BinomialDistribution Topics “Binomial Distribution” on page B-10
binornd Random numbers from binomial distribution
Syntax r = binornd(n,p) r = binornd(n,p,sz1,...,szN) r = binornd(n,p,sz)
Description r = binornd(n,p) generates random numbers from the binomial distribution specified by the number of trials n and the probability of success for each trial p. n and p can be vectors, matrices, or multidimensional arrays of the same size. Alternatively, one or more arguments can be scalars. The binornd function expands scalar inputs to constant arrays with the same dimensions as the other inputs. The function returns a vector, matrix, or multidimensional array r of the same size as n and p. r = binornd(n,p,sz1,...,szN) generates an array of random numbers from the binomial distribution with the scalar parameters n and p, where sz1,...,szN indicates the size of each dimension. r = binornd(n,p,sz) generates an array of random numbers from the binomial distribution with the scalar parameters n and p, where vector sz specifies size(r).
Examples

Array of Random Numbers from Several Binomial Distributions

Generate an array of random numbers from the binomial distributions. For each distribution, you specify the number of trials and the probability of success for each trial.

Specify the numbers of trials.

n = 10:10:60

n = 1×6
    10    20    30    40    50    60

Specify the probabilities of success for each trial.

p = 1./n

p = 1×6
    0.1000    0.0500    0.0333    0.0250    0.0200    0.0167

Generate random numbers from the binomial distributions.

r = binornd(n,p)

r = 1×6
     0     1     1     0     1     1
Array of Random Numbers from One Binomial Distribution Generate an array of random numbers from one binomial distribution. Here, the distribution parameters n and p are scalars. Use the binornd function to generate random numbers from the binomial distribution with 100 trials, where the probability of success in each trial is 0.2. The function returns one number. r_scalar = binornd(100,0.2) r_scalar = 20
Generate a 2-by-3 array of random numbers from the same distribution by specifying the required array dimensions.

r_array = binornd(100,0.2,2,3)

r_array = 2×3
    18    23    20
    18    24    23

Alternatively, specify the required array dimensions as a vector.

r_array = binornd(100,0.2,[2 3])

r_array = 2×3
    21    21    20
    26    18    23
Input Arguments n — Number of trials positive integer | array of positive integers Number of trials, specified as a positive integer or an array of positive integers. Example: [10 20 50 100] Data Types: single | double p — Probability of success for each trial scalar value | array of scalar values 35-213
Probability of success for each trial, specified as a scalar value or an array of scalar values. All values of p must belong to the interval [0 1]. Example: [0.01 0.1 0.5 0.7] Data Types: single | double sz1,...,szN — Size of each dimension (as separate arguments) integers Size of each dimension, specified as separate arguments of integers. For example, specifying 5,3,2 generates a 5-by-3-by-2 array of random numbers from the binomial probability distribution. If either n or p is an array, then the specified dimensions sz1,...,szN must match the common dimensions of n and p after any necessary scalar expansion. The default values of sz1,...,szN are the common dimensions. • If you specify a single value sz1, then r is a square matrix of size sz1-by-sz1. • If the size of any dimension is 0 or negative, then r is an empty array. • Beyond the second dimension, binornd ignores trailing dimensions with a size of 1. For example, binornd(n,p,3,1,1,1) produces a 3-by-1 vector of random numbers. Example: 5,3,2 Data Types: single | double sz — Size of each dimension (as a row vector) row vector of integers Size of each dimension, specified as a row vector of integers. For example, specifying [5 3 2] generates a 5-by-3-by-2 array of random numbers from the binomial probability distribution. If either n or p is an array, then the specified dimensions sz must match the common dimensions of n and p after any necessary scalar expansion. The default values of sz are the common dimensions. • If you specify a single value [sz1], then r is a square matrix of size sz1-by-sz1. • If the size of any dimension is 0 or negative, then r is an empty array. • Beyond the second dimension, binornd ignores trailing dimensions with a size of 1. For example, binornd(n,p,[3 1 1 1]) produces a 3-by-1 vector of random numbers. Example: [5 3 2] Data Types: single | double
Output Arguments r — Random numbers from binomial distribution scalar value | array of scalar values Random numbers from the binomial distribution, returned as a scalar value or an array of scalar values. Data Types: single | double 35-214
Alternative Functionality • binornd is a function specific to binomial distribution. Statistics and Machine Learning Toolbox also offers the generic function random, which supports various probability distributions. To use random, specify the probability distribution name and its parameters. Alternatively, create a BinomialDistribution probability distribution object and pass the object as an input argument. Note that the distribution-specific function binornd is faster than the generic function random. • To generate random numbers interactively, use randtool, a user interface for random number generation.
Version History Introduced before R2006a
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: The generated code can return a different sequence of numbers than MATLAB in these two cases: • The output is nonscalar. • An input parameter is invalid for the distribution. For more information on code generation, see “Introduction to Code Generation” on page 34-3 and “General Code Generation Workflow” on page 34-6. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox). Distributed Arrays Partition large arrays across the combined memory of your cluster using Parallel Computing Toolbox™. This function fully supports distributed arrays. For more information, see “Run MATLAB Functions with Distributed Arrays” (Parallel Computing Toolbox).
See Also random | binoinv | binocdf | binofit | binostat | binopdf | BinomialDistribution Topics “Binomial Distribution” on page B-10
binostat Binomial mean and variance
Syntax [M,V] = binostat(N,P)
Description [M,V] = binostat(N,P) returns the mean of and variance for the binomial distribution with parameters specified by the number of trials, N, and probability of success for each trial, P. N and P can be vectors, matrices, or multidimensional arrays that have the same size, which is also the size of M and V. A scalar input for N or P is expanded to a constant array with the same dimensions as the other input. The mean of the binomial distribution with parameters n and p is np. The variance is npq, where q = 1 – p.
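A quick check of the np and npq formulas for one pair of parameters:

n = 20; p = 0.3;
[m,v] = binostat(n,p)   % m = 6 (n*p), v = 4.2 (n*p*(1-p))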
Examples

n = logspace(1,5,5)

n =
          10         100        1000       10000      100000

[m,v] = binostat(n,1./n)

m =
     1     1     1     1     1
v =
    0.9000    0.9900    0.9990    0.9999    1.0000

[m,v] = binostat(n,1/2)

m =
           5          50         500        5000       50000
v =
   1.0e+04 *
    0.0003    0.0025    0.0250    0.2500    2.5000
Version History Introduced before R2006a
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also binoinv | binocdf | binofit | binornd | binopdf Topics “Binomial Distribution” on page B-10
binScatterPlot Scatter plot of bins for tall arrays
Syntax binScatterPlot(X,Y) binScatterPlot(X,Y,nbins) binScatterPlot(X,Y,Xedges,Yedges) binScatterPlot(X,Y,Name,Value) h = binScatterPlot( ___ )
Description binScatterPlot(X,Y) creates a binned scatter plot of the data in X and Y. The binScatterPlot function uses an automatic binning algorithm that returns bins with a uniform area, chosen to cover the range of elements in X and Y and reveal the underlying shape of the distribution. binScatterPlot(X,Y,nbins) specifies the number of bins to use in each dimension. binScatterPlot(X,Y,Xedges,Yedges) specifies the edges of the bins in each dimension using the vectors Xedges and Yedges. binScatterPlot(X,Y,Name,Value) specifies additional options with one or more name-value pair arguments using any of the previous syntaxes. For example, you can specify 'Color' and a valid color option to change the color theme of the plot, or 'Gamma' with a positive scalar to adjust the level of detail. h = binScatterPlot( ___ ) returns a Histogram2 object. Use this object to inspect properties of the plot.
Examples

Binned Scatter Plot of Normally Distributed Random Data

Create two tall vectors of random data. Create a binned scatter plot for the data.

When you perform calculations on tall arrays, MATLAB® uses either a parallel pool (default if you have Parallel Computing Toolbox™) or the local MATLAB session. To run the example using the local MATLAB session when you have Parallel Computing Toolbox, change the global execution environment by using the mapreducer function.

mapreducer(0)
X = tall(randn(1e5,1));
Y = tall(randn(1e5,1));
binScatterPlot(X,Y)

Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 2: Completed in 0.34 sec
- Pass 2 of 2: Completed in 0.23 sec
Evaluation completed in 1.1 sec
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 0.14 sec
Evaluation completed in 0.22 sec
The resulting figure contains a slider to adjust the level of detail in the image.
Specify Number of Scatter Plot Bins Specify a scalar value as the third input argument to use the same number of bins in each dimension, or a two-element vector to use a different number of bins in each dimension. When you perform calculations on tall arrays, MATLAB® uses either a parallel pool (default if you have Parallel Computing Toolbox™) or the local MATLAB session. To run the example using the local MATLAB session when you have Parallel Computing Toolbox, change the global execution environment by using the mapreducer function. mapreducer(0)
Plot a binned scatter plot of random data sorted into 100 bins in each dimension.

X = tall(randn(1e5,1));
Y = tall(randn(1e5,1));
binScatterPlot(X,Y,100)

Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 0.28 sec
Evaluation completed in 0.44 sec
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 0.12 sec
Evaluation completed in 0.2 sec

Use 20 bins in the x-dimension and continue to use 100 bins in the y-dimension.

binScatterPlot(X,Y,[20 100])

Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 0.13 sec
Evaluation completed in 0.22 sec
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 0.1 sec
Evaluation completed in 0.16 sec
Specify Bin Edges for Scatter Plot Plot a binned scatter plot of random data with specific bin edges. Use bin edges of Inf and -Inf to capture outliers. When you perform calculations on tall arrays, MATLAB® uses either a parallel pool (default if you have Parallel Computing Toolbox™) or the local MATLAB session. To run the example using the local MATLAB session when you have Parallel Computing Toolbox, change the global execution environment by using the mapreducer function. mapreducer(0)
Create a binned scatter plot with 100 bin edges between [-2 2] in each dimension. The data outside the specified bin edges is not included in the plot.

X = tall(randn(1e5,1));
Y = tall(randn(1e5,1));
Xedges = linspace(-2,2);
Yedges = linspace(-2,2);
binScatterPlot(X,Y,Xedges,Yedges)

Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 1.5 sec
Evaluation completed in 2 sec

Use coarse bins extending to infinity on the edges of the plot to capture outliers.

Xedges = [-Inf linspace(-2,2) Inf];
Yedges = [-Inf linspace(-2,2) Inf];
binScatterPlot(X,Y,Xedges,Yedges)

Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 0.32 sec
Evaluation completed in 0.46 sec
Adjust Plot Color Theme

Plot a binned scatter plot of random data, specifying 'Color' as 'c'.

When you perform calculations on tall arrays, MATLAB® uses either a parallel pool (default if you have Parallel Computing Toolbox™) or the local MATLAB session. To run the example using the local MATLAB session when you have Parallel Computing Toolbox, change the global execution environment by using the mapreducer function.

mapreducer(0)
X = tall(randn(1e5,1));
Y = tall(randn(1e5,1));
binScatterPlot(X,Y,'Color','c')

Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 2: Completed in 1.3 sec
- Pass 2 of 2: Completed in 0.4 sec
Evaluation completed in 3.5 sec
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 0.23 sec
Evaluation completed in 0.34 sec
Input Arguments X,Y — Data to distribute among bins (as separate arguments) tall vectors | tall matrices | tall multidimensional arrays Data to distribute among bins, specified as separate arguments of tall vectors, matrices, or multidimensional arrays. X and Y must be the same size. If X and Y are not vectors, then binScatterPlot treats them as single column vectors, X(:) and Y(:). Corresponding elements in X and Y specify the x and y coordinates of 2-D data points, [X(k),Y(k)]. The underlying data types of X and Y can be different, but binScatterPlot concatenates these inputs into a single N-by-2 tall matrix of the dominant underlying data type. binScatterPlot ignores all NaN values. Similarly, binScatterPlot ignores Inf and -Inf values, unless the bin edges explicitly specify Inf or -Inf as a bin edge. Note If X or Y contain integers of type int64 or uint64 that are larger than flintmax, then it is recommended that you explicitly specify the bin edges.binScatterPlot automatically bins the input data using double precision, which lacks integer precision for numbers greater than flintmax. Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical 35-224
nbins — Number of bins in each dimension scalar | vector Number of bins in each dimension, specified as a positive scalar integer or two-element vector of positive integers. If you do not specify nbins, then binScatterPlot automatically calculates how many bins to use based on the values in X and Y. • If nbins is a scalar, then binScatterPlot uses that many bins in each dimension. • If nbins is a vector, then nbins(1) specifies the number of bins in the x-dimension and nbins(2) specifies the number of bins in the y-dimension. Example: binScatterPlot(X,Y,20) uses 20 bins in each dimension. Example: binScatterPlot(X,Y,[10 20]) uses 10 bins in the x-dimension and 20 bins in the ydimension. Xedges — Bin edges in x-dimension vector Bin edges in x-dimension, specified as a vector. Xedges(1) is the first edge of the first bin in the xdimension, and Xedges(end) is the outer edge of the last bin. The value [X(k),Y(k)] is in the (i,j)th bin if Xedges(i) ≤ X(k) < Xedges(i+1) and Yedges(j) ≤ Y(k) < Yedges(j+1). The last bins in each dimension also include the last (outer) edge. For example, [X(k),Y(k)] falls into the ith bin in the last row if Xedges(end-1) ≤ X(k) ≤ Xedges(end) and Yedges(i) ≤ Y(k) < Yedges(i+1). Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical Yedges — Bin edges in y-dimension vector Bin edges in y-dimension, specified as a vector. Yedges(1) is the first edge of the first bin in the ydimension, and Yedges(end) is the outer edge of the last bin. The value [X(k),Y(k)] is in the (i,j)th bin if Xedges(i) ≤ X(k) < Xedges(i+1) and Yedges(j) ≤ Y(k) < Yedges(j+1). The last bins in each dimension also include the last (outer) edge. For example, [X(k),Y(k)] falls into the ith bin in the last row if Xedges(end-1) ≤ X(k) ≤ Xedges(end) and Yedges(i) ≤ Y(k) < Yedges(i+1). Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: binScatterPlot(X,Y,'BinWidth',[5 10]) BinMethod — Binning algorithm 'auto' (default) | 'scott' | 'integers' 35-225
Binning algorithm, specified as the comma-separated pair consisting of 'BinMethod' and one of these values.
'auto'
The default 'auto' algorithm uses a maximum of 100 bins and chooses a bin width to cover the data range and reveal the shape of the underlying distribution.
'scott'
Scott’s rule is optimal if the data is close to being jointly normally distributed. This rule is appropriate for most other distributions, as well. It uses a bin size of [3.5*std(X)*numel(X)^(-1/4), 3.5*std(Y)*numel(Y)^(-1/4)].
'integers'
The integer rule is useful with integer data, as it creates a bin for each integer. It uses a bin width of 1 and places bin edges halfway between integers. To avoid accidentally creating too many bins, you can use this rule to create a limit of 65536 bins (2^16). If the data range is greater than 65536, then the integer rule uses wider bins instead.
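For example, you can select a rule explicitly in the call; a minimal sketch, assuming X and Y are existing tall vectors as in the earlier examples:

binScatterPlot(X,Y,'BinMethod','scott')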
Note The BinMethod property of the resulting Histogram2 object always has a value of 'manual'. BinWidth — Width of bins in each dimension scalar | vector Width of bins in each dimension, specified as the comma-separated pair consisting of 'BinWidth' and a scalar or two-element vector of positive integers, [xWidth yWidth]. A scalar value indicates the same bin width for each dimension. If you specify BinWidth, then binScatterPlot can use a maximum of 1024 bins (210) along each dimension. If instead the specified bin width requires more bins, then binScatterPlot uses a larger bin width corresponding to the maximum number of bins. Example: binScatterPlot(X,Y,'BinWidth',[5 10]) uses bins with size 5 in the x-dimension and size 10 in the y-dimension. Color — Plot color theme 'b' (default) | 'y' | 'm' | 'c' | 'r' | 'g' | 'k' Plot color theme, specified as the comma-separated pair consisting of 'Color' and one of these options.
'b'   Blue
'm'   Magenta
'c'   Cyan
'r'   Red
'g'   Green
'y'   Yellow
'k'   Black
Gamma — Gamma correction 1 (default) | positive scalar Gamma correction, specified as the comma-separated pair consisting of 'Gamma' and a positive scalar. Use this option to adjust the brightness and color intensity to affect the amount of detail in the image. • gamma < 1 — As gamma decreases, the shading of bins with smaller bin counts becomes progressively darker, including more detail in the image. • gamma > 1 — As gamma increases, the shading of bins with smaller bin counts becomes progressively lighter, removing detail from the image. • The default value of 1 does not apply any correction to the display. XBinLimits — Bin limits in x-dimension vector Bin limits in x-dimension, specified as the comma-separated pair consisting of 'XBinLimits' and a two-element vector, [xbmin,xbmax]. The vector indicates the first and last bin edges in the xdimension. binScatterPlot only plots data that falls within the bin limits inclusively, Data(Data(:,1)>=xbmin & Data(:,1)=ybmin & Data(:,2) 0; X = data(~nans,1:3); Y = data(~nans,4:5);
Compute the sample canonical correlation. [A,B,r,U,V] = canoncorr(X,Y);
View the output of A to determine the linear combinations of displacement, horsepower, and weight that make up the canonical variables of X. A
A = 3×2

    0.0025    0.0048
    0.0202    0.0409
   -0.0000   -0.0027
A(3,1) is displayed as -0.0000 because it is very small. Display A(3,1) separately.
A(3,1)
ans = -2.4737e-05
The first canonical variable of X is u1 = 0.0025*Disp + 0.0202*HP - 0.000025*Wgt. The second canonical variable of X is u2 = 0.0048*Disp + 0.0409*HP - 0.0027*Wgt.
View the output of B to determine the linear combinations of acceleration and MPG that make up the canonical variables of Y.
B
B = 2×2

   -0.1666   -0.3637
   -0.0916    0.1078
The first canonical variable of Y is v1 = -0.1666*Accel - 0.0916*MPG. The second canonical variable of Y is v2 = -0.3637*Accel + 0.1078*MPG.
Plot the scores of the canonical variables of X and Y against each other.
t = tiledlayout(2,2);
title(t,'Canonical Scores of X vs Canonical Scores of Y')
xlabel(t,'Canonical Variables of X')
ylabel(t,'Canonical Variables of Y')
t.TileSpacing = 'compact';
nexttile
plot(U(:,1),V(:,1),'.')
xlabel('u1')
ylabel('v1')
nexttile
plot(U(:,2),V(:,1),'.')
xlabel('u2')
ylabel('v1')
nexttile
plot(U(:,1),V(:,2),'.')
xlabel('u1')
ylabel('v2')
nexttile
plot(U(:,2),V(:,2),'.')
xlabel('u2')
ylabel('v2')
The pairs of canonical variables {ui, vi} are ordered from the strongest to weakest correlation, with all other pairs independent. Return the correlation coefficient of the variables u1 and v1. r(1) ans = 0.8782
Input Arguments
X — Input matrix
matrix
Input matrix, specified as an n-by-d1 matrix. The rows of X correspond to observations, and the columns correspond to variables.
Data Types: single | double
Y — Input matrix
matrix
Input matrix, specified as an n-by-d2 matrix where X is an n-by-d1 matrix. The rows of Y correspond to observations, and the columns correspond to variables.
Data Types: single | double
Output Arguments
A — Sample canonical coefficients for X variables
matrix
Sample canonical coefficients for the variables in X, returned as a d1-by-d matrix, where d = min(rank(X),rank(Y)). The jth column of A contains the linear combination of variables that makes up the jth canonical variable for X. If X is less than full rank, canoncorr gives a warning and returns zeros in the rows of A corresponding to dependent columns of X.
B — Sample canonical coefficients for Y variables
matrix
Sample canonical coefficients for the variables in Y, returned as a d2-by-d matrix, where d = min(rank(X),rank(Y)). The jth column of B contains the linear combination of variables that makes up the jth canonical variable for Y. If Y is less than full rank, canoncorr gives a warning and returns zeros in the rows of B corresponding to dependent columns of Y.
r — Sample canonical correlations
vector
Sample canonical correlations, returned as a 1-by-d vector, where d = min(rank(X),rank(Y)). The jth element of r is the correlation between the jth columns of U and V.
U — Canonical scores for the X variables
matrix
Canonical scores for the variables in X, returned as an n-by-d matrix, where X is an n-by-d1 matrix and d = min(rank(X),rank(Y)).
V — Canonical scores for the Y variables
matrix
Canonical scores for the variables in Y, returned as an n-by-d matrix, where Y is an n-by-d2 matrix and d = min(rank(X),rank(Y)).
stats — Hypothesis test information
structure
Hypothesis test information, returned as a structure. This information relates to the sequence of d null hypotheses H0^(k) that the (k+1)st through dth correlations are all zero, for k=1,…,d-1, and d = min(rank(X),rank(Y)). The fields of stats are 1-by-d vectors with elements corresponding to the values of k.
Field     Description
Wilks     Wilks' lambda (likelihood ratio) statistic
df1       Degrees of freedom for the chi-squared statistic, and the numerator degrees of freedom for the F statistic
df2       Denominator degrees of freedom for the F statistic
F         Rao's approximate F statistic for H0^(k)
pF        Right-tail significance level for F
chisq     Bartlett's approximate chi-squared statistic for H0^(k) with Lawley's modification
pChisq    Right-tail significance level for chisq
stats has two other fields (dfe and p), which are equal to df1 and pChisq, respectively, and exist for historical reasons. Data Types: struct
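A quick, illustrative way to inspect these fields is to request the sixth output of canoncorr, reusing X and Y from the earlier example:
[~,~,~,~,~,stats] = canoncorr(X,Y);
stats.pChisq   % right-tail p-values for the sequence of null hypotheses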
More About
Canonical Correlation Analysis
The canonical scores of the data matrices X and Y are defined as
Ui = X*ai
Vi = Y*bi
where ai and bi maximize the Pearson correlation coefficient ρ(Ui,Vi) subject to being uncorrelated to all previous canonical scores and scaled so that Ui and Vi have zero mean and unit variance.
The canonical coefficients of X and Y are the matrices A and B with columns ai and bi, respectively.
The canonical variables of X and Y are the linear combinations of the columns of X and Y given by the canonical coefficients in A and B respectively.
The canonical correlations are the values ρ(Ui,Vi) measuring the correlation of each pair of canonical variables of X and Y.
Algorithms
canoncorr computes A, B, and r using qr and svd. canoncorr computes U and V as U = (X - mean(X))*A and V = (Y - mean(Y))*B.
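As an illustrative numerical check of these relations, reusing X, Y, A, B, U, and V from the example above:
Ucheck = (X - mean(X))*A;
Vcheck = (Y - mean(Y))*B;
max(abs([U(:) - Ucheck(:); V(:) - Vcheck(:)]))   % expected to be near machine precision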
Version History Introduced before R2006a
References [1] Krzanowski, W. J. Principles of Multivariate Analysis: A User's Perspective. New York: Oxford University Press, 1988. [2] Seber, G. A. F. Multivariate Observations. Hoboken, NJ: John Wiley & Sons, Inc., 1984.
See Also manova1 | pca
canonvars Canonical variables
Syntax canon = canonvars(maov) canon = canonvars(maov,factor) [canon,eigenvec,eigenval] = canonvars( ___ )
Description
canon = canonvars(maov) returns the values of the canonical variables for the response data in the manova object maov. This syntax is supported for one-way manova objects only.
canon = canonvars(maov,factor) specifies the factor that canonvars uses to group the response data. This syntax is supported for one-, two-, and N-way manova objects.
[canon,eigenvec,eigenval] = canonvars( ___ ) additionally returns the eigenvectors and eigenvalues that canonvars uses to calculate the canonical variables, using any of the input argument combinations in the previous syntaxes.
Examples Calculate Canonical Response Data for One-Way MANOVA Load the fisheriris data set. load fisheriris
The column vector species contains iris flowers of three different species: setosa, versicolor, and virginica. The matrix meas contains four types of measurements for the flower: the length and width of sepals and petals in centimeters. Perform a one-way MANOVA with species as the factor and the measurements in meas as the response variables. maov = manova(species,meas);
maov is a one-way manova object that contains the results of the one-way MANOVA.
Calculate the canonical response data for maov.
canon = canonvars(maov)
canon = 150×4

   -8.0618    0.3004   -0.1119    0.2549
   -7.1287   -0.7867    0.8095    0.3782
   -7.4898   -0.2654    0.4157   -0.3684
   -6.8132   -0.6706    0.0125   -0.8191
   -8.1323    0.5145   -0.4403   -0.1920
   -7.7019    1.4617   -0.5836    0.2032
   -7.2126    0.3558    0.0540   -1.0747
   -7.6053   -0.0116   -0.2522   -0.0486
   -6.5606   -1.0152    0.5205   -0.9845
   -7.3431   -0.9473   -0.1338    0.0526
      ⋮
The output shows the canonical response data for the first ten observations. Each column of the output corresponds to a different canonical variable. Create a scatter plot using the first and second canonical variables. gscatter(canon(:,1),canon(:,2),species) xlabel("canon1") ylabel("canon2")
The function calculates the canonical variables by finding the lowest dimensional representation of the response variables that maximizes the correlation between the response variables and the factor values. The plot shows that the response data for the first two canonical variables is mostly separate for the different factor values. In particular, observations with the first canonical variable less than 0 correspond to the setosa group. Observations with the first canonical response variable greater than 0 and less than 5 correspond to the versicolor group. Finally, observations with the first canonical response variable greater than 5 correspond to the virginica group.
Return Canonical Coefficients Load the carsmall data set. load carsmall
The variable Model_Year contains data for the year a car was manufactured, and the variable Cylinders contains data for the number of engine cylinders in the car. The Acceleration and Displacement variables contain data for car acceleration and displacement. Use the table function to create a table from the data in Model_Year and Cylinders. tbl = table(Model_Year,Cylinders,VariableNames=["Year" "Cylinders"]);
Create a matrix of response variables from Acceleration and Displacement. y = [Acceleration Displacement];
Perform a two-way MANOVA using the factor values in tbl and the response variables in y. maov = manova(tbl,y);
maov is a two-way manova object that contains the results of the two-way MANOVA.
Return the canonical response data, canonical coefficients, and eigenvalues for the response data in maov, grouped by the Cylinders factor.
[canon,eigenvec,eigenval] = canonvars(maov,"Cylinders")
canon = 100×2

    2.9558   -0.5358
    4.2381   -0.4096
    3.2798   -0.8889
    2.8661   -0.5600
    2.7996   -1.2391
    6.5913   -0.4348
    7.3336   -0.6749
    6.9131   -1.0089
    7.3680   -0.2249
    5.4195   -1.4126
      ⋮
eigenvec = 2×2

    0.0045    0.4419
    0.0299    0.0081

eigenval = 2×1

    6.5170
    0.0808
The output shows the canonical response data for each canonical variable, and the vectors of canonical coefficients for each canonical variable with their corresponding eigenvalues. You can use the coefficients in eigenvec to calculate canonical response data manually.
Normalize the training data in maov.Y by using the mean function.
normres = maov.Y - mean(maov.Y)
normres = 100×2

   -3.0280   99.4000
   -3.5280  142.4000
   -4.0280  110.4000
   -3.0280   96.4000
   -4.5280   94.4000
   -5.0280  221.4000
   -6.0280  246.4000
   -6.5280  232.4000
   -5.0280  247.4000
   -6.5280  182.4000
      ⋮
Calculate the product of the matrix of normalized response data and matrix of canonical coefficients.
mcanon = normres*eigenvec
mcanon = 100×2

    2.9558   -0.5358
    4.2381   -0.4096
    3.2798   -0.8889
    2.8661   -0.5600
    2.7996   -1.2391
    6.5913   -0.4348
    7.3336   -0.6749
    6.9131   -1.0089
    7.3680   -0.2249
    5.4195   -1.4126
      ⋮
The first ten rows of mcanon are identical to the first ten rows of data in canon. Check that mcanon is identical to canon by using the max and sum functions.
max(abs(canon-mcanon))
ans = 1×2

     0     0
The zero output confirms that the two methods of calculating the canonical response data are equivalent.
Input Arguments maov — MANOVA results manova object MANOVA results, specified as a manova object. The properties of maov contain the factor values and response data used by canonvars to calculate the canonical response data. factor — Factor used to group response data string scalar | character array Factor used to group the response data, specified as a string scalar or character array. factor must be a name in maov.FactorNames. Example: "Factor2" Data Types: char | string
Output Arguments
canon — Canonical response data
n-by-r numeric matrix
Canonical response data, returned as an n-by-r numeric matrix. n is the number of observations in maov, and r is the number of response variables. To get the canonical response data, canonvars normalizes the data in maov.Y and then calculates linear combinations of the normalized data using the canonical coefficients. For more information, see eigenvec.
Data Types: single | double
eigenvec — Canonical coefficients
r-by-r numeric matrix
Canonical coefficients used to calculate the canonical response data, returned as an r-by-r numeric matrix. r is the number of response variables in maov.Y. Each column of eigenvec corresponds to a different canonical variable. The leftmost column of eigenvec corresponds to the canonical variable that is the most correlated to the factor values, and the rightmost column corresponds to the variable that is the least correlated. The canonical variables are uncorrelated to each other. For more information, see “Canonical Coefficients” on page 35-331.
Data Types: single | double
eigenval — Eigenvalues
r-by-1 numeric vector
Eigenvalues for the characteristic equation canonvars uses to calculate the canonical coefficients, returned as an r-by-1 numeric vector. For more information, see “Canonical Coefficients” on page 35-331.
Data Types: single | double
More About
Canonical Coefficients
The canonical coefficients are the r eigenvectors of the characteristic equation
Mv = λv, where M = H*E^(-1),
H is the hypothesis matrix for maov, E is the error matrix, and r is the number of response variables. The canonical variables correspond to projections of the response variables into linear spaces with dimensions equal to or smaller than the number of response variables. The first canonical variable is the projection of the response variables into the one-dimensional Euclidean space that has the maximum correlation with the factor values. For 0 < n ≤ r, the nth canonical variable is the projection into the one-dimensional Euclidean space that has the maximum correlation with the factor values, subject to the constraint that the canonical variable is uncorrelated with the previous n – 1 canonical variables.
For more information, see Qe and Qh in “Multivariate Analysis of Variance for Repeated Measures” on page 9-62.
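The following sketch shows how eigenvectors of such a characteristic equation can be obtained. H and E here are hypothetical hypothesis and error matrices, not values exposed by the examples on this page.
M = H/E;                                        % equivalent to H*inv(E)
[V,D] = eig(M);
[eigval,idx] = sort(real(diag(D)),'descend');   % eigenvalues are real in theory
eigvec = real(V(:,idx));                        % columns ordered from most to least correlated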
Version History Introduced in R2023b
See Also manova Topics “Multivariate Analysis of Variance for Repeated Measures” on page 9-62
capability Process capability indices
Syntax S = capability(data,specs)
Description S = capability(data,specs) estimates capability indices for measurements in data given the specifications in specs. data can be either a vector or a matrix of measurements. If data is a matrix, indices are computed for the columns. specs can be either a two-element vector of the form [L,U] containing lower and upper specification limits, or (if data is a matrix) a two-row matrix with the same number of columns as data. If there is no lower bound, use -Inf as the first element of specs. If there is no upper bound, use Inf as the second element of specs. The output S is a structure with the following fields: • mu — Sample mean • sigma — Sample standard deviation • P — Estimated probability of being within limits • Pl — Estimated probability of being below L • Pu — Estimated probability of being above U • Cp — (U-L)/(6*sigma) • Cpl — (mu-L)./(3.*sigma) • Cpu — (U-mu)./(3.*sigma) • Cpk — min(Cpl,Cpu) Indices are computed under the assumption that data values are independent samples from a normal population with constant mean and variance. Indices divide a “specification width” (between specification limits) by a “process width” (between control limits). Higher ratios indicate a process with fewer measurements outside of specification.
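For orientation, the indices can be reproduced directly from their definitions. The limits and sample statistics below mirror the example that follows and are otherwise arbitrary.
L = 2.99; U = 3.01;
mu = 3.0006; sigma = 0.0058;
Cp  = (U - L)/(6*sigma);
Cpl = (mu - L)/(3*sigma);
Cpu = (U - mu)/(3*sigma);
Cpk = min(Cpl,Cpu);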
Examples
Compute Capability Indices
Simulate a sample from a process with a mean of 3 and a standard deviation of 0.005.
rng default; % for reproducibility
data = normrnd(3,0.005,100,1);
Compute capability indices if the process has an upper specification limit of 3.01 and a lower specification limit of 2.99. S = capability(data,[2.99 3.01])
S = struct with fields:
       mu: 3.0006
    sigma: 0.0058
        P: 0.9129
       Pl: 0.0339
       Pu: 0.0532
       Cp: 0.5735
      Cpl: 0.6088
      Cpu: 0.5382
      Cpk: 0.5382
Visualize the specification and process widths. capaplot(data,[2.99 3.01]); grid on
Version History Introduced in R2006b
References [1] Montgomery, D. Introduction to Statistical Quality Control. Hoboken, NJ: John Wiley & Sons, 1991, pp. 369–374. 35-333
See Also capaplot | histfit
capaplot Process capability plot
Syntax p = capaplot(data,specs) [p,h] = capaplot(data,specs)
Description p = capaplot(data,specs) estimates the mean of and variance for the observations in input vector data, and plots the pdf of the resulting T distribution. The observations in data are assumed to be normally distributed. The output, p, is the probability that a new observation from the estimated distribution will fall within the range specified by the two-element vector specs. The portion of the distribution between the lower and upper bounds specified in specs is shaded in the plot. [p,h] = capaplot(data,specs) additionally returns handles to the plot elements in h. capaplot treats NaN values in data as missing, and ignores them.
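An illustrative call that captures both outputs, with data and specs defined as in the example that follows:
[p,h] = capaplot(data,[2.99 3.01]);
p   % probability that a new observation falls within the specification limits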
Examples
Create a Process Capability Plot
Randomly generate sample data from a normal process with a mean of 3 and a standard deviation of 0.005.
rng default; % For reproducibility
data = normrnd(3,0.005,100,1);
Compute capability indices if the process has an upper specification limit of 3.01 and a lower specification limit of 2.99.
S = capability(data,[2.99 3.01])
S = struct with fields:
       mu: 3.0006
    sigma: 0.0058
        P: 0.9129
       Pl: 0.0339
       Pu: 0.0532
       Cp: 0.5735
      Cpl: 0.6088
      Cpu: 0.5382
      Cpk: 0.5382
Visualize the specification and process widths. capaplot(data,[2.99 3.01]); grid on
Version History Introduced before R2006a
See Also capability | histfit
caseread Read case names from file
Syntax names = caseread(filename) names = caseread
Description names = caseread(filename) reads the contents of filename and returns a character array names. The caseread function treats each line of the file as a separate case name. Specify filename as either the name of a file in the current folder or the complete path name of a file. filename can have one of the following file extensions: • .txt, .dat, or .csv for delimited text files • .xls, .xlsm, or .xlsx for Excel spreadsheet files names = caseread opens the Select File to Open dialog box so that you can interactively select the file to read.
Examples
Write and Read Case Names
Create a character array of case names representing months.
months = char('January','February', ...
    'March','April','May');
Write the names to a file named months.dat. View the contents of the file by using the type function. casewrite(months,'months.dat') type months.dat January February March April May
Read the names in the months.dat file.
names = caseread('months.dat')
names = 5x8 char array
    'January '
    'February'
    'March   '
    'April   '
    'May     '
Input Arguments
filename — Name of file to read
character vector | string scalar
Name of the file to read, specified as a character vector or string scalar. Depending on the location of the file, filename has one of these forms.
Location of File                                                        Form
Current folder or folder on the MATLAB path                             Specify the name of the file in filename. Example: 'myTextFile.csv'
Folder that is not the current folder or a folder on the MATLAB path    Specify the full or relative path name in filename. Example: 'C:\myFolder\myTextFile.csv'
Example: 'months.dat'
Data Types: char | string
Alternative Functionality Instead of using casewrite and caseread with character arrays, consider using writecell and readcell with cell arrays. For example: months = {'January';'February';'March';'April';'May'}; writecell(months,'months.dat') names = readcell('months.dat') names = 5×1 cell array {'January' } {'February'} {'March' } {'April' } {'May' }
Version History Introduced before R2006a
See Also casewrite | gname | readcell | writecell | readtable | writetable 35-338
casewrite Write case names to file
Syntax casewrite(strmat,filename) casewrite(strmat)
Description casewrite(strmat,filename) writes the contents of the character array or string column vector strmat to a file filename. Each row of strmat represents one case name, and casewrite writes each name to a separate line in filename. Specify filename as either a file name (to write the file to the current folder) or a complete path name (to write the file to a different folder). filename can have one of the following file extensions: • .txt, .dat, or .csv for delimited text files • .xls, .xlsm, or .xlsx for Excel spreadsheet files casewrite(strmat) opens the Select File to Write dialog box so that you can interactively specify the file to write.
Examples
Write and Read Case Names
Create a character array of case names representing months.
months = char('January','February', ...
    'March','April','May');
Write the names to a file named months.dat. View the contents of the file by using the type function. casewrite(months,'months.dat') type months.dat January February March April May
Read the names in the months.dat file.
names = caseread('months.dat')
names = 5x8 char array
    'January '
    'February'
    'March   '
    'April   '
    'May     '
Input Arguments
strmat — Case names
character array | string column vector
Case names, specified as a character array or string column vector. Each row of strmat corresponds to a case name and becomes a line in filename.
Data Types: char | string
filename — Name of file to write
character vector | string scalar
Name of the file to write, specified as a character vector or string scalar. Depending on the location you are writing to, filename has one of these forms.
Location of File                                    Form
Current folder                                      Specify the name of the file in filename. Example: 'myTextFile.csv'
Folder that is different from the current folder    Specify the full or relative path name in filename. Example: 'C:\myFolder\myTextFile.csv'
Example: 'months.dat'
Data Types: char | string
Alternative Functionality Instead of using casewrite and caseread with character arrays, consider using writecell and readcell with cell arrays. For example: months = {'January';'February';'March';'April';'May'}; writecell(months,'months.dat') names = readcell('months.dat') names = 5×1 cell array {'January' } {'February'} {'March' } {'April' } {'May' }
Version History Introduced before R2006a
See Also gname | caseread | readcell | writecell | readtable | writetable
DaviesBouldinEvaluation Davies-Bouldin criterion clustering evaluation object
Description
DaviesBouldinEvaluation is an object consisting of sample data (X), clustering data (OptimalY), and Davies-Bouldin criterion values (CriterionValues) used to evaluate the optimal number of clusters (OptimalK). The Davies-Bouldin criterion is based on a ratio of within-cluster and between-cluster distances. The optimal clustering solution has the smallest Davies-Bouldin index value. For more information, see “Davies-Bouldin Criterion” on page 35-346.
Creation Create a Davies-Bouldin criterion clustering evaluation object by using the evalclusters function and specifying the criterion as "DaviesBouldin". You can then use compact to create a compact version of the Davies-Bouldin criterion clustering evaluation object. The function removes the contents of the properties X, OptimalY, and Missing.
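A minimal sketch of this workflow, assuming X is any numeric matrix of sample data:
evaluation = evalclusters(X,"kmeans","DaviesBouldin","KList",1:6);
evaluationCompact = compact(evaluation);   % drops the X, OptimalY, and Missing contents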
Properties Clustering Evaluation Properties ClusteringFunction — Clustering algorithm 'kmeans' | 'linkage' | 'gmdistribution' | function handle | [] This property is read-only. Clustering algorithm used to cluster the sample data, returned as 'kmeans', 'linkage', 'gmdistribution', or a function handle. If you specify the clustering solutions as an input argument to evalclusters when you create the clustering evaluation object, then ClusteringFunction is empty. Value
Description
'kmeans'
Cluster the data in X using the kmeans clustering algorithm, with EmptyAction set to "singleton" and Replicates set to 5.
'linkage'
Cluster the data in X using the clusterdata agglomerative clustering algorithm, with Linkage set to "ward".
'gmdistribution'
Cluster the data in X using the gmdistribution Gaussian mixture distribution algorithm, with SharedCov set to true and Replicates set to 5.
Data Types: double | char | function_handle 35-342
CriterionName — Name of criterion 'DaviesBouldin' This property is read-only. Name of the criterion used for clustering evaluation, returned as 'DaviesBouldin'. CriterionValues — Criterion values numeric vector This property is read-only. Criterion values, returned as a numeric vector. Each value corresponds to a proposed number of clusters in InspectedK. Data Types: double InspectedK — List of number of proposed clusters positive integer vector This property is read-only. List of the number of proposed clusters for which to compute criterion values, returned as a positive integer vector. Data Types: double OptimalK — Optimal number of clusters positive integer scalar This property is read-only. Optimal number of clusters, returned as a positive integer scalar. Data Types: double OptimalY — Optimal clustering solution positive integer column vector | [] This property is read-only. Optimal clustering solution corresponding to OptimalK, returned as a positive integer column vector. Each row of OptimalY represents the cluster index of the corresponding observation (or row) in X. If you specify the clustering solutions as an input argument to evalclusters when you create the clustering evaluation object, or if the clustering evaluation object is compact (see compact), then OptimalY is empty. Data Types: double Sample Data Properties Missing — Excluded data logical column vector | [] This property is read-only. 35-343
Excluded data, returned as a logical column vector. If an element of Missing is true, then the corresponding observation (or row) in the data matrix X is not used in the clustering solutions. If the clustering evaluation object is compact (see compact), then Missing is empty. Data Types: double | logical NumObservations — Number of observations positive integer scalar This property is read-only. Number of observations in the data matrix X, ignoring observations with missing (NaN) values, returned as a positive integer scalar. Data Types: double X — Data used for clustering numeric matrix | [] This property is read-only. Data used for clustering, returned as a numeric matrix. Rows correspond to observations, and columns correspond to variables. If the clustering evaluation object is compact (see compact), then X is empty. Data Types: single | double
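For example, the key evaluation properties of an existing object can be inspected directly (illustrative; evaluation is a DaviesBouldinEvaluation object such as the one created in the example below):
evaluation.InspectedK        % candidate numbers of clusters
evaluation.CriterionValues   % Davies-Bouldin value for each candidate
evaluation.OptimalK          % candidate with the smallest criterion value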
Object Functions
addK       Evaluate additional numbers of clusters
compact    Compact clustering evaluation object
plot       Plot clustering evaluation object criterion values
Examples
Evaluate Clustering Solution Using Davies-Bouldin Criterion
Evaluate the optimal number of clusters using the Davies-Bouldin clustering evaluation criterion.
Generate sample data containing random numbers from three multivariate distributions with different parameter values.
rng("default") % For reproducibility
n = 200;
mu1 = [2 2];
sigma1 = [0.9 -0.0255; -0.0255 0.9];
mu2 = [5 5];
sigma2 = [0.5 0; 0 0.3];
mu3 = [-2 -2];
sigma3 = [1 0; 0 0.9];
X = [mvnrnd(mu1,sigma1,n); ...
    mvnrnd(mu2,sigma2,n); ...
    mvnrnd(mu3,sigma3,n)];
Evaluate the optimal number of clusters using the Davies-Bouldin criterion. Cluster the data using kmeans.
evaluation = evalclusters(X,"kmeans","DaviesBouldin","KList",1:6)
evaluation = 
  DaviesBouldinEvaluation with properties:

    NumObservations: 600
         InspectedK: [1 2 3 4 5 6]
    CriterionValues: [NaN 0.4663 0.4454 0.8316 1.0444 0.9236]
           OptimalK: 3
The OptimalK value indicates that, based on the Davies-Bouldin criterion, the optimal number of clusters is three. Plot the Davies-Bouldin criterion values for each number of clusters tested. plot(evaluation)
The plot shows that the lowest Davies-Bouldin value occurs at three clusters, suggesting that the optimal number of clusters is three. Create a grouped scatter plot to visually examine the suggested clusters. 35-345
clusters = evaluation.OptimalY; gscatter(X(:,1),X(:,2),clusters,[],"xod")
The plot shows three distinct clusters within the data: cluster 1 in the lower-left corner, cluster 2 in the upper-right corner, and cluster 3 near the center of the plot.
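To extend the same evaluation to more candidate cluster counts, the addK object function can be applied (illustrative continuation of this example):
evaluation = addK(evaluation,7:8);
evaluation.InspectedK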
More About
Davies-Bouldin Criterion
The Davies-Bouldin criterion is based on a ratio of within-cluster and between-cluster distances. The Davies-Bouldin index is defined as

DB = (1/k) * Σ_{i=1}^{k} max_{j≠i} D_{i,j},

where D_{i,j} is the within-to-between cluster distance ratio for the ith and jth clusters. In mathematical terms,

D_{i,j} = (d_i + d_j) / d_{i,j}.

d_i is the average distance between each point in the ith cluster and the centroid of the ith cluster. d_j is the average distance between each point in the jth cluster and the centroid of the jth cluster. d_{i,j} is the Euclidean distance between the centroids of the ith and jth clusters.
The maximum value of D_{i,j} represents the worst-case within-to-between cluster ratio for cluster i. The optimal clustering solution has the smallest Davies-Bouldin index value.
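The index can also be computed by hand from these definitions. The sketch below is illustrative rather than the toolbox implementation, and it reuses X and clusters from the earlier example.
k = max(clusters);
cent = splitapply(@(z) mean(z,1),X,clusters);   % cluster centroids
d = arrayfun(@(i) mean(vecnorm(X(clusters==i,:)-cent(i,:),2,2)),(1:k)');
DB = 0;
for i = 1:k
    others = setdiff((1:k)',i);
    R = (d(i) + d(others))./vecnorm(cent(others,:)-cent(i,:),2,2);
    DB = DB + max(R);
end
DB = DB/k   % comparable to the criterion value reported for three clusters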
Version History Introduced in R2013b
References [1] Davies, D. L., and D. W. Bouldin. “A Cluster Separation Measure.” IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. PAMI-1, No. 2, 1979, pp. 224–227.
See Also evalclusters | CalinskiHarabaszEvaluation | GapEvaluation | SilhouetteEvaluation
cat Class: dataset (Not Recommended) Concatenate dataset arrays Note The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.
Syntax ds = cat(dim, ds1, ds2, ...)
Description ds = cat(dim, ds1, ds2, ...) concatenates the dataset arrays ds1, ds2, ... along dimension dim by calling the dataset/horzcat or dataset/vertcat method. dim must be 1 or 2.
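For instance, with dim set to 1 the call reduces to a vertical concatenation (ds1 and ds2 here are hypothetical dataset arrays with identical variables):
dsAll = cat(1,ds1,ds2);   % equivalent to vertcat(ds1,ds2)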
See Also horzcat | vertcat
cdf Cumulative distribution function for Gaussian mixture distribution
Syntax y = cdf(gm,X)
Description y = cdf(gm,X) returns the cumulative distribution function (cdf) of the Gaussian mixture distribution gm, evaluated at the values in X.
Examples Compute cdf Values Create a gmdistribution object and compute its cdf values. Define the distribution parameters (means and covariances) of a two-component bivariate Gaussian mixture distribution. mu = [1 2;-3 -5]; sigma = [1 1]; % shared diagonal covariance matrix
Create a gmdistribution object by using the gmdistribution function. By default, the function creates an equal proportion mixture. gm = gmdistribution(mu,sigma) gm = Gaussian mixture distribution with 2 components in 2 dimensions Component 1: Mixing proportion: 0.500000 Mean: 1 2 Component 2: Mixing proportion: 0.500000 Mean: -3 -5
Compute the cdf values of gm. X = [0 0;1 2;3 3;5 3]; cdf(gm,X) ans = 4×1 0.5011 0.6250 0.9111 0.9207
Plot cdf
Create a gmdistribution object and plot its cdf.
Define the distribution parameters (means, covariances, and mixing proportions) of two bivariate Gaussian mixture components.
p = [0.4 0.6];                % Mixing proportions
mu = [1 2;-3 -5];             % Means
sigma = cat(3,[2 .5],[1 1])   % Covariances 1-by-2-by-2 array
sigma = 
sigma(:,:,1) =
    2.0000    0.5000

sigma(:,:,2) =
     1     1
The cat function concatenates the covariances along the third array dimension. The defined covariance matrices are diagonal matrices. sigma(1,:,i) contains the diagonal elements of the covariance matrix of component i. Create a gmdistribution object by using the gmdistribution function. gm = gmdistribution(mu,sigma,p) gm = Gaussian mixture distribution with 2 components in 2 dimensions Component 1: Mixing proportion: 0.400000 Mean: 1 2 Component 2: Mixing proportion: 0.600000 Mean: -3 -5
Plot the cdf of the Gaussian mixture distribution by using fsurf. gmCDF = @(x,y) arrayfun(@(x0,y0) cdf(gm,[x0 y0]),x,y); fsurf(gmCDF,[-10 10])
35-350
cdf
Input Arguments gm — Gaussian mixture distribution gmdistribution object Gaussian mixture distribution, also called Gaussian mixture model (GMM), specified as a gmdistribution object. You can create a gmdistribution object using gmdistribution or fitgmdist. Use the gmdistribution function to create a gmdistribution object by specifying the distribution parameters. Use the fitgmdist function to fit a gmdistribution model to data given a fixed number of components. X — Values at which to evaluate cdf n-by-m numeric matrix Values at which to evaluate the cdf, specified as an n-by-m numeric matrix, where n is the number of observations and m is the number of variables in each observation. Data Types: single | double
Output Arguments y — cdf values n-by-1 numeric vector 35-351
35
Functions
cdf values of the Gaussian mixture distribution gm, evaluated at X, returned as an n-by-1 numeric vector, where n is the number of observations in X.
Version History Introduced in R2007b
See Also gmdistribution | fitgmdist | pdf | mvncdf | random Topics “Create Gaussian Mixture Model” on page 5-120 “Fit Gaussian Mixture Model to Data” on page 5-123 “Simulate Data from Gaussian Mixture Model” on page 5-127 “Cluster Using Gaussian Mixture Model” on page 17-39
35-352
ccdesign
ccdesign Central composite design
Syntax dCC = ccdesign(n) [dCC,blocks] = ccdesign(n) [...] = ccdesign(n,'Name',value)
Description dCC = ccdesign(n) generates a central composite design for n factors. n must be an integer 2 or larger. The output matrix dCC is m-by-n, where m is the number of runs in the design. Each row represents one run, with settings for all factors represented in the columns. Factor values are normalized so that the cube points take values between -1 and 1. [dCC,blocks] = ccdesign(n) requests a blocked design. The output blocks is an m-by-1 vector of block numbers for each run. Blocks indicate runs that are to be measured under similar conditions to minimize the effect of inter-block differences on the parameter estimates. [...] = ccdesign(n,'Name',value) specifies one or more optional name/value pairs for the design. Valid parameters and their values are listed in the following table. Specify Name in single quotes. Parameter
Description
Values
Value Description
center
Number of center points.
Integer
Number of center points to include.
'uniform'
Select number of center points to give uniform precision.
'orthogonal'
Select number of center points to give an orthogonal design. This is the default.
0
Whole design. Default when n ≤ 4.
1
1/2 fraction. Default when 4 < n ≤ 7 or n > 11.
2
1/4 fraction. Default when 7 1e-8) ans = 0x1 empty double column vector
The comparison confirms that NegLoss and NegLoss_mex are equal within the tolerance 1e–8. Retrain Model and Update Parameters in Generated Code Retrain the model using a different setting. Specify 'KernelScale' as 'auto' so that the software selects an appropriate scale factor using a heuristic procedure. t_new = templateSVM('KernelFunction','gaussian','Standardize',true,'KernelScale','auto'); retrainedMdl = fitcecoc(X,Y,'Learners',t_new);
Extract parameters to update by using validatedUpdateInputs. This function detects the modified model parameters in retrainedMdl and validates whether the modified parameter values satisfy the coder attributes of the parameters. params = validatedUpdateInputs(configurer,retrainedMdl);
Update parameters in the generated code. ClassificationECOCModel('update',params)
Verify Generated Code Compare the outputs from the predict function of retrainedMdl to the outputs from the predict function in the updated MEX function. 35-462
ClassificationECOCCoderConfigurer
[label,NegLoss] = predict(retrainedMdl,X,'BinaryLoss','exponential','Decoding','lossbased'); [label_mex,NegLoss_mex] = ClassificationECOCModel('predict',X,'BinaryLoss','exponential','Decodin isequal(label,label_mex) ans = logical 1 find(abs(NegLoss-NegLoss_mex) > 1e-8) ans = 0x1 empty double column vector
The comparison confirms that label and label_mex are equal, and NegLoss and NegLoss_mex are equal within the tolerance.
More About LearnerCoderInput Object A coder configurer uses a LearnerCoderInput object to specify the coder attributes of predict and update input arguments. A LearnerCoderInput object has the following attributes to specify the properties of an input argument array in the generated code. Attribute Name
Description
SizeVector
Array size if the corresponding VariableDimensions value is false. Upper bound of the array size if the corresponding VariableDimensions value is true. To allow an unbounded array, specify the bound as Inf.
VariableDimensions
Indicator specifying whether each dimension of the array has a variable size or fixed size, specified as true (logical 1) or false (logical 0): • A value of true (logical 1) means that the corresponding dimension has a variable size. • A value of false (logical 0) means that the corresponding dimension has a fixed size.
DataType
Data type of the array
Tunability
Indicator specifying whether or not predict or update includes the argument as an input in the generated code, specified as true (logical 1) or false (logical 0). If you specify other attribute values when Tunability is false, the software sets Tunability to true.
After creating a coder configurer, you can modify the coder attributes by using dot notation. For example, specify the coder attributes of the coefficients Alpha in BinaryLearners of the coder configurer configurer: 35-463
35
Functions
configurer.BinaryLearners.Alpha.SizeVector = [100 1]; configurer.BinaryLearners.Alpha.VariableDimensions = [1 0]; configurer.BinaryLearners.Alpha.DataType = 'double';
If you specify the verbosity level (Verbose) as true (default), then the software displays notification messages when you modify the coder attributes of a machine learning model parameter and the modification changes the coder attributes of other dependent parameters. EnumeratedInput Object A coder configurer uses an EnumeratedInput object to specify the coder attributes of predict input arguments that have a finite set of available values. An EnumeratedInput object has the following attributes to specify the properties of an input argument array in the generated code. Attribute Name
Description
Value
Value of the predict argument in the generated code, specified as a character vector or a LearnerCoderInput on page 35-463 object. • Character vector in BuiltInOptions — You can specify one of the BuiltInOptions using either the option name or its index value. For example, to choose the first option, specify Value as either the first character vector in BuiltInOptions or 1. • Character vector designating a custom function name — To use a custom option, define a custom function on the MATLAB search path, and specify Value as the name of the custom function. • LearnerCoderInput on page 35-463 object — If you set IsConstant to false (logical 0), then the software changes Value to a LearnerCoderInput on page 35-463 object with the following read-only coder attribute values. These values indicate that the input in the generated code is a variable-size, tunable character vector that is one of the available values in BuiltInOptions. • SizeVector — [1 c], indicating the upper bound of the array size, where c is the length of the longest available character vector in Option • VariableDimensions — [0 1], indicating that the array is a variable-size vector • DataType — 'char' • Tunability — 1 The default value of Value is consistent with the default value of the corresponding predict argument, which is one of the character vectors in BuiltInOptions.
35-464
ClassificationECOCCoderConfigurer
Attribute Name
Description
SelectedOption
Status of the selected option, specified as 'Built-in', 'Custom', or 'NonConstant'. The software sets SelectedOption according to Value: • 'Built-in'(default) — When Value is one of the character vectors in BuiltInOptions • 'Custom' — When Value is a character vector that is not in BuiltInOptions • 'NonConstant' — When Value is a LearnerCoderInput on page 35-463 object This attribute is read-only.
BuiltInOptions
List of available character vectors for the corresponding predict argument, specified as a cell array. This attribute is read-only.
IsConstant
Indicator specifying whether or not the array value is a compiletime constant (coder.Constant) in the generated code, specified as true (logical 1, default) or false (logical 0). If you set this value to false, then the software changes Value to a LearnerCoderInput on page 35-463 object.
Tunability
Indicator specifying whether or not predict includes the argument as an input in the generated code, specified as true (logical 1) or false (logical 0, default). If you specify other attribute values when Tunability is false, the software sets Tunability to true.
After creating a coder configurer, you can modify the coder attributes by using dot notation. For example, specify the coder attributes of BinaryLoss of the coder configurer configurer: configurer.BinaryLoss.Value = 'linear';
Version History Introduced in R2019a
See Also learnerCoderConfigurer | ClassificationECOC | CompactClassificationECOC | predict | update | ClassificationSVMCoderConfigurer | ClassificationLinearCoderConfigurer Topics “Introduction to Code Generation” on page 34-3 “Code Generation for Prediction and Update Using Coder Configurer” on page 34-90
35-465
35
Functions
ClassificationDiscriminant class Superclasses: CompactClassificationDiscriminant Discriminant analysis classification
Description A ClassificationDiscriminant object encapsulates a discriminant analysis classifier, which is a Gaussian mixture model for data generation. A ClassificationDiscriminant object can predict responses for new data using the predict method. The object contains the data used for training, so can compute resubstitution predictions.
Construction Create a ClassificationDiscriminant object by using fitcdiscr.
Properties BetweenSigma p-by-p matrix, the between-class covariance, where p is the number of predictors. CategoricalPredictors Categorical predictor indices, which is always empty ([]) . ClassNames List of the elements in the training data Y with duplicates removed. ClassNames can be a categorical array, cell array of character vectors, character array, logical vector, or a numeric vector. ClassNames has the same data type as the data in the argument Y. (The software treats string arrays as cell arrays of character vectors.) Coeffs k-by-k structure of coefficient matrices, where k is the number of classes. Coeffs(i,j) contains coefficients of the linear or quadratic boundaries between classes i and j. Fields in Coeffs(i,j): • DiscrimType • Class1 — ClassNames(i) • Class2 — ClassNames(j) • Const — A scalar • Linear — A vector with p components, where p is the number of columns in X • Quadratic — p-by-p matrix, exists for quadratic DiscrimType The equation of the boundary between class i and class j is Const + Linear * x + x' * Quadratic * x = 0, 35-466
ClassificationDiscriminant class
where x is a column vector of length p. If fitcdiscr had the FillCoeffs name-value pair set to 'off' when constructing the classifier, Coeffs is empty ([]). Cost Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (i.e., the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response. Change a Cost matrix using dot notation: obj.Cost = costMatrix. Delta Value of the Delta threshold for a linear discriminant model, a nonnegative scalar. If a coefficient of obj has magnitude smaller than Delta, obj sets this coefficient to 0, and so you can eliminate the corresponding predictor from the model. Set Delta to a higher value to eliminate more predictors. Delta must be 0 for quadratic discriminant models. Change Delta using dot notation: obj.Delta = newDelta. DeltaPredictor Row vector of length equal to the number of predictors in obj. If DeltaPredictor(i) < Delta then coefficient i of the model is 0. If obj is a quadratic discriminant model, all elements of DeltaPredictor are 0. DiscrimType Character vector specifying the discriminant type. One of: • 'linear' • 'quadratic' • 'diagLinear' • 'diagQuadratic' • 'pseudoLinear' • 'pseudoQuadratic' Change DiscrimType using dot notation: obj.DiscrimType = newDiscrimType. You can change between linear types, or between quadratic types, but cannot change between linear and quadratic types. Gamma Value of the Gamma regularization parameter, a scalar from 0 to 1. Change Gamma using dot notation: obj.Gamma = newGamma. • If you set 1 for linear discriminant, the discriminant sets its type to 'diagLinear'. 35-467
35
Functions
• If you set a value between MinGamma and 1 for linear discriminant, the discriminant sets its type to 'linear'. • You cannot set values below the value of the MinGamma property. • For quadratic discriminant, you can set either 0 (for DiscrimType 'quadratic') or 1 (for DiscrimType 'diagQuadratic'). HyperparameterOptimizationResults Description of the cross-validation optimization of hyperparameters, stored as a BayesianOptimization object or a table of hyperparameters and associated values. Nonempty when the OptimizeHyperparameters name-value pair is nonempty at creation. Value depends on the setting of the HyperparameterOptimizationOptions name-value pair at creation: • 'bayesopt' (default) — Object of class BayesianOptimization • 'gridsearch' or 'randomsearch' — Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst) LogDetSigma Logarithm of the determinant of the within-class covariance matrix. The type of LogDetSigma depends on the discriminant type: • Scalar for linear discriminant analysis • Vector of length K for quadratic discriminant analysis, where K is the number of classes MinGamma Nonnegative scalar, the minimal value of the Gamma parameter so that the correlation matrix is invertible. If the correlation matrix is not singular, MinGamma is 0. ModelParameters Parameters used in training obj. Mu Class means, specified as a K-by-p matrix of scalar values class means of size. K is the number of classes, and p is the number of predictors. Each row of Mu represents the mean of the multivariate normal distribution of the corresponding class. The class indices are in the ClassNames attribute. NumObservations Number of observations in the training data, a numeric scalar. NumObservations can be less than the number of rows of input data X when there are missing values in X or response Y. PredictorNames Cell array of names for the predictor variables, in the order in which they appear in the training data X. Prior Numeric vector of prior probabilities for each class. The order of the elements of Prior corresponds to the order of the classes in ClassNames. 35-468
ClassificationDiscriminant class
Add or change a Prior vector using dot notation: obj.Prior = priorVector. ResponseName Character vector describing the response variable Y. RowsUsed Rows of the original training data stored in the model, specified as a logical vector. This property is empty if all rows are stored in X and Y. ScoreTransform Character vector representing a built-in transformation function, or a function handle for transforming scores. 'none' means no transformation; equivalently, 'none' means @(x)x. For a list of built-in transformation functions and the syntax of custom transformation functions, see fitcdiscr. Implement dot notation to add or change a ScoreTransform function using one of the following: • cobj.ScoreTransform = 'function' • cobj.ScoreTransform = @function Sigma Within-class covariance matrix or matrices. The dimensions depend on DiscrimType: • 'linear' (default) — Matrix of size p-by-p, where p is the number of predictors • 'quadratic' — Array of size p-by-p-by-K, where K is the number of classes • 'diagLinear' — Row vector of length p • 'diagQuadratic' — Array of size 1-by-p-by-K • 'pseudoLinear' — Matrix of size p-by-p • 'pseudoQuadratic' — Array of size p-by-p-by-K W Scaled weights, a vector with length n, the number of rows in X. X Matrix of predictor values. Each column of X represents one predictor (variable), and each row represents one observation. Xcentered X data with class means subtracted. If Y(i) is of class j, Xcentered(i,:) = X(i,:) – Mu(j,:), where Mu is the class mean property. 35-469
35
Functions
Y A categorical array, cell array of character vectors, character array, logical vector, or a numeric vector with the same number of rows as X. Each row of Y represents the classification of the corresponding row of X.
Object Functions compact compareHoldout crossval cvshrink edge lime logp loss mahal margin nLinearCoeffs partialDependence plotPartialDependence predict resubEdge resubLoss resubMargin resubPredict shapley testckfold
Reduce size of discriminant analysis classifier Compare accuracies of two classification models using new data Cross-validate discriminant analysis classifier Cross-validate regularization of linear discriminant Classification edge for discriminant analysis classifier Local interpretable model-agnostic explanations (LIME) Log unconditional probability density for discriminant analysis classifier Classification error for discriminant analysis classifier Mahalanobis distance to class means of discriminant analysis classifier Classification margins for discriminant analysis classifier Number of nonzero linear coefficients in discriminant analysis classifier Compute partial dependence Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots Predict labels using discriminant analysis classifier Resubstitution classification edge for discriminant analysis classifier Resubstitution classification loss for discriminant analysis classifier Resubstitution classification margins for discriminant analysis classifier Predict resubstitution labels of discriminant analysis classification model Shapley values Compare accuracies of two classification models by repeated crossvalidation
Copy Semantics Value. To learn how value classes affect copy operations, see Copying Objects.
Examples Train Discriminant Analysis Model Load Fisher's iris data set. load fisheriris
Train a discriminant analysis model using the entire data set. Mdl = fitcdiscr(meas,species) Mdl = ClassificationDiscriminant ResponseName: 'Y' CategoricalPredictors: [] ClassNames: {'setosa' ScoreTransform: 'none'
35-470
'versicolor'
'virginica'}
ClassificationDiscriminant class
NumObservations: DiscrimType: Mu: Coeffs:
150 'linear' [3x4 double] [3x3 struct]
Mdl is a ClassificationDiscriminant model. To access its properties, use dot notation. For example, display the group means for each predictor. Mdl.Mu ans = 3×4 5.0060 5.9360 6.5880
3.4280 2.7700 2.9740
1.4620 4.2600 5.5520
0.2460 1.3260 2.0260
To predict labels for new observations, pass Mdl and predictor data to predict.
More About Discriminant Classification The model for discriminant analysis is: • Each class (Y) generates data (X) using a multivariate normal distribution. That is, the model assumes X has a Gaussian mixture distribution (gmdistribution). • For linear discriminant analysis, the model has the same covariance matrix for each class, only the means vary. • For quadratic discriminant analysis, both means and covariances of each class vary. predict classifies so as to minimize the expected classification cost: y = argmin
K
∑
y = 1, ..., K k = 1
P k xCy k,
where •
y is the predicted classification.
• K is the number of classes. • P k x is the posterior probability on page 21-6 of class k for observation x. • C y k is the cost on page 21-7 of classifying an observation as y when its true class is k. For details, see “Prediction Using Discriminant Analysis Models” on page 21-6. Regularization Regularization is the process of finding a small set of predictors that yield an effective predictive model. For linear discriminant analysis, there are two parameters, γ and δ, that control regularization as follows. cvshrink helps you select appropriate values of the parameters. 35-471
35
Functions
Let Σ represent the covariance matrix of the data X, and let X be the centered data (the data X minus the mean by class). Define T
D = diag X * X . The regularized covariance matrix Σ is Σ = 1 − γ Σ + γD . Whenever γ ≥ MinGamma, Σ is nonsingular. Let μk be the mean vector for those elements of X in class k, and let μ0 be the global mean vector (the mean of the rows of X). Let C be the correlation matrix of the data X, and let C be the regularized correlation matrix: C = 1 − γ C + γI, where I is the identity matrix. The linear term in the regularized discriminant analysis classifier for a data point x is T −1
x − μ0 Σ
T
μk − μ0 = x − μ0 D−1/2 C
−1 −1/2
D
μk − μ0 .
The parameter δ enters into this equation as a threshold on the final term in square brackets. Each component of the vector C
−1 −1/2
D
μk − μ0 is set to zero if it is smaller in magnitude than the
threshold δ. Therefore, for class k, if component j is thresholded to zero, component j of x does not enter into the evaluation of the posterior probability. The DeltaPredictor property is a vector related to this threshold. When δ ≥ DeltaPredictor(i), all classes k have C
−1 −1/2
D
μk − μ0 ≤ δ .
Therefore, when δ ≥ DeltaPredictor(i), the regularized classifier does not use predictor i.
Version History Introduced in R2011b R2023b: Model stores observations with missing predictor values Behavior changed in R2023b Starting in R2023b, training observations with missing predictor values are included in the X, Xcentered, Y, and W data properties. The RowsUsed property indicates the training observations stored in the model, rather than those used for training. Observations with missing predictor values continue to be omitted from the model training process. In previous releases, the software omitted training observations that contained missing predictor values from the data properties of the model. 35-472
ClassificationDiscriminant class
References [1] Guo, Y., T. Hastie, and R. Tibshirani. "Regularized linear discriminant analysis and its application in microarrays." Biostatistics, Vol. 8, No. 1, pp. 86–100, 2007.
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: • The predict function supports code generation. • When you train a discriminant analysis model by using fitcdiscr or create a compact discriminant analysis model by using makecdiscr, the value of the 'ScoreTransform' namevalue pair argument cannot be an anonymous function. For more information, see “Introduction to Code Generation” on page 34-3.
See Also CompactClassificationDiscriminant | fitcdiscr | compareHoldout Topics “Discriminant Analysis Classification” on page 21-2
35-473
35
Functions
ClassificationEnsemble Package: classreg.learning.classif Superclasses: CompactClassificationEnsemble Ensemble classifier
Description ClassificationEnsemble combines a set of trained weak learner models and data on which these learners were trained. It can predict ensemble response for new data by aggregating predictions from its weak learners. It stores data used for training, can compute resubstitution predictions, and can resume training if desired.
Construction Create a classification ensemble object (ens) using fitcensemble.
Properties BinEdges Bin edges for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors. The software bins numeric predictors only if you specify the 'NumBins' name-value argument as a positive integer scalar when training a model with tree learners. The BinEdges property is empty if the 'NumBins' value is empty (default). You can reproduce the binned predictor data Xbinned by using the BinEdges property of the trained model mdl. X = mdl.X; % Predictor data Xbinned = zeros(size(X)); edges = mdl.BinEdges; % Find indices of binned predictors. idxNumeric = find(~cellfun(@isempty,edges)); if iscolumn(idxNumeric) idxNumeric = idxNumeric'; end for j = idxNumeric x = X(:,j); % Convert x to array if x is a table. if istable(x) x = table2array(x); end % Group x into bins by using the discretize function. xbinned = discretize(x,[-inf; edges{j}; inf]); Xbinned(:,j) = xbinned; end
35-474
ClassificationEnsemble
Xbinned contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs. CategoricalPredictors Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]). ClassNames List of the elements in Y with duplicates removed. ClassNames can be a numeric vector, categorical vector, logical vector, character array, or cell array of character vectors. ClassNames has the same data type as the data in the argument Y. (The software treats string arrays as cell arrays of character vectors.) CombineWeights Character vector describing how ens combines weak learner weights, either 'WeightedSum' or 'WeightedAverage'. Cost Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response. This property is readonly. ExpandedPredictorNames Expanded predictor names, stored as a cell array of character vectors. If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames. FitInfo Numeric array of fit information. The FitInfoDescription property describes the content of this array. FitInfoDescription Character vector describing the meaning of the FitInfo array. HyperparameterOptimizationResults Description of the cross-validation optimization of hyperparameters, stored as a BayesianOptimization object or a table of hyperparameters and associated values. Nonempty when the OptimizeHyperparameters name-value pair is nonempty at creation. Value depends on the setting of the HyperparameterOptimizationOptions name-value pair at creation: 35-475
• 'bayesopt' (default) — Object of class BayesianOptimization
• 'gridsearch' or 'randomsearch' — Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)

LearnerNames
Cell array of character vectors with names of weak learners in the ensemble. The name of each learner appears just once. For example, if you have an ensemble of 100 trees, LearnerNames is {'Tree'}.

Method
Character vector describing the method that creates ens.

ModelParameters
Parameters used in training ens.

NumObservations
Numeric scalar containing the number of observations in the training data.

NumTrained
Number of trained weak learners in ens, a scalar.

PredictorNames
Cell array of names for the predictor variables, in the order in which they appear in X.

Prior
Numeric vector of prior probabilities for each class. The order of the elements of Prior corresponds to the order of the classes in ClassNames. The number of elements of Prior is the number of unique classes in the response. This property is read-only.

ReasonForTermination
Character vector describing the reason fitcensemble stopped adding weak learners to the ensemble.

ResponseName
Character vector with the name of the response variable Y.

RowsUsed
Rows of the original training data stored in the model, specified as a logical vector. This property is empty if all rows are stored in X and Y.

ScoreTransform
Function handle for transforming scores, or character vector representing a built-in transformation function. 'none' means no transformation; equivalently, 'none' means @(x)x. For a list of built-in transformation functions and the syntax of custom transformation functions, see fitctree.
Add or change a ScoreTransform function using dot notation: ens.ScoreTransform = 'function'
or ens.ScoreTransform = @function
Trained A cell vector of trained classification models. • If Method is 'LogitBoost' or 'GentleBoost', then ClassificationEnsemble stores trained learner j in the CompactRegressionLearner property of the object stored in Trained{j}. That is, to access trained learner j, use ens.Trained{j}.CompactRegressionLearner. • Otherwise, cells of the cell vector contain the corresponding, compact classification models. TrainedWeights Numeric vector of trained weights for the weak learners in ens. TrainedWeights has T elements, where T is the number of weak learners in learners. UsePredForLearner Logical matrix of size P-by-NumTrained, where P is the number of predictors (columns) in the training data X. UsePredForLearner(i,j) is true when learner j uses predictor i, and is false otherwise. For each learner, the predictors have the same order as the columns in the training data X. If the ensemble is not of type Subspace, all entries in UsePredForLearner are true. W Scaled weights, a vector with length n, the number of rows in X. The sum of the elements of W is 1. X Matrix or table of predictor values that trained the ensemble. Each column of X represents one variable, and each row represents one observation. Y Numeric vector, categorical vector, logical vector, character array, or cell array of character vectors. Each row of Y represents the classification of the corresponding row of X.
Object Functions
compact — Reduce size of classification ensemble model
compareHoldout — Compare accuracies of two classification models using new data
crossval — Cross-validate classification ensemble model
edge — Classification edge for classification ensemble model
gather — Gather properties of Statistics and Machine Learning Toolbox object from GPU
lime — Local interpretable model-agnostic explanations (LIME)
loss — Classification loss for classification ensemble model
margin — Classification margins for classification ensemble model
partialDependence — Compute partial dependence
plotPartialDependence — Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
predict — Classify observations using ensemble of classification models
predictorImportance — Estimates of predictor importance for classification ensemble of decision trees
resubEdge — Resubstitution classification edge for classification ensemble model
resubLoss — Resubstitution classification loss for classification ensemble model
resubMargin — Resubstitution classification margins for classification ensemble model
resubPredict — Classify observations in classification ensemble model
resume — Resume training of classification ensemble model
shapley — Shapley values
testckfold — Compare accuracies of two classification models by repeated cross-validation
Copy Semantics Value. To learn how value classes affect copy operations, see Copying Objects.
Examples Train Boosted Classification Ensemble Load the ionosphere data set. load ionosphere
Train a boosted ensemble of 100 classification trees using all measurements and the AdaBoostM1 method.

Mdl = fitcensemble(X,Y,'Method','AdaBoostM1')

Mdl = 
  ClassificationEnsemble
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b'  'g'}
           ScoreTransform: 'none'
          NumObservations: 351
               NumTrained: 100
                   Method: 'AdaBoostM1'
             LearnerNames: {'Tree'}
    ReasonForTermination: 'Terminated normally after completing the requested number of training ...
                  FitInfo: [100x1 double]
       FitInfoDescription: {2x1 cell}
Mdl is a ClassificationEnsemble model object. Mdl.Trained is the property that stores a 100-by-1 cell vector of the trained classification trees (CompactClassificationTree model objects) that compose the ensemble.
Plot a graph of the first trained classification tree. view(Mdl.Trained{1},'Mode','graph')
By default, fitcensemble grows shallow trees for boosted ensembles of trees. Predict the label of the mean of X. predMeanX = predict(Mdl,mean(X)) predMeanX = 1x1 cell array {'g'}
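Because the ensemble stores its training data, you can also evaluate and extend it after training. The following is a minimal sketch that assumes the Mdl trained above; the choice of 50 additional cycles is arbitrary, and the loss values depend on the data and method.

% Resubstitution (training-set) classification error of the boosted ensemble
trainLoss = resubLoss(Mdl);

% Resume training for 50 additional boosting cycles and recheck the error
Mdl2 = resume(Mdl,50);
trainLoss2 = resubLoss(Mdl2);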
Tip For an ensemble of classification trees, the Trained property of ens stores an ens.NumTrained-by-1 cell vector of compact classification models. For a textual or graphical display of tree t in the cell vector, enter:
• view(ens.Trained{t}.CompactRegressionLearner) for ensembles aggregated using LogitBoost or GentleBoost. • view(ens.Trained{t}) for all other aggregation methods.
Version History
Introduced in R2011a

R2023b: Model with discriminant analysis weak learners stores observations with missing predictor values
Behavior changed in R2023b
Starting in R2023b, training observations with missing predictor values are included in the X, Y, and W data properties of classification ensemble models with discriminant analysis weak learners. The RowsUsed property indicates the training observations stored in the model, rather than those used for training. Observations with missing predictor values continue to be omitted from the model training process.
In previous releases, the software omitted training observations that contained missing predictor values from the data properties of the model.

R2022a: Cost property stores the user-specified cost matrix
Behavior changed in R2022a
Starting in R2022a, the Cost property stores the user-specified cost matrix, so that you can compute the observed misclassification cost using the specified cost value. The software stores normalized prior probabilities (Prior) and observation weights (W) that do not reflect the penalties described in the cost matrix. To compute the observed misclassification cost, specify the LossFun name-value argument as "classifcost" when you call the loss or resubLoss function.
Note that model training has not changed and, therefore, the decision boundaries between classes have not changed. For training, the fitting function updates the specified prior probabilities by incorporating the penalties described in the specified cost matrix, and then normalizes the prior probabilities and observation weights. This behavior has not changed.
In previous releases, the software stored the default cost matrix in the Cost property and stored the prior probabilities and observation weights used for training in the Prior and W properties, respectively. Starting in R2022a, the software stores the user-specified cost matrix without modification, and stores normalized prior probabilities and observation weights that do not reflect the cost penalties. For more details, see “Misclassification Cost Matrix, Prior Probabilities, and Observation Weights” on page 19-8.
Some object functions use the Cost, Prior, and W properties:
• The loss and resubLoss functions use the cost matrix stored in the Cost property if you specify the LossFun name-value argument as "classifcost" or "mincost".
• The loss and edge functions use the prior probabilities stored in the Prior property to normalize the observation weights of the input data.
• The resubLoss and resubEdge functions use the observation weights stored in the W property.
If you specify a nondefault cost matrix when you train a classification model, the object functions return a different value compared to previous releases.
If you want the software to handle the cost matrix, prior probabilities, and observation weights in the same way as in previous releases, adjust the prior probabilities and observation weights for the nondefault cost matrix, as described in “Adjust Prior Probabilities and Observation Weights for Misclassification Cost Matrix” on page 19-9. Then, when you train a classification model, specify the adjusted prior probabilities and observation weights by using the Prior and Weights name-value arguments, respectively, and use the default cost matrix.
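As an illustration of the R2022a behavior, the following minimal sketch trains an ensemble with a nondefault cost matrix and then requests the observed misclassification cost. The cost values are hypothetical and chosen only for the example.

load ionosphere
C = [0 5; 1 0];                   % hypothetical cost matrix: rows are the true classes 'b' and 'g'
Mdl = fitcensemble(X,Y,'Method','AdaBoostM1','Cost',C);
Mdl.Cost                          % stores the user-specified matrix, not a normalized version
obsCost = resubLoss(Mdl,'LossFun','classifcost')   % observed misclassification cost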
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: • The predict function supports code generation. • To integrate the prediction of an ensemble into Simulink, you can use the ClassificationEnsemble Predict block in the Statistics and Machine Learning Toolbox library or a MATLAB Function block with the predict function. • When you train an ensemble by using fitcensemble, the following restrictions apply. • The value of the ScoreTransform name-value argument cannot be an anonymous function. • Code generation limitations for the weak learners used in the ensemble also apply to the ensemble. • For decision tree weak learners, you cannot use surrogate splits; that is, the value of the Surrogate name-value argument must be 'off'. • For k-nearest neighbor weak learners, the value of the Distance name-value argument cannot be a custom distance function. The value of the DistanceWeight name-value argument can be a custom distance weight function, but it cannot be an anonymous function. • For fixed-point code generation, the following additional restrictions apply. • When you train an ensemble by using fitcensemble, you must train an ensemble using tree learners, and the ScoreTransform value cannot be 'invlogit'. • Categorical predictors (logical, categorical, char, string, or cell) are not supported. You cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using dummyvar before fitting the model. • Class labels with the categorical data type are not supported. Both the class label value in the training data (Tbl or Y) and the value of the ClassNames name-value argument cannot be an array with the categorical data type. For more information, see “Introduction to Code Generation” on page 34-3. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. Usage notes and limitations: • The following object functions fully support GPU arrays: • compact 35-481
• crossval • gather • predictorImportance • resubEdge • resubLoss • resubMargin • resubPredict • resume • The following object functions offer limited support for GPU arrays: • compareHoldout • edge • loss • margin • partialDependence • plotPartialDependence • predict • The object functions execute on a GPU if either of the following apply: • The model was fitted with GPU arrays. • The predictor data that you pass to the object function is a GPU array. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
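The C/C++ code generation notes above typically appear in a save/load workflow for toolbox models. The following is a minimal sketch, not taken from this page: the file name 'EnsembleModel', the entry-point name predictEnsemble, and the 1-by-34 input size (the ionosphere predictors) are assumptions.

% Save a compact version of the trained ensemble to a file for code generation.
saveLearnerForCoder(compact(Mdl),'EnsembleModel');

% Entry-point function, saved in its own file predictEnsemble.m.
function label = predictEnsemble(X) %#codegen
Mdl = loadLearnerForCoder('EnsembleModel');
label = predict(Mdl,X);
end

% Generate C code for a 1-by-34 double input (requires MATLAB Coder):
% codegen predictEnsemble -args {zeros(1,34)}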
See Also ClassificationTree | fitcensemble | CompactClassificationEnsemble | view | compareHoldout
ClassificationEnsemble Predict Classify observations using ensemble of decision trees Libraries: Statistics and Machine Learning Toolbox / Classification
Description The ClassificationEnsemble Predict block classifies observations using an ensemble of decision trees (ClassificationEnsemble, ClassificationBaggedEnsemble, or CompactClassificationEnsemble) for multiclass classification. Import a trained classification object into the block by specifying the name of a workspace variable that contains the object. The input port x receives an observation (predictor data), and the output port label returns a predicted class label for the observation. You can add the optional output port score, which returns predicted class scores or posterior probabilities.
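Before using the block, you need a suitable model in the base workspace. A minimal sketch follows, assuming the ionosphere data and the block's default variable name ensMdl; the tree template settings are illustrative only (see the training restrictions listed under Parameters below).

load ionosphere
t = templateTree('MaxNumSplits',20,'Surrogate','off');   % tree weak learners, no surrogate splits
ensMdl = fitcensemble(X,Y,'Method','AdaBoostM1','Learners',t);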
Ports Input x — Predictor data row vector | column vector Predictor data, specified as a row or column vector of one observation. The variables in x must have the same order as the predictor variables that trained the model specified by Select trained machine learning model. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point Output label — Predicted class label scalar Predicted class label, returned as a scalar. The predicted class is the class yielding the largest score. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point | enumerated score — Predicted class scores or posterior probabilities row vector Predicted class scores or posterior probabilities, returned as a row vector of size 1-by-k, where k is the number of classes in the ensemble model. 35-483
To check the order of the classes, use the ClassNames property of the ensemble model specified by Select trained machine learning model. Dependencies
• To enable this port, select the check box for Add output port for predicted class scores on the Main tab of the Block Parameters dialog box. • The definition and range of classification score values depend on the ensemble aggregation method. You can specify the ensemble aggregation method by using the Method name-value argument of fitcensemble when training the ensemble model. For details, see the “More About” on page 35-6376 section of the predict function reference page. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point
Parameters Main Select trained machine learning model — Classification ensemble model ensMdl (default) | ClassificationEnsemble object | ClassificationBaggedEnsemble object | CompactClassificationEnsemble object Specify the name of a workspace variable that contains a ClassificationEnsemble object, ClassificationBaggedEnsemble object, or CompactClassificationEnsemble object. When you train the model by using fitcensemble, the following restrictions apply: • You must train an ensemble using tree weak learners. • The predictor data cannot include categorical predictors (logical, categorical, char, string, or cell). If you supply training data in a table, the predictors must be numeric (double or single). Also, you cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using dummyvar before fitting the model. • The value of the ScoreTransform name-value argument cannot be 'invlogit' or an anonymous function. • You cannot use surrogate splits for tree weak learners, that is, the value of the Surrogate namevalue argument must be 'off' (default) when you define tree weak learners by using the templateTree function. Programmatic Use
Block Parameter: TrainedLearner Type: workspace variable Values: ClassificationEnsemble object | ClassificationBaggedEnsemble object | CompactClassificationEnsemble object Default: 'ensMdl' Add output port for predicted class scores — Add second output port for predicted class scores off (default) | on
Select the check box to include the second output port score in the ClassificationEnsemble Predict block. Programmatic Use
Block Parameter: ShowOutputScore Type: character vector Values: 'off' | 'on' Default: 'off' Data Types Fixed-Point Operational Parameters
Integer rounding mode — Rounding mode for fixed-point operations Floor (default) | Ceiling | Convergent | Nearest | Round | Simplest | Zero Specify the rounding mode for fixed-point operations. For more information, see “Rounding” (FixedPoint Designer). Block parameters always round to the nearest representable value. To control the rounding of a block parameter, enter an expression into the mask field using a MATLAB rounding function. Programmatic Use
Block Parameter: RndMeth
Type: character vector
Values: "Ceiling" | "Convergent" | "Floor" | "Nearest" | "Round" | "Simplest" | "Zero"
Default: "Floor"

Saturate on integer overflow — Method of overflow action
off (default) | on
Specify whether overflows saturate or wrap.

• Action: Select this check box (on).
Rationale: Your model has possible overflow, and you want explicit saturation protection in the generated code.
Impact on Overflows: Overflows saturate to either the minimum or maximum value that the data type can represent.
Example: The maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box selected, the block output saturates at 127. Similarly, the block output saturates at a minimum output value of –128.

• Action: Clear this check box (off).
Rationale: You want to optimize the efficiency of your generated code, or you want to avoid overspecifying how a block handles out-of-range signals. For more information, see “Troubleshoot Signal Range Errors” (Simulink).
Impact on Overflows: Overflows wrap to the appropriate value that the data type can represent.
Example: The maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box cleared, the software interprets the value causing the overflow as int8, which can produce an unintended result. For example, a block result of 130 (binary 1000 0010) expressed as int8 is –126.
Programmatic Use
Block Parameter: SaturateOnIntegerOverflow Type: character vector Values: "off" | "on" Default: "off" Lock output data type setting against changes by the fixed-point tools — Prevention of fixedpoint tools from overriding data type off (default) | on Select this parameter to prevent the fixed-point tools from overriding the data type you specify for the block. For more information, see “Use Lock Output Data Type Setting” (Fixed-Point Designer). Programmatic Use
Block Parameter: LockScale Type: character vector Values: "off" | "on" Default: "off" Data Type
Label data type — Data type of label output Inherit: Inherit via back propagation | Inherit: auto | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Enum: | Specify the data type for the label output. The type can be inherited, specified as an enumerated data type, or expressed as a data type object such as Simulink.NumericType. The supported data types depend on the labels used in the model specified by Select trained machine learning model. • If the model uses numeric or logical labels, the supported data types are Inherit: Inherit via back propagation (default), double, single, half, int8, uint8, int16, uint16, int32, uint32, int64, uint64, boolean, fixed point, and a data type object. 35-486
• If the model uses nonnumeric labels, the supported data types are Inherit: auto (default), Enum: , and a data type object. When you select an inherited option, the software behaves as follows: • Inherit: Inherit via back propagation (default for numeric and logical labels) — Simulink automatically determines the Label data type of the block during data type propagation (see “Data Type Propagation” (Simulink)). In this case, the block uses the data type of a downstream block or signal object. • Inherit: auto (default for nonnumeric labels) — The block uses an autodefined enumerated data type variable. For example, suppose the workspace variable name specified by Select trained machine learning model is myMdl, and the class labels are class 1 and class 2. Then, the corresponding label values are myMdl_enumLabels.class_1 and myMdl_enumLabels.class_2. The block converts the class labels to valid MATLAB identifiers by using the matlab.lang.makeValidName function. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: LabelDataTypeStr Type: character vector Values: "Inherit: Inherit via back propagation" | "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "Enum: " | "" Default: "Inherit: Inherit via back propagation" (for numeric and logical labels) | "Inherit: auto" (for nonnumeric labels) Label data type Minimum — Minimum value of label output for range checking [] (default) | scalar Specify the lower value of the label output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Label data type Minimum parameter does not saturate or clip the actual label output signal. To do so, use the Saturation block instead. 35-487
Dependencies
You can specify this parameter only if the model specified by Select trained machine learning model uses numeric labels. Programmatic Use
Block Parameter: LabelOutMin Type: character vector Values: "[]" | scalar Default: "[]" Label data type Maximum — Maximum value of label output for range checking [] (default) | scalar Specify the upper value of the label output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Label data type Maximum parameter does not saturate or clip the actual label output signal. To do so, use the Saturation block instead. Dependencies
You can specify this parameter only if the model specified by Select trained machine learning model uses numeric labels. Programmatic Use
Block Parameter: LabelOutMax Type: character vector Values: "[]" | scalar Default: "[]" Score data type — Data type of score output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the score output. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: auto, the block uses a rule that inherits a data type. 35-488
For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: ScoreDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "Inherit: auto" Score data type Minimum — Minimum value of score output for range checking [] (default) | scalar Specify the lower value of the score output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Score data type Minimum parameter does not saturate or clip the actual score output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: ScoreOutMin Type: character vector Values: "[]" | scalar Default: "[]" Score data type Maximum — Maximum value of score output for range checking [] (default) | scalar Specify the upper value of the score output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). 35-489
• Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Score data type Maximum parameter does not saturate or clip the actual score output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: ScoreOutMax Type: character vector Values: "[]" | scalar Default: "[]" Raw score data type — Untransformed score data type Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the internal untransformed scores. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: auto, the block uses a rule that inherits a data type. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Dependencies
You can specify this parameter only if the model specified by Select trained machine learning model uses a score transformation other than "none" (default, same as "identity"). • If the model uses no score transformations ("none" or "identity"), then you can specify the score data type by using Score data type. • If the model uses a score transformation other than "none" or "identity", then you can specify the data type of untransformed raw scores by using this parameter. To specify the data type of transformed scores, use Score data type. You can change the score transformation option by specifying the ScoreTransform name-value argument during training, or by modifying the ScoreTransform property after training. Programmatic Use
Block Parameter: RawScoreDataTypeStr Type: character vector
Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "Inherit: auto" Raw score data type Minimum — Minimum untransformed score for range checking [] (default) | scalar Specify the lower value of the untransformed score range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Raw score data type Minimum parameter does not saturate or clip the actual untransformed score signal. Programmatic Use
Block Parameter: RawScoreOutMin Type: character vector Values: "[]" | scalar Default: "[]" Raw score data type Maximum — Maximum untransformed score for range checking [] (default) | scalar Specify the upper value of the untransformed score range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Raw score data type Maximum parameter does not saturate or clip the actual untransformed score signal. 35-491
Programmatic Use
Block Parameter: RawScoreOutMax Type: character vector Values: "[]" | scalar Default: "[]" Weak learner data type — Data type of weak learner outputs Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the outputs from weak learners. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: auto, the block uses a rule that inherits a data type. For more information about data types, see “Control Data Types of Signals” (Simulink). to display the Data Type Assistant, which helps Click the Show data type assistant button you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: WeakLearnerDataTypeStr Type: character vector Values: 'Inherit: auto' | 'double' | 'single' | 'half' | 'int8' | 'uint8' | 'int16' | 'uint16' | 'int32' | 'uint32' | 'int64' | 'uint64' | 'boolean' | 'fixdt(1,16,0)' | 'fixdt(1,16,2^0,0)' | '' Default: 'Inherit: auto' Weak learner data type Minimum — Minimum value of weak learner outputs for range checking [] (default) | scalar Specify the lower value of the weak learner output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Weak learner data type Minimum parameter does not saturate or clip the actual weak learner output signals. 35-492
Programmatic Use
Block Parameter: WeakLearnerOutMin Type: character vector Values: '[]' | scalar Default: '[]' Weak learner data type Maximum — Maximum value of weak learner outputs for range checking [] (default) | scalar Specify the upper value of the weak learner output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Weak learner data type Maximum parameter does not saturate or clip the actual weak learner output signals. Programmatic Use
Block Parameter: WeakLearnerOutMax Type: character vector Values: '[]' | scalar Default: '[]'
Block Characteristics
• Data Types — Boolean | double | enumerated | fixed point | half | integer | single
• Direct Feedthrough — yes
• Multidimensional Signals — no
• Variable-Size Signals — no
• Zero-Crossing Detection — no
Alternative Functionality You can use a MATLAB Function block with the predict object function of an ensemble of decision trees (ClassificationEnsemble, ClassificationBaggedEnsemble, or CompactClassificationEnsemble). For an example, see “Predict Class Labels Using MATLAB Function Block” on page 34-49.
When deciding whether to use the ClassificationEnsemble Predict block in the Statistics and Machine Learning Toolbox library or a MATLAB Function block with the predict function, consider the following: • If you use the Statistics and Machine Learning Toolbox library block, you can use the Fixed-Point Tool to convert a floating-point model to fixed point. • Support for variable-size arrays must be enabled for a MATLAB Function block with the predict function. • If you use a MATLAB Function block, you can use MATLAB functions for preprocessing or postprocessing before or after predictions in the same MATLAB Function block.
Version History Introduced in R2021a
Extended Capabilities C/C++ Code Generation Generate C and C++ code using Simulink® Coder™. Fixed-Point Conversion Design and simulate fixed-point systems using Fixed-Point Designer™.
See Also Blocks ClassificationSVM Predict | ClassificationTree Predict | ClassificationNeuralNetwork Predict | RegressionEnsemble Predict Objects ClassificationEnsemble | ClassificationBaggedEnsemble | CompactClassificationEnsemble Functions predict | fitcensemble Topics “Predict Class Labels Using ClassificationSVM Predict Block” on page 34-121 “Predict Class Labels Using ClassificationTree Predict Block” on page 34-131 “Predict Class Labels Using ClassificationNeuralNetwork Predict Block” on page 34-154 “Predict Class Labels Using MATLAB Function Block” on page 34-49
ClassificationKNN k-nearest neighbor classification
Description ClassificationKNN is a nearest neighbor classification model in which you can alter both the distance metric and the number of nearest neighbors. Because a ClassificationKNN classifier stores training data, you can use the model to compute resubstitution predictions. Alternatively, use the model to classify new observations using the predict method.
Creation Create a ClassificationKNN model using fitcknn.
Properties

KNN Properties

BreakTies — Tie-breaking algorithm
'smallest' (default) | 'nearest' | 'random'
Tie-breaking algorithm used by predict when multiple classes have the same smallest cost, specified as one of the following:
• 'smallest' — Use the smallest index among tied groups.
• 'nearest' — Use the class with the nearest neighbor among tied groups.
• 'random' — Use a random tiebreaker among tied groups.
By default, ties occur when multiple classes have the same number of nearest points among the k nearest neighbors. BreakTies applies when IncludeTies is false.
Change BreakTies using dot notation: mdl.BreakTies = newBreakTies.

Distance — Distance metric
'cityblock' | 'chebychev' | 'correlation' | 'cosine' | 'euclidean' | 'hamming' | function handle | ...
Distance metric, specified as a valid distance metric name or function handle. The allowable distance metric names depend on your choice of a neighbor-searcher method (see NSMethod):
• exhaustive — Any distance metric of ExhaustiveSearcher
• kdtree — 'cityblock', 'chebychev', 'euclidean', or 'minkowski'
This table includes valid distance metrics of ExhaustiveSearcher.
• 'cityblock' — City block distance.
• 'chebychev' — Chebychev distance (maximum coordinate difference).
• 'correlation' — One minus the sample linear correlation between observations (treated as sequences of values).
• 'cosine' — One minus the cosine of the included angle between observations (treated as vectors).
• 'euclidean' — Euclidean distance.
• 'hamming' — Hamming distance, percentage of coordinates that differ.
• 'jaccard' — One minus the Jaccard coefficient, the percentage of nonzero coordinates that differ.
• 'mahalanobis' — Mahalanobis distance, computed using a positive definite covariance matrix C. The default value of C is the sample covariance matrix of X, as computed by cov(X,'omitrows'). To specify a different value for C, use the 'Cov' name-value pair argument.
• 'minkowski' — Minkowski distance. The default exponent is 2. To specify a different exponent, use the 'Exponent' name-value pair argument.
• 'seuclidean' — Standardized Euclidean distance. Each coordinate difference between X and a query point is scaled, meaning divided by a scale value S. The default value of S is the standard deviation computed from X, S = std(X,'omitnan'). To specify another value for S, use the Scale name-value pair argument.
• 'spearman' — One minus the sample Spearman's rank correlation between observations (treated as sequences of values).
• @distfun — Distance function handle. distfun has the form
  function D2 = distfun(ZI,ZJ)
  % calculation of distance
  ...
  where
  • ZI is a 1-by-N vector containing one row of X or Y.
  • ZJ is an M2-by-N matrix containing multiple rows of X or Y.
  • D2 is an M2-by-1 vector of distances, and D2(k) is the distance between observations ZI and ZJ(k,:).
If you specify CategoricalPredictors as 'all', then the default distance metric is 'hamming'. Otherwise, the default distance metric is 'euclidean'.
Change Distance using dot notation: mdl.Distance = newDistance. If NSMethod is 'kdtree', you can use dot notation to change Distance only for the metrics 'cityblock', 'chebychev', 'euclidean', and 'minkowski'.
For definitions, see “Distance Metrics” on page 19-14.
Example: 'Distance','minkowski'
Data Types: char | string | function_handle
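As a sketch of the function-handle form, a weighted Euclidean distance could be defined and passed to fitcknn as follows. The per-feature weights are hypothetical, chosen only to illustrate the required ZI/ZJ signature.

load fisheriris
w = [1 1 2 2];                                   % assumed per-feature weights
distfun = @(ZI,ZJ) sqrt(((ZI - ZJ).^2) * w');    % ZI is 1-by-N, ZJ is M2-by-N, output is M2-by-1
Mdl = fitcknn(meas,species,'NumNeighbors',5,'Distance',distfun);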
DistanceWeight — Distance weighting function
'equal' | 'inverse' | 'squaredinverse' | function handle
Distance weighting function, specified as one of the values in this table.
• 'equal' — No weighting
• 'inverse' — Weight is 1/distance
• 'squaredinverse' — Weight is 1/distance^2
• @fcn — fcn is a function that accepts a matrix of nonnegative distances and returns a matrix of the same size containing nonnegative distance weights. For example, 'squaredinverse' is equivalent to @(d)d.^(–2).
Change DistanceWeight using dot notation: mdl.DistanceWeight = newDistanceWeight.
Data Types: char | function_handle

DistParameter — Parameter for distance metric
positive definite covariance matrix | positive scalar | vector of positive scale values
Parameter for the distance metric, specified as one of the values described in this table.
• 'mahalanobis' — Positive definite covariance matrix C
• 'minkowski' — Minkowski distance exponent, a positive scalar
• 'seuclidean' — Vector of positive scale values with length equal to the number of columns of X
For any other distance metric, the value of DistParameter must be [].
You can alter DistParameter using dot notation: mdl.DistParameter = newDistParameter. However, if Distance is 'mahalanobis' or 'seuclidean', then you cannot alter DistParameter.
Data Types: single | double

IncludeTies — Tie inclusion flag
false (default) | true
Tie inclusion flag indicating whether predict includes all the neighbors whose distance values are equal to the kth smallest distance, specified as false or true. If IncludeTies is true, predict includes all of these neighbors. Otherwise, predict uses exactly k neighbors (see the BreakTies property).
Change IncludeTies using dot notation: mdl.IncludeTies = newIncludeTies.
Data Types: logical

NSMethod — Nearest neighbor search method
'kdtree' | 'exhaustive'
This property is read-only.
Nearest neighbor search method, specified as either 'kdtree' or 'exhaustive'. • 'kdtree' — Creates and uses a Kd-tree to find nearest neighbors. • 'exhaustive' — Uses the exhaustive search algorithm. When predicting the class of a new point xnew, the software computes the distance values from all points in X to xnew to find nearest neighbors. The default value is 'kdtree' when X has 10 or fewer columns, X is not sparse, and the distance metric is a 'kdtree' type. Otherwise, the default value is 'exhaustive'. NumNeighbors — Number of nearest neighbors positive integer value Number of nearest neighbors in X used to classify each point during prediction, specified as a positive integer value. Change NumNeighbors using dot notation: mdl.NumNeighbors = newNumNeighbors. Data Types: single | double Other Classification Properties CategoricalPredictors — Categorical predictor indices [] | vector of positive integers This property is read-only. Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]). Data Types: double ClassNames — Names of classes in training data Y categorical array | character array | logical vector | numeric vector | cell array of character vectors This property is read-only. Names of the classes in the training data Y with duplicates removed, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as Y. (The software treats string arrays as cell arrays of character vectors.) Data Types: categorical | char | logical | single | double | cell Cost — Cost of misclassification square matrix Cost of the misclassification of a point, specified as a square matrix. Cost(i,j) is the cost of classifying a point into class j if its true class is i (that is, the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns in Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response. By default, Cost(i,j) = 1 if i ~= j, and Cost(i,j) = 0 if i = j. In other words, the cost is 0 for correct classification and 1 for incorrect classification. 35-498
Change a Cost matrix using dot notation: mdl.Cost = costMatrix. Data Types: single | double ExpandedPredictorNames — Expanded predictor names cell array of character vectors This property is read-only. Expanded predictor names, specified as a cell array of character vectors. If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames. Data Types: cell ModelParameters — Parameters used in training ClassificationKNN object This property is read-only. Parameters used in training the ClassificationKNN model, specified as an object. Mu — Predictor means numeric vector This property is read-only. Predictor means, specified as a numeric vector of length numel(PredictorNames). If you do not standardize mdl when training the model using fitcknn, then Mu is empty ([]). Data Types: single | double NumObservations — Number of observations positive integer scalar This property is read-only. Number of observations used in training the ClassificationKNN model, specified as a positive integer scalar. This number can be less than the number of rows in the training data because rows containing NaN values are not part of the fit. Data Types: double PredictorNames — Predictor variable names cell array of character vectors This property is read-only. Predictor variable names, specified as a cell array of character vectors. The variable names are in the same order in which they appear in the training data X. Data Types: cell Prior — Prior probabilities for each class numeric vector 35-499
Prior probabilities for each class, specified as a numeric vector. The order of the elements in Prior corresponds to the order of the classes in ClassNames.
Add or change a Prior vector using dot notation: mdl.Prior = priorVector.
Data Types: single | double

ResponseName — Response variable name
character vector
This property is read-only.
Response variable name, specified as a character vector.
Data Types: char

RowsUsed — Rows used in fitting
[] | logical vector
This property is read-only.
Rows of the original training data used in fitting the ClassificationKNN model, specified as a logical vector. This property is empty if all rows are used.
Data Types: logical

ScoreTransform — Score transformation
'none' (default) | 'doublelogit' | 'invlogit' | 'ismax' | 'logit' | function handle | ...
Score transformation, specified as either a character vector or a function handle. This table summarizes the available character vectors.
• "doublelogit" — 1/(1 + e^(–2x))
• "invlogit" — log(x / (1 – x))
• "ismax" — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
• "logit" — 1/(1 + e^(–x))
• "none" or "identity" — x (no transformation)
• "sign" — –1 for x < 0; 0 for x = 0; 1 for x > 0
• "symmetric" — 2x – 1
• "symmetricismax" — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
• "symmetriclogit" — 2/(1 + e^(–x)) – 1
For a MATLAB function or a function you define, use its function handle for score transform. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).
Change ScoreTransform using dot notation: mdl.ScoreTransform = newScoreTransform.
Data Types: char | function_handle Sigma — Predictor standard deviations numeric vector This property is read-only. Predictor standard deviations, specified as a numeric vector of length numel(PredictorNames). If you do not standardize the predictor variables during training, then Sigma is empty ([]). Data Types: single | double W — Observation weights vector of nonnegative values This property is read-only. Observation weights, specified as a vector of nonnegative values with the same number of rows as Y. Each entry in W specifies the relative importance of the corresponding observation in Y. Data Types: single | double X — Unstandardized predictor data numeric matrix This property is read-only. Unstandardized predictor data, specified as a numeric matrix. Each column of X represents one predictor (variable), and each row represents one observation. Data Types: single | double Y — Class labels categorical array | character array | logical vector | numeric vector | cell array of character vectors This property is read-only. Class labels, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Each value in Y is the observed class label for the corresponding row in X. Y has the same data type as the data in Y used for training the model. (The software treats string arrays as cell arrays of character vectors.) Data Types: single | double | logical | char | cell | categorical Hyperparameter Optimization Properties HyperparameterOptimizationResults — Cross-validation optimization of hyperparameters BayesianOptimization object | table This property is read-only. Cross-validation optimization of hyperparameters, specified as a BayesianOptimization object or a table of hyperparameters and associated values. This property is nonempty when the 'OptimizeHyperparameters' name-value pair argument is nonempty when you create the model using fitcknn. The value depends on the setting of the 'HyperparameterOptimizationOptions' name-value pair argument when you create the model: 35-501
• 'bayesopt' (default) — Object of class BayesianOptimization • 'gridsearch' or 'randomsearch' — Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)
Object Functions
compareHoldout — Compare accuracies of two classification models using new data
crossval — Cross-validate machine learning model
edge — Edge of k-nearest neighbor classifier
gather — Gather properties of Statistics and Machine Learning Toolbox object from GPU
lime — Local interpretable model-agnostic explanations (LIME)
loss — Loss of k-nearest neighbor classifier
margin — Margin of k-nearest neighbor classifier
partialDependence — Compute partial dependence
plotPartialDependence — Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
predict — Predict labels using k-nearest neighbor classification model
resubEdge — Resubstitution classification edge
resubLoss — Resubstitution classification loss
resubMargin — Resubstitution classification margin
resubPredict — Classify training data using trained classifier
shapley — Shapley values
testckfold — Compare accuracies of two classification models by repeated cross-validation
Examples Train k-Nearest Neighbor Classifier Train a k-nearest neighbor classifier for Fisher's iris data, where k, the number of nearest neighbors in the predictors, is 5. Load Fisher's iris data. load fisheriris X = meas; Y = species;
X is a numeric matrix that contains four petal measurements for 150 irises. Y is a cell array of character vectors that contains the corresponding iris species.
Train a 5-nearest neighbor classifier. Standardize the noncategorical predictor data.

Mdl = fitcknn(X,Y,'NumNeighbors',5,'Standardize',1)

Mdl = 
  ClassificationKNN
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'setosa'  'versicolor'  'virginica'}
           ScoreTransform: 'none'
          NumObservations: 150
                 Distance: 'euclidean'
             NumNeighbors: 5

Mdl is a trained ClassificationKNN classifier, and some of its properties appear in the Command Window.
To access the properties of Mdl, use dot notation.

Mdl.ClassNames

ans = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }

Mdl.Prior

ans = 1×3
    0.3333    0.3333    0.3333
Mdl.Prior contains the class prior probabilities, which you can specify using the 'Prior' name-value pair argument in fitcknn. The order of the class prior probabilities corresponds to the order of the classes in Mdl.ClassNames. By default, the prior probabilities are the respective relative frequencies of the classes in the data. You can also reset the prior probabilities after training. For example, set the prior probabilities to 0.5, 0.2, and 0.3, respectively.
You can pass Mdl to predict to label new measurements or crossval to cross-validate the classifier.
Tips • The compact function reduces the size of most classification models by removing the training data properties and any other properties that are not required to predict the labels of new observations. Because k-nearest neighbor classification models require all of the training data to predict labels, you cannot reduce the size of a ClassificationKNN model.
Alternative Functionality knnsearch finds the k-nearest neighbors of points. rangesearch finds all the points within a fixed distance. You can use these functions for classification, as shown in “Classify Query Data” on page 1921. If you want to perform classification, then using ClassificationKNN models can be more convenient because you can train a classifier in one step (using fitcknn) and classify in other steps (using predict). Alternatively, you can train a k-nearest neighbor classification model using one of the cross-validation options in the call to fitcknn. In this case, fitcknn returns a ClassificationPartitionedModel cross-validated model object. 35-503
Version History Introduced in R2012a
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: • The predict function supports code generation. • When you train a k-nearest neighbor classification model by using fitcknn, the following restrictions apply. • The value of the 'Distance' name-value pair argument cannot be a custom distance function. • The value of the 'DistanceWeight' name-value pair argument can be a custom distance weight function, but it cannot be an anonymous function. • The value of the 'ScoreTransform' name-value pair argument cannot be an anonymous function. For more information, see “Introduction to Code Generation” on page 34-3. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. Usage notes and limitations: • The following object functions fully support GPU arrays: • crossval • gather • resubEdge • resubLoss • resubMargin • resubPredict • The following object functions offer limited support for GPU arrays: • compareHoldout • edge • loss • margin • partialDependence • plotPartialDependence • predict • The object functions execute on a GPU if either of the following apply: • The model was fitted with GPU arrays. 35-504
• The predictor data that you pass to the object function is a GPU array. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
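A minimal GPU sketch, assuming Parallel Computing Toolbox and a supported GPU are available; the data set and neighbor count are illustrative only:

load fisheriris
Xgpu = gpuArray(meas);                        % move predictor data to the GPU
Mdl = fitcknn(Xgpu,species,'NumNeighbors',5);
labels = predict(Mdl,Xgpu(1:5,:));            % prediction executes on the GPU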
See Also fitcknn | predict Topics “Construct KNN Classifier” on page 19-31 “Examine Quality of KNN Classifier” on page 19-31 “Predict Classification Using KNN Classifier” on page 19-32 “Modify KNN Classifier” on page 19-32 “Classification Using Nearest Neighbors” on page 19-14
KNN Search Find k-nearest neighbors using searcher object Libraries: Statistics and Machine Learning Toolbox / Neighborhood Searcher
Description The KNN Search block finds the nearest neighbors in the data to a query point using a nearest neighbor searcher object (ExhaustiveSearcher or KDTreeSearcher). Import a trained searcher object containing observation data into the block by specifying the name of a workspace variable that contains the object. The input port x receives a query point, and the output port Idx returns the indices of the k-nearest neighbor points in the data. The optional output port D returns the distances between the query point and the nearest neighbor points.
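A minimal workspace sketch (the data set and variable names are assumptions) that creates the searcher object the block references, together with the equivalent command-line search:

load fisheriris
searcher = createns(meas,'NSMethod','kdtree','Distance','euclidean');   % KDTreeSearcher object
% Equivalent command-line query for the three nearest neighbors of a point x:
x = meas(1,:);
[Idx,D] = knnsearch(searcher,x,'K',3);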
Ports Input x — Query point row vector Query point, specified as a row vector. x must have the same number of columns as the number of predictor variables in the searcher object specified by Select nearest neighbor searcher. The columns of x must be in the same order as those in the searcher object. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point Output Idx — Indices of nearest neighbors numeric row vector | 1-by-1 cell array Indices of the nearest neighbors in the data, returned as a numeric row vector or 1-by-1 cell array. • If you do not select Include ties on the Main tab of the Block Parameters dialog box, then the block returns a 1-by-k numeric row vector, where k is the number of nearest neighbors searched. Each column of the row vector contains the index of a nearest neighbor point in the data, ordered by increasing distance to the query point x. • If you select Include ties on the Main tab of the Block Parameters dialog box, then the block returns a 1-by-1 cell array as a variable-size signal containing a numeric row vector of at least k indices of the closest observations in the data to the query point x. The columns of the vector are ordered by increasing distance to the query point. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | fixed point 35-506
D — Distances of nearest neighbors numeric row vector | 1-by-1 cell array Distances of the nearest neighbors to the query points, returned as a numeric row vector or 1-by-1 cell array. • If you do not select Include ties on the Main tab of the Block Parameters dialog box, then the block returns a 1-by-k numeric row vector, where k is the number of nearest neighbors searched. Each column of the row vector contains the distance of a nearest neighbor point in the data to the query point x, according to the distance metric. The columns of the row vector are ordered by increasing distance to the query point. • If you select Include ties on the Main tab of the Block Parameters dialog box, then the block returns a 1-by-1 cell array as a variable-size signal containing a numeric row vector of at least k distances of the closest observations in the data to the query point x. The columns of the vector are ordered by increasing distance to the query point. Dependencies
To enable this port, select Add output port for nearest neighbor distances in the KNN Search block. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | fixed point
Parameters Main Select nearest neighbor searcher — Nearest neighbor search method searcher (default) | ExhaustiveSearcher object | KDTreeSearcher object Specify the name of a workspace variable that contains an ExhaustiveSearcher or KDTreeSearcher object. Note The software uses the default settings for all parameters that you can specify in the Block Parameters dialog box. The parameters in the dialog box override those of the searcher object. Programmatic Use
Block Parameter: NeighborhoodSearcher Type: workspace variable Values: ExhaustiveSearcher object | KDTreeSearcher object Default: "searcher" Add output port for nearest neighbor distances — Add second output port for nearest neighbor distances off (default) | on Select the check box to include the second outport port D in the KNN Search block.
Programmatic Use
Block Parameter: ShowOutputDistances Type: character vector Values: "off" | "on" Default: "off" Number of nearest neighbors — Number of nearest neighbors 1 (default) | positive integer Specify the number of nearest neighbors to find in the data for the query point. Programmatic Use
Block Parameter: NumNeighbors Type: positive integer Values: single | double Default: 1 Include ties — Flag to include all nearest neighbors off (default) | on If you do not select Include ties on the Main tab of the Block Parameters dialog box, then the block selects the observation with the smallest index among the observations that have the same distance from the query point. If you select Include ties: • The block output includes all nearest neighbors whose distances are equal to the kth smallest distance in the output arguments. If more than five nearest neighbors have equal distance to the kth smallest distance, the block output includes only the first five nearest neighbors with the smallest index values. • The Idx and D block outputs are 1-by-1 cell arrays where each cell contains a vector of at least k indices and distances, respectively. The columns in the vectors are ordered by increasing distance to the query point. Programmatic Use
Block Parameter: IncludeTies Type: character vector Values: "off" | "on" Default: "off" Distance metric — Distance metric euclidean (default) | chebychev | cityblock | minkowski | correlation | cosine | hamming | jaccard | mahalanobis | seuclidean | spearman Specify the distance metric used to find nearest neighbors in the data to the query point. For both ExhaustiveSearcher and KDTreeSearcher objects, the block supports these distance metrics.
Value           Description
"chebychev"     Chebychev distance (maximum coordinate difference)
"cityblock"     City block distance
"euclidean"     Euclidean distance
"minkowski"     Minkowski distance. The default exponent is 2. You can specify a different exponent in the Block Parameters dialog box.
For an ExhaustiveSearcher object, the block also supports these distance metrics.

Value            Description
"correlation"    One minus the sample linear correlation between observations (treated as sequences of values)
"cosine"         One minus the cosine of the included angle between observations (treated as row vectors)
"hamming"        Hamming distance, which is the percentage of coordinates that differ
"jaccard"        One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ
"mahalanobis"    Mahalanobis distance, computed using a positive definite covariance matrix. The block computes the covariance matrix from the data in the searcher object, by default. You can specify a customized covariance matrix in the Block Parameters dialog box.
"seuclidean"     Standardized Euclidean distance. Each coordinate difference between the query point x and the data is scaled by dividing by the corresponding element of the standard deviation computed from the data. You can specify a different scaling method in the Block Parameters dialog box.
"spearman"       One minus the sample Spearman's rank correlation between observations (treated as sequences of values)
Note • The distance metric setting overrides the Distance property of the specified searcher object. • The KNN Search block does not support the "fasteuclidean" or "fastseuclidean" distance metric (see “Distance Metrics” on page 35-5943).
Programmatic Use
Block Parameter: DistanceMetric
Type: character vector Values: "euclidean" | "chebychev" | "cityblock" | "minkowski" | "correlation" | "cosine" | "hamming" | "jaccard" | "mahalanobis" | "seuclidean" | "spearman" Default: "euclidean" Covariance matrix — Covariance matrix for Mahalanobis distance metric Computed using data in searcher (default) | Customized The block computes the covariance matrix from the data in the searcher object, by default. You can specify a customized covariance matrix by selecting Customized and entering a positive definite matrix in the Customized matrix box. Note This setting overrides the DistParameter property of the specified searcher object. Programmatic Use
Block Parameter: CovarianceMatrix Type: positive definite matrix Values: "Computed using data in searcher" | "Customized" Default: "Computed using data in searcher" Dependencies
To enable this parameter, set Distance Metric to "mahalanobis". Scale — Scale parameter value for standardized Euclidean distance metric Standard deviation of data in searcher (default) | Customized The block computes the scale parameter value from the data in the searcher object, by default. You can specify a customized scale parameter value by selecting Customized and entering a nonnegative numeric row vector in the Customized scale text box. The row vector must have the same number of columns as the number of predictor variables in the searcher object. When the block computes the standardized Euclidean distance, each coordinate of the data is scaled by the corresponding element of Scale, as is the query point. Note This setting overrides the DistParameter property of the specified searcher object. Programmatic Use
Block Parameter: Scale Type: nonnegative numeric row vector Values: "Standard deviation of data in searcher" | "Customized" Default: "Standard deviation of data in searcher" Dependencies
To enable this parameter, set Distance Metric to "seuclidean". P — Exponent for Minkowski distance metric 2 (default) | positive integer 35-510
Specify the exponent for the Minkowski distance metric. For the default case of P = 2, the Minkowski distance gives the Euclidean distance. For the special case of P = 1, the Minkowski distance gives the city block distance. For the special case of P = ∞, the Minkowski distance gives the Chebychev distance. Note This setting overrides the DistParameter property of the specified searcher object. Programmatic Use
Block Parameter: MinkExp Type: positive integer Values: positive integer Default: 2 Dependencies
To enable this parameter, set Distance Metric to "minkowski". Data Types Fixed-Point Operational Parameters
Integer rounding mode — Rounding mode for fixed-point operations Floor (default) | Ceiling | Convergent | Nearest | Round | Simplest | Zero Specify the rounding mode for fixed-point operations. For more information, see “Rounding” (Fixed-Point Designer). Block parameters always round to the nearest representable value. To control the rounding of a block parameter, enter an expression into the mask field using a MATLAB rounding function. Programmatic Use
Block Parameter: RndMeth Type: character vector Values: "Ceiling" | "Convergent" | "Floor" | "Nearest" | "Round" | "Simplest" | "Zero" Default: "Floor" Saturate on integer overflow — Method of overflow action off (default) | on Specify whether overflows saturate or wrap.
• Select this check box (on).
Rationale: Your model has possible overflow, and you want explicit saturation protection in the generated code.
Impact on overflows: Overflows saturate to either the minimum or maximum value that the data type can represent.
Example: The maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box selected, the block output saturates at 127. Similarly, the block output saturates at a minimum output value of –128.
• Clear this check box (off).
Rationale: You want to optimize the efficiency of your generated code, or you want to avoid overspecifying how a block handles out-of-range signals. For more information, see “Troubleshoot Signal Range Errors” (Simulink).
Impact on overflows: Overflows wrap to the appropriate value that the data type can represent.
Example: The maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box cleared, the software interprets the value causing the overflow as int8, which can produce an unintended result. For example, a block result of 130 (binary 1000 0010) expressed as int8 is –126.
Programmatic Use
Block Parameter: SaturateOnIntegerOverflow Type: character vector Values: "off" | "on" Default: "off" Lock output data type setting against changes by the fixed-point tools — Prevention of fixed-point tools from overriding data type off (default) | on Select this parameter to prevent the fixed-point tools from overriding the data type you specify for the block. For more information, see “Use Lock Output Data Type Setting” (Fixed-Point Designer). Programmatic Use
Block Parameter: LockScale Type: character vector Values: "off" | "on" Default: "off"
Data Type
Index data type — Data type of index output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the Idx output. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: IndicesDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "Inherit: auto" Index data type Minimum — Minimum of index output [] (default) | scalar Specify the minimum value of the Idx output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Index data type Minimum parameter does not saturate or clip the actual Idx output signal. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: IndicesOutMin Type: scalar Values: "[]" | scalar Default: "[]" Index data type Maximum — Maximum of index output
[] (default) | scalar Specify the maximum value of the Idx output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Index data type Maximum parameter does not saturate or clip the actual Idx output signal. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: IndicesOutMax Type: scalar Values: "[]" | scalar Default: "[]" Distance data type — Data type of distance output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the distance (D) output. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: DistanceDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "Inherit: auto" Note Fixed-point data types are not supported for the Spearman distance metric.
Dependencies
To enable this parameter, select Add output port for nearest neighbor distances on the Main tab of the Block Parameters dialog box. Distance data type Minimum — Minimum of distance output [] (default) | scalar Specify the minimum value of the distance (D) output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Distance data type Minimum parameter does not saturate or clip the actual D output signal. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: DistanceOutMin Type: scalar Values: "[]" | scalar Default: "[]" Dependencies
To enable this parameter, select Add output port for nearest neighbor distances on the Main tab of the Block Parameters dialog box. Distance data type Maximum — Maximum of distance output [] (default) | scalar Specify the maximum value of the distance (D) output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop
(SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Distance data type Maximum parameter does not saturate or clip the actual D output signal. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: DistanceOutMax Type: scalar Values: "[]" | scalar Default: "[]" Dependencies
To enable this parameter, select Add output port for nearest neighbor distances on the Main tab of the Block Parameters dialog box.
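The Programmatic Use entries above can also be applied from the command line with set_param. The following is a minimal sketch; the model and block names are hypothetical, and parameter values are passed as character vectors.

blk = 'myModel/KNN Search';                     % hypothetical block path
set_param(blk,'NeighborhoodSearcher','searcher', ...
    'NumNeighbors','3', ...
    'DistanceMetric','minkowski', ...
    'MinkExp','3', ...
    'ShowOutputDistances','on');                % enable the D output port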
Block Characteristics

Data Types                  Boolean | double | enumerated | fixed point | half | integer | single
Direct Feedthrough          yes
Multidimensional Signals    no
Variable-Size Signals       yes
Zero-Crossing Detection     no
Alternative Functionality You can use a MATLAB Function block with the knnsearch object function of a nearest neighbor searcher object (ExhaustiveSearcher or KDTreeSearcher). For an example, see “Predict Class Labels Using MATLAB Function Block” on page 34-49. When deciding whether to use the KNN Search block in the Statistics and Machine Learning Toolbox library or a MATLAB Function block with the knnsearch function, consider the following: • If you use the Statistics and Machine Learning Toolbox library block, you can use the Fixed-Point Tool to convert a floating-point model to fixed point. • Support for variable-size arrays must be enabled for a MATLAB Function block with the knnsearch function.
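For example, the following is a minimal sketch of a MATLAB Function block body that performs a comparable search with the knnsearch object function. The data matrix is a hypothetical stand-in for your observation data.

function [idx,d] = findNeighbors(x)
%#codegen
data = [1 2; 3 4; 5 6; 7 8];              % hypothetical observation data
searcher = ExhaustiveSearcher(data);       % or KDTreeSearcher(data)
[idx,d] = knnsearch(searcher,x,'K',3);     % indices and distances of the 3 nearest neighbors
end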
Version History Introduced in R2023b
Extended Capabilities C/C++ Code Generation Generate C and C++ code using Simulink® Coder™. Fixed-Point Conversion Design and simulate fixed-point systems using Fixed-Point Designer™.
See Also
createns | knnsearch | ExhaustiveSearcher | KDTreeSearcher

Topics
“Predict Class Labels Using ClassificationKNN Predict Block” on page 34-168
“Predict Class Labels Using MATLAB Function Block” on page 34-49
“k-Nearest Neighbor Search and Radius Search” on page 19-16
“Distance Metrics” on page 19-14
ClassificationKNN Predict
Classify observations using nearest neighbor classification model

Libraries:
Statistics and Machine Learning Toolbox / Classification

Description

The ClassificationKNN Predict block classifies observations using a nearest neighbor classification object (ClassificationKNN) for multiclass classification. Import a trained classification object into the block by specifying the name of a workspace variable that contains the object. The input port x receives an observation (predictor data), and the output port label returns a predicted class label for the observation. The optional output score returns the predicted class scores or posterior probabilities. The optional output cost returns the expected classification costs.
Ports

Input

x — Predictor data
row vector | column vector

Predictor data, specified as a row or column vector of one observation. The variables in x must have the same order as the predictor variables that trained the model specified by Select trained machine learning model.
Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point

Output

label — Predicted class label
scalar

Predicted class label, returned as a scalar. The predicted class is the class that minimizes the expected classification cost. For more details, see the “Algorithms” on page 35-6325 section of the predict object function.
Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point | enumerated

score — Predicted class scores or posterior probabilities
row vector
Predicted class scores or posterior probabilities, returned as a row vector of size 1-by-k, where k is the number of classes in the nearest neighbor model. The classification score Score(i) represents the posterior probability that the observation in x belongs to class i. To check the order of the classes, use the ClassNames property of the nearest neighbor model specified by Select trained machine learning model. Dependencies
To enable this port, select the check box for Add output port for predicted class scores on the Main tab of the Block Parameters dialog box. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point cost — Expected classification costs row vector Expected classification costs, returned as a row vector of size 1-by-k, where k is the number of classes in the nearest neighbor model. The classification cost Cost(i) represents the cost of classifying the observation in x to class i. To check the order of the classes, use the ClassNames property of the nearest neighbor model specified by Select trained machine learning model. Dependencies
To enable this port, select the check box for Add output port for expected costs on the Main tab of the Block Parameters dialog box. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point
Parameters

Main

Select trained machine learning model — Nearest neighbor classification model
knnMdl (default) | ClassificationKNN object

Specify the name of a workspace variable that contains a ClassificationKNN object. When you train the model by using fitcknn, the following restrictions apply:
• The predictor data cannot include categorical predictors (logical, categorical, char, string, or cell). If you supply training data in a table, the predictors must be numeric (double or single). Also, you cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using dummyvar before fitting the model (see the sketch after this list).
• The value of the ScoreTransform name-value argument cannot be "invlogit" or an anonymous function.
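A minimal sketch of expanding a categorical predictor with dummyvar before training the model that the block imports; the table tbl and its variables are hypothetical.

colorDummies = dummyvar(categorical(tbl.Color));   % one indicator column per category
Xtrain = [tbl.X1 tbl.X2 colorDummies];             % all-numeric predictor matrix
knnMdl = fitcknn(Xtrain,tbl.Y,'NumNeighbors',5);   % workspace variable for the block to import

Programmatic Use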
Block Parameter: TrainedLearner
Type: workspace variable Values: ClassificationKNN object Default: 'knnMdl' Add output port for predicted class scores — Add optional output port for predicted class scores off (default) | on Select the check box to include the output port score in the ClassificationKNN Predict block. Programmatic Use
Block Parameter: ShowOutputScore Type: character vector Values: 'off' | 'on' Default: 'off' Add output port for expected costs — Add optional output port for expected classification costs off (default) | on Select the check box to include the output port cost in the ClassificationKNN Predict block. Programmatic Use
Block Parameter: ShowOutputCost Type: character vector Values: 'off' | 'on' Default: 'off' Data Types Fixed-Point Operational Parameters
Integer rounding mode — Rounding mode for fixed-point operations Floor (default) | Ceiling | Convergent | Nearest | Round | Simplest | Zero Specify the rounding mode for fixed-point operations. For more information, see “Rounding” (Fixed-Point Designer). Block parameters always round to the nearest representable value. To control the rounding of a block parameter, enter an expression into the mask field using a MATLAB rounding function. Programmatic Use
Block Parameter: RndMeth Type: character vector Values: "Ceiling" | "Convergent" | "Floor" | "Nearest" | "Round" | "Simplest" | "Zero" Default: "Floor" Saturate on integer overflow — Method of overflow action off (default) | on Specify whether overflows saturate or wrap.
• Select this check box (on).
Rationale: Your model has possible overflow, and you want explicit saturation protection in the generated code.
Impact on overflows: Overflows saturate to either the minimum or maximum value that the data type can represent.
Example: The maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box selected, the block output saturates at 127. Similarly, the block output saturates at a minimum output value of –128.
• Clear this check box (off).
Rationale: You want to optimize the efficiency of your generated code, or you want to avoid overspecifying how a block handles out-of-range signals. For more information, see “Troubleshoot Signal Range Errors” (Simulink).
Impact on overflows: Overflows wrap to the appropriate value that the data type can represent.
Example: The maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box cleared, the software interprets the value causing the overflow as int8, which can produce an unintended result. For example, a block result of 130 (binary 1000 0010) expressed as int8 is –126.
Programmatic Use
Block Parameter: SaturateOnIntegerOverflow Type: character vector Values: "off" | "on" Default: "off" Lock output data type setting against changes by the fixed-point tools — Prevention of fixed-point tools from overriding data type off (default) | on Select this parameter to prevent the fixed-point tools from overriding the data type you specify for the block. For more information, see “Use Lock Output Data Type Setting” (Fixed-Point Designer). Programmatic Use
Block Parameter: LockScale Type: character vector Values: "off" | "on" Default: "off"
Data Type
Label data type — Data type of label output
Inherit: Inherit via back propagation | Inherit: auto | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Enum: |

Specify the data type for the label output. The type can be inherited, specified as an enumerated data type, or expressed as a data type object such as Simulink.NumericType.

The supported data types depend on the labels used in the model specified by Select trained machine learning model.
• If the model uses numeric or logical labels, the supported data types are Inherit: Inherit via back propagation (default), double, single, half, int8, uint8, int16, uint16, int32, uint32, int64, uint64, boolean, fixed point, and a data type object.
• If the model uses nonnumeric labels, the supported data types are Inherit: auto (default), Enum: , and a data type object.

When you select an inherited option, the software behaves as follows:
• Inherit: Inherit via back propagation (default for numeric and logical labels) — Simulink automatically determines the Label data type of the block during data type propagation (see “Data Type Propagation” (Simulink)). In this case, the block uses the data type of a downstream block or signal object.
• Inherit: auto (default for nonnumeric labels) — The block uses an autodefined enumerated data type variable. For example, suppose the workspace variable name specified by Select trained machine learning model is myMdl, and the class labels are class 1 and class 2. Then, the corresponding label values are myMdl_enumLabels.class_1 and myMdl_enumLabels.class_2. The block converts the class labels to valid MATLAB identifiers by using the matlab.lang.makeValidName function.

For more information about data types, see “Control Data Types of Signals” (Simulink).

Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink).

Programmatic Use
Block Parameter: LabelDataTypeStr Type: character vector Values: "Inherit: Inherit via back propagation" | "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "Enum: " | "" Default: "Inherit: Inherit via back propagation" (for numeric and logical labels) | "Inherit: auto" (for nonnumeric labels) Label data type Minimum — Minimum value of label output for range checking [] (default) | scalar Specify the lower value of the label output range that Simulink checks.
Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Label data type Minimum parameter does not saturate or clip the actual label output signal. To do so, use the Saturation block instead. Dependencies
You can specify this parameter only if the model specified by Select trained machine learning model uses numeric labels. Programmatic Use
Block Parameter: LabelOutMin Type: character vector Values: "[]" | scalar Default: "[]" Label data type Maximum — Maximum value of label output for range checking [] (default) | scalar Specify the upper value of the label output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Label data type Maximum parameter does not saturate or clip the actual label output signal. To do so, use the Saturation block instead. Dependencies
You can specify this parameter only if the model specified by Select trained machine learning model uses numeric labels.
Programmatic Use
Block Parameter: LabelOutMax Type: character vector Values: "[]" | scalar Default: "[]" Score data type — Data type of score output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the score output. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: auto, the block uses a rule that inherits a data type. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: ScoreDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "Inherit: auto" Score data type Minimum — Minimum value of score output for range checking [] (default) | scalar Specify the lower value of the score output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Score data type Minimum parameter does not saturate or clip the actual score output. To do so, use the Saturation block instead. 35-524
Programmatic Use
Block Parameter: ScoreOutMin Type: character vector Values: "[]" | scalar Default: "[]" Score data type Maximum — Maximum value of score output for range checking [] (default) | scalar Specify the upper value of the score output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Score data type Maximum parameter does not saturate or clip the actual score output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: ScoreOutMax Type: character vector Values: "[]" | scalar Default: "[]" Raw score data type — Untransformed score data type Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the internal untransformed scores. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: auto, the block uses a rule that inherits a data type. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink).
Dependencies
You can specify this parameter only if the model specified by Select trained machine learning model uses a score transformation other than "none" (default, same as "identity"). • If the model uses no score transformations ("none" or "identity"), then you can specify the score data type by using Score data type. • If the model uses a score transformation other than "none" or "identity", then you can specify the data type of untransformed raw scores by using this parameter. To specify the data type of transformed scores, use Score data type. You can change the score transformation option by specifying the ScoreTransform name-value argument during training, or by modifying the ScoreTransform property after training. Programmatic Use
Block Parameter: RawScoreDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "Inherit: auto" Raw score data type Minimum — Minimum untransformed score for range checking [] (default) | scalar Specify the lower value of the untransformed score range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Raw score data type Minimum parameter does not saturate or clip the actual untransformed score signal. Programmatic Use
Block Parameter: RawScoreOutMin Type: character vector Values: "[]" | scalar Default: "[]" Raw score data type Maximum — Maximum untransformed score for range checking [] (default) | scalar Specify the upper value of the untransformed score range that Simulink checks.
Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Raw score data type Maximum parameter does not saturate or clip the actual untransformed score signal. Programmatic Use
Block Parameter: RawScoreOutMax Type: character vector Values: "[]" | scalar Default: "[]" Estimated cost data type — Data type of cost output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the cost output. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: auto, the block uses a rule that inherits a data type. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: CostDataTypeStr Type: character vector Values: 'Inherit: auto' | 'double' | 'single' | 'half' | 'int8' | 'uint8' | 'int16' | 'uint16' | 'int32' | 'uint32' | 'int64' | 'uint64' | 'boolean' | 'fixdt(1,16,0)' | 'fixdt(1,16,2^0,0)' | '' Default: 'Inherit: auto' Estimated cost data type Minimum — Minimum value of cost output for range checking [] (default) | scalar Specify the lower value of the cost output range checked by Simulink. Simulink uses the minimum value to perform:
• Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Estimated cost data type Minimum parameter does not saturate or clip the actual cost signal. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: CostOutMin Type: character vector Values: '[]' | scalar Default: '[]' Estimated cost data type Maximum — Maximum value of cost output for range checking [] (default) | scalar Specify the upper value of the cost output range checked by Simulink. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Estimated cost data type Maximum parameter does not saturate or clip the actual cost signal. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: CostOutMax Type: character vector Values: '[]' | scalar Default: '[]' Distance data type — Data type of distance metric
Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type of the distance metric. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: auto, the block uses a rule that inherits a data type. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Tips
The Distance data type parameter specifies the data type of the distance metric for the nearest neighbor search method. For more information, see the Distance name-value argument of the fitcknn function. Programmatic Use
Block Parameter: DistanceDataTypeStr Type: character vector Values: 'Inherit: auto' | 'double' | 'single' | 'half' | 'int8' | 'uint8' | 'int16' | 'uint16' | 'int32' | 'uint32' | 'int64' | 'uint64' | 'boolean' | 'fixdt(1,16,0)' | 'fixdt(1,16,2^0,0)' | '' Default: 'Inherit: auto' Distance data type Minimum — Minimum value of distance metric [] (default) | scalar Specify the lower value of the distance metric's internal variable range checked by Simulink. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Distance data type Minimum parameter does not saturate or clip the actual distance metric signal. Programmatic Use
Block Parameter: DistanceOutMin
Type: character vector Values: '[]' | scalar Default: '[]' Distance data type Maximum — Maximum value of distance metric [] (default) | scalar Specify the upper value of the distance metric's internal variable range checked by Simulink. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Distance data type Maximum parameter does not saturate or clip the actual distance metric signal. Programmatic Use
Block Parameter: DistanceOutMax Type: character vector Values: '[]' | scalar Default: '[]'
Block Characteristics

Data Types                  Boolean | double | enumerated | fixed point | half | integer | single
Direct Feedthrough          yes
Multidimensional Signals    no
Variable-Size Signals       no
Zero-Crossing Detection     no
Alternative Functionality You can use a MATLAB Function block with the predict object function of a nearest neighbor classification object (ClassificationKNN). For an example, see “Predict Class Labels Using MATLAB Function Block” on page 34-49.
When deciding whether to use the ClassificationKNN Predict block in the Statistics and Machine Learning Toolbox library or a MATLAB Function block with the predict function, consider the following: • If you use the Statistics and Machine Learning Toolbox library block, you can use the Fixed-Point Tool to convert a floating-point model to fixed point. • Support for variable-size arrays must be enabled for a MATLAB Function block with the predict function. • If you use a MATLAB Function block, you can use MATLAB functions for preprocessing or postprocessing before or after predictions in the same MATLAB Function block.
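For example, a minimal sketch of a MATLAB Function block body that standardizes an observation before calling predict. It assumes the model was saved with saveLearnerForCoder, and mu and sigma are hypothetical training statistics.

function label = classifyObservation(x)
%#codegen
mdl = loadLearnerForCoder('knnMdl');     % model previously saved with saveLearnerForCoder
mu = [5.8 3.0 3.8 1.2];                  % hypothetical predictor means
sigma = [0.8 0.4 1.8 0.8];               % hypothetical predictor standard deviations
xs = (x - mu)./sigma;                    % preprocessing step in the same block
label = predict(mdl,xs);
end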
Version History Introduced in R2022b
Extended Capabilities C/C++ Code Generation Generate C and C++ code using Simulink® Coder™. Fixed-Point Conversion Design and simulate fixed-point systems using Fixed-Point Designer™.
See Also

Blocks
ClassificationSVM Predict | ClassificationTree Predict | ClassificationEnsemble Predict | ClassificationNeuralNetwork Predict

Functions
predict | fitcknn

Objects
ClassificationKNN

Topics
“Predict Class Labels Using MATLAB Function Block” on page 34-49
“Predict Class Labels Using ClassificationSVM Predict Block” on page 34-121
“Predict Class Labels Using ClassificationEnsemble Predict Block” on page 34-140
“Predict Class Labels Using ClassificationTree Predict Block” on page 34-131
“Predict Class Labels Using ClassificationNeuralNetwork Predict Block” on page 34-154
ClassificationLinear class Linear model for binary classification of high-dimensional data
Description ClassificationLinear is a trained linear model object for binary classification; the linear model is a support vector machine (SVM) or logistic regression model. fitclinear fits a ClassificationLinear model by minimizing the objective function using techniques that reduce computation time for high-dimensional data sets (e.g., stochastic gradient descent). The classification loss plus the regularization term compose the objective function. Unlike other classification models, and for economical memory usage, ClassificationLinear model objects do not store the training data. However, they do store, for example, the estimated linear model coefficients, prior-class probabilities, and the regularization strength. You can use trained ClassificationLinear models to predict labels or classification scores for new data. For details, see predict.
Construction Create a ClassificationLinear object by using fitclinear.
Properties

Linear Classification Properties

Lambda — Regularization term strength
nonnegative scalar | vector of nonnegative values

Regularization term strength, specified as a nonnegative scalar or vector of nonnegative values.
Data Types: double | single

Learner — Linear classification model type
'logistic' | 'svm'

Linear classification model type, specified as 'logistic' or 'svm'.

In this table, f(x) = xβ + b.
• β is a vector of p coefficients.
• x is an observation from p predictor variables.
• b is the scalar bias.

Value         Algorithm                 Loss Function                                              FittedLoss Value
'svm'         Support vector machine    Hinge: ℓ(y, f(x)) = max(0, 1 − y f(x))                     'hinge'
'logistic'    Logistic regression       Deviance (logistic): ℓ(y, f(x)) = log(1 + exp(−y f(x)))    'logit'
Beta — Linear coefficient estimates
numeric vector

Linear coefficient estimates, specified as a numeric vector with length equal to the number of expanded predictors (see ExpandedPredictorNames).
Data Types: double

Bias — Estimated bias term
numeric scalar

Estimated bias term or model intercept, specified as a numeric scalar.
Data Types: double

FittedLoss — Loss function used to fit linear model
'hinge' | 'logit'

This property is read-only.

Loss function used to fit the linear model, specified as 'hinge' or 'logit'.

Value       Algorithm                 Loss Function                                              Learner Value
'hinge'     Support vector machine    Hinge: ℓ(y, f(x)) = max(0, 1 − y f(x))                     'svm'
'logit'     Logistic regression       Deviance (logistic): ℓ(y, f(x)) = log(1 + exp(−y f(x)))    'logistic'
Description Lasso (L1) penalty: λ Ridge (L2) penalty:
p
∑
j=1
βj
p
λ 2 βj 2 j∑ =1
λ specifies the regularization term strength (see Lambda). The software excludes the bias term (β0) from the regularization penalty. 35-533
35
Functions
Other Classification Properties
CategoricalPredictors — Categorical predictor indices vector of positive integers | [] Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]). Data Types: single | double ClassNames — Unique class labels categorical array | character array | logical vector | numeric vector | cell array of character vectors Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as the class labels Y. (The software treats string arrays as cell arrays of character vectors.) ClassNames also determines the class order. Data Types: categorical | char | logical | single | double | cell Cost — Misclassification costs square numeric matrix This property is read-only. Misclassification costs, specified as a square numeric matrix. Cost has K rows and columns, where K is the number of classes. Cost(i,j) is the cost of classifying a point into class j if its true class is i. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. Data Types: double ModelParameters — Parameters used for training model structure Parameters used for training the ClassificationLinear model, specified as a structure. Access fields of ModelParameters using dot notation. For example, access the relative tolerance on the linear coefficients and the bias term by using Mdl.ModelParameters.BetaTolerance. Data Types: struct PredictorNames — Predictor names cell array of character vectors Predictor names in order of their appearance in the predictor data, specified as a cell array of character vectors. The length of PredictorNames is equal to the number of variables in the training data X or Tbl used as predictor variables. Data Types: cell ExpandedPredictorNames — Expanded predictor names cell array of character vectors Expanded predictor names, specified as a cell array of character vectors. 35-534
ClassificationLinear class
If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames. Data Types: cell Prior — Prior class probabilities numeric vector This property is read-only. Prior class probabilities, specified as a numeric vector. Prior has as many elements as classes in ClassNames, and the order of the elements corresponds to the elements of ClassNames. Data Types: double ResponseName — Response variable name character vector Response variable name, specified as a character vector. Data Types: char ScoreTransform — Score transformation function 'doublelogit' | 'invlogit' | 'ismax' | 'logit' | 'none' | function handle | ... Score transformation function to apply to predicted scores, specified as a function name or function handle. For linear classification models and before transformation, the predicted classification score for the observation x (row vector) is f(x) = xβ + b, where β and b correspond to Mdl.Beta and Mdl.Bias, respectively. To change the score transformation function to, for example, function, use dot notation. • For a built-in function, enter this code and replace function with a value in the table. Mdl.ScoreTransform = 'function';
Value
Description
"doublelogit"
1/(1 + e–2x)
"invlogit"
log(x / (1 – x))
"ismax"
Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
"logit"
1/(1 + e–x)
"none" or "identity"
x (no transformation)
"sign"
–1 for x < 0 0 for x = 0 1 for x > 0
"symmetric"
2x – 1
"symmetricismax"
Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
35-535
35
Functions
Value
Description
"symmetriclogit"
2/(1 + e–x) – 1
• For a MATLAB function, or a function that you define, enter its function handle. Mdl.ScoreTransform = @function;
function must accept a matrix of the original scores for each class, and then return a matrix of the same size representing the transformed scores for each class. Data Types: char | function_handle
Object Functions edge incrementalLearner lime loss margin partialDependence plotPartialDependence predict shapley selectModels update
Classification edge for linear classification models Convert linear model for binary classification to incremental learner Local interpretable model-agnostic explanations (LIME) Classification loss for linear classification models Classification margins for linear classification models Compute partial dependence Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots Predict labels for linear classification models Shapley values Choose subset of regularized, binary linear classification models Update model parameters for code generation
Copy Semantics Value. To learn how value classes affect copy operations, see Copying Objects.
Examples Train Linear Classification Model Train a binary, linear classification model using support vector machines, dual SGD, and ridge regularization. Load the NLP data set. load nlpdata
X is a sparse matrix of predictor data, and Y is a categorical vector of class labels. There are more than two classes in the data. Identify the labels that correspond to the Statistics and Machine Learning Toolbox™ documentation web pages. Ystats = Y == 'stats';
Train a binary, linear classification model that can identify whether the word counts in a documentation web page are from the Statistics and Machine Learning Toolbox™ documentation. 35-536
ClassificationLinear class
Train the model using the entire data set. Determine how well the optimization algorithm fit the model to the data by extracting a fit summary. rng(1); % For reproducibility [Mdl,FitInfo] = fitclinear(X,Ystats) Mdl = ClassificationLinear ResponseName: 'Y' ClassNames: [0 1] ScoreTransform: 'none' Beta: [34023x1 double] Bias: -1.0059 Lambda: 3.1674e-05 Learner: 'svm'
FitInfo = struct with fields: Lambda: 3.1674e-05 Objective: 5.3783e-04 PassLimit: 10 NumPasses: 10 BatchLimit: [] NumIterations: 238561 GradientNorm: NaN GradientTolerance: 0 RelativeChangeInBeta: 0.0562 BetaTolerance: 1.0000e-04 DeltaGradient: 1.4582 DeltaGradientTolerance: 1 TerminationCode: 0 TerminationStatus: {'Iteration limit exceeded.'} Alpha: [31572x1 double] History: [] FitTime: 0.1588 Solver: {'dual'}
Mdl is a ClassificationLinear model. You can pass Mdl and the training or new data to loss to inspect the in-sample classification error. Or, you can pass Mdl and new predictor data to predict to predict class labels for new observations. FitInfo is a structure array containing, among other things, the termination status (TerminationStatus) and how long the solver took to fit the model to the data (FitTime). It is good practice to use FitInfo to determine whether optimization-termination measurements are satisfactory. Because training time is small, you can try to retrain the model, but increase the number of passes through the data. This can improve measures like DeltaGradient.
Predict Class Labels Using Linear Classification Model Load the NLP data set. load nlpdata n = size(X,1); % Number of observations
Identify the labels that correspond to the Statistics and Machine Learning Toolbox™ documentation web pages. Ystats = Y == 'stats';
Hold out 5% of the data. rng(1); % For reproducibility cvp = cvpartition(n,'Holdout',0.05) cvp = Hold-out cross validation partition NumObservations: 31572 NumTestSets: 1 TrainSize: 29994 TestSize: 1578 IsCustom: 0
cvp is a CVPartition object that defines the random partition of n data into training and test sets. Train a binary, linear classification model using the training set that can identify whether the word counts in a documentation web page are from the Statistics and Machine Learning Toolbox™ documentation. For faster training time, orient the predictor data matrix so that the observations are in columns. idxTrain = training(cvp); % Extract training set indices X = X'; Mdl = fitclinear(X(:,idxTrain),Ystats(idxTrain),'ObservationsIn','columns');
Predict class labels for the held-out sample, and estimate the out-of-sample classification error.
idxTest = test(cvp); % Extract test set indices
labels = predict(Mdl,X(:,idxTest),'ObservationsIn','columns');
L = loss(Mdl,X(:,idxTest),Ystats(idxTest),'ObservationsIn','columns')
L = 7.1753e-04
Mdl misclassifies fewer than 1% of the out-of-sample observations.
Version History
Introduced in R2016a
R2022a: Cost property stores the user-specified cost matrix
Behavior changed in R2022a
Starting in R2022a, the Cost property stores the user-specified cost matrix, so that you can compute the observed misclassification cost using the specified cost value. The software stores normalized prior probabilities (Prior) that do not reflect the penalties described in the cost matrix. To compute the observed misclassification cost, specify the LossFun name-value argument as "classifcost" when you call the loss function. Note that model training has not changed and, therefore, the decision boundaries between classes have not changed. For training, the fitting function updates the specified prior probabilities by incorporating the penalties described in the specified cost matrix, and then normalizes the prior probabilities and
observation weights. This behavior has not changed. In previous releases, the software stored the default cost matrix in the Cost property and stored the prior probabilities used for training in the Prior property. Starting in R2022a, the software stores the user-specified cost matrix without modification, and stores normalized prior probabilities that do not reflect the cost penalties. For more details, see “Misclassification Cost Matrix, Prior Probabilities, and Observation Weights” on page 198. Some object functions use the Cost and Prior properties: • The loss function uses the cost matrix stored in the Cost property if you specify the LossFun name-value argument as "classifcost" or "mincost". • The loss and edge functions use the prior probabilities stored in the Prior property to normalize the observation weights of the input data. If you specify a nondefault cost matrix when you train a classification model, the object functions return a different value compared to previous releases. If you want the software to handle the cost matrix, prior probabilities, and observation weights in the same way as in previous releases, adjust the prior probabilities and observation weights for the nondefault cost matrix, as described in “Adjust Prior Probabilities and Observation Weights for Misclassification Cost Matrix” on page 19-9. Then, when you train a classification model, specify the adjusted prior probabilities and observation weights by using the Prior and Weights name-value arguments, respectively, and use the default cost matrix.
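For example, assuming Mdl is a linear classification model trained with a nondefault cost matrix, and X and Y contain predictor data and labels, a minimal sketch of computing the observed misclassification cost is:
L = loss(Mdl,X,Y,'LossFun','classifcost');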
Extended Capabilities
C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
Usage notes and limitations:
• The predict and update functions support code generation.
• When you train a linear classification model by using fitclinear, the following restrictions apply.
• If the predictor data input argument value is a matrix, it must be a full, numeric matrix. Code generation does not support sparse data.
• You can specify only one regularization strength—'auto' or a nonnegative scalar for the 'Lambda' name-value pair argument.
• The value of the 'ScoreTransform' name-value pair argument cannot be an anonymous function.
• For code generation with a coder configurer, the following additional restrictions apply.
• Categorical predictors (logical, categorical, char, string, or cell) are not supported. You cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using dummyvar before fitting the model.
• Class labels with the categorical data type are not supported. Both the class label value in the training data (Tbl or Y) and the value of the ClassNames name-value argument cannot be an array with the categorical data type.
For more information, see “Introduction to Code Generation” on page 34-3.
See Also predict | fitclinear | ClassificationPartitionedLinear | ClassificationECOC | ClassificationPartitionedLinearECOC | ClassificationKernel
ClassificationLinear Predict Classify observations using linear classification model Libraries: Statistics and Machine Learning Toolbox / Classification
Description The ClassificationLinear Predict block classifies observations using a linear classification object (ClassificationLinear) for binary classification. Import a trained classification object into the block by specifying the name of a workspace variable that contains the object. The input port x receives an observation (predictor data), and the output port label returns predicted class labels for the observation. You can add the optional output port score, which returns predicted class scores or posterior probabilities.
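For reference, a minimal sketch of preparing such a workspace variable follows. The variable name linearMdl matches the block default, and the ionosphere data set is used only for illustration.
load ionosphere                 % numeric predictors X, binary labels Y ('b' or 'g')
linearMdl = fitclinear(X,Y);    % binary linear classification model
% If the trained model stores several Lambda values, keep a single regularization
% strength before using it with the block, for example:
% linearMdl = selectModels(linearMdl,1);
Then, in the Block Parameters dialog box, set Select trained machine learning model to linearMdl.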
Ports Input x — Predictor data row vector | column vector Predictor data, specified as a row or column vector of one observation. The variables in x must have the same order as the predictor variables that trained the model specified by Select trained machine learning model. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point Output label — Predicted class label scalar Predicted class label, returned as a scalar. label is the class yielding the highest score. For more details, see the Label argument of the predict object function. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point | enumerated score — Predicted class scores or posterior probabilities row vector Predicted class scores or posterior probabilities, returned as a 1-by-2 row vector. If the model was trained using a logistic learner, the classification scores are posterior probabilities. The classification score score(i) represents the posterior probability that the observation in x belongs to class i. 35-541
To check the order of the classes, use the ClassNames property of the linear model specified by Select trained machine learning model. Dependencies
To enable this port, select the check box for Add output port for predicted class scores on the Main tab of the Block Parameters dialog box. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point
Parameters Main Select trained machine learning model — Linear classification model linearMdl (default) | ClassificationLinear object Specify the name of a workspace variable that contains a ClassificationLinear object. When you train the model by using fitclinear, the following restrictions apply: • The predictor data cannot include categorical predictors (logical, categorical, char, string, or cell). If you supply training data in a table, the predictors must be numeric (double or single). Also, you cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using dummyvar before fitting the model. • The Lambda property (regularization term strength) of the trained model must be a numeric scalar. If Lambda is a numeric vector, you must select the model corresponding to one regularization strength by using selectModels. • The value of the ScoreTransform name-value argument cannot be "invlogit" or an anonymous function. Programmatic Use
Block Parameter: TrainedLearner Type: workspace variable Values: ClassificationLinear object Default: 'linearMdl' Add output port for predicted class scores — Add second output port for predicted class scores off (default) | on Select the check box to include the output port score in the ClassificationLinear Predict block. Programmatic Use
Block Parameter: ShowOutputScore Type: character vector Values: 'off' | 'on' Default: 'off'
Data Types Fixed-Point Operational Parameters
Integer rounding mode — Rounding mode for fixed-point operations Floor (default) | Ceiling | Convergent | Nearest | Round | Simplest | Zero Specify the rounding mode for fixed-point operations. For more information, see “Rounding” (FixedPoint Designer). Block parameters always round to the nearest representable value. To control the rounding of a block parameter, enter an expression into the mask field using a MATLAB rounding function. Programmatic Use
Block Parameter: RndMeth
Type: character vector
Values: "Ceiling" | "Convergent" | "Floor" | "Nearest" | "Round" | "Simplest" | "Zero"
Default: "Floor"
Saturate on integer overflow — Method of overflow action
off (default) | on
Specify whether overflows saturate or wrap.
Action: Select this check box (on).
Rationale: Your model has possible overflow, and you want explicit saturation protection in the generated code.
Impact on Overflows: Overflows saturate to either the minimum or maximum value that the data type can represent.
Example: The maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box selected, the block output saturates at 127. Similarly, the block output saturates at a minimum output value of –128.
Action: Clear this check box (off).
Rationale: You want to optimize the efficiency of your generated code. You want to avoid overspecifying how a block handles out-of-range signals. For more information, see “Troubleshoot Signal Range Errors” (Simulink).
Impact on Overflows: Overflows wrap to the appropriate value that the data type can represent.
Example: The maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box cleared, the software interprets the value causing the overflow as int8, which can produce an unintended result. For example, a block result of 130 (binary 1000 0010) expressed as int8 is –126.
Programmatic Use
Block Parameter: SaturateOnIntegerOverflow Type: character vector Values: "off" | "on" Default: "off" Lock output data type setting against changes by the fixed-point tools — Prevention of fixedpoint tools from overriding data type off (default) | on Select this parameter to prevent the fixed-point tools from overriding the data type you specify for the block. For more information, see “Use Lock Output Data Type Setting” (Fixed-Point Designer). Programmatic Use
Block Parameter: LockScale Type: character vector Values: "off" | "on" Default: "off" Data Type
Label data type — Data type of label output Inherit: Inherit via back propagation | Inherit: auto | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Enum: | Specify the data type for the label output. The type can be inherited, specified as an enumerated data type, or expressed as a data type object such as Simulink.NumericType. The supported data types depend on the labels used in the model specified by Select trained machine learning model. • If the model uses numeric or logical labels, the supported data types are Inherit: Inherit via back propagation (default), double, single, half, int8, uint8, int16, uint16, int32, uint32, int64, uint64, boolean, fixed point, and a data type object. 35-544
• If the model uses nonnumeric labels, the supported data types are Inherit: auto (default), Enum: , and a data type object. When you select an inherited option, the software behaves as follows: • Inherit: Inherit via back propagation (default for numeric and logical labels) — Simulink automatically determines the Label data type of the block during data type propagation (see “Data Type Propagation” (Simulink)). In this case, the block uses the data type of a downstream block or signal object. • Inherit: auto (default for nonnumeric labels) — The block uses an autodefined enumerated data type variable. For example, suppose the workspace variable name specified by Select trained machine learning model is myMdl, and the class labels are class 1 and class 2. Then, the corresponding label values are myMdl_enumLabels.class_1 and myMdl_enumLabels.class_2. The block converts the class labels to valid MATLAB identifiers by using the matlab.lang.makeValidName function. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: LabelDataTypeStr Type: character vector Values: "Inherit: Inherit via back propagation" | "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "Enum: " | "" Default: "Inherit: Inherit via back propagation" (for numeric and logical labels) | "Inherit: auto" (for nonnumeric labels) Label data type Minimum — Minimum value of label output for range checking [] (default) | scalar Specify the lower value of the label output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Label data type Minimum parameter does not saturate or clip the actual label output signal. To do so, use the Saturation block instead. 35-545
Dependencies
You can specify this parameter only if the model specified by Select trained machine learning model uses numeric labels. Programmatic Use
Block Parameter: LabelOutMin Type: character vector Values: "[]" | scalar Default: "[]" Label data type Maximum — Maximum value of label output for range checking [] (default) | scalar Specify the upper value of the label output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Label data type Maximum parameter does not saturate or clip the actual label output signal. To do so, use the Saturation block instead. Dependencies
You can specify this parameter only if the model specified by Select trained machine learning model uses numeric labels. Programmatic Use
Block Parameter: LabelOutMax Type: character vector Values: "[]" | scalar Default: "[]" Score data type — Data type of score output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the score output. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: auto, the block uses a rule that inherits a data type. 35-546
For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: ScoreDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "Inherit: auto" Score data type Minimum — Minimum value of score output for range checking [] (default) | scalar Specify the lower value of the score output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Score data type Minimum parameter does not saturate or clip the actual score output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: ScoreOutMin Type: character vector Values: "[]" | scalar Default: "[]" Score data type Maximum — Maximum value of score output for range checking [] (default) | scalar Specify the upper value of the score output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). 35-547
• Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Score data type Maximum parameter does not saturate or clip the actual score output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: ScoreOutMax Type: character vector Values: "[]" | scalar Default: "[]" Raw score data type — Untransformed score data type Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the internal untransformed scores. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: auto, the block uses a rule that inherits a data type. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Dependencies
You can specify this parameter only if the model specified by Select trained machine learning model uses a score transformation other than "none" (default, same as "identity"). • If the model uses no score transformations ("none" or "identity"), then you can specify the score data type by using Score data type. • If the model uses a score transformation other than "none" or "identity", then you can specify the data type of untransformed raw scores by using this parameter. To specify the data type of transformed scores, use Score data type. You can change the score transformation option by specifying the ScoreTransform name-value argument during training, or by modifying the ScoreTransform property after training. Programmatic Use
Block Parameter: RawScoreDataTypeStr Type: character vector
Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "Inherit: auto" Raw score data type Minimum — Minimum untransformed score for range checking [] (default) | scalar Specify the lower value of the untransformed score range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Raw score data type Minimum parameter does not saturate or clip the actual untransformed score signal. Programmatic Use
Block Parameter: RawScoreOutMin Type: character vector Values: "[]" | scalar Default: "[]" Raw score data type Maximum — Maximum untransformed score for range checking [] (default) | scalar Specify the upper value of the untransformed score range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Raw score data type Maximum parameter does not saturate or clip the actual untransformed score signal. 35-549
Programmatic Use
Block Parameter: RawScoreOutMax Type: character vector Values: "[]" | scalar Default: "[]" Inner product data type — Inner product data type Inherit: Inherit via internal rule (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the inner product term of the classification score on page 35-552. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: Inherit via internal rule, the block uses an internal rule to determine the inner product data type. The internal rule chooses a data type that optimizes numerical accuracy, performance, and generated code size, while taking into account the properties of the embedded target hardware. The software cannot always optimize efficiency and numerical accuracy at the same time. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: InnerProductDataTypeStr Type: character vector Values: 'Inherit: Inherit via internal rule' | 'double' | 'single' | 'half' | 'int8' | 'uint8' | 'int16' | 'uint16' | 'int32' | 'uint32' | 'int64' | 'uint64' | 'boolean' | 'fixdt(1,16,0)' | 'fixdt(1,16,2^0,0)' | '' Default: 'Inherit: Inherit via internal rule' Inner product data type Minimum — Minimum of inner product term for range checking [] (default) | scalar Specify the lower value of the inner product term range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). 35-550
Note The Inner product data type Minimum parameter does not saturate or clip the actual inner product term value. Programmatic Use
Block Parameter: InnerProductOutMin Type: character vector Values: "[]" | scalar Default: "[]" Inner product data type Maximum — Maximum of inner product term for range checking [] (default) | scalar Specify the upper value of the inner product term range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Inner product data type Maximum parameter does not saturate or clip the actual inner product term value. Programmatic Use
Block Parameter: InnerProductOutMax Type: character vector Values: "[]" | scalar Default: "[]"
Block Characteristics
Data Types: Boolean | double | enumerated | fixed point | half | integer | single
Direct Feedthrough: yes
Multidimensional Signals: no
Variable-Size Signals: no
Zero-Crossing Detection: no
More About
Classification Score
For linear classification models, the raw classification score for classifying the observation x into the positive class is defined by
f(x) = xβ + b
where β is the estimated column vector of coefficients, and b is the estimated scalar bias. The linear classification model object specified by Select trained machine learning model contains the coefficients and bias in the Beta and Bias properties, respectively. The raw classification score for classifying x into the negative class is –f(x). The software classifies observations into the class that yields the positive score.
If the linear classification model uses no score transformations, then the raw classification score is the same as the classification score. If the model consists of logistic regression learners, then the software applies the 'logit' score transformation to the raw classification scores.
You can specify the data types for the components required to compute classification scores using Score data type, Raw score data type, and Inner product data type.
• Score data type determines the data type of the classification score.
• Raw score data type determines the data type of the raw classification score f if the model uses a score transformation other than 'none' or 'identity'.
• Inner product data type determines the data type of xβ.
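As an illustration, the following minimal sketch assumes Mdl is a trained ClassificationLinear model with a single regularization strength and a 'none' score transformation, and that the predictor data X has observations in rows. It computes the raw score from the Beta and Bias properties and compares it with the score that predict returns for the positive class (the second column of the score output).
f = X*Mdl.Beta + Mdl.Bias;   % raw classification score f(x) = x*beta + b
[~,score] = predict(Mdl,X);  % second column corresponds to the positive class
max(abs(f - score(:,2)))     % expected to be zero up to round-off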
Alternative Functionality You can use a MATLAB Function block with the predict object function of a linear classification object (ClassificationLinear). For an example, see “Predict Class Labels Using MATLAB Function Block” on page 34-49. When deciding whether to use the ClassificationLinear Predict block in the Statistics and Machine Learning Toolbox library or a MATLAB Function block with the predict function, consider the following: • If you use the Statistics and Machine Learning Toolbox library block, you can use the Fixed-Point Tool to convert a floating-point model to fixed point. • Support for variable-size arrays must be enabled for a MATLAB Function block with the predict function. • If you use a MATLAB Function block, you can use MATLAB functions for preprocessing or postprocessing before or after predictions in the same MATLAB Function block.
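For reference, a minimal sketch of the MATLAB Function block approach follows; the function and file names are illustrative. First save the trained model at the MATLAB command line, for example with saveLearnerForCoder(linearMdl,'linearMdlFile'), and then load it inside the block:
function label = predictLinearLabel(x) %#codegen
% Body of a MATLAB Function block that classifies one observation x.
persistent mdl
if isempty(mdl)
    mdl = loadLearnerForCoder('linearMdlFile'); % load the saved ClassificationLinear model
end
label = predict(mdl,x);
end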
Version History Introduced in R2023a
Extended Capabilities C/C++ Code Generation Generate C and C++ code using Simulink® Coder™. Fixed-Point Conversion Design and simulate fixed-point systems using Fixed-Point Designer™.
See Also Blocks ClassificationECOC Predict | ClassificationSVM Predict | RegressionLinear Predict | IncrementalClassificationLinear Predict | IncrementalClassificationLinear Fit Objects ClassificationLinear Functions predict | fitclinear Topics “Predict Class Labels Using ClassificationSVM Predict Block” on page 34-121 “Predict Class Labels Using ClassificationTree Predict Block” on page 34-131 “Predict Class Labels Using ClassificationECOC Predict Block” on page 34-182 “Predict Class Labels Using MATLAB Function Block” on page 34-49
ClassificationLinearCoderConfigurer Coder configurer for linear binary classification of high-dimensional data
Description A ClassificationLinearCoderConfigurer object is a coder configurer of a linear classification model (ClassificationLinear) used for binary classification of high-dimensional data. A coder configurer offers convenient features to configure code generation options, generate C/C++ code, and update model parameters in the generated code. • Configure code generation options and specify the coder attributes of linear model parameters by using object properties. • Generate C/C++ code for the predict and update functions of the linear classification model by using generateCode. Generating C/C++ code requires MATLAB Coder. • Update model parameters in the generated C/C++ code without having to regenerate the code. This feature reduces the effort required to regenerate, redeploy, and reverify C/C++ code when you retrain the linear model with new data or settings. Before updating model parameters, use validatedUpdateInputs to validate and extract the model parameters to update. This flow chart shows the code generation workflow using a coder configurer.
For the code generation usage notes and limitations of a linear classification model, see the Code Generation sections of ClassificationLinear, predict, and update.
Creation
After training a linear classification model by using fitclinear, create a coder configurer for the model by using learnerCoderConfigurer. Use the properties of a coder configurer to specify the coder attributes of the predict and update arguments. Then, use generateCode to generate C/C++ code based on the specified coder attributes.
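As a brief sketch, assuming Mdl is a ClassificationLinear model returned by fitclinear and X is the predictor data used for training, the basic sequence is:
configurer = learnerCoderConfigurer(Mdl,X); % create the coder configurer
generateCode(configurer)                    % generate C/C++ code for predict and update
% After retraining the model, validate and extract the new parameters, and update
% the generated code without regenerating it:
% params = validatedUpdateInputs(configurer,retrainedMdl);
% ClassificationLinearModel('update',params)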
Properties
predict Arguments
The properties listed in this section specify the coder attributes of the predict function arguments in the generated code.
X — Coder attributes of predictor data LearnerCoderInput object Coder attributes of the predictor data to pass to the generated C/C++ code for the predict function of the linear classification model, specified as a LearnerCoderInput on page 35-565 object. When you create a coder configurer by using the learnerCoderConfigurer function, the input argument X determines the default values of the LearnerCoderInput coder attributes: • SizeVector — The default value is the array size of the input X. • If the Value attribute of the ObservationsIn property for the ClassificationLinearCoderConfigurer is 'rows', then this SizeVector value is [n p], where n corresponds to the number of observations and p corresponds to the number of predictors. • If the Value attribute of the ObservationsIn property for the ClassificationLinearCoderConfigurer is 'columns', then this SizeVector value is [p n]. To switch the elements of SizeVector (for example, to change [n p] to [p n]), modify the Value attribute of the ObservationsIn property for the ClassificationLinearCoderConfigurer accordingly. You cannot modify the SizeVector value directly. • VariableDimensions — The default value is [0 0], which indicates that the array size is fixed as specified in SizeVector. You can set this value to [1 0] if the SizeVector value is [n p] or to [0 1] if it is [p n], which indicates that the array has variable-size rows and fixed-size columns. For example, [1 0] specifies that the first value of SizeVector (n) is the upper bound for the number of rows, and the second value of SizeVector (p) is the number of columns. • DataType — This value is single or double. The default data type depends on the data type of the input X. • Tunability — This value must be true, meaning that predict in the generated C/C++ code always includes predictor data as an input. You can modify the coder attributes by using dot notation. For example, to generate C/C++ code that accepts predictor data with 100 observations (in rows) of three predictor variables (in columns), specify these coder attributes of X for the coder configurer configurer: configurer.X.SizeVector = [100 3]; configurer.X.DataType = 'double'; configurer.X.VariableDimensions = [0 0];
[0 0] indicates that the first and second dimensions of X (number of observations and number of predictor variables, respectively) have fixed sizes. To allow the generated C/C++ code to accept predictor data with up to 100 observations, specify these coder attributes of X: configurer.X.SizeVector = [100 3]; configurer.X.DataType = 'double'; configurer.X.VariableDimensions = [1 0];
[1 0] indicates that the first dimension of X (number of observations) has a variable size and the second dimension of X (number of predictor variables) has a fixed size. The specified number of
observations, 100 in this example, becomes the maximum allowed number of observations in the generated C/C++ code. To allow any number of observations, specify the bound as Inf. ObservationsIn — Coder attributes of predictor data observation dimension EnumeratedInput object Coder attributes of the predictor data observation dimension ('ObservationsIn' name-value pair argument of predict), specified as an EnumeratedInput on page 35-566 object. When you create a coder configurer by using the learnerCoderConfigurer function, the 'ObservationsIn' name-value pair argument determines the default values of the EnumeratedInput coder attributes: • Value — The default value is the predictor data observation dimension you use when creating the coder configurer, specified as 'rows' or 'columns'. If you do not specify 'ObservationsIn' when creating the coder configurer, the default value is 'rows'. • SelectedOption — This value is always 'Built-in'. This attribute is read-only. • BuiltInOptions — Cell array of 'rows' and 'columns'. This attribute is read-only. • IsConstant — This value must be true. • Tunability — The default value is false if you specify 'ObservationsIn','rows' when creating the coder configurer, and true if you specify 'ObservationsIn','columns'. If you set Tunability to false, the software sets Value to 'rows'. If you specify other attribute values when Tunability is false, the software sets Tunability to true. NumOutputs — Number of outputs in predict 1 (default) | 2 Number of output arguments to return from the generated C/C++ code for the predict function of the linear classification model, specified as 1 or 2. The output arguments of predict are Label (predicted class labels) and Score (classification scores), in that order. predict in the generated C/C++ code returns the first n outputs of the predict function, where n is the NumOutputs value. After creating the coder configurer configurer, you can specify the number of outputs by using dot notation. configurer.NumOutputs = 2;
The NumOutputs property is equivalent to the '-nargout' compiler option of codegen. This option specifies the number of output arguments in the entry-point function of code generation. The object function generateCode generates two entry-point functions—predict.m and update.m for the predict and update functions of a linear classification model, respectively—and generates C/C++ code for the two entry-point functions. The specified value for the NumOutputs property corresponds to the number of output arguments in the entry-point function predict.m.
Data Types: double
update Arguments
The properties listed in this section specify the coder attributes of the update function arguments in the generated code. The update function takes a trained model and new model parameters as input arguments, and returns an updated version of the model that contains the new parameters. To enable updating the parameters in the generated code, you need to specify the coder attributes of the
parameters before generating code. Use a LearnerCoderInput on page 35-565 object to specify the coder attributes of each parameter. The default attribute values are based on the model parameters in the input argument Mdl of learnerCoderConfigurer. Beta — Coder attributes of linear predictor coefficients LearnerCoderInput object Coder attributes of the linear predictor coefficients (Beta of a linear classification model), specified as a LearnerCoderInput on page 35-565 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer: • SizeVector — This value must be [p 1], where p is the number of predictors in Mdl. • VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector. • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl. • Tunability — This value must be true. Bias — Coder attributes of bias term LearnerCoderInput object Coder attributes of the bias term (Bias of a linear classification model), specified as a LearnerCoderInput on page 35-565 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer: • SizeVector — This value must be [1 1]. • VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector. • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl. • Tunability — This value must be true. Cost — Coder attributes of misclassification cost LearnerCoderInput object Coder attributes of the misclassification cost (Cost of a linear classification model), specified as a LearnerCoderInput on page 35-565 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer: • SizeVector — This value must be [2 2]. • VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector. • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl. • Tunability — The default value is true. 35-557
Prior — Coder attributes of prior probabilities LearnerCoderInput object Coder attributes of the prior probabilities (Prior of a linear classification model), specified as a LearnerCoderInput on page 35-565 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer: • SizeVector — This value must be [1 2]. • VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector. • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl. • Tunability — The default value is true. Other Configurer Options OutputFileName — File name of generated C/C++ code 'ClassificationLinearModel' (default) | character vector File name of the generated C/C++ code, specified as a character vector. The object function generateCode of ClassificationLinearCoderConfigurer generates C/C+ + code using this file name. The file name must not contain spaces because they can lead to code generation failures in certain operating system configurations. Also, the name must be a valid MATLAB function name. After creating the coder configurer configurer, you can specify the file name by using dot notation. configurer.OutputFileName = 'myModel';
Data Types: char Verbose — Verbosity level true (logical 1) (default) | false (logical 0) Verbosity level, specified as true (logical 1) or false (logical 0). The verbosity level controls the display of notification messages at the command line. Value
Description
true (logical 1)
The software displays notification messages when your changes to the coder attributes of a parameter result in changes for other dependent parameters.
false (logical 0)
The software does not display notification messages.
To enable updating machine learning model parameters in the generated code, you need to configure the coder attributes of the parameters before generating code. The coder attributes of parameters are dependent on each other, so the software stores the dependencies as configuration constraints. If you modify the coder attributes of a parameter by using a coder configurer, and the modification requires subsequent changes to other dependent parameters to satisfy configuration constraints, 35-558
ClassificationLinearCoderConfigurer
then the software changes the coder attributes of the dependent parameters. The verbosity level determines whether or not the software displays notification messages for these subsequent changes. After creating the coder configurer configurer, you can modify the verbosity level by using dot notation. configurer.Verbose = false;
Data Types: logical Options for Code Generation Customization To customize the code generation workflow, use the generateFiles function and the following three properties with codegen, instead of using the generateCode function. After generating the two entry-point function files (predict.m and update.m) by using the generateFiles function, you can modify these files according to your code generation workflow. For example, you can modify the predict.m file to include data preprocessing, or you can add these entry-point functions to another code generation project. Then, you can generate C/C++ code by using the codegen function and the codegen arguments appropriate for the modified entry-point functions or code generation project. Use the three properties described in this section as a starting point to set the codegen arguments. CodeGenerationArguments — codegen arguments cell array This property is read-only. codegen arguments, specified as a cell array. This property enables you to customize the code generation workflow. Use the generateCode function if you do not need to customize your workflow. Instead of using generateCode with the coder configurer configurer, you can generate C/C++ code as follows: generateFiles(configurer) cgArgs = configurer.CodeGenerationArguments; codegen(cgArgs{:})
If you customize the code generation workflow, modify cgArgs accordingly before calling codegen. If you modify other properties of configurer, the software updates the CodeGenerationArguments property accordingly. Data Types: cell PredictInputs — List of tunable input arguments of predict cell array This property is read-only. List of tunable input arguments of the entry-point function predict.m for code generation, specified as a cell array. The cell array contains another cell array that includes coder.PrimitiveType objects and coder.Constant objects. 35-559
35
Functions
If you modify the coder attributes of predict arguments on page 35-554, then the software updates the corresponding objects accordingly. If you specify the Tunability attribute as false, then the software removes the corresponding objects from the PredictInputs list. The cell array in PredictInputs is equivalent to configurer.CodeGenerationArguments{6} for the coder configurer configurer. Data Types: cell UpdateInputs — List of tunable input arguments of update cell array of a structure including coder.PrimitiveType objects This property is read-only. List of the tunable input arguments of the entry-point function update.m for code generation, specified as a cell array of a structure including coder.PrimitiveType objects. Each coder.PrimitiveType object includes the coder attributes of a tunable machine learning model parameter. If you modify the coder attributes of a model parameter by using the coder configurer properties (update Arguments on page 35-556 properties), then the software updates the corresponding coder.PrimitiveType object accordingly. If you specify the Tunability attribute of a machine learning model parameter as false, then the software removes the corresponding coder.PrimitiveType object from the UpdateInputs list. The structure in UpdateInputs is equivalent to configurer.CodeGenerationArguments{3} for the coder configurer configurer. Data Types: cell
Object Functions generateCode generateFiles validatedUpdateInputs
Generate C/C++ code using coder configurer Generate MATLAB files for code generation using coder configurer Validate and extract machine learning model parameters to update
Examples Generate Code Using Coder Configurer Train a machine learning model, and then generate code for the predict and update functions of the model by using a coder configurer. Load the ionosphere data set, and train a binary linear classification model. Pass the transposed predictor matrix Xnew to fitclinear, and use the 'ObservationsIn' name-value pair argument to specify that the columns of Xnew correspond to observations. load ionosphere Xnew = X'; Mdl = fitclinear(Xnew,Y,'ObservationsIn','columns');
Mdl is a ClassificationLinear object. Create a coder configurer for the ClassificationLinear model by using learnerCoderConfigurer. Specify the predictor data Xnew, and use the 'ObservationsIn'
name-value pair argument to specify the observation dimension of Xnew. The learnerCoderConfigurer function uses these input arguments to configure the coder attributes of the corresponding input arguments of predict. configurer = learnerCoderConfigurer(Mdl,Xnew,'ObservationsIn','columns') configurer = ClassificationLinearCoderConfigurer with properties: Update Inputs: Beta: Bias: Prior: Cost:
[1x1 [1x1 [1x1 [1x1
LearnerCoderInput] LearnerCoderInput] LearnerCoderInput] LearnerCoderInput]
Predict Inputs: X: [1x1 LearnerCoderInput] ObservationsIn: [1x1 EnumeratedInput] Code Generation Parameters: NumOutputs: 1 OutputFileName: 'ClassificationLinearModel'
configurer is a ClassificationLinearCoderConfigurer object, which is a coder configurer of a ClassificationLinear object. To generate C/C++ code, you must have access to a C/C++ compiler that is configured properly. MATLAB Coder locates and uses a supported, installed compiler. You can use mex -setup to view and change the default compiler. For more details, see “Change Default Compiler”. Generate code for the predict and update functions of the linear classification model (Mdl). generateCode(configurer) generateCode creates these files in output folder: 'initialize.m', 'predict.m', 'update.m', 'ClassificationLinearModel.mat' Code generation successful.
The generateCode function completes these actions: • Generate the MATLAB files required to generate code, including the two entry-point functions predict.m and update.m for the predict and update functions of Mdl, respectively. • Create a MEX function named ClassificationLinearModel for the two entry-point functions. • Create the code for the MEX function in the codegen\mex\ClassificationLinearModel folder. • Copy the MEX function to the current folder. Display the contents of the predict.m, update.m, and initialize.m files by using the type function. type predict.m function varargout = predict(X,varargin) %#codegen % Autogenerated by MATLAB, 20-Aug-2023 10:45:41
[varargout{1:nargout}] = initialize('predict',X,varargin{:}); end type update.m function update(varargin) %#codegen % Autogenerated by MATLAB, 20-Aug-2023 10:45:41 initialize('update',varargin{:}); end type initialize.m function [varargout] = initialize(command,varargin) %#codegen % Autogenerated by MATLAB, 20-Aug-2023 10:45:41 coder.inline('always') persistent model if isempty(model) model = loadLearnerForCoder('ClassificationLinearModel.mat'); end switch(command) case 'update' % Update struct fields: Beta % Bias % Prior % Cost model = update(model,varargin{:}); case 'predict' % Predict Inputs: X, ObservationsIn X = varargin{1}; if nargin == 2 [varargout{1:nargout}] = predict(model,X); else PVPairs = cell(1,nargin-2); for i = 1:nargin-2 PVPairs{1,i} = varargin{i+1}; end [varargout{1:nargout}] = predict(model,X,PVPairs{:}); end end end
Update Parameters of Linear Classification Model in Generated Code Train a linear classification model using a partial data set and create a coder configurer for the model. Use the properties of the coder configurer to specify coder attributes of the linear model parameters. Use the object function of the coder configurer to generate C code that predicts labels for new predictor data. Then retrain the model using the entire data set, and update parameters in the generated code without regenerating the code. Train Model Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g'). Train a binary linear classification model using half of the observations. Transpose the predictor data, and use the 'ObservationsIn' name-value pair argument to specify that the columns of XTrain correspond to observations. 35-562
load ionosphere rng('default') % For reproducibility n = length(Y); c = cvpartition(Y,'HoldOut',0.5); idxTrain = training(c,1); XTrain = X(idxTrain,:)'; YTrain = Y(idxTrain); Mdl = fitclinear(XTrain,YTrain,'ObservationsIn','columns');
Mdl is a ClassificationLinear object. Create Coder Configurer Create a coder configurer for the ClassificationLinear model by using learnerCoderConfigurer. Specify the predictor data XTrain, and use the 'ObservationsIn' name-value pair argument to specify the observation dimension of XTrain. The learnerCoderConfigurer function uses these input arguments to configure the coder attributes of the corresponding input arguments of predict. Also, set the number of outputs to 2 so that the generated code returns predicted labels and scores. configurer = learnerCoderConfigurer(Mdl,XTrain,'ObservationsIn','columns','NumOutputs',2);
configurer is a ClassificationLinearCoderConfigurer object, which is a coder configurer of a ClassificationLinear object. Specify Coder Attributes of Parameters Specify the coder attributes of the linear classification model parameters so that you can update the parameters in the generated code after retraining the model. This example specifies the coder attributes of the predictor data that you want to pass to the generated code. Specify the coder attributes of the X property of configurer so that the generated code accepts any number of observations. Modify the SizeVector and VariableDimensions attributes. The SizeVector attribute specifies the upper bound of the predictor data size, and the VariableDimensions attribute specifies whether each dimension of the predictor data has a variable size or fixed size. configurer.X.SizeVector = [34 Inf]; configurer.X.VariableDimensions ans = 1x2 logical array 0
1
The size of the first dimension is the number of predictor variables. This value must be fixed for a machine learning model. Because the predictor data contains 34 predictors, the value of the SizeVector attribute must be 34 and the value of the VariableDimensions attribute must be 0. The size of the second dimension is the number of observations. Setting the value of the SizeVector attribute to Inf causes the software to change the value of the VariableDimensions attribute to 1. In other words, the upper bound of the size is Inf and the size is variable, meaning that the predictor data can have any number of observations. This specification is convenient if you do not know the number of observations when generating code. 35-563
35
Functions
The order of the dimensions in SizeVector and VariableDimensions depends on the coder attributes of ObservationsIn. configurer.ObservationsIn ans = EnumeratedInput with properties: Value: SelectedOption: BuiltInOptions: IsConstant: Tunability:
'columns' 'Built-in' {'rows' 'columns'} 1 1
When the Value attribute of the ObservationsIn property is 'columns', the first dimension of the SizeVector and VariableDimensions attributes of X corresponds to the number of predictors, and the second dimension corresponds to the number of observations. When the Value attribute of ObservationsIn is 'rows', the order of the dimensions is switched. Generate Code To generate C/C++ code, you must have access to a C/C++ compiler that is configured properly. MATLAB Coder locates and uses a supported, installed compiler. You can use mex -setup to view and change the default compiler. For more details, see “Change Default Compiler”. Generate code for the predict and update functions of the linear classification model (Mdl). generateCode(configurer) generateCode creates these files in output folder: 'initialize.m', 'predict.m', 'update.m', 'ClassificationLinearModel.mat' Code generation successful.
The generateCode function completes these actions: • Generate the MATLAB files required to generate code, including the two entry-point functions predict.m and update.m for the predict and update functions of Mdl, respectively. • Create a MEX function named ClassificationLinearModel for the two entry-point functions. • Create the code for the MEX function in the codegen\mex\ClassificationLinearModel folder. • Copy the MEX function to the current folder. Verify Generated Code Pass some predictor data to verify whether the predict function of Mdl and the predict function in the MEX function return the same labels. To call an entry-point function in a MEX function that has more than one entry point, specify the function name as the first input argument. [label,score] = predict(Mdl,XTrain,'ObservationsIn','columns'); [label_mex,score_mex] = ClassificationLinearModel('predict',XTrain,'ObservationsIn','columns');
Compare label and label_mex by using isequal. isequal(label,label_mex)
35-564
ClassificationLinearCoderConfigurer
ans = logical 1
isequal returns logical 1 (true) if all the inputs are equal. The comparison confirms that the predict function of Mdl and the predict function in the MEX function return the same labels. Compare score and score_mex. max(abs(score-score_mex),[],'all') ans = 0
In general, score_mex might include round-off differences compared to score. In this case, the comparison confirms that score and score_mex are equal. Retrain Model and Update Parameters in Generated Code Retrain the model using the entire data set. retrainedMdl = fitclinear(X',Y,'ObservationsIn','columns');
Extract parameters to update by using validatedUpdateInputs. This function detects the modified model parameters in retrainedMdl and validates whether the modified parameter values satisfy the coder attributes of the parameters. params = validatedUpdateInputs(configurer,retrainedMdl);
Update parameters in the generated code. ClassificationLinearModel('update',params)
Verify Generated Code Compare the outputs from the predict function of retrainedMdl and the predict function in the updated MEX function. [label,score] = predict(retrainedMdl,X','ObservationsIn','columns'); [label_mex,score_mex] = ClassificationLinearModel('predict',X','ObservationsIn','columns'); isequal(label,label_mex) ans = logical 1 max(abs(score-score_mex),[],'all') ans = 0
The comparison confirms that label and label_mex are equal, and that the score values are equal.
More About LearnerCoderInput Object A coder configurer uses a LearnerCoderInput object to specify the coder attributes of predict and update input arguments. 35-565
A LearnerCoderInput object has the following attributes to specify the properties of an input argument array in the generated code.

• SizeVector: Array size if the corresponding VariableDimensions value is false. Upper bound of the array size if the corresponding VariableDimensions value is true. To allow an unbounded array, specify the bound as Inf.

• VariableDimensions: Indicator specifying whether each dimension of the array has a variable size or fixed size, specified as true (logical 1) or false (logical 0):
  • A value of true (logical 1) means that the corresponding dimension has a variable size.
  • A value of false (logical 0) means that the corresponding dimension has a fixed size.

• DataType: Data type of the array.

• Tunability: Indicator specifying whether or not predict or update includes the argument as an input in the generated code, specified as true (logical 1) or false (logical 0). If you specify other attribute values when Tunability is false, the software sets Tunability to true.
After creating a coder configurer, you can modify the coder attributes by using dot notation. For example, specify the data type of the bias term Bias of the coder configurer configurer: configurer.Bias.DataType = 'single';
If you specify the verbosity level (Verbose) as true (default), then the software displays notification messages when you modify the coder attributes of a machine learning model parameter and the modification changes the coder attributes of other dependent parameters. EnumeratedInput Object A coder configurer uses an EnumeratedInput object to specify the coder attributes of predict input arguments that have a finite set of available values. An EnumeratedInput object has the following attributes to specify the properties of an input argument array in the generated code.
• Value: Value of the predict argument in the generated code, specified as a character vector or a LearnerCoderInput on page 35-565 object.
  • Character vector in BuiltInOptions: You can specify one of the BuiltInOptions using either the option name or its index value. For example, to choose the first option, specify Value as either the first character vector in BuiltInOptions or 1.
  • Character vector designating a custom function name: To use a custom option, define a custom function on the MATLAB search path, and specify Value as the name of the custom function.
  • LearnerCoderInput on page 35-565 object: If you set IsConstant to false (logical 0), then the software changes Value to a LearnerCoderInput object with the following read-only coder attribute values. These values indicate that the input in the generated code is a variable-size, tunable character vector that is one of the available values in BuiltInOptions.
    • SizeVector: [1 c], indicating the upper bound of the array size, where c is the length of the longest available character vector in BuiltInOptions
    • VariableDimensions: [0 1], indicating that the array is a variable-size vector
    • DataType: 'char'
    • Tunability: 1
  The default value of Value is consistent with the default value of the corresponding predict argument, which is one of the character vectors in BuiltInOptions.

• SelectedOption: Status of the selected option, specified as 'Built-in', 'Custom', or 'NonConstant'. The software sets SelectedOption according to Value:
  • 'Built-in' (default): When Value is one of the character vectors in BuiltInOptions
  • 'Custom': When Value is a character vector that is not in BuiltInOptions
  • 'NonConstant': When Value is a LearnerCoderInput on page 35-565 object
  This attribute is read-only.

• BuiltInOptions: List of available character vectors for the corresponding predict argument, specified as a cell array. This attribute is read-only.

• IsConstant: Indicator specifying whether or not the array value is a compile-time constant (coder.Constant) in the generated code, specified as true (logical 1, default) or false (logical 0). If you set this value to false, then the software changes Value to a LearnerCoderInput on page 35-565 object.

• Tunability: Indicator specifying whether or not predict includes the argument as an input in the generated code, specified as true (logical 1) or false (logical 0, default). If you specify other attribute values when Tunability is false, the software sets Tunability to true.
After creating a coder configurer, you can modify the coder attributes by using dot notation. For example, specify the coder attributes of ObservationsIn of the coder configurer configurer: configurer.ObservationsIn.Value = 'columns';
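As another sketch under the same assumptions, setting IsConstant to false makes ObservationsIn a tunable input rather than a compile-time constant; per the attribute list above, the software then changes Value to a LearnerCoderInput object:

configurer.ObservationsIn.IsConstant = false;
configurer.ObservationsIn.Value   % now a LearnerCoderInput object with DataType 'char'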
Version History Introduced in R2019b
See Also learnerCoderConfigurer | ClassificationLinear | update | predict | ClassificationECOCCoderConfigurer Topics “Introduction to Code Generation” on page 34-3 “Code Generation for Prediction and Update Using Coder Configurer” on page 34-90
ClassificationNaiveBayes Naive Bayes classification for multiclass classification
Description ClassificationNaiveBayes is a “Naive Bayes” on page 35-579 classifier for multiclass learning. Trained ClassificationNaiveBayes classifiers store the training data, parameter values, data distribution, and prior probabilities. Use these classifiers to perform tasks such as estimating resubstitution predictions (see resubPredict) and predicting labels or posterior probabilities for new data (see predict).
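A minimal sketch of this workflow, using the fisheriris data that also appears in the Examples section (the new observation values are illustrative):

load fisheriris
Mdl = fitcnb(meas,species);                    % train a naive Bayes classifier
resubLabels = resubPredict(Mdl);               % resubstitution predictions
newLabel = predict(Mdl,[5.1 3.5 1.4 0.2]);     % label for a new observation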
Creation Create a ClassificationNaiveBayes object by using fitcnb.
Properties Predictor Properties PredictorNames — Predictor names cell array of character vectors This property is read-only. Predictor names, specified as a cell array of character vectors. The order of the elements in PredictorNames corresponds to the order in which the predictor names appear in the training data X. ExpandedPredictorNames — Expanded predictor names cell array of character vectors This property is read-only. Expanded predictor names, specified as a cell array of character vectors. If the model uses dummy variable encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames. CategoricalPredictors — Categorical predictor indices vector of positive integers | [] This property is read-only. Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]). 35-569
Data Types: single | double CategoricalLevels — Multivariate multinomial levels cell array This property is read-only. Multivariate multinomial levels, specified as a cell array. The length of CategoricalLevels is equal to the number of predictors (size(X,2)). The cells of CategoricalLevels correspond to predictors that you specify as 'mvmn' during training, that is, they have a multivariate multinomial distribution. Cells that do not correspond to a multivariate multinomial distribution are empty ([]). If predictor j is multivariate multinomial, then CategoricalLevels{j} is a list of all distinct values of predictor j in the sample. NaNs are removed from unique(X(:,j)). X — Unstandardized predictors numeric matrix This property is read-only. Unstandardized predictors used to train the naive Bayes classifier, specified as a numeric matrix. Each row of X corresponds to one observation, and each column corresponds to one variable. The software excludes observations containing at least one missing value, and removes corresponding elements from Y. Predictor Distribution Properties DistributionNames — Predictor distributions 'normal' (default) | 'kernel' | 'mn' | 'mvmn' | cell array of character vectors This property is read-only. Predictor distributions, specified as a character vector or cell array of character vectors. fitcnb uses the predictor distributions to model the predictors. This table lists the available distributions. Value
Description
'kernel'
Kernel smoothing density estimate
'mn'
Multinomial distribution. If you specify mn, then all features are components of a multinomial distribution. Therefore, you cannot include 'mn' as an element of a string array or a cell array of character vectors. For details, see “Estimated Probability for Multinomial Distribution” on page 35-580.
'mvmn'
Multivariate multinomial distribution. For details, see “Estimated Probability for Multivariate Multinomial Distribution” on page 35-580.
'normal'
Normal (Gaussian) distribution
If DistributionNames is a 1-by-P cell array of character vectors, then fitcnb models the feature j using the distribution in element j of the cell array. 35-570
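For instance, a sketch of the per-predictor specification described above (a predictor matrix X with three columns and labels Y are assumed):

Mdl = fitcnb(X,Y,'DistributionNames',{'kernel','normal','normal'});  % predictor 1 kernel, predictors 2-3 normal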
Example: 'mn' Example: {'kernel','normal','kernel'} Data Types: char | string | cell DistributionParameters — Distribution parameter estimates cell array This property is read-only. Distribution parameter estimates, specified as a cell array. DistributionParameters is a K-by-D cell array, where cell (k,d) contains the distribution parameter estimates for instances of predictor d in class k. The order of the rows corresponds to the order of the classes in the property ClassNames, and the order of the predictors corresponds to the order of the columns of X. If class k has no observations for predictor j, then the Distribution{k,j} is empty ([]). The elements of DistributionParameters depend on the distributions of the predictors. This table describes the values in DistributionParameters{k,j}. Distribution of Predictor j
Value of Cell Array for Predictor j and Class k
kernel
A KernelDistribution model. Display properties using cell indexing and dot notation. For example, to display the estimated bandwidth of the kernel density for predictor 2 in the third class, use Mdl.DistributionParameters{3,2}.Bandwidth.
mn
A scalar representing the probability that token j appears in class k. For details, see “Estimated Probability for Multinomial Distribution” on page 35-580.
mvmn
A numeric vector containing the probabilities for each possible level of predictor j in class k. The software orders the probabilities by the sorted order of all unique levels of predictor j (stored in the property CategoricalLevels). For more details, see “Estimated Probability for Multivariate Multinomial Distribution” on page 35-580.
normal
A 2-by-1 numeric vector. The first element is the sample mean and the second element is the sample standard deviation. For more details, see “Normal Distribution Estimators” on page 35-579
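A sketch of inspecting these estimates (the class and predictor indices are illustrative; rows follow ClassNames and columns follow the predictors in X):

Mdl.DistributionParameters{1,1}             % normal predictor: [mean; standard deviation] for class 1
Mdl.DistributionParameters{3,2}.Bandwidth   % kernel predictor: estimated kernel bandwidth for class 3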
Kernel — Kernel smoother type 'normal' (default) | 'box' | cell array | ... This property is read-only. Kernel smoother type, specified as the name of a kernel or a cell array of kernel names. The length of Kernel is equal to the number of predictors (size(X,2)). Kernel{j} corresponds to predictor j and contains a character vector describing the type of kernel smoother. If a cell is empty ([]), then fitcnb did not fit a kernel distribution to the corresponding predictor. This table describes the supported kernel smoother types. I{u} denotes the indicator function.
• 'box' (box, uniform): f(x) = 0.5 I{|x| ≤ 1}
• 'epanechnikov' (Epanechnikov): f(x) = 0.75 (1 − x²) I{|x| ≤ 1}
• 'normal' (Gaussian): f(x) = (1/√(2π)) exp(−0.5 x²)
• 'triangle' (triangular): f(x) = (1 − |x|) I{|x| ≤ 1}
Example: 'box' Example: {'epanechnikov','normal'} Data Types: char | string | cell Mu — Predictor means numeric vector | [] This property is read-only. Predictor means, specified as a numeric vector. If you specify Standardize as 1 or true when you train the naive Bayes classifier using fitcnb, then the length of the Mu vector is equal to the number of predictors. The vector contains 0 values for predictors with nonkernel distributions, such as categorical predictors (see DistributionNames). If you set Standardize to 0 or false when you train the naive Bayes classifier using fitcnb, then the Mu value is an empty vector ([]). Data Types: double Sigma — Predictor standard deviations numeric vector | [] This property is read-only. Predictor standard deviations, specified as a numeric vector. If you specify Standardize as 1 or true when you train the naive Bayes classifier using fitcnb, then the length of the Sigma vector is equal to the number of predictors. The vector contains 1 values for predictors with nonkernel distributions, such as categorical predictors (see DistributionNames). If you set Standardize to 0 or false when you train the naive Bayes classifier using fitcnb, then the Sigma value is an empty vector ([]). Data Types: double Support — Kernel smoother density support cell array This property is read-only. Kernel smoother density support, specified as a cell array. The length of Support is equal to the number of predictors (size(X,2)). The cells represent the regions to which fitcnb applies the kernel density. If a cell is empty ([]), then fitcnb did not fit a kernel distribution to the corresponding predictor. This table describes the supported options. 35-572
• 1-by-2 numeric row vector: The density support applies to the specified bounds, for example [L,U], where L and U are the finite lower and upper bounds, respectively.
• 'positive': The density support applies to all positive real values.
• 'unbounded': The density support applies to all real values.
Width — Kernel smoother window width numeric matrix This property is read-only. Kernel smoother window width, specified as a numeric matrix. Width is a K-by-P matrix, where K is the number of classes in the data, and P is the number of predictors (size(X,2)). Width(k,j) is the kernel smoother window width for the kernel smoothing density of predictor j within class k. NaNs in column j indicate that fitcnb did not fit predictor j using a kernel density. Response Properties ClassNames — Unique class names categorical array | character array | logical vector | numeric vector | cell array of character vectors This property is read-only. Unique class names used in the training model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as Y, and has K elements (or rows) for character arrays. (The software treats string arrays as cell arrays of character vectors.) Data Types: categorical | char | string | logical | double | cell ResponseName — Response variable name character vector This property is read-only. Response variable name, specified as a character vector. Data Types: char | string Y — Class labels categorical array | character array | logical vector | numeric vector | cell array of character vectors This property is read-only. Class labels used to train the naive Bayes classifier, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Each row of Y represents the observed classification of the corresponding row of X. Y has the same data type as the data in Y used for training the model. (The software treats string arrays as cell arrays of character vectors.) Data Types: single | double | logical | char | string | cell | categorical 35-573
Training Properties ModelParameters — Parameter values used to train model object This property is read-only. Parameter values used to train the ClassificationNaiveBayes model, specified as an object. ModelParameters contains parameter values such as the name-value pair argument values used to train the naive Bayes classifier. Access the properties of ModelParameters by using dot notation. For example, access the kernel support using Mdl.ModelParameters.Support. NumObservations — Number of training observations numeric scalar This property is read-only. Number of training observations in the training data stored in X and Y, specified as a numeric scalar. Prior — Prior probabilities numeric vector Prior probabilities, specified as a numeric vector. The order of the elements in Prior corresponds to the elements of Mdl.ClassNames. fitcnb normalizes the prior probabilities you set using the 'Prior' name-value pair argument, so that sum(Prior) = 1. The value of Prior does not affect the best-fitting model. Therefore, you can reset Prior after training Mdl using dot notation. Example: Mdl.Prior = [0.2 0.8] Data Types: double | single W — Observation weights vector of nonnegative values This property is read-only. Observation weights, specified as a vector of nonnegative values with the same number of rows as Y. Each entry in W specifies the relative importance of the corresponding observation in Y. fitcnb normalizes the value you set for the 'Weights' name-value pair argument, so that the weights within a particular class sum to the prior probability for that class. Classifier Properties Cost — Misclassification cost square matrix Misclassification cost, specified as a numeric square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i. The rows correspond to the true class and the columns correspond to the predicted class. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. 35-574
The misclassification cost matrix must have zeros on the diagonal. The value of Cost does not influence training. You can reset Cost after training Mdl using dot notation. Example: Mdl.Cost = [0 0.5 ; 1 0] Data Types: double | single HyperparameterOptimizationResults — Cross-validation optimization of hyperparameters BayesianOptimization object | table This property is read-only. Cross-validation optimization of hyperparameters, specified as a BayesianOptimization object or a table of hyperparameters and associated values. This property is nonempty if the 'OptimizeHyperparameters' name-value pair argument is nonempty when you create the model. The value of HyperparameterOptimizationResults depends on the setting of the Optimizer field in the HyperparameterOptimizationOptions structure when you create the model. Value of Optimizer Field
Value of HyperparameterOptimizationResults
'bayesopt' (default)
Object of class BayesianOptimization
'gridsearch' or 'randomsearch'
Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)
ScoreTransform — Classification score transformation
'none' (default) | 'doublelogit' | 'invlogit' | 'ismax' | 'logit' | function handle | ...
Classification score transformation, specified as a character vector or function handle. This table summarizes the available character vectors.
• "doublelogit": 1/(1 + e^(–2x))
• "invlogit": log(x / (1 – x))
• "ismax": Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
• "logit": 1/(1 + e^(–x))
• "none" or "identity": x (no transformation)
• "sign": –1 for x < 0, 0 for x = 0, 1 for x > 0
• "symmetric": 2x – 1
• "symmetricismax": Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
• "symmetriclogit": 2/(1 + e^(–x)) – 1
For a MATLAB function or a function you define, use its function handle for the score transformation. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores). Example: Mdl.ScoreTransform = 'logit' Data Types: char | string | function handle
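For example, a sketch of a custom transformation supplied as a function handle (the handle must accept a matrix of scores and return a matrix of the same size):

Mdl.ScoreTransform = @(x) 1./(1 + exp(-2*x));   % element-wise transformation of the score matrix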
Object Functions
compact: Reduce size of machine learning model
compareHoldout: Compare accuracies of two classification models using new data
crossval: Cross-validate machine learning model
edge: Classification edge for naive Bayes classifier
incrementalLearner: Convert naive Bayes classification model to incremental learner
lime: Local interpretable model-agnostic explanations (LIME)
logp: Log unconditional probability density for naive Bayes classifier
loss: Classification loss for naive Bayes classifier
margin: Classification margins for naive Bayes classifier
partialDependence: Compute partial dependence
plotPartialDependence: Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
predict: Classify observations using naive Bayes classifier
resubEdge: Resubstitution classification edge
resubLoss: Resubstitution classification loss
resubMargin: Resubstitution classification margin
resubPredict: Classify training data using trained classifier
shapley: Shapley values
testckfold: Compare accuracies of two classification models by repeated cross-validation
Examples Train Naive Bayes Classifier Create a naive Bayes classifier for Fisher's iris data set. Then, specify prior probabilities after training the classifier. Load the fisheriris data set. Create X as a numeric matrix that contains four petal measurements for 150 irises. Create Y as a cell array of character vectors that contains the corresponding iris species. load fisheriris X = meas; Y = species;
Train a naive Bayes classifier using the predictors X and class labels Y. fitcnb assumes each predictor is independent and fits each predictor using a normal distribution by default.

Mdl = fitcnb(X,Y)

Mdl = 
  ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'setosa'  'versicolor'  'virginica'}
            ScoreTransform: 'none'
           NumObservations: 150
         DistributionNames: {'normal'  'normal'  'normal'  'normal'}
    DistributionParameters: {3x4 cell}
Mdl is a trained ClassificationNaiveBayes classifier. Some of the Mdl properties appear in the Command Window.

Display the properties of Mdl using dot notation. For example, display the class names and prior probabilities.

Mdl.ClassNames

ans = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }

Mdl.Prior

ans = 1×3

    0.3333    0.3333    0.3333
The order of the class prior probabilities in Mdl.Prior corresponds to the order of the classes in Mdl.ClassNames. By default, the prior probabilities are the respective relative frequencies of the classes in the data. Alternatively, you can set the prior probabilities when calling fitcnb by using the 'Prior' name-value pair argument. Set the prior probabilities after training the classifier by using dot notation. For example, set the prior probabilities to 0.5, 0.2, and 0.3, respectively. Mdl.Prior = [0.5 0.2 0.3];
You can now use this trained classifier to perform additional tasks. For example, you can label new measurements using predict or cross-validate the classifier using crossval.
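A sketch of those follow-up tasks (the new measurement values are illustrative):

newLabel = predict(Mdl,[5 3.3 1.4 0.2]);   % classify a new measurement
CVMdl = crossval(Mdl);                     % 10-fold cross-validation by default
kfoldLoss(CVMdl)                           % cross-validated classification error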
Train and Cross-Validate Naive Bayes Classifier Train and cross-validate a naive Bayes classifier. fitcnb implements 10-fold cross-validation by default. Then, estimate the cross-validated classification error. Load the ionosphere data set. Remove the first two predictors for stability. load ionosphere X = X(:,3:end); rng('default') % for reproducibility
Train and cross-validate a naive Bayes classifier using the predictors X and class labels Y. A recommended practice is to specify the class names. fitcnb assumes that each predictor is conditionally and normally distributed.

CVMdl = fitcnb(X,Y,'ClassNames',{'b','g'},'CrossVal','on')

CVMdl = 
  ClassificationPartitionedModel
    CrossValidatedModel: 'NaiveBayes'
         PredictorNames: {'x1'  'x2'  'x3'  'x4'  'x5'  'x6'  'x7'  'x8'  'x9'  'x10'  'x11'  ...}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'
CVMdl is a ClassificationPartitionedModel cross-validated, naive Bayes classifier. Alternatively, you can cross-validate a trained ClassificationNaiveBayes model by passing it to crossval.

Display the first training fold of CVMdl using dot notation.

CVMdl.Trained{1}

ans = 
  CompactClassificationNaiveBayes
               ResponseName: 'Y'
      CategoricalPredictors: []
                 ClassNames: {'b'  'g'}
             ScoreTransform: 'none'
          DistributionNames: {1x32 cell}
     DistributionParameters: {2x32 cell}

Each fold is a CompactClassificationNaiveBayes model trained on 90% of the data. The full and compact naive Bayes models stored in CVMdl are not used for predicting on new data; instead, use them to estimate the generalization error by passing CVMdl to kfoldLoss.

genError = kfoldLoss(CVMdl)

genError = 0.1852
On average, the generalization error is approximately 19%. You can specify a different conditional distribution for the predictors, or tune the conditional distribution parameters to reduce the generalization error.
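For example, a sketch (not part of the original example) that refits with kernel densities for all predictors and compares the cross-validated loss:

CVMdl2 = fitcnb(X,Y,'ClassNames',{'b','g'},'DistributionNames','kernel','CrossVal','on');
kfoldLoss(CVMdl2)   % compare with the loss from the normal-distribution fit above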
More About

Bag-of-Tokens Model
In the bag-of-tokens model, the value of predictor j is the nonnegative number of occurrences of token j in the observation. The number of categories (bins) in the multinomial model is the number of distinct tokens (number of predictors).

Naive Bayes
Naive Bayes is a classification algorithm that applies density estimation to the data. The algorithm leverages Bayes theorem, and (naively) assumes that the predictors are conditionally independent, given the class. Although the assumption is usually violated in practice, naive Bayes classifiers tend to yield posterior distributions that are robust to biased class density estimates, particularly where the posterior is 0.5 (the decision boundary) [1]. Naive Bayes classifiers assign observations to the most probable class (in other words, the maximum a posteriori decision rule). Explicitly, the algorithm takes these steps:

1. Estimate the densities of the predictors within each class.

2. Model posterior probabilities according to Bayes rule. That is, for all k = 1,...,K,

$$P(Y = k \mid X_1,\ldots,X_P) = \frac{\pi(Y = k)\prod_{j=1}^{P} P(X_j \mid Y = k)}{\sum_{k=1}^{K} \pi(Y = k)\prod_{j=1}^{P} P(X_j \mid Y = k)},$$

where:
• Y is the random variable corresponding to the class index of an observation.
• X1,...,XP are the random predictors of an observation.
• π(Y = k) is the prior probability that a class index is k.

3. Classify an observation by estimating the posterior probability for each class, and then assign the observation to the class yielding the maximum posterior probability.

If the predictors compose a multinomial distribution, then the posterior probability

$$P(Y = k \mid X_1,\ldots,X_P) \propto \pi(Y = k)\,P_{mn}(X_1,\ldots,X_P \mid Y = k),$$

where $P_{mn}(X_1,\ldots,X_P \mid Y = k)$ is the probability mass function of a multinomial distribution.
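In practice, the posterior probabilities described here are returned by the predict object function; a minimal sketch (Xnew is a hypothetical matrix of new observations):

[label,Posterior,Cost] = predict(Mdl,Xnew);   % Posterior(i,k) estimates P(Y = k | observation i)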
Algorithms

Normal Distribution Estimators
If predictor variable j has a conditional normal distribution (see the DistributionNames property), the software fits the distribution to the data by computing the class-specific weighted mean and the unbiased estimate of the weighted standard deviation. For each class k:

• The weighted mean of predictor j is

$$\bar{x}_{j|k} = \frac{\sum_{i:\,y_i = k} w_i x_{ij}}{\sum_{i:\,y_i = k} w_i},$$

where wi is the weight for observation i. The software normalizes weights within a class such that they sum to the prior probability for that class.

• The unbiased estimator of the weighted standard deviation of predictor j is

$$s_{j|k} = \left[\frac{\sum_{i:\,y_i = k} w_i \left(x_{ij} - \bar{x}_{j|k}\right)^2}{z_{1|k} - \dfrac{z_{2|k}}{z_{1|k}}}\right]^{1/2},$$
where z1|k is the sum of the weights within class k and z2|k is the sum of the squared weights within class k.

Estimated Probability for Multinomial Distribution
If all predictor variables compose a conditional multinomial distribution (see the DistributionNames property), the software fits the distribution using the "Bag-of-Tokens Model" on page 35-579. The software stores the probability that token j appears in class k in the property DistributionParameters{k,j}. With additive smoothing [2], the estimated probability is

$$P(\text{token } j \mid \text{class } k) = \frac{1 + c_{j|k}}{P + c_k},$$

where:

• $c_{j|k} = n_k \dfrac{\sum_{i:\,y_i = k} x_{ij} w_i}{\sum_{i:\,y_i = k} w_i}$, which is the weighted number of occurrences of token j in class k.

• nk is the number of observations in class k.

• wi is the weight for observation i. The software normalizes weights within a class so that they sum to the prior probability for that class.

• $c_k = \sum_{j=1}^{P} c_{j|k}$, which is the total weighted number of occurrences of all tokens in class k.
Estimated Probability for Multivariate Multinomial Distribution
If predictor variable j has a conditional multivariate multinomial distribution (see the DistributionNames property), the software follows this procedure:

1. The software collects a list of the unique levels, stores the sorted list in CategoricalLevels, and considers each level a bin. Each combination of predictor and class is a separate, independent multinomial random variable.

2. For each class k, the software counts instances of each categorical level using the list stored in CategoricalLevels{j}.

3. The software stores the probability that predictor j in class k has level L in the property DistributionParameters{k,j}, for all levels in CategoricalLevels{j}. With additive smoothing [2], the estimated probability is

$$P(\text{predictor } j = L \mid \text{class } k) = \frac{1 + m_{j|k}(L)}{m_j + m_k},$$

where:

• $m_{j|k}(L) = n_k \dfrac{\sum_{i:\,y_i = k} I\{x_{ij} = L\}\, w_i}{\sum_{i:\,y_i = k} w_i}$, which is the weighted number of observations for which predictor j equals L in class k.

• nk is the number of observations in class k.

• $I\{x_{ij} = L\} = 1$ if xij = L, and 0 otherwise.

• wi is the weight for observation i. The software normalizes weights within a class so that they sum to the prior probability for that class.

• mj is the number of distinct levels in predictor j.

• mk is the weighted number of observations in class k.
Version History
Introduced in R2014b

R2023b: Naive Bayes models support standardization of kernel-distributed predictors
fitcnb supports the standardization of predictors with kernel distributions. That is, you can specify the Standardize name-value argument as true when the DistributionNames name-value argument includes at least one "kernel" distribution. Naive Bayes models include Mu and Sigma properties that contain the means and standard deviations, respectively, used to standardize the predictors before training. The properties are empty when fitcnb does not perform any standardization.
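A sketch of this option (R2023b or later; X and Y are assumed training data):

Mdl = fitcnb(X,Y,'DistributionNames','kernel','Standardize',true);
Mdl.Mu      % predictor means used for standardization
Mdl.Sigma   % predictor standard deviations used for standardization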
References [1] Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Springer Series in Statistics. New York, NY: Springer, 2009. https://doi.org/10.1007/978-0-387-84858-7. [2] Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval, NY: Cambridge University Press, 2008.
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: 35-581
• The predict function supports code generation. • When you train a naive Bayes model by using fitcnb, the following restrictions apply. • The value of the 'DistributionNames' name-value pair argument cannot contain 'mn'. • The value of the 'ScoreTransform' name-value pair argument cannot be an anonymous function. For more information, see “Introduction to Code Generation” on page 34-3.
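A sketch of one such code generation workflow, following the saveLearnerForCoder/loadLearnerForCoder pattern described in "Introduction to Code Generation" (the entry-point function name and input size are illustrative, and Mdl is an assumed trained model):

saveLearnerForCoder(Mdl,'nbModel');        % save the trained model to nbModel.mat
% Entry-point function (for example, predictNB.m):
%   function label = predictNB(X) %#codegen
%   mdl = loadLearnerForCoder('nbModel');
%   label = predict(mdl,X);
codegen predictNB -args {zeros(1,4)}       % generate a MEX function for 1-by-4 observations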
See Also CompactClassificationNaiveBayes | loss | predict | fitcnb Topics “Naive Bayes Classification” on page 22-2 “Grouping Variables” on page 2-11 “Incremental Learning with Naive Bayes and Heterogeneous Data” on page 28-60
ClassificationNaiveBayes Predict Classify observations using naive Bayes model Libraries: Statistics and Machine Learning Toolbox / Classification
Description The ClassificationNaiveBayes Predict block classifies observations using a naive Bayes classification object (ClassificationNaiveBayes) for multiclass classification. Import a trained classification object into the block by specifying the name of a workspace variable that contains the object. The input port x receives an observation (predictor data), and the output port label returns a predicted class label for the observation. The optional output port score returns the predicted class scores or posterior probabilities. The optional output port cost returns the expected classification costs.
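A sketch of preparing the workspace variable that the block uses (the default value of the Select trained machine learning model parameter is the variable name nbMdl; the data set is illustrative):

load fisheriris
nbMdl = fitcnb(meas,species);   % set the block parameter to this workspace variable name,
                                % then supply one observation per time step at input port x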
Ports Input x — Predictor data row vector | column vector Predictor data, specified as a row or column vector of one observation. The variables in x must have the same order as the predictor variables that trained the model specified by Select trained machine learning model. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point Output label — Predicted class label scalar Predicted class label, returned as a scalar. The predicted class is the class that minimizes the expected classification cost. For more details, see the “More About” on page 35-6387 section of the predict object function. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point | enumerated score — Predicted class scores or posterior probabilities row vector 35-583
Predicted class scores or posterior probabilities, returned as a row vector of size 1-by-k, where k is the number of classes in the naive Bayes model. The classification score Score(i) represents the posterior probability that the observation in x belongs to class i. To check the order of the classes, use the ClassNames property of the naive Bayes model specified by Select trained machine learning model. Dependencies
To enable this port, select the check box for Add output port for predicted class scores on the Main tab of the Block Parameters dialog box. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point cost — Expected classification costs row vector Expected classification costs, returned as a row vector of size 1-by-k, where k is the number of classes in the naive Bayes model. The classification cost Cost(i) represents the cost of classifying the observation in x to class i. To check the order of the classes, use the ClassNames property of the naive Bayes model specified by Select trained machine learning model. Dependencies
To enable this port, select the check box for Add output port for expected classification costs on the Main tab of the Block Parameters dialog box. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point
Parameters Main Select trained machine learning model — Naive Bayes classification model nbMdl (default) | ClassificationNaiveBayes object Specify the name of a workspace variable that contains a ClassificationNaiveBayes object. When you train the model by using fitcnb, the following restrictions apply: • The predictor data cannot include categorical predictors (logical, categorical, char, string, or cell). If you supply training data in a table, the predictors must be numeric (double or single). Also, you cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using dummyvar before fitting the model. • The value of the ScoreTransform name-value argument cannot be "invlogit" or an anonymous function. • The value of the DistributionNames name-value argument cannot be "mn". 35-584
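For example, a sketch of the dummyvar preprocessing mentioned above (the variable names are hypothetical):

g = categorical(colorGroup);        % hypothetical categorical predictor
D = dummyvar(g);                    % one indicator column per category
Xnum = [numericPredictors D];       % numeric predictors plus dummy variables
nbMdl = fitcnb(Xnum,Y);             % train on the purely numeric matrix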
Programmatic Use
Block Parameter: TrainedLearner Type: workspace variable Values: ClassificationNaiveBayes object Default: "nbMdl" Add output port for predicted class scores — Add optional output port for predicted class scores off (default) | on Select the check box to include the output port score in the ClassificationNaiveBayes Predict block. Programmatic Use
Block Parameter: ShowOutputScore Type: character vector Values: "off" | "on" Default: "off" Add output port for expected classification costs — Add optional output port for expected classification costs off (default) | on Select the check box to include the output port cost in the ClassificationNaiveBayes Predict block. Programmatic Use
Block Parameter: ShowOutputCost Type: character vector Values: "off" | "on" Default: "off" Data Types Fixed-Point Operational Parameters
Integer rounding mode — Rounding mode for fixed-point operations Floor (default) | Ceiling | Convergent | Nearest | Round | Simplest | Zero Specify the rounding mode for fixed-point operations. For more information, see “Rounding” (FixedPoint Designer). Block parameters always round to the nearest representable value. To control the rounding of a block parameter, enter an expression into the mask field using a MATLAB rounding function. Programmatic Use
Block Parameter: RndMeth Type: character vector Values: "Ceiling" | "Convergent" | "Floor" | "Nearest" | "Round" | "Simplest" | "Zero" Default: "Floor" Saturate on integer overflow — Method of overflow action off (default) | on 35-585
Specify whether overflows saturate or wrap.

• Select this check box (on).
  Rationale: Your model has possible overflow, and you want explicit saturation protection in the generated code.
  Impact on overflows: Overflows saturate to either the minimum or maximum value that the data type can represent.
  Example: The maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box selected, the block output saturates at 127. Similarly, the block output saturates at a minimum output value of –128.

• Clear this check box (off).
  Rationale: You want to optimize the efficiency of your generated code. You also want to avoid overspecifying how a block handles out-of-range signals. For more information, see "Troubleshoot Signal Range Errors" (Simulink).
  Impact on overflows: Overflows wrap to the appropriate value that the data type can represent.
  Example: The maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box cleared, the software interprets the value causing the overflow as int8, which can produce an unintended result. For example, a block result of 130 (binary 1000 0010) expressed as int8 is –126.
Programmatic Use
Block Parameter: SaturateOnIntegerOverflow Type: character vector Values: "off" | "on" Default: "off" Lock output data type setting against changes by the fixed-point tools — Prevention of fixedpoint tools from overriding data type off (default) | on Select this parameter to prevent the fixed-point tools from overriding the data type you specify for the block. For more information, see “Use Lock Output Data Type Setting” (Fixed-Point Designer). Programmatic Use
Block Parameter: LockScale Type: character vector Values: "off" | "on" Default: "off"
Data Type
Label data type — Data type of label output
Inherit: Inherit via back propagation | Inherit: auto | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Enum: <class name> | <data type expression>

Specify the data type for the label output. The type can be inherited, specified as an enumerated data type, or expressed as a data type object such as Simulink.NumericType.

The supported data types depend on the labels used in the model specified by Select trained machine learning model.
• If the model uses numeric or logical labels, the supported data types are Inherit: Inherit via back propagation (default), double, single, half, int8, uint8, int16, uint16, int32, uint32, int64, uint64, boolean, fixed point, and a data type object.
• If the model uses nonnumeric labels, the supported data types are Inherit: auto (default), Enum: <class name>, and a data type object.

When you select an inherited option, the software behaves as follows:
• Inherit: Inherit via back propagation (default for numeric and logical labels) — Simulink automatically determines the Label data type of the block during data type propagation (see “Data Type Propagation” (Simulink)). In this case, the block uses the data type of a downstream block or signal object.
• Inherit: auto (default for nonnumeric labels) — The block uses an autodefined enumerated data type variable. For example, suppose the workspace variable name specified by Select trained machine learning model is myMdl, and the class labels are class 1 and class 2. Then, the corresponding label values are myMdl_enumLabels.class_1 and myMdl_enumLabels.class_2. The block converts the class labels to valid MATLAB identifiers by using the matlab.lang.makeValidName function.

For more information about data types, see “Control Data Types of Signals” (Simulink).

Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink).
Block Parameter: LabelDataTypeStr Type: character vector Values: "Inherit: Inherit via back propagation" | "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "Enum: " | "" Default: "Inherit: Inherit via back propagation" (for numeric and logical labels) | "Inherit: auto" (for nonnumeric labels) Label data type Minimum — Minimum value of label output for range checking [] (default) | scalar Specify the lower value of the label output range that Simulink checks. 35-587
35
Functions
Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Label data type Minimum parameter does not saturate or clip the actual label output signal. To do so, use the Saturation block instead. Dependencies
You can specify this parameter only if the model specified by Select trained machine learning model uses numeric labels. Programmatic Use
Block Parameter: LabelOutMin Type: character vector Values: "[]" | scalar Default: "[]" Label data type Maximum — Maximum value of label output for range checking [] (default) | scalar Specify the upper value of the label output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Label data type Maximum parameter does not saturate or clip the actual label output signal. To do so, use the Saturation block instead. Dependencies
You can specify this parameter only if the model specified by Select trained machine learning model uses numeric labels. 35-588
ClassificationNaiveBayes Predict
Programmatic Use
Block Parameter: LabelOutMax Type: character vector Values: "[]" | scalar Default: "[]" Score data type — Data type of score output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the score output. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: auto, the block uses a rule that inherits a data type. For more information about data types, see “Control Data Types of Signals” (Simulink). to display the Data Type Assistant, which helps Click the Show data type assistant button you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: ScoreDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "Inherit: auto" Score data type Minimum — Minimum value of score output for range checking [] (default) | scalar Specify the lower value of the score output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Score data type Minimum parameter does not saturate or clip the actual score output. To do so, use the Saturation block instead. 35-589
35
Functions
Programmatic Use
Block Parameter: ScoreOutMin Type: character vector Values: "[]" | scalar Default: "[]" Score data type Maximum — Maximum value of score output for range checking [] (default) | scalar Specify the upper value of the score output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Score data type Maximum parameter does not saturate or clip the actual score output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: ScoreOutMax Type: character vector Values: "[]" | scalar Default: "[]" Raw score data type — Untransformed score data type Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the internal untransformed scores. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: auto, the block uses a rule that inherits a data type. For more information about data types, see “Control Data Types of Signals” (Simulink). to display the Data Type Assistant, which helps Click the Show data type assistant button you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink).
35-590
ClassificationNaiveBayes Predict
Dependencies
You can specify this parameter only if the model specified by Select trained machine learning model uses a score transformation other than "none" (default, same as "identity"). • If the model uses no score transformations ("none" or "identity"), then you can specify the score data type by using Score data type. • If the model uses a score transformation other than "none" or "identity", then you can specify the data type of untransformed raw scores by using this parameter. To specify the data type of transformed scores, use Score data type. You can change the score transformation option by specifying the ScoreTransform name-value argument during training, or by modifying the ScoreTransform property after training. Programmatic Use
Block Parameter: RawScoreDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "Inherit: auto" Raw score data type Minimum — Minimum untransformed score for range checking [] (default) | scalar Specify the lower value of the untransformed score range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Raw score data type Minimum parameter does not saturate or clip the actual untransformed score signal. Programmatic Use
Block Parameter: RawScoreOutMin Type: character vector Values: "[]" | scalar Default: "[]" Raw score data type Maximum — Maximum untransformed score for range checking [] (default) | scalar Specify the upper value of the untransformed score range that Simulink checks. 35-591
35
Functions
Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Raw score data type Maximum parameter does not saturate or clip the actual untransformed score signal. Programmatic Use
Block Parameter: RawScoreOutMax Type: character vector Values: "[]" | scalar Default: "[]" Estimated cost data type — Data type of cost output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the cost output. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: auto, the block uses a rule that inherits a data type. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: CostDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "Inherit: auto" Estimated cost data type Minimum — Minimum value of cost output for range checking [] (default) | scalar Specify the lower value of the cost output range that Simulink checks. 35-592
ClassificationNaiveBayes Predict
Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Estimated cost data type Minimum parameter does not saturate or clip the actual cost signal. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: CostOutMin Type: character vector Values: "[]" | scalar Default: "[]" Estimated cost data type Maximum — Maximum value of cost output for range checking [] (default) | scalar Specify the upper value of the cost output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Estimated cost data type Maximum parameter does not saturate or clip the actual cost signal. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: CostOutMax Type: character vector Values: "[]" | scalar Default: "[]"
35-593
35
Functions
Likelihood data type — Data type of likelihood output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the likelihood output. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: auto, the block uses a rule that inherits a data type. For more information about data types, see “Control Data Types of Signals” (Simulink). to display the Data Type Assistant, which helps Click the Show data type assistant button you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: LikelihoodDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "Inherit: auto" Likelihood data type Minimum — Minimum value of likelihood output for range checking [] (default) | scalar Specify the lower value of the likelihood output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Likelihood data type Minimum parameter does not saturate or clip the actual likelihood values. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: LikelihoodOutMin Type: character vector Values: "[]" | scalar 35-594
ClassificationNaiveBayes Predict
Default: "[]" Likelihood data type Maximum — Maximum value of likelihood output for range checking [] (default) | scalar Specify the upper value of the likelihood output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Likelihood data type Maximum parameter does not saturate or clip the actual likelihood values. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: LikelihoodOutMax Type: character vector Values: "[]" | scalar Default: "[]" Posterior data type — Data type of posterior probability output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for posterior probabilities. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: auto, the block uses a rule that inherits a data type. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: PosteriorDataTypeStr Type: character vector
Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "Inherit: auto" Posterior data type Minimum — Minimum value of posterior probability output for range checking [] (default) | scalar Specify the lower value of the posterior probability output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Posterior data type Minimum parameter does not saturate or clip the actual posterior probabilities. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: PosteriorOutMin Type: character vector Values: "[]" | scalar Default: "[]" Posterior data type Maximum — Maximum value of posterior probability output for range checking [] (default) | scalar Specify the upper value of the posterior probability output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). 35-596
Note The Posterior data type Maximum parameter does not saturate or clip the actual posterior probabilities. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: PosteriorOutMax Type: character vector Values: "[]" | scalar Default: "[]"
Block Characteristics

Data Types                  Boolean | double | enumerated | fixed point | half | integer | single
Direct Feedthrough          yes
Multidimensional Signals    no
Variable-Size Signals       no
Zero-Crossing Detection     no
Alternative Functionality You can use a MATLAB Function (Simulink) block with the predict object function of a naive Bayes classification object (ClassificationNaiveBayes). For an example, see “Predict Class Labels Using MATLAB Function Block” on page 34-49. When deciding whether to use the ClassificationNaiveBayes Predict block in the Statistics and Machine Learning Toolbox library or a MATLAB Function block with the predict function, consider the following: • If you use the Statistics and Machine Learning Toolbox library block, you can use the Fixed-Point Tool to convert a floating-point model to fixed point. • Support for variable-size arrays must be enabled for a MATLAB Function block with the predict function. • If you use a MATLAB Function block, you can use MATLAB functions for preprocessing or postprocessing before or after predictions in the same MATLAB Function block.
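A rough sketch of the MATLAB Function block approach follows. It assumes you saved a trained naive Bayes model with saveLearnerForCoder to a file named nbMdl.mat; the file name and function name are hypothetical. The block body loads the model once and then calls predict.

function label = predictNB(x) %#codegen
% Load the saved ClassificationNaiveBayes model once and reuse it across time steps.
persistent mdl
if isempty(mdl)
    mdl = loadLearnerForCoder('nbMdl');
end
label = predict(mdl,x);
end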
Version History Introduced in R2023b
Extended Capabilities C/C++ Code Generation Generate C and C++ code using Simulink® Coder™. Fixed-Point Conversion Design and simulate fixed-point systems using Fixed-Point Designer™. 35-597
See Also Blocks ClassificationSVM Predict | ClassificationTree Predict | ClassificationEnsemble Predict | ClassificationNeuralNetwork Predict | ClassificationKNN Predict | ClassificationECOC Predict Functions predict | fitcnb Objects ClassificationNaiveBayes Topics “Predict Class Labels Using MATLAB Function Block” on page 34-49 “Predict Class Labels Using ClassificationSVM Predict Block” on page 34-121 “Predict Class Labels Using ClassificationEnsemble Predict Block” on page 34-140 “Predict Class Labels Using ClassificationTree Predict Block” on page 34-131 “Predict Class Labels Using ClassificationNeuralNetwork Predict Block” on page 34-154 “Predict Class Labels Using ClassificationECOC Predict Block” on page 34-182 “Predict Class Labels Using ClassificationKNN Predict Block” on page 34-168
ClassificationNeuralNetwork Neural network model for classification
Description A ClassificationNeuralNetwork object is a trained, feedforward, and fully connected neural network for classification. The first fully connected layer of the neural network has a connection from the network input (predictor data X), and each subsequent layer has a connection from the previous layer. Each fully connected layer multiplies the input by a weight matrix (LayerWeights) and then adds a bias vector (LayerBiases). An activation function follows each fully connected layer (Activations and OutputLayerActivation). The final fully connected layer and the subsequent softmax activation function produce the network's output, namely classification scores (posterior probabilities) and predicted labels. For more information, see “Neural Network Structure” on page 35-2456.
Creation Create a ClassificationNeuralNetwork object by using fitcnet.
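For instance, a minimal sketch with hypothetical predictor data X and class labels Y:

Mdl = fitcnet(X,Y,"LayerSizes",[10 10]);   % neural network with two fully connected layers of 10 outputs each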
Properties Neural Network Properties LayerSizes — Sizes of fully connected layers positive integer vector This property is read-only. Sizes of the fully connected layers in the neural network model, returned as a positive integer vector. The ith element of LayerSizes is the number of outputs in the ith fully connected layer of the neural network model. LayerSizes does not include the size of the final fully connected layer. This layer always has K outputs, where K is the number of classes in Y. Data Types: single | double LayerWeights — Learned layer weights cell array This property is read-only. Learned layer weights for the fully connected layers, returned as a cell array. The ith entry in the cell array corresponds to the layer weights for the ith fully connected layer. For example, Mdl.LayerWeights{1} returns the weights for the first fully connected layer of the model Mdl. LayerWeights includes the weights for the final fully connected layer. Data Types: cell 35-599
LayerBiases — Learned layer biases cell array This property is read-only. Learned layer biases for the fully connected layers, returned as a cell array. The ith entry in the cell array corresponds to the layer biases for the ith fully connected layer. For example, Mdl.LayerBiases{1} returns the biases for the first fully connected layer of the model Mdl. LayerBiases includes the biases for the final fully connected layer. Data Types: cell Activations — Activation functions for fully connected layers 'relu' | 'tanh' | 'sigmoid' | 'none' | cell array of character vectors This property is read-only. Activation functions for the fully connected layers of the neural network model, returned as a character vector or cell array of character vectors with values from this table.

Value        Description
'relu'       Rectified linear unit (ReLU) function — Performs a threshold operation on each element of the input, where any value less than zero is set to zero, that is, f(x) = x for x ≥ 0 and f(x) = 0 for x < 0
'tanh'       Hyperbolic tangent (tanh) function — Applies the tanh function to each input element
'sigmoid'    Sigmoid function — Performs the following operation on each input element: f(x) = 1/(1 + e^(–x))
'none'       Identity function — Returns each input element without performing any transformation, that is, f(x) = x
• If Activations contains only one activation function, then it is the activation function for every fully connected layer of the neural network model, excluding the final fully connected layer. The activation function for the final fully connected layer is always softmax (OutputLayerActivation). • If Activations is an array of activation functions, then the ith element is the activation function for the ith layer of the neural network model. Data Types: char | cell OutputLayerActivation — Activation function for final fully connected layer 'softmax' This property is read-only. 35-600
Activation function for the final fully connected layer, returned as 'softmax'. The function takes each input xi and returns the following, where K is the number of classes in the response variable:

f(xi) = exp(xi) / (exp(x1) + exp(x2) + … + exp(xK))
The results correspond to the predicted classification scores (or posterior probabilities). ModelParameters — Parameter values used to train model NeuralNetworkParams object This property is read-only. Parameter values used to train the ClassificationNeuralNetwork model, returned as a NeuralNetworkParams object. ModelParameters contains parameter values such as the name-value arguments used to train the neural network classifier. Access the properties of ModelParameters by using dot notation. For example, access the function used to initialize the fully connected layer weights of a model Mdl by using Mdl.ModelParameters.LayerWeightsInitializer. Convergence Control Properties ConvergenceInfo — Convergence information structure array This property is read-only. Convergence information, returned as a structure array.

Field                   Description
Iterations              Number of training iterations used to train the neural network model
TrainingLoss            Training cross-entropy loss for the returned model, or resubLoss(Mdl,'LossFun','crossentropy') for model Mdl
Gradient                Gradient of the loss function with respect to the weights and biases at the iteration corresponding to the returned model
Step                    Step size at the iteration corresponding to the returned model
Time                    Total time spent across all iterations (in seconds)
ValidationLoss          Validation cross-entropy loss for the returned model
ValidationChecks        Maximum number of times in a row that the validation loss was greater than or equal to the minimum validation loss
ConvergenceCriterion    Criterion for convergence
History                 See TrainingHistory
Data Types: struct TrainingHistory — Training history table This property is read-only. Training history, returned as a table.

Column              Description
Iteration           Training iteration
TrainingLoss        Training cross-entropy loss for the model at this iteration
Gradient            Gradient of the loss function with respect to the weights and biases at this iteration
Step                Step size at this iteration
Time                Time spent during this iteration (in seconds)
ValidationLoss      Validation cross-entropy loss for the model at this iteration
ValidationChecks    Running total of times that the validation loss is greater than or equal to the minimum validation loss
Data Types: table Solver — Solver used to train neural network model 'LBFGS' This property is read-only. Solver used to train the neural network model, returned as 'LBFGS'. To create a ClassificationNeuralNetwork model, fitcnet uses a limited-memory Broyden-FletcherGoldfarb-Shanno quasi-Newton algorithm (LBFGS) as its loss function minimization technique, where the software minimizes the cross-entropy loss. Predictor Properties PredictorNames — Predictor variable names cell array of character vectors This property is read-only. Predictor variable names, returned as a cell array of character vectors. The order of the elements of PredictorNames corresponds to the order in which the predictor names appear in the training data. Data Types: cell CategoricalPredictors — Categorical predictor indices vector of positive integers | [] 35-602
This property is read-only. Categorical predictor indices, returned as a vector of positive integers. Assuming that the predictor data contains observations in rows, CategoricalPredictors contains index values corresponding to the columns of the predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty ([]). Data Types: double ExpandedPredictorNames — Expanded predictor names cell array of character vectors This property is read-only. Expanded predictor names, returned as a cell array of character vectors. If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames. Data Types: cell Mu — Predictor means numeric vector | [] This property is read-only. Predictor means, returned as a numeric vector. If you set Standardize to 1 or true when you train the neural network model, then the length of the Mu vector is equal to the number of expanded predictors (see ExpandedPredictorNames). The vector contains 0 values for dummy variables corresponding to expanded categorical predictors. If you set Standardize to 0 or false when you train the neural network model, then the Mu value is an empty vector ([]). Data Types: double Sigma — Predictor standard deviations numeric vector | [] This property is read-only. Predictor standard deviations, returned as a numeric vector. If you set Standardize to 1 or true when you train the neural network model, then the length of the Sigma vector is equal to the number of expanded predictors (see ExpandedPredictorNames). The vector contains 1 values for dummy variables corresponding to expanded categorical predictors. If you set Standardize to 0 or false when you train the neural network model, then the Sigma value is an empty vector ([]). Data Types: double X — Unstandardized predictors numeric matrix | table This property is read-only.
Unstandardized predictors used to train the neural network model, returned as a numeric matrix or table. X retains its original orientation, with observations in rows or columns depending on the value of the ObservationsIn name-value argument in the call to fitcnet. Data Types: single | double | table Response Properties ClassNames — Unique class names numeric vector | categorical vector | logical vector | character array | cell array of character vectors This property is read-only. Unique class names used in training, returned as a numeric vector, categorical vector, logical vector, character array, or cell array of character vectors. ClassNames has the same data type as the class labels Y. (The software treats string arrays as cell arrays of character vectors.) ClassNames also determines the class order. Data Types: single | double | categorical | logical | char | cell ResponseName — Response variable name character vector This property is read-only. Response variable name, returned as a character vector. Data Types: char Y — Class labels numeric vector | categorical vector | logical vector | character array | cell array of character vectors This property is read-only. Class labels used to train the model, returned as a numeric vector, categorical vector, logical vector, character array, or cell array of character vectors. Y has the same data type as the response variable used to train the model. (The software treats string arrays as cell arrays of character vectors.) Each row of Y represents the classification of the corresponding observation in X. Data Types: single | double | categorical | logical | char | cell Other Data Properties HyperparameterOptimizationResults — Cross-validation optimization of hyperparameters BayesianOptimization object | table This property is read-only. Cross-validation optimization of hyperparameters, specified as a BayesianOptimization object or a table of hyperparameters and associated values. This property is nonempty if the 'OptimizeHyperparameters' name-value pair argument is nonempty when you create the model. The value of HyperparameterOptimizationResults depends on the setting of the Optimizer field in the HyperparameterOptimizationOptions structure when you create the model.
Value of Optimizer Field          Value of HyperparameterOptimizationResults
'bayesopt' (default)              Object of class BayesianOptimization
'gridsearch' or 'randomsearch'    Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)
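For example, a sketch with hypothetical training data X and Y that populates this property by optimizing hyperparameters during training:

Mdl = fitcnet(X,Y,"OptimizeHyperparameters","auto");
results = Mdl.HyperparameterOptimizationResults;   % BayesianOptimization object when Optimizer is 'bayesopt'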
NumObservations — Number of observations positive numeric scalar This property is read-only. Number of observations in the training data stored in X and Y, returned as a positive numeric scalar. Data Types: double RowsUsed — Observations of original training data stored logical vector | [] This property is read-only. Observations of the original training data stored in the model, returned as a logical vector. This property is empty if all observations are stored in X and Y. Data Types: logical W — Observation weights numeric vector This property is read-only. Observation weights used to train the model, returned as an n-by-1 numeric vector. n is the number of observations (NumObservations). The software normalizes the observation weights specified in the Weights name-value argument so that the elements of W within a particular class sum up to the prior probability of that class. Data Types: single | double Other Classification Properties Cost — Misclassification cost numeric square matrix Misclassification cost, returned as a numeric square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i. The cost matrix always has this form: Cost(i,j) = 1 if i ~= j, and Cost(i,j) = 0 if i = j. The rows correspond to the true class and the columns correspond to the predicted class. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The software uses the Cost value for prediction, but not training. You can change the Cost property value of the trained model by using dot notation. Data Types: double 35-605
Prior — Prior class probabilities numeric vector This property is read-only. Prior class probabilities, returned as a numeric vector. The order of the elements of Prior corresponds to the elements of ClassNames. Data Types: double ScoreTransform — Score transformation character vector | function handle Score transformation, specified as a character vector or function handle. ScoreTransform represents a built-in transformation function or a function handle for transforming predicted classification scores. To change the score transformation function to function, for example, use dot notation. • For a built-in function, enter a character vector. Mdl.ScoreTransform = 'function';
This table describes the available built-in functions.

Value                   Description
'doublelogit'           1/(1 + e^(–2x))
'invlogit'              log(x / (1 – x))
'ismax'                 Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
'logit'                 1/(1 + e^(–x))
'none' or 'identity'    x (no transformation)
'sign'                  –1 for x < 0, 0 for x = 0, 1 for x > 0
'symmetric'             2x – 1
'symmetricismax'        Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
'symmetriclogit'        2/(1 + e^(–x)) – 1
• For a MATLAB function or a function that you define, enter its function handle. Mdl.ScoreTransform = @function;
function must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores). Data Types: char | function_handle
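For instance, a sketch that assigns an anonymous function handle; the clipping rule here is only an illustration, not a recommended transformation:

Mdl.ScoreTransform = @(s) max(min(s,1),0);   % clip every predicted score to the interval [0,1]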
Object Functions

Create CompactClassificationNeuralNetwork
compact                  Reduce size of machine learning model

Create ClassificationPartitionedModel
crossval                 Cross-validate machine learning model

Interpret Prediction
lime                     Local interpretable model-agnostic explanations (LIME)
partialDependence        Compute partial dependence
plotPartialDependence    Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
shapley                  Shapley values

Assess Predictive Performance on New Observations
edge                     Classification edge for neural network classifier
loss                     Classification loss for neural network classifier
margin                   Classification margins for neural network classifier
predict                  Classify observations using neural network classifier

Assess Predictive Performance on Training Data
resubEdge                Resubstitution classification edge
resubLoss                Resubstitution classification loss
resubMargin              Resubstitution classification margin
resubPredict             Classify training data using trained classifier

Compare Accuracies
compareHoldout           Compare accuracies of two classification models using new data
testckfold               Compare accuracies of two classification models by repeated cross-validation
Examples Train Neural Network Classifier Train a neural network classifier, and assess the performance of the classifier on a test set. Read the sample file CreditRating_Historical.dat into a table. The predictor data consists of financial ratios and industry sector information for a list of corporate customers. The response variable consists of credit ratings assigned by a rating agency. Preview the first few rows of the data set.

creditrating = readtable("CreditRating_Historical.dat");
head(creditrating)

     ID      WC_TA     RE_TA     EBIT_TA    MVE_BVTD    S_TA     Industry    Rating 
    _____    ______    ______    _______    ________    _____    ________    _______
    62394     0.013     0.104     0.036       0.447     0.142        3       {'BB' }
    48608     0.232     0.335     0.062       1.969     0.281        8       {'A'  }
    42444     0.311     0.367     0.074       1.935     0.366        1       {'A'  }
    48631     0.194     0.263     0.062       1.017     0.228        4       {'BBB'}
    43768     0.121     0.413     0.057       3.647     0.466       12       {'AAA'}
    39255    -0.117    -0.799     0.01        0.179     0.082        4       {'CCC'}
    62236     0.087     0.158     0.049       0.816     0.324        2       {'BBB'}
    39354     0.005     0.181     0.034       2.597     0.388        7       {'AA' }
Because each value in the ID variable is a unique customer ID, that is, length(unique(creditrating.ID)) is equal to the number of observations in creditrating, the ID variable is a poor predictor. Remove the ID variable from the table, and convert the Industry variable to a categorical variable. creditrating = removevars(creditrating,"ID"); creditrating.Industry = categorical(creditrating.Industry);
Convert the Rating response variable to an ordinal categorical variable. creditrating.Rating = categorical(creditrating.Rating, ... ["AAA","AA","A","BBB","BB","B","CCC"],"Ordinal",true);
Partition the data into training and test sets. Use approximately 80% of the observations to train a neural network model, and 20% of the observations to test the performance of the trained model on new data. Use cvpartition to partition the data. rng("default") % For reproducibility of the partition c = cvpartition(creditrating.Rating,"Holdout",0.20); trainingIndices = training(c); % Indices for the training set testIndices = test(c); % Indices for the test set creditTrain = creditrating(trainingIndices,:); creditTest = creditrating(testIndices,:);
Train a neural network classifier by passing the training data creditTrain to the fitcnet function.

Mdl = fitcnet(creditTrain,"Rating")

Mdl = 
  ClassificationNeuralNetwork
           PredictorNames: {'WC_TA'  'RE_TA'  'EBIT_TA'  'MVE_BVTD'  'S_TA'  'Industry'}
             ResponseName: 'Rating'
    CategoricalPredictors: 6
               ClassNames: [AAA    AA    A    BBB    BB    B    CCC]
           ScoreTransform: 'none'
          NumObservations: 3146
               LayerSizes: 10
              Activations: 'relu'
    OutputLayerActivation: 'softmax'
                   Solver: 'LBFGS'
          ConvergenceInfo: [1x1 struct]
          TrainingHistory: [1000x7 table]
Mdl is a trained ClassificationNeuralNetwork classifier. You can use dot notation to access the properties of Mdl. For example, you can specify Mdl.TrainingHistory to get more information about the training history of the neural network model. 35-608
Evaluate the performance of the classifier on the test set by computing the test set classification error. Visualize the results by using a confusion matrix. testAccuracy = 1 - loss(Mdl,creditTest,"Rating", ... "LossFun","classiferror") testAccuracy = 0.8028 confusionchart(creditTest.Rating,predict(Mdl,creditTest))
Specify Neural Network Classifier Architecture Specify the structure of a neural network classifier, including the size of the fully connected layers. Load the ionosphere data set, which includes radar signal data. X contains the predictor data, and Y is the response variable, whose values represent either good ("g") or bad ("b") radar signals. load ionosphere
Separate the data into training data (XTrain and YTrain) and test data (XTest and YTest) by using a stratified holdout partition. Reserve approximately 30% of the observations for testing, and use the rest of the observations for training. rng("default") % For reproducibility of the partition cvp = cvpartition(Y,"Holdout",0.3);
XTrain = X(training(cvp),:); YTrain = Y(training(cvp)); XTest = X(test(cvp),:); YTest = Y(test(cvp));
Train a neural network classifier. Specify to have 35 outputs in the first fully connected layer and 20 outputs in the second fully connected layer. By default, both layers use a rectified linear unit (ReLU) activation function. You can change the activation functions for the fully connected layers by using the Activations name-value argument. Mdl = fitcnet(XTrain,YTrain, ... "LayerSizes",[35 20]) Mdl = ClassificationNeuralNetwork ResponseName: 'Y' CategoricalPredictors: [] ClassNames: {'b' 'g'} ScoreTransform: 'none' NumObservations: 246 LayerSizes: [35 20] Activations: 'relu' OutputLayerActivation: 'softmax' Solver: 'LBFGS' ConvergenceInfo: [1x1 struct] TrainingHistory: [47x7 table]
Access the weights and biases for the fully connected layers of the trained classifier by using the LayerWeights and LayerBiases properties of Mdl. The first two elements of each property correspond to the values for the first two fully connected layers, and the third element corresponds to the values for the final fully connected layer with a softmax activation function for classification. For example, display the weights and biases for the second fully connected layer.

Mdl.LayerWeights{2}

ans = 20×35

    0.0481    0.2501   -0.1535   -0.0934    0.0760   -0.0579   -0.2465    1.0411    0.3712   ...
   -0.9489   -1.8343    0.5510   -0.5751   -0.8726    0.8815    0.0203   -1.6379    2.0315   ...
   -0.1910    0.0246   -0.3511    0.0097    0.3160   -0.0693    0.2270   -0.0783   -0.1626   ...
   -0.0415   -0.0059   -0.0753   -0.1477   -0.1621   -0.1762    0.2164    0.1710   -0.0610   ...
    1.1848    1.6142   -0.1352    0.5774    0.5491    0.0103    0.0209    0.7219   -0.8643   ...
    0.2486   -0.2920   -0.0004    0.2806    0.2987   -0.2709    0.1473   -0.2580   -0.0499   ...
   -0.0516    0.0640    0.1824   -0.0675   -0.2065   -0.0052   -0.1682   -0.1520    0.0060   ...
   -0.6192   -0.7804   -0.0506   -0.4205   -0.2584   -0.2020   -0.0008    0.0534    1.0185   ...
    0.5049   -0.1362   -0.2218    0.1637   -0.1282   -0.1008    0.1445    0.4527   -0.4887   ...
    1.1109   -0.0466    0.4044    0.6366    0.1863    0.5660    0.2839    0.8793   -0.5497   ...
      ⋮

Mdl.LayerBiases{2}

ans = 20×1

    0.6147
    0.1891
   -0.2767
   -0.2977
    1.3655
    0.0347
    0.1509
   -0.4839
   -0.3960
    0.9248
      ⋮
The final fully connected layer has two outputs, one for each class in the response variable. The number of layer outputs corresponds to the first dimension of the layer weights and layer biases.

size(Mdl.LayerWeights{end})

ans = 1×2

     2    20

size(Mdl.LayerBiases{end})

ans = 1×2

     2     1
To estimate the performance of the trained classifier, compute the test set classification error for Mdl. testError = loss(Mdl,XTest,YTest, ... "LossFun","classiferror") testError = 0.0774 accuracy = 1 - testError accuracy = 0.9226
Mdl accurately classifies approximately 92% of the observations in the test set.
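As a rough illustration of how the layer weights, layer biases, and softmax output layer combine, the following sketch recomputes the class scores for one test observation. It assumes ReLU activations in every hidden layer and no predictor standardization (empty Mu and Sigma), so it is illustrative only and not a substitute for predict.

x = XTest(1,:)';                                       % one observation as a column vector
a = x;
for k = 1:numel(Mdl.LayerWeights)-1                    % hidden fully connected layers with ReLU
    a = max(Mdl.LayerWeights{k}*a + Mdl.LayerBiases{k},0);
end
z = Mdl.LayerWeights{end}*a + Mdl.LayerBiases{end};    % final fully connected layer
scores = exp(z)./sum(exp(z))                           % softmax gives the posterior probabilities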
Version History Introduced in R2021a R2023b: Model stores observations with missing predictor values Behavior changed in R2023b Starting in R2023b, training observations with missing predictor values are included in the X, Y, and W data properties. The RowsUsed property indicates the training observations stored in the model, rather than those used for training. Observations with missing predictor values continue to be omitted from the model training process. In previous releases, the software omitted training observations that contained missing predictor values from the data properties of the model. 35-611
R2023b: Neural network models include standardization properties Neural network models include Mu and Sigma properties that contain the means and standard deviations, respectively, used to standardize the predictors before training. The properties are empty when the fitting function does not perform any standardization. R2023a: Neural network classifiers support misclassification costs and prior probabilities fitcnet supports misclassification costs and prior probabilities for neural network classifiers. Specify the Cost and Prior name-value arguments when you create a model. Alternatively, you can specify misclassification costs after training a model by using dot notation to change the Cost property value of the model. Mdl.Cost = [0 2; 1 0];
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: • The predict object function supports code generation. For more information, see “Introduction to Code Generation” on page 34-3.
See Also fitcnet | predict | loss | margin | edge | ClassificationPartitionedModel | CompactClassificationNeuralNetwork Topics “Assess Neural Network Classifier Performance” on page 19-151
ClassificationNeuralNetwork Predict Classify observations using neural network classification model Libraries: Statistics and Machine Learning Toolbox / Classification
Description The ClassificationNeuralNetwork Predict block classifies observations using a neural network classification object (ClassificationNeuralNetwork or CompactClassificationNeuralNetwork) for multiclass classification. Import a trained classification object into the block by specifying the name of a workspace variable that contains the object. The input port x receives an observation (predictor data), and the output port label returns a predicted class label for the observation. You can add the optional output port score, which returns predicted class scores or posterior probabilities.
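For example, a sketch with hypothetical training data X and Y that creates a workspace variable to import into the block:

nnetMdl = fitcnet(X,Y);   % train the classifier
% In the block dialog, set Select trained machine learning model to nnetMdl.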
Ports Input x — Predictor data row vector | column vector Predictor data, specified as a row or column vector of one observation. The variables in x must have the same order as the predictor variables that trained the model specified by Select trained machine learning model. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point Output label — Predicted class label scalar Predicted class label, returned as a scalar. The predicted class is the class that minimizes the expected classification cost. For more details, see the “More About” on page 35-6398 section of the predict function reference page. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point | enumerated score — Predicted class scores or posterior probabilities row vector 35-613
Predicted class scores or posterior probabilities, returned as a row vector of size 1-by-k, where k is the number of classes in the neural network model. The classification score Score(i) represents the posterior probability that the observation in x belongs to class i. To check the order of the classes, use the ClassNames property of the neural network model specified by Select trained machine learning model. Dependencies
To enable this port, select the check box for Add output port for predicted class scores on the Main tab of the Block Parameters dialog box. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point
Parameters Main Select trained machine learning model — Neural network classification model nnetMdl (default) | ClassificationNeuralNetwork object | CompactClassificationNeuralNetwork object Specify the name of a workspace variable that contains a ClassificationNeuralNetwork object or CompactClassificationNeuralNetwork object. When you train the model by using fitcnet, the following restrictions apply: • The predictor data cannot include categorical predictors (logical, categorical, char, string, or cell). If you supply training data in a table, the predictors must be numeric (double or single). Also, you cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using dummyvar before fitting the model. • The value of the ScoreTransform name-value argument cannot be 'invlogit' or an anonymous function. Programmatic Use
Block Parameter: TrainedLearner Type: workspace variable Values: ClassificationNeuralNetwork object | CompactClassificationNeuralNetwork object Default: 'nnetMdl' Add output port for predicted class scores — Add second output port for predicted class scores off (default) | on Select the check box to include the second output port score in the ClassificationNeuralNetwork Predict block. Programmatic Use
Block Parameter: ShowOutputScore Type: character vector
Values: 'off' | 'on' Default: 'off' Data Types Fixed-Point Operational Parameters
Integer rounding mode — Rounding mode for fixed-point operations Floor (default) | Ceiling | Convergent | Nearest | Round | Simplest | Zero Specify the rounding mode for fixed-point operations. For more information, see “Rounding” (FixedPoint Designer). Block parameters always round to the nearest representable value. To control the rounding of a block parameter, enter an expression into the mask field using a MATLAB rounding function. Programmatic Use
Block Parameter: RndMeth Type: character vector Values: "Ceiling" | "Convergent" | "Floor" | "Nearest" | "Round" | "Simplest" | "Zero" Default: "Floor" Saturate on integer overflow — Method of overflow action off (default) | on Specify whether overflows saturate or wrap.

Action: Select this check box (on).
Rationale: Your model has possible overflow, and you want explicit saturation protection in the generated code.
Impact on Overflows: Overflows saturate to either the minimum or maximum value that the data type can represent.
Example: The maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box selected, the block output saturates at 127. Similarly, the block output saturates at a minimum output value of –128.

Action: Clear this check box (off).
Rationale: You want to optimize the efficiency of your generated code, or you want to avoid overspecifying how a block handles out-of-range signals. For more information, see “Troubleshoot Signal Range Errors” (Simulink).
Impact on Overflows: Overflows wrap to the appropriate value that the data type can represent.
Example: The maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box cleared, the software interprets the value causing the overflow as int8, which can produce an unintended result. For example, a block result of 130 (binary 1000 0010) expressed as int8 is –126.
Programmatic Use
Block Parameter: SaturateOnIntegerOverflow Type: character vector Values: "off" | "on" Default: "off" Lock output data type setting against changes by the fixed-point tools — Prevention of fixedpoint tools from overriding data type off (default) | on Select this parameter to prevent the fixed-point tools from overriding the data type you specify for the block. For more information, see “Use Lock Output Data Type Setting” (Fixed-Point Designer). Programmatic Use
Block Parameter: LockScale Type: character vector Values: "off" | "on" Default: "off" Data Type
Label data type — Data type of label output Inherit: Inherit via back propagation | Inherit: auto | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Enum: | Specify the data type for the label output. The type can be inherited, specified as an enumerated data type, or expressed as a data type object such as Simulink.NumericType. The supported data types depend on the labels used in the model specified by Select trained machine learning model. • If the model uses numeric or logical labels, the supported data types are Inherit: Inherit via back propagation (default), double, single, half, int8, uint8, int16, uint16, int32, uint32, int64, uint64, boolean, fixed point, and a data type object. 35-616
• If the model uses nonnumeric labels, the supported data types are Inherit: auto (default), Enum: , and a data type object. When you select an inherited option, the software behaves as follows: • Inherit: Inherit via back propagation (default for numeric and logical labels) — Simulink automatically determines the Label data type of the block during data type propagation (see “Data Type Propagation” (Simulink)). In this case, the block uses the data type of a downstream block or signal object. • Inherit: auto (default for nonnumeric labels) — The block uses an autodefined enumerated data type variable. For example, suppose the workspace variable name specified by Select trained machine learning model is myMdl, and the class labels are class 1 and class 2. Then, the corresponding label values are myMdl_enumLabels.class_1 and myMdl_enumLabels.class_2. The block converts the class labels to valid MATLAB identifiers by using the matlab.lang.makeValidName function. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: LabelDataTypeStr Type: character vector Values: "Inherit: Inherit via back propagation" | "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "Enum: " | "" Default: "Inherit: Inherit via back propagation" (for numeric and logical labels) | "Inherit: auto" (for nonnumeric labels) Label data type Minimum — Minimum value of label output for range checking [] (default) | scalar Specify the lower value of the label output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Label data type Minimum parameter does not saturate or clip the actual label output signal. To do so, use the Saturation block instead. 35-617
Dependencies
You can specify this parameter only if the model specified by Select trained machine learning model uses numeric labels. Programmatic Use
Block Parameter: LabelOutMin Type: character vector Values: "[]" | scalar Default: "[]" Label data type Maximum — Maximum value of label output for range checking [] (default) | scalar Specify the upper value of the label output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Label data type Maximum parameter does not saturate or clip the actual label output signal. To do so, use the Saturation block instead. Dependencies
You can specify this parameter only if the model specified by Select trained machine learning model uses numeric labels. Programmatic Use
Block Parameter: LabelOutMax Type: character vector Values: "[]" | scalar Default: "[]" Score data type — Data type of score output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the score output. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: auto, the block uses a rule that inherits a data type. 35-618
For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: ScoreDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "Inherit: auto" Score data type Minimum — Minimum value of score output for range checking [] (default) | scalar Specify the lower value of the score output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Score data type Minimum parameter does not saturate or clip the actual score output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: ScoreOutMin Type: character vector Values: "[]" | scalar Default: "[]" Score data type Maximum — Maximum value of score output for range checking [] (default) | scalar Specify the upper value of the score output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). 35-619
• Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Score data type Maximum parameter does not saturate or clip the actual score output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: ScoreOutMax Type: character vector Values: "[]" | scalar Default: "[]" Raw score data type — Untransformed score data type Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the internal untransformed scores. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: auto, the block uses a rule that inherits a data type. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Dependencies
You can specify this parameter only if the model specified by Select trained machine learning model uses a score transformation other than "none" (default, same as "identity"). • If the model uses no score transformations ("none" or "identity"), then you can specify the score data type by using Score data type. • If the model uses a score transformation other than "none" or "identity", then you can specify the data type of untransformed raw scores by using this parameter. To specify the data type of transformed scores, use Score data type. You can change the score transformation option by specifying the ScoreTransform name-value argument during training, or by modifying the ScoreTransform property after training. Programmatic Use
Block Parameter: RawScoreDataTypeStr Type: character vector
Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "Inherit: auto" Raw score data type Minimum — Minimum untransformed score for range checking [] (default) | scalar Specify the lower value of the untransformed score range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Raw score data type Minimum parameter does not saturate or clip the actual untransformed score signal. Programmatic Use
Block Parameter: RawScoreOutMin Type: character vector Values: "[]" | scalar Default: "[]" Raw score data type Maximum — Maximum untransformed score for range checking [] (default) | scalar Specify the upper value of the untransformed score range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Raw score data type Maximum parameter does not saturate or clip the actual untransformed score signal. 35-621
Programmatic Use
Block Parameter: RawScoreOutMax Type: character vector Values: "[]" | scalar Default: "[]" Output layer data type — Data type of final fully connected layer Inherit: Inherit via internal rule (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the output layer. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: Inherit via internal rule, the block uses an internal rule to determine the output data type. The internal rule chooses a data type that optimizes numerical accuracy, performance, and generated code size, while taking into account the properties of the embedded target hardware. The software cannot always optimize efficiency and numerical accuracy at the same time. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: OutputLayerDataTypeStr Type: character vector Values: 'Inherit: Inherit via internal rule' | 'double' | 'single' | 'half' | 'int8' | 'uint8' | 'int16' | 'uint16' | 'int32' | 'uint32' | 'int64' | 'uint64' | 'boolean' | 'fixdt(1,16,0)' | 'fixdt(1,16,2^0,0)' | '' Default: 'Inherit: Inherit via internal rule' Output layer data type Minimum — Minimum value for final fully connected layer [] (default) | scalar Specify the lower value of the output layer's internal variable range checked by Simulink. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). 35-622
Note The Output layer data type Minimum parameter does not saturate or clip the output layer value signal. Programmatic Use
Block Parameter: OutputLayerOutMin Type: character vector Values: '[]' | scalar Default: '[]' Output layer data type Maximum — Maximum value for final fully connected layer [] (default) | scalar Specify the upper value of the output layer's internal variable range checked by Simulink. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Output layer data type Maximum parameter does not saturate or clip the output layer value signal. Programmatic Use
Block Parameter: OutputLayerOutMax Type: character vector Values: '[]' | scalar Default: '[]' Layer 1 data type — Data type of first fully connected layer Inherit: Inherit via internal rule (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the first layer. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: Inherit via internal rule, the block uses an internal rule to determine the data type. The internal rule chooses a data type that optimizes numerical accuracy, performance, and generated code size, while taking into account the properties of the embedded target hardware. The software cannot always optimize efficiency and numerical accuracy at the same time. For more information about data types, see “Control Data Types of Signals” (Simulink). 35-623
Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Tips
A trained neural network can have more than one fully connected layer, excluding the output layer. • You can specify the data type for each individual layer for the first 10 layers. Specify the data type Layer n data type for each layer. The data type of the first layer is Layer 1 data type, the data type of the second layer is Layer 2 data type, and so on. • You can specify the data type for layers 11 to k, where k is the total number of layers, by using the data type Additional layer(s) data type. The Block Parameter for Additional layer(s) data type is Layer11DataTypeStr. • The data types Layer n data type and Additional layer(s) data type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. These data types support the same values as Layer 1 data type. Programmatic Use
Block Parameter: Layer1DataTypeStr Type: character vector Values: 'Inherit: Inherit via internal rule' | 'double' | 'single' | 'half' | 'int8' | 'uint8' | 'int16' | 'uint16' | 'int32' | 'uint32' | 'int64' | 'uint64' | 'boolean' | 'fixdt(1,16,0)' | 'fixdt(1,16,2^0,0)' | '' Default: 'Inherit: Inherit via internal rule' Layer 1 data type Minimum — Minimum value for first fully connected layer [] (default) | scalar Specify the lower value of the first layer's internal variable range checked by Simulink. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Layer 1 data type Minimum parameter does not saturate or clip the first layer value signal. Tips
A trained neural network can have more than one fully connected layer, excluding the output layer.
• You can specify the lower value of each individual layer's internal variable range checked by Simulink for the first 10 layers. Specify the lower value Layer n minimum for each layer. The minimum value of the first layer is Layer 1 minimum, the minimum value of the second layer is Layer 2 minimum, and so on. • You can specify the lower value for layers 11 to k, where k is the total number of layers, by using Additional layer(s) minimum. The Block Parameter for Additional layer(s) minimum is Layer11OutMin. • Layer n minimum and Additional layer(s) minimum support the same values as Layer 1 minimum. Programmatic Use
Block Parameter: Layer1OutMin Type: character vector Values: '[]' | scalar Default: '[]' Layer 1 data type Maximum — Maximum value for first fully connected layer [] (default) | scalar Specify the upper value of the first layer's internal variable range checked by Simulink. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Layer 1 data type Maximum parameter does not saturate or clip the first layer value signal. Tips
A trained neural network can have more than one fully connected layer, excluding the output layer. • You can specify the upper value of each individual layer's internal variable range checked by Simulink for the first 10 layers. Specify the upper value Layer n maximum for each layer. The maximum value of the first layer is Layer 1 maximum, the maximum value of the second layer is Layer 2 maximum, and so on. • You can specify the upper value for layers 11 to k, where k is the total number of layers, by using Additional layer(s) maximum. The Block Parameter for Additional layer(s) maximum is Layer11OutMax. • Layer n maximum and Additional layer(s) maximum support the same values as Layer 1 maximum. 35-625
Programmatic Use
Block Parameter: Layer1OutMax Type: character vector Values: '[]' | scalar Default: '[]'
Block Characteristics
Data Types: Boolean | double | enumerated | fixed point | half | integer | single
Direct Feedthrough: yes
Multidimensional Signals: no
Variable-Size Signals: no
Zero-Crossing Detection: no
Alternative Functionality You can use a MATLAB Function block with the predict object function of a neural network classification object (ClassificationNeuralNetwork or CompactClassificationNeuralNetwork). For an example, see “Predict Class Labels Using MATLAB Function Block” on page 34-49. When deciding whether to use the ClassificationNeuralNetwork Predict block in the Statistics and Machine Learning Toolbox library or a MATLAB Function block with the predict function, consider the following: • If you use the Statistics and Machine Learning Toolbox library block, you can use the Fixed-Point Tool to convert a floating-point model to fixed point. • Support for variable-size arrays must be enabled for a MATLAB Function block with the predict function. • If you use a MATLAB Function block, you can use MATLAB functions for preprocessing or postprocessing before or after predictions in the same MATLAB Function block.
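The following MATLAB sketch illustrates this alternative; it is not taken from the shipped example, and the function name, model file name, and saveLearnerForCoder step are assumptions. The MATLAB Function block body loads a compact classifier saved beforehand and calls predict on each chunk of observations.
function label = predictClass(x) %#codegen
% Hypothetical MATLAB Function block body. Assumes the trained classifier was
% saved earlier with saveLearnerForCoder(compactMdl,"myNeuralNetMdl").
persistent mdl
if isempty(mdl)
    mdl = loadLearnerForCoder("myNeuralNetMdl");  % load the compact model once
end
label = predict(mdl,x);                           % predicted class labels for the chunk
end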
Version History Introduced in R2021b
Extended Capabilities C/C++ Code Generation Generate C and C++ code using Simulink® Coder™. Fixed-Point Conversion Design and simulate fixed-point systems using Fixed-Point Designer™.
See Also Blocks ClassificationSVM Predict | ClassificationEnsemble Predict | ClassificationTree Predict | RegressionNeuralNetwork Predict Objects ClassificationNeuralNetwork | CompactClassificationNeuralNetwork Functions predict | fitcnet Topics “Predict Class Labels Using ClassificationSVM Predict Block” on page 34-121 “Predict Class Labels Using ClassificationEnsemble Predict Block” on page 34-140 “Predict Class Labels Using ClassificationTree Predict Block” on page 34-131 “Predict Class Labels Using MATLAB Function Block” on page 34-49
IncrementalClassificationLinear Predict Classify observations using incremental linear classification model Libraries: Statistics and Machine Learning Toolbox / Incremental Learning / Classification / Linear
Description The IncrementalClassificationLinear Predict block classifies observations using a trained linear classification model returned as the output of an IncrementalClassificationLinear Fit block. Import an initial linear classification model object into the block by specifying the name of a workspace variable that contains the object. The input port mdl receives a bus signal that represents an incremental learning model fit to streaming data. The input port x receives a chunk of predictor data (observations), and the output port label returns predicted class labels for the chunk. The optional output port score returns predicted class scores or posterior probabilities.
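As a minimal sketch of the workspace setup that the block expects (the class names and predictor count are illustrative assumptions; the solver restriction is listed under Parameters), you might configure the initial model before simulation as follows:
% Configure an initial incremental linear classification model in the workspace.
linearMdl = incrementalClassificationLinear( ...
    ClassNames=[0 1], ...          % binary class names known in advance
    NumPredictors=4, ...           % must equal the number of predictors in x
    Solver="scale-invariant");     % solver required by the block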
Ports Input mdl — Incremental learning model bus signal Incremental learning model (incrementalClassificationLinear) fit to streaming data, specified as a bus signal (see Composite Signals). x — Chunk of predictor data numeric matrix Chunk of predictor data, specified as a numeric matrix. The orientation of the variables and observations is specified by Predictor data observation dimension. The default orientation is rows, which indicates that observations in the predictor data are oriented along the rows of x. Note The block supports only numeric input predictor data. If your input data includes categorical data, you must prepare an encoded version of the categorical data. Use dummyvar to convert each categorical variable to a numeric matrix of dummy variables. Then, concatenate all dummy variable matrices and any other numeric predictors. For more details, see “Dummy Variables” on page 2-13. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point Output label — Chunk of predicted class labels column vector 35-628
Chunk of predicted class labels, returned as a column vector. The label label(i) represents the class yielding the highest score for the observation x(i). For more details, see the Label argument of the predict object function. Note If you specify an estimation period when you create mdl, then during the estimation period the predicted class labels are the majority class, which makes up the largest proportion of the training labels in x. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point | enumerated score — Predicted class scores or posterior probabilities matrix Predicted class scores or posterior probabilities, returned as a matrix. If the model was trained using a logistic learner, the classification scores are posterior probabilities. The classification score score(i) represents the posterior probability that the observation in x belongs to class i. For more details, see “Classification Score” on page 35-638. To check the order of the classes, use the ClassNames property of the linear classification model specified by Select initial machine learning model. Note If you specify an estimation period when you create mdl, then the predicted class scores are zero during the estimation period Dependencies
To enable this port, select the check box for Add output port for predicted class scores on the Main tab of the Block Parameters dialog box. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point
Parameters Main Select initial machine learning model — Initial incremental linear classification model linearMdl (default) | incrementalClassificationLinear model object Specify the name of a workspace variable that contains the configured incrementalClassificationLinear model object. The following restrictions apply: • The predictor data cannot include categorical predictors (logical, categorical, char, string, or cell). If you supply training data in a table, the predictors must be numeric (double or single). Also, you cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using dummyvar before fitting the model. 35-629
• The ScoreTransform property of the initial model cannot be "invlogit" or an anonymous function. • The NumPredictors property of the initial model must be a positive integer scalar, and must be equal to the number of predictors in x. • The Solver property of the initial model must be "scale-invariant". Programmatic Use
Block Parameter: InitialLearner Type: workspace variable Values: incrementalClassificationLinear model object Default: "linearMdl" Add output port for predicted class scores — Add second output port for predicted class scores off (default) | on Select the check box to include the output port score for predicted class scores in the IncrementalClassificationLinear Predict block. Programmatic Use
Block Parameter: ShowOutputScore Type: character vector Values: "off" | "on" Default: "off" Predictor data observation dimension — Observation dimension of predictor data rows (default) | columns Specify the observation dimension of the predictor data. The default value is rows, which indicates that observations in the predictor data are oriented along the rows of x. Programmatic Use
Block Parameter: ObservationsIn Type: character vector Values: "rows" | "columns" Default: "rows" Sample time (-1 for inherited) — Option to specify sample time -1 (default) | scalar Specify the discrete interval between sample time hits or specify another type of sample time, such as continuous (0) or inherited (-1). For more options, see “Types of Sample Time” (Simulink). By default, the IncrementalClassificationLinear Predict block inherits sample time based on the context of the block within the model. Programmatic Use
Block Parameter: SystemSampleTime Type: string scalar or character vector Values: scalar Default: "-1"
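The block parameters documented in the Programmatic Use sections can also be set from the MATLAB command line. A brief sketch, assuming gcb currently refers to an IncrementalClassificationLinear Predict block:
% Configure the Predict block programmatically using the documented parameter names.
set_param(gcb,"InitialLearner","linearMdl");   % workspace variable holding the initial model
set_param(gcb,"ShowOutputScore","on");         % add the score output port
set_param(gcb,"ObservationsIn","rows");        % observations oriented along the rows of x
set_param(gcb,"SystemSampleTime","-1");        % inherit the sample time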
Data Types Fixed-Point Operational Parameters
Integer rounding mode — Rounding mode for fixed-point operations Floor (default) | Ceiling | Convergent | Nearest | Round | Simplest | Zero Specify the rounding mode for fixed-point operations. For more information, see “Rounding” (Fixed-Point Designer). Block parameters always round to the nearest representable value. To control the rounding of a block parameter, enter an expression into the mask field using a MATLAB rounding function. Programmatic Use
Block Parameter: RndMeth Type: character vector Values: "Ceiling" | "Convergent" | "Floor" | "Nearest" | "Round" | "Simplest" | "Zero" Default: "Floor"
Saturate on integer overflow — Method of overflow action
off (default) | on
Specify whether overflows saturate or wrap.
• Action: Select this check box (on). Rationale: Your model has possible overflow, and you want explicit saturation protection in the generated code. Impact on overflows: Overflows saturate to either the minimum or maximum value that the data type can represent. Example: The maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box selected, the block output saturates at 127. Similarly, the block output saturates at a minimum output value of –128.
• Action: Clear this check box (off). Rationale: You want to optimize the efficiency of your generated code, or you want to avoid overspecifying how a block handles out-of-range signals. For more information, see “Troubleshoot Signal Range Errors” (Simulink). Impact on overflows: Overflows wrap to the appropriate value that the data type can represent. Example: The maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box cleared, the software interprets the value causing the overflow as int8, which can produce an unintended result. For example, a block result of 130 (binary 1000 0010) expressed as int8 is –126.
Programmatic Use
Block Parameter: SaturateOnIntegerOverflow Type: character vector Values: "off" | "on" Default: "off" Lock output data type setting against changes by the fixed-point tools — Prevention of fixedpoint tools from overriding data type off (default) | on Select this parameter to prevent the fixed-point tools from overriding the data type you specify for the block. For more information, see “Use Lock Output Data Type Setting” (Fixed-Point Designer). Programmatic Use
Block Parameter: LockScale Type: character vector Values: "off" | "on" Default: "off" Data Type
Label data type — Data type of label output Inherit: Inherit via back propagation | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Enum: | Specify the data type for the label output. The type can be inherited, specified as an enumerated data type, or expressed as a data type object such as Simulink.NumericType. The supported data types depend on the labels used in the model specified by Select initial machine learning model. • If the model uses numeric or logical labels, the supported data types are Inherit: Inherit via back propagation (default), double, single, half, int8, uint8, int16, uint16, int32, uint32, int64, uint64, boolean, fixed point, and a data type object. 35-632
• If the model uses nonnumeric labels, the supported data types are Enum: and a data type object. When you select an inherited option, the software behaves as follows: • Inherit: Inherit via back propagation (default for numeric and logical labels) — Simulink automatically determines the Label data type of the block during data type propagation (see “Data Type Propagation” (Simulink)). In this case, the block uses the data type of a downstream block or signal object. • Inherit: auto (default for nonnumeric labels) — The block uses an autodefined enumerated data type variable. For example, suppose the workspace variable name specified by Select initial machine learning model is myMdl, and the class labels are class 1 and class 2. Then, the corresponding label values are myMdl_enumLabels.class_1 and myMdl_enumLabels.class_2. The block converts the class labels to valid MATLAB identifiers by using the matlab.lang.makeValidName function. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: LabelDataTypeStr Type: character vector Values: "Inherit: Inherit via back propagation" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "Enum: " | "" Default: "Inherit: Inherit via back propagation" (for numeric and logical labels) | "Inherit: auto" (for nonnumeric labels) Label data type Minimum — Minimum value of label output for range checking [] (default) | scalar Specify the lower value of the label output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Label data type Minimum parameter does not saturate or clip the actual label output signal. To do so, use the Saturation block instead. 35-633
Dependencies
You can specify this parameter only if the model specified by Select initial machine learning model uses numeric labels. Programmatic Use
Block Parameter: LabelOutMin Type: character vector Values: "[]" | scalar Default: "[]" Label data type Maximum — Maximum value of label output for range checking [] (default) | scalar Specify the upper value of the label output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Label data type Maximum parameter does not saturate or clip the actual label output signal. To do so, use the Saturation block instead. Dependencies
You can specify this parameter only if the model specified by Select initial machine learning model uses numeric labels. Programmatic Use
Block Parameter: LabelOutMax Type: character vector Values: "[]" | scalar Default: "[]" Score data type — Data type of score output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the score output. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: auto, the block uses a rule that inherits a data type. 35-634
For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: ScoreDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "Inherit: auto" Score data type Minimum — Minimum value of score output for range checking [] (default) | scalar Specify the lower value of the score output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Score data type Minimum parameter does not saturate or clip the actual score output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: ScoreOutMin Type: character vector Values: "[]" | scalar Default: "[]" Score data type Maximum — Maximum value of score output for range checking [] (default) | scalar Specify the upper value of the score output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). 35-635
• Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Score data type Maximum parameter does not saturate or clip the actual score output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: ScoreOutMax Type: character vector Values: "[]" | scalar Default: "[]" Inner product data type — Inner product data type Inherit: Inherit via internal rule (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the inner product term of the predicted response on page 35-662. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: Inherit via internal rule, the block uses an internal rule to determine the inner product data type. The internal rule chooses a data type that optimizes numerical accuracy, performance, and generated code size, while taking into account the properties of the embedded target hardware. The software cannot always optimize efficiency and numerical accuracy at the same time. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: InnerProductDataTypeStr Type: character vector Values: "Inherit: Inherit via internal rule" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "double" Inner product data type Minimum — Minimum of inner product term for range checking [] (default) | scalar Specify the lower value of the inner product term range that Simulink checks. Simulink uses the minimum value to perform: 35-636
• Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Inner product data type Minimum parameter does not saturate or clip the actual inner product term value. Programmatic Use
Block Parameter: InnerProductOutMin Type: character vector Values: "[]" | scalar Default: "[]" Inner product data type Maximum — Maximum of inner product term for range checking [] (default) | scalar Specify the upper value of the inner product term range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Inner product data type Maximum parameter does not saturate or clip the actual inner product term value. Programmatic Use
Block Parameter: InnerProductOutMax Type: character vector Values: "[]" | scalar Default: "[]"
Block Characteristics
Data Types: Boolean | double | enumerated | fixed point | half | integer | single
Direct Feedthrough: yes
Multidimensional Signals: no
Variable-Size Signals: no
Zero-Crossing Detection: no
More About
Classification Score
For linear classification models, the raw classification score for classifying the observation x into the positive class is defined by
f(x) = xβ + b
where β is the estimated column vector of coefficients, and b is the estimated scalar bias. The linear classification model object specified by Select initial machine learning model contains the coefficients and bias in the Beta and Bias properties, respectively. The raw classification score for classifying x into the negative class is –f(x). The software classifies observations into the class that yields the positive score. If the linear classification model uses no score transformations, then the raw classification score is the same as the classification score. If the model consists of logistic regression learners, then the software applies the "logit" score transformation to the raw classification scores. You can specify the data types for the components required to compute classification scores using Score data type, Raw score data type, and Inner product data type.
• Score data type determines the data type of the classification score.
• Raw score data type determines the data type of the raw classification score f if the model uses a score transformation other than "none" or "identity".
• Inner product data type determines the data type of xβ.
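A minimal sketch of this computation for a single observation, using the Beta and Bias properties described above (the variable names x and mdl are assumptions):
f = x*mdl.Beta + mdl.Bias;     % raw score for the positive class; the negative-class score is -f
posterior = 1./(1 + exp(-f));  % "logit" transform applied when the model uses logistic regression learners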
Version History Introduced in R2023b
Extended Capabilities C/C++ Code Generation Generate C and C++ code using Simulink® Coder™. Fixed-Point Conversion Design and simulate fixed-point systems using Fixed-Point Designer™. 35-638
See Also Blocks IncrementalClassificationLinear Fit | Update Metrics Objects incrementalClassificationLinear Functions predict | fit Topics “Predict Class Labels Using ClassificationLinear Predict Block” on page 34-174
IncrementalClassificationLinear Fit Fit incremental linear binary classification model Libraries: Statistics and Machine Learning Toolbox / Incremental Learning / Classification / Linear
Description The IncrementalClassificationLinear Fit block fits a configured incremental model for linear binary classification (incrementalClassificationLinear) to streaming data. Import an initial linear classification model object into the block by specifying the name of a workspace variable that contains the object. The input port x receives a chunk of predictor data (observations), and the input port y receives a chunk of responses (labels) to which the model is fit. The output port mdl returns an updated incrementalClassificationLinear model. The optional input port w receives a chunk of observation weights.
Ports Input x — Chunk of predictor data numeric matrix Chunk of predictor data to which the model is fit, specified as a numeric matrix. The orientation of the variables and observations is specified by Predictor data observation dimension. The default orientation is rows, which indicates that the observations in the predictor data are oriented along the rows of x. The length of the observation responses y and the number of observations in x must be equal; y(j) is the response of observation j (row or column) in x. Note • The number of predictor variables in x must be equal to the NumPredictors property value of the initial model. If the number of predictor variables in the streaming data changes from NumPredictors, the block issues an error. • The IncrementalClassificationLinear Fit block supports only numeric input predictor data. If your input data includes categorical data, you must prepare an encoded version of the categorical data. Use dummyvar to convert each categorical variable to a numeric matrix of dummy variables. Then, concatenate all dummy variable matrices and any other numeric predictors. For more details, see “Dummy Variables” on page 2-13.
Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point
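A brief sketch of the dummy-variable preprocessing described in the note above; the table and variable names are hypothetical:
% Dummy-encode a categorical predictor and combine it with the numeric predictors.
colorDummies = dummyvar(categorical(tbl.Color));  % one column of 0s and 1s per category
Xchunk = [tbl.Weight tbl.Height colorDummies];    % numeric predictors followed by dummy variables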
y — Chunk of class labels numeric vector | logical vector | enumerated vector Chunk of class labels to which the model is trained, specified as a numeric, logical, or enumerated vector. • The IncrementalClassificationLinear Fit block supports binary classification only. • The length of the observation responses y and the number of observations in x must be equal; y (j) is the response of observation j (row or column) in x. • Each label must correspond to one row of the array. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point | enumerated w — Chunk of observation weights vector of positive values Chunk of observation weights, specified as a vector of positive values. The IncrementalClassificationLinear Fit block weights the observations in x with the corresponding values in w. The size of w must be equal to the number of observations in x. Dependencies
To enable this port, select the check box for Add input port for observation weights on the Main tab of the Block Parameters dialog box. Data Types: single | double Output mdl — Updated incremental learning model parameters bus signal Updated parameters of the incremental learning model fit to streaming data (including Beta and Bias), returned as a bus signal (see Composite Signals).
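At the MATLAB command line, one block time step corresponds to a single call to the fit object function. A minimal sketch, with assumed chunk variable names:
% One incremental update, mirroring what the block does per time step.
linearMdl = fit(linearMdl,Xchunk,Ychunk);  % update the model with one chunk of observations and labels
beta = linearMdl.Beta;                     % corresponds to the Beta element of the mdl bus signal
bias = linearMdl.Bias;                     % corresponds to the Bias element of the mdl bus signal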
Parameters Main Select initial machine learning model — Initial incremental linear classification model linearMdl (default) | incrementalClassificationLinear model object Specify the name of a workspace variable that contains the configured incrementalClassificationLinear model object. The following restrictions apply: • The predictor data cannot include categorical predictors (logical, categorical, char, string, or cell). If you supply training data in a table, the predictors must be numeric (double or single). Also, you cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using dummyvar before fitting the model. 35-641
• The ScoreTransform property of the initial model cannot be "invlogit" or an anonymous function. • The NumPredictors property of the initial model must be a positive integer scalar, and must be equal to the number of predictors in x. • The Solver property of the initial model must be "scale-invariant". Programmatic Use
Block Parameter: InitialLearner Type: workspace variable Values: incrementalClassificationLinear model object Default: "linearMdl" Add input port for observation weights — Add second input port for observation weights off (default) | on Select the check box to include the input port w for observation weights in the IncrementalClassificationLinear Fit block. Programmatic Use
Block Parameter: ShowInputWeights Type: character vector Values: "off" | "on" Default: "off" Predictor data observation dimension — Observation dimension of predictor data rows (default) | columns Specify the observation dimension of the predictor data. The default value is rows, which indicates that observations in the predictor data are oriented along the rows of x. Programmatic Use
Block Parameter: ObservationsIn Type: character vector Values: "rows" | "columns" Default: "rows" Sample time (-1 for inherited) — Option to specify sample time -1 (default) | scalar Specify the discrete interval between sample time hits or specify another type of sample time, such as continuous (0) or inherited (-1). For more options, see “Types of Sample Time” (Simulink). By default, the IncrementalClassificationLinear Fit block inherits sample time based on the context of the block within the model. Programmatic Use
Block Parameter: SystemSampleTime Type: string scalar or character vector Values: scalar Default: "-1"
Data Types Fixed-Point Operational Parameters
Integer rounding mode — Rounding mode for fixed-point operations Floor (default) | Ceiling | Convergent | Nearest | Round | Simplest | Zero Specify the rounding mode for fixed-point operations. For more information, see “Rounding” (Fixed-Point Designer). Block parameters always round to the nearest representable value. To control the rounding of a block parameter, enter an expression into the mask field using a MATLAB rounding function. Programmatic Use
Block Parameter: RndMeth Type: character vector Values: "Ceiling" | "Convergent" | "Floor" | "Nearest" | "Round" | "Simplest" | "Zero" Default: "Floor"
Saturate on integer overflow — Method of overflow action
off (default) | on
Specify whether overflows saturate or wrap.
• Action: Select this check box (on). Rationale: Your model has possible overflow, and you want explicit saturation protection in the generated code. Impact on overflows: Overflows saturate to either the minimum or maximum value that the data type can represent. Example: The maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box selected, the block output saturates at 127. Similarly, the block output saturates at a minimum output value of –128.
• Action: Clear this check box (off). Rationale: You want to optimize the efficiency of your generated code, or you want to avoid overspecifying how a block handles out-of-range signals. For more information, see “Troubleshoot Signal Range Errors” (Simulink). Impact on overflows: Overflows wrap to the appropriate value that the data type can represent. Example: The maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box cleared, the software interprets the value causing the overflow as int8, which can produce an unintended result. For example, a block result of 130 (binary 1000 0010) expressed as int8 is –126.
Programmatic Use
Block Parameter: SaturateOnIntegerOverflow Type: character vector Values: "off" | "on" Default: "off" Lock output data type setting against changes by the fixed-point tools — Prevention of fixedpoint tools from overriding data type off (default) | on Select this parameter to prevent the fixed-point tools from overriding the data type you specify for the block. For more information, see “Use Lock Output Data Type Setting” (Fixed-Point Designer). Programmatic Use
Block Parameter: LockScale Type: character vector Values: "off" | "on" Default: "off" Data Type
Beta data type — Data type of linear coefficient estimates output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Enum: | Specify the data type for the linear coefficient estimates (beta) output. The type can be inherited, specified as an enumerated data type, or expressed as a data type object such as Simulink.NumericType. For more information about data types, see “Control Data Types of Signals” (Simulink).
Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: BetaDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "Enum: " | "" Default: "Inherit: auto" Beta data type Minimum — Minimum value of beta for range checking [] (default) | scalar Specify the lower value of the beta output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Beta data type Minimum parameter does not saturate or clip the actual beta output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: BetaOutMin Type: character vector Values: "[]" | scalar Default: "[]" Beta data type Maximum — Maximum value of beta for range checking [] (default) | scalar Specify the upper value of the beta output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. 35-645
For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Beta data type Maximum parameter does not saturate or clip the actual beta output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: BetaOutMax Type: character vector Values: "[]" | scalar Default: "[]" Bias data type — Data type of intercept estimates output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Enum: | Specify the data type for the intercept estimates (bias) output. The type can be inherited, specified as an enumerated data type, or expressed as a data type object such as Simulink.NumericType. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: BiasDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "Enum: " | "" Default: "Inherit: auto" Bias data type Minimum — Minimum value of bias for range checking [] (default) | scalar Specify the lower value of the bias output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). 35-646
Note The Bias data type Minimum parameter does not saturate or clip the actual bias output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: BiasOutMin Type: character vector Values: "[]" | scalar Default: "[]" Bias data type Maximum — Maximum value of bias for range checking [] (default) | scalar Specify the upper value of the bias output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Bias data type Maximum parameter does not saturate or clip the actual bias output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: BiasOutMax Type: character vector Values: "[]" | scalar Default: "[]" Internal states data type — Data type of internal states output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Enum: | Specify the data type for the internal states output. The type can be inherited, specified as an enumerated data type, or expressed as a data type object such as Simulink.NumericType. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink).
Programmatic Use
Block Parameter: StatesDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "Enum: " | "" Default: "Inherit: auto" Internal states data type Minimum — Minimum value of internal states for range checking [] (default) | scalar Specify the lower value of the internal states output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Internal states data type Minimum parameter does not saturate or clip the actual internal states output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: StatesOutMin Type: character vector Values: "[]" | scalar Default: "[]" Internal states data type Maximum — Maximum value of internal states for range checking [] (default) | scalar Specify the upper value of the internal states output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Internal states data type Maximum parameter does not saturate or clip the actual internal states output. To do so, use the Saturation block instead. 35-648
Programmatic Use
Block Parameter: StatesOutMax Type: character vector Values: "[]" | scalar Default: "[]" Prior data type — Data type of prior output double (default) | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Enum: | Specify the data type for the prior output. The type can be inherited, specified as an enumerated data type, or expressed as a data type object such as Simulink.NumericType. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: PriorDataTypeStr Type: character vector Values: "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "Enum: " | "" Default: "Inherit: auto" Prior data type Minimum — Minimum value of prior for range checking [] (default) | scalar Specify the lower value of the prior output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Prior data type Minimum parameter does not saturate or clip the actual prior output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: PriorOutMin Type: character vector
Values: "[]" | scalar Default: "[]" Prior data type Maximum — Maximum value of prior for range checking [] (default) | scalar Specify the upper value of the prior output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Prior data type Maximum parameter does not saturate or clip the actual prior output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: PriorOutMax Type: character vector Values: "[]" | scalar Default: "[]" Mu data type — Data type of mu output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Enum: | Specify the data type for the mu (predictor means) output. The type can be inherited, specified as an enumerated data type, or expressed as a data type object such as Simulink.NumericType. If you do not specify Standardize="true" when you create the initial model mdl, then the IncrementalClassificationLinear Fit block sets mu to 0. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: MuDataTypeStr Type: character vector
Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "Enum: " | "" Default: "Inherit: auto" Mu data type Minimum — Minimum value of mu for range checking [] (default) | scalar Specify the lower value of the mu output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Mu data type Minimum parameter does not saturate or clip the actual mu output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: MuOutMin Type: character vector Values: "[]" | scalar Default: "[]" Mu data type Maximum — Maximum value of mu for range checking [] (default) | scalar Specify the upper value of the mu output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Mu data type Maximum parameter does not saturate or clip the actual mu output. To do so, use the Saturation block instead.
Programmatic Use
Block Parameter: MuOutMax Type: character vector Values: "[]" | scalar Default: "[]" Sigma data type — Data type of sigma output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Enum: | Specify the data type for the sigma (predictor standard deviations) output. The type can be inherited, specified as an enumerated data type, or expressed as a data type object such as Simulink.NumericType. If you do not specify Standardize=true when you create the initial model mdl, then the IncrementalClassificationLinear Fit block sets sigma to 0. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: SigmaDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "Enum: " | "" Default: "Inherit: auto" Sigma data type Minimum — Minimum value of sigma for range checking [] (default) | scalar Specify the lower value of the sigma output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Sigma data type Minimum parameter does not saturate or clip the actual sigma output. To do so, use the Saturation block instead. 35-652
Programmatic Use
Block Parameter: SigmaOutMin Type: character vector Values: "[]" | scalar Default: "[]" Sigma data type Maximum — Maximum value of sigma for range checking [] (default) | scalar Specify the upper value of the sigma output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Sigma data type Maximum parameter does not saturate or clip the actual sigma output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: SigmaOutMax Type: character vector Values: "[]" | scalar Default: "[]"
Block Characteristics
Data Types: Boolean | double | enumerated | fixed point | half | integer | single
Direct Feedthrough: yes
Multidimensional Signals: no
Variable-Size Signals: no
Zero-Crossing Detection: no
Version History Introduced in R2023b
Extended Capabilities C/C++ Code Generation Generate C and C++ code using Simulink® Coder™. Fixed-Point Conversion Design and simulate fixed-point systems using Fixed-Point Designer™.
See Also Blocks IncrementalClassificationLinear Predict | Update Metrics Objects incrementalClassificationLinear Functions fit | predict Topics “Predict Class Labels Using ClassificationLinear Predict Block” on page 34-174
IncrementalRegressionLinear Predict Predict responses using incremental linear regression model Libraries: Statistics and Machine Learning Toolbox / Incremental Learning / Regression / Linear
Description The IncrementalRegressionLinear Predict block predicts responses for streaming data using a trained linear regression model returned as the output of an IncrementalRegressionLinear Fit block. Import an initial linear regression model object into the block by specifying the name of a workspace variable that contains the object. The input port mdl receives a bus signal that represents an incremental learning model fit to streaming data. The input port x receives a chunk of predictor data (observations), and the output port yfit returns predicted responses for the chunk. The optional output port CanPredict returns the prediction status of the trained model.
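A minimal sketch of the workspace setup the block expects (the predictor count is an illustrative assumption; the solver restriction is listed under Parameters):
% Configure an initial incremental linear regression model in the workspace.
linearMdl = incrementalRegressionLinear( ...
    NumPredictors=4, ...           % must equal the number of predictors in x
    Solver="scale-invariant");     % solver required by the block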
Ports Input mdl — Incremental learning model bus signal Incremental learning model (incrementalRegressionLinear) fit to streaming data, specified as a bus signal (see Composite Signals). x — Chunk of predictor data numeric matrix Chunk of predictor data, specified as a numeric matrix. The orientation of the variables and observations is specified by Predictor data observation dimension. The default orientation is rows, which indicates that observations in the predictor data are oriented along the rows of x. Note The block supports only numerical input predictor data. If your input data includes categorical data, you must prepare an encoded version of the categorical data. Use dummyvar to convert each categorical variable to a numeric matrix of dummy variables. Then, concatenate all dummy variable matrices and any other numeric predictors. For more details, see “Dummy Variables” on page 2-13. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point Output yfit — Chunk of predicted responses floating-point vector 35-655
Chunk of predicted responses, returned as a floating-point vector. For more details, see Predicted response on page 35-662 and the YHat argument of the predict object function. Note If you specify an estimation period when you create mdl, then the predicted responses are zero during the estimation period. Data Types: single | double CanPredict — Model status logical Model status for prediction, returned as logical 0 (false) or 1 (true). Note If you specify an estimation period when you create mdl, then the model status is 0 (false) during the estimation period. Dependencies
To enable this port, select the check box for Add output port for status of trained machine learning model on the Main tab of the Block Parameters dialog box.
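For a linear regression model, the predicted response returned at yfit is the inner product of an observation with the model coefficients plus the bias. A brief sketch with assumed variable names, for one observation stored as a row vector:
yfit = x*mdl.Beta + mdl.Bias;   % predicted response computed from the Beta and Bias properties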
Parameters Main Select initial machine learning model — Initial incremental linear regression model linearMdl (default) | incrementalRegressionLinear model object Specify the name of a workspace variable that contains the configured incrementalRegressionLinear model object. The following restrictions apply: • The predictor data cannot include categorical predictors (logical, categorical, char, string, or cell). If you supply training data in a table, the predictors must be numeric (double or single). Also, you cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using dummyvar before fitting the model. • The NumPredictors property of the initial model must be a positive integer scalar, and must be equal to the number of predictors in x. • The Solver property of the initial model must be "scale-invariant". Programmatic Use
Block Parameter: InitialLearner Type: workspace variable Values: incrementalRegressionLinear model object Default: "linearMdl"
Add output port for status of trained machine learning model — Add second output port for model status off (default) | on Select the check box to include the output port CanPredict in the IncrementalRegressionLinear Predict block. This check box does not appear if the workspace already contained an incremental linear regression model named linearMdl capable of prediction when you created the IncrementalRegressionLinear Fit block. Alternatively, you can specify to include the output port CanPredict by selecting the IncrementalRegressionLinear Predict block in the Simulink workspace and entering set_param(gcb,ShowOutputCanPredict="on") at the MATLAB command line. Programmatic Use
Block Parameter: ShowOutputCanPredict Type: character vector Values: "off" | "on" Default: "off" Predictor data observation dimension — Observation dimension of predictor data rows (default) | columns Specify the observation dimension of the predictor data. The default value is rows, which indicates that observations in the predictor data are oriented along the rows of x. Programmatic Use
Block Parameter: ObservationsIn Type: character vector Values: "rows" | "columns" Default: "rows" Sample time (-1 for inherited) — Option to specify sample time -1 (default) | scalar Specify the discrete interval between sample time hits or specify another type of sample time, such as continuous (0) or inherited (-1). For more options, see “Types of Sample Time” (Simulink). By default, the IncrementalRegressionLinear Predict block inherits sample time based on the context of the block within the model. Programmatic Use
Block Parameter: SystemSampleTime Type: string scalar or character vector Values: scalar Default: "-1" Data Types Fixed-Point Operational Parameters
Integer rounding mode — Rounding mode for fixed-point operations Floor (default) | Ceiling | Convergent | Nearest | Round | Simplest | Zero
Specify the rounding mode for fixed-point operations. For more information, see “Rounding” (Fixed-Point Designer). Block parameters always round to the nearest representable value. To control the rounding of a block parameter, enter an expression into the mask field using a MATLAB rounding function. Programmatic Use
Block Parameter: RndMeth Type: character vector Values: "Ceiling" | "Convergent" | "Floor" | "Nearest" | "Round" | "Simplest" | "Zero" Default: "Floor" Saturate on integer overflow — Method of overflow action off (default) | on Specify whether overflows saturate or wrap.
• Select this check box (on) when your model has possible overflow and you want explicit saturation protection in the generated code. Overflows saturate to either the minimum or maximum value that the data type can represent. For example, the maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box selected, the block output saturates at 127. Similarly, the block output saturates at a minimum output value of –128.
• Clear this check box (off) when you want to optimize the efficiency of your generated code, or when you want to avoid overspecifying how a block handles out-of-range signals (see “Troubleshoot Signal Range Errors” (Simulink)). Overflows wrap to the appropriate value that the data type can represent. For example, the maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box cleared, the software interprets the value causing the overflow as int8, which can produce an unintended result; a block result of 130 (binary 1000 0010) expressed as int8 is –126.
Programmatic Use
Block Parameter: SaturateOnIntegerOverflow Type: character vector
Values: "off" | "on" Default: "off" Lock output data type setting against changes by the fixed-point tools — Prevention of fixedpoint tools from overriding data type off (default) | on Select this parameter to prevent the fixed-point tools from overriding the data type you specify for the block. For more information, see “Use Lock Output Data Type Setting” (Fixed-Point Designer). Programmatic Use
Block Parameter: LockScale Type: character vector Values: "off" | "on" Default: "off" Data Type
Output data type — Data type of yfit output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the yfit output. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: auto, the block uses a rule that inherits a data type. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: OutDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "Inherit: auto" Output data type Minimum — Minimum value of yfit output for range checking [] (default) | scalar Specify the lower value of the yfit output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). 35-659
• Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Output data type Minimum parameter does not saturate or clip the actual yfit output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: OutMin Type: character vector Values: "[]" | scalar Default: "[]" Output data type Maximum — Maximum value of yfit output for range checking [] (default) | scalar Specify the upper value of the yfit output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Output data type Maximum parameter does not saturate or clip the actual yfit output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: OutMax Type: character vector Values: "[]" | scalar Default: "[]" Inner product data type — Inner product data type double (default) | Inherit: Inherit via internal rule | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the inner product term of the predicted response on page 35-662. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: Inherit via internal rule, the block uses an internal rule to determine the inner product data type. The internal rule chooses a data type that optimizes numerical
accuracy, performance, and generated code size, while taking into account the properties of the embedded target hardware. The software cannot always optimize efficiency and numerical accuracy at the same time. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: InnerProductDataTypeStr Type: character vector Values: "Inherit: Inherit via internal rule" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "double" Inner product data type Minimum — Minimum of inner product term for range checking [] (default) | scalar Specify the lower value of the inner product term range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Inner product data type Minimum parameter does not saturate or clip the actual inner product term value. Programmatic Use
Block Parameter: InnerProductOutMin Type: character vector Values: "[]" | scalar Default: "[]" Inner product data type Maximum — Maximum of inner product term for range checking [] (default) | scalar Specify the upper value of the inner product term range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). 35-661
• Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Inner product data type Maximum parameter does not saturate or clip the actual inner product term value. Programmatic Use
Block Parameter: InnerProductOutMax Type: character vector Values: "[]" | scalar Default: "[]"
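The data type parameters above can also be set programmatically with the documented block parameter names, for example to select one of the listed fixed-point types. A sketch, assuming a hypothetical block path:

% Block path is hypothetical; OutDataTypeStr and InnerProductDataTypeStr are the
% documented block parameters for the yfit output and the inner product term.
blk = 'myModel/IncrementalRegressionLinear Predict';
set_param(blk, OutDataTypeStr="fixdt(1,16,0)", ...
    InnerProductDataTypeStr="fixdt(1,16,0)");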
Block Characteristics
Data Types — Boolean | double | fixed point | half | integer | single
Direct Feedthrough — yes
Multidimensional Signals — no
Variable-Size Signals — no
Zero-Crossing Detection — no
More About Predicted Response For linear regression models, the predicted response for the observation x is
y = xβ + b
β is the estimated column vector of coefficients, and b is the estimated scalar bias. The linear regression model object specified by Select initial machine learning model contains the coefficients and bias in the Beta and Bias properties, respectively. β and b correspond to Beta and Bias, respectively. You can specify the data types for the components required to compute predicted responses using Output data type and Inner product data type.
• Output data type determines the data type of the predicted response.
• Inner product data type determines the data type of xβ.
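A short worked sketch of the same computation at the command line (model and data hypothetical); the product xβ is the term governed by Inner product data type, and the sum is the term governed by Output data type:

% Sketch (data hypothetical): the manual computation matches the predict function.
rng(0)
X = randn(20,3);
y = X*[2;-1;0.5] + 0.1*randn(20,1);
Mdl = incrementalRegressionLinear(Solver="scale-invariant",NumPredictors=3);
Mdl = fit(Mdl,X,y);                  % train on one chunk
innerProduct = X*Mdl.Beta;           % type controlled by Inner product data type
yfit = innerProduct + Mdl.Bias;      % type controlled by Output data type
% yfit is expected to match predict(Mdl,X) up to floating-point rounding.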
Version History Introduced in R2023b
Extended Capabilities C/C++ Code Generation Generate C and C++ code using Simulink® Coder™. Fixed-Point Conversion Design and simulate fixed-point systems using Fixed-Point Designer™.
See Also Blocks IncrementalRegressionLinear Fit | Update Metrics Objects incrementalRegressionLinear Functions fit | predict Topics “Predict Responses Using RegressionLinear Predict Block” on page 34-178
IncrementalRegressionLinear Fit Fit incremental linear regression model Libraries: Statistics and Machine Learning Toolbox / Incremental Learning / Regression / Linear
Description The IncrementalRegressionLinear Fit block fits a configured incremental model for linear regression (incrementalRegressionLinear) to streaming data. Import an initial linear regression model object into the block by specifying the name of a workspace variable that contains the object. The input port x receives a chunk of predictor data (observations), and the input port y receives a chunk of responses to which the model is fit. The output port mdl returns an updated incrementalRegressionLinear model. The optional input port w receives a chunk of observation weights.
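At each time step, the block performs the equivalent of one call to the fit object function on the incoming chunk. A command-line sketch of the same streaming loop, with hypothetical data:

% Sketch (data hypothetical): fit the configured model chunk by chunk, as the
% block does for each incoming sample of the streaming signal.
rng(0)
linearMdl = incrementalRegressionLinear(Solver="scale-invariant", ...
    NumPredictors=3);
for k = 1:10                                   % ten chunks of 50 observations each
    Xchunk = randn(50,3);
    ychunk = Xchunk*[1;-2;0.5] + 0.1*randn(50,1);
    linearMdl = fit(linearMdl,Xchunk,ychunk);  % equivalent of one block update
end
linearMdl.Beta                                 % updated coefficients, carried in the mdl bus
linearMdl.Bias                                 % updated intercept, carried in the mdl bus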
Ports Input x — Chunk of predictor data numeric matrix Chunk of predictor data to which the model is fit, specified as a numeric matrix. The orientation of the variables and observations is specified by Predictor data observation dimension. The default orientation is rows, which indicates that the observations in the predictor data are oriented along the rows of x. The length of the observation responses y and the number of observations in x must be equal; y(j) is the response of observation j (row or column) in x. Note • The number of predictor variables in x must be equal to the NumPredictors property value of the initial model. If the number of predictor variables in the streaming data changes from NumPredictors, the block issues an error. • The IncrementalRegressionLinear Fit block supports only numeric input predictor data. If your input data includes categorical data, you must prepare an encoded version of the categorical data. Use dummyvar to convert each categorical variable to a numeric matrix of dummy variables. Then, concatenate all dummy variable matrices and any other numeric predictors. For more details, see “Dummy Variables” on page 2-13.
Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point
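The note above recommends encoding categorical predictors with dummyvar before streaming them to the block. A minimal sketch with hypothetical variable names:

% Encode a categorical predictor as dummy variables and combine it with the other
% numeric predictors so that port x receives an all-numeric matrix.
color   = categorical(["red";"blue";"red";"green"]);
numeric = [1.2 3.4; 0.5 2.2; 1.8 0.9; 2.4 1.1];  % other numeric predictors
D = dummyvar(color);                             % one column per category
X = [numeric D];                                 % all-numeric predictor matrix for port x
% NumPredictors of the initial model must equal size(X,2).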
y — Chunk of responses numeric vector Chunk of responses to which the model is fit, specified as a numeric vector. The length of the observation responses y and the number of observations in x must be equal; y(j) is the response of observation j (row or column) in x. For more information, see “Predicted Response” on page 35-675. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | fixed point w — Chunk of observation weights vector of positive values Chunk of observation weights, specified as a vector of positive values. The IncrementalRegressionLinear Fit block weights the observations in x with the corresponding values in w. The size of w must be equal to the number of observations in x. Dependencies
To enable this port, select the check box for Add input port for observation weights on the Main tab of the Block Parameters dialog box. Data Types: single | double Output mdl — Updated incremental learning model parameters bus signal Updated parameters of the incremental learning model fit to streaming data (including Beta and Bias), returned as a bus signal (see Composite Signals).
Parameters Main Select initial machine learning model — Initial incremental linear regression model linearMdl (default) | incrementalRegressionLinear model object Specify the name of a workspace variable that contains the configured incrementalRegressionLinear model object. The following restrictions apply: • The predictor data cannot include categorical predictors (logical, categorical, char, string, or cell). If you supply training data in a table, the predictors must be numeric (double or single). Also, you cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using dummyvar before fitting the model. • The NumPredictors property of the initial model must be a positive integer scalar, and must be equal to the number of predictors in x. 35-665
• The Solver property of the initial model must be "scale-invariant". Programmatic Use
Block Parameter: InitialLearner Type: workspace variable Values: incrementalRegressionLinear model object Default: "linearMdl" Add input port for observation weights — Add second input port for observation weights off (default) | on Select the check box to include the input port w for observation weights in the IncrementalRegressionLinear Fit block. Programmatic Use
Block Parameter: ShowInputWeights Type: character vector Values: "off" | "on" Default: "off" Predictor data observation dimension — Observation dimension of predictor data rows (default) | columns Specify the observation dimension of the predictor data. The default value is rows, which indicates that observations in the predictor data are oriented along the rows of x. Programmatic Use
Block Parameter: ObservationsIn Type: character vector Values: "rows" | "columns" Default: "rows" Sample time (-1 for inherited) — Option to specify sample time -1 (default) | scalar Specify the discrete interval between sample time hits or specify another type of sample time, such as continuous (0) or inherited (-1). For more options, see “Types of Sample Time” (Simulink). By default, the IncrementalRegressionLinear Fit block inherits sample time based on the context of the block within the model. Programmatic Use
Block Parameter: SystemSampleTime Type: string scalar or character vector Values: scalar Default: "-1" Data Types Fixed-Point Operational Parameters
Integer rounding mode — Rounding mode for fixed-point operations
Floor (default) | Ceiling | Convergent | Nearest | Round | Simplest | Zero Specify the rounding mode for fixed-point operations. For more information, see “Rounding” (Fixed-Point Designer). Block parameters always round to the nearest representable value. To control the rounding of a block parameter, enter an expression into the mask field using a MATLAB rounding function. Programmatic Use
Block Parameter: RndMeth Type: character vector Values: "Ceiling" | "Convergent" | "Floor" | "Nearest" | "Round" | "Simplest" | "Zero" Default: "Floor" Saturate on integer overflow — Method of overflow action off (default) | on Specify whether overflows saturate or wrap.
• Select this check box (on) when your model has possible overflow and you want explicit saturation protection in the generated code. Overflows saturate to either the minimum or maximum value that the data type can represent. For example, the maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box selected, the block output saturates at 127. Similarly, the block output saturates at a minimum output value of –128.
• Clear this check box (off) when you want to optimize the efficiency of your generated code, or when you want to avoid overspecifying how a block handles out-of-range signals (see “Troubleshoot Signal Range Errors” (Simulink)). Overflows wrap to the appropriate value that the data type can represent. For example, the maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box cleared, the software interprets the value causing the overflow as int8, which can produce an unintended result; a block result of 130 (binary 1000 0010) expressed as int8 is –126.
Programmatic Use
Block Parameter: SaturateOnIntegerOverflow Type: character vector Values: "off" | "on" Default: "off" Lock output data type setting against changes by the fixed-point tools — Prevention of fixedpoint tools from overriding data type off (default) | on Select this parameter to prevent the fixed-point tools from overriding the data type you specify for the block. For more information, see “Use Lock Output Data Type Setting” (Fixed-Point Designer). Programmatic Use
Block Parameter: LockScale Type: character vector Values: "off" | "on" Default: "off" Beta data type — Data type of linear coefficient estimates output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Enum: | Specify the data type for the linear coefficient estimates (beta) output. The type can be inherited, specified as an enumerated data type, or expressed as a data type object such as Simulink.NumericType. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: BetaDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "Enum: " | "" Default: "Inherit: auto" Beta data type Minimum — Minimum value of beta for range checking [] (default) | scalar Specify the lower value of the beta output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). 35-668
• Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Beta data type Minimum parameter does not saturate or clip the actual beta output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: BetaOutMin Type: character vector Values: "[]" | scalar Default: "[]" Beta data type Maximum — Maximum value of beta for range checking [] (default) | scalar Specify the upper value of the beta output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Beta data type Maximum parameter does not saturate or clip the actual beta output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: BetaOutMax Type: character vector Values: "[]" | scalar Default: "[]" Bias data type — Data type of intercept estimates output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Enum: | Specify the data type for the intercept estimates (bias) output. The type can be inherited, specified as an enumerated data type, or expressed as a data type object such as Simulink.NumericType. 35-669
For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: BiasDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "Enum: " | "" Default: "Inherit: auto" Bias data type Minimum — Minimum value of bias for range checking [] (default) | scalar Specify the lower value of the bias output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Bias data type Minimum parameter does not saturate or clip the actual bias output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: BiasOutMin Type: character vector Values: "[]" | scalar Default: "[]" Bias data type Maximum — Maximum value of bias for range checking [] (default) | scalar Specify the upper value of the bias output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). 35-670
• Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Bias data type Maximum parameter does not saturate or clip the actual bias output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: BiasOutMax Type: character vector Values: "[]" | scalar Default: "[]" Internal states data type — Data type of internal states output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Enum: | Specify the data type for the internal states output. The type can be inherited, specified as an enumerated data type, or expressed as a data type object such as Simulink.NumericType. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: StatesDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "Enum: " | "" Default: "Inherit: auto" Internal states data type Minimum — Minimum value of internal states for range checking [] (default) | scalar Specify the lower value of the internal states output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). 35-671
Note The Internal states data type Minimum parameter does not saturate or clip the actual internal states output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: StatesOutMin Type: character vector Values: "[]" | scalar Default: "[]" Internal states data type Maximum — Maximum value of internal states for range checking [] (default) | scalar Specify the upper value of the internal states output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Internal states data type Maximum parameter does not saturate or clip the actual internal states output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: StatesOutMax Type: character vector Values: "[]" | scalar Default: "[]" Mu data type — Data type of mu output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Enum: | Specify the data type for the mu (predictor means) output. The type can be inherited, specified as an enumerated data type, or expressed as a data type object such as Simulink.NumericType. If you do not specify Standardize=true when you create the initial model mdl, then the IncrementalRegressionLinear Fit block sets mu to 0. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink).
Programmatic Use
Block Parameter: MuDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "Enum: " | "" Default: "Inherit: auto" Mu data type Minimum — Minimum value of mu for range checking [] (default) | scalar Specify the lower value of the mu output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Mu data type Minimum parameter does not saturate or clip the actual mu output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: MuOutMin Type: character vector Values: "[]" | scalar Default: "[]" Mu data type Maximum — Maximum value of mu for range checking [] (default) | scalar Specify the upper value of the mu output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Mu data type Maximum parameter does not saturate or clip the actual mu output. To do so, use the Saturation block instead. 35-673
Programmatic Use
Block Parameter: MuOutMax Type: character vector Values: "[]" | scalar Default: "[]" Sigma data type — Data type of sigma output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Enum: | Specify the data type for the sigma (predictor standard deviations) output. The type can be inherited, specified as an enumerated data type, or expressed as a data type object such as Simulink.NumericType. If you do not specify Standardize=true when you create the initial model mdl, then the IncrementalRegressionLinear Fit block sets sigma to 0. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: SigmaDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "Enum: " | "" Default: "Inherit: auto" Sigma data type Minimum — Minimum value of sigma for range checking [] (default) | scalar Specify the lower value of the sigma output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Sigma data type Minimum parameter does not saturate or clip the actual sigma output. To do so, use the Saturation block instead. 35-674
Programmatic Use
Block Parameter: SigmaOutMin Type: character vector Values: "[]" | scalar Default: "[]" Sigma data type Maximum — Maximum value of sigma for range checking [] (default) | scalar Specify the upper value of the sigma output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Sigma data type Maximum parameter does not saturate or clip the actual sigma output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: SigmaOutMax Type: character vector Values: "[]" | scalar Default: "[]"
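The data type parameters on this tab accept either one of the listed values or an expression that evaluates to a data type object. A hedged sketch of defining such an object (names hypothetical):

% Define a fixed-point Simulink.NumericType object in the base workspace and
% reference it by name (for example, T) in the Beta data type or Bias data type parameter.
T = Simulink.NumericType;
T.DataTypeMode   = 'Fixed-point: binary point scaling';
T.Signedness     = 'Signed';
T.WordLength     = 16;
T.FractionLength = 10;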
Block Characteristics
Data Types — Boolean | double | enumerated | fixed point | half | integer | single
Direct Feedthrough — yes
Multidimensional Signals — no
Variable-Size Signals — no
Zero-Crossing Detection — no
More About Predicted Response For linear regression models, the predicted response for the observation x is
y = xβ + b
β is the estimated column vector of coefficients, and b is the estimated scalar bias. The linear regression model object specified by Select initial machine learning model contains the coefficients and bias in the Beta and Bias properties, respectively. β and b correspond to Beta and Bias, respectively.
Version History Introduced in R2023b
Extended Capabilities C/C++ Code Generation Generate C and C++ code using Simulink® Coder™. Fixed-Point Conversion Design and simulate fixed-point systems using Fixed-Point Designer™.
See Also Blocks IncrementalRegressionLinear Predict | Update Metrics Objects incrementalRegressionLinear Functions fit | predict | updateMetricsAndFit | updateMetrics Topics “Predict Responses Using RegressionLinear Predict Block” on page 34-178
Update Metrics Update performance metrics in incremental learning model given new data Libraries: Statistics and Machine Learning Toolbox / Incremental Learning
Description The Update Metrics block outputs the performance metrics of a configured incremental model for binary classification (incrementalClassificationLinear) or linear regression (incrementalRegressionLinear), given new data. Import a trained incremental learning model object into the block by specifying the name of a workspace variable that contains the object. The input port mdl receives a bus signal that represents an incremental learning model fit to streaming data. The input port x receives a chunk of predictor data (observations), and the input port y receives a chunk of responses or labels for measuring the model performance. The output port IsWarm returns a value indicating if the model is warm, which means that it tracks performance metrics. The output port metrics returns the computed performance metrics. The optional input port w receives a chunk of observation weights.
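The block behaves like the updateMetrics object function applied to each incoming chunk. A command-line sketch of the same bookkeeping, with hypothetical data and a model that is also refit after each chunk:

% Sketch (data hypothetical): measure performance chunk by chunk the way the block does.
rng(0)
linearMdl = incrementalRegressionLinear(Solver="scale-invariant", ...
    NumPredictors=2, MetricsWarmupPeriod=100, MetricsWindowSize=100);
for k = 1:5                                    % five chunks of 100 observations
    X = randn(100,2);
    y = X*[1;-1] + 0.1*randn(100,1);
    linearMdl = updateMetrics(linearMdl,X,y);  % analogous to the block computation
    linearMdl = fit(linearMdl,X,y);            % fitting, as the Fit block would do
end
linearMdl.IsWarm                               % analogous to the IsWarm output port
linearMdl.Metrics                              % table with Cumulative and Window columns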
Ports Input mdl — Incremental learning model bus signal Incremental learning model (incrementalClassificationLinear or incrementalRegressionLinear) fit to streaming data, specified as a bus signal (see Composite Signals). x — Chunk of predictor data numeric matrix Chunk of predictor data, specified as a numeric matrix. The orientation of the variables and observations is specified by Predictor data observation dimension. The default orientation is rows, which indicates that observations in the predictor data are oriented along the rows of x. The length of the observation responses y and the number of observations in x must be equal; y(j) is the response of observation j (row or column) in x. Note The block supports only numeric input predictor data. If your input data includes categorical data, you must prepare an encoded version of the categorical data. Use dummyvar to convert each categorical variable to a numeric matrix of dummy variables. Then, concatenate all dummy variable matrices and any other numeric predictors. For more details, see “Dummy Variables” on page 2-13. 35-677
Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point y — Chunk of responses (labels) numeric vector | logical vector | enumerated vector Chunk of responses (labels) for measuring the model performance, specified as a numeric, logical, or enumerated vector. The length of the observation responses y and the number of observations in x must be equal; y(j) is the response of observation j (row or column) in x. For classification problems: • The Update Metrics block supports binary classification only. • Each label must correspond to one row of the array. • If y contains a label that is not in mdl.ClassNames, the block issues an error. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point | enumerated w — Chunk of observation weights vector of positive values Chunk of observation weights, specified as a vector of positive values. The Update Metrics block weights the observations in x with the corresponding values in w. The size of w must be equal to the number of observations in x. Dependencies
To enable this port, select the check box for Add input port for observation weights on the Main tab of the Block Parameters dialog box. Data Types: single | double Output IsWarm — Flag indicating whether model tracks performance metrics logical Flag indicating whether the incremental model tracks performance metrics, returned as logical 0 (false) or 1 (true).
• 1 (true) — The incremental model mdl is warm. Consequently, the block tracks performance metrics in the metrics output.
• 0 (false) — The block does not track performance metrics.
metrics — Model performance metrics matrix Model performance metrics updated during incremental learning, returned as a matrix with two columns and m rows, where m is the number of observations in x. Specify the performance metrics
using Model performance metrics to track. The block ignores the metrics specified by the Metrics property of mdl. The columns of metrics are: • Cumulative — Model performance from the time the model becomes warm (IsWarm is 1). • Window — Model performance evaluated over all observations within the window specified by the MetricsWindowSize property of mdl. The software updates Window after it processes MetricsWindowSize observations. Data Types: matrix
Parameters Main Select initial machine learning model — Initial incremental linear classification or linear regression model linearMdl (default) | incrementalRegressionLinear model object | incrementalClassificationLinear model object Specify the name of a workspace variable that contains the configured incrementalRegressionLinear or incrementalClassificationLinear model object. The following restrictions apply: • The predictor data cannot include categorical predictors (logical, categorical, char, string, or cell). If you supply training data in a table, the predictors must be numeric (double or single). Also, you cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using dummyvar before fitting the model. • The ScoreTransform property of the initial model (classification only) cannot be "invlogit" or an anonymous function. • The NumPredictors property of the initial model must be a positive integer scalar, and must be equal to the number of predictors in x. • The Solver property of the initial model must be "scale-invariant". Programmatic Use
Block Parameter: InitialLearner Type: workspace variable Values: incrementalRegressionLinear model object, incrementalClassificationLinear model object Default: "linearMdl" Model performance metrics to track — Performance metrics function classiferror (default) | binodeviance | exponential | hinge | logit | quadratic Specify the model performance metrics (cumulative and window) to track during incremental learning. 35-679
If mdl is an incrementalClassificationLinear model object, you can specify one of the following:
• binodeviance — Binomial deviance
• classiferror — Classification error
• exponential — Exponential
• hinge — Hinge
• logit — Logistic
• quadratic — Quadratic
If mdl is an incrementalRegressionLinear model object, you can specify one of the following:
• mse — Weighted mean squared error
• epsiloninsensitive — Epsilon insensitive loss
The Update Metrics block ignores the metrics specified by the Metrics property of mdl. For more details on the built-in loss functions, see loss. Programmatic Use
Block Parameter: Metric Type: character vector Values: "classiferror" | "binodeviance" | "exponential" | "hinge" | "logit" | "quadratic" | "mse" | "epsiloninsensitive" Default: "classiferror" (for incrementalClassificationLinear) | "epsiloninsensitive" (for incrementalRegressionLinear) Add input port for observation weights — Add second input port for observation weights off (default) | on Select the check box to include the input port w for observation weights in the Update Metrics block. Programmatic Use
Block Parameter: ShowInputWeights Type: character vector Values: "off" | "on" Default: "off" Predictor data observation dimension — Observation dimension of predictor data rows (default) | columns Specify the observation dimension of the predictor data. The default value is rows, which indicates that observations in the predictor data are oriented along the rows of x. Programmatic Use
Block Parameter: ObservationsIn Type: character vector Values: "rows" | "columns"
Default: "rows" Sample time (-1 for inherited) — Option to specify sample time -1 (default) | scalar Specify the discrete interval between sample time hits or specify another type of sample time, such as continuous (0) or inherited (-1). For more options, see “Types of Sample Time” (Simulink). By default, the Update Metrics block inherits sample time based on the context of the block within the model. Programmatic Use
Block Parameter: SystemSampleTime Type: string scalar or character vector Values: scalar Default: "-1" Data Types Fixed-Point Operational Parameters
Integer rounding mode — Rounding mode for fixed-point operations Floor (default) | Ceiling | Convergent | Nearest | Round | Simplest | Zero Specify the rounding mode for fixed-point operations. For more information, see “Rounding” (Fixed-Point Designer). Block parameters always round to the nearest representable value. To control the rounding of a block parameter, enter an expression into the mask field using a MATLAB rounding function. Programmatic Use
Block Parameter: RndMeth Type: character vector Values: "Ceiling" | "Convergent" | "Floor" | "Nearest" | "Round" | "Simplest" | "Zero" Default: "Floor" Saturate on integer overflow — Method of overflow action off (default) | on Specify whether overflows saturate or wrap.
• Select this check box (on) when your model has possible overflow and you want explicit saturation protection in the generated code. Overflows saturate to either the minimum or maximum value that the data type can represent. For example, the maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box selected, the block output saturates at 127. Similarly, the block output saturates at a minimum output value of –128.
• Clear this check box (off) when you want to optimize the efficiency of your generated code, or when you want to avoid overspecifying how a block handles out-of-range signals (see “Troubleshoot Signal Range Errors” (Simulink)). Overflows wrap to the appropriate value that the data type can represent. For example, the maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box cleared, the software interprets the value causing the overflow as int8, which can produce an unintended result; a block result of 130 (binary 1000 0010) expressed as int8 is –126.
Programmatic Use
Block Parameter: SaturateOnIntegerOverflow Type: character vector Values: "off" | "on" Default: "off" Lock output data type setting against changes by the fixed-point tools — Prevention of fixedpoint tools from overriding data type off (default) | on Select this parameter to prevent the fixed-point tools from overriding the data type you specify for the block. For more information, see “Use Lock Output Data Type Setting” (Fixed-Point Designer). Programmatic Use
Block Parameter: LockScale Type: character vector Values: "off" | "on" Default: "off"
Data Type
Metrics data type — Data type of metrics output double (default) | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the metrics output. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: MetricsDataTypeStr Type: character vector Values: "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "double" Metrics data type Minimum — Minimum value of metrics for range checking [] (default) | scalar Specify the lower value of the metrics output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Metrics data type Minimum parameter does not saturate or clip the actual metrics output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: MetricsOutMin Type: character vector Values: "[]" | scalar Default: "[]" Metrics data type Maximum — Maximum value of metrics for range checking [] (default) | scalar
Specify the upper value of the metrics output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Metrics data type Maximum parameter does not saturate or clip the actual metrics output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: MetricsOutMax Type: character vector Values: "[]" | scalar Default: "[]" Predict response data type — Data type of predict response output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the response output from the predict block (IncrementalRegressionLinear Predict or IncrementalClassificationLinear Predict) that is internal to the Update Metrics block. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: auto, the block uses a rule that inherits a data type. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: PredictDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "Inherit: auto"
Predict response data type Minimum — Minimum value of predict response for range checking [] (default) | scalar Specify the lower value of the internal predict response output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Programmatic Use
Block Parameter: PredictResponseOutMin Type: character vector Values: "[]" | scalar Default: "[]" Predict response data type Maximum — Maximum value of predict response for range checking [] (default) | scalar Specify the upper value of the internal predict response output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Programmatic Use
Block Parameter: PredictResponseOutMax Type: character vector Values: "[]" | scalar Default: "[]" Loss data type — Data type of loss output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the loss value output used internally by the Update Metrics block to evaluate the cumulative loss and the loss over the window specified by the MetricsWindowSize
property of mdl. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: auto, the block uses a rule that inherits a data type. For more information about data types, see “Control Data Types of Signals” (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: LossDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "Inherit: auto" Loss data type Minimum — Minimum value of loss for range checking [] (default) | scalar Specify the lower value of the internal loss output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Programmatic Use
Block Parameter: LossOutMin Type: character vector Values: "[]" | scalar Default: "[]" Loss data type Maximum — Maximum value of loss for range checking [] (default) | scalar Specify the upper value of the internal loss output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). 35-686
• Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Programmatic Use
Block Parameter: LossOutMax Type: character vector Values: "[]" | scalar Default: "[]"
Additional predict data type — Data type of additional predict output Inherit: Inherit via internal rule (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for the additional response output from the predict block (IncrementalRegressionLinear Predict or IncrementalClassificationLinear Predict) that is internal to the Update Metrics block. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: Inherit via internal rule, the block uses an internal rule to determine the additional predict data type. The internal rule chooses a data type that optimizes numerical accuracy, performance, and generated code size, while taking into account the properties of the embedded target hardware. The software cannot always optimize efficiency and numerical accuracy at the same time. For more information about data types, see "Control Data Types of Signals" (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see "Specify Data Types Using Data Type Assistant" (Simulink). Programmatic Use
Block Parameter: AdditionalPredictDataTypeStr Type: character vector Values: "Inherit: Inherit via internal rule" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "Inherit: Inherit via internal rule" Additional predict data type Minimum — Minimum value of additional predict for range checking [] (default) | scalar Specify the lower value of the additional predict output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). 35-687
• Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Programmatic Use
Block Parameter: AdditionalPredictOutMin Type: character vector Values: "[]" | scalar Default: "[]" Additional predict data type Maximum — Maximum value of additional predict for range checking [] (default) | scalar Specify the upper value of the additional predict output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Programmatic Use
Block Parameter: AdditionalPredictOutMax Type: character vector Values: "[]" | scalar Default: "[]" Additional loss data type — Additional loss data type Inherit: Inherit via internal rule (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Specify the data type for operations on the additional loss value used internally by the Update Metrics block to evaluate the cumulative loss and the loss over the window specified by the MetricsWindowSize property of mdl. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: Inherit via internal rule, the block uses an internal rule to determine the additional loss data type. The internal rule chooses a data type that optimizes numerical accuracy, performance, and generated code size, while taking into account the properties of the embedded target hardware. The software cannot always optimize efficiency and numerical accuracy at the same time. For more information about data types, see “Control Data Types of Signals” (Simulink). 35-688
Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: AdditionalLossDataTypeStr Type: character vector Values: "Inherit: Inherit via internal rule" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "Inherit: Inherit via internal rule" Additional loss data type Minimum — Minimum value of additional loss for range checking [] (default) | scalar Specify the lower value of the additional loss output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Programmatic Use
Block Parameter: AdditionalLossOutMin Type: character vector Values: "[]" | scalar Default: "[]" Additional loss data type Maximum — Maximum value of additional loss for range checking [] (default) | scalar Specify the upper value of the additional loss output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). 35-689
Programmatic Use
Block Parameter: AdditionalLossOutMax Type: character vector Values: "[]" | scalar Default: "[]"
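As an illustration (not part of the shipped documentation), the programmatic block parameters listed above can be read and set with get_param and set_param. The model name mymdl and the block path below are hypothetical placeholders; the parameter names come from the Programmatic Use entries on this page.
open_system('mymdl')
blk = 'mymdl/Update Metrics';                        % hypothetical block path
set_param(blk,'LossDataTypeStr','single');           % set the loss data type
set_param(blk,'LossOutMin','[]','LossOutMax','[]');  % leave loss range checking disabled
get_param(blk,'AdditionalLossDataTypeStr')           % inspect a current parameter value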
Block Characteristics
Data Types: Boolean | double | enumerated | fixed point | half | integer | single
Direct Feedthrough: yes
Multidimensional Signals: no
Variable-Size Signals: no
Zero-Crossing Detection: no
Version History Introduced in R2023b
Extended Capabilities C/C++ Code Generation Generate C and C++ code using Simulink® Coder™. Fixed-Point Conversion Design and simulate fixed-point systems using Fixed-Point Designer™.
See Also Blocks IncrementalClassificationLinear Fit | IncrementalClassificationLinear Predict | IncrementalRegressionLinear Fit | IncrementalRegressionLinear Predict Objects incrementalClassificationLinear | incrementalRegressionLinear Functions updateMetricsAndFit | updateMetrics Topics Perform Incremental Learning Using IncrementalClassificationLinear Fit and Predict Blocks on page 34-245 Perform Incremental Learning Using IncrementalRegressionLinear Fit and Predict Blocks on page 34241
ClassificationPartitionedECOC Cross-validated multiclass ECOC model for support vector machines (SVMs) and other classifiers
Description ClassificationPartitionedECOC is a set of error-correcting output codes (ECOC) models trained on cross-validated folds. Estimate the quality of the cross-validated classification by using one or more “kfold” functions: kfoldPredict, kfoldLoss, kfoldMargin, kfoldEdge, and kfoldfun. Every “kfold” method uses models trained on training-fold (in-fold) observations to predict the response for validation-fold (out-of-fold) observations. For example, suppose you cross-validate using five folds. In this case, the software randomly assigns each observation into five groups of equal size (roughly). The training fold contains four of the groups (roughly 4/5 of the data), and the validation fold contains the other group (roughly 1/5 of the data). In this case, cross-validation proceeds as follows: 1
1. The software trains the first model (stored in CVMdl.Trained{1}) by using the observations in the last four groups and reserves the observations in the first group for validation.
2. The software trains the second model (stored in CVMdl.Trained{2}) by using the observations in the first group and the last three groups. The software reserves the observations in the second group for validation.
3. The software proceeds in a similar fashion for the third, fourth, and fifth models.
If you validate by using kfoldPredict, the software computes predictions for the observations in group i by using the ith model. In short, the software estimates a response for every observation by using the model trained without that observation.
Creation You can create a ClassificationPartitionedECOC model in two ways: • Create a cross-validated ECOC model from an ECOC model by using the crossval object function. • Create a cross-validated ECOC model by using the fitcecoc function and specifying one of the name-value pair arguments 'CrossVal', 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.
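A minimal sketch of both creation paths, assuming a predictor matrix X and class labels Y are already in the workspace (for example, X = meas and Y = species from the fisheriris data set):
Mdl = fitcecoc(X,Y);              % train a full ECOC model
CVMdl1 = crossval(Mdl);           % cross-validate the trained model (10 folds by default)
CVMdl2 = fitcecoc(X,Y,'KFold',5); % or cross-validate directly during training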
Properties Cross-Validation Properties CrossValidatedModel — Cross-validated model name character vector Cross-validated model name, specified as a character vector. For example, 'ECOC' specifies a cross-validated ECOC model. 35-691
Data Types: char KFold — Number of cross-validated folds positive integer Number of cross-validated folds, specified as a positive integer. Data Types: double ModelParameters — Cross-validation parameter values object Cross-validation parameter values, specified as an object. The parameter values correspond to the name-value pair argument values used to cross-validate the ECOC classifier. ModelParameters does not contain estimated parameters. You can access the properties of ModelParameters using dot notation. NumObservations — Number of observations positive numeric scalar Number of observations in the training data, specified as a positive numeric scalar. Data Types: double Partition — Data partition cvpartition model Data partition indicating how the software splits the data into cross-validation folds, specified as a cvpartition model. Trained — Compact classifiers trained on cross-validation folds cell array of CompactClassificationECOC models Compact classifiers trained on cross-validation folds, specified as a cell array of CompactClassificationECOC models. Trained has k cells, where k is the number of folds. Data Types: cell W — Observation weights numeric vector Observation weights used to cross-validate the model, specified as a numeric vector. W has NumObservations elements. The software normalizes the weights used for training so that sum(W,'omitnan') is 1. Data Types: single | double X — Unstandardized predictor data numeric matrix | table Unstandardized predictor data used to cross-validate the classifier, specified as a numeric matrix or table. Each row of X corresponds to one observation, and each column corresponds to one variable. Data Types: single | double | table 35-692
Y — Observed class labels categorical array | character array | logical vector | numeric vector | cell array of character vectors Observed class labels used to cross-validate the model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Y has NumObservations elements and has the same data type as the input argument Y that you pass to fitcecoc to cross-validate the model. (The software treats string arrays as cell arrays of character vectors.) Each row of Y represents the observed classification of the corresponding row of X. Data Types: categorical | char | logical | single | double | cell
ECOC Properties
BinaryLoss — Binary learner loss function 'binodeviance' | 'exponential' | 'hamming' | 'hinge' | 'linear' | 'logit' | 'quadratic' Binary learner loss function, specified as a character vector representing the loss function name. This list identifies the default BinaryLoss value, which depends on the score ranges returned by the binary learners.
• All binary learners are any of the following: classification decision trees, discriminant analysis models, k-nearest neighbor models, or naive Bayes models. Default value: 'quadratic'
• All binary learners are SVMs. Default value: 'hinge'
• All binary learners are ensembles trained by AdaBoostM1 or GentleBoost. Default value: 'exponential'
• All binary learners are ensembles trained by LogitBoost. Default value: 'binodeviance'
• You specify to predict class posterior probabilities by setting 'FitPosterior',true in fitcecoc. Default value: 'quadratic'
• Binary learners are heterogeneous and use different loss functions. Default value: 'hamming'
To check the default value, use dot notation to display the BinaryLoss property of the trained model at the command line. To potentially increase accuracy, specify a binary loss function other than the default during a prediction or loss computation by using the BinaryLoss name-value argument of kfoldPredict or kfoldLoss. For more information, see "Binary Loss" on page 35-4387. Data Types: char
BinaryY — Binary learner class labels numeric matrix | [] Binary learner class labels, specified as a numeric matrix or [].
• If the coding matrix is the same across all folds, then BinaryY is a NumObservations-by-L matrix, where L is the number of binary learners (size(CodingMatrix,2)). The elements of BinaryY are –1, 0, and 1, and the values correspond to dichotomous class assignments. This table describes how learner j assigns observation k to a dichotomous class corresponding to the value of BinaryY(k,j). Value
Dichotomous Class Assignment
• –1: Learner j assigns observation k to a negative class.
• 0: Before training, learner j removes observation k from the data set.
• 1: Learner j assigns observation k to a positive class.
• If the coding matrix varies across folds, then BinaryY is empty ([]). Data Types: double CodingMatrix — Codes specifying class assignments numeric matrix | [] Codes specifying class assignments for the binary learners, specified as a numeric matrix or []. • If the coding matrix is the same across all folds, then CodingMatrix is a K-by-L matrix, where K is the number of classes and L is the number of binary learners. The elements of CodingMatrix are –1, 0, and 1, and the values correspond to dichotomous class assignments. This table describes how learner j assigns observations in class i to a dichotomous class corresponding to the value of CodingMatrix(i,j). Value
Dichotomous Class Assignment
• –1: Learner j assigns observations in class i to a negative class.
• 0: Before training, learner j removes observations in class i from the data set.
• 1: Learner j assigns observations in class i to a positive class.
• If the coding matrix varies across folds, then CodingMatrix is empty ([]). You can obtain the coding matrix for each fold by using the Trained property. For example, CVMdl.Trained{1}.CodingMatrix is the coding matrix in the first fold of the cross-validated ECOC model CVMdl. Data Types: double | single | int8 | int16 | int32 | int64 Other Classification Properties CategoricalPredictors — Categorical predictor indices vector of positive integers | [] Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]). 35-694
Data Types: single | double ClassNames — Unique class labels categorical array | character array | logical vector | numeric vector | cell array of character vectors Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as the class labels Y. (The software treats string arrays as cell arrays of character vectors.) ClassNames also determines the class order. Data Types: categorical | char | logical | single | double | cell Cost — Misclassification costs square numeric matrix This property is read-only. Misclassification costs, specified as a square numeric matrix. Cost has K rows and columns, where K is the number of classes. Cost(i,j) is the cost of classifying a point into class j if its true class is i. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. Data Types: double PredictorNames — Predictor names cell array of character vectors Predictor names in order of their appearance in the predictor data X, specified as a cell array of character vectors. The length of PredictorNames is equal to the number of columns in X. Data Types: cell Prior — Prior class probabilities numeric vector This property is read-only. Prior class probabilities, specified as a numeric vector. Prior has as many elements as the number of classes in ClassNames, and the order of the elements corresponds to the order of the classes in ClassNames. fitcecoc incorporates misclassification costs differently among different types of binary learners. Data Types: double ResponseName — Response variable name character vector Response variable name, specified as a character vector. Data Types: char ScoreTransform — Score transformation function to apply to predicted scores 'none' This property is read-only. 35-695
Score transformation function to apply to the predicted scores, specified as 'none'. An ECOC model does not support score transformation.
Object Functions
gather: Gather properties of Statistics and Machine Learning Toolbox object from GPU
kfoldEdge: Classification edge for cross-validated ECOC model
kfoldLoss: Classification loss for cross-validated ECOC model
kfoldMargin: Classification margins for cross-validated ECOC model
kfoldPredict: Classify observations in cross-validated ECOC model
kfoldfun: Cross-validate function using cross-validated ECOC model
Examples Cross-Validate ECOC Classifier Cross-validate an ECOC classifier with SVM binary learners, and estimate the generalized classification error. Load Fisher's iris data set. Specify the predictor data X and the response data Y. load fisheriris X = meas; Y = species; rng(1); % For reproducibility
Create an SVM template, and standardize the predictors. t = templateSVM('Standardize',true) t = Fit template for SVM. Standardize: 1
t is an SVM template. Most of the template object properties are empty. When training the ECOC classifier, the software sets the applicable properties to their default values. Train the ECOC classifier, and specify the class order. Mdl = fitcecoc(X,Y,'Learners',t,... 'ClassNames',{'setosa','versicolor','virginica'});
Mdl is a ClassificationECOC classifier. You can access its properties using dot notation. Cross-validate Mdl using 10-fold cross-validation. CVMdl = crossval(Mdl);
CVMdl is a ClassificationPartitionedECOC cross-validated ECOC classifier. Estimate the generalized classification error. genError = kfoldLoss(CVMdl) genError = 0.0400
The generalized classification error is 4%, which indicates that the ECOC classifier generalizes fairly well.
Speed Up Training ECOC Classifiers Using Binning and Parallel Computing Train a one-versus-all ECOC classifier using a GentleBoost ensemble of decision trees with surrogate splits. To speed up training, bin numeric predictors and use parallel computing. Binning is valid only when fitcecoc uses a tree learner. After training, estimate the classification error using 10-fold cross-validation. Note that parallel computing requires Parallel Computing Toolbox™.
Load Sample Data
Load and inspect the arrhythmia data set.
load arrhythmia
[n,p] = size(X)
n = 452
p = 279
isLabels = unique(Y);
nLabels = numel(isLabels)
nLabels = 13
tabulate(categorical(Y))
  Value    Count    Percent
      1      245     54.20%
      2       44      9.73%
      3       15      3.32%
      4       15      3.32%
      5       13      2.88%
      6       25      5.53%
      7        3      0.66%
      8        2      0.44%
      9        9      1.99%
     10       50     11.06%
     14        4      0.88%
     15        5      1.11%
     16       22      4.87%
The data set contains 279 predictors, and the sample size of 452 is relatively small. Of the 16 distinct labels, only 13 are represented in the response (Y). Each label describes various degrees of arrhythmia, and 54.20% of the observations are in class 1. Train One-Versus-All ECOC Classifier Create an ensemble template. You must specify at least three arguments: a method, a number of learners, and the type of learner. For this example, specify 'GentleBoost' for the method, 100 for the number of learners, and a decision tree template that uses surrogate splits because there are missing observations. tTree = templateTree('surrogate','on'); tEnsemble = templateEnsemble('GentleBoost',100,tTree);
tEnsemble is a template object. Most of its properties are empty, but the software fills them with their default values during training. Train a one-versus-all ECOC classifier using the ensembles of decision trees as binary learners. To speed up training, use binning and parallel computing. • Binning ('NumBins',50) — When you have a large training data set, you can speed up training (a potential decrease in accuracy) by using the 'NumBins' name-value pair argument. This argument is valid only when fitcecoc uses a tree learner. If you specify the 'NumBins' value, then the software bins every numeric predictor into a specified number of equiprobable bins, and then grows trees on the bin indices instead of the original data. You can try 'NumBins',50 first, and then change the 'NumBins' value depending on the accuracy and training speed. • Parallel computing ('Options',statset('UseParallel',true)) — With a Parallel Computing Toolbox license, you can speed up the computation by using parallel computing, which sends each binary learner to a worker in the pool. The number of workers depends on your system configuration. When you use decision trees for binary learners, fitcecoc parallelizes training using Intel® Threading Building Blocks (TBB) for dual-core systems and above. Therefore, specifying the 'UseParallel' option is not helpful on a single computer. Use this option on a cluster. Additionally, specify that the prior probabilities are 1/K, where K = 13 is the number of distinct classes. options = statset('UseParallel',true); Mdl = fitcecoc(X,Y,'Coding','onevsall','Learners',tEnsemble,... 'Prior','uniform','NumBins',50,'Options',options); Starting parallel pool (parpool) using the 'local' profile ... Connected to the parallel pool (number of workers: 6).
Mdl is a ClassificationECOC model. Cross-Validation Cross-validate the ECOC classifier using 10-fold cross-validation. CVMdl = crossval(Mdl,'Options',options); Warning: One or more folds do not contain points from all the groups.
CVMdl is a ClassificationPartitionedECOC model. The warning indicates that some classes are not represented while the software trains at least one fold. Therefore, those folds cannot predict labels for the missing classes. You can inspect the results of a fold using cell indexing and dot notation. For example, access the results of the first fold by entering CVMdl.Trained{1}. Use the cross-validated ECOC classifier to predict validation-fold labels. You can compute the confusion matrix by using confusionchart. Move and resize the chart by changing the inner position property to ensure that the percentages appear in the row summary. oofLabel = kfoldPredict(CVMdl,'Options',options); ConfMat = confusionchart(Y,oofLabel,'RowSummary','total-normalized'); ConfMat.InnerPosition = [0.10 0.12 0.85 0.85];
Reproduce Binned Data Reproduce binned predictor data by using the BinEdges property of the trained model and the discretize function. X = Mdl.X; % Predictor data Xbinned = zeros(size(X)); edges = Mdl.BinEdges; % Find indices of binned predictors. idxNumeric = find(~cellfun(@isempty,edges)); if iscolumn(idxNumeric) idxNumeric = idxNumeric'; end for j = idxNumeric x = X(:,j); % Convert x to array if x is a table. if istable(x) x = table2array(x); end % Group x into bins by using the discretize function. xbinned = discretize(x,[-inf; edges{j}; inf]); Xbinned(:,j) = xbinned; end
Xbinned contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.
Version History Introduced in R2014b
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. Usage notes and limitations: • The object functions of the ClassificationPartitionedECOC model fully support GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also cvpartition | crossval | fitcecoc | ClassificationECOC | CompactClassificationECOC
ClassificationPartitionedEnsemble Package: classreg.learning.partition Superclasses: ClassificationPartitionedModel Cross-validated classification ensemble
Description ClassificationPartitionedEnsemble is a set of classification ensembles trained on crossvalidated folds. Estimate the quality of classification by cross validation using one or more “kfold” methods: kfoldPredict, kfoldLoss, kfoldMargin, kfoldEdge, and kfoldfun. Every “kfold” method uses models trained on in-fold observations to predict response for out-of-fold observations. For example, suppose you cross validate using five folds. In this case, every training fold contains roughly 4/5 of the data and every test fold contains roughly 1/5 of the data. The first model stored in Trained{1} was trained on X and Y with the first 1/5 excluded, the second model stored in Trained{2} was trained on X and Y with the second 1/5 excluded, and so on. When you call kfoldPredict, it computes predictions for the first 1/5 of the data using the first model, for the second 1/5 of data using the second model, and so on. In short, response for every observation is computed by kfoldPredict using the model trained without this observation.
Construction cvens = crossval(ens) creates a cross-validated ensemble from ens, a classification ensemble. For syntax details, see the crossval method reference page. cvens = fitcensemble(X,Y,Name,Value) creates a cross-validated ensemble when Name is one of 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'. For syntax details, see the fitcensemble function reference page.
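A minimal sketch of both construction paths, assuming meas and species from the fisheriris data set are in the workspace:
ens = fitcensemble(meas,species,'Method','AdaBoostM2');  % full classification ensemble
cvens1 = crossval(ens);                                  % cross-validate the trained ensemble
cvens2 = fitcensemble(meas,species,'KFold',5);           % or request cross-validation at fit time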
Properties BinEdges Bin edges for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors. The software bins numeric predictors only if you specify the 'NumBins' name-value argument as a positive integer scalar when training a model with tree learners. The BinEdges property is empty if the 'NumBins' value is empty (default). You can reproduce the binned predictor data Xbinned by using the BinEdges property of the trained model mdl. X = mdl.X; % Predictor data Xbinned = zeros(size(X)); edges = mdl.BinEdges; % Find indices of binned predictors.
idxNumeric = find(~cellfun(@isempty,edges)); if iscolumn(idxNumeric) idxNumeric = idxNumeric'; end for j = idxNumeric x = X(:,j); % Convert x to array if x is a table. if istable(x) x = table2array(x); end % Group x into bins by using the discretize function. xbinned = discretize(x,[-inf; edges{j}; inf]); Xbinned(:,j) = xbinned; end
Xbinned contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs. CategoricalPredictors Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]). ClassNames List of the elements in Y with duplicates removed. ClassNames can be a numeric vector, vector of categorical variables, logical vector, character array, or cell array of character vectors. ClassNames has the same data type as the data in the argument Y. (The software treats string arrays as cell arrays of character vectors.) Combiner Cell array of combiners across all folds. Cost Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response. This property is readonly. CrossValidatedModel Name of the cross-validated model, a character vector. KFold Number of folds used in a cross-validated ensemble, a positive integer. ModelParameters Object holding parameters of cvens. 35-702
NumObservations Number of data points used in training the ensemble, a positive integer. NumTrainedPerFold Number of weak learners used in training each fold of the ensemble, a positive integer. Partition Partition of class cvpartition used in creating the cross-validated ensemble. PredictorNames Cell array of names for the predictor variables, in the order in which they appear in X. Prior Numeric vector of prior probabilities for each class. The order of the elements of Prior corresponds to the order of the classes in ClassNames. The number of elements of Prior is the number of unique classes in the response. This property is read-only. ResponseName Name of the response variable Y, a character vector. ScoreTransform Function handle for transforming scores, or character vector representing a built-in transformation function. 'none' means no transformation; equivalently, 'none' means @(x)x. For a list of built-in transformation functions and the syntax of custom transformation functions, see fitctree. Add or change a ScoreTransform function using dot notation: ens.ScoreTransform = 'function'
or ens.ScoreTransform = @function
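For example, a custom transformation can be supplied as an anonymous function handle. This is an illustrative sketch only; the handle must accept a matrix of scores and return a matrix of the same size.
ens.ScoreTransform = @(x) 1./(1 + exp(-x)); % logistic transformation of the raw scores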
Trainable Cell array of ensembles trained on cross-validation folds. Every ensemble is full, meaning it contains its training data and weights. Trained Cell array of compact ensembles trained on cross-validation folds. W Scaled weights, a vector with length n, the number of rows in X. X A matrix or table of predictor values. Each column of X represents one variable, and each row represents one observation. 35-703
Y Numeric vector, categorical vector, logical vector, character array, or cell array of character vectors. Each row of Y is the response to the data in the corresponding row of X.
Object Functions
gather: Gather properties of Statistics and Machine Learning Toolbox object from GPU
kfoldEdge: Classification edge for cross-validated classification model
kfoldLoss: Classification loss for cross-validated classification model
kfoldMargin: Classification margins for cross-validated classification model
kfoldPredict: Classify observations in cross-validated classification model
kfoldfun: Cross-validate function for classification
resume: Resume training learners on cross-validation folds
Copy Semantics Value. To learn how value classes affect copy operations, see Copying Objects.
Examples Evaluate K-Fold Cross-Validation Error for Classification Ensemble Evaluate the k-fold cross-validation error for a classification ensemble that models the Fisher iris data. Load the sample data set. load fisheriris
Train an ensemble of 100 boosted classification trees using AdaBoostM2. t = templateTree('MaxNumSplits',1); % Weak learner template tree object ens = fitcensemble(meas,species,'Method','AdaBoostM2','Learners',t);
Create a cross-validated ensemble from ens and find the k-fold cross-validation error. rng(10,'twister') % For reproducibility cvens = crossval(ens); L = kfoldLoss(cvens) L = 0.0533
Version History R2022a: Cost property stores the user-specified cost matrix Behavior changed in R2022a Starting in R2022a, the Cost property stores the user-specified cost matrix, so that you can compute the observed misclassification cost using the specified cost value. The software stores normalized prior probabilities (Prior) and observation weights (W) that do not reflect the penalties described in the cost matrix. To compute the observed misclassification cost, specify the LossFun name-value argument as "classifcost" when you call the kfoldLoss function. 35-704
Note that model training has not changed and, therefore, the decision boundaries between classes have not changed. For training, the fitting function updates the specified prior probabilities by incorporating the penalties described in the specified cost matrix, and then normalizes the prior probabilities and observation weights. This behavior has not changed. In previous releases, the software stored the default cost matrix in the Cost property and stored the prior probabilities and observation weights used for training in the Prior and W properties, respectively. Starting in R2022a, the software stores the user-specified cost matrix without modification, and stores normalized prior probabilities and observation weights that do not reflect the cost penalties. For more details, see “Misclassification Cost Matrix, Prior Probabilities, and Observation Weights” on page 19-8. Some object functions use the Cost and W properties: • The kfoldLoss function uses the cost matrix stored in the Cost property if you specify the LossFun name-value argument as "classifcost" or "mincost". • The kfoldLoss and kfoldEdge functions use the observation weights stored in the W property. If you specify a nondefault cost matrix when you train a classification model, the object functions return a different value compared to previous releases. If you want the software to handle the cost matrix, prior probabilities, and observation weights in the same way as in previous releases, adjust the prior probabilities and observation weights for the nondefault cost matrix, as described in “Adjust Prior Probabilities and Observation Weights for Misclassification Cost Matrix” on page 19-9. Then, when you train a classification model, specify the adjusted prior probabilities and observation weights by using the Prior and Weights name-value arguments, respectively, and use the default cost matrix.
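For instance, assuming cvens is a cross-validated classification ensemble trained with a nondefault cost matrix, the observed misclassification cost can be computed as follows (a sketch, not part of the original text):
L = kfoldLoss(cvens,'LossFun','classifcost'); % uses the stored Cost and W properties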
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. Usage notes and limitations: • The object functions of the ClassificationPartitionedEnsemble model fully support GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also RegressionPartitionedEnsemble | ClassificationPartitionedModel | ClassificationEnsemble | fitctree
ClassificationPartitionedGAM Cross-validated generalized additive model (GAM) for classification
Description ClassificationPartitionedGAM is a set of generalized additive models trained on crossvalidated folds. Estimate the quality of the cross-validated classification by using one or more kfold functions: kfoldPredict, kfoldLoss, kfoldMargin, kfoldEdge, and kfoldfun. Every kfold object function uses models trained on training-fold (in-fold) observations to predict the response for validation-fold (out-of-fold) observations. For example, suppose you cross-validate using five folds. The software randomly assigns each observation into five groups of equal size (roughly). The training fold contains four of the groups (roughly 4/5 of the data), and the validation fold contains the other group (roughly 1/5 of the data). In this case, cross-validation proceeds as follows: 1
1. The software trains the first model (stored in CVMdl.Trained{1}) by using the observations in the last four groups, and reserves the observations in the first group for validation.
2. The software trains the second model (stored in CVMdl.Trained{2}) by using the observations in the first group and the last three groups. The software reserves the observations in the second group for validation.
3. The software proceeds in a similar manner for the third, fourth, and fifth models.
If you validate by using kfoldPredict, the software computes predictions for the observations in group i by using the ith model. In short, the software estimates a response for every observation by using the model trained without that observation.
Creation You can create a ClassificationPartitionedGAM model in two ways: • Create a cross-validated model from a GAM object ClassificationGAM by using the crossval object function. • Create a cross-validated model by using the fitcgam function and specifying one of the namevalue arguments 'CrossVal', 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.
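A minimal sketch of both creation paths, assuming predictor data X and labels Y are in the workspace (for example, from load ionosphere):
Mdl = fitcgam(X,Y);                    % train a full GAM
CVMdl1 = crossval(Mdl,'KFold',5);      % cross-validate the trained model
CVMdl2 = fitcgam(X,Y,'CrossVal','on'); % or cross-validate while fitting (10 folds by default)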
Properties Cross-Validation Properties CrossValidatedModel — Cross-validated model name 'GAM' This property is read-only. Cross-validated model name, specified as 'GAM'. KFold — Number of cross-validated folds positive integer 35-706
This property is read-only. Number of cross-validated folds, specified as a positive integer. Data Types: double ModelParameters — Cross-validation parameter values object This property is read-only. Cross-validation parameter values, specified as an object. The parameter values correspond to the values of the name-value arguments used to cross-validate the generalized additive model. ModelParameters does not contain estimated parameters. You can access the properties of ModelParameters using dot notation. Partition — Data partition cvpartition model This property is read-only. Data partition indicating how the software splits the data into cross-validation folds, specified as a cvpartition model. Trained — Compact classifiers trained on cross-validation folds cell array of CompactClassificationGAM models This property is read-only. Compact classifiers trained on cross-validation folds, specified as a cell array of CompactClassificationGAM model objects. Trained has k cells, where k is the number of folds. Data Types: cell Other Classification Properties CategoricalPredictors — Categorical predictor indices vector of positive integers | [] This property is read-only. Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]). Data Types: double ClassNames — Unique class labels categorical array | character array | logical vector | numeric vector | cell array of character vectors This property is read-only. Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as the class labels Y. 35-707
(The software treats string arrays as cell arrays of character vectors.) ClassNames also determines the class order. Data Types: single | double | logical | char | cell | categorical Cost — Misclassification costs 2-by-2 numeric matrix Misclassification costs, specified as a 2-by-2 numeric matrix. Cost(i,j) is the cost of classifying a point into class j if its true class is i. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The software uses the Cost value for prediction, but not training. You can change the value by using dot notation. Example: Mdl.Cost = C; Data Types: double NumObservations — Number of observations numeric scalar This property is read-only. Number of observations in the training data stored in X and Y, specified as a numeric scalar. Data Types: double PredictorNames — Predictor variable names cell array of character vectors This property is read-only. Predictor variable names, specified as a cell array of character vectors. The order of the elements in PredictorNames corresponds to the order in which the predictor names appear in the training data. Data Types: cell Prior — Prior class probabilities numeric vector This property is read-only. Prior class probabilities, specified as a numeric vector with two elements. The order of the elements corresponds to the order of the elements in ClassNames. Data Types: double ResponseName — Response variable name character vector This property is read-only. Response variable name, specified as a character vector. Data Types: char 35-708
ScoreTransform — Score transformation character vector | function handle Score transformation, specified as a character vector or function handle. ScoreTransform represents a built-in transformation function or a function handle for transforming predicted classification scores. To change the score transformation function to function, for example, use dot notation. • For a built-in function, enter a character vector. Mdl.ScoreTransform = 'function';
This list describes the available built-in functions.
• 'doublelogit': 1/(1 + e^(–2x))
• 'invlogit': log(x / (1 – x))
• 'ismax': Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
• 'logit': 1/(1 + e^(–x))
• 'none' or 'identity': x (no transformation)
• 'sign': –1 for x < 0; 0 for x = 0; 1 for x > 0
• 'symmetric': 2x – 1
• 'symmetricismax': Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
• 'symmetriclogit': 2/(1 + e^(–x)) – 1
• For a MATLAB function or a function that you define, enter its function handle. Mdl.ScoreTransform = @function;
function must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores). This property determines the output score computation for object functions such as kfoldPredict, kfoldMargin, and kfoldEdge. Use 'logit' to compute posterior probabilities, and use 'none' to compute the logit of posterior probabilities. Data Types: char | function_handle W — Observation weights numeric vector This property is read-only. Observation weights used to train the model, specified as an n-by-1 numeric vector. n is the number of observations (NumObservations). The software normalizes the observation weights specified in the 'Weights' name-value argument so that the elements of W within a particular class sum up to the prior probability of that class. 35-709
Data Types: double X — Predictors numeric matrix | table This property is read-only. Predictors used to cross-validate the model, specified as a numeric matrix or table. Each row of X corresponds to one observation, and each column corresponds to one variable. Data Types: single | double | table Y — Class labels categorical array | character array | logical vector | numeric vector | cell array of character vectors This property is read-only. Class labels used to cross-validate the model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Y has the same data type as the response variable used to train the model. (The software treats string arrays as cell arrays of character vectors.) Each row of Y represents the observed classification of the corresponding row of X. Data Types: single | double | logical | char | cell | categorical
Object Functions
kfoldPredict: Classify observations in cross-validated classification model
kfoldLoss: Classification loss for cross-validated classification model
kfoldMargin: Classification margins for cross-validated classification model
kfoldEdge: Classification edge for cross-validated classification model
kfoldfun: Cross-validate function for classification
Examples Create Cross-Validated GAM Using fitcgam Train a cross-validated GAM with 10 folds, which is the default cross-validation option, by using fitcgam. Then, use kfoldPredict to predict class labels for validation-fold observations using a model trained on training-fold observations. Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g'). load ionosphere
Create a cross-validated GAM by using the default cross-validation option. Specify the 'CrossVal' name-value argument as 'on'. rng('default') % For reproducibility CVMdl = fitcgam(X,Y,'CrossVal','on') CVMdl = ClassificationPartitionedGAM
    CrossValidatedModel: 'GAM'
         PredictorNames: {'x1' 'x2' 'x3' 'x4' 'x5' 'x6' 'x7' 'x8' 'x9' 'x10' 'x11' ...}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
      NumTrainedPerFold: [1x1 struct]
             ClassNames: {'b' 'g'}
         ScoreTransform: 'logit'
The fitcgam function creates a ClassificationPartitionedGAM model object CVMdl with 10 folds. During cross-validation, the software completes these steps: 1
1. Randomly partition the data into 10 sets.
2. For each set, reserve the set as validation data, and train the model using the other 9 sets.
3. Store the 10 compact, trained models in a 10-by-1 cell vector in the Trained property of the cross-validated model object ClassificationPartitionedGAM.
You can override the default cross-validation setting by using the 'CVPartition', 'Holdout', 'KFold', or 'Leaveout' name-value argument. Classify the observations in X by using kfoldPredict. The function predicts class labels for every observation using the model trained without that observation. label = kfoldPredict(CVMdl);
Create a confusion matrix to compare the true classes of the observations to their predicted labels. C = confusionchart(Y,label);
Compute the classification error. L = kfoldLoss(CVMdl) L = 0.0712
The average misclassification rate over 10 folds is about 7%.
Create Cross-Validated GAM Using crossval Train a GAM by using fitcgam, and create a cross-validated GAM by using crossval and the holdout option. Then, use kfoldPredict to predict responses for validation-fold observations using a model trained on training-fold observations. Load the 1994 census data stored in census1994.mat. The data set consists of demographic data from the US Census Bureau to predict whether an individual makes over $50,000 per year. The classification task is to fit a model that predicts the salary category of people given their age, working class, education level, marital status, race, and so on. load census1994
census1994 contains the training data set adultdata and the test data set adulttest. To reduce the running time for this example, subsample 500 training observations from adultdata by using the datasample function. 35-712
rng('default') NumSamples = 5e2; adultdata = datasample(adultdata,NumSamples,'Replace',false);
Train a GAM that contains both linear and interaction terms for predictors. Specify to include all available interaction terms whose p-values are not greater than 0.05. Mdl = fitcgam(adultdata,'salary','Interactions','all','MaxPValue',0.05);
Mdl is a ClassificationGAM model object. Cross-validate the model by specifying a 30% holdout sample.
CVMdl = crossval(Mdl,'Holdout',0.3)
CVMdl = 
  ClassificationPartitionedGAM
        CrossValidatedModel: 'GAM'
             PredictorNames: {'age' 'workClass' 'fnlwgt' 'education' 'education_num' ...}
      CategoricalPredictors: [2 4 6 7 8 9 10 14]
               ResponseName: 'salary'
            NumObservations: 500
                      KFold: 1
                  Partition: [1x1 cvpartition]
          NumTrainedPerFold: [1x1 struct]
                 ClassNames: [<=50K    >50K]
             ScoreTransform: 'logit'
The crossval function creates a ClassificationPartitionedGAM model object CVMdl with the holdout option. During cross-validation, the software completes these steps: 1
1. Randomly select and reserve 30% of the data as validation data, and train the model using the rest of the data.
2. Store the compact, trained model in the Trained property of the cross-validated model object ClassificationPartitionedGAM.
You can choose a different cross-validation setting by using the 'CrossVal', 'CVPartition', 'KFold', or 'Leaveout' name-value argument. Classify the validation-fold observations by using kfoldPredict. The function predicts class labels for the validation-fold observations by using the model trained on the training-fold observations. The function assigns the most frequently predicted label to the training-fold observations. [labels,scores] = kfoldPredict(CVMdl);
Find the validation-fold observations. kfoldPredict returns 0 scores for both classes for the training-fold observations. Therefore, you can identify the validation-fold observations by finding the observations whose scores are all zeros. idx = find(sum(abs(scores),2)~=0);
Create a confusion matrix to compare the true classes of the observations to their predicted labels, and compute the classification error for the validation-fold observations. C = confusionchart(adultdata.salary(idx),labels(idx));
L = kfoldLoss(CVMdl) L = 0.1800
Find Optimal Number of Trees for GAM Using kfoldLoss Train a cross-validated generalized additive model (GAM) with 10 folds. Then, use kfoldLoss to compute cumulative cross-validation classification errors (misclassification rate in decimal). Use the errors to determine the optimal number of trees per predictor (linear term for predictor) and the optimal number of trees per interaction term. Alternatively, you can find optimal values of fitcgam name-value arguments by using the “OptimizeHyperparameters” on page 35-0 name-value argument. For an example, see “Optimize GAM Using OptimizeHyperparameters” on page 35-2296. Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g'). load ionosphere
Create a cross-validated GAM by using the default cross-validation option. Specify the 'CrossVal' name-value argument as 'on'. Specify to include all available interaction terms whose p-values are not greater than 0.05. 35-714
rng('default') % For reproducibility CVMdl = fitcgam(X,Y,'CrossVal','on','Interactions','all','MaxPValue',0.05);
If you specify 'Mode' as 'cumulative' for kfoldLoss, then the function returns cumulative errors, which are the average errors across all folds obtained using the same number of trees for each fold. Display the number of trees for each fold. CVMdl.NumTrainedPerFold ans = struct with fields: PredictorTrees: [65 64 59 61 60 66 65 62 64 61] InteractionTrees: [1 2 2 2 2 1 2 2 2 2]
kfoldLoss can compute cumulative errors using up to 59 predictor trees and one interaction tree. Plot the cumulative, 10-fold cross-validated, classification error (misclassification rate in decimal). Specify 'IncludeInteractions' as false to exclude interaction terms from the computation. L_noInteractions = kfoldLoss(CVMdl,'Mode','cumulative','IncludeInteractions',false); figure plot(0:min(CVMdl.NumTrainedPerFold.PredictorTrees),L_noInteractions)
The first element of L_noInteractions is the average error over all folds obtained using only the intercept (constant) term. The (J+1)th element of L_noInteractions is the average error obtained using the intercept term and the first J predictor trees per linear term. Plotting the cumulative loss allows you to monitor how the error changes as the number of predictor trees in GAM increases. 35-715
Find the minimum error and the number of predictor trees used to achieve the minimum error. [M,I] = min(L_noInteractions) M = 0.0655 I = 23
The GAM achieves the minimum error when it includes 22 predictor trees. Compute the cumulative classification error using both linear terms and interaction terms. L = kfoldLoss(CVMdl,'Mode','cumulative') L = 2×1 0.0712 0.0712
The first element of L is the average error over all folds obtained using the intercept (constant) term and all predictor trees per linear term. The second element of L is the average error obtained using the intercept term, all predictor trees per linear term, and one interaction tree per interaction term. The error does not decrease when interaction terms are added. If you are satisfied with the error when the number of predictor trees is 22, you can create a predictive model by training the univariate GAM again and specifying 'NumTreesPerPredictor',22 without cross-validation.
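As a sketch of that last step (assuming X and Y from the ionosphere example are still in the workspace), the final predictive model can be trained without cross-validation as follows:
MdlFinal = fitcgam(X,Y,'NumTreesPerPredictor',22); % univariate GAM with 22 trees per linear term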
Version History Introduced in R2021a
See Also ClassificationGAM | crossval Topics “Train Generalized Additive Model for Binary Classification” on page 12-77
ClassificationPartitionedKernel Cross-validated, binary kernel classification model
Description ClassificationPartitionedKernel is a binary kernel classification model, trained on crossvalidated folds. You can estimate the quality of classification, or how well the kernel classification model generalizes, using one or more “kfold” functions: kfoldPredict, kfoldLoss, kfoldMargin, and kfoldEdge. Every “kfold” method uses models trained on training-fold (in-fold) observations to predict the response for validation-fold (out-of-fold) observations. For example, suppose that you cross-validate using five folds. In this case, the software randomly assigns each observation into five groups of equal size (roughly). The training fold contains four of the groups (that is, roughly 4/5 of the data) and the validation fold contains the other group (that is, roughly 1/5 of the data). In this case, cross-validation proceeds as follows: 1
1. The software trains the first model (stored in CVMdl.Trained{1}) by using the observations in the last four groups and reserves the observations in the first group for validation.
2. The software trains the second model (stored in CVMdl.Trained{2}) using the observations in the first group and the last three groups. The software reserves the observations in the second group for validation.
3. The software proceeds in a similar fashion for the third, fourth, and fifth models.
If you validate by using kfoldPredict, the software computes predictions for the observations in group i by using the ith model. In short, the software estimates a response for every observation by using the model trained without that observation. Note ClassificationPartitionedKernel model objects do not store the predictor data set.
Creation You can create a ClassificationPartitionedKernel model by training a classification kernel model using fitckernel and specifying one of these name-value pair arguments: 'Crossval', 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.
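A minimal sketch, assuming predictor data X and labels Y are in the workspace (for example, from load ionosphere); consistent with the creation description above, the cross-validation option is specified directly in the fitckernel call:
CVMdl = fitckernel(X,Y,'CrossVal','on'); % 10-fold cross-validated kernel classifier
% CVMdl = fitckernel(X,Y,'KFold',5);     % or specify the number of folds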
Properties Cross-Validation Properties CrossValidatedModel — Cross-validated model name character vector This property is read-only. Cross-validated model name, specified as a character vector. 35-717
For example, 'Kernel' specifies a cross-validated kernel model. Data Types: char KFold — Number of cross-validated folds positive integer scalar This property is read-only. Number of cross-validated folds, specified as a positive integer scalar. Data Types: double ModelParameters — Cross-validation parameter values object This property is read-only. Cross-validation parameter values, specified as an object. The parameter values correspond to the name-value pair argument values used to cross-validate the kernel classifier. ModelParameters does not contain estimated parameters. You can access the properties of ModelParameters using dot notation. NumObservations — Number of observations positive numeric scalar This property is read-only. Number of observations in the training data, specified as a positive numeric scalar. Data Types: double Partition — Data partition cvpartition model This property is read-only. Data partition indicating how the software splits the data into cross-validation folds, specified as a cvpartition model. Trained — Kernel classifiers trained on cross-validation folds cell array of ClassificationKernel models This property is read-only. Kernel classifiers trained on cross-validation folds, specified as a cell array of ClassificationKernel models. Trained has k cells, where k is the number of folds. Data Types: cell W — Observation weights numeric vector This property is read-only. Observation weights used to cross-validate the model, specified as a numeric vector. W has NumObservations elements. 35-718
The software normalizes the weights used for training so that sum(W,'omitnan') is 1. Data Types: single | double Y — Observed class labels categorical array | character array | logical vector | numeric vector | cell array of character vectors This property is read-only. Observed class labels used to cross-validate the model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Y has NumObservations elements and has the same data type as the input argument Y that you pass to fitckernel to cross-validate the model. (The software treats string arrays as cell arrays of character vectors.) Each row of Y represents the observed classification of the corresponding row of X. Data Types: categorical | char | logical | single | double | cell Other Classification Properties CategoricalPredictors — Categorical predictor indices vector of positive integers | [] This property is read-only. Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]). Data Types: single | double ClassNames — Unique class labels categorical array | character array | logical vector | numeric vector | cell array of character vectors This property is read-only. Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as the observed class labels property Y and determines the class order. Data Types: categorical | char | logical | single | double | cell Cost — Misclassification costs square numeric matrix This property is read-only. Misclassification costs, specified as a square numeric matrix. Cost has K rows and columns, where K is the number of classes. Cost(i,j) is the cost of classifying a point into class j if its true class is i. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. Data Types: double PredictorNames — Predictor names cell array of character vectors 35-719
This property is read-only. Predictor names in order of their appearance in the predictor data, specified as a cell array of character vectors. The length of PredictorNames is equal to the number of columns used as predictor variables in the training data X or Tbl. Data Types: cell Prior — Prior class probabilities numeric vector This property is read-only. Prior class probabilities, specified as a numeric vector. Prior has as many elements as there are classes in ClassNames, and the order of the elements corresponds to the elements of ClassNames. Data Types: double ResponseName — Response variable name character vector This property is read-only. Response variable name, specified as a character vector. Data Types: char ScoreTransform — Score transformation function 'doublelogit' | 'invlogit' | 'ismax' | 'logit' | 'none' | function handle | ... Score transformation function to apply to predicted scores, specified as a function name or function handle. For a kernel classification model Mdl, and before the score transformation, the predicted classification score for the observation x (row vector) is f(x) = T(x)β + b, where:
• T(·) is a transformation of an observation for feature expansion.
• β is the estimated column vector of coefficients.
• b is the estimated scalar bias.
To change the CVMdl score transformation function to function, for example, use dot notation.
• For a built-in function, enter this code and replace function with a value from the table. CVMdl.ScoreTransform = 'function';
"doublelogit" — 1/(1 + e–2x)
"invlogit" — log(x / (1 – x))
"ismax" — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
"logit" — 1/(1 + e–x)
"none" or "identity" — x (no transformation)
"sign" — –1 for x < 0; 0 for x = 0; 1 for x > 0
"symmetric" — 2x – 1
"symmetricismax" — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
"symmetriclogit" — 2/(1 + e–x) – 1
• For a MATLAB function or a function that you define, enter its function handle. CVMdl.ScoreTransform = @function;
function must accept a matrix of the original scores for each class, and then return a matrix of the same size representing the transformed scores for each class. Data Types: char | function_handle
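For instance, a minimal sketch of a custom transformation; the handle name mySquash is hypothetical and not part of the toolbox:

% Define a transformation that maps scores through a scaled logistic curve.
% The function receives the full matrix of class scores and must return a
% matrix of the same size.
mySquash = @(S) 1./(1 + exp(-2*S));

% Assign the handle to the cross-validated model; the "kfold" functions
% then apply it to the predicted scores.
CVMdl.ScoreTransform = mySquash;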
Object Functions
kfoldEdge — Classification edge for cross-validated kernel classification model
kfoldLoss — Classification loss for cross-validated kernel classification model
kfoldMargin — Classification margins for cross-validated kernel classification model
kfoldPredict — Classify observations in cross-validated kernel classification model
Examples
Cross-Validate Kernel Classification Model
Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g').

load ionosphere
rng('default') % For reproducibility

Cross-validate a binary kernel classification model. By default, the software uses 10-fold cross-validation.

CVMdl = fitckernel(X,Y,'CrossVal','on')

CVMdl =
  ClassificationPartitionedKernel
    CrossValidatedModel: 'Kernel'
    ResponseName: 'Y'
    NumObservations: 351
    KFold: 10
    Partition: [1x1 cvpartition]
    ClassNames: {'b' 'g'}
    ScoreTransform: 'none'
numel(CVMdl.Trained)
ans = 10
CVMdl is a ClassificationPartitionedKernel model. Because fitckernel implements 10-fold cross-validation, CVMdl contains 10 ClassificationKernel models that the software trains on training-fold (in-fold) observations.

Estimate the cross-validated classification error.

kfoldLoss(CVMdl)

ans = 0.0940
The classification error rate is approximately 9%.
Version History
Introduced in R2018b

R2022a: Cost property stores the user-specified cost matrix
Behavior changed in R2022a
Starting in R2022a, the Cost property stores the user-specified cost matrix, so that you can compute the observed misclassification cost using the specified cost value. The software stores normalized prior probabilities (Prior) and observation weights (W) that do not reflect the penalties described in the cost matrix. To compute the observed misclassification cost, specify the LossFun name-value argument as "classifcost" when you call the kfoldLoss function.
Note that model training has not changed and, therefore, the decision boundaries between classes have not changed. For training, the fitting function updates the specified prior probabilities by incorporating the penalties described in the specified cost matrix, and then normalizes the prior probabilities and observation weights. This behavior has not changed. In previous releases, the software stored the default cost matrix in the Cost property and stored the prior probabilities and observation weights used for training in the Prior and W properties, respectively. Starting in R2022a, the software stores the user-specified cost matrix without modification, and stores normalized prior probabilities and observation weights that do not reflect the cost penalties. For more details, see “Misclassification Cost Matrix, Prior Probabilities, and Observation Weights” on page 19-8.
Some object functions use the Cost and W properties:
• The kfoldLoss function uses the cost matrix stored in the Cost property if you specify the LossFun name-value argument as "classifcost" or "mincost".
• The kfoldLoss and kfoldEdge functions use the observation weights stored in the W property.
If you specify a nondefault cost matrix when you train a classification model, the object functions return a different value compared to previous releases. If you want the software to handle the cost matrix, prior probabilities, and observation weights in the same way as in previous releases, adjust the prior probabilities and observation weights for the nondefault cost matrix, as described in “Adjust Prior Probabilities and Observation Weights for Misclassification Cost Matrix” on page 19-9. Then, when you train a classification model, specify the adjusted prior probabilities and observation weights by using the Prior and Weights name-value arguments, respectively, and use the default cost matrix.
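As a brief, hedged illustration of this behavior (X, Y, and the cost values are placeholders chosen only for the example):

% Train with a nondefault misclassification cost matrix.
C = [0 1; 5 0];                               % penalize one error type more heavily
CVMdl = fitckernel(X,Y,'KFold',5,'Cost',C);

% CVMdl.Cost stores C as specified, while Prior and W stay normalized.
% To evaluate the observed misclassification cost, request "classifcost".
obsCost = kfoldLoss(CVMdl,'LossFun','classifcost');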
See Also fitckernel | ClassificationKernel
ClassificationPartitionedKernelECOC Cross-validated kernel error-correcting output codes (ECOC) model for multiclass classification
Description
ClassificationPartitionedKernelECOC is an error-correcting output codes (ECOC) model composed of kernel classification models, trained on cross-validated folds. Estimate the quality of the classification by cross-validation using one or more “kfold” functions: kfoldPredict, kfoldLoss, kfoldMargin, and kfoldEdge.
Every “kfold” method uses models trained on training-fold (in-fold) observations to predict the response for validation-fold (out-of-fold) observations. For example, suppose that you cross-validate using five folds. In this case, the software randomly assigns each observation into five groups of equal size (roughly). The training fold contains four of the groups (that is, roughly 4/5 of the data) and the validation fold contains the other group (that is, roughly 1/5 of the data). In this case, cross-validation proceeds as follows:
1. The software trains the first model (stored in CVMdl.Trained{1}) by using the observations in the last four groups and reserves the observations in the first group for validation.
2. The software trains the second model (stored in CVMdl.Trained{2}) using the observations in the first group and the last three groups. The software reserves the observations in the second group for validation.
3. The software proceeds in a similar fashion for the third, fourth, and fifth models.
If you validate by using kfoldPredict, the software computes predictions for the observations in group i by using the ith model. In short, the software estimates a response for every observation by using the model trained without that observation. Note ClassificationPartitionedKernelECOC model objects do not store the predictor data set.
Creation You can create a ClassificationPartitionedKernelECOC model by training an ECOC model using fitcecoc and specifying these name-value pair arguments: • 'Learners'– Set the value to 'kernel', a template object returned by templateKernel, or a cell array of such template objects. • One of the arguments 'CrossVal', 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'. For more details, see fitcecoc.
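A minimal sketch of that recipe, with X and Y standing in for your own multiclass data:

% Build a kernel learner template, then cross-validate an ECOC model with it.
t = templateKernel('Learner','svm');            % kernel binary learners
CVMdl = fitcecoc(X,Y,'Learners',t,'KFold',5);   % 5-fold cross-validation

% CVMdl is a ClassificationPartitionedKernelECOC object; estimate its error.
err = kfoldLoss(CVMdl);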
Properties Cross-Validation Properties CrossValidatedModel — Cross-validated model name character vector This property is read-only. Cross-validated model name, specified as a character vector. For example, 'KernelECOC' specifies a cross-validated kernel ECOC model. Data Types: char KFold — Number of cross-validated folds positive integer scalar This property is read-only. Number of cross-validated folds, specified as a positive integer scalar. Data Types: double ModelParameters — Cross-validation parameter values object This property is read-only. Cross-validation parameter values, specified as an object. The parameter values correspond to the name-value pair argument values used to cross-validate the ECOC classifier. ModelParameters does not contain estimated parameters. You can access the properties of ModelParameters using dot notation. NumObservations — Number of observations positive numeric scalar This property is read-only. Number of observations in the training data, specified as a positive numeric scalar. Data Types: double Partition — Data partition cvpartition model This property is read-only. Data partition indicating how the software splits the data into cross-validation folds, specified as a cvpartition model. Trained — Compact classifiers trained on cross-validation folds cell array of CompactClassificationECOC models This property is read-only. 35-725
Compact classifiers trained on cross-validation folds, specified as a cell array of CompactClassificationECOC models. Trained has k cells, where k is the number of folds. Data Types: cell W — Observation weights numeric vector This property is read-only. Observation weights used to cross-validate the model, specified as a numeric vector. W has NumObservations elements. The software normalizes the weights used for training so that sum(W,'omitnan') is 1. Data Types: single | double Y — Observed class labels categorical array | character array | logical vector | numeric vector | cell array of character vectors This property is read-only. Observed class labels used to cross-validate the model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Y has NumObservations elements and has the same data type as the input argument Y that you pass to fitcecoc to cross-validate the model. (The software treats string arrays as cell arrays of character vectors.) Each row of Y represents the observed classification of the corresponding row of the predictor data. Data Types: categorical | char | logical | single | double | cell ECOC Properties BinaryLoss — Binary learner loss function 'hinge' | 'quadratic' This property is read-only. Binary learner loss function, specified as a character vector representing the loss function name. By default, if all binary learners are kernel classification models using SVM, then BinaryLoss is 'hinge'. If all binary learners are kernel classification models using logistic regression, then BinaryLoss is 'quadratic'. To potentially increase accuracy, specify a binary loss function other than the default during a prediction or loss computation by using the BinaryLoss name-value argument of kfoldPredict or kfoldLoss. For the list of supported binary loss functions, see “Binary Loss” on page 35-4407. Data Types: char BinaryY — Binary learner class labels numeric matrix | [] This property is read-only. Binary learner class labels, specified as a numeric matrix or []. 35-726
• If the coding matrix is the same across all folds, then BinaryY is a NumObservations-by-L matrix, where L is the number of binary learners (size(CodingMatrix,2)). The elements of BinaryY are –1, 0, and 1, and the values correspond to dichotomous class assignments. This table describes how learner j assigns observation k to a dichotomous class corresponding to the value of BinaryY(k,j).
–1 — Learner j assigns observation k to a negative class.
0 — Before training, learner j removes observation k from the data set.
1 — Learner j assigns observation k to a positive class.
• If the coding matrix varies across folds, then BinaryY is empty ([]). Data Types: double CodingMatrix — Codes specifying class assignments numeric matrix | [] This property is read-only. Codes specifying class assignments for the binary learners, specified as a numeric matrix or []. • If the coding matrix is the same across all folds, then CodingMatrix is a K-by-L matrix, where K is the number of classes and L is the number of binary learners. The elements of CodingMatrix are –1, 0, and 1, and the values correspond to dichotomous class assignments. This table describes how learner j assigns observations in class i to a dichotomous class corresponding to the value of CodingMatrix(i,j).
–1 — Learner j assigns observations in class i to a negative class.
0 — Before training, learner j removes observations in class i from the data set.
1 — Learner j assigns observations in class i to a positive class.
• If the coding matrix varies across folds, then CodingMatrix is empty ([]). You can obtain the coding matrix for each fold by using the Trained property. For example, CVMdl.Trained{1}.CodingMatrix is the coding matrix in the first fold of the cross-validated ECOC model CVMdl. Data Types: double | single | int8 | int16 | int32 | int64 Other Classification Properties CategoricalPredictors — Categorical predictor indices vector of positive integers | [] This property is read-only.
Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]). Data Types: single | double ClassNames — Unique class labels categorical array | character array | logical vector | numeric vector | cell array of character vectors This property is read-only. Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as the observed class labels property Y and determines the class order. Data Types: categorical | char | logical | single | double | cell Cost — Misclassification costs square numeric matrix This property is read-only. Misclassification costs, specified as a square numeric matrix. Cost has K rows and columns, where K is the number of classes. Cost(i,j) is the cost of classifying a point into class j if its true class is i. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. Data Types: double PredictorNames — Predictor names cell array of character vectors This property is read-only. Predictor names in order of their appearance in the predictor data, specified as a cell array of character vectors. The length of PredictorNames is equal to the number of columns used as predictor variables in the training data X or Tbl. Data Types: cell Prior — Prior class probabilities numeric vector This property is read-only. Prior class probabilities, specified as a numeric vector. Prior has as many elements as there are classes in ClassNames, and the order of the elements corresponds to the elements of ClassNames. Data Types: double ResponseName — Response variable name character vector This property is read-only. Response variable name, specified as a character vector. 35-728
Data Types: char ScoreTransform — Score transformation function to apply to predicted scores 'none' This property is read-only. Score transformation function to apply to the predicted scores, specified as 'none'. An ECOC model does not support score transformation.
Object Functions
kfoldEdge — Classification edge for cross-validated kernel ECOC model
kfoldLoss — Classification loss for cross-validated kernel ECOC model
kfoldMargin — Classification margins for cross-validated kernel ECOC model
kfoldPredict — Classify observations in cross-validated kernel ECOC model
Examples
Cross-Validate Multiclass Kernel Classification Model
Create a cross-validated, multiclass kernel ECOC classification model using fitcecoc.
Load Fisher's iris data set. X contains flower measurements, and Y contains the names of flower species.

load fisheriris
X = meas;
Y = species;

Cross-validate a multiclass kernel ECOC classification model that can identify the species of a flower based on the flower's measurements.

rng(1); % For reproducibility
CVMdl = fitcecoc(X,Y,'Learners','kernel','CrossVal','on')

CVMdl =
  ClassificationPartitionedKernelECOC
    CrossValidatedModel: 'KernelECOC'
    ResponseName: 'Y'
    NumObservations: 150
    KFold: 10
    Partition: [1x1 cvpartition]
    ClassNames: {'setosa' 'versicolor' 'virginica'}
    ScoreTransform: 'none'

CVMdl is a ClassificationPartitionedKernelECOC cross-validated model. fitcecoc implements 10-fold cross-validation by default. Therefore, CVMdl.Trained contains a 10-by-1 cell array of ten CompactClassificationECOC models, one for each fold. Each compact ECOC model is composed of binary kernel classification models.

Estimate the classification error by passing CVMdl to kfoldLoss.

error = kfoldLoss(CVMdl)
error = 0.0333
The estimated classification error is about 3% misclassified observations. To change default options when training ECOC models composed of kernel classification models, create a kernel classification model template using templateKernel, and then pass the template to fitcecoc.
Version History Introduced in R2018b
See Also fitcecoc | fitckernel | CompactClassificationECOC | ClassificationKernel
ClassificationPartitionedLinear Package: classreg.learning.partition Superclasses: ClassificationPartitionedModel Cross-validated linear model for binary classification of high-dimensional data
Description
ClassificationPartitionedLinear is a set of linear classification models trained on cross-validated folds. To obtain a cross-validated, linear classification model, use fitclinear and specify one of the cross-validation options. You can estimate the quality of classification, or how well the linear classification model generalizes, using one or more of these “kfold” methods: kfoldPredict, kfoldLoss, kfoldMargin, and kfoldEdge.
Every “kfold” method uses models trained on in-fold observations to predict the response for out-of-fold observations. For example, suppose that you cross-validate using five folds. In this case, the software randomly assigns each observation into five roughly equally sized groups. The training fold contains four of the groups (that is, roughly 4/5 of the data) and the test fold contains the other group (that is, roughly 1/5 of the data). In this case, cross-validation proceeds as follows:
1. The software trains the first model (stored in CVMdl.Trained{1}) using the observations in the last four groups and reserves the observations in the first group for validation.
2. The software trains the second model, which is stored in CVMdl.Trained{2}, using the observations in the first group and last three groups. The software reserves the observations in the second group for validation.
3. The software proceeds in a similar fashion for the third through fifth models.
If you validate by calling kfoldPredict, it computes predictions for the observations in group 1 using the first model, group 2 for the second model, and so on. In short, the software estimates a response for every observation using the model trained without that observation. Note ClassificationPartitionedLinear model objects do not store the predictor data set.
Construction CVMdl = fitclinear(X,Y,Name,Value) creates a cross-validated, linear classification model when Name is either 'CrossVal', 'CVPartition', 'Holdout', or 'KFold'. For more details, see fitclinear.
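For instance, a minimal holdout sketch, where X and Y stand in for your own high-dimensional predictor data and binary labels:

% Reserve 20% of the observations as a holdout validation set.
CVMdl = fitclinear(X,Y,'Holdout',0.2);

% With 'Holdout', Trained contains a single ClassificationLinear model
% fitted to the remaining 80% of the data.
Mdl = CVMdl.Trained{1};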
Properties Cross-Validation Properties
CrossValidatedModel — Cross-validated model name character vector Cross-validated model name, specified as a character vector.
For example, 'Linear' specifies a cross-validated linear model for binary classification or regression. Data Types: char KFold — Number of cross-validated folds positive integer Number of cross-validated folds, specified as a positive integer. Data Types: double ModelParameters — Cross-validation parameter values object Cross-validation parameter values, e.g., the name-value pair argument values used to cross-validate the linear model, specified as an object. ModelParameters does not contain estimated parameters. Access properties of ModelParameters using dot notation. NumObservations — Number of observations positive numeric scalar Number of observations in the training data, specified as a positive numeric scalar. Data Types: double Partition — Data partition cvpartition model Data partition indicating how the software splits the data into cross-validation folds, specified as a cvpartition model. Trained — Linear classification models trained on cross-validation folds cell array of ClassificationLinear model objects Linear classification models trained on cross-validation folds, specified as a cell array of ClassificationLinear models. Trained has k cells, where k is the number of folds. Data Types: cell W — Observation weights numeric vector Observation weights used to cross-validate the model, specified as a numeric vector. W has NumObservations elements. The software normalizes W so that the weights for observations within a particular class sum up to the prior probability of that class. Data Types: single | double Y — Observed class labels categorical array | character array | logical vector | vector of numeric values | cell array of character vectors Observed class labels used to cross-validate the model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Y has NumObservations elements, and 35-732
is the same data type as the input argument Y that you passed to fitclinear to cross-validate the model. (The software treats string arrays as cell arrays of character vectors.) Each row of Y represents the observed classification of the corresponding observation in the predictor data. Data Types: categorical | char | logical | single | double | cell Other Classification Properties
CategoricalPredictors — Categorical predictor indices vector of positive integers | [] Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]). Data Types: single | double ClassNames — Unique class labels categorical array | character array | logical vector | numeric vector | cell array of character vectors Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as the class labels Y. (The software treats string arrays as cell arrays of character vectors.) ClassNames also determines the class order. Data Types: categorical | char | logical | single | double | cell Cost — Misclassification costs square numeric matrix This property is read-only. Misclassification costs, specified as a square numeric matrix. Cost has K rows and columns, where K is the number of classes. Cost(i,j) is the cost of classifying a point into class j if its true class is i. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. Data Types: double PredictorNames — Predictor names cell array of character vectors Predictor names in order of their appearance in the predictor data, specified as a cell array of character vectors. The length of PredictorNames is equal to the number of variables in the training data X or Tbl used as predictor variables. Data Types: cell Prior — Prior class probabilities numeric vector This property is read-only. 35-733
Prior class probabilities, specified as a numeric vector. Prior has as many elements as classes in ClassNames, and the order of the elements corresponds to the elements of ClassNames. Data Types: double ResponseName — Response variable name character vector Response variable name, specified as a character vector. Data Types: char ScoreTransform — Score transformation function 'doublelogit' | 'invlogit' | 'ismax' | 'logit' | 'none' | function handle | ... Score transformation function to apply to predicted scores, specified as a function name or function handle. For linear classification models and before transformation, the predicted classification score for the observation x (row vector) is f(x) = xβ + b, where β and b correspond to Mdl.Beta and Mdl.Bias, respectively. To change the score transformation function to, for example, function, use dot notation. • For a built-in function, enter this code and replace function with a value in the table. Mdl.ScoreTransform = 'function';
"doublelogit" — 1/(1 + e–2x)
"invlogit" — log(x / (1 – x))
"ismax" — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
"logit" — 1/(1 + e–x)
"none" or "identity" — x (no transformation)
"sign" — –1 for x < 0; 0 for x = 0; 1 for x > 0
"symmetric" — 2x – 1
"symmetricismax" — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
"symmetriclogit" — 2/(1 + e–x) – 1
• For a MATLAB function, or a function that you define, enter its function handle. Mdl.ScoreTransform = @function;
function must accept a matrix of the original scores for each class, and then return a matrix of the same size representing the transformed scores for each class. Data Types: char | function_handle
Methods
kfoldEdge — Classification edge for observations not used for training
kfoldLoss — Classification loss for observations not used in training
kfoldMargin — Classification margins for observations not used in training
kfoldPredict — Predict labels for observations not used for training
Copy Semantics Value. To learn how value classes affect copy operations, see Copying Objects.
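In practice, value semantics means that an assignment produces an independent copy. A small sketch, assuming CVMdl already exists in the workspace:

% Copy the cross-validated model and modify only the copy.
CVMdl2 = CVMdl;
CVMdl2.ScoreTransform = 'logit';

% The original object is unchanged because the copy is by value.
CVMdl.ScoreTransform    % still reports the original transformation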
Examples
Create Cross-Validated Binary Linear Classification Model
Load the NLP data set.

load nlpdata

X is a sparse matrix of predictor data, and Y is a categorical vector of class labels. There are more than two classes in the data.
Identify the labels that correspond to the Statistics and Machine Learning Toolbox™ documentation web pages.

Ystats = Y == 'stats';

Cross-validate a binary, linear classification model that can identify whether the word counts in a documentation web page are from the Statistics and Machine Learning Toolbox™ documentation.

rng(1); % For reproducibility
CVMdl = fitclinear(X,Ystats,'CrossVal','on')

CVMdl =
  ClassificationPartitionedLinear
    CrossValidatedModel: 'Linear'
    ResponseName: 'Y'
    NumObservations: 31572
    KFold: 10
    Partition: [1x1 cvpartition]
    ClassNames: [0 1]
    ScoreTransform: 'none'
CVMdl is a ClassificationPartitionedLinear cross-validated model. Because fitclinear implements 10-fold cross-validation by default, CVMdl.Trained contains ten ClassificationLinear models that contain the results of training linear classification models for each of the folds. Estimate labels for out-of-fold observations and estimate the generalization error by passing CVMdl to kfoldPredict and kfoldLoss, respectively.
oofLabels = kfoldPredict(CVMdl);
ge = kfoldLoss(CVMdl)

ge = 7.6017e-04
The estimated generalization error is less than 0.1% misclassified observations.
Find Good Lasso Penalty Using Cross-Validation
To determine a good lasso-penalty strength for a linear classification model that uses a logistic regression learner, implement 5-fold cross-validation.
Load the NLP data set.

load nlpdata
X is a sparse matrix of predictor data, and Y is a categorical vector of class labels. There are more than two classes in the data. The models should identify whether the word counts in a web page are from the Statistics and Machine Learning Toolbox™ documentation. So, identify the labels that correspond to the Statistics and Machine Learning Toolbox™ documentation web pages.

Ystats = Y == 'stats';

Create a set of 11 logarithmically-spaced regularization strengths from 10^−6 through 10^−0.5.
Lambda = logspace(-6,-0.5,11);
Cross-validate the models. To increase execution speed, transpose the predictor data and specify that the observations are in columns. Estimate the coefficients using SpaRSA. Lower the tolerance on the gradient of the objective function to 1e-8. X = X'; rng(10); % For reproducibility CVMdl = fitclinear(X,Ystats,'ObservationsIn','columns','KFold',5,... 'Learner','logistic','Solver','sparsa','Regularization','lasso',... 'Lambda',Lambda,'GradientTolerance',1e-8) CVMdl = ClassificationPartitionedLinear CrossValidatedModel: 'Linear' ResponseName: 'Y' NumObservations: 31572 KFold: 5 Partition: [1x1 cvpartition] ClassNames: [0 1] ScoreTransform: 'none'
numCLModels = numel(CVMdl.Trained) numCLModels = 5
35-736
ClassificationPartitionedLinear
CVMdl is a ClassificationPartitionedLinear model. Because fitclinear implements 5-fold cross-validation, CVMdl contains 5 ClassificationLinear models that the software trains on each fold. Display the first trained linear classification model. Mdl1 = CVMdl.Trained{1}
Mdl1 = ClassificationLinear ResponseName: 'Y' ClassNames: [0 1] ScoreTransform: 'logit' Beta: [34023x11 double] Bias: [-13.1654 -13.1654 -13.1654 -13.1654 -9.2347 -7.0908 -5.4827 -4.5396 -3.5274 Lambda: [1.0000e-06 3.5481e-06 1.2589e-05 4.4668e-05 1.5849e-04 5.6234e-04 0.0020 0.0 Learner: 'logistic'
Mdl1 is a ClassificationLinear model object. fitclinear constructed Mdl1 by training on the first four folds. Because Lambda is a sequence of regularization strengths, you can think of Mdl1 as 11 models, one for each regularization strength in Lambda. Estimate the cross-validated classification error. ce = kfoldLoss(CVMdl);
Because there are 11 regularization strengths, ce is a 1-by-11 vector of classification error rates. Higher values of Lambda lead to predictor variable sparsity, which is a good quality of a classifier. For each regularization strength, train a linear classification model using the entire data set and the same options as when you cross-validated the models. Determine the number of nonzero coefficients per model. Mdl = fitclinear(X,Ystats,'ObservationsIn','columns',... 'Learner','logistic','Solver','sparsa','Regularization','lasso',... 'Lambda',Lambda,'GradientTolerance',1e-8); numNZCoeff = sum(Mdl.Beta~=0);
In the same figure, plot the cross-validated, classification error rates and frequency of nonzero coefficients for each regularization strength. Plot all variables on the log scale. figure; [h,hL1,hL2] = plotyy(log10(Lambda),log10(ce),... log10(Lambda),log10(numNZCoeff)); hL1.Marker = 'o'; hL2.Marker = 'o'; ylabel(h(1),'log_{10} classification error') ylabel(h(2),'log_{10} nonzero-coefficient frequency') xlabel('log_{10} Lambda') title('Test-Sample Statistics') hold off
35-737
35
Functions
Choose the index of the regularization strength that balances predictor variable sparsity and low −4
classification error. In this case, a value between 10
−1
to 10
should suffice.
idxFinal = 7;
Select the model from Mdl with the chosen regularization strength. MdlFinal = selectModels(Mdl,idxFinal);
MdlFinal is a ClassificationLinear model containing one regularization strength. To estimate labels for new observations, pass MdlFinal and the new data to predict.
Version History Introduced in R2016a R2022a: Cost property stores the user-specified cost matrix Behavior changed in R2022a Starting in R2022a, the Cost property stores the user-specified cost matrix, so that you can compute the observed misclassification cost using the specified cost value. The software stores normalized prior probabilities (Prior) and observation weights (W) that do not reflect the penalties described in the cost matrix. To compute the observed misclassification cost, specify the LossFun name-value argument as "classifcost" when you call the kfoldLoss function. 35-738
Note that model training has not changed and, therefore, the decision boundaries between classes have not changed. For training, the fitting function updates the specified prior probabilities by incorporating the penalties described in the specified cost matrix, and then normalizes the prior probabilities and observation weights. This behavior has not changed. In previous releases, the software stored the default cost matrix in the Cost property and stored the prior probabilities and observation weights used for training in the Prior and W properties, respectively. Starting in R2022a, the software stores the user-specified cost matrix without modification, and stores normalized prior probabilities and observation weights that do not reflect the cost penalties. For more details, see “Misclassification Cost Matrix, Prior Probabilities, and Observation Weights” on page 19-8. Some object functions use the Cost and W properties: • The kfoldLoss function uses the cost matrix stored in the Cost property if you specify the LossFun name-value argument as "classifcost" or "mincost". • The kfoldLoss and kfoldEdge functions use the observation weights stored in the W property. If you specify a nondefault cost matrix when you train a classification model, the object functions return a different value compared to previous releases. If you want the software to handle the cost matrix, prior probabilities, and observation weights in the same way as in previous releases, adjust the prior probabilities and observation weights for the nondefault cost matrix, as described in “Adjust Prior Probabilities and Observation Weights for Misclassification Cost Matrix” on page 19-9. Then, when you train a classification model, specify the adjusted prior probabilities and observation weights by using the Prior and Weights name-value arguments, respectively, and use the default cost matrix.
See Also ClassificationLinear | fitclinear | kfoldPredict | kfoldLoss
ClassificationPartitionedLinearECOC Package: classreg.learning.partition Superclasses: ClassificationPartitionedModel Cross-validated linear error-correcting output codes model for multiclass classification of high-dimensional data
Description
ClassificationPartitionedLinearECOC is a set of error-correcting output codes (ECOC) models composed of linear classification models, trained on cross-validated folds. Estimate the quality of classification by cross-validation using one or more “kfold” functions: kfoldPredict, kfoldLoss, kfoldMargin, and kfoldEdge.
Every “kfold” method uses models trained on in-fold observations to predict the response for out-of-fold observations. For example, suppose that you cross-validate using five folds. In this case, the software randomly assigns each observation into five roughly equal-sized groups. The training fold contains four of the groups (that is, roughly 4/5 of the data) and the test fold contains the other group (that is, roughly 1/5 of the data). In this case, cross-validation proceeds as follows.
1. The software trains the first model (stored in CVMdl.Trained{1}) using the observations in the last four groups and reserves the observations in the first group for validation.
2. The software trains the second model (stored in CVMdl.Trained{2}) using the observations in the first group and last three groups. The software reserves the observations in the second group for validation.
3. The software proceeds in a similar fashion for the third, fourth, and fifth models.
If you validate by calling kfoldPredict, it computes predictions for the observations in group 1 using the first model, group 2 for the second model, and so on. In short, the software estimates a response for every observation using the model trained without that observation. Note ClassificationPartitionedLinearECOC model objects do not store the predictor data set.
Construction CVMdl = fitcecoc(X,Y,'Learners',t,Name,Value) returns a cross-validated, linear ECOC model when: • t is 'Linear' or a template object returned by templateLinear. • Name is one of 'CrossVal', 'CVPartition', 'Holdout', or 'KFold'. For more details, see fitcecoc.
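A minimal sketch of that construction, with X and Y as placeholders for high-dimensional multiclass data:

% Create a linear-learner template, then cross-validate an ECOC model.
t = templateLinear('Learner','logistic');
CVMdl = fitcecoc(X,Y,'Learners',t,'KFold',5);

% CVMdl is a ClassificationPartitionedLinearECOC object; kfoldLoss
% reports the cross-validated classification error.
err = kfoldLoss(CVMdl);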
Properties Cross-Validation Properties
CrossValidatedModel — Cross-validated model name character vector Cross-validated model name, specified as a character vector. For example, 'ECOC' specifies a cross-validated ECOC model. Data Types: char KFold — Number of cross-validated folds positive integer Number of cross-validated folds, specified as a positive integer. Data Types: double ModelParameters — Cross-validation parameter values object Cross-validation parameter values, e.g., the name-value pair argument values used to cross-validate the ECOC classifier, specified as an object. ModelParameters does not contain estimated parameters. Access properties of ModelParameters using dot notation. NumObservations — Number of observations positive numeric scalar Number of observations in the training data, specified as a positive numeric scalar. Data Types: double Partition — Data partition cvpartition model Data partition indicating how the software splits the data into cross-validation folds, specified as a cvpartition model. Trained — Compact classifiers trained on cross-validation folds cell array of CompactClassificationECOC models Compact classifiers trained on cross-validation folds, specified as a cell array of CompactClassificationECOC models. Trained has k cells, where k is the number of folds. Data Types: cell W — Observation weights numeric vector Observation weights used to cross-validate the model, specified as a numeric vector. W has NumObservations elements. The software normalizes the weights used for training so that sum(W,'omitnan') is 1. 35-741
Data Types: single | double Y — Observed class labels categorical array | character array | logical vector | vector of numeric values | cell array of character vectors Observed class labels used to cross-validate the model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Y has NumObservations elements, and is the same data type as the input argument Y that you passed to fitcecoc to cross-validate the model. (The software treats string arrays as cell arrays of character vectors.) Each row of Y represents the observed classification of the observation in the predictor data. Data Types: char | cell | categorical | logical | single | double ECOC Properties
BinaryLoss — Binary learner loss function 'hinge' | 'quadratic' This property is read-only. Binary learner loss function, specified as a character vector representing the loss function name. By default, if all binary learners are linear classification models using SVM, then BinaryLoss is 'hinge'. If all binary learners are linear classification models using logistic regression, then BinaryLoss is 'quadratic'. To potentially increase accuracy, specify a binary loss function other than the default during a prediction or loss computation by using the BinaryLoss name-value argument of kfoldPredict or kfoldLoss. For the list of supported binary loss functions, see “Binary Loss” on page 35-4430. Data Types: char BinaryY — Binary learner class labels numeric matrix | [] Binary learner class labels, specified as a numeric matrix or []. • If the coding matrix is the same across folds, then BinaryY is a NumObservations-by-L matrix, where L is the number of binary learners (size(CodingMatrix,2)). Elements of BinaryY are -1, 0, or 1, and the value corresponds to a dichotomous class assignment. This table describes how learner j assigns observation k to a dichotomous class corresponding to the value of BinaryY(k,j). Value
–1 — Learner j assigns observation k to a negative class.
0 — Before training, learner j removes observation k from the data set.
1 — Learner j assigns observation k to a positive class.
• If the coding matrix varies across folds, then BinaryY is empty ([]).
Data Types: double CodingMatrix — Codes specifying class assignments numeric matrix | [] Codes specifying class assignments for the binary learners, specified as a numeric matrix or []. • If the coding matrix is the same across folds, then CodingMatrix is a K-by-L matrix. K is the number of classes and L is the number of binary learners. Elements of CodingMatrix are -1, 0, or 1, and the value corresponds to a dichotomous class assignment. This table describes how learner j assigns observations in class i to a dichotomous class corresponding to the value of CodingMatrix(i,j). Value
–1 — Learner j assigns observations in class i to a negative class.
0 — Before training, learner j removes observations in class i from the data set.
1 — Learner j assigns observations in class i to a positive class.
• If the coding matrix varies across folds, then CodingMatrix is empty ([]). Obtain the coding matrix for each fold using the Trained property. For example, CVMdl.Trained{1}.CodingMatrix is the coding matrix in the first fold of the cross-validated ECOC model CVMdl. Data Types: double | single | int8 | int16 | int32 | int64 Other Classification Properties
CategoricalPredictors — Categorical predictor indices vector of positive integers | [] Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]). Data Types: single | double ClassNames — Unique class labels categorical array | character array | logical vector | numeric vector | cell array of character vectors Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as the class labels Y. (The software treats string arrays as cell arrays of character vectors.) ClassNames also determines the class order. Data Types: categorical | char | logical | single | double | cell Cost — Misclassification costs square numeric matrix This property is read-only. Misclassification costs, specified as a square numeric matrix. Cost has K rows and columns, where K is the number of classes. 35-743
Cost(i,j) is the cost of classifying a point into class j if its true class is i. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. Data Types: double PredictorNames — Predictor names cell array of character vectors Predictor names in order of their appearance in the predictor data, specified as a cell array of character vectors. The length of PredictorNames is equal to the number of variables in the training data X or Tbl used as predictor variables. Data Types: cell Prior — Prior class probabilities numeric vector This property is read-only. Prior class probabilities, specified as a numeric vector. Prior has as many elements as the number of classes in ClassNames, and the order of the elements corresponds to the order of the classes in ClassNames. fitcecoc incorporates misclassification costs differently among different types of binary learners. Data Types: double ResponseName — Response variable name character vector Response variable name, specified as a character vector. Data Types: char ScoreTransform — Score transformation function to apply to predicted scores 'none' This property is read-only. Score transformation function to apply to the predicted scores, specified as 'none'. An ECOC model does not support score transformation.
Methods
kfoldEdge — Classification edge for observations not used for training
kfoldLoss — Classification loss for observations not used in training
kfoldMargin — Classification margins for observations not used in training
kfoldPredict — Predict labels for observations not used for training
Copy Semantics Value. To learn how value classes affect copy operations, see Copying Objects.
Examples Create Cross-Validated Multiclass Linear Classification Model Load the NLP data set. load nlpdata
X is a sparse matrix of predictor data, and Y is a categorical vector of class labels. Cross-validate a multiclass, linear classification model that can identify which MATLAB® toolbox a documentation web page is from based on counts of words on the page. rng(1); % For reproducibility CVMdl = fitcecoc(X,Y,'Learners','linear','CrossVal','on') CVMdl = ClassificationPartitionedLinearECOC CrossValidatedModel: 'LinearECOC' ResponseName: 'Y' NumObservations: 31572 KFold: 10 Partition: [1x1 cvpartition] ClassNames: [comm dsp ecoder ScoreTransform: 'none'
fixedpoint
hdlcoder
phased
CVMdl is a ClassificationPartitionedLinearECOC cross-validated model. Because fitcecoc implements 10-fold cross-validation by default, CVMdl.Trained contains a 10-by-1 cell vector of ten CompactClassificationECOC models that contain the results of training ECOC models composed of binary, linear classification models for each of the folds. Estimate labels for out-of-fold observations and estimate the generalization error by passing CVMdl to kfoldPredict and kfoldLoss, respectively. oofLabels = kfoldPredict(CVMdl); ge = kfoldLoss(CVMdl) ge = 0.0958
The estimated generalization error is about 10% misclassified observations. To improve generalization error, try specifying another solver, such as LBFGS. To change default options when training ECOC models composed of linear classification models, create a linear classification model template using templateLinear, and then pass the template to fitcecoc.
Find Good Lasso Penalty Using Cross-Validation To determine a good lasso-penalty strength for an ECOC model composed of linear classification models that use logistic regression learners, implement 5-fold cross-validation. Load the NLP data set. 35-745
physmod
35
Functions
load nlpdata
X is a sparse matrix of predictor data, and Y is a categorical vector of class labels. For simplicity, use the label 'others' for all observations in Y that are not 'simulink', 'dsp', or 'comm'. Y(~(ismember(Y,{'simulink','dsp','comm'}))) = 'others'; −7
Create a set of 11 logarithmically-spaced regularization strengths from 10
−2
through 10
.
Lambda = logspace(-7,-2,11);
Create a linear classification model template that specifies to use logistic regression learners, use lasso penalties with strengths in Lambda, train using SpaRSA, and lower the tolerance on the gradient of the objective function to 1e-8. t = templateLinear('Learner','logistic','Solver','sparsa',... 'Regularization','lasso','Lambda',Lambda,'GradientTolerance',1e-8);
Cross-validate the models. To increase execution speed, transpose the predictor data and specify that the observations are in columns. X = X'; rng(10); % For reproducibility CVMdl = fitcecoc(X,Y,'Learners',t,'ObservationsIn','columns','KFold',5);
CVMdl is a ClassificationPartitionedLinearECOC model. Dissect CVMdl, and each model within it. numECOCModels = numel(CVMdl.Trained) numECOCModels = 5 ECOCMdl1 = CVMdl.Trained{1} ECOCMdl1 = CompactClassificationECOC ResponseName: 'Y' ClassNames: [comm dsp ScoreTransform: 'none' BinaryLearners: {6×1 cell} CodingMatrix: [4×6 double]
simulink
Properties, Methods numCLModels = numel(ECOCMdl1.BinaryLearners) numCLModels = 6 CLMdl1 = ECOCMdl1.BinaryLearners{1} CLMdl1 = ClassificationLinear ResponseName: 'Y' ClassNames: [-1 1]
35-746
others]
ClassificationPartitionedLinearECOC
ScoreTransform: Beta: Bias: Lambda: Learner:
'logit' [34023×11 double] [-0.3169 -0.3169 -0.3168 -0.3168 -0.3168 -0.3167 -0.1725 -0.0805 -0.1762 -0.3 [1.0000e-07 3.1623e-07 1.0000e-06 3.1623e-06 1.0000e-05 3.1623e-05 1.0000e-04 'logistic'
Properties, Methods
Because fitcecoc implements 5-fold cross-validation, CVMdl contains a 5-by-1 cell array of CompactClassificationECOC models that the software trains on each fold. The BinaryLearners property of each CompactClassificationECOC model contains the ClassificationLinear models. The number of ClassificationLinear models within each compact ECOC model depends on the number of distinct labels and coding design. Because Lambda is a sequence of regularization strengths, you can think of CLMdl1 as 11 models, one for each regularization strength in Lambda. Determine how well the models generalize by plotting the averages of the 5-fold classification error for each regularization strength. Identify the regularization strength that minimizes the generalization error over the grid. ce = kfoldLoss(CVMdl); figure; plot(log10(Lambda),log10(ce)) [~,minCEIdx] = min(ce); minLambda = Lambda(minCEIdx); hold on plot(log10(minLambda),log10(ce(minCEIdx)),'ro'); ylabel('log_{10} 5-fold classification error') xlabel('log_{10} Lambda') legend('MSE','Min classification error') hold off
35-747
35
Functions
Train an ECOC model composed of linear classification model using the entire data set, and specify the minimal regularization strength. t = templateLinear('Learner','logistic','Solver','sparsa',... 'Regularization','lasso','Lambda',minLambda,'GradientTolerance',1e-8); MdlFinal = fitcecoc(X,Y,'Learners',t,'ObservationsIn','columns');
To estimate labels for new observations, pass MdlFinal and the new data to predict.
Version History Introduced in R2016a
See Also kfoldLoss | kfoldPredict | fitcecoc | fitclinear | ClassificationECOC | ClassificationLinear
35-748
ClassificationPartitionedModel
ClassificationPartitionedModel Package: classreg.learning.partition Cross-validated classification model
Description ClassificationPartitionedModel is a set of classification models trained on cross-validated folds. Estimate the quality of classification by cross validation using one or more “kfold” methods: kfoldPredict, kfoldLoss, kfoldMargin, kfoldEdge, and kfoldfun. Every “kfold” method uses models trained on in-fold observations to predict the response for out-offold observations. For example, suppose you cross validate using five folds. In this case, the software randomly assigns each observation into five roughly equally sized groups. The training fold contains four of the groups (i.e., roughly 4/5 of the data) and the test fold contains the other group (i.e., roughly 1/5 of the data). In this case, cross validation proceeds as follows: • The software trains the first model (stored in CVMdl.Trained{1}) using the observations in the last four groups and reserves the observations in the first group for validation. • The software trains the second model (stored in CVMdl.Trained{2}) using the observations in the first group and last three groups, and reserves the observations in the second group for validation. • The software proceeds in a similar fashion for the third to fifth models. If you validate by calling kfoldPredict, it computes predictions for the observations in group 1 using the first model, group 2 for the second model, and so on. In short, the software estimates a response for every observation using the model trained without that observation.
Construction CVMdl = crossval(Mdl) creates a cross-validated classification model from a classification model (Mdl). Alternatively: • CVDiscrMdl = fitcdiscr(X,Y,Name,Value) • CVKNNMdl = fitcknn(X,Y,Name,Value) • CVNetMdl = fitcnet(X,Y,Name,Value) • CVNBMdl = fitcnb(X,Y,Name,Value) • CVSVMMdl = fitcsvm(X,Y,Name,Value) • CVTreeMdl = fitctree(X,Y,Name,Value) create a cross-validated model when Name is either 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'. For syntax details, see fitcdiscr, fitcknn, fitcnet, fitcnb, fitcsvm, and fitctree.
35-749
35
Functions
Input Arguments Mdl A classification model, specified as one of the following: • A classification tree trained using fitctree • A discriminant analysis classifier trained using fitcdiscr • A neural network classifier trained using fitcnet • A naive Bayes classifier trained using fitcnb • A nearest neighbor classifier trained using fitcknn • A support vector machine classifier trained using fitcsvm
Properties BinEdges Bin edges for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors. The software bins numeric predictors only if you specify the 'NumBins' name-value argument as a positive integer scalar when training a model with tree learners. The BinEdges property is empty if the 'NumBins' value is empty (default). You can reproduce the binned predictor data Xbinned by using the BinEdges property of the trained model mdl. X = mdl.X; % Predictor data Xbinned = zeros(size(X)); edges = mdl.BinEdges; % Find indices of binned predictors. idxNumeric = find(~cellfun(@isempty,edges)); if iscolumn(idxNumeric) idxNumeric = idxNumeric'; end for j = idxNumeric x = X(:,j); % Convert x to array if x is a table. if istable(x) x = table2array(x); end % Group x into bins by using the discretize function. xbinned = discretize(x,[-inf; edges{j}; inf]); Xbinned(:,j) = xbinned; end
Xbinned contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.
CategoricalPredictors Categorical predictor indices, specified as a vector of positive integers. Assuming that the predictor data contains observations in rows, CategoricalPredictors contains index values corresponding to the columns of the predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty ([]). If Mdl is a trained discriminant analysis classifier, then CategoricalPredictors is always empty ([]). ClassNames Unique class labels used in training the model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Cost Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (i.e., the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response. If CVModel is a cross-validated ClassificationDiscriminant, ClassificationKNN, ClassificationNaiveBayes, or ClassificationNeuralNetwork model, then you can change its cost matrix to e.g., CostMatrix, using dot notation. CVModel.Cost = CostMatrix;
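As a sketch (the cost values are placeholders), after changing the cost matrix you can recompute the cross-validation loss with the "classifcost" loss so that the new costs are reflected:
CVModel.Cost = [0 5; 1 0];                        % misclassifying class 1 as class 2 now costs 5
L = kfoldLoss(CVModel,'LossFun','classifcost')    % loss computed with the stored cost matrix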
CrossValidatedModel Name of the cross-validated model, which is a character vector. KFold Number of folds used in cross-validated model, which is a positive integer. ModelParameters Object holding parameters of CVModel. NumObservations Number of observations in the training data stored in X and Y, specified as a numeric scalar. Partition The partition of class CVPartition used in creating the cross-validated model. PredictorNames Predictor variable names, specified as a cell array of character vectors. The order of the elements in PredictorNames corresponds to the order in which the predictor names appear in the training data. Prior Numeric vector of prior probabilities for each class. The order of the elements of Prior corresponds to the order of the classes in ClassNames. 35-751
If CVModel is a cross-validated ClassificationDiscriminant or ClassificationNaiveBayes model, then you can change its vector of priors to e.g., priorVector, using dot notation. CVModel.Prior = priorVector;
ResponseName Response variable name, specified as a character vector. ScoreTransform Score transformation, specified as a character vector or function handle. ScoreTransform represents a built-in transformation function or a function handle for transforming predicted classification scores. To change the score transformation function to function, for example, use dot notation. • For a built-in function, enter a character vector. Mdl.ScoreTransform = 'function';
This table describes the available built-in functions.
Value                     Description
'doublelogit'             1/(1 + e^(–2x))
'invlogit'                log(x / (1 – x))
'ismax'                   Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
'logit'                   1/(1 + e^(–x))
'none' or 'identity'      x (no transformation)
'sign'                    –1 for x < 0; 0 for x = 0; 1 for x > 0
'symmetric'               2x – 1
'symmetricismax'          Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
'symmetriclogit'          2/(1 + e^(–x)) – 1
• For a MATLAB function or a function that you define, enter its function handle. Mdl.ScoreTransform = @function;
function must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores). Trained The trained learners, which is a cell array of compact classification models. W The scaled weights, which is a vector with length n, the number of observations in X. 35-752
X A matrix or table of predictor values. Y Categorical or character array, logical or numeric vector, or cell array of character vectors specifying the class labels for each observation. Each entry of Y is the response value of the corresponding observation in X.
Object Functions
gather          Gather properties of Statistics and Machine Learning Toolbox object from GPU
kfoldEdge       Classification edge for cross-validated classification model
kfoldLoss       Classification loss for cross-validated classification model
kfoldMargin     Classification margins for cross-validated classification model
kfoldPredict    Classify observations in cross-validated classification model
kfoldfun        Cross-validate function for classification
Copy Semantics Value. To learn how value classes affect copy operations, see Copying Objects.
Examples Evaluate the Classification Error of a Classification Tree Classifier Evaluate the k-fold cross-validation error for a classification tree model. Load Fisher's iris data set. load fisheriris
Train a classification tree using default options. Mdl = fitctree(meas,species);
Cross validate the classification tree model. CVMdl = crossval(Mdl);
Estimate the 10-fold cross-validation loss. L = kfoldLoss(CVMdl) L = 0.0533
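As an optional follow-up (a sketch using the same CVMdl), you can look at the loss in each fold and recover the out-of-fold predictions directly:
perFoldLoss = kfoldLoss(CVMdl,'Mode','individual')   % one loss value per fold
label = kfoldPredict(CVMdl);                         % out-of-fold predicted species
mean(~strcmp(label,species))                         % overall error computed by hand (matches kfoldLoss for the default cost)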
Estimate Posterior Probabilities for Test Samples Estimate positive class posterior probabilities for the test set of an SVM algorithm. Load the ionosphere data set.
load ionosphere
Train an SVM classifier. Specify a 20% holdout sample. It is good practice to standardize the predictors and specify the class order. rng(1) % For reproducibility CVSVMModel = fitcsvm(X,Y,'Holdout',0.2,'Standardize',true,... 'ClassNames',{'b','g'});
CVSVMModel is a trained ClassificationPartitionedModel cross-validated classifier. Estimate the optimal score function for mapping observation scores to posterior probabilities of an observation being classified as 'g'. ScoreCVSVMModel = fitSVMPosterior(CVSVMModel);
ScoreCVSVMModel is a trained ClassificationPartitionedModel cross-validated classifier containing the optimal score transformation function estimated from the training data. Estimate the out-of-sample positive class posterior probabilities. Display the results for the first 10 out-of-sample observations. [~,OOSPostProbs] = kfoldPredict(ScoreCVSVMModel); indx = ~isnan(OOSPostProbs(:,2)); hoObs = find(indx); % Holdout observation numbers OOSPostProbs = [hoObs, OOSPostProbs(indx,2)]; table(OOSPostProbs(1:10,1),OOSPostProbs(1:10,2),... 'VariableNames',{'ObservationIndex','PosteriorProbability'})
ans=10×2 table
    ObservationIndex    PosteriorProbability
    ________________    ____________________
            6                   0.17381
            7                   0.89639
            8                 0.0076613
            9                   0.91602
           16                  0.026722
           22                4.6114e-06
           23                    0.9024
           24                2.4137e-06
           38                0.00042705
           41                   0.86427
Tips To estimate posterior probabilities of trained, cross-validated SVM classifiers, use fitSVMPosterior.
Version History R2023a: Neural network classifiers support misclassification costs and prior probabilities
fitcnet supports misclassification costs and prior probabilities for neural network classifiers. Specify the Cost and Prior name-value arguments when you create a model. Alternatively, you can specify misclassification costs after training a model by using dot notation to change the Cost property value of the model. Mdl.Cost = [0 2; 1 0];
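For example (a sketch; the cost matrix, fold count, and data are placeholders), you can pass the costs directly when cross-validating a neural network classifier:
CVMdl = fitcnet(X,Y,'Cost',[0 2; 1 0],'KFold',5);
L = kfoldLoss(CVMdl,'LossFun','classifcost')      % observed misclassification cost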
R2022a: Cost property stores the user-specified cost matrix Behavior changed in R2022a Starting in R2022a, the Cost property of a cross-validated SVM classification model stores the userspecified cost matrix, so that you can compute the observed misclassification cost using the specified cost value. The software stores normalized prior probabilities (Prior) and observation weights (W) that do not reflect the penalties described in the cost matrix. Other cross-validated models already had this behavior. To compute the observed misclassification cost, specify the LossFun name-value argument as "classifcost" when you call the kfoldLoss function. Note that model training has not changed and, therefore, the decision boundaries between classes have not changed. For training an SVM model, the fitting function updates the specified prior probabilities by incorporating the penalties described in the specified cost matrix, and then normalizes the prior probabilities and observation weights. This behavior has not changed. In previous releases, the software stored the default cost matrix in the Cost property and stored the prior probabilities and observation weights used for training in the Prior and W properties, respectively. Starting in R2022a, the software stores the user-specified cost matrix without modification, and stores normalized prior probabilities and observation weights that do not reflect the cost penalties. For more details, see “Misclassification Cost Matrix, Prior Probabilities, and Observation Weights” on page 19-8. Some object functions use the Cost and W properties: • The kfoldLoss function uses the cost matrix stored in the Cost property if you specify the LossFun name-value argument as "classifcost" or "mincost". • The kfoldLoss and kfoldEdge functions use the observation weights stored in the W property. If you specify a nondefault cost matrix when you train a classification model, the object functions return a different value compared to previous releases. If you want the software to handle the cost matrix, prior probabilities, and observation weights in the same way as in previous releases, adjust the prior probabilities and observation weights for the nondefault cost matrix, as described in “Adjust Prior Probabilities and Observation Weights for Misclassification Cost Matrix” on page 19-9. Then, when you train a classification model, specify the adjusted prior probabilities and observation weights by using the Prior and Weights name-value arguments, respectively, and use the default cost matrix.
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. Usage notes and limitations: • ClassificationPartitionedModel can be one of the following cross-validated model objects:
• k-nearest neighbor classifier trained with fitcknn • Support vector machine classifier trained with fitcsvm • Binary decision tree for multiclass classification trained with fitctree • The object functions of the ClassificationPartitionedModel model fully support GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
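A minimal sketch of this workflow, assuming Parallel Computing Toolbox and a supported GPU are available (the data set and fold count are illustrative):
load ionosphere
Xg = gpuArray(X);                   % move the predictors to the GPU
CVMdl = fitcsvm(Xg,Y,'KFold',5);    % cross-validated SVM fitted with GPU arrays
L = kfoldLoss(CVMdl)                % object functions support GPU arrays
CVMdlCPU = gather(CVMdl);           % copy the model properties back to the CPU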
See Also CompactClassificationSVM | CompactClassificationTree | CompactClassificationDiscriminant | fitcsvm | fitctree | fitcdiscr | fitSVMPosterior | fitcknn | ClassificationKNN | ClassificationNaiveBayes | fitcnb | ClassificationNeuralNetwork | fitcnet Topics “Cross Validating a Discriminant Analysis Classifier” on page 21-17
ClassificationSVM Support vector machine (SVM) for one-class and binary classification
Description ClassificationSVM is a support vector machine (SVM) classifier on page 35-772 for one-class and two-class learning. Trained ClassificationSVM classifiers store training data, parameter values, prior probabilities, support vectors, and algorithmic implementation information. Use these classifiers to perform tasks such as fitting a score-to-posterior-probability transformation function (see fitPosterior) and predicting labels for new data (see predict).
Creation Create a ClassificationSVM object by using fitcsvm.
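For instance (an illustrative sketch, not part of the shipped examples; the new observation values are made up):
load fisheriris
inds = ~strcmp(species,'setosa');
SVMModel = fitcsvm(meas(inds,3:4),species(inds));   % SVMModel is a ClassificationSVM object
label = predict(SVMModel,[5 1.5])                   % classify a new observation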
Properties SVM Properties Alpha — Trained classifier coefficients numeric vector This property is read-only. Trained classifier coefficients, specified as an s-by-1 numeric vector. s is the number of support vectors in the trained classifier, sum(Mdl.IsSupportVector). Alpha contains the trained classifier coefficients from the dual problem, that is, the estimated Lagrange multipliers. If you remove duplicates by using the RemoveDuplicates name-value pair argument of fitcsvm, then for a given set of duplicate observations that are support vectors, Alpha contains one coefficient corresponding to the entire set. That is, MATLAB attributes a nonzero coefficient to one observation from the set of duplicates and a coefficient of 0 to all other duplicate observations in the set. Data Types: single | double Beta — Linear predictor coefficients numeric vector This property is read-only. Linear predictor coefficients, specified as a numeric vector. The length of Beta is equal to the number of predictors used to train the model. MATLAB expands categorical variables in the predictor data using full dummy encoding. That is, MATLAB creates one dummy variable for each level of each categorical variable. Beta stores one value for each predictor variable, including the dummy variables. For example, if there are three predictors, one of which is a categorical variable with three levels, then Beta is a numeric vector containing five values. 35-757
If KernelParameters.Function is 'linear', then the classification score for the observation x is f x = x/s ′β + b . Mdl stores β, b, and s in the properties Beta, Bias, and KernelParameters.Scale, respectively. To estimate classification scores manually, you must first apply any transformations to the predictor data that were applied during training. Specifically, if you specify 'Standardize',true when using fitcsvm, then you must standardize the predictor data manually by using the mean Mdl.Mu and standard deviation Mdl.Sigma, and then divide the result by the kernel scale in Mdl.KernelParameters.Scale. All SVM functions, such as resubPredict and predict, apply any required transformation before estimation. If KernelParameters.Function is not 'linear', then Beta is empty ([]). Data Types: single | double Bias — Bias term scalar This property is read-only. Bias term, specified as a scalar. Data Types: single | double BoxConstraints — Box constraints numeric vector This property is read-only. Box constraints, specified as a numeric vector of n-by-1 box constraints on page 35-770. n is the number of observations in the training data (see the NumObservations property). If you remove duplicates by using the RemoveDuplicates name-value pair argument of fitcsvm, then for a given set of duplicate observations, MATLAB sums the box constraints and then attributes the sum to one observation. MATLAB attributes the box constraints of 0 to all other observations in the set. Data Types: single | double CacheInfo — Caching information structure array This property is read-only. Caching information, specified as a structure array. The caching information contains the fields described in this table.
Field        Description
Size         The cache size (in MB) that the software reserves to train the SVM classifier. For details, see 'CacheSize'.
Algorithm    The caching algorithm that the software uses during optimization. Currently, the only available caching algorithm is Queue. You cannot set the caching algorithm.
Display the fields of CacheInfo by using dot notation. For example, Mdl.CacheInfo.Size displays the value of the cache size. Data Types: struct IsSupportVector — Support vector indicator logical vector This property is read-only. Support vector indicator, specified as an n-by-1 logical vector that flags whether a corresponding observation in the predictor data matrix is a “Support Vector” on page 35-771. n is the number of observations in the training data (see NumObservations). If you remove duplicates by using the RemoveDuplicates name-value pair argument of fitcsvm, then for a given set of duplicate observations that are support vectors, IsSupportVector flags only one observation as a support vector. Data Types: logical KernelParameters — Kernel parameters structure array This property is read-only. Kernel parameters, specified as a structure array. The kernel parameters property contains the fields listed in this table.
Field       Description
Function    Kernel function used to compute the elements of the Gram matrix on page 35-2498. For details, see 'KernelFunction'.
Scale       Kernel scale parameter used to scale all elements of the predictor data on which the model is trained. For details, see 'KernelScale'.
To display the values of KernelParameters, use dot notation. For example, Mdl.KernelParameters.Scale displays the kernel scale parameter value. The software accepts KernelParameters as inputs and does not modify them. Data Types: struct Nu — One-class learning parameter positive scalar This property is read-only. One-class learning on page 35-771 parameter ν, specified as a positive scalar. Data Types: single | double 35-759
OutlierFraction — Proportion of outliers numeric scalar This property is read-only. Proportion of outliers in the training data, specified as a numeric scalar. Data Types: double Solver — Optimization routine 'ISDA' | 'L1QP' | 'SMO' This property is read-only. Optimization routine used to train the SVM classifier, specified as 'ISDA', 'L1QP', or 'SMO'. For more details, see 'Solver'. SupportVectorLabels — Support vector class labels s-by-1 numeric vector This property is read-only. Support vector class labels, specified as an s-by-1 numeric vector. s is the number of support vectors in the trained classifier, sum(Mdl.IsSupportVector). A value of +1 in SupportVectorLabels indicates that the corresponding support vector is in the positive class (ClassNames{2}). A value of –1 indicates that the corresponding support vector is in the negative class (ClassNames{1}). If you remove duplicates by using the RemoveDuplicates name-value pair argument of fitcsvm, then for a given set of duplicate observations that are support vectors, SupportVectorLabels contains one unique support vector label. Data Types: single | double SupportVectors — Support vectors s-by-p numeric matrix This property is read-only. Support vectors in the trained classifier, specified as an s-by-p numeric matrix. s is the number of support vectors in the trained classifier, sum(Mdl.IsSupportVector), and p is the number of predictor variables in the predictor data. SupportVectors contains rows of the predictor data X that MATLAB considers to be support vectors. If you specify 'Standardize',true when training the SVM classifier using fitcsvm, then SupportVectors contains the standardized rows of X. If you remove duplicates by using the RemoveDuplicates name-value pair argument of fitcsvm, then for a given set of duplicate observations that are support vectors, SupportVectors contains one unique support vector. Data Types: single | double
Other Classification Properties CategoricalPredictors — Categorical predictor indices vector of positive integers | [] This property is read-only. Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]). Data Types: double ClassNames — Unique class labels categorical array | character array | logical vector | numeric vector | cell array of character vectors This property is read-only. Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as the class labels Y. (The software treats string arrays as cell arrays of character vectors.) ClassNames also determines the class order. Data Types: single | double | logical | char | cell | categorical Cost — Misclassification cost numeric square matrix This property is read-only. Misclassification cost, specified as a numeric square matrix. • For two-class learning, the Cost property stores the misclassification cost matrix specified by the Cost name-value argument of the fitting function. The rows correspond to the true class and the columns correspond to the predicted class. That is, Cost(i,j) is the cost of classifying a point into class j if its true class is i. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. • For one-class learning, Cost = 0. Data Types: double ExpandedPredictorNames — Expanded predictor names cell array of character vectors This property is read-only. Expanded predictor names, specified as a cell array of character vectors. If the model uses dummy variable encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames. Data Types: cell Gradient — Training data gradient values numeric vector 35-761
This property is read-only. Training data gradient values, specified as a numeric vector. The length of Gradient is equal to the number of observations (NumObservations). Data Types: single | double ModelParameters — Parameters used to train model object This property is read-only. Parameters used to train the ClassificationSVM model, specified as an object. ModelParameters contains parameter values such as the name-value pair argument values used to train the SVM classifier. ModelParameters does not contain estimated parameters. Access the properties of ModelParameters by using dot notation. For example, access the initial values for estimating Alpha by using Mdl.ModelParameters.Alpha. Mu — Predictor means numeric vector | [] This property is read-only. Predictor means, specified as a numeric vector. If you specify 'Standardize',1 or 'Standardize',true when you train an SVM classifier using fitcsvm, the length of Mu is equal to the number of predictors. MATLAB expands categorical variables in the predictor data using dummy variables. Mu stores one value for each predictor variable, including the dummy variables. However, MATLAB does not standardize the columns that contain categorical variables. If you set 'Standardize',false when you train the SVM classifier using fitcsvm, then Mu is an empty vector ([]). Data Types: single | double NumObservations — Number of observations numeric scalar This property is read-only. Number of observations in the training data stored in X and Y, specified as a numeric scalar. Data Types: double PredictorNames — Predictor variable names cell array of character vectors This property is read-only. Predictor variable names, specified as a cell array of character vectors. The order of the elements in PredictorNames corresponds to the order in which the predictor names appear in the training data. Data Types: cell Prior — Prior probabilities numeric vector 35-762
This property is read-only. Prior probabilities for each class, specified as a numeric vector. For two-class learning, if you specify a cost matrix, then the software updates the prior probabilities by incorporating the penalties described in the cost matrix. • For two-class learning, the software normalizes the prior probabilities specified by the Prior name-value argument of the fitting function so that the probabilities sum to 1. The Prior property stores the normalized prior probabilities. The order of the elements of Prior corresponds to the elements of Mdl.ClassNames. • For one-class learning, Prior = 1. Data Types: single | double ResponseName — Response variable name character vector This property is read-only. Response variable name, specified as a character vector. Data Types: char RowsUsed — Rows of original training data stored logical vector | [] This property is read-only. Rows of the original training data stored in the model, specified as a logical vector. This property is empty if all rows are stored in X and Y. Data Types: logical ScoreTransform — Score transformation character vector | function handle Score transformation, specified as a character vector or function handle. ScoreTransform represents a built-in transformation function or a function handle for transforming predicted classification scores. To change the score transformation function to function, for example, use dot notation. • For a built-in function, enter a character vector. Mdl.ScoreTransform = 'function';
This table describes the available built-in functions.
Value                     Description
'doublelogit'             1/(1 + e^(–2x))
'invlogit'                log(x / (1 – x))
'ismax'                   Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
'logit'                   1/(1 + e^(–x))
'none' or 'identity'      x (no transformation)
'sign'                    –1 for x < 0; 0 for x = 0; 1 for x > 0
'symmetric'               2x – 1
'symmetricismax'          Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
'symmetriclogit'          2/(1 + e^(–x)) – 1
• For a MATLAB function or a function that you define, enter its function handle. Mdl.ScoreTransform = @function;
function must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores). Data Types: char | function_handle Sigma — Predictor standard deviations [] (default) | numeric vector This property is read-only. Predictor standard deviations, specified as a numeric vector. If you specify 'Standardize',true when you train the SVM classifier using fitcsvm, the length of Sigma is equal to the number of predictor variables. MATLAB expands categorical variables in the predictor data using dummy variables. Sigma stores one value for each predictor variable, including the dummy variables. However, MATLAB does not standardize the columns that contain categorical variables. If you set 'Standardize',false when you train the SVM classifier using fitcsvm, then Sigma is an empty vector ([]). Data Types: single | double W — Observation weights numeric vector This property is read-only. Observation weights used to train the SVM classifier, specified as an n-by-1 numeric vector. n is the number of observations (see NumObservations). fitcsvm normalizes the observation weights specified in the 'Weights' name-value pair argument so that the elements of W within a particular class sum up to the prior probability of that class. Data Types: single | double X — Unstandardized predictors numeric matrix | table 35-764
This property is read-only. Unstandardized predictors used to train the SVM classifier, specified as a numeric matrix or table. Each row of X corresponds to one observation, and each column corresponds to one variable. Data Types: single | double Y — Class labels categorical array | character array | logical vector | numeric vector | cell array of character vectors This property is read-only. Class labels used to train the SVM classifier, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Y is the same data type as the input argument Y of fitcsvm. (The software treats string arrays as cell arrays of character vectors.) Each row of Y represents the observed classification of the corresponding row of X. Data Types: single | double | logical | char | cell | categorical Convergence Control Properties ConvergenceInfo — Convergence information structure array This property is read-only. Convergence information, specified as a structure array.
Field                      Description
Converged                  Logical flag indicating whether the algorithm converged (1 indicates convergence).
ReasonForConvergence       Character vector indicating the criterion the software uses to detect convergence.
Gap                        Scalar feasibility gap between the dual and primal objective functions.
GapTolerance               Scalar feasibility gap tolerance. Set this tolerance, for example to 1e-2, by using the name-value pair argument 'GapTolerance',1e-2 of fitcsvm.
DeltaGradient              Scalar-attained gradient difference between upper and lower violators.
DeltaGradientTolerance     Scalar tolerance for the gradient difference between upper and lower violators. Set this tolerance, for example to 1e-2, by using the name-value pair argument 'DeltaGradientTolerance',1e-2 of fitcsvm.
LargestKKTViolation        Maximal scalar Karush-Kuhn-Tucker (KKT) violation value.
KKTTolerance               Scalar tolerance for the largest KKT violation. Set this tolerance, for example, to 1e-3, by using the name-value pair argument 'KKTTolerance',1e-3 of fitcsvm.
History                    Structure array containing convergence information at set optimization iterations. The fields are: • NumIterations: numeric vector of iteration indices for which the software records convergence information • Gap: numeric vector of Gap values at the iterations • DeltaGradient: numeric vector of DeltaGradient values at the iterations • LargestKKTViolation: numeric vector of LargestKKTViolation values at the iterations • NumSupportVectors: numeric vector indicating the number of support vectors at the iterations • Objective: numeric vector of Objective values at the iterations
Objective                  Scalar value of the dual objective function.
Data Types: struct NumIterations — Number of iterations positive integer This property is read-only. Number of iterations required by the optimization routine to attain convergence, specified as a positive integer. To set the limit on the number of iterations to 1000, for example, specify 'IterationLimit',1000 when you train the SVM classifier using fitcsvm. Data Types: double ShrinkagePeriod — Number of iterations between reductions of active set nonnegative integer This property is read-only. Number of iterations between reductions of the active set, specified as a nonnegative integer. To set the shrinkage period to 1000, for example, specify 'ShrinkagePeriod',1000 when you train the SVM classifier using fitcsvm. Data Types: single | double 35-766
Hyperparameter Optimization Properties HyperparameterOptimizationResults — Description of cross-validation optimization of hyperparameters BayesianOptimization object | table This property is read-only. Description of the cross-validation optimization of hyperparameters, specified as a BayesianOptimization object or a table of hyperparameters and associated values. This property is nonempty when the 'OptimizeHyperparameters' name-value pair argument of fitcsvm is nonempty at creation. The value of HyperparameterOptimizationResults depends on the setting of the Optimizer field in the HyperparameterOptimizationOptions structure of fitcsvm at creation, as described in this table.
Value of Optimizer Field            Value of HyperparameterOptimizationResults
'bayesopt' (default)                Object of class BayesianOptimization
'gridsearch' or 'randomsearch'      Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)
Object Functions
compact                  Reduce size of machine learning model
compareHoldout           Compare accuracies of two classification models using new data
crossval                 Cross-validate machine learning model
discardSupportVectors    Discard support vectors for linear support vector machine (SVM) classifier
edge                     Find classification edge for support vector machine (SVM) classifier
fitPosterior             Fit posterior probabilities for support vector machine (SVM) classifier
gather                   Gather properties of Statistics and Machine Learning Toolbox object from GPU
incrementalLearner       Convert binary classification support vector machine (SVM) model to incremental learner
lime                     Local interpretable model-agnostic explanations (LIME)
loss                     Find classification error for support vector machine (SVM) classifier
margin                   Find classification margins for support vector machine (SVM) classifier
partialDependence        Compute partial dependence
plotPartialDependence    Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
predict                  Classify observations using support vector machine (SVM) classifier
resubEdge                Resubstitution classification edge
resubLoss                Resubstitution classification loss
resubMargin              Resubstitution classification margin
resubPredict             Classify training data using trained classifier
resume                   Resume training support vector machine (SVM) classifier
shapley                  Shapley values
testckfold               Compare accuracies of two classification models by repeated cross-validation
Examples Train SVM Classifier Load Fisher's iris data set. Remove the sepal lengths and widths and all observed setosa irises. load fisheriris inds = ~strcmp(species,'setosa'); X = meas(inds,3:4); y = species(inds);
Train an SVM classifier using the processed data set.
SVMModel = fitcsvm(X,y)
SVMModel = 
  ClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'versicolor'  'virginica'}
           ScoreTransform: 'none'
          NumObservations: 100
                    Alpha: [24x1 double]
                     Bias: -14.4149
         KernelParameters: [1x1 struct]
           BoxConstraints: [100x1 double]
          ConvergenceInfo: [1x1 struct]
          IsSupportVector: [100x1 logical]
                   Solver: 'SMO'
SVMModel is a trained ClassificationSVM classifier. Display the properties of SVMModel. For example, to determine the class order, use dot notation. classOrder = SVMModel.ClassNames classOrder = 2x1 cell {'versicolor'} {'virginica' }
The first class ('versicolor') is the negative class, and the second ('virginica') is the positive class. You can change the class order during training by using the 'ClassNames' name-value pair argument. Plot a scatter diagram of the data and circle the support vectors. sv = SVMModel.SupportVectors; figure gscatter(X(:,1),X(:,2),y) hold on plot(sv(:,1),sv(:,2),'ko','MarkerSize',10) legend('versicolor','virginica','Support Vector') hold off
The support vectors are observations that occur on or beyond their estimated class boundaries. You can adjust the boundaries (and, therefore, the number of support vectors) by setting a box constraint during training using the 'BoxConstraint' name-value pair argument.
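For example (a sketch continuing the variables above; the value 10 is an arbitrary choice), you can retrain with a larger box constraint and compare the number of support vectors:
SVMModelLargeC = fitcsvm(X,y,'BoxConstraint',10);   % heavier penalty on margin violations
nSVDefault = sum(SVMModel.IsSupportVector)          % support vectors with the default box constraint
nSVLargeC = sum(SVMModelLargeC.IsSupportVector)     % typically fewer support vectors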
Train and Cross-Validate SVM Classifier Load the ionosphere data set. load ionosphere
Train and cross-validate an SVM classifier. Standardize the predictor data and specify the order of the classes.
rng(1); % For reproducibility
CVSVMModel = fitcsvm(X,Y,'Standardize',true,...
    'ClassNames',{'b','g'},'CrossVal','on')
CVSVMModel = 
  ClassificationPartitionedModel
    CrossValidatedModel: 'SVM'
         PredictorNames: {'x1'  'x2'  'x3'  'x4'  'x5'  'x6'  'x7'  'x8'  'x9'  'x10'  'x11'  ...}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'
CVSVMModel is a ClassificationPartitionedModel cross-validated SVM classifier. By default, the software implements 10-fold cross-validation. Alternatively, you can cross-validate a trained ClassificationSVM classifier by passing it to crossval.
Inspect one of the trained folds using dot notation.
CVSVMModel.Trained{1}
ans = 
  CompactClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b'  'g'}
           ScoreTransform: 'none'
                    Alpha: [78x1 double]
                     Bias: -0.2209
         KernelParameters: [1x1 struct]
                       Mu: [0.8888 0 0.6320 0.0406 0.5931 0.1205 0.5361 0.1286 0.5083 0.1879 0.47 ...]
                    Sigma: [0.3149 0 0.5033 0.4441 0.5255 0.4663 0.4987 0.5205 0.5040 0.4780 0.56 ...]
           SupportVectors: [78x34 double]
      SupportVectorLabels: [78x1 double]
Each fold is a CompactClassificationSVM classifier trained on 90% of the data. Estimate the generalization error. genError = kfoldLoss(CVSVMModel) genError = 0.1168
On average, the generalization error is approximately 12%.
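As an optional follow-up sketch using the same cross-validated model, you can inspect where the errors occur from the out-of-fold predictions:
[label,score] = kfoldPredict(CVSVMModel);   % out-of-fold labels and scores
C = confusionmat(Y,label)                   % per-class breakdown of the roughly 12% error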
More About Box Constraint A box constraint is a parameter that controls the maximum penalty imposed on margin-violating observations, which helps to prevent overfitting (regularization). If you increase the box constraint, then the SVM classifier assigns fewer support vectors. However, increasing the box constraint can lead to longer training times.
Gram Matrix The Gram matrix of a set of n vectors {x1,..,xn; xj ∊ Rp} is an n-by-n matrix with element (j,k) defined as G(xj,xk) = <ϕ(xj),ϕ(xk)>, an inner product of the transformed predictors using the kernel function ϕ. For nonlinear SVM, the algorithm forms a Gram matrix using the rows of the predictor data X. The dual formalization replaces the inner product of the observations in X with corresponding elements of the resulting Gram matrix (called the “kernel trick”). Consequently, nonlinear SVM operates in the transformed predictor space to find a separating hyperplane.
Karush-Kuhn-Tucker Complementarity Conditions KKT complementarity conditions are optimization constraints required for optimal nonlinear programming solutions. In SVM, the KKT complementarity conditions are
αj[yjf(xj) − 1 + ξj] = 0 and ξj(C − αj) = 0
for all j = 1,...,n, where f(xj) = ϕ(xj)′β + b, ϕ is a kernel function (see Gram matrix on page 35-2498), and ξj is a slack variable. If the classes are perfectly separable, then ξj = 0 for all j = 1,...,n.
One-Class Learning One-class learning, or unsupervised SVM, aims to separate data from the origin in the high-dimensional predictor space (not the original predictor space), and is an algorithm used for outlier detection. The algorithm resembles that of SVM for binary classification on page 35-2499. The objective is to minimize the dual expression
0.5 ∑j∑k αjαkG(xj,xk)
with respect to α1,...,αn, subject to
∑j αj = nν and 0 ≤ αj ≤ 1 for all j = 1,...,n.
The value of G(xj,xk) is in element (j,k) of the Gram matrix on page 35-2498. A small value of ν leads to fewer support vectors and, therefore, a smooth, crude decision boundary. A large value of ν leads to more support vectors and, therefore, a curvy, flexible decision boundary. The optimal value of ν should be large enough to capture the data complexity and small enough to avoid overtraining. Also, 0 < ν ≤ 1. For more details, see [5].
Support Vector Support vectors are observations corresponding to strictly positive estimates of α1,...,αn. SVM classifiers that yield fewer support vectors for a given training set are preferred.
Support Vector Machines for Binary Classification The SVM binary classification algorithm searches for an optimal hyperplane that separates the data into two classes. For separable classes, the optimal hyperplane maximizes a margin (space that does not contain any observations) surrounding itself, which creates boundaries for the positive and negative classes. For inseparable classes, the objective is the same, but the algorithm imposes a penalty on the length of the margin for every observation that is on the wrong side of its class boundary. The linear SVM score function is
f(x) = x′β + b,
where:
• x is an observation (corresponding to a row of X).
• The vector β contains the coefficients that define an orthogonal vector to the hyperplane (corresponding to Mdl.Beta). For separable data, the optimal margin length is 2/‖β‖.
• b is the bias term (corresponding to Mdl.Bias).
The root of f(x) for particular coefficients defines a hyperplane. For a particular hyperplane, f(z) is the distance from point z to the hyperplane. The algorithm searches for the maximum margin length, while keeping observations in the positive (y = 1) and negative (y = –1) classes separate.
• For separable classes, the objective is to minimize ‖β‖ with respect to β and b subject to yjf(xj) ≥ 1, for all j = 1,...,n. This is the primal formalization for separable classes.
• For inseparable classes, the algorithm uses slack variables (ξj) to penalize the objective function for observations that cross the margin boundary for their class. ξj = 0 for observations that do not cross the margin boundary for their class, otherwise ξj ≥ 0. The objective is to minimize 0.5‖β‖² + C∑jξj with respect to β, b, and the ξj, subject to yjf(xj) ≥ 1 − ξj and ξj ≥ 0 for all j = 1,...,n, and for a positive scalar box constraint on page 35-2498 C. This is the primal formalization for inseparable classes.
The algorithm uses the Lagrange multipliers method to optimize the objective, which introduces n coefficients α1,...,αn (corresponding to Mdl.Alpha). The dual formalizations for linear SVM are as follows:
• For separable classes, minimize
0.5 ∑j∑k αjαkyjykxj′xk − ∑j αj   (sums over j,k = 1,...,n)
with respect to α1,...,αn, subject to ∑j αjyj = 0, αj ≥ 0 for all j = 1,...,n, and the Karush-Kuhn-Tucker (KKT) complementarity conditions on page 35-2498.
• For inseparable classes, the objective is the same as for separable classes, except for the additional condition 0 ≤ αj ≤ C for all j = 1,...,n.
The resulting score function is
f(x) = ∑j αjyjx′xj + b   (sum over j = 1,...,n),
where b is the estimate of the bias and αj is the jth estimate of the vector α, j = 1,...,n. Written this way, the score function is free of the estimate of β as a result of the primal formalization. The SVM algorithm classifies a new observation z using sign(f(z)).
In some cases, a nonlinear boundary separates the classes. Nonlinear SVM works in a transformed predictor space to find an optimal, separating hyperplane. The dual formalization for nonlinear SVM is
0.5 ∑j∑k αjαkyjykG(xj,xk) − ∑j αj   (sums over j,k = 1,...,n)
with respect to α1,...,αn, subject to ∑j αjyj = 0, 0 ≤ αj ≤ C for all j = 1,...,n, and the KKT complementarity conditions. G(xk,xj) are elements of the Gram matrix on page 35-2498. The resulting score function is
f(x) = ∑j αjyjG(x,xj) + b   (sum over j = 1,...,n).
For more details, see Understanding Support Vector Machines on page 25-2, [1], and [3].
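The linear score function above can be evaluated directly from the stored properties. The following sketch (the data set and variable names are illustrative; it assumes a linear kernel and standardized predictors, as described for the Beta property) reproduces the scores returned by predict:
load fisheriris
inds = ~strcmp(species,'setosa');
X = meas(inds,3:4);  y = species(inds);
Mdl = fitcsvm(X,y,'KernelFunction','linear','Standardize',true);
Xs = (X - Mdl.Mu)./Mdl.Sigma;                                 % standardize with the training statistics
f = (Xs/Mdl.KernelParameters.Scale)*Mdl.Beta + Mdl.Bias;      % f(x) = (x/s)'*beta + b
[~,s] = predict(Mdl,X);
max(abs(f - s(:,2)))                                          % differences should be negligible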
Algorithms
• For the mathematical formulation of the SVM binary classification algorithm, see “Support Vector Machines for Binary Classification” on page 35-2499 and “Understanding Support Vector Machines” on page 25-2.
• NaN, <undefined>, empty character vector (''), empty string (""), and <missing> values indicate missing values. fitcsvm removes entire rows of data corresponding to a missing response. When computing total weights (see the next bullets), fitcsvm ignores any weight corresponding to an observation with at least one missing predictor. This action can lead to unbalanced prior probabilities in balanced-class problems. Consequently, observation box constraints might not equal BoxConstraint.
• If you specify the Cost, Prior, and Weights name-value arguments, the output model object stores the specified values in the Cost, Prior, and W properties, respectively. The Cost property stores the user-specified cost matrix (C) without modification. The Prior and W properties store the prior probabilities and observation weights, respectively, after normalization. For model training, the software updates the prior probabilities and observation weights to incorporate the penalties described in the cost matrix. For details, see “Misclassification Cost Matrix, Prior Probabilities, and Observation Weights” on page 19-8. Note that the Cost and Prior name-value arguments are used for two-class learning. For one-class learning, the Cost and Prior properties store 0 and 1, respectively.
• For two-class learning, fitcsvm assigns a box constraint to each observation in the training data. The formula for the box constraint of observation j is
Cj = nC0wj*,
where C0 is the initial box constraint (see the BoxConstraint name-value argument), and wj* is the observation weight adjusted by Cost and Prior for observation j. For details about the observation weights, see “Adjust Prior Probabilities and Observation Weights for Misclassification Cost Matrix” on page 19-9.
• If you specify Standardize as true and set the Cost, Prior, or Weights name-value argument, then fitcsvm standardizes the predictors using their corresponding weighted means and weighted standard deviations. That is, fitcsvm standardizes predictor j (xj) using
xj* = (xj − μj*)/σj*,
where xjk is observation k (row) of predictor j (column), and
μj* = (1/∑k wk*) ∑k wk*xjk,
(σj*)² = [v1/(v1² − v2)] ∑k wk*(xjk − μj*)²,
v1 = ∑j wj*,
v2 = ∑j (wj*)².
• Assume that p is the proportion of outliers that you expect in the training data, and that you set 'OutlierFraction',p. • For one-class learning, the software trains the bias term such that 100p% of the observations in the training data have negative scores. • The software implements robust learning for two-class learning. In other words, the software attempts to remove 100p% of the observations when the optimization algorithm converges. The removed observations correspond to gradients that are large in magnitude. • If your predictor data contains categorical variables, then the software generally uses full dummy encoding for these variables. The software creates one dummy variable for each level of each categorical variable. • The PredictorNames property stores one element for each of the original predictor variable names. For example, assume that there are three predictors, one of which is a categorical variable with three levels. Then PredictorNames is a 1-by-3 cell array of character vectors containing the original names of the predictor variables. • The ExpandedPredictorNames property stores one element for each of the predictor variables, including the dummy variables. For example, assume that there are three predictors, one of which is a categorical variable with three levels. Then ExpandedPredictorNames is a 1-by-5 cell array of character vectors containing the names of the predictor variables and the new dummy variables. • Similarly, the Beta property stores one beta coefficient for each predictor, including the dummy variables. • The SupportVectors property stores the predictor values for the support vectors, including the dummy variables. For example, assume that there are m support vectors and three predictors, one of which is a categorical variable with three levels. Then SupportVectors is an n-by-5 matrix. 35-774
• The X property stores the training data as originally input and does not include the dummy variables. When the input is a table, X contains only the columns used as predictors.
• For predictors specified in a table, if any of the variables contain ordered (ordinal) categories, the software uses ordinal encoding for these variables.
• For a variable with k ordered levels, the software creates k – 1 dummy variables. The jth dummy variable is –1 for levels up to j, and +1 for levels j + 1 through k.
• The names of the dummy variables stored in the ExpandedPredictorNames property indicate the first level with the value +1. The software stores k – 1 additional predictor names for the dummy variables, including the names of levels 2, 3, ..., k.
• All solvers implement L1 soft-margin minimization.
• For one-class learning, the software estimates the Lagrange multipliers, α1,...,αn, such that
∑j αj = nν   (sum over j = 1,...,n).
Version History Introduced in R2014a R2023b: Model stores observations with missing predictor values Behavior changed in R2023b Starting in R2023b, training observations with missing predictor values are included in the X, Y, and W data properties. The RowsUsed property indicates the training observations stored in the model, rather than those used for training. Observations with missing predictor values continue to be omitted from the model training process. In previous releases, the software omitted training observations that contained missing predictor values from the data properties of the model. R2022a: Cost property stores the user-specified cost matrix Behavior changed in R2022a Starting in R2022a, the Cost property stores the user-specified cost matrix, so that you can compute the observed misclassification cost using the specified cost value. The software stores normalized prior probabilities (Prior) and observation weights (W) that do not reflect the penalties described in the cost matrix. To compute the observed misclassification cost, specify the LossFun name-value argument as "classifcost" when you call the loss or resubLoss function. Note that model training has not changed and, therefore, the decision boundaries between classes have not changed. For training, the fitting function updates the specified prior probabilities by incorporating the penalties described in the specified cost matrix, and then normalizes the prior probabilities and observation weights. This behavior has not changed. In previous releases, the software stored the default cost matrix in the Cost property and stored the prior probabilities and observation weights used for training in the Prior and W properties, respectively. Starting in R2022a, the software stores the user-specified cost matrix without modification, and stores normalized prior probabilities and observation weights that do not reflect the cost penalties. For more details, see “Misclassification Cost Matrix, Prior Probabilities, and Observation Weights” on page 19-8. 35-775
Some object functions use the Cost, Prior, and W properties: • The loss and resubLoss functions use the cost matrix stored in the Cost property if you specify the LossFun name-value argument as "classifcost" or "mincost". • The loss and edge functions use the prior probabilities stored in the Prior property to normalize the observation weights of the input data. • The resubLoss and resubEdge functions use the observation weights stored in the W property. If you specify a nondefault cost matrix when you train a classification model, the object functions return a different value compared to previous releases. If you want the software to handle the cost matrix, prior probabilities, and observation weights in the same way as in previous releases, adjust the prior probabilities and observation weights for the nondefault cost matrix, as described in “Adjust Prior Probabilities and Observation Weights for Misclassification Cost Matrix” on page 19-9. Then, when you train a classification model, specify the adjusted prior probabilities and observation weights by using the Prior and Weights name-value arguments, respectively, and use the default cost matrix.
References [1] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, Second Edition. NY: Springer, 2008. [2] Scholkopf, B., J. C. Platt, J. C. Shawe-Taylor, A. J. Smola, and R. C. Williamson. “Estimating the Support of a High-Dimensional Distribution.” Neural Comput., Vol. 13, Number 7, 2001, pp. 1443–1471. [3] Christianini, N., and J. C. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, UK: Cambridge University Press, 2000. [4] Scholkopf, B., and A. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond, Adaptive Computation and Machine Learning. Cambridge, MA: The MIT Press, 2002.
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: • The predict and update functions support code generation. • To integrate the prediction of an SVM classification model into Simulink, you can use the ClassificationSVM Predict block in the Statistics and Machine Learning Toolbox library or a MATLAB Function block with the predict function. • When you train an SVM model by using fitcsvm, the following restrictions apply. • The value of the 'ScoreTransform' name-value pair argument cannot be an anonymous function. For generating code that predicts posterior probabilities given new observations, pass a trained SVM model to fitPosterior or fitSVMPosterior. The ScoreTransform property of the returned model contains an anonymous function that represents the score-toposterior-probability function and is configured for code generation. 35-776
• For fixed-point code generation, the value of the 'ScoreTransform' name-value pair argument cannot be 'invlogit'. Also, the value of the 'KernelFunction' name-value pair argument must be 'gaussian', 'linear', or 'polynomial'. • For fixed-point code generation and code generation with a coder configurer, the following additional restrictions apply. • Categorical predictors (logical, categorical, char, string, or cell) are not supported. You cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using dummyvar before fitting the model. • Class labels with the categorical data type are not supported. Both the class label value in the training data (Tbl or Y) and the value of the ClassNames name-value argument cannot be an array with the categorical data type. For more information, see “Introduction to Code Generation” on page 34-3. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. Usage notes and limitations: • The following object functions fully support GPU arrays: • compact • crossval • discardSupportVectors • fitPosterior • gather • resubEdge • resubLoss • resubMargin • resubPredict • resume • The following object functions offer limited support for GPU arrays: • compareHoldout • edge • loss • margin • partialDependence • plotPartialDependence • predict • The object functions execute on a GPU if either of the following apply: • The model was fitted with GPU arrays. • The predictor data that you pass to the object function is a GPU array. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox). 35-777
See Also fitcsvm | CompactClassificationSVM | ClassificationPartitionedModel Topics Using Support Vector Machines on page 25-6 Understanding Support Vector Machines on page 25-2
ClassificationSVMCoderConfigurer Coder configurer for support vector machine (SVM) for one-class and binary classification
Description A ClassificationSVMCoderConfigurer object is a coder configurer of an SVM classification model (ClassificationSVM or CompactClassificationSVM). A coder configurer offers convenient features to configure code generation options, generate C/C++ code, and update model parameters in the generated code. • Configure code generation options and specify the coder attributes of SVM model parameters by using object properties. • Generate C/C++ code for the predict and update functions of the SVM classification model by using generateCode. Generating C/C++ code requires MATLAB Coder. • Update model parameters in the generated C/C++ code without having to regenerate the code. This feature reduces the effort required to regenerate, redeploy, and reverify C/C++ code when you retrain the SVM model with new data or settings. Before updating model parameters, use validatedUpdateInputs to validate and extract the model parameters to update. This flow chart shows the code generation workflow using a coder configurer.
For the code generation usage notes and limitations of an SVM classification model, see the Code Generation sections of CompactClassificationSVM, predict, and update.
Creation After training an SVM classification model by using fitcsvm, create a coder configurer for the model by using learnerCoderConfigurer. Use the properties of a coder configurer to specify the coder attributes of predict and update arguments. Then, use generateCode to generate C/C++ code based on the specified coder attributes.
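A minimal sketch of this workflow, assuming MATLAB Coder is installed (X and Y are placeholder training data):
Mdl = fitcsvm(X,Y);                            % train the SVM classifier
configurer = learnerCoderConfigurer(Mdl,X);    % create the coder configurer
configurer.X.VariableDimensions = [1 0];       % allow a variable number of observations
generateCode(configurer)                       % generate C/C++ code for predict and update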
Properties predict Arguments The properties listed in this section specify the coder attributes of the predict function arguments in the generated code. 35-779
35
Functions
X — Coder attributes of predictor data LearnerCoderInput object Coder attributes of predictor data to pass to the generated C/C++ code for the predict function of the SVM classification model, specified as a LearnerCoderInput on page 35-792 object. When you create a coder configurer by using the learnerCoderConfigurer function, the input argument X determines the default values of the LearnerCoderInput coder attributes: • SizeVector — The default value is the array size of the input X. • VariableDimensions — This value is [0 0](default) or [1 0]. • [0 0] indicates that the array size is fixed as specified in SizeVector. • [1 0] indicates that the array has variable-size rows and fixed-size columns. In this case, the first value of SizeVector is the upper bound for the number of rows, and the second value of SizeVector is the number of columns. • DataType — This value is single or double. The default data type depends on the data type of the input X. • Tunability — This value must be true, meaning that predict in the generated C/C++ code always includes predictor data as an input. You can modify the coder attributes by using dot notation. For example, to generate C/C++ code that accepts predictor data with 100 observations of three predictor variables, specify these coder attributes of X for the coder configurer configurer: configurer.X.SizeVector = [100 3]; configurer.X.DataType = 'double'; configurer.X.VariableDimensions = [0 0];
[0 0] indicates that the first and second dimensions of X (number of observations and number of predictor variables, respectively) have fixed sizes. To allow the generated C/C++ code to accept predictor data with up to 100 observations, specify these coder attributes of X: configurer.X.SizeVector = [100 3]; configurer.X.DataType = 'double'; configurer.X.VariableDimensions = [1 0];
[1 0] indicates that the first dimension of X (number of observations) has a variable size and the second dimension of X (number of predictor variables) has a fixed size. The specified number of observations, 100 in this example, becomes the maximum allowed number of observations in the generated C/C++ code. To allow any number of observations, specify the bound as Inf. NumOutputs — Number of outputs in predict 1 (default) | 2 Number of output arguments to return from the generated C/C++ code for the predict function of the SVM classification model, specified as 1 or 2. The output arguments of predict are label (predicted class labels) and score (scores or posterior probabilities) in the order of listed. predict in the generated C/C++ code returns the first n outputs of the predict function, where n is the NumOutputs value. 35-780
After creating the coder configurer configurer, you can specify the number of outputs by using dot notation. configurer.NumOutputs = 2;
The NumOutputs property is equivalent to the '-nargout' compiler option of codegen. This option specifies the number of output arguments in the entry-point function of code generation. The object function generateCode generates two entry-point functions—predict.m and update.m for the predict and update functions of an SVM classification model, respectively—and generates C/C++ code for the two entry-point functions. The specified value for the NumOutputs property corresponds to the number of output arguments in the entry-point function predict.m. Data Types: double update Arguments The properties listed in this section specify the coder attributes of the update function arguments in the generated code. The update function takes a trained model and new model parameters as input arguments, and returns an updated version of the model that contains the new parameters. To enable updating the parameters in the generated code, you need to specify the coder attributes of the parameters before generating code. Use a LearnerCoderInput on page 35-792 object to specify the coder attributes of each parameter. The default attribute values are based on the model parameters in the input argument Mdl of learnerCoderConfigurer. Alpha — Coder attributes of trained classifier coefficients LearnerCoderInput object Coder attributes of the trained classifier coefficients (Alpha of an SVM classification model), specified as a LearnerCoderInput on page 35-792 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer: • SizeVector — The default value is [s,1], where s is the number of support vectors in Mdl. • VariableDimensions — This value is [0 0](default) or [1 0]. • [0 0] indicates that the array size is fixed as specified in SizeVector. • [1 0] indicates that the array has variable-size rows and fixed-size columns. In this case, the first value of SizeVector is the upper bound for the number of rows, and the second value of SizeVector is the number of columns. • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl. • Tunability — If you train a model with a linear kernel function, and the model stores the linear predictor coefficients (Beta) without the support vectors and related values, then this value must be false. Otherwise, this value must be true. Beta — Coder attributes of linear predictor coefficients LearnerCoderInput object Coder attributes of the linear predictor coefficients (Beta of an SVM classification model), specified as a LearnerCoderInput on page 35-792 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer: 35-781
• SizeVector — This value must be [p 1], where p is the number of predictors in Mdl. • VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector. • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl. • Tunability — If you train a model with a linear kernel function, and the model stores the linear predictor coefficients (Beta) without the support vectors and related values, then this value must be true. Otherwise, this value must be false. Bias — Coder attributes of bias term LearnerCoderInput object Coder attributes of the bias term (Bias of an SVM classification model), specified as a LearnerCoderInput on page 35-792 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer: • SizeVector — This value must be [1 1]. • VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector. • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl. • Tunability — This value must be true. Cost — Coder attributes of misclassification cost LearnerCoderInput object Coder attributes of the misclassification cost (Cost of an SVM classification model), specified as a LearnerCoderInput on page 35-792 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer: • SizeVector — For binary classification, this value must be [2 2]. For one-class classification, this value must be [1 1]. • VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector. • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl. • Tunability — For binary classification, the default value is true. For one-class classification, this value must be false. Mu — Coder attributes of predictor means LearnerCoderInput object Coder attributes of the predictor means (Mu of an SVM classification model), specified as a LearnerCoderInput on page 35-792 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer: 35-782
• SizeVector — If you train Mdl using standardized predictor data by specifying 'Standardize',true, this value must be [1,p], where p is the number of predictors in Mdl. Otherwise, this value must be [0,0]. • VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector. • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl. • Tunability — If you train Mdl using standardized predictor data by specifying 'Standardize',true, the default value is true. Otherwise, this value must be false. Prior — Coder attributes of prior probabilities LearnerCoderInput object Coder attributes of the prior probabilities (Prior of an SVM classification model), specified as a LearnerCoderInput on page 35-792 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer: • SizeVector — For binary classification, this value must be [1 2]. For one-class classification, this value must be [1 1]. • VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector. • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl. • Tunability — For binary classification, the default value is true. For one-class classification, this value must be false. Scale — Coder attributes of kernel scale parameter LearnerCoderInput object Coder attributes of the kernel scale parameter (KernelParameters.Scale of an SVM classification model), specified as a LearnerCoderInput on page 35-792 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer: • SizeVector — This value must be [1 1]. • VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector. • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl. • Tunability — The default value is true. Sigma — Coder attributes of predictor standard deviations LearnerCoderInput object Coder attributes of the predictor standard deviations (Sigma of an SVM classification model), specified as a LearnerCoderInput on page 35-792 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer: 35-783
• SizeVector — If you train Mdl using standardized predictor data by specifying 'Standardize',true, this value must be [1,p], where p is the number of predictors in Mdl. Otherwise, this value must be [0,0]. • VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector. • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl. • Tunability — If you train Mdl using standardized predictor data by specifying 'Standardize',true, the default value is true. Otherwise, this value must be false. SupportVectorLabels — Coder attributes of support vector class labels LearnerCoderInput object Coder attributes of the support vector class labels (SupportVectorLabels of an SVM classification model), specified as a LearnerCoderInput on page 35-792 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer: • SizeVector — The default value is [s,1], where s is the number of support vectors in Mdl. • VariableDimensions — This value is [0 0](default) or [1 0]. • [0 0] indicates that the array size is fixed as specified in SizeVector. • [1 0] indicates that the array has variable-size rows and fixed-size columns. In this case, the first value of SizeVector is the upper bound for the number of rows, and the second value of SizeVector is the number of columns. • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl. • Tunability — If you train a model with a linear kernel function, and the model stores the linear predictor coefficients (Beta) without the support vectors and related values, then this value must be false. Otherwise, this value must be true. SupportVectors — Coder attributes of support vectors LearnerCoderInput object Coder attributes of the support vectors (SupportVectors of an SVM classification model), specified as a LearnerCoderInput on page 35-792 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer: • SizeVector — The default value is [s,p], where s is the number of support vectors, and p is the number of predictors in Mdl. • VariableDimensions — This value is [0 0](default) or [1 0]. • [0 0] indicates that the array size is fixed as specified in SizeVector. • [1 0] indicates that the array has variable-size rows and fixed-size columns. In this case, the first value of SizeVector is the upper bound for the number of rows, and the second value of SizeVector is the number of columns. • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl. 35-784
• Tunability — If you train a model with a linear kernel function, and the model stores the linear predictor coefficients (Beta) without the support vectors and related values, then this value must be false. Otherwise, this value must be true. Other Configurer Options OutputFileName — File name of generated C/C++ code 'ClassificationSVMModel' (default) | character vector File name of the generated C/C++ code, specified as a character vector. The object function generateCode of ClassificationSVMCoderConfigurer generates C/C++ code using this file name. The file name must not contain spaces because they can lead to code generation failures in certain operating system configurations. Also, the name must be a valid MATLAB function name. After creating the coder configurer configurer, you can specify the file name by using dot notation. configurer.OutputFileName = 'myModel';
Data Types: char
Verbose — Verbosity level
true (logical 1) (default) | false (logical 0)
Verbosity level, specified as true (logical 1) or false (logical 0). The verbosity level controls the display of notification messages at the command line.
• true (logical 1) — The software displays notification messages when your changes to the coder attributes of a parameter result in changes for other dependent parameters.
• false (logical 0) — The software does not display notification messages.
To enable updating machine learning model parameters in the generated code, you need to configure the coder attributes of the parameters before generating code. The coder attributes of parameters are dependent on each other, so the software stores the dependencies as configuration constraints. If you modify the coder attributes of a parameter by using a coder configurer, and the modification requires subsequent changes to other dependent parameters to satisfy configuration constraints, then the software changes the coder attributes of the dependent parameters. The verbosity level determines whether or not the software displays notification messages for these subsequent changes. After creating the coder configurer configurer, you can modify the verbosity level by using dot notation. configurer.Verbose = false;
Data Types: logical Options for Code Generation Customization To customize the code generation workflow, use the generateFiles function and the following three properties with codegen, instead of using the generateCode function.
After generating the two entry-point function files (predict.m and update.m) by using the generateFiles function, you can modify these files according to your code generation workflow. For example, you can modify the predict.m file to include data preprocessing, or you can add these entry-point functions to another code generation project. Then, you can generate C/C++ code by using the codegen function and the codegen arguments appropriate for the modified entry-point functions or code generation project. Use the three properties described in this section as a starting point to set the codegen arguments. CodeGenerationArguments — codegen arguments cell array This property is read-only. codegen arguments, specified as a cell array. This property enables you to customize the code generation workflow. Use the generateCode function if you do not need to customize your workflow. Instead of using generateCode with the coder configurer configurer, you can generate C/C++ code as follows: generateFiles(configurer) cgArgs = configurer.CodeGenerationArguments; codegen(cgArgs{:})
If you customize the code generation workflow, modify cgArgs accordingly before calling codegen. If you modify other properties of configurer, the software updates the CodeGenerationArguments property accordingly. Data Types: cell PredictInputs — Input argument of predict cell array of a coder.PrimitiveType object This property is read-only. Input argument of the entry-point function predict.m for code generation, specified as a cell array of a coder.PrimitiveType object. The coder.PrimitiveType object includes the coder attributes of the predictor data stored in the X property. If you modify the coder attributes of the predictor data, then the software updates the coder.PrimitiveType object accordingly. The coder.PrimitiveType object in PredictInputs is equivalent to configurer.CodeGenerationArguments{6} for the coder configurer configurer. Data Types: cell UpdateInputs — List of tunable input arguments of update cell array of a structure including coder.PrimitiveType objects This property is read-only. List of the tunable input arguments of the entry-point function update.m for code generation, specified as a cell array of a structure including coder.PrimitiveType objects. Each
coder.PrimitiveType object includes the coder attributes of a tunable machine learning model parameter. If you modify the coder attributes of a model parameter by using the coder configurer properties (update Arguments on page 35-781 properties), then the software updates the corresponding coder.PrimitiveType object accordingly. If you specify the Tunability attribute of a machine learning model parameter as false, then the software removes the corresponding coder.PrimitiveType object from the UpdateInputs list. The structure in UpdateInputs is equivalent to configurer.CodeGenerationArguments{3} for the coder configurer configurer. Data Types: cell
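As a concrete illustration of the customized workflow described above, the following sketch (assuming the coder configurer configurer already exists and MATLAB Coder is installed) generates the entry-point files, leaves room for manual edits, and then calls codegen with the default argument list:

generateFiles(configurer)                     % write predict.m, update.m, and the model MAT-file
% ... edit predict.m here, for example to add data preprocessing ...
cgArgs = configurer.CodeGenerationArguments;  % starting point for the codegen arguments
codegen(cgArgs{:})                            % generate C/C++ code for the entry-point functions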
Object Functions
generateCode — Generate C/C++ code using coder configurer
generateFiles — Generate MATLAB files for code generation using coder configurer
validatedUpdateInputs — Validate and extract machine learning model parameters to update
Examples Generate Code Using Coder Configurer Train a machine learning model, and then generate code for the predict and update functions of the model by using a coder configurer. Load the ionosphere data set and train a binary SVM classification model. load ionosphere Mdl = fitcsvm(X,Y);
Mdl is a ClassificationSVM object, which is a linear SVM model. The predictor coefficients in a linear SVM model provide enough information to predict labels for new observations. Removing the support vectors reduces memory usage in the generated code. Remove the support vectors from the linear SVM model by using the discardSupportVectors function. Mdl = discardSupportVectors(Mdl);
Create a coder configurer for the ClassificationSVM model by using learnerCoderConfigurer. Specify the predictor data X. The learnerCoderConfigurer function uses the input X to configure the coder attributes of the predict function input.

configurer = learnerCoderConfigurer(Mdl,X)

configurer = 
  ClassificationSVMCoderConfigurer with properties:

   Update Inputs:
      Beta: [1x1 LearnerCoderInput]
     Scale: [1x1 LearnerCoderInput]
      Bias: [1x1 LearnerCoderInput]
     Prior: [1x1 LearnerCoderInput]
      Cost: [1x1 LearnerCoderInput]

   Predict Inputs:
         X: [1x1 LearnerCoderInput]

   Code Generation Parameters:
        NumOutputs: 1
    OutputFileName: 'ClassificationSVMModel'
configurer is a ClassificationSVMCoderConfigurer object, which is a coder configurer of a ClassificationSVM object. To generate C/C++ code, you must have access to a C/C++ compiler that is configured properly. MATLAB Coder locates and uses a supported, installed compiler. You can use mex -setup to view and change the default compiler. For more details, see “Change Default Compiler”. Generate code for the predict and update functions of the SVM classification model (Mdl) with default settings. generateCode(configurer) generateCode creates these files in output folder: 'initialize.m', 'predict.m', 'update.m', 'ClassificationSVMModel.mat' Code generation successful.
The generateCode function completes these actions:
• Generate the MATLAB files required to generate code, including the two entry-point functions predict.m and update.m for the predict and update functions of Mdl, respectively.
• Create a MEX function named ClassificationSVMModel for the two entry-point functions.
• Create the code for the MEX function in the codegen\mex\ClassificationSVMModel folder.
• Copy the MEX function to the current folder.
Display the contents of the predict.m, update.m, and initialize.m files by using the type function.

type predict.m

function varargout = predict(X,varargin) %#codegen
% Autogenerated by MATLAB, 20-Aug-2023 10:42:48
[varargout{1:nargout}] = initialize('predict',X,varargin{:});
end

type update.m

function update(varargin) %#codegen
% Autogenerated by MATLAB, 20-Aug-2023 10:42:48
initialize('update',varargin{:});
end

type initialize.m

function [varargout] = initialize(command,varargin) %#codegen
% Autogenerated by MATLAB, 20-Aug-2023 10:42:48
coder.inline('always')
persistent model
if isempty(model)
    model = loadLearnerForCoder('ClassificationSVMModel.mat');
end
switch(command)
    case 'update'
        % Update struct fields: Beta
        %                       Scale
        %                       Bias
        %                       Prior
        %                       Cost
        model = update(model,varargin{:});
    case 'predict'
        % Predict Inputs: X
        X = varargin{1};
        if nargin == 2
            [varargout{1:nargout}] = predict(model,X);
        else
            PVPairs = cell(1,nargin-2);
            for i = 1:nargin-2
                PVPairs{1,i} = varargin{i+1};
            end
            [varargout{1:nargout}] = predict(model,X,PVPairs{:});
        end
end
end
Update Parameters of SVM Classification Model in Generated Code Train an SVM model using a partial data set and create a coder configurer for the model. Use the properties of the coder configurer to specify coder attributes of the SVM model parameters. Use the object function of the coder configurer to generate C code that predicts labels for new predictor data. Then retrain the model using the whole data set and update parameters in the generated code without regenerating the code. Train Model Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g'). load ionosphere
Train a binary SVM classification model using the first 50 observations and a Gaussian kernel function with an automatic kernel scale. Mdl = fitcsvm(X(1:50,:),Y(1:50), ... 'KernelFunction','gaussian','KernelScale','auto');
Mdl is a ClassificationSVM object. Create Coder Configurer Create a coder configurer for the ClassificationSVM model by using learnerCoderConfigurer. Specify the predictor data X. The learnerCoderConfigurer function uses the input X to configure the coder attributes of the predict function input. Also, set the number of outputs to 2 so that the generated code returns predicted labels and scores.
configurer = learnerCoderConfigurer(Mdl,X(1:50,:),'NumOutputs',2);
configurer is a ClassificationSVMCoderConfigurer object, which is a coder configurer of a ClassificationSVM object. Specify Coder Attributes of Parameters Specify the coder attributes of the SVM classification model parameters so that you can update the parameters in the generated code after retraining the model. This example specifies the coder attributes of predictor data that you want to pass to the generated code and the coder attributes of the support vectors of the SVM model. First, specify the coder attributes of X so that the generated code accepts any number of observations. Modify the SizeVector and VariableDimensions attributes. The SizeVector attribute specifies the upper bound of the predictor data size, and the VariableDimensions attribute specifies whether each dimension of the predictor data has a variable size or fixed size. configurer.X.SizeVector = [Inf 34]; configurer.X.VariableDimensions = [true false];
The size of the first dimension is the number of observations. In this case, the code specifies that the upper bound of the size is Inf and the size is variable, meaning that X can have any number of observations. This specification is convenient if you do not know the number of observations when generating code. The size of the second dimension is the number of predictor variables. This value must be fixed for a machine learning model. X contains 34 predictors, so the value of the SizeVector attribute must be 34 and the value of the VariableDimensions attribute must be false. If you retrain the SVM model using new data or different settings, the number of support vectors can vary. Therefore, specify the coder attributes of SupportVectors so that you can update the support vectors in the generated code. configurer.SupportVectors.SizeVector = [250 34];
SizeVector attribute for Alpha has been modified to satisfy configuration constraints.
SizeVector attribute for SupportVectorLabels has been modified to satisfy configuration constraints.

configurer.SupportVectors.VariableDimensions = [true false];

VariableDimensions attribute for Alpha has been modified to satisfy configuration constraints.
VariableDimensions attribute for SupportVectorLabels has been modified to satisfy configuration constraints.
If you modify the coder attributes of SupportVectors, then the software modifies the coder attributes of Alpha and SupportVectorLabels to satisfy configuration constraints. If the modification of the coder attributes of one parameter requires subsequent changes to other dependent parameters to satisfy configuration constraints, then the software changes the coder attributes of the dependent parameters. Generate Code To generate C/C++ code, you must have access to a C/C++ compiler that is configured properly. MATLAB Coder locates and uses a supported, installed compiler. You can use mex -setup to view and change the default compiler. For more details, see "Change Default Compiler". Use generateCode to generate code for the predict and update functions of the SVM classification model (Mdl) with default settings.
generateCode(configurer) generateCode creates these files in output folder: 'initialize.m', 'predict.m', 'update.m', 'ClassificationSVMModel.mat' Code generation successful.
generateCode generates the MATLAB files required to generate code, including the two entry-point functions predict.m and update.m for the predict and update functions of Mdl, respectively. Then generateCode creates a MEX function named ClassificationSVMModel for the two entry-point functions in the codegen\mex\ClassificationSVMModel folder and copies the MEX function to the current folder. Verify Generated Code Pass some predictor data to verify whether the predict function of Mdl and the predict function in the MEX function return the same labels. To call an entry-point function in a MEX function that has more than one entry point, specify the function name as the first input argument. [label,score] = predict(Mdl,X); [label_mex,score_mex] = ClassificationSVMModel('predict',X);
Compare label and label_mex by using isequal. isequal(label,label_mex) ans = logical 1
isequal returns logical 1 (true) if all the inputs are equal. The comparison confirms that the predict function of Mdl and the predict function in the MEX function return the same labels. score_mex might include round-off differences compared with score. In this case, compare score_mex and score, allowing a small tolerance. find(abs(score-score_mex) > 1e-8) ans = 0x1 empty double column vector
The comparison confirms that score and score_mex are equal within the tolerance 1e-8. Retrain Model and Update Parameters in Generated Code Retrain the model using the entire data set. retrainedMdl = fitcsvm(X,Y, ... 'KernelFunction','gaussian','KernelScale','auto');
Extract parameters to update by using validatedUpdateInputs. This function detects the modified model parameters in retrainedMdl and validates whether the modified parameter values satisfy the coder attributes of the parameters. params = validatedUpdateInputs(configurer,retrainedMdl);
Update parameters in the generated code. ClassificationSVMModel('update',params)
Verify Generated Code Compare the outputs from the predict function of retrainedMdl and the predict function in the updated MEX function. [label,score] = predict(retrainedMdl,X); [label_mex,score_mex] = ClassificationSVMModel('predict',X); isequal(label,label_mex) ans = logical 1 find(abs(score-score_mex) > 1e-8) ans = 0x1 empty double column vector
The comparison confirms that label and label_mex are equal, and the score values are equal within the tolerance.
More About
LearnerCoderInput Object
A coder configurer uses a LearnerCoderInput object to specify the coder attributes of predict and update input arguments. A LearnerCoderInput object has the following attributes to specify the properties of an input argument array in the generated code.
• SizeVector — Array size if the corresponding VariableDimensions value is false. Upper bound of the array size if the corresponding VariableDimensions value is true. To allow an unbounded array, specify the bound as Inf.
• VariableDimensions — Indicator specifying whether each dimension of the array has a variable size or fixed size, specified as true (logical 1) or false (logical 0). A value of true (logical 1) means that the corresponding dimension has a variable size. A value of false (logical 0) means that the corresponding dimension has a fixed size.
• DataType — Data type of the array.
• Tunability — Indicator specifying whether or not predict or update includes the argument as an input in the generated code, specified as true (logical 1) or false (logical 0). If you specify other attribute values when Tunability is false, the software sets Tunability to true.
After creating a coder configurer, you can modify the coder attributes by using dot notation. For example, specify the coder attributes of the coefficients Alpha of the coder configurer configurer as follows:

configurer.Alpha.SizeVector = [100 1];
configurer.Alpha.VariableDimensions = [1 0];
configurer.Alpha.DataType = 'double';
If you specify the verbosity level (Verbose) as true (default), then the software displays notification messages when you modify the coder attributes of a machine learning model parameter and the modification changes the coder attributes of other dependent parameters.
Version History Introduced in R2018b
See Also learnerCoderConfigurer | CompactClassificationSVM | ClassificationSVM | update | predict | ClassificationECOCCoderConfigurer Topics “Introduction to Code Generation” on page 34-3 “Code Generation for Prediction and Update Using Coder Configurer” on page 34-90
ClassificationSVM Predict Classify observations using support vector machine (SVM) classifier for one-class and binary classification Libraries: Statistics and Machine Learning Toolbox / Classification
Description The ClassificationSVM Predict block classifies observations using an SVM classification object (ClassificationSVM or CompactClassificationSVM) for one-class and two-class (binary) classification. Import a trained SVM classification object into the block by specifying the name of a workspace variable that contains the object. The input port x receives an observation (predictor data), and the output port label returns a predicted class label for the observation. You can add the optional output port score, which returns predicted class scores or posterior probabilities.
Ports Input x — Predictor data row vector | column vector Predictor data, specified as a column vector or row vector of one observation. The variables in x must have the same order as the predictor variables that trained the SVM model specified by Select trained machine learning model. If you set 'Standardize',true in fitcsvm when training the SVM model, then the ClassificationSVM Predict block standardizes the values of x using the means and standard deviations in the Mu and Sigma properties (respectively) of the SVM model. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point Output label — Predicted class label scalar Predicted class label, returned as a scalar. • For one-class learning, label is the value representing the positive class. • For two-class learning, label is the class yielding the largest score or the largest posterior probability.
Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point | enumerated score — Predicted class scores or posterior probabilities scalar | 1-by-2 vector Predicted class scores on page 35-805 or posterior probabilities on page 35-806, returned as a scalar for one-class learning or a 1-by-2 vector for two-class learning. • For one-class learning, score is the classification score of the positive class. You cannot obtain posterior probabilities for one-class learning. • For two-class learning, score is a 1-by-2 vector. • The first and second element of score correspond to the classification scores of the negative class (svmMdl.ClassNames(1)) and the positive class (svmMdl.ClassNames(2)), respectively, where svmMdl is the SVM model specified by Select trained machine learning model. You can use the ClassNames property of svmMdl to check the negative and positive class names. • If you fit the optimal score-to-posterior-probability transformation function using fitPosterior or fitSVMPosterior, then score contains class posterior probabilities. Otherwise, score contains class scores. Dependencies
To enable this port, select the check box for Add output port for predicted class scores on the Main tab of the Block Parameters dialog box. Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point
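The following command-line sketch (illustrative, not part of the block documentation) shows how the score order follows the ClassNames property for a two-class model:

load ionosphere
svmMdl = fitcsvm(X,Y);                    % ClassNames is {'b';'g'}; 'g' is the positive class
[label,score] = predict(svmMdl,X(1,:));
svmMdl.ClassNames                         % score(1) corresponds to ClassNames(1), score(2) to ClassNames(2)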
Parameters Main Select trained machine learning model — SVM classification model svmMdl (default) | ClassificationSVM object | CompactClassificationSVM object Specify the name of a workspace variable that contains a ClassificationSVM object or CompactClassificationSVM object. When you train the SVM model by using fitcsvm, the following restrictions apply: • The predictor data cannot include categorical predictors (logical, categorical, char, string, or cell). If you supply training data in a table, the predictors must be numeric (double or single). Also, you cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using dummyvar before fitting the model, as shown in the sketch after this list. • The value of the ScoreTransform name-value argument cannot be 'invlogit' or an anonymous function. For a block that predicts posterior probabilities given new observations, pass a trained SVM model to fitPosterior or fitSVMPosterior. • The value of the KernelFunction name-value argument must be 'gaussian' (same as 'rbf', default for one-class learning), 'linear' (default for two-class learning), or 'polynomial'.
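The following sketch (hypothetical data and variable names, for illustration only) shows one way to encode a categorical predictor with dummyvar before training a model for use with this block:

g = categorical({'a';'b';'a';'c'});    % hypothetical categorical predictor
Xnum = [1.2; 0.5; 3.1; 2.2];           % hypothetical numeric predictor
Xtrain = [Xnum dummyvar(g)];           % numeric matrix with dummy variables
Y = [1; -1; 1; -1];                    % hypothetical binary response
svmMdl = fitcsvm(Xtrain,Y,'KernelFunction','linear');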
Programmatic Use
Block Parameter: TrainedLearner Type: workspace variable Values: ClassificationSVM object | CompactClassificationSVM object Default: 'svmMdl' Add output port for predicted class scores — Add second output port for predicted class scores off (default) | on Select the check box to include the second output port score in the ClassificationSVM Predict block. Programmatic Use
Block Parameter: ShowOutputScore Type: character vector Values: 'off' | 'on' Default: 'off' Data Types Fixed-Point Operational Parameters
Integer rounding mode — Rounding mode for fixed-point operations Floor (default) | Ceiling | Convergent | Nearest | Round | Simplest | Zero Specify the rounding mode for fixed-point operations. For more information, see “Rounding” (FixedPoint Designer). Block parameters always round to the nearest representable value. To control the rounding of a block parameter, enter an expression into the mask field using a MATLAB rounding function. Programmatic Use
Block Parameter: RndMeth Type: character vector Values: "Ceiling" | "Convergent" | "Floor" | "Nearest" | "Round" | "Simplest" | "Zero" Default: "Floor" Saturate on integer overflow — Method of overflow action off (default) | on Specify whether overflows saturate or wrap.
• Select this check box (on) when your model has possible overflow and you want explicit saturation protection in the generated code. Overflows saturate to either the minimum or maximum value that the data type can represent. For example, the maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box selected, the block output saturates at 127. Similarly, the block output saturates at a minimum output value of –128.
• Clear this check box (off) when you want to optimize the efficiency of your generated code, or when you want to avoid overspecifying how a block handles out-of-range signals (for more information, see "Troubleshoot Signal Range Errors" (Simulink)). Overflows wrap to the appropriate value that the data type can represent. For example, the maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box cleared, the software interprets the value causing the overflow as int8, which can produce an unintended result. For example, a block result of 130 (binary 1000 0010) expressed as int8 is –126.
Programmatic Use
Block Parameter: SaturateOnIntegerOverflow Type: character vector Values: "off" | "on" Default: "off" Lock output data type setting against changes by the fixed-point tools — Prevention of fixed-point tools from overriding data type off (default) | on Select this parameter to prevent the fixed-point tools from overriding the data type you specify for the block. For more information, see "Use Lock Output Data Type Setting" (Fixed-Point Designer). Programmatic Use
Block Parameter: LockScale Type: character vector Values: "off" | "on" Default: "off"
Data Type
Label data type — Data type of label output Inherit: Inherit via back propagation | Inherit: auto | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | Enum: <class name> | <data type expression> Specify the data type for the label output. The type can be inherited, specified as an enumerated data type, or expressed as a data type object such as Simulink.NumericType. The supported data types depend on the labels used in the model specified by Select trained machine learning model. • If the model uses numeric or logical labels, the supported data types are Inherit: Inherit via back propagation (default), double, single, half, int8, uint8, int16, uint16, int32, uint32, int64, uint64, boolean, fixed point, and a data type object. • If the model uses nonnumeric labels, the supported data types are Inherit: auto (default), Enum: <class name>, and a data type object. When you select an inherited option, the software behaves as follows: • Inherit: Inherit via back propagation (default for numeric and logical labels) — Simulink automatically determines the Label data type of the block during data type propagation (see "Data Type Propagation" (Simulink)). In this case, the block uses the data type of a downstream block or signal object. • Inherit: auto (default for nonnumeric labels) — The block uses an autodefined enumerated data type variable. For example, suppose the workspace variable name specified by Select trained machine learning model is myMdl, and the class labels are class 1 and class 2. Then, the corresponding label values are myMdl_enumLabels.class_1 and myMdl_enumLabels.class_2. The block converts the class labels to valid MATLAB identifiers by using the matlab.lang.makeValidName function. For more information about data types, see "Control Data Types of Signals" (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see "Specify Data Types Using Data Type Assistant" (Simulink). Programmatic Use
Block Parameter: LabelDataTypeStr Type: character vector Values: "Inherit: Inherit via back propagation" | "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "Enum: <class name>" | "<data type expression>" Default: "Inherit: Inherit via back propagation" (for numeric and logical labels) | "Inherit: auto" (for nonnumeric labels) Label data type Minimum — Minimum value of label output for range checking [] (default) | scalar Specify the lower value of the label output range that Simulink checks.
Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Label data type Minimum parameter does not saturate or clip the actual label output signal. To do so, use the Saturation block instead. Dependencies
You can specify this parameter only if the model specified by Select trained machine learning model uses numeric labels. Programmatic Use
Block Parameter: LabelOutMin Type: character vector Values: "[]" | scalar Default: "[]" Label data type Maximum — Maximum value of label output for range checking [] (default) | scalar Specify the upper value of the label output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Label data type Maximum parameter does not saturate or clip the actual label output signal. To do so, use the Saturation block instead. Dependencies
You can specify this parameter only if the model specified by Select trained machine learning model uses numeric labels.
Programmatic Use
Block Parameter: LabelOutMax Type: character vector Values: "[]" | scalar Default: "[]" Score data type — Data type of score output Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | <data type expression> Specify the data type for the score output. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: auto, the block uses a rule that inherits a data type. For more information about data types, see "Control Data Types of Signals" (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see "Specify Data Types Using Data Type Assistant" (Simulink). Programmatic Use
Block Parameter: ScoreDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "Inherit: auto" Score data type Minimum — Minimum value of score output for range checking [] (default) | scalar Specify the lower value of the score output range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Score data type Minimum parameter does not saturate or clip the actual score output. To do so, use the Saturation block instead. 35-800
Programmatic Use
Block Parameter: ScoreOutMin Type: character vector Values: "[]" | scalar Default: "[]" Score data type Maximum — Maximum value of score output for range checking [] (default) | scalar Specify the upper value of the score output range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Score data type Maximum parameter does not saturate or clip the actual score output. To do so, use the Saturation block instead. Programmatic Use
Block Parameter: ScoreOutMax Type: character vector Values: "[]" | scalar Default: "[]" Raw score data type — Untransformed score data type Inherit: auto (default) | double | single | half | int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | <data type expression> Specify the data type for the internal untransformed scores. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType. When you select Inherit: auto, the block uses a rule that inherits a data type. For more information about data types, see "Control Data Types of Signals" (Simulink). Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see "Specify Data Types Using Data Type Assistant" (Simulink).
Dependencies
You can specify this parameter only if the model specified by Select trained machine learning model uses a score transformation other than "none" (default, same as "identity"). • If the model uses no score transformations ("none" or "identity"), then you can specify the score data type by using Score data type. • If the model uses a score transformation other than "none" or "identity", then you can specify the data type of untransformed raw scores by using this parameter. To specify the data type of transformed scores, use Score data type. You can change the score transformation option by specifying the ScoreTransform name-value argument during training, or by modifying the ScoreTransform property after training. Programmatic Use
Block Parameter: RawScoreDataTypeStr Type: character vector Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "boolean" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "" Default: "Inherit: auto" Raw score data type Minimum — Minimum untransformed score for range checking [] (default) | scalar Specify the lower value of the untransformed score range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Raw score data type Minimum parameter does not saturate or clip the actual untransformed score signal. Programmatic Use
Block Parameter: RawScoreOutMin Type: character vector Values: "[]" | scalar Default: "[]" Raw score data type Maximum — Maximum untransformed score for range checking [] (default) | scalar Specify the upper value of the untransformed score range that Simulink checks.
Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Raw score data type Maximum parameter does not saturate or clip the actual untransformed score signal. Programmatic Use
Block Parameter: RawScoreOutMax Type: character vector Values: "[]" | scalar Default: "[]" Kernel data type — Kernel computation data type double (default) | single | half | int8 | uint8 | int16 | uint16 | int32 | int64 | uint64 | uint32 | boolean | fixdt(1,16,0) | fixdt(1,16,2^0,0) | <data type expression> Specify the data type of a parameter for kernel computation. The type can be specified directly or expressed as a data type object such as Simulink.NumericType. The Kernel data type parameter specifies the data type of a different parameter depending on the type of kernel function of the specified SVM model. You specify the KernelFunction name-value argument when training the SVM model.
• 'gaussian' or 'rbf' — Kernel data type specifies the data type of the squared distance $D^2 = \lVert x - s \rVert^2$ for the Gaussian kernel $G(x,s) = \exp(-D^2)$, where x is the predictor data for an observation and s is a support vector.
• 'linear' — Kernel data type specifies the data type for the output of the linear kernel function $G(x,s) = x s'$, where x is the predictor data for an observation and s is a support vector.
• 'polynomial' — Kernel data type specifies the data type for the output of the polynomial kernel function $G(x,s) = (1 + x s')^{p}$, where x is the predictor data for an observation, s is a support vector, and p is the polynomial kernel function order.
For more information about data types, see "Control Data Types of Signals" (Simulink).
Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see “Specify Data Types Using Data Type Assistant” (Simulink). Programmatic Use
Block Parameter: KernelDataTypeStr Type: character vector Values: 'double' | 'single' | 'half' | 'int8' | 'uint8' | 'int16' | 'uint16' | 'int32' | 'uint32' | 'uint64' | 'int64' | 'boolean' | 'fixdt(1,16,0)' | 'fixdt(1,16,2^0,0)' | '' Default: 'double' Kernel data type Minimum — Minimum kernel computation value for range checking [] (default) | scalar Specify the lower value of the kernel computation internal variable range that Simulink checks. Simulink uses the minimum value to perform: • Parameter range checking for some blocks (see “Specify Minimum and Maximum Values for Block Parameters” (Simulink)). • Simulation range checking (see “Specify Signal Ranges” (Simulink) and “Enable Simulation Range Checking” (Simulink)). • Automatic scaling of fixed-point data types. • Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as software-in-the-loop (SIL) mode or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Kernel data type Minimum parameter does not saturate or clip the actual kernel computation value signal. Programmatic Use
Block Parameter: KernelOutMin Type: character vector Values: '[]' | scalar Default: '[]' Kernel data type Maximum — Maximum kernel computation value for range checking [] (default) | scalar Specify the upper value of the kernel computation internal variable range that Simulink checks. Simulink uses the maximum value to perform: • Parameter range checking for some blocks (see "Specify Minimum and Maximum Values for Block Parameters" (Simulink)). • Simulation range checking (see "Specify Signal Ranges" (Simulink) and "Enable Simulation Range Checking" (Simulink)). • Automatic scaling of fixed-point data types.
• Optimization of the code that you generate from the model. This optimization can remove algorithmic code and affect the results of some simulation modes, such as SIL or external mode. For more information, see Optimize using the specified minimum and maximum values (Embedded Coder). Note The Kernel data type Maximum parameter does not saturate or clip the actual kernel computation value signal. Programmatic Use
Block Parameter: KernelOutMax Type: character vector Values: '[]' | scalar Default: '[]'
Block Characteristics
Data Types: Boolean | double | enumerated | fixed point | half | integer | single
Direct Feedthrough: yes
Multidimensional Signals: no
Variable-Size Signals: no
Zero-Crossing Detection: no
More About
Classification Score
The SVM classification score for classifying observation x is the signed distance from x to the decision boundary, ranging from -∞ to +∞. A positive score for a class indicates that x is predicted to be in that class. A negative score indicates otherwise.
The positive class classification score f(x) is the trained SVM classification function. f(x) is also the numerical predicted response for x, or the score for predicting x into the positive class.

$$f(x) = \sum_{j=1}^{n} \alpha_j y_j G(x_j, x) + b,$$

where $(\alpha_1, \ldots, \alpha_n, b)$ are the estimated SVM parameters, $G(x_j, x)$ is the dot product in the predictor space between x and the support vectors, and the sum includes the training set observations. The negative class classification score for x, or the score for predicting x into the negative class, is –f(x).
If $G(x_j, x) = x_j' x$ (the linear kernel), then the score function reduces to $f(x) = (x/s)'\beta + b$, where s is the kernel scale and β is the vector of fitted linear coefficients.
For more details, see "Understanding Support Vector Machines" on page 25-2.
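For a linear-kernel model with the default score transform, you can check the reduced score formula numerically. The following sketch (illustrative, using the ionosphere data set) compares the manual computation with the positive-class column of the predict scores:

load ionosphere
Mdl = fitcsvm(X,Y);                      % linear kernel by default for two-class learning
s = Mdl.KernelParameters.Scale;          % kernel scale
fManual = (X./s)*Mdl.Beta + Mdl.Bias;    % f(x) = (x/s)'*beta + b for every observation
[~,score] = predict(Mdl,X);
max(abs(fManual - score(:,2)))           % expected to be near zero (round-off only)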
Posterior Probability
The posterior probability is the probability that an observation belongs in a particular class, given the data. For SVM, the posterior probability is a function of the score P(s) that observation j is in class k = {-1,1}.
• For separable classes, the posterior probability is the step function

$$P(s_j) = \begin{cases} 0, & s_j < \max_{y_k = -1} s_k \\ \pi, & \max_{y_k = -1} s_k \le s_j \le \min_{y_k = +1} s_k \\ 1, & s_j > \min_{y_k = +1} s_k, \end{cases}$$

where:
• sj is the score of observation j.
• +1 and –1 denote the positive and negative classes, respectively.
• π is the prior probability that an observation is in the positive class.
• For inseparable classes, the posterior probability is the sigmoid function

$$P(s_j) = \frac{1}{1 + \exp(A s_j + B)},$$

where the parameters A and B are the slope and intercept parameters, respectively.
Prior Probability
The prior probability of a class is the assumed relative frequency with which observations from that class occur in a population.
Tips • If you are using a linear SVM model and it has many support vectors, then prediction (classifying observations) can be slow. To efficiently classify observations based on a linear SVM model, remove the support vectors from the ClassificationSVM or CompactClassificationSVM object by using discardSupportVectors.
Alternative Functionality You can use a MATLAB Function block with the predict object function of an SVM classification object (ClassificationSVM or CompactClassificationSVM). For an example, see "Predict Class Labels Using MATLAB Function Block" on page 34-49; a minimal sketch also follows the list below. When deciding whether to use the ClassificationSVM Predict block in the Statistics and Machine Learning Toolbox library or a MATLAB Function block with the predict function, consider the following: • If you use the Statistics and Machine Learning Toolbox library block, you can use the Fixed-Point Tool to convert a floating-point model to fixed point.
• Support for variable-size arrays must be enabled for a MATLAB Function block with the predict function. • If you use a MATLAB Function block, you can use MATLAB functions for preprocessing or postprocessing before or after predictions in the same MATLAB Function block.
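A minimal sketch of a MATLAB Function block body that follows this pattern is shown below. The function name, the MAT-file name, and the assumption that the model was saved beforehand with saveLearnerForCoder and uses numeric class labels are illustrative, not requirements stated here:

function label = mySVMPredict(x) %#codegen
% Load the trained model once and reuse it on subsequent calls.
persistent mdl
if isempty(mdl)
    mdl = loadLearnerForCoder('svmMdl.mat');  % model previously saved with saveLearnerForCoder
end
label = predict(mdl,x);                       % numeric labels keep the output a valid signal
end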
Version History Introduced in R2020b R2021a: Default value of Label data type is Inherit: Inherit via back propagation for numeric and logical labels and Inherit: auto for nonnumeric labels Behavior changed in R2021a Starting in R2021a, the default data type value and the supported data types of the Label data type parameter depend on the labels used in the model specified by Select trained machine learning model. The default value is Inherit: Inherit via back propagation for numeric and logical labels, and Inherit: auto for nonnumeric labels. If you specified Label data type as Inherit: Inherit via back propagation for nonnumeric labels or Inherit: Inherit from 'Constant value', then change the value to Inherit: auto. R2021a: Default value of Score data type and Raw score data type is Inherit: auto Behavior changed in R2021a Starting in R2021a, the default value of the parameters Score data type and Raw score data type is Inherit: auto. R2021a: Specify Kernel data type as a data type name or data type object Behavior changed in R2021a Starting in R2021a, the Kernel data type parameter does not support inherited options. You can specify Kernel data type as a supported data type name or data type object.
Extended Capabilities C/C++ Code Generation Generate C and C++ code using Simulink® Coder™. Fixed-Point Conversion Design and simulate fixed-point systems using Fixed-Point Designer™.
See Also Blocks ClassificationTree Predict | ClassificationEnsemble Predict | ClassificationECOC Predict | RegressionSVM Predict Objects ClassificationSVM | CompactClassificationSVM Functions predict | fitcsvm 35-807
Topics “Predict Class Labels Using ClassificationTree Predict Block” on page 34-131 “Predict Class Labels Using ClassificationEnsemble Predict Block” on page 34-140 “Predict Class Labels Using ClassificationECOC Predict Block” on page 34-182 “Predict Class Labels Using MATLAB Function Block” on page 34-49
ClassificationTree class Superclasses: CompactClassificationTree Binary decision tree for multiclass classification
Description A ClassificationTree object represents a decision tree with binary splits for classification. An object of this class can predict responses for new data using the predict method. The object contains the data used for training, so it can also compute resubstitution predictions.
Construction Create a ClassificationTree object by using fitctree.
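For example, a minimal sketch using the fisheriris sample data:

% Minimal sketch: construct a ClassificationTree and use it for prediction.
load fisheriris
tree = fitctree(meas,species);      % ClassificationTree object
label = predict(tree,meas(1:5,:));  % predict labels for new observations
resubLoss(tree)                     % resubstitution classification error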
Properties BinEdges Bin edges for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors. The software bins numeric predictors only if you specify the 'NumBins' name-value argument as a positive integer scalar when training a model with tree learners. The BinEdges property is empty if the 'NumBins' value is empty (default). You can reproduce the binned predictor data Xbinned by using the BinEdges property of the trained model mdl. X = mdl.X; % Predictor data Xbinned = zeros(size(X)); edges = mdl.BinEdges; % Find indices of binned predictors. idxNumeric = find(~cellfun(@isempty,edges)); if iscolumn(idxNumeric) idxNumeric = idxNumeric'; end for j = idxNumeric x = X(:,j); % Convert x to array if x is a table. if istable(x) x = table2array(x); end % Group x into bins by using the discretize function. xbinned = discretize(x,[-inf; edges{j}; inf]); Xbinned(:,j) = xbinned; end
Xbinned contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs. CategoricalPredictors Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]). CategoricalSplit An n-by-2 cell array, where n is the number of categorical splits in tree. Each row in CategoricalSplit gives left and right values for a categorical split. For each branch node with categorical split j based on a categorical predictor variable z, the left child is chosen if z is in CategoricalSplit(j,1) and the right child is chosen if z is in CategoricalSplit(j,2). The splits are in the same order as nodes of the tree. Nodes for these splits can be found by running cuttype and selecting 'categorical' cuts from top to bottom. Children An n-by-2 array containing the numbers of the child nodes for each node in tree, where n is the number of nodes. Leaf nodes have child node 0. ClassCount An n-by-k array of class counts for the nodes in tree, where n is the number of nodes and k is the number of classes. For any node number i, the class counts ClassCount(i,:) are counts of observations (from the data used in fitting the tree) from each class satisfying the conditions for node i. ClassNames List of the elements in Y with duplicates removed. ClassNames can be a categorical array, cell array of character vectors, character array, logical vector, or a numeric vector. ClassNames has the same data type as the data in the argument Y. (The software treats string arrays as cell arrays of character vectors.) ClassProbability An n-by-k array of class probabilities for the nodes in tree, where n is the number of nodes and k is the number of classes. For any node number i, the class probabilities ClassProbability(i,:) are the estimated probabilities for each class for a point satisfying the conditions for node i. Cost Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response. This property is readonly.
CutCategories

An n-by-2 cell array of the categories used at branches in tree, where n is the number of nodes. For each branch node i based on a categorical predictor variable X, the left child is chosen if X is among the categories listed in CutCategories{i,1}, and the right child is chosen if X is among those listed in CutCategories{i,2}. Both columns of CutCategories are empty for branch nodes based on continuous predictors and for leaf nodes. CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories.

CutPoint

An n-element vector of the values used as cut points in tree, where n is the number of nodes. For each branch node i based on a continuous predictor variable X, the left child is chosen if X < CutPoint(i) and the right child is chosen if X >= CutPoint(i). CutPoint is NaN for branch nodes based on categorical predictors and for leaf nodes. CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories.

CutType

An n-element cell array indicating the type of cut at each node in tree, where n is the number of nodes. For each node i, CutType{i} is:
• 'continuous' — If the cut is defined in the form X < v for a variable X and cut point v.
• 'categorical' — If the cut is defined by whether a variable X takes a value in a set of categories.
• '' — If i is a leaf node.
CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories.

CutPredictor

An n-element cell array of the names of the variables used for branching in each node in tree, where n is the number of nodes. These variables are sometimes known as cut variables. For leaf nodes, CutPredictor contains an empty character vector. CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories.

CutPredictorIndex

An n-element array of numeric indices for the variables used for branching in each node in tree, where n is the number of nodes. For more information, see CutPredictor.

ExpandedPredictorNames

Expanded predictor names, stored as a cell array of character vectors. If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.
HyperparameterOptimizationResults Description of the cross-validation optimization of hyperparameters, stored as a BayesianOptimization object or a table of hyperparameters and associated values. Nonempty when the OptimizeHyperparameters name-value pair is nonempty at creation. Value depends on the setting of the HyperparameterOptimizationOptions name-value pair at creation: • 'bayesopt' (default) — Object of class BayesianOptimization • 'gridsearch' or 'randomsearch' — Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst) IsBranchNode An n-element logical vector that is true for each branch node and false for each leaf node of tree. ModelParameters Parameters used in training tree. To display all parameter values, enter tree.ModelParameters. To access a particular parameter, use dot notation. NumObservations Number of observations in the training data, a numeric scalar. NumObservations can be less than the number of rows of input data X when there are missing values in X or response Y. NodeClass An n-element cell array with the names of the most probable classes in each node of tree, where n is the number of nodes in the tree. Every element of this array is a character vector equal to one of the class names in ClassNames. NodeError An n-element vector of the errors of the nodes in tree, where n is the number of nodes. NodeError(i) is the misclassification probability for node i. NodeProbability An n-element vector of the probabilities of the nodes in tree, where n is the number of nodes. The probability of a node is computed as the proportion of observations from the original data that satisfy the conditions for the node. This proportion is adjusted for any prior probabilities assigned to each class. NodeRisk An n-element vector of the risk of the nodes in the tree, where n is the number of nodes. The risk for each node is the measure of impurity (Gini index or deviance) for this node weighted by the node probability. If the tree is grown by twoing, the risk for each node is zero. NodeSize An n-element vector of the sizes of the nodes in tree, where n is the number of nodes. The size of a node is defined as the number of observations from the data used to create the tree that satisfy the conditions for the node. 35-812
NumNodes The number of nodes in tree. Parent An n-element vector containing the number of the parent node for each node in tree, where n is the number of nodes. The parent of the root node is 0. PredictorNames Cell array of character vectors containing the predictor names, in the order which they appear in X. Prior Numeric vector of prior probabilities for each class. The order of the elements of Prior corresponds to the order of the classes in ClassNames. The number of elements of Prior is the number of unique classes in the response. This property is read-only. PruneAlpha Numeric vector with one element per pruning level. If the pruning level ranges from 0 to M, then PruneAlpha has M + 1 elements sorted in ascending order. PruneAlpha(1) is for pruning level 0 (no pruning), PruneAlpha(2) is for pruning level 1, and so on. PruneList An n-element numeric vector with the pruning levels in each node of tree, where n is the number of nodes. The pruning levels range from 0 (no pruning) to M, where M is the distance between the deepest leaf and the root node. ResponseName A character vector that specifies the name of the response variable (Y). RowsUsed An n-element logical vector indicating which rows of the original predictor data (X) were used in fitting. If the software uses all rows of X, then RowsUsed is an empty array ([]). ScoreTransform Function handle for transforming predicted classification scores, or character vector representing a built-in transformation function. none means no transformation, or @(x)x. To change the score transformation function to, for example, function, use dot notation. • For available functions (see fitctree), enter Mdl.ScoreTransform = 'function';
• You can set a function handle for an available function, or a function you define yourself by entering tree.ScoreTransform = @function;
SurrogateCutCategories

An n-element cell array of the categories used for surrogate splits in tree, where n is the number of nodes in tree. For each node k, SurrogateCutCategories{k} is a cell array. The length of SurrogateCutCategories{k} is equal to the number of surrogate predictors found at this node. Every element of SurrogateCutCategories{k} is either an empty character vector for a continuous surrogate predictor, or is a two-element cell array with categories for a categorical surrogate predictor. The first element of this two-element cell array lists categories assigned to the left child by this surrogate split, and the second element of this two-element cell array lists categories assigned to the right child by this surrogate split. The order of the surrogate split variables at each node is matched to the order of variables in SurrogateCutPredictor. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogateCutCategories contains an empty cell.

SurrogateCutFlip

An n-element cell array of the numeric cut assignments used for surrogate splits in tree, where n is the number of nodes in tree. For each node k, SurrogateCutFlip{k} is a numeric vector. The length of SurrogateCutFlip{k} is equal to the number of surrogate predictors found at this node. Every element of SurrogateCutFlip{k} is either zero for a categorical surrogate predictor, or a numeric cut assignment for a continuous surrogate predictor. The numeric cut assignment can be either –1 or +1. For every surrogate split with a numeric cut C based on a continuous predictor variable Z, the left child is chosen if Z < C and the cut assignment for this surrogate split is +1, or if Z ≥ C and the cut assignment is –1.

Cluster Data

To open the task:
• In a code block in the live script, type a relevant keyword, such as clustering or kmeans. Select Cluster Data from the suggested command completions.
Examples
Cluster Data with Specified Number of Clusters By Using Live Editor Task This example shows how to use the Cluster Data task to interactively perform k-means clustering for a specified number of clusters. Load the sample data. The data contains length and width measurements from the sepals and petals of three species of iris flowers. load fisheriris
Open the Cluster Data task. To open the task, begin typing the keyword clustering in a code block and select Cluster Data from the suggested command completions.
Cluster the data into two clusters.
• Select the meas variable as the input data.
• Set the number of clusters to 2.
• In the Live Editor tab, press the Run button to run the task.
MATLAB displays the clustered data and the cluster means in a scatter plot.
Increase the number of clusters to 3 and rerun the task. MATLAB displays the updated clustered data and the cluster means in a scatter plot.
The task generates code in your live script. The generated code reflects the parameters and options that you select, and includes code to generate the scatter plot. To see the generated code, click the arrow at the bottom of the task parameter area. The task expands to display the generated code.
By default, the generated code uses clusterIndices and centroids as the name of the output variables returned to the MATLAB workspace. The clusterIndices vector is a numeric column vector containing the cluster indices. Each row in clusterIndices indicates the cluster assignment of the corresponding observation. The centroids matrix is a numeric matrix containing the cluster centroid locations. To specify a different output variable name, enter a new name in the summary line at the top of the task. For instance, change the two variable names to c_indices and c_locations.
When the task runs, the generated code is updated to reflect the new variable names. The new variables c_indices and c_locations appear in the MATLAB workspace.
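The generated code is roughly equivalent to calling kmeans and plotting the result yourself. The following is a minimal sketch of such code under assumed defaults; the exact code that the task generates can differ.

% Minimal sketch of code comparable to what the task generates
% (replicate counts, plot styling, and other details are assumptions).
load fisheriris
[clusterIndices,centroids] = kmeans(meas,2);        % cluster into two groups
[~,scores] = pca(meas);                             % project data for a 2-D view
gscatter(scores(:,1),scores(:,2),clusterIndices);   % plot clusters by index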
Evaluate the Optimal Number of Clusters By Using Live Editor Task This example shows how to use the Cluster Data task to interactively evaluate clustering solutions based on selected criteria. Load the sample data. The data contains length and width measurements from the sepals and petals of three species of iris flowers. load fisheriris
Open the Cluster Data task. To open the task, begin typing the keyword clustering in a code block and select Cluster Data from the suggested command completions. 35-891
Evaluate the optimal number of clusters.
• Select the meas variable as the input data.
• Set the number of clusters selection method to Optimal.
• Set the range min and max to 2 and 6.
• In the Live Editor tab, press the Run button to run the task.
MATLAB displays a bar chart with evaluation results, indicating that, based on the Calinski-Harabasz criterion, the optimal number of clusters is 3. A scatter plot shows the clustered data and the cluster means using the optimal number of clusters, 3. Your results may differ.
• “Add Interactive Tasks to a Live Script”
Parameters

Input data — Data to cluster
numeric matrix

Specify the data to cluster by selecting a variable from the available workspace variables. The variable must be a numeric matrix to appear in the list.

Selection Method — Cluster selection method
Manual (default) | Optimal

Specify the method for determining the optimal number of clusters for your data.
• Manual — Specify the number of clusters to group your data into manually.
• Optimal — Use the evalclusters function to find the optimal number of clusters based on criteria such as gap values, silhouette values, Davies-Bouldin index values, and Calinski-Harabasz index values (see the sketch following this section).

Range — List of number of clusters to evaluate
2:5 (default) | min and max positive integer values

Specify the list of number of clusters to evaluate as a range consisting of a min value and a max value. For example, if you specify a min value of 2 and a max value of 6, the task evaluates the number of clusters 2, 3, 4, 5, and 6 to determine the optimal number.

Plots to show — Plots to show results
check boxes

To display the clustered data, select from the available options:
• Select 2D scatter plot (PCA) to display the principal components of the clustered data in a 2D scatter plot. The Cluster Data task uses the gscatter function to create the scatter plot.
• Select Matrix of scatter plots to display the clustered data in a matrix of scatter plots. When you select Matrix of scatter plots, a list appears to the right of the check box. Each item in the list represents a column in the specified input data. Press the Ctrl key and select a maximum of four input data columns from the list. The Cluster Data task uses the pca and gplotmatrix functions to create the matrix of scatter plots from the selected columns. The scatter plots in the matrix compare the selected input data columns across cluster indices. The diagonal plots in the matrix are histograms showing the distribution of the selected columns for each cluster index.
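Programmatically, the Optimal selection method corresponds to calling evalclusters over the specified range. A minimal sketch, with the criterion and range chosen only for illustration:

% Minimal sketch: evaluate 2 through 6 clusters with the Calinski-Harabasz
% criterion, then cluster the data with the optimal number of clusters.
load fisheriris
eva = evalclusters(meas,'kmeans','CalinskiHarabasz','KList',2:6);
k = eva.OptimalK;
clusterIndices = kmeans(meas,k);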
Tips • By default, the Cluster Data task does not automatically run when you modify the task parameters. To have the task run automatically after any change, select the autorun button at the top-right of the task. If your dataset is large, do not enable this option.
Version History Introduced in R2021b
See Also kmeans | evalclusters | gscatter | gplotmatrix Topics “k-Means Clustering” on page 17-33 “Add Interactive Tasks to a Live Script”
cmdscale Classical multidimensional scaling
Syntax Y = cmdscale(D) [Y,e] = cmdscale(D) [Y,e] = cmdscale(D,p)
Description Y = cmdscale(D) takes an n-by-n distance matrix D, and returns an n-by-p configuration matrix Y. Rows of Y are the coordinates of n points in p-dimensional space for some p < n. When D is a Euclidean distance matrix, the distances between those points are given by D. p is the dimension of the smallest space in which the n points whose inter-point distances are given by D can be embedded. [Y,e] = cmdscale(D) also returns the eigenvalues of Y*Y'. When D is Euclidean, the first p elements of e are positive, the rest zero. If the first k elements of e are much larger than the remaining (n-k), then you can use the first k columns of Y as k-dimensional points whose inter-point distances approximate D. This can provide a useful dimension reduction for visualization, e.g., for k = 2. D need not be a Euclidean distance matrix. If it is non-Euclidean or a more general dissimilarity matrix, then some elements of e are negative, and cmdscale chooses p as the number of positive eigenvalues. In this case, the reduction to p or fewer dimensions provides a reasonable approximation to D only if the negative elements of e are small in magnitude. [Y,e] = cmdscale(D,p) also accepts a positive integer p between 1 and n. p specifies the dimensionality of the desired embedding Y. If a p dimensional embedding is possible, then Y will be of size n-by-p and e will be of size p-by-1. If only a q dimensional embedding with q < p is possible, then Y will be of size n-by-q and e will be of size p-by-1. Specifying p may reduce the computational burden when n is very large. You can specify D as either a full dissimilarity matrix, or in upper triangle vector form such as is output by pdist. A full dissimilarity matrix must be real and symmetric, and have zeros along the diagonal and positive elements everywhere else. A dissimilarity matrix in upper triangle form must have real, positive entries. You can also specify D as a full similarity matrix, with ones along the diagonal and all other elements less than one. cmdscale transforms a similarity matrix to a dissimilarity matrix in such a way that distances between the points returned in Y equal or approximate sqrt(1-D). To use a different transformation, you must transform the similarities prior to calling cmdscale.
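For example, a minimal sketch that passes a pdist dissimilarity vector directly to cmdscale and requests a two-dimensional embedding; the data is illustrative.

% Minimal sketch: embed points in two dimensions from pairwise distances.
X = rand(20,3);           % illustrative data
D = pdist(X);             % dissimilarities in upper triangle vector form
[Y,e] = cmdscale(D,2);    % 2-D configuration and its eigenvalues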
Examples Construct a Map Using Multidimensional Scaling This example shows how to construct a map of 10 US cities based on the distances between those cities, using cmdscale. 35-896
First, create the distance matrix and pass it to cmdscale. In this example, D is a full distance matrix: it is square and symmetric, has positive entries off the diagonal, and has zeros on the diagonal. cities = ... {'Atl','Chi','Den','Hou','LA','Mia','NYC','SF','Sea','WDC'}; D = [ 0 587 1212 701 1936 604 748 2139 2182 543; 587 0 920 940 1745 1188 713 1858 1737 597; 1212 920 0 879 831 1726 1631 949 1021 1494; 701 940 879 0 1374 968 1420 1645 1891 1220; 1936 1745 831 1374 0 2339 2451 347 959 2300; 604 1188 1726 968 2339 0 1092 2594 2734 923; 748 713 1631 1420 2451 1092 0 2571 2408 205; 2139 1858 949 1645 347 2594 2571 0 678 2442; 2182 1737 1021 1891 959 2734 2408 678 0 2329; 543 597 1494 1220 2300 923 205 2442 2329 0]; [Y,eigvals] = cmdscale(D);
Next, look at the eigenvalues returned by cmdscale. Some of these are negative, indicating that the original distances are not Euclidean. This is because of the curvature of the earth.

format short g
[eigvals eigvals/max(abs(eigvals))]

ans = 10×2

   9.5821e+06            1
   1.6868e+06      0.17604
       8157.3    0.0008513
       1432.9   0.00014954
       508.67   5.3085e-05
       25.143    2.624e-06
   5.7999e-10   6.0528e-17
       -897.7  -9.3685e-05
      -5467.6   -0.0005706
       -35479   -0.0037026
However, in this case, the two largest positive eigenvalues are much larger in magnitude than the remaining eigenvalues. So, despite the negative eigenvalues, the first two coordinates of Y are sufficient for a reasonable reproduction of D. Dtriu = D(find(tril(ones(10),-1)))'; maxrelerr = max(abs(Dtriu-pdist(Y(:,1:2))))./max(Dtriu) maxrelerr = 0.0075371
Here is a plot of the reconstructed city locations as a map. The orientation of the reconstruction is arbitrary. plot(Y(:,1),Y(:,2),'.') text(Y(:,1)+25,Y(:,2),cities) xlabel('Miles') ylabel('Miles')
Evaluate Reconstructions Using Different Distance Metrics

Determine how the quality of reconstruction varies when you reduce points to distances using different metrics. Generate ten points in 4-D space that are close to 3-D space. Take a linear transformation of the points so that their transformed values are close to a 3-D subspace that does not align with the coordinate axes.

rng default % Set the seed for reproducibility
A = [normrnd(0,1,10,3) normrnd(0,0.1,10,1)];
B = randn(4,4);
X = A*B;
Reduce the points in X to distances by using the Euclidean metric. Find a configuration Y with the inter-point distances. D = pdist(X,'euclidean'); Y = cmdscale(D);
Compare the quality of the reconstructions when using 2, 3, or 4 dimensions. The small maxerr3 value indicates that the first 3 dimensions provide a good reconstruction. maxerr2 = max(abs(pdist(X)-pdist(Y(:,1:2))))
maxerr2 = 0.1631 maxerr3 = max(abs(pdist(X)-pdist(Y(:,1:3)))) maxerr3 = 0.0187 maxerr4 = max(abs(pdist(X)-pdist(Y))) maxerr4 = 1.2434e-14
Reduce the points in X to distances by using the 'cityblock' metric. Find a configuration Y with the inter-point distances. D = pdist(X,'cityblock'); [Y,e] = cmdscale(D);
Evaluate the quality of the reconstruction. e contains at least one negative element of large magnitude, which might account for the poor quality of the reconstruction. maxerr = max(abs(pdist(X)-pdist(Y))) maxerr = 9.0488 min(e) ans = -5.6586
Version History Introduced before R2006a
References [1] Seber, G. A. F. Multivariate Observations. Hoboken, NJ: John Wiley & Sons, Inc., 1984.
See Also mdscale | pdist | procrustes
coefci Confidence interval for Cox proportional hazards model coefficients
Syntax ci = coefci(coxMdl) ci = coefci(coxMdl,level)
Description ci = coefci(coxMdl) returns a 95% confidence interval for the coefficients of a trained Cox proportional hazards model. ci = coefci(coxMdl,level) returns a 100(1 – level)% confidence interval for the coefficients.
Examples Cox Model Confidence Interval Perform a Cox proportional hazards regression on the lightbulb data set, which contains simulated lifetimes of light bulbs. The first column of the light bulb data contains the lifetime (in hours) of two different types of bulbs. The second column contains a binary variable indicating whether the bulb is fluorescent or incandescent; 0 indicates the bulb is fluorescent, and 1 indicates it is incandescent. The third column contains the censoring information, where 0 indicates the bulb was observed until failure, and 1 indicates the observation was censored. Fit a Cox proportional hazards model for the lifetime of the light bulbs, accounting for censoring. The predictor variable is the type of bulb. load lightbulb coxMdl = fitcox(lightbulb(:,2),lightbulb(:,1), ... 'Censoring',lightbulb(:,3)) coxMdl = Cox Proportional Hazards regression model
          Beta       SE       zStat      pValue
         ______    ______    ______    __________

    X1   4.7262    1.0372    4.5568    5.1936e-06

Log-likelihood: -212.638
Find a 95% confidence interval for the returned Beta estimate. ci = coefci(coxMdl) ci = 1×2
    2.6934    6.7590
Find a 99% confidence interval for the Beta estimate.

ci99 = coefci(coxMdl,0.01)

ci99 = 1×2

    2.0546    7.3978
Confidence Intervals for Multiple Predictors Find confidence intervals for predictors of the readmissiontimes data set. The response variable is ReadmissionTime, which shows the readmission times for 100 patients. The predictor variables are Age, Sex, Weight, and Smoker, the smoking status of each patient. A 1 indicates the patient is a smoker, and a 0 indicates the patient does not smoke. The column vector Censored contains the censorship information for each patient, where 1 indicates censored data, and 0 indicates the exact readmission times are observed. (This data is simulated.) Load the data. load readmissiontimes
Use all four predictors for fitting a model. X = [Age Sex Weight Smoker];
Fit the model using the censoring information. coxMdl = fitcox(X,ReadmissionTime,'censoring',Censored);
View the point estimates for the Age, Sex, Weight, and Smoker coefficients. coxMdl.Coefficients.Beta ans = 4×1 0.0184 -0.0676 0.0343 0.8172
Find 95% confidence intervals for these estimates.

ci = coefci(coxMdl)

ci = 4×2

   -0.0139    0.0506
   -1.6488    1.5136
    0.0042    0.0644
    0.2767    1.3576
The Sex coefficient (second row) has a large confidence interval, and the first two coefficients bracket the value 0. Therefore, you cannot reject the hypothesis that the Age and Sex predictors are zero.
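One way to read such intervals programmatically is to flag the coefficients whose interval excludes zero. A minimal sketch, assuming the coxMdl fit from this example:

% Minimal sketch: flag coefficients whose 95% confidence interval excludes zero.
ci = coefci(coxMdl);
excludesZero = ci(:,1) > 0 | ci(:,2) < 0   % one logical flag per predictor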
Input Arguments coxMdl — Fitted Cox proportional hazards model CoxModel object Fitted Cox proportional hazards model, specified as a CoxModel object. Create coxMdl using fitcox. level — Level of significance for confidence interval 0.05 (default) | positive number less than 1 Level of significance for the confidence interval, specified as a positive number less than 1. The resulting percentage is 100(1 – level)%. For example, for a 99% confidence interval, specify level as 0.01. Example: 0.01 Data Types: double
Output Arguments ci — Confidence interval real two-column matrix Confidence interval, returned as a real two-column matrix. Each row of the matrix is a confidence interval for the corresponding predictor. The probability that the true predictor coefficient lies in its confidence interval is 100(1 – level)%. For example, the default value of level is 0.05, so with no level specified, the probability that each predictor lies in its row of ci is 95%.
Version History Introduced in R2021a
See Also CoxModel | linhyptest | fitcox
coefCI Package: Confidence intervals of coefficient estimates of generalized linear regression model
Syntax ci = coefCI(mdl) ci = coefCI(mdl,alpha)
Description ci = coefCI(mdl) returns 95% confidence intervals for the coefficients in mdl. ci = coefCI(mdl,alpha) returns confidence intervals using the confidence level 1 – alpha.
Examples Find Confidence Intervals for Model Coefficients Find the confidence intervals for the coefficients of a fitted generalized linear regression model. Generate sample data using Poisson random numbers with two underlying predictors X(:,1) and X(:,2). rng('default') % For reproducibility rndvars = randn(100,2); X = [2 + rndvars(:,1),rndvars(:,2)]; mu = exp(1 + X*[1;2]); y = poissrnd(mu);
Create a generalized linear regression model of Poisson data.

mdl = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson')

mdl =
Generalized linear regression model:
    log(y) ~ 1 + x1 + x2
    Distribution = Poisson

Estimated Coefficients:
                   Estimate        SE        tStat    pValue
                   ________    _________    ______    ______

    (Intercept)      1.0405     0.022122    47.034      0
    x1               0.9968     0.003362    296.49      0
    x2                1.987    0.0063433    313.24      0

100 observations, 97 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 2.95e+05, p-value = 0
Find 95% (default) confidence intervals for the coefficients of the model.

ci = coefCI(mdl)

ci = 3×2

    0.9966    1.0844
    0.9901    1.0035
    1.9744    1.9996
Find 99% confidence intervals for the coefficients.

alpha = 0.01;
ci = coefCI(mdl,alpha)

ci = 3×2

    0.9824    1.0986
    0.9880    1.0056
    1.9703    2.0036
Input Arguments mdl — Generalized linear regression model GeneralizedLinearModel object | CompactGeneralizedLinearModel object Generalized linear regression model, specified as a GeneralizedLinearModel object created using fitglm or stepwiseglm, or a CompactGeneralizedLinearModel object created using compact. alpha — Significance level 0.05 (default) | numeric value in the range [0,1] Significance level for the confidence interval, specified as a numeric value in the range [0,1]. The confidence level of ci is equal to 100(1 – alpha)%. alpha is the probability that the confidence interval does not contain the true value. Example: 0.01 Data Types: single | double
Output Arguments ci — Confidence intervals numeric matrix Confidence intervals, returned as a k-by-2 numeric matrix, where k is the number of coefficients. The jth row of ci is the confidence interval of the jth coefficient of mdl. The name of coefficient j is stored in the CoefficientNames property of mdl. Data Types: single | double
More About

Confidence Interval

The coefficient confidence intervals provide a measure of precision for regression coefficient estimates. A 100(1 – α)% confidence interval gives the range for the corresponding regression coefficient with 100(1 – α)% confidence, meaning that 100(1 – α)% of the intervals resulting from repeated experimentation will contain the true value of the coefficient.

The software finds confidence intervals using the Wald method. The 100(1 – α)% confidence intervals for regression coefficients are

    bi ± t(1 – α/2, n – p) SE(bi),

where bi is the coefficient estimate, SE(bi) is the standard error of the coefficient estimate, and t(1 – α/2, n – p) is the 100(1 – α/2) percentile of the t-distribution with n – p degrees of freedom. n is the number of observations and p is the number of regression coefficients.
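You can reproduce these Wald intervals directly from the coefficient table. A minimal sketch, assuming the fitted model mdl from the example above:

% Minimal sketch: compute the Wald confidence intervals from the estimates,
% standard errors, and error degrees of freedom of the fitted model.
alpha = 0.05;
b  = mdl.Coefficients.Estimate;
se = mdl.Coefficients.SE;
t  = tinv(1 - alpha/2, mdl.DFE);
ciManual = [b - t*se, b + t*se];   % matches coefCI(mdl,alpha)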
Version History Introduced in R2012a
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also GeneralizedLinearModel | CompactGeneralizedLinearModel | coefTest | devianceTest Topics “Generalized Linear Model Workflow” on page 12-28
coefCI Class: GeneralizedLinearMixedModel Confidence intervals for coefficients of generalized linear mixed-effects model
Syntax feCI = coefCI(glme) feCI = coefCI(glme,Name,Value) [feCI,reCI] = coefCI( ___ )
Description feCI = coefCI(glme) returns the 95% confidence intervals for the fixed-effects coefficients in the generalized linear mixed-effects model glme. feCI = coefCI(glme,Name,Value) returns the confidence intervals using additional options specified by one or more Name,Value pair arguments. For example, you can specify a different confidence level or the method used to compute the approximate degrees of freedom. [feCI,reCI] = coefCI( ___ ) also returns the confidence intervals for the random-effects coefficients using any of the previous syntaxes.
Input Arguments glme — Generalized linear mixed-effects model GeneralizedLinearMixedModel object Generalized linear mixed-effects model, specified as a GeneralizedLinearMixedModel object. For properties and methods of this object, see GeneralizedLinearMixedModel. Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Alpha — Significance level 0.05 (default) | scalar value in the range [0,1] Significance level, specified as the comma-separated pair consisting of 'Alpha' and a scalar value in the range [0,1]. For a value α, the confidence level is 100 × (1 – α)%. For example, for 99% confidence intervals, you can specify the confidence level as follows. Example: 'Alpha',0.01 Data Types: single | double 35-906
DFMethod — Method for computing approximate degrees of freedom
'residual' (default) | 'none'

Method for computing approximate degrees of freedom, specified as the comma-separated pair consisting of 'DFMethod' and one of the following.

Value          Description
'residual'     The degrees of freedom value is assumed to be constant and equal to n – p, where n is the number of observations and p is the number of fixed effects.
'none'         The degrees of freedom is set to infinity.

Example: 'DFMethod','none'
Output Arguments feCI — Fixed-effects confidence intervals p-by-2 matrix Fixed-effects confidence intervals, returned as a p-by-2 matrix. feCI contains the confidence limits that correspond to the p-by-1 fixed-effects vector returned by the fixedEffects method. The first column of feCI contains the lower confidence limits and the second column contains the upper confidence limits. When fitting a GLME model using fitglme and one of the maximum likelihood fit methods ('Laplace' or 'ApproximateLaplace'): • If you specify the 'CovarianceMethod' name-value pair argument as 'conditional', then the confidence intervals are conditional on the estimated covariance parameters. • If you specify the 'CovarianceMethod' name-value pair argument as 'JointHessian', then the confidence intervals account for the uncertainty in the estimated covariance parameters. When fitting a GLME model using fitglme and one of the pseudo likelihood fit methods ('MPL' or 'REMPL'), coefci uses the fitted linear mixed effects model from the final pseudo likelihood iteration to compute confidence intervals on the fixed effects. reCI — Random-effects confidence intervals q-by-2 matrix Random-effects confidence intervals, returned as a q-by-2 matrix. reCI contains the confidence limits corresponding to the q-by-1 random-effects vector B returned by the randomEffects method. The first column of reCI contains the lower confidence limits, and the second column contains the upper confidence limits. When fitting a GLME model using fitglme and one of the maximum likelihood fit methods ('Laplace' or 'ApproximateLaplace'), coefCI computes the confidence intervals using the conditional mean squared error of prediction (CMSEP) approach conditional on the estimated covariance parameters and the observed response. Alternatively, you can interpret the confidence intervals from coefCI as approximate Bayesian credible intervals conditional on the estimated covariance parameters and the observed response. 35-907
35
Functions
When fitting a GLME model using fitglme and one of the pseudo likelihood fit methods ('MPL' or 'REMPL'), coefci uses the fitted linear mixed effects model from the final pseudo likelihood iteration to compute confidence intervals on the random effects.
Examples 95% Confidence Intervals for Fixed Effects Load the sample data. load mfr
This simulated data is from a manufacturing company that operates 50 factories across the world, with each factory running a batch process to create a finished product. The company wants to decrease the number of defects in each batch, so it developed a new manufacturing process. To test the effectiveness of the new process, the company selected 20 of its factories at random to participate in an experiment: Ten factories implemented the new process, while the other ten continued to run the old process. In each of the 20 factories, the company ran five batches (for a total of 100 batches) and recorded the following data: • Flag to indicate whether the batch used the new process (newprocess) • Processing time for each batch, in hours (time) • Temperature of the batch, in degrees Celsius (temp) • Categorical variable indicating the supplier (A, B, or C) of the chemical used in the batch (supplier) • Number of defects in the batch (defects) The data also includes time_dev and temp_dev, which represent the absolute deviation of time and temperature, respectively, from the process standard of 3 hours at 20 degrees Celsius. Fit a generalized linear mixed-effects model using newprocess, time_dev, temp_dev, and supplier as fixed-effects predictors. Include a random-effects term for intercept grouped by factory, to account for quality differences that might exist due to factory-specific variations. The response variable defects has a Poisson distribution, and the appropriate link function for this model is log. Use the Laplace fit method to estimate the coefficients. Specify the dummy variable encoding as 'effects', so the dummy variable coefficients sum to 0. The number of defects can be modeled using a Poisson distribution defectsi j ∼ Poisson(μi j) This corresponds to the generalized linear mixed-effects model log(μi j) = β0 + β1newprocessi j + β2time_devi j + β3temp_devi j + β4supplier_Ci j + β5supplier_Bi j + bi, where • defectsi j is the number of defects observed in the batch produced by factory i during batch j. • μi j is the mean number of defects corresponding to factory i (where i = 1, 2, . . . , 20) during batch j (where j = 1, 2, . . . , 5). 35-908
coefCI
• newprocessi j, time_devi j, and temp_devi j are the measurements for each variable that correspond to factory i during batch j. For example, newprocessi j indicates whether the batch produced by factory i during batch j used the new process. • supplier_Ci j and supplier_Bi j are dummy variables that use effects (sum-to-zero) coding to indicate whether company C or B, respectively, supplied the process chemicals for the batch produced by factory i during batch j. • b ∼ N(0, σ2) is a random-effects intercept for each factory i that accounts for factory-specific i b variation in quality.
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)','Dis
Use fixedEffects to display the estimates and names of the fixed-effects coefficients in glme. [beta,betanames] = fixedEffects(glme) beta = 6×1 1.4689 -0.3677 -0.0945 -0.2832 -0.0719 0.0711 betanames=6×1 table Name _______________ {'(Intercept)'} {'newprocess' } {'time_dev' } {'temp_dev' } {'supplier_C' } {'supplier_B' }
Each row of beta contains the estimated value for the coefficient named in the corresponding row of betanames. For example, the value –0.0945 in row 3 of beta is the estimated coefficient for the predictor variable time_dev. Compute the 95% confidence intervals for the fixed-effects coefficients. feCI = coefCI(glme) feCI = 6×2 1.1515 -0.7202 -1.7395 -2.1926 -0.2268 -0.0826
1.7864 -0.0151 1.5505 1.6263 0.0831 0.2247
Column 1 of feCI contains the lower bound of the 95% confidence interval. Column 2 contains the upper bound. Row 1 corresponds to the intercept term. Rows 2, 3, and 4 correspond to newprocess, 35-909
35
Functions
time_dev, and temp_dev, respectively. Rows 5 and 6 correspond to the indicator variables supplier_C and supplier_B, respectively. For example, the 95% confidence interval for the coefficient for time_dev is [-1.7395 , 1.5505]. Some of the confidence intervals include 0, which indicates that those predictors are not significant at the 5% significance level. To obtain specific pvalues for each fixed-effects term, use fixedEffects. To test significance for entire terms, use anova.
99% Confidence Intervals for Random Effects Load the sample data. load mfr
This simulated data is from a manufacturing company that operates 50 factories across the world, with each factory running a batch process to create a finished product. The company wants to decrease the number of defects in each batch, so it developed a new manufacturing process. To test the effectiveness of the new process, the company selected 20 of its factories at random to participate in an experiment: Ten factories implemented the new process, while the other ten continued to run the old process. In each of the 20 factories, the company ran five batches (for a total of 100 batches) and recorded the following data: • Flag to indicate whether the batch used the new process (newprocess) • Processing time for each batch, in hours (time) • Temperature of the batch, in degrees Celsius (temp) • Categorical variable indicating the supplier (A, B, or C) of the chemical used in the batch (supplier) • Number of defects in the batch (defects) The data also includes time_dev and temp_dev, which represent the absolute deviation of time and temperature, respectively, from the process standard of 3 hours at 20 degrees Celsius. Fit a generalized linear mixed-effects model using newprocess, time_dev, temp_dev, and supplier as fixed-effects predictors. Include a random-effects intercept grouped by factory, to account for quality differences that might exist due to factory-specific variations. The response variable defects has a Poisson distribution, and the appropriate link function for this model is log. Use the Laplace fit method to estimate the coefficients. The number of defects can be modeled using a Poisson distribution defectsi j ∼ Poisson(μi j) This corresponds to the generalized linear mixed-effects model log(μi j) = β0 + β1newprocessi j + β2time_devi j + β3temp_devi j + β4supplier_Ci j + β5supplier_Bi j + bi, where • defectsi j is the number of defects observed in the batch produced by factory i during batch j. • μi j is the mean number of defects corresponding to factory i (where i = 1, 2, . . . , 20) during batch j (where j = 1, 2, . . . , 5). 35-910
coefCI
• newprocessi j, time_devi j, and temp_devi j are the measurements for each variable that correspond to factory i during batch j. For example, newprocessi j indicates whether the batch produced by factory i during batch j used the new process. • supplier_Ci j and supplier_Bi j are dummy variables that use effects (sum-to-zero) coding to indicate whether company C or B, respectively, supplied the process chemicals for the batch produced by factory i during batch j. • b ∼ N(0, σ2) is a random-effects intercept for each factory i that accounts for factory-specific i b variation in quality.
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)','Dis
Use randomEffects to compute and display the estimates of the empirical Bayes predictors (EBPs) for the random effects associated with factory. [B,Bnames] = randomEffects(glme) B = 20×1 0.2913 0.1542 -0.2633 -0.4257 0.5453 -0.1069 0.3040 -0.1653 -0.1458 -0.0816 ⋮ Bnames=20×3 table Group ___________ {'factory'} {'factory'} {'factory'} {'factory'} {'factory'} {'factory'} {'factory'} {'factory'} {'factory'} {'factory'} {'factory'} {'factory'} {'factory'} {'factory'} {'factory'} {'factory'} ⋮
Level ______
Name _______________
{'1' } {'2' } {'3' } {'4' } {'5' } {'6' } {'7' } {'8' } {'9' } {'10'} {'11'} {'12'} {'13'} {'14'} {'15'} {'16'}
{'(Intercept)'} {'(Intercept)'} {'(Intercept)'} {'(Intercept)'} {'(Intercept)'} {'(Intercept)'} {'(Intercept)'} {'(Intercept)'} {'(Intercept)'} {'(Intercept)'} {'(Intercept)'} {'(Intercept)'} {'(Intercept)'} {'(Intercept)'} {'(Intercept)'} {'(Intercept)'}
35-911
35
Functions
Each row of B contains the estimated EBPs for the random-effects coefficient named in the corresponding row of Bnames. For example, the value -0.2633 in row 3 of B is the estimated coefficient of '(Intercept)' for level '3' of factory. Compute the 99% confidence intervals of the EBPs for the random effects. [feCI,reCI] = coefCI(glme,'Alpha',0.01); reCI reCI = 20×2 -0.2125 -0.3510 -0.8219 -0.9953 0.0730 -0.6362 -0.1796 -0.7044 -0.6795 -0.6142 ⋮
0.7951 0.6595 0.2954 0.1440 1.0176 0.4224 0.7877 0.3738 0.3880 0.4509
Column 1 of reCI contains the lower bound of the 99% confidence interval. Column 2 contains the upper bound. Each row corresponds to a level of factory, in the order shown in Bnames. For example, row 3 corresponds to the coefficient of '(Intercept)' for level '3' of factory, which has a 99% confidence interval of [-0.8219 , 0.2954]. For additional statistics related to each randomeffects term, use randomEffects.
References [1] Booth, J.G., and J.P. Hobert. “Standard Errors of Prediction in Generalized Linear Mixed Models.” Journal of the American Statistical Association. Vol. 93, 1998, pp. 262–272.
See Also GeneralizedLinearMixedModel | anova | coefTest | covarianceParameters | fixedEffects | randomEffects
35-912
coefCI
coefCI Package: Confidence intervals of coefficient estimates of linear regression model
Syntax ci = coefCI(mdl) ci = coefCI(mdl,alpha)
Description ci = coefCI(mdl) returns 95% confidence intervals for the coefficients in mdl. ci = coefCI(mdl,alpha) returns confidence intervals using the confidence level 1 – alpha.
Examples Find Confidence Intervals for Model Coefficients Fit a linear regression model and obtain the default 95% confidence intervals for the resulting model coefficients. Load the carbig data set and create a table in which the Origin predictor is categorical. load carbig Origin = categorical(cellstr(Origin)); tbl = table(Horsepower,Weight,MPG,Origin);
Fit a linear regression model. Specify Horsepower, Weight, and Origin as predictor variables, and specify MPG as the response variable. modelspec = 'MPG ~ 1 + Horsepower + Weight + Origin'; mdl = fitlm(tbl,modelspec);
View the names of the coefficients. mdl.CoefficientNames ans = 1x9 cell {'(Intercept)'}
{'Horsepower'}
{'Weight'}
{'Origin_France'}
{'Origin_Germany'}
Find confidence intervals for the coefficients of the model. ci = coefCI(mdl) ci = 9×2 43.3611 -0.0748
59.9390 -0.0315
35-913
35
Functions
-0.0059 -17.3623 -15.7503 -17.2091 -14.5106 -18.5820 -17.3114
-0.0037 -0.3477 0.7434 0.0613 1.8738 -1.5036 -0.9642
Specify Confidence Level Fit a linear regression model and obtain the confidence intervals for the resulting model coefficients using a specified confidence level. Load the carbig data set and create a table in which the Origin predictor is categorical. load carbig Origin = categorical(cellstr(Origin)); tbl = table(Horsepower,Weight,MPG,Origin);
Fit a linear regression model. Specify Horsepower, Weight, and Origin as predictor variables, and specify MPG as the response variable. modelspec = 'MPG ~ 1 + Horsepower + Weight + Origin'; mdl = fitlm(tbl,modelspec);
Find 99% confidence intervals for the coefficients. ci = coefCI(mdl,.01) ci = 9×2 40.7365 -0.0816 -0.0062 -20.0560 -18.3615 -19.9433 -17.1045 -21.2858 -19.8995
62.5635 -0.0246 -0.0034 2.3459 3.3546 2.7955 4.4676 1.2002 1.6238
The confidence intervals are wider than the default 95% confidence intervals in “Find Confidence Intervals for Model Coefficients” on page 35-913.
Input Arguments mdl — Linear regression model object LinearModel object | CompactLinearModel object Linear regression model object, specified as a LinearModel object created by using fitlm or stepwiselm, or a CompactLinearModel object created by using compact. 35-914
coefCI
alpha — Significance level 0.05 (default) | numeric value in the range [0,1] Significance level for the confidence interval, specified as a numeric value in the range [0,1]. The confidence level of ci is equal to 100(1 – alpha)%. alpha is the probability that the confidence interval does not contain the true value. Example: 0.01 Data Types: single | double
Output Arguments ci — Confidence intervals numeric matrix Confidence intervals, returned as a k-by-2 numeric matrix, where k is the number of coefficients. The jth row of ci is the confidence interval of the jth coefficient of mdl. The name of coefficient j is stored in the CoefficientNames property of mdl. Data Types: single | double
More About Confidence Interval The coefficient confidence intervals provide a measure of precision for regression coefficient estimates. A 100(1 – α)% confidence interval gives the range for the corresponding regression coefficient with 100(1 – α)% confidence, meaning that 100(1 – α)% of the intervals resulting from repeated experimentation will contain the true value of the coefficient. The software finds confidence intervals using the Wald method. The 100(1 – α)% confidence intervals for regression coefficients are bi ± t 1 − α/2, n − p SE bi , where bi is the coefficient estimate, SE(bi) is the standard error of the coefficient estimate, and t(1–α/ 2,n–p) is the 100(1 – α/2) percentile of the t-distribution with n – p degrees of freedom. n is the number of observations and p is the number of regression coefficients.
Version History Introduced in R2012a
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox). 35-915
See Also CompactLinearModel | LinearModel | anova | coefTest | dwtest Topics “Coefficient Standard Errors and Confidence Intervals” on page 11-60 “Interpret Linear Regression Results” on page 11-52 “Linear Regression Workflow” on page 11-35 “Linear Regression” on page 11-9
coefCI Confidence intervals for coefficients of linear mixed-effects model
Syntax feCI = coefCI(lme) feCI = coefCI(lme,Name,Value) [feCI,reCI] = coefCI( ___ )
Description feCI = coefCI(lme) returns the 95% confidence intervals for the fixed-effects coefficients in the linear mixed-effects model lme. feCI = coefCI(lme,Name,Value) returns the 95% confidence intervals for the fixed-effects coefficients in the linear mixed-effects model lme with additional options specified by one or more Name,Value pair arguments. For example, you can specify the confidence level or method to compute the degrees of freedom. [feCI,reCI] = coefCI( ___ ) also returns the 95% confidence intervals for the random-effects coefficients in the linear mixed-effects model lme.
Examples 95% Confidence Intervals for Fixed-Effects Coefficients Load the sample data. load('weight.mat')
weight contains data from a longitudinal study, where 20 subjects are randomly assigned to 4 exercise programs, and their weight loss is recorded over six 2-week time periods. This is simulated data. Store the data in a table. Define Subject and Program as categorical variables. tbl = table(InitialWeight, Program, Subject,Week, y); tbl.Subject = nominal(tbl.Subject); tbl.Program = nominal(tbl.Program);
Fit a linear mixed-effects model where the initial weight, type of program, week, and the interaction between the week and type of program are the fixed effects. The intercept and week vary by subject. lme = fitlme(tbl,'y ~ InitialWeight + Program*Week + (Week|Subject)');
Compute the fixed-effects coefficient estimates. fe = fixedEffects(lme)
35-917
35
Functions
fe = 9×1 0.6610 0.0032 0.3608 -0.0333 0.1132 0.1732 0.0388 0.0305 0.0331
The first estimate, 0.6610, corresponds to the constant term. The second row, 0.0032, and the third row, 0.3608, are estimates for the coefficient of initial weight and week, respectively. Rows four to six correspond to the indicator variables for programs B-D, and the last three rows correspond to the interaction of programs B-D and week. Compute the 95% confidence intervals for the fixed-effects coefficients. fecI = coefCI(lme) fecI = 9×2 0.1480 0.0005 0.1004 -0.2932 -0.1471 0.0395 -0.1503 -0.1585 -0.1559
1.1741 0.0059 0.6211 0.2267 0.3734 0.3069 0.2278 0.2196 0.2221
Some confidence intervals include 0. To obtain specific p-values for each fixed-effects term, use the fixedEffects method. To test for entire terms use the anova method.
Confidence Intervals with Specified Options Load the sample data. load carbig
Fit a linear mixed-effects model for miles per gallon (MPG), with fixed effects for acceleration and horsepower, and a potentially correlated random effect for intercept and acceleration grouped by model year. First, store the data in a table. tbl = table(Acceleration,Horsepower,Model_Year,MPG);
Fit the model. lme = fitlme(tbl, 'MPG ~ Acceleration + Horsepower + (Acceleration|Model_Year)');
Compute the fixed-effects coefficient estimates. 35-918
coefCI
fe = fixedEffects(lme) fe = 3×1 50.1325 -0.5833 -0.1695
Compute the 99% confidence intervals for fixed-effects coefficients using the residuals method to determine the degrees of freedom. This is the default method. feCI = coefCI(lme,'Alpha',0.01) feCI = 3×2 44.2690 -0.9300 -0.1883
55.9961 -0.2365 -0.1507
Compute the 99% confidence intervals for fixed-effects coefficients using the Satterthwaite approximation to compute the degrees of freedom. feCI = coefCI(lme,'Alpha',0.01,'DFMethod','satterthwaite') feCI = 3×2 44.0949 -0.9640 -0.1884
56.1701 -0.2025 -0.1507
The Satterthwaite approximation produces similar confidence intervals than the residual method.
Compute Confidence Intervals for Random Effects Load the sample data. load('shift.mat')
The data shows the deviations from the target quality characteristic measured from the products that five operators manufacture during three shifts: morning, evening, and night. This is a randomized block design, where the operators are the blocks. The experiment is designed to study the impact of the time of shift on the performance. The performance measure is the deviation of the quality characteristics from the target value. This is simulated data. Shift and Operator are nominal variables. shift.Shift = nominal(shift.Shift); shift.Operator = nominal(shift.Operator);
Fit a linear mixed-effects model with a random intercept grouped by operator to assess if there is significant difference in the performance according to the time of the shift. lme = fitlme(shift,'QCDev ~ Shift + (1|Operator)');
35-919
35
Functions
Compute the estimate of the BLUPs for random effects. randomEffects(lme) ans = 5×1 0.5775 1.1757 -2.1715 2.3655 -1.9472
Compute the 95% confidence intervals for random effects. [~,reCI] = coefCI(lme) reCI = 5×2 -1.3916 -0.7934 -4.1407 0.3964 -3.9164
2.5467 3.1449 -0.2024 4.3347 0.0219
Compute the 99% confidence intervals for random effects using the residuals method to determine the degrees of freedom. This is the default method. [~,reCI] = coefCI(lme,'Alpha',0.01) reCI = 5×2 -2.1831 -1.5849 -4.9322 -0.3951 -4.7079
3.3382 3.9364 0.5891 5.1261 0.8134
Compute the 99% confidence intervals for random effects using the Satterthwaite approximation to determine the degrees of freedom. [~,reCI] = coefCI(lme,'Alpha',0.01,'DFMethod','satterthwaite') reCI = 5×2 -2.6840 -2.0858 -5.4330 -0.8960 -5.2087
3.8390 4.4372 1.0900 5.6270 1.3142
The Satterthwaite approximation might produce smaller DF values than the residual method. That is why these confidence intervals are larger than the previous ones computed using the residual method.
Input Arguments lme — Linear mixed-effects model LinearMixedModel object Linear mixed-effects model, specified as a LinearMixedModel object constructed using fitlme or fitlmematrix. Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: [feCI,reCI] = coefCI(lme,'Alpha',0.01) Alpha — Significance level 0.05 (default) | scalar value in the range 0 to 1 Significance level, specified as the comma-separated pair consisting of 'Alpha' and a scalar value in the range 0 to 1. For a value α, the confidence level is 100*(1–α)%. For example, for 99% confidence intervals, you can specify the confidence level as follows. Example: 'Alpha',0.01 Data Types: single | double DFMethod — Method for computing approximate degrees of freedom 'residual' (default) | 'satterthwaite' | 'none' Method for computing approximate degrees of freedom for confidence interval computation, specified as the comma-separated pair consisting of 'DFMethod' and one of the following. 'residual'
Default. The degrees of freedom are assumed to be constant and equal to n – p, where n is the number of observations and p is the number of fixed effects.
'satterthwaite'
Satterthwaite approximation.
'none'
All degrees of freedom are set to infinity.
For example, you can specify the Satterthwaite approximation as follows. Example: 'DFMethod','satterthwaite'
Output Arguments feCI — Fixed-effects confidence intervals p-by-2 matrix Fixed-effects confidence intervals, returned as a p-by-2 matrix. feCI contains the confidence limits that correspond to the p fixed-effects estimates in the vector beta returned by the fixedEffects method. The first column of feCI has the lower confidence limits and the second column has the upper confidence limits. reCI — Random-effects confidence intervals q-by-2 matrix Random-effects confidence intervals, returned as a q-by-2 matrix. reCI contains the confidence limits corresponding to the q random-effects estimates in the vector B returned by the randomEffects method. The first column of reCI has the lower confidence limits and the second column has the upper confidence limits.
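As a quick check of how the rows line up, the following sketch pairs each estimate with its interval; it assumes lme is a fitted LinearMixedModel, as in the examples above:

[feCI,reCI] = coefCI(lme);
beta = fixedEffects(lme);    % p-by-1 fixed-effects estimates
B = randomEffects(lme);      % q-by-1 random-effects estimates (BLUPs)
disp([beta feCI])            % each row: estimate, lower limit, upper limit
disp([B reCI])               % same layout for the random effects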
Version History Introduced in R2013b
See Also LinearMixedModel | coefTest | fixedEffects | randomEffects
coefCI Confidence intervals for coefficient estimates of multinomial regression model
Syntax ci = coefCI(mdl) ci = coefCI(mdl,alpha)
Description ci = coefCI(mdl) returns 95% confidence intervals on page 35-927 for the coefficients in mdl. ci = coefCI(mdl,alpha) returns the confidence intervals using the confidence level 100(1 – alpha)%.
Examples Find Confidence Intervals for Model Coefficients Load the carbig data set. load carbig
The variables Horsepower, Weight, and Origin contain data for car horsepower, weight, and country of origin, respectively. The variable MPG contains car mileage data. Create a table in which the Origin and MPG variables are categorical. Origin = categorical(cellstr(Origin)); MPG = discretize(MPG,[9 19 29 39 48],"categorical"); tbl = table(Horsepower,Weight,Origin,MPG);
Fit a multinomial regression model. Specify Horsepower, Weight, and Origin as predictor variables, and specify MPG as the response variable. modelspec = "MPG ~ 1 + Horsepower + Weight + Origin"; mdl = fitmnr(tbl,modelspec);
Find the 95% confidence intervals for the coefficients. Display the coefficient names and confidence intervals in a table by using the array2table function. ci = coefCI(mdl); ciTable = array2table(ci, ... RowNames = mdl.Coefficients.Properties.RowNames, ... VariableNames = ["LowerLimit","UpperLimit"])

ciTable=27×2 table
                                LowerLimit     UpperLimit
                               ___________     __________
    (Intercept_[9, 19))            -89.395         32.927
    Horsepower_[9, 19)             0.14928        0.27499
    Weight_[9, 19)               0.0022537      0.0069061
    Origin_France_[9, 19)          -54.498         69.362
    Origin_Germany_[9, 19)         -62.237         59.666
    Origin_Italy_[9, 19)           -73.457          54.35
    Origin_Japan_[9, 19)           -62.743         59.097
    Origin_Sweden_[9, 19)          -60.076         63.853
    Origin_USA_[9, 19)             -59.875         61.926
    (Intercept_[19, 29))           -78.671         43.544
    Horsepower_[19, 29)            0.12131        0.24115
    Weight_[19, 29)            -0.00073846      0.0033281
    Origin_France_[19, 29)         -49.929         73.841
    Origin_Germany_[19, 29)        -57.315         64.476
    Origin_Italy_[19, 29)          -51.881         73.071
    Origin_Japan_[19, 29)           -58.22         63.559
    ⋮
Each row contains the lower and upper limits for the 95% confidence intervals.
Specify Confidence Level Load the carbig data set. load carbig
The variables Horsepower, Weight, and Origin contain data for car horsepower, weight, and country of origin. The variable MPG contains car mileage data. Create a table in which the Origin and MPG variables are categorical. Origin = categorical(cellstr(Origin)); MPG = discretize(MPG,[9 19 29 39 48],"categorical"); tbl = table(Horsepower,Weight,Origin,MPG);
Fit a multinomial regression model. Specify Horsepower, Weight, and Origin as predictor variables, and specify MPG as the response variable. modelspec = "MPG ~ 1 + Horsepower + Weight + Origin"; mdl = fitmnr(tbl,modelspec);
Find the 95% and 99% confidence intervals for the coefficients. Display the coefficient names and confidence intervals in a table by using the array2table function. ci95 = coefCI(mdl); ci99 = coefCI(mdl,0.01); confIntervals = array2table([ci95 ci99], ... RowNames=mdl.Coefficients.Properties.RowNames, ... VariableNames=["95LowerLimit","95UpperLimit", ... "99LowerLimit","99UpperLimit"])

confIntervals=27×4 table
                                95LowerLimit    95UpperLimit    99LowerLimit    99UpperLimit
                                ____________    ____________    ____________    ____________
    (Intercept_[9, 19))             -89.395          32.927         -108.66          52.194
    Horsepower_[9, 19)              0.14928         0.27499         0.12948         0.29478
    Weight_[9, 19)                0.0022537       0.0069061       0.0015209       0.0076389
    Origin_France_[9, 19)           -54.498          69.362         -74.007          88.871
    Origin_Germany_[9, 19)          -62.237          59.666         -81.438          78.868
    Origin_Italy_[9, 19)            -73.457           54.35         -93.588          74.481
    Origin_Japan_[9, 19)            -62.743          59.097         -81.935          78.288
    Origin_Sweden_[9, 19)           -60.076          63.853         -79.596          83.373
    Origin_USA_[9, 19)              -59.875          61.926          -79.06          81.111
    (Intercept_[19, 29))            -78.671          43.544         -97.921          62.794
    Horsepower_[19, 29)             0.12131         0.24115         0.10243         0.26003
    Weight_[19, 29)             -0.00073846       0.0033281       -0.001379       0.0039687
    Origin_France_[19, 29)          -49.929          73.841         -69.424          93.336
    Origin_Germany_[19, 29)         -57.315          64.476         -76.498          83.659
    Origin_Italy_[19, 29)           -51.881          73.071         -71.563          92.752
    Origin_Japan_[19, 29)            -58.22          63.559         -77.401           82.74
    ⋮
Each row contains the lower and upper limits for the 95% and 99% confidence intervals. Visualize the confidence intervals by plotting their limits with the coefficient values. ci95 = coefCI(mdl); ci99 = coefCI(mdl,0.01); colors = lines(3); hold on p = plot(mdl.Coefficients.Value,Color=colors(1,:)); plot(ci95(:,1),Color=colors(2,:),LineStyle="--") plot(ci95(:,2),Color=colors(2,:),LineStyle="--") plot(ci99(:,1),Color=colors(3,:),LineStyle="--") plot(ci99(:,2),Color=colors(3,:),LineStyle="--") hold off legend(["Coefficients","95% CI","","99% CI",""], ... Location="southeast")
The plot shows that the 99% confidence intervals for the coefficients are wider than the 95% confidence intervals.
Input Arguments mdl — Multinomial regression model object MultinomialRegression model object Multinomial regression model object, specified as a MultinomialRegression model object created with the fitmnr function. alpha — Significance level 0.05 (default) | numeric value in the range [0,1] Significance level for the confidence interval, specified as a numeric value in the range [0,1]. The confidence level of ci is equal to 100(1 – alpha)%. alpha is the probability that the confidence interval does not contain the true value. Example: 0.01 35-926
Data Types: single | double
Output Arguments ci — Confidence intervals numeric matrix Confidence intervals, returned as a p-by-2 numeric matrix, where p is the number of coefficients. The jth row of ci is the confidence interval for the jth coefficient of mdl. The name of coefficient j is stored in the CoefficientNames property of mdl.
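For example, a minimal sketch that pulls out the interval for one named coefficient; it assumes mdl is the multinomial model fitted in the examples above, and the coefficient name is taken from that example's output (substitute any name listed in mdl.CoefficientNames):

ci = coefCI(mdl);
idx = strcmp(mdl.CoefficientNames,"Horsepower_[9, 19)");  % locate the coefficient by name
ci(idx,:)                                                 % lower and upper limits for that coefficient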
More About Confidence Intervals The coefficient confidence intervals provide a measure of precision for regression coefficient estimates. A 100(1 – α)% confidence interval gives the range for the corresponding regression coefficient with 100(1 – α)% confidence, meaning that 100(1 – α)% of the intervals resulting from repeated experimentation will contain the true value of the coefficient. The software finds confidence intervals using the Wald method. The 100(1 – α)% confidence intervals for regression coefficients are

    b_i ± t_(1–α/2, n–p) SE(b_i),

where b_i is the coefficient estimate, SE(b_i) is the standard error of the coefficient estimate, and t_(1–α/2, n–p) is the 100(1 – α/2) percentile of the t-distribution with n – p degrees of freedom. n is the number of observations and p is the number of regression coefficients.
Version History Introduced in R2023a
See Also MultinomialRegression | fitmnr
coefCI Confidence intervals of coefficient estimates of nonlinear regression model
Syntax ci = coefCI(mdl) ci = coefCI(mdl,alpha)
Description ci = coefCI(mdl) returns 95% confidence intervals for the coefficients in mdl. ci = coefCI(mdl,alpha) returns confidence intervals with confidence level 1 - alpha.
Examples Default Confidence Intervals Create a nonlinear model for auto mileage based on the carbig data. Then obtain confidence intervals for the resulting model coefficients. Load the data and create a nonlinear model. load carbig ds = dataset(Horsepower,Weight,MPG); modelfun = @(b,x)b(1) + b(2)*x(:,1) + ... b(3)*x(:,2) + b(4)*x(:,1).*x(:,2); beta0 = [1 1 1 1]; mdl = fitnlm(ds,modelfun,beta0)

mdl =
Nonlinear regression model:
    MPG ~ b1 + b2*Horsepower + b3*Weight + b4*Horsepower*Weight

Estimated Coefficients:
           Estimate          SE         tStat       pValue  
          __________    __________    _______    __________
    b1        63.558        2.3429     27.127    1.2343e-91
    b2      -0.25084      0.027279    -9.1952    2.3226e-18
    b3     -0.010772    0.00077381    -13.921    5.1372e-36
    b4    5.3554e-05    6.6491e-06     8.0542    9.9336e-15
Number of observations: 392, Error degrees of freedom: 388 Root Mean Squared Error: 3.93 R-Squared: 0.748, Adjusted R-Squared 0.746 F-statistic vs. constant model: 385, p-value = 7.26e-116
All the coefficients have extremely small p-values. This means a confidence interval around the coefficients will not contain the point 0, unless the confidence level is very high.
Find 95% confidence intervals for the coefficients of the model. ci = coefCI(mdl)

ci = 4×2
   58.9515   68.1644
   -0.3045   -0.1972
   -0.0123   -0.0093
    0.0000    0.0001
The confidence interval for b4 seems to contain 0. Examine it in more detail. ci(4,:)

ans = 1×2
   10^(-4) ×
    0.4048    0.6663
As expected, the confidence interval does not contain the point 0.
Input Arguments mdl — Nonlinear regression model object NonLinearModel object Nonlinear regression model object, specified as a NonLinearModel object created by using fitnlm. alpha — Significance level 0.05 (default) | numeric value in the range [0,1] Significance level for the confidence interval, specified as a numeric value in the range [0,1]. The confidence level of ci is equal to 100(1 – alpha)%. alpha is the probability that the confidence interval does not contain the true value. Example: 0.01 Data Types: single | double
Output Arguments ci — Confidence intervals numeric matrix Confidence intervals, returned as a k-by-2 numeric matrix, where k is the number of coefficients. The jth row of ci is the confidence interval of the jth coefficient of mdl. The name of coefficient j is stored in the CoefficientNames property of mdl. Data Types: single | double
More About Confidence Interval The coefficient confidence intervals provide a measure of precision for regression coefficient estimates. A 100(1 – α)% confidence interval gives the range for the corresponding regression coefficient with 100(1 – α)% confidence, meaning that 100(1 – α)% of the intervals resulting from repeated experimentation will contain the true value of the coefficient. The software finds confidence intervals using the Wald method. The 100(1 – α)% confidence intervals for regression coefficients are

    b_i ± t_(1–α/2, n–p) SE(b_i),

where b_i is the coefficient estimate, SE(b_i) is the standard error of the coefficient estimate, and t_(1–α/2, n–p) is the 100(1 – α/2) percentile of the t-distribution with n – p degrees of freedom. n is the number of observations and p is the number of regression coefficients.
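As an illustration only (not the implementation used by coefCI), the interval can be reproduced from the model properties. This sketch assumes mdl is the NonLinearModel fitted earlier, and uses its Coefficients table and DFE property:

alpha = 0.05;
t = tinv(1 - alpha/2, mdl.DFE);       % t percentile with n - p degrees of freedom
est = mdl.Coefficients.Estimate;
se  = mdl.Coefficients.SE;
ciManual = [est - t*se, est + t*se]   % matches coefCI(mdl) up to rounding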
Version History Introduced in R2012a
See Also NonLinearModel Topics “Nonlinear Regression Workflow” on page 13-13 “Nonlinear Regression” on page 13-2
coefTest Package: Linear hypothesis test on generalized linear regression model coefficients
Syntax p = coefTest(mdl) p = coefTest(mdl,H) p = coefTest(mdl,H,C) [p,F] = coefTest( ___ ) [p,F,r] = coefTest( ___ )
Description p = coefTest(mdl) computes the p-value for an F test that all coefficient estimates in mdl, except the intercept term, are zero. p = coefTest(mdl,H) performs an F-test that H × B = 0, where B represents the coefficient vector. Use H to specify the coefficients to include in the F-test. p = coefTest(mdl,H,C) performs an F-test that H × B = C. [p,F] = coefTest( ___ ) also returns the F-test statistic F using any of the input argument combinations in previous syntaxes. [p,F,r] = coefTest( ___ ) also returns the numerator degrees of freedom r for the test.
Examples Test Significance of Generalized Linear Regression Model Fit a generalized linear regression model, and test the coefficients of the fitted model to see if they differ from zero. Generate sample data using Poisson random numbers with two underlying predictors X(:,1) and X(:,2). rng('default') % For reproducibility rndvars = randn(100,2); X = [2 + rndvars(:,1),rndvars(:,2)]; mu = exp(1 + X*[1;2]); y = poissrnd(mu);
Create a generalized linear regression model of Poisson data. mdl = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson')

mdl =
Generalized linear regression model:
    log(y) ~ 1 + x1 + x2
    Distribution = Poisson

Estimated Coefficients:
                   Estimate       SE        tStat     pValue
                   ________    _________    ______    ______
    (Intercept)      1.0405     0.022122    47.034      0
    x1               0.9968     0.003362    296.49      0
    x2                1.987    0.0063433    313.24      0

100 observations, 97 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 2.95e+05, p-value = 0
Test whether the fitted model has coefficients that differ significantly from zero. p = coefTest(mdl) p = 4.1131e-153
The small p-value indicates that the model fits significantly better than a degenerate model consisting of only an intercept term.
Test Significance of Generalized Linear Regression Model Coefficient Fit a generalized linear regression model, and test the significance of a specified coefficient in the fitted model. Generate sample data using Poisson random numbers with two underlying predictors X(:,1) and X(:,2). rng('default') % For reproducibility rndvars = randn(100,2); X = [2 + rndvars(:,1),rndvars(:,2)]; mu = exp(1 + X*[1;2]); y = poissrnd(mu);
Create a generalized linear regression model of Poisson data. mdl = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson')

mdl =
Generalized linear regression model:
    log(y) ~ 1 + x1 + x2
    Distribution = Poisson

Estimated Coefficients:
                   Estimate       SE        tStat     pValue
                   ________    _________    ______    ______
    (Intercept)      1.0405     0.022122    47.034      0
    x1               0.9968     0.003362    296.49      0
    x2                1.987    0.0063433    313.24      0

100 observations, 97 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 2.95e+05, p-value = 0
Test the significance of the x1 coefficient. According to the model display, x1 is the second predictor. Specify the coefficient by using a numeric index vector. p = coefTest(mdl,[0 1 0]) p = 2.8681e-145
The returned p-value indicates that x1 is statistically significant in the fitted model.
Input Arguments mdl — Generalized linear regression model GeneralizedLinearModel object | CompactGeneralizedLinearModel object Generalized linear regression model, specified as a GeneralizedLinearModel object created using fitglm or stepwiseglm, or a CompactGeneralizedLinearModel object created using compact. H — Hypothesis matrix numeric index matrix Hypothesis matrix, specified as a full-rank numeric index matrix of size r-by-s, where r is the number of linear combinations of coefficients being tested, and s is the total number of coefficients. • If you specify H, then the output p is the p-value for an F-test that H × B = 0, where B represents the coefficient vector. • If you specify H and C, then the output p is the p-value for an F-test that H × B = C. Example: [1 0 0 0 0] tests the first coefficient among five coefficients. Data Types: single | double C — Hypothesized value numeric vector Hypothesized value for testing the null hypothesis, specified as a numeric vector with the same number of rows as H. If you specify H and C, then the output p is the p-value for an F-test that H × B = C, where B represents the coefficient vector. Data Types: single | double
Output Arguments p — p-value for F-test numeric value in the range [0,1] p-value for the F-test, returned as a numeric value in the range [0,1].
F — Value of test statistic for F-test numeric value Value of the test statistic for the F-test, returned as a numeric value. r — Numerator degrees of freedom for F-test positive integer Numerator degrees of freedom for the F-test, returned as a positive integer. The F-statistic has r degrees of freedom in the numerator and mdl.DFE degrees of freedom in the denominator.
Algorithms The p-value, F-statistic, and numerator degrees of freedom are valid under these assumptions: • The data comes from a model represented by the formula in the Formula property of the fitted model. • The observations are independent, conditional on the predictor values. Under these assumptions, let β represent the (unknown) coefficient vector of the linear regression. Suppose H is a full-rank numeric index matrix of size r-by-s, where r is the number of linear combinations of coefficients being tested, and s is the total number of coefficients. Let c be a column vector with r rows. The following is a test statistic for the hypothesis that Hβ = c:

    F = (Hβ̂ − c)′ (HVH′)⁻¹ (Hβ̂ − c) / r.

Here β̂ is the estimate of the coefficient vector β, stored in the Coefficients property, and V is the estimated covariance of the coefficient estimates, stored in the CoefficientCovariance property. When the hypothesis is true, the test statistic F has an “F Distribution” on page B-46 with r and u degrees of freedom, where u is the degrees of freedom for error, stored in the DFE property.
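For illustration, a minimal sketch of this computation using the properties named above; it assumes mdl is the fitted model and that H and c specify the hypothesis being tested:

b = mdl.Coefficients.Estimate;            % coefficient estimates (beta-hat)
V = mdl.CoefficientCovariance;            % estimated covariance of the estimates
r = rank(H);                              % numerator degrees of freedom
F = (H*b - c)' / (H*V*H') * (H*b - c) / r;
p = 1 - fcdf(F, r, mdl.DFE);              % compare with coefTest(mdl,H,c)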
Alternative Functionality The values of commonly used test statistics are available in the Coefficients property of a fitted model.
Version History Introduced in R2012a
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also GeneralizedLinearModel | CompactGeneralizedLinearModel | linhyptest | coefCI | devianceTest
Topics “Generalized Linear Model Workflow” on page 12-28 “Generalized Linear Models” on page 12-9
coefTest Class: GeneralizedLinearMixedModel Hypothesis test on fixed and random effects of generalized linear mixed-effects model
Syntax pVal = coefTest(glme) pVal = coefTest(glme,H) pVal = coefTest(glme,H,C) pVal = coefTest(glme,H,C,Name,Value) [pVal,F,DF1,DF2] = coefTest( ___ )
Description pVal = coefTest(glme) returns the p-value of an F-test of the null hypothesis that all fixed-effects coefficients of the generalized linear mixed-effects model glme, except for the intercept, are equal to 0. pVal = coefTest(glme,H) returns the p-value of an F-test using a specified contrast matrix, H. The null hypothesis is H0: Hβ = 0, where β is the fixed-effects vector. pVal = coefTest(glme,H,C) returns the p-value for an F-test using the hypothesized value, C. The null hypothesis is H0: Hβ = C, where β is the fixed-effects vector. pVal = coefTest(glme,H,C,Name,Value) returns the p-value for an F-test on the fixed- and/or random-effects coefficients of the generalized linear mixed-effects model glme, with additional options specified by one or more name-value pair arguments. For example, you can specify the method to compute the approximate denominator degrees of freedom for the F-test. [pVal,F,DF1,DF2] = coefTest( ___ ) also returns the F-statistic, F, and the numerator and denominator degrees of freedom for F, respectively DF1 and DF2, using any of the previous syntaxes.
Input Arguments glme — Generalized linear mixed-effects model GeneralizedLinearMixedModel object Generalized linear mixed-effects model, specified as a GeneralizedLinearMixedModel object. For properties and methods of this object, see GeneralizedLinearMixedModel. H — Fixed-effects contrasts m-by-p matrix Fixed-effects contrasts, specified as an m-by-p matrix, where p is the number of fixed-effects coefficients in glme. Each row of H represents one contrast. The columns of H (left to right) correspond to the rows of the p-by-1 fixed-effects vector beta (top to bottom) whose estimate is returned by the fixedEffects method. Data Types: single | double 35-936
C — Hypothesized value m-by-1 vector Hypothesized value for testing the null hypothesis Hβ = C, specified as an m-by-1 vector. Here, β is the vector of fixed-effects whose estimate is returned by fixedEffects. Data Types: single | double Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. DFMethod — Method for computing approximate degrees of freedom 'residual' (default) | 'none' Method for computing approximate degrees of freedom, specified as the comma-separated pair consisting of 'DFMethod' and one of the following. Value
Description
'residual'
The degrees of freedom value is assumed to be constant and equal to n – p, where n is the number of observations and p is the number of fixed effects.
'none'
The degrees of freedom is set to infinity.
Example: 'DFMethod','none' REContrast — Random-effects contrasts m-by-q matrix Random-effects contrasts, specified as the comma-separated pair consisting of 'REContrast' and an m-by-q matrix, where q is the number of random effects parameters in glme. The columns of the matrix (left to right) correspond to the rows of the q-by-1 random-effects vector B (top to bottom), whose estimate is returned by the randomEffects method. Data Types: single | double
Output Arguments pVal — p-value scalar value p-value for the F-test on the fixed- and/or random-effects coefficients of the generalized linear mixedeffects model glme, returned as a scalar value. When fitting a GLME model using fitglme and one of the maximum likelihood fit methods ('Laplace' or 'ApproximateLaplace'), coefTest uses an approximation of the conditional mean squared error of prediction (CMSEP) of the estimated linear combination of fixed- and random-effects to compute p-values. This accounts for the uncertainty in the fixed-effects estimates, but not for the uncertainty in the covariance parameter estimates. For tests on fixed effects only, if you specify the 35-937
'CovarianceMethod' name-value pair argument in fitglme as 'JointHessian', then coefTest accounts for the uncertainty in the estimation of covariance parameters. When fitting a GLME model using fitglme and one of the pseudo likelihood fit methods ('MPL' or 'REMPL'), coefTest bases the inference on the fitted linear mixed effects model from the final pseudo likelihood iteration. F — F-statistic scalar value F-statistic, returned as a scalar value. DF1 — Numerator degrees of freedom for F scalar value Numerator degrees of freedom for the F-statistic F, returned as a scalar value. • If you test the null hypothesis H0: Hβ = 0 or H0: Hβ = C, then DF1 is equal to the number of linearly independent rows in H. • If you test the null hypothesis H0: Hβ + KB = C, then DF1 is equal to the number of linearly independent rows in [H,K]. DF2 — Denominator degrees of freedom for F scalar value Denominator degrees of freedom for the F-statistic F, returned as a scalar value. The value of DF2 depends on the option specified by the 'DFMethod' name-value pair argument.
Examples Test the Significance of Coefficients Load the sample data. load mfr
This simulated data is from a manufacturing company that operates 50 factories across the world, with each factory running a batch process to create a finished product. The company wants to decrease the number of defects in each batch, so it developed a new manufacturing process. To test the effectiveness of the new process, the company selected 20 of its factories at random to participate in an experiment: Ten factories implemented the new process, while the other ten continued to run the old process. In each of the 20 factories, the company ran five batches (for a total of 100 batches) and recorded the following data: • Flag to indicate whether the batch used the new process (newprocess) • Processing time for each batch, in hours (time) • Temperature of the batch, in degrees Celsius (temp) • Categorical variable indicating the supplier (A, B, or C) of the chemical used in the batch (supplier) • Number of defects in the batch (defects) 35-938
The data also includes time_dev and temp_dev, which represent the absolute deviation of time and temperature, respectively, from the process standard of 3 hours at 20 degrees Celsius. Fit a generalized linear mixed-effects model using newprocess, time_dev, temp_dev, and supplier as fixed-effects predictors. Include a random-effects intercept grouped by factory, to account for quality differences that might exist due to factory-specific variations. The response variable defects has a Poisson distribution, and the appropriate link function for this model is log. Use the Laplace fit method to estimate the coefficients. Specify the dummy variable encoding as 'effects', so the dummy variable coefficients sum to 0.

The number of defects can be modeled using a Poisson distribution

    defects_ij ~ Poisson(μ_ij)

This corresponds to the generalized linear mixed-effects model

    log(μ_ij) = β_0 + β_1 newprocess_ij + β_2 time_dev_ij + β_3 temp_dev_ij + β_4 supplier_C_ij + β_5 supplier_B_ij + b_i,

where
• defects_ij is the number of defects observed in the batch produced by factory i during batch j.
• μ_ij is the mean number of defects corresponding to factory i (where i = 1, 2, ..., 20) during batch j (where j = 1, 2, ..., 5).
• newprocess_ij, time_dev_ij, and temp_dev_ij are the measurements for each variable that correspond to factory i during batch j. For example, newprocess_ij indicates whether the batch produced by factory i during batch j used the new process.
• supplier_C_ij and supplier_B_ij are dummy variables that use effects (sum-to-zero) coding to indicate whether company C or B, respectively, supplied the process chemicals for the batch produced by factory i during batch j.
• b_i ~ N(0, σ_b²) is a random-effects intercept for each factory i that accounts for factory-specific variation in quality.
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)','Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');
Test if there is any significant difference between supplier C and supplier B. H = [0,0,0,0,1,-1]; [pVal,F,DF1,DF2] = coefTest(glme,H) pVal = 0.2793 F = 1.1842 DF1 = 1 DF2 = 94
The large p-value indicates that there is no significant difference between supplier C and supplier B at the 5% significance level. Here, coefTest also returns the F-statistic, the numerator degrees of freedom, and the approximate denominator degrees of freedom. Test if there is any significant difference between supplier A and supplier B. 35-939
If you specify the 'DummyVarCoding' name-value pair argument as 'effects' when fitting the model using fitglme, then βA + βB + βC = 0, where βA, βB, and βC correspond to suppliers A, B, and C, respectively. βA is the effect of A minus the average effect of A, B, and C. To determine the contrast matrix corresponding to a test between supplier A and supplier B, βB − βA = βB − ( − βB − βC) = 2βB + βC . From the output of disp(glme), column 5 of the contrast matrix corresponds to βC, and column 6 corresponds to βB. Therefore, the contrast matrix for this test is specified as H = [0,0,0,0,1,2]. H = [0,0,0,0,1,2]; [pVal,F,DF1,DF2] = coefTest(glme,H) pVal = 0.6177 F = 0.2508 DF1 = 1 DF2 = 94
The large p-value indicates that there is no significant difference between supplier A and supplier B at the 5% significance level.
References [1] Booth, J.G., and J.P. Hobert. “Standard Errors of Prediction in Generalized Linear Mixed Models.” Journal of the American Statistical Association, Vol. 93, 1998, pp. 262–272.
See Also GeneralizedLinearMixedModel | anova | coefCI | covarianceParameters | fixedEffects | randomEffects
coefTest Package: Linear hypothesis test on linear regression model coefficients
Syntax p = coefTest(mdl) p = coefTest(mdl,H) p = coefTest(mdl,H,C) [p,F] = coefTest( ___ ) [p,F,r] = coefTest( ___ )
Description p = coefTest(mdl) computes the p-value for an F-test that all coefficient estimates in mdl, except for the intercept term, are zero. p = coefTest(mdl,H) performs an F-test that H × B = 0, where B represents the coefficient vector. Use H to specify the coefficients to include in the F-test. p = coefTest(mdl,H,C) performs an F-test that H × B = C. [p,F] = coefTest( ___ ) also returns the F-test statistic F using any of the input argument combinations in previous syntaxes. [p,F,r] = coefTest( ___ ) also returns the numerator degrees of freedom r for the test.
Examples Test Significance of Linear Regression Model Fit a linear regression model and test the coefficients of the fitted model to see if they are zero. Load the carsmall data set and create a table in which the Model_Year predictor is categorical. load carsmall Model_Year = categorical(Model_Year); tbl = table(MPG,Weight,Model_Year);
Fit a linear regression model of mileage as a function of the weight, weight squared, and model year. mdl = fitlm(tbl,'MPG ~ Model_Year + Weight^2')

mdl =
Linear regression model:
    MPG ~ 1 + Weight + Model_Year + Weight^2

Estimated Coefficients:
                      Estimate          SE         tStat       pValue  
                     __________    __________    _______    __________
    (Intercept)          54.206        4.7117     11.505    2.6648e-19
    Weight            -0.016404     0.0031249    -5.2493    1.0283e-06
    Model_Year_76        2.0887       0.71491     2.9215     0.0044137
    Model_Year_82        8.1864       0.81531     10.041    2.6364e-16
    Weight^2         1.5573e-06    4.9454e-07      3.149     0.0022303

Number of observations: 94, Error degrees of freedom: 89
Root Mean Squared Error: 2.78
R-squared: 0.885, Adjusted R-Squared: 0.88
F-statistic vs. constant model: 172, p-value = 5.52e-41
The last line of the model display shows the F-statistic value of the regression model and the corresponding p-value. The small p-value indicates that the model fits significantly better than a degenerate model consisting of only an intercept term. You can return these two values by using coefTest. [p,F] = coefTest(mdl) p = 5.5208e-41 F = 171.8844
Test Significance of Linear Model Coefficient Fit a linear regression model and test the significance of a specified coefficient in the fitted model by using coefTest. You can also use anova to test the significance of each predictor in the model. Load the carsmall data set and create a table in which the Model_Year predictor is categorical. load carsmall Model_Year = categorical(Model_Year); tbl = table(MPG,Acceleration,Weight,Model_Year);
Fit a linear regression model of mileage as a function of the acceleration, weight, and model year. mdl = fitlm(tbl,'MPG ~ Acceleration + Model_Year + Weight')

mdl =
Linear regression model:
    MPG ~ 1 + Acceleration + Weight + Model_Year

Estimated Coefficients:
                      Estimate          SE          tStat       pValue  
                     __________    ___________    ________    __________
    (Intercept)          40.523         2.5293      16.021    5.8302e-28
    Acceleration      -0.023438        0.11353    -0.20644       0.83692
    Weight           -0.0066799     0.00045796     -14.586    2.5314e-25
    Model_Year_76        1.9898        0.80696      2.4657      0.015591
    Model_Year_82        7.9661        0.89745      8.8763    6.7725e-14

Number of observations: 94, Error degrees of freedom: 89
Root Mean Squared Error: 2.93
R-squared: 0.873, Adjusted R-Squared: 0.867
F-statistic vs. constant model: 153, p-value = 5.86e-39
The model display includes the p-value for the t-statistic for each coefficient to test the null hypothesis that the corresponding coefficient is zero. You can examine the significance of the coefficient using coefTest. For example, test the significance of the Acceleration coefficient. According to the model display, Acceleration is the second predictor. Specify the coefficient by using a numeric index vector. [p_Acceleration,F_Acceleration,r_Acceleration] = coefTest(mdl,[0 1 0 0 0]) p_Acceleration = 0.8369 F_Acceleration = 0.0426 r_Acceleration = 1
p_Acceleration is the p-value corresponding to the F-statistic value F_Acceleration, and r_Acceleration is the numerator degrees of freedom for the F-test. The returned p-value indicates that Acceleration is not statistically significant in the fitted model. Note that p_Acceleration is equal to the p-value of t-statistic (tStat) in the model display, and F_Acceleration is the square of tStat. Test the significance of the categorical predictor Model_Year. Instead of testing Model_Year_76 and Model_Year_82 separately, you can perform a single test for the categorical predictor Model_Year. Specify Model_Year_76 and Model_Year_82 by using a numeric index matrix. [p_Model_Year,F_Model_Year,r_Model_Year] = coefTest(mdl,[0 0 0 1 0; 0 0 0 0 1]) p_Model_Year = 2.7408e-14 F_Model_Year = 45.2691 r_Model_Year = 2
The returned p-value indicates that Model_Year is statistically significant in the fitted model. You can also return these values by using anova. anova(mdl)

ans=4×5 table
                     SumSq      DF    MeanSq       F          pValue  
                    _______    ___    _______    ________    __________
    Acceleration    0.36613      1    0.36613    0.042618       0.83692
    Weight           1827.7      1     1827.7      212.75    2.5314e-25
    Model_Year       777.81      2      388.9      45.269    2.7408e-14
    Error            764.59     89      8.591
Input Arguments mdl — Linear regression model object LinearModel object | CompactLinearModel object 35-943
Linear regression model object, specified as a LinearModel object created by using fitlm or stepwiselm, or a CompactLinearModel object created by using compact. H — Hypothesis matrix numeric index matrix Hypothesis matrix, specified as a full-rank numeric index matrix of size r-by-s, where r is the number of linear combinations of coefficients being tested, and s is the total number of coefficients. • If you specify H, then the output p is the p-value for an F-test that H × B = 0, where B represents the coefficient vector. • If you specify H and C, then the output p is the p-value for an F-test that H × B = C. Example: [1 0 0 0 0] tests the first coefficient among five coefficients. Data Types: single | double C — Hypothesized value numeric vector Hypothesized value for testing the null hypothesis, specified as a numeric vector with the same number of rows as H. If you specify H and C, then the output p is the p-value for an F-test that H × B = C, where B represents the coefficient vector. Data Types: single | double
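For example, a minimal sketch of using H and C together, assuming mdl has five coefficients as in the example above; it tests whether the second and third coefficients are equal, that is, whether their difference is the hypothesized value 0:

H = [0 1 -1 0 0];      % one linear combination of the five coefficients
C = 0;                 % hypothesized value for that combination
p = coefTest(mdl,H,C)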
Output Arguments p — p-value for F-test numeric value in the range [0,1] p-value for the F-test, returned as a numeric value in the range [0,1]. F — Value of test statistic for F-test numeric value Value of the test statistic for the F-test, returned as a numeric value. r — Numerator degrees of freedom for F-test positive integer Numerator degrees of freedom for the F-test, returned as a positive integer. The F-statistic has r degrees of freedom in the numerator and mdl.DFE degrees of freedom in the denominator.
Algorithms The p-value, F-statistic, and numerator degrees of freedom are valid under these assumptions: • The data comes from a model represented by the formula in the Formula property of the fitted model. • The observations are independent, conditional on the predictor values. 35-944
Under these assumptions, let β represent the (unknown) coefficient vector of the linear regression. Suppose H is a full-rank numeric index matrix of size r-by-s, where r is the number of linear combinations of coefficients being tested, and s is the total number of coefficients. Let c be a column vector with r rows. The following is a test statistic for the hypothesis that Hβ = c:

    F = (Hβ̂ − c)′ (HVH′)⁻¹ (Hβ̂ − c) / r.

Here β̂ is the estimate of the coefficient vector β, stored in the Coefficients property, and V is the estimated covariance of the coefficient estimates, stored in the CoefficientCovariance property. When the hypothesis is true, the test statistic F has an “F Distribution” on page B-46 with r and u degrees of freedom, where u is the degrees of freedom for error, stored in the DFE property.
Alternative Functionality • The values of commonly used test statistics are available in the Coefficients property of a fitted model. • anova provides tests for each model predictor and groups of predictors.
Version History Introduced in R2012a
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also anova | CompactLinearModel | LinearModel | linhyptest | coefCI | dwtest Topics “F-statistic and t-statistic” on page 11-74 “Interpret Linear Regression Results” on page 11-52 “Linear Regression Workflow” on page 11-35 “Linear Regression” on page 11-9
coefTest Hypothesis test on fixed and random effects of linear mixed-effects model
Syntax pVal = coefTest(lme) pVal = coefTest(lme,H) pVal = coefTest(lme,H,C) pVal = coefTest(lme,H,C,Name,Value) [pVal,F,DF1,DF2] = coefTest( ___ )
Description pVal = coefTest(lme) returns the p-value for an F-test that all fixed-effects coefficients except for the intercept are 0. pVal = coefTest(lme,H) returns the p-value for an F-test on fixed-effects coefficients of linear mixed-effects model lme, using the contrast matrix H. It tests the null hypothesis that H0: Hβ = 0, where β is the fixed-effects vector. pVal = coefTest(lme,H,C) returns the p-value for an F-test on fixed-effects coefficients of the linear mixed-effects model lme, using the contrast matrix H. It tests the null hypothesis that H0: Hβ = C, where β is the fixed-effects vector. pVal = coefTest(lme,H,C,Name,Value) returns the p-value for an F-test on the fixed- and/or random-effects coefficients of the linear mixed-effects model lme, with additional options specified by one or more name-value pair arguments. For example, 'REContrast',K tells coefTest to test the null hypothesis that H0: Hβ + KB = C, where β is the fixed-effects vector and B is the random-effects vector. [pVal,F,DF1,DF2] = coefTest( ___ ) also returns the F-statistic F, and the numerator and denominator degrees of freedom for F, respectively DF1 and DF2.
Examples Test Fixed-Effects Coefficients for Categorical Data Load the sample data. load('shift.mat')
The data shows the absolute deviations from the target quality characteristic measured from the products that five operators manufacture during three different shifts: morning, evening, and night. This is a randomized block design, where the operators are the blocks. The experiment is designed to study the impact of the time of shift on the performance. The performance measure is the absolute deviation of the quality characteristics from the target value. This is simulated data. Shift and Operator are nominal variables. 35-946
shift.Shift = nominal(shift.Shift); shift.Operator = nominal(shift.Operator);
Fit a linear mixed-effects model with a random intercept grouped by operator to assess if there is significant difference in the performance according to the time of the shift. lme = fitlme(shift,'QCDev ~ Shift + (1|Operator)')

lme =
Linear mixed-effects model fit by ML

Model information:
    Number of observations              15
    Fixed effects coefficients           3
    Random effects coefficients          5
    Covariance parameters                2

Formula:
    QCDev ~ 1 + Shift + (1 | Operator)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    59.012    62.552    -24.506          49.012

Fixed effects coefficients (95% CIs):
    Name                   Estimate    SE         tStat       DF    pValue       Lower      Upper
    {'(Intercept)'  }        3.1196    0.88681      3.5178    12    0.0042407     1.1874     5.0518
    {'Shift_Morning'}       -0.3868    0.48344    -0.80009    12      0.43921    -1.4401    0.66653
    {'Shift_Night'  }        1.9856    0.48344      4.1072    12    0.0014535    0.93227     3.0389

Random effects covariance parameters (95% CIs):
Group: Operator (5 Levels)
    Name1                Name2                Type         Estimate    Lower      Upper
    {'(Intercept)'}      {'(Intercept)'}      {'std'}      1.8297      0.94915    3.5272

Group: Error
    Name             Estimate    Lower      Upper
    {'Res Std'}      0.76439     0.49315    1.1848

Test if all fixed-effects coefficients except for the intercept are 0. pVal = coefTest(lme) pVal = 7.5956e-04
The small p-value indicates that not all fixed-effects coefficients are 0. Test the significance of the Shift term using a contrast matrix. H = [0 1 0; 0 0 1]; pVal = coefTest(lme,H) pVal = 7.5956e-04
Test the significance of the Shift term using the anova method. anova(lme)

ans =
    ANOVA MARGINAL TESTS: DFMETHOD = 'RESIDUAL'

    Term                   FStat     DF1    DF2    pValue
    {'(Intercept)'}        12.375      1     12     0.0042407
    {'Shift'      }        13.864      2     12    0.00075956
The p-value for Shift, 0.00075956, is the same as the p-value of the previous hypothesis test. Test if there is any difference between the evening and morning shifts. pVal = coefTest(lme,[0 1 -1]) pVal = 3.6147e-04
This small p-value indicates that the performance of the operators are not the same in the morning and the evening shifts.
Hypothesis Tests for Fixed-Effects Coefficients Load the sample data. load('weight.mat')
weight contains data from a longitudinal study, where 20 subjects are randomly assigned to 4 exercise programs, and their weight loss is recorded over six 2-week time periods. This is simulated data. Store the data in a table. Define Subject and Program as categorical variables. tbl = table(InitialWeight,Program,Subject,Week,y); tbl.Subject = nominal(tbl.Subject); tbl.Program = nominal(tbl.Program);
Fit a linear mixed-effects model where the initial weight, type of program, week, and the interaction between the week and type of program are the fixed effects. The intercept and week vary by subject. lme = fitlme(tbl,'y ~ InitialWeight + Program*Week + (Week|Subject)')

lme =
Linear mixed-effects model fit by ML

Model information:
    Number of observations             120
    Fixed effects coefficients           9
    Random effects coefficients         40
    Covariance parameters                4

Formula:
    y ~ 1 + InitialWeight + Program*Week + (1 + Week | Subject)

Model fit statistics:
    AIC        BIC       LogLikelihood    Deviance
    -22.981    13.257    24.49            -48.981

Fixed effects coefficients (95% CIs):
    Name                   Estimate      SE           tStat       DF     pValue       Lower      Upper
    {'(Intercept)'   }       0.66105       0.25892      2.5531    111     0.012034     0.1480     1.1741
    {'InitialWeight' }     0.0031879     0.0013814      2.3078    111     0.022863     0.0005     0.0059
    {'Program_B'     }       0.36079       0.13139       2.746    111    0.0070394     0.1004     0.6211
    {'Program_C'     }     -0.033263       0.13117    -0.25358    111      0.80029    -0.2932     0.2267
    {'Program_D'     }       0.11317       0.13132     0.86175    111      0.39068    -0.1471     0.3734
    {'Week'          }        0.1732      0.067454      2.5677    111     0.011567     0.0395     0.3069
    {'Program_B:Week'}      0.038771      0.095394     0.40644    111      0.68521    -0.1503     0.2278
    {'Program_C:Week'}      0.030543      0.095394     0.32018    111      0.74944    -0.1585     0.2196
    {'Program_D:Week'}      0.033114      0.095394     0.34713    111      0.72915    -0.1559     0.2221

Random effects covariance parameters (95% CIs):
Group: Subject (20 Levels)
    Name1                Name2                Type          Estimate    Lower       Upper
    {'(Intercept)'}      {'(Intercept)'}      {'std' }      0.18407     0.12281     0.27587
    {'Week'        }     {'(Intercept)'}      {'corr'}      0.66841     0.21076     0.88573
    {'Week'        }     {'Week'        }     {'std' }      0.15033     0.11004     0.20537

Group: Error
    Name             Estimate    Lower       Upper
    {'Res Std'}      0.10261     0.087882    0.11981

Test for the significance of the interaction between Program and Week.

H = [0 0 0 0 0 0 1 0 0; 0 0 0 0 0 0 0 1 0; 0 0 0 0 0 0 0 0 1];
pVal = coefTest(lme,H)

pVal = 0.9775
The high p-value indicates that the interaction between Program and Week is not statistically significant. Now, test whether all coefficients involving Program are 0. H = [0 0 1 0 0 0 0 0 0; 0 0 0 1 0 0 0 0 0; 0 0 0 0 1 0 0 0 0; 0 0 0 0 0 0 1 0 0; 0 0 0 0 0 0 0 1 0; 0 0 0 0 0 0 0 0 1]; C = [0;0;0;0;0;0]; pVal = coefTest(lme,H,C) pVal = 0.0274
The p-value of 0.0274 indicates that not all coefficients involving Program are zero.
Hypothesis Tests for Fixed- and Random-Effects Coefficients Load the sample data.
Functions
load flu
The flu dataset array has a Date variable, and 10 variables containing estimated influenza rates (in 9 different regions, estimated from Google® searches, plus a nationwide estimate from the CDC). To fit a linear-mixed effects model, your data must be in a properly formatted dataset array. To fit a linear mixed-effects model with the influenza rates as the responses and region as the predictor variable, combine the nine columns corresponding to the regions into an array. The new dataset array, flu2, must have the response variable, FluRate, the nominal variable, Region, that shows which region each estimate is from, and the grouping variable Date. flu2 = stack(flu,2:10,'NewDataVarName','FluRate',... 'IndVarName','Region'); flu2.Date = nominal(flu2.Date);
Fit a linear mixed-effects model with fixed effects for the region and a random intercept that varies by Date. lme = fitlme(flu2,'FluRate ~ 1 + Region + (1|Date)') lme = Linear mixed-effects model fit by ML Model information: Number of observations Fixed effects coefficients Random effects coefficients Covariance parameters
468 9 52 2
Formula: FluRate ~ 1 + Region + (1 | Date) Model fit statistics: AIC BIC 318.71 364.35
LogLikelihood -148.36
Deviance 296.71
Fixed effects coefficients (95% CIs): Name Estimate {'(Intercept)' } 1.2233 {'Region_MidAtl' } 0.010192 {'Region_ENCentral'} 0.051923 {'Region_WNCentral'} 0.23687 {'Region_SAtl' } 0.075481 {'Region_ESCentral'} 0.33917 {'Region_WSCentral'} 0.069 {'Region_Mtn' } 0.046673 {'Region_Pac' } -0.16013
SE 0.096678 0.052221 0.052221 0.052221 0.052221 0.052221 0.052221 0.052221 0.052221
Random effects covariance parameters (95% CIs): Group: Date (52 Levels) Name1 Name2 {'(Intercept)'} {'(Intercept)'} Group: Error Name {'Res Std'}
35-950
Estimate 0.26627
Lower 0.24878
tStat 12.654 0.19518 0.9943 4.5359 1.4454 6.495 1.3213 0.89377 -3.0665
Type {'std'}
Upper 0.285
DF 459 459 459 459 459 459 459 459 459
Estimate 0.6443
pValue 1.085e-31 0.84534 0.3206 7.3324e-06 0.14902 2.1623e-10 0.18705 0.37191 0.0022936
Lower 0.5297
Lower 1.0334 -0.092429 -0.050698 0.13424 -0.02714 0.23655 -0.033621 -0.055948 -0.26276
Upper 0.78368
coefTest
Test the hypothesis that the random effects-term for week 10/9/2005 is zero. [~,~,STATS] = randomEffects(lme); % Compute the random-effects statistics (STATS) STATS.Level = nominal(STATS.Level); K = zeros(length(STATS),1); K(STATS.Level == '10/9/2005') = 1; pVal = coefTest(lme,[0 0 0 0 0 0 0 0 0],0,'REContrast',K') pVal = 0.1692
Refit the model this time with a random intercept and slope. lme = fitlme(flu2,'FluRate ~ 1 + Region + (1 + Region|Date)');
Test the hypothesis that the combined coefficient of region WNCentral for week 10/9/2005 is zero. [~,~,STATS] = randomEffects(lme); STATS.Level = nominal(STATS.Level); K = zeros(length(STATS),1); K(STATS.Level == '10/9/2005' & flu2.Region == 'WNCentral') = 1; pVal = coefTest(lme,[0 0 0 1 0 0 0 0 0],0,'REContrast',K') pVal = 1.2059e-12
Also return the F-statistic with the numerator and denominator degrees of freedom. [pVal,F,DF1,DF2] = coefTest(lme,[0 0 0 1 0 0 0 0 0],0,'REContrast',K') pVal = 1.2059e-12 F = 53.4176 DF1 = 1 DF2 = 459
Repeat the test using the Satterthwaite approximation for the denominator degrees of freedom. [pVal,F,DF1,DF2] = coefTest(lme,[0 0 0 1 0 0 0 0 0],0,'REContrast',K',... 'DFMethod','satterthwaite') pVal = NaN F = 53.4176 DF1 = 1 DF2 = 0
Input Arguments lme — Linear mixed-effects model LinearMixedModel object Linear mixed-effects model, specified as a LinearMixedModel object constructed using fitlme or fitlmematrix. H — Fixed-effects contrasts m-by-p matrix 35-951
35
Functions
Fixed-effects contrasts, specified as an m-by-p matrix, where p is the number of fixed-effects coefficients in lme. Each row of H represents one contrast. The columns of H (left to right) correspond to the rows of the p-by-1 fixed-effects vector beta (top to bottom), returned by the fixedEffects method. Example: pVal = coefTest(lme,H) Data Types: single | double C — Hypothesized value m-by-1 vector Hypothesized value for testing the null hypothesis H*beta = C, specified as an m-by-1 matrix. Here, beta is the vector of fixed-effects estimates returned by the fixedEffects method. Data Types: single | double Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: pVal = coefTest(lme,H,C,'DFMethod','satterthwaite') DFMethod — Method for computing approximate denominator degrees of freedom 'residual' (default) | 'satterthwaite' | 'none' Method for computing the approximate denominator degrees of freedom for the F-test, specified as the comma-separated pair consisting of 'DFMethod' and one of the following. 'residual'
Default. The degrees of freedom are assumed to be constant and equal to n – p, where n is the number of observations and p is the number of fixed effects.
'satterthwaite'
Satterthwaite approximation.
'none'
All degrees of freedom are set to infinity.
For example, you can specify the Satterthwaite approximation as follows. Example: 'DFMethod','satterthwaite' REContrast — Random-effects contrasts m-by-q matrix Random-effects contrasts, specified as the comma-separated pair consisting of 'REContrast' and an m-by-q matrix K, where q is the number of random effects parameters in lme. The columns of K (left to right) correspond to the rows of the random-effects best linear unbiased predictor vector B (top to bottom), returned by the randomEffects method. Data Types: single | double
35-952
coefTest
Output Arguments pVal — p-value scalar value p-value for the F-test on the fixed and/or random-effects coefficients of the linear mixed-effects model lme, returned as a scalar value. F — F-statistic scalar value F-statistic, returned as a scalar value. DF1 — Numerator degrees of freedom for F scalar value Numerator degrees of freedom for F, returned as a scalar value. • If you test the null hypothesis H0: Hβ = 0, or H0: Hβ = C, then DF1 is equal to the number of linearly independent rows in H. • If you test the null hypothesis H0: Hβ + KB= C, then DF1 is equal to the number of linearly independent rows in [H,K]. DF2 — Denominator degrees of freedom for F scalar value Denominator degrees of freedom for F, returned as a scalar value. The value of DF2 depends on the option you select for DFMethod.
Version History Introduced in R2013b
See Also LinearMixedModel | anova | coefCI
35-953
35
Functions
coeftest Linear hypothesis test on MANOVA model coefficients
Syntax tbl = coeftest(maov) tbl = coeftest(maov,A) tbl = coeftest(maov,A,C) tbl = coeftest(maov,A,C,D) tbl = coeftest( ___ ,TestStatistic=testStat) [tbl,H,E] = coeftest( ___ )
Description tbl = coeftest(maov) performs an F-test to determine if the coefficient estimates for the manova object maov are statistically different from zero, and returns the results in a table tbl. tbl = coeftest(maov,A) tests the null hypothesis that A*B = 0, where B is the matrix of coefficients in maov.Coefficients. A is an a-by-p transform matrix with rank a ≤ p, and p is the number of terms in the MANOVA model for maov. Use this syntax to test for statistically significant differences in model coefficients between factor values. tbl = coeftest(maov,A,C) tests the null hypothesis that A*B*C = 0. C is an r-by-c contrast matrix, where r is the number of response variables in the MANOVA model for maov. Use this syntax to test for statistically significant differences in model coefficients between response variables. tbl = coeftest(maov,A,C,D) tests the null hypothesis that A*B*C = D. D is a scalar or an a-by-c matrix of numeric values. Use this syntax to determine whether linear combinations of the coefficients estimated for maov are statistically equal to certain values. tbl = coeftest( ___ ,TestStatistic=testStat) specifies the test statistic to use in the hypothesis test, using any of the input argument combinations in the previous syntaxes. [tbl,H,E] = coeftest( ___ ) also returns the hypothesis matrix H and the error matrix E.
Examples Test Significance of MANOVA Model Coefficients Load the carsmall data set. load carsmall
The variable Model_Year contains data for the year a car was manufactured, and the variable Cylinders contains data for the number of engine cylinders in the car. The Acceleration and Displacement variables contain data for car acceleration and displacement. Use the table function to create a table from the data in Model_Year, Cylinders, Acceleration, and Displacement. 35-954
coeftest
data = table(Model_Year,Cylinders,Acceleration,Displacement, ... VariableNames=["Year" "Cylinders" "Acceleration" "Displacement"]);
Perform a two-way MANOVA, using Model_Year and Cylinders as factors, and Acceleration and Displacement as response variables. Display the test statistic used to perform the MANOVA. maov = manova(data,["Acceleration" "Displacement"]); maov.TestStatistic ans = "pillai"
The output shows that the function uses Pillai's to compute the MANOVA statistics for maov. Test the null hypothesis that the model coefficients for maov are not statistically different from zero. By default, coeftest uses the statistic in maov.TestStatistic to perform the test. tbl = coeftest(maov) tbl=1×6 table TestStatistic _____________ "pillai"
Value ______
F ______
1.8636
259.68
DFNumerator ___________ 10
DFDenominator _____________
pValue ___________
190
4.4695e-105
The p-value in the table output is very small, indicating that enough evidence exists to conclude that at least one of the model coefficients is statistically significant.
Test Significance of MANOVA Model Term Load the carsmall data set. load carsmall
The variable Model_Year contains data for the year a car was manufactured, and the variable Cylinders contains data for the number of engine cylinders in the car. The Acceleration and Displacement variables contain data for car acceleration and displacement. Use the table function to create a table from the data in Model_Year, Cylinders, Acceleration, and Displacement. data = table(Model_Year,Cylinders,Acceleration,Displacement, ... VariableNames=["Year" "Cylinders" "Acceleration" "Displacement"]);
Perform a two-way MANOVA using Model_Year and Cylinders as factors, and Acceleration and Displacement as response variables. maov = manova(data,["Acceleration" "Displacement"]);
maov is a manova object that contains the results of the two-way MANOVA. Display the fitted MANOVA model coefficients for maov. coefs = maov.Coefficients
35-955
35
Functions
coefs = 5×2 14.9360 228.5164 -0.8342 4.5054 0.6874 -10.0817 1.5827 -115.6528 1.3065 -7.8655
The first and second columns of the matrix coefs correspond to the car acceleration and car displacement response variables, respectively. Each row corresponds to a term in the MANOVA model, with the first row containing intercept terms. Display the names of the terms for the fitted coefficients. maov.ExpandedFactorNames ans = 1×5 string "(Intercept)"
"Year_70"
"Year_76"
"Cylinders_4"
"Cylinders_6"
The output shows that the last two rows of coefs correspond to the terms for number of engine cylinders. Test the null hypothesis that, for both response variables, the sum of the coefficients corresponding to the number of engine cylinders is zero. A = [0 0 0 1 1]; tbl = coeftest(maov,A) tbl=1×6 table TestStatistic _____________ "pillai"
Value _______
F ______
DFNumerator ___________
0.81715
210.04
2
DFDenominator _____________ 94
pValue __________ 2.0833e-35
The small p-value in the table output indicates that enough evidence exists to conclude that the sum of the engine cylinders coefficients is statistically different from zero.
Test Significance of Coefficients for Each Response Variable Load the carsmall data set. load carsmall
The variable Model_Year contains data for the year a car was manufactured, and the variable Cylinders contains data for the number of engine cylinders in the car. The Acceleration and Displacement variables contain data for car acceleration and displacement. Use the table function to create a table from the data in Model_Year, Cylinders, Acceleration, and Displacement. data = table(Model_Year,Cylinders,Acceleration,Displacement, ... VariableNames=["Year" "Cylinders" "Acceleration" "Displacement"]);
Perform a two-way MANOVA using Model_Year and Cylinders as factors, and Acceleration and Displacement as response variables. maov = manova(data,["Acceleration" "Displacement"]);
maov is a manova object that contains the results of the two-way MANOVA. Display the fitted MANOVA model coefficients for maov.
coefs = maov.Coefficients
coefs = 5×2
     14.9360    228.5164
     -0.8342      4.5054
      0.6874    -10.0817
      1.5827   -115.6528
      1.3065     -7.8655
The first and second columns of the matrix coefs correspond to the car acceleration and car displacement response variables, respectively. Each row corresponds to a term in the MANOVA model, with the first row containing intercept terms. Test the null hypothesis that the engine cylinder coefficients for the car acceleration response variable sum to zero.
A = [0 0 0 1 1];
C = [1;0];
tbl = coeftest(maov,A,C)
tbl=1×6 table
    TestStatistic     Value       F       DFNumerator    DFDenominator     pValue
    _____________    _______    ______    ___________    _____________    __________
      "pillai"       0.33182    47.176         1               95         6.6905e-10
The small p-value in the table output indicates that enough evidence exists to conclude that the sum of the engine cylinder coefficients for the acceleration response variable is statistically different from zero.
Input Arguments maov — MANOVA results manova object MANOVA results, specified as a manova object. The properties of maov contain the coefficient estimates and MANOVA statistics used by coeftest to perform the F-test. A — Transform matrix p-by-p identity matrix (default) | a-by-p numeric matrix Transform matrix, specified as a p-by-p identity matrix or an a-by-p numeric matrix, where p is the number of terms in the MANOVA model for maov. A has rank a ≤ p. coeftest uses A to perform an F-test with the null hypothesis A*B*C = D. B is the matrix of coefficients in maov.Coefficients, C is a contrast matrix, and D is a matrix of hypothesized values.
Specify A to test for statistically significant differences in model coefficients between factor values. For more information, see “Multivariate Analysis of Variance for Repeated Measures” on page 9-62. Example: [1 1 3 0;0 0 2 1] Data Types: single | double C — Contrast matrix r-by-r identity matrix (default) | r-by-c numeric matrix Contrast matrix, specified as an r-by-r identity matrix or an r-by-c numeric matrix, where r is the number of response variables in the MANOVA model for maov. coeftest uses C to perform an F-test with the null hypothesis A*B*C = D. B is the matrix of coefficients in maov.Coefficients, A is a transform matrix, and D is a matrix of hypothesized values. Specify C to test for statistically significant differences in model coefficients between response variables. For more information, see “Multivariate Analysis of Variance for Repeated Measures” on page 9-62. Example: [0.25 0.4] Data Types: single | double D — Hypothesized values 0 (default) | numeric scalar | a-by-c numeric matrix Hypothesized values, specified as 0, a numeric scalar, or an a-by-c numeric matrix. a is the number of rows in the transform matrix A, and c is the number of columns in the contrast matrix C. If D is a scalar, the function expands it to be an a-by-c matrix. coeftest uses D to perform an F-test with the null hypothesis A*B*C = D. B is the matrix of coefficients in maov.Coefficients, A is a transform matrix, and C is a contrast matrix. Specify D to determine whether linear combinations of the coefficients estimated for maov are statistically equal to certain values. For more information, see “Multivariate Analysis of Variance for Repeated Measures” on page 9-62. Example: [0 0 0 0;1 1 1 1;2 2 2 2] Data Types: single | double testStat — MANOVA test statistics maov.TestStatistic (default) | "all" | "pillai" | "hotelling" | "wilks" | "roy" MANOVA test statistics, specified as maov.TestStatistic, "all", or one or more of the following values.
"pillai" (default), Pillai's trace: V = trace(Qh(Qh + Qe)^(−1)) = Σ θi, where the θi are the solutions of the characteristic equation |Qh − θ(Qh + Qe)| = 0. Qh and Qe are, respectively, the hypothesis and residual sum of squares product matrices.
"hotelling", Hotelling-Lawley trace: U = trace(QhQe^(−1)) = Σ λi, where the λi are the solutions of the characteristic equation |Qh − λQe| = 0.
"wilks", Wilks' lambda: Λ = |Qe| / |Qh + Qe| = ∏ 1/(1 + λi).
"roy", Roy's maximum root statistic: Θ = max(eig(QhQe^(−1))).
If you specify testStat as "all", coeftest calculates all the test statistics in the table above. Example: TestStatistic="hotelling" Data Types: char | string | cell
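For example, the following call (a minimal sketch, assuming maov is the manova object fitted in the earlier examples) requests every statistic at once; the returned table then reports each of the four statistics rather than a single one:
tbl = coeftest(maov,TestStatistic="all")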
Output Arguments tbl — Hypothesis test results table Hypothesis test results, returned as a table with the following variables:
• TestStatistic — Test statistic used by coeftest to perform the F-test.
• Value — Value of the test statistic.
• F — F-statistic value. To calculate the F-statistic, coeftest transforms Value so that it follows an F-distribution under the null hypothesis.
• DFNumerator — Degrees of freedom for the numerator of the F-statistic.
• DFDenominator — Degrees of freedom for the denominator of the F-statistic.
• pValue — p-value for the F-statistic.
H — Hypothesis matrix numeric matrix Hypothesis matrix, returned as a numeric matrix. coeftest uses H to calculate the test statistic. For more information about H, see Qh in “Multivariate Analysis of Variance for Repeated Measures” on page 9-62. Data Types: single | double
E — Error matrix numeric matrix Error matrix, returned as a numeric matrix. coeftest uses E to calculate the test statistic. For more information about E, see Qe in “Multivariate Analysis of Variance for Repeated Measures” on page 9-62. Data Types: single | double
Version History Introduced in R2023b
See Also manova Topics “Multivariate Analysis of Variance for Repeated Measures” on page 9-62
coefTest Linear hypothesis test on multinomial regression model coefficients
Syntax p = coefTest(mdl) p = coefTest(mdl,H) p = coefTest(mdl,H,C) [p,F] = coefTest( ___ ) [p,F,r] = coefTest( ___ )
Description p = coefTest(mdl) computes the p-value for an F-test that all coefficient estimates in mdl are zero. p = coefTest(mdl,H) performs an F-test that H × B = 0, where B represents the coefficient vector. Use H to specify the coefficients to include in the F-test. p = coefTest(mdl,H,C) performs an F-test that H × B = C. [p,F] = coefTest( ___ ) also returns the F-test statistic F using any of the input argument combinations in previous syntaxes. [p,F,r] = coefTest( ___ ) also returns the numerator degrees of freedom r for the test.
Examples Test Significance of Multinomial Regression Model Load the fisheriris data set. load fisheriris
The column vector species contains iris flowers of three different species: setosa, versicolor, and virginica. The matrix meas contains four types of measurements for the flowers: the length and width of sepals and petals in centimeters. Create a table from the iris measurements and species data by using the array2table function. tbl = array2table(meas,... VariableNames=["SepalLength","SepalWidth","PetalLength","PetalWidth"]); tbl.Species = species;
Fit a multinomial regression model using the petal measurements as the predictor data and the species as the response data. mdl = fitmnr(tbl,"Species ~ PetalLength + PetalWidth^2") mdl = Multinomial regression with nominal responses
                                Value       SE        tStat       pValue
    (Intercept_setosa)          136.9     12.587     10.876     1.4933e-27
    PetalLength_setosa        -17.351     7.0021     -2.478       0.013211
    PetalWidth_setosa         -77.383      24.06    -3.2163      0.0012987
    PetalWidth^2_setosa       -24.719     8.3324    -2.9666      0.0030111
    (Intercept_versicolor)     8.2731     14.489      0.571          0.568
    PetalLength_versicolor    -5.7089     2.0638    -2.7662      0.0056709
    PetalWidth_versicolor      35.208      21.97     1.6026        0.10903
    PetalWidth^2_versicolor   -14.041     7.1653    -1.9596       0.050037
150 observations, 292 error degrees of freedom Dispersion: 1 Chi^2-statistic vs. constant model: 309.3988, p-value = 7.9151e-64
mdl is a multinomial regression model object that contains the results of fitting a nominal multinomial regression model to the data. The chi-squared statistic and p-value correspond to the null hypothesis that the fitted model does not outperform a degenerate model consisting of only an intercept term. The small p-value indicates that enough evidence exists to reject the null hypothesis. Perform an F-test of the null hypothesis that all coefficients, except the intercept term, are zero. Use the default 95% significance level.
p = coefTest(mdl)
p = 3.5512e-133
The small p-value in the output indicates that enough evidence exists to reject the null hypothesis that all coefficients are zero. Enough evidence exists to conclude that at least one of the fitted model coefficients is statistically significant at the 95% significance level.
Test Significance of Multinomial Model Coefficient Load the carsmall data set. load carsmall
The variables Acceleration, Weight, and Model_Year contain data for car acceleration, weight, and model year, respectively. The variable MPG contains car mileage data in miles per gallon (MPG). Sort the data in MPG into four response categories by using the discretize function. MPG = discretize(MPG,[9 19 29 39 48]); tbl = table(MPG,Acceleration,Weight,Model_Year);
Fit a multinomial regression model of the car mileage as a function of the acceleration, weight, and model year. mdl = fitmnr(tbl,"MPG ~ Acceleration + Model_Year + Weight",CategoricalPredictors="Model_Year") mdl = Multinomial regression with nominal responses
                         Value          SE         tStat        pValue
    (Intercept_1)         154.38       15.697       9.835     7.9576e-23
    Acceleration_1        -11.31      0.53323      -21.21    7.7405e-100
    Weight_1            0.098347    0.0034745      28.306    2.9244e-176
    Model_Year_76_1       182.33       4.5868       39.75              0
    Model_Year_82_1      -1690.4       4.6231     -365.64              0
    (Intercept_2)         177.87       14.211      12.516     6.0891e-36
    Acceleration_2        -11.28      0.48884     -23.076    8.1522e-118
    Weight_2            0.090009    0.0030349      29.658    2.6661e-193
    Model_Year_76_2       187.19       4.2373      44.176              0
    Model_Year_82_2       -136.5       3.4781     -39.244              0
    (Intercept_3)         103.66       14.991      6.9146     4.6928e-12
    Acceleration_3       -11.359      0.48805     -23.274    8.2157e-120
    Weight_3            0.080071    0.0033652      23.794    3.8879e-125
    Model_Year_76_3       283.31       4.7309      59.885              0
    Model_Year_82_3      -34.727       4.0878     -8.4953     1.9743e-17
94 observations, 267 error degrees of freedom Dispersion: 1 Chi^2-statistic vs. constant model: 169.6193, p-value = 5.7114e-30
mdl is a multinomial regression model object that contains the results of fitting a nominal multinomial regression model to the data. By default, the fourth response category is the reference category. Each row of the table output corresponds to the coefficient of the model term in the first column. The tStat and pValue columns contain the t-statistics and p-values, respectively, for the null hypothesis that the corresponding coefficient is zero. The small p-values for the Model_Year terms indicate that the model year has a statistically significant effect on mdl. For example, the p-value for the term Model_Year_76_2 indicates that a car being manufactured in 1976 has a statistically significant effect on ln(π2/π4), where πi is the ith category probability. You can use a numeric index matrix to investigate whether a group of coefficients contains a coefficient that is statistically significant. Use a numeric index matrix to test the null hypothesis that all coefficients corresponding to the Model_Year terms are zero.
idx_Model_Year = [0 0 0 1 0 0 0 0 0 0 0 0 0 0 0;...
                  0 0 0 0 1 0 0 0 0 0 0 0 0 0 0;...
                  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0;...
                  0 0 0 0 0 0 0 0 0 1 0 0 0 0 0;...
                  0 0 0 0 0 0 0 0 0 0 0 0 0 1 0;...
                  0 0 0 0 0 0 0 0 0 0 0 0 0 0 1];
[p_Model_Year,F_Model_Year,r_Model_Year] = coefTest(mdl,idx_Model_Year) p_Model_Year = 0 F_Model_Year = 4.8985e+04 r_Model_Year = 6
The returned p-value indicates that at least one of the category coefficients corresponding to Model_Year is statistically different from zero. This result is consistent with the small p-value for each of the Model_Year coefficients.
Input Arguments mdl — Multinomial regression model object MultinomialRegression model object Multinomial regression model object, specified as a MultinomialRegression model object created with the fitmnr function. H — Hypothesis matrix numeric index matrix | logical matrix Hypothesis matrix, specified as a full-rank numeric index matrix of size r-by-s, where r is the number of linear combinations of coefficients being tested, and s is the total number of coefficients. • If you specify H, then the output p is the p-value for an F-test that H × B = 0, where B represents the coefficient vector. • If you specify H and C, then the output p is the p-value for an F-test that H × B = C. Example: [1 0 0 0 0] tests the first coefficient among five coefficients. Data Types: single | double | logical C — Hypothesized value numeric vector Hypothesized value for testing the null hypothesis, specified as a numeric vector with the same number of rows as H. If you specify H and C, then the output p is the p-value for an F-test that H × B = C, where B represents the coefficient vector. Data Types: single | double
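As an illustration of the H and C arguments (a minimal sketch for a hypothetical model mdl with five coefficients, not one of the models fitted above), you can test whether a single coefficient equals a particular value:
H = [0 1 0 0 0];     % select the second of five coefficients
C = 1.5;             % hypothesized value, one row per row of H
[p,F,r] = coefTest(mdl,H,C)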
Output Arguments p — p-value for F-test numeric value in the range [0,1] p-value for the F-test, returned as a numeric value in the range [0,1]. F — Value of test statistic for F-test numeric value Value of the test statistic for the F-test, returned as a numeric value. r — Numerator degrees of freedom for F-test positive integer Numerator degrees of freedom for the F-test, returned as a positive integer. The F-statistic has r degrees of freedom in the numerator and mdl.DFE degrees of freedom in the denominator.
Version History Introduced in R2023a
See Also MultinomialRegression | fitmnr
coefTest Linear hypothesis test on nonlinear regression model coefficients
Syntax p = coefTest(mdl) p = coefTest(mdl,H) p = coefTest(mdl,H,C) [p,F] = coefTest( ___ ) [p,F,r] = coefTest( ___ )
Description p = coefTest(mdl) computes the p-value for an F-test that all coefficient estimates in mdl are zero. p = coefTest(mdl,H) performs an F-test that H × B = 0, where B represents the coefficient vector. Use H to specify the coefficients to include in the F-test. p = coefTest(mdl,H,C) performs an F-test that H × B = C. [p,F] = coefTest( ___ ) also returns the F-test statistic F using any of the input argument combinations in previous syntaxes. [p,F,r] = coefTest( ___ ) also returns the numerator degrees of freedom r for the test.
Examples Test Nonlinear Regression Model Coefficients Make a nonlinear model of mileage as a function of the weight from the carsmall data set. Test the coefficients to see if all should be zero. Create an exponential model of car mileage as a function of weight from the carsmall data. Scale the weight by a factor of 1000 so all the variables are roughly equal in size. load carsmall X = Weight; y = MPG; modelfun = 'y ~ b1 + b2*exp(-b3*x/1000)'; beta0 = [1 1 1]; mdl = fitnlm(X,y,modelfun,beta0);
Test the model for significant differences from a constant model. p = coefTest(mdl) p = 1.3708e-36
There is no doubt that the model contains nonzero terms.
Input Arguments mdl — Nonlinear regression model object NonLinearModel object Nonlinear regression model object, specified as a NonLinearModel object created by using fitnlm. H — Hypothesis matrix numeric index matrix Hypothesis matrix, specified as a numeric index matrix with one column for each coefficient in the model. • If you specify H, then the output p is the p-value for an F-test that H × B = 0, where B represents the coefficient vector. • If you specify H and C, then the output p is the p-value for an F-test that H × B = C. Data Types: single | double C — Hypothesized value numeric vector Hypothesized value for testing the null hypothesis, specified as a numeric vector with the same number of rows as H. If you specify H and C, then the output p is the p-value for an F-test that H × B = C, where B represents the coefficient vector. Data Types: single | double
Output Arguments p — p-value for F-test numeric value in the range [0,1] p-value for the F-test, returned as a numeric value in the range [0,1]. F — Value of test statistic for F-test numeric value Value of the test statistic for the F-test, returned as a numeric value. r — Numerator degrees of freedom for F-test positive integer Numerator degrees of freedom for the F-test, returned as a positive integer. The F-statistic has r degrees of freedom in the numerator and mdl.DFE degrees of freedom in the denominator.
More About Test Statistics The p-value, F statistic, and numerator degrees of freedom are valid under these assumptions:
• The data comes from a normal distribution.
• The entries are independent.
Suppose these assumptions hold. Let β represent the unknown coefficient vector of the nonlinear regression. Suppose H is a full-rank numeric index matrix of size r-by-s, where r is the number of linear combinations of coefficients being tested, and s is the number of terms in β. Let c be a column vector with the same number of rows as H. The following is a test statistic for the hypothesis that Hβ = c:
F = (Hβ̂ − c)′(HVH′)^(−1)(Hβ̂ − c)/r.
Here β̂ is the estimate of the coefficient vector β in mdl.Coefs, and V is the estimated covariance of the coefficient estimates in mdl.CoefCov. When the hypothesis is true, the test statistic F has an “F Distribution” on page B-46 with r and u degrees of freedom, where u is the error degrees of freedom of the model (mdl.DFE).
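As a rough numerical check (a sketch only, assuming mdl is the three-coefficient model from fitnlm in the example above, and that the estimates and their covariance are read from the Coefficients and CoefficientCovariance properties), you can reproduce the statistic by hand:
% Reproduce the F-test that all coefficients are zero (H = identity, c = 0).
H = eye(3);
c = zeros(3,1);
betaHat = mdl.Coefficients.Estimate;      % estimated coefficients
V = mdl.CoefficientCovariance;            % estimated covariance of the estimates
r = rank(H);                              % numerator degrees of freedom
F = (H*betaHat - c)'/(H*V*H')*(H*betaHat - c)/r;
p = fcdf(F,r,mdl.DFE,"upper")             % compare with coefTest(mdl)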
Alternatives The values of commonly used test statistics are available in the mdl.Coefficients table.
Version History Introduced in R2012a
See Also NonLinearModel Topics “Nonlinear Regression” on page 13-2
coeftest Linear hypothesis test on coefficients of repeated measures model
Syntax tbl = coeftest(rm,A,C,D)
Description tbl = coeftest(rm,A,C,D) returns a table tbl containing the multivariate analysis of variance (manova) for the repeated measures model rm.
Examples Test Coefficients for First and Last Repeated Measures Load the sample data. load repeatedmeas
The table between includes the between-subject variables age, IQ, group, gender, and eight repeated measures y1 through y8 as responses. The table within includes the within-subject variables w1 and w2. This is simulated data. Fit a repeated measures model, where the repeated measures y1 through y8 are the responses, and age, IQ, group, gender, and the group-gender interaction are the predictor variables. Also specify the within-subject design matrix. rm = fitrm(between,'y1-y8 ~ Group*Gender + Age + IQ','WithinDesign',within);
Test that the coefficients of all terms in the between-subjects model are the same for the first and last repeated measurement variable.
coeftest(rm,eye(8),[1 0 0 0 0 0 0 -1]')
ans=4×7 table
    Statistic     Value       F        RSquare    df1    df2    pValue
    _________    _______    ______     _______    ___    ___    _______
    Pillai        0.3355    1.3884      0.3355     8     22     0.25567
    Wilks         0.6645    1.3884      0.3355     8     22     0.25567
    Hotelling    0.50488    1.3884      0.3355     8     22     0.25567
    Roy          0.50488    1.3884      0.3355     8     22     0.25567
The p-value of 0.25567 indicates that there is not enough statistical evidence to conclude that the coefficients of all terms in the between-subjects model for the first and last repeated measures variable are different.
Input Arguments rm — Repeated measures model RepeatedMeasuresModel object Repeated measures model, returned as a RepeatedMeasuresModel object. For properties and methods of this object, see RepeatedMeasuresModel. A — Specification representing between-subjects model a-by-p matrix Specification representing the between-subjects model, specified as an a-by-p numeric matrix, with rank a ≤ p. Data Types: single | double C — Specification representing within-subjects hypothesis r-by-c matrix Specification representing the within-subjects (within time) hypotheses, specified as an r-by-c numeric matrix, with rank c ≤ r ≤ n – p. Data Types: single | double D — Hypothesized value 0 (default) | scalar value | a-by-c matrix Hypothesized value, specified as a scalar value or an a-by-c matrix. Data Types: single | double
Output Arguments tbl — Results of multivariate analysis of variance table Results of multivariate analysis of variance for the repeated measures model rm, returned as a table containing the following columns.
Statistic    Type of test statistic used
Value        Value of the corresponding test statistic
F            F-statistic value
RSquare      Measure of variance explained
df1          Numerator degrees of freedom for the F-statistic
df2          Denominator degrees of freedom for the F-statistic
pValue       p-value associated with the test statistic value
Tips • This test is defined as A*B*C = D, where B is the matrix of coefficients in the repeated measures model. A and C are numeric matrices of the proper size for this multiplication. D is a scalar or numeric matrix of the proper size. The default is D = 0.
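For instance, to test a single between-subjects term rather than all of them, restrict A to one row. A minimal sketch, assuming rm is the repeated measures model fitted in the example above (eight terms in its between-subjects model):
A = [0 1 0 0 0 0 0 0];       % select the second between-subjects term only
C = [1 0 0 0 0 0 0 -1]';     % contrast the first and last repeated measures
D = 0;                       % hypothesized value (the default)
tbl = coeftest(rm,A,C,D)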
Version History Introduced in R2014a
See Also fitrm | manova
combine Combine two ensembles
Syntax B1 = combine(B1,B2)
Description B1 = combine(B1,B2) appends decision trees from ensemble B2 to those stored in B1 and returns ensemble B1. This method requires that the class and variable names be identical in both ensembles.
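A minimal sketch of a typical workflow (assuming X and Y are existing, compatible predictor and response data; the training options shown are illustrative only):
B1 = compact(TreeBagger(50,X,Y,Method="classification"));
B2 = compact(TreeBagger(50,X,Y,Method="classification"));
B1 = combine(B1,B2);    % B1 now stores the trees from both ensembles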
See Also append
combnk (Not recommended) Enumeration of combinations Note combnk is not recommended. Use the MATLAB® function nchoosek instead. For more information, see “Compatibility Considerations”.
Syntax C = combnk(v,k)
Description C = combnk(v,k) returns a matrix containing all possible combinations of the elements of vector v taken k at a time. Matrix C has k columns and n!/((n – k)! k!) rows, where n is the number of observations in v.
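For example (a small sketch), the row count matches the binomial coefficient:
C = combnk(1:5,3);
size(C,1)                                    % 10 rows
factorial(5)/(factorial(2)*factorial(3))     % n!/((n-k)!k!) is also 10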
Examples Combinations of Four Characters Create a character array of every four-letter combination of the characters in the word 'tendril'. C = combnk('tendril',4);
C is a 35-by-4 character array. Display the last five combinations in the list. last5 = C(31:35,:) last5 = 5x4 char array 'tedr' 'tenl' 'teni' 'tenr' 'tend'
Combinations of Elements from a Numeric Vector List all two-number combinations of the numbers one through four.
C = combnk(1:4,2)
C = 6×2
     3     4
     2     4
     2     3
     1     4
     1     3
     1     2
Because 1:4 is a vector of doubles, C is a matrix of doubles.
Input Arguments v — Set of all elements vector Set of all elements, specified as a vector. Example: [1 2 3 4 5] Example: 'abcd' Data Types: single | double | logical | char k — Number of selected choices nonnegative integer scalar Number of elements to select, specified as a nonnegative integer scalar. k can be any numeric type, but must be real. There are no restrictions on combining inputs of different types for combnk(v,k). Example: 3 Data Types: single | double
Output Arguments C — All combinations matrix All combinations of v, returned as a matrix of the same type as v. C has k columns and n!/((n – k)! k!) rows, where n is the number of observations in v. Each row of C contains a combination of k items selected from v. The elements in each row of C are listed in the same order as they appear in v. If k is larger than n, then C is an empty matrix.
Limitations combnk is practical only for situations where v has fewer than 15 observations.
Version History Introduced before R2006a
R2020b: combnk is not recommended Not recommended starting in R2020b combnk is not recommended. Use the MATLAB function nchoosek instead. There are no plans to remove combnk. To update your code, change instances of the function name combnk to nchoosek. You do not need to change the input arguments. For example, use C = nchoosek(v,k). The output C contains all possible combinations of the elements of vector v taken k at a time. Note that C from nchoosek can have a different order compared to the output from combnk. The nchoosek function has several advantages over the combnk function. • nchoosek also returns the binomial coefficient when the first input argument is a scalar value. • nchoosek has extended functionality using MATLAB Coder. • nchoosek is faster than combnk.
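A minimal sketch of the recommended replacement for the earlier examples:
C = nchoosek(1:4,2)     % all two-number combinations of 1:4 (row order can differ from combnk)
n = nchoosek(4,2)       % with scalar inputs, nchoosek returns the binomial coefficient 6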
See Also nchoosek | perms | randperm
compact Reduce size of machine learning model
Syntax CompactMdl = compact(Mdl)
Description CompactMdl = compact(Mdl) returns a compact model (CompactMdl), the compact version of the trained machine learning model Mdl. CompactMdl does not contain the training data, whereas Mdl contains the training data in its X and Y properties. Therefore, although you can predict class labels using CompactMdl, you cannot perform tasks such as cross-validation with the compact model.
Examples Reduce Size of Naive Bayes Classifier Reduce the size of a full naive Bayes classifier by removing the training data. Full naive Bayes classifiers hold the training data. You can use a compact naive Bayes classifier to improve memory efficiency. Load the ionosphere data set. Remove the first two predictors for stability. load ionosphere X = X(:,3:end);
Train a naive Bayes classifier using the predictors X and class labels Y. A recommended practice is to specify the class names. fitcnb assumes that each predictor is conditionally and normally distributed.
Mdl = fitcnb(X,Y,'ClassNames',{'b','g'})
Mdl = ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'b' 'g'}
            ScoreTransform: 'none'
           NumObservations: 351
         DistributionNames: {1x32 cell}
    DistributionParameters: {2x32 cell}
Mdl is a trained ClassificationNaiveBayes classifier. Reduce the size of the naive Bayes classifier.
CMdl = compact(Mdl) CMdl = CompactClassificationNaiveBayes ResponseName: 'Y' CategoricalPredictors: [] ClassNames: {'b' 'g'} ScoreTransform: 'none' DistributionNames: {1x32 cell} DistributionParameters: {2x32 cell}
CMdl is a trained CompactClassificationNaiveBayes classifier. Display the amount of memory used by each classifier.
whos('Mdl','CMdl')
  Name      Size     Bytes     Class
  CMdl      1x1      15229     classreg.learning.classif.CompactClassificationNaiveBayes
  Mdl       1x1     111359     ClassificationNaiveBayes
The full naive Bayes classifier (Mdl) is more than seven times larger than the compact naive Bayes classifier (CMdl). To label new observations efficiently, you can remove Mdl from the MATLAB® Workspace, and then pass CMdl and new predictor values to predict.
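A minimal sketch of that workflow, assuming CMdl and the predictor matrix X from this example:
clear Mdl                          % remove the full model from the workspace
labels = predict(CMdl,X(1:5,:))    % classify the first five observations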
Reduce Size of SVM Classifier Reduce the size of a full support vector machine (SVM) classifier by removing the training data. Full SVM classifiers (that is, ClassificationSVM classifiers) hold the training data. To improve efficiency, use a smaller classifier. Load the ionosphere data set. load ionosphere
Train an SVM classifier. Standardize the predictor data and specify the order of the classes.
SVMModel = fitcsvm(X,Y,'Standardize',true,...
    'ClassNames',{'b','g'})
SVMModel = ClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b' 'g'}
           ScoreTransform: 'none'
          NumObservations: 351
                    Alpha: [90x1 double]
                     Bias: -0.1343
         KernelParameters: [1x1 struct]
                       Mu: [0.8917 0 0.6413 0.0444 0.6011 0.1159 0.5501 0.1194 0.5118 0.1813 0.47 ...]
                    Sigma: [0.3112 0 0.4977 0.4414 0.5199 0.4608 0.4927 0.5207 0.5071 0.4839 0.56 ...]
           BoxConstraints: [351x1 double]
          ConvergenceInfo: [1x1 struct]
          IsSupportVector: [351x1 logical]
                   Solver: 'SMO'
SVMModel is a ClassificationSVM classifier. Reduce the size of the SVM classifier.
CompactSVMModel = compact(SVMModel)
CompactSVMModel = CompactClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b' 'g'}
           ScoreTransform: 'none'
                    Alpha: [90x1 double]
                     Bias: -0.1343
         KernelParameters: [1x1 struct]
                       Mu: [0.8917 0 0.6413 0.0444 0.6011 0.1159 0.5501 0.1194 0.5118 0.1813 0.47 ...]
                    Sigma: [0.3112 0 0.4977 0.4414 0.5199 0.4608 0.4927 0.5207 0.5071 0.4839 0.56 ...]
           SupportVectors: [90x34 double]
      SupportVectorLabels: [90x1 double]
CompactSVMModel is a CompactClassificationSVM classifier. Display the amount of memory used by each classifier.
whos('SVMModel','CompactSVMModel')
  Name               Size     Bytes     Class
  CompactSVMModel    1x1      31227     classreg.learning.classif.CompactClassificationSVM
  SVMModel           1x1     141317     ClassificationSVM
The full SVM classifier (SVMModel) is more than four times larger than the compact SVM classifier (CompactSVMModel). To label new observations efficiently, you can remove SVMModel from the MATLAB® Workspace, and then pass CompactSVMModel and new predictor values to predict. To further reduce the size of the compact SVM classifier, use the discardSupportVectors function to discard support vectors.
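A minimal sketch, assuming CompactSVMModel from this example (which uses the default linear kernel):
smallerSVMModel = discardSupportVectors(CompactSVMModel);
whos("CompactSVMModel","smallerSVMModel")   % compare the sizes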
Reduce Size of Generalized Additive Model Reduce the size of a full generalized additive model (GAM) for regression by removing the training data. Full models hold the training data. You can use a compact model to improve memory efficiency.
Load the carbig data set. load carbig
Specify Acceleration, Displacement, Horsepower, and Weight as the predictor variables (X) and MPG as the response variable (Y). X = [Acceleration,Displacement,Horsepower,Weight]; Y = MPG;
Train a GAM using X and Y.
Mdl = fitrgam(X,Y)
Mdl = RegressionGAM
               ResponseName: 'Y'
      CategoricalPredictors: []
          ResponseTransform: 'none'
                  Intercept: 26.9442
    IsStandardDeviationFit: 0
            NumObservations: 398
Mdl is a RegressionGAM model object. Reduce the size of the model.
CMdl = compact(Mdl)
CMdl = CompactRegressionGAM
               ResponseName: 'Y'
      CategoricalPredictors: []
          ResponseTransform: 'none'
                  Intercept: 26.9442
    IsStandardDeviationFit: 0
CMdl is a CompactRegressionGAM model object. Display the amount of memory used by each regression model.
whos('Mdl','CMdl')
  Name    Size    Bytes     Class                                           Attributes
  CMdl    1x1     578332    classreg.learning.regr.CompactRegressionGAM
  Mdl     1x1     612126    RegressionGAM
The full model (Mdl) is larger than the compact model (CMdl). To efficiently predict responses for new observations, you can remove Mdl from the MATLAB® Workspace, and then pass CMdl and new predictor values to predict.
Input Arguments Mdl — Machine learning model full regression model object | full classification model object Machine learning model, specified as a full regression or classification model object, as given in the following tables of supported models.
Regression Model Object
    Model                                      Full Regression Model Object
    Gaussian process regression (GPR) model    RegressionGP
    Generalized additive model (GAM)           RegressionGAM
    Neural network model                       RegressionNeuralNetwork
Classification Model Object
    Model                                                              Full Classification Model Object
    Generalized additive model                                         ClassificationGAM
    Naive Bayes model                                                  ClassificationNaiveBayes
    Neural network model                                               ClassificationNeuralNetwork
    Support vector machine for one-class and binary classification    ClassificationSVM
Output Arguments CompactMdl — Compact machine learning model compact regression model object | compact classification model object Compact machine learning model, returned as one of the compact model objects in the following tables, depending on the input model Mdl.
Regression Model Object
    Model                                      Full Model (Mdl)            Compact Model (CompactMdl)
    Gaussian process regression (GPR) model    RegressionGP                CompactRegressionGP
    Generalized additive model                 RegressionGAM               CompactRegressionGAM
    Neural network model                       RegressionNeuralNetwork     CompactRegressionNeuralNetwork
Classification Model Object
    Model                                                              Full Model (Mdl)               Compact Model (CompactMdl)
    Generalized additive model                                         ClassificationGAM              CompactClassificationGAM
    Naive Bayes model                                                  ClassificationNaiveBayes       CompactClassificationNaiveBayes
    Neural network model                                               ClassificationNeuralNetwork    CompactClassificationNeuralNetwork
    Support vector machine for one-class and binary classification    ClassificationSVM              CompactClassificationSVM
Version History Introduced in R2014a
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. Usage notes and limitations: • This function fully supports GPU arrays for a trained classification model specified as a ClassificationSVM object. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
compact Reduce size of discriminant analysis classifier
Syntax Cmdl = compact(mdl)
Description Cmdl = compact(mdl) returns a CompactClassificationDiscriminant version of the trained discriminant analysis classifier mdl. You can predict classifications using the CompactClassificationDiscriminant object Cmdl in the same way as when you use mdl. However, because Cmdl does not contain training data, you cannot perform some actions, such as cross-validation.
Examples Sizes of Full and Compact Versions of Discriminant Analysis Classifier Compare the size of the discriminant analysis classifier for Fisher's iris data to the compact version of the classifier.
load fisheriris
fullobj = fitcdiscr(meas,species);
cobj = compact(fullobj);
b = whos('fullobj'); % b.bytes = size of fullobj
c = whos('cobj');    % c.bytes = size of cobj
[b.bytes c.bytes]    % shows cobj uses 60% of the memory
ans = 18578       11498
Input Arguments mdl — Discriminant analysis classifier ClassificationDiscriminant model object Full discriminant analysis classifier, specified as a ClassificationDiscriminant model object trained with fitcdiscr.
Version History Introduced in R2011b
See Also Classes ClassificationDiscriminant | CompactClassificationDiscriminant Functions fitcdiscr Topics “Discriminant Analysis Classification” on page 21-2
compact Reduce size of multiclass error-correcting output codes (ECOC) model
Syntax CompactMdl = compact(Mdl)
Description CompactMdl = compact(Mdl) returns a compact multiclass error-correcting output codes (ECOC) model (CompactMdl), the compact version of the trained ECOC model Mdl. CompactMdl is a CompactClassificationECOC object. CompactMdl does not contain the training data, whereas Mdl contains the training data in its X and Y properties. Therefore, although you can predict class labels using CompactMdl, you cannot perform tasks such as cross-validation with the compact ECOC model.
Examples Reduce Size of Full ECOC Model Reduce the size of a full ECOC model by removing the training data. Full ECOC models (ClassificationECOC models) hold the training data. To improve efficiency, use a smaller classifier. Load Fisher's iris data set. Specify the predictor data X, the response data Y, and the order of the classes in Y. load fisheriris X = meas; Y = categorical(species); classOrder = unique(Y);
Train an ECOC model using SVM binary classifiers. Standardize the predictor data using an SVM template t, and specify the order of the classes. During training, the software uses default values for empty options in t. t = templateSVM('Standardize',true); Mdl = fitcecoc(X,Y,'Learners',t,'ClassNames',classOrder);
Mdl is a ClassificationECOC model. Reduce the size of the ECOC model.
CompactMdl = compact(Mdl)
CompactMdl = CompactClassificationECOC
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: [setosa    versicolor    virginica]
           ScoreTransform: 'none'
           BinaryLearners: {3x1 cell}
             CodingMatrix: [3x3 double]
CompactMdl is a CompactClassificationECOC model. CompactMdl does not store all of the properties that Mdl stores. In particular, it does not store the training data. Display the amount of memory each classifier uses.
whos('CompactMdl','Mdl')
  Name          Size    Bytes    Class
  CompactMdl    1x1     15792    classreg.learning.classif.CompactClassificationECOC
  Mdl           1x1     29207    ClassificationECOC
The full ECOC model (Mdl) is approximately double the size of the compact ECOC model (CompactMdl). To label new observations efficiently, you can remove Mdl from the MATLAB® Workspace, and then pass CompactMdl and new predictor values to predict.
Input Arguments Mdl — Full, trained multiclass ECOC model ClassificationECOC model Full, trained multiclass ECOC model, specified as a ClassificationECOC model trained with fitcecoc.
Version History Introduced in R2014b
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also ClassificationECOC | CompactClassificationECOC | fitcecoc | predict
compact Reduce size of classification ensemble model
Syntax cens = compact(ens)
Description cens = compact(ens) returns a CompactClassificationEnsemble version of the trained classification ensemble model ens. You can predict classifications using the CompactClassificationEnsemble object cens in the same way as when you use ens. However, because cens does not contain training data, you cannot perform certain tasks, such as cross-validation.
Input Arguments ens — Full classification ensemble model ClassificationEnsemble model object Full classification ensemble model, specified as a ClassificationEnsemble model object trained with fitcensemble.
Examples View Size of Compact Classification Ensemble Compare the size of a classification ensemble for the Fisher iris data to the compact version of the ensemble. Load the Fisher iris data set. load fisheriris
Train an ensemble of 100 boosted classification trees using AdaBoostM2. t = templateTree(MaxNumSplits=1); % Weak learner template tree object ens = fitcensemble(meas,species,"Method","AdaBoostM2","Learners",t);
Create a compact version of ens and compare ensemble sizes. cens = compact(ens); b = whos("ens"); % b.bytes = size of ens c = whos("cens"); % c.bytes = size of cens [b.bytes c.bytes] % Shows cens uses less memory ans = 1×2
      464631      423531
The compact version of the ensemble uses less memory than the full ensemble. Note that the ensemble sizes can vary slightly, depending on your operating system.
Version History Introduced in R2011a
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also ClassificationEnsemble | CompactClassificationEnsemble | fitcensemble Topics “Framework for Ensemble Learning” on page 19-34
compact Compact tree
Syntax ctree = compact(tree)
Description ctree = compact(tree) creates a compact version of tree.
Examples Create a Compact Classification Tree Compare the size of the classification tree for Fisher's iris data to the compact version of the tree.
load fisheriris
fulltree = fitctree(meas,species);
ctree = compact(fulltree);
b = whos('fulltree'); % b.bytes = size of fulltree
c = whos('ctree');    % c.bytes = size of ctree
[b.bytes c.bytes]     % shows ctree uses half the memory
ans = 1×2
       11931        5266
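You can then classify new observations with the compact tree. A minimal sketch, assuming ctree and meas from this example:
label = predict(ctree,meas(1,:))    % predicted class of the first observation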
Input Arguments tree — Classification tree ClassificationTree object Classification tree, specified as a ClassificationTree object. Use the fitctree function to create a classification tree object.
Output Arguments ctree — Compact decision tree CompactClassificationTree object Compact decision tree, returned as a CompactClassificationTree object. You can predict classifications using ctree exactly as you can using tree. However, since ctree does not contain training data, you cannot perform some actions, such as cross validation.
Version History Introduced in R2011a
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also CompactClassificationTree | ClassificationTree | fitctree | predict
compact Package: timeseries.forecaster Reduce size of direct forecasting model
Syntax CompactMdl = compact(Mdl)
Description CompactMdl = compact(Mdl) returns a compact model (compactMdl), the compact version of the trained direct forecasting model Mdl. CompactMdl does not contain the training data, whereas Mdl contains the training data in its X and Y properties. Therefore, although you can predict and forecast using CompactMdl, you cannot perform tasks such as cross-validation with the compact model.
Examples Reduce Size of Direct Forecasting Model Reduce the size of a full direct forecasting model by removing the training data from the model. You can use a compact model to improve memory efficiency. Load the sample file TemperatureData.csv, which contains average daily temperatures from January 2015 through July 2016. Read the file into a table. Observe the first eight observations in the table. temperatures = readtable("TemperatureData.csv"); head(temperatures) Year ____
Month ___________
Day ___
TemperatureF ____________
2015 2015 2015 2015 2015 2015 2015 2015
{'January'} {'January'} {'January'} {'January'} {'January'} {'January'} {'January'} {'January'}
1 2 3 4 5 6 7 8
23 31 25 39 29 12 10 4
For this example, use a subset of the temperature data that omits the first 100 observations. Tbl = temperatures(101:end,:);
Create a datetime variable t that contains the year, month, and day information for each observation in Tbl. Then, use t to convert Tbl into a timetable. 35-990
compact
numericMonth = month(datetime(Tbl.Month, ... InputFormat="MMMM")); t = datetime(Tbl.Year,numericMonth,Tbl.Day); Tbl.Time = t; Tbl = table2timetable(Tbl);
Plot the temperature values in Tbl over time. plot(Tbl.Time,Tbl.TemperatureF) xlabel("Date") ylabel("Temperature in Fahrenheit")
Create a full direct forecasting model by using the data in Tbl. Train the model using a decision tree learner. All three of the predictors (Year, Month, and Day) are leading predictors because their future values are known. To create new predictors by shifting the leading predictor and response variables backward in time, specify the leading predictor lags and the response variable lags. Mdl = directforecaster(Tbl,"TemperatureF", ... Learner="tree", ... LeadingPredictors="all",LeadingPredictorLags={0:1,0:1,0:7}, ... ResponseLags=1:7) Mdl = DirectForecaster Horizon: 1 ResponseLags: [1 2 3 4 5 6 7] LeadingPredictors: [1 2 3]
35-991
35
Functions
LeadingPredictorLags: ResponseName: PredictorNames: CategoricalPredictors: Learners: MaxLag: NumObservations:
{[0 1] [0 1] [0 1 2 3 4 5 6 7]} 'TemperatureF' {'Year' 'Month' 'Day'} 2 {[1x1 classreg.learning.regr.CompactRegressionTree]} 7 465
Mdl is a DirectForecaster object. By default, the horizon is one step ahead. That is, Mdl predicts a value that is one step into the future. Reduce the size of the model by using the compact object function. compactMdl = compact(Mdl) compactMdl = CompactDirectForecaster Horizon: ResponseLags: LeadingPredictors: LeadingPredictorLags: ResponseName: PredictorNames: CategoricalPredictors: Learners: MaxLag:
1 [1 2 3 4 5 6 7] [1 2 3] {[0 1] [0 1] [0 1 2 3 4 5 6 7]} 'TemperatureF' {'Year' 'Month' 'Day'} 2 {[1x1 classreg.learning.regr.CompactRegressionTree]} 7
compactMdl is a CompactDirectForecaster model object. compactMdl contains fewer properties than the full model Mdl. Display the amount of memory used by each direct forecasting model. whos("Mdl","compactMdl") Name
Size
Mdl compactMdl
1x1 1x1
Bytes 110401 42541
Class timeseries.forecaster.DirectForecaster timeseries.forecaster.CompactDirectForecaster
The full model is larger than the compact model.
Input Arguments Mdl — Direct forecasting model DirectForecaster model object Direct forecasting model, specified as a DirectForecaster model object.
35-992
Attrib
compact
Output Arguments CompactMdl — Compact direct forecasting model CompactDirectForecaster model object Compact direct forecasting model, returned as a CompactDirectForecaster model object.
Version History Introduced in R2023b
See Also DirectForecaster | CompactDirectForecaster
35-993
35
Functions
compact Compact linear regression model
Syntax compactMdl = compact(mdl)
Description compactMdl = compact(mdl) returns the compact linear regression model compactMdl, which is the compact version of the full, fitted linear regression model mdl.
Examples Compact Linear Regression Model Fit a linear regression model to data and reduce the size of a full, fitted linear regression model by discarding the sample data and some information related to the fitting process. Load the largedata4reg data set, which contains 15,000 observations and 45 predictor variables. load largedata4reg
Fit a linear regression model to the data. mdl = fitlm(X,Y);
Compact the model. compactMdl = compact(mdl);
The compact model discards the original sample data and some information related to the fitting process. Compare the size of the full model mdl and the compact model compactMdl. vars = whos('compactMdl','mdl'); [vars(1).bytes,vars(2).bytes] ans = 1×2 81538
11409065
The compact model consumes less memory than the full model.
Input Arguments mdl — Linear regression model LinearModel object 35-994
compact
Linear regression model, specified as a LinearModel object created using fitlm or stepwiselm.
Output Arguments compactMdl — Compact linear regression model CompactLinearModel object Compact linear regression model, returned as a CompactLinearModel object. A CompactLinearModel object consumes less memory than a LinearModel object because a compact model does not store the input data used to fit the model or information related to the fitting process. You can still use a compact model to predict responses using new input data, but some LinearModel object functions do not work with a compact model.
Version History Introduced in R2016a
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also CompactLinearModel | LinearModel | fitlm | stepwiselm Topics “Linear Regression Workflow” on page 11-35 “Interpret Linear Regression Results” on page 11-52 “Linear Regression” on page 11-9
35
Functions
compact Compact generalized linear regression model
Syntax compactMdl = compact(mdl)
Description compactMdl = compact(mdl) returns the compact generalized linear regression model compactMdl, which is the compact version of the full, fitted generalized linear regression model mdl.
Examples Compact Generalized Linear Regression Model Fit a generalized linear regression model to data and reduce the size of a full, fitted model by discarding the sample data and some information related to the fitting process. Load the largedata4reg data set, which contains 15,000 observations and 45 predictor variables. load largedata4reg
Fit a generalized linear regression model to the data using the first 15 predictor variables. mdl = fitglm(X(:,1:15),Y);
Compact the model. compactMdl = compact(mdl);
The compact model discards the original sample data and some information related to the fitting process, so it uses less memory than the full model. Compare the size of the full model mdl and the compact model compactMdl.
vars = whos('compactMdl','mdl');
[vars(1).bytes,vars(2).bytes]
ans = 1×2
       15518     4382501
The compact model consumes less memory than the full model.
Input Arguments mdl — Generalized linear regression model GeneralizedLinearModel object 35-996
compact
Generalized linear regression model, specified as a GeneralizedLinearModel object created using fitglm or stepwiseglm.
Output Arguments compactMdl — Compact generalized linear regression model CompactGeneralizedLinearModel object Compact generalized linear regression model, returned as a CompactGeneralizedLinearModel object. A CompactGeneralizedLinearModel object consumes less memory than a GeneralizedLinearModel object because a compact model does not store the input data used to fit the model or information related to the fitting process. You can still use a compact model to predict responses using new input data, but some GeneralizedLinearModel object functions that require the input data do not work with a compact model.
Version History Introduced in R2016b
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also CompactGeneralizedLinearModel | GeneralizedLinearModel | fitglm | stepwiseglm
35
Functions
compact Reduce size of regression ensemble model
Syntax cens = compact(ens)
Description cens = compact(ens) returns a CompactRegressionEnsemble version of the trained regression ensemble model ens. You can predict regressions using the CompactRegressionEnsemble object cens in the same way as when you use ens. However, because cens does not contain training data, you cannot perform some actions, such as cross-validation.
Input Arguments ens — Full regression ensemble model RegressionEnsemble model object Full regression ensemble model, specified as a RegressionEnsemble model object trained with fitrensemble.
Examples View Size of Compact Regression Ensemble Compare the size of a regression ensemble for the carsmall data to the size of the compact version of the ensemble. Load the carsmall data set and select acceleration, number of cylinders, displacement, horsepower, and vehicle weight as predictors. load carsmall X = [Acceleration Cylinders Displacement Horsepower Weight];
Train an ensemble of regression trees. ens = fitrensemble(X,MPG);
Create a compact version of ens and compare ensemble sizes. cens = compact(ens); b = whos("ens"); c = whos("cens"); [b.bytes c.bytes] % b.bytes = size of ens and c.bytes = size of cens ans = 1×2
35-998
compact
501081
468548
The compact ensemble uses less memory.
Version History Introduced in R2011a
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also RegressionEnsemble | CompactRegressionEnsemble | fitrensemble
35
Functions
compact Class: RegressionSVM Compact support vector machine regression model
Syntax compactMdl = compact(mdl)
Description compactMdl = compact(mdl) returns a compact support vector machine (SVM) regression model, compactMdl, which is the compact version of the full, trained SVM regression model mdl. compactMdl does not contain the training data, whereas mdl contains the training data in its properties mdl.X and mdl.Y.
Input Arguments mdl — Full, trained SVM regression model RegressionSVM model Full, trained SVM regression model, specified as a RegressionSVM model returned by fitrsvm.
Output Arguments compactMdl — Compact SVM regression model CompactRegressionSVM model Compact SVM regression model, returned as a CompactRegressionSVM model. Predict response values using compactMdl exactly as you would using mdl. However, since compactMdl does not contain training data, you cannot perform certain tasks, such as cross validation.
Examples Compact an SVM Regression Model This example shows how to reduce the size of a full, trained SVM regression model by discarding the training data and some information related to the training process. This example uses the abalone data from the UCI Machine Learning Repository. Download the data and save it in your current directory with the name 'abalone.data'. Read the data into a table. tbl = readtable('abalone.data','Filetype','text','ReadVariableNames',false); rng default % for reproducibility
The sample data contains 4177 observations. All of the predictor variables are continuous except for sex, which is a categorical variable with possible values 'M' (for males), 'F' (for females), and 'I' (for infants). The goal is to predict the number of rings on the abalone, and thereby determine its age, using physical measurements. Train an SVM regression model using a Gaussian kernel function and an automatic kernel scale. Standardize the data. mdl = fitrsvm(tbl,'Var9','KernelFunction','gaussian','KernelScale','auto','Standardize',true) mdl = RegressionSVM PredictorNames: ResponseName: CategoricalPredictors: ResponseTransform: Alpha: Bias: KernelParameters: Mu: Sigma: NumObservations: BoxConstraints: ConvergenceInfo: IsSupportVector: Solver:
{1x8 cell} 'Var9' 1 'none' [3635x1 double] 10.8144 [1x1 struct] [1x10 double] [1x10 double] 4177 [4177x1 double] [1x1 struct] [4177x1 logical] 'SMO'
Properties, Methods
Compact the model. compactMdl = compact(mdl) compactMdl = classreg.learning.regr.CompactRegressionSVM PredictorNames: {1x8 cell} ResponseName: 'Var9' CategoricalPredictors: 1 ResponseTransform: 'none' Alpha: [3635x1 double] Bias: 10.8144 KernelParameters: [1x1 struct] Mu: [1x10 double] Sigma: [1x10 double] SupportVectors: [3635x10 double] Properties, Methods
The compacted model discards the training data and some information related to the training process. Compare the size of the full model mdl and the compact model compactMdl.
vars = whos('compactMdl','mdl');
[vars(1).bytes,vars(2).bytes]
ans = 323793      775968
The compacted model consumes about half the memory of the full model.
Reduce Memory Consumption of SVM Regression Model This example shows how to reduce the memory consumption of a full, trained SVM regression model by compacting the model and discarding the support vectors. Load the carsmall sample data. load carsmall rng default % for reproducibility
Train a linear SVM regression model using Weight as the predictor variable and MPG as the response variable. Standardize the data. mdl = fitrsvm(Weight,MPG,'Standardize',true);
Note that MPG contains several NaN values. When training a model, fitrsvm will remove rows that contain NaN values from both the predictor and response data. As a result, the trained model uses only 94 of the 100 total observations contained in the sample data. Compact the regression model to discard the training data and some information related to the training process. compactMdl = compact(mdl);
compactMdl is a CompactRegressionSVM model that has the same parameters, support vectors, and related estimates as mdl, but no longer stores the training data. Discard the support vectors and related estimates for the compacted model. mdlOut = discardSupportVectors(compactMdl);
mdlOut is a CompactRegressionSVM model that has the same parameters as mdl and compactMdl, but no longer stores the support vectors and related estimates. Compare the sizes of the three SVM regression models, compactMdl, mdl, and mdlOut.
vars = whos('compactMdl','mdl','mdlOut');
[vars(1).bytes,vars(2).bytes,vars(3).bytes]
ans = 3601       13727        2305
The compacted model compactMdl consumes 3601 bytes of memory, while the full model mdl consumes 13727 bytes of memory. The model mdlOut, which also discards the support vectors, consumes 2305 bytes of memory.
Version History Introduced in R2015b R2023a: GPU array support Starting in R2023a, compact fully supports GPU arrays.
References [1] Nash, W.J., T. L. Sellers, S. R. Talbot, A. J. Cawthorn, and W. B. Ford. "The Population Biology of Abalone (Haliotis species) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait." Sea Fisheries Division, Technical Report No. 48, 1994. [2] Waugh, S. "Extending and Benchmarking Cascade-Correlation: Extensions to the CascadeCorrelation Architecture and Benchmarking of Feed-forward Supervised Artificial Neural Networks." University of Tasmania Department of Computer Science thesis, 1995. [3] Clark, D., Z. Schreter, A. Adams. "A Quantitative Comparison of Dystal and Backpropagation." submitted to the Australian Conference on Neural Networks, 1996. [4] Lichman, M. UCI Machine Learning Repository, [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also fitrsvm | RegressionSVM | CompactRegressionSVM
compact Compact regression tree
Syntax ctree = compact(tree)
Description ctree = compact(tree) creates a compact version of regression tree tree.
Examples

Reduce Memory Consumption of Regression Tree Model

Compare the size of a full regression tree model to the compacted model.

Load the carsmall data set. Consider Acceleration, Cylinders, Displacement, Horsepower, and Weight as predictor variables.

load carsmall
X = [Acceleration Cylinders Displacement Horsepower Weight];
Grow a regression tree using the entire data set.

Mdl = fitrtree(X,MPG)

Mdl = 
  RegressionTree
             ResponseName: 'Y'
    CategoricalPredictors: []
        ResponseTransform: 'none'
          NumObservations: 94
Mdl is a RegressionTree model. It is a full model, that is, it stores information such as the predictor and response data fitrtree used in training. For a properties list of full regression tree models, see RegressionTree. Create a compact version of the full regression tree—that is, one that contains enough information to make predictions only. CMdl = compact(Mdl) CMdl = CompactRegressionTree ResponseName: 'Y' CategoricalPredictors: [] ResponseTransform: 'none'
CMdl is a CompactRegressionTree model. For a properties list of compact regression tree models, see CompactRegressionTree.

Inspect the amounts of memory that the full and compact regression trees consume.

mdlInfo = whos('Mdl');
cMdlInfo = whos('CMdl');
[mdlInfo.bytes cMdlInfo.bytes]

ans = 1×2

       12570        7067
cMdlInfo.bytes/mdlInfo.bytes ans = 0.5622
In this case, the compact regression tree model uses approximately half the memory that the full model uses.
Input Arguments tree — Regression tree RegressionTree object Regression tree, specified as a RegressionTree object created by the fitrtree function.
Output Arguments ctree — Compact regression tree CompactRegressionTree object Compact regression tree, returned as a CompactRegressionTree object. You can predict regressions using ctree exactly as you can using tree. However, because ctree does not contain training data, you cannot perform some actions, such as cross validation.
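For instance, the following minimal sketch (assuming the full tree Mdl, the compact tree CMdl, and the predictor matrix X from the example above) contrasts what the two objects support:

yfit  = predict(CMdl,X(1:3,:));   % prediction works for the compact tree
cvMdl = crossval(Mdl);            % cross-validation requires the full tree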
Version History Introduced in R2011a
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also CompactRegressionTree | RegressionTree | predict | fitrtree
compact Compact ensemble of decision trees
Description CMdl = compact(Mdl) creates a compact version of Mdl, a TreeBagger model object. You can predict regressions using CMdl exactly as you can using Mdl. However, since CMdl does not contain training data, you cannot perform some actions, such as make out-of-bag predictions using oobPredict.
Input Arguments Mdl A regression ensemble created with TreeBagger.
Output Arguments CMdl A compact regression ensemble. CMdl is of class CompactTreeBagger.
Examples Reduce Size of Ensemble of Bagged Trees Reduce the size of a full ensemble of bagged classification trees by removing the training data and parameters. Then, use the compact ensemble object to make predictions on new data. Using a compact ensemble improves memory efficiency. Load the ionosphere data set. load ionosphere
Set the random number generator to default for reproducibility. rng("default")
Train an ensemble of 100 bagged classification trees using the entire data set. By default, TreeBagger grows deep trees. Mdl = TreeBagger(100,X,Y,... Method="classification");
Mdl is a TreeBagger ensemble for classification trees. Create a compact version of Mdl. CMdl = compact(Mdl)
CMdl = 
  CompactTreeBagger
Ensemble with 100 bagged decision trees:
              Method: classification
       NumPredictors: 34
          ClassNames: 'b' 'g'
CMdl is a CompactTreeBagger ensemble for classification trees.

Display the amount of memory used by each ensemble.

whos("Mdl","CMdl")

  Name      Size             Bytes  Class                Attributes

  CMdl      1x1             993836  CompactTreeBagger              
  Mdl       1x1            1132811  TreeBagger                     
Mdl takes up more space than CMdl. The CMdl.Trees property is a 100-by-1 cell vector that contains the trained classification trees for the ensemble. Each tree is a CompactClassificationTree object. View the graphical display of the first trained classification tree. view(CMdl.Trees{1},Mode="graph");
Predict the label of the mean of X by using the compact ensemble. predMeanX = predict(CMdl,mean(X)) predMeanX = 1x1 cell array {'g'}
See Also error | CompactTreeBagger | predict Topics “Bootstrap Aggregation (Bagging) of Regression Trees Using TreeBagger” on page 19-115 “Bootstrap Aggregation (Bagging) of Classification Trees Using TreeBagger” on page 19-126
CompactClassificationDiscriminant Package: classreg.learning.classif Compact discriminant analysis class
Description A CompactClassificationDiscriminant object is a compact version of a discriminant analysis classifier. The compact version does not include the data for training the classifier. Therefore, you cannot perform some tasks with a compact classifier, such as cross validation. Use a compact classifier for making predictions (classifications) of new data.
Construction cobj = compact(obj) constructs a compact classifier from a full classifier. cobj = makecdiscr(Mu,Sigma) constructs a compact discriminant analysis classifier from the class means Mu and covariance matrix Sigma. For syntax details, see makecdiscr. Input Arguments obj Discriminant analysis classifier, created using fitcdiscr.
Properties BetweenSigma p-by-p matrix, the between-class covariance, where p is the number of predictors. CategoricalPredictors Categorical predictor indices, which is always empty ([]) . ClassNames List of the elements in the training data Y with duplicates removed. ClassNames can be a categorical array, cell array of character vectors, character array, logical vector, or a numeric vector. ClassNames has the same data type as the data in the argument Y. (The software treats string arrays as cell arrays of character vectors.) Coeffs k-by-k structure of coefficient matrices, where k is the number of classes. Coeffs(i,j) contains coefficients of the linear or quadratic boundaries between classes i and j. Fields in Coeffs(i,j): • DiscrimType • Class1 — ClassNames(i) 35-1009
• Class2 — ClassNames(j) • Const — A scalar • Linear — A vector with p components, where p is the number of columns in X • Quadratic — p-by-p matrix, exists for quadratic DiscrimType The equation of the boundary between class i and class j is Const + Linear * x + x' * Quadratic * x = 0, where x is a column vector of length p. If fitcdiscr had the FillCoeffs name-value pair set to 'off' when constructing the classifier, Coeffs is empty ([]). Cost Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (i.e., the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response. Change a Cost matrix using dot notation: obj.Cost = costMatrix. Delta Value of the Delta threshold for a linear discriminant model, a nonnegative scalar. If a coefficient of obj has magnitude smaller than Delta, obj sets this coefficient to 0, and so you can eliminate the corresponding predictor from the model. Set Delta to a higher value to eliminate more predictors. Delta must be 0 for quadratic discriminant models. Change Delta using dot notation: obj.Delta = newDelta. DeltaPredictor Row vector of length equal to the number of predictors in obj. If DeltaPredictor(i) < Delta then coefficient i of the model is 0. If obj is a quadratic discriminant model, all elements of DeltaPredictor are 0. DiscrimType Character vector specifying the discriminant type. One of: • 'linear' • 'quadratic' • 'diagLinear' • 'diagQuadratic' • 'pseudoLinear' • 'pseudoQuadratic' Change DiscrimType using dot notation: obj.DiscrimType = newDiscrimType. 35-1010
You can change between linear types, or between quadratic types, but cannot change between linear and quadratic types. Gamma Value of the Gamma regularization parameter, a scalar from 0 to 1. Change Gamma using dot notation: obj.Gamma = newGamma. • If you set 1 for linear discriminant, the discriminant sets its type to 'diagLinear'. • If you set a value between MinGamma and 1 for linear discriminant, the discriminant sets its type to 'linear'. • You cannot set values below the value of the MinGamma property. • For quadratic discriminant, you can set either 0 (for DiscrimType 'quadratic') or 1 (for DiscrimType 'diagQuadratic'). LogDetSigma Logarithm of the determinant of the within-class covariance matrix. The type of LogDetSigma depends on the discriminant type: • Scalar for linear discriminant analysis • Vector of length K for quadratic discriminant analysis, where K is the number of classes MinGamma Nonnegative scalar, the minimal value of the Gamma parameter so that the correlation matrix is invertible. If the correlation matrix is not singular, MinGamma is 0. Mu Class means, specified as a K-by-p matrix of scalar values class means of size. K is the number of classes, and p is the number of predictors. Each row of Mu represents the mean of the multivariate normal distribution of the corresponding class. The class indices are in the ClassNames attribute. PredictorNames Cell array of names for the predictor variables, in the order in which they appear in the training data X. Prior Numeric vector of prior probabilities for each class. The order of the elements of Prior corresponds to the order of the classes in ClassNames. Add or change a Prior vector using dot notation: obj.Prior = priorVector. ResponseName Character vector describing the response variable Y. ScoreTransform Character vector representing a built-in transformation function, or a function handle for transforming scores. 'none' means no transformation; equivalently, 'none' means @(x)x. For a list 35-1011
of built-in transformation functions and the syntax of custom transformation functions, see fitcdiscr. Implement dot notation to add or change a ScoreTransform function using one of the following: • cobj.ScoreTransform = 'function' • cobj.ScoreTransform = @function Sigma Within-class covariance matrix or matrices. The dimensions depend on DiscrimType: • 'linear' (default) — Matrix of size p-by-p, where p is the number of predictors • 'quadratic' — Array of size p-by-p-by-K, where K is the number of classes • 'diagLinear' — Row vector of length p • 'diagQuadratic' — Array of size 1-by-p-by-K • 'pseudoLinear' — Matrix of size p-by-p • 'pseudoQuadratic' — Array of size p-by-p-by-K
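As a quick illustration, the following sketch (assuming the compact classifier cobj constructed in the Examples section below) inspects a few of these properties with dot notation:

cobj.DiscrimType   % 'linear' by default
size(cobj.Mu)      % K-by-p matrix of class means
size(cobj.Sigma)   % p-by-p for a linear discriminant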
Object Functions

compareHoldout          Compare accuracies of two classification models using new data
edge                    Classification edge for discriminant analysis classifier
lime                    Local interpretable model-agnostic explanations (LIME)
logp                    Log unconditional probability density for discriminant analysis classifier
loss                    Classification error for discriminant analysis classifier
mahal                   Mahalanobis distance to class means of discriminant analysis classifier
margin                  Classification margins for discriminant analysis classifier
nLinearCoeffs           Number of nonzero linear coefficients in discriminant analysis classifier
partialDependence       Compute partial dependence
plotPartialDependence   Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
predict                 Predict labels using discriminant analysis classifier
shapley                 Shapley values
Copy Semantics Value. To learn how value classes affect copy operations, see Copying Objects.
Examples Construct a Compact Discriminant Analysis Classifier Load the sample data. load fisheriris
Construct a discriminant analysis classifier for the sample data. fullobj = fitcdiscr(meas,species);
Construct a compact discriminant analysis classifier, and compare its size to that of the full classifier.

cobj = compact(fullobj);
b = whos('fullobj'); % b.bytes = size of fullobj
c = whos('cobj');    % c.bytes = size of cobj
[b.bytes c.bytes]    % shows cobj uses about 60% of the memory

ans = 1×2

       18610       11847
The compact classifier is smaller than the full classifier.
Construct Classifier Using Means and Covariances

Construct a compact discriminant analysis classifier from the means and covariances of the Fisher iris data.

load fisheriris
mu(1,:) = mean(meas(1:50,:));
mu(2,:) = mean(meas(51:100,:));
mu(3,:) = mean(meas(101:150,:));

mm1 = repmat(mu(1,:),50,1);
mm2 = repmat(mu(2,:),50,1);
mm3 = repmat(mu(3,:),50,1);
cc = meas;
cc(1:50,:) = cc(1:50,:) - mm1;
cc(51:100,:) = cc(51:100,:) - mm2;
cc(101:150,:) = cc(101:150,:) - mm3;
sigstar = cc' * cc / 147;
cpct = makecdiscr(mu,sigstar,...
    'ClassNames',{'setosa','versicolor','virginica'});
More About

Discriminant Classification

The model for discriminant analysis is:
• Each class (Y) generates data (X) using a multivariate normal distribution. That is, the model assumes X has a Gaussian mixture distribution (gmdistribution).
• For linear discriminant analysis, the model has the same covariance matrix for each class; only the means vary.
• For quadratic discriminant analysis, both means and covariances of each class vary.

predict classifies so as to minimize the expected classification cost:

\hat{y} = \arg\min_{y = 1, \dots, K} \sum_{k = 1}^{K} \hat{P}(k \mid x) \, C(y \mid k),

where
• \hat{y} is the predicted classification.
• K is the number of classes.
• \hat{P}(k \mid x) is the posterior probability on page 21-6 of class k for observation x.
• C(y \mid k) is the cost on page 21-7 of classifying an observation as y when its true class is k.

For details, see "Prediction Using Discriminant Analysis Models" on page 21-6.

Regularization

Regularization is the process of finding a small set of predictors that yield an effective predictive model. For linear discriminant analysis, there are two parameters, γ and δ, that control regularization as follows. cvshrink helps you select appropriate values of the parameters.

Let Σ represent the covariance matrix of the data X, and let \hat{X} be the centered data (the data X minus the mean by class). Define

D = \operatorname{diag}\left( \hat{X}^{T} \hat{X} \right).

The regularized covariance matrix \tilde{\Sigma} is

\tilde{\Sigma} = (1 - \gamma)\,\Sigma + \gamma D.

Whenever γ ≥ MinGamma, \tilde{\Sigma} is nonsingular.

Let μk be the mean vector for those elements of X in class k, and let μ0 be the global mean vector (the mean of the rows of X). Let C be the correlation matrix of the data X, and let \tilde{C} be the regularized correlation matrix:

\tilde{C} = (1 - \gamma)\,C + \gamma I,

where I is the identity matrix.

The linear term in the regularized discriminant analysis classifier for a data point x is

(x - \mu_0)^{T} \tilde{\Sigma}^{-1} (\mu_k - \mu_0) = \left[ (x - \mu_0)^{T} D^{-1/2} \right] \left[ \tilde{C}^{-1} D^{-1/2} (\mu_k - \mu_0) \right].

The parameter δ enters into this equation as a threshold on the final term in square brackets. Each component of the vector \tilde{C}^{-1} D^{-1/2} (\mu_k - \mu_0) is set to zero if it is smaller in magnitude than the threshold δ. Therefore, for class k, if component j is thresholded to zero, component j of x does not enter into the evaluation of the posterior probability.

The DeltaPredictor property is a vector related to this threshold. When δ ≥ DeltaPredictor(i), all classes k have

\left| \tilde{C}^{-1} D^{-1/2} (\mu_k - \mu_0) \right| \le \delta.
Therefore, when δ ≥ DeltaPredictor(i), the regularized classifier does not use predictor i. 35-1014
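In practice, you can choose γ and δ by cross-validation with cvshrink. The following is a minimal sketch, not part of the original text; the grid sizes are arbitrary illustrative choices, and cvshrink requires the full (non-compact) classifier.

load fisheriris
obj = fitcdiscr(meas,species);
[err,gamma,delta,numpred] = cvshrink(obj,'NumGamma',9,'NumDelta',9);
[~,idx] = min(err(:));            % lowest cross-validated error
[i,j] = ind2sub(size(err),idx);
obj.Gamma = gamma(i);             % apply the selected regularization values
obj.Delta = delta(i,j);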
Version History Introduced in R2011b
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: • The predict function supports code generation. • When you train a discriminant analysis model by using fitcdiscr or create a compact discriminant analysis model by using makecdiscr, the value of the 'ScoreTransform' namevalue pair argument cannot be an anonymous function. For more information, see “Introduction to Code Generation” on page 34-3.
See Also ClassificationDiscriminant | compact | makecdiscr | fitcdiscr | predict | compareHoldout Topics “Discriminant Analysis Classification” on page 21-2
CompactClassificationECOC Compact multiclass model for support vector machines (SVMs) and other classifiers
Description CompactClassificationECOC is a compact version of the multiclass error-correcting output codes (ECOC) model. The compact classifier does not include the data used for training the multiclass ECOC model. Therefore, you cannot perform certain tasks, such as cross-validation, using the compact classifier. Use a compact multiclass ECOC model for tasks such as classifying new data (predict).
Creation You can create a CompactClassificationECOC model in two ways: • Create a compact ECOC model from a trained ClassificationECOC model by using the compact object function. • Create a compact ECOC model by using the fitcecoc function and specifying the 'Learners' name-value pair argument as 'linear', 'kernel', a templateLinear or templateKernel object, or a cell array of such objects.
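As a minimal sketch of the second approach (X and Y below are placeholders for predictor data and class labels, not variables defined on this page):

t = templateLinear();              % linear classification binary learners
Mdl = fitcecoc(X,Y,'Learners',t);  % Mdl is a CompactClassificationECOC model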
Properties After you create a CompactClassificationECOC model object, you can use dot notation to access its properties. For an example, see “Train and Cross-Validate ECOC Classifier” on page 35-1021. ECOC Properties BinaryLearners — Trained binary learners cell vector of model objects Trained binary learners, specified as a cell vector of model objects. The number of binary learners depends on the number of classes in Y and the coding design. The software trains BinaryLearner{j} according to the binary problem specified by CodingMatrix(:,j). For example, for multiclass learning using SVM learners, each element of BinaryLearners is a CompactClassificationSVM classifier. Data Types: cell BinaryLoss — Binary learner loss function 'binodeviance' | 'exponential' | 'hamming' | 'hinge' | 'linear' | 'logit' | 'quadratic' Binary learner loss function, specified as a character vector representing the loss function name. This table identifies the default BinaryLoss value, which depends on the score ranges returned by the binary learners. 35-1016
• All binary learners are any of the following: classification decision trees, discriminant analysis models, k-nearest neighbor models, linear or kernel classification models of logistic regression learners, or naive Bayes models. Default value: 'quadratic'
• All binary learners are SVMs or linear or kernel classification models of SVM learners. Default value: 'hinge'
• All binary learners are ensembles trained by AdaBoostM1 or GentleBoost. Default value: 'exponential'
• All binary learners are ensembles trained by LogitBoost. Default value: 'binodeviance'
• You specify to predict class posterior probabilities by setting 'FitPosterior',true in fitcecoc. Default value: 'quadratic'
• Binary learners are heterogeneous and use different loss functions. Default value: 'hamming'

To check the default value, use dot notation to display the BinaryLoss property of the trained model at the command line.

To potentially increase accuracy, specify a binary loss function other than the default during a prediction or loss computation by using the BinaryLoss name-value argument of predict or loss. For more information, see "Binary Loss" on page 35-6362.

Data Types: char

CodingMatrix — Class assignment codes
numeric matrix

Class assignment codes for the binary learners, specified as a numeric matrix. CodingMatrix is a K-by-L matrix, where K is the number of classes and L is the number of binary learners. The elements of CodingMatrix are –1, 0, and 1, and the values correspond to dichotomous class assignments. Learner j assigns observations in class i to a dichotomous class corresponding to the value of CodingMatrix(i,j):
• A value of –1 means that learner j assigns observations in class i to a negative class.
• A value of 0 means that, before training, learner j removes observations in class i from the data set.
• A value of 1 means that learner j assigns observations in class i to a positive class.
Data Types: double | single | int8 | int16 | int32 | int64 LearnerWeights — Binary learner weights numeric row vector Binary learner weights, specified as a numeric row vector. The length of LearnerWeights is equal to the number of binary learners (length(Mdl.BinaryLearners)). 35-1017
LearnerWeights(j) is the sum of the observation weights that binary learner j uses to train its classifier. The software uses LearnerWeights to fit posterior probabilities by minimizing the Kullback-Leibler divergence. The software ignores LearnerWeights when it uses the quadratic programming method of estimating posterior probabilities. Data Types: double | single Other Classification Properties CategoricalPredictors — Categorical predictor indices vector of positive integers | [] Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]). Data Types: single | double ClassNames — Unique class labels categorical array | character array | logical vector | numeric vector | cell array of character vectors Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as the class labels Y. (The software treats string arrays as cell arrays of character vectors.) ClassNames also determines the class order. Data Types: categorical | char | logical | single | double | cell Cost — Misclassification costs square numeric matrix This property is read-only. Misclassification costs, specified as a square numeric matrix. Cost has K rows and columns, where K is the number of classes. Cost(i,j) is the cost of classifying a point into class j if its true class is i. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. Data Types: double PredictorNames — Predictor names cell array of character vectors Predictor names in order of their appearance in the predictor data, specified as a cell array of character vectors. The length of PredictorNames is equal to the number of variables in the training data X or Tbl used as predictor variables. Data Types: cell ExpandedPredictorNames — Expanded predictor names cell array of character vectors Expanded predictor names, specified as a cell array of character vectors. 35-1018
If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames. Data Types: cell Prior — Prior class probabilities numeric vector This property is read-only. Prior class probabilities, specified as a numeric vector. Prior has as many elements as the number of classes in ClassNames, and the order of the elements corresponds to the order of the classes in ClassNames. fitcecoc incorporates misclassification costs differently among different types of binary learners. Data Types: double ResponseName — Response variable name character vector Response variable name, specified as a character vector. Data Types: char ScoreTransform — Score transformation function to apply to predicted scores 'none' This property is read-only. Score transformation function to apply to the predicted scores, specified as 'none'. An ECOC model does not support score transformation.
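These properties are all available through dot notation. A minimal sketch, using the compact model CompactMdl constructed in the first example below; the displayed values depend on the trained model.

CompactMdl.BinaryLoss        % default binary loss, for example 'hinge' for SVM learners
CompactMdl.CodingMatrix      % K-by-L matrix with elements -1, 0, and 1
CompactMdl.BinaryLearners{1} % first trained binary learner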
Object Functions

compareHoldout          Compare accuracies of two classification models using new data
discardSupportVectors   Discard support vectors of linear SVM binary learners in ECOC model
edge                    Classification edge for multiclass error-correcting output codes (ECOC) model
gather                  Gather properties of Statistics and Machine Learning Toolbox object from GPU
incrementalLearner      Convert multiclass error-correcting output codes (ECOC) model to incremental learner
lime                    Local interpretable model-agnostic explanations (LIME)
loss                    Classification loss for multiclass error-correcting output codes (ECOC) model
margin                  Classification margins for multiclass error-correcting output codes (ECOC) model
partialDependence       Compute partial dependence
plotPartialDependence   Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
predict                 Classify observations using multiclass error-correcting output codes (ECOC) model
shapley                 Shapley values
selectModels            Choose subset of multiclass ECOC models composed of binary ClassificationLinear learners
update                  Update model parameters for code generation
Examples Reduce Size of Full ECOC Model Reduce the size of a full ECOC model by removing the training data. Full ECOC models (ClassificationECOC models) hold the training data. To improve efficiency, use a smaller classifier. Load Fisher's iris data set. Specify the predictor data X, the response data Y, and the order of the classes in Y. load fisheriris X = meas; Y = categorical(species); classOrder = unique(Y);
Train an ECOC model using SVM binary classifiers. Standardize the predictor data using an SVM template t, and specify the order of the classes. During training, the software uses default values for empty options in t. t = templateSVM('Standardize',true); Mdl = fitcecoc(X,Y,'Learners',t,'ClassNames',classOrder);
Mdl is a ClassificationECOC model. Reduce the size of the ECOC model. CompactMdl = compact(Mdl) CompactMdl = CompactClassificationECOC ResponseName: 'Y' CategoricalPredictors: [] ClassNames: [setosa versicolor ScoreTransform: 'none' BinaryLearners: {3x1 cell} CodingMatrix: [3x3 double]
virginica]
CompactMdl is a CompactClassificationECOC model. CompactMdl does not store all of the properties that Mdl stores. In particular, it does not store the training data. Display the amount of memory each classifier uses. whos('CompactMdl','Mdl')
  Name            Size            Bytes  Class                                                 Attributes

  CompactMdl      1x1             15792  classreg.learning.classif.CompactClassificationECOC            
  Mdl             1x1             29207  ClassificationECOC                                              
The full ECOC model (Mdl) is approximately double the size of the compact ECOC model (CompactMdl). To label new observations efficiently, you can remove Mdl from the MATLAB® Workspace, and then pass CompactMdl and new predictor values to predict.
Train and Cross-Validate ECOC Classifier Train and cross-validate an ECOC classifier using different binary learners and the one-versus-all coding design. Load Fisher's iris data set. Specify the predictor data X and the response data Y. Determine the class names and the number of classes. load fisheriris X = meas; Y = species; classNames = unique(species(~strcmp(species,''))) % Remove empty classes classNames = 3x1 cell {'setosa' } {'versicolor'} {'virginica' } K = numel(classNames) % Number of classes K = 3
You can use classNames to specify the order of the classes during training. For a one-versus-all coding design, this example has K = 3 binary learners. Specify templates for the binary learners such that: • Binary learner 1 and 2 are naive Bayes classifiers. By default, each predictor is conditionally, normally distributed given its label. • Binary learner 3 is an SVM classifier. Specify to use the Gaussian kernel. rng(1); % For reproducibility tNB = templateNaiveBayes(); tSVM = templateSVM('KernelFunction','gaussian'); tLearners = {tNB tNB tSVM};
tNB and tSVM are template objects for naive Bayes and SVM learning, respectively. The objects indicate which options to use during training. Most of their properties are empty, except those specified by name-value pair arguments. During training, the software fills in the empty properties with their default values. Train and cross-validate an ECOC classifier using the binary learner templates and the one-versus-all coding design. Specify the order of the classes. By default, naive Bayes classifiers use posterior probabilities as scores, whereas SVM classifiers use distances from the decision boundary. Therefore, to aggregate the binary learners, you must specify to fit posterior probabilities. CVMdl = fitcecoc(X,Y,'ClassNames',classNames,'CrossVal','on',... 'Learners',tLearners,'FitPosterior',true);
CVMdl is a ClassificationPartitionedECOC cross-validated model. By default, the software implements 10-fold cross-validation. The scores across the binary learners have the same form (that is, they are posterior probabilities), so the software can aggregate the results of the binary classifications properly.

Inspect one of the trained folds using dot notation.

CVMdl.Trained{1}

ans = 
  CompactClassificationECOC
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'setosa'  'versicolor'  'virginica'}
           ScoreTransform: 'none'
           BinaryLearners: {3x1 cell}
             CodingMatrix: [3x3 double]
Each fold is a CompactClassificationECOC model trained on 90% of the data.

You can access the results of the binary learners using dot notation and cell indexing. Display the trained SVM classifier (the third binary learner) in the first fold.

CVMdl.Trained{1}.BinaryLearners{3}

ans = 
  CompactClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: [-1 1]
           ScoreTransform: '@(S)sigmoid(S,-4.016735e+00,-3.243073e-01)'
                    Alpha: [33x1 double]
                     Bias: -0.1345
         KernelParameters: [1x1 struct]
           SupportVectors: [33x4 double]
      SupportVectorLabels: [33x1 double]
Estimate the generalization error. genError = kfoldLoss(CVMdl) genError = 0.0333
On average, the generalization error is approximately 3%.
More About Error-Correcting Output Codes Model An error-correcting output codes (ECOC) model reduces the problem of classification with three or more classes to a set of binary classification problems. 35-1022
ECOC classification requires a coding design, which determines the classes that the binary learners train on, and a decoding scheme, which determines how the results (predictions) of the binary classifiers are aggregated.

Assume the following:

• The classification problem has three classes.
• The coding design is one-versus-one. For three classes, this coding design is

            Learner 1   Learner 2   Learner 3
  Class 1       1           1           0
  Class 2      −1           0           1
  Class 3       0          −1          −1
You can specify a different coding design by using the Coding name-value argument when you create a classification model.

• The model determines the predicted class by using the loss-weighted decoding scheme with the binary loss function g. The software also supports the loss-based decoding scheme. You can specify the decoding scheme and binary loss function by using the Decoding and BinaryLoss name-value arguments, respectively, when you call object functions, such as predict, loss, margin, edge, and so on.

The ECOC algorithm follows these steps.

1. Learner 1 trains on observations in Class 1 or Class 2, and treats Class 1 as the positive class and Class 2 as the negative class. The other learners are trained similarly.

2. Let M be the coding design matrix with elements m_{kl}, and s_l be the predicted classification score for the positive class of learner l. The algorithm assigns a new observation to the class (\hat{k}) that minimizes the aggregation of the losses for the B binary learners.
\hat{k} = \arg\min_{k} \frac{\sum_{l = 1}^{B} \left| m_{kl} \right| \, g(m_{kl}, s_l)}{\sum_{l = 1}^{B} \left| m_{kl} \right|} .
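A minimal usage sketch (Mdl and Xnew are placeholders for a trained ECOC model and new predictor data): the decoding scheme and binary loss are chosen at prediction time rather than at training time.

labels = predict(Mdl,Xnew,'Decoding','lossweighted','BinaryLoss','hinge');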
ECOC models can improve classification accuracy, compared to other multiclass models [1]. Coding Design The coding design is a matrix whose elements direct which classes are trained by each binary learner, that is, how the multiclass problem is reduced to a series of binary problems. Each row of the coding design corresponds to a distinct class, and each column corresponds to a binary learner. In a ternary coding design, for a particular column (or binary learner): • A row containing 1 directs the binary learner to group all observations in the corresponding class into a positive class. • A row containing –1 directs the binary learner to group all observations in the corresponding class into a negative class. • A row containing 0 directs the binary learner to ignore all observations in the corresponding class. 35-1023
Coding design matrices with large, minimal, pairwise row distances based on the Hamming measure are optimal. For details on the pairwise row distance, see “Random Coding Design Matrices” on page 35-1026 and [2]. This table describes popular coding designs.
one-versus-all (OVA)
  Description: For each binary learner, one class is positive and the rest are negative. This design exhausts all combinations of positive class assignments.
  Number of learners: K
  Minimal pairwise row distance: 2

one-versus-one (OVO)
  Description: For each binary learner, one class is positive, one class is negative, and the rest are ignored. This design exhausts all combinations of class pair assignments.
  Number of learners: K(K – 1)/2
  Minimal pairwise row distance: 1

binary complete
  Description: This design partitions the classes into all binary combinations, and does not ignore any classes. That is, all class assignments are –1 and 1 with at least one positive class and one negative class in the assignment for each binary learner.
  Number of learners: 2^(K – 1) – 1
  Minimal pairwise row distance: 2^(K – 2)

ternary complete
  Description: This design partitions the classes into all ternary combinations. That is, all class assignments are 0, –1, and 1 with at least one positive class and one negative class in the assignment for each binary learner.
  Number of learners: (3^K – 2^(K + 1) + 1)/2
  Minimal pairwise row distance: 3^(K – 2)

ordinal
  Description: For the first binary learner, the first class is negative and the rest are positive. For the second binary learner, the first two classes are negative and the rest are positive, and so on.
  Number of learners: K – 1
  Minimal pairwise row distance: 1

dense random
  Description: For each binary learner, the software randomly assigns classes into positive or negative classes, with at least one of each type. For more details, see "Random Coding Design Matrices" on page 35-1026.
  Number of learners: Random, but approximately 10 log2 K
  Minimal pairwise row distance: Variable

sparse random
  Description: For each binary learner, the software randomly assigns classes as positive or negative with probability 0.25 for each, and ignores classes with probability 0.5. For more details, see "Random Coding Design Matrices" on page 35-1026.
  Number of learners: Random, but approximately 15 log2 K
  Minimal pairwise row distance: Variable
This plot compares the number of binary learners for the coding designs with an increasing number of classes (K).
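You can also generate and inspect a coding design directly. A minimal sketch using designecoc for three classes:

M = designecoc(3,'onevsone')   % 3-by-3 matrix of -1, 0, and 1 entries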
Algorithms

Random Coding Design Matrices

For a given number of classes K, the software generates random coding design matrices as follows.

1. The software generates one of these matrices:

   a. Dense random — The software assigns 1 or –1 with equal probability to each element of the K-by-Ld coding design matrix, where Ld ≈ 10 log2 K.

   b. Sparse random — The software assigns 1 to each element of the K-by-Ls coding design matrix with probability 0.25, –1 with probability 0.25, and 0 with probability 0.5, where Ls ≈ 15 log2 K.

2. If a column does not contain at least one 1 and one –1, then the software removes that column.

3. For distinct columns u and v, if u = v or u = –v, then the software removes v from the coding design matrix.
The software randomly generates 10,000 matrices by default, and retains the matrix with the largest, minimal, pairwise row distance based on the Hamming measure ([2]) given by

\Delta(k_1, k_2) = 0.5 \sum_{l = 1}^{L} \left| m_{k_1 l} \right| \left| m_{k_2 l} \right| \left| m_{k_1 l} - m_{k_2 l} \right| ,
where m_{kjl} is an element of coding design matrix j.

Support Vector Storage

By default and for efficiency, fitcecoc empties the Alpha, SupportVectorLabels, and SupportVectors properties for all linear SVM binary learners. fitcecoc lists Beta, rather than Alpha, in the model display.

To store Alpha, SupportVectorLabels, and SupportVectors, pass a linear SVM template that specifies storing support vectors to fitcecoc. For example, enter:

t = templateSVM('SaveSupportVectors',true)
Mdl = fitcecoc(X,Y,'Learners',t);
You can remove the support vectors and related values by passing the resulting ClassificationECOC model to discardSupportVectors.
Version History Introduced in R2014b
References

[1] Fürnkranz, Johannes. "Round Robin Classification." J. Mach. Learn. Res., Vol. 2, 2002, pp. 721–747.

[2] Escalera, S., O. Pujol, and P. Radeva. "Separability of ternary codes for sparse designs of error-correcting output codes." Pattern Recog. Lett., Vol. 30, Issue 3, 2009, pp. 285–297.
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: • The predict and update functions support code generation. • When you train an ECOC model by using fitcecoc, the following restrictions apply. • All binary learners must be SVM classifiers or linear classification models. For the Learners name-value argument, you can specify: • 'svm' or 'linear' • An SVM template object or a cell array of such objects (see templateSVM) • A linear classification model template object or a cell array of such objects (see templateLinear) • Code generation limitations for the binary learners used in the ECOC classifier also apply to the ECOC classifier. For linear classification models, you can specify only one regularization strength—'auto' or a nonnegative scalar for the Lambda name-value argument. • For code generation with a coder configurer, the following additional restrictions apply. 35-1027
• If you use a cell array of SVM template objects, the value of Standardize for SVM learners must be consistent. For example, if you specify 'Standardize',true for one SVM learner, you must specify the same value for all SVM learners. • If you use a cell array of SVM template objects, and you use one SVM learner with a linear kernel ('KernelFunction','linear') and another with a different type of kernel function, then you must specify 'SaveSupportVectors',true for the learner with a linear kernel. • Categorical predictors (logical, categorical, char, string, or cell) are not supported. You cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using dummyvar before fitting the model. • Class labels with the categorical data type are not supported. Both the class label value in the training data (Tbl or Y) and the value of the ClassNames name-value argument cannot be an array with the categorical data type. • For more details, see ClassificationECOCCoderConfigurer. For information on namevalue arguments that you cannot modify when you retrain a model, see “Tips” on page 358437. For more information, see “Introduction to Code Generation” on page 34-3. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. Usage notes and limitations: • The following object functions fully support GPU arrays: • discardSupportVectors • gather • The following object functions offer limited support for GPU arrays: • compareHoldout • edge • loss • margin • partialDependence • plotPartialDependence • predict • The object functions execute on a GPU if either of the following apply: • The model was fitted with GPU arrays. • The predictor data that you pass to the object function is a GPU array. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also ClassificationECOC | fitcecoc | compact | ClassificationPartitionedLinearECOC | ClassificationPartitionedKernelECOC | ClassificationPartitionedECOC
CompactClassificationEnsemble Package: classreg.learning.classif Compact classification ensemble class
Description Compact version of a classification ensemble (of class ClassificationEnsemble). The compact version does not include the data for training the classification ensemble. Therefore, you cannot perform some tasks with a compact classification ensemble, such as cross validation. Use a compact classification ensemble for making predictions (classifications) of new data.
Construction ens = compact(fullEns) constructs a compact decision ensemble from a full decision ensemble. Input Arguments fullEns A classification ensemble created by fitcensemble.
Properties CategoricalPredictors Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]). ClassNames List of the elements in Y with duplicates removed. ClassNames can be a numeric vector, vector of categorical variables, logical vector, character array, or cell array of character vectors. ClassNames has the same data type as the data in the argument Y. (The software treats string arrays as cell arrays of character vectors.) CombineWeights Character vector describing how ens combines weak learner weights, either 'WeightedSum' or 'WeightedAverage'. Cost Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response. This property is readonly. 35-1029
ExpandedPredictorNames Expanded predictor names, stored as a cell array of character vectors. If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames. NumTrained Number of trained weak learners in ens, a scalar. PredictorNames A cell array of names for the predictor variables, in the order in which they appear in X. Prior Numeric vector of prior probabilities for each class. The order of the elements of Prior corresponds to the order of the classes in ClassNames. The number of elements of Prior is the number of unique classes in the response. This property is read-only. ResponseName Character vector with the name of the response variable Y. ScoreTransform Function handle for transforming scores, or character vector representing a built-in transformation function. 'none' means no transformation; equivalently, 'none' means @(x)x. For a list of built-in transformation functions and the syntax of custom transformation functions, see fitctree. Add or change a ScoreTransform function using dot notation: ens.ScoreTransform = 'function'
or ens.ScoreTransform = @function
Trained A cell vector of trained classification models. • If Method is 'LogitBoost' or 'GentleBoost', then CompactClassificationEnsemble stores trained learner j in the CompactRegressionLearner property of the object stored in Trained{j}. That is, to access trained learner j, use ens.Trained{j}.CompactRegressionLearner. • Otherwise, cells of the cell vector contain the corresponding, compact classification models. TrainedWeights Numeric vector of trained weights for the weak learners in ens. TrainedWeights has T elements, where T is the number of weak learners in learners. 35-1030
UsePredForLearner Logical matrix of size P-by-NumTrained, where P is the number of predictors (columns) in the training data X. UsePredForLearner(i,j) is true when learner j uses predictor i, and is false otherwise. For each learner, the predictors have the same order as the columns in the training data X. If the ensemble is not of type Subspace, all entries in UsePredForLearner are true.
Object Functions

compareHoldout          Compare accuracies of two classification models using new data
edge                    Classification edge for classification ensemble model
gather                  Gather properties of Statistics and Machine Learning Toolbox object from GPU
lime                    Local interpretable model-agnostic explanations (LIME)
loss                    Classification loss for classification ensemble model
margin                  Classification margins for classification ensemble model
partialDependence       Compute partial dependence
plotPartialDependence   Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
predict                 Classify observations using ensemble of classification models
predictorImportance     Estimates of predictor importance for classification ensemble of decision trees
removeLearners          Remove members of compact classification ensemble
shapley                 Shapley values
Copy Semantics Value. To learn how value classes affect copy operations, see Copying Objects.
Examples Reduce Size of Classification Ensemble Create a compact classification ensemble for efficiently making predictions on new data. Load the ionosphere data set. load ionosphere
Train a boosted ensemble of 100 classification trees using all measurements and the AdaBoostM1 method.

Mdl = fitcensemble(X,Y,Method="AdaBoostM1")

Mdl = 
  ClassificationEnsemble
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b'  'g'}
           ScoreTransform: 'none'
          NumObservations: 351
               NumTrained: 100
                   Method: 'AdaBoostM1'
             LearnerNames: {'Tree'}
     ReasonForTermination: 'Terminated normally after completing the requested number of training cycles.'
                  FitInfo: [100x1 double]
       FitInfoDescription: {2x1 cell}
Mdl is a ClassificationEnsemble model object that contains the training data, among other things. Create a compact version of Mdl. CMdl = compact(Mdl) CMdl = CompactClassificationEnsemble ResponseName: 'Y' CategoricalPredictors: [] ClassNames: {'b' 'g'} ScoreTransform: 'none' NumTrained: 100
CMdl is a CompactClassificationEnsemble model object. CMdl is almost the same as Mdl. One exception is that CMdl does not store the training data.

Compare the amounts of space consumed by Mdl and CMdl.

mdlInfo = whos("Mdl");
cMdlInfo = whos("CMdl");
[mdlInfo.bytes cMdlInfo.bytes]

ans = 1×2

      895597      648755
Mdl consumes more space than CMdl. CMdl.Trained stores the trained classification trees (CompactClassificationTree model objects) that compose Mdl. Display a graph of the first tree in the compact ensemble. view(CMdl.Trained{1},Mode="graph");
By default, fitcensemble grows shallow trees for boosted ensembles of trees. Predict the label of the mean of X using the compact ensemble. predMeanX = predict(CMdl,mean(X)) predMeanX = 1x1 cell array {'g'}
Tip For an ensemble of classification trees, the Trained property of ens stores an ens.NumTrainedby-1 cell vector of compact classification models. For a textual or graphical display of tree t in the cell vector, enter: • view(ens.Trained{t}.CompactRegressionLearner) for ensembles aggregated using LogitBoost or GentleBoost. • view(ens.Trained{t}) for all other aggregation methods. 35-1033
Version History Introduced in R2011a R2022a: Cost property stores the user-specified cost matrix Behavior changed in R2022a Starting in R2022a, the Cost property stores the user-specified cost matrix, so that you can compute the observed misclassification cost using the specified cost value. The software stores normalized prior probabilities (Prior) that do not reflect the penalties described in the cost matrix. To compute the observed misclassification cost, specify the LossFun name-value argument as "classifcost" when you call the loss function. Note that model training has not changed and, therefore, the decision boundaries between classes have not changed. For training, the fitting function updates the specified prior probabilities by incorporating the penalties described in the specified cost matrix, and then normalizes the prior probabilities and observation weights. This behavior has not changed. In previous releases, the software stored the default cost matrix in the Cost property and stored the prior probabilities used for training in the Prior property. Starting in R2022a, the software stores the user-specified cost matrix without modification, and stores normalized prior probabilities that do not reflect the cost penalties. For more details, see “Misclassification Cost Matrix, Prior Probabilities, and Observation Weights” on page 198. Some object functions use the Cost and Prior properties: • The loss function uses the cost matrix stored in the Cost property if you specify the LossFun name-value argument as "classifcost" or "mincost". • The loss and edge functions use the prior probabilities stored in the Prior property to normalize the observation weights of the input data. If you specify a nondefault cost matrix when you train a classification model, the object functions return a different value compared to previous releases. If you want the software to handle the cost matrix, prior probabilities, and observation weights in the same way as in previous releases, adjust the prior probabilities and observation weights for the nondefault cost matrix, as described in “Adjust Prior Probabilities and Observation Weights for Misclassification Cost Matrix” on page 19-9. Then, when you train a classification model, specify the adjusted prior probabilities and observation weights by using the Prior and Weights name-value arguments, respectively, and use the default cost matrix.
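For example, the following minimal sketch (CMdl, XTest, and YTest are placeholders for a trained compact ensemble and held-out data) requests the observed misclassification cost explicitly:

Lcost = loss(CMdl,XTest,YTest,LossFun="classifcost");  % uses the stored cost matrix
Lerr  = loss(CMdl,XTest,YTest);                        % default classification error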
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: • The predict function supports code generation. • To integrate the prediction of an ensemble into Simulink, you can use the ClassificationEnsemble Predict block in the Statistics and Machine Learning Toolbox library or a MATLAB Function block with the predict function. 35-1034
• When you train an ensemble by using fitcensemble, the following restrictions apply. • The value of the ScoreTransform name-value argument cannot be an anonymous function. • Code generation limitations for the weak learners used in the ensemble also apply to the ensemble. • For decision tree weak learners, you cannot use surrogate splits; that is, the value of the Surrogate name-value argument must be 'off'. • For k-nearest neighbor weak learners, the value of the Distance name-value argument cannot be a custom distance function. The value of the DistanceWeight name-value argument can be a custom distance weight function, but it cannot be an anonymous function. • For fixed-point code generation, the following additional restrictions apply. • When you train an ensemble by using fitcensemble, you must train an ensemble using tree learners, and the ScoreTransform value cannot be 'invlogit'. • Categorical predictors (logical, categorical, char, string, or cell) are not supported. You cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using dummyvar before fitting the model. • Class labels with the categorical data type are not supported. Both the class label value in the training data (Tbl or Y) and the value of the ClassNames name-value argument cannot be an array with the categorical data type. For more information, see “Introduction to Code Generation” on page 34-3. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. Usage notes and limitations: • The following object functions fully support GPU arrays: • gather • predictorImportance • removeLearners • The following object functions offer limited support for GPU arrays: • compareHoldout • edge • loss • margin • partialDependence • plotPartialDependence • predict • The object functions execute on a GPU if either of the following apply: • The model was fitted with GPU arrays. • The predictor data that you pass to the object function is a GPU array. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox). 35-1035
See Also fitcensemble | ClassificationEnsemble | predict | compact | fitctree | view | compareHoldout
ClassificationGAM Generalized additive model (GAM) for binary classification
Description A ClassificationGAM object is a generalized additive model on page 35-1048 (GAM) object for binary classification. It is an interpretable model that explains class scores (the logit of class probabilities) using a sum of univariate and bivariate shape functions. You can classify new observations by using the predict function, and plot the effect of each shape function on the prediction (class score) for an observation by using the plotLocalEffects function. For the full list of object functions for ClassificationGAM, see “Object Functions” on page 351043.
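A minimal usage sketch (X and Y are placeholders for a numeric predictor matrix and binary class labels; the number of interaction terms is an arbitrary illustrative choice):

Mdl = fitcgam(X,Y,'Interactions',5);   % univariate and bivariate shape functions
label = predict(Mdl,X(1,:));           % classify one observation
plotLocalEffects(Mdl,X(1,:));          % plot each term's effect on that prediction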
Creation Create a ClassificationGAM object by using fitcgam. You can specify both linear terms and interaction terms for predictors to include univariate shape functions (predictor trees) and bivariate shape functions (interaction trees) in a trained model, respectively. You can update a trained model by using resume or addInteractions. • The resume function resumes training for the existing terms in a model. • The addInteractions function adds interaction terms to a model that contains only linear terms.
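For instance, a hypothetical sketch of updating a model Mdl that was trained with linear terms only (the argument values are illustrative, and the second argument of resume is assumed here to be the number of additional trees per term):

Mdl2 = addInteractions(Mdl,5);   % add the five most important interaction terms
Mdl3 = resume(Mdl2,100);         % assumption: train 100 more trees per linear term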
Properties

GAM Properties

BinEdges — Bin edges for numeric predictors
cell array of numeric vectors | []

This property is read-only.

Bin edges for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors.

The software bins numeric predictors only if you specify the 'NumBins' name-value argument as a positive integer scalar when training a model with tree learners. The BinEdges property is empty if the 'NumBins' value is empty (default).

You can reproduce the binned predictor data Xbinned by using the BinEdges property of the trained model mdl.

X = mdl.X; % Predictor data
Xbinned = zeros(size(X));
edges = mdl.BinEdges;
% Find indices of binned predictors.
idxNumeric = find(~cellfun(@isempty,edges));
if iscolumn(idxNumeric)
    idxNumeric = idxNumeric';
end
for j = idxNumeric
    x = X(:,j);
    % Convert x to array if x is a table.
    if istable(x)
        x = table2array(x);
    end
    % Group x into bins by using the discretize function.
    xbinned = discretize(x,[-inf; edges{j}; inf]);
    Xbinned(:,j) = xbinned;
end
Xbinned contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs. Data Types: cell Interactions — Interaction term indices two-column matrix of positive integers | [] This property is read-only. Interaction term indices, specified as a t-by-2 matrix of positive integers, where t is the number of interaction terms in the model. Each row of the matrix represents one interaction term and contains the column indexes of the predictor data X for the interaction term. If the model does not include an interaction term, then this property is empty ([]). The software adds interaction terms to the model in the order of importance based on the p-values. Use this property to check the order of the interaction terms added to the model. Data Types: double Intercept — Intercept term of model numeric scalar This property is read-only. Intercept (constant) term of the model, which is the sum of the intercept terms in the predictor trees and interaction trees, specified as a numeric scalar. Data Types: single | double ModelParameters — Parameters used to train model model parameter object This property is read-only. Parameters used to train the model, specified as a model parameter object. ModelParameters contains parameter values such as those for the name-value arguments used to train the model. ModelParameters does not contain estimated parameters. 35-1038
Access the fields of ModelParameters by using dot notation. For example, access the maximum number of decision splits per interaction tree by using Mdl.ModelParameters.MaxNumSplitsPerInteraction. PairDetectionBinEdges — Bin edges for interaction term detection cell array of numeric vectors This property is read-only. Bin edges for interaction term detection for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors. To speed up the interaction term detection process, the software bins numeric predictors into at most 8 equiprobable bins. The number of bins can be less than 8 if a predictor has fewer than 8 unique values. Data Types: cell ReasonForTermination — Reason training stops structure This property is read-only. Reason training the model stops, specified as a structure with two fields, PredictorTrees and InteractionTrees. Use this property to check if the model contains the specified number of trees for each linear term ('NumTreesPerPredictor') and for each interaction term ('NumTreesPerInteraction'). If the fitcgam function terminates training before adding the specified number of trees, this property contains the reason for the termination. Data Types: struct Other Classification Properties CategoricalPredictors — Categorical predictor indices vector of positive integers | [] This property is read-only. Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]). Data Types: double ClassNames — Unique class labels categorical array | character array | logical vector | numeric vector | cell array of character vectors This property is read-only. Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as the class labels Y. 35-1039
(The software treats string arrays as cell arrays of character vectors.) ClassNames also determines the class order. Data Types: single | double | logical | char | cell | categorical Cost — Misclassification costs 2-by-2 numeric matrix Misclassification costs, specified as a 2-by-2 numeric matrix. Cost(i,j) is the cost of classifying a point into class j if its true class is i. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The software uses the Cost value for prediction, but not training. You can change the value by using dot notation. Example: Mdl.Cost = C; Data Types: double ExpandedPredictorNames — Expanded predictor names cell array of character vectors This property is read-only. Expanded predictor names, specified as a cell array of character vectors. ExpandedPredictorNames is the same as PredictorNames for a generalized additive model. Data Types: cell NumObservations — Number of observations numeric scalar This property is read-only. Number of observations in the training data stored in X and Y, specified as a numeric scalar. Data Types: double PredictorNames — Predictor variable names cell array of character vectors This property is read-only. Predictor variable names, specified as a cell array of character vectors. The order of the elements in PredictorNames corresponds to the order in which the predictor names appear in the training data. Data Types: cell Prior — Prior class probabilities numeric vector This property is read-only. Prior class probabilities, specified as a numeric vector with two elements. The order of the elements corresponds to the order of the elements in ClassNames. Data Types: double 35-1040
ResponseName — Response variable name character vector This property is read-only. Response variable name, specified as a character vector. Data Types: char RowsUsed — Rows used in fitting [] | logical vector This property is read-only. Rows of the original training data used in fitting the ClassificationGAM model, specified as a logical vector. This property is empty if all rows are used. Data Types: logical ScoreTransform — Score transformation character vector | function handle Score transformation, specified as a character vector or function handle. ScoreTransform represents a built-in transformation function or a function handle for transforming predicted classification scores. To change the score transformation function to function, for example, use dot notation. • For a built-in function, enter a character vector. Mdl.ScoreTransform = 'function';
This table describes the available built-in functions.

• 'doublelogit' : 1/(1 + e^(–2x))
• 'invlogit' : log(x / (1 – x))
• 'ismax' : Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
• 'logit' : 1/(1 + e^(–x))
• 'none' or 'identity' : x (no transformation)
• 'sign' : –1 for x < 0, 0 for x = 0, 1 for x > 0
• 'symmetric' : 2x – 1
• 'symmetricismax' : Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
• 'symmetriclogit' : 2/(1 + e^(–x)) – 1
• For a MATLAB function or a function that you define, enter its function handle. Mdl.ScoreTransform = @function;
function must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores). This property determines the output score computation for object functions such as predict, margin, and edge. Use 'logit' to compute posterior probabilities, and use 'none' to compute the logit of posterior probabilities. Data Types: char | function_handle W — Observation weights numeric vector This property is read-only. Observation weights used to train the model, specified as an n-by-1 numeric vector. n is the number of observations (NumObservations). The software normalizes the observation weights specified in the 'Weights' name-value argument so that the elements of W within a particular class sum up to the prior probability of that class. Data Types: double X — Predictors numeric matrix | table This property is read-only. Predictors used to train the model, specified as a numeric matrix or table. Each row of X corresponds to one observation, and each column corresponds to one variable. Data Types: single | double | table Y — Class labels categorical array | character array | logical vector | numeric vector | cell array of character vectors This property is read-only. Class labels used to train the model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Y has the same data type as the response variable used to train the model. (The software treats string arrays as cell arrays of character vectors.) Each row of Y represents the observed classification of the corresponding row of X. Data Types: single | double | logical | char | cell | categorical Hyperparameter Optimization Properties HyperparameterOptimizationResults — Description of cross-validation optimization of hyperparameters BayesianOptimization object | table This property is read-only. Description of the cross-validation optimization of hyperparameters, specified as a BayesianOptimization object or a table of hyperparameters and associated values. This property is nonempty when the 'OptimizeHyperparameters' name-value argument of fitcgam is not 35-1042
'none' (default) when the object is created. The value of HyperparameterOptimizationResults depends on the setting of the Optimizer field in the HyperparameterOptimizationOptions structure of fitcgam when the object is created.

• 'bayesopt' (default) : Object of class BayesianOptimization
• 'gridsearch' or 'randomsearch' : Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)
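As a hedged illustration based on the fitcgam name-value arguments described above (this exact call is an assumption, not part of the original page), an optimization run and its stored results might look like this:

load ionosphere
rng('default')  % for reproducibility
Mdl = fitcgam(X,Y,'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions',struct('ShowPlots',false));
results = Mdl.HyperparameterOptimizationResults   % BayesianOptimization object for the default optimizer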
Object Functions Create CompactClassificationGAM compact
Reduce size of machine learning model
Create ClassificationPartitionedGAM crossval
Cross-validate machine learning model
Update GAM addInteractions resume
Add interaction terms to univariate generalized additive model (GAM) Resume training of generalized additive model (GAM)
Interpret Prediction lime partialDependence plotLocalEffects plotPartialDependence shapley
Local interpretable model-agnostic explanations (LIME) Compute partial dependence Plot local effects of terms in generalized additive model (GAM) Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots Shapley values
Assess Predictive Performance on New Observations predict loss margin edge
Classify observations using generalized additive model (GAM) Classification loss for generalized additive model (GAM) Classification margins for generalized additive model (GAM) Classification edge for generalized additive model (GAM)
Assess Predictive Performance on Training Data resubPredict resubLoss resubMargin resubEdge
Classify training data using trained classifier Resubstitution classification loss Resubstitution classification margin Resubstitution classification edge
Compare Accuracies compareHoldout
Compare accuracies of two classification models using new data 35-1043
testckfold
Compare accuracies of two classification models by repeated cross-validation
Examples Train Generalized Additive Model Train a univariate generalized additive model, which contains linear terms for predictors. Then, interpret the prediction for a specified data instance by using the plotLocalEffects function. Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g'). load ionosphere
Train a univariate GAM that identifies whether the radar return is bad ('b') or good ('g'). Mdl = fitcgam(X,Y) Mdl = ClassificationGAM ResponseName: CategoricalPredictors: ClassNames: ScoreTransform: Intercept: NumObservations:
'Y' [] {'b' 'g'} 'logit' 2.2715 351
Mdl is a ClassificationGAM model object. The model display shows a partial list of the model properties. To view the full list of properties, double-click the variable name Mdl in the Workspace. The Variables editor opens for Mdl. Alternatively, you can display the properties in the Command Window by using dot notation. For example, display the class order of Mdl. classOrder = Mdl.ClassNames classOrder = 2x1 cell {'b'} {'g'}
Classify the first observation of the training data, and plot the local effects of the terms in Mdl on the prediction. label = predict(Mdl,X(1,:)) label = 1x1 cell array {'g'} plotLocalEffects(Mdl,X(1,:))
The predict function classifies the first observation X(1,:) as 'g'. The plotLocalEffects function creates a horizontal bar graph that shows the local effects of the 10 most important terms on the prediction. Each local effect value shows the contribution of each term to the classification score for 'g', which is the logit of the posterior probability that the classification is 'g' for the observation.
Train GAM with Interaction Terms Train a generalized additive model that contains linear and interaction terms for predictors in three different ways: • Specify the interaction terms using the formula input argument. • Specify the 'Interactions' name-value argument. • Build a model with linear terms first and add interaction terms to the model by using the addInteractions function. Load Fisher's iris data set. Create a table that contains observations for versicolor and virginica. load fisheriris inds = strcmp(species,'versicolor') | strcmp(species,'virginica'); tbl = array2table(meas(inds,:),'VariableNames',["x1","x2","x3","x4"]); tbl.Y = species(inds,:);
Specify formula Train a GAM that contains the four linear terms (x1, x2, x3, and x4) and two interaction terms (x1*x2 and x2*x3). Specify the terms using a formula in the form 'Y ~ terms'. Mdl1 = fitcgam(tbl,'Y ~ x1 + x2 + x3 + x4 + x1:x2 + x2:x3');
The function adds interaction terms to the model in the order of importance. You can use the Interactions property to check the interaction terms in the model and the order in which fitcgam adds them to the model.
Display the Interactions property.
Mdl1.Interactions
ans = 2×2
     2     3
     1     2
Each row of Interactions represents one interaction term and contains the column indexes of the predictor variables for the interaction term.
Specify 'Interactions'
Pass the training data (tbl) and the name of the response variable in tbl to fitcgam, so that the function includes the linear terms for all the other variables as predictors. Specify the 'Interactions' name-value argument using a logical matrix to include the two interaction terms, x1*x2 and x2*x3.
Mdl2 = fitcgam(tbl,'Y','Interactions',logical([1 1 0 0; 0 1 1 0]));
Mdl2.Interactions
ans = 2×2
     2     3
     1     2
You can also specify 'Interactions' as the number of interaction terms or as 'all' to include all available interaction terms. Among the specified interaction terms, fitcgam identifies those whose p-values are not greater than the 'MaxPValue' value and adds them to the model. The default 'MaxPValue' is 1, so the function adds all specified interaction terms to the model.
Specify 'Interactions','all' and set the 'MaxPValue' name-value argument to 0.01.
Mdl3 = fitcgam(tbl,'Y','Interactions','all','MaxPValue',0.01);
Mdl3.Interactions
ans = 5×2
     3     4
     2     4
     1     4
     2     3
     1     3
Mdl3 includes five of the six available pairs of interaction terms. 35-1046
Use addInteractions Function
Train a univariate GAM that contains linear terms for predictors, and then add interaction terms to the trained model by using the addInteractions function. Specify the second input argument of addInteractions in the same way you specify the 'Interactions' name-value argument of fitcgam. You can specify the list of interaction terms using a logical matrix, the number of interaction terms, or 'all'.
Specify the number of interaction terms as 5 to add the five most important interaction terms to the trained model.
Mdl4 = fitcgam(tbl,'Y');
UpdatedMdl4 = addInteractions(Mdl4,5);
UpdatedMdl4.Interactions
ans = 5×2
     3     4
     2     4
     1     4
     2     3
     1     3
Mdl4 is a univariate GAM, and UpdatedMdl4 is an updated GAM that contains all the terms in Mdl4 and five additional interaction terms.
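As an additional check that is not part of the original example, you could compare the resubstitution loss of the two models to see the effect of the added interaction terms:

resubLoss(Mdl4)         % loss of the univariate GAM
resubLoss(UpdatedMdl4)  % loss after adding the five interaction terms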
Resume Training Predictor Trees in GAM Train a univariate classification GAM (which contains only linear terms) for a small number of iterations. After training the model for more iterations, compare the resubstitution loss. Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g'). load ionosphere
Train a univariate GAM that identifies whether the radar return is bad ('b') or good ('g'). Specify the number of trees per linear term as 2. fitcgam iterates the boosting algorithm for the specified number of iterations. For each boosting iteration, the function adds one tree per linear term. Specify 'Verbose' as 2 to display diagnostic messages at every iteration. Mdl = fitcgam(X,Y,'NumTreesPerPredictor',2,'Verbose',2); |========================================================| | Type | NumTrees | Deviance | RelTol | LearnRate | |========================================================| | 1D| 0| 486.59| | | | 1D| 1| 166.71| Inf| 1| | 1D| 2| 78.336| 0.58205| 1|
To check whether fitcgam trains the specified number of trees, display the ReasonForTermination property of the trained model and view the displayed message. Mdl.ReasonForTermination
ans = struct with fields: PredictorTrees: 'Terminated after training the requested number of trees.' InteractionTrees: ''
Compute the classification loss for the training data. resubLoss(Mdl) ans = 0.0142
Resume training the model for another 100 iterations. Because Mdl contains only linear terms, the resume function resumes training for the linear terms and adds more trees for them (predictor trees). Specify 'Verbose' and 'NumPrint' to display diagnostic messages at every 10 iterations. UpdatedMdl = resume(Mdl,100,'Verbose',1,'NumPrint',10); |========================================================| | Type | NumTrees | Deviance | RelTol | LearnRate | |========================================================| | 1D| 0| 78.336| | | | 1D| 1| 38.364| 0.17429| 1| | 1D| 10| 0.16311| 0.011894| 1| | 1D| 20| 0.00035693| 0.0025178| 1| | 1D| 30| 8.1191e-07| 0.0011006| 1| | 1D| 40| 1.7978e-09| 0.00074607| 1| | 1D| 50| 3.6113e-12| 0.00034404| 1| | 1D| 60| 1.7497e-13| 0.00016541| 1| UpdatedMdl.ReasonForTermination ans = struct with fields: PredictorTrees: 'Unable to improve the model fit.' InteractionTrees: ''
resume terminates training when adding more trees does not improve the deviance of the model fit. Compute the classification loss using the updated model. resubLoss(UpdatedMdl) ans = 0
The classification loss decreases after resume updates the model with more iterations.
More About Generalized Additive Model (GAM) for Binary Classification A generalized additive model (GAM) is an interpretable model that explains class scores (the logit of class probabilities) using a sum of univariate and bivariate shape functions of predictors. fitcgam uses a boosted tree as a shape function for each predictor and, optionally, each pair of predictors; therefore, the function can capture a nonlinear relation between a predictor and the response variable. Because contributions of individual shape functions to the prediction (classification score) are well separated, the model is easy to interpret. 35-1048
The standard GAM uses a univariate shape function for each predictor.

y \sim \mathrm{Binomial}(n, \mu)

g(\mu) = \log\frac{\mu}{1 - \mu} = c + f_1(x_1) + f_2(x_2) + \cdots + f_p(x_p),

where y is a response variable that follows the binomial distribution with the probability of success (probability of positive class) μ in n observations. g(μ) is a logit link function, and c is an intercept (constant) term. f_i(x_i) is a univariate shape function for the ith predictor, which is a boosted tree for a linear term for the predictor (predictor tree).
You can include interactions between predictors in a model by adding bivariate shape functions of important interaction terms to the model.

g(\mu) = c + f_1(x_1) + f_2(x_2) + \cdots + f_p(x_p) + \sum_{i,j \in \{1,2,\ldots,p\}} f_{ij}(x_i x_j),

where f_{ij}(x_i x_j) is a bivariate shape function for the ith and jth predictors, which is a boosted tree for an interaction term for the predictors (interaction tree).
fitcgam finds important interaction terms based on the p-values of F-tests. For details, see “Interaction Term Detection” on page 35-2321.
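The link between posterior probabilities and logits can be seen directly from a trained model. The following sketch is illustrative and assumes a full model trained with fitcgam, as in the examples above; switching ScoreTransform between the default 'logit' and 'none' returns the two sides of the link function:

load ionosphere
Mdl = fitcgam(X,Y);
[~,prob] = predict(Mdl,X(1,:));     % default ScoreTransform 'logit': posterior probabilities
Mdl.ScoreTransform = 'none';
[~,logits] = predict(Mdl,X(1,:));   % raw scores: logits of the posterior probabilities
% For the positive class, the logit score should match log(prob./(1-prob)) up to rounding.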
Version History Introduced in R2021a
References [1] Lou, Yin, Rich Caruana, and Johannes Gehrke. "Intelligible Models for Classification and Regression." Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’12). Beijing, China: ACM Press, 2012, pp. 150–158. [2] Lou, Yin, Rich Caruana, Johannes Gehrke, and Giles Hooker. "Accurate Intelligible Models with Pairwise Interactions." Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’13) Chicago, Illinois, USA: ACM Press, 2013, pp. 623–631.
See Also CompactClassificationGAM | ClassificationPartitionedGAM | fitcgam | resume | addInteractions Topics “Train Generalized Additive Model for Binary Classification” on page 12-77
CompactClassificationNaiveBayes Compact naive Bayes classifier for multiclass classification
Description CompactClassificationNaiveBayes is a compact version of the naive Bayes classifier. The compact classifier does not include the data used for training the naive Bayes classifier. Therefore, you cannot perform some tasks, such as cross-validation, using the compact classifier. Use a compact naive Bayes classifier for tasks such as predicting the labels of the data.
Creation Create a CompactClassificationNaiveBayes model from a full, trained ClassificationNaiveBayes classifier by using compact.
Properties Predictor Properties PredictorNames — Predictor names cell array of character vectors This property is read-only. Predictor names, specified as a cell array of character vectors. The order of the elements in PredictorNames corresponds to the order in which the predictor names appear in the training data X. ExpandedPredictorNames — Expanded predictor names cell array of character vectors This property is read-only. Expanded predictor names, specified as a cell array of character vectors. If the model uses dummy variable encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames. CategoricalPredictors — Categorical predictor indices vector of positive integers | [] This property is read-only. Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]). 35-1050
Data Types: single | double CategoricalLevels — Multivariate multinomial levels cell array This property is read-only. Multivariate multinomial levels, specified as a cell array. The length of CategoricalLevels is equal to the number of predictors (size(X,2)). The cells of CategoricalLevels correspond to predictors that you specify as 'mvmn' during training, that is, they have a multivariate multinomial distribution. Cells that do not correspond to a multivariate multinomial distribution are empty ([]). If predictor j is multivariate multinomial, then CategoricalLevels{j} is a list of all distinct values of predictor j in the sample. NaNs are removed from unique(X(:,j)). Predictor Distribution Properties DistributionNames — Predictor distributions 'normal' (default) | 'kernel' | 'mn' | 'mvmn' | cell array of character vectors This property is read-only. Predictor distributions, specified as a character vector or cell array of character vectors. fitcnb uses the predictor distributions to model the predictors. This table lists the available distributions. Value
Description
'kernel'
Kernel smoothing density estimate
'mn'
Multinomial distribution. If you specify mn, then all features are components of a multinomial distribution. Therefore, you cannot include 'mn' as an element of a string array or a cell array of character vectors. For details, see “Estimated Probability for Multinomial Distribution” on page 35-1059.
'mvmn'
Multivariate multinomial distribution. For details, see “Estimated Probability for Multivariate Multinomial Distribution” on page 35-1060.
'normal'
Normal (Gaussian) distribution
If DistributionNames is a 1-by-P cell array of character vectors, then fitcnb models the feature j using the distribution in element j of the cell array. Example: 'mn' Example: {'kernel','normal','kernel'} Data Types: char | string | cell DistributionParameters — Distribution parameter estimates cell array This property is read-only. 35-1051
Distribution parameter estimates, specified as a cell array. DistributionParameters is a K-by-D cell array, where cell (k,d) contains the distribution parameter estimates for instances of predictor d in class k. The order of the rows corresponds to the order of the classes in the property ClassNames, and the order of the predictors corresponds to the order of the columns of X. If class k has no observations for predictor j, then the Distribution{k,j} is empty ([]). The elements of DistributionParameters depend on the distributions of the predictors. This table describes the values in DistributionParameters{k,j}. Distribution of Predictor j
Value of Cell Array for Predictor j and Class k
kernel
A KernelDistribution model. Display properties using cell indexing and dot notation. For example, to display the estimated bandwidth of the kernel density for predictor 2 in the third class, use Mdl.DistributionParameters{3,2}.Bandwidth.
mn
A scalar representing the probability that token j appears in class k. For details, see “Estimated Probability for Multinomial Distribution” on page 35-1059.
mvmn
A numeric vector containing the probabilities for each possible level of predictor j in class k. The software orders the probabilities by the sorted order of all unique levels of predictor j (stored in the property CategoricalLevels). For more details, see “Estimated Probability for Multivariate Multinomial Distribution” on page 35-1060.
normal
A 2-by-1 numeric vector. The first element is the sample mean and the second element is the sample standard deviation. For more details, see “Normal Distribution Estimators” on page 35-1059
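As a hedged sketch (the data set and distribution choices here are illustrative assumptions, not part of the original page), you can examine DistributionParameters after training models whose predictors use different distributions:

load fisheriris
Mdl = fitcnb(meas,species);              % all predictors use the default 'normal' distribution
Mdl.DistributionParameters{1,1}          % [mean; standard deviation] of predictor 1 in class 1

KernelMdl = fitcnb(meas,species,'DistributionNames','kernel');
KernelMdl.DistributionParameters{1,1}            % KernelDistribution model for predictor 1 in class 1
KernelMdl.DistributionParameters{1,1}.Bandwidth  % estimated kernel bandwidth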
Kernel — Kernel smoother type
'normal' (default) | 'box' | cell array | ...
This property is read-only.
Kernel smoother type, specified as the name of a kernel or a cell array of kernel names. The length of Kernel is equal to the number of predictors (size(X,2)). Kernel{j} corresponds to predictor j and contains a character vector describing the type of kernel smoother. If a cell is empty ([]), then fitcnb did not fit a kernel distribution to the corresponding predictor.
This table describes the supported kernel smoother types. I{u} denotes the indicator function.

• 'box' : Box (uniform), f(x) = 0.5·I{|x| ≤ 1}
• 'epanechnikov' : Epanechnikov, f(x) = 0.75(1 − x²)·I{|x| ≤ 1}
• 'normal' : Gaussian, f(x) = (1/√(2π))·exp(−0.5x²)
• 'triangle' : Triangular, f(x) = (1 − |x|)·I{|x| ≤ 1}

Example: 'box'
Example: {'epanechnikov','normal'}
Data Types: char | string | cell Mu — Predictor means numeric vector | [] This property is read-only. Predictor means, specified as a numeric vector. If you specify Standardize as 1 or true when you train the naive Bayes classifier using fitcnb, then the length of the Mu vector is equal to the number of predictors. The vector contains 0 values for predictors with nonkernel distributions, such as categorical predictors (see DistributionNames). If you set Standardize to 0 or false when you train the naive Bayes classifier using fitcnb, then the Mu value is an empty vector ([]). Data Types: double Sigma — Predictor standard deviations numeric vector | [] This property is read-only. Predictor standard deviations, specified as a numeric vector. If you specify Standardize as 1 or true when you train the naive Bayes classifier using fitcnb, then the length of the Sigma vector is equal to the number of predictors. The vector contains 1 values for predictors with nonkernel distributions, such as categorical predictors (see DistributionNames). If you set Standardize to 0 or false when you train the naive Bayes classifier using fitcnb, then the Sigma value is an empty vector ([]). Data Types: double Support — Kernel smoother density support cell array This property is read-only. Kernel smoother density support, specified as a cell array. The length of Support is equal to the number of predictors (size(X,2)). The cells represent the regions to which fitcnb applies the kernel density. If a cell is empty ([]), then fitcnb did not fit a kernel distribution to the corresponding predictor. This table describes the supported options. Value
Description
1-by-2 numeric row vector
The density support applies to the specified bounds, for example [L,U], where L and U are the finite lower and upper bounds, respectively.
'positive'
The density support applies to all positive real values.
'unbounded'
The density support applies to all real values.
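The following sketch (an assumption-based illustration, not part of the original page) shows how the Kernel and Support training options surface in these properties:

load fisheriris
Mdl = fitcnb(meas,species,'DistributionNames','kernel', ...
    'Kernel','box','Support','positive');
Mdl.Kernel    % each cell contains 'box'
Mdl.Support   % each cell contains 'positive'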
Width — Kernel smoother window width numeric matrix This property is read-only. 35-1053
Kernel smoother window width, specified as a numeric matrix. Width is a K-by-P matrix, where K is the number of classes in the data, and P is the number of predictors (size(X,2)). Width(k,j) is the kernel smoother window width for the kernel smoothing density of predictor j within class k. NaNs in column j indicate that fitcnb did not fit predictor j using a kernel density. Response Properties ClassNames — Unique class names categorical array | character array | logical vector | numeric vector | cell array of character vectors This property is read-only. Unique class names used in the training model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as Y, and has K elements (or rows) for character arrays. (The software treats string arrays as cell arrays of character vectors.) Data Types: categorical | char | string | logical | double | cell ResponseName — Response variable name character vector This property is read-only. Response variable name, specified as a character vector. Data Types: char | string Training Properties Prior — Prior probabilities numeric vector Prior probabilities, specified as a numeric vector. The order of the elements in Prior corresponds to the elements of Mdl.ClassNames. fitcnb normalizes the prior probabilities you set using the 'Prior' name-value pair argument, so that sum(Prior) = 1. The value of Prior does not affect the best-fitting model. Therefore, you can reset Prior after training Mdl using dot notation. Example: Mdl.Prior = [0.2 0.8] Data Types: double | single Classifier Properties Cost — Misclassification cost square matrix Misclassification cost, specified as a numeric square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i. The rows correspond to the true class and the columns correspond to the predicted class. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. 35-1054
The misclassification cost matrix must have zeros on the diagonal.
The value of Cost does not influence training. You can reset Cost after training Mdl using dot notation.
Example: Mdl.Cost = [0 0.5 ; 1 0]
Data Types: double | single

ScoreTransform — Classification score transformation
'none' (default) | 'doublelogit' | 'invlogit' | 'ismax' | 'logit' | function handle | ...
Classification score transformation, specified as a character vector or function handle. This table summarizes the available character vectors.

• "doublelogit" : 1/(1 + e^(–2x))
• "invlogit" : log(x / (1 – x))
• "ismax" : Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
• "logit" : 1/(1 + e^(–x))
• "none" or "identity" : x (no transformation)
• "sign" : –1 for x < 0, 0 for x = 0, 1 for x > 0
• "symmetric" : 2x – 1
• "symmetricismax" : Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
• "symmetriclogit" : 2/(1 + e^(–x)) – 1
For a MATLAB function or a function you define, use its function handle for the score transformation. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores). Example: Mdl.ScoreTransform = 'logit' Data Types: char | string | function handle
Object Functions compareHoldout edge lime logp loss margin partialDependence plotPartialDependence predict shapley
Compare accuracies of two classification models using new data Classification edge for naive Bayes classifier Local interpretable model-agnostic explanations (LIME) Log unconditional probability density for naive Bayes classifier Classification loss for naive Bayes classifier Classification margins for naive Bayes classifier Compute partial dependence Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots Classify observations using naive Bayes classifier Shapley values 35-1055
Examples Reduce Size of Naive Bayes Classifier Reduce the size of a full naive Bayes classifier by removing the training data. Full naive Bayes classifiers hold the training data. You can use a compact naive Bayes classifier to improve memory efficiency. Load the ionosphere data set. Remove the first two predictors for stability. load ionosphere X = X(:,3:end);
Train a naive Bayes classifier using the predictors X and class labels Y. A recommended practice is to specify the class names. fitcnb assumes that each predictor is conditionally and normally distributed. Mdl = fitcnb(X,Y,'ClassNames',{'b','g'}) Mdl = ClassificationNaiveBayes ResponseName: CategoricalPredictors: ClassNames: ScoreTransform: NumObservations: DistributionNames: DistributionParameters:
'Y' [] {'b' 'g'} 'none' 351 {1x32 cell} {2x32 cell}
Mdl is a trained ClassificationNaiveBayes classifier. Reduce the size of the naive Bayes classifier. CMdl = compact(Mdl) CMdl = CompactClassificationNaiveBayes ResponseName: 'Y' CategoricalPredictors: [] ClassNames: {'b' 'g'} ScoreTransform: 'none' DistributionNames: {1x32 cell} DistributionParameters: {2x32 cell}
CMdl is a trained CompactClassificationNaiveBayes classifier.
Display the amount of memory used by each classifier.
whos('Mdl','CMdl')
  Name      Size      Bytes     Class
  CMdl      1x1        15229    classreg.learning.classif.CompactClassificationNaiveBayes
  Mdl       1x1       111359    ClassificationNaiveBayes
The full naive Bayes classifier (Mdl) is more than seven times larger than the compact naive Bayes classifier (CMdl). To label new observations efficiently, you can remove Mdl from the MATLAB® Workspace, and then pass CMdl and new predictor values to predict.
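For instance, as a small sketch continuing the example above (not part of the documented output), after clearing the full model you can still classify observations with the compact classifier:

clear Mdl
label = predict(CMdl,X(1,:))   % classify the first observation using the compact model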
Train and Cross-Validate Naive Bayes Classifier Train and cross-validate a naive Bayes classifier. fitcnb implements 10-fold cross-validation by default. Then, estimate the cross-validated classification error. Load the ionosphere data set. Remove the first two predictors for stability. load ionosphere X = X(:,3:end); rng('default') % for reproducibility
Train and cross-validate a naive Bayes classifier using the predictors X and class labels Y. A recommended practice is to specify the class names. fitcnb assumes that each predictor is conditionally and normally distributed.
CVMdl = fitcnb(X,Y,'ClassNames',{'b','g'},'CrossVal','on')
CVMdl =
  ClassificationPartitionedModel
    CrossValidatedModel: 'NaiveBayes'
         PredictorNames: {'x1' 'x2' 'x3' 'x4' 'x5' 'x6' 'x7' 'x8' 'x9' 'x10' 'x11' …}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b' 'g'}
         ScoreTransform: 'none'

CVMdl is a ClassificationPartitionedModel cross-validated, naive Bayes classifier. Alternatively, you can cross-validate a trained ClassificationNaiveBayes model by passing it to crossval.
Display the first training fold of CVMdl using dot notation.
CVMdl.Trained{1}
ans =
  CompactClassificationNaiveBayes
               ResponseName: 'Y'
      CategoricalPredictors: []
                 ClassNames: {'b' 'g'}
             ScoreTransform: 'none'
          DistributionNames: {1x32 cell}
     DistributionParameters: {2x32 cell}
Each fold is a CompactClassificationNaiveBayes model trained on 90% of the data.
A cross-validated model is not used for predicting on new data. Instead, use it to estimate the generalization error by passing CVMdl to kfoldLoss.
genError = kfoldLoss(CVMdl)
genError = 0.1852
On average, the generalization error is approximately 19%. You can specify a different conditional distribution for the predictors, or tune the conditional distribution parameters to reduce the generalization error.
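For example, as an illustrative assumption rather than a continuation of the documented output, you could refit with kernel distributions for the predictors and compare the cross-validated loss:

CVKernelMdl = fitcnb(X,Y,'ClassNames',{'b','g'}, ...
    'DistributionNames','kernel','CrossVal','on');
kfoldLoss(CVKernelMdl)   % compare against the error of the normal-distribution model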
More About Bag-of-Tokens Model In the bag-of-tokens model, the value of predictor j is the nonnegative number of occurrences of token j in the observation. The number of categories (bins) in the multinomial model is the number of distinct tokens (number of predictors). Naive Bayes Naive Bayes is a classification algorithm that applies density estimation to the data. The algorithm leverages Bayes theorem, and (naively) assumes that the predictors are conditionally independent, given the class. Although the assumption is usually violated in practice, naive Bayes classifiers tend to yield posterior distributions that are robust to biased class density estimates, particularly where the posterior is 0.5 (the decision boundary) [1]. Naive Bayes classifiers assign observations to the most probable class (in other words, the maximum a posteriori decision rule). Explicitly, the algorithm takes these steps: 1
Estimate the densities of the predictors within each class.
2  Model posterior probabilities according to Bayes rule. That is, for all k = 1,...,K,

   P(Y = k \mid X_1,\ldots,X_P) = \frac{\pi(Y = k)\,\prod_{j=1}^{P} P(X_j \mid Y = k)}{\sum_{k=1}^{K} \pi(Y = k)\,\prod_{j=1}^{P} P(X_j \mid Y = k)},

   where:
   • Y is the random variable corresponding to the class index of an observation.
   • X1,...,XP are the random predictors of an observation.
   • π(Y = k) is the prior probability that a class index is k.
3  Classify an observation by estimating the posterior probability for each class, and then assign the observation to the class yielding the maximum posterior probability.

If the predictors compose a multinomial distribution, then the posterior probability

   P(Y = k \mid X_1,\ldots,X_P) \propto \pi(Y = k)\, P_{mn}(X_1,\ldots,X_P \mid Y = k),

where P_{mn}(X_1,\ldots,X_P \mid Y = k) is the probability mass function of a multinomial distribution.
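In practice, you obtain these posterior probabilities from the predict object function. A minimal sketch, assuming a trained classifier such as CMdl from the earlier example:

[labels,posterior] = predict(CMdl,X(1:3,:));
posterior   % each row sums to 1; columns follow the order of CMdl.ClassNames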
Algorithms
Normal Distribution Estimators
If predictor variable j has a conditional normal distribution (see the DistributionNames property), the software fits the distribution to the data by computing the class-specific weighted mean and the unbiased estimate of the weighted standard deviation. For each class k:
• The weighted mean of predictor j is

  \bar{x}_{j|k} = \frac{\sum_{i: y_i = k} w_i x_{ij}}{\sum_{i: y_i = k} w_i},

  where w_i is the weight for observation i. The software normalizes weights within a class such that they sum to the prior probability for that class.
• The unbiased estimator of the weighted standard deviation of predictor j is

  s_{j|k} = \left[ \frac{\sum_{i: y_i = k} w_i\,(x_{ij} - \bar{x}_{j|k})^2}{z_{1|k} - \frac{z_{2|k}}{z_{1|k}}} \right]^{1/2},

  where z_{1|k} is the sum of the weights within class k and z_{2|k} is the sum of the squared weights within class k.

Estimated Probability for Multinomial Distribution
If all predictor variables compose a conditional multinomial distribution (see the DistributionNames property), the software fits the distribution using the “Bag-of-Tokens Model” on page 35-1058. The software stores the probability that token j appears in class k in the property DistributionParameters{k,j}. With additive smoothing [2], the estimated probability is

  P(\text{token } j \mid \text{class } k) = \frac{1 + c_{j|k}}{P + c_k},

where:
• c_{j|k} = n_k \frac{\sum_{i: y_i = k} x_{ij} w_i}{\sum_{i: y_i = k} w_i}, which is the weighted number of occurrences of token j in class k.
• n_k is the number of observations in class k.
• w_i is the weight for observation i. The software normalizes weights within a class so that they sum to the prior probability for that class.
• c_k = \sum_{j=1}^{P} c_{j|k}, which is the total weighted number of occurrences of all tokens in class k.
Estimated Probability for Multivariate Multinomial Distribution
If predictor variable j has a conditional multivariate multinomial distribution (see the DistributionNames property), the software follows this procedure:
1  The software collects a list of the unique levels, stores the sorted list in CategoricalLevels, and considers each level a bin. Each combination of predictor and class is a separate, independent multinomial random variable.
2  For each class k, the software counts instances of each categorical level using the list stored in CategoricalLevels{j}.
3  The software stores the probability that predictor j in class k has level L in the property DistributionParameters{k,j}, for all levels in CategoricalLevels{j}. With additive smoothing [2], the estimated probability is

   P(\text{predictor } j = L \mid \text{class } k) = \frac{1 + m_{j|k}(L)}{m_j + m_k},

   where:
   • m_{j|k}(L) = n_k \frac{\sum_{i: y_i = k} I\{x_{ij} = L\}\, w_i}{\sum_{i: y_i = k} w_i}, which is the weighted number of observations for which predictor j equals L in class k.
   • n_k is the number of observations in class k.
   • I\{x_{ij} = L\} = 1 if x_{ij} = L, and 0 otherwise.
   • w_i is the weight for observation i. The software normalizes weights within a class so that they sum to the prior probability for that class.
   • m_j is the number of distinct levels in predictor j.
   • m_k is the weighted number of observations in class k.
Version History Introduced in R2014b R2023b: Naive Bayes models support standardization of kernel-distributed predictors fitcnb supports the standardization of predictors with kernel distributions. That is, you can specify the Standardize name-value argument as true when the DistributionNames name-value argument includes at least one "kernel" distribution. Naive Bayes models include Mu and Sigma properties that contain the means and standard deviations, respectively, used to standardize the predictors before training. The properties are empty when fitcnb does not perform any standardization.
References [1] Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Springer Series in Statistics. New York, NY: Springer, 2009. https://doi.org/10.1007/978-0-387-84858-7. [2] Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval, NY: Cambridge University Press, 2008.
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: • The predict function supports code generation. • When you train a naive Bayes model by using fitcnb, the following restrictions apply. • The value of the 'DistributionNames' name-value pair argument cannot contain 'mn'. • The value of the 'ScoreTransform' name-value pair argument cannot be an anonymous function. For more information, see “Introduction to Code Generation” on page 34-3.
See Also ClassificationNaiveBayes | loss | predict | fitcnb Topics “Naive Bayes Classification” on page 22-2 “Grouping Variables” on page 2-11
CompactClassificationNeuralNetwork Compact neural network model for classification
Description CompactClassificationNeuralNetwork is a compact version of a ClassificationNeuralNetwork model object. The compact model does not include the data used for training the classifier. Therefore, you cannot perform some tasks, such as cross-validation, using the compact model. Use a compact model for tasks such as predicting the labels of new data.
Creation Create a CompactClassificationNeuralNetwork object from a full ClassificationNeuralNetwork model object by using compact.
Properties Neural Network Properties LayerSizes — Sizes of fully connected layers positive integer vector This property is read-only. Sizes of the fully connected layers in the neural network model, returned as a positive integer vector. The ith element of LayerSizes is the number of outputs in the ith fully connected layer of the neural network model. LayerSizes does not include the size of the final fully connected layer. This layer always has K outputs, where K is the number of classes in the response variable. Data Types: single | double LayerWeights — Learned layer weights cell array This property is read-only. Learned layer weights for the fully connected layers, returned as a cell array. The ith entry in the cell array corresponds to the layer weights for the ith fully connected layer. For example, Mdl.LayerWeights{1} returns the weights for the first fully connected layer of the model Mdl. LayerWeights includes the weights for the final fully connected layer. Data Types: cell LayerBiases — Learned layer biases cell array This property is read-only. 35-1062
Learned layer biases for the fully connected layers, returned as a cell array. The ith entry in the cell array corresponds to the layer biases for the ith fully connected layer. For example, Mdl.LayerBiases{1} returns the biases for the first fully connected layer of the model Mdl.
LayerBiases includes the biases for the final fully connected layer.
Data Types: cell

Activations — Activation functions for fully connected layers
'relu' | 'tanh' | 'sigmoid' | 'none' | cell array of character vectors
This property is read-only.
Activation functions for the fully connected layers of the neural network model, returned as a character vector or cell array of character vectors with values from this table.

• 'relu' : Rectified linear unit (ReLU) function — performs a threshold operation on each element of the input, where any value less than zero is set to zero, that is, f(x) = x for x ≥ 0 and f(x) = 0 for x < 0
• 'tanh' : Hyperbolic tangent (tanh) function — applies the tanh function to each input element
• 'sigmoid' : Sigmoid function — performs the operation f(x) = 1/(1 + e^(–x)) on each input element
• 'none' : Identity function — returns each input element without performing any transformation, that is, f(x) = x
• If Activations contains only one activation function, then it is the activation function for every fully connected layer of the neural network model, excluding the final fully connected layer. The activation function for the final fully connected layer is always softmax (OutputLayerActivation). • If Activations is an array of activation functions, then the ith element is the activation function for the ith layer of the neural network model. Data Types: char | cell OutputLayerActivation — Activation function for final fully connected layer 'softmax' This property is read-only. Activation function for the final fully connected layer, returned as 'softmax'. The function takes each input xi and returns the following, where K is the number of classes in the response variable: 35-1063
f(x_i) = \frac{\exp(x_i)}{\sum_{j=1}^{K} \exp(x_j)}.
The results correspond to the predicted classification scores (or posterior probabilities). Data Properties PredictorNames — Predictor variable names cell array of character vectors This property is read-only. Predictor variable names, returned as a cell array of character vectors. The order of the elements of PredictorNames corresponds to the order in which the predictor names appear in the training data. Data Types: cell CategoricalPredictors — Categorical predictor indices vector of positive integers | [] This property is read-only. Categorical predictor indices, returned as a vector of positive integers. Assuming that the predictor data contains observations in rows, CategoricalPredictors contains index values corresponding to the columns of the predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty ([]). Data Types: double ExpandedPredictorNames — Expanded predictor names cell array of character vectors This property is read-only. Expanded predictor names, returned as a cell array of character vectors. If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames. Data Types: cell ClassNames — Unique class names numeric vector | categorical vector | logical vector | character array | cell array of character vectors This property is read-only. Unique class names used in training, returned as a numeric vector, categorical vector, logical vector, character array, or cell array of character vectors. ClassNames has the same data type as the class labels in the response variable used to train the model. (The software treats string arrays as cell arrays of character vectors.) ClassNames also determines the class order. Data Types: single | double | categorical | logical | char | cell Mu — Predictor means numeric vector | [] This property is read-only. 35-1064
Predictor means, returned as a numeric vector. If you set Standardize to 1 or true when you train the neural network model, then the length of the Mu vector is equal to the number of expanded predictors (see ExpandedPredictorNames). The vector contains 0 values for dummy variables corresponding to expanded categorical predictors. If you set Standardize to 0 or false when you train the neural network model, then the Mu value is an empty vector ([]). Data Types: double ResponseName — Response variable name character vector This property is read-only. Response variable name, returned as a character vector. Data Types: char Sigma — Predictor standard deviations numeric vector | [] This property is read-only. Predictor standard deviations, returned as a numeric vector. If you set Standardize to 1 or true when you train the neural network model, then the length of the Sigma vector is equal to the number of expanded predictors (see ExpandedPredictorNames). The vector contains 1 values for dummy variables corresponding to expanded categorical predictors. If you set Standardize to 0 or false when you train the neural network model, then the Sigma value is an empty vector ([]). Data Types: double Other Classification Properties Cost — Misclassification cost numeric square matrix Misclassification cost, returned as a numeric square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i. The cost matrix always has this form: Cost(i,j) = 1 if i ~= j, and Cost(i,j) = 0 if i = j. The rows correspond to the true class and the columns correspond to the predicted class. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The software uses the Cost value for prediction, but not training. You can change the Cost property value of the trained model by using dot notation. Data Types: double Prior — Prior class probabilities numeric vector This property is read-only. Prior class probabilities, returned as a numeric vector. The order of the elements of Prior corresponds to the elements of ClassNames. 35-1065
Data Types: double ScoreTransform — Score transformation character vector | function handle Score transformation, specified as a character vector or function handle. ScoreTransform represents a built-in transformation function or a function handle for transforming predicted classification scores. To change the score transformation function to function, for example, use dot notation. • For a built-in function, enter a character vector. Mdl.ScoreTransform = 'function';
This table describes the available built-in functions.

• 'doublelogit' : 1/(1 + e^(–2x))
• 'invlogit' : log(x / (1 – x))
• 'ismax' : Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
• 'logit' : 1/(1 + e^(–x))
• 'none' or 'identity' : x (no transformation)
• 'sign' : –1 for x < 0, 0 for x = 0, 1 for x > 0
• 'symmetric' : 2x – 1
• 'symmetricismax' : Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
• 'symmetriclogit' : 2/(1 + e^(–x)) – 1
• For a MATLAB function or a function that you define, enter its function handle. Mdl.ScoreTransform = @function;
function must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores). Data Types: char | function_handle
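For example, the following sketch (an assumption for illustration; the anonymous function happens to match the built-in 'logit' formula in the table above) sets a custom transformation on a trained neural network classifier Mdl:

Mdl.ScoreTransform = @(x) 1./(1 + exp(-x));   % transform raw scores elementwise
% The handle must accept a matrix of scores and return a matrix of the same size.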
Object Functions Interpret Prediction lime partialDependence plotPartialDependence shapley 35-1066
Local interpretable model-agnostic explanations (LIME) Compute partial dependence Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots Shapley values
Assess Predictive Performance on New Observations edge loss margin predict
Classification edge for neural network classifier Classification loss for neural network classifier Classification margins for neural network classifier Classify observations using neural network classifier
Compare Accuracies compareHoldout testckfold
Compare accuracies of two classification models using new data Compare accuracies of two classification models by repeated cross-validation
Examples Reduce Size of Neural Network Classifier Reduce the size of a full neural network classifier by removing the training data from the model. You can use a compact model to improve memory efficiency. Load the patients data set. Create a table from the data set. Each row corresponds to one patient, and each column corresponds to a diagnostic variable. Use the Smoker variable as the response variable, and the rest of the variables as predictors. load patients tbl = table(Diastolic,Systolic,Gender,Height,Weight,Age,Smoker);
Train a neural network classifier using the data. Specify the Smoker column of tbl as the response variable. Specify to standardize the numeric predictors.
Mdl = fitcnet(tbl,"Smoker","Standardize",true)
Mdl =
  ClassificationNeuralNetwork
           PredictorNames: {'Diastolic' 'Systolic' 'Gender' 'Height' 'Weight' 'Age'}
             ResponseName: 'Smoker'
    CategoricalPredictors: 3
               ClassNames: [0 1]
           ScoreTransform: 'none'
          NumObservations: 100
               LayerSizes: 10
              Activations: 'relu'
    OutputLayerActivation: 'softmax'
                   Solver: 'LBFGS'
          ConvergenceInfo: [1x1 struct]
          TrainingHistory: [36x7 table]
Mdl is a full ClassificationNeuralNetwork model object. Reduce the size of the model by using compact. compactMdl = compact(Mdl) compactMdl = CompactClassificationNeuralNetwork
LayerSizes: 10 Activations: 'relu' OutputLayerActivation: 'softmax'
compactMdl is a CompactClassificationNeuralNetwork model object. compactMdl contains fewer properties than the full model Mdl.
Display the amount of memory used by each neural network model.
whos("Mdl","compactMdl")
  Name            Size      Bytes    Class
  Mdl             1x1       19105    ClassificationNeuralNetwork
  compactMdl      1x1        6832    classreg.learning.classif.CompactClassificationNeuralNet
The full model is larger than the compact model.
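As a short follow-on sketch that is not part of the original example, the compact model can classify new observations in place of the full model:

labels = predict(compactMdl,tbl(1:5,:))   % predict smoker status for the first five patients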
Version History Introduced in R2021a R2023b: Neural network models include standardization properties Neural network models include Mu and Sigma properties that contain the means and standard deviations, respectively, used to standardize the predictors before training. The properties are empty when the fitting function does not perform any standardization. R2023a: Neural network classifiers support misclassification costs and prior probabilities fitcnet supports misclassification costs and prior probabilities for neural network classifiers. Specify the Cost and Prior name-value arguments when you create a model. Alternatively, you can specify misclassification costs after training a model by using dot notation to change the Cost property value of the model. Mdl.Cost = [0 2; 1 0];
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: • The predict object function supports code generation. For more information, see “Introduction to Code Generation” on page 34-3.
See Also fitcnet | predict | loss | margin | edge | ClassificationPartitionedModel | ClassificationNeuralNetwork | compact Topics “Assess Neural Network Classifier Performance” on page 19-151
CompactClassificationGAM Compact generalized additive model (GAM) for binary classification
Description CompactClassificationGAM is a compact version of a ClassificationGAM model object (GAM for binary classification). The compact model does not include the data used for training the classifier. Therefore, you cannot perform some tasks, such as cross-validation, using the compact model. Use a compact model for tasks such as predicting the labels of new data.
Creation Create a CompactClassificationGAM object from a full ClassificationGAM model object by using compact.
Properties GAM Properties Interactions — Interaction term indices two-column matrix of positive integers | [] This property is read-only. Interaction term indices, specified as a t-by-2 matrix of positive integers, where t is the number of interaction terms in the model. Each row of the matrix represents one interaction term and contains the column indexes of the predictor data X for the interaction term. If the model does not include an interaction term, then this property is empty ([]). The software adds interaction terms to the model in the order of importance based on the p-values. Use this property to check the order of the interaction terms added to the model. Data Types: double Intercept — Intercept term of model numeric scalar This property is read-only. Intercept (constant) term of the model, which is the sum of the intercept terms in the predictor trees and interaction trees, specified as a numeric scalar. Data Types: single | double Other Classification Properties CategoricalPredictors — Categorical predictor indices vector of positive integers | [] This property is read-only. 35-1070
Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]). Data Types: double ClassNames — Unique class labels categorical array | character array | logical vector | numeric vector | cell array of character vectors This property is read-only. Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as the class labels Y. (The software treats string arrays as cell arrays of character vectors.) ClassNames also determines the class order. Data Types: single | double | logical | char | cell | categorical Cost — Misclassification costs 2-by-2 numeric matrix Misclassification costs, specified as a 2-by-2 numeric matrix. Cost(i,j) is the cost of classifying a point into class j if its true class is i. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The software uses the Cost value for prediction, but not training. You can change the value by using dot notation. Example: Mdl.Cost = C; Data Types: double ExpandedPredictorNames — Expanded predictor names cell array of character vectors This property is read-only. Expanded predictor names, specified as a cell array of character vectors. ExpandedPredictorNames is the same as PredictorNames for a generalized additive model. Data Types: cell PredictorNames — Predictor variable names cell array of character vectors This property is read-only. Predictor variable names, specified as a cell array of character vectors. The order of the elements in PredictorNames corresponds to the order in which the predictor names appear in the training data. Data Types: cell Prior — Prior class probabilities numeric vector 35-1071
This property is read-only. Prior class probabilities, specified as a numeric vector with two elements. The order of the elements corresponds to the order of the elements in ClassNames. Data Types: double ResponseName — Response variable name character vector This property is read-only. Response variable name, specified as a character vector. Data Types: char ScoreTransform — Score transformation character vector | function handle Score transformation, specified as a character vector or function handle. ScoreTransform represents a built-in transformation function or a function handle for transforming predicted classification scores. To change the score transformation function to function, for example, use dot notation. • For a built-in function, enter a character vector. Mdl.ScoreTransform = 'function';
This table describes the available built-in functions.

Value                    Description
'doublelogit'            1/(1 + e^(–2x))
'invlogit'               log(x / (1 – x))
'ismax'                  Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
'logit'                  1/(1 + e^(–x))
'none' or 'identity'     x (no transformation)
'sign'                   –1 for x < 0
                         0 for x = 0
                         1 for x > 0
'symmetric'              2x – 1
'symmetricismax'         Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
'symmetriclogit'         2/(1 + e^(–x)) – 1
• For a MATLAB function or a function that you define, enter its function handle. Mdl.ScoreTransform = @function;
function must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores). 35-1072
This property determines the output score computation for object functions such as predict, margin, and edge. Use 'logit' to compute posterior probabilities, and use 'none' to compute the logit of posterior probabilities. Data Types: char | function_handle
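For example, a minimal sketch of how the transform changes the second output of predict (assuming a trained model Mdl, such as the one created in the example below, and new predictor data XNew with the same columns as the training data):

Mdl.ScoreTransform = 'logit';     % scores returned by predict are posterior probabilities
[labels,posterior] = predict(Mdl,XNew);

Mdl.ScoreTransform = 'none';      % scores returned by predict are logits of the posterior probabilities
[labels,logitScores] = predict(Mdl,XNew);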
Object Functions

Interpret Prediction
lime                     Local interpretable model-agnostic explanations (LIME)
partialDependence        Compute partial dependence
plotLocalEffects         Plot local effects of terms in generalized additive model (GAM)
plotPartialDependence    Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
shapley                  Shapley values

Assess Predictive Performance on New Observations
predict                  Classify observations using generalized additive model (GAM)
loss                     Classification loss for generalized additive model (GAM)
margin                   Classification margins for generalized additive model (GAM)
edge                     Classification edge for generalized additive model (GAM)

Compare Accuracies
compareHoldout           Compare accuracies of two classification models using new data
Examples Reduce Size of Generalized Additive Model Reduce the size of a full generalized additive model (GAM) by removing the training data. Full models hold the training data. You can use a compact model to improve memory efficiency. Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g'). load ionosphere
Train a GAM using the predictors X and class labels Y. A recommended practice is to specify the class names.

Mdl = fitcgam(X,Y,'ClassNames',{'b','g'})

Mdl = 
  ClassificationGAM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b'  'g'}
           ScoreTransform: 'logit'
                Intercept: 2.2715
          NumObservations: 351
Mdl is a ClassificationGAM model object.

Reduce the size of the classifier.

CMdl = compact(Mdl)

CMdl = 
  CompactClassificationGAM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b'  'g'}
           ScoreTransform: 'logit'
                Intercept: 2.2715
CMdl is a CompactClassificationGAM model object.

Display the amount of memory used by each classifier.

whos('Mdl','CMdl')

  Name      Size      Bytes       Class
  CMdl      1x1       1030188     classreg.learning.classif.CompactClassificationGAM
  Mdl       1x1       1231165     ClassificationGAM
The full classifier (Mdl) is larger than the compact classifier (CMdl). To efficiently label new observations, you can remove Mdl from the MATLAB® Workspace, and then pass CMdl and new predictor values to predict.
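For example, a minimal sketch (here reclassifying the training predictors X purely for illustration):

labels = predict(CMdl,X);   % cell array of predicted 'b' and 'g' labels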
Version History Introduced in R2021a
See Also ClassificationGAM | compact Topics “Train Generalized Additive Model for Binary Classification” on page 12-77
CompactClassificationSVM Compact support vector machine (SVM) for one-class and binary classification
Description CompactClassificationSVM is a compact version of the support vector machine (SVM) classifier. The compact classifier does not include the data used for training the SVM classifier. Therefore, you cannot perform some tasks, such as cross-validation, using the compact classifier. Use a compact SVM classifier for tasks such as predicting the labels of new data.
Creation Create a CompactClassificationSVM model from a full, trained ClassificationSVM classifier by using compact.
Properties SVM Properties Alpha — Trained classifier coefficients numeric vector This property is read-only. Trained classifier coefficients, specified as an s-by-1 numeric vector. s is the number of support vectors in the trained classifier, sum(Mdl.IsSupportVector). Alpha contains the trained classifier coefficients from the dual problem, that is, the estimated Lagrange multipliers. If you remove duplicates by using the RemoveDuplicates name-value pair argument of fitcsvm, then for a given set of duplicate observations that are support vectors, Alpha contains one coefficient corresponding to the entire set. That is, MATLAB attributes a nonzero coefficient to one observation from the set of duplicates and a coefficient of 0 to all other duplicate observations in the set. Data Types: single | double Beta — Linear predictor coefficients numeric vector This property is read-only. Linear predictor coefficients, specified as a numeric vector. The length of Beta is equal to the number of predictors used to train the model. MATLAB expands categorical variables in the predictor data using full dummy encoding. That is, MATLAB creates one dummy variable for each level of each categorical variable. Beta stores one value for each predictor variable, including the dummy variables. For example, if there are three predictors, one of which is a categorical variable with three levels, then Beta is a numeric vector containing five values. 35-1075
If KernelParameters.Function is 'linear', then the classification score for the observation x is

f(x) = (x/s)′β + b.

Mdl stores β, b, and s in the properties Beta, Bias, and KernelParameters.Scale, respectively. To estimate classification scores manually, you must first apply any transformations to the predictor data that were applied during training. Specifically, if you specify 'Standardize',true when using fitcsvm, then you must standardize the predictor data manually by using the mean Mdl.Mu and standard deviation Mdl.Sigma, and then divide the result by the kernel scale in Mdl.KernelParameters.Scale. All SVM functions, such as resubPredict and predict, apply any required transformation before estimation. If KernelParameters.Function is not 'linear', then Beta is empty ([]). Data Types: single | double Bias — Bias term scalar This property is read-only. Bias term, specified as a scalar. Data Types: single | double KernelParameters — Kernel parameters structure array This property is read-only. Kernel parameters, specified as a structure array. The kernel parameters property contains the fields listed in this table.

Field       Description
Function    Kernel function used to compute the elements of the Gram matrix on page 35-2498. For details, see 'KernelFunction'.
Scale       Kernel scale parameter used to scale all elements of the predictor data on which the model is trained. For details, see 'KernelScale'.
To display the values of KernelParameters, use dot notation. For example, Mdl.KernelParameters.Scale displays the kernel scale parameter value. The software accepts KernelParameters as inputs and does not modify them. Data Types: struct SupportVectorLabels — Support vector class labels s-by-1 numeric vector This property is read-only. Support vector class labels, specified as an s-by-1 numeric vector. s is the number of support vectors in the trained classifier, sum(Mdl.IsSupportVector). 35-1076
A value of +1 in SupportVectorLabels indicates that the corresponding support vector is in the positive class (ClassNames{2}). A value of –1 indicates that the corresponding support vector is in the negative class (ClassNames{1}). If you remove duplicates by using the RemoveDuplicates name-value pair argument of fitcsvm, then for a given set of duplicate observations that are support vectors, SupportVectorLabels contains one unique support vector label. Data Types: single | double SupportVectors — Support vectors s-by-p numeric matrix This property is read-only. Support vectors in the trained classifier, specified as an s-by-p numeric matrix. s is the number of support vectors in the trained classifier, sum(Mdl.IsSupportVector), and p is the number of predictor variables in the predictor data. SupportVectors contains rows of the predictor data X that MATLAB considers to be support vectors. If you specify 'Standardize',true when training the SVM classifier using fitcsvm, then SupportVectors contains the standardized rows of X. If you remove duplicates by using the RemoveDuplicates name-value pair argument of fitcsvm, then for a given set of duplicate observations that are support vectors, SupportVectors contains one unique support vector. Data Types: single | double Other Classification Properties CategoricalPredictors — Categorical predictor indices vector of positive integers | [] This property is read-only. Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]). Data Types: double ClassNames — Unique class labels categorical array | character array | logical vector | numeric vector | cell array of character vectors This property is read-only. Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as the class labels Y. (The software treats string arrays as cell arrays of character vectors.) ClassNames also determines the class order. Data Types: single | double | logical | char | cell | categorical Cost — Misclassification cost numeric square matrix 35-1077
This property is read-only. Misclassification cost, specified as a numeric square matrix. • For two-class learning, the Cost property stores the misclassification cost matrix specified by the Cost name-value argument of the fitting function. The rows correspond to the true class and the columns correspond to the predicted class. That is, Cost(i,j) is the cost of classifying a point into class j if its true class is i. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. • For one-class learning, Cost = 0. Data Types: double ExpandedPredictorNames — Expanded predictor names cell array of character vectors This property is read-only. Expanded predictor names, specified as a cell array of character vectors. If the model uses dummy variable encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames. Data Types: cell Mu — Predictor means numeric vector | [] This property is read-only. Predictor means, specified as a numeric vector. If you specify 'Standardize',1 or 'Standardize',true when you train an SVM classifier using fitcsvm, the length of Mu is equal to the number of predictors. MATLAB expands categorical variables in the predictor data using dummy variables. Mu stores one value for each predictor variable, including the dummy variables. However, MATLAB does not standardize the columns that contain categorical variables. If you set 'Standardize',false when you train the SVM classifier using fitcsvm, then Mu is an empty vector ([]). Data Types: single | double PredictorNames — Predictor variable names cell array of character vectors This property is read-only. Predictor variable names, specified as a cell array of character vectors. The order of the elements in PredictorNames corresponds to the order in which the predictor names appear in the training data. Data Types: cell Prior — Prior probabilities numeric vector 35-1078
This property is read-only. Prior probabilities for each class, specified as a numeric vector. For two-class learning, if you specify a cost matrix, then the software updates the prior probabilities by incorporating the penalties described in the cost matrix. • For two-class learning, the software normalizes the prior probabilities specified by the Prior name-value argument of the fitting function so that the probabilities sum to 1. The Prior property stores the normalized prior probabilities. The order of the elements of Prior corresponds to the elements of Mdl.ClassNames. • For one-class learning, Prior = 1. Data Types: single | double ScoreTransform — Score transformation character vector | function handle Score transformation, specified as a character vector or function handle. ScoreTransform represents a built-in transformation function or a function handle for transforming predicted classification scores. To change the score transformation function to function, for example, use dot notation. • For a built-in function, enter a character vector. Mdl.ScoreTransform = 'function';
This table describes the available built-in functions.

Value                    Description
'doublelogit'            1/(1 + e^(–2x))
'invlogit'               log(x / (1 – x))
'ismax'                  Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
'logit'                  1/(1 + e^(–x))
'none' or 'identity'     x (no transformation)
'sign'                   –1 for x < 0
                         0 for x = 0
                         1 for x > 0
'symmetric'              2x – 1
'symmetricismax'         Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
'symmetriclogit'         2/(1 + e^(–x)) – 1
• For a MATLAB function or a function that you define, enter its function handle. Mdl.ScoreTransform = @function;
function must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores). 35-1079
Data Types: char | function_handle Sigma — Predictor standard deviations [] (default) | numeric vector This property is read-only. Predictor standard deviations, specified as a numeric vector. If you specify 'Standardize',true when you train the SVM classifier using fitcsvm, the length of Sigma is equal to the number of predictor variables. MATLAB expands categorical variables in the predictor data using dummy variables. Sigma stores one value for each predictor variable, including the dummy variables. However, MATLAB does not standardize the columns that contain categorical variables. If you set 'Standardize',false when you train the SVM classifier using fitcsvm, then Sigma is an empty vector ([]). Data Types: single | double
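As described for the Beta property, you can reproduce the scores of a linear-kernel model by standardizing new data with Mu and Sigma, applying the kernel scale, and then using Beta and Bias. A minimal sketch, assuming a compact model Mdl trained with 'Standardize',true and new observations XNew whose columns match the training predictors:

sigma = Mdl.Sigma;
sigma(sigma==0) = 1;                  % guard columns that were not standardized (assumption)
Z = (XNew - Mdl.Mu)./sigma;           % standardize as during training
Z = Z./Mdl.KernelParameters.Scale;    % divide by the kernel scale
f = Z*Mdl.Beta + Mdl.Bias;            % f(x) = (x/s)'*beta + b
[~,scores] = predict(Mdl,XNew);       % scores(:,2) corresponds to f for the positive class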
Object Functions

compareHoldout           Compare accuracies of two classification models using new data
discardSupportVectors    Discard support vectors for linear support vector machine (SVM) classifier
edge                     Find classification edge for support vector machine (SVM) classifier
fitPosterior             Fit posterior probabilities for compact support vector machine (SVM) classifier
gather                   Gather properties of Statistics and Machine Learning Toolbox object from GPU
incrementalLearner       Convert binary classification support vector machine (SVM) model to incremental learner
lime                     Local interpretable model-agnostic explanations (LIME)
loss                     Find classification error for support vector machine (SVM) classifier
margin                   Find classification margins for support vector machine (SVM) classifier
partialDependence        Compute partial dependence
plotPartialDependence    Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
predict                  Classify observations using support vector machine (SVM) classifier
shapley                  Shapley values
update                   Update model parameters for code generation
Examples Reduce Size of SVM Classifier Reduce the size of a full support vector machine (SVM) classifier by removing the training data. Full SVM classifiers (that is, ClassificationSVM classifiers) hold the training data. To improve efficiency, use a smaller classifier. Load the ionosphere data set. load ionosphere
Train an SVM classifier. Standardize the predictor data and specify the order of the classes.

SVMModel = fitcsvm(X,Y,'Standardize',true,...
    'ClassNames',{'b','g'})

SVMModel = 
  ClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b'  'g'}
           ScoreTransform: 'none'
          NumObservations: 351
                    Alpha: [90x1 double]
                     Bias: -0.1343
         KernelParameters: [1x1 struct]
                       Mu: [0.8917 0 0.6413 0.0444 0.6011 0.1159 0.5501 0.1194 0.5118 0.1813 0.47 ...
                    Sigma: [0.3112 0 0.4977 0.4414 0.5199 0.4608 0.4927 0.5207 0.5071 0.4839 0.56 ...
           BoxConstraints: [351x1 double]
          ConvergenceInfo: [1x1 struct]
          IsSupportVector: [351x1 logical]
                   Solver: 'SMO'
SVMModel is a ClassificationSVM classifier.

Reduce the size of the SVM classifier.

CompactSVMModel = compact(SVMModel)

CompactSVMModel = 
  CompactClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b'  'g'}
           ScoreTransform: 'none'
                    Alpha: [90x1 double]
                     Bias: -0.1343
         KernelParameters: [1x1 struct]
                       Mu: [0.8917 0 0.6413 0.0444 0.6011 0.1159 0.5501 0.1194 0.5118 0.1813 0.47 ...
                    Sigma: [0.3112 0 0.4977 0.4414 0.5199 0.4608 0.4927 0.5207 0.5071 0.4839 0.56 ...
           SupportVectors: [90x34 double]
      SupportVectorLabels: [90x1 double]
CompactSVMModel is a CompactClassificationSVM classifier.

Display the amount of memory used by each classifier.

whos('SVMModel','CompactSVMModel')

  Name               Size     Bytes      Class
  CompactSVMModel    1x1      31227      classreg.learning.classif.CompactClassificationSVM
  SVMModel           1x1      141317     ClassificationSVM
The full SVM classifier (SVMModel) is more than four times larger than the compact SVM classifier (CompactSVMModel). 35-1081
To label new observations efficiently, you can remove SVMModel from the MATLAB® Workspace, and then pass CompactSVMModel and new predictor values to predict. To further reduce the size of the compact SVM classifier, use the discardSupportVectors function to discard support vectors.
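For example, a minimal sketch (applicable here because the model uses the default linear kernel, which discardSupportVectors requires):

CompactSVMModel = discardSupportVectors(CompactSVMModel);
whos('CompactSVMModel')   % Alpha, SupportVectors, and SupportVectorLabels are now empty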
Train and Cross-Validate SVM Classifier Load the ionosphere data set. load ionosphere
Train and cross-validate an SVM classifier. Standardize the predictor data and specify the order of the classes.

rng(1); % For reproducibility
CVSVMModel = fitcsvm(X,Y,'Standardize',true,...
    'ClassNames',{'b','g'},'CrossVal','on')

CVSVMModel = 
  ClassificationPartitionedModel
    CrossValidatedModel: 'SVM'
         PredictorNames: {'x1'  'x2'  'x3'  'x4'  'x5'  'x6'  'x7'  'x8'  'x9'  'x10'  'x11'  ...
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'

CVSVMModel is a ClassificationPartitionedModel cross-validated SVM classifier. By default, the software implements 10-fold cross-validation.

Alternatively, you can cross-validate a trained ClassificationSVM classifier by passing it to crossval.

Inspect one of the trained folds using dot notation.

CVSVMModel.Trained{1}

ans = 
  CompactClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b'  'g'}
           ScoreTransform: 'none'
                    Alpha: [78x1 double]
                     Bias: -0.2209
         KernelParameters: [1x1 struct]
                       Mu: [0.8888 0 0.6320 0.0406 0.5931 0.1205 0.5361 0.1286 0.5083 0.1879 0.47 ...
                    Sigma: [0.3149 0 0.5033 0.4441 0.5255 0.4663 0.4987 0.5205 0.5040 0.4780 0.56 ...
           SupportVectors: [78x34 double]
      SupportVectorLabels: [78x1 double]
Each fold is a CompactClassificationSVM classifier trained on 90% of the data. Estimate the generalization error. genError = kfoldLoss(CVSVMModel) genError = 0.1168
On average, the generalization error is approximately 12%.
Version History Introduced in R2014a R2022a: Cost property stores the user-specified cost matrix Behavior changed in R2022a Starting in R2022a, the Cost property stores the user-specified cost matrix, so that you can compute the observed misclassification cost using the specified cost value. The software stores normalized prior probabilities (Prior) that do not reflect the penalties described in the cost matrix. To compute the observed misclassification cost, specify the LossFun name-value argument as "classifcost" when you call the loss function. Note that model training has not changed and, therefore, the decision boundaries between classes have not changed. For training, the fitting function updates the specified prior probabilities by incorporating the penalties described in the specified cost matrix, and then normalizes the prior probabilities and observation weights. This behavior has not changed. In previous releases, the software stored the default cost matrix in the Cost property and stored the prior probabilities used for training in the Prior property. Starting in R2022a, the software stores the user-specified cost matrix without modification, and stores normalized prior probabilities that do not reflect the cost penalties. For more details, see “Misclassification Cost Matrix, Prior Probabilities, and Observation Weights” on page 198. Some object functions use the Cost and Prior properties: • The loss function uses the cost matrix stored in the Cost property if you specify the LossFun name-value argument as "classifcost" or "mincost". • The loss and edge functions use the prior probabilities stored in the Prior property to normalize the observation weights of the input data. If you specify a nondefault cost matrix when you train a classification model, the object functions return a different value compared to previous releases. If you want the software to handle the cost matrix, prior probabilities, and observation weights in the same way as in previous releases, adjust the prior probabilities and observation weights for the nondefault cost matrix, as described in “Adjust Prior Probabilities and Observation Weights for Misclassification Cost Matrix” on page 19-9. Then, when you train a classification model, specify the adjusted prior probabilities and observation weights by using the Prior and Weights name-value arguments, respectively, and use the default cost matrix. 35-1083
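For example, a minimal sketch of computing the observed misclassification cost with a nondefault cost matrix; the training and test arrays (XTrain, YTrain, XTest, YTest) and the cost values are illustrative assumptions:

Mdl = fitcsvm(XTrain,YTrain,'ClassNames',{'b','g'},'Cost',[0 1; 5 0]);
CMdl = compact(Mdl);
L = loss(CMdl,XTest,YTest,'LossFun','classifcost');   % uses the stored Cost property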
References [1] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, Second Edition. NY: Springer, 2008. [2] Scholkopf, B., J. C. Platt, J. C. Shawe-Taylor, A. J. Smola, and R. C. Williamson. “Estimating the Support of a High-Dimensional Distribution.” Neural Computation. Vol. 13, Number 7, 2001, pp. 1443–1471. [3] Christianini, N., and J. C. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, UK: Cambridge University Press, 2000. [4] Scholkopf, B., and A. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond, Adaptive Computation and Machine Learning. Cambridge, MA: The MIT Press, 2002.
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: • The predict and update functions support code generation. • To integrate the prediction of an SVM classification model into Simulink, you can use the ClassificationSVM Predict block in the Statistics and Machine Learning Toolbox library or a MATLAB Function block with the predict function. • When you train an SVM model by using fitcsvm, the following restrictions apply. • The value of the 'ScoreTransform' name-value pair argument cannot be an anonymous function. For generating code that predicts posterior probabilities given new observations, pass a trained SVM model to fitPosterior or fitSVMPosterior. The ScoreTransform property of the returned model contains an anonymous function that represents the score-toposterior-probability function and is configured for code generation. • For fixed-point code generation, the value of the 'ScoreTransform' name-value pair argument cannot be 'invlogit'. Also, the value of the 'KernelFunction' name-value pair argument must be 'gaussian', 'linear', or 'polynomial'. • For fixed-point code generation and code generation with a coder configurer, the following additional restrictions apply. • Categorical predictors (logical, categorical, char, string, or cell) are not supported. You cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using dummyvar before fitting the model. • Class labels with the categorical data type are not supported. Both the class label value in the training data (Tbl or Y) and the value of the ClassNames name-value argument cannot be an array with the categorical data type. For more information, see “Introduction to Code Generation” on page 34-3. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. 35-1084
Usage notes and limitations: • The following object functions fully support GPU arrays: • discardSupportVectors • fitPosterior • gather • The following object functions offer limited support for GPU arrays: • compareHoldout • edge • loss • margin • partialDependence • plotPartialDependence • predict For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also ClassificationSVM | fitcsvm | compact | discardSupportVectors Topics Using Support Vector Machines on page 25-6 Understanding Support Vector Machines on page 25-2
CompactClassificationTree Package: classreg.learning.classif Compact classification tree
Description Compact version of a classification tree (of class ClassificationTree). The compact version does not include the data for training the classification tree. Therefore, you cannot perform some tasks with a compact classification tree, such as cross validation. Use a compact classification tree for making predictions (classifications) of new data.
Construction ctree = compact(tree) constructs a compact decision tree from a full decision tree. Input Arguments tree A decision tree constructed using fitctree.
Properties CategoricalPredictors Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]). CategoricalSplit An n-by-2 cell array, where n is the number of categorical splits in tree. Each row in CategoricalSplit gives left and right values for a categorical split. For each branch node with categorical split j based on a categorical predictor variable z, the left child is chosen if z is in CategoricalSplit(j,1) and the right child is chosen if z is in CategoricalSplit(j,2). The splits are in the same order as nodes of the tree. Nodes for these splits can be found by running cuttype and selecting 'categorical' cuts from top to bottom. Children An n-by-2 array containing the numbers of the child nodes for each node in tree, where n is the number of nodes. Leaf nodes have child node 0. ClassCount An n-by-k array of class counts for the nodes in tree, where n is the number of nodes and k is the number of classes. For any node number i, the class counts ClassCount(i,:) are counts of observations (from the data used in fitting the tree) from each class satisfying the conditions for node i. 35-1086
ClassNames List of the elements in Y with duplicates removed. ClassNames can be a numeric vector, vector of categorical variables, logical vector, character array, or cell array of character vectors. ClassNames has the same data type as the data in the argument Y. (The software treats string arrays as cell arrays of character vectors.) If the value of a property has at least one dimension of length k, then ClassNames indicates the order of the elements along that dimension (e.g., Cost and Prior). ClassProbability An n-by-k array of class probabilities for the nodes in tree, where n is the number of nodes and k is the number of classes. For any node number i, the class probabilities ClassProbability(i,:) are the estimated probabilities for each class for a point satisfying the conditions for node i. Cost Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response. This property is read-only. CutCategories An n-by-2 cell array of the categories used at branches in tree, where n is the number of nodes. For each branch node i based on a categorical predictor variable x, the left child is chosen if x is among the categories listed in CutCategories{i,1}, and the right child is chosen if x is among those listed in CutCategories{i,2}. Both columns of CutCategories are empty for branch nodes based on continuous predictors and for leaf nodes. CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories. CutPoint An n-element vector of the values used as cut points in tree, where n is the number of nodes. For each branch node i based on a continuous predictor variable x, the left child is chosen if x < CutPoint(i), and the right child is chosen if x >= CutPoint(i). CutPoint is NaN for branch nodes based on categorical predictors and for leaf nodes. CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories. CutType An n-element cell array indicating the type of cut at each node in tree, where n is the number of nodes. For each node i, CutType{i} is: • 'continuous' — If the cut is defined in the form x < v for a variable x and cut point v. • 'categorical' — If the cut is defined by whether a variable x takes a value in a set of categories. • '' — If i is a leaf node.
CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories. CutPredictor An n-element cell array of the names of the variables used for branching in each node in tree, where n is the number of nodes. These variables are sometimes known as cut variables. For leaf nodes, CutPredictor contains an empty character vector. CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories. CutPredictorIndex An n-element array of numeric indices for the variables used for branching in each node in tree, where n is the number of nodes. For more information, see CutPredictor. ExpandedPredictorNames Expanded predictor names, stored as a cell array of character vectors. If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames. IsBranchNode An n-element logical vector that is true for each branch node and false for each leaf node of tree. NodeClass An n-element cell array with the names of the most probable classes in each node of tree, where n is the number of nodes in the tree. Every element of this array is a character vector equal to one of the class names in ClassNames. NodeError An n-element vector of the errors of the nodes in tree, where n is the number of nodes. NodeError(i) is the misclassification probability for node i. NodeProbability An n-element vector of the probabilities of the nodes in tree, where n is the number of nodes. The probability of a node is computed as the proportion of observations from the original data that satisfy the conditions for the node. This proportion is adjusted for any prior probabilities assigned to each class. NodeRisk An n-element vector of the risk of the nodes in the tree, where n is the number of nodes. The risk for each node is the measure of impurity (Gini index or deviance) for this node weighted by the node probability. If the tree is grown by twoing, the risk for each node is zero.
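A minimal sketch of inspecting these structural properties (using the fisheriris data only as an illustration):

load fisheriris
tree  = fitctree(meas,species);
ctree = compact(tree);
ctree.CutPredictor{1}    % name of the predictor used for the root split
ctree.CutPoint(1)        % cut point for the root split
ctree.Children(1,:)      % node numbers of the root node's children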
NodeSize An n-element vector of the sizes of the nodes in tree, where n is the number of nodes. The size of a node is defined as the number of observations from the data used to create the tree that satisfy the conditions for the node. NumNodes The number of nodes in tree. Parent An n-element vector containing the number of the parent node for each node in tree, where n is the number of nodes. The parent of the root node is 0. PredictorNames A cell array of names for the predictor variables, in the order in which they appear in X. Prior Numeric vector of prior probabilities for each class. The order of the elements of Prior corresponds to the order of the classes in ClassNames. The number of elements of Prior is the number of unique classes in the response. This property is read-only. PruneAlpha Numeric vector with one element per pruning level. If the pruning level ranges from 0 to M, then PruneAlpha has M + 1 elements sorted in ascending order. PruneAlpha(1) is for pruning level 0 (no pruning), PruneAlpha(2) is for pruning level 1, and so on. PruneList An n-element numeric vector with the pruning levels in each node of tree, where n is the number of nodes. The pruning levels range from 0 (no pruning) to M, where M is the distance between the deepest leaf and the root node. ResponseName Character vector describing the response variable Y. ScoreTransform Function handle for transforming scores, or character vector representing a built-in transformation function. 'none' means no transformation; equivalently, 'none' means @(x)x. For a list of built-in transformation functions and the syntax of custom transformation functions, see fitctree. Add or change a ScoreTransform function using dot notation: ctree.ScoreTransform = 'function' or ctree.ScoreTransform = @function
SurrogateCutCategories An n-element cell array of the categories used for surrogate splits in tree, where n is the number of nodes in tree. For each node k, SurrogateCutCategories{k} is a cell array. The length of SurrogateCutCategories{k} is equal to the number of surrogate predictors found at this node. Every element of SurrogateCutCategories{k} is either an empty character vector for a continuous surrogate predictor, or is a two-element cell array with categories for a categorical surrogate predictor. The first element of this two-element cell array lists categories assigned to the left child by this surrogate split and the second element of this two-element cell array lists categories assigned to the right child by this surrogate split. The order of the surrogate split variables at each node is matched to the order of variables in SurrogateCutVar. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogateCutCategories contains an empty cell. SurrogateCutFlip An n-element cell array of the numeric cut assignments used for surrogate splits in tree, where n is the number of nodes in tree. For each node k, SurrogateCutFlip{k} is a numeric vector. The length of SurrogateCutFlip{k} is equal to the number of surrogate predictors found at this node. Every element of SurrogateCutFlip{k} is either zero for a categorical surrogate predictor, or a numeric cut assignment for a continuous surrogate predictor. The numeric cut assignment can be either –1 or +1. For every surrogate split with a numeric cut C based on a continuous predictor variable Z, the left child is chosen if Z < C and the cut assignment for this surrogate split is +1, or if Z >= C and the cut assignment for this surrogate split is –1.
Determine Number of Clusters Using Cross-Validation For a given number of clusters, compute the cross-validated sum of squared distances between observations and their nearest cluster center. Compare the results for one through ten clusters. Load the fisheriris data set. X is the matrix meas, which contains flower measurements for 150 different flowers. load fisheriris X = meas;
Create the custom function clustf (shown at the end of this example). This function performs the following steps:
1  Standardize the training data.
2  Separate the training data into k clusters.
3  Transform the test data using the training data mean and standard deviation.
4  Compute the distance from each test data point to the nearest cluster center, or centroid.
5  Compute the sum of the squares of the distances.
Note: If you use the live script file for this example, the clustf function is already included at the end of the file. Otherwise, you need to create the function at the end of your .m file or add it as a file on the MATLAB® path. Create a for loop that specifies the number of clusters k for each iteration. For each fixed number of clusters, pass the corresponding clustf function to crossval. Because crossval performs 10-fold 35-1342
cross-validation by default, the software computes 10 sums of squared distances, one for each partition of training and test data. Take the sum of those values; the result is the cross-validated sum of squared distances for the given number of clusters.

rng('default') % For reproducibility
cvdist = zeros(10,1);
for k = 1:10
    fun = @(Xtrain,Xtest)clustf(Xtrain,Xtest,k);
    distances = crossval(fun,X);
    cvdist(k) = sum(distances);
end
Plot the cross-validated sum of squared distances for each number of clusters. plot(cvdist) xlabel('Number of Clusters') ylabel('CV Sum of Squared Distances')
In general, when determining how many clusters to use, consider the greatest number of clusters that corresponds to a significant decrease in the cross-validated sum of squared distances. For this example, using two or three clusters seems appropriate, but using more than three clusters does not.

This code creates the function clustf.

function distances = clustf(Xtrain,Xtest,k)
[Ztrain,Zmean,Zstd] = zscore(Xtrain);
[~,C] = kmeans(Ztrain,k); % Creates k clusters
Ztest = (Xtest-Zmean)./Zstd;
d = pdist2(C,Ztest,'euclidean','Smallest',1);
distances = sum(d.^2);
end
Compute Mean Absolute Error Using Cross-Validation Compute the mean absolute error of a regression model by using 10-fold cross-validation. Load the carsmall data set. Specify the Acceleration and Displacement variables as predictors and the Weight variable as the response. load carsmall X1 = Acceleration; X2 = Displacement; y = Weight;
Create the custom function regf (shown at the end of this example). This function fits a regression model to training data and then computes predicted car weights on a test set. The function compares the predicted car weight values to the true values, and then computes the mean absolute error (MAE) and the MAE adjusted to the range of the test set car weights. Note: If you use the live script file for this example, the regf function is already included at the end of the file. Otherwise, you need to create this function at the end of your .m file or add it as a file on the MATLAB® path. By default, crossval performs 10-fold cross-validation. For each of the 10 training and test set partitions of the data in X1, X2, and y, compute the MAE and adjusted MAE values using the regf function. Find the mean MAE and mean adjusted MAE.

rng('default') % For reproducibility
values = crossval(@regf,X1,X2,y)

values = 10×2

  319.2261    0.1132
  342.3722    0.1240
  214.3735    0.0902
  174.7247    0.1128
  189.4835    0.0832
  249.4359    0.1003
  194.4210    0.0845
  348.7437    0.1700
  283.1761    0.1187
  210.7444    0.1325

mean(values)

ans = 1×2

  252.6701    0.1129
This code creates the function regf. 35-1344
function errors = regf(X1train,X2train,ytrain,X1test,X2test,ytest)
tbltrain = table(X1train,X2train,ytrain, ...
    'VariableNames',{'Acceleration','Displacement','Weight'});
tbltest = table(X1test,X2test,ytest, ...
    'VariableNames',{'Acceleration','Displacement','Weight'});
mdl = fitlm(tbltrain,'Weight ~ Acceleration + Displacement');
yfit = predict(mdl,tbltest);
MAE = mean(abs(yfit-tbltest.Weight));
adjMAE = MAE/range(tbltest.Weight);
errors = [MAE adjMAE];
end
Compute Misclassification Error Using PCA and Cross-Validation Compute the misclassification error of a classification tree by using principal component analysis (PCA) and 5-fold cross-validation. Load the fisheriris data set. The meas matrix contains flower measurements for 150 different flowers. The species variable lists the species for each flower. load fisheriris
Create the custom function classf (shown at the end of this example). This function fits a classification tree to training data and then classifies test data. Use PCA inside the function to reduce the number of predictors used to create the tree model. Note: If you use the live script file for this example, the classf function is already included at the end of the file. Otherwise, you need to create this function at the end of your .m file or add it as a file on the MATLAB® path. Create a cvpartition object for stratified 5-fold cross-validation. By default, cvpartition ensures that training and test sets have roughly the same proportions of flower species. rng('default') % For reproducibility cvp = cvpartition(species,'KFold',5);
Compute the 5-fold cross-validation misclassification error for the classification tree with predictor data meas and response variable species. cvError = crossval('mcr',meas,species,'Predfun',@classf,'Partition',cvp) cvError = 0.1067
This code creates the function classf.

function yfit = classf(Xtrain,ytrain,Xtest)
% Standardize the training predictor data. Then, find the
% principal components for the standardized training predictor
% data.
[Ztrain,Zmean,Zstd] = zscore(Xtrain);
[coeff,scoreTrain,~,~,explained,mu] = pca(Ztrain);
% Find the lowest number of principal components that account
% for at least 95% of the variability.
n = find(cumsum(explained)>=95,1);
% Find the n principal component scores for the standardized
% training predictor data. Train a classification tree model
% using only these scores.
scoreTrain95 = scoreTrain(:,1:n);
mdl = fitctree(scoreTrain95,ytrain);
% Find the n principal component scores for the transformed
% test data. Classify the test data.
Ztest = (Xtest-Zmean)./Zstd;
scoreTest95 = (Ztest-mu)*coeff(:,1:n);
yfit = predict(mdl,scoreTest95);
end
Create Confusion Matrix Using Cross-Validation Create a confusion matrix from the 10-fold cross-validation results of a discriminant analysis model. Note: Use classify when training speed is a concern. Otherwise, use fitcdiscr to create a discriminant analysis model. For an example that shows the same workflow as this example, but uses fitcdiscr, see “Create Confusion Matrix Using Cross-Validation Predictions” on page 35-4436. Load the fisheriris data set. X contains flower measurements for 150 different flowers, and y lists the species for each flower. Create a variable order that specifies the order of the flower species. load fisheriris X = meas; y = species; order = unique(y) order = 3x1 cell {'setosa' } {'versicolor'} {'virginica' }
Create a function handle named func for a function that completes the following steps: • Take in training data (Xtrain and ytrain) and test data (Xtest and ytest). • Use the training data to create a discriminant analysis model that classifies new data (Xtest). Create this model and classify new data by using the classify function. • Compare the true test data classes (ytest) to the predicted test data values, and create a confusion matrix of the results by using the confusionmat function. Specify the class order by using 'Order',order. func = @(Xtrain,ytrain,Xtest,ytest)confusionmat(ytest, ... classify(Xtest,Xtrain,ytrain),'Order',order);
Create a cvpartition object for stratified 10-fold cross-validation. By default, cvpartition ensures that training and test sets have roughly the same proportions of flower species. 35-1346
rng('default') % For reproducibility
cvp = cvpartition(y,'Kfold',10);
Compute the 10 test set confusion matrices for each partition of the predictor data X and response variable y. Each row of confMat corresponds to the confusion matrix results for one test set. Aggregate the results and create the final confusion matrix cvMat.

confMat = crossval(func,X,y,'Partition',cvp);
cvMat = reshape(sum(confMat),3,3)

cvMat = 3×3

    50     0     0
     0    48     2
     0     1    49
Plot the confusion matrix as a confusion matrix chart by using confusionchart. confusionchart(cvMat,order)
Input Arguments criterion — Type of error estimate 'mse' | 'mcr' 35-1347
Type of error estimate, specified as either 'mse' or 'mcr'.

Value    Description
'mse'    Mean squared error (MSE) — Appropriate for regression algorithms only
'mcr'    Misclassification rate, or proportion of misclassified observations — Appropriate for classification algorithms only
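For example, a minimal sketch of each criterion; X and y stand for appropriate predictor and response data, and the anonymous prediction functions (based on regress and classify, as in the examples above) are illustrative:

% Regression: cross-validated mean squared error
mseErr = crossval('mse',X,y,'Predfun', ...
    @(Xtrain,ytrain,Xtest) Xtest*regress(ytrain,Xtrain));

% Classification: cross-validated misclassification rate
mcrErr = crossval('mcr',X,y,'Predfun', ...
    @(Xtrain,ytrain,Xtest) classify(Xtest,Xtrain,ytrain));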
X — Data set column vector | matrix | array Data set, specified as a column vector, matrix, or array. The rows of X correspond to observations, and the columns of X generally correspond to variables. If you pass multiple data sets X1,...,XN to crossval, then all data sets must have the same number of rows. Data Types: single | double | logical | char | string | cell | categorical y — Response data column vector | character array Response data, specified as a column vector or character array. The rows of y correspond to observations, and y must have the same number of rows as the predictor data X or X1,...,XN. Data Types: single | double | logical | char | string | cell | categorical predfun — Prediction function function handle Prediction function, specified as a function handle. You must create this function as an anonymous function, a function defined at the end of the .m or .mlx file containing the rest of your code, or a file on the MATLAB path. This table describes the required function syntax, given the type of predictor data passed to crossval.
Value: @myfunction
Predictor Data: X
Function Syntax:

function yfit = myfunction(Xtrain,ytrain,Xtest)
% Calculate predicted response
...
end

• Xtrain — Subset of the observations in X used as training predictor data. The function uses Xtrain and ytrain to construct a classification or regression model.
• ytrain — Subset of the responses in y used as training response data. The rows of ytrain correspond to the same observations in the rows of Xtrain. The function uses Xtrain and ytrain to construct a classification or regression model.
• Xtest — Subset of the observations in X used as test predictor data. The function uses Xtest and the model trained on Xtrain and ytrain to compute the predicted values yfit.
• yfit — Set of predicted values for observations in Xtest. The yfit values form a column vector with the same number of rows as Xtest.
Value: @myfunction
Predictor Data: X1,...,XN
Function Syntax:

function yfit = myfunction(X1train,...,XNtrain,ytrain,X1test,...,XNtest)
% Calculate predicted response
...
end
• X1train,...,XNtrain — Subsets of the predictor data in X1,...,XN, respectively, that are used as training predictor data. The rows of X1train,...,XNtrain correspond to the same observations. The function uses X1train,...,XNtrain and ytrain to construct a classification or regression model. • ytrain — Subset of the responses in y used as training response data. The rows of ytrain correspond to the same observations in the rows of X1train,...,XNtrain. The function uses X1train,...,XNtrain and ytrain to construct a classification or regression model. • X1test,...,XNtest — Subsets of the observations in X1,...,XN, respectively, that are used as test predictor data. The rows of X1test,...,XNtest correspond to the same observations. The function uses X1test,...,XNtest and the model trained on X1train,...,XNtrain and ytrain to compute the predicted values yfit. • yfit — Set of predicted values for observations in X1test,...,XNtest. The yfit values form a column vector with the same number of rows as X1test,...,XNtest. Example: @(Xtrain,ytrain,Xtest)(Xtest*regress(ytrain,Xtrain)); Data Types: function_handle fun — Function to cross-validate function handle Function to cross-validate, specified as a function handle. You must create this function as an anonymous function, a function defined at the end of the .m or .mlx file containing the rest of your code, or a file on the MATLAB path. This table describes the required function syntax, given the type of data passed to crossval.
Value: @myfunction
Data: X
Function Syntax:

function value = myfunction(Xtrain,Xtest)
% Calculation of value
...
end

• Xtrain — Subset of the observations in X used as training data. The function uses Xtrain to construct a model.
• Xtest — Subset of the observations in X used as test data. The function uses Xtest and the model trained on Xtrain to compute value.
• value — Quantity or variable. In most cases, value is a numeric scalar representing a loss estimate. value can also be an array, provided that the array size is the same for each partition of training and test data. If you want to return a variable output that can change size depending on the data partition, set value to be the cell scalar {output} instead.

Value: @myfunction
Data: X1,...,XN
Function Syntax:

function value = myfunction(X1train,...,XNtrain,X1test,...,XNtest)
% Calculation of value
...
end
• X1train,...,XNtrain — Subsets of the data in X1,...,XN, respectively, that are used as training data. The rows of X1train,...,XNtrain correspond to the same observations. The function uses X1train,...,XNtrain to construct a model. • X1test,...,XNtest — Subsets of the data in X1,...,XN, respectively, that are used as test data. The rows of X1test,...,XNtest correspond to the same observations. The function uses X1test,...,XNtest and the model trained on X1train,...,XNtrain to compute value. • value — Quantity or variable. In most cases, value is a numeric scalar representing a loss estimate. value can also be an array, provided that the array size is the same for each partition of training and test data. If you want to return a variable output that can change size depending on the data partition, set value to be the cell scalar {output} instead. Data Types: function_handle
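For example, a minimal sketch of a fun that returns one scalar per fold (illustrative only; it scores each test set against the training-set column means):

fun = @(Xtrain,Xtest) sum(sum((Xtest - mean(Xtrain,1)).^2));
values = crossval(fun,X);   % one cross-validated value per fold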
Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: crossval('mcr',meas,species,'Predfun',@classf,'KFold',5,'Stratify',species) specifies to compute the stratified 5-fold cross-validation misclassification rate for the classf function with predictor data meas and response variable species. Holdout — Fraction or number of observations used for holdout validation [] (default) | scalar value in the range (0,1) | positive integer scalar Fraction or number of observations used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1) or a positive integer scalar. • If the Holdout value p is a scalar in the range (0,1), then crossval randomly selects and reserves approximately p*100% of the observations as test data. • If the Holdout value p is a positive integer scalar, then crossval randomly selects and reserves p observations as test data. In either case, crossval then trains the model specified by either fun or predfun using the rest of the data. Finally, the function uses the test data along with the trained model to compute either values or err. You can use only one of these four name-value pair arguments: Holdout, KFold, Leaveout, and Partition. Example: 'Holdout',0.3 Example: 'Holdout',50 Data Types: single | double KFold — Number of folds 10 (default) | positive integer scalar greater than 1 Number of folds for k-fold cross-validation, specified as the comma-separated pair consisting of 'KFold' and a positive integer scalar greater than 1. If you specify 'KFold',k, then crossval randomly partitions the data into k sets. For each set, the function reserves the set as test data, and trains the model specified by either fun or predfun using the other k – 1 sets. crossval then uses the test data along with the trained model to compute either values or err. You can use only one of these four name-value pair arguments: Holdout, KFold, Leaveout, and Partition. Example: 'KFold',5 Data Types: single | double Leaveout — Leave-one-out cross-validation [] (default) | 1 35-1352
Leave-one-out cross-validation, specified as the comma-separated pair consisting of 'Leaveout' and 1. If you specify 'Leaveout',1, then for each observation, crossval reserves the observation as test data, and trains the model specified by either fun or predfun using the other observations. The function then uses the test observation along with the trained model to compute either values or err. You can use only one of these four name-value pair arguments: Holdout, KFold, Leaveout, and Partition. Example: 'Leaveout',1 Data Types: single | double MCReps — Number of Monte Carlo repetitions 1 (default) | positive integer scalar Number of Monte Carlo repetitions for validation, specified as the comma-separated pair consisting of 'MCReps' and a positive integer scalar. If the first input of crossval is 'mse' or 'mcr' (see criterion), then crossval returns the mean MSE or misclassification rate across all Monte Carlo repetitions. Otherwise, crossval concatenates the values from all Monte Carlo repetitions along the first dimension. If you specify both Partition and MCReps, then the first Monte Carlo repetition uses the partition information in the cvpartition object, and the software calls the repartition object function to generate new partitions for each of the remaining Monte Carlo repetitions. If the Leaveout value is 1, the Partition value is a cvpartition object of type 'leaveout' or 'resubstitution', or the Partition value is a custom cvpartition object (that is, the IsCustom property is set to 1), then the software sets the MCReps value to 1. Example: 'MCReps',5 Data Types: single | double Partition — Cross-validation partition [] (default) | cvpartition partition object Cross-validation partition, specified as the comma-separated pair consisting of 'Partition' and a cvpartition partition object created by cvpartition. The partition object specifies the type of cross-validation and the indexing for the training and test sets. When you use crossval, you cannot specify both Partition and Stratify. Instead, directly specify a stratified partition when you create the cvpartition partition object. You can use only one of these four name-value pair arguments: Holdout, KFold, Leaveout, and Partition. Stratify — Variable specifying groups used for stratification column vector Variable specifying the groups used for stratification, specified as the comma-separated pair consisting of 'Stratify' and a column vector with the same number of rows as the data X or X1,...,XN. 35-1353
When you specify Stratify, both the training and test sets have roughly the same class proportions as in the Stratify vector. The software treats NaNs, empty character vectors, empty strings, <missing> values, and <undefined> values in Stratify as missing data values, and ignores the corresponding rows of the data. A good practice is to use stratification when you use cross-validation with classification algorithms. Otherwise, some test sets might not include observations for all classes. When you use crossval, you cannot specify both Partition and Stratify. Instead, directly specify a stratified partition when you create the cvpartition partition object. Data Types: single | double | logical | string | cell | categorical Options — Options for running in parallel and setting random streams structure Options for running computations in parallel and setting random streams, specified as a structure. Create the Options structure with statset. This table lists the option fields and their values.

Field Name       Value                                                              Default
UseParallel      Set this value to true to run computations in parallel.           false
UseSubstreams    Set this value to true to run computations in parallel in a       false
                 reproducible manner. To compute reproducibly, set Streams to a
                 type that allows substreams: 'mlfg6331_64' or 'mrg32k3a'.
Streams          Specify this value as a RandStream object or a cell array         If you do not specify Streams, then
                 consisting of one such object.                                    crossval uses the default stream.
Note You need Parallel Computing Toolbox to run computations in parallel. Example: 'Options',statset('UseParallel',true) Data Types: struct
Output Arguments err — Mean squared error or misclassification rate numeric scalar Mean squared error or misclassification rate, returned as a numeric scalar. The type of error depends on the criterion value. values — Loss values column vector | matrix 35-1354
Loss values, returned as a column vector or matrix. Each row of values corresponds to the output of fun for one partition of training and test data. If the output returned by fun is multidimensional, then crossval reshapes the output and fits it into one row of values. For an example, see “Create Confusion Matrix Using Cross-Validation” on page 35-1346.
Tips
• A good practice is to use stratification (see Stratify) when you use cross-validation with classification algorithms. Otherwise, some test sets might not include observations for all classes.
Algorithms

General Cross-Validation Steps for predfun
When you use predfun, the crossval function typically performs 10-fold cross-validation as follows:
1  Split the observations in the predictor data X and the response variable y into 10 groups, each of which has approximately the same number of observations.
2  Use the last nine groups of observations to train a model as specified in predfun. Use the first group of observations as test data, pass the test predictor data to the trained model, and compute predicted values as specified in predfun. Compute the error specified by criterion.
3  Use the first group and the last eight groups of observations to train a model as specified in predfun. Use the second group of observations as test data, pass the test data to the trained model, and compute predicted values as specified in predfun. Compute the error specified by criterion.
4  Proceed in a similar manner until each group of observations is used as test data exactly once.
5  Return the mean error estimate as the scalar err.
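A minimal sketch of this predfun workflow (an illustration under assumed data, not one of the documented examples) computes a cross-validated mean squared error for a linear regression on the carsmall data:
load carsmall
ok = ~any(isnan([Weight Horsepower MPG]),2);                 % keep complete rows only
X = [ones(sum(ok),1) Weight(ok) Horsepower(ok)];             % include an intercept term
y = MPG(ok);
regf = @(XTrain,yTrain,XTest) XTest*regress(yTrain,XTrain);  % fit on training folds, predict on the test fold
cvMSE = crossval('mse',X,y,'Predfun',regf)
Here regf plays the role of predfun: it receives the training predictors and response plus the test predictors, and returns predicted values for the test observations.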
General Cross-Validation Steps for fun
When you use fun, the crossval function typically performs 10-fold cross-validation as follows:
1  Split the data in X into 10 groups, each of which has approximately the same number of observations.
2  Use the last nine groups of data to train a model as specified in fun. Use the first group of data as a test set, pass the test set to the trained model, and compute some value (for example, loss) as specified in fun.
3  Use the first group and the last eight groups of data to train a model as specified in fun. Use the second group of data as a test set, pass the test set to the trained model, and compute some value as specified in fun.
4  Proceed in a similar manner until each group of data is used as a test set exactly once.
5  Return the 10 computed values as the vector values.
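A brief sketch of the fun workflow (an illustration under assumed data, not one of the documented examples): each fold trains a classification tree on the training groups and returns the number of misclassified test observations, so crossval returns a 10-by-1 vector of counts.
load fisheriris
countErr = @(XTrain,yTrain,XTest,yTest) ...
    sum(~strcmp(yTest,predict(fitctree(XTrain,yTrain),XTest)));  % test-fold error count
foldErrors = crossval(countErr,meas,species)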
Alternative Functionality
Many classification and regression functions allow you to perform cross-validation directly.
• When you use fit functions such as fitcsvm, fitctree, and fitrtree, you can specify cross-validation options by using name-value pair arguments. Alternatively, you can first create models with these fit functions and then create a partitioned object by using the crossval object function. Use the kfoldLoss and kfoldPredict object functions to compute the loss and predicted values for the partitioned object. For more information, see ClassificationPartitionedModel and RegressionPartitionedModel.
• You can also specify cross-validation options when you perform lasso or elastic net regularization using lasso and lassoglm.
Version History Introduced in R2008a
Extended Capabilities Automatic Parallel Support Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™. To run in parallel, specify the Options name-value argument in the call to this function and set the UseParallel field of the options structure to true using statset: "Options",statset("UseParallel",true) For more information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).
See Also cvpartition | pca | regress | classify | kmeans | confusionmat Topics “Select Features for Classifying High-Dimensional Data” on page 16-157
crossval Cross-validate machine learning model
Syntax CVMdl = crossval(Mdl) CVMdl = crossval(Mdl,Name,Value)
Description CVMdl = crossval(Mdl) returns a cross-validated (partitioned) machine learning model (CVMdl) from a trained model (Mdl). By default, crossval uses 10-fold cross-validation on the training data. CVMdl = crossval(Mdl,Name,Value) sets an additional cross-validation option. You can specify only one name-value argument. For example, you can specify the number of folds or a holdout sample proportion.
Examples Cross-Validate SVM Classifier Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g'). load ionosphere rng(1); % For reproducibility
Train a support vector machine (SVM) classifier. Standardize the predictor data and specify the order of the classes. SVMModel = fitcsvm(X,Y,'Standardize',true,'ClassNames',{'b','g'});
SVMModel is a trained ClassificationSVM classifier. 'b' is the negative class and 'g' is the positive class.
Cross-validate the classifier using 10-fold cross-validation.
CVSVMModel = crossval(SVMModel)
CVSVMModel = 
  ClassificationPartitionedModel
    CrossValidatedModel: 'SVM'
         PredictorNames: {'x1' 'x2' 'x3' 'x4' 'x5' 'x6' 'x7' 'x8' 'x9' 'x10' 'x11' 'x1
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b' 'g'}
         ScoreTransform: 'none'
CVSVMModel is a ClassificationPartitionedModel cross-validated classifier. During cross-validation, the software completes these steps:
1  Randomly partition the data into 10 sets of equal size.
2  Train an SVM classifier on nine of the sets.
3  Repeat steps 1 and 2 k = 10 times. The software leaves out one partition each time and trains on the other nine partitions.
4  Combine generalization statistics for each fold.
Display the first model in CVSVMModel.Trained.
FirstModel = CVSVMModel.Trained{1}
FirstModel = 
  CompactClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b' 'g'}
           ScoreTransform: 'none'
                    Alpha: [78x1 double]
                     Bias: -0.2209
         KernelParameters: [1x1 struct]
                       Mu: [0.8888 0 0.6320 0.0406 0.5931 0.1205 0.5361 0.1286 0.5083 0.1879 0.47
                    Sigma: [0.3149 0 0.5033 0.4441 0.5255 0.4663 0.4987 0.5205 0.5040 0.4780 0.56
           SupportVectors: [78x34 double]
      SupportVectorLabels: [78x1 double]
FirstModel is the first of the 10 trained classifiers. It is a CompactClassificationSVM classifier. You can estimate the generalization error by passing CVSVMModel to kfoldLoss.
Specify Holdout Sample Proportion for Naive Bayes Cross-Validation
Specify a holdout sample proportion for cross-validation. By default, crossval uses 10-fold cross-validation to cross-validate a naive Bayes classifier. However, you have several other options for cross-validation. For example, you can specify a different number of folds or a holdout sample proportion.
Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g').
load ionosphere
Remove the first two predictors for stability. X = X(:,3:end); rng('default'); % For reproducibility
Train a naive Bayes classifier using the predictors X and class labels Y. A recommended practice is to specify the class names. 'b' is the negative class and 'g' is the positive class. fitcnb assumes that each predictor is conditionally and normally distributed.
Mdl = fitcnb(X,Y,'ClassNames',{'b','g'});
Mdl is a trained ClassificationNaiveBayes classifier.
Cross-validate the classifier by specifying a 30% holdout sample.
CVMdl = crossval(Mdl,'Holdout',0.3)
CVMdl = 
  ClassificationPartitionedModel
    CrossValidatedModel: 'NaiveBayes'
         PredictorNames: {'x1' 'x2' 'x3' 'x4' 'x5' 'x6' 'x7' 'x8' 'x9' 'x10' 'x11' 'x1
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 1
              Partition: [1x1 cvpartition]
             ClassNames: {'b' 'g'}
         ScoreTransform: 'none'
CVMdl is a ClassificationPartitionedModel cross-validated, naive Bayes classifier.
Display the properties of the classifier trained using 70% of the data.
TrainedModel = CVMdl.Trained{1}
TrainedModel = 
  CompactClassificationNaiveBayes
               ResponseName: 'Y'
      CategoricalPredictors: []
                 ClassNames: {'b' 'g'}
             ScoreTransform: 'none'
          DistributionNames: {1x32 cell}
     DistributionParameters: {2x32 cell}
TrainedModel is a CompactClassificationNaiveBayes classifier.
Estimate the generalization error by passing CVMdl to kfoldLoss.
kfoldLoss(CVMdl)
ans = 0.2095
The out-of-sample misclassification error is approximately 21%. Reduce the generalization error by choosing the five most important predictors. idx = fscmrmr(X,Y); Xnew = X(:,idx(1:5));
Train a naive Bayes classifier for the new predictor. Mdlnew = fitcnb(Xnew,Y,'ClassNames',{'b','g'});
Cross-validate the new classifier by specifying a 30% holdout sample, and estimate the generalization error.
CVMdlnew = crossval(Mdlnew,'Holdout',0.3); kfoldLoss(CVMdlnew) ans = 0.1429
The out-of-sample misclassification error is reduced from approximately 21% to approximately 14%.
Create Cross-Validated Regression GAM Using crossval Train a regression generalized additive model (GAM) by using fitrgam, and create a cross-validated GAM by using crossval and the holdout option. Then, use kfoldPredict to predict responses for validation-fold observations using a model trained on training-fold observations. Load the patients data set. load patients
Create a table that contains the predictor variables (Age, Diastolic, Smoker, Weight, Gender, SelfAssessedHealthStatus) and the response variable (Systolic). tbl = table(Age,Diastolic,Smoker,Weight,Gender,SelfAssessedHealthStatus,Systolic);
Train a GAM that contains linear terms for predictors. Mdl = fitrgam(tbl,'Systolic');
Mdl is a RegressionGAM model object.
Cross-validate the model by specifying a 30% holdout sample.
rng('default') % For reproducibility
CVMdl = crossval(Mdl,'Holdout',0.3)
CVMdl = 
  RegressionPartitionedGAM
        CrossValidatedModel: 'GAM'
             PredictorNames: {'Age' 'Diastolic' 'Smoker' 'Weight' 'Gender' 'SelfAssessedHealt
      CategoricalPredictors: [3 5 6]
               ResponseName: 'Systolic'
            NumObservations: 100
                      KFold: 1
                  Partition: [1x1 cvpartition]
          NumTrainedPerFold: [1x1 struct]
          ResponseTransform: 'none'
     IsStandardDeviationFit: 0
The crossval function creates a RegressionPartitionedGAM model object CVMdl with the holdout option. During cross-validation, the software completes these steps:
1  Randomly select and reserve 30% of the data as validation data, and train the model using the rest of the data.
2  Store the compact, trained model in the Trained property of the cross-validated model object RegressionPartitionedGAM.
You can choose a different cross-validation setting by using the 'CrossVal', 'CVPartition', 'KFold', or 'Leaveout' name-value argument. Predict responses for the validation-fold observations by using kfoldPredict. The function predicts responses for the validation-fold observations by using the model trained on the training-fold observations. The function assigns NaN to the training-fold observations. yFit = kfoldPredict(CVMdl);
Find the validation-fold observation indexes, and create a table containing the observation index, observed response values, and predicted response values. Display the first eight rows of the table.
idx = find(~isnan(yFit));
t = table(idx,tbl.Systolic(idx),yFit(idx), ...
    'VariableNames',{'Observation Index','Observed Value','Predicted Value'});
head(t)
    Observation Index    Observed Value    Predicted Value
    _________________    ______________    _______________
            1                 124               130.22
            6                 121               124.38
            7                 130               125.26
           12                 115               117.05
           20                 125               121.82
           22                 123               116.99
           23                 114                  107
           24                 128               122.52
Compute the regression error (mean squared error) for the validation-fold observations. L = kfoldLoss(CVMdl) L = 43.8715
Input Arguments

Mdl — Machine learning model
full regression model object | full classification model object
Machine learning model, specified as a full regression or classification model object, as given in the following tables of supported models.

Regression Model Object
• Gaussian process regression (GPR) model: RegressionGP (If you supply a custom 'ActiveSet' in the call to fitrgp, then you cannot cross-validate the GPR model.)
• Generalized additive model (GAM): RegressionGAM
• Neural network model: RegressionNeuralNetwork
Classification Model Object
• Generalized additive model: ClassificationGAM
• k-nearest neighbor model: ClassificationKNN
• Naive Bayes model: ClassificationNaiveBayes
• Neural network model: ClassificationNeuralNetwork
• Support vector machine for one-class and binary classification: ClassificationSVM
Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: crossval(Mdl,'KFold',3) specifies using three folds in a cross-validated model. CVPartition — Cross-validation partition [] (default) | cvpartition partition object Cross-validation partition, specified as a cvpartition partition object created by cvpartition. The partition object specifies the type of cross-validation and the indexing for the training and validation sets. You can specify only one of these four name-value arguments: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'. Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,'KFold',5). Then, you can specify the cross-validated model by using 'CVPartition',cvp. Holdout — Fraction of data for holdout validation scalar value in the range (0,1) Fraction of the data used for holdout validation, specified as a scalar value in the range (0,1). If you specify 'Holdout',p, then the software completes these steps: 1
Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.
2  Store the compact, trained model in the Trained property of the cross-validated model. If Mdl does not have a corresponding compact object, then Trained contains a full object.
You can specify only one of these four name-value arguments: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'. Example: 'Holdout',0.1 Data Types: double | single KFold — Number of folds 10 (default) | positive integer value greater than 1
Number of folds to use in a cross-validated model, specified as a positive integer value greater than 1. If you specify 'KFold',k, then the software completes these steps: 1
Randomly partition the data into k sets.
2  For each set, reserve the set as validation data, and train the model using the other k – 1 sets.
3  Store the k compact, trained models in a k-by-1 cell vector in the Trained property of the cross-validated model. If Mdl does not have a corresponding compact object, then Trained contains a full object.
You can specify only one of these four name-value arguments: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'. Example: 'KFold',5 Data Types: single | double Leaveout — Leave-one-out cross-validation flag 'off' (default) | 'on' Leave-one-out cross-validation flag, specified as 'on' or 'off'. If you specify 'Leaveout','on', then for each of the n observations (where n is the number of observations, excluding missing observations, specified in the NumObservations property of the model), the software completes these steps: 1
Reserve the one observation as validation data, and train the model using the other n – 1 observations.
2  Store the n compact, trained models in an n-by-1 cell vector in the Trained property of the cross-validated model. If Mdl does not have a corresponding compact object, then Trained contains a full object.
You can specify only one of these four name-value arguments: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'. Example: 'Leaveout','on'
Output Arguments

CVMdl — Cross-validated machine learning model
cross-validated (partitioned) model object
Cross-validated machine learning model, returned as one of the cross-validated (partitioned) model objects in the following tables, depending on the input model Mdl.

Regression Model Object
• Gaussian process regression model: RegressionGP (Mdl), RegressionPartitionedGP (CVMdl)
• Generalized additive model: RegressionGAM (Mdl), RegressionPartitionedGAM (CVMdl)
• Neural network model: RegressionNeuralNetwork (Mdl), RegressionPartitionedNeuralNetwork (CVMdl)
Classification Model Object
• Generalized additive model: ClassificationGAM (Mdl), ClassificationPartitionedGAM (CVMdl)
• k-nearest neighbor model: ClassificationKNN (Mdl), ClassificationPartitionedModel (CVMdl)
• Naive Bayes model: ClassificationNaiveBayes (Mdl), ClassificationPartitionedModel (CVMdl)
• Neural network model: ClassificationNeuralNetwork (Mdl), ClassificationPartitionedModel (CVMdl)
• Support vector machine for one-class and binary classification: ClassificationSVM (Mdl), ClassificationPartitionedModel (CVMdl)
Tips
• Assess the predictive performance of Mdl on cross-validated data by using the kfold functions and properties of CVMdl, such as kfoldPredict, kfoldLoss, kfoldMargin, and kfoldEdge for classification and kfoldPredict and kfoldLoss for regression.
• Return a partitioned classifier with stratified partitioning by using the name-value argument 'KFold' or 'Holdout'.
• Create a cvpartition object cvp using cvp = cvpartition(n,'KFold',k). Return a partitioned classifier with nonstratified partitioning by using the name-value argument 'CVPartition',cvp. A brief sketch comparing the two partitioning approaches follows these tips.
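The following sketch is a hedged illustration of these tips (it is not one of the documented examples and assumes the ionosphere data used earlier, with a k-nearest neighbor learner):
load ionosphere
Mdl = fitcknn(X,Y);
CVstratified = crossval(Mdl,'KFold',5);               % stratified by class labels
cvp = cvpartition(Mdl.NumObservations,'KFold',5);     % nonstratified partition
CVnonstratified = crossval(Mdl,'CVPartition',cvp);
[kfoldLoss(CVstratified) kfoldLoss(CVnonstratified)]  % compare generalization errors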
Alternative Functionality
Instead of training a model and then cross-validating it, you can create a cross-validated model directly by using a fitting function and specifying one of these name-value arguments: 'CrossVal', 'CVPartition', 'Holdout', 'Leaveout', or 'KFold'.
Version History
Introduced in R2012a

R2023b: A cross-validated regression neural network model is a RegressionPartitionedNeuralNetwork object
Behavior changed in R2023b
Starting in R2023b, a cross-validated regression neural network model is a RegressionPartitionedNeuralNetwork object. In previous releases, a cross-validated regression neural network model was a RegressionPartitionedModel object.
You can create a RegressionPartitionedNeuralNetwork object in two ways:
• Create a cross-validated model from a regression neural network model object RegressionNeuralNetwork by using the crossval object function.
• Create a cross-validated model by using the fitrnet function and specifying one of the name-value arguments CrossVal, CVPartition, Holdout, KFold, or Leaveout.

R2022b: A cross-validated Gaussian process regression model is a RegressionPartitionedGP object
Behavior changed in R2022b
Starting in R2022b, a cross-validated Gaussian process regression (GPR) model is a RegressionPartitionedGP object. In previous releases, a cross-validated GPR model was a RegressionPartitionedModel object.
You can create a RegressionPartitionedGP object in two ways:
• Create a cross-validated model from a GPR model object RegressionGP by using the crossval object function.
• Create a cross-validated model by using the fitrgp function and specifying one of the name-value arguments CrossVal, CVPartition, Holdout, KFold, or Leaveout.
Regardless of whether you train a full or cross-validated GPR model first, you cannot specify an ActiveSet value in the call to fitrgp.
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. Usage notes and limitations: • This function fully supports GPU arrays for a trained classification model specified as a ClassificationKNN or ClassificationSVM object. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also cvpartition
crossval Cross-validate discriminant analysis classifier
Syntax cvmodel = crossval(mdl) cvmodel = crossval(mdl,Name=Value)
Description cvmodel = crossval(mdl) returns a cross-validated (partitioned) discriminant analysis classifier (cvmodel) from a trained discriminant analysis classifier (mdl). By default, crossval uses 10-fold cross validation on the training data to create cvmodel. cvmodel = crossval(mdl,Name=Value) specifies additional options using one or more namevalue arguments. For example, you can specify the fraction of data for holdout validation, and the number of folds to use in the cross-validated model.
Examples Partitioned Model from Fitted Discriminant Analysis Classifier Create a cross-validation model and evaluate its quality. Create a classification model for the Fisher iris data. load fisheriris obj = fitcdiscr(meas,species);
Create a cross-validation model. cvmodel = crossval(obj);
Evaluate the quality of the model using kfoldLoss. L = kfoldLoss(cvmodel) L = 0.0200
Input Arguments

mdl — Trained discriminant analysis classifier
ClassificationDiscriminant model object
Trained discriminant analysis classifier, specified as a ClassificationDiscriminant model object trained with fitcdiscr.
Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: cvmodel = crossval(mdl,KFold=15) specifies to use 15 folds in the cross-validated model.

CVPartition — Cross-validation partition
[] (default) | cvpartition object
Cross-validation partition, specified as a cvpartition object that specifies the type of cross-validation and the indexing for the training and validation sets.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.
Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,KFold=5). Then, you can specify the cross-validation partition by setting CVPartition=cvp.

Holdout — Fraction of data for holdout validation
scalar value in the range (0,1)
Fraction of the data used for holdout validation, specified as a scalar value in the range (0,1). If you specify Holdout=p, then the software completes these steps:
1  Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.
2  Store the compact trained model in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: Holdout=0.1 Data Types: double | single KFold — Number of folds 10 (default) | positive integer value greater than 1 Number of folds to use in the cross-validated model, specified as a positive integer value greater than 1. If you specify KFold=k, then the software completes these steps: 1
Randomly partition the data into k sets.
2  For each set, reserve the set as validation data, and train the model using the other k – 1 sets.
3  Store the k compact trained models in a k-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: KFold=5
Data Types: single | double Leaveout — Leave-one-out cross-validation flag "off" (default) | "on" Leave-one-out cross-validation flag, specified as "on" or "off". If you specify Leaveout="on", then for each of the n observations (where n is the number of observations, excluding missing observations, specified in the NumObservations property of the model), the software completes these steps: 1
Reserve the one observation as validation data, and train the model using the other n – 1 observations.
2  Store the n compact trained models in an n-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: Leaveout="on" Data Types: char | string
Tips • Assess the predictive performance of mdl on cross-validated data using the “kfold” methods and properties of cvmodel, such as kfoldLoss.
Alternative Functionality You can create a cross-validation classifier directly from the data, instead of creating a discriminant analysis classifier followed by a cross-validation classifier. To do so, include one of these options in fitcdiscr: CrossVal, CVPartition, Holdout, KFold, or Leaveout.
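For example, a minimal sketch of this alternative (assuming the Fisher iris data from the example above):
load fisheriris
cvmodel = fitcdiscr(meas,species,KFold=5);   % partitioned classifier created directly
L = kfoldLoss(cvmodel)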
Version History Introduced in R2011b
See Also Functions fitcdiscr | crossval | kfoldEdge | kfoldfun | kfoldLoss | kfoldMargin | kfoldPredict Topics “Discriminant Analysis Classification” on page 21-2
crossval Cross-validate multiclass error-correcting output codes (ECOC) model
Syntax CVMdl = crossval(Mdl) CVMdl = crossval(Mdl,Name,Value)
Description CVMdl = crossval(Mdl) returns a cross-validated (partitioned) multiclass error-correcting output codes (ECOC) model (CVMdl) from a trained ECOC model (Mdl). By default, crossval uses 10-fold cross-validation on the training data to create CVMdl, a ClassificationPartitionedECOC model. CVMdl = crossval(Mdl,Name,Value) returns a partitioned ECOC model with additional options specified by one or more name-value pair arguments. For example, you can specify the number of folds or a holdout sample proportion.
Examples Cross-Validate ECOC Classifier Cross-validate an ECOC classifier with SVM binary learners, and estimate the generalized classification error. Load Fisher's iris data set. Specify the predictor data X and the response data Y. load fisheriris X = meas; Y = species; rng(1); % For reproducibility
Create an SVM template, and standardize the predictors. t = templateSVM('Standardize',true) t = Fit template for SVM. Standardize: 1
t is an SVM template. Most of the template object properties are empty. When training the ECOC classifier, the software sets the applicable properties to their default values. Train the ECOC classifier, and specify the class order. Mdl = fitcecoc(X,Y,'Learners',t,... 'ClassNames',{'setosa','versicolor','virginica'});
Mdl is a ClassificationECOC classifier. You can access its properties using dot notation.
Cross-validate Mdl using 10-fold cross-validation. CVMdl = crossval(Mdl);
CVMdl is a ClassificationPartitionedECOC cross-validated ECOC classifier. Estimate the generalized classification error. genError = kfoldLoss(CVMdl) genError = 0.0400
The generalized classification error is 4%, which indicates that the ECOC classifier generalizes fairly well.
Cross-Validate ECOC Classifier Using Parallel Computing Consider the arrhythmia data set. This data set contains 16 classes, 13 of which are represented in the data. The first class indicates that the subject does not have arrhythmia, and the last class indicates that the arrhythmia state of the subject is not recorded. The other classes are ordinal levels indicating the severity of arrhythmia. Train an ECOC classifier with a custom coding design specified by the description of the classes. Load the arrhythmia data set. Convert Y to a categorical variable, and determine the number of classes. load arrhythmia Y = categorical(Y); K = unique(Y); % Number of distinct classes
Construct a coding matrix that describes the nature of the classes. OrdMat = designecoc(11,'ordinal'); nOrdMat = size(OrdMat); class1VSOrd = [1; -ones(11,1); 0]; class1VSClass16 = [1; zeros(11,1); -1]; OrdVSClass16 = [0; ones(11,1); -1]; Coding = [class1VSOrd class1VSClass16 OrdVSClass16,... [zeros(1,nOrdMat(2)); OrdMat; zeros(1,nOrdMat(2))]];
Train an ECOC classifier using the custom coding design (Coding) and parallel computing. Specify an ensemble of 50 classification trees boosted using GentleBoost. t = templateEnsemble('GentleBoost',50,'Tree'); options = statset('UseParallel',true); Mdl = fitcecoc(X,Y,'Coding',Coding,'Learners',t,'Options',options); Starting parallel pool (parpool) using the 'local' profile ... Connected to the parallel pool (number of workers: 6).
Mdl is a ClassificationECOC model. You can access its properties using dot notation. Cross-validate Mdl using 8-fold cross-validation and parallel computing. 35-1370
crossval
rng(1); % For reproducibility CVMdl = crossval(Mdl,'Options',options,'KFold',8); Warning: One or more folds do not contain points from all the groups.
Because some classes have low relative frequency, some folds do not train using observations from those classes. CVMdl is a ClassificationPartitionedECOC cross-validated ECOC model. Estimate the generalization error using parallel computing. error = kfoldLoss(CVMdl,'Options',options) error = 0.3208
The cross-validated classification error is 32%, which indicates that this model does not generalize well. To improve the model, try training using a different boosting method, such as RobustBoost, or a different algorithm, such as SVM.
Input Arguments

Mdl — Full, trained multiclass ECOC model
ClassificationECOC model
Full, trained multiclass ECOC model, specified as a ClassificationECOC model trained with fitcecoc.

Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: crossval(Mdl,'KFold',3) specifies using three folds in a cross-validated model.

CVPartition — Cross-validation partition
[] (default) | cvpartition object
Cross-validation partition, specified as a cvpartition object that specifies the type of cross-validation and the indexing for the training and validation sets.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.
Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,KFold=5). Then, you can specify the cross-validation partition by setting CVPartition=cvp.

Holdout — Fraction of data for holdout validation
scalar value in the range (0,1)
Fraction of the data used for holdout validation, specified as a scalar value in the range (0,1). If you specify Holdout=p, then the software completes these steps:
1  Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.
2  Store the compact trained model in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: Holdout=0.1 Data Types: double | single KFold — Number of folds 10 (default) | positive integer value greater than 1 Number of folds to use in the cross-validated model, specified as a positive integer value greater than 1. If you specify KFold=k, then the software completes these steps: 1
Randomly partition the data into k sets.
2  For each set, reserve the set as validation data, and train the model using the other k – 1 sets.
3  Store the k compact trained models in a k-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: KFold=5 Data Types: single | double Leaveout — Leave-one-out cross-validation flag "off" (default) | "on" Leave-one-out cross-validation flag, specified as "on" or "off". If you specify Leaveout="on", then for each of the n observations (where n is the number of observations, excluding missing observations, specified in the NumObservations property of the model), the software completes these steps: 1
Reserve the one observation as validation data, and train the model using the other n – 1 observations.
2  Store the n compact trained models in an n-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: Leaveout="on" Data Types: char | string Options — Estimation options [] (default) | structure array returned by statset Estimation options, specified as the comma-separated pair consisting of 'Options' and a structure array returned by statset. To invoke parallel computing:
• You need a Parallel Computing Toolbox license. • Specify 'Options',statset('UseParallel',true).
Tips • Assess the predictive performance of Mdl on cross-validated data using the "kfold" methods and properties of CVMdl, such as kfoldLoss.
Alternative Functionality Instead of training an ECOC model and then cross-validating it, you can create a cross-validated ECOC model directly by using fitcecoc and specifying one of these name-value pair arguments: 'CrossVal', 'CVPartition', 'Holdout', 'Leaveout', or 'KFold'.
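A brief sketch of this alternative (assuming Fisher's iris data and SVM binary learners, as in the first example):
load fisheriris
t = templateSVM('Standardize',true);
CVMdl = fitcecoc(meas,species,'Learners',t,'KFold',5);   % cross-validated ECOC model created directly
genError = kfoldLoss(CVMdl)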
Version History Introduced in R2014b
Extended Capabilities Automatic Parallel Support Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™. To run in parallel, specify the Options name-value argument in the call to this function and set the UseParallel field of the options structure to true using statset: "Options",statset("UseParallel",true) For more information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox). GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also ClassificationECOC | CompactClassificationECOC | ClassificationPartitionedECOC | fitcecoc | statset | cvpartition Topics “Quick Start Parallel Computing for Statistics and Machine Learning Toolbox” on page 33-2 “Reproducibility in Parallel Statistical Computations” on page 33-16 “Concepts of Parallel Computing in Statistics and Machine Learning Toolbox” on page 33-6
crossval Cross-validate classification ensemble model
Syntax cvens = crossval(ens) cvens = crossval(ens,Name=Value)
Description
cvens = crossval(ens) returns a cross-validated (partitioned) classification ensemble model (cvens) from a trained classification ensemble model (ens). By default, crossval uses 10-fold cross-validation on the training data to create cvens, a ClassificationPartitionedEnsemble model.
cvens = crossval(ens,Name=Value) specifies additional options using one or more name-value arguments. For example, you can specify the fraction of data for holdout validation, and monitor the training of the cross-validation folds.
Input Arguments

ens — Classification ensemble model
ClassificationEnsemble model object
Classification ensemble model, specified as a ClassificationEnsemble model object trained with fitcensemble.

Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: crossval(ens,KFold=10,NPrint=5) specifies to use 10 folds in a cross-validated model, and to display a message to the command line every time crossval finishes training 5 folds.

CVPartition — Cross-validation partition
[] (default) | cvpartition object
Cross-validation partition, specified as a cvpartition object that specifies the type of cross-validation and the indexing for the training and validation sets.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.
Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,KFold=5). Then, you can specify the cross-validation partition by setting CVPartition=cvp.
Holdout — Fraction of data for holdout validation
scalar value in the range (0,1)
Fraction of the data used for holdout validation, specified as a scalar value in the range (0,1). If you specify Holdout=p, then the software completes these steps:
1  Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.
2  Store the compact trained model in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: Holdout=0.1 Data Types: double | single KFold — Number of folds 10 (default) | positive integer value greater than 1 Number of folds to use in the cross-validated model, specified as a positive integer value greater than 1. If you specify KFold=k, then the software completes these steps: 1
Randomly partition the data into k sets.
2  For each set, reserve the set as validation data, and train the model using the other k – 1 sets.
3  Store the k compact trained models in a k-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: KFold=5 Data Types: single | double Leaveout — Leave-one-out cross-validation flag "off" (default) | "on" Leave-one-out cross-validation flag, specified as "on" or "off". If you specify Leaveout="on", then for each of the n observations (where n is the number of observations, excluding missing observations, specified in the NumObservations property of the model), the software completes these steps: 1
Reserve the one observation as validation data, and train the model using the other n – 1 observations.
2  Store the n compact trained models in an n-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: Leaveout="on" Data Types: char | string NPrint — Printout frequency "off" (default) | positive integer
Printout frequency, specified as a positive integer or "off". To track the number of folds trained by the software so far, specify a positive integer m. The software displays a message to the command line every time it finishes training m folds. If you specify "off", the software does not display a message when it completes training folds. Example: NPrint=5 Data Types: single | double | char | string
Examples Cross-Validate Classification Ensemble Create a cross-validated classification model for the Fisher iris data, and assess its quality using the kfoldLoss method. Load the Fisher iris data set. load fisheriris
Train an ensemble of 100 boosted classification trees using AdaBoostM2. t = templateTree(MaxNumSplits=1); % Weak learner template tree object ens = fitcensemble(meas,species,"Method","AdaBoostM2","Learners",t);
Create a cross-validated ensemble from ens and find the classification error averaged over all folds. rng(10,"twister") % For reproducibility cvens = crossval(ens); L = kfoldLoss(cvens) L = 0.0533
Alternatives Instead of training a classification ensemble model and then cross-validating it, you can create a cross-validated model directly by using fitcensemble and specifying any of these name-value arguments: CrossVal, CVPartition, Holdout, Leaveout, or KFold.
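A minimal sketch of this alternative (assuming the Fisher iris data and the weak learner template from the example above):
load fisheriris
t = templateTree(MaxNumSplits=1);
cvens = fitcensemble(meas,species,Method="AdaBoostM2",Learners=t,KFold=10);   % cross-validated ensemble created directly
L = kfoldLoss(cvens)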
Version History Introduced in R2011a
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see "Run MATLAB Functions on a GPU" (Parallel Computing Toolbox).
See Also ClassificationPartitionedEnsemble | cvpartition | ClassificationEnsemble | fitcensemble
crossval Cross-validated decision tree
Syntax cvmodel = crossval(model) cvmodel = crossval(model,Name,Value)
Description cvmodel = crossval(model) creates a partitioned model from model, a fitted classification tree. By default, crossval uses 10-fold cross validation on the training data to create cvmodel. cvmodel = crossval(model,Name,Value) creates a partitioned model with additional options specified by one or more Name,Value pair arguments.
Examples

Create a Cross-Validation Model
Create a classification model for the ionosphere data, then create a cross-validation model. Evaluate the quality of the model using kfoldLoss.
load ionosphere
tree = fitctree(X,Y);
cvmodel = crossval(tree);
L = kfoldLoss(cvmodel)
L = 0.1083
Input Arguments model — Classification model ClassificationTree object Classification model, specified as a ClassificationTree object. Use the fitctree function to create a classification tree object. Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: cvmodel = crossval(model,'Holdout',0.2)
CVPartition — Cross-validation partition [] (default) | cvpartition object Cross-validation partition, specified as the comma-separated pair consisting of 'CVPartition' and a cvpartition object created by the cvpartition function. crossval splits the data into subsets with cvpartition. Use only one of these four options at a time: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'. Holdout — Fraction of data for holdout validation scalar value in the range (0,1) Fraction of the data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1). Use only one of these four options at a time: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'. Example: 'Holdout',0.3 Data Types: single | double KFold — Number of folds 10 (default) | positive integer value greater than 1 Number of folds to use in a cross-validated model, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1. Use only one of these four options at a time: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'. Example: 'KFold',3 Data Types: single | double Leaveout — Leave-one-out cross-validation flag 'off' (default) | 'on' Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and 'on' or 'off'. Leave-one-out is a special case of 'KFold' in which the number of folds equals the number of observations. Use only one of these four options at a time: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'. Example: 'Leaveout','on'
Output Arguments

cvmodel — Partitioned model
ClassificationPartitionedModel object
Partitioned model, returned as a ClassificationPartitionedModel object.
Tips Assess the predictive performance of model on cross-validated data using the “kfold” methods and properties of cvmodel, such as kfoldLoss.
Alternatives You can create a cross-validation tree directly from the data, instead of creating a decision tree followed by a cross-validation tree. To do so, include one of these five options in fitctree: 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'.
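For instance, a brief sketch of this alternative (assuming the ionosphere data from the example above):
load ionosphere
cvmodel = fitctree(X,Y,'CrossVal','on');   % 10-fold cross-validated tree by default
L = kfoldLoss(cvmodel)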
Version History Introduced in R2011a
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also fitctree | crossval
crossval Package: timeseries.forecaster Cross-validate direct forecasting model
Syntax CVMdl = crossval(Mdl,TSPartition)
Description
CVMdl = crossval(Mdl,TSPartition) returns a cross-validated (partitioned) direct forecasting model (CVMdl) from a trained direct forecasting model (Mdl). The crossval function uses the cross-validation scheme specified by TSPartition.
You can assess the predictive performance of Mdl on cross-validated data by using the object functions of CVMdl (cvloss and cvpredict).
Examples

Evaluate Model Using Expanding Window Cross-Validation
Create a cross-validated direct forecasting model using expanding window cross-validation. To evaluate the performance of the model:
• Compute the mean squared error (MSE) on each test set using the cvloss object function.
• For each test set, compare the true response values to the predicted response values using the cvpredict object function.
Load the sample file TemperatureData.csv, which contains average daily temperature from January 2015 through July 2016. Read the file into a table. Observe the first eight observations in the table.
Tbl = readtable("TemperatureData.csv");
head(Tbl)
    Year       Month       Day    TemperatureF
    ____    ___________    ___    ____________
    2015    {'January'}     1          23
    2015    {'January'}     2          31
    2015    {'January'}     3          25
    2015    {'January'}     4          39
    2015    {'January'}     5          29
    2015    {'January'}     6          12
    2015    {'January'}     7          10
    2015    {'January'}     8           4
Create a datetime variable t that contains the year, month, and day information for each observation in Tbl.
numericMonth = month(datetime(Tbl.Month, ... InputFormat="MMMM")); t = datetime(Tbl.Year,numericMonth,Tbl.Day);
Plot the temperature values in Tbl over time. plot(t,Tbl.TemperatureF) xlabel("Date") ylabel("Temperature in Fahrenheit")
Create a direct forecasting model by using the data in Tbl. Train the model using a bagged ensemble of trees. All three of the predictors (Year, Month, and Day) are leading predictors because their future values are known. To create new predictors by shifting the leading predictor and response variables backward in time, specify the leading predictor lags and the response variable lags.
Mdl = directforecaster(Tbl,"TemperatureF", ...
    Learner="bag", ...
    LeadingPredictors="all",LeadingPredictorLags={0:1,0:1,0:7}, ...
    ResponseLags=1:7)
Mdl = 
  DirectForecaster
                  Horizon: 1
             ResponseLags: [1 2 3 4 5 6 7]
        LeadingPredictors: [1 2 3]
     LeadingPredictorLags: {[0 1] [0 1] [0 1 2 3 4 5 6 7]}
             ResponseName: 'TemperatureF'
           PredictorNames: {'Year' 'Month' 'Day'}
    CategoricalPredictors: 2
                 Learners: {[1x1 classreg.learning.regr.CompactRegressionEnsemble]}
                   MaxLag: 7
          NumObservations: 565
Mdl is a DirectForecaster model object. By default, the horizon is one step ahead. That is, Mdl predicts a value that is one step into the future.
Partition the time series data in Tbl using an expanding window cross-validation scheme. Create three training sets and three test sets, where each test set has 100 observations. Note that each observation in Tbl is in at most one test set.
CVPartition = tspartition(size(Mdl.X,1),"ExpandingWindow",3, ...
    TestSize=100)
CVPartition = 
  tspartition
               Type: 'expanding-window'
    NumObservations: 565
        NumTestSets: 3
          TrainSize: [265 365 465]
           TestSize: [100 100 100]
           StepSize: 100
The training sets increase in size from 265 observations in the first window to 465 observations in the third window.
Create a cross-validated direct forecasting model using the partition specified in CVPartition. Inspect the Learners property of the resulting CVMdl object.
CVMdl = crossval(Mdl,CVPartition)
CVMdl = 
  PartitionedDirectForecaster
                Partition: [1x1 tspartition]
                  Horizon: 1
             ResponseLags: [1 2 3 4 5 6 7]
        LeadingPredictors: [1 2 3]
     LeadingPredictorLags: {[0 1] [0 1] [0 1 2 3 4 5 6 7]}
             ResponseName: 'TemperatureF'
           PredictorNames: {'Year' 'Month' 'Day'}
    CategoricalPredictors: 2
                 Learners: {3x1 cell}
                   MaxLag: 7
          NumObservations: 565

CVMdl.Learners
ans=3×1 cell array
    {1x1 timeseries.forecaster.CompactDirectForecaster}
    {1x1 timeseries.forecaster.CompactDirectForecaster}
    {1x1 timeseries.forecaster.CompactDirectForecaster}
CVMdl is a PartitionedDirectForecaster model object. The crossval function trains CVMdl.Learners{1} using the observations in the first training set, CVMdl.Learners{2} using the observations in the second training set, and CVMdl.Learners{3} using the observations in the third training set.
Compute the average test set MSE.
averageMSE = cvloss(CVMdl)
averageMSE = 53.3480
To obtain more information, compute the MSE for each test set. individualMSE = cvloss(CVMdl,Mode="individual") individualMSE = 3×1 44.1352 84.0695 31.8393
The models trained on the first and third training sets seem to perform better than the model trained on the second training set.
For each test set observation, predict the temperature value using the corresponding model in CVMdl.Learners.
predictedY = cvpredict(CVMdl);
predictedY(260:end,:)
ans=306×1 table
    TemperatureF_Step1
    __________________
            NaN
            NaN
            NaN
            NaN
            NaN
            NaN
         50.963
         57.363
          57.04
         60.705
         59.606
         58.302
         58.023
          61.39
         67.229
         61.083
          ⋮
Only the last 300 observations appear in any test set. For observations that do not appear in a test set, the predicted response value is NaN. For each test set, plot the true response values and the predicted response values. tiledlayout(3,1) nexttile idx1 = test(CVPartition,1); plot(t(idx1),Tbl.TemperatureF(idx1)) hold on plot(t(idx1),predictedY.TemperatureF_Step1(idx1)) legend("True Response","Predicted Response", ... Location="eastoutside") xlabel("Date") ylabel("Temperature") title("Test Set 1") hold off nexttile idx2 = test(CVPartition,2); plot(t(idx2),Tbl.TemperatureF(idx2)) hold on plot(t(idx2),predictedY.TemperatureF_Step1(idx2)) legend("True Response","Predicted Response", ... Location="eastoutside") xlabel("Date") ylabel("Temperature") title("Test Set 2") hold off nexttile idx3 = test(CVPartition,3); plot(t(idx3),Tbl.TemperatureF(idx3)) hold on plot(t(idx3),predictedY.TemperatureF_Step1(idx3)) legend("True Response","Predicted Response", ... Location="eastoutside") xlabel("Date") ylabel("Temperature") title("Test Set 3") hold off
Overall, the cross-validated direct forecasting model is able to predict the trend in temperatures. If you are satisfied with the performance of the cross-validated model, you can use the full DirectForecaster model Mdl for forecasting at time steps beyond the available data.
Input Arguments Mdl — Direct forecasting model DirectForecaster model object Direct forecasting model, specified as a DirectForecaster model object. TSPartition — Cross-validation partition for time series data tspartition object Cross-validation partition for time series data, specified as a tspartition object. TSPartition uses an expanding window cross-validation, sliding window cross-validation, or holdout validation scheme (as specified by the tspartition function).
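For instance, a hedged sketch (not one of the documented examples) that replaces the expanding window scheme shown above with a holdout scheme, assuming the trained model Mdl from the example:
holdoutPartition = tspartition(size(Mdl.X,1),"Holdout",0.2);   % holdout validation scheme for time series data
CVMdl = crossval(Mdl,holdoutPartition);
holdoutMSE = cvloss(CVMdl)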
Output Arguments CVMdl — Cross-validated direct forecasting model PartitionedDirectForecaster model object
Cross-validated direct forecasting model, returned as a PartitionedDirectForecaster model object.
Version History Introduced in R2023b
See Also DirectForecaster | PartitionedDirectForecaster | tspartition
crossval Cross-validate regression ensemble model
Syntax cvens = crossval(ens) cvens = crossval(ens,Name=Value)
Description
cvens = crossval(ens) returns a cross-validated (partitioned) regression ensemble model (cvens) from a trained regression ensemble model (ens). By default, crossval uses 10-fold cross-validation on the training data to create cvens, a RegressionPartitionedEnsemble model.
cvens = crossval(ens,Name=Value) specifies additional options using one or more name-value arguments. For example, you can specify the cross-validation partition, the fraction of data for holdout validation, and the number of folds to use.
Input Arguments

ens — Regression ensemble model
RegressionEnsemble model object
Regression ensemble model, specified as a RegressionEnsemble model object trained with fitrensemble.

Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: crossval(ens,KFold=10,NPrint=5) specifies to use 10 folds in a cross-validated model, and to display a message to the command line every time crossval finishes training 5 folds.

CVPartition — Cross-validation partition
[] (default) | cvpartition object
Cross-validation partition, specified as a cvpartition object that specifies the type of cross-validation and the indexing for the training and validation sets.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.
Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,KFold=5). Then, you can specify the cross-validation partition by setting CVPartition=cvp.
Holdout — Fraction of data for holdout validation
scalar value in the range (0,1)
Fraction of the data used for holdout validation, specified as a scalar value in the range (0,1). If you specify Holdout=p, then the software completes these steps:
1  Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.
2  Store the compact trained model in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: Holdout=0.1 Data Types: double | single KFold — Number of folds 10 (default) | positive integer value greater than 1 Number of folds to use in the cross-validated model, specified as a positive integer value greater than 1. If you specify KFold=k, then the software completes these steps: 1
Randomly partition the data into k sets.
2  For each set, reserve the set as validation data, and train the model using the other k – 1 sets.
3  Store the k compact trained models in a k-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: KFold=5 Data Types: single | double Leaveout — Leave-one-out cross-validation flag "off" (default) | "on" Leave-one-out cross-validation flag, specified as "on" or "off". If you specify Leaveout="on", then for each of the n observations (where n is the number of observations, excluding missing observations, specified in the NumObservations property of the model), the software completes these steps: 1
Reserve the one observation as validation data, and train the model using the other n – 1 observations.
2  Store the n compact trained models in an n-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: Leaveout="on" Data Types: char | string NPrint — Printout frequency "off" (default) | positive integer
Printout frequency, specified as a positive integer or "off". To track the number of folds trained by the software so far, specify a positive integer m. The software displays a message to the command line every time it finishes training m folds. If you specify "off", the software does not display a message when it completes training folds. Example: NPrint=5 Data Types: single | double | char | string
Examples Create Cross-Validated Regression Model Create a cross-validated regression model for the carsmall data, and evaluate its quality using the kfoldLoss method. Load the carsmall data set and select acceleration, displacement, horsepower, and vehicle weight as predictors. load carsmall; X = [Acceleration Displacement Horsepower Weight];
Train a regression ensemble. rens = fitrensemble(X,MPG);
Create a cross-validated ensemble from rens and find the cross-validation loss. rng(10,"twister") % For reproducibility cvens = crossval(rens); L = kfoldLoss(cvens) L = 30.3471
Alternatives You can create a cross-validation ensemble directly from the data, instead of creating an ensemble followed by a cross-validation ensemble. To do so, include one of these five options in fitrensemble: CrossVal, CVPartition, Holdout, Leaveout, or KFold.
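A minimal sketch of this alternative (assuming the carsmall predictors from the example above):
load carsmall
X = [Acceleration Displacement Horsepower Weight];
cvens = fitrensemble(X,MPG,KFold=10);   % cross-validated regression ensemble created directly
L = kfoldLoss(cvens)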
Version History Introduced in R2011a
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see "Run MATLAB Functions on a GPU" (Parallel Computing Toolbox).
See Also cvpartition | kfoldLoss | RegressionPartitionedEnsemble | RegressionEnsemble | fitrensemble
crossval Class: RegressionSVM Cross-validated support vector machine regression model
Syntax CVMdl = crossval(mdl) CVMdl = crossval(mdl,Name,Value)
Description CVMdl = crossval(mdl) returns a cross-validated (partitioned) support vector machine regression model, CVMdl, from a trained SVM regression model, mdl. CVMdl = crossval(mdl,Name,Value) returns a cross-validated model with additional options specified by one or more Name,Value pair arguments.
Input Arguments mdl — Full, trained SVM regression model RegressionSVM model Full, trained SVM regression model, specified as a RegressionSVM model returned by fitrsvm. Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. CVPartition — Cross-validation partition [] (default) | cvpartition object Cross-validation partition, specified as a cvpartition object that specifies the type of cross-validation and the indexing for the training and validation sets. To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,KFold=5). Then, you can specify the cross-validation partition by setting CVPartition=cvp. Holdout — Fraction of data for holdout validation scalar value in the range (0,1) Fraction of the data used for holdout validation, specified as a scalar value in the range (0,1). If you specify Holdout=p, then the software completes these steps:
1
Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.
2
Store the compact trained model in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: Holdout=0.1 Data Types: double | single KFold — Number of folds 10 (default) | positive integer value greater than 1 Number of folds to use in the cross-validated model, specified as a positive integer value greater than 1. If you specify KFold=k, then the software completes these steps: 1
Randomly partition the data into k sets.
2
For each set, reserve the set as validation data, and train the model using the other k – 1 sets.
3
Store the k compact trained models in a k-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: KFold=5 Data Types: single | double Leaveout — Leave-one-out cross-validation flag "off" (default) | "on" Leave-one-out cross-validation flag, specified as "on" or "off". If you specify Leaveout="on", then for each of the n observations (where n is the number of observations, excluding missing observations, specified in the NumObservations property of the model), the software completes these steps: 1
Reserve the one observation as validation data, and train the model using the other n – 1 observations.
2
Store the n compact trained models in an n-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: Leaveout="on" Data Types: char | string
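As a brief illustration of these arguments, the following sketch applies them to a trained SVM regression model. Here mdl is a placeholder for any RegressionSVM model, such as the one trained in the examples below; the sketch is not a definitive recipe.
% Sketch: alternative partitioning schemes for a trained RegressionSVM model mdl
CVMdl1 = crossval(mdl);              % default 10-fold cross-validation
CVMdl2 = crossval(mdl,KFold=5);      % 5-fold cross-validation
CVMdl3 = crossval(mdl,Holdout=0.1);  % reserve 10% of the data for validation

cvp = cvpartition(mdl.NumObservations,KFold=5);
CVMdl4 = crossval(mdl,CVPartition=cvp);  % reuse the same folds across models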
Output Arguments CVMdl — Cross-validated SVM regression model RegressionPartitionedSVM model Cross-validated SVM regression model, returned as a RegressionPartitionedSVM model.
Examples Train Cross-Validated SVM Regression Model Using crossval This example shows how to train a cross-validated SVM regression model using crossval. This example uses the abalone data from the UCI Machine Learning Repository. Download the data and save it in your current folder with the name 'abalone.data'. Read the data into a table. tbl = readtable('abalone.data','Filetype','text','ReadVariableNames',false); rng default % for reproducibility
The sample data contains 4177 observations. All the predictor variables are continuous except for sex, which is a categorical variable with possible values 'M' (for males), 'F' (for females), and 'I' (for infants). The goal is to predict the number of rings on the abalone and determine its age using physical measurements. Train an SVM regression model, using a Gaussian kernel function with a kernel scale equal to 2.2. Standardize the data. mdl = fitrsvm(tbl,'Var9','KernelFunction','gaussian','KernelScale',2.2,'Standardize',true);
mdl is a trained RegressionSVM regression model. Cross validate the model using 10-fold cross validation. CVMdl = crossval(mdl) CVMdl = classreg.learning.partition.RegressionPartitionedSVM CrossValidatedModel: 'SVM' PredictorNames: {1x8 cell} CategoricalPredictors: 1 ResponseName: 'Var9' NumObservations: 4177 KFold: 10 Partition: [1x1 cvpartition] ResponseTransform: 'none' Properties, Methods
CVMdl is a RegressionPartitionedSVM cross-validated regression model. The software: 1. Randomly partitions the data into 10 equally sized sets. 2. Trains an SVM regression model on nine of the 10 sets. 3. Repeats steps 1 and 2 k = 10 times. It leaves out one of the partitions each time, and trains on the other nine partitions. 4. Combines generalization statistics for each fold. Calculate the cross-validation loss for the partitioned model. loss = kfoldLoss(CVMdl)
loss = 4.5712
Specify Cross-Validation Holdout Proportion for SVM Regression This example shows how to specify a holdout proportion for training a cross-validated SVM regression model. This example uses the abalone data from the UCI Machine Learning Repository. Download the data and save it in your current folder with the name 'abalone.data'. Read the data into a table. tbl = readtable('abalone.data','Filetype','text','ReadVariableNames',false); rng default % for reproducibility
The sample data contains 4177 observations. All the predictor variables are continuous except for sex, which is a categorical variable with possible values 'M' (for males), 'F' (for females), and 'I' (for infants). The goal is to predict the number of rings on the abalone and determine its age using physical measurements. Train an SVM regression model, using a Gaussian kernel function with an automatic kernel scale. Standardize the data. mdl = fitrsvm(tbl,'Var9','KernelFunction','gaussian','KernelScale','auto','Standardize',true);
mdl is a trained RegressionSVM regression model. Cross validate the regression model by specifying a 10% holdout sample. CVMdl = crossval(mdl,'Holdout',0.1) CVMdl = classreg.learning.partition.RegressionPartitionedSVM CrossValidatedModel: 'SVM' PredictorNames: {1x8 cell} CategoricalPredictors: 1 ResponseName: 'Var9' NumObservations: 4177 KFold: 1 Partition: [1x1 cvpartition] ResponseTransform: 'none' Properties, Methods
CVMdl is a RegressionPartitionedSVM model object. Calculate the cross-validation loss for the partitioned model. loss = kfoldLoss(CVMdl)
loss = 5.2499
Alternatives Instead of training an SVM regression model and then cross-validating it, you can create a cross-validated model directly by using fitrsvm and specifying any of these name-value pair arguments: 'CrossVal', 'CVPartition', 'Holdout', 'Leaveout', or 'KFold'.
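For instance, a sketch of this direct route for the abalone data used in the examples above (it assumes the file 'abalone.data' is in the current folder):
% Sketch: fit and cross-validate the SVM regression model in one call
tbl = readtable('abalone.data','Filetype','text','ReadVariableNames',false);
CVMdl = fitrsvm(tbl,'Var9','KernelFunction','gaussian','KernelScale','auto', ...
    'Standardize',true,'KFold',5);   % RegressionPartitionedSVM
L = kfoldLoss(CVMdl)                 % cross-validation loss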
Version History Introduced in R2015b R2023a: GPU array support Starting in R2023a, crossval fully supports GPU arrays.
References [1] Nash, W.J., T. L. Sellers, S. R. Talbot, A. J. Cawthorn, and W. B. Ford. "The Population Biology of Abalone (Haliotis species) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait." Sea Fisheries Division, Technical Report No. 48, 1994. [2] Waugh, S. "Extending and Benchmarking Cascade-Correlation: Extensions to the Cascade-Correlation Architecture and Benchmarking of Feed-forward Supervised Artificial Neural Networks." University of Tasmania Department of Computer Science thesis, 1995. [3] Clark, D., Z. Schreter, and A. Adams. "A Quantitative Comparison of Dystal and Backpropagation." Submitted to the Australian Conference on Neural Networks, 1996. [4] Lichman, M. UCI Machine Learning Repository, [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also fitrsvm | RegressionPartitionedSVM | RegressionSVM | CompactRegressionSVM | kfoldLoss | kfoldPredict
crossval Cross-validated decision tree
Syntax cvmodel = crossval(model) cvmodel = crossval(model,Name=Value)
Description cvmodel = crossval(model) creates a partitioned model from model, a fitted regression tree. By default, crossval uses 10-fold cross validation on the training data to create cvmodel. cvmodel = crossval(model,Name=Value) creates a partitioned model with additional options specified by a single name-value argument.
Examples Cross-Validate Regression Tree Load the carsmall data set. Consider Acceleration, Displacement, Horsepower, and Weight as predictor variables. load carsmall X = [Acceleration Displacement Horsepower Weight];
Grow a regression tree using the entire data set. Mdl = fitrtree(X,MPG);
Mdl is a RegressionTree model. Cross-validate the regression tree using 10-fold cross-validation. CVMdl = crossval(Mdl);
CVMdl is a RegressionPartitionedModel cross-validated model. crossval stores the ten trained, compact regression trees in the Trained property of CVMdl. Display the compact regression tree that crossval trained using all observations except those in the first fold. CVMdl.Trained{1}
ans =
  CompactRegressionTree
             PredictorNames: {'x1'  'x2'  'x3'  'x4'}
               ResponseName: 'Y'
      CategoricalPredictors: []
          ResponseTransform: 'none'
Estimate the generalization error of Mdl by computing the 10-fold cross-validated mean-squared error. L = kfoldLoss(CVMdl) L = 23.5706
Input Arguments model — Regression tree RegressionTree object Regression tree, specified as a RegressionTree object created by the fitrtree function. Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Example: cvmodel = crossval(model,Holdout=0.5) performs holdout validation using 50% of the data. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: cvmodel = crossval(model,"Holdout",0.5) performs holdout validation using 50% of the data. CVPartition — Partition for cross-validation tree cvpartition object Partition for cross-validated tree, specified as a cvpartition object. Use only one of these four name-value arguments at a time: CVPartition, Holdout, KFold, or Leaveout. Holdout — Fraction of data for holdout validation 0 (default) | numeric scalar in the range [0, 1] Fraction of data used for holdout validation, specified as a numeric scalar in the range [0, 1]. Holdout validation tests the specified fraction of the data, and uses the rest of the data for training. Use only one of these four name-value arguments at a time: CVPartition, Holdout, KFold, or Leaveout. Example: Holdout=0.1 Data Types: single | double KFold — Number of folds 10 (default) | positive integer Number of folds to use in a cross-validated tree, specified as a positive integer. 35-1398
Use only one of these four name-value arguments at a time: CVPartition, Holdout, KFold, or Leaveout. Example: KFold=8 Leaveout — Leave-one-out cross-validation flag "off" (default) | "on" Leave-one-out cross-validation flag, specified as "on" or "off". Use leave-one-out cross-validation by specifying the value "on". Use only one of these four name-value arguments at a time: CVPartition, Holdout, KFold, or Leaveout. Example: Leaveout="on" Data Types: single | double
Output Arguments cvmodel — Partitioned regression model RegressionPartitionedModel Partitioned regression model, returned as a RegressionPartitionedModel object.
Tips • Assess the predictive performance of model on cross-validated data using the "kfold" functions and properties of cvmodel, such as kfoldLoss.
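For example, continuing with the cross-validated tree CVMdl from the example above, a minimal sketch of the "kfold" functions:
% Sketch: query the partitioned model produced by crossval
L    = kfoldLoss(CVMdl);     % 10-fold cross-validated mean squared error
yhat = kfoldPredict(CVMdl);  % out-of-fold prediction for each training observation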
Alternatives You can create a cross-validation tree directly from the data, instead of creating a decision tree followed by a cross-validation tree. To do so, include one of these five name-value arguments in fitrtree: CrossVal, KFold, Holdout, Leaveout, or CVPartition.
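For example, this sketch (reusing the carsmall predictors from the example above) creates the cross-validated tree directly:
% Sketch: create the cross-validation tree directly from the data
load carsmall
X = [Acceleration Displacement Horsepower Weight];
CVMdl = fitrtree(X,MPG,CrossVal="on");   % default 10-fold partitioned tree
L = kfoldLoss(CVMdl)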
Version History Introduced in R2011a
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also fitrtree | crossval
cvloss Classification error by cross validation
Syntax E = cvloss(tree) [E,SE] = cvloss(tree) [E,SE,Nleaf] = cvloss(tree) [E,SE,Nleaf,BestLevel] = cvloss(tree) [ ___ ] = cvloss(tree,Name,Value)
Description E = cvloss(tree) returns the cross-validated classification error (loss) for tree, a classification tree. The cvloss method uses stratified partitioning to create cross-validated sets. That is, for each fold, each partition of the data has roughly the same class proportions as in the data used to train tree. [E,SE] = cvloss(tree) returns the standard error of E. [E,SE,Nleaf] = cvloss(tree) returns the number of leaves of tree. [E,SE,Nleaf,BestLevel] = cvloss(tree) returns the optimal pruning level for tree. [ ___ ] = cvloss(tree,Name,Value) cross validates with additional options specified by one or more Name,Value pair arguments.
Examples Compute the Cross-Validation Error Compute the cross-validation error for a default classification tree. Load the ionosphere data set. load ionosphere
Grow a classification tree using the entire data set. Mdl = fitctree(X,Y);
Compute the cross-validation error. rng(1); % For reproducibility E = cvloss(Mdl) E = 0.1168
E is the 10-fold misclassification error.
Find the Best Pruning Level Using Cross Validation Apply k-fold cross validation to find the best level to prune a classification tree for all of its subtrees. Load the ionosphere data set. load ionosphere
Grow a classification tree using the entire data set. View the resulting tree. Mdl = fitctree(X,Y); view(Mdl,'Mode','graph')
Compute the 5-fold cross-validation error for each subtree except for the highest pruning level. Specify to return the best pruning level over all subtrees. rng(1); % For reproducibility m = max(Mdl.PruneList) - 1 m = 7
[E,~,~,bestLevel] = cvloss(Mdl,'SubTrees',0:m,'KFold',5)
E = 8×1
    0.1282
    0.1254
    0.1225
    0.1282
    0.1282
    0.1197
    0.0997
    0.1738
bestLevel = 6
Of the pruning levels 0 through 7, the best pruning level is 6. Prune the tree to the best level. View the resulting tree. MdlPrune = prune(Mdl,'Level',bestLevel); view(MdlPrune,'Mode','graph')
Input Arguments tree — Trained classification tree ClassificationTree model object Trained classification tree, specified as a ClassificationTree model object produced by the fitctree function. Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: [E,SE,Nleaf,BestLevel] = cvloss(tree,'SubTrees',0:7,'KFold',5) Subtrees — Pruning level 0 (default) | vector of nonnegative integers | "all" Pruning level, specified as a vector of nonnegative integers in ascending order or "all". If you specify a vector, then all elements must be at least 0 and at most max(tree.PruneList). 0 indicates the full, unpruned tree and max(tree.PruneList) indicates the completely pruned tree (i.e., just the root node). If you specify "all", then cvloss operates on all subtrees (in other words, the entire pruning sequence). This specification is equivalent to using 0:max(tree.PruneList). cvloss prunes tree to each level indicated in Subtrees, and then estimates the corresponding output arguments. The size of Subtrees determines the size of some output arguments. To invoke Subtrees, the properties PruneList and PruneAlpha of tree must be nonempty. In other words, grow tree by setting Prune="on", or by pruning tree using prune. Example: Subtrees="all" Data Types: single | double | char | string TreeSize — Tree size 'se' (default) | 'min' Tree size, specified as one of the following values: • 'se' — cvloss uses the smallest tree whose cost is within one standard error of the minimum cost. • 'min' — cvloss uses the minimal cost tree. Example: 'TreeSize','min' KFold — Number of cross-validation samples 10 (default) | positive integer value greater than 1 Number of cross-validation samples, specified as a positive integer value greater than 1. 35-1403
Example: 'KFold',8
Output Arguments E — Cross-validation classification error numeric vector | scalar value Cross-validation classification error (loss), returned as a vector or scalar depending on the setting of the Subtrees name-value pair. SE — Standard error numeric vector | scalar value Standard error of E, returned as a vector or scalar depending on the setting of the Subtrees name-value pair. Nleaf — Number of leaf nodes numeric vector | scalar value Number of leaf nodes in tree, returned as a vector or scalar depending on the setting of the Subtrees name-value pair. Leaf nodes are terminal nodes, which give classifications, not splits. BestLevel — Best pruning level scalar value Best pruning level, returned as a scalar value. By default, a scalar representing the largest pruning level that achieves a value of E within SE of the minimum error. If you set TreeSize to 'min', BestLevel is the smallest value in Subtrees.
Alternatives You can construct a cross-validated tree model with crossval, and call kfoldLoss instead of cvloss. If you are going to examine the cross-validated tree more than once, then the alternative can save time. However, unlike cvloss, kfoldLoss does not return SE, Nleaf, or BestLevel. kfoldLoss also does not allow you to examine any error other than the classification error.
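For example, a sketch of this alternative for the ionosphere classification tree used in the examples above:
% Sketch: cross-validate once, then reuse the partitioned tree
load ionosphere
Mdl   = fitctree(X,Y);
CVMdl = crossval(Mdl);    % 10-fold partitioned classification tree
E     = kfoldLoss(CVMdl)  % cross-validated classification error (no SE, Nleaf, or BestLevel)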
Version History Introduced in R2011a
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also fitctree | crossval | loss | kfoldLoss
cvloss Package: timeseries.forecaster Loss for partitioned data at each horizon step
Syntax L = cvloss(CVMdl) L = cvloss(CVMdl,Name=Value)
Description L = cvloss(CVMdl) returns the loss (mean squared error) obtained by the cross-validated direct forecasting model CVMdl at each step of the horizon (CVMdl.Horizon). For each partition window in CVMdl.Partition and each horizon step, the function computes the loss for the test observations using a model trained on the training observations. CVMdl.X and CVMdl.Y contain all observations. L = cvloss(CVMdl,Name=Value) specifies additional options using one or more name-value arguments. For example, you can specify a custom loss function.
Examples Evaluate Model Using Expanding Window Cross-Validation Create a cross-validated direct forecasting model using expanding window cross-validation. To evaluate the performance of the model: • Compute the mean squared error (MSE) on each test set using the cvloss object function. • For each test set, compare the true response values to the predicted response values using the cvpredict object function. Load the sample file TemperatureData.csv, which contains average daily temperature from January 2015 through July 2016. Read the file into a table. Observe the first eight observations in the table. Tbl = readtable("TemperatureData.csv"); head(Tbl)
    Year       Month       Day    TemperatureF
    ____    ___________    ___    ____________
    2015    {'January'}     1          23
    2015    {'January'}     2          31
    2015    {'January'}     3          25
    2015    {'January'}     4          39
    2015    {'January'}     5          29
    2015    {'January'}     6          12
    2015    {'January'}     7          10
    2015    {'January'}     8           4
Create a datetime variable t that contains the year, month, and day information for each observation in Tbl. numericMonth = month(datetime(Tbl.Month, ... InputFormat="MMMM")); t = datetime(Tbl.Year,numericMonth,Tbl.Day);
Plot the temperature values in Tbl over time. plot(t,Tbl.TemperatureF) xlabel("Date") ylabel("Temperature in Fahrenheit")
Create a direct forecasting model by using the data in Tbl. Train the model using a bagged ensemble of trees. All three of the predictors (Year, Month, and Day) are leading predictors because their future values are known. To create new predictors by shifting the leading predictor and response variables backward in time, specify the leading predictor lags and the response variable lags. Mdl = directforecaster(Tbl,"TemperatureF", ... Learner="bag", ... LeadingPredictors="all",LeadingPredictorLags={0:1,0:1,0:7}, ... ResponseLags=1:7) Mdl = DirectForecaster Horizon: 1 ResponseLags: [1 2 3 4 5 6 7]
        LeadingPredictors: [1 2 3]
     LeadingPredictorLags: {[0 1]  [0 1]  [0 1 2 3 4 5 6 7]}
             ResponseName: 'TemperatureF'
           PredictorNames: {'Year'  'Month'  'Day'}
    CategoricalPredictors: 2
                 Learners: {[1x1 classreg.learning.regr.CompactRegressionEnsemble]}
                   MaxLag: 7
          NumObservations: 565
Mdl is a DirectForecaster model object. By default, the horizon is one step ahead. That is, Mdl predicts a value that is one step into the future. Partition the time series data in Tbl using an expanding window cross-validation scheme. Create three training sets and three test sets, where each test set has 100 observations. Note that each observation in Tbl is in at most one test set. CVPartition = tspartition(size(Mdl.X,1),"ExpandingWindow",3, ... TestSize=100)
CVPartition =
  tspartition
               Type: 'expanding-window'
    NumObservations: 565
        NumTestSets: 3
          TrainSize: [265 365 465]
           TestSize: [100 100 100]
           StepSize: 100
The training sets increase in size from 265 observations in the first window to 465 observations in the third window. Create a cross-validated direct forecasting model using the partition specified in CVPartition. Inspect the Learners property of the resulting CVMdl object. CVMdl = crossval(Mdl,CVPartition)
CVMdl =
  PartitionedDirectForecaster
                Partition: [1x1 tspartition]
                  Horizon: 1
             ResponseLags: [1 2 3 4 5 6 7]
        LeadingPredictors: [1 2 3]
     LeadingPredictorLags: {[0 1]  [0 1]  [0 1 2 3 4 5 6 7]}
             ResponseName: 'TemperatureF'
           PredictorNames: {'Year'  'Month'  'Day'}
    CategoricalPredictors: 2
                 Learners: {3x1 cell}
                   MaxLag: 7
          NumObservations: 565

CVMdl.Learners
ans=3×1 cell array
    {1x1 timeseries.forecaster.CompactDirectForecaster}
    {1x1 timeseries.forecaster.CompactDirectForecaster}
    {1x1 timeseries.forecaster.CompactDirectForecaster}
CVMdl is a PartitionedDirectForecaster model object. The crossval function trains CVMdl.Learners{1} using the observations in the first training set, CVMdl.Learners{2} using the observations in the second training set, and CVMdl.Learners{3} using the observations in the third training set. Compute the average test set MSE. averageMSE = cvloss(CVMdl) averageMSE = 53.3480
To obtain more information, compute the MSE for each test set. individualMSE = cvloss(CVMdl,Mode="individual") individualMSE = 3×1 44.1352 84.0695 31.8393
The models trained on the first and third training sets seem to perform better than the model trained on the second training set. For each test set observation, predict the temperature value using the corresponding model in CVMdl.Learners. predictedY = cvpredict(CVMdl); predictedY(260:end,:)
ans=306×1 table
    TemperatureF_Step1
    __________________
           NaN
           NaN
           NaN
           NaN
           NaN
           NaN
        50.963
        57.363
         57.04
        60.705
        59.606
        58.302
        58.023
         61.39
        67.229
        61.083
          ⋮
Only the last 300 observations appear in any test set. For observations that do not appear in a test set, the predicted response value is NaN. For each test set, plot the true response values and the predicted response values. tiledlayout(3,1) nexttile idx1 = test(CVPartition,1); plot(t(idx1),Tbl.TemperatureF(idx1)) hold on plot(t(idx1),predictedY.TemperatureF_Step1(idx1)) legend("True Response","Predicted Response", ... Location="eastoutside") xlabel("Date") ylabel("Temperature") title("Test Set 1") hold off nexttile idx2 = test(CVPartition,2); plot(t(idx2),Tbl.TemperatureF(idx2)) hold on plot(t(idx2),predictedY.TemperatureF_Step1(idx2)) legend("True Response","Predicted Response", ... Location="eastoutside") xlabel("Date") ylabel("Temperature") title("Test Set 2") hold off nexttile idx3 = test(CVPartition,3); plot(t(idx3),Tbl.TemperatureF(idx3)) hold on plot(t(idx3),predictedY.TemperatureF_Step1(idx3)) legend("True Response","Predicted Response", ... Location="eastoutside") xlabel("Date") ylabel("Temperature") title("Test Set 3") hold off
Overall, the cross-validated direct forecasting model is able to predict the trend in temperatures. If you are satisfied with the performance of the cross-validated model, you can use the full DirectForecaster model Mdl for forecasting at time steps beyond the available data.
Evaluate Model Using Holdout Validation Create a partitioned direct forecasting model using holdout validation. To evaluate the performance of the model: • At each horizon step, compute the root relative squared error (RRSE) on the test set using the cvloss object function. • At each horizon step, compare the true response values to the predicted response values using the cvpredict object function. Load the sample file TemperatureData.csv, which contains average daily temperature from January 2015 through July 2016. Read the file into a table. Observe the first eight observations in the table. Tbl = readtable("TemperatureData.csv"); head(Tbl)
    Year       Month       Day    TemperatureF
    ____    ___________    ___    ____________
    2015    {'January'}     1          23
    2015    {'January'}     2          31
    2015    {'January'}     3          25
    2015    {'January'}     4          39
    2015    {'January'}     5          29
    2015    {'January'}     6          12
    2015    {'January'}     7          10
    2015    {'January'}     8           4
Create a datetime variable t that contains the year, month, and day information for each observation in Tbl. numericMonth = month(datetime(Tbl.Month, ... InputFormat="MMMM")); t = datetime(Tbl.Year,numericMonth,Tbl.Day);
Plot the temperature values in Tbl over time. plot(t,Tbl.TemperatureF) xlabel("Date") ylabel("Temperature in Fahrenheit")
Create a direct forecasting model by using the data in Tbl. Specify the horizon steps as one, two, and three steps ahead. Train a model at each horizon using a bagged ensemble of trees. All three of the predictors (Year, Month, and Day) are leading predictors because their future values are known. To create new predictors by shifting the leading predictor and response variables backward in time, specify the leading predictor lags and the response variable lags.
rng("default")
Mdl = directforecaster(Tbl,"TemperatureF", ...
    Horizon=1:3,Learner="bag", ...
    LeadingPredictors="all",LeadingPredictorLags={0:1,0:1,0:7}, ...
    ResponseLags=1:7)
Mdl =
  DirectForecaster
                  Horizon: [1 2 3]
             ResponseLags: [1 2 3 4 5 6 7]
        LeadingPredictors: [1 2 3]
     LeadingPredictorLags: {[0 1]  [0 1]  [0 1 2 3 4 5 6 7]}
             ResponseName: 'TemperatureF'
           PredictorNames: {'Year'  'Month'  'Day'}
    CategoricalPredictors: 2
                 Learners: {3x1 cell}
                   MaxLag: 7
          NumObservations: 565
Mdl is a DirectForecaster model object. Mdl consists of three regression models: Mdl.Learners{1}, which predicts one step ahead; Mdl.Learners{2}, which predicts two steps ahead; and Mdl.Learners{3}, which predicts three steps ahead. Partition the time series data in Tbl using a holdout validation scheme. Reserve 20% of the observations for testing. holdoutPartition = tspartition(size(Mdl.X,1),"Holdout",0.20)
holdoutPartition =
  tspartition
               Type: 'holdout'
    NumObservations: 565
        NumTestSets: 1
          TrainSize: 452
           TestSize: 113
The test set consists of the latest 113 observations. Create a partitioned direct forecasting model using the partition specified in holdoutPartition. holdoutMdl = crossval(Mdl,holdoutPartition)
holdoutMdl =
  PartitionedDirectForecaster
                Partition: [1x1 tspartition]
                  Horizon: [1 2 3]
             ResponseLags: [1 2 3 4 5 6 7]
        LeadingPredictors: [1 2 3]
     LeadingPredictorLags: {[0 1]  [0 1]  [0 1 2 3 4 5 6 7]}
             ResponseName: 'TemperatureF'
           PredictorNames: {'Year'  'Month'  'Day'}
    CategoricalPredictors: 2
                 Learners: {[1x1 timeseries.forecaster.CompactDirectForecaster]}
                   MaxLag: 7
          NumObservations: 565
holdoutMdl is a PartitionedDirectForecaster model object. Because holdoutMdl uses holdout validation rather than a cross-validation scheme, the Learners property of the object contains one CompactDirectForecaster model only. Like Mdl, holdoutMdl contains three regression models. The crossval function trains holdoutMdl.Learners{1}.Learners{1}, holdoutMdl.Learners{1}.Learners{2}, and holdoutMdl.Learners{1}.Learners{3} using the same training data. However, the three models use different response variables because each model predicts values for a different horizon step. holdoutMdl.Learners{1}.Learners{1}.ResponseName ans = 'TemperatureF_Step1' holdoutMdl.Learners{1}.Learners{2}.ResponseName ans = 'TemperatureF_Step2' holdoutMdl.Learners{1}.Learners{3}.ResponseName ans = 'TemperatureF_Step3'
Compute the root relative squared error (RRSE) on the test data at each horizon step. Use the helper function computeRRSE (shown at the end of this example). The RRSE indicates how well a model performs relative to the simple model, which always predicts the average of the true values. In particular, when the RRSE is less than 1, the model performs better than the simple model. holdoutRRSE = cvloss(holdoutMdl,LossFun=@computeRRSE)
holdoutRRSE = 1×3
    0.4797    0.5889    0.6103
At each horizon, the direct forecasting model seems to perform better than the simple model. For each test set observation, predict the temperature value using the corresponding model in holdoutMdl.Learners. predictedY = cvpredict(holdoutMdl); predictedY(450:end,:)
ans=116×3 table
    TemperatureF_Step1    TemperatureF_Step2    TemperatureF_Step3
    __________________    __________________    __________________
           NaN                   NaN                   NaN
           NaN                   NaN                   NaN
           NaN                   NaN                   NaN
        41.063                39.758                41.234
        33.721                36.507                37.719
        36.987                35.133                37.719
        38.644                34.598                36.444
        38.917                34.576                36.275
        45.888                37.005                 38.34
        48.516                42.762                 41.05
        44.882                46.816                43.881
        35.057                45.301                47.048
          31.1                41.473                42.948
        31.817                37.314                42.946
        33.166                38.419                  41.3
        40.279                38.432                40.533
          ⋮
Recall that only the latest 113 observations appear in the test set. For observations that do not appear in the test set, the predicted response value is NaN. For each test set, plot the true response values and the predicted response values. tiledlayout(3,1) idx = test(holdoutPartition); nexttile plot(t(idx),Tbl.TemperatureF(idx)) hold on plot(t(idx),predictedY.TemperatureF_Step1(idx)) legend("True Response","Predicted Response", ... Location="eastoutside") xlabel("Date") ylabel("Temperature") title("Horizon 1") hold off nexttile plot(t(idx),Tbl.TemperatureF(idx)) hold on plot(t(idx),predictedY.TemperatureF_Step2(idx)) legend("True Response","Predicted Response", ... Location="eastoutside") xlabel("Date") ylabel("Temperature") title("Horizon 2") hold off nexttile plot(t(idx),Tbl.TemperatureF(idx)) hold on plot(t(idx),predictedY.TemperatureF_Step3(idx)) legend("True Response","Predicted Response", ... Location="eastoutside") xlabel("Date") ylabel("Temperature") title("Horizon 3") hold off
Overall, holdoutMdl is able to predict the trend in temperatures, although it seems to perform best when forecasting one step ahead. If you are satisfied with the performance of the partitioned model, you can use the full DirectForecaster model Mdl for forecasting at time steps beyond the available data. Helper Function The helper function computeRRSE computes the RRSE given the true response variable trueY and the predicted values predY. This code creates the computeRRSE helper function. function rrse = computeRRSE(trueY,predY) error = trueY(:) - predY(:); meanY = mean(trueY(:),"omitnan"); rrse = sqrt(sum(error.^2,"omitnan")/sum((trueY(:) - meanY).^2,"omitnan")); end
Input Arguments CVMdl — Cross-validated direct forecasting model PartitionedDirectForecaster model object Cross-validated direct forecasting model, specified as a PartitionedDirectForecaster model object.
Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Example: cvloss(CVMdl,Mode="individual") specifies to return the loss for each window and horizon step combination. LossFun — Loss function "mse" (default) | function handle Loss function, specified as "mse" or a function handle. • If you specify the built-in function "mse", then the loss function is the mean squared error. • If you specify your own function using function handle notation, then the function must have the signature lossvalue = lossfun(Y,predictedY), where: • The output argument lossvalue is a scalar. • You specify the function name (lossfun). • Y is an n-by-1 vector of observed numeric responses at a specific horizon step, where n is the number of observations (CVMdl.NumObservations). • predictedY is an n-by-1 vector of predicted numeric responses at a specific horizon step. Specify your function using LossFun=@lossfun. Data Types: single | double | function_handle Mode — Loss aggregation level "average" (default) | "individual" Loss aggregation level, specified as "average" or "individual".
Value           Description
"average"       cvloss returns a vector of loss values containing the loss at each horizon step, averaged over all partition windows. At each horizon step, if an observation is in more than one test set, the function averages the predictions for the observation over all test sets before computing the loss.
"individual"    cvloss returns a w-by-h matrix of loss values, where w is the number of partition windows and h is the number of horizon steps (that is, the number of elements in CVMdl.Horizon). Before computing loss values at each horizon step, the function does not average the predictions for observations that are in more than one test set.
Example: Mode="individual"
Data Types: char | string
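To illustrate the LossFun signature described above, the following sketch defines a mean absolute error loss; the function name computeMAE is hypothetical and not part of the toolbox.
% Hypothetical custom loss function with the required signature
function maeValue = computeMAE(Y,predictedY)
    maeValue = mean(abs(Y(:) - predictedY(:)),"omitnan");
end

% Example call (assuming a cross-validated direct forecasting model CVMdl):
% L = cvloss(CVMdl,LossFun=@computeMAE,Mode="individual")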
Output Arguments L — Losses numeric vector | numeric matrix Losses, returned as a numeric vector or numeric matrix. • If Mode is "average", then L is a vector of loss values containing the loss at each horizon step, averaged over all partition windows. At each horizon step, if an observation is in more than one test set, the function averages the predictions for the observation over all test sets before computing the loss. • If Mode is "individual", then L is a w-by-h matrix of loss values, where w is the number of partition windows and h is the number of horizon steps (that is, the number of elements in CVMdl.Horizon). Before computing the loss values at each horizon step, the function does not average the predictions for observations that are in more than one test set.
Version History Introduced in R2023b
See Also PartitionedDirectForecaster | cvpredict | DirectForecaster | tspartition
cvloss Regression error by cross validation
Syntax E = cvloss(tree) [E,SE] = cvloss(tree) [E,SE,Nleaf] = cvloss(tree) [E,SE,Nleaf,bestLevel] = cvloss(tree) [ ___ ] = cvloss(tree,Name=Value)
Description E = cvloss(tree) returns the cross-validated regression error (loss) for tree, a regression tree. [E,SE] = cvloss(tree) also returns the standard error of E. [E,SE,Nleaf] = cvloss(tree) returns the number of leaves (terminal nodes) in tree. [E,SE,Nleaf,bestLevel] = cvloss(tree) returns the optimal pruning level for tree. [ ___ ] = cvloss(tree,Name=Value) cross validates with additional options specified by one or more name-value arguments.
Examples Compute the Cross-Validation Error Compute the cross-validation error for a default regression tree. Load the carsmall data set. Consider Displacement, Horsepower, and Weight as predictors of the response MPG. load carsmall X = [Displacement Horsepower Weight];
Grow a regression tree using the entire data set. Mdl = fitrtree(X,MPG);
Compute the cross-validation error. rng(1); % For reproducibility E = cvloss(Mdl) E = 27.6976
E is the 10-fold weighted average MSE (weighted by the number of test observations in the folds).
Find the Best Pruning Level Using Cross Validation Apply k-fold cross validation to find the best level to prune a regression tree for all of its subtrees. Load the carsmall data set. Consider Displacement, Horsepower, and Weight as predictors of the response MPG. load carsmall X = [Displacement Horsepower Weight];
Grow a regression tree using the entire data set. View the resulting tree. Mdl = fitrtree(X,MPG); view(Mdl,Mode="graph")
Compute the 5-fold cross-validation error for each subtree except for the two lowest and the highest pruning levels. Specify to return the best pruning level over all subtrees. rng(1); % For reproducibility m = max(Mdl.PruneList) - 1 m = 15
[~,~,~,bestLevel] = cvloss(Mdl,SubTrees=2:m,KFold=5) bestLevel = 14
Of pruning levels 2 through 15, the best pruning level is 14. Prune the tree to the best level. View the resulting tree. MdlPrune = prune(Mdl,Level=bestLevel); view(MdlPrune,Mode="graph")
Input Arguments tree — Trained regression tree RegressionTree object Trained regression tree, specified as a RegressionTree object created using the fitrtree function.
Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Example: E = cvloss(tree,Subtrees="all") prunes all subtrees. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: E = cvloss(tree,"Subtrees","all") prunes all subtrees. Subtrees — Pruning level 0 (default) | vector of nonnegative integers | "all" Pruning level, specified as a vector of nonnegative integers in ascending order or "all". If you specify a vector, then all elements must be at least 0 and at most max(tree.PruneList). 0 indicates the full, unpruned tree and max(tree.PruneList) indicates the completely pruned tree (in other words, just the root node). If you specify "all", then cvloss operates on all subtrees (in other words, the entire pruning sequence). This specification is equivalent to using 0:max(tree.PruneList). cvloss prunes tree to each level indicated in Subtrees, and then estimates the corresponding output arguments. The size of Subtrees determines the size of some output arguments. To invoke Subtrees, the properties PruneList and PruneAlpha of tree must be nonempty. In other words, grow tree by setting Prune="on", or by pruning tree using prune. Example: Subtrees="all" Data Types: single | double | char | string TreeSize — Tree size "se" (default) | "min" Tree size, specified as one of the following: • "se" — The cvloss function uses the smallest tree whose cost is within one standard error of the minimum cost. • "min" — The cvloss function uses the minimal cost tree. Example: TreeSize="min" KFold — Number of folds 10 (default) | positive integer greater than 1 Number of folds to use in a cross-validated tree, specified as a positive integer greater than 1. Example: KFold=8
Output Arguments E — Mean squared error numeric vector
Cross-validation mean squared error (loss), returned as a numeric vector of the same length as Subtrees. SE — Standard error numeric vector Standard error of E, returned as a numeric vector of the same length as Subtrees. Nleaf — Number of leaf nodes numeric vector Number of leaf nodes in the pruned subtrees, returned as a numeric vector of the same length as Subtrees. Leaf nodes are terminal nodes, which give responses, not splits. bestLevel — Best pruning level numeric scalar Best pruning level as defined in the TreeSize name-value argument, returned as a numeric scalar whose value depends on TreeSize: • If TreeSize is "se", then bestLevel is the largest pruning level that achieves a value of E within SE of the minimum error. • If TreeSize is "min", then bestLevel is the smallest value in Subtrees.
Alternatives You can create a cross-validated tree model using crossval, and call kfoldLoss instead of cvloss. If you are going to examine the cross-validated tree more than once, then the alternative can save time. However, unlike cvloss, kfoldLoss does not return SE, Nleaf, or BestLevel.
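For example, a sketch of this alternative using the carsmall predictors from the examples above:
% Sketch: build the partitioned tree once and query it repeatedly
load carsmall
X = [Displacement Horsepower Weight];
CVMdl = crossval(fitrtree(X,MPG));              % 10-fold partitioned regression tree
E        = kfoldLoss(CVMdl);                    % overall cross-validated MSE
perFoldE = kfoldLoss(CVMdl,Mode="individual");  % one MSE value per fold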
Version History Introduced in R2011a
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also crossval | kfoldLoss | fitrtree | loss
cvpartition Partition data for cross-validation
Description cvpartition defines a random partition on a data set. Use this partition to define training and test sets for validating a statistical model using cross-validation. Use training to extract the training indices and test to extract the test indices for cross-validation. Use repartition to define a new random partition of the same type as a given cvpartition object.
Creation
Syntax
c = cvpartition(n,"KFold",k)
c = cvpartition(n,"Holdout",p)
c = cvpartition(group,"KFold",k)
c = cvpartition(group,"KFold",k,"Stratify",stratifyOption)
c = cvpartition(group,"Holdout",p)
c = cvpartition(group,"Holdout",p,"Stratify",stratifyOption)
c = cvpartition(n,"Leaveout")
c = cvpartition(n,"Resubstitution")
c = cvpartition("CustomPartition",testSets)
Description c = cvpartition(n,"KFold",k) returns a cvpartition object c that defines a random nonstratified partition for k-fold cross-validation on n observations. The partition randomly divides the observations into k disjoint subsamples, or folds, each of which has approximately the same number of observations. c = cvpartition(n,"Holdout",p) creates a random nonstratified partition for holdout validation on n observations. This partition divides the observations into a training set and a test, or holdout, set. c = cvpartition(group,"KFold",k) creates a random partition for stratified k-fold cross-validation. Each subsample, or fold, has approximately the same number of observations and contains approximately the same class proportions as in group. When you specify group as the first input argument, cvpartition discards rows of observations corresponding to missing values in group. c = cvpartition(group,"KFold",k,"Stratify",stratifyOption) returns a cvpartition object c that defines a random partition for k-fold cross-validation. If you specify
"Stratify",false, then cvpartition ignores the class information in group and creates a nonstratified random partition. Otherwise, the function implements stratification by default. c = cvpartition(group,"Holdout",p) randomly partitions observations into a training set and a test, or holdout, set with stratification, using the class information in group. Both the training and test sets have approximately the same class proportions as in group. c = cvpartition(group,"Holdout",p,"Stratify",stratifyOption) returns an object c that defines a random partition into a training set and a test, or holdout, set. If you specify "Stratify",false, then cvpartition creates a nonstratified random partition. Otherwise, the function implements stratification by default. c = cvpartition(n,"Leaveout") creates a random partition for leave-one-out cross-validation on n observations. Leave-one-out is a special case of "KFold" in which the number of folds equals the number of observations. c = cvpartition(n,"Resubstitution") creates an object c that does not partition the data. Both the training set and the test set contain all of the original n observations. c = cvpartition("CustomPartition",testSets) creates a cvpartition object c that partitions the data based on the test sets indicated in testSets. Input Arguments n — Number of observations positive integer scalar Number of observations in the sample data, specified as a positive integer scalar. Example: 100 Data Types: single | double k — Number of folds 10 (default) | positive integer scalar Number of folds in the partition, specified as a positive integer scalar. k must be smaller than the total number of observations. Example: 5 Data Types: single | double p — Fraction or number of observations in test set 0.1 (default) | scalar in the range (0,1) | integer scalar in the range [1,n) Fraction or number of observations in the test set used for holdout validation, specified as a scalar in the range (0,1) or an integer scalar in the range [1,n), where n is the total number of observations. • If p is a scalar in the range (0,1), then cvpartition randomly selects approximately p*n observations for the test set. • If p is an integer scalar in the range [1,n), then cvpartition randomly selects p observations for the test set. Example: 0.2 Example: 50 35-1425
Data Types: single | double group — Grouping variable for stratification numeric vector | logical vector | categorical array | character array | string array | cell array of character vectors Grouping variable for stratification, specified as a numeric or logical vector, a categorical, character, or string array, or a cell array of character vectors indicating the class of each observation. cvpartition creates a partition from the observations in group. Data Types: single | double | logical | categorical | char | string | cell stratifyOption — Indicator for stratification true | false Indicator for stratification, specified as true or false. • If the first input argument to cvpartition is group, then cvpartition implements stratification by default ("Stratify",true). For a nonstratified random partition, specify "Stratify",false. • If the first input argument to cvpartition is n, then cvpartition always creates a nonstratified random partition ("Stratify",false). In this case, you cannot specify "Stratify",true. Data Types: logical testSets — Custom test sets positive integer vector | logical vector | logical matrix Custom test sets, specified as a positive integer vector, logical vector, or logical matrix. • For holdout validation, specify the test set observations by using a logical vector. A value of 1 (true) indicates that the corresponding observation is in the test set, and value of 0 (false) indicates that the corresponding observation is in the training set. • For k-fold cross-validation, specify the test set observations by using an integer vector (with values in the range [1,k]) or a logical matrix with k columns. • Integer vector — A value of j indicates that the corresponding observation is in test set j. • Logical matrix — The value in row i and column j indicates whether observation i is in test set j. Each of the k test sets must contain at least one observation. • For leave-one-out cross-validation, specify the test set observations by using an integer vector (with values in the range [1,n]) or an n-by-n logical matrix, where n is the number of observations in the data. • Integer vector — A value of j indicates that the corresponding observation is in test set j. • Logical matrix — The value in row i and column j indicates whether observation i is in test set j. Example: "CustomPartition",[true false true false false] indicates a holdout validation scheme, with the first and third observations in the test set. Example: "CustomPartition",[1 2 2 1 3 3 1 2 3 2] indicates a 3-fold cross-validation scheme, with the first, fourth, and seventh observations in the first test set. 35-1426
Data Types: single | double | logical
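For example, this sketch defines the 3-fold custom partition from the example above and extracts its first training and test sets.
% Sketch: custom 3-fold partition for ten observations
testSets = [1 2 2 1 3 3 1 2 3 2];            % observation i belongs to test set testSets(i)
c = cvpartition("CustomPartition",testSets);

trainIdx1 = training(c,1);   % logical index of fold 1 training observations
testIdx1  = test(c,1);       % logical index of fold 1 test observations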
Properties IsCustom — Indicator of custom partition logical scalar This property is read-only. Indicator of a custom partition, specified as a logical scalar. The value is 1 (true) when the object was created using custom partitioning. The value is 0 (false) otherwise. Data Types: logical NumObservations — Number of observations positive integer scalar This property is read-only. Number of observations, including observations with missing group values, specified as a positive integer scalar. Data Types: double NumTestSets — Total number of test sets number of folds | 1 This property is read-only. Total number of test sets in the partition, specified as the number of folds when the partition type is 'kfold' or 'leaveout', and 1 when the partition type is 'holdout' or 'resubstitution'. Data Types: double TestSize — Size of each test set positive integer vector | positive integer scalar This property is read-only. Size of each test set, specified as a positive integer vector when the partition type is 'kfold' or 'leaveout', and a positive integer scalar when the partition type is 'holdout' or 'resubstitution'. Data Types: double TrainSize — Size of each training set positive integer vector | positive integer scalar This property is read-only. Size of each training set, specified as a positive integer vector when the partition type is 'kfold' or 'leaveout', and a positive integer scalar when the partition type is 'holdout' or 'resubstitution'. Data Types: double 35-1427
Type — Type of validation partition 'kfold' | 'holdout' | 'leaveout' | 'resubstitution' This property is read-only. Type of validation partition, specified as 'kfold', 'holdout', 'leaveout', or 'resubstitution'.
Object Functions
repartition    Repartition data for cross-validation
test           Test indices for cross-validation
training       Training indices for cross-validation
Examples Estimate Accuracy of Classifying New Data by Using Cross-Validation Error Use the cross-validation misclassification error to estimate how a model will perform on new data. Load the ionosphere data set. Create a table containing the predictor data X and the response variable Y. load ionosphere tbl = array2table(X); tbl.Y = Y;
Use a random nonstratified partition hpartition to split the data into training data (tblTrain) and a reserved data set (tblNew). Reserve approximately 30 percent of the data. rng('default') % For reproducibility n = length(tbl.Y); hpartition = cvpartition(n,'Holdout',0.3); % Nonstratified partition idxTrain = training(hpartition); tblTrain = tbl(idxTrain,:); idxNew = test(hpartition); tblNew = tbl(idxNew,:);
Train a support vector machine (SVM) classification model using the training data tblTrain. Calculate the misclassification error and the classification accuracy on the training data. Mdl = fitcsvm(tblTrain,'Y'); trainError = resubLoss(Mdl) trainError = 0.0569 trainAccuracy = 1-trainError trainAccuracy = 0.9431
Typically, the misclassification error on the training data is not a good estimate of how a model will perform on new data because it can underestimate the misclassification rate on new data. A better estimate is the cross-validation error.
Create a partitioned model cvMdl. Compute the 10-fold cross-validation misclassification error and classification accuracy. By default, crossval ensures that the class proportions in each fold remain approximately the same as the class proportions in the response variable tblTrain.Y. cvMdl = crossval(Mdl); % Performs stratified 10-fold cross-validation cvtrainError = kfoldLoss(cvMdl) cvtrainError = 0.1220 cvtrainAccuracy = 1-cvtrainError cvtrainAccuracy = 0.8780
Notice that the cross-validation error cvtrainError is greater than the resubstitution error trainError. Classify the new data in tblNew using the trained SVM model. Compare the classification accuracy on the new data to the accuracy estimates trainAccuracy and cvtrainAccuracy. newError = loss(Mdl,tblNew,'Y'); newAccuracy = 1-newError newAccuracy = 0.8700
The cross-validation error gives a better estimate of the model performance on new data than the resubstitution error.
Find Misclassification Rates Using K-Fold Cross-Validation Use the same stratified partition for 5-fold cross-validation to compute the misclassification rates of two models. Load the fisheriris data set. The matrix meas contains flower measurements for 150 different flowers. The variable species lists the species for each flower. load fisheriris
Create a random partition for stratified 5-fold cross-validation. The training and test sets have approximately the same proportions of flower species as species. rng('default') % For reproducibility c = cvpartition(species,'KFold',5);
Create a partitioned discriminant analysis model and a partitioned classification tree model by using c. discrCVModel = fitcdiscr(meas,species,'CVPartition',c); treeCVModel = fitctree(meas,species,'CVPartition',c);
Compute the misclassification rates of the two partitioned models. discrRate = kfoldLoss(discrCVModel) discrRate = 0.0200 treeRate = kfoldLoss(treeCVModel)
treeRate = 0.0333
The discriminant analysis model has a smaller cross-validation misclassification rate.
Create Nonstratified Partition Observe the test set (fold) class proportions in a 5-fold nonstratified partition of the fisheriris data. The class proportions differ across the folds. Load the fisheriris data set. The species variable contains the species name (class) for each flower (observation). Convert species to a categorical variable. load fisheriris species = categorical(species);
Find the number of observations in each class. Notice that the three classes occur in equal proportion. C = categories(species) % Class names C = 3x1 cell {'setosa' } {'versicolor'} {'virginica' } numClasses = size(C,1); n = countcats(species) % Number of observations in each class n = 3×1 50 50 50
Create a random nonstratified 5-fold partition. rng('default') % For reproducibility cv = cvpartition(species,'KFold',5,'Stratify',false)
cv =
K-fold cross validation partition
   NumObservations: 150
       NumTestSets: 5
         TrainSize: 120 120 120 120 120
          TestSize: 30 30 30 30 30
          IsCustom: 0
Show that the three classes do not occur in equal proportion in each of the five test sets, or folds. Use a for-loop to update the nTestData matrix so that each entry nTestData(i,j) corresponds to the number of observations in test set i and class C(j). Create a bar chart from the data in nTestData. numFolds = cv.NumTestSets; nTestData = zeros(numFolds,numClasses); for i = 1:numFolds
testClasses = species(cv.test(i)); nCounts = countcats(testClasses); % Number of test set observations in each class nTestData(i,:) = nCounts'; end bar(nTestData) xlabel('Test Set (Fold)') ylabel('Number of Observations') title('Nonstratified Partition') legend(C)
Notice that the class proportions vary in some of the test sets. For example, the first test set contains 8 setosa, 13 versicolor, and 9 virginica flowers, rather than 10 flowers per species. Because cv is a random nonstratified partition of the fisheriris data, the class proportions in each test set (fold) are not guaranteed to be equal to the class proportions in species. That is, the classes do not always occur equally in each test set, as they do in species.
Create Nonstratified and Stratified Holdout Partitions for Tall Array Create a nonstratified holdout partition and a stratified holdout partition for a tall array. For the two holdout sets, compare the number of observations in each class. When you perform calculations on tall arrays, MATLAB® uses either a parallel pool (the default if you have Parallel Computing Toolbox™) or the local MATLAB session. To run the example using the local
MATLAB session when you have Parallel Computing Toolbox, change the global execution environment by using the mapreducer function. mapreducer(0)
Create a numeric vector of two classes, where class 1 and class 2 occur in the ratio 1:10. group = [ones(20,1);2*ones(200,1)] group = 220×1 1 1 1 1 1 1 1 1 1 1
⋮
Create a tall array from group. tgroup = tall(group) tgroup = 220x1 tall double column vector 1 1 1 1 1 1 1 1 : :
Holdout is the only cvpartition option that is supported for tall arrays. Create a random nonstratified holdout partition. CV0 = cvpartition(tgroup,'Holdout',1/4,'Stratify',false) CV0 = Hold-out cross validation partition NumObservations: [1x1 tall] NumTestSets: 1 TrainSize: [1x1 tall] TestSize: [1x1 tall] IsCustom: 0
Return the result of CV0.test to memory by using the gather function. testIdx0 = gather(CV0.test);
Evaluating tall expression using the Local MATLAB Session: - Pass 1 of 1: Completed in 0.81 sec Evaluation completed in 1.1 sec
Find the number of times each class occurs in the test, or holdout, set. accumarray(group(testIdx0),1) % Number of observations per class in the holdout set ans = 2×1 5 51
cvpartition produces randomness in the results, so your number of observations in each class can vary from those shown. Because CV0 is a nonstratified partition, class 1 observations and class 2 observations in the holdout set are not guaranteed to occur in the same ratio as in tgroup. However, because of the inherent randomness in cvpartition, you can sometimes obtain a holdout set in which the classes occur in the same ratio as in tgroup, even though you specify 'Stratify',false. Because the training set is the complement of the holdout set, excluding any NaN or missing observations, you can obtain a similar result for the training set. Return the result of CV0.training to memory. trainIdx0 = gather(CV0.training); Evaluating tall expression using the Local MATLAB Session: - Pass 1 of 1: Completed in 0.31 sec Evaluation completed in 0.41 sec
Find the number of times each class occurs in the training set. accumarray(group(trainIdx0),1) % Number of observations per class in the training set ans = 2×1 15 149
The classes in the nonstratified training set are not guaranteed to occur in the same ratio as in tgroup. Create a random stratified holdout partition. CV1 = cvpartition(tgroup,'Holdout',1/4) CV1 = Hold-out cross validation partition NumObservations: [1x1 tall] NumTestSets: 1 TrainSize: [1x1 tall] TestSize: [1x1 tall] IsCustom: 0
Return the result of CV1.test to memory.
testIdx1 = gather(CV1.test); Evaluating tall expression using the Local MATLAB Session: - Pass 1 of 1: Completed in 0.15 sec Evaluation completed in 0.21 sec
Find the number of times each class occurs in the test, or holdout, set. accumarray(group(testIdx1),1) % Number of observations per class in the holdout set ans = 2×1 5 51
In the case of the stratified holdout partition, the class ratio in the holdout set and the class ratio in tgroup are the same (1:10).
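As a quick numeric check, you can compute the class ratios of both holdout sets directly from the gathered indices. This is a hedged sketch, not part of the original example.

counts0 = accumarray(group(testIdx0),1);   % nonstratified holdout counts
counts1 = accumarray(group(testIdx1),1);   % stratified holdout counts
[counts0(1)/counts0(2) counts1(1)/counts1(2)]   % class 1 to class 2 ratio in each holdout set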
Find Influential Observations Using Leave-One-Out Partition Create a random partition of data for leave-one-out cross-validation. Compute and compare training set means. A repetition with a significantly different mean suggests the presence of an influential observation. Create a data set X that contains one value that is much greater than the others. X = [1 2 3 4 5 6 7 8 9 20]';
Create a cvpartition object that has 10 observations and 10 repetitions of training and test data. For each repetition, cvpartition selects one observation to remove from the training set and reserve for the test set.

c = cvpartition(10,'Leaveout')

c = 
Leave-one-out cross validation partition
   NumObservations: 10
       NumTestSets: 10
         TrainSize: 9 9 9 9 9 9 9 9 9 9
          TestSize: 1 1 1 1 1 1 1 1 1 1
          IsCustom: 0
Apply the leave-one-out partition to X, and take the mean of the training observations for each repetition by using crossval.

values = crossval(@(Xtrain,Xtest)mean(Xtrain),X,'Partition',c)

values = 10×1

    6.5556
    6.4444
    7.0000
    6.3333
    6.6667
    7.1111
    6.8889
    6.7778
    6.2222
    5.0000
View the distribution of the training set means using a box chart (or box plot). The plot displays one outlier. boxchart(values)
Find the repetition corresponding to the outlier value. For that repetition, find the observation in the test set. [~,repetitionIdx] = min(values) repetitionIdx = 10 observationIdx = test(c,repetitionIdx); influentialObservation = X(observationIdx) influentialObservation = 20
Training sets that contain the observation have substantially different means from the mean of the training set without the observation. This significant change in mean suggests that the value of 20 in X is an influential observation.
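A brief, hedged way to quantify the effect (not part of the original example) is to compare the training-set mean from the outlier repetition, which excludes the value 20, with the mean of the full data set.

meanWithoutObservation = values(repetitionIdx)   % training mean with 20 held out (5.0000)
meanWithObservation = mean(X)                    % mean of all 10 values (6.5000)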
Specify Custom Cross-Validation Partition Create a cross-validated regression tree by specifying a custom 4-fold cross-validation partition. Load the carbig data set. Create a table Tbl containing the response variable MPG and the predictor variables Acceleration, Cylinders, and so on. load carbig Tbl = table(Acceleration,Cylinders,Displacement, ... Horsepower,Model_Year,Weight,Origin,MPG);
Remove observations with missing values. Check the size of the table data after the removal of the observations with missing values.

Tbl = rmmissing(Tbl);
dimensions = size(Tbl)

dimensions = 1×2

   392     8
The resulting table contains 392 observations, where 392/4 = 98. Create a custom 4-fold cross-validation partition of the Tbl data. Place the first 98 observations in the first test set, the next 98 observations in the second test set, and so on. testSet = ones(98,1); testIndices = [testSet; 2*testSet; ... 3*testSet; 4*testSet]; c = cvpartition("CustomPartition",testIndices) c = K-fold cross validation partition NumObservations: 392 NumTestSets: 4 TrainSize: 294 294 294 294 TestSize: 98 98 98 98 IsCustom: 1
Train a cross-validated regression tree using the custom partition c. To assess the model performance, compute the cross-validation mean squared error (MSE). cvMdl = fitrtree(Tbl,"MPG","CVPartition",c); cvMSE = kfoldLoss(cvMdl) cvMSE = 21.2223
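As a hedged sanity check on the custom partition (not part of the original example), you can confirm that each of the four test sets contains the intended 98 observations.

foldSizes = arrayfun(@(i) sum(test(c,i)), 1:c.NumTestSets)   % expected: 98 98 98 98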
Tips
• If you specify group as the first input argument to cvpartition, then the function discards rows of observations corresponding to missing values in group.
• If you specify group as the first input argument to cvpartition, then the function implements stratification by default. You can specify "Stratify",false to create a nonstratified random partition.
• You can specify "Stratify",true only when the first input argument to cvpartition is group.
Version History Introduced in R2008a R2023b: Create custom cross-validation partitions The cvpartition function supports the creation of custom cross-validation partitions. Use the CustomPartition name-value argument to specify the test set observations. For example, cvpartition("CustomPartition",testSets) specifies to partition the data based on the test sets in testSets. The IsCustom property of the resulting cvpartition object is set to 1 (true).
Extended Capabilities Tall Arrays Calculate with arrays that have more rows than fit in memory. The cvpartition function supports tall arrays for out-of-memory data with some limitations. • When you use cvpartition with tall arrays, the first input argument must be a grouping variable, tGroup. If you specify a tall scalar as the first input argument, cvpartition gives an error. • cvpartition supports only Holdout cross-validation for tall arrays; for example, c = cvpartition(tGroup,"Holdout",p). By default, cvpartition randomly partitions observations into a training set and a test set with stratification, using the class information in tGroup. The parameter p is a scalar such that 0 < p < 1. • To create nonstratified Holdout partitions, specify the value of the "Stratify" name-value argument as false; for example, c = cvpartition(tGroup,"Holdout",p,"Stratify",false). For more information, see “Tall Arrays for Out-of-Memory Data”.
See Also crossval | repartition | test | training Topics “Grouping Variables” on page 2-11
cvpredict Package: timeseries.forecaster Predict response using cross-validated direct forecasting model
Syntax predictedY = cvpredict(CVMdl)
Description predictedY = cvpredict(CVMdl) predicts the test data response using the cross-validated direct forecasting model CVMdl. For each partition window in CVMdl.Partition and each horizon step in CVMdl.Horizon, the function predicts the response for test observations by using a model trained on training observations. If an observation is in more than one test set, the function returns the prediction for that observation, averaged over all test sets.
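A minimal usage sketch follows. It assumes you already have a cross-validated direct forecasting model CVMdl, for example from calling crossval on a DirectForecaster model, and it only illustrates the calling pattern described above.

predictedY = cvpredict(CVMdl);        % one column per horizon step in CVMdl.Horizon
testIdx = test(CVMdl.Partition,1);    % observations in the first test set
predictedY(testIdx,:)                 % predictions for those observations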
Examples

Evaluate Model Using Expanding Window Cross-Validation

Create a cross-validated direct forecasting model using expanding window cross-validation. To evaluate the performance of the model:
• Compute the mean squared error (MSE) on each test set using the cvloss object function.
• For each test set, compare the true response values to the predicted response values using the cvpredict object function.

Load the sample file TemperatureData.csv, which contains average daily temperature from January 2015 through July 2016. Read the file into a table. Observe the first eight observations in the table.

Tbl = readtable("TemperatureData.csv");
head(Tbl)

    Year       Month       Day    TemperatureF
    ____    ___________    ___    ____________
    2015    {'January'}     1          23
    2015    {'January'}     2          31
    2015    {'January'}     3          25
    2015    {'January'}     4          39
    2015    {'January'}     5          29
    2015    {'January'}     6          12
    2015    {'January'}     7          10
    2015    {'January'}     8           4
Create a datetime variable t that contains the year, month, and day information for each observation in Tbl.
numericMonth = month(datetime(Tbl.Month, ... InputFormat="MMMM")); t = datetime(Tbl.Year,numericMonth,Tbl.Day);
Plot the temperature values in Tbl over time. plot(t,Tbl.TemperatureF) xlabel("Date") ylabel("Temperature in Fahrenheit")
Create a direct forecasting model by using the data in Tbl. Train the model using a bagged ensemble of trees. All three of the predictors (Year, Month, and Day) are leading predictors because their future values are known. To create new predictors by shifting the leading predictor and response variables backward in time, specify the leading predictor lags and the response variable lags.

Mdl = directforecaster(Tbl,"TemperatureF", ...
    Learner="bag", ...
    LeadingPredictors="all",LeadingPredictorLags={0:1,0:1,0:7}, ...
    ResponseLags=1:7)

Mdl = 
  DirectForecaster

                  Horizon: 1
             ResponseLags: [1 2 3 4 5 6 7]
        LeadingPredictors: [1 2 3]
     LeadingPredictorLags: {[0 1]  [0 1]  [0 1 2 3 4 5 6 7]}
             ResponseName: 'TemperatureF'
           PredictorNames: {'Year'  'Month'  'Day'}
    CategoricalPredictors: 2
                 Learners: {[1x1 classreg.learning.regr.CompactRegressionEnsemble]}
                   MaxLag: 7
          NumObservations: 565
Mdl is a DirectForecaster model object. By default, the horizon is one step ahead. That is, Mdl predicts a value that is one step into the future.

Partition the time series data in Tbl using an expanding window cross-validation scheme. Create three training sets and three test sets, where each test set has 100 observations. Note that each observation in Tbl is in at most one test set.

CVPartition = tspartition(size(Mdl.X,1),"ExpandingWindow",3, ...
    TestSize=100)

CVPartition = 
  tspartition

               Type: 'expanding-window'
    NumObservations: 565
        NumTestSets: 3
          TrainSize: [265 365 465]
           TestSize: [100 100 100]
           StepSize: 100
The training sets increase in size from 265 observations in the first window to 465 observations in the third window.

Create a cross-validated direct forecasting model using the partition specified in CVPartition. Inspect the Learners property of the resulting CVMdl object.

CVMdl = crossval(Mdl,CVPartition)

CVMdl = 
  PartitionedDirectForecaster

                Partition: [1x1 tspartition]
                  Horizon: 1
             ResponseLags: [1 2 3 4 5 6 7]
        LeadingPredictors: [1 2 3]
     LeadingPredictorLags: {[0 1]  [0 1]  [0 1 2 3 4 5 6 7]}
             ResponseName: 'TemperatureF'
           PredictorNames: {'Year'  'Month'  'Day'}
    CategoricalPredictors: 2
                 Learners: {3x1 cell}
                   MaxLag: 7
          NumObservations: 565
CVMdl.Learners

ans = 3×1 cell array
    {1x1 timeseries.forecaster.CompactDirectForecaster}
    {1x1 timeseries.forecaster.CompactDirectForecaster}
    {1x1 timeseries.forecaster.CompactDirectForecaster}
CVMdl is a PartitionedDirectForecaster model object. The crossval function trains CVMdl.Learners{1} using the observations in the first training set, CVMdl.Learners{2} using the observations in the second training set, and CVMdl.Learners{3} using the observations in the third training set.

Compute the average test set MSE.

averageMSE = cvloss(CVMdl)

averageMSE = 53.3480
To obtain more information, compute the MSE for each test set. individualMSE = cvloss(CVMdl,Mode="individual") individualMSE = 3×1 44.1352 84.0695 31.8393
The models trained on the first and third training sets seem to perform better than the model trained on the second training set. For each test set observation, predict the temperature value using the corresponding model in CVMdl.Learners.

predictedY = cvpredict(CVMdl);
predictedY(260:end,:)

ans = 306×1 table

    TemperatureF_Step1
    __________________
           NaN
           NaN
           NaN
           NaN
           NaN
           NaN
        50.963
        57.363
         57.04
        60.705
        59.606
        58.302
        58.023
         61.39
        67.229
        61.083
          ⋮
Only the last 300 observations appear in any test set. For observations that do not appear in a test set, the predicted response value is NaN. For each test set, plot the true response values and the predicted response values. tiledlayout(3,1) nexttile idx1 = test(CVPartition,1); plot(t(idx1),Tbl.TemperatureF(idx1)) hold on plot(t(idx1),predictedY.TemperatureF_Step1(idx1)) legend("True Response","Predicted Response", ... Location="eastoutside") xlabel("Date") ylabel("Temperature") title("Test Set 1") hold off nexttile idx2 = test(CVPartition,2); plot(t(idx2),Tbl.TemperatureF(idx2)) hold on plot(t(idx2),predictedY.TemperatureF_Step1(idx2)) legend("True Response","Predicted Response", ... Location="eastoutside") xlabel("Date") ylabel("Temperature") title("Test Set 2") hold off nexttile idx3 = test(CVPartition,3); plot(t(idx3),Tbl.TemperatureF(idx3)) hold on plot(t(idx3),predictedY.TemperatureF_Step1(idx3)) legend("True Response","Predicted Response", ... Location="eastoutside") xlabel("Date") ylabel("Temperature") title("Test Set 3") hold off
Overall, the cross-validated direct forecasting model is able to predict the trend in temperatures. If you are satisfied with the performance of the cross-validated model, you can use the full DirectForecaster model Mdl for forecasting at time steps beyond the available data.
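As a hedged cross-check (not part of the original example), you can recompute the per-fold test-set MSE directly from the cvpredict output and compare the average with the cvloss result above; the two values should be close.

mseByFold = zeros(CVPartition.NumTestSets,1);
for k = 1:CVPartition.NumTestSets
    idx = test(CVPartition,k);   % observations in test set k
    mseByFold(k) = mean((Tbl.TemperatureF(idx) - predictedY.TemperatureF_Step1(idx)).^2);
end
mean(mseByFold)   % compare with averageMSE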
Evaluate Model Using Holdout Validation

Create a partitioned direct forecasting model using holdout validation. To evaluate the performance of the model:
• At each horizon step, compute the root relative squared error (RRSE) on the test set using the cvloss object function.
• At each horizon step, compare the true response values to the predicted response values using the cvpredict object function.

Load the sample file TemperatureData.csv, which contains average daily temperature from January 2015 through July 2016. Read the file into a table. Observe the first eight observations in the table.

Tbl = readtable("TemperatureData.csv");
head(Tbl)

    Year       Month       Day    TemperatureF
    ____    ___________    ___    ____________
    2015    {'January'}     1          23
    2015    {'January'}     2          31
    2015    {'January'}     3          25
    2015    {'January'}     4          39
    2015    {'January'}     5          29
    2015    {'January'}     6          12
    2015    {'January'}     7          10
    2015    {'January'}     8           4
Create a datetime variable t that contains the year, month, and day information for each observation in Tbl. numericMonth = month(datetime(Tbl.Month, ... InputFormat="MMMM")); t = datetime(Tbl.Year,numericMonth,Tbl.Day);
Plot the temperature values in Tbl over time. plot(t,Tbl.TemperatureF) xlabel("Date") ylabel("Temperature in Fahrenheit")
Create a direct forecasting model by using the data in Tbl. Specify the horizon steps as one, two, and three steps ahead. Train a model at each horizon using a bagged ensemble of trees. All three of the predictors (Year, Month, and Day) are leading predictors because their future values are known. To create new predictors by shifting the leading predictor and response variables backward in time, specify the leading predictor lags and the response variable lags.
rng("default") Mdl = directforecaster(Tbl,"TemperatureF", ... Horizon=1:3,Learner="bag", ... LeadingPredictors="all",LeadingPredictorLags={0:1,0:1,0:7}, ... ResponseLags=1:7) Mdl = DirectForecaster Horizon: ResponseLags: LeadingPredictors: LeadingPredictorLags: ResponseName: PredictorNames: CategoricalPredictors: Learners: MaxLag: NumObservations:
[1 2 3] [1 2 3 4 5 6 7] [1 2 3] {[0 1] [0 1] [0 1 2 3 4 5 6 7]} 'TemperatureF' {'Year' 'Month' 'Day'} 2 {3x1 cell} 7 565
Mdl is a DirectForecaster model object. Mdl consists of three regression models: Mdl.Learners{1}, which predicts one step ahead; Mdl.Learners{2}, which predicts two steps ahead; and Mdl.Learners{3}, which predicts three steps ahead.

Partition the time series data in Tbl using a holdout validation scheme. Reserve 20% of the observations for testing.

holdoutPartition = tspartition(size(Mdl.X,1),"Holdout",0.20)

holdoutPartition = 
  tspartition

               Type: 'holdout'
    NumObservations: 565
        NumTestSets: 1
          TrainSize: 452
           TestSize: 113
The test set consists of the latest 113 observations.

Create a partitioned direct forecasting model using the partition specified in holdoutPartition.

holdoutMdl = crossval(Mdl,holdoutPartition)

holdoutMdl = 
  PartitionedDirectForecaster

                Partition: [1x1 tspartition]
                  Horizon: [1 2 3]
             ResponseLags: [1 2 3 4 5 6 7]
        LeadingPredictors: [1 2 3]
     LeadingPredictorLags: {[0 1]  [0 1]  [0 1 2 3 4 5 6 7]}
             ResponseName: 'TemperatureF'
           PredictorNames: {'Year'  'Month'  'Day'}
    CategoricalPredictors: 2
                 Learners: {[1x1 timeseries.forecaster.CompactDirectForecaster]}
                   MaxLag: 7
          NumObservations: 565
holdoutMdl is a PartitionedDirectForecaster model object. Because holdoutMdl uses holdout validation rather than a cross-validation scheme, the Learners property of the object contains one CompactDirectForecaster model only. Like Mdl, holdoutMdl contains three regression models. The crossval function trains holdoutMdl.Learners{1}.Learners{1}, holdoutMdl.Learners{1}.Learners{2}, and holdoutMdl.Learners{1}.Learners{3} using the same training data. However, the three models use different response variables because each model predicts values for a different horizon step. holdoutMdl.Learners{1}.Learners{1}.ResponseName ans = 'TemperatureF_Step1' holdoutMdl.Learners{1}.Learners{2}.ResponseName ans = 'TemperatureF_Step2' holdoutMdl.Learners{1}.Learners{3}.ResponseName ans = 'TemperatureF_Step3'
Compute the root relative squared error (RRSE) on the test data at each horizon step. Use the helper function computeRRSE on page 35-1448 (shown at the end of this example). The RRSE indicates how well a model performs relative to the simple model, which always predicts the average of the true values. In particular, when the RRSE is less than 1, the model performs better than the simple model.

holdoutRRSE = cvloss(holdoutMdl,LossFun=@computeRRSE)

holdoutRRSE = 1×3

    0.4797    0.5889    0.6103
At each horizon, the direct forecasting model seems to perform better than the simple model. For each test set observation, predict the temperature value using the corresponding model in holdoutMdl.Learners.

predictedY = cvpredict(holdoutMdl);
predictedY(450:end,:)

ans = 116×3 table

    TemperatureF_Step1    TemperatureF_Step2    TemperatureF_Step3
    __________________    __________________    __________________
           NaN                   NaN                   NaN
           NaN                   NaN                   NaN
           NaN                   NaN                   NaN
            ⋮                     ⋮                     ⋮
          41.063                39.758                41.234
          33.721                36.507                37.719
          36.987                35.133                37.719
          38.644                34.598                36.444
          38.917                34.576                36.275
          45.888                37.005                 38.34
          48.516                42.762                 41.05
          44.882                46.816                43.881
          35.057                45.301                47.048
            31.1                41.473                42.948
          31.817                37.314                42.946
          33.166                38.419                  41.3
          40.279                38.432                40.533
Recall that only the latest 113 observations appear in the test set. For observations that do not appear in the test set, the predicted response value is NaN. For each test set, plot the true response values and the predicted response values. tiledlayout(3,1) idx = test(holdoutPartition); nexttile plot(t(idx),Tbl.TemperatureF(idx)) hold on plot(t(idx),predictedY.TemperatureF_Step1(idx)) legend("True Response","Predicted Response", ... Location="eastoutside") xlabel("Date") ylabel("Temperature") title("Horizon 1") hold off nexttile plot(t(idx),Tbl.TemperatureF(idx)) hold on plot(t(idx),predictedY.TemperatureF_Step2(idx)) legend("True Response","Predicted Response", ... Location="eastoutside") xlabel("Date") ylabel("Temperature") title("Horizon 2") hold off nexttile plot(t(idx),Tbl.TemperatureF(idx)) hold on plot(t(idx),predictedY.TemperatureF_Step3(idx)) legend("True Response","Predicted Response", ... Location="eastoutside") xlabel("Date") ylabel("Temperature") title("Horizon 3") hold off
Overall, holdoutMdl is able to predict the trend in temperatures, although it seems to perform best when forecasting one step ahead. If you are satisfied with the performance of the partitioned model, you can use the full DirectForecaster model Mdl for forecasting at time steps beyond the available data.

Helper Function

The helper function computeRRSE computes the RRSE given the true response variable trueY and the predicted values predY. This code creates the computeRRSE helper function.

function rrse = computeRRSE(trueY,predY)
    error = trueY(:) - predY(:);
    meanY = mean(trueY(:),"omitnan");
    rrse = sqrt(sum(error.^2,"omitnan")/sum((trueY(:) - meanY).^2,"omitnan"));
end
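As a hedged follow-up (not part of the original example), you can apply the same helper function to a single horizon step of the cvpredict output and compare the value with the corresponding entry of holdoutRRSE.

idx = test(holdoutPartition);   % test-set observations
rrseStep1 = computeRRSE(Tbl.TemperatureF(idx),predictedY.TemperatureF_Step1(idx))   % compare with holdoutRRSE(1)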
Input Arguments CVMdl — Cross-validated direct forecasting model PartitionedDirectForecaster model object Cross-validated direct forecasting model, specified as a PartitionedDirectForecaster model object.
Output Arguments predictedY — Predicted responses numeric matrix | table | timetable Predicted responses, returned as a numeric matrix, table, or timetable. predictedY has the same data type as CVMdl.Y and is of size n-by-h, where n is the number of observations (CVMdl.NumObservations) and h is the number of horizon steps (that is, the number of elements in CVMdl.Horizon). predictedY contains NaN values for observations that are not included in any test set. To identify the observations in test set i, you can use test(CVMdl.Partition,i).
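A short, hedged sketch of this convention (not from the original page): drop the rows that are not in any test set before further processing. It assumes predictedY is a table with numeric variables.

inAnyTestSet = ~isnan(predictedY{:,1});          % rows with a prediction at the first horizon step
predictedTested = predictedY(inAnyTestSet,:);    % keep only predicted test-set observations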
Version History Introduced in R2023b
See Also PartitionedDirectForecaster | cvloss | DirectForecaster
cvshrink Cross-validate regularization of linear discriminant
Syntax err = cvshrink(mdl) [err,gamma] = cvshrink(mdl) [err,gamma,delta] = cvshrink(mdl) [err,gamma,delta,numpred] = cvshrink(mdl) [ ___ ] = cvshrink(mdl,Name=Value)
Description

err = cvshrink(mdl) returns a vector of cross-validated classification error values for differing values of the regularization parameter gamma.

[err,gamma] = cvshrink(mdl) also returns the vector of gamma values.

[err,gamma,delta] = cvshrink(mdl) also returns the vector of delta values.

[err,gamma,delta,numpred] = cvshrink(mdl) returns the vector of number of nonzero predictors for each setting of the parameters gamma and delta.

[ ___ ] = cvshrink(mdl,Name=Value) specifies additional options using one or more name-value arguments. For example, you can specify the number of delta and gamma intervals for cross-validation, and the verbosity level of progress messages.
Examples Regularize Data with Many Predictors Regularize a discriminant analysis classifier, and view the tradeoff between the number of predictors in the model and the classification accuracy. Create a linear discriminant analysis classifier for the ovariancancer data. Set the SaveMemory and FillCoeffs options to keep the resulting model reasonably small. load ovariancancer obj = fitcdiscr(obs,grp,... 'SaveMemory','on','FillCoeffs','off');
Use 10 levels of Gamma and 10 levels of Delta to search for good parameters. This search is time-consuming. Set Verbose to 1 to view the progress.

rng('default') % for reproducibility
[err,gamma,delta,numpred] = cvshrink(obj,...
    'NumGamma',9,'NumDelta',9,'Verbose',1);

Done building cross-validated model.
Processing Gamma step 1 out of 10.
Processing Gamma step 2 out of 10.
Processing Gamma step 3 out of 10.
Processing Gamma step 4 out of 10.
Processing Gamma step 5 out of 10.
Processing Gamma step 6 out of 10.
Processing Gamma step 7 out of 10.
Processing Gamma step 8 out of 10.
Processing Gamma step 9 out of 10.
Processing Gamma step 10 out of 10.
Plot the classification error rate against the number of predictors. plot(err,numpred,'k.') xlabel('Error rate'); ylabel('Number of predictors');
Input Arguments mdl — Trained discriminant analysis classifier ClassificationDiscriminant model object Trained discriminant analysis classifier, specified as a ClassificationDiscriminant model object, trained with fitcdiscr.
Name-Value Pair Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: [err,gamma,delta,numpred] = cvshrink(mdl,NumGamma=9,NumDelta=9,Verbose=1);

delta — Delta values for cross-validation
0 (default) | numeric row vector | numeric matrix
Delta values for cross-validation, specified as a numeric scalar, row vector, or matrix.
• Scalar delta — cvshrink uses this value of delta with every value of gamma for regularization.
• Row vector delta — For each i and j, cvshrink uses delta(j) with gamma(i) for regularization.
• Matrix delta — The number of rows of delta must equal the number of elements in gamma. For each i and j, cvshrink uses delta(i,j) with gamma(i) for regularization.
Example: delta=[0 .01 .1]
Data Types: double

gamma — Gamma values for cross-validation
0:0.1:1 (default) | numeric vector
Gamma values for cross-validation, specified as a numeric vector.
Example: gamma=[0 .01 .1]
Data Types: double

NumDelta — Number of delta intervals for cross-validation
0 (default) | nonnegative integer
Number of delta intervals for cross-validation, specified as a nonnegative integer. For every value of gamma, cvshrink cross-validates the discriminant using NumDelta + 1 values of delta, uniformly spaced from zero to the maximal delta at which all predictors are eliminated for this value of gamma. If you set delta, cvshrink ignores NumDelta.
Example: NumDelta=3
Data Types: double

NumGamma — Number of gamma intervals for cross-validation
10 (default) | nonnegative integer
Number of gamma intervals for cross-validation, specified as a nonnegative integer. cvshrink cross-validates the discriminant using NumGamma + 1 values of gamma, uniformly spaced from MinGamma to 1. If you set gamma, cvshrink ignores NumGamma.
Example: NumGamma=3
Data Types: double
Verbose — Verbosity level 0 (default) | 1 | 2 Verbosity level, specified as 0, 1, or 2. Higher values give more progress messages. Example: Verbose=2 Data Types: double
Output Arguments err — Misclassification error rate numeric vector | numeric matrix Misclassification error rate, returned as a numeric vector or matrix of errors. The misclassification error rate is the average fraction of misclassified data over all folds. • If delta is a scalar (default), err(i) is the misclassification error rate for mdl regularized with gamma(i). • If delta is a vector, err(i,j) is the misclassification error rate for mdl regularized with gamma(i) and delta(j). • If delta is a matrix, err(i,j) is the misclassification error rate for mdl regularized with gamma(i) and delta(i,j). gamma — Gamma values used for regularization numeric vector Gamma values used for regularization, returned as a numeric vector. See “Gamma and Delta” on page 35-1454. delta — Delta values used for regularization numeric vector | numeric matrix Delta values used for regularization, returned as a numeric vector or matrix. See “Gamma and Delta” on page 35-1454. • If you specify a scalar for the delta name-value argument, the output delta is a row vector the same size as gamma, with entries equal to the input scalar. • If you specify a row vector for the delta name-value argument, the output delta is a matrix with the same number of columns as the row vector, and with the number of rows equal to the number of elements of gamma. The output delta(i,j) is equal to the input delta(j). • If you specify a matrix for the delta name-value argument, the output delta is the same as the input matrix. The number of rows of delta must equal the number of elements in gamma. numpred — Number of predictors in model at various regularizations numeric vector | numeric matrix Number of predictors in the model at various regularizations, returned as a numeric vector or matrix. numpred has the same size as err. • If delta is a scalar (default), numpred(i) is the number of predictors for mdl regularized with gamma(i) and delta. • If delta is a vector, numpred(i,j) is the number of predictors for mdl regularized with gamma(i) and delta(j). 35-1453
• If delta is a matrix, numpred(i,j) is the number of predictors for mdl regularized with gamma(i) and delta(i,j).
More About

Gamma and Delta

Regularization is the process of finding a small set of predictors that yield an effective predictive model. For linear discriminant analysis, there are two parameters, γ and δ, that control regularization as follows. cvshrink helps you select appropriate values of the parameters.

Let Σ represent the covariance matrix of the data X, and let X̄ be the centered data (the data X minus the mean by class). Define

    D = diag(X̄^T * X̄).

The regularized covariance matrix Σ̃ is

    Σ̃ = (1 − γ)Σ + γD.

Whenever γ ≥ MinGamma, Σ̃ is nonsingular.

Let μk be the mean vector for those elements of X in class k, and let μ0 be the global mean vector (the mean of the rows of X). Let C be the correlation matrix of the data X, and let C̃ be the regularized correlation matrix:

    C̃ = (1 − γ)C + γI,

where I is the identity matrix.

The linear term in the regularized discriminant analysis classifier for a data point x is

    (x − μ0)^T Σ̃^(−1) (μk − μ0) = [(x − μ0)^T D^(−1/2)] [C̃^(−1) D^(−1/2) (μk − μ0)].

The parameter δ enters into this equation as a threshold on the final term in square brackets. Each component of the vector C̃^(−1) D^(−1/2) (μk − μ0) is set to zero if it is smaller in magnitude than the threshold δ. Therefore, for class k, if component j is thresholded to zero, component j of x does not enter into the evaluation of the posterior probability.

The DeltaPredictor property is a vector related to this threshold. When δ ≥ DeltaPredictor(i), all classes k have

    |component i of C̃^(−1) D^(−1/2) (μk − μ0)| ≤ δ.
Therefore, when δ ≥ DeltaPredictor(i), the regularized classifier does not use predictor i.
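The thresholding rule can be illustrated with a small numeric sketch; the vector v and the value of δ below are hypothetical and chosen only for illustration.

v = [0.40 -0.02 0.15];      % hypothetical components of the bracketed vector for one class
delta = 0.1;                % hypothetical threshold
vThresholded = v .* (abs(v) >= delta)   % components smaller in magnitude than delta become 0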
Tips
• Examine the err and numpred outputs to see the tradeoff between the cross-validated error and the number of predictors. When you find a satisfactory point, set the corresponding gamma and delta properties in the model using dot notation. For example, if (i,j) is the location of the satisfactory point, set:

mdl.Gamma = gamma(i);
mdl.Delta = delta(i,j);
Version History Introduced in R2012b
See Also Classes ClassificationDiscriminant Functions fitcdiscr Topics “Regularize Discriminant Analysis Classifier” on page 21-21 “Discriminant Analysis Classification” on page 21-2
cvshrink Cross-validate shrinking (pruning) ensemble
Syntax vals = cvshrink(ens) [vals,nlearn] = cvshrink(ens) [ ___ ] = cvshrink(ens,Name=Value)
Description vals = cvshrink(ens) returns an L-by-T matrix with cross-validated values of the mean squared error. L is the number of Lambda values in the ens.Regularization structure. T is the number of Threshold values on weak learner weights. If ens does not have a Regularization property containing values specified by the regularize function, set the Lambda name-value argument. [vals,nlearn] = cvshrink(ens) additionally returns an L-by-T matrix of the mean number of learners in the cross-validated ensemble. [ ___ ] = cvshrink(ens,Name=Value) specifies additional options using one or more name-value arguments. For example, you can specify the number of folds to use, the fraction of data to use for holdout validation, and lower cutoffs on weights for weak learners.
Input Arguments ens — Regression ensemble model RegressionEnsemble model object Regression ensemble model, specified as a RegressionEnsemble model object trained with fitrensemble. Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: cvshrink(ens,Holdout=0.1,Threshold=[0 .01 .1]) specifies to reserve 10% of the data for holdout validation, and weight cutoffs of 0, 0.01, and 1 for the first, second, and third weak learners, respectively. CVPartition — Cross-validation partition [] (default) | cvpartition object Cross-validation partition, specified as a cvpartition object that specifies the type of crossvalidation and the indexing for the training and validation sets. To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. 35-1456
Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,KFold=5). Then, you can specify the cross-validation partition by setting CVPartition=cvp.

Holdout — Fraction of data for holdout validation
scalar value in the range (0,1)
Fraction of the data used for holdout validation, specified as a scalar value in the range [0,1]. If you specify Holdout=p, then the software completes these steps:
1. Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.
2. Store the compact trained model in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.
Example: Holdout=0.1
Data Types: double | single

KFold — Number of folds
10 (default) | positive integer value greater than 1
Number of folds to use in the cross-validated model, specified as a positive integer value greater than 1. If you specify KFold=k, then the software completes these steps:
1. Randomly partition the data into k sets.
2. For each set, reserve the set as validation data, and train the model using the other k – 1 sets.
3. Store the k compact trained models in a k-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: KFold=5 Data Types: single | double Lambda — Regularization parameter values "[]" (default) | vector of nonnegative scalar values Regularization parameter values for lasso, specified as a vector of nonnegative scalar values. If the value of this argument is empty, cvshrink does not perform cross-validation. Example: Lambda=[.01 .1 1] Data Types: single | double Leaveout — Leave-one-out cross-validation flag "off" (default) | "on" Leave-one-out cross-validation flag, specified as "on" or "off". If you specify Leaveout="on", then for each of the n observations (where n is the number of observations, excluding missing observations, specified in the NumObservations property of the model), the software completes these steps: 35-1457
1. Reserve the one observation as validation data, and train the model using the other n – 1 observations.
2. Store the n compact trained models in an n-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: Leaveout="on" Data Types: char | string Threshold — Weights threshold 0 (default) | numeric vector Weights threshold, specified as a numeric vector with lower cutoffs on weights for weak learners. cvshrink discards learners with weights below Threshold in its cross-validation calculation. Example: Threshold=[0 .01 .1] Data Types: single | double
Output Arguments vals — Cross-validated values of mean squared error numeric matrix Cross-validated values of the mean squared error, returned as an L-by-T numeric matrix. L is the number of values of the regularization parameter Lambda, and T is the number of Threshold values on weak learner weights. nlearn — Mean number of learners numeric matrix Mean number of learners in the cross-validated ensemble, returned as an L-by-T numeric matrix. L is the number of values of the regularization parameter Lambda, and T is the number of Threshold values on weak learner weights.
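As a hedged sketch (not part of this reference page), assuming vals comes from a previous cvshrink call with known Lambda and Threshold vectors, you can locate the grid point with the smallest cross-validated MSE.

[minMSE,linearIdx] = min(vals(:));
[iLambda,jThreshold] = ind2sub(size(vals),linearIdx)   % row = Lambda index, column = Threshold index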
Examples Cross-Validate Regression Ensemble Create a regression ensemble for predicting mileage from the carsmall data. Cross-validate the ensemble. Load the carsmall data set and select displacement, horsepower, and vehicle weight as predictors. load carsmall X = [Displacement Horsepower Weight];
You can train an ensemble of bagged regression trees.

ens = fitrensemble(X,MPG,Method="Bag")
fitrensemble uses a default template tree object templateTree() as a weak learner when 'Method' is 'Bag'. In this example, for reproducibility, specify 'Reproducible',true when you create a tree template object, and then use the object as a weak learner.

rng('default') % For reproducibility
t = templateTree(Reproducible=true); % For reproducibility of random predictor selections
ens = fitrensemble(X,MPG,Method="Bag",Learners=t);
Specify values for Lambda and Threshold. Use these values to cross-validate the ensemble.

[vals,nlearn] = cvshrink(ens,Lambda=[.01 .1 1],Threshold=[0 .01 .1])

vals = 3×3

   18.9150   19.0092  128.5935
   18.9099   18.9504  128.8449
   19.0328   18.9636  116.8500

nlearn = 3×3

   13.7000   11.6000    4.1000
   13.7000   11.7000    4.1000
   13.9000   11.6000    4.1000
Clearly, setting a threshold of 0.1 leads to unacceptable errors, while a threshold of 0.01 gives similar errors to a threshold of 0. The mean number of learners with a threshold of 0.01 is about 11.6, whereas the mean number is about 13.8 when the threshold is 0.
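A possible next step, sketched here under the assumption that the shrink function accepts 'lambda' and 'threshold' name-value arguments as described in its own reference page, is to prune the ensemble with the chosen settings.

cmp = shrink(ens,'lambda',0.1,'threshold',0.01);   % assumed argument names; verify against the shrink page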
Version History Introduced in R2011a
Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
See Also regularize | shrink | RegressionEnsemble | fitrensemble
datasample Randomly sample from data, with or without replacement
Syntax y = datasample(data,k) y = datasample(data,k,dim) y = datasample( ___ ,Name,Value) y = datasample(s, ___ ) [y,idx] = datasample( ___ )
Description y = datasample(data,k) returns k observations sampled uniformly at random, with replacement, from the data in data. y = datasample(data,k,dim) returns a sample taken along dimension dim of data. y = datasample( ___ ,Name,Value) returns a sample for any of the input arguments in the previous syntaxes, with additional options specified by one or more name-value pair arguments. For example, 'Replace',false specifies sampling without replacement. y = datasample(s, ___ ) uses the random number stream s to generate random numbers. The option s can precede any of the input arguments in the previous syntaxes. [y,idx] = datasample( ___ ) also returns an index vector indicating which values datasample sampled from data using any of the input arguments in the previous syntaxes.
Examples Sample Unique Values from Vector Create the random number stream for reproducibility. s = RandStream('mlfg6331_64');
Draw five unique values from the integers 1 to 10.

y = datasample(s,1:10,5,'Replace',false)

y = 1×5

     9     8     3     6     2
Generate Random Characters for Specified Probabilities Create the random number stream for reproducibility. 35-1460
s = RandStream('mlfg6331_64');
Generate 48 random characters from the sequence ACGT per specified probabilities. seq = datasample(s,'ACGT',48,'Weights',[0.15 0.35 0.35 0.15]) seq = 'GGCGGCGCAAGGCGCCGGACCTGGCTGCACGCCGTTCCCTGCTACTCG'
Select Random Subset of Matrix Columns Set the random seed for reproducibility of the results. rng(10,'twister')
Generate a matrix with 10 rows and 1000 columns. X = randn(10,1000);
Create the random number stream for reproducibility within datasample. s = RandStream('mlfg6331_64');
Randomly select five unique columns from X.

Y = datasample(s,X,5,2,'Replace',false)

Y = 10×5

    0.4317   -0.3327    0.9112   -2.3244    0.9559
    0.6977   -0.7422    0.4578   -1.3745   -0.8634
   -0.8543   -0.3105    0.9836   -0.6434   -0.4457
    0.1686    0.6609   -0.0553   -0.1202   -1.3699
   -1.7649   -1.1607   -0.3513   -1.5533    0.0597
   -0.3821    0.5696   -1.6264   -0.2104   -1.5486
   -1.6844    0.7148   -0.6876   -0.4447   -1.4615
   -0.4170    1.3696    1.1874   -0.9901    0.5875
   -0.2410    1.4703   -2.5003   -1.1321   -1.8451
    0.6212    1.4118   -0.4518    0.8697    0.8093
Create a Bootstrap Replicate Data Set Resample observations from a dataset array to create a bootstrap replicate data set. See “Bootstrap Resampling” on page 3-10 for more information about bootstrapping. Load the sample data set. load hospital
Create a data set that has the same size as the hospital data set and contains random samples chosen with replacement from the hospital data set. y = datasample(hospital,size(hospital,1));
Sample in Parallel from Two Data Vectors Select samples from data based on indices of a sample chosen from another vector. Generate two random vectors. x1 = randn(100,1); x2 = randn(100,1);
Select a sample of 10 elements from vector x1, and return the indices of the sample in vector idx. [y1,idx] = datasample(x1,10);
Select a sample of 10 elements from vector x2 using the indices in vector idx. y2 = x2(idx);
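One more hedged sketch (not part of the original examples): weighted sampling without replacement, which combines the 'Weights' and 'Replace' options described later on this page. The data and weights below are made up for illustration.

data = 1:5;
w = [0.1 0.1 0.1 0.1 0.6];                       % hypothetical sampling weights
yw = datasample(data,3,'Replace',false,'Weights',w)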
Input Arguments data — Input data vector | matrix | multidimensional array | table | dataset array Input data from which to sample, specified as a vector, matrix, multidimensional array, table, or dataset array. By default, datasample samples from the first nonsingleton dimension of data. For example, if data is a matrix, then datasample samples from the rows. Change this behavior with the dim input argument. Data Types: single | double | logical | char | string | table k — Number of samples positive integer Number of samples, specified as a positive integer. Example: datasample(data,100) returns 100 observations sampled uniformly and at random from the data in data. Data Types: single | double dim — Dimension to sample 1 (default) | positive integer Dimension to sample, specified as a positive integer. For example, if data is a matrix and dim is 2, y contains a selection of columns in data. If data is a table or dataset array and dim is 2, y contains a selection of variables in data. Use dim to ensure sampling along a specific dimension regardless of whether data is a vector, matrix, or N-dimensional array. Data Types: single | double s — Random number stream global stream (default) | RandStream Random number stream, specified as the global stream or RandStream. For example, s = RandStream('mlfg6331_64') creates a random number stream that uses the multiplicative lagged 35-1462
Fibonacci generator algorithm. For details, see “Creating and Controlling a Random Number Stream”. The rng function provides a simple way to control the global stream. For example, rng(seed) seeds the random number generator using the nonnegative integer seed. For details, see “Managing the Global Stream Using RandStream”. Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: 'Replace',false,'Weights',ones(datasize,1) samples without replacement and with probability proportional to the elements of Weights, where datasize is the size of the dimension being sampled. Replace — Indicator for sampling with replacement true (default) | false Indicator for sampling with replacement, specified as the comma-separated pair consisting of 'Replace' and either true or false. Sample with replacement if 'Replace' is true, or without replacement if 'Replace' is false. If 'Replace' is false, then k must not be larger than the size of the dimension being sampled. For example, if data = [1 3 Inf; 2 4 5] and y = datasample(data,k,'Replace',false), then k cannot be larger than 2. Data Types: logical Weights — Sampling weights ones(datasize,1) (default) | vector of nonnegative numeric values Sampling weights, specified as the comma-separated pair consisting of 'Weights' and a vector of nonnegative numeric values. The vector is of size datasize, where datasize is the size of the dimension being sampled. The vector must have at least one positive value and cannot contain NaN values. The datasample function samples with probability proportional to the elements of 'Weights'. Example: 'Weights',[0.1 0.5 0.35 0.46] Data Types: single | double
Output Arguments y — Sample vector | matrix | multidimensional array | table | dataset array Sample, returned as a vector, matrix, multidimensional array, table, or dataset array. • If data is a vector, then y is a vector containing k elements selected from data. • If data is a matrix and dim = 1, then y is a matrix containing k rows selected from data. Or, if dim = 2, then y is a matrix containing k columns selected from data. 35-1463
35
Functions
• If data is an N-dimensional array and dim = 1, then y is an N-dimensional array of samples taken along the first nonsingleton dimension of data. Or, if you specify a value for the dim name-value pair argument, datasample samples along the dimension dim. • If data is a table and dim = 1, then y is a table containing k rows selected from data. Or, if dim = 2, then y is a table containing k variables selected from data. • If data is a dataset array and dim = 1, then y is a dataset array containing k rows selected from data. Or, if dim = 2, then y is a dataset array containing k variables selected from data. If the input data contains missing observations that are represented as NaN values, datasample samples from the entire input, including the NaN values. For example, y = datasample([NaN 6 14],2) can return y = NaN 14. When the sample is taken with replacement (default), y can contain repeated observations from data. Set the Replace name-value pair argument to false to sample without replacement. idx — Indices vector Indices, returned as a vector indicating which elements datasample chooses from data to create y. For example: • If data is a vector, then y = data(idx). • If data is a matrix and dim = 1, then y = data(idx,:). • If data is a matrix and dim = 2, then y = data(:,idx).
Tips • To sample random integers with replacement from a range, use randi. • To sample random integers without replacement, use randperm or datasample. • To randomly sample from data, with or without replacement, use datasample.
Algorithms datasample uses randperm, rand, or randi to generate random values. Therefore, datasample changes the state of the MATLAB global random number generator. Control the random number generator using rng. For selecting weighted samples without replacement, datasample uses the algorithm of Wong and Easton [1].
Alternative Functionality You can use randi or randperm to generate indices for random sampling with or without replacement, respectively. However, datasample can be more convenient to use because it samples directly from your data. datasample also allows weighted sampling.
Version History Introduced in R2011b 35-1464
datasample
References [1] Wong, C. K. and M. C. Easton. "An Efficient Method for Weighted Sampling Without Replacement." SIAM Journal of Computing 9(1), pp. 111–113, 1980.
Extended Capabilities Tall Arrays Calculate with arrays that have more rows than fit in memory. This function supports tall arrays for out-of-memory data with some limitations. • datasample is useful as a precursor to plotting and fitting a random subset of a large data set. Sampling a large data set preserves trends in the data without requiring the use of all the data points. If the sample is small enough to fit in memory, then you can apply plotting and fitting functions that do not directly support tall arrays. • datasample supports sampling only along the first dimension of the data. • For tall arrays, datasample does not support sampling with replacement. You must specify 'Replace',false, for example, datasample(data,k,'Replace',false). • The value of 'Weights' must be a numeric tall array of the same height as data. • For the syntax [Y,idx] = datasample(___), the output idx is a tall logical vector of the same height as data. The vector indicates whether each data point is included in the sample. • If you specify a random number stream, then the underlying generator must support multiple streams and substreams. If you do not specify a random number stream, then datasample uses the stream controlled by tallrng. For more information, see “Tall Arrays for Out-of-Memory Data”.
See Also rand | randi | randperm | RandStream | rng | tallrng
35
Functions
dataset class (Not Recommended) Arrays for statistical data Note The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.
Description Dataset arrays are used to collect heterogeneous data and metadata including variable and observation names into a single container variable. Dataset arrays are suitable for storing column-oriented or tabular data that are often stored as columns in a text file or in a spreadsheet, and can accommodate variables of different types, sizes, units, etc. Dataset arrays can contain different kinds of variables, including numeric, logical, character, string, categorical, and cell. However, a dataset array is a different class than the variables that it contains. For example, even a dataset array that contains only variables that are double arrays cannot be operated on as if it were itself a double array. However, using dot subscripting, you can operate on a variable in a dataset array as if it were a workspace variable. You can subscript dataset arrays using parentheses much like ordinary numeric arrays, but in addition to numeric and logical indices, you can use variable and observation names as indices.
Construction Use the dataset constructor to create a dataset array from variables in the MATLAB workspace. You can also create a dataset array by reading data from a text or spreadsheet file. You can access each variable in a dataset array much like fields in a structure, using dot subscripting. See the following section for a list of operations available for dataset arrays. dataset
(Not Recommended) Construct dataset array
dataset class
Methods cat
(Not Recommended) Concatenate dataset arrays
cellstr
(Not Recommended) Create cell array of character vectors from dataset array
dataset2cell
(Not Recommended) Convert dataset array to cell array
dataset2struct
(Not Recommended) Convert dataset array to structure
datasetfun
(Not Recommended) Apply function to dataset array variables
disp
(Not Recommended) Display dataset array
display
(Not Recommended) Display dataset array
double
(Not Recommended) Convert dataset variables to double array
end
(Not Recommended) Last index in indexing expression for dataset array
export
(Not Recommended) Write dataset array to file
get
(Not Recommended) Access dataset array properties
horzcat
(Not Recommended) Horizontal concatenation for dataset arrays
intersect
(Not Recommended) Set intersection for dataset array observations
isempty
(Not Recommended) True for empty dataset array
ismember
(Not Recommended) Dataset array elements that are members of set
ismissing
(Not Recommended) Find dataset array elements with missing values
join
(Not Recommended) Merge dataset array observations
length
(Not Recommended) Length of dataset array
ndims
(Not Recommended) Number of dimensions of dataset array
numel
(Not Recommended) Number of elements in dataset array
replacedata
(Not Recommended) Replace dataset variables
replaceWithMissing (Not Recommended) Insert missing data indicators into a dataset array set
(Not Recommended) Set and display dataset array properties
setdiff
(Not Recommended) Set difference for dataset array observations
setxor
(Not Recommended) Set exclusive or for dataset array observations
single
(Not Recommended) Convert dataset variables to single array
size
(Not Recommended) Size of dataset array
sortrows
(Not Recommended) Sort rows of dataset array
stack
(Not Recommended) Stack dataset array from multiple variables into single variable
subsasgn
(Not Recommended) Subscripted assignment to dataset array
subsref
(Not Recommended) Subscripted reference for dataset array
summary
(Not Recommended) Print summary of dataset array
union
(Not Recommended) Set union for dataset array observations
unique
(Not Recommended) Unique observations in dataset array
unstack
(Not Recommended) Unstack dataset array from single variable into multiple variables
vertcat
(Not Recommended) Vertical concatenation for dataset arrays
35-1467
35
Functions
Properties A dataset array D has properties that store metadata (information about your data). Access or assign to a property using P = D.Properties.PropName or D.Properties.PropName = P, where PropName is one of the following: Description Description is a character vector describing the dataset array. The default is an empty character vector. DimNames A two-element cell array of character vectors giving the names of the two dimensions of the dataset array. The default is {'Observations' 'Variables'}. ObsNames A cell array of nonempty, distinct character vectors giving the names of the observations in the dataset array. This property may be empty, but if not empty, the number of character vectors must equal the number of observations. Units A cell array of character vectors giving the units of the variables in the dataset array. This property may be empty, but if not empty, the number of character vectors must equal the number of variables. Any individual character vector may be empty for a variable that does not have units defined. The default is an empty cell array. UserData Any variable containing additional information to be associated with the dataset array. The default is an empty array. VarDescription A cell array of character vectors giving the descriptions of the variables in the dataset array. This property may be empty, but if not empty, the number of character vectors must equal the number of variables. Any individual character vector may be empty for a variable that does not have a description defined. The default is an empty cell array. VarNames A cell array of nonempty, distinct character vectors giving the names of the variables in the dataset array. The number of character vectors must equal the number of variables. The default is the cell array of names for the variables used to create the data set.
Copy Semantics Value. To learn how this affects your use of the class, see Comparing Handle and Value Classes in the MATLAB Object-Oriented Programming documentation.
35-1468
dataset class
Examples Load a dataset array from a .mat file and create some simple subsets: load hospital h1 = hospital(1:10,:) h2 = hospital(:,{'LastName' 'Age' 'Sex' 'Smoker'}) % Access and modify metadata hospital.Properties.Description hospital.Properties.VarNames{4} = 'Wgt' % Create a new dataset variable from an existing one hospital.AtRisk = hospital.Smoker | (hospital.Age > 40) % Use individual variables to explore the data boxplot(hospital.Age,hospital.Sex) h3 = hospital(hospital.Age50K evaluator.ReferenceGroup ans = 'White' report(evaluator,BiasMetrics="DisparateImpact") ans=5×5 table Metrics _______________ DisparateImpact DisparateImpact DisparateImpact DisparateImpact DisparateImpact
SensitiveAttributeNames _______________________ race race race race race
Groups __________________ Amer-Indian-Eskimo Asian-Pac-Islander Black Other White
Original Model ______________ 0.41702 1.719 0.60571 0.66958 1
35-1619
New Mod _______
0.9280 0.969 0.6662 0.8603
35
Functions
For the mdl predictions, several of the disparate impact values are below the industry standard of 0.8, and one value is above 1.25. These values indicate bias in the predictions with respect to the positive class >50K and the sensitive attribute race. The disparate impact values for the newMdl predictions are closer to 1 than the disparate impact values for the mdl predictions. One value is still below 0.8. Visually compare the disparate impact values by using the bar graph returned by the plot object function. plot(evaluator,"DisparateImpact")
The disparateImpactRemover function seems to have improved the model predictions on the test set with respect to the disparate impact metric. Check whether the transformed predictors negatively affect the accuracy of the model predictions. Compute the accuracy of the test set predictions for the two models mdl and newMdl. accuracy = 1-loss(mdl,adulttest,"salary") accuracy = 0.8024 newAccuracy = 1-loss(newMdl,newadulttest,"salary") newAccuracy = 0.7955
The model trained using the transformed predictors (newMdl) achieves similar test set accuracy compared to the model trained with the original predictors (mdl).
Understand and Visualize Disparate Impact Removal Try to remove the disparate impact of a sensitive attribute by adjusting continuous numeric predictors. Visualize the difference between the original and adjusted predictor values. Suppose you want to create a binary classifier that predicts whether a patient is a smoker based on the patient's diastolic and systolic blood pressure values. Also, you want to remove the disparate impact of the patient's gender on model predictions. Before training the model, you can use disparateImpactRemover to transform the continuous predictor variables in your data set. Load the patients data set, which contains medical information for 100 patients. Convert the Gender and Smoker variables to categorical variables. Specify the descriptive category names Smoker and Nonsmoker rather than 1 and 0. load patients Gender = categorical(Gender); Smoker = categorical(Smoker,logical([1 0]), ... ["Smoker","Nonsmoker"]);
Create a matrix containing the continuous predictors Diastolic and Systolic. X = [Diastolic,Systolic];
Find the observations in the two groups of the sensitive attribute Gender. femaleIdx = Gender=="Female"; maleIdx = Gender=="Male"; femaleX = X(femaleIdx,:); maleX = X(maleIdx,:);
Compute the Diastolic and Systolic quantiles for the two groups in the sensitive attribute. Specify the number of quantiles to be the minimum number of group observations across the groups in the sensitive attribute, provided that the number is smaller than 100.

t = tabulate(Gender);
t = array2table(t,VariableNames=["Value","Count","Percent"])

t=2×3 table
      Value       Count     Percent
    __________    ______    _______

    {'Female'}    {[53]}    {[53]}
    {'Male'  }    {[47]}    {[47]}

numQuantiles = min(100,min(t.Count{:}))

numQuantiles = 47

femaleQuantiles = quantile(femaleX,numQuantiles,1);
maleQuantiles = quantile(maleX,numQuantiles,1);
Compute the median quantiles across the two groups.
Q(:,:,1) = femaleQuantiles; Q(:,:,2) = maleQuantiles; medianQuantiles = median(Q,3);
Plot the results. Show the Diastolic quantiles in the left plot and the Systolic quantiles in the right plot. tiledlayout(1,2) nexttile % Diastolic plot(femaleQuantiles(:,1),1:numQuantiles) hold on plot(maleQuantiles(:,1),1:numQuantiles) plot(medianQuantiles(:,1),1:numQuantiles) hold off xlabel("Diastolic") ylabel("Quantile") legend(["Female","Male","Median"],Location="southeast") nexttile % Systolic plot(femaleQuantiles(:,2),1:numQuantiles) hold on plot(maleQuantiles(:,2),1:numQuantiles) plot(medianQuantiles(:,2),1:numQuantiles) hold off xlabel("Systolic") ylabel("Quantile") legend(["Female","Male","Median"],Location="southeast")
For each predictor, the Female and Male quantiles differ. The disparateImpactRemover function uses the median quantiles to adjust this difference. Transform the Diastolic and Systolic predictors in X by using the Gender sensitive attribute. [remover,newX] = disparateImpactRemover(X,Gender); femaleNewX = newX(femaleIdx,:); maleNewX = newX(maleIdx,:);
Visualize the difference in the Diastolic distributions between the original values in X and the transformed values in newX. Compute and display the probability density estimates by using the ksdensity function. tiledlayout(1,2) nexttile ksdensity(femaleX(:,1)) hold on ksdensity(maleX(:,1)) hold off xlabel("Diastolic") ylabel("Probability Density Estimate") title("Original") legend(["Female","Male"]) ylim([0,0.07]) nexttile ksdensity(femaleNewX{:,1}) hold on ksdensity(maleNewX{:,1}) hold off xlabel("Diastolic") ylabel("Probability Density Estimate") title("Transformed") legend(["Female","Male"]) ylim([0,0.07])
The disparateImpactRemover function transforms the values in the Diastolic predictor variable so that the distribution of Female values and the distribution of Male values are similar. You can now train a binary classifier using the adjusted predictor data. For this example, train a tree classifier.

tree = fitctree(newX,Smoker)

tree =
  ClassificationTree
             PredictorNames: {'x1'  'x2'}
               ResponseName: 'Y'
      CategoricalPredictors: []
                 ClassNames: [Smoker    Nonsmoker]
             ScoreTransform: 'none'
            NumObservations: 100
Note: You must transform new data sets before passing them to the classifier for prediction. Randomly sample 10 observations from X. Transform the values using the remover object and the transform object function. Then, predict the smoker status for the observations. rng("default") % For reproducibility testIdx = randsample(size(X,1),10,1); testX = transform(remover,X(testIdx,:),Gender(testIdx)); label = predict(tree,testX)
label = 10x1 categorical
     Nonsmoker
     Smoker
     Nonsmoker
     Nonsmoker
     Nonsmoker
     Nonsmoker
     Nonsmoker
     Smoker
     Smoker
     Smoker
Specify Different Repair Fractions Specify the extent of the transformation of the continuous numeric predictors with respect to a sensitive attribute. Use the RepairFraction name-value argument of the disparateImpactRemover function. Load the patients data set, which contains medical information for 100 patients. Convert the Gender and Smoker variables to categorical variables. Specify the descriptive category names Smoker and Nonsmoker rather than 1 and 0. load patients Gender = categorical(Gender); Smoker = categorical(Smoker,logical([1 0]), ... ["Smoker","Nonsmoker"]);
Create a matrix containing the continuous predictors Diastolic and Systolic. X = [Diastolic,Systolic];
Find the observations in the two groups of the sensitive attribute Gender. femaleIdx = Gender=="Female"; maleIdx = Gender=="Male"; femaleX = X(femaleIdx,:); maleX = X(maleIdx,:);
Transform the Diastolic and Systolic predictors in X by using the Gender sensitive attribute. Specify a repair fraction of 0.5. Note that a value of 1 indicates a full transformation, and a value of 0 indicates no transformation. [remover,newX50] = disparateImpactRemover(X,Gender, ... RepairFraction=0.5); femaleNewX50 = newX50(femaleIdx,:); maleNewX50 = newX50(maleIdx,:);
Fully transform the predictor variables by using the transform object function of the remover object. newX100 = transform(remover,X,Gender,RepairFraction=1); femaleNewX100 = newX100(femaleIdx,:); maleNewX100 = newX100(maleIdx,:);
Visualize the difference in the Diastolic distributions between the original values in X, the partially repaired values in newX50, and the fully transformed values in newX100. Compute and display the probability density estimates by using the ksdensity function. t = tiledlayout(1,3); title(t,"Diastolic Distributions with Different " + ... "Repair Fractions") xlabel(t,"Diastolic") ylabel(t,"Density Estimate") nexttile ksdensity(femaleX(:,1)) hold on ksdensity(maleX(:,1)) hold off title("Fraction=0") ylim([0,0.07]) nexttile ksdensity(femaleNewX50{:,1}) hold on ksdensity(maleNewX50{:,1}) hold off title("Fraction=0.5") ylim([0,0.07]) nexttile ksdensity(femaleNewX100{:,1}) hold on ksdensity(maleNewX100{:,1}) hold off title("Fraction=1") ylim([0,0.07]) legend(["Female","Male"],Location="eastoutside")
As the repair fraction increases, the disparateImpactRemover function transforms the values in the Diastolic predictor variable so that the distribution of Female values and the distribution of Male values become more similar.
More About
Disparate Impact
For each group in the sensitive attribute, the disparate impact value is the proportion of observations in that group with a positive class value (pg+) divided by the proportion of observations in the reference group with a positive class value (pr+). Ideally, pg+ is close to pr+, that is, the disparate impact value is close to 1. For more information on disparate impact and other bias metrics, see "Bias Metrics" on page 35-1924.
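As a minimal illustration of this definition (a sketch by hand, not the fairnessMetrics implementation), the disparate impact values can be computed from a vector of predicted labels and a sensitive attribute. The variable names yhat and group and the reference group "GroupB" below are hypothetical:

grps = categories(group);                 % group is a categorical sensitive attribute
pPos = zeros(numel(grps),1);
for k = 1:numel(grps)
    pPos(k) = mean(yhat(group == grps{k}) == 1);   % proportion of positive predictions in group k
end
pRef = pPos(strcmp(grps,"GroupB"));       % proportion for the chosen reference group
disparateImpact = pPos/pRef               % values close to 1 indicate little disparity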
Tips
• After using disparateImpactRemover, consider using only continuous and ordinal predictors for model training. Avoid using the sensitive attribute as a separate predictor when training your model. For more information, see [1].
• You must transform new data, such as test data, after training a model using disparateImpactRemover. Otherwise, the predicted results are inaccurate. Use the transform object function.
Algorithms
disparateImpactRemover transforms a continuous predictor in Tbl or X as follows:
1  The software uses the groups in the sensitive attribute to split the predictor values. For each group g, the software computes q quantiles of the predictor values by using the quantile function. The number of quantiles q is either 100 or the minimum number of group observations across the groups in the sensitive attribute, whichever is smaller. The software creates a corresponding binning function Fg using the discretize function and the quantile values as bin edges.
2  The software then finds the median quantile values across all the sensitive attribute groups and forms the associated quantile function Fm^-1. The software omits missing (NaN) values from this calculation.
3  Finally, the software transforms the predictor value x in the sensitive attribute group g by using the transformation λFm^-1(Fg(x)) + (1 – λ)x, where λ is the repair fraction RepairFraction. The software preserves missing (NaN) values in the predictor.
The function stores the transformation, which you can apply to new predictor data. For more information, see [1].
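The following is a minimal sketch of these three steps for a single continuous predictor x (a column vector) and a categorical sensitive attribute g, assuming no missing values and distinct quantile values. It illustrates the algorithm and is not the disparateImpactRemover implementation:

lambda = 1;                                   % repair fraction (RepairFraction)
q = min(100,min(groupcounts(g)));             % number of quantiles
grps = categories(g);
Q = zeros(q,numel(grps));
for k = 1:numel(grps)
    Q(:,k) = quantile(x(g == grps{k}),q,1);   % step 1: per-group quantiles
end
Qm = median(Q,2);                             % step 2: median quantiles across groups
xNew = x;
for k = 1:numel(grps)
    idx = g == grps{k};
    bin = discretize(x(idx),[-Inf; Q(2:end,k); Inf]);   % bin by the group quantiles (Fg)
    xNew(idx) = lambda*Qm(bin) + (1-lambda)*x(idx);     % step 3: move toward the median quantile
end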
Version History Introduced in R2022b
References
[1] Feldman, Michael, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. “Certifying and Removing Disparate Impact.” In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 259–68. Sydney, NSW, Australia: ACM, 2015. https://doi.org/10.1145/2783258.2783311.
See Also transform | fairnessMetrics | fairnessWeights Topics “Introduction to Fairness in Binary Classification” on page 26-2
display Class: dataset (Not Recommended) Display dataset array Note The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.
Syntax display(ds)
Description
display(ds) prints the dataset array ds, including variable names and observation names (if present). The dataset class calls display when you do not use a semicolon to terminate a statement.
For numeric or categorical variables that are 2-D and have three or fewer columns, display prints the actual data. Otherwise, display prints the size and type of each dataset element.
For character variables that are 2-D and 10 or fewer characters wide, display prints quoted text. Otherwise, display prints the size and type of each dataset element.
For cell variables that are 2-D and have three or fewer columns, display prints the contents of each cell (or its size and type if too large). Otherwise, display prints the size of each dataset element.
For time series variables, display prints columns for both the time and the data. If the variable is 2-D and has three or fewer columns, display prints the actual data. Otherwise, display prints the size and type of each dataset element.
For other types of variables, display prints the size and type of each dataset element.
See Also dataset | format
distributionFitter Open Distribution Fitter app
Syntax distributionFitter distributionFitter(y) distributionFitter(y,cens) distributionFitter(y,cens,freq) distributionFitter(y,cens,freq,dsname)
Description This page contains programmatic syntax information for the Distribution Fitter app. For general usage information, see Distribution Fitter. distributionFitter opens the Distribution Fitter app, or brings focus to the app if it is already open. distributionFitter(y) opens the Distribution Fitter app populated with the data specified by the vector y. distributionFitter(y,cens) uses the vector cens to specify whether each observation in y is censored. distributionFitter(y,cens,freq) uses the vector freq to specify the frequency of each element of y. distributionFitter(y,cens,freq,dsname) creates a data set with the name dsname, using the data vector, y, censoring indicator, cens, and frequency vector, freq.
Examples Open Distribution Fitter App with Existing Data Load the carsmall sample data. load carsmall
Open the Distribution Fitter app using the MPG (miles per gallon) data.

distributionFitter(MPG)
The Distribution Fitter app opens, populated with the MPG data, and displays the density (PDF) plot. You can use the app to display different plots and fit distributions to this data.
Open Distribution Fitter App with Censoring Data Load the sample data. load lightbulb.mat
The first column of the data contains the lifetime (in hours) of two types of light bulbs. The second column contains information about the type of light bulb. 1 indicates fluorescent bulbs, and 0 indicates the incandescent bulb. The third column contains censoring information. 1 indicates censored data, and 0 indicates the exact failure time. This is simulated data. Open the Distribution Fitter app using the first column of lightbulb as the input data, and the third column as the censoring data. Name the data lifetime.
distributionFitter(lightbulb(:,1),lightbulb(:,3),[],'lifetime')
To open the Data dialog box, click Data. In the Manage data sets pane, click to highlight the lifetime data set row. Finally, to open the View Data Set dialog box, click View. The lifetime data appears in the second column and the corresponding censoring indicator appears in the third column.
Input Arguments y — Input data array of scalar values | variable representing an array of scalar values Input data, specified as an array of scalar values or a variable representing an array of such values. Data Types: single | double cens — Censoring indicator zeros(n) (default) | vector of 0 and 1 values Censoring indicator, specified as a vector of 0 and 1 values. The length of cens must be equal to the length of y. If y(j) is censored, then (cens(j)==1). If y(j) is not censored, then (cens(j)==0). If cens is omitted or empty, then no y values are censored. If you have frequency data (freq) but not censoring data (cens), then you must specify empty brackets ([]) for cens. Data Types: single | double freq — Frequency data ones(n) (default) | vector of scalar values Frequency data, specified as a vector of scalar values. The length of freq must be equal to the length of y. If freq is omitted or empty, then all y values have a frequency of 1. 35-1633
If you have frequency data (freq) but not censoring data (cens), then you must specify empty brackets ([]) for cens. Data Types: single | double dsname — Data set name character vector | string scalar Data set name, specified as a character vector enclosed in single quotes or a string scalar enclosed in double quotes. If you want to specify a data set name, but do not have censoring data (cens) or frequency data (freq), then you must specify empty brackets ([]) for both freq and cens. Example: 'MyData' Data Types: char | string
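For example, the empty-bracket rules above lead to calls such as the following sketch, in which y and freq are hypothetical data and frequency vectors:

y = randn(100,1) + 5;                   % hypothetical data vector
freq = ones(100,1);                     % hypothetical frequency vector
distributionFitter(y,[],freq)           % frequency data but no censoring data: [] for cens
distributionFitter(y,[],[],'MyData')    % data set name only: [] for both cens and freq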
Version History Introduced before R2006a R2017a: Renamed from dfittool dfittool is now called distributionFitter. The behavior remains the same, and there are no plans to remove support for dfittool.
See Also fitdist | makedist | Distribution Fitter Topics “Fit a Distribution Using the Distribution Fitter App” on page 5-72 “Model Data Using the Distribution Fitter App” on page 5-52 “Working with Probability Distributions” on page 5-3 “Supported Distributions” on page 5-16
Probability Distribution Function Interactive density and distribution plots
Description The Probability Distribution Function user interface creates an interactive plot of the cumulative distribution function (cdf) or probability density function (pdf) for a probability distribution. Explore the effects of changing parameter values on the shape of the plot, either by specifying parameter values or using interactive sliders. Required Products • MATLAB • Statistics and Machine Learning Toolbox Note: disttool does not provide printing, code generating, or data importing functionality in MATLAB Online.
Open the Probability Distribution Function App • At the command prompt, enter disttool.
Examples Explore the Probability Distribution Function User Interface This example shows how to use the Probability Distribution Function user interface to explore the shape of cdf and pdf plots for different probability distributions and parameter values. Open the Probability Distribution Function user interface. disttool
The interface opens with a plot of the cdf of the Normal distribution. The initial parameter settings are Mu = 0 and Sigma = 1. Select PDF from the Function type drop-down menu to plot the pdf of the Normal distribution using the same parameter values.
Change the value of the location parameter Mu to 1.
As the parameter values change, the shape of the plot also changes. Also, the value of X remains the same, but the density value changes because of the new parameter value. Use the Distribution drop-down menu to change the distribution type from Normal to Weibull.
The shape of the plot changes, along with the names and values of the parameters.
Parameters Distribution — Probability distribution Normal (default) | Exponential | Poisson | Weibull | ... Specify the probability distribution to explore by selecting a distribution name from the drop-down list. The drop-down list includes approximately 25 probability distribution options, including Normal, Exponential, Poisson, Weibull, and more. Function type — Probability distribution function type CDF (default) | PDF Specify the probability distribution function type as CDF (cumulative distribution function) or PDF (probability density function) by selecting the function name from the drop-down list. Probability — Cumulative distribution function value numeric value in the range [0,1] 35-1639
Specify the cumulative distribution function (cdf) value of interest as a numeric value in the range [0,1]. The corresponding random variable value appears in the X field below the plot. Alternatively, you can specify a value for X, and the Probability value will update automatically. This option only appears if Function type is CDF. If Function type is PDF, then the probability density at the specified X value displays to the left of the plot. X — Random variable numeric value Specify the random variable of interest as a numeric value. If the Function type is CDF, then the corresponding cumulative distribution function (cdf) value appears in the Probability field to the left of the plot. Alternatively, you can specify a value for Probability, and the X value will update automatically. If the Function type is PDF, then the corresponding probability density value appears to the left of the plot. Parameters — Parameter boundaries and values numeric value Specify the parameter boundaries and values as numeric values. Each column contains a field for the upper bound, value, and lower bound of one parameter. The name and number of available parameters changes based on the distribution specified in the Distribution drop-down list. For example, if you select the Normal distribution, then disttool enables two columns: One column for the Mu parameter and one column for the Sigma parameter. If you select the Exponential distribution, then disttool enables one column for the Mu parameter.
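The link between the X and Probability fields is the cdf of the selected distribution. For example, for the default Normal distribution with Mu = 0 and Sigma = 1, the same relationship can be reproduced at the command line (a sketch using normcdf and norminv):

p = normcdf(1)      % Probability displayed when X = 1
x = norminv(p)      % X recovered from that Probability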
Tips
To change the value of X (on the x-axis), or Probability or Density (on the y-axis):
• Type the values of interest into the Probability or X fields;
• Click on the point of interest on the plot; or
• Click and drag the reference lines across the plot.
Version History Introduced before R2006a
See Also Functions distributionFitter | fitdist | makedist
double Class: dataset (Not Recommended) Convert dataset variables to double array Note The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.
Syntax
b = double(A)
b = double(A,vars)
Description
b = double(A) returns the contents of the dataset A, converted to one double array. The classes of the variables in the dataset must support the conversion.
b = double(A,vars) returns the contents of the dataset variables specified by vars. vars is a positive integer, a vector of positive integers, a character vector, a string array, a cell array of character vectors, or a logical vector.
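For example, with the hospital dataset array included with the toolbox, the conversion might look like the following sketch; the selected variables must be numeric:

load hospital
b = double(hospital,{'Age','Weight'})   % convert only the Age and Weight variables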
See Also dataset | single | replacedata
DriftDiagnostics Diagnostics information for batch drift detection
Description A DriftDiagnostics object stores the diagnostics information returned by the detectdrift function after it performs permutation testing for batch drift detection.
Creation Create a DriftDiagnostics object by using detectdrift to test for drift between baseline and target data sets.
Properties Baseline — Baseline data set numeric array | categorical array | table This property is read-only. Baseline data set, specified as a numeric array, categorical array, or table. Data Types: double | categorical | table CategoricalVariables — Indices of categorical variables in data numeric array | [] This property is read-only. Indices of the categorical variables in the data, specified as a numeric array. If the data does not contain any categorical variables, then this property is empty ([]). Data Types: double ConfidenceIntervals — 95% confidence interval bounds for estimated p-values two-row matrix of positive scalar values from 0 to 1 | NaN This property is read-only. 95% confidence interval bounds for the estimated p-values of the variables, specified as a 2-by-k matrix of positive scalar values from 0 to 1, where k is the number of variables. The rows of ConfidenceIntervals correspond to the lower and upper bounds of the confidence intervals, respectively. If you set EstimatePValues to false in the call to detectdrift, then the function does not compute the confidence interval bounds. In this case, ConfidenceIntervals property contains NaNs. Data Types: double 35-1642
DriftStatus — Drift status for each variable
string array
This property is read-only.
Drift status for each variable, specified as a string array containing the possible values shown in this table.

Drift Status    Condition
Drift           Upper < DriftThreshold
Warning         DriftThreshold < Lower < WarningThreshold or DriftThreshold < Upper < WarningThreshold
Stable          Lower > WarningThreshold
Lower and Upper are the lower and upper confidence interval bounds for an estimated p-value. Data Types: string DriftThreshold — Threshold to determine drift status scalar value from 0 to 1 This property is read-only. Threshold to determine the drift status, specified as a scalar value from 0 to 1. If the upper bound of the confidence interval for the estimated p-value is below DriftThreshold, then the drift status is Drift. Data Types: double Metrics — List of metrics string array This property is read-only. List of the metrics used by detectdrift to quantify the difference between the baseline and target data for each variable during permutation testing, specified as a string array. Data Types: string MetricValues — Metric values for variables row vector This property is read-only. Metric values for the corresponding variables, specified as a row vector with the number of columns equal to the number of variables specified for drift detection. The metric corresponding to each variable is stored in the Metrics property. Data Types: double MultipleTestCorrection — Multiple hypothesis testing correction "Bonferroni" | "FalseDiscoveryRate" This property is read-only. 35-1643
Multiple hypothesis testing correction, specified as either "Bonferroni" or "FalseDiscoveryRate". If you set EstimatePValues to false in the call to detectdrift, do not set the MultipleTestCorrection name-value argument because the function ignores it in this case. Data Types: string MultipleTestDriftStatus — Drift status for overall data "Drift" | "Warning" | "Stable" This property is read-only. Drift status for the overall data estimated by detectdrift using the multiple test correction method in MultipleTestCorrection, specified as "Drift", "Warning", or "Stable". Multiple test corrections provide a conservative estimate of the drift status when multiple variables are tested. If you set EstimatePValues to false in the call to detectdrift, then the function does not populate MultipleTestDriftStatus. Data Types: string NumPermutations — Number of permutation tests performed for each variable array of integer values This property is read-only. Number of permutation tests performed by detectdrift for each variable to determine the drift status for that variable, specified as an array of integer values. If you set EstimatePValues to false in the call to detectdrift, then NumPermutations is a row vector of ones corresponding to the baseline and target data provided. The metric values are the initial computations that use the baseline and target data for each variable. Data Types: double PermutationResults — Permutation testing results for each variable table This property is read-only. Permutation testing results for each variable, specified as a k-by-1 table, where k is the number of variables. Each row corresponds to one variable and contains a 1-by-1 cell array of the metric values in a vector whose size is equal to the number of permutations for that variable. To access the metric values for the second variable, for example, use DDiagnostics.PermutationResults{2,1} {1,1}. If you set EstimatePValues to false in the call to detectdrift, then PermutationResults contains only the initial metric values for each variable. You can visualize the test results using plotPermutationResults. Data Types: table PValues — Estimated p-value for each variable vector of scalar values from 0 to 1 35-1644
This property is read-only. Estimated p-value for each variable, specified as a vector of scalar values from 0 to 1. If you set EstimatePValues to false in the call to detectdrift, then PValues is a vector of NaNs. Data Types: double Target — Target data set numeric array | categorical array | table This property is read-only. Target data set, specified as a numeric array, categorical array, or table. Data Types: single | double | categorical | table VariableNames — Variables specified for drift detection string array This property is read-only. Variables specified for drift detection in the call to detectdrift, specified as a string array. Data Types: string WarningThreshold — Threshold to determine warning status scalar value from 0 to 1 This property is read-only. Threshold to determine the warning status, specified as a scalar value from 0 to 1. Data Types: double
Object Functions
ecdf                      Compute empirical cumulative distribution function (ecdf) for baseline and target data specified for data drift detection
histcounts                Compute histogram bin counts for specified variables in baseline and target data for drift detection
plotDriftStatus           Plot p-values and confidence intervals for variables tested for data drift
plotEmpiricalCDF          Plot empirical cumulative distribution function (ecdf) of a variable specified for data drift detection
plotHistogram             Plot histogram of a variable specified for data drift detection
plotPermutationResults    Plot histogram of permutation results for a variable specified for data drift detection
summary                   Summary table for DriftDiagnostics object
Examples Test and Examine Drift Status Load the sample data.
load humanactivity
For details on the data set, enter Description at the command line. Assign the first 250 observations as baseline data and the next 250 as target data for variables 1 to 15. baseline = feat(1:250,1:15); target = feat(251:500,1:15);
Test for drift on all variables. DDiagnostics = detectdrift(baseline,target);
Display a summary of the test results.

summary(DDiagnostics)

Multiple Test Correction Drift Status: Drift

           DriftStatus    PValue        ConfidenceInterval
           ___________    ______    ________________________

    x1      "Drift"        0.001    2.5317e-05     0.0055589
    x2      "Drift"        0.001    2.5317e-05     0.0055589
    x3      "Drift"        0.001    2.5317e-05     0.0055589
    x4      "Drift"        0.001    2.5317e-05     0.0055589
    x5      "Drift"        0.001    2.5317e-05     0.0055589
    x6      "Drift"        0.001    2.5317e-05     0.0055589
    x7      "Drift"        0.001    2.5317e-05     0.0055589
    x8      "Stable"       0.863       0.84012       0.88372
    x9      "Stable"       0.726       0.69722       0.75344
    x10     "Drift"        0.001    2.5317e-05     0.0055589
    x11     "Stable"       0.496       0.46456       0.52746
    x12     "Stable"       0.249       0.22247       0.27702
    x13     "Drift"        0.001    2.5317e-05     0.0055589
    x14     "Stable"       0.574       0.54267       0.60489
    x15     "Warning"      0.094      0.076629        0.1138
The summary table shows the drift status and estimated p-value for each variable tested for drift detection. You can also see the 95% confidence interval bounds for the p-values. Plot drift status for variables x10 to x15. plotDriftStatus(DDiagnostics,Variables=(10:15))
Compute the ecdf values for variables x13 and x15.

E = ecdf(DDiagnostics,Variables=["x13","x15"])

E=2×3 table
                 x              F_Baseline          F_Target
           ______________    ______________    ______________

    x13    {501×1 double}    {501×1 double}    {501×1 double}
    x15    {501×1 double}    {501×1 double}    {501×1 double}
x contains the common domain over which ecdf computes the empirical cumulative distribution function for the baseline and target data of a variable. Access the common domain for x13.

E.x{1}

ans = 501×1

    0.0420
    0.0420
    0.0423
    0.0424
    0.0424
    0.0425
    0.0425
    0.0426
    0.0426
    0.0426
      ⋮
Access the ecdf values for x15 in the baseline data. E.F_Baseline{2} ans = 501×1 0 0 0.0040 0.0080 0.0080 0.0080 0.0080 0.0080 0.0120 0.0120 ⋮
Plot the ecdf values for variables x13 and x15. tiledlayout(1,2) ax1 = nexttile; plotEmpiricalCDF(DDiagnostics,ax1,Variable="x13") ax2= nexttile; plotEmpiricalCDF(DDiagnostics,ax2,Variable="x15")
You can also visualize the permutation test results for a variable. Plot the permutation results for variable x13. figure plotPermutationResults(DDiagnostics,Variable="x13")
The plot also shows the metric threshold value with a straight line. Based on the histogram of metric values obtained during permutation testing, the probability of obtaining a metric value greater than the threshold value, if the baseline and target data for variable x13 have the same distribution, is very small. The plot also displays the estimated p-value, 0.001, and the drift status, Drift, below the plot title.
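One way to see where the displayed p-value comes from is to compare the observed metric value with the stored permutation metric values. The fraction of permutation values at least as large as the observed value is a common permutation p-value estimate; this is a sketch, not necessarily the exact estimator that detectdrift uses:

permVals = DDiagnostics.PermutationResults{13,1}{1,1};   % permutation metric values for x13
observed = DDiagnostics.MetricValues(13);                % metric value for the original x13 data
pEstimate = mean(permVals >= observed)                   % fraction of permutations at least as extreme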
Compute Metrics Without Estimating p-Values Generate baseline and target data with three variables, where the distribution parameters of the second and third variables change for the target data. rng('default') % For reproducibility baseline = [normrnd(0,1,100,1),wblrnd(1.1,1,100,1),betarnd(1,2,100,1)]; target = [normrnd(0,1,100,1),wblrnd(1.2,2,100,1),betarnd(1.7,2.8,100,1)];
Compute the initial metrics for all variables between the baseline and target data without estimating the p-values.

DDiagnostics = detectdrift(baseline,target,EstimatePValues=false)

DDiagnostics =
  DriftDiagnostics

           VariableNames: ["x1"    "x2"    "x3"]
    CategoricalVariables: []
                 Metrics: ["Wasserstein"    "Wasserstein"    "Wasserstein"]
            MetricValues: [0.2022 0.3468 0.0559]

  Properties, Methods
detectdrift computes only the initial metric value for each variable using the baseline and target data. The properties associated with permutation testing and p-value estimation are either empty or contain NaNs.

summary(DDiagnostics)

          MetricValue        Metric
          ___________    _____________

    x1      0.20215      "Wasserstein"
    x2      0.34676      "Wasserstein"
    x3     0.055922      "Wasserstein"
The summary function displays only the initial metric value and the metric used for each specified variable. plotDriftStatus and plotPermutationResults do not produce plots and return warning messages when you compute metrics without estimating p-values. plotEmpiricalCDF and plotHistogram plot the ecdf and the histogram, respectively, for the first variable by default. They both return NaN for the p-value and drift status associated with the variable.

plotEmpiricalCDF(DDiagnostics)
plotHistogram(DDiagnostics)
Version History Introduced in R2022a
See Also detectdrift | ecdf | histcounts | plotDriftStatus | plotEmpiricalCDF | plotHistogram | plotPermutationResults | summary
ecdf Compute empirical cumulative distribution function (ecdf) for baseline and target data specified for data drift detection
Syntax E = ecdf(DDiagnostics) E = ecdf(DDiagnostics,Variables=variables)
Description E = ecdf(DDiagnostics) returns the table E, which stores the ecdf values for all the variables specified for drift detection in the call to the detectdrift function. ecdf returns NaN values for categorical variables. E = ecdf(DDiagnostics,Variables=variables) returns the table E for the variables specified by variables.
Examples Compute ECDF for All Variables Generate baseline and target data with two variables, where the distribution parameters of the second variable change for the target data. rng('default') % For reproducibility baseline = [normrnd(0,1,100,1),wblrnd(1.1,1,100,1)]; target = [normrnd(0,1,100,1),wblrnd(1.2,2,100,1)];
Perform permutation testing for any drift between the baseline and target data.

DDiagnostics = detectdrift(baseline,target)

DDiagnostics =
  DriftDiagnostics

              VariableNames: ["x1"    "x2"]
       CategoricalVariables: []
                DriftStatus: ["Stable"    "Drift"]
                    PValues: [0.2850 0.0030]
        ConfidenceIntervals: [2×2 double]
    MultipleTestDriftStatus: "Drift"
             DriftThreshold: 0.0500
           WarningThreshold: 0.1000

  Properties, Methods
Compute the ecdf values for all variables.
E = ecdf(DDiagnostics)

E=2×3 table
                x              F_Baseline          F_Target
          ______________    ______________    ______________

    x1    {201×1 double}    {201×1 double}    {201×1 double}
    x2    {201×1 double}    {201×1 double}    {201×1 double}
E is a table with two rows and three columns. The two rows correspond to the two variables, x1 and x2. For each variable, ecdf computes the ecdf values over a common domain for the baseline and target data. The function stores the common domain for each variable in the column x, the ecdf values for the baseline data in the column F_Baseline, and the ecdf values for the target data in the column F_Target. Access the ecdf values for variable 2 in the baseline data. E.F_Baseline{2} ans = 201×1 0 0.0100 0.0100 0.0200 0.0300 0.0400 0.0500 0.0600 0.0700 0.0800 ⋮
Plot the ecdf values of the baseline and target data for variable x2. stairs(E.x{2},E.F_Baseline{2},LineWidth=1.5) hold on stairs(E.x{2},E.F_Target{2},LineWidth=1.5) title('ECDF for x2') xlabel('x2') ylabel('Empirical CDF') legend('Baseline','Target',Location='east') hold off
The plot of the ecdf values also shows the drift in the distribution of the target data.
Compute ECDF Values for Specified Variables Load the sample data. load humanactivity
For details on the data set, enter Description at the command line. Assign the first 1000 observations as baseline data and the next 1000 as target data. baseline = feat(1:1000,:); target = feat(1001:2000,:);
Test for drift on all variables. DDiagnostics = detectdrift(baseline,target);
Compute the ecdf values for only the first five variables.

E = ecdf(DDiagnostics,Variables=[1:5])

E=5×3 table
                 x               F_Baseline         F_Target
          _______________    _______________    _______________

    x1    {2001×1 double}    {2001×1 double}    {2001×1 double}
    x2    {2001×1 double}    {2001×1 double}    {2001×1 double}
    x3    {2001×1 double}    {2001×1 double}    {2001×1 double}
    x4    {2001×1 double}    {2001×1 double}    {2001×1 double}
    x5    {2001×1 double}    {2001×1 double}    {2001×1 double}
Access the ecdf values for the third variable in the baseline data. E.F_Baseline{3} ans = 2001×1 0 0 0 0 0 0 0.0010 0.0020 0.0030 0.0040 ⋮
Plot the ecdf values of the baseline and target data for variable x3. stairs(E.x{3},E.F_Baseline{3},LineWidth=1.5) hold on stairs(E.x{3},E.F_Target{3},LineWidth=1.5) title('ECDF for x3') xlabel('x3') ylabel('Empirical CDF') legend('Baseline','Target',Location = 'southeast') hold off
The ecdf plot shows the drift in the target data for variable x3.
Input Arguments DDiagnostics — Diagnostics of permutation testing for drift detection DriftDiagnostics object Diagnostics of the permutation testing for drift detection, specified as a DriftDiagnostics object returned by detectdrift. variables — List of variables string array | cell array of character vectors | integer indices List of variables for which to compute the ecdf values, specified as a string array, cell array of character vectors, or list of integer indices. Example: Variables=["x1","x3"] Example: Variables=(1,3) Data Types: single | double | char | string
Output Arguments E — ecdf values table
ecdf values for all variables specified for drift detection in the call to detectdrift, returned as a table with the following columns.

Column Name    Description
x              Common domain over which to evaluate the empirical cdf
F_Baseline     ecdf values for the baseline data
F_Target       ecdf values for the target data
For each variable in E, the columns store x and the ecdf values in cell arrays. To access the values, you can index into the table; for example, to obtain the ecdf values for the second variable in the baseline data, use E.F_Baseline{2,1}.
Version History Introduced in R2022a
See Also detectdrift | DriftDiagnostics | plotDriftStatus | plotEmpiricalCDF | plotHistogram | plotPermutationResults | summary | histcounts
histcounts Compute histogram bin counts for specified variables in baseline and target data for drift detection
Syntax H = histcounts(DDiagnostics) H = histcounts(DDiagnostics,Variables=variables)
Description H = histcounts(DDiagnostics) returns the histogram bin counts in the table H for all variables specified for drift detection in the call to the detectdrift function. H = histcounts(DDiagnostics,Variables=variables) returns the bin counts for the variables specified by variables.
Examples Compute Histogram Bin Counts for All Variables Generate baseline and target data with two variables, where the distribution parameters of the second variable change for target data. rng('default') % For reproducibility baseline = [normrnd(0,1,100,1),wblrnd(1.1,1,100,1)]; target = [normrnd(0,1,100,1),wblrnd(1.2,2,100,1)];
Perform permutation testing for any drift between the baseline and target data. DDiagnostics = detectdrift(baseline,target);
Compute the histogram bin counts for all variables.

H = histcounts(DDiagnostics)

H=2×3 table
                                               Bins
          ____________________________________________________________________________________

    x1    {[-3.5000 -3 -2.5000 -2 -1.5000 -1 -0.5000 0 0.5000 1 1.5000 2 2.5000 3 3.5000 4]}
    x2    {[ 0 0.5000 1 1.5000 2 2.5000 3 3.5000 4 4.5000 5 5.5000 6]}
H is a table with three columns. histcounts divides the data into bins and computes the histogram bin counts for a variable in the baseline and target data over the common bins. The first and second rows contain the bins and counts for variables x1 and x2, respectively. Access the histogram bin counts in the baseline data for the first variable. H.Counts_Baseline{1}
ans = 1×15

         0    1.0000    1.0000    3.0000   14.0000   11.0000   17.0000   17.0000   15.0000   11.0
Plot the probability density function (pdf) estimate (percent of the data in each bin) of the baseline data for variable 1. histogram(BinEdges=H.Bins{1},BinCounts=H.Counts_Baseline{1},Normalization='probability')
You can also plot the histogram of the baseline and target data for variable 1 using the plotHistogram function. plotHistogram(DDiagnostics,Variable=1)
Compute Histogram Bin Counts for Specific Variables Load the sample data. load humanactivity
For details on the data set, enter Description at the command line. Assign the first 1000 observations as baseline data and the next 1000 as target data. baseline = feat(1:1000,:); target = feat(1001:2000,:);
Test for drift on all variables. DDiagnostics = detectdrift(baseline,target);
Compute the histogram bin counts for only the first five variables.

H = histcounts(DDiagnostics,Variables=(1:5))

H=5×3 table
                                               Bins
          _____________________________________________________________________________________

    x1    {[ -0.2000 -0.1000 0 0.1000
    x2    {[ -0.3000 -0.2000 -0.1000 0 0.1000 0.2000 0
    x3    {[ -0.6000 -0.5500 -0.5000 -0.4500 -0.4000 -0.3500 -0.3000 -0.2500 -0.2000 -0.1
    x4    {[0 0.0100 0.0200 0.0300 0.0400 0.0500 0.0600 0.0700 0.0800 0.0900 0.1000 0.1100 0.1200
    x5    {[ 0.0300 0.0400 0.0500 0.0600 0.0700 0.0800 0.0900
Access the histogram bin counts for the second variable in the target data.

H.Counts_Target{2}

ans = 1×14

    0.1000         0    0.1000    0.1000    0.1000    8.2000    0.3000         0
Input Arguments DDiagnostics — Diagnostics of permutation testing for drift detection DriftDiagnostics object Diagnostics of the permutation testing for drift detection, specified as a DriftDiagnostics object returned by detectdrift. variables — List of variables string array | cell array of character vectors | integer indices List of variables for which to compute the histogram bin counts, specified as a string array, cell array of character vectors, or list of integer indices. Example: Variables=["x1","x3"] Example: Variables=(1,3) Data Types: single | double | char | string
Output Arguments
H — Histogram bin counts
table
Histogram bin counts, returned as a table with the following columns.

Column Name        Description
Bins               Common domain over which to evaluate the histogram bin counts for a variable.
                   • For categorical variables, Bins contains the categories.
                   • For continuous variables, Bins contains the bin edges.
Counts_Baseline    Histogram bin counts for the corresponding variables in the baseline data
Counts_Target      Histogram bin counts for the corresponding variables in the target data
For each variable in H, the columns contain the bins and counts in cell arrays. To access the counts, you can index into the table; for example, to obtain the histogram bin counts for the second variable in the baseline data, use H.Counts_Baseline{2,1}.
Algorithms
• For categorical data, detectdrift adds a 0.5 correction factor to the histogram bin counts for each bin to handle empty bins (categories). This is equivalent to the assumption that the parameter p, the probability that the value of the variable is in that category, has the prior distribution Beta(0.5,0.5) (the Jeffreys prior assumption for the distribution parameter).
• histcounts treats a variable as ordinal for visualization purposes in these cases:
  • The variable is ordinal in either the baseline data or the target data, and the categories from both the baseline data and the target data are the same.
  • The variable is ordinal in either the baseline data or the target data, and the categories of the other data set are a subset of the ordinal data.
  • The variable is ordinal in both the baseline data and the target data, and categories from either data set are a subset of the other.
• If a variable is ordinal, histcounts preserves the order of the bin names.
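For instance, the correction described in the first bullet simply adds 0.5 to every categorical bin count; a small sketch with hypothetical counts:

rawCounts = [10 0 5];                 % observed counts per category; one category is empty
correctedCounts = rawCounts + 0.5     % counts after the correction: [10.5 0.5 5.5]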
Version History Introduced in R2022a
See Also detectdrift | DriftDiagnostics | plotDriftStatus | plotEmpiricalCDF | plotHistogram | plotPermutationResults | ecdf | summary
plotDriftStatus Plot p-values and confidence intervals for variables tested for data drift
Syntax plotDriftStatus(DDiagnostics) plotDriftStatus(DDiagnostics,Variables=variables) plotDriftStatus(ax, ___ ) EB = plotDriftStatus( ___ ) [EB,CL] = plotDriftStatus( ___ )
Description plotDriftStatus(DDiagnostics) plots the estimated p-value of the permutation test for each variable specified for drift detection in the call to detectdrift, as well as the confidence interval for each estimated p-value, using error bars. The function also plots the warning and drift thresholds and color-codes the p-values with their confidence intervals according to their drift status. If you set the value of EstimatePValues to false in the call to detectdrift, then plotDriftStatus does not generate a plot and, instead, returns a warning. plotDriftStatus(DDiagnostics,Variables=variables) plots the drift status for the variables specified by variables. plotDriftStatus(ax, ___ ) plots on the axes ax instead of gca, using any of the input argument combinations in the previous syntaxes. EB = plotDriftStatus( ___ ) creates an error bar plot and returns an array of ErrorBar objects EB. Use EB to inspect and modify the properties of the error bars. To learn more, see ErrorBar Properties. [EB,CL] = plotDriftStatus( ___ ) additionally returns an array of ConstantLine objects CL for the drift and warning threshold values. Use CL to inspect and modify the properties of the lines. For more information, see ConstantLine Properties.
Examples Plot Drift Status for All Variables Generate baseline and target data with three variables, where the distribution parameters of the second and third variables change for target data. rng('default') % For reproducibility baseline = [normrnd(0,1,100,1),wblrnd(1.1,1,100,1),betarnd(1,2,100,1)]; target = [normrnd(0,1,100,1),wblrnd(1.2,2,100,1),betarnd(1.7,2.8,100,1)];
Perform permutation testing for all variables to check for any drift between the baseline and target data.
DDiagnostics = detectdrift(baseline,target)

DDiagnostics =
  DriftDiagnostics

              VariableNames: ["x1"    "x2"    "x3"]
       CategoricalVariables: []
                DriftStatus: ["Stable"    "Drift"    "Warning"]
                    PValues: [0.3850 0.0050 0.0910]
        ConfidenceIntervals: [2×3 double]
    MultipleTestDriftStatus: "Drift"
             DriftThreshold: 0.0500
           WarningThreshold: 0.1000

  Properties, Methods
Display the 95% confidence intervals for the estimated p-values.

DDiagnostics.ConfidenceIntervals

ans = 2×3

    0.3547    0.0016    0.0739
    0.4160    0.0116    0.1106
Plot the drift status for all three variables. plotDriftStatus(DDiagnostics)
plotDriftStatus plots the confidence intervals for the estimated p-values, using error bars. The function also compares the confidence bounds against the drift and warning thresholds, and indicates the drift status of each variable using different colors. The lower confidence bound of the p-value for the first variable is higher than the warning threshold. Therefore, the drift status for the first variable is Stable, indicated by the color blue. The lower confidence bound of the p-value for the third variable is lower than the warning threshold, but higher than the drift threshold. Therefore, the drift status for the third variable is Warning, and is indicated by the color yellow. The upper confidence bound of the p-value for the second variable is lower than the drift threshold. Therefore, the drift status for the second variable is Drift and is indicated by the color orange.
Plot Drift Status for Specified Variables Load the sample data. load humanactivity
For details on the data set, enter Description at the command line. Assign the first 250 observations as baseline data and the next 250 as target data for the first 15 variables. baseline = feat(1:250,1:15); target = feat(251:500,1:15);
Test for drift on all variables.

DDiagnostics = detectdrift(baseline,target)

DDiagnostics =
  DriftDiagnostics

              VariableNames: ["x1"    "x2"    "x3"    "x4"    "x5"    "x6"    "x7"    "x8"    "x9
       CategoricalVariables: []
                DriftStatus: ["Drift"    "Drift"    "Drift"    "Drift"    "Drift"    "Drift"    "
                    PValues: [1.0000e-03 1.0000e-03 1.0000e-03 1.0000e-03 1.0000e-03 1.0000e-03 1
        ConfidenceIntervals: [2×15 double]
    MultipleTestDriftStatus: "Drift"
             DriftThreshold: 0.0500
           WarningThreshold: 0.1000

  Properties, Methods
Display the 95% confidence intervals of the p-values for variables 10 to 15.

DDiagnostics.ConfidenceIntervals(:,10:15)

ans = 2×6

    0.0000    0.4646    0.2225    0.0000    0.5427    0.0766
    0.0056    0.5275    0.2770    0.0056    0.6049    0.1138
Plot the drift status for variables 10 to 15. plotDriftStatus(DDiagnostics,Variables=(10:15))
Change Error Bar Color on Drift Status Plot Load the sample data. load humanactivity
For details on the data set, enter Description at the command line. Assign the first 250 observations as baseline data and the next 250 as target data for the first 15 variables. baseline = feat(1:250,1:15); target = feat(251:500,1:15);
Test for drift on all variables. DDiagnostics = detectdrift(baseline,target);
Plot the drift status for all variables and return the ErrorBar and ConstantLine objects. [EB,CL] = plotDriftStatus(DDiagnostics)
35-1669
35
Functions
EB = 3×1 ErrorBar array: ErrorBar ErrorBar ErrorBar
(Stable) (Warning) (Drift)
CL = 2×1 ConstantLine array: ConstantLine ConstantLine
EB is an array of ErrorBar objects and CL is an array of ConstantLine objects. You can change the appearance of the plot by accessing the properties of these objects. Change the color of the error bars and markers for status Stable to green. Change the color of the drift threshold line, error bars, and markers for the status Drift to magenta. EB(1).Color = [0 1 0]; EB(1).MarkerFaceColor = EB(1).MarkerEdgeColor = EB(3).Color = [1 0 1]; EB(3).MarkerFaceColor = EB(3).MarkerEdgeColor = CL(2).Color = [1 0 1];
35-1670
[0 1 0]; [0 1 0]; [1 0 1]; [1 0 1];
plotDriftStatus
You can also access and modify properties by double-clicking EB or CL in the Workspace to open and use the Property Inspector.
Input Arguments DDiagnostics — Diagnostics of permutation testing for drift detection DriftDiagnostics object Diagnostics of the permutation testing for drift detection, specified as a DriftDiagnostics object returned by detectdrift. variables — List of variables string array | cell array of character vectors | integer indices List of variables for which to plot the drift status, specified as a string array, a cell array of character vectors, or a list of integer indices. Example: Variables=["x1","x3"] Example: Variables=(1,3) Data Types: single | double | char | string ax — Axes to plot into Axes object | UIAxes object 35-1671
35
Functions
Axes for plotDriftStatus to plot into, specified as an Axes or UIAxes object. If you do not specify ax, then plotDriftStatus creates the plot using the current axes. For more information on creating an axes object, see axes and uiaxes.
Output Arguments EB — Error bars showing the confidence intervals 3-by-1 array of ErrorBar objects Error bars showing the confidence intervals for the estimated p-values in the plot, returned as a 3by-1 array of ErrorBar objects. Use EB to inspect and adjust the properties of the error bars. To learn more about the properties of the ErrorBar object, see ErrorBar Properties. CL — Lines showing the threshold values 2-by-1 array of ConstantLine objects Lines showing the drift and warning threshold values in the plot, returned as a 2-by-1 array of ConstantLine objects. Use CL to inspect and adjust the properties of the lines.
Version History Introduced in R2022a
See Also detectdrift | DriftDiagnostics | plotEmpiricalCDF | plotHistogram | plotPermutationResults | ecdf | summary | histcounts
35-1672
plotEmpiricalCDF
plotEmpiricalCDF Plot empirical cumulative distribution function (ecdf) of a variable specified for data drift detection
Syntax plotEmpiricalCDF(DDiagnostics) plotEmpiricalCDF(DDiagnostics,Variable=variable) plotEmpiricalCDF(ax, ___ ) St = plotEmpiricalCDF( ___ )
Description plotEmpiricalCDF(DDiagnostics) plots the ecdf values of the baseline and target data for the continuous variable with the lowest p-value. If the data does not contain any continuous variables, then plotEmpiricalCDF does not generate a plot and, instead, returns a warning. If you set the value of EstimatePValues to false in the call to detectdrift, then plotEmpiricalCDF displays NaN for the p-value and the drift status. plotEmpiricalCDF(DDiagnostics,Variable=variable) plots the ecdf for the variable specified by variable. plotEmpiricalCDF(ax, ___ ) plots on the axes ax instead of gca, using any of the input argument combinations in the previous syntaxes. St = plotEmpiricalCDF( ___ ) plots the ecdf and returns an array of Stair objects St. Use this to inspect and modify the properties of the object. To learn more, see Stair Properties.
Examples Plot ECDF for Variable with Lowest p-Value Generate baseline and target data with three variables, where the distribution parameters of the second and third variables change for the target data. rng('default') % For reproducibility baseline = [normrnd(0,1,100,1),wblrnd(1.1,1,100,1),betarnd(1,2,100,1)]; target = [normrnd(0,1,100,1),wblrnd(1.2,2,100,1),betarnd(1.7,2.8,100,1)];
Perform permutation testing for all variables to check for any drift between the baseline and target data. DDiagnostics = detectdrift(baseline,target) DDiagnostics = DriftDiagnostics VariableNames: ["x1"
"x2"
"x3"]
35-1673
35
Functions
CategoricalVariables: DriftStatus: PValues: ConfidenceIntervals: MultipleTestDriftStatus: DriftThreshold: WarningThreshold:
[] ["Stable" "Drift" [0.3850 0.0050 0.0910] [2×3 double] "Drift" 0.0500 0.1000
"Warning"]
Properties, Methods
Plot the ecdf for the variable with the lowest p-value. plotEmpiricalCDF(DDiagnostics)
By default, plotEmpiricalCDF plots the ecdf of the baseline and target data for the variable with the lowest p-value, which is x2 in this case. You can see the difference between the two empirical cumulative distribution functions. The plot also displays the p-value and the drift status for variable x2.
Plot ECDF for Specified Variable Generate baseline and target data with three variables, where the distribution parameters of the second and third variables change for the target data. 35-1674
plotEmpiricalCDF
rng('default') % For reproducibility baseline = [normrnd(0,1,100,1),wblrnd(1.1,1,100,1),betarnd(1,2,100,1)]; target = [normrnd(0,1,100,1),wblrnd(1.2,2,100,1),betarnd(1.7,2.8,100,1)];
Perform permutation testing for all variables to check for any drift between the baseline and target data. DDiagnostics = detectdrift(baseline,target) DDiagnostics = DriftDiagnostics VariableNames: CategoricalVariables: DriftStatus: PValues: ConfidenceIntervals: MultipleTestDriftStatus: DriftThreshold: WarningThreshold:
["x1" "x2" "x3"] [] ["Stable" "Drift" [0.3850 0.0050 0.0910] [2×3 double] "Drift" 0.0500 0.1000
"Warning"]
Properties, Methods
Plot the ecdf for the third variable. plotEmpiricalCDF(DDiagnostics,Variable="x3")
35-1675
35
Functions
plotEmpiricalCDF plots the ecdf for the baseline and target data. The function also displays the estimated p-value and the drift status for the specified variable.
Plot ECDF for Variables in Tiled Layout Load the sample data. load humanactivity
For details on the data set, enter Description at the command line. Assign the first 250 observations as baseline data and the next 250 as target data for columns 10 to 15. baseline = feat(1:250,10:15); target = feat(251:500,10:15);
Test for drift on all variables. DDiagnostics = detectdrift(baseline,target) DDiagnostics = DriftDiagnostics VariableNames: CategoricalVariables: DriftStatus: PValues: ConfidenceIntervals: MultipleTestDriftStatus: DriftThreshold: WarningThreshold:
["x1" "x2" "x3" "x4" "x5" "x6"] [] ["Drift" "Stable" "Stable" "Drift" "Stable" [1.0000e-03 0.5080 0.2370 1.0000e-03 0.5370 0.0820] [2×6 double] "Drift" 0.0500 0.1000
Properties, Methods
The drift status for variables x4 and x6 is Drift and Warning, respectively. Plot the ecdf values for x4 and x6 in a tiled layout. tiledlayout(1,2); ax1 = nexttile; plotEmpiricalCDF(DDiagnostics,ax1,Variable="x4") ax2= nexttile; plotEmpiricalCDF(DDiagnostics,ax2,Variable="x6")
35-1676
"Warning"
plotEmpiricalCDF
There is a greater difference between the ecdf of the baseline and target data for variable x4. The detectdrift function detects the shift for variable x4.
Input Arguments DDiagnostics — Diagnostics of permutation testing for drift detection DriftDiagnostics object Diagnostics of the permutation testing for drift detection, specified as a DriftDiagnostics object returned by detectdrift. variable — Variable for which to visualize ecdf string | character vector | integer index Variable for which to plot the ecdf, specified as a string, character vector, or integer index. Example: Variable="x3" Example: Variable=3 Data Types: single | double | char | string ax — Axes to plot into Axes object | UIAxes object
Axes on which to plot, specified as an Axes or UIAxes object. If you do not specify ax, then plotEmpiricalCDF creates the plot using the current axes. For more information on creating an axes object, see axes and uiaxes.
Version History Introduced in R2022a
See Also detectdrift | DriftDiagnostics | plotDriftStatus | plotHistogram | plotPermutationResults | ecdf | summary | histcounts
plotHistogram Plot histogram of a variable specified for data drift detection
Syntax plotHistogram(DDiagnostics) plotHistogram(DDiagnostics,Variable=variable) plotHistogram(ax, ___ ) H = plotHistogram( ___ )
Description plotHistogram(DDiagnostics) plots a histogram of the baseline and target data for the variable with the lowest p-value computed by the detectdrift function. If you set the value of EstimatePValues to false in the call to detectdrift, then plotHistogram displays NaN for the p-value and the drift status. plotHistogram(DDiagnostics,Variable=variable) plots the histogram of the baseline and target data for the variable specified by variable. plotHistogram(ax, ___ ) plots on the axes ax instead of gca, using any of the input argument combinations in the previous syntaxes. H = plotHistogram( ___ ) plots the histogram and returns an array of Histogram objects in H. Use H to inspect and modify the properties of the histogram. For more information, see Histogram Properties.
Examples Plot Histogram for Variable with Lowest p-Value Generate baseline and target data with three variables, where the distribution parameters of the second and third variables change for the target data. rng('default') % For reproducibility baseline = [normrnd(0,1,100,1),wblrnd(1.1,1,100,1),betarnd(1,2,100,1)]; target = [normrnd(0,1,100,1),wblrnd(1.2,2,100,1),betarnd(1.7,2.8,100,1)];
Perform permutation testing for all variables to check for any drift between the baseline and target data.

DDiagnostics = detectdrift(baseline,target)

DDiagnostics = 
  DriftDiagnostics

              VariableNames: ["x1"    "x2"    "x3"]
       CategoricalVariables: []
                DriftStatus: ["Stable"    "Drift"    "Warning"]
                    PValues: [0.3850 0.0050 0.0910]
        ConfidenceIntervals: [2×3 double]
    MultipleTestDriftStatus: "Drift"
             DriftThreshold: 0.0500
           WarningThreshold: 0.1000

  Properties, Methods
Plot the histogram for the default variable. plotHistogram(DDiagnostics)
By default, plotHistogram plots a histogram of the baseline and target data for the variable with the lowest p-value. The function also displays the p-value and the drift status for the variable.
Plot Histogram of All Variables in Tiled Layout Generate baseline and target data with three variables, where the distribution parameters of the second and third variables change for the target data.
rng('default') % For reproducibility baseline = [normrnd(0,1,100,1),wblrnd(1.1,1,100,1),betarnd(1,2,100,1)]; target = [normrnd(0,1,100,1),wblrnd(1.2,2,100,1),betarnd(1.7,2.8,100,1)];
Perform permutation testing for all variables to check for any drift between the baseline and target data. Use the Energy statistic as the metric.

DDiagnostics = detectdrift(baseline,target,ContinuousMetric="energy")

DDiagnostics = 
  DriftDiagnostics

              VariableNames: ["x1"    "x2"    "x3"]
       CategoricalVariables: []
                DriftStatus: ["Stable"    "Drift"    "Warning"]
                    PValues: [0.3790 0.0110 0.0820]
        ConfidenceIntervals: [2×3 double]
    MultipleTestDriftStatus: "Drift"
             DriftThreshold: 0.0500
           WarningThreshold: 0.1000

  Properties, Methods
Plot the histograms for all three variables in a tiled layout. tiledlayout(3,1); ax1 = nexttile; plotHistogram(DDiagnostics,ax1,Variable="x1") ax2 = nexttile; plotHistogram(DDiagnostics,ax2,Variable="x2") ax3 = nexttile; plotHistogram(DDiagnostics,ax3,Variable="x3")
Plot Histogram for Drift Detection and Change Bar Color Generate baseline and target data with three variables, where the distribution parameters of the second and third variables change for the target data. rng('default') % For reproducibility baseline = [normrnd(0,1,100,1),wblrnd(1.1,1,100,1),betarnd(1,2,100,1)]; target = [normrnd(0,1,100,1),wblrnd(1.2,2,100,1),betarnd(1.7,2.8,100,1)];
Perform permutation testing for all variables to check for any drift between the baseline and target data.

DDiagnostics = detectdrift(baseline,target)

DDiagnostics = 
  DriftDiagnostics

              VariableNames: ["x1"    "x2"    "x3"]
       CategoricalVariables: []
                DriftStatus: ["Stable"    "Drift"    "Warning"]
                    PValues: [0.3850 0.0050 0.0910]
        ConfidenceIntervals: [2×3 double]
    MultipleTestDriftStatus: "Drift"
             DriftThreshold: 0.0500
           WarningThreshold: 0.1000

  Properties, Methods
Plot the histogram for the first variable and return the Histogram object. H = plotHistogram(DDiagnostics,Variable=1)
H = 
  2×1 Bar array:

  Bar    (Baseline)
  Bar    (Target)
Change the color of the histogram bars for the baseline data. H(1).FaceColor = [1 0 1];
Input Arguments

DDiagnostics — Diagnostics of permutation testing for drift detection
DriftDiagnostics object
Diagnostics of the permutation testing for drift detection, specified as a DriftDiagnostics object returned by detectdrift.

variable — Variable for which to plot histogram
string | character vector | integer index
Variable for which to plot the histogram, specified as a string, a character vector, or an integer index.
Example: Variable="x2"
Example: Variable=2
Data Types: single | double | char | string

ax — Axes to plot into
Axes object | UIAxes object
Axes for plotHistogram to plot into, specified as an Axes or UIAxes object. If you do not specify ax, then plotHistogram creates the plot using the current axes. For more information on creating an axes object, see axes and uiaxes.
Algorithms

• For categorical data, detectdrift adds a 0.5 correction factor to the histogram bin counts for each bin to handle empty bins (categories). This is equivalent to assuming that the parameter p, the probability that the value of the variable falls in that category, has the prior distribution Beta(0.5,0.5) (the Jeffreys prior for the distribution parameter); see the sketch after this list.
• plotHistogram treats a variable as ordinal for visualization purposes in these cases:
  • The variable is ordinal in either the baseline data or the target data, and the categories from both the baseline data and the target data are the same.
  • The variable is ordinal in either the baseline data or the target data, and the categories of the other data set are a subset of the ordinal data.
  • The variable is ordinal in both the baseline data and the target data, and the categories from either data set are a subset of the other.
• If a variable is ordinal, plotHistogram preserves the order of the bin names.
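The following sketch illustrates the effect of the 0.5 correction on a hypothetical categorical variable with an empty category. It is an illustration of the idea only; the binning that detectdrift performs internally is not exposed.

counts = [12 0 8];                   % hypothetical bin counts with one empty category
corrected = counts + 0.5;            % add the 0.5 correction factor to every bin
relFreq = corrected/sum(corrected)   % corrected relative frequencies, all nonzero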
Version History Introduced in R2022a
See Also detectdrift | DriftDiagnostics | plotDriftStatus | plotEmpiricalCDF | plotPermutationResults | ecdf | summary | histcounts
plotPermutationResults Plot histogram of permutation results for a variable specified for data drift detection
Syntax plotPermutationResults(DDiagnostics) plotPermutationResults(DDiagnostics,Variable=variable) plotPermutationResults(ax, ___ ) H = plotPermutationResults( ___ ) [H,CL] = plotPermutationResults( ___ )
Description plotPermutationResults(DDiagnostics) plots the histogram of metric values computed by the detectdrift function during permutation testing for the variable with the lowest p-value. If you set the value of EstimatePValues to false in the call to detectdrift, then plotPermutationResults does not generate a plot and, instead, returns a warning. plotPermutationResults(DDiagnostics,Variable=variable) plots the histogram for the variable specified by variable. plotPermutationResults(ax, ___ ) plots on the axes ax instead of gca, using any of the input argument combinations in the previous syntaxes. H = plotPermutationResults( ___ ) plots the histogram and returns an array of Histogram objects H for the metric values computed during permutation testing. Use H to inspect and modify the properties of the histogram. For more information, see Histogram Properties. [H,CL] = plotPermutationResults( ___ ) additionally returns a ConstantLine object CL for the metric threshold value. Use CL to inspect and modify the properties of the line. For more information, see ConstantLine Properties.
Examples Plot Permutation Results for Variable with Lowest p-Value Generate baseline and target data with three variables, where the distribution parameters of the second and third variables change for the target data. rng('default') % For reproducibility baseline = [normrnd(0,1,100,1),wblrnd(1.1,1,100,1),betarnd(1,2,100,1)]; target = [normrnd(0,1,100,1),wblrnd(1.2,2,100,1),betarnd(1.7,2.8,100,1)];
Perform permutation testing for all variables to check for any drift between the baseline and target data. DDiagnostics = detectdrift(baseline,target)
DDiagnostics = 
  DriftDiagnostics

              VariableNames: ["x1"    "x2"    "x3"]
       CategoricalVariables: []
                DriftStatus: ["Stable"    "Drift"    "Warning"]
                    PValues: [0.3850 0.0050 0.0910]
        ConfidenceIntervals: [2×3 double]
    MultipleTestDriftStatus: "Drift"
             DriftThreshold: 0.0500
           WarningThreshold: 0.1000

  Properties, Methods
Plot the permutation results for the default variable. plotPermutationResults(DDiagnostics)
By default, plotPermutationResults plots a histogram of the metric values computed in permutation testing for the variable with the lowest p-value, which is x2 in this case. The function includes the metric threshold value (the initial metric value computed by detectdrift using the baseline and target data) on the histogram, so you can see the values that are greater than or equal to the threshold. plotPermutationResults also displays the p-value and the drift status for the variable, and the metric that you specify to use for permutation testing in the call to detectdrift.
In this example, no metric is specified, so detectdrift uses the default metric (Wasserstein) for continuous variables.
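The p-value that detectdrift reports comes from comparing the permutation metric values against this threshold. The following generic sketch shows the standard way such a permutation p-value is estimated; it is an illustration of the idea only, and the internal estimator that detectdrift uses (including its confidence intervals) is not reproduced here.

numPerm    = 1000;              % hypothetical number of permutations
metricObs  = 0.25;              % hypothetical observed (threshold) metric value
metricPerm = rand(numPerm,1);   % hypothetical metric values from permuted data
pEst = (1 + sum(metricPerm >= metricObs))/(1 + numPerm)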
Plot Permutation Results for Specified Variable Generate baseline and target data with three variables, where the distribution parameters of the second and third variables change for the target data. rng('default') % For reproducibility baseline = [normrnd(0,1,100,1),wblrnd(1.1,1,100,1),betarnd(1,2,100,1)]; target = [normrnd(0,1,100,1),wblrnd(1.2,2,100,1),betarnd(1.7,2.8,100,1)];
Perform permutation testing for all variables to check for any drift between the baseline and target data. Use the Energy metric for all variables.

DDiagnostics = detectdrift(baseline,target,ContinuousMetric="energy")

DDiagnostics = 
  DriftDiagnostics

              VariableNames: ["x1"    "x2"    "x3"]
       CategoricalVariables: []
                DriftStatus: ["Stable"    "Drift"    "Warning"]
                    PValues: [0.3790 0.0110 0.0820]
        ConfidenceIntervals: [2×3 double]
    MultipleTestDriftStatus: "Drift"
             DriftThreshold: 0.0500
           WarningThreshold: 0.1000

  Properties, Methods

Display the 95% confidence bounds for the p-values.

DDiagnostics.ConfidenceIntervals

ans = 2×3

    0.3488    0.0055    0.0657
    0.4099    0.0196    0.1008

Plot the permutation results for the third variable.

plotPermutationResults(DDiagnostics,Variable=3)
Plot Permutation Results for Multiple Variables in Tiled Layout Generate baseline and target data with three variables, where the distribution parameters of the second and third variables change for the target data. rng('default') % For reproducibility baseline = [normrnd(0,1,100,1),wblrnd(1.1,1,100,1),betarnd(1,2,100,1)]; target = [normrnd(0,1,100,1),wblrnd(1.2,2,100,1),betarnd(1.7,2.8,100,1)];
Perform permutation testing for all variables to check for any drift between the baseline and target data. Use the Energy metric for all variables.

DDiagnostics = detectdrift(baseline,target,ContinuousMetric="energy")

DDiagnostics = 
  DriftDiagnostics

              VariableNames: ["x1"    "x2"    "x3"]
       CategoricalVariables: []
                DriftStatus: ["Stable"    "Drift"    "Warning"]
                    PValues: [0.3790 0.0110 0.0820]
        ConfidenceIntervals: [2×3 double]
    MultipleTestDriftStatus: "Drift"
             DriftThreshold: 0.0500
           WarningThreshold: 0.1000
Plot the permutation results for variables x1 and x2 in a tiled layout. tiledlayout(2,1); ax1 = nexttile; plotPermutationResults(DDiagnostics,ax1,Variable="x1") ax2 = nexttile; plotPermutationResults(DDiagnostics,ax2,Variable="x2")
Plot the permutation results for variables x1 and x3 in a tiled layout. tiledlayout(2,1); ax1 = nexttile; plotPermutationResults(DDiagnostics,ax1,Variable="x1") ax3= nexttile; plotPermutationResults(DDiagnostics,ax3,Variable="x3")
Adjust Colors on Permutation Results Plot Generate baseline and target data with three variables, where the distribution parameters of the second and third variables change for the target data. rng('default') % For reproducibility baseline = [normrnd(0,1,100,1),wblrnd(1.1,1,100,1),betarnd(1,2,100,1)]; target = [normrnd(0,1,100,1),wblrnd(1.2,2,100,1),betarnd(1.7,2.8,100,1)];
Perform permutation testing for all variables to check for any drift between the baseline and target data. Use the Energy distance as the metric.

DDiagnostics = detectdrift(baseline,target,ContinuousMetric="energy")

DDiagnostics = 
  DriftDiagnostics

              VariableNames: ["x1"    "x2"    "x3"]
       CategoricalVariables: []
                DriftStatus: ["Stable"    "Drift"    "Warning"]
                    PValues: [0.3790 0.0110 0.0820]
        ConfidenceIntervals: [2×3 double]
    MultipleTestDriftStatus: "Drift"
             DriftThreshold: 0.0500
           WarningThreshold: 0.1000

  Properties, Methods
Plot the permutation results for the third variable. [H,CL] = plotPermutationResults(DDiagnostics,Variable=3)
H = 
  2×1 Histogram array:

  Histogram
  Histogram

CL = 
  ConstantLine with properties:

    InterceptAxis: 'x'
            Value: 0.1012
            Color: [0.1500 0.1500 0.1500]
        LineStyle: ':'
        LineWidth: 3
            Label: ''
      DisplayName: ''

  Show all properties
Change the histogram bar colors to blue and the threshold line color to red. H(1).FaceColor = "b"; CL.Color = "r";
You can also access and modify properties by double-clicking H or CL in the Workspace to open and use the Property Inspector.
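Because CL is a standard ConstantLine object, you can also use its other properties, such as Label, to annotate the threshold line directly. A small sketch, continuing the example above:

CL.Label = "Observed metric value";   % annotate the threshold line
CL.LineWidth = 1.5;                   % make the line thinner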
Input Arguments

DDiagnostics — Diagnostics of permutation testing for drift detection
DriftDiagnostics object
Diagnostics of the permutation testing for drift detection, specified as a DriftDiagnostics object returned by detectdrift.

variable — Variable for which to plot permutation results
string | character vector | integer index
Variable for which to plot the permutation results, specified as a string, character vector, or integer index.
Example: Variable="x2"
Example: Variable=2 Data Types: single | double | char | string ax — Axes on which to plot Axes object | UIAxes object Axes on which to plot, specified as an Axes or UIAxes object. If you do not specify ax, then plotPermutationResults creates the plot using the current axes. For more information on creating an axes object, see axes and uiaxes.
Output Arguments H — Histogram of metric values 2-by-1 array of Histogram objects Histogram of metric values computed during permutation testing, returned as a 2-by-1 array of Histogram objects. Use H to inspect and adjust the properties of the histogram. For more information on the Histogram object properties, see Histogram Properties. CL — Line showing the metric threshold value ConstantLine object Line showing the metric threshold value in the plot, returned as a ConstantLine object. Use CL to inspect and modify the properties of the line.
Version History Introduced in R2022a
See Also detectdrift | DriftDiagnostics | plotDriftStatus | plotEmpiricalCDF | plotHistogram | ecdf | summary | histcounts
summary Summary table for DriftDiagnostics object
Syntax summary(DDiagnostics) S = summary(DDiagnostics)
Description summary(DDiagnostics) displays the multiple test correction drift status and the summary of the drift diagnostics returned by the detectdrift function. S = summary(DDiagnostics) returns the table S containing the summary of the drift diagnostic results.
Examples Display Summary of Drift Diagnostics Generate baseline and target data with two variables, where the distribution parameters of the second variable change for target data. rng('default') % For reproducibility baseline = [normrnd(0,1,100,1),wblrnd(1.1,1,100,1)]; target = [normrnd(0,1,100,1),wblrnd(1.2,2,100,1)];
Perform permutation testing for any drift between the baseline and the target data. DDiagnostics = detectdrift(baseline,target);
Display the summary of the drift diagnostics.

summary(DDiagnostics)

Multiple Test Correction Drift Status: Drift

          DriftStatus    PValue       ConfidenceInterval     
          ___________    ______    _______________________

    x1     "Stable"      0.285       0.25719      0.31408
    x2     "Drift"       0.003     0.0006191     0.008742
summary displays the multiple test correction drift status above the summary table. detectdrift uses the default multiple test correction method, Bonferroni, which determines that the drift status for the overall data is Drift. The summary table has two rows, one for each variable, and three columns containing the drift status, estimated p-value, and 95% confidence bounds for the estimated p-values. detectdrift identifies the drift status as stable for the first variable, and detects the drift in the distribution for the second variable. The upper confidence bound for the second variable is lower than the default drift threshold of 0.05, so the drift status for this variable is Drift.
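The Bonferroni correction itself is simple to state: each p-value is compared against the drift threshold divided by the number of tests (here, the number of variables). The following sketch restates that idea using the properties of the diagnostics object; it illustrates the standard Bonferroni rule and is not the toolbox's internal code, which may handle the Warning threshold and edge cases differently.

pvals = DDiagnostics.PValues;          % estimated p-values, one per variable
alpha = DDiagnostics.DriftThreshold;   % 0.05 by default
driftAfterCorrection = any(pvals <= alpha/numel(pvals))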
Save Summary of Drift Diagnostics Generate baseline and target data with two variables, where the distribution parameters of the second variable change for the target data. rng('default') % For reproducibility baseline = [normrnd(0,1,100,1),wblrnd(1.1,1,100,1)]; target = [normrnd(0,1,100,1),wblrnd(1.2,2,100,1)];
Perform permutation testing for any drift between the baseline and target data. DDiagnostics = detectdrift(baseline,target);
Save the summary of the drift diagnostics in the table S.

S = summary(DDiagnostics)

S=3×3 table
                    DriftStatus    PValue       ConfidenceInterval     
                    ___________    ______    _______________________

    x1               "Stable"      0.285       0.25719      0.31408
    x2               "Drift"       0.003     0.0006191     0.008742
    MultipleTest     "Drift"         NaN           NaN          NaN
When you save the results in a table, summary stores the multiple test correction drift status in a row MultipleTest below the variables. The multiple test correction has no p-value or confidence interval, so the function stores NaNs.

If you set EstimatePValues to false in the call to detectdrift, the software does not perform any estimation or confidence interval computation. In this case, S stores the name and initial value of the metric you specify for each variable in the call to detectdrift.

DDiagnostics = detectdrift(baseline,target,EstimatePValues=false);
S = summary(DDiagnostics)

S=2×2 table
          MetricValue        Metric    
          ___________    _____________

    x1      0.22381      "Wasserstein"
    x2      0.36879      "Wasserstein"
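Because summary uses the variable names as row names in S, you can index into the table directly to pull out a single result. A small sketch, continuing the example above:

S{"x2","MetricValue"}   % initial metric value for the second variable
S{"x1","Metric"}        % metric used for the first variable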
Input Arguments

DDiagnostics — Diagnostics of permutation testing for drift detection
DriftDiagnostics object
Diagnostics of the permutation testing for drift detection, specified as a DriftDiagnostics object returned by detectdrift.
Output Arguments S — Summary of drift diagnostic results table Summary of the drift diagnostic results, returned as a table. By default, S includes a row for each variable specified for drift detection in the call to detectdrift, and a row for the multiple test drift status, MultipleTest. In this case, S has the following columns. Column Name
Description
DriftStatus
Drift status at the end of the permutation testing: Drift, Warning, or Stable
PValue
Estimated p-values
ConfidenceInterval
Confidence intervals for the estimated p-values
If you set the value of EstimatePValues to false in the call to detectdrift, then S does not have the row MultipleTest, and the number of rows in S is equal to the number of variables specified for drift detection. In this case, S has the following columns. Column Name
Description
MetricValue
Value of the metric used in permutation testing
Metric
Metric used in permutation testing
Version History Introduced in R2022a
See Also detectdrift | DriftDiagnostics | plotDriftStatus | plotEmpiricalCDF | plotPermutationResults | plotHistogram | ecdf | histcounts
DriftDetectionMethod Incremental drift detector that utilizes Drift Detection Method (DDM)
Description

A DriftDetectionMethod model object represents an incremental concept drift detector that uses the Drift Detection Method [1]. After creating the object, you can use the detectdrift object function to update the statistics and check for any drift in the concept data (for example, failure rate or regression loss). DriftDetectionMethod is suitable for incremental concept drift detection. For batch drift detection on raw data, see detectdrift.
Creation You can create DriftDetectionMethod by specifying the DetectionMethod argument as "ddm" in the call to incrementalConceptDriftDetector.
Properties

Alternative — Type of alternative hypothesis
'greater' (default) | 'less'
Type of alternative hypothesis for determining the drift status, specified as either 'greater' or 'less'.
Data Types: char

DriftDetected — Flag indicating whether software detects drift
1 | 0
This property is read-only.
Flag indicating whether the software detects drift or not, specified as either 1 or 0. A value of 1 means DriftStatus is 'Drift'.
Data Types: logical

DriftStatus — Current drift status
'Stable' | 'Warning' | 'Drift'
This property is read-only.
Current drift status, specified as 'Stable', 'Warning', or 'Drift'. You can see the transition in the drift status by comparing DriftStatus and PreviousDriftStatus.
Data Types: char

DriftThreshold — Number of standard deviations for drift limit
nonnegative scalar value
This property is read-only.
Number of standard deviations for drift limit, specified as a nonnegative scalar value. This is the number of standard deviations the overall test statistic can be away from the optimal test statistic before the software sets DriftStatus to 'Drift'.
Data Types: double

InputType — Type of input data
'binary' (default) | 'continuous'
This property is read-only.
Type of input data, specified as either 'binary' or 'continuous'.
Data Types: char

IsWarm — Flag indicating whether warmup period is over
1 | 0
This property is read-only.
Flag indicating whether the warmup period is over or not, specified as 1 (true) or 0 (false).
Data Types: logical

Mean — Weighted average of all input data
numeric value
This property is read-only.
Weighted average of all input data used for training the drift detector, specified as a numeric value.
Data Types: double

NumTrainingObservations — Number of observations used for training
nonnegative integer value
This property is read-only.
Number of observations used for training the drift detector, specified as a nonnegative integer value.
Data Types: double

OptimalMean — Optimal weighted average
numeric value
Optimal weighted average detectdrift observes up to the most current data point, specified as a numeric value. detectdrift updates the OptimalMean and OptimalStandardDeviation under any of these conditions:
• When Alternative is 'greater' and Mean + StandardDeviation is less than or equal to OptimalMean + OptimalStandardDeviation.
• When Alternative is 'less' and Mean - StandardDeviation is greater than or equal to OptimalMean - OptimalStandardDeviation.
Data Types: double

OptimalStandardDeviation — Optimal weighted standard deviation
numeric value
This property is read-only.
Optimal weighted standard deviation detectdrift observes up to the most current data point, specified as a numeric value. detectdrift updates the OptimalMean and OptimalStandardDeviation under any of these conditions:
• When Alternative is 'greater' and Mean + StandardDeviation is less than or equal to OptimalMean + OptimalStandardDeviation.
• When Alternative is 'less' and Mean - StandardDeviation is greater than or equal to OptimalMean - OptimalStandardDeviation.
Data Types: double

PreviousDriftStatus — Drift status prior to the latest training
'Stable' | 'Warning' | 'Drift'
This property is read-only.
Drift status prior to the latest training using the most recent batch of data, specified as 'Stable', 'Warning', or 'Drift'. You can see the transition in the drift status by comparing DriftStatus and PreviousDriftStatus.
Data Types: char

StandardDeviation — Weighted standard deviation of all input data
numeric value
This property is read-only.
Weighted standard deviation of all input data used for training the drift detector, specified as a numeric value.
Data Types: double

WarmupPeriod — Number of observations for drift detector warmup
nonnegative integer value
This property is read-only.
Number of observations for drift detector warmup, specified as a nonnegative integer.
Data Types: double

WarningDetected — Flag indicating whether there is warning
1 | 0
This property is read-only.
Flag indicating whether there is a warning or not, specified as either 1 or 0. A value of 1 means DriftStatus is 'Warning'.
Data Types: logical WarningThreshold — Number of standard deviations for warning limit nonnegative scalar value This property is read-only. Number of standard deviations for warning limit, specified as a nonnegative scalar value. This is the number of standard deviations the overall test statistic can be away from the optimal test statistic before the software sets DriftStatus to 'Warning'. Data Types: double
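Taken together, the warning and drift limits work like control limits around the optimal test statistic. The following sketch restates the 'greater' case using the properties described above; it follows the original DDM formulation [1] and is an illustration only, not the toolbox's internal code.

% detector is a DriftDetectionMethod object returned by incrementalConceptDriftDetector
testStat   = detector.Mean + detector.StandardDeviation;
warnLimit  = detector.OptimalMean + detector.WarningThreshold*detector.OptimalStandardDeviation;
driftLimit = detector.OptimalMean + detector.DriftThreshold*detector.OptimalStandardDeviation;
isWarning  = testStat >= warnLimit;    % schematic 'Warning' condition
isDrift    = testStat >= driftLimit;   % schematic 'Drift' condition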
Object Functions

detectdrift    Update drift detector states and drift status with new data
reset          Reset incremental concept drift detector
Examples Monitor Data Stream for Potential Drift Initiate the concept drift detector using the Drift Detection Method (DDM). incCDDetector = incrementalConceptDriftDetector("ddm");
Create a random stream such that for the first 1000 observations, the failure rate is 0.1, and after 1000 observations, the failure rate increases to 0.6.

rng(1234) % For reproducibility
numObservations = 3000;
switchPeriod = 1000;

evaluator is a fairnessMetrics object. By default, the fairnessMetrics function selects the majority group of the sensitive attribute (the group with the largest number of individuals) as the reference group for the attribute. Also, the fairnessMetrics function orders the labels by using the unique function with the "sorted" option, and specifies the second class of the labels as the positive class.

Y = Y > 2;
Suppose that the data collected when the subject was not moving (Y = false) has double the quality of the data collected when the subject was moving. Create a weight variable that attributes 2 to observations collected from a still subject, and 1 to a moving subject.

W = ones(n,1) + ~Y;
Train Linear Model for Binary Classification

Fit a linear model for binary classification to a random sample of half the data.

idxtt = randsample([true false],n,true);
TTMdl = fitclinear(X(:,idxtt),Y(idxtt),'ObservationsIn','columns', ...
    'Weights',W(idxtt))

TTMdl = 
  ClassificationLinear
      ResponseName: 'Y'
        ClassNames: [0 1]
    ScoreTransform: 'none'
              Beta: [60x1 double]
              Bias: -0.1107
            Lambda: 8.2967e-05
           Learner: 'svm'
TTMdl is a ClassificationLinear model object representing a traditionally trained linear model for binary classification.

Convert Trained Model

Convert the traditionally trained classification model to a binary classification linear model for incremental learning.
IncrementalMdl = incrementalLearner(TTMdl)

IncrementalMdl = 
  incrementalClassificationLinear
            IsWarm: 1
           Metrics: [1x2 table]
        ClassNames: [0 1]
    ScoreTransform: 'none'
              Beta: [60x1 double]
              Bias: -0.1107
           Learner: 'svm'
Separately Track Performance Metrics and Fit Model

Perform incremental learning on the rest of the data by using the updateMetrics and fit functions. At each iteration:

1  Simulate a data stream by processing 50 observations at a time.
2  Call updateMetrics to update the cumulative and window classification error of the model given the incoming chunk of observations. Overwrite the previous incremental model to update the losses in the Metrics property. Note that the function does not fit the model to the chunk of data—the chunk is "new" data for the model. Specify that the observations are oriented in columns, and specify the observation weights.
3  Call fit to fit the model to the incoming chunk of observations. Overwrite the previous incremental model to update the model parameters. Specify that the observations are oriented in columns, and specify the observation weights.
4  Store the classification error and first estimated coefficient β1.

% Preallocation
idxil = ~idxtt;
nil = sum(idxil);
numObsPerChunk = 50;
nchunk = floor(nil/numObsPerChunk);
ce = array2table(zeros(nchunk,2),'VariableNames',["Cumulative" "Window"]);
beta1 = [IncrementalMdl.Beta(1); zeros(nchunk,1)];
Xil = X(:,idxil);
Yil = Y(idxil);
Wil = W(idxil);

% Incremental fitting
for j = 1:nchunk
    ibegin = min(nil,numObsPerChunk*(j-1) + 1);
    iend = min(nil,numObsPerChunk*j);
    idx = ibegin:iend;
    IncrementalMdl = updateMetrics(IncrementalMdl,Xil(:,idx),Yil(idx), ...
        'ObservationsIn','columns','Weights',Wil(idx));
    ce{j,:} = IncrementalMdl.Metrics{"ClassificationError",:};
    IncrementalMdl = fit(IncrementalMdl,Xil(:,idx),Yil(idx),'ObservationsIn','columns', ...
        'Weights',Wil(idx));
    beta1(j + 1) = IncrementalMdl.Beta(1);
end
IncrementalMdl is an incrementalClassificationLinear model object trained on all the data in the stream. Alternatively, you can use updateMetricsAndFit to update performance metrics of the model given a new chunk of data, and then fit the model to the data. Plot a trace plot of the performance metrics and estimated coefficient β1. t = tiledlayout(2,1); nexttile h = plot(ce.Variables); xlim([0 nchunk]) ylabel('Classification Error') legend(h,ce.Properties.VariableNames) nexttile plot(beta1) ylabel('\beta_1') xlim([0 nchunk]) xlabel(t,'Iteration')
The cumulative loss is stable and gradually decreases, whereas the window loss jumps.
β1 changes gradually, then levels off, as fit processes more chunks.
Perform Conditional Training Incrementally train a linear regression model only when its performance degrades. Load and shuffle the 2015 NYC housing data set. For more details on the data, see NYC Open Data. load NYCHousing2015 rng(1) % For reproducibility n = size(NYCHousing2015,1); shuffidx = randsample(n,n); NYCHousing2015 = NYCHousing2015(shuffidx,:);
Extract the response variable SALEPRICE from the table. For numerical stability, scale SALEPRICE by 1e6. Y = NYCHousing2015.SALEPRICE/1e6; NYCHousing2015.SALEPRICE = [];
Create dummy variable matrices from the categorical predictors. catvars = ["BOROUGH" "BUILDINGCLASSCATEGORY" "NEIGHBORHOOD"]; dumvarstbl = varfun(@(x)dummyvar(categorical(x)),NYCHousing2015, ... 'InputVariables',catvars); dumvarmat = table2array(dumvarstbl); NYCHousing2015(:,catvars) = [];
Treat all other numeric variables in the table as linear predictors of sales price. Concatenate the matrix of dummy variables to the rest of the predictor data. idxnum = varfun(@isnumeric,NYCHousing2015,'OutputFormat','uniform'); X = [dumvarmat NYCHousing2015{:,idxnum}];
Configure a linear regression model for incremental learning so that it does not have an estimation or metrics warm-up period. Specify a metrics window size of 1000. Fit the configured model to the first 100 observations. Mdl = incrementalRegressionLinear('EstimationPeriod',0, ... 'MetricsWarmupPeriod',0,'MetricsWindowSize',1000); numObsPerChunk = 100; Mdl = fit(Mdl,X(1:numObsPerChunk,:),Y(1:numObsPerChunk));
Mdl is an incrementalRegressionLinear model object. Perform incremental learning, with conditional fitting, by following this procedure for each iteration: • Simulate a data stream by processing a chunk of 100 observations at a time. • Update the model performance by computing the epsilon insensitive loss, within a 200 observation window. • Fit the model to the chunk of data only when the loss more than doubles from the minimum loss experienced. • When tracking performance and fitting, overwrite the previous incremental model. • Store the epsilon insensitive loss and β313 to see how the loss and coefficient evolve during training. • Track when fit trains the model. 35-2063
35
Functions
% Preallocation n = numel(Y) - numObsPerChunk; nchunk = floor(n/numObsPerChunk); beta313 = zeros(nchunk,1); ei = array2table(nan(nchunk,2),'VariableNames',["Cumulative" "Window"]); trained = false(nchunk,1); % Incremental fitting for j = 2:nchunk ibegin = min(n,numObsPerChunk*(j-1) + 1); iend = min(n,numObsPerChunk*j); idx = ibegin:iend; Mdl = updateMetrics(Mdl,X(idx,:),Y(idx)); ei{j,:} = Mdl.Metrics{"EpsilonInsensitiveLoss",:}; minei = min(ei{:,2}); pdiffloss = (ei{j,2} - minei)/minei*100; if pdiffloss > 100 Mdl = fit(Mdl,X(idx,:),Y(idx)); trained(j) = true; end beta313(j) = Mdl.Beta(end); end
Mdl is an incrementalRegressionLinear model object trained on all the data in the stream. To see how the model performance and β313 evolve during training, plot them on separate tiles. t = tiledlayout(2,1); nexttile plot(beta313) hold on plot(find(trained),beta313(trained),'r.') xlim([0 nchunk]) ylabel('\beta_{313}') xline(Mdl.EstimationPeriod/numObsPerChunk,'r-.') legend('\beta_{313}','Training occurs','Location','southeast') hold off nexttile plot(ei.Variables) xlim([0 nchunk]) ylabel('Epsilon Insensitive Loss') xline(Mdl.EstimationPeriod/numObsPerChunk,'r-.') legend(ei.Properties.VariableNames) xlabel(t,'Iteration')
The trace plot of β313 shows periods of constant values, during which the loss did not double from the minimum experienced.
Input Arguments

Mdl — Incremental learning model
incrementalClassificationLinear model object | incrementalRegressionLinear model object
Incremental learning model to fit to streaming data, specified as an incrementalClassificationLinear or incrementalRegressionLinear model object. You can create Mdl directly or by converting a supported, traditionally trained machine learning model using the incrementalLearner function. For more details, see the corresponding reference page.

X — Chunk of predictor data
floating-point matrix
Chunk of predictor data to which the model is fit, specified as a floating-point matrix of n observations and Mdl.NumPredictors predictor variables. The value of the ObservationsIn name-value argument determines the orientation of the variables and observations. The default ObservationsIn value is "rows", which indicates that observations in the predictor data are oriented along the rows of X.
The length of the observation labels Y and the number of observations in X must be equal; Y(j) is the label of observation j (row or column) in X.
Note • If Mdl.NumPredictors = 0, fit infers the number of predictors from X, and sets the corresponding property of the output model. Otherwise, if the number of predictor variables in the streaming data changes from Mdl.NumPredictors, fit issues an error. • fit supports only floating-point input predictor data. If your input data includes categorical data, you must prepare an encoded version of the categorical data. Use dummyvar to convert each categorical variable to a numeric matrix of dummy variables. Then, concatenate all dummy variable matrices and any other numeric predictors. For more details, see “Dummy Variables” on page 2-13.
Data Types: single | double Y — Chunk of responses (labels) categorical array | character array | string array | logical vector | floating-point vector | cell array of character vectors Chunk of responses (labels) to which the model is fit, specified as a categorical, character, or string array, logical or floating-point vector, or cell array of character vectors for classification problems; or a floating-point vector for regression problems. The length of the observation labels Y and the number of observations in X must be equal; Y(j) is the label of observation j (row or column) in X. For classification problems: • fit supports binary classification only. • When the ClassNames property of the input model Mdl is nonempty, the following conditions apply: • If Y contains a label that is not a member of Mdl.ClassNames, fit issues an error. • The data type of Y and Mdl.ClassNames must be the same. Data Types: char | string | cell | categorical | logical | single | double Note • If an observation (predictor or label) or weight contains at least one missing (NaN) value, fit ignores the observation. Consequently, fit uses fewer than n observations to create an updated model, where n is the number of observations in X. • The chunk size n and the stochastic gradient descent (SGD) hyperparameter mini-batch size (Mdl.BatchSize) can be different values, and n does not have to be an exact multiple of the minibatch size. If n < Mdl.BatchSize, fit uses the n available observations when it applies SGD. If n > Mdl.BatchSize, the function updates the model with a mini-batch of the specified size multiple times, and then uses the rest of the observations for the last mini-batch. The number of observations for the last mini-batch can be smaller than Mdl.BatchSize.
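To make the mini-batch behavior concrete, here is a schematic of how a chunk larger than the mini-batch size is consumed. The sizes are hypothetical, and the sketch only counts observations; it does not reproduce the solver's internal update.

n = 130;  batchSize = 50;                      % hypothetical chunk and mini-batch sizes
numFullBatches = floor(n/batchSize);           % 2 full mini-batches of 50 observations
lastBatchSize = n - numFullBatches*batchSize   % final mini-batch of 30 observations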
Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: 'ObservationsIn','columns','Weights',W specifies that the columns of the predictor matrix correspond to observations, and the vector W contains observation weights to apply during incremental learning. ObservationsIn — Predictor data observation dimension 'rows' (default) | 'columns' Predictor data observation dimension, specified as the comma-separated pair consisting of 'ObservationsIn' and 'columns' or 'rows'. Data Types: char | string Weights — Chunk of observation weights floating-point vector of positive values Chunk of observation weights, specified as the comma-separated pair consisting of 'Weights' and a floating-point vector of positive values. fit weighs the observations in X with the corresponding values in Weights. The size of Weights must equal n, which is the number of observations in X. By default, Weights is ones(n,1). For more details, including normalization schemes, see “Observation Weights” on page 35-2068. Data Types: double | single
Output Arguments

Mdl — Updated incremental learning model
incrementalClassificationLinear model object | incrementalRegressionLinear model object
Updated incremental learning model, returned as an incremental learning model object of the same data type as the input model Mdl, either incrementalClassificationLinear or incrementalRegressionLinear.
If Mdl.EstimationPeriod > 0, the incremental fitting functions updateMetricsAndFit and fit estimate hyperparameters using the first Mdl.EstimationPeriod observations passed to either function; they do not train the input model to that data. However, if an incoming chunk of n observations is greater than or equal to the number of observations remaining in the estimation period m, fit estimates hyperparameters using the first n – m observations, and fits the input model to the remaining m observations. Consequently, the software updates the Beta and Bias properties, hyperparameter properties, and recordkeeping properties such as NumTrainingObservations.
For classification problems, if the ClassNames property of the input model Mdl is an empty array, fit sets the ClassNames property of the output model Mdl to unique(Y).
Tips • Unlike traditional training, incremental learning might not have a separate test (holdout) set. Therefore, to treat each incoming chunk of data as a test set, pass the incremental model and each incoming chunk to updateMetrics before training the model on the same data.
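In code, this tip amounts to calling the two functions in this order on each chunk; the variable names here are placeholders for your own stream.

IncrementalMdl = updateMetrics(IncrementalMdl,Xchunk,Ychunk);  % score the chunk as "new" data first
IncrementalMdl = fit(IncrementalMdl,Xchunk,Ychunk);            % then train on the same chunk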
Algorithms Observation Weights For classification problems, if the prior class probability distribution is known (in other words, the prior distribution is not empirical), fit normalizes observation weights to sum to the prior class probabilities in the respective classes. This action implies that observation weights are the respective prior class probabilities by default. For regression problems or if the prior class probability distribution is empirical, the software normalizes the specified observation weights to sum to 1 each time you call fit.
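A minimal sketch of this normalization for a binary problem with known (nonempirical) prior class probabilities follows. The numbers are hypothetical, and the sketch only illustrates the normalization described above; it is not the toolbox's internal code.

prior = [0.4 0.6];       % hypothetical prior probabilities for classes 0 and 1
y = [0 0 0 1 1 1 1]';    % hypothetical labels
w = ones(numel(y),1);    % hypothetical observation weights
for k = 0:1
    idx = (y == k);
    w(idx) = w(idx)/sum(w(idx))*prior(k+1);   % weights in class k now sum to prior(k+1)
end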
Version History Introduced in R2020b
Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: • Use saveLearnerForCoder, loadLearnerForCoder, and codegen to generate code for the fit function. Save a trained model by using saveLearnerForCoder. Define an entry-point function that loads the saved model by using loadLearnerForCoder and calls the fit function. Then use codegen to generate code for the entry-point function. • To generate single-precision C/C++ code for fit, specify the name-value argument "DataType","single" when you call the loadLearnerForCoder function. • This table contains notes about the arguments of fit. Arguments not included in this table are fully supported. Argument
Notes and Limitations
Mdl
For usage notes and limitations of the model object, see incrementalClassificationLinear or incrementalRegressionLinear.
X
• Batch-to-batch, the number of observations can be a variable size, but must equal the number of observations in Y. • The number of predictor variables must equal to Mdl.NumPredictors. • X must be single or double.
Y
• Batch-to-batch, the number of observations can be a variable size, but must equal the number of observations in X. • For classification problems, all labels in Y must be represented in Mdl.ClassNames. • Y and Mdl.ClassNames must have the same data type.
• The following restrictions apply: • If you configure Mdl to shuffle data (Mdl.Shuffle is true, or Mdl.Solver is 'sgd' or 'asgd'), the fit function randomly shuffles each incoming batch of observations before it fits the model to the batch. The order of the shuffled observations might not match the order generated by MATLAB. Therefore, the fitted coefficients computed in MATLAB and the generated code might not be equal. • Use a homogeneous data type for all floating-point input arguments and object properties, specifically, either single or double. For more information, see “Introduction to Code Generation” on page 34-3.
See Also Objects incrementalClassificationLinear | incrementalRegressionLinear Functions predict | updateMetricsAndFit | updateMetrics Topics “Incremental Learning Overview” on page 28-2 “Implement Incremental Learning for Classification Using Flexible Workflow” on page 28-37
fit Train naive Bayes classification model for incremental learning
Syntax Mdl = fit(Mdl,X,Y) Mdl = fit(Mdl,X,Y,'Weights',Weights)
Description The fit function fits a configured naive Bayes classification model for incremental learning (incrementalClassificationNaiveBayes object) to streaming data. To additionally track performance metrics using the data as it arrives, use updateMetricsAndFit instead. To fit or cross-validate a naive Bayes classification model to an entire batch of data at once, see fitcnb. Mdl = fit(Mdl,X,Y) returns a naive Bayes classification model for incremental learning Mdl, which represents the input naive Bayes classification model for incremental learning Mdl trained using the predictor and response data, X and Y respectively. Specifically, fit updates the conditional posterior distribution of the predictor variables given the data. Mdl = fit(Mdl,X,Y,'Weights',Weights) also sets observation weights Weights.
Examples

Incrementally Train Model with Little Prior Information

Fit an incremental naive Bayes learner when you know only the expected maximum number of classes in the data.

Create an incremental naive Bayes model. Specify that the maximum number of expected classes is 5.

Mdl = incrementalClassificationNaiveBayes('MaxNumClasses',5)

Mdl = 
  incrementalClassificationNaiveBayes
                     IsWarm: 0
                    Metrics: [1x2 table]
                 ClassNames: [1x0 double]
             ScoreTransform: 'none'
          DistributionNames: 'normal'
     DistributionParameters: {}
Mdl is an incrementalClassificationNaiveBayes model. All its properties are read-only. Mdl can process at most 5 unique classes. By default, the prior class distribution Mdl.Prior is empirical, which means the software updates the prior distribution as it encounters labels.
Mdl must be fit to data before you can use it to perform any other operations. Load the human activity data set. Randomly shuffle the data. load humanactivity n = numel(actid); rng(1) % For reproducibility idx = randsample(n,n); X = feat(idx,:); Y = actid(idx);
For details on the data set, enter Description at the command line. Fit the incremental model to the training data, in chunks of 50 observations at a time, by using the fit function. At each iteration: • Simulate a data stream by processing 50 observations. • Overwrite the previous incremental model with a new one fitted to the incoming observations. • Store the mean of the first predictor in the first class μ11 and the prior probability that the subject is moving (Y > 2) to see how these parameters evolve during incremental learning. % Preallocation numObsPerChunk = 50; nchunk = floor(n/numObsPerChunk); mu11 = zeros(nchunk,1); priormoved = zeros(nchunk,1); % Incremental fitting for j = 1:nchunk ibegin = min(n,numObsPerChunk*(j-1) + 1); iend = min(n,numObsPerChunk*j); idx = ibegin:iend; Mdl = fit(Mdl,X(idx,:),Y(idx)); mu11(j) = Mdl.DistributionParameters{1,1}(1); priormoved(j) = sum(Mdl.Prior(Mdl.ClassNames > 2)); end
Mdl is an incrementalClassificationNaiveBayes model object trained on all the data in the stream. To see how the parameters evolve during incremental learning, plot them on separate tiles. t = tiledlayout(2,1); nexttile plot(mu11) ylabel('\mu_{11}') xlabel('Iteration') axis tight nexttile plot(priormoved) ylabel('\pi(Subject Is Moving)') xlabel(t,'Iteration') axis tight
fit updates the posterior mean of the predictor distribution as it processes each chunk. Because the prior class distribution is empirical, π(subject is moving) changes as fit processes each chunk.
Specify All Class Names Before Fitting

Fit an incremental naive Bayes learner when you know all the class names in the data.

Consider training a device to predict whether a subject is sitting, standing, walking, running, or dancing based on biometric data measured on the subject. The class names map 1 through 5 to an activity. Also, suppose that the researchers plan to expose the device to each class uniformly.

Create an incremental naive Bayes model for multiclass learning. Specify the class names and the uniform prior class distribution.

classnames = 1:5;
Mdl = incrementalClassificationNaiveBayes('ClassNames',classnames,'Prior','uniform')

Mdl = 
  incrementalClassificationNaiveBayes
                     IsWarm: 0
                    Metrics: [1x2 table]
                 ClassNames: [1 2 3 4 5]
             ScoreTransform: 'none'
          DistributionNames: 'normal'
     DistributionParameters: {5x0 cell}
Mdl is an incrementalClassificationNaiveBayes model object. All its properties are read-only. During training, observed labels must be in Mdl.ClassNames. Mdl must be fit to data before you can use it to perform any other operations. Load the human activity data set. Randomly shuffle the data. load humanactivity n = numel(actid); rng(1); % For reproducibility idx = randsample(n,n); X = feat(idx,:); Y = actid(idx);
For details on the data set, enter Description at the command line. Fit the incremental model to the training data by using the fit function. Simulate a data stream by processing chunks of 50 observations at a time. At each iteration: • Process 50 observations. • Overwrite the previous incremental model with a new one fitted to the incoming observations. • Store the mean of the first predictor in the first class μ11 and the prior probability that the subject is moving (Y > 2) to see how these parameters evolve during incremental learning. % Preallocation numObsPerChunk = 50; nchunk = floor(n/numObsPerChunk); mu11 = zeros(nchunk,1); priormoved = zeros(nchunk,1); % Incremental fitting for j = 1:nchunk ibegin = min(n,numObsPerChunk*(j-1) + 1); iend = min(n,numObsPerChunk*j); idx = ibegin:iend; Mdl = fit(Mdl,X(idx,:),Y(idx)); mu11(j) = Mdl.DistributionParameters{1,1}(1); priormoved(j) = sum(Mdl.Prior(Mdl.ClassNames > 2)); end
Mdl is an incrementalClassificationNaiveBayes model object trained on all the data in the stream. To see how the parameters evolve during incremental learning, plot them on separate tiles. t = tiledlayout(2,1); nexttile plot(mu11) ylabel('\mu_{11}') xlabel('Iteration') axis tight nexttile
plot(priormoved) ylabel('\pi(Subject Is Moving)') xlabel(t,'Iteration') axis tight
fit updates the posterior mean of the predictor distribution as it processes each chunk. Because the prior class distribution is specified as uniform, π(subject is moving) = 0.6 and does not change as fit processes each chunk.
Specify Observation Weights Train a naive Bayes classification model by using fitcnb, convert it to an incremental learner, track its performance on streaming data, and then fit the model to the data. Specify observation weights. Load and Preprocess Data Load the human activity data set. Randomly shuffle the data. load humanactivity rng(1); % For reproducibility n = numel(actid); idx = randsample(n,n); X = feat(idx,:); Y = actid(idx);
For details on the data set, enter Description at the command line.

Value                Description
"symmetric"          2x - 1
"symmetricismax"     Sets the score for the class with the largest score to 1, and sets the scores for all other classes to -1
"symmetriclogit"     2/(1 + e^-x) - 1
For a MATLAB function or a function you define, use its function handle for the score transform. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores). Example: "ScoreTransform","logit" Data Types: char | string | function_handle Weights — Observation weights positive numeric vector | name of variable in Tbl Observation weights, specified as a positive numeric vector or the name of a variable in Tbl. The software weights each observation in X or Tbl with the corresponding value in Weights. The length of Weights must equal the number of rows in X or Tbl. If you specify the input data as a table Tbl, then Weights can be the name of a variable in Tbl that contains a numeric vector. In this case, you must specify Weights as a character vector or string scalar. For example, if the weights vector W is stored as Tbl.W, then specify it as "W". Otherwise, the software treats all columns of Tbl, including W, as predictors or the response variable when training the model. By default, Weights is ones(n,1), where n is the number of observations in X or Tbl. The software normalizes Weights to sum to the value of the prior probability in the respective class. Data Types: single | double | char | string
Output Arguments Mdl — Trained classification model classification model object Trained classification model, returned as one of the classification model objects in this table. Learner Name
Returned Model Object
"discr"
CompactClassificationDiscriminant
"ensemble"
CompactClassificationEnsemble
"kernel"
• ClassificationKernel for binary classification • CompactClassificationECOC for multiclass classification
"knn"
ClassificationKNN
"linear"
• ClassificationLinear for binary classification • CompactClassificationECOC for multiclass classification
"nb"
CompactClassificationNaiveBayes
"net"
CompactClassificationNeuralNetwork
"svm"
• CompactClassificationSVM for binary classification • CompactClassificationECOC for multiclass classification
"tree"
CompactClassificationTree
OptimizationResults — Optimization results BayesianOptimization object | table Optimization results, returned as a BayesianOptimization object if you use Bayesian optimization or a table if you use ASHA optimization. For more information, see “Bayesian Optimization” on page 35-2172 and “ASHA Optimization” on page 35-2172.
More About Verbose Display When you set the Verbose field of the HyperparameterOptimizationOptions name-value argument to 1 or 2, the fitcauto function provides an iterative display of the optimization results. The following table describes the columns in the display and their entries. Column Name
Description
Iter
Iteration number — You can set a limit to the number of iterations by using the MaxObjectiveEvaluations field of the HyperparameterOptimizationOptions name-value argument.
Active workers
Number of active parallel workers — This column appears only when you run the optimization in parallel by setting the UseParallel field of the HyperparameterOptimizationOptions name-value argument to true.
Eval result
One of the following evaluation results: • Best — The learner and hyperparameter values at this iteration give the minimum observed validation loss computed so far. That is, the Validation loss value is the smallest computed so far. • Accept — The learner and hyperparameter values at this iteration give meaningful (for example, non-NaN) validation loss values. • Error — The learner and hyperparameter values at this iteration result in an error (for example, a Validation loss value of NaN).
Validation loss
Validation loss computed for the learner and hyperparameter values at this iteration — In particular, fitcauto computes the crossvalidation classification error by default. If you specify misclassification costs by using the Cost name-value argument, fitcauto computes the mean misclassification cost instead. For more information, see “Mean Misclassification Cost” on page 35-2173. You can change the validation scheme by using the CVPartition, Holdout, or Kfold field of the HyperparameterOptimizationOptions name-value argument.
Time for training & validation (sec)
Time taken to train and compute the validation loss for the model with the learner and hyperparameter values at this iteration (in seconds) — When you use Bayesian optimization, this value excludes the time required to update the objective function model maintained by the Bayesian optimization process. For more details, see “Bayesian Optimization” on page 35-2172.
Observed min validation loss
Observed minimum validation loss computed so far — This value corresponds to the smallest Validation loss value computed so far in the optimization process. By default, fitcauto returns a plot of the optimization that displays dark blue points for the observed minimum validation loss values. This plot does not appear when the ShowPlots field of the HyperparameterOptimizationOptions name-value argument is set to false.
Estimated min validation loss
Estimated minimum validation loss — When you use Bayesian optimization, fitcauto updates, at each iteration, an objective function model maintained by the Bayesian optimization process, and uses this model to estimate the minimum validation loss. For more details, see “Bayesian Optimization” on page 35-2172. By default, fitcauto returns a plot of the optimization that displays light blue points for the estimated minimum validation loss values. This plot does not appear when the ShowPlots field of the HyperparameterOptimizationOptions name-value argument is set to false. Note This column appears only when you use Bayesian optimization, that is, when the Optimizer field of the HyperparameterOptimizationOptions name-value argument is set to "bayesopt".
Training set size
Number of observations used in each training set at this iteration — Use the MaxTrainingSetSize and MinTrainingSetSize fields of the HyperparameterOptimizationOptions name-value argument to specify bounds for the training set size. For more details, see “ASHA Optimization” on page 35-2172. Note This column appears only when you use ASHA optimization, that is, when the Optimizer field of the HyperparameterOptimizationOptions name-value argument is set to "asha".
Learner
Model type evaluated at this iteration — Specify the learners used in the optimization by using the Learners name-value argument.
Hyperparameter: Value
Hyperparameter values at this iteration — Specify the hyperparameters used in the optimization by using the OptimizeHyperparameters name-value argument.
The display also includes these model descriptions: • Best observed learner — This model, with the listed learner type and hyperparameter values, yields the final observed minimum validation loss. When you use ASHA optimization, fitcauto retrains the model on the entire training data set and returns it as the Mdl output.
• Best estimated learner — This model, with the listed learner type and hyperparameter values, yields the final estimated minimum validation loss when you use Bayesian optimization. In this case, fitcauto retrains the model on the entire training data set and returns it as the Mdl output. Note The Best estimated learner model appears only when you use Bayesian optimization, that is, when the Optimizer field of the HyperparameterOptimizationOptions name-value argument is set to "bayesopt".
Tips

• Depending on the size of your data set, the number of learners you specify, and the optimization method you choose, fitcauto can take some time to run.
• If you have a Parallel Computing Toolbox license, you can speed up computations by running the optimization in parallel. To do so, specify "HyperparameterOptimizationOptions",struct("UseParallel",true). You can include additional fields in the structure to control other aspects of the optimization. See HyperparameterOptimizationOptions.
• If fitcauto with Bayesian optimization takes a long time to run because of the number of observations in your training set (for example, over 10,000), consider using fitcauto with ASHA optimization instead. ASHA optimization often finds good solutions faster than Bayesian optimization for data sets with many observations. To use ASHA optimization, specify "HyperparameterOptimizationOptions",struct("Optimizer","asha"). You can include additional fields in the structure to control other aspects of the optimization. In particular, if you have a time constraint, specify the MaxTime field of the HyperparameterOptimizationOptions structure to limit the number of seconds fitcauto runs. A sketch that combines these options appears after this list.
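For example, a call that follows the last tip might look like the following sketch; the predictor matrix X, the class label vector Y, and the one-hour limit are assumptions used only to illustrate the option fields named above.

% Assumed training data: predictor matrix X and class label vector Y with many observations.
opts = struct("Optimizer","asha","MaxTime",3600);   % use ASHA and stop after one hour
[Mdl,OptimizationResults] = fitcauto(X,Y,"HyperparameterOptimizationOptions",opts);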
Algorithms

Automatic Selection of Learners

When you specify "Learners","auto", the fitcauto function analyzes the predictor and response data in order to choose appropriate learners. The function considers whether the data set has any of these characteristics:
• Categorical predictors
• Missing values for more than 5% of the data
• Imbalanced data, where the ratio of the number of observations in the largest class to the number of observations in the smallest class is greater than 5
• More than 100 observations in the smallest class
• Wide data, where the number of predictors is greater than or equal to the number of observations
• High-dimensional data, where the number of predictors is greater than 100
• Large data, where the number of observations is greater than 50,000
• Binary response variable
• Ordinal response variable
The selected learners are always a subset of those listed in the Learners table. However, the associated models tried during the optimization process can have different default values for hyperparameters not being optimized, as well as different search ranges for hyperparameters being optimized.

Bayesian Optimization

The goal of Bayesian optimization, and optimization in general, is to find a point that minimizes an objective function. In the context of fitcauto, a point is a learner type together with a set of hyperparameter values for the learner (see Learners and OptimizeHyperparameters), and the objective function is the cross-validation classification error, by default. The Bayesian optimization implemented in fitcauto internally maintains a multi-TreeBagger model of the objective function. That is, the objective function model splits along the learner type and, for a given learner, the model is a TreeBagger ensemble for regression. (This underlying model differs from the Gaussian process model employed by other Statistics and Machine Learning Toolbox functions that use Bayesian optimization.) Bayesian optimization trains the underlying model by using objective function evaluations, and determines the next point to evaluate by using an acquisition function ("expected-improvement"). For more information, see “Expected Improvement” on page 10-4. The acquisition function balances between sampling at points with low modeled objective function values and exploring areas that are not well modeled yet. At the end of the optimization, fitcauto chooses the point with the minimum objective function model value, among the points evaluated during the optimization. For more information, see the "Criterion","min-visited-mean" name-value argument of bestPoint.

ASHA Optimization

The asynchronous successive halving algorithm (ASHA) in fitcauto randomly chooses several models with different hyperparameter values (see Learners and OptimizeHyperparameters) and trains them on a small subset of the training data. If the performance of a particular model is promising, the model is promoted and trained on a larger amount of the training data. This process repeats, and successful models are trained on progressively larger amounts of data. By default, at the end of the optimization, fitcauto chooses the model that has the lowest cross-validation classification error.

At each iteration, ASHA either chooses a previously trained model and promotes it (that is, retrains the model using more training data), or selects a new model (learner type and hyperparameter values) using random search. ASHA promotes models as follows:
• The algorithm searches for the group of models with the largest training set size for which this condition does not hold: floor(g/4) of the models have been promoted, where g is the number of models in the group.
• Among the group of models, ASHA chooses the model with the lowest cross-validation classification error and retrains that model with 4*(Training Set Size) observations.
• If no such group of models exists, then ASHA selects a new model instead of promoting an old one, and trains the new model using the smallest training set size.

When a model is trained on a subset of the training data, ASHA computes the cross-validation classification error as follows:
• For each training fold, the algorithm selects a random sample of the observations (of size Training set size) using stratified sampling, and then trains a model on that subset of data.
• The algorithm then tests the fitted model on the test fold (that is, the observations not in the training fold) and computes the classification error.
• Finally, the algorithm averages the results across all folds.

For more information on ASHA, see [1].

Number of ASHA Iterations

When you use ASHA optimization, the default number of iterations depends on the number of observations in the data, the number of learner types, the use of parallel processing, and the type of cross-validation. The algorithm selects the number of iterations such that, for L learner types (see Learners), fitcauto trains L models on the largest training set size. This table describes the default number of iterations based on the given specifications when you use 5-fold cross-validation. Note that n represents the number of observations and L represents the number of learner types.

Number of Observations (n) | Default Number of Iterations (run in serial) | Default Number of Iterations (run in parallel)
n < 500                    | 30*L    | 30*L
500 ≤ n < 2000             | 5*L     | 5*(L + 1)
2000 ≤ n < 8000            | 21*L    | 21*(L + 1)
8000 ≤ n < 32,000          | 85*L    | 85*(L + 1)
32,000 ≤ n                 | 341*L   | 341*(L + 1)

When n < 500, n is too small to implement ASHA optimization, and fitcauto implements random search to find and assess models instead.
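As a quick way to see what this table implies for a particular data set, the following sketch restates the rule as a small helper function. The function name (defaultASHAIterations) and its inputs are illustrative assumptions, not part of the toolbox interface.

function numIter = defaultASHAIterations(n,L,runInParallel)
% Default number of ASHA iterations for n observations and L learner types,
% following the table above (5-fold cross-validation case).
if runInParallel
    multiplier = L + 1;
else
    multiplier = L;
end
if n < 500
    numIter = 30*L;          % random search is used instead of ASHA
elseif n < 2000
    numIter = 5*multiplier;
elseif n < 8000
    numIter = 21*multiplier;
elseif n < 32000
    numIter = 85*multiplier;
else
    numIter = 341*multiplier;
end
end

For example, defaultASHAIterations(10000,5,false) returns 425, that is, 85*L for five learner types run in serial.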
Mean Misclassification Cost

If you specify the Cost name-value argument, then fitcauto minimizes the mean misclassification cost rather than the misclassification error as part of the optimization process. The mean misclassification cost is defined as

L = \frac{1}{n} \sum_{j=1}^{n} C\bigl(k_j, \hat{k}_j\bigr) \cdot I\bigl(y_j \neq \hat{y}_j\bigr)

where
• C is the misclassification cost matrix as specified by the Cost name-value argument, and I is the indicator function.
• y_j is the true class label for observation j, and y_j belongs to class k_j.
• \hat{y}_j is the class label with the maximal predicted score for observation j, and \hat{y}_j belongs to class \hat{k}_j.
• n is the number of observations in the validation set.
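The following sketch shows one way to compute this quantity directly in MATLAB for a small validation set. The variable names (C, yTrue, yPred, classNames) and the toy data are assumptions chosen for illustration and are not part of the fitcauto interface.

% Mean misclassification cost for a set of validation observations.
classNames = ["setosa" "versicolor" "virginica"];        % class order matching the rows/columns of C
C = [0 1 5; 1 0 1; 5 1 0];                                % example misclassification cost matrix
yTrue = categorical(["setosa" "virginica" "versicolor" "virginica"],classNames)';
yPred = categorical(["setosa" "versicolor" "versicolor" "virginica"],classNames)';

trueIdx = double(yTrue);                                  % k_j as indices into classNames
predIdx = double(yPred);                                  % predicted class indices
perObsCost = C(sub2ind(size(C),trueIdx,predIdx)) .* (yTrue ~= yPred);
meanCost = mean(perObsCost)                               % average cost over the validation set

For this toy data, only the second observation is misclassified (virginica predicted as versicolor, cost 1), so meanCost is 0.25.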
Alternative Functionality

• If you are unsure which models work best for your data set, you can alternatively use the Classification Learner app. Using the app, you can perform hyperparameter tuning for different models, and choose the optimized model that performs best. Although you must select a specific model before you can tune the model hyperparameters, Classification Learner provides greater flexibility for selecting optimizable hyperparameters and setting hyperparameter values. However, you cannot optimize in parallel, specify observation weights, specify prior probabilities, or use ASHA optimization in the app. For more information, see “Hyperparameter Optimization in Classification Learner App” on page 23-56.
• If you know which models might suit your data, you can alternatively use the corresponding model fit functions and specify the OptimizeHyperparameters name-value argument to tune hyperparameters. You can compare the results across the models to select the best classifier (see the sketch after this list). For an example of this process, see “Moving Towards Automating Model Selection Using Bayesian Optimization” on page 19-177.
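As a minimal illustration of the second alternative, the following sketch tunes a single model type with its own fit function; the data set (fisheriris) and the suppressed display settings are assumptions chosen for brevity, not part of this reference page.

load fisheriris
rng("default")   % for reproducibility
% Tune a decision tree directly, instead of letting fitcauto choose the learner.
TreeMdl = fitctree(meas,species,"OptimizeHyperparameters","auto", ...
    "HyperparameterOptimizationOptions",struct("ShowPlots",false,"Verbose",0));
treeLoss = kfoldLoss(crossval(TreeMdl))   % compare this loss against other tuned model types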
Version History

Introduced in R2020a

R2023b: "auto" option of OptimizeHyperparameters includes Standardize for kernel, k-nearest neighbor (KNN), naive Bayes, and support vector machine (SVM) classifiers
Behavior changed in R2023b
Starting in R2023b, when the Learners value includes kernel ("kernel"), k-nearest neighbor ("knn"), naive Bayes ("nb"), or support vector machine ("svm") classifiers, the fitcauto function optimizes the Standardize hyperparameter of the models by default. That is, if the OptimizeHyperparameters value is "auto", then Standardize is an optimizable hyperparameter of kernel, KNN, naive Bayes, and SVM models.

R2023b: Width hyperparameter search range does not depend on predictor data during optimization of naive Bayes models
Behavior changed in R2023b
Starting in R2023b, fitcauto optimizes the kernel smoothing window width of naive Bayes models by using the default search range [1e-3,1e3]. That is, when you specify to optimize the naive Bayes hyperparameter Width by using the OptimizeHyperparameters name-value argument, the function searches among positive values log-scaled in the range [1e-3,1e3]. In previous releases, the default search range for the Width hyperparameter was [MinPredictorDiff/4,max(MaxPredictorRange,MinPredictorDiff)], where MinPredictorDiff and MaxPredictorRange were determined as follows:

diffs = diff(sort(X));
MinPredictorDiff = min(diffs(diffs ~= 0),[],"omitnan");
MaxPredictorRange = max(max(X) - min(X));
R2023a: Neural network classifiers support misclassification costs and prior probabilities
Behavior changed in R2023a
Starting in R2023a, fitcauto supports misclassification costs and prior probabilities for neural network classifiers. That is, you can specify the Cost and Prior name-value arguments when the Learners name-value argument includes "net" models. In previous releases, when you specified nondefault misclassification costs or prior probabilities, fitcauto omitted neural network models from the model selection process.
R2022a: Learners include neural network models
Behavior changed in R2022a
Starting in R2022a, the list of available learners includes neural network models. When you specify "all" or "all-nonlinear" for the Learners name-value argument, fitcauto includes neural network models as part of the model selection and hyperparameter tuning process. The function also considers neural network models when you specify Learners as "auto", depending on the characteristics of your data set. To omit neural network models from the model selection process, you can explicitly specify the models you want to include. For example, to use tree and ensemble models only, specify "Learners",["tree","ensemble"].

R2022a: Automatic selection of learners includes linear models when data is wide after categorical expansion
Behavior changed in R2022a
Starting in R2022a, if you specify Learners as "auto" and the data has more predictors than observations after the expansion of the categorical predictors (see “Automatic Creation of Dummy Variables” on page 2-14), then fitcauto includes linear learners ("linear") along with other models during the hyperparameter optimization. In previous releases, linear learners were not considered.

R2022a: Regularization method determines the linear learner solver used during the optimization process for multiclass classification
Behavior changed in R2022a
Starting in R2022a, when you specify to try a linear learner ("linear") for multiclass classification, fitcauto uses either a Limited-memory BFGS (LBFGS) solver or a Sparse Reconstruction by Separable Approximation (SpaRSA) solver, depending on the regularization type selected during that iteration of the optimization process.
• When Regularization is 'ridge', the function sets the Solver value to 'lbfgs' by default.
• When Regularization is 'lasso', the function sets the Solver value to 'sparsa' by default.
In previous releases, the default solver selection during the optimization process depended on various factors, including the regularization type, learner type, and number of predictors. For more information, see Solver.

R2021a: Regularization method determines the linear learner solver used during the optimization process for binary classification
Behavior changed in R2021a
Starting in R2021a, when you specify to try a linear learner ("linear") for binary classification, fitcauto uses either a Limited-memory BFGS (LBFGS) solver or a Sparse Reconstruction by Separable Approximation (SpaRSA) solver, depending on the regularization type selected during that iteration of the optimization process.
• When Regularization is 'ridge', the function sets the Solver value to 'lbfgs' by default.
• When Regularization is 'lasso', the function sets the Solver value to 'sparsa' by default.
In previous releases, the default solver selection during the optimization process depended on various factors, including the regularization type, learner type, and number of predictors. For more information, see Solver.
References [1] Li, Liam, Kevin Jamieson, Afshin Rostamizadeh, Ekaterina Gonina, Moritz Hardt, Benjamin Recht, and Ameet Talwalkar. “A System for Massively Parallel Hyperparameter Tuning.” ArXiv:1810.05934v5 [Cs], March 16, 2020. https://arxiv.org/abs/1810.05934v5.
Extended Capabilities Automatic Parallel Support Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™. To perform parallel hyperparameter optimization, use the "HyperparameterOptimizationOptions",struct("UseParallel",true) name-value argument in the call to this function. For more general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).
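For reference, a minimal sketch of such a parallel call follows; the training data X and Y are assumptions, and the pool-management step simply avoids an error if a pool is already open.

% Assumed training data X and Y. Start a parallel pool if one is not already running.
if isempty(gcp("nocreate"))
    parpool;
end
Mdl = fitcauto(X,Y,"HyperparameterOptimizationOptions",struct("UseParallel",true));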
See Also fitcdiscr | fitcecoc | fitcensemble | fitcknn | fitclinear | fitcnb | fitcnet | fitcsvm | fitctree | fitckernel Topics “Automated Classifier Selection with Bayesian and ASHA Optimization” on page 19-185 “Hyperparameter Optimization in Classification Learner App” on page 23-56
fitcdiscr Fit discriminant analysis classifier
Syntax Mdl = fitcdiscr(Tbl,ResponseVarName) Mdl = fitcdiscr(Tbl,formula) Mdl = fitcdiscr(Tbl,Y) Mdl = fitcdiscr(X,Y) Mdl = fitcdiscr( ___ ,Name,Value)
Description Mdl = fitcdiscr(Tbl,ResponseVarName) returns a fitted discriminant analysis model based on the input variables (also known as predictors, features, or attributes) contained in the table Tbl and output (response or labels) contained in ResponseVarName. Mdl = fitcdiscr(Tbl,formula) returns a fitted discriminant analysis model based on the input variables contained in the table Tbl. formula is an explanatory model of the response and a subset of predictor variables in Tbl used to fit Mdl. Mdl = fitcdiscr(Tbl,Y) returns a fitted discriminant analysis model based on the input variables contained in the table Tbl and response Y. Mdl = fitcdiscr(X,Y) returns a discriminant analysis classifier based on the input variables X and response Y. Mdl = fitcdiscr( ___ ,Name,Value) fits a classifier with additional options specified by one or more name-value pair arguments, using any of the previous syntaxes. For example, you can optimize hyperparameters to minimize the model’s cross-validation loss, or specify the cost of misclassification, the prior probabilities for each class, or the observation weights.
Examples Train Discriminant Analysis Model Load Fisher's iris data set. load fisheriris
Train a discriminant analysis model using the entire data set. Mdl = fitcdiscr(meas,species) Mdl = ClassificationDiscriminant ResponseName: 'Y' CategoricalPredictors: []
ClassNames: {'setosa'  'versicolor'  'virginica'}
ScoreTransform: 'none'
NumObservations: 150
DiscrimType: 'linear'
Mu: [3x4 double]
Coeffs: [3x3 struct]

Mdl is a ClassificationDiscriminant model. To access its properties, use dot notation. For example, display the group means for each predictor.

Mdl.Mu

ans = 3×4

    5.0060    3.4280    1.4620    0.2460
    5.9360    2.7700    4.2600    1.3260
    6.5880    2.9740    5.5520    2.0260
To predict labels for new observations, pass Mdl and predictor data to predict.
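For instance, a minimal sketch of this step follows; the new measurements are made-up values used only for illustration.

XNew = [5.8 2.9 4.4 1.4; 5.0 3.4 1.5 0.2];   % two new flowers (made-up measurements)
labels = predict(Mdl,XNew)                    % predicted species for each row of XNew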
Optimize Discriminant Analysis Model This example shows how to optimize hyperparameters automatically using fitcdiscr. The example uses Fisher's iris data. Load the data. load fisheriris
Find hyperparameters that minimize five-fold cross-validation loss by using automatic hyperparameter optimization. For reproducibility, set the random seed and use the 'expected-improvement-plus' acquisition function. rng(1) Mdl = fitcdiscr(meas,species,'OptimizeHyperparameters','auto',... 'HyperparameterOptimizationOptions',... struct('AcquisitionFunctionName','expected-improvement-plus'))
|================================================================================================ | Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | Delta | G | | result | | runtime | (observed) | (estim.) | | |================================================================================================ | 1 | Best | 0.66667 | 1.0042 | 0.66667 | 0.66667 | 13.261 | 0.2 | 2 | Best | 0.02 | 0.20304 | 0.02 | 0.064227 | 2.7404e-05 | 0.07 | 3 | Accept | 0.04 | 0.12784 | 0.02 | 0.020084 | 3.2455e-06 | 0.4 | 4 | Accept | 0.66667 | 0.13206 | 0.02 | 0.020118 | 14.879 | 0.9 | 5 | Accept | 0.046667 | 0.16548 | 0.02 | 0.019907 | 0.00031449 | 0.9 | 6 | Accept | 0.04 | 0.13571 | 0.02 | 0.028438 | 4.5092e-05 | 0.4 | 7 | Accept | 0.046667 | 0.12994 | 0.02 | 0.031424 | 2.0973e-05 | 0. | 8 | Accept | 0.02 | 0.13209 | 0.02 | 0.022424 | 1.0554e-06 | 0.002
| 9 | Accept | 0.02 | 0.11654 | 0.02 | 0.021105 | 1.1232e-06 | 0.0001 | 10 | Accept | 0.02 | 0.21013 | 0.02 | 0.020948 | 0.00011837 | 0.003 | 11 | Accept | 0.02 | 0.1237 | 0.02 | 0.020172 | 1.0292e-06 | 0.02 | 12 | Accept | 0.02 | 0.12369 | 0.02 | 0.020105 | 9.7792e-05 | 0.002 | 13 | Accept | 0.02 | 0.12593 | 0.02 | 0.020038 | 0.00036014 | 0.001 | 14 | Accept | 0.02 | 0.1272 | 0.02 | 0.019597 | 0.00021059 | 0.004 | 15 | Accept | 0.02 | 0.12114 | 0.02 | 0.019461 | 1.1911e-05 | 0.001 | 16 | Accept | 0.02 | 0.11943 | 0.02 | 0.01993 | 0.0017896 | 0.0007 | 17 | Accept | 0.02 | 0.12492 | 0.02 | 0.019551 | 0.00073745 | 0.006 | 18 | Accept | 0.02 | 0.12572 | 0.02 | 0.019776 | 0.00079304 | 0.0001 | 19 | Accept | 0.02 | 0.11518 | 0.02 | 0.019678 | 0.007292 | 0.000 | 20 | Accept | 0.046667 | 0.11401 | 0.02 | 0.019785 | 0.0074408 | 0.9 |================================================================================================ | Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | Delta | G | | result | | runtime | (observed) | (estim.) | | |================================================================================================ | 21 | Accept | 0.02 | 0.13172 | 0.02 | 0.019043 | 0.0036004 | 0.002 | 22 | Accept | 0.02 | 0.13834 | 0.02 | 0.019755 | 2.5238e-05 | 0.001 | 23 | Accept | 0.02 | 0.21915 | 0.02 | 0.0191 | 1.5478e-05 | 0.002 | 24 | Accept | 0.02 | 0.13454 | 0.02 | 0.019081 | 0.0040557 | 0.0004 | 25 | Accept | 0.02 | 0.11772 | 0.02 | 0.019333 | 2.959e-05 | 0.001 | 26 | Accept | 0.02 | 0.12669 | 0.02 | 0.019369 | 2.3111e-06 | 0.002 | 27 | Accept | 0.02 | 0.51238 | 0.02 | 0.019455 | 3.8898e-05 | 0.001 | 28 | Accept | 0.02 | 0.36816 | 0.02 | 0.019449 | 0.0035925 | 0.002 | 29 | Accept | 0.66667 | 0.83305 | 0.02 | 0.019479 | 998.93 | 0.06 | 30 | Accept | 0.02 | 0.19048 | 0.02 | 0.01947 | 8.1557e-06 | 0.000 __________________________________________________________ Optimization completed. MaxObjectiveEvaluations of 30 reached. Total function evaluations: 30 Total elapsed time: 38.0103 seconds Total objective function evaluation time: 6.3502 Best observed feasible point: Delta Gamma __________ ________ 2.7404e-05
0.073264
Observed objective function value = 0.02 Estimated objective function value = 0.022693 Function evaluation time = 0.20304 Best estimated feasible point (according to models): Delta Gamma __________ _________ 2.5238e-05
0.0015542
Estimated objective function value = 0.01947 Estimated function evaluation time = 0.1627
Mdl = ClassificationDiscriminant
  ResponseName: 'Y'
  CategoricalPredictors: []
  ClassNames: {'setosa'  'versicolor'  'virginica'}
  ScoreTransform: 'none'
  NumObservations: 150
  HyperparameterOptimizationResults: [1x1 BayesianOptimization]
  DiscrimType: 'linear'
  Mu: [3x4 double]
  Coeffs: [3x3 struct]
The fit achieves about 2% loss for the default 5-fold cross validation.
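One way to check this number directly is the following sketch, which assumes the tuned Mdl object from the optimization above is still in the workspace.

CVMdl = crossval(Mdl,"KFold",5);   % 5-fold cross-validated version of the tuned model
cvLoss = kfoldLoss(CVMdl)          % should be close to the reported 2% loss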
Optimize Discriminant Analysis Model on Tall Array This example shows how to optimize hyperparameters of a discriminant analysis model automatically using a tall array. The sample data set airlinesmall.csv is a large data set that contains a tabular file of airline flight data. This example creates a tall table containing the data and uses it to run the optimization procedure. When you perform calculations on tall arrays, MATLAB® uses either a parallel pool (default if you have Parallel Computing Toolbox™) or the local MATLAB session. If you want to run the example using the local MATLAB session when you have Parallel Computing Toolbox, you can change the global execution environment by using the mapreducer function. Create a datastore that references the folder location with the data. Select a subset of the variables to work with, and treat 'NA' values as missing data so that datastore replaces them with NaN values. Create a tall table that contains the data in the datastore. ds = datastore('airlinesmall.csv'); ds.SelectedVariableNames = {'Month','DayofMonth','DayOfWeek',... 'DepTime','ArrDelay','Distance','DepDelay'}; ds.TreatAsMissing = 'NA'; tt = tall(ds) % Tall table Starting parallel pool (parpool) using the 'local' profile ... Connected to the parallel pool (number of workers: 6). tt = M×7 tall table Month _____ 10 10 10 10 10 10 10 10 : :
DayofMonth __________
DayOfWeek _________
21 26 23 23 22 28 8 10 : :
3 1 5 5 4 3 4 6 : :
DepTime _______ 642 1021 2055 1332 629 1446 928 859 : :
ArrDelay ________ 8 8 21 13 4 59 3 11 : :
Distance ________ 308 296 480 296 373 308 447 954 : :
DepDelay ________ 12 1 20 12 -1 63 -2 -1 : :
Determine the flights that are late by 10 minutes or more by defining a logical variable that is true for a late flight. This variable contains the class labels. A preview of this variable includes the first few rows. Y = tt.DepDelay > 10 % Class labels Y = M×1 tall logical array 1 0 1 1
0 1 0 0 : :
Create a tall array for the predictor data. X = tt{:,1:end-1} % Predictor data X = M×6 tall double matrix 10 10 10 10 10 10 10 10 : :
21 26 23 23 22 28 8 10 : :
3 1 5 5 4 3 4 6 : :
642 1021 2055 1332 629 1446 928 859 : :
8 8 21 13 4 59 3 11 : :
308 296 480 296 373 308 447 954 : :
Remove rows in X and Y that contain missing data. R = rmmissing([X Y]); % Data with missing entries removed X = R(:,1:end-1); Y = R(:,end);
Standardize the predictor variables. Z = zscore(X);
Optimize hyperparameters automatically using the 'OptimizeHyperparameters' name-value pair argument. Find the optimal 'DiscrimType' value that minimizes holdout cross-validation loss. (Specifying 'auto' uses 'DiscrimType'.) For reproducibility, use the 'expected-improvementplus' acquisition function and set the seeds of the random number generators using rng and tallrng. The results can vary depending on the number of workers and the execution environment for the tall arrays. For details, see “Control Where Your Code Runs”. rng('default') tallrng('default') [Mdl,FitInfo,HyperparameterOptimizationResults] = fitcdiscr(Z,Y,... 'OptimizeHyperparameters','auto',... 'HyperparameterOptimizationOptions',struct('Holdout',0.3,... 'AcquisitionFunctionName','expected-improvement-plus')) Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 2: Completed in 5.7 sec - Pass 2 of 2: Completed in 4.3 sec Evaluation completed in 16 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 2.5 sec Evaluation completed in 2.8 sec |======================================================================================|
| Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | DiscrimType | | | result | | runtime | (observed) | (estim.) | | |======================================================================================| | 1 | Best | 0.11354 | 25.315 | 0.11354 | 0.11354 | quadratic |
Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 1.5 sec Evaluation completed in 2.7 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 1.4 sec Evaluation completed in 1.6 sec | 2 | Accept | 0.11354 | 7.9367 | 0.11354 |
0.11354 | pseudoQuadra |
Evaluating tall expression using - Pass 1 of 1: Completed in 0.87 Evaluation completed in 2 sec Evaluating tall expression using - Pass 1 of 1: Completed in 0.78 Evaluation completed in 0.91 sec | 3 | Accept | 0.12869 |
0.11859 | pseudoLinear |
the Parallel Pool 'local': sec the Parallel Pool 'local': sec 6.5057 |
0.11354 |
Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.9 sec Evaluation completed in 1.7 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 1.3 sec Evaluation completed in 1.4 sec | 4 | Accept | 0.12745 | 6.4167 | 0.11354 |
0.1208 |
diagLinear |
Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.85 sec Evaluation completed in 1.7 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.8 sec Evaluation completed in 0.93 sec | 5 | Accept | 0.12869 | 6.1236 | 0.11354 |
0.12238 |
linear |
Evaluating tall expression using - Pass 1 of 1: Completed in 0.85 Evaluation completed in 1.5 sec Evaluating tall expression using - Pass 1 of 1: Completed in 0.75 Evaluation completed in 0.9 sec | 6 | Best | 0.11301 |
the Parallel Pool 'local': sec
0.12082 | diagQuadrati |
Evaluating tall expression using - Pass 1 of 1: Completed in 0.82 Evaluation completed in 1.5 sec Evaluating tall expression using - Pass 1 of 1: Completed in 0.77 Evaluation completed in 0.89 sec | 7 | Accept | 0.11301 |
the Parallel Pool 'local': sec
the Parallel Pool 'local': sec 5.4147 |
0.11301 |
the Parallel Pool 'local': sec 5.297 |
0.11301 |
0.11301 | diagQuadrati |
Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.84 sec Evaluation completed in 1.5 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.8 sec Evaluation completed in 0.93 sec | 8 | Accept | 0.11301 | 5.6152 | 0.11301 |
0.11301 | diagQuadrati |
Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 1.3 sec Evaluation completed in 2.1 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.75 sec Evaluation completed in 0.88 sec | 9 | Accept | 0.11301 | 5.9147 | 0.11301 |
0.11301 | diagQuadrati |
Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.88 sec Evaluation completed in 1.6 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 1.3 sec Evaluation completed in 1.4 sec | 10 | Accept | 0.11301 | 6.0504 | 0.11301 |
0.11301 | diagQuadrati |
Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.82 sec Evaluation completed in 1.5 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 1.3 sec Evaluation completed in 1.4 sec | 11 | Accept | 0.11301 | 5.9595 | 0.11301 |
0.11301 | diagQuadrati |
Evaluating tall expression using - Pass 1 of 1: Completed in 0.86 Evaluation completed in 1.6 sec Evaluating tall expression using - Pass 1 of 1: Completed in 0.76 Evaluation completed in 0.91 sec | 12 | Accept | 0.11301 |
the Parallel Pool 'local': sec
0.11301 | diagQuadrati |
Evaluating tall expression using - Pass 1 of 1: Completed in 0.88 Evaluation completed in 1.6 sec Evaluating tall expression using - Pass 1 of 1: Completed in 0.75 Evaluation completed in 0.87 sec | 13 | Accept | 0.11301 |
the Parallel Pool 'local': sec
the Parallel Pool 'local': sec 5.4266 |
0.11301 |
the Parallel Pool 'local': sec 5.3869 |
0.11301 |
0.11301 | diagQuadrati |
Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.83 sec Evaluation completed in 1.5 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.8 sec Evaluation completed in 0.97 sec | 14 | Accept | 0.11301 | 5.4876 | 0.11301 |
0.11301 | diagQuadrati |
Evaluating tall expression using - Pass 1 of 1: Completed in 0.85 Evaluation completed in 1.5 sec Evaluating tall expression using - Pass 1 of 1: Completed in 0.73 Evaluation completed in 0.85 sec | 15 | Accept | 0.11301 |
0.11301 | diagQuadrati |
the Parallel Pool 'local': sec the Parallel Pool 'local': sec 5.4052 |
0.11301 |
Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.87 sec Evaluation completed in 1.5 sec
Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.78 sec Evaluation completed in 0.9 sec | 16 | Accept | 0.11301 | 5.4434 | 0.11301 |
0.11301 | diagQuadrati |
Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.89 sec Evaluation completed in 1.6 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.8 sec Evaluation completed in 0.93 sec | 17 | Accept | 0.11301 | 5.5804 | 0.11301 |
0.11301 | diagQuadrati |
Evaluating tall expression using - Pass 1 of 1: Completed in 0.94 Evaluation completed in 1.6 sec Evaluating tall expression using - Pass 1 of 1: Completed in 0.79 Evaluation completed in 0.92 sec | 18 | Accept | 0.11354 |
the Parallel Pool 'local': sec
0.11301 | pseudoQuadra |
Evaluating tall expression using - Pass 1 of 1: Completed in 0.85 Evaluation completed in 1.5 sec Evaluating tall expression using - Pass 1 of 1: Completed in 0.76 Evaluation completed in 0.88 sec | 19 | Accept | 0.11301 |
the Parallel Pool 'local': sec
Evaluating tall expression using - Pass 1 of 1: Completed in 0.76 Evaluation completed in 1.4 sec Evaluating tall expression using - Pass 1 of 1: Completed in 0.75 Evaluation completed in 0.88 sec | 20 | Accept | 0.11301 |
the Parallel Pool 'local': sec
the Parallel Pool 'local': sec 5.616 |
0.11301 |
the Parallel Pool 'local': sec 5.4031 |
0.11301 |
0.11301 | diagQuadrati |
the Parallel Pool 'local': sec 5.1974 |
0.11301 |
0.11301 | diagQuadrati |
Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.77 sec Evaluation completed in 1.4 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.75 sec Evaluation completed in 0.87 sec |======================================================================================| | Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | DiscrimType | | | result | | runtime | (observed) | (estim.) | | |======================================================================================| | 21 | Accept | 0.11301 | 5.1418 | 0.11301 | 0.11301 | diagQuadrati | Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 1.3 sec Evaluation completed in 2 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.73 sec Evaluation completed in 0.86 sec | 22 | Accept | 0.11301 | 5.9864 | 0.11301 | Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.88 sec
0.11301 | diagQuadrati |
Evaluation completed in 1.6 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.78 sec Evaluation completed in 0.91 sec | 23 | Accept | 0.11354 | 5.5656 | 0.11301 |
0.11301 |
Evaluating tall expression using - Pass 1 of 1: Completed in 0.82 Evaluation completed in 1.5 sec Evaluating tall expression using - Pass 1 of 1: Completed in 0.77 Evaluation completed in 0.9 sec | 24 | Accept | 0.11354 |
0.11301 |
0.11301 | pseudoQuadra |
Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 1.4 sec Evaluation completed in 2.1 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.77 sec Evaluation completed in 0.9 sec | 25 | Accept | 0.11301 | 6.2276 | 0.11301 |
0.11301 | diagQuadrati |
Evaluating tall expression using - Pass 1 of 1: Completed in 0.86 Evaluation completed in 1.6 sec Evaluating tall expression using - Pass 1 of 1: Completed in 0.77 Evaluation completed in 0.89 sec | 26 | Accept | 0.11301 |
the Parallel Pool 'local': sec
0.11301 | diagQuadrati |
Evaluating tall expression using - Pass 1 of 1: Completed in 0.92 Evaluation completed in 1.6 sec Evaluating tall expression using - Pass 1 of 1: Completed in 0.88 Evaluation completed in 1 sec | 27 | Accept | 0.11301 |
the Parallel Pool 'local': sec
Evaluating tall expression using - Pass 1 of 1: Completed in 0.83 Evaluation completed in 1.5 sec Evaluating tall expression using - Pass 1 of 1: Completed in 0.78 Evaluation completed in 0.9 sec | 28 | Accept | 0.11354 |
the Parallel Pool 'local': sec
Evaluating tall expression using - Pass 1 of 1: Completed in 0.86 Evaluation completed in 1.5 sec Evaluating tall expression using - Pass 1 of 1: Completed in 0.81 Evaluation completed in 0.93 sec | 29 | Accept | 0.11301 |
the Parallel Pool 'local': sec
Evaluating tall expression using - Pass 1 of 1: Completed in 0.89 Evaluation completed in 1.6 sec Evaluating tall expression using - Pass 1 of 1: Completed in 0.74
the Parallel Pool 'local': sec
quadratic |
the Parallel Pool 'local': sec the Parallel Pool 'local': sec 5.3012 |
the Parallel Pool 'local': sec 5.5308 |
0.11301 |
the Parallel Pool 'local': sec 5.7396 |
0.11301 |
0.11301 | diagQuadrati |
the Parallel Pool 'local': sec 5.4403 |
0.11301 |
0.11301 |
quadratic |
the Parallel Pool 'local': sec 5.3572 |
0.11301 |
0.11301 | diagQuadrati |
the Parallel Pool 'local': sec
Evaluation completed in 0.85 sec | 30 | Accept | 0.11354 |
5.2718 |
0.11301 |
0.11301 |
quadratic |
__________________________________________________________ Optimization completed. MaxObjectiveEvaluations of 30 reached. Total function evaluations: 30 Total elapsed time: 229.5689 seconds. Total objective function evaluation time: 191.058 Best observed feasible point: DiscrimType _____________ diagQuadratic Observed objective function value = 0.11301 Estimated objective function value = 0.11301 Function evaluation time = 5.4147 Best estimated feasible point (according to models): DiscrimType
_____________
diagQuadratic

Estimated objective function value = 0.11301
Estimated function evaluation time = 5.784

Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 0.76 sec
Evaluation completed in 1.4 sec

Mdl = CompactClassificationDiscriminant
  PredictorNames: {'x1'  'x2'  'x3'  'x4'  'x5'  'x6'}
  ResponseName: 'Y'
  CategoricalPredictors: []
  ClassNames: [0 1]
  ScoreTransform: 'none'
  DiscrimType: 'diagQuadratic'
  Mu: [2×6 double]
  Coeffs: [2×2 struct]

  Properties, Methods

FitInfo = struct with no fields.

HyperparameterOptimizationResults = BayesianOptimization with properties:
  ObjectiveFcn: @createObjFcn/tallObjFcn
  VariableDescriptions: [1×1 optimizableVariable]
  Options: [1×1 struct]
  MinObjective: 0.1130
  XAtMinObjective: [1×1 table]
  MinEstimatedObjective: 0.1130
  XAtMinEstimatedObjective: [1×1 table]
  NumObjectiveEvaluations: 30
  TotalElapsedTime: 229.5689
  NextPoint: [1×1 table]
  XTrace: [30×1 table]
  ObjectiveTrace: [30×1 double]
  ConstraintsTrace: []
  UserDataTrace: {30×1 cell}
  ObjectiveEvaluationTimeTrace: [30×1 double]
  IterationTimeTrace: [30×1 double]
  ErrorTrace: [30×1 double]
  FeasibilityTrace: [30×1 logical]
  FeasibilityProbabilityTrace: [30×1 double]
  IndexOfMinimumTrace: [30×1 double]
  ObjectiveMinimumTrace: [30×1 double]
  EstimatedObjectiveMinimumTrace: [30×1 double]
Input Arguments Tbl — Sample data table Sample data used to train the model, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Optionally, Tbl can contain one additional column for the response variable. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed. • If Tbl contains the response variable, and you want to use all remaining variables in Tbl as predictors, then specify the response variable by using ResponseVarName. • If Tbl contains the response variable, and you want to use only a subset of the remaining variables in Tbl as predictors, then specify a formula by using formula. • If Tbl does not contain the response variable, then specify a response variable by using Y. The length of the response variable and the number of rows in Tbl must be equal. ResponseVarName — Response variable name name of variable in Tbl Response variable name, specified as the name of a variable in Tbl. You must specify ResponseVarName as a character vector or string scalar. For example, if the response variable Y is stored as Tbl.Y, then specify it as "Y". Otherwise, the software treats all columns of Tbl, including Y, as predictors when training the model. The response variable must be a categorical, character, or string array; a logical or numeric vector; or a cell array of character vectors. If Y is a character array, then each element of the response variable must correspond to one row of the array. A good practice is to specify the order of the classes by using the ClassNames name-value argument. Data Types: char | string formula — Explanatory model of response variable and subset of predictor variables character vector | string scalar Explanatory model of the response variable and a subset of the predictor variables, specified as a character vector or string scalar in the form "Y~x1+x2+x3". In this form, Y represents the response variable, and x1, x2, and x3 represent the predictor variables. To specify a subset of variables in Tbl as predictors for training the model, use a formula. If you specify a formula, then the software does not use any variables in Tbl that do not appear in formula. The variable names in the formula must be both variable names in Tbl (Tbl.Properties.VariableNames) and valid MATLAB identifiers. You can verify the variable names in Tbl by using the isvarname function. If the variable names are not valid, then you can convert them by using the matlab.lang.makeValidName function. Data Types: char | string Y — Class labels categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors 35-2191
Class labels, specified as a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors. Each row of Y represents the classification of the corresponding row of X. The software considers NaN, '' (empty character vector), "" (empty string), , and values in Y to be missing values. Consequently, the software does not train using observations with a missing response. Data Types: categorical | char | string | logical | single | double | cell X — Predictor data numeric matrix Predictor values, specified as a numeric matrix. Each column of X represents one variable, and each row represents one observation. fitcdiscr considers NaN values in X as missing values. fitcdiscr does not use observations with missing values for X in the fit. Data Types: single | double Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Note You cannot use any cross-validation name-value argument together with the 'OptimizeHyperparameters' name-value argument. You can modify the cross-validation for 'OptimizeHyperparameters' only by using the 'HyperparameterOptimizationOptions' name-value argument. Example: 'DiscrimType','quadratic','SaveMemory','on' specifies a quadratic discriminant classifier and does not store the covariance matrix in the output object. Model Parameters
ClassNames — Names of classes to use for training categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors Names of classes to use for training, specified as a categorical, character, or string array; a logical or numeric vector; or a cell array of character vectors. ClassNames must have the same data type as the response variable in Tbl or Y. If ClassNames is a character array, then each element must correspond to one row of the array. Use ClassNames to: • Specify the order of the classes during training. • Specify the order of any input or output argument dimension that corresponds to the class order. For example, use ClassNames to specify the order of the dimensions of Cost or the column order of classification scores returned by predict. 35-2192
• Select a subset of classes for training. For example, suppose that the set of all distinct class names in Y is ["a","b","c"]. To train the model using observations from classes "a" and "c" only, specify "ClassNames",["a","c"]. The default value for ClassNames is the set of all distinct class names in the response variable in Tbl or Y. Example: "ClassNames",["b","g"] Data Types: categorical | char | string | logical | single | double | cell Cost — Cost of misclassification square matrix | structure Cost of misclassification of a point, specified as the comma-separated pair consisting of 'Cost' and one of the following: • Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (i.e., the rows correspond to the true class and the columns correspond to the predicted class). To specify the class order for the corresponding rows and columns of Cost, additionally specify the ClassNames name-value pair argument. • Structure S having two fields: S.ClassNames containing the group names as a variable of the same type as Y, and S.ClassificationCosts containing the cost matrix. The default is Cost(i,j)=1 if i~=j, and Cost(i,j)=0 if i=j. Data Types: single | double | struct Delta — Linear coefficient threshold 0 (default) | nonnegative scalar value Linear coefficient threshold, specified as the comma-separated pair consisting of 'Delta' and a nonnegative scalar value. If a coefficient of Mdl has magnitude smaller than Delta, Mdl sets this coefficient to 0, and you can eliminate the corresponding predictor from the model. Set Delta to a higher value to eliminate more predictors. Delta must be 0 for quadratic discriminant models. Data Types: single | double DiscrimType — Discriminant type 'linear' (default) | 'quadratic' | 'diaglinear' | 'diagquadratic' | 'pseudolinear' | 'pseudoquadratic' Discriminant type, specified as the comma-separated pair consisting of 'DiscrimType' and a character vector or string scalar in this table.
Value | Description | Predictor Covariance Treatment
'linear' | Regularized linear discriminant analysis (LDA) | All classes have the same covariance matrix, Σγ = (1 − γ)Σ + γ diag(Σ), where Σ is the empirical, pooled covariance matrix and γ is the amount of regularization.
'diaglinear' | LDA | All classes have the same, diagonal covariance matrix.
'pseudolinear' | LDA | All classes have the same covariance matrix. The software inverts the covariance matrix using the pseudo inverse.
'quadratic' | Quadratic discriminant analysis (QDA) | The covariance matrices can vary among classes.
'diagquadratic' | QDA | The covariance matrices are diagonal and can vary among classes.
'pseudoquadratic' | QDA | The covariance matrices can vary among classes. The software inverts the covariance matrix using the pseudo inverse.
Note To use regularization, you must specify 'linear'. To specify the amount of regularization, use the Gamma name-value pair argument. Example: 'DiscrimType','quadratic' FillCoeffs — Coeffs property flag 'on' | 'off' Coeffs property flag, specified as the comma-separated pair consisting of 'FillCoeffs' and 'on' or 'off'. Setting the flag to 'on' populates the Coeffs property in the classifier object. This can be computationally intensive, especially when cross-validating. The default is 'on', unless you specify a cross-validation name-value pair, in which case the flag is set to 'off' by default. Example: 'FillCoeffs','off' Gamma — Amount of regularization scalar value in the interval [0,1] Amount of regularization to apply when estimating the covariance matrix of the predictors, specified as the comma-separated pair consisting of 'Gamma' and a scalar value in the interval [0,1]. Gamma provides finer control over the covariance matrix structure than DiscrimType. • If you specify 0, then the software does not use regularization to adjust the covariance matrix. That is, the software estimates and uses the unrestricted, empirical covariance matrix. 35-2194
• For linear discriminant analysis, if the empirical covariance matrix is singular, then the software automatically applies the minimal regularization required to invert the covariance matrix. You can display the chosen regularization amount by entering Mdl.Gamma at the command line. • For quadratic discriminant analysis, if at least one class has an empirical covariance matrix that is singular, then the software throws an error. • If you specify a value in the interval (0,1), then you must implement linear discriminant analysis, otherwise the software throws an error. Consequently, the software sets DiscrimType to 'linear'. • If you specify 1, then the software uses maximum regularization for covariance matrix estimation. That is, the software restricts the covariance matrix to be diagonal. Alternatively, you can set DiscrimType to 'diagLinear' or 'diagQuadratic' for diagonal covariance matrices. Example: 'Gamma',1 Data Types: single | double PredictorNames — Predictor variable names string array of unique names | cell array of unique character vectors Predictor variable names, specified as a string array of unique names or cell array of unique character vectors. The functionality of PredictorNames depends on the way you supply the training data. • If you supply X and Y, then you can use PredictorNames to assign names to the predictor variables in X. • The order of the names in PredictorNames must correspond to the column order of X. That is, PredictorNames{1} is the name of X(:,1), PredictorNames{2} is the name of X(:,2), and so on. Also, size(X,2) and numel(PredictorNames) must be equal. • By default, PredictorNames is {'x1','x2',...}. • If you supply Tbl, then you can use PredictorNames to choose which predictor variables to use in training. That is, fitcdiscr uses only the predictor variables in PredictorNames and the response variable during training. • PredictorNames must be a subset of Tbl.Properties.VariableNames and cannot include the name of the response variable. • By default, PredictorNames contains the names of all predictor variables. • A good practice is to specify the predictors for training using either PredictorNames or formula, but not both. Example: "PredictorNames", ["SepalLength","SepalWidth","PetalLength","PetalWidth"] Data Types: string | cell Prior — Prior probabilities 'empirical' (default) | 'uniform' | vector of scalar values | structure Prior probabilities for each class, specified as the comma-separated pair consisting of 'Prior' and a value in this table.
Value | Description
'empirical' | The class prior probabilities are the class relative frequencies in Y.
'uniform' | All class prior probabilities are equal to 1/K, where K is the number of classes.
numeric vector | Each element is a class prior probability. Order the elements according to Mdl.ClassNames or specify the order using the ClassNames name-value pair argument. The software normalizes the elements such that they sum to 1.
structure | A structure S with two fields:
  • S.ClassNames contains the class names as a variable of the same type as Y.
  • S.ClassProbs contains a vector of corresponding prior probabilities. The software normalizes the elements such that they sum to 1.
If you set values for both Weights and Prior, the weights are renormalized to add up to the value of the prior probability in the respective class. Example: 'Prior','uniform' Data Types: char | string | single | double | struct ResponseName — Response variable name "Y" (default) | character vector | string scalar Response variable name, specified as a character vector or string scalar. • If you supply Y, then you can use ResponseName to specify a name for the response variable. • If you supply ResponseVarName or formula, then you cannot use ResponseName. Example: "ResponseName","response" Data Types: char | string SaveMemory — Flag to save covariance matrix 'off' (default) | 'on' Flag to save covariance matrix, specified as the comma-separated pair consisting of 'SaveMemory' and either 'on' or 'off'. If you specify 'on', then fitcdiscr does not store the full covariance matrix, but instead stores enough information to compute the matrix. The predict method computes the full covariance matrix for prediction, and does not store the matrix. If you specify 'off', then fitcdiscr computes and stores the full covariance matrix in Mdl. Specify SaveMemory as 'on' when the input matrix contains thousands of predictors. Example: 'SaveMemory','on' ScoreTransform — Score transformation "none" (default) | "doublelogit" | "invlogit" | "ismax" | "logit" | function handle | ... 35-2196
Score transformation, specified as a character vector, string scalar, or function handle. This table summarizes the available character vectors and string scalars.

Value | Description
"doublelogit" | 1/(1 + e^(–2x))
"invlogit" | log(x / (1 – x))
"ismax" | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
"logit" | 1/(1 + e^(–x))
"none" or "identity" | x (no transformation)
"sign" | –1 for x < 0; 0 for x = 0; 1 for x > 0
"symmetric" | 2x – 1
"symmetricismax" | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
"symmetriclogit" | 2/(1 + e^(–x)) – 1
For a MATLAB function or a function you define, use its function handle for the score transform. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores). Example: "ScoreTransform","logit" Data Types: char | string | function_handle Weights — Observation weights numeric vector of positive values | name of variable in Tbl Observation weights, specified as the comma-separated pair consisting of 'Weights' and a numeric vector of positive values or name of a variable in Tbl. The software weighs the observations in each row of X or Tbl with the corresponding value in Weights. The size of Weights must equal the number of rows of X or Tbl. If you specify the input data as a table Tbl, then Weights can be the name of a variable in Tbl that contains a numeric vector. In this case, you must specify Weights as a character vector or string scalar. For example, if the weights vector W is stored as Tbl.W, then specify it as 'W'. Otherwise, the software treats all columns of Tbl, including W, as predictors or the response when training the model. The software normalizes Weights to sum up to the value of the prior probability in the respective class. By default, Weights is ones(n,1), where n is the number of observations in X or Tbl. Data Types: double | single | char | string Cross-Validation Options
CrossVal — Cross-validation flag
'off' (default) | 'on'
Cross-validation flag, specified as the comma-separated pair consisting of 'Crossval' and 'on' or 'off'. If you specify 'on', then the software implements 10-fold cross-validation. To override this cross-validation setting, use one of these name-value pair arguments: CVPartition, Holdout, KFold, or Leaveout. To create a cross-validated model, you can use one cross-validation name-value pair argument at a time only.

Alternatively, cross-validate later by passing Mdl to crossval.

Example: 'CrossVal','on'

CVPartition — Cross-validation partition
[] (default) | cvpartition object

Cross-validation partition, specified as a cvpartition object that specifies the type of cross-validation and the indexing for the training and validation sets.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,KFold=5). Then, you can specify the cross-validation partition by setting CVPartition=cvp.

Holdout — Fraction of data for holdout validation
scalar value in the range (0,1)

Fraction of the data used for holdout validation, specified as a scalar value in the range (0,1). If you specify Holdout=p, then the software completes these steps:
1. Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.
2. Store the compact trained model in the Trained property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: Holdout=0.1
Data Types: double | single

KFold — Number of folds
10 (default) | positive integer value greater than 1

Number of folds to use in the cross-validated model, specified as a positive integer value greater than 1. If you specify KFold=k, then the software completes these steps:
1. Randomly partition the data into k sets.
2. For each set, reserve the set as validation data, and train the model using the other k – 1 sets.
3. Store the k compact trained models in a k-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: KFold=5
Data Types: single | double

Leaveout — Leave-one-out cross-validation flag
"off" (default) | "on"

Leave-one-out cross-validation flag, specified as "on" or "off". If you specify Leaveout="on", then for each of the n observations (where n is the number of observations, excluding missing observations, specified in the NumObservations property of the model), the software completes these steps:
1. Reserve the one observation as validation data, and train the model using the other n – 1 observations.
2. Store the n compact trained models in an n-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: Leaveout="on"
Data Types: char | string
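As a brief illustration of these options, the following sketch cross-validates a discriminant analysis model in two of the ways described above; the data set (fisheriris) and the 30% holdout fraction are assumptions used for demonstration.

load fisheriris
% 5-fold cross-validation specified at fitting time.
CVMdl = fitcdiscr(meas,species,KFold=5);
kfoldLoss(CVMdl)

% Alternatively, train first and then cross-validate with a holdout partition.
Mdl = fitcdiscr(meas,species);
cvp = cvpartition(numel(species),Holdout=0.3);
HoldoutMdl = crossval(Mdl,CVPartition=cvp);
kfoldLoss(HoldoutMdl)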
Hyperparameter Optimization Options
OptimizeHyperparameters — Parameters to optimize 'none' (default) | 'auto' | 'all' | string array or cell array of eligible parameter names | vector of optimizableVariable objects Parameters to optimize, specified as the comma-separated pair consisting of 'OptimizeHyperparameters' and one of the following: • 'none' — Do not optimize. • 'auto' — Use {'Delta','Gamma'}. • 'all' — Optimize all eligible parameters. • String array or cell array of eligible parameter names. • Vector of optimizableVariable objects, typically the output of hyperparameters. The optimization attempts to minimize the cross-validation loss (error) for fitcdiscr by varying the parameters. For information about cross-validation loss (albeit in a different context), see “Classification Loss” on page 35-4305. To control the cross-validation type and other aspects of the optimization, use the HyperparameterOptimizationOptions name-value pair. Note The values of 'OptimizeHyperparameters' override any values you specify using other name-value arguments. For example, setting 'OptimizeHyperparameters' to 'auto' causes fitcdiscr to optimize hyperparameters corresponding to the 'auto' option and to ignore any specified values for the hyperparameters. The eligible parameters for fitcdiscr are: 35-2199
• Delta — fitcdiscr searches among positive values, by default log-scaled in the range [1e-6,1e3]. • DiscrimType — fitcdiscr searches among 'linear', 'quadratic', 'diagLinear', 'diagQuadratic', 'pseudoLinear', and 'pseudoQuadratic'. • Gamma — fitcdiscr searches among real values in the range [0,1]. Set nondefault parameters by passing a vector of optimizableVariable objects that have nondefault values. For example, load fisheriris params = hyperparameters('fitcdiscr',meas,species); params(1).Range = [1e-4,1e6];
Pass params as the value of OptimizeHyperparameters. By default, the iterative display appears at the command line, and plots appear according to the number of hyperparameters in the optimization. For the optimization and plots, the objective function is the misclassification rate. To control the iterative display, set the Verbose field of the 'HyperparameterOptimizationOptions' name-value argument. To control the plots, set the ShowPlots field of the 'HyperparameterOptimizationOptions' name-value argument. For an example, see “Optimize Discriminant Analysis Model” on page 35-2178.
Example: 'auto'
HyperparameterOptimizationOptions — Options for optimization
structure
Options for optimization, specified as a structure. This argument modifies the effect of the OptimizeHyperparameters name-value argument. All fields in the structure are optional.
Optimizer (default: 'bayesopt')
• 'bayesopt' — Use Bayesian optimization. Internally, this setting calls bayesopt.
• 'gridsearch' — Use grid search with NumGridDivisions values per dimension.
• 'randomsearch' — Search at random among MaxObjectiveEvaluations points.
'gridsearch' searches in a random order, using uniform sampling without replacement from the grid. After optimization, you can get a table in grid order by using the command sortrows(Mdl.HyperparameterOptimizationResults).
AcquisitionFunctionName (default: 'expected-improvement-per-second-plus')
• 'expected-improvement-per-second-plus'
• 'expected-improvement'
• 'expected-improvement-plus'
• 'expected-improvement-per-second'
• 'lower-confidence-bound'
• 'probability-of-improvement'
Acquisition functions whose names include per-second do not yield reproducible results because the optimization depends on the runtime of the objective function. Acquisition functions whose names include plus modify their behavior when they are overexploiting an area. For more details, see “Acquisition Function Types” on page 10-3.
MaxObjectiveEvaluations (default: 30 for 'bayesopt' and 'randomsearch', and the entire grid for 'gridsearch')
Maximum number of objective function evaluations.
MaxTime (default: Inf)
Time limit, specified as a positive real scalar. The time limit is in seconds, as measured by tic and toc. The run time can exceed MaxTime because MaxTime does not interrupt function evaluations.
NumGridDivisions (default: 10)
For 'gridsearch', the number of values in each dimension. The value can be a vector of positive integers giving the number of values for each dimension, or a scalar that applies to all dimensions. This field is ignored for categorical variables.
ShowPlots (default: true)
Logical value indicating whether to show plots. If true, this field plots the best observed objective function value against the iteration number. If you use Bayesian optimization (Optimizer is 'bayesopt'), then this field also plots the best estimated objective function value. The best observed objective function values and best estimated objective function values correspond to the values in the BestSoFar (observed) and BestSoFar (estim.) columns of the iterative display, respectively. You can find these values in the properties ObjectiveMinimumTrace and EstimatedObjectiveMinimumTrace of Mdl.HyperparameterOptimizationResults. If the problem includes one or two optimization parameters for Bayesian optimization, then ShowPlots also plots a model of the objective function against the parameters.
SaveIntermediateResults (default: false)
Logical value indicating whether to save results when Optimizer is 'bayesopt'. If true, this field overwrites a workspace variable named 'BayesoptResults' at each iteration. The variable is a BayesianOptimization object.
Verbose (default: 1)
Display at the command line:
• 0 — No iterative display
• 1 — Iterative display
• 2 — Iterative display with extra information
For details, see the bayesopt Verbose name-value argument and the example “Optimize Classifier Fit Using Bayesian Optimization” on page 10-56.
UseParallel (default: false)
Logical value indicating whether to run Bayesian optimization in parallel, which requires Parallel Computing Toolbox. Due to the nonreproducibility of parallel timing, parallel Bayesian optimization does not necessarily yield reproducible results. For details, see “Parallel Bayesian Optimization” on page 10-7.
Repartition (default: false)
Logical value indicating whether to repartition the cross-validation at every iteration. If this field is false, the optimizer uses a single partition for the optimization. The setting true usually gives the most robust results because it takes partitioning noise into account. However, for good results, true requires at least twice as many function evaluations.
Use no more than one of the following three options. If you do not specify a cross-validation field, the default is 'Kfold',5.
CVPartition: A cvpartition object, as created by cvpartition.
Holdout: A scalar in the range (0,1) representing the holdout fraction.
Kfold: An integer greater than 1.
Example: 'HyperparameterOptimizationOptions',struct('MaxObjectiveEvaluations',60)
Data Types: struct
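As an illustration, here is a hedged sketch of combining OptimizeHyperparameters with a few of these fields; the specific field values are arbitrary choices for demonstration, not recommendations:
% Optimize the 'auto' hyperparameters with a shorter run, 5-fold
% cross-validation, and no plots.
load fisheriris
rng(1)
Mdl = fitcdiscr(meas,species, ...
    'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions',struct( ...
        'AcquisitionFunctionName','expected-improvement-plus', ...
        'MaxObjectiveEvaluations',20, ...
        'Kfold',5, ...
        'ShowPlots',false));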
Output Arguments
Mdl — Trained discriminant analysis classification model
ClassificationDiscriminant model object | ClassificationPartitionedModel cross-validated model object
Trained discriminant analysis classification model, returned as a ClassificationDiscriminant model object or a ClassificationPartitionedModel cross-validated model object. If you set any of the name-value pair arguments KFold, Holdout, CrossVal, or CVPartition, then Mdl is a ClassificationPartitionedModel cross-validated model object. Otherwise, Mdl is a ClassificationDiscriminant model object. To reference properties of Mdl, use dot notation. For example, to display the estimated component means at the Command Window, enter Mdl.Mu.
More About
Discriminant Classification
The model for discriminant analysis is:
• Each class (Y) generates data (X) using a multivariate normal distribution. That is, the model assumes X has a Gaussian mixture distribution (gmdistribution).
• For linear discriminant analysis, the model has the same covariance matrix for each class, only the means vary.
• For quadratic discriminant analysis, both means and covariances of each class vary.
predict classifies so as to minimize the expected classification cost:

$$\hat{y} = \underset{y = 1, \dots, K}{\arg\min} \; \sum_{k=1}^{K} \hat{P}(k \mid x) \, C(y \mid k),$$

where
• ŷ is the predicted classification.
• K is the number of classes.
• P̂(k|x) is the posterior probability on page 21-6 of class k for observation x.
• C(y|k) is the cost on page 21-7 of classifying an observation as y when its true class is k.
For details, see “Prediction Using Discriminant Analysis Models” on page 21-6.
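For example, the following sketch checks this relationship numerically; it relies on the third output of predict, which is the expected misclassification cost per class:
% The predicted label corresponds to the class with the smallest
% expected misclassification cost.
load fisheriris
Mdl = fitcdiscr(meas,species);
[label,~,cost] = predict(Mdl,meas(1,:));   % cost is a 1-by-K row vector
[~,idx] = min(cost);
isequal(label,Mdl.ClassNames(idx))         % expected to return logical 1 (true)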
Tips After training a model, you can generate C/C++ code that predicts labels for new data. Generating C/C++ code requires MATLAB Coder. For details, see “Introduction to Code Generation” on page 343.
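A minimal sketch of that workflow, assuming the saveLearnerForCoder and loadLearnerForCoder functions and a file name of your choice:
% Save the trained model to a MAT file for use in code generation.
load fisheriris
Mdl = fitcdiscr(meas,species);
saveLearnerForCoder(Mdl,'DiscrMdl');
% Inside a function intended for code generation (codegen), reload and predict:
%   Mdl = loadLearnerForCoder('DiscrMdl');
%   label = predict(Mdl,X);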
Algorithms • If you specify the Cost, Prior, and Weights name-value arguments, the output model object stores the specified values in the Cost, Prior, and W properties, respectively. The Cost property stores the user-specified cost matrix as is. The Prior and W properties store the prior probabilities and observation weights, respectively, after normalization. For details, see “Misclassification Cost Matrix, Prior Probabilities, and Observation Weights” on page 19-8. 35-2203
• The software uses the Cost property for prediction, but not training. Therefore, Cost is not read-only; you can change the property value by using dot notation after creating the trained model.
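For instance, a small sketch of updating the cost matrix on a trained model (the cost values here are arbitrary):
load fisheriris
Mdl = fitcdiscr(meas,species);
Mdl.Cost = [0 1 1; 1 0 4; 1 4 0];   % make versicolor/virginica confusions costlier
label = predict(Mdl,meas);          % predictions now use the updated costs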
Alternative Functionality Functions The classify function also performs discriminant analysis. classify is usually more awkward to use. • classify requires you to fit the classifier every time you make a new prediction. • classify does not perform cross-validation or hyperparameter optimization. • classify requires you to fit the classifier when changing prior probabilities.
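For comparison, a brief sketch of both approaches on the same data:
load fisheriris
% classify refits the discriminant every time it is called ...
labelA = classify(meas(1:10,:),meas,species);
% ... whereas a fitcdiscr model is fit once and reused for prediction.
Mdl = fitcdiscr(meas,species);
labelB = predict(Mdl,meas(1:10,:));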
Version History Introduced in R2014a
Extended Capabilities Tall Arrays Calculate with arrays that have more rows than fit in memory. Usage notes and limitations: • Supported syntaxes are: • Mdl = fitcdiscr(Tbl,Y) • Mdl = fitcdiscr(X,Y) • Mdl = fitcdiscr(___,Name,Value) • [Mdl,FitInfo,HyperparameterOptimizationResults] = fitcdiscr(___,Name,Value) — fitcdiscr returns the additional output arguments FitInfo and HyperparameterOptimizationResults when you specify the 'OptimizeHyperparameters' name-value pair argument. • The FitInfo output argument is an empty structure array currently reserved for possible future use. • The HyperparameterOptimizationResults output argument is a BayesianOptimization object or a table of hyperparameters with associated values that describe the cross-validation optimization of hyperparameters. 'HyperparameterOptimizationResults' is nonempty when the 'OptimizeHyperparameters' name-value pair argument is nonempty at the time you create the model. The values in 'HyperparameterOptimizationResults' depend on the value you specify for the 'HyperparameterOptimizationOptions' name-value pair argument when you create the model. • If you specify 'bayesopt' (default), then HyperparameterOptimizationResults is an object of class BayesianOptimization. • If you specify 'gridsearch' or 'randomsearch', then HyperparameterOptimizationResults is a table of the hyperparameters used, observed 35-2204
objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst). • Supported name-value pair arguments, and any differences, are: • 'ClassNames' • 'Cost' • 'DiscrimType' • 'HyperparameterOptimizationOptions' — For cross-validation, tall optimization supports only 'Holdout' validation. By default, the software selects and reserves 20% of the data as holdout validation data, and trains the model using the rest of the data. You can specify a different value for the holdout fraction by using this argument. For example, specify 'HyperparameterOptimizationOptions',struct('Holdout',0.3) to reserve 30% of the data as validation data. • 'OptimizeHyperparameters' — The only eligible parameter to optimize is 'DiscrimType'. Specifying 'auto' uses 'DiscrimType'. • 'PredictorNames' • 'Prior' • 'ResponseName' • 'ScoreTransform' • 'Weights' • For tall arrays and tall tables, fitcdiscr returns a CompactClassificationDiscriminant object, which contains most of the same properties as a ClassificationDiscriminant object. The main difference is that the compact object is sensitive to memory requirements. The compact object does not include properties that include the data, or that include an array of the same size as the data. The compact object does not contain these ClassificationDiscriminant properties: • ModelParameters • NumObservations • HyperparameterOptimizationResults • RowsUsed • XCentered • W • X • Y Additionally, the compact object does not support these ClassificationDiscriminant methods: • compact • crossval • cvshrink • resubEdge • resubLoss • resubMargin 35-2205
• resubPredict For more information, see “Tall Arrays”. Automatic Parallel Support Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™. To perform parallel hyperparameter optimization, use the 'HyperparameterOptimizationOptions', struct('UseParallel',true) name-value argument in the call to the fitcdiscr function. For more information on parallel hyperparameter optimization, see “Parallel Bayesian Optimization” on page 10-7. For general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).
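For example, a hedged sketch of such a call (an open parallel pool is assumed to be available or started automatically):
load fisheriris
Mdl = fitcdiscr(meas,species,'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions',struct('UseParallel',true));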
See Also ClassificationDiscriminant | ClassificationPartitionedModel | predict | crossval | classify Topics “Discriminant Analysis Classification” on page 21-2 “Improving Discriminant Analysis Models” on page 21-15 “Regularize Discriminant Analysis Classifier” on page 21-21
fitcecoc Fit multiclass models for support vector machines or other classifiers
Syntax Mdl = fitcecoc(Tbl,ResponseVarName) Mdl = fitcecoc(Tbl,formula) Mdl = fitcecoc(Tbl,Y) Mdl = fitcecoc(X,Y) Mdl = fitcecoc( ___ ,Name,Value) [Mdl,HyperparameterOptimizationResults] = fitcecoc( ___ ,Name,Value)
Description Mdl = fitcecoc(Tbl,ResponseVarName) returns a full, trained, multiclass, error-correcting output codes (ECOC) model on page 35-2242 using the predictors in table Tbl and the class labels in Tbl.ResponseVarName. fitcecoc uses K(K – 1)/2 binary support vector machine (SVM) models using the one-versus-one coding design on page 35-2242, where K is the number of unique class labels (levels). Mdl is a ClassificationECOC model. Mdl = fitcecoc(Tbl,formula) returns an ECOC model using the predictors in table Tbl and the class labels. formula is an explanatory model of the response and a subset of predictor variables in Tbl used for training. Mdl = fitcecoc(Tbl,Y) returns an ECOC model using the predictors in table Tbl and the class labels in vector Y. Mdl = fitcecoc(X,Y) returns a trained ECOC model using the predictors X and the class labels Y. Mdl = fitcecoc( ___ ,Name,Value) returns an ECOC model with additional options specified by one or more Name,Value pair arguments, using any of the previous syntaxes. For example, specify different binary learners, a different coding design, or to cross-validate. It is good practice to cross-validate using the Kfold Name,Value pair argument. The cross-validation results determine how well the model generalizes. [Mdl,HyperparameterOptimizationResults] = fitcecoc( ___ ,Name,Value) also returns hyperparameter optimization details when you specify the OptimizeHyperparameters name-value pair argument and use linear or kernel binary learners. For other Learners, the HyperparameterOptimizationResults property of Mdl contains the results.
Examples Train Multiclass Model Using SVM Learners Train a multiclass error-correcting output codes (ECOC) model using support vector machine (SVM) binary learners. 35-2207
Load Fisher's iris data set. Specify the predictor data X and the response data Y. load fisheriris X = meas; Y = species;
Train a multiclass ECOC model using the default options.
Mdl = fitcecoc(X,Y)
Mdl =
  ClassificationECOC
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'setosa'  'versicolor'  'virginica'}
           ScoreTransform: 'none'
           BinaryLearners: {3x1 cell}
               CodingName: 'onevsone'
Mdl is a ClassificationECOC model. By default, fitcecoc uses SVM binary learners and a one-versus-one coding design. You can access Mdl properties using dot notation. Display the class names and the coding design matrix.
Mdl.ClassNames
ans = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }
CodingMat = Mdl.CodingMatrix
CodingMat = 3×3
     1     1     0
    -1     0     1
     0    -1    -1
A one-versus-one coding design for three classes yields three binary learners. The columns of CodingMat correspond to the learners, and the rows correspond to the classes. The class order is the same as the order in Mdl.ClassNames. For example, CodingMat(:,1) is [1; –1; 0] and indicates that the software trains the first SVM binary learner using all observations classified as 'setosa' and 'versicolor'. Because 'setosa' corresponds to 1, it is the positive class; 'versicolor' corresponds to –1, so it is the negative class. You can access each binary learner using cell indexing and dot notation. Mdl.BinaryLearners{1}
% The first binary learner
ans =
  CompactClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: [-1 1]
           ScoreTransform: 'none'
                     Beta: [4x1 double]
                     Bias: 1.4505
         KernelParameters: [1x1 struct]
Compute the resubstitution classification error. error = resubLoss(Mdl) error = 0.0067
The classification error on the training data is small, but the classifier might be an overfitted model. You can cross-validate the classifier using crossval and compute the cross-validation classification error instead.
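A minimal sketch of that follow-up step, continuing from the Mdl trained above (the “Cross-Validate ECOC Classifier” example below shows the same workflow in more detail):
CVMdl = crossval(Mdl);   % 10-fold cross-validation by default
kfoldLoss(CVMdl)         % cross-validated classification error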
Train Multiclass Linear Classification Model Create a default linear learner template, and then use it to train an ECOC model containing multiple binary linear classification models. Load the NLP data set. load nlpdata
X is a sparse matrix of predictor data, and Y is a categorical vector of class labels. The data contains 13 classes. Create a default linear learner template. t = templateLinear t = Fit template for Linear. Learner: 'svm'
t is a template object for a linear learner. All of the properties of t are empty. When you pass t to a training function, such as fitcecoc for ECOC multiclass classification, the software sets the empty properties to their respective default values. For example, the software sets Type to "classification". To modify the default values, see the name-value arguments for templateLinear. Train an ECOC model composed of multiple binary linear classification models that identify the software product given the frequency distribution of words on a documentation web page. For faster training time, transpose the predictor data, and specify that observations correspond to columns.
X = X';
rng(1); % For reproducibility
Mdl = fitcecoc(X,Y,'Learners',t,'ObservationsIn','columns')
Mdl =
  CompactClassificationECOC
      ResponseName: 'Y'
        ClassNames: [comm    dsp    ecoder    fixedpoint    hdlcoder    phased    physmod  ...]
    ScoreTransform: 'none'
    BinaryLearners: {78x1 cell}
      CodingMatrix: [13x78 double]
Alternatively, you can train an ECOC model containing default linear classification models by specifying "Learners","Linear". To conserve memory, fitcecoc returns trained ECOC models containing linear classification learners in CompactClassificationECOC model objects.
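For instance, a sketch of the equivalent call without a template object, continuing with the transposed predictor matrix X and labels Y from above:
Mdl2 = fitcecoc(X,Y,'Learners','linear','ObservationsIn','columns');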
Cross-Validate ECOC Classifier Cross-validate an ECOC classifier with SVM binary learners, and estimate the generalized classification error. Load Fisher's iris data set. Specify the predictor data X and the response data Y. load fisheriris X = meas; Y = species; rng(1); % For reproducibility
Create an SVM template, and standardize the predictors. t = templateSVM('Standardize',true) t = Fit template for SVM. Standardize: 1
t is an SVM template. Most of the template object properties are empty. When training the ECOC classifier, the software sets the applicable properties to their default values. Train the ECOC classifier, and specify the class order. Mdl = fitcecoc(X,Y,'Learners',t,... 'ClassNames',{'setosa','versicolor','virginica'});
Mdl is a ClassificationECOC classifier. You can access its properties using dot notation. Cross-validate Mdl using 10-fold cross-validation. CVMdl = crossval(Mdl);
CVMdl is a ClassificationPartitionedECOC cross-validated ECOC classifier. Estimate the generalized classification error. genError = kfoldLoss(CVMdl) genError = 0.0400
The generalized classification error is 4%, which indicates that the ECOC classifier generalizes fairly well.
Estimate Posterior Probabilities Using ECOC Classifier Train an ECOC classifier using SVM binary learners. First predict the training-sample labels and class posterior probabilities. Then predict the maximum class posterior probability at each point in a grid. Visualize the results. Load Fisher's iris data set. Specify the petal dimensions as the predictors and the species names as the response. load fisheriris X = meas(:,3:4); Y = species; rng(1); % For reproducibility
Create an SVM template. Standardize the predictors, and specify the Gaussian kernel. t = templateSVM('Standardize',true,'KernelFunction','gaussian');
t is an SVM template. Most of its properties are empty. When the software trains the ECOC classifier, it sets the applicable properties to their default values. Train the ECOC classifier using the SVM template. Transform classification scores to class posterior probabilities (which are returned by predict or resubPredict) using the 'FitPosterior' namevalue pair argument. Specify the class order using the 'ClassNames' name-value pair argument. Display diagnostic messages during training by using the 'Verbose' name-value pair argument. Mdl = fitcecoc(X,Y,'Learners',t,'FitPosterior',true,... 'ClassNames',{'setosa','versicolor','virginica'},... 'Verbose',2); Training binary learner 1 (SVM) out of 3 with 50 negative and 50 positive observations. Negative class indices: 2 Positive class indices: 1 Fitting posterior probabilities for learner 1 (SVM). Training binary learner 2 (SVM) out of 3 with 50 negative and 50 positive observations. Negative class indices: 3 Positive class indices: 1 Fitting posterior probabilities for learner 2 (SVM). Training binary learner 3 (SVM) out of 3 with 50 negative and 50 positive observations. Negative class indices: 3 Positive class indices: 2 Fitting posterior probabilities for learner 3 (SVM).
Mdl is a ClassificationECOC model. The same SVM template applies to each binary learner, but you can adjust options for each binary learner by passing in a cell vector of templates. Predict the training-sample labels and class posterior probabilities. Display diagnostic messages during the computation of labels and class posterior probabilities by using the 'Verbose' namevalue pair argument. 35-2211
[label,~,~,Posterior] = resubPredict(Mdl,'Verbose',1); Predictions from all learners have been computed. Loss for all observations has been computed. Computing posterior probabilities... Mdl.BinaryLoss ans = 'quadratic'
The software assigns an observation to the class that yields the smallest average binary loss. Because all binary learners are computing posterior probabilities, the binary loss function is quadratic. Display a random set of results.
idx = randsample(size(X,1),10,1);
Mdl.ClassNames
ans = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }
table(Y(idx),label(idx),Posterior(idx,:),...
    'VariableNames',{'TrueLabel','PredLabel','Posterior'})
ans=10×3 table
      TrueLabel          PredLabel                     Posterior
    ______________    ______________    ______________________________________
    {'virginica' }    {'virginica' }     0.0039319     0.0039866       0.99208
    {'virginica' }    {'virginica' }      0.017066      0.018262       0.96467
    {'virginica' }    {'virginica' }      0.014947      0.015855        0.9692
    {'versicolor'}    {'versicolor'}    2.2197e-14       0.87318       0.12682
    {'setosa'    }    {'setosa'    }         0.999    0.00025091    0.00074639
    {'versicolor'}    {'virginica' }    2.2195e-14      0.059427       0.94057
    {'versicolor'}    {'versicolor'}    2.2194e-14       0.97002      0.029984
    {'setosa'    }    {'setosa'    }         0.999     0.0002499    0.00074741
    {'versicolor'}    {'versicolor'}     0.0085638       0.98259     0.0088482
    {'setosa'    }    {'setosa'    }         0.999    0.00025013    0.00074718
The columns of Posterior correspond to the class order of Mdl.ClassNames. Define a grid of values in the observed predictor space. Predict the posterior probabilities for each instance in the grid. xMax = max(X); xMin = min(X); x1Pts = linspace(xMin(1),xMax(1)); x2Pts = linspace(xMin(2),xMax(2)); [x1Grid,x2Grid] = meshgrid(x1Pts,x2Pts); [~,~,~,PosteriorRegion] = predict(Mdl,[x1Grid(:),x2Grid(:)]);
For each coordinate on the grid, plot the maximum class posterior probability among all classes. 35-2212
contourf(x1Grid,x2Grid,... reshape(max(PosteriorRegion,[],2),size(x1Grid,1),size(x1Grid,2))); h = colorbar; h.YLabel.String = 'Maximum posterior'; h.YLabel.FontSize = 15; hold on gh = gscatter(X(:,1),X(:,2),Y,'krk','*xd',8); gh(2).LineWidth = 2; gh(3).LineWidth = 2; title('Iris Petal Measurements and Maximum Posterior') xlabel('Petal length (cm)') ylabel('Petal width (cm)') axis tight legend(gh,'Location','NorthWest') hold off
Speed Up Training ECOC Classifiers Using Binning and Parallel Computing Train a one-versus-all ECOC classifier using a GentleBoost ensemble of decision trees with surrogate splits. To speed up training, bin numeric predictors and use parallel computing. Binning is valid only when fitcecoc uses a tree learner. After training, estimate the classification error using 10-fold cross-validation. Note that parallel computing requires Parallel Computing Toolbox™. 35-2213
Load Sample Data
Load and inspect the arrhythmia data set.
load arrhythmia
[n,p] = size(X)
n = 452
p = 279
isLabels = unique(Y);
nLabels = numel(isLabels)
nLabels = 13
tabulate(categorical(Y))
  Value    Count   Percent
      1      245    54.20%
      2       44     9.73%
      3       15     3.32%
      4       15     3.32%
      5       13     2.88%
      6       25     5.53%
      7        3     0.66%
      8        2     0.44%
      9        9     1.99%
     10       50    11.06%
     14        4     0.88%
     15        5     1.11%
     16       22     4.87%
The data set contains 279 predictors, and the sample size of 452 is relatively small. Of the 16 distinct labels, only 13 are represented in the response (Y). Each label describes various degrees of arrhythmia, and 54.20% of the observations are in class 1. Train One-Versus-All ECOC Classifier Create an ensemble template. You must specify at least three arguments: a method, a number of learners, and the type of learner. For this example, specify 'GentleBoost' for the method, 100 for the number of learners, and a decision tree template that uses surrogate splits because there are missing observations. tTree = templateTree('surrogate','on'); tEnsemble = templateEnsemble('GentleBoost',100,tTree);
tEnsemble is a template object. Most of its properties are empty, but the software fills them with their default values during training. Train a one-versus-all ECOC classifier using the ensembles of decision trees as binary learners. To speed up training, use binning and parallel computing. • Binning ('NumBins',50) — When you have a large training data set, you can speed up training (a potential decrease in accuracy) by using the 'NumBins' name-value pair argument. This argument is valid only when fitcecoc uses a tree learner. If you specify the 'NumBins' value, then the software bins every numeric predictor into a specified number of equiprobable bins, and 35-2214
then grows trees on the bin indices instead of the original data. You can try 'NumBins',50 first, and then change the 'NumBins' value depending on the accuracy and training speed. • Parallel computing ('Options',statset('UseParallel',true)) — With a Parallel Computing Toolbox license, you can speed up the computation by using parallel computing, which sends each binary learner to a worker in the pool. The number of workers depends on your system configuration. When you use decision trees for binary learners, fitcecoc parallelizes training using Intel® Threading Building Blocks (TBB) for dual-core systems and above. Therefore, specifying the 'UseParallel' option is not helpful on a single computer. Use this option on a cluster. Additionally, specify that the prior probabilities are 1/K, where K = 13 is the number of distinct classes. options = statset('UseParallel',true); Mdl = fitcecoc(X,Y,'Coding','onevsall','Learners',tEnsemble,... 'Prior','uniform','NumBins',50,'Options',options); Starting parallel pool (parpool) using the 'local' profile ... Connected to the parallel pool (number of workers: 6).
Mdl is a ClassificationECOC model. Cross-Validation Cross-validate the ECOC classifier using 10-fold cross-validation. CVMdl = crossval(Mdl,'Options',options); Warning: One or more folds do not contain points from all the groups.
CVMdl is a ClassificationPartitionedECOC model. The warning indicates that some classes are not represented while the software trains at least one fold. Therefore, those folds cannot predict labels for the missing classes. You can inspect the results of a fold using cell indexing and dot notation. For example, access the results of the first fold by entering CVMdl.Trained{1}. Use the cross-validated ECOC classifier to predict validation-fold labels. You can compute the confusion matrix by using confusionchart. Move and resize the chart by changing the inner position property to ensure that the percentages appear in the row summary. oofLabel = kfoldPredict(CVMdl,'Options',options); ConfMat = confusionchart(Y,oofLabel,'RowSummary','total-normalized'); ConfMat.InnerPosition = [0.10 0.12 0.85 0.85];
Reproduce Binned Data Reproduce binned predictor data by using the BinEdges property of the trained model and the discretize function. X = Mdl.X; % Predictor data Xbinned = zeros(size(X)); edges = Mdl.BinEdges; % Find indices of binned predictors. idxNumeric = find(~cellfun(@isempty,edges)); if iscolumn(idxNumeric) idxNumeric = idxNumeric'; end for j = idxNumeric x = X(:,j); % Convert x to array if x is a table. if istable(x) x = table2array(x); end % Group x into bins by using the discretize function. xbinned = discretize(x,[-inf; edges{j}; inf]); Xbinned(:,j) = xbinned; end
Xbinned contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.
Optimize ECOC Classifier Optimize hyperparameters automatically using fitcecoc. Load the fisheriris data set. load fisheriris X = meas; Y = species;
Find hyperparameters that minimize five-fold cross-validation loss by using automatic hyperparameter optimization. For reproducibility, set the random seed and use the 'expectedimprovement-plus' acquisition function. rng default Mdl = fitcecoc(X,Y,'OptimizeHyperparameters','auto',... 'HyperparameterOptimizationOptions',struct('AcquisitionFunctionName',... 'expected-improvement-plus'))
(The iterative display for the 30 objective function evaluations appears at the command line; it is abbreviated here. The best observed objective value, 0.02, first occurs at iteration 22.)
__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 86.129 seconds
Total objective function evaluation time: 67.3393
Best observed feasible point:
    Coding      BoxConstraint    KernelScale    Standardize
    ________    _____________    ___________    ___________
    onevsone      0.012432         0.18646         false
Observed objective function value = 0.02
Estimated objective function value = 0.041264
Function evaluation time = 0.43797
Best estimated feasible point (according to models):
    Coding      BoxConstraint    KernelScale    Standardize
    ________    _____________    ___________    ___________
    onevsone      0.0010314       0.036064         false
Estimated objective function value = 0.022163
Estimated function evaluation time = 0.43353
Mdl =
  ClassificationECOC
                         ResponseName: 'Y'
                CategoricalPredictors: []
                           ClassNames: {'setosa'  'versicolor'  'virginica'}
                       ScoreTransform: 'none'
                       BinaryLearners: {3x1 cell}
                           CodingName: 'onevsone'
    HyperparameterOptimizationResults: [1x1 BayesianOptimization]
Train Multiclass ECOC Model with SVMs and Tall Arrays Create two multiclass ECOC models trained on tall data. Use linear binary learners for one of the models and kernel binary learners for the other. Compare the resubstitution classification error of the two models. In general, you can perform multiclass classification of tall data by using fitcecoc with linear or kernel binary learners. When you use fitcecoc to train a model on tall arrays, you cannot use SVM binary learners directly. However, you can use either linear or kernel binary classification models that use SVMs. When you perform calculations on tall arrays, MATLAB® uses either a parallel pool (default if you have Parallel Computing Toolbox™) or the local MATLAB session. If you want to run the example
using the local MATLAB session when you have Parallel Computing Toolbox, you can change the global execution environment by using the mapreducer function. Create a datastore that references the folder containing Fisher's iris data set. Specify 'NA' values as missing data so that datastore replaces them with NaN values. Create tall versions of the predictor and response data. ds = datastore('fisheriris.csv','TreatAsMissing','NA'); t = tall(ds); Starting parallel pool (parpool) using the 'local' profile ... Connected to the parallel pool (number of workers: 6). X = [t.SepalLength t.SepalWidth t.PetalLength t.PetalWidth]; Y = t.Species;
Standardize the predictor data. Z = zscore(X);
Train a multiclass ECOC model that uses tall data and linear binary learners. By default, when you pass tall arrays to fitcecoc, the software trains linear binary learners that use SVMs. Because the response data contains only three unique classes, change the coding scheme from one-versus-all (which is the default when you use tall data) to one-versus-one (which is the default when you use inmemory data). For reproducibility, set the seeds of the random number generators using rng and tallrng. The results can vary depending on the number of workers and the execution environment for the tall arrays. For details, see “Control Where Your Code Runs”. rng('default') tallrng('default') mdlLinear = fitcecoc(Z,Y,'Coding','onevsone') Training binary learner 1 (Linear) out of 3. Training binary learner 2 (Linear) out of 3. Training binary learner 3 (Linear) out of 3. mdlLinear = CompactClassificationECOC ResponseName: 'Y' ClassNames: {'setosa' 'versicolor' ScoreTransform: 'none' BinaryLearners: {3×1 cell} CodingMatrix: [3×3 double]
'virginica'}
Properties, Methods
mdlLinear is a CompactClassificationECOC model composed of three binary learners. Train a multiclass ECOC model that uses tall data and kernel binary learners. First, create a templateKernel object to specify the properties of the kernel binary learners; in particular, increase the number of expansion dimensions to 2^16.
tKernel = templateKernel('NumExpansionDimensions',2^16)
tKernel =
Fit template for classification Kernel.
             BetaTolerance: []
                 BlockSize: []
             BoxConstraint: []
                   Epsilon: []
    NumExpansionDimensions: 65536
         GradientTolerance: []
        HessianHistorySize: []
            IterationLimit: []
               KernelScale: []
                    Lambda: []
                   Learner: 'svm'
              LossFunction: []
                    Stream: []
            VerbosityLevel: []
                   Version: 1
                    Method: 'Kernel'
                      Type: 'classification'
By default, the kernel binary learners use SVMs. Pass the templateKernel object to fitcecoc and change the coding scheme to one-versus-one. mdlKernel = fitcecoc(Z,Y,'Learners',tKernel,'Coding','onevsone') Training binary learner 1 (Kernel) out of 3. Training binary learner 2 (Kernel) out of 3. Training binary learner 3 (Kernel) out of 3. mdlKernel = CompactClassificationECOC ResponseName: 'Y' ClassNames: {'setosa' 'versicolor' ScoreTransform: 'none' BinaryLearners: {3×1 cell} CodingMatrix: [3×3 double]
'virginica'}
Properties, Methods
mdlKernel is also a CompactClassificationECOC model composed of three binary learners. Compare the resubstitution classification error of the two models. errorLinear = gather(loss(mdlLinear,Z,Y)) Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 1.4 sec Evaluation completed in 1.6 sec errorLinear = 0.0333 errorKernel = gather(loss(mdlKernel,Z,Y))
Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 15 sec Evaluation completed in 16 sec errorKernel = 0.0067
mdlKernel misclassifies a smaller percentage of the training data than mdlLinear.
Input Arguments Tbl — Sample data table Sample data, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one predictor. Optionally, Tbl can contain one additional column for the response variable. Multicolumn variables and cell arrays other than cell arrays of character vectors are not accepted. If Tbl contains the response variable, and you want to use all remaining variables in Tbl as predictors, then specify the response variable using ResponseVarName. If Tbl contains the response variable, and you want to use only a subset of the remaining variables in Tbl as predictors, specify a formula using formula. If Tbl does not contain the response variable, specify a response variable using Y. The length of response variable and the number of Tbl rows must be equal. Data Types: table ResponseVarName — Response variable name name of variable in Tbl Response variable name, specified as the name of a variable in Tbl. You must specify ResponseVarName as a character vector or string scalar. For example, if the response variable Y is stored as Tbl.Y, then specify it as "Y". Otherwise, the software treats all columns of Tbl, including Y, as predictors when training the model. The response variable must be a categorical, character, or string array; a logical or numeric vector; or a cell array of character vectors. If Y is a character array, then each element of the response variable must correspond to one row of the array. A good practice is to specify the order of the classes by using the ClassNames name-value argument. Data Types: char | string formula — Explanatory model of response variable and subset of predictor variables character vector | string scalar Explanatory model of the response variable and a subset of the predictor variables, specified as a character vector or string scalar in the form "Y~x1+x2+x3". In this form, Y represents the response variable, and x1, x2, and x3 represent the predictor variables. 35-2222
To specify a subset of variables in Tbl as predictors for training the model, use a formula. If you specify a formula, then the software does not use any variables in Tbl that do not appear in formula. The variable names in the formula must be both variable names in Tbl (Tbl.Properties.VariableNames) and valid MATLAB identifiers. You can verify the variable names in Tbl by using the isvarname function. If the variable names are not valid, then you can convert them by using the matlab.lang.makeValidName function. Data Types: char | string Y — Class labels categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors Class labels to which the ECOC model is trained, specified as a categorical, character, or string array, logical or numeric vector, or cell array of character vectors. If Y is a character array, then each element must correspond to one row of the array. The length of Y and the number of rows of Tbl or X must be equal. It is good practice to specify the class order using the ClassNames name-value pair argument. Data Types: categorical | char | string | logical | single | double | cell X — Predictor data full matrix | sparse matrix Predictor data, specified as a full or sparse matrix. The length of Y and the number of observations in X must be equal. To specify the names of the predictors in the order of their appearance in X, use the PredictorNames name-value pair argument. Note • For linear classification learners, if you orient X so that observations correspond to columns and specify 'ObservationsIn','columns', then you can experience a significant reduction in optimization-execution time. • For all other learners, orient X so that observations correspond to rows. • fitcecoc supports sparse matrices for training linear classification models only.
Data Types: double | single
Note The software treats NaN, empty character vector (''), empty string (""), <missing>, and <undefined> elements as missing data. The software removes rows of X corresponding to missing values in Y. However, the treatment of missing values in X varies among binary learners. For details, see the training functions for your binary learners: fitcdiscr, fitckernel, fitcknn, fitclinear, fitcnb, fitcsvm, fitctree, or fitcensemble. Removing observations decreases the effective training or cross-validation sample size.
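To illustrate the different input forms described above, here is a hedged sketch that builds a small table T; the variable names X1, X2, X3, and Species are illustrative only:
load fisheriris
T = array2table(meas(:,1:3),'VariableNames',{'X1','X2','X3'});
T.Species = species;
Mdl1 = fitcecoc(T,'Species');          % table plus response variable name
Mdl2 = fitcecoc(T,'Species~X1+X2');    % table plus formula (subset of predictors)
Mdl3 = fitcecoc(meas,species);         % numeric matrix X plus class labels Y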
Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Note You cannot use any cross-validation name-value argument together with the 'OptimizeHyperparameters' name-value argument. You can modify the cross-validation for 'OptimizeHyperparameters' only by using the 'HyperparameterOptimizationOptions' name-value argument. Example: 'Learners','tree','Coding','onevsone','CrossVal','on' specifies to use decision trees for all binary learners, a one-versus-one coding design, and to implement 10-fold crossvalidation. ECOC Classifier Options
Coding — Coding design 'onevsone' (default) | 'allpairs' | 'binarycomplete' | 'denserandom' | 'onevsall' | 'ordinal' | 'sparserandom' | 'ternarycomplete' | numeric matrix Coding design name, specified as the comma-separated pair consisting of 'Coding' and a numeric matrix or a value in this table.
Value | Number of Binary Learners | Description
'allpairs' and 'onevsone' | K(K – 1)/2 | For each binary learner, one class is positive, another is negative, and the software ignores the rest. This design exhausts all combinations of class pair assignments.
'binarycomplete' | 2^(K – 1) – 1 | This design partitions the classes into all binary combinations, and does not ignore any classes. For each binary learner, all class assignments are –1 and 1 with at least one positive class and one negative class in the assignment.
'denserandom' | Random, but approximately 10 log2(K) | For each binary learner, the software randomly assigns classes into positive or negative classes, with at least one of each type. For more details, see “Random Coding Design Matrices” on page 35-2247.
'onevsall' | K | For each binary learner, one class is positive and the rest are negative. This design exhausts all combinations of positive class assignments.
'ordinal' | K – 1 | For the first binary learner, the first class is negative and the rest are positive. For the second binary learner, the first two classes are negative and the rest are positive, and so on.
'sparserandom' | Random, but approximately 15 log2(K) | For each binary learner, the software randomly assigns classes as positive or negative with probability 0.25 for each, and ignores classes with probability 0.5. For more details, see “Random Coding Design Matrices” on page 35-2247.
'ternarycomplete' | (3^K – 2^(K + 1) + 1)/2 | This design partitions the classes into all ternary combinations. All class assignments are 0, –1, and 1 with at least one positive class and one negative class in each assignment.
You can also specify a coding design using a custom coding matrix, which is a K-by-L matrix. Each row corresponds to a class and each column corresponds to a binary learner. The class order (rows) corresponds to the order in ClassNames. Create the matrix by following these guidelines: • Every element of the custom coding matrix must be –1, 0, or 1, and the value must correspond to a dichotomous class assignment. Consider Coding(i,j), the class that learner j assigns to observations in class i. Value
Dichotomous Class Assignment
–1
Learner j assigns observations in class i to a negative class.
0
Before training, learner j removes observations in class i from the data set.
1
Learner j assigns observations in class i to a positive class.
• Every column must contain at least one –1 and one 1. • For all column indices i,j where i ≠ j, Coding(:,i) cannot equal Coding(:,j), and Coding(:,i) cannot equal –Coding(:,j). • All rows of the custom coding matrix must be different. For more details on the form of custom coding design matrices, see “Custom Coding Design Matrices” on page 35-2246. 35-2225
35
Functions
Example: 'Coding','ternarycomplete' Data Types: char | string | double | single | int16 | int32 | int64 | int8 FitPosterior — Flag indicating whether to transform scores to posterior probabilities false or 0 (default) | true or 1 Flag indicating whether to transform scores to posterior probabilities, specified as the commaseparated pair consisting of 'FitPosterior' and a true (1) or false (0). If FitPosterior is true, then the software transforms binary-learner classification scores to posterior probabilities. You can obtain posterior probabilities by using kfoldPredict, predict, or resubPredict. fitcecoc does not support fitting posterior probabilities if: • The ensemble method is AdaBoostM2, LPBoost, RUSBoost, RobustBoost, or TotalBoost. • The binary learners (Learners) are linear or kernel classification models that implement SVM. To obtain posterior probabilities for linear or kernel classification models, implement logistic regression instead. Example: 'FitPosterior',true Data Types: logical Learners — Binary learner templates 'svm' (default) | 'discriminant' | 'kernel' | 'knn' | 'linear' | 'naivebayes' | 'tree' | template object | cell vector of template objects Binary learner templates, specified as the comma-separated pair consisting of 'Learners' and a character vector, string scalar, template object, or cell vector of template objects. Specifically, you can specify binary classifiers such as SVM, and the ensembles that use GentleBoost, LogitBoost, and RobustBoost, to solve multiclass problems. However, fitcecoc also supports multiclass models as binary classifiers. • If Learners is a character vector or string scalar, then the software trains each binary learner using the default values of the specified algorithm. This table summarizes the available algorithms.
35-2226
Value
Description
'discriminant'
Discriminant analysis. For default options, see templateDiscriminant.
'kernel'
Kernel classification model. For default options, see templateKernel.
'knn'
k-nearest neighbors. For default options, see templateKNN.
'linear'
Linear classification model. For default options, see templateLinear.
'naivebayes'
Naive Bayes. For default options, see templateNaiveBayes.
'svm'
SVM. For default options, see templateSVM.
fitcecoc
Value
Description
'tree'
Classification trees. For default options, see templateTree.
• If Learners is a template object, then each binary learner trains according to the stored options. You can create a template object using: • templateDiscriminant, for discriminant analysis. • templateEnsemble, for ensemble learning. You must at least specify the learning method (Method), the number of learners (NLearn), and the type of learner (Learners). You cannot use the AdaBoostM2 ensemble method for binary learning. • templateKernel, for kernel classification. • templateKNN, for k-nearest neighbors. • templateLinear, for linear classification. • templateNaiveBayes, for naive Bayes. • templateSVM, for SVM. • templateTree, for classification trees. • If Learners is a cell vector of template objects, then: • Cell j corresponds to binary learner j (in other words, column j of the coding design matrix), and the cell vector must have length L. L is the number of columns in the coding design matrix. For details, see Coding. • To use one of the built-in loss functions for prediction, then all binary learners must return a score in the same range. For example, you cannot include default SVM binary learners with default naive Bayes binary learners. The former returns a score in the range (-∞,∞), and the latter returns a posterior probability as a score. Otherwise, you must provide a custom loss as a function handle to functions such as predict and loss. • You cannot specify linear classification model learner templates with any other template. • Similarly, you cannot specify kernel classification model learner templates with any other template. By default, the software trains learners using default SVM templates. Example: 'Learners','tree' NumBins — Number of bins for numeric predictors [](empty) (default) | positive integer scalar Number of bins for numeric predictors, specified as the comma-separated pair consisting of 'NumBins' and a positive integer scalar. This argument is valid only when fitcecoc uses a tree learner, that is, 'Learners' is either 'tree' or a template object created by using templateTree, or a template object created by using templateEnsemble with tree weak learners. • If the 'NumBins' value is empty (default), then fitcecoc does not bin any predictors. • If you specify the 'NumBins' value as a positive integer scalar (numBins), then fitcecoc bins every numeric predictor into at most numBins equiprobable bins, and then grows trees on the bin indices instead of the original data. • The number of bins can be less than numBins if a predictor has fewer than numBins unique values. 35-2227
35
Functions
• fitcecoc does not bin categorical predictors. When you use a large training data set, this binning option speeds up training but might cause a potential decrease in accuracy. You can try 'NumBins',50 first, and then change the value depending on the accuracy and training speed. A trained model stores the bin edges in the BinEdges property. Example: 'NumBins',50 Data Types: single | double NumConcurrent — Number of binary learners concurrently trained 1 (default) | positive integer scalar Number of binary learners concurrently trained, specified as the comma-separated pair consisting of 'NumConcurrent' and a positive integer scalar. The default value is 1, which means fitcecoc trains the binary learners sequentially. Note This option applies only when you use fitcecoc on tall arrays. See “Tall Arrays” on page 352249 for more information. Data Types: single | double ObservationsIn — Predictor data observation dimension 'rows' (default) | 'columns' Predictor data observation dimension, specified as the comma-separated pair consisting of 'ObservationsIn' and 'columns' or 'rows'. Note • For linear classification learners, if you orient X so that observations correspond to columns and specify 'ObservationsIn','columns', then you can experience a significant reduction in optimization-execution time. • For all other learners, orient X so that observations correspond to rows.
Example: 'ObservationsIn','columns' Verbose — Verbosity level 0 (default) | 1 | 2 Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and 0, 1, or 2. Verbose controls the amount of diagnostic information per binary learner that the software displays in the Command Window. This table summarizes the available verbosity level options.
35-2228
fitcecoc
Value
Description
0
The software does not display diagnostic information.
1
The software displays diagnostic messages every time it trains a new binary learner.
2
The software displays extra diagnostic messages every time it trains a new binary learner.
Each binary learner has its own verbosity level that is independent of this name-value pair argument. To change the verbosity level of a binary learner, create a template object and specify the 'Verbose' name-value pair argument. Then, pass the template object to fitcecoc by using the 'Learners' name-value pair argument. Example: 'Verbose',1 Data Types: double | single Cross-Validation Options
CrossVal — Flag to train cross-validated classifier 'off' (default) | 'on' Flag to train a cross-validated classifier, specified as the comma-separated pair consisting of 'Crossval' and 'on' or 'off'. If you specify 'on', then the software trains a cross-validated classifier with 10 folds. You can override this cross-validation setting using one of the CVPartition, Holdout, KFold, or Leaveout name-value pair arguments. You can only use one cross-validation name-value pair argument at a time to create a cross-validated model. Alternatively, cross-validate later by passing Mdl to crossval. Example: 'Crossval','on' CVPartition — Cross-validation partition [] (default) | cvpartition object Cross-validation partition, specified as a cvpartition object that specifies the type of crossvalidation and the indexing for the training and validation sets. To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,KFold=5). Then, you can specify the cross-validation partition by setting CVPartition=cvp. Holdout — Fraction of data for holdout validation scalar value in the range (0,1) Fraction of the data used for holdout validation, specified as a scalar value in the range [0,1]. If you specify Holdout=p, then the software completes these steps: 1
Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data. 35-2229
35
Functions
2
Store the compact trained model in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: Holdout=0.1 Data Types: double | single KFold — Number of folds 10 (default) | positive integer value greater than 1 Number of folds to use in the cross-validated model, specified as a positive integer value greater than 1. If you specify KFold=k, then the software completes these steps: 1
Randomly partition the data into k sets.
2
For each set, reserve the set as validation data, and train the model using the other k – 1 sets.
3
Store the k compact trained models in a k-by-1 cell vector in the Trained property of the crossvalidated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: KFold=5 Data Types: single | double Leaveout — Leave-one-out cross-validation flag 'off' (default) | 'on' Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and 'on' or 'off'. If you specify 'Leaveout','on', then, for each of the n observations, where n is size(Mdl.X,1), the software: 1
Reserves the observation as validation data, and trains the model using the other n – 1 observations
2
Stores the n compact, trained models in the cells of a n-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can use one of these four options only: CVPartition, Holdout, KFold, or Leaveout.
Note: Leave-one-out is not recommended for cross-validating ECOC models composed of linear or kernel classification model learners.
Example: 'Leaveout','on'

Other Classification Options
CategoricalPredictors — Categorical predictors list vector of positive integers | logical vector | character matrix | string array | cell array of character vectors | 'all' Categorical predictors list, specified as one of the values in this table. 35-2230
Vector of positive integers — Each entry in the vector is an index value indicating that the corresponding predictor is categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If fitcecoc uses a subset of input variables as predictors, then the function indexes the predictors using only the subset. The CategoricalPredictors values do not count the response variable, observation weights variable, or any other variables that the function does not use.
Logical vector — A true entry means that the corresponding predictor is categorical. The length of the vector is p.
Character matrix — Each row of the matrix is the name of a predictor variable. The names must match the entries in PredictorNames. Pad the names with extra blanks so each row of the character matrix has the same length.
String array or cell array of character vectors — Each element in the array is the name of a predictor variable. The names must match the entries in PredictorNames.
"all" — All predictors are categorical.
Specification of 'CategoricalPredictors' is appropriate if: • At least one predictor is categorical and all binary learners are classification trees, naive Bayes learners, SVMs, linear learners, kernel learners, or ensembles of classification trees. • All predictors are categorical and at least one binary learner is kNN. If you specify 'CategoricalPredictors' for any other learner, then the software warns that it cannot train that binary learner. For example, the software cannot train discriminant analysis classifiers using categorical predictors. Each learner identifies and treats categorical predictors in the same way as the fitting function corresponding to the learner. See 'CategoricalPredictors' of fitckernel for kernel learners, 'CategoricalPredictors' of fitcknn for k-nearest learners, 'CategoricalPredictors' of fitclinear for linear learners, 'CategoricalPredictors' of fitcnb for naive Bayes learners, 'CategoricalPredictors' of fitcsvm for SVM learners, and 'CategoricalPredictors' of fitctree for tree learners. Example: 'CategoricalPredictors','all' Data Types: single | double | logical | char | string | cell ClassNames — Names of classes to use for training categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors Names of classes to use for training, specified as a categorical, character, or string array; a logical or numeric vector; or a cell array of character vectors. ClassNames must have the same data type as the response variable in Tbl or Y. If ClassNames is a character array, then each element must correspond to one row of the array. Use ClassNames to: 35-2231
• Specify the order of the classes during training. • Specify the order of any input or output argument dimension that corresponds to the class order. For example, use ClassNames to specify the order of the dimensions of Cost or the column order of classification scores returned by predict. • Select a subset of classes for training. For example, suppose that the set of all distinct class names in Y is ["a","b","c"]. To train the model using observations from classes "a" and "c" only, specify "ClassNames",["a","c"]. The default value for ClassNames is the set of all distinct class names in the response variable in Tbl or Y. Example: "ClassNames",["b","g"] Data Types: categorical | char | string | logical | single | double | cell Cost — Misclassification cost square matrix | structure array Misclassification cost, specified as the comma-separated pair consisting of 'Cost' and a square matrix or structure. If you specify: • The square matrix Cost, then Cost(i,j) is the cost of classifying a point into class j if its true class is i. That is, the rows correspond to the true class and the columns correspond to the predicted class. To specify the class order for the corresponding rows and columns of Cost, additionally specify the ClassNames name-value pair argument. • The structure S, then it must have two fields: • S.ClassNames, which contains the class names as a variable of the same data type as Y • S.ClassificationCosts, which contains the cost matrix with rows and columns ordered as in S.ClassNames The default is ones(K) - eye(K), where K is the number of distinct classes. Example: 'Cost',[0 1 2 ; 1 0 2; 2 2 0] Data Types: double | single | struct Options — Parallel computing options [] (default) | structure array returned by statset Parallel computing options, specified as the comma-separated pair consisting of 'Options' and a structure array returned by statset. Parallel computation requires Parallel Computing Toolbox. fitcecoc uses 'Streams', 'UseParallel', and 'UseSubtreams' fields. This table summarizes the available options.
'Streams' — A RandStream object or cell array of such objects. If you do not specify Streams, the software uses the default stream or streams. If you specify Streams, use a single object except when the following are true:
• You have an open parallel pool.
• UseParallel is true.
• UseSubstreams is false.
In that case, use a cell array of the same size as the parallel pool. If a parallel pool is not open, then the software tries to open one (depending on your preferences), and Streams must supply a single random number stream.
'UseParallel' — If you have Parallel Computing Toolbox, then you can invoke a pool of workers by setting 'UseParallel',true. The fitcecoc function sends each binary learner to a worker in the pool. When you use decision trees for binary learners, fitcecoc parallelizes training using Intel Threading Building Blocks (TBB) for dual-core systems and above. Therefore, specifying the 'UseParallel' option is not helpful on a single computer. Use this option on a cluster. For details on Intel TBB, see https://www.intel.com/content/www/us/en/developer/tools/oneapi/onetbb.html.
'UseSubstreams' — Set to true to compute in parallel using the stream specified by 'Streams'. Default is false. For example, set Streams to a type allowing substreams, such as 'mlfg6331_64' or 'mrg32k3a'.
A best practice to ensure more predictable results is to use parpool and explicitly create a parallel pool before you invoke parallel computing using fitcecoc. Example: 'Options',statset('UseParallel',true) Data Types: struct PredictorNames — Predictor variable names string array of unique names | cell array of unique character vectors Predictor variable names, specified as a string array of unique names or cell array of unique character vectors. The functionality of PredictorNames depends on the way you supply the training data. • If you supply X and Y, then you can use PredictorNames to assign names to the predictor variables in X. 35-2233
• The order of the names in PredictorNames must correspond to the column order of X. That is, PredictorNames{1} is the name of X(:,1), PredictorNames{2} is the name of X(:,2), and so on. Also, size(X,2) and numel(PredictorNames) must be equal.
• By default, PredictorNames is {'x1','x2',...}.
• If you supply Tbl, then you can use PredictorNames to choose which predictor variables to use in training. That is, fitcecoc uses only the predictor variables in PredictorNames and the response variable during training.
• PredictorNames must be a subset of Tbl.Properties.VariableNames and cannot include the name of the response variable.
• By default, PredictorNames contains the names of all predictor variables.
• A good practice is to specify the predictors for training using either PredictorNames or formula, but not both.
Example: "PredictorNames", ["SepalLength","SepalWidth","PetalLength","PetalWidth"]
Data Types: string | cell

Prior — Prior probabilities
'empirical' (default) | 'uniform' | numeric vector | structure array

Prior probabilities for each class, specified as the comma-separated pair consisting of 'Prior' and a value in this table.
'empirical' — The class prior probabilities are the class relative frequencies in Y.
'uniform' — All class prior probabilities are equal to 1/K, where K is the number of classes.
numeric vector — Each element is a class prior probability. Order the elements according to Mdl.ClassNames or specify the order using the ClassNames name-value pair argument. The software normalizes the elements such that they sum to 1.
structure — A structure S with two fields:
• S.ClassNames contains the class names as a variable of the same type as Y.
• S.ClassProbs contains a vector of corresponding prior probabilities. The software normalizes the elements such that they sum to 1.
For more details on how the software incorporates class prior probabilities, see “Prior Probabilities and Misclassification Cost” on page 35-2246. Example: struct('ClassNames', {{'setosa','versicolor','virginica'}},'ClassProbs',1:3) Data Types: single | double | char | string | struct 35-2234
ResponseName — Response variable name "Y" (default) | character vector | string scalar Response variable name, specified as a character vector or string scalar. • If you supply Y, then you can use ResponseName to specify a name for the response variable. • If you supply ResponseVarName or formula, then you cannot use ResponseName. Example: "ResponseName","response" Data Types: char | string Weights — Observation weights numeric vector of positive values | name of variable in Tbl Observation weights, specified as the comma-separated pair consisting of 'Weights' and a numeric vector of positive values or name of a variable in Tbl. The software weighs the observations in each row of X or Tbl with the corresponding value in Weights. The size of Weights must equal the number of rows of X or Tbl. If you specify the input data as a table Tbl, then Weights can be the name of a variable in Tbl that contains a numeric vector. In this case, you must specify Weights as a character vector or string scalar. For example, if the weights vector W is stored as Tbl.W, then specify it as 'W'. Otherwise, the software treats all columns of Tbl, including W, as predictors or the response when training the model. The software normalizes Weights to sum up to the value of the prior probability in the respective class. By default, Weights is ones(n,1), where n is the number of observations in X or Tbl. Data Types: double | single | char | string
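As an illustration of passing observation weights directly, here is a minimal sketch using the Fisher iris data; the weight values are arbitrary and purely illustrative:

load fisheriris
w = ones(numel(species),1);          % one weight per observation
w(strcmp(species,'setosa')) = 2;     % arbitrary upweighting, for illustration only
Mdl = fitcecoc(meas,species,'Weights',w);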
Hyperparameter Optimization
OptimizeHyperparameters — Parameters to optimize 'none' (default) | 'auto' | 'all' | string array or cell array of eligible parameter names | vector of optimizableVariable objects Parameters to optimize, specified as the comma-separated pair consisting of 'OptimizeHyperparameters' and one of the following: • 'none' — Do not optimize. • 'auto' — Use {'Coding'} along with the default parameters for the specified Learners: • Learners = 'svm' (default) — {'BoxConstraint','KernelScale','Standardize'} • Learners = 'discriminant' — {'Delta','Gamma'} • Learners = 'kernel' — {'KernelScale','Lambda','Standardize'} • Learners = 'knn' — {'Distance','NumNeighbors','Standardize'} • Learners = 'linear' — {'Lambda','Learner'} • Learners = 'tree' — {'MinLeafSize'} 35-2235
• 'all' — Optimize all eligible parameters.
• String array or cell array of eligible parameter names
• Vector of optimizableVariable objects, typically the output of hyperparameters

The optimization attempts to minimize the cross-validation loss (error) for fitcecoc by varying the parameters. For information about cross-validation loss in a different context, see "Classification Loss" on page 35-4305. To control the cross-validation type and other aspects of the optimization, use the HyperparameterOptimizationOptions name-value pair.

Note: The values of 'OptimizeHyperparameters' override any values you specify using other name-value arguments. For example, setting 'OptimizeHyperparameters' to 'auto' causes fitcecoc to optimize hyperparameters corresponding to the 'auto' option and to ignore any specified values for the hyperparameters.

The eligible parameters for fitcecoc are:
• Coding — fitcecoc searches among 'onevsall' and 'onevsone'.
• The eligible hyperparameters for the chosen Learners, as listed below. An asterisk (*) marks a hyperparameter that the 'auto' option optimizes by default.

'discriminant'
• Delta* — Log-scaled in the range [1e-6,1e3]
• DiscrimType — 'linear', 'quadratic', 'diagLinear', 'diagQuadratic', 'pseudoLinear', and 'pseudoQuadratic'
• Gamma* — Real values in [0,1]

'kernel'
• KernelScale* — Positive values log-scaled in the range [1e-3,1e3]
• Lambda* — Positive values log-scaled in the range [1e-3/NumObservations,1e3/NumObservations]
• Learner — 'svm' and 'logistic'
• NumExpansionDimensions — Integers log-scaled in the range [100,10000]
• Standardize* — 'true' and 'false'

'knn'
• Distance* — 'cityblock', 'chebychev', 'correlation', 'cosine', 'euclidean', 'hamming', 'jaccard', 'mahalanobis', 'minkowski', 'seuclidean', and 'spearman'
• DistanceWeight — 'equal', 'inverse', and 'squaredinverse'
• Exponent — Positive values in [0.5,3]
• NumNeighbors* — Positive integer values log-scaled in the range [1,max(2,round(NumObservations/2))]
• Standardize* — 'true' and 'false'

'linear'
• Lambda* — Positive values log-scaled in the range [1e-5/NumObservations,1e5/NumObservations]
• Learner* — 'svm' and 'logistic'
• Regularization — 'ridge' and 'lasso'
  • When Regularization is 'ridge', the function uses a Limited-memory BFGS (LBFGS) solver by default.
  • When Regularization is 'lasso', the function uses a Sparse Reconstruction by Separable Approximation (SpaRSA) solver by default.

'svm'
• BoxConstraint* — Positive values log-scaled in the range [1e-3,1e3]
• KernelScale* — Positive values log-scaled in the range [1e-3,1e3]
• KernelFunction — 'gaussian', 'linear', and 'polynomial'
• PolynomialOrder — Integers in the range [2,4]
• Standardize* — 'true' and 'false'

'tree'
• MaxNumSplits — Integers log-scaled in the range [1,max(2,NumObservations-1)]
• MinLeafSize* — Integers log-scaled in the range [1,max(2,floor(NumObservations/2))]
• NumVariablesToSample — Integers in the range [1,max(2,NumPredictors)]
• SplitCriterion — 'gdi', 'deviance', and 'twoing'
Alternatively, use hyperparameters with your chosen Learners, such as

load fisheriris % hyperparameters requires data and learner
params = hyperparameters('fitcecoc',meas,species,'svm');

To see the eligible and default hyperparameters, examine params. Set nondefault parameters by passing a vector of optimizableVariable objects that have nondefault values. For example,

load fisheriris
params = hyperparameters('fitcecoc',meas,species,'svm');
params(2).Range = [1e-4,1e6];
Pass params as the value of OptimizeHyperparameters.

By default, the iterative display appears at the command line, and plots appear according to the number of hyperparameters in the optimization. For the optimization and plots, the objective function is the misclassification rate. To control the iterative display, set the Verbose field of the 'HyperparameterOptimizationOptions' name-value argument. To control the plots, set the ShowPlots field of the 'HyperparameterOptimizationOptions' name-value argument. For an example, see "Optimize ECOC Classifier" on page 35-2217.
Example: 'auto'

HyperparameterOptimizationOptions — Options for optimization
structure

Options for optimization, specified as a structure. This argument modifies the effect of the OptimizeHyperparameters name-value argument. All fields in the structure are optional.

Optimizer — Default: 'bayesopt'. Values:
• 'bayesopt' — Use Bayesian optimization. Internally, this setting calls bayesopt.
• 'gridsearch' — Use grid search with NumGridDivisions values per dimension.
• 'randomsearch' — Search at random among MaxObjectiveEvaluations points.
'gridsearch' searches in a random order, using uniform sampling without replacement from the grid. After optimization, you can get a table in grid order by using the command sortrows(Mdl.HyperparameterOptimizationResults).

AcquisitionFunctionName — Default: 'expected-improvement-per-second-plus'. Values:
• 'expected-improvement-per-second-plus'
• 'expected-improvement'
• 'expected-improvement-plus'
• 'expected-improvement-per-second'
• 'lower-confidence-bound'
• 'probability-of-improvement'
Acquisition functions whose names include per-second do not yield reproducible results because the optimization depends on the runtime of the objective function. Acquisition functions whose names include plus modify their behavior when they are overexploiting an area. For more details, see "Acquisition Function Types" on page 10-3.

MaxObjectiveEvaluations — Maximum number of objective function evaluations. Default: 30 for 'bayesopt' and 'randomsearch', and the entire grid for 'gridsearch'.

MaxTime — Time limit, specified as a positive real scalar. The time limit is in seconds, as measured by tic and toc. The run time can exceed MaxTime because MaxTime does not interrupt function evaluations. Default: Inf.

NumGridDivisions — For 'gridsearch', the number of values in each dimension. The value can be a vector of positive integers giving the number of values for each dimension, or a scalar that applies to all dimensions. This field is ignored for categorical variables. Default: 10.

ShowPlots — Logical value indicating whether to show plots. If true, this field plots the best observed objective function value against the iteration number. If you use Bayesian optimization (Optimizer is 'bayesopt'), then this field also plots the best estimated objective function value. The best observed objective function values and best estimated objective function values correspond to the values in the BestSoFar (observed) and BestSoFar (estim.) columns of the iterative display, respectively. You can find these values in the properties ObjectiveMinimumTrace and EstimatedObjectiveMinimumTrace of Mdl.HyperparameterOptimizationResults. If the problem includes one or two optimization parameters for Bayesian optimization, then ShowPlots also plots a model of the objective function against the parameters. Default: true.

SaveIntermediateResults — Logical value indicating whether to save results when Optimizer is 'bayesopt'. If true, this field overwrites a workspace variable named 'BayesoptResults' at each iteration. The variable is a BayesianOptimization object. Default: false.

Verbose — Display at the command line. Default: 1. Values:
• 0 — No iterative display
• 1 — Iterative display
• 2 — Iterative display with extra information
For details, see the bayesopt Verbose name-value argument and the example "Optimize Classifier Fit Using Bayesian Optimization" on page 10-56.

UseParallel — Logical value indicating whether to run Bayesian optimization in parallel, which requires Parallel Computing Toolbox. Due to the nonreproducibility of parallel timing, parallel Bayesian optimization does not necessarily yield reproducible results. For details, see "Parallel Bayesian Optimization" on page 10-7. Default: false.

Repartition — Logical value indicating whether to repartition the cross-validation at every iteration. If this field is false, the optimizer uses a single partition for the optimization. The setting true usually gives the most robust results because it takes partitioning noise into account. However, for good results, true requires at least twice as many function evaluations. Default: false.

Use no more than one of the following three options.
CVPartition — A cvpartition object, as created by cvpartition.
Holdout — A scalar in the range (0,1) representing the holdout fraction.
Kfold — An integer greater than 1.
Default: 'Kfold',5 if you do not specify a cross-validation field.
Example: 'HyperparameterOptimizationOptions',struct('MaxObjectiveEvaluations',60) Data Types: struct
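As a rough sketch of how these two name-value arguments combine (the small evaluation budget and suppressed plots are only to keep the run short):

load fisheriris
rng default  % for reproducibility
Mdl = fitcecoc(meas,species, ...
    'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions', ...
    struct('MaxObjectiveEvaluations',10,'ShowPlots',false,'Verbose',0));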
Output Arguments

Mdl — Trained ECOC model
ClassificationECOC model object | CompactClassificationECOC model object | ClassificationPartitionedECOC cross-validated model object |
ClassificationPartitionedLinearECOC cross-validated model object | ClassificationPartitionedKernelECOC cross-validated model object

Trained ECOC classifier, returned as a ClassificationECOC or CompactClassificationECOC model object, or a ClassificationPartitionedECOC, ClassificationPartitionedLinearECOC, or ClassificationPartitionedKernelECOC cross-validated model object. This table shows how the types of model objects returned by fitcecoc depend on the type of binary learners you specify and whether you perform cross-validation.
• No linear or kernel classification model learners, no cross-validation — ClassificationECOC
• No linear or kernel classification model learners, cross-validation — ClassificationPartitionedECOC
• Linear classification model learners, no cross-validation — CompactClassificationECOC
• Linear classification model learners, cross-validation — ClassificationPartitionedLinearECOC
• Kernel classification model learners, no cross-validation — CompactClassificationECOC
• Kernel classification model learners, cross-validation — ClassificationPartitionedKernelECOC
HyperparameterOptimizationResults — Description of cross-validation optimization of hyperparameters BayesianOptimization object | table of hyperparameters and associated values Description of the cross-validation optimization of hyperparameters, returned as a BayesianOptimization object or a table of hyperparameters and associated values. HyperparameterOptimizationResults is nonempty when the OptimizeHyperparameters name-value pair argument is nonempty and the Learners name-value pair argument designates linear or kernel binary learners. The value depends on the setting of the HyperparameterOptimizationOptions name-value pair argument: • 'bayesopt' (default) — Object of class BayesianOptimization • 'gridsearch' or 'randomsearch' — Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observation from smallest (best) to highest (worst) Data Types: table
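For example, this minimal sketch (Fisher iris data) trains cross-validated linear binary learners, so, following the table above, the returned object is a ClassificationPartitionedLinearECOC model:

load fisheriris
rng(1)  % for reproducibility
CVMdl = fitcecoc(meas,species,'Learners','linear','CrossVal','on');
class(CVMdl)        % ClassificationPartitionedLinearECOC
kfoldLoss(CVMdl)    % cross-validated classification error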
Limitations • fitcecoc supports sparse matrices for training linear classification models only. For all other models, supply a full matrix of predictor data instead.
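A minimal sketch of the supported case, using synthetic sparse data (the sizes and class labels are arbitrary):

rng(1)                                % for reproducibility
X = sprandn(1000,200,0.01);           % sparse predictor matrix
Y = randi(3,1000,1);                  % three synthetic class labels
Mdl = fitcecoc(X,Y,'Learners','linear');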
More About

Error-Correcting Output Codes Model

An error-correcting output codes (ECOC) model reduces the problem of classification with three or more classes to a set of binary classification problems. ECOC classification requires a coding design, which determines the classes that the binary learners train on, and a decoding scheme, which determines how the results (predictions) of the binary classifiers are aggregated. Assume the following:
• The classification problem has three classes.
• The coding design is one-versus-one. For three classes, this coding design is

            Learner 1   Learner 2   Learner 3
  Class 1       1           1           0
  Class 2      −1           0           1
  Class 3       0          −1          −1

  You can specify a different coding design by using the Coding name-value argument when you create a classification model.
• The model determines the predicted class by using the loss-weighted decoding scheme with the binary loss function g. The software also supports the loss-based decoding scheme. You can specify the decoding scheme and binary loss function by using the Decoding and BinaryLoss name-value arguments, respectively, when you call object functions, such as predict, loss, margin, edge, and so on.

The ECOC algorithm follows these steps.
1. Learner 1 trains on observations in Class 1 or Class 2, and treats Class 1 as the positive class and Class 2 as the negative class. The other learners are trained similarly.
2. Let M be the coding design matrix with elements $m_{kl}$, and $s_l$ be the predicted classification score for the positive class of learner l. The algorithm assigns a new observation to the class ($\hat{k}$) that minimizes the aggregation of the losses for the B binary learners:

$$\hat{k} = \underset{k}{\arg\min} \; \frac{\sum_{l=1}^{B} \left| m_{kl} \right| \, g(m_{kl}, s_l)}{\sum_{l=1}^{B} \left| m_{kl} \right|}.$$

ECOC models can improve classification accuracy, compared to other multiclass models [2].

Coding Design

The coding design is a matrix whose elements direct which classes are trained by each binary learner, that is, how the multiclass problem is reduced to a series of binary problems. Each row of the coding design corresponds to a distinct class, and each column corresponds to a binary learner. In a ternary coding design, for a particular column (or binary learner):
• A row containing 1 directs the binary learner to group all observations in the corresponding class into a positive class.
• A row containing –1 directs the binary learner to group all observations in the corresponding class into a negative class.
• A row containing 0 directs the binary learner to ignore all observations in the corresponding class.

Coding design matrices with large, minimal, pairwise row distances based on the Hamming measure are optimal. For details on the pairwise row distance, see "Random Coding Design Matrices" on page 35-2247 and [3]. The popular coding designs are:

one-versus-all (OVA) — For each binary learner, one class is positive and the rest are negative. This design exhausts all combinations of positive class assignments. Number of learners: K. Minimal pairwise row distance: 2.

one-versus-one (OVO) — For each binary learner, one class is positive, one class is negative, and the rest are ignored. This design exhausts all combinations of class pair assignments. Number of learners: K(K – 1)/2. Minimal pairwise row distance: 1.

binary complete — This design partitions the classes into all binary combinations, and does not ignore any classes. That is, all class assignments are –1 and 1 with at least one positive class and one negative class in the assignment for each binary learner. Number of learners: 2^(K – 1) – 1. Minimal pairwise row distance: 2^(K – 2).

ternary complete — This design partitions the classes into all ternary combinations. That is, all class assignments are 0, –1, and 1 with at least one positive class and one negative class in the assignment for each binary learner. Number of learners: (3^K – 2^(K + 1) + 1)/2. Minimal pairwise row distance: 3^(K – 2).

ordinal — For the first binary learner, the first class is negative and the rest are positive. For the second binary learner, the first two classes are negative and the rest are positive, and so on. Number of learners: K – 1. Minimal pairwise row distance: 1.

dense random — For each binary learner, the software randomly assigns classes into positive or negative classes, with at least one of each type. For more details, see "Random Coding Design Matrices" on page 35-2247. Number of learners: random, but approximately 10 log2(K). Minimal pairwise row distance: variable.

sparse random — For each binary learner, the software randomly assigns classes as positive or negative with probability 0.25 for each, and ignores classes with probability 0.5. For more details, see "Random Coding Design Matrices" on page 35-2247. Number of learners: random, but approximately 15 log2(K). Minimal pairwise row distance: variable.
This plot compares the number of binary learners for the coding designs with an increasing number of classes (K).
Tips • The number of binary learners grows with the number of classes. For a problem with many classes, the binarycomplete and ternarycomplete coding designs are not efficient. However: • If K ≤ 4, then use ternarycomplete coding design rather than sparserandom. • If K ≤ 5, then use binarycomplete coding design rather than denserandom. You can display the coding design matrix of a trained ECOC classifier by entering Mdl.CodingMatrix into the Command Window. • You should form a coding matrix using intimate knowledge of the application, and taking into account computational constraints. If you have sufficient computational power and time, then try several coding matrices and choose the one with the best performance (e.g., check the confusion matrices for each model using confusionchart). • Leave-one-out cross-validation (Leaveout) is inefficient for data sets with many observations. Instead, use k-fold cross-validation (KFold). • After training a model, you can generate C/C++ code that predicts labels for new data. Generating C/C++ code requires MATLAB Coder. For details, see “Introduction to Code Generation” on page 34-3.
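As a rough sketch of such a comparison on held-out data (Fisher iris; the holdout fraction is an arbitrary choice):

load fisheriris
rng(1)                                          % for reproducibility
cv = cvpartition(species,'Holdout',0.3);
Xtrain = meas(training(cv),:);  ytrain = species(training(cv));
Xtest  = meas(test(cv),:);      ytest  = species(test(cv));
Mdl1 = fitcecoc(Xtrain,ytrain,'Coding','onevsone');
Mdl2 = fitcecoc(Xtrain,ytrain,'Coding','onevsall');
confusionchart(ytest,predict(Mdl1,Xtest));      % confusion matrix for the OVO design
figure
confusionchart(ytest,predict(Mdl2,Xtest));      % confusion matrix for the OVA design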
Algorithms

Custom Coding Design Matrices

Custom coding matrices must have a certain form. The software validates a custom coding matrix by ensuring:
• Every element is –1, 0, or 1.
• Every column contains at least one –1 and one 1.
• For all distinct column vectors u and v, u ≠ v and u ≠ –v.
• All row vectors are unique.
• The matrix can separate any two classes. That is, you can move from any row to any other row following these rules:
  • Move vertically from 1 to –1 or –1 to 1.
  • Move horizontally from a nonzero element to another nonzero element.
  • Use a column of the matrix for a vertical move only once.
If it is not possible to move from row i to row j using these rules, then classes i and j cannot be separated by the design. For example, in the coding design

   1   0
  −1   0
   0   1
   0  −1
classes 1 and 2 cannot be separated from classes 3 and 4 (that is, you cannot move horizontally from –1 in row 2 to column 2 because that position contains a 0). Therefore, the software rejects this coding design.
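By contrast, this sketch passes a valid custom ternary design for three classes through the Coding name-value argument (the particular matrix is just a permuted one-versus-one design, chosen only for illustration):

load fisheriris
M = [ 1 -1  0
     -1  0  1
      0  1 -1];                  % rows = classes, columns = binary learners
Mdl = fitcecoc(meas,species,'Coding',M);
Mdl.CodingMatrix                 % display the stored coding design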
Parallel Computing

If you use parallel computing (see Options), then fitcecoc trains binary learners in parallel.

Prior Probabilities and Misclassification Cost

If you specify the Cost, Prior, and Weights name-value arguments, the output model object stores the specified values in the Cost, Prior, and W properties, respectively. The Cost property stores the user-specified cost matrix as is. The Prior and W properties store the prior probabilities and observation weights, respectively, after normalization. For details, see "Misclassification Cost Matrix, Prior Probabilities, and Observation Weights" on page 19-8.

For each binary learner, the software normalizes the prior probabilities into a vector of two elements, and normalizes the cost matrix into a 2-by-2 matrix. Then, the software adjusts the prior probability vector by incorporating the penalties described in the 2-by-2 cost matrix, and sets the cost matrix to the default cost matrix. The Cost and Prior properties of the binary learners in Mdl (Mdl.BinaryLearners) store the adjusted values. Specifically, the software completes these steps:

1. The software normalizes the specified class prior probabilities (Prior) for each binary learner. Let M be the coding design matrix and I(A,c) be an indicator matrix. The indicator matrix has the same dimensions as A. If the corresponding element of A is c, then the indicator matrix has elements equaling one, and zero otherwise. Let $M_{+1}$ and $M_{-1}$ be K-by-L matrices such that:
   • $M_{+1} = M \circ I(M,1)$, where ∘ is element-wise multiplication (that is, Mplus = M.*(M == 1)). Also, let $m_l^{(+1)}$ be column vector l of $M_{+1}$.
   • $M_{-1} = -M \circ I(M,-1)$ (that is, Mminus = -M.*(M == -1)). Also, let $m_l^{(-1)}$ be column vector l of $M_{-1}$.

   Let $\pi_l^{+1} = m_l^{(+1)} \circ \pi$ and $\pi_l^{-1} = m_l^{(-1)} \circ \pi$, where π is the vector of specified class prior probabilities (Prior).

   Then, the positive and negative scalar class prior probabilities for binary learner l are

   $$\pi_l^{(j)} = \frac{\left\| \pi_l^{(j)} \right\|_1}{\left\| \pi_l^{(+1)} \right\|_1 + \left\| \pi_l^{(-1)} \right\|_1},$$

   where j = {-1,1} and $\left\| a \right\|_1$ is the one-norm of a.

2. The software normalizes the K-by-K cost matrix C (Cost) for each binary learner. For binary learner l, the cost of classifying a negative-class observation into the positive class is

   $$c_l^{-+} = \left( \pi_l^{(-1)} \right)^{\top} C \, \pi_l^{(+1)}.$$

   Similarly, the cost of classifying a positive-class observation into the negative class is

   $$c_l^{+-} = \left( \pi_l^{(+1)} \right)^{\top} C \, \pi_l^{(-1)}.$$

   The cost matrix for binary learner l is

   $$C_l = \begin{bmatrix} 0 & c_l^{-+} \\ c_l^{+-} & 0 \end{bmatrix}.$$

3. ECOC models accommodate misclassification costs by incorporating them with class prior probabilities. The software adjusts the class prior probabilities and sets the cost matrix to the default cost matrix for binary learners as follows:

   $$\bar{\pi}_l^{-1} = \frac{c_l^{-+}\, \pi_l^{-1}}{c_l^{-+}\, \pi_l^{-1} + c_l^{+-}\, \pi_l^{+1}}, \qquad \bar{\pi}_l^{+1} = \frac{c_l^{+-}\, \pi_l^{+1}}{c_l^{-+}\, \pi_l^{-1} + c_l^{+-}\, \pi_l^{+1}}, \qquad C_l = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$
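The following sketch is not the toolbox code, but it traces these steps numerically for a single binary learner; the three-class prior and the one-versus-one column are arbitrary choices:

m = [1; -1; 0];                       % one column of a one-versus-one coding design
prior = [0.5; 0.3; 0.2];              % specified class prior probabilities (illustrative)
C = ones(3) - eye(3);                 % default misclassification cost matrix
mPlus = m.*(m == 1);  mMinus = -m.*(m == -1);
piPlus = mPlus.*prior;  piMinus = mMinus.*prior;
pPos = norm(piPlus,1)/(norm(piPlus,1) + norm(piMinus,1));   % positive-class prior
pNeg = norm(piMinus,1)/(norm(piPlus,1) + norm(piMinus,1));  % negative-class prior
cNegPos = piMinus'*C*piPlus;          % cost of labeling a negative-class observation positive
cPosNeg = piPlus'*C*piMinus;          % cost of labeling a positive-class observation negative
pBarNeg = cNegPos*pNeg/(cNegPos*pNeg + cPosNeg*pPos);       % adjusted negative-class prior
pBarPos = cPosNeg*pPos/(cNegPos*pNeg + cPosNeg*pPos)        % adjusted positive-class prior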
Random Coding Design Matrices

For a given number of classes K, the software generates random coding design matrices as follows.
1. The software generates one of these matrices:
   a. Dense random — The software assigns 1 or –1 with equal probability to each element of the K-by-Ld coding design matrix, where Ld ≈ 10 log2(K).
   b. Sparse random — The software assigns 1 to each element of the K-by-Ls coding design matrix with probability 0.25, –1 with probability 0.25, and 0 with probability 0.5, where Ls ≈ 15 log2(K).
2. If a column does not contain at least one 1 and one –1, then the software removes that column.
3. For distinct columns u and v, if u = v or u = –v, then the software removes v from the coding design matrix.

The software randomly generates 10,000 matrices by default, and retains the matrix with the largest, minimal, pairwise row distance based on the Hamming measure ([3]) given by

$$\Delta(k_1, k_2) = 0.5 \sum_{l=1}^{L} \left| m_{k_1 l} \right| \left| m_{k_2 l} \right| \left| m_{k_1 l} - m_{k_2 l} \right|,$$

where $m_{k_j l}$ is an element of coding design matrix j.
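A minimal sketch of the idea (not the toolbox implementation; K and the single candidate matrix are illustrative):

K = 5;                                 % number of classes (illustrative)
L = ceil(10*log2(K));                  % approximate dense random design width
M = 2*(rand(K,L) > 0.5) - 1;           % assign 1 or -1 with equal probability
minDist = inf;
for i = 1:K-1
    for j = i+1:K
        d = 0.5*sum(abs(M(i,:)).*abs(M(j,:)).*abs(M(i,:) - M(j,:)));
        minDist = min(minDist,d);      % minimal pairwise row distance (Hamming measure)
    end
end
minDist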
Support Vector Storage

By default and for efficiency, fitcecoc empties the Alpha, SupportVectorLabels, and SupportVectors properties for all linear SVM binary learners. fitcecoc lists Beta, rather than Alpha, in the model display.

To store Alpha, SupportVectorLabels, and SupportVectors, pass a linear SVM template that specifies storing support vectors to fitcecoc. For example, enter:

t = templateSVM('SaveSupportVectors',true)
Mdl = fitcecoc(X,Y,'Learners',t);
You can remove the support vectors and related values by passing the resulting ClassificationECOC model to discardSupportVectors.
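Continuing that example, a one-line sketch of the cleanup step:

Mdl = discardSupportVectors(Mdl);   % empties Alpha, SupportVectorLabels, and SupportVectors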
Version History Introduced in R2014b R2023b: "auto" option of OptimizeHyperparameters includes Standardize when the binary learners are kernel, k-nearest neighbor (KNN), or support vector machine (SVM) classifiers Behavior changed in R2023b Starting in R2023b, when you specify "kernel", "knn", or "svm" as the Learners value and "auto" as the OptimizeHyperparameters value, fitcecoc includes Standardize as an optimizable hyperparameter. R2022a: Regularization method determines the linear learner solver used during hyperparameter optimization Behavior changed in R2022a Starting in R2022a, when you specify to optimize hyperparameters for an ECOC model with linear binary learners ('linear' or templateLinear) and do not specify to use a particular solver, fitcecoc uses either a Limited-memory BFGS (LBFGS) solver or a Sparse Reconstruction by Separable Approximation (SpaRSA) solver, depending on the regularization type selected during each iteration of the hyperparameter optimization. 35-2248
• When Regularization is 'ridge', the function sets the Solver value to 'lbfgs' by default. • When Regularization is 'lasso', the function sets the Solver value to 'sparsa' by default. In previous releases, the default solver selection during hyperparameter optimization depended on various factors, including the regularization type, learner type, and number of predictors. For more information, see Solver.
References [1] Allwein, E., R. Schapire, and Y. Singer. “Reducing multiclass to binary: A unifying approach for margin classifiers.” Journal of Machine Learning Research. Vol. 1, 2000, pp. 113–141. [2] Fürnkranz, Johannes. “Round Robin Classification.” J. Mach. Learn. Res., Vol. 2, 2002, pp. 721– 747. [3] Escalera, S., O. Pujol, and P. Radeva. “Separability of ternary codes for sparse designs of errorcorrecting output codes.” Pattern Recog. Lett., Vol. 30, Issue 3, 2009, pp. 285–297. [4] Escalera, S., O. Pujol, and P. Radeva. “On the decoding process in ternary error-correcting output codes.” IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 32, Issue 7, 2010, pp. 120–134.
Extended Capabilities Tall Arrays Calculate with arrays that have more rows than fit in memory. Usage notes and limitations: • Supported syntaxes are: • Mdl = fitcecoc(X,Y) • Mdl = fitcecoc(X,Y,Name,Value) • [Mdl,FitInfo,HyperparameterOptimizationResults] = fitcecoc(X,Y,Name,Value) — fitcecoc returns the additional output arguments FitInfo and HyperparameterOptimizationResults when you specify the 'OptimizeHyperparameters' name-value pair argument. • The FitInfo output argument is an empty structure array currently reserved for possible future use. • Options related to cross-validation are not supported. The supported name-value pair arguments are: • 'ClassNames' • 'Cost' • 'Coding' — Default value is 'onevsall'. • 'HyperparameterOptimizationOptions' — For cross-validation, tall optimization supports only 'Holdout' validation. By default, the software selects and reserves 20% of the data as holdout validation data, and trains the model using the rest of the data. You can specify a different value for the holdout fraction by using this argument. For example, specify 'HyperparameterOptimizationOptions',struct('Holdout',0.3) to reserve 30% of the data as validation data. 35-2249
• 'Learners' — Default value is 'linear'. You can specify 'linear','kernel', a templateLinear or templateKernel object, or a cell array of such objects.
• 'OptimizeHyperparameters' — When you use linear binary learners, the value of the 'Regularization' hyperparameter must be 'ridge'.
• 'Prior'
• 'Verbose' — Default value is 1.
• 'Weights'
• This additional name-value pair argument is specific to tall arrays:
  • 'NumConcurrent' — A positive integer scalar specifying the number of binary learners that are trained concurrently by combining file I/O operations. The default value for 'NumConcurrent' is 1, which means fitcecoc trains the binary learners sequentially. 'NumConcurrent' is most beneficial when the input arrays cannot fit into the distributed cluster memory. Otherwise, the input arrays can be cached and speedup is negligible.
    If you run your code on Apache® Spark™, NumConcurrent is upper bounded by the memory available for communications. Check the 'spark.executor.memory' and 'spark.driver.memory' properties in your Apache Spark configuration. See parallel.cluster.Hadoop (Parallel Computing Toolbox) for more details. For more information on Apache Spark and other execution environments that control where your code runs, see "Extend Tall Arrays with Other Products".
For more information, see "Tall Arrays".

Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, set the 'UseParallel' option to true in one of these ways:
• Set the 'UseParallel' field of the options structure to true using statset and specify the 'Options' name-value pair argument in the call to fitcecoc. For example: 'Options',statset('UseParallel',true)
  For more information, see the 'Options' name-value pair argument.
• Perform parallel hyperparameter optimization by using the 'HyperparameterOptimizationOptions',struct('UseParallel',true) name-value pair argument in the call to fitcecoc. For more information on parallel hyperparameter optimization, see "Parallel Bayesian Optimization" on page 10-7.

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

Usage notes and limitations:
• You can specify the name-value argument 'Learners' only as one of the learners specified in this table.
support vector machine — Learner name: 'svm'. Template object creation function: templateSVM. For gpuArray support, see "GPU Arrays" on page 35-2504 for fitcsvm.
k-nearest neighbors — Learner name: 'knn'. Template object creation function: templateKNN. For gpuArray support, see "GPU Arrays" on page 35-2346 for fitcknn.
classification tree — Learner name: 'tree'. Template object creation function: templateTree. For gpuArray support, see "GPU Arrays" on page 35-2568 for fitctree.
For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
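A minimal sketch, assuming a supported GPU device is available:

load fisheriris
gMeas = gpuArray(meas);                         % move the predictor data to the GPU
Mdl = fitcecoc(gMeas,species,'Learners','svm');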
See Also ClassificationECOC | CompactClassificationECOC | ClassificationPartitionedECOC | loss | predict | designecoc | statset | ClassificationPartitionedLinearECOC | ClassificationPartitionedKernelECOC Topics “Quick Start Parallel Computing for Statistics and Machine Learning Toolbox” on page 33-2 “Reproducibility in Parallel Statistical Computations” on page 33-16 “Concepts of Parallel Computing in Statistics and Machine Learning Toolbox” on page 33-6
fitcensemble
Fit ensemble of learners for classification

Syntax
Mdl = fitcensemble(Tbl,ResponseVarName)
Mdl = fitcensemble(Tbl,formula)
Mdl = fitcensemble(Tbl,Y)
Mdl = fitcensemble(X,Y)
Mdl = fitcensemble( ___ ,Name,Value)

Description
Mdl = fitcensemble(Tbl,ResponseVarName) returns the trained classification ensemble model object (Mdl) that contains the results of boosting 100 classification trees and the predictor and response data in the table Tbl. ResponseVarName is the name of the response variable in Tbl. By default, fitcensemble uses LogitBoost for binary classification and AdaBoostM2 for multiclass classification.
Mdl = fitcensemble(Tbl,formula) applies formula to fit the model to the predictor and response data in the table Tbl. formula is an explanatory model of the response and a subset of predictor variables in Tbl used to fit Mdl. For example, 'Y~X1+X2+X3' fits the response variable Tbl.Y as a function of the predictor variables Tbl.X1, Tbl.X2, and Tbl.X3.
Mdl = fitcensemble(Tbl,Y) treats all variables in the table Tbl as predictor variables. Y is the array of class labels that is not in Tbl.
Mdl = fitcensemble(X,Y) uses the predictor data in the matrix X and the array of class labels in Y.
Mdl = fitcensemble( ___ ,Name,Value) uses additional options specified by one or more Name,Value pair arguments and any of the input arguments in the previous syntaxes. For example, you can specify the number of learning cycles, the ensemble aggregation method, or to implement 10-fold cross-validation.

Examples

Train Classification Ensemble

Create a predictive classification ensemble using all available predictor variables in the data. Then, train another ensemble using fewer predictors. Compare the in-sample predictive accuracies of the ensembles.
Load the census1994 data set.

load census1994
Train an ensemble of classification models using the entire data set and default options.

Mdl1 = fitcensemble(adultdata,'salary')

Mdl1 =
  ClassificationEnsemble
            PredictorNames: {'age'  'workClass'  'fnlwgt'  'education'  'education_num'  'marital_
              ResponseName: 'salary'
     CategoricalPredictors: [2 4 6 7 8 9 10 14]
                ClassNames: [<=50K    >50K]
            ScoreTransform: 'none'
           NumObservations: 32561
                NumTrained: 100
                    Method: 'LogitBoost'
              LearnerNames: {'Tree'}
      ReasonForTermination: 'Terminated normally after completing the requested number of training
                   FitInfo: [100x1 double]
        FitInfoDescription: {2x1 cell}
Mdl1 is a ClassificationEnsemble model. Some notable characteristics of Mdl1 are:
• Because two classes are represented in the data, LogitBoost is the ensemble aggregation algorithm.
• Because the ensemble aggregation method is a boosting algorithm, classification trees that allow a maximum of 10 splits compose the ensemble.
• One hundred trees compose the ensemble.

Use the classification ensemble to predict the labels of a random set of five observations from the data. Compare the predicted labels with their true values.

rng(1) % For reproducibility
[pX,pIdx] = datasample(adultdata,5);
label = predict(Mdl1,pX);
table(label,adultdata.salary(pIdx),'VariableNames',{'Predicted','Truth'})

ans=5×2 table
    Predicted
    _________

$$\beta_j^* = \begin{cases} \beta_j - u_t & \text{if } \beta_j > u_t, \\ 0 & \text{if } \left| \beta_j \right| \le u_t, \\ \beta_j + u_t & \text{if } \beta_j < -u_t. \end{cases}$$

• For SGD, $\beta_j$ is the estimate of coefficient j after processing k mini-batches. $u_t = k \gamma_t \lambda$. $\gamma_t$ is the learning rate at iteration t. λ is the value of Lambda.
• For ASGD, $\beta_j$ is the averaged estimate of coefficient j after processing k mini-batches, $u_t = k\lambda$.

If Regularization is 'ridge', then the software ignores TruncationPeriod.
Example: 'TruncationPeriod',100
Data Types: single | double
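As a rough sketch with synthetic data (all parameter values here are illustrative only):

rng(1)                                          % for reproducibility
X = randn(1000,50);
Y = X(:,1) - 2*X(:,2) + randn(1000,1) > 0;      % two synthetic classes
Mdl = fitclinear(X,Y,'Regularization','lasso', ...
    'Solver','sgd','TruncationPeriod',25);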
Other Classification Options
CategoricalPredictors — Categorical predictors list
vector of positive integers | logical vector | character matrix | string array | cell array of character vectors | 'all'

Categorical predictors list, specified as one of the values in this table. The descriptions assume that the predictor data has observations in rows and predictors in columns.

Vector of positive integers — Each entry in the vector is an index value indicating that the corresponding predictor is categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If fitclinear uses a subset of input variables as predictors, then the function indexes the predictors using only the subset. The CategoricalPredictors values do not count the response variable, observation weights variable, or any other variables that the function does not use.
Logical vector — A true entry means that the corresponding predictor is categorical. The length of the vector is p.
Character matrix — Each row of the matrix is the name of a predictor variable. The names must match the entries in PredictorNames. Pad the names with extra blanks so each row of the character matrix has the same length.
String array or cell array of character vectors — Each element in the array is the name of a predictor variable. The names must match the entries in PredictorNames.
"all" — All predictors are categorical.
By default, if the predictor data is in a table (Tbl), fitclinear assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix (X), fitclinear assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the CategoricalPredictors name-value argument. For the identified categorical predictors, fitclinear creates dummy variables using two different schemes, depending on whether a categorical variable is unordered or ordered. For an unordered categorical variable, fitclinear creates one dummy variable for each level of the categorical variable. For an ordered categorical variable, fitclinear creates one less dummy variable than the number of categories. For details, see “Automatic Creation of Dummy Variables” on page 2-14. Example: 'CategoricalPredictors','all' Data Types: single | double | logical | char | string | cell ClassNames — Names of classes to use for training categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors Names of classes to use for training, specified as a categorical, character, or string array; a logical or numeric vector; or a cell array of character vectors. ClassNames must have the same data type as the response variable in Tbl or Y. If ClassNames is a character array, then each element must correspond to one row of the array. Use ClassNames to: • Specify the order of the classes during training. • Specify the order of any input or output argument dimension that corresponds to the class order. For example, use ClassNames to specify the order of the dimensions of Cost or the column order of classification scores returned by predict. • Select a subset of classes for training. For example, suppose that the set of all distinct class names in Y is ["a","b","c"]. To train the model using observations from classes "a" and "c" only, specify "ClassNames",["a","c"]. The default value for ClassNames is the set of all distinct class names in the response variable in Tbl or Y. Example: "ClassNames",["b","g"] Data Types: categorical | char | string | logical | single | double | cell 35-2367
Cost — Misclassification cost square matrix | structure array Misclassification cost, specified as the comma-separated pair consisting of 'Cost' and a square matrix or structure. • If you specify the square matrix cost ('Cost',cost), then cost(i,j) is the cost of classifying a point into class j if its true class is i. That is, the rows correspond to the true class, and the columns correspond to the predicted class. To specify the class order for the corresponding rows and columns of cost, use the ClassNames name-value pair argument. • If you specify the structure S ('Cost',S), then it must have two fields: • S.ClassNames, which contains the class names as a variable of the same data type as Y • S.ClassificationCosts, which contains the cost matrix with rows and columns ordered as in S.ClassNames The default value for Cost is ones(K) – eye(K), where K is the number of distinct classes. fitclinear uses Cost to adjust the prior class probabilities specified in Prior. Then, fitclinear uses the adjusted prior probabilities for training. Example: 'Cost',[0 2; 1 0] Data Types: single | double | struct PredictorNames — Predictor variable names string array of unique names | cell array of unique character vectors Predictor variable names, specified as a string array of unique names or cell array of unique character vectors. The functionality of 'PredictorNames' depends on the way you supply the training data. • If you supply X and Y, then you can use 'PredictorNames' to assign names to the predictor variables in X. • The order of the names in PredictorNames must correspond to the predictor order in X. Assuming that X has the default orientation, with observations in rows and predictors in columns, PredictorNames{1} is the name of X(:,1), PredictorNames{2} is the name of X(:,2), and so on. Also, size(X,2) and numel(PredictorNames) must be equal. • By default, PredictorNames is {'x1','x2',...}. • If you supply Tbl, then you can use 'PredictorNames' to choose which predictor variables to use in training. That is, fitclinear uses only the predictor variables in PredictorNames and the response variable during training. • PredictorNames must be a subset of Tbl.Properties.VariableNames and cannot include the name of the response variable. • By default, PredictorNames contains the names of all predictor variables. • A good practice is to specify the predictors for training using either 'PredictorNames' or formula, but not both. Example: 'PredictorNames', {'SepalLength','SepalWidth','PetalLength','PetalWidth'} Data Types: string | cell 35-2368
Prior — Prior probabilities
'empirical' (default) | 'uniform' | numeric vector | structure array

Prior probabilities for each class, specified as the comma-separated pair consisting of 'Prior' and 'empirical', 'uniform', a numeric vector, or a structure array. This table summarizes the available options for setting prior probabilities.

'empirical' — The class prior probabilities are the class relative frequencies in Y.
'uniform' — All class prior probabilities are equal to 1/K, where K is the number of classes.
numeric vector — Each element is a class prior probability. Order the elements according to their order in Y. If you specify the order using the 'ClassNames' name-value pair argument, then order the elements accordingly.
structure array — A structure S with two fields:
• S.ClassNames contains the class names as a variable of the same type as Y.
• S.ClassProbs contains a vector of corresponding prior probabilities.

fitclinear normalizes the prior probabilities in Prior to sum to 1.
Example: 'Prior',struct('ClassNames', {{'setosa','versicolor'}},'ClassProbs',1:2)
Data Types: char | string | double | single | struct

ResponseName — Response variable name
"Y" (default) | character vector | string scalar

Response variable name, specified as a character vector or string scalar.
• If you supply Y, then you can use ResponseName to specify a name for the response variable.
• If you supply ResponseVarName or formula, then you cannot use ResponseName.
Example: "ResponseName","response"
Data Types: char | string

ScoreTransform — Score transformation
"none" (default) | "doublelogit" | "invlogit" | "ismax" | "logit" | function handle | ...

Score transformation, specified as a character vector, string scalar, or function handle. This table summarizes the available character vectors and string scalars.

"doublelogit" — 1/(1 + e^(–2x))
"invlogit" — log(x / (1 – x))
"ismax" — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
"logit" — 1/(1 + e^(–x))
"none" or "identity" — x (no transformation)
"sign" — –1 for x < 0; 0 for x = 0; 1 for x > 0
"symmetric" — 2x – 1
"symmetricismax" — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
"symmetriclogit" — 2/(1 + e^(–x)) – 1
For a MATLAB function or a function you define, use its function handle for the score transform. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores). Example: "ScoreTransform","logit" Data Types: char | string | function_handle Weights — Observation weights nonnegative numeric vector | name of variable in Tbl Observation weights, specified as a nonnegative numeric vector or the name of a variable in Tbl. The software weights each observation in X or Tbl with the corresponding value in Weights. The length of Weights must equal the number of observations in X or Tbl. If you specify the input data as a table Tbl, then Weights can be the name of a variable in Tbl that contains a numeric vector. In this case, you must specify Weights as a character vector or string scalar. For example, if the weights vector W is stored as Tbl.W, then specify it as 'W'. Otherwise, the software treats all columns of Tbl, including W, as predictors or the response variable when training the model. By default, Weights is ones(n,1), where n is the number of observations in X or Tbl. The software normalizes Weights to sum to the value of the prior probability in the respective class. Data Types: single | double | char | string
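For example, this sketch (ionosphere data; the logistic transform is just an illustration) supplies a function handle:

load ionosphere
Mdl = fitclinear(X,Y,'ScoreTransform',@(s)1./(1 + exp(-s)));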
Cross-Validation Options
CrossVal — Cross-validation flag 'off' (default) | 'on' Cross-validation flag, specified as the comma-separated pair consisting of 'Crossval' and 'on' or 'off'. If you specify 'on', then the software implements 10-fold cross-validation. 35-2370
To override this cross-validation setting, use one of these name-value pair arguments: CVPartition, Holdout, or KFold. To create a cross-validated model, you can use one cross-validation name-value pair argument at a time only. Example: 'Crossval','on' CVPartition — Cross-validation partition [] (default) | cvpartition partition object Cross-validation partition, specified as the comma-separated pair consisting of 'CVPartition' and a cvpartition partition object as created by cvpartition. The partition object specifies the type of cross-validation, and also the indexing for training and validation sets. To create a cross-validated model, you can use one of these four options only: 'CVPartition', 'Holdout', or 'KFold'. Holdout — Fraction of data for holdout validation scalar value in the range (0,1) Fraction of data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1). If you specify 'Holdout',p, then the software: 1
Randomly reserves p*100% of the data as validation data, and trains the model using the rest of the data
2
Stores the compact, trained model in the Trained property of the cross-validated model.
To create a cross-validated model, you can use only one of these options at a time: 'CVPartition', 'Holdout', or 'KFold'.
Example: 'Holdout',0.1
Data Types: double | single
KFold — Number of folds
10 (default) | positive integer value greater than 1
Number of folds to use in a cross-validated classifier, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1. If you specify 'KFold',k, then the software:
1. Randomly partitions the data into k sets.
2. For each set, reserves the set as validation data, and trains the model using the other k – 1 sets.
3. Stores the k compact, trained models in the cells of a k-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can use only one of these options at a time: 'CVPartition', 'Holdout', or 'KFold'.
Example: 'KFold',8
Data Types: single | double
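As a brief, hedged sketch of the KFold workflow (the simulated data here is illustrative, not from the documentation), you can train a cross-validated model and then estimate its generalization error with kfoldLoss:
rng(1)                               % For reproducibility
X = randn(1000,50);
Y = X(:,1) - 0.5*X(:,2) + randn(1000,1) > 0;
CVMdl = fitclinear(X,Y,'KFold',8);   % ClassificationPartitionedLinear model
err = kfoldLoss(CVMdl)               % Average out-of-fold classification error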
SGD and ASGD Convergence Controls
BatchLimit — Maximal number of batches
positive integer
Maximal number of batches to process, specified as the comma-separated pair consisting of 'BatchLimit' and a positive integer. When the software processes BatchLimit batches, it terminates optimization.
• By default:
  • The software passes through the data PassLimit times.
  • If you specify multiple solvers, and use (A)SGD to get an initial approximation for the next solver, then the default value is ceil(1e6/BatchSize). BatchSize is the value of the 'BatchSize' name-value pair argument.
• If you specify BatchLimit, then fitclinear uses the argument that results in processing the fewest observations, either BatchLimit or PassLimit.
Example: 'BatchLimit',100
Data Types: single | double
BetaTolerance — Relative tolerance on linear coefficients and bias term
1e-4 (default) | nonnegative scalar
Relative tolerance on the linear coefficients and the bias term (intercept), specified as the comma-separated pair consisting of 'BetaTolerance' and a nonnegative scalar.
Let B_t = [β_t′ b_t], that is, the vector of the coefficients and the bias term at optimization iteration t. If ‖(B_t − B_{t−1})/B_t‖_2 < BetaTolerance, then optimization terminates.
If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver.
Example: 'BetaTolerance',1e-6
Data Types: single | double
NumCheckConvergence — Number of batches to process before next convergence check
positive integer
Number of batches to process before next convergence check, specified as the comma-separated pair consisting of 'NumCheckConvergence' and a positive integer. To specify the batch size, see BatchSize.
The software checks for convergence about 10 times per pass through the entire data set by default.
Example: 'NumCheckConvergence',100
Data Types: single | double
PassLimit — Maximal number of passes
1 (default) | positive integer
Maximal number of passes through the data, specified as the comma-separated pair consisting of 'PassLimit' and a positive integer. fitclinear processes all observations when it completes one pass through the data. When fitclinear passes through the data PassLimit times, it terminates optimization. If you specify BatchLimit, then fitclinear uses the argument that results in processing the fewest observations, either BatchLimit or PassLimit. Example: 'PassLimit',5 Data Types: single | double ValidationData — Validation data for optimization convergence detection cell array | table Validation data for optimization convergence detection, specified as the comma-separated pair consisting of 'ValidationData' and a cell array or table. During optimization, the software periodically estimates the loss of ValidationData. If the validation-data loss increases, then the software terminates optimization. For more details, see “Algorithms” on page 35-2384. To optimize hyperparameters using cross-validation, see crossvalidation options such as CrossVal. You can specify ValidationData as a table if you use a table Tbl of predictor data that contains the response variable. In this case, ValidationData must contain the same predictors and response contained in Tbl. The software does not apply weights to observations, even if Tbl contains a vector of weights. To specify weights, you must specify ValidationData as a cell array. If you specify ValidationData as a cell array, then it must have the following format: • ValidationData{1} must have the same data type and orientation as the predictor data. That is, if you use a predictor matrix X, then ValidationData{1} must be an m-by-p or p-by-m full or sparse matrix of predictor data that has the same orientation as X. The predictor variables in the training data X and ValidationData{1} must correspond. Similarly, if you use a predictor table Tbl of predictor data, then ValidationData{1} must be a table containing the same predictor variables contained in Tbl. The number of observations in ValidationData{1} and the predictor data can vary. • ValidationData{2} must match the data type and format of the response variable, either Y or ResponseVarName. If ValidationData{2} is an array of class labels, then it must have the same number of elements as the number of observations in ValidationData{1}. The set of all distinct labels of ValidationData{2} must be a subset of all distinct labels of Y. If ValidationData{1} is a table, then ValidationData{2} can be the name of the response variable in the table. If you want to use the same ResponseVarName or formula, you can specify ValidationData{2} as []. • Optionally, you can specify ValidationData{3} as an m-dimensional numeric vector of observation weights or the name of a variable in the table ValidationData{1} that contains observation weights. The software normalizes the weights with the validation data so that they sum to 1. If you specify ValidationData and want to display the validation loss at the command line, specify a value larger than 0 for Verbose. 35-2373
If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver. By default, the software does not detect convergence by monitoring validation-data loss.
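The cell-array form of ValidationData can be sketched as follows; the 90/10 split and the simulated data are assumptions made only for illustration.
rng(2)                               % For reproducibility
X = randn(5000,100);
Y = X(:,1) + randn(5000,1) > 0;
cvp  = cvpartition(Y,'Holdout',0.1); % Reserve 10% as validation data
Xtr  = X(training(cvp),:);  Ytr  = Y(training(cvp));
Xval = X(test(cvp),:);      Yval = Y(test(cvp));
Mdl = fitclinear(Xtr,Ytr,'Solver','sgd', ...
    'ValidationData',{Xval,Yval},'Verbose',1);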
Dual SGD Convergence Controls
BetaTolerance — Relative tolerance on linear coefficients and bias term
1e-4 (default) | nonnegative scalar
Relative tolerance on the linear coefficients and the bias term (intercept), specified as the comma-separated pair consisting of 'BetaTolerance' and a nonnegative scalar.
Let B_t = [β_t′ b_t], that is, the vector of the coefficients and the bias term at optimization iteration t. If ‖(B_t − B_{t−1})/B_t‖_2 < BetaTolerance, then optimization terminates.
If you also specify DeltaGradientTolerance, then optimization terminates when the software satisfies either stopping criterion.
If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver.
Example: 'BetaTolerance',1e-6
Data Types: single | double
DeltaGradientTolerance — Gradient-difference tolerance
1 (default) | nonnegative scalar
Gradient-difference tolerance between upper and lower pool Karush-Kuhn-Tucker (KKT) complementarity conditions on page 35-2498 violators, specified as a nonnegative scalar.
• If the magnitude of the KKT violators is less than DeltaGradientTolerance, then the software terminates optimization.
• If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver.
Example: 'DeltaGradientTolerance',1e-2
Data Types: double | single
NumCheckConvergence — Number of passes through entire data set to process before next convergence check
5 (default) | positive integer
Number of passes through the entire data set to process before the next convergence check, specified as the comma-separated pair consisting of 'NumCheckConvergence' and a positive integer.
Example: 'NumCheckConvergence',100
Data Types: single | double
PassLimit — Maximal number of passes
10 (default) | positive integer
Maximal number of passes through the data, specified as the comma-separated pair consisting of 'PassLimit' and a positive integer. When the software completes one pass through the data, it has processed all observations. When the software passes through the data PassLimit times, it terminates optimization. Example: 'PassLimit',5 Data Types: single | double ValidationData — Validation data for optimization convergence detection cell array | table Validation data for optimization convergence detection, specified as the comma-separated pair consisting of 'ValidationData' and a cell array or table. During optimization, the software periodically estimates the loss of ValidationData. If the validation-data loss increases, then the software terminates optimization. For more details, see “Algorithms” on page 35-2384. To optimize hyperparameters using cross-validation, see crossvalidation options such as CrossVal. You can specify ValidationData as a table if you use a table Tbl of predictor data that contains the response variable. In this case, ValidationData must contain the same predictors and response contained in Tbl. The software does not apply weights to observations, even if Tbl contains a vector of weights. To specify weights, you must specify ValidationData as a cell array. If you specify ValidationData as a cell array, then it must have the following format: • ValidationData{1} must have the same data type and orientation as the predictor data. That is, if you use a predictor matrix X, then ValidationData{1} must be an m-by-p or p-by-m full or sparse matrix of predictor data that has the same orientation as X. The predictor variables in the training data X and ValidationData{1} must correspond. Similarly, if you use a predictor table Tbl of predictor data, then ValidationData{1} must be a table containing the same predictor variables contained in Tbl. The number of observations in ValidationData{1} and the predictor data can vary. • ValidationData{2} must match the data type and format of the response variable, either Y or ResponseVarName. If ValidationData{2} is an array of class labels, then it must have the same number of elements as the number of observations in ValidationData{1}. The set of all distinct labels of ValidationData{2} must be a subset of all distinct labels of Y. If ValidationData{1} is a table, then ValidationData{2} can be the name of the response variable in the table. If you want to use the same ResponseVarName or formula, you can specify ValidationData{2} as []. • Optionally, you can specify ValidationData{3} as an m-dimensional numeric vector of observation weights or the name of a variable in the table ValidationData{1} that contains observation weights. The software normalizes the weights with the validation data so that they sum to 1. If you specify ValidationData and want to display the validation loss at the command line, specify a value larger than 0 for Verbose. If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver. By default, the software does not detect convergence by monitoring validation-data loss. 35-2375
BFGS, LBFGS, and SpaRSA Convergence Controls
BetaTolerance — Relative tolerance on linear coefficients and bias term
1e-4 (default) | nonnegative scalar
Relative tolerance on the linear coefficients and the bias term (intercept), specified as a nonnegative scalar.
Let B_t = [β_t′ b_t], that is, the vector of the coefficients and the bias term at optimization iteration t. If ‖(B_t − B_{t−1})/B_t‖_2 < BetaTolerance, then optimization terminates.
If you also specify GradientTolerance, then optimization terminates when the software satisfies either stopping criterion.
If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver.
Example: 'BetaTolerance',1e-6
Data Types: single | double
GradientTolerance — Absolute gradient tolerance
1e-6 (default) | nonnegative scalar
Absolute gradient tolerance, specified as a nonnegative scalar.
Let ∇ℒ_t be the gradient vector of the objective function with respect to the coefficients and bias term at optimization iteration t. If ‖∇ℒ_t‖_∞ = max_j |[∇ℒ_t]_j| < GradientTolerance, then optimization terminates.
If you also specify BetaTolerance, then optimization terminates when the software satisfies either stopping criterion.
If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver.
Example: 'GradientTolerance',1e-5
Data Types: single | double
HessianHistorySize — Size of history buffer for Hessian approximation
15 (default) | positive integer
Size of history buffer for Hessian approximation, specified as the comma-separated pair consisting of 'HessianHistorySize' and a positive integer. That is, at each iteration, the software composes the Hessian using statistics from the latest HessianHistorySize iterations.
The software does not support 'HessianHistorySize' for SpaRSA.
Example: 'HessianHistorySize',10
Data Types: single | double
IterationLimit — Maximal number of optimization iterations
1000 (default) | positive integer
Maximal number of optimization iterations, specified as the comma-separated pair consisting of 'IterationLimit' and a positive integer. IterationLimit applies to these values of Solver: 'bfgs', 'lbfgs', and 'sparsa'. Example: 'IterationLimit',500 Data Types: single | double ValidationData — Validation data for optimization convergence detection cell array | table Validation data for optimization convergence detection, specified as the comma-separated pair consisting of 'ValidationData' and a cell array or table. During optimization, the software periodically estimates the loss of ValidationData. If the validation-data loss increases, then the software terminates optimization. For more details, see “Algorithms” on page 35-2384. To optimize hyperparameters using cross-validation, see crossvalidation options such as CrossVal. You can specify ValidationData as a table if you use a table Tbl of predictor data that contains the response variable. In this case, ValidationData must contain the same predictors and response contained in Tbl. The software does not apply weights to observations, even if Tbl contains a vector of weights. To specify weights, you must specify ValidationData as a cell array. If you specify ValidationData as a cell array, then it must have the following format: • ValidationData{1} must have the same data type and orientation as the predictor data. That is, if you use a predictor matrix X, then ValidationData{1} must be an m-by-p or p-by-m full or sparse matrix of predictor data that has the same orientation as X. The predictor variables in the training data X and ValidationData{1} must correspond. Similarly, if you use a predictor table Tbl of predictor data, then ValidationData{1} must be a table containing the same predictor variables contained in Tbl. The number of observations in ValidationData{1} and the predictor data can vary. • ValidationData{2} must match the data type and format of the response variable, either Y or ResponseVarName. If ValidationData{2} is an array of class labels, then it must have the same number of elements as the number of observations in ValidationData{1}. The set of all distinct labels of ValidationData{2} must be a subset of all distinct labels of Y. If ValidationData{1} is a table, then ValidationData{2} can be the name of the response variable in the table. If you want to use the same ResponseVarName or formula, you can specify ValidationData{2} as []. • Optionally, you can specify ValidationData{3} as an m-dimensional numeric vector of observation weights or the name of a variable in the table ValidationData{1} that contains observation weights. The software normalizes the weights with the validation data so that they sum to 1. If you specify ValidationData and want to display the validation loss at the command line, specify a value larger than 0 for Verbose. If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver. By default, the software does not detect convergence by monitoring validation-data loss.
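For example, a hedged sketch that tightens the LBFGS stopping tolerances on simulated sparse data (both the data and the parameter values are illustrative assumptions):
rng(3)                               % For reproducibility
X = sprandn(200,10000,0.01);         % High-dimensional sparse predictors
Y = full(sum(X,2)) > 0;
Mdl = fitclinear(X,Y,'Solver','lbfgs', ...
    'BetaTolerance',1e-6,'GradientTolerance',1e-5,'IterationLimit',500);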
Hyperparameter Optimization
OptimizeHyperparameters — Parameters to optimize
'none' (default) | 'auto' | 'all' | string array or cell array of eligible parameter names | vector of optimizableVariable objects
Parameters to optimize, specified as the comma-separated pair consisting of 'OptimizeHyperparameters' and one of the following:
• 'none' — Do not optimize.
• 'auto' — Use {'Lambda','Learner'}.
• 'all' — Optimize all eligible parameters.
• String array or cell array of eligible parameter names.
• Vector of optimizableVariable objects, typically the output of hyperparameters.
The optimization attempts to minimize the cross-validation loss (error) for fitclinear by varying the parameters. For information about cross-validation loss (albeit in a different context), see “Classification Loss” on page 35-4305. To control the cross-validation type and other aspects of the optimization, use the HyperparameterOptimizationOptions name-value pair.
Note The values of 'OptimizeHyperparameters' override any values you specify using other name-value arguments. For example, setting 'OptimizeHyperparameters' to 'auto' causes fitclinear to optimize hyperparameters corresponding to the 'auto' option and to ignore any specified values for the hyperparameters.
The eligible parameters for fitclinear are:
• Lambda — fitclinear searches among positive values, by default log-scaled in the range [1e-5/NumObservations,1e5/NumObservations].
• Learner — fitclinear searches among 'svm' and 'logistic'.
• Regularization — fitclinear searches among 'ridge' and 'lasso'.
  • When Regularization is 'ridge', the function sets the Solver value to 'lbfgs' by default.
  • When Regularization is 'lasso', the function sets the Solver value to 'sparsa' by default.
Set nondefault parameters by passing a vector of optimizableVariable objects that have nondefault values. For example,
load fisheriris
params = hyperparameters('fitclinear',meas,species);
params(1).Range = [1e-4,1e6];
Pass params as the value of OptimizeHyperparameters. By default, the iterative display appears at the command line, and plots appear according to the number of hyperparameters in the optimization. For the optimization and plots, the objective function is the misclassification rate. To control the iterative display, set the Verbose field of the 'HyperparameterOptimizationOptions' name-value argument. To control the plots, set the ShowPlots field of the 'HyperparameterOptimizationOptions' name-value argument. 35-2378
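A minimal sketch of this workflow, assuming the ionosphere sample data set (a binary problem) rather than the iris data, and assuming that the first entry of params corresponds to Lambda, might look like this:
load ionosphere                      % Binary classification data: X, Y
params = hyperparameters('fitclinear',X,Y);
params(1).Range = [1e-6,1e2];        % Assumed custom search range for Lambda
rng(0)                               % For reproducibility
Mdl = fitclinear(X,Y,'OptimizeHyperparameters',params, ...
    'HyperparameterOptimizationOptions',struct('MaxObjectiveEvaluations',20));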
For an example, see “Optimize Linear Classifier” on page 35-2352.
Example: 'OptimizeHyperparameters','auto'
HyperparameterOptimizationOptions — Options for optimization
structure
Options for optimization, specified as a structure. This argument modifies the effect of the OptimizeHyperparameters name-value argument. All fields in the structure are optional.
Optimizer
• 'bayesopt' — Use Bayesian optimization. Internally, this setting calls bayesopt.
• 'gridsearch' — Use grid search with NumGridDivisions values per dimension.
• 'randomsearch' — Search at random among MaxObjectiveEvaluations points.
'gridsearch' searches in a random order, using uniform sampling without replacement from the grid. After optimization, you can get a table in grid order by using the command sortrows(Mdl.HyperparameterOptimizationResults).
Default: 'bayesopt'
AcquisitionFunctionName
• 'expected-improvement-per-second-plus'
• 'expected-improvement'
• 'expected-improvement-plus'
• 'expected-improvement-per-second'
• 'lower-confidence-bound'
• 'probability-of-improvement'
Acquisition functions whose names include per-second do not yield reproducible results because the optimization depends on the runtime of the objective function. Acquisition functions whose names include plus modify their behavior when they are overexploiting an area. For more details, see “Acquisition Function Types” on page 10-3.
Default: 'expected-improvement-per-second-plus'
MaxObjectiveEvaluations
Maximum number of objective function evaluations.
Default: 30 for 'bayesopt' and 'randomsearch', and the entire grid for 'gridsearch'
MaxTime
Time limit, specified as a positive real scalar. The time limit is in seconds, as measured by tic and toc. The run time can exceed MaxTime because MaxTime does not interrupt function evaluations.
Default: Inf
NumGridDivisions
For 'gridsearch', the number of values in each dimension. The value can be a vector of positive integers giving the number of values for each dimension, or a scalar that applies to all dimensions. This field is ignored for categorical variables.
Default: 10
ShowPlots
Logical value indicating whether to show plots. If true, this field plots the best observed objective function value against the iteration number. If you use Bayesian optimization (Optimizer is 'bayesopt'), then this field also plots the best estimated objective function value. The best observed objective function values and best estimated objective function values correspond to the values in the BestSoFar (observed) and BestSoFar (estim.) columns of the iterative display, respectively. You can find these values in the properties ObjectiveMinimumTrace and EstimatedObjectiveMinimumTrace of Mdl.HyperparameterOptimizationResults. If the problem includes one or two optimization parameters for Bayesian optimization, then ShowPlots also plots a model of the objective function against the parameters.
Default: true
SaveIntermediateResults
Logical value indicating whether to save results when Optimizer is 'bayesopt'. If true, this field overwrites a workspace variable named 'BayesoptResults' at each iteration. The variable is a BayesianOptimization object.
Default: false
Verbose
Display at the command line:
• 0 — No iterative display
• 1 — Iterative display
• 2 — Iterative display with extra information
For details, see the bayesopt Verbose name-value argument and the example “Optimize Classifier Fit Using Bayesian Optimization” on page 10-56.
Default: 1
UseParallel
Logical value indicating whether to run Bayesian optimization in parallel, which requires Parallel Computing Toolbox. Due to the nonreproducibility of parallel timing, parallel Bayesian optimization does not necessarily yield reproducible results. For details, see “Parallel Bayesian Optimization” on page 10-7.
Default: false
Repartition
Logical value indicating whether to repartition the cross-validation at every iteration. If this field is false, the optimizer uses a single partition for the optimization. The setting true usually gives the most robust results because it takes partitioning noise into account. However, for good results, true requires at least twice as many function evaluations.
Default: false
Use no more than one of the following three options.
CVPartition
A cvpartition object, as created by cvpartition
Holdout
A scalar in the range (0,1) representing the holdout fraction
Kfold
An integer greater than 1
Default: 'Kfold',5 if you do not specify a cross-validation field
Example: 'HyperparameterOptimizationOptions',struct('MaxObjectiveEvaluations',60) Data Types: struct
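As a hedged sketch of combining several of these fields (the field values and the ionosphere data are chosen only for illustration):
opts = struct('Optimizer','gridsearch','NumGridDivisions',15, ...
    'Holdout',0.2,'ShowPlots',false,'Verbose',0);
load ionosphere                      % Binary classification data: X, Y
rng(0)                               % For reproducibility
Mdl = fitclinear(X,Y,'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions',opts);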
Output Arguments Mdl — Trained linear classification model ClassificationLinear model object | ClassificationPartitionedLinear cross-validated model object Trained linear classification model, returned as a ClassificationLinear model object or ClassificationPartitionedLinear cross-validated model object. If you set any of the name-value pair arguments KFold, Holdout, CrossVal, or CVPartition, then Mdl is a ClassificationPartitionedLinear cross-validated model object. Otherwise, Mdl is a ClassificationLinear model object. To reference properties of Mdl, use dot notation. For example, enter Mdl.Beta in the Command Window to display the vector or matrix of estimated coefficients. Note Unlike other classification models, and for economical memory usage, ClassificationLinear and ClassificationPartitionedLinear model objects do not store the training data or training process details (for example, convergence history). FitInfo — Optimization details structure array Optimization details, returned as a structure array. Fields specify final values or name-value pair argument specifications, for example, Objective is the value of the objective function when optimization terminates. Rows of multidimensional fields correspond to values of Lambda and columns correspond to values of Solver. 35-2381
This table describes some notable fields.
TerminationStatus
• Reason for optimization termination
• Corresponds to a value in TerminationCode
FitTime
Elapsed, wall-clock time in seconds
History
A structure array of optimization information for each iteration. The field Solver stores solver types using integer coding:
1: SGD
2: ASGD
3: Dual SGD for SVM
4: LBFGS
5: BFGS
6: SpaRSA
To access fields, use dot notation. For example, to access the vector of objective function values for each iteration, enter FitInfo.History.Objective. It is good practice to examine FitInfo to assess whether convergence is satisfactory. HyperparameterOptimizationResults — Cross-validation optimization of hyperparameters BayesianOptimization object | table of hyperparameters and associated values Cross-validation optimization of hyperparameters, returned as a BayesianOptimization object or a table of hyperparameters and associated values. The output is nonempty when the value of 'OptimizeHyperparameters' is not 'none'. The output value depends on the Optimizer field value of the 'HyperparameterOptimizationOptions' name-value pair argument: Value of Optimizer Field
Value of HyperparameterOptimizationResults
'bayesopt' (default)
Object of class BayesianOptimization
'gridsearch' or 'randomsearch'
Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)
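Closing out the output arguments, here is a brief, hedged sketch (simulated data and illustrative values) of requesting Mdl and FitInfo and reading them with dot notation:
rng(4)                               % For reproducibility
X = randn(500,200);
Y = X(:,3) + 0.1*randn(500,1) > 0;
[Mdl,FitInfo] = fitclinear(X,Y);
Mdl.Bias                             % Estimated intercept
nnz(Mdl.Beta)                        % Number of nonzero linear coefficients
FitInfo.TerminationStatus            % Reason the solver stopped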
More About Warm Start A warm start is initial estimates of the beta coefficients and bias term supplied to an optimization routine for quicker convergence.
Alternatives for Lower-Dimensional Data
fitclinear and fitrlinear minimize objective functions relatively quickly for a high-dimensional linear model at the cost of some accuracy and with the restriction that the model must be linear with respect to the parameters. If your predictor data set is low- to medium-dimensional, you can use an alternative classification or regression fitting function. To help you decide which fitting function is appropriate for your data set, use this table.
SVM
• Binary classification: fitcsvm
• Multiclass classification: fitcecoc
• Regression: fitrsvm
Notable algorithmic differences:
• Computes the Gram matrix of the predictor variables, which is convenient for nonlinear kernel transformations.
• Solves dual problem using SMO, ISDA, or L1 minimization via quadratic programming using quadprog.
Linear regression
• Least-squares without regularization: fitlm
• Regularized least-squares using a lasso penalty: lasso
• Ridge regression: ridge or lasso
Notable algorithmic differences:
• lasso implements cyclic coordinate descent.
Logistic regression
• Logistic regression without regularization: fitglm
• Regularized logistic regression using a lasso penalty: lassoglm
Notable algorithmic differences:
• fitglm implements iteratively reweighted least squares.
• lassoglm implements cyclic coordinate descent.
Tips • It is a best practice to orient your predictor matrix so that observations correspond to columns and to specify 'ObservationsIn','columns'. As a result, you can experience a significant reduction in optimization-execution time. • If your predictor data has few observations but many predictor variables, then: • Specify 'PostFitBias',true. • For SGD or ASGD solvers, set PassLimit to a positive integer that is greater than 1, for example, 5 or 10. This setting often results in better accuracy. • For SGD and ASGD solvers, BatchSize affects the rate of convergence. • If BatchSize is too small, then fitclinear achieves the minimum in many iterations, but computes the gradient per iteration quickly. • If BatchSize is too large, then fitclinear achieves the minimum in fewer iterations, but computes the gradient per iteration slowly. • Large learning rates (see LearnRate) speed up convergence to the minimum, but can lead to divergence (that is, over-stepping the minimum). Small learning rates ensure convergence to the minimum, but can lead to slow termination. 35-2383
• When using lasso penalties, experiment with various values of TruncationPeriod. For example, set TruncationPeriod to 1, 10, and then 100. • For efficiency, fitclinear does not standardize predictor data. To standardize X where you orient the observations as the columns, enter X = normalize(X,2);
If you orient the observations as the rows, enter X = normalize(X);
For memory-usage economy, the code replaces the original predictor data with the standardized data.
• After training a model, you can generate C/C++ code that predicts labels for new data. Generating C/C++ code requires MATLAB Coder. For details, see “Introduction to Code Generation” on page 34-3.
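A hedged sketch combining these two tips (standardize first, then pass column-oriented observations); the simulated data is purely illustrative:
rng(5)                               % For reproducibility
X = randn(2000,5000);                % Rows are observations here
X = normalize(X);                    % Standardize each predictor (column)
Xcols = X';                          % Orient observations as columns
Y = Xcols(1,:)' > 0;
Mdl = fitclinear(Xcols,Y,'ObservationsIn','columns','PostFitBias',true);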
Algorithms
• If you specify ValidationData, then, during objective-function optimization:
  • fitclinear estimates the validation loss of ValidationData periodically using the current model, and tracks the minimal estimate.
  • When fitclinear estimates a validation loss, it compares the estimate to the minimal estimate.
  • When subsequent validation loss estimates exceed the minimal estimate five times, fitclinear terminates optimization.
• If you specify ValidationData and implement a cross-validation routine (CrossVal, CVPartition, Holdout, or KFold), then:
  1. fitclinear randomly partitions X and Y (or Tbl) according to the cross-validation routine that you choose.
  2. fitclinear trains the model using the training-data partition. During objective-function optimization, fitclinear uses ValidationData as another possible way to terminate optimization (for details, see the previous bullet).
  3. Once fitclinear satisfies a stopping criterion, it constructs a trained model based on the optimized linear coefficients and intercept.
  4. a. If you implement k-fold cross-validation, and fitclinear has not exhausted all training-set folds, then fitclinear returns to Step 2 to train using the next training-set fold.
     b. Otherwise, fitclinear terminates training, and then returns the cross-validated model.
You can determine the quality of the cross-validated model. For example:
• To determine the validation loss using the holdout or out-of-fold data from step 1, pass the cross-validated model to kfoldLoss.
• To predict observations on the holdout or out-of-fold data from step 1, pass the cross-validated model to kfoldPredict.
• If you specify the Cost, Prior, and Weights name-value arguments, the output model object stores the specified values in the Cost, Prior, and W properties, respectively. The Cost property stores the user-specified cost matrix (C) without modification. The Prior and W properties store
the prior probabilities and observation weights, respectively, after normalization. For model training, the software updates the prior probabilities and observation weights to incorporate the penalties described in the cost matrix. For details, see “Misclassification Cost Matrix, Prior Probabilities, and Observation Weights” on page 19-8.
Version History Introduced in R2016a R2022a: Regularization method determines the solver used during hyperparameter optimization Behavior changed in R2022a Starting in R2022a, when you specify to optimize hyperparameters and do not specify a Solver value, fitclinear uses either a Limited-memory BFGS (LBFGS) solver or a Sparse Reconstruction by Separable Approximation (SpaRSA) solver, depending on the regularization type selected during each iteration of the hyperparameter optimization. • When Regularization is 'ridge', the function sets the Solver value to 'lbfgs' by default. • When Regularization is 'lasso', the function sets the Solver value to 'sparsa' by default. In previous releases, the default solver selection during hyperparameter optimization depended on various factors, including the regularization type, learner type, and number of predictors. For more information, see Solver.
References
[1] Hsieh, C. J., K. W. Chang, C. J. Lin, S. S. Keerthi, and S. Sundararajan. “A Dual Coordinate Descent Method for Large-Scale Linear SVM.” Proceedings of the 25th International Conference on Machine Learning, ICML ’08, 2008, pp. 408–415.
[2] Langford, J., L. Li, and T. Zhang. “Sparse Online Learning Via Truncated Gradient.” J. Mach. Learn. Res., Vol. 10, 2009, pp. 777–801.
[3] Nocedal, J. and S. J. Wright. Numerical Optimization, 2nd ed., New York: Springer, 2006.
[4] Shalev-Shwartz, S., Y. Singer, and N. Srebro. “Pegasos: Primal Estimated Sub-Gradient Solver for SVM.” Proceedings of the 24th International Conference on Machine Learning, ICML ’07, 2007, pp. 807–814.
[5] Wright, S. J., R. D. Nowak, and M. A. T. Figueiredo. “Sparse Reconstruction by Separable Approximation.” Trans. Sig. Proc., Vol. 57, No 7, 2009, pp. 2479–2493.
[6] Xiao, Lin. “Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization.” J. Mach. Learn. Res., Vol. 11, 2010, pp. 2543–2596.
[7] Xu, Wei. “Towards Optimal One Pass Large Scale Learning with Averaged Stochastic Gradient Descent.” CoRR, abs/1107.2490, 2011.
Extended Capabilities
Tall Arrays
Calculate with arrays that have more rows than fit in memory.
Usage notes and limitations: • fitclinear does not support tall table data. • Some name-value pair arguments have different defaults compared to the default values for the inmemory fitclinear function. Supported name-value pair arguments, and any differences, are: • 'ObservationsIn' — Supports only 'rows'. • 'Lambda' — Can be 'auto' (default) or a scalar. • 'Learner' • 'Regularization' — Supports only 'ridge'. • 'Solver' — Supports only 'lbfgs'. • 'FitBias' — Supports only true. • 'Verbose' — Default value is 1. • 'Beta' • 'Bias' • 'ClassNames' • 'Cost' • 'Prior' • 'Weights' — Value must be a tall array. • 'HessianHistorySize' • 'BetaTolerance' — Default value is relaxed to 1e–3. • 'GradientTolerance' — Default value is relaxed to 1e–3. • 'IterationLimit' — Default value is relaxed to 20. • 'OptimizeHyperparameters' — Value of 'Regularization' parameter must be 'ridge'. • 'HyperparameterOptimizationOptions' — For cross-validation, tall optimization supports only 'Holdout' validation. By default, the software selects and reserves 20% of the data as holdout validation data, and trains the model using the rest of the data. You can specify a different value for the holdout fraction by using this argument. For example, specify 'HyperparameterOptimizationOptions',struct('Holdout',0.3) to reserve 30% of the data as validation data. • For tall arrays, fitclinear implements LBFGS by distributing the calculation of the loss and gradient among different parts of the tall array at each iteration. Other solvers are not available for tall arrays. When initial values for Beta and Bias are not given, fitclinear refines the initial estimates of the parameters by fitting the model locally to parts of the data and combining the coefficients by averaging. For more information, see “Tall Arrays”. Automatic Parallel Support Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™. To perform parallel hyperparameter optimization, use the 'HyperparameterOptimizationOptions', struct('UseParallel',true) name-value argument in the call to the fitclinear function. 35-2386
For more information on parallel hyperparameter optimization, see “Parallel Bayesian Optimization” on page 10-7. For general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).
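For instance, a hedged sketch of the call described above (this requires Parallel Computing Toolbox; the ionosphere data is used only for illustration, and results may vary because of parallel timing):
load ionosphere                      % Binary classification data: X, Y
rng(0)
Mdl = fitclinear(X,Y,'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions',struct('UseParallel',true));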
See Also fitcsvm | fitckernel | fitcecoc | fitglm | lassoglm | testcholdout | fitrlinear | templateLinear | predict | kfoldPredict | kfoldLoss | ClassificationLinear | ClassificationPartitionedLinear
fitcnb Train multiclass naive Bayes model
Syntax Mdl = fitcnb(Tbl,ResponseVarName) Mdl = fitcnb(Tbl,formula) Mdl = fitcnb(Tbl,Y) Mdl = fitcnb(X,Y) Mdl = fitcnb( ___ ,Name,Value)
Description Mdl = fitcnb(Tbl,ResponseVarName) returns a multiclass naive Bayes model (Mdl), trained by the predictors in table Tbl and class labels in the variable Tbl.ResponseVarName. Mdl = fitcnb(Tbl,formula) returns a multiclass naive Bayes model (Mdl), trained by the predictors in table Tbl. formula is an explanatory model of the response and a subset of predictor variables in Tbl used to fit Mdl. Mdl = fitcnb(Tbl,Y) returns a multiclass naive Bayes model (Mdl), trained by the predictors in the table Tbl and class labels in the array Y. Mdl = fitcnb(X,Y) returns a multiclass naive Bayes model (Mdl), trained by predictors X and class labels Y. Mdl = fitcnb( ___ ,Name,Value) returns a naive Bayes classifier with additional options specified by one or more Name,Value pair arguments, using any of the previous syntaxes. For example, you can specify a distribution to model the data, prior probabilities for the classes, or the kernel smoothing window bandwidth.
Examples
Train a Naive Bayes Classifier
Load Fisher's iris data set.
load fisheriris
X = meas(:,3:4);
Y = species;
tabulate(Y)
       Value    Count   Percent
      setosa       50     33.33%
  versicolor       50     33.33%
   virginica       50     33.33%
The software can classify data with more than two classes using naive Bayes methods. 35-2388
Train a naive Bayes classifier. It is good practice to specify the class order.
Mdl = fitcnb(X,Y,'ClassNames',{'setosa','versicolor','virginica'})
Mdl = 
  ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'setosa'  'versicolor'  'virginica'}
            ScoreTransform: 'none'
           NumObservations: 150
         DistributionNames: {'normal'  'normal'}
    DistributionParameters: {3x2 cell}
Mdl is a trained ClassificationNaiveBayes classifier. By default, the software models the predictor distribution within each class using a Gaussian distribution having some mean and standard deviation. Use dot notation to display the parameters of a particular Gaussian fit, e.g., display the fit for the first feature within setosa. setosaIndex = strcmp(Mdl.ClassNames,'setosa'); estimates = Mdl.DistributionParameters{setosaIndex,1} estimates = 2×1 1.4620 0.1737
The mean is 1.4620 and the standard deviation is 0.1737.
Plot the Gaussian contours.
figure
gscatter(X(:,1),X(:,2),Y);
h = gca;
cxlim = h.XLim;
cylim = h.YLim;
hold on
Params = cell2mat(Mdl.DistributionParameters);
Mu = Params(2*(1:3)-1,1:2); % Extract the means
Sigma = zeros(2,2,3);
for j = 1:3
    Sigma(:,:,j) = diag(Params(2*j,:)).^2; % Create diagonal covariance matrix
    xlim = Mu(j,1) + 4*[-1 1]*sqrt(Sigma(1,1,j));
    ylim = Mu(j,2) + 4*[-1 1]*sqrt(Sigma(2,2,j));
    f = @(x,y) arrayfun(@(x0,y0) mvnpdf([x0 y0],Mu(j,:),Sigma(:,:,j)),x,y);
    fcontour(f,[xlim ylim]) % Draw contours for the multivariate normal distributions
end
h.XLim = cxlim;
h.YLim = cylim;
title('Naive Bayes Classifier -- Fisher''s Iris Data')
xlabel('Petal Length (cm)')
ylabel('Petal Width (cm)')
legend('setosa','versicolor','virginica')
hold off
You can change the default distribution using the name-value pair argument 'DistributionNames'. For example, if some predictors are categorical, then you can specify that they are multivariate, multinomial random variables using 'DistributionNames','mvmn'.
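For instance, a hedged sketch that appends a hypothetical integer-coded categorical predictor to the iris measurements and models it as 'mvmn' while keeping the remaining predictors Gaussian:
load fisheriris
rng(0)                               % For reproducibility
X = [meas randi(3,150,1)];           % Hypothetical categorical predictor coded 1-3
Y = species;
Mdl = fitcnb(X,Y,'ClassNames',{'setosa','versicolor','virginica'}, ...
    'DistributionNames',{'normal','normal','normal','normal','mvmn'});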
Specify Prior Probabilities When Training Naive Bayes Classifiers Construct a naive Bayes classifier for Fisher's iris data set. Also, specify prior probabilities during training. Load Fisher's iris data set. load fisheriris X = meas; Y = species; classNames = {'setosa','versicolor','virginica'}; % Class order
X is a numeric matrix that contains four petal measurements for 150 irises. Y is a cell array of character vectors that contains the corresponding iris species. By default, the prior class probability distribution is the relative frequency distribution of the classes in the data set. In this case the prior probability is 33% for each species. However, suppose you know that in the population 50% of the irises are setosa, 20% are versicolor, and 30% are virginica. You can incorporate this information by specifying this distribution as a prior probability during training. 35-2390
Train a naive Bayes classifier. Specify the class order and prior class probability distribution. prior = [0.5 0.2 0.3]; Mdl = fitcnb(X,Y,'ClassNames',classNames,'Prior',prior) Mdl = ClassificationNaiveBayes ResponseName: CategoricalPredictors: ClassNames: ScoreTransform: NumObservations: DistributionNames: DistributionParameters:
'Y' [] {'setosa' 'versicolor' 'virginica'} 'none' 150 {'normal' 'normal' 'normal' 'normal'} {3x4 cell}
Mdl is a trained ClassificationNaiveBayes classifier, and some of its properties appear in the Command Window. The software treats the predictors as independent given a class, and, by default, fits them using normal distributions. The naive Bayes algorithm does not use the prior class probabilities during training. Therefore, you can specify prior class probabilities after training using dot notation. For example, suppose that you want to see the difference in performance between a model that uses the default prior class probabilities and a model that uses different prior. Create a new naive Bayes model based on Mdl, and specify that the prior class probability distribution is an empirical class distribution. defaultPriorMdl = Mdl; FreqDist = cell2table(tabulate(Y)); defaultPriorMdl.Prior = FreqDist{:,3};
The software normalizes the prior class probabilities to sum to 1. Estimate the cross-validation error for both models using 10-fold cross-validation. rng(1); % For reproducibility defaultCVMdl = crossval(defaultPriorMdl); defaultLoss = kfoldLoss(defaultCVMdl) defaultLoss = 0.0533 CVMdl = crossval(Mdl); Loss = kfoldLoss(CVMdl) Loss = 0.0340
Mdl performs better than defaultPriorMdl.
Specify Predictor Distributions for Naive Bayes Classifiers Load Fisher's iris data set. 35-2391
load fisheriris X = meas; Y = species;
Train a naive Bayes classifier using every predictor. It is good practice to specify the class order. Mdl1 = fitcnb(X,Y,... 'ClassNames',{'setosa','versicolor','virginica'}) Mdl1 = ClassificationNaiveBayes ResponseName: CategoricalPredictors: ClassNames: ScoreTransform: NumObservations: DistributionNames: DistributionParameters:
'Y' [] {'setosa' 'versicolor' 'virginica'} 'none' 150 {'normal' 'normal' 'normal' 'normal'} {3x4 cell}
Mdl1.DistributionParameters ans=3×4 cell array {2x1 double} {2x1 double} {2x1 double}
{2x1 double} {2x1 double} {2x1 double}
{2x1 double} {2x1 double} {2x1 double}
{2x1 double} {2x1 double} {2x1 double}
Mdl1.DistributionParameters{1,2} ans = 2×1 3.4280 0.3791
By default, the software models the predictor distribution within each class as a Gaussian with some mean and standard deviation. There are four predictors and three class levels. Each cell in Mdl1.DistributionParameters corresponds to a numeric vector containing the mean and standard deviation of each distribution, e.g., the mean and standard deviation for setosa iris sepal widths are 3.4280 and 0.3791, respectively. Estimate the confusion matrix for Mdl1. isLabels1 = resubPredict(Mdl1); ConfusionMat1 = confusionchart(Y,isLabels1);
Element (j, k) of the confusion matrix chart represents the number of observations that the software classifies as k, but are truly in class j according to the data. Retrain the classifier using the Gaussian distribution for predictors 1 and 2 (the sepal lengths and widths), and the default normal kernel density for predictors 3 and 4 (the petal lengths and widths). Mdl2 = fitcnb(X,Y,... 'DistributionNames',{'normal','normal','kernel','kernel'},... 'ClassNames',{'setosa','versicolor','virginica'}); Mdl2.DistributionParameters{1,2} ans = 2×1 3.4280 0.3791
The software does not train parameters to the kernel density. Rather, the software chooses an optimal width. However, you can specify a width using the 'Width' name-value pair argument. Estimate the confusion matrix for Mdl2. isLabels2 = resubPredict(Mdl2); ConfusionMat2 = confusionchart(Y,isLabels2);
Based on the confusion matrices, the two classifiers perform similarly in the training sample.
Compare Classifiers Using Cross-Validation Load Fisher's iris data set. load fisheriris X = meas; Y = species; rng(1); % For reproducibility
Train and cross-validate a naive Bayes classifier using the default options and k-fold cross-validation. It is good practice to specify the class order. CVMdl1 = fitcnb(X,Y,... 'ClassNames',{'setosa','versicolor','virginica'},... 'CrossVal','on');
By default, the software models the predictor distribution within each class as a Gaussian with some mean and standard deviation. CVMdl1 is a ClassificationPartitionedModel model. Create a default naive Bayes binary classifier template, and train an error-correcting, output codes multiclass model. 35-2394
t = templateNaiveBayes(); CVMdl2 = fitcecoc(X,Y,'CrossVal','on','Learners',t);
CVMdl2 is a ClassificationPartitionedECOC model. You can specify options for the naive Bayes binary learners using the same name-value pair arguments as for fitcnb. Compare the out-of-sample k-fold classification error (proportion of misclassified observations). classErr1 = kfoldLoss(CVMdl1,'LossFun','ClassifErr') classErr1 = 0.0533 classErr2 = kfoldLoss(CVMdl2,'LossFun','ClassifErr') classErr2 = 0.0467
CVMdl2 has a lower generalization error.
Train Naive Bayes Classifiers Using Multinomial Predictors Some spam filters classify an incoming email as spam based on how many times a word or punctuation (called tokens) occurs in an email. The predictors are the frequencies of particular words or punctuations in an email. Therefore, the predictors compose multinomial random variables. This example illustrates classification using naive Bayes and multinomial predictors. Create Training Data Suppose you observed 1000 emails and classified them as spam or not spam. Do this by randomly assigning -1 or 1 to y for each email. n = 1000; rng(1); Y = randsample([-1 1],n,true);
% Sample size % For reproducibility % Random labels
To build the predictor data, suppose that there are five tokens in the vocabulary, and 20 observed tokens per email. Generate predictor data from the five tokens by drawing random, multinomial deviates. The relative frequencies for tokens corresponding to spam emails should differ from emails that are not spam. tokenProbs = [0.2 0.3 0.1 0.15 0.25;... 0.4 0.1 0.3 0.05 0.15]; % Token relative frequencies tokensPerEmail = 20; % Fixed for convenience X = zeros(n,5); X(Y == 1,:) = mnrnd(tokensPerEmail,tokenProbs(1,:),sum(Y == 1)); X(Y == -1,:) = mnrnd(tokensPerEmail,tokenProbs(2,:),sum(Y == -1));
Train the Classifier Train a naive Bayes classifier. Specify that the predictors are multinomial. Mdl = fitcnb(X,Y,'DistributionNames','mn');
Mdl is a trained ClassificationNaiveBayes classifier. Assess the in-sample performance of Mdl by estimating the misclassification error. 35-2395
isGenRate = resubLoss(Mdl,'LossFun','ClassifErr') isGenRate = 0.0200
The in-sample misclassification rate is 2%. Create New Data Randomly generate deviates that represent a new batch of emails. newN = 500; newY = randsample([-1 1],newN,true); newX = zeros(newN,5); newX(newY == 1,:) = mnrnd(tokensPerEmail,tokenProbs(1,:),... sum(newY == 1)); newX(newY == -1,:) = mnrnd(tokensPerEmail,tokenProbs(2,:),... sum(newY == -1));
Assess Classifier Performance Classify the new emails using the trained naive Bayes classifier Mdl, and determine whether the algorithm generalizes. oosGenRate = loss(Mdl,newX,newY) oosGenRate = 0.0261
The out-of-sample misclassification rate is 2.6%, indicating that the classifier generalizes fairly well.
Optimize Naive Bayes Classifier This example shows how to use the OptimizeHyperparameters name-value pair to minimize crossvalidation loss in a naive Bayes classifier using fitcnb. The example uses Fisher's iris data. Load Fisher's iris data. load fisheriris X = meas; Y = species; classNames = {'setosa','versicolor','virginica'};
Optimize the classification using the 'auto' parameters. For reproducibility, set the random seed and use the 'expected-improvement-plus' acquisition function. rng default Mdl = fitcnb(X,Y,'ClassNames',classNames,'OptimizeHyperparameters','auto',... 'HyperparameterOptimizationOptions',struct('AcquisitionFunctionName',... 'expected-improvement-plus'))
|================================================================================================ | Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | Distribution-| W | | result | | runtime | (observed) | (estim.) | Names | |================================================================================================ | 1 | Best | 0.093333 | 1.1844 | 0.093333 | 0.093333 | kernel | 5. | 2 | Accept | 0.13333 | 0.3655 | 0.093333 | 0.11333 | kernel | 94
| 3 | Best | 0.053333 | 0.23879 | 0.053333 | 0.05765 | normal | | 4 | Accept | 0.053333 | 0.11305 | 0.053333 | 0.053336 | normal | | 5 | Accept | 0.26667 | 0.3385 | 0.053333 | 0.053338 | kernel | 0.00 | 6 | Accept | 0.093333 | 0.31532 | 0.053333 | 0.053337 | kernel | 10 | 7 | Accept | 0.26667 | 0.31183 | 0.053333 | 0.05334 | kernel | 0.001 | 8 | Accept | 0.093333 | 0.31837 | 0.053333 | 0.053338 | kernel | 98 | 9 | Accept | 0.13333 | 0.32457 | 0.053333 | 0.053338 | kernel | 99 | 10 | Accept | 0.053333 | 0.10881 | 0.053333 | 0.053336 | normal | | 11 | Accept | 0.053333 | 0.092586 | 0.053333 | 0.053336 | normal | | 12 | Best | 0.046667 | 0.31064 | 0.046667 | 0.046679 | kernel | 0.3 | 13 | Accept | 0.11333 | 0.30626 | 0.046667 | 0.046685 | kernel | 1. | 14 | Accept | 0.053333 | 0.3077 | 0.046667 | 0.046695 | kernel | 0.1 | 15 | Accept | 0.046667 | 0.30945 | 0.046667 | 0.046677 | kernel | 0.2 | 16 | Accept | 0.06 | 0.32655 | 0.046667 | 0.046686 | kernel | 0.5 | 17 | Accept | 0.046667 | 0.31136 | 0.046667 | 0.046656 | kernel | 0.0 | 18 | Accept | 0.093333 | 0.31344 | 0.046667 | 0.046654 | kernel | 13 | 19 | Accept | 0.046667 | 0.30673 | 0.046667 | 0.04648 | kernel | 0.1 | 20 | Best | 0.04 | 0.29801 | 0.04 | 0.040132 | kernel | 0.1 |================================================================================================ | Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | Distribution-| W | | result | | runtime | (observed) | (estim.) | Names | |================================================================================================ | 21 | Accept | 0.04 | 0.31357 | 0.04 | 0.040066 | kernel | 0.1 | 22 | Accept | 0.04 | 0.31172 | 0.04 | 0.040043 | kernel | 0.1 | 23 | Accept | 0.04 | 0.30052 | 0.04 | 0.040031 | kernel | 0.1 | 24 | Accept | 0.10667 | 0.29607 | 0.04 | 0.040018 | kernel | 0.008 | 25 | Accept | 0.073333 | 0.33546 | 0.04 | 0.040022 | kernel | 0.0 | 26 | Accept | 0.04 | 0.30352 | 0.04 | 0.04002 | kernel | 0. | 27 | Accept | 0.13333 | 0.30853 | 0.04 | 0.040021 | kernel | 12 | 28 | Accept | 0.11333 | 0.36925 | 0.04 | 0.040006 | kernel | 0.004 | 29 | Accept | 0.1 | 0.37091 | 0.04 | 0.039993 | kernel | 0.02 | 30 | Accept | 0.046667 | 0.33394 | 0.04 | 0.041008 | kernel | 0.1 __________________________________________________________ Optimization completed. MaxObjectiveEvaluations of 30 reached. Total function evaluations: 30 Total elapsed time: 23.35 seconds Total objective function evaluation time: 9.7453 Best observed feasible point: DistributionNames Width _________________ _______ kernel
0.19525
Standardize ___________ true
Observed objective function value = 0.04 Estimated objective function value = 0.041117 Function evaluation time = 0.29801 Best estimated feasible point (according to models): DistributionNames Width Standardize _________________ ______ ___________ kernel
0.2037
true
Estimated objective function value = 0.041008 Estimated function evaluation time = 0.31644
Mdl = ClassificationNaiveBayes ResponseName: CategoricalPredictors: ClassNames: ScoreTransform: NumObservations: HyperparameterOptimizationResults: DistributionNames: DistributionParameters: Kernel: Support: Width: Mu: Sigma:
'Y' [] {'setosa' 'versicolor' 'virginica'} 'none' 150 [1x1 BayesianOptimization] {'kernel' 'kernel' 'kernel' 'kernel'} {3x4 cell} {'normal' 'normal' 'normal' 'normal'} {'unbounded' 'unbounded' 'unbounded' 'unbounded'} [3x4 double] [5.8433 3.0573 3.7580 1.1993] [0.8281 0.4359 1.7653 0.7622]
Input Arguments Tbl — Sample data table Sample data used to train the model, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Optionally, Tbl can contain one 35-2398
additional column for the response variable. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed. • If Tbl contains the response variable, and you want to use all remaining variables in Tbl as predictors, then specify the response variable by using ResponseVarName. • If Tbl contains the response variable, and you want to use only a subset of the remaining variables in Tbl as predictors, then specify a formula by using formula. • If Tbl does not contain the response variable, then specify a response variable by using Y. The length of the response variable and the number of rows in Tbl must be equal. ResponseVarName — Response variable name name of variable in Tbl Response variable name, specified as the name of a variable in Tbl. You must specify ResponseVarName as a character vector or string scalar. For example, if the response variable Y is stored as Tbl.Y, then specify it as "Y". Otherwise, the software treats all columns of Tbl, including Y, as predictors when training the model. The response variable must be a categorical, character, or string array; a logical or numeric vector; or a cell array of character vectors. If Y is a character array, then each element of the response variable must correspond to one row of the array. A good practice is to specify the order of the classes by using the ClassNames name-value argument. Data Types: char | string formula — Explanatory model of response variable and subset of predictor variables character vector | string scalar Explanatory model of the response variable and a subset of the predictor variables, specified as a character vector or string scalar in the form "Y~x1+x2+x3". In this form, Y represents the response variable, and x1, x2, and x3 represent the predictor variables. To specify a subset of variables in Tbl as predictors for training the model, use a formula. If you specify a formula, then the software does not use any variables in Tbl that do not appear in formula. The variable names in the formula must be both variable names in Tbl (Tbl.Properties.VariableNames) and valid MATLAB identifiers. You can verify the variable names in Tbl by using the isvarname function. If the variable names are not valid, then you can convert them by using the matlab.lang.makeValidName function. Data Types: char | string Y — Class labels categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors Class labels to which the naive Bayes classifier is trained, specified as a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors. Each element of Y defines the class membership of the corresponding row of X. Y supports K class levels. If Y is a character array, then each row must correspond to one class label. 35-2399
The length of Y and the number of rows of X must be equivalent.
Data Types: categorical | char | string | logical | single | double | cell
X — Predictor data
numeric matrix
Predictor data, specified as a numeric matrix. Each row of X corresponds to one observation (also known as an instance or example), and each column corresponds to one variable (also known as a feature).
The length of Y and the number of rows of X must be equivalent.
Data Types: double
Note: The software treats NaN, empty character vector (''), empty string (""), <missing>, and <undefined> elements as missing data values.
• If Y contains missing values, then the software removes them and the corresponding rows of X.
• If X contains any rows composed entirely of missing values, then the software removes those rows and the corresponding elements of Y.
• If X contains missing values and you set 'DistributionNames','mn', then the software removes those rows of X and the corresponding elements of Y.
• If a predictor is not represented in a class, that is, if all of its values are NaN within a class, then the software returns an error.
Removing rows of X and corresponding elements of Y decreases the effective training or cross-validation sample size.
Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Note You cannot use any cross-validation name-value argument together with the 'OptimizeHyperparameters' name-value argument. You can modify the cross-validation for 'OptimizeHyperparameters' only by using the 'HyperparameterOptimizationOptions' name-value argument.
Example: 'DistributionNames','mn','Prior','uniform','KSWidth',0.5 specifies that the data distribution is multinomial, the prior probabilities for all classes are equal, and the kernel smoothing window bandwidth for all classes is 0.5 units.
Naive Bayes Options
DistributionNames — Data distributions 'kernel' | 'mn' | 'mvmn' | 'normal' | string array | cell array of character vectors 35-2400
Data distributions fitcnb uses to model the data, specified as the comma-separated pair consisting of 'DistributionNames' and a character vector or string scalar, a string array, or a cell array of character vectors with values from this table. Value
Description
'kernel'
Kernel smoothing density estimate.
'mn'
Multinomial distribution. If you specify mn, then all features are components of a multinomial distribution. Therefore, you cannot include 'mn' as an element of a string array or a cell array of character vectors. For details, see “Algorithms” on page 35-2413.
'mvmn'
Multivariate multinomial distribution. For details, see “Algorithms” on page 35-2413.
'normal'
Normal (Gaussian) distribution.
If you specify a character vector or string scalar, then the software models all the features using that distribution. If you specify a 1-by-P string array or cell array of character vectors, then the software models feature j using the distribution in element j of the array. By default, the software sets all predictors specified as categorical predictors (using the CategoricalPredictors name-value pair argument) to 'mvmn'. Otherwise, the default distribution is 'normal'.
You must specify that at least one predictor has distribution 'kernel' to additionally specify Kernel, Standardize, Support, or Width.
Example: 'DistributionNames','mn'
Example: 'DistributionNames',{'kernel','normal','kernel'}
Kernel — Kernel smoother type
'normal' (default) | 'box' | 'epanechnikov' | 'triangle' | string array | cell array of character vectors
Kernel smoother type, specified as the comma-separated pair consisting of 'Kernel' and a character vector or string scalar, a string array, or a cell array of character vectors. This table summarizes the available options for setting the kernel smoothing density region. Let I{u} denote the indicator function.

Value             Kernel           Formula
'box'             Box (uniform)    f(x) = 0.5 I{|x| ≤ 1}
'epanechnikov'    Epanechnikov     f(x) = 0.75 (1 − x^2) I{|x| ≤ 1}
'normal'          Gaussian         f(x) = (1/sqrt(2π)) exp(−0.5 x^2)
'triangle'        Triangular       f(x) = (1 − |x|) I{|x| ≤ 1}
If you specify a 1-by-P string array or cell array, with each element of the array containing any value in the table, then the software trains the classifier using the kernel smoother type in element j for 35-2401
feature j in X. The software ignores elements of Kernel not corresponding to a predictor whose distribution is 'kernel'. You must specify that at least one predictor has distribution 'kernel' to additionally specify Kernel, Standardize, Support, or Width. Example: 'Kernel',{'epanechnikov','normal'} Standardize — Flag to standardize kernel-distributed predictors false or 0 (default) | true or 1 Flag to standardize the kernel-distributed predictors, specified as a numeric or logical 0 (false) or 1 (true). This argument is valid only when the DistributionNames value contains at least one kernel distribution ("kernel"). If you set Standardize to true, then the software centers and scales each kernel-distributed predictor variable by the corresponding column mean and standard deviation. The software does not standardize predictors with nonkernel distributions, such as categorical predictors. Example: "Standardize",true Data Types: single | double | logical Support — Kernel smoothing density support 'unbounded' (default) | 'positive' | string array | cell array | numeric row vector Kernel smoothing density support, specified as the comma-separated pair consisting of 'Support' and 'positive', 'unbounded', a string array, a cell array, or a numeric row vector. The software applies the kernel smoothing density to the specified region. This table summarizes the available options for setting the kernel smoothing density region. Value
Description
1-by-2 numeric row vector
For example, [L,U], where L and U are the finite lower and upper bounds, respectively, for the density support.
'positive'
The density support is all positive real values.
'unbounded'
The density support is all real values.
If you specify a 1-by-P string array or cell array, with each element in the string array containing any text value in the table and each element in the cell array containing any value in the table, then the software trains the classifier using the kernel support in element j for feature j in X. The software ignores elements of Kernel not corresponding to a predictor whose distribution is 'kernel'. You must specify that at least one predictor has distribution 'kernel' to additionally specify Kernel, Standardize, Support, or Width. Example: 'Support',{[-10,20],'unbounded'} Data Types: char | string | cell | double Width — Kernel smoothing window width matrix of numeric values | numeric column vector | numeric row vector | scalar Kernel smoothing window width, specified as the comma-separated pair consisting of 'Width' and a matrix of numeric values, numeric column vector, numeric row vector, or scalar. 35-2402
Suppose there are K class levels and P predictors. This table summarizes the available options for setting the kernel smoothing window width. Value
Description
K-by-P matrix of numeric values
Element (k,j) specifies the width for predictor j in class k.
K-by-1 numeric column vector
Element k specifies the width for all predictors in class k.
1-by-P numeric row vector
Element j specifies the width in all class levels for predictor j.
scalar
Specifies the bandwidth for all features in all classes.
By default, the software selects a default width automatically for each combination of predictor and class by using a value that is optimal for a Gaussian distribution. If you specify Width and it contains NaNs, then the software selects widths for the elements containing NaNs. You must specify that at least one predictor has distribution 'kernel' to additionally specify Kernel, Standardize, Support, or Width. Example: 'Width',[NaN NaN] Data Types: double | struct Cross-Validation Options
CrossVal — Cross-validation flag 'off' (default) | 'on' Cross-validation flag, specified as the comma-separated pair consisting of 'Crossval' and 'on' or 'off'. If you specify 'on', then the software implements 10-fold cross-validation. To override this cross-validation setting, use one of these name-value pair arguments: CVPartition, Holdout, KFold, or Leaveout. To create a cross-validated model, you can use one cross-validation name-value pair argument at a time only. Alternatively, cross-validate later by passing Mdl to crossval. Example: 'CrossVal','on' CVPartition — Cross-validation partition [] (default) | cvpartition object Cross-validation partition, specified as a cvpartition object that specifies the type of crossvalidation and the indexing for the training and validation sets. To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,KFold=5). Then, you can specify the cross-validation partition by setting CVPartition=cvp. Holdout — Fraction of data for holdout validation scalar value in the range (0,1) 35-2403
Fraction of the data used for holdout validation, specified as a scalar value in the range [0,1]. If you specify Holdout=p, then the software completes these steps: 1
Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.
2
Store the compact trained model in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: Holdout=0.1 Data Types: double | single KFold — Number of folds 10 (default) | positive integer value greater than 1 Number of folds to use in the cross-validated model, specified as a positive integer value greater than 1. If you specify KFold=k, then the software completes these steps: 1
Randomly partition the data into k sets.
2
For each set, reserve the set as validation data, and train the model using the other k – 1 sets.
3
Store the k compact trained models in a k-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: KFold=5 Data Types: single | double Leaveout — Leave-one-out cross-validation flag "off" (default) | "on" Leave-one-out cross-validation flag, specified as "on" or "off". If you specify Leaveout="on", then for each of the n observations (where n is the number of observations, excluding missing observations, specified in the NumObservations property of the model), the software completes these steps: 1
Reserve the one observation as validation data, and train the model using the other n – 1 observations.
2
Store the n compact trained models in an n-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout. Example: Leaveout="on" Data Types: char | string Other Classification Options
CategoricalPredictors — Categorical predictors list vector of positive integers | logical vector | character matrix | string array | cell array of character vectors | 'all' 35-2404
Categorical predictors list, specified as one of the values in this table. Value
Description
Vector of positive integers
Each entry in the vector is an index value indicating that the corresponding predictor is categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If fitcnb uses a subset of input variables as predictors, then the function indexes the predictors using only the subset. The CategoricalPredictors values do not count the response variable, observation weights variable, or any other variables that the function does not use.
Logical vector
A true entry means that the corresponding predictor is categorical. The length of the vector is p.
Character matrix
Each row of the matrix is the name of a predictor variable. The names must match the entries in PredictorNames. Pad the names with extra blanks so each row of the character matrix has the same length.
String array or cell array of character vectors
Each element in the array is the name of a predictor variable. The names must match the entries in PredictorNames.
"all"
All predictors are categorical.
By default, if the predictor data is in a table (Tbl), fitcnb assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix (X), fitcnb assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the CategoricalPredictors name-value argument. For the identified categorical predictors, fitcnb uses multivariate multinomial distributions. For details, see DistributionNames and “Algorithms” on page 35-2413. Example: 'CategoricalPredictors','all' Data Types: single | double | logical | char | string | cell ClassNames — Names of classes to use for training categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors Names of classes to use for training, specified as a categorical, character, or string array; a logical or numeric vector; or a cell array of character vectors. ClassNames must have the same data type as the response variable in Tbl or Y. If ClassNames is a character array, then each element must correspond to one row of the array. Use ClassNames to: • Specify the order of the classes during training. • Specify the order of any input or output argument dimension that corresponds to the class order. For example, use ClassNames to specify the order of the dimensions of Cost or the column order of classification scores returned by predict. 35-2405
• Select a subset of classes for training. For example, suppose that the set of all distinct class names in Y is ["a","b","c"]. To train the model using observations from classes "a" and "c" only, specify "ClassNames",["a","c"]. The default value for ClassNames is the set of all distinct class names in the response variable in Tbl or Y. Example: "ClassNames",["b","g"] Data Types: categorical | char | string | logical | single | double | cell Cost — Cost of misclassification square matrix | structure Cost of misclassification of a point, specified as the comma-separated pair consisting of 'Cost' and one of the following: • Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (i.e., the rows correspond to the true class and the columns correspond to the predicted class). To specify the class order for the corresponding rows and columns of Cost, additionally specify the ClassNames name-value pair argument. • Structure S having two fields: S.ClassNames containing the group names as a variable of the same type as Y, and S.ClassificationCosts containing the cost matrix. The default is Cost(i,j)=1 if i~=j, and Cost(i,j)=0 if i=j. Example: 'Cost',struct('ClassNames',{{'b','g'}},'ClassificationCosts',[0 0.5; 1 0]) Data Types: single | double | struct PredictorNames — Predictor variable names string array of unique names | cell array of unique character vectors Predictor variable names, specified as a string array of unique names or cell array of unique character vectors. The functionality of PredictorNames depends on the way you supply the training data. • If you supply X and Y, then you can use PredictorNames to assign names to the predictor variables in X. • The order of the names in PredictorNames must correspond to the column order of X. That is, PredictorNames{1} is the name of X(:,1), PredictorNames{2} is the name of X(:,2), and so on. Also, size(X,2) and numel(PredictorNames) must be equal. • By default, PredictorNames is {'x1','x2',...}. • If you supply Tbl, then you can use PredictorNames to choose which predictor variables to use in training. That is, fitcnb uses only the predictor variables in PredictorNames and the response variable during training. • PredictorNames must be a subset of Tbl.Properties.VariableNames and cannot include the name of the response variable. • By default, PredictorNames contains the names of all predictor variables. • A good practice is to specify the predictors for training using either PredictorNames or formula, but not both. 35-2406
Example: "PredictorNames", ["SepalLength","SepalWidth","PetalLength","PetalWidth"] Data Types: string | cell Prior — Prior probabilities 'empirical' (default) | 'uniform' | vector of scalar values | structure Prior probabilities for each class, specified as the comma-separated pair consisting of 'Prior' and a value in this table. Value
Description
'empirical'
The class prior probabilities are the class relative frequencies in Y.
'uniform'
All class prior probabilities are equal to 1/K, where K is the number of classes.
numeric vector
Each element is a class prior probability. Order the elements according to Mdl.ClassNames or specify the order using the ClassNames namevalue pair argument. The software normalizes the elements such that they sum to 1.
structure
A structure S with two fields: • S.ClassNames contains the class names as a variable of the same type as Y. • S.ClassProbs contains a vector of corresponding prior probabilities. The software normalizes the elements such that they sum to 1.
If you set values for both Weights and Prior, the weights are renormalized to add up to the value of the prior probability in the respective class. Example: 'Prior','uniform' Data Types: char | string | single | double | struct ResponseName — Response variable name "Y" (default) | character vector | string scalar Response variable name, specified as a character vector or string scalar. • If you supply Y, then you can use ResponseName to specify a name for the response variable. • If you supply ResponseVarName or formula, then you cannot use ResponseName. Example: "ResponseName","response" Data Types: char | string ScoreTransform — Score transformation "none" (default) | "doublelogit" | "invlogit" | "ismax" | "logit" | function handle | ... Score transformation, specified as a character vector, string scalar, or function handle. 35-2407
This table summarizes the available character vectors and string scalars. Value
Description
"doublelogit"
1/(1 + e–2x)
"invlogit"
log(x / (1 – x))
"ismax"
Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
"logit"
1/(1 + e–x)
"none" or "identity"
x (no transformation)
"sign"
–1 for x < 0; 0 for x = 0; 1 for x > 0
"symmetric"
2x – 1
"symmetricismax"
Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
"symmetriclogit"
2/(1 + e–x) – 1
For a MATLAB function or a function you define, use its function handle for the score transform. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores). Example: "ScoreTransform","logit" Data Types: char | string | function_handle Weights — Observation weights numeric vector of positive values | name of variable in Tbl Observation weights, specified as the comma-separated pair consisting of 'Weights' and a numeric vector of positive values or name of a variable in Tbl. The software weighs the observations in each row of X or Tbl with the corresponding value in Weights. The size of Weights must equal the number of rows of X or Tbl. If you specify the input data as a table Tbl, then Weights can be the name of a variable in Tbl that contains a numeric vector. In this case, you must specify Weights as a character vector or string scalar. For example, if the weights vector W is stored as Tbl.W, then specify it as 'W'. Otherwise, the software treats all columns of Tbl, including W, as predictors or the response when training the model. The software normalizes Weights to sum up to the value of the prior probability in the respective class. By default, Weights is ones(n,1), where n is the number of observations in X or Tbl. Data Types: double | single | char | string
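As an illustration of how several of the preceding options combine, the following sketch (fisheriris data, illustrative option values) fits kernel densities with a positive support and uniform priors:

load fisheriris    % meas: 150-by-4 numeric predictors, species: class labels

% Model every predictor with a kernel density, restrict the support to
% positive values (all iris measurements are positive), and use equal priors.
Mdl = fitcnb(meas,species, ...
    "DistributionNames","kernel", ...
    "Kernel","box", ...
    "Support","positive", ...
    "Prior","uniform", ...
    "ClassNames",{'setosa','versicolor','virginica'});

Mdl.DistributionParameters    % one fitted kernel density per class/predictor pair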
Hyperparameter Optimization
OptimizeHyperparameters — Parameters to optimize 'none' (default) | 'auto' | 'all' | string array or cell array of eligible parameter names | vector of optimizableVariable objects Parameters to optimize, specified as the comma-separated pair consisting of 'OptimizeHyperparameters' and one of the following: • 'none' — Do not optimize. • 'auto' — Use {'DistributionNames','Standardize','Width'}. • 'all' — Optimize all eligible parameters. • String array or cell array of eligible parameter names. • Vector of optimizableVariable objects, typically the output of hyperparameters. The optimization attempts to minimize the cross-validation loss (error) for fitcnb by varying the parameters. For information about cross-validation loss (albeit in a different context), see “Classification Loss” on page 35-4305. To control the cross-validation type and other aspects of the optimization, use the HyperparameterOptimizationOptions name-value pair. Note The values of 'OptimizeHyperparameters' override any values you specify using other name-value arguments. For example, setting 'OptimizeHyperparameters' to 'auto' causes fitcnb to optimize hyperparameters corresponding to the 'auto' option and to ignore any specified values for the hyperparameters. The eligible parameters for fitcnb are: • DistributionNames — fitcnb searches among 'normal' and 'kernel'. • Kernel — fitcnb searches among 'normal', 'box', 'epanechnikov', and 'triangle'. • Standardize — fitcnb searches among true and false. • Width — fitcnb searches among real values, by default log-scaled in the range [1e-3,1e3]. Set nondefault parameters by passing a vector of optimizableVariable objects that have nondefault values. For example, load fisheriris params = hyperparameters('fitcnb',meas,species); params(2).Range = [1e-2,1e2];
Pass params as the value of OptimizeHyperparameters. By default, the iterative display appears at the command line, and plots appear according to the number of hyperparameters in the optimization. For the optimization and plots, the objective function is the misclassification rate. To control the iterative display, set the Verbose field of the 'HyperparameterOptimizationOptions' name-value argument. To control the plots, set the ShowPlots field of the 'HyperparameterOptimizationOptions' name-value argument. For an example, see “Optimize Naive Bayes Classifier” on page 35-2396. Example: 'auto' 35-2409
HyperparameterOptimizationOptions — Options for optimization structure Options for optimization, specified as a structure. This argument modifies the effect of the OptimizeHyperparameters name-value argument. All fields in the structure are optional. Field Name
Values
Default
Optimizer
• 'bayesopt' — Use Bayesian optimization. Internally, this setting calls bayesopt.
'bayesopt'
• 'gridsearch' — Use grid search with NumGridDivisions values per dimension.
• 'randomsearch' — Search at random among MaxObjectiveEvaluations points.
'gridsearch' searches in a random order, using uniform sampling without replacement from the grid. After optimization, you can get a table in grid order by using the command sortrows(Mdl.HyperparameterOptimizationResults).
AcquisitionFunctionName
• 'expected-improvement-per-second-plus' (default)
• 'expected-improvement'
• 'expected-improvement-plus'
• 'expected-improvement-per-second'
• 'lower-confidence-bound'
• 'probability-of-improvement'
Acquisition functions whose names include per-second do not yield reproducible results because the optimization depends on the runtime of the objective function. Acquisition functions whose names include plus modify their behavior when they are overexploiting an area. For more details, see “Acquisition Function Types” on page 10-3.
MaxObjectiveEvaluations
Maximum number of objective function evaluations.
30 for 'bayesopt' and 'randomsearch', and the entire grid for 'gridsearch'
MaxTime
Inf
Time limit, specified as a positive real scalar. The time limit is in seconds, as measured by tic and toc. The run time can exceed MaxTime because MaxTime does not interrupt function evaluations.
Field Name
Values
Default
NumGridDivisions
For 'gridsearch', the number of values in each dimension. The value can be a vector of positive integers giving the number of values for each dimension, or a scalar that applies to all dimensions. This field is ignored for categorical variables. Default: 10
ShowPlots
Logical value indicating whether to show plots. If true, this field plots the best observed objective function value against the iteration number. If you use Bayesian optimization (Optimizer is 'bayesopt'), then this field also plots the best estimated objective function value. The best observed objective function values and best estimated objective function values correspond to the values in the BestSoFar (observed) and BestSoFar (estim.) columns of the iterative display, respectively. You can find these values in the properties ObjectiveMinimumTrace and EstimatedObjectiveMinimumTrace of Mdl.HyperparameterOptimizationResults. If the problem includes one or two optimization parameters for Bayesian optimization, then ShowPlots also plots a model of the objective function against the parameters. Default: true
SaveIntermediateResults
Logical value indicating whether to save results when Optimizer is 'bayesopt'. If true, this field overwrites a workspace variable named 'BayesoptResults' at each iteration. The variable is a BayesianOptimization object.
false
Verbose
1
Display at the command line: • 0 — No iterative display • 1 — Iterative display • 2 — Iterative display with extra information For details, see the bayesopt Verbose namevalue argument and the example “Optimize Classifier Fit Using Bayesian Optimization” on page 10-56.
UseParallel
Logical value indicating whether to run Bayesian optimization in parallel, which requires Parallel Computing Toolbox. Due to the nonreproducibility of parallel timing, parallel Bayesian optimization does not necessarily yield reproducible results. For details, see “Parallel Bayesian Optimization” on page 10-7. Default: false
Field Name
Values
Default
Repartition
Logical value indicating whether to repartition the cross-validation at every iteration. If this field is false, the optimizer uses a single partition for the optimization. The setting true usually gives the most robust results because it takes partitioning noise into account. However, for good results, true requires at least twice as many function evaluations. Default: false
Use no more than one of the following three options. CVPartition
A cvpartition object, as created by cvpartition
Holdout
A scalar in the range (0,1) representing the holdout fraction
Kfold
An integer greater than 1
'Kfold',5 if you do not specify a cross-validation field
Example: 'HyperparameterOptimizationOptions',struct('MaxObjectiveEvaluations',60) Data Types: struct
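A short sketch of a typical call (fisheriris data, illustrative field values) that combines OptimizeHyperparameters with these optimization options:

load fisheriris
rng("default")    % for reproducibility of the optimization

% Search the 'auto' hyperparameter set with 5-fold cross-validation and a
% reduced evaluation budget.
Mdl = fitcnb(meas,species, ...
    "OptimizeHyperparameters","auto", ...
    "HyperparameterOptimizationOptions", ...
    struct("Kfold",5,"MaxObjectiveEvaluations",20,"ShowPlots",false));

Mdl.HyperparameterOptimizationResults    % BayesianOptimization object with the search history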
Output Arguments Mdl — Trained naive Bayes classification model ClassificationNaiveBayes model object | ClassificationPartitionedModel cross-validated model object Trained naive Bayes classification model, returned as a ClassificationNaiveBayes model object or a ClassificationPartitionedModel cross-validated model object. If you set any of the name-value pair arguments KFold, Holdout, CrossVal, or CVPartition, then Mdl is a ClassificationPartitionedModel cross-validated model object. Otherwise, Mdl is a ClassificationNaiveBayes model object. To reference properties of Mdl, use dot notation. For example, to access the estimated distribution parameters, enter Mdl.DistributionParameters.
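A brief sketch of the two possible return types (fisheriris data):

load fisheriris

% Without cross-validation arguments, the result is a ClassificationNaiveBayes object.
Mdl = fitcnb(meas,species);
Mdl.DistributionParameters     % estimated parameters, one cell per class/predictor pair

% With a cross-validation argument, the result is a ClassificationPartitionedModel.
CVMdl = fitcnb(meas,species,"KFold",5);
kfoldLoss(CVMdl)               % cross-validated misclassification rate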
More About Bag-of-Tokens Model In the bag-of-tokens model, the value of predictor j is the nonnegative number of occurrences of token j in the observation. The number of categories (bins) in the multinomial model is the number of distinct tokens (number of predictors). Naive Bayes Naive Bayes is a classification algorithm that applies density estimation to the data. The algorithm leverages Bayes theorem, and (naively) assumes that the predictors are conditionally independent, given the class. Although the assumption is usually violated in practice, naive Bayes 35-2412
classifiers tend to yield posterior distributions that are robust to biased class density estimates, particularly where the posterior is 0.5 (the decision boundary) [1]. Naive Bayes classifiers assign observations to the most probable class (in other words, the maximum a posteriori decision rule). Explicitly, the algorithm takes these steps: 1
Estimate the densities of the predictors within each class.
2
Model posterior probabilities according to Bayes rule. That is, for all k = 1,...,K,

P(Y = k \mid X_1, ..., X_P) = \frac{\pi(Y = k) \prod_{j=1}^{P} P(X_j \mid Y = k)}{\sum_{k=1}^{K} \pi(Y = k) \prod_{j=1}^{P} P(X_j \mid Y = k)},
where: • Y is the random variable corresponding to the class index of an observation. • X1,...,XP are the random predictors of an observation. • π Y = k is the prior probability that a class index is k. 3
Classify an observation by estimating the posterior probability for each class, and then assign the observation to the class yielding the maximum posterior probability.
If the predictors compose a multinomial distribution, then the posterior probability P(Y = k \mid X_1, ..., X_P) ∝ π(Y = k) P_mn(X_1, ..., X_P \mid Y = k), where P_mn(X_1, ..., X_P \mid Y = k) is the probability mass function of a multinomial distribution.
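For reference, with p_{j|k} denoting P(token j | class k), the multinomial mass function takes the standard form

P_{mn}(X_1, ..., X_P \mid Y = k) = \frac{\left(\sum_{j=1}^{P} X_j\right)!}{\prod_{j=1}^{P} X_j!} \prod_{j=1}^{P} p_{j|k}^{X_j},

where X_j is the count of token j in the observation.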
Tips • For classifying count-based data, such as the bag-of-tokens model on page 35-2412, use the multinomial distribution (e.g., set 'DistributionNames','mn'). • After training a model, you can generate C/C++ code that predicts labels for new data. Generating C/C++ code requires MATLAB Coder. For details, see “Introduction to Code Generation” on page 34-3.
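The first tip above is easiest to see on a small count matrix. The following sketch uses synthetic token counts (the data and class names are made up for illustration):

rng("default")    % for reproducibility of the synthetic counts
X = [poissrnd(2,20,5); poissrnd(5,20,5)];     % 40 "documents" by 5 token counts
Y = [repmat("topicA",20,1); repmat("topicB",20,1)];

Mdl = fitcnb(X,Y,"DistributionNames","mn");
Mdl.DistributionParameters(1,:)               % estimated token probabilities for the first class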
Algorithms
• If predictor variable j has a conditional normal distribution (see the DistributionNames name-value argument), the software fits the distribution to the data by computing the class-specific weighted mean and the unbiased estimate of the weighted standard deviation. For each class k:
• The weighted mean of predictor j is
\bar{x}_{j|k} = \frac{\sum_{i: y_i = k} w_i x_{ij}}{\sum_{i: y_i = k} w_i},
where wi is the weight for observation i. The software normalizes weights within a class such that they sum to the prior probability for that class. 35-2413
• The unbiased estimator of the weighted standard deviation of predictor j is
s_{j|k} = \left[ \frac{\sum_{i: y_i = k} w_i (x_{ij} - \bar{x}_{j|k})^2}{z_{1|k} - z_{2|k}/z_{1|k}} \right]^{1/2},
where z_{1|k} is the sum of the weights within class k and z_{2|k} is the sum of the squared weights within class k.
• If all predictor variables compose a conditional multinomial distribution (you specify 'DistributionNames','mn'), the software fits the distribution using the bag-of-tokens model on page 35-2412. The software stores the probability that token j appears in class k in the property DistributionParameters{k,j}. Using additive smoothing [2], the estimated probability is

P(\text{token } j \mid \text{class } k) = \frac{1 + c_{j|k}}{P + c_k},

where:

• c_{j|k} = n_k \frac{\sum_{i: y_i = k} x_{ij} w_i}{\sum_{i: y_i = k} w_i}, which is the weighted number of occurrences of token j in class k.
• n_k is the number of observations in class k.
• w_i is the weight for observation i. The software normalizes weights within a class such that they sum to the prior probability for that class.
• c_k = \sum_{j=1}^{P} c_{j|k}, which is the total weighted number of occurrences of all tokens in class k.
• If predictor variable j has a conditional multivariate multinomial distribution: 1
The software collects a list of the unique levels, stores the sorted list in CategoricalLevels, and considers each level a bin. Each predictor/class combination is a separate, independent multinomial random variable.
2
For each class k, the software counts instances of each categorical level using the list stored in CategoricalLevels{j}.
3
The software stores the probability that predictor j, in class k, has level L in the property DistributionParameters{k,j}, for all levels in CategoricalLevels{j}. Using additive smoothing [2], the estimated probability is

P(\text{predictor } j = L \mid \text{class } k) = \frac{1 + m_{j|k}(L)}{m_j + m_k},

where:

• m_{j|k}(L) = n_k \frac{\sum_{i: y_i = k} I\{x_{ij} = L\} w_i}{\sum_{i: y_i = k} w_i}, which is the weighted number of observations for which predictor j equals L in class k.
• n_k is the number of observations in class k.
• I{x_{ij} = L} = 1 if x_{ij} = L, and 0 otherwise.
• w_i is the weight for observation i. The software normalizes weights within a class such that they sum to the prior probability for that class.
• m_j is the number of distinct levels in predictor j.
• m_k is the weighted number of observations in class k.
• If you specify the Cost, Prior, and Weights name-value arguments, the output model object stores the specified values in the Cost, Prior, and W properties, respectively. The Cost property stores the user-specified cost matrix as is. The Prior and W properties store the prior probabilities and observation weights, respectively, after normalization. For details, see “Misclassification Cost Matrix, Prior Probabilities, and Observation Weights” on page 19-8.
• The software uses the Cost property for prediction, but not training. Therefore, Cost is not read-only; you can change the property value by using dot notation after creating the trained model.
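The following sketch (with made-up categorical data) shows where the estimates described above end up in the trained model:

Color = categorical(["red";"red";"blue";"blue";"green";"green";"red";"blue"]);
Shape = categorical(["S";"M";"M";"L";"S";"L";"L";"S"]);
Y     = categorical(["a";"a";"a";"b";"b";"b";"a";"b"]);
Tbl   = table(Color,Shape,Y);

Mdl = fitcnb(Tbl,"Y");             % categorical predictors default to 'mvmn'
Mdl.CategoricalLevels{1}           % levels (bins) collected for Color
Mdl.DistributionParameters{1,1}    % smoothed P(Color = level | first class)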
Version History Introduced in R2014b R2023b: Naive Bayes models support standardization of kernel-distributed predictors Behavior changed in R2023b Starting in R2023b, fitcnb supports the standardization of predictors with kernel distributions. That is, you can specify the Standardize name-value argument as true when the DistributionNames name-value argument includes at least one "kernel" distribution. You can also optimize the Standardize hyperparameter by using the OptimizeHyperparameters name-value argument. Unlike in previous releases, when you specify "auto" as the OptimizeHyperparameters value, fitcnb includes Standardize as an optimizable hyperparameter. R2023b: Width hyperparameter search range does not depend on predictor data during optimization of naive Bayes models Behavior changed in R2023b Starting in R2023b, fitcnb optimizes the kernel smoothing window width of naive Bayes models by using the default search range [1e-3,1e3]. That is, when you specify to optimize the naive Bayes hyperparameter Width by using the OptimizeHyperparameters name-value argument, the function searches among positive values log-scaled in the range [1e-3,1e3]. In previous releases, the default search range for the Width hyperparameter was [MinPredictorDiff/4,max(MaxPredictorRange,MinPredictorDiff)], where MinPredictorDiff and MaxPredictorRange were determined as follows: diffs = diff(sort(X)); MinPredictorDiff = min(diffs(diffs ~= 0),[],"omitnan"); MaxPredictorRange = max(max(X) - min(X));
References [1] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, Second Edition. NY: Springer, 2008. 35-2415
[2] Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval, NY: Cambridge University Press, 2008.
Extended Capabilities Tall Arrays Calculate with arrays that have more rows than fit in memory. This function supports tall arrays with the limitations: • Supported syntaxes are: • Mdl = fitcnb(Tbl,Y) • Mdl = fitcnb(X,Y) • Mdl = fitcnb(___,Name,Value) • Options related to kernel densities, cross-validation, and hyperparameter optimization are not supported. The supported name-value pair arguments are: • 'DistributionNames' — 'kernel' value is not supported. • 'CategoricalPredictors' • 'Cost' • 'PredictorNames' • 'Prior' • 'ResponseName' • 'ScoreTransform' • 'Weights' — Value must be a tall array. For more information, see “Tall Arrays for Out-of-Memory Data”. Automatic Parallel Support Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™. To perform parallel hyperparameter optimization, use the 'HyperparameterOptimizationOptions', struct('UseParallel',true) name-value argument in the call to the fitcnb function. For more information on parallel hyperparameter optimization, see “Parallel Bayesian Optimization” on page 10-7. For general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).
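A rough sketch of the tall-array workflow follows; it assumes the airlinesmall.csv sample file that ships with MATLAB, and the variable choices are illustrative only:

ds = datastore("airlinesmall.csv","TreatAsMissing","NA", ...
    "SelectedVariableNames",["Distance","DepDelay","ArrDelay"]);
t  = tall(ds);

X = [t.Distance t.DepDelay];    % tall numeric predictor matrix
Y = t.ArrDelay > 20;            % tall logical response: arrival more than 20 minutes late

Mdl = fitcnb(X,Y);              % kernel densities and cross-validation are not supported for tall data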
See Also ClassificationNaiveBayes | predict | ClassificationPartitionedModel | templateNaiveBayes Topics “Naive Bayes Classification” on page 22-2 “Grouping Variables” on page 2-11 35-2416
“Incremental Learning with Naive Bayes and Heterogeneous Data” on page 28-60
fitcnet Train neural network classification model
Syntax Mdl = fitcnet(Tbl,ResponseVarName) Mdl = fitcnet(Tbl,formula) Mdl = fitcnet(Tbl,Y) Mdl = fitcnet(X,Y) Mdl = fitcnet( ___ ,Name,Value)
Description Use fitcnet to train a feedforward, fully connected neural network for classification. The first fully connected layer of the neural network has a connection from the network input (predictor data), and each subsequent layer has a connection from the previous layer. Each fully connected layer multiplies the input by a weight matrix and then adds a bias vector. An activation function follows each fully connected layer. The final fully connected layer and the subsequent softmax activation function produce the network's output, namely classification scores (posterior probabilities) and predicted labels. For more information, see “Neural Network Structure” on page 35-2456. Mdl = fitcnet(Tbl,ResponseVarName) returns a neural network classification model Mdl trained using the predictors in the table Tbl and the class labels in the ResponseVarName table variable. Mdl = fitcnet(Tbl,formula) returns a neural network classification model trained using the sample data in the table Tbl. The input argument formula is an explanatory model of the response and a subset of the predictor variables in Tbl used to fit Mdl. Mdl = fitcnet(Tbl,Y) returns a neural network classification model using the predictor variables in the table Tbl and the class labels in vector Y. Mdl = fitcnet(X,Y) returns a neural network classification model trained using the predictors in the matrix X and the class labels in vector Y. Mdl = fitcnet( ___ ,Name,Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. For example, you can adjust the number of outputs and the activation functions for the fully connected layers by specifying the LayerSizes and Activations name-value arguments.
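For a minimal, illustrative use of the last syntax (layer sizes and activation chosen arbitrarily, fisheriris data):

load fisheriris

% Two fully connected layers with 10 and 5 outputs and tanh activations,
% followed by the final fully connected layer and the softmax output.
Mdl = fitcnet(meas,species, ...
    "LayerSizes",[10 5], ...
    "Activations","tanh");

predict(Mdl,meas(1,:))    % predicted class label for the first observation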
Examples Train Neural Network Classifier Train a neural network classifier, and assess the performance of the classifier on a test set. Read the sample file CreditRating_Historical.dat into a table. The predictor data consists of financial ratios and industry sector information for a list of corporate customers. The response 35-2418
variable consists of credit ratings assigned by a rating agency. Preview the first few rows of the data set.

creditrating = readtable("CreditRating_Historical.dat");
head(creditrating)

     ID      WC_TA     RE_TA     EBIT_TA    MVE_BVTD    S_TA     Industry    Rating 
    _____    ______    ______    _______    ________    _____    ________    _______

    62394     0.013     0.104     0.036       0.447     0.142        3       {'BB' }
    48608     0.232     0.335     0.062       1.969     0.281        8       {'A'  }
    42444     0.311     0.367     0.074       1.935     0.366        1       {'A'  }
    48631     0.194     0.263     0.062       1.017     0.228        4       {'BBB'}
    43768     0.121     0.413     0.057       3.647     0.466       12       {'AAA'}
    39255    -0.117    -0.799      0.01       0.179     0.082        4       {'CCC'}
    62236     0.087     0.158     0.049       0.816     0.324        2       {'BBB'}
    39354     0.005     0.181     0.034       2.597     0.388        7       {'AA' }
Because each value in the ID variable is a unique customer ID, that is, length(unique(creditrating.ID)) is equal to the number of observations in creditrating, the ID variable is a poor predictor. Remove the ID variable from the table, and convert the Industry variable to a categorical variable. creditrating = removevars(creditrating,"ID"); creditrating.Industry = categorical(creditrating.Industry);
Convert the Rating response variable to an ordinal categorical variable. creditrating.Rating = categorical(creditrating.Rating, ... ["AAA","AA","A","BBB","BB","B","CCC"],"Ordinal",true);
Partition the data into training and test sets. Use approximately 80% of the observations to train a neural network model, and 20% of the observations to test the performance of the trained model on new data. Use cvpartition to partition the data. rng("default") % For reproducibility of the partition c = cvpartition(creditrating.Rating,"Holdout",0.20); trainingIndices = training(c); % Indices for the training set testIndices = test(c); % Indices for the test set creditTrain = creditrating(trainingIndices,:); creditTest = creditrating(testIndices,:);
Train a neural network classifier by passing the training data creditTrain to the fitcnet function.

Mdl = fitcnet(creditTrain,"Rating")

Mdl = 
  ClassificationNeuralNetwork
             PredictorNames: {'WC_TA'  'RE_TA'  'EBIT_TA'  'MVE_BVTD'  'S_TA'  'Industry'}
               ResponseName: 'Rating'
      CategoricalPredictors: 6
                 ClassNames: [AAA    AA    A    BBB    BB    B    CCC]
             ScoreTransform: 'none'
            NumObservations: 3146
                 LayerSizes: 10
                Activations: 'relu'
      OutputLayerActivation: 'softmax'
                     Solver: 'LBFGS'
            ConvergenceInfo: [1x1 struct]
            TrainingHistory: [1000x7 table]
Mdl is a trained ClassificationNeuralNetwork classifier. You can use dot notation to access the properties of Mdl. For example, you can specify Mdl.TrainingHistory to get more information about the training history of the neural network model. Evaluate the performance of the classifier on the test set by computing the test set classification error. Visualize the results by using a confusion matrix. testAccuracy = 1 - loss(Mdl,creditTest,"Rating", ... "LossFun","classiferror") testAccuracy = 0.8028 confusionchart(creditTest.Rating,predict(Mdl,creditTest))
Specify Neural Network Classifier Architecture Specify the structure of a neural network classifier, including the size of the fully connected layers. Load the ionosphere data set, which includes radar signal data. X contains the predictor data, and Y is the response variable, whose values represent either good ("g") or bad ("b") radar signals. 35-2420
load ionosphere
Separate the data into training data (XTrain and YTrain) and test data (XTest and YTest) by using a stratified holdout partition. Reserve approximately 30% of the observations for testing, and use the rest of the observations for training. rng("default") % For reproducibility of the partition cvp = cvpartition(Y,"Holdout",0.3); XTrain = X(training(cvp),:); YTrain = Y(training(cvp)); XTest = X(test(cvp),:); YTest = Y(test(cvp));
Train a neural network classifier. Specify to have 35 outputs in the first fully connected layer and 20 outputs in the second fully connected layer. By default, both layers use a rectified linear unit (ReLU) activation function. You can change the activation functions for the fully connected layers by using the Activations name-value argument.

Mdl = fitcnet(XTrain,YTrain, ...
    "LayerSizes",[35 20])

Mdl = 
  ClassificationNeuralNetwork
               ResponseName: 'Y'
      CategoricalPredictors: []
                 ClassNames: {'b'  'g'}
             ScoreTransform: 'none'
            NumObservations: 246
                 LayerSizes: [35 20]
                Activations: 'relu'
      OutputLayerActivation: 'softmax'
                     Solver: 'LBFGS'
            ConvergenceInfo: [1x1 struct]
            TrainingHistory: [47x7 table]
Access the weights and biases for the fully connected layers of the trained classifier by using the LayerWeights and LayerBiases properties of Mdl. The first two elements of each property correspond to the values for the first two fully connected layers, and the third element corresponds to the values for the final fully connected layer with a softmax activation function for classification. For example, display the weights and biases for the second fully connected layer.

Mdl.LayerWeights{2}

ans = 20×35

    0.0481    0.2501   -0.1535   -0.0934    0.0760   -0.0579   -0.2465    1.0411    0.3712   ⋯
   -0.9489   -1.8343    0.5510   -0.5751   -0.8726    0.8815    0.0203   -1.6379    2.0315   ⋯
   -0.1910    0.0246   -0.3511    0.0097    0.3160   -0.0693    0.2270   -0.0783   -0.1626   ⋯
   -0.0415   -0.0059   -0.0753   -0.1477   -0.1621   -0.1762    0.2164    0.1710   -0.0610   ⋯
    1.1848    1.6142   -0.1352    0.5774    0.5491    0.0103    0.0209    0.7219   -0.8643   ⋯
    0.2486   -0.2920   -0.0004    0.2806    0.2987   -0.2709    0.1473   -0.2580   -0.0499   ⋯
   -0.0516    0.0640    0.1824   -0.0675   -0.2065   -0.0052   -0.1682   -0.1520    0.0060   ⋯
   -0.6192   -0.7804   -0.0506   -0.4205   -0.2584   -0.2020   -0.0008    0.0534    1.0185   ⋯
    0.5049   -0.1362   -0.2218    0.1637   -0.1282   -0.1008    0.1445    0.4527   -0.4887   ⋯
    1.1109   -0.0466    0.4044    0.6366    0.1863    0.5660    0.2839    0.8793   -0.5497   ⋯
      ⋮

Mdl.LayerBiases{2}

ans = 20×1

    0.6147
    0.1891
   -0.2767
   -0.2977
    1.3655
    0.0347
    0.1509
   -0.4839
   -0.3960
    0.9248
      ⋮
The final fully connected layer has two outputs, one for each class in the response variable. The number of layer outputs corresponds to the first dimension of the layer weights and layer biases.

size(Mdl.LayerWeights{end})

ans = 1×2

     2    20

size(Mdl.LayerBiases{end})

ans = 1×2

     2     1
To estimate the performance of the trained classifier, compute the test set classification error for Mdl. testError = loss(Mdl,XTest,YTest, ... "LossFun","classiferror") testError = 0.0774 accuracy = 1 - testError accuracy = 0.9226
Mdl accurately classifies approximately 92% of the observations in the test set.
Stop Neural Network Training Early Using Validation Data At each iteration of the training process, compute the validation loss of the neural network. Stop the training process early if the validation loss reaches a reasonable minimum. 35-2422
Load the patients data set. Create a table from the data set. Each row corresponds to one patient, and each column corresponds to a diagnostic variable. Use the Smoker variable as the response variable, and the rest of the variables as predictors. load patients tbl = table(Diastolic,Systolic,Gender,Height,Weight,Age,Smoker);
Separate the data into a training set tblTrain and a validation set tblValidation by using a stratified holdout partition. The software reserves approximately 30% of the observations for the validation data set and uses the rest of the observations for the training data set. rng("default") % For reproducibility of the partition c = cvpartition(tbl.Smoker,"Holdout",0.30); trainingIndices = training(c); validationIndices = test(c); tblTrain = tbl(trainingIndices,:); tblValidation = tbl(validationIndices,:);
Train a neural network classifier by using the training set. Specify the Smoker column of tblTrain as the response variable. Evaluate the model at each iteration by using the validation set. Specify to display the training information at each iteration by using the Verbose name-value argument. By default, the training process ends early if the validation cross-entropy loss is greater than or equal to the minimum validation cross-entropy loss computed so far, six times in a row. To change the number of times the validation loss is allowed to be greater than or equal to the minimum, specify the ValidationPatience name-value argument. Mdl = fitcnet(tblTrain,"Smoker", ... "ValidationData",tblValidation, ... "Verbose",1); |==========================================================================================| | Iteration | Train Loss | Gradient | Step | Iteration | Validation | Validation | | | | | | Time (sec) | Loss | Checks | |==========================================================================================| | 1| 2.602935| 26.866935| 0.262009| 0.054190| 2.793048| 0| | 2| 1.470816| 42.594723| 0.058323| 0.008893| 1.247046| 0| | 3| 1.299292| 25.854432| 0.034910| 0.005533| 1.507857| 1| | 4| 0.710465| 11.629107| 0.013616| 0.006842| 0.889157| 0| | 5| 0.647783| 2.561740| 0.005753| 0.015240| 0.766728| 0| | 6| 0.645541| 0.681579| 0.001000| 0.001774| 0.776072| 1| | 7| 0.639611| 1.544692| 0.007013| 0.003638| 0.776320| 2| | 8| 0.604189| 5.045676| 0.064190| 0.001179| 0.744919| 0| | 9| 0.565364| 5.851552| 0.068845| 0.000733| 0.694226| 0| | 10| 0.391994| 8.377717| 0.560480| 0.000870| 0.425466| 0| |==========================================================================================| | Iteration | Train Loss | Gradient | Step | Iteration | Validation | Validation | | | | | | Time (sec) | Loss | Checks | |==========================================================================================| | 11| 0.383843| 0.630246| 0.110270| 0.002094| 0.428487| 1| | 12| 0.369289| 2.404750| 0.084395| 0.001618| 0.405728| 0| | 13| 0.357839| 6.220679| 0.199197| 0.001485| 0.378480| 0| | 14| 0.344974| 2.752717| 0.029013| 0.001470| 0.367279| 0| | 15| 0.333747| 0.711398| 0.074513| 0.002677| 0.348499| 0| | 16| 0.327763| 0.804818| 0.122178| 0.000879| 0.330237| 0| | 17| 0.327702| 0.778169| 0.009810| 0.000883| 0.329095| 0| | 18| 0.327277| 0.020615| 0.004377| 0.000747| 0.329141| 1| | 19| 0.327273| 0.010018| 0.003313| 0.000486| 0.328773| 0|
| 20| 0.327268| 0.019497| 0.000805| 0.001315| 0.328831| 1| |==========================================================================================| | Iteration | Train Loss | Gradient | Step | Iteration | Validation | Validation | | | | | | Time (sec) | Loss | Checks | |==========================================================================================| | 21| 0.327228| 0.113983| 0.005397| 0.000617| 0.329085| 2| | 22| 0.327138| 0.240166| 0.012159| 0.000712| 0.329406| 3| | 23| 0.326865| 0.428912| 0.036841| 0.000624| 0.329952| 4| | 24| 0.325797| 0.255227| 0.139585| 0.000685| 0.331246| 5| | 25| 0.325181| 0.758050| 0.135868| 0.001811| 0.332035| 6| |==========================================================================================|
Create a plot that compares the training cross-entropy loss and the validation cross-entropy loss at each iteration. By default, fitcnet stores the loss information inside the TrainingHistory property of the object Mdl. You can access this information by using dot notation. iteration = Mdl.TrainingHistory.Iteration; trainLosses = Mdl.TrainingHistory.TrainingLoss; valLosses = Mdl.TrainingHistory.ValidationLoss; plot(iteration,trainLosses,iteration,valLosses) legend(["Training","Validation"]) xlabel("Iteration") ylabel("Cross-Entropy Loss")
Check the iteration that corresponds to the minimum validation loss. The final returned model Mdl is the model trained at this iteration. 35-2424
[~,minIdx] = min(valLosses); iteration(minIdx) ans = 19
Find Good Regularization Strength for Neural Network Using Cross-Validation
Assess the cross-validation loss of neural network models with different regularization strengths, and choose the regularization strength corresponding to the best performing model.
Read the sample file CreditRating_Historical.dat into a table. The predictor data consists of financial ratios and industry sector information for a list of corporate customers. The response variable consists of credit ratings assigned by a rating agency. Preview the first few rows of the data set.

creditrating = readtable("CreditRating_Historical.dat");
head(creditrating)

     ID      WC_TA     RE_TA     EBIT_TA    MVE_BVTD    S_TA     Industry    Rating 
    _____    ______    ______    _______    ________    _____    ________    _______

    62394     0.013     0.104     0.036       0.447     0.142        3       {'BB' }
    48608     0.232     0.335     0.062       1.969     0.281        8       {'A'  }
    42444     0.311     0.367     0.074       1.935     0.366        1       {'A'  }
    48631     0.194     0.263     0.062       1.017     0.228        4       {'BBB'}
    43768     0.121     0.413     0.057       3.647     0.466       12       {'AAA'}
    39255    -0.117    -0.799      0.01       0.179     0.082        4       {'CCC'}
    62236     0.087     0.158     0.049       0.816     0.324        2       {'BBB'}
    39354     0.005     0.181     0.034       2.597     0.388        7       {'AA' }
Because each value in the ID variable is a unique customer ID, that is, length(unique(creditrating.ID)) is equal to the number of observations in creditrating, the ID variable is a poor predictor. Remove the ID variable from the table, and convert the Industry variable to a categorical variable. creditrating = removevars(creditrating,"ID"); creditrating.Industry = categorical(creditrating.Industry);
Convert the Rating response variable to an ordinal categorical variable. creditrating.Rating = categorical(creditrating.Rating, ... ["AAA","AA","A","BBB","BB","B","CCC"],"Ordinal",true);
Create a cvpartition object for stratified 5-fold cross-validation. cvp partitions the data into five folds, where each fold has roughly the same proportions of different credit ratings. Set the random seed to the default value for reproducibility of the partition. rng("default") cvp = cvpartition(creditrating.Rating,"KFold",5);
Compute the cross-validation classification error for neural network classifiers with different regularization strengths. Try regularization strengths on the order of 1/n, where n is the number of observations. Specify to standardize the data before training the neural network models. 1/size(creditrating,1)
ans = 2.5432e-04 lambda = (0:0.5:5)*1e-4; cvloss = zeros(length(lambda),1); for i = 1:length(lambda) cvMdl = fitcnet(creditrating,"Rating","Lambda",lambda(i), ... "CVPartition",cvp,"Standardize",true); cvloss(i) = kfoldLoss(cvMdl,"LossFun","classiferror"); end
Plot the results. Find the regularization strength corresponding to the lowest cross-validation classification error. plot(lambda,cvloss) xlabel("Regularization Strength") ylabel("Cross-Validation Loss")
[~,idx] = min(cvloss); bestLambda = lambda(idx) bestLambda = 5.0000e-05
Train a neural network classifier using the bestLambda regularization strength. Mdl = fitcnet(creditrating,"Rating","Lambda",bestLambda, ... "Standardize",true)
Mdl = 
  ClassificationNeuralNetwork
             PredictorNames: {'WC_TA'  'RE_TA'  'EBIT_TA'  'MVE_BVTD'  'S_TA'  'Industry'}
               ResponseName: 'Rating'
      CategoricalPredictors: 6
                 ClassNames: [AAA    AA    A    BBB    BB    B    CCC]
             ScoreTransform: 'none'
            NumObservations: 3932
                 LayerSizes: 10
                Activations: 'relu'
      OutputLayerActivation: 'softmax'
                     Solver: 'LBFGS'
            ConvergenceInfo: [1×1 struct]
            TrainingHistory: [1000×7 table]
Properties, Methods
Improve Neural Network Classifier Using OptimizeHyperparameters
Train a neural network classifier using the OptimizeHyperparameters argument to improve the resulting classifier. Using this argument causes fitcnet to minimize cross-validation loss over some problem hyperparameters using Bayesian optimization.
Read the sample file CreditRating_Historical.dat into a table. The predictor data consists of financial ratios and industry sector information for a list of corporate customers. The response variable consists of credit ratings assigned by a rating agency. Preview the first few rows of the data set.

creditrating = readtable("CreditRating_Historical.dat");
head(creditrating)

     ID      WC_TA     RE_TA     EBIT_TA    MVE_BVTD    S_TA     Industry    Rating 
    _____    ______    ______    _______    ________    _____    ________    _______

    62394     0.013     0.104     0.036       0.447     0.142        3       {'BB' }
    48608     0.232     0.335     0.062       1.969     0.281        8       {'A'  }
    42444     0.311     0.367     0.074       1.935     0.366        1       {'A'  }
    48631     0.194     0.263     0.062       1.017     0.228        4       {'BBB'}
    43768     0.121     0.413     0.057       3.647     0.466       12       {'AAA'}
    39255    -0.117    -0.799      0.01       0.179     0.082        4       {'CCC'}
    62236     0.087     0.158     0.049       0.816     0.324        2       {'BBB'}
    39354     0.005     0.181     0.034       2.597     0.388        7       {'AA' }
Because each value in the ID variable is a unique customer ID, that is, length(unique(creditrating.ID)) is equal to the number of observations in creditrating, the ID variable is a poor predictor. Remove the ID variable from the table, and convert the Industry variable to a categorical variable. creditrating = removevars(creditrating,"ID"); creditrating.Industry = categorical(creditrating.Industry);
Convert the Rating response variable to an ordinal categorical variable. 35-2427
creditrating.Rating = categorical(creditrating.Rating, ... ["AAA","AA","A","BBB","BB","B","CCC"],"Ordinal",true);
Partition the data into training and test sets. Use approximately 80% of the observations to train a neural network model, and 20% of the observations to test the performance of the trained model on new data. Use cvpartition to partition the data. rng("default") % For reproducibility of the partition c = cvpartition(creditrating.Rating,"Holdout",0.20); trainingIndices = training(c); % Indices for the training set testIndices = test(c); % Indices for the test set creditTrain = creditrating(trainingIndices,:); creditTest = creditrating(testIndices,:);
Train a neural network classifier by passing the training data creditTrain to the fitcnet function, and include the OptimizeHyperparameters argument. For reproducibility, set the AcquisitionFunctionName to "expected-improvement-plus" in a HyperparameterOptimizationOptions structure. To attempt to get a better solution, set the number of optimization steps to 100 instead of the default 30. fitcnet performs Bayesian optimization by default. To use grid search or random search, set the Optimizer field in HyperparameterOptimizationOptions. rng("default") % For reproducibility Mdl = fitcnet(creditTrain,"Rating","OptimizeHyperparameters","auto", ... "HyperparameterOptimizationOptions", ... struct("AcquisitionFunctionName","expected-improvement-plus", ... "MaxObjectiveEvaluations",100))
[Iterative optimization display: 100 evaluations, showing for each iteration the evaluation result, observed and estimated objective values, runtime, and the hyperparameter values tried. The best observed objective value of 0.21011 occurs at evaluation 96.]

__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 100 reached.
Total function evaluations: 100
Total elapsed time: 2835.9139 seconds
Total objective function evaluation time: 2796.6941

Best observed feasible point:
    Activations    Standardize      Lambda        LayerSizes
    ___________    ___________    __________    _______________
       tanh           true        4.4224e-08     1    73    107

Observed objective function value = 0.21011
Estimated objective function value = 0.2131
Function evaluation time = 44.631

Best estimated feasible point (according to models):
    Activations    Standardize      Lambda        LayerSizes
    ___________    ___________    __________    _______________
       tanh           true        3.8729e-09     1    11    235

Estimated objective function value = 0.20832
Estimated function evaluation time = 39.7382
Mdl = 
  ClassificationNeuralNetwork
                       PredictorNames: {'WC_TA'  'RE_TA'  'EBIT_TA'  'MVE_BVTD'  'S_TA'  'Industry'}
                         ResponseName: 'Rating'
                CategoricalPredictors: 6
                           ClassNames: [AAA    AA    A    BBB    BB    B    CCC]
                       ScoreTransform: 'none'
                      NumObservations: 3146
    HyperparameterOptimizationResults: [1×1 BayesianOptimization]
                           LayerSizes: [1 11 235]
                          Activations: 'tanh'
                OutputLayerActivation: 'softmax'
                               Solver: 'LBFGS'
                      ConvergenceInfo: [1×1 struct]
                      TrainingHistory: [1000×7 table]

  Properties, Methods
Mdl is a trained ClassificationNeuralNetwork classifier. The model corresponds to the best estimated feasible point, as opposed to the best observed feasible point. (For details on this distinction, see bestPoint.) You can use dot notation to access the properties of Mdl. For example, you can specify Mdl.HyperparameterOptimizationResults to get more information about the optimization of the neural network model.
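For instance, the following short sketch (not part of the original example) shows one way to query the stored BayesianOptimization object; bestPoint and ObjectiveMinimumTrace are standard members of that object.

% Sketch: inspect the optimization results stored in the trained model.
results = Mdl.HyperparameterOptimizationResults;   % BayesianOptimization object
bestPoint(results)                  % hyperparameter values of the best estimated feasible point
min(results.ObjectiveMinimumTrace)  % best observed cross-validation loss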
Find the classification accuracy of the model on the test data set. Visualize the results by using a confusion matrix.

modelAccuracy = 1 - loss(Mdl,creditTest,"Rating", ...
    "LossFun","classiferror")

modelAccuracy = 0.8002

confusionchart(creditTest.Rating,predict(Mdl,creditTest))
All of the model's predicted classes are within one rating of the true classes; that is, no prediction is off by more than one rating grade.
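One way to verify this claim, sketched below under the assumption that Rating is the ordinal categorical variable created earlier, is to compare the ordinal category indices of the true and predicted classes.

% Sketch (not part of the original example): confirm that no prediction
% is more than one rating grade away from the true class.
predictedRating = predict(Mdl,creditTest);
trueIdx = double(creditTest.Rating);   % ordinal categories map to indices 1 through 7
predIdx = double(predictedRating);
max(abs(trueIdx - predIdx))            % expect a value of at most 1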
Customize Neural Network Classifier Optimization

Train a neural network classifier using the OptimizeHyperparameters argument to improve the resulting classification accuracy. Use the hyperparameters function to specify larger-than-default values for the number of layers used and the layer size range.

Read the sample file CreditRating_Historical.dat into a table. The predictor data consists of financial ratios and industry sector information for a list of corporate customers. The response variable consists of credit ratings assigned by a rating agency.

creditrating = readtable("CreditRating_Historical.dat");
Because each value in the ID variable is a unique customer ID, that is, length(unique(creditrating.ID)) is equal to the number of observations in creditrating, the ID variable is a poor predictor. Remove the ID variable from the table, and convert the Industry variable to a categorical variable.

creditrating = removevars(creditrating,"ID");
creditrating.Industry = categorical(creditrating.Industry);
Convert the Rating response variable to an ordinal categorical variable.

creditrating.Rating = categorical(creditrating.Rating, ...
    ["AAA","AA","A","BBB","BB","B","CCC"],"Ordinal",true);
Partition the data into training and test sets. Use approximately 80% of the observations to train a neural network model, and 20% of the observations to test the performance of the trained model on new data. Use cvpartition to partition the data.

rng("default") % For reproducibility of the partition
c = cvpartition(creditrating.Rating,"Holdout",0.20);
trainingIndices = training(c); % Indices for the training set
testIndices = test(c); % Indices for the test set
creditTrain = creditrating(trainingIndices,:);
creditTest = creditrating(testIndices,:);
List the hyperparameters available for this problem of fitting the Rating response.

params = hyperparameters("fitcnet",creditTrain,"Rating");
for ii = 1:length(params)
    disp(ii);disp(params(ii))
end

     1
  optimizableVariable with properties:
         Name: 'NumLayers'
        Range: [1 3]
         Type: 'integer'
    Transform: 'none'
     Optimize: 1

     2
  optimizableVariable with properties:
         Name: 'Activations'
        Range: {'relu'  'tanh'  'sigmoid'  'none'}
         Type: 'categorical'
    Transform: 'none'
     Optimize: 1

     3
  optimizableVariable with properties:
         Name: 'Standardize'
        Range: {'true'  'false'}
         Type: 'categorical'
    Transform: 'none'
     Optimize: 1

     4
  optimizableVariable with properties:
         Name: 'Lambda'
        Range: [3.1786e-09 31.7864]
         Type: 'real'
    Transform: 'log'
     Optimize: 1

     5
  optimizableVariable with properties:
         Name: 'LayerWeightsInitializer'
        Range: {'glorot'  'he'}
         Type: 'categorical'
    Transform: 'none'
     Optimize: 0

     6
  optimizableVariable with properties:
         Name: 'LayerBiasesInitializer'
        Range: {'zeros'  'ones'}
         Type: 'categorical'
    Transform: 'none'
     Optimize: 0

     7
  optimizableVariable with properties:
         Name: 'Layer_1_Size'
        Range: [1 300]
         Type: 'integer'
    Transform: 'log'
     Optimize: 1

     8
  optimizableVariable with properties:
         Name: 'Layer_2_Size'
        Range: [1 300]
         Type: 'integer'
    Transform: 'log'
     Optimize: 1

     9
  optimizableVariable with properties:
         Name: 'Layer_3_Size'
        Range: [1 300]
         Type: 'integer'
    Transform: 'log'
     Optimize: 1

    10
  optimizableVariable with properties:
         Name: 'Layer_4_Size'
        Range: [1 300]
         Type: 'integer'
    Transform: 'log'
     Optimize: 0

    11
  optimizableVariable with properties:
         Name: 'Layer_5_Size'
        Range: [1 300]
         Type: 'integer'
    Transform: 'log'
     Optimize: 0
To try more layers than the default of 1 through 3, set the range of NumLayers (optimizable variable 1) to its maximum allowable size, [1 5]. Also, set Layer_4_Size and Layer_5_Size (optimizable variables 10 and 11, respectively) to be optimized.

params(1).Range = [1 5];
params(10).Optimize = true;
params(11).Optimize = true;
Set the range of all layer sizes (optimizable variables 7 through 11) to [1 400] instead of the default [1 300].

for ii = 7:11
    params(ii).Range = [1 400];
end
Train a neural network classifier by passing the training data creditTrain to the fitcnet function, and include the OptimizeHyperparameters argument set to params. For reproducibility, set the AcquisitionFunctionName to "expected-improvement-plus" in a HyperparameterOptimizationOptions structure. To attempt to get a better solution, set the number of optimization steps to 100 instead of the default 30.

rng("default") % For reproducibility
Mdl = fitcnet(creditTrain,"Rating","OptimizeHyperparameters",params, ...
    "HyperparameterOptimizationOptions", ...
    struct("AcquisitionFunctionName","expected-improvement-plus", ...
    "MaxObjectiveEvaluations",100))
[Iterative optimization display: 100 evaluations, showing for each iteration the evaluation result, observed and estimated objective values, runtime, and the hyperparameter values tried. The best observed objective value of 0.21265 occurs at evaluation 46.]

__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 100 reached.
Total function evaluations: 100
Total elapsed time: 4686.8262 seconds
Total objective function evaluation time: 4641.9874

Best observed feasible point:
    Activations    Standardize      Lambda             LayerSizes
    ___________    ___________    __________    ______________________________
       none           true        3.2216e-06     11    4    315    183    1

Observed objective function value = 0.21265
Estimated objective function value = 0.21273
Function evaluation time = 90.5222

Best estimated feasible point (according to models):
    Activations    Standardize      Lambda             LayerSizes
    ___________    ___________    __________    ______________________________
       none           true        3.2216e-06     11    4    315    183    1

Estimated objective function value = 0.21273
Estimated function evaluation time = 82.3209

Mdl = 
  ClassificationNeuralNetwork
                       PredictorNames: {'WC_TA'  'RE_TA'  'EBIT_TA'  'MVE_BVTD'  'S_TA'  'Industry'}
                         ResponseName: 'Rating'
                CategoricalPredictors: 6
                           ClassNames: [AAA    AA    A    BBB    BB    B    CCC]
                       ScoreTransform: 'none'
                      NumObservations: 3146
    HyperparameterOptimizationResults: [1×1 BayesianOptimization]
                           LayerSizes: [11 4 315 183 1]
                          Activations: 'none'
                OutputLayerActivation: 'softmax'
                               Solver: 'LBFGS'
                      ConvergenceInfo: [1×1 struct]
                      TrainingHistory: [1000×7 table]

  Properties, Methods
Find the classification accuracy of the model on the test data set. Visualize the results by using a confusion matrix.

testAccuracy = 1 - loss(Mdl,creditTest,"Rating", ...
    "LossFun","classiferror")

testAccuracy = 0.8002

confusionchart(creditTest.Rating,predict(Mdl,creditTest))
All of the model's predicted classes are within one rating of the true classes; that is, no prediction is off by more than one rating grade.
Input Arguments

Tbl — Sample data
table

Sample data used to train the model, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Optionally, Tbl can contain one additional column for the response variable. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.

• If Tbl contains the response variable, and you want to use all remaining variables in Tbl as predictors, then specify the response variable by using ResponseVarName.
• If Tbl contains the response variable, and you want to use only a subset of the remaining variables in Tbl as predictors, then specify a formula by using formula.
• If Tbl does not contain the response variable, then specify a response variable by using Y. The length of the response variable and the number of rows in Tbl must be equal.

ResponseVarName — Response variable name
name of variable in Tbl

Response variable name, specified as the name of a variable in Tbl. You must specify ResponseVarName as a character vector or string scalar. For example, if the response variable Y is stored as Tbl.Y, then specify it as "Y". Otherwise, the software treats all columns of Tbl, including Y, as predictors when training the model.

The response variable must be a categorical, character, or string array; a logical or numeric vector; or a cell array of character vectors. If Y is a character array, then each element of the response variable must correspond to one row of the array.

A good practice is to specify the order of the classes by using the ClassNames name-value argument.

Data Types: char | string

formula — Explanatory model of response variable and subset of predictor variables
character vector | string scalar

Explanatory model of the response variable and a subset of the predictor variables, specified as a character vector or string scalar in the form "Y~x1+x2+x3". In this form, Y represents the response variable, and x1, x2, and x3 represent the predictor variables.

To specify a subset of variables in Tbl as predictors for training the model, use a formula. If you specify a formula, then the software does not use any variables in Tbl that do not appear in formula.

The variable names in the formula must be both variable names in Tbl (Tbl.Properties.VariableNames) and valid MATLAB identifiers. You can verify the variable names in Tbl by using the isvarname function. If the variable names are not valid, then you can convert them by using the matlab.lang.makeValidName function.

Data Types: char | string
Y — Class labels
numeric vector | categorical vector | logical vector | character array | string array | cell array of character vectors

Class labels used to train the model, specified as a numeric, categorical, or logical vector; a character or string array; or a cell array of character vectors.

• If Y is a character array, then each element of the class labels must correspond to one row of the array.
• The length of Y must be equal to the number of rows in Tbl or X.
• A good practice is to specify the class order by using the ClassNames name-value argument.

Data Types: single | double | categorical | logical | char | string | cell

X — Predictor data
numeric matrix

Predictor data used to train the model, specified as a numeric matrix.

By default, the software treats each row of X as one observation, and each column as one predictor. The length of Y and the number of observations in X must be equal.

To specify the names of the predictors in the order of their appearance in X, use the PredictorNames name-value argument. (A brief usage sketch of the matrix syntax appears after the notes that follow.)

Note If you orient your predictor matrix so that observations correspond to columns and specify 'ObservationsIn','columns', then you might experience a significant reduction in computation time.

Data Types: single | double

Note The software treats NaN, empty character vector (''), empty string (""), <missing>, and <undefined> elements as missing values, and removes observations with any of these characteristics:

• Missing value in the response variable (for example, Y or ValidationData{2})
• At least one missing value in a predictor observation (for example, row in X or ValidationData{1})
• NaN value or 0 weight (for example, value in Weights or ValidationData{3})
• Class label with 0 prior probability (value in Prior)
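As a minimal sketch of the matrix syntax (using the fisheriris sample data set, which is an assumption here and not part of the credit rating example), you can pass X and Y directly:

% Sketch: train a neural network classifier from a numeric matrix and a label array.
load fisheriris                  % meas is a 150-by-4 numeric matrix, species is a cell array of labels
MdlIris = fitcnet(meas,species)  % rows of meas are observations, columns are predictors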
Name-Value Pair Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: fitcnet(X,Y,'LayerSizes',[10 10],'Activations',["relu","tanh"]) specifies to create a neural network with two fully connected layers, each with 10 outputs. The first layer uses a rectified linear unit (ReLU) activation function, and the second uses a hyperbolic tangent activation function.

Neural Network Options
LayerSizes — Sizes of fully connected layers
10 (default) | positive integer vector

Sizes of the fully connected layers in the neural network model, specified as a positive integer vector. The ith element of LayerSizes is the number of outputs in the ith fully connected layer of the neural network model.

LayerSizes does not include the size of the final fully connected layer that uses a softmax activation function. For more information, see "Neural Network Structure" on page 35-2456.

Example: 'LayerSizes',[100 25 10]

Activations — Activation functions for fully connected layers
'relu' (default) | 'tanh' | 'sigmoid' | 'none' | string array | cell array of character vectors

Activation functions for the fully connected layers of the neural network model, specified as a character vector, string scalar, string array, or cell array of character vectors with the following values.

• 'relu': Rectified linear unit (ReLU) function — Performs a threshold operation on each element of the input, where any value less than zero is set to zero, that is, f(x) = x for x ≥ 0 and f(x) = 0 for x < 0
• 'tanh': Hyperbolic tangent (tanh) function — Applies the tanh function to each input element
• 'sigmoid': Sigmoid function — Performs the following operation on each input element: f(x) = 1/(1 + e^(−x))
• 'none': Identity function — Returns each input element without performing any transformation, that is, f(x) = x

• If you specify one activation function only, then Activations is the activation function for every fully connected layer of the neural network model, excluding the final fully connected layer. The activation function for the final fully connected layer is always softmax (see "Neural Network Structure" on page 35-2456).
• If you specify an array of activation functions, then the ith element of Activations is the activation function for the ith layer of the neural network model.

Example: 'Activations','sigmoid'
LayerWeightsInitializer — Function to initialize fully connected layer weights
'glorot' (default) | 'he'

Function to initialize the fully connected layer weights, specified as 'glorot' or 'he'.

• 'glorot': Initialize the weights with the Glorot initializer [1] (also known as the Xavier initializer). For each layer, the Glorot initializer independently samples from a uniform distribution with zero mean and variance 2/(I+O), where I is the input size and O is the output size for the layer.
• 'he': Initialize the weights with the He initializer [2]. For each layer, the He initializer samples from a normal distribution with zero mean and variance 2/I, where I is the input size for the layer.

Example: 'LayerWeightsInitializer','he'

LayerBiasesInitializer — Type of initial fully connected layer biases
'zeros' (default) | 'ones'

Type of initial fully connected layer biases, specified as 'zeros' or 'ones'.

• If you specify the value 'zeros', then each fully connected layer has an initial bias of 0.
• If you specify the value 'ones', then each fully connected layer has an initial bias of 1.

Example: 'LayerBiasesInitializer','ones'
Data Types: char | string

ObservationsIn — Predictor data observation dimension
'rows' (default) | 'columns'

Predictor data observation dimension, specified as 'rows' or 'columns'.

Note If you orient your predictor matrix so that observations correspond to columns and specify 'ObservationsIn','columns', then you might experience a significant reduction in computation time. You cannot specify 'ObservationsIn','columns' for predictor data in a table.

Example: 'ObservationsIn','columns'
Data Types: char | string

Lambda — Regularization term strength
0 (default) | nonnegative scalar

Regularization term strength, specified as a nonnegative scalar. The software composes the objective function for minimization from the cross-entropy loss function and the ridge (L2) penalty term.

Example: 'Lambda',1e-4
Data Types: single | double
Standardize — Flag to standardize predictor data
false or 0 (default) | true or 1

Flag to standardize the predictor data, specified as a numeric or logical 0 (false) or 1 (true). If you set Standardize to true, then the software centers and scales each numeric predictor variable by the corresponding column mean and standard deviation. The software does not standardize the categorical predictors.

Example: 'Standardize',true
Data Types: single | double | logical

Convergence Control Options
Verbose — Verbosity level
0 (default) | 1

Verbosity level, specified as 0 or 1. The 'Verbose' name-value argument controls the amount of diagnostic information that fitcnet displays at the command line.

• 0: fitcnet does not display diagnostic information.
• 1: fitcnet periodically displays diagnostic information.

By default, StoreHistory is set to true and fitcnet stores the diagnostic information inside of Mdl. Use Mdl.TrainingHistory to access the diagnostic information.

Example: 'Verbose',1
Data Types: single | double

VerboseFrequency — Frequency of verbose printing
1 (default) | positive integer scalar

Frequency of verbose printing, which is the number of iterations between printing to the command window, specified as a positive integer scalar. A value of 1 indicates to print diagnostic information at every iteration.

Note To use this name-value argument, set Verbose to 1.

Example: 'VerboseFrequency',5
Data Types: single | double

StoreHistory — Flag to store training history
true or 1 (default) | false or 0

Flag to store the training history, specified as a numeric or logical 0 (false) or 1 (true). If StoreHistory is set to true, then the software stores diagnostic information inside of Mdl, which you can access by using Mdl.TrainingHistory.

Example: 'StoreHistory',false
Data Types: single | double | logical
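For example, after training a model such as Mdl from the examples above, a quick sketch of inspecting the stored history might look like this (head is used only for brevity):

% Sketch: view the first few rows of the stored per-iteration training diagnostics.
head(Mdl.TrainingHistory)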
InitialStepSize — Initial step size
[] (default) | positive scalar | 'auto'

Initial step size, specified as a positive scalar or 'auto'. By default, fitcnet does not use the initial step size to determine the initial Hessian approximation used in training the model (see "Training Solver" on page 35-2458). However, if you specify an initial step size s0, then the initial inverse-Hessian approximation is (‖s0‖∞ / ‖∇ℒ0‖∞) I, where ∇ℒ0 is the initial gradient vector and I is the identity matrix.

To have fitcnet determine an initial step size automatically, specify the value as 'auto'. In this case, the function determines the initial step size by using ‖s0‖∞ = 0.5‖η0‖∞ + 0.1, where s0 is the initial step vector and η0 is the vector of unconstrained initial weights and biases.

Example: 'InitialStepSize','auto'
Data Types: single | double | char | string

IterationLimit — Maximum number of training iterations
1e3 (default) | positive integer scalar

Maximum number of training iterations, specified as a positive integer scalar.

The software returns a trained model regardless of whether the training routine successfully converges. Mdl.ConvergenceInfo contains convergence information.

Example: 'IterationLimit',1e8
Data Types: single | double

GradientTolerance — Relative gradient tolerance
1e-6 (default) | nonnegative scalar

Relative gradient tolerance, specified as a nonnegative scalar.

Let ℒt be the loss function at training iteration t, ∇ℒt be the gradient of the loss function with respect to the weights and biases at iteration t, and ∇ℒ0 be the gradient of the loss function at an initial point. If max|∇ℒt| ≤ a⋅GradientTolerance, where a = max(1, min(|ℒt|), max(|∇ℒ0|)), then the training process terminates.

Example: 'GradientTolerance',1e-5
Data Types: single | double

LossTolerance — Loss tolerance
1e-6 (default) | nonnegative scalar

Loss tolerance, specified as a nonnegative scalar. If the function loss at some iteration is smaller than LossTolerance, then the training process terminates.

Example: 'LossTolerance',1e-8
Data Types: single | double

StepTolerance — Step size tolerance
1e-6 (default) | nonnegative scalar
Step size tolerance, specified as a nonnegative scalar. If the step size at some iteration is smaller than StepTolerance, then the training process terminates.

Example: 'StepTolerance',1e-4
Data Types: single | double

ValidationData — Validation data for training convergence detection
cell array | table

Validation data for training convergence detection, specified as a cell array or table.

During the training process, the software periodically estimates the validation loss by using ValidationData. If the validation loss increases more than ValidationPatience times in a row, then the software terminates the training.

You can specify ValidationData as a table if you use a table Tbl of predictor data that contains the response variable. In this case, ValidationData must contain the same predictors and response contained in Tbl. The software does not apply weights to observations, even if Tbl contains a vector of weights. To specify weights, you must specify ValidationData as a cell array.

If you specify ValidationData as a cell array, then it must have the following format:

• ValidationData{1} must have the same data type and orientation as the predictor data. That is, if you use a predictor matrix X, then ValidationData{1} must be an m-by-p or p-by-m matrix of predictor data that has the same orientation as X. The predictor variables in the training data X and ValidationData{1} must correspond. Similarly, if you use a predictor table Tbl of predictor data, then ValidationData{1} must be a table containing the same predictor variables contained in Tbl. The number of observations in ValidationData{1} and the predictor data can vary.
• ValidationData{2} must match the data type and format of the response variable, either Y or ResponseVarName. If ValidationData{2} is an array of class labels, then it must have the same number of elements as the number of observations in ValidationData{1}. The set of all distinct labels of ValidationData{2} must be a subset of all distinct labels of Y. If ValidationData{1} is a table, then ValidationData{2} can be the name of the response variable in the table. If you want to use the same ResponseVarName or formula, you can specify ValidationData{2} as [].
• Optionally, you can specify ValidationData{3} as an m-dimensional numeric vector of observation weights or the name of a variable in the table ValidationData{1} that contains observation weights. The software normalizes the weights with the validation data so that they sum to 1.

If you specify ValidationData and want to display the validation loss at the command line, set Verbose to 1. A usage sketch appears after the ValidationFrequency description below.

ValidationFrequency — Number of iterations between validation evaluations
1 (default) | positive integer scalar

Number of iterations between validation evaluations, specified as a positive integer scalar. A value of 1 indicates to evaluate validation metrics at every iteration.

Note To use this name-value argument, you must specify ValidationData.
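The following sketch shows one way to supply validation data as a table, assuming the creditTrain table from the examples above; the split fraction and patience value are illustrative, not recommendations.

% Sketch: hold out part of the training set for validation-based stopping.
cv = cvpartition(creditTrain.Rating,"Holdout",0.15);
tblFit = creditTrain(training(cv),:);
tblVal = creditTrain(test(cv),:);      % same predictors and response as tblFit
MdlVal = fitcnet(tblFit,"Rating", ...
    "ValidationData",tblVal, ...
    "ValidationPatience",10,"Verbose",1);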
Example: 'ValidationFrequency',5
Data Types: single | double

ValidationPatience — Stopping condition for validation evaluations
6 (default) | nonnegative integer scalar

Stopping condition for validation evaluations, specified as a nonnegative integer scalar. The training process stops if the validation loss is greater than or equal to the minimum validation loss computed so far, ValidationPatience times in a row. You can check the Mdl.TrainingHistory table to see the running total of times that the validation loss is greater than or equal to the minimum (Validation Checks).

Example: 'ValidationPatience',10
Data Types: single | double

Other Classification Options

CategoricalPredictors — Categorical predictors list
vector of positive integers | logical vector | character matrix | string array | cell array of character vectors | 'all'

Categorical predictors list, specified as one of the following values. The descriptions assume that the predictor data has observations in rows and predictors in columns.

• Vector of positive integers: Each entry in the vector is an index value indicating that the corresponding predictor is categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If fitcnet uses a subset of input variables as predictors, then the function indexes the predictors using only the subset. The CategoricalPredictors values do not count the response variable, observation weights variable, or any other variables that the function does not use.
• Logical vector: A true entry means that the corresponding predictor is categorical. The length of the vector is p.
• Character matrix: Each row of the matrix is the name of a predictor variable. The names must match the entries in PredictorNames. Pad the names with extra blanks so each row of the character matrix has the same length.
• String array or cell array of character vectors: Each element in the array is the name of a predictor variable. The names must match the entries in PredictorNames.
• "all": All predictors are categorical.
By default, if the predictor data is in a table (Tbl), fitcnet assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix (X), fitcnet assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the CategoricalPredictors name-value argument.

For the identified categorical predictors, fitcnet creates dummy variables using two different schemes, depending on whether a categorical variable is unordered or ordered. For an unordered categorical variable, fitcnet creates one dummy variable for each level of the categorical variable. For an ordered categorical variable, fitcnet creates one less dummy variable than the number of categories. For details, see "Automatic Creation of Dummy Variables" on page 2-14.

Example: 'CategoricalPredictors','all'
Data Types: single | double | logical | char | string | cell

ClassNames — Names of classes to use for training
categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors

Names of classes to use for training, specified as a categorical, character, or string array; a logical or numeric vector; or a cell array of character vectors. ClassNames must have the same data type as the response variable in Tbl or Y.

If ClassNames is a character array, then each element must correspond to one row of the array.

Use ClassNames to:

• Specify the order of the classes during training.
• Specify the order of any input or output argument dimension that corresponds to the class order. For example, use ClassNames to specify the order of the dimensions of Cost or the column order of classification scores returned by predict.
• Select a subset of classes for training. For example, suppose that the set of all distinct class names in Y is ["a","b","c"]. To train the model using observations from classes "a" and "c" only, specify "ClassNames",["a","c"].

The default value for ClassNames is the set of all distinct class names in the response variable in Tbl or Y.

Example: "ClassNames",["b","g"]
Data Types: categorical | char | string | logical | single | double | cell

Cost — Misclassification cost
square matrix | structure array

Misclassification cost, specified as a square matrix or structure array.

• If you specify a square matrix Cost and the true class of an observation is i, then Cost(i,j) is the cost of classifying a point into class j. That is, rows correspond to the true classes, and columns correspond to the predicted classes. To specify the class order for the corresponding rows and columns of Cost, also set the ClassNames name-value argument.
• If you specify a structure S, then it must have two fields (see the sketch after this description):
  • S.ClassNames, which contains the class names as a variable of the same data type as Y
  • S.ClassificationCosts, which contains the cost matrix with rows and columns ordered as in S.ClassNames

The default value for Cost is ones(K) – eye(K), where K is the number of distinct classes.

Example: "Cost",[0 1; 2 0]
Data Types: single | double | struct
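As a brief sketch of the structure form (the class names "b" and "g" and the variables X and Y are placeholders, not part of this reference page):

% Sketch: specify misclassification costs as a structure.
S.ClassNames = ["b" "g"];
S.ClassificationCosts = [0 1; 2 0];   % rows are true classes, columns are predicted classes
% Mdl = fitcnet(X,Y,"Cost",S);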
PredictorNames — Predictor variable names
string array of unique names | cell array of unique character vectors

Predictor variable names, specified as a string array of unique names or cell array of unique character vectors. The functionality of 'PredictorNames' depends on the way you supply the training data.

• If you supply X and Y, then you can use 'PredictorNames' to assign names to the predictor variables in X.
  • The order of the names in PredictorNames must correspond to the predictor order in X. Assuming that X has the default orientation, with observations in rows and predictors in columns, PredictorNames{1} is the name of X(:,1), PredictorNames{2} is the name of X(:,2), and so on. Also, size(X,2) and numel(PredictorNames) must be equal.
  • By default, PredictorNames is {'x1','x2',...}.
• If you supply Tbl, then you can use 'PredictorNames' to choose which predictor variables to use in training. That is, fitcnet uses only the predictor variables in PredictorNames and the response variable during training.
  • PredictorNames must be a subset of Tbl.Properties.VariableNames and cannot include the name of the response variable.
  • By default, PredictorNames contains the names of all predictor variables.
• A good practice is to specify the predictors for training using either 'PredictorNames' or formula, but not both.

Example: 'PredictorNames',{'SepalLength','SepalWidth','PetalLength','PetalWidth'}
Data Types: string | cell

Prior — Prior class probabilities
"empirical" (default) | "uniform" | numeric vector | structure array

Prior class probabilities, specified as one of the following values.

• "empirical": The class prior probabilities are the class relative frequencies in Y.
• "uniform": All class prior probabilities are equal to 1/K, where K is the number of classes.
• numeric vector: Each element is a class prior probability. Order the elements according to Mdl.ClassNames or specify the order using the ClassNames name-value argument. The software normalizes the elements to sum to 1.
• structure: A structure S with two fields:
  • S.ClassNames contains the class names as a variable of the same type as Y.
  • S.ClassProbs contains a vector of corresponding prior probabilities. The software normalizes the elements to sum to 1.

Example: "Prior",struct("ClassNames",["b","g"],"ClassProbs",1:2)
Data Types: single | double | char | string | struct
ResponseName — Response variable name
"Y" (default) | character vector | string scalar

Response variable name, specified as a character vector or string scalar.

• If you supply Y, then you can use ResponseName to specify a name for the response variable.
• If you supply ResponseVarName or formula, then you cannot use ResponseName.

Example: "ResponseName","response"
Data Types: char | string

ScoreTransform — Score transformation
"none" (default) | "doublelogit" | "invlogit" | "ismax" | "logit" | function handle | ...

Score transformation, specified as a character vector, string scalar, or function handle.

This list summarizes the available character vectors and string scalars.

• "doublelogit": 1/(1 + e^(–2x))
• "invlogit": log(x / (1 – x))
• "ismax": Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
• "logit": 1/(1 + e^(–x))
• "none" or "identity": x (no transformation)
• "sign": –1 for x < 0; 0 for x = 0; 1 for x > 0
• "symmetric": 2x – 1
• "symmetricismax": Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
• "symmetriclogit": 2/(1 + e^(–x)) – 1
For a MATLAB function or a function you define, use its function handle for the score transform. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).

Example: "ScoreTransform","logit"
Data Types: char | string | function_handle

Weights — Observation weights
nonnegative numeric vector | name of variable in Tbl

Observation weights, specified as a nonnegative numeric vector or the name of a variable in Tbl. The software weights each observation in X or Tbl with the corresponding value in Weights. The length of Weights must equal the number of observations in X or Tbl.

If you specify the input data as a table Tbl, then Weights can be the name of a variable in Tbl that contains a numeric vector. In this case, you must specify Weights as a character vector or string scalar. For example, if the weights vector W is stored as Tbl.W, then specify it as 'W'. Otherwise, the software treats all columns of Tbl, including W, as predictors or the response variable when training the model.

By default, Weights is ones(n,1), where n is the number of observations in X or Tbl.

The software normalizes Weights to sum to the value of the prior probability in the respective class.

Data Types: single | double | char | string

Note You cannot use any cross-validation name-value argument together with the 'OptimizeHyperparameters' name-value argument. You can modify the cross-validation for 'OptimizeHyperparameters' only by using the 'HyperparameterOptimizationOptions' name-value argument.

Cross-Validation Options
CrossVal — Flag to train cross-validated classifier
'off' (default) | 'on'

Flag to train a cross-validated classifier, specified as 'on' or 'off'.

If you specify 'on', then the software trains a cross-validated classifier with 10 folds.

You can override this cross-validation setting using the CVPartition, Holdout, KFold, or Leaveout name-value argument. You can use only one cross-validation name-value argument at a time to create a cross-validated model.

Alternatively, cross-validate later by passing Mdl to crossval.

Example: 'Crossval','on'
Data Types: char | string

CVPartition — Cross-validation partition
[] (default) | cvpartition object

Cross-validation partition, specified as a cvpartition object that specifies the type of cross-validation and the indexing for the training and validation sets.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,KFold=5). Then, you can specify the cross-validation partition by setting CVPartition=cvp.

Holdout — Fraction of data for holdout validation
scalar value in the range (0,1)

Fraction of the data used for holdout validation, specified as a scalar value in the range (0,1). If you specify Holdout=p, then the software completes these steps:

1. Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.
2. Store the compact trained model in the Trained property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: Holdout=0.1
Data Types: double | single

KFold — Number of folds
10 (default) | positive integer value greater than 1

Number of folds to use in the cross-validated model, specified as a positive integer value greater than 1. If you specify KFold=k, then the software completes these steps (see the sketch after the Leaveout description):

1. Randomly partition the data into k sets.
2. For each set, reserve the set as validation data, and train the model using the other k – 1 sets.
3. Store the k compact trained models in a k-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: KFold=5
Data Types: single | double

Leaveout — Leave-one-out cross-validation flag
"off" (default) | "on"

Leave-one-out cross-validation flag, specified as "on" or "off". If you specify Leaveout="on", then for each of the n observations (where n is the number of observations, excluding missing observations, specified in the NumObservations property of the model), the software completes these steps:

1. Reserve the one observation as validation data, and train the model using the other n – 1 observations.
2. Store the n compact trained models in an n-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: Leaveout="on"
Data Types: char | string
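As a short sketch of these options (assuming the creditTrain table from the examples above), you can train a k-fold cross-validated classifier and then estimate its generalization error with kfoldLoss:

% Sketch: 5-fold cross-validated neural network classifier.
CVMdl = fitcnet(creditTrain,"Rating","KFold",5,"Standardize",true);
kfoldLoss(CVMdl)   % average misclassification rate over the five folds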
Hyperparameter Optimization Options
OptimizeHyperparameters — Parameters to optimize
'none' (default) | 'auto' | 'all' | string array or cell array of eligible parameter names | vector of optimizableVariable objects

Parameters to optimize, specified as one of the following:
• 'none' — Do not optimize.
• 'auto' — Use {'Activations','Lambda','LayerSizes','Standardize'}.
• 'all' — Optimize all eligible parameters.
• String array or cell array of eligible parameter names.
• Vector of optimizableVariable objects, typically the output of hyperparameters.

The optimization attempts to minimize the cross-validation loss (error) for fitcnet by varying the parameters. For information about cross-validation loss (although in a different context), see "Classification Loss" on page 35-4305. To control the cross-validation type and other aspects of the optimization, use the HyperparameterOptimizationOptions name-value argument.

Note The values of 'OptimizeHyperparameters' override any values you specify using other name-value arguments. For example, setting 'OptimizeHyperparameters' to 'auto' causes fitcnet to optimize hyperparameters corresponding to the 'auto' option and to ignore any specified values for the hyperparameters.

The eligible parameters for fitcnet are:

• Activations — fitcnet optimizes Activations over the set {'relu','tanh','sigmoid','none'}.
• Lambda — fitcnet optimizes Lambda over continuous values in the range [1e-5,1e5]/NumObservations, where the value is chosen uniformly in the log transformed range.
• LayerBiasesInitializer — fitcnet optimizes LayerBiasesInitializer over the two values {'zeros','ones'}.
• LayerWeightsInitializer — fitcnet optimizes LayerWeightsInitializer over the two values {'glorot','he'}.
• LayerSizes — fitcnet optimizes over the three values 1, 2, and 3 fully connected layers, excluding the final fully connected layer. fitcnet optimizes each fully connected layer separately over 1 through 300 sizes in the layer, sampled on a logarithmic scale.

  Note When you use the LayerSizes argument, the iterative display shows the size of each relevant layer. For example, if the current number of fully connected layers is 3, and the three layers are of sizes 10, 79, and 44 respectively, the iterative display shows LayerSizes for that iteration as [10 79 44].

  Note To access up to five fully connected layers or a different range of sizes in a layer, use hyperparameters to select the optimizable parameters and ranges.

• Standardize — fitcnet optimizes Standardize over the two values {true,false}.

Set nondefault parameters by passing a vector of optimizableVariable objects that have nondefault values. As an example, this code sets the range of NumLayers to [1 5] and optimizes Layer_4_Size and Layer_5_Size:

load fisheriris
params = hyperparameters('fitcnet',meas,species);
params(1).Range = [1 5];
params(10).Optimize = true;
params(11).Optimize = true;
Pass params as the value of OptimizeHyperparameters. For an example using nondefault parameters, see “Customize Neural Network Classifier Optimization” on page 35-2432.
By default, the iterative display appears at the command line, and plots appear according to the number of hyperparameters in the optimization. For the optimization and plots, the objective function is the misclassification rate. To control the iterative display, set the Verbose field of the 'HyperparameterOptimizationOptions' name-value argument. To control the plots, set the ShowPlots field of the 'HyperparameterOptimizationOptions' name-value argument. For an example, see “Improve Neural Network Classifier Using OptimizeHyperparameters” on page 35-2427.
Example: 'OptimizeHyperparameters','auto'
HyperparameterOptimizationOptions — Options for optimization
structure
Options for optimization, specified as a structure. This argument modifies the effect of the OptimizeHyperparameters name-value argument. All fields in the structure are optional.
Optimizer — Optimization method. Default: 'bayesopt'.
• 'bayesopt' — Use Bayesian optimization. Internally, this setting calls bayesopt.
• 'gridsearch' — Use grid search with NumGridDivisions values per dimension.
• 'randomsearch' — Search at random among MaxObjectiveEvaluations points.
'gridsearch' searches in a random order, using uniform sampling without replacement from the grid. After optimization, you can get a table in grid order by using the command sortrows(Mdl.HyperparameterOptimizationResults).
AcquisitionFunctionName — Acquisition function. Default: 'expected-improvement-per-second-plus'. Choices:
• 'expected-improvement-per-second-plus'
• 'expected-improvement'
• 'expected-improvement-plus'
• 'expected-improvement-per-second'
• 'lower-confidence-bound'
• 'probability-of-improvement'
Acquisition functions whose names include per-second do not yield reproducible results because the optimization depends on the runtime of the objective function. Acquisition functions whose names include plus modify their behavior when they are overexploiting an area. For more details, see “Acquisition Function Types” on page 10-3.
MaxObjectiveEvaluations — Maximum number of objective function evaluations. Default: 30 for 'bayesopt' and 'randomsearch', and the entire grid for 'gridsearch'.
MaxTime — Time limit, specified as a positive real scalar. The time limit is in seconds, as measured by tic and toc. The run time can exceed MaxTime because MaxTime does not interrupt function evaluations. Default: Inf.
NumGridDivisions — For 'gridsearch', the number of values in each dimension. The value can be a vector of positive integers giving the number of values for each dimension, or a scalar that applies to all dimensions. This field is ignored for categorical variables. Default: 10.
ShowPlots — Logical value indicating whether to show plots. If true, this field plots the best observed objective function value against the iteration number. If you use Bayesian optimization (Optimizer is 'bayesopt'), then this field also plots the best estimated objective function value. The best observed objective function values and best estimated objective function values correspond to the values in the BestSoFar (observed) and BestSoFar (estim.) columns of the iterative display, respectively. You can find these values in the properties ObjectiveMinimumTrace and EstimatedObjectiveMinimumTrace of Mdl.HyperparameterOptimizationResults. If the problem includes one or two optimization parameters for Bayesian optimization, then ShowPlots also plots a model of the objective function against the parameters. Default: true.
SaveIntermediateResults — Logical value indicating whether to save results when Optimizer is 'bayesopt'. If true, this field overwrites a workspace variable named 'BayesoptResults' at each iteration. The variable is a BayesianOptimization object. Default: false.
Verbose — Display at the command line. Default: 1.
• 0 — No iterative display
• 1 — Iterative display
• 2 — Iterative display with extra information
For details, see the bayesopt Verbose name-value argument and the example “Optimize Classifier Fit Using Bayesian Optimization” on page 10-56.
UseParallel — Logical value indicating whether to run Bayesian optimization in parallel, which requires Parallel Computing Toolbox. Due to the nonreproducibility of parallel timing, parallel Bayesian optimization does not necessarily yield reproducible results. For details, see “Parallel Bayesian Optimization” on page 10-7. Default: false.
Repartition — Logical value indicating whether to repartition the cross-validation at every iteration. If this field is false, the optimizer uses a single partition for the optimization. The setting true usually gives the most robust results because it takes partitioning noise into account. However, for good results, true requires at least twice as many function evaluations. Default: false.
Use no more than one of the following three options. Default: 'Kfold',5 if you do not specify a cross-validation field.
• CVPartition — A cvpartition object, as created by cvpartition
• Holdout — A scalar in the range (0,1) representing the holdout fraction
• Kfold — An integer greater than 1
Example: 'HyperparameterOptimizationOptions',struct('MaxObjectiveEvaluations',60) Data Types: struct
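For instance, a minimal sketch (with illustrative option values, using the fisheriris data set shipped with MATLAB) that combines 'OptimizeHyperparameters' with several of these fields:
% Illustrative sketch: tune the 'auto' hyperparameter set with a reduced
% evaluation budget and 5-fold repartitioned cross-validation.
load fisheriris
rng('default') % For reproducibility of the optimization
Mdl = fitcnet(meas,species, ...
    'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions',struct( ...
    'MaxObjectiveEvaluations',20,'Kfold',5,'Repartition',true, ...
    'ShowPlots',false,'Verbose',0));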
Output Arguments Mdl — Trained neural network classifier ClassificationNeuralNetwork object | ClassificationPartitionedModel object Trained neural network classifier, returned as a ClassificationNeuralNetwork or ClassificationPartitionedModel object. If you set any of the name-value arguments CrossVal, CVPartition, Holdout, KFold, or Leaveout, then Mdl is a ClassificationPartitionedModel object. Otherwise, Mdl is a ClassificationNeuralNetwork model. To reference properties of Mdl, use dot notation.
More About
Neural Network Structure
The default neural network classifier has the following layer structure.
Input — This layer corresponds to the predictor data in Tbl or X.
First fully connected layer — This layer has 10 outputs by default.
• You can widen the layer or add more fully connected layers to the network by specifying the LayerSizes name-value argument.
• You can find the weights and biases for this layer in the Mdl.LayerWeights{1} and Mdl.LayerBiases{1} properties of Mdl, respectively.
ReLU activation function — fitcnet applies this activation function to the first fully connected layer.
• You can change the activation function by specifying the Activations name-value argument.
Final fully connected layer — This layer has K outputs, where K is the number of classes in the response variable.
• You can find the weights and biases for this layer in the Mdl.LayerWeights{end} and Mdl.LayerBiases{end} properties of Mdl, respectively.
Softmax function (for both binary and multiclass classification) — fitcnet applies this activation function to the final fully connected layer. The function takes each input x_i and returns

f(x_i) = exp(x_i) / Σ_{j=1}^{K} exp(x_j),

where K is the number of classes in the response variable.
The results correspond to the predicted classification scores (or posterior probabilities). Output — This layer corresponds to the predicted class labels. For an example that shows how a neural network classifier with this layer structure returns predictions, see “Predict Using Layer Structure of Neural Network Classifier” on page 35-6395.
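To make this structure concrete, the following is a hedged sketch (not the linked example itself) that trains a default network on Fisher's iris data and reproduces the network's scores for one observation by applying the operations above by hand:
% Sketch: reproduce the classification scores for one observation manually,
% assuming the default architecture (one 10-output ReLU layer, no standardization).
load fisheriris
Mdl = fitcnet(meas,species);
x = meas(1,:)';                                        % one observation, as a column vector
z1 = Mdl.LayerWeights{1}*x + Mdl.LayerBiases{1};       % first fully connected layer
a1 = max(z1,0);                                        % ReLU activation
zK = Mdl.LayerWeights{end}*a1 + Mdl.LayerBiases{end};  % final fully connected layer (K outputs)
scores = exp(zK)./sum(exp(zK))                         % softmax: posterior probabilities
[~,predScores] = predict(Mdl,meas(1,:))                % should match scores (up to transpose)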
Tips • Always try to standardize the numeric predictors (see Standardize). Standardization makes predictors insensitive to the scales on which they are measured. • After training a model, you can generate C/C++ code that predicts labels for new data. Generating C/C++ code requires MATLAB Coder. For details, see “Introduction to Code Generation” on page 34-3.
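As a sketch of these tips (the saveLearnerForCoder call and the file name 'fitcnetIrisModel' are an illustration of the code generation workflow linked above, not a prescribed recipe):
% Sketch: standardize the numeric predictors during training, then save the
% trained model for the code generation workflow (requires MATLAB Coder).
load fisheriris
Mdl = fitcnet(meas,species,'Standardize',true);
saveLearnerForCoder(Mdl,'fitcnetIrisModel');  % illustrative file name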
Algorithms
Training Solver
fitcnet uses a limited-memory Broyden-Fletcher-Goldfarb-Shanno quasi-Newton algorithm (LBFGS) [3] as its loss function minimization technique, where the software minimizes the cross-entropy loss. The LBFGS solver uses a standard line-search method with an approximation to the Hessian.
Cost, Prior, and Weights
• If you specify the Cost, Prior, and Weights name-value arguments, the output model object stores the specified values in the Cost, Prior, and W properties, respectively. The Cost property stores the user-specified cost matrix as is. The Prior and W properties store the prior probabilities and observation weights, respectively, after normalization. For details, see “Misclassification Cost Matrix, Prior Probabilities, and Observation Weights” on page 19-8.
• The software uses the Cost property for prediction, but not training. Therefore, Cost is not read-only; you can change the property value by using dot notation after creating the trained model.
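For example, a sketch (with arbitrary cost and prior values) showing how the specified values end up in the trained model:
% Sketch: specify a cost matrix and prior probabilities, then inspect how the
% trained model stores them (Cost as is; Prior and W after normalization).
load fisheriris
Mdl = fitcnet(meas,species, ...
    'Cost',[0 1 1; 2 0 1; 2 1 0], ...   % illustrative 3-class cost matrix
    'Prior',[0.5 0.25 0.25]);           % illustrative prior probabilities
Mdl.Cost    % stored as specified
Mdl.Prior   % normalized prior probabilities
Mdl.W       % observation weights after normalization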
Version History
Introduced in R2021a
R2023a: Neural network classifiers support misclassification costs and prior probabilities
fitcnet supports misclassification costs and prior probabilities for neural network classifiers. Specify the Cost and Prior name-value arguments when you create a model. Alternatively, you can specify misclassification costs after training a model by using dot notation to change the Cost property value of the model.
Mdl.Cost = [0 2; 1 0];
References
[1] Glorot, Xavier, and Yoshua Bengio. “Understanding the difficulty of training deep feedforward neural networks.” In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256. 2010.
[2] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034. 2015.
[3] Nocedal, J. and S. J. Wright. Numerical Optimization, 2nd ed., New York: Springer, 2006.
Extended Capabilities
Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.
To perform parallel hyperparameter optimization, use the 'HyperparameterOptimizationOptions', struct('UseParallel',true) name-value argument in the call to the fitcnet function.
For more information on parallel hyperparameter optimization, see “Parallel Bayesian Optimization” on page 10-7. For general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).
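For example, a minimal sketch (requires Parallel Computing Toolbox; as noted above, parallel results may not be reproducible):
% Sketch: run the hyperparameter search in parallel.
load fisheriris
Mdl = fitcnet(meas,species, ...
    'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions',struct('UseParallel',true));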
See Also ClassificationNeuralNetwork | predict | loss | hyperparameters | margin | edge | ClassificationPartitionedModel | CompactClassificationNeuralNetwork Topics “Assess Neural Network Classifier Performance” on page 19-151
fitcox
Create Cox proportional hazards model

Syntax
coxMdl = fitcox(X,T)
coxMdl = fitcox(X,T,Name,Value)
Description
The fitcox function creates a Cox proportional hazards model for lifetime data. The basic Cox model includes a hazard function h0(t) and model coefficients b such that, for predictor X, the hazard rate at time t is

h(X_i, t) = h0(t) exp( Σ_{j=1}^{p} x_ij b_j ),
where the b coefficients do not depend on time. fitcox infers both the model coefficients b and the hazard rate h0(t), and stores them as properties in the resulting CoxModel object. The full Cox model includes extensions to the basic model, such as hazards with respect to different baselines or the inclusion of stratification variables. See “Extension of Cox Proportional Hazards Model” on page 15-27. coxMdl = fitcox(X,T) returns a Cox proportional hazards model object coxMdl using the predictor values X and event times T. coxMdl = fitcox(X,T,Name,Value) modifies the fit using one or more Name,Value arguments. For example, when the data includes censoring (values that are not observed), the Censoring argument specifies the censored data.
Examples
Estimate Cox Proportional Hazard Regression
Weibull random variables with the same shape parameter have proportional hazard rates; see “Weibull Distribution” on page B-186. The hazard rate with scale parameter a and shape parameter b at time t is

(b / a^b) t^(b−1).

Generate pseudorandom samples from the Weibull distribution with scale parameters 1, 5, and 1/3, and with the same shape parameter B.
rng default % For reproducibility
B = 2;
A = ones(100,1);
data1 = wblrnd(A,B);
A2 = 5*A;
data2 = wblrnd(A2,B);
A3 = A/3;
data3 = wblrnd(A3,B);
Create a table of data. The predictors are the three variable types, 1, 2, or 3.
predictors = categorical([A;2*A;3*A]);
data = table(predictors,[data1;data2;data3],'VariableNames',["Predictors" "Times"]);
Fit a Cox regression to the data.
mdl = fitcox(data,"Times")
mdl =
Cox Proportional Hazards regression model

                     Beta        SE        zStat       pValue
                    _______    _______    _______    __________
    Predictors_2    -3.5834    0.33187    -10.798    3.5299e-27
    Predictors_3     2.1668    0.20802     10.416    2.0899e-25

Log-likelihood: -1197.917

rates = exp(mdl.Coefficients.Beta)
rates = 2×1
    0.0278
    8.7301
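As a rough check (an added illustration, not part of the original output): with a common shape parameter b, a Weibull group with scale parameter a has hazard a^(−b) times the hazard of the baseline group with scale 1, so the expected ratios here are 5^(−2) = 0.04 and (1/3)^(−2) = 9, close to the fitted rates above.
% Illustrative check: theoretical hazard ratios implied by the scale parameters
% used to generate the data (relative to the baseline group with scale 1).
theoreticalRates = [5 1/3].^(-B)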
Fit Cox Proportional Hazards Model to Lifetime Data Perform a Cox proportional hazards regression on the lightbulb data set, which contains simulated lifetimes of light bulbs. The first column of the light bulb data contains the lifetime (in hours) of two different types of bulbs. The second column contains a binary variable indicating whether the bulb is fluorescent or incandescent; 0 indicates the bulb is fluorescent, and 1 indicates it is incandescent. The third column contains the censoring information, where 0 indicates the bulb was observed until failure, and 1 indicates the observation was censored. Load the lightbulb data set. load lightbulb
Fit a Cox proportional hazards model for the lifetime of the light bulbs, accounting for censoring. The predictor variable is the type of bulb.
coxMdl = fitcox(lightbulb(:,2),lightbulb(:,1), ...
    'Censoring',lightbulb(:,3))
coxMdl =
Cox Proportional Hazards regression model

           Beta       SE       zStat      pValue
          ______    ______    ______    __________
    X1    4.7262    1.0372    4.5568    5.1936e-06

Log-likelihood: -212.638
Find the hazard rate of incandescent bulbs compared to fluorescent bulbs by evaluating exp(Beta).
hr = exp(coxMdl.Coefficients.Beta)
hr = 112.8646
The estimate of the hazard ratio is e^Beta = 112.8646, which means that the estimated hazard for the incandescent bulbs is 112.86 times the hazard for the fluorescent bulbs. The small value of coxMdl.Coefficients.pValue indicates there is a negligible chance that the two types of light bulbs have identical hazard rates, which would mean Beta = 0.
Input Arguments
X — Predictor values
matrix | table
Predictor values, specified as a matrix or table.
• A matrix contains one column for each predictor and one row for each observation.
• A table contains one row for each observation. A table can also contain the time data as well as the predictors.
By default, if the predictor data is in a table, fitcox assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix, fitcox assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the CategoricalPredictors name-value argument.
If X, T, the value of 'Frequency', or the value of 'Stratification' contains NaN values, then fitcox removes rows with NaN values from all data when fitting a Cox model.
Data Types: double | table | categorical
T — Event times
real column vector | real matrix with two columns | name of column in table X | formula in Wilkinson notation for table X
Event times, specified as one of the following:
• Real column vector.
• Real matrix with two columns representing the start and stop times.
• Name of a column in the table X.
• Formula in Wilkinson notation for the table X. For example, to specify that the table columns 'x' and 'y' are in the model, use
'T ~ x + y' See “Wilkinson Notation” on page 11-93. For vector or matrix entries, the number of rows of T must be the same as the number of rows of X. Use the two-column form of T to fit a model with time-varying coefficients. See “Cox Proportional Hazards Model with Time-Dependent Covariates” on page 15-35. Data Types: single | double | char | string Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: To fit data with censored values cens, specify 'Censoring',cens. Baseline — X values at which to compute baseline hazard mean(X), the default for continuous predictors | 0, the default for categorical predictors | real scalar | real row vector X values at which to compute the baseline hazard, specified as a real scalar or row vector. If Baseline is a row vector, its length is the number of predictors, so there is one baseline for each predictor. The default baseline for continuous predictors is mean(X), so the default hazard rate at X for these predictors is h(t)*exp((X – mean(X))*b). The default baseline for categorical predictors is 0. Enter 0 to compute the baseline for all predictors relative to 0, so the hazard rate at X is h(t)*exp(X*b). Changing the baseline changes the hazard ratio, but does not affect the coefficient estimates. For the identified categorical predictors, fitcox creates dummy variables. fitcox creates one less dummy variable than the number of categories. For details, see “Automatic Creation of Dummy Variables” on page 2-14. Example: 'Baseline',0 Data Types: double Beta — Coefficient initial values 0.01/std(X) (default) | numeric vector Coefficient initial values, specified as a numeric vector of coefficient values. These values initiate the likelihood maximization iterations performed by fitcox. Data Types: double CategoricalPredictors — Categorical predictors list vector of positive integers | logical vector | character matrix | string array | cell array of character vectors | 'all' Categorical predictors list, specified as one of the values in this table. 35-2463
• Vector of positive integers — Each entry in the vector is an index value corresponding to the column of the predictor data (X) that contains a categorical variable.
• Logical vector — A true entry means that the corresponding column of predictor data (X) is a categorical variable.
• Character matrix — Each row of the matrix is the name of a predictor variable in the table X. The names must match the entries in PredictorNames. Pad the names with extra blanks so each row of the character matrix has the same length.
• String array or cell array of character vectors — Each element in the array is the name of a predictor variable in the table X. The names must match the entries in PredictorNames.
• 'all' — All predictors are categorical.
By default, if the predictor data is in a table, fitcox assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix, fitcox assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the 'CategoricalPredictors' name-value argument. For the identified categorical predictors, fitcox creates dummy variables. fitcox creates one less dummy variable than the number of categories. For details, see “Automatic Creation of Dummy Variables” on page 2-14. Example: 'CategoricalPredictors','all' Data Types: single | double | logical | char | string | cell Censoring — Indicator for censoring array of 0s (default) | array of 0s and 1s | name of a column in table X Indicator for censoring, specified as a Boolean vector with the same number of rows as X or the name of a column in the table X. Use 1 for observations that are right censored and 0 for observations that are fully observed. By default, all observations are fully observed. For an example, see “Cox Proportional Hazards Model for Censored Data” on page 15-31. Example: 'Censoring',cens Data Types: logical Frequency — Frequency or weights of observations array of 1s (default) | vector of nonnegative scalar values Frequency or weights of observations, specified as an array the same size as T containing nonnegative scalar values. The array can contain integer values corresponding to frequencies of observations or nonnegative values corresponding to observation weights. The default is 1 per row of X and T. If X, T, the value of 'Frequency', or the value of 'Stratification' contains NaN values, then fitcox removes rows with NaN values from all data when fitting a Cox model. Example: 'Frequency',w Data Types: double 35-2464
OptimizationOptions — Algorithm control parameters
structure
Algorithm control parameters for the iterative algorithm fitcox uses to estimate the solution, specified as a structure. Create this structure using statset. For parameter names and default values, see the following list or enter statset('fitcox'). In the list, "termination tolerance" means that if the internal iterations cause a change in the stated value less than the tolerance, the iterations stop.
• Display — Amount of information returned to the command line: 'off' (none, default), 'final' (final output), or 'iter' (output at each iteration).
• MaxFunEvals — Maximum number of function evaluations. Positive integer; default is 200.
• MaxIter — Maximum number of iterations. Positive integer; default is 100.
• TolFun — Termination tolerance on change in likelihood; see “Cox Proportional Hazards Model” on page 15-26. Positive scalar; default is 1e-8.
• TolX — Termination tolerance for parameter (predictor estimate) change. Positive scalar; default is 1e-8.
Example: 'OptimizationOptions',statset('TolX',1e-6,'MaxIter',200) PredictorNames — Predictor variable names string array of unique names | cell array of unique character vectors Predictor variable names, specified as a string array of unique names or cell array of unique character vectors. The functionality of 'PredictorNames' depends on how you supply the training data. • If you supply X as a numeric array, then you can use 'PredictorNames' to assign names to the predictor variables in X. • The order of the names in PredictorNames must correspond to the column order of X. That is, PredictorNames{1} is the name of X(:,1), PredictorNames{2} is the name of X(:,2), and so on. Also, size(X,2) and numel(PredictorNames) must be equal. • By default, PredictorNames is {'X1','X2',...}. • If you supply X as a table, then you can use 'PredictorNames' to choose which predictor variables to use in training. That is, fitcox uses only the predictor variables in PredictorNames and the time variable during training. • PredictorNames must be a subset of X.Properties.VariableNames and cannot include the name of the time variable T. • By default, PredictorNames contains the names of all predictor variables. 35-2465
• Specify the predictors for training using either 'PredictorNames' or a formula in Wilkinson notation, but not both. Example: 'PredictorNames',{'Sex','Age','Weight','Smoker'} Data Types: string | cell Stratification — Stratification variables [] (default) | matrix of real values | name of column in table X | array of categorical variables Stratification variables, specified as a matrix of real values, the name of a column in table X, or an array of categorical variables. The matrix must have the same number of rows as T, with each row corresponding to an observation. The default [] is no stratification variable. If X, T, the value of 'Frequency', or the value of 'Stratification' contains NaN values, then fitcox removes rows with NaN values from all data when fitting a Cox model. Example: 'Stratification',Gender Data Types: single | double | char | string | categorical TieBreakMethod — Method to handle tied failure times 'breslow' (default) | 'efron' Method to handle tied failure times, specified as 'breslow' (Breslow's method) or 'efron' (Efron's method). See “Partial Likelihood Function for Tied Events” on page 15-28. Example: 'TieBreakMethod','efron' Data Types: char | string
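For instance, a sketch that combines several of the name-value arguments described above on the lightbulb data (the particular option values are illustrative, not recommendations):
% Sketch: censoring indicator, Efron tie-breaking, and custom optimization options.
load lightbulb
coxMdl = fitcox(lightbulb(:,2),lightbulb(:,1), ...
    'Censoring',lightbulb(:,3), ...
    'TieBreakMethod','efron', ...
    'OptimizationOptions',statset('Display','final','MaxIter',200));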
Version History Introduced in R2021a
See Also CoxModel | hazardratio | survival | plotSurvival | linhyptest | coefci | coxphfit Topics “Cox Proportional Hazards Model Object” on page 15-39 “What Is Survival Analysis?” on page 15-2 “Cox Proportional Hazards Model” on page 15-26 “Cox Proportional Hazards Model for Censored Data” on page 15-31 “Cox Proportional Hazards Model with Time-Dependent Covariates” on page 15-35 “Analyzing Survival or Reliability Data” on page 15-47
fitcsvm
Train support vector machine (SVM) classifier for one-class and binary classification

Syntax
Mdl = fitcsvm(Tbl,ResponseVarName)
Mdl = fitcsvm(Tbl,formula)
Mdl = fitcsvm(Tbl,Y)
Mdl = fitcsvm(X,Y)
Mdl = fitcsvm( ___ ,Name,Value)
Description fitcsvm trains or cross-validates a support vector machine (SVM) model for one-class and two-class (binary) classification on a low-dimensional or moderate-dimensional predictor data set. fitcsvm supports mapping the predictor data using kernel functions, and supports sequential minimal optimization (SMO), iterative single data algorithm (ISDA), or L1 soft-margin minimization via quadratic programming for objective-function minimization. To train a linear SVM model for binary classification on a high-dimensional data set, that is, a data set that includes many predictor variables, use fitclinear instead. For multiclass learning with combined binary SVM models, use error-correcting output codes (ECOC). For more details, see fitcecoc. To train an SVM regression model, see fitrsvm for low-dimensional and moderate-dimensional predictor data sets, or fitrlinear for high-dimensional data sets. Mdl = fitcsvm(Tbl,ResponseVarName) returns a support vector machine (SVM) classifier on page 35-2499 Mdl trained using the sample data contained in the table Tbl. ResponseVarName is the name of the variable in Tbl that contains the class labels for one-class or two-class classification. If the class label variable contains only one class (for example, a vector of ones), fitcsvm trains a model for one-class classification. Otherwise, the function trains a model for two-class classification. Mdl = fitcsvm(Tbl,formula) returns an SVM classifier trained using the sample data contained in the table Tbl. formula is an explanatory model of the response and a subset of the predictor variables in Tbl used to fit Mdl. Mdl = fitcsvm(Tbl,Y) returns an SVM classifier trained using the predictor variables in the table Tbl and the class labels in vector Y. Mdl = fitcsvm(X,Y) returns an SVM classifier trained using the predictors in the matrix X and the class labels in vector Y for one-class or two-class classification. Mdl = fitcsvm( ___ ,Name,Value) specifies options using one or more name-value pair arguments in addition to the input arguments in previous syntaxes. For example, you can specify the type of cross-validation, the cost for misclassification, and the type of score transformation function. 35-2467
Examples
Train SVM Classifier
Load Fisher's iris data set. Remove the sepal lengths and widths and all observed setosa irises.
load fisheriris
inds = ~strcmp(species,'setosa');
X = meas(inds,3:4);
y = species(inds);
Train an SVM classifier using the processed data set.
SVMModel = fitcsvm(X,y)
SVMModel =
  ClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'versicolor'  'virginica'}
           ScoreTransform: 'none'
          NumObservations: 100
                    Alpha: [24x1 double]
                     Bias: -14.4149
         KernelParameters: [1x1 struct]
           BoxConstraints: [100x1 double]
          ConvergenceInfo: [1x1 struct]
          IsSupportVector: [100x1 logical]
                   Solver: 'SMO'
SVMModel is a trained ClassificationSVM classifier. Display the properties of SVMModel. For example, to determine the class order, use dot notation.
classOrder = SVMModel.ClassNames
classOrder = 2x1 cell
    {'versicolor'}
    {'virginica' }
The first class ('versicolor') is the negative class, and the second ('virginica') is the positive class. You can change the class order during training by using the 'ClassNames' name-value pair argument.
Plot a scatter diagram of the data and circle the support vectors.
sv = SVMModel.SupportVectors;
figure
gscatter(X(:,1),X(:,2),y)
hold on
plot(sv(:,1),sv(:,2),'ko','MarkerSize',10)
legend('versicolor','virginica','Support Vector')
hold off
The support vectors are observations that occur on or beyond their estimated class boundaries. You can adjust the boundaries (and, therefore, the number of support vectors) by setting a box constraint during training using the 'BoxConstraint' name-value pair argument.
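For instance, a quick sketch (continuing with the X and y variables from this example; the value 10 is arbitrary): a larger box constraint typically yields fewer support vectors.
% Sketch: retrain with a larger box constraint and compare support-vector counts.
SVMModelBC = fitcsvm(X,y,'BoxConstraint',10);
nSVdefault = sum(SVMModel.IsSupportVector)    % default box constraint (1)
nSVlarger  = sum(SVMModelBC.IsSupportVector)  % typically smaller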
Plot Decision Boundary and Margin Lines for Two-Class SVM Classifier This example shows how to plot the decision boundary and margin lines of a two-class (binary) SVM classifier with two predictor variables. Load Fisher's iris data set. Exclude all the versicolor iris species (leaving only the setosa and virginica species), and keep only the sepal length and width measurements. load fisheriris; inds = ~strcmp(species,'versicolor'); X = meas(inds,1:2); s = species(inds);
Train a linear kernel SVM classifier.
SVMModel = fitcsvm(X,s);
SVMModel is a trained ClassificationSVM classifier, whose properties include the support vectors, linear predictor coefficients, and bias term.
sv = SVMModel.SupportVectors; % Support vectors
beta = SVMModel.Beta;         % Linear predictor coefficients
b = SVMModel.Bias;            % Bias term
Plot a scatter diagram of the data, and circle the support vectors. The support vectors are observations that occur on or beyond their estimated class boundaries.
hold on
gscatter(X(:,1),X(:,2),s)
plot(sv(:,1),sv(:,2),'ko','MarkerSize',10)
The best separating hyperplane for the SVMModel classifier is a straight line specified by β1X1 + β2X2 + b = 0. Plot the decision boundary between the two species as a solid line.
X1 = linspace(min(X(:,1)),max(X(:,1)),100);
X2 = -(beta(1)/beta(2)*X1)-b/beta(2);
plot(X1,X2,'-')
The linear predictor coefficients β define a vector that is orthogonal to the decision boundary. The maximum margin width is 2‖β‖^(−1) (for more information, see “Support Vector Machines for Binary Classification” on page 35-2499). Plot the maximum margin boundaries as dashed lines. Label the axes and add a legend.
m = 1/sqrt(beta(1)^2 + beta(2)^2); % Margin half-width
X1margin_low = X1+beta(1)*m^2;
X2margin_low = X2+beta(2)*m^2;
X1margin_high = X1-beta(1)*m^2;
X2margin_high = X2-beta(2)*m^2;
plot(X1margin_high,X2margin_high,'b--')
plot(X1margin_low,X2margin_low,'r--')
xlabel('X_1 (Sepal Length in cm)')
ylabel('X_2 (Sepal Width in cm)')
legend('setosa','virginica','Support Vector', ...
    'Boundary Line','Upper Margin','Lower Margin')
hold off
Train and Cross-Validate SVM Classifier
Load the ionosphere data set.
load ionosphere
rng(1); % For reproducibility
Train an SVM classifier using the radial basis kernel. Let the software find a scale value for the kernel function. Standardize the predictors. SVMModel = fitcsvm(X,Y,'Standardize',true,'KernelFunction','RBF',... 'KernelScale','auto');
SVMModel is a trained ClassificationSVM classifier. Cross-validate the SVM classifier. By default, the software uses 10-fold cross-validation. CVSVMModel = crossval(SVMModel);
CVSVMModel is a ClassificationPartitionedModel cross-validated classifier. Estimate the out-of-sample misclassification rate. classLoss = kfoldLoss(CVSVMModel) classLoss = 0.0484
The generalization rate is approximately 5%.
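Alternatively (a sketch with the same data and settings), you could cross-validate at fitting time by specifying a cross-validation name-value argument directly in fitcsvm:
% Sketch: 10-fold cross-validation specified during fitting.
CVSVMModel2 = fitcsvm(X,Y,'Standardize',true,'KernelFunction','RBF', ...
    'KernelScale','auto','KFold',10);
classLoss2 = kfoldLoss(CVSVMModel2)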
Detect Outliers Using SVM and One-Class Learning Modify Fisher's iris data set by assigning all the irises to the same class. Detect outliers in the modified data set, and confirm the expected proportion of the observations that are outliers. Load Fisher's iris data set. Remove the petal lengths and widths. Treat all irises as coming from the same class. load fisheriris X = meas(:,1:2); y = ones(size(X,1),1);
Train an SVM classifier using the modified data set. Assume that 5% of the observations are outliers. Standardize the predictors. rng(1); SVMModel = fitcsvm(X,y,'KernelScale','auto','Standardize',true,... 'OutlierFraction',0.05);
SVMModel is a trained ClassificationSVM classifier. By default, the software uses the Gaussian kernel for one-class learning. Plot the observations and the decision boundary. Flag the support vectors and potential outliers. svInd = SVMModel.IsSupportVector; h = 0.02; % Mesh grid step size [X1,X2] = meshgrid(min(X(:,1)):h:max(X(:,1)),... min(X(:,2)):h:max(X(:,2))); [~,score] = predict(SVMModel,[X1(:),X2(:)]); scoreGrid = reshape(score,size(X1,1),size(X2,2)); figure plot(X(:,1),X(:,2),'k.') hold on plot(X(svInd,1),X(svInd,2),'ro','MarkerSize',10) contour(X1,X2,scoreGrid) colorbar; title('{\bf Iris Outlier Detection via One-Class SVM}') xlabel('Sepal Length (cm)') ylabel('Sepal Width (cm)') legend('Observation','Support Vector') hold off
The boundary separating the outliers from the rest of the data occurs where the contour value is 0.
Verify that the fraction of observations with negative scores in the cross-validated data is close to 5%.
CVSVMModel = crossval(SVMModel);
[~,scorePred] = kfoldPredict(CVSVMModel);
outlierRate = mean(scorePred < 0)
10 % Class labels
Y = M×1 tall logical array
   1
   0
   1
   1
   0
   1
   0
   0
   :
   :
Create a tall array for the predictor data.
X = tt{:,1:end-1} % Predictor data
X =
M×6 tall double matrix

    10    21     3     642     8    308
    10    26     1    1021     8    296
    10    23     5    2055    21    480
    10    23     5    1332    13    296
    10    22     4     629     4    373
    10    28     3    1446    59    308
    10     8     4     928     3    447
    10    10     6     859    11    954
     :     :     :       :     :      :
     :     :     :       :     :      :
Remove rows in X and Y that contain missing data.
R = rmmissing([X Y]); % Data with missing entries removed
X = R(:,1:end-1);
Y = R(:,end);
Standardize the predictor variables.
Z = zscore(X);
Optimize hyperparameters automatically using the 'OptimizeHyperparameters' name-value pair argument. Find the optimal 'MinLeafSize' value that minimizes holdout cross-validation loss. (Specifying 'auto' uses 'MinLeafSize'.) For reproducibility, use the 'expected-improvement-plus' acquisition function and set the seeds of the random number generators using rng and tallrng. The results can vary depending on the number of workers and the execution environment for the tall arrays. For details, see “Control Where Your Code Runs”.
rng('default')
tallrng('default')
[Mdl,FitInfo,HyperparameterOptimizationResults] = fitctree(Z,Y,...
    'OptimizeHyperparameters','auto',...
    'HyperparameterOptimizationOptions',struct('Holdout',0.3,...
    'AcquisitionFunctionName','expected-improvement-plus'))
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 3: Completed in 5.6 sec
- Pass 2 of 3: Completed in 2.1 sec
- Pass 3 of 3: Completed in 3.4 sec
Evaluation completed in 13 sec
(Many similar "Evaluating tall expression using the Parallel Pool 'local'" progress messages appear before and between the optimization iterations; their timing details are omitted here.)
|======================================================================================|
| Iter | Eval   | Objective   | Objective   | BestSoFar   | BestSoFar   | MinLeafSize |
|      | result |             | runtime     | (observed)  | (estim.)    |             |
|======================================================================================|
|    1 | Best   |     0.11572 |      197.12 |     0.11572 |     0.11572 |          10 |
|    2 | Accept |     0.19635 |      10.496 |     0.11572 |     0.12008 |       48298 |
|    3 | Best   |      0.1048 |      44.614 |      0.1048 |     0.11431 |        3166 |
|    4 | Best   |     0.10094 |      91.723 |     0.10094 |     0.10574 |         180 |
|    5 | Best   |     0.10087 |       82.84 |     0.10087 |     0.10085 |         219 |
|    6 | Accept |     0.10155 |      61.043 |     0.10087 |     0.10089 |        1089 |
35-2530
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
fitctree
Evaluating tall expression using the - Pass 1 of 4: Completed in 1.7 sec - Pass 2 of 4: Completed in 1.8 sec - Pass 3 of 4: Completed in 0.77 sec - Pass 4 of 4: Completed in 1.8 sec Evaluation completed in 6.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.4 sec - Pass 2 of 4: Completed in 1.6 sec - Pass 3 of 4: Completed in 0.73 sec - Pass 4 of 4: Completed in 1.8 sec Evaluation completed in 6.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.5 sec - Pass 2 of 4: Completed in 1.7 sec - Pass 3 of 4: Completed in 1.3 sec - Pass 4 of 4: Completed in 1.7 sec Evaluation completed in 6.7 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.5 sec - Pass 2 of 4: Completed in 1.7 sec - Pass 3 of 4: Completed in 0.73 sec - Pass 4 of 4: Completed in 1.8 sec Evaluation completed in 6.3 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 1.3 sec Evaluation completed in 1.5 sec | 7 | Accept | 0.13495 | Evaluating tall expression using the - Pass 1 of 1: Completed in 0.31 sec Evaluation completed in 0.44 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.87 sec Evaluation completed in 1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.47 sec - Pass 2 of 4: Completed in 0.67 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 0.74 sec Evaluation completed in 3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.51 sec - Pass 2 of 4: Completed in 0.79 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 0.76 sec Evaluation completed in 3.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.49 sec - Pass 2 of 4: Completed in 0.69 sec - Pass 3 of 4: Completed in 1.1 sec - Pass 4 of 4: Completed in 0.78 sec Evaluation completed in 3.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1 sec - Pass 2 of 4: Completed in 0.78 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 0.85 sec Evaluation completed in 3.8 sec
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local': 241.76 | 0.10087 | Parallel Pool 'local':
0.10089 |
1 |
Parallel Pool 'local': Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
35-2531
35
Functions
Evaluating tall expression using the - Pass 1 of 4: Completed in 0.54 sec - Pass 2 of 4: Completed in 0.77 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 0.89 sec Evaluation completed in 3.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.51 sec - Pass 2 of 4: Completed in 0.73 sec - Pass 3 of 4: Completed in 0.6 sec - Pass 4 of 4: Completed in 1.1 sec Evaluation completed in 3.5 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.53 sec - Pass 2 of 4: Completed in 0.77 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 1.3 sec Evaluation completed in 3.7 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.62 sec - Pass 2 of 4: Completed in 0.8 sec - Pass 3 of 4: Completed in 0.61 sec - Pass 4 of 4: Completed in 1.6 sec Evaluation completed in 4.4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.2 sec - Pass 2 of 4: Completed in 1.5 sec - Pass 3 of 4: Completed in 1.1 sec - Pass 4 of 4: Completed in 1.9 sec Evaluation completed in 6.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.69 sec - Pass 2 of 4: Completed in 0.88 sec - Pass 3 of 4: Completed in 0.75 sec - Pass 4 of 4: Completed in 2.1 sec Evaluation completed in 5.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.61 sec - Pass 2 of 4: Completed in 0.79 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 2.2 sec Evaluation completed in 4.8 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.61 sec - Pass 2 of 4: Completed in 0.84 sec - Pass 3 of 4: Completed in 0.58 sec - Pass 4 of 4: Completed in 2.2 sec Evaluation completed in 4.9 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.68 sec - Pass 2 of 4: Completed in 0.85 sec - Pass 3 of 4: Completed in 0.59 sec - Pass 4 of 4: Completed in 2.2 sec Evaluation completed in 4.9 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.68 sec - Pass 2 of 4: Completed in 0.91 sec - Pass 3 of 4: Completed in 0.58 sec
35-2532
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
fitctree
- Pass 4 of 4: Completed in 2.4 sec Evaluation completed in 5.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.92 sec - Pass 2 of 4: Completed in 0.86 sec - Pass 3 of 4: Completed in 0.57 sec - Pass 4 of 4: Completed in 1.6 sec Evaluation completed in 4.7 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.69 sec - Pass 2 of 4: Completed in 0.91 sec - Pass 3 of 4: Completed in 0.63 sec - Pass 4 of 4: Completed in 1.3 sec Evaluation completed in 4.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.67 sec - Pass 2 of 4: Completed in 0.86 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 0.99 sec Evaluation completed in 3.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.2 sec - Pass 2 of 4: Completed in 0.9 sec - Pass 3 of 4: Completed in 0.57 sec - Pass 4 of 4: Completed in 0.95 sec Evaluation completed in 4.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.73 sec - Pass 2 of 4: Completed in 0.91 sec - Pass 3 of 4: Completed in 0.57 sec - Pass 4 of 4: Completed in 0.91 sec Evaluation completed in 3.7 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.76 sec - Pass 2 of 4: Completed in 0.93 sec - Pass 3 of 4: Completed in 0.57 sec - Pass 4 of 4: Completed in 0.9 sec Evaluation completed in 3.7 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.91 sec Evaluation completed in 1.1 sec | 8 | Accept | 0.10246 | Evaluating tall expression using the - Pass 1 of 1: Completed in 0.34 sec Evaluation completed in 0.49 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.87 sec Evaluation completed in 1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.55 sec - Pass 2 of 4: Completed in 0.72 sec - Pass 3 of 4: Completed in 0.57 sec - Pass 4 of 4: Completed in 0.8 sec Evaluation completed in 3.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.48 sec - Pass 2 of 4: Completed in 0.7 sec - Pass 3 of 4: Completed in 0.52 sec
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local': 115.31 | 0.10087 | Parallel Pool 'local':
0.10089 |
58 |
Parallel Pool 'local': Parallel Pool 'local':
Parallel Pool 'local':
35-2533
35
Functions
- Pass 4 of 4: Completed in 0.76 sec Evaluation completed in 3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.51 sec - Pass 2 of 4: Completed in 0.69 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 0.79 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.53 sec - Pass 2 of 4: Completed in 0.72 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 0.81 sec Evaluation completed in 3.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1 sec - Pass 2 of 4: Completed in 0.75 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 1.4 sec Evaluation completed in 4.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.55 sec - Pass 2 of 4: Completed in 0.77 sec - Pass 3 of 4: Completed in 0.6 sec - Pass 4 of 4: Completed in 0.9 sec Evaluation completed in 3.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.5 sec - Pass 2 of 4: Completed in 0.73 sec - Pass 3 of 4: Completed in 1.1 sec - Pass 4 of 4: Completed in 1.1 sec Evaluation completed in 4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.56 sec - Pass 2 of 4: Completed in 0.75 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 1.2 sec Evaluation completed in 3.5 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.61 sec - Pass 2 of 4: Completed in 0.76 sec - Pass 3 of 4: Completed in 1.1 sec - Pass 4 of 4: Completed in 1.1 sec Evaluation completed in 4.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.6 sec - Pass 2 of 4: Completed in 0.74 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 0.95 sec Evaluation completed in 3.4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.55 sec - Pass 2 of 4: Completed in 0.76 sec - Pass 3 of 4: Completed in 0.57 sec - Pass 4 of 4: Completed in 0.94 sec Evaluation completed in 3.4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.58 sec
35-2534
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
fitctree
- Pass 2 of 4: Completed in 0.77 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 0.83 sec Evaluation completed in 3.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.59 sec - Pass 2 of 4: Completed in 0.78 sec - Pass 3 of 4: Completed in 0.57 sec - Pass 4 of 4: Completed in 0.83 sec Evaluation completed in 3.4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.6 sec - Pass 2 of 4: Completed in 0.76 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 0.77 sec Evaluation completed in 3.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.72 sec - Pass 2 of 4: Completed in 0.81 sec - Pass 3 of 4: Completed in 0.6 sec - Pass 4 of 4: Completed in 0.76 sec Evaluation completed in 3.6 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 1.3 sec Evaluation completed in 1.4 sec | 9 | Accept | 0.10173 | Evaluating tall expression using the - Pass 1 of 1: Completed in 0.32 sec Evaluation completed in 0.46 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.84 sec Evaluation completed in 1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.5 sec - Pass 2 of 4: Completed in 0.72 sec - Pass 3 of 4: Completed in 0.57 sec - Pass 4 of 4: Completed in 0.75 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.49 sec - Pass 2 of 4: Completed in 0.68 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 0.76 sec Evaluation completed in 3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.52 sec - Pass 2 of 4: Completed in 0.71 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 0.91 sec Evaluation completed in 3.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.53 sec - Pass 2 of 4: Completed in 0.69 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 0.82 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.51 sec
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local': 77.229 | 0.10087 | Parallel Pool 'local':
0.10086 |
418 |
Parallel Pool 'local': Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
35-2535
35
Functions
- Pass 2 of 4: Completed in 0.7 sec - Pass 3 of 4: Completed in 0.61 sec - Pass 4 of 4: Completed in 0.82 sec Evaluation completed in 3.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.1 sec - Pass 2 of 4: Completed in 0.78 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 0.95 sec Evaluation completed in 3.9 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.55 sec - Pass 2 of 4: Completed in 0.71 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 1.1 sec Evaluation completed in 3.5 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.54 sec - Pass 2 of 4: Completed in 0.7 sec - Pass 3 of 4: Completed in 0.58 sec - Pass 4 of 4: Completed in 1.3 sec Evaluation completed in 3.7 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.55 sec - Pass 2 of 4: Completed in 0.73 sec - Pass 3 of 4: Completed in 0.59 sec - Pass 4 of 4: Completed in 1.7 sec Evaluation completed in 4.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.58 sec - Pass 2 of 4: Completed in 0.79 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 1.7 sec Evaluation completed in 4.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.71 sec - Pass 2 of 4: Completed in 1.7 sec - Pass 3 of 4: Completed in 0.59 sec - Pass 4 of 4: Completed in 1.7 sec Evaluation completed in 5.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.65 sec - Pass 2 of 4: Completed in 0.83 sec - Pass 3 of 4: Completed in 0.61 sec - Pass 4 of 4: Completed in 1.4 sec Evaluation completed in 4.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.67 sec - Pass 2 of 4: Completed in 0.87 sec - Pass 3 of 4: Completed in 0.58 sec - Pass 4 of 4: Completed in 1.4 sec Evaluation completed in 4.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.61 sec - Pass 2 of 4: Completed in 0.82 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 1.1 sec Evaluation completed in 3.7 sec
35-2536
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
fitctree
Evaluating tall expression using the - Pass 1 of 4: Completed in 0.65 sec - Pass 2 of 4: Completed in 0.84 sec - Pass 3 of 4: Completed in 0.62 sec - Pass 4 of 4: Completed in 0.89 sec Evaluation completed in 3.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.64 sec - Pass 2 of 4: Completed in 0.81 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 0.88 sec Evaluation completed in 3.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.62 sec - Pass 2 of 4: Completed in 0.9 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 0.86 sec Evaluation completed in 3.5 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.65 sec - Pass 2 of 4: Completed in 0.81 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 0.8 sec Evaluation completed in 3.3 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.77 sec Evaluation completed in 0.89 sec | 10 | Accept | 0.10114 | Evaluating tall expression using the - Pass 1 of 1: Completed in 0.86 sec Evaluation completed in 1 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.83 sec Evaluation completed in 0.99 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.48 sec - Pass 2 of 4: Completed in 0.7 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 0.8 sec Evaluation completed in 3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.5 sec - Pass 2 of 4: Completed in 0.72 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 0.79 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.56 sec - Pass 2 of 4: Completed in 0.73 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 0.85 sec Evaluation completed in 3.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.5 sec - Pass 2 of 4: Completed in 0.69 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 0.81 sec Evaluation completed in 3.2 sec
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local': 94.532 | 0.10087 | Parallel Pool 'local':
0.10091 |
123 |
Parallel Pool 'local': Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
35-2537
35
Functions
Evaluating tall expression using the - Pass 1 of 4: Completed in 0.55 sec - Pass 2 of 4: Completed in 0.82 sec - Pass 3 of 4: Completed in 0.64 sec - Pass 4 of 4: Completed in 0.94 sec Evaluation completed in 3.5 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.49 sec - Pass 2 of 4: Completed in 0.77 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 0.97 sec Evaluation completed in 3.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.56 sec - Pass 2 of 4: Completed in 0.78 sec - Pass 3 of 4: Completed in 0.58 sec - Pass 4 of 4: Completed in 1.3 sec Evaluation completed in 3.8 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.55 sec - Pass 2 of 4: Completed in 0.81 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 1.3 sec Evaluation completed in 3.8 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.55 sec - Pass 2 of 4: Completed in 0.76 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 1.5 sec Evaluation completed in 3.9 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.59 sec - Pass 2 of 4: Completed in 0.76 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 1.5 sec Evaluation completed in 4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.59 sec - Pass 2 of 4: Completed in 0.8 sec - Pass 3 of 4: Completed in 0.59 sec - Pass 4 of 4: Completed in 1.4 sec Evaluation completed in 3.9 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.59 sec - Pass 2 of 4: Completed in 0.77 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 1.2 sec Evaluation completed in 3.7 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.1 sec - Pass 2 of 4: Completed in 0.8 sec - Pass 3 of 4: Completed in 0.58 sec - Pass 4 of 4: Completed in 1.1 sec Evaluation completed in 4.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.65 sec - Pass 2 of 4: Completed in 0.84 sec - Pass 3 of 4: Completed in 1.1 sec
35-2538
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
fitctree
- Pass 4 of 4: Completed in 1 sec Evaluation completed in 4.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.63 sec - Pass 2 of 4: Completed in 0.84 sec - Pass 3 of 4: Completed in 0.59 sec - Pass 4 of 4: Completed in 0.9 sec Evaluation completed in 3.5 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.2 sec - Pass 2 of 4: Completed in 0.83 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 0.81 sec Evaluation completed in 4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.65 sec - Pass 2 of 4: Completed in 0.79 sec - Pass 3 of 4: Completed in 0.59 sec - Pass 4 of 4: Completed in 0.8 sec Evaluation completed in 3.4 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.77 sec Evaluation completed in 0.89 sec | 11 | Best | 0.1008 | Evaluating tall expression using the - Pass 1 of 1: Completed in 0.38 sec Evaluation completed in 0.52 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.88 sec Evaluation completed in 1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.49 sec - Pass 2 of 4: Completed in 0.69 sec - Pass 3 of 4: Completed in 0.51 sec - Pass 4 of 4: Completed in 0.78 sec Evaluation completed in 3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.59 sec - Pass 2 of 4: Completed in 0.72 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 0.79 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.58 sec - Pass 2 of 4: Completed in 0.74 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 0.93 sec Evaluation completed in 3.4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.57 sec - Pass 2 of 4: Completed in 0.79 sec - Pass 3 of 4: Completed in 0.58 sec - Pass 4 of 4: Completed in 0.83 sec Evaluation completed in 3.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.53 sec - Pass 2 of 4: Completed in 1.2 sec - Pass 3 of 4: Completed in 0.59 sec
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local': 90.637 | 0.1008 | Parallel Pool 'local':
0.10088 |
178 |
Parallel Pool 'local': Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
35-2539
35
Functions
- Pass 4 of 4: Completed in 0.91 sec Evaluation completed in 3.8 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.58 sec - Pass 2 of 4: Completed in 0.85 sec - Pass 3 of 4: Completed in 0.58 sec - Pass 4 of 4: Completed in 1 sec Evaluation completed in 3.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.56 sec - Pass 2 of 4: Completed in 0.77 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 1.2 sec Evaluation completed in 3.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.1 sec - Pass 2 of 4: Completed in 0.81 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 1.4 sec Evaluation completed in 4.4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.54 sec - Pass 2 of 4: Completed in 0.74 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 1.5 sec Evaluation completed in 3.9 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.61 sec - Pass 2 of 4: Completed in 0.79 sec - Pass 3 of 4: Completed in 0.58 sec - Pass 4 of 4: Completed in 1.5 sec Evaluation completed in 4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.61 sec - Pass 2 of 4: Completed in 0.82 sec - Pass 3 of 4: Completed in 0.61 sec - Pass 4 of 4: Completed in 1.4 sec Evaluation completed in 4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.66 sec - Pass 2 of 4: Completed in 0.77 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 1.2 sec Evaluation completed in 3.8 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.61 sec - Pass 2 of 4: Completed in 0.79 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 1.2 sec Evaluation completed in 3.7 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.62 sec - Pass 2 of 4: Completed in 0.85 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 1 sec Evaluation completed in 3.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.61 sec
35-2540
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
fitctree
- Pass 2 of 4: Completed in 0.86 sec - Pass 3 of 4: Completed in 1.1 sec - Pass 4 of 4: Completed in 0.96 sec Evaluation completed in 4.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.65 sec - Pass 2 of 4: Completed in 0.8 sec - Pass 3 of 4: Completed in 0.59 sec - Pass 4 of 4: Completed in 0.86 sec Evaluation completed in 3.5 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.69 sec - Pass 2 of 4: Completed in 0.84 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 0.83 sec Evaluation completed in 3.5 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.76 sec Evaluation completed in 0.89 sec | 12 | Accept | 0.1008 | Evaluating tall expression using the - Pass 1 of 1: Completed in 0.32 sec Evaluation completed in 0.45 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.9 sec Evaluation completed in 1.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.58 sec - Pass 2 of 4: Completed in 0.71 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 0.77 sec Evaluation completed in 3.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.49 sec - Pass 2 of 4: Completed in 0.69 sec - Pass 3 of 4: Completed in 0.58 sec - Pass 4 of 4: Completed in 0.77 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.52 sec - Pass 2 of 4: Completed in 0.71 sec - Pass 3 of 4: Completed in 0.51 sec - Pass 4 of 4: Completed in 0.78 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.54 sec - Pass 2 of 4: Completed in 0.7 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 0.72 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.54 sec - Pass 2 of 4: Completed in 0.74 sec - Pass 3 of 4: Completed in 0.51 sec - Pass 4 of 4: Completed in 1.3 sec Evaluation completed in 3.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.54 sec
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local': 90.267 | 0.1008 | Parallel Pool 'local':
0.10086 |
179 |
Parallel Pool 'local': Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
35-2541
35
Functions
- Pass 2 of 4: Completed in 0.78 sec - Pass 3 of 4: Completed in 0.59 sec - Pass 4 of 4: Completed in 0.74 sec Evaluation completed in 3.2 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.83 sec Evaluation completed in 0.97 sec | 13 | Accept | 0.11126 | Evaluating tall expression using the - Pass 1 of 1: Completed in 0.32 sec Evaluation completed in 0.45 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.85 sec Evaluation completed in 0.99 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.51 sec - Pass 2 of 4: Completed in 0.75 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 0.74 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.5 sec - Pass 2 of 4: Completed in 0.7 sec - Pass 3 of 4: Completed in 0.57 sec - Pass 4 of 4: Completed in 0.78 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.5 sec - Pass 2 of 4: Completed in 0.68 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 0.79 sec Evaluation completed in 3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.5 sec - Pass 2 of 4: Completed in 1.3 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 0.91 sec Evaluation completed in 3.7 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.53 sec - Pass 2 of 4: Completed in 1.2 sec - Pass 3 of 4: Completed in 0.59 sec - Pass 4 of 4: Completed in 0.86 sec Evaluation completed in 3.7 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.51 sec - Pass 2 of 4: Completed in 0.71 sec - Pass 3 of 4: Completed in 0.59 sec - Pass 4 of 4: Completed in 1 sec Evaluation completed in 3.4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.56 sec - Pass 2 of 4: Completed in 0.71 sec - Pass 3 of 4: Completed in 0.64 sec - Pass 4 of 4: Completed in 0.99 sec Evaluation completed in 3.4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.54 sec
35-2542
Parallel Pool 'local': 32.134 | 0.1008 | Parallel Pool 'local': Parallel Pool 'local': Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
0.10084 |
10251 |
fitctree
- Pass 2 of 4: Completed in 1.2 sec - Pass 3 of 4: Completed in 0.58 sec - Pass 4 of 4: Completed in 0.94 sec Evaluation completed in 3.8 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.51 sec - Pass 2 of 4: Completed in 0.77 sec - Pass 3 of 4: Completed in 0.59 sec - Pass 4 of 4: Completed in 0.9 sec Evaluation completed in 3.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.56 sec - Pass 2 of 4: Completed in 0.78 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 0.9 sec Evaluation completed in 3.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.55 sec - Pass 2 of 4: Completed in 0.72 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 0.89 sec Evaluation completed in 3.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.58 sec - Pass 2 of 4: Completed in 0.76 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 0.8 sec Evaluation completed in 3.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.56 sec - Pass 2 of 4: Completed in 1.3 sec - Pass 3 of 4: Completed in 0.61 sec - Pass 4 of 4: Completed in 0.76 sec Evaluation completed in 3.8 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.83 sec Evaluation completed in 0.97 sec | 14 | Accept | 0.10154 | Evaluating tall expression using the - Pass 1 of 1: Completed in 0.36 sec Evaluation completed in 0.5 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.89 sec Evaluation completed in 1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.53 sec - Pass 2 of 4: Completed in 0.69 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 0.74 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.5 sec - Pass 2 of 4: Completed in 0.67 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 0.78 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.52 sec
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local': 66.262 | 0.1008 | Parallel Pool 'local':
0.10085 |
736 |
Parallel Pool 'local': Parallel Pool 'local':
Parallel Pool 'local':
Parallel Pool 'local':
35-2543
35
Functions
- Pass 2 of 4: Completed in 0.69 - Pass 3 of 4: Completed in 0.54 - Pass 4 of 4: Completed in 0.84 Evaluation completed in 3.1 sec Evaluating tall expression using Evaluation 0% ...
sec sec sec the Parallel Pool 'local':
Mdl = 
  CompactClassificationTree
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: [0 1]
           ScoreTransform: 'none'

  Properties, Methods

FitInfo = 
  struct with no fields.
HyperparameterOptimizationResults = 
  BayesianOptimization with properties:

                      ObjectiveFcn: @createObjFcn/tallObjFcn
              VariableDescriptions: [4×1 optimizableVariable]
                           Options: [1×1 struct]
                      MinObjective: 0.1004
                   XAtMinObjective: [1×1 table]
             MinEstimatedObjective: 0.1008
          XAtMinEstimatedObjective: [1×1 table]
           NumObjectiveEvaluations: 30
                  TotalElapsedTime: 3.0367e+03
                         NextPoint: [1×1 table]
                            XTrace: [30×1 table]
                    ObjectiveTrace: [30×1 double]
                  ConstraintsTrace: []
                     UserDataTrace: {30×1 cell}
      ObjectiveEvaluationTimeTrace: [30×1 double]
                IterationTimeTrace: [30×1 double]
                        ErrorTrace: [30×1 double]
                  FeasibilityTrace: [30×1 logical]
       FeasibilityProbabilityTrace: [30×1 double]
              IndexOfMinimumTrace: [30×1 double]
             ObjectiveMinimumTrace: [30×1 double]
    EstimatedObjectiveMinimumTrace: [30×1 double]
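The returned BayesianOptimization object can be queried directly. The following lines are a minimal sketch of inspecting the result; they assume the HyperparameterOptimizationResults output shown above.

% Lowest observed misclassification rate and the hyperparameters that achieved it.
bestObserved = HyperparameterOptimizationResults.MinObjective
bestParams   = HyperparameterOptimizationResults.XAtMinObjective

% bestPoint returns the best point under a chosen criterion.
[x,critValue] = bestPoint(HyperparameterOptimizationResults,"Criterion","min-observed");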
Input Arguments

Tbl — Sample data
table

Sample data used to train the model, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Optionally, Tbl can contain one additional column for the response variable. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.

• If Tbl contains the response variable, and you want to use all remaining variables in Tbl as predictors, then specify the response variable by using ResponseVarName.
• If Tbl contains the response variable, and you want to use only a subset of the remaining variables in Tbl as predictors, then specify a formula by using formula.
• If Tbl does not contain the response variable, then specify a response variable by using Y. The length of the response variable and the number of rows in Tbl must be equal.

ResponseVarName — Response variable name
name of variable in Tbl

Response variable name, specified as the name of a variable in Tbl. You must specify ResponseVarName as a character vector or string scalar. For example, if the response variable Y is stored as Tbl.Y, then specify it as "Y". Otherwise, the software treats all columns of Tbl, including Y, as predictors when training the model.

The response variable must be a categorical, character, or string array; a logical or numeric vector; or a cell array of character vectors. If Y is a character array, then each element of the response variable must correspond to one row of the array.

A good practice is to specify the order of the classes by using the ClassNames name-value argument.

Data Types: char | string

formula — Explanatory model of response variable and subset of predictor variables
character vector | string scalar

Explanatory model of the response variable and a subset of the predictor variables, specified as a character vector or string scalar in the form "Y~x1+x2+x3". In this form, Y represents the response variable, and x1, x2, and x3 represent the predictor variables.

To specify a subset of variables in Tbl as predictors for training the model, use a formula. If you specify a formula, then the software does not use any variables in Tbl that do not appear in formula.

The variable names in the formula must be both variable names in Tbl (Tbl.Properties.VariableNames) and valid MATLAB identifiers. You can verify the variable names in Tbl by using the isvarname function. If the variable names are not valid, then you can convert them by using the matlab.lang.makeValidName function.

Data Types: char | string

Y — Class labels
numeric vector | categorical vector | logical vector | character array | string array | cell array of character vectors

Class labels, specified as a numeric vector, categorical vector, logical vector, character array, string array, or cell array of character vectors. Each row of Y represents the classification of the corresponding row of X.
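As an illustration of the Tbl, ResponseVarName, formula, and Y syntaxes described above, here is a minimal sketch; the table and variable names (Tbl, Species, SepalLength, SepalWidth, Y) are placeholders, not values taken from this page.

% Use all variables in Tbl other than Species as predictors.
Mdl1 = fitctree(Tbl,"Species");

% Use only the predictors that appear in the formula.
Mdl2 = fitctree(Tbl,"Species ~ SepalLength + SepalWidth");

% Supply the class labels separately when Tbl has no response column.
Mdl3 = fitctree(Tbl,Y);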
When fitting the tree, fitctree considers NaN, '' (empty character vector), "" (empty string), <missing>, and <undefined> values in Y to be missing values. fitctree does not use observations with missing values for Y in the fit.

For numeric Y, consider fitting a regression tree using fitrtree instead.

Data Types: single | double | categorical | logical | char | string | cell

X — Predictor data
numeric matrix

Predictor data, specified as a numeric matrix. Each row of X corresponds to one observation, and each column corresponds to one predictor variable.

fitctree considers NaN values in X as missing values. fitctree does not use observations with all missing values for X in the fit. fitctree uses observations with some missing values for X to find splits on variables for which these observations have valid values.

Data Types: single | double

Name-Value Pair Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Note You cannot use any cross-validation name-value argument together with the 'OptimizeHyperparameters' name-value argument. You can modify the cross-validation for 'OptimizeHyperparameters' only by using the 'HyperparameterOptimizationOptions' name-value argument.

Example: 'CrossVal','on','MinLeafSize',40 specifies a cross-validated classification tree with a minimum of 40 observations per leaf.

Model Parameters
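Before the individual model parameters are described, the following sketch shows how several of them combine in one call. The predictor matrix X and label vector Y are assumed to exist, and the particular values are illustrative only.

% Grow a tree with at most 20 decision splits and at least 5 observations
% per leaf, selecting split predictors with the curvature test.
Mdl = fitctree(X,Y, ...
    "MaxNumSplits",20, ...
    "MinLeafSize",5, ...
    "PredictorSelection","curvature");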
AlgorithmForCategorical — Algorithm for best categorical predictor split
'Exact' | 'PullLeft' | 'PCA' | 'OVAbyClass'

Algorithm to find the best split on a categorical predictor with C categories for data and K ≥ 3 classes, specified as the comma-separated pair consisting of 'AlgorithmForCategorical' and one of the following values.

• 'Exact' — Consider all 2^(C–1) – 1 combinations.
• 'PullLeft' — Start with all C categories on the right branch. Consider moving each category to the left branch as it achieves the minimum impurity for the K classes among the remaining categories. From this sequence, choose the split that has the lowest impurity.
• 'PCA' — Compute a score for each category using the inner product between the first principal component of a weighted covariance matrix (of the centered class probability matrix) and the vector of class probabilities for that category. Sort the scores in ascending order, and consider all C – 1 splits.
• 'OVAbyClass' — Start with all C categories on the right branch. For each class, order the categories based on their probability for that class. For the first class, consider moving each category to the left branch in order, recording the impurity criterion at each move. Repeat for the remaining classes. From this sequence, choose the split that has the minimum impurity.
fitctree automatically selects the optimal subset of algorithms for each split using the known number of classes and levels of a categorical predictor. For K = 2 classes, fitctree always performs the exact search. To specify a particular algorithm, use the 'AlgorithmForCategorical' name-value pair argument.

For more details, see "Splitting Categorical Predictors in Classification Trees" on page 20-25.

Example: 'AlgorithmForCategorical','PCA'

CategoricalPredictors — Categorical predictors list
vector of positive integers | logical vector | character matrix | string array | cell array of character vectors | 'all'

Categorical predictors list, specified as one of the following values (a brief usage sketch follows the list).

• Vector of positive integers — Each entry in the vector is an index value indicating that the corresponding predictor is categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If fitctree uses a subset of input variables as predictors, then the function indexes the predictors using only the subset. The CategoricalPredictors values do not count the response variable, observation weights variable, or any other variables that the function does not use.
• Logical vector — A true entry means that the corresponding predictor is categorical. The length of the vector is p.
• Character matrix — Each row of the matrix is the name of a predictor variable. The names must match the entries in PredictorNames. Pad the names with extra blanks so each row of the character matrix has the same length.
• String array or cell array of character vectors — Each element in the array is the name of a predictor variable. The names must match the entries in PredictorNames.
• "all" — All predictors are categorical.
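For instance, a hedged sketch of the index-based and name-based forms; the data and the predictor names Region and Color are placeholders.

% Mark the 1st and 3rd columns of X as categorical predictors.
Mdl = fitctree(X,Y,"CategoricalPredictors",[1 3]);

% Equivalently, name the categorical variables when training on a table.
Mdl = fitctree(Tbl,"Species","CategoricalPredictors",["Region","Color"]);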
By default, if the predictor data is a table (Tbl), fitctree assumes that a variable is categorical if it is a logical vector, unordered categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix (X), fitctree assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the CategoricalPredictors name-value argument.

Example: 'CategoricalPredictors','all'

Data Types: single | double | logical | char | string | cell

ClassNames — Names of classes to use for training
categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors

Names of classes to use for training, specified as a categorical, character, or string array; a logical or numeric vector; or a cell array of character vectors. ClassNames must have the same data type as the response variable in Tbl or Y.

If ClassNames is a character array, then each element must correspond to one row of the array.

Use ClassNames to:

• Specify the order of the classes during training.
• Specify the order of any input or output argument dimension that corresponds to the class order. For example, use ClassNames to specify the order of the dimensions of Cost or the column order of classification scores returned by predict.
• Select a subset of classes for training. For example, suppose that the set of all distinct class names in Y is ["a","b","c"]. To train the model using observations from classes "a" and "c" only, specify "ClassNames",["a","c"].

The default value for ClassNames is the set of all distinct class names in the response variable in Tbl or Y.

Example: "ClassNames",["b","g"]

Data Types: categorical | char | string | logical | single | double | cell

Cost — Cost of misclassification
square matrix | structure

Cost of misclassification of a point, specified as the comma-separated pair consisting of 'Cost' and one of the following:

• Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (i.e., the rows correspond to the true class and the columns correspond to the predicted class). To specify the class order for the corresponding rows and columns of Cost, also specify the ClassNames name-value pair argument.
• Structure S having two fields: S.ClassNames containing the group names as a variable of the same data type as Y, and S.ClassificationCosts containing the cost matrix.

The default is Cost(i,j)=1 if i~=j, and Cost(i,j)=0 if i=j.

Data Types: single | double | struct

MaxDepth — Maximum tree depth
positive integer

Maximum tree depth, specified as the comma-separated pair consisting of 'MaxDepth' and a positive integer. Specify a value for this argument to return a tree that has fewer levels and requires fewer passes through the tall array to compute. Generally, the algorithm of fitctree takes one pass through the data and an additional pass for each tree level. The function does not set a maximum tree depth, by default.

Note This option applies only when you use fitctree on tall arrays. See Tall Arrays on page 35-2567 for more information.

MaxNumCategories — Maximum category levels
10 (default) | nonnegative scalar value

Maximum category levels, specified as the comma-separated pair consisting of 'MaxNumCategories' and a nonnegative scalar value. fitctree splits a categorical predictor using the exact search algorithm if the predictor has at most MaxNumCategories levels in the split node. Otherwise, fitctree finds the best categorical split using one of the inexact algorithms.

Passing a small value can lead to loss of accuracy and passing a large value can increase computation time and memory overload.

Example: 'MaxNumCategories',8

MaxNumSplits — Maximal number of decision splits
size(X,1) - 1 (default) | positive integer

Maximal number of decision splits (or branch nodes), specified as the comma-separated pair consisting of 'MaxNumSplits' and a positive integer. fitctree splits MaxNumSplits or fewer branch nodes. For more details on splitting behavior, see Algorithms on page 35-2563.

Example: 'MaxNumSplits',5

Data Types: single | double

MergeLeaves — Leaf merge flag
'on' (default) | 'off'

Leaf merge flag, specified as the comma-separated pair consisting of 'MergeLeaves' and 'on' or 'off'.

If MergeLeaves is 'on', then fitctree:

• Merges leaves that originate from the same parent node, and that yields a sum of risk values greater than or equal to the risk associated with the parent node
• Estimates the optimal sequence of pruned subtrees, but does not prune the classification tree

Otherwise, fitctree does not merge leaves.

Example: 'MergeLeaves','off'

MinLeafSize — Minimum number of leaf node observations
1 (default) | positive integer value

Minimum number of leaf node observations, specified as the comma-separated pair consisting of 'MinLeafSize' and a positive integer value. Each leaf has at least MinLeafSize observations per tree leaf. If you supply both MinParentSize and MinLeafSize, fitctree uses the setting that gives larger leaves: MinParentSize = max(MinParentSize,2*MinLeafSize).

Example: 'MinLeafSize',3

Data Types: single | double

MinParentSize — Minimum number of branch node observations
10 (default) | positive integer value

Minimum number of branch node observations, specified as the comma-separated pair consisting of 'MinParentSize' and a positive integer value. Each branch node in the tree has at least MinParentSize observations. If you supply both MinParentSize and MinLeafSize, fitctree uses the setting that gives larger leaves: MinParentSize = max(MinParentSize,2*MinLeafSize).

Example: 'MinParentSize',8

Data Types: single | double

NumBins — Number of bins for numeric predictors
[](empty) (default) | positive integer scalar

Number of bins for numeric predictors, specified as the comma-separated pair consisting of 'NumBins' and a positive integer scalar.

• If the 'NumBins' value is empty (default), then fitctree does not bin any predictors.
• If you specify the 'NumBins' value as a positive integer scalar (numBins), then fitctree bins every numeric predictor into at most numBins equiprobable bins, and then grows trees on the bin indices instead of the original data.
• The number of bins can be less than numBins if a predictor has fewer than numBins unique values.
• fitctree does not bin categorical predictors.

When you use a large training data set, this binning option speeds up training but might cause a potential decrease in accuracy. You can try 'NumBins',50 first, and then change the value depending on the accuracy and training speed.

A trained model stores the bin edges in the BinEdges property.

Example: 'NumBins',50

Data Types: single | double

NumVariablesToSample — Number of predictors to select at random for each split
'all' (default) | positive integer value
Number of predictors to select at random for each split, specified as the comma-separated pair consisting of 'NumVariablesToSample' and a positive integer value. Alternatively, you can specify 'all' to use all available predictors.

If the training data includes many predictors and you want to analyze predictor importance, then specify 'NumVariablesToSample' as 'all'. Otherwise, the software might not select some predictors, underestimating their importance.

To reproduce the random selections, you must set the seed of the random number generator by using rng and specify 'Reproducible',true.

Example: 'NumVariablesToSample',3

Data Types: char | string | single | double

PredictorNames — Predictor variable names
string array of unique names | cell array of unique character vectors

Predictor variable names, specified as a string array of unique names or cell array of unique character vectors. The functionality of PredictorNames depends on the way you supply the training data.

• If you supply X and Y, then you can use PredictorNames to assign names to the predictor variables in X.
  • The order of the names in PredictorNames must correspond to the column order of X. That is, PredictorNames{1} is the name of X(:,1), PredictorNames{2} is the name of X(:,2), and so on. Also, size(X,2) and numel(PredictorNames) must be equal.
  • By default, PredictorNames is {'x1','x2',...}.
• If you supply Tbl, then you can use PredictorNames to choose which predictor variables to use in training. That is, fitctree uses only the predictor variables in PredictorNames and the response variable during training.
  • PredictorNames must be a subset of Tbl.Properties.VariableNames and cannot include the name of the response variable.
  • By default, PredictorNames contains the names of all predictor variables.
• A good practice is to specify the predictors for training using either PredictorNames or formula, but not both.

Example: "PredictorNames", ["SepalLength","SepalWidth","PetalLength","PetalWidth"]

Data Types: string | cell

PredictorSelection — Algorithm used to select the best split predictor
'allsplits' (default) | 'curvature' | 'interaction-curvature'

Algorithm used to select the best split predictor at each node, specified as the comma-separated pair consisting of 'PredictorSelection' and one of the following values (a brief sketch follows the list).

• 'allsplits' — Standard CART — Selects the split predictor that maximizes the split-criterion gain over all possible splits of all predictors [1].
• 'curvature' — Curvature test on page 35-2560 — Selects the split predictor that minimizes the p-value of chi-square tests of independence between each predictor and the response [4]. Training speed is similar to standard CART.
• 'interaction-curvature' — Interaction test on page 35-2562 — Chooses the split predictor that minimizes the p-value of chi-square tests of independence between each predictor and the response, and that minimizes the p-value of a chi-square test of independence between each pair of predictors and response [3]. Training speed can be slower than standard CART.
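For example, a hedged sketch of selecting the curvature test and then estimating predictor importance, which mirrors the use case mentioned in the Tip below; X and Y are assumed to exist.

% Select split predictors with the curvature test, then estimate importance.
Mdl = fitctree(X,Y,"PredictorSelection","curvature");
imp = predictorImportance(Mdl);   % one importance estimate per predictor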
For 'curvature' and 'interaction-curvature', if all tests yield p-values greater than 0.05, then fitctree stops splitting nodes. Tip • Standard CART tends to select split predictors containing many distinct values, e.g., continuous variables, over those containing few distinct values, e.g., categorical variables [4]. Consider specifying the curvature or interaction test if any of the following are true: • If there are predictors that have relatively fewer distinct values than other predictors, for example, if the predictor data set is heterogeneous. • If an analysis of predictor importance is your goal. For more on predictor importance estimation, see predictorImportance and “Introduction to Feature Selection” on page 1646. • Trees grown using standard CART are not sensitive to predictor variable interactions. Also, such trees are less likely to identify important variables in the presence of many irrelevant predictors than the application of the interaction test. Therefore, to account for predictor interactions and identify importance variables in the presence of many irrelevant variables, specify the interaction test [3]. • Prediction speed is unaffected by the value of 'PredictorSelection'.
For details on how fitctree selects split predictors, see “Node Splitting Rules” on page 35-2563 and “Choose Split Predictor Selection Technique” on page 20-14. Example: 'PredictorSelection','curvature' Prior — Prior probabilities 'empirical' (default) | 'uniform' | vector of scalar values | structure Prior probabilities for each class, specified as one of the following: • Character vector or string scalar. • 'empirical' determines class probabilities from class frequencies in the response variable in Y or Tbl. If you pass observation weights, fitctree uses the weights to compute the class probabilities. • 'uniform' sets all class probabilities to be equal. • Vector (one scalar value for each class). To specify the class order for the corresponding elements of 'Prior', set the 'ClassNames' name-value argument. 35-2552
fitctree
• Structure S with two fields. • S.ClassNames contains the class names as a variable of the same type as the response variable in Y or Tbl. • S.ClassProbs contains a vector of corresponding probabilities. fitctree normalizes the weights in each class ('Weights') to add up to the value of the prior probability of the respective class. Example: 'Prior','uniform' Data Types: char | string | single | double | struct Prune — Flag to estimate optimal sequence of pruned subtrees 'on' (default) | 'off' Flag to estimate the optimal sequence of pruned subtrees, specified as the comma-separated pair consisting of 'Prune' and 'on' or 'off'. If Prune is 'on', then fitctree grows the classification tree without pruning it, but estimates the optimal sequence of pruned subtrees. Otherwise, fitctree grows the classification tree without estimating the optimal sequence of pruned subtrees. To prune a trained ClassificationTree model, pass it to prune. Example: 'Prune','off' PruneCriterion — Pruning criterion 'error' (default) | 'impurity' Pruning criterion, specified as the comma-separated pair consisting of 'PruneCriterion' and 'error' or 'impurity'. If you specify 'impurity', then fitctree uses the impurity measure specified by the 'SplitCriterion' name-value pair argument. For details, see “Impurity and Node Error” on page 35-2561. Example: 'PruneCriterion','impurity' Reproducible — Flag to enforce reproducibility false (logical 0) (default) | true (logical 1) Flag to enforce reproducibility over repeated runs of training a model, specified as the commaseparated pair consisting of 'Reproducible' and either false or true. If 'NumVariablesToSample' is not 'all', then the software selects predictors at random for each split. To reproduce the random selections, you must specify 'Reproducible',true and set the seed of the random number generator by using rng. Note that setting 'Reproducible' to true can slow down training. Example: 'Reproducible',true Data Types: logical ResponseName — Response variable name 'Y' (default) | character vector | string scalar 35-2553
Response variable name, specified as the comma-separated pair consisting of 'ResponseName' and a character vector or string scalar representing the name of the response variable.
This name-value pair is not valid when using the ResponseVarName or formula input arguments.
Example: 'ResponseName','IrisType'
Data Types: char | string
ScoreTransform — Score transformation
"none" (default) | "doublelogit" | "invlogit" | "ismax" | "logit" | function handle | ...
Score transformation, specified as a character vector, string scalar, or function handle.
The following list summarizes the available character vectors and string scalars.
• "doublelogit" — 1/(1 + e^(–2x))
• "invlogit" — log(x / (1 – x))
• "ismax" — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
• "logit" — 1/(1 + e^(–x))
• "none" or "identity" — x (no transformation)
• "sign" — –1 for x < 0; 0 for x = 0; 1 for x > 0
• "symmetric" — 2x – 1
• "symmetricismax" — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
• "symmetriclogit" — 2/(1 + e^(–x)) – 1
For a MATLAB function or a function you define, use its function handle for the score transform. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).
Example: "ScoreTransform","logit"
Data Types: char | string | function_handle
SplitCriterion — Split criterion
'gdi' (default) | 'twoing' | 'deviance'
Split criterion, specified as the comma-separated pair consisting of 'SplitCriterion' and 'gdi' (Gini's diversity index), 'twoing' for the twoing rule, or 'deviance' for maximum deviance reduction (also known as cross entropy).
For details, see “Impurity and Node Error” on page 35-2561.
Example: 'SplitCriterion','deviance'
Surrogate — Surrogate decision splits flag
'off' (default) | 'on' | 'all' | positive integer value
Surrogate decision splits on page 35-2563 flag, specified as the comma-separated pair consisting of 'Surrogate' and 'on', 'off', 'all', or a positive integer value. • When set to 'on', fitctree finds at most 10 surrogate splits at each branch node. • When set to 'all', fitctree finds all surrogate splits at each branch node. The 'all' setting can use considerable time and memory. • When set to a positive integer value, fitctree finds at most the specified number of surrogate splits at each branch node. Use surrogate splits to improve the accuracy of predictions for data with missing values. The setting also lets you compute measures of predictive association between predictors. For more details, see “Node Splitting Rules” on page 35-2563. Example: 'Surrogate','on' Data Types: single | double | char | string Weights — Observation weights ones(size(X,1),1) (default) | vector of scalar values | name of variable in Tbl Observation weights, specified as the comma-separated pair consisting of 'Weights' and a vector of scalar values or the name of a variable in Tbl. The software weights the observations in each row of X or Tbl with the corresponding value in Weights. The size of Weights must equal the number of rows in X or Tbl. If you specify the input data as a table Tbl, then Weights can be the name of a variable in Tbl that contains a numeric vector. In this case, you must specify Weights as a character vector or string scalar. For example, if the weights vector W is stored as Tbl.W, then specify it as 'W'. Otherwise, the software treats all columns of Tbl, including W, as predictors when training the model. fitctree normalizes the weights in each class to add up to the value of the prior probability of the respective class. Data Types: single | double | char | string Cross-Validation Options
CrossVal — Flag to grow cross-validated decision tree
'off' (default) | 'on'
Flag to grow a cross-validated decision tree, specified as the comma-separated pair consisting of 'CrossVal' and 'on' or 'off'.
If 'on', fitctree grows a cross-validated decision tree with 10 folds. You can override this cross-validation setting using one of the 'KFold', 'Holdout', 'Leaveout', or 'CVPartition' name-value pair arguments. You can use only one of these four arguments at a time when creating a cross-validated tree. Alternatively, cross-validate the tree later using the crossval method.
Example: 'CrossVal','on'
CVPartition — Partition for cross-validated tree
cvpartition object
Partition to use in a cross-validated tree, specified as the comma-separated pair consisting of 'CVPartition' and an object created using cvpartition.
If you use 'CVPartition', you cannot use any of the 'KFold', 'Holdout', or 'Leaveout' name-value pair arguments.
Holdout — Fraction of data for holdout validation
0 (default) | scalar value in the range [0,1]
Fraction of data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range [0,1]. Holdout validation tests the specified fraction of the data, and uses the rest of the data for training.
If you use 'Holdout', you cannot use any of the 'CVPartition', 'KFold', or 'Leaveout' name-value pair arguments.
Example: 'Holdout',0.1
Data Types: single | double
KFold — Number of folds
10 (default) | positive integer value greater than 1
Number of folds to use in a cross-validated classifier, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1. If you specify, for example, 'KFold',k, then the software:
1 Randomly partitions the data into k sets.
2 For each set, reserves the set as validation data, and trains the model using the other k – 1 sets.
3 Stores the k compact, trained models in the cells of a k-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can use only one of these four options: CVPartition, Holdout, KFold, or Leaveout.
Example: 'KFold',8
Data Types: single | double
Leaveout — Leave-one-out cross-validation flag
'off' (default) | 'on'
Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and 'on' or 'off'. Specify 'on' to use leave-one-out cross-validation.
If you use 'Leaveout', you cannot use any of the 'CVPartition', 'Holdout', or 'KFold' name-value pair arguments.
Example: 'Leaveout','on'
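As a brief sketch of the cross-validation options described above (the fisheriris sample data and the choice of 5 folds are illustrative), you can grow a cross-validated tree and estimate its generalization error with kfoldLoss:

load fisheriris
% Grow a cross-validated classification tree with 5 folds.
cvMdl = fitctree(meas,species,'KFold',5);
% Estimate the out-of-fold misclassification rate.
cvLoss = kfoldLoss(cvMdl)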
Hyperparameter Optimization Options
OptimizeHyperparameters — Parameters to optimize
'none' (default) | 'auto' | 'all' | string array or cell array of eligible parameter names | vector of optimizableVariable objects
Parameters to optimize, specified as the comma-separated pair consisting of 'OptimizeHyperparameters' and one of the following:
• 'none' — Do not optimize.
• 'auto' — Use {'MinLeafSize'}.
• 'all' — Optimize all eligible parameters.
• String array or cell array of eligible parameter names.
• Vector of optimizableVariable objects, typically the output of hyperparameters.
The optimization attempts to minimize the cross-validation loss (error) for fitctree by varying the parameters. For information about cross-validation loss (albeit in a different context), see “Classification Loss” on page 35-4305. To control the cross-validation type and other aspects of the optimization, use the HyperparameterOptimizationOptions name-value pair.
Note The values of 'OptimizeHyperparameters' override any values you specify using other name-value arguments. For example, setting 'OptimizeHyperparameters' to 'auto' causes fitctree to optimize hyperparameters corresponding to the 'auto' option and to ignore any specified values for the hyperparameters.
The eligible parameters for fitctree are:
• MaxNumSplits — fitctree searches among integers, by default log-scaled in the range [1,max(2,NumObservations-1)].
• MinLeafSize — fitctree searches among integers, by default log-scaled in the range [1,max(2,floor(NumObservations/2))].
• SplitCriterion — For two classes, fitctree searches among 'gdi' and 'deviance'. For three or more classes, fitctree also searches among 'twoing'.
• NumVariablesToSample — fitctree does not optimize over this hyperparameter. If you pass NumVariablesToSample as a parameter name, fitctree simply uses the full number of predictors. However, fitcensemble does optimize over this hyperparameter.
Set nondefault parameters by passing a vector of optimizableVariable objects that have nondefault values. For example,
load fisheriris
params = hyperparameters('fitctree',meas,species);
params(1).Range = [1,30];
Pass params as the value of OptimizeHyperparameters.
By default, the iterative display appears at the command line, and plots appear according to the number of hyperparameters in the optimization. For the optimization and plots, the objective function is the misclassification rate. To control the iterative display, set the Verbose field of the 'HyperparameterOptimizationOptions' name-value argument. To control the plots, set the ShowPlots field of the 'HyperparameterOptimizationOptions' name-value argument.
For an example, see “Optimize Classification Tree” on page 35-2510.
Example: 'auto'
HyperparameterOptimizationOptions — Options for optimization
structure
Options for optimization, specified as a structure. This argument modifies the effect of the OptimizeHyperparameters name-value argument. All fields in the structure are optional.

Optimizer — The optimization algorithm:
• 'bayesopt' (default) — Use Bayesian optimization. Internally, this setting calls bayesopt.
• 'gridsearch' — Use grid search with NumGridDivisions values per dimension.
• 'randomsearch' — Search at random among MaxObjectiveEvaluations points.
'gridsearch' searches in a random order, using uniform sampling without replacement from the grid. After optimization, you can get a table in grid order by using the command sortrows(Mdl.HyperparameterOptimizationResults).

AcquisitionFunctionName — The acquisition function:
• 'expected-improvement-per-second-plus' (default)
• 'expected-improvement'
• 'expected-improvement-plus'
• 'expected-improvement-per-second'
• 'lower-confidence-bound'
• 'probability-of-improvement'
Acquisition functions whose names include per-second do not yield reproducible results because the optimization depends on the runtime of the objective function. Acquisition functions whose names include plus modify their behavior when they are overexploiting an area. For more details, see “Acquisition Function Types” on page 10-3.

MaxObjectiveEvaluations — Maximum number of objective function evaluations. Default: 30 for 'bayesopt' and 'randomsearch', and the entire grid for 'gridsearch'.

MaxTime — Time limit, specified as a positive real scalar. The time limit is in seconds, as measured by tic and toc. The run time can exceed MaxTime because MaxTime does not interrupt function evaluations. Default: Inf.

NumGridDivisions — For 'gridsearch', the number of values in each dimension. The value can be a vector of positive integers giving the number of values for each dimension, or a scalar that applies to all dimensions. This field is ignored for categorical variables. Default: 10.

ShowPlots — Logical value indicating whether to show plots. If true, this field plots the best observed objective function value against the iteration number. If you use Bayesian optimization (Optimizer is 'bayesopt'), then this field also plots the best estimated objective function value. The best observed objective function values and best estimated objective function values correspond to the values in the BestSoFar (observed) and BestSoFar (estim.) columns of the iterative display, respectively. You can find these values in the properties ObjectiveMinimumTrace and EstimatedObjectiveMinimumTrace of Mdl.HyperparameterOptimizationResults. If the problem includes one or two optimization parameters for Bayesian optimization, then ShowPlots also plots a model of the objective function against the parameters. Default: true.

SaveIntermediateResults — Logical value indicating whether to save results when Optimizer is 'bayesopt'. If true, this field overwrites a workspace variable named 'BayesoptResults' at each iteration. The variable is a BayesianOptimization object. Default: false.

Verbose — Display at the command line:
• 0 — No iterative display
• 1 — Iterative display
• 2 — Iterative display with extra information
For details, see the bayesopt Verbose name-value argument and the example “Optimize Classifier Fit Using Bayesian Optimization” on page 10-56. Default: 1.

UseParallel — Logical value indicating whether to run Bayesian optimization in parallel, which requires Parallel Computing Toolbox. Due to the nonreproducibility of parallel timing, parallel Bayesian optimization does not necessarily yield reproducible results. For details, see “Parallel Bayesian Optimization” on page 10-7. Default: false.

Repartition — Logical value indicating whether to repartition the cross-validation at every iteration. If this field is false, the optimizer uses a single partition for the optimization. The setting true usually gives the most robust results because it takes partitioning noise into account. However, for good results, true requires at least twice as many function evaluations. Default: false.

Use no more than one of the following three options.
CVPartition — A cvpartition object, as created by cvpartition.
Holdout — A scalar in the range (0,1) representing the holdout fraction.
Kfold — An integer greater than 1.
Default for these three fields: 'Kfold',5 if you do not specify a cross-validation field.
Example: 'HyperparameterOptimizationOptions',struct('MaxObjectiveEvaluations',60) Data Types: struct
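The following sketch ties these optimization options together. The ionosphere sample data, the reduced evaluation budget, and the rng seed are illustrative choices for the example, not requirements:

rng(1)  % For reproducibility of the optimization
load ionosphere
% Optimize the default hyperparameter (MinLeafSize) with a smaller
% evaluation budget and without plots.
Mdl = fitctree(X,Y,'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions', ...
    struct('MaxObjectiveEvaluations',20,'ShowPlots',false));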
Output Arguments tree — Classification tree classification tree object Classification tree, returned as a classification tree object. Using the 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition' options results in a tree of class ClassificationPartitionedModel. You cannot use a partitioned tree for prediction, so this kind of tree does not have a predict method. Instead, use kfoldPredict to predict responses for observations not used for training. Otherwise, tree is of class ClassificationTree, and you can use the predict method to make predictions.
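As a short illustration of this distinction (the fisheriris data is used only as an example), predict works on a ClassificationTree, while a cross-validated tree requires kfoldPredict:

load fisheriris
tree = fitctree(meas,species);                    % ClassificationTree
label = predict(tree,meas(1,:))                   % predict a label for new data
cvTree = fitctree(meas,species,'CrossVal','on');  % ClassificationPartitionedModel
cvLabel = kfoldPredict(cvTree);                   % out-of-fold predictions for the training data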
More About
Curvature Test
The curvature test is a statistical test assessing the null hypothesis that two variables are unassociated.
The curvature test between predictor variable x and response y is conducted using this process.
1 If x is continuous, then partition it into its quartiles. Create a nominal variable that bins observations according to which section of the partition they occupy. If there are missing values, then create an extra bin for them.
2 For each level in the partitioned predictor j = 1,...,J and class in the response k = 1,...,K, compute the weighted proportion of observations in class k,

   π_jk = Σ_{i=1}^{n} I{y_i = k} w_i .

   w_i is the weight of observation i, Σ w_i = 1, I is the indicator function, and n is the sample size. If all observations have the same weight, then π_jk = n_jk / n, where n_jk is the number of observations in level j of the predictor that are in class k.
3 Compute the test statistic

   t = n Σ_{k=1}^{K} Σ_{j=1}^{J} (π_jk − π_{j+} π_{+k})² / (π_{j+} π_{+k})

   π_{j+} = Σ_k π_jk, that is, the marginal probability of observing the predictor at level j. π_{+k} = Σ_j π_jk, that is, the marginal probability of observing class k.
   If n is large enough, then t is distributed as a χ² with (K – 1)(J – 1) degrees of freedom.
4 If the p-value for the test is less than 0.05, then reject the null hypothesis that there is no association between x and y.
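Because the test statistic above is a chi-square statistic for independence between the binned predictor and the class labels, the equal-weight case can be approximated with crosstab. This is only an illustrative sketch of the computation (the data set and variable names are assumptions for the example), not the internal fitctree implementation:

load fisheriris
x = meas(:,1);                                 % one continuous predictor
edges = [-Inf quantile(x,[0.25 0.5 0.75]) Inf];
xbin = discretize(x,edges);                    % step 1: partition x into its quartiles
[~,chi2,p] = crosstab(xbin,species);           % steps 2-3: chi-square test of association
p                                              % step 4: reject "no association" if p < 0.05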
When determining the best split predictor at each node, the standard CART algorithm prefers to select continuous predictors that have many levels. Sometimes, such a selection can be spurious and can also mask more important predictors that have fewer levels, such as categorical predictors.
The curvature test can be applied instead of standard CART to determine the best split predictor at each node. In that case, the best split predictor variable is the one that minimizes the significant p-values (those less than 0.05) of curvature tests between each predictor and the response variable. Such a selection is robust to the number of levels in individual predictors.
Note If levels of a predictor are pure for a particular class, then fitctree merges those levels. Therefore, in step 3 of the algorithm, J can be less than the actual number of levels in the predictor. For example, if x has 4 levels, and all observations in bins 1 and 2 belong to class 1, then those levels are pure for class 1. Consequently, fitctree merges the observations in bins 1 and 2, and J reduces to 3.
For more details on how the curvature test applies to growing classification trees, see “Node Splitting Rules” on page 35-2563 and [4].
Impurity and Node Error
A decision tree splits nodes based on either impurity or node error. Impurity means one of several things, depending on your choice of the SplitCriterion name-value argument:
• Gini's Diversity Index (gdi) — The Gini index of a node is

   1 − Σ_i p²(i),
where the sum is over the classes i at the node, and p(i) is the observed fraction of classes with class i that reach the node. A node with just one class (a pure node) has Gini index 0; otherwise, the Gini index is positive. So the Gini index is a measure of node impurity.
• Deviance ("deviance") — With p(i) defined the same as for the Gini index, the deviance of a node is

   − Σ_i p(i) log₂ p(i).
A pure node has deviance 0; otherwise, the deviance is positive.
• Twoing rule ("twoing") — Twoing is not a purity measure of a node, but is a different measure for deciding how to split a node. Let L(i) denote the fraction of members of class i in the left child node after a split, and R(i) denote the fraction of members of class i in the right child node after a split. Choose the split criterion to maximize

   P(L)P(R) (Σ_i |L(i) − R(i)|)²,
where P(L) and P(R) are the fractions of observations that split to the left and right, respectively. If the expression is large, the split made each child node purer. Similarly, if the expression is small, the split made each child node similar to each other and, therefore, similar to the parent node. The split did not increase node purity.
• Node error — The node error is the fraction of misclassified classes at a node. If j is the class with the largest number of training samples at a node, the node error is 1 – p(j).
Interaction Test
The interaction test is a statistical test that assesses the null hypothesis that there is no interaction between a pair of predictor variables and the response variable.
The interaction test assessing the association between predictor variables x1 and x2 with respect to y is conducted using this process.
1 If x1 or x2 is continuous, then partition that variable into its quartiles. Create a nominal variable that bins observations according to which section of the partition they occupy. If there are missing values, then create an extra bin for them.
2 Create the nominal variable z with J = J1J2 levels that assigns an index to observation i according to which levels of x1 and x2 it belongs to. Remove any levels of z that do not correspond to any observations.
3 Conduct a curvature test on page 35-2560 between z and y.
When growing decision trees, if there are important interactions between pairs of predictors, but there are also many other less important predictors in the data, then standard CART tends to miss the important interactions. However, conducting curvature and interaction tests for predictor selection instead can improve detection of important interactions, which can yield more accurate decision trees.
For more details on how the interaction test applies to growing decision trees, see “Curvature Test” on page 35-2560, “Node Splitting Rules” on page 35-2563, and [3].
Predictive Measure of Association
The predictive measure of association is a value that indicates the similarity between decision rules that split observations. Among all possible decision splits that are compared to the optimal split (found by growing the tree), the best surrogate decision split on page 35-2563 yields the maximum predictive measure of association. The second-best surrogate split has the second-largest predictive measure of association.
Suppose xj and xk are predictor variables j and k, respectively, and j ≠ k. At node t, the predictive measure of association between the optimal split xj < u and a surrogate split xk < v is

   λ_jk = [min(P_L, P_R) − (1 − P_{L_jL_k} − P_{R_jR_k})] / min(P_L, P_R).

• P_L is the proportion of observations in node t, such that xj < u. The subscript L stands for the left child of node t.
• P_R is the proportion of observations in node t, such that xj ≥ u. The subscript R stands for the right child of node t.
• P_{L_jL_k} is the proportion of observations at node t, such that xj < u and xk < v.
• P_{R_jR_k} is the proportion of observations at node t, such that xj ≥ u and xk ≥ v.
• Observations with missing values for xj or xk do not contribute to the proportion calculations.
λ_jk is a value in (–∞,1]. If λ_jk > 0, then xk < v is a worthwhile surrogate split for xj < u.
Surrogate Decision Splits
A surrogate decision split is an alternative to the optimal decision split at a given node in a decision tree. The optimal split is found by growing the tree; the surrogate split uses a similar or correlated predictor variable and split criterion.
When the value of the optimal split predictor for an observation is missing, the observation is sent to the left or right child node using the best surrogate predictor. When the value of the best surrogate split predictor for the observation is also missing, the observation is sent to the left or right child node using the second-best surrogate predictor, and so on. Candidate splits are sorted in descending order by their predictive measure of association on page 35-3092.
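As a hedged sketch of how surrogate splits are used in practice (the data set and the artificially introduced NaN below are illustrative only), train with 'Surrogate','on' so that predict can route observations with missing predictor values through surrogate splits:

load fisheriris
tree = fitctree(meas,species,'Surrogate','on');
Xnew = mean(meas);
Xnew(1) = NaN;              % simulate a missing value in one predictor
label = predict(tree,Xnew)  % surrogate splits route the observation to a leaf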
Tip • By default, Prune is 'on'. However, this specification does not prune the classification tree. To prune a trained classification tree, pass the classification tree to prune. • After training a model, you can generate C/C++ code that predicts labels for new data. Generating C/C++ code requires MATLAB Coder. For details, see “Introduction to Code Generation” on page 34-3.
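For instance, a minimal sketch of pruning a trained tree (the pruning level 1 is an arbitrary illustrative choice):

load fisheriris
tree = fitctree(meas,species);       % Prune is 'on' by default, so the pruning sequence is estimated
prunedTree = prune(tree,'Level',1);  % remove the least important splits (one pruning level)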
Algorithms
Node Splitting Rules
fitctree uses these processes to determine how to split node t.
• For standard CART (that is, if PredictorSelection is 'allsplits') and for all predictors xi, i = 1,...,p:
1 fitctree computes the weighted impurity of node t, i_t. For supported impurity measures, see SplitCriterion.
2 fitctree estimates the probability that an observation is in node t using

   P(T) = Σ_{j∈T} w_j.

   w_j is the weight of observation j, and T is the set of all observation indices in node t. If you do not specify Prior or Weights, then w_j = 1/n, where n is the sample size.
3 fitctree sorts xi in ascending order. Each element of the sorted predictor is a splitting candidate or cut point. fitctree stores any indices corresponding to missing values in the set T_U, which is the unsplit set.
4 fitctree determines the best way to split node t using xi by maximizing the impurity gain (ΔI) over all splitting candidates. That is, for all splitting candidates in xi:
   a fitctree splits the observations in node t into left and right child nodes (t_L and t_R, respectively).
   b fitctree computes ΔI. Suppose that for a particular splitting candidate, t_L and t_R contain observation indices in the sets T_L and T_R, respectively.
      • If xi does not contain any missing values, then the impurity gain for the current splitting candidate is

         ΔI = P(T) i_t − P(T_L) i_{tL} − P(T_R) i_{tR}.

      • If xi contains missing values then, assuming that the observations are missing at random, the impurity gain is

         ΔI_U = P(T − T_U) i_t − P(T_L) i_{tL} − P(T_R) i_{tR}.

         T – T_U is the set of all observation indices in node t that are not missing.
      • If you use surrogate decision splits on page 35-2563, then:
         i fitctree computes the predictive measures of association on page 35-2563 between the decision split xj < u and all possible decision splits xk < v, j ≠ k.
         ii fitctree sorts the possible alternative decision splits in descending order by their predictive measure of association with the optimal split. The surrogate split is the decision split yielding the largest measure.
         iii fitctree decides the child node assignments for observations with a missing value for xi using the surrogate split. If the surrogate predictor also contains a missing value, then fitctree uses the decision split with the second largest measure, and so on, until there are no other surrogates. It is possible for fitctree to split two different observations at node t using two different surrogate splits. For example, suppose the predictors x1 and x2 are the best and second best surrogates, respectively, for the predictor xi, i ∉ {1,2}, at node t. If observation m of predictor xi is missing (i.e., xmi is missing), but xm1 is not missing, then x1 is the surrogate predictor for observation xmi. If observations x(m + 1),i and x(m + 1),1 are missing, but x(m + 1),2 is not missing, then x2 is the surrogate predictor for observation m + 1.
         iv fitctree uses the appropriate impurity gain formula. That is, if fitctree fails to assign all missing observations in node t to children nodes using surrogate splits, then the impurity gain is ΔI_U. Otherwise, fitctree uses ΔI for the impurity gain.
   c fitctree chooses the candidate that yields the largest impurity gain.
fitctree splits the predictor variable at the cut point that maximizes the impurity gain.
• For the curvature test (that is, if PredictorSelection is 'curvature'):
1 fitctree conducts curvature tests on page 35-2560 between each predictor and the response for observations in node t.
   • If all p-values are at least 0.05, then fitctree does not split node t.
   • If there is a minimal p-value, then fitctree chooses the corresponding predictor to split node t.
   • If more than one p-value is zero due to underflow, then fitctree applies standard CART to the corresponding predictors to choose the split predictor.
2 If fitctree chooses a split predictor, then it uses standard CART to choose the cut point (see step 4 in the standard CART process).
• For the interaction test (that is, if PredictorSelection is 'interaction-curvature'):
1 For observations in node t, fitctree conducts curvature tests on page 35-2560 between each predictor and the response and interaction tests on page 35-2562 between each pair of predictors and the response.
   • If all p-values are at least 0.05, then fitctree does not split node t.
   • If there is a minimal p-value and it is the result of a curvature test, then fitctree chooses the corresponding predictor to split node t.
   • If there is a minimal p-value and it is the result of an interaction test, then fitctree chooses the split predictor using standard CART on the corresponding pair of predictors.
   • If more than one p-value is zero due to underflow, then fitctree applies standard CART to the corresponding predictors to choose the split predictor.
2 If fitctree chooses a split predictor, then it uses standard CART to choose the cut point (see step 4 in the standard CART process).
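The impurity-gain formula in the standard CART process above can be illustrated with a small numeric sketch. The data, the cut point, and the two-class setup below are hypothetical choices for the example; with equal weights and node t taken as the root, P(T_L) and P(T_R) reduce to the fractions of observations sent to each child:

% Weighted Gini impurity gain for one candidate cut point (equal weights).
y   = [1 1 1 2 2 2 2 2];                   % class labels at node t (two classes)
x   = [0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8];   % one continuous predictor
cut = 0.35;                                % candidate cut point
gini = @(labels) 1 - sum((histcounts(labels,1:3)/numel(labels)).^2);
L = y(x < cut);  R = y(x >= cut);          % child node labels
dI = gini(y) - numel(L)/numel(y)*gini(L) - numel(R)/numel(y)*gini(R)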
Tree Depth Control
• If MergeLeaves is 'on' and PruneCriterion is 'error' (which are the default values for these name-value pair arguments), then the software applies pruning only to the leaves and by using classification error. This specification amounts to merging leaves that share the most popular class per leaf.
• To accommodate MaxNumSplits, fitctree splits all nodes in the current layer, and then counts the number of branch nodes. A layer is the set of nodes that are equidistant from the root node. If the number of branch nodes exceeds MaxNumSplits, fitctree follows this procedure:
1 Determine how many branch nodes in the current layer must be unsplit so that there are at most MaxNumSplits branch nodes.
2 Sort the branch nodes by their impurity gains.
3 Unsplit the number of least successful branches.
4 Return the decision tree grown so far.
This procedure produces maximally balanced trees.
• The software splits branch nodes layer by layer until at least one of these events occurs:
  • There are MaxNumSplits branch nodes.
  • A proposed split causes the number of observations in at least one branch node to be fewer than MinParentSize.
  • A proposed split causes the number of observations in at least one leaf node to be fewer than MinLeafSize.
  • The algorithm cannot find a good split within a layer (i.e., the pruning criterion (see PruneCriterion) does not improve for all proposed splits in a layer). A special case is when all nodes are pure (i.e., all observations in the node have the same class).
  • For values 'curvature' or 'interaction-curvature' of PredictorSelection, all tests yield p-values greater than 0.05.
MaxNumSplits and MinLeafSize do not affect splitting at their default values. Therefore, if you set 'MaxNumSplits', splitting might stop due to the value of MinParentSize, before MaxNumSplits splits occur.
Parallelization
For dual-core systems and above, fitctree parallelizes training decision trees using Intel Threading Building Blocks (TBB). For details on Intel TBB, see https://www.intel.com/content/www/us/en/developer/tools/oneapi/onetbb.html.
Cost, Prior, and Weights
If you specify the Cost, Prior, and Weights name-value arguments, the output model object stores the specified values in the Cost, Prior, and W properties, respectively. The Cost property stores the user-specified cost matrix as is. The Prior and W properties store the prior probabilities and observation weights, respectively, after normalization. For details, see “Misclassification Cost Matrix, Prior Probabilities, and Observation Weights” on page 19-8.
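As a sketch of the depth controls discussed above (the specific limits 7 and 5 are arbitrary illustrative values), you can cap the number of splits and the leaf size, and then inspect the resulting tree:

load fisheriris
% Limit the tree to at most 7 branch-node splits and at least 5 observations per leaf.
shallowTree = fitctree(meas,species,'MaxNumSplits',7,'MinLeafSize',5);
view(shallowTree,'Mode','graph')   % display the resulting shallow tree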
Version History Introduced in R2014a
References
[1] Breiman, L., J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Boca Raton, FL: CRC Press, 1984.
[2] Coppersmith, D., S. J. Hong, and J. R. M. Hosking. “Partitioning Nominal Attributes in Decision Trees.” Data Mining and Knowledge Discovery, Vol. 3, 1999, pp. 197–217.
[3] Loh, W. Y. “Regression Trees with Unbiased Variable Selection and Interaction Detection.” Statistica Sinica, Vol. 12, 2002, pp. 361–386.
[4] Loh, W. Y., and Y. S. Shih. “Split Selection Methods for Classification Trees.” Statistica Sinica, Vol. 7, 1997, pp. 815–840.
Extended Capabilities
Tall Arrays
Calculate with arrays that have more rows than fit in memory.
Usage notes and limitations:
• Supported syntaxes are:
  • tree = fitctree(Tbl,Y)
  • tree = fitctree(X,Y)
  • tree = fitctree(___,Name,Value)
  • [tree,FitInfo,HyperparameterOptimizationResults] = fitctree(___,Name,Value) — fitctree returns the additional output arguments FitInfo and HyperparameterOptimizationResults when you specify the 'OptimizeHyperparameters' name-value pair argument.
• The FitInfo output argument is an empty structure array currently reserved for possible future use.
• The HyperparameterOptimizationResults output argument is a BayesianOptimization object or a table of hyperparameters with associated values that describe the cross-validation optimization of hyperparameters. 'HyperparameterOptimizationResults' is nonempty when the 'OptimizeHyperparameters' name-value pair argument is nonempty at the time you create the model. The values in 'HyperparameterOptimizationResults' depend on the value you specify for the 'HyperparameterOptimizationOptions' name-value pair argument when you create the model.
  • If you specify 'bayesopt' (default), then HyperparameterOptimizationResults is an object of class BayesianOptimization.
  • If you specify 'gridsearch' or 'randomsearch', then HyperparameterOptimizationResults is a table of the hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst).
• Supported name-value pair arguments, and any differences, are:
  • 'AlgorithmForCategorical'
  • 'CategoricalPredictors'
  • 'ClassNames'
  • 'Cost'
  • 'HyperparameterOptimizationOptions' — For cross-validation, tall optimization supports only 'Holdout' validation. By default, the software selects and reserves 20% of the data as holdout validation data, and trains the model using the rest of the data. You can specify a different value for the holdout fraction by using this argument. For example, specify 'HyperparameterOptimizationOptions',struct('Holdout',0.3) to reserve 30% of the data as validation data.
  • 'MaxNumCategories'
  • 'MaxNumSplits' — for tall optimization, fitctree searches among integers, by default log-scaled in the range [1,max(2,min(10000,NumObservations-1))].
  • 'MergeLeaves'
  • 'MinLeafSize'
  • 'MinParentSize'
  • 'NumVariablesToSample'
  • 'OptimizeHyperparameters'
  • 'PredictorNames'
  • 'Prior'
  • 'ResponseName'
  • 'ScoreTransform'
  • 'SplitCriterion'
  • 'Weights'
• This additional name-value pair argument is specific to tall arrays:
  • 'MaxDepth' — A positive integer specifying the maximum depth of the output tree. Specify a value for this argument to return a tree that has fewer levels and requires fewer passes through the tall array to compute. Generally, the algorithm of fitctree takes one pass through the data and an additional pass for each tree level. The function does not set a maximum tree depth, by default.
For more information, see “Tall Arrays”.
Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.
To perform parallel hyperparameter optimization, use the 'HyperparameterOptimizationOptions',struct('UseParallel',true) name-value argument in the call to the fitctree function. For more information on parallel hyperparameter optimization, see “Parallel Bayesian Optimization” on page 10-7. For general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
Usage notes and limitations:
• fitctree does not support surrogate splits. You can specify the name-value argument Surrogate only as "off".
• For data with categorical predictors, the following apply:
  • For multiclass classification, fitctree supports only the OVAbyClass algorithm for finding the best split.
  • You can specify the name-value argument NumVariablesToSample only as "all".
  • You can specify the name-value argument PredictorSelection only as "allsplits".
• fitctree fits the model on a GPU if either of the following applies:
• The input argument X is a gpuArray object. • The input argument Tbl contains gpuArray predictor variables. • Note that fitctree might not execute faster on a GPU than a CPU for deeper decision trees. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).
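A minimal sketch of GPU training under these limitations (assuming Parallel Computing Toolbox and a supported GPU are available; the fisheriris data is used only for illustration):

load fisheriris
Xgpu = gpuArray(meas);            % move the predictor data to the GPU
treeGPU = fitctree(Xgpu,species); % Surrogate must remain "off" on the GPU
label = predict(treeGPU,Xgpu(1,:))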
See Also kfoldPredict | predict | ClassificationTree | ClassificationPartitionedModel | prune Topics “Splitting Categorical Predictors in Classification Trees” on page 20-25
fitglm Create generalized linear regression model
Syntax
mdl = fitglm(tbl)
mdl = fitglm(X,y)
mdl = fitglm( ___ ,modelspec)
mdl = fitglm( ___ ,Name,Value)
Description mdl = fitglm(tbl) returns a generalized linear model fit to variables in the table or dataset array tbl. By default, fitglm takes the last variable as the response variable. mdl = fitglm(X,y) returns a generalized linear model of the responses y, fit to the data matrix X. mdl = fitglm( ___ ,modelspec) returns a generalized linear model of the type you specify in modelspec. mdl = fitglm( ___ ,Name,Value) returns a generalized linear model with additional options specified by one or more Name,Value pair arguments. For example, you can specify which variables are categorical, the distribution of the response variable, and the link function to use.
Examples
Fit a Logistic Regression Model
Make a logistic binomial model of the probability of smoking as a function of age, weight, and sex, using a two-way interactions model.
Load the hospital dataset array.
load hospital
dsa = hospital;
Specify the model using a formula that allows up to two-way interactions between the variables age, weight, and sex. Smoker is the response variable.
modelspec = 'Smoker ~ Age*Weight*Sex - Age:Weight:Sex';
Fit a logistic binomial model.
mdl = fitglm(dsa,modelspec,'Distribution','binomial')
mdl =
Generalized linear regression model:
    logit(Smoker) ~ 1 + Sex*Age + Sex*Weight + Age*Weight
    Distribution = Binomial

Estimated Coefficients:
                        Estimate          SE         tStat      pValue
                       ___________    _________    ________    _______
    (Intercept)            -6.0492       19.749     -0.3063    0.75938
    Sex_Male               -2.2859       12.424    -0.18399    0.85402
    Age                    0.11691      0.50977     0.22934    0.81861
    Weight                0.031109      0.15208     0.20455    0.83792
    Sex_Male:Age          0.020734      0.20681     0.10025    0.92014
    Sex_Male:Weight        0.01216     0.053168     0.22871     0.8191
    Age:Weight         -0.00071959    0.0038964    -0.18468    0.85348
100 observations, 93 error degrees of freedom Dispersion: 1 Chi^2-statistic v